HARNESSING THE COMPLEXITY OF NATURAL EVOLUTION TO ENHANCE
                  EVOLUTIONARY ALGORITHMS
                                    By
                        Vincent Romeo Ragusa
                          A DISSERTATION
                              Submitted to
                      Michigan State University
               in partial fulfillment of the requirements
                            for the degree of
               Computer Science—Doctor of Philosophy
                                   2023


                                         ABSTRACT
    Evolutionary computation is a powerful optimization tool, and an invaluable test bed for
population genetics. Evolutionary algorithms can become stuck on local optima, but can
escape these traps by temporarily losing fitness in order to discover even higher fitness in a
process called valley-crossing. Valley-crossing is fundamentally linked to the balance between
the forces of selection and variation, and as such, controlling this balance is important for
optimizing the efficiency of evolutionary algorithms. Nature, in contrast, is not actively opti-
mized for performance, and yet nature seems to overcome many challenges that evolutionary
algorithms do not. It is possible that nature benefits from a highly dynamic balance between
selection and variation, and this constant flux helps natural populations avoid stagnation
and overcome obstacles in the fitness landscape. Working with this hypothesis in mind, I
investigate the nature of selection and how natural phenomena strengthen or weaken it.
    I find that selection strength can be thought of as the degree to which an evolving system
is dissimilar to neutral drift. This perspective opens the door to accept all phenomena that
affect the strength of selection as part of a unified theory of selection that treats selection
strength as an emergent property. I present a new evolutionary dynamic – the free-for-all
effect – that is the reduction of selection strength on organisms with higher-than-average
fitness. Free-for-all can result in rapid evolutionary adaption that would otherwise seem
impossible, and provides an elegant explanation for punctuated equilibrium. The discovery
of free-for-all highlights the importance of spatial structure in evolving populations, and has
led to the design of a new evolutionary search method called super explorers. Super explorers
mimic the free-for-all effect, and improve evolutionary search, while placing full control into
the hands of the algorithm designer.


Copyright by
VINCENT ROMEO RAGUSA
2023


                                ACKNOWLEDGMENTS
    I want to thank my advisor Chris Adami, who exemplifies what being a mentor is all
about. Chris welcomed me into his lab and helped me flourish. I would be remiss not to thank
Arend Hintze, who mentored me for the first half of my journey, and always encouraged me
to follow my instincts. I want to thank my committee members Charles Ofria and Wolfgang
Banzhaf, who have lent me their vast knowledge and perspective over the years. Of course,
I must thank Cliff Bohm. Cliff and I have shared an office for the last 6 years, and I am
grateful for every moment. Cliff is a great collaborator, and an even greater friend. There
are a number of people who have had a disproportionally large impact on my life. Many
of them helped steer me to where I am today, knowingly or not. I want to thank Michael
Harper, Matt Flemming, Anthony Rodriguez, and David Mathias for fostering my love of
computer science and inspiring me to keep learning. Last but not least, thank you to my
family and friends who have supported me with their love and laughter.
                                             iv


                           TABLE OF CONTENTS
Chapter 1 Introduction . . . . . . . . . . . . . . .   . . . . . . . . . . . . . . . . .   1
   1.1 Background and Motivation . . . . . . .         . . . . . . . . . . . . . . . . .   1
   1.2 Literature Review . . . . . . . . . . . . . .   . . . . . . . . . . . . . . . . .   2
   1.3 Research Objectives and Contributions           . . . . . . . . . . . . . . . . .  17
   1.4 Research Methods . . . . . . . . . . . . .      . . . . . . . . . . . . . . . . .  18
   1.5 Outline of Dissertation . . . . . . . . . .     . . . . . . . . . . . . . . . . .  21
   1.6 Summary . . . . . . . . . . . . . . . . . . .   . . . . . . . . . . . . . . . . .  22
Chapter 2 Quantifying the impact of selection .        . . . . . . . . . . . . . . . . .  23
   2.1 Introduction . . . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . .  23
   2.2 Properties of selection strength . . . . .      . . . . . . . . . . . . . . . . .  25
   2.3 The Selection Impact Metric . . . . . . .       . . . . . . . . . . . . . . . . .  27
   2.4 Modeling the Offspring Distributions of         Neutral Drift     . . . . . . . .  29
   2.5 Counting Offspring and Descendants . .          . . . . . . . . . . . . . . . . .  34
   2.6 Empirical data . . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . .  36
   2.7 Discussion . . . . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . .  42
   2.8 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  47
Chapter 3 Noisy fitness and selection strength       . . . . . . . . . . . . . . . . . .  49
   3.1 Introduction . . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . . .  49
   3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  52
   3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  57
   3.4 Discussion . . . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . . .  63
Chapter 4 The role of disequilibrium     in evolutionary discovery         . . . . . . .  69
   4.1 Introduction . . . . . . . . . .  . . . . . . . . . . . . . . . . . . . . . . . .  69
   4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  73
   4.3 Discussion . . . . . . . . . . .  . . . . . . . . . . . . . . . . . . . . . . . .  81
   4.4 Conclusion . . . . . . . . . . .  . . . . . . . . . . . . . . . . . . . . . . . .  89
   4.5 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  90
Chapter 5 Augmenting evolutionary algorithms           with Super explorers        . . . 94
   5.1 Introduction . . . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . . 94
   5.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
   5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
   5.4 Discussion . . . . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . . 108
   5.5 Conclusion . . . . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . . 115
Chapter 6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . .  . . . . . . . . . 117
   6.1 Measuring the impact of selection . . . . . . . . . . .         . . . . . . . . . 117
   6.2 Altering selection strength with fitness noise . . . . .        . . . . . . . . . 120
   6.3 The evolutionary free-for-all effect . . . . . . . . . . .      . . . . . . . . . 122
   6.4 Super Explorers: protected, sustained, neutral drift            . . . . . . . . . 128
   6.5 Final Remarks . . . . . . . . . . . . . . . . . . . . . . . .   . . . . . . . . . 131
                                           v


BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
                                      vi


Chapter 1
Introduction
1.1      Background and Motivation
Evolutionary computation (EC) is a field of artificial intelligence concerned with solving
optimization problems with algorithms inspired by evolution. These bio-inspired optimization
algorithms are called evolutionary algorithms (EAs). In order to solve an optimization
problem with EC, a population of digital organisms is created, which each represent a
candidate solution to the optimization problem. Some candidate solutions will be of higher
quality than others (they have higher fitness) and those high-fitness candidates are selectively
chosen to reproduce to form the next generation of solutions. The selection bias towards
higher fitness organisms increases the average fitness of the population over time. Digital
organisms tend to produce offspring identical to themselves; the information that defines the
parent is inherited by the child. Occasionally, however, small errors are introduced to create
variation among the members of the population. These small mutations change the child
organism and the candidate solution they represent. These random mutations are sometimes
beneficial and increase the organism’s fitness. Since organisms with high fitness are preferred
by selection, the population will slowly evolve towards ever higher fitness as these beneficial
mutations are discovered and build on one another. In the end, when there are no more
                                                1


improvements to discover, the highest quality solution to the optimization problem is now
available, and the optimization problem is solved. Hopefully.
    The scenario I just described is the ideal outcome of using EC to solve an optimization
problem. In practice, there are no guarantees that an evolving population will discover the
highest fitness solution. Sometimes the population does not even discover a solution that
can work practically in the real world. There are many reasons why EAs may struggle to
discover paths towards higher fitness. The optimization problem may simply be difficult to
solve, or difficult for EAs specifically. Alternatively, the problem may be solvable, but the
design of the EA may be ineffective for this specific problem. Perhaps instead, the EA is
well suited to solving the problem, but the poor performance is due to its various settings
and parameters being configured ineffectively. When an EA fails to find good solutions, it is
never quite clear which of these problems may be the one plaguing it. Each of these problems
are the subject of ongoing research by many scientists and engineers around the world.
    My general interest is in the design and tuning of the evolutionary algorithms, and the
theory behind how evolution solves problems. Specifically, I am interested in selection, and
how parts of an evolutionary algorithm that are ostensibly unrelated to selection can have a
large influence on selection. These complex relationships can create unforeseen obstacles, as
well as offer hidden benefits.
    In search of both obstacles and benefits, I look to the complexity of nature. The study
of natural evolution and related theory provide a rich source of inspiration. As EAs were
originally inspired by nature, I turn once again to nature for new ideas about how to improve
EAs and overcome the obstacles EC faces.
1.2       Literature Review
1.2.1      Evolutionary Algorithms
Evolutionary algorithms (EAs) [Holland, 1992, Koza, 1994, Rechenberg, 1984, De Jong, 2012]
are search and optimization algorithms. In other words, given a description of an optimization
                                                2


Figure 1.1: A toy evolutionary algorithm, evolving the color of 9 digital organisms. Green is
the fittest color, with colors further from green in chromatic order becoming increasingly less
fit. Top: five generations of organisms, with black arrows indicating the progression of time.
Bottom left: the evolution loop: replication, inheritance, and variation. Each iteration of the loop
produces the next generation of organisms. Selection occurs because the fittest organisms produce
more offspring during the reproduction step. Bottom right: the population’s color composition
over time. The average color of the population slowly changes to green.
problem, an EA will search for solutions to the problem, and attempt to maximize the quality
of the solutions it finds. It does this by emulating the evolution of a population of digital
organisms [Adami, 1998,Ofria and Wilke, 2004,Adami, 2006]. Each digital organism contains
a blueprint for a solution to the optimization problem. We call this blueprint the organism’s
genome1 . Each organism has a genotype — the unique set of genes the organism carries —
and a phenotype, the set of all traits, functions, and behaviors the organism expresses as a
result of its genotype. The organism’s phenotype is often the candidate solution it represents.
Figure 1.1 provides a simple description of an evolutionary algorithm.
      In any evolving system, the true measure of an organism’s fitness is its reproductive
success2 . Therefore, in order to leverage evolution to solve optimization problems, we must
    1
      The genome of biological organisms are not ’blueprints’ in this sense, but are more like the organism’s
instruction manual or operating system, however even these analogies are imperfect.
    2
      Some researchers talk about the long-term survival of the entire lineage that begins with the organism in
                                                      3


link the quality of an organism’s solution to the organism’s reproductive success. The quality
of an organism is assessed according to its phenotype: its traits, functions, and behaviors. The
specific way in which the organism’s phenotype is mapped to a metric of quality determines
which solutions will evolve. A fitness function accomplishes the mapping from aspects of
the organism’s phenotype to a quality score or scores. This quality score is then used by a
selection algorithm to decide which organisms reproduce, and how many offspring they will
have. It is commonplace (and sometimes confusing) to call the quality score assigned to the
organism its fitness. In reality, an organism’s fitness can only be determined in retrospect,
for example by counting its offspring. The organism’s quality score is merely proportional
to the organism’s fitness. However, for ease of writing, and to conform to the conventional
lexicon of EC, I will call the quality score ‘fitness’ whenever the distinction is irrelevant.
    In EC, as opposed to nature, selection is performed by a selection algorithm. A selection
algorithm looks at the quality scores assigned to each organism, and gives organisms with
higher quality a chance to reproduce more often than those with lower quality. Selection
algorithms are not unlike animal breeders in that they have full authority over which organisms
reproduce and how many offspring they will have. Since selection favors organisms with higher
quality scores, the population will evolve towards higher quality scores, and thus towards
high quality solutions to the optimization problem.
    When an organism is selected to reproduce, it passes its genotype to its offspring. The
offspring’s genotype may also receive mutations which make it distinct from its parent. If
sexual recombination is desired, two or more parents can be selected to produce offspring
together. The newly born organisms must also be evaluated by the fitness function in order
to participate in the next round of selection. Sometimes the mutation and recombination of
genomes results in a solution whose quality is higher than any other seen previously, moving
the population collectively towards higher fitness.
question. This ’long-term fitness’ can only be determined in retrospect, however the organism’s reproductive
success is (usually) a good predictor of its long-term fitness.
                                                       4


1.2.2       Fitness Landscapes and Valley-crossing
Fitness landscapes [Wright, 1932, Maynard Smith, 1970] are conceptual and visual aids for
understanding the fitness relationships between genotypes. Like a topographical map with
elevation markings, fitness landscapes show higher fitness genotypes at higher elevations
than lower fitness genotypes. Adjacent locations in the fitness landscape correspond to
genotypes that are similar to one another. Typically, the distance between two genotypes is
the mutational (Hamming) distance between them, but other measures of distance can be
used. Organisms at the same elevation in the fitness landscape have the same fitness, and
therefore have no fitness advantage over one another. Organisms at higher elevations have a
fitness advantage over organisms at lower elevations than them. Inheritance and variation
crate organisms at nearby locations, while selection favors those at higher elevations. The
net result is a population of organisms that climb the hills and wander the plains of their
fitness landscape.
     An evolving population tends to climb the slopes of its fitness landscape. However, if
climbing upwards towards higher fitness was all the population was capable of, it would
quickly reach the top of some hill, more formally called a local optimum, and never retreat
downward3 . If a population reaches a local optimum, there are no single mutations that
can increase the fitness of an organism further. This poses an obstacle for evolutionary
computation algorithms, which are intended to climb the highest hill in the fitness landscape:
the global optimum. The way around this obstacle is to allow the population to sometimes
retreat down from a local optimum in hopes of discovering another higher hill to climb.
The process of escaping one optimum to climb another is called “valley-crossing”, since the
space in the fitness landscape between two hills is like a valley. Valley-crossing can happen
gradually, for example through the sequential fixation of deleterious mutations due to Muller’s
ratchet [Muller, 1964], or quickly, with mutations occurring quickly without intermediate
   3
     This aspect of evolution is effectively the stochastic gradient descent algorithm, popular in machine
learning.
                                                      5


genotypes fixing in a process called stochastic tunneling [Iwasa et al., 2004a].
    Although a population does not always require a loss in fitness to cross a fitness valley
[Iwasa et al., 2004a, Østman and Adami, 2014], deleterious mutations are often necessary
to explain valley-crossing events. For example, [Covert et al., 2013] disallow deleterious
mutations in an evolutionary algorithm and observe that the adaptability of the system is
hindered due to the inability to cross valleys. Similarly, it has been shown that elitism,
strictly selecting only the highest fitness organisms, hinders adaptation on fitness landscapes
with many local optima [Oliveto et al., 2018], showing that deleterious mutations play an
important role.
    Besides crossing fitness valleys, there are other means to escape local optima. Extradi-
mensional bypass theory [Conrad, 1990, Cariani, 2002] describes how a fitness landscape may
change over time, resulting in new paths towards higher fitness that do not involve deleterious
mutations. The neutral theory of molecular evolution [Kimura, 1983, Kimura, 1991] posits
that fitness-neutral genetic variation accounts for nearly all variation seen in nature. If true,
the theory would suggest that natural populations traverse fitness landscapes primarily along
neutral ridges [Gavrilets, 1999]. In other words, the mutational distance to a beneficial mu-
tation limits the rate of adaptation, not the difficulty crossing fitness valleys. Neutral theory
has not gone unchallenged; evidence has accumulated to suggest that selection plays a more
important role than neutral theory would suggest [Lewontin et al., 1974, Kreitman, 2000, Fay
et al., 2002, Hahn, 2008, Kern and Hahn, 2018]. Modifications to the theory have been made
by [Ohta, 1992,Ohta, 2002] to allow for nearly-neutral variation, and in particular deleterious
mutations, which once again introduces valley-crossing into the equation. Regardless of how
neutral the natural fitness landscape is, we can be sure that optimization problems will
represent many kinds of fitness landscapes, and EAs must be equipped to navigate them all.
1.2.3      No free lunch: the balance between selection and variation
Viewing evolution as a search-and-optimization algorithm allows us to import a massive
body of literature from the optimization community. This body of literature can help
                                                6


predict how evolution will unfold and establish the limitations of evolution as a problem-
solving device. All search and optimization algorithms, including both natural selection and
computer models of evolution, are subject to the fundamental limitations of the no-free-
lunch theorems [Wolpert and Macready, 1995, Wolpert and Macready, 1997, Ho and Pepyne,
2002, Wolpert and Macready, 2005], and particularly to the explore-exploit tradeoff [Millidge
et al., 2021]. In order to explore the fitness landscape and discover better solutions, some
genomes must mutate far from the mass of the population, becoming less like the solutions
that birthed it. In order to exploit the fitness landscape for further optimizations, some
genomes must mutate near the mass of the population to fine tune the best solutions. Some
genomes must not mutate at all in order to preserve copies of the existing solutions for
future generations. Managing this tradeoff is typically a main concern in the development
of evolutionary algorithms. The manner in which an evolutionary algorithm balances these
concerns will determine the algorithm’s performance on a given fitness landscape.
    We can think of the explore-exploit dilemma in terms of selection, variation, and in-
heritance. Selection is a purifying force, by which I mean selection favors the high-fitness
individuals, while slowly driving all others to extinction. Inheritance is a cohesion force, by
which I mean members of the population accumulate in the same vicinity within the fitness
landscape when reproduction occurs. If selection and inheritance were the only phenomena
occurring in the population, every organism in the population would eventually become a
clone of the most fit individual. Together, selection and inheritance are the main drivers
of exploitation dynamics in the evolutionary optimization process. Variation is a diffusive
force, by which I mean mutations randomly move organisms around in the fitness landscape,
spreading them out in all directions. Variation is the primary driver of exploration dynamics
in the evolutionary search process. Inheritance plays a critical role in exploration as well,
since it allows for mutations to build on one another, increasing the range of exploration.
Inheritance also focuses the exploration in the vicinity of the population where good solu-
tions are known to exist. It is, therefore, the opposing forces of selection and variation which
                                                 7


result in the explore-exploit dilemma. Both selection and variation are necessary aspects of
evolution, but shifting the emphasis towards one aspect can also shift emphasis away from
the other.
     If selection and variation are necessary but contrary forces, then perhaps there is an
optimal balance between these forces that maximizes the rate of adaptation. The no-free-
lunch theorems tell us that any algorithm will have some inputs for which the algorithm
behaves optimally, while for other inputs the algorithm will perform terribly4 . If we think of
the evolutionary algorithm and a particular set of parameters (held constant) as a single unit,
we may consider the fitness function as the input, and deduce that these particular parameters
will lead to optimal evolutionary search on some fitness landscapes, and terrible search on
others. If instead we think of the evolutionary algorithm and a particular fitness function
as a single unit, we may consider the algorithm’s parameters as the input, and deduce that
some parameters will lead to optimal evolutionary search on this particular fitness function,
while other parameters will lead to terrible search. Furthermore, consider that the forces of
selection and variation are a direct consequence of the evolutionary algorithm and the values
of its parameters.
     Therefore, as a consequence of the no-free-lunch theorems, for a given location on a
fitness landscape, there is an optimal balance between selection and variation that results
in the fastest adaptation from that location. Equivalently, for every balance of selection
and variation, there are locations in the fitness landscape that are most easily traversed by
that population. However, in general, there is no single optimal balance between selection
and variation for all areas of every fitness landscape, and no fitness landscape is optimally
traversed by every balance of selection and variation. Put simply, the ‘best’ configuration of
the population depends entirely on where the population resides in the fitness landscape.
   4
     In truth, the theorems state that all algorithms perform the same when averaged across all inputs. We
may deduce from this that unless all inputs result in exactly the same performance, some inputs must achieve
better outcomes than others.
                                                       8


1.2.4       Selection regimes and rates of adaptation
It is common in the literature to talk about strong versus weak selection, or strong versus
weak mutation. In particular, the literature considers two regimes: strong selection / weak
mutation (SSWM) and weak selection / strong mutation (WSSM)5 . The SSWM and WSSM
regimes are also known as the sequential-fixation regime and the successive-mutations regime
[Desai and Fisher, 2007, Desai et al., 2007]. These regimes help us understand the different
ways that an evolutionary system can balance selection and variation.
     In the sequential-fixation regime, the discovery of beneficial mutations is rare, and so, once
discovered, beneficial mutations go to fixation before another is discovered. In this regime,
the rate of adaptation is limited only by the rate at which beneficial mutations are discovered.
The sequential-fixation regime may, in addition to correlating with strong selection and
weak mutation, also correlate with strong selection and strong mutation. To illustrate this
point, consider a population with a high mutation rate, but very few beneficial mutations
to discover in the local fitness landscape. This population is experiencing a large inflow of
neutral and deleterious variation that can work against selection and result in exploration
and potentially valley-crossing. Such a population would be in the sequential-fixation regime
if those valley-crossings rarely happen and the population goes to fixation between each
discovery.
     In the successive-mutations regime, many beneficial mutations are discovered faster than
a single one can fix, resulting in competition between beneficial mutations, called clonal
interference (CI) [Gerrish and Lenski, 1998, Campos et al., 2004, Desai et al., 2007, Fogle
et al., 2008]. CI slows down the rate of adaptation, since the population makes inefficient use
of the beneficial mutations it discovers, often simply losing those discoveries to competition
and drift.
     These two evolutionary regimes are the extremes. In reality, a population is likely
   5
     Sometimes the order of selection and mutation in these acronyms are written in the reverse order: WMSS
or SMWS.
                                                      9


somewhere between the two regimes and constantly changing. Thus, these two regimes serve
more as simplified models of evolution, with properties that make them easy to analyze and
model, than as complete descriptions of evolutionary dynamics.
1.2.5      Shifting the balance
Since no single configuration of an evolutionary algorithm’s selection and variation forces can
solve every problem, evolution should be a more powerful tool when it changes its selection and
variation over time. Indeed, this has been found to be true. There is a large body of literature
that describes methods for self-adaptive evolutionary algorithms, algorithms that tune their
own parameters in response to how the population of solutions has evolved so far [Bäck,
1994, Smit and Eiben, 2009, Kramer, 2010]. Self-tuning algorithms manipulate mutation
rates, mutation sizes, and other aspects of variation, as well as parameters that affect the
strength of selection. Typically, only one of either variation or selection is tuned, since their
opposing nature makes tuning both unnecessary for many applications. Other algorithms
use niching techniques (a.k.a. diversity maintenance or diversity preservation) [Shir, 2012] to
maintain variation in the population by means other than mutations alone. Typically, niching
algorithms co-opt selection to apply pressure against the collapse of diversity, effectively
shifting the balance towards more variation.
    Perhaps unsurprisingly, nature is full of phenomena that can also shift the balance of
selection and variation. For instance, the size of the evolving population relates to both
selection strength and standing variation [Jain et al., 2011, Ochs and Desai, 2015, Rozen
et al., 2008, Ohta, 1972, Lynch and Conery, 2003, Takahata, 1987]. Larger populations suffer
less sampling error and are more sensitive to small fluctuations in fitness, leading to strong
selection. Small populations more easily experience genetic drift, a phenomenon where the
frequency of a genotype in the population has a larger impact on its reproductive success
than the quality of its corresponding phenotype. The decoupling of quality from fitness
results in weak selection. It is important to note that adjusting the population size changes
more than just the selection strength felt by the individuals in the population. Increasing
                                               10


the population size also increases the mutation inflow, the number of individual mutations a
population experiences in a unit of time. A larger inflow of mutations increases the standing
variation of genotypes in the population [Feller, 1951, Kimura, 1964]. Whether increasing the
population size results in a net shift of the balance towards selection or variation will depend
on the circumstances6 .
      It has also been documented recently that noisy fitness, for example due to environmental
noise, can reduce the strength of selection. For example, [Wang and Zhang, 2011] and
[Melbinger and Vergassola, 2015] both see an increase in drift-like behavior when fitness
is noisy and the selection algorithm has difficulty comparing organisms accurately. Both
publications describe the change in dynamics as being similar to a reduction in the effective
population size. A third study by [Van Egeren et al., 2018] demonstrates how noisy fitness
can also benefit adaptation by weakening selection enough to enable valley-crossing.
      When a population discovers new territory and subsequently moves or grows to inhabit
it, the population is said to undergo a range expansion. Range expansion events have been
documented to reduce selection temporarily on the leading edge of the expansion [Slatkin
and Excoffier, 2012,Peischl et al., 2013,Peischl and Excoffier, 2015,Peischl et al., 2015,Gilbert
et al., 2017, Burton and Travis, 2008]. Once the range expansion has ended, the reduction in
selection strength ends as well.
      Mass-extinction events are also known to increase the adaptive capability of the surviving
members of the population [Lewis, 1962, Mathias and Ragusa, 2016, Engholdt and Mathias,
2021,Lehman and Miikkulainen, 2015,Slater, 2013,Sahney et al., 2010], likely due to reduced
competition on less-fit genotypes and the subsequent range expansion that occurs to replace
the lost biomass.
      Selection is not the only aspect of evolution that can change naturally. For example,
mutation rates have also been documented to shift naturally [Lenski et al., 2003]. However,
mutation rates, unlike selection strength, are a property of the organism and so subject to
    6
      Increasing the population size “to infinity” results in both infinite selection strength and infinite variation,
thus infinite populations are non-physical models that represent an impossible idealized population.
                                                         11


evolution [Lynch et al., 2016]. For this reason, shifts in mutation rate are not as capricious
as shifts in selection strength over the same time scales.
    The shifting balance between selection and variation was proposed long ago as a mech-
anism for valley-crossing in natural populations. The aptly named ‘shifting balance theory’
(SBT) was initially proposed by [Wright, 1932, Wright, 1982]. The theory described a partic-
ular mechanism for valley-crossing involving migration between small isolated populations,
where selection is weak, and larger core populations, where selection is strong. However, a
growing number of people have begun citing SBT in reference to a collection of phenomena
that naturally affect selection and variation, such as range expansions [Johnson, 2008]. I sup-
port this extension of SBT, and further propose to include all the phenomena I describe above.
For clarity, I will refer to the original narrow-sense SBT as ‘peripheral isolate theory’, and I
will refer to the more general assertion, that dynamic selection strength drives adaptation, as
shifting balance theory. SBT, defined this way, provides a robust and general mechanism for
how natural populations can cross fitness valleys and keep the wheels of evolution turning.
1.2.6      Punctuated Evolution
The irregular rate of adaptation seen in nature is the subject of much discussion [Simpson,
1944, Fitch, 1995]. The fossil record shows a repeating, irregular pattern of long periods of
stasis interrupted by rapid evolutionary change. This pattern was first called punctuated
equilibrium (PE) by [Eldredge, 1972, Gould and Eldredge, 1993, Gould, 1977, Gould and
Eldredge, 1977]. The same pattern has been seen in the observed variation of the speed
of the molecular clock [Langley and Fitch, 1974, Gillespie, 1984a]. Episodic patterns of
speciation have been inferred using different methods and on different taxonomic scales; from
the fossils of the lower and middle Cambrian [Conway Morris, 1998], Phanerozoic marine
fossils [Erwin et al., 1987, John Sepkoski Jr, 1998] and tracheophyte fossils [Niklas et al.,
1983, Niklas, 1997], super-tree reconstructions of mammalian phylogenies [Bininda-Emonds
et al., 2007], and from influenza sequence evolution [Wolf et al., 2006].
    A number of mechanisms have been proposed to account for variable rates of evolution.
                                                12


For instance, [Eldredge, 1972,Gould and Eldredge, 1993] originally proposed peripheral isolate
theory [Gould, 1977, Wright, 1932, Wright, 1982] as a possible mechanism for PE. Sometimes
new genetic mechanisms (or even mechanisms outside neo-Darwinian theory [Gould, 1987])
are invoked to explain this pattern, although the evidence for such mechanisms appears
slim [Charlesworth et al., 1982]. PE has also been observed in populations that undergo
periodic selection [Adami, 2006,Elena et al., 1996], and independently of whether adaptations
lead to the exclusion or coexistence of species [Chow, 2004]. Changes in the environment
leading to positive selection and the colonization of new niches have also been advanced as
causes for accelerated evolution (the founder effect) [Mayr, 1954, Templeton, 1980, Carson,
1975]. Equally, small population sizes [Ohta, 1972, Lynch and Conery, 2003], changes in
the population size [Takahata, 1987], catastrophic extinctions [Lewis, 1962], environmental
stress [Hoffmann and Parsons, 1997,Nevo, 2001] changes in mutation rate [Lenski et al., 2003],
and the introduction of new developmental or regulatory mechanisms [Erwin et al., 1987,Erwin
and Valentine, 1984, Davidson and Erwin, 2006] as well as genetic instabilities [Fontdevila,
1992] and chromosomal evolution [Bush et al., 1977, Sites and Moritz, 1987] have all been
implicated. All of the above points to a strong relationship between PE and SBT.
    There are alternative hypotheses for what causes PE that does not involve SBT at all.
One of the primary alternative hypotheses is neutral theory [Kimura, 1983, Kimura, 1991].
As discussed previously, neutral theory proposes that nearly all variation seen in nature is
neutral variation. If true, a natural population may drift aimlessly for many generations
before discovering a beneficial mutation. Thus, vast neutral landscapes can result in very long
periods of stasis in between the discovery of beneficial mutations. This alone only accounts
for the periods of stasis, and does not explain why beneficial mutations tend to cluster
together to result in rapid evolutionary change. Further, evolutionary computation often
displays punctuated equilibrium when evolving on fitness landscapes with far less neutrality,
or none at all. Thus, while neutrality certainly can play a role in the punctuated equilibrium
phenomenon, it cannot be the sole explanation.
                                               13


     Another hypothesis suggests that the evolutionary process exhibits ‘self-organized critical-
ity’ (SOC) [Bak and Boettcher, 1997]. In brief, SOC applies when a system in disequilibrium
moves towards equilibrium but stops at the ‘minimally stable state’ instead of progressing
to the ‘maximally stable state’ [Bak et al., 1987]. This leads to the system perching on
the edge of disequilibrium and thus being increasingly sensitive to further perturbations.
The canonical example is that of a sand pile: the addition of a grain of sand represents
may result in a perturbation from equilibrium for the entire pile, and the shifting of other
grains restores the pile to a minimally stable state7 . Sometimes more grains must shift
to accomplish equilibrium, and if the system moves beyond the minimal stable state (to a
more-than-minimally-stable state), there can be a prolonged state of stability where the pile
is temporarily less sensitive to the addition of more grains.
     The analogy with PE is easy to understand, since systems with SOC and evolution both
exhibit unpredictable and sudden large changes. However, in order to map SOC to evolution,
one must first explain what equilibrium the system is moving towards, what perturbations are
moving the system away from equilibrium, how these perturbations accumulate a potential for
change over time, what the minimally-stable state is, and why evolution stops there instead
of proceeding to a more stable state. Until these questions are answered, SOC remains a less
viable explanation for PE.
     We can amend the SOC hypothesis slightly by first observing that environmental factors
that may kick off evolutionary change can be self-organized. For example, earthquakes
[Turcotte et al., 1985, Smalley Jr. et al., 1985] and forest fires [Malamud et al., 1998],
among other things, are both thought to be self-organized. These kinds of events may
create environmental instability, that then leads to evolutionary adaptation. Since these
destabilizing events exhibit SOC, the adaptive response to them would result in the observed
PE without evolution itself exhibiting SOC.
     However, PE is observed in digital evolution all the time, and SOC does not appear in
   7
     The maximally stable state is no pile at all: all grains of sand in a ground state with zero instability.
                                                       14


 many of these systems. So, SOC cannot be the only explanation for PE, even if it can explain
 some spurts of evolutionary change. Shifting balance theory, in my opinion, remains the best
 contender to explain punctuated equilibrium.
 1.2.7      Genetic Recombination
 Hitherto now, I have avoided discussing processes by which two or more organisms can
 exchange genetic material with one another. Sexual reproduction and horizontal gene transfer
 are common examples of such an exchange. In evolutionary computation, it is common to
 use a simplified kind of sexual reproduction called crossover [Sastry et al., 2005]. Crossover
 produces an offspring whose genome is a combination of two or more parents’ genomes.
Although crossover is a crude analogy for sexual reproduction, I may nevertheless refer to
’sex’ in an abstract sense when talking about crossover or genetic recombination generally.
     When a population does not have the capability for genetic recombination, they are
 generally called asexual populations. Up to now, I have only discussed asexual populations.
Adding sex to an asexual population creates several notable changes to the evolutionary
 dynamics.
     First, sex allows the combination of two or more beneficial mutation that have arisen
 in separate lineages. This directly combats clonal interference by providing opportunities
 for competing benefits to come together and go to fixation instead of driving one another
 to extinction. Second, sex allows the combination of two or more deleterious mutations
 that have arisen in separate lineages. These doubly-deleterious individuals can be more
 easily purged by purifying selection, helping to slow or stop the spread of these mutations
 to future generations. Third, sex allows for the separation of beneficial mutations and
 deleterious mutations that have arisen in a single lineage. Offspring that receive only the
 beneficial mutation will outcompete organisms with the deleterious mutation, and offspring
 that receive only the deleterious mutation will be outcompeted by organisms that have the
 benefit. Like in the previous two cases, sex is assisting the beneficial mutations and hampering
 the deleterious mutations. All of these effects can be thought of as increasing the strength
                                                  15


 of selection [Kondrashov, 1988, Rice and Chippindale, 2001], and we have already discussed
 the implications of increasing the strength of selection in previous sections.
     Sex can also make exploration of the fitness landscape easier. Asexual populations traverse
 the fitness landscape through mutation alone, and valley-crossing is a major obstacle for
 asexual populations. Sex can join together mutations that might be difficult to join asexually.
 Consider the following example: Suppose we are evolving a population of organisms whose
 genomes are strings of four bits. The population begins with all organisms having genotype
’0000’. The genotype ’1111’ is the global optimum, having a higher fitness than all other
 genotypes. However, all other genotypes suffer a fitness penalty for each ’1’ in their genome;
 there are four single-1 genotypes, six double-1 genotypes, and four tripple-1 genotypes, and
 each group suffers a larger fitness penalty than the last. If we assume that at most one site
 in an organism’s genome can mutate in a single generation, then an asexual population must,
 at some point, have a tripple-1 genotype survive selection and produce a ’1111’ offspring to
 discover the global optimum. This may be an extraordinarily rare occurrence if the mutation
 rate is low and/or the fitness penalty for each ’1’ is severe. On the other hand, a sexual
 population could combine two genomes, each one a double-1 genotype, to produce a ’1111’.
 Under the same circumstances, this could be a far more likely occurrence than the asexual
 path to the same genotype. Thus, sex can enhance the population’s ability to explore the
 fitness landscape, and valley-cross.
     Finally, although recombination produces novel combinations of existing genetic variants,
 it cannot create new genetic variants in the sense that mutation does. In other words, a
 population that evolves with recombination but without mutations can explore the fitness
 landscape, but exploration is limited to creating combinations of the existing genetic variation.
As evolution progresses in such a system, genetic variation will be lost to selection, further
 limiting the combinations that can be produced. Without mutations to supply new variation,
 the population will eventually converge to a single genotype. The final result may not even be
 the optimal combination of the variation that was initially available in the population. Thus,
                                                16


sexual recombination is neither necessary, nor sufficient for evolution to continue indefinitely.
    Ultimately, sexual and asexual populations operate in largely the same way. Asexual
populations experience weaker selection, and suffer from clonal interference, but more easily
maintain genetic diversity. Sexual populations experience stronger selection and reduced
diversity, but can mix and match existing mutations to create novel combinations and explore
additional areas of the fitness landscape without waiting on the arrival of further mutations.
1.3       Research Objectives and Contributions
Many of the proposed causes of punctuated equilibrium are also phenomena I propose fit
under the umbrella of a general shifting balance theory. This observation leads us to the core
of my research: Can we leverage the natural phenomena, which are proposed to accelerate
natural evolution, to augment digital evolution and evolutionary computation?
    Since a general shifting balance theory predominantly involves phenomena that change
the selection strength of the system, that is where I have focused my efforts. As such, it will
be paramount to first understand selection in a more concrete way. How can we measure
the strength of selection? I believe a good measure of the strength of selection must come
from a good conceptualization of selection and its role in the evolutionary process. It also
remains to be seen if we can understand why each of the phenomena that shift the strength
of selection do so. Are these phenomena merely like changing the strength of selection or are
they actually changing the strength of selection, and what (if any) distinction is there to be
drawn between those two concepts? Answering these questions, again, comes down to first
creating a better understanding of what selection strength is to begin with.
    Many of the phenomenon that change the strength of selection are highly specific events
or circumstances. For example, the original shifting balance theory dealt only with isolated
populations and migration events. This mechanism is not expected to be a common occur-
rence, nor a primary driver of evolutionary change because of its rarity. Are there more
universal phenomena that create a shifting strength of selection? How can we detect these
                                               17


phenomena and rule out confounding factors? How plausible would these new hypothetical
phenomena be as explanations of punctuated equilibrium?
    Finally, how can we take answers to the questions above and turn them into solutions
for evolutionary computation? Which phenomena are easily converted to new algorithms or
best practices? What phenomena are difficult to leverage and why?
1.4      Research Methods
I use computational simulations to test both the development of new population genetics
theory and any potential contributions to computational algorithms. Wherever applicable,
existing population genetics theory is used to confirm or compare results generated by the
computational simulations.
    The MABE software package [Bohm et al., 2017] is one of the primary tools used to build
the evolutionary algorithms used in my work. Whenever an evolutionary algorithm with little
complexity was needed, I instead wrote a simple system in the python programming language.
All data analysis and visualization was performed using python and assorted libraries. The
details of each simulation are described in the corresponding chapters.
    My collaborators and I have made a number of novel contributions to the research methods
we use in the course of conducting the research. The contributions that appear in several
chapters in this dissertation are described below. Other contributions are discussed in more
detail in Chapter 6.
1.4.1      Fitness proportional selection without diminishing returns
We can take the standard fitness proportional selection algorithm (a.k.a. roulette wheel
selection or roulette selection), with probabilities of reproduction given by
                                                   wi
                                       P r(i) = PN        ,                             (1.1)
                                                   j  w j
                                               18


and we can modify it to avoid diminishing returns at large fitness values. This is accom-
plished by ensuring that the same change in fitness results in the same fitness advantage.
Mathematically, it is written as
                                                    bwi
                                       P r(i) = PN           ,                              (1.2)
                                                        b wj
                                                     j
where b is a parameter to tune the strength of selection. This formulation of roulette selection
no longer has diminishing returns, but it suffers from a computational problem, since bw can
easily reach the limit of what can be stored in most computing platforms. To fix this, we
can offset the fitness values, working instead with values relative to the maximum fitness,
                                                 bwi −wmax
                                    P r(i) = PN                .                            (1.3)
                                                      wj −wmax
                                                 j b
In this form, the magnitude of wi − wmax is bound by the range of fitness values wmin −
wmax , helping to keep the magnitude of bwi −wmax small enough to hold in computer memory.
Furthermore, the range of bwi is (−∞, ∞), while the range of bwi −wmax is (−∞, 1], ensuring
that any loss of precision occurs at the smallest values, which has no great impact on the
outcome of the simulations.
1.4.2      Saw-tooth fitness function
Measuring selection strength is hard (more about that in Chapter 2). In the absence of a
more sophisticated methodology, selection strength can be inferred by the relative ease or
difficulty of valley-crossing on a known fitness landscape. To this end, we have constructed
a novel ‘saw-tooth’ fitness function that endlessly repeats the same fitness valley.
    The saw-tooth fitness function requires organisms with a genome with a single locus
that encodes a number k. When the organism reproduces, this allele mutates to k + 1 with
             µ
probability  2
               or k −1 with probability µ2 , where µ is the probability to mutate. The saw-tooth
function, shown in Figure 1.2, can be written in terms of the genotype k as
                                      w(k) = Bn − Dm ,                                      (1.4)
                                                19


Figure 1.2: Diagram of the saw-tooth fitness function. The shape of the function is defined by
the valley width v, crossing benefit B, and mutation disadvantage D. The score of a genotype k is
calculated according to its position, given by n and m, multiplied by the benefit and disadvantage,
respectively.
                                          m = k (mod v) ,                                    (1.5)
and
                                                 k−m
                                            n=          ,                                    (1.6)
                                                   v
where B is the additive fitness benefit of discovering each peak, n is the number of peaks
to the right of the origin the genotype k has reached or passed, D is the additive fitness
disadvantage of each mutation into a fitness valley, m is the number of mutations into the
valley the genotype k has accumulated, and v is the width of the fitness valley, from peak
to peak. The modified roulette selection probabilities (Equation 1.3) can then be used to
ensure the selective forces are consistent everywhere in the saw-tooth fitness landscape.
    The saw-tooth function allows us to relate the average rate of adaptation (fitness gain)
to the average rate of valley-crossing, since each valley crossed awards the same fitness
benefit. Further, since we know that fitness valleys are more easily crossed when selection
is weaker, we can conceptually map the rate of valley-crossing to selection strength. These
results are also easily compared with the theoretical expectations for an Ornstein–Uhlenbeck
                                                 20


process [Uhlenbeck and Ornstein, 1930, Artime et al., 2018, Alili et al., 2005, Yi, 2010], by
looking at the runtime of the simulation and the average number of valleys crossed, i.e.,
                                                    tfinal
                                        t̄cross = B         .                               (1.7)
                                                    w̄final
1.5       Outline of Dissertation
The remaining chapters of this dissertation each explore a specific aspect of selection strength
and how its changes affect evolutionary dynamics. Chapter 2 delves into the nature of selection
itself, and explores the creation of a new metric of selection strength. The new metric of
selection strength highlights important aspects of selection that will inform future attempts
to improve evolutionary computation algorithms. Chapter 3 explores noisy fitness evaluation
as a natural mechanism for reducing selection strength. Fitness noise is compared and
contrasted with two other methods of controlling selection strength in an evolutionary system:
population size and tournament selection. Chapter 4 introduces a previously undocumented
evolutionary dynamic which affects the strength of selection of an evolving system. This newly
described dynamic, called the free-for-all effect, reduces selection strength on populations
during selective sweeps. The free-for-all effect also highlights that more than just weak
selection is needed for efficient exploration of a fitness landscape. Chapter 5 builds on the
discoveries of chapter 4 to develop a new method of augmenting evolutionary computation
algorithms, further emphasizing the need for more than just weak selection. Chapter 6,
the final chapter, summarizes the main results of each chapter and synthesizes a set of
best practices for evolutionary computation based on these findings. The final chapter also
includes a discussion of future research avenues and the benefits I believe are to be found by
exploring them.
                                                 21


1.6       Summary
Evolutionary computation is a powerful optimization tool, and an invaluable test bed for
population genetics. Although it is still debated how much evolutionary adaptation occurs
via valley-crossing, it is undeniable that valley-crossing can provide faster avenues to higher
fitness, or indeed the only avenue in some cases. Valley-crossing is fundamentally linked to
the balance between the forces of selection and variation, and as such, controlling this balance
is important for optimizing the efficiency of evolutionary algorithms. Nature, in contrast, is
not actively optimized for performance, and yet nature seems to overcome many evolutionary
obstacles that evolutionary algorithms do not. It is possible that nature benefits from a
highly dynamic balance between selection and variation, and this constant flux helps natural
populations avoid stagnation and overcome obstacles in the fitness landscape. Working with
this hypothesis in mind, I investigate the nature of selection and how natural phenomena
strengthen or weaken it. I will look at how to measure selection strength, explore how
selection strength changes over evolutionary time, and construct new evolutionary algorithms
and best practices based on my findings.
                                                22


Chapter 2
Quantifying the impact of selection
2.1       Introduction
The effectiveness of an evolving population at navigating its fitness landscape is determined
in large part by the strength of selection acting on it, and particularly so for deceptive fitness
landscapes [Oliveto et al., 2018]. Evolving populations are subject to an explore-exploit trade-
off [Wolpert and Macready, 1995,Wolpert and Macready, 1997,Ho and Pepyne, 2002,Millidge
et al., 2021] that governs, among other things, how capable the population is at navigating
the fitness landscape [Wright, 1932].
     Strong selection restricts exploration [Feller, 1951,Kimura, 1964,Iwasa et al., 2004a,Covert
et al., 2013, Oliveto et al., 2018, Lewis, 1962] and enhances exploitation by allowing only the
most fit organisms to reproduce. This increases the probability that a new beneficial mutant
will fix in the population, and also decreases the expected time to fixation, but reduces the
probability to discover a new benefit in the first place.
     Weak selection, on the other hand, allows many more less-fit organisms to reproduce,
enhancing exploration at the cost of exploitation. This increases the probability to discover
a new beneficial mutation, while reducing the probability to fix the mutation once it is
discovered, and increasing the time to fixation. The most adaptive populations are those
                                                  23


that happen to balance exploration and exploitation in a way that aligns with the (local)
topology of their fitness landscape [Ragusa and Bohm, 2021].
    In spite of its important role in evolutionary theory, selection strength is not a fundamental
(independent, or irreducible) property of evolving populations. Instead, it is an emergent
property affected by many aspects of the evolving population, organism, and environment. For
example, the following features all simultaneously affect the selection strength of an evolving
system in their own way: population size [Jain et al., 2011,Ochs and Desai, 2015,Rozen et al.,
2008, Ohta, 1972] and changing population sizes [Wright, 1982, Lynch and Conery, 2003],
noisy genotype-to-fitness actualizations [Wang and Zhang, 2011, Melbinger and Vergassola,
2015,Van Egeren et al., 2018], temporospatial population structures such as range expansions
[Slatkin and Excoffier, 2012, Peischl et al., 2013, Peischl and Excoffier, 2015, Peischl et al.,
2015, Gilbert et al., 2017] and population topology [Mühlenbein, 1991, Lieberman et al.,
2005,Mühlenbein, 2009,Askari and Samani, 2015,Möller et al., 2019,Tkadlec et al., 2019,Kuo
et al., 2021], mass-extinction events and bottlenecks [Lewis, 1962, Mathias and Ragusa,
2016, Engholdt and Mathias, 2021], sexual selection [Bohm et al., 2019], and many more.
While selection strength does not have a singular cause, quantifying the effective strength of
selection can be extremely useful for designing experiments that aim to measure the effect a
particular feature has on selection strength, understanding the present state of a population
of interest, or for comparing populations with one another.
    As one might expect, given the importance of selection in evolutionary theory, there is
no lack of methods that already exist for quantifying different aspects of selection strength.
These methods, as we will see, each have their drawbacks: some make too many assumptions
and are either unnecessarily restrictive or simply not appropriate in all cases, some make too
few assumptions and are consequently less meaningful, some have units that make comparing
populations difficult, and other measures can only be computed relative to another population,
which prohibits their use in some common cases.
    In this work, I aim to build a theory of selection strength from a set of general assumptions.
                                                 24


I provide intuition and justification for each of the assumptions before proceeding to formalize
them mathematically. I then present a new measure of selection strength, the selection impact
metric, and demonstrate its use on a number of digital evolution experiments.
2.2        Properties of selection strength
Selection creates a bias in the reproductive success of genotypes in a population. Here, bias
refers to a non-uniform probability of producing an offspring. Changes in the frequency of
a trait are the result of a correlation between the trait and the reproductive success of the
organism with that trait [Price et al., 1970,Price, 1972]. However, selection does not “intend”
to change trait frequencies; a changing trait frequency emerges from the bias in reproductive
success. Therefore, changes in trait frequencies, while correlated with selective forces, are
not where we should be looking to measure selection. Instead, we will only consider offspring
production when quantifying selection strength.
    If selection is a bias in reproductive success, then no bias means no selection. This
implies that a measure of selection strength should indicate zero in the case of neutral drift.
In addition to satisfying our common sense, identifying neutral drift with the absence of
selection has the additional benefit of defining an absolute zero for a metric of selection
strength. A population consisting of a single organism also has no bias in reproductive
success, since the probability of allocating offspring to among one individual is trivially
uniform. This informs us that selection is inherently a property of a population, so our
measure of selection strength will reflect this.
    It is important to acknowledge that the selection strength experienced by a population
can change over time. Indeed, the bias of reproductive success changes with each birth and
death. Ideally, a measure of selection strength would be able to resolve changes in selection
strength over time, as well as space.
    A metric of selection strength should allow for comparison between different species. If, for
instance, one wants to compare the selection pressure experienced by a population of beetles
                                                 25


to that experienced by a population of birds, the metric should facilitate such a comparison.
However, some metrics of selection strength, such as those based on the correlation between
an organism’s fitness and traits [Price et al., 1970, Price, 1972], are not capable of making
such comparisons as they are dependent on the units used for the traits and fitness [Lande
and Arnold, 1983, Houle, 1992, Hereford et al., 2004]1 . There are some ad-hoc solutions
that attempt to standardize units, but there is a lack of consensus on which solution to
use [Matsumura et al., 2012]. One way to avoid this problem is to use a unitless metric of
selection strength, that would allow for cross-species comparison of the selection pressure.
      The following list summarizes the desired properties of a selection strength metric.
      • (F) Selection strength is calculated from observing only the number of offspring that
        organisms produce.
      • (Z) Selection strength has an absolute zero: when the population experiences neutral
        drift.
      • (P) Selection strength describes a force felt by a population, a collection of more than
        one organism.
      • (C) Selection strength can be compared between two populations, regardless of their
        relationship to each other.
      • (U) Selection strength is a unitless quantity.
      I will reference these constraints in the following sections to both motivate the mathe-
matical development of a new metric, and to call attention to when the new metric satisfies
these requirements.
    1
      For example, in a population of beetles, a correlation may be found between the walking speed measured
in inches per second and the fitness measured in number of offspring. Similarly, in a population of birds, a
correlation may exist between wingspan measured in inches and number of offspring. However, it is uncertain
                                    [inches]           [inches]
how to compare the units of [second]×[offspring] and [offspring] .
                                                        26


2.3      The Selection Impact Metric
The core concept behind the selection impact metric is that selection creates a bias in the
distribution of offspring among the available parents in a population. Conversely, there is an
unbiased distribution of offspring that corresponds to neutral drift.
    We can represent a population’s offspring distribution as a probability mass function, f (n)
(defined in Section 2.5), that indicates what fraction of the population’s parents produced n
offspring during a particular period of observation.
    The unbiased drift distribution, fD (n) (defined in Section 2.4), is a theoretical offspring
distribution that represents the most likely distribution of offspring when the probability of
                          1
reproduction is exactly   N
                             for each organism in a population with N potential parents.
    With both f and fD in hand, we can measure how far from drift the population is and
define the selection impact as
                                       S(f ) = Dist(f, fD ) ,                              (2.1)
where Dist is a suitable distance function between two probability mass functions.
    Because the only data about the population required to compute the selection impact is
the distribution of offspring f , the definition of selection impact as Equation (2.1) satisfies
the constraint (F), that we only consider offspring counts. Further, by defining the selection
impact as a distance from fD , the constraint (Z), that the metric is zero in the case of
neutral drift, is also satisfied, since S(f ) = 0 implies f (n) = fD (n). The constraint (P),
that S describes the impact of selection on a population, is satisfied implicitly by the use of
probability mass functions (a population with N = 1 will always have f (n) = fD (n)).
    The remainder of Section 2.3 addresses the choice of distance function to be used in
Equation (2.1). Section 2.4 describes the derivation of fD from a minimal set of assumptions
about the population, and Section 2.5 describes how to construct f from an observation of
the population. Section 2.6 presents empirical data that verify the drift distributions, and
showcase some applications of S(f ). In Section 2.7, I discuss where the selection impact fits
                                                27


into the ecosystem of other selection strength measures and some future research directions.
2.3.1      Choosing a suitable distance measure
The ability to compare the selection impact of two observations, constraint (C), is satisfied if
Dist satisfies the triangle inequality. Likewise, the constraint (U), that the selection impact
is unitless, is satisfied if in turn Dist is unitless. Therefore, the choice of distance function is
limited to a set of unitless metrics that operate on probability mass functions.
    We have a choice between information theoretic measures of difference between probability
masses, like the Kullback-Leibler (KL) divergence [Kullback and Leibler, 1951], or a true
measure of distance between probability masses, like the Earth-Mover distance (EMD, a.k.a.
the Wasserstein distance or the Kantorovich–Rubinstein metric) [Kantorovich, 1960] or the
Cramér-von Mises distance (CMD) [Anderson, 1962]. We can rule out the KL-divergence
and similar information theoretic measures because they treat the discrete elements of the
distribution’s support as unordered symbols, instead of ordered values. For our application —
measuring the difference of two offspring distributions — we care about the distance between
offspring counts, not merely that two counts are different symbolically. A parent of two
offspring should be considered quite different from a parent of thirty-two, while considered
quite similar to a parent of three. Therefore, we will choose a proper distance measure over
an information theoretic measure.
    The choice between EMD and CMD is somewhat arbitrary, though the CMD has a
diminishing sensitivity to large differences along the support of the probability mass function.
Therefore, I have chosen to employ the EMD as the distance metric in Equation (2.1) for
the remainder of this report.
2.3.2      Earth Mover Distance
For two probability distributions P (x) and Q(x) with a common one-dimensional interval
support, χ = [x0 , ..., xn ], the EMD is given by the following recursive definition:
                                                   28


                                   ri+1 = P (xi ) + ri − Q(xi ) ,                           (2.2)
                                             r0 = 0 ,
                                                      Xn
                                     EMD(P, Q) =          |ri | .                           (2.3)
                                                      i=1
    If P (x) and Q(x) have different supports, then the EMD can be calculated using the
smallest interval that includes both, χ = [min(χP , χQ ), max(χP , χQ )].
2.4       Modeling the Offspring Distributions of Neutral
          Drift
In order to calculate the selection impact metric, it is necessary to have both the actual
offspring distribution of a population and a theoretical offspring distribution for a population
of the same size that is evolving under neutral drift. These drift reference distributions model
the most likely distribution of offspring among the parents in a neutrally drifting population.
    The neutral drift expectation depends on several key factors: the generational model (e.g.,
Wright-Fisher vs. Moran, Sections 2.4.1 & 2.4.2), population size (Sections 2.4.1.2 & 2.4.2.1),
and sexual recombination (Section 2.4.1.1).
2.4.1      Non-Overlapping Generations
In a Wright-Fisher process, time advances in discrete intervals called generations. A genera-
tion consists of a reproduction step and a replacement step. During the reproduction step, all
members of the population are given the opportunity to create an offspring. The probability
that a member of the population is selected as a parent is proportional to their fitness. Parents
are selected until a number of offspring equal to the population size are produced. The same
organism may be selected as a parent more than once during the reproduction step. Next,
in the replacement step, all members of the population are replaced by the offspring. Since
no parent ever exists in the population at the same time as their children, the generations
                                                 29


are called ’non-overlapping’.
     Let N denote the population size (e.g., N ∈ Z+ ). When the population experiences
                                                           1
neutral drift, each organism has a probability of          N
                                                             to be chosen to reproduce, and there are
N opportunities to be selected. We can model the probability that any organism has exactly
n offspring when the population drifts by a simple binomial probability distribution
                                               n                  N −n
                                              N       1              1
                              fD (n; N ) =                     1−             ,                        (2.4)
                                              n       N             N
where 0 ≤ n ≤ N and n ∈ Z+ . Because the population must be replenished with offspring
each generation, the number of trials is N . Hence, our binomial distribution is given by
fD (n; N ) and is parameterized only by N .
2.4.1.1      Sexual reproduction
Let p denote the number of parents needed to produce a single offspring. The most common
biological examples would be asexual (p = 1) and sexual (p = 2) offspring production,
however, p can be set arbitrarily high to accommodate other scenarios.
     When the population experiences neutral drift and each offspring is formed from p parents,
we can model the probability that any organism has exactly n offspring as follows:
                                                n                    pN −n
                                               pN        1              1
                            fD (n; N, p) =                         1−                                  (2.5)
                                                n       N              N
where 0 ≤ n ≤ pN and n ∈ Z. These modifications to Equation 2.4 take into account that the
                                                               1
probability to select from the initial population2 is          N
                                                                 , and that there are pN opportunities
to be selected, since we need p parents for each of the N offspring. These modifications can
be made in conjunction with those in Section 2.4.1.2.
   2
     Here I assume for mathematical simplicity that selection is with replacement. Although this implies that
one individual can sexually recombine with itself, the probability of such an occurrence diminishes if larger
populations are considered.
                                                     30


2.4.1.2      Changing population size
Let us now relax the assumption of constant population size.
     Let N (t) denote the initial population size at time t, and N (t+1) denote the population
size at time t + 1.
     When the population experiences neutral drift and the population size changes from N (t)
to N (t+1) , we can model the probability that any organism has exactly n offspring as
                                             (t+1)           n             N (t+1) −n
                             (t)   (t+1)     N             1                  1
                   fD (n; N , N          )=                             1 − (t)                        (2.6)
                                                n         N (t)             N
where 0 ≤ n ≤ N (t+1) and n ∈ Z. These modifications to Equation 2.4 take into account that
                                                                 1
the probability to select from the initial population is        N (t)
                                                                      , and there are N (t+1) opportunities
to be selected. These modifications can be made in conjunction with those in Section 2.4.1.1.
2.4.2        Overlapping Generations
When generations are allowed to overlap, the evolutionary dynamics are best described by
the Moran process. Time advances in a series of updates, where one organism gives birth,
and one organism dies3 . The probability to select an organism as a parent is proportional
                                                                                               1
to their fitness. The probability to select an organism to die is uniform (i.e.,               N
                                                                                                 ). Since a
parent can exist in the population at the same time as their descendants, the generations are
called ’overlapping’.
     Because of the coexisting generations, a model of neutral drift must track ancestor-
descendant relationships. To facilitate this ancestry tracking, we must label each organism
at time t = 0 with a unique integer i from the set 1, ..., N , which indicates the organism’s
clade. When an organism reproduces, their offspring inherit the clade identifier.
           (t)                                                                                       (t)
     Let ni denote the number of organisms in clade i at time t. Since the value of ni can
be any integer in {0, ..., N }, we can model the change of ni over time with a Markov chain
   3
     Here, I assume the birth-death model, where births happen before deaths. If death happens before birth,
the organism that dies cannot reproduce and the values of pa,b change accordingly.
                                                     31


with states n ∈ {0, ..., N }. When the population experiences neutral drift, each organism has
                   1
a probability of  N
                     to be chosen as a parent. Further, each organism always has a probability
    1
of  N
       to be selected for death. Since the same organism can be chosen for both events (and
each event is independent) there are four cases:
    1) An organism in clade i is selected for birth and an organism outside of clade i is
selected for death.
    2) An organism in clade i is selected for birth and an organism in clade i is selected for
death.
    3) An organism outside of clade i is selected for birth and an organism outside of clade i
is selected for death.
    4) An organism outside of clade i is selected for birth and an organism in clade i is
selected for death.
    Therefore, the Markov chain has a (N + 1) × (N + 1) transition matrix P = [pa,b ] where
pa,b is
                              
                                 n        n
                                          
                              
                              
                              
                                N
                                     1−  N
                                                    if a = n and b = n + 1
                              
                              
                              
                                 n 2          n 2
                                              
                              
                              
                                 N
                                     + 1−     N
                                                    if a = n and b = n
                       pa,b =                                                             (2.7)
                                     n    n
                                          
                              
                              
                              
                                1−  N   N
                                                    if a = n and b = n − 1
                              
                              
                              
                              
                              0
                                                   otherwise .
    While P naturally depends on the population size, I will suppress this dependence from
now on. Readers familiar with the Moran process will notice that Equation (2.7) descrives
the same probabilities as those for tracking allele frequencies under neutral drift. This is
because we are effectively assuming there are N distinct alleles (the clade labels), where the
allele frequency is synonymous with the clade size.
    In order to compute the drift distribution with this Markov chain, we must define the
population’s starting state and iterate the Markov chain the same number of times the
                                                 32


                                                                                (t)
 population has been updated. Let ⃗x(t) denote a vector in RN +1 , where xn represents the
 fraction of clades with size n at time t and ⃗x(0) denote the starting condition, determined by
 our initial labeling of each organism with a unique clade ID. For example, if N = 3, then
⃗x(0) = [0, 1, 0, 0], since 100% of the population has a clade of size n = 1 when t = 0.
     Let u denote the number of updates that have passed since the start of the observation.
The fraction of clades that become size n after u updates of drifting neutrally is given by the
 probability density function
                                          fD (n; N, u) = ⃗x(u) ,                            (2.8)
where ⃗x(t) is defined recursively as
                                            ⃗x(t) = ⃗x(t−1) P .                             (2.9)
 2.4.2.1     Changing population size
 Since Equation (2.7) is a function of N , any accounting for a changing population size
will require a new transition matrix each time N changes. However, if we only allow the
 population size to change by ±1 each update from t → t + 1, and we allow at most one birth
 and one death to occur on any particular update, we can limit the number of extra transition
 matrices needed to just two.
     If the population size has increased by one, then one of two events has occurred: either
 an organism in clade i has had an offspring, or an organism outside of clade i has had an
 offspring, but in either case no clade has experienced a death. Therefore, the (N +1)×(N +2)
 transition matrix is given by:
                                     
                                        n
                                         
                                     
                                     
                                     
                                       N
                                                   if a = n and b = n + 1 ,
                                     
                                     
                              pa,b =         n                                             (2.10)
                                               
                                      1−   N
                                                   if a = n and b = n ,
                                     
                                     
                                     
                                     
                                     0
                                                  otherwise .
                                                     33


    Similarly, if the population size decreases by one, we have a (N + 1) × N transition matrix
given by:                            
                                               n
                                                
                                     
                                     
                                     
                                        1−   N
                                                     if a = n and b = n ,
                                     
                                     
                           pa,b  =        n                                                (2.11)
                                            
                                         N
                                                     if a = n and b = n − 1 ,
                                     
                                     
                                     
                                     
                                     0
                                                    otherwise .
    In practice, finding the drift distribution for a population that changes size should be done
by modifying Equation (2.9) to no longer suppress the dependence of P on the population
size. For instance,
                                      ⃗x(t) = ⃗x(t−1) P(N (t−1) , N (t) ) ,                (2.12)
with the understanding that P is defined by Equation (2.7) when N (t−1) = N (t) , Equa-
tion (2.10) when N (t−1) + 1 = N (t) , and Equation (2.11) when N (t−1) − 1 = N (t) . This shows
that the particular sequence of births and deaths observed for a population will determine
the expected neutral-drift offspring distribution.
2.5      Counting Offspring and Descendants
This section explains the process of gathering the data required to apply the selection impact
metric. The method for counting offspring in the population is consistent across different
reproduction dynamics, unlike the drift reference distributions outlined in Section 2.4, which
can vary greatly.
2.5.1     Non-Overlapping Generations
In a Wright-Fisher process, all children are born simultaneously in time intervals called a
generation. With t denoting a particular generation, and t + 1 denoting the subsequent
                                (t,t+1)
generation, let Z (t,t+1) = [zi,j       ] be the N × (N + 1) offspring production matrix for the
                                                       34


                                                  (t,t+1)
t → t + 1 generation, where the element zi,j              is given by:
                               
                               
                               1 if agent i had j offspring during t → t + 1 ,
                               
                     (t,t+1)
                    zi,j     =                                                               (2.13)
                               
                               0 otherwise .
                               
     In order to be compared with a drift distribution, the offspring distribution for the t → t+1
generation, f (n, Z (t,t+1) ), must return the fraction of parents in the population that had n
offspring. This is obtained by taking the mean of the nth column of Z (t,t+1) as follows:
                                                             N
                                           (t,t+1)       1 X (t,t+1)
                                   f (n, Z         )=           z      ,                     (2.14)
                                                        N i=0 i,n
where N is inferred from the number of rows in Z.
     In each of the following sections, Equation (2.14) is used to convert the offspring production
matrix into a probability mass function.
2.5.1.1     Sexual reproduction
To account for sexual reproduction, each offspring that a parent contributes genetic material
to is recorded as an offspring for that parent. Therefore, the offspring production matrix
Z (t,t+1) becomes an N ×(pN +1) matrix, and each element zi,j is defined as in Equation (2.13).
2.5.1.2     Changing population size
To account for a population that changes size from N (t) to N (t+1) , the offspring production
matrix Z (t,t+1) becomes an N (t) × (N (t+1) + 1) matrix, and each element zi,j is defined as in
Equation (2.13).
2.5.2       Overlapping Generations
In a Moran process, one child is born, and one organism dies in a time interval called an
update.
     With t denoting time, t = 0 denoting the initial observation, and t = u the final observa-
                      (u)
tion, let Z (u) = [zi,j ] be the N × (N + 1) clade size matrix after u updates have occurred,
                                                      35


                      (u)
where the element zi,j is given by:
                       
                       
                       1 if clade i contains j living organisms at time t = u
                       
                 (u)
                zi,j =                                                                      (2.15)
                       
                       0 otherwise
                       
2.5.2.1    Changing population size
To account for a population that changes size from N (0) to N (u) , the offspring production
matrix Z (u) becomes an N (0) × (N (u) + 1) matrix, and each element zi,j is defined as in
Equation (2.15).
2.6      Empirical data
2.6.1      Drift model verification
We will first test the validity of the drift distributions described in Section 2.4 by observing a
population undergoing neutral drift and recording its offspring distribution (Equation (2.14)).
To this end, I simulate a population of 100 organisms that reproduce either according to
a non-overlapping generational model (Figure 2.1A and C) or an overlapping generational
model (Figure 2.1B). The population evolves asexually (Figure 2.1A and B) or sexually
(Figure 2.1C). Every organism has the same fitness to ensure neutral evolution. In all cases,
to compensate for the inherent variability among different measures of f (n) under the same
selection pressures, 5,000 replicate distributions are averaged into one offspring distribution,
fˆ(n). Since fD (n) represents the maximum-likelihood drift distribution, fˆ(n) and fD (n)
should be identical distributions.
    Each panel of Figure 2.1 shows the drift distribution fD (n; N ) (Equation (2.4)) and
the average offspring distribution fˆ(n) (Equation (2.14)) with a 99% confidence interval.
Since our simulated population is drifting neutrally, we expect to see agreement between the
theoretical reference and the observation, and we do. We can observe that the differences in
reproduction model and number of parents change the shape of the drift offspring distribution.
                                                  36


                                                                                                              fD(n)
                        Fraction of Parents with Count
                                                           10   2   A                                         f(n)
                                                                                                              99% C.I.
                                                           10   4
                                                           10   6
                                                                    0       2         4          6       8           10
                                                                                Parent's Offspring Count
                        Fraction of Ancestors with Count
                                                                                                              fD(n)
                                                           10   1
                                                                    B                                         f(n)
                                                                                                              99% C.I.
                                                           10   2
                                                           10   3
                                                           10   4
                                                           10   5
                                                                    0   2       4       6     8      10         12
                                                                            Ancestor's Descendant Count
                                                                                                              fD(n)
                        Fraction of Parents with Count
                                                           10   2   C                                         f(n)
                                                                                                              99% C.I.
                                                           10   4
                                                           10   6
                                                                    0   2         4       6       8      10     12
                                                                                Parent's Offspring Count
Figure 2.1: Average of 5,000 replicate offspring distributions fˆ(n) for an in silico population of
100 organisms, drifting neutrally (red dots). 99% confidence interval of fˆ(n) (red shaded area).
Maximum-likelihood drift distribution fD (n) (black line). A) Wright-Fisher process with asexual
reproduction (p = 1). B) Moran process with asexual reproduction (p = 1). C) Wright-Fisher
process with sexual reproduction (p = 2).
2.6.2     Measuring the selection impact over time
In this section, I show how the selection impact metric can be used to track population
dynamics over time. To do this, I collect offspring production data from a simulated popu-
lation of 10,000 organisms, evolving with asexual non-overlapping generations. As opposed
to the neutral control experiments I discussed in the previous section, here I will study the
adaptation of a population to a fitness function with deceptive fitness valleys (see Section 2.8,
                                                                                      37


Figure 2.5). The deceptive nature of the fitness landscape provides us with the opportunity
to collect data of a population undergoing stabilizing selection when trapped on a local
optimum, as well as directional selection once a valley is crossed.
    Figure 2.2 shows a single evolutionary history of a simulated population of 10,000 individ-
uals using a mutation rate of µ = 0.005 on the saw-tooth fitness landscape (see Section 2.8.1,
Figure 2.5). Panel A shows both the average population fitness (blue line) and the high-
est fitness in the population (red line) for each generation. We observe periods of stasis
where no change of fitness occurs, as well as periods of adaptation where beneficial muta-
tions are discovered and fixed. Panel B shows the selection impact for each generation as
a function of time. Recall that the selection impact, S(f ) = EMD(f, fD ), is computed as
the earth-mover distance between the population’s offspring distribution f (n, Z) and the
drift offspring distribution for that population, fD (n; N ). The selection impact is small but
non-zero (S(f ) ≈ 0.003) during periods of stasis. The stasis periods correspond to stretches
of time when the population has converged onto a local optimum, so stabilizing selection
is acting to remove deleterious mutations in those periods. Conversely, during periods of
adaptation, the selection impact increases sharply (S(f ) ≈ 0.05) as the population enters a
selective sweep (directional selection) following the discovery of a beneficial mutation. The
selection impact remains elevated only as long as the beneficial mutation is fixing. Once
the population converges on a new local optimum, the selection impact returns to the lower
value.
2.6.3      Effect of population size on the selection impact
Since population size is one of the main drivers of selection strength, it is important to study
the dependence of the selection impact metric as population size changes.
    In order to measure how population size affects the selection impact, I would like to test
the metric on populations experiencing drift, directional selection, and stabilizing selection.
I simulate the evolution of a population of organisms on five different fitness functions. The
first fitness function, w(x) = 1, assigns all organisms the same fitness, and represents the
                                               38


  A         15   Average Fitness
                 Max Fitness
            10
  Fitness
            5
            0
                  0                500      1000                 1500     2000             2500
  B     0.05
        0.04
        0.03
 S(f)
        0.02
        0.01
        0.00
                  0                500      1000                 1500     2000             2500
                                                   Generations
Figure 2.2: Fitness and selection impact of a population evolving via a Wright-Fisher process. A)
The black line represents the average fitness and the red line represents the maximum fitness of
each generation. The population is navigating a deceptive fitness landscape (details provided in
Section 2.8.1). B) The selection impact metric of each generation of the same population.
neutral control. The functions w(x) = x and w(x) = x3 describe landscapes with less and
more extreme gradients, respectively. The final two functions, w(x) = −|x| and w(x) = −x2 ,
have a global optimum, though the fitness cost of a deleterious mutation is more extreme in
the latter. For a visual representation of these functions, see Figure 2.5.
        Figure 2.3 shows the average selection impact of a population when evolved on each of
the five fitness function and with each of ten population sizes between 10 and 10,000. In
all cases, the mutation rate is µ = 0.125 (see Section 2.8). As the population size increases
along the x-axis, the average selection impact of the population on the neutral landscape
decreases rapidly towards zero. For all other fitness functions, the average selection impact
converges to a non-zero constant.
        Figure 2.3 shows that the selection impact for the hill-climb functions increases slightly
as the population size increases, while the impact on the two functions with local optima
decrease slightly. These small changes before convergence tell us something about the state
of the population relative to the fitness landscape. The increase of the selection impact with
population size (small circles and triangles) indicates that selection was acting inefficiently due
to the small population. As the population size increases, more mutations become available
in a single generation, which benefits hill-climbing. Conversely, a decreasing selection impact
                                                   39


                                   100
                S(f) = EMD(fD,f)
                                   10   1
                                            w(x) = 1
                                            w(x) = |x|
                                            w(x) = x
                                            w(x) = x2
                                   10   2   w(x) = x3
                                            99% Confidence interval
                                                     101                102             103   104
                                                                      Population Size
Figure 2.3: The effect of population size on the measured selection impact of a population
experiencing neutral drift (large circles), stabilizing selection (stars, crosses), and directional selection
(triangles, small circles). Each data point corresponds to the mean of 5,000 generations of evolution
on the respective fitness function. 99% confidence intervals are shown as a red-shaded area.
with increasing population size (stars and crosses) on fitness peaks suggests that small-
population-size effects are creating a larger bias in the offspring distribution than would
ordinarily occur in a larger population under the same selective pressures. That is to say,
random fluctuations in offspring counts have a meaningful impact on small populations, a
phenomenon that is already well known [Kimura and Ohta, 1969].
2.6.4      Mutation rate affects the selection impact
Because mutation rate is another factor known to affect the adaptation of evolving populations,
I simulate the evolution of 500 organisms at different mutation rates on each of the five fitness
functions listed above (described in Section 2.8).
    Figure 2.4 shows the average selection impact of a population when evolved with mutation
rates between 10−5 and one per organism per generation. When the mutation rate is very low,
the selection impact is also low due to the lack of persistent non-neutral phenotypic variance;
the population is effectively homogeneous most of the time, so selection cannot differentiate
between individuals until mutations arise. As the mutation rate increases, the inflow of
                                                                       40


beneficial and deleterious mutations also increases. The increased inflow of mutations creates
more contrast between individuals, allowing selection to act more effectively, as is reflected
in the increasing selection impact.
                                                   w(x) = 1
                                                   w(x) = |x|
                                 100               w(x) = x
                                                   w(x) = x2
                                                   w(x) = x3
                                                   99% Confidence interval
              S(f) = EMD(fD,f)
                                 10   1
                                          10   5            10   4           10   3           10   2   10   1   100
                                                                                  Mutation Rate
Figure 2.4: Effect of mutation rate on the measured selection impact of a population of 500
organisms experiencing neutral drift (large circles), stabilizing selection (stars and crosses), and
directional selection (triangles and small circles). Each selection impact shown is a mean of 5,000
generations under the respective fitness function. Each 99% confidence interval is shown as a red-
shaded area.
   Figure 2.4 shows that the selection impact of populations on local-optima functions
begin to increase only in response to relatively high mutation rates. At low mutation rates,
selection keeps the sub-population of suboptimal mutants small, and prevents the emergence
of organisms with multiple mutations. But, as we increase the mutation rate, a larger
proportion of parent organisms will carry a deleterious mutation, and thus children with
multiple mutations become more common too. The increased disparity between the highest-
and lowest-fitness organisms explains the increased selection impact. The selection impact
rises as the distribution of offspring becomes increasingly biased in favor of organisms closer
to the optimum.
   Conversely, Figure 2.4 shows that the selection impact of populations on a hill-climb
function will increase in response to lower mutation rates. The hill-climb functions cannot be
                                                                                  41


optimized, in the sense that there is no optimal genotype. As a result, a genotype with even
higher fitness can always be discovered. The mutation rate controls the discovery rate of new
beneficial mutations, so as the mutation rate increases, we observe an increased selection
impact that results from the increased frequency of selective sweeps. However, Figure 2.4
shows a critical mutation rate where the selection impact begins to trend downward. This
reversal signals a change in the evolutionary dynamics. At sufficiently high mutation rates,
there may be many organisms that have the highest fitness (or nearly the highest), but are
not very competitive with one another (i.e., clonal interference) [Desai and Fisher, 2007]. The
clonal interference results in a more uniform allocation of offspring among those top-ranking
individuals, and thus a lower selection impact.
2.7       Discussion
Beginning with a set of goals, outlined in Section 2.2, I have developed a metric for the impact
that selection has on a population. The metric is calculated by comparing what we expect to
occur when a population undergoes neutral drift with observations of evolution. Neutral drift
is well understood and can be accurately characterized mathematically, and this allows us to
construct a robust mathematical model of drift. I have shown that this mathematical model
of drift correctly predicts the offspring distribution of populations drifting neutrally. I have
further shown that the magnitude of the selection impact metric is sensitive to the occurrence
of selective sweeps and stabilizing selection. I have provided resources on how to calculate the
selection impact for a variety of circumstances, including overlapping and non-overlapping
generations, asexual and sexual reproduction, constant and changing population sizes, as
well as within and between sub-populations. The selection impact metric satisfies each of
the requirements discussed in Section 2.2, and provides a novel way of quantifying the way
selection acts on a population.
                                                42


2.7.1      Comparing the selection impact to other measures of selec-
           tion
The selection impact metric was designed from the start to only consider the offspring
distribution of a population. This not only ensures the metric is looking at the tell-tale
signature of selection acting on a population, but it also ensures that no assumptions are
made about the mode or form of selection.
    The mode of selection may be thought of as a kind of selection algorithm that actually
makes the decision about which organisms reproduce and how many offspring they have. For
example, truncation selection is a mode of selection where some percentage of the population
dies, with those organisms that are the least fit dying first, and the surviving organisms
reproduce with equal probability. Measures of selection strength have been formulated
with this mode of selection as a foundational assumption [Haldane, 1954, Van Valen, 1965,
Van Valen, 1967]. However, the measures of selection strength that assume truncation
selection have been shown to under-estimate the selection strength [O’Donald, 1968]. The
selection impact metric does not assume any particular mode of selection, it merely examines
the after-effect of selection, the distribution of offspring.
    Other measures of selection strength assume a particular relationship between fitness and
traits. In particular, [O’Donald, 1968] assumes a quadratic relationship, w(x) ∝ (θ − x)2 ,
where θ is the optimal phenotype. While certainly this approximation will hold in some
cases, it cannot hold in all cases, and it cannot hold when the population is in the midst of
selective sweep. My selection impact metric is not limited by any such assumptions.
    Like my selection impact, the Price equation [Price et al., 1970, Price, 1972] also does
not assume a particular relationship between traits and fitness. However, the Price equation
and extensions to it [Lande and Arnold, 1983] have received a number of criticisms that
cannot go unmentioned. The ability to measure selection strength with the Price equation
hinges on performing regression on the relationship between fitness and traits. However, this
results in a relationship with units, namely those of the trait and of the fitness measure, and
                                                43


comparisons between values with conflicting units are nonsensical [Houle, 1992,Hereford et al.,
2004, Matsumura et al., 2012]. Although ad hoc mathematical corrections are possible, there
is debate about which is the most appropriate correction to apply, if any at all. My selection
impact metric, in contrast, is unitless, and so any two measurements can be compared without
the need to correct for units.
2.7.2      Interpretation of the selection impact
I have referred to the metric introduced in this report as the ’selection impact’ (as opposed
to selection strength) because it illustrates the extent to which selection is actually affecting
the population. ’Selection strength’ can be thought of as a potential for adaptation. The
selection impact does not necessarily indicate the power of this potential, but rather the
effect it has on the population.
    In particular, the selection impact is a function of the movement of the population through
the fitness landscape, irrespective of where or how the population is moving. Further, the
selection impact reflects how selection moves the organisms in the population, as opposed
to the population’s center of mass. This is why in the case of stabilizing selection, we see a
non-zero selection impact even though the population is in equilibrium (e.g., Figure 2.2).
    If there is no potential for selection (i.e., no selection strength) or there is no non-neutral
phenotypic variation in the population, neutral drift occurs. Therefore, the selection impact
is proportional to the product of these two qualities, S(f ) ∝ (Pop.Var.) × (Sel.Str.).
2.7.3      On the Practicality of the selection impact metric
Calculating the selection impact requires detailed information about the population (each
individual’s offspring production) that is often difficult to obtain. If generations overlap, we
further have to keep track of each organism’s ancestry. This level of detail is manageable for
computer simulations, but rarely for field work.
    Another limitation of the selection impact metric is that it cannot connect selection to
specific traits, unlike other measures of selection in the literature [Haldane, 1954, Van Valen,
                                                  44


1965, O’Donald, 1968, Price et al., 1970, Lande and Arnold, 1983]. However, both of these
limitations can be addressed by using a drift reference distribution that models the expected
change in a trait frequency distribution under neutral drift. This would allow for the cal-
culation of the selection impact by comparing the expectation for a drifting trait with an
observation, eliminating the need to individually identify organisms or track their lineages.
    If x represents a quantitative trait and p(x, t) is a probability mass function showing
the frequency distribution of trait values in the population at time t, we can model neutral
drift by studying how p(x, t) changes when mutations are applied to organisms with uniform
probability. If µ(x, t|y) is a probability mass function that represents the measure of a trait
with initial value y that was mutated at time t, then the expected distribution of traits at
t + 1 is given by
                                                X∞
                               pD (x, t + 1) =       pD (y, t)µ(x, t|y).                 (2.16)
                                               y=−∞
If the shape of µ does not depend on y, then pD is the discrete convolution of p and µ
                                               X∞
                             pD (x, t + 1) =        pD (y, t)µ(x − y, t).                (2.17)
                                              y=−∞
    Comparing an actual observation of p(x, t + 1) with pD (x, t + 1) using the methods
described in Section 2.3 will produce a measure of the selection impact based only on the
distribution of traits observed in the population.
    Another possible trait-based formulation of selection impact could leverage the Fokker-
Planck equation [Risken, 1989],
                                                            D ∂ 2p
                                                     
                                          ∂    ∂U ∂p
                                  ṗ = −                 +          ,                    (2.18)
                                          ∂x   ∂x ∂x         2 ∂x2
where F = − ∂U ∂x
                  is the selective force acting on the trait, and D is the diffusion (mutation)
coefficient. The FP equation models time evolution of a probability density function p(x, t).
The function p can represent the fraction of individuals with phenotype x in the population
                                                  45


at time t. As discussed in the previous section, selection is analogous to a force generating a
                                             ∂U
potential for movement. Here, the force      ∂x
                                                comes from the gradient of the fitness landscape
U (x, t). If U is the neutral landscape, there is no force. If there is no variance p(x) = δ(x),
then the force has nothing to act on. Variance is supplied at a constant rate D.
    The FP equation can simulate neutral drift by initializing p(x, tinitial ) with an observation
of a population. Substituting U for the neutral landscape and evolving the equation in time
gives us p(x, tfinal ), a prediction of the phenotype distribution after drifting neutrally. This
can be compared with a second observation of the population at tfinal using the earth-mover
distance. The result is, in spirit, the selection impact since it is the difference between what
actually occurred and the expected outcome of neutral drift.
2.7.4       Concluding remarks
The selection impact metric offers a new way of quantifying selection over both time and
space. Since the selection impact represents the actual effect of the selection potential acting
on the population, it allows us to see evolving populations as they really are. In addition,
the selection impact shows promise for a number of additional applications.
    In the field of evolutionary optimization, tuning an optimization algorithm to maximize
the adaptation of the population is one of the hardest challenges to overcome. Using the
selection impact as a diagnostic aid for tuning optimization algorithms gives direct feedback
on the actual impact of the changes made to the algorithm. Sections 2.6.3, 2.6.4, and 2.7.2
provide a glimpse of how this is possible. Algorithm tuning can even be done automatically
in response to the performance of the population. Even very simple methods of automatic
parameter tuning have been shown to greatly improve optimization algorithms [Kramer,
2010].
    The selection impact might also be used to detect the current adaptive mode of some
populations. For example, whether the population is experiencing directional selection,
stabilizing selection, or neutral drift can be seen at a glance when the selection impact of
these three modes result in sufficiently different behavior (Section 2.6.2). This primarily
                                                 46


applies at large population sizes, since small populations experience these modes as more
similar than different (see Figure 2.3).
    Given the central role of selection in evolutionary theory, and the complex web of phenom-
ena that influence the strength and character of selection, a measure of selection is needed
that can work in any context. The selection impact metric is built on a theoretical foundation
that makes no assumptions about the mode of selection, or the many mechanisms that change
the strength of selection. As such, the selection impact can work in nearly any context, and
offers all evolutionary scientists, biological or computational, a universal language to describe
the strength of selection acting on a population.
2.8        Methods
The empirical data presented in Section 2.6 was collected using a simple evolutionary algo-
rithm. First, a population of digital organisms is initialized. Population sizes vary and are
specified in Section 2.6. Each organism has a genome given by a single integer, initially set to
zero. When an organism reproduces, the genome may mutate, changing the integer value by
+1 with probability µ/2, by −1 with probability µ/2, or leaving it unchanged with probability
1 − µ. Mutation rates vary and are specified in Section 2.6. Each organism is assigned a
score according to one of the 6 fitness functions described below. Once a score is assigned
to each organism, parents are chosen for reproduction with a roulette-wheel (fitness propor-
tional) selection algorithm. The generational model (i.e., overlapping versus non-overlapping
generations) and the number of parents required for reproduction (i.e., sexual or asexual)
vary and are specified in Section 2.6.
2.8.1       Fitness functions
The following fitness functions are used in this work, where x is the organism’s integer genomic
state: w(x) = 1 for the neutral landscape, w(x) = x or w(x) = x3 for a non-deceptive hill-
climb landscape, w(x) = −|x| or w(x) = −x2 for a landscape with an inescapable global
optimum, and w(x) = Sawtooth(x) for a deceptive hill-climb landscape. Figure 2.5 plots
                                                  47


these fitness functions for a subset of genomic states. The saw-tooth fitness function can be
represented by4
                                                                              
                                                      x                        d     d
                                       Sawtooth(x) =                      r+d+     −x .                                 (2.19)
                                                     w+1                       w     w
    The w parameter is the width of the valley:P the number of deleterious mutations to the
right of a local optimum before encountering a beneficial mutation. d is the depth of the
valley: the amount subtracted from an organism’s score when it is at the right-most position
in a fitness valley. All other points in the valley are interpolated between the fitness peak to
the left and the bottom of the valley. r is the rise in score from one local optimum to the next.
The first term of Equation (2.19) represents the cumulative advantage for crossing each valley,
while the second term represents the cumulative disadvantage of crossing each fitness valley.
Note that advantages increase as a step function, while disadvantages increase continuously.
The sum of these two terms result in the saw-tooth shape of the fitness function.
                                                                    w(x) = x                 6       Sawtooth(x)
                                   0                    20          w(x) = x3                5
                Organism's Score
                                   2                    10                                   4
                                                                                             3
                                   4                    0
                                                                                             2
                                   6       w(x) = 1     10                                   1
                                           w(x) = |x|   20                                   0
                                   8       w(x) = x2
                                                                                             1
                                       2     0      2              2      0       2              0   5     10      15
                                                             Organism's Genomeic State (x)
Figure 2.5: Six fitness functions: five polynomial fitness functions and one saw-tooth function.
The saw-tooth is shown as it was configured for Section 2.6.2: d = 1, w = 2, and r = 1.
   4
     The floor function, ⌊·⌋, is a function that returns the nearest whole number, less-than or equal-to the
input.
                                                                      48


Chapter 3
Noisy fitness and selection strength
This chapter is based on the article “Connections between noisy fitness and selection strength”
[Ragusa and Bohm, 2021].
3.1       Introduction
Selection is perhaps the most central concept in evolutionary theory. Selection can be
described as a filtering process which changes a population over time with regard to the
result of some evaluation (i.e., a fitness function). The filter can range from fully permissive,
where the result of the evaluation is not considered, to maximally restrictive, where only the
organism(s) with the best evaluation pass through. These two extreme cases are referred to
as drift and elitism, respectively. All other ’strengths of selection’ fall somewhere in between
drift and elitism.
3.1.1      Background
We are interested in understanding the relationship between parameters of evolution and
rates of adaptation. Here, we limit our investigation to the relationship between selection
strength and rates of adaptation in the context of different fitness landscapes.
    In evolutionary theory, the concept of a “fitness landscape” is the relationship between
genotypes and fitness [Wright, 1932]. A position in a fitness landscape surrounded by regions
                                                49


of lower fitness is called a “local optimum”. If a population is positioned on a local optimum,
then it must cross a region of lower fitness — called a “fitness valley” — in order to escape
the local optima and potentially ascend to higher fitness.
    Although a population does not always require a loss in fitness to cross a fitness valley,
[Iwasa et al., 2004a] deleterious mutations are often necessary to explain valley-crossing
events. For example, [Covert et al., 2013] disallow deleterious mutations and observe that
the adaptability of the system is hindered due to the inability to valley-cross. Similarly, it
has been shown that elitism hinders adaptation on rugged fitness landscapes [Oliveto et al.,
2018].
    In this work, we consider three specific evolutionary parameters that each effect selection
strength: population size, noisy evaluation, and tournament size (a parameter specific
to our model).
3.1.1.1     Population Size
Population size relates to selection strength in that selection in larger populations results in
less sampling error because of the grater number of sampling events when choosing parents.
Small populations experience drift more readily and so experience less selection strength,
while large populations are very sensitive to small fluctuations in fitness and experience
strong selection.
    Adjusting the population size in an evolutionary model, therefore, is a rather simple
way to adjust the selection strength. While this convention is ubiquitous in evolutionary
modeling, we believe it is somewhat flawed. Our concern with this approach is that altering
population size changes more than selection strength. Increasing the population size increases
mutation inflow and the standing variation of genotypes in the population. The increase in
standing variation is a natural consequence of the diffusion of genotypes due to mutational
inflow [Feller, 1951, Kimura, 1964].
                                               50


3.1.1.2    Noisy Evaluation
Some recent work has shown that noisy fitness, for example due to environmental noise, can
reduce the strength of selection. For example, [Wang and Zhang, 2011] and [Melbinger and
Vergassola, 2015] both see an increase in drift-like behavior when fitness is noisy and the
selection algorithm has difficulty comparing organisms accurately. Both publications describe
the change in behavior as being similar to a reduction in the effective population size.
    A third study by [Van Egeren et al., 2018] demonstrates how noisy fitness can also
benefit adaptation. By creating new opportunities for deleterious mutations to hitchhike,
the stochastic increases in fitness actually help populations cross fitness valleys that would
have otherwise been impossible to cross. They also see an increased capability for valley
crossing when the noise is increased. It is worth noting that these three studies do not
conflict; The drift-like behavior, which disrupts the effectiveness of selection, is precisely how
the deleterious hitchhikers survive to cross valleys.
3.1.1.3    Tournament Size
Tournament selection is a selection algorithm commonly used in computational models of
evolution. Regarding our work, it has several benefits, but the most relevant is that it has
an adjustable strength of selection. The strength of selection is adjusted by the ‘tournament
size’ parameter, which controls how many organisms in the population compete with each
other for each opportunity to reproduce. The size parameter can be set between drift and
elitism, which makes it an ideal dial of selection strength.
3.1.2      Summary of our work
In this work we perform a detailed assay on the relationship between population size, noisy
phenotype evaluation, and tournament size, and their effects on rates of genomic change. We
run our model on nearly 4,500 different scenarios.
    We observe evolution on a smooth fitness landscape as well as nine deceptive landscapes
using our model. We show that for the smooth landscape it is always best to have strong
                                               51


selection with noise-free fitness and a large population. For deceptive landscapes, there
is an optimum configuration of tournament size and noise that balances exploration and
exploitation. Population size, on the other hand, always increases genomic change when
larger, because it not only increases selection strength but also maximizes mutational inflow
and standing variation. We see that while these parameters for selection strength have similar
effects, they each behave in unique ways. Finally, we suggest that evaluation noise is a better
proxy for selection strength than the other two methods.
3.2       Methods
For this work, we implemented our evolutionary simulation using the MABE software [Bohm
et al., 2017]. In order to understand the relationship between noise, tournament size and,
population size, with rates of adaptation, we ran an array of nearly 4,500 scenarios. Each
scenario combines a tournament size with a score noise and a population size. Each combi-
nation of parameters is run on 10 fitness functions, one non-deceptive landscape and nine
deceptive landscapes. All nine deceptive fitness functions contain fitness valleys which must
be crossed in order to reach a higher score; The functions differ in the depth and width of
their fitness valleys. We used tournament selection for all scenarios.
3.2.1      Organism Definition
Our evolution model is an agent-based simulation where each generation of digital organisms
are evaluated and selected for reproduction. Unlike more complicated digital organisms [Ofria
and Wilke, 2004], we have simplified the concept considerably in order to remove as many
confounding interactions from our model as possible.
    In this work, the organism has a genome that is represented by a vector of 500 integer
values, g = [g1 , g2 , ..., g500 ]. This genome is used to give each organism a score, which is used
during selection and is inherited during reproduction. Each site in the genome may mutate
during reproduction, with probability µ = 0.0005 per site. When a mutation does occur at
some site gi in the genome vector, the value is mutated following the rule gi′ = gi ± 1 where
                                                     52


the offset +1 or −1 is decided randomly with equal probability.
3.2.2       Fitness Functions
An organism’s score is determined based on the organism’s genome vector g. First a sum of
all sites in the genome, x, is computed:
                                                 500
                                                 X
                                            x=       gi                                      (3.1)
                                                 i=1
     Note that since all gi are integers, the sum x is also an integer.
     The sum x is then passed through a function which maps specific values of x to a score.
     We use ten fitness functions in this work. The first is a simple hill-climbing function
w(x) = x. This function allows us to observe the dynamics of the system with no deception
(i.e., no fitness valleys).
     The other nine functions all include deceptive fitness valleys. There is a default fitness
function, and eight alternate functions which modify the topology of the default function in
order to explore how valley depth and width change the outcomes of the simulation.
     All the deceptive fitness functions are periodic, meaning after crossing any fitness valley
there will be another exactly like it, indefinitely so. However, the score at each of the local
optima is 5 higher than the score of the previous (increasing with x) which incentivizes the
population to continuously climb to ever higher peaks, similar to the simple hill-climbing
function.
     Due to the repeating nature of these functions, they can be defined by just one valley.
Each function begins with a plateau one mutation across, creating two positions of equal
fitness. Following that, as x increases, the score drops until finally jumping up to a height
greater than the starting position. The difference in score between the starting position and
the lowest point in the valley we call “valley depth”. The length, in x, of the valley, including
the plateau, we call “valley width”. Valley depth and width are varied between the nine
deceptive functions. The default function has a width and depth of 6. Shallow versions of
                                                53


the function have depth 3 while deep versions have depth 9. Similarly, narrow versions of
the function have width 3 while wide versions have width 12. Figure 3.1 provides a visual
representation of each of the nine deceptive fitness functions.
                                 narrow shallow                 shallow                wide shallow
                             valley width: 3           valley width: 6           valley width: 12
                        30   valley depth: 3           valley depth: 3           valley depth: 3
                        20
                score
                        10
                        0
                 −10
                                      narrow                    default                     wide
                             valley width: 3           valley width: 6           valley width: 12
                        30   valley depth: 6           valley depth: 6           valley depth: 6
                        20
                score
                        10
                        0
                 −10
                                   narrow deep                   deep                   wide deep
                             valley width: 3           valley width: 6           valley width: 12
                        30   valley depth: 9           valley depth: 9           valley depth: 9
                        20
                score
                        10
                        0
                 −10
                             0       10      20   30   0       10      20   30   0       10      20   30
                                      genotype                  genotype                  genotype
Figure 3.1: A visual representation of the 9 deceptive fitness landscapes we use throughout this
work. The functions are arranged in a 3 × 3 grid such that the depth of the fitness valley increases
from top to bottom and the length (in mutational distance) of each valley increases from left to
right. The names and dimensions of each fitness function are displayed above each plot.
3.2.3     Real-Valued Tournament Selection
In this work, we use tournament selection. The tournament selection algorithm is typically
defined as shown in Algorithm 1 where T , the integer variable that determines the tour-
nament size (T ∈ Z), is restricted such that T ≥ 1. Here, we wish to consider very weak
selection strength. In order to achieve this, we extended the tournament algorithm to include
tournament sizes that are between 2 and 1. Therefore, we define Algorithm 2 as a means of
defining and interpreting real-valued tournament sizes.
                                                               54


    In Algorithm 2, T is still restricted such that T ≥ 1, but is now a real value (T ∈ R).
Algorithm 2 amends Algorithm 1 such that a tournament of size ⌊T ⌋ is always conducted,
but an extra challenger is drawn with probability T − ⌊T ⌋ resulting in an average tournament
size of T . Note that every tournament conducted by Algorithm 2 is composed entirely of
integer-sized tournaments.
    The idea of fractional tournament size is, as far as we are aware, a new concept. It may
be challenging to conceive of the meaning of a tournament size of 1.1. The way to think of
T = 1.1 is to see it as tournament 1 (random selection) 90% of the time and tournament
2 10% of the time. So, given a population of size 100, this would mean that 90% of the
next generation would be the result of random selection and only 10% would be the result
of tournament 2.
  Algorithm 1: Standard Tournament Selection
    Result: A digital organism destined to reproduce
    count = 1;
    best = chooseRandomlyFrom(population);
    while count < T do
        challenger = chooseRandomlyFrom(population);
        if challenger.fitness > best.fitness then
            best = challenger;
        end
    end
    return best;
3.2.4      Applying Noise to Score
In some experiments, we investigate the effects of noisy evaluation on rates of adaptation.
Whenever we add noise to the score of an organism as part of our experiments, it is applied
as the last step before selection. N is the variable that controls the level of noise. Once an
organism’s score has been determined by the relevant fitness function, a random value in the
range [−N, N ] is chosen with uniform probability and then added to the organism’s score.
This creates a level of uncertainty; Organisms whose scores are closer together than 2N may
sometimes have their competitive advantages reversed by the noise applied to their score.
                                                55


  Algorithm 2: Real-Valued Tournament Selection
    Result: A digital organism destined to reproduce
    count = 1;
    best = chooseRandomlyFrom(population);
    whole = floor(T);
    remainder = T − whole;
    if randomFloat(0,1) ≤ remainder then
        whole = whole + 1;
    end
    while count < whole do
        challenger = chooseRandomlyFrom(population);
        if challenger.fitness > best.fitness then
            best = challenger;
        end
    end
    return best;
3.2.5      Model Justification
The genetic model, selection algorithm, and fitness functions we use in this work were chosen
after some consideration. We are primarily interested in the dynamics surrounding valley-
crossing events and how those dynamics are affected by the parameters that we vary.
    Our genomic model removes the influence of many genetic phenomena such as sexual
recombination or crossover, allele dominance, gene epistasis, horizontal gene transfer, and
gene translation.
    Our selection algorithm only considers the relative rank of scores in the population and
therefore maintains a constant strength of selection (unlike fitness-proportional selection
algorithms, which experience weakened selection as the average score increases).
    Our fitness functions are designed to have an unlimited number of identical valley-crossing
opportunities, regardless of the population’s position on the function. This allows us to
characterize how well a particular configuration of the model crosses valleys. By looking
at the final score of each run, we can determine how many valleys were crossed during the
runtime and compare this with other configurations to easily compare rates of adaptation
under different conditions.
                                                56


3.2.6      System Configuration
Every experiment condition was replicated 100 times. Each replicate was run for 20,000
generations. Every generation, the parent organisms are fully replaced with the offspring they
generated. Population size, tournament size, and score noise are varied between experiments
and are noted in the results section.
    Readers wishing to replicate the results from this paper are directed to the supplemental
materials, which include files and instructions for generating the data presented in this paper
(see: http://github.com/cliff-bohm/ALIFE-2021-Connections).
3.3       Results
3.3.1      Smooth Landscapes
Figure 3.2 shows the final scores of runs on the smooth landscape. The smooth landscape is a
simple hill climb function in which each increase in x awards the same increase in score. The
hill-climbing nature of the fitness function removes the explore-vs-exploit trade-off, leaving
pure exploitation as the optimal strategy. This is reflected in the data; Strong selection, large
populations, and no score noise result in the highest final scores.
3.3.2      Deceptive Landscapes
Next we consider deceptive fitness landscapes. We use nine fitness functions that contain
fitness valleys of different widths and depths. The functions are each labeled as deep/shallow
or narrow/wide versions of a default valley shape. Unlike the simple hill-climbing function,
these deceptive landscapes require a balance between exploration and exploitation.
3.3.2.1     Noise vs Population size
Figure 3.3 shows the final outcomes of runs with varying population sizes and score noise
amounts, with tournament size fixed at 2. If altering the population size was fundamentally
the same as changing the score noise, we would expect to see a one-to-one mapping that
                                               57


                    a                                           b
     103.4                            35000      103.4                                40000
      52.2                                        52.2
      26.6
      13.8
                                      30000       26.6
                                                  13.8
                                                                                      35000
                                                                                                      512
                                                                                                            c
                                                                                                                                  12000
  tournament size                             tournament size
       7.4                                         7.4                                30000           256
       4.2                            25000        4.2                                                128                         10000
       2.6                                         2.6                                25000            64
                                                                                                                                     final score
                                      20000                                                            32                         8000
       1.8                                         1.8
                                                                                              noise level
       1.4                                         1.4                                20000            16                         6000
                                      15000                                                             8
       1.2                                         1.2                                                  4                         4000
       1.1                                         1.1                                15000             2
      1.05                            10000       1.05                                                  1                         2000
     1.025                                       1.025                                10000             0
    1.0125                            5000      1.0125                                                         16
                                                                                      5000
                                                                                                               81
                                                                                                              256
   1.00625                                     1.00625
                                                                                                              625
                                                                                                             1296
                                                                                                             2401
                                                                                                             4096
                                                                                                             6561
       1.0                                         1.0
                                                                                                            10000
                                                                                                            14641
                      0
                      1
                      2
                      4
                      8
                     16                                            16
                                                                   81
                                                                  256
                                                                                                                population size
                     32
                     64
                    128
                    256
                    512                                           625
                                                                 1296
                                                                 2401
                                                                 4096
                                                                 6561
                        noise level                                 population size
                                                                10000
                                                                14641
Figure 3.2: Heatmap showing the final scores of all runs on the smooth fitness function. Contour
lines indicate lines of equal score. Every square represents the average score of 100 replicates. a)
pairs tournament sizes with noise levels while fixing population size at 625. b) pairs tournament
size with population sizes while fixing noise at 0. c) pairs noise level with population size while
fixing tournament size at 2.
relates pairs of parameters to the same rate of adaptation. Instead, we see that there is no
one-to-one mapping between the two parameters.
        Further, we see that the narrow fitness valleys are similar to the simple hill-climbing
landscape. This is most likely due to the narrow valley being so short that tunneling
across via mutation is possible. Under these circumstances, low noise and large populations
are advantageous. The large populations ensure that rare tunneling events happen more
frequently, while the noise-free score help ensure that selection detects those individuals.
        In the default and wide valley conditions, tunneling is less likely, so the effects of purifying
selection become an obstacle. Under these circumstances we see that score noise creates
uncertainty for selection and deleterious mutations can persist long enough to assist the
population in crossing the valley. When noise is too low, the population is stuck, and we
see large regions of low adaptation in Figure 3.3. We also see that when noise is too high,
adaptation begins to suffer due to the decreased ability to detect differences relative to the
noise-free scores; As the noise begins to dominate, the system devolves into drift-like behavior.
As in the narrow conditions, large populations result in fast adaptation, presumably due to
                                                                      58


                     narrow shallow              shallow                 wide shallow
               512                                                                          300
               256
                                                                                                 valleys crossed
               128                                               1000
                64                      2000                                                200
   noise amt
                32
                16
                 8                                               500
                 4                      1000                                                100
                 2
                 1
                 0                      0                        0                          0
                        narrow                   default                     wide
               512
               256
                                                                 800                        200
                                                                                                 valleys crossed
               128                      2000
                64                                                                          150
   noise amt
                32                                               600
                16
                 8                      1000                     400                        100
                 4
                 2                                               200                        50
                 1
                 0                      0                        0                          0
                      narrow deep                  deep                   wide deep
               512
               256
                                        2000
                                                                                                 valleys crossed
               128                                               600                        150
                64
   noise amt
                32
                16                                               400                        100
                 8
                 4
                                        1000
                 2                                               200                        50
                 1
                 0                      0                        0                          0
                        16
                        81
                       256                        16
                                                  81
                                                 256                       16
                                                                           81
                                                                          256
                       625
                      1296
                      2401                       625
                                                1296
                                                2401                      625
                                                                         1296
                                                                         2401
                      4096
                      6561
                     10000                      4096
                                                6561
                                               10000                     4096
                                                                         6561
                                                                        10000
                     14641                     14641                    14641
                      population size          population size           population size
Figure 3.3: This figure shows the final outcomes of simulations that combine Population Size and
Score noise parameters with tournament size T = 2. Every cell is the average of 100 replicates. The
color intensity indicates the number of valleys crossed during the total runtime of 20,000 generations
(final score divided by 5). The nine subplots each show the results of a different fitness function.
increased standing variation (genetic diffusion) and increased mutational inflow. Apparently,
these effects outweigh increases in selection strength because we see no point where population
size becomes too big.
3.3.2.2              Tournament size vs Population size
Figure 3.4 shows the final outcomes of runs with varying population sizes and tournament
sizes, with noise fixed at 0. As tournament size relates directly to selection strength, if
altering the population size was fundamentally the same as changing the selection strength,
                                                     59


we would expect to see a one-to-one mapping that relates pairs of parameters to the same
rate of adaptation. Instead, we see that there is no one-to-one mapping between the two
parameters.
    Further, we see that the narrow fitness valleys are similar to the simple hill-climbing
landscape. This is most likely due to the narrow valley being so short that tunneling
across via mutation is possible. Under these circumstances, high selection strength and
large populations are advantageous. The large populations ensure that rare tunneling events
happen more frequently, while the strong selection ensures those rare events go to fixation.
    Once again, in the default and wide valley conditions tunneling is less likely, so the
effects of purifying selection become an obstacle. Under these circumstances, we see that
only when tournament size is low can deleterious mutations persist long enough to assist
the population in crossing the valley. When tournament size is too high, the population is
purifying deleterious mutations too quickly to cross valleys, and we see large regions of low
adaptation in Figure 3.4. We also see that when tournament size is too low, adaptation
begins to suffer due to the decreased ability to discern differences in score. As tournament
size approaches 1, the system tends towards drift-like behavior. Large populations continue
to be advantageous here for the same reasons indicated in the Tournament size vs Population
size results.
3.3.2.3    Tournament size vs Noise
Figure 3.5 shows the final outcomes of runs with varying tournament sizes and score noise
amounts, with the population size fixed at 625. If altering the tournament size was funda-
mentally the same as changing the score noise amount, we would expect to see a one-to-one
mapping that relates pairs of parameters to the same rate of adaptation. Unlike Figures
3.3 and 3.4, we see a very strong relationship between the two, though it is not one-to-one
in all cases. There are, however, large areas within each plot where the relationship is
approximately linear.
                                             60


                         narrow shallow              shallow                wide shallow
                103.4                                                1000                     250
                 52.2                       6000
                 26.6
                 13.8                                                800                      200
                                            5000
 tournament size
                   7.4
                                                                                                   valleys crossed
                   4.2
                   2.6
                   1.8
                                            4000                     600                      150
                   1.4                      3000
                   1.2                                               400                      100
                   1.1
                 1.05                       2000
                1.025                                                200                      50
              1.0125
             1.00625
                                            1000
                   1.0                      0                        0                        0
                            narrow                   default                    wide
                103.4                                                                         250
                 52.2                       6000
                 26.6                                                800
                 13.8                                                                         200
                                            5000
 tournament size
                   7.4
                                                                                                   valleys crossed
                   4.2
                   2.6                      4000                     600                      150
                   1.8
                   1.4                      3000
                   1.2                                               400                      100
                   1.1
                 1.05                       2000
                1.025                                                200                      50
              1.0125
             1.00625
                                            1000
                   1.0                      0                        0                        0
                          narrow deep                 deep                   wide deep
                103.4                                                                         200
                 52.2                       6000
                 26.6                                                600
                 13.8                                                                         150
                                            5000
 tournament size
                   7.4
                                                                                                   valleys crossed
                   4.2
                   2.6                      4000                     400
                   1.8                                                                        100
                   1.4                      3000
                   1.2
                   1.1
                 1.05                       2000                     200                      50
                1.025
              1.0125
             1.00625
                                            1000
                   1.0                      0                        0                        0
                            16
                            81
                           256                        16
                                                      81
                                                     256                       16
                                                                               81
                                                                              256
                           625
                          1296
                          2401                       625
                                                    1296
                                                    2401                      625
                                                                             1296
                                                                             2401
                          4096
                          6561
                         10000                      4096
                                                    6561
                                                   10000                     4096
                                                                             6561
                                                                            10000
                         14641                     14641                    14641
                          population size          population size          population size
Figure 3.4: This figure shows the final outcomes of simulations that combine Tournament Size
and population size parameters with noise N = 0. Every cell is the average of 100 replicates. The
color intensity indicates the number of valleys crossed during the total runtime of 20,000 generations
(final score divided by 5). The nine subplots each show the results of a different fitness function.
           Here again, we see that the narrow fitness valleys are similar to the simple hill-climbing
landscape. This, again, is most likely due to the narrow valley being so short that tunneling
                                                        61


                           narrow shallow          shallow               wide shallow
                  103.4                     2500
                   52.2                                         600
                   26.6                                                                      125
                   13.8                     2000
  tournament size
                     7.4                                        500
                                                                                             100
                                                                                                  valleys crossed
                     4.2
                     2.6                    1500                400
                     1.8                                                                     75
                     1.4                                        300
                     1.2                    1000
                     1.1
                                                                200                          50
                   1.05
                  1.025                     500
                1.0125                                          100                          25
               1.00625
                       1                    0                   0                            0
                              narrow                default                   wide
                  103.4                                         500
                   52.2                                                                      100
                   26.6
                   13.8                     1500                400
  tournament size
                     7.4                                                                     80
                                                                                                  valleys crossed
                     4.2
                     2.6                                        300                          60
                     1.8                    1000
                     1.4
                     1.2                                        200
                     1.1                                                                     40
                   1.05                     500
                  1.025                                         100                          20
                1.0125
               1.00625
                       1                    0                   0                            0
                            narrow deep              deep                 wide deep
                  103.4                                         400
                   52.2
                                            1500                                             80
                   26.6
                   13.8
  tournament size
                     7.4                    1250                300                          60
                                                                                                  valleys crossed
                     4.2
                     2.6                    1000
                     1.8
                     1.4
                                            750                 200                          40
                     1.2
                     1.1
                   1.05                     500                 100
                  1.025                                                                      20
                1.0125
               1.00625
                                            250
                       1                    0                   0                            0
                             0
                             1
                             2
                             4
                             8                       0
                                                     1
                                                     2
                                                     4
                                                     8                  0
                                                                        1
                                                                        2
                                                                        4
                                                                        8
                            16
                            32
                            64
                           128
                           256                      16
                                                    32
                                                    64
                                                   128
                                                   256                 16
                                                                       32
                                                                       64
                                                                      128
                                                                      256
                           512                     512                512
                              noise amt            noise amt                noise amt
Figure 3.5: This figure shows the final outcomes of simulations that combine Tournament Size and
Score noise parameters with Population size P = 625. Every cell is the average of 100 replicates. The
color intensity indicates the number of valleys crossed during the total runtime of 20,000 generations
(final score divided by 5). The nine subplots each show the results of a different fitness function.
across via mutation is possible. Under these circumstances, high selection strength and low
noise are advantageous. The strong selection ensures rare high-fitness mutants go to fixation,
                                                     62


while the noise-free score help ensure that selection detects those individuals.
    As seen before, in the default and wide valley conditions tunneling is less likely, so the
effects of purifying selection become an obstacle. Under these circumstances, we see that all
but the very low and very high tournament sizes are viable. The very low settings result
in drift-like behavior, while the very high settings result in purifying selection that restricts
valley crossing. Interestingly, the score noise seems to be capable of compensating for the
increase in selection strength. As the noise increases, the optimal tournament size increases
too, suggesting that although the score noise adds uncertainty, this can be overcome simply
with more selection strength. However, this compensatory relationship begins to fail at
larger noise strengths because once the score signal is dominated by the noise, no amount of
selection strength can compensate. As the score noise increases, the system will eventually
tend towards drift-like behavior.
3.4       Discussion
In this work we show the relationships between all combinations of population size, noisy
evaluation, and tournament size in the context of rate of adaptation.
    One key observation is that there are relationships between each of the three parameters
and selection strength. However, while the behavior of these parameters are each related to
selection strength, the relationship is imperfect and there is no perfect one-to-one relationships
between any of the parameters we have considered. This means that each of these parameters
modify selection strength in different ways.
    We began our discussion of deceptive landscapes by introducing the concept of fitness
valley width and depth. In our simulations, we examined valley depths between 3 and 9 and
valley widths between 3 and 12. However, you could imagine setting the width or depth to
zero. Doing so removes all deception from the fitness landscape. In other words, setting the
width or depth to zero defines the smooth fitness landscape. With this in mind, we can see
that all the results sit on a continuum of deception ranging from none to high.
                                                63


    In the data collected from the smooth function, we see that for each pair of parameters,
the maximum rate of genomic change occurs where the parameters are tuned to maximum
selection strength. The rates of change decrease when any parameter moves away from strong
selection strength. The contour lines in Figure 3.2 show a trade-off between the values on
each axis that indicates where strength of selection is roughly equal. As the functions become
more deceptive, we see that the position of the greatest change becomes uncoupled from the
highest selection strength settings.
    We see an interesting alignment between the areas of the greatest change in Figure 3.5
and the contour lines in the smooth function plot (Figure 3.2.a). As deception increases,
the area of the greatest change shifts to align with contour lines that are farther from the
point of the highest selection strength. This illustrates the trade-off between exploration and
exploitation. As the functions become more deceptive, weaker selection is needed to allow
more exploration across wider and deeper valleys. In addition, the shape of the regions in
Figure 3.5 indicates an almost-compensatory trade-off between tournament size and noise
amount, which also aligns with the contour lines in the Figure 3.2.a.
    The trade-off we identified in Figure 3.5 is not seen in Figures 3.3 and 3.4. If population
size only effected selection strength, we would expect to see that very large populations would
result in too much purifying selection to allow for exploration across the more deceptive valleys,
but this is not the case. In these plots, we still see that a large tournament size or low noise
configurations result in reductions in the rate of adaptation, but larger populations always
result in a faster rate of change.
3.4.1      Population size and selection strength
Increases in population size are known to also increase the selection pressure by improving the
sampling of individuals in the population during selection. However, in the data presented
we do not see an optimal population size; We observe that rate of adaptation increases with
population size without seeing a reversal of this trend at high values. This seems to show
that the trade-off between exploration and exploitation is not invoked when increasing the
                                                64


population size. Therefore, we conclude that there are other forces at work. Namely, we
believe that the increase to mutational inflow and standing genetic variation that result from
increasing the population size are responsible for the absence of this trade-off. Both of these
forces increase the ability of the population to explore the fitness landscape. We believe these
increases must be compensating for the increased selection strength that on its own would
have caused exploration to be reduced due to purifying selection.
3.4.2      Proxies for selection strength
Based on our results, if someone wishes to design an experiment using an evolutionary model
and wishes to vary selection strength in their model as an independent variable, they aught
not to use population size as their proxy for selection strength. In the case where it is possible,
it should be preferred to use a model of selection which comes with its own explicit parameter
for selection strength, like tournament selection.
    We find that, similar to tournament size, there is an optimum setting for noise when
all other parameters are held constant. Lower than this level, the effects of selection may
become too strong for the population to explore the fitness landscape effectively. Higher
noise results in an unreliable signal for the selection algorithm to effectively exploit higher
fitness individuals. In other words, adding noise to score creates the same trade-off as we see
when we adjust the tournament size.
    The relationship between tournament size and noise seems to be roughly one-to-one. This
similarity suggests that evaluation noise is a rather good proxy for tournament size, which is
itself a direct controller of selection strength. Ergo, noise is a rather good proxy for directly
adjusting selection strength.
    One benefit of using noise as an independent proxy for selection strength is that it is
algorithm independent. Regardless of the selection algorithm in use, we can always apply
noise to the scores it will operate on to reduce selection strength.
                                                 65


3.4.3      Neutral Theory in the Context of Noisy Score
The way we have applied noise to our digital organisms’ fitness is somewhat abstract. The
noise could be interpreted as gene expression variance, or a stochastic environment where
luck plays a role in survival and reproduction. In nature, we might expect both of these
sources of noise to co-occur with, possibly, even more. Therefore, natural organisms may
experience a wide range of rates of adaptation even with all other factors usually attributed
to controlling selection strength held constant.
    This has large implications on the neutral theory of molecular evolution [Kimura, 1987].
In brief, neutral theory posits that the majority of genetic variation is neutral variation. Since
noisy fitness can change the effective strength of selection, noise can change what qualifies
as neutral or not. What was once neutral may become detectable if the noise on fitness is
reduced, while detectable traits may become neutral in periods of additional noise. This has
far-reaching consequences once you consider theories of local optima escape which require
the discovery of neutral traits, for example extradimensional bypass [Cariani, 2002]. In the
presence of noise, a pre-existing trait may become the neutral ridge that is used to bypass a
fitness valley.
3.4.4      The effectiveness of small Tournaments
It came as a surprise that a large portion of the fast-evolving runs were using tournament
sizes below T = 2. Seeing as how T = 1.4 means that more than half (60%) of the population
is reproducing under drift-like conditions, it is fascinating that the population does not
experience mutational meltdown. Perhaps T = 1.4 is like reducing the effective population
size to 40%, but also including a pool of additional organisms that might be shuffled in or
out between generations. Interpreted this way is not at all surprising, that T = 1.4 evolves.
To the contrary, it would seem to have the benefit of maintaining diversity more effectively
than T ≥ 2.
    The excitement over this should be tempered somewhat by acknowledging that the rates
                                               66


of change seen on the smooth function (Figure 3.2) for tournament sizes at or below 1.4 are
extremely low. So, while these low tournament sizes help to maximize the rate of evolution
by balancing exploration and exploitation, they cannot allow evolution to proceed faster than
would be the case if the landscape was smooth and non-deceptive.
    Even so, real-world fitness landscapes are high dimensional and likely highly deceptive.
As a consequence, it’s interesting to see that very small tournament sizes can have observable
long-term outcomes on evolution for deceptive landscapes. Systems experiencing near-drift
behavior are still able to undergo directional selection when the time period of observation
are long enough.
3.4.5     Future Work
One interesting consequence of the results presented here are the implications for multi-trait
evolutionary systems. For example, consider a digital organism who’s total fitness is given by
summing together progress on both the shallow and narrow fitness functions. If the system
is evolving under tournament selection without noise, the optimum for the narrow function
is T = 1.8, while for the shallow function it is T = 1.2. Further, if the shallow function
is evolved at, T = 1.8 it is virtually stuck. Therefore, the choice of tournament size could
cause an organism on the multi-trait landscape to evolve on both functions or on just one.
However, say we choose poorly (T ≥ 1.8 which will halt progress on shallow), adding noise
to the system, say N = 8, could move both traits into a fast-evolving scenario.
    Our work has focused on results using the tournament selection algorithm. As a rank
based algorithm, tournament selection is going to behave differently from more traditional
fitness-proportional selection algorithms (such as roulette). Fitness proportional selection
algorithms are known to have variable selection strength as a result of increasing organism
scores. In preliminary work, we attempted to apply a correction to roulette selection that
would fix the diminishing selection strength problem, but ultimately we chose to go with
tournament selection because no such correction was necessary. Future investigations should
use a fitness proportional selection method and compare the results with tournament.
                                               67


    In our analysis of score noise, we find that there is often a single optimal value for this
parameter, given the other parameters are already chosen. There is a potentially untapped
engineering technique in the use of score noise to enhance the rates of adaptation of evolu-
tionary optimization algorithms. The single optimum makes tuning this parameter rather
simple; Simple enough to automate during an evolutionary search.
                                              68


Chapter 4
The role of disequilibrium in
evolutionary discovery
4.1       Introduction
Empirically, the rate at which populations evolve is extremely variable [Fitch, 1995]. Among
the many factors that affect the speed of evolution are the size [Ohta, 1972,Fogle et al., 2008,
Handel and Rozen, 2009,Jain et al., 2011] and structure [Möller et al., 2019] of the population,
the dynamics, and structure of the fitness function [Handel and Rozen, 2009, Weissman
et al., 2009, Kryazhimskiy et al., 2009], mutation rates [Kimura, 1964, Kimura and Crow,
1964,Weissman et al., 2009], and the size and distribution of beneficial mutations [Gerrish and
Lenski, 1998,Orr, 2005]. Getting a better understanding of these variable rates of adaptation
requires us to investigate how populations shift between periods of stasis and periods of
adaptation.
    In a genetic fitness landscape model [Wright, 1932, Maynard Smith, 1970], genotypes
specify locations on a landscape and the height differences between locations represent
relative fitness: genotypes at higher elevations have an advantage over those at lower elevations.
Movements on the landscape correspond to mutations from one genotype to another [Maynard
                                                69


Smith, 1970]. Peaks in a fitness landscape (i.e., genotypes surrounded entirely by lower
fitness) are referred to as local optima. A population may be in evolutionary equilibrium
when located on a local optimum because while areas of higher fitness are present elsewhere
in the landscape, the population must lose fitness in order to cross the valley between their
current peak and another higher peak. The likelihood that a population is trapped on a peak
is related to the extent to which populations are able to explore. Features like deep and
wide valleys, weak mutation and strong selection work to limit exploration and inhibit valley
crossing.
    The literature has identified multiple processes that enable evolving populations to cross
fitness valleys. Some theories suggest mechanisms for valley crossing that do not require
the loss of fitness, such as one organism accumulating multiple mutations in a single genera-
tion, environmental changes that alter the fitness landscape [Gillespie, 1984b, Steinberg and
Ostermeier, 2016], or effects that add dimensions to a fitness landscape (extradimensional
bypass [Cariani, 2002]). Neutral theory [Kimura, 1983, Kimura, 1987, Kimura, 1991] suggests
that local optima do not really exist and that instead there are always neutral paths to
higher fitness [Gavrilets, 2003], while nearly neutral theory [Ohta, 1992,Ohta, 2002,Fay et al.,
2002, Kern and Hahn, 2018] posits that some deleterious mutations may be tolerated as long
as the deleterious effects are small.
    Both small population size and increased mutation rates can reduce selection strength
[Lynch and Conery, 2003, Lynch, 2007], thus reducing the effectiveness of purifying selection
to purge deleterious mutations and enhancing the probability of valley crossing [Jain et al.,
2011, Ochs and Desai, 2015, LaBar and Adami, 2016] via stochastic tunneling [Iwasa et al.,
2004b,Weinreich and Chao, 2005,Weissman et al., 2009]. In addition, a reduction of selection
strength during range expansion [Excoffier and Ray, 2008, Palmer et al., 2012] can also result
in an enhanced probability of valley crossing [Burton and Travis, 2008].
    When the population size is large and the mutation rate not too low, the equilibrium state
of the population consists of many variants: a molecular quasispecies [Eigen, 1971, Eigen
                                                70


and Schuster, 1979, Eigen et al., 1989, Wilke, 2005, Forster et al., 2006]. If the valley is not
too deep and not too wide, the mutant cloud may include genetic variants on the other side
of a local optimum, or, at least, support mutant subpopulations that are sufficiently close
to a new local peak to increase the chance that this peak will be discovered by stochastic
tunneling [Iwasa et al., 2004b]. The name ’stochastic tunneling’ can be misleading, but it is
merely when a genotype that has several mutations relative to the wild-type goes to fixation
without any of the intermediate genotypes (single-mutant, double-mutant, etc.) themselves
going to fixation.
    Despite all of these mechanisms that allow for valley-crossing, which in turn allow a
population in equilibrium to transition into disequilibrium, we still do not have a simple
mechanism that can sustain a period of disequilibrium long enough to reproduce punctuated
equilibrium.
    In this chapter, we identify a new mechanism that significantly affects rates of evolution
by influencing the rate at which populations can cross fitness valleys. This new mechanism
can generate periods marked by rapid adaptation punctuated by stasis, and provides insights
into why an evolving population switches from rapid adaptation to stasis.
    To understand this new mechanism for valley-crossing, consider what happens when an
evolving asexual population discovers a beneficial mutation. If the mutation provides a
sufficient benefit, the genotype carrying the mutation is likely to take over the population in
a selective sweep. In the case of a spatial population, as depicted in Fig. 4.1, the mutants
that compete with the wild type are found on the leading edge between the mutants and
the wild type. Leading-edge mutants experience a weaker strength of selection because
they compete with, and have a fitness advantage over, the wild type. The boundary itself
provides an additional benefit by facilitating weak competition. Reduced selection pressure
on advantaged organisms occurs in well-mixed populations too, and comes as a result of
maintaining above-average fitness when carrying additional deleterious mutations.
    We call the phenomenon of reduced selection experienced by high-fitness mutants because
                                              71


                                                                 w ild type
                                                                 m utants exper iencing
                                                                 r educed selection
                                                                 m utants exper iencing
                                                                 str ong selection
Figure 4.1: A spatial population where a wild-type population (gray) is in the process of being
outcompeted by a mutant type (yellow and orange). The members of the mutant type (yellow) on
the leading edge experience a reduction in selection, compared to other mutant organisms (orange),
due to interactions with the wild type.
of their competitive advantage during selective sweeps, “the Free-for-All effect” (FFA). This
name was chosen because the organisms in the lineages that experience reduced selection are
able to accumulate deleterious mutations while still remaining competitive; for them, it is
an evolutionary free-for-all. This does not imply that lineages experiencing FFA are entirely
immune to the effects of selection: lethal mutations are still evolutionary dead-ends, and
if the deleterious mutations are too disabling, other lineages that are less encumbered may
outcompete them.
    If an organism succeeds in reaching a new peak during a period of FFA (i.e., during a
selective sweep), the new beneficial type will experience a fitness advantage over the remainder
of the population resulting in a new selective sweep. This extends the period where FFA is
active; each new valley crossing acts to keep the window of enhanced exploration afforded
by FFA open, potentially leading to more discoveries. This can result in a cascading chain
of discoveries that will continue until the population finally fixes on a peak, closing the FFA
window.
                                                 72


Figure 4.2: (a): A typical fitness valley of width v = 5 (it takes 5 mutations from peak to peak),
in a one-dimensional multiplicative fitness function (here, log of fitness is shown). Deleterious
mutations from the peak in the direction of the next higher peak have an effect 1 − sd . The adjacent
peak has fitness 1 + sb . (b): Periodic fitness function as a function of the number of mutations k.
In this version, valleys are v = 5 mutations wide. Log-fitness is shown. (c): A neutral version of
(b), where the valley is replaced by a series of neutral mutations that is used when measuring Tnd .
4.2       Results
Free-for-all theory suggests that during selective sweeps, a portion of the mutant subpop-
ulation will experience reduced selection pressure, which results in an increased chance of
valley crossing. We can test FFA by observing variability in the rate of adaptation (i.e., an
increase in the frequency of valley-crossing between the time of discovery and the time of
fixation of a beneficial mutation). However, there are a number of alternative explanations
that may account for similar observations, such as fitness function topology or epistasis. To
ensure that we are detecting the effects of FFA, it is necessary to design experiments that
limit confounding effects. To that end, we use a digital evolution system based on a simple
fitness function shown in Fig. 4.2b, which are repetitions of the fitness valley described in
Fig. 4.2a. Experiments are run with various population sizes, fitness function difficulties, and
population structures (see Methods).
    Organisms in our system consist of a genome with a single locus that encodes the mutation
number k. This allele mutates with a probability µ to k + 1 or k − 1 with equal probability.
The periodic function Fig. 4.2 can be written in terms of the genotype k and the valley width
                                                  73


v as
                                    w(k) = (1 + sb )n (1 − sd )m ,                            (4.1)
where the value m = k (mod v) determines how far the genome is within the valley, and
      k−m
n=     v
           simply numbers the peaks that have been climbed.
    This fitness function ensures that the selective pressures are exactly the same on each
fitness peak. Hence, every valley crossing occurs under the same conditions (no diminishing
returns) so that changes in the rate of adaptation cannot be due to the fitness function.
    For each experiment, we will measure the time it takes to cross the fitness valleys under
various conditions, and also measure the fixation time of a beneficial mutation. The first
valley-crossing time we measure is the stochastic tunneling time Tst , the average time it
takes a clonal population, concentrated at one peak, to stochastically tunnel across a fitness
valley towards a higher peak via random mutations in the presence of purifying selection.
For one-dimensional fitness valleys, Tst is well understood and can be calculated as the mean
first-passage time of an Ornstein-Uhlenbeck process [Alili et al., 2005, Yi, 2010, Artime et al.,
2018] (see also [Weissman et al., 2009]). We also measure the neutral drift time Tnd , which
is the average time to cross a plateau of the same width (i.e., sd = 0, Fig. 4.2c). This time
is even better understood, as it is the mean first-passage time of a Wiener process [Risken,
1989]. In addition, we will measure the actual time to cross valleys Tobs , by observing a freely
evolving system. Finally, in addition to these valley crossing times, we measure the time
that a beneficial mutation takes to fix in a population, Tfix . Fixation times are well studied
and can be calculated easily as a function of mutation rate, population size, and mutational
effect size [Kimura and Ohta, 1969].
4.2.1      Mean rate of adaptation Tobs changes in response to popula-
           tion size N
Fig 4.3a shows Tst , Tobs , Tnd , and Tfix as a function of population size (N ) on a log-log scale,
for a fitness landscape with valley-width v = 5, a deleterious mutation cost sd = 0.33, and
                                                  74


a benefit for crossing the valley sb = 16.08, with a mutation rate µ = 0.1. This data was
collected from populations that are structured as a linear array with a periodic boundary
condition.
    Fig. 4.3b and c show analogous results from a population structured as a regular grid
with periodic boundaries and a well-mixed population, respectively. In Fig. 4.3b, v = 6,
sd = 0.139, sb = 19.107, and µ = 0.1. In Fig. 4.3c, v = 6, sd = 0.594, sb = 15.0, and µ = 0.1.
    Fig. 4.3a-c all show the same trend. Low population size results in rapid fixation (Tfix <
Tnd ) and Tobs ≈ Tst in these cases. At intermediate population sizes, when Tnd < Tfix < Tst ,
we see a drop in Tobs relative to Tst . Finally, around the point where Tst < Tfix , large
population sizes result in long fixation times and Tobs becomes far less sensitive to changes
in population size.
4.2.2      Observed valley-crossing times are bi-modal for intermediate
           population sizes
Figs. 4.3d-g show the distributions of Tst , Tobs , and Tnd , and the mean of Tfix for different
values of N , labeled in Fig. 4.3a. Fig. 4.3d shows that when fixation is very fast compared
to neutral crossing Tfix < Tnd , each beneficial mutation generally fixes before the next can
be discovered, as evidenced by the similar shapes of the Tst and Tobs distributions (they are
the same distribution aside from the number of samples collected). Figs. 4.3e-f show that
when fixation is somewhere between the drift and tunneling times Tnd < Tfix < Tst , Tobs
becomes a bimodal distribution, where one mode coincides with the Tst distribution and
the other mode is sandwiched between Tnd and Tfix . This indicates there are two modes of
valley-crossing under these conditions: one where circumstances match the those we created
for measuring Tst , and another that appears to occur under alternative circumstances. Finally,
Fig. 4.3g shows how, when valley-crossing happens much faster than fixation Tst < Tfix , the
distribution of Tobs shifts to the right of Tst and is no longer bimodal.
                                                75


Figure 4.3: (a): Empirical fixation times Tfix , stochastic tunneling times Tst , neutral drift times
Tnd , and observed transition time Tobs for fitness peaks with deleterious effect sd = 0.33, and
beneficial effect sb = 16.08, as a function of population size, in a one-dimensional spatial population
evolving at mutation rate µ = 0.1. Four population sizes labeled d, e, f, and g are shown in full detail
in panels (d-g). (d-g): Distribution of neutral drift times Tnd (gray), observed transitions Tobs (red),
and stochastic tunneling times Tst (blue) for four population sizes labeled in panel (a). Overlapping
red and blue distributions appear purple. The mean fixation time is indicated by a dashed vertical
line. (d): Distributions at N = 25, in the sequential-fixation regime. (e) Distributions at N = 100.
(f):Distributions at N = 1, 000. (g) Distributions at N = 100, 000, in the concurrent-mutations
regime. (h-k): the number of valleys crossed over the first 300,000 generations of the experiments
used in (d-g). (l-m): zoomed segment of (j-k) respectively.
                                                   76


4.2.3      Intermediate population sizes result in punctuated cascades
           of valley-crossing
Figs. 4.3h-k shows the cumulative number of valleys crossed by the population during the
first 300,000 generations of the experiments that generated Tobs in Figs. 4.3d-g respectively.
Figs. 4.3h-i each show that only a few valleys were crossed during this period. In Fig. 4.3j, we
can see long cascades, periods of rapid evolution where many valleys are crossed, punctuated
by periods of stasis; sometimes tens of thousands of generations pass without any change.
This is further illustrated in Fig. 4.3l, which shows a detailed view of the fitness trajectory
indicated by the dashed line box in Fig.4.3j. These cascades are a result of the fastest valley-
crossings being bunched together, instead of distributed uniformly. This suggests that the
population may be in one of two states at a given time: stasis, or rapid change. This is, not
coincidentally, reminiscent of the bimodal distribution of Fig.4.3f; the rapid valley-crossings
constitute one mode of the distribution, while the crossings that occur after periods of stasis
contribute to the other mode. Finally, Fig. 4.3k (and the detail shown in Fig.4.3m) depicts
a state of constant cascade: there are no periods of stasis observed.
4.2.4      The positions of Tobs transitions move in response to mutation
           rate and valley width
We observed earlier that the points where Tfix crosses Tnd and Tst are important delineations
that mark where Tobs begins to diverge from Tst and where Tobs becomes insensitive to
further increases of the population size, respectively. In Fig. 4.4, we show nine plots similar
to Fig. 4.3a, where Tst , Tobs , Tnd , and Tfix are graphed against population size. There are nine
such plots showing all combinations of three mutation rates (µ = 0.05, 0.1, 0.15, left-to-right),
and three valley widths (v = 4, 5, 6, top-to-bottom). The same fitness advantage from the
beneficial mutation is used in all nine plots (sb = 16.09), but in order to maintain valley
depth, the deleterious effect sizes were set to sd = 0.416, 0.333, 0.277 for valley widths 4,5,6
respectively. The center plot contains the same data as Fig. 4.3a. Both increasing mutation
                                                    77


rate and decreasing valley width result in valleys that are easier to cross, and so reduce Tst
and Tnd . In turn, these changes mean that Tst and Tnd cross Tfix at smaller population sizes
(x-axis) and at shorter times (y-axis). Thus, increasing mutation rate and decreasing valley
width bring the transition points between the sequential-fixation and successive-mutations
regimes closer together. Conversely, decreasing mutation rate or increasing valley width
moves the transition points further apart.
                       v=4
           8                                                      µ = 0.05                          µ = 0.1                       µ = 0.15
                                                    106
           6                                                                                                                                            Tobs
                                                    105
           4
                                                                                                                                                        Td
 log(w)                                    counts
                                                    104
           2
                                                    103                                                                                                 Tnd
           0
          −2
                                                    102
                                                                                                                                                        Tfix
                                                    101
          −4
                       v=5                          107
           8
                                                    106
           6
                                                    105
           4
 log(w)                                    counts
                                                    104
           2
                                                    103
           0
                                                    102
          −2
                                                    101
          −4
                       v=6                          107
           8
                                                    106
           6
                                                    105
           4
 log(w)                                    counts
                                                    104
           2
                                                    103
           0
                                                    102
          −2
                                                    101
          −4
               0   1   2   3   4   5   6                  101   102   103   104   105   101   102     103     104   105   101   102   103   104   105
                           k                                          N                               N                               N
Figure 4.4: Empirical (mean) fixation times Tfix , stochastic tunneling times Tst , neutral drift times
Tnd , and observed transition time Tobs for fitness peaks with width v = 4, 5, 6 (rows) and mutation
rates µ = 0.05, 0.1, 0.15 (columns). In order to keep valley depths (the difference between the valley
and the peak) constant, we have adjusted the deleterious effect of a mutation so that sd ≈ 0.42 for
the v = 4 landscape, while sd ≈ 0.28 for v = 6.
4.2.5                  There are three modes of evolution
The primary results discussed above can be summarized succinctly with the aid of Fig. 4.5.
The figure depicts an idealized picture of how population size affects Tst , Tnd , Tfix , and Tobs .
The dependence of Tst , Tnd , and Tfix on population size is well understood: as population
                                                                                          78


Figure 4.5: Three regimes of evolutionary dynamics. The sequential-fixation region (I, shaded
in red) is defined by Tfix ≤ Tnd , while for the successive-mutations region (III, shaded in blue)
Tfix ≥ Tst . In between those regions lies region II, where selective sweeps can weaken selection
(Tobs < Tst ) so that new peaks can be discovered, possibly in cascades.
size increases, an increase in mutation in-flow combined with an expanded quasi-species will
decrease Tst , and Tnd , while increasing Tfix . How population size affects Tobs is complex,
however, because how Tobs depends on population size is controlled by the relative magnitude
of the other times. Fig. 4.5 also contains a fifth valley-crossing time Tffa . This is the
hypothetical time to cross a valley via stochastic tunneling when selection strength is reduced
due to FFA, and corresponds to the previously unexplained second mode in the distribution
of Tobs (Figs. 4.3e-f). Therefore, we call this hypothetical time during free-for-all Tffa and
note that the condition Tnd ≤ Tffa ≤ Tst holds necessarily by the definition of these terms.
    We previously noted that where Tfix crosses Tnd and Tst mark major transition points in
                                                79


the behavior of Tobs . We can use these crossing points to divide the range of population sizes
into three regions that exhibit unique evolutionary dynamics from one another.
    We refer to the condition where Tfix < Tnd as region I (shaded red in Figure 4.5). Region
I is already well-characterized in the literature, and corresponds to the sequential-fixation
(“weak mutation–strong selection” or WMSS) regime where beneficial mutations fix one at a
time, in the order they are discovered [Desai and Fisher, 2007, Desai et al., 2007]. In region
I, because Tfix ≪ Tnd , FFA does not play an important role in the rate of adaptation since
a new benefit would need to appear faster than the average neutral drift time, suggesting
exceptionally weak selection due to FFA. Recall that Tffa ≥ Tnd , so although the reduction
in selection strength due to FFA always occurs, the net effect in this case is moot. Therefore,
as we would expect, and in accordance with the literature, we see that Tobs ≈ Tst under this
condition.
    We refer to the condition where Tnd ≤ Tfix ≤ Tst as region II (depicted white in Fig. 4.5).
Region II does not correspond to any previously defined mode of evolution. In region II, the
reduction in selection pressure resulting from FFA can aid in valley crossing (unlike region I).
As Tfix increases towards Tffa in response to the population size, we see that the probability
of cascades increases, which reduces Tobs (the average of Tst and Tffa , weighted by occurrence)
relative to Tst . This is reflected in Figs.4.3a-c,e-f, j, and l, as well as in Fig. 4.4.
    As the population size continues to increase, and Tfix approaches Tst , we see that the system
approaches a state of nearly-constant selective sweeps because the discovery of beneficial
mutations occurs more quickly than the fixation.
    We refer to the condition where Tst < Tfix as region III (shaded blue in Fig. 4.5). Region III
is also well-characterized in the literature and is the concurrent-mutations regime (also called
the strong mutation–weak selection (SMWS) regime) [Rouzine et al., 2003,Wilke, 2004,Desai
and Fisher, 2007,Desai et al., 2007,Rouzine et al., 2008,Brunet et al., 2008,Fogle et al., 2008,
Park et al., 2010,Fisher, 2013,Good et al., 2012]. Here, beneficial mutations are discovered at a
faster rate than they fix, resulting in competing subpopulations containing different beneficial
                                                 80


mutations. In this regime, evolution slows down due to clonal interference, and the speed of
adaptation becomes largely independent of population size [Gerrish and Lenski, 1998, Kim
and Stephan, 2003, Wilke, 2004, de Visser and Rozen, 2005, Campos and de Oliveira, 2004].
In region III, because Tffa ≪ Tfix , the system is in a constant state of selective sweeps and
therefore constantly experiencing the reduced selection from FFA. However, the substantial
decrease in the rate of adaptation due to clonal interference eventually overshadows the speed-
up due to FFA. As we approach “infinite population size” all valley-crossing times, theoretical
and actual, approach Tnd due to the infinitely widening quasispecies and the vanishing effects
of purifying selection. This is of little interest for discussions regarding valley-crossing, since
these unrealistic theoretical populations never get stuck on local optima.
4.3       Discussion
It is well-known that rates of adaptation, as inferred from the fossil record [Conway Morris,
1998, Erwin et al., 1987, John Sepkoski Jr, 1998, Niklas et al., 1983, Niklas, 1997] and experi-
mentation [Wolf et al., 2006, Elena et al., 1996], are not constant. Multiple mechanisms have
been proposed to explain the variability in the tempo of evolution, but these mechanisms
are usually specific to the time and place within which they are observed. Here, we have
presented a new dynamic, reduced selection on advantaged lineages during fixation — the
free-for-all effect (FFA) — that universally affects the rate of evolution. Under some condi-
tions, FFA predicts that evolving populations will shift between modes of stasis and change.
In particular, FFA provides a simple explanation for long periods of stasis separated by brief
periods of rapid adaptation (a.k.a., punctuated equilibrium).
4.3.1      Selective sweeps lead to reduced selection via FFA
Because FFA is a consequence of selective advantages, the effect is always present during
selective sweeps. When Tnd < Tfix < Tst and landscapes are deceptive, FFA predicts that
evolving populations will cross fitness valleys faster than expected if selection strength would
not change during selective sweeps. In other words, FFA predicts Tobs < Tst when the
                                                 81


conditions are right. Figs. 4.3e and f show examples of the valley-crossing times observed
during both stasis and selective sweeps. The distribution of Tobs is bimodal, where the right
mode of the distribution corresponds to Tst and consists of valley crossings that occur when
the population is in stasis on a peak. The anomalous left mode of the distribution, comprised
of shorter times, is upper-bound by Tfix and suggests that these valley crossings must occur
before fixation, that is, while the reduced selection pressure of FFA is available. Heuristically,
the rate of adaptation Tobs is described by a Markov process with two states, equilibrium
and non-equilibrium, where the state of the population determines the expected speed of
valley-crossing.
    While FFA can be clearly observed under some conditions, it can be difficult to detect for
a number of reasons. In particular, some conditions make it impossible for FFA to change
the rate of adaptation, while other conditions do not affect FFA directly, but result in other
concurrent changes to the rate of adaptation that mask the effects of FFA. For example, we
do not see a change in the rate of adaptation when beneficial mutation tends to fix before
a new one is discovered (the SSWM regime). Thus, even though FFA is reducing selection
pressure, the time required to valley-cross is still more than the fixation time, and as a result,
FFA does not result in more valley-crossing. In Fig. 4.3a-c, the absence of a noticeable change
in the rate of adaptation is evidenced by Tobs ≈ Tst , and indeed the distributions of Tst and
Tobs in Fig. 4.3d are very similar. In short, FFA is imperceptible in the SSWM regime.
    When fixation times are so large that they exceed the stochastic tunneling time Tst < Tfix
(the WSSM regime), multiple beneficial mutations are discovered before they can fix, leading
to clonal interference and the accompanying loss of some (or many) beneficial mutations. As a
consequence, it becomes difficult for the rate of adaptation to increase further [de Visser et al.,
1999, Desai and Fisher, 2007, Rouzine et al., 2008] even though FFA reduces the selection
strength on the advantaged lineages. In Fig. 4.3a-c, this appears as Tobs becoming decoupled
from Tst (and the distributions for those times in Fig. 4.3g becoming incongruous).
    Finally, in order to observe a reduction in Tobs relative to Tst resulting from FFA, fitness
                                                82


valleys are required. A reduction in selection strength can only increase the rate of adaptation
if strong selection was preventing progress to begin with. If there are no valleys, strong
selection does not present a barrier to progress and, in fact, any reduction of selection
strength in a non-deceptive landscape will slow the rate of adaptation [Ragusa and Bohm,
2021].
4.3.2      FFA depends on the relationship between fixation and valley-
           crossing time
We have previously asserted that the relationship between Tfix , Tst , and Tnd can predict if
FFA results in accelerated adaptation. If this is true, then any parameter that affects these
times will impact FFA dynamics. To support this claim, we show in Fig. 4.4 that changing
valley width or mutation rate can affect the relationship between Tfix , Tst , and Tnd , and show
how these changes correlate with changes in rates of adaptation Tobs . Conditions that make
the valleys easy to cross limit the observable impact of FFA because valley-crossing already
occurs rather frequently without reduced selection. Conversely, conditions that make valleys
harder to cross (lower mutation rate and/or wider valleys) intensify the degree to which FFA
appears to accelerate adaptation because valley-crossing occurs less frequently without the
aid of FFA.
4.3.3      FFA does not require spatially-structured populations
FFA is easiest to understand in the context of spatial populations, where the leading edge of
a sweep experiences reduced selection, as in Fig. 4.1. However, in Fig.4.3c, we also observe
the effects of FFA in well-mixed populations where there is no leading edge. Indeed, selective
sweeps in well-mixed populations are slightly different from sweeps in spatial populations.
    The underlying mechanism of FFA in well-mixed and spatial populations is fundamentally
the same; FFA is the reduced selection pressure on those members of the population that
carry a beneficial mutation, as a result of their competitive advantage over others in the
population. In a spatial population, this reduction in selection happens along the leading edge,
                                               83


because this is precisely where the competition occurs. In well-mixed populations, we must
abandon the notion of where competition happens; competition happens between everything,
everywhere, all at once. Instead of leveraging a spatial intuition, we can understand FFA in
well-mixed populations by considering that any descendants of an advantaged genotype that
acquire a deleterious mutation may still have a selective advantage over the average fitness
for a limited time. During this window of opportunity, the remaining advantage can allow
for further deleterious mutations to be accumulated on the lineage, so long as these multi-
mutants continue to have an advantage relative to the average fitness. As the unmutated
beneficial genotype goes to fixation, the average fitness will rise, and the temporary selective
advantage of a multi-mutant is lost. However, if before this happens, one of the multi-mutants
discovers another beneficial mutation (i.e., crosses a fitness valley) these dynamics will repeat
as the new beneficial type goes to fixation.
    Generally speaking, well-mixed populations experience dramatically faster fixation times
than their spatial counterparts when controlling for population size, and for this reason the
FFA window in well-mixed populations is comparatively short. In order for FFA to result
in accelerated rates of adaptation, Tffa must be correspondingly short so that fitness valleys
can be crossed before the window of opportunity closes.
4.3.4      FFA is a non-equilibrium phenomenon
Because FFA is a condition created by a population undergoing a selective sweep, FFA
is strictly a non-equilibrium effect of population genetics. While there are mathematical
tools to investigate non-equilibrium dynamics in populations (notably, the Fokker-Planck
equation [Risken, 1989]), for all but the most trivial dynamics (such as neutral drift), only
stationary solutions to the Fokker-Planck equations are available in closed form. Likewise,
most computational simulations of evolution (for example, replicator-mutator equations) rely
on methods that assume a constant strength of selection (see also [Weissman et al., 2009]).
Those methods will also fail to observe FFA. In agent-based simulations, on the other hand,
the selective forces on an organism are an emergent property: they are determined by the
                                               84


 immediate competitors (in a well-mixed or structured group) and can change dramatically
 during selective sweeps.
 4.3.5      Range expansion is a special case of FFA
 Range expansions occur when a population discovers and expands into new geographic
 territory. It has been suggested that range expansion can result in reduced selection on the
 leading edge and enhance valley-crossing [Burton and Travis, 2008], similar to our observations
 of FFA during selective sweeps. However, range expansion differs from selective sweeps in two
ways. First, a range expansion requires unpopulated territory for a population to expand into,
while FFA results from selective sweeps. In a selective sweep, the more fit genotypes overtake
 the existing population, so no new territory is necessary. Second, because the enhanced
 ability for valley crossing of a range expansion ends when the new territory is full, cascades
 of genetic improvement (like those seen in Fig.4.3j) are not possible with range expansions
 alone, except under very special conditions. One such condition is an environment that is
 structured such that multiple ranges, each more difficult to inhabit than the last, are available
 to the population, as in the MEGA plate experiment [Baym et al., 2016].
     Despite these differences, there are clear similarities between FFA and range expansion
 dynamics. In fact, if we consider empty geographic territory to be occupied by a zero-fitness
“virtual wild-type” (i.e., the environment presents no resistance to expansion), we can see
 immediately that FFA predicts the dynamics of range expansion. Thus, FFA appears to be
 a general explanation for the reduced selection during range expansions that also includes
 cases where the wild-type fitness is non-zero. In addition, if a valley is crossed during a range
 expansion, then the resulting selective sweep causes reduced selection via FFA and could
 result in a FFA cascade.
                                                85


4.3.6      FFA can lead to intermittent cascades of adaptation sepa-
           rated by periods of stasis
We have shown that under certain conditions, FFA can result in irregular rates of adaptation,
as shown in Figs. 4.3h-m. In particular, Figs. 4.3j and l show a punctuated rate of evolution
where some valley crossings occur as single events, but others are clustered, temporally, in
cascades. The phenomenon of long cascades, with long waiting times in between, is often
described as punctuated equilibrium (PE) [Gould and Eldredge, 1977].
    FFA describes how strength of selection shifts when the population moves between periods
of equilibrium and periods of non-equilibrium. Another concept that invokes shifts in selection
strength is Shifting Balance Theory (SBT), based on work by Wright [Wright, 1932, Wright,
1982], that explains how isolated subpopulations experiencing reduced selection are more
likely to cross fitness valleys. Gould and Eldredge [Gould and Eldredge, 1977] proposed SBT
as a potential explanation for the variable rates of evolution observed in natural systems
during PE. However, SBT has been criticized for being so specific a mechanism that the
circumstances that allow it to occur must be rare [Gavrilets, 2003]. In a recent article, it
has been suggested that the temporary shift in selection strength that occurs during range
expansions should be used to extend SBT [Johnson, 2008]. Along these lines, we suggest that
FFA resulting from selective sweeps should also be considered as part of SBT. With range
expansion and FFA as additional explanations for temporary shifts in selection strength, SBT
not only becomes a more general theory, but also becomes a far more reasonable hypothesis
for PE.
4.3.7      The complexity of natural fitness functions does not preclude
           FFA
In the simulations shown here, gene interactions and multiple mutations on a single offspring
(in a single event) are not possible, and diminishing returns are not present in the fitness
function. While our system is ideal to demonstrate that FFA exists, such a system will never
                                               86


be found in nature. Still, the simulation does not have any properties that could not be
found in a naturally occurring instance of evolution. In fact, the dissimilarity between the
simulation presented here, and natural systems stems from the lack of elements that are
commonly expected in natural systems. In the remainder of this section, we consider how
evolutionary dynamics of FFA would be affected by the inclusion of some features common
to natural populations.
     Allowing multiple mutations on the same organism in a single mutational event will reduce
Tst , Tffa , and Tnd because it makes crossing a valley easier. As a result, permitting simultaneous
mutations (or any change that affects valley-crossing times), will affect the location of the
boundaries between regions in Fig. 4.5, but will not otherwise affect the dynamics of FFA,
since FFA is the result of newly discovered fitness advantages, not specifically valley-crossing.
     Populations evolving on any fitness function (not just the one we have used here) will still
experience FFA dynamics. However, testing evolution on a landscape other than the one we
used in my simulations prevents us from distinguishing valley crossings that are shortened
by FFA from the regular valley crossings that occur when the population is in equilibrium.
For example, if we were to use a fitness function with diminishing returns, then because
the strength of FFA depends on the fitness gains of beneficial mutations, there will be a
diminishing effect of FFA over evolutionary time.
     Multidimensional (multi-trait) fitness functions provide additional opportunities for FFA
in the form of more mutational degrees of freedom, and more opportunities to discover
beneficial mutations overall. In this work, the fitness function is one-dimensional (a single
trait), so there is only ever one valley available for exploration. However, in a multidimensional
fitness function many directions for exploration are always present, and each may present
a different fitness challenge. Interestingly, FFA resulting from a discovery on one trait will
provide a reduction in selection strength that could allow another possibly unrelated trait
to drift toward a new peak. This is particularly interesting if we consider a scenario where
crossing an easy valley on one dimension could provide a large enough benefit to cross more
                                                   87


prohibitive fitness valleys on other dimensions. Thus, a FFA cascade that accumulates a
large net benefit before fixation could eventually result in crossing a valley that would be
strictly impassable in the absence of FFA.
    One consequence of multidimensionality is the possibility for gene interactions (epistasis),
the condition in which the allele of one gene alters the fitness effect of an allele of another
gene. A system with gene interactions will still experience FFA, but the gene interactions
will create confounding signals that make detecting FFA challenging. When two beneficial
mutations have a combined fitness benefit larger than the sum of their individual benefits,
the resulting fitness advantage will strengthen FFA; the larger benefit will cause a bigger
reduction of the selection strength. The opposite can also happen: two beneficial mutations
together may have a weaker fitness advantage than the sum of their benefits. In this case,
FFA is weakened by the loss of fitness. These kinds of gene interactions, and others, make
clearly observing FFA harder, but they do not preclude FFA from occurring.
    In this work, we use asexual populations, and so the rate of adaptation in region III
is reduced due to clonal interference. Sexual populations could, in theory, avoid clonal
interference via recombination, and thus experience quicker fixation times in region III than
asexual populations. Sex has also been shown to increase the strength of purifying selection,
making valley-crossing harder overall [Kondrashov, 1988,Rice and Chippindale, 2001] relative
to asexual populations. These changes may affect the observed rates of adaption in sexual
populations, but will not affect the dynamics of FFA.
4.3.8       Observing FFA in natural systems
We have argued that FFA dynamics are an unavoidable feature of evolutionary dynamics, even
though its ramifications are not always felt. We thus suspect that some natural populations
experience cascades of adaptation mediated by FFA. However, it is likely very difficult to
disambiguate FFA-induced spurts of adaptation from confounding effects, such as multiple
adaptations made possible by a change in the external environment. To make an unambiguous
case, it is probably necessary to have detailed genomic and fitness data on the line of descent,
                                               88


documenting that the population crossed valleys that standard population genetics would
deem uncrossable. However, it is not hard to imagine scenarios where FFA could play a
major role.
    In the evolution and spread of cancer, for example, multiple somatic mutations appear
in rapid succession. It is not unreasonable to hypothesize that a single (or several) “driver”
mutations [Martı́nez-Jiménez et al., 2020] create a non-equilibrium environment of reduced
selection that leads to the accumulation of multiple somatic mutations that, in turn, can
cross valleys that otherwise could not be crossed.
    Viral dynamics provides another possible venue to look for manifestations of FFA. Pan-
demics are often started by a beneficial mutation that results in a selective sweep, often
with global reach (as in the SARS-Cov2 pandemic) and multiple waves of variants can sweep
through local populations. Indeed, a punctuated pattern of evolution has been observed
in influenza evolution [Wolf et al., 2006]. Given that, in the case of SARS-Cov2, tens of
mutations can accumulate within a single immunocompromised individual (where selection
is relaxed) [Pedro et al., 2021,Cele et al., 2022], it is not unreasonable to imagine that during
the initial period of a strong wave, FFA might sufficiently reduce selection on the sweeping
variant such that deleterious mutations can rapidly accumulate and lead to new variants.
4.4       Conclusion
We have introduced the Free-for-All effect: a new non-equilibrium component of evolutionary
dynamics that affects the tempo and mode of evolution. We have discussed the conditions
that make it possible to detect the impact of FFA on evolutionary rates, and demonstrated
how FFA can result in cascades of discoveries of beneficial mutations. The FFA effect, and
the cascades that can result from FFA, provide a novel explanation for the observation of a
punctuated pattern of evolution (a.k.a., punctuated equilibrium).
    If a very large fitness advantage is discovered, as a result of a FFA cascade or any other
means, FFA can become so strong that selection is removed almost completely. Valleys of
                                                89


unfathomable depth could be crossed, and entirely new body plans could conceivably emerge
that are unimaginable under standard evolutionary dynamics, leading to further explosions
of diversity [Gould, 1989]. At the same time, the diminishing quantity of beneficial mutations
in the vicinity of local optima make fixation ever more likely, gradually bringing all cascades
to an end eventually. Thus, FFA provides an elegant explanation for the observation of
punctuated evolution: FFA explains why cascades occur (reduced selection strength during
selective sweeps), and why cascades come to an end (the return to strong purifying selection
after fixation). Furthermore, since FFA is active during all selective sweeps, it offers a
mechanism for shifts in selection strength that does not carry the burden of requiring rare
circumstances or environmental effects.
    Free-for-all suggests that any time a system undergoes change, it enters an
unstable state that may result in more change.
    While this work studied the impact of FFA on the evolution of a single trait, it is
immediately clear that mult-trait landscapes will only enhance this effect, as a selective
sweep due to the improvement of one trait can reduce selection on another trait. Cascades
that occur in parallel (rather than serially) are expected to lead to larger fitness gains, more
powerful reductions in selection strength, and therefore to even more pronounced periods of
rapid adaptation. These will be studied in future work.
4.5       Methods
The experiments in this work were conducted using the MABE framework, a modular re-
search tool that supports the development and execution of agent-based digital evolution
research [Bohm et al., 2017].
4.5.1      Computational model
The experiments in this work are designed to prohibit gene interactions, multiple mutations
on a single offspring, and diminishing returns in the fitness function.
    The digital organisms in this work consist of a genome with a single locus that encodes
                                               90


the mutation number k. This allele mutates with a probability µ; mutations result in k + 1
or k − 1 with equal probability. We use a periodic fitness function (Fig. 4.2) that can be
written in terms of k and the valley width v as
                                 w(k) = (1 + sb )n (1 − sd )m ,                           (4.2)
where the value m = k (mod v) determines how far the genome is within the valley, and
      k−m
n=     v
           simply numbers the peaks that have been climbed. This is how the function is
presented in the main text. For computational purposes, an alternative definition is used.
The alternative form of the fitness function is given by
                                       w(k) = rBn−Dm ,                                    (4.3)
where r is a parameter that controls the strength of selection. B and D are defined in relation
to sb and sd : sb = rB − 1 and sd = 1 − r−D . All the experimental configurations for the
fitness function and evolutionary algorithm are provided below in terms of v,r,B, and D.
    We carry out evolution experiments using three population structures: a one-dimensional
array with periodic boundary conditions, a regular two-dimensional grid with periodic bound-
ary conditions, and a well-mixed population. In the one-dimensional population structure,
each agent is in the center of a 1 × 3 neighborhood and has two neighbors. In the two-
dimensional population structure, each agent is in the center of a 3 × 3 neighborhood and has
eight neighbors. The well-mixed populations are structured identically to the one-dimensional
populations, with the modification that once per generation immediately before parent se-
lection the location of each agent is randomized to mix the population, effectively removing
the dimensionality. We still only allow each organism to have at most two offspring (as in
binary fission) to avoid the explosive growth rates that usually accompany a truly fitness-
proportional, well-mixed population. We use a generational evolutionary model where the
population size is fixed, and the entire population is replaced every generation. During re-
                                              91


production, a parent is selected from each neighborhood using fitness-proportional selection.
The offspring is placed at that neighborhood’s center location in the next generation.
4.5.2     Data collection
We measure four properties of the evolutionary system: the time it takes a stationary clonal
population to cross a fitness valley Tst , the time to cross the same valley without purifying
selection Tnd , the fixation time of a beneficial mutation Tfix , and the time to cross fitness
valleys when the system is allowed to evolve for a long period Tobs . To record Tst , we initialize
a population at a local optimum. When we detect that an agent has discovered the next
higher peak and has progeny after an additional 100 generations (i.e., it has established)
we record the time in generations of the discovery. We use the same process to record Tnd ,
but with an altered fitness function where the valley has no depth (sd = 0). To record Tfix ,
N − 1 agents (where N is the population size) are initialized on a local optimum, with the
remaining agent initialized on the next higher peak and tagged with a heritable marker. The
simulation is run with µ = 0, until either all organisms have the marker (fixation) or none do
(extinction). Upon fixation, the time in generations is recorded. On extinction, we disregard
the sample. For Tst , Tnd , and Tfix each time a recording is made the population is re-initialized
and more data is collected. To record Tobs , we initialize a population on a local optimum
and let it evolve. We use the same method of detecting the discovery and establishment of
valley-crossing events used when recording Tst and, Tnd , however, rather than re-initializing
the population after each valley crossing, we record the event and allow the simulation to
continue.
4.5.3     Experimental parameters
Experiments are run for a fixed number of CPU hours (4 hours unless otherwise noted).
Table 4.1 organizes all the experimental conditions into a single reference.
                                                 92


 Population structure           1-dimensional array w/ periodic boundary
 Population Sizes               101 to 105
 Mutation rates                 0.05, 0.1, 0.15
 Selection strength (r)         1.5
                                                4
 Fitness function parameters    B = 7, D = v−1    , v = [4, 5, 6]
 Population structure           2-dimensional grid w/ periodic boundary
 Population Sizes               4 × 102 to 25 × 105
 Mutation rates                 0.1
 Selection strength (r)         1.35
 Fitness function parameters    B = 10, D = 0.5, v = 6
 Population structure           Well-mixed
 Population Sizes               102 to 25 × 105
 Mutation rates                 0.08
 Selection strength (r)         2
 Fitness function parameters    [B = 4, D = 1.3, v = 3]
          Table 4.1: Experimental conditions used to generate all data in this report.
4.5.4     Distribution plots
In the Section 4.2, we show plots that show distributions of Tst , Tnd , Tobs , and Tfix . We create
the distribution plots by binning the data using variable-sized bins with ranges defined by
Eqn. (4.4) and then plotting the resulting histograms on a log/log scale.
                            bini = [1.15i , 1.15i+1 ) for i ∈ [0, 100]                         (4.4)
                                               93


 Chapter 5
Augmenting evolutionary algorithms
with Super explorers
This chapter is based on the article “Augmenting Evolution with Bio-Inspired ‘Super Explor-
 ers’” [Ragusa and Bohm, 2022].
 5.1      Introduction
All search and optimization algorithms, including both natural selection and computer models
 of evolution, are subject to the fundamental limitations of the no-free-lunch theorems [Ho
 and Pepyne, 2002], and particularly to the explore-exploit tradeoff [Millidge et al., 2021].
 Managing this tradeoff is typically a main concern in the development of evolutionary algo-
 rithms. In this work, we introduce the “super-explorer method” for augmenting selection
 methods used in evolutionary algorithms. The super-explorer method allows us to tune the
 trade-off between exploration and exploitation to better align with the ruggedness of a fitness
 landscape.
     The super-explorer method augments other selection methods by adding agents (the
“super explorers”), that are allowed to drift (are not subject to selection). Discoveries made
 by super explorers are immediately available for refinement by the underlying algorithm.
                                               94


In essence, super explorers allow for the simultaneous exploitation of new discoveries while
exploration continues in parallel. We compare three implementations of the super-explorer
method across a wide range of configurations. We test each implementation on two different
kinds of fitness landscapes: an NK-fitness landscape [Kauffman et al., 1993], and a saw-tooth
fitness landscape [Ragusa and Bohm, 2021]. We show that for both types of fitness function,
there are configurations where evolution augmented with super explorers preforms better
than without.
5.1.1      Biological inspiration
Natural selection does not act with consistent strength across time and space; Shifting balance
theory [Wright, 1932, Wright, 1982], range expansion [Slatkin and Excoffier, 2012, Peischl
et al., 2013, Peischl and Excoffier, 2015, Peischl et al., 2015, Gilbert et al., 2017, Burton and
Travis, 2008], environmental noise [Wang and Zhang, 2011, Van Egeren et al., 2018, Ragusa
and Bohm, 2021], population size changes [Jain et al., 2011, Ochs and Desai, 2015, Rozen
et al., 2008], sexual selection [Bohm et al., 2019], and mass-extinction events [Mathias and
Ragusa, 2016, Engholdt and Mathias, 2021] all describe scenarios where selection strength
changes over time, space, or both.
    As a consequence of these fluctuations, natural populations experience both increased
and decreased effectiveness at exploring the fitness landscape over time. Furthermore, during
the periods of increased exploration, the population may discover new fitness peaks that,
under stronger selection, would have been unlikely or impossible to discover. Range expansion
events are one particular example of a process that can cause a reduction in selection strength.
As a species enters new territory, a lack of competition can result in an accumulation of
deleterious mutations in organisms on the leading edge of the expansion for as long as
uncolonized territory remains [Burton and Travis, 2008]. The continued accumulation of
mutations in one lineage can result in adaptation via valley-crossing, a process known to be
critical for evolutionary adaptation [Covert et al., 2013, Oliveto et al., 2018].
    In this chapter, we present a method designed to augment existing optimization algorithms
                                                95


with the exploration-boosting power of range expansion, without requiring that populations
have spatial structure. In addition, our method is able to maintain the exploratory advantages
of the leading edge of a range expansion indefinitely. The key insight underlying our method
is that for sufficient exploration to occur, mutations must be able to accumulate in some—
but not all—lineages before purifying selection acts on them. To achieve this, we introduce
super explorers to the population. These agents artificially experience the same mutation-
accumulation as an organism at the leading edge of a range expansion.
5.1.2      Similar algorithms
Super explorers are agents that are set aside from the main population and always have exactly
one offspring every generation. Since the lineages of super explorers propagate regardless
of fitness, they experience sustained genetic drift. The super explorer method shares some
similarities with other algorithms that ignore the fitness gradient to boost the effectiveness
of a search algorithm.
    Particle Swarm Optimization (PSO) [Poli et al., 2007] is a population based search
algorithm that models agents as particles and simulates forces that attract the particles to
the highest observed fitness (analogous to selection). However, the individual particles in
PSO do not move directly to the higher gradients, but instead chaotically trend towards it,
often exploring areas of the fitness landscape more distant from known optima.
    Lexicase selection [Helmuth et al., 2014,La Cava et al., 2016] and real-valued Tournament
selection [Ragusa and Bohm, 2021] are examples of evolutionary algorithms that probabilis-
tically ignore some or all fitness gradient information. Novelty search [Lehman and Stanley,
2011] is an evolutionary algorithm that foregoes following the fitness gradient altogether and
instead focuses on collecting a catalog of functionally distinct solutions, evaluating them for
relevance to the fitness function ex post facto.
    Systems augmented with super explores are different from the examples above in key ways.
While super explorers do experience a form of occasional selection during the decay-replace
process (described in Methods), they do not individually experience a fitness gradient, even in
                                                96


the form of selection for novelty. While some other methods, such as lexicase and real-valued
Tournament, can occasionally allow for several generations of drift on some lineages, super
explorers are maintained separately from the main population, ensuring their lineages are
repeatedly free of selection.
5.2       Methods
5.2.1      Augmenting evolution with super explorers
Super explorers can augment any well-mixed discrete-generation agent-based selection method
(compare Algorithm 3 with Algorithm 4). Super explorers are added to a selection method
by dividing a population into two mutually exclusive pools, the active selection pool (or
ASP ) and super-explorer pool (or SEP ) (see Fig. 5.1). The ASP evolves via the rules of an
externally defined selection method (such as, roulette, tournament, lexicase, etc.), except
that parents can be drawn not only from the ASP, but also from the SEP. On the other
hand, the agents in the SEP each produce one offspring, with the standard mutation load,
regardless of fitness. While lineages in the ASP survive and perish following the rules of
the selection method in use, lineages in the SEP end only by decay. Whenever an agent
in the SEP is about to reproduce, there is a chance (the decay rate) that the agent will be
removed and replaced by some other agent in the population. The number of agents in the
ASP and SEP, the decay rate, the method used to replace decayed SEP agents, the mutation
settings, and the selection method used in the ASP (and related settings) are all parameters
established by the user.
    Because the super explorers reproduce regardless of their fitness, they are able to accu-
mulate mutations without consequence. The decoupling of survival from fitness allows super
explorers to explore the fitness landscape in directions that selection might prohibit; they
may descend into, and possibly cross, fitness valleys. The mutational paths of super explorers
are, in fact, random walks (i.e., undirected and unfocused), so they are an inefficient search
algorithm on their own. However, by introducing super explorer decay and replacement to
                                               97


      Time
          t          Active Selection Pool      Super Exporers Pool              Global Max
                             ASP                       SEP                          GM
                   selection +                        copy +
                     mutation                        mutation                   copy
                                                           M
                                                         yG
                                                       ca
            t+1      Active Selection Pool           de
                                                Super Exporers Pool              Global Max
                             ASP                       SEP                          GM
Figure 5.1: A diagram of an evolving system augmented with super explorers. Here, the population is
divided into the active selection pool (ASP) and the super-explorer pool (SEP). While selection in the ASP
(green arrows) is determined by a standard selection method (such as roulette, tournament, lexicase, etc.)
with parents drawn from the ASP and SEP, agents in the SEP are free from selection and simply accumulate
mutations (red solid arrow). From time to time, agents in the SEP decay and are replaced. Replacements (red
dashed arrows) are drawn from either the ASP and SEP, or the global max, depending on the replacement
method, where the global max maintains a copy of the highest scoring agent seen to date (blue arrows).
the algorithm, we provide a degree of focus to help keep the search process “on track.” The
                                   M/PP
                            decay P
search pattern of the super explorers can be seen as a genotypic radiation where the center of
the radiation is determined by the replacement method, the radiation distance is determined
by the decay rate, and the density of the search is determined by the number of agents in
the SEP.
    Since the selection method used to choose parents in the ASP has access to both the
ASP and the SEP, super explorers can migrate into the ASP if their fitness is competitive
(i.e., if a super explorer drifts across a valley, they can be discovered by and incorporated
into the ASP). Thus, a selection method augmented with super explorers has the advantage
of simultaneously exploring freely (in the SEP) while exploiting discoveries (in the ASP),
compared to a non-augmented method that tries to balance the explore-exploit trade-off with
only an ASP.
                                                    98


  Algorithm 3: A typical Evolutionary Algorithm
    Result: A population of adapted digital organisms.
    population = [Organism() for in range(popSize)];
    for g in range(numGenerations) do
        for org in population do
           evaluateFitness(org);
        end
        newPopulation = [];
        while len(newPopulation) < popSize do
           newPopulation.append(mutate(selectFrom(population)));
        end
        population = newPopulation;
    end
    return population;
5.2.2      Tunable parameters of super explorer systems
When employing the super explorer method, the user must first choose a population size,
selection method and mutation scheme. These choices, as well as the tuning of any associated
parameters, will have significant consequences on rates of adaptation. Regardless, these
choices have no bearing on the super explorer parameters themselves. As long as the selection
method is agent-based, well-mixed, and has discrete-generations, it can be augmented with
super explorers as presented in this work. Some extensions of super explorers, which allow
for the relaxation of some of these constraints, are addressed in the discussion.
    There are only three parameters that relate directly to super explorers: SEP size, decay
rate, and replacement method, which we describe below.
5.2.2.1    Parameter: super-explorer pool size
SEP size can be set to any integer value, from 0 to population size, and controls the intensity
of the search conducted by the super explorers. Note that the ASP size is not a parameter
and is simply defined as population size minus SEP size.
                                              99


  Algorithm 4: An Evolutionary Algorithm Augmented with Super Explorers
   Result: A population of adapted digital organisms.
   population = [Organism() for in range(popSize)];
   superExplorers = [Organism() for in range(sepSize)];
   for g in range(numGenerations) do
        for org in [population+superExplorers] do
           evaluateFitness(org);
        end
        newPopulation = [];
        while len(newPopulation) < popSize do
           newPopulation.append(mutate(selectFrom([population+superExplorers])));
        end
        population = newPopulation;
        newSEP = [];
        for org in superExplorers do
           if random() ≤ decayProbability then
               newSEP.append(mutate(selectFrom([population+superExplorers])));
           end
           else
               newSEP.append(mutate(org)));
           end
        end
        superExplorers = newSEP;
   end
   return population;
5.2.2.2    Parameter: decay rate
The decay rate is the per-generation probability that each agent in the SEP is subject to
replacement, and directly controls the explore-exploit trade-off of the SEP. The decay rate
can be set to any value r, such that 0 ≤ r ≤ 1. The decay rate determines the average
amount of time, t̄ ∼ 1/r, that a super-explorer lineage persists before replacement, which in
turn determines how far super explorers can drift (i.e., explore). While lower decay rates can
result in more exploration, they can also be wasteful if the valleys in the fitness landscape
do not require such distant explorations.
                                            100


5.2.2.3     Parameter: replacement method
The replacement method defines the process used to replace a decayed agent in the SEP. In
this work, we consider three replacement methods, shown in Fig. 5.1, though many others
are possible.
    The first method, the PopSelect (PS) method, replaces decayed SEP agents with a new
agent generated using the same selection process used for reproduction in the ASP.
    The second method, the PopMax (PM) method, replaces decayed SEP agents with the
max of the population (i.e., the max of all the agents in the combined ASP and SEP).
    The third method is the GlobalMax (GM) method. In addition to the ASP and SEP,
this method requires that the global max, a copy of the highest fitness agent seen to date in
either pool, is maintained and used to replace decayed SEP agents.
5.2.3      Fitness functions
We use two types of fitness functions to collect performance data for the parameter sweeps.
First, we chose a typical NK-landscape to provide an intuition regarding the performance
of super explores in a familiar context. Then, we turn to a saw-tooth function to provide a
different perspective on the operation of the super-explorer method under more controlled
conditions.
5.2.3.1     Fitness function: NK-landscape
The first fitness function is the well studied NK-landscape [Kauffman et al., 1993]. This land-
scape has a rugged topology and a global optimum. The NK function, NK eval(i, gi |N, K),
decides the fitness contribution of the i-th gene, gi . The fitness landscape is randomly gener-
ated with parameters N = 20 and K = 5 which control genome size, G = {g1 , ..., gN }, and
strength of epistasis (in our implementation, K = 1 represents no epistasis), respectively.
The score assigned to each organism by NK is given by
                                              N
                                           1 X
                              score(G) =         NK eval(i, gi |N, K)                      (5.1)
                                          N i=1
                                                101


5.2.3.2    Fitness function: saw-tooth landscape
The second fitness function, the saw-tooth fitness function, [Ragusa and Bohm, 2021] defines
a saw-tooth mapping function from genomic values to scores along an infinite set of ever
higher fitness peaks (like the teeth of a saw) as shown in Fig. 5.2.
    The saw-tooth landscape has two special properties. First, it has no global optimum.
Second, every fitness valley is self-similar; regardless of absolute position on the fitness
landscape, the difficulty of descending into and the benefit of crossing every valley is the same.
These two properties together result in a landscape that presents a constant challenge to an
evolving population, without diminishing returns (an idea associated with Fisher’s geometric
model [Orr, 2005]). In this landscape, agents have 10 independent genes, G = {g1 , ..., g10 },
where each gene is a single integer value. Applying the mapping function to each gene results
in 10 gene scores that are summed to produce the agent’s overall score. The saw-tooth
function saw(x|w, p, b) is specified by a valley width (w), a fitness penalty per mutation into
each valley (p), and a fitness benefit per valley crossed (b):
                                              X10
                                  score(G) =      saw(gi |w, p, b)                           (5.2)
                                              i=1
Here we test four versions of saw-tooth functions. In all four, p = −0.05 and the b = 1.0. We
varied w to create four versions of the function with different difficulties, from w = 4 (easy)
to w = 7 (hard). These functions are shown in Fig. 5.2.
5.2.4     Roulette selection
We used roulette selection in the ASP. Roulette selection is known to suffer from a diminishing
selection pressure as the absolute value of agents’ scores increases during evolution. In order
to ensure that equivalent relative increases to score result in the same relative increase in
offspring production, the scores generated from both the NK-landscape and the saw-tooth
landscape are exponentiated before selection occurs.
                                               102


              a                  b                       c                    d
            5 width 4            width 5                 width 6              width 7
            4
            3
    score   2
            1
            0
              0 4 8 12 16 20     0 4 8 12 16 20 0 4 8 12 16 20 0 4 8 12 16 20
               "genome value"     "genome value" "genome value" "genome value"
Figure 5.2: The saw-tooth functions that map gene values to “scores”, used in Fig. 5.4. Panels [a] through
[d] show the valley widths (w) 4 through 7, respectively. The values for penalty (p = −0.05) and benefit
(b = 1.0) are the same in all 4 functions.
                                        f (G) = exp(score(G))                                       (5.3)
5.2.5        Experiment conditions
In order to determine the effectiveness of super explorers, we compare roulette selection
with and without super-explorer augmentation on two classes of fitness function (NK and
saw-tooth). We use a total population (ASP + SEP) size of 1024 in all experiments. For
each combination of selection regime and function, we run a three-dimensional sweep of SEP
size, decay rate, and replacement method. We run decay rates {0.001, 0.002, 0.005, 0.01,
0.02, 0.05, 0.1, 0.2, 0.5, 1.0}, pool sizes {0, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024}, and
replacement methods {GM, PM, PS}.
    We run 101 replicates for each condition using the NK-fitness landscape (N = 20 and
K = 5) for 10,000 generations and record the final maximum fitness from each replicate.
Each replicate evolves on the same NK landscape, which is randomly generated a priori. A
bit genome was used where mutations are introduced with a bit-flip mutation operator, with
a 0.0005 per-site mutation rate (i.e., a 0.01 per-agent mutation rate).
    We run one replicate for each condition using the saw-tooth landscape for 40,000 genera-
tions and record the number of valleys crossed. In addition to the parameter sweep described
                                                   103


above, we used four saw-tooth functions with w = {4, 5, 6, 7} (shown in Fig. 5.2). An integer
genome was used where mutations are introduced with a point-offset mutation operator that
modifies a genome value by ±1, with a 0.05 per-site mutation rate (i.e., a 0.5 per-agent
mutation rate, as each agent has 10 loci).
    The MABE evolution framework was used to run experiments [Bohm et al., 2017].
5.3       Results
5.3.1      NK landscape
Figure 5.3 displays the data collected from 101 replicate evolutionary experiments on an
NK-landscape with parameters N = 20 and K = 5. The figure presents three color maps
(labeled [1], [2], and [3]) each showing results generated using a different replacement method
(GM, PM, and PS). The cells in each color map show the averages, across replicates, of the
maximum fitness detected at the end of each replicate. Note that values in each plot associated
with SEP size = 0 are the same, since decay rate only matters if there are super explorers.
The fitness differences between the max-fitness, min-fitness, and control configurations of
each panel are computed and checked for significance with a two sample z-test (shown in
Table 5.1).
    The NK-landscape results show that the GM and PM methods provide similar results
that are quite different from the PS data. In the GM and PM results, we see that the addition
of any super explorers improves performance under almost all conditions. In addition, as
SEP size increases, low decay rates tend to improve adaptation while high decay rates tend
to have the opposite effect. In fact, a population made up entirely of super explorers (SEP
size = 1024) combined with decay rate = 1.0, shows little to no signs of improvement over a
non-augmented population.
    We note a band of yellow color in the PS data, highlighting the conditions with the
greatest final scores. This band appears to show a trade-off between SEP size and decay rate.
To the right and above the band, we see values that conform closely to the non-augmented
                                                104


                1   GlobalMax              2       PopMax                3      PopSelect
            0
            2
            4                                                                                        0.74
            8
    SEP size
           16                                                                                        0.72
           32
           64
                                                                                                     0.70
          128
          256
          512                                                                                        0.68
         1024
                1.0
                0.5
                0.2                       1.0
                                          0.5
                                          0.2                           1.0
                                                                        0.5
                                                                        0.2
                0.1
                0.05
                0.02                      0.1
                                          0.05
                                          0.02                          0.1
                                                                        0.05
                                                                        0.02
                0.01
                0.005                     0.01
                                          0.005                         0.01
                                                                        0.005
                    decay rate                    decay rate                    decay rate
                0.002
                0.001                     0.002
                                          0.001                         0.002
                                                                        0.001
Figure 5.3: Average maximum final fitness on an NK landscape from 101 replicates using super explorers
with three replacement methods: [1] GM, [2] PM, and [3] PS. The color range is set so that red is associated
with the control condition; SEP size = 0 (i.e., no super explorers). The stars and crosses indicate the
highest-fitness and lowest-fitness configuration of each panel, respectively. The largest increase and decrease
of fitness, relative to the control, are shown in Table 5.1.
system, while to the left and below the band, we see that final recorded scores are lower than
the non-augmented system. This area of low values correlates with a large SEP size and low
decay rate.
    Note that the values in the column associated with decay rate = 1.0 in the PS data all
match the values associated with SEP size = 0. In these conditions, SEP agent replacement
happens every generation and uses the same replacement method used in the ASP. As a
result, the two pools act as a single ASP (small fluctuations are the result of sampling noise).
                       Decay Method         Largest Increase      Largest Decrease
                           GM                  +2.76% *               −0.15%
                           PM                  +2.61% *               ±0.00%
                            PS                 +1.51% *              −6.44% *
Table 5.1: The fitness differences between the max-fitness, min-fitness, and control configurations of the
NK-landscape data in Figure 5.3. ‘*’ indicates a p-value of p < 1. × 10−6 (two sample z-test).
                                                     105


5.3.2      Saw-tooth landscape
Figure 5.4 shows color maps illustrating the number of valleys crossed at the end of the
40,000 generations, with agents evolved on saw-tooth fitness functions. Each of the letters
[a] though [d] indicate a different valley width, from 4 to 7. The numbers [1] through [3]
indicate the replacement method used (GM, PM, and PS). Note that values in each panel
associated with SEP size = 0 are the same, since decay rate only matters if there are super
explorers. As we saw in the NK data, the saw-tooth landscape results show that the GM
and PM methods provide similar results that are quite different from the PS data.
    Across all panels of the saw-tooth data, there are vertical trends, associated with SEP
size, and horizontal trends, associated with decay rate. As we move from top to bottom and
introduce a higher ratio of super explorers (larger SEP size) relative to the size of the ASP,
we generally see a smooth transition from SEP size = 0 to SEP size = 1024. Conversely, as
we move from low decay rates to higher decay rates, particularly larger SEP sizes, we see
that the change in performance is not as simple to describe. In all GM and PM results, while
we see that large SEP combined with a high decay rate reduces rates of adaptation relative
to the non-augmented system, the effect of large SEP and low decay rate is not constant.
Rather, in the data for large SEP and low decay rate, the final scores flip from worse than
non-augmented to better as we move from less deceptive to more deceptive landscapes.
    In the PS results Fig. 5.4[a,b,c,d][3], we see a different trend. As mentioned in the NK
data analysis, when decay rate = 1.0 agents in the SEP experience the same selection as
agents in the ASP (i.e., replaced every generation using the selection method from the ASP).
Therefore, as in the NK data, the two pools act as a single ASP (small fluctuations are the
result of sampling noise). Unlike the NK data, we see that larger SEP sizes perform as well as
or better than the non-augmented system, except in the lower left of [a, b].[3] where SEP is
very large, and decay rate is very low. The highest rates of valley crossing appear to correlate
with large SEP sizes and middling decay rates (i.e., around 0.05 to 0.1).
    Fig. 5.4[3] shows the saw-tooth PS result. As in the NK data, the column associated with
                                              106


                            GlobalMax               PopMax                  PopSelect
                  0
                  2
                      a.1                   a.2                       a.3                       10000
                  4                                                                             8000
                  8
          SEP size
                 16
                                                                                                6000
                 32
                 64
                                                                                                4000
                128
                256
                                                                                                2000
                512
               1024
                  0
                  2
                      b.1                   b.2                       b.3                       5000
                  4
                  8                                                                             4000
          SEP size
                 16
                 32                                                                             3000
                 64
                                                                                                2000
                128
                256
                                                                                                1000
                512
               1024
                                                                                                0
                  0
                  2
                      c.1                   c.2                       c.3                       1750
                  4                                                                             1500
                  8                                                                             1250
          SEP size
                 16
                                                                                                1000
                 32
                 64                                                                             750
                128                                                                             500
                256
                512                                                                             250
               1024                                                                             0
                  0
                  2
                      d.1                   d.2                       d.3                       1000
                  4
                  8                                                                             500
          SEP size
                 16
                 32
                                                                                                0
                 64
                128
                256                                                                                 500
                512
               1024                                                                                 1000
                      1.0
                      0.5
                      0.2                  1.0
                                           0.5
                                           0.2                       1.0
                                                                     0.5
                                                                     0.2
                      0.1
                      0.05
                      0.02                 0.1
                                           0.05
                                           0.02                      0.1
                                                                     0.05
                                                                     0.02
                      0.01
                      0.005                0.01
                                           0.005                     0.01
                                                                     0.005
                            decay rate            decay rate                decay rate
                      0.002
                      0.001                0.002
                                           0.001                     0.002
                                                                     0.001
Figure 5.4: Total number of valleys crossed after 40k generations on saw-tooth landscapes. Horizontal
panel groups [a], [b], [c], and [d] show data for w = {4, 5, 6, 7} saw-tooth functions, respectively. Vertical
panel groups [1], [2], and [3] show data for replacement methods GM, PM, and PS, respectively. The stars
and crosses indicate the highest-fitness and lowest-fitness configuration of each panel, respectively. Note: the
scales on each color bar (i.e., for each row) are different.
                                                     107


decay rate = 1.0 uses the same replacement method used in the ASP. Since the replacements
occur every generation, the two pools act as a single ASP (small fluctuations are the result
of sampling noise).
5.4       Discussion
In this work, we show that augmenting selection (in this case roulette-wheel selection) with
super explorers can improve rates of adaptation across a range of landscapes and parameter
settings. In this discussion, we are primarily interested in describing the super-explorer
augmentation method.
     The conditions where SEP size = 0 show the behavior of the non-augmented system. In
these conditions, there are no super explorers, so decay rate has no effect. Hereafter we will
refer to these conditions as the ’control’ and they will serve as the baseline for all of our
comparisons.
     In our experiments, we test conditions with different SEP sizes, but we do not alter the
number of agents in the total population (ASP + SEP = 1024) or the mutation inflow per
agent. Therefore, the populations always experiences the same number of evaluations (i.e.,
uses the same amount of computational power) and have the same potential for discovery in
the form of mutations. As a result, differences in performance between the control and other
conditions must be accounted for by other means. Perhaps the observed differences can be
explained in terms of a trade-off between exploration and exploitation.
5.4.1      The explore-exploit trade-off of super explorers
In evolutionary systems (natural or computational) the explore-exploit trade-off explains
how the conditions that allow for effective navigation on smooth fitness landscapes hinder
navigation on deceptive fitness landscapes and vice versa. On a smooth landscape the paths
towards higher fitness are easy to discover, so unfocused distant explorations are wasteful:
it is prudent to spend energy exploiting the local fitness gradient. Conversely, on deceptive
landscapes the paths to higher fitness often require crossing fitness valleys, so focusing on
                                             108


local search only results in short-term superficial gains: it is prudent to spend time on long
shots.
    Our initial inspiration for the super explorer method was the biological phenomenon of
range expansion. There, we observed that the freedom from selection along the leading
edge of a range expansion creates lineages in which mutations could accumulate, while the
remainder of the population experiences purifying selection. To mimic this effect, we designed
a method that divides a population into two pools: the SEP that specializes in exploration,
and the ASP that specializes in exploitation. In practice, we see that either pool can be
set up to specialize in exploration or exploitation, since both the selection method used in
the ASP, and the super-explorer decay rate parameter used in the SEP, can be tuned to
maximize exploration or exploitation.
5.4.1.1    Decay rate
While it may seem intuitive that low decay rates enhance exploration and inhibit exploitation,
and that high decay rates do the opposite, it turns out that explaining why this is the case is
not trivial. There are two effects to consider: 1) the amount of genetic change that a lineage
in the SEP can accumulate before it decays (and is replaced) and 2) the strength of selection
resulting from the rate of replacement.
    Generally speaking, the amount of genetic change that any lineage can accumulate is
related to the strength of selection (which inhibits change), and how long the lineage can
avoid extinction. SEP lineages (unbroken phylogenies in the SEP) are special in that they are
free from selection, and so their potential for genetic change is only limited by how long they
persist, which in turn depends on the decay rate. Meanwhile, the decay rate also establishes
how often replacements occur and, because SEP lineages are free from selection, replacement
is the only selective force at work in the SEP. Hence, decay rates control the frequency of
selection events, and this frequency correlates directly with selection strength. Note that the
conditions likely to enhance exploration, high drift potential and low selection strength, are
conditions generated by low decay rates while the conditions likely to enhance exploitation,
                                               109


low drift potential and strong selection, are conditions generated by high decay rates.
5.4.1.2    Replacement method
The three replacement methods we chose to test correlate with a range of selection strengths.
The GM method provides the strongest selection strength because it always maintains the
best solution. The GM method insures that small fitness changes (potentially undetectable
by the ASP’s selection method) are always exploited, but this comes at a cost: the inability
to forget can inhibit escape from local optima. In fact, when the GM method is combined
with an SEP size equal to population size and decay rate equal to 1.0, the result is elitist
behavior; every agent tests one mutation, and then either becomes the new global max or is
forgotten. Compared to the GM method, the PS method results in weaker selection, which
cannot exceed the selection strength of the ASP’s selection method. This is because at decay
rate = 1.0, all agents in the population are replaced using the ASP selection method every
generation (regardless of SEP size). The PS method represents a middle ground in terms
of selection strength. However, since we are using a relatively large population size relative
to decay rate, forgetting high quality solutions is also unlikely, so the GM and PM methods
generate similar results.
5.4.1.3    SEP size
The SEP size parameter determines how much each pool (ASP and SEP) drives the system.
As SEP size increases, particularly when decay rates are low, there are more low quality
solutions in the total population, which increases the chance that low quality solutions will
be selected during ASP reproduction. As a result, larger SEP sizes result in weaker selection
in the ASP. This provides another mechanism that helps explain why low decay rates can
enhance exploration.
5.4.2      Analyzing the NK data
Fig. 5.3[1,2] show the results of using the GM and PM methods to evolve populations on
the NK-landscape. In these panels, the lower-left corner (SEP size = 1024, decay rate =
                                              110


0.001) maximizes exploration, while the lower-right corner (SEP size = 1024, decay rate =
1.0; i.e., elitism) maximizes exploitation, as described above. In the GM and PM results, we
see that enhancing exploration improves performance, while maximizing exploitation results
in performance that is not significantly different from the control. The fact that the results
of elitism are similar to the control suggest that the ASP is already generating elitist-like
behavior that hinders exploration. It follows then that improved adaptation associated with
lowered decay rates results from increased exploration.
    The sudden jump in fitness from SEP = 0 to SEP = 2 in Fig. 5.3[1,2] occurs because the
GM and PM methods select max perfectly when making replacements into the SEP. Using
the GM method, it is impossible to forget a high-value solution once it has been discovered;
forgetting is only very unlikely when using the PM method. As a result, the GM and PM
methods are able to identify and exploit fitness improvements that the ASP selection method
alone may not detect. Finally, the decrease in fitness along the right-most column (decay
rate = 1.0), from SEP = 2 all the way to SEP = 1024, shows that the behavior of the ASP
alone is not exactly the same as elitism. While the ASP can allow for the accumulation of
small (i.e., nearly-neutral) deleterious mutations, a system experiencing elitism can never
move in a direction that results in any loss of fitness. As a result, the highest fitness occurs
at decay rate = 1.0 and SEP size = 2, indicating that the best performance is generated by
a large ASP (that has the ability to explore, if only locally), augmented by a SEP that does
not forget.
    Fig. 5.3[3] shows the results of the PS method used to evolve populations on the NK-
landscape. In this panel, as in the GM and PM methods, the lower-left corner maximizes
exploration, while the lower-right corner maximizes exploitation. However, in these results,
the entire right side of the panel evolves similarly to the control. The low fitness recorded
in the low left is the result of weak selection that is unlikely to find effective solutions, and
it is likely to forget what it does find. The most interesting feature in this panel is the
yellow band—indicating the highest scores—that appears to show a trade-off between SEP
                                               111


size and decay rate. SEP size = 1024 and decay rate = 0.1 marks the bottom of this band,
and suggests that lineages that survive for about 10 generations are optimal when the entire
population is in the SEP. As we decrease the size of the SEP, we see that lower decay rates,
(i.e., drifting lineages that persist for longer) are optimal. This is because the larger ASP
results in more exploitation and less exploration. Apparently, the trade-off exists because
when there are fewer SEP agents they need more time to drift in order to make discoveries,
but larger ASPs are better at exploiting beneficial fitness gradients.
5.4.2.1      Analyzing the saw-tooth data
We ran four saw-tooth landscapes, representing different levels of deception. Across all panels
in Fig. 5.4 we see results that correlate with maximized exploration on the bottom left and
maximized exploitation on the bottom right. As in the NK-landscape data, the top row
(SEP = 0) shows the non-augmented system (the control). Fig. 5.4[a].[1,2] show the results
of the GM and PM methods on a saw-tooth function with only limited deception. Here
we see that small SEP sizes generate the best performance, while larger SEP size degrades
performance. This indicates that the ASP selection settings are well tuned for this function,
and evolution does not improve with the addition of super explorers. Moreover, elitism results
in the lowest levels of performance, since valley crossing is required to optimize this function.
In Fig. 5.4[b,c,d].[1,2] we find the results from the other three saw-tooth landscapes, each
with an increasing level of deception. As the functions become harder to adapt to, we see
that the control does not produce the best performance because it does not allow enough
exploration. As a consequence, larger SEP sizes become more effective. In Fig. 5.4[d] (the
hardest function), the best performance occurs when the entire population is SEP agents
(SEP size = 1024) and SEP lineages persist for about 200 generations (decay rate = 0.005).
This suggests that navigation on this landscape requires a high degree of exploration. The
fact that the highest performance values are not at the lowest decay rates means that too
much exploration is wasteful. It is worth noting that in Fig. 5.4[c,d][1,2], SEP size = 1024
with decay rate = 0.05 performs about as well as the control. We theorize that the explore-
                                                112


exploit trade-off in these conditions is similar. We believe this hypothesis can be extended
to state that any conditions that share performance values likely also experience similar
explore-exploit trade-offs.
    The results generated by the saw-tooth landscape using the PS method, Fig. 5.4[a,b,c,d][3]
differ significantly from the NK-landscape PS method results (we address this difference in
the next section). As expected, the values along the top and right side of each panel are
the same and correlate with the control condition. Except in the case of the least deceptive
function ([a]) the best performance across the three replacement methods is found in the PS
data. This supports the idea that weakened selection is beneficial for valley crossing on the
harder saw-tooth functions.
5.4.3      The effect of diminishing returns
The NK-landscape suffers from diminishing returns, which means that the ratio of beneficial
to deleterious mutations and the fitness gain per beneficial mutation decreases as fitness
increases. As a result, the optimal explore-exploit ratio changes over the course of evolution.
Strong exploitation initially maximizes the rate of adaptation, but later is likely to result in
getting stuck on local optima. Conversely, stronger exploration, though slower at first, can
achieve higher final performance in the long run, because it is able to escape local optima.
    In order to disambiguate the effects of diminishing returns from other dynamics, we
included the saw-tooth landscapes, which do not exhibit diminishing returns and allow us to
study the behavior of a system with a constant difficulty.
5.4.4      Super explorers alone
The bottom row in each panel in Figures 5.3 and 5.4 corresponds to a configuration where
the entire population is SEP agents; there are no agents in the ASP. The high performance
observed in some SEP-only configurations demonstrates that the combination of drift and
replacement is a viable search process in its own right. In fact, for several of the fitness
functions, a configuration with SEP = population size achieves maximum or nearly maximum
                                              113


performance. We call this new drift-and-replace search process “Drift Pool Optimization” or
DPO.
    There is a surprising similarity between DPO and particle swarm optimization (PSO).
Both DPO and PSO operate as a swarm of particles, as opposed to implementing evolution
by natural selection. The main difference is that while the particles in PSO tend to converge
towards known optima, the particles in DPO originate near current optima and are allowed
to drift. In both cases, the degree to which solutions can diverge from know optima is critical
to success: too little divergence will stifle innovation, while too much diffusion will result in
chaotic behavior incapable of effectively making discoveries.
5.4.5      Super explorers and recombination
In addition to the experiments shown in this work, we tested the saw-tooth functions with
recombination (results not shown). When producing offspring in the ASP and when picking
replacements using the PS method, we select two parents and perform three-point recombi-
nation to generate offspring (while GM and PM method replacements are still asexual). We
found that while scores with recombination tended to be higher, the trends in the data were
almost identical.
5.4.6      Extending the super explorer method
In this work, we presented the super-explorer method, an augmentation that can be used
to enhance other selection methods. Here we present some potential modifications and
extensions. While a super-explorer augmented system is already able to simultaneously
support enhanced exploration (in the SEP) and exploitation (in the ASP), the algorithm
could be upgraded so that it automatically modifies both decay rate and SEP size based on
observed rates of evolution in an adaptive manner, for example by monitoring the time the
population spends in stasis.
    Another possible modification to the algorithm would be to encourage diversity in the
SEP, which would ensure that the SEP explores genetic space more uniformly. Both genetic
                                                114


diversity and phenotypic diversity could be investigated. In addition, a method like MAP-
Elites [Mouret and Clune, 2015] could replace the global max pool and offer a more diverse
alternative to elitism.
    An interesting alternative to super-explorer augmentation would be to simply use another
selection method in place of the SEP. This would allow two selection methods with different
behaviors to synergize. We are particularly interested in investigating augmenting Particle
Swarm Optimization (PSO) with super explorers, and also in using PSO in place of the SEP
to augment other systems.
    There is no reason that a system must be limited to only two selection methods. A
population could be subdivided into any number of pools, each acting as an ASP or SEP
with unique selection methods and parameterization. Automation in the form of pool-size
balancing and parameter adjustments could also be considered.
5.4.7      Using the super explorer method to study evolution
The super explorer method could be used to further our understanding of general evolutionary
processes related to the trade-off between exploration and exploitation. Since the behavior
of the SEP can be related directly to the explore-exploit trade-off, super explorers could
be used to gauge the relative explore-exploit trade-off of various selection methods. In this
work, we only considered a single selection method for the ASP: roulette selection. Other
configurations of roulette-wheel selection, representing different selection strengths, as well
as other selection methods should be tested. This approach could allow us to ask a number
of different questions, such as: how does tournament size affect the explore-exploit trade-off?
5.5       Conclusion
In this work, we introduced a new bio-inspired optimization technique called “the super
explorer method” and we demonstrated the efficacy of the method on a number of deceptive
fitness landscapes. There is a wealth of literature exploring how population size, selection
strength, and mutation rate affect rates of adaptation. In addition to these, we argue that
                                              115


the frequency at which selection is applied and the distance a lineage can drift before it is
evaluated also affect the success of an evolutionary algorithm. Other evolutionary algorithms
have experimented with changing the frequency with which selection is applied (e.g., lexicase
and real-valued tournament) or ignoring fitness altogether (e.g., novelty search), but we
believe we are the first to design a method that intentionally protects lineages in order to
promote discovery by drift.
                                               116


Chapter 6
Conclusions
In this final chapter, I will summarize the main results of this dissertation and discuss the
significance of each result for computer scientists and biologists. I also provide advice on the
design and tuning of evolutionary algorithms based on the findings of my work. I identify
topics that should be the subject of future research, and I discuss the potential I see in
pursuing them. At the end of the chapter, I give my final remarks on the importance of the
work, and highlight the most important lessons learned.
6.1       Measuring the impact of selection
In Chapter 2, I introduced a novel measure of selection strength — the selection impact —
which I define as a population’s ’distance from neutral drift’. The new metric examines a
population’s offspring production and compares it with the offspring production of a neutrally
drifting control population. The selection impact measures the influence that selection has
on the reproductive success of organisms in a population. The new metric is dimensionless
and makes no assumptions about the type or mode of selection, allowing for easy comparisons
between any two measurements, even when comparing between species or modes of selection.
The selection impact is strongest when both the strength of selection is strong and the total
non-neutral phenotypic variation within the population is high. In one test case, I used
                                              117


the selection impact to optimize the mutation rate of an evolutionary algorithm that was
optimizing simple fitness functions; I found the peak impact occurred when the mutation
rate placed the population just on the boundary between the sequential-fixation regime and
the successional-mutations regime [Desai and Fisher, 2007].
    The evolutionary algorithm engineers and population geneticists each have a set way of
thinking about selection that reflects their goals and typical courses of action. Especially
for algorithm engineers, selection is seen as a component, a part of an algorithm that can
be tuned, swapped out, and re-designed, or in a word subordinate to the engineer. This
way of thinking omits the fact that selection is about more than just the selection algorithm;
selection is influenced by many factors that I have enumerated earlier during my review of
shifting balance theory and related phenomena. The selection impact metric is based on
a different way of thinking. The selection impact metric views neutral drift as the default
condition of a population of replicators. Selection, therefore, is anything and everything
that moves the dynamics of the population away from neutral drift. This is, as far as I am
aware, a novel perspective of selection that opens the door for further analysis. I believe
the discussion surrounding a new metric of selection strength, based on this way of thinking,
serves as the starting point for a discussion about selection in general.
    The selection impact metric relies on the experimentalist to first derive the neutral drift
reference distribution for their population of interest. I derived several of these reference
distributions for Wright-Fisher and Moran generational models. There are similarities between
the Wright-Fisher and Moran processes in terms of the way I calculate their drift reference
distributions. For the Moran process, it is necessary to track ancestors and descendants, while
the Wright-Fisher process tracks parents and offspring; one is clearly a special case of the
other. Unfortunately, my method of deriving the Moran drift references is computationally
intensive. Ideally, I would have been able to simplify the calculation of the Moran drift
reference. It may be possible to simplify the Moran calculations by more closely examining
the similarities between the two generational models, though I expect to find that the Wright-
                                              118


Fisher generational model is simpler because it is the special case.
    I chose to define the selection impact as a distance from neutral drift, but I could just as
well have been justified to create a metric that was a distance from the strongest possible
selection. Preliminary work on this approach showed that the pair of metrics, one a distance
from drift and the other a distance from elitism, are not replacements for each other: the
relationship between the two distance metrics is not simply that one is x and the other is
1 − x; there is a non-linear relationship between them. For this reason, I chose not to focus
on the elitism distance in order to refine my understanding of the drift distance. However,
exploring both metrics simultaneously may end up being a fruitful path to take. When the
two metrics are plotted against each other, the fixation of a new beneficial mutation traces
out a loop: the two metrics are not even in phase with one another. This is not true of all
pairs of selection strength metrics, and investigating the metrics that do form loops along
with the structure of the loops themselves may provide insights into the fixation dynamics
of a population.
    Other measures of selection strength, like the ubiquitous Price equation and its gener-
alizations, relate selection strength to the change of macroscopic traits in the population.
During my work on the selection impact, I began looking for ways to incorporate macroscopic
traits into the metric. The foundation of the selection impact is that neutral drift is the null
hypothesis, and this must remain the same while other changes are considered. One solution
I considered is adopting the Fokker-Planck equation (FPE) [Risken, 1989] as the underlying
computational framework for the selection impact. Instead of deriving a drift reference from
first principles, we can instead use an observation of a population together with the FPE to
predict the future state of the population, while assuming only neutral drift. In this tech-
nique, the population is first observed and a distribution of trait measurements is compiled.
Then, using the FPE to simulate neutral drift on the trait frequencies, the distribution of
traits is advanced forward in time by ∆t. Finally, a second measurement of the population is
taken after ∆t has elapsed in real time. The final measurement and the predicted outcome
                                              119


can then be compared using the same method of comparing distributions as the original
selection impact. The difference between the prediction, which assumes neutral drift, and
the observation is therefore the selection impact on the trait in question.
6.2       Altering selection strength with fitness noise
In Chapter 3, I compared three methods of controlling selection strength: adding noise to
each organism’s fitness score, changing the population size, and changing the tournament size
of the selection algorithm. By analyzing the rate of adaptation on a special fitness function,
we found that adding noise to fitness changed the rate of adaptation in a manner similar
to changing the tournament size. We also discovered that each method for controlling the
selection strength affected some aspect of the information transfer from the environment to
reproductive success: fitness noise hinders the detection of truly fit organisms, population
size can increase or reduce sampling error, and tournament size increases or decreases the
strictness of the selection filter. However, increasing the population size has an additional
effect: increasing the inflow of mutations. Consequently, I now believe population size is an
inadequate parameter for controlling the selection strength because it affects more than one
property of the system. Despite their differences, the similarities of each method prompted
us to consider compensatory changes that involve two or more methods. For example, if
increasing the population size is desirable to increase mutation inflow, but selection needs to
be kept low to improve valley-crossing, then adding noise alongside the increase in population
size could compensate for and counteract the increase in selection strength.
    In Chapter 1 I asked, does noisy fitness (among other things) merely look like changing
the strength of selection or is it actually changing the strength of selection, and is there a
meaningful distinction to be made? I have come to believe that there is no real distinction
between appearing to change the selection strength, and actually doing so. Each phenomenon
that appears to affect the strength of selection plays some role in the transference of infor-
mation from the environment to the population. This has led me to wonder if there isn’t
                                              120


a more central theory that could tie all of these selection-altering phenomena together. I
was inspired by the work of [Rivoire and Leibler, 2011, Montanez, 2013, Rivoire and Leibler,
2014] and others who represent evolution as a transfer of information from the environment
(or fitness function) to an organism’s genotype and eventually the entire population. This
seems to me to be the right way forward for understanding selection strength in a broader,
unified way. With a sufficiently detailed model of the information transfer, each phenomenon
that alters the strength of selection will have a place in that model, and clearly show the
loss of information that results in the weakening of selection. Paradoxically, we know that
perfect selection hinders adaptation, so we can infer that a lossless transfer of information
is suboptimal for evolution. However, this raises the question: what is the optimal type of
information loss and at what stage along the chain of transmission is the information loss
best placed?
     During my work on noisy fitness, I devised a simple way of defining a tournament size
between two integer values. I called this technique real-valued tournaments (RVT). RVT can
be defined as follows: if T ∈ R and T ≥ 1 then T is a real-valued tournament size. A real-
valued tournament size can be used in the tournament selection algorithm as follows: conduct
an integer-valued tournament of size1 ⌊T ⌋ with probability 1 − (T − ⌊T ⌋) or a tournament
size of ⌊T ⌋ + 1 with probability T − ⌊T ⌋. The average tournament size will be T . RVT
has utility beyond being a selection algorithm. For example, we can calculate the effective
tournament size of a mutant-type group undergoing a selective sweep to reveal the weaker
selection due to the free-for-all effect. First, let x denote the density of the mutant type in a
well-mixed population. If we assume the mutant type always wins in a tournament against
a wild type (even when a few minor deleterious mutations are hitchhiking), all wild/mutant
tournaments are effectively size T = 1 and all mutant/mutant tournaments are effectively
size T = 2. Thus, if we take a weighted average of these two types of encounters, we will
   1
     The floor function, ⌊·⌋, is a function that returns the nearest whole number, less-than or equal-to the
input.
                                                      121


have the effective tournament size of the mutant-type group.
                                       2x(1 − x) + 2x2        2
                                Teff =                   =                                  (6.1)
                                        2x(1 − x) + x2      2−x
This is perhaps one of the easiest ways to understand the free-for-all effect: by comparing the
probability of entering a tournament you are assured to win, versus a tournament that actually
tests the organisms involved. Working with tournament selection in this way provides a simple
and intuitive way to think about selection. This methodology merits further development.
6.3       The evolutionary free-for-all effect
In chapter 4 I introduced a new evolutionary dynamic, the free-for-all effect (FFA). FFA is a
reduction of selection pressure on clades that have a competitive advantage. The reduction in
selection strength is proportional to the fitness advantage, and can lead to a self-perpetuating
period of accelerated evolution called a FFA cascade. FFA naturally shifts the balance of
selection during selective sweeps, and so fits nicely into a general shifting balance theory, like
I have discussed in Chapter 1. For these reasons, FFA is also high on the list of phenomena
that can help explain the pattern of punctuated equilibrium.
    Free-for-all does not guarantee additional valley-crossings will occur, it merely reduces
the average stochastic tunneling time from what it would be during a period of equilibrium.
Therefore, when fixation times are longer, there is more time to allow stochastic tunneling
to occur while FFA is having an effect. This tells us that longer fixation times can accelerate
evolution, merely by making time for mutations to accumulate in the population while
conditions are more favorable to exploration. Longer fixation times also introduce a greater
probability of clonal interference in asexual populations, so there is a trade-off to consider for
researchers looking to leverage FFA in their evolutionary algorithms. Clonal interference can
be mitigated by using sexual recombination, which is already fairly standard in evolutionary
computing, however sex creates stronger selection [Kondrashov, 1988, Rice and Chippindale,
                                               122


Figure 6.1: A 1-dimensional ring population structure with N organisms. Organisms with a
beneficial mutation are colored red. In each generation, the number of organisms with the beneficial
mutation can increase by at most 2. The fixation of a beneficial mutation takes at least N2 generations.
2001], so this too may need to be compensated for.
    While investigating free-for-all in different population structures, we observed an im-
portant difference between well-mixed and spatial populations: spatial populations have a
maximum growth rate. For example, consider a 1-dimensional ring population structure
where organisms can only reproduce into adjacent locations, as shown in Figure 6.1. Because
offspring must be placed near their parent, the growth rate of an advantaged genotype is
limited, regardless of its fitness advantage over other genotypes. This also places a limit on
the minimum fixation time of a beneficial mutation, keeping the free-for-all window open at
least that long.
    In general, a d-dimensional spatial population where offspring are placed adjacent to their
parent has a minimum fixation time of
                                                 √
                                                    d 1/d
                                          tmin ∝      N .                                         (6.2)
                                                  2
In contrast, an advantaged genotype in a well-mixed population can grow arbitrarily fast,
limited only by the fitness advantage itself. In theory, a sufficiently beneficial mutation could
sweep a well-mixed population in a single generation.
                                                 123


    Since FFA has a larger effect when fixation is slow and when the fitness benefit is large, well-
mixed populations can prohibit their own free-for-all effect by rushing high-fitness genotypes
to fixation. In contrast, spatial populations offer a way for high-fitness genotypes to experience
a long FFA window. This is particularly important when FFA cascades occur, since each
new discovery in the cascade increases the fitness advantage of the mutant type. In a spatial
population, the additional fitness only serves to enhance FFA. In a well-mixed population,
these additional benefits would accelerate the fixation event and close the FFA window faster.
Below, I discuss the possibility of studying other population structures and their relationship
to FFA.
    The importance of the free-for-all effect when discussing the tempo and mode of evolution
cannot be understated. FFA is a shift of the strength of selection that occurs during every
selective sweep (range expansions included as a special case). As such, FFA must be kept in
mind for nearly all of evolutionary science. FFA creates the opportunity for extended periods
of rapid adaptation that might not occur otherwise, and is a crucial addition to the many
other factors that can explain punctuated equilibrium.
    FFA also brings into focus the importance of deep clades for the exploration of the
fitness landscape. Valley-crossing requires at least one deleterious mutation to occur on the
same lineage before a benefit can be discovered. Since during a selective sweep only those
lineages with higher-than-average fitness can accumulate deleterious mutations and continue
to grow, the deleterious mutations are concentrated in these above-average lineages. These
lineages can travel a further distance in the fitness landscape, cross wider fitness valleys, and
explore hard to reach areas. In contrast, deleterious mutations accumulated during periods
of equilibrium are easily removed by selection because the lineages they accumulate on are
likely brought below the average fitness. Thus, it is important to ’protect’ some lineages
from the fitness penalty of their deleterious mutations in order to increase the exploratory
power of the population. Spatial population structures naturally protect the lineages at the
boundary of a selective sweep, while also maintaining the strength of purifying selection
                                                124


within the boundary. For this reason, and the reasons given above, I strongly encourage the
 use of spatial population structures in all EAs where well-mixed populations are not strictly
 necessary.
     As of today, free-for-all is a theoretical discovery, by which I mean we have not yet found
 conclusive evidence for the effect in observations of natural populations. Finding such evidence
 is difficult for a number of reasons. Primarily, there are many confounding factors which may
 either look like FFA, or mask FFA with a counteracting effect. For example, in our work
with well-mixed populations, we observed a kind of ’refractory period’ that takes place after
 a selective sweep. This refractory period is essentially the time during which a population
 recovers from the loss of diversity that accompanies a selective sweep. This loss of diversity
 makes valley-crossing harder because the population cannot explore the fitness landscape
 effectively. Thus, the loss of diversity appears to lengthen the average time between valley-
 crossing events because additional time is needed to rebuild the molecular quasispecies [Eigen
 and Schuster, 1979, Wilke, 2005, Biebricher and Eigen, 2006]. FFA decreases the average
 time between crossings, and so the two effects together can cancel each other out when the
 impact of FFA on the dynamics of adaptation is weak, like in well-mixed populations. This
’canceling out’ only happens when examining the average rate of adaptation, but both the
 refractory period and FFA can be clearly seen in the distribution of valley-crossing times:
when compared to the equilibrium crossing-time distribution, the refractory period creates
 as a truncated left tail and FFA creates bi-modality. When designing our experiments, we
 made sure to exclude as many factors that could obfuscate the detection of FFA as possible,
 including epistasis, multiple traits, sex, diminishing returns, negative frequency dependence,
 and ecology. Nature systems, on the contrary, has each of these all at once, so conclusively
 demonstrating FFA in a natural system would require either a model organism with none
 of these confounding factors, or a method for discounting the effect of each of these factors
 from the data. Neither of these paths seem particularly easy, and thus warrant an entire
 investigation of their own.
                                                  125


    To understand the real impact of FFA, it is important to investigate how FFA affects
populations of organisms with multiple traits. In all the investigations of FFA that I have
shown, we have used a one-dimensional fitness function. This means that the FFA cascades we
have observed so far have all been sequential: each fitness valley that is crossed is on the same
fitness dimension and so must be crossed one at a time. In a multi-trait system, the discovery
of a beneficial mutation on one trait relaxes the selection pressure on the entire organism, so
several traits may valley-cross in parallel with one another. These parallel FFA cascades have
two important consequences: larger fitness improvements and the circumvention of trade-offs.
Parallel cascades are more likely to result in very large fitness improvements, as the parallel
discoveries combine their fitness advantages. We have already shown that FFA reduces
selection strength proportional to the net fitness advantage, so parallel cascades and the large
advantages they bring could dramatically reduce selection strength beyond anything we have
seen so far. Additionally, multi-trait systems often have trade-off relationships between two
or more traits, where one trait cannot improve without another trait losing fitness (see [Deb
et al., 2002] for more information about evolutionary optimization with trade-offs). Parallel
FFA cascades create opportunities for traits linked by trade-offs to all improve at the same
time, since the loss of fitness due to a trade-off may not be lethal during the period of reduced
selection strength.
    In preliminary work, we found that, indeed, each trait in an organism benefits from the
reduced selection during a selective sweep. Therefore, FFA induced by the improvement of
one trait may indirectly help other ostensibly unrelated traits cross a fitness valley. The
increased probability to valley-cross on every trait drastically increases the likelihood of a
FFA cascade. Furthermore, we found that systems with sufficiently many traits experience
extremely long cascades lasting up to 80,000 generations and crossing up to 160 fitness valleys
with only 6 traits. These very long cascades lend further support to the idea that FFA plays
a significant role in the explanation of the observed, often dramatic, changes in the frequency
of innovations, known as punctuated equilibrium. The interplay of multiple traits and FFA
                                                126


is the subject of ongoing research. Preliminary findings about multi-trait systems give rise
to a very simple evolutionary computation augmentation: simply adding ’dummy traits’
to the fitness function that increase the dimensionality of the fitness landscape, making it
possible for FFA to be triggered by a dummy trait and assist the main optimization task.
Investigating ways to leverage FFA to speed up adaptation led to the development of super
explorers (Chapter 5), and still provides opportunities for additional developments.
    When searching for other factors that may affect a population’s ability to escape its current
local optimum, we found that the shape of the optimum itself is such a factor. For example,
in my work on the selection impact metric the more steeply-sloped fitness landscapes showed
a higher selection impact (all other factors being the same), indicating selection doing more
work. This has led to the hypothesis that populations tend to get stuck on local optima with
gentler slopes and rounder surfaces for longer periods of time than on steeper, jagged peaks,
because the latter create a more turbulent equilibrium and thus create a kind of standing
FFA effect. To help illustrate this point, imagine a selective sweep occurring that never
quite ends, but instead perpetually climbs and slips down the top of the local optimum,
like Sisyphus’s boulder in the eponymous Greek myth. These small reoccurring Sisyphean
selective sweeps are essentially what occurs when the population is equilibrated on a local
optimum and selection is acting against the inflow of deleterious mutations. Thus, we predict
that stabilizing selection results in a small FFA effect that, over very long time spans, should
see steeper local optima escaped faster than gentler ones. This hypothesis has not yet been
tested.
    We have investigated FFA in well-mixed populations, as well as one- and two-dimensional
population structures. FFA occurs in each of them, but the size of the effect and the manner
in which it manifests is slightly different for the reasons discussed above. Clearly, a general
study of how population structure affects FFA is needed. I am particularly inspired by the
work of [Mühlenbein, 1991, Lieberman et al., 2005, Mühlenbein, 2009, Askari and Samani,
2015, Möller et al., 2019, Tkadlec et al., 2019, Kuo et al., 2021] who have extensively studied
                                                127


how population structure can affect fixation rates and probabilities, among other things. We
have already observed that population structures that promote Fisher waves [Fisher, 1937],
slow down fixation rates, and decouple growth rate from fitness advantage to promote a
stronger free-for-all effect. We have also already observed that the number of individuals
that experience FFA depends on the population structure. For example, in the 1D structure
in Figure 6.1, only the two individuals at the boundary between the mutant and wild-type
subpopulations experience FFA, while in the analogous 2D population the size of the boundary
changes over the course of fixation. There is substantial potential to improve evolutionary
computation with further research into how FFA is affected by population structure.
6.4       Super Explorers: protected, sustained, neutral drift
In Chapter 5, I introduced a new method called “super explorers” as a way to augment
evolutionary algorithms. The method is based on previous research that showed that the
free-for-all effect in spatial populations results in reduced selection pressure and deeper clades.
Super explorers are designed to reproduce regardless of their fitness, resulting in deep clades
that can explore fitness valleys without penalty. Occasionally, a super explorer will be replaced
by an organism from the rest of the evolving population, putting a soft bound on the depth
of the clades they produce. I showed that the addition of super explorers to an evolutionary
algorithm increases the exploration ability of the algorithm, even for modest super explorer
counts. Super explorers provide a benefit comparable to lowering the selection strength of the
entire population, without the risk of losing good solutions to Muller’s ratchet and neutral
drift. The resulting increase in adaptive power is analogous to the effects of FFA, as intended.
    The development of super explorers led to a novel method of joining two or more pop-
ulations together that I am now calling horizontal concatenation, depicted in Figure 6.2.
This method allows for the combination of an arbitrary number of population-based search
methods and encourages rapid exchange of information between populations, unlike an island
model that emphasizes isolation.
                                                 128


Figure 6.2: A general schematic of the horizontal concatenation method. An arbitrary number
of population-based search algorithms and archives (referred to collectively as pools) are joined
together in parallel. Each pool has read-only access to the other pools and can make use of them
to update itself, depicted here as arrows from t to t + 1. There is no requirement that a pool read
from other pools, and the technical limitations of some pools may prevent them from connecting
with other pools. See Chapter 5 for more information.
    The horizontal concatenation method was inspired by the architecture of Markov brains
(Fig. 3, [Hintze et al., 2017]). Each pool manages a container of organisms. For example,
a typical evolutionary algorithm manages the evolving population by choosing parents and
creating offspring, while an archive manages a container of organisms by deciding which
organisms to include, and which to exclude. In this way, each pool can be thought of as an
independent, self-contained unit with its own set of organisms that it alone is responsible for.
Each pool must decide for itself what constitutes a single update of its container, from time
t to time t + 1. For example, an evolving population can have overlapping generations or
non-overlapping generations, and the definition of a single update depends on the choice of
generational model. With this description of a pool in mind, the horizontal concatenation
method can be defined simply as allowing each pool to read from the containers of other
pools for the purposes of informing their own update process. It is in this way that the
horizontal concatenation method resembles the structure of a Markov brain: the set of all
containers is updated in parallel from t to t + 1, and each pool can read from any other
container, but can only write to its own container. Since an arbitrary number of pools can be
linked together (concatenated) and all pools update in parallel (so are arranged horizontally),
I call this method of parallel search “horizontal concatenation”.
    Horizontal concatenation has several advantages: it allows us to track which pool an
                                                129


organism and its offspring are in without ambiguity, and it allows us to add more pools
seamlessly. The ability to unambiguously track which pool an organism is in mostly serves to
make analysis of the line of descent easier. The ability to add more pools seamlessly, without
the need to modify the internal logic of other pools, makes sharing information between
pools a far easier and more powerful than it seems at first. To illustrate this point, first
consider the setup we used in the original work: we had a normally evolving population, a
pool of super explorers, and an additional pool that contained just the organism with the
highest fitness ever discovered, the global max. In that work, we sometimes replace a decayed
super explorer with the global max. However, we could instead replace the global max pool
with a more sophisticated archive of solutions, like MAP-Elites [Mouret and Clune, 2015],
providing a diverse range of choices to replace a decayed super explorer. But we can do even
better than simply replacing one pool with another; we can add more pools with drastically
different search properties. For example, we can concatenate an evolving population and an
implementation of particle-swarm optimization [Poli et al., 2007]. Preliminary work on this
particular combination has shown that interfacing the two search algorithms is rather easy;
the inclusion of a global-max pool helps link the two searches together. Furthermore, two
evolutionary searches with different selection algorithms or different generational models could
be combined to leverage the differences between them. The possibilities are too numerous
to list. Future work should investigate the limitations of this method, which are as yet
unknown. One obvious limitation is the increased computational overhead of adding more
pools, which is already a familiar limitation of evolutionary algorithms. It is unclear, however,
if the increased computational effort will be worth the search dynamics of a complex hybrid
algorithm. Horizontal concatenation shares some similarities to other parallel evolutionary
algorithms, like island models and demes, and this warrants comparative analysis to determine
how it fits into that area of research.
                                               130


6.5       Final Remarks
I believe that it is a mistake to regard evolutionary computation as separate from evolutionary
science. Even though evolutionary algorithms are inspired by evolution, the relationship
between the two does not end there. In a very real sense, natural evolution and evolutionary
algorithms are both instantiations of the same fundamental process. The differences are
merely details. This is why I believe it is paramount that evolutionary science and evolutionary
computation work together more collaboratively.
    Across all of these projects, I consistently found that well-mixed population models were
hard to tune for good performance compared to spatial models, only weakly exhibited free-
for-all and so made little use of its benefits, and create spooky competition-at-a-distance
interactions to boot. For these reasons, I have come to believe that spatial population
models, even simple one-dimensional ring populations, alleviate nearly all the problems often
associated with well-mixed populations. As such, I recommend the use of one-dimensional
ring populations or other simple spatial population models as the go-to population model
for evolutionary algorithms. Favoring spatial models is not actually a new revelation, but
something that has been discussed before [Mühlenbein, 1991, Mühlenbein, 2009]. Spatial
populations, in my opinion, deserve far more attention in evolutionary optimization than
they currently get.
    We find that in nature, selection strength is a highly fluid property of the evolving system.
The idea of naturally shifting selection strength is an old idea, but it has been re-discovered
many times to be the consequence of a diverse set of mechanisms. I believe we should start
to view selection as more of an emergent property of the entire evolutionary system, with
some aspects of the system contributing more heavily to selection and others lightly.
    I have investigated selection, and shifting selection strength, in order to find new ways of
improving evolutionary algorithms. I found that selection strength can be thought of as the
degree to which an evolving system is dissimilar to neutral drift. This perspective opens the
door to accept all phenomena that affect the strength of selection as part of a unified theory
                                                131


of selection. Such a theory is likely built on the idea that evolution is the transmission of
information from the fitness landscape to the population.
    Spatial structure in the population is an important re-occurring theme in much of this
thesis, and offers a minimally invasive way to improve an evolutionary algorithm. Spatial
structures allow for local selection strengths to weaken while keeping the global average
strength high, allowing for enhanced exploration at little cost to exploitation.
    I have presented a new evolutionary dynamic: the free-for-all effect, namely the reduction
of selection strength on organisms with higher-than-average fitness during a selective sweep
or range expansion. Shifting selection strength seems to be the best contender to explain
the punctuated nature of evolution, and FFA appears to be one of the most likely ways of
causing this shift to occur.
    The discovery of FFA led to the design of a new evolutionary search method called super
explorers. Super explorers serve as the first example of a new technique in evolutionary
computing, called horizontal concatenation, which has the potential to combine drastically
different search algorithms in a hybrid parallel search.
    Natural evolution inspires evolutionary algorithms. This initial inspiration has led to
an entire field of evolutionary optimization techniques. However, like all optimization tech-
niques, evolution is not a silver bullet, and must be tuned for each specific situation. Natural
evolution, while theoretically bound by the same constraints, seems to adapt continuously,
sometimes slowly and sometimes rapidly. In order to design the next generation of evolution-
ary optimization algorithms, we should continue to look back to nature for inspiration.
                                               132


Attributions
Chapter 3
This chapter is based on work done in collaboration with Clifford Bohm [Ragusa and Bohm,
2021]. Both authors contributed equally to the final text of the original manuscript, data
collection and analysis, and experimental design. Ragusa provided the final figures and
pseudo-code. Bohm maintains the github repository.
Chapter 4
This chapter is based on unpublished work done in collaboration with Clifford Bohm, Christoph
Adami, Charles Ofria, and Richard E. Lenski. Ragusa and Bohm contributed equally to the
original text, data collection and analysis, and experimental design. Bohm provided the final
figures. Adami made contributions to the original text.
Chapter 5
This chapter is based on work done in collaboration with Clifford Bohm [Ragusa and Bohm,
2022]. Both authors contributed equally to the final text of the original manuscript, data
collection and analysis, and experimental design. Bohm provided the final figures. Ragusa
provided the pseudo-code, and initial design of super explorers and the horizontal concatena-
tion method.
                                               133


                                        BIBLIOGRAPHY
[Adami, 1998] Adami, C. (1998). Introduction to Artificial Life. Springer Science & Business
  Media.
[Adami, 2006] Adami, C. (2006). Digital genetics: unravelling the genetic basis of evolution.
  Nature Reviews Genetics, 7(22):109–118.
[Alili et al., 2005] Alili, L., Patie, P., and Pedersen, J. L. (2005). Representations of the first
  hitting time density of an Ornstein-Uhlenbeck process. Stochastic Models, 21(4):967–980.
[Anderson, 1962] Anderson, T. W. (1962). On the distribution of the two-sample Cramer-
  von Mises criterion. The Annals of Mathematical Statistics, 33(3):1148–1159.
[Artime et al., 2018] Artime, O., Khalil, N., Toral, R., and San Miguel, M. (2018). First-
  passage distributions for the one-dimensional fokker-planck equation. Physical Review E,
  98(4):042143.
[Askari and Samani, 2015] Askari, M. and Samani, K. A. (2015). Analytical calculation of
  average fixation time in evolutionary graphs. Physical Review E, 92(4):042707.
[Bak and Boettcher, 1997] Bak, P. and Boettcher, S. (1997). Self-organized criticality and
  punctuated equilibria. Physica D: Nonlinear Phenomena, 107(2):143–150.
[Bak et al., 1987] Bak, P., Tang, C., and Wiesenfeld, K. (1987). Self-organized criticality:
  An explanation of the 1/f noise. Physical Review Letters, 59(4):381–384.
[Baym et al., 2016] Baym, M., Lieberman, T. D., Kelsic, E. D., Chait, R., Gross, R., Yelin,
  I., and Kishony, R. (2016). Spatiotemporal microbial evolution on antibiotic landscapes.
  Science, 353(6304):1147–1151.
[Biebricher and Eigen, 2006] Biebricher, C. K. and Eigen, M. (2006). What is a quasis-
  pecies? Quasispecies: concept and implications for virology, pages 1–31.
[Bininda-Emonds et al., 2007] Bininda-Emonds, O. R. P., Cardillo, M., Jones, K. E.,
  MacPhee, R. D. E., Beck, R. M. D., Grenyer, R., Price, S. A., Vos, R. A., Gittle-
  man, J. L., and Purvis, A. (2007). The delayed rise of present-day mammals. Nature,
  446:507–512.
[Bohm et al., 2019] Bohm, C., Ackles, A. L., Ofria, C., and Hintze, A. (2019). On sexual
  selection in the presence of multiple costly displays. In ALIFE 2019: The 2019 Conference
  on Artificial Life, pages 247–254. MIT Press.
[Bohm et al., 2017] Bohm, C., C G, N., and Hintze, A. (2017). MABE (modular agent
  based evolver): A framework for digital evolution research. In Artificial Life Conference
                                                 134


  Proceedings, pages 76–83. MIT Press One Rogers Street, Cambridge, MA 02142-1209,
  USA journals-info . . . .
[Brunet et al., 2008] Brunet, E., Rouzine, I. M., and Wilke, C. O. (2008). The stochastic
  edge in adaptive evolution. Genetics, 179:603–20.
[Burton and Travis, 2008] Burton, O. J. and Travis, J. M. (2008). The frequency of fitness
  peak shifts is increased at expanding range margins due to mutation surfing. Genetics,
  179(2):941–950.
[Bush et al., 1977] Bush, G. L., Case, S. M., Wilson, A. C., and Patton, J. L. (1977). Rapid
  speciation and chromosomal evolution in mammals. Proceedings of the National Academy
  of Sciences, 74(9):3942–3946.
[Bäck, 1994] Bäck, T. (1994). Parallel optimization of evolutionary algorithms. In Davidor,
  Y., Schwefel, H.-P., and Männer, R., editors, Parallel Problem Solving from Nature —
  PPSN III, Lecture Notes in Computer Science, page 418–427, Berlin, Heidelberg. Springer.
[Campos et al., 2004] Campos, P. R., Adami, C., and Wilke, C. O. (2004).            Modelling
  stochastic clonal interference. Modelling in Molecular Biology, pages 21–38.
[Campos and de Oliveira, 2004] Campos, P. R. A. and de Oliveira, V. M. (2004). Muta-
  tional effects on the clonal interference phenomenon. Evolution, 58:932–7.
[Cariani, 2002] Cariani, P. A. (2002). Extradimensional bypass. Biosystems, 64(1-3):47–53.
[Carson, 1975] Carson, H. L. (1975). The genetics of speciation at the diploid level. The
  American Naturalist, 109(965):83–92.
[Cele et al., 2022] Cele, S., Karim, F., Lustig, G., San, J. E., Hermanus, T., Tegally, H.,
  Snyman, J., Moyo-Gwete, T., Wilkinson, E., Bernstein, M., Khan, K., Hwa, S.-H., Tilles,
  S. W., Singh, L., Giandhari, J., Mthabela, N., Mazibuko, M., Ganga, Y., Gosnell, B. I.,
  Karim, S. S. A., Hanekom, W., Van Voorhis, W. C., Ndung’u, T., COMMIT-KZN Team,
  Lessells, R. J., Moore, P. L., Moosa, M.-Y. S., de Oliveira, T., and Sigal, A. (2022). Sars-
  cov-2 prolonged infection during advanced hiv disease evolves extensive immune escape.
  Cell Host Microbe, 30:154–162.e5.
[Charlesworth et al., 1982] Charlesworth, B., Lande, R., and Slatkin, M. (1982). A neo-
  darwinian commentary on macroevolution. Evolution, 36(3):474–498.
[Chow, 2004] Chow, S. S. (2004). Adaptive radiation from resource competition in digital
  organisms. Science, 305(5680):84–86.
[Conrad, 1990] Conrad, M. (1990). The geometry of evolution. BioSystems, 24(1):61–81.
                                              135


[Conway Morris, 1998] Conway Morris, S. (1998). The crucible of creation: the Burgess
  Shale and the rise of animals. Oxford University Press.
[Covert et al., 2013] Covert, A. W., Lenski, R. E., Wilke, C. O., and Ofria, C. (2013).
  Experiments on the role of deleterious mutations as stepping stones in adaptive evolution.
  Proceedings of the National Academy of Sciences, 110(34):E3171–E3178.
[Davidson and Erwin, 2006] Davidson, E. H. and Erwin, D. H. (2006). Gene regulatory
  networks and the evolution of animal body plans. Science, 311(5762):796–800.
[De Jong, 2012] De Jong, K. (2012). Generalized Evolutionary Algorithms, page 625–635.
  Springer, Berlin, Heidelberg.
[de Visser and Rozen, 2005] de Visser, J. A. G. M. and Rozen, D. E. (2005). Limits to
  adaptation in asexual populations. J Evol Biol, 18:779–88.
[de Visser et al., 1999] de Visser, J. A. G. M., Zeyl, C. W., Gerrish, P. J., Blanchard, J. L.,
  and Lenski, R. E. (1999). Diminishing returns from mutation supply rate in asexual
  populations. Science, 283:404–6.
[Deb et al., 2002] Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002). A fast
  and elitist multiobjective genetic algorithm: Nsga-ii. IEEE transactions on evolutionary
  computation, 6(2):182–197.
[Desai and Fisher, 2007] Desai, M. M. and Fisher, D. S. (2007). Beneficial mutation–
  selection balance and the effect of linkage on positive selection. Genetics, 176(3):1759–
  1798.
[Desai et al., 2007] Desai, M. M., Fisher, D. S., and Murray, A. W. (2007). The speed of
  evolution and maintenance of variation in asexual populations. Current biology, 17(5):385–
  394.
[Eigen, 1971] Eigen, M. (1971). Selforganization of matter and the evolution of biological
  macromolecules. Naturwissenschaften, 58:465–523.
[Eigen et al., 1989] Eigen, M., McCaskill, J., and Schuster, P. (1989).        The molecular
  quasi-species. Adv. Chem. Phys., 75:149–263.
[Eigen and Schuster, 1979] Eigen, M. and Schuster, P. (1979). The Hypercycle—A Principle
  of Natural Self-Organization. Springer-Verlag, Berlin.
[Eldredge, 1972] Eldredge, N. (1972). Punctuated equilibria: an alternative to phyletic
  gradualism. Models in paleobiology, page 82–115.
[Elena et al., 1996] Elena, S. F., Cooper, V. S., and Lenski, R. E. (1996).       Punctuated
                                             136


  evolution caused by selection of rare beneficial mutations. Science, 272(5269):1802–1804.
[Engholdt and Mathias, 2021] Engholdt, K. and Mathias, H. D. (2021). A biologically-
  inspired model for mass extinction in genetic algorithms. In 2021 IEEE Congress on
  Evolutionary Computation (CEC), pages 1078–1085. IEEE.
[Erwin and Valentine, 1984] Erwin, D. H. and Valentine, J. W. (1984). “hopeful monsters,”
  transposons, and metazoan radiation. Proceedings of the National Academy of Sciences,
  81(17):5482–5483.
[Erwin et al., 1987] Erwin, D. H., Valentine, J. W., and Sepkoski, J. J. (1987). A compara-
  tive study of diversification events: The early paleozoic versus the mesozoic. Evolution,
  41(6):1177–1186.
[Excoffier and Ray, 2008] Excoffier, L. and Ray, N. (2008). Surfing during population
  expansions promotes genetic revolutions and structuration. Trends in Ecology & Evolution,
  23(7):347–351.
[Fay et al., 2002] Fay, J. C., Wyckoff, G. J., and Wu, C.-I. (2002). Testing the neutral theory
  of molecular evolution with genomic data from drosophila. Nature, 415(6875):1024–1026.
[Feller, 1951] Feller, W. (1951). Diffusion processes in genetics. In Proceedings of the second
  Berkeley Symposium on Mathematical Statistics and Probability, pages 227–246. University
  of California Press.
[Fisher, 2013] Fisher, D. S. (2013). Asexual evolution waves: Fluctuations and universality.
  J. Stat. Mech.: Theory and Experiment, 2013:P01011.
[Fisher, 1937] Fisher, R. A. (1937). The wave of advance of advantageous genes. Annals of
  Eugenics, 7(4):355–369.
[Fitch, 1995] Fitch, W. M. (1995). Tempo And Mode In Evolution: Genetics And Paleon-
  tology 50 Years After Simpson. National Academies Press (US).
[Fogle et al., 2008] Fogle, C. A., Nagle, J. L., and Desai, M. M. (2008). Clonal interference,
  multiple mutations and adaptation in large asexual populations. Genetics, 180(4):2163–
  2173.
[Fontdevila, 1992] Fontdevila, A. (1992). Genetic instability and rapid speciation: are they
  coupled? Genetica, 86(1):247–258.
[Forster et al., 2006] Forster, R., Adami, C., and Wilke, C. O. (2006). Selection for muta-
  tional robustness in finite populations. J Theor Biol, 243:181–90.
[Gavrilets, 1999] Gavrilets, S. (1999). Evolution and speciation in a hyperspace: the roles
                                               137


  of neutrality, selection, mutation and random drift. Evolution, page 1.
[Gavrilets, 2003] Gavrilets, S. (2003). Perspective: models of speciation: what have we
  learned in 40 years? Evolution, 57:2197–215.
[Gerrish and Lenski, 1998] Gerrish, P. J. and Lenski, R. E. (1998). The fate of competing
  beneficial mutations in an asexual population. Genetica, 102:127–144.
[Gilbert et al., 2017] Gilbert, K. J., Sharp, N. P., Angert, A. L., Conte, G. L., Draghi, J. A.,
  Guillaume, F., Hargreaves, A. L., Matthey-Doret, R., and Whitlock, M. C. (2017). Local
  adaptation interacts with expansion load during range expansion: maladaptation reduces
  expansion load. The American Naturalist, 189(4):368–380.
[Gillespie, 1984a] Gillespie, J. H. (1984a). The molecular clock may be an episodic clock.
  Proceedings of the National Academy of Sciences, 81(24):8009–8013.
[Gillespie, 1984b] Gillespie, J. H. (1984b). Molecular evolution over the mutational land-
  scape. Evolution, 38:1116–1129.
[Good et al., 2012] Good, B. H., Rouzine, I. M., Balick, D. J., Hallatschek, O., and Desai,
  M. M. (2012). Distribution of fixed beneficial mutations and the rate of adaptation in
  asexual populations. Proc Natl Acad Sci U S A, 109:4950–5.
[Gould, 1977] Gould, S. J. (1977). Evolution’s erratic pace. Natural History, 86(5):12–16.
[Gould, 1987] Gould, S. J. (1987). Is a New and General Theory of Evolution Emerging?,
  page 113–130. Life Science Monographs. Springer US.
[Gould, 1989] Gould, S. J. (1989). Wonderful Life: the Burgess Shale and the Nature of
  History. WW Norton & Company.
[Gould and Eldredge, 1977] Gould, S. J. and Eldredge, N. (1977). Punctuated equilibria:
  The tempo and mode of evolution reconsidered. Paleobiology, 3(2):115–151.
[Gould and Eldredge, 1993] Gould, S. j. and Eldredge, N. (1993). Punctuated equilibrium
  comes of age. Nature, 366(6452):223–227.
[Hahn, 2008] Hahn, M. W. (2008).          Toward a selection theory of molecular evolution.
  Evolution, 62(2):255–265.
[Haldane, 1954] Haldane, J. B. S. (1954). The measurement of natural selection. Proceedings
  of the 9th International Congress of Genetics, 1:480–487.
[Handel and Rozen, 2009] Handel, A. and Rozen, D. E. (2009). The impact of population
  size on the evolution of asexual microbes on smooth versus rugged fitness landscapes. BMC
                                              138


   Evolutionary Biology, 9(1):236.
[Helmuth et al., 2014] Helmuth, T., Spector, L., and Matheson, J. (2014). Solving un-
   compromising problems with lexicase selection. IEEE Transactions on Evolutionary
  Computation, 19(5):630–643.
[Hereford et al., 2004] Hereford, J., Hansen, T. F., and Houle, D. (2004). Comparing
   strengths of directional selection: How strong is strong? Evolution, 58(10):2133–2143.
[Hintze et al., 2017] Hintze, A., Edlund, J. A., Olson, R. S., Knoester, D. B., Schossau,
   J., Albantakis, L., Tehrani-Saleh, A., Kvam, P., Sheneman, L., Goldsby, H., Bohm, C.,
   and Adami, C. (2017). Markov brains: A technical introduction. (arXiv:1709.05601).
   arXiv:1709.05601 [cs, q-bio].
[Ho and Pepyne, 2002] Ho, Y.-C. and Pepyne, D. L. (2002). Simple explanation of the no-
   free-lunch theorem and its implications. Journal of optimization theory and applications,
   115(3):549–570.
[Hoffmann and Parsons, 1997] Hoffmann, A. A. and Parsons, P. A. (1997).              Extreme
   Environmental Change and Evolution. Cambridge University Press.
[Holland, 1992] Holland, J. H. (1992). Genetic algorithms. Scientific American, 267(1):66–
   73.
[Houle, 1992] Houle, D. (1992). Comparing evolvability and variability of quantitative traits.
  Genetics, 130(1):195–204.
[Iwasa et al., 2004a] Iwasa, Y., Michor, F., and Nowak, M. A. (2004a). Stochastic tunnels
   in evolutionary dynamics. Genetics, 166(3):1571–1579.
[Iwasa et al., 2004b] Iwasa, Y., Michor, F., and Nowak, M. A. (2004b). Stochastic tunnels
   in evolutionary dynamics. Genetics, 166(3):1571–1579.
[Jain et al., 2011] Jain, K., Krug, J., and Park, S.-C. (2011). Evolutionary advantage of
   small populations on complex fitness landscapes. Evolution, 65(7):1945–1955.
[John Sepkoski Jr, 1998] John Sepkoski Jr, J. (1998). Rates of speciation in the fossil record.
   Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences,
   353(1366):315–326.
[Johnson, 2008] Johnson, N. (2008). Sewall wright and the development of shifting balance
   theory. Nature Education, 1(1):52.
[Kantorovich, 1960] Kantorovich, L. V. (1960). Mathematical methods of organizing and
   planning production. Management Science, 6(4):366–422.
                                              139


[Kauffman et al., 1993] Kauffman, S. A. et al. (1993).          The origins of order: Self-
  organization and selection in evolution. Oxford University Press, USA.
[Kern and Hahn, 2018] Kern, A. D. and Hahn, M. W. (2018). The neutral theory in light
  of natural selection. Molecular biology and evolution, 35(6):1366–1371.
[Kim and Stephan, 2003] Kim, Y. and Stephan, W. (2003). Selective sweeps in the presence
  of interference among partially linked loci. Genetics, 164:389–98.
[Kimura, 1964] Kimura, M. (1964). Diffusion models in population genetics. Journal of
  Applied Probability, 1(2):177–232.
[Kimura, 1983] Kimura, M. (1983). The neutral theory of molecular evolution. Cambridge
  University Press.
[Kimura, 1987] Kimura, M. (1987). Molecular evolutionary clock and the neutral theory.
  Journal of Molecular Evolution, 26(1):24–33.
[Kimura, 1991] Kimura, M. (1991). The neutral theory of molecular evolution: a review of
  recent evidence. The Japanese Journal of Genetics, 66(4):367–386.
[Kimura and Crow, 1964] Kimura, M. and Crow, J. F. (1964). The number of alleles that
  can be maintained in a finite population. Genetics, 49:725–738.
[Kimura and Ohta, 1969] Kimura, M. and Ohta, T. (1969). The average number of genera-
  tions until fixation of a mutant gene in a finite population. Genetics, 61:763–71.
[Kondrashov, 1988] Kondrashov, A. S. (1988). Deleterious mutations and the evolution of
  sexual reproduction. Nature, 336(6198):435–440.
[Koza, 1994] Koza, J. R. (1994). Genetic programming as a means for programming
  computers by natural selection. Statistics and computing, 4:87–112.
[Kramer, 2010] Kramer, O. (2010). Evolutionary self-adaptation: a survey of operators and
  strategy parameters. Evolutionary Intelligence, 3(2):51–65.
[Kreitman, 2000] Kreitman, M. (2000). Methods to detect selection in populations with
  applications to the human. Annual Review of Genomics and Human Genetics, 1(1):539–
  559.
[Kryazhimskiy et al., 2009] Kryazhimskiy, S., Tkačik, G., and Plotkin, J. B. (2009). The
  dynamics of adaptation on correlated fitness landscapes. Proceedings of the National
  Academy of Sciences, 106(44):18638–18643.
[Kullback and Leibler, 1951] Kullback, S. and Leibler, R. A. (1951). On information and
                                             140


   sufficiency. The Annals of Mathematical Statistics, 22(1):79–86.
[Kuo et al., 2021] Kuo, Y. P., Arrieta, C. N., and Carja, O. (2021). A theory of evolutionary
   dynamics on any complex spatial structure. bioRxiv, pages 2021–02.
[La Cava et al., 2016] La Cava, W., Spector, L., and Danai, K. (2016). Epsilon-lexicase
   selection for regression. In Proceedings of the Genetic and Evolutionary Computation
  Conference 2016, pages 741–748.
[LaBar and Adami, 2016] LaBar, T. and Adami, C. (2016). Different evolutionary paths
   to complexity for small and large populations of digital organisms. PLoS Comput Biol,
  12:e1005066.
[Lande and Arnold, 1983] Lande, R. and Arnold, S. J. (1983). The measurement of selection
   on correlated characters. Evolution, 37(6):1210–1226.
[Langley and Fitch, 1974] Langley, C. H. and Fitch, W. M. (1974). An examination of the
   constancy of the rate of molecular evolution. Journal of Molecular Evolution, 3(3):161–177.
[Lehman and Miikkulainen, 2015] Lehman, J. and Miikkulainen, R. (2015).            Extinction
   events can accelerate evolution. PLOS ONE, 10(8):e0132886.
[Lehman and Stanley, 2011] Lehman, J. and Stanley, K. O. (2011). Novelty search and the
   problem with objectives. In Genetic Programming Theory and Practice IX, pages 37–56.
   Springer.
[Lenski et al., 2003] Lenski, R. E., Winkworth, C. L., and Riley, M. A. (2003). Rates of
   DNA sequence evolution in experimental populations of Escherichia coli during 20,000
   generations. Journal of Molecular Evolution, 56(4):498–508.
[Lewis, 1962] Lewis, H. (1962). Catastrophic selection as a factor in speciation. Evolution,
  16(3):257–271.
[Lewontin et al., 1974] Lewontin, R. C. et al. (1974). The Genetic Basis of Evolutionary
  Change, volume 560. Columbia University Press, New York.
[Lieberman et al., 2005] Lieberman, E., Hauert, C., and Nowak, M. A. (2005). Evolutionary
   dynamics on graphs. Nature, 433(7023):312–316.
[Lynch, 2007] Lynch, M. (2007). The origins of genome architecture. Sinauer Associates
   Sunderland.
[Lynch et al., 2016] Lynch, M., Ackerman, M. S., Gout, J.-F., Long, H., Sung, W., Thomas,
  W. K., and Foster, P. L. (2016). Genetic drift, selection and the evolution of the mutation
   rate. Nature Reviews Genetics, 17(1111):704–714.
                                              141


[Lynch and Conery, 2003] Lynch, M. and Conery, J. S. (2003).            The origins of genome
  complexity. Science, 302(5649):1401–1404.
[Malamud et al., 1998] Malamud, B. D., Morein, G., and Turcotte, D. L. (1998). For-
  est fires: An example of self-organized critical behavior. Science (New York, N.Y.),
  281(5384):1840–1842.
[Martı́nez-Jiménez et al., 2020] Martı́nez-Jiménez, F., Muiños, F., Sentı́s, I., Deu-Pons, J.,
  Reyes-Salazar, I., Arnedo-Pac, C., Mularoni, L., Pich, O., Bonet, J., Kranas, H., et al.
  (2020). A compendium of mutational cancer driver genes. Nature Reviews Cancer,
  20(10):555–572.
[Mathias and Ragusa, 2016] Mathias, H. D. and Ragusa, V. R. (2016). An empirical study
  of crossover and mass extinction in a genetic algorithm for pathfinding in a continuous
  environment. In 2016 IEEE Congress on Evolutionary Computation (CEC), pages 4111–
  4118. IEEE.
[Matsumura et al., 2012] Matsumura, S., Arlinghaus, R., and Dieckmann, U. (2012). Stan-
  dardizing selection strengths to study selection in the wild: a critical comparison and
  suggestions for the future. BioScience, 62(12):1039–1054.
[Maynard Smith, 1970] Maynard Smith, J. (1970). Natural selection and the concept of a
  protein space. Nature, 225:563.
[Mayr, 1954] Mayr, E. (1954). Change of genetic environment and evolution. Evolution as
  process., page 157–180.
[Melbinger and Vergassola, 2015] Melbinger, A. and Vergassola, M. (2015). The im-
  pact of environmental fluctuations on evolutionary fitness functions. Scientific Reports,
  5(11):15211.
[Millidge et al., 2021] Millidge, B., Tschantz, A., Seth, A., and Buckley, C. (2021). Un-
  derstanding the origin of information-seeking exploration in probabilistic objectives for
  control. arXiv preprint arXiv:2103.06859.
[Möller et al., 2019] Möller, M., Hindersin, L., and Traulsen, A. (2019). Exploring and map-
  ping the universe of evolutionary graphs identifies structural properties affecting fixation
  probability and time. Nature Communications Biology, 2:1–9.
[Montanez, 2013] Montanez, G. D. (2013). Information transmission through genetic algo-
  rithm fitness maps. In 2013 IEEE Congress on Evolutionary Computation, pages 756–763.
  IEEE.
[Mouret and Clune, 2015] Mouret, J.-B. and Clune, J. (2015). Illuminating search spaces
  by mapping elites. arXiv preprint arXiv:1504.04909.
                                                142


[Muller, 1964] Muller, H. J. (1964). The relation of recombination to mutational advance.
  Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 1(1):2–9.
[Möller et al., 2019] Möller, M., Hindersin, L., and Traulsen, A. (2019). Exploring and map-
  ping the universe of evolutionary graphs identifies structural properties affecting fixation
  probability and time. Nature Communications Biology, 2(11):1–9.
[Mühlenbein, 1991] Mühlenbein, H. (1991). Parallel genetic algorithms, population genetics
  and combinatorial optimization. In Becker, J. D., Eisele, I., and Mündemann, F. W., edi-
  tors, Parallelism, Learning, Evolution, Lecture Notes in Computer Science, page 398–406,
  Berlin, Heidelberg. Springer.
[Mühlenbein, 2009] Mühlenbein, H. (2009). Evolutionary Computation: Centralized, Par-
  allel or Collaborative, volume 1 of Intelligent Systems Reference Library, page 561–595.
  Springer Berlin Heidelberg, Berlin, Heidelberg.
[Nevo, 2001] Nevo, E. (2001). Evolution of genome–phenome diversity under environmental
  stress. Proceedings of the National Academy of Sciences, 98(11):6233–6240.
[Niklas, 1997] Niklas, K. J. (1997).      The Evolutionary Biology of Plants.    University of
  Chicago Press.
[Niklas et al., 1983] Niklas, K. J., Tiffney, B. H., and Knoll, A. H. (1983).      Patterns in
  vascular land plant diversification. Nature, 303(5918):614–616.
[Ochs and Desai, 2015] Ochs, I. E. and Desai, M. M. (2015). The competition between
  simple and complex evolutionary trajectories in asexual populations. BMC evolutionary
  biology, 15(1):1–9.
[Ofria and Wilke, 2004] Ofria, C. and Wilke, C. O. (2004). Avida: A software platform for
  research in computational evolutionary biology. Artificial Life, 10(2):191–229.
[Ohta, 1972] Ohta, T. (1972). Population size and rate of evolution. Journal of Molecular
  Evolution, 1(4):305–314.
[Ohta, 1992] Ohta, T. (1992). The nearly neutral theory of molecular evolution. Annual
  Review of Ecology and Systematics, 23(1):263–286.
[Ohta, 2002] Ohta, T. (2002). Near-neutrality in evolution of genes and gene regulation.
  Proceedings of the National Academy of Sciences, 99(25):16134–16137.
[Oliveto et al., 2018] Oliveto, P. S., Paixão, T., Pérez Heredia, J., Sudholt, D., and
  Trubenová, B. (2018). How to escape local optima in black box optimisation: When
  non-elitism outperforms elitism. Algorithmica, 80(5):1604–1633.
                                                143


[Orr, 2005] Orr, H. A. (2005). The genetic theory of adaptation: a brief history. Nature
  Reviews Genetics, 6(2):119–127.
[Østman and Adami, 2014] Østman, B. and Adami, C. (2014). Predicting evolution and
  visualizing high-dimensional fitness landscapes. Recent advances in the theory and appli-
  cation of fitness landscapes, pages 509–526.
[O’Donald, 1968] O’Donald, P. (1968). Measuring the intensity of natural selection. Nature,
  220:197–198.
[Palmer et al., 2012] Palmer, S. A., Clapham, A. J., Rose, P., Freitas, F. O., Owen, B. D.,
  Beresford-Jones, D., Moore, J. D., Kitchen, J. L., and Allaby, R. G. (2012). Archaeoge-
  nomic evidence of punctuated genome evolution in gossypium. Molecular Biology and
  Evolution, 29(8):2031–2038.
[Park et al., 2010] Park, S., Simon, D., and Krug, J. (2010). The speed of evolution in large
  asexual populations. J. Stat. Phys., 138:381–410.
[Pedro et al., 2021] Pedro, N., Silva, C. N., Magalhães, A. C., Cavadas, B., Rocha, A. M.,
  Moreira, A. C., Gomes, M. S., Silva, D., Sobrinho-Simões, J., Ramos, A., Cardoso, M. J.,
  Filipe, R., Palma, P., Ceia, F., Silva, S., Guimarães, J. T., Sarmento, A., Fernandes, V.,
  Pereira, L., and Tavares, M. (2021). Dynamics of a dual sars-cov-2 lineage co-infection
  on a prolonged viral shedding covid-19 case: Insights into clinical severity and disease
  duration. Microorganisms, 9:300.
[Peischl et al., 2013] Peischl, S., Dupanloup, I., Kirkpatrick, M., and Excoffier, L. (2013).
  On the accumulation of deleterious mutations during range expansions. Molecular ecology,
  22(24):5972–5982.
[Peischl and Excoffier, 2015] Peischl, S. and Excoffier, L. (2015). Expansion load: recessive
  mutations and the role of standing genetic variation. Molecular ecology, 24(9):2084–2094.
[Peischl et al., 2015] Peischl, S., Kirkpatrick, M., and Excoffier, L. (2015). Expansion load
  and the evolutionary dynamics of a species range. The American Naturalist, 185(4):E81–
  E93.
[Poli et al., 2007] Poli, R., Kennedy, J., and Blackwell, T. (2007). Particle swarm optimiza-
  tion. Swarm intelligence, 1(1):33–57.
[Price, 1972] Price, G. R. (1972). Extension of covariance selection mathematics. Annals
  of Human Genetics, 35(4):485–490.
[Price et al., 1970] Price, G. R. et al. (1970). Selection and covariance. Nature, 227:520–521.
[Ragusa and Bohm, 2022] Ragusa, V. and Bohm, C. (2022). Augmenting evolution with
                                               144


  bio-inspired “super explorers”. In ALIFE 2022: The 2022 Conference on Artificial Life.
  MIT Press.
[Ragusa and Bohm, 2021] Ragusa, V. R. and Bohm, C. (2021). Connections between noisy
  fitness and selection strength. In ALIFE 2021: The 2021 Conference on Artificial Life.
  MIT Press.
[Rechenberg, 1984] Rechenberg, I. (1984). The evolution strategy. a mathematical model of
  darwinian evolution. In Synergetics—From Microscopic to Macroscopic Order: Proceedings
  of the International Symposium on Synergetics at Berlin, July 4–8, 1983, pages 122–132.
  Springer.
[Rice and Chippindale, 2001] Rice, W. R. and Chippindale, A. K. (2001). Sexual recombi-
  nation and the power of natural selection. Science, 294(5542):555–559.
[Risken, 1989] Risken, H. (1989). The Fokker-Planck Equation: Methods of Solution and
  Applications. Springer Verlag, Berlin, Heidelberg, and New York, 2nd edition.
[Rivoire and Leibler, 2011] Rivoire, O. and Leibler, S. (2011). The value of information for
  populations in varying environments. Journal of Statistical Physics, 142(6):1124–1166.
[Rivoire and Leibler, 2014] Rivoire, O. and Leibler, S. (2014). A model for the generation
  and transmission of variations in evolution. Proceedings of the National Academy of
  Sciences, 111(19):E1940–E1949.
[Rouzine et al., 2008] Rouzine, I. M., Brunet, E., and Wilke, C. O. (2008). The traveling-
  wave approach to asexual evolution: Muller’s ratchet and speed of adaptation. Theor
  Popul Biol, 73:24–46.
[Rouzine et al., 2003] Rouzine, I. M., Wakeley, J., and Coffin, J. M. (2003). The solitary
  wave of asexual evolution. Proc Natl Acad Sci U S A, 100:587–92.
[Rozen et al., 2008] Rozen, D. E., Habets, M. G., Handel, A., and de Visser, J. A. G. (2008).
  Heterogeneous adaptive trajectories of small populations on complex fitness landscapes.
  PloS one, 3(3):e1715.
[Sahney et al., 2010] Sahney, S., Benton, M. J., and Falcon-Lang, H. J. (2010). Rain-
  forest collapse triggered carboniferous tetrapod diversification in euramerica. Geology,
  38(12):1079–1082.
[Sastry et al., 2005] Sastry, K., Goldberg, D., and Kendall, G. (2005). Genetic Algorithms,
  page 97–125. Springer US, Boston, MA.
[Shir, 2012] Shir, O. M. (2012). Niching in evolutionary algorithms. In Rozenberg, G.,
  Bäck, T., and Kok, J. N., editors, Handbook of Natural Computing, page 1035–1069, Berlin,
                                             145


   Heidelberg. Springer.
[Simpson, 1944] Simpson, G. G. (1944). Tempo and Mode in Evolution. Columbia University
   Press.
[Sites and Moritz, 1987] Sites, Jack W., J. and Moritz, C. (1987). Chromosomal evolution
   and speciation revisited. Systematic Biology, 36(2):153–174.
[Slater, 2013] Slater, G. J. (2013). Phylogenetic evidence for a shift in the mode of mam-
   malian body size evolution at the cretaceous-palaeogene boundary. Methods in Ecology
  and Evolution, 4(8):734–744.
[Slatkin and Excoffier, 2012] Slatkin, M. and Excoffier, L. (2012). Serial founder effects
   during range expansion: a spatial analog of genetic drift. Genetics, 191(1):171–181.
[Smalley Jr. et al., 1985] Smalley Jr., R. F., Turcotte, D. L., and Solla, S. A. (1985). A
   renormalization group approach to the stick-slip behavior of faults. Journal of Geophysical
   Research: Solid Earth, 90(B2):1894–1900.
[Smit and Eiben, 2009] Smit, S. and Eiben, A. (2009). Comparing parameter tuning meth-
   ods for evolutionary algorithms. In 2009 IEEE Congress on Evolutionary Computation,
   page 399–406.
[Steinberg and Ostermeier, 2016] Steinberg, B. and Ostermeier, M. (2016). Environmental
   changes bridge evolutionary valleys. Sci Adv, 2:e1500921.
[Takahata, 1987] Takahata, N. (1987). On the overdispersed molecular clock. Genetics,
   116(1):169–179.
[Templeton, 1980] Templeton, A. R. (1980).         The theory of speciation via the founder
   principle. Genetics, 94(4):1011–1038.
[Tkadlec et al., 2019] Tkadlec, J., Pavlogiannis, A., Chatterjee, K., and Nowak, M. A. (2019).
   Population structure determines the tradeoff between fixation probability and fixation time.
  Communications Biology, 2(11):1–8.
[Turcotte et al., 1985] Turcotte, D. L., Smalley, R. F., and Solla, S. A. (1985). Collapse of
   loaded fractal trees. Nature, 313:671–672.
[Uhlenbeck and Ornstein, 1930] Uhlenbeck, G. E. and Ornstein, L. S. (1930). On the theory
   of the Brownian motion. Physical Review, 36(5):823–841.
[Van Egeren et al., 2018] Van Egeren, D., Madsen, T., and Michor, F. (2018). Fitness
   variation in isogenic populations leads to a novel evolutionary mechanism for crossing
   fitness valleys. Communications Biology, 1(11):1–9.
                                             146


[Van Valen, 1965] Van Valen, L. (1965). Selection in natural populations. III. Measurement
  and estimation. Evolution, 19(4):514–528.
[Van Valen, 1967] Van Valen, L. (1967). Selection in natural populations. 6. Variation
  genetics, and more graphs for estimation. Evolution, 21(2):402–405.
[Wang and Zhang, 2011] Wang, Z. and Zhang, J. (2011). Impact of gene expression noise
  on organismal fitness and the efficacy of natural selection. Proceedings of the National
  Academy of Sciences, 108(16):E67–E76.
[Weinreich and Chao, 2005] Weinreich, D. M. and Chao, L. (2005). Rapid evolutionary
  escape by large populations from local fitness peaks is likely in nature. Evolution, 59:1175–
  82.
[Weissman et al., 2009] Weissman, D. B., Desai, M. M., Fisher, D. S., and Feldman, M. W.
  (2009). The rate at which asexual populations cross fitness valleys. Theoretical population
  biology, 75(4):286–300.
[Wilke, 2004] Wilke, C. O. (2004). The speed of adaptation in large asexual populations.
  Genetics, 167:2045–53.
[Wilke, 2005] Wilke, C. O. (2005). Quasispecies theory in the context of population genetics.
  BMC Evol Biol, 5:44.
[Wolf et al., 2006] Wolf, Y. I., Viboud, C., Holmes, E. C., Koonin, E. V., and Lipman, D. J.
  (2006). Long intervals of stasis punctuated by bursts of positive selection in the seasonal
  evolution of influenza a virus. Biology Direct, 1(1):34.
[Wolpert and Macready, 1995] Wolpert, D. and Macready, W. (1995).               No free lunch
  theorems for search. Technical Report SFI-TR-95-02-010.
[Wolpert and Macready, 1997] Wolpert, D. and Macready, W. (1997). No free lunch
  theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82.
[Wolpert and Macready, 2005] Wolpert, D. and Macready, W. (2005). Coevolutionary free
  lunches. IEEE Transactions on Evolutionary Computation, 9(6):721–735.
[Wright, 1932] Wright, S. (1932). The roles of mutation, inbreeding, crossbreeding, and
  selection in evolution. Proceedings of the Sixth International Congress of Genetics, 1:356–
  366.
[Wright, 1982] Wright, S. (1982). The shifting balance theory and macroevolution. Annual
  review of genetics, 16(1):1–20.
[Yi, 2010] Yi, C. (2010). On the first passage time distribution of an Ornstein–Uhlenbeck
                                              147


process. Quantitative Finance, 10(9):957–960.
                                         148