THE EVOLUTIONARY ORIGINS OF COGNITION: UNDERSTANDING THE EARLY EVOLUTION OF BIOLOGICAL CONTROL SYSTEMS AND GENERAL INTELLIGENCE

By

Anselmo Carvalho Pontes

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computer Science - Doctor of Philosophy
Ecology, Evolutionary Biology and Behavior - Dual Major

2021

ABSTRACT

THE EVOLUTIONARY ORIGINS OF COGNITION: UNDERSTANDING THE EARLY EVOLUTION OF BIOLOGICAL CONTROL SYSTEMS AND GENERAL INTELLIGENCE

By

Anselmo Carvalho Pontes

In the last century, we have made great strides toward understanding natural cognition and recreating it artificially. However, most cognitive research is still guided by an inadequate theoretical framework that equates cognition with a computer system executing a data-processing task. Cognition, whether natural or artificial, is not a data-processing system; it is a control system. At cognition's core is a value system that allows it to evaluate current conditions and decide among two or more courses of action. Memory, learning, planning, and deliberation, rather than being essential cognitive abilities, are features that evolved over time to support the primary task of deciding "what to do next". I used digital evolution to recreate the early stages in the evolution of natural cognition, including the ability to learn. Interestingly, I found that cognition evolves in a predictable manner, with more complex abilities evolving in stages by building upon simpler ones. I initially investigated the evolution of dynamic foraging behaviors among the first animals known to have had a central nervous system, the Ediacaran microbial mat miners. I then followed this up by evolving more complex forms of learning. I soon encountered practical limitations of the current methods, including exponential demands on computational resources and genetic representations that were not conducive to further scaling.
This type of complexity barrier has been a recurrent issue in digital evolution. Nature, however, is not limited in the same ways; through evolution, it has created a language to express robust, modular, and flexible control systems of arbitrary complexity and apparently open-ended evolvability. The essential features of this language can be captured in a digital evolution platform. As an early demonstration of this, I evolved biologically plausible regulatory systems for virtual cyanobacteria. These systems regulate each cell's growth, photosynthesis, and replication in response to the daily light cycle, the cell's energy reserves, and its stress levels. Although simple, this experimental system displays dynamics and decision-making mechanisms akin to those of biology, with promising potential for the open-ended evolution of cognition toward general intelligence.

Copyright by
ANSELMO CARVALHO PONTES
2021

In memory of my mother, Elfa.

ACKNOWLEDGEMENTS

Thank you to Laura Grabowski and David Bryson, who helped me set up the first Avida experiments, and to Andrew Mitchell, who created the tool to plot the behavior arenas and the digital organisms' paths. Many thanks to my collaborators: Nick Panchy, who spent many hours with me discussing the fine details of gene regulation and cell metabolism; Shin-Han Shu, who suggested I use a photosynthetic cell as a model for the evolution of gene regulation, and let me take Nick away from his work; Fred Dyer, for the numerous hours he spent working on paper drafts and educating me on animal behavior and cognition; Charles Ofria, for getting me unstuck many times and giving me clever suggestions about how to solve problems and make our research projects better; Ian Whalen and Andrew Mitchell, who worked diligently and creatively on our projects, with great results; Cliff Bohm, for replicating our results in record time and providing constructive criticism; and Robert Mobley and Ali Tehrani, for their time, ideas, and support.
I am especially thankful to David Arnosti, William Henry, Shelagh Ferguson-Miller, Kay Holekamp, Mark Reimers, Heather Eisthen, Chris Adami, and Charles Ofria, who welcomed me into their classes, supported me in my objectives, and from whom I learned so much. I am equally thankful to the Evolving Intelligence group, especially Rob Pennock, Fred Dyer, and Chris Adami, a wonderful assembly of like minds, who gave me great feedback on my research, pointed me in interesting directions, and taught me a great deal about our field. I would like to give special thanks to Erik Goodman and Wolfgang Banzhaf for the time they spent listening to my ideas and providing feedback and support. I am also very grateful to Luis Zaman, Josh Nahum, Anne Sonnenschein, Jory Schossau, and Bill Punch for their time and their suggestions of great research and resources that were important to my work. Thank you very much to Annat Harber for reviewing the manuscript of Chapter 2 and making numerous suggestions to improve it; I also appreciate your enthusiasm, support, and advice, which were crucial to this project. Thank you to Ricardo Chagas, Scott Wagner, Alex Smith (UWEC), and Michael R. Weil, all of whom were instrumental along the way. Thank you to my lab mates, past and present, Emily Dolson, Anya Vostinar, Rose Canino-Koning, Alex Lalejini, Mathew Rupp, Michael Wiser, Luis Zaman, Jay Bundy, Aaron Wagner, and Heather Goldsby, for your help over the years. Just as appreciated is the help, support, and advice of my graduate committee members, Charles Ofria, Fred Dyer, Chris Adami, David Arnosti, and Wolfgang Banzhaf. Special gratitude to Charles Ofria for welcoming me to MSU, being my advisor, and enabling me to realize my lifelong dream of discovering wonderful things about the world. Finally, my deepest gratitude to my wife, Amy Skalmusky, for her tireless and unwavering support, and for sharing my dream.
vii TABLE OF CONTENTS LIST OF TABLES ........................................................................................................................xi LIST OF FIGURES .....................................................................................................................xii Chapter 1: Introduction ........................................................................................................... 1 1.1 A Promising Idea Became a Sand Trap ................................................................................. 2 1.1.1 The Rise of Information Processing Theory (IPT)........................................................... 2 1.1.2 IPT Reveals its Shortcomings ......................................................................................... 4 1.1.3 Alternative Theories ...................................................................................................... 5 1.2 What Is Cognition? .............................................................................................................. 6 1.2.1 Cognition Controls Behavior ......................................................................................... 6 1.2.2 What Is Behavior? ......................................................................................................... 6 1.2.3 Cognition Is an Agent’s Decision-Making System .......................................................... 8 1.3 Rational Agency and Decision-Making ................................................................................. 9 1.3.1 What Is a Rational Agent? ............................................................................................. 9 1.3.2 Rational Agents Versus Clockwork .............................................................................. 10 1.3.3 The Tryptophan System in Escherichia coli ................................................................. 
13 1.3.4 Decision-Making in Abstract Terms............................................................................. 15 1.3.5 The Value System and Its Role in Decision-Making ..................................................... 16 1.3.6 Available Information and Its Role in Decision-Making ............................................... 17 1.3.7 Valuation-Associated Actions and Inaction ................................................................. 18 1.3.8 Caveats of Rational Agency ......................................................................................... 19 1.3.9 Self-Governance and Degrees of Autonomy ............................................................... 20 1.3.10 Degrees of Intelligence across Different Dimensions ................................................ 20 1.3.11 Collective Agency and Multicognition ....................................................................... 21 1.3.12 Coming Full Circle...................................................................................................... 23 1.4 Control Systems, and Why Cognition Is Not Information Processing................................. 24 1.4.1 What Are Control Systems? ........................................................................................ 24 1.4.2 Control System Techniques and Objectives ................................................................ 25 1.4.3 Implementing Control Systems in Computers ............................................................. 25 1.4.4 The Data Processing Algorithm and IPT ...................................................................... 27 1.4.5 Natural Versus Artificial Control Systems.................................................................... 28 1.4.6 Understanding Control Systems Illuminates Cognition ............................................... 30 1.4.7 Why Cognition Is Not Information Processing ............................................................ 
32 1.4.8 Changing the Cognition Discussion: Instead of Process Information, Make Decisions 33 1.5 Cognition is Intrinsic and Necessary for Life ...................................................................... 34 1.5.1 What is Life? ............................................................................................................... 34 1.5.2 Life Requires Cognition ............................................................................................... 36 1.5.3 Cognition Is a Precursor of Life ................................................................................... 37 1.5.4 Life Is a Type of Rational Agency ................................................................................. 39 viii 1.6 Path Forward ..................................................................................................................... 40 Chapter 2: The Evolutionary Origin of Associative Learning ................................................... 44 2.1 Introduction....................................................................................................................... 44 2.2 Experimental System ......................................................................................................... 50 2.3 The Behavioral Task ........................................................................................................... 52 2.3.1 Experimental Conditions ............................................................................................. 56 2.4 Results ............................................................................................................................... 58 2.4.1 Repeated Evolution of Adaptive Behaviors: Error Recovery, Imprinting, and Reversal Learning ............................................................................................................................... 58 2.4.2 Early trail predictability produces behavioral building blocks for learning .................. 
63 2.4.3 Learning May Not Generalize to Novel Environments ................................................ 67 2.4.4 Cue Reversals During Evolution Foster Ability to Relearn During Lifetime .................. 68 2.4.5 The Stepwise Evolution of Learning ............................................................................ 70 2.4.6 Learning Can Evolve Suddenly..................................................................................... 74 2.5 Discussion and Conclusions ............................................................................................... 75 2.5.1 Emergence of Learning Depends on the Prior Evolution of Reflexive Behaviors......... 75 2.5.2 Stepwise and Modular Evolution of Complex Behaviors ............................................. 76 2.5.3 Why Learning Was Rare .............................................................................................. 78 2.5.4 The Scientific Value of an Open-Ended Evolutionary Model ....................................... 79 2.5.5 Early Evolution of an Intrinsic Value System ............................................................... 80 2.5.6 Reversal Learning Seems No More Complex than Initial Learning .............................. 81 2.5.7 How Evolution Continues to Shape Associative Learning............................................ 82 2.5.8 Implications for Artificial Intelligence ......................................................................... 83 2.5.9 Implications for the Evolution of Behavior .................................................................. 84 Chapter 3: Evolution of Patch Harvesting, an Insight into Early Bilaterian Cognition.............. 85 3.1 Introduction....................................................................................................................... 85 3.1.1 Background ................................................................................................................. 
87 3.1.2 Previous Research ....................................................................................................... 87 3.1.3 Recent Evidence .......................................................................................................... 88 3.1.4 Our Experiments ......................................................................................................... 89 3.1.5 Our Findings ................................................................................................................ 91 3.2 Methods ............................................................................................................................ 92 3.3 Results ............................................................................................................................... 96 3.3.1 Evolved Behaviors Resemble the Fossil Record......................................................... ..96 3.3.2 Evolved Behaviors Fall into Four Stereotypical Strategies........................................... 97 3.3.3 Evolved Strategies Depended on Patch Structure ..................................................... 100 3.3.4 In-Depth Results from a Single Environment, Rectangular with Edges ..................... 102 3.3.5 The Spiraling Strategy Evolved Memory Usage ......................................................... 103 3.3.6 Lineage Analysis Shows Different Strategies Evolving from One Another............... ..105 3.4 Discussion and Conclusions ............................................................................................. 107 3.4.1 Complex Trails Require Healthy Patches ................................................................... 107 3.4.2 Patch Boundaries May Have Guided Mat Mining Behavior ...................................... 108 ix 3.4.3 Peculiar Spiraling Behavior Balances Exploration and Exploitation ........................... 
109 Chapter 4: Beyond Associative Learning, the Early Evolution of Configural Learning ........... 111 4.1 Introduction..................................................................................................................... 111 4.1.1 Background and Motivation ...................................................................................... 111 4.1.2 Our Experiments ....................................................................................................... 112 4.1.3 Our Findings .............................................................................................................. 113 4.2 Methods ........................................................................................................................ ..113 4.3 Results ............................................................................................................................. 116 4.4 Discussion ........................................................................................................................ 120 4.4.1 Extra Cognitive Abilities May Contribute to Adaptation ........................................... 120 4.4.2 We Are Approaching a Complexity Barrier.............................................................. ..121 Chapter 5: Evolution of Allosteric Regulation in Cyanobacteria ............................................123 5.1 Introduction..................................................................................................................... 123 5.1.1 A New Digital Evolution Platform ............................................................................ ..124 5.1.2 Evolution of Allosteric Regulation Experiment .......................................................... 125 5.1.3 Findings ................................................................................................................... 
..126 5.2 Methods ........................................................................................................................ ..126 5.2.1 Cell and Population Organization in Elfa ................................................................... 126 5.2.2 Ancestral Cells ........................................................................................................... 127 5.2.3 Ligands in Elfa ........................................................................................................... 129 5.2.4 Cellular Costs and Constraints................................................................................. ..130 5.2.5 Experimental parameters ....................................................................................... ..131 5.2.6 Analyses .................................................................................................................... 131 5.3 Results ........................................................................................................................... ..132 5.4 Discussion and Conclusions ............................................................................................. 139 5.4.1 Elfa as an open-ended digital evolution platform ................................................... ..140 APPENDICES ........................................................................................................................ 142 Appendix A: Supplementary Material for Chapter 2 ............................................................ ..143 Appendix B: Supplementary Material for Chapter 4 ............................................................ ..182 REFERENCES .........................................................................................................................184 x LIST OF TABLES Table 2.1: Environments for experiment 1 .................................................................................. 
57 Table 2.2: Behavioral strategies found in all experiments........................................................... 60 Table 2.3: Experiment 1: summary of results .............................................................................. 66 Table 3.1: Environments in the order they were used in the experiments. Each environment consisted of four arenas with different patch configurations. Each organism experienced only one arena in its lifetime. ............................................................................................................. 93 Table 3.2: Environmental Interaction instructions used in the experiment .............................. ..95 Table 3.3: Behavioral strategies found across all treatments. For each environment, we chose the three populations with highest overall task quality across the 200 replicates and analyzed the navigation strategy of their predominant organisms. Although there was a great deal of variation, all behaviors could be classified into four major strategies. ....................................... 98 Table 3.4: Strategies that evolved among the three best performing populations from each environment. .......................................................................................................................... ..101 Table A.1: Avida instructions mnemonic references ................................................................. 145 Table A.2: Preliminary experiment summary of results. Performance and strategies of the organisms with AMTQ equal or higher than 25%, organized by environment*. ..................... ..155 Table A.3: Experiment 2 and follow-up experiment summary of results. Comparison between the two conditions, standard Turing-complete instruction set, and non-Turing-complete, minimal memory instruction set. .............................................................................................. 
175 xi LIST OF FIGURES Figure 1.1: Steam loom centrifugal governor, Museum of Science and Industry, Manchester, UK. .................................................................................................................................................... 26 Figure 2.1: Sample Arena and Nutrient Trail. Shown is one of four virtual arenas from an environment. Each virtual arena contained a single trail of nutrients laid out in a unique configuration. At the beginning of its life, each organism was placed alone at the start of the trail (green circle) in a randomly selected arena and oriented in the direction of the next nutrient. ...................................................................................................................................... 53 Figure 2.2: Two top-performing strategies in experiment 1. Shown are the paths of the final predominant organisms from two different replicates that evolved in the nutrient cued environment in experiment 1. Both were tested in the same trail configuration to facilitate comparison. In the left panel, an organism using an error recovery strategy achieved a task quality score of 81% of the maximum. Starting from the green circle, it moved straight while sensing forward cues but always tried to turn right (45 degrees) when sensing a turn cue. If turning right led the organism into an empty cell, it would retreat to the previous position and turn toward the left (90 degrees). It continued to repeat this behavior at every turn cue without ever learning from its error. In the right panel, an organism from a separate replicate using a generalizable imprinting strategy achieved a task quality score of 98% of the maximum. It also tried to turn right when sensing a turn cue. However, it stepped off the path only once at the first left turn. It learned the correct cue-response association and navigated the remainder of the trail without error. 
................................................................................................................ 61 Figure 2.3: Distribution of average maximum task quality (AMTQ) per environment for experiment 1. Each violin plot represents the distribution of AMTQ across replicates for a given environment. Only the environments that started with a predictable pattern (one fixed turn, two fixed turns, and nutrient cued) evolved organisms that could finish the trail. They also produced a wider range of navigational strategies and organisms that reached much higher task quality than the control environment (random start). ................................................................ 65 Figure 2.4: Distribution of average maximum task quality across 900 replicates. The performance histogram of all final predominant organism in experiment 2 reveals a marked grouping by behavioral strategy. Organisms in groups 1 and 2 did not finish the trails, while those in groups 3, 4 and 5 did. Group 1 consisted mainly of organisms that navigated by path predicting and its hybrids. Group 2 consisted mainly of organisms that navigated by error recovery, imprinting, and their hybrids. Group 3 consisted mainly of organisms that navigated by more effective forms of error recovery. Group 4 consisted mainly of organisms that employed imprinting hybrids. Group 5 consisted mainly of organisms capable of relearning. The behaviors from groups 1, 2 and 3 were assessed from a sample of organisms. Those of groups 4 and 5 were assessed from all organisms. .................................................................................... 69 xii Figure 2.5: Evolutionary history: 10 Lineages. Shown is the evolution of task quality over time in each of the 10 lineages that were ultimately capable of serial relearning from experiment 2. As they transitioned to a new strategy, some lineages had great gains in task quality, while others had more gradual ones. 
All lineages, however, went through occasional periods of fitness loss. Different task quality ranges often corresponded to specific behavioral strategies. Range 1 corresponded to path predicting, range 2 corresponded to hybrid strategies that included searching, range 3 corresponded to error recovery, and ranges 4 and 5 corresponded to imprinting and relearning. .......................................................................................................... 72 Figure 2.6: Commonly observed evolutionary sequences. Shown are the evolutionary trajectories of the 11 lineages that evolved associative learning in experiment 1, and the 10 lineages that evolved serial relearning in experiment 2. Behaviors evolved in a characteristic sequence of phenotypic stages. Starting from a naive and sessile common ancestor, all 21 lineages evolved the capacity for moving, then sensing, followed by reflexive navigation and then learning. The numbers next to the arrows indicate how many lineages followed a particular pathway, with thicker lines indicating more common evolutionary pathways in relation to alternatives................................................................................................................ 73 Figure 3.1: Ediacaran trace fossil. The millimeter wide traces were produced by bilaterians with a nervous system, the first animals with these characteristics on record. Photo by Verisimilus at English Wikipedia, CC BY 2.5, Image downloaded from: https://commons.wikimedia.org/w/index.php?curid=2502886. ................................................ 86 Figure 3.2: Sample Arena with an irregular shaped nutrient patch and edge nutrients. Each arena contained a patch of nutrients with a unique shape. At the start of its life, each organism was placed alone in a randomly selected arena, within a nutrient patch, at a consistent location and orientation. 
.......................................................................................................................... 94 Figure 3.3: Example of the pattern cycling strategy (with edge-hugging) in the environment Rectangular with Edge. Green and red circles indicate the start and end of the trail. ................ 99 Figure 3.4: Illustration of the edge-reflecting (left) and edge-hugging (right) variations of the pattern cycling strategy .............................................................................................................. 99 Figure 3.5: Example of the reactive meandering strategy in the environment irregular with holes and edges ................................................................................................................................ ..100 Figure 3.6: Example of the plowing strategy in the environment rectangular with hole and edges ................................................................................................................................................ ..103 Figure 3.7: Example of the spiraling strategy in the environment disconnected patches without edges. The arena is toroidal and the organism begins its navigation at the green dot and ends at the red. It spirals inward but leaves the patch before reaching the center. It starts a new spiral upon encountering a new patch. .............................................................................................. 104 xiii Figure 3.8: Evolutionary history of the plowing strategy in the environment rectangular with hole and edges. On the left, is the evolution of task quality over time. On the right, the different navigation strategies from selected ancestors along the lineage............................................ ..105 Figure 4.1: Environment with four unique arenas used in the second experiment. 
At the beginning of each organism’s life cycle, we placed it on a nutrient at the start of the trail in a randomly selected arena, facing the next nutrient. The direction of the first two turns in any trail was random. However, the direction of the second turn could be predicted from the number of nutrients preceding the first turn. The first 90-degree turn only appeared at approximately the 25% mark of each trail. ............................................................................... 115 Figure 4.2: Path of an organism on a nutrient trail demonstrating configural learning. In this trail, turn cues have different meanings depending on the context (i.e., if they are preceded by a sharp turn cue or not). The organism makes a mistake on the second turn, stepping off the trail, but recovers and subsequently associates the meaning of the cue with the correct direction. Afterwards, it is able to extrapolate the learned cue to different contexts. ............. 118 Figure 4.3: At left, path of an ancestral organism that navigated by error recovery. At right, Path of an ancestral organism that was capable of associative learning but not configural learning. The organism learns the cue association in the second turn and uses it to navigate all 45-degree turns. However, it is not capable of extrapolating the association for the 90-degree turns and uses error recovery instead....................................................................................................... 119 Figure 5.1: The 18 populations that evolved allosteric regulation could be categorized into three groups according to which of their proteins evolved allosteric regulation. The first group (four populations) evolved growth factor regulation. The second group (10 populations) evolved RNA polymerase (RNAp) regulation. The last group (four populations) evolved both growth factor and RNAp regulation. 
All final populations had higher fitness than the ancestral one (green dotted line at 0.9 generations per day). In addition, most populations with regulation (14/18) had higher fitness than the best non-regulated final population (orange dashed line at 3.4 generations per day). ................................................................................................................ 134 Figure 5.2: Gene regulatory network of the ancestral cell with no regulation. On the top are the five ligands available to the cell: ROS, fat reserves (Res), sugar reserves (Sug), irradiance (Light), and cAMP. The ancestral cell does not sense the concentrations of any of these ligands. ....... 135 Figure 5.3: Gene regulatory network of one of the cells with regulated growth factors. Here, the growth factor could bind three different ligands with different affinities, and with different effects. The fat reserves ligand (Res) would bind weakly and had an agonistic effect. The irradiance ligand (Light) would also bind weakly but would have a reverse agonistic effect. Finally, the cAMP ligand would bind strongly and also have a strong reverse agonistic effect. In practice, high fat reserves would promote a slight increase in growth factor production, while xiv both the presence of light (useful for building more reserves) and cAMP (indicating starvation) reduced the growth factor production and thus delayed replication. ...................................... 136 Figure 5.4: Gene regulatory network of one of the cells with regulated RNAp. Here, the RNAp could bind two different ligands with different affinities, and with different effects. The ROS ligand would bind strongly and also cause a strong agonistic effect, while the sugar reserves ligand (Sug) would bind weakly and have a weak reverse agonistic effect. 
In practice, the presence of ROS, even in small amounts, would cause the cell to increase the expression rate of all its genes, including the growth factor, causing it to replicate faster. Replication increased the ratio of surface area to volume in the daughter cells, temporarily alleviating the source of stress. High sugar reserves (indicating active photosynthesis), on the other hand, would slightly reduce the expression rate of all genes, slowing down replication and allowing energy reserves to build.

Figure 5.5: Gene regulatory network of one of the cells with both growth factor and RNAp regulation. Here, the RNAp could bind three different ligands while the growth factor could bind two. The ROS ligand would bind strongly to both RNAp and growth factor, causing strong agonistic effects. The fat reserves ligand (Res) could also bind to both RNAp and growth factor, but it caused a weak reverse agonistic effect. Finally, the sugar reserves ligand (Sug) would bind weakly to the RNAp and have a weak reverse agonistic effect. Similar to the case in fig. 5.4, ROS would have a strong effect on gene expression by promoting the activity of RNAp, but with even more emphasis on cell growth and replication, since it also directly promoted the activity of the growth factor. However, high fat and sugar reserves (indicating active photosynthesis) would slightly reduce the rate of protein expression and the activity of the growth factor, which favored the accumulation of energy reserves.

Figure 5.6: A twenty-four-hour period of one of the cell lineages with RNAp allosteric regulation. In the presence of ROS, which causes an agonistic effect on their RNAp, these cells grow faster and divide, thus increasing their surface area and the absorption of CO2.
Sharp drops in cell volume indicate cell division. Note that cells have steeper growth curves and also replicate more often during the day than at night.

Figure 5.7: A twenty-four-hour period of one of the fittest cell lineages without regulation. These cells grow and replicate at a constant rate despite the environmental light cycle and their own internal stress. As a result, they often suffer damage and mutations due to ROS accumulation when they reach their photosynthetic limit.

Figure A.1: One fixed turn environment. This environment consists of four different trails. In all of them, the first turn is to the right. The green circle indicates where the organism is placed at the start of the trail, facing the next nutrient.

Figure A.2: Two fixed turns environment. This environment consists of four different trails. In all of them, the first turn is to the left and the second turn is to the right. The green circle indicates where the organism is placed at the start of the trail, facing the next nutrient.

Figure A.3: Nutrient cued environment. This environment consists of four different trails. In each of them, the number of nutrients before the first turn is even if the first turn is to the right and odd if the first turn is to the left. The green circle indicates where the organism is placed at the start of the trail, facing the next nutrient.

Figure A.4: Random start environment. This environment consists of four different trails.
In all of them, the number of nutrients before the first turn is the same (3), and each of the four possible combinations of first and second turns is represented, making the start pattern unpredictable from the organism’s point of view. The green circle indicates where the organism is placed at the start of the trail, facing the next nutrient.

Figure A.5: Cue reversal environment. This environment consists of four different trails with the same start pattern as the nutrient cued environment. In each of them, the number of nutrients before the first turn is even if the first turn is to the right and odd if the first turn is to the left. The pink circle indicates the point where the turn cues are reversed. The green circle indicates where the organism is placed at the start of the trail, facing the next nutrient.

Figure A.6: Distribution of Average Maximum Task Quality (AMTQ) per environment, comparing the preliminary experiment and experiment 1. Each violin plot represents the distribution of task qualities across replicates for a given environment. The preliminary experiment did not use the move back instruction (orange); experiment 1 did (blue). The difference between experiments was significant in all but the one fixed turn environment (Kruskal–Wallis test: one fixed turn, H = 3.0309, df = 1, p = 0.08169; two fixed turns, H = 6.7623, df = 1, p = 0.00931; nutrient cued, H = 19.522, df = 1, p < 0.0001; random start, H = 25.301, df = 1, p = 4.904e-07).

Figure A.7: Searching and imprinting hybrid strategy. This organism, which evolved in the nutrient cued environment, is an example of a searching and imprinting hybrid strategy that reached a task quality of 57% of the maximum in this trail.
When the organism started navigating the trail, it reacted to turn cues by turning in a default direction. If this direction led it to step off the trail, it would initiate a search procedure, alternating forward moves and turns, until it found another portion of the trail and reentered it. This stint off the trail also primed it to imprint the cue association the next time it encountered the non-default turn cue, after which it would navigate the remainder of the trail using the learned association.

Figure A.8: Evolution of learning according to cue reversal position. Different cue reversal positions along the trail of nutrients, from 10% to 90% of the total length, in 2.5% intervals. In blue, the number of replicates from the first set that evolved learning, out of 200 per position. In yellow, the second set, with 1000 replicates per position.

Figure A.9: “Oscillator” organism. This organism moved back and forth between its start position (green circle) and its final position (red circle).

Figure A.10: “Straight mover” organism. This organism moved forward only until it encountered a turn cue.

Figure A.11: “Right turner” organism. This organism reacted to turn cues by always rotating right.

Figure A.12: “Path predictor” organism. This organism had an encoded pattern in its behavioral algorithm that reflected the different starts of the four trails for this particular environment (nutrient cued). When the pattern no longer matched the trail and the organism made a “wrong” turn, it stopped moving.
Figure A.13: “Error recoverer 1” organism. This organism was the first in this lineage to use the error recovery strategy; however, it often wasted movements, which made progress somewhat slow.

Figure A.14: “Error recoverer 2” organism. This organism was the last one in the lineage to rely exclusively on the error recovery strategy. Its behavior was a streamlined version of its early ancestor’s (Error recoverer 1), which allowed it to move faster, waste fewer movements, and reach much further into the trail.

Figure A.15: “First learner” organism. This organism was the first one in the lineage to use the relearning strategy. It differed from its immediate ancestor (Error recoverer 2) by a single mutation. Its path in one of the arenas (right) shows that it went off the trail three times: once in the initial learning, a second time when the cues were reversed, and a third time when it detected that the trail had ended, at which point the organism stopped moving.

Figure A.16: One single mutation separates the error recovery and relearning behaviors. The transition from Error recoverer 2, on the left, to First learner, on the right, occurred due to a single mutation. For the full source code of the First learner, see fig. A.18.

Figure A.17: Change in the behavioral algorithm from Error recoverer 2 to First learner due to a single mutation. The mutation connected the error-recovery module to the memory-storing module. Previously, memory-storing was executed only once, right after the organism was initialized on the trail (left sequence).
After the mutation, every time it made a wrong turn and recovered, the organism stored the cue that led it off the trail in memory (right sequence).

Figure A.18: Comparison between First learner and Final organism. The picture shows the genomes of the First learner and the Final organism, side by side. Lines connecting both genomes indicate corresponding portions of the algorithm. The longer Final organism’s genome indicates a substantial accumulation of neutral mutations during evolution. Interestingly, the active parts of the genome were highly conserved and tended to remain together.

Figure A.19: Distribution of Average Maximum Task Quality (AMTQ) per condition for experiment 2 and the follow-up experiment. Each violin plot represents the distribution of AMTQ across replicates for a given condition. The difference between the two conditions was not significant (Kruskal–Wallis test, H = 0.11008, df = 1, p = 0.7401).

Figure B.1: Flowchart of the predominant organism capable of the generalizable version of configural learning from Experiment 2. Its cyclomatic complexity is 13.

Chapter 1: Introduction

Science has progressively revealed the nature of the world around us and within us, but scientific advances have been uneven. One subject that remains stubbornly resistant to our efforts is cognition. We do not know how it works or how to measure it, we are not sure who or what has it, and we even have trouble defining it. If we could understand cognition better, it would change the way we see ourselves and the living world around us. If we understood cognition to the point where we could recreate it artificially and with high degrees of competence, we could harness one of the most powerful forces of all and transform both technology and our society.
Currently, most scientific efforts related to cognition are based on a model that equates the brain to a type of computer system. However, this paradigm is outdated, and although there are alternative views, they have not yet gained traction. Our flawed understanding of cognition is stunting the field of artificial intelligence (AI), which relies on theoretical guidance from research on natural cognition.

In this chapter, I start by discussing the theory of cognition that is currently dominant, including its history and how it has affected AI research. Later, I discuss the nature of cognition in the context of agency and decision-making. I then provide a new perspective on the subject that I believe is key to advancing both our understanding of cognition and our research on AI. I relate the various concepts back to living organisms that were produced through an evolutionary process and discuss approaches to creating AI with levels of autonomy and intelligence similar to what we see in nature. In subsequent chapters, I explore the evolution of behavioral control systems that exhibit key cognitive abilities before finally discussing results from a system of my own. This system uses an approach more directly inspired by nature to produce versatile control systems with greater complexity than current evolutionary techniques permit.

1.1 A Promising Idea Became a Sand Trap

1.1.1 The Rise of Information Processing Theory (IPT)

In the years following the Second World War, there was much excitement about the newly invented electronic digital computer and its potential for revolutionary applications. Early computers were called “electronic brains”, not only in the popular press, but by scientists as well. Amid this computer age enthusiasm, the field of artificial intelligence (AI) was launched in 1956 [1], the same year that a new theory equating human cognition to a computer system started to take shape 1 [2].
This theory, which became known as Information Processing Theory (IPT), posits that human cognition is an instantiation of a Turing machine, a general-purpose symbol manipulation system. IPT holds that sensory data (stimuli), received as input, is converted into symbols, stored in memory, and processed according to some rule, which then generates an output (behavior) 2 [2], [4]. AI rapidly adopted IPT, and the two shared some impressive early results, such as computer programs that proved theorems and processed natural language, and robots that navigated inside a room and manipulated objects [2], [5]. Buoyed by these achievements and a good dose of computer-age enthusiasm, IPT soon replaced Behaviorism as the dominant theory of cognition [2].

Originally, IPT considered cognition to be a uniquely human capacity – at most shared with other primates – since we were supposedly the only creatures capable of symbolic representation [2]. Later, animal behavior researchers extended IPT to non-primate animals [6] by relaxing the concept of representation 3, going beyond explicit symbols to broadly include the encoding of information in neural signals [8]–[11], [7]. However, this broader definition still implied that cognition was exclusive to organisms with a nervous system.

1 In the summer of 1956, Allen Newell and Herbert A. Simon, two of the founders of Information Processing Theory (IPT), attended the historical workshop at Dartmouth College where the field of AI was named. There, they presented their Logic Theorist system, the first application of IPT [3].

2 Information Processing Theory has many variations, and some authors refer to information processing as a set of related theories [2]. Despite their differences, the various flavors of information processing all share the view of cognition as a symbol manipulation process, following a well-defined input–process–output sequence, akin to a digital computer system executing a data processing task.

3 Gallistel [7] (pp. 4 and 7) defines representation as “signals, symbols and the operations on them”, where symbols and signals are functionally homomorphic to aspects of the world. That is, “the signals and symbols carry information about properties of the experienced world” in such a way that the brain, operating on those signals and symbols, can “anticipate behaviorally relevant states of the world”. However, “a symbolic memory mechanism has not so far been identified … The absence of a symbolic memory mechanism is a problem, because a mechanism functionally equivalent to the tape in Turing’s abstract conception of a general-purpose computing machine is essential to computation and representation.”

1.1.2 IPT Reveals its Shortcomings

Despite IPT’s popularity, it eventually became clear that humans are not the general-purpose symbol manipulators that the theory predicted. Experiments demonstrated that humans do not rely as much on representation and often use heuristics to solve problems [12], [13]. IPT was also criticized for its one-way, serial, input–output processing model that strictly separated perception from action, when the two are often intrinsically coupled, such as when our eyes scan a scene for us to build an image [14], [15]. Moreover, IPT neglected the role of feedback and required an unrealistically accurate internal model of the world to achieve real-time motor control [5], [16].

Meanwhile, AI’s early promising results turned out to be brittle, as they did not scale well to more complex versions of the same problem, nor did they generalize to related problems [5]. Since then, AI has mostly set aside its original goal of recreating human intelligence and has instead settled for domain-specific targets: a narrow range of tasks where norms can be articulated clearly and existing knowledge can be tapped to prime the system [1], [17]. There is also an ironic circularity in the relationship between AI and IPT.
While AI researchers look to IPT to understand how cognition works in order to emulate the process in a computer system, IPT claims that cognition works just like … a computer system. In spite of these shortcomings, IPT is still the most influential theory of cognition in AI, permeating most rule-based and connectionist approaches [15], [18].

1.1.3 Alternative Theories

Alternative theories of cognition have been proposed in which cognition is not akin to a data processing system and does not rely on symbolic representation. The most prominent ones, including Perceptual Control Theory [19], Dynamicism [20], and 4E Cognition [21], use a control-system paradigm instead of the sequential input–process–output algorithm. None, however, has yet achieved widespread acceptance. Most importantly, even in the light of these theories, cognition remains a mysterious phenomenon. While AI is trapped in this theoretical quicksand, with no clear agreement on what cognition really is, recreating general intelligence 4 will likely remain out of reach. After all, how can we build what we do not understand? 5 This dissertation attempts to understand cognition and find a path to artificially create it.

4 General intelligence, as opposed to domain-specific intelligence, refers to the multifaceted and versatile intelligence that natural organisms possess, which gives them autonomy and helps them adapt to changes in their environments. It is a broader term than artificial general intelligence (AGI), which refers specifically to human-like intelligence. It also differs from the term general AI, which loosely refers to AI systems that perform well across multiple domains, without specific retraining for each one.

5 This is a common quip among AI researchers and developers: how can we perform a task if we do not even know what the task is?

1.2 What Is Cognition?

1.2.1 Cognition Controls Behavior

We routinely evaluate the cognitive abilities of organisms by observing their behavior.
Cognition gives an organism the ability to adapt its behavior according to the circumstances. This is a point on which all theories of cognition generally agree: cognition is responsible for controlling behavior [19], [22], [20], [23], [21], [10]. What is surprising, however, is that there is no agreement on what qualifies as behavior.

1.2.2 What Is Behavior?

The extent of this disagreement was brought to light by a 2009 study [24] in which the authors reviewed the literature on behavior and surveyed members of three behavior-focused scientific societies. They found that existing definitions of behavior often conflict with one another, are too vague, or exclude whole groups of organisms 6. The authors also found that many researchers rely on their own intuitive definitions of behavior, which can be inconsistent and often contradict those from the literature. In an attempt to reach consensus, the authors proposed yet another definition: “Behaviour is the internally coordinated responses (actions or inactions) of whole living organisms (individuals or groups) to internal and/or external stimuli, excluding responses more easily understood as developmental changes.”

As often happens with such compromises, this new definition is quite hedged and still open to interpretation. Furthermore, by downplaying behaviors that rely on physiological or developmental processes, the definition seems biased towards animals. The reckoning and discussions brought about by the study spurred another round of papers proposing yet more definitions of behavior and ways to classify it. One paper, however, stands out by casting the phenomenon in a new light and addressing many of the limitations of the above definition 7 [31].

6 According to some of these definitions, only animals are capable of behavior, even though there are many clear examples of cognitive abilities in organisms as varied as plants and protists [25]–[30].
It proposes a definition of behavior that is grounded in the concept of agency, which is particularly insightful for the study of cognition: behavior is the “observable consequences of the choices a living entity makes in response to external or internal stimuli.” What the authors mean is that all living organisms should be understood as agents [32], entities capable of initiating their own actions and whose choice of action is influenced by their own state and the state of their environment 8. A simple example of a choice is switching a molecular pathway due to an environmental signal 9 [31]. The consequences of such a choice can be as overt as movement or as covert as the production of a chemical compound 10.

Additionally, behavior and agency are not exclusive to single, whole organisms. Certain parts of multicellular organisms, such as individual cells, are also agents and capable of behavior. Likewise, a group of organisms, such as eusocial insects, can work together as a collective agent [31], [33].

7 The authors of this study are two plant scientists and a philosopher, all with an interest in cognition. They were motivated to propose a less-biased definition of behavior that applied to all living entities.

8 More precisely, the agent’s estimate of the state of the environment.

1.2.3 Cognition Is an Agent’s Decision-Making System

By viewing living organisms as agents, and behavior as a byproduct of agency and its intrinsic action-selection process, we begin to get a clearer picture of cognition. If cognition is what controls behavior, then cognition must be an agent’s decision-making system. Moreover, any system making the decisions for another is a control system [34], so it follows that cognition is an agent’s control system. The idea that cognition is a control system is not new. Since at least the foundation of Cybernetics in the 1940s [35], several researchers have based their theories on this assumption [19], [23], [36], [37].
Even Allen Newell, one of the founders of Information Processing Theory (IPT), understood as much. In his final book, “Unified Theories of Cognition” [22] (p. 43), Newell wrote:

I want to take the mind to be the control system that guides the behaving organism in its complex interactions with the dynamic real world. … The mind then is simply the name we give to the control system that has evolved within the organism to carry out the interactions to the benefit of that organism or, ultimately, for the survival of its species.

Although Newell and others intuitively understood that cognition is a control system, they never actually demonstrated it. As I will show in the next sections, this is neither merely a hypothesis nor a metaphor. The decision-making mechanism required for agency constitutes a control system, and this is what we call cognition 11. Realizing this has many implications for the study of behavior and cognition in general and, more importantly, for AI, due to the vast differences between a control system and a data processing system. By shifting our perspective, we can finally free ourselves from the sand trap of IPT.

9 Clearly, choosing to do nothing when an alternative exists is also a choice.

10 The observability requirement in the definition is redundant and can be safely ignored. It was likely included for compatibility with historical views of behavior. Observability changes with the availability of resources and technology. Moreover, behaviors are classified in the literature as overt or covert, where the latter may be difficult or even impossible to observe.

11 The words cognition and intelligence are often used interchangeably. Throughout this dissertation, I will draw a distinction between them. I will use cognition to refer to a binary quality that an entity can either possess or not possess, while I will use intelligence to refer to a quality that cognitive entities possess in different degrees. I do, however, use the term artificial intelligence (AI) as it is commonly understood.

1.3 Rational Agency and Decision-Making

1.3.1 What Is a Rational Agent?

In philosophy, an agent typically refers to a human being, an entity whose choices of action are guided by their own beliefs and preferences [33], [38]. Humans, however, are only one instance of a larger class of rational agents, or intentional systems [39]–[41], which can be natural, artificial, and even exobiological. An agent is said to be rational if it always attempts to further its own good, according to its own value system (its preferences), given its state and estimate of current conditions (its beliefs). Any entity that always makes decisions randomly, without any sensitivity to changes in current conditions, is therefore neither cognitive nor a rational agent 12. All rational agents are cognitive and self-governing, and they use information and energy. As noted above, however, they do not need to be living [39], [40]. A refrigerator, a Roomba vacuum cleaner, and a self-driving car are rational agents, just like bees, oak trees, and E. coli bacteria.

12 A rational agent may choose to behave randomly occasionally. Being able to produce random, unpredictable behavior at will is an asset in some circumstances, such as when being pursued by a predator or in certain social situations.

1.3.2 Rational Agents Versus Clockwork

It is important to distinguish agents 13 from the closely resembling but non-cognitive entities called clockwork 14. Clockwork systems also use energy and can be quite complex, but they lack the fundamental capacity of adaptation. They are insensitive to changes in conditions and incapable of self-adjusting. They do not use information or make decisions. Some examples of clockwork include a mechanical clock, an electric water pump, a Ford Model T, and a typical protocell [43].

13 Throughout the rest of this dissertation, all references to “agents” should be understood to mean specifically “rational agents”.

14 Although I do not agree with Boulding’s system categorization overall [42], one of its categories, named “clockworks” (p. 202), seems to describe the class of systems I am referring to here.

The addition of a control system (cognition), as simple as it may be, can turn a clockwork system into a rational agent. A control system provides the capacity to sense the state of at least one variable and to use that information to choose the appropriate course of action. Take, for example, an electric water pump that fills a swimming pool. This pump has no automation and must be switched on or off by a person. It is clockwork. The pump, however, can be easily automated and turned into a water-level-maintaining device – a rational agent – albeit a very simple one. One way to do this is by placing a float in the pool, attached to an articulated arm (like a toilet float), so that the arm, depending on its angle, flips the pump’s switch on or off. As the water level gets lower, the float dips into the pool until the arm reaches an angle that turns the pump on. The float then rises with the water level until the arm reaches the angle that turns the pump off. This solution is admittedly clunky: there are many ways that this control system could be made smarter and more elegant. Nevertheless, the ensemble of pump, float, articulated arm, and switch makes up a self-governing system, a rational agent. The upgraded system uses the estimated water level to inform its choice of action. For example, in the state where the pump is off, the system continuously decides between doing nothing and turning the pump on. The threshold at which it prefers to turn the pump on is determined by its value system, an intrinsic set of biases due to design and manufacture (such as the length of the arm, the position of the electrical switch, the displacement of the float, etc.).
In more anthropomorphic language, the system can be described as valuing a water level above a certain threshold. When the system believes that the water level is low, it acts to raise it until it believes that the water has reached its preferred level.

It is easy to see how cognition, at its simplest level, can be built up from basic physics. We took a clockwork mechanism (electric pump and switch) and interposed another (float and articulated arm) in such a way that the original causal chain (electricity powering the pump) became conditional on a measured variable (pool water level), thus creating cognition from non-cognitive parts.

There is no upper limit to how complex cognition can be, and as it becomes more complex, it also becomes more opaque. For example, the automated water pump could be further elaborated by adding sensors and degrees of freedom. In addition to maintaining a minimum water level, the system could be made to keep the water below a maximum level by controlling the pool’s drain. The water level thresholds themselves could also be made adjustable and placed under the system’s control, allowing the system to decide on different water levels depending on the time of day or the weather, for example. As the control system becomes more complex, it may be beneficial to account for correlations among variables and regularities in the environment. Therefore, the system could be upgraded with memory to store past states and events. It could also be made to learn associations, make predictions, communicate with other systems, set goals, and make plans to achieve these goals. These changes may improve the water pump, but they are not necessary for cognition. The ability to store memories, learn, make predictions, and so on are supporting features that often evolve, or are added by design, if they contribute to the agent’s fitness (or performance).
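In control terms, the float-and-switch pump is a bang-bang (on/off) controller. The following minimal sketch captures the decision logic described above; the class, method, and threshold names are my own illustration, not part of the pump example as built:

```python
# Sketch of the float-and-switch pool controller as a bang-bang controller
# with hysteresis. All names and threshold values are illustrative.

class PoolPumpController:
    """A rational agent in the chapter's sense: it senses one variable
    (water level) and decides between two actions (pump on / pump off)."""

    def __init__(self, low_threshold: float, high_threshold: float):
        # The thresholds play the role of the value system: fixed biases
        # built in by design and manufacture (arm length, float size, ...).
        assert low_threshold < high_threshold
        self.low = low_threshold
        self.high = high_threshold
        self.pump_on = False  # internal state

    def decide(self, water_level: float) -> bool:
        """Evaluate current conditions in the context of internal state
        and select an action. Returns True if the pump should run."""
        if self.pump_on and water_level >= self.high:
            self.pump_on = False   # preferred level reached: switch off
        elif not self.pump_on and water_level <= self.low:
            self.pump_on = True    # level too low: switch on
        # Otherwise the system "chooses to do nothing" and keeps its state.
        return self.pump_on
```

The two thresholds encode hysteresis: between them, the system simply maintains its current state, and doing nothing when an alternative exists is itself a decision in the sense discussed above.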
In Chapter 2, we demonstrate one such evolutionary process that produced agents with various types of cognitive abilities, including associative learning.

1.3.3 The Tryptophan System in Escherichia coli

A biological analogue to the automated water pump is the tryptophan system in Escherichia coli. Living organisms’ control systems are vastly more complex than our previous example. However, complex control systems (natural or artificial) are typically modular, made of simpler control systems that are arranged hierarchically and influence one another, which allows for relatively independent analysis of the parts. The tryptophan system, taken separately, is also a level-maintaining system, based on proteins that interact with signaling chemicals to make decisions. It maintains the concentration of tryptophan in the cell above a minimum level by either allowing tryptophan to be produced when its concentration is low or blocking its production when tryptophan concentration is high.

E. coli depends on the amino acid tryptophan to make essential proteins. A shortage of tryptophan slows down protein production, which hinders most cellular activity, including growth and reproduction, and in extreme cases can lead to death. E. coli can obtain tryptophan from the environment, such as when it is inside an animal’s gut, but it can also synthesize its own from simpler molecules, although at a cost. Synthesizing tryptophan requires spending time, energy, and other resources not only on the production of tryptophan itself, but also on the production of the tryptophan synthesis machinery. These are resources that could be allocated to other cellular processes, and this ultimately impacts E. coli’s fitness. Therefore, E. coli should only synthesize tryptophan when its concentration inside the cell is low. E. coli senses and controls tryptophan concentration by means of a repressor protein complex that the cell continuously produces in small amounts.
Each of these complexes has pockets that can bind tryptophan and cause the complex to change conformation. The binding is temporary and depends on the concentration of tryptophan in the cell. When the concentration of tryptophan in the cell is low, most of the complexes will be unbound and inert, allowing the tryptophan synthesis machinery to be produced. As the tryptophan concentration gets higher, more complexes will bind tryptophan and become active, thus blocking the production of the tryptophan synthesis machinery and causing the synthesis of tryptophan to taper off. All control systems have a cost. In this example, E. coli must produce the repressor protein complex continuously as well as maintain a gene for it. However, the cost of the tryptophan system is more than offset by the savings gained by producing tryptophan only when needed. Typically, control (also known as regulation) is only employed where its cost is offset by the savings or other benefits it provides. In nature, this is one reason why organisms evolve different cognitive abilities depending on their environment. In the following chapters, we perform digital evolution experiments and describe this phenomenon in more detail. 1.3.4 Decision-Making in Abstract Terms E. coli makes decisions, such as those regarding tryptophan synthesis, in the context of its internal state. This includes the cell’s energy level, whether it is obtaining tryptophan from the environment or synthesizing its own, the current number of repressor protein complexes, the number of copies of the tryptophan synthesis machinery, how many ribosomes it has to make new proteins, etc. The internal state narrows down the decisions the cell must make, as well as the information that is available and the actions that are relevant or even possible. In turn, the cell’s decisions affect its internal state and set the stage for future decisions.
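The tryptophan decision logic just described can be sketched as a toy negative-feedback loop. All numbers here are illustrative, not measured values, and the saturation curve for repressor binding is a hypothetical simplification of the real biochemistry.

```python
def trp_step(trp, synthesis_rate=5.0, consumption=3.0, k_half=40.0):
    """One time step of a toy tryptophan feedback loop.

    The fraction of active (tryptophan-bound) repressor complexes rises
    with tryptophan concentration (a hypothetical saturation curve);
    active repressors block production of the synthesis machinery, so
    synthesis tapers off as the concentration rises. All rates and
    constants are illustrative, not measured values.
    """
    bound_fraction = trp / (trp + k_half)      # more trp -> more active repressors
    production = synthesis_rate * (1.0 - bound_fraction)  # repression scales synthesis down
    return max(0.0, trp + production - consumption)

# Starting from a shortage, the concentration climbs and settles where
# repressed production balances consumption.
trp = 10.0
for _ in range(200):
    trp = trp_step(trp)
```

In this sketch the cell never computes a target concentration explicitly; the steady level emerges from the interaction between repression and consumption, much as in the biological system.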
Just like with E. coli, any rational agent’s internal state is the result of its past decisions and experience. In abstract terms, decision-making is the evaluation of current conditions, in the context of the agent’s state, and the selection of a course of action associated with this valuation. Decisions can be prompted by changes of state, previous decisions, or simply the passage of time. In both the automated pump system and the tryptophan system, decisions are made continuously. An agent’s decisions, however, can also happen discretely, or even as singular events, and – depending on an agent’s complexity – multiple decisions can occur simultaneously. Decision-making does not require deliberation; deliberation may or may not precede the act of deciding. Some agents have complex models of the world and can simulate the outcome of alternative actions to predict future conditions. Decision-making, however, does not require prediction about the future, only information about current conditions. Even agents that are capable of predicting the future do not do this for every decision. More complex deliberation processes consume more time and resources; as such, they are reserved for only a subset of the decisions that an agent makes. Decisions are often made based on heuristics, which are faster than deliberation. Goal setting, memory, and learning are also supporting – rather than core – features of cognition. They are not required for decision-making and may or may not be present in a particular rational agent. 1.3.5 The Value System and Its Role in Decision-Making Central to decision-making is the agent’s value system, a set of biases used to evaluate current conditions and select a course of action. All agents have an intrinsic set of values, which vary among individuals. An agent’s values can change over time and, depending on the agent, can also be learned. Artificial agents can also have values that are explicitly supplied.
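In the abstract, a value system can be sketched as a set of biases used to score candidate actions against current conditions. The feature names, weights, and actions below are hypothetical illustrations, not drawn from any particular organism.

```python
def decide(conditions, value_system, actions):
    """Pick the action whose valuation under the agent's values is highest.

    `value_system` is a set of biases (weights) over features of the
    world; each candidate action is scored by how well it serves those
    values under current conditions. All names and weights here are
    hypothetical.
    """
    def score(action):
        return sum(value_system.get(feature, 0.0) * weight
                   for feature, weight in actions[action].items()
                   if conditions.get(feature, False))
    return max(actions, key=score)

# An agent that weights energy above safety will forage when both
# needs are present.
values = {"energy": 2.0, "safety": 1.0}      # intrinsic biases
actions = {
    "forage": {"energy": 1.0},               # foraging serves the energy value
    "hide":   {"safety": 1.0},               # hiding serves the safety value
}
choice = decide({"energy": True, "safety": True}, values, actions)
```

Changing the weights changes the behavior without changing the mechanism, which is one way to picture values being under selection: mutation adjusts the biases, and fitness sorts the results.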
Because a value system guides an agent’s decisions, an agent will tend toward certain end-states, or frequently-visited states, which we can interpret as the agent’s goals. Thus, by observing an agent’s behavior over time, we can deduce the agent’s values and goals, even if the agent is not capable of explicit goal setting, which is a more complex cognitive ability and not essential to cognition. In evolving agents, intrinsic values are under continuous selection, just as any other physical or behavioral trait, and those that contribute towards fitness tend to be selected over those that do not. We will discuss this in more detail in Chapter 2. 1.3.6 Available Information and Its Role in Decision-Making Decision-making requires evaluating information in real-time (or its digital equivalent). Time constrains how much information is available for a decision, as well as how long that information remains relevant. Given that agents make decisions with limited time and based on partial information, decisions are rarely optimal. Time, however, can also reveal data patterns, both over time and across sensors, that convey additional information and create opportunities for learning. It is important to note that information extracted from sensors consists of estimates. The state of a variable can never be known for certain, only inferred from measurements. Information is a fundamental resource for agents, just like energy. Agents select which signals to attend to, search for, and pursue, all the while evaluating competing signals, ignoring signals, pooling signals, filtering signals, applying error-correction to signals, etc. This helps the agent to obtain the best estimates about the state of the world, as well as about the agent itself, and therefore decide what to attend to. There may be competing needs and not enough means to meet all the demands simultaneously.
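One of the simplest of the signal treatments mentioned above – pooling and filtering – is a moving average over recent measurements. The sketch below is illustrative; the readings and window size are hypothetical.

```python
from collections import deque

def moving_average_filter(readings, window=3):
    """Pool successive noisy readings into a smoother estimate.

    The true state of a variable can only be inferred, so the agent
    averages its most recent measurements to damp out noise. The
    window size is an illustrative choice.
    """
    buf = deque(maxlen=window)   # keeps only the last `window` readings
    estimates = []
    for r in readings:
        buf.append(r)
        estimates.append(sum(buf) / len(buf))
    return estimates

# Noisy readings around a true level of 50.
est = moving_average_filter([49.0, 52.0, 49.0, 51.0, 48.0, 51.0])
```

This is the digital analogue of the float on the automated pump, whose size and inertia average the water level over space and time.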
When an agent responds to information from sensors, it responds to semantic information (meaning), not syntactic information (sensor data). Semantic information must be extracted from the sensor data stream before it can be evaluated during decision-making. The amount of semantic information that the data stream conveys does not need to correlate with the amount of syntactic information. For example, in agents with multiple sensors it is common for the data stream to provide a continuous flow of sensor data that the agent could not possibly respond to in full, nor should it. Only salient features of the data stream that are relevant to the current state of the agent are evaluated. The salient feature (carrying semantic information) could be a pattern in the data, or specific values, or even the absence of the data stream. Sensory data can be noisy and can have limited resolution and accuracy. Data can also be sparse, or there can be too much of it. Agents employ statistical treatments, within their capacity, to filter, integrate, and find correlations that help them infer the state of the variables of interest. The aim is to get the best possible inference from the available information. In the case of the automated water pump, the size and shape of the float, and the inertia of the arm, combine to average the water level over an area and over time. The system is thus applying a statistical filter to the information. In the tryptophan system, the numerous copies of the tryptophan-sensing complex and their sensitivity to tryptophan also produce an averaging effect. This is an example of how the statistical treatments that the cell uses (such as sensor copy number and affinity) are under evolutionary influence. It is worth noting that even agents that are capable of predicting the future extrapolate from present information. They must first produce an estimate of current conditions in order to make a prediction.
In terms of evolutionary origins, estimation must have come before prediction, since the former is required for the latter. 1.3.7 Valuation-Associated Actions and Inaction For every evaluation that an agent makes of the current conditions, there is an associated action. This could be inaction – the decision to do nothing – or it could be as simple as a change of state. It could also be more complex: an agent could deploy an action consisting of a sequence of steps. Actions can be innate or learned, although not all agents are capable of learning. In addition, associations between valuations and actions can either be innate or learned. In either case, upon reaching a valuation, the agent triggers the associated action. 1.3.8 Caveats of Rational Agency Both rational agency and decision-making have caveats due to their existence in physical reality. Some decisions that a rational agent makes may not be rational, while others may be rational but self-defeating. Agents can suffer wear and tear or degradation of their sensors. Agents can be affected by noise, fatigue, or toxins, or else suffer from external manipulation. Even digital agents face limitations of computational resources. In addition, agents can change over time due to experience or stage of development; this in turn impacts decision-making. Factors such as environmental mismatch, novel situations, ambiguous cues, an inept cognitive system, self-defeating or detrimental values, and overwhelming or conflicting demands can also lead to poor decision-making. Decision-making can occasionally be random in order to break ties, confuse adversaries, increase beneficial variation, etc. Nevertheless, as noted in Section 1.3.1, an entity that always makes decisions randomly is not a rational agent. In addition, some actions may seem to initiate spontaneously [44]–[46].
Many of the actions that we assume to be spontaneous may be due to hidden chains of events, such as changes of state, which include rational decisions. Some actions, however, may occasionally be triggered without the agent’s authorization, due to degradation, external interference, or random processes (noise). 1.3.9 Self-Governance and Degrees of Autonomy All rational agents are self-governing since they are ultimately the ones that authorize their own actions. Rational agents can, however, have differing degrees of autonomy [33]. Human-designed agents typically depend on externally-supplied values and goals (thresholds, tolerances, setpoints, etc.). They are therefore less autonomous than evolved agents, whose values – and goals derived from them – are entirely their own. (One of the goals of AI is to make machines less dependent on human supervision, i.e., more autonomous. However, we also want to retain control over them, and make sure that their behavior conforms to our standards.) The autonomy of human-designed agents may also be restricted to particular tasks and to particular periods of time, such as an automobile’s cruise control. While the cruise control is engaged, it autonomously maintains the speed. The speed target, however, is externally supplied, and the agent has no control over the steering. 1.3.10 Degrees of Intelligence across Different Dimensions Although all rational agents are cognitive, they vary in their intelligence, in terms of how effective their cognition is. An agent with high intelligence is one that makes the best decisions given the constraints it faces, such as limited time for decisions, incomplete information, and competing demands, notwithstanding that decisions are rarely optimal and organisms often rely on heuristics that give good-enough solutions. The intelligence of an agent – how effective its cognition is – can be measured across multiple dimensions and time scales.
One perspective is to measure how well an agent is adapted to its own environment. This can be further divided into how competent the agent is at a specific type of task versus how competent the agent is at multiple types of tasks [47]. An agent could be considered very intelligent if it solves a hard task, in which case the agent would be said to have high capability. If this hard problem were rare, however, then another agent that is better at more common, easier problems – in other words, an agent with high generality – may be considered the more intelligent agent given a longer time scale. Another perspective is to consider a higher order of generality in which an agent’s ability to perform multiple novel tasks is assessed. This would be a measure of the agent’s ability to adapt to novel environments. After all, an agent that is good at solving familiar problems may nevertheless struggle to solve a new type of problem. An agent can thus be intelligent in its own environment but poor at generalizing to other environments. These two perspectives for measuring intelligence are of interest for AI research. The higher order of generality is of particular importance to the goal of achieving general intelligence. 1.3.11 Collective Agency and Multicognition As noted in Section 1.2.3, individual agents can work together as a collective agent. The coupling or integration of agents can vary from occasional and facultative, as in the case of whales coming together to hunt, to permanent and obligatory, as in the case of a honeybee colony or a multicellular organism such as a plant or animal. There are both natural and artificial collective agents. An autonomous vehicle, for example, is made up of various cooperating subsystems, each of which is itself an agent. It is among natural agents, however, that we see a greater spectrum of organizations and the highest levels of complexity.
To take the example of a honeybee colony, here there is not just one but multiple orders of collective agency. First, a eukaryotic cell is a collective agent because it is made up of, among other things, mitochondria and ribosomes; these, although non-living, are bona fide rational agents. Each individual bee is also a collective agent since, as is the case with any multicellular organism, it is made up of cells working together. The bee colony is the third order of collective agency. When the connection among constitutive agents becomes tighter, and their integration becomes permanent, the collective agent can transition into a new higher-order individual agent with its own higher-order cognition. After the transition, much of the cognitive machinery remains subsumed at the level of the constitutive agents and obscured from analysis at the higher level. For example, much of the processing that happens in neurons occurs at the level of molecular circuits. Memory itself may even be stored intracellularly at that level. This transitioning of a collective agent into a new higher-order agent is one of the main reasons why cognition appears so mysterious and impenetrable; it muddles the problem of what cognition is. To understand cognition in humans or other animals, we must understand that this is a higher-order phenomenon: it is multicognition. We cannot understand the cognition of the supra-entity without understanding that many of its components are hidden in the simpler cognitive entities of which it is made. Our cognition, rather than dwelling in our brain, can be thought of as a brain of brains, distributed through our entire body, with trillions of interconnected cognitive units acting more or less as one. 1.3.12 Coming Full Circle Amusingly, it is worth noting that the word intelligence derives from the Latin inter-legere, meaning “to choose between” [25].
In our centuries-long quest to understand intelligence, it seems we are coming full circle and rediscovering what our forebears knew all along: intelligence means making good decisions. 1.4 Control Systems, and Why Cognition Is Not Information Processing 1.4.1 What Are Control Systems? Any system that controls another system or process is a control system. A control system can be artificial (human engineered), such as a thermostat that maintains a consistent temperature in a room, or it can be natural (produced by evolution), such as the human thermoregulatory system that maintains our body temperature even as outside conditions change. A control system can be simple or complex, and it can be implemented in many different substrates, including mechanical, electronic, biochemical, and digital. It is important to note that the terms “control”, “regulate”, “manage”, and “govern” – when applied to systems – are all synonyms. Molecular biologists, for example, talk about gene regulation. Here “regulation” is just another word for “control”. Artificial control systems have existed for thousands of years, but until the Industrial Revolution they were largely curiosities or else had niche applications. It was during the Industrial Revolution that they became widely adopted and brought substantial gains in productivity; for example, Watt’s centrifugal governor was used to control the speed of a steam engine. The Industrial Revolution also saw the beginning of control theory: James Maxwell’s “On Governors” was published in 1868. Control systems are fundamental to much of modern technology, including automatic appliances, factory robots, modern aircraft, and space probes. Despite this, and although they have long been used by engineers, control systems largely remain a hidden technology [48]. A better understanding of control systems across the wider science community would greatly benefit numerous areas of research, helping to solve many current and future challenges.
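The thermostat mentioned above can be sketched as a minimal feedback controller. This is an illustrative sketch rather than a description of any real device; the setpoint, gain, and heat-loss rate are hypothetical numbers.

```python
def thermostat_step(temp, setpoint=21.0, gain=0.2, drift=-0.5):
    """One cycle of a toy feedback thermostat.

    The controller measures the error between room temperature and a
    setpoint, then applies heating proportional to that error (a
    proportional controller). Gain and drift values are illustrative;
    `drift` models constant heat loss to the outside.
    """
    error = setpoint - temp           # how far from the preferred state
    heating = max(0.0, gain * error)  # heat only when too cold
    return temp + heating + drift     # the room also loses heat

temp = 18.0
for _ in range(300):
    temp = thermostat_step(temp)
```

Note that the room settles at 18.5, below the 21.0 setpoint: a pure proportional controller leaves a steady-state offset against a constant disturbance, which is one motivation for the integral term in the PI controllers discussed later in this chapter.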
1.4.2 Control System Techniques and Objectives The most famous control system technique is the negative feedback loop, where the system tries to maintain a controlled variable close to a setpoint value by acting to minimize the error between the current value and the setpoint. This is the technique implemented in Watt’s centrifugal governor (Figure 1.1) and in natural homeostatic systems. It is, however, only one of the many techniques that control systems can employ. Others include positive feedback, feedforward, open loop, etc. For a control system, just as important as pursuing a target is the manner in which it achieves this. While a control system must respond quickly to changes in conditions, it must also minimize overshoots, oscillations, and hysteresis. It must be resilient to noise and perturbation too, and in case of partial failure of its components, it should lose performance rather than lose control completely (graceful degradation). Finally, it should minimize wear and tear of its components and the use of energy by avoiding superfluous control activity. 1.4.3 Implementing Control Systems in Computers Control systems can be implemented using digital computers and, indeed, most modern ones are: computer-based control systems are cheap, practical, and powerful. Control systems as a class are not, however, instantiations of a Turing machine. Some control systems are discrete, but many are analog and continuous and can only be approximated using a digital computer. Figure 1.1: Steam loom centrifugal governor, Museum of Science and Industry, Manchester, UK. Special types of algorithms are used when control systems are implemented using digital computers. Typically, these algorithms simulate real-time concurrent execution by using techniques such as time-slicing, where computer time is divided into short intervals and distributed among many different tasks that ideally would be performed simultaneously and continuously.
Such tasks include the evaluation of current conditions to decide the next course of action, the deployment of action sequences, the polling of data from sensors, and the refreshing of outputs to actuators. The concurrent tasks are executed in a loop (which runs continually) for the duration of the mission and often for the lifetime of the system. 1.4.4 The Data Processing Algorithm and IPT In contrast to the algorithms used to implement (real-time) control systems in computers, the algorithm that serves as the model for Information Processing Theory (IPT) is the standard data processing algorithm, as used in tasks such as payroll, word processing, and weather forecasting. This algorithm implements function mapping (in the mathematical sense) [22], which takes in some data and, through the application of a set of rules, outputs the corresponding result. The adoption of the data processing algorithm as a model for IPT was likely influenced by several factors. This was the algorithm used in early computer applications, and it remains the most common type of algorithm used today. It is still the standard algorithm taught in computer science education. Another factor is that in the 1940s and 1950s, control systems were simple and mostly mechanical or electromechanical devices; it was not until the late 1950s that digital computers became commercially available for control applications [49]. In addition, control algorithms have traditionally been researched within engineering and not computer science departments. It is important to note that data processing applications, unlike control systems, do not have their own timing but instead follow the timing of data arrival. They start processing data as soon as it is received, and they produce the output as fast as possible. IPT suggests our cognition functions as a series of isolated transactions, each starting with an input and ending with an output [22].
Cognition supposedly waits for a stimulus, maps this stimulus to the appropriate behavior to execute, and then waits for the next stimulus, in a stimulus–action loop. As a consequence of this flawed understanding of cognition, most cutting-edge AI systems – such as convolutional neural networks – are designed to operate in this way, mapping input to output and iterating if necessary. 1.4.5 Natural Versus Artificial Control Systems Many natural control systems have been characterized over the years and, despite their unique substrates, they function much like artificial ones [50]–[53]. For example, in humans, the subsystem responsible for thermoregulation uses negative feedback control [54], while the subsystem responsible for birth contractions uses positive feedback control [55]. In E. coli, the subsystem responsible for chemotaxis implements a proportional–integral (PI) controller [50], which is the same type of control strategy used in calcium homeostasis in cows [56]. “Negative feedback control,” “positive feedback control,” and “PI controller” are engineering terms that describe effective control strategies that we developed independently; we only found out later that nature also uses them. A variation of the PI control strategy called proportional–integral–derivative (PID) is likely the strategy most commonly used in industry today. This type of controller is not capable of making predictions about the future nor capable of learning. A more advanced control method is Model Predictive Control (MPC). MPC is capable of making predictions about the future of the dependent variables; it can also learn the model that describes the behavior of the dependent variables. One of its applications is to control the flight paths of autonomous aircraft. It will likely be many years before we discover whether bees and birds use a similar control strategy for flying.
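The PI strategy mentioned above can be sketched in a few lines. This is a generic textbook-style sketch, not the chemotaxis or homeostasis mechanism itself; the gains, the toy process, and the disturbance value are all hypothetical.

```python
def make_pi_controller(setpoint, kp=0.4, ki=0.1):
    """Return a proportional-integral (PI) control step function.

    The integral term accumulates past error, letting the controller
    cancel a constant disturbance that a purely proportional controller
    would leave as a steady-state offset. Gains are illustrative.
    """
    integral = 0.0
    def step(measurement):
        nonlocal integral
        error = setpoint - measurement
        integral += error                  # memory of accumulated error
        return kp * error + ki * integral  # control output
    return step

# Drive a toy process with a constant disturbance toward the setpoint.
control = make_pi_controller(setpoint=10.0)
x = 0.0
for _ in range(500):
    x += control(x) - 0.3   # -0.3 is a constant disturbance
```

Despite the constant disturbance, the integral term drives the toy process all the way to the setpoint, illustrating why PI control is so widely used in both engineered and natural systems.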
Nevertheless, there is an interesting parallel between how more sophisticated control technology has incorporated prediction and learning in order to solve more complex problems, and how natural cognition may have evolved similar capabilities as organisms faced more complex behavioral control challenges [57]. There are a number of important differences between artificial control systems and natural ones. When an artificial control system is designed, the aim is for it to perform as optimally as possible while operating within well-defined constraints. Optimal performance is highly dependent on minimizing noise. When redundancy is required, the most common method to achieve this is to deploy additional copies of the same system. Furthermore, artificial control systems are designed to be easy to maintain. By contrast, natural control systems must primarily be evolvable. Whereas a random change to an artificial control system would likely cause it to stop working, natural control systems must be robust to mutations: they must be resilient to deleterious mutations while at the same time being amenable to evolving adaptations. Natural control systems also tolerate higher levels of noise and may even exploit it [58], [59], allowing them to function in a wider range of conditions. They are also often redundant, but their redundancy is typically not due to multiple copies of the same system but rather due to alternative systems that overlap. Finally, they must have great generality to operate in unstructured, dynamic environments. As a result of these features, natural control systems are rarely optimal, but they are often resistant to failure, which also facilitates them being reused by evolution in new contexts. Of course, not all artificial control systems are designed: some are evolved. Notable examples are the control systems of finless rockets and legged robots [60], [61]. Evolved control systems have a lot more in common with natural systems.
For example, they are more robust to mutations and more autonomous, since they evolve their own values and goals. Those artificial systems that have so far been evolved, however, are nowhere near as complex as natural systems. Level of complexity is probably the most obvious difference between natural and artificial control systems. The most complex artificial control systems today – such as those responsible for automated factories, power plants, and spacecraft – are still much simpler than the control system responsible for a whole living organism, such as a bacterium, not to mention the vast distributed multicognitive systems of mammals. At the end of this chapter, we will discuss a path forward for producing more autonomous and intelligent – and inevitably more complex – control systems. 1.4.6 Understanding Control Systems Illuminates Cognition Accepting that cognition is a control system gives us a unified language to discuss both natural and artificial cognition. It also lets us use an existing framework to analyze, benchmark, construct, and deconstruct cognition. For example, it is clear that mechanisms such as memory, learning, prediction, and planning are not fundamental for rational control, and thus not necessary for cognition. They may be valuable additions in certain circumstances, but their evolution or engineering depends on the cost–benefit tradeoff. Conversely, a value system, internal states, and the evaluation of current information to decide a course of action are fundamental requirements for rational control. As discussed previously, the control system of more complex agents can be subdivided into subsystems to facilitate analysis. This, however, abstracts away their interconnections, and it is important to remember that cognition is the control system of the whole agent.
For example, despite humans’ highly derived form of cognition (multicognition) with its many specialized subsystems, the experience of psychological stress affects the organism at every level, from intellectual performance to digestion, to growth and development, and even to inheritable DNA methylation patterns that prime the next generation. Something that is not apparent when we study simple control systems, but which becomes clear when we analyze more complex ones – natural or artificial – is that agents are not driven by sensor input, i.e., agents are not stimulus–response systems. An agent has its own motive force: the need to reevaluate current conditions and “decide what to do next” [62] (p. 233). For this, information is a resource, not a prompt. A complex control system designed using best practices typically polls sensors at regular intervals to extract semantic information. It only responds to this information if it is relevant to the current state of the system. In other words, even if a sensor input is updated continually, the new values will not affect the behavior of the system until the system chooses to read the sensor and finds information that is relevant. This is similar to what occurs in nature, where organisms are awash with sensory stimuli but, at their own timing, selectively respond to some stimuli and not others, depending on the context. Put simply, rational agents do not necessarily react to stimuli: they respond to information. 1.4.7 Why Cognition Is Not Information Processing Using the words "information" and "process" when referring to cognition may seem innocuous. After all, it makes sense that cognition uses information and that it is a type of process. However, the expression "information processing" has a very specific meaning, synonymous with data processing. When used in reference to cognition, it stands for IPT's assumption that cognition is an instantiation of a Turing machine executing a data processing algorithm.
This assumption, however, is false: cognition is a control system and control systems are not instantiations of a Turing machine. As discussed in Section 1.4.3, control systems can be implemented in digital computers as reasonable and practical approximations using a real-time control algorithm. The data processing algorithm is not a good model for a control system, even as an approximation. Data processing follows a unidirectional chain of events that starts with data input and ends with the output of the corresponding result. Control systems also have inputs and outputs, but the chain of events does not need to flow in any particular direction. In addition, it is common to have inputs without correlated outputs and outputs without correlated inputs (past or present). There can also be internal chains of events that change the state of the system due to the passage of time and which do not involve inputs or outputs. In control systems, the output is not a transformation of the input: input and output may influence each other but they do not need to be connected. Control systems can also be highly recursive, in which case the system’s decisions affect not only inputs and outputs but also the system itself and how it operates. While data processing systems are driven by data (syntactic information), control systems are not. The receipt of data is what triggers a data processing algorithm to start working. Once it has produced the corresponding output, it either terminates or waits for more data. Control systems, on the other hand, operate continuously, and the receipt of new data, such as an updated sensor measurement, does not necessarily cause the system to respond. Instead, control systems regularly scan their sensors to extract semantic information and make it available for evaluation; they can continue to make decisions even in the absence of new data.
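The contrast can be sketched as a minimal time-sliced control loop. This is an illustrative sketch of the style of algorithm described above, not any specific system; `sense`, `act`, and `is_relevant` are hypothetical stand-ins for a sensor, an actuator, and a relevance test.

```python
def control_loop(sense, act, is_relevant, cycles):
    """A minimal real-time-style control loop (time-sliced sketch).

    Unlike a data processing algorithm, the loop runs on its own
    schedule: each cycle it polls the sensor, extracts what it needs,
    and decides -- acting only when the information is relevant to its
    current state, and continuing to run even when no new data arrives.
    """
    decisions = []
    for _ in range(cycles):          # the loop, not the data, sets the pace
        reading = sense()            # poll at the system's own timing
        if reading is not None and is_relevant(reading):
            act(reading)             # respond to meaning, not to data arrival
            decisions.append(("act", reading))
        else:
            decisions.append(("no-op", reading))  # doing nothing is also a decision
    return decisions

# Readings arrive continuously (including a cycle with no new data);
# only out-of-range readings are relevant to this toy system.
readings = iter([20.0, 21.0, 35.0, None, 19.5])
log = control_loop(lambda: next(readings), lambda r: None,
                   lambda r: r > 30.0, cycles=5)
```

Note that the loop makes a decision on every cycle, including the cycle where no data arrived; a data processing algorithm, by contrast, would simply block until input appeared.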
Although the data processing algorithm is not a good model for a control system, it should be noted that data processing can be part of a control system in a supporting role, such as in the application of statistical filters to data or in the storage and retrieval of data from memory. Also worth noting is that complex cognitive systems like ours can manipulate symbols and perform data processing tasks, but within limitations. The cognitive mechanisms that support these tasks perform very differently from the data processing algorithm. 1.4.8 Changing the Cognition Discussion: Instead of Process Information, Make Decisions The idea that cognition is information processing is deeply entrenched, including in the language that we use when discussing behavior, cognitive phenomena, and their mechanisms. It is important that we adjust our language to allow us to leave behind this flawed understanding of cognition. Instead of saying “process information,” we should say “make decisions.” For example, instead of “mycorrhizal networks process information,” we should say “mycorrhizal networks make decisions.” Instead of “uncovering the information processing mechanisms of the cell,” we should say “uncovering the decision-making mechanisms of the cell.” Instead of “how information is processed in the cerebellum,” we should say “how decisions are made in the cerebellum.” Other expressions such as “control the organism,” “evaluate and make decisions,” “coordinate its actions,” and “sense and respond” are also good alternatives. 1.5 Cognition is Intrinsic and Necessary for Life 1.5.1 What is Life? Of the definitions that we have discussed so far, life is the most elusive.
However, there seems to be a consensus that living systems possess a few basic properties [63]–[65]:

• Enclosure, meaning that the living system is separate from the environment;
• Self-maintenance¹⁶, meaning that the living system can maintain its integrity despite wear and tear, and through developmental and adaptation processes;
• Self-sustenance, meaning that the living system can obtain resources, such as energy, information, and materials, from the environment;
• Self-reproduction, meaning that the living system can produce additional living systems like itself.

¹⁶ Enclosure and self-maintenance can be summed up as autopoiesis [66], [67].

It is possible to define life as possessing only a subset of these properties; however, this risks counting less viable or less interesting systems as living. Likewise, it is possible to define life with additional properties beyond these four, but that risks biasing our definition toward systems that closely resemble biological Earth life. For example, a version of life where organisms could not self-replicate would likely be fragile, prone to extinction, and vulnerable to changing environmental conditions. Reproduction provides opportunity for renewal, multiplication, perpetuation, and Darwinian evolution. Without reproduction, organisms would accumulate damage until they could no longer survive, since even self-repair mechanisms can suffer damage. In the absence of a process that created organisms anew, this version of life would be vulnerable to extinction. Moreover, without undergoing Darwinian evolution, these organisms would have limited ability to adapt and increase in complexity, and no ability to speciate. Therefore, this version of life would likely be very simple and dependent on a benign environment for survival. Alternatively, a version of life where organisms could not self-maintain would likely also be fragile, simple, and vulnerable to extinction.
Without self-maintenance, organisms would be prone to degradation and need to replicate quickly in order to avoid extinction. This dynamic would select against size and complexity. Therefore, these organisms would likely remain small, simple, and dependent on a benign environment until they evolved mechanisms for self-maintenance. Finally, a definition of life with additional properties, for example that life must be a chemical system [68], would unnecessarily rule out life that we may create using a mechanical, electronic, or digital substrate.

1.5.2 Life Requires Cognition

Assuming that the set of life-defining properties above is sound, it becomes evident that life must also be capable of regulating its behavior based on current conditions; in other words, it must be cognitive. Without cognition, living systems would be clockwork, meaning they would need to self-maintain, self-sustain, and self-replicate without any form of regulation. Since these clockwork systems could not adjust their behavior as conditions changed, they would require an unrealistically stable, homogeneous, and benign environment to survive. Moreover, they would remain vulnerable to random events such as disrepair and mutations. If such systems could exist at all, they would likely be very simple. In practice, a perfectly stable and homogeneous environment for life is not possible. Even the most consistent environment, such as an underground reservoir where free energy comes from radioactive decay, would be disrupted when organisms started replicating. The presence of a variable number of organisms and their potential interactions would be sufficient to make the environment heterogeneous, non-stable, and less predictable. Moreover, regulation allows adaptation not only to the external environment but also to internal changes. For example, regulation and coordination among separate processes in a cell is essential for survival.
In order to replicate, a cell must grow in size at the same rate that it duplicates its genetic material; otherwise, at cell division, there could be more of one than the other, causing a protein or genetic imbalance. In a clockwork cell, all of these processes would need to happen without sensing or regulation, depending on perfect sequencing and synchronization. This is an unrealistic proposition considering the stochasticity of the chemical environment at the scale of the cellular components. Such a cell would be sensitive to errors, and difficult to evolve. Mutations could quickly lead to imbalances for which the cell would have no means to compensate. Regulation also allows living systems to stabilize favorable steady-state conditions far from equilibrium that would not be possible without active control. These nonequilibrium steady states enable living systems to achieve yields and accuracy levels that are critical to life, such as in the error correction during DNA replication [69], while minimizing energy expenditure. In addition, regulation allows living systems to maximize the extraction of energy and other resources from the environment for a given energy budget, which directly affects their fitness. They do this by using information to make decisions, such as when cyanobacteria adjust their growth rate according to light availability (Chapter 5). Therefore, we can say that living systems convert information into free energy [70], [71], which clockwork systems cannot do. A clockwork version of life would be brittle, less viable, and unlikely to exist in practice. Therefore, we can assume that cognition must be part of life from the outset.

1.5.3 Cognition Is a Precursor of Life

There are some compelling theories of how life originated on Earth, such as Eigen and Schuster’s hypercycles, Kauffman’s autocatalytic sets, and Deacon’s Autogen.
Although none may turn out to fully explain the origins of life, their common premise, that life originated from self-replicating chemical systems undergoing pre-biotic Darwinian evolution, is likely correct.

In physical reality, self-replication is sufficient for Darwinian evolution. Replication implies inheritance, whether the system constructs a copy of itself using an explicit genetic template, or the system serves as a template for itself and divides by fission. In addition, any replicating system is subject to copy errors that produce variation. Even error-correcting copying processes in a computer are subject to errors, such as when cosmic rays change memory values. Finally, any self-replicating system is subject to natural selection, since space and resources, even if vast, are finite. At any point in the evolution of these self-replicating chemical systems, they could have benefited from evolving self-regulation. Self-regulation could have stabilized favorable non-equilibrium conditions, shielding them from environmental variation and allowing them to replicate faster. Over time, these more stable self-regulated systems could have outlasted non-regulated ones and also become more complex. Regulation would have been relatively straightforward to evolve. All that is required is that one or more of the molecular components in the self-replicating chemical system acquires allostery, i.e., it becomes sensitive to another chemical from the environment and changes behavior in its presence. Allostery is relatively common among macromolecules (biological or not), which generally form pockets that can temporarily bind smaller molecules. If this happens to an enzyme, for example, the binding may increase or decrease the enzyme's function. If a newly evolved allostery turned out to be beneficial, such as when the binding chemical signaled a change in conditions to which the system could now respond adaptively, it could be selected during evolution.
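The three ingredients named above (inheritance, copy errors, and selection under finite space) can be illustrated with a toy simulation. This is a hypothetical sketch for intuition only, not a model used in this dissertation: genomes are bit strings, copying flips each bit with small probability, and a fixed population cap imposes fitness-proportionate selection on the number of 1s.

```python
import random


def replicate(genome, rng, mu=0.05):
    # Inheritance with variation: each bit may be miscopied.
    return [(1 - b) if rng.random() < mu else b for b in genome]


def evolve(population, rng, generations=100):
    cap = len(population)  # finite space: the population cannot grow
    for _ in range(generations):
        # Selection: genomes with more 1s are chosen as parents more often.
        weights = [sum(g) + 1 for g in population]
        parents = rng.choices(population, weights=weights, k=cap)
        population = [replicate(p, rng) for p in parents]
    return population
```

Starting from all-zero genomes, mean fitness rises over the generations: copy errors supply new genotypes, and the competition for the capped population amplifies the fitter ones, which is all Darwinian evolution requires.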
Of course, allostery is not the only form of regulation that self-replicating chemical systems could have evolved, but it is the simplest and one that could evolve commonly. (I explore this in experiments in Chapter 5.) Therefore, a proto-life chemical system evolving allostery and later giving origin to the first cell is immensely more plausible than a fully formed clockwork cell arising and then evolving regulation. Thus, it is safe to assume that cognition appeared before life, and that it is intrinsic to it. In fact, cognition is one of the properties that enables life. As Jacques Monod said, allostery is “the second secret of life”, the first being the genetic code [72].

1.5.4 Life Is a Type of Rational Agency

A simpler way of defining life is as a type of rational agency, where most actions are self-directed: self-maintenance, self-sustenance, and self-reproduction. This contrasts with man-made rational agents, which direct most of their actions externally, such as a pool pump system maintaining the water level. Life's self-directed actions create recursive processes where the system's actions affect the mechanisms that produce those actions, and so forth. Additionally, many of these processes operate in parallel and interact dynamically, often at a microscopic level and at different time scales. This intricacy makes it difficult to analyze and understand life's underlying processes, causing them to appear mysterious and special. Life is also an extreme case of autonomy and intelligence. While man-made rational agents need to be supervised, living beings are independent. They have their own values, are responsible for their own decisions, and have been selected by evolution to be highly competent within their environments. While we would like to create machines that reach levels of autonomy and intelligence similar to natural organisms, we do not need to recreate life to achieve this.
We should, however, attempt to emulate life's genetic representation and the sensing and regulatory mechanisms it expresses, since they have the track record of evolving control systems with the levels of complexity and competence that we seek.

1.6 Path Forward

Although cognition is a simple concept, the demands of natural life have led to the evolution of cognitive systems with extreme levels of autonomy, intelligence, and complexity. Even the most advanced man-made control systems, such as those from automated factories, autonomous vehicles, and space probes, seem simplistic in comparison to the cognitive system of a lowly bacterium. If we succeed in creating a control system with the levels of intelligence and autonomy of an insect, or even a mammal, it will be by far the most complex system we have ever created. However, current engineering techniques may not be up to the task. Manually written software reaches a complexity barrier somewhere between 10⁷ and 10⁸ lines of code [73]. State-of-the-art ANNs get exponentially harder to train and deploy as they become more complex, with the largest current ones costing upwards of millions of dollars in computer time to train [74], indicating that they, too, may be approaching a complexity barrier. Another, potentially more vexing, problem, whether we are writing the software or training an ANN, is that people are not good at anticipating all the possible scenarios that an autonomous system will encounter during its operation, which can lead to hard failure in a multitude of unforeseen cases. Digital evolution would appear to be the perfect way to solve this problem. However, current digital evolution approaches have struggled with a low complexity barrier. If we could overcome the technical limitations, evolution would allow us to create cognition in an open-ended manner, which has the potential to be more generalizable than anything we design or train.
Moreover, evolution allows us to make progress even when we do not fully understand the problem, as is the case with complex cognition. Evolution is an inventive process; all it requires is for us to be able to evaluate the quality of a provided solution, even if we cannot envision a high-quality solution ourselves. In the following chapters, I apply current evolutionary techniques to the problem of evolving cognition and demonstrate a potential solution to the complexity barrier in the form of a new evolutionary platform based on a genetic representation modeled after those found in nature.

In Chapter 2, we used the evolutionary platform Avida to investigate the evolution of navigation behavior and associative learning. During these experiments we observed that behaviors evolve in a predictable sequence, where more complex behaviors build upon simpler ones. Avida proved well-suited for the evolution of behavioral control systems, thanks to its bacteria-like circular genome. Avida’s genome allows the control program to be executed continuously without contrived entry and exit points. In addition, when instructions for input and output evolve, they can be used asynchronously at the request of the control program, informing the program’s decisions instead of prompting them.

In Chapter 3, we used Avida to investigate the evolution of patch harvesting behaviors. The earliest animals with a nervous system in the fossil record are microbial mat miners from the late Ediacaran. Their fossilized burrows are found all over the world and have intrigued scientists for over a century. Although our study was exploratory, we were able to replicate the harvesting patterns seen in the fossil record, propose evolutionary relationships among different behaviors, and identify an additional selection pressure that may have driven the evolution of these animals’ behavior.
In Chapter 4, we extended the study on the evolution of navigation and associative learning to evolve a more complex learning ability, configural learning. This ability allowed an agent to distinguish between combinations of cues and individual cues and respond differently in each case. Although we succeeded in evolving the expected behavior, we also seem to have approached a practical limit on the complexity of the control systems that we can evolve with the current version of Avida.

In Chapter 5, I demonstrate Elfa, a new digital evolution platform inspired by cellular regulatory and signaling mechanisms, by investigating the evolution of allosteric regulation in simulated cyanobacteria. Ancestral cells express three core genes at low and constant rates: a growth factor gene, a photosynthetic complex gene, and an RNA polymerase gene. Cells also contain five types of ligands whose concentrations signal the amount of energy reserves the cell contains, the intensity of the environmental light, starvation, and reactive oxygen species stress. I found that while most populations evolved a higher rate of gene expression without any regulation, some populations evolved allosteric regulation of different genes, which adjusted their behavior to the light cycle, allowing them to harvest more energy and replicate faster than any of the non-regulated populations.

Chapter 2: The Evolutionary Origin of Associative Learning

Authors: Anselmo C. Pontes, Robert B. Mobley, Charles Ofria, Christoph Adami, and Fred C. Dyer

This chapter is adapted from Am. Nat. 2020, Vol. 195, pp. E1–E19 [75]. © 2019 by The University of Chicago. CC BY-NC 4.0. DOI: 10.1086/706252

2.1 Introduction

Associative learning has long been considered fundamental to the adaptability of behavior and the development of knowledge about the world [76].
It is also widely assumed that associative learning emerged as animal behavior evolved greater complexity and may have provided new avenues for this complexity to increase [77]–[81]. The general fitness advantage of learning in living organisms seems clear: learning enables an organism to adapt its behavior during its lifetime without requiring genetic changes across generations (as with evolution), and, unlike other forms of behavioral plasticity that occur during development, learning can result in very rapid rather than gradual behavioral modifications [82], [83]. Most research on the evolution of learning has focused on the adaptive specialization of learning—how the speed of learning, biases to learn certain things better than others, and capacity to store learned information correlate with the reliance on learning in an organism’s natural environment [84]–[89]. Little is known, however, about the historical question of what selection pressures and evolutionary precursors facilitated the emergence of learning from ancestors incapable of doing so, or about the processes that allowed more complex forms of learning to evolve from simpler ones [90], [91]. Most people assume that complex behavior evolves in response to complex challenges; however, the evolution of behavioral complexity need not entail the emergence of learning [23]. Rather, learning evolves under specific environmental dynamics: where conditions that are relevant to the organism's fitness change on the timescale of generations but remain relatively stable within an individual’s lifetime [85]. Furthermore, in the particular case of the evolution of associative learning, there must also be learnable cues that reliably correlate with the state of the environment [92]. In this situation, organisms may benefit if they use those cues to track current conditions and map them to appropriate responses.
Since the environment and cues may change between generations, the mapping cannot be encoded genetically and must be learned during the organism’s lifetime. Researchers have explored the factors of environmental dynamics and cue availability that are necessary for the evolution of associative learning using both mathematical and empirical approaches [85], [92]. However, it is still an open question whether these factors are sufficient for associative learning to emerge during the evolution of an organism's behavioral repertoire. Here, we propose:

HYPOTHESIS 1: The initial evolution of associative learning depends on the scaffolding provided by the prior evolution of a repertoire of instinctual behaviors that exploit stable environmental patterns.

Skinner and others speculated that complex behavioral traits do not evolve independently of each other but build on preexisting ones according to a characteristic evolutionary sequence that starts with simple movement, then sensing, followed by tropisms and reflexes, and finally learning [57], [93]. Similarly, it has been suggested that different forms of learning are not independent but evolve from one another in a specific sequence, where more complex forms build on the mechanisms of simpler ones and subsume them [94]–[97]. For example, associative learning would have evolved from sensitization [94]–[97], a simpler, nonassociative form of learning where an organism increases its response to a repeated stimulus [98]. Therefore, we propose:

HYPOTHESIS 2: Complex behaviors, including learning, do not arise and function independently from one another. Instead, as more complex cognitive processes arise, they do so in a modular and stepwise manner, where early instinctual behaviors (such as moving and sensing) are co-opted and integrated into increasingly more complex ones (such as error recovery or path prediction) before eventually reaching associative learning.
It has also been speculated that the emergence of associative learning required only minor modifications in preexisting memory mechanisms [80], [96], [97], enabling it to evolve in parallel in different species [80]. Thus, we propose:

HYPOTHESIS 3: Associative learning can arise suddenly, as a result of small modifications in preexisting cognitive mechanisms, as opposed to arising gradually and independently by accumulating incremental changes under selection.

Finally, given the expectation that environmental characteristics, such as stability and cue availability, shape the type of learning that evolves [85], [89], [92], we investigate an additional hypothesis on the flexibility of the associative learning mechanism that evolves in a particular environment. We propose:

HYPOTHESIS 4: Organisms that evolve associative learning will not be able to change established associations (e.g., reversal learning) unless such changes were necessary for success during evolution.

Our research focuses on a definition of associative learning that emphasizes its consequences for behavior rather than the mechanisms by which it works. We think this approach is justified because associative learning is traditionally defined in operational rather than mechanistic terms, for example, as “a behavioral modification, dependent on reinforcement, involving new associations between different sensory stimuli, or between sensory stimuli and responses” [80], and may not even be a unitary behavioral trait with consistent properties across species. For example, it is by no means clear that associative learning involves distinct mechanisms from those underlying simpler, nonassociative forms of learning, such as habituation and sensitization. In Drosophila, mutants incapable of associative learning also show reduced habituation and sensitization [99], and in Aplysia, sensitization and associative learning share many of the same molecular elements [100].
It is also not clear whether there is only one way of implementing associative learning mechanistically. All animals in which associative learning has been well established have a central nervous system (i.e., brains), although many animal groups have not yet been tested [80]. However, having a brain is not necessary for associative learning: plants are capable of it [101], and single-cell organisms may be as well [102], [103]. These observations suggest that associative learning has evolved independently, acquiring different properties in different lineages [80], [88], [90]. Hence, they also justify the assumption that we can study the evolution of associative learning as a phenotypic attribute of behavior that is independent of a particular mechanistic implementation. Major challenges arise in studying the evolutionary origin of learning. One challenge is the utter lack of fossil evidence, especially from periods as remote as the Precambrian, when associative learning behavior is believed to have first evolved [80]. Another is the difficulty of performing phylogenetic comparisons to study the origin, as opposed to the adaptive function, of behavioral traits. Although phylogenies are valuable to infer ancestral character states, sequences, and timing of evolution of traits, this approach is virtually silent on the selective forces and mechanisms involved [104] and may suggest patterns of evolution that could result from multiple different processes [105]. In addition, associative learning is such a widespread and likely ancient behavior that it is particularly challenging to reconstruct an accurate phylogeny, because of the lack of out-groups and because its origin presumably predates the rapid adaptive radiation of the Cambrian explosion [98].
The ubiquity of associative learning behavior among extant species is also a challenge for experimental evolution, which has been very successful in studying the adaptive modification of existing learning mechanisms in animals but can reveal little about the origins and early evolution of learning [91]. To overcome these limitations, here we study the origins of learning behavior in populations of self-replicating computer programs that undergo open-ended evolution in a virtual environment [106]. These digital organisms are selected for their ability to cope with behavioral challenges in which associative learning may confer a fitness advantage; specifically, the environment provides alternative courses of action and cues that reliably correlate with the correct action, although these cues vary across generations [92]. This approach allows ample opportunities for a wide range of behaviors to evolve and enables the discovery of evolutionary principles that are potentially independent of the cognitive machinery that is undergoing evolution. We emphasize that digital evolution is not a simulation of evolution but rather an instantiation of it [107]: although digital organisms are evaluated in simulated environments, their behavioral control algorithm undergoes actual Darwinian evolution. Specifically, (i) organisms reproduce and pass on their evolved traits, including their behavioral algorithm, to their offspring; (ii) inheritance is subject to mutations, producing variation; (iii) individual fitness depends on an organism’s performance at specific behavioral tasks and determines the outcome of the competition for space in a size-limited population. This approach enables true experimental study of evolutionary history across multiple replicate lineages evolving under different conditions, providing insights not only on the outcomes of evolution but also on the transitions that occur in different lines of descent.
Digital evolution has a proven track record of expanding evolutionary theory [108]–[110], with supporting evidence often collected later in biological systems [111]. Previous studies in Avida have also demonstrated the evolution of instinctive navigation, such as gradient ascent and trail-following behavior [112], including the genetically encoded use of memory to dictate subsequent behavior [106]. Here, we extend this work beyond reflexive behaviors to study the evolution of associative learning, where each individual organism must discover a mapping between environmental cues and the optimal response. Our results support the aforementioned hypotheses and, moreover, provide a rich picture of the circumstances that favor—or disfavor—the evolution of learning, including the critical role played by historical contingency. Learning is a rare outcome of evolution in our system, not because of any intrinsic difficulty in the underlying computation but rather because oftentimes lineages evolve highly flexible behavioral strategies over which learning does not provide a strong selective advantage. When learning does evolve, it emerges via an almost stereotypical sequence, as proposed by Skinner and others [57], [93]. Finally, we find that the evolution of behavior is inseparable from the evolution of an intrinsic value system, the innate gauge of an organism's experiences that provides positive or negative feedback on its actions.

2.2 Experimental System

We used the Avida digital evolution platform for all of our experiments [113], [114]. Avida is a linear genetic programming platform, meaning that each organism’s genome consists of an ordered sequence of computer instructions in a machine-like language. Instructions are simple, self-contained operations, such as adding two values, storing a value in memory, or skipping to another instruction if one value is greater than another.
During evolution, random mutations occur that can insert, remove, or replace instructions in offspring. Note that any sequence of Avida instructions can be executed; as such, mutations will always produce valid programs, even if their functionality may be meaningless. In addition to the instructions for arithmetical and logical operations described above, we used a single instruction that caused the organism to reproduce, as well as a set of instructions that acted as simple sensors and effectors to interact with the environment (described in the next section). Using Avida provided key benefits for the experimental study of evolution. For example, the set of instructions we used formed a Turing-complete programming language that, in theory, can represent any algorithm—including any behavioral control algorithm—given the necessary sensors and effectors. In addition, it is easy to analyze an Avida organism to dissect and study the behavioral control algorithms that evolve. Furthermore, we can archive all ancestors and their evolutionary lines of descent to examine the evolutionary transitions that occurred along any lineage, allowing us to study patterns that reveal how one set of behaviors might potentiate another.

An Avida organism is defined by a sequence of instructions (its genome), and each particular sequence defines its genotype. In our experiments, each population was seeded with a “naive” organism that lacked any instruction for behavioral control other than the one necessary to reproduce. Such an organism’s genome consisted of a sequence of null instructions that acted as placeholders for future behavioral “genes” and a single “reproduce” instruction. To reproduce, an organism had to execute a minimum number of instructions, that is, spend a minimum amount of time in the environment in order to mature.
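The mutation scheme described above (insertions, removals, and replacements on a linear genome) can be sketched as follows. The instruction names here are invented stand-ins, not Avida's actual instruction set; the property the sketch illustrates is that every mutated sequence remains a valid, executable genome.

```python
import random

# Hypothetical instruction names for illustration only.
INSTRUCTIONS = ["nop", "add", "store", "if-greater", "reproduce",
                "sense-current", "rotate-right", "rotate-left",
                "move-ahead", "move-back"]


def mutate(genome, rng, rate=0.01):
    """Apply per-site replace/remove mutations, plus occasional inserts.
    Any output is a valid genome: there is no syntax to break."""
    out = []
    for inst in genome:
        r = rng.random()
        if r < rate:                      # replace this instruction
            out.append(rng.choice(INSTRUCTIONS))
        elif r < 2 * rate:                # remove it entirely
            continue
        else:
            out.append(inst)
            if rng.random() < rate:       # insert a new one after it
                out.append(rng.choice(INSTRUCTIONS))
    return out
```

Because the genome is just a flat list of opcodes, mutation can never produce an unparseable program, only one whose behavior may be meaningless, mirroring the robustness property noted above.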
At the same time, an organism also had an upper limit on the number of instructions it could execute before it tried to reproduce, essentially creating a maximum age. If an organism failed to reproduce by the time this limit was reached, it was eliminated from the population. Reproduction was asexual and resulted in the production of two offspring, both inheriting a copy of the parent’s genome. However, only one of the offspring was subject to mutation, while the other remained identical to the parent and essentially replaced it. Populations were capped at 3,600 organisms. Once that limit was reached, every organism that was born resulted in an existing one being randomly removed. Organisms did not interact with each other in the environment; however, the age limit and the competition for space in the size-limited population created a strong selection pressure for fast reproduction. How well an organism performed the behavioral task determined the rate at which its offspring’s instructions were executed and consequently how quickly they could reproduce. Therefore, the better an organism performed on the behavioral task, the faster its offspring executed their behavioral algorithm and reproduced. Thus, behaviors evolved in this digital system in a purely Darwinian fashion.

2.3 The Behavioral Task

Bees, ants, and other insects are known to use local and distant landmarks for navigation [106], [115]–[118]. For example, experiments have shown that bees can learn visual cue associations to successfully navigate complex mazes [119], [120]. Inspired by these experiments, the behavioral task that we presented to evolving Avida organisms consisted of navigating a trail of nutrients in a virtual arena (fig. 2.1), where nutrients provided cues that indicated the direction to follow — if organisms evolved the ability to sense and use them.

Figure 2.1: Sample Arena and Nutrient Trail. Shown is one of four virtual arenas from an environment.
Each virtual arena contained a single trail of nutrients laid out in a unique configuration. At the beginning of its life, each organism was placed alone at the start of the trail (green circle) in a randomly selected arena and oriented in the direction of the next nutrient. An organism’s task was to complete as much of the trail as possible and then reproduce before the end of its life. The system kept track of the organism’s cumulative performance by counting the number of new nutrient locations it visited and subtracting the number of empty (i.e., off-trail) locations encountered. This count was then divided by the total number of nutrients in the trail to compute the organism’s “task quality,” which ranged from 0 to 1 (negative values were set to zero). Nutrient locations were counted only on the first visit; subsequent visits to the same nutrient location would not affect an organism's task quality. However, visits to empty locations were always deducted. The organism had no sensory feedback about its task quality (i.e., cumulative performance), similar to the way a natural organism cannot sense its own fitness. Nevertheless, our organisms are limited relative to natural ones, which may be able to measure the payoffs of their foraging decisions by the rate of some physiological condition, such as gut fullness [121].

Each environment in our evolutionary experiments consisted of four virtual arenas, each with a different trail configuration (fig. 2.1). Every time an organism was born, it was randomly assigned to one of the four arenas, placed at the beginning of the trail on a nutrient location, and oriented in the direction of the next nutrient. The use of four trail configurations reduced the likelihood of an organism evolving a rigid control algorithm tailored to a single nutrient trail (genetically hardwiring a sequence of actions) instead of a flexible control algorithm that captures the principles of trail navigation.
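As a concrete restatement of the scoring rule described above, the following sketch computes task quality from an organism's path (the function and variable names are mine, not Avida's): nutrient locations score only on first visit, every step onto an empty location is deducted, and the normalized result is clamped to the range [0, 1].

```python
def task_quality(path, trail):
    """path: sequence of visited grid locations, in order.
    trail: set of nutrient locations.
    First visits to nutrients count +1; every visit to an empty
    location counts -1; the total is divided by the trail's nutrient
    count and negative results are set to zero."""
    seen = set()
    score = 0
    for loc in path:
        if loc in trail:
            if loc not in seen:      # nutrients count only once
                seen.add(loc)
                score += 1
        else:
            score -= 1               # empty locations always deducted
    return max(0.0, min(1.0, score / len(trail)))
```

For example, visiting all three nutrients of a three-nutrient trail while stepping off-trail once yields (3 - 1) / 3 ≈ 0.67, and a path that only wanders off-trail clamps to 0.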
Each of our experiments consisted of between 50 and 900 replicates. At the end of an experiment, we selected the predominant (most abundant) genotype from each replicate's final population for behavioral analysis. Given the large population size, the predominant genotype typically represented dozens of organisms, implying that they, on average, outperformed the rest of the population on all four trail configurations. Indeed, in these experiments we found that the predominant genotype typically had the highest task quality scores on each of the four trails; thus, we measured its performance by computing its average maximum task quality (AMTQ) across all trail configurations.

An organism's interaction with the environment depended on sensor and effector instructions acquired through mutation and maintained during evolution. These instructions conferred the abilities to sense the nutrient content of the current location (“sense current”), rotate right by 45 degrees (“rotate right”), rotate left by 45 degrees (“rotate left”), take one step ahead (“move ahead”), and take one step back while facing forward (“move back”). The execution of a sense current instruction provided feedback, in the form of an integer, about the nutrient content of the location the organism occupied. Empty locations and nutrients, in both straight portions of the trail and at turn points, were each sensed as different values. Therefore, the numeric values of the nutrients could cue the organism to the direction of the trail once organisms evolved the ability to interpret the sensed values correctly. There were four types of cues: right turn (45 degrees), left turn (45 degrees), forward, and empty location (fig. 2.1). Nutrients that indicated forward (forward cue) and empty locations were always represented by the integers 0 and -1, respectively.
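Combined with the randomized turn-cue values described in the next paragraph, the cue encoding can be sketched as follows (illustrative names, not the actual implementation):

```python
import random

FORWARD_CUE = 0    # persistent across generations
EMPTY_CELL = -1    # persistent across generations

def draw_turn_cues(rng=random):
    """Assign distinct random integers in [1, 100] to the right- and
    left-turn cues; in the experiments this draw held only for a single
    organism's lifetime, so the mapping had to be learned anew."""
    right_cue, left_cue = rng.sample(range(1, 101), 2)
    return {right_cue: "right", left_cue: "left"}
```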
Meanwhile, nutrients that indicated turns (turn cues) were each assigned a distinct random number between 1 and 100 every time an organism started on a trail, and this assignment persisted only during the organism's lifetime. Since forward cues and empty locations had persistent values between generations, organisms could evolve to use them to predict optimal future moves. However, the environmental uncertainty represented by randomized turn cues created an additional challenge for the organism: it not only had to move and follow the trail, but it also had to identify the direction represented by the turn cues. The optimal way to overcome this challenge was for an organism, within its lifetime, to associate either the right turn cue or the left turn cue with the correct action, thus identifying the opposite turn cue by exclusion. Although one cannot predict the course of evolution, if associative learning evolved in our experiments, we expected to recognize it by observing the path of the organism along the trail. When placed on a new trail and allowed a period of exposure to the different turn cues, an organism capable of associative learning should be able to consistently turn in the correct direction every time it encounters a turn cue, something that would not be possible if the organism were using heuristics or choosing randomly.

2.3.1 Experimental Conditions

2.3.1.1 Experiment 1

In experiment 1, we tested four different environments, each with four possible trail configurations (table 2.1; figs. A.1-A.4, Appendix A). In three of the environments, the trails of nutrients started with a simple (and presumably predictable) pattern (table 2.1). In the fourth environment, which served as a control, the trails provided nutrients in an unpredictable pattern; that is, each of the first two turns had equal probability of being to the right or to the left. This setup allowed us to test our first three hypotheses (as presented above).
We performed 50 evolutionary replicates for each of the four conditions listed in table 2.1. See Section A.1 of Appendix A for additional details on methods.

Table 2.1: Environments for experiment 1

Predictable-start environments (trail start pattern):
- One fixed turn: The first turn was always to the right in all four trails (fig. A.1).
- Two fixed turns: The first turn was always to the left and the second always to the right in all four trails (fig. A.2).
- Nutrient cued: The direction of the first and second turns was random in all four trails but could be predicted by counting the number of forward cues preceding the first turn (an odd number meant left, while even meant right; fig. A.3).

Control environment (trail start pattern):
- Random start: The direction of the first and second turns was random, and the number of forward cues preceding the first turn was the same in all four trails (fig. A.4).

Note: Each environment contained four different trail configurations. An organism experienced only one trail configuration in its lifetime. See section A.1 of Appendix A for images of each environment.

2.3.1.2 Experiment 2

In experiment 2, we applied an additional selection pressure aimed at the evolution of reversal learning. We used only the nutrient cued environment but reversed the turn cues at approximately the 85% mark of each trail (fig. A.5). In a complementary experiment, we tested different cue reversal positions ranging from 10% to 90% in 2.5% increments and found that the position did not affect the results significantly (Appendix A, sec. A.4; fig. A.8). Therefore, we report only the results for the 85% mark. We performed 900 evolutionary replicates in experiment 2. The reason for the larger number of replicates than in experiment 1 was to generate a sufficient number of phenotypes for lineage studies, especially to explore the ancestry of the rare organisms that evolved reversal learning.
A lineage study consists of singling out the final predominant organism of a population and reconstructing its line of descent, testing every ancestral genotype on the behavioral task to uncover how the behavior evolved over time. Although this experiment was designed to test hypothesis 4, it also enabled us to obtain additional evidence relevant to hypothesis 3. See Section A.1 of Appendix A for additional methodological details. Raw data, code, and a video associated with this research are available in the Dryad Digital Repository: https://doi.org/10.5061/dryad.f45gh6s (Pontes et al. 2020). The custom version of Avida used in this study is available at https://github.com/mercere99/Avida-AssociativeMemory [114].

2.4 Results

2.4.1 Repeated Evolution of Adaptive Behaviors: Error Recovery, Imprinting, and Reversal Learning

Our experiments resulted in the evolution of organisms capable of adapting to unpredictable environments by using a variety of strategies, including associative learning. We also observed the evolution of flexible strategies that did not rely on learning (table 2.2). We called the most successful nonlearning strategy “error recovery,” in which an organism attempted to follow the nutrient trail and, on stepping off the trail, performed the necessary actions to return to it but did not modify its future behavior based on the error. A particularly notable result was the repeated evolution of associative learning, including both a rigid form that we called “imprinting” and a more flexible form that we called “relearning” (described in table 2.2). We also found recurrent patterns in the behavioral strategies that evolved. Organisms from different evolutionary replicates, which inevitably had genotypes producing distinct behavioral control algorithms, generated a consistent set of behavioral phenotypes.
We analyzed more than 300 out of 1,100 replicates across all experimental conditions and found, notably, that they all fell into five easily recognizable categories, including relearning, imprinting, and error recovery (previously mentioned), plus “searching” and “path predicting” (see table 2.2). We found some hybrids of these strategies as well.

Table 2.2: Behavioral strategies found in all experiments

Learning behaviors:
- Relearning (typical AMTQ 95%-97% in experiment 2; 97%-99% in the non-cue-reversal environment): A flexible and generalizable strategy, based on instrumental conditioning, that allowed organisms to navigate any trail configuration regardless of the starting pattern. Organisms using this strategy were able to re-form the cue-response association multiple times, even when the cues were reversed (fig. A.15).
- Imprinting (typical AMTQ 97%-99% in experiment 1; 87%-92% in the cue reversal environment): A somewhat rigid strategy, based on instrumental conditioning, where organisms made the cue-response association early, and only once, in their lifetimes (fig. 2.2). These organisms were not able to relearn cues that were changed or reversed. This strategy was further classified into two subtypes: “generalizable” and “nongeneralizable.” The generalizable subtype enabled organisms to imprint and navigate on any trail configuration regardless of the starting pattern, while the nongeneralizable subtype limited organisms to imprinting on trails with the same starting pattern in which the organisms evolved.

Reflexive behaviors:
- Error recovery (typical AMTQ 78%-84% in experiments 1 and 2): A flexible and generalizable strategy that allowed organisms to navigate any trail configuration regardless of the starting pattern. These organisms did not discriminate between right and left turn cues. Instead, they reacted to turn cues by turning in one direction (each organism had its own default direction) and, if this direction led an organism off the trail, returning and trying the other direction (fig. 2.2).
- Searching (typical AMTQ 48%-60% in experiments 1 and 2): A strategy that appeared only in hybrid combinations with others, especially error recovery and path predicting. It was triggered if the organism stepped off the trail and typically involved performing several moving and turning steps to try to find another segment of the trail and rejoin it (fig. A.7).
- Path predicting (typical AMTQ 7%-14% in experiments 1 and 2): A strategy where organisms encoded behavioral sequences in their genomes that matched the initial portion of different trails (fig. A.12). This strategy enabled them to successfully navigate the first few segments of any trail but not the entire trail.

Note: We analyzed more than 300 organisms across all experimental conditions. Although the details of their behavior differed among experiments, all behavioral phenotypes could be classified into five strategies or hybrids of two or more. The performance results for each behavioral strategy fell into a typical performance range, measured by average maximum task quality (AMTQ), and are listed from highest to lowest.

Figure 2.2: Two top-performing strategies in experiment 1. Shown are the paths of the final predominant organisms from two different replicates that evolved in the nutrient cued environment in experiment 1. Both were tested in the same trail configuration to facilitate comparison. In the left panel, an organism using an error recovery strategy achieved a task quality score of 81% of the maximum. Starting from the green circle, it moved straight while sensing forward cues but always tried to turn right (45 degrees) when sensing a turn cue.
If turning right led the organism into an empty cell, it would retreat to the previous position and turn toward the left (90 degrees). It continued to repeat this behavior at every turn cue without ever learning from its error. In the right panel, an organism from a separate replicate using a generalizable imprinting strategy achieved a task quality score of 98% of the maximum. It also tried to turn right when sensing a turn cue. However, it stepped off the path only once, at the first left turn. It learned the correct cue-response association and navigated the remainder of the trail without error.

The specific type of associative learning that evolved in our experiments was “instrumental conditioning”, in which an organism forms an association between a stimulus and a behavior from its repertoire [80]. Organisms that performed imprinting formed an association early in their lives that was used for future decisions but could never be modified. Organisms that performed relearning also formed associations between cues and actions early in their lives but were able to form new associations if the cues changed, regardless of whether they were swapped or replaced with novel ones. Additionally, we identified environmental factors and historical constraints that strongly influence whether associative learning evolves. The ability to relearn when cues are swapped is called “reversal learning,” a learning ability that is sometimes regarded as cognitively complex [123]–[125]. A typical organism capable of reversal learning followed the trail of nutrients until it encountered a turn cue. Since the integers representing turn cues were randomly assigned for each generation, the organism then attempted to turn 45 degrees in a default direction and move forward one step. If this step led to a nutrient-containing location, the organism continued to follow the trail.
However, if the organism turned in the “wrong” direction and found itself on an empty location, it engaged in a corrective reaction by taking one step back and turning 45 degrees twice (90 degrees) in the opposite direction (as if recoiling and turning away). The organism then made the association between the turn cue and the correct action, such that the turn cue alone was sufficient to trigger the correct action in subsequent encounters. If the turn cues remained consistent, the organism navigated the remainder of the trail without error. Alternatively, if the turn cues changed further along the trail (including cue reversals), the organism again exhibited the corrective reaction and updated its association to the new cue, resuming the navigation without further error. Cues could be changed any number of times, with the organism always relearning the turn cue and navigating without error until the cue changed again or it reached the end of the trail (fig. A.15). This serial reversal learning behavior evolved repeatedly, although it was a rare outcome, evolving in only 10 out of 900 replicates in experiment 2, where we specifically selected for reversal learning (and not at all in the 200 replicates in experiment 1, where reversal learning was not actively selected). Nevertheless, many replicates that did not result in the evolution of reversal learning still produced organisms that were able to efficiently navigate the entire trail using either imprinting or error recovery.

2.4.2 Early trail predictability produces behavioral building blocks for learning

Although all environments could promote the evolution of simple controlled movement, not all could lead to the evolution of learning. All environments were constructed in a way that could potentially select for behavioral biases, such as moving along the trail of nutrients and avoiding empty locations, that contributed to the overall task performance.
Indeed, the very first behaviors to evolve were simple forms of controlled movement, such as oscillatory behavior (moving back and forth) and moving to an edge of the path segment and stopping (see Appendix A, sec. A.4.1 for an example). In addition, all environments provided organisms with the features thought necessary for learning to evolve: frequent choices of actions (move straight, turn right, or turn left) and cues that change each generation but reliably indicate the best choice within a generation [91], [92], [126]. However, while these features were present in all environments, they proved insufficient to evolve learning. Specifically, no replicates in the random start environment produced organisms capable of learning (or even error recovery). In fact, none of the organisms from this environment were able to navigate past the first turn, and their task quality remained at or below 4% of the maximum across all 50 replicates.

The environments in which learning did evolve (i.e., one fixed turn, two fixed turns, and nutrient cued) all had a property that the random start environment lacked: trails providing a high initial degree of predictability across generations, which enabled organisms to evolve behavioral building blocks and navigate the trail reflexively before evolving learning [57]. These building blocks include moving repeatedly, sensing the current cue, distinguishing the different cues and reacting to them, turning to either side, retreating to the trail when an empty location is sensed, storing a cue in memory, and comparing the current cue with the one in memory. This result supports hypothesis 1. The predictable-start environments (one fixed turn, two fixed turns, and nutrient cued; table 2.1; figs. A.1-A.3) were the only ones to evolve complex behaviors, including learning.
These environments also produced a wider range of navigational strategies and organisms that reached substantially higher task quality than any organism in the random start environment (fig. 2.3; table 2.3). The nutrient cued environment produced the largest proportion of organisms that could navigate the entire trail, followed by the one fixed turn and the two fixed turns environments. These organisms used imprinting, error recovery, or a hybrid strategy (table 2.3). The organisms that achieved at least 25% AMTQ but did not complete the trail used the same strategies but performed more slowly, or simply reproduced before reaching the end of the trail (table 2.3).

Figure 2.3: Distribution of average maximum task quality (AMTQ) per environment for experiment 1. Each violin plot represents the distribution of AMTQ across replicates for a given environment. Only the environments that started with a predictable pattern (one fixed turn, two fixed turns, and nutrient cued) evolved organisms that could finish the trail. They also produced a wider range of navigational strategies and organisms that reached much higher task quality than the control environment (random start).

Table 2.3: Experiment 1: summary of results

Replicates in which organisms finished the trail:
- Proportion of replicates: one fixed turn, 18/50; two fixed turns, 13/50; nutrient cued, 23/50; random start (control), 0/50.
- Strategies evolved (no. replicates): one fixed turn: imprinting (3), error recovery (15). Two fixed turns: imprinting (5), error recovery (7), hybrid of path predicting and error recovery (1). Nutrient cued: imprinting (3*), error recovery (20). Random start: NA.
- Highest AMTQ observed (strategy): one fixed turn, 99% (imprinting); two fixed turns, 99.7% (imprinting); nutrient cued, 99% (imprinting); random start, NA.

Replicates in which organisms did not finish the trail (AMTQ ≥ 25%):
- Proportion of replicates: one fixed turn, 9/50; two fixed turns, 4/50; nutrient cued, 4/50; random start, 0/50.
- Strategies evolved (no. replicates): one fixed turn: imprinting (1), error recovery (8). Two fixed turns: imprinting (2), error recovery (1), hybrid of error recovery and searching (1). Nutrient cued: error recovery (1), path predicting (1), hybrid of error recovery and searching (1), hybrid of path predicting, searching, and imprinting (1). Random start: NA.

Note: Shown are the performance and strategies of the organisms with average maximum task quality (AMTQ) equal to or higher than 25%, organized by environment. We examined only a sample of organisms that had less than 25% AMTQ. Those that were examined displayed previously described strategies and did not travel far on the trail. NA = not applicable. * Two of these organisms performed a generalizable version of the imprinting strategy that allowed them to navigate any trail configuration independently of the starting pattern.

2.4.3 Learning May Not Generalize to Novel Environments

Both imprinting and error recovery were successful strategies in experiment 1, but they differed in how well organisms could generalize to novel trails. Organisms that used error recovery did not depend on the pattern at the start of the trail for their navigation and could finish any trail configuration that we tested (fig. 2.2). In contrast, most of the organisms that used imprinting depended on the specific start pattern of the environment in which they had evolved to form the cue association. When tested in trails with a different start pattern, these organisms were not able to navigate far and scored poorly in task quality. However, two replicates in the nutrient cued environment evolved a generalizable version of imprinting that allowed the organisms to navigate any trail configuration independently of the starting pattern. These organisms began navigating the trail and, when sensing a turn cue, turned to a default direction. On making their first wrong turn and stepping off the trail, these organisms used error recovery to step back onto the trail and turn to the other direction.
At this point, they imprinted the turn cue that led them astray and used the learned association to navigate the remainder of the trail (fig. 2.2). However, when tested in trails containing cue reversals or replacements, these organisms were not capable of coping with such changes and made wrong turns and stepped off the trail. They then resorted to using error recovery to get back on the trail and continue navigating until the end. This result led us to propose hypothesis 4, namely, that the environment has to present cue reversals along the trail to foster the evolution of more “complex” learning abilities, such as relearning and reversal learning [123]–[125], a hypothesis that we tested as part of experiment 2.

2.4.4 Cue Reversals During Evolution Foster Ability to Relearn During Lifetime

In experiment 2, we used only the nutrient cued environment because it was the only one where generalizable imprinting evolved in experiment 1. At approximately the 85% mark of each trail, we swapped (reversed) the values associated with the turn cues, requiring the organism to learn to turn in the opposite direction of the one it had learned at the beginning of the trail. We named this condition the “cue reversal” environment (fig. A.5). In a complementary experiment, we tested varying the cue reversal position between the 10% and the 90% mark, without any significant effect on the results (Appendix A, sec. A.4; fig. A.8). The results support hypothesis 4, although, as in previous experiments, the evolution of a complex learning ability proved to be a rare occurrence. Of 900 replicates, only 18 evolved the capacity for any form of reversal learning (fig. 2.4). In 10 of these 18 replicates, organisms also evolved the capacity for serial reversal learning, even though their ancestors had only experienced a single cue reversal in their lifetimes. In a serial reversal learning trial, the agent is confronted by a repeated reversal of a two-symbol combination.
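The serial reversal task, and the relearning behavior that solves it, can be illustrated with a minimal associative-memory sketch (hypothetical names; the evolved Avida organisms implement this logic as sequences of low-level instructions, not as a class):

```python
class RelearningController:
    """Cue-association memory sketched from the behavior described above."""

    def __init__(self, default_turn="right"):
        self.default_turn = default_turn
        self.memory = {}               # turn-cue value -> learned direction

    def choose(self, cue):
        """Turn as remembered for this cue, or in the default direction."""
        return self.memory.get(cue, self.default_turn)

    def correct(self, cue, failed_turn):
        """Corrective reaction after stepping off the trail: recoil, turn
        the other way, and (re)associate the cue with that direction.
        Overwriting old entries is what enables serial reversal learning."""
        new_turn = "left" if failed_turn == "right" else "right"
        self.memory[cue] = new_turn
        return new_turn
```

Imprinting corresponds to the same structure with overwriting disabled: once a cue has been stored, `correct` would leave the memory unchanged, so a later cue reversal could never be accommodated.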
Organisms from the 10 replicates that could perform this task exhibited behavior that generalized to any trail configuration. In the other eight replicates, organisms evolved at least some capacity for reversal learning and relearning. However, their behavior had limitations, such as (i) being able to learn certain pairs of cues and not others, (ii) generalizing their behavior to some novel trail configurations and not others, or (iii) having a “short memory”, that is, “forgetting” the association after a while and needing to learn it anew on making a wrong turn. These limitations led to failures in staying on the trail, and in these cases organisms got lost or stuck outside the trail or resorted to navigating by error recovery or searching.

Figure 2.4: Distribution of average maximum task quality across 900 replicates. The performance histogram of all final predominant organisms in experiment 2 reveals a marked grouping by behavioral strategy. Organisms in groups 1 and 2 did not finish the trails, while those in groups 3, 4, and 5 did. Group 1 consisted mainly of organisms that navigated by path predicting and its hybrids. Group 2 consisted mainly of organisms that navigated by error recovery, imprinting, and their hybrids. Group 3 consisted mainly of organisms that navigated by more effective forms of error recovery. Group 4 consisted mainly of organisms that employed imprinting hybrids. Group 5 consisted mainly of organisms capable of relearning. The behaviors from groups 1, 2, and 3 were assessed from a sample of organisms; those of groups 4 and 5 were assessed from all organisms.

As in experiment 1, the fittest organisms (based on task quality) that evolved in experiment 2 were those that used learning strategies. The organisms capable of relearning scored as high as 97% of the maximum and were the fittest overall.
Their behavior was similar to the generalizable imprinting that evolved in experiment 1, in that they made the association between the cue and the correct action on stepping off the trail, but they were also capable of relearning if a cue reversal led them off the trail. Intriguingly, these organisms could also relearn when tested in environments where an initial pair of turn cues was replaced by a completely different pair, as well as when the cues were reversed or changed multiple times along the trail, even though we did not specifically select for this form of flexibility. The next fittest organisms that were capable of learning employed various hybrid strategies involving imprinting, error recovery, and path predicting to reach task quality scores as high as 93% of the maximum. Although incapable of relearning per se (i.e., replacing a cue association with another), they were able to form temporary associations (short-term imprinting). This “short memory” gave the organisms the opportunity to form a new association after the previous one had extinguished. This hybrid strategy scored higher in task quality than imprinting or error recovery alone. For additional results and a “bestiary” of evolved behaviors, see sections A.3 and A.4 of Appendix A.

2.4.5 The Stepwise Evolution of Learning

We found a discernible pattern in the evolutionary trajectories of the organisms that evolved learning strategies (relearning and imprinting). Despite the organisms having evolved completely independently, these lineages passed through a characteristic sequence of phenotypic stages corresponding to two or more of the categories we described in table 2.2. We analyzed the ancestral lineages of all of the final predominant organisms that evolved imprinting in experiment 1 and ten of the final predominant organisms capable of relearning in experiment 2.
Starting from a sessile common ancestor, all lineages first evolved the capacity for moving, then sensing, followed by reflexive navigation and then learning, a result that supports hypothesis 2. In addition, error recovery preceded the evolution of associative learning in all of the lineages where the final predominant organism made the cue-response association by stepping off the trail (generalizable imprinting and relearning). In lineages where imprinting evolved directly from path predicting, the final predominant organisms were not capable of error recovery, and their behavior did not generalize to other trail configurations (nongeneralizable imprinting; figs. 2.5, 2.6).

Figure 2.5: Evolutionary history: 10 lineages. Shown is the evolution of task quality over time in each of the 10 lineages from experiment 2 that were ultimately capable of serial relearning. As they transitioned to a new strategy, some lineages had great gains in task quality, while others had more gradual ones. All lineages, however, went through occasional periods of fitness loss. Different task quality ranges often corresponded to specific behavioral strategies: range 1 corresponded to path predicting, range 2 to hybrid strategies that included searching, range 3 to error recovery, and ranges 4 and 5 to imprinting and relearning.

Figure 2.6: Commonly observed evolutionary sequences. Shown are the evolutionary trajectories of the 11 lineages that evolved associative learning in experiment 1 and the 10 lineages that evolved serial relearning in experiment 2. Behaviors evolved in a characteristic sequence of phenotypic stages. Starting from a naive and sessile common ancestor, all 21 lineages evolved the capacity for moving, then sensing, followed by reflexive navigation and then learning.
The numbers next to the arrows indicate how many lineages followed a particular pathway, with thicker lines indicating more common evolutionary pathways relative to alternatives.

2.4.6 Learning Can Evolve Suddenly

Finally, we found that during evolution, the transitions from one strategy to another could occur abruptly, often as a result of a single mutation. This is not to say that a single mutation was sufficient to produce a new strategy, but rather that the new strategy often evolved silently over a great number of generations until one or a few mutations triggered the transition in behavior, a result that supports hypothesis 3. Sometimes this evolutionary transition would give the organism a large fitness advantage, and its descendants would sweep the population. For example, the transition between error recovery and associative learning (imprinting or relearning) always occurred in one generation (we never observed any instance of an intermediary behavior, such as a simpler form of learning). In one of the lineages we analyzed, the transition from error recovery to relearning raised the AMTQ from 81% to 98% in a single generation. This strategy transition was triggered by a single mutation that changed the flow of the algorithm so that, after an error recovery event, the value of the currently sensed cue would be stored in memory (figs. A.16, A.17). The remainder of the error recovery process stayed intact and was subsumed by the newly acquired relearning capacity. Other components of the relearning algorithm, such as the module for storing the cue in memory, had already been part of the ancestor for many generations but were not used or did not affect the organism's task quality. This result represents a clear case of historical contingency, where one or more modules had to be in place before new mutations could lead to a fitness gain [127], [128].
See section A.4.1 of Appendix A for figures and phenotypic descriptions of the major evolutionary transitions in this lineage.

2.5 Discussion and Conclusions

2.5.1 Emergence of Learning Depends on the Prior Evolution of Reflexive Behaviors

Most studies of the evolution of learning have focused on the selection pressures that may act to increase or decrease an organism's reliance on learning [86], [91], [92]. Our study complements and extends this work by examining how learning may have first arisen. As Dunlap demonstrated [91], [92], [126], learning is favored in environments that present alternative courses of action, where the best action cannot be predicted at the beginning of an organism's life but environmental cues exist that reliably correlate with the best action. However, we found that although all of the environments possessed those presumably necessary qualities, they were not sufficient for learning to arise, as evidenced by results from the random start environment. Instead, as hypothesis 1 predicts, for organisms to initially evolve the capacity for learning, they must first accumulate simple behavioral building blocks to cope with the environment reflexively. In the cases presented here, these building blocks include an ability to move, to sense different cues, and to perform a range of actions (move forward, turn left or right, step back) in response to different cues. Crucially, generalizable learning (generalizable imprinting and relearning; table 2.2) arose only in lineages that first evolved a reflexive ability to correct for missteps and return to a trail of resources (error recovery; table 2.2). With these reflexive behaviors in place, associative learning can then evolve because it confers an advantage by enabling an organism to modulate its behavior based on experience.
Moreover, we find that reflexive and learning behaviors are shaped by different characteristics of the environment: the former by regularities that are stable across generations, and the latter by patterns that vary across generations but persist for periods within an organism’s lifetime. The most flexible learning ability, relearning new cue associations multiple times during an organism’s life, depends on specific selection for it (i.e., swapping cues within the individual’s lifetime as in experiment 2), as we proposed in hypothesis 4.

2.5.2 Stepwise and Modular Evolution of Complex Behaviors

Across many replicate evolutionary runs in several experimental conditions, we found an almost stereotypical historical sequence leading to the ability to learn. Furthermore, our results are consistent with the idea that behavioral control algorithms evolve modularly [129], where more complex behaviors evolve by building on simpler ones and sharing their mechanisms. For example, learning mechanisms incorporated previously evolved error recovery behavior (figs. A.16, A.17). This result supports our hypothesis 2, originally proposed by Skinner and others for natural organisms [57], [90], [93], that learning abilities evolve by building on previously evolved reflexive behaviors. However, in contrast to Skinner’s model, we found that not all intermediate modules have an immediate survival value. Such is the case with the previously mentioned organism that evolved relearning from error recovery in a sudden transition triggered by a single mutation (figs. A.16, A.17), and whose error recovery ancestors had already acquired the capacity to store the cue in memory but never used this ability and, therefore, gained no fitness benefit. Only when a mutation connected the memory-storing module with the error recovery module did the organism acquire the capacity to learn, thus gaining fitness.
It is important to clarify that no single Avida instruction or even specific set of instructions could bestow learning on an arbitrary nonlearning organism. All associative learning algorithms we observed were assemblies of many instructions that had to be executed in the proper order for learning behavior to manifest (see sample learning organisms; fig. A.18). That a single mutation could activate this behavior in an offspring only demonstrates that the remainder of the mechanism was already in place, either as part of the existing behaviors or as neutral instructions [109], [130]. In the eleven lineages leading to imprinting in experiment 1, and in the ten lineages leading to serial relearning in experiment 2 (fig. 2.5), we routinely found that complex abilities evolved from simpler ones in sudden transitions triggered by just a few mutations. This finding supports our hypothesis 3, that learning may arise through minor modification of existing mechanisms, and also lends credence to the proposition that something similar could have happened among natural organisms leading up to the Cambrian explosion [80]. More generally, these sharp transitions in phenotype are a consequence of the modular evolution of behavior. Modularity inherently reduces the requirements for evolving a new trait if it can build on existing ones, increasing phenotypic complexity with relatively modest genetic modifications [131]–[133], which can build up silently and, once completed, cause a sudden shift in phenotype.

2.5.3 Why Learning Was Rare

Despite striking regularities during the course of evolution, associative learning was actually a rare outcome even in environments that fostered it (7% of lineages in experiment 1, 2% of lineages in experiment 2). Our results suggest some possible explanations.
First, as mentioned above, complex behaviors can be hard to evolve, in part because they may depend on the preexistence of reusable intermediary modules, including features without survival value, and are therefore subject to the stochasticity of historical contingencies in general. Another possible explanation is that a reflexive strategy involving error recovery may already confer high fitness, such that the fitness gain associated with a learning strategy may not be enough for learning to arise and spread in the population. Across evolutionary replicates, we found organisms that could solve the problem in surprisingly different ways and obtain high levels of fitness, even in these simple environments. Furthermore, there can be implicit costs to more complex algorithms, including greater mutational fragility and longer processing times. Even while making more mistakes on the trail, a shorter, sufficiently faster algorithm could reproduce more quickly and thus outcompete more complex algorithms that made fewer mistakes but executed too slowly. Surprisingly, we found in a follow-up experiment (Appendix A, sec. A.5) that the amount of computational memory available to an organism is not a constraint on the evolution of learning in our system, as long as the minimum amount necessary to solve the task is provided. We performed a version of experiment 2 where we reduced the amount of memory available in the organism's CPU from 26 integers to 2, which is the minimum necessary to solve the learning task, but did not see a significant difference in the frequency of evolution of the relearning strategy or in the average task quality and distribution of task quality in the final population compared with experiment 2 (fig. A.19; table A.3).
Overall, the same conditions that explain the rarity of solutions involving learning were also responsible for the variety of solutions and evolutionary paths we observed, better resembling natural evolution, where learning typically entails some kind of cost and is not always adaptive [23], [91], [93], [126], [134]. Interestingly, the stepwise succession of behaviors observed in our lineage studies, in conjunction with the diversity of final strategies from different replicates, is reminiscent of how behaviors appeared in trace fossils from the Ediacaran and early Cambrian, becoming more complex and diverse over time [135].

2.5.4 The Scientific Value of an Open-Ended Evolutionary Model

In comparison with prior studies of the evolution of learning using computational methods [136]–[138], ours is striking in the open-endedness of the evolutionary process, which parallels that of biological evolution. Avida [113] employs relatively neutral genetic building blocks consisting primarily of algebraic and logic instructions, which do not constrain or favor the evolution of any particular behavioral algorithm. Thus, we were able to explore a large solution space and gain insights into the evolutionary dynamics that are also likely to occur in natural open-ended systems, even though nature uses very different building blocks. In Avida, the sheer number of potential solutions creates evolutionary dynamics and patterns that are not possible to observe using simpler digital evolution systems. For example, Izquierdo's groundbreaking work on the evolution of associative learning using neural networks consisted of evolving only the connections between preexisting neurons [137], [138]. Although many insights were gained from that experiment, the limited number of potential solutions also led to a smaller diversity of outcomes.
In fact, simply using neural networks, which are intrinsically designed to form associations, means that fewer mutational steps are needed from a starting point to evolve appropriate connections compared with the enormous search space in Avida. In our experiments, the behavioral algorithms evolved from scratch, using the most basic computer programming language elements, from ancestors incapable of sensation, movement, or navigation of any kind. Furthermore, in our system even a basic behavioral building block, such as the move back instruction, could evolve from an assembly of simpler actions, as we demonstrate in a preliminary experiment (Appendix A, sec. A.2). We performed a version of experiment 1 without the move back instruction, and even without it many organisms evolved the capacity to navigate the entire trail using either imprinting or error recovery. These organisms evolved behavior functionally equivalent to the move back instruction by assembling other instructions from the basic instruction set.

2.5.5 Early Evolution of an Intrinsic Value System

An unexpected outcome of this study was that it provided insights into the evolution of motivational mechanisms, which are thought to be integral to adaptive decision-making [93], [139]–[143]. Some of the earliest building blocks to evolve across all of our experiments were those responsible for evaluating experiences. In our system, evaluations were implicit features of the evolved controller and not distinct modules for deciding “good” or “bad.” They were also essential to behavior control, since organisms could not sense their own task quality scores to determine whether an action was beneficial or harmful. Early in evolution, values started as arbitrary biases, such as moving constantly or favoring turning one way or another, but biases that proved adaptive (e.g., preferring continuous movement while avoiding empty or previously visited locations) would fix, excluding less fit alternative biases.
Over time, an intrinsic value system evolved that ensured appropriate behavior in response to specific inputs, and when associative learning arose, this value system provided reinforcement for behavior conditioning. We can thus reinterpret the associative learning mechanism we have observed in the light of a value system: when an organism capable of learning senses an empty location, it displays the avoidance behavior because, in effect, it negatively values the experience. It associates this negative experience with the cue that led it to the empty location, and from then on, experiencing the cue alone is sufficient to activate the avoidance behavior.

2.5.6 Reversal Learning Seems No More Complex than Initial Learning

Reversal learning is often deemed more challenging cognitively than initial learning [123]–[125]. However, in our experiments organisms that evolved the ability for reversal learning showed no difference in the capacity or speed of learning between the initial and subsequent learning events. Thus, once reversal learning evolves, it does not seem cognitively more complex than associative learning itself, at least in this system. Our point is not to undercut the study of how serial reversal learning in animals becomes faster with experience and correlates with cognitive flexibility [144]–[146] but to call for a refinement of the ideas around what is required for reversal learning to occur.

2.5.7 How Evolution Continues to Shape Associative Learning

In a follow-up analysis (Appendix A, sec. A.6), we looked into how evolution continued to shape learning after it appeared in a lineage. In the lineages that produced associative learning in experiments 1 and 2, we found that learning ability would become attuned to the environment of evolution in a variety of ways.
For example, (i) ancestral organisms that could learn some cue combinations but not others would eventually give rise to descendants that could learn any cue combination; (ii) ancestral organisms that required multiple exposures to learn the cue-response association gave rise to descendants that required fewer exposures and, ultimately, final organisms that required only a single exposure; and (iii) in environments without cue reversals, ancestral organisms that could re-form associations multiple times gave rise to final organisms that could imprint only once. These adaptations are consistent with the literature on preparedness and other so-called constraints on learning [84], [88], [89], [147], as well as the literature on sensitive periods of plasticity [148]–[150]. The key themes are that evolution produces learning mechanisms that are optimized for the needs of an animal in the environment where it evolved, and since learning is costly, evolution will often restrict the periods of an animal's life when it is most capable of learning (sensitive periods). An example of learning optimization is when an animal that relies on odors for foraging can learn more quickly to associate odors with good or bad foods than visual cues with the same foods [151]. The phenomenon of sensitive periods for learning is illustrated by filial imprinting in birds, where a chick learns who its mother is early in life and that association does not change [149]. Consistent with this literature, the imprinting strategy was adaptive in experiment 1, where there were no cue reversals. In that environment, ancestral organisms that were capable of re-forming the cue association multiple times eventually gave rise to organisms that could form the association only once, presumably becoming more efficient. The sensitive period for learning in those lineages became restricted to the beginning of an organism’s life.
Meanwhile, in experiment 2, where the environment contained cue reversals, the ability to re-form the cue associations (relearning and short-term imprinting) was adaptive, and the sensitive period for learning lasted an organism’s entire life. Although our experiments were not designed to investigate these topics, the patterns we found suggest a future area of study in which Avida is used to systematically explore how the evolutionary environment can constrain and optimize learning abilities.

2.5.8 Implications for Artificial Intelligence

The insights of this study are relevant to the field of Artificial Intelligence, where lifetime learning has long been a challenge. We demonstrated that adaptive autonomous agents, capable of learning and navigation, can be produced by evolutionary methods, using biologically consistent scenarios where the environment fosters the evolution of learning and decision making, instead of traditional methods based on human design, which are difficult to scale up and to apply to novel tasks. One of our future goals is to extend this study and test whether we can evolve more complex forms of learning, such as contextual learning and rule learning (the learning of rules and concepts), and see whether their evolution follows the same sequence suggested in the literature [57], [90], [93]–[97]. We could test this hypothesis by introducing additional cue types and requiring the organism to perform additional tasks in more intricate trails.

2.5.9 Implications for the Evolution of Behavior

Finally, we believe that the evolution of learning in a digital environment would be useful to investigate the effect that learning behavior has on evolvability and rate of adaptive evolution. Some researchers have proposed that learning increases evolvability, since behavioral flexibility shields organisms from some selective pressures, allowing the population to maintain its diversity to cope with future selective events [81], [83].
Others have proposed that learning could either drive evolution by helping organisms adapt to different niches, where they would experience different selective pressures leading to change, or inhibit it by protecting them from selective pressures, leading to stasis [79]. It has even been suggested that the emergence of learning drove the diversification of complex behavior during the Cambrian explosion [80]. Overall, we agree with the remarkable assertion by B. F. Skinner [57] (p. 220) that understanding “the conditions under which [learning] evolved are helpful in understanding its nature.”

Chapter 3: Evolution of Patch Harvesting, an Insight into Early Bilaterian Cognition

Authors: Anselmo C. Pontes, Ian Whalen, Charles Ofria, and Fred C. Dyer

3.1 Introduction

3.1.1 Background

The earliest evidence of animals with a nervous system are trace fossils from the Ediacaran Period (635–541 Ma) [152]. Trace fossils, also known as ichnofossils, are fossilized marks made by an organism, not including the fossilized remains of the organism itself [153]. Burrows of bilaterian animals feeding on microbial mats that once covered shallow marine floors have been preserved in sandstones all over the world, going back as far as approximately 560 Ma [152], [154]–[156]. These fossilized trails are horizontal and curvilinear and typically between 1 and 3 mm wide [157], indicating the minute scale of the organisms that made them (fig. 3.1). Later traces, however, can be as wide as 5 mm.¹⁷

¹⁷ Some of the simplest horizontal curvilinear trace fossils of the same era were left by deposit feeders, animals similar to those that fed on microbial mats, but that lived in deep waters below the photic zone and fed on detritus covering the seafloor [157]. However, the more elaborate trail behaviors are believed to be the product of shallow water microbial mat grazers, and it is on those animals that we focus our research.

Figure 3.1: Ediacaran trace fossil.
The millimeter-wide traces were produced by bilaterians with a nervous system, the first animals with these characteristics on record. Photo by Verisimilus at English Wikipedia, CC BY 2.5. Image downloaded from: https://commons.wikimedia.org/w/index.php?curid=2502886.

The worm-like animals that left these traces are referred to as mat miners or undermat miners [157], [158], after where they fed. Their horizontal burrows were confined to the thin layer of microbial mat between the water, which was possibly low in oxygen, and the toxic and anoxic substrate [152], [157], [159]. These animals were presumably composed only of soft tissues and as such did not leave body fossils (though what are believed to be body impressions of such an animal have recently been found [160]). They were likely closely related to the last common ancestor of today’s bilaterians [152], [154], [160], which includes all animals with a central nervous system. From when they first appear at the end of the Ediacaran and into the early Cambrian¹⁸ [154], these trace fossils show increasingly complex patterns of activity and feeding strategies [152], [157], [161]. They document a critical period when early benthic animals transitioned from sessile to active lifestyles, which included burrowing, grazing, and hunting [155], [162]. They also show a change in feeding strategies, from osmotrophy and filter feeding to heterotrophy [154], [155]. This progressive increase in the complexity of the animals’ feeding behavior could indicate an increase in the capacity and sophistication of their sensory and cognitive systems [157]. By studying the changes in these animals’ feeding behavior, we can gain insight into the early evolution of animal cognition and nervous systems. We can also better understand the conditions that led to the Cambrian explosion [154], [163].

3.1.2 Previous Research

Researchers have identified some regularities and trends in the fossilized burrows.
The earliest traces are simple, showing little directionality, but over time they become increasingly elaborate and compact, showing evidence of more complex responses, such as avoiding crossing other burrows [157], [161]. They also form recognizable motifs, such as spirals and tight parallel meandering, like plowing a field [157], [161], [164]–[166]. According to the earliest interpretations of the fossils, the behavior of these animals was governed by a set of genetically encoded preferences (taxes): “phobotaxis”, avoiding crossing burrows; “thigmotaxis”, preferring to move alongside an existing burrow; and “strophotaxis”, a propensity to make 180° turns after traveling a certain distance [161], [167]. Another, more recent, interpretation is that the “taxes” we observe are not hardwired behaviors; instead, they reflect the animal’s response to the concentration and distribution of food in the environment [153]. The animal therefore only has a single taxis, which is to move along the food gradient.

¹⁸ The Cambrian Period is from 541 to 485.4 Ma.

The differences in trail morphology over geological time are frequently considered evidence that the microbial mats these animals fed on were in decline and becoming patchier [157], [166]. Some researchers who favor the hypothesis of multiple hardwired taxes suggest that the animals responded by evolving more efficient grazing behaviors, while those who favor the hypothesis of a single taxis suggest that the animals' behavior did not necessarily change: the more compact trails are artifacts of animals following nutrient gradients in shrinking microbial mats. Researchers from both sides of the taxes argument have built computer simulations based on their hypotheses [153], [157], [168], and both sides have been successful in demonstrating that, under their set of conditions, pre-programmed agents can reproduce harvesting patterns similar to those found in the fossil record.
The reasons for patch harvesting trails becoming more complex and compact over evolutionary time have, therefore, remained disputed.

3.1.3 Recent Evidence

The Ediacaran Period presented great environmental challenges for life, with large swings in global temperature and oxygen concentrations [152], [155], [159]. Furthermore, recent studies indicate that the ecology of the shallow marine environments from which most of the trace fossils come was more dynamic and heterogeneous than previously thought [154]–[156], [169]. The widespread matgrounds covering the sea floor were composed of a variety of species, including sessile animals, algae, and microbial communities, as well as decaying matter [154], [155]. These microbial mats were prone to disturbance from waves, storms, coastal erosion, etc., as well as the activity of competing mat miners [155], [156]. It is therefore almost certain that the microbial mats were already broken into patches when the first mat miners appeared. The activity of mat miners was likely further restricted by the concentration of oxygen, as mat miners would only graze where there was both food and oxygen [154], [159], [170]. Oxygen concentrations during the Ediacaran Period were much lower than today and varied greatly over time [155], [159]. Among the species that made up the microbial mats, however, were photosynthetic organisms; these created well-oxygenated oases where animals could thrive [159], [171], [172]. Regarding the suggestion that microbial mats were in decline, recent evidence indicates that shallow water matgrounds were actually increasing in abundance and diversity at the same time that the mat miners’ burrows were becoming more sophisticated [155]. There is no consistent evidence that microbial mats were in decline before the Cambrian Period [154], [155], [173].
3.1.4 Our Experiments

Previous computational experiments that attempted to recreate nutrient harvesting behavior were primarily based on pre-programmed (human-designed) agents whose behavior would vary depending on a few calibration parameters [168]. These agents were also capable of peripheral and, sometimes, long-range sensing [153]. Few experiments attempted to use evolutionary algorithms to evolve the agent’s behavior, and those that did either failed to reproduce the more complex behavior patterns that we observe in the fossil record [174], or else they did not evolve the agent’s control system itself, instead evolving calibration parameters for human-designed agents [175], [176]. As such, these studies could provide only limited insight into the selective pressures, the evolutionary relationships among the different burrowing behaviors, and the landscape of potential burrowing strategies. Two good reviews of these modeling efforts can be found in Hayes, 2003 [164] and Plotnick, 2003 [177]. For our computational experiments, we chose to once again use the Avida platform because of its successful track record in similar studies involving both the evolution of navigation and nutrient-gathering behaviors, such as gradient ascent and trail following [106], [178]. Moreover, Avida allowed us to evolve the control system of the agent’s behavior in an open-ended manner. Afterwards, we were able to analyze the behavioral algorithms that evolved and assess how they conformed to the different taxes that have been proposed. Avida also permits reconstructing the evolutionary trajectory of any given lineage, which allowed us to identify potential relationships between behaviors over evolutionary time. We performed an exploratory study taking into consideration the latest evidence and interpretation of the Ediacaran fossil record. Recent evidence points towards abundant but patchy microbial mats over the entire period in which we find burrowing traces.
We therefore tested an environment with a constant abundance of nutrients over evolutionary time, and where nutrients were distributed in homogeneous patches with no gradients.¹⁹ The aim was to determine whether such an environment would lead to the evolution of the behavioral patterns observed in the fossil record. We further simplified the environment by assuming that there was no competition with neighboring organisms: each organism born was placed alone in its own testing environment. We chose to limit the type of sensory system that could evolve and, by so doing, test whether even a very crude sensory apparatus is sufficient to evolve the behaviors observed in the fossil record. Specifically, we assumed that each Avida organism could evolve to sense at most the quality of the nutrient at its current location. This ability would allow it to distinguish four conditions: nutrient present, nutrient absent, edge of nutrient patch, and nutrient already consumed. We also investigated the effect that patch shape, coverage, and distribution have on the types of harvesting strategies that evolved, and whether different patch structures favored the evolution of particular burrowing strategies.

¹⁹ We agree with Carbone and Narbonne, 2014 [157], that it is unlikely that food gradients were solely responsible for the complexification of trail designs occurring simultaneously all over the world. Therefore, we decided to test the condition where patches did not have gradients and were not in decline.

3.1.5 Our Findings

In our experiments, harvesting behavior became more complex over the course of evolution, tending to maximize patch coverage. We were able to replicate the stereotypical motifs from the fossil record and observe how organisms transition from one type of strategy to another over generations. We also found that when patches were fragmented and dispersed, organisms evolved to balance exploration and exploitation.
Remarkably, this ability involved the evolution of memory utilization to decide when to leave a patch and seek another one. Innate responses, such as phobotaxis, evolved in some of our organisms. However, what appears as strophotaxis in our system was simply an organism’s response to reaching the edge of a patch, and what appears as thigmotaxis was an artifact from other behaviors that produced compact parallel trails. Overall, in a regime where fitness depends on the amount of nutrients consumed during a lifetime, cognitive improvements in response to patch geometry and distribution were sufficient to produce the various complex strategies seen in the fossil record, without any need for complex or long-range sensory capacity, changes in nutrient availability, or intra-patch competition.

3.2 Methods

For these experiments, we used the Avida platform described in Chapter 2. This time, the behavioral task consisted of harvesting nutrient patches within an arena. We tested eight different environments, each containing four virtual arenas with unique patch configurations (table 3.1). The arena size was 50 x 50, larger than in Chapter 2, and could contain one or more nutrient patches. Patches could be polygonal or have an irregular (‘organic’) shape (e.g., fig. 3.2).

Table 3.1: Environments in the order they were used in the experiments. Each environment consisted of four arenas with different patch configurations. Each organism experienced only one arena in its lifetime.
Environment name | Number of patches | Patch shape | Edge cues | Average coverage
Rectangular with edges | 1 large | Rectangular | Yes | 41%
Rectangular without edges | 1 large | Rectangular | No | 41%
Rectangular with hole and edges | 1 large | Rectangular with hole | Yes | 27%
Irregular with edges | 1 large | Irregular | Yes | 55%
Irregular with holes and edges | 1 large | Irregular with holes | Yes | 49%
Connected patches with edges | 6 small | Rectangular patches with linking corridors | Yes | 57%
Disconnected patches with edges | 6 small | Separated rectangular patches | Yes | 49%
Disconnected patches without edges | 6 small | Separated rectangular patches | No | 49%

Figure 3.2: Sample arena with an irregular shaped nutrient patch and edge nutrients. Each arena contained a patch of nutrients with a unique shape.

At the start of its life, each organism was placed alone in a randomly selected arena, within a nutrient patch, at a consistent location and orientation. To eliminate co-evolution dynamics, organisms were neither allowed to interact directly nor indirectly through environmental modification. The behavioral task required the organism to harvest as many nutrients as possible within its lifetime, while minimizing visits to empty locations (off-patch). We measured an organism’s performance in the arena using a task quality score (also described in Chapter 2), which was calculated by counting the number of nutrients consumed, subtracting the number of visits to empty locations, and dividing by the total number of nutrients in the arena, not allowing the total to be negative. Revisiting previously consumed nutrient locations did not affect the task quality score; however, revisiting empty locations did. We also used the same instruction set as in Chapter 2, excluding the move back instruction.
Therefore, the instructions the organisms could acquire through evolution to interact with the environment were sense current, move ahead, turn right, and turn left (table 3.2).

Table 3.2: Environmental interaction instructions used in the experiment.

Avida instruction | Description
Move ahead | Caused the organism to move one step in the direction it was facing.
Turn right | Caused the organism to turn 45 degrees to the right.
Turn left | Caused the organism to turn 45 degrees to the left.
Sense current | Provided the organism with an integer corresponding to the content of its current location (nutrient, empty location, previously consumed nutrient, or edge nutrient).

If organisms evolved the ability to sense the content of their current location, they would sense nutrients, empty locations, and previously consumed nutrient locations as distinct integers (3, -1, and 1, respectively). In some environments, the edges of a patch could also be sensed as the integer 0. We evolved 200 populations for each of the eight environments (table 3.1), for a total of 1,600 replicates. We seeded each population with a single sessile and naive organism capable only of reproducing. After a period of evolution that lasted 200,000 updates (a unit of time in Avida), we selected the three populations with the highest overall task quality²⁰ from each environment and analyzed the navigation strategies of their predominant organisms (most abundant genotype). In addition, we selected the most successful predominant organism overall for a full lineage study, which consisted of testing every one of its ancestors on the behavioral task to reconstruct the evolutionary history leading to the final strategy. All other configurations not mentioned above, such as maximum population size, mutation rates, and maximum organism age, were the same as in Chapter 2 (Appendix A).
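The environment and scoring just described can be summarized in a small model. The following Python sketch is a minimal, hypothetical re-implementation for illustration only (it is not the Avida code, and the `Arena` and `Agent` names are our own). It encodes the four interaction instructions, the cell codes given above (nutrient = 3, empty = -1, consumed = 1, edge = 0), and the task quality score.

```python
# Cell codes as sensed by the organism (matching the integers in the text)
NUTRIENT, EMPTY, CONSUMED, EDGE = 3, -1, 1, 0
# Eight headings, 45 degrees apart; the orientation convention is arbitrary
DIRS = [(0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1)]

class Arena:
    def __init__(self, grid):
        self.grid = [row[:] for row in grid]
        # edge cells are nutrients too; both count toward the total
        self.total = sum(row.count(NUTRIENT) + row.count(EDGE) for row in grid)
        self.consumed = 0
        self.empty_visits = 0

class Agent:
    def __init__(self, arena, x, y, heading=2):
        self.arena, self.x, self.y, self.heading = arena, x, y, heading
        self._harvest()                     # organisms start inside a patch

    def sense_current(self):
        return self.arena.grid[self.y][self.x]

    def turn_right(self):
        self.heading = (self.heading + 1) % 8   # 45 degrees clockwise

    def turn_left(self):
        self.heading = (self.heading - 1) % 8   # 45 degrees counterclockwise

    def move_ahead(self):
        dx, dy = DIRS[self.heading]
        self.x = max(0, min(len(self.arena.grid[0]) - 1, self.x + dx))
        self.y = max(0, min(len(self.arena.grid) - 1, self.y + dy))
        self._harvest()

    def _harvest(self):
        cell = self.arena.grid[self.y][self.x]
        if cell in (NUTRIENT, EDGE):
            self.arena.grid[self.y][self.x] = CONSUMED
            self.arena.consumed += 1
        elif cell == EMPTY:
            self.arena.empty_visits += 1

def task_quality(arena):
    # nutrients consumed minus empty-location visits, divided by total
    # nutrients, floored at zero; revisiting consumed cells is neutral
    return max(0.0, (arena.consumed - arena.empty_visits) / arena.total)
```

For example, on a 1 x 5 strip holding four nutrients and one empty cell, harvesting all four nutrients yields a task quality of 1.0, and one further step onto the empty cell reduces it to (4 - 1)/4 = 0.75.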
3.3 Results

3.3.1 Evolved Behaviors Resemble the Fossil Record

Starting from a sessile and naive ancestor, our evolutionary experiments produced organisms capable of moving and sensing, which responded to the presence of nutrients in the environment by systematically harvesting them in patches while avoiding empty locations. Over the course of evolution, more complex and efficient harvesting strategies emerged in a sequence similar to the one observed in the Ediacaran trace fossil record [157]. In addition, these different strategies reproduced the stereotypical fossil trail patterns, such as “scribbling”, “meandering”, and spiraling [161], [166] (table 3.3).

Footnote 20: The overall task quality of a population equates to the Average Maximum Task Quality (AMTQ) explained in Chapter 2.

These harvesting behaviors evolved despite the organisms' limited potential for sensing environmental cues, and in an environment with constant nutrient availability over time, where nutrients were distributed in patches of homogeneous quality. We also found that innate responses, such as phobotaxis and what resembles thigmotaxis and strophotaxis, evolved anew in some of our organisms. However, in our system, what seemed like strophotaxis, an organism’s propensity for occasional 180-degree turns, was in fact a response to reaching the edge of a patch. In addition, what appeared to be phobotaxis was an artifact of other behaviors that produced compact parallel trails, and not an active sensing and hugging of an existing trail.

3.3.2 Evolved Behaviors Fall into Four Stereotypical Strategies

After analyzing the three top-performing predominant organisms from each of the eight environments, we observed that, despite variations, their behaviors could be easily classified into four main strategies: “reactive meandering”, “pattern cycling”, “plowing”, and “spiraling” (table 3.3). Pattern cycling (fig.
3.3) could be further divided into “edge-hugging” and “edge-reflecting”, depending on how the organism reacted to edge cues (fig. 3.4).

Table 3.3: Behavioral strategies found across all treatments. For each environment, we chose the three populations with the highest overall task quality across the 200 replicates and analyzed the navigation strategy of their predominant organisms. Although there was a great deal of variation, all behaviors could be classified into four major strategies.

Reactive meandering: Organism moved straight forward until it encountered an edge cue or an empty location, then it changed direction and continued moving straight. The resulting path crisscrossed the patch, often revisiting the same locations several times, while many locations were never visited (fig. 3.5).

Pattern cycling: Organism moved in a closed cycle, such as the shape of an octagon or a square, offsetting the center of the cycle along a linear trajectory. The organism reacted to edge cues or empty locations by changing trajectory. Depending on how they reacted to edge cues, this strategy could be further divided into two types: edge-hugging and edge-reflecting. This strategy revisited locations often but tended to cover most of a patch (figs. 3.3 and 3.4).

Plowing: Organism moved in straight lines until reaching the edge of the patch, then turned around and continued in a parallel line offset by one cell. As the organism moved across a patch, it produced a pattern reminiscent of plowing a field, which tended to cover most of a patch while minimizing the number of locations revisited. It reacted to edge cues and empty locations by turning and starting another arm of the trail (fig. 3.6).

Spiraling: Organism spiraled inwards within a patch, moving straight and reacting to empty and previously consumed nutrients by making turns.
When it neared the center of the spiral, it would leave the patch in a straight line until it found another patch, where it would start spiraling again. This strategy covered most of the patches the organism visited, while minimizing the number of revisited locations. The decision of when to leave the current patch for another was based on memory of previous turning conditions (fig. 3.7).

Figure 3.3: Example of the pattern cycling strategy (with edge-hugging) in the environment rectangular with edges. Green and red circles indicate the start and end of the trail.

Figure 3.4: Illustration of the edge-reflecting (left) and edge-hugging (right) variations of the pattern cycling strategy.

Figure 3.5: Example of the reactive meandering strategy in the environment irregular with holes and edges.

3.3.3 Evolved Strategies Depended on Patch Structure

In our experiment, organisms evolved to respond to high-level features of the environment, such as the edges of the patch or the trail left by the organism. We did not see a harvesting strategy based on local search such as the one proposed by Plotnick and Koy [153]. Plowing and spiraling were the most efficient strategies in the environments where they evolved, reaching higher task quality than competing strategies. Reactive meandering and pattern cycling were the least efficient strategies and reached comparable levels of task quality. However, reactive meandering proved the most robust strategy, and the only one that evolved in all environments (table 3.4).

Table 3.4: Strategies that evolved among the three best performing populations from each environment.
Environment name | Reactive meandering | Pattern cycling | Plowing | Spiraling
Rectangular with edges | Yes | Yes | Yes |
Rectangular without edges | Yes | | |
Rectangular with hole and edges | Yes | | Yes |
Irregular with edges | Yes | Yes | Yes |
Irregular with holes and edges | Yes | Yes | |
Connected patches with edges | Yes | Yes | |
Disconnected patches with edges | Yes | | |
Disconnected patches without edges | Yes | | | Yes

When analyzing the effect of patch shape, coverage, and distribution on the evolution of harvesting strategies, we found that environments with single, solid patches produced a wider variety of strategies than those with multiple patches or single patches with holes. However, patch coverage, the ratio between the area covered with nutrients and the total area of the arena, had no discernible effect on the diversity of the strategies that evolved. Finally, only in an environment with multiple, disconnected patches did we see the spiraling strategy evolve, and it involved a balance between exploration and exploitation.

The presence of edge cues did not have a consistent effect. Environments with single rectangular patches produced a wider variety of strategies when edge cues were available, while environments with multiple disconnected patches produced a wider variety of strategies (including spiraling) when edge cues were not available.

3.3.4 In-Depth Results from a Single Environment, Rectangular with Edges

The environment rectangular with edges was one of the two that evolved the highest diversity of strategies: reactive meandering, pattern cycling, and plowing. Among the three, reactive meandering had the lowest task quality performance. The organism that employed it reacted to edge cues but not to previously consumed nutrients (fig. 3.5), leading to wasted movements in already-explored areas of the patch. Pattern cycling displayed intermediate performance. The organism that employed it reacted to edge and empty location cues but not to previously consumed nutrients (fig. 3.3).
It also revisited positions often; however, due to its systematic and tightly spaced cycling, it was able to fully exploit those areas that it visited. Plowing had the best performance. As with pattern cycling, the organism reacted to edge and empty location cues, but not to previously consumed nutrients (fig. 3.6). However, due to its systematic navigation in parallel lines, it was able to exploit most of the patch while not revisiting positions as frequently as the other two strategies. It was noteworthy that, before the organism started plowing, it moved in a straight line until it encountered one of the edges of the patch, then it would start the plowing pattern in the opposite direction. This initial step allowed it to harvest the entire patch in a single sweep.

Figure 3.6: Example of the plowing strategy in the environment rectangular with hole and edges.

3.3.5 The Spiraling Strategy Evolved Memory Usage

Spiraling evolved in a single environment, disconnected patches without edges. This was one of two environments made of multiple, small, disconnected patches. Spiraling had some interesting features. It evolved in an environment without edge cues, which we expected to make the task more difficult. However, the organism that used this strategy was sensitive to previously consumed nutrients and used these cues to guide its navigation. It was also noteworthy that this organism spiraled inwards. When placed on a patch, the organism moved straight until it found one of the edges, then it moved along the patch’s perimeter, and finally spiraled towards the center. However, before reaching the center of the spiral, it would leave the current patch and move in a straight line until it found another patch, where it would start another inwards spiral.
The decision of when to leave the current patch was based on a complex calculation involving the organism's memory of the conditions under which it made previous turns, such as reaching the edge of the patch or previously consumed nutrients. Therefore, this strategy also implemented a balance between exploration and exploitation.

Figure 3.7: Example of the spiraling strategy in the environment disconnected patches without edges. The arena is toroidal and the organism begins its navigation at the green dot and ends at the red. It spirals inward but leaves the patch before reaching the center. It starts a new spiral upon encountering a new patch.

3.3.6 Lineage Analysis Shows Different Strategies Evolving from One Another

We selected the overall top-performing organism, which produced the plowing strategy in the environment rectangular with hole and edges (fig. 3.6), and analyzed its ancestral lineage to reconstruct the evolution of its behavior. We found that, starting from a sessile ancestor, the lineage first evolved pattern cycling, then transitioned into reactive meandering, and finally plowing (fig. 3.8). This result shows that the three strategies could evolve into one another over the generations within the same lineage. Below we describe the strategies found at intermediary points during evolution.

Figure 3.8: Evolutionary history of the plowing strategy in the environment rectangular with hole and edges. On the left is the evolution of task quality over time. On the right are the different navigation strategies from selected ancestors along the lineage.

Organism 1: This organism used a pattern cycling strategy that hugged one of the edges of the patch. It moved slowly and did not travel very far in the patch before reproducing. However, this organism could already react to edge and empty location cues, which shows that the ability to sense and discriminate environmental cues evolved early.
Organism 2: This organism performed reactive meandering and moved faster than Organism 1. It always responded to edge cues, avoiding visits to empty locations. Although it was able to visit most of the patch, much of it was left unharvested.

Organism 3: This organism also performed reactive meandering. Interestingly, it often alternated between responding to edge cues and empty location cues. This made its navigation less systematic than its ancestor’s, but it harvested more of the patch.

Organism 4: This organism used the plowing strategy. It responded to both edge and empty location cues. It systematically exploited the areas it visited; however, it did not visit the entire patch.

Organism 5: This organism also used the plowing strategy, but it moved more quickly and further than Organism 4, thereby harvesting most of the patch.

Organism 6: This was the final organism in this lineage. It used the plowing strategy, but with an important addition to that of Organism 5. Before starting the plowing pattern, it moved in a straight line until reaching the edge of the patch. Then, it performed a series of turns that allowed it to position itself inside the patch, close to the edge and parallel to it. Then it plowed the patch in one sweep.

3.4 Discussion and Conclusions

3.4.1 Complex Trails Require Healthy Patches

It has frequently been assumed that Ediacaran microbial mats were declining over time, from homogeneous coverage when mat mining first arose to increasingly patchy and rarefied coverage towards the Cambrian [157]. This assumption was incorporated into previous computational models of mat mining behavior [153], [157], [168]. A more recent interpretation of fossils and other data from the period instead favors the thesis that Ediacaran matgrounds were naturally patchy and were not declining before the Cambrian [154]–[156], [169].
In support of this modern view, our results show that, in a patchy but stable environment, mat mining behavior evolves in a similar progression and produces trail designs remarkably similar to those seen in the fossil record, due simply to competition for faster reproduction. Although our work is exploratory, our results also suggest that the most complex harvesting behaviors only evolve in environments where patches are large and homogeneous, indicative of healthy and abundant microbial mats. In environments where patches are excessively irregular and fragmented, indicative of stressed and shrinking microbial mats, the more systematic harvesting strategies would not be effective and, indeed, do not evolve. Such strategies include plowing and spiraling, which require large expanses to be efficient and which, in excessively irregular and fragmented environments, have no competitive advantage over simpler behaviors, such as pattern cycling and reactive meandering. If our early results hold, the more complex and efficient trail patterns observed at the end of the Ediacaran and early Cambrian indicate healthy and abundant microbial mats, not a decline as previously assumed.

3.4.2 Patch Boundaries May Have Guided Mat Mining Behavior

Previous computational models that successfully reproduced the trail patterns found in the fossil record assumed that the animals’ behavior was either based on three taxes (phobotaxis, thigmotaxis, and strophotaxis) [168], [175] or on a single gradient-following taxis [153]. Our results, however, show that alternative scenarios are possible. In our experiments, organisms evolved behaviors that were always sensitive to patch boundaries, regardless of whether they were sensitive to existing trails or whether they followed innate patterns. Even the more systematic behaviors, plowing and spiraling, were shaped by the boundaries of the environment and not by thigmotaxis or strophotaxis.
Assuming that Ediacaran microbial mats were naturally patchy, it would have been evolutionarily advantageous for mat miners to use patch boundaries and other persistent features of the environment to guide their behavior. In Chapter 2, we showed how cognition evolves to rely on persistent environmental cues to make decisions, whether these cues last for generations or only for periods within the organism’s lifetime. Moreover, as our results show, a simple sensory capacity, such as contact chemoreception at the anterior end of the animal, would likely be sufficient to detect features such as existing burrows and the boundaries of the harvestable area. Boundaries could have been demarcated by different features, such as empty ground, sessile organisms, low oxygen, and decaying organisms. In any case, this sensory data would have provided sufficient input for the observed behaviors.

3.4.3 Peculiar Spiraling Behavior Balances Exploration and Exploitation

The behaviors that evolved in our experiments produced trails remarkably similar to those from Ediacaran and early Cambrian fossils. Reactive meandering shows close resemblance to “two-dimensional avoidance traces” in Carbone and Narbonne, 2014 [157], while pattern cycling and plowing resemble “scribbling” and “meandering” in Seilacher 1967 and 2007, who also documents different types of spiraling behavior [161], [166]. Our organism’s spiraling behavior was particularly interesting because it evolved in a multi-patch environment and it spiraled inwards. Inwards spiraling is thought to be more efficient than outwards spiraling, as it allows the organism to fully exploit an area by first establishing its limits. It also allows the organism to estimate how much nutrient is left in the patch based on how tight the coiling is becoming [164]. In fact, our spiraling organism did not fully exploit its patches. Before it reached the center of a spiral, it would leave in search of another patch.
The organism balanced exploration and exploitation in a manner reminiscent of Charnov’s Marginal Value Theorem (MVT) [179], where the optimal time an organism remains on a food patch before leaving for another one is a function of the time already spent and the time required to get to another patch. Moreover, the organism relied on memory to decide when to leave a patch. Behaviors based on memory use are considered more advanced than hardwired ones [106]. What is striking is that this more complex cognitive function evolved in the context of balancing exploration and exploitation instead of maximizing grazing coverage. The efficiency of these early animals’ grazing patterns has been the focus of previous discussions about their cognitive capacity and whether their nervous systems became more complex over time [157]. Although there is no straightforward correlation between the complexity of the grazing behavior and the complexity of the nervous system [152], there is a qualitative difference between a cognitive system that performs hardwired behaviors and one that uses memory and performs complex calculations to decide when to switch patches. Assuming that Ediacaran microbial mats were naturally distributed in patches, it may not have been so much the need to optimize grazing patterns that drove the evolution of cognition. Instead, it may have been another challenge that mat miners faced: deciding when to leave a patch and how to find the next one.

Chapter 4: Beyond Associative Learning, the Early Evolution of Configural Learning

Authors: Anselmo C. Pontes, Andrew Mitchell, Charles Ofria, and Fred C. Dyer

4.1 Introduction

4.1.1 Background and Motivation

Learning is classified in order of presumed complexity, which may also be the order in which it evolves [180], [181]. The simplest learning forms, habituation and sensitization, are known as nonassociative learning [180].
Unambiguous associations between pairs of elements of a set of stimuli and responses, such as in Pavlovian and instrumental conditioning, are known as elemental learning [181], [182]. Associations involving configurations of stimuli that interact in space and time, creating ambiguity that can only be resolved in context, such as in discrimination and rule learning, are known as configural learning (or non-elemental learning) [180]–[183]. In Chapter 2, we confirmed the hypothesis that cognitive abilities, such as learning, are modular and evolve in a stereotypical sequence from simple to more complex by building upon previous competences [57], [94], [96], [184]–[186]. We decided to extend this experiment beyond instrumental conditioning and investigate whether we could evolve more complex forms of learning as well as other cognitive abilities.

4.1.2 Our Experiments

We began by adding a new type of cue to our Avida environment from Chapter 2. This modification allowed us to create cue combinations with different meanings. We expected that this more complex environment would foster the evolution of a form of configural learning in which the organism was required not only to learn new cues but to respond differently to them depending on previous experience with a separate cue. We also expected that if configural learning evolved, it would recapitulate many of the same steps in the evolution of instrumental conditioning from Chapter 2. In addition, we were interested in determining how the complexity of the behavioral control algorithms that evolved in this study would compare with those from Chapter 2. Although learning abilities do not always correlate with the complexity of an animal’s nervous system [180], [181], there is a trend of increasing ability when going from nerve nets to centralized nervous systems, and according to brain size among closely related species.
For example, jellyfish (which have on the order of 10^4 neurons, organized in nerve nets) are capable of habituation and sensitization but not associative learning [180], [187]–[189], and fruit flies (which have on the order of 10^5 neurons in their brain) are capable of learning associations between cues presented simultaneously but not associations that depend on previous experiences, such as abstract concepts [180], [190]. Honeybees (which have on the order of 10^6 neurons in their brain), however, are capable of all these, in addition to other forms of learning previously thought exclusive to vertebrates with much larger brains, such as learning by observation [180], [181], [191]–[193].

4.1.3 Our Findings

We succeeded in evolving organisms capable of solving the configural learning task; however, they were exceedingly rare. Out of 2,400 replicate populations, only two evolved organisms with this capability. We singled out one of these organisms and analyzed its evolutionary history. As in the previous study, evolution followed a discrete sequence of behavioral stages, starting with moving and sensing, followed by reflexive trail navigation using error recovery, then associative learning, and finally, configural learning. When comparing the organisms that evolved configural learning in this study with those that evolved instrumental conditioning in Chapter 2, we found that although the lengths of their behavioral control programs were similar, the complexity of the algorithm responsible for the configural learning behavior was considerably higher. The rare evolution of these more complex behavioral control algorithms may indicate a technical barrier that limits the cognitive capacity of the organisms that can be evolved with the current version of Avida.
4.2 Methods

We continued using the Avida platform and the behavioral task consisting of navigating a trail of nutrients in a virtual arena described in Chapter 2, where the nutrients could cue the organism on the direction to follow if it evolved the ability to sense and interpret the cues correctly. For this experiment, we used a new type of cue, called “sharp turn”, in addition to the four used in Chapter 2 (forward, left turn, right turn, and empty location). A sharp turn cue immediately preceding a left or right turn cue indicated that the correct action was to turn 90 degrees instead of 45 degrees. The sharp turn, empty location, and forward cues were represented by integer values that remained constant throughout evolution (1, -1, and 0, respectively). While the right and left turn cues were also represented by integers, they were randomly drawn from the interval between 2 and 101 (inclusive) every time an organism was placed in the arena and remained constant only during the organism’s lifetime. For an organism to perform optimally in this environment, it had to associate at least one of the two turn cues with the correct meaning, identifying the remaining cue by exclusion. Additionally, the organism had to discriminate when to turn 90 or 45 degrees based on the presence or absence of the sharp turn cue immediately before a turn cue. We performed two experiments, each with a different environment consisting of four arenas with unique trail configurations. Both environments were based on the nutrient-cued environment from Chapter 2. In the first experiment, the direction of each of the first two turns, although random, could be predicted by counting the number of forward cues that preceded the first turn. In the second experiment, the direction of the first two turns was also random, but only the direction of the second turn could be predicted by counting the number of forward cues preceding the first turn.
In addition, the trails in both environments started with 45-degree turns, and the first appearance of a 90-degree turn was at approximately the 25% mark of each trail (fig. 4.1). This exclusion of 90-degree turns from the first portion of the trail was meant to simplify the early stages of evolution and reduce the number of cues an organism had to respond to before it could navigate far into the trail. The 25% mark was chosen based on the results of preliminary experiments, and it afforded a balance between facilitating the evolution of simpler behavioral building blocks and maintaining a strong selective pressure for the new cues to be identified and acted upon.

Figure 4.1: Environment with four unique arenas used in the second experiment. At the beginning of each organism’s life cycle, we placed it on a nutrient at the start of the trail in a randomly selected arena, facing the next nutrient. The direction of the first two turns in any trail was random. However, the direction of the second turn could be predicted from the number of nutrients preceding the first turn. The first 90-degree turn only appeared at approximately the 25% mark of each trail.

In both experiments, we used the same Avida instruction set as in Chapter 2, which included the move back instruction (Appendix B). All other configurations were the same as in Chapter 2 (Appendix B), such as maximum population size (3,600), arena size (25 x 25), and total duration of each experiment (250,000 updates). We evolved 600 population replicates in the first experiment and 1,800 in the second. Each population was seeded with the same naive organism of Chapter 2, which lacked any instruction for behavioral control other than the one necessary for reproduction. We also performed a full lineage study of the final predominant organism (most abundant genotype) of the population with the highest AMTQ (Chapter 2) from the second experiment.
This study consisted of analyzing the performance of each of the organism’s ancestors and characterizing the main behavioral transitions along the lineage. Finally, we measured the length and complexity of the algorithm of the predominant organism from Experiment 2. We used the cyclomatic complexity measure [194] to determine the complexity of each algorithm. This measure is based on the number of branching points and alternative paths of execution in an algorithm. It requires tracing the execution of the organism's program, flowcharting its algorithm, converting it into a graph, and evaluating the number of edges, nodes, and connected components.

4.3 Results

In our first experiment, we evolved 600 population replicates and found only one where the predominant organism was able to perform the task. This organism learned by imprinting; that is, it used the number of nutrients at the start of the trail to disambiguate the direction of the first turn and associate one of the turn cues with the correct direction. The organism was also able to perform the configural learning task and turn 45 or 90 degrees depending on the presence of the sharp turn cue. However, its behavior did not generalize to all trail configurations. When placed in a trail where the initial segment did not conform with the pattern it had evolved in, the organism was not able to make the association, instead leaving the trail and ceasing movement. For the second experiment, we hypothesized that if we made the start of the trail harder to predict, we would foster the evolution of a generalizable learning behavior, similar to the generalizable imprinting described in Chapter 2, in which the organisms evolved to learn the association by making a mistake, going off the trail, and recovering. Therefore, we used an environment where the first turn could not be predicted from the number of nutrients preceding it, although the second turn could.
We evolved 1,800 population replicates until we found one where the final predominant organism was able to perform the configural learning task in this more challenging environment. This organism learned by making a mistake at a turn cue, stepping off the trail, and using error recovery to return to the trail and turn in the correct direction (fig. 4.2). In subsequent tests, this behavior proved to generalize across all alternate trail configurations and, therefore, represents evidence in support of our hypothesis. Remarkably, when we tested this organism on trails where the turn cues were changed or swapped midway, the organism was able to relearn and reversal learn, even though its ancestors had not experienced these conditions during evolution. When tested in an environment where the first turn was a 90-degree one, the organism was still able to learn the association even though the environment in which it evolved always started with 45-degree turns.

Figure 4.2: Path of an organism on a nutrient trail demonstrating configural learning. In this trail, turn cues have different meanings depending on the context (i.e., whether they are preceded by a sharp turn cue or not). The organism makes a mistake on the second turn, stepping off the trail, but recovers and subsequently associates the meaning of the cue with the correct direction. Afterwards, it is able to extrapolate the learned cue to different contexts.

We flowcharted the organism’s algorithm (Appendix B) and calculated its algorithmic (cyclomatic) complexity, which was more than twice the cyclomatic complexity we found among the organisms that evolved general relearning in Chapter 2 (13 vs. 6). We also measured the organism's length, which was 84 instructions, of which only 78 were executed. This is in the same range as the organisms that evolved general relearning in Chapter 2, which were between 55 and 136 instructions long, of which between 53 and 110 instructions were executed.
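The complexity values above follow McCabe's formula V(G) = E − N + 2P (edges, nodes, connected components of the control-flow graph). A minimal sketch of the computation, using a hypothetical control-flow graph rather than the evolved organism's actual flowchart:

```python
def cyclomatic_complexity(graph: dict, components: int = 1) -> int:
    """McCabe's V(G) = E - N + 2P for a control-flow graph given as an
    adjacency dict mapping each node to the nodes it can transfer to."""
    num_nodes = len(graph)
    num_edges = sum(len(successors) for successors in graph.values())
    return num_edges - num_nodes + 2 * components

# A flowchart with a single two-way branch; one decision point yields
# a cyclomatic complexity of 2 (5 nodes, 5 edges, 1 component).
flow = {
    "start": ["branch"],
    "branch": ["turn_left", "turn_right"],  # decision point
    "turn_left": ["end"],
    "turn_right": ["end"],
    "end": [],
}
v = cyclomatic_complexity(flow)  # -> 2
```

Each additional independent decision point adds one to V(G), which is why the configural learning algorithm's value of 13 implies substantially more branching than the relearning algorithms' value of 6.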
Finally, we performed a lineage study on the organism’s ancestral line. We found that the ancestral behaviors evolved in succession, following the order that we predicted: first, moving and sensing evolved, followed by an error recovery behavior that allowed the organism to navigate most of the trail with frequent missteps (fig. 4.3). This reflexive behavior was followed by the evolution of associative learning, where the organism was able to associate the turn cues with the correct directions and navigate 45-degree turns, but was not able to extrapolate the learned association to the 90-degree turns, and would navigate these by error recovery (fig. 4.3). Only after this last behavior was in place did configural learning evolve (fig. 4.2).

Figure 4.3: At left, path of an ancestral organism that navigated by error recovery. At right, path of an ancestral organism that was capable of associative learning but not configural learning. The organism learns the cue association in the second turn and uses it to navigate all 45-degree turns. However, it is not capable of extrapolating the association for the 90-degree turns and uses error recovery instead.

4.4 Discussion

4.4.1 Extra Cognitive Abilities May Contribute to Adaptation

Confirming our expectation, configural learning evolved as the last step in a sequence of increasingly complex behaviors, recapitulating the evolution of instrumental conditioning from Chapter 2. We were surprised, however, by the evolution of certain additional cognitive abilities that were not directly selected for in our environments, namely relearning, reversal learning, and learning cue associations at 90-degree turns. In Chapter 2, relearning and reversal learning evolved only in environments that contained cue reversals along the trails.
According to the theory of sensitive periods and other constraints on learning [84], [89], [147]–[149], [195], [196], learning is costly, and it is optimized by evolution based on the needs of an organism in its environment. One form of optimization is to restrict the periods in an animal's life when it is most capable of learning (sensitive periods). For example, in Chapter 2, some ancestral organisms that were capable of relearning and that evolved in an environment without cue reversals eventually gave rise to organisms that could only imprint once (Appendix B, Section B.6). It is possible that, given enough time, evolution would have continued to shape our organism's learning ability, optimizing it for this particular environment. Nevertheless, the possibility remains that as more complex cognitive abilities evolve, they may carry extra features that, at least initially, increase the generality of the organism's intelligence, which can be beneficial in adapting to different environments.

4.4.2 We Are Approaching a Complexity Barrier

Complex cognitive abilities evolve by building upon simpler ones. Therefore, the more complex the ability, the longer it should take to evolve anew and the rarer it should be among replicate populations. In our system, however, it seems that rarity increases exponentially with both behavioral and algorithmic complexity. In Chapter 2, where the algorithmic complexity of both error recovery and relearning was 6, error recovery evolved in 35% of the replicate populations, imprinting evolved in 9%, and relearning evolved in only 2%. In the configural learning experiments, where the algorithmic complexity of the generalizable version of configural learning was 13, the imprinting version evolved in 0.17% of the replicate populations while the generalizable version evolved in 0.06% and required up to 1800 replicates for a single population to succeed at the task.
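If each replicate population succeeds independently with probability p, the number of replicates needed until the first success follows a geometric distribution with mean 1/p, which is consistent with the counts reported above:

```python
def expected_replicates(success_rate):
    """Mean of a geometric distribution: expected trials until the first success."""
    return 1.0 / success_rate

# Success rates reported above, as fractions of replicate populations.
for label, p in [("relearning (Chapter 2)", 0.02),
                 ("imprinting configural", 0.0017),
                 ("generalizable configural", 0.0006)]:
    print(f"{label}: ~{expected_replicates(p):.0f} replicates on average")
# A 0.06% success rate implies ~1667 replicates on average, in line with
# the up-to-1800 replicates this experiment required.
```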
From a technical point of view, it becomes costly to perform digital evolution experiments using such a large number of replicates. It is possible that if we had designed our environment differently, or if we had seeded the populations with organisms already capable of associative learning, configural learning would have evolved more readily. However, seeding an experiment with pre-evolved organisms likely restricts the open-endedness of the behaviors that will evolve and may bias the order in which novel traits appear. In addition, if rarity continues to increase exponentially with behavioral complexity, even seeding populations with pre-evolved organisms will no longer be sufficient to make an experiment practical.

It seems that we reached a complexity barrier with Avida, a problem that other researchers have encountered with different systems [197], [198]. Although Avida has proven to be a versatile experimental tool in digital evolution, making invaluable contributions to our understanding of Darwinian evolution and training a generation of digital evolution scientists, Avida's current genetic representation is inadequate for evolving organisms beyond a certain level of complexity. This limitation stems mainly from Avida's difficulty in evolving multiple coexisting loops, reusable modules such as subroutines and functions, and memory usage beyond a few registers. Although some of these issues may be mitigated by modifying the instruction set and the virtual hardware, this would likely only raise the complexity barrier, not remove it. What is needed is a new genetic representation that facilitates the evolution of complexity.

Chapter 5: Evolution of Allosteric Regulation in Cyanobacteria

5.1 Introduction

In the preceding chapters, we investigated some of the earliest and most important steps in the evolution of natural cognition: navigation, nutrient harvesting, and associative learning.
As we attempted to increase the complexity of the cognitive tasks, the evolution of organisms capable of performing those tasks took exponentially longer. Such complexity barriers frequently limit evolutionary potential in artificial systems and are a recurrent problem with present digital evolution methods [197], [198]. However, this does not appear to be a problem for biology, which has evolved cognitive systems of incredible complexity and intelligence. Cognitive systems in living organisms also have unique properties, which may be the reason for their evolvability and scalability. For example, regulatory networks in cells have extraordinarily flexible topologies with substantial modularity and redundancy [59], [199], [200]. In addition, cells employ multiple mechanisms of regulation [201], and seem to exploit noise rather than attempting to minimize it [58], [202], [203]. It appears that, in the process of creating life, nature created a language to implement evolvable control systems of arbitrary complexity [204]–[206].

Biology also has the advantage of scale over artificial systems, which we strive to mitigate by deploying our understanding of evolution, both natural and digital, to optimize the process and reduce the necessary time scales and population sizes. Such techniques include using artificially high mutation rates, multi-island environments, and carefully crafted selective pressures. However, biology's advantage in scale does not diminish the importance of natural genetic representations. In fact, the vast time scales of natural evolution provide the ultimate long-term test of evolvability by implicitly selecting for organisms with underlying genetic representations that are capable of producing beneficial complex traits.
Therefore, in order for artificial evolutionary systems to achieve similar levels of adaptive complexity, we must also take inspiration from the aspects of natural systems that allowed them to express complex phenotypes so successfully.

5.1.1 A New Digital Evolution Platform

Inspired by biology, I carefully translated the cell's chemical regulatory language into a digital format and implemented it in a new evolution platform named Elfa (Evolutionary Lab for Flexible Agents). I attempted to preserve the essential features of cell regulation that make it a powerful, evolvable control-systems language, including much of the stochasticity, while taking advantage of the digital format to optimize other mechanisms, thus avoiding the steep costs of chemical simulations. In Elfa, the evolving agents are equivalent to cells, and the genetic representation is based on a binary string chromosome, with genes capped by basal promoters and enhancers that bind transcription factors, similar to Banzhaf [207], [208]. However, Elfa's representation incorporates additional biological mechanisms that allow more complex networks to evolve, such as ligand binding, agonism, proteases, and assembly-program gene products. In order to demonstrate the platform and test its capabilities, I performed an experiment where a digital equivalent of cyanobacteria can evolve allosteric regulation anew and take advantage of the daily light cycle.

5.1.2 Evolution of Allosteric Regulation Experiment

As far as we know, oxygenic photosynthesis evolved only once, in the ancestors of modern cyanobacteria [209]. All green algae and plants inherited this capacity from an ancestor that endosymbiotically acquired a cyanobacterium. In this experiment, I investigated the evolution of gene regulation in the equivalent of an early prokaryotic oxygenic photosynthetic cell.
This experiment consisted of exposing populations of cells, which expressed their genes at a low, constant rate and had no regulatory ability, to an environment where the light irradiance followed a daily cycle. In addition, each cell contained chemical signals whose concentrations correlated with the light intensity in the environment, as well as with the cell's level of stress and its energy reserves. My hypothesis was that over the generations these cells would evolve higher photosynthetic output and reproduce faster, maximizing the use of the available light. Cells could evolve higher fitness via mutations that adjusted the rate at which their genes were expressed, without adding any dynamic regulation mechanism, or they could evolve regulatory mechanisms that allowed them to time the production of proteins, as well as growth and cell division, according to the environmental light cycle. Moreover, the experimental setup allowed different regulatory mechanisms to evolve to solve the same problem. Some would involve direct regulation of one or more of the original proteins via allostery, while others would involve the evolution of new ligand-regulated transcription factors.

5.1.3 Findings

All populations evolved higher fitness over time by adjusting the expression levels of their genes. Some populations also evolved allosteric regulation of the original proteins, but this was rare. Regulated cells took advantage of the light cycle by concentrating their growth and reproduction during the light period and reducing protein expression at night. This strategy allowed them to harvest much more energy from the environment and reach much higher fitness than the fittest cells that did not use regulation.

5.2 Methods

5.2.1 Cell and Population Organization in Elfa

In Elfa, virtual cells have a prototypical shape, with a given volume of cytosol bounded by a semi-permeable membrane. The cell's volume and surface area change as the cell grows and divides.
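The relationship between volume and surface area has consequences that recur later in the results: for a roughly spherical cell (a simplifying assumption; Elfa's exact geometry is not specified here), surface area scales as r^2 while volume scales as r^3, so dividing one cell into two of half the volume raises the surface-area-to-volume ratio by a factor of 2^(1/3), about 1.26:

```python
import math

def sa_to_volume(volume):
    """Surface-area-to-volume ratio of a sphere with the given volume."""
    r = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return (4.0 * math.pi * r * r) / volume

mother = sa_to_volume(1.0)
daughter = sa_to_volume(0.5)   # each daughter inherits half the volume
print(daughter / mother)       # -> 2**(1/3), about 1.26
```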
The cell's cytosol contains a single circular chromosome, proteins, ligands, and other molecules, all of which can interact. The cell's chromosome is made of binary digits, organized into 32-bit segments that correspond to integer numbers. Combinations of 32-bit segments, within a certain range of values, can encode operons, each containing one or more genes. Operons can have one or more basal promoters, as well as one or more upstream enhancers. Segments with integer values outside of the coding range are interpreted as non-coding sequences (if they are not transcribed) or linker domains (if they are transcribed). During replication, the cell's chromosome can suffer mutations. These can be simple point substitutions (bit flips), as well as segment duplications, transpositions, and deletions. Mutations in an operon's basal promoter, for example, could alter the promoter's affinity for the RNA polymerase and change the expression level of that operon's genes. Mutations could also change the properties of the proteins encoded in genes, for example making them sensitive to a ligand. In addition, mutations could cause genes to duplicate or become pseudogenes. In sum, much like in biology, mutations can shape the architecture of chromosomes in Elfa, which encode not only the genes but also the regulatory regions that determine how those genes are expressed.

Cells are grouped in a population that is well mixed and capped in size. When a cell reaches a threshold amount of cell growth factor and possesses sufficient energy reserves (details below), it replicates, dividing into two daughter cells. Once a population reaches the maximum size, the birth of any additional cell causes the removal of another cell at random. This constraint creates a strong selective pressure for fast replication.
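A minimal sketch of such a segmented binary chromosome, with point mutations (bit flips) and segment duplication (function names are illustrative, not Elfa's actual implementation; the 1/50 per-bit rate matches the point mutation rate used in the experiment):

```python
import random

def point_mutate(chromosome, rate, rng):
    """Flip individual bits of each 32-bit segment with the given per-bit rate."""
    out = []
    for segment in chromosome:
        for bit in range(32):
            if rng.random() < rate:
                segment ^= 1 << bit  # point substitution: a single bit flip
        out.append(segment)
    return out

def duplicate_segment(chromosome, rng):
    """Copy a random segment and insert the copy next to the original."""
    i = rng.randrange(len(chromosome))
    return chromosome[:i + 1] + [chromosome[i]] + chromosome[i + 1:]

rng = random.Random(42)
chromosome = [0b1010, 0xFFFF0000, 7]     # three 32-bit segments
child = point_mutate(duplicate_segment(chromosome, rng), 1 / 50, rng)
print(len(child))  # -> 4 (one segment was duplicated)
```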
5.2.2 Ancestral Cells

The ancestral genotype that seeded all experimental populations possessed only three genes, transcribed at different, but constant, rates proportional to their basal promoter affinities. These genes encoded a photosynthetic complex, an RNA polymerase, and a growth factor, described below.

The photosynthetic complex was responsible for converting CO2, H2O, and light into sugar that the cell used for energy. The sugar conversion rate in a cell was proportional to the abundance of the photosynthetic complex and the light irradiance at a given time, and limited by the amount of CO2 that diffused into the cell. If the cell ran out of CO2 in the presence of light, this complex would produce a harmful compound, reactive oxygen species (ROS), instead of sugar. At high concentrations, ROS would slow protein translation and damage the cell's components, as well as cause somatic mutations.

The RNA polymerase (RNAp) was responsible for transcribing the cell's genes. The rate at which a gene was transcribed varied with the abundance of RNAp.

The cell growth factor was responsible for causing the cell to grow, and also for signaling the initiation of cell replication. When the abundance of this factor reached a certain threshold, the cell would attempt to replicate its genome and divide into two daughter cells.

In addition to the three gene products described above, this representation allowed the encoding of proteins that function as transcription factors. Transcription factors bound to enhancers and either promoted or repressed transcription of the downstream operon. Proteins could evolve a ligand-sensitive domain that temporarily bound ligands according to their affinity sequence and affected the function of the protein, either increasing or decreasing its activity.
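The photosynthesis rule above can be sketched as a production rate capped by CO2 availability, with ROS emitted when light-driven demand exceeds the CO2 supply. This is a simplified stand-in for Elfa's actual kinetics; the function and parameter names are illustrative:

```python
def photosynthesis_step(complex_abundance, irradiance, co2_available, k=1.0):
    """One time step: return (sugar_made, ros_made, co2_consumed).

    Demand is proportional to photosynthetic complex abundance and
    irradiance; sugar production is capped by the CO2 that diffused in,
    and the unmet, light-driven remainder is emitted as ROS.
    """
    demand = k * complex_abundance * irradiance
    sugar = min(demand, co2_available)
    ros = demand - sugar          # excess light with no CO2 -> ROS
    return sugar, ros, sugar      # one unit of CO2 per unit of sugar

print(photosynthesis_step(10, 1.0, co2_available=4.0))  # -> (4.0, 6.0, 4.0)
```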
5.2.3 Ligands in Elfa

Elfa provides each cell with signals in the form of ligands, which convey information about the conditions in the environment and within the cell. The strength of a signal is proportional to the corresponding ligand's concentration. However, the ancestral genotype was insensitive to these signals. In order to sense them, the cells first had to evolve proteins with ligand-sensitive domains with affinity for the specific ligands. These were the five types of ligands used in this experiment:

The reactive oxygen species (ROS) ligand signaled the amount of this harmful compound. The ligand was produced as the cell ran out of CO2 during photosynthesis. Higher concentrations of ROS increased the probability of somatic mutations, protein damage, translation slowdown, cell aging, and loss of fat reserves.

The cyclic AMP (cAMP) ligand signaled starvation stress. This ligand was produced as the cell ran out of sugar reserves. At low concentrations, it indicated that fat reserves were being converted into sugar. However, at high concentrations, it indicated that the fat reserves were running out. Higher concentrations of cAMP increased the chance of somatic mutations, protein damage, and translation slowdown.

The irradiance level ligand indicated the amount of light reaching the cell. Its concentration followed a 24-hour cycle of night and day, where underwater daylight irradiance was modeled according to the literature [210].

The sugar reserves ligand indicated the amount of sugar immediately available as energy for the cell. It was produced as the cell performed photosynthesis and accumulated sugar.

The fat reserves ligand indicated the amount of energy in long-term storage. It was produced when the sugar reserves reached a maximum threshold and any excess sugar was converted into fat.

All proteins had a half-life, which determined the average time that they remained in the cell before being recycled.
Cells, on the other hand, had no age limit but accumulated harmful protein aggregates (plaque) over time [211]. Plaque formed when proteins were damaged by excessive ROS, could not be recycled, and accumulated in aggregates, which in turn could cause further damage to healthy proteins (cell 'aging'). When a cell replicated, all of its plaque was segregated into one of the daughter cells, while the other was born 'rejuvenated' [212]. In contrast, at cell division, all proteins and energy reserves of the mother cell were divided between the daughter cells.

5.2.4 Cellular Costs and Constraints

There were a number of costs and constraints on the cell's actions. Cell replication, once initiated, could not be stopped, and lasted an amount of time proportional to the length of the cell's chromosome. During this period, the energetic cost of replication was subtracted from the cell's energy reserves (sugar and fat). A cell that initiated replication and ran out of energy reserves died.

Transcribing genes and translating proteins also entailed energy costs and time delays, proportional to the gene's length and the protein copy number. This dynamic created a selective pressure for efficient genetic encoding and regulation. By imposing costs and constraints on the cell while providing sensory feedback signals, we avoided adding a number of artificial limitations (such as cell age or genome size) or rates (e.g., somatic mutation rate or transcription rate). In addition, we provided more opportunity for the evolution of complex regulatory networks.

5.2.5 Experimental Parameters

For this experiment, I evolved 1,012 population replicates, each capped at 200 cells, for a period of 730 virtual days. Cells experienced the light environment corresponding to 4.0 meters of ocean water at 0.0 degrees latitude. All simulations started with light levels equivalent to those the cells would experience at 6:00 am on January 1st.
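A simplified stand-in for such a light environment combines a diurnal irradiance curve with exponential attenuation through the water column (the Beer–Lambert law). The attenuation coefficient, surface irradiance, and fixed 6:00-to-18:00 photoperiod below are illustrative, not the values from the literature model [210]:

```python
import math

def irradiance(hour, depth_m, surface_peak=1000.0, k_attenuation=0.15):
    """Underwater irradiance: a half-sine daylight curve from 6:00 to 18:00,
    attenuated exponentially with depth (Beer-Lambert law)."""
    if not 6.0 <= hour <= 18.0:
        return 0.0                       # night
    surface = surface_peak * math.sin(math.pi * (hour - 6.0) / 12.0)
    return surface * math.exp(-k_attenuation * depth_m)

print(round(irradiance(12.0, 4.0)))  # noon peak at 4 m depth
print(irradiance(2.0, 4.0))          # -> 0.0 (night)
```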
Mutation rates were set at 1/50 for point mutations, 1/1,200 for segment duplication, 1/1,200 for segment deletion, and 1/1,500 for segment transposition.

5.2.6 Analyses

At the end of the experiment, I sorted the populations according to whether they had fixed allosteric regulation, transcription factors, or neither. Elfa provides population reports that make this classification straightforward. In addition, I collected samples from each of the final populations that evolved allosteric regulation and mapped their regulatory networks. This mapping, however, is not automated. It requires analyzing individual cells' genes to identify molecular affinities and analyzing the cells' internal contents and dynamics under different conditions.

In order to measure the fitness of a given population, I collected sample cells and incubated them for a period of 30 simulated days without mutations to test their viability. Due to the probability of somatic mutations under stress conditions, populations typically contained a number of damaged and unviable cells. This issue was exacerbated in the populations that did not evolve regulation, since these cells were insensitive to the stress signals. Therefore, measuring the fitness of a given population was a computationally intensive process. Once I had measured the fitness of a set of populations and analyzed the regulatory networks of their representative genotypes, I selected a few cells that exemplified the different traits that evolved and plotted their daily cycles of growth and division, noting any salient dynamics. These plots were based on reports that Elfa can produce for individual cells or lineages.

5.3 Results

Out of the 1,012 replicate populations we evolved, only 18 fixed allosteric regulation. However, 14 of these regulated populations reached much higher fitness levels than the fittest non-regulated population. No population fixed allosteric regulation of transcription factors.
[Footnote: The unusual number of replicates was due to the number of CPU cores and threads that could run efficiently on the computer I used for this experiment.]

However, two populations fixed unregulated transcription factors as a mechanism to increase their gene expression levels. The cells that evolved regulation could be divided into three groups according to which of their proteins evolved allostery (fig. 5.1). The first group evolved growth factor allostery and consisted of four populations. The second group evolved RNA polymerase (RNAp) allostery and consisted of ten populations. The final group evolved both growth factor and RNAp allostery and consisted of four populations. Note that allostery of the photosynthetic complex was not a viable mechanism of regulation, due to its large copy number, which would easily quench most types of ligands.

On average, the fittest group was the one where both the growth factor and the RNAp were regulated (average = 5.5 generations per day); the second fittest was the one with regulated RNAp (average = 4.5 generations per day). The regulated growth factor group had an average fitness of 3.1 generations per day.

Figure 5.1: [Chart: type of regulation (G.F., RNAp, G.F. + RNAp) vs. fitness in generations per day.] The 18 populations that evolved allosteric regulation could be categorized into three groups according to which of their proteins evolved allosteric regulation. The first group (four populations) evolved growth factor regulation. The second group (ten populations) evolved RNA polymerase (RNAp) regulation. The last group (four populations) evolved both growth factor and RNAp regulation. All final populations had higher fitness than the ancestral one (green dotted line at 0.9 generations per day). In addition, most populations with regulation (14/18) had higher fitness than the best non-regulated final population (orange dashed line at 3.4 generations per day).
The ligand most commonly used for regulation was ROS. ROS is produced when cells run out of CO2 during photosynthesis, due to the slow diffusion of CO2 from the environment into the cell. In large quantities, ROS can damage proteins and energy reserves, slow down translation, and cause mutations. Therefore, cells that use ROS for regulation are quick to respond to even small amounts of this ligand by increasing their growth rate in order to replicate faster. Cell division serves to alleviate the shortage of CO2, since the newborn daughter cells have a higher surface-area-to-volume ratio than the mother cell, thus benefiting from increased CO2 diffusion from the environment.

In figures 5.2 to 5.5, I contrast the ancestral cell that seeded the experiment (fig. 5.2) with cells that evolved allostery from each of the three groups described earlier. Note that when RNAp is regulated, the expression of all genes is affected. In addition, it is worth noting that not all ligand affinities contribute to the fitness of the cell. Some are simply instances of cross-talk, or interference.

Figure 5.2: Gene regulatory network of the ancestral cell, with no regulation. At the top are the five ligands available to the cell: ROS, fat reserves (Res), sugar reserves (Sug), irradiance (Light), and cAMP. The ancestral cell does not sense the concentrations of any of these ligands.

Figure 5.3: Gene regulatory network of one of the cells with a regulated growth factor. Here, the growth factor could bind three different ligands, with different affinities and different effects. The fat reserves ligand (Res) would bind weakly and had an agonistic effect. The irradiance ligand (Light) would also bind weakly but would have a reverse agonistic effect. Finally, the cAMP ligand would bind strongly and also have a strong reverse agonistic effect.
In practice, high fat reserves would promote a slight increase in growth factor production, while both the presence of light (useful for building more reserves) and cAMP (indicating starvation) reduced growth factor production and thus delayed replication.

Figure 5.4: Gene regulatory network of one of the cells with regulated RNAp. Here, the RNAp could bind two different ligands, with different affinities and different effects. The ROS ligand would bind strongly and also cause a strong agonistic effect, while the sugar reserves ligand (Sug) would bind weakly and have a weak reverse agonistic effect. In practice, the presence of ROS, even in small amounts, would cause the cell to increase the expression rate of all its genes, including the growth factor, causing it to replicate faster. Replication increased the surface-area-to-volume ratio of the daughter cells, temporarily alleviating the source of stress. High sugar reserves (indicating active photosynthesis), on the other hand, would slightly reduce the expression rate of all genes, contributing to slower replication and the buildup of energy reserves.

Figure 5.5: Gene regulatory network of one of the cells with both growth factor and RNAp regulation. Here, the RNAp could bind three different ligands while the growth factor could bind two. The ROS ligand would bind strongly to both RNAp and growth factor, causing strong agonistic effects. The fat reserves ligand (Res) could also bind to both RNAp and growth factor, but it caused a weak reverse agonistic effect. Finally, the sugar reserves ligand (Sug) would bind weakly to the RNAp and have a weak reverse agonistic effect. Similar to the case in fig. 5.4, ROS would have a strong effect on gene expression by promoting the activity of RNAp, but with even more emphasis on cell growth and replication, since it also directly promoted the activity of the growth factor.
However, high fat and sugar reserves (indicating active photosynthesis) would slightly reduce the rate of protein expression and the activity of the growth factor, which favored the accumulation of energy reserves.

In figures 5.6 and 5.7, I contrast the daily cycles of two cell lineages, one capable of regulation and another that expresses its genes at a constant rate. Regulated cells are typically more resilient than non-regulated ones, since they respond to the stress signals, cAMP and ROS, thus minimizing harm. Non-regulated cells, on the other hand, are often mismatched with the environment. Due to the competition for faster replication in the fixed-size population, non-regulated cells often swing from starvation stress at night to ROS stress during the day, and suffer from high levels of mutation.

Figure 5.6: A twenty-four-hour period of one of the cell lineages with RNAp allosteric regulation. In the presence of ROS, which causes an agonistic effect on their RNAp, these cells grow faster and divide, thus increasing their surface area and the absorption of CO2. Sharp drops in cell volume indicate cell division. Note that the cells have steeper growth curves and also replicate more often during the day than at night.

Figure 5.7: A twenty-four-hour period of one of the fittest cell lineages without regulation. These cells grow and replicate at a constant rate despite the environmental light cycle and their own internal stress. As a result, they often suffer damage and mutations due to ROS accumulation when they reach their photosynthetic limit.

5.4 Discussion and Conclusions

Overall, the experiment's results supported my hypothesis that cells would evolve not only to maximize their photosynthetic output but also to take advantage of signals that conveyed information about the state of the environment.
In addition, the results revealed some unexpected dynamics, such as the use of ROS-triggered growth and cell division to relieve ROS stress, instead of simply reducing the rate of photosynthesis by limiting the expression of the photosynthetic complex.

Although the cells in the few populations that fixed allostery gained a great fitness advantage, most populations repeatedly evolved allostery without it ever fixing. In fact, most populations appeared to select against allostery. One possible explanation is that the evolution of allostery is more often disruptive, with a net detrimental effect, than purely adaptive or neutral. If this is the case, allostery can fix only in the rare cases where it does not create side effects or where compensatory mutations occur rapidly enough to overcome the harm.

More surprising is the lack of fixation of regulation based on transcription factors. It is possible that this form of regulation was slower and less efficient than the allosteric regulation of the target proteins themselves. However, even if this is the case, transcription-factor-based regulation should evolve if there is a sufficiently large number of parameters that the cells could control, since there are only so many signals to which a single protein can be sensitive.

5.4.1 Elfa as an Open-Ended Digital Evolution Platform

As mentioned earlier, a cell's regulatory mechanisms, and its underlying genetic representation, amount to a language of evolvable, open-ended control systems. In developing Elfa, I reinterpreted this language in a digital format. My initial results have demonstrated a variety of biologically realistic evolutionary outcomes, especially given the small population sizes of only 200 cells and the relatively short time frames.
While any fixation of de novo regulation is impressive under these conditions, my work going forward is not only to experiment with the platform but also to reevaluate my model in order to optimize its performance and make sure that it captures the essential features of biology.

As it stands, Elfa's model already demonstrates open-ended properties akin to biology. For example, sensors can not only evolve anew, but their gain and specificity can also be modified by evolution. In addition, evolution can adjust many cell parameters, such as size, replication rate, aging rate, and somatic mutation rate. However, the model could be made even more open-ended by giving cells additional evolvable traits, such as evolvable ligands, an evolvable RNA polymerase affinity sequence, evolvable chromosome repair machinery, and evolvable horizontal gene transfer mechanisms.

Elfa also has properties that make it a good candidate for evolving control systems for real-world applications: its agents acquire their own control laws via evolution, are robust to noise, function in simulated real time, respond to analog signals, and are capable of dynamic analog responses.

In nature, cells spent three billion years evolving increasingly effective cognitive systems, like that of the amazing choanoflagellate [213]–[215]. It then took only 600 million years for cells to evolve the human brain. If our time scale is the number of generations instead of years, the discrepancy in length between these two periods becomes even starker. Thus, I believe that the path to evolving general intelligence starts with evolving the cognition of a cell.

APPENDICES

APPENDIX A: Supplementary Material for Chapter 2

A.1 Supplementary Methods

Authors: Anselmo C. Pontes, Robert B. Mobley, Charles Ofria, Christoph Adami, and Fred C. Dyer

This text is adapted from the supplementary material of Am. Nat. 2020, Vol. 195, pp. E1–E19 [75]. © 2019 by The University of Chicago. CC BY-NC 4.0.
DOI: 10.1086/706252

We used the Avida default settings for population size (3,600 organisms), mutation rates (5% probability for insertion and deletion, and 0.75% probability of instruction change), and behavioral grid size (25 x 25, toroidal grid) [114]. We set the number of instructions that an organism had to execute before it could reproduce to 1,500, and the maximum age to 50 times the genome length. In addition, we used an instruction set based on the optimum "genetic hardware" [216], also known as "hardware 3".

Our ancestral organism used the "reproduce" instruction, which performs the full self-replication process, instead of the Avida default "copy loop", which implements self-replication as a sequence of several instructions executed in a loop. This simplification allowed us to focus our attention on the evolution of behavior, since the evolution of self-replication in Avida has already been studied extensively [217]. Therefore, the ancestral organism we used to seed the population was made of 100 instructions: 99 "nop" instructions followed by one "reproduce" instruction. The Avida instructions mentioned throughout the text are also known by their Avida mnemonics, listed in table A.1. We ran each replicate for 250 thousand updates, where an update is an Avida unit of time.

The five behavioral strategies we used to categorize the organisms' phenotypes (table 2.2) were discerned during exploratory experiments in which we surveyed a great number of behaviors in a wide range of conditions. We defined the different strategies by observing commonalities in the paths that each organism followed when exposed to different environments and analyzing the organism's algorithm.
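The ancestral genome described above can be sketched directly as a list of instruction mnemonics (a simplified stand-in for an actual Avida genome file; the text says only "nop", so the specific nop variant shown is an assumption):

```python
# Ancestral Avida organism: 99 no-operation instructions followed by a
# single "repro" instruction that performs full self-replication.
ancestor = ["nop-C"] * 99 + ["repro"]

print(len(ancestor))   # -> 100
print(ancestor[-1])    # -> repro
```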
Subsequently, when we performed the experiments described in this paper, we used these predefined categories and classified each organism by observing its path in different environments and by tracing the execution of its behavioral control program, which included monitoring the organism's internal memory. Figures A.1 to A.4 show the four environments used in experiment 1, and Figure A.5 shows the environment used in experiment 2.

Table A.1: Avida instruction mnemonic references

    Instruction name    Avida mnemonic
    Reproduce           repro
    Sense Current       sg-sense
    Rotate Right        sg-rotate-r
    Rotate Left         sg-rotate-l
    Move Ahead          sg-move
    Move Back           sg-move-b

The organisms we selected for lineage study in experiment 2 (fig. 2.5) were the 10 most sophisticated final predominant organisms capable of relearning: those capable of reversal learning any two-symbol combination while retaining the learned associations long-term and generalizing their behavior to any trail configuration.

The custom version of Avida used in this study can be found here: https://github.com/mercere99/Avida-AssociativeMemory

Figure A.1: One fixed turn environment. This environment consists of four different trails. In all of them, the first turn is to the right. The green circle indicates where the organism is placed at the start of the trail, facing the next nutrient.

Figure A.2: Two fixed turns environment. This environment consists of four different trails. In all of them, the first turn is to the left and the second turn is to the right. The green circle indicates where the organism is placed at the start of the trail, facing the next nutrient.

Figure A.3: Nutrient cued environment. This environment consists of four different trails. In each of them, the number of nutrients before the first turn is even if the first turn is to the right and odd if the first turn is to the left. The green circle indicates where the organism is placed at the start of the trail, facing the next nutrient.
Figure A.4: Random start environment. This environment consists of four different trails. In all of them, the number of nutrients before the first turn is the same (3), and each of the four possible combinations of first and second turns is represented, making the start pattern effectively random from the point of view of the organism. The green circle indicates where the organism is placed at the start of the trail, facing the next nutrient.

Figure A.5: Cue reversal environment. This environment consists of four different trails with the same start pattern as the nutrient cued environment. In each of them, the number of nutrients before the first turn is even if the first turn is to the right and odd if the first turn is to the left. The pink circle indicates the point where the turn cues are reversed. The green circle indicates where the organism is placed at the start of the trail, facing the next nutrient.

A.2 Preliminary Experiment

This experiment was identical in design to experiment 1, with one exception: the instruction set we used did not contain the move back instruction. The only sensor and effector instructions were sense current, rotate right, rotate left, and move ahead. We performed 50 evolutionary replicates for each environment, except for the nutrient cued one, where we extended the number of replicates to 150 because we had observed a high degree of variability and sought to determine whether a pattern would emerge.

A.2.1 Preliminary Experiment – Results

As in experiment 1, the evolution of learning (i.e., imprinting) was rare, but it occurred in all predictable-start environments (one fixed turn, two fixed turns, and nutrient cued), and never in the control environment (random start). The predictable-start environments also produced a wider range of navigational strategies and much higher task quality scores than the random start environment.
No replicate in the random start environment produced an organism that could complete the trail. The average maximum task quality we observed in this environment was 5% (table A.2), for an organism that used a rigid strategy of moving straight on forward cues and always turning left on a turn cue.

We analyzed over 100 of the 300 replicates across all experimental conditions and found that, just as in experiments 1 and 2, organisms from different evolutionary replicates generated a consistent set of phenotypes. Their behavioral strategies always fell into one of the easily recognizable categories described in table 2.2, with some hybrids of two or more strategies also observed.

In this experiment, error recovery evolved significantly less often in the predictable-start environments than in experiment 1, where the move back instruction was available (Fisher's exact test: one fixed turn, p = 0.0095; two fixed turns, p = 0.0125; nutrient cued, p < 0.0001). In addition, we observed a significantly lower average maximum task quality in all but the one fixed turn environment (Kruskal–Wallis test: one fixed turn, H = 3.0309, df = 1, p = 0.08169; two fixed turns, H = 6.7623, df = 1, p = 0.00931; nutrient cued, H = 19.522, df = 1, p < 0.0001; random start, H = 25.301, df = 1, p = 4.904e-07), with the largest effect in the nutrient cued environment (fig. A.6).

The error recovery strategy that evolved in this experiment used quite an elaborate routine. When an organism encountered a turn cue, it would attempt to turn to one side. If its first turning attempt led it into an empty location, it would turn around on the spot (execute four 45-degree turns) and move one step forward in order to get back on the trail, then turn in the direction opposite to the one it had tried first and move forward. The organisms that used this strategy did not learn from their errors and performed the same behavior at every turn of the trail.
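The corrective routine described above can be sketched as a small navigation function. The grid representation and function names below are simplified stand-ins for the Avida instructions, not Avida code; note that after the about-face and the step back onto the trail, the organism's heading already points along the opposite turn direction:

```python
# Simplified sketch of the evolved error recovery routine (not Avida code).
# Headings are indices into 8 directions, i.e., 45-degree units.
DIRS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def step(pos, facing):
    """Move one cell in the current heading."""
    dx, dy = DIRS[facing % 8]
    return (pos[0] + dx, pos[1] + dy)

def recover_at_turn(pos, facing, on_trail):
    """At a turn cue: try the default (right) turn; if that leads into an
    empty location, turn around on the spot (four 45-degree turns), step
    forward to get back on the trail, and continue moving -- the heading
    now points along the opposite, correct turn direction."""
    facing = (facing + 2) % 8          # default guess: rotate right twice (90 deg)
    pos = step(pos, facing)            # move ahead
    if on_trail(pos):
        return pos, facing             # the default turn was correct
    facing = (facing + 4) % 8          # wrong: turn around (4 x 45 degrees)
    pos = step(pos, facing)            # one step forward, back onto the trail
    pos = step(pos, facing)            # proceed along the opposite (left) turn
    return pos, facing
```

For example, an organism at the origin facing east, on a trail that actually turns north, first steps south off the trail, turns around, and ends up one cell north, heading north.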
However, this result demonstrated that even a basic Avida building block such as the move back behavior could evolve from an assembly of simpler actions.

Figure A.6: Distribution of Average Maximum Task Quality (AMTQ) per environment, comparing the preliminary experiment and experiment 1. Each violin plot represents the distribution of task qualities across replicates for a given environment. The preliminary experiment did not use the move back instruction (orange); experiment 1 did (blue). The difference between experiments was significant in all but the one fixed turn environment (Kruskal–Wallis test: one fixed turn, H = 3.0309, df = 1, p = 0.08169; two fixed turns, H = 6.7623, df = 1, p = 0.00931; nutrient cued, H = 19.522, df = 1, p < 0.0001; random start, H = 25.301, df = 1, p = 4.904e-07).

Table A.2: Preliminary experiment summary of results. Performance and strategies of the organisms with AMTQ equal to or higher than 25%, organized by environment*.

                                   Predictable-start environments                          Control environment
                                   One fixed turn      Two fixed turns   Nutrient cued      Random start
    Replicates in which organisms finished the trail
      Proportion of replicates     13/50               7/50              4/150              0/50
      Strategies evolved           Imprinting (9)      Imprinting        Imprinting (3)     N/A
      (no. of replicates)          Error recovery (4)                    Error recovery (1)
      Highest AMTQ observed        99%                 98%               98%                N/A
      (strategy)                   (Imprinting)        (Imprinting)      (Imprinting)
    Replicates in which organisms did not finish the trail (AMTQ ≥ 25%)
      Proportion of replicates     9/50                7/50              17/150             0/50
      Strategies evolved           Imprinting          Imprinting        Error recovery     N/A
                                   Error recovery      Error recovery    Path predicting
                                                                         Searching
                                                                         Imprinting
                                                                         Hybrids of two or
                                                                         more strategies

*Note: We examined only a sample of organisms that had less than 25% AMTQ. Those that were examined displayed previously described strategies and did not travel far on the trail.

In the course of our investigations, we came across intriguing possibilities that we plan to explore in future studies.
For example, in this preliminary experiment, we observed the greatest variety of corrective actions that led the organisms back to the trail, since we were not using the move back instruction. Most corrective actions led organisms to skip nutrients or take extra steps through empty locations, resulting in a lower task quality. However, we often observed these less efficient forms of error recovery in hybrid strategies with searching. This finding led us to suspect that error recovery could have evolved as an optimization of a more stereotypical searching strategy. Since error recovery is one of the most successful strategies we observed, it would be interesting to analyze the lineages where it evolved in this study, in order to reconstruct its evolutionary path in greater detail.

A.2.2 Preliminary Experiment – Bestiary

In the one fixed turn environment, we observed other strategies and variations that were not among the top performers but are still worth mentioning. Some organisms evolved different path predicting strategies. Others evolved different methods of error recovery. In one, the organism did not possess a single rotate left instruction in its genome and performed all turns by rotating right a number of times. In another, if the organism made a mistake at a turn and stepped off the trail, it would bypass the turn cue and try to find the first nutrient after it. This behavior was reminiscent of a search procedure.

In the nutrient cued environment, we also observed some noteworthy strategies among the organisms that could not complete the trail. Most were hybrid strategies combining two or more behaviors (except relearning). We found two organisms that used a flexible path predicting strategy based on a mnemonic rule and limited memory, reaching up to 29% of the maximum task quality.
This strategy combined some genetic encoding of trail patterns with temporary memory storage of cue values, which allowed the organism to distinguish a turn cue from the last one it had sensed and either repeat the turn in the same direction or reverse it.

Another two organisms used a hybrid strategy combining path predicting and error recovery. They navigated the first portion of the trail based on genetically encoded trail patterns, and as soon as the pattern no longer applied and they stepped off the trail, they would use error recovery to get back on the trail and continue navigating, reaching up to 31% task quality.

A different hybrid strategy we observed in three organisms also started with path predicting. However, when their encoded pattern no longer applied and they stepped off the trail, they would continue to search for another portion of the trail and enter it, even if in the opposite direction. It appeared as if they had hard-coded a portion of a trail in their genomes.

We found one organism that used a hybrid strategy combining error recovery and searching. It primarily navigated by error recovery; however, if it encountered two left turns in a row, it would not be able to find its way back to the trail. Instead, it continued searching for another stretch of the trail and reentered it, even if in the opposite direction. It would then resume navigation by error recovery.

Finally, the best performing hybrid strategy combined imprinting, error recovery, and searching. This organism did not complete the path but could reach a task quality of 84%. It performed imprinting at the beginning of the path and would continue to navigate using the associated cue in memory. However, similar to the previous organism, it had trouble navigating two left turns in a row and would step off the trail. Once off the trail, it would perform an error recovery behavior (moving and turning 90 degrees, four times) to get back on the trail.
Then it would continue navigating based on the imprinted cue. However, when it reached the end of some trails, it would start a search behavior until it found another part of the trail and reentered it, even if in the opposite direction, in which case it navigated the trail by employing the error recovery strategy.

We examined only a sample of organisms that had less than 25% task quality. Those examined displayed previously described strategies and did not travel far on the trail.

A.3 Experiment 1 – Additional Results and Bestiary

In both the nutrient cued and one fixed turn environments, we found a variation of the error recovery strategy in which the organisms avoided stepping off the trail at non-default turns if they were in close succession. They tended to repeat the non-default turn when encountering another one in a sequence. This behavior allowed them to make fewer “mistakes” and achieve a higher task quality (up to 87% of the maximum) than the organisms that used the typical form of error recovery.

Analyzing the lineages of the organisms that used the more rigid form of imprinting, which did not generalize to other trail configurations, we found that most evolved directly from ancestors that navigated by path predicting. These lineages never evolved error recovery, which explains the inability of these organisms to cope with mistakes that led them off the trail.

Interestingly, in the one fixed turn environment, one of the organisms that used the more rigid form of imprinting evolved from ancestors that used error recovery, and that ability was eventually lost during evolution. When learning initially evolved in this lineage, there were ancestral organisms able both to imprint the cue association by using the pattern at the start of the trail and to recover from errors and relearn if the cues changed and they stepped off the trail.
These overlapping abilities persisted over many generations, until the behavior was further streamlined and only the more rigid form of imprinting remained, which provided a small advantage in task quality in their particular environment. This case supports hypothesis 4: there must be a need for reversal learning (or relearning) for it to fix; otherwise, imprinting alone will suffice and could fix instead.

Similarly, one of the organisms that evolved generalizable imprinting had an ancestor capable of coping with cue changes along the trail. However, this ability arose from a “short memory” that required it to reassociate the cue at regular intervals, rather than from an ability to relearn.

Figure A.7: Searching and imprinting hybrid strategy. This organism, which evolved in the nutrient cued environment, is an example of a searching and imprinting hybrid strategy that reached a task quality of 57% of the maximum in this trail. When the organism started navigating the trail, it reacted to turn cues by turning to a default direction. If this direction led it to step off the trail, it would initiate a search procedure, alternating forward moves and turns, until it found another portion of the trail and reentered it. This stint off the trail also primed it for imprinting the cue association the next time it encountered the non-default turn cue, after which it would navigate the remainder of the trail using the learned association.

A.4 Experiment 2 – Additional Results

We also tested whether the position of the cue reversal on the trail affected the evolution of learning (imprinting and relearning). We used the same cue reversal environment, Turing-complete instruction set, and sensing and moving instructions as in experiment 2, but varied the position at which the cues were reversed in the trail.
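Differences in how often learning evolved between conditions are assessed throughout this appendix with Fisher's exact test on 2 x 2 counts of replicates that did or did not evolve learning. A minimal self-contained version of the two-sided test is sketched below as a stand-in for a statistics library routine; the counts shown are illustrative only, not the study's data:

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 contingency table, summing
    hypergeometric probabilities of all tables at least as extreme
    (i.e., no more probable) than the observed one."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):  # P(first cell = x) under the hypergeometric null
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Illustrative counts only (NOT the study's data):
# rows are two cue reversal positions, columns are
# [replicates that evolved learning, replicates that did not].
p = fisher_exact_two_sided([[12, 188], [7, 193]])
```

A p-value above 0.05 in such a comparison would indicate no significant difference between the two conditions, which is the pattern reported for the actual experiments below.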
We hypothesized that if the cue reversal took place early in the trail, organisms that navigated by imprinting would not have a sufficient selective advantage over those that navigated by error recovery.

The results did not support our hypothesis. The position of the cue reversal on the trail had no significant effect on the number of organisms that evolved learning. Initially, we varied the cue reversal position from the 10% mark to the 90% mark of every trail, in 2.5% increments, evolving 200 populations for each increment. Although the differences were not significant (Fisher's exact test, p = 0.3803), we chose two positions that had a large difference in the number of organisms that evolved learning (the 65% and 85% marks) and evolved 1000 populations at each (fig. A.8). Again, the difference was not significant (Fisher's exact test, p = 0.1497).

Figure A.8: Evolution of learning according to cue reversal position. Different cue reversal positions along the trail of nutrients, from 10% to 90% of the total length, in 2.5% intervals. In blue, the number of replicates from the first set that evolved learning, out of 200 per position. In yellow, the second set, with 1000 replicates per position.

A.4.1 Experiment 2 – Single Lineage Analysis

Below are phenotypic descriptions and figures from one of the lineages that evolved relearning in experiment 2. This lineage is typical of the evolutionary sequences we observed, in which organisms first evolved the capacity for moving, then sensing, followed by a purely reflexive navigation strategy, such as path predicting, searching, or error recovery, and ultimately learning. In each figure, the background graph shows the task quality (TQ) over time, highlighting ancestors at major evolutionary transitions. The graphic on the right shows the path of a particular ancestor in one of the arenas of the cue reversal environment.

Ancestral organism (TQ = 0.0; Update born: 1): This was the ancestral organism common to all lineages.
Its genome was blank except for a reproduce instruction. Therefore, this organism possessed no behavior and was sessile.

Oscillator phenotype (TQ = 0.016; Update born: 697): This organism oscillated in place, moving one step forward and one step back (fig. A.9). It was the first organism of the lineage with a real gain in task quality over the ancestral organism. It did not possess a sense current instruction.

Figure A.9: “Oscillator” organism. This organism moved back and forth between its start position (green circle) and its final position (red circle).

Straight mover phenotype (TQ = 0.032; Update born: 1424): This organism was the first to be capable of sensing and reacting to environmental cues (fig. A.10). Its behavior was divided into two phases. In the first phase, it moved straight forward, and continued moving while sensing the forward cue. When it no longer sensed the forward cue, it stopped moving forward and entered a second phase, in which it oscillated in place for the remainder of its lifetime (until it could reproduce). It did not possess rotate instructions.

Figure A.10: “Straight mover” organism. This organism moved forward only until it encountered a turn cue.

Right turner phenotype (TQ = 0.056; Update born: 32019): This organism moved forward when sensing forward cues, and always turned right upon sensing any turn cue (fig. A.11). If it encountered a left turn, it would turn to the right and move into an empty location, which caused the organism to stop moving and attempt to reproduce.

Figure A.11: “Right turner” organism. This organism reacted to turn cues by always rotating right.

Path predictor phenotype (TQ = 0.073; Update born: 54071): This organism moved forward when sensing forward cues and reacted to turn cues by turning either to the right or to the left, depending on how many steps it had moved and the direction of its last turn (fig. A.12).
It had encoded a pattern in its behavioral algorithm that reflected the different starts of the four trails for this particular environment (nutrient cued). In other words, it had a limited flexible response that would differentially match the pattern of the first few segments of any of the four trails it could encounter. When the pattern no longer matched, and the organism made a “wrong” turn, it stopped moving and attempted to reproduce. (In three of the paths it was able to pass the first two turns, and in one of the paths it reached past the fifth turn.)

Figure A.12: “Path predictor” organism. This organism had an encoded pattern in its behavioral algorithm that reflected the different starts of the four trails for this particular environment (nutrient cued). When the pattern no longer matched the trail, and the organism made a “wrong” turn, it stopped moving.

Error recoverer 1 phenotype (TQ = 0.089; Update born: 64201): This was the first organism in the ancestral lineage to use the error recovery strategy (fig. A.13). However, it moved slowly and reproduced before reaching the end of the trail. The organism reacted to turn cues by first trying to turn to the right and moving straight. If that led it into an empty location, it took one step back, then tried rotating to the left in 45-degree increments and moving straight until it found the trail of nutrients again.

Figure A.13: “Error recoverer 1” organism. This organism was the first in this lineage to use the error recovery strategy; however, it often wasted movements, which made progress somewhat slow.

Error recoverer 2 phenotype (TQ = 0.81; Update born: 66346): This was the last organism in the ancestral lineage to rely exclusively on the error recovery strategy (figs. A.14 and A.17). Once it evolved, this strategy became predominant in the population and improved in efficiency. This organism was able to navigate much further into the trail in a given time than its early ancestors.
For example, when this organism made a wrong turn, instead of trying to find the trail again by turning in 45-degree increments like Error recoverer 1, it took one step back, turned 90 degrees at once, and moved forward.

Figure A.14: “Error recoverer 2” organism. This organism was the last one in the lineage to rely exclusively on the error recovery strategy. Its behavior was a streamlined version of its early ancestor's (Error recoverer 1), which allowed it to move faster, waste fewer movements, and reach much further into the trail.

First learner (TQ = 0.98; Update born: 66448): This was the first organism in this lineage to use the relearning strategy (fig. A.15). Notably, it differed from its immediate ancestor (Error recoverer 2) by a single mutation, which connected the error-recovery module to a previously underutilized memory-storing module (figs. A.16 and A.17). Therefore, this organism evolved relearning directly from error recovery, without passing through imprinting.

Figure A.15: “First learner” organism. This organism was the first one in the lineage to use the relearning strategy. It differed from its immediate ancestor (Error recoverer 2) by a single mutation. Its path in one of the arenas (right) shows that it went off the trail three times: once during the initial learning, a second time when the cues were reversed, and a third time when it detected that the trail had ended, at which point the organism stopped moving.

Final organism (TQ = 0.98; Update born: 249826): The final predominant organism of the lineage used the same relearning strategy as the First learner and achieved the exact same task quality score. However, its behavioral control algorithm was slightly more compact, which allowed it to perform the same behavior as its ancestor more quickly and reproduce sooner, consequently gaining fitness. Figure A.18 shows a comparison between the First learner and Final organism genomes.
Although the Final organism's genome is longer, the executed portion of its program is slightly more efficient, partly due to shorter loops.

Figure A.16: A single mutation separates the error recovery and relearning behaviors. The transition from Error recoverer 2, on the left, to First learner, on the right, occurred due to a single mutation. For the full source code of the First learner, see fig. A.18.

Figure A.17: Change in the behavioral algorithm from Error recoverer 2 to First learner due to a single mutation. The mutation connected the error-recovery module to the memory-storing module. Previously, memory-storing was executed only once, right after the organism was initialized on the trail (left sequence). After the mutation, every time it made a wrong turn and recovered, the organism stored in memory the cue that had led it off the trail (right sequence). Note: each block in the diagram represents a sequence of one or more instructions. For the full source code of the First learner, see fig. A.18.

Figure A.18: Comparison between First learner and Final organism. The picture shows the genomes of the First learner and the Final organism, side by side. Lines connecting the two genomes indicate corresponding portions of the algorithm. The Final organism's longer genome indicates a substantial accumulation of neutral mutations during evolution. Interestingly, the active parts of the genome were highly conserved and tended to remain together.

A.4.2 Experiment 2 – Bestiary

In this experiment, several organisms evolved variations of imprinting in which the cue associations did not last long (“short memory”). The most common of these variations was a hybrid of imprinting and error recovery, in which an imprinted association would last only as long as the turns were in the non-default direction. A turn to the default direction would erase the association.
Afterwards, a turn to the non-default direction would cause the organism to step off the trail, recover, and imprint the association again. These organisms coped with cue reversals by using error recovery until their memory lapsed and they could form a new association. They reached higher task quality scores (up to 93% of the maximum) than organisms using typical imprinting, in which the cue association did not lapse but could not be re-formed if the cues changed. In another variation of this strategy, the cue association lapsed after the organism had performed a certain number of movements, rather than upon encountering a specific type of cue.

A.5 Follow-Up Experiment

As a follow-up to experiment 2, which selected for reversal learning, we designed an experiment to test whether limiting the amount of computational memory available to an organism would make the evolution of learning behavior more difficult. In all previous experiments, we used a Turing-complete instruction set, which provided an organism with six addressed memory positions, each capable of storing a single 32-bit integer, and two push-pop memory stacks, each capable of storing ten 32-bit integers [216]. This time, we used a non-Turing-complete, minimal-memory instruction set, which provided an organism with only two addressed memory positions and no stacks. We used the same cue reversal environment (fig. A.5) and sensing and moving instructions as in experiment 2, and performed 200 evolutionary replicates under this condition.

Our hypothesis was that by providing the organisms with less memory to store information and perform computation, the evolution of learning would be more difficult. We believed that two registers were the minimum amount of memory necessary to solve the learning task, since one register is always used by the sense current instruction to store the currently sensed cue.
A second register would be necessary to store a previously learned cue in order to compare the two.

A.5.1 Follow-Up Experiment – Results

Our hypothesis was not upheld. Relearning evolved under the minimal-memory instruction set at a rate not significantly different from that of the Turing-complete instruction set used in experiment 2 (Fisher's exact test, p = 0.4191). In addition, when comparing the final predominant organisms of each condition, there was no significant difference in the distribution of AMTQ or in average AMTQ (Kruskal–Wallis test, H = 0.11008, df = 1, p = 0.7401) (fig. A.19; table A.3).

Figure A.19: Distribution of Average Maximum Task Quality (AMTQ) per condition for experiment 2 and the follow-up experiment. Each violin plot represents the distribution of AMTQ across replicates for a given condition. The difference between the two conditions was not significant (Kruskal–Wallis test, H = 0.11008, df = 1, p = 0.7401).

Therefore, the amount of available memory appears not to be a constraint on the evolution of learning in our system, as long as the minimum amount necessary to solve the problem is provided.

Table A.3: Experiment 2 and follow-up experiment summary of results. Comparison between the two conditions: standard Turing-complete instruction set and non-Turing-complete, minimal-memory instruction set.

                                          Experiment 2         Follow-up experiment
    Instruction set                       Turing-complete,     Non-Turing-complete,
                                          standard             minimal memory
    Number of replicates                  900                  200
    Average AMTQ across all replicates    0.4853988            0.4405631
    Best performance                      97%                  97%
    Organisms that evolved relearning:
      Behavior generalized to any
      trail configuration                 10                   5
      Behavior did not generalize         8                    1
      Total                               18                   6

A.6 How Evolution Continues to Shape Learning

In a final analysis of our data, we looked into how learning continued to evolve after it first appeared.
Although our experiments were not designed to investigate this question, our data show some intriguing patterns consistent with the literature on how evolution optimizes learning abilities for the particular characteristics of each environment.

A.6.1 Evolutionary Constraints on Learning

Among the organisms that evolved learning (imprinting and relearning) in experiments 1 and 2, some were capable of learning any cue combination (in the range from 1 to 100), while others could learn some combinations but not others. In addition, some organisms learned from the experience of stepping off the path, while others learned by using the pattern at the start of the trail to decide when and what to imprint. We looked at the evolution of these and other differences in learning ability in the context of the literature on preparedness and other so-called constraints on learning [84], [88], [89], [147].

A key theme in this literature is that the evolutionary process has produced learning mechanisms that are particularly efficient at forming the kinds of associations that an animal may need to use in the environment where it evolved. For example, an animal that relies upon odors for foraging may more quickly learn to associate odors with good or bad foods than to associate visual cues with the same foods [151]. Our studies were not specifically designed to test this idea, but in a more general sense it is clear that digital organisms in some lineages evolved to learn those cue associations that were relevant, and to do so with high efficiency. Furthermore, we could actually examine the evolutionary history of those organisms that evolved the ability to learn, unlike most empirical papers in the “preparedness” literature, where the focus is on the outcome of evolution rather than the evolutionary history itself.
This enabled us to document specific examples in which improvements in learning mechanisms increased the ability to deal with the learning task presented by the environment, supporting the general assumption of the preparedness literature.

For example, some of the organisms that could learn any cue combination had ancestors that could only learn some combinations (combinations where one of the cues had the value 1 were particularly troublesome, due to the specific assembly instructions available in the Avida instruction set). This was due to inefficiencies in the learning algorithm as it first evolved, which were later mitigated by evolution. Therefore, in these lineages, organisms improved their learning ability during evolution by increasing the range of cue values they could associate.

We also found some lineages where ancestral organisms required multiple exposures to learn the cue-response association (regardless of whether it was the initial learning or a reversal learning). However, their descendants gradually evolved to require fewer exposures, and the final organisms were able to learn the cue-response association in a single exposure.

In many of the lineages we analyzed, there were ancestral organisms with short memories. They could form the cue-response association, but it would not last long. The organism would have to either re-form the association often or navigate the remainder of the trail using other strategies, such as searching or error recovery. However, in all these lineages, we observed a trend of increasing memory duration, with some of the final organisms being able to retain the association indefinitely. A long-lasting memory was advantageous for organisms in experiment 1, where there were no cue reversals, and for organisms using the relearning strategy in experiment 2, where the environment contained only one cue reversal.
We also observed lineages, in both experiments 1 and 2, where ancestral organisms could form the association under two different conditions. They could first imprint the cue association by using the pattern at the start of the trail, and later, when their memory lapsed, they could imprint (or relearn) the association when they stepped off the trail. This dual ability was lost in the final organism, which specialized in one of the strategies, presumably because specialization simplified its behavioral algorithm and allowed it to navigate faster and reproduce sooner. One of these lineages evolved in the one-fixed-turn environment in experiment 1. Some of its ancestral organisms were capable of relearning, since they evolved from error recovery. Later generations also evolved imprinting based on the pattern at the start of the trail (the first turn was always to the right), and both abilities were simultaneously present in the same organism for many generations. The ability for error recovery was eventually lost, completing the transition to imprinting based on the pattern at the start of the trail, which was the only strategy used by the final organism. In another lineage, which evolved in experiment 2, some of the ancestral organisms were capable of relearning, since they evolved from error recovery, but they were also capable of imprinting based on the pattern at the start of the trail. Additionally, these organisms had short memories, which required them to relearn often. In this case, however, it was the ability for imprinting that was eventually lost during evolution, and the final organism was capable of relearning exclusively. Beyond this evidence of learning abilities improving during evolution, we observed variations in the outcome of evolution that were not necessarily predicted by a strict interpretation of the "preparedness" concept.
In particular, despite the fact that the evolutionary pressures should have been similar across replicates in our experiments, we found a high degree of variation across replicate lineages in the efficiency of particular learning mechanisms and in how they arose. This suggests that a future area of study could be to understand historical contingency in the evolution of learning: that is, to explore systematically how early evolutionary history influences the subsequent evolution of learning abilities.

A.6.2 Imprinting vs. Relearning – Evolution of the Sensitive Period for Learning

In experiment 1, where environments did not contain cue reversals, the only form of learning present in the final populations was imprinting. However, in experiment 2, where the environment contained cue reversals, we found both imprinting and relearning. Moreover, the highest-performing imprinting organisms in experiment 2 had short memories, which allowed them to re-form the associations once their memory had lapsed, a trait that was beneficial in coping with cue reversals (Section A.4.2). We looked at these contrasting results from the two experiments from the perspective of the literature on sensitive periods of plasticity [148], [150]. With regard to the evolution of learning, the theory states that when environmental conditions are relatively stable during an individual's lifetime, such that experiences early in life are reliable predictors of future conditions, natural selection will favor restricting learning to a period early in the individual's life, thus reducing the costs of learning. An example is filial imprinting in birds: a chick needs to learn who its mother is only once, since her identity will not change [149]. Conversely, if environmental conditions vary substantially within an individual's lifetime, natural selection will favor lifelong learning.
In experiment 1, environments were stable within the organism's lifetime, since there were no cue reversals, and the organisms that evolved imprinting learned the cue-response association early in the trail and retained that association thereafter. They were incapable of relearning when tested in environments with cue reversals; thus, their sensitive period for learning lasted only until the first learning event. However, in experiment 2, where the environment was less stable, since cues were reversed later in life, organisms evolved relearning and short-term imprinting and were able to re-form the association multiple times along the trail. Their sensitive period for learning lasted their entire lives. For the organisms that evolved in the stable environments of experiment 1, remaining plastic after the initial learning would not have provided any additional benefit. However, in the cue reversal environment of experiment 2, retaining the ability to re-form the association after the initial learning was adaptive. Moreover, the reduction of the sensitive period may itself be adaptive in the stable environments of experiment 1, since evolution could presumably simplify an organism's algorithm by limiting the execution of the learning module to the early part of its life, thus allowing it to navigate the remainder of the trail more quickly and reproduce sooner. When analyzing the evolutionary history of the organisms that evolved imprinting in experiment 1, we found some lineages where ancestral organisms had sensitive periods for learning that could last their entire life; that is, when tested in an environment with multiple cue reversals, these organisms could either relearn or re-form the cue association after their memory had lapsed. Therefore, in some of the lineages of experiment 1, there was a trend towards reducing the sensitive period for learning to the early part of the organism's life.
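The logic of this argument can be sketched in a few lines of code. This is a minimal abstraction, not the Avida implementation: environments are just sequences of correct turns, and the one-error cost of each relearning event is an assumption.

```python
def navigation_errors(env, strategy):
    """Compare two evolved strategies on a trail of turn cues.
    `env` lists the correct turn at each step; a cue reversal appears as
    a change partway through the sequence. An 'imprint' organism fixes
    its association from the first cue and never updates it; a 'relearn'
    organism pays one error when it steps off the trail, then re-forms
    the association. Returns total navigation errors."""
    belief = env[0]                  # initial learning event (both strategies)
    mistakes = 0
    for correct in env[1:]:
        if belief != correct:
            mistakes += 1            # stepped off the trail
            if strategy == "relearn":
                belief = correct     # error recovery re-forms the association
    return mistakes

stable = ["R"] * 20                  # like experiment 1: no cue reversals
reversal = ["R"] * 10 + ["L"] * 10   # like experiment 2: one mid-life reversal
```

In the stable environment both strategies are error-free, so the simpler imprinting algorithm can win on execution speed alone; after a reversal, a pure imprinter keeps erring for the rest of its life while a relearner pays a single error, mirroring the contrasting outcomes of the two experiments.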
In contrast, in experiment 2, where the environment contained cue reversals, we never observed a reduction of sensitive periods in the lineages that evolved relearning or short-term imprinting. As mentioned in the previous section (Section A.6.1), we also found lineages, in both experiment 1 and experiment 2, where ancestral organisms were capable of learning using two different strategies, imprinting and relearning. This dual ability was eventually lost during evolution, and the final organism used only one of the strategies. In agreement with the theory, the strategy that prevailed in the environment without cue reversals was imprinting, while relearning prevailed in the environment with cue reversals. In the future, we would like to perform experiments targeted specifically at investigating the evolution of sensitive periods, to shed further light on these observations.

APPENDIX B

Supplementary Material for Chapter 4

Figure B.1. Flowchart of the predominant organism capable of the generalizable version of configural learning from Experiment 2. Its cyclomatic complexity is 13.

REFERENCES

[1] M. Mitchell, Artificial Intelligence: A Guide for Thinking Humans. Penguin UK, 2019.
[2] R. Lachman, J. L. Lachman, and E. Butterfield, Cognitive Psychology and Information Processing: An Introduction. Hillsdale, NJ: Lawrence Erlbaum Associates; New York: Halsted Press, 1979. Accessed: Feb. 27, 2020. [Online]. Available: http://archive.org/details/cognitivepsychol00lach
[3] P. McCorduck, Machines Who Think: A Personal Inquiry into the History and Prospects of Artificial Intelligence, 25th anniversary update. Natick, MA: A.K. Peters, 2004.
[4] A. Newell and H. A. Simon, Human Problem Solving, vol. 104. Englewood Cliffs, NJ: Prentice-Hall, 1972.
[5] R. A. Brooks, Flesh and Machines: How Robots Will Change Us. Vintage Books, 2003.
[6] R. Menzel and J. Fischer, Eds., Animal Thinking: Contemporary Issues in Comparative Cognition.
The MIT Press, 2011. doi: 10.7551/mitpress/9780262016636.001.0001.
[7] C. Gallistel, "Learning and Representation," Learn. Theory Behav. Learn. Mem. Compr. Ref., vol. 1, Dec. 2008, doi: 10.1016/B978-012370509-9.00082-6.
[8] A. Kamil, "A Synthetic Approach to the Study of Animal Intelligence," Nebr. Symp. Motiv., p. 53, 1987.
[9] R. Dukas, "Evolutionary Biology of Animal Cognition," Annu. Rev. Ecol. Evol. Syst., vol. 35, no. 1, pp. 347–374, 2004, doi: 10.1146/annurev.ecolsys.35.112202.130152.
[10] S. J. Shettleworth, Cognition, Evolution, and Behavior. New York: Oxford University Press, 1998.
[11] C. R. Gallistel, "Animal Cognition: The Representation of Space, Time and Number," Annu. Rev. Psychol., vol. 40, no. 1, pp. 155–189, Jan. 1989, doi: 10.1146/annurev.ps.40.020189.001103.
[12] D. H. Ballard, M. M. Hayhoe, and J. B. Pelz, "Memory Representations in Natural Tasks," J. Cogn. Neurosci., vol. 7, no. 1, pp. 66–80, Jan. 1995, doi: 10.1162/jocn.1995.7.1.66.
[13] D. Kahneman, Thinking, Fast and Slow, 1st pbk. ed. New York: Farrar, Straus and Giroux, 2013.
[14] J. J. Gibson, The Ecological Approach to Visual Perception. New York; London: Psychology Press, 2015.
[15] S. L. Hurley, Consciousness in Action, 1st Harvard University Press pbk. ed. Cambridge, MA: Harvard Univ. Press, 2002.
[16] R. S. Marken, "Perceptual organization of behavior: A hierarchical control model of coordinated action," J. Exp. Psychol. Hum. Percept. Perform., vol. 12, no. 3, p. 267, 1986, doi: 10.1037/0096-1523.12.3.267.
[17] D. C. Dennett, From Bacteria to Bach and Back: The Evolution of Minds, 1st ed. New York: W.W. Norton & Company, 2017.
[18] T. Gomila and P. Calvo, "Directions for an Embodied Cognitive Science: Toward an Integrated Approach," in Handbook of Cognitive Science, P. Calvo and A. Gomila, Eds. San Diego: Elsevier, 2008, pp. 1–25. doi: 10.1016/B978-0-08-046616-3.00001-3.
[19] W. T. Powers, Behavior: The Control of Perception.
Oxford, England: Aldine, 1973.
[20] T. Van Gelder, "What Might Cognition Be, If Not Computation?," J. Philos., vol. 92, no. 7, pp. 345–381, 1995, doi: 10.2307/2941061.
[21] L. de Bruin, A. Newen, and S. Gallagher, Eds., The Oxford Handbook of 4E Cognition. Oxford: Oxford University Press, 2018.
[22] A. Newell, Unified Theories of Cognition. Cambridge, MA: Harvard University Press, 1990.
[23] P. Godfrey-Smith, "Environmental complexity and the evolution of cognition," in The Evolution of Intelligence, R. Sternberg and J. Kaufman, Eds. London: Lawrence Erlbaum Associates, 2002, pp. 233–249.
[24] D. A. Levitis, W. Z. Lidicker, and G. Freund, "Behavioural biologists don't agree on what constitutes behaviour," Anim. Behav., vol. 78, no. 1, pp. 103–110, Jul. 2009, doi: 10.1016/j.anbehav.2009.03.018.
[25] A. Trewavas, "The foundations of plant intelligence," Interface Focus, vol. 7, no. 3, p. 20160098, Jun. 2017, doi: 10.1098/rsfs.2016.0098.
[26] K. Aleklett and L. Boddy, "Fungal behaviour: a new frontier in behavioural ecology," Trends Ecol. Evol., vol. 36, no. 9, pp. 787–796, Sep. 2021, doi: 10.1016/j.tree.2021.05.006.
[27] I. M. De la Fuente et al., "Evidence of conditioned behavior in amoebae," Nat. Commun., vol. 10, no. 1, Art. no. 1, Aug. 2019, doi: 10.1038/s41467-019-11677-w.
[28] P. Lyon, "The cognitive cell: bacterial behavior reconsidered," Front. Microbiol., vol. 6, Apr. 2015, doi: 10.3389/fmicb.2015.00264.
[29] D.-E. Nilsson and J. Marshall, "Lens eyes in protists," Curr. Biol., vol. 30, no. 10, pp. R458–R459, May 2020, doi: 10.1016/j.cub.2020.01.077.
[30] M. K. Trinh, M. T. Wayland, and S. Prabakaran, "Behavioural analysis of single-cell aneural ciliate, Stentor roeseli, using machine learning approaches," J. R. Soc. Interface, vol. 16, no. 161, p. 20190410, Dec. 2019, doi: 10.1098/rsif.2019.0410.
[31] F. Cvrčková, V. Žárský, and A. Markoš, "Plant Studies May Lead Us to Rethink the Concept of Behavior," Front. Psychol., vol.
7, 2016, doi: 10.3389/fpsyg.2016.00622.
[32] X. E. Barandiaran, E. Di Paolo, and M. Rohde, "Defining Agency: Individuality, Normativity, Asymmetry, and Spatio-temporality in Action," Adapt. Behav., vol. 17, no. 5, pp. 367–386, Oct. 2009, doi: 10.1177/1059712309343819.
[33] C. Misselhorn, "Collective Agency and Cooperation in Natural and Artificial Systems," in Collective Agency and Cooperation in Natural and Artificial Systems: Explanation, Implementation and Simulation, C. Misselhorn, Ed. Cham: Springer International Publishing, 2015, pp. 3–24. doi: 10.1007/978-3-319-15515-9_1.
[34] D. A. Sanders, "Artificial intelligence for control engineering," Control Eng., vol. 62, no. 2, p. 38+, Feb. 2015.
[35] S. A. Umpleby, "A history of the cybernetics movement in the United States," J. Wash. Acad. Sci., vol. 91, no. 2, pp. 54–66, 2005.
[36] E. Pacherie, "The phenomenology of action: A conceptual framework," Cognition, vol. 107, no. 1, pp. 179–217, Apr. 2008, doi: 10.1016/j.cognition.2007.09.003.
[37] W. R. Ashby, An Introduction to Cybernetics. New York: J. Wiley, 1956.
[38] K. Sterelny, The Evolution of Agency and Other Essays. Cambridge, UK; New York: Cambridge University Press, 2001.
[39] D. C. Dennett, "Intentional Systems," J. Philos., vol. 68, no. 4, pp. 87–106, 1971, doi: 10.2307/2025382.
[40] D. Dennett, Intentional Systems Theory. Oxford University Press, 2009. doi: 10.1093/oxfordhb/9780199262618.003.0020.
[41] D. C. Dennett, The Intentional Stance, 7th printing. Cambridge, MA: MIT Press, 1998.
[42] K. E. Boulding, "General Systems Theory-The Skeleton of Science," Manag. Sci., vol. 2, no. 3, pp. 197–208, 1956.
[43] A. J. Dzieciol and S. Mann, "Designs for life: protocell models in the laboratory," Chem. Soc. Rev., vol. 41, no. 1, pp. 79–85, 2012, doi: 10.1039/C1CS15211D.
[44] A. Hanson, "Spontaneous electrical low-frequency oscillations: a possible role in Hydra and all living systems," Philos. Trans. R. Soc. B Biol. Sci., vol. 376, no. 1820, p.
20190763, Mar. 2021, doi: 10.1098/rstb.2019.0763.
[45] G. Buzsáki, Rhythms of the Brain. New York: Oxford University Press, 2006. doi: 10.1093/acprof:oso/9780195301069.001.0001.
[46] R. A. Baines and M. Landgraf, "Neural development: The role of spontaneous activity," Curr. Biol., vol. 31, no. 23, pp. R1513–R1515, Dec. 2021, doi: 10.1016/j.cub.2021.10.026.
[47] J. Hernández-Orallo, B. S. Loe, L. Cheke, F. Martínez-Plumed, and S. Ó hÉigeartaigh, "General intelligence disentangled via a generality metric for natural and artificial intelligence," Sci. Rep., vol. 11, no. 1, p. 22822, Nov. 2021, doi: 10.1038/s41598-021-01997-7.
[48] A. Visioli, "Maximizing the Impact of Control at All Levels," Front. Control Eng., vol. 1, 2020, doi: 10.3389/fcteg.2020.602469.
[49] S. Bennett, "Control and the digital computer: the early years," p. 6.
[50] T.-M. Yi, Y. Huang, M. I. Simon, and J. Doyle, "Robust perfect adaptation in bacterial chemotaxis through integral feedback control," Proc. Natl. Acad. Sci., vol. 97, no. 9, pp. 4649–4653, Apr. 2000, doi: 10.1073/pnas.97.9.4649.
[51] S. A. Frank, "Homeostasis, environmental tracking and phenotypic plasticity. I. A robust control theory approach to evolutionary design tradeoffs," bioRxiv, p. 332999, May 2018, doi: 10.1101/332999.
[52] S. A. Frank, Control Theory Tutorial: Basic Concepts Illustrated by Software Examples. Cham: Springer International Publishing, 2018. doi: 10.1007/978-3-319-91707-8.
[53] A. Mitra, A.-M. Raicu, S. L. Hickey, L. A. Pile, and D. N. Arnosti, "Soft repression: Subtle transcriptional regulation with global impact," BioEssays, vol. 43, no. 2, p. 2000231, 2021, doi: 10.1002/bies.202000231.
[54] A. Kurz, "Physiology of Thermoregulation," Best Pract. Res. Clin. Anaesthesiol., vol. 22, no. 4, pp. 627–644, Dec. 2008, doi: 10.1016/j.bpa.2008.06.004.
[55] J. A. Russell, G. Leng, and A. J. Douglas, "The magnocellular oxytocin system, the fount of maternity: adaptations in pregnancy," Front.
Neuroendocrinol., vol. 24, no. 1, pp. 27–61, Jan. 2003, doi: 10.1016/S0091-3022(02)00104-8.
[56] H. El-Samad, J. P. Goff, and M. Khammash, "Calcium Homeostasis and Parturient Hypocalcemia: An Integral Feedback Perspective," J. Theor. Biol., vol. 214, no. 1, pp. 17–29, Jan. 2002, doi: 10.1006/jtbi.2001.2422.
[57] B. F. Skinner, "The evolution of behavior," J. Exp. Anal. Behav., vol. 41, no. 2, pp. 217–221, 1984.
[58] A. Kurakin, "Stochastic Cell," IUBMB Life, vol. 57, no. 2, pp. 59–63, 2005, doi: 10.1080/15216540400024314.
[59] D. J. Nicholson, "Is the cell really a machine?," J. Theor. Biol., vol. 477, pp. 108–126, Sep. 2019, doi: 10.1016/j.jtbi.2019.06.002.
[60] F. J. Gomez and R. Miikkulainen, "Active Guidance for a Finless Rocket Using."
[61] J. Yosinski, J. Clune, D. Hidalgo, S. Nguyen, J. C. Zagal, and H. Lipson, "Evolving Robot Gaits in Hardware: the HyperNEAT Generative Encoding Vs. Parameter Optimization," p. 8.
[62] S. P. Franklin, Artificial Minds. Cambridge, MA, USA: A Bradford Book, 1995.
[63] N. Kitadai and S. Maruyama, "Origins of building blocks of life: A review," Geosci. Front., vol. 9, no. 4, pp. 1117–1153, Jul. 2018, doi: 10.1016/j.gsf.2017.07.007.
[64] L. Bich, "Autonomous Systems and the Place of Biology Among Sciences. Perspectives for an Epistemology of Complex Systems," in Multiplicity and Interdisciplinarity: Essays in Honor of Eliano Pessa, G. Minati, Ed. Cham: Springer International Publishing, 2021, pp. 41–57. doi: 10.1007/978-3-030-71877-0_4.
[65] R. I. Vane-Wright, "What is life? And what might be said of the role of behaviour in its evolution?," Biol. J. Linn. Soc., vol. 112, no. 2, pp. 219–241, Jun. 2014, doi: 10.1111/bij.12300.
[66] F. G. Varela, H. R. Maturana, and R. Uribe, "Autopoiesis: The organization of living systems, its characterization and a model," Biosystems, vol. 5, no. 4, pp. 187–196, May 1974, doi: 10.1016/0303-2647(74)90031-8.
[67] H. R. Maturana and F. J.
Varela, Autopoiesis and Cognition: The Realization of the Living, vol. 42. Dordrecht: Springer Netherlands, 1980. doi: 10.1007/978-94-009-8947-4.
[68] M. Vitas and A. Dobovišek, "Towards a General Definition of Life," Orig. Life Evol. Biospheres, vol. 49, no. 1, pp. 77–88, Jun. 2019, doi: 10.1007/s11084-019-09578-5.
[69] F. Wong and J. Gunawardena, "Gene Regulation in and out of Equilibrium," Annu. Rev. Biophys., vol. 49, no. 1, pp. 199–226, 2020, doi: 10.1146/annurev-biophys-121219-081542.
[70] L. von Bertalanffy, "The Theory of Open Systems in Physics and Biology," Science, vol. 111, no. 2872, pp. 23–29, Jan. 1950, doi: 10.1126/science.111.2872.23.
[71] P. C. W. Davies, The Demon in the Machine: How Hidden Webs of Information Are Solving the Mystery of Life. London: Allen Lane, 2019.
[72] A. W. Fenton, "Allostery: an illustrated definition for the 'second secret of life,'" Trends Biochem. Sci., vol. 33, no. 9, pp. 420–425, Sep. 2008, doi: 10.1016/j.tibs.2008.05.009.
[73] A. Jones, Ed., "Grand Research Challenges in Information Systems," Rep. First Conf. – Inf. Syst., p. 35, 2003.
[74] R. Dale, "GPT-3: What's it good for?," Nat. Lang. Eng., vol. 27, no. 1, pp. 113–118, Jan. 2021, doi: 10.1017/S1351324920000601.
[75] A. C. Pontes, R. B. Mobley, C. Ofria, C. Adami, and F. C. Dyer, "The Evolutionary Origin of Associative Learning," Am. Nat., vol. 195, no. 1, pp. E1–E19, Jan. 2020, doi: 10.1086/706252.
[76] D. Hume, A Treatise of Human Nature: Being an Attempt to Introduce the Experimental Method of Reasoning Into Moral Subjects. London: Oxford University Press, 1738.
[77] P. Godfrey-Smith, Complexity and the Function of Mind in Nature. New York: Cambridge University Press, 1996. doi: 10.1017/CBO9781139172714.
[78] B. H. Weber and D. J. Depew, Evolution and Learning: The Baldwin Effect Reconsidered. Cambridge, MA: MIT Press, 2003.
[79] R. A. Duckworth, "The role of behavior in evolution: a search for mechanism," Evol. Ecol., vol. 23, no. 4, pp. 513–531, Jul.
2009, doi: 10.1007/s10682-008-9252-6.
[80] S. Ginsburg and E. Jablonka, "The evolution of associative learning: A factor in the Cambrian explosion," J. Theor. Biol., vol. 266, no. 1, pp. 11–20, Sep. 2010, doi: 10.1016/j.jtbi.2010.06.017.
[81] R. L. Brown, "Learning, evolvability and exploratory behaviour: extending the evolutionary reach of learning," Biol. Philos., vol. 28, no. 6, pp. 933–955, Nov. 2013, doi: 10.1007/s10539-013-9396-9.
[82] D. C. Dennett, Darwin's Dangerous Idea: Evolution and the Meanings of Life. New York: Touchstone, 1996.
[83] R. Dukas, "Effects of learning on evolution: robustness, innovation and speciation," Anim. Behav., vol. 85, no. 5, pp. 1023–1030, May 2013, doi: 10.1016/j.anbehav.2012.12.030.
[84] M. E. Seligman, "On the generality of the laws of learning," Psychol. Rev., vol. 77, no. 5, pp. 406–418, 1970, doi: 10.1037/h0029790.
[85] D. W. Stephens, "Change, regularity, and value in the evolution of animal learning," Behav. Ecol., vol. 2, no. 1, pp. 77–89, Mar. 1991, doi: 10.1093/beheco/2.1.77.
[86] F. Mery and T. J. Kawecki, "Experimental evolution of learning ability in fruit flies," Proc. Natl. Acad. Sci., vol. 99, no. 22, p. 14274, Oct. 2002, doi: 10.1073/pnas.222371199.
[87] R. Dukas and J. M. Ratcliffe, Eds., Cognitive Ecology II. Chicago; London: University of Chicago Press, 2009. doi: 10.7208/chicago/9780226169378.001.0001.
[88] S. J. Shettleworth, Cognition, Evolution and Behavior. Oxford; New York: Oxford University Press, 2010.
[89] M. Domjan, "Biological or Evolutionary Constraints on Learning," in Encyclopedia of the Sciences of Learning, N. M. Seel, Ed. Boston, MA: Springer US, 2012, pp. 461–463. doi: 10.1007/978-1-4419-1428-6_89.
[90] B. R. Moore, "The evolution of learning," Biol. Rev., vol. 79, no. 2, pp. 301–335, May 2004, doi: 10.1017/S1464793103006225.
[91] A. S. Dunlap, M. W. Austin, and A. Figueiredo, "Components of change and the evolution of learning in theory and experiment," Anim.
Behav., vol. 147, pp. 157–166, Jan. 2019, doi: 10.1016/j.anbehav.2018.05.024.
[92] A. S. Dunlap and D. W. Stephens, "Components of change in the evolution of learning and unlearned preference," Proc. R. Soc. B Biol. Sci., vol. 276, no. 1670, pp. 3201–3208, Sep. 2009, doi: 10.1098/rspb.2009.0602.
[93] G. F. Miller and P. M. Todd, "Exploring Adaptive Agency I: Theory and Methods for Simulating the Evolution of Learning," in Connectionist Models, D. S. Touretzky, J. L. Elman, T. J. Sejnowski, and G. E. Hinton, Eds. San Mateo, CA: Morgan Kaufmann, 1991, pp. 65–80. doi: 10.1016/B978-1-4832-1448-1.50013-5.
[94] M. J. Wells, "Sensitization and the Evolution of Associative Learning," in Neurobiology of Invertebrates: Proceedings of the Symposium Held at the Biological Research Institute of the Hungarian Academy of Sciences (Tihany) September 4–7, 1967, J. Salánki, Ed. Boston, MA: Springer US, 1968, pp. 391–411. doi: 10.1007/978-1-4615-8618-0_28.
[95] G. Razran, Mind in Evolution: An East-West Synthesis of Learned Behavior and Cognition. Boston: Houghton Mifflin, 1971.
[96] R. D. Hawkins and E. R. Kandel, "Is there a cell-biological alphabet for simple forms of learning?," Psychol. Rev., vol. 91, no. 3, pp. 375–391, 1984, doi: 10.1037/0033-295X.91.3.375.
[97] R. D. Hawkins and E. R. Kandel, "Steps toward a cell-biological alphabet for elementary forms of learning," in Neurobiology of Learning and Memory, G. Lynch, J. L. McGaugh, and N. M. Weinberger, Eds. New York: Guilford Press, 1984, pp. 385–404.
[98] M. van Duijn, "Phylogenetic origins of biological cognition: convergent patterns in the early evolution of learning," Interface Focus, vol. 7, no. 3, p. 20160158, Jun. 2017, doi: 10.1098/rsfs.2016.0158.
[99] J. S. Duerr and W. G. Quinn, "Three Drosophila mutations that block associative learning also affect habituation and sensitization," Proc. Natl. Acad. Sci., vol. 79, no. 11, pp. 3646–3650, Jun. 1982, doi: 10.1073/pnas.79.11.3646.
[100] A. C. Roberts and D. L.
Glanzman, "Learning in Aplysia: looking at synaptic plasticity from both sides," Trends Neurosci., vol. 26, no. 12, pp. 662–670, 2003, doi: 10.1016/j.tins.2003.09.014.
[101] M. Gagliano, V. V. Vyazovskiy, A. A. Borbély, M. Grimonprez, and M. Depczynski, "Learning by Association in Plants," Sci. Rep., vol. 6, p. 38427, Dec. 2016.
[102] H. L. Armus, A. R. Montgomery, and J. L. Jellison, "Discrimination Learning in Paramecia (P. caudatum)," Psychol. Rec., vol. 56, no. 4, pp. 489–498, Oct. 2006, doi: 10.1007/BF03396029.
[103] C. T. Fernando et al., "Molecular circuits for associative learning in single-celled organisms," J. R. Soc. Interface, vol. 6, no. 34, pp. 463–469, May 2009, doi: 10.1098/rsif.2008.0344.
[104] T. J. Ord and E. P. Martins, "Evolution of behaviour: phylogeny and the origin of present day diversity," in Evolutionary Behavioral Ecology, D. Westneat and C. W. Fox, Eds. New York: Oxford University Press, 2010, pp. 108–128.
[105] J. B. Losos, "Seeing the Forest for the Trees: The Limitations of Phylogenies in Comparative Biology (American Society of Naturalists Address)," Am. Nat., vol. 177, no. 6, pp. 709–727, 2011, doi: 10.1086/660020.
[106] L. M. Grabowski, D. M. Bryson, F. C. Dyer, C. Ofria, and R. T. Pennock, "Early Evolution of Memory Usage in Digital Organisms," in Artificial Life XII: Proceedings of the Twelfth International Conference on the Synthesis and Simulation of Living Systems, Cambridge, MA, 2010, pp. 224–231.
[107] R. T. Pennock, "Models, simulations, instantiations, and evidence: the case of digital evolution," J. Exp. Theor. Artif. Intell., vol. 19, no. 1, pp. 29–42, Mar. 2007, doi: 10.1080/09528130601116113.
[108] C. O. Wilke, J. L. Wang, C. Ofria, R. E. Lenski, and C. Adami, "Evolution of digital organisms at high mutation rates leads to survival of the flattest," Nature, vol. 412, no. 6844, pp. 331–333, Jul. 2001, doi: 10.1038/35085569.
[109] R. E. Lenski, C. Ofria, R. T. Pennock, and C.
Adami, "The evolutionary origin of complex features," Nature, vol. 423, no. 6936, pp. 139–144, May 2003, doi: 10.1038/nature01568.
[110] S. S. Chow, C. O. Wilke, C. Ofria, R. E. Lenski, and C. Adami, "Adaptive Radiation from Resource Competition in Digital Organisms," Science, vol. 305, no. 5680, p. 84, Jul. 2004, doi: 10.1126/science.1096307.
[111] F. M. Codoñer, J.-A. Darós, R. V. Solé, and S. F. Elena, "The Fittest versus the Flattest: Experimental Confirmation of the Quasispecies Effect with Subviral Pathogens," PLOS Pathog., vol. 2, no. 12, p. e136, Dec. 2006, doi: 10.1371/journal.ppat.0020136.
[112] L. M. Grabowski, W. R. Elsberry, C. Ofria, and R. T. Pennock, "On the Evolution of Motility and Intelligent Tactic Response," in Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, New York, NY, USA, 2008, pp. 209–216. doi: 10.1145/1389095.1389129.
[113] C. Ofria, D. M. Bryson, and C. O. Wilke, "Avida," in Artificial Life Models in Software, M. Komosinski and A. Adamatzky, Eds. London: Springer London, 2009, pp. 3–35.
[114] C. Ofria, C. T. Brown, and C. Adami, Avida. 2015. [Online]. Available: https://github.com/mercere99/Avida-AssociativeMemory
[115] F. C. Dyer, "Bees acquire route-based memories but not cognitive maps in a familiar landscape," Anim. Behav., vol. 41, no. 2, pp. 239–246, Feb. 1991, doi: 10.1016/S0003-3472(05)80475-0.
[116] F. C. Dyer, "Cognitive ecology of navigation," in Cognitive Ecology: The Evolutionary Ecology of Information Processing and Decision Making, Chicago, IL, US: University of Chicago Press, 1998, pp. 201–260.
[117] T. S. Collett and M. Collett, "Memory use in insect visual navigation," Nat. Rev. Neurosci., vol. 3, no. 7, pp. 542–552, Jul. 2002, doi: 10.1038/nrn872.
[118] L. M. Grabowski, "The evolutionary origins of memory use in navigation," PhD Thesis, Michigan State University, East Lansing, MI, 2009. [Online]. Available: https://search.proquest.com/docview/304931513
[119] S. W. Zhang, K.
Bartsch, and M. V. Srinivasan, "Maze Learning by Honeybees," Neurobiol. Learn. Mem., vol. 66, no. 3, pp. 267–282, Nov. 1996, doi: 10.1006/nlme.1996.0069.
[120] S. W. Zhang, M. Lehrer, and M. V. Srinivasan, "Honeybee Memory: Navigation by Associative Grouping and Recall of Visual Stimuli," Neurobiol. Learn. Mem., vol. 72, no. 3, pp. 180–201, Nov. 1999, doi: 10.1006/nlme.1998.3901.
[121] E. L. Charnov, "Optimal Foraging: Attack Strategy of a Mantid," Am. Nat., vol. 110, no. 971, pp. 141–151, 1976.
[122] A. C. Pontes, R. B. Mobley, C. Ofria, C. Adami, and F. C. Dyer, "Data from: The Evolutionary Origin of Associative Learning." American Naturalist, Dryad Digital Repository, Aug. 11, 2019. [Online]. Available: https://doi.org/10.5061/dryad.f45gh6s
[123] R. Hadar and R. Menzel, "Memory Formation in Reversal Learning of the Honeybee," Front. Behav. Neurosci., vol. 4, p. 186, 2010, doi: 10.3389/fnbeh.2010.00186.
[124] G. B. Bissonette and E. M. Powell, "Reversal learning and attentional set-shifting in mice," Neuropharmacology, vol. 62, no. 3, pp. 1168–1174, 2012.
[125] G. Xue, F. Xue, V. Droutman, Z.-L. Lu, A. Bechara, and S. Read, "Common Neural Mechanisms Underlying Reversal Learning by Reward and Punishment," PLOS ONE, vol. 8, no. 12, p. e82169, Dec. 2013, doi: 10.1371/journal.pone.0082169.
[126] A. S. Dunlap and D. W. Stephens, "Reliability, uncertainty, and costs in the evolution of animal learning," Curr. Opin. Behav. Sci., vol. 12, pp. 73–79, Dec. 2016, doi: 10.1016/j.cobeha.2016.09.010.
[127] Z. D. Blount, C. Z. Borland, and R. E. Lenski, "Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli," Proc. Natl. Acad. Sci., vol. 105, no. 23, pp. 7899–7906, Jun. 2008, doi: 10.1073/pnas.0803151105.
[128] R. E. Lenski, "Experimental evolution and the dynamics of adaptation and genome evolution in microbial populations," ISME J., vol. 11, no. 10, pp. 2181–2194, Oct. 2017, doi: 10.1038/ismej.2017.69.
[129] O. S.
Soyer and R. A. Goldstein, "Evolution of response dynamics underlying bacterial chemotaxis," BMC Evol. Biol., vol. 11, no. 1, p. 240, Dec. 2011, doi: 10.1186/1471-2148-11-240.
[130] C. Ofria, W. Huang, and E. Torng, "On the Gradual Evolution of Complexity and the Sudden Emergence of Complex Features," Artif. Life, vol. 14, no. 3, pp. 255–263, May 2008, doi: 10.1162/artl.2008.14.3.14302.
[131] H. H. McAdams, B. Srinivasan, and A. P. Arkin, "The evolution of genetic regulatory systems in bacteria," Nat. Rev. Genet., vol. 5, no. 3, pp. 169–178, Mar. 2004, doi: 10.1038/nrg1292.
[132] N. Kashtan and U. Alon, "Spontaneous evolution of modularity and network motifs," Proc. Natl. Acad. Sci., vol. 102, no. 39, pp. 13773–13778, Sep. 2005, doi: 10.1073/pnas.0503610102.
[133] G. P. Wagner, M. Pavlicev, and J. M. Cheverud, "The road to modularity," Nat. Rev. Genet., vol. 8, no. 12, pp. 921–931, Dec. 2007, doi: 10.1038/nrg2267.
[134] T. D. Johnston, "Selective Costs and Benefits in the Evolution of Learning," in Advances in the Study of Behavior, vol. 12, J. S. Rosenblatt, R. A. Hinde, C. Beer, and M.-C. Busnel, Eds. New York: Academic Press, 1982, pp. 65–106. doi: 10.1016/S0065-3454(08)60046-7.
[135] C. Carbone and G. M. Narbonne, "When Life Got Smart: The Evolution of Behavioral Complexity Through the Ediacaran and Early Cambrian of NW Canada," J. Paleontol., vol. 88, no. 2, pp. 309–330, 2014, doi: 10.1666/13-066.
[136] P. M. Todd and G. F. Miller, "Exploring adaptive agency II: Simulating the evolution of associative learning," in Proceedings of the First International Conference on Simulation of Adaptive Behavior (From Animals to Animats), 1991, pp. 306–315.
[137] E. Izquierdo and I. Harvey, "The Dynamics of Associative Learning in an Evolved Situated Agent," in Advances in Artificial Life, Berlin, Heidelberg, 2007, pp. 365–374.
[138] E. Izquierdo, I. Harvey, and R. D. Beer, "Associative Learning on a Continuum in Evolved Dynamical Neural Networks," Adapt. Behav., vol.
16, no. 6, pp. 361–384, Dec. 2008, doi: 10.1177/1059712308097316.
[139] C. Breazeal, Designing Sociable Robots. Cambridge, MA: The MIT Press, 2004. doi: 10.7551/mitpress/2376.001.0001.
[140] J. Panksepp, Affective Neuroscience: The Foundations of Human and Animal Emotions. Oxford: Oxford University Press, 2004.
[141] A. R. Damasio, Descartes' Error: Emotion, Reason, and the Human Brain. London: Penguin, 2005.
[142] S. Singh, R. L. Lewis, and A. Barto, "Where Do Rewards Come From?," Proc. Annu. Meet. Cogn. Sci. Soc., vol. 31, no. 31, Jan. 2009.
[143] S. Singh, R. L. Lewis, A. G. Barto, and J. Sorg, "Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective," IEEE Trans. Auton. Ment. Dev., vol. 2, no. 2, pp. 70–82, Jun. 2010, doi: 10.1109/TAMD.2010.2051031.
[144] A. B. Bond, A. C. Kamil, and R. P. Balda, "Serial reversal learning and the evolution of behavioral flexibility in three species of North American corvids (Gymnorhinus cyanocephalus, Nucifraga columbiana, Aphelocoma californica)," J. Comp. Psychol., vol. 121, no. 4, pp. 372–379, 2007, doi: 10.1037/0735-7036.121.4.372.
[145] M. Cauchoix, E. Hermer, A. S. Chaine, and J. Morand-Ferron, "Cognition in the field: comparison of reversal learning performance in captive and wild passerines," Sci. Rep., vol. 7, no. 1, p. 12945, Oct. 2017, doi: 10.1038/s41598-017-13179-5.
[146] S. D. Buechel, A. Boussard, A. Kotrschal, W. van der Bijl, and N. Kolm, "Brain size affects performance in a reversal-learning test," Proc. R. Soc. B Biol. Sci., vol. 285, no. 1871, p. 20172031, Jan. 2018, doi: 10.1098/rspb.2017.2031.
[147] A. S. Dunlap, "Biological Preparedness," in Encyclopedia of Animal Cognition and Behavior, J. Vonk and T. Shackelford, Eds. Cham: Springer International Publishing, 2017, pp. 1–7. doi: 10.1007/978-3-319-47829-6_1301-1.
[148] P. Bateson, "How do sensitive periods arise and what are they for?," Anim. Behav., vol. 27, pp. 470–486, May 1979, doi: 10.1016/0003-3472(79)90184-2.
[149] E.
Cashdan, “A sensitive period for learning about food,” Hum. Nat., vol. 5, no. 3, pp. 279–291, Sep. 1994, doi: 10.1007/BF02692155. [150] T. W. Fawcett and W. E. Frankenhuis, “Adaptive explanations for sensitive windows in development,” Front. Zool., vol. 12, no. 1, p. S3, Aug. 2015, doi: 10.1186/1742-9994-12-S1-S3. [151] A. S. Dunlap and D. W. Stephens, “Experimental evolution of prepared learning,” Proc. Natl. Acad. Sci., vol. 111, no. 32, pp. 11750–11755, Aug. 2014, doi: 10.1073/pnas.1404176111. [152] G. E. Budd, “Early animal evolution and the origins of nervous systems,” Philos. Trans. R. Soc. B Biol. Sci., vol. 370, no. 1684, p. 20150037, Dec. 2015, doi: 10.1098/rstb.2015.0037. 195 [153] R. E. Plotnick and K. Koy, “LET US PREY: SIMULATIONS OF GRAZING TRACES IN THE FOSSIL RECORD,” p. 13, 2005. [154] G. E. Budd and S. Jensen, “The origin of the animals and a ‘Savannah’ hypothesis for early bilaterian evolution: Early evolution of the animals,” Biol. Rev., vol. 92, no. 1, pp. 446–473, Feb. 2017, doi: 10.1111/brv.12239. [155] M. L. Droser, L. G. Tarhan, and J. G. Gehling, “The Rise of Animals in a Changing Environment: Global Ecological Innovation in the Late Ediacaran,” Annu. Rev. Earth Planet. Sci., vol. 45, no. 1, pp. 593–617, Aug. 2017, doi: 10.1146/annurev-earth-063016-015645. [156] M. L. Droser and J. G. Gehling, “The advent of animals: The view from the Ediacaran,” Proc. Natl. Acad. Sci., vol. 112, no. 16, pp. 4865–4870, Apr. 2015, doi: 10.1073/pnas.1403669112. [157] C. Carbone and G. M. Narbonne, “When life got smart: the evolution of behavioral complexity through the Ediacaran and early Cambrian of NW Canada,” J. Paleontol., vol. 88, no. 2, pp. 309– 330, 2014. [158] L. A. Buatois and M. G. Mángano, “Ediacaran Ecosystems and the Dawn of Animals,” in The Trace- Fossil Record of Major Evolutionary Events, vol. 39, M. G. Mángano and L. A. Buatois, Eds. Dordrecht: Springer Netherlands, 2016, pp. 27–72. doi: 10.1007/978-94-017-9600-2_2. [159] M. 
Gingras et al., “Possible evolution of mobile animals in association with microbial mats,” Nat. Geosci., vol. 4, no. 6, pp. 372–375, Jun. 2011, doi: 10.1038/ngeo1142. [160] S. D. Evans, I. V. Hughes, J. G. Gehling, and M. L. Droser, “Discovery of the oldest bilaterian from the Ediacaran of South Australia,” Proc. Natl. Acad. Sci., Mar. 2020, doi: 10.1073/pnas.2001045117. [161] A. Seilacher, “Fossil Behavior,” Sci. Am., vol. 217, no. 2, pp. 72–83, 1967. [162] T. Monk and M. G. Paulin, “Predation and the Origin of Neurones,” Brain. Behav. Evol., vol. 84, no. 4, pp. 246–261, 2014, doi: 10.1159/000368177. [163] S. Ginsburg and E. Jablonka, “The evolution of associative learning: A factor in the Cambrian explosion,” J. Theor. Biol., vol. 266, no. 1, pp. 11–20, 2010. [164] B. Hayes, “Computing Science: In Search of the Optimal Scumsucking Bottomfeeder,” Am. Sci., vol. 91, no. 5, pp. 392–396, 2003. [165] R. Gougeon, D. Néraudeau, A. Loi, and M. Poujol, “New insights into the early evolution of horizontal spiral trace fossils and the age of the Brioverian series (Ediacaran–Cambrian) in Brittany, NW France,” Geol. Mag., pp. 1–11, Jan. 2021, doi: 10.1017/S0016756820001430. [166] A. Seilacher, Trace fossil analysis. Berlin: Springer, 2007. 196 [167] R. Richter, “Flachseebeobachtungen zur Paläontologie und Geologie. 9. Zur Deutung rezenter und fossiler Mäander-Figuren,” Senckenbergiana, vol. 6, pp. 141–157, 1924. [168] D. M. Raup and A. Seilacher, “Fossil Foraging Behavior: Computer Simulation,” Science, vol. 166, no. 3908, pp. 994–995, 1969. [169] J. G. Gehling and M. L. Droser, “Textured organic surfaces associated with the Ediacara biota in South Australia,” Earth-Sci. Rev., vol. 96, no. 3, pp. 196–206, Oct. 2009, doi: 10.1016/j.earscirev.2009.03.002. [170] S. Xiao, Z. Chen, C. Zhou, and X. Yuan, “Surfing in and on microbial mats: Oxygen-related behavior of a terminal Ediacaran bilaterian animal,” Geology, vol. 47, no. 11, pp. 1054–1058, Nov. 2019, doi: 10.1130/G46474.1. 
[171] W. Ding et al., “Early animal evolution and highly oxygenated seafloor niches hosted by microbial mats,” Sci. Rep., vol. 9, no. 1, pp. 1–11, Sep. 2019, doi: 10.1038/s41598-019-49993-2. [172] H. Wang et al., “A benthic oxygen oasis in the early Neoproterozoic ocean,” Precambrian Res., vol. 355, p. 106085, Apr. 2021, doi: 10.1016/j.precamres.2020.106085. [173] L. A. Buatois, G. M. Narbonne, M. G. Mángano, N. B. Carmona, and P. Myrow, “Ediacaran matground ecology persisted into the earliest Cambrian,” Nat. Commun., vol. 5, no. 1, p. 3544, Mar. 2014, doi: 10.1038/ncomms4544. [174] O. Hammer, “Computer simulation of the evolution of foraging strategies: application to the ichnological record,” Palaeontol. Electron., 1998, doi: 10.26879/98005. [175] F. Papentin, “A Darwinian evolutionary system: III. Experiments on the evolution of feeding patterns,” J. Theor. Biol., vol. 39, no. 2, pp. 431–445, May 1973, doi: 10.1016/0022- 5193(73)90110-0. [176] F. Papentin and H. Röder, “Feeding patterns: the evolution of a problem and a a problem of evolution,” Neues Jahrb. Für Geol. Paläontol., no. 3, pp. 184–191, 1975. [177] R. E. Plotnick, “Ecological and L-system based simulations of trace fossils,” Palaeogeogr. Palaeoclimatol. Palaeoecol., vol. 192, no. 1–4, pp. 45–58, Mar. 2003, doi: 10.1016/S0031- 0182(02)00678-8. [178] L. M. Grabowski, W. R. Elsberry, C. Ofria, and R. T. Pennock, “On the Evolution of Motility and Intelligent Tactic Response,” in Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, New York, NY, USA, 2008, pp. 209–216. doi: 10.1145/1389095.1389129. [179] E. L. Charnov, “Optimal foraging, the marginal value theorem,” Theor. Popul. Biol., vol. 9, no. 2, pp. 129–136, Apr. 1976, doi: 10.1016/0040-5809(76)90040-X. 197 [180] C. J. Perry, A. B. Barron, and K. Cheng, “Invertebrate learning and cognition: relating phenomena to neural substrate: Invertebrate learning and cognition,” Wiley Interdiscip. Rev. Cogn. Sci., vol. 4, no. 
5, pp. 561–582, Sep. 2013, doi: 10.1002/wcs.1248. [181] R. Menzel, B. Brembs, and M. Giurfa, “1.26 - Cognition in Invertebrates,” in Evolution of Nervous Systems, J. H. Kaas, Ed. Oxford: Academic Press, 2007, pp. 403–442. doi: 10.1016/B0-12-370878- 8/00183-X. [182] J.-M. Devaud, T. Papouin, J. Carcaud, J.-C. Sandoz, B. Grünewald, and M. Giurfa, “Neural substrate for higher-order learning in an insect: Mushroom bodies are necessary for configural discriminations,” Proc. Natl. Acad. Sci., vol. 112, no. 43, pp. E5854–E5862, Oct. 2015, doi: 10.1073/pnas.1508422112. [183] S. Glautier, “Configural Cues in Associative Learning,” in Encyclopedia of the Sciences of Learning, N. M. Seel, Ed. Boston, MA: Springer US, 2012, pp. 759–762. doi: 10.1007/978-1-4419-1428- 6_1777. [184] O. S. Soyer and R. A. Goldstein, “Evolution of response dynamics underlying bacterial chemotaxis,” BMC Evol. Biol., vol. 11, no. 1, p. 240, Dec. 2011, doi: 10.1186/1471-2148-11-240. [185] G. F. Miller and P. M. Todd, “Exploring adaptive agency I: Theory and methods for simulating the evolution of learning,” 1991. [186] R. D. Hawkins and E. R. Kandel, “Steps toward a cell-biological alphabet for elementary forms of learning,” Neurobiol. Learn. Mem. Guilford Press N. Y., pp. 385–404, 1984. [187] H. Bode, S. Berking, C. N. David, A. Gierer, H. Schaller, and E. Trenkner, “Quantitative analysis of cell types during growth and morphogenesis in Hydra,” Wilhelm Roux Arch. Für Entwicklungsmechanik Org., vol. 171, no. 4, pp. 269–285, Dec. 1973, doi: 10.1007/BF00577725. [188] C. Skogh, A. Garm, D.-E. Nilsson, and P. Ekström, “Bilaterally symmetrical rhopalial nervous system of the box jellyfish Tripedalia cystophora,” J. Morphol., vol. 267, no. 12, pp. 1391–1405, 2006, doi: 10.1002/jmor.10472. [189] K. Cheng, “Learning in Cnidaria: A systematic review,” Learn. Behav., vol. 49, no. 2, pp. 175–189, Jun. 2021, doi: 10.3758/s13420-020-00452-3. [190] J. I. Raji and C. J. 
Potter, “The number of neurons in Drosophila and mosquito brains,” PLOS ONE, vol. 16, no. 5, p. e0250381, May 2021, doi: 10.1371/journal.pone.0250381. [191] R. Menzel and M. Giurfa, “Cognitive architecture of a mini-brain: the honeybee,” Trends Cogn. Sci., vol. 5, no. 2, pp. 62–71, Feb. 2001, doi: 10.1016/S1364-6613(00)01601-6. [192] O. J. Loukola, C. Solvi, L. Coscos, and L. Chittka, “Bumblebees show cognitive flexibility by improving on an observed complex behavior,” Science, vol. 355, no. 6327, pp. 833–836, Feb. 2017, doi: 10.1126/science.aag2360. 198 [193] S. R. Howard, A. Avarguès-Weber, J. E. Garcia, A. D. Greentree, and A. G. Dyer, “Numerical ordering of zero in honey bees,” Science, vol. 360, no. 6393, pp. 1124–1126, Jun. 2018, doi: 10.1126/science.aar4975. [194] T. J. McCabe, “A Complexity Measure,” IEEE Trans. Softw. Eng., vol. SE-2, no. 4, pp. 308–320, Dec. 1976, doi: 10.1109/TSE.1976.233837. [195] S. J. Shettleworth, Cognition, evolution, and behavior. Oxford University Press, 2010. [196] T. W. Fawcett and W. E. Frankenhuis, “Adaptive explanations for sensitive windows in development,” Front. Zool., vol. 12 Suppl 1, p. S3, 2015, doi: 10.1186/1742-9994-12-S1-S3. [197] G. S. Hornby, H. Lipson, and J. B. Pollack, “Generative representations for the automated design of modular physical robots,” IEEE Trans. Robot. Autom., vol. 19, no. 4, pp. 703–719, Aug. 2003, doi: 10.1109/TRA.2003.814502. [198] M. Schoenauer, P. Savéant, and V. Vidal, “Divide-and-Evolve: a Sequential Hybridization Strategy Using Evolutionary Algorithms,” in Advances in Metaheuristics for Hard Optimization, P. Siarry and Z. Michalewicz, Eds. Berlin, Heidelberg: Springer, 2008, pp. 179–198. doi: 10.1007/978-3-540- 72960-0_9. [199] R. Albert, “Scale-free networks in cell biology,” J. Cell Sci., vol. 118, no. Pt 21, pp. 4947–4957, Nov. 2005, doi: 10.1242/jcs.02714. [200] T. J. Gibson, “Cell regulation: determined to signal discrete cooperation,” Trends Biochem. Sci., vol. 34, no. 10, pp. 
471–482, Oct. 2009, doi: 10.1016/j.tibs.2009.06.007. [201] Y. Timsit and S.-P. Grégoire, “Towards the Idea of Molecular Brains,” Int. J. Mol. Sci., vol. 22, no. 21, Art. no. 21, Jan. 2021, doi: 10.3390/ijms222111868. [202] A. Kurakin, “Self-organization versus Watchmaker: stochastic dynamics of cellular organization,” Biol. Chem., vol. 386, no. 3, pp. 247–254, Mar. 2005, doi: 10.1515/BC.2005.030. [203] A. Kurakin, “Self-organization versus Watchmaker: ambiguity of molecular recognition and design charts of cellular circuitry,” J. Mol. Recognit., vol. 20, no. 4, pp. 205–214, 2007, doi: 10.1002/jmr.839. [204] J. Collado-Vides, “A syntactic representation of units of genetic information—A syntax of units of genetic information,” J. Theor. Biol., vol. 148, no. 3, pp. 401–429, Feb. 1991, doi: 10.1016/S0022- 5193(05)80245-0. [205] J. Collado-Vides, “Grammatical model of the regulation of gene expression.,” Proc. Natl. Acad. Sci., vol. 89, no. 20, pp. 9405–9409, Oct. 1992, doi: 10.1073/pnas.89.20.9405. [206] D. Bray, “Protein molecules as computational elements in living cells,” Nature, vol. 376, no. 6538, pp. 307–312, Jul. 1995, doi: 10.1038/376307a0. 199 [207] W. Banzhaf, “Artificial Regulatory Networks and Genetic Programming,” in Genetic Programming Theory and Practice, Springer, Boston, MA, 2003, pp. 43–61. doi: 10.1007/978-1-4419-8983-3_4. [208] W. Banzhaf, “On Evolutionary Design, Embodiment, and Artificial Regulatory Networks,” in Embodied Artificial Intelligence, Springer, Berlin, Heidelberg, 2004, pp. 284–292. doi: 10.1007/978-3-540-27833-7_22. [209] W. W. Fischer, J. Hemp, and J. E. Johnson, “Evolution of Oxygenic Photosynthesis,” Annu. Rev. Earth Planet. Sci., vol. 44, no. 1, pp. 647–683, 2016, doi: 10.1146/annurev-earth-060313-054810. [210] J. T. O. Kirk, Light and Photosynthesis in Aquatic Ecosystems. Cambridge University Press, 2010. [211] N. G. Bednarska, J. Schymkowitz, F. Rousseau, and J. 
Van Eldere, “Protein aggregation in bacteria: the thin boundary between functionality and toxicity,” Microbiology, vol. 159, no. 9, pp. 1795– 1806, 2013, doi: 10.1099/mic.0.069575-0. [212] E. J. Stewart, R. Madden, G. Paul, and F. Taddei, “Aging and Death in an Organism That Reproduces by Morphologically Symmetric Division,” PLOS Biol., vol. 3, no. 2, p. e45, Feb. 2005, doi: 10.1371/journal.pbio.0030045. [213] B. S. C. Leadbeater, The choanoflagellates: evolution, biology, and ecology. Cambridge, United Kingdom: Cambridge University Press, 2015. [214] T. Brunet, B. T. Larson, T. A. Linden, M. J. A. Vermeij, K. McDonald, and N. King, “Light-regulated collective contractility in a multicellular choanoflagellate,” Science, vol. 366, no. 6463, pp. 326– 334, Oct. 2019, doi: 10.1126/science.aay2346. [215] T. Brunet and N. King, “The Origin of Animal Multicellularity and Cell Differentiation,” Dev. Cell, vol. 43, no. 2, pp. 124–140, Oct. 2017, doi: 10.1016/j.devcel.2017.09.016. [216] D. M. Bryson and C. Ofria, “Understanding Evolutionary Potential in Virtual CPU Instruction Set Architectures,” PLoS ONE, vol. 8, no. 12, p. e83242, Dec. 2013, doi: 10.1371/journal.pone.0083242. [217] C. Ofria and C. Adami, “Evolution of Genetic Organization in Digital Organisms,” in Evolution as Computation, 2002, pp. 296–313. 200