EVOLUTION OF DECISION-MAKING SYSTEMS

By

Jorden D. Schossau

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computer Science—Doctor of Philosophy
Ecology, Evolutionary Biology, and Behavior—Dual Major

2017

ABSTRACT

EVOLUTION OF DECISION-MAKING SYSTEMS

By

Jorden D. Schossau

Biological and engineered systems are adaptive because they can make decisions. Some systems, such as viruses, use their molecular composition – genetic information – to decide when to become lysogenic (dormant) or lytic (active). Others, such as self-driving cars, must use spatiotemporal information about obstacles, speed, and previous signs to determine when to turn or begin braking. Computational models of such systems allow us both to engineer better systems and to create better scientific understanding of the dynamic world. The practice of modeling decision-making, that is, the study of interactions between rational agents on the spectrum of conflict and cooperation, began with Von Neumann and Morgenstern's Theory of Games and Economic Behavior [Neumann et al., 1944]. Scenarios, called "games", are models designed and studied to increase understanding of conflict and cooperation between these agents. The games discussed here are the Prisoner's Dilemma and the Volunteer's Dilemma. Modern methods of analysis for games involving populations of interacting agents fail to predict the final strategy distribution among all agents. In Chapter 2 I develop a new computational agent-based simulation, used as an inductive study system, to evaluate the deductive predictive capabilities of an analytical model that can predict the final distribution under idealized conditions. Lastly, I show a novel finding: the agent-based model suggests that probabilistic, or mixed, strategies (such as probabilistic gene expression) are a result of the development and maintenance of cooperation in the Volunteer's Dilemma.

Game theory fails to provide tractable models for more complex decision-making situations, such as those with complex spatial or temporal dimensions. In these cases an algorithm of conditional logic may be used to simulate decision-making behavior. Still, there are systems for which the use of an algorithm as a model is inadequate due to incomplete knowledge of the system. Perhaps the model makes too many generalizations, is limited by atomic discretization, or is otherwise incomplete. In these cases it is useful to compensate for deficits by using probabilistic logic. That is, we assume that a stochastic process can roughly describe those subprocesses not fully modeled. Lastly, algorithms as decision strategies can incorporate temporal information in the decision-making process. There are two ways temporal information can be used in an individual's conditional logic: evolutionary and lifetime. The evolutionary approach has proved much more flexible as a means to discover and tune models of unknown decision-making processes. Neuroevolution is a machine learning method that uses evolutionary algorithms to train artificial neural networks as models of decision-making systems. There is currently a wide diversity of neuroevolution methods that all share common structure in the types of problems being solved: generally, cognitive tasks.
Toward this end it would be useful if there were some properties common to all cognitive systems that could be incorporated into the optimizing objective function in order to enhance or simplify the evolutionary process. In Chapters 3 and 4 I explore new methods of improving model discovery through neuroevolution and discuss the applicability of these methods to probabilistic models.

Copyright by
JORDEN D. SCHOSSAU
2017

ACKNOWLEDGMENTS

There are a number of people to whom I am grateful for their influence on my graduate journey. Dr. Chris Adami was my Research Experience for Undergraduates advisor in the summer of 2007 and my graduate advisor beginning in 2009 at Keck Graduate Institute, and he allowed me to follow him to Michigan State University, where I continued under his guidance. In every case Dr. Adami has been an excellent resource and support, allowing me to gain a diverse set of experiences and skills which will certainly be required in my chosen field. The research environment he created was unique in that it was heavily interdisciplinary and encouraged new ideas regardless of their origin. I also thank Dr. Arend Hintze for mentoring me during the same time frame by providing invaluable professional feedback, encouraging the writing process, and helping to maintain a sane working environment. My committee members Dr. Bill Punch and Dr. Charles Ofria have provided much-needed guidance and feedback concerning my degree progress, professional outlook, and dissertation text. As chair of the Department of Computer Science and Engineering, Dr. Eric Torng has been effective and understanding, providing professional opportunities and support for my activities. I am fortunate to have studied at Michigan State University at a time when it had become the epicenter for scientific research and collaboration in my chosen field. The timing was good not only for research ideas, through organizations such as BEACON and EEBB, but also for research infrastructure: computing provided by ICER, and versioning and code hosting provided by IT Services. I thank past and present members of the Adami lab, Ofria lab, and Lenski lab who have provided much-needed feedback. For their camaraderie, friendship, and wealth of experience-based advice I thank Anu Pakanati, Aditi Gupta, Lacey Best-Rowden, Caitlyn Pickens, Neem Serra, Taylor Kelsaw, and Leonie Hintze. For their endless support and encouragement I acknowledge my parents Pam Schossau and Jim Schossau, my sister Carlin Schossau, Tom Trainor, Alita Burmeister, and my aunt and uncle Kathy Stark and Gary Flum.

TABLE OF CONTENTS

LIST OF FIGURES

Chapter 1  Introduction
  1.1 Origins of Modeling Decision-Making
  1.2 Example Games
  1.3 Neuroevolution
    1.3.1 Unification for Advancement
      1.3.1.1 Machine Learning
      1.3.1.2 Unsupervised Learning
      1.3.1.3 Reinforcement learning

Chapter 2  Two New Volunteer's Dilemma Models and Mixed-Strategies as a Result of Selection
  2.1 Abstract
  2.2 Introduction
  2.3 Methods
    2.3.1 New Model
    2.3.2 Mutational Mechanism
    2.3.3 Errors
    2.3.4 Spatial Constraint
  2.4 Conclusion

Chapter 3  Information-theoretic Neuro-correlates Boost Evolution of Cognitive Systems
  3.1 Abstract
  3.2 Introduction
  3.3 Background
    3.3.1 Network-theoretic neuro-correlates
      3.3.1.1 Minimum-Description-Length
      3.3.1.2 Topological Complexity
      3.3.1.3 Connectivity and Sparseness
      3.3.1.4 Representations
      3.3.1.5 Information Integration
      3.3.1.6 Predictive Information
    3.3.2 Complex Environments
  3.4 Methods
  3.5 Results and Discussion
    3.5.1 Augmented selection and its effect on other neuro-correlates
    3.5.2 Neuro-correlate interactions
  3.6 Conclusions
  3.7 Acknowledgments

Chapter 4  The Role of Conditional Independence in the Evolution of Intelligent Systems
  4.1 Abstract
  4.2 Introduction
  4.3 Results
  4.4 Discussion
  4.5 Acknowledgements

BIBLIOGRAPHY

LIST OF FIGURES

Figure 1.1: Replicator-equation and agent-based modeling of the Rock-Paper-Scissors game with a repulsive interior fixed point.
Figure 1.2: Trajectories of the population fraction of the game defined by Eq. (1.3) with a = 1 and b = 0.2.
Figure 1.3: Average trajectories of lines of descent for probabilistic strategies under the "2-genes" and "3-genes" encodings.
Figure 2.1: Predicted payoff for a single cooperator and a single defector in a Volunteer's Dilemma game with group size N = 9 and threshold ζ = 1.
Figure 2.2: Predicted payoffs for a single cooperator and a single defector at every cooperator prevalence, for thresholds ζ = 5, 6, and 7, following [Archetti and Scheuring, 2011].
Figure 2.3: Analytical predictions of the equilibrium fixed point compared with agent-based simulation results across minimum thresholds for public goods generation.
Figure 2.4: Likelihood distributions of observed population mixtures, derived from equation 2.8, for thresholds ζ = 5, 6, and 7.
Figure 2.5: Likelihood of average population mixtures for mixed strategies assuming infinite population size and infinite time.
Figure 2.6: Agent-based evolved populations under uniform and Gaussian mutational distributions.
Figure 2.7: Fixed points predicted by the analytical model and approximated by the average strategy of the evolved agent-based population.
Figure 2.8: Population distributions of agent-based evolved populations under well-mixed and spatially constrained conditions.
Figure 2.9: Spatial visualizations of agent-based evolved populations under well-mixed and spatially constrained conditions.
Figure 2.10: Analytical and agent-based predictions of deterministic-strategy fixed points for well-mixed and spatially constrained conditions.
Figure 2.11: Analytical and agent-based predictions of probabilistic-strategy fixed points for well-mixed and spatially constrained conditions.
Figure 2.12: Group fitness predicted by Equation 2.7 for the Volunteer's Dilemma of group size N = 9.
Figure 3.1: The block-catching task and a depiction of the agent's sensors, brain (internal) neurons, and actuators.
Figure 3.2: Average fitness over 10,000 generations for the block-catching and random number generation tasks under different neuro-correlate augmentations of the fitness function.
Figure 3.3: Number of perfectly performing block-catching agents after 10,000 generations under five experimental conditions.
Figure 3.4: Comparison of different optimization methods at the end of evolution of the block-catching task.
Figure 3.5: Comparison of different optimization methods at the end of evolution of the random number generation task.
Figure 3.6: Effect on neuro-correlates and performance when selecting only on a single neuro-correlate.
Figure 4.1: Examples of gate decomposition into combinations of simpler gates.
Figure 4.2: Average fitness of organisms on the line of descent in the temporal spatial integration and foraging environments.
Figure 4.3: Average performance of agents evolved in the association environment with increasing levels of error punishment.
Figure 4.4: Distributions of performances at the end of evolution for each condition in each of three environments.
Figure 4.5: Average number of gates evolved along the line of descent in a competition experiment in different environments.
Figure 4.6: Average fitness along the line of descent in each environment when both gate types were evolvable.
Figure 4.7: Distributions of structural properties of Markov Brains at the end of evolution in three environments and two conditions.
Chapter 1

Introduction

The simplest form of decision-making is to not make a decision at all. That is, the easiest strategy is a constant one, where the decision, or choice, is not influenced by history or environment and does not change during the lifetime of an organism. In natural systems these strategies typically arise as reactions to slow and gradual environmental changes spanning many lifetimes. The slow nature of these changes relative to the quick lifetimes of the organisms allows the population of organisms to encode successful constant strategies directly into their genomes. In engineered systems these strategies are the simplest to implement and understand, but are the most limited in flexibility.

A slightly more complex form of decision-making is the probabilistic strategy. Instead of always performing the same action as in the constant strategy case, an organism can employ a strategy relying on stochastic processes to "decide" which action to take. This reliance on external stochasticity to constrain choice may be thought of as a type of phenotypic plasticity. It is believed that natural systems exhibit phenotypic plasticity when the predictability of their environment is low compared to the reliability with which a noisy process can match strategy choices to benefits.

The next increase in decision-making complexity is the introduction of conditional logic on salient cues. Probabilistic strategies are already conditional, but they are not conditioned on anything informative. This new type of conditional logic must be predicated on salient cues, or in other words, cues that are informative about conditions or states relevant to the organism. These types of strategies can be better described as algorithms, or sets of rules for a problem-solving behavior. For example, a waiter uses an algorithm to decide guest seating, and a robotic arm uses an algorithm to decide how its many parts move (or don't move) in order to accomplish a high-level operation. With algorithms, we have the ability to describe or model complex decision-making using nothing but simple conditional logic and atomic operations.

Sometimes we are unable to fully model a natural system using an algorithm due to incomplete knowledge of the system. Perhaps the model makes too many generalizations, is limited by atomic discretization, or is otherwise incomplete. In these cases it is useful to compensate for deficits by using probabilistic logic. That is, we assume that a stochastic process can roughly describe those subprocesses not fully modeled. This use of probabilistic logic differs from the probabilistic strategy discussed above in that the entire logic tree does not solely depend on stochasticity for the decision-making process.
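To make these levels of strategy complexity concrete, the following minimal Python sketch contrasts a constant, a probabilistic, and a conditional strategy. It is purely illustrative; the action names and the cue are invented for this example and are not part of any model used later in this dissertation.

```python
import random

def constant_strategy(cue=None):
    """Constant strategy: the same action regardless of history or environment."""
    return "cooperate"

def probabilistic_strategy(cue=None, p=0.3):
    """Probabilistic (mixed) strategy: a stochastic process 'decides' the action."""
    return "cooperate" if random.random() < p else "defect"

def conditional_strategy(cue):
    """Conditional strategy: the action is predicated on a salient cue, i.e.,
    information about a condition or state relevant to the organism."""
    return "cooperate" if cue == "partner_cooperated" else "defect"
```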
Lastly, algorithms as decision strategies can incorporate temporal information in the decision-making process. There are two ways temporal information can be used in an individual's conditional logic: evolutionary and lifetime. The more commonly studied method of including temporal information is through the process of evolution. The strategies themselves do not change how their decisions are made during a lifetime; instead, changes occur within the population's genetic pool over time. Evolution's process of selection changes the strategy distribution across the population. Through random mutation and continual selection the population adapts its pool of strategies to the environmental problem. Lifetime use of temporal information is characterized by an individual's decisions being predicated on previous decisions by that individual or those of other individuals. This sequential lifetime model of using past experiences to shape future decisions has been used to represent memory in intelligent rational agents [Nowak and Sigmund, 1993a].

I begin with a brief description and history of the fields of game theory and neuroevolution related to the present results. Chapter 2 details new models for the Volunteer's Dilemma and proposes a new mechanism of selection for cooperation. Chapter 3 introduces the use of neuro-correlates to bootstrap the evolutionary process of decision-making agents. Lastly, Chapter 4 further explores the issue of how to use information theory to measure the decision networks of a probabilistic decision-making agent.

1.1 Origins of Modeling Decision-Making

The practice of modeling interactions between rational agents on the spectrum of conflict and cooperation began with Von Neumann and Morgenstern's Theory of Games and Economic Behavior [Neumann et al., 1944], focusing mostly on economic utility. Even more abstractly, game theory mathematically describes decision-making within social groups. However, this larger applicability did not advance until John Nash's work provided the concept of equilibrium [Nash et al., 1950, Nash, 1951]. The equilibrium concept spurred a renaissance in game theory, but left some problems to fester. The full potential of the equilibrium concept was not realized until Maynard Smith's Evolution and the Theory of Games [Maynard Smith, 1982], where game-theoretic ideas were applied to population biology and allowed the creation of testable predictions. However, the focus on rationality proved limiting for the field, since ultimately economists wanted to model human decision-making, yet humans often do not make rational decisions. Similarly, insects and animals appear to make rational decisions but without the power of human rationality [Smith, 1974]. Clearly, a broader theory was necessary.

With Maynard Smith's Evolution and the Theory of Games in 1982, the focus changed from rationality determining a game's outcome to the process that determines the outcome. The process of equilibration can be slow, as in biological evolution, or fast, as in social mimicry of behavior. However, the outcome depends not only on the equilibrating speed, but on all the details of the process, such as mutational neighborhood, mutation rate, epistatic interactions, and population size. The details of the evolutionary process can have difficult-to-predict consequences on the path to, and final state of, the game outcome. This opaque nature of evolutionary dynamics makes the discovery of mathematical formalisms for Evolutionary Game Theory sometimes difficult or impossible. Ideally, then, to study Evolutionary Game Theory one should find a model organism that exhibits behaviors of interest and perform experimental evolution to investigate how the costs, benefits, mutation rate, and so on influence population-level behavior. From these direct empirical data one could build a generalizable mathematical model to describe the underlying dynamics, and new scientific understanding would be attained. Unfortunately, the process of evolution is extremely slow compared to a human lifespan.
E. coli is one of the fastest-growing model organisms used for experimental evolution, but even with a doubling time of roughly 20 minutes and typical flask transfer ratios of 1:100, it takes about 20 years to witness evolution over 50,000 generations [Lenski, 2011]. Additionally, many of the necessary conditions for various costs, benefits, mutation rates, etc. are difficult if not impossible to manipulate experimentally.

If the mathematics governing various types of Evolutionary Game Theory is difficult to discover, and biological evolution is too difficult to comprehensively study, then it would be useful to have a study system that can be quickly and easily manipulated to both inform and validate the mathematics, while having emergent properties similar to those of biological study organisms. This is the reason to use computational agent-based modeling to aid in studying Evolutionary Game Theory. Unlike mathematical modeling, agent-based modeling is a type of modeling where each individual in a population and all their interactions are simulated. The simplest historic agent-based model is perhaps the "Game of Life", a cellular automaton, which arose out of the investigation of creating self-replicating machines [Gardner, 1970]. For any agent-based model, only the simple rules governing the behavior of an individual must be defined, from the point of view of the agent, and yet complex population-level effects can still be observed without the need to explicitly define them. For cellular automata, self-replication and complex movement patterns were hailed as emergent behavior arising from very simplistic rulesets. This level of abstraction, paired with computational power, means that 50,000 generations of agent-based evolution of a population passes in about 15 minutes using 2010 compute technology. Many effects can be studied using agent-based modeling, such as automotive or pedestrian traffic congestion, social learning, and predator-prey dynamics [Naiem et al., 2010, Amrutha, 2014, DeAngelis and Mooij, 2005]. The power of agent-based modeling lies in both the fast computational realization of the model and the fact that it shows a cause-and-effect relationship between simple interaction rules and their emergent phenomena. While agent-based modeling shows the cause-and-effect relationship between interaction rules and population-level behavior, it does not provide knowledge that generalizes to other similar systems the way mathematical modeling does. This is why the research presented here uses both agent-based modeling and mathematical modeling together to study Evolutionary Game Theory.

1.2 Example Games

A commonly studied Game Theory game is the Prisoner's Dilemma. The situation can be understood as two thieves who are apprehended without any evidence of their crime. Each thief is privately offered two options: to say nothing or to incriminate their accomplice. Each thief is also told the potential results of their decisions: both stay silent and receive a minimum sentence; both incriminate each other and receive a medium sentence; or one stays silent and is incriminated by the other, in which case the silent one receives a maximum sentence and the defector goes free.

The basics of any Game Theory game, such as the Prisoner's Dilemma, require each player to have a set of strategies from which they must choose one. After all players have chosen a strategy, their selected strategies are compared.
Players then receive rewards or costs (including 0) as a result of how their strategies interact with those of the other players. The resulting cost or benefit of each possible pair of strategy interactions can be written down in matrix form, called a payoff matrix. The payoff matrix can be analyzed and the dynamics of optimal rational play predicted for each player. The payoff matrix for a general two-player game is given by

        C  D
    C ( 0  a )
    D ( b  0 )                                                        (1.1)

where the specific game is classified by the signs of the variables: Prisoner's Dilemma for a < 0, b > 0; Snowdrift for a > 0, b > 0; anti-coordination for a < 0, b < 0; and harmony for a > 0, b < 0 [Hofbauer and Sigmund, 1998, Szabo and Fath, 2007, Adami et al., 2012]. These classes of games are called Zeeman classes, denoted by the four matrices created with the variable sign changes [Zeeman, 1980]. For games with three strategies there are 20 unique Zeeman classes. Consider a common game with three strategies, Rock-Paper-Scissors (RPS), where there is no clear winner. The game may be described by the payoff matrix (in normalized form with zero diagonal)

        R   P   S
    R (  0  -1   1 )
    P (  1   0  -1 )
    S ( -1   1   0 )                                                  (1.2)

Payoff values are interpreted as the payoff the row player receives playing against the column player. To model the stochastic nature of noisy interactions, likelihoods, or interactions not yet fully understood, the mixed Nash equilibrium can describe the outcome. For the RPS game, the mixed Nash equilibrium is the well-known (1/3, 1/3, 1/3). This fixed point is unstable: the trajectories push outward from the central point, and soon enough the population (in the infinite population approximation) moves from all scissors, to all rock, to all paper (a trajectory known as a heteroclinic orbit because it connects the unstable pure-strategy fixed points, see Fig. 1.1a). Such heteroclinic orbits are impossible to sustain for finite populations: in these orbits, once a type is extinct it will not recover (see Fig. 1.1b). In finite populations playing the same game (defined by the payoffs (1.2)) only a single strategy will survive, but which one ultimately remains is random. Note that extinction can happen due to the stochasticity introduced by the finite population even for the RPS game with an attractive interior fixed point (obtained from Eq. (1.2) by, for example, flipping the signs in that matrix) [Claussen and Traulsen, 2008, Adami et al., 2012]. While analytical approaches can also make predictions for finite populations [Traulsen and Hauert, 2009, Nowak and Sigmund, 1993b], they assume an existing clonal population, or a set of predefined strategies, or mutation rates that are quite small. Higher mutation rates, finite population sizes, and heterogeneous strategies all add complexity for an analytical treatment.

Figure 1.1: Replicator equation modeling for the Rock-Paper-Scissors game with a repulsive fixed point at population fractions (xR, xP, xS) = (1/3, 1/3, 1/3). (a): infinite population size (replicator equation), with initial condition at the fixed point. (b): agent-based modeling (finite population size N = 1,024), Moran process (see Box 3), population frequencies plotted every 25 generations. In this case, the population ended up on the fixed point xS = 1.
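As an aside, the two modeling approaches compared in Figure 1.1 can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the RPS payoffs of Eq. (1.2) and a schematic Moran-like birth-death update; it is not the exact simulation used to produce the figure.

```python
import random

# Rock-Paper-Scissors payoffs from Eq. (1.2): rows are the focal strategy, columns the opponent.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def replicator_step(x, dt=0.01):
    """One Euler step of the replicator equation dx_i/dt = x_i (f_i - f_bar)."""
    f = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
    fbar = sum(x[i] * f[i] for i in range(3))
    new_x = [x[i] + dt * x[i] * (f[i] - fbar) for i in range(3)]
    total = sum(new_x)
    return [v / total for v in new_x]

def moran_step(pop):
    """One birth-death update in a finite population: fitness-biased copy, random death."""
    fitness = [1.0 + 0.1 * A[s][random.choice(pop)] for s in pop]  # payoff from one random interaction
    parent = random.choices(range(len(pop)), weights=fitness, k=1)[0]
    pop[random.randrange(len(pop))] = pop[parent]

x = [1/3, 1/3, 1/3]                  # infinite-population fractions, started at the interior fixed point
pop = [i % 3 for i in range(1024)]   # finite population of N = 1,024 pure strategists
for _ in range(10000):
    x = replicator_step(x)
    moran_step(pop)
```

Run this way, the infinite-population fractions remain at the interior fixed point, while the finite population drifts and, as in Figure 1.1b, eventually loses strategies to extinction.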
For pure-strategy interpretations, the prediction is that 1/3 of the population will play each strategy all the time. For the mixed-strategy interpretation, the prediction is that each player will play each strategy probabilistically with equal probability. However, when modeled using an agent-based simulation, even with a population beginning at the mixed Nash equilibrium, the population strategies spiral outward until a fixed point is reached (in this case, xS = 1). Such finite population sizes, and higher mutation rates in particular, create problems for analytical tractability, but are generally handled well with agent-based simulation.

Another example is in the treatment of the evolutionarily stable set. Consider for a moment the two-player game described by the payoff matrix (1.1) in the "snowdrift" regime, where both a, b > 0. Here the concept of an evolutionarily stable strategy (ESS) applies: a single strategy is an ESS if it can thrive against itself and resist invasion by any rare alternative strategy. In this case, none of the "pure" strategies C or D is an ESS, but the mixed strategy M (a probabilistic mixture of the two strategies C and D) would be, with frequencies a/(a + b) and b/(a + b) respectively. The mean payoff of strategy M against the pure strategies C and D can be calculated, so the question may be asked: what happens if the pure strategies C and D play against another strategy that has payoffs identical to the mixed strategy? For example, the mixed strategy M earns b²/(a + b) against C, and a²/(a + b) against D. What are the dynamics when a pure strategy with these exact payoffs is thrown "into the mix"? This can be investigated by studying the payoff matrix

        C             D             M
    C (  0            a             ab/(a + b) )
    D (  b            0             ab/(a + b) )
    M (  b²/(a + b)   a²/(a + b)    ab/(a + b) )                      (1.3)

In this game, there are an infinite number of fixed points that form an evolutionarily stable set (ES set) (see [Weibull, 1995] for great detail on evolutionarily stable sets). This comes about because strategy M may exist at any frequency s, given that C exists at frequency a(1 − s)/(a + b) and D exists at frequency b(1 − s)/(a + b). That is, for any combination of frequencies of the pure strategies C and D, strategy M will persist as a strategy that mimics those frequencies. M is essentially neutral relative to C and D, thus creating infinitely many fixed points where no strategy is better than another. While this stable set of fixed points is attractive, populations are free to evolve anywhere along the line due to random drift once they are there (see Figure 1.2).

Figure 1.2: Trajectories of the population fraction of the game defined by Eq. (1.3) with a = 1 and b = 0.2. The dashed line represents the evolutionarily stable set, while the solid lines show the population trajectories that begin at the solid dots (different initial conditions) and move towards the ES set.

When testing these predictions with agent-based methods, several decisions in designing the simulation can have significant impact on the results. For example, because the probabilities p, q, and r are changed via a discrete mutational process, the nature of that process will affect the population dynamics. If the probabilities are implemented as continuous variables (rather than discretized to a particular resolution such as 0 or 1), then the mutation could be performed by replacing the given probability with a uniform random number ("global" mutations), or by changing the probability either up or down by a random number drawn from a distribution spanning a particular percentage ("local" mutations).
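The following sketch illustrates the two mutation schemes just described, together with one way a "3-gene" update might keep the three probabilities summing to one. The clipping and renormalization here are my own illustrative choices, not necessarily the exact scheme used in the simulations.

```python
import random

def global_mutation(p):
    """'Global' mutation: replace the probability with a fresh uniform random number."""
    return random.random()

def local_mutation(p, step=0.05):
    """'Local' mutation: nudge the probability up or down by at most +/- step,
    clipping so that the value remains a valid probability."""
    return min(1.0, max(0.0, p + random.uniform(-step, step)))

def mutate_three_gene(probs, step=0.05):
    """'3-gene' encoding: locally mutate one locus of (p, q, r), then renormalize so the
    probabilities still sum to one, which necessarily shifts the other two loci as well."""
    probs = list(probs)
    i = random.randrange(3)
    probs[i] = local_mutation(probs[i], step)
    total = sum(probs)
    return [v / total for v in probs]
```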
In the latter case, care must be taken so as to remain within the boundaries of a probability. At the same time, it is not possible to mutate all three probabilities independently, as they must sum to one. Thus, if the underlying genetics of the process were implemented in terms of three loci (one for p, one for q, and one for r), then mutating one locus will necessarily affect both other loci (this is the "3-gene" implementation). If instead the genetics were implemented in terms of two independent loci (say, p and q, the "2-gene" implementation), then the value of the third probability is determined automatically. All these design decisions affect the population dynamics.

Fig. 1.3 shows the average end points of the set of probabilities (p, q, r) on the evolutionary "line of descent". Note that only the average trajectory (averaged over 1,000 trials) starting at the same initial state (p(0), q(0), r(0)) is smooth: each single trajectory itself is jagged. In this run, when probabilities are mutated they are changed by at most ±5% from their current value, which can give rise to significant jumps within the triangle. The trajectories to reach the ES set are shown in Fig. 1.3a for the "3-genes" encoding, where a mutation in one of the loci affects the other two probabilities (as only two of the three probabilities are independent). The trajectories differ when the decisions are encoded in two independent loci (Fig. 1.3b), while the end points are of course the same.

Figure 1.3: (a): Average trajectories of lines of descent for probabilistic strategies using an encoding where each of the two probabilities p and q is encoded independently, thus fixing r (the "2-genes" encoding). (b): Average trajectories of lines of descent for probabilistic strategies with different initial conditions, using the "3-genes" encoding (see text). The dashed line represents the predicted ES set for deterministic strategies shown in Fig. 1.2. Average of 1,000 trajectories per line of descent, obtained from populations with 1,024 agents evolving at a mutation rate of 10% per locus (translating to 30% per chromosome) for 1,000 generations.

1.3 Neuroevolution

Biological brains are incredibly good at integrating relevant information and making decisions given their embodiment and typical environments. How do a brain's synaptic connections change over time? What is the purpose of the brain's layered structure? How can complex motion and behavior be engineered without being designed? These and other such questions have inspired work in the field of Artificial Intelligence since the creation of the programmable computer in the mid 1930s. The computer enabled automation of logic and symbolic tasks via symbolic logic operations. That is, if something could be described symbolically, then it could be automated by a computer. The inevitable idea was that perhaps a brain could be described symbolically, thereby creating an electronic brain, which would allow automation of many varieties of tasks [Bain, 1985]. The first such accepted attempt was presented at Dartmouth College in 1956: Logic Theorist. This program was designed specifically to follow the same types of reasoning humans use when solving mathematical problems. By breaking problems down into generalizable steps with conditional logic, the creators of Logic Theorist were able to automatically prove 38 of the first 52 theorems in the Principia Mathematica.
Since then there have been various efforts at computational automation of a variety of tasks, with the more complex tasks requiring more complex operations, the design of which often surpasses human ability to adequately describe at a symbolic level. It is around this problem of automating seemingly simple but realistically complex tasks that the field of machine learning arose.

Machine learning is the automation of generalization. The general problem is that there exist some examples of data (either a priori or at runtime) for which the correct response is known but the process to arrive at that response is unknown [Guyon et al., 2011]. Not all possible examples are available or iterable, so an appropriate generalization must be learned to handle the unknown variation. That is, there exists an underlying distribution or dimension to the data that is not easy for humans to understand or describe symbolically. It is this underlying description that should be learned in an automated fashion. Examples include diagnosis via biomarkers [Street et al., 1995], handwriting and voice recognition [Sebastiani, 2002], and independent component analysis [Hyvärinen et al., 2004].

Unfortunately, some desirable applications of machine learning may have complex underlying relationships between the input data and the desired responses. This is often the case when automating a complex task performed by a human, such as playing a game or walking. In these cases a special substrate for machine learning called an artificial neural network (ANN) may be used. As the birth of the general field of artificial intelligence was bio-inspired, so too was the concept of the ANN, inspired by biological advances of the time in the field of neurobiology that provided insight into the functioning and structure of the human brain [Rashevsky and Rashevsky, 1960]. After its inception, development in neural networks has been largely driven by two interests: engineering and scientific study. People interested in engineering use neural networks as tools to solve specific problems, usually involving automation of a task. These tasks are either human tasks or superhuman tasks. Those interested in scientific study use neural networks to answer specific questions such as "how do synaptic connections change over time?" or "what is the purpose of the brain's layered structure?" The aims of each interest group are different, but advances in the field of neural networks, and artificial intelligence in general, are driven by contributions from both groups.

Neural networks are particularly well-suited to problem domains where the desired mapping between stimulus and response is too complicated for other machine learning techniques to discover. Robotic control is one such application where neural networks excel. This is because of their ability to receive and operate on continuous data, in real time, and to be robust to noise, in addition to learning the underlying complex relationships among the data. Like many other machine learning methods, ANNs have traditionally had difficulty in several conditions, such as sparse and infrequent data, or when encountering situations or data wildly different from those for which the network has been trained. The problem of data infrequency has since been solved with long short-term memory (LSTM) network layers [Hochreiter and Schmidhuber, 1997] allowing bidirectional persistence: data may persist in the network for long times, and thus training methods can attribute error to earlier data much more easily.
The second problem, of encountering new data, is an unsolved problem of lifetime learning. In robotics such lifetime learning situations might be a damaged wheel, a smudged camera lens, or unexpected or changing terrain. Some efforts have been made to remedy this problem, but the most promising approaches rely on precomputing an enumeration of all control strategies [Cully et al., 2015].

Training ANNs is difficult because of the issue of how to change hidden layer weights based on the correctness of the network output. Research in the field even stagnated for a couple of decades (1960-1980) until back propagation was rediscovered [Bryson, 1975, Werbos, 1974]. However, back propagation is inherently slow and has the additional problem of vanishing error gradients during optimization [Hochreiter, 1998], which is overcome with tricks like deep nets and LSTMs. As a general panacea, biology again provides inspiration for how to tune these networks: evolution. As a result, the field of neuroevolution was created. Neuroevolution uses evolutionary algorithms to train artificial neural networks. Various methods exist for using the general algorithm of evolution (comprising mutation, selection, and inheritance) for the purpose of evolving aspects of ANNs. The differences between these variations lie mostly in the way each biologically-inspired mechanism is utilized. Common differences include the use of particular mechanisms such as ontogenetic vs. phylogenetic adaptation, direct or indirect genetic encoding, mono- or poly-speciation, mono- or poly-deme structure, gene flow, etc. In a different way from LSTMs, the use of evolution avoids the vanishing error gradient problem during training. Typically back propagation is used to place "blame" on specific weights given particular desired outputs. Evolution requires no such statistical technique because selection acts on the entire population of evaluated networks, propagating those with more appropriate weights.

Another problematic aspect of standard network evaluation and modification is the evaluation of the network response itself. Real-valued classification problems provide an easy, almost intuitive way to adjust network weights during training: based on the discrepancy of the network response, the weights may be modified. However, this only works if the network response to each set of inputs is independent. The task becomes difficult when complete sets of responses must be evaluated, such as lifetime behavior that must result in completion or failure of a target goal. When complex behavior is required of the neural network output, such as when neural networks are used as neuro-controllers, it becomes much more difficult to judge the network's performance by a single response. Instead, the behavior of the network may be observed over a range or set of inputs. This complex behavior, such as playing checkers or walking, then has clear measurable performance relative to other variations of the same network, like winning a game of checkers or controlling a robot to travel the farthest distance. In this way an evolutionary algorithm allows optimization of artificial neural networks performing complex behavior when other methods fail.

There are many evolutionary approaches in machine learning. The most useful methods are those that succeed in a wide range of problem domains. Approaches such as the ES-HyperNEAT algorithm [Risi and Stanley, 2011], indirect encoding, or Modular NEAT [Suchorzewski and Clune, 2011] aid with general problem domain representation and more efficient subspace searching.
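As a rough illustration of the mutation-selection-inheritance loop that these neuroevolution variants share, consider the minimal sketch below. The toy fitness function is a stand-in invented for this example; in practice it would score lifetime behavior such as blocks caught or distance walked, and real methods differ in encoding, speciation, and selection scheme as described above.

```python
import random

def make_network(n_in=3, n_hidden=4, n_out=2):
    """A tiny fixed-topology network, represented directly as a flat list of weights."""
    n_weights = n_in * n_hidden + n_hidden * n_out
    return [random.gauss(0.0, 1.0) for _ in range(n_weights)]

def mutate(weights, rate=0.1, scale=0.3):
    """Mutation: Gaussian perturbation applied to a fraction of the weights."""
    return [w + random.gauss(0.0, scale) if random.random() < rate else w for w in weights]

def evaluate(weights):
    """Placeholder fitness: any scalar score of the network's behavior works for selection."""
    return -sum(w * w for w in weights)  # toy objective for illustration only

population = [make_network() for _ in range(100)]
for generation in range(200):
    ranked = sorted(population, key=evaluate, reverse=True)
    survivors = ranked[:20]                                              # selection
    population = [mutate(random.choice(survivors)) for _ in range(100)]  # inheritance + mutation
```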
Aside from the approaches currently in use, the future holds many new opportunities for the field of neuroevolution. One such area is hybrid computation, that is, the simultaneous use of neuroevolution with something alive. This is not a grafting, but rather a bidirectional feedback system where something already highly evolved informs the rapid evolution of an artificial neural network via a feedback interface. In a very similar way, it should also be feasible to combine artificial neural network evolution with the growth of biological neural networks in vitro [DeMarse et al., 2001].

Other approaches outside of evolution may also be of use. These approaches tend to subtly improve upon the original foundations inspired by biology. One approach outside of evolution that could be useful to explore for neuroevolution is problem decomposition at the level of entire networks. That is, to evolve networks for lower-order tasks suspected of supporting a higher-level modular design, and then to evolve a higher-order network based on those smaller ones. This is a form of task decomposition, similar to [Lu and Ito, 1999, Reisinger et al., 2004]. Along this vein of research an automated approach would be useful. Cooperative coevolution may help with division of labor, similar to [Goldsby et al., 2009]. It is a contentious topic because some believe modularity is key to higher-level performance and efficiency, while others believe that it adds deceptiveness to fitness landscapes, reducing the achievement of good results [Bullinaria, 2006].

It is interesting to note a trend in the engineering style of approach to machine learning, which is to develop mathematically-inspired solutions to problems. This is in contrast to scientific study in the same field, which tends to develop biologically-inspired solutions. These solutions then become adopted for new engineering applications due to their numerous advantages in handling complex tasks and behaviors. The authors of the successful HyperNEAT network clearly stated their non-problem-specific scientific intent: "...the goal in this work is not to produce the best possible checkers player but to analyze the properties of winning solutions by examining how they exploit geometry and the implications of such exploitation." [Gauci and Stanley, 2010a] Both engineering and scientific approaches benefit from one another, but engineered approaches tend to focus on specific problem optimization, whereas scientific study approaches tend to discover generalizable improvements, largely benefiting from powerful deductive models created with rigorous hypothesis testing and inductive generalization.

1.3.1 Unification for Advancement

It is commonly argued that there are in fact three major categories of machine learning: Supervised, Unsupervised, and Reinforcement Learning. Supervised learning is the most tractable form, where a corpus of input-output pairings is used to train a tunable model through guided trial and error. Unsupervised learning is used for problems where the structure or pattern of data must be discovered, and there is no explicit desired output [Qiu and Sapiro, 2015]. Reinforcement learning is used when, at all intermediate time points, the exact output desired is not known, but the final result of all actions has an unambiguous desired outcome.
This phenomenon is not unlike the early years of game theory, when every new extension and exception was developed to overcome limitations for specific applications, and these all became new methods with their own trade-offs and considerations, ultimately being unified under a more flexible theory of evolutionary stability and equilibrium. Indeed, it is even accepted that current artificial intelligence research must embrace an "anarchy of methods" because there is no clear way forward [Lehman et al., 2014]. These differentiated groupings of methods in ML confuse matters, make comparisons opaque, and further encumber progress in the field of artificial intelligence. Therefore, for the purpose of this text I will reorganize these into a more coherent structure with a unified rationale and outline the benefit of doing so.

1.3.1.1 Machine Learning

Machine learning is the practice of tuning a process model to produce desired output given examples of what that output should be. If no output is desired, then there is no way to guide model tuning, so all ML methods rely on providing some form of desired output. This output may be in the form of an exact output, or perhaps an objective function when there is an allowed tolerance for variation. Supervised learning trivially falls under this description, so I will focus on the other two categories.

1.3.1.2 Unsupervised Learning

Unsupervised learning encompasses clustering and rule association. Clustering in an ML context is the tuning of a model to assign labels to data, maximizing or minimizing some objective function. Typically the objective is a multi-objective function involving minimizing the number of clusters while maximizing the variance explained by the clustering, leading to the well-known "knee" trade-off point, where further grouping results in diminishing returns on explanation of variation. With this view of clustering, the ML approach is blatantly substrate-neutral, allowing either hand-coded k-means clustering algorithms or general NN algorithms to be used, for example. The only defining feature is that the output must be a nominal labeling. Rule association is quite similar to clustering. Logical statements (or trees of statements) that generalize relationships between the data are tuned according to a multi-objective function, such as minimizing parameters while maximizing the variance explained by a proposed rule. This view of rule association provides the same benefits as the clustering view: substrate-neutrality and an ease of sharing approaches across other historically disparate ML methods.

1.3.1.3 Reinforcement learning

The new reorganizations I have discussed so far rely on providing an explicit desired output to an ML method, or providing an objective function through which an explicit desired output may be automatically derived. Either way, knowledge of, or a method to obtain, an explicit desired output is necessary. Reinforcement learning is no different. In reinforcement learning the model being tuned usually integrates temporal information as part of its computation, focusing on long-term learning and exploration as key differentiators. For this reason, environmental context and previous actions play large roles in the model's generation of output, but these are only more types of data being processed and there is nothing special about them. Example problem domains include movement controllers, elevator scheduling, and various board games.
Problems typically tackled with reinforcement learning have training data too large or expensive for batch learning, so an on-line approach is favored, but this is only a change of the timescale for evaluations and updates. Take, for example, the example Barto uses in [Dietterich, 2004] of a person attempting to maximize a handheld radio signal. At each location, the person observes a better or worse signal that gives no indication of where to move next, unless previous information made available through memory is also considered. Their goal is to maximize signal strength over all possible positions. The two key traits of a successful algorithm here are long-term memory and exploration, where the only feedback is a signal strength. Barto claims that supervised learning of the same task, were it to be attempted, would give location and signal strength data to the algorithm as a training corpus. This would work, but it would require many if not all signal strength-location pairs to be known, there would be no exploration, and large spatial domains would quickly become problematic. Reinforcement learning solves these problems by allowing feedback of the difference between resulting and desired outputs and retaining state data over spans of time. That is, a single attempted output results in a single feedback of how good that output was. But this sounds very much like supervised learning, which for a single output receives a single feedback, only there are more "guts" to the internal model being tuned and the model chooses which subset of data on which to train. I contend that supervised learning, unsupervised learning, and reinforcement learning are all forms of the same learning that require evaluation of model output relative to desired output.

There is another way to frame this argument using the topic known as the symbol grounding problem. The symbol grounding problem is that embodied cognitive systems may have any number of internal symbols used internally for logic by some rule, but the result of that logic and symbol manipulation must be meaningful in the system's environment. In other words, the "problem" is that there must be some mechanism to translate the cognitive (model) symbols representing concepts in the world to the actual things in the world (which is why, among other reasons, the cognitive system must be an embodied one). Because of this, practitioners using ML techniques will always be responsible for performing the "embodiment," or translation from the model result to the problem domain. When performing this function of embodiment, the ML practitioner is indicating to the model the desired final output, either directly or as a measure of deviation from the desired output.

With unification of these ML approaches, similarities and differences between methods become clearer. An LSTM network used to learn a reinforcement learning task could also learn a simple supervised learning task. The LSTM network structure would be overkill for the simpler task, but it would work despite the lack of temporal context, although the experimenter may need to reset the network after each data presentation because there should be no context. The one problem many of these approaches suffer is the definition of layers and features. A deep network of 15 layers will require more computational updates to produce a single output than a 3-layer network learning the same task.
In addition to knowing how many updates to run, the experimenter must design the network layer count, width, input quantity, and feature identity, among other properties. It is possible neuroevolution can address these issues, and some have already been addressed by a subset of researchers. Some existing methods begin with a well-connected network and make the network more efficient by modifying the topology [Attik et al., 2005], others create novel topologies and weights [Rocha et al., 2005] through a kind of Lamarckian evolution, and more recent work has brought evolution to deep NNs once thought to be impossible to evolve, giving performance rivaling that of hand-tuned networks [Miikkulainen et al., 2017]. With these recent advances, neuroevolution is enabling researchers to overcome many problems historically plaguing machine learning, and is doing so for all kinds of problem domains. The most promising methods are those that allow evolution of network topology, weights (if applicable), salient features, and hidden states (recursiveness or 1-output runtime). For this reason the included neuroevolution chapters 3 and 4 highlight an alternative technology called the Markov Network Brain (MNB). These networks have evolvable topologies, allow the use of many hidden states to create recursion, and can evolve to use any number of the provided features. See [Edlund et al., 2011] for a comprehensive discussion of the functioning of MNBs.

Neuroevolution is increasingly being used to discover and optimize cognitive systems [Mnih et al., 2016, Bojarski et al., 2016, Zhang and LeCun, 2015]. Toward this end it would be useful if there were some properties common to all cognitive systems that could be incorporated into the objective function in order to enhance the evolutionary process. Chapter 3 investigates this idea with several spatial and temporal tasks using information-theoretic analyses. Chapter 4 then probes the issue of how information-theoretic analyses are influenced by the optional probabilistic design of these MNB systems.

Chapter 2

Two New Volunteer's Dilemma Models and Mixed-Strategies as a Result of Selection

2.1 Abstract

Volunteer's Dilemma models are often developed to predict fixed points. These fixed points represent singular average cooperation ratios among populations of players, and thus predict what the dominating strategy or mix of strategies will be. However, the final evolved population may be in any of an infinite number of configurations, all of which share the same fixed point. It is yet unknown how to analytically predict the final population distribution from only standard game parameters such as group size, cost, and benefit. Here I validate a new analytical model that can describe the final evolved population distribution. To do this I develop a new computational agent-based simulation used as an inductive study system against which to compare the deductive predictive capabilities of the analytical model. Lastly, I show a novel finding that the agent-based model suggests probabilistic, or mixed, strategies (such as probabilistic gene expression) are a result of the evolution and maintenance of cooperation in Volunteer's Dilemma.

2.2 Introduction

Like all other games within Evolutionary Game Theory, Volunteer's Dilemma (VD) is a model about the conflict between the choice of cooperating and defecting. In this game each player of a group has the option to pay a small cost so that a public good will become available to all. For instance, a group of bystanders observe a crime.
The public good is to have a safer neighborhood with the criminal behind bars. The only way to achieve that benefit is to volunteer and report the crime; however, reporting carries an emotional and temporal cost. If a volunteer steps forward then they are assured they will benefit from their own actions, but their effort was wasted if any other bystanders also volunteered, as the volunteer could have been a free-rider benefiting from the self-sacrifice of others. In each application of VD there is a different threshold of volunteers needed before the public good is made available. The phenomenon is not unlike citizens paying taxes for roads and education, or environmental stewardship for the preservation and enjoyment of natural areas and species, because if too few sacrifice for the public good then no one benefits. This is the Volunteer's Dilemma.

Real-world systems modeled by the Volunteer's Dilemma need not be macro systems involving human players. Large groups of animals are well known to engage in predator warning behavior for group benefit, such as vervet monkeys warning the group of terrestrial or aerial predators. Producing a warning call is costly to the individual vocally broadcasting their own location, as the predator is more easily able to home in on that single prey [Searcy and Nowicki, 2005, Olson et al., 2013a].

Even groups of small organisms may engage in a Volunteer's Dilemma. Vibrio cholerae cells are known for their pathogenic effect when colonizing the human gut. In groups, these cells can produce biofilms, or layers of cells, held together with self-produced structural polymers. Once the cells are in a nutrient-rich location, their population size begins doubling, and the polymers they create in aggregate form a stabilizing extracellular matrix holding the cells in place against turbid and uncertain environmental conditions [Nadell et al., 2016]. Vibrio cholerae cells produce a chitinase that breaks down chitin, such as that in the cell walls of fungi and the exoskeletons of some animals, releasing soluble oligomers ripe for catabolism that diffuse through the population. Polymer production is a costly practice, and so through the process of evolution Vibrio cholerae will produce a subset of the population that does not produce the polymer yet still consumes the diffusing oligomers. As such, these non-producers are non-cooperators, or defectors, and their competition with the producers creates a Volunteer's Dilemma where a lack of enough producers becomes self-defeating.

Many previous efforts have been made to model and understand the game of Volunteer's Dilemma using an evolutionary game theoretic perspective [Archetti and Scheuring, 2011, He et al., 2014]. These models all assume some variation on the following basic game. Assume a large constant well-mixed population of size Ppop. In each successive generation Ppop/N groups of N members are formed at random, each group plays the public goods game, then groups are disbanded. Players pay a cost c if they cooperated toward the public good, and players gain a benefit β if there were enough cooperators to trigger availability of the public good. Before the next generation a small fraction of the population is clonally replaced from an ancestor with likelihood equal to the ancestor's relative fitness. All previous models focused on prediction of the population average, which is a single point.
However, once the population of mixed-strategies has reached a stable equilibrium, the population may be in any of an infinite number of distributions that satisfy the predicted average. There is currently a lack of theory to predict the strategy distribution of the entire final population for mixed-strategy games.

Archetti et al. proposed an analytical model to estimate the fixed points for pure-strategy games [Archetti and Scheuring, 2011], described by the payoff equations for cooperators and defectors given the various configurations of cooperation among the N − 1 other players:

W_C = \sum_{j=0}^{N-1} f(j) B(j) - c    (2.1)

W_D = \sum_{j=0}^{N-1} f(j) B(j)    (2.2)

where B(j) is the benefit given (β) if the minimum threshold is met with j cooperators, and 0 otherwise, and f(j) is the probability that the public good is produced by exactly the number of cooperators needed to meet the threshold ζ requirement regardless of the current player's choice. This is described by

f(j) = \binom{N-1}{j} P_C^{j} (1 - P_C)^{N-1-j}    (2.3)

and the equilibrated fixed point can be described as

\frac{c}{b} = \binom{N-1}{\zeta-1} P_C^{\zeta-1} (1 - P_C)^{N-\zeta}    (2.4)

where N is the group size and ζ is the minimum number of cooperators needed to produce the public good. P_C is the frequency of cooperators, who pay a cost c in contributing to the public good, and b is the benefit rewarded to each player in the group if the public good is produced. The model is based on assuming the fraction of cooperators is P_C and calculates the change in population fractions by taking the difference of the expected payoff for cooperators and defectors at the assumed population fraction. The probability that at least ζ players cooperate is

P(j) = \sum_{j=\zeta}^{N} \binom{N}{j} \left(P_C^{(s)}\right)^{j} \left(1 - P_C^{(s)}\right)^{N-j}    (2.5)

where P_C^{(s)} is the stable fixed point of cooperators, which can be calculated. If the public good is achieved, then each group member receives the benefit β according to a smooth function

B(j) = \frac{1}{1 + e^{-s(j-\zeta+1)}}    (2.6)

which describes a saturating benefit relative to the number of cooperators. Now the average payoff for the entire group can be calculated as

W_{mix} = \sum_{j=1}^{N} \left[ P(j) W_C + (1 - P(j)) W_D \right]    (2.7)

This system of equations predicts fixed points not only for pure-strategy games, but also mixed-strategy games. As Archetti et al. noted, their model works well for pure-strategy polymorphic games where each player adheres to either cooperation or defection. For the pure-strategy game the fixed points represent fractions of pure-strategy cooperators in the equilibrated population. They also noted the model describes mixed-strategy games, but only those in which the population of mixed-strategies is monomorphic, where all players cooperate with identical probabilities. In this case the fixed point represents the evolved probability for cooperation shared by all players in the final population.

To address the mixed-strategy polymorphic case, the authors in [Archetti and Scheuring, 2011] discuss in the appendix an adaptation of their model to provide fixed points for the mixed-strategy polymorphic VD game. They provide an argument that the dynamics are identical to the monomorphic case, and thus they claim no new model needs to be developed to describe the mixed-strategy polymorphic game. From a biological perspective it seems possible that stabilities and instabilities may arise from the struggle of a heterogeneous population faced with either a new or ever-present contending phenotype. It could also be that fluctuations due to smaller population sizes would have a large effect.
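As a concrete illustration, the payoff curves and fixed point of equations 2.1–2.4 can be evaluated numerically with a few lines of code. This is only a sketch under one reading of the benefit term, in which a focal cooperator's own contribution counts toward the threshold (which reproduces the flat cooperator payoff described for Figure 2.1 in the next section); the function names and scanning resolution are illustrative assumptions.

```python
# Hedged sketch of the payoff calculations in equations 2.1-2.4, assuming a
# hard threshold and that a focal cooperator's own contribution counts toward
# meeting that threshold.
from math import comb

def f(j, n, pc):
    """Probability that exactly j of the other n-1 players cooperate (eq. 2.3)."""
    return comb(n - 1, j) * pc**j * (1 - pc)**(n - 1 - j)

def payoffs(pc, n=9, c=0.4, beta=3.0, zeta=1):
    """Expected payoff for a focal cooperator (eq. 2.1) and defector (eq. 2.2)."""
    wc = sum(f(j, n, pc) * (beta if j + 1 >= zeta else 0.0) for j in range(n)) - c
    wd = sum(f(j, n, pc) * (beta if j >= zeta else 0.0) for j in range(n))
    return wc, wd

def stable_fixed_point(n, c, beta, zeta, steps=10000):
    """Locate the upper crossing of the payoff curves (cf. eq. 2.4) by scanning."""
    crossings, prev = [], None
    for i in range(steps + 1):
        pc = i / steps
        wc, wd = payoffs(pc, n, c, beta, zeta)
        sign = wc - wd > 0
        if prev is not None and sign != prev:
            crossings.append(pc)
        prev = sign
    return max(crossings) if crossings else None

# Parameters of Figure 2.1: the two payoff curves cross near P_C = 0.22.
print(stable_fixed_point(n=9, c=0.4, beta=3.0, zeta=1))
```

For the ζ = 1 game this scan returns a crossing near P_C ≈ 0.22, consistent with the crossing point discussed below.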
2.3 Methods

Figure 2.1: Predicted payoff for a single cooperator (blue) and a single defector (red) in a VD game with group size N = 9, cooperation cost c = 0.4, public good β = 3, and threshold ζ = 1, using equations 2.1 and 2.2 respectively. As only 1 cooperator is required to produce the public good, the expected payoff for this cooperator does not change with respect to changes in cooperator population density, but the viability of the defection strategy depends on the density of cooperators.

To test the claim that the Archetti model can predict fixed points for the mixed-strategy polymorphic game, I compared the results of their model with those of an agent-based mixed-strategy polymorphic simulation. In this simulation each discrete player is a member of an evolving population of 1,024 players, so among all members those who do best will represent more of the population in the next generation, with some variation of individuality due to mutation. This generational turnover is performed by a Moran process with random death and fitness-dependent reproduction keeping the population size constant, and a small chance of mutation to introduce new strategy mixes. Each player is individually defined by a probability for cooperation, the converse of which defines their probability for defection. For every new generation random groups of size N are formed, the public goods game is played with cooperators paying a cost and all players benefiting if the public good is generated, and finally the group is disbanded before the selection and reproduction step.

The proportion of strategies in any game theoretic scenario will usually create selection pressure for a shift in strategy frequency within the population. After many generations of evolution, the population will either enter a limit cycle where the population fluctuates in an infinitely repeating pattern, or settle into a stable equilibrium where no more change will occur without perturbation. In the standard Volunteer's Dilemma there will never be a limit cycle, so the population will always reach some form of equilibrium between the strategy types. Using equations 2.1 and 2.2 with a threshold of ζ = 1 for a simple scenario and setting a proportion of cooperation PC in the population, the relative average payoff for each of the cooperative and defective strategies can be predicted. By itself the average payoff is not very interesting, but it becomes informative when compared to payoffs predicted for incremental changes in population proportion (see Figure 2.1). The resulting trend yields a slope, which corresponds to the selection pressure for that strategy given the prevalent proportion of cooperation in the population. However, the change in population proportions as a result of selection pressure may be at odds with the population proportions required to produce such high payoffs.

Figure 2.2: As predicted by [Archetti and Scheuring, 2011], average payoff for a single cooperator (blue) and a single defector (red) for every possible prevalence of cooperators for the VD game with group size 9, cost of cooperation 0.4, public good benefit 3, and ζ = 5 (A), ζ = 6 (B), ζ = 7 (C).
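Before examining those payoff curves in detail, the agent-based simulation just described can be compressed into a short sketch. The group-assembly bookkeeping, the Moran update schedule, and the mutation operator shown here are simplifying assumptions rather than the actual implementation, and the run length is shortened for illustration.

```python
# Compressed sketch of the agent-based Volunteer's Dilemma simulation:
# random groups, a threshold public goods game, and Moran-style turnover.
import numpy as np

rng = np.random.default_rng(1)

def play_generation(strategies, n=9, zeta=4, cost=0.4, beta=3.0):
    """One round of VD games in random groups; returns each player's payoff.
    The last (possibly short-handed) group simply plays as-is in this sketch."""
    order = rng.permutation(len(strategies))
    payoff = np.zeros(len(strategies))
    for g in range(0, len(order), n):
        group = order[g:g + n]
        acts = rng.random(len(group)) < strategies[group]    # True = cooperate
        produced = acts.sum() >= zeta
        payoff[group] = (beta if produced else 0.0) - cost * acts
    return payoff

def moran_events(strategies, payoff, n_events=100, mu=0.005, sigma=0.05):
    """Random death, fitness-proportional birth, and a small mutation chance."""
    w = payoff - payoff.min() + 1e-9                 # shift payoffs to be positive
    p = w / w.sum()
    for _ in range(n_events):
        dead = rng.integers(len(strategies))
        parent = rng.choice(len(strategies), p=p)
        child = strategies[parent]
        if rng.random() < mu:                        # Gaussian mutation sketch
            child = np.clip(child + rng.normal(0.0, sigma), 0.0, 1.0)
        strategies[dead] = child

strategies = rng.random(1024)                        # one P_C per player
for update in range(5000):                           # shortened, for illustration
    payoff = play_generation(strategies)
    moran_events(strategies, payoff)
print("mean evolved cooperation probability:", round(strategies.mean(), 3))
```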
The stable fixed point, represented by the higher crossing point of the payoff curves, shifts toward more cooperation for games with higher public good thresholds, and is predicted even for games with a high threshold of 7. For instance, a single defector in a well-mixed VD population of volunteering cooperators is very likely to receive a payoff from the invested cooperation of any random group of players. However, any increase in prevalence of the defector strategy necessarily decreases the likelihood that any defector will play in a group of benevolent strategies, which is the same as decreasing the likelihood of a payoff. This trade-off is visually identified in Figure 2.1, showing average payoff for a single cooperator and single defector in a game with a low cooperation threshold for the public good of ζ = 1. Near a cooperative population proportion of 0.22, a defecting player would achieve higher average payoff in a population comprised of slightly more cooperators. This selection has two opposing forces: defectors gain larger payoff the more likely they are to encounter cooperators, but selection will ensure the high-payoff players (defectors in this case) will reproduce proportionally more into the next generation, decreasing the average payoff for the next generation of defectors.

Each crossing of the predicted payoff curves in Figure 2.2 indicates an equilibrium fixed point. In each panel there are two such points that can be analytically predicted using equations 2.1 and 2.2. For low prevalence of cooperators (low PC) cooperators always perform terribly, or at best equivalently. As the expected prevalence of cooperation rises, cooperators are the quickest to benefit, and in panel A cooperation becomes the highest-paying strategy near PC = 0.3, which would select for the population to produce more cooperators. This trend ends near PC = 0.7, above which there are so many cooperators that free-riding, or defecting, is a much more lucrative strategy. This shift of highest-paying strategy changes selection pressure to favor defection. It is this crossing point, where the payoff curves for WC and WD cross at PC = 0.7, that is the predicted stable fixed point, and the other crossing at PC = 0.3 is an unstable fixed point.

Figure 2.3: Analytical predictions of the equilibrium fixed point (dashed line) compared with our agent-based simulation results (solid line) for a variety of minimum thresholds for public goods generation ζ, for the VD game of group size 5 (A) and group size 9 (B), cooperating cost of 0.4, public good benefit of 3, and µ of 0.005. Bootstrapped 95% confidence intervals shown but not discernible.

The upper point discussed already is known as an attractive fixed point because perturbations in the population will eventually resolve back to the fixed point given enough time to equilibrate. The lower fixed point is known as an unstable, or repulsive, fixed point. If the population is exactly at this lower fixed point, then the average payoff for each strategy is identical. If selection and reproduction are equally fair to all equivalent-payoff strategies, then the population strategy demographics do not change over evolutionary time or any time. However, evolution is a stochastic process, so it is highly unlikely the population will remain exactly at this point.
Any small change in population demographics results in differential payoff between strategies, kickstarting the process of selection toward the attractive fixed point. There are two other fixed points for all VD games: a stable attractive fixed point at PC = 0 and an unstable fixed point at PC = 1.

Comparing these analytically determined fixed points to an agent-based simulation (ABS) allows validation of the equations. Figure 2.3 shows how simulation results marginally differ for the mixed-strategy version of the same game. The results appear identical for nearly all thresholds, but differ at the highest thresholds: ζ ≥ 4 (panel A) in the case of group size 5, and ζ = 8 (panel B) for group size 9. These results, showing only a marginal discrepancy from the analytical method, suggest the previous claim to be true. That is, the pure-strategy model proposed by Archetti et al. correctly predicts fixed points for mixed-strategy games, as the authors claim. Any new models extending this model can be assumed to work similarly if the fundamental predictions have not changed. I will now show the validation for a new theory that can predict not only the fixed points for mixed-strategy polymorphic games, but the final evolved population distributions around those fixed points as well.

2.3.1 New Model

Previous models described the definite population equilibrium point assuming an infinite population and infinite time to equilibrate. The model below makes the same assumptions:

L_{P_C} = e^{-\phi(P_C)/\mu}    (2.8)

where L is the probability distribution of strategies with cooperative probability PC, which is a solution to the Fokker-Planck equation associated with the stochastic replicator equation at mutation rate µ, and

\phi(P_C) = -\frac{\beta \zeta (N + 1 - \zeta)}{N^3 + 3N^2 + 2N} \sum_{k=\zeta}^{N} \binom{N}{k} P_C^{k} (1 - P_C)^{N-k} + \frac{c}{2} P_C^{2} - \frac{c}{3} P_C^{3}    (2.9)

where PC is the proportion of cooperators, β is the benefit of the public good produced, ζ is the threshold or minimum number of cooperators required to produce the public good, N is the group size, k is the number of cooperating players, and c is the cost paid by an individual for cooperating (see private communication with Christoph Adami for the derivation and equation origin). Equation 2.8 with equation 2.9 shares some structure with the model proposed by Archetti et al. but is extended by inspiration from the Fokker-Planck equation. This extension allows the equation to predict not only a final point of stability between strategy ratios, but how a population distribution of mixed-strategies will change over time. In other words, it gives the likelihood of observing each strategy given an infinite population guaranteed to contain all strategies and infinite time for that population distribution to shift according to the selection gradient and evolutionary stochasticity.

Figure 2.4: Distributions representing the likelihood of observing population mixtures, derived from equation 2.8. All panels show predictions for the VD game of N = 9, c = 0.4, β = 3, and µ = 0.005, but ζ = 5 (A), ζ = 6 (B), and ζ = 7 (C). For simple predictions there is only one peak (A), but as the game becomes more difficult with higher ζ (B) a second peak may form showing viability of defection (C).
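Assuming the form of equations 2.8 and 2.9 given above, the curves of Figure 2.4 can be sampled on a grid with a few lines of code. This is only a sketch: the normalization constant and plotting are omitted, the grid resolution and function names are illustrative, and the exact placement of the peaks depends on the precise form of equation 2.9.

```python
# Sketch of evaluating the strategy-distribution prediction of eqs. 2.8-2.9
# on a grid of cooperation probabilities (Figure 2.4, panel A parameters).
from math import comb, exp

def phi(pc, n, c, beta, zeta):
    """Potential function of eq. 2.9 evaluated at cooperation probability pc."""
    prefactor = beta * zeta * (n + 1 - zeta) / (n**3 + 3 * n**2 + 2 * n)
    tail = sum(comb(n, k) * pc**k * (1 - pc)**(n - k) for k in range(zeta, n + 1))
    return -prefactor * tail + (c / 2) * pc**2 - (c / 3) * pc**3

def likelihood(pc, n=9, c=0.4, beta=3.0, zeta=5, mu=0.005):
    """Unnormalized likelihood of eq. 2.8: L(P_C) = exp(-phi(P_C)/mu)."""
    return exp(-phi(pc, n, c, beta, zeta) / mu)

grid = [i / 200 for i in range(201)]
values = [likelihood(pc) for pc in grid]
peak = grid[values.index(max(values))]
print("most likely mixed strategy P_C:", peak)
```

With the ζ = 5 parameters this grid search places the dominant peak at roughly PC ≈ 0.7, in line with the description of panel A below.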
The magnitude of the curve in Figure 2.4 represents strategy abundance, but also the likelihood of observing that strategy at any point after the initial conditions have been washed out by time, in an infinite population where each strategy may be present. Panel A shows a single-peak distribution where such a population will have a mean mixed-strategy of PC = 0.7. The dual peak shown in panel C predicts the final population will stabilize at both peaks, PC = 0 and PC = 0.91.

With these distributions as guides it is easier to predict what will happen to finite populations of VD players. Previous models relied on generating a single guaranteed prediction. This model generates a stochastic prediction of likelihood for a stochastic process. Each peak of these distributions in Figure 2.4 is an attractor, and their relative magnitudes indicate their attractive dominance over one another, assuming there are portions of the population representing strategies at each attractor in sufficient quantity. This co-existence of strategies can only happen with a population large enough, and a mutational mechanism that generates strategies diverse enough, to populate both attractors.

2.3.2 Mutational Mechanism

Among the parameters of equation 2.8 is the rate of mutation µ, but there is also an assumption of mutational mechanism, or how those possible mutations are distributed. This particular model was created with the assumption of a Gaussian distribution. Two distributions were selected for comparison in the ABS: uniform and Gaussian. The uniform distribution is the most general model of mutational mechanism typically used in computational experiments. Under a uniform mutational distribution each newly generated trait value from the distribution is equally likely as any other trait value. However, this simplistic model of mutations and their effect on trait values might have large oversimplifying effects on the evolutionary history of the population, obscuring or completely missing interesting dynamics. For instance, because strategies are encoded directly as trait values, every strategy is one mutation away from any other strategy, making the requirement for stable dominance of a strategy such that it must be completely defensible against all other possible strategies. The Gaussian distribution is a more reasonable assumption for biological systems because only new trait values closely related to current trait values can arise from new mutations, and they arise with a likelihood that falls off with their distance from the last trait value according to a Gaussian. The implication for dominance under the Gaussian distribution is that any dominant strategy must be defensible only against closely related strategies, not against all other possible strategies.

Figure 2.5: Likelihood of average population mixtures for mixed-strategies assuming infinite population size and infinite time. Each peak and valley represents a fixed point: the left peak (defective) is attractive, the valley is repulsive at PC = 0.1375, and the right peak (cooperative) is attractive at PC = 0.275. It can be observed in finite populations that a small mutation rate or large population will stabilize a population at the cooperative fixed point, but otherwise the population will eventually collapse to the defective fixed point.
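The two mutational mechanisms just compared can be stated compactly. The Gaussian width used below is an illustrative assumption; the exact effect size used in the simulation is not restated here.

```python
# Sketch of the two mutational mechanisms: uniform redraw versus Gaussian
# perturbation of the parent's cooperation probability.
import numpy as np

rng = np.random.default_rng(2)

def mutate_uniform(parent_pc: float) -> float:
    """Ignores the parent: any strategy is one mutation away from any other."""
    return rng.random()

def mutate_gaussian(parent_pc: float, sigma: float = 0.05) -> float:
    """Offspring strategies stay close to the parent's strategy."""
    return float(np.clip(parent_pc + rng.normal(0.0, sigma), 0.0, 1.0))

parent = 0.275
print([round(mutate_uniform(parent), 2) for _ in range(5)])    # anywhere in [0, 1]
print([round(mutate_gaussian(parent), 2) for _ in range(5)])   # clustered near parent
```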
Dominance between any two strategies of only one player each, in a repeated game with random new players, is relatively easy to predict by comparing predicted average payoffs. For a population of Npop players, the fixed point represents the dominant ratio of strategies. Fixed point prediction in VD games with deterministic strategies can be calculated using equation 2.4, which may also be used for mixed-strategy games under large enough population sizes. For instance, a game with group size N = 9, cooperation cost c = 0.4, public good benefit β = 3, and threshold ζ = 4 will equilibrate to around 59% cooperators and 41% defectors (Figure 2.3). These mixed-strategy games are comprised of players who play any of the usual strategies from the deterministic game according to a probability density function (PDF) with a domain of all possible strategies. As each player represents a single PDF, the entire population of mixed-strategy distributions evolves as a mixture itself. This population eventually equilibrates to a particular distribution of mixed strategies. Figure 2.5 shows a distribution of likelihoods of observing the population in particular basins of attraction for mixed-strategies.

Final population distributions from the agent-based results can look quite different from analytical predictions due to finite population and finite time. By Figure 2.5 the predicted cooperative fixed point assuming a Gaussian distribution would be a population with an average mixed-strategy of PC = 0.275, and it would also mutationally average a PC = 0.275 with probability equal to the same distribution. This ergodicity of representing both abundance and likelihood of observation is only possible given assumptions of infinite population size and infinite time; in a finite-population, finite-time simulation the expected ergodic relationship will not hold. The agent-based simulation of evolving a population of 1,024 individuals playing VD with the same parameters yields two different distributions depending on the mutational mechanism (Figure 2.6). A uniform distribution of possible mutations creates a population distribution rife with defection, only offset by a large spread of different types of cooperators (panel A). A Gaussian distribution of possible mutations creates a population distribution clustered around the fixed point (panel B).

Figure 2.6: Agent-based evolved populations for the uniform mutational distribution (A) and Gaussian distribution (B), with public goods threshold ζ = 4, for the VD game of group size 16, cooperating cost c = 0.6, public good benefit β = 3, and µ = 0.005. Over 100 replicates were collected to produce the distributions. Both distributions have, and are predicted to have, a mean cooperation frequency fPC = 0.275 using the cooperative fixed point of the distribution from Figure 2.5. Despite the average of each distribution being the predicted fixed point, the distributions are quite different from one another. Only the Gaussian mutational distribution produces a unimodal population strongly clustering around the predicted fixed point, whereas the uniform mutational distribution produces a unimodal population with a long tail, and it is not immediately apparent which strategy is the fixed point, or average strategy.
While both population distributions in panels A and B have an average mixed-strategy of PC = 0.275, matching the analytical fixed point prediction, the population evolved with the more plausible Gaussian mutational regime creates a very different evolved population. So while the fixed point can be predicted with some degree of accuracy, the mix of the final population depends heavily on the assumption of mutational mechanism.

2.3.3 Errors

Study systems are useful for studying emergent phenomena, but an ever-present danger for theoretical or computational systems is having mistakenly built bugs or incorrect assumptions into the system. How then does an experimenter know their study system provides results that may be reliably generalized? If the study system dynamics respond in similar ways to the systems it represents, then the other dynamics that are more difficult to validate may be accepted with higher confidence. In the collaboration to develop a new mathematical model, a study system was necessary from which inductive analyses could be used to inform the analytical model. Both systems may have their respective errors during development. A comparison to reasonable expectations and similar systems must be made before confidence may be ascribed to their results and implications.

Figure 2.7: Fixed points predicted exactly by the analytical model (dashed line) and approximated by the average strategy of the evolved population from the agent-based model (solid line). Both models have the parameters ζ/N = 0.2, c = 0.01, and β = 0.5. The agent-based model uses µ = 0.005 and a smoothing payoff parameter s = 0.5, which is halfway between a linear and step interpolation. Both predictions converge toward PC = 0.2, which is identical to the setup and result of the analysis provided by [Archetti and Scheuring, 2011].

Archetti [Archetti and Scheuring, 2011] proposed a similar but simpler model, already vetted, which I can use to further validate the analytical model through comparison of identical or similar analyses. The Archetti model is an analytical model that predicts the stable and unstable equilibrium points of the VD game by predicting the proportion of cooperation to defection. Archetti's models show that, despite a difference in estimated result, the hard threshold and smooth threshold models give converging results for large group sizes (N). For small group sizes, the smooth threshold model predicts larger cooperation ratios than the hard threshold model. The convergence of both models can be predicted by extrapolating the response from a range of group sizes. In other words, as population size increases, both models tend toward producing the same resulting fractional population of cooperators. I can use the dynamics of Archetti's models to test whether our proposed models exhibit the same convergence under identical assumptions. Both of our models may be used with the same assumptions, as, like Archetti, our models assume mixed strategies, and our analytical model assumes a hard threshold while the computational model can be parameterized with a steep or soft threshold. Ideally, I would perform a parameter sweep identical to Figure 3 from [Archetti and Scheuring, 2011], beginning with a population size of 100 and measuring the predicted ratio of cooperators all the way up to a population size of 600, in increments of 50.
A direct comparison across these parameter ranges is not possible because the computational complexity of the agent-based system is a limiting factor. The closest I can produce in reasonable time is a sweep of results for population sizes a factor of 10 smaller. However, the dynamics of both the mathematical and simulation models suggest a convergence near the same PC around which Archetti's models converge. That is, under an assumption of ζ/N = 0.2, with the payoff function defined by a smoothing parameter s = 0.5 to create a very shallow, very smooth sigmoidal function, both the mathematical and simulation systems converge to a value approaching 0.2. Note that s = 0.1 would give a nearly linear function, and s = 100 would give nearly a step function.

The smooth threshold model from Archetti always predicts a higher level of cooperation than the step threshold model. For small population sizes, this difference is quite large, but larger population sizes show an apparent trend to asymptotically converge on a single value of predicted cooperation ratio within the evolved population. Our results follow a similar pattern despite a different range of population sizes. Smooth threshold results for small population sizes predict nearly twice as much cooperation in the final evolved population as the step function results. Additionally, this difference decreases asymptotically for larger population sizes, converging on the same value as Archetti's models. This similarity of predictive behavior between our models and Archetti's models supports an equivalence of validity.

2.3.4 Spatial Constraint

Closed-form analytical models such as the ones discussed here cannot make predictions under an assumption of spatial constraint, as higher-order dynamics greatly influence the outcome. For this reason only tightly coupled simulation systems, such as an agent-based simulation, can provide predictions. Here I explore spatial considerations using the agent-based system for pure-strategy and mixed-strategy systems, and compare predictions to those of the analytical model.

Figures 2.8 and 2.9 show the difference between a well-mixed (panel A) and spatial (panel B) VD game with mixed-strategies. Each population has 1,024 individuals and was evolved for 80,000 updates. Parameters were N = 16, c = 0.6, β = 3, and µ = 0.005. While the population distributions shown are so different as to almost be inverses of each other, their means are nearly identical. This comparison emphasizes how, like mutational mechanism, spatial constraint can highly influence the resulting population distribution. However, a thorough comparison of fixed point predictions for a valid range of ζ will better show other potential differences between well-mixed and spatial VD games.

Figure 2.8: Population distributions for single instances of agent-based evolved populations for well-mixed (A) and spatial constraint (B) conditions, with public goods threshold ζ = 4, for the VD game of group size 16, cooperating cost c = 0.6, public good benefit β = 3, and µ = 0.005. The analytical model predicts a mean cooperation frequency fPC = 0.275, and both well-mixed and spatial populations evolutionarily shift around this fixed point.
Figure 2.9: Spatial visualizations for single instances of agent-based evolved populations (the same as in Figure 2.8) for well-mixed (A) and spatial constraint (B) conditions, with public goods threshold ζ = 4, for the VD game of group size N = 16, cooperating cost c = 0.6, public good benefit β = 3, and µ = 0.005. The analytical model predicts a mean cooperation frequency fPC = 0.275, and both well-mixed and spatial populations evolutionarily shift around this fixed point.

Figure 2.10: Comparison of analytical (dashed) and ABS (solid) predictions of deterministic-strategy fixed points for well-mixed (A) and spatial constraint (B) conditions, for the VD game of group size N = 9, cooperating cost c = 0.4, public good benefit β = 3, µ = 0.005, and s = 1.0. For both A and B the analytical prediction (dashed) is the same prediction assuming a well-mixed population. Fixed points for the pure-strategy game predicted by the ABS under spatial constraint (B) show much higher levels of cooperation than the well-mixed analytical prediction.

Figures 2.10 and 2.11 show differences between the well-mixed (panels A) and spatial (panels B) VD games for group size N = 16, cooperating cost c = 0.6, public good benefit β = 3, and µ = 0.005. Spatial constraint is well known to encourage cooperation in games, so the expectation for the ABS results is higher cooperative ratios when the populations are spatially constrained [Killingback and Doebeli, 1996, Dietz, 2003]. Both the deterministic and the mixed-strategy well-mixed VD fixed points are perfectly predicted by the analytical model for a variety of ζ thresholds, but the two differ in how spatial constraint affects the PC fixed point. For deterministic VD games, spatial constraint highly biases results toward cooperation, whereas the effect is marginal for the mixed-strategy games. There is a point in Figure 2.11 panel B where it appears spatial constraint undermines cooperation, compared to the analytical prediction, for the highest thresholds ζ. This effect may be due to a lack of equilibration time, as a similar downward trend is seen in panel A for the well-mixed games.

Figure 2.11: Comparison of analytical (dashed) and ABS (solid) predictions of probabilistic-strategy fixed points for well-mixed (A) and spatial constraint (B) conditions, for the VD game of group size N = 9, cooperating cost c = 0.4, public good benefit β = 3, µ = 0.005, and s = 1.0. For both A and B the analytical prediction (dashed) is the same prediction assuming a well-mixed population. Fixed points for the mixed-strategy game predicted by the ABS under spatial constraint (B) show marginally higher levels of cooperation than the analytical prediction, indicating higher cost-benefit efficiency.

At ζ = 4, spatial structure imposed on a pure-strategy population creates a population with 83% cooperation. Spatial structure imposed on a mixed-strategy population with the same threshold creates a population with mean strategy PC = 0.65. Comparing the success of these populations is difficult given only the cooperation ratios, but a direct comparison of group fitness can be made using Equation 2.7. An exploration of group fitness as a function of PC is found in Figure 2.12. W_mix of the pure-strategy population at 83% cooperation is 24.012, and W_mix of the mixed-strategy population at PC = 0.65 is 24.027. The mixed-strategy population has marginally higher predicted group fitness, with a lead of 0.015.
This difference is not very large, so one might assume that in direct competition between pure-strategies and mixed-strategies there may be no clear winner. In small population sizes there will surely be a negligible fitness difference, but some biological populations that exhibit a Volunteer's Dilemma style threshold game, such as biofilm-forming microbes, form populations of much greater size. For a fitness effect of s = (24.027 − 24.012)/24.012 = 0.0006247, a population of at least N = 1/0.0006247, or N = 1600, would be required to allow fixation of mutations with such a small effect size. For a biologically plausible population size, the mixed-strategy population would outcompete the pure-strategy population. The reason for this difference can be understood by the cost-benefit ratio for cooperators in the VD game: too much cooperation is costly and redundant, whereas efficient levels of cooperation benefit the entire group because cooperative effort is not wasted beyond that required to create the public good. These results suggest "phenotypic heterogeneity," or phenotypic plasticity, such as that observed in biofilms [Jefferson, 2004], is an adaptive trait whereby probabilistic expression allows efficient population strategy mixes given large populations.

Figure 2.12: The result of using Equation 2.7 to predict group fitness for Volunteer's Dilemma of group size N = 9, ζ = 4, cost c = 0.4, and benefit β = 3. The fixed point is at PC = 0.59, but the maximum group fitness is higher, at PC = 0.72.

2.4 Conclusion

Previous models for Volunteer's Dilemma predict population fixed points as singular averages of cooperation PC. Here I show validation for a new analytical model by providing an agent-based simulation. The development of both models allows an induction-deduction cycle of refinement and validation for otherwise difficult modeling problems. The data presented here highly support the new analytical model as a robust method to predict final population distributions and the configurations through which they will progress over evolutionary time. Finally, the comparative data suggest phenotypic plasticity, or probabilistic gene expression, in microbes is a selective advantage in the evolution of cooperation for Volunteer's Dilemma games.

Chapter 3

Information-theoretic Neuro-correlates Boost Evolution of Cognitive Systems

3.1 Abstract

Genetic Algorithms (GA) are a powerful set of tools for search and optimization that mimic the process of natural selection, and have been used successfully in a wide variety of problems, including evolving neural networks to solve cognitive tasks. Despite their success, GAs sometimes fail to locate the highest peaks of the fitness landscape, in particular if the landscape is rugged and contains multiple peaks. Reaching distant and higher peaks is difficult because valleys need to be crossed, in a process that (at least temporarily) runs against the fitness maximization objective. Here we propose and test a number of information-theoretic (as well as network-based) measures that can be used in conjunction with a fitness maximization objective (so-called "neuro-correlates") to evolve neural controllers for two widely different tasks: a behavioral task that requires information integration, and a cognitive task that requires memory and logic. We find that judiciously chosen neuro-correlates can significantly aid GAs in finding the highest peaks.
3.2 Introduction

The last 50 years of research in Artificial Intelligence have taught us many things, but perhaps the most obvious lesson is that designing complex cognitive systems is extremely hard. Notwithstanding the success of chess-playing algorithms and self-driving cars, designing a brain that rivals the performance of even the smallest vertebrate has proven elusive. While the computational algorithms being deployed today on the aforementioned problems (as well as on image classification via convolutional nets) are impressive, many researchers are convinced that none of these algorithms are cognitive in the sense that they display situational understanding. For example, the celebrated convolutional nets can easily be fooled [Nguyen et al., 2015] with trivial imagery, suggesting that they implement a sophisticated look-up table after all, with very little understanding. The failure of the design approach has been acknowledged by several groups of researchers that have chosen an entirely different approach, namely to use the power of evolution to create machine intelligence. This field of "neuro-evolution" [Yao, 1999, Floreano et al., 2008] is much less developed than the standard design approach, but it has made great strides in the last decade. It also has the advantage (compared to the design approach) that the approach is known to have resulted in human-level intelligence at least once.

In the field of neuro-evolution, a Genetic Algorithm (GA) [Michalewicz, 1996] is used to evolve a program that, when executed, builds a "neuro-controller". This neuro-controller constitutes the brain of a simulated entity, which is called an agent. Each program is evaluated via the performance of the agent, and programs that gave rise to successful brains are then replicated and given proportionally more offspring programs than unsuccessful programs. Because mutations are introduced in the replication phase, new types of programs are introduced every generation, trying out variations of the programs, and therefore variations of the brains. This algorithm, closely modeled on the Darwinian process that has given rise to all the biocomplexity on the planet today, has proven to be a powerful tool that can create neuro-controllers for a diverse set of tasks.

Using evolution to create brains is no panacea, though. The use of Genetic Algorithms (GAs) to optimize performance (fitness) of behavior controllers is often hindered by the structure of complex fitness landscapes, which are typically rugged and contain multiple peaks. The GA, via its fitness maximization objective, will discover local peaks but may get stuck at sub-optimal peaks because crossing valleys is specifically not an objective. For a population to overcome the valleys in such a rugged landscape, programs must (at least temporarily) acquire deleterious mutations that are at odds with a simple reward system for optimization. This difficulty is typically overcome by increasing diversity in the population [De Jong, 1975], by splitting the population into islands [Whitley et al., 1998, Bitbol and Schwab, 2014], by using alternative objectives such as in novelty search [Lehman and Stanley, 2008], or by changing (and thus optimizing) the fitness function itself. Of these solutions, "Diversity" and "Novelty" can be computationally intensive, while fitness function optimization is very specific to every problem, and thus not a general solution.
Here we propose an alternative approach to fitness optimization in the evolution of cognitive controllers that takes advantage of the insight that functioning brains have a certain number of characteristics that are a reflection of their network structure, as well as their information-processing capacity. If we were able to reward these features at the same time as rewarding the performance of the given task, it may be possible to evade the valleys of the landscape and move on neutral ridges towards higher peaks. The idea of using multiple objectives in Genetic Algorithms is not at all new [Zhou et al., 2011], and it has been used previously in neuro-evolution [Clune et al., 2013].

We present evidence from simulations of evolving virtual agents that establishes that it is possible to improve the performance of a GA, increase the rate of adaptation, and improve the performance of the final evolved solution, all by incorporating neuro-correlates of the evolving agents' brains into the fitness calculation. These neuro-correlates are metrics inspired by the interface between Network Science and Cognitive Science that attempt to measure "how well a brain is working", independently of the achieved fitness. These measures typically do not assess an agent's performance of a task because it is often difficult to relate task performance to cognitive ability. Ideally, these neuro-correlates either quantify the mode and manner in which information is being processed, or the manner in which the nodes of the network are connected. It is important that these neuro-correlates are agnostic of performance, as otherwise their reward would not open a new dimension in optimization.

The evaluation of an agent in any GA typically involves measuring the agent's performance for a given task or environment. We show that multiplying the performance of an agent with the value of its neuro-correlate will improve the speed of evolution and increase the ratio of perfect-performance evolved agents (a simple way of performing multi-objective optimization, see for example [Deb, 2014]). This improvement can be traced back to an increase in the number of potentially beneficial mutations that may persist or sweep the population. If a mutation increases cognitive ability but does not yet have an effect on the agent's performance, then it is evolutionarily neutral and can be lost by drift. However, if such a performance-neutral mutation improves cognition in some other form, and thereby increases a neuro-correlate, then the mutation is no longer neutral and is instead selected. In future generations, such an improvement might become beneficial for task performance. Therefore, using neuro-correlates in conjunction with performance allows these otherwise neutral mutations to stay in the population longer, or even promotes them to fixation. Subsequent mutations then have a chance to take advantage of these changes that otherwise would have been lost to drift.

3.3 Background

We evolve agents to solve two very different tasks: a temporal-spatial integration task (active categorical perception, see [Beer, 1996, Beer, 2003, van Dartel et al., 2005, Marstaller et al., 2013b]) using embodied agents, and a purely mathematical (disembodied) task that requires complex cognitive processing: the generation of random numbers using deterministic rules.
The temporal-spatial integration task requires agents to observe and categorize blocks of different sizes falling toward them, catching small blocks while avoiding large blocks. This task cannot be solved by a purely reactive machine (see [Marstaller et al., 2013b]) because agents must use memory in order to recognize the block's trajectory and predict where it will land. The task creates a fitness landscape known to be deceptive, as prior results have shown that only a small fraction (about 10%) of populations result in an optimal solution. The sub-optimal agents usually get stuck on local peaks that deliver about 80% of maximal fitness [Marstaller et al., 2013b].

In the second set of experiments we investigate a task where agents are rewarded for generating long sequences of random numbers (without access to a random number generator or any other stochastic source). Agents are given an oscillating bit as an indicator of time, and assigned fitness based on the length of the dictionary generated from a Lempel-Ziv compression [Welch, 1984] of the agent's output. Like the previous task, this task cannot be solved by a purely reactive machine, although it would be trivially solved if the agents could access stochastic processes. However, because the agents use only deterministic processes, we expect this task to require a great amount of temporal integration and memory in order for them to achieve a good amount of randomness. Indeed, generating random numbers is a known task for testing cognitive ability and disability, in particular in the realm of Autism Spectrum Disorders and dementia [Brugger et al., 1996, Baddeley, 1998, Jahanshahi et al., 2006].

The standard substrate for neuro-evolution is Artificial Neural Networks (ANNs, see e.g. [Russell and Norvig, 2003]), but here we use a different substrate ("Markov networks" or "Markov Brains") that has proven to be adept at behavioral decision-making tasks [Albantakis et al., 2014, Edlund et al., 2011, Olson et al., 2013b, Chapman et al., 2013, Haley et al., 2014, Olson et al., 2015, Kvam et al., 2015, Olson et al., 2013c]. In contrast to ANNs, in which neurons are continuous-valued and non-firing, neurons in Markov Brains (MBs) are digital with only two states: quiescent or firing. Markov neurons can interact with any other neuron via arbitrary logical functions (as opposed to the ubiquitous transfer and activation functions found in ANNs). We use MBs because we have found them to be computationally more powerful and more evolvable, while having a much smaller computational footprint than ANNs. ANNs, on the other hand, have a wide range of applications, and our results might generalize to those applications as well.

The logic functions that connect neurons, along with the topology of the network, are encoded directly by a string of bytes. The logic gates act on Markov variables (our equivalent of a neuron), and the output is written into other Markov variables. In a sense, the MB is defined by the edges between nodes, as it is the edges that carry all the computational power. Each gate is specified by a gene, and the beginning of each gene on the chromosome is specified by a particular combination of bytes, in our case the "start codon" (42, 213). The bytes that follow determine the identity of the neurons it reads from and the identifier of the neuron(s) it writes to. The subsequent bytes encode the logic of the gate, which can be done by simply encoding the truth table.
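To make the direct encoding concrete, here is a toy decoder in the spirit of the description above. Only the start codon (42, 213) comes from the text; the fixed gene length, the two-input/one-output gate layout, and the four-entry truth table are simplifying assumptions for this sketch, whereas the real Markov Brain encoding allows variable numbers of inputs and outputs per gate.

```python
# Toy sketch: scan a byte genome for start codons and decode simplified
# 2-in/1-out logic gates, then update a brain state by one step.
from typing import List, Dict

def decode_genome(genome: List[int], n_nodes: int = 16) -> List[Dict]:
    """Return the simplified logic gates found after each (42, 213) start codon."""
    gates = []
    for i in range(len(genome) - 1):
        if genome[i] == 42 and genome[i + 1] == 213:          # start codon
            g = genome[i + 2:i + 2 + 7]                        # bytes of one gene
            if len(g) < 7:
                break
            gates.append({
                "inputs": [g[0] % n_nodes, g[1] % n_nodes],    # neurons it reads from
                "output": g[2] % n_nodes,                      # neuron it writes to
                "table":  [b % 2 for b in g[3:7]],             # 2-input truth table
            })
    return gates

def step(gates: List[Dict], state: List[int]) -> List[int]:
    """Apply every gate's truth table once to produce the next brain state."""
    new_state = list(state)
    for gate in gates:
        a, b = state[gate["inputs"][0]], state[gate["inputs"][1]]
        new_state[gate["output"]] = gate["table"][2 * a + b]
    return new_state

# Tiny hypothetical genome containing two genes.
genome = [7, 42, 213, 3, 5, 9, 0, 1, 1, 0, 88, 42, 213, 9, 1, 4, 1, 1, 1, 0]
brain = decode_genome(genome)
state = [0] * 16
state[3] = 1                                # set one "sensor" node
print(len(brain), step(brain, state))
```

Under such a scheme, a duplicated gene simply yields a second, identical gate, which is consistent with the discussion of genome size as a proxy for brain size below.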
While other MB implementations allow for stochastic logic gates, we confine ourselves to deterministic gates, which have a much more concise encoding (see Refs. [Edlund et al., 2011, Chapman et al., 2013, Marstaller et al., 2013b] for a more detailed description of MB encoding and function). There are alternative neural substrates that we could have studied here, including NEAT or hyperNEAT [Stanley and Miikkulainen, 2002], genetic programming, or subsumption architecture machines [Brooks, 1986]. These are all viable substrates for exploring the benefits of neuro-correlate aided evolution. In this contribution we focus on testing the general validity of the neuro-correlate augmented evolution approach. We do expect the results to depend on the underlying neural substrates, their evolvability, and how well each neuro-correlate can be assessed. In addition, our proposed method is easy to implement for other systems: a neuro-correlate must be measured and the resulting value multiplied by the associated performance. This should allow for rapid testing of this method in other systems.

Despite evidence that an indirect encoding might be more advantageous [Clune et al., 2011, D'Ambrosio et al., 2011, Gauci and Stanley, 2010b], the direct encoding has been very successful in evolving controllers for virtual agents to solve a wide range of tasks [Marstaller et al., 2010, Edlund et al., 2011, Chapman et al., 2013, Marstaller et al., 2013b, Albantakis et al., 2014, Hintze and Miromeni, 2014]. In addition, these controllers have been instrumental in establishing some of the neuro-correlates used in the present study, which increases our confidence that these measures perform as described. Next we describe the eight different neuro-correlates used to assess a controller's topology and performance.

3.3.1 Network-theoretic neuro-correlates

3.3.1.1 Minimum-Description-Length

The simplest neuro-correlate is, colloquially speaking, the largest possible brain size. It is difficult to define such a concept mathematically, but we can imagine that if we had a description of the brain in terms of the program that builds it, then the shortest such program would be the most concise description of the brain in a Minimum Description Length (MDL) formalism, and larger MDLs could encode larger brains. The size of the genome that codes for our Markov Brains could serve as a proxy for the brain MDL, but it is almost never the smallest description of the brain, simply because the genome can add more "empty tape" instead of running out of space to encode more logic gates, for example using a gene duplication. Using the genome size (as a proxy for MDL) as a neuro-correlate makes sense because it explicitly rewards genome expansion, rather than waiting for a fortuitous genome duplication to add the extra space. The genome size is directly proportional to the potential number of logic gates, and thus the number of connections the agent phenotype might have, since the genome encodes the logic gates directly. Of course, under such a selective pressure genome length is almost guaranteed to be very different from the compression limit (the smallest program encoding the brain), but we can think of evolution as creating a selective pressure to compress the brain description as much as possible. In our implementation of Markov Brains the genome size varies between 2,000 and 20,000 loci (each locus on the genome is a byte, so it can take values between 0 and 255) and can be affected by insertion- and deletion-mutations.
We do not use the number of encoded logic gates to directly assess brain size for two reasons: First, each gate can have a different number of inputs and outputs, which influences the complexity of the gate. Second, gene duplications can create exact copies of a gene that codes for a gate, which changes the number of gates without (presumably) affecting the brain's function. Because the connectivity of standard ANNs is fixed [Russell and Norvig, 2003], an MDL-like neuro-correlate does not exist there, but the Vapnik-Chervonenkis (VC) dimension [Vapnik and Chervonenkis, 1971], which bounds the learning capacity of a network, could be a suitable alternative. Within more plastic ANN systems that allow connections to be encoded, such as NEAT [Stanley and Miikkulainen, 2002], the number of edges between neurons could be a proxy for potential brain size.

3.3.1.2 Topological Complexity

Brains are networks of neurons, and our MBs are networks of logic gates, both of which can be represented by graphs. Assessing the complexity of graphs is not a straightforward task (see, e.g., [McCabe, 1976]), but for the purpose of brain function some graph properties are obvious neuro-correlate candidates, and easy to measure. We first measure the graph diameter (GD) as the highest value in the distance matrix of the network, also known as the longest of all shortest paths between all node pairs. The intuition behind using GD as a neuro-correlate is that information traveling along neurological pathways in brains with a large diameter has more time to interact with other neurons, in particular in other parts of the brain. If information takes longer to pass from sensors to actuators, it remains within the brain longer and therefore extends the agent's short-term memory.

3.3.1.3 Connectivity and Sparseness

We measure the standard graph-theoretic "Gamma Index" (GI) or "connectivity" of the network as well as its converse, the network "sparseness." The Gamma Index is the ratio of extant connections to all possible connections. For this measure multiple connections between nodes are treated as a single connection, as otherwise there would be an infinite number of possible connections between all nodes of the network. Current understanding of brain optimization and organization suggests that connections between neurons are costly [Ahn et al., 2006, Cherniak et al., 2004] and that this provides a strong selection pressure during evolution. Specifically, it has been shown that minimizing connections between neurons in a brain produces more modular and adaptable brains [Clune et al., 2013, Huizinga et al., 2014]. As we will show, this is not necessarily the case but depends on the task being evolved. Also, intuitively one might think that more connections are better, and thus optimizing for density might be as beneficial as optimizing for sparseness under the right circumstances. To incorporate this phenomenon, we use sparseness and the Gamma Index separately, as they reflect different aspects of brain connectivity.

3.3.1.4 Representations

R, a measure for the amount of information that a brain represents within internal states, is a new information-theoretical measure of cognitive function that correlates with fitness [Marstaller et al., 2010, Marstaller et al., 2013b], but is in principle separate from an agent's performance on a task. R measures how much an agent knows about its environment above and beyond its current sensory information.
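As an aside before describing R in more detail: the two topological neuro-correlates above are straightforward to compute with standard graph tooling once the brain's wiring has been flattened into a simple undirected graph (parallel connections counted once, as described above). The following is a minimal sketch; the use of networkx and the function names are our own choices for illustration, not the original implementation.

```python
# Sketch of the topological neuro-correlates: Gamma Index and graph diameter.
# Assumes the Markov Brain has already been reduced to a simple undirected
# graph whose nodes are neurons and whose edges are gate-mediated connections.
import networkx as nx

def gamma_index(graph: nx.Graph) -> float:
    """Extant connections divided by all possible connections."""
    n = graph.number_of_nodes()
    possible = n * (n - 1) / 2
    return graph.number_of_edges() / possible if possible else 0.0

def sparseness(graph: nx.Graph) -> float:
    """The converse of the Gamma Index."""
    return 1.0 - gamma_index(graph)

def graph_diameter(graph: nx.Graph) -> int:
    """Longest of all shortest paths, taken over the largest connected component."""
    if graph.number_of_nodes() == 0:
        return 0
    largest = max(nx.connected_components(graph), key=len)
    return nx.diameter(graph.subgraph(largest))

# Example: a brain whose gates connect neurons 0-1, 1-2, and 2-3.
brain = nx.Graph([(0, 1), (1, 2), (2, 3)])
print(gamma_index(brain), graph_diameter(brain))  # 0.5 and 3
```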
Representations can be thought of as internal models of the world that the agent can use to make decisions in conjunction with sensory signals, or even in the absence of sensory signals. Because of this, R is often identified with "what a brain knows about its world with its eyes closed". We can define R as the information the mental states (described by a random variable M) of an agent have about the environment (described by random variable E), given its sensors (variable S) [Marstaller et al., 2013b]:

R = I(E : M | S) = I(E : M) − I(E : M : S) = I − I(S : E) − I(S : M) .    (3.1)

Here, I is the "multi-information": the amount of information the variables share [Schneidman et al., 2003], which for three variables becomes

I = H(E) + H(M) + H(S) − H(E, M, S) .    (3.2)

In equation (3.1), I(E : M | S) is the shared information between variables E and M given the sensor state S, I(E : M) is the (unconditional) shared information between environment and brain internal states (not including sensors), and I(E : M : S) is the shared information between all three variables (a quantity that may become negative). The measure R thus quantifies the correlation of mental states with world states, while subtracting the correlations coming from the sensors. This notion is not well-defined when using the correlation function, but it is when using information theory. R increases in evolving agents when their tasks require knowledge or internal models of the environment [Marstaller et al., 2013b]. Agents without representations, that is, purely reactive agents, remain below optimal performance. Measuring these representations is simple as long as sensor, brain, and environmental states are directly observable (as they are here), causing only a small computational overhead.

It is important to note that R is not necessarily correlated with fitness. For instance, an agent might have representations about an incoming threat, but may not respond. Alternatively, an agent may make decisions based solely on sensory input, obtaining high fitness without representations. Therefore, representations do not necessarily make a prediction about an agent's performance, even though they are usually correlated. In addition, R is not strictly speaking a neuro-correlate, since it cannot be measured intrinsically. It is crucial that the correlate used does not allow predictions about performance, because otherwise the correlate in itself would be a proxy for fitness, and optimizing a combination would therefore not introduce a new element. R satisfies this condition, so we include this measure.

3.3.1.5 Information Integration

Agents solving a cognitively complex task must integrate information from different sensory inputs to come to a single decision. Sensory inputs must be compared to one another in the light of past experiences. The information-theoretic measure Φ is one way of quantifying a brain's information-integration capability [Tononi, 2008, Balduzzi and Tononi, 2008, Balduzzi and Tononi, 2009, Edlund et al., 2011, Albantakis et al., 2014]. Unfortunately, computing Φ is computationally intensive, so much so that it is cumbersome even on modern high-performance computers to calculate Φ exactly for brains of 16 neurons (for every agent at every update of an evolving population), and essentially infeasible for brains with more than 24 neurons. (Because the computational complexity of Φ scales super-exponentially, calculating Φ for the brain of the lowly nematode C. elegans, with 302 neurons, would require evaluating roughly 4.8 × 10^457 partitions of the network, an absurd task.)
Here we use the much more practical Φatomic, which is a very good approximation of Φ at a much reduced computational cost [Edlund et al., 2011] (Φatomic has also been defined as "synergistic information" [Barrett AB, 2011, Ay, 2015]). Specifically, Φatomic is the integrated information calculated for one specific partition of the network (whereas the standard Φ optimizes over partitions). The atomic partition is the one in which each node is its own part; that is, the atomic partition segments the network into individual nodes that are not part of any larger group.

To define Φatomic, we first define the information that is processed (in time) by the entire system. Let us describe the network's state using the joint random variable X = X^(1) X^(2) ... X^(n), where the X^(i) represent the elements (nodes) of the system, and X changes as time t progresses. Each variable X_t is defined by a probability distribution p(x_t) to find variable X_t in state x_t. Each node i progresses in time as X_t^(i) → X_{t+1}^(i), and each X_t^(i) is described by a probability distribution p(x_t^(i)). The information processed by the system from time step t to t + 1 is then given by

I(X_t : X_{t+1}) = Σ_{x_t, x_{t+1}} p(x_t, x_{t+1}) log [ p(x_t, x_{t+1}) / ( p(x_t) p(x_{t+1}) ) ] .    (3.3)

The measure Φatomic then quantifies how much of the information processed by the system cannot be explained by the sum of the information processed by each individual computational unit. Thus, in a sense Φatomic quantifies how much processing is "more than the sum of its parts", where the parts are defined by the individual neurons:

Φatomic = I(X_t : X_{t+1}) − Σ_{i=1}^{n} I(X_t^(i) : X_{t+1}^(i)) + I .    (3.4)

Here, I(X_t^(i) : X_{t+1}^(i)) is the information processed by the ith neuron, and I (called "integration" or "multi-information" in other work) measures the nonindependence between the network variables [Schneidman et al., 2003, McGill, 1954, Tononi et al., 1994, Lungarella et al., 2005, Lungarella and Sporns, 2006] and is defined as

I = Σ_{i=1}^{n} H(X_t^(i)) − H(X_t) .    (3.5)

With both the temporal term (system information unexplained by the sum over its parts) and the spatial integration measure of nonindependence, Φatomic represents both temporal and spatial network synergies. The calculation of Φatomic then simplifies to

Φatomic = Σ_{i=1}^{n} H(X_t^(i) | X_{t+1}^(i)) − H(X_t | X_{t+1}) .    (3.6)

As with the previous neuro-correlates, the act of integrating information does not imply that there will be an action upon such integrations. However, selecting agents with a higher Φatomic over others with the same performance guarantees the preservation of potentially beneficial mutations. In addition, we know that Φatomic is a limiting factor in the evolution of cognitive abilities: to perform a given task the agent requires a minimal amount of Φatomic [Joshi et al., 2013], and a better performance necessitates a higher minimal amount of Φatomic.

3.3.1.6 Predictive Information

Predictive information (PI) can be measured in several ways. It is the one-step mutual information a system has between time t and t + 1. MB animats have sensors and actuators, and PI can be measured as a one-step mutual information between the sensors and future sensors, the actuators and future actuators, the sensors and future actuators, or the actuators and future sensors.
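In practice, R, Φatomic, and the predictive-information variants specified next all reduce to entropies of empirical state distributions collected while the agent behaves. The following is a minimal sketch of naive plug-in estimators, assuming that environment, memory, sensor, and whole-network states have been logged as hashable tuples at every update; the function names are ours and estimation subtleties (bias, sample size) are ignored.

```python
# Naive plug-in estimators for the information-theoretic neuro-correlates.
# Assumes states were logged at every update as hashable tuples (e.g. bit tuples).
from collections import Counter
from math import log2

def entropy(samples):
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def mutual_info(xs, ys):
    """I(X:Y) = H(X) + H(Y) - H(X,Y); e.g. PI_ss = mutual_info(S[:-1], S[1:])."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def representation_R(env, mem, sens):
    """Eq. 3.1 rewritten as R = H(E,S) + H(M,S) - H(S) - H(E,M,S)."""
    return (entropy(list(zip(env, sens))) + entropy(list(zip(mem, sens)))
            - entropy(sens) - entropy(list(zip(env, mem, sens))))

def phi_atomic(states):
    """Eq. 3.6: sum_i H(X_t^(i)|X_t+1^(i)) - H(X_t|X_t+1) over a state series."""
    before, after = states[:-1], states[1:]
    whole = entropy(list(zip(before, after))) - entropy(after)   # H(X_t | X_t+1)
    parts = 0.0
    for i in range(len(states[0])):
        b = [s[i] for s in before]
        a = [s[i] for s in after]
        parts += entropy(list(zip(b, a))) - entropy(a)            # per-node term
    return parts - whole

# Example: four logged 3-node brain states.
print(phi_atomic([(0, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 1)]))
```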
Here we measure the PI between sensors and future sensors (PI_ss), and between sensors and future actuators (PI_sa):

PI_ss = I(S_t : S_{t+1})
PI_sa = I(S_t : A_{t+1})    (3.7)

where S_t represents the sensor values at time t and A_t represents the actuator values at time t. An organism solving a physical task will move through the environment such that this information is increased; we typically do not look around randomly, but in a predictable manner. It has been shown that increasing PI_ss can be advantageous for creating meaningful behavior in agents [Ay et al., 2008]. Alternatively, predictive information can be understood as the information the sensors have about the actuators after the brain has processed the sensor information. For a purely reactive agent, increasing this PI_sa (predictive information from sensor to motor) would be advantageous, because the actions of the agent become more appropriate to the action required. At the same time, if agents need to become less reactive but more dependent on their internal states, PI_sa should decrease after adaptation (as shown in [Edlund et al., 2011]).

Figure 3.1: A: In the simulation, large or small blocks fall diagonally toward the bottom row of a 20 × 20 discrete world, with the agent on the bottom row. For the purpose of illustrating the task, a large brick is falling to the left, while a small brick is falling to the right. In simulations, only one block is falling at a time, and any one brick can fall either only to the left or only to the right. The agent is rewarded for catching small blocks and punished for catching large blocks. B: A depiction of the agent's neurons (bottom left: triangles depict sensors, circles illustrate brain (internal) neurons, trapezoids denote actuators) and the sequence of activity patterns on the agent's 4-bit retina (right), as a large brick falls to the right. Reproduced from [Marstaller et al., 2013b], with permission.

3.3.2 Complex Environments

We investigate the effect that rewarding neuro-correlates has on adaptation in two different environments. The first is a temporal-spatial integration task where an agent must catch or avoid blocks that fall towards it (see Fig:3.1). The task is an adaptation of the "active categorical perception task" studied earlier [Beer, 1996, Beer, 2003, van Dartel et al., 2005], and requires a comparison between past and present sensor inputs to make inferences about future optimal behavior. While the task is seemingly simple, the sensor modalities (embodiment) of the agent are limited in such a way that this becomes a complex problem for an agent to solve [Marstaller et al., 2013b].

The second environment we use to optimize the agents is the generation of (pseudo) random numbers, and does not require embodiment of the brain. This task does not require any input, but the agent must produce a sequence of zeroes and ones with high entropy. This task is also used to assess cognitive abilities in humans. It is known that autism [Rinehart et al., 2006], schizophrenia [Zlotowski and Bakan, 1963], as well as different forms of Alzheimer's disease can be diagnosed by analyzing a sequence of symbols generated by a human subject who was asked to produce a sequence that is as unpredictable as possible [Rinehart et al., 2006, Wagenaar, 1972, Williams et al., 2002, Brugger et al., 1996].
This complex task involves memory [Baddeley, 1998], processing [Jahanshahi et al., 2006], and the ability to sequentially process information [Brugger et al., 1996], components that are also involved in the algorithmic generation of pseudo-random numbers. It is unclear whether an evolved Markov Brain random number generator resembles a computer algorithm or the cognitive abilities found in humans. Nevertheless, this task clearly qualifies as a complex problem requiring many components to work together, while at the same time it is not another example of an embodied agent. The randomness of the produced sequence is measured by its compressibility using the Lempel-Ziv-Welch (LZW) algorithm [Welch, 1984].

One can think of many other complex environments for which this method of GA augmentation might be suitable, such as navigation tasks [Edlund et al., 2011], classification and perception tasks [Chapman et al., 2013], or tasks that require an understanding of group behavior [Olson et al., 2013a, Hintze and Miromeni, 2014]. As long as neuro-correlates are measurable and doing so does not impose too high a computational overhead, this augmentation should be applicable. However, different environments could benefit differently from the neuro-correlates used. For example, a one-layered perceptron in a classification task might not require internal representations; in such cases the representation measure R might become useless. Alternatively, Φatomic might become meaningless in a task that does not require the integration of information.

3.4 Methods

We performed evolutionary experiments to test how neuro-correlates affect the performance of a GA. (Our source code is available at https://gitlab.msu.edu/jory/entropy-2015-neurocorrelates). In each block-catching experiment the performance of an agent was assessed by measuring the number of blocks correctly caught and correctly avoided in 80 trials. In the random-number-generation (RNG) experiment, the agents were given one bit that at each step changed from 0 to 1 and back for 500 updates. The output of the network over those 500 updates was collected and compressed. The number of symbols written into this compressed string was used as a proxy for the maximum entropy of that string. Highly regular or constant strings result in very short sequences after compression, while strings with higher entropy cannot be compressed as easily and result in longer strings. This environment has no particular state the world can be in; thus measuring R is meaningless in this context.

Once performance is assessed, fitness can be calculated, upon which the GA's selection mechanism can operate. It is standard in this domain to use an exponential of performance (1.1^performance) as a fitness measure to encourage selection of more complex behavior, similar to [Lenski et al., 2003b]. Using fitness alone is our control: the non-augmented case. We also explored the use of eight neuro-correlate measures for augmenting performance of the GA: minimum-description-length (MDL), diameter of the brain network (GD), amount of information integration (Φatomic), amount of representation about the environment (R), Gamma Index (connectedness), sparseness, and two variations of predictive information: sensors at t to sensors at t + 1, and sensors at t to actuators at t + 1 (PI_ss and PI_sa). Augmenting performance by a neuro-correlate is performed by multiplying the normalized neuro-correlate with the exponential performance of the agent.
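A minimal sketch of this augmented fitness calculation (the general form is given in eq. (3.8) below), together with the LZW dictionary-size proxy used as the performance measure in the RNG task, might look as follows. The function names and the exact dictionary-counting convention are illustrative assumptions rather than the original implementation.

```python
# Hypothetical sketch of fitness augmentation and of the LZW dictionary-size
# proxy used for the RNG task; details are simplified for illustration.

def augmented_fitness(performance, correlates):
    """correlates: list of (value, theoretical_max) pairs, one per neuro-correlate."""
    fitness = 1.1 ** performance
    for value, max_value in correlates:
        fitness *= value / max_value + 1.0
    return fitness

def lzw_dictionary_size(bits):
    """Number of dictionary entries created while LZW-compressing a bit string."""
    dictionary = {"0": 0, "1": 1}
    current = ""
    for b in bits:
        symbol = str(b)
        if current + symbol in dictionary:
            current += symbol
        else:
            dictionary[current + symbol] = len(dictionary)
            current = symbol
    return len(dictionary) - 2  # entries added beyond the two initial symbols

# Example: block-catching performance of 70/80 augmented with a Gamma Index of
# 0.2 (theoretical maximum 1.0), and the randomness proxy for a short output.
print(augmented_fitness(70, [(0.2, 1.0)]))
print(lzw_dictionary_size([0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0]))
```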
Each evolutionary experiment is repeated 128 times and agents are allowed to adapt over 10,000 generations (all evolutionary parameters are identical to [Marstaller et al., 2013b], except that the duplication and deletion rates are both 0.05). The population begins and ends at size 100 and the mutation rate is 0.005. At the end of each experiment the line of descent (LOD) is reconstructed [Lenski et al., 2003b] and the results on the LOD are averaged over all replicates. The general form for fitness calculation used in this work is

ω = 1.1^α · Π_{i=1}^{N} ( κ_i / κ_i^max + 1 )    (3.8)

where N is the number of neuro-correlates used. These experiments used N = 1, but the general form allows any number of neuro-correlates to augment performance together. α is the measure of performance, κ_i the measure of the ith neuro-correlate, and κ_i^max the maximum theoretical value for that neuro-correlate. While PI has previously been used to augment selection, with minor success, in the context of lifetime learning in embodied artificial intelligence [Zahedi et al., 2013], where it was combined with performance through a linear additive function [Zahedi et al., 2013], here we use a non-linear multiplicative function that emphasizes the effect of both beneficial and deleterious mutations.

We also measure the distribution of fitness and neuro-correlates at the end of evolutionary adaptation. The violin plots we use aggregate the replicate experiments and visualize the distribution. The final population contains genetic variation not yet culled by selection. To reduce such variation, we take from each experiment the organism on the line of descent three generations prior to the final generation.

3.5 Results and Discussion

We see that three of the eight proposed neuro-correlates improve adaptation when used to augment the GA in the block-catching task. The agent populations not only adapt faster, but also evolve to a higher average performance after 10,000 generations (see Fig:3.2 left). These neuro-correlates are Φatomic and graph diameter (p < 0.05) and Gamma Index (p < 0.0001), where p values were calculated using the Mann-Whitney U test. PI_ss (p < 0.01) and sparseness (p < 0.05) produced significantly worse results when used as augmenting neuro-correlates. Because Φatomic, graph diameter, and Gamma Index all have a positive effect on evolution, it seems additional selection pressure for neural architecture and material drastically helps these populations of agents evolve to solve their task. It is possible that sparseness has a negative effect because Markov Brains start with relatively small neural architectures and must evolutionarily grow to match the problem.

The results are very similar for the RNG task (see Fig:3.2 right). The significance of the effect is roughly the same: Φatomic, graph diameter, and Gamma Index significantly affect evolution positively when used as fitness-augmenting neuro-correlates (p < 0.01). R cannot be used in the context of the RNG, and is therefore not shown. Predictive information is maximized in cases where reactive behavior is rewarded. In tasks that require memory, maximizing predictive information can be detrimental (and is not the best predictor of fitness, see [Edlund et al., 2011]). It is possible that a predictive information with a larger time delay, or a conditional predictive information such as I(S_t : A_t | A_{t−1}) [Rivoire and Leibler, 2011], could produce better results. We plan on exploring those in the future.
Multi-parameter optimization (MPO) can often solve many problems related to multidimensional complex fitness landscapes, but can suffer from a number of problems, all well described in the multi-parameter optimization literature (for an overview see [Deb, 2014, Knowles et al., 2001]). In most of these problem cases, the parameters to be optimized work against each other in the form of trade-offs (one parameter can only be optimized at the cost of another). We observe this effect with Gamma Index and sparseness, depending on the task being evolved, while all other neuro-correlates work synergistically with performance. See supplementary materials Fig:1-3 for evolutionary history interactions between neuro-correlates. We find that some neuro-correlates affect performance or each other antagonistically. While in our experiments this trade-off reduces the final performance of the evolved agents, it could be overcome using MPO. An objective that is antagonistic using our fitness function could be beneficial in MPO, which should be explored in the future.

Figure 3.2: A: Average fitness over 10,000 generations (over 128 replicates) for the block-catching task. B: Average fitness over 10,000 generations (over 128 replicates) for generating random numbers. Shown are the control using only performance in the GA and the effect on task performance of augmenting the fitness function (Eq:3.8) with Φatomic, R, graph diameter, minimum-description-length, Gamma Index, sparseness, PI_ss, and PI_sa.

Figure 3.3: Absolute number of perfectly performing block-catching agents after 10,000 generations evolved under the five different experimental conditions shown. * indicates significance using the Chi-Squared test relative to Task, i.e., no neuro-correlate augmentation (p < 0.05).

3.5.1 Augmented selection and its effect on other neuro-correlates

Using a neuro-correlate to augment a genetic algorithm can shorten the runtime requirements and may improve the overall performance of the evolved population, depending on the neuro-correlate and objective. One might ask how augmenting selection using one neuro-correlate affects the other correlates not under direct selection. Intuitively one would expect that selection for a particular neuro-correlate, in conjunction with performance, should increase not only performance (as discussed above) but also the measure itself. Similarly, since neither of the predictive information measures augments selection, we expect no increase in predictive information when using them in conjunction with performance. However, we find PI_sa to increase when selecting for it together with performance in the RNG environment. In addition, we find some cases of one neuro-correlate driving the evolution of another; the most prominent example is Gamma Index driving the evolution of nearly all other neuro-correlates, and vice versa (see Fig:3.4). This further supports the idea that using neuro-correlates does not necessarily create negative synergy, as discussed above concerning multi-parameter optimization. In the RNG environment R cannot be used because there is no environment about which the agent could build representations.
For the other neuro-correlates we observe the same trend in the RNG task as in the block-catching task (see Figure:3.5). Selecting for a particular neuro-correlate increases its measure over evolutionary time, more so than the increase found without explicit selection. The exceptions to this are Gamma Index, sparseness, and Φatomic. All other neuro-correlates seem to have no additional effect on each other.

Figure 3.4: Comparison of different optimization methods at the end of evolution of the block-catching task. The top row shows the performance distribution (gray) and average performance given the different selection regimes (MDL abbreviates minimum-description-length, which is the potential brain size based on the length of the encoding genome; GI abbreviates Gamma Index; GD graph diameter; and PI_ss and PI_sa are predictive information sensor t to sensor t + 1 and sensor t to actuator t + 1, respectively). Subsequent rows show the effect of the different selection regimes (x axis) on the respective neuro-correlates.

Figure 3.5: Comparison of different optimization methods on each other at the end of evolution of the random number generation task. The top row shows the performance distribution (gray) and average performance given the different selection regimes (abbreviations as in Figure 3.4). Subsequent rows show the effect of the different selection regimes (x axis) on the respective neuro-correlates.

3.5.2 Neuro-correlate interactions

We showed that selection can be augmented by using specific neuro-correlates, while others do not help. Is that because selecting for the neuro-correlates themselves already provides an advantage for performance? One might think that, for example, maximizing knowledge about the world (R) requires the agent to move in such a fashion that performance improves automatically. Therefore we repeated the above-described experiment, but instead of augmenting performance with a neuro-correlate, this time selection was performed on the neuro-correlates alone. We find that none of the neuro-correlates affect performance substantially in the context of the RNG task (see the top rows in Figure:3.6), except for Φatomic.

Figure 3.6: Effect on different neuro-correlates and performance when selecting only on a single neuro-correlate. The left panel shows evolution in the block-catching environment, and the right panel shows the RNG task.
The effect of Φatomic on the generation of random numbers is measurable, but very low. We assume that, in order to generate non-zero Φatomic, information must flow through the system. Because there is no control function, the "information" is literally random (entropy), which is what the RNG environment seems to be selecting for. As expected, selecting for a single neuro-correlate increases its value in both environments (see the diagonal for both environments in Figure:3.6). However, we also find that many neuro-correlates affect each other positively and negatively, and the effect is similar in both environments. Some of these interactions are very intuitive. All measures that benefit from more connections, for example, cause the minimum-description-length to increase, whereas sparseness causes the minimum-description-length to shrink. Similarly, Φatomic and R have at least some positive effect on each other, and we conjecture that having the ability to form representations, even though they might not be used to improve performance, still requires the ability to integrate information. However, PI_ss positively affects PI_sa in the block-catching environment and has no effect in the RNG environment, while PI_sa has a positive effect on PI_ss in both environments (compare the bottom right of each panel in Figure:3.6). To our knowledge the relation between the two predictive information measures has not been studied, and we are unable to provide any additional insight into this phenomenon.

3.6 Conclusions

We have tested whether eight different network- and information-theoretic neuro-correlates can be used to improve the performance of a simple genetic algorithm. We found that Φatomic, graph diameter, and density (Gamma Index) each statistically significantly improve the performance of a GA in the two environments tested. Sparseness does not improve performance as much as density does, suggesting sparseness is not generally beneficial for GAs in this domain. Thus, sparseness should only be used if its application has been shown to be beneficial for the problem domain in question. The two predictive information measures (PI_ss and PI_sa) had a negative effect in both environments on finding optimal performers (see Figure:3.3); thus at least PI_sa, for which this effect is statistically significant, should not be used to augment selection in this domain. Gamma Index was significantly the most reliable fitness-augmenting neuro-correlate in the block-catching task for producing perfectly performing agents (see Figure:3.3).

Because the value of each neuro-correlate is simply multiplied by performance, the computational overhead is bounded by the complexity of each measure. Typically, the R and Φatomic measures are computationally intensive and must be repeated for every agent in the population. This is a significant overhead, especially for increases in agent lifetime or brain size. However, this study shows the graph diameter and Gamma Index measures to be computationally inexpensive and thus preferable, with preference between the two for graph diameter. While the GA used here benefited from augmenting selection with neuro-correlates, we studied only two environments, and it is likely that other environments might respond differently. Another possible extension of this work is to investigate other neuro-correlates, perhaps more topologically based ones, or other algorithms such as novelty search [Lehman and Stanley, 2008].
3.7 Acknowledgments

This work was supported in part by the National Science Foundation's BEACON Center for the Study of Evolution in Action under Cooperative Agreement DBI-0939454. We wish to acknowledge the support of the Michigan State University High Performance Computing Center and the Institute for Cyber Enabled Research (iCER).

Chapter 4

The Role of Conditional Independence in the Evolution of Intelligent Systems

4.1 Abstract

Systems are typically made from simple components regardless of their complexity. While the function of each part is easily understood, higher-order functions are emergent properties and are notoriously difficult to explain. In networked systems, both digital and biological, each component receives inputs, performs a simple computation, and creates an output. When these components have multiple outputs, we intuitively assume that the outputs are causally dependent on the inputs but are themselves independent of each other given the state of their shared input [Pearl, 2014]. However, this intuition can be violated for components with probabilistic logic, as these typically cannot be decomposed into separate logic gates with one output each. This violation of conditional independence is equivalent to instantaneous interaction: some information shared between the outputs does not come from the inputs and thus must have been created instantaneously. Here we compare evolved artificial neural systems with and without instantaneous interaction across several task environments. We show that systems without instantaneous interactions evolve faster, reach higher final levels of performance, and require fewer logic components to create a densely connected cognitive machinery.

4.2 Introduction

Evolvable Markov Brains are networks of deterministic and probabilistic logic gates whose function and connectivity are genetically encoded. They are a useful model to study the evolution of behavior [Olson et al., 2013a], cognitive properties [Marstaller et al., 2013a], and neural-network complexity [Edlund et al., 2011, Albantakis et al., 2014], and can also be used as classifiers [Chapman et al., 2013]. At each generation of evolution within a particular task environment, networks are selected based on their fitness and the populations adapt through random genomic mutations. The genome is sequentially processed, with specific sites indicating the start of a gene. An individual gene encodes one Hidden Markov Gate (HMG), which specifies connections between network elements and also determines input-output logic [Marstaller et al., 2013a]. These HMGs are generalized logic gates that encompass conventional logic gates such as XOR or NAND, whose logic table is typically a static mapping of two inputs to a single output (Figure 4.1A), but allow for more than the typical two-in-one-out format and can use a probabilistic mapping between input and output states. Here, HMGs could receive up to four inputs mapped to maximally four outputs. In this way, each gene may encode an entire logic module, as opposed to only a single logic function. Depending on the environment, deterministic or probabilistic HMGs may provide a mutually exclusive advantage for evolution. Apart from introducing randomness into the Markov Brains, probabilistic HMGs also differ from deterministic HMGs in the way they can be represented by a collection of simpler logic gates.
The outputs of a deterministic HMG are necessarily conditionally independent of each other: given the input state, an output is either on ('1') or off ('0') with probability P = 1.0. Information about the state of other outputs is irrelevant. As a consequence, a deterministic HMG can always be decomposed into several logic gates with one output each. For example, a two-in-two-out deterministic HMG (see Figure 4.1 panel B) can easily be decomposed into two independent two-in-one-out gates (see Figure 4.1 panel A). This decomposition works similarly for larger gates with more inputs and more outputs, requiring one logic gate per output.

Probabilistic HMGs, on the other hand, are not generally decomposable into separate logic functions for each output. In the case of a two-in-two-out probabilistic HMG, a probability table (for a detailed explanation see [Marstaller et al., 2013a]) maps all four possible input states to all four possible output states. Let us for simplification purposes consider just the case where both inputs for such a gate are 0. Then, four probabilities (P00, P01, P10, P11) determine the probability for each of the four possible output states to occur given input state 00. P00 defines the probability that both outputs are 0, P11 the probability that both outputs are 1, and so forth. Observe that all four probabilities must sum to 1.0:

P00 + P01 + P10 + P11 = 1.0    (4.1)

Except for the above requirement, the individual probabilities P00 to P11 evolve independently for default probabilistic HMGs in Markov Brains. Let us now try to use two probabilistic two-in-one-out gates (see Figure 4.1 panel C) to achieve the same functionality as a two-in-two-out HMG. Pa = P(A = 1|I) denotes the probability of the first gate ('A') to have an output of 1 for a given input state I. Consequently, the probability for A to have an output of 0 is 1.0 − Pa. Similarly, for the second gate ('B'), Pb = P(B = 1|I). The joint input-output function of the two individual gates A and B is the following:

P00 = (1.0 − Pa)(1.0 − Pb)    (4.2)
P01 = (1.0 − Pa) Pb    (4.3)
P10 = Pa (1.0 − Pb)    (4.4)
P11 = Pa Pb    (4.5)

Given eqs. 4.2-4.5, the summation rule (eq. 4.1) is met. In addition, the following dependency between the probabilities holds:

P00 P11 = P01 P10    (4.6)

It is easy to see that probabilistic HMGs with independently evolved probabilities P00 to P11 may violate equation 4.6. As a result, probabilistic HMGs typically cannot be decomposed. Decomposition of a probabilistic two-in-two-out HMG is only possible if

P(out1 | I, out2) = P(out1 | I)    (4.7)
P(out2 | I, out1) = P(out2 | I)    (4.8)

for any input state I, which means that the two outputs must be conditionally independent of each other given all possible I. Under these conditions, the HMG's probabilities can be expressed according to eqs. 4.2-4.5, with Pa = P(out1 = 1|I), the marginal probability of out1 = 1 given I, and Pb = P(out2 = 1|I), the marginal probability of out2 = 1 given I. The same principle can be applied to HMGs with multiple conditionally independent outputs. An example probability distribution for P00 to P11 that violates conditional independence is PD = P(out1, out2 | I) = (0, 0.5, 0.5, 0), whereas PD* = (0.25, 0.25, 0.25, 0.25) conforms with eqs. 4.1-4.8. Note that the marginal probabilities P(out1 = 1|I) = P(out2 = 1|I) = 0.5 are the same in both cases. In fact, there are infinitely many probability distributions with the same marginal probabilities, but only PD* fulfills conditional independence.
By contrast, PD contains the additional constraint that P(out1 = 1 | I, out2 = 1) = 0 and vice versa (cf. [James and Crutchfield, 2016] for more intricate examples of hidden dependencies in probability distributions). Making the temporal order explicit, the probabilities in PD* only depend on the input I to the HMG at timestep t − 1, before the update. PD, however, also requires instantaneous interaction between the outputs at time t. This example demonstrates that, in Markov Brains with probabilistic HMGs, the output state of an element at time t may depend on information that is not available at t − 1. Such instantaneous interactions have implications with respect to the causal structure of these Markov Brains, as they violate the postulate that causes must precede their effects. In addition, instantaneous interactions defy the notion of elementary causal components. Without conditional independence, the interactions between elements in a probabilistic Markov Brain cannot be represented as a Bayesian network, or directed acyclic graph (DAG) [Pearl, 2014]. This prohibits analyzing the causal composition of these Markov Brains, which means, for example, that the theoretical framework of integrated information theory (IIT) [Oizumi et al., 2014, Albantakis and Tononi, 2015], which assesses how sets of elements within a system causally constrain each other, cannot be applied to these Markov Brains. In short, while Markov Brains with general, probabilistic HMGs may be useful tools for artificial evolution experiments, the networks of elements they encode cannot generally be interpreted as networks of causally interacting components.

While instantaneous interaction may be a curious phenomenon in evolvable Markov Brains, the question remains whether the potential gain in computational power through such instantaneous interactions has any effect on the evolution and functionality of Markov Brains. To explore this question, we implemented a decomposable version of the evolvable probabilistic HMGs (up to four-in-four-out) with conditionally independent outputs {out1, . . . , outN}, such that

P(out_i | I, {out_{1:N \ i}}) = P(out_i | I)    (4.9)

for all input states I and all outputs {out1, . . . , outN}, where N is the number of outputs. These decomposable HMGs comply with an extended version of eqs. 4.2-4.5:

P(out_{1:N} = O | I) = Π_{i=1:N} P(out_i = O_i | I)    (4.10)

for all output states O and all input states I, and thus guarantee Markov Brains with causally interpretable neural networks. In the following, we will compare the evolution of Markov Brains generated by probabilistic HMGs against Markov Brains generated by decomposable probabilistic HMGs in several task environments. We will show that instantaneous interaction hampers evolutionary adaptation. At the same time, we show that systems evolved from components without instantaneous interaction adapt their cognitive machinery. Contrary to what might be expected given the potential gain in computational power through instantaneous interactions, Markov Brains with decomposable HMGs required fewer gates to create a more densely connected network with better task performance. We conjecture that instantaneous interactions provide no computational advantage for agents evolving in the tested sensory-motor task environments.

4.3 Results

Probabilistic HMGs in Markov Brains can independently evolve the probabilities that determine their outputs, which can lead to instantaneous interactions, as outlined above.
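The distinction between the two kinds of probability tables can be made concrete with a small sketch. The encoding details below (loci as floats in [0, 1], row-wise construction for a single input state, and the function names) are simplifying assumptions for illustration, not the actual genome encoding: decomposable rows are built from marginals via eq. 4.10, and decomposability of a given row can be checked with eq. 4.6.

```python
# Sketch of conventional vs. decomposable probabilistic two-in-two-out rows.
from itertools import product

def conventional_row(loci):
    """Four freely evolved loci normalized to sum to 1; such a row may violate
    conditional independence and therefore allow instantaneous interaction."""
    total = sum(loci)
    return [x / total for x in loci]

def decomposable_row(p_a, p_b):
    """Row built from two marginals via eq. 4.10: P(o1,o2|I) = P(o1|I)P(o2|I)."""
    return [pa * pb for pa, pb in product([1 - p_a, p_a], [1 - p_b, p_b])]

def is_decomposable(row, tol=1e-9):
    """Eq. 4.6 for a two-output row: P00*P11 must equal P01*P10."""
    p00, p01, p10, p11 = row
    return abs(p00 * p11 - p01 * p10) < tol

print(is_decomposable(conventional_row([0.0, 0.5, 0.5, 0.0])))  # False: PD above
print(is_decomposable(decomposable_row(0.5, 0.5)))              # True: PD* above
```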
Here, we thus introduce another type of probabilistic HMG that we call a decomposable HMG. These gates encode their size and connectivity in the same way as probabilistic HMGs. Nevertheless, the probability tables of decomposable HMGs are restricted according to eqs. 4.9 and 4.10, and thus must be encoded in a different way than those of general probabilistic HMGs. Probabilities are encoded as one number per locus. In the non-decomposable HMG these probabilities are transcribed directly from the genome to fill the probability table, after which they are appropriately normalized given the table dimensions. For decomposable HMGs we ensure decomposability by transcribing only the marginal probabilities (eq. 4.9). The probability matrix is then created from these marginal probabilities P(out_i | I) by using eq. 4.10. This allows us to evolve Markov Brains with either or both types of gates, conventional probabilistic and decomposable, or in other words those that have instantaneous interaction and those that do not. The difference in rate of adaptation, and differences in the solutions evolved, will highlight the effect of instantaneous interaction on evolution.

Figure 4.1: Examples of gate decomposition into combinations of simpler gates. Panel A shows two deterministic logic gates whose inputs are cross-wired so that both gates receive the same inputs; the tables below show their probabilities to output 0 or 1, respectively. These probabilistic-logic boundary cases are effectively deterministic logic gates. Panel B shows a two-in-two-out logic gate that is functionally identical to the two gates depicted in panel A. Panel C shows two probabilistic logic gates connected similarly to the deterministic gates from panel A; the logic tables below show only the case where both inputs are 0. The lower table shows both probabilistic logic gates replaced by a single two-in-two-out probabilistic logic gate (similar to panel B) and how the new probabilities for that gate are constructed from the individual probabilities of both gates.

Differences between Markov Brains with and without instantaneous interactions may depend on the environments in which the virtual organisms were evolved. Some environments might benefit from instantaneous interactions, while others might favor conditional independence. For this reason, three different environments were tested:

Temporal Spatial Integration
In this environment [Marstaller et al., 2013a, Albantakis et al., 2014] the agent can move laterally left and right (one binary effector each) and is equipped with a set of upwards-facing sensors, two on the left and two on the right (one binary sensor each) with a gap of 2 block subunits between them, and 8 hidden binary elements for storing information. Small and large blocks fall toward the agent one at a time, and activate the sensors of the agent when above them, regardless of distance.
The blocks fall in different directions, and the block sizes and sensors are arranged in such a way that the blocks need to be observed over several updates in order to be distinguished. Small blocks must be caught while large blocks must be avoided.

Foraging and Spatial Reasoning
Agents are placed at a designated home area in a two-dimensional environment and must first discover and then obtain food. Once they have obtained food, they must move it to the home location, after which more food must be collected. Early in an agent's life food appears nearby, but the circular perimeter onto which food is randomly placed increases in diameter with each successful collection, moving the food successively farther away from home. The sensors for this agent provide a very coarse representation of the environment and are only accurate for nearby objects. Inputs include sensor signals for the food and home locations (on, facing, and near) as well as the angle to the food, with perception of angle discretized to 45 degrees. Consequently, food and home locations are not reliably observable, and the agent must navigate heuristically in the absence of this sensory information. Outputs include turning left or right and moving forward.

Associative Memory
This two-dimensional environment [Grabowski et al., 2010, Grabowski and Magana, 2014] presents the agent with a path of rewards, surrounded by a field of poison. The agent receives 4 sensor inputs about whether its current location is on the path, on poison, or on one of the cues. In addition, it receives cues about upcoming turns in the path. These cues, their turn associations, and the path itself are all randomly generated with each visit to the environment. The agent may affect 2 outputs that encode 4 actions: do nothing, turn left one unit, turn right one unit, and move forward one unit. In addition, the agent has 8 hidden binary elements for memory. The agent must explore the environment and learn which of the two symbols indicates right and which one indicates left. The agent must then remember those associations and use that knowledge to navigate the path properly.

For each experimental condition, we evolved 120 independent populations of 100 organisms until their performance plateaued. A plateau was reached near generation 10,000 in the temporal spatial integration environment, generation 5,000 in the associative learning environment, and generation 3,000 in the foraging environment. After evolution the line of descent was reconstructed [Lenski et al., 2003a], that is, the path of inheritance from a random organism in the final population to its ancestor in the initial population. All sweeping mutations are observable in the line of descent. We find that in the foraging and temporal spatial integration environments agents restricted to decomposable gates adapt much more quickly and also achieve a better final performance (see Figure 4.2).

The associative memory environment has a tunable punishment parameter for the cost of wandering off the correct path. The relative difference between the reward for following the path and the punishment for straying from the path greatly influences the evolvability of successful strategies. If wandering off the path is very costly, then the agent is severely limited in its freedom to make mistakes and explore if it is to maintain its status as a viable organism. This harsh limitation on mistake-making hampers evolution.
We find that when punishment for path deviation is relatively small or zero, decomposable HMGs provide a clear evolutionary advantage. When punishment is high, agents restricted to conventional probabilistic HMGs fail to evolve at all, while agents restricted to decomposable HMGs still evolve functional Markov Brains (see Figure 4.3).

Figure 4.2: Average fitness of organisms on the line of descent in the temporal spatial integration (A) and foraging (B) environments. The solid line represents the average performance of agents restricted to conventional probabilistic HMGs (with instantaneous interactions); the dashed line represents the average performance of agents restricted to decomposable HMGs (without instantaneous interactions). The y axis is normalized to show the fraction of maximally attainable fitness in each environment. The gray shadow indicates the bootstrapped 95% confidence interval of the mean.

Figure 4.3: Average performance of agents evolved in the association environment with increasing levels of error punishment. Performance of agents evolved while restricted to decomposable HMGs is shown as a dashed line; performance of agents evolved while restricted to conventional probabilistic HMGs is shown as a solid line. The gray shadow indicates the bootstrapped 95% confidence interval of the mean. Panel A shows the results for agents receiving zero punishment for path-following errors. Panel B shows the results for a punishment of 0.05, meaning 0.05 was subtracted from the cumulative agent score every time it wandered off the path. Panel C shows the results for agents evolved with a punishment of 0.1, again subtracted from their score when wandering off the path. The y axis was normalized to show relative performance, with 1.0 being the maximally attainable fitness in each environment assuming ideal behavior.

As shown in Figure 4.4, it is not only the rate of adaptation that is higher for decomposable gates, but also the mean final achieved performance in all three environments, even though the extrema and distributions vary greatly by environment.

In the above simulations, Markov Brains were restricted to one type of HMG, either probabilistic or decomposable, and were seeded with 15 gates at the start of evolution. In a second set of experiments we allowed Markov Brains to evolve both types of gates in order to assess whether there is preferential selection of one gate type over the other. If either of the gate types confers more of an advantage, it should be selected more often than the other gate type. The null hypothesis is that both gates confer the same benefit and would be under equal selection. Although it took much more computational power to simulate these populations for longer, we also wanted to investigate whether there might be oscillations in preferential selection for gate type. In all tested environments we find that decomposable HMGs are used more often than conventional probabilistic HMGs (see Figure 4.5). To exclude any variation resulting from the initial
evolvability differences between the gates, agents are seeded with 15 HMGs of each type. We find that selection quickly reduces these numbers, but decomposable HMGs are kept more often than probabilistic HMGs. Observe that different tasks require brains of different sizes, but the dominance of the decomposable gate type is independent of this phenomenon (see Figure 4.5). For these competitions, 50 independent populations of 100 agents each were evolved in each environment.

Figure 4.4: Distribution of performances at the end of evolution for each of two conditions in each of three environments. At the end of evolution, the 200th-from-last agent of each independent line of descent is collected to create these distributions. This near-the-end data slicing is necessary to eliminate noise toward the end of the line of descent caused by mutated extant agents not yet pruned from the line of descent by selection. The three environments are represented on the x-axis as I (temporal spatial integration task), F (foraging task), and A (association task with a punishment of 0.05). In each environment (I, F, and A) agents were allowed to use either conventional probabilistic HMGs (labeled "prob") or decomposable HMGs (labeled "dec"). Red dashes indicate the mean and extrema. Gray violin plots show the distributions of normalized fitness for all 120 replicates per experimental condition. Fitness was normalized such that the maximal theoretically attainable fitness is represented as 1.0. For each environment (Integration, Foraging, and Association), the conditions under which evolution was limited to decomposable gates produced significantly better adapted agents. Significance was tested using the Mann-Whitney U test, with p < 0.05 for each environment (p = 0.0, p = 0.0, p < 2.2 × 10^-112).

Figure 4.5: Average number of gates evolved along the line of descent in a competition experiment executed in different environments. Panel A shows the results from the temporal spatial integration environment, Panel B the foraging environment, and Panel C the association environment with a punishment cost of 0.05. The average number of preferentially selected decomposable HMGs over generations is shown as a dashed line, whereas the solid line represents the average number of preferentially selected conventional probabilistic HMGs. The gray shadow indicates a 95% confidence interval.

So far, our results suggest that decomposable HMGs have a clear advantage when it comes to evolutionary adaptation of Markov Brains. Furthermore, these results suggest that systems without instantaneous interactions evolve faster, and select against instantaneous interaction when possible. Instantaneous interaction allows components to have outputs that share information beyond that of their inputs. This raises the question of whether instantaneous interactions provide an advantage to computation; our results suggest the opposite. The question is now whether systems that cannot have instantaneous interactions compensate for that loss. It may be that decomposed components are more adaptable in a way that compensates for the missing information. To investigate this issue we also assessed the effect of gate type
Figure 4.5: Average number of gates evolved along the line of descent in a competition experiment executed in different environments. Agents were evolved in three different environments: Panel A shows the results from the temporal spatial integration environment, Panel B shows the results from the foraging environment, and Panel C shows the results from the association environment with a punishment cost of 0.05. The average number of preferentially selected decomposable HMGs over generations is shown as a dashed line, whereas the solid line represents the average number of preferentially selected conventional probabilistic HMGs. The gray shadow indicates a 95% confidence interval.

To investigate this issue we also assessed the effect of gate type on the structure of the cognitive machinery. Observe that the different kinds of gates differ only in how their probabilities are encoded; they do not differ in how connections are made or in how mutations affect their connections or abundance. While there are many ways to measure these networks [McCabe, 1976], we test evolved brains for the number of gates and for two graph measures used in similar work: the gamma index (connectivity) and the brain diameter (for a more detailed explanation see [Schossau et al., 2015]); a brief computational sketch of both measures is given after Figure 4.6. The first measure is simply the number of gates present in the phenotype. The gamma index measures the density of connections relative to all possible connections that could have been made; a higher gamma index indicates a more connected Markov Brain. The brain diameter is the length of the shortest path between the furthest nodes in the network: it is assessed by computing the shortest connections between all pairs of components, and the longest of these shortest connections is the brain diameter. The larger the brain diameter, the more steps it takes for information to traverse the Markov Brain, and the more computations are possible.

Probably the most interesting result is that the total number of gates was lower when only decomposable gates were allowed, at least when agents were evolved in the foraging and in the association environments (see Figure 4.7 panel C). When agents were evolved in the temporal spatial integration environment, we find a slightly higher number of decomposable gates. This suggests that the subsequent effects on density and diameter cannot simply be explained by a higher number of gates. We find that the density and diameter of the evolved Markov Brains are higher for those experiments where agents were restricted to the evolution of decomposable gates (see Figure 4.7 panels B and C). This suggests that agents with decomposable gates evolve to use fewer gates, and that these fewer gates are more tightly connected. Additionally, the total diameter of the network becomes longer. This result suggests a possible contradiction to the idea that a system needs more decomposable gates to compensate for the lack of instantaneous interactions. Secondly, it suggests that if the loss of instantaneous interactions is part of a trade-off, then it is offset by an increase in connectivity and brain diameter, though it may only be that these populations evolve faster and thus arrive at better-connected high-fitness networks sooner. The increase in brain diameter suggests that more computations are occurring sequentially in the brain.

Figure 4.6: Average fitness along the line of descent in each environment while both non-decomposable and decomposable gate types were evolvable. Panel A shows line of descent fitness for the temporal spatial integration environment, Panel B shows line of descent fitness for the foraging environment, and Panel C shows line of descent fitness for the association environment. The gray shadow indicates a 95% confidence interval.
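As a rough illustration of the two graph measures described above, the following sketch computes a connection density and a longest-shortest-path diameter for a toy directed wiring diagram. It uses networkx purely for convenience, and the nodes and edges are hypothetical rather than taken from an evolved Markov Brain.

    import networkx as nx

    def gamma_index(g: nx.DiGraph) -> float:
        """Fraction of all possible directed connections that are present."""
        n = g.number_of_nodes()
        possible = n * (n - 1)  # directed edges, excluding self-loops
        return g.number_of_edges() / possible if possible else 0.0

    def brain_diameter(g: nx.DiGraph) -> int:
        """Longest of the shortest paths between any pair of reachable nodes."""
        lengths = dict(nx.all_pairs_shortest_path_length(g))
        return max(d for targets in lengths.values() for d in targets.values())

    # Toy wiring diagram: nodes stand in for brain components (inputs, gates,
    # memory, outputs); directed edges stand in for gate connections.
    g = nx.DiGraph([(0, 1), (1, 2), (2, 3), (0, 3)])
    print(gamma_index(g))     # 4 of 12 possible directed edges -> 0.33
    print(brain_diameter(g))  # longest shortest path: 0 -> 1 -> 2 -> 3 = 3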
However, we have no intuition about what this could mean for the computations that happen in the Markov Brain. While the mixed-gate conditions favored the evolution of decomposable gates, this preference could have come at the cost of lower achieved fitness than in the homogeneous-gate conditions.

Figure 4.7: Distributions of different properties of Markov Brains at the end of evolution in three environments and two conditions per environment. The three environments are represented on the x axes as I (temporal spatial integration task), F (foraging task), and A (association task with a punishment of 0.05). In each environment (I, F, and A) agents were allowed to use either conventional probabilistic gates (labeled "prob") or decomposable gates (labeled "dec"). Panel A shows the number of gates, Panel C shows the gamma index (that is, the connection density), and Panel B shows the diameter (longest of the shortest connections). Each panel shows the distributions as violin plots in gray, and the mean as well as the extreme points as red dashes. For each environment of Integration, Learning, and Association, and for both the Gamma Index and Diameter properties, the conditions under which evolution was limited to decomposable gates produced significantly more connected and larger-diameter Markov Brains. Significance was tested using the Mann-Whitney U test, with p < 0.05. For Gamma Index, the differences between gate-restricted conditions within environments produced p values of p = 4.9 × 10^-23, p = 6.5 × 10^-6, and p = 1.3 × 10^-3, respectively. For Diameter, the differences between gate-restricted conditions within environments produced p values of p = 2.2 × 10^-6, p = 3.6 × 10^-2, and p = 4.2 × 10^-3, respectively.

However, an examination of line of descent fitness shows no degradation in evolvability. This suggests that while a slower and more complex evolutionary trajectory with non-decomposable gates existed within the search space, evolution preferentially selected against that path in favor of one with more evolvability.

4.4 Discussion

Evolvable Markov Brains are a useful modeling framework for studying evolution [Olson et al., 2013a] and complex cognitive systems [Edlund et al., 2011, Marstaller et al., 2013a, Albantakis et al., 2014], but they also provide a powerful alternative to conventional machine learning approaches due to their capacity for solving classification tasks [Chapman et al., 2013]. Their connectivity and logic structure are determined by genes that encode HMGs, which map inputs to outputs via a logic table. Probabilistic HMGs yield the most general logic tables, as each entry is independently determined by one locus in the genome, with the only restriction being that each row has to sum to 1. As a consequence, probabilistic HMGs allow for instantaneous interactions: shared information between outputs. This information sharing may contain information useful to the system, but it prohibits interpreting Markov Brains with probabilistic HMGs as a network of elementary, causally interacting components [Oizumi et al., 2014, Pearl, 2014, Albantakis and Tononi, 2015].
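To make the distinction concrete, the following sketch contrasts one row of each gate type's logic table for a single input pattern, assuming binary outputs; the variable names are illustrative, and this is not the actual Markov Brain implementation. A conventional probabilistic row is an arbitrary joint distribution over output patterns, while a decomposable row is forced to factorize into per-output probabilities (conditionally independent outputs, hence no instantaneous interactions).

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(0)
    n_outputs = 2  # two binary outputs for this illustration

    # Conventional probabilistic HMG: for one input pattern, the genome encodes
    # a full joint distribution over all 2^n_outputs output patterns (the row
    # must sum to 1, but is otherwise unconstrained).
    joint_row = rng.random(2 ** n_outputs)
    joint_row /= joint_row.sum()

    # Decomposable HMG: the genome encodes one probability per output bit, and
    # the joint distribution over output patterns is the product of those marginals.
    p_bit = rng.random(n_outputs)  # P(output_i = 1) for each output bit
    decomposable_row = np.array([
        np.prod([p if bit else 1.0 - p for bit, p in zip(pattern, p_bit)])
        for pattern in product([0, 1], repeat=n_outputs)
    ])

    print(joint_row)         # generally does not factorize into its marginals
    print(decomposable_row)  # always factorizes; a special case of the above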
While this may not impact machine learning applications of Markov Brains, it may limit their use as a surrogate for biological systems and complicates the analysis of their causal structure. To date, the possible role of these instantaneous interactions in cognitive systems has remained unclear. Here we investigated the impact of instantaneous interactions on cognitive systems by comparing the evolution of two different types of Markov Brains. Agents were evolved in three different environments and were restricted to either conventional probabilistic HMGs or decomposable HMGs that did not allow for instantaneous interactions (eq. 9 and 10) but were otherwise identical to conventional probabilistic logic gates. The decomposable HMGs introduced here are a special case of probabilistic HMGs, with the additional constraint of conditionally independent outputs. A priori, probabilistic HMGs thus have more computational potential. Nevertheless, we found that decomposable HMGs without instantaneous interactions not only allow faster adaptation of Markov Brains, but also promote improved final agent performance. In populations that could evolve the use of both probabilistic and decomposable HMGs, we found preferential selection for decomposable gates in all three tested environments. Lastly, we found that populations restricted to decomposable HMGs evolved to use fewer gates, with a larger diameter and a higher density of connectivity.

These findings suggest that instantaneous interactions hamper the evolution of cognitive systems rather than providing computational advantages. In fact, Markov Brains with decomposable HMGs evolved to higher fitness levels using a similar or even smaller number of gates than Markov Brains with probabilistic HMGs. This indicates that Markov Brains with decomposable HMGs had no need to compensate for a lack of computational power. On the contrary, evolution seems to exploit the conditional independence property of decomposable HMGs in these systems and to avoid instantaneous interactions. This is suggested by the finding that Markov Brains with decomposable HMGs are more densely connected, which means that individual decomposable HMGs evolved, on average, more outputs than individual probabilistic HMGs. Conditional independence thus seems to facilitate packing more input-output relations into a single HMG. In Markov Brains with probabilistic HMGs, more gates with fewer average outputs may be required specifically to avoid instantaneous interactions.

Further investigation should provide more insight into the question of why systems with elementary causal components that do not allow instantaneous interactions evolve faster. After all, an instantaneous interaction contains information seemingly "from nothing", as it is created between the outputs of logic units and is not solely caused by inputs. Based on our simulations, several factors may help explain why evolution prefers decomposable gates:

Search Space: Evolutionary search over populations using probabilistic gates must traverse a larger state space, since probabilistic gates include, as a subset, the gates that are decomposable. When evolution is limited to exploring only decomposable gates, the search space becomes much smaller and thus easier to explore (an illustrative parameter count is sketched after this list).

Epistatic Interactions: In decomposable gates, mutations may have a different functional phenotypic effect than those affecting conventional probabilistic gates. This may allow traversing the evolutionary search space in greater leaps.

Robustness of Functions: Related to the above, conditional independence between the outputs of a decomposable HMG may lead to a more robust encoding of beneficial input-output relations.

Determinism with respect to Inputs: All tested evolution environments required the agents to react to sensor stimuli in order to achieve high fitness. While a certain amount of intrinsic indeterminism (noise) may be useful to trigger exploratory behavior, instantaneous interactions between outputs that are not explained by past inputs might simply provide no fitness advantage in sensory-motor cognitive tasks. Decomposable HMGs can be viewed as a noisy version of deterministic HMGs. Any additional freedom in the logic table of probabilistic HMGs may be superfluous or even detrimental.
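The search-space argument can be made concrete by counting free parameters per gate. Assuming binary inputs and outputs, a probabilistic HMG with n inputs and m outputs encodes 2^n rows of 2^m entries each, with one sum-to-1 constraint per row, while a decomposable HMG encodes only one probability per output bit per row. The function names below are illustrative, and the counts ignore the additional loci that encode connectivity.

    def free_params_probabilistic(n_inputs: int, n_outputs: int) -> int:
        # One joint row per input pattern; each row has 2^n_outputs entries
        # constrained to sum to 1, leaving 2^n_outputs - 1 free parameters.
        return 2 ** n_inputs * (2 ** n_outputs - 1)

    def free_params_decomposable(n_inputs: int, n_outputs: int) -> int:
        # One independent probability per output bit per input pattern.
        return 2 ** n_inputs * n_outputs

    # For a gate with 4 inputs, the gap grows quickly with the number of outputs:
    for m in range(1, 5):
        print(m, free_params_probabilistic(4, m), free_params_decomposable(4, m))
    # 1: 16 vs 16,  2: 48 vs 32,  3: 112 vs 48,  4: 240 vs 64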
Our work has several practical implications for the use of Markov Brains across disciplines. For machine learning applications, where the interest lies primarily in the speed of evolution and final fitness, our results suggest that decomposable HMGs may improve performance considerably compared to general probabilistic HMGs. As models for cognitive systems, using decomposable HMGs instead of general probabilistic HMGs has the additional benefit that decomposable gates lead to causally interpretable Markov Brains. This allows analyzing the causal composition of the resulting Markov Brains, for example, within the framework of Integrated Information Theory [Oizumi et al., 2014, Albantakis et al., 2014].

Finally, whether biology, and more fundamentally physics, allows for true instantaneous interactions in nature is an open question. Typical accounts of causation require that causes precede their effects. Yet missing variables and coarse-grained measurements may lead to system models with apparent instantaneous interactions between variables [James and Crutchfield, 2016]. Taken together, our results suggest that there is no apparent reason to include instantaneous interactions in Markov Brains.

4.5 Acknowledgements

This work was supported in part by the National Science Foundation's BEACON Center for the Study of Evolution in Action, under contract No. DBI-0939454. We wish to acknowledge the support of the Michigan State University High Performance Computing Center and the Institute for Cyber Enabled Research.

BIBLIOGRAPHY

[Adami et al., 2012] Adami, C., Schossau, J., and Hintze, A. (2012). Evolution and stability of altruist strategies in microbial games. Phys Rev E, 85(1 Pt 1):011914.
[Ahn et al., 2006] Ahn, Y.-Y., Jeong, H., and Kim, B. J. (2006). Wiring cost in the organization of a biological neuronal network. Physica A: Statistical Mechanics and its Applications, 367:531–537.
[Albantakis et al., 2014] Albantakis, L., Hintze, A., Koch, C., Adami, C., and Tononi, G. (2014). Evolution of integrated causal structures in animats exposed to environments of increasing complexity. PLoS Computational Biology, 10(12):e1003966.
[Albantakis and Tononi, 2015] Albantakis, L. and Tononi, G. (2015). The intrinsic cause-effect power of discrete dynamical systems – from elementary cellular automata to adapting animats. Entropy, 17(8):5472–5502.
[Amrutha, 2014] Amrutha, S. (2014). Agent based simulation of routine activity with social learning behavior. In Computational Intelligence and Computing Research (ICCIC), 2014 IEEE International Conference on, pages 1–6. IEEE.
[Archetti and Scheuring, 2011] Archetti, M. and Scheuring, I. (2011). Coexistence of cooperation and defection in public goods games. Evolution, 65(4):1140–1148.
[Attik et al., 2005] Attik, M., Bougrain, L., and Alexandre, F. (2005). Neural network topology optimization. In Proceedings of the 15th International Conference on Artificial Neural Networks: Formal Models and Their Applications – Volume Part II, ICANN'05, pages 53–58, Berlin, Heidelberg. Springer-Verlag.
[Ay, 2015] Ay, N. (2015). Information geometry on complexity and stochastic interaction. Entropy, 17(4):2432.
[Ay et al., 2008] Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E. (2008). Predictive information and explorative behavior of autonomous robots. The European Physical Journal B, 63(3):329–339.
[Baddeley, 1998] Baddeley, A. (1998). Random generation and the executive control of working memory. The Quarterly Journal of Experimental Psychology: Section A, 51(4):819–852.
[Bain, 1985] Bain, A. (1985). Mind and Body: The Theories of Their Relation. Routledge.
[Balduzzi and Tononi, 2008] Balduzzi, D. and Tononi, G. (2008). Integrated information in discrete dynamical systems: motivation and theoretical framework. PLoS Computational Biology, 4(6):e1000091.
[Balduzzi and Tononi, 2009] Balduzzi, D. and Tononi, G. (2009). Qualia: the geometry of integrated information. PLoS Computational Biology, 5(8):e1000462.
[Barrett AB, 2011] Barrett, A. B. and Seth, A. K. (2011). Practical measures of integrated information for time-series data. PLoS Comput Biol, 7(1).
[Beer, 1996] Beer, R. (1996). Toward the evolution of dynamical neural networks for minimally cognitive behavior. In Maes, P., Mataric, M., Meyer, J., Pollack, J., and Wilson, S., editors, From Animals to Animats: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, pages 421–429, Cambridge, MA. MIT Press.
[Beer, 2003] Beer, R. (2003). The dynamics of active categorical perception in an evolved model agent. Adaptive Behavior, 11:209–243.
[Bitbol and Schwab, 2014] Bitbol, A.-F. and Schwab, D. J. (2014). Quantifying the role of population subdivision in evolution on rugged fitness landscapes. PLoS Computational Biology, 10(8):e1003778.
[Bojarski et al., 2016] Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L. D., Monfort, M., Muller, U., Zhang, J., et al. (2016). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.
[Brooks, 1986] Brooks, R. A. (1986). A robust layered control system for a mobile robot. Robotics and Automation, IEEE Journal of, 2(1):14–23.
[Brugger et al., 1996] Brugger, P., Monsch, A., Salmon, D., and Butters, N. (1996). Random number generation in dementia of the Alzheimer type: A test of frontal executive functions. Neuropsychologia, 34(2):97–103.
[Bryson, 1975] Bryson, A. E. (1975). Applied optimal control: optimization, estimation and control. CRC Press.
[Bullinaria, 2006] Bullinaria, J. A. (2006). Understanding the advantages of modularity in neural systems. In Proc. Twenty-Eighth Annual Conf. Cogn. Sci. Soc., pages 119–124.
[Chapman et al., 2013] Chapman, S., Knoester, D., Hintze, A., and Adami, C. (2013). Evolution of an artificial visual cortex for image recognition. In Advances in Artificial Life, ECAL, volume 12, pages 1067–1074.
[Cherniak et al., 2004] Cherniak, C., Mokhtarzada, Z., Rodriguez-Esteban, R., and Changizi, K. (2004). Global optimization of cerebral cortex layout. Proceedings of the National Academy of Sciences of the United States of America, 101(4):1081–1086.
[Claussen and Traulsen, 2008] Claussen, J. C. and Traulsen, A. (2008). Cyclic dominance and biodiversity in well-mixed populations. Physical Review Letters, 100(5).
[Clune et al., 2013] Clune, J., Mouret, J.-B., and Lipson, H. (2013). The evolutionary origins of modularity. Proceedings of the Royal Society of London B: Biological Sciences, 280(1755):20122863.
[Clune et al., 2011] Clune, J., Stanley, K. O., Pennock, R. T., and Ofria, C. (2011). On the performance of indirect encoding across the continuum of regularity. Evolutionary Computation, IEEE Transactions on, 15(3):346–367.
[Cully et al., 2015] Cully, A., Clune, J., Tarapore, D., and Mouret, J.-B. (2015). Robots that can adapt like animals. Nature, 521(7553):503–507.
[D'Ambrosio et al., 2011] D'Ambrosio, D., Lehman, J., Risi, S., and Stanley, K. (2011). Task switching in multiagent learning through indirect encoding. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS 2011). IEEE, Piscataway.
[De Jong, 1975] De Jong, K. A. (1975). An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD thesis, University of Michigan, Ann Arbor, MI, USA. AAI7609381.
[DeAngelis and Mooij, 2005] DeAngelis, D. L. and Mooij, W. M. (2005). Individual-based modeling of ecological and evolutionary processes. Annu. Rev. Ecol. Evol. Syst., 36:147–168.
[Deb, 2014] Deb, K. (2014). Multi-objective optimization. In Search Methodologies, pages 403–449. Springer.
[DeMarse et al., 2001] DeMarse, T. B., Wagenaar, D. A., Blau, A. W., and Potter, S. M. (2001). The neurally controlled animat: biological brains acting with simulated bodies. Autonomous Robots, 11(3):305–310.
[Dietterich, 2004] Dietterich, A. (2004). Reinforcement learning and its relationship to supervised learning. In Handbook of Learning and Approximate Dynamic Programming, IEEE, pages 47–60.
[Dietz, 2003] Dietz, R. D. (2003). Spatial competition, conflict and cooperation. PhD thesis, The Ohio State University.
[Edlund et al., 2011] Edlund, J. A., Chaumont, N., Hintze, A., Koch, C., Tononi, G., and Adami, C. (2011). Integrated information increases with fitness in the evolution of animats. PLoS Computational Biology, 7(10):e1002236.
[Floreano et al., 2008] Floreano, D., Dürr, P., and Mattiussi, C. (2008). Neuroevolution: from architectures to learning. Evolutionary Intelligence, 1(1):47–62.
[Gardner, 1970] Gardner, M. (1970). Mathematical games: The fantastic combinations of John Conway's new solitaire game "life". Scientific American, 223(4):120–123.
[Gauci and Stanley, 2010a] Gauci, J. and Stanley, K. O. (2010a). Autonomous evolution of topographic regularities in artificial neural networks. Neural Computation, 22(7):1860–1898.
[Gauci and Stanley, 2010b] Gauci, J. and Stanley, K. O. (2010b). Indirect encoding of neural networks for scalable Go. In Parallel Problem Solving from Nature, PPSN XI, pages 354–363. Springer.
[Goldsby et al., 2009] Goldsby, H. J., Goings, S., Clune, J., and Ofria, C. (2009). Problem decomposition using indirect reciprocity in evolved populations. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pages 105–112. ACM.
[Grabowski et al., 2010] Grabowski, L. M., Bryson, D. M., Dyer, F. C., Ofria, C., and Pennock, R. T. (2010). Early evolution of memory usage in digital organisms. In ALIFE, pages 224–231. The MIT Press.
[Grabowski and Magana, 2014] Grabowski, L. M. and Magana, J. A. (2014). Building on simplicity: Multi-stage evolution of digital organisms. In ALIFE, pages 113–120. The MIT Press.
[Guyon et al., 2011] Guyon, I., Cawley, G., Dror, G., and Lemaire, V. (2011). Results of the active learning challenge. In Active Learning and Experimental Design Workshop, in conjunction with AISTATS 2010, pages 19–45.
[Haley et al., 2014] Haley, P. B., Olson, R. S., Dyer, F. C., and Adami, C. (2014). Exploring conditions that select for the evolution of cooperative group foraging. In ALIFE 14: The Fourteenth Conference on the Synthesis and Simulation of Living Systems, volume 14, pages 310–311.
[He et al., 2014] He, J.-z., Wang, R.-w., and Li, Y.-T. (2014). Evolutionary stability in the asymmetric volunteer's dilemma. PLoS ONE, 9(8):1–6.
[Hintze and Miromeni, 2014] Hintze, A. and Miromeni, M. (2014). Evolution of autonomous hierarchy formation and maintenance. In ALIFE 14: The Fourteenth Conference on the Synthesis and Simulation of Living Systems, volume 14, pages 366–367.
[Hochreiter, 1998] Hochreiter, S. (1998). Recurrent neural net learning and vanishing gradient. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(2):107–116.
[Hochreiter and Schmidhuber, 1997] Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
[Hofbauer and Sigmund, 1998] Hofbauer, J. and Sigmund, K. (1998). Evolutionary Games and Population Dynamics. Cambridge University Press, Cambridge, UK.
[Huizinga et al., 2014] Huizinga, J., Clune, J., and Mouret, J.-B. (2014). Evolving neural networks that are both modular and regular: HyperNEAT plus the connection cost technique. In Proceedings of the 2014 Conference on Genetic and Evolutionary Computation, pages 697–704. ACM.
[Hyvärinen et al., 2004] Hyvärinen, A., Karhunen, J., and Oja, E. (2004). Independent Component Analysis, volume 46. John Wiley & Sons.
[Jahanshahi et al., 2006] Jahanshahi, M., Saleem, T., Ho, A. K., Dirnberger, G., and Fuller, R. (2006). Random number generation as an index of controlled processing. Neuropsychology, 20(4):391.
[James and Crutchfield, 2016] James, R. G. and Crutchfield, J. P. (2016). Multivariate dependence beyond Shannon information. arXiv preprint arXiv:1609.01233.
[Jefferson, 2004] Jefferson, K. K. (2004). What drives bacteria to produce a biofilm? FEMS Microbiology Letters, 236(2):163–173.
[Joshi et al., 2013] Joshi, N. J., Tononi, G., and Koch, C. (2013). The minimal complexity of adapting agents increases with fitness. PLoS Computational Biology, 9(7):e1003111.
[Killingback and Doebeli, 1996] Killingback, T. and Doebeli, M. (1996). Spatial evolutionary game theory: Hawks and doves revisited. Proceedings of the Royal Society of London B: Biological Sciences, 263(1374):1135–1144.
[Knowles et al., 2001] Knowles, J. D., Watson, R. A., and Corne, D. W. (2001). Reducing local optima in single-objective problems by multi-objectivization. In Evolutionary Multi-Criterion Optimization, pages 269–283. Springer.
[Kvam et al., 2015] Kvam, P., Cesario, J., Schossau, J., Eisthen, H., and Hintze, A. (2015). Computational evolution of decision-making strategies. In 37th Annual Conference of the Cognitive Science Society, volume 37, pages 1225–1230.
[Lehman et al., 2014] Lehman, J., Clune, J., and Risi, S. (2014). An anarchy of methods: Current trends in how intelligence is abstracted in AI. IEEE Intelligent Systems, 29(6):56–62.
[Lehman and Stanley, 2008] Lehman, J. and Stanley, K. O. (2008). Exploiting open-endedness to solve problems through the search for novelty. In Proc. of the Eleventh Intl. Conf. on Artificial Life (ALIFE XI), Cambridge, MA. MIT Press.
[Lenski, 2011] Lenski, R. E. (2011). Evolution in action: a 50,000-generation salute to Charles Darwin. Microbe, 6(616):30–33.
[Lenski et al., 2003a] Lenski, R. E., Ofria, C., Pennock, R. T., and Adami, C. (2003a). The evolutionary origin of complex features. Nature, 423(6936):139–144.
[Lenski et al., 2003b] Lenski, R. E., Ofria, C., Pennock, R. T., and Adami, C. (2003b). The evolutionary origin of complex features. Nature, 423(6):139–144.
[Lu and Ito, 1999] Lu, B.-L. and Ito, M. (1999). Task decomposition and module combination based on class relations: a modular neural network for pattern classification. IEEE Transactions on Neural Networks, 10(5):1244–1256.
[Lungarella et al., 2005] Lungarella, M., Pegors, T., Bulwinkle, D., and Sporns, O. (2005). Methods for quantifying the informational structure of sensory and motor data. Neuroinformatics, 3(3):243–262.
[Lungarella and Sporns, 2006] Lungarella, M. and Sporns, O. (2006). Mapping information flow in sensorimotor networks. PLoS Computational Biology, 2(10):e144.
[Marstaller et al., 2010] Marstaller, L., Hintze, A., and Adami, C. (2010). Measuring representation. In Proceedings ASCS09: 9th Conference of the Australasian Society for Cognitive Science, pages 232–237. North Ryde, NSW: Macquarie Centre for Cognitive Science.
[Marstaller et al., 2013a] Marstaller, L., Hintze, A., and Adami, C. (2013a). The evolution of representation in simple cognitive networks. Neural Computation, 25(8):2079–2107.
[Marstaller et al., 2013b] Marstaller, L., Hintze, A., and Adami, C. (2013b). The evolution of representation in simple cognitive networks. Neural Comput, 25(8):2079–2107.
[Maynard Smith, 1982] Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge University Press, Cambridge, UK.
[McCabe, 1976] McCabe, T. (1976). A complexity measure. IEEE Transactions on Software Engineering, 2:308–320.
[McGill, 1954] McGill, W. J. (1954). Multivariate information transmission. Psychometrika, 19(2):97–116.
[Michalewicz, 1996] Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs. Springer Verlag, New York.
[Miikkulainen et al., 2017] Miikkulainen, R., Liang, J., Meyerson, E., Rawal, A., Fink, D., Francon, O., Raju, B., Navruzyan, A., Duffy, N., and Hodjat, B. (2017). Evolving deep neural networks. arXiv preprint arXiv:1703.00548.
[Mnih et al., 2016] Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937.
[Nadell et al., 2016] Nadell, C. D., Drescher, K., and Foster, K. R. (2016). Spatial structure, cooperation and competition in biofilms. Nature Reviews Microbiology.
[Naiem et al., 2010] Naiem, A., Reda, M., El-Beltagy, M., and El-Khodary, I. (2010). An agent based approach for modeling traffic flow. In Informatics and Systems (INFOS), 2010 The 7th International Conference on, pages 1–6. IEEE.
[Nash, 1951] Nash, J. (1951). Non-cooperative games. Annals of Mathematics, pages 286–295.
[Nash et al., 1950] Nash, J. F. et al. (1950). Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49.
[Neumann et al., 1944] Neumann, J. v., Morgenstern, O., et al. (1944). Theory of Games and Economic Behavior. Princeton University Press, Princeton.
[Nguyen et al., 2015] Nguyen, A., Yosinski, J., and Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Computer Vision and Pattern Recognition (CVPR '15). IEEE.
[Nowak and Sigmund, 1993a] Nowak, M. and Sigmund, K. (1993a). A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner's dilemma game. Nature, 364(6432):56.
[Nowak and Sigmund, 1993b] Nowak, M. and Sigmund, K. (1993b). A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner's dilemma game. Nature, 364(6432):56–58.
[Oizumi et al., 2014] Oizumi, M., Albantakis, L., and Tononi, G. (2014). From the phenomenology to the mechanisms of consciousness: Integrated Information Theory 3.0. PLoS Computational Biology, 10(5):e1003588.
[Olson et al., 2015] Olson, R. S., Haley, P. B., Dyer, F. C., and Adami, C. (2015). Exploring the evolution of a trade-off between vigilance and foraging in group-living organisms. Royal Society Open Science, 2(9).
[Olson et al., 2013a] Olson, R. S., Hintze, A., Dyer, F. C., Knoester, D. B., and Adami, C. (2013a). Predator confusion is sufficient to evolve swarming behaviour. Journal of The Royal Society Interface, 10(85):20130305.
[Olson et al., 2013b] Olson, R. S., Knoester, D. B., and Adami, C. (2013b). Critical interplay between density-dependent predation and evolution of the selfish herd. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, pages 247–254. ACM.
[Olson et al., 2013c] Olson, R. S., Knoester, D. B., and Adami, C. (2013c). Evolution of swarming behavior is shaped by how predators attack. arXiv preprint arXiv:1310.6012.
[Pearl, 2014] Pearl, J. (2014). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
[Qiu and Sapiro, 2015] Qiu, Q. and Sapiro, G. (2015). Learning transformations for clustering and classification. Journal of Machine Learning Research, 16:187–225.
[Rashevsky and Rashevsky, 1960] Rashevsky, N. and Rashevsky, N. (1960). Mathematical biophysics: physico-mathematical foundations of biology, volume 2. Dover, New York.
[Reisinger et al., 2004] Reisinger, J., Stanley, K. O., and Miikkulainen, R. (2004). Evolving reusable neural modules. In Genetic and Evolutionary Computation Conference, pages 69–81. Springer.
[Rinehart et al., 2006] Rinehart, N. J., Bradshaw, J. L., Moss, S. A., Brereton, A. V., and Tonge, B. J. (2006). Pseudo-random number generation in children with high-functioning autism and Asperger's disorder: further evidence for a dissociation in executive functioning? Autism, 10(1):70–85.
[Risi and Stanley, 2011] Risi, S. and Stanley, K. O. (2011). Enhancing ES-HyperNEAT to evolve more complex regular neural networks. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, pages 1539–1546. ACM.
[Rivoire and Leibler, 2011] Rivoire, O. and Leibler, S. (2011). The value of information for populations in varying environments. Journal of Statistical Physics, 142(6):1124–1166.
[Rocha et al., 2005] Rocha, M., Cortez, P., and Neves, J. (2005). Simultaneous evolution of neural network topologies and weights for classification and regression. In International Work-Conference on Artificial Neural Networks, pages 59–66. Springer.
[Russell and Norvig, 2003] Russell, S. J. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Pearson Education, Upper Saddle River, NJ, 2nd edition.
[Schneidman et al., 2003] Schneidman, E., Still, S., Berry, M. J., Bialek, W., et al. (2003). Network information and connected correlations. Physical Review Letters, 91(23):238701.
[Schossau et al., 2015] Schossau, J., Adami, C., and Hintze, A. (2015). Information-theoretic neuro-correlates boost evolution of cognitive systems. Entropy, 18(1):6.
[Searcy and Nowicki, 2005] Searcy, W. A. and Nowicki, S. (2005). The Evolution of Animal Communication: Reliability and Deception in Signaling Systems. Princeton University Press.
[Sebastiani, 2002] Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1):1–47.
[Smith, 1974] Smith, J. M. (1974). The theory of games and the evolution of animal conflicts. Journal of Theoretical Biology, 47(1):209–221.
[Stanley and Miikkulainen, 2002] Stanley, K. O. and Miikkulainen, R. (2002). Evolving neural networks through augmenting topologies. Evol Comput, 10(2):99–127.
[Street et al., 1995] Street, W. N., Mangasarian, O. L., and Wolberg, W. H. (1995). An inductive learning approach to prognostic prediction. In ICML, pages 522–530.
[Suchorzewski and Clune, 2011] Suchorzewski, M. and Clune, J. (2011). A novel generative encoding for evolving modular, regular and scalable networks. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, pages 1523–1530. ACM.
[Szabo and Fath, 2007] Szabo, G. and Fath, G. (2007). Evolutionary games on graphs. Physics Reports, 446(4-6):97–216.
[Tononi, 2008] Tononi, G. (2008). Consciousness as integrated information: a provisional manifesto. The Biological Bulletin, 215(3):216–242.
[Tononi et al., 1994] Tononi, G., Sporns, O., and Edelman, G. M. (1994). A measure for brain complexity: relating functional segregation and integration in the nervous system. Proceedings of the National Academy of Sciences, 91(11):5033–5037.
[Traulsen and Hauert, 2009] Traulsen, A. and Hauert, C. (2009). Stochastic evolutionary game dynamics. In Schuster, H.-G., editor, Reviews of Nonlinear Dynamics and Complexity, volume 2. Wiley-VCH, Weinheim.
[van Dartel et al., 2005] van Dartel, M., Sprinkhuizen-Kuyper, I., Postma, E., and van den Herik, J. (2005). Reactive agents and perceptual ambiguity. Adaptive Behavior, 13:227–42.
[Vapnik and Chervonenkis, 1971] Vapnik, V. N. and Chervonenkis, A. Y. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl., 16:264–280.
[Wagenaar, 1972] Wagenaar, W. A. (1972). Generation of random sequences by human subjects: A critical survey of literature. Psychological Bulletin, 77(1):65.
[Weibull, 1995] Weibull, J. W. (1995). Evolutionary Game Theory. MIT Press, Cambridge, MA.
[Welch, 1984] Welch, T. A. (1984). A technique for high-performance data compression. Computer, 6(17):8–19.
[Werbos, 1974] Werbos, P. J. (1974). Beyond regression: new tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University.
[Whitley et al., 1998] Whitley, D., Rana, S., and Heckendorn, R. B. (1998). The island model genetic algorithm: On separability, population size and convergence. Journal of Computing and Information Technology, 7:33–47.
[Williams et al., 2002] Williams, M. A., Moss, S. A., Bradshaw, J. L., and Rinehart, N. J. (2002). Brief report: Random number generation in autism. Journal of Autism and Developmental Disorders, 32(1):43–47.
[Yao, 1999] Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, 87(9):1423–1447.
[Zahedi et al., 2013] Zahedi, K., Martius, G., and Ay, N. (2013). Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis. Frontiers in Psychology, 4(801).
[Zeeman, 1980] Zeeman, E. C. (1980). Population dynamics from game theory. In Nitecki, Z. and Robinson, C., editors, Global Theory of Dynamical Systems, Lecture Notes in Mathematics, volume 819, pages 471–497. Springer, N.Y.
[Zhang and LeCun, 2015] Zhang, X. and LeCun, Y. (2015). Text understanding from scratch. arXiv preprint arXiv:1502.01710.
[Zhou et al., 2011] Zhou, A., Qu, B.-Y., Li, H., Zhao, S.-Z., Suganthan, P. N., and Zhang, Q. (2011). Multiobjective evolutionary algorithms: A survey of the state of the art. Swarm and Evolutionary Computation, 1(1):32–49.
[Zlotowski and Bakan, 1963] Zlotowski, M. and Bakan, P. (1963). Behavioral variability of process and reactive schizophrenics in a binary guessing task. The Journal of Abnormal and Social Psychology, 66(2):185.