EVOLUTION OF DECISION-MAKING SYSTEMS

By

Jorden D. Schossau

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computer Science—Doctor of Philosophy
Ecology, Evolutionary Biology, and Behavior—Dual Major

2017

ABSTRACT

EVOLUTION OF DECISION-MAKING SYSTEMS

By

Jorden D. Schossau

Biological and engineered systems are adaptive because they can make decisions. Some systems, such as viruses, use their molecular composition – genetic information – to decide when to become lysogenic (dormant) or lytic (active). Others, such as self-driving cars, must use spatiotemporal information about obstacles, speed, and previous signs to determine when to turn or begin braking. Computational models of such systems allow us both to engineer better systems and to create better scientific understanding of the dynamic world. The practice of modeling decision-making, that is, the study of interactions between rational agents on the spectrum of conflict and cooperation, began with Von Neumann and Morgenstern's Theory of Games and Economic Behavior [Neumann et al., 1944]. Scenarios, called "games", are models designed and studied to increase understanding of conflict and cooperation between these agents. The games discussed here are the Prisoner's Dilemma and the Volunteer's Dilemma. Modern methods of analysis for games involving populations of interacting agents fail to predict the final strategy distribution among all agents. In Chapter 2 I develop a new computational agent-based simulation, used as an inductive study system, to evaluate the deductive predictive capabilities of an analytical model that can predict the final distribution under idealized conditions. Lastly, I show a novel finding: the agent-based model suggests that probabilistic, or mixed, strategies (such as probabilistic gene expression) are a result of the development and maintenance of cooperation in the Volunteer's Dilemma.

Game theory fails to provide tractable models for more complex decision-making situations, such as those with complex spatial or temporal dimensions. In these cases an algorithm of conditional logic may be used to simulate decision-making behavior. Still, there are systems for which the use of an algorithm as a model is inadequate due to incomplete knowledge of the system. Perhaps the model makes too many generalizations, is limited by atomic discretization, or is otherwise incomplete. In these cases it is useful to compensate for deficits by using probabilistic logic. That is, we assume that a stochastic process can roughly describe those subprocesses not fully modeled. Lastly, algorithms as decision strategies can incorporate temporal information in the decision-making process. There are two ways temporal information can be used in an individual's conditional logic: evolutionary and lifetime. The evolutionary approach has proved much more flexible as a means to discover and tune models of unknown decision-making processes. Neuroevolution is a machine learning method that uses evolutionary algorithms to train artificial neural networks as models of decision-making systems. There is currently a wide diversity of neuroevolution methods that all share common structure in the types of problems being solved: generally, cognitive tasks.
Toward this end it would be useful if there were some properties common to all cognitive systems that could be incorporated into the optimizing objective function in order to enhance or simplify the evolutionary process. In Chapters 3 and 4 I explore new methods of improving model discovery through neuroevolution and discuss the applicability of these methods to probabilistic models.

Copyright by
JORDEN D. SCHOSSAU
2017

ACKNOWLEDGMENTS

There are a number of people to whom I am grateful for their influence on my graduate journey. Dr. Chris Adami was my Research Experience for Undergraduates advisor in the summer of 2007 and my graduate advisor beginning in 2009 at Keck Graduate Institute, and he allowed me to follow him to Michigan State University, where I continued under his guidance. In every case Dr. Adami has been an excellent resource and support, allowing me to gain a diverse set of experiences and skills which will certainly be required in my chosen field. The research environment he created was unique in that it was heavily interdisciplinary and encouraged new ideas regardless of their origin. I also thank Dr. Arend Hintze for mentoring me during the same time frame by providing invaluable professional feedback, encouraging the writing process, and helping to maintain a sane working environment. My committee members Dr. Bill Punch and Dr. Charles Ofria have provided much-needed guidance and feedback concerning my degree progress, professional outlook, and dissertation text. As chair of the Department of Computer Science and Engineering, Dr. Eric Torng has been effective and understanding, providing professional opportunities and support for my activities. I am fortunate to have studied at Michigan State University at a time when it had become the epicenter for scientific research and collaboration in my chosen field. The timing was good not only for research ideas, through organizations such as BEACON and EEBB, but also for research infrastructure: computing provided by ICER, and versioning and code hosting provided by IT Services. I thank past and present members of the Adami lab, Ofria lab, and Lenski lab who have provided much-needed feedback. For their camaraderie, friendship, and wealth of experience-based advice I thank Anu Pakanati, Aditi Gupta, Lacey Best-Rowden, Caitlyn Pickens, Neem Serra, Taylor Kelsaw, and Leonie Hintze. For their endless support and encouragement I acknowledge my parents Pam Schossau and Jim Schossau, my sister Carlin Schossau, Tom Trainor, Alita Burmeister, and my aunt and uncle Kathy Stark and Gary Flum.

TABLE OF CONTENTS

LIST OF FIGURES

Chapter 1  Introduction
  1.1 Origins of Modeling Decision-Making
  1.2 Example Games
  1.3 Neuroevolution
    1.3.1 Unification for Advancement
      1.3.1.1 Machine Learning
      1.3.1.2 Unsupervised Learning
      1.3.1.3 Reinforcement learning

Chapter 2  Two New Volunteer's Dilemma Models and Mixed-Strategies as a Result of Selection
  2.1 Abstract
  2.2 Introduction
  2.3 Methods
    2.3.1 New Model
    2.3.2 Mutational Mechanism
    2.3.3 Errors
    2.3.4 Spatial Constraint
  2.4 Conclusion

Chapter 3  Information-theoretic Neuro-correlates Boost Evolution of Cognitive Systems
  3.1 Abstract
  3.2 Introduction
  3.3 Background
    3.3.1 Network-theoretic neuro-correlates
      3.3.1.1 Minimum-Description-Length
      3.3.1.2 Topological Complexity
      3.3.1.3 Connectivity and Sparseness
      3.3.1.4 Representations
      3.3.1.5 Information Integration
      3.3.1.6 Predictive Information
    3.3.2 Complex Environments
  3.4 Methods
  3.5 Results and Discussion
    3.5.1 Augmented selection and its effect on other neuro-correlates
    3.5.2 Neuro-correlate interactions
  3.6 Conclusions
  3.7 Acknowledgments

Chapter 4  The Role of Conditional Independence in the Evolution of Intelligent Systems
  4.1 Abstract
  4.2 Introduction
  4.3 Results
  4.4 Discussion
  4.5 Acknowledgements

BIBLIOGRAPHY

LIST OF FIGURES

Figure 1.1: Replicator-equation and agent-based modeling of the Rock-Paper-Scissors game with a repulsive interior fixed point.
Figure 1.2: Trajectories of the population fraction of the game defined by Eq. (1.3) with a = 1 and b = 0.2.
Figure 1.3: Average trajectories of lines of descent for probabilistic strategies under the "2-genes" and "3-genes" encodings.
Figure 2.1: Predicted payoff for a single cooperator and a single defector in a Volunteer's Dilemma game with group size N = 9 and threshold ζ = 1.
Figure 2.2: Predicted payoffs for a single cooperator and a single defector at every cooperator prevalence, for thresholds ζ = 5, 6, and 7, following [Archetti and Scheuring, 2011].
Figure 2.3: Analytical predictions of the equilibrium fixed point compared with agent-based simulation results across minimum thresholds for public goods generation.
Figure 2.4: Likelihood distributions of observed population mixtures, derived from equation 2.8, for thresholds ζ = 5, 6, and 7.
Figure 2.5: Likelihood of average population mixtures for mixed strategies assuming infinite population size and infinite time.
Figure 2.6: Agent-based evolved populations under uniform and Gaussian mutational distributions.
Figure 2.7: Fixed points predicted by the analytical model and approximated by the average strategy of the evolved agent-based population.
Figure 2.8: Population distributions of agent-based evolved populations under well-mixed and spatially constrained conditions.
Figure 2.9: Spatial visualizations of agent-based evolved populations under well-mixed and spatially constrained conditions.
Figure 2.10: Analytical and agent-based predictions of deterministic-strategy fixed points for well-mixed and spatially constrained conditions.
Figure 2.11: Analytical and agent-based predictions of probabilistic-strategy fixed points for well-mixed and spatially constrained conditions.
Figure 2.12: Group fitness predicted by Equation 2.7 for the Volunteer's Dilemma of group size N = 9.
Figure 3.1: The block-catching task and a depiction of the agent's sensors, brain (internal) neurons, and actuators.
Figure 3.2: Average fitness over 10,000 generations for the block-catching and random number generation tasks under different neuro-correlate augmentations of the fitness function.
Figure 3.3: Number of perfectly performing block-catching agents after 10,000 generations under five experimental conditions.
Figure 3.4: Comparison of different optimization methods at the end of evolution of the block-catching task.
Figure 3.5: Comparison of different optimization methods at the end of evolution of the random number generation task.
Figure 3.6: Effect on neuro-correlates and performance when selecting only on a single neuro-correlate.
Figure 4.1: Examples of gate decomposition into combinations of simpler gates.
Figure 4.2: Average fitness of organisms on the line of descent in the temporal spatial integration and foraging environments.
Figure 4.3: Average performance of agents evolved in the association environment with increasing levels of error punishment.
Figure 4.4: Distributions of performances at the end of evolution for each condition in each of three environments.
Figure 4.5: Average number of gates evolved along the line of descent in a competition experiment in different environments.
Figure 4.6: Average fitness along the line of descent in each environment when both gate types were evolvable.
Figure 4.7: Distributions of structural properties of Markov Brains at the end of evolution in three environments and two conditions.
Chapter 1

Introduction

The simplest form of decision-making is to not make a decision at all. That is, the easiest strategy is a constant one, where the decision, or choice, is not influenced by history or environment and does not change during the lifetime of an organism. In natural systems these strategies typically arise as reactions to slow and gradual environmental changes spanning many lifetimes. The slow nature of these changes relative to the quick lifetimes of the organisms allows the population of organisms to encode successful constant strategies directly into their genomes. In engineered systems these strategies are the simplest to implement and understand, but are the most limited in flexibility.

A slightly more complex form of decision-making is the probabilistic strategy. Instead of always performing the same action as in the constant strategy case, an organism can employ a strategy relying on stochastic processes to "decide" which action to take. This reliance on external stochasticity to constrain choice may be thought of as a type of phenotypic plasticity. It is believed that natural systems exhibit phenotypic plasticity when the predictability of their environment is low compared to the reliability with which a noisy process can match strategy choices to benefits.

The next increase in decision-making complexity is the introduction of conditional logic on salient cues. Probabilistic strategies are already conditional, but they are not conditioned on anything informative. This new type of conditional logic must be predicated on salient cues, or in other words, cues that are informative about conditions or states relevant to the organism. These types of strategies can be better described as algorithms, or sets of rules for a problem-solving behavior. For example, a waiter uses an algorithm to decide guest seating, and a robotic arm uses an algorithm to decide how its many parts move (or don't move) in order to accomplish a high-level operation. With algorithms, we have the ability to describe or model complex decision-making using nothing but simple conditional logic and atomic operations.

Sometimes we are unable to fully model a natural system using an algorithm due to incomplete knowledge of the system. Perhaps the model makes too many generalizations, is limited by atomic discretization, or is otherwise incomplete. In these cases it is useful to compensate for deficits by using probabilistic logic. That is, we assume that a stochastic process can roughly describe those subprocesses not fully modeled. This use of probabilistic logic differs from the probabilistic strategy discussed above in that the entire logic tree does not solely depend on stochasticity for the decision-making process.
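To make these levels of strategy complexity concrete, the following minimal Python sketch contrasts a constant, a probabilistic, and a conditional strategy. It is purely illustrative; the action names and the cue are invented for this example and are not part of any model used later in this dissertation.

```python
import random

def constant_strategy(cue=None):
    """Constant strategy: the same action regardless of history or environment."""
    return "cooperate"

def probabilistic_strategy(cue=None, p=0.3):
    """Probabilistic (mixed) strategy: a stochastic process 'decides' the action."""
    return "cooperate" if random.random() < p else "defect"

def conditional_strategy(cue):
    """Conditional strategy: the action is predicated on a salient cue, i.e.,
    information about a condition or state relevant to the organism."""
    return "cooperate" if cue == "partner_cooperated" else "defect"
```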
Lastly, algorithms as decision strategies can incorporate temporal information in the decision-making process. There are two ways temporal information can be used in an individual's conditional logic: evolutionary and lifetime. The more commonly studied method of including temporal information is through the process of evolution. The strategies themselves do not change how their decisions are made during a lifetime; instead, changes occur within the population's genetic pool over time. Evolution's process of selection changes the strategy distribution across the population. Through random mutation and continual selection the population adapts its pool of strategies to the environmental problem. Lifetime use of temporal information is characterized by an individual's decisions being predicated on previous decisions by that individual or those of other individuals. This sequential lifetime model of using past experiences to shape future decisions has been used to represent memory in intelligent rational agents [Nowak and Sigmund, 1993a].

I begin with a brief description and history of the fields of game theory and neuroevolution related to the present results. Chapter 2 details new models for the Volunteer's Dilemma and proposes a new mechanism of selection for cooperation. Chapter 3 introduces the use of neuro-correlates to bootstrap the evolutionary process of decision-making agents. Lastly, Chapter 4 further explores the issue of how to use information theory to measure the decision networks of a probabilistic decision-making agent.

1.1 Origins of Modeling Decision-Making

The practice of modeling interactions between rational agents on the spectrum of conflict and cooperation began with Von Neumann and Morgenstern's Theory of Games and Economic Behavior [Neumann et al., 1944], focusing mostly on economic utility. Even more abstractly, game theory mathematically describes decision-making within social groups. However, this larger applicability did not advance until John Nash's work provided the concept of equilibrium [Nash et al., 1950, Nash, 1951]. The equilibrium concept spurred a renaissance in game theory, but left some problems to fester. The full potential of the equilibrium concept was not realized until Maynard Smith's Evolution and the Theory of Games [Maynard Smith, 1982], where game-theoretic ideas were applied to population biology and allowed the creation of testable predictions. However, the focus on rationality proved limiting for the field, since ultimately economists wanted to model human decision-making, yet humans often do not make rational decisions. Similarly, insects and animals appear to make rational decisions but without the power of human rationality [Smith, 1974]. Clearly, a broader theory was necessary.

With Maynard Smith's Evolution and the Theory of Games in 1982, the focus changed from rationality determining a game's outcome to the process that determines the outcome. The process of equilibration can be slow, as in biological evolution, or fast, as in social mimicry of behavior. However, the outcome depends not only on the equilibrating speed, but on all the details of the process, such as mutational neighborhood, mutation rate, epistatic interactions, and population size. The details of the evolutionary process can have difficult-to-predict consequences on the path to, and final state of, the game outcome. This opaque nature of evolutionary dynamics makes the discovery of mathematical formalisms for Evolutionary Game Theory sometimes difficult or impossible. Ideally, then, to study Evolutionary Game Theory one should find a model organism that exhibits behaviors of interest and perform experimental evolution to investigate how the costs, benefits, mutation rate, and so on influence population-level behavior. From these direct empirical data one could build a generalizable mathematical model to describe the underlying dynamics, and new scientific understanding would be attained. Unfortunately, the process of evolution is extremely slow compared to a human lifespan.
E. coli is one of the fastest-growing model organisms used for experimental evolution, but even with a doubling time of roughly 20 minutes and typical flask transfer ratios of 1:100, it takes about 20 years to witness evolution over 50,000 generations [Lenski, 2011]. Additionally, many of the necessary conditions for various costs, benefits, mutation rates, etc. are difficult if not impossible to manipulate experimentally.

If the mathematics governing various types of Evolutionary Game Theory is difficult to discover, and biological evolution is too difficult to comprehensively study, then it would be useful to have a study system that can be quickly and easily manipulated to both inform and validate the mathematics, while having emergent properties similar to those of biological study organisms. This is the reason to use computational agent-based modeling to aid in studying Evolutionary Game Theory. Unlike mathematical modeling, agent-based modeling is a type of modeling where each individual in a population and all their interactions are simulated. The simplest historic agent-based model is perhaps the "Game of Life", a cellular automaton, which arose out of the investigation of creating self-replicating machines [Gardner, 1970]. For any agent-based model, only the simple rules governing the behavior of an individual must be defined, from the point of view of the agent, and yet complex population-level effects can still be observed without the need to explicitly define them. For cellular automata, self-replication and complex movement patterns were hailed as emergent behavior arising from very simplistic rulesets. This level of abstraction, paired with computational power, means that 50,000 generations of agent-based evolution of a population passes in about 15 minutes using 2010 compute technology. Many effects can be studied using agent-based modeling, such as automotive or pedestrian traffic congestion, social learning, and predator-prey dynamics [Naiem et al., 2010, Amrutha, 2014, DeAngelis and Mooij, 2005]. The power of agent-based modeling lies in both the fast computational realization of the model and the fact that it shows a cause-and-effect relationship between simple interaction rules and their emergent phenomena. While agent-based modeling shows the cause-and-effect relationship between interaction rules and population-level behavior, it does not provide knowledge that generalizes to other similar systems the way mathematical modeling does. This is why the research presented here uses both agent-based modeling and mathematical modeling together to study Evolutionary Game Theory.

1.2 Example Games

A commonly studied Game Theory game is the Prisoner's Dilemma. The situation can be understood as two thieves who are apprehended without any evidence of their crime. Each thief is privately offered two options: to say nothing or to incriminate their accomplice. Each thief is also told the potential results of their decisions: both stay silent and receive a minimum sentence; both incriminate each other and receive a medium sentence; or one stays silent and is incriminated by the other, in which case the silent one receives a maximum sentence and the defector goes free.

The basics of any Game Theory game, such as the Prisoner's Dilemma, require each player to have a set of strategies from which they must choose one. After all players have chosen a strategy, their selected strategies are compared.
Players then receive rewards or costs (including 0) as a result of how their strategies interact with those of the other players. The resulting cost or benefit of each possible pair of strategy interactions can be written down in matrix form, called a payoff matrix. The payoff matrix can be analyzed and the dynamics of optimal rational play predicted for each player. The payoff matrix for a general two-player game is given by

        C  D
    C ( 0  a )
    D ( b  0 )                                                        (1.1)

where the specific game is classified by the signs of the variables: Prisoner's Dilemma for a < 0, b > 0; Snowdrift for a > 0, b > 0; anti-coordination for a < 0, b < 0; and harmony for a > 0, b < 0 [Hofbauer and Sigmund, 1998, Szabo and Fath, 2007, Adami et al., 2012]. These classes of games are called Zeeman classes, denoted by the four matrices created with the variable sign changes [Zeeman, 1980]. For games with three strategies there are 20 unique Zeeman classes. Consider a common game with three strategies, Rock-Paper-Scissors (RPS), where there is no clear winner. The game may be described by the payoff matrix (in normalized form with zero diagonal)

        R   P   S
    R (  0  -1   1 )
    P (  1   0  -1 )
    S ( -1   1   0 )                                                  (1.2)

Payoff values are interpreted as the payoff the row player receives playing against the column player. To model the stochastic nature of noisy interactions, likelihoods, or interactions not yet fully understood, the mixed Nash equilibrium can describe the outcome. For the RPS game, the mixed Nash equilibrium is the well-known (1/3, 1/3, 1/3). This fixed point is unstable: the trajectories push outward from the central point, and soon enough the population (in the infinite population approximation) moves from all scissors, to all rock, to all paper (a trajectory known as a heteroclinic orbit because it connects the unstable pure-strategy fixed points, see Fig. 1.1a). Such heteroclinic orbits are impossible to sustain for finite populations: in these orbits, once a type is extinct it will not recover (see Fig. 1.1b). In finite populations playing the same game (defined by the payoffs (1.2)) only a single strategy will survive, but which one ultimately remains is random. Note that extinction can happen due to the stochasticity introduced by the finite population even for the RPS game with an attractive interior fixed point (obtained from Eq. (1.2) by, for example, flipping the signs in that matrix) [Claussen and Traulsen, 2008, Adami et al., 2012]. While analytical approaches can also make predictions for finite populations [Traulsen and Hauert, 2009, Nowak and Sigmund, 1993b], they assume an existing clonal population, or a set of predefined strategies, or mutation rates that are quite small. Higher mutation rates, finite population sizes, and heterogeneous strategies all add complexity for an analytical treatment.

Figure 1.1: Replicator equation modeling for the Rock-Paper-Scissors game with a repulsive fixed point at population fractions (xR, xP, xS) = (1/3, 1/3, 1/3). (a): infinite population size (replicator equation), with initial condition at the fixed point. (b): agent-based modeling (finite population size N = 1,024), Moran process (see Box 3), population frequencies plotted every 25 generations. In this case, the population ended up on the fixed point xS = 1.
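As an aside, the two modeling approaches compared in Figure 1.1 can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the RPS payoffs of Eq. (1.2) and a schematic Moran-like birth-death update; it is not the exact simulation used to produce the figure.

```python
import random

# Rock-Paper-Scissors payoffs from Eq. (1.2): rows are the focal strategy, columns the opponent.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def replicator_step(x, dt=0.01):
    """One Euler step of the replicator equation dx_i/dt = x_i (f_i - f_bar)."""
    f = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
    fbar = sum(x[i] * f[i] for i in range(3))
    new_x = [x[i] + dt * x[i] * (f[i] - fbar) for i in range(3)]
    total = sum(new_x)
    return [v / total for v in new_x]

def moran_step(pop):
    """One birth-death update in a finite population: fitness-biased copy, random death."""
    fitness = [1.0 + 0.1 * A[s][random.choice(pop)] for s in pop]  # payoff from one random interaction
    parent = random.choices(range(len(pop)), weights=fitness, k=1)[0]
    pop[random.randrange(len(pop))] = pop[parent]

x = [1/3, 1/3, 1/3]                  # infinite-population fractions, started at the interior fixed point
pop = [i % 3 for i in range(1024)]   # finite population of N = 1,024 pure strategists
for _ in range(10000):
    x = replicator_step(x)
    moran_step(pop)
```

Run this way, the infinite-population fractions remain at the interior fixed point, while the finite population drifts and, as in Figure 1.1b, eventually loses strategies to extinction.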
For pure-strategy interpretations, the prediction is that 1/3 of the population will play each strategy all the time. For the mixed-strategy interpretation, the prediction is that each player will play each strategy probabilistically with equal probability. However, when modeled using an agent-based simulation, even with a population beginning at the mixed Nash equilibrium, the population strategies spiral outward until a fixed point is reached (in this case, xS = 1). Such finite population sizes, and higher mutation rates in particular, create problems for analytical tractability, but are generally handled well with agent-based simulation.

Another example is in the treatment of the evolutionarily stable set. Consider for a moment the two-player game described by the payoff matrix (1.1) in the "snowdrift" regime, where both a, b > 0. Here the concept of an evolutionarily stable strategy (ESS) applies: a single strategy is an ESS if it can thrive against itself and resist invasion by any rare alternative strategy. In this case, none of the "pure" strategies C or D is an ESS, but the mixed strategy M (a probabilistic mixture of the two strategies C and D) would be, with frequencies a/(a + b) and b/(a + b) respectively. The mean payoff of strategy M against the pure strategies C and D can be calculated, so the question may be asked: what happens if the pure strategies C and D play against another strategy that has payoffs identical to the mixed strategy? For example, the mixed strategy M earns b²/(a + b) against C, and a²/(a + b) against D. What are the dynamics when a pure strategy with these exact payoffs is thrown "into the mix"? This can be investigated by studying the payoff matrix

        C             D             M
    C (  0            a             ab/(a + b) )
    D (  b            0             ab/(a + b) )
    M (  b²/(a + b)   a²/(a + b)    ab/(a + b) )                      (1.3)

In this game, there are an infinite number of fixed points that form an evolutionarily stable set (ES set) (see [Weibull, 1995] for great detail on evolutionarily stable sets). This comes about because strategy M may exist at any frequency s, given that C exists at frequency a(1 − s)/(a + b) and D exists at frequency b(1 − s)/(a + b). That is, for any combination of frequencies of the pure strategies C and D, strategy M will persist as a strategy that mimics those frequencies. M is essentially neutral relative to C and D, thus creating infinitely many fixed points where no strategy is better than another. While this stable set of fixed points is attractive, populations are free to evolve anywhere along the line due to random drift once they are there (see Figure 1.2).

Figure 1.2: Trajectories of the population fraction of the game defined by Eq. (1.3) with a = 1 and b = 0.2. The dashed line represents the evolutionarily stable set, while the solid lines show the population trajectories that begin at the solid dots (different initial conditions) and move towards the ES set.

When testing these predictions with agent-based methods, several decisions in designing the simulation can have significant impact on the results. For example, because the probabilities p, q, and r are changed via a discrete mutational process, the nature of that process will affect the population dynamics. If the probabilities are implemented as continuous variables (rather than discretized to a particular resolution such as 0 or 1), then the mutation could be performed by replacing the given probability with a uniform random number ("global" mutations), or by changing the probability either up or down by a random number drawn from a distribution spanning a particular percentage ("local" mutations).
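The following sketch illustrates the two mutation schemes just described, together with one way a "3-gene" update might keep the three probabilities summing to one. The clipping and renormalization here are my own illustrative choices, not necessarily the exact scheme used in the simulations.

```python
import random

def global_mutation(p):
    """'Global' mutation: replace the probability with a fresh uniform random number."""
    return random.random()

def local_mutation(p, step=0.05):
    """'Local' mutation: nudge the probability up or down by at most +/- step,
    clipping so that the value remains a valid probability."""
    return min(1.0, max(0.0, p + random.uniform(-step, step)))

def mutate_three_gene(probs, step=0.05):
    """'3-gene' encoding: locally mutate one locus of (p, q, r), then renormalize so the
    probabilities still sum to one, which necessarily shifts the other two loci as well."""
    probs = list(probs)
    i = random.randrange(3)
    probs[i] = local_mutation(probs[i], step)
    total = sum(probs)
    return [v / total for v in probs]
```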
In the latter case, care must be taken so as to remain within the boundaries of a probability. At the same time, it is not possible to mutate all three probabilities independently, as they must sum to one. Thus, if the underlying genetics of the process were implemented in terms of three loci (one for p, one for q, and one for r), then mutating one locus will necessarily affect both other loci (this is the "3-gene" implementation). If instead the genetics were implemented in terms of two independent loci (say, p and q, the "2-gene" implementation), then the value of the third probability is determined automatically. All these design decisions affect the population dynamics.

Fig. 1.3 shows the average end points of the set of probabilities (p, q, r) on the evolutionary "line of descent". Note that only the average trajectory (averaged over 1,000 trials) starting at the same initial state (p(0), q(0), r(0)) is smooth: each single trajectory itself is jagged. In this run, when probabilities are mutated they are changed by at most ±5% from their current value, which can give rise to significant jumps within the triangle. The trajectories to reach the ES set are shown in Fig. 1.3a for the "3-genes" encoding, where a mutation in one of the loci affects the other two probabilities (as only two of the three probabilities are independent). The trajectories differ when the decisions are encoded in two independent loci (Fig. 1.3b), while the end points are of course the same.

Figure 1.3: (a): Average trajectories of lines of descent for probabilistic strategies using an encoding where each of the two probabilities p and q is encoded independently, thus fixing r (the "2-genes" encoding). (b): Average trajectories of lines of descent for probabilistic strategies with different initial conditions, using the "3-genes" encoding (see text). The dashed line represents the predicted ES set for deterministic strategies shown in Fig. 1.2. Average of 1,000 trajectories per line of descent, obtained from populations with 1,024 agents evolving at a mutation rate of 10% per locus (translating to 30% per chromosome) for 1,000 generations.

1.3 Neuroevolution

Biological brains are incredibly good at integrating relevant information and making decisions given their embodiment and typical environments. How do a brain's synaptic connections change over time? What is the purpose of the brain's layered structure? How can complex motion and behavior be engineered without being designed? These and other such questions have inspired work in the field of Artificial Intelligence since the creation of the programmable computer in the mid 1930s. The computer enabled automation of logic and symbolic tasks via symbolic logic operations. That is, if something could be described symbolically, then it could be automated by a computer. The inevitable idea was that perhaps a brain could be described symbolically, thereby creating an electronic brain, which would allow automation of many varieties of tasks [Bain, 1985]. The first such accepted attempt was presented at Dartmouth College in 1956: Logic Theorist. This program was designed specifically to follow the same types of reasoning humans use when solving mathematical problems. By breaking problems down into generalizable steps with conditional logic, the creators of Logic Theorist were able to automatically prove 38 of the first 52 theorems in the Principia Mathematica.
Since then there have been various efforts at computational automation of a variety of tasks, with the more complex tasks requiring more complex operations, the design of which often surpasses human ability to adequately describe at a symbolic level. It is around this problem of automating seemingly simple but realistically complex tasks that the field of machine learning arose.

Machine learning is the automation of generalization. The general problem is that there exist some examples of data (either a priori or at runtime) for which the correct response is known but the process to arrive at that response is unknown [Guyon et al., 2011]. Not all possible examples are available or iterable, so an appropriate generalization must be learned to handle the unknown variation. That is, there exists an underlying distribution or dimension to the data that is not easy for humans to understand or describe symbolically. It is this underlying description that should be learned in an automated fashion. Examples include diagnosis via biomarkers [Street et al., 1995], handwriting and voice recognition [Sebastiani, 2002], and independent component analysis [Hyvärinen et al., 2004].

Unfortunately, some desirable applications of machine learning may have complex underlying relationships between the input data and the desired responses. This is often the case when automating a complex task performed by a human, such as playing a game or walking. In these cases a special substrate for machine learning called an artificial neural network (ANN) may be used. As the birth of the general field of artificial intelligence was bio-inspired, so too was the concept of the ANN, inspired by biological advances of the time in the field of neurobiology that provided insight into the functioning and structure of the human brain [Rashevsky and Rashevsky, 1960]. After its inception, development in neural networks has been largely driven by two interests: engineering and scientific study. People interested in engineering use neural networks as tools to solve specific problems, usually involving automation of a task. These tasks are either human tasks or superhuman tasks. Those interested in scientific study use neural networks to answer specific questions such as "how do synaptic connections change over time?" or "what is the purpose of the brain's layered structure?" The aims of each interest group are different, but advances in the field of neural networks, and artificial intelligence in general, are driven by contributions from both groups.

Neural networks are particularly well-suited to problem domains where the desired mapping between stimulus and response is too complicated for other machine learning techniques to discover. Robotic control is one such application where neural networks excel. This is because of their ability to receive and operate on continuous data, in real time, and to be robust to noise, in addition to learning the underlying complex relationships among the data. Like many other machine learning methods, ANNs have traditionally had difficulty in several conditions, such as sparse and infrequent data, or when encountering situations or data wildly different from those for which the network has been trained. The problem of data infrequency has since been solved with long short-term memory (LSTM) network layers [Hochreiter and Schmidhuber, 1997] allowing bidirectional persistence: data may persist in the network for long times, and thus training methods can attribute error to earlier data much more easily.
The second problem, of encountering new data, is an unsolved problem of lifetime learning. In robotics such lifetime learning situations might be a damaged wheel, a smudged camera lens, or unexpected or changing terrain. Some efforts have been made to remedy this problem, but the most promising approaches rely on precomputing an enumeration of all control strategies [Cully et al., 2015].

Training ANNs is difficult because of the issue of how to change hidden layer weights based on the correctness of the network output. Research in the field even stagnated for a couple of decades (1960-1980) until back propagation was rediscovered [Bryson, 1975, Werbos, 1974]. However, back propagation is inherently slow and has the additional problem of vanishing error gradients during optimization [Hochreiter, 1998], which is overcome with tricks like deep nets and LSTMs. As a general panacea, biology again provides inspiration for how to tune these networks: evolution. As a result, the field of neuroevolution was created. Neuroevolution uses evolutionary algorithms to train artificial neural networks. Various methods exist for using the general algorithm of evolution (comprising mutation, selection, and inheritance) for the purpose of evolving aspects of ANNs. The differences between these variations lie mostly in the way each biologically-inspired mechanism is utilized. Common differences include the use of particular mechanisms such as ontogenetic vs. phylogenetic adaptation, direct or indirect genetic encoding, mono- or poly-speciation, mono- or poly-deme structure, gene flow, etc. In a different way from LSTMs, the use of evolution avoids the vanishing error gradient problem during training. Typically back propagation is used to place "blame" on specific weights given particular desired outputs. Evolution requires no such statistical technique because selection acts on the entire population of evaluated networks, propagating those with more appropriate weights.

Another problematic aspect of standard network evaluation and modification is the evaluation of the network response itself. Real-valued classification problems provide an easy, almost intuitive way to adjust network weights during training: based on the discrepancy of the network response, the weights may be modified. However, this only works if the network response to each set of inputs is independent. The task becomes difficult when complete sets of responses must be evaluated, such as lifetime behavior that must result in completion or failure of a target goal. When complex behavior is required of the neural network output, such as when neural networks are used as neuro-controllers, it becomes much more difficult to judge the network's performance by a single response. Instead, the behavior of the network may be observed over a range or set of inputs. This complex behavior, such as playing checkers or walking, then has clear measurable performance relative to other variations of the same network, like winning a game of checkers or controlling a robot to travel the farthest distance. In this way an evolutionary algorithm allows optimization of artificial neural networks performing complex behavior when other methods fail.

There are many evolutionary approaches in machine learning. The most useful methods are those that succeed in a wide range of problem domains. Approaches such as the ES-HyperNEAT algorithm [Risi and Stanley, 2011], indirect encoding, or Modular NEAT [Suchorzewski and Clune, 2011] aid with general problem domain representation and more efficient subspace searching.
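As a rough illustration of the mutation-selection-inheritance loop that these neuroevolution variants share, consider the minimal sketch below. The toy fitness function is a stand-in invented for this example; in practice it would score lifetime behavior such as blocks caught or distance walked, and real methods differ in encoding, speciation, and selection scheme as described above.

```python
import random

def make_network(n_in=3, n_hidden=4, n_out=2):
    """A tiny fixed-topology network, represented directly as a flat list of weights."""
    n_weights = n_in * n_hidden + n_hidden * n_out
    return [random.gauss(0.0, 1.0) for _ in range(n_weights)]

def mutate(weights, rate=0.1, scale=0.3):
    """Mutation: Gaussian perturbation applied to a fraction of the weights."""
    return [w + random.gauss(0.0, scale) if random.random() < rate else w for w in weights]

def evaluate(weights):
    """Placeholder fitness: any scalar score of the network's behavior works for selection."""
    return -sum(w * w for w in weights)  # toy objective for illustration only

population = [make_network() for _ in range(100)]
for generation in range(200):
    ranked = sorted(population, key=evaluate, reverse=True)
    survivors = ranked[:20]                                              # selection
    population = [mutate(random.choice(survivors)) for _ in range(100)]  # inheritance + mutation
```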
Aside from the approaches currently in use, the future holds many new opportunities for the field of neuroevolution. One such area is hybrid computation, that is, the simultaneous use of neuroevolution with something alive. This is not a grafting, but rather a bidirectional feedback system where something already highly evolved informs the rapid evolution of an artificial neural network via a feedback interface. In a very similar way, it should also be feasible to combine artificial neural network evolution with the growth of biological neural networks in vitro [DeMarse et al., 2001].

Other approaches outside of evolution may also be of use. These approaches tend to subtly improve upon the original foundations inspired by biology. One approach outside of evolution that could be useful to explore for neuroevolution is problem decomposition at the level of entire networks. That is, to evolve networks for lower-order tasks suspected of supporting a higher-level modular design, and then to evolve a higher-order network based on those smaller ones. This is a form of task decomposition, similar to [Lu and Ito, 1999, Reisinger et al., 2004]. Along this vein of research an automated approach would be useful. Cooperative coevolution may help with division of labor, similar to [Goldsby et al., 2009]. It is a contentious topic because some believe modularity is key to higher-level performance and efficiency, while others believe that it adds deceptiveness to fitness landscapes, reducing the achievement of good results [Bullinaria, 2006].

It is interesting to note a trend in the engineering style of approach to machine learning, which is to develop mathematically-inspired solutions to problems. This is in contrast to scientific study in the same field, which tends to develop biologically-inspired solutions. These solutions then become adopted for new engineering applications due to their numerous advantages in handling complex tasks and behaviors. The authors of the successful HyperNEAT network clearly stated their non-problem-specific scientific intent: "...the goal in this work is not to produce the best possible checkers player but to analyze the properties of winning solutions by examining how they exploit geometry and the implications of such exploitation." [Gauci and Stanley, 2010a] Both engineering and scientific approaches benefit from one another, but engineered approaches tend to focus on specific problem optimization, whereas scientific study approaches tend to discover generalizable improvements, largely benefiting from powerful deductive models created with rigorous hypothesis testing and inductive generalization.

1.3.1 Unification for Advancement

It is commonly argued that there are in fact three major categories of machine learning: Supervised, Unsupervised, and Reinforcement Learning. Supervised learning is the most tractable form, where a corpus of input-output pairings is used to train a tunable model through guided trial and error. Unsupervised learning is used for problems where the structure or pattern of data must be discovered, and there is no explicit desired output [Qiu and Sapiro, 2015]. Reinforcement learning is used when, at all intermediate time points, the exact output desired is not known, but the final result of all actions has an unambiguous desired outcome.
This phenomenon is not unlike the early years of game theory, when every new extension and exception was developed to overcome limitations for specific applications, and these all became new methods with their own trade-offs and considerations, ultimately being unified under a more flexible theory of evolutionary stability and equilibrium. Indeed, it is even accepted that current artificial intelligence research must embrace an "anarchy of methods" because there is no clear way forward [Lehman et al., 2014]. These differentiated groupings of methods in ML confuse matters, make comparisons opaque, and further encumber progress in the field of artificial intelligence. Therefore, for the purpose of this text I will reorganize these into a more coherent structure with a unified rationale and outline the benefit of doing so.

1.3.1.1 Machine Learning

Machine learning is the practice of tuning a process model to produce desired output given examples of what that output should be. If no output is desired, then there is no way to guide model tuning, so all ML methods rely on providing some form of desired output. This output may be in the form of an exact output, or perhaps an objective function when there is an allowed tolerance for variation. Supervised learning trivially falls under this description, so I will focus on the other two categories.

1.3.1.2 Unsupervised Learning

Unsupervised learning encompasses clustering and rule association. Clustering in an ML context is the tuning of a model to assign labels to data, maximizing or minimizing some objective function. Typically the objective is a multi-objective function involving minimizing the number of clusters while maximizing the variance explained by the clustering, leading to the well-known "knee" trade-off point, where further grouping results in diminishing returns on explanation of variation. With this view of clustering, the ML approach is blatantly substrate-neutral, allowing either hand-coded k-means clustering algorithms or general NN algorithms to be used, for example. The only defining feature is that the output must be a nominal labeling. Rule association is quite similar to clustering. Logical statements (or trees of statements) that generalize relationships between the data are tuned according to a multi-objective function, such as minimizing parameters while maximizing the variance explained by a proposed rule. This view of rule association provides the same benefits as the clustering view: substrate-neutrality and an ease of sharing approaches across other historically disparate ML methods.

1.3.1.3 Reinforcement learning

The new reorganizations I have discussed so far rely on providing an explicit desired output to an ML method, or providing an objective function through which an explicit desired output may be automatically derived. Either way, knowledge of, or a method to obtain, an explicit desired output is necessary. Reinforcement learning is no different. In reinforcement learning the model being tuned usually integrates temporal information as part of its computation, focusing on long-term learning and exploration as key differentiators. For this reason, environmental context and previous actions play large roles in the model's generation of output, but these are only more types of data being processed and there is nothing special about them. Example problem domains include movement controllers, elevator scheduling, and various board games.
Problems typically tackled with reinforcement learning have training data too large or expensive for batch learning, so an on-line approach is favored, but this is only a change of the timescale for evaluations and updates. Take, for example, the example Barto uses in [Dietterich, 2004] of a person attempting to maximize a handheld radio signal. At each location, the person observes a better or worse signal that gives no indication of where to move next, unless previous information made available through memory is also considered. Their goal is to maximize signal strength over all possible positions. The two key traits of a successful algorithm here are long-term memory and exploration, where the only feedback is a signal strength. Barto claims that supervised learning of the same task, were it to be attempted, would give location and signal strength data to the algorithm as a training corpus. This would work, but it would require many if not all signal strength-location pairs to be known, there would be no exploration, and large spatial domains would quickly become problematic. Reinforcement learning solves these problems by allowing feedback of the difference between resulting and desired outputs and retaining state data over spans of time. That is, a single attempted output results in a single feedback of how good that output was. But this sounds very much like supervised learning, which for a single output receives a single feedback, only there are more "guts" to the internal model being tuned and the model chooses which subset of data on which to train. I contend that supervised learning, unsupervised learning, and reinforcement learning are all forms of the same learning that require evaluation of model output relative to desired output.

There is another way to frame this argument using the topic known as the symbol grounding problem. The symbol grounding problem is that embodied cognitive systems may have any number of internal symbols used internally for logic by some rule, but the result of that logic and symbol manipulation must be meaningful in the system's environment. In other words, the "problem" is that there must be some mechanism to translate the cognitive (model) symbols representing concepts in the world to the actual things in the world (which is why, among other reasons, the cognitive system must be an embodied one). Because of this, practitioners using ML techniques will always be responsible for performing the "embodiment," or translation from the model result to the problem domain. When performing this function of embodiment, the ML practitioner is indicating to the model the desired final output, either directly or as a measure of deviation from the desired output.

With unification of these ML approaches, similarities and differences between methods become clearer. An LSTM network used to learn a reinforcement learning task could also learn a simple supervised learning task. The LSTM network structure would be overkill for the simpler task, but it would work despite the lack of temporal context, although the experimenter may need to reset the network after each data presentation because there should be no context. The one problem many of these approaches suffer is the definition of layers and features. A deep network of 15 layers will require more computational updates to produce a single output than a 3-layer network learning the same task.
In addition to knowing how many updates to run, the experimenter must design the network layer count, width, input quantity, and feature identity, among other properties. It is possible neuroevolution can address these issues, and some have already been addressed by a subset of researchers. Some existing methods begin with a well-connected network and make the network more efficient by modifying the topology [Attik et al., 2005], others create novel topologies and weights [Rocha et al., 2005] through a kind of Lamarckian evolution, and more recent work has brought evolution to deep NNs once thought to be impossible to evolve, giving performance rivaling that of hand-tuned networks [Miikkulainen et al., 2017]. With these recent advances, neuroevolution is enabling researchers to overcome many problems historically plaguing machine learning, and is doing so for all kinds of problem domains. The most promising methods are those that allow evolution of network topology, weights (if applicable), salient features, and hidden states (recursiveness or 1-output runtime). For this reason the included neuroevolution chapters 3 and 4 highlight an alternative technology called the Markov Network Brain (MNB). These networks have evolvable topologies, allow the use of many hidden states to create recursion, and can evolve to use any number of the provided features. See [Edlund et al., 2011] for a comprehensive discussion of the functioning of MNBs.

Neuroevolution is increasingly being used to discover and optimize cognitive systems [Mnih et al., 2016, Bojarski et al., 2016, Zhang and LeCun, 2015]. Toward this end it would be useful if there were some properties common to all cognitive systems that could be incorporated into the objective function in order to enhance the evolutionary process. Chapter 3 investigates this idea with several spatial and temporal tasks using information-theoretic analyses. Chapter 4 then probes the issue of how information-theoretic analyses are influenced by the optional probabilistic design of these MNB systems.

Chapter 2

Two New Volunteer's Dilemma Models and Mixed-Strategies as a Result of Selection

2.1 Abstract

Volunteer's Dilemma models are often developed to predict fixed points. These fixed points represent singular average cooperation ratios among populations of players, and thus predict what the dominating strategy or mix of strategies will be. However, the final evolved population may be in any of an infinite number of configurations, all of which share the same fixed point. It is yet unknown how to analytically predict the final population distribution from only standard game parameters such as group size, cost, and benefit. Here I validate a new analytical model that can describe the final evolved population distribution. To do this I develop a new computational agent-based simulation used as an inductive study system against which to compare the deductive predictive capabilities of the analytical model. Lastly, I show a novel finding that the agent-based model suggests probabilistic, or mixed, strategies (such as probabilistic gene expression) are a result of the evolution and maintenance of cooperation in Volunteer's Dilemma.

2.2 Introduction

Like all other games within Evolutionary Game Theory, Volunteer's Dilemma (VD) is a model about the conflict between the choice of cooperating and defecting. In this game each player of a group has the option to pay a small cost so that a public good will become available to all. For instance, a group of bystanders observe a crime.
The public good is to have a safer neighborhood with the criminal behind bars. The only way to achieve that benefit is to volunteer and report the crime; however, reporting carries an emotional and temporal cost. If a volunteer steps forward then they are assured they will benefit from their own actions, but their effort was wasted if any other bystanders also volunteered, as the volunteer could have been a free-rider benefiting from the self-sacrifice of others. In each application of VD there is a different threshold of volunteers needed before the public good is made available. The phenomenon is not unlike citizens paying taxes for roads and education, or environmental stewardship for the preservation and enjoyment of natural areas and species, because if too few sacrifice for the public good then no one benefits. This is the Volunteer's Dilemma.

Real-world systems modeled by the Volunteer's Dilemma need not be macro systems involving human players. Large groups of animals are well known to engage in predator warning behavior for group benefit, such as vervet monkeys warning the group of terrestrial or aerial predators. Producing a warning call is costly to the individual vocally broadcasting their own location, as the predator is more easily able to home in on that single prey [Searcy and Nowicki, 2005, Olson et al., 2013a].

Even groups of small organisms may engage in a Volunteer's Dilemma. Vibrio cholerae cells are known for their pathogenic effect when colonizing the human gut. In groups, these cells can produce biofilms, or layers of cells, held together with self-produced structural polymers. Once the cells are in a nutrient-rich location, their population size begins doubling, and the polymers they create in aggregate form a stabilizing extracellular matrix holding the cells in place against turbid and uncertain environmental conditions [Nadell et al., 2016]. Vibrio cholerae cells produce a chitinase that breaks down chitin, such as that in the cell walls of fungi and the exoskeletons of some animals, releasing soluble oligomers ripe for catabolism that diffuse through the population. Polymer production is a costly practice, and so through the process of evolution Vibrio cholerae will produce a subset of the population that does not produce the polymer yet still consumes the diffusing oligomers. As such, these non-producers are non-cooperators, or defectors, and their competition with the producers creates a Volunteer's Dilemma where a lack of enough producers becomes self-defeating.

Many previous efforts have been made to model and understand the game of Volunteer's Dilemma using an evolutionary game theoretic perspective [Archetti and Scheuring, 2011, He et al., 2014]. These models all assume some variation on the following basic game. Assume a large constant well-mixed population of size Ppop. In each successive generation Ppop/N groups of N members are formed at random, each group plays the public goods game, then groups are disbanded. Players pay a cost c if they cooperated toward the public good, and players gain a benefit β if there were enough cooperators to trigger availability of the public good. Before the next generation a small fraction of the population is clonally replaced from an ancestor with likelihood equal to the ancestor's relative fitness. All previous models focused on prediction of the population average, which is a single point.
However, once the population of mixed-strategies has reached a stable equilibrium, the population may be in any of an infinite number of distributions that satisfy the predicted average. There is currently a lack of theory to predict the strategy distribution of the entire final population for mixed-strategy games.

Archetti et al. proposed an analytical model to estimate the fixed points for pure-strategy games [Archetti and Scheuring, 2011], described by the payoff equations for cooperators and defectors given the various configurations of cooperation among the N − 1 other players:

W_C = \sum_{j=0}^{N-1} f(j) B(j) - c    (2.1)

W_D = \sum_{j=0}^{N-1} f(j) B(j)    (2.2)

where B(j) is the benefit given (β) if the minimum threshold is met with j cooperators, and 0 otherwise, and f(j) is the probability that the public good is produced by exactly the number of cooperators needed to meet the threshold ζ requirement regardless of the current player's choice. This is described by

f(j) = \binom{N-1}{j} P_C^{j} (1 - P_C)^{N-1-j}    (2.3)

and the equilibrated fixed point can be described as

\frac{c}{b} = \binom{N-1}{\zeta-1} P_C^{\zeta-1} (1 - P_C)^{N-\zeta}    (2.4)

where N is the group size and ζ is the minimum number of cooperators needed to produce the public good. P_C is the frequency of cooperators, who pay a cost c in contributing to the public good, and b is the benefit rewarded to each player in the group if the public good is produced. The model is based on assuming the fraction of cooperators is P_C and calculates the change in population fractions by taking the difference of the expected payoff for cooperators and defectors at the assumed population fraction. The probability that at least ζ players cooperate is

P(j) = \sum_{j=\zeta}^{N} \binom{N}{j} \left(P_C^{(s)}\right)^{j} \left(1 - P_C^{(s)}\right)^{N-j}    (2.5)

where P_C^{(s)} is the stable fixed point of cooperators, which can be calculated. If the public good is achieved, then each group member receives the benefit β according to a smooth function

B(j) = \frac{1}{1 + e^{-s(j-\zeta+1)}}    (2.6)

which describes a saturating benefit relative to the number of cooperators. Now the average payoff for the entire group can be calculated as

W_{mix} = \sum_{j=1}^{N} \left[ P(j) W_C + (1 - P(j)) W_D \right]    (2.7)

This system of equations predicts fixed points not only for pure-strategy games, but also mixed-strategy games. As Archetti et al. noted, their model works well for pure-strategy polymorphic games where each player adheres to either cooperation or defection. For the pure-strategy game the fixed points represent fractions of pure-strategy cooperators in the equilibrated population. They also noted the model describes mixed-strategy games, but only those in which the population of mixed-strategies is monomorphic, where all players cooperate with identical probabilities. In this case the fixed point represents the evolved probability for cooperation shared by all players in the final population.

To address the mixed-strategy polymorphic case, the authors in [Archetti and Scheuring, 2011] discuss in the appendix an adaptation of their model to provide fixed points for the mixed-strategy polymorphic VD game. They provide an argument that the dynamics are identical to the monomorphic case, and thus they claim no new model needs to be developed to describe the mixed-strategy polymorphic game. From a biological perspective it seems possible that stabilities and instabilities may arise from the struggle of a heterogeneous population faced with either a new or ever-present contending phenotype. It could also be that fluctuations due to smaller population sizes would have a large effect.
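As a concrete illustration, the payoff curves and fixed point of equations 2.1–2.4 can be evaluated numerically with a few lines of code. This is only a sketch under one reading of the benefit term, in which a focal cooperator's own contribution counts toward the threshold (which reproduces the flat cooperator payoff described for Figure 2.1 in the next section); the function names and scanning resolution are illustrative assumptions.

```python
# Hedged sketch of the payoff calculations in equations 2.1-2.4, assuming a
# hard threshold and that a focal cooperator's own contribution counts toward
# meeting that threshold.
from math import comb

def f(j, n, pc):
    """Probability that exactly j of the other n-1 players cooperate (eq. 2.3)."""
    return comb(n - 1, j) * pc**j * (1 - pc)**(n - 1 - j)

def payoffs(pc, n=9, c=0.4, beta=3.0, zeta=1):
    """Expected payoff for a focal cooperator (eq. 2.1) and defector (eq. 2.2)."""
    wc = sum(f(j, n, pc) * (beta if j + 1 >= zeta else 0.0) for j in range(n)) - c
    wd = sum(f(j, n, pc) * (beta if j >= zeta else 0.0) for j in range(n))
    return wc, wd

def stable_fixed_point(n, c, beta, zeta, steps=10000):
    """Locate the upper crossing of the payoff curves (cf. eq. 2.4) by scanning."""
    crossings, prev = [], None
    for i in range(steps + 1):
        pc = i / steps
        wc, wd = payoffs(pc, n, c, beta, zeta)
        sign = wc - wd > 0
        if prev is not None and sign != prev:
            crossings.append(pc)
        prev = sign
    return max(crossings) if crossings else None

# Parameters of Figure 2.1: the two payoff curves cross near P_C = 0.22.
print(stable_fixed_point(n=9, c=0.4, beta=3.0, zeta=1))
```

For the ζ = 1 game this scan returns a crossing near P_C ≈ 0.22, consistent with the crossing point discussed below.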
2.3 Methods

Figure 2.1: Predicted payoff for a single cooperator (blue) and a single defector (red) in a VD game with group size N = 9, cooperation cost c = 0.4, public good β = 3, and threshold ζ = 1, using equations 2.1 and 2.2 respectively. As only 1 cooperator is required to produce the public good, the expected payoff for this cooperator does not change with respect to changes in cooperator population density, but the viability of the defection strategy depends on the density of cooperators.

To test the claim that the Archetti model can predict fixed points for the mixed-strategy polymorphic game, I compared the results of their model with those of an agent-based mixed-strategy polymorphic simulation. In this simulation each discrete player is a member of an evolving population of 1,024 players, so among all members those who do best will represent more of the population in the next generation, with some variation of individuality due to mutation. This generational turnover is performed by a Moran process with random death and fitness-dependent reproduction keeping the population size constant, and a small chance of mutation to introduce new strategy mixes. Each player is individually defined by a probability for cooperation, the converse of which defines their probability for defection. For every new generation random groups of size N are formed, the public goods game is played with cooperators paying a cost and all players benefiting if the public good is generated, and finally the group is disbanded before the selection and reproduction step.

The proportion of strategies in any game theoretic scenario will usually create selection pressure for a shift in strategy frequency within the population. After many generations of evolution, the population will either enter a limit cycle where the population fluctuates in an infinitely repeating pattern, or settle into a stable equilibrium where no more change will occur without perturbation. In the standard Volunteer's Dilemma there will never be a limit cycle, so the population will always reach some form of equilibrium between the strategy types. Using equations 2.1 and 2.2 with a threshold of ζ = 1 for a simple scenario and setting a proportion of cooperation PC in the population, the relative average payoff for each of the cooperative and defective strategies can be predicted. By itself the average payoff is not very interesting, but it becomes informative when compared to payoffs predicted for incremental changes in population proportion (see Figure 2.1). The resulting trend yields a slope, which corresponds to the selection pressure for that strategy given the prevalent proportion of cooperation in the population. However, the change in population proportions as a result of selection pressure may be at odds with the population proportions required to produce such high payoffs.

Figure 2.2: As predicted by [Archetti and Scheuring, 2011], average payoff for a single cooperator (blue) and a single defector (red) for every possible prevalence of cooperators for the VD game with group size 9, cost of cooperation 0.4, public good benefit 3, and ζ = 5 (A), ζ = 6 (B), ζ = 7 (C).
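Before examining those payoff curves in detail, the agent-based simulation just described can be compressed into a short sketch. The group-assembly bookkeeping, the Moran update schedule, and the mutation operator shown here are simplifying assumptions rather than the actual implementation, and the run length is shortened for illustration.

```python
# Compressed sketch of the agent-based Volunteer's Dilemma simulation:
# random groups, a threshold public goods game, and Moran-style turnover.
import numpy as np

rng = np.random.default_rng(1)

def play_generation(strategies, n=9, zeta=4, cost=0.4, beta=3.0):
    """One round of VD games in random groups; returns each player's payoff.
    The last (possibly short-handed) group simply plays as-is in this sketch."""
    order = rng.permutation(len(strategies))
    payoff = np.zeros(len(strategies))
    for g in range(0, len(order), n):
        group = order[g:g + n]
        acts = rng.random(len(group)) < strategies[group]    # True = cooperate
        produced = acts.sum() >= zeta
        payoff[group] = (beta if produced else 0.0) - cost * acts
    return payoff

def moran_events(strategies, payoff, n_events=100, mu=0.005, sigma=0.05):
    """Random death, fitness-proportional birth, and a small mutation chance."""
    w = payoff - payoff.min() + 1e-9                 # shift payoffs to be positive
    p = w / w.sum()
    for _ in range(n_events):
        dead = rng.integers(len(strategies))
        parent = rng.choice(len(strategies), p=p)
        child = strategies[parent]
        if rng.random() < mu:                        # Gaussian mutation sketch
            child = np.clip(child + rng.normal(0.0, sigma), 0.0, 1.0)
        strategies[dead] = child

strategies = rng.random(1024)                        # one P_C per player
for update in range(5000):                           # shortened, for illustration
    payoff = play_generation(strategies)
    moran_events(strategies, payoff)
print("mean evolved cooperation probability:", round(strategies.mean(), 3))
```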
The stable fixed point, represented by the higher crossing point of the payoff curves, shifts toward more cooperation for games with higher public good thresholds, and is predicted even for games with a high threshold of 7. For instance, a single defector in a well-mixed VD population of volunteering cooperators is very likely to receive a payoff from the invested cooperation of any random group of players. However, any increase in prevalence of the defector strategy necessarily decreases the likelihood that any defector will play in a group of benevolent strategies, which is the same as decreasing the likelihood of a payoff. This trade-off is visually identified in Figure 2.1, showing average payoff for a single cooperator and single defector in a game with a low cooperation threshold for the public good of ζ = 1. Near a cooperative population proportion of 0.22, a defecting player would achieve higher average payoff in a population comprised of slightly more cooperators. This selection has two opposing forces: defectors gain larger payoff the more likely they are to encounter cooperators, but selection will ensure the high-payoff players (defectors in this case) will reproduce proportionally more into the next generation, decreasing the average payoff for the next generation of defectors.

Each crossing of the predicted payoff curves in Figure 2.2 indicates an equilibrium fixed point. In each panel there are two such points that can be analytically predicted using equations 2.1 and 2.2. For low prevalence of cooperators (low PC) cooperators always perform terribly, or at best equivalently. As the expected prevalence of cooperation rises, cooperators are the quickest to benefit, and in panel A cooperation becomes the highest-paying strategy near PC = 0.3, which would select for the population to produce more cooperators. This trend ends near PC = 0.7, above which there are so many cooperators that free-riding, or defecting, is a much more lucrative strategy. This shift of highest-paying strategy changes selection pressure to favor defection. It is this crossing point, where the payoff curves for WC and WD cross at PC = 0.7, that is the predicted stable fixed point, and the other crossing at PC = 0.3 is an unstable fixed point.

Figure 2.3: Analytical predictions of the equilibrium fixed point (dashed line) compared with our agent-based simulation results (solid line) for a variety of minimum thresholds for public goods generation ζ, for the VD game of group size 5 (A) and group size 9 (B), cooperating cost of 0.4, public good benefit of 3, and µ of 0.005. Bootstrapped 95% confidence intervals shown but not discernible.

The upper point discussed already is known as an attractive fixed point because perturbations in the population will eventually resolve back to the fixed point given enough time to equilibrate. The lower fixed point is known as an unstable, or repulsive, fixed point. If the population is exactly at this lower fixed point, then the average payoff for each strategy is identical. If selection and reproduction are equally fair to all equivalent-payoff strategies, then the population strategy demographics do not change over evolutionary time or any time. However, evolution is a stochastic process, so it is highly unlikely the population will remain exactly at this point.
Any small change in population demographics results in differential payoff between strategies, kickstarting the process of selection toward the attractive fixed point. There are two other fixed points for all VD games: a stable attractive fixed point at PC = 0 and an unstable fixed point at PC = 1.

Comparing these analytically determined fixed points to an agent-based simulation (ABS) allows validation of the equations. Figure 2.3 shows how simulation results marginally differ for the mixed-strategy version of the same game. The results appear identical for nearly all thresholds, but differ at the highest thresholds: ζ ≥ 4 (panel A) in the case of group size 5, and ζ = 8 (panel B) for group size 9. These results, showing only a marginal discrepancy from the analytical method, suggest the previous claim to be true. That is, the pure-strategy model proposed by Archetti et al. correctly predicts fixed points for mixed-strategy games, as the authors claim. Any new models extending this model can be assumed to work similarly if the fundamental predictions have not changed. I will now show the validation for a new theory that can predict not only the fixed points for mixed-strategy polymorphic games, but the final evolved population distributions around those fixed points as well.

2.3.1 New Model

Previous models described the definite population equilibrium point assuming an infinite population and infinite time to equilibrate. The model below makes the same assumptions:

L_{P_C} = e^{-\phi(P_C)/\mu}    (2.8)

where L is the probability distribution of strategies with cooperative probability PC, which is a solution to the Fokker-Planck equation associated with the stochastic replicator equation at mutation rate µ, and

\phi(P_C) = -\frac{\beta \zeta (N + 1 - \zeta)}{N^3 + 3N^2 + 2N} \sum_{k=\zeta}^{N} \binom{N}{k} P_C^{k} (1 - P_C)^{N-k} + \frac{c}{2} P_C^{2} - \frac{c}{3} P_C^{3}    (2.9)

where PC is the proportion of cooperators, β is the benefit of the public good produced, ζ is the threshold or minimum number of cooperators required to produce the public good, N is the group size, k is the number of cooperating players, and c is the cost paid by an individual for cooperating (see private communication with Christoph Adami for the derivation and equation origin). Equation 2.8 with equation 2.9 shares some structure with the model proposed by Archetti et al. but is extended by inspiration from the Fokker-Planck equation. This extension allows the equation to predict not only a final point of stability between strategy ratios, but how a population distribution of mixed-strategies will change over time. In other words, it gives the likelihood of observing each strategy given an infinite population guaranteed to contain all strategies and infinite time for that population distribution to shift according to the selection gradient and evolutionary stochasticity.

Figure 2.4: Distributions representing the likelihood of observing population mixtures, derived from equation 2.8. All panels show predictions for the VD game of N = 9, c = 0.4, β = 3, and µ = 0.005, but ζ = 5 (A), ζ = 6 (B), and ζ = 7 (C). For simple predictions there is only one peak (A), but as the game becomes more difficult with higher ζ (B) a second peak may form showing viability of defection (C).
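Assuming the form of equations 2.8 and 2.9 given above, the curves of Figure 2.4 can be sampled on a grid with a few lines of code. This is only a sketch: the normalization constant and plotting are omitted, the grid resolution and function names are illustrative, and the exact placement of the peaks depends on the precise form of equation 2.9.

```python
# Sketch of evaluating the strategy-distribution prediction of eqs. 2.8-2.9
# on a grid of cooperation probabilities (Figure 2.4, panel A parameters).
from math import comb, exp

def phi(pc, n, c, beta, zeta):
    """Potential function of eq. 2.9 evaluated at cooperation probability pc."""
    prefactor = beta * zeta * (n + 1 - zeta) / (n**3 + 3 * n**2 + 2 * n)
    tail = sum(comb(n, k) * pc**k * (1 - pc)**(n - k) for k in range(zeta, n + 1))
    return -prefactor * tail + (c / 2) * pc**2 - (c / 3) * pc**3

def likelihood(pc, n=9, c=0.4, beta=3.0, zeta=5, mu=0.005):
    """Unnormalized likelihood of eq. 2.8: L(P_C) = exp(-phi(P_C)/mu)."""
    return exp(-phi(pc, n, c, beta, zeta) / mu)

grid = [i / 200 for i in range(201)]
values = [likelihood(pc) for pc in grid]
peak = grid[values.index(max(values))]
print("most likely mixed strategy P_C:", peak)
```

With the ζ = 5 parameters this grid search places the dominant peak at roughly PC ≈ 0.7, in line with the description of panel A below.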
The magnitude of the curve in Figure 2.4 represents strategy abundance, but also the likelihood of observing that strategy at any point after the initial conditions have been washed out by time, in an infinite population where each strategy may be present. Panel A shows a single-peak distribution where such a population will have a mean mixed-strategy of PC = 0.7. The dual peak shown in panel C predicts the final population will stabilize at both peaks, PC = 0 and PC = 0.91.

With these distributions as guides it is easier to predict what will happen to finite populations of VD players. Previous models relied on generating a single guaranteed prediction. This model generates a stochastic prediction of likelihood for a stochastic process. Each peak of these distributions in Figure 2.4 is an attractor, and their relative magnitudes indicate their attractive dominance over one another, assuming there are portions of the population representing strategies at each attractor in sufficient quantity. This co-existence of strategies can only happen with a population large enough, and a mutational mechanism that generates strategies diverse enough, to populate both attractors.

2.3.2 Mutational Mechanism

Among the parameters of equation 2.8 is the rate of mutation µ, but there is also an assumption of mutational mechanism, or how those possible mutations are distributed. This particular model was created with the assumption of a Gaussian distribution. Two distributions were selected for comparison in the ABS: uniform and Gaussian. The uniform distribution is the most general model of mutational mechanism typically used in computational experiments. Under a uniform mutational distribution each newly generated trait value from the distribution is equally likely as any other trait value. However, this simplistic model of mutations and their effect on trait values might have large oversimplifying effects on the evolutionary history of the population, obscuring or completely missing interesting dynamics. For instance, because strategies are encoded directly as trait values, every strategy is one mutation away from any other strategy, making the requirement for stable dominance of a strategy such that it must be completely defensible against all other possible strategies. The Gaussian distribution is a more reasonable assumption for biological systems because only new trait values closely related to current trait values can arise from new mutations, and they arise with a likelihood that falls off with their distance from the last trait value according to a Gaussian. The implication for dominance under the Gaussian distribution is that any dominant strategy must be defensible only against closely related strategies, not against all other possible strategies.

Figure 2.5: Likelihood of average population mixtures for mixed-strategies assuming infinite population size and infinite time. Each peak and valley represents a fixed point: the left peak (defective) is attractive, the valley is repulsive at PC = 0.1375, and the right peak (cooperative) is attractive at PC = 0.275. It can be observed in finite populations that a small mutation rate or large population will stabilize a population at the cooperative fixed point, but otherwise the population will eventually collapse to the defective fixed point.
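The two mutational mechanisms just compared can be stated compactly. The Gaussian width used below is an illustrative assumption; the exact effect size used in the simulation is not restated here.

```python
# Sketch of the two mutational mechanisms: uniform redraw versus Gaussian
# perturbation of the parent's cooperation probability.
import numpy as np

rng = np.random.default_rng(2)

def mutate_uniform(parent_pc: float) -> float:
    """Ignores the parent: any strategy is one mutation away from any other."""
    return rng.random()

def mutate_gaussian(parent_pc: float, sigma: float = 0.05) -> float:
    """Offspring strategies stay close to the parent's strategy."""
    return float(np.clip(parent_pc + rng.normal(0.0, sigma), 0.0, 1.0))

parent = 0.275
print([round(mutate_uniform(parent), 2) for _ in range(5)])    # anywhere in [0, 1]
print([round(mutate_gaussian(parent), 2) for _ in range(5)])   # clustered near parent
```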
Dominance between any two strategies of only one player each, in a repeated game with random new players, is relatively easy to predict by comparing predicted average payoffs. For a population of Npop players, the fixed point represents the dominant ratio of strategies. Fixed point prediction in VD games with deterministic strategies can be calculated using equation 2.4, which may also be used for mixed-strategy games under large enough population sizes. For instance, a game with group size N = 9, cooperation cost c = 0.4, public good benefit β = 3, and threshold ζ = 4 will equilibrate to around 59% cooperators and 41% defectors (Figure 2.3). These mixed-strategy games are comprised of players who play any of the usual strategies from the deterministic game according to a probability density function (PDF) with a domain of all possible strategies. As each player represents a single PDF, the entire population of mixed-strategy distributions evolves as a mixture itself. This population eventually equilibrates to a particular distribution of mixed strategies. Figure 2.5 shows a distribution of likelihoods of observing the population in particular basins of attraction for mixed-strategies.

Final population distributions from the agent-based results can look quite different from analytical predictions due to finite population and finite time. By Figure 2.5 the predicted cooperative fixed point assuming a Gaussian distribution would be a population with an average mixed-strategy of PC = 0.275, and it would also mutationally average a PC = 0.275 with probability equal to the same distribution. This ergodicity of representing both abundance and likelihood of observation is only possible given assumptions of infinite population size and infinite time; in a finite-population, finite-time simulation the expected ergodic relationship will not hold. The agent-based simulation of evolving a population of 1,024 individuals playing VD with the same parameters yields two different distributions depending on the mutational mechanism (Figure 2.6). A uniform distribution of possible mutations creates a population distribution rife with defection, only offset by a large spread of different types of cooperators (panel A). A Gaussian distribution of possible mutations creates a population distribution clustered around the fixed point (panel B).

Figure 2.6: Agent-based evolved populations for the uniform mutational distribution (A) and Gaussian distribution (B), with public goods threshold ζ = 4, for the VD game of group size 16, cooperating cost c = 0.6, public good benefit β = 3, and µ = 0.005. Over 100 replicates were collected to produce the distributions. Both distributions have, and are predicted to have, a mean cooperation frequency fPC = 0.275 using the cooperative fixed point of the distribution from Figure 2.5. Despite the average of each distribution being the predicted fixed point, the distributions are quite different from one another. Only the Gaussian mutational distribution produces a unimodal population strongly clustering around the predicted fixed point, whereas the uniform mutational distribution produces a unimodal population with a long tail, and it is not immediately apparent which strategy is the fixed point, or average strategy.
While both population distributions in panels A and B have an average mixed-strategy of PC = 0.275, matching the analytical fixed point prediction, the population evolved with the more plausible Gaussian mutational regime creates a very different evolved population. So while the fixed point can be predicted with some degree of accuracy, the mix of the final population depends heavily on the assumption of mutational mechanism.

2.3.3 Errors

Study systems are useful for studying emergent phenomena, but an ever-present danger for theoretical or computational systems is having mistakenly built bugs or incorrect assumptions into the system. How then does an experimenter know their study system provides results that may be reliably generalized? If the study system dynamics respond in similar ways to the systems it represents, then the other dynamics that are more difficult to validate may be accepted with higher confidence. In the collaboration to develop a new mathematical model, a study system was necessary from which inductive analyses could be used to inform the analytical model. Both systems may have their respective errors during development. A comparison to reasonable expectations and similar systems must be made before confidence may be ascribed to their results and implications.

Figure 2.7: Fixed points predicted exactly by the analytical model (dashed line) and approximated by the average strategy of the evolved population from the agent-based model (solid line). Both models have the parameters ζ/N = 0.2, c = 0.01, and β = 0.5. The agent-based model uses µ = 0.005 and a smoothing payoff parameter s = 0.5, which is halfway between a linear and step interpolation. Both predictions converge toward PC = 0.2, which is identical to the setup and result of the analysis provided by [Archetti and Scheuring, 2011].

Archetti [Archetti and Scheuring, 2011] proposed a similar but simpler model, already vetted, which I can use to further validate the analytical model through comparison of identical or similar analyses. The Archetti model is an analytical model that predicts the stable and unstable equilibrium points of the VD game by predicting the proportion of cooperation to defection. Archetti's models show that, despite a difference in estimated result, the hard threshold and smooth threshold models give converging results for large group sizes (N). For small group sizes, the smooth threshold model predicts larger cooperation ratios than the hard threshold model. The convergence of both models can be predicted by extrapolating the response from a range of group sizes. In other words, as population size increases, both models tend toward producing the same resulting fractional population of cooperators. I can use the dynamics of Archetti's models to test whether our proposed models exhibit the same convergence under identical assumptions. Both of our models may be used with the same assumptions, as, like Archetti, our models assume mixed strategies, and our analytical model assumes a hard threshold while the computational model can be parameterized with a steep or soft threshold. Ideally, I would perform a parameter sweep identical to Figure 3 from [Archetti and Scheuring, 2011], beginning with a population size of 100 and measuring the predicted ratio of cooperators all the way up to a population size of 600, in increments of 50.
A direct comparison across these parameter ranges is not possible because the computational complexity of the agent-based system is a limiting factor. The closest I can produce in reasonable time is a sweep of results for population sizes a factor of 10 smaller. However, the dynamics of both the mathematical and simulation models suggest a convergence near the same PC around which Archetti's models converge. That is, under an assumption of ζ/N = 0.2, with the payoff function defined by a smoothing parameter s = 0.5 to create a very shallow, very smooth sigmoidal function, both the mathematical and simulation systems converge to a value approaching 0.2. Note that s = 0.1 would give a nearly linear function, and s = 100 would give nearly a step function.

The smooth threshold model from Archetti always predicts a higher level of cooperation than the step threshold model. For small population sizes, this difference is quite large, but larger population sizes show an apparent trend to asymptotically converge on a single value of predicted cooperation ratio within the evolved population. Our results follow a similar pattern despite a different range of population sizes. Smooth threshold results for small population sizes predict nearly twice as much cooperation in the final evolved population as the step function results. Additionally, this difference decreases asymptotically for larger population sizes, converging on the same value as Archetti's models. This similarity of predictive behavior between our models and Archetti's models supports an equivalence of validity.

2.3.4 Spatial Constraint

Closed-form analytical models such as the ones discussed here cannot make predictions under an assumption of spatial constraint, as higher-order dynamics greatly influence the outcome. For this reason only tightly coupled simulation systems, such as an agent-based simulation, can provide predictions. Here I explore spatial considerations using the agent-based system for pure-strategy and mixed-strategy systems, and compare predictions to those of the analytical model.

Figures 2.8 and 2.9 show the difference between a well-mixed (panel A) and spatial (panel B) VD game with mixed-strategies. Each population has 1,024 individuals and was evolved for 80,000 updates. Parameters were N = 16, c = 0.6, β = 3, and µ = 0.005. While the population distributions shown are so different as to almost be inverses of each other, their means are nearly identical. This comparison emphasizes how, like mutational mechanism, spatial constraint can highly influence the resulting population distribution. However, a thorough comparison of fixed point predictions for a valid range of ζ will better show other potential differences between well-mixed and spatial VD games.

Figure 2.8: Population distributions for single instances of agent-based evolved populations for well-mixed (A) and spatial constraint (B) conditions, with public goods threshold ζ = 4, for the VD game of group size 16, cooperating cost c = 0.6, public good benefit β = 3, and µ = 0.005. The analytical model predicts a mean cooperation frequency fPC = 0.275, and both well-mixed and spatial populations evolutionarily shift around this fixed point.
Figure 2.9: Spatial visualizations for single instances of agent-based evolved populations (the same as in Figure 2.8) for well-mixed (A) and spatial constraint (B) conditions, with public goods threshold ζ = 4, for the VD game of group size N = 16, cooperating cost c = 0.6, public good benefit β = 3, and µ = 0.005. The analytical model predicts a mean cooperation frequency fPC = 0.275, and both well-mixed and spatial populations evolutionarily shift around this fixed point.

Figure 2.10: Comparison of analytical (dashed) and ABS (solid) predictions of deterministic-strategy fixed points for well-mixed (A) and spatial constraint (B) conditions, for the VD game of group size N = 9, cooperating cost c = 0.4, public good benefit β = 3, µ = 0.005, and s = 1.0. For both A and B the analytical prediction (dashed) is the same prediction assuming a well-mixed population. Fixed points for the pure-strategy game predicted by the ABS under spatial constraint (B) show much higher levels of cooperation than the well-mixed analytical prediction.

Figures 2.10 and 2.11 show differences between the well-mixed (panels A) and spatial (panels B) VD games for group size N = 16, cooperating cost c = 0.6, public good benefit β = 3, and µ = 0.005. Spatial constraint is well known to encourage cooperation in games, so the expectation for the ABS results is higher cooperative ratios when the populations are spatially constrained [Killingback and Doebeli, 1996, Dietz, 2003]. Both the deterministic and the mixed-strategy well-mixed VD fixed points are perfectly predicted by the analytical model for a variety of ζ thresholds, but the two differ in how spatial constraint affects the PC fixed point. For deterministic VD games, spatial constraint highly biases results toward cooperation, whereas the effect is marginal for the mixed-strategy games. There is a point in Figure 2.11 panel B where it appears spatial constraint undermines cooperation, compared to the analytical prediction, for the highest thresholds ζ. This effect may be due to a lack of equilibration time, as a similar downward trend is seen in panel A for the well-mixed games.

Figure 2.11: Comparison of analytical (dashed) and ABS (solid) predictions of probabilistic-strategy fixed points for well-mixed (A) and spatial constraint (B) conditions, for the VD game of group size N = 9, cooperating cost c = 0.4, public good benefit β = 3, µ = 0.005, and s = 1.0. For both A and B the analytical prediction (dashed) is the same prediction assuming a well-mixed population. Fixed points for the mixed-strategy game predicted by the ABS under spatial constraint (B) show marginally higher levels of cooperation than the analytical prediction, indicating higher cost-benefit efficiency.

At ζ = 4, spatial structure imposed on a pure-strategy population creates a population with 83% cooperation. Spatial structure imposed on a mixed-strategy population with the same threshold creates a population with mean strategy PC = 0.65. Comparing the success of these populations is difficult given only the cooperation ratios, but a direct comparison of group fitness can be made using Equation 2.7. An exploration of group fitness as a function of PC is found in Figure 2.12. W_mix of the pure-strategy population at 83% cooperation is 24.012, and W_mix of the mixed-strategy population at PC = 0.65 is 24.027. The mixed-strategy population has marginally higher predicted group fitness, with a lead of 0.015.
This difference is not very large, so one might assume that in direct competition between pure-strategies and mixed-strategies there may be no clear winner. In small population sizes there will surely be a negligible fitness difference, but some biological populations that exhibit a Volunteer's Dilemma style threshold game, such as biofilm-forming microbes, form populations of much greater size. For a fitness effect of s = (24.027 − 24.012)/24.012 = 0.0006247, a population of at least N = 1/0.0006247, or N = 1600, would be required to allow fixation of mutations with such a small effect size. For a biologically plausible population size, the mixed-strategy population would outcompete the pure-strategy population. The reason for this difference can be understood by the cost-benefit ratio for cooperators in the VD game: too much cooperation is costly and redundant, whereas efficient levels of cooperation benefit the entire group because cooperative effort is not wasted beyond that required to create the public good. These results suggest "phenotypic heterogeneity," or phenotypic plasticity, such as that observed in biofilms [Jefferson, 2004], is an adaptive trait whereby probabilistic expression allows efficient population strategy mixes given large populations.

Figure 2.12: The result of using Equation 2.7 to predict group fitness for Volunteer's Dilemma of group size N = 9, ζ = 4, cost c = 0.4, and benefit β = 3. The fixed point is at PC = 0.59, but the maximum group fitness is higher, at PC = 0.72.

2.4 Conclusion

Previous models for Volunteer's Dilemma predict population fixed points as singular averages of cooperation PC. Here I show validation for a new analytical model by providing an agent-based simulation. The development of both models allows an induction-deduction cycle of refinement and validation for otherwise difficult modeling problems. The data presented here highly support the new analytical model as a robust method to predict final population distributions and the configurations through which they will progress over evolutionary time. Finally, the comparative data suggest phenotypic plasticity, or probabilistic gene expression, in microbes is a selective advantage in the evolution of cooperation for Volunteer's Dilemma games.

Chapter 3

Information-theoretic Neuro-correlates Boost Evolution of Cognitive Systems

3.1 Abstract

Genetic Algorithms (GA) are a powerful set of tools for search and optimization that mimic the process of natural selection, and have been used successfully in a wide variety of problems, including evolving neural networks to solve cognitive tasks. Despite their success, GAs sometimes fail to locate the highest peaks of the fitness landscape, in particular if the landscape is rugged and contains multiple peaks. Reaching distant and higher peaks is difficult because valleys need to be crossed, in a process that (at least temporarily) runs against the fitness maximization objective. Here we propose and test a number of information-theoretic (as well as network-based) measures that can be used in conjunction with a fitness maximization objective (so-called "neuro-correlates") to evolve neural controllers for two widely different tasks: a behavioral task that requires information integration, and a cognitive task that requires memory and logic. We find that judiciously chosen neuro-correlates can significantly aid GAs in finding the highest peaks.
3.2 Introduction

The last 50 years of research in Artificial Intelligence have taught us many things, but perhaps the most obvious lesson is that designing complex cognitive systems is extremely hard. Notwithstanding the success of chess-playing algorithms and self-driving cars, designing a brain that rivals the performance of even the smallest vertebrate has proven elusive. While the computational algorithms being deployed today on the aforementioned problems (as well as on image classification via convolutional nets) are impressive, many researchers are convinced that none of these algorithms are cognitive in the sense that they display situational understanding. For example, the celebrated convolutional nets can easily be fooled [Nguyen et al., 2015] with trivial imagery, suggesting that they implement a sophisticated look-up table after all, with very little understanding. The failure of the design approach has been acknowledged by several groups of researchers that have chosen an entirely different approach, namely to use the power of evolution to create machine intelligence. This field of "neuro-evolution" [Yao, 1999, Floreano et al., 2008] is much less developed than the standard design approach, but it has made great strides in the last decade. It also has the advantage (compared to the design approach) that the approach is known to have resulted in human-level intelligence at least once.

In the field of neuro-evolution, a Genetic Algorithm (GA) [Michalewicz, 1996] is used to evolve a program that, when executed, builds a "neuro-controller". This neuro-controller constitutes the brain of a simulated entity, which is called an agent. Each program is evaluated via the performance of the agent, and programs that gave rise to successful brains are then replicated and given proportionally more offspring programs than unsuccessful programs. Because mutations are introduced in the replication phase, new types of programs are introduced every generation, trying out variations of the programs, and therefore variations of the brains. This algorithm, closely modeled on the Darwinian process that has given rise to all the biocomplexity on the planet today, has proven to be a powerful tool that can create neuro-controllers for a diverse set of tasks.

Using evolution to create brains is no panacea, though. The use of Genetic Algorithms (GAs) to optimize performance (fitness) of behavior controllers is often hindered by the structure of complex fitness landscapes, which are typically rugged and contain multiple peaks. The GA, via its fitness maximization objective, will discover local peaks but may get stuck at sub-optimal peaks because crossing valleys is specifically not an objective. For a population to overcome the valleys in such a rugged landscape, programs must (at least temporarily) acquire deleterious mutations that are at odds with a simple reward system for optimization. This difficulty is typically overcome by increasing diversity in the population [De Jong, 1975], by splitting the population into islands [Whitley et al., 1998, Bitbol and Schwab, 2014], by using alternative objectives such as in novelty search [Lehman and Stanley, 2008], or by changing (and thus optimizing) the fitness function itself. Of these solutions, "Diversity" and "Novelty" can be computationally intensive, while fitness function optimization is very specific to every problem, and thus not a general solution.
Here we propose an alternative approach to fitness optimization in the evolution of cognitive controllers that takes advantage of the insight that functioning brains have a certain number of characteristics that are a reflection of their network structure, as well as their information-processing capacity. If we were able to reward these features at the same time as rewarding the performance of the given task, it may be possible to evade the valleys of the landscape and move on neutral ridges towards higher peaks. The idea of using multiple objectives in Genetic Algorithms is not at all new [Zhou et al., 2011], and it has been used previously in neuro-evolution [Clune et al., 2013].

We present evidence from simulations of evolving virtual agents that establishes that it is possible to improve the performance of a GA, increase the rate of adaptation, and improve the performance of the final evolved solution, all by incorporating neuro-correlates of the evolving agents' brains into the fitness calculation. These neuro-correlates are metrics inspired by the interface between Network Science and Cognitive Science that attempt to measure "how well a brain is working", independently of the achieved fitness. These measures typically do not assess an agent's performance of a task because it is often difficult to relate task performance to cognitive ability. Ideally, these neuro-correlates either quantify the mode and manner in which information is being processed, or the manner in which the nodes of the network are connected. It is important that these neuro-correlates are agnostic of performance, as otherwise their reward would not open a new dimension in optimization.

The evaluation of an agent in any GA typically involves measuring the agent's performance for a given task or environment. We show that multiplying the performance of an agent with the value of its neuro-correlate will improve the speed of evolution and increase the ratio of perfect-performance evolved agents (a simple way of performing multi-objective optimization, see for example [Deb, 2014]). This improvement can be traced back to an increase in the number of potentially beneficial mutations that may persist or sweep the population. If a mutation increases cognitive ability but does not yet have an effect on the agent's performance, then it is evolutionarily neutral and can be lost by drift. However, if such a performance-neutral mutation improves cognition in some other form, and thereby increases a neuro-correlate, then the mutation is no longer neutral and is instead selected. In future generations, such an improvement might become beneficial for task performance. Therefore, using neuro-correlates in conjunction with performance allows these otherwise neutral mutations to stay in the population longer, or even promotes them to fixation. Subsequent mutations then have a chance to take advantage of these changes that otherwise would have been lost to drift.

3.3 Background

We evolve agents to solve two very different tasks: a temporal-spatial integration task (active categorical perception, see [Beer, 1996, Beer, 2003, van Dartel et al., 2005, Marstaller et al., 2013b]) using embodied agents, and a purely mathematical (disembodied) task that requires complex cognitive processing: the generation of random numbers using deterministic rules.
The temporal-spatial integration task requires agents to observe and categorize blocks of different sizes falling toward them, catching small blocks while avoiding large blocks. This task cannot be solved by a purely reactive machine (see [Marstaller et al., 2013b]) because agents must use memory in order to recognize the block's trajectory and predict where it will land. The task creates a fitness landscape known to be deceptive, as prior results have shown that only a small fraction (about 10%) of populations result in an optimal solution. The sub-optimal agents usually get stuck on local peaks that deliver about 80% of maximal fitness [Marstaller et al., 2013b].

In the second set of experiments we investigate a task where agents are rewarded for generating long sequences of random numbers (without access to a random number generator or any other stochastic source). Agents are given an oscillating bit as an indicator of time, and assigned fitness based on the length of the dictionary generated from a Lempel-Ziv compression [Welch, 1984] of the agent's output. Like the previous task, this task cannot be solved by a purely reactive machine, although it would be trivially solved if the agents could access stochastic processes. However, because the agents use only deterministic processes, we expect this task to require a great amount of temporal integration and memory in order for them to achieve a good amount of randomness. Indeed, generating random numbers is a known task for testing cognitive ability and disability, in particular in the realm of Autism Spectrum Disorders and dementia [Brugger et al., 1996, Baddeley, 1998, Jahanshahi et al., 2006].

The standard substrate for neuro-evolution is Artificial Neural Networks (ANNs, see e.g. [Russell and Norvig, 2003]), but here we use a different substrate ("Markov networks" or "Markov Brains") that has proven to be adept at behavioral decision-making tasks [Albantakis et al., 2014, Edlund et al., 2011, Olson et al., 2013b, Chapman et al., 2013, Haley et al., 2014, Olson et al., 2015, Kvam et al., 2015, Olson et al., 2013c]. In contrast to ANNs, in which neurons are continuous-valued and non-firing, neurons in Markov Brains (MBs) are digital with only two states: quiescent or firing. Markov neurons can interact with any other neuron via arbitrary logical functions (as opposed to the ubiquitous transfer and activation functions found in ANNs). We use MBs because we have found them to be computationally more powerful and more evolvable, while having a much smaller computational footprint than ANNs. ANNs, on the other hand, have a wide range of applications, and our results might generalize to those applications as well.

The logic functions that connect neurons, along with the topology of the network, are encoded directly by a string of bytes. The logic gates act on Markov variables (our equivalent of a neuron), and the output is written into other Markov variables. In a sense, the MB is defined by the edges between nodes, as it is the edges that carry all the computational power. Each gate is specified by a gene, and the beginning of each gene on the chromosome is specified by a particular combination of bytes, in our case the "start codon" (42, 213). The bytes that follow determine the identity of the neurons it reads from and the identifier of the neuron(s) it writes to. The subsequent bytes encode the logic of the gate, which can be done by simply encoding the truth table.
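To make the direct encoding concrete, here is a toy decoder in the spirit of the description above. Only the start codon (42, 213) comes from the text; the fixed gene length, the two-input/one-output gate layout, and the four-entry truth table are simplifying assumptions for this sketch, whereas the real Markov Brain encoding allows variable numbers of inputs and outputs per gate.

```python
# Toy sketch: scan a byte genome for start codons and decode simplified
# 2-in/1-out logic gates, then update a brain state by one step.
from typing import List, Dict

def decode_genome(genome: List[int], n_nodes: int = 16) -> List[Dict]:
    """Return the simplified logic gates found after each (42, 213) start codon."""
    gates = []
    for i in range(len(genome) - 1):
        if genome[i] == 42 and genome[i + 1] == 213:          # start codon
            g = genome[i + 2:i + 2 + 7]                        # bytes of one gene
            if len(g) < 7:
                break
            gates.append({
                "inputs": [g[0] % n_nodes, g[1] % n_nodes],    # neurons it reads from
                "output": g[2] % n_nodes,                      # neuron it writes to
                "table":  [b % 2 for b in g[3:7]],             # 2-input truth table
            })
    return gates

def step(gates: List[Dict], state: List[int]) -> List[int]:
    """Apply every gate's truth table once to produce the next brain state."""
    new_state = list(state)
    for gate in gates:
        a, b = state[gate["inputs"][0]], state[gate["inputs"][1]]
        new_state[gate["output"]] = gate["table"][2 * a + b]
    return new_state

# Tiny hypothetical genome containing two genes.
genome = [7, 42, 213, 3, 5, 9, 0, 1, 1, 0, 88, 42, 213, 9, 1, 4, 1, 1, 1, 0]
brain = decode_genome(genome)
state = [0] * 16
state[3] = 1                                # set one "sensor" node
print(len(brain), step(brain, state))
```

Under such a scheme, a duplicated gene simply yields a second, identical gate, which is consistent with the discussion of genome size as a proxy for brain size below.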
While other MB implementations allow for stochastic logic gates, we confine ourselves to deterministic gates, which have a much more concise encoding (see Refs. [Edlund et al., 2011, Chapman et al., 2013, Marstaller et al., 2013b] for a more detailed description of MB encoding and function). There are alternative neural substrates that we could have studied here, including NEAT or hyperNEAT [Stanley and Miikkulainen, 2002], genetic programming, or subsumption architecture machines [Brooks, 1986]. These are all viable substrates for exploring the benefits of neuro-correlate aided evolution. In this contribution we focus on testing the general validity of the neuro-correlate augmented evolution approach. We do expect the results to depend on the underlying neural substrates, their evolvability, and how well each neuro-correlate can be assessed. In addition, our proposed method is easy to implement for other systems: a neuro-correlate must be measured and the resulting value multiplied by the associated performance. This should allow for rapid testing of this method in other systems.

Despite evidence that an indirect encoding might be more advantageous [Clune et al., 2011, D'Ambrosio et al., 2011, Gauci and Stanley, 2010b], the direct encoding has been very successful in evolving controllers for virtual agents to solve a wide range of tasks [Marstaller et al., 2010, Edlund et al., 2011, Chapman et al., 2013, Marstaller et al., 2013b, Albantakis et al., 2014, Hintze and Miromeni, 2014]. In addition, these controllers have been instrumental in establishing some of the neuro-correlates used in the present study, which increases our confidence that these measures perform as described. Next we describe the eight different neuro-correlates used to assess a controller's topology and performance.

3.3.1 Network-theoretic neuro-correlates

3.3.1.1 Minimum-Description-Length

The simplest neuro-correlate is, colloquially speaking, the largest possible brain size. It is difficult to define such a concept mathematically, but we can imagine that if we had a description of the brain in terms of the program that builds it, then the shortest such program would be the most concise description of the brain in a Minimum Description Length (MDL) formalism, and larger MDLs could encode larger brains. The size of the genome that codes for our Markov Brains could serve as a proxy for the brain MDL, but it is almost never the smallest description of the brain, simply because the genome can add more "empty tape" instead of running out of space to encode more logic gates, for example using a gene duplication. Using the genome size (as a proxy for MDL) as a neuro-correlate makes sense because it explicitly rewards genome expansion, rather than waiting for a fortuitous genome duplication to add the extra space. The genome size is directly proportional to the potential number of logic gates, and thus the number of connections the agent phenotype might have, since the genome encodes the logic gates directly. Of course, under such a selective pressure genome length is almost guaranteed to be very different from the compression limit (the smallest program encoding the brain), but we can think of evolution as creating a selective pressure to compress the brain description as much as possible. In our implementation of Markov Brains the genome size varies between 2,000 and 20,000 loci (each locus on the genome is a byte, so it can take values between 0 and 255) and can be affected by insertion- and deletion-mutations.
We do not use the number of encoded logic gates to directly assess brain size for two reasons: First, each gate can have a different number of inputs and outputs, which influences the complexity of the gate. Second, gene duplications can create exact copies of a gene that codes for a gate, which changes the number of gates without (presumably) affecting the brain's function. Because the connectivity of standard ANNs is fixed [Russell and Norvig, 2003], an MDL-like neuro-correlate does not exist there, but the Vapnik-Chervonenkis (VC) dimension [Vapnik and Chervonenkis, 1971], which bounds the learning capacity of a network, could be a suitable alternative. Within more plastic ANN systems that allow connections to be encoded, such as NEAT [Stanley and Miikkulainen, 2002], the number of edges between neurons could be a proxy for potential brain size.

3.3.1.2 Topological Complexity

Brains are networks of neurons, and our MBs are networks of logic gates, both of which can be represented by graphs. Assessing the complexity of graphs is not a straightforward task (see, e.g., [McCabe, 1976]), but for the purpose of brain function some graph properties are obvious neuro-correlate candidates, and easy to measure. We first measure the graph diameter (GD) as the highest value in the distance matrix of the network, also known as the longest of all shortest paths between all node pairs. The intuition behind using GD as a neuro-correlate is that information traveling along neurological pathways in brains with a large diameter has more time to interact with other neurons, in particular in other parts of the brain. If information takes longer to pass from sensors to actuators, it remains within the brain longer and therefore extends the agent's short-term memory.

3.3.1.3 Connectivity and Sparseness

We measure the standard graph-theoretic "Gamma Index" (GI) or "connectivity" of the network as well as its converse, the network "sparseness." The Gamma Index is the ratio of extant connections to all possible connections. For this measure multiple connections between nodes are treated as a single connection, as otherwise there would be an infinite number of possible connections between all nodes of the network. Current understanding of brain optimization and organization suggests that connections between neurons are costly [Ahn et al., 2006, Cherniak et al., 2004] and that this provides a strong selection pressure during evolution. Specifically, it has been shown that minimizing connections between neurons in a brain produces more modular and adaptable brains [Clune et al., 2013, Huizinga et al., 2014]. As we will show, this is not necessarily the case but depends on the task being evolved. Also, intuitively one might think that more connections are better, and thus optimizing for density might be as beneficial as optimizing for sparseness under the right circumstances. To incorporate this phenomenon, we use sparseness and the Gamma Index separately, as they reflect different aspects of brain connectivity.

3.3.1.4 Representations

R, a measure for the amount of information that a brain represents within internal states, is a new information-theoretical measure of cognitive function that correlates with fitness [Marstaller et al., 2010, Marstaller et al., 2013b], but is in principle separate from an agent's performance on a task. R measures how much an agent knows about its environment above and beyond its current sensory information.
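As an aside before describing R in more detail: the two topological neuro-correlates above are straightforward to compute with standard graph tooling once the brain's wiring has been flattened into a simple undirected graph (parallel connections counted once, as described above). The following is a minimal sketch; the use of networkx and the function names are our own choices for illustration, not the original implementation.

```python
# Sketch of the topological neuro-correlates: Gamma Index and graph diameter.
# Assumes the Markov Brain has already been reduced to a simple undirected
# graph whose nodes are neurons and whose edges are gate-mediated connections.
import networkx as nx

def gamma_index(graph: nx.Graph) -> float:
    """Extant connections divided by all possible connections."""
    n = graph.number_of_nodes()
    possible = n * (n - 1) / 2
    return graph.number_of_edges() / possible if possible else 0.0

def sparseness(graph: nx.Graph) -> float:
    """The converse of the Gamma Index."""
    return 1.0 - gamma_index(graph)

def graph_diameter(graph: nx.Graph) -> int:
    """Longest of all shortest paths, taken over the largest connected component."""
    if graph.number_of_nodes() == 0:
        return 0
    largest = max(nx.connected_components(graph), key=len)
    return nx.diameter(graph.subgraph(largest))

# Example: a brain whose gates connect neurons 0-1, 1-2, and 2-3.
brain = nx.Graph([(0, 1), (1, 2), (2, 3)])
print(gamma_index(brain), graph_diameter(brain))  # 0.5 and 3
```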
Representations can be thought of as internal models of the world that the agent can use to make decisions in conjunction with sensory signals, or even in the absence of sensory signals. Because of this, R is often identified with "what a brain knows about its world with its eyes closed". We can define R as the information the mental states (described by a random variable M) of an agent have about the environment (described by random variable E), given its sensors (variable S) [Marstaller et al., 2013b]:

R = I(E : M | S) = I(E : M) − I(E : M : S) = I − I(S : E) − I(S : M) .    (3.1)

Here, I is the "multi-information": the amount of information the variables share [Schneidman et al., 2003], which for three variables becomes

I = H(E) + H(M) + H(S) − H(E, M, S) .    (3.2)

In equation (3.1), I(E : M | S) is the shared information between variables E and M given the sensor state S, I(E : M) is the (unconditional) shared information between environment and brain internal states (not including sensors), and I(E : M : S) is the shared information between all three variables (a quantity that may become negative). The measure R thus quantifies the correlation of mental states with world states, while subtracting the correlations coming from the sensors. This notion is not well-defined when using the correlation function, but it is when using information theory. R increases in evolving agents when their tasks require knowledge or internal models of the environment [Marstaller et al., 2013b]. Agents without representations, that is, purely reactive agents, remain below optimal performance. Measuring these representations is simple as long as sensor, brain, and environmental states are directly observable (as they are here), causing only a small computational overhead.

It is important to note that R is not necessarily correlated with fitness. For instance, an agent might have representations about an incoming threat, but may not respond. Alternatively, an agent may make decisions based solely on sensory input, obtaining high fitness without representations. Therefore, representations do not necessarily make a prediction about an agent's performance, even though they are usually correlated. In addition, R is not strictly speaking a neuro-correlate, since it cannot be measured intrinsically. It is crucial that the correlate used does not allow predictions about performance, because otherwise the correlate in itself would be a proxy for fitness, and optimizing a combination would therefore not introduce a new element. R satisfies this condition, so we include this measure.

3.3.1.5 Information Integration

Agents solving a cognitively complex task must integrate information from different sensory inputs to come to a single decision. Sensory inputs must be compared to one another in the light of past experiences. The information-theoretic measure Φ is one way of quantifying a brain's information-integration capability [Tononi, 2008, Balduzzi and Tononi, 2008, Balduzzi and Tononi, 2009, Edlund et al., 2011, Albantakis et al., 2014]. Unfortunately, computing Φ is computationally intensive, so much so that it is cumbersome even on modern high-performance computers to calculate Φ exactly for brains of 16 neurons (for every agent at every update of an evolving population), and essentially infeasible for brains with more than 24 neurons. (Because the computational complexity of Φ scales super-exponentially, calculating Φ for the brain of the lowly nematode C. elegans, with 302 neurons, would require evaluating roughly 4.8 × 10^457 partitions of the network, an absurd task.)
Here we use the much more practical Φatomic, which is a very good approximation of Φ at a much reduced computational cost [Edlund et al., 2011] (Φatomic has also been defined as "synergistic information" [Barrett AB, 2011, Ay, 2015]). Specifically, Φatomic is the integrated information calculated for one specific partition of the network (whereas the standard Φ optimizes over partitions). The atomic partition is the one in which each node is its own part; that is, the atomic partition segments the network into individual nodes that are not part of any larger group.

To define Φatomic, we first define the information that is processed (in time) by the entire system. Let us describe the network's state using the joint random variable X = X^(1) X^(2) ... X^(n), where the X^(i) represent the elements (nodes) of the system, and X changes as time t progresses. Each variable X_t is defined by a probability distribution p(x_t) to find variable X_t in state x_t. Each node i progresses in time as X_t^(i) → X_{t+1}^(i), and each X_t^(i) is described by a probability distribution p(x_t^(i)). The information processed by the system from time step t to t + 1 is then given by

I(X_t : X_{t+1}) = Σ_{x_t, x_{t+1}} p(x_t, x_{t+1}) log [ p(x_t, x_{t+1}) / ( p(x_t) p(x_{t+1}) ) ] .    (3.3)

The measure Φatomic then quantifies how much of the information processed by the system cannot be explained by the sum of the information processed by each individual computational unit. Thus, in a sense Φatomic quantifies how much processing is "more than the sum of its parts", where the parts are defined by the individual neurons:

Φatomic = I(X_t : X_{t+1}) − Σ_{i=1}^{n} I(X_t^(i) : X_{t+1}^(i)) + I .    (3.4)

Here, I(X_t^(i) : X_{t+1}^(i)) is the information processed by the ith neuron, and I (called "integration" or "multi-information" in other work) measures the nonindependence between the network variables [Schneidman et al., 2003, McGill, 1954, Tononi et al., 1994, Lungarella et al., 2005, Lungarella and Sporns, 2006] and is defined as

I = Σ_{i=1}^{n} H(X_t^(i)) − H(X_t) .    (3.5)

With both the temporal term (system information unexplained by the sum over its parts) and the spatial integration measure of nonindependence, Φatomic represents both temporal and spatial network synergies. The calculation of Φatomic then simplifies to

Φatomic = Σ_{i=1}^{n} H(X_t^(i) | X_{t+1}^(i)) − H(X_t | X_{t+1}) .    (3.6)

As with the previous neuro-correlates, the act of integrating information does not imply that there will be an action upon such integrations. However, selecting agents with a higher Φatomic over others with the same performance guarantees the preservation of potentially beneficial mutations. In addition, we know that Φatomic is a limiting factor in the evolution of cognitive abilities: to perform a given task the agent requires a minimal amount of Φatomic [Joshi et al., 2013], and a better performance necessitates a higher minimal amount of Φatomic.

3.3.1.6 Predictive Information

Predictive information (PI) can be measured in several ways. It is the one-step mutual information a system has between time t and t + 1. MB animats have sensors and actuators, and PI can be measured as a one-step mutual information between the sensors and future sensors, the actuators and future actuators, the sensors and future actuators, or the actuators and future sensors.
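In practice, R, Φatomic, and the predictive-information variants specified next all reduce to entropies of empirical state distributions collected while the agent behaves. The following is a minimal sketch of naive plug-in estimators, assuming that environment, memory, sensor, and whole-network states have been logged as hashable tuples at every update; the function names are ours and estimation subtleties (bias, sample size) are ignored.

```python
# Naive plug-in estimators for the information-theoretic neuro-correlates.
# Assumes states were logged at every update as hashable tuples (e.g. bit tuples).
from collections import Counter
from math import log2

def entropy(samples):
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def mutual_info(xs, ys):
    """I(X:Y) = H(X) + H(Y) - H(X,Y); e.g. PI_ss = mutual_info(S[:-1], S[1:])."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def representation_R(env, mem, sens):
    """Eq. 3.1 rewritten as R = H(E,S) + H(M,S) - H(S) - H(E,M,S)."""
    return (entropy(list(zip(env, sens))) + entropy(list(zip(mem, sens)))
            - entropy(sens) - entropy(list(zip(env, mem, sens))))

def phi_atomic(states):
    """Eq. 3.6: sum_i H(X_t^(i)|X_t+1^(i)) - H(X_t|X_t+1) over a state series."""
    before, after = states[:-1], states[1:]
    whole = entropy(list(zip(before, after))) - entropy(after)   # H(X_t | X_t+1)
    parts = 0.0
    for i in range(len(states[0])):
        b = [s[i] for s in before]
        a = [s[i] for s in after]
        parts += entropy(list(zip(b, a))) - entropy(a)            # per-node term
    return parts - whole

# Example: four logged 3-node brain states.
print(phi_atomic([(0, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 1)]))
```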
Here we measure the PI between sensors and future sensors (PI_ss), and between sensors and future actuators (PI_sa):

PI_ss = I(S_t : S_{t+1})
PI_sa = I(S_t : A_{t+1})    (3.7)

where S_t represents the sensor values at time t and A_t represents the actuator values at time t. An organism solving a physical task will move through the environment such that this information is increased; we typically do not look around randomly, but in a predictable manner. It has been shown that increasing PI_ss can be advantageous for creating meaningful behavior in agents [Ay et al., 2008]. Alternatively, predictive information can be understood as the information the sensors have about the actuators after the brain has processed the sensor information. For a purely reactive agent, increasing this PI_sa (predictive information from sensor to motor) would be advantageous, because the actions of the agent become more appropriate to the action required. At the same time, if agents need to become less reactive but more dependent on their internal states, PI_sa should decrease after adaptation (as shown in [Edlund et al., 2011]).

Figure 3.1: A: In the simulation, large or small blocks fall diagonally toward the bottom row of a 20 × 20 discrete world, with the agent on the bottom row. For the purpose of illustrating the task, a large brick is falling to the left, while a small brick is falling to the right. In simulations, only one block is falling at a time, and any one brick can fall either only to the left or only to the right. The agent is rewarded for catching small blocks and punished for catching large blocks. B: A depiction of the agent's neurons (bottom left: triangles depict sensors, circles illustrate brain (internal) neurons, trapezoids denote actuators) and the sequence of activity patterns on the agent's 4-bit retina (right), as a large brick falls to the right. Reproduced from [Marstaller et al., 2013b], with permission.

3.3.2 Complex Environments

We investigate the effect that rewarding neuro-correlates has on adaptation in two different environments. The first is a temporal-spatial integration task where an agent must catch or avoid blocks that fall towards it (see Fig:3.1). The task is an adaptation of the "active categorical perception task" studied earlier [Beer, 1996, Beer, 2003, van Dartel et al., 2005], and requires a comparison between past and present sensor inputs to make inferences about future optimal behavior. While the task is seemingly simple, the sensor modalities (embodiment) of the agent are limited in such a way that this becomes a complex problem for an agent to solve [Marstaller et al., 2013b].

The second environment we use to optimize the agents is the generation of (pseudo) random numbers, and does not require embodiment of the brain. This task does not require any input, but the agent must produce a sequence of zeroes and ones with high entropy. This task is also used to assess cognitive abilities in humans. It is known that autism [Rinehart et al., 2006], schizophrenia [Zlotowski and Bakan, 1963], as well as different forms of Alzheimer's disease can be diagnosed by analyzing a sequence of symbols generated by a human subject who was asked to produce a sequence that is as unpredictable as possible [Rinehart et al., 2006, Wagenaar, 1972, Williams et al., 2002, Brugger et al., 1996].
This complex task involves memory [Baddeley, 1998], processing [Jahanshahi et al., 2006], and the ability to sequentially process information [Brugger et al., 1996], components that are also involved in the algorithmic generation of pseudo-random numbers. It is unclear whether an evolved Markov Brain random number generator resembles a computer algorithm or the cognitive abilities found in humans. Nevertheless, this task clearly qualifies as a complex problem requiring many components to work together, while at the same time it is not another example of an embodied agent. The randomness of the produced sequence is measured by its compressibility using the Lempel-Ziv-Welch (LZW) algorithm [Welch, 1984].

One can think of many other complex environments for which this method of GA augmentation might be suitable, such as navigation tasks [Edlund et al., 2011], classification and perception tasks [Chapman et al., 2013], or tasks that require an understanding of group behavior [Olson et al., 2013a, Hintze and Miromeni, 2014]. As long as neuro-correlates are measurable and doing so does not impose too high a computational overhead, this augmentation should be applicable. However, different environments could benefit differently from the neuro-correlates used. For example, a one-layered perceptron in a classification task might not require internal representations; in such cases the representation measure R might become useless. Alternatively, Φatomic might become meaningless in a task that does not require the integration of information.

3.4 Methods

We performed evolutionary experiments to test how neuro-correlates affect the performance of a GA. (Our source code is available at https://gitlab.msu.edu/jory/entropy-2015-neurocorrelates). In each block-catching experiment the performance of an agent was assessed by measuring the number of blocks correctly caught and correctly avoided in 80 trials. In the random-number-generation (RNG) experiment, the agents were given one bit that at each step changed from 0 to 1 and back for 500 updates. The output of the network over those 500 updates was collected and compressed. The number of symbols written into this compressed string was used as a proxy for the maximum entropy of that string. Highly regular or constant strings result in very short sequences after compression, while strings with higher entropy cannot be compressed as easily and result in longer strings. This environment has no particular state the world can be in; thus measuring R is meaningless in this context.

Once performance is assessed, fitness can be calculated, upon which the GA's selection mechanism can operate. It is standard in this domain to use an exponential of performance (1.1^performance) as a fitness measure to encourage selection of more complex behavior, similar to [Lenski et al., 2003b]. Using fitness alone is our control: the non-augmented case. We also explored the use of eight neuro-correlate measures for augmenting performance of the GA: minimum-description-length (MDL), diameter of the brain network (GD), amount of information integration (Φatomic), amount of representation about the environment (R), Gamma Index (connectedness), sparseness, and two variations of predictive information: sensors at t to sensors at t + 1, and sensors at t to actuators at t + 1 (PI_ss and PI_sa). Augmenting performance by a neuro-correlate is performed by multiplying the normalized neuro-correlate with the exponential performance of the agent.
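A minimal sketch of this augmented fitness calculation (the general form is given in eq. (3.8) below), together with the LZW dictionary-size proxy used as the performance measure in the RNG task, might look as follows. The function names and the exact dictionary-counting convention are illustrative assumptions rather than the original implementation.

```python
# Hypothetical sketch of fitness augmentation and of the LZW dictionary-size
# proxy used for the RNG task; details are simplified for illustration.

def augmented_fitness(performance, correlates):
    """correlates: list of (value, theoretical_max) pairs, one per neuro-correlate."""
    fitness = 1.1 ** performance
    for value, max_value in correlates:
        fitness *= value / max_value + 1.0
    return fitness

def lzw_dictionary_size(bits):
    """Number of dictionary entries created while LZW-compressing a bit string."""
    dictionary = {"0": 0, "1": 1}
    current = ""
    for b in bits:
        symbol = str(b)
        if current + symbol in dictionary:
            current += symbol
        else:
            dictionary[current + symbol] = len(dictionary)
            current = symbol
    return len(dictionary) - 2  # entries added beyond the two initial symbols

# Example: block-catching performance of 70/80 augmented with a Gamma Index of
# 0.2 (theoretical maximum 1.0), and the randomness proxy for a short output.
print(augmented_fitness(70, [(0.2, 1.0)]))
print(lzw_dictionary_size([0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0]))
```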
Each evolutionary experiment is repeated 128 times and agents are allowed to adapt over 10,000 generations (all evolutionary parameters are identical to [Marstaller et al., 2013b], except that the duplication and deletion rates are both 0.05). The population begins and ends at size 100 and the mutation rate is 0.005. At the end of each experiment the line of descent (LOD) is reconstructed [Lenski et al., 2003b] and the results on the LOD are averaged over all replicates. The general form for fitness calculation used in this work is

ω = 1.1^α · Π_{i=1}^{N} ( κ_i / κ_i^max + 1 )    (3.8)

where N is the number of neuro-correlates used. These experiments used N = 1, but the general form allows any number of neuro-correlates to augment performance together. α is the measure of performance, κ_i the measure of the ith neuro-correlate, and κ_i^max the maximum theoretical value for that neuro-correlate. While PI has previously been used to augment selection, with minor success, in the context of lifetime learning in embodied artificial intelligence [Zahedi et al., 2013], where it was combined with performance through a linear additive function [Zahedi et al., 2013], here we use a non-linear multiplicative function that emphasizes the effect of both beneficial and deleterious mutations.

We also measure the distribution of fitness and neuro-correlates at the end of evolutionary adaptation. The violin plots we use aggregate the replicate experiments and visualize the distribution. The final population contains genetic variation not yet culled by selection. To reduce such variation, we take from each experiment the organism on the line of descent three generations prior to the final generation.

3.5 Results and Discussion

We see that three of the eight proposed neuro-correlates improve adaptation when used to augment the GA in the block-catching task. The agent populations not only adapt faster, but also evolve to a higher average performance after 10,000 generations (see Fig:3.2 left). These neuro-correlates are Φatomic and graph diameter (p < 0.05) and Gamma Index (p < 0.0001), where p values were calculated using the Mann-Whitney U test. PI_ss (p < 0.01) and sparseness (p < 0.05) produced significantly worse results when used as augmenting neuro-correlates. Because Φatomic, graph diameter, and Gamma Index all have a positive effect on evolution, it seems additional selection pressure for neural architecture and material drastically helps these populations of agents evolve to solve their task. It is possible that sparseness has a negative effect because Markov Brains start with relatively small neural architectures and must evolutionarily grow to match the problem.

The results are very similar for the RNG task (see Fig:3.2 right). The significance of the effect is roughly the same: Φatomic, graph diameter, and Gamma Index significantly affect evolution positively when used as fitness-augmenting neuro-correlates (p < 0.01). R cannot be used in the context of the RNG, and is therefore not shown. Predictive information is maximized in cases where reactive behavior is rewarded. In tasks that require memory, maximizing predictive information can be detrimental (and is not the best predictor of fitness, see [Edlund et al., 2011]). It is possible that a predictive information with a larger time delay, or a conditional predictive information such as I(S_t : A_t | A_{t−1}) [Rivoire and Leibler, 2011], could produce better results. We plan on exploring those in the future.
Multi-parameter optimization (MPO) can often solve many problems related to multidimensional complex fitness landscapes, but can suffer from a number of problems, all well described in the multi-parameter optimization literature (for an overview see [Deb, 2014, Knowles et al., 2001]). In most of these problem cases, the parameters to be optimized work against each other in the form of trade-offs (one parameter can only be optimized at the cost of another). We observe this effect with Gamma Index and sparseness, depending on the task being evolved, while all other neuro-correlates work synergistically with performance. See supplementary materials Fig:1-3 for evolutionary history interactions between neuro-correlates. We find that some neuro-correlates affect performance or each other antagonistically. While in our experiments this trade-off reduces the final performance of the evolved agents, it could be overcome using MPO. An objective that is antagonistic using our fitness function could be beneficial in MPO, which should be explored in the future.

Figure 3.2: A: Average fitness over 10,000 generations (over 128 replicates) for the block-catching task. B: Average fitness over 10,000 generations (over 128 replicates) for generating random numbers. Shown are the control using only performance in the GA and the effect on task performance of augmenting the fitness function (Eq:3.8) with Φatomic, R, graph diameter, minimum-description-length, Gamma Index, sparseness, PI_ss, and PI_sa.

Figure 3.3: Absolute number of perfectly performing block-catching agents after 10,000 generations evolved under the five different experimental conditions shown. * indicates significance using the Chi-Squared test relative to Task, i.e., no neuro-correlate augmentation (p < 0.05).

3.5.1 Augmented selection and its effect on other neuro-correlates

Using a neuro-correlate to augment a genetic algorithm can shorten the runtime requirements and may improve the overall performance of the evolved population, depending on the neuro-correlate and objective. One might ask how augmenting selection using one neuro-correlate affects the other correlates not under direct selection. Intuitively one would expect that selection for a particular neuro-correlate, in conjunction with performance, should increase not only performance (as discussed above) but also the measure itself. Similarly, since neither of the predictive information measures augments selection, we expect no increase in predictive information when using them in conjunction with performance. However, we find PI_sa to increase when selecting for it together with performance in the RNG environment. In addition, we find some cases of one neuro-correlate driving the evolution of another; the most prominent example is Gamma Index driving the evolution of nearly all other neuro-correlates, and vice versa (see Fig:3.4). This further supports the idea that using neuro-correlates does not necessarily create negative synergy, as discussed above concerning multi-parameter optimization. In the RNG environment R cannot be used because there is no environment about which the agent could build representations.
For the other neuro-correlates we observe the same trend in the RNG task as in the block-catching task (see Figure:3.5). Selecting for a particular neuro-correlate increases its measure over evolutionary time, more so than the increase found without explicit selection. The exceptions to this are Gamma Index, sparseness, and Φatomic. All other neuro-correlates seem to have no additional effect on each other.

Figure 3.4: Comparison of different optimization methods at the end of evolution of the block-catching task. The top row shows the performance distribution (gray) and average performance given the different selection regimes (MDL abbreviates minimum-description-length, which is the potential brain size based on the length of the encoding genome; GI abbreviates Gamma Index; GD graph diameter; and PI_ss and PI_sa are predictive information sensor t to sensor t + 1 and sensor t to actuator t + 1, respectively). Subsequent rows show the effect of the different selection regimes (x axis) on the respective neuro-correlates.

Figure 3.5: Comparison of different optimization methods on each other at the end of evolution of the random number generation task. The top row shows the performance distribution (gray) and average performance given the different selection regimes (abbreviations as in Figure 3.4). Subsequent rows show the effect of the different selection regimes (x axis) on the respective neuro-correlates.

3.5.2 Neuro-correlate interactions

We showed that selection can be augmented by using specific neuro-correlates, while others do not help. Is that because selecting for the neuro-correlates themselves already provides an advantage for performance? One might think that, for example, maximizing knowledge about the world (R) requires the agent to move in such a fashion that performance improves automatically. Therefore we repeated the above-described experiment, but instead of augmenting performance with a neuro-correlate, this time selection was performed on the neuro-correlates alone. We find that none of the neuro-correlates affect performance substantially in the context of the RNG task (see the top rows in Figure:3.6), except for Φatomic.

Figure 3.6: Effect on different neuro-correlates and performance when selecting only on a single neuro-correlate. The left panel shows evolution in the block-catching environment, and the right panel shows the RNG task.
The effect of Φatomic on the generation of random numbers is measurable, but very low. We assume that, in order to generate non-zero Φatomic, information must flow through the system. Because there is no control function, the "information" is literally random (entropy), which is what the RNG environment seems to be selecting for. As expected, selecting for a single neuro-correlate increases its value in both environments (see the diagonal for both environments in Figure:3.6). However, we also find that many neuro-correlates affect each other positively and negatively, and the effect is similar in both environments. Some of these interactions are very intuitive. All measures that benefit from more connections, for example, cause the minimum-description-length to increase, whereas sparseness causes the minimum-description-length to shrink. Similarly, Φatomic and R have at least some positive effect on each other, and we conjecture that having the ability to form representations, even though they might not be used to improve performance, still requires the ability to integrate information. However, PI_ss positively affects PI_sa in the block-catching environment and has no effect in the RNG environment, while PI_sa has a positive effect on PI_ss in both environments (compare the bottom right of each panel in Figure:3.6). To our knowledge the relation between the two predictive information measures has not been studied, and we are unable to provide any additional insight into this phenomenon.

3.6 Conclusions

We have tested whether eight different network- and information-theoretic neuro-correlates can be used to improve the performance of a simple genetic algorithm. We found that Φatomic, graph diameter, and density (Gamma Index) each statistically significantly improve the performance of a GA in the two environments tested. Sparseness does not improve performance as much as density does, suggesting sparseness is not generally beneficial for GAs in this domain. Thus, sparseness should only be used if its application has been shown to be beneficial for the problem domain in question. The two predictive information measures (PI_ss and PI_sa) had a negative effect in both environments on finding optimal performers (see Figure:3.3); thus at least PI_sa, for which this effect is statistically significant, should not be used to augment selection in this domain. Gamma Index was significantly the most reliable fitness-augmenting neuro-correlate in the block-catching task for producing perfectly performing agents (see Figure:3.3).

Because the value of each neuro-correlate is simply multiplied by performance, the computational overhead is bounded by the complexity of each measure. Typically, the R and Φatomic measures are computationally intensive and must be repeated for every agent in the population. This is a significant overhead, especially for increases in agent lifetime or brain size. However, this study shows the graph diameter and Gamma Index measures to be computationally inexpensive and thus preferable, with preference between the two for graph diameter. While the GA used here benefited from augmenting selection with neuro-correlates, we studied only two environments, and it is likely that other environments might respond differently. Another possible extension of this work is to investigate other neuro-correlates, perhaps more topologically based ones, or other algorithms such as novelty search [Lehman and Stanley, 2008].
3.7 Acknowledgments

This work was supported in part by the National Science Foundation's BEACON Center for the Study of Evolution in Action under Cooperative Agreement DBI-0939454. We wish to acknowledge the support of the Michigan State University High Performance Computing Center and the Institute for Cyber Enabled Research (iCER).

Chapter 4

The Role of Conditional Independence in the Evolution of Intelligent Systems

4.1 Abstract

Systems are typically made from simple components regardless of their complexity. While the function of each part is easily understood, higher-order functions are emergent properties and are notoriously difficult to explain. In networked systems, both digital and biological, each component receives inputs, performs a simple computation, and creates an output. When these components have multiple outputs, we intuitively assume that the outputs are causally dependent on the inputs but are themselves independent of each other given the state of their shared input [Pearl, 2014]. However, this intuition can be violated for components with probabilistic logic, as these typically cannot be decomposed into separate logic gates with one output each. This violation of conditional independence is equivalent to instantaneous interaction: some information shared between the outputs does not come from the inputs and thus must have been created instantaneously. Here we compare evolved artificial neural systems with and without instantaneous interaction across several task environments. We show that systems without instantaneous interactions evolve faster, reach higher final levels of performance, and require fewer logic components to create a densely connected cognitive machinery.

4.2 Introduction

Evolvable Markov Brains are networks of deterministic and probabilistic logic gates whose function and connectivity are genetically encoded. They are a useful model to study the evolution of behavior [Olson et al., 2013a], cognitive properties [Marstaller et al., 2013a], and neural-network complexity [Edlund et al., 2011, Albantakis et al., 2014], and can also be used as classifiers [Chapman et al., 2013]. At each generation of evolution within a particular task environment, networks are selected based on their fitness and the populations adapt through random genomic mutations. The genome is sequentially processed, with specific sites indicating the start of a gene. An individual gene encodes one Hidden Markov Gate (HMG), which specifies connections between network elements and also determines input-output logic [Marstaller et al., 2013a]. These HMGs are generalized logic gates that encompass conventional logic gates such as XOR or NAND, whose logic table is typically a static mapping of two inputs to a single output (Figure 4.1A), but allow for more than the typical two-in-one-out format and can use a probabilistic mapping between input and output states. Here, HMGs could receive up to four inputs mapped to maximally four outputs. In this way, each gene may encode an entire logic module, as opposed to only a single logic function. Depending on the environment, deterministic or probabilistic HMGs may provide a mutually exclusive advantage for evolution. Apart from introducing randomness into the Markov Brains, probabilistic HMGs also differ from deterministic HMGs in the way they can be represented by a collection of simpler logic gates.
The outputs of a deterministic HMG are necessarily conditionally independent of each other: given the input state, an output is either on ('1') or off ('0') with probability P = 1.0. Information about the state of other outputs is irrelevant. As a consequence, a deterministic HMG can always be decomposed into several logic gates with one output each. For example, a two-in-two-out deterministic HMG (see Figure 4.1 panel B) can easily be decomposed into two independent two-in-one-out gates (see Figure 4.1 panel A). This decomposition works similarly for larger gates with more inputs and more outputs, requiring one logic gate per output.

Probabilistic HMGs, on the other hand, are not generally decomposable into separate logic functions for each output. In the case of a two-in-two-out probabilistic HMG, a probability table (for a detailed explanation see [Marstaller et al., 2013a]) maps all four possible input states to all four possible output states. Let us for simplification purposes consider just the case where both inputs for such a gate are 0. Then, four probabilities (P00, P01, P10, P11) determine the probability for each of the four possible output states to occur given input state 00. P00 defines the probability that both outputs are 0, P11 the probability that both outputs are 1, and so forth. Observe that all four probabilities must sum to 1.0:

P00 + P01 + P10 + P11 = 1.0    (4.1)

Except for the above requirement, the individual probabilities P00 to P11 evolve independently for default probabilistic HMGs in Markov Brains. Let us now try to use two probabilistic two-in-one-out gates (see Figure 4.1 panel C) to achieve the same functionality as a two-in-two-out HMG. Pa = P(A = 1|I) denotes the probability of the first gate ('A') to have an output of 1 for a given input state I. Consequently, the probability for A to have an output of 0 is 1.0 − Pa. Similarly, for the second gate ('B'), Pb = P(B = 1|I). The joint input-output function of the two individual gates A and B is the following:

P00 = (1.0 − Pa)(1.0 − Pb)    (4.2)
P01 = (1.0 − Pa) Pb    (4.3)
P10 = Pa (1.0 − Pb)    (4.4)
P11 = Pa Pb    (4.5)

Given eqs. 4.2-4.5, the summation rule (eq. 4.1) is met. In addition, the following dependency between the probabilities holds:

P00 P11 = P01 P10    (4.6)

It is easy to see that probabilistic HMGs with independently evolved probabilities P00 to P11 may violate equation 4.6. As a result, probabilistic HMGs typically cannot be decomposed. Decomposition of a probabilistic two-in-two-out HMG is only possible if

P(out1 | I, out2) = P(out1 | I)    (4.7)
P(out2 | I, out1) = P(out2 | I)    (4.8)

for any input state I, which means that the two outputs must be conditionally independent of each other given all possible I. Under these conditions, the HMG's probabilities can be expressed according to eqs. 4.2-4.5, with Pa = P(out1 = 1|I), the marginal probability of out1 = 1 given I, and Pb = P(out2 = 1|I), the marginal probability of out2 = 1 given I. The same principle can be applied to HMGs with multiple conditionally independent outputs. An example probability distribution for P00 to P11 that violates conditional independence is PD = P(out1, out2 | I) = (0, 0.5, 0.5, 0), whereas PD* = (0.25, 0.25, 0.25, 0.25) conforms with eqs. 4.1-4.8. Note that the marginal probabilities P(out1 = 1|I) = P(out2 = 1|I) = 0.5 are the same in both cases. In fact, there are infinitely many probability distributions with the same marginal probabilities, but only PD* fulfills conditional independence.
By contrast, PD contains the additional constraint that P(out1 = 1 | I, out2 = 1) = 0 and vice versa (cf. [James and Crutchfield, 2016] for more intricate examples of hidden dependencies in probability distributions). Making the temporal order explicit, the probabilities in PD* only depend on the input I to the HMG at timestep t − 1, before the update. PD, however, also requires instantaneous interaction between the outputs at time t. This example demonstrates that, in Markov Brains with probabilistic HMGs, the output state of an element at time t may depend on information that is not available at t − 1. Such instantaneous interactions have implications with respect to the causal structure of these Markov Brains, as they violate the postulate that causes must precede their effects. In addition, instantaneous interactions defy the notion of elementary causal components. Without conditional independence, the interactions between elements in a probabilistic Markov Brain cannot be represented as a Bayesian network, or directed acyclic graph (DAG) [Pearl, 2014]. This prohibits analyzing the causal composition of these Markov Brains, which means, for example, that the theoretical framework of integrated information theory (IIT) [Oizumi et al., 2014, Albantakis and Tononi, 2015], which assesses how sets of elements within a system causally constrain each other, cannot be applied to these Markov Brains. In short, while Markov Brains with general, probabilistic HMGs may be useful tools for artificial evolution experiments, the networks of elements they encode cannot generally be interpreted as networks of causally interacting components.

While instantaneous interaction may be a curious phenomenon in evolvable Markov Brains, the question remains whether the potential gain in computational power through such instantaneous interactions has any effect on the evolution and functionality of Markov Brains. To explore this question, we implemented a decomposable version of the evolvable probabilistic HMGs (up to four-in-four-out) with conditionally independent outputs {out1, . . . , outN}, such that

P(out_i | I, {out_{1:N \ i}}) = P(out_i | I)    (4.9)

for all input states I and all outputs {out1, . . . , outN}, where N is the number of outputs. These decomposable HMGs comply with an extended version of eqs. 4.2-4.5:

P(out_{1:N} = O | I) = Π_{i=1:N} P(out_i = O_i | I)    (4.10)

for all output states O and all input states I, and thus guarantee Markov Brains with causally interpretable neural networks. In the following, we will compare the evolution of Markov Brains generated by probabilistic HMGs against Markov Brains generated by decomposable probabilistic HMGs in several task environments. We will show that instantaneous interaction hampers evolutionary adaptation. At the same time, we show that systems evolved from components without instantaneous interaction adapt their cognitive machinery. Contrary to what might be expected given the potential gain in computational power through instantaneous interactions, Markov Brains with decomposable HMGs required fewer gates to create a more densely connected network with better task performance. We conjecture that instantaneous interactions provide no computational advantage for agents evolving in the tested sensory-motor task environments.

4.3 Results

Probabilistic HMGs in Markov Brains can independently evolve the probabilities that determine their outputs, which can lead to instantaneous interactions, as outlined above.
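The distinction between the two kinds of probability tables can be made concrete with a small sketch. The encoding details below (loci as floats in [0, 1], row-wise construction for a single input state, and the function names) are simplifying assumptions for illustration, not the actual genome encoding: decomposable rows are built from marginals via eq. 4.10, and decomposability of a given row can be checked with eq. 4.6.

```python
# Sketch of conventional vs. decomposable probabilistic two-in-two-out rows.
from itertools import product

def conventional_row(loci):
    """Four freely evolved loci normalized to sum to 1; such a row may violate
    conditional independence and therefore allow instantaneous interaction."""
    total = sum(loci)
    return [x / total for x in loci]

def decomposable_row(p_a, p_b):
    """Row built from two marginals via eq. 4.10: P(o1,o2|I) = P(o1|I)P(o2|I)."""
    return [pa * pb for pa, pb in product([1 - p_a, p_a], [1 - p_b, p_b])]

def is_decomposable(row, tol=1e-9):
    """Eq. 4.6 for a two-output row: P00*P11 must equal P01*P10."""
    p00, p01, p10, p11 = row
    return abs(p00 * p11 - p01 * p10) < tol

print(is_decomposable(conventional_row([0.0, 0.5, 0.5, 0.0])))  # False: PD above
print(is_decomposable(decomposable_row(0.5, 0.5)))              # True: PD* above
```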
Here, we thus introduce another type of probabilistic HMG that we call a decomposable HMG. These gates encode their size and connectivity in the same way as probabilistic HMGs. Nevertheless, the probability tables of decomposable HMGs are restricted according to eqs. 4.9 and 4.10, and thus must be encoded in a different way than those of general probabilistic HMGs. Probabilities are encoded as one number per locus. In the non-decomposable HMG these probabilities are transcribed directly from the genome to fill the probability table, after which they are appropriately normalized given the table dimensions. For decomposable HMGs we ensure decomposability by transcribing only the marginal probabilities (eq. 4.9). The probability matrix is then created from these marginal probabilities P(out_i | I) by using eq. 4.10. This allows us to evolve Markov Brains with either or both types of gates, conventional probabilistic and decomposable, or in other words those that have instantaneous interaction and those that do not. The difference in rate of adaptation, and differences in the solutions evolved, will highlight the effect of instantaneous interaction on evolution.

Figure 4.1: Examples of gate decomposition into combinations of simpler gates. Panel A shows two deterministic logic gates whose inputs are cross-wired so that both gates receive the same inputs; the tables below show their probabilities to output 0 or 1, respectively. These probabilistic-logic boundary cases are effectively deterministic logic gates. Panel B shows a two-in-two-out logic gate that is functionally identical to the two gates depicted in panel A. Panel C shows two probabilistic logic gates connected similarly to the deterministic gates from panel A; the logic tables below show only the case where both inputs are 0. The lower table shows both probabilistic logic gates replaced by a single two-in-two-out probabilistic logic gate (similar to panel B) and how the new probabilities for that gate are constructed from the individual probabilities of both gates.

Differences between Markov Brains with and without instantaneous interactions may depend on the environments in which the virtual organisms were evolved. Some environments might benefit from instantaneous interactions, while others might favor conditional independence. For this reason, three different environments were tested:

Temporal Spatial Integration
In this environment [Marstaller et al., 2013a, Albantakis et al., 2014] the agent can move laterally left and right (one binary effector each) and is equipped with a set of upwards-facing sensors, two on the left and two on the right (one binary sensor each) with a gap of 2 block subunits between them, and 8 hidden binary elements for storing information. Small and large blocks fall toward the agent one at a time, and activate the sensors of the agent when above them, regardless of distance.
The blocks fall in different directions, and the block sizes and sensors are arranged in such a way that the blocks need to be observed over several updates in order to be distinguished. Small blocks must be caught while large blocks must be avoided.

Foraging and Spatial Reasoning
Agents are placed at a designated home area in a two-dimensional environment and must first discover and then obtain food. Once they have obtained food, they must move it to the home location, after which more food must be collected. Early in an agent's life food appears nearby, but the circular perimeter onto which food is randomly placed increases in diameter with each successful collection, moving the food successively farther away from home. The sensors for this agent provide a very coarse representation of the environment and are only accurate for nearby objects. Inputs include sensor signals for the food and home locations (on, facing, and near) as well as the angle to the food, with perception of angle discretized to 45 degrees. Consequently, food and home locations are not reliably observable, and the agent must navigate heuristically in the absence of this sensory information. Outputs include turning left or right and moving forward.

Associative Memory
This two-dimensional environment [Grabowski et al., 2010, Grabowski and Magana, 2014] presents the agent with a path of rewards, surrounded by a field of poison. The agent receives 4 sensor inputs about whether its current location is on the path, on poison, or on one of the cues. In addition, it receives cues about upcoming turns in the path. These cues, their turn associations, and the path itself are all randomly generated with each visit to the environment. The agent may affect 2 outputs that encode 4 actions: do nothing, turn left one unit, turn right one unit, and move forward one unit. In addition, the agent has 8 hidden binary elements for memory. The agent must explore the environment and learn which of the two symbols indicates right and which one indicates left. The agent must then remember those associations and use that knowledge to navigate the path properly.

For each experimental condition, we evolved 120 independent populations of 100 organisms until their performance plateaued. A plateau was reached near generation 10,000 in the temporal spatial integration environment, generation 5,000 in the associative learning environment, and generation 3,000 in the foraging environment. After evolution the line of descent was reconstructed [Lenski et al., 2003a], that is, the path of inheritance from a random organism in the final population to its ancestor in the initial population. All sweeping mutations are observable in the line of descent. We find that in the foraging and temporal spatial integration environments agents restricted to decomposable gates adapt much more quickly and also achieve a better final performance (see Figure 4.2).

The associative memory environment has a tunable punishment parameter for the cost of wandering off the correct path. The relative difference between the reward for following the path and the punishment for straying from the path greatly influences the evolvability of successful strategies. If wandering off the path is very costly, then the agent is severely limited in its freedom to make mistakes and explore if it is to maintain its status as a viable organism. This harsh limitation on mistake-making hampers evolution.
We find that when punishment for path deviation is relatively small or zero, decomposable HMGs provide a clear evolutionary advantage. When punishment is high, agents restricted to conventional probabilistic HMGs fail to evolve at all, while agents restricted to decomposable HMGs still evolve functional Markov Brains (see Figure 4.3).

Figure 4.2: Average fitness of organisms on the line of descent in the temporal spatial integration (A) and foraging (B) environments. The solid line represents the average performance of agents restricted to conventional probabilistic HMGs (with instantaneous interactions); the dashed line represents the average performance of agents restricted to decomposable HMGs (without instantaneous interactions). The y axis is normalized to show the fraction of maximally attainable fitness in each environment. The gray shadow indicates the bootstrapped 95% confidence interval of the mean.

Figure 4.3: Average performance of agents evolved in the association environment with increasing levels of error punishment. Performance of agents evolved while restricted to decomposable HMGs is shown as a dashed line; performance of agents evolved while restricted to conventional probabilistic HMGs is shown as a solid line. The gray shadow indicates the bootstrapped 95% confidence interval of the mean. Panel A shows the results for agents receiving zero punishment for path-following errors. Panel B shows the results for a punishment of 0.05, meaning 0.05 was subtracted from the cumulative agent score every time it wandered off the path. Panel C shows the results for agents evolved with a punishment of 0.1, again subtracted from their score when wandering off the path. The y axis was normalized to show relative performance, with 1.0 being the maximally attainable fitness in each environment assuming ideal behavior.

As shown in Figure 4.4, it is not only the rate of adaptation that is higher for decomposable gates, but also the mean final achieved performance in all three environments, even though the extrema and distributions vary greatly by environment.

In the above simulations, Markov Brains were restricted to one type of HMG, either probabilistic or decomposable, and were seeded with 15 gates at the start of evolution. In a second set of experiments we allowed Markov Brains to evolve both types of gates in order to assess whether there is preferential selection of one gate type over the other. If either of the gate types confers more of an advantage, it should be selected more often than the other gate type. The null hypothesis is that both gates confer the same benefit and would be under equal selection. Although it took much more computational power to simulate these populations for longer, we also wanted to investigate whether there might be oscillations in preferential selection for gate type. In all tested environments we find that decomposable HMGs are used more often than conventional probabilistic HMGs (see Figure 4.5). To exclude any variation resulting from the initial
evolvability differences between the gates, agents are seeded with 15 HMGs of each type. We find that selection quickly reduces these numbers, but decomposable HMGs are kept more often than probabilistic HMGs. Observe that different tasks require brains of different sizes, but the dominance of the decomposable gate type is independent of this phenomenon (see Figure 4.5). For these competitions, 50 independent populations of 100 agents each were evolved in each environment.

Figure 4.4: Distribution of performances at the end of evolution for each of two conditions in each of three environments. At the end of evolution, the 200th-from-last agent of each independent line of descent is collected to create these distributions. This near-the-end data slicing is necessary to eliminate noise toward the end of the line of descent caused by mutated extant agents not yet pruned from the line of descent by selection. The three environments are represented on the x-axis as I (temporal spatial integration task), F (foraging task), and A (association task with a punishment of 0.05). In each environment (I, F, and A) agents were allowed to use either conventional probabilistic HMGs (labeled "prob") or decomposable HMGs (labeled "dec"). Red dashes indicate the mean and extrema. Gray violin plots show the distributions of normalized fitness for all 120 replicates per experimental condition. Fitness was normalized such that the maximal theoretically attainable fitness is represented as 1.0. For each environment (Integration, Foraging, and Association), the conditions under which evolution was limited to decomposable gates produced significantly better adapted agents. Significance was tested using the Mann-Whitney U test, with p < 0.05 for each environment (p = 0.0, p = 0.0, p < 2.2 × 10^-112).

Figure 4.5: Average number of gates evolved along the line of descent in a competition experiment executed in different environments. Panel A shows the results from the temporal spatial integration environment, Panel B the foraging environment, and Panel C the association environment with a punishment cost of 0.05. The average number of preferentially selected decomposable HMGs over generations is shown as a dashed line, whereas the solid line represents the average number of preferentially selected conventional probabilistic HMGs. The gray shadow indicates a 95% confidence interval.

So far, our results suggest that decomposable HMGs have a clear advantage when it comes to evolutionary adaptation of Markov Brains. Furthermore, these results suggest that systems without instantaneous interactions evolve faster, and select against instantaneous interaction when possible. Instantaneous interaction allows components to have outputs that share information beyond that of their inputs. This raises the question of whether instantaneous interactions provide an advantage to computation; our results suggest the opposite. The question is now whether systems that cannot have instantaneous interactions compensate for that loss. It may be that decomposed components are more adaptable in a way that compensates for the missing information. To investigate this issue we also assessed the effect of gate type
Figure 4.5: Average number of gates evolved along the line of descent in a competition experiment executed in different environments. Agents were evolved in three different environments: Panel A shows the results from the temporal spatial integration environment, Panel B shows the results from the foraging environment, and Panel C shows the results from the association environment with a punishment cost of 0.05. The average number of preferentially selected decomposable HMGs over generations is shown as a dashed line, whereas the solid line represents the average number of preferentially selected conventional probabilistic HMGs. The gray shadow indicates a 95% confidence interval.

To investigate this issue we also assessed the effect of gate type on the structure of the cognitive machinery. Observe that the different kinds of gates differ only in how their probabilities are encoded; they do not differ in how connections are made or in how mutations affect their connections or abundance. While there are many ways to measure these networks [McCabe, 1976], we test evolved brains for the number of gates and for two graph measures used in similar work: the gamma index (connectivity) and the brain diameter (for a more detailed explanation see [Schossau et al., 2015]); a brief computational sketch of both measures is given after Figure 4.6. The first measure is simply the number of gates present in the phenotype. The gamma index measures the density of connections relative to all possible connections that could have been made; a higher gamma index indicates a more connected Markov Brain. The brain diameter is the length of the shortest path between the furthest nodes in the network: it is assessed by computing the shortest connections between all pairs of components, and the longest of these shortest connections is the brain diameter. The larger the brain diameter, the more steps it takes for information to traverse the Markov Brain, and the more computations are possible.

Probably the most interesting result is that the total number of gates was lower when only decomposable gates were allowed, at least when agents were evolved in the foraging and in the association environments (see Figure 4.7 panel C). When agents were evolved in the temporal spatial integration environment, we find a slightly higher number of decomposable gates. This suggests that the subsequent effects on density and diameter cannot simply be explained by a higher number of gates. We find that the density and diameter of the evolved Markov Brains are higher for those experiments where agents were restricted to the evolution of decomposable gates (see Figure 4.7 panels B and C). This suggests that agents with decomposable gates evolve to use fewer gates, and that these fewer gates are more tightly connected. Additionally, the total diameter of the network becomes longer. This result suggests a possible contradiction to the idea that a system needs more decomposable gates to compensate for the lack of instantaneous interactions. Secondly, it suggests that if the loss of instantaneous interactions is part of a trade-off, then it is offset by an increase in connectivity and brain diameter, though it may only be that these populations evolve faster and thus arrive at better-connected high-fitness networks sooner. The increase in brain diameter suggests that more computations are occurring sequentially in the brain.

Figure 4.6: Average fitness along the line of descent in each environment while both non-decomposable and decomposable gate types were evolvable. Panel A shows line of descent fitness for the temporal spatial integration environment, Panel B shows line of descent fitness for the foraging environment, and Panel C shows line of descent fitness for the association environment. The gray shadow indicates a 95% confidence interval.
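As a rough illustration of the two graph measures described above, the following sketch computes a connection density and a longest-shortest-path diameter for a toy directed wiring diagram. It uses networkx purely for convenience, and the nodes and edges are hypothetical rather than taken from an evolved Markov Brain.

    import networkx as nx

    def gamma_index(g: nx.DiGraph) -> float:
        """Fraction of all possible directed connections that are present."""
        n = g.number_of_nodes()
        possible = n * (n - 1)  # directed edges, excluding self-loops
        return g.number_of_edges() / possible if possible else 0.0

    def brain_diameter(g: nx.DiGraph) -> int:
        """Longest of the shortest paths between any pair of reachable nodes."""
        lengths = dict(nx.all_pairs_shortest_path_length(g))
        return max(d for targets in lengths.values() for d in targets.values())

    # Toy wiring diagram: nodes stand in for brain components (inputs, gates,
    # memory, outputs); directed edges stand in for gate connections.
    g = nx.DiGraph([(0, 1), (1, 2), (2, 3), (0, 3)])
    print(gamma_index(g))     # 4 of 12 possible directed edges -> 0.33
    print(brain_diameter(g))  # longest shortest path: 0 -> 1 -> 2 -> 3 = 3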
However, we have no intuition about what this could mean for the computations that happen in the Markov Brain. While the mixed-gate conditions favored the evolution of decomposable gates, this preference could have come at the cost of lower achieved fitness than in the homogeneous-gate conditions.

Figure 4.7: Distributions of different properties of Markov Brains at the end of evolution in three environments and two conditions per environment. The three environments are represented on the x axes as I (temporal spatial integration task), F (foraging task), and A (association task with a punishment of 0.05). In each environment (I, F, and A) agents were allowed to use either conventional probabilistic gates (labeled "prob") or decomposable gates (labeled "dec"). Panel A shows the number of gates, Panel C shows the gamma index (that is, the connection density), and Panel B shows the diameter (longest of the shortest connections). Each panel shows the distributions as violin plots in gray, and the mean as well as the extreme points as red dashes. For each environment of Integration, Learning, and Association, and for both the Gamma Index and Diameter properties, the conditions under which evolution was limited to decomposable gates produced significantly more connected and larger-diameter Markov Brains. Significance was tested using the Mann-Whitney U test, with p < 0.05. For Gamma Index, the differences between gate-restricted conditions within environments produced p values of p = 4.9 × 10^-23, p = 6.5 × 10^-6, and p = 1.3 × 10^-3, respectively. For Diameter, the differences between gate-restricted conditions within environments produced p values of p = 2.2 × 10^-6, p = 3.6 × 10^-2, and p = 4.2 × 10^-3, respectively.

However, an examination of line of descent fitness shows no degradation in evolvability. This suggests that while a slower and more complex evolutionary trajectory with non-decomposable gates existed within the search space, evolution preferentially selected against that path in favor of one with more evolvability.

4.4 Discussion

Evolvable Markov Brains are a useful modeling framework for studying evolution [Olson et al., 2013a] and complex cognitive systems [Edlund et al., 2011, Marstaller et al., 2013a, Albantakis et al., 2014], but they also provide a powerful alternative to conventional machine learning approaches due to their capacity for solving classification tasks [Chapman et al., 2013]. Their connectivity and logic structure are determined by genes that encode HMGs, which map inputs to outputs via a logic table. Probabilistic HMGs yield the most general logic tables, as each entry is independently determined by one locus in the genome, with the only restriction being that each row has to sum to 1. As a consequence, probabilistic HMGs allow for instantaneous interactions: shared information between outputs. This information sharing may contain information useful to the system, but it prohibits interpreting Markov Brains with probabilistic HMGs as a network of elementary, causally interacting components [Oizumi et al., 2014, Pearl, 2014, Albantakis and Tononi, 2015].
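To make the distinction concrete, the following sketch contrasts one row of each gate type's logic table for a single input pattern, assuming binary outputs; the variable names are illustrative, and this is not the actual Markov Brain implementation. A conventional probabilistic row is an arbitrary joint distribution over output patterns, while a decomposable row is forced to factorize into per-output probabilities (conditionally independent outputs, hence no instantaneous interactions).

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(0)
    n_outputs = 2  # two binary outputs for this illustration

    # Conventional probabilistic HMG: for one input pattern, the genome encodes
    # a full joint distribution over all 2^n_outputs output patterns (the row
    # must sum to 1, but is otherwise unconstrained).
    joint_row = rng.random(2 ** n_outputs)
    joint_row /= joint_row.sum()

    # Decomposable HMG: the genome encodes one probability per output bit, and
    # the joint distribution over output patterns is the product of those marginals.
    p_bit = rng.random(n_outputs)  # P(output_i = 1) for each output bit
    decomposable_row = np.array([
        np.prod([p if bit else 1.0 - p for bit, p in zip(pattern, p_bit)])
        for pattern in product([0, 1], repeat=n_outputs)
    ])

    print(joint_row)         # generally does not factorize into its marginals
    print(decomposable_row)  # always factorizes; a special case of the above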
While this may not impact machine learning applications of Markov Brains, it may limit their use as a surrogate for biological systems and complicates the analysis of their causal structure. To date, the possible role of these instantaneous interactions in cognitive systems has remained unclear. Here we investigated the impact of instantaneous interactions on cognitive systems by comparing the evolution of two different types of Markov Brains. Agents were evolved in three different environments and were restricted to either conventional probabilistic HMGs or decomposable HMGs that did not allow for instantaneous interactions (eq. 9 and 10) but were otherwise identical to conventional probabilistic logic gates. The decomposable HMGs introduced here are a special case of probabilistic HMGs, with the additional constraint of conditionally independent outputs. A priori, probabilistic HMGs thus have more computational potential. Nevertheless, we found that decomposable HMGs without instantaneous interactions not only allow faster adaptation of Markov Brains, but also promote improved final agent performance. In populations that could evolve the use of both probabilistic and decomposable HMGs, we found preferential selection for decomposable gates in all three tested environments. Lastly, we found that populations restricted to decomposable HMGs evolved to use fewer gates, with a larger diameter and a higher density of connectivity.

These findings suggest that instantaneous interactions hamper the evolution of cognitive systems rather than providing computational advantages. In fact, Markov Brains with decomposable HMGs evolved to higher fitness levels using a similar or even smaller number of gates than Markov Brains with probabilistic HMGs. This indicates that Markov Brains with decomposable HMGs had no need to compensate for a lack of computational power. On the contrary, evolution seems to exploit the conditional independence property of decomposable HMGs in these systems and to avoid instantaneous interactions. This is suggested by the finding that Markov Brains with decomposable HMGs are more densely connected, which means that individual decomposable HMGs evolved, on average, more outputs than individual probabilistic HMGs. Conditional independence thus seems to facilitate packing more input-output relations into a single HMG. In Markov Brains with probabilistic HMGs, more gates with fewer average outputs may be required specifically to avoid instantaneous interactions.

Further investigation should provide more insight into the question of why systems with elementary causal components that do not allow instantaneous interactions evolve faster. After all, an instantaneous interaction contains information seemingly "from nothing", as it is created between the outputs of logic units and is not solely caused by inputs. Based on our simulations, several factors may help explain why evolution prefers decomposable gates:

Search Space: Evolutionary search over populations using probabilistic gates must traverse a larger state space, since probabilistic gates include, as a subset, the gates that are decomposable. When evolution is limited to exploring only decomposable gates, the search space becomes much smaller and thus easier to explore (an illustrative parameter count is sketched after this list).

Epistatic Interactions: In decomposable gates, mutations may have a different functional phenotypic effect than those affecting conventional probabilistic gates. This may allow traversing the evolutionary search space in greater leaps.

Robustness of Functions: Related to the above, conditional independence between the outputs of a decomposable HMG may lead to a more robust encoding of beneficial input-output relations.

Determinism with respect to Inputs: All tested evolution environments required the agents to react to sensor stimuli in order to achieve high fitness. While a certain amount of intrinsic indeterminism (noise) may be useful to trigger exploratory behavior, instantaneous interactions between outputs that are not explained by past inputs might simply provide no fitness advantage in sensory-motor cognitive tasks. Decomposable HMGs can be viewed as a noisy version of deterministic HMGs. Any additional freedom in the logic table of probabilistic HMGs may be superfluous or even detrimental.
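The search-space argument can be made concrete by counting free parameters per gate. Assuming binary inputs and outputs, a probabilistic HMG with n inputs and m outputs encodes 2^n rows of 2^m entries each, with one sum-to-1 constraint per row, while a decomposable HMG encodes only one probability per output bit per row. The function names below are illustrative, and the counts ignore the additional loci that encode connectivity.

    def free_params_probabilistic(n_inputs: int, n_outputs: int) -> int:
        # One joint row per input pattern; each row has 2^n_outputs entries
        # constrained to sum to 1, leaving 2^n_outputs - 1 free parameters.
        return 2 ** n_inputs * (2 ** n_outputs - 1)

    def free_params_decomposable(n_inputs: int, n_outputs: int) -> int:
        # One independent probability per output bit per input pattern.
        return 2 ** n_inputs * n_outputs

    # For a gate with 4 inputs, the gap grows quickly with the number of outputs:
    for m in range(1, 5):
        print(m, free_params_probabilistic(4, m), free_params_decomposable(4, m))
    # 1: 16 vs 16,  2: 48 vs 32,  3: 112 vs 48,  4: 240 vs 64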
Our work has several practical implications for the use of Markov Brains across disciplines. For machine learning applications, where the interest lies primarily in the speed of evolution and final fitness, our results suggest that decomposable HMGs may improve performance considerably compared to general probabilistic HMGs. As models for cognitive systems, using decomposable HMGs instead of general probabilistic HMGs has the additional benefit that decomposable gates lead to causally interpretable Markov Brains. This allows analyzing the causal composition of the resulting Markov Brains, for example, within the framework of Integrated Information Theory [Oizumi et al., 2014, Albantakis et al., 2014].

Finally, whether biology, and more fundamentally physics, allows for true instantaneous interactions in nature is an open question. Typical accounts of causation require that causes precede their effects. Yet missing variables and coarse-grained measurements may lead to system models with apparent instantaneous interactions between variables [James and Crutchfield, 2016]. Taken together, our results suggest that there is no apparent reason to include instantaneous interactions in Markov Brains.

4.5 Acknowledgements

This work was supported in part by the National Science Foundation's BEACON Center for the Study of Evolution in Action, under contract No. DBI-0939454. We wish to acknowledge the support of the Michigan State University High Performance Computing Center and the Institute for Cyber Enabled Research.

BIBLIOGRAPHY

[Adami et al., 2012] Adami, C., Schossau, J., and Hintze, A. (2012). Evolution and stability of altruist strategies in microbial games. Phys Rev E, 85(1 Pt 1):011914.
[Ahn et al., 2006] Ahn, Y.-Y., Jeong, H., and Kim, B. J. (2006). Wiring cost in the organization of a biological neuronal network. Physica A: Statistical Mechanics and its Applications, 367:531–537.
[Albantakis et al., 2014] Albantakis, L., Hintze, A., Koch, C., Adami, C., and Tononi, G. (2014). Evolution of integrated causal structures in animats exposed to environments of increasing complexity. PLoS Computational Biology, 10(12):e1003966.
[Albantakis and Tononi, 2015] Albantakis, L. and Tononi, G. (2015). The intrinsic cause-effect power of discrete dynamical systems – from elementary cellular automata to adapting animats. Entropy, 17(8):5472–5502.
[Amrutha, 2014] Amrutha, S. (2014). Agent based simulation of routine activity with social learning behavior. In Computational Intelligence and Computing Research (ICCIC), 2014 IEEE International Conference on, pages 1–6. IEEE.
[Archetti and Scheuring, 2011] Archetti, M. and Scheuring, I. (2011). Coexistence of cooperation and defection in public goods games. Evolution, 65(4):1140–1148.
[Attik et al., 2005] Attik, M., Bougrain, L., and Alexandre, F. (2005). Neural network topology optimization. In Proceedings of the 15th International Conference on Artificial Neural Networks: Formal Models and Their Applications – Volume Part II, ICANN'05, pages 53–58, Berlin, Heidelberg. Springer-Verlag.
[Ay, 2015] Ay, N. (2015). Information geometry on complexity and stochastic interaction. Entropy, 17(4):2432.
[Ay et al., 2008] Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E. (2008). Predictive information and explorative behavior of autonomous robots. The European Physical Journal B, 63(3):329–339.
[Baddeley, 1998] Baddeley, A. (1998). Random generation and the executive control of working memory. The Quarterly Journal of Experimental Psychology: Section A, 51(4):819–852.
[Bain, 1985] Bain, A. (1985). Mind and Body: The Theories of Their Relation. Routledge.
[Balduzzi and Tononi, 2008] Balduzzi, D. and Tononi, G. (2008). Integrated information in discrete dynamical systems: motivation and theoretical framework. PLoS Computational Biology, 4(6):e1000091.
[Balduzzi and Tononi, 2009] Balduzzi, D. and Tononi, G. (2009). Qualia: the geometry of integrated information. PLoS Computational Biology, 5(8):e1000462.
[Barrett AB, 2011] Barrett, A. B. and Seth, A. K. (2011). Practical measures of integrated information for time-series data. PLoS Comput Biol, 7(1).
[Beer, 1996] Beer, R. (1996). Toward the evolution of dynamical neural networks for minimally cognitive behavior. In Maes, P., Mataric, M., Meyer, J., Pollack, J., and Wilson, S., editors, From Animals to Animats: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, pages 421–429, Cambridge, MA. MIT Press.
[Beer, 2003] Beer, R. (2003). The dynamics of active categorical perception in an evolved model agent. Adaptive Behavior, 11:209–243.
[Bitbol and Schwab, 2014] Bitbol, A.-F. and Schwab, D. J. (2014). Quantifying the role of population subdivision in evolution on rugged fitness landscapes. PLoS Computational Biology, 10(8):e1003778.
[Bojarski et al., 2016] Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L. D., Monfort, M., Muller, U., Zhang, J., et al. (2016). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.
[Brooks, 1986] Brooks, R. A. (1986). A robust layered control system for a mobile robot. Robotics and Automation, IEEE Journal of, 2(1):14–23.
[Brugger et al., 1996] Brugger, P., Monsch, A., Salmon, D., and Butters, N. (1996). Random number generation in dementia of the Alzheimer type: A test of frontal executive functions. Neuropsychologia, 34(2):97–103.
[Bryson, 1975] Bryson, A. E. (1975). Applied optimal control: optimization, estimation and control. CRC Press.
[Bullinaria, 2006] Bullinaria, J. A. (2006). Understanding the advantages of modularity in neural systems. In Proc. Twenty-Eighth Annual Conf. Cogn. Sci. Soc., pages 119–124.
[Chapman et al., 2013] Chapman, S., Knoester, D., Hintze, A., and Adami, C. (2013). Evolution of an artificial visual cortex for image recognition. In Advances in Artificial Life, ECAL, volume 12, pages 1067–1074.
[Cherniak et al., 2004] Cherniak, C., Mokhtarzada, Z., Rodriguez-Esteban, R., and Changizi, K. (2004). Global optimization of cerebral cortex layout. Proceedings of the National Academy of Sciences of the United States of America, 101(4):1081–1086.
[Claussen and Traulsen, 2008] Claussen, J. C. and Traulsen, A. (2008). Cyclic dominance and biodiversity in well-mixed populations. Physical Review Letters, 100(5).
[Clune et al., 2013] Clune, J., Mouret, J.-B., and Lipson, H. (2013). The evolutionary origins of modularity. Proceedings of the Royal Society of London B: Biological Sciences, 280(1755):20122863.
[Clune et al., 2011] Clune, J., Stanley, K. O., Pennock, R. T., and Ofria, C. (2011). On the performance of indirect encoding across the continuum of regularity. Evolutionary Computation, IEEE Transactions on, 15(3):346–367.
[Cully et al., 2015] Cully, A., Clune, J., Tarapore, D., and Mouret, J.-B. (2015). Robots that can adapt like animals. Nature, 521(7553):503–507.
[D'Ambrosio et al., 2011] D'Ambrosio, D., Lehman, J., Risi, S., and Stanley, K. (2011). Task switching in multiagent learning through indirect encoding. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS 2011). IEEE, Piscataway.
[De Jong, 1975] De Jong, K. A. (1975). An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD thesis, University of Michigan, Ann Arbor, MI, USA. AAI7609381.
[DeAngelis and Mooij, 2005] DeAngelis, D. L. and Mooij, W. M. (2005). Individual-based modeling of ecological and evolutionary processes. Annu. Rev. Ecol. Evol. Syst., 36:147–168.
[Deb, 2014] Deb, K. (2014). Multi-objective optimization. In Search Methodologies, pages 403–449. Springer.
[DeMarse et al., 2001] DeMarse, T. B., Wagenaar, D. A., Blau, A. W., and Potter, S. M. (2001). The neurally controlled animat: biological brains acting with simulated bodies. Autonomous Robots, 11(3):305–310.
[Dietterich, 2004] Dietterich, A. (2004). Reinforcement learning and its relationship to supervised learning. In Handbook of Learning and Approximate Dynamic Programming, IEEE, pages 47–60.
[Dietz, 2003] Dietz, R. D. (2003). Spatial competition, conflict and cooperation. PhD thesis, The Ohio State University.
[Edlund et al., 2011] Edlund, J. A., Chaumont, N., Hintze, A., Koch, C., Tononi, G., and Adami, C. (2011). Integrated information increases with fitness in the evolution of animats. PLoS Computational Biology, 7(10):e1002236.
[Floreano et al., 2008] Floreano, D., Dürr, P., and Mattiussi, C. (2008). Neuroevolution: from architectures to learning. Evolutionary Intelligence, 1(1):47–62.
[Gardner, 1970] Gardner, M. (1970). Mathematical games: The fantastic combinations of John Conway's new solitaire game "life". Scientific American, 223(4):120–123.
[Gauci and Stanley, 2010a] Gauci, J. and Stanley, K. O. (2010a). Autonomous evolution of topographic regularities in artificial neural networks. Neural Computation, 22(7):1860–1898.
[Gauci and Stanley, 2010b] Gauci, J. and Stanley, K. O. (2010b). Indirect encoding of neural networks for scalable Go. In Parallel Problem Solving from Nature, PPSN XI, pages 354–363. Springer.
[Goldsby et al., 2009] Goldsby, H. J., Goings, S., Clune, J., and Ofria, C. (2009). Problem decomposition using indirect reciprocity in evolved populations. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pages 105–112. ACM.
[Grabowski et al., 2010] Grabowski, L. M., Bryson, D. M., Dyer, F. C., Ofria, C., and Pennock, R. T. (2010). Early evolution of memory usage in digital organisms. In ALIFE, pages 224–231. The MIT Press.
[Grabowski and Magana, 2014] Grabowski, L. M. and Magana, J. A. (2014). Building on simplicity: Multi-stage evolution of digital organisms. In ALIFE, pages 113–120. The MIT Press.
[Guyon et al., 2011] Guyon, I., Cawley, G., Dror, G., and Lemaire, V. (2011). Results of the active learning challenge. In Active Learning and Experimental Design Workshop, in conjunction with AISTATS 2010, pages 19–45.
[Haley et al., 2014] Haley, P. B., Olson, R. S., Dyer, F. C., and Adami, C. (2014). Exploring conditions that select for the evolution of cooperative group foraging. In ALIFE 14: The Fourteenth Conference on the Synthesis and Simulation of Living Systems, volume 14, pages 310–311.
[He et al., 2014] He, J.-z., Wang, R.-w., and Li, Y.-T. (2014). Evolutionary stability in the asymmetric volunteer's dilemma. PLoS ONE, 9(8):1–6.
[Hintze and Miromeni, 2014] Hintze, A. and Miromeni, M. (2014). Evolution of autonomous hierarchy formation and maintenance. In ALIFE 14: The Fourteenth Conference on the Synthesis and Simulation of Living Systems, volume 14, pages 366–367.
[Hochreiter, 1998] Hochreiter, S. (1998). Recurrent neural net learning and vanishing gradient. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(2):107–116.
[Hochreiter and Schmidhuber, 1997] Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
[Hofbauer and Sigmund, 1998] Hofbauer, J. and Sigmund, K. (1998). Evolutionary Games and Population Dynamics. Cambridge University Press, Cambridge, UK.
[Huizinga et al., 2014] Huizinga, J., Clune, J., and Mouret, J.-B. (2014). Evolving neural networks that are both modular and regular: HyperNEAT plus the connection cost technique. In Proceedings of the 2014 Conference on Genetic and Evolutionary Computation, pages 697–704. ACM.
[Hyvärinen et al., 2004] Hyvärinen, A., Karhunen, J., and Oja, E. (2004). Independent Component Analysis, volume 46. John Wiley & Sons.
[Jahanshahi et al., 2006] Jahanshahi, M., Saleem, T., Ho, A. K., Dirnberger, G., and Fuller, R. (2006). Random number generation as an index of controlled processing. Neuropsychology, 20(4):391.
[James and Crutchfield, 2016] James, R. G. and Crutchfield, J. P. (2016). Multivariate dependence beyond Shannon information. arXiv preprint arXiv:1609.01233.
[Jefferson, 2004] Jefferson, K. K. (2004). What drives bacteria to produce a biofilm? FEMS Microbiology Letters, 236(2):163–173.
[Joshi et al., 2013] Joshi, N. J., Tononi, G., and Koch, C. (2013). The minimal complexity of adapting agents increases with fitness. PLoS Computational Biology, 9(7):e1003111.
[Killingback and Doebeli, 1996] Killingback, T. and Doebeli, M. (1996). Spatial evolutionary game theory: Hawks and doves revisited. Proceedings of the Royal Society of London B: Biological Sciences, 263(1374):1135–1144.
[Knowles et al., 2001] Knowles, J. D., Watson, R. A., and Corne, D. W. (2001). Reducing local optima in single-objective problems by multi-objectivization. In Evolutionary Multi-Criterion Optimization, pages 269–283. Springer.
[Kvam et al., 2015] Kvam, P., Cesario, J., Schossau, J., Eisthen, H., and Hintze, A. (2015). Computational evolution of decision-making strategies. In 37th Annual Conference of the Cognitive Science Society, volume 37, pages 1225–1230.
[Lehman et al., 2014] Lehman, J., Clune, J., and Risi, S. (2014). An anarchy of methods: Current trends in how intelligence is abstracted in AI. IEEE Intelligent Systems, 29(6):56–62.
[Lehman and Stanley, 2008] Lehman, J. and Stanley, K. O. (2008). Exploiting open-endedness to solve problems through the search for novelty. In Proc. of the Eleventh Intl. Conf. on Artificial Life (ALIFE XI), Cambridge, MA. MIT Press.
[Lenski, 2011] Lenski, R. E. (2011). Evolution in action: a 50,000-generation salute to Charles Darwin. Microbe, 6(616):30–33.
[Lenski et al., 2003a] Lenski, R. E., Ofria, C., Pennock, R. T., and Adami, C. (2003a). The evolutionary origin of complex features. Nature, 423(6936):139–144.
[Lenski et al., 2003b] Lenski, R. E., Ofria, C., Pennock, R. T., and Adami, C. (2003b). The evolutionary origin of complex features. Nature, 423(6):139–144.
[Lu and Ito, 1999] Lu, B.-L. and Ito, M. (1999). Task decomposition and module combination based on class relations: a modular neural network for pattern classification. IEEE Transactions on Neural Networks, 10(5):1244–1256.
[Lungarella et al., 2005] Lungarella, M., Pegors, T., Bulwinkle, D., and Sporns, O. (2005). Methods for quantifying the informational structure of sensory and motor data. Neuroinformatics, 3(3):243–262.
[Lungarella and Sporns, 2006] Lungarella, M. and Sporns, O. (2006). Mapping information flow in sensorimotor networks. PLoS Computational Biology, 2(10):e144.
[Marstaller et al., 2010] Marstaller, L., Hintze, A., and Adami, C. (2010). Measuring representation. In Proceedings ASCS09: 9th Conference of the Australasian Society for Cognitive Science, pages 232–237. North Ryde, NSW: Macquarie Centre for Cognitive Science.
[Marstaller et al., 2013a] Marstaller, L., Hintze, A., and Adami, C. (2013a). The evolution of representation in simple cognitive networks. Neural Computation, 25(8):2079–2107.
[Marstaller et al., 2013b] Marstaller, L., Hintze, A., and Adami, C. (2013b). The evolution of representation in simple cognitive networks. Neural Comput, 25(8):2079–2107.
[Maynard Smith, 1982] Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge University Press, Cambridge, UK.
[McCabe, 1976] McCabe, T. (1976). A complexity measure. IEEE Transactions on Software Engineering, 2:308–320.
[McGill, 1954] McGill, W. J. (1954). Multivariate information transmission. Psychometrika, 19(2):97–116.
[Michalewicz, 1996] Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs. Springer Verlag, New York.
[Miikkulainen et al., 2017] Miikkulainen, R., Liang, J., Meyerson, E., Rawal, A., Fink, D., Francon, O., Raju, B., Navruzyan, A., Duffy, N., and Hodjat, B. (2017). Evolving deep neural networks. arXiv preprint arXiv:1703.00548.
[Mnih et al., 2016] Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937.
[Nadell et al., 2016] Nadell, C. D., Drescher, K., and Foster, K. R. (2016). Spatial structure, cooperation and competition in biofilms. Nature Reviews Microbiology.
[Naiem et al., 2010] Naiem, A., Reda, M., El-Beltagy, M., and El-Khodary, I. (2010). An agent based approach for modeling traffic flow. In Informatics and Systems (INFOS), 2010 The 7th International Conference on, pages 1–6. IEEE.
[Nash, 1951] Nash, J. (1951). Non-cooperative games. Annals of Mathematics, pages 286–295.
[Nash et al., 1950] Nash, J. F. et al. (1950). Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49.
[Neumann et al., 1944] Neumann, J. v., Morgenstern, O., et al. (1944). Theory of Games and Economic Behavior. Princeton University Press, Princeton.
[Nguyen et al., 2015] Nguyen, A., Yosinski, J., and Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Computer Vision and Pattern Recognition (CVPR '15). IEEE.
[Nowak and Sigmund, 1993a] Nowak, M. and Sigmund, K. (1993a). A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner's dilemma game. Nature, 364(6432):56.
[Nowak and Sigmund, 1993b] Nowak, M. and Sigmund, K. (1993b). A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner's dilemma game. Nature, 364(6432):56–58.
[Oizumi et al., 2014] Oizumi, M., Albantakis, L., and Tononi, G. (2014). From the phenomenology to the mechanisms of consciousness: Integrated Information Theory 3.0. PLoS Computational Biology, 10(5):e1003588.
[Olson et al., 2015] Olson, R. S., Haley, P. B., Dyer, F. C., and Adami, C. (2015). Exploring the evolution of a trade-off between vigilance and foraging in group-living organisms. Royal Society Open Science, 2(9).
[Olson et al., 2013a] Olson, R. S., Hintze, A., Dyer, F. C., Knoester, D. B., and Adami, C. (2013a). Predator confusion is sufficient to evolve swarming behaviour. Journal of The Royal Society Interface, 10(85):20130305.
[Olson et al., 2013b] Olson, R. S., Knoester, D. B., and Adami, C. (2013b). Critical interplay between density-dependent predation and evolution of the selfish herd. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, pages 247–254. ACM.
[Olson et al., 2013c] Olson, R. S., Knoester, D. B., and Adami, C. (2013c). Evolution of swarming behavior is shaped by how predators attack. arXiv preprint arXiv:1310.6012.
[Pearl, 2014] Pearl, J. (2014). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
[Qiu and Sapiro, 2015] Qiu, Q. and Sapiro, G. (2015). Learning transformations for clustering and classification. Journal of Machine Learning Research, 16:187–225.
[Rashevsky and Rashevsky, 1960] Rashevsky, N. and Rashevsky, N. (1960). Mathematical biophysics: physico-mathematical foundations of biology, volume 2. Dover, New York.
[Reisinger et al., 2004] Reisinger, J., Stanley, K. O., and Miikkulainen, R. (2004). Evolving reusable neural modules. In Genetic and Evolutionary Computation Conference, pages 69–81. Springer.
[Rinehart et al., 2006] Rinehart, N. J., Bradshaw, J. L., Moss, S. A., Brereton, A. V., and Tonge, B. J. (2006). Pseudo-random number generation in children with high-functioning autism and Asperger's disorder: further evidence for a dissociation in executive functioning? Autism, 10(1):70–85.
[Risi and Stanley, 2011] Risi, S. and Stanley, K. O. (2011). Enhancing ES-HyperNEAT to evolve more complex regular neural networks. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, pages 1539–1546. ACM.
[Rivoire and Leibler, 2011] Rivoire, O. and Leibler, S. (2011). The value of information for populations in varying environments. Journal of Statistical Physics, 142(6):1124–1166.
[Rocha et al., 2005] Rocha, M., Cortez, P., and Neves, J. (2005). Simultaneous evolution of neural network topologies and weights for classification and regression. In International Work-Conference on Artificial Neural Networks, pages 59–66. Springer.
[Russell and Norvig, 2003] Russell, S. J. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Pearson Education, Upper Saddle River, NJ, 2nd edition.
[Schneidman et al., 2003] Schneidman, E., Still, S., Berry, M. J., Bialek, W., et al. (2003). Network information and connected correlations. Physical Review Letters, 91(23):238701.
[Schossau et al., 2015] Schossau, J., Adami, C., and Hintze, A. (2015). Information-theoretic neuro-correlates boost evolution of cognitive systems. Entropy, 18(1):6.
[Searcy and Nowicki, 2005] Searcy, W. A. and Nowicki, S. (2005). The Evolution of Animal Communication: Reliability and Deception in Signaling Systems. Princeton University Press.
[Sebastiani, 2002] Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1):1–47.
[Smith, 1974] Smith, J. M. (1974). The theory of games and the evolution of animal conflicts. Journal of Theoretical Biology, 47(1):209–221.
[Stanley and Miikkulainen, 2002] Stanley, K. O. and Miikkulainen, R. (2002). Evolving neural networks through augmenting topologies. Evol Comput, 10(2):99–127.
[Street et al., 1995] Street, W. N., Mangasarian, O. L., and Wolberg, W. H. (1995). An inductive learning approach to prognostic prediction. In ICML, pages 522–530.
[Suchorzewski and Clune, 2011] Suchorzewski, M. and Clune, J. (2011). A novel generative encoding for evolving modular, regular and scalable networks. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, pages 1523–1530. ACM.
[Szabo and Fath, 2007] Szabo, G. and Fath, G. (2007). Evolutionary games on graphs. Physics Reports, 446(4-6):97–216.
[Tononi, 2008] Tononi, G. (2008). Consciousness as integrated information: a provisional manifesto. The Biological Bulletin, 215(3):216–242.
[Tononi et al., 1994] Tononi, G., Sporns, O., and Edelman, G. M. (1994). A measure for brain complexity: relating functional segregation and integration in the nervous system. Proceedings of the National Academy of Sciences, 91(11):5033–5037.
[Traulsen and Hauert, 2009] Traulsen, A. and Hauert, C. (2009). Stochastic evolutionary game dynamics. In Schuster, H.-G., editor, Reviews of Nonlinear Dynamics and Complexity, volume 2. Wiley-VCH, Weinheim.
[van Dartel et al., 2005] van Dartel, M., Sprinkhuizen-Kuyper, I., Postma, E., and van den Herik, J. (2005). Reactive agents and perceptual ambiguity. Adaptive Behavior, 13:227–42.
[Vapnik and Chervonenkis, 1971] Vapnik, V. N. and Chervonenkis, A. Y. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl., 16:264–280.
[Wagenaar, 1972] Wagenaar, W. A. (1972). Generation of random sequences by human subjects: A critical survey of literature. Psychological Bulletin, 77(1):65.
[Weibull, 1995] Weibull, J. W. (1995). Evolutionary Game Theory. MIT Press, Cambridge, MA.
[Welch, 1984] Welch, T. A. (1984). A technique for high-performance data compression. Computer, 6(17):8–19.
[Werbos, 1974] Werbos, P. J. (1974). Beyond regression: new tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University.
[Whitley et al., 1998] Whitley, D., Rana, S., and Heckendorn, R. B. (1998). The island model genetic algorithm: On separability, population size and convergence. Journal of Computing and Information Technology, 7:33–47.
[Williams et al., 2002] Williams, M. A., Moss, S. A., Bradshaw, J. L., and Rinehart, N. J. (2002). Brief report: Random number generation in autism. Journal of Autism and Developmental Disorders, 32(1):43–47.
[Yao, 1999] Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, 87(9):1423–1447.
[Zahedi et al., 2013] Zahedi, K., Martius, G., and Ay, N. (2013). Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis. Frontiers in Psychology, 4(801).
[Zeeman, 1980] Zeeman, E. C. (1980). Population dynamics from game theory. In Nitecki, Z. and Robinson, C., editors, Global Theory of Dynamical Systems, Lecture Notes in Mathematics, volume 819, pages 471–497. Springer, N.Y.
[Zhang and LeCun, 2015] Zhang, X. and LeCun, Y. (2015). Text understanding from scratch. arXiv preprint arXiv:1502.01710.
[Zhou et al., 2011] Zhou, A., Qu, B.-Y., Li, H., Zhao, S.-Z., Suganthan, P. N., and Zhang, Q. (2011). Multiobjective evolutionary algorithms: A survey of the state of the art. Swarm and Evolutionary Computation, 1(1):32–49.
[Zlotowski and Bakan, 1963] Zlotowski, M. and Bakan, P. (1963). Behavioral variability of process and reactive schizophrenics in a binary guessing task. The Journal of Abnormal and Social Psychology, 66(2):185.