EMERGENT COORDINATION: ADAPTATION, OPEN-ENDEDNESS, AND COLLECTIVE INTELLIGENCE By Honglin Bao A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computer Science – Master of Science 2022 ABSTRACT EMERGENT COORDINATION: ADAPTATION, OPEN-ENDEDNESS, AND COLLECTIVE INTELLIGENCE By Honglin Bao Agent-based modeling is a widely used computational method for studying the micro-macro bridge issue by simulating the microscopic interactions and observing the macroscopic emergence. This thesis begins with the fundamental methodology of agent-based models: how agents are represented, how agents interact, and how the agent population is structured. Two vital topics, the evolution of cooperation and opinion dynamics are used to illustrate methodological innovation. For the first topic, we study the equilibrium selection in a coordination game in multi-agent systems. In particular, we focus on the characteristics of agents (supervisors and subordinates versus representative agents), the interactions of agents (reinforcement learning in the games with fixed versus adaptive learning rates according to the supervision and time-varying versus supervision-guided exploration rates), the network of agents (single-layer versus multi-layer networks), and their impact on the emergent behaviors. Regarding the second topic, we examine how opinions evolve and spread in a cognitively heterogeneous agent population with sparse interactions and how the opinion dynamics co- evolve with the open-ended society's structural change. We then discuss the rich insights into collective intelligence in the two proposed models viewed from the interaction-based adaptation and open-ended network structure. We finally link collective emergent intelligence to diverse applications in the realm of computing and other scientific fields in a cross-multidisciplinary manner. ii This thesis is dedicated to my parents, for their endless love, support, and encouragement. iii ACKNOWLEDGEMENTS I conducted research throughout my graduate studies in biologically inspired computing and complex social systems. They are, however, all interconnected within the broader framework of complex adaptive systems. It has not been a straightforward journey. This thesis has been shaped throughout this journey. I am grateful for everything I have encountered; it has helped shape my unique academic background and skill set. I would like to express my gratitude to Prof. Wolfgang Banzhaf, my advisor. I have co-authored a number of papers with him, and he completed my academic training, taught me how to conduct independent research, and supported me in pursuing my interests. I would like to express my gratitude to my committee members, Charles Ofria and Emily Dolson, as well as my lab members, Ian, Yuan, Iliya, Stephen, Ken, Salman, Nathan, and Jun, for their numerous insightful discussions and collaborations. I would also like to express my gratitude to social scientists with whom I have collaborated, including Zachary Neal in the Psychology Department, Winson Taiquan Peng in the Communication Department of Michigan State University, and Misha Teplitskiy, a sociologist in the University of Michigan's School of Information. They had a sizable influence on my academic trajectory. I would like to express my gratitude to Xiaoqin Yan for his encouragement, without which this thesis would not have been possible. Finally, I want to express my sincere appreciation to my parents. I would not have completed my thesis or been happy with conducting research without their unrequited emotional and financial support; I love you. iv TABLE OF CONTENTS LIST OF TABLES ......................................................................................................................... vi LIST OF FIGURES ...................................................................................................................... vii LIST OF ALGORITHMS ............................................................................................................ viii KEY TO ABBREVIATIONS ........................................................................................................ ix INTRODUCTION ......................................................................................................................... 1 I. METHODOLOGY ................................................................................................................ 4 II. EVOLUTION OF COOPERATION ..................................................................................... 8 A. Coordination and Cooperation ............................................................................................ 8 B. Related Work ...................................................................................................................... 9 C. The Proposed Model ......................................................................................................... 10 D. Model Performance........................................................................................................... 17 1. Comparison of Mutation and Two Types of Exploration ............................................. 18 2. Comparison with Previous Work .................................................................................. 19 E. Multilayer Networks ......................................................................................................... 21 III. OPINION DYNAMICS AND STRUCTURAL CODYNAMICS ...................................... 28 A. Opinion Dynamics ............................................................................................................ 28 B. The Proposed Model ......................................................................................................... 29 1. Opinion Dynamics with Sparse Interactions ................................................................ 31 2. Open-ended Structural Dynamics ................................................................................. 34 C. Model Performance........................................................................................................... 36 1. How Do Group Opinions Evolve? ................................................................................ 37 2. How Do the Opinion Dynamics Shape the Structural Co-dynamics? .......................... 38 3. What Factors Affect the Evolved Global Opinion? ...................................................... 42 D. Discussions ....................................................................................................................... 45 IV. TOWARDS COLLECTIVE INTELLIGENCE ................................................................... 48 A. Adaptation ......................................................................................................................... 50 B. Open-endedness ................................................................................................................ 52 C. Collective Intelligence ...................................................................................................... 55 V. CONCLUSION ..................................................................................................................... 59 BIBLIOGRAPHY ......................................................................................................................... 60 v LIST OF TABLES Table 1: Payoff matrix of an n-action 2-player coordination game ................................................ 9 vi LIST OF FIGURES Figure 1: The overall methodolgogy .............................................................................................. 7 Figure 2: Fraction of cooperators under different exploration and mutation methods. ............... 18 Figure 3: Comparison with Yu et al., 2017. ................................................................................. 20 Figure 4: The overview of the multilayered algorithmic framework ........................................... 23 Figure 5: Influence of cluster size on the evolution of cooperation. ............................................ 25 Figure 6: Evolutionary dynamics of learning rate α and exploration rate µt................................ 26 Figure 7: The overview of the SCOOE model ............................................................................. 30 Figure 8: The opinion dynamics under default Gaussian stubbornness. ...................................... 38 Figure 9: The structural dynamics. ............................................................................................... 40 Figure 10: The community structure. ........................................................................................... 41 Figure 11: The evolved opinions with different stubbornness distributions. ............................... 44 Figure 12: Opinion dynamics with different distributions and mechanisms. ............................... 45 vii LIST OF ALGORITHMS Algorithm 1: The proposed cooperation framework .................................................................... 13 Algorithm 2: The collective learning framework ......................................................................... 16 Algorithm 3: The SCOOE model ................................................................................................. 36 viii KEY TO ABBREVIATIONS SCOOE model Sparse CO-evolutionary Open-Ended model Simulated Annealing-based Exploration SA Exploration Neural Cellular Automata NCA ix INTRODUCTION 1 An agent-based model is a computational method that simulates the characteristics, behaviors, and interactions of entities (the smallest units) in a system to comprehend the system's dynamics and the factors that govern such dynamics. At the macroscopic level, we can observe the rich emergence of "novelties" by assigning basic behavioral rules or communication protocols to agents to represent individual behaviors and interactions, e.g., game-theoretic, reinforcement learning-based, and behavioral science-inspired interactions. We cannot fully predict these novelties using "reductive" analysis of microscopic rules, such as differential equation-based analysis, due to their emergent nature. These novelties can be viewed as the collective behavior of simple-rule-following intelligence. This "bottom-up" modeling methodology is referred to as agent-based modeling, which has been widely applied in a variety of fields, most notably biology and social science. The idea of "bottom-up emergence" has a long history in computer science. An early work is the von Neumann Machine (Aspray, 1989). The machine creates a replica of itself based on simple rules. John Conway, a British mathematician, was inspired by this and created Game of Life by introducing simple interactions between agents and their neighbors in a two- dimensional lattice environment. We can observe novel dynamics emerging in the Game of Life continuously: A disordered agent population gradually evolves into a structure with multiple delicate and tangible shapes under various rules and initialized states. Several of them maintain symmetry while undergoing continuous shape changes. Certain well-formed shapes are disrupted 1 This section is based in part on the ongoing paper coauthored with Wolfgang Banzhaf, "Social Inspiration in Computational Models: A Survey." 1 and destroyed when disordered agents "invade." Several of them remain stable over time. Generally, we frequently observe that "order" emerges from "chaos." In the 1980s, Craig Reynolds used agent-based simulation to model the group behavior of social organisms and animals (Reynolds, 1987). Christopher Langton then coined the term "artificial life" or "ALife" to refer to this methodology in order to emphasize its significance in the study of the evolution of digital organisms (Langton, 1997). Artificial life now is a growing field at the intersection of evolutionary biology, complex systems, and computational cognitive science (Attenberg, 2005). Sociologists have modeled humans and their behavior as a collection of autonomous agents that operate in parallel and communicate with one another for the purpose of simulating real-world social phenomena. Schelling's segregation model is a pioneering social simulation model (Schelling, 1971), which states that people's "mild" preference for group membership can result in a highly segregated society. Epstein and Axtell have coined the term "bottom-up" social science or generative social science to refer to this simulation methodology (Epstein & Axtell, 1996). Agent-based models have thriving commercial applications in the real world, notably for robotics and self-driving cars, accompanied by the rapid advancement of artificial intelligence and machine learning. They have a profound theoretical influence on task allocation and coordination in real-world e-Commerce robotic picking systems (Barbati et al., 2012; Yuan et al., 2021; Wang et al., 2018). Waymo has proposed Carcraft, an agent-based platform for testing self-driving algorithms. It models the complex interactions between self- driving cars, pedestrians, and humans in traffic by creating different agents and calibrates the simulated human interactions with cars using massive real-world human behavioral data (Connors et al., 2018). 2 The purpose of this thesis is to study agent-based models, with a focus on their computational characteristics and multidisciplinary applications. The rest of the thesis is organized as follows. Section I gives an overview of the methodology. Section II and III use two examples, the evolution of cooperation and opinion dynamics, to illustrate how to incorporate the methodology into the computational model design. Section IV reflects two proposed models and their insights into collective intelligence. Section V concludes the thesis with final remarks. 3 I. METHODOLOGY 2 What is the critical methodology for incorporating agent-based models into multidisciplinary applications via advanced computing techniques? To answer this question, we must first examine the foundations of a "system." A system is, in brief, the sum of the relationships between entities within the system and the structure within which they exist (Burns & Helena, 1987). In agent- based models, an agent acts on behalf of an entity. As are composed of numerous agents to simulate human-made or natural systems by simulating both entities and their interactions and structures. We divide the methodology into three perspectives mirroring how a system operates: how agents are represented, how agents interact, and how agent populations are structured. The agent property is a critical concept in agent-based models. Economists refer to the model that contains identical agents of the same type as a representative agent model. More broadly, the agents in a representative model may be distinct. Nonetheless, their respective characteristics are irrelevant to the problem at hand, i.e., the agent behaves in such a way that the overall decision is practically the same as the decision of a single agent or a group of similar agents. Thus, we can simplify this to a model of representative agents. On the other hand, if distinct agent characteristics are relevant to the problem, we must consider agent heterogeneity, i.e., the differences between agents. The heterogeneity of agents is a fascinating subject. The majority of existing research assigns distinct functions or roles to agents to capture real-world behaviors, such as opinion leaders and followers in social networks (Zhao & Kou, 2014), ethnically diverse agents in studies of residential segregation (Fossett & Waren, 2005), different species in ecological systems (Filatova et al., 2013), and trustworthy and untrustworthy agents in 2 This section is based in part on the ongoing paper coauthored with Wolfgang Banzhaf, "Social Inspiration in Computational Models: A Survey." 4 reputation computing, a sub-field of computer security (Wang et al., 2010). Numerous elements in agent property design are inspired by psychological and behavioral studies. The most well- known example is the BDI (belief-desire-intention) agent model (Georgeff, 1998). It is inspired by Michael Bratman's theory of reasoning about human practices (Bratman, 1987). A BDI agent is defined by executing programmed beliefs, desires, and intentions and utilizes these concepts to design and evolve "smarter" agents. It has been widely used in planning and scheduling problems in software engineering (Rao & Georgeff, 1995). A critical aspect of the agent-based computing model is microscopic mechanisms of interaction and their impact on global emergence. Numerous perspectives are studied in the design of interaction protocols. For instance, game-theoretic interactions are used to simulate human strategic reasoning and behaviors (Adami et al., 2016). Reinforcement learning is used to model the interaction of agents with their environments (e.g., other agents) by maximizing cumulative rewards (Barbati et al., 2012). The concept of feedback in reinforcement learning, i.e., receiving rewards for good behaviors and punishments for creating problems, is fundamental to the study of decision-making. It is critical guidance for animals and humans acting and surviving in unknown environments. This concept permeates computational and experimental models in psychology, neuroscience, behavioral science, and ethology (Dayan & Daw, 2008). Learning and game theory are two widely used means of interaction with other agents or the environment. Agents gradually adapt to the environment in the process of interaction. In Section IV, we will examine systematically how the interaction among agents affects their adaptation and the macroscopic evolutionary dynamics. Some work takes a structural view of the holistic-level patterns. It conceptualizes the population as a social network, with each node representing an individual and each edge 5 representing an interaction between individuals. We can study the macroscopic patterns, network statistics, and dynamics of social networks using graph theoretic approaches. An example is social network-inspired optimization and its wide applications, e.g., information retrieval (Nasution & Noah, 2012; Matsuo et al., 2007). Improving the recall and precision rates in the retrieval process can be modeled by detecting increasingly segregated communities in the document population represented as a network. In addition, when the agent population is structured as a network, the statistical characteristics of the network, such as connectivity and open-endedness, will have a significant effect on the macroscopic emergent patterns. This issue will be discussed systematically in Section IV based on the models proposed in the thesis. On the other hand, some works argue that structured methods cannot be used effectively to analyze emergent dynamics (Villani et al., 2021). Otherwise, it would not be called “emergent.” These scholars employ a bottom-up perspective of macroscopic phenomena, in which even if the microscopic structure is chaotic, this has no effect on the emergence of the subtle macroscopic structure. A notable example is a large number of self-organized multiagent clustering algorithms (Thrun & Ultsch, 2021; Bu et al., 2017). Following these three lines of inquiry, Figure 1 illustrates the overview of methodology. The following sections will study how agent-based models with novel representations, interactions, and structures can be used to address computational problems related to coordination, cooperation, opinion dynamics, and collective intelligence. 6 Figure 1: The overall methodology. 7 II. EVOLUTION OF COOPERATION 3 A. Coordination and Cooperation The study of the evolution of cooperation has a long history in the application of game theory (Axelrod & Hamilton, 1981). It also assumes great importance in the field of multiagent systems. In multiagent societies, cooperation represents an interaction among agents that can be evolutionarily advantageous to improve the performance of individual agents or the overall behavior of the society they belong to. Therefore, one of the main goals in multiagent societies is to achieve efficient cooperation among agents to jointly solve tasks or to maximize a utility function. In this section, we adopt the “Rules of the Road Game,” a typical coordination game, as an example to study the evolution of cooperation (Young, 1996). Consider two carriages meeting on a narrow road from opposite directions, having no context to decide on which side of the road to pass the other. If they choose differently, it will cause a head-on collision between them, and they receive a negative payoff. Only if they choose the same way can they avoid a collision and receive a positive payoff. To abstract from this realistic situation to virtual multiagent societies, agents are striving to establish a convention/law of coordinated action by choosing from an action space without any central controller. The payoff matrix is shown in Table 1. 3 This section is based in part on the paper: Honglin Bao, Qiqige Wuyun, Wolfgang Banzhaf; July 23–27, 2018. "Evolution of Cooperation through Genetic Collective Learning and Imitation in Multiagent Societies." Proceedings of the ALIFE 2018: The 2018 Conference on Artificial Life. ALIFE 2018: The 2018 Conference on Artificial Life. Tokyo, Japan. (pp. pp. 436-443). ASME. https://doi.org/10.1162/isal_a_00082 8 Table 1: Payoff matrix of an n-action 2-player coordination game. There are multiple Nash-equilibria in this diagonal situation. Both two players choose the same action, i.e., coordinated action. However, even purely rational players cannot choose the specific coordinated action without negotiation because they have no information to differentiate between strictly the same multiple equilibria. In reality, people can survive such social dilemmas because there are laws or social norms for them to refer to. Our goal is to train agents of a virtual society to choose cooperative action without upper-level steering and regulation. B. Related Work In order to realize such coordination and cooperation, some techniques developed in the field of machine learning have been introduced into various multiagent systems (Kapetanakis & Kudenko, 2002). Machine learning has been proven to be a popular approach to solving multiagent system coordination problems (Savarimuthu et al., 2011). Among machine learning techniques, reinforcement learning has gained much attention in the field of multiagent systems since it can be used to model agent learning by trial-and-error interaction with the dynamic environment. However, several new challenges arise for reinforcement learning in multiagent systems. Foremost among these is that the performance of reinforcement learning is unsatisfactory in many real-world applications. The learning algorithm may not converge to the optimal action. Some researchers showed that an adaptive strategy, called evolutionary 9 reinforcement learning, which combines reinforcement learning with a genetic algorithm, could reach a better performance than either strategy alone (Ackley & Littman, 1991). Some new forms of learning, e.g., observational, imitational, and communication-based learning (Taylor et al., 2006, Savarimuthu et al., 2011), also significantly promote information proliferation (Dittrich & Banzhaf, 2002) in more complex environments and can be used to solve complex distributed multiagent coordination problems better than pure reinforcement learning approaches. Furthermore, ensemble methods are used to combine the advantages of multiple learning algorithms to obtain better performance than what could be obtained from any of them alone (Polikar, 2006). More recently, Yu and colleagues (Yu et al., 2017) studied the role of reinforcement learning, collective decision-making, social structure, and information diffusion in the process of the evolution of cooperation in the networked society. C. The Proposed Model Although previous work provided a strong basis to study the mechanisms behind the evolution of cooperation, existing work in this area has drawbacks. Individual reinforcement learners often fail to develop globally coordinated behavior and can be trapped in local sub-optimal dilemmas. Using an evolutionary approach for strategy selection can produce optimal behavior but may require significant computational efforts. Behavior imitation always creates weak local coordination in society, caused by local interactions between agents. This study is significantly different from other frameworks for the evolution of cooperation in previous studies because of the hybrid policy of agents' decision-making. Here we design a genetic algorithm-based cooperation framework, which takes into account evolutionary selection, collective learning, and imitation, in order to solve some particular non-cooperative games in complex multiagent 10 networks, overcome previous shortcomings, and produce an acceptable tradeoff in convergence rate and computation effort. The final decision of an agent (both parent and offspring agent) is influenced by the three kinds of processes shown below. Note that all agents in the model are identical in terms of interaction, learning, and decision-making. This kind of model is referred to as the “representative agent-based models” in economics (Gallegati & Kirman, 2012). • Evolutionary Selection (with inheritance and mutation): A population of agents plays a game with their neighbors (i.e., the agents which are directly connected with the focal agent) on the network for several iterations. The offspring generation will be reproduced from the parent generation according to the cumulative payoff distribution, and the most successful agents will pass on action to their offspring. Mutation will occur with a small probability during the inheritance process to create novelty. • Collective Learning: Agents on the network improve on their parents’ actions and their original actions through a collective reinforcement learning algorithm with exploration and exploitation. • Imitation: Agents update the cumulative payoff, compare their cumulative payoff to neighbors, and adopt the actions of more successful agents as their own actions with a particular probability. These three processes interact with each other and can have a significant influence on the evolution of cooperation in the entire society. The overview of the proposed algorithmic framework is shown in Algorithm 1. It constitutes a genetic algorithm-based cooperation framework for multiagent systems with collective decision-making, learning and imitation to facilitate the evolution of cooperation used 11 in some particular coordination games. This framework is set in a network structure such as a small-world network (Watts & Strogatz, 1998) or a scale-free network (Barabási & Albert, 1999). A population of agents plays the coordination game with their neighbors repeatedly and simultaneously in the network for several generations. Offspring generation io will be reproduced from the parent generation ip according to their cumulative payoff distribution E. That means, the probability of reproduction is proportional to the cumulative payoff; The agents with higher cumulative payoffs are more likely to reproduce their offspring. The most successful agents pass on behavior to their offspring io, and mutation will change this behavior with a small probability η during inheritance. The society information regarding nodes and edges will be updated at each round with new offspring nodes and parent-offspring edges. Then agents will improve their actions through a collective reinforcement learning algorithm with exploration and exploitation and a social learning process. 12 Algorithm 1: The proposed cooperation framework. We adopt a widely used reinforcement learning algorithm, Q-learning, to model this interaction. Its one-step updating rule is given by Equation 1. Here α ∈ (0, 1] is a learning rate, and λ ∈ [0, 1) is a discount factor. (1) As shown in Equation 1, an agent has a set of states and a set of actions. An agent performs an action a, transitions from state s to another new state s’ and receives an immediate 13 reward R(s, a). Q(s, a) is the expected reward of choosing action a in state s at time step t. During the interaction, agents want to maximize the expected discounted reward Q (s’, a’) to make decisions in the new state s’ at time step t + 1. The Q-function is learned during an agent’s lifetime inherited to choose a best-response action based on the Q-value regularly. Each agent needs to aggregate all the best-response actions regarding its neighbors into an overall action. This is inspired by the opinion aggregation process in that people usually have to seek suggestions from many other people before making a final decision. The opinion aggregation process can be realized by an ensemble learning method which combines multiple single- learning algorithms to obtain better performance than what could be obtained from any of them alone (Polikar, 2006). The foremost method of collective voting is inspired by a simple political principle, majority rule. Consider that in a simple society (e.g., an undirected simple graph which represents the multiagent network we adopt in this paper), human beings are more keen to decide as the majority of their neighbors. So, when agents make final decisions, they consider the action which quantitatively dominates in the best-response action pool. More complex and realistic methods to make a final decision consider the weight of each neighbors, such as performance-based weighted voting method and structure-based weighted voting method. For structure-based weighted voting, the weight of each neighbor is related to the degree of each neighbor. The focal agent will give higher weight to a neighbor with more connections. For performance-based weighted voting, the focal agents will consider previous interaction experience and will give higher weight to neighbors they trust. In this study, we adopt majority voting as the opinion aggregation method (Bao et al., 2018). On the other hand, for the pure greedy processes, such as the reinforcement learning- based interaction, agents can be trapped easily in local sub-optima, and thus fail to learn the 14 optimal behavior. During learning, an agent needs to strike a balance between exploitation of learned knowledge and the exploration of unexplored environments in order to try more actions, escape from local sub-optima, and learn optimal behavior. In this paper, we propose time-varying exploration inspired by simulated annealing (SA exploration) for dealing with exploitation and exploration during learning by treating the exploration rate as temperature. One step of SA exploration is given by Equation 2. (2) In Equation 2, µt is the exploration rate in the tth round of simulation, and µ0 is the initial exploration rate. At the beginning (t is small), exploration should be given higher weight to explore the unknown environment. As the algorithm continues (t increases), the probability of exploitation (i.e., 1- µt) increases determining that the agent will focus more on exploitation of learnt knowledge. The collective learning framework is illustrated in Algorithm 2: during the interaction with neighbors, agents need to find a best-response action regarding each neighbor with a Q- learning method. At each time step t, regarding each neighbor j, agent i chooses the best- response action with the highest Q-value with a probability of 1- µt (i.e., exploitation), or chooses an action randomly with a probability of µt (i.e., exploration). This occurs in the process of local interaction with neighbors. We call this process Local SA Exploration. When agents use specific ensemble methods to aggregate all the best-response actions into an overall action, agents choose the overall action under ensemble methods with a probability of 1- µt (i.e., exploitation), or choose an action randomly with a probability of µt (i.e., exploration). This occurs in the process of overall aggregation. We call this process Global SA Exploration. 15 In general, a small exploration rate (such as 10%) is kept throughout to conserve a small probability to explore. Algorithm 2: The collective learning framework. Social learning theory is connected with social behavior and learning and proposes that new behavior can be obtained by observing and imitating others’ behavior (Bandura and Walters, 1977). In real life, people not only can learn through their individual trial-and-error experiences (i.e., individual Q-learning to determine best-response actions), but also seek suggestions or advice from others in a society (as mentioned in opinion aggregation). Furthermore, they can also learn from the information directly provided by others through communication, observation, and imitation (Polikar, 2006). We are inspired by social learning 16 theory to add an imitation process after learning to promote the evolution of cooperation. After reproduction and learning, there is a new population with better performance in multiagent societies. In every time step, when agent i updates the cumulative payoff Ei, agent i in this new population adopts neighbor agent j’s behavior, replacing its heritable behavior, with a probability W. Following Szabo and Toke (Szabo & Toke, 1998), we set: (3) Here, E’i and E’j are the cumulative payoff of agent i and neighbor j after updating actions and payoffs. K represents some noise which is introduced to consider irrational choices. For K = 0 agent i adopts neighbor j’s strategy if E’j > E’i. Here we set K = 0.1. D. Model Performance To test the model, we use the Watts-Strogatz model (Watts & Strogatz, 1998) to generate a small-world network, and use the Barabasi-Albert model (Albert & Barabasi, 2002) to generate a scale-free network. In order to use the Barabasi-Albert model, we start with 2 agents and add a new agent with 1 edge to the network at every step. Because of the rewiring probability ρ, this approach generates a scale-free network following a power law distribution with an exponent γ = 3. We set the maximum number of edges to 1,000,000 for network evolution. Mutation rate η in inheritance is 0.01. Individual Q-learning rate α is 0.1. Average exploration rate in SA exploration is 0.1 for 100 time steps, so the initialized exploration rate µ0 is 0.144 after a simple mathematical calculation. Noise in imitation is set to 0.1. In this study, unless stated otherwise, we use the small-world network as the default network topology because it can evolve into many kinds of networks, and local SA exploration as the exploration mode. Experiments will be run in 100 time steps. 17 1. Comparison of Mutation and Two Types of Exploration We test the situation under 4-action space, i.e., action 0,..., action 3 respectively. Figure 2 shows the asymptotic percentage of cooperative actions (action 0) adopted by the agents when cooperation evolves in the entire society. Initially, each agent randomly chooses an action from action space, so there are about 25% of all agents to choose each action respectively. As our framework moves on, the number of agents who choose action 0 as the cooperative action finally reaches more than 90% in the situation with SA exploration (both local and global). This result means that more and more agents have reached a consensus on that action 0 should be the cooperative action. From Figure 2, we can see that the fraction of cooperators in the society using collective learning with local SA exploration mode is almost 100% which means that almost all the agents have reached a consensus on which action should be the cooperative action. The framework works in the entire society. Figure 2: Fraction of cooperators under different exploration and mutation methods. 18 We further study Figure 2 and we can draw these conclusions: 1). Local exploration is better than global exploration. The fraction of cooperators using collective learning with the global exploration mode is lower than that using collective learning with the local exploration mode. This is because agents explore the environment with a probability of 0.1. However, as agents using local exploration to explore the environment locally (i.e., choosing irrational action during local interaction) and aggregate to an overall action collectively, the randomness caused by the exploration can be removed. In global exploration, agents explore globally when they aggregate all best-response actions into an overall action, the randomness will be kept. 2). Mutation is necessary. The fraction of cooperators with mutation is higher than that without mutation. Although sometimes mutation has a bad influence, indeed, it is the source of novelty. 2. Comparison with Previous Work We mainly compare the performance of our model with Yu et al., 2017. Yu’s work is mainly based on collective reinforcement learning and information diffusion (i.e., communication-based social learning, agents sharing Q table to communicate). As shown in Figure 3, we set the action space as Na = 10 and follow all other parameter settings. Our framework has better performance than the previous study. We additionally test other situations with different action spaces; the results show the same trends. It indicates that our model works for the evolution of cooperation in the entire society. It is indeed effective for robust evolution by combining evolutionary selection, individual learning, collective voting, and social imitation. 19 Figure 3: Comparison with Yu et al., 2017. Through our experimental analysis, we find that there is not much difference in the efficiency of the evolution of cooperation in different sizes of agent population, different opinion aggregation methods, and different network structures. We additionally test models with separated mechanisms. We find: 1). Collective decision-making (opinion aggregation) and imitation will significantly facilitate the evolution of cooperation, especially collective decision-making; 2). Evolutionary selection does cause influence both on the convergence speed and convergence rate, but not as dramatic as collective decision-making or imitation; 3). We could not get any convergence curves in 100 generations during experiments without reinforcement learning-based interactions. To summarize, for robust cooperation evolving in networked agent systems, the potential key factors are: 20 • the way how agents interact with each other. This is also called interaction protocol. For instance, interacting randomly in a population or interacting with neighbors in a network; what game-theoretical situations the interaction is based on. • the way how agents update their learning information through interaction, i.e., what learning strategies (e.g., collective Q-learning (Yu et al., 2017), WoLF-PHC (Win or Learn Fast Policy Hill-Climbing, Bowling & Veloso, 2001), and fictitious play (Monderer & Shapley, 1996)) do agents use to update their learning information? • the way how agents diffuse their learnt information, e.g., communication-based social learning, imitation-based social learning, and observation-based social learning. • whether the entire population evolves in a better direction. Evolving to improve the entire fitness (e.g., reproducing offspring with better performance to increase the entire average fitness) represents an enhancement in the evolution of cooperation. E. Multilayer Networks Our previous model is built on a single-layer social network with representative agents. Let us further consider a variant of the basic model: how does the multilayer structure, for instance, supervisor and subordinates, shape the emergent dynamics of cooperation? Multilayer networks are networks composed of multiple layers of sub-networks, which can be widely found in nature and man-made systems. Each layer represents a distinct interaction, social circle, or timestamp. The analysis of modern networked social and physical systems like online social networks, transportation systems, metabolic and regulatory networks can all benefit from this new paradigm of network science. It can better distinguish and deal with heterogeneous relationships 21 between levels like supervisor/subordinate, opinion leader/follower, and hierarchical ecosystems than the previous flattening paradigms. The formalized social structure is a multilayered networked structure with dominance between higher and lower layers. This networked structure is divided into multiple small groups Gx ∈ (1, ..., X), where X indicates the number of small groups in the society, and x indicates the supervisor of a particular group. Supervisors of lower-level subordinates can also be subordinates of higher-level supervisors. Notice that supervisor agents are not global controllers. They make decisions based on local information reported by subordinates. So, we combine up- level supervision with bottom-level individual learning in the model. An agent i interacts with its rivals who connect with it directly and uses Q-learning with an exploration algorithm with the SA exploration rate µt, which have been documented in the original model, to choose a best-response action ai. Then agent i receives the corresponding payoff ri according to Table 1. After action-selection, focal agent i stores action ai, cumulative payoff Ri (calculated by the sum of corresponding payoffs interacting with all rivals) and learning parameters in a table. Agent i then reports all of them to agent i’s supervisor x and recognizes the rivals’ previous actions to determine whether to withdraw with a defector with a small probability of 10%. Supervisor x combines all reported best-response actions of its subordinate agents into an overall action ax through the collective learning methods, which has been documented in the original model (Algorithm 2). Supervisor x then interacts with a randomly selected rival in the same cluster of the supervisor layer, and imitates to update ax into a’x according to the performance difference between the overall actions of supervisor x and this rival (e.g., average cumulative payoff), following Equation 3 (Szabo & Toke, 1998). After aggregating the subordinates’ actions and imitation, supervisor x has a final action a’x which will 22 be used as guidance to teach its subordinate agents to act better. Two basic parameters of the learning process, i.e., learning rate α and SA exploration probability µt, will be adjusted based on supervisor x’s steering information among subordinate agent i and the peers within the same subordinate cluster. Finally, all agents update their learning information using the new learning rate and exploration rate. This closed-loop process is iterated for T time steps. The entire picture of the overall algorithmic framework is shown in Figure 4. Figure 4: The overview of the multilayered algorithmic framework. The processes of collective decision-making-based aggregation and imitation have been reported in the original model. While how do supervisors utilize the reported information to guide behavioral change among subordinates? We apply the Adaptive Learning Method. Supervisor x passes down the action ax, after aggregation and imitation, to its subordinate agents i. At each time step t, based on this guidance information from supervisor x, agent i adjusts its 23 behavior. “Adaptive learning” means that the subordinate agents adaptively adjust their learning information based on the supervised information from supervisors. Some previous algorithmic frameworks with high time/space complexity can be used to model this supervised process with the adaptive self-adjustment, such as MiniMax-Q Algorithm (Littman, 1994), Nash Q-learning Algorithm (Hu & Wellman, 2003), and Friend-or-Foe Q- learning Algorithm (Littman, 2001). In this thesis, we introduce a simple but insightful philosophy, i.e., win stay, lose shift (Nowak & Sigmund, 1993), to build this adaptive adjustment method. First, we should define two situations, “win” and “lose,” respectively. If the reported action ai of subordinate i is identical with the supervisor x’s final action ax after aggregation and imitation, this situation is approved by the supervisor and regarded as “win” and “lose” otherwise. The adjustment is conducted in the process of comparison of guidance information and the current situation of subordinates. Two primary parameters, i.e., learning rate α and SA exploration rate µt in Q-learning-based interactions, will be adjusted by accepting guidance. If “win,” the focal agent will decrease both learning rate α and exploration rate µt to stay in the winning situation; otherwise, increase these two parameters to escape from sub- optima, i.e., “shift.” In each step, the degree of adjustment is set to 10%. We show how multilayer structure re-shapes the evolution of cooperation. We first study the influence of cluster size. We fix the subordinate population to a 32 × 32 grid network and vary the cluster size to 2 × 2 (with 16 × 16 supervisors), 4 × 4 (with 8 × 8 supervisors), 8 × 8 (with 4 × 4 supervisors), and 16 × 16 (with 2 × 2 supervisors), respectively, to study the influence of cluster size on the evolution of cooperative hunting, which is shown in Figure 5. We also assume there are two layers where supervisors share ½ of the corresponding payoff, but subordinates share only ¼, introducing unequal payoff sharing in the hierarchical system. 24 In general, we find that with a larger cluster size, a higher level of cooperation can evolve (such as in the cases of cluster size 16 × 16, 8 × 8, and 4 × 4). Both small population size and large cluster size will lead to a broader view of a partial system for the focal supervisor, which facilitates the evolution of cooperation. Additionally, the smaller cluster size leads to more local groups distributed in the entire system and significantly increases the diversity of the system. It will take more effort to step across multiple sub-optima to evolve global cooperation. This will bring a negative influence on the evolution of global cooperation. It is unusual for the case of cluster size 2×2. We find that the curve of cluster size 2 × 2 seems to violate the general trends. That is because of the large number of supervisors. In our game-theoretical settings, supervisors share a higher fraction of payoff than subordinates, and the number of supervisors is fewer than subordinates. In the case of cluster size 2 × 2, there are 256 supervisors and 1024 subordinates. The number of supervisors is significantly higher than that in other cases. This will lead to an increase in average payoff (both initial and convergence value) and compensate for the negative influence caused by the small cluster size. Figure 5: Influence of cluster size on the evolution of cooperation. 25 We then study the dynamics of learning rate and exploration rate. In our model, dynamics in both learning rate and exploration rate are introduced by the “win stay, loose shift” rule and SA exploration. Based on a hierarchical grid network with 12×12 subordinates and 3×3 supervisors, we study parameter dynamics in reinforcement learning-based interactions shown in Figure 6. The start of the exploration rate µt is slightly higher than the learning rate α because of the difference between initialized values. An initialized increasing but decreasing to almost 0 afterward can be seen in both learning rate α and exploration rate µt. It indicates that agents initially do not know which action they can adopt, then they interact to try (“trial and error”); hence both α and µt increase. As the system evolves, agents realize which action they should adopt, in turn, both α and µt decrease until almost 0. Notice that the increase at the initial stage is more significant for learning rate than the exploration rate. SA exploration introduces a continuous decrease in the exploration rate µt as the system evolves. As a result, the difference between the learning rate α and exploration rate µt appears. Figure 6: Evolutionary dynamics of learning rate α and exploration rate µt. 26 Many extensions can be done based on this general framework. For example, more layers can be introduced to build a more complex adaptive learning structure. Many network structures, e.g., small-world and scale-free networks, can be introduced to represent agent-based artificial societies. Some interesting factors, such as kin selection which can be intuitively understood that agents have a higher chance to cooperate with kinship, can be investigated based on this framework. 27 III. OPINION DYNAMICS AND STRUCTURAL CODYNAMICS 4 We have shown the dynamics of cooperation. In this section, we study the dynamics of coordination within sparse-interaction agents based on an open-ended network. A. Opinion Dynamics The study of opinion dynamics, i.e., the study of the formation and dynamics of public opinions, is a crucial research topic in complex systems and social networks. The topic has been widely explored for several decades with theoretical models and real-world applications among different disciplines, including social science, control engineering, statistical physics, and computer science. Elucidation of the mechanisms behind macro-level opinion dynamics is vital for understanding social interactions/dynamics, complexity, distributed control, and decision- making. It also holds valuable lessons to apply to real-world empirical studies and applications like marketing and social media (Mastroeni et al., 2019). Many classic agent-based models have been explored under various assumptions to study opinion dynamics from different perspectives. For example, the Hegselmann-Krause model (Hegselmann & Krause, 2002) studies opinion polarization with the bounded confidence assumption, i.e., agents interact only if their opinions are sufficiently close to each other by falling within a confidence interval. The Sznajd model and its variations (Sznajd & Jozef, 2000) study the evolution of consensus in a closed society through majority voting. In that model, a focal agent polls its complete neighborhood (i.e., the group of agents sharing connections with the focal agent in the social network) and selects the opinion of the majority. However, some 4 This section is based in part on the paper coauthored with Zachary Neal and Wolfgang Banzhaf, “Coevolutionary opinion dynamics with sparse interactions in open-ended societies.” Complex & Intelligent Systems (2022). https://doi.org/10.1007/s40747-022-00810-w 28 assumptions in existing models, e.g., polling the complete neighborhood, seem to be no longer suitable, notably when people with bounded rationality only have a partial view and cannot access the complete neighborhood information in their social networks. Meanwhile, when we interact with neighbors, the literature from psychology suggests, we are mainly concerned with the overall opinion of neighbors (e.g., a joint opinion through collective decision-making), and we adjust our own opinions according to this feedback (Forsyth, 2018). Some other work uses the bounded-confidence assumption (Wang & Shang, 2015) applying dense interactions and serial opinion updates through interacting with all selected neighbors. While existing models thoroughly describe the dynamics of opinions and interactions, they ignore the built-in structural dynamics caused by opinion dynamics and open-endedness, e.g., through the effects of newcomers, leavers, and their impact on structure-opinion coevolution. B. The Proposed Model We address the limitations of prior works in the proposed SCOOE model, which takes into account the bounded rationality (e.g., incomplete information) of agents and the coevolution of structure and opinion in an open-ended society, as shown in Figure 7. Opinion dynamics with two components of sparse interactions (extrinsic and intrinsic forms): The focal agent with a limited view can only access a partial neighborhood. It aggregates a joint opinion of the incomplete neighborhood by collective decision-making. Then the focal agent only takes this joint opinion from the extrinsic incomplete neighborhood into account by the interaction with the joint opinion, rather than by dense interactions with all neighbors or certain neighbors with similar opinions selected by polling the neighborhood. Imitation is also introduced to drive opinion intrinsic adjustments based on observation of the incomplete neighborhood environment 29 without direct interactions with neighbors. The open-ended structure and opinion coevolve where the opinion dynamics affect the leaver exiting society and associated structural dynamics; A joiner with a random opinion joins. It changes the structural features and the neighborhood settings, and the neighborhood settings in turn affect the opinion dynamics. Figure 7: The overview of the SCOOE model. For agent i being exposed to a new opinion, we assume that agent i has a built-in probability of sticking to its opinion, i.e., a stubbornness probability wi. It describes the degree to which an agent relies on its original opinion. In contrast, the complement of stubbornness, i.e., an openness probability 1-wi, quantifies the degree to which agent i is willing to adopt a new opinion derived from the interaction with other agents. Heterogeneity is produced when agents hold different built-in cognitive features represented by different stubbornness (or openness) probabilities. We assume that stubbornness wi follows a probability distribution in the population, like a Poisson or Gaussian distribution. In the experimental section below, we report on the influence of different stubbornness distributions. The opinion Oi of an agent i is represented by a real number in the continuous interval [0,1]. It describes the degree to which an agent believes the propagated information, e.g., news or rumors. A higher value of opinion Oi 30 means that agent i believes the propagated information more strongly. Initially, each agent is assigned a random opinion, i.e., a random number in the range [0,1]. 1. Opinion Dynamics with Sparse Interactions We first discuss the opinions dynamics. The critical theme of opinion dynamics is sparse interaction and incomplete information. People do not serially poll the neighborhood in their social networks to update the opinion but are mainly influenced through considering the joint opinion of others as their feedback (Forsyth, 2018). This contrasts with most agent-based models which are formulated with such complete information assumptions, e.g., polling the entire neighborhood to select neighbors and conduct dense interactions serially to update opinions (Forsyth, 2018; Noorazar, 2020). The literature from psychology suggests two types of motivations for humans to change behavior, extrinsic motivation (people are assimilated into extrinsic environments) and intrinsic motivation (people are motivated by internal desire) (Deci et al., 2001). We take inspiration from this and assume two types of actions forming the sparse interaction, extrinsic collective interactions and an intrinsic observation mechanism. The interplay between these two actions enhances group opinion evolution. However, they play different roles in various stages of the model dynamics reported in the experimental section. We first study the extrinsic collective interaction with incomplete information -- we introduce a collective decision-making approach to incorporate the sparse joint opinion formation and interaction based on a limited neighborhood (i.e., incomplete information). Though several collective decision-making approaches have been proposed in discrete-opinion models, e.g., majority voting (Choi & Goh, 2018), this approach in continuous-opinion models has not been fully developed so far. Suppose a focal agent i with a connection degree di in its 31 social network is able to only access a random subset of neighbors i1, i2,..., ij, where j is randomly chosen and satisfies 1 ≤ j ≤ di. This assumption means incomplete information by a limited view and only partial access to neighbors, and it allows more dynamic interactions, e.g., an agent will not interact with its entire neighborhood. Agent i generates a joint opinion OiJoint of its random partial neighborhood, rather than by interacting with all its neighbors or certain neighbors with similar opinions serially. An intuitive way to generate a joint opinion is by taking the weighted average of the selected neighbors’ opinions (Friedkin & Johnsen, 1990). The weights assigned to different neighbors are proportional to their relative connection degree strength, as shown in Equation 4, where 𝑑!! is the degree of neighbor ik, k ∈ [1, j]. (4) Thus, the more a neighbor is connected in the local network (measured by its relative connection strength), the greater its weight and influence on the joint opinion in the collective decision-making process. Another critical factor in designing an interaction protocol is confirmation bias. That is, people collect and interpret information selectively by trying to follow their original bias (e.g., their original opinions) (Plous, 1993). The most widely adopted interaction protocol with confirmation bias is a bounded confidence model where rational agents owning the perfect information poll their entire neighborhoods and select others to interact only if their opinions fall within a confidence interval (Deffuant et al., 2000; Gomez-Serrano et al., 2012; Wang & Shang, 2015). Here we take inspiration from game theory and model this as an opinion interaction game with confirmation bias among bounded-rational agents with limited information. So, after 32 generating a weighted joint opinion based on limited neighborhood information, agent i with opinion Oi receives a payoff Ri represented by Equation 5. Ri = 1 − |Oi − OiJoint| (5) Equation 5 means that if the opinion Oi of agent i is very different from the joint opinion in its selected neighborhood (the local environment), it receives a low payoff. Neighborhoods with more similar opinions are considered more trustworthy, thus, resulting in a higher payoff. After considering the collective interaction by the game-playing and interaction with the joint opinion, the focal agent i adapts to the neighborhood. Suppose the stubbornness of i is ωi and its openness is 1-ωi, then the adapted opinion OiAdapted of agent i is calculated by Equation 6. It represents a combination of relying on its original opinion and accepting a new opinion (Centola & Macy, 2007; Friedkin & Johnsen, 2009). OiAdapted = Oi × ωi + OiJoint × (1 − ωi) (6) We have now seen how agents take advantage of extrinsic collective information within their incomplete neighborhoods. Agents also observe the local environment to adjust their opinions to seek a higher payoff. This is driven by intrinsic motivation. People sometimes engage in an activity just because they are drawn to do it (Golman & Loewenstein, 2018; Ryan & Deci, 2000). We apply the imitation rule here that does not need direct interactions but transforms information in the population through observation and self- adjustment (Szabo & Toke, 1998; Pan et al., 2018). Again, a focal agent i only accesses a random partial neighborhood as its observation environment. For these random neighbors, the focal agent i holds a probability 𝑊!,!" to imitate the local best-performing neighbor ir (i.e., the neighbor with the highest cumulative payoff) by adopting ir’s opinion as its own opinion. We still follow Szabo and Toke (Szabo & Toke, 1998), as shown in Equation 3, to model the imitation probability, which is 33 proportional to the gap between cumulative payoffs of the focal agent and the selected neighbor, and we set the noise to µ = 1.5. Thus, agents observe the environment and keep a close eye on the cumulative payoff gap. They then adjust their opinions voluntarily without direct interactions to achieve a greater payoff and a better position in society. In summary, we introduce (i) sparse opinion updates by taking incomplete information- based collective decision-making into account, and (ii) observation and self-adjustment of opinions without direct interaction with neighbors. Sparse interactions are achieved. 2. Open-ended Structural Dynamics This section presents the open-ended structural dynamics with leaving and joining agents and the opinion-structure coevolution. At each time step, the agent with the lowest cumulative payoff leaves the society, which models an intention to exit a society where most individuals have fairly different positions (e.g., opinions). The stubbornness and openness of the leaver are recorded. All adjacent edges of this agent are removed from the society upon leaving. As society evolves, leaver-driven structural dynamics will demonstrate the confirmation bias more strongly because stubborn agents with opinions fairly different from others will have a low payoff leading to their removal from the model. Opinion dynamics affect the cumulative payoff, influence which agents become leavers, and thus drive the structural coevolution of the system. At each time step, a newcomer v will join. As society evolves, the community structure constantly changes. Agent v has incomplete information about different communities. It detects the real-time community structure and attempts to join a random communityv by connecting to randomly selected nodes within communityv. We assign a random opinion Ov to v and the 34 recorded stubbornness/openness of the leaver (see above) to v in order to keep the cognitive ability distribution stable within the society. Note that the cumulative payoff Ev of the incoming agent v is not comparable to that of existing agents when calculating the imitation probability and removing leavers, especially for long-term experiments. We accordingly assume that given v joining at time step tv, agent v’s initialized cumulative payoff Ev is adjusted by the corresponding payoff 𝑅$# at time tv multiplied by the number of completed interactions tv. After initialization, the cumulative payoff Ev is calculated by regularly adding the corresponding payoff 𝑅$ at each time step t until v is removed or the system terminates globally. After joining a community, the newcomer v chooses and connects to a node u in another community. We apply the preferential attachment principle (i.e., nodes with a higher connection degree have a stronger ability to attract new nodes added to the network) because “the rich getting richer” phenomenon is widely observed in real-world societies (Barabasi & Albert, 1999). Thus, the probability pv,u for v choosing u to connect to follows Equation 7. % ∀u ∈ (G − communityv): pu,v ∝ ∑ %$ (7) $ G−communityv represents all of the other communities except for communityv, which the new node v joins. du represents the degree of node u. If only one community exists as the society evolves, the new node joins by connecting to only one node following preferential attachment. An algorithmic view of the SCOOE model is shown in Algorithm 3. 35 Algorithm 3: The SCOOE model C. Model Performance To test the model, we create two small-world networks holding 500 nodes each to model two physically separated groups of people interacting to a certain degree. So, randomly chosen edges connect the two small-world networks. We apply the Watts–Strogatz model to generate an individual small-world network (Watts & Strogatz, 1998). Each node is connected to four nearest 36 neighbors. The rewiring probability is set to 0.05. This structure constitutes the agent society, with each node representing an agent. A focal agent will only consider those agents connected by edges as neighbors and conduct sparse interactions based on the neighborhood. For initialization, we follow prior work from the psychology and computing realms (Das et al., 2014; Meehl, 1992) and set the stubbornness distribution to a Gaussian distribution with a mean of 0.5 and a standard deviation of 0.25. These parameters are chosen so that most values lie between 0 and 1. In addition, we apply a cut-off so that generated random numbers can only lie between 0 and 1, i.e., we constrain stubbornness to the interval between 0 to 1, as shown in Figure 8 (a). Imitation noise is set to µ = 1.5. The simulation is run for 450 Monte Carlo time steps. 1. How Do Group Opinions Evolve? We first study how far the group opinions evolve away from their initial states, measured by the variance dynamics shown in Figure 8. The opinions of agents are reasonably different at the start because agents are assigned random opinions initially. As society evolves, we find two stages of evolution: a fast-decay phase (i.e., the variance of group opinions dramatically decreases from 0.084 to 0.005) and a slow-decrease phase (i.e., the variance slowly continues dropping to 0.003 at the end of the simulation). It is interesting to find that the final opinions are in a relatively narrow band and less polarized without firmly believing or strongly unbelieving the rumors among the agent population, even with some agents never changing their opinions (stubbornness =1) but being removed by the model. The Gaussian stubbornness distribution is also evenly distributed. The majority of the population keeps a balance between maintaining their original opinions and accepting a new opinion. Mirroring reality, we find that agents are more likely to stay open-minded to propagated news/rumors during in an open-ended society. 37 Figure 8: The opinion dynamics under default Gaussian stubbornness. 2. How Do the Opinion Dynamics Shape the Structural Co-dynamics? For this question, we primarily focus on the dynamics of the clustering coefficient, average path length, degree distribution, and community structure. For real-time community detection, we use the most widely used method, namely the modularity-based method (Clauset et al., 2004). The society coevolves to be a holistically dense small-world structure with a heavy-tailed degree distribution. As shown in Figures 9 and 10, we find an increase in the clustering 38 coefficient and average degree, as well as a decrease in average path length and the number of communities. We initialize the society as two interconnected small-world networks. The random edges between them change the initialized small-world features by randomizing them to a certain degree. Thus, we find a chaotic society initially with 26 detected tiny communities and a coevolved society with nine segregated communities by the modularity-based method (Clauset et al., 2004). We also observe that the coevolved society has a small-world feature with a high clustering coefficient. It coevolves to be a more tightly knit group with dense connection degrees, high information transmission efficiency, and a low average path length due to network homophily. That is, the final opinions of the population are relatively consistent, leading to an increase in payoff and a decrease in conflicts (e.g., confirmation bias for fairly different opinions) in the game-playing upon interactions. This coevolutionary trend of the structure in turn boosts the evolution of a global opinion (McPherson et al., 2001) Although some small-world generation models, e.g., the Kleinberg model (Kleinberg, 2002), do not generate heavy-tailed degree distributions, it is not surprising to find a heavy-tailed degree distribution appearing in the SCOOE model. The advantages of “the rich” become significant eventually because of preferentially added joiners. Specifically, we calculate the proportion P(d) of nodes with connection degree d. We find that the relationships between node proportion P(d) and node degree d can be approximated by a linear relationship log[P(d)] ~ (-γ) * log(d) with a negative slope -γ= -2.758 through linear regression within a 95% confidence interval. Note that the data points in Figure 9 (f) represent the average degree and the node proportion in different degree ranges. We only study the nodes in these degree ranges because they fill most of the network. These nodes are enough to illustrate a linear relationship. 39 Figure 9: The structural dynamics. We also find an emergent dialectic relationship between community segregation and cohesion. Cohesion is a concept of togetherness and connectedness among nodes within a network. There is no unified definition of cohesion because it depends on the context. Previous literature has referred to it as cliques/communities, clusters, or average degree (Kolaczyk & Csardi, 2014). Figure 10 shows our assessment of the community segregation and cohesion. 40 Figure 10: The community structure. As shown in Figure 10, the initialized society is desegregated and chaotic with a low level of cohesion (i.e., with a low average degree and clustering coefficient). As society coevolves, we find it has a clear pattern of fewer segregated communities that become densely clustered (i.e., with a high average degree and clustering coefficient). Agents have disconnected social networks initially but highly cohesive social clusters eventually. It is interesting to note that society becomes segregated but dense spontaneously and simultaneously with a global consensus and cohesion, but without multiple local-opinion “barycenters” that might emerge aligned with segregated communities (Gomez-Serrano et al., 2012) Mirroring reality, as previous work suggests (Neal & Neal, 2014), a widely observed example in the real world is policy- making to reduce detrimental residential segregation. A widely adopted approach to introduce desegregated neighborhoods and reduce residential segregation is to improve cohesion, e.g., dense connections. However, a paradox exists between community segregation and cohesion. The society evolves to be dense with segregated communities, whereas a desegregated society is not as cohesive as we would expect. 41 3. What Factors Affect the Evolved Global Opinion? It is reasonable to suspect that the degree of stubbornness affects the emergence of a global opinion. Additionally, the SCOOE model incorporates multiple types of dynamics. What effect do these dynamics have on the evolution of a final opinion? This section will address these questions. To study the influence of stubbornness distributions, we also test a Beta distribution and a Poisson distribution. We initialize the Beta distribution with two positive shape parameters α = 7 and β = 1, and the Poisson distribution with the expected rate of occurrences λ = 1. We normalize the two generated distributions with the maximum value representing stubbornness=1. The evolved opinions in these two cases are shown in Figure 11. The variance comparison with different stubbornness distributions is shown in Figure 12 (a). The variance comparison is defined as the ratio of the opinion variance for the Poisson/Beta stubbornness distributions to that for the baseline Gaussian stubbornness distribution at each time step t, t in [0, 450]. The variance dynamics with different sparse interaction mechanisms in a population with a Gaussian stubbornness distribution are shown in Figure 12 (b). Stubbornness is generally small in a population with Poisson stubbornness. Agents are very flexible to become followers of the propagated news/rumor. As a result, it will be easier to pass the fast-decay phase, and we observe an initial lower variance than the baseline shown in Figure 12 (a). Because of the flexibility in updating opinions, evolved opinions are still inconsistent at the end of 450 time steps, and the final variance is relatively large. In contrast, the agent population generally has much higher Beta stubbornness. Accordingly, we find an initial increase in the variance ratio to pass the fast-decay phase shown in Figure 12 (a). Because of the high stubbornness, final opinions are stable with few changes, and lower final variance than the baseline can be observed. 42 It is challenging to drive the global opinion evolution among a stubborn population, e.g., the initially weak emergence of the global opinion in the population with high Beta stubbornness. However, it is interesting to find the most unified global consensus in such a society with many agents only weakly changing opinions. This unusual phenomenon is due to the open-endedness of society. The most stubborn agents will be considered maladapted to the environment and removed as society evolves. Agents will be assimilated by agents who surround them. No matter the initial opinions they hold in stubborn crowds, they will finally have a relatively unified group consensus after the long-term interactions and the slow assimilation of opinions crowding out dissidents in an open-ended society. We can say that these high stubbornness values serve as a “wall” — newcomers with similar opinions will be accepted, while newcomers with opinions out of this range will be removed quickly. We additionally test the model without the intrinsic self-adjustment mechanism as shown in Figure 12 (b). A widely studied contagion phenomenon in social networks is that the chance to adopt a contested “innovation” (e.g., firmly believing a piece of rumor) will be smaller for an individual with more neighbors (Centola & Macy, 2007; Granovetter, 1973). When a focal agent aggregates the joint opinion by collective decision-making, extreme opinions (e.g., a strong endorsement) of selected neighbors are neutralized by weighted averaging. This effect will be more significant for high-degree nodes, given the larger share of their neighbors. On the other hand, high-degree nodes with a fewer likelihood of being extreme have a more substantial impact on the weighted aggregation method and a more extensive influence range. At the same time, collective interactions decrease the probability of interacting directly with extreme agents and being affected by them. So, the extrinsic collective interaction mechanism boosts the emergence of a global consensus, as shown in the similar trends of the fast-decay phase in the 43 two cases in Figure 12 (b). It plays fewer roles when the population rapidly reaches a pre- consensus (the start of the slow-decrease phase in variance dynamics), given the constantly adapted local interaction environment with the randomness to select neighbors, the joiners/leavers, and a constant injection of new opinions. The intrinsic adjustment mechanism continues to further the emergence of a global consensus and weakens conflicts by direct imitation. It can be said that extrinsic collective interactions primarily play a role in the fast- decay phase of the variance dynamics, whereas intrinsic adjustments mainly play a role in the slow-decrease phase. Their interplay works to enhance the evolution of a global opinion. Note that when we set the self-adjustment noise µ to a very large value, we can observe similar results to the case of removing the self-adjustment mechanism. Figure 11: The evolved opinions with different stubbornness distributions. 44 Figure 12: Opinion dynamics with different distributions and mechanisms. D. Discussions We revisit and discuss the proposed mechanisms by focusing on their effects on the consensus evolution within groups. Lean and fast decision strategies can be produced with incomplete information. A broad assumption in the widely cited bounded confidence model is that rational agents owning the perfect information poll their neighborhood and select neighbors to interact with only if their opinions are sufficiently close to their own. This assumption facilitates polarization and global conflicts (Gomez-Serrano et al., 2012). It has been widely recognized that it is difficult to evolve a global consensus for large population sizes (Iniguez et al., 2009; Kou et al., 2012; Yu et al., 2015), because local consensus might be distributed in a society. As a result, such a system needs more bottom-level interactions to pass the formation of the local consensus. Our results validate several earlier findings with different mechanisms and remarkably boost the evolution even in a stubborn population (Lim et al., 2014; Semonsen et al., 2018; Weisbuch et al., 2003). Unlike some bounded confidence models, e.g., Deffuant et al., 2000 and Gomez-Serrano et al., 2012, 45 here we start by assuming that bounded-rational agents only access a partial neighborhood (incomplete information) to aggregate a joint opinion. Confirmation bias is represented by the stipulation that adopting more similar opinions will bring a higher payoff. We find that conflicts among bounded-rational agents are weakened globally and rapidly. Bounded rationality with incomplete information forms lean and fast decision strategies to reduce conflicts under uncertainties, whereas complete information weakens group coordination, as suggested by some literature from psychology (Gigerenzer & Selten, 2002). Open-endedness enables permanently novel opinions. We find that eventually evolved opinions are wholly unified in some closed-society models (Semonsen et al., 2018). The continuous addition and removal of agents and the structure/opinion dynamics they bring with them influence neighbors and their surroundings in a cascading fashion. Though the designed mechanisms strongly facilitate the evolution, it is impossible to reach a highly unified global consensus. One can only approach it no matter whether the randomness or noise exists, as the slow-decrease phase in variance dynamics. It can also be said that the SCOOE model is robust to boost and enhance the evolution of a global opinion as it successfully defends against the interference of a constant injection of novel opinions. The interplay between sparse interaction and open-ended structure reduces the echo chamber effect. The echo chamber effect in social media studies describes a situation where local opinions are reinforced by repetition inside a closed society and insulated from rebuttal or different opinions (confirmation bias). Surprisingly, a substantial body of research indicates that people are not as polarized as we would expect in the echo chamber, both empirically (Balietti et al., 2021; Bar-Gill et al., 2020; Hosseinmardi et al., 2021; Shore et al., 2016) and theoretically (Lim et al., 2014; Semonsen et al., 2018; Weisbuch et al., 2003). We offer two possible 46 theoretical justifications for this apparent discrepancy between evidence and intuition: From the perspective of opinion dynamics, the focal agent considers the collective opinion based on a limited view of the neighborhood, which reduces polarization quickly. From a structural dynamic perspective, the society in our model (and also in the real world) is open-ended and constantly changing. It imparts persistent dynamics on the neighborhood structure, resulting in neighbors with whom the focal agent interacts being neither isolated nor static. When we examine previous models based on a closed structure, some work has shown global/local polarization and extreme opinions (Banisch & Olbrich, 2019; Hegselmann et al., 2002; Mathias et al., 2017). The open- endedness feature with a constant injection of novel opinions in the SCOOE model helps a population defend against the echo chamber effect and stay open-minded. It reduces the chances of extreme results because extreme agents are likely to be removed from society. It also mirrors the findings of a global consensus formation in a population with high Beta stubbornness. In general, we believe that opinion and structural mechanisms are inextricably linked and that their interplay helps reduce polarization. 47 IV. TOWARDS COLLECTIVE INTELLIGENCE This section provides a thorough analysis of proposed models and relates their insights into collective intelligence to widespread applications in other subfields of computer science. Our first model focuses on the evolution of cooperation with two variants. The first variant discusses a framework based on evolutionary selection: the parent agent produces offspring based on its performance in the entire society (cumulative payoff). The offspring inherit the parent's behavior with mutation. Additionally, they improve their inherited behavior via two methods: collective learning and social learning. Collective learning signifies that the focal agent generates an overall best action via collective voting from the set of best actions learned through reinforcement learning with different neighbors. For social learning, we employ one of the simplest methods, imitation, in which the agent imitates others' actions according to the difference in the cumulative payoff. In the second variant, we transfer this evolutionary selection framework to a hierarchical network, with the subordinate agent reporting to the supervisor after learning the overall best action through collective learning. The supervisor obtains the best action of the upper layer by social learning with its upper-layer neighbors and communicates this piece of supervised information to the lower-layer subordinates. Subordinates then adjust the learning and exploration rates in their reinforcement learning protocols according to the supervision. We examine how cooperation emerges from selfish agents using these two variants. In the second model, we focus on opinion dynamics, i.e., how the agent adjusts its opinion in order to better integrate into society. We do not use reinforcement learning as a means of agent interaction. The agent's objective in the first model is to learn the optimal action, a typical Markov decision-making problem - deciding what action to adopt in the next step based 48 on the action, state, and payoff of the previous step. In contrast, for the study of opinion dynamics, the agent attempts to adapt to the environment by evolving and updating the opinion. We use a Bayesian method to update the existing opinion of the focal agent by a weighted average of existing opinion and the collective opinion from neighbors. Payoffs are lower when arguing with those who have the most divergent opinions. The degree to which agents adapt to the environment, represented by the cumulative payoff, drives them to leave and join society selectively. Opinions and social structure co-evolve, and we examine how coordination emerges through the coevolution from initial opinion chaos. These two models, despite their different contexts, both focus on a fundamental issue in multi-agent systems, artificial life, and complexity science: the emergence of coordination. Section I divides principal methodologies in building agent-based computational models into three related categories: how agents are represented, how agents interact, and how the agent population is structured. Two proposed models introduce rich dynamics in these three aspects. In this section, we continue to follow these three lines and discuss two proposed models by examining the following three issues: • First, we adopt an interaction-based perspective on adaptation, i.e., how agents with different representations adapt to the environment through different interaction protocols, such as reinforcement learning and opinion evolution, and how different adaptations impact the macroscopic evolutionary dynamics. • Second, we adopt a structural view to discuss open-endedness and its effect on the robust emergence of macroscopic coordination. 49 • Third, we take a broader view of how research on emergent coordination and collective intelligence in the fields of artificial life, evolutionary computation, and complex systems should be exploited to inspire other subfields of computer science. A. Adaptation Adaptation is an intricate concept. It describes how a species, an individual, or an agent gradually becomes fitted to the environment, say, the ecological system, society, and artificial system, respectively (Pimm et al., 2016). According to different time scales, adaptation has been classified into three categories (Jablonka & Lamb, 2006; Gershenson, 2010; Aguilar et al., 2014). According to Aguilar et al. (Aguilar et al., 2014), a slow adaptation that occurs over many lifetimes is referred to as evolution. The adaptation that occurs at a moderate rate (one lifetime, for example) is referred to as development (including morphogenesis and cognitive developments). A very rapid adaptation that requires only a small portion of a lifetime is referred to as learning. Note that some forms of adaptation may be counterproductive, hindering the population's ability to survive in its environment. Some examples include the behaviors acquired through social or normative learning, such as adopting unrelated children or altruistic behaviors that do not favor relatives or kin (Staddon, 2016). In our models, agents either "learn" the optimal action in a short time (tens of time steps, as shown in Figures 2, 3, 5, and 6 in the cooperation models) or adapt to the environment in a relatively slow manner. If we consider the time an agent spends in the environment to be its lifetime, it may take a large number of lifetimes for the agent population to "evolve" toward the group coordination in an open-ended environment, as shown in the open-ended opinion dynamics model (the slowly decreasing but non-convergent phase of opinion variance, Figures 8 and 12). 50 The evolutionary dynamics of a system are profoundly influenced by different modes of adaptation. In this thesis, we first demonstrate that adaptation through collective interaction facilitates evolution. As shown in Figure 2 of model 1, the "bad" effects of individual-level mutations (a small probability of not adopting the best action) that lead to suboptimal collective decision-making derived from the suboptimal action set are mitigated by voting methods such as majority voting. It is also worth noting that agent representation and structure both influence adaptation. The former, agent representation, aims to enhance the model's capacity to represent reality in the multidisciplinary context. In our models, for instance, we implement cognitively heterogeneous agents. This diversity is reflected in agents' capacity to accept and incorporate new perspectives from others. The hierarchical system with unequal payoff distribution is also introduced to represent the sharing between supervisors and subordinates. Different payoff matrices among agents will have an effect on how they learn optimal actions and adapt to society. Although the cooperation among subordinates is more challenging to emerge because they receive a lower payoff for cooperation than their supervisors, we still observe the global cooperation as supervised information influences how subordinates behave. In biology, ecology, and social sciences, there is a broad practice of using heterogeneous agents to represent different roles and functions of entities within the system (Epstein & Axtell, 1996; Macal & North, 2005). The latter is more concerned with the network structure and neighbor composition. When considering a focal agent without a global view, different niches in which the focal agent exists have different effects on its adaptation. When the number of niches is large (as shown in large cluster sizes in the hierarchical cooperation model), it is possible to form some local coordination distributed in the whole society. Crossing local coordination to form global coordination will be more challenging, as shown in Figure 5. 51 One of the most prominent criticisms of contemporary artificial intelligence and machine learning research is its lack of adaptability, as it has sought to predict and control rather than adapt (Aguilar et al., 2014; Gershenson, 2013). This is the root cause of numerous problems in current deep learning research, ranging from insufficient robustness, inflexible and rigid configuration assumptions (say, fixed input and configuration), and inadequate performance in adapting to novel task settings (Ha & Tang, 2022). It is evident that combining evolution and learning (e.g., Lazaridou & Baroni, 2020) or development and learning (e.g., Chrol-Cannon & Jin, 2014) is a promising approach, particularly for programming adaptive agent behaviors. In this thesis, an attempt is made to create adaptive learning. In the hierarchical cooperation model, we propose that the subordinates' learning and exploration rates are adapted based on their supervisors' social learning information (Figures 4 and 6). There is a vast amount of future work worthy of further investigation in real-world scenarios. Recent efforts have been made on this topic across the fields of machine learning, robotics, and evolutionary computation, especially for studying real-time, online, complex strategic interactions, such as automated trading in stock markets and real-time team formation/task allocation in multi-robot systems (Bloembergen et al., 2015). B. Open-endedness Evolution is unstable. Much research in evolutionary biology and artificial life indicates that the complexity of evolution exhibits an open-ended, limitless expansion (Bedau, 2007). Artificial life as a field in nature aims to create an artificial system to demonstrate how unrestricted evolutionary progress can be achieved (Bedau, 2007). At the OEE (Open-Ended Evolution) workshop in York, Tylar and colleagues provided a summary of the characteristics of open- 52 ended evolution: the ongoing generation of adaptive novelty and the ongoing increase in complexity (Tylar et al., 2016). Based on how we define "end," Banzhaf et al. provided three primary classifications of open-endedness: (1) processes that do not terminate; (2) processes without a specific objective; (3) processes that do not terminate and have no specific objective (Banzhaf et al., 2016). This thesis adopts a structural perspective on open-endedness: In models for the evolution of cooperation, complexity and novelty are increased by continuously generating offspring agents (new nodes) connected to their parents via newly generated edges in the network. In the opinion dynamics model, we introduce joiners and leavers. The joiner detects communities in real time and chooses one to join. Dissidents with significantly divergent opinions leave the network. Together, they influence the evolution of the structure. In both models, the newcomer's new behaviors are subsequently integrated into society. Novelty and complexity are constantly introduced in terms of both behavior and structure. As agents with new behaviors join the open- ended evolutionary system, the new behavior is accompanied by a reorganization of the structure, such as a change in edge configuration. Subsequent novelties are not generated solely by newly added agents and behaviors but also by the simultaneous structural dynamics. We can observe a gradual decrease in the variance of opinions that never converge (Figures 8 and 12). In both models, we find that the open-ended structural characteristic does not necessitate a coordination disruption. Instead, the dominant norm is stable even if it constantly evolves in an open-ended manner. Newcomers actively or passively choose to assimilate into it, which somewhat validates prior works from migration acculturation and cultural evolution (see Mesoudi, 2018). 53 The interplay between open-endedness and adaptation provides a means to improve the model's robustness. Adaptivity of the system enhances the evolution of the global opinion as it successfully defends against the interference of a constant injection of novel opinions. The open- ended society "repairs" the global opinion by expelling dissidents who may disrupt it. Recently, a large body of research in artificial life and evolutionary computation (e.g., Mordvintsev et al., 2020; Sandler et al., 2020) has shown that cellular automata-based neural networks with adaptive interactions between neurons can resist the interference of noise and self-repair when damaged. These works will be discussed in greater detail in the subsequent section on linking the emergent models with other subfields in computer science. The open-ended evolution in our models deserves further investigation because it is incomplete in its current standing. The ongoing generation of new kinds of entities and their behaviors do exist in our models, but they do not result in a new type of adaptation. That is, new agents continue to adapt, learn, or evolve in the same manner as their predecessors. Simultaneously, new agents select actions from the established action set without generating novelty in the strategy complexity. We should not solely focus on evolving particular coordination from a given set of behaviors. Rather than that, we expect the emergence with an active complexity and boundary adjustment. How to design a comprehensive, open-ended evolutionary system with the coexistence of new adaptations, new types of entities, major transitions (e.g., emergence), and the evolved open-endedness (open-endedness as a consequence of evolution) while keeping the model elegant is a fascinating direction (Tylar et al., 2016; Pattee & Sayama, 2019). 54 C. Collective Intelligence Our core in this thesis, group coordination as a form of collective intelligence, is one of the central issues in complex systems, multi-agent systems, and artificial life. The rational choice for an individual agent in the first work is to be uncooperative. The agent population learns to cooperate through the processes of learning and evolutionary selection, as cooperation is unattractive to the individual but advantageous to the population. In the second work, the agent's opinion is very different initially, and there is a constant injection of newcomers with different opinions. Through the evolutionary process, the conflict among agents with different opinions is diminished across the whole society. In our models, the emergence of collective altruism from individual selfishness and the group coordination from initial conflicts are observed frequently, benefitting the population. We hope that the insights of collective intelligence presented in this thesis will serve as a catalyst for other communities in computer science. We list some potential applications below. • Unsupervised learning: Unsupervised learning is a self-organizing process in nature due to the absence of supervision. Artificial life and collective intelligence will unquestionably benefit the development of robust and adaptive unsupervised learning algorithms (Rand, 2006). Clustering, for instance, is a typical unsupervised learning task. A general framework is to model a data instance as an agent in a multi-agent system and implement clustering techniques based on agent attribute similarity (say, distance in the vector space). The process of data clustering is the emergence of agent communities (Chaimontree et al., 2012; Bu et al., 2017). 55 • Robust deep learning: An early idea of this is the Neural Cellular Automata (NCA) (Wulff & Hertz, 1992). A unit in the CA lattice is represented by a neuron of a neural network. Instead of pre-defined CA rules, the neural network learns local interaction rules and updates each neuron’s state based on the interaction with local neighbors. Some recent work improves the NCA from different perspectives. Evolutionary algorithms have been used to train the learning of neural architecture, weights, and local interaction rules (Nichele et al., 2017). Some work is inspired by the NCA and creates noise-resistant deep learning algorithms by adaptively learning a coherent representation of features (Zhou et al., 2022). Adaptive interactions are introduced between neurons using the attention mechanism. The most relevant vectors to the output are shared between neurons and can be flexibly attributed to the highest weights in a collaborative manner (Tang & Ha, 2021; Jian et al., 2019). High resilience to damage is illustrated when almost the entire input image is removed, and the deep learning system is still able to regenerate it (Mordvintsev et al., 2020). Ha and Tang developed a systematical survey discussing the inspiration on robustness, generalizability, and adaptivity from collective intelligence in the current deep learning architectures from four perspectives: (1) image processing, (2) deep reinforcement learning, (3) multi-agent learning, and (4) meta-learning (Ha & Tang, 2022). • Collective robotics: The research on collective robotics, sometimes referred to as “hard” artificial life, transits a large number of ideas in agent interaction, scheduling, and learning to real-world robotic teams (Ferrante et al., 2015). We can program robots for a variety of roles and establish a series of interaction protocols to study tasks like team formation. The emergent property permits robotic teams to manage real-time, online 56 tasks more effectively than centralized control (Pitonakova et al., 2014). For instance, centralized collaboration can emerge from the information sharing between decentralized modules to collaboratively optimize a shared global reward in soft-bodied robot teams (Huang et al., 2020). • Optimization: When agents are viewed as independent problem solvers, the collaboration that emerges between them can solve complex optimization problems that a single agent cannot. A relatively early optimization algorithm based on the concept of agents' emergent collaboration and communication is the cultural algorithm. It is a collective search process in which the agent population and belief space (different knowledge of individual agents in the search space) coevolve by generating a more knowledgeable population and simultaneously updating individual agents' belief space (Reynolds, 1994). The cultural algorithm has inspired substantial research into metaheuristic algorithms and numerous real-life applications, particularly in search-based software engineering. Some instances include optimally assigning different programmers to different tasks (Harman, 2007), bug locating and auto-repair with heuristic approaches (Le Goues et al., 2012), and the co-evolution of programs (prey) and unit tests (predator) (Arcuri & Yao, 2008). Agent-based models are rooted in the fields of artificial life/evolutionary computing and fundamentally interplay with learning, behavioral modeling, cognitive science, and network science. There is a substantial amount of significant work on these topics that not only provides insights into evolutionary, collective intelligence but also inspires other subfields of computer science and even other disciplines. As the reader can see, the context of our thesis closely relates to a variety of social science topics, including opinion polarization, social networks, cooperative behavior, and voting. Since their inception, agent-based models and artificial life have been a 57 natural fit with evolutionary biology. They have produced a large body of classic work (see the review paper by Murphy et al., 2020). In studies of human society, another evolutionary, open- ended, complex adaptive system, similar evolutionary models to artificial life have not received the same level of attention (Kim & Cho, 2006; Arthur, 2014). Our thesis ultimately aims to contribute to the growth of digital evolution, artificial life, and their integration with other topics, such as network science, evolutionary game theory, and machine learning in a cross-disciplinary manner. 58 V. CONCLUSION This thesis studies the evolution of cooperation and opinion dynamics viewed from agent representation, agent interaction, and agent group structure. We are primarily concerned with elucidating agent heterogeneity, different modes of interaction, network structure, and their profound effects on evolutionary dynamics. In addition, we establish a connection between two models and a broader topic, collective intelligence, in terms of interaction-based adaptation and structural open-endedness. We hope that the concepts of evolution, emergence, and coordination presented in the thesis will be beneficial and closely integrated into other fields in order to increase the adaptability, flexibility, and robustness of models. 59 BIBLIOGRAPHY 60 BIBLIOGRAPHY Adami, Christoph, Jory Schossau, and Arend Hintze. "Evolutionary game theory using agent- based methods." Physics of Life Reviews 19 (2016): 1-26. Aguilar, Wendy, Guillermo Santamaría-Bonfil, Tom Froese, and Carlos Gershenson. "The past, present, and future of artificial life." Frontiers in Robotics and AI 1 (2014): 8. Arcuri, Andrea, and Xin Yao. "A novel co-evolutionary approach to automatic software bug fixing." In 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), pp. 162-168. IEEE, 2008. Arthur, W. Brian. Complexity and the economy. Oxford University Press, 2014. Aspray, William. "John von Neumann's contributions to computing and computer science." Annals of the History of Computing 11.3 (1989): 189-195. Attenberg, Lee. Modularity: Understanding the development and evolution of natural complex systems. MIT Press, 2005. Axelrod, Robert, and William D. Hamilton. "The evolution of cooperation." Science 211.4489 (1981): 1390-1396. Bandura, Albert, and Richard H. Walters. "Social learning theory." Vol. 1. Prentice Hall: Englewood Cliffs, 1977. Banisch, Sven, and Eckehard Olbrich. "Opinion polarization by learning from social feedback." The Journal of Mathematical Sociology 43.2 (2019): 76-103. Banzhaf, Wolfgang, Bert Baumgaertner, Guillaume Beslon, René Doursat, James A. Foster, Barry McMullin, Vinicius Veloso de Melo, Thomas Miconi, Lee Spector, Susan Stepney, and Roger White. "Defining and simulating open-ended novelty: Requirements, guidelines, and challenges." Theory in Biosciences 135, no. 3 (2016): 131-161. Bao, Honglin, Qiqige Wuyun, and Wolfgang Banzhaf. "Evolution of cooperation through genetic collective learning and imitation in multiagent societies." In Proceedings of ALIFE 2018: The 2018 Conference on Artificial Life. MIT Press, 2018. Bar-Gill, Sagit, Yael Inbar, and Shachar Reichman. "The impact of social vs. nonsocial referring channels on online news consumption." Management Science 67.4 (2021): 2420-2447. Barabási, Albert-László, and Réka Albert. "Emergence of scaling in random networks." Science 286.5439 (1999): 509-512. 61 Barbati, Maria, Giuseppe Bruno, and Andrea Genovese. "Applications of agent-based models for optimization problems: A literature review." Expert Systems with Applications 39.5 (2012): 6020-6028. Bedau, Mark A. "Artificial life." In Philosophy of Biology, pp. 585-603. North-Holland, 2007. Bloembergen, Daan, Karl Tuyls, Daniel Hennes, and Michael Kaisers. "Evolutionary dynamics of multi-agent learning: A survey." Journal of Artificial Intelligence Research 53 (2015): 659-697. Bowling, Michael, and Manuela Veloso. "Rational and convergent learning in stochastic games." International Joint Conference on Artificial Intelligence. Vol. 17. No. 1. Lawrence Erlbaum Associates Ltd, 2001. Bratman, Michael. Intention, plans, and practical reason. CSLI Publications, 1987. Bu, Zhan, Guangliang Gao, Hui-Jia Li, and Jie Cao. "CAMAS: A cluster-aware multiagent system for attributed graph clustering." Information Fusion 37 (2017): 10-21. Burns, Tom R., and Helena Flam. The shaping of social organization: Social rule system theory with applications. Sage Publications, 1987. Centola, Damon, and Michael Macy. "Complex contagions and the weakness of long ties." American Journal of Sociology 113.3 (2007): 702-734. Chaimontree, Santhana, Katie Atkinson, and Frans Coenen. "A framework for multi-agent based clustering." Autonomous Agents and Multi-Agent Systems 25, no. 3 (2012): 425-446. Choi, Jeehye, and Kwang-Il Goh. "Majority-vote dynamics on multiplex networks with two layers." New Journal of Physics 21.3 (2019): 035005. Chrol-Cannon, Joseph, and Yaochu Jin. "Computational modeling of neural plasticity for self- organization of neural networks." Biosystems 125 (2014): 43-54. Clauset, Aaron, Mark EJ Newman, and Cristopher Moore. "Finding community structure in very large networks." Physical Review E 70.6 (2004): 066111. Connors, Jacob, Scott Graham, and Logan Mailloux. "Cyber synthetic modeling for vehicle-to- vehicle applications." In Proceedings of 2018 International Conference on Cyber Warfare and Security, pp. 594-XI. Academic Conferences International Limited, 2018. Dayan, Peter, and Nathaniel D. Daw. "Decision theory, reinforcement learning, and the brain." Cognitive, Affective, & Behavioral Neuroscience 8, no. 4 (2008): 429-453. de Arruda, Guilherme Ferraz, Emanuele Cozzo, Tiago P. Peixoto, Francisco A. Rodrigues, and Yamir Moreno. "Disease localization in multilayer networks." Physical Review X 7, no. 1 (2017): 011014. 62 Deci, Edward L., Richard Koestner, and Richard M. Ryan. "Extrinsic rewards and intrinsic motivation in education: Reconsidered once again." Review of Educational Research 71.1 (2001): 1-27. Deffuant, Guillaume, David Neau, Frederic Amblard, and Gérard Weisbuch. "Mixing beliefs among interacting agents." Advances in Complex Systems 3, no. 01n04 (2000): 87-98. Dittrich, Peter, Thomas Kron, and Wolfgang Banzhaf. "On the scalability of social order- modeling the problem of double and multi-contingency following Luhmann." Journal of Artificial Societies and Social Simulation 6.1 (2002). Epstein, Joshua M., and Robert Axtell. Growing artificial societies: Social science from the bottom up. Brookings Institution Press, 1996. Ferrante, Eliseo, Ali Emre Turgut, Edgar Duéñez-Guzmán, Marco Dorigo, and Tom Wenseleers. "Evolution of self-organized task specialization in robot swarms." PLoS Computational Biology 11, no. 8 (2015): e1004273. Filatova, Tatiana, Peter H. Verburg, Dawn Cassandra Parker, and Carol Ann Stannard. "Spatial agent-based models for socio-ecological systems: Challenges and prospects." Environmental Modelling & Software 45 (2013): 1-7. Forsyth, Donelson R. Group dynamics. Cengage Learning, 2018. Fossett, Mark, and Warren Waren. "Overlooked implications of ethnic preferences for residential segregation in agent-based models." Urban Studies 42.11 (2005): 1893-1917. Friedkin, Noah E., and Eugene C. Johnsen. "Social influence and opinions." Journal of Mathematical Sociology 15.3-4 (1990): 193-206. Gallegati, Mauro, and Alan Kirman. "Reconstructing economics: Agent-based models and complexity." Complexity Economics 1.1 (2012): 5-31. Georgeff, Michael, Barney Pell, Martha Pollack, Milind Tambe, and Michael Wooldridge. "The belief-desire-intention model of agency." In Proceedings of the 1998 International Workshop on Agent Theories, Architectures, and Languages, pp. 1-10. Springer, Berlin, Heidelberg, 1998. Gershenson, Carlos. "Computing networks: A general framework to contrast neural and swarm cognitions." Paladyn, Journal of Behavioral Robotics 1, no. 2 (2010): 147-153. Gershenson, Carlos. "Facing complexity: Prediction vs. adaptation." In Complexity Perspectives on Language, Communication and Society, pp. 3-14. Springer, Berlin, Heidelberg, 2013. Gigerenzer, Gerd, and Reinhard Selten, eds. Bounded rationality: The adaptive toolbox. MIT Press, 2002. 63 Golman, Russell, and George Loewenstein. "Information gaps: A theory of preferences regarding the presence and absence of information." Decision 5.3 (2018): 143-164. Gómez-Serrano, Javier, Carl Graham, and Jean-Yves Le Boudec. "The bounded confidence model of opinion dynamics." Mathematical Models and Methods in Applied Sciences 22.02 (2012): 1150007. Granovetter, Mark S. "The strength of weak ties." American Journal of Sociology 78.6 (1973): 1360-1380. Ha, David, and Yujin Tang. "Collective intelligence for deep learning: A survey of recent developments." arXiv preprint arXiv:2111.14377 (2021). Harman, Mark. "The current state and future of search-based software engineering." In Future of Software Engineering (FOSE'07), pp. 342-357. IEEE, 2007. Hegselmann, Rainer, and Ulrich Krause. "Opinion dynamics and bounded confidence models, analysis, and simulation." Journal of Artificial Societies and Social Simulation 5.3 (2002). Hosseini, Fatemeh Beigom, Saeed Ghorbani, and Reza Rezaeeshirazi. "Effects of perceived autonomy support in the physical education on basic psychological needs satisfaction, intrinsic motivation and intention to perform physical activity in high school students." International Journal of School Health 7.4 (2020): 39-46. Hosseinmardi, Homa, Amir Ghasemian, Aaron Clauset, Markus Mobius, David M. Rothschild, and Duncan J. Watts. "Examining the consumption of radical content on YouTube." Proceedings of the National Academy of Sciences 118, no. 32 (2021): 166-177. Hu, Junling, and Michael P. Wellman. "Nash Q-learning for general-sum stochastic games." Journal of Machine Learning Research 4. Nov (2003): 1039-1069. Huang, Wenlong, Igor Mordatch, and Deepak Pathak. "One policy to control them all: Shared modular policies for agent-agnostic control." In International Conference on Machine Learning, pp. 4455-4464. PMLR, 2020. Iniguez, Gerardo, János Kertész, Kimmo K. Kaski, and R. A. Barrio. "Phase change in an opinion-dynamics model with separation of time scales." Physical Review E 83, no. 1 (2011): 016111. Jablonka, Eva, and Marion J. Lamb. "Soft inheritance: challenging the modern synthesis." Genetics and Molecular Biology 31, no. 2 (2008): 389-395. Jian, Muwei, Quan Zhou, Chaoran Cui, Xiushan Nie, Hanjiang Luo, Jianli Zhao, and Yilong Yin. "Assessment of feature fusion strategies in visual attention mechanism for saliency detection." Pattern Recognition Letters 127 (2019): 37-47. 64 Kapetanakis, Spiros, and Daniel Kudenko. "Improving on the reinforcement learning of coordination in cooperative multi-agent systems." In Proceedings of the Second AISB Symposium on Adaptive Agents and Multi-agent Systems II (2002): 119–131. Kim, Kyung-Joong, and Sung-Bae Cho. "A comprehensive overview of the applications of artificial life." Artificial Life 12, no. 1 (2006): 153-182. Kleinberg, Jon. "Small-world phenomena and the dynamics of information." Advances in Neural Information Processing Systems 14 (2001). Langton, Christopher G., ed. Artificial life: An overview. The MIT Press, 1997. Lazaridou, Angeliki, and Marco Baroni. "Emergent multi-agent communication in the deep learning era." arXiv preprint arXiv:2006.02419 (2020). Le Goues, Claire, Westley Weimer, and Stephanie Forrest. "Representations and operators for improving evolutionary software repair." In Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pp. 959-966. 2012. Lim, Dongwon, Hwansoo Lee, Hang-Jung Zo, and Andrew Ciganek. "Opinion formation in the digital divide." Journal of Artificial Societies and Social Simulation 17, no. 1 (2014). Littman, Michael L. "Friend-or-foe Q-learning in general-sum games." In Proceedings of the 2nd International Conference on Machine Learning, vol. 1, pp. 322-328. 2001. Littman, Michael L. "Markov games as a framework for multi-agent reinforcement learning." Machine Learning Proceedings. Morgan Kaufmann, 1994. 157-163. Littman, Michael L., and David H. Ackley. "Adaptation in constant utility non-stationary environments." In Proceedings of the 6th International Conference on Genetic Algorithms, pp. 136-142. 1991. Macal, Charles M., and Michael J. North. "Tutorial on agent-based modeling and simulation." In Proceedings of the Winter Simulation Conference, pp. 14-pp. IEEE, 2005. Mastroeni, Loretta, Pierluigi Vellucci, and Maurizio Naldi. "Agent-based models for opinion formation: A bibliographic survey." IEEE Access 7 (2019): 58836-58848. Mathias, Jean-Denis, Sylvie Huet, and Guillaume Deffuant. "An energy-like indicator to assess opinion resilience." Physica A: Statistical Mechanics and its Applications 473 (2017): 501-510. Matsuo, Yutaka, Junichiro Mori, Masahiro Hamasaki, Takuichi Nishimura, Hideaki Takeda, Koiti Hasida, and Mitsuru Ishizuka. "Polyphonet: An advanced social network extraction system from the web." Journal of Web Semantics 5, no. 4 (2007): 262-278. McPherson, Miller, Lynn Smith-Lovin, and James M. Cook. "Birds of a feather: Homophily in social networks." Annual Review of Sociology 27.1 (2001): 415-444. 65 Meehl, Paul E. "Factors and taxa, traits and types, differences of degree and differences in kind." Journal of Personality 60.1 (1992): 117-174. Mesoudi, Alex. "Migration, acculturation, and the maintenance of between-group cultural variation." PLoS One 13, no. 10 (2018): e0205573. Monderer, Dov, and Lloyd S. Shapley. "Fictitious play property for games with identical interests." Journal of Economic Theory 68.1 (1996): 258-265. Mordvintsev, Alexander, Ettore Randazzo, Eyvind Niklasson, and Michael Levin. "Growing neural cellular automata." Distill 5, no. 2 (2020): e23. Murphy, Kilian J., Simone Ciuti, and Adam Kane. "An introduction to agent‐based models as an accessible surrogate to field‐based research and teaching." Ecology and Evolution 10, no. 22 (2020): 12482-12498. Nasution, Mahyuddin KM, and Shahrul Azman Noah. "Information retrieval model: A social network extraction perspective." In Proceedings of the International Conference on Information Retrieval & Knowledge Management, pp. 322-326. IEEE, 2012. Neal, Zachary P., and Jennifer W. Neal. "The (in)compatibility of diversity and sense of community." American Journal of Community Psychology 53.1-2(2016):1-12. Nichele, Stefano, Mathias Berild Ose, Sebastian Risi, and Gunnar Tufte. "CA-NEAT: Evolved compositional pattern producing networks for cellular automata morphogenesis and replication." IEEE Transactions on Cognitive and Developmental Systems 10, no. 3 (2017): 687-700. Noorazar, Hossein. "Recent advances in opinion propagation dynamics: A 2020 survey." The European Physical Journal Plus 135.6 (2020): 1-20. Nordhausen, Klaus. Statistical analysis of network data with R. Springer Nature, 2015. Nowak, Martin, and Karl Sigmund. "A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game." Nature 364.6432 (1993): 56-58. Pan, Qiuhui, Xuesong Liu, Honglin Bao, Yu Su, and Mingfeng He. "Evolution of cooperation through adaptive interaction in a spatial prisoner’s dilemma game." Physica A: Statistical Mechanics and its Applications 492 (2018): 571-581. Pattee, Howard H., and Hiroki Sayama. "Evolved open-endedness, not open-ended evolution." Artificial Life 25, no. 1 (2019): 4-8. Pimm, Stuart L., Gareth J. Russell, John L. Gittleman, and Thomas M. Brooks. "The future of biodiversity." Science 269, no. 5222 (1995): 347-350. 66 Pitonakova, Lenka, Richard Crowder, and Seth Bullock. "Understanding the role of recruitment in collective robot foraging." In ALIFE 14: The Fourteenth International Conference on the Synthesis and Simulation of Living Systems, pp. 264-271. MIT Press, 2014. Plous, Scott. The psychology of judgment and decision making. Mcgraw-Hill Book Company, 1993. Polikar, Robi. "Ensemble based systems in decision-making." IEEE Circuits and Systems Magazine 6.3 (2006): 21-45. Rand, William. "Machine learning meets agent-based modeling: When not to go to a bar." In Proceedings of the Conference on Social Agents: Results and Prospects. 2006. Rao, Anand S., and Michael P. Georgeff. "BDI agents: From theory to practice." In Proceedings of the First International Conference on Multiagent Systems -AAAI 1995, vol. 95, pp. 312-319. 1995. Reynolds, Craig W. "Flocks, herds and schools: A distributed behavioral model." In Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pp. 25-34. 1987. Reynolds, Robert G. "An introduction to cultural algorithms." In Proceedings of the Third Annual Conference on Evolutionary Programming, Vol. 24, pp. 131-139. River Edge: World Scientific, 1994. Sandler, Mark, Andrey Zhmoginov, Liangcheng Luo, Alexander Mordvintsev, and Ettore Randazzo. "Image segmentation via cellular automata." arXiv preprint arXiv:2008.04965 (2020). Savarimuthu, Bastin Tony Roy, Rexy Arulanandam, and Maryam Purvis. "Aspects of active norm learning and the effect of lying on norm emergence in agent societies." In Proceedings of the 13rd International Conference on Principles and Practice of Multi- Agent Systems, pp. 36-50. Springer, Berlin, Heidelberg, 2011. Schelling, Thomas C. "Dynamic models of segregation." Journal of Mathematical Sociology 1.2 (1971): 143-186. Semonsen, Justin, Christopher Griffin, Anna Squicciarini, and Sarah Rajtmajer. "Opinion dynamics in the presence of increasing agreement pressure." IEEE Transactions on Cybernetics 49, no. 4 (2018): 1270-1278. Shore, Jesse, J. Baek, and C. Dellarocas. "Network structure and patterns of information diversity on Twitter." MIS Quarterly 42.3(2018): 849-872. Staddon, John Eric Rayner. Adaptive behavior and learning. Cambridge University Press, 2016. Szabó, György, and Csaba Tőke. "Evolutionary prisoner's dilemma game on a square lattice." Physical Review E 58.1 (1998): 69-73. 67 Sznajd-Weron, Katarzyna, and Jozef Sznajd. "Opinion evolution in closed community." International Journal of Modern Physics C 11.06 (2000): 1157-1165. Tang, Yujin, and David Ha. "The sensory neuron as a transformer: Permutation-invariant neural networks for reinforcement learning." Advances in Neural Information Processing Systems 34 (2021). Taylor, Matthew E., Shimon Whiteson, and Peter Stone. "Comparing evolutionary and temporal difference methods in a reinforcement learning domain." In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, pp. 1321-1328. 2006. Taylor, Tim, Mark Bedau, Alastair Channon, David Ackley, Wolfgang Banzhaf, Guillaume Beslon, Emily Dolson, Tom Froese, Simon Hickinbotham, Takashi Ikegami, Barry McMullin, Norman Packard, Steen Rasmussen, Nathaniel Virgo, Eran Agmon, Edward Clark, Simon McGregor, Charles Ofria, Glen Ropella, Lee Spector, Kenneth O. Stanley, Adam Stanton, Christopher Timperle, Anya Vostinar, and Michael Wiser. "Open-ended evolution: Perspectives from the OEE workshop in York." Artificial Life 22, no. 3 (2016): 408-423. Thrun, Michael C., and Alfred Ultsch. "Swarm intelligence for self-organized clustering." Artificial Intelligence 290 (2021): 103237. Villani, Marco, Stefano Benedettini, Andrea Roli, David Lane, Irene Poli, and Roberto Serra. "Identifying emergent dynamical structures in network models." In Recent Advances of Neural Network Models and Applications, pp. 3-13. Springer, Cham, 2014. Wang, Hanfu, Weidong Chen, and Jingchuan Wang. "Heterogeneous multi-agent routing strategy for robot-and-picker-to-good order fulfillment system." In Proceedings of International Conference on Intelligent Autonomous Systems, pp. 237-249. Springer, Cham, 2018. Wang, Huanjing, and Lihui Shang. "Opinion dynamics in networks with common-neighbors- based connections." Physica A: Statistical Mechanics and its Applications 421 (2015): 180-186. Wang, Yao, Jie Zhang, and Julita Vassileva. "Super-agent-based reputation management with a practical reward mechanism in decentralized systems." In Proceedings of the 2010 IFIP WG 11.4 International Conference on Open Research Problems in Network Security Vol. 11. 2010. Watts, Duncan J., and Steven H. Strogatz. "Collective dynamics of 'small-world' networks." Nature 393.6684 (1998): 440-442. Weisbuch, Gérard, Guillaume Deffuant, Frederic Amblard, and J-P. Nadal. "Interacting agents and continuous opinions dynamics." In Heterogenous Agents, Interactions and Economic Performance, pp. 225-242. Springer, Berlin, Heidelberg, 2003. 68 Wulff, N., and J. A. Hertz. "Learning cellular automaton dynamics with neural networks." Advances in Neural Information Processing Systems 5 (1992). Young, H. Peyton. "The economics of convention." Journal of Economic Perspectives 10.2 (1996): 105-122. Yu, Chao, Zhen Wang, Hongtao Lv, Honglin Bao, and Yapeng Li. "Collective learning and information diffusion for efficient emergence of social norms." In Multi-agent and Complex Systems, pp. 193-210. Springer, Singapore, 2017. Yuan, Ruiping, Juntao Li, Xiaolin Wang, and Liyan He. "Multirobot task sllocation in e- Commerce robotic mobile fulfillment systems." Mathematical Problems in Engineering 2021 (2021). Zhao, Yiyi, and Gang Kou. "Bounded confidence-based opinion formation for opinion leaders and opinion followers on social networks." Studies in Informatics and Control 23.2 (2014): 153-162. Zhou, Ryan, Christian Muise, and Ting Hu. "Permutation-invariant representation of neural networks with neuron embeddings." In Proceedings of the European Conference on Genetic Programming (Part of EvoStar), pp. 294-308. Springer, Cham, 2022. 69