4. . a in? 4%? fig: . ..i .. N This is to certify that the dissertation entitled Modeling Genetic Algorithm Dynamics for OneMax and Deceptive Functions presented by Bulent Buyukbozkirli has been accepted towards fulfillment of the requirements for the Doctoral degree in Mathematics $122229)“, ’Maior F'rofessor’s Signature M £41 :2 00% fl / Date MSU is an Affinnative Action/Equal Opportunity Institution M‘Chlgan State University PLACE IN RETURN BOX to remove this checkout from your record. To AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 6/01 c:/CIRC/DateDue.p65-p. 15 MODELING GENETIC ALGORITHM DYNAMICS FOR ONEMAX AND DECEPTIVE FUNCTIONS BY Bulent Buyukbozkirli A DISSERTATION Submitted to Michigan State University In partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Mathematics 2004 ln this (Onel mean The crossc devek indivic CiOSSi QGDer (lowe‘ mode ODGra mode One” ”0884 IIXed ‘ ABSTRACT MODELING GENETIC ALGORITHM DYNAMICS FOR ONEMAX AND DECEPTIVE FUNCTIONS By Bulent Buyukbozkirli In this dissertation, we develop a model predicting dynamics of the counting-ones (OneMax) and a form of deceptive function problems. The model describes the mean allele and, in the case of deceptive function, mean deceptive block values. The genetic algorithm (GA) that is being modeled consists of two-point crossover, fitness proportional selection and mutation operators. The model is developed to estimate the average GA dynamics, but it can also be used for an individual run of the GA. First, we develop the model for the OneMax problem with very high crossover rates. Then, we modify the model by using statistics of very early generations from GA runs, to describe the complete dynamics for different (lower) crossover rates of the OneMax problem. In the development of the model, we introduce a new quantity that measures the effect of the crossover operation and is independent of generation, for practical purposes. Then, the model is generalized to cover other cases of the OneMax, such as weighted OneMax, as well as a form of deceptive function problem, for high enough crossover rates. The model is also modified to include Boltzmann selection with fixed or scaled selection pressures. even Botza Since dece: a Sui aflgon gene: shnpl nontr dfite “Wet " st eq'u; SQ!“- The model can be applied to OneMax and deceptive function problems even when the crossover, mutation and the selection pressures (in the case of Boltzmann scaling) are changed at predetermined generations during a GA run. Since our model estimates the mean value (and, mean deceptive block value for deceptive function) at each locus at any generation, it can be used to determine a suitable migration time as well as the migration rate for parallel genetic algorithms (in the island model case) when migrations are allowed at any generation for our benchmark problems. Although the problems for which our model proved successful were rather simple or idealized, they were often sufficiently involved to capture interesting nontrivial features of the GA dynamics. The author hopes to extend the approach to model solution of more representative real-world problems with various degrees of OneMax similarity and various amounts of deception. At the end of the dissertation, an attempt to develop a stochastic differential equation model to predict the evolution of the fitness distribution for the OneMax problem is also presented. Although our attempt was not successful, it shows a strong connection between fitness evolution and certain diffusion equations and points toward the possibility of the existence of a diffusion-type equation that could describe the OneMax and maybe other type of GA dynamics. DLCS gene: the 5 very ll algori thecfl hketc NMXDI resee for re My v mOm bGIOI Work the g ACKNOWLEDGEMENTS The front page of this dissertation has the name of only one person; however it came into existence by the collective being of many others. Dr. Goodman has introduced me to the interesting, open questions in the field of genetic algorithms. Through long and fruitful discussions with him, the details of this study have emerged. He was also very patient to read the manuscript in its very raw form and suggest many improvements. My involvement in the genetic algorithm theory has started by the suggestions and guidance of Dr. MacCluer in the direction of applying it to solve a practical power electronics problem. I would like to also express my thanks to the other members of my thesis committee: Dr. McCleer and Dr. Peng for their suggestions about the practical application of my research in the field of electrical power inverters, and Dr. Newhouse and Dr. Hall for reading my thesis and for their insightful questions. My wife, Figen, has been the strongest source of inspiration and love at every moment of my studies. Our son, Eren Baray, who was born just a couple months before the completion of this dissertation, brought the energy and will to bring this work into its final form at its hardest stages. I am also grateful for the prayers and the good wishes of my parents, elders of my family and all the friends. LIST ._ LIST CHAl Gen 1.1 1.2 d 1.3 1.4 ##u—A —L 1.5 CHA A Ml 2.1 2.2 TABLE OF CONTENTS LIST OF TABLES .............................................................................................. VII LIST OF FIGURES ........................................................................................... VIII CHAPTER 1 Genetic Algorithms: Theory and Models .......................................................... 1 1.1 Thesis Outline ............................................................................................... 1 1.2 What is a Genetic Algorithm? ....................................................................... 3 1 .2.1 Counting-Ones (OneMax) .............................................................. 10 1.2.2 Cumulants as a Tool to Observe GA Evolution ............................. 12 1.2.3 Counting-Ones and Migration ........................................................ 17 1.3 GA with a Real-Life Problem ...................................................................... 25 1.4 Theoretical Models of GA ........................................................................... 29 1.4.1 Schema Theory ............................................................................. 30 1.4.2 Random Walk Model ..................................................................... 34 1.4.3 Markov Chain Model ...................................................................... 35 1.4.4 Statistical Mechanics Model .......................................................... 37 1.5 The Need for a New Model ......................................................................... 40 CHAPTER 2 A Model of GA Dynamics for the OneMax Problem ....................................... 43 2.1 Problem Description and a Visual Representation of the OneMax Dynamics ..................................................................................... 43 2.2 The Model for OneMax with Fitness Proportional Selection ....................... 53 2.2.1 Selection and Crossover with “High Enough” Crossover Rate .............................................................................. 54 2.2.2 Mutation ......................................................................................... 57 2.2.3 Fitness Variance for “High Enough” Crossover Rates ................... 58 2.2.4 Lower Crossover Rates ................................................................. 60 2.2.5 The Algorithm of the Model ............................................................ 64 2.3 Weighted OneMax Fitness Function ........................................................... 66 2.4 OneMax with Boltzmann Scaling ................................................................ 69 CHAPTER 3 A Model of GA Dynamics for the Deceptive Function Problem .................... 74 3.1 Deceptive Function ..................................................................................... 74 3.2 A Model for the Deceptive Function with Fitness-Proportional Selection ..................................................................................................... 76 3.2.1 Selection and Crossover with “High Enough” Crossover Rate .............................................................................. 78 3.2.2 Mutation ......................................................................................... 84 3.3 3.4 CHAI Coml CHA Cont 5.1 5.2 APF BIB 3.2.3 Fitness Variance ............................................................................ 85 3.2.4 The Algorithm of the Model ............................................................ 88 3.3 Weighted Deceptive Fitness Function ........................................................ 90 3.4 Deceptive Function with Boltzmann Scaling ............................................... 92 CHAPTER 4 Comparison of the Model with GA Experiments ............................................ 95 4.1 OneMax Problem ........................................................................................ 96 4.1.1 Fitness Proportional Selection ....................................................... 96 4.1.2 Boltzmann Scaling ....................................................................... 103 4.1.3 Weighted Fitness Function .......................................................... 113 4.2 Deceptive Function Problem ..................................................................... 127 4.2.1 Fitness Proportional Selection ..................................................... 127 4.2.2 Boltzmann Selection .................................................................... 131 CHAPTER 5 Conclusions and Future Work ....................................................................... 136 5.1 Conclusions .............................................................................................. 136 5.2 Future Work .............................................................................................. 139 APPENDIX.................. _____________ _ _ -_ _ _ - - 142 A Comparison of the OneMax Problem with a Diffusion Model .................. 142 BIBLIOGRAPHY ............................................................................................... 149 vi am (I) fable LIST OF TABLES Table 1.1 The three cases with the average generation numbers when the mean fitness of 95 is achieved .................................. 22 Table 2.1 Notations .......................................................................... 46 vii LIST OF FIGURES Figure 1.1 Genetic Algorithm creating new populations with selection, mutation and crossover operations .................................. 4 Figure 1.2 An example of an initial population together with the fitnesses and relative fitnesses of their chromosomes ..................... 5 Figure 1.3 A possible outcome of fitness proportional selection ....................... 6 Figure 1.4 An example of two-point crossover .................................................. 7 Figure 1.5 An example of mutation ................................................................... 8 Figure 1.6 A GA with multiple populations, arrows showing the direction of migration ....................................................................... 9 Figure 1.7 Evolution of mean fitness over 100 generations in a GA, for a counting-ones problem. Population size = 50, chromosome length = 100, ,6 = 0.2 ........................................ 11 Figure 1.8 Evolution of mean fitness for a counting-ones problem over 200 generations, after the average is taken over 1000 GA runs. Population size = 50, chromosome length = 100, ,6 = 0.2. .................................................................. 12 Figure 1.9 The time evolution of cumulants, K1 and [(2, for two cases of GA. First with selection, mutation and crossover, the second with selection and mutation only. The graphs are obtained by averaging 100 GA runs ................................................................................................ 15 Figure 1.10 The time evolution of cumulants, K3 and K4, for two cases of GA. First with selection, mutation and crossover, the second with selection and mutation only. The graphs are obtained by averaging 100 GA runs ................................................................................................ 16 Figure 1.11 Evolution of fitness distributions for CASE1, CASE2 and CASE3 .................................................................................... 19 Figure 1.12 Evolution of mean and maximum fitness for CASE1, CASE2 and CASE3 ....................................................................... 20 Figure 1.13 Evolution of mean fitness and fitness variance for CASE1, CASE2 and CASE3 ......................................................... 21 viii Figur Figur Figur Figur Figw. Figu Figu Figu Figu Figu Figu Figu Figu: Figure 1.14 Figure 1.15 Figure 1.16 Figure 1.17 Figure 1 .18 Figure 1 .19 Figure 1.20 Figure 2.1 Figure 2.2 Figure 2.3 Figure 2.4 Figure 2.5 Figure 2.6 Figure 2.7 Figure 3.1 A), curves for three cases. pm = 0.001 and ,6 = 0.3 .................. 24 Passage from DC to AC voltage by switch-mode inverters ......................................................................................... 26 A simplified circuit of pulse-width-modulated inverter. ................... 27 The part of the chromosome representing timing of a transistor pair during one period of the desired output voltage ........................................................................................... 29 Template H .................................................................................... 31 The Random Walk ......................................................................... 35 The statistical mechanics model applied to cumulants .................. 38 Selection-mutation-crossover versus crossover- selection-mutation .......................................................................... 43 The chromosomes in the population, mean allele and other notations ............................................................................... 45 The mean allele values, ai’s at each locus i for times t =0, 10, 20, 30, 50 and 100 of a GA run with pc=0.25 and pm=0.001 ................................................................................. 47 Definition of Ah (t) ......................................................................... 49 The experimental average values of Ah(t) as a function of time for h =0, 0.1, 0.2, 0.9 , and the bar graph interpretation. The population size is 50 and the chromosome length is 100 genes. The GA parameters are pc = 0.25, pm = 0.001. The average is taken over 100 experiments .......................................................... 52 The mean value of the correction weights as a function of mean allele levels, h', for crossover rates pc = 0.25, 0.75 and 3. The statistical average is found over 100 runs of the GA. Population size is 50 and chromosome length is 100 genes .................................................. 63 Fitness with weights ....................................................................... 66 The definition of the fitness of a 3-bit deceptive funcfion .......................................................................................... 75 ix Figure Figure Figur- Figure Figuri FIgUfI I I‘Mlgur FiQUr. Figure 3.2 Variables a and yin a 3-bit deceptive function problem .......................................................................................... 77 Figure 3.3 Definitions of p?(t) and p37 (t) for a 3-bit deceptive funcfion .......................................................................................... 79 Figure 3.4 An example of the weighted fitness function for a 3-bit deceptive problem ......................................................................... 91 Figure 4.1 Ah as a function of time for four different cases where the crossover rate is 4 and 25 percent, respectively. There is no mutation. Black lines are the experimental averages over 100 GA runs and thick gray lines are the results obtained by model simulations. Population size is 50 and the chromosome length is 100 genes ..................... 98 Figure 4.2 A), as a function of time for four different cases where the crossover rate is 75 and 300 percent, respectively. There is no mutation. Black lines are the experimental averages over 100 GA runs and thick gray lines are the results obtained by model simulations. Population size is 50 and the chromosome length is 100 genes .................................................. 99 Figure 4.3 Ah as a function of time for two different cases where the mutation rate is 0.1 and 2 percent, respectively. The crossover rate is 50% in both cases. Black lines are the experimental averages over 100 GA runs and thick gray lines are the results obtained by model simulations ................................................................................... 1 00 Figure 4.4 The mean fitness for pa = 25%, 75% and 300%. In each figure four different mUtation rates, pm = 0%, 0.1%, 1% and 2%, are shown. Black lines are the experimental averages obtained by averaging over 100 GA runs and thick gray lines are the results obtained by model simulations .................................................... 102 Figure 4.5 The fitness variance for four different rates of mutation, pm = 0%, 0.1%, 1% and 2%, with crossover rate at 300%. Black lines are the experimental averages obtained by averaging over 100 GA runs and thick gray lines are the results obtained by model simulations ................................................................................... 1 03 Figure 4.6 Figure 4.7 Figure 4.8 Figure 4.9 Figure 4.10 Figure 4.11 Figure 4.12 Figure 4.13 Figure 4.14 Figure 4.15 Figure 4.16 Ah curves for fixed ,6 Boltzmann selection. pm = 0, pc = 60. The first graph is for )6 = 0.1, the second, for ,6 = 0.6 .................................................................... 105 Ah curves for fixed ,6 = 0.1 Boltzmann selection. pm = 0.01, 0.1, 0.2 , p, = 60 ...................................................... 106 The mean fitness graphs for fixed-,6 Boltzmann selection. The graphs show mean fitness for different mutation rates with ,6 = 0.1 or ,6 = 0.6 ...................................... 107 The mean fitness graphs for scaled-,6 Boltzmann selection. The graphs show mean fitness for different mutation rates with )8 = 0.1 or ,6 = 0.6 ...................................... 108 The mean fitness graphs for fixed )6 Boltzmann selection for three different selection pressures ,8 = 0.1, 0.3, 0.6 with pm = 0 ..................................... 109 The fitness variance for fixed ,6 Boltzmann selection for three different selection pressures ,8 = 0.1, 0.3, 0.6 with pm = 0 ..................................... 110 A), curves for scaled-fl Boltzmann selection. pm = 0.001, pc = 60. The first graph for ,6 = 0.1, the second ,6 = 0.6 ........................................................................... 111 The fitness variance for scaled ,8 Boltzmann selection for two different selection pressures ,8 = 0.1 and )6 = 0.6 showing each selection pressure for different mutation rates ............................. 112 Profile A and Profile B for the weighted fitness function; ....................................................................................... 115 Ah curves for Profile-A with fitness proportional selection, pm = 0 and pm =0.2 ................................................ 116 Ah curves for Profile-A for scaled-fl Boltzmann selection. with ,6 = 0.6 and pm = 0, pc = 60. .............................. 117 xi Figure 4.17 Figure 4.18 Figure 4.19 Figure 4.20 Figure 4.21 Figure 4.22 Figure 4.23 Figure 4.24 Figure 4.25 Figure 4.26 Figure 4.27 Mean fitness curves for Profile-A for fitness proportional selection with different pm ’s .................................... 118 Mean fitness curves for Profile-A with Boltzmann selection, scaled ,6 : (a) comparing different ,6 cases; (b) for ,6 = 0.6 with different pm’s ................................... 119 Fitness variance curves for Profile-A: (a) for fitness proportional selection with different pm’s; (b) comparing different ,6 cases ...................................................... 120 Fitness variance curves for Profile-A for ,8 = 0.6 with different pm ’5. Boltzmann selection is with scaled ,6 ................. 121 Ah curves for Profile-B with scaled-,6 Boltzmann selection. (a) ,3 = 0.1 , no mutation; (b) ,8 = 0.6 with no mutation ........................................................................... 122 Ah curves for Profile-B with scaled-,6 Boltzmann selection. ,6 =0.6 with 1% mutation. In all cases pc = 60 ............................................................................. 123 Mean fitness curves for Profile-B, showing fitness proportional selection and Boltzmann selection with ,6=0.1 and ,B=O.6 .................................................................. 124 Mean fitness curves for Profile-B. Boltzmann selection with ,6 = 0.6, and different mutation rates .................................. 125 Fitness variance curves for Profile-B, showing fitness proportional selection and Boltzmann selection with ,6=0.1 and fl=0.6 .................................................................. 126 A: and A; curves for fitness proportional selection. pc = “high enough”. pm = 0. Black lines are from GA run, gray lines are from model simulation .................................... 128 Aff and A; curves for fitness proportional selection with mutation rate pm = 0.001, and p6: “high enough” ....................................................................................... 129 xii Figure 4.28 Mean fitness and fitness variance curves for fitness proportional selection. p6: “high enough”. Pm = 0 , 0.001 ,and 0.2. Black lines are from GA run, gray lines are from model simulation ................................................... 130 Figure 4.29 Af and A; curves for selection with Boltzmann scaling. ,B=O.I p6: “high enough”. Pm = 0. Black lines are from GA run, gray lines are from model simulation .................................................................................... 132 Figure 4.30 Af and A}: curves for selection with Boltzmann scaling. [3:06 pc= “high enough”, pm = 0. Black lines are from GA run, gray lines are from model simulation .................................................................................... 133 Figure 4.31 Comparison of mean fitness and fitness variance curves for selection with Boltzmann scaling for ,6 =0.1, 0.3 and 0.6. pc = “high enough”, pm = 0 ....................... 134 Figure 4.32 Comparison of mean fitness and fitness variance curves for selection with Boltzmann scaling for ,6=0.1, p0: “high enough”, for different mutation rates ............................................................................................. 135 Figure 5.1 A complicated fitness function a composition of several OneMax and deceptive parts .......................................... 140 xiii Chapter 1 GENETIC ALGORITHMS: THEORY AND MODELS 1.1 Thesis Outline Genetic algorithms (GA) are stochastic, heuristic search algorithms that have been applied to a wide range of problems. In Section 1.2, the simple genetic algorithm is described together with some of the ways its dynamics are observed. The counting-ones (OneMax) benchmark problem is also introduced in this section serving as a simple demonstrative example for GA operators. We present a real-world example in Section 1.3, in order to demonstrate a nontrivial problem in which GA can be applied to find an optimum configuration. Section 1.4 contains a review of some of the ways GAs are currently analyzed. Chapter 1 ends with a critique of the current theories in Section 1.5, and explains why we need a new model, which is the motivation in the development of this dissertation. Chapter 2 starts with a detailed description of the OneMax problem. A new and practical model for the simple Genetic Algorithm dynamics of OneMax problem is developed in Section 2.2. The GA that is being modeled consists of two-point crossover, fitness proportional selection and mutation operators. For low crossover rates, the model uses statistics of the early generations of GA runs to describe the dynamics of the problem for all time, using a variety of crossover and mutation rates. In the development of the model, we introduce a new quantity that measures the effect of the crossover operation and is independent of generation, for practical purposes. Then, the model is generalized, in Section 2.3 and 2.4, to cover the weighted OneMax and the Boltzmann selection with fixed or scaled selection pressures. Chapter 3 follows a similar pattern as in Chapter 2, but for a type of deceptive function in stead of OneMax. The model in this chapter is developed only for high enough crossover rates. It estimates the simultaneous evolution of the mean allele and the mean all-1 deceptive blocks, as well as the mean fitness and fitness variance of the population. The model for lower crossover rates has still been under investigation by the author at the time this dissertation was written. In Section 3.1 the deceptive function is described and the modeling problem is formulated. The model for fitness proportional selection is developed in Section 3.2, which is followed by the extensions of the model to weighted deceptive function and Boltzmann selection in Sections 3.3 and 3.4, respectively. In Chapter 4, the model that is developed in Chapter 2 and Chapter 3 is tested by comparing its computer simulations with the results with averaged GA runs. Section 4.1 presents the results for OneMax and Section 4.2 for the deceptive function. An attempt to develop a stochastic differential equation model to predict evolution of fitness distribution for the OneMax problem is presented in Appendix. Although our attempt was not successful, it demonstrates a strong connection between fitness evolution and certain diffusion equations and points toward the possibility of the existence of a diffusion-type equation that could describe the OneMax and maybe other type of GA dynamics. 1.2 What Is a Genetic Algorithm? Genetic Algorithms (GAs) are stochastic search algorithms based on principles of natural selection and genetics. They were first developed by J. Holland in the 1960's, and presented, together with a large body of accompanying theory, in his 1975 book. There are three main operations in a genetic algorithm, namely selection, mutation and crossover (sometimes called recombination). Each operation has many different types which may be applied, depending on the specific nature of the optimization problem. At the beginning, an initial population that consists of a set of proposed solutions is chosen. The selection of the initial population is usually random, unless it is specifically required to search a more restricted region. Then, selection, mutation and crossover operations are applied to this initial population in a predetermined order to create the first generation (or second population), Figure 1.1. The second generation is obtained from the first one by the application of the same operators. This procedure continues until a “good enough” solution emerges within the population or until the whole population converges to the point that additional search is deemed unproductive. Selection Selection Selection Mutation Mutation Mutation Crossover crossover Crossover ———> ———> Initial Population pt 2.“ (randomly generated set of Generation Generation input values for the fitness function) Figure 1.1 Genetic Algorithm creating new populations with selection, mutation and crossover operations Let us consider a very simple example in order to illustrate the GA operations and define some GA-specific terms. In this example, we want to find the value of the input variables that yields the maximum value of the objective function f (called the ‘fitness function’ in the GA terminology) f(al,az,a3aa4aas) = 01+ “2 + a3 + a4 + “5v (1 .1) where the input variables, a1,a2,a3,a4 and a5 , take discrete values of only 0 or 1. A problem of this form, of arbitrary dimension, is called a ‘counting-ones’ (or sometimes ‘OneMax’) problem, since the value of the function is equal to the number of 1’s in the input values. It is obvious that the solution to the problem is al =1, a2 =1, a3 =1, a4 =1, 05 = 1. It is not so obvious, but important, that even some real-world problems — those that can be solved by decomposition into independent components — exhibit many aspects of the behavior of a counting ones problem, after suitable recoding of the inputs. Because of their simplicity, counting ones problems are often used in analyses of the dynamics of a genetic algorithm. A few examples of this kind of application can be found in [Goldberg, 1989], [Furutani 2002], [Priigel-Bennett, 2002] and in many other publications. Now, let us see how a GA would solve this problem of find the optimum of the function f . First, select values for (al,a2,a3,a4,a5) randomly, for example (1 ,0,1 ,0,0), etc.. To make the initial population, we generate a fixed-size of such number strings each of which is called a ‘chromosome’. In Figure 1.2, we see an example of such a population with 4 chromosomes, showing their fitness values and the ratios of their fitness to the total fitness of the population (so-called "relative fitness values”). In this example, the ‘chromosome length’ is 5 — i.e., 5 variables — and the ‘population size’ is 4. # Chromosome Fitness ‘70 of Total 1 101—00 2 25 2 1000-1 2 25 3 0-0-1—1—1 3 37.5 4 00-0-10 1 12.5 Total 8 100,0 Figure 1.2 An example of an initial population together with the fitnesses and relative fitnesses of their chromosomes Given a population, the selection operation creates a new population with the same number of chromosomes by selecting some of its chromosomes. One of the most common types of selection is ‘fitness proportional’ selection, also called ‘roulette wheel’ selection. In this type of selection, the probability that each chromosome would be selected for the next generation is equal to the proportion of its fitness to the total fitness of the population (i.e., its relative fitness). In our example, Figure 1.2, the first two chromosomes have probability 0.25 of being selected for the next generation, while the third one has probability 0.375 and the last one, 0.125. The selection operation is performed with replacement. That is, the fittest chromosomes might be selected more than once, while some chromosomes might be lost during the selection procedure. A possible outcome of the selection is shown in Figure 1.3, in which two copies of the third chromosome are selected, while the fourth one is not present after selection. 0-0-1-1-1 ———-) After selection, 1‘0‘0'0-1 we may have 1-0-1-0-0 ' rd two copies of 3 0-0—1-1-1 chromosome, while 4th one is lost. Figure 1.3 A possible outcome of fitness proportional selection The crossover operation is the most complicated operation of the GA. One of the most common types of crossover is two-point crossover. In two-point crossover, the chromosomes are first paired randomly, using a uniform probability density distribution. Then, within each pair, the parts of the chromosomes lying between two randomly (with a uniform probability density distribution) selected locations are swapped. Not all pairs of chromosomes have to undergo the crossover operation, though. An important parameter of crossover operation, called the crossover probability, pc, gives the probability at which a given pair of chromosomes are crossed. In Figure 1.4, first and third chromosomes pair off and their last two genes are swapped. Also, the second and fourth chromosomes pair off and their third and forth genes are swapped. 0-0—1-1-1 * 1-0-0—0-1 f ' 1-0-1-0-0 ‘ 7 0-0-1-1-1 1-0-0 0-1 0—0 l-l 1 0-0-1 1-1 1-0 1-0 0 Swap position 3 to 5: Swap position 2 to 4 : 1_0_0_1_1 0_0_1_0_1 New population after crossover 0-0-1-0-1 1-0-1-1-0 Figure 1.4 An example of two-point crossover In bitwise mutation, each gene of each chromosome changes its value from 1 to 0 or 0 to 1 with mutation probability, Pm .For example, a mutation at the 4"1 gene of 0-0-1-0-0 gives us 0-0-1-1-0, Figure 1.5. 0—0- 1 -0-0 Mutation at 4th gene I 0—0-1-1-0 Figure 1.5 An example of mutation Thus, after selection, crossover and mutation are applied, we get a new population, which is, on average, expected to be ‘better‘ than the previous one. Sometimes the relative fitness values of the chromosomes are replaced by some other scaled values of the fitness for the purposes of the selection operation. For example, if one wants to give disproportionately more chance of being selected for chromosomes with higher fitnesses, one can multiply their fitness values by some weights and use those values instead of their original fitness values. One of the common ways of fitness scaling is Boltzmann scaling, in which the fitness, fc, of chromosome c is replaced by f: = em , where ,6 is a constant, called the Boltzmann constant. The advantages of using a Boltzmann scaling is explained in Section 2.4. Another common application of GA involves using multiple populations in parallel. This case is called parallel GAs. Each population (deme) evolves separately. However, at certain times some of the chromosomes, usually the best ones, of some populations are allowed to ‘migrate’ into other populations (typically being copied, without removal from the 'donor‘ population). Figure 1.6 shows a possible configuration for such an application. The main purpose of such an application is to reduce the risk of premature convergence. Population 1 Population 2 Population 3 Population 5 Population 4 Figure 1.6 A GA with multiple populations, arrows showing the direction of migration There are several advantages of GAs over other optimization and search procedures for some classes of search problems. First, GAs can climb many peaks in parallel. There is then a smaller probability that they will miss a peak that leads to a global optimum. Second, GAs use the value of a function rather than performing operations on an explicit representation of the function (for example, formal differentiation). It is thus possible to find a good estimation of an optimum in a situation where the form of function itself is not known, but values of the function in particular situations are known, such as stock prices in the financial markets. GAs also produce effective solutions to problems with too many variables, in which other search procedures have hard time due to time limitations (computational complexity grows exponentially, for example, with problem order n). Since the information is exploited directly from a fitness function, no other information is necessary about the problem under consideration. In addition, GAs are better than a random search for almost any problem domains of significant interest, since they use directed randomness, which reduces the computation time by skipping areas that are not fruitful to search, directing the search effort towards areas where it is more likely to find the optimum. 1.2.1 Counting-Ones (OneMax) In this section we study the counting-ones problem with a longer chromosome length. In this example, the chromosome length is 100. Thus the fitness function is f(a1,a2,...,a100) = a1+a2 +...+aloo . (1.2) We apply the GA with population size P = 50, mutation rate Pm = 0.01, crossover rate Pc =1, and Boltzmann scaling with Boltzmann constant ,6 = 0.2. 10 15 v T 1 r v v v v 10- ~ Initial generation 10*- 10- ~ Gen = 20 l O 1 0 2O 30 40 50 60 70 Fitness Distribution, p( F) 01 10 - — Gen = 40 5 h on 1‘0 26 3‘0 46 Si) ab 7%) ab 10 ~ - Gen = 100 5 b —4 00 1i) 2i) 30 40 50 ab 76 at) 9i) 100 Figure 1.7 Evolution of mean fitness over 100 generations in a GA, for a counting-ones problem. Population size = 50, chromosome length = 100, ,6 = 0.2 11 In Figure 1.7, we see the evolution of fitness distribution for a run of the GA. If we apply the GA 1000 times to this problem, starting with different randomly chosen initial populations, and take the average of the fitness distributions, then the picture looks as in Figure 1.8. This figure shows a typical evolution of the average fitness distribution of a GA. 0.25- 10] and2lIJ 0.2 - ' H150 0.15- 10 I/ I i\ 0.1 - r .0“\ . a 1;, :f: .. \ ,2 0.05- ’/ \X .. .[i f/ if]; \\\(\ / \ , ,i- ”‘4 f/ \i\ / . ,. l— "’ l \.\3:\‘ J"! , .f:‘J , _ .‘l./ . \:\ . 030“" 40 50 60 70 00 90 100 Figure 1.8 Evolution of mean fitness for a counting-ones problem over 200 generations, after the average is taken over 1000 GA runs. Population size = 50, chromosome length = 100, ,6 = 0.2. 1.2.2 Cumulants as a Tool to Observe GA Evolution The characteristic function of a fitness distribution ,0(F) is defined by (13(0)) = zp(F)ein . Then the nth moment , ,an, of this distribution is defined 12 - - - _ fl" (1.60)" - - th by the coeffrcrents of the series (I>(a))—Z——'-——. Similarly, the n ":0 n. cumulant, K", of this distribution is defined as the coefficients obtained using the logarithm of the characteristic function Kn Um)" 10g((w))= 2— n=1 It follows from this definition that the first cumulant is just the mean fitness of the population, the second cumulant is the fitness variance, the third one is the skewness and the fourth one is the kurtosis. Cumulants are also related to the moments as K] = ZptnF = A F 19 = Zph1afi Mr; H a 4m 9 .A . I 100 r l I I ’_:__’Jfl,,'——-—J——hs--u!o-oan-.Iasnu-n-J-..-oonq 4 --- :.‘ -- I’- ’ IS A -__ J ,4 _ 95 x n .- I , ( . I' I l [(1 Mean Fitness Fitness __ without migration (CASE 1) 75 5 with migration of a random 70 E individual (CASE 2) __ with migration of a well organized 0...... .1 55 individual (CASE 3) l D 20 40 80 80 1 00 1 20 1 40 1 60 180 200 Generation # Figure 1.12 Evolution of mean and maximum fitness for CASE1, CASE2 and CASE3 20 Mean Fitness 100 , , 95 90 85 80 75 70 65 60 without migration (CASE 1) with migration of a random .. individual (CASE 2) with migration of a well organized individual (CASE 3) a 80 100 1 20 Generation # Figure 1.13 Evolution of mean fitness and fitness variance for CASE1, CASE2 and CASE3 21 Table 1.1 The three cases with the average generation numbers when the mean fitness of 95 is achieved Mean Gen # CASE 1 76.72 CASE 2 74.73 CASE 3 61.96 Now, let us look at a more realistic case of migration. In this counting-ones experiment, we have two populations, P1 and P2, of size 100, running simultaneously. We apply two-point crossover with 100% crossover rate, mutation with 0.1% rate, and Boltzmann selection with scaled ,B=O.3 (see section 2.4 for a definition of Boltzmann selection). The case when no mutation is applied is called CASE-A. In CASE-B, 10 members with the highest fitness of P2 is migrated into P1 at generation 20. In CASE-C the migration procedure of CASE-B applied at generations both 20 and 30. In order to observe the inner structure of the population P1, we count the number of loci in the population at which the mean allele values are less than or equal to a given number, h. In Section 2.1, we define these measures more formally and call them Ah. In particular, A0 counts the number of loci at which all chromosomes in the population has 0 values. Figure 1.14 shows Ah curves for h= 0, 0.1, ..., 0.5 as a function of time for each case. The migration times 20 and 30, in particular, are chosen by finding the time at which A0 is maximum, in other 22 words by finding average times when the number of loci whose values completely converged to zero over all population members reaches to a local maximum. As seen in these graphs, application of each migration pulls down Ah curves towards 0. This means that, on the average, the number of loci with value 1 will increase faster with migration. This example shows that the migration helps to make the population move or converge faster. In more complicated fitness function problems, this kind of migration will also prevent population converging to a false maximum, i.e. avoid premature convergence or convergence to a non-global local maximum, when the migration parameters are chosen correctly. 23 =0.001 , = 0.3 60 Pm [3 50> 40» (c \ No migration 30 i <= 0.4 \ <= 0 3 \\\\.\ - 20 . fl/;;;:3§ <= o_2 10,, ;'// =0 llill// MA/ 1 um I 1 L J l 0 5 1015 20 25 30 40 60 80 100 CASE-A t 60 . l 50 1' l '2 40 -l With migration at t = 20 <= .. . \ 40 ‘72» CASE-B ‘ 60 r- i 50 A. l l. 4° *‘l . . . r With migration at t = 20 and 30 (F 30- \\ N~\ 2o » frg / ,/ / \ l/’ /‘ m 1° W / \ ill/l ’/ ij/ ’klf;;k o ”I / t n 1 - -——i 0 20 40 60 80 100 CASE-C t Figure 1.14 Ah curves for three cases. Pm = 0.001 and ,6 = 0.3 24 flirtatin... ILL-n 9.. / . _ i. . r z . 1.3 GA with a Real-Life Problem In this section, we will see an application of a GA to a real-world problem. We will not give the solution of this problem by a GA in this dissertation since our main subject is the modeling of the GA rather than its applications. However, it is illustrative to include this example here to see how complicated the GA problems could be. This problem, suggested by P.J. McCleer of McCleer Power Inc., introduced me to genetic algorithms for the first time. Considering how to solve this problem and trying to determine most appropriate GA parameters and structure drove me to seeking how to understand GA dynamics in a more general framework. The problem involves the automotive 42-Volt DC electrical systems of hybrid cars. The system uses pulse-width-modulated inverters to convert the bus voltage to a 3-phase AC voltage with desired amplitude and frequency. In Figure 1.16, we see a simplified circuit of this system. DC voltage can be converted to a sinusoidal AC voltage, either single phase or three phase, with desired amplitude and frequency, by means of electrical devices called “switch-mode inverters.” The basic idea of these inverters is sketched in Figure 1.15. The DC voltage is converted to an AC voltage that alternates between values 0 and a certain fixed voltage. Only the fundamental component of this voltage is to be used as the output. Thus, the high frequency harmonics are filtered and the desired sinusoidal voltage is obtained. 25 Inverter switchings filter DC Voltage => | II H” II “H H H” H I" =5 W Figure 1.15 Passage from DC to AC voltage by switch-mode inverters By turning the switches, which are the six transistors Sig, 823, 83... 81b, 82., and 83b in Figure 1.16 on and off, the desired output power specifications are achieved while providing an output voltage with a specific amplitude and frequency but without unnecessary harmonics. One restriction in the functioning of the switches is that, in a vertical pair, such as Sin and Sm, the transistors can not be both on or off at the same time. Otherwise, this would cause a short circuit. Thus, it is enough to determine the status of one transistor in each pair. The whole art of switching scheme design comes into play at this point. The practice in automotive inverters is to control the switches by a central processing unit (CPU), which can be programmed to turn the switches on and off in any desired pattern. The switching is accomplished with high-current field- effect transistors (FET). The advantage of this method is the ability to control the fundamental component while eliminating some of the harmonics. However, one should be careful with the frequency of switching since, in practice, if a switch 26 turns off in an inverter leg, the turn-on of the other switch is delayed by a blanking time, which introduces low-order harmonics in the output. On the other hand, ripples in the output current result in ripples in the current through the capacitors. This ripple current causes the capacitors to warm up overtime, which is very undesirable since the temperature under the engine hood under normal operating conditions can approach 120°C. Such high temperatures can damage the capacitors. Because of the high cost of these capacitors, any reduction in their number will reduce the cost of the inverter. Reducing the number of capacitors can be obtained by reducing the current ic. Hence, the main focus of our design is to minimize ic while still meeting the desired output voltage and/or current specifications. As a result, the question is “what is the best switching algorithm which will produce minimum current ic through the capacitor C?” Figure 1.16 A simplified circuit of pulse-width-modulated inverter. 27 Let’s say we want to determine the timing for 818, Sea and $3., (timing of S"), 82:, and 83, are determined from them). A good solution to this problem can be achieved by a GA in the following way. Split the chromosome into 3 equal parts (one part for each vertical pair of transistors). For each part, the alleles (values) at successive loci (fields) of the chromosome are interpreted as the relative lengths of successive on and off intervals, of the corresponding transistor. We use discrete values for the time lengths. Each allele, let’s say, can take on a 4-bit value, so the values 0,1,2,...,15 are the possible field values (alleles) at each locus. We assume a maximum of 101 ON and OFFs for each transistor during a period. So each part consists of 101 loci and the total chromosome length is, then, 3x101 = 303 loci. We take the fitness function as the mean squared value of the current through the capacitor. The function is restricted to the domain in which we have the correct amplitude and frequency of the sinusoidal output (in a different application, the deviation from these constraints can also be put as a part of fitness function with negative weights.) Thus, we have a well-defined GA problem, whose application could produce switching algorithms that are more efficient than traditional methods. Recently, GA is, in fact, successfully applied to power inverters to determine the switching angle, i.e. time delay between switching times of the transistors, eliminating high order harmonics of the output voltage, (Ozpineci, et al [2004].) 28 One period / A \ String OFF ON OFF ON OFF OFF ON OFF (CthfDOSOITB): L 1 l n I l 6 n 3 1 3 I 4 l 10 5 7 o o o \J/// Fields (Genes) 101“ Field (Gene) Figure 1.17 The part of the chromosome representing timing of a transistor pair during one period of the desired output voltage 1.4 Theoretical Models of GA Some of the applications of GA may not converge to a desired value or sometimes would take months or years to achieve a good solution just because the design parameters are not chosen correctly. In practice, the fitness function is often very complicated and the crossover operation is a very complex mixing operator, which makes it quite difficult to analyze GAs theoretically. Several theories have been developed in order to understand why and how GAs work, and in order to choose the design parameters wisely. Theoretical models of Genetic Algorithms (GAs) fall into three main categories. The Markov chain model, as developed by Nix and Vose [1991], completely describes the probabilistic behavior of the GA. However, this model is too costly to implement computationally for problems with realistic population size and chromosome length. The statistical mechanics approach, developed by Prt‘lgeI-Bennett, Shapiro [1994] and Flattray [1996], gives fairly good results in modeling the OneMax problem with Boltzmann scaling, for a crossover rate of 100%, however it is not developed for lower crossover rates or to handle other benchmark 29 problems of GA such as deceptive functions. The approach of modeling GAs by considering building blocks (Goldberg [1989] and Goldberg, Deb, Thierens [1993]), on the other hand, gives us a good idea about the appropriate population size or the convergence time of the OneMax and help us determine the failure boundaries in the “control maps”. But the question of finding the most appropriate crossover or mutation rate is answered, so far, only by experimental results. We still lack a model that describes the behavior of the OneMax problem for different crossover and mutation rates together and allows us to choose the best parameters. 1.4.1 Schema Theory In this theory, the search space is divided into subspaces called “schemata." The aim is to characterize the schemata using macroscopic quantities, such as the number of individuals within a given schema H at generation t, denoted by m(H,t), average fitness of individuals in the schema and in the population, size of the search space, size of the schema, etc. Schema theorems model thematically how and why m(H,t) varies from one generations to the next. Since GAs are non-deterrninistic, one can only predict the expected value, E[m(H,t)], of m(H,t). As an example, consider a search space consisting of 5-bit binary chromosomes. A special subset of this search space can be described by the template, H = (1 0 * * 1), which means that the first, second and the last loci are fixed as 1, 0 and 1, respectively, while the third and fourth 3O loci can take either of the binary values. Hence the schema H consists of 4 chromosomes, Figure 1 .18. H=(10**1)—>{10111, 10101, 10011.10001} order : 0(H) = 3 defining length : 5(H) = 5 - 1 = 4 Figure 1.18 Template H The order, 0(H), of a schema H is defined to be the number of fixed digits within the schema. The defining length, 6(H ), of H is defined to be the distance between the first and the last fixed string positions in the schema. For example, for H=(10 * * 1) above, we have 0(H)=3 and 6(H) = 5-1 =4. The first schema theorem was developed by J. H. Holland in the 1960's, taught in his classes and used by his students in their theses, and made widely available in his 1975 book. Different versions of this theorem were later developed and published by Goldberg [1989], Whitley [1994], and Stephens and Waelbroeck [1997]. In one form, the schema theorem states that E[m(H,t+ 012 M -p(H,t)(1— pm)0<">[1— p, $430] (1.4) where 31 M :population size, N :string length, p(H,t) :selection probability of individuals in H at generation t, Pm : mutation probability, pc : crossover probability. The term 0' in Equation (1.4) is taken as 1—m(H ,t)/ H by Holland, as 1 by Goldberg, and as 1- p(H,t) by Whitley. In Equation (1 .4), the term M -p(H,t) gives the expected number of population members which are instances of the schema H after the selection operation. For example, if fitness-proportional selection is applied, then we have M -p(H,t)=m(H,t)f(H’t) m (1.5) where f (H ,t) is the average fitness of strings representing H within the population and fit) is the average fitness of the whole population. After selection, we can calculate the probability that each individual of H within the population will survive the changes made by mutation and crossover. The terms 6 H (1— Pm )0”) and 1— pc -N—(—1)0‘ in Equation (1 .4) represent these survival probabilities respectively. Equation (1.4) gives only a lower bound for 32 E [m(H ,t +1)] since it does not consider the newly created instances of schema H after mutation and crossover. On the other hand, we gain some insight into why GAs work by examining this equation. For example, consider Goldberg's version of the schema theorem with fitness-proportional selection: f (H .t) f (t) selectlon mutation crossover E[m(H,t +1)] 2 m(H,t)- (I—p.)”‘”’[1—p.@—)] N—l \ J (1.6) Equation (1.6) tells us that, if the average fitness of the schema, f (H ,t), is f (H) greater than the average fitness of the whole population — that is, if T > 1 , then, provided that 0(H ) and 6(H ) are small enough, the expected number of instances of H in the population increases exponentially over generations. By small enough values of 0(H) and 6(H), we mean that the values of 0(H), 5(H), pm and pc satisfy f(H’t) of! 6(H) 7??) (l-pm)()[1-ch_1]>1 (1.7) By means of this observation, Goldberg, 1989, has defined the notion of Building Blocks (BB), which are low-order, low—defining-length schemata of above average fitness — in other words, these schemata satisfying the condition above. Hence, the well-known “Building Block Hypothesis” of Goldberg says that 33 "A GA works by combining 885 to form higher-order 835 until it converges to an optimum or near-optimum solution." It is worthwhile to emphasize that Equation (1.4) works for all possible schemata independently and in parallel. Later, in 1997, Stephens and Waelbroeck, developed another schema theorem, which gives an exact equality for E[m(H ,t +1)] by the formula E[m(H,t+1)/M]=(1—p.)p(H.t)+Nag—1Nip(L(H.i),t)p(R(H,i),t) _ i=1 (1.8) where L(H,i) is the schema that is obtained by replacing the elements of H to the right of position iwith *‘s, and R(H,i) is the schema that is obtained by replacing the elements of H to the left of position i with *'s. Schema theory is applied to modeling of single populations and to determining good population sizes by G. Harik, D.E. Goldberg, , E. Cantu-Paz and BL. Miller [1999] and later to modeling of parallel GAs by E. Cantu-Paz [2001] 1.4.2 Random Walk Model The random walk model is used with schema theory to determine the appropriate population size N (G. Harik, D.E. Goldberg, , E. Cantu-Paz and BL. Miller [1999]). Let x0 be the initial number of BBs in the population and the variable x represent the number of 886 at any time. First, for a given 34 configuration of the population of size N , the probability, p, of producing one additional building block is calculated. Then one can visualize the dynamics of the number of 83s as a one-dimensional random walk as shown in Figure 1.19 x: # of BB’s q p (\A l l J I I 1 =0 x0=initial #of BB’s x: x0=N/2k Figure 1.19 The Random Walk Then the probability, P, that this random walk converges to x = N is calculated. Specifying an expected value of P will determine the correct size of the population that would result in this expected value being met or exceeded. 1.4.3 Markov Chain Model In this model, genetic operators are described by transition matrices acting on a vector describing the precise state of the population, (Nix and Vose [1991], V039 and Liepins [1991], Suzuki [1995].) In a search space where there are n possible points (i.e., chromosomes), let p = (poip1,..., p,,_1,)represent a population where pk is the proportion of the population occupied by item k. Let q = (q0,q1,...,qn_1,) be the probabilities that each item is generated in the next population. Consider the next population as P (the population size) independent samples of the search space, using q as 35 a probability distribution. Representing the chromosome length by L, we have n=2L. Note that, the size of the state space, i.e. the number of different . . . . P+2L—1 , , populations of Size P, IS given by P , which can be approximated by LP 7 when P very small compared to 2" . This number easily reaches to billions even for very small values of P and L. Think of the action of a GA as a map G : A —> A, where A={xe$)i" :xk 20,2xk =1} (1.9) Then the probability that population q follows population p is given by a multinomial distribution Plfi (G(p)j )qu .j=0 (P41)! (1.10) where G(p)is the expected next population, G( p) j is the 1‘“ entry of the probability vector G( p), i.e. the probability that the 1‘“ chromosome will be selected for the next generation. lterating G will produce a sequence of points. G describes the limiting behavior as the population size grow large. Then, we have the following theorem [Vose, 1999]. 36 Theorem: Given an initial population p, let q be the actual population observed after tgenerations. Then, for any 8 > O and 0 < 6 <1, there exists a number K such that if the population size is bigger than K, then the probability that l G’(p) — q" < 8 is greater than 1— 6. 1.4.4 Statistlcal Mechanics Model The population is described by a small set of macroscopic parameters under the assumption that microscopic details are not of critical importance, by A. Prt'igel- Bennett, J.L. Shapiro [1994,1997], and M. Rattray [1996]. They represent the fitness distribution by its cumulants and determine the effect of selection, mutation and crossover on each of the cumulants. Cumulants are more natural to use in this kind of representation instead of moments of the fitness distribution since they are self-averaging, i.e. their average represent a typical member of possible distributions, while moments are not self-averaging, (see Prl'igel- Bennett, [2002] for a discussion of this comparison.) Using these results, the shape of the next population fitness distribution is determined, (see Figure 1.20.) 37 $5.». in...) (.E.I. I! A 1 Estimate the shape of the population using Represent the fitness distribution these cumulants. by its cumulants, K' s (first four of them) xsm Ksmc lam—578‘ 1 x,(i)S-——-—»t 2 xtpris,e3 x.- (t)=K.(t+1) selection mutation crossover Needs statistical mechanics methods Figure 1.20 The statistical mechanics model applied to cumulants The objective is to determine average allele per site, average correlation per site, k etc. terms like < aJa- > - . The one variables are assumed to be i I I j¢k j, free to fluctuate subject to the constraint that the macroscopic quantities chosen are satisfied. Then, the distribution of the allele values is assumed to be the one which maximizes the entropy. As an example of their calculations, consider the mean allele value at the ith locus or. == 11:27! . Let N (a3) be the number of different ways that would give a,- as mean allele value at the f“ locus. Then, the entropy for this locus would be 38 "*4: ‘Ganiu who... .02 It /. I S(a,-)=log(N(a,-)). Introduce Lagrange multipliers, x and y, to enforce constraints on mean fitness and correlation yZZa,’ = yPZa, = yPKl : for mean fitness i=1 i i x2 . x2 7222mm? =7P22ai2 : for allele correlation J' k i i Then, the probability distribution for a {a} configuration is L L 2 P({ai}) : Hp(ai) = Hes(ai)+yPCY,-+(ani) /2 i=1 i=1 The maximal value of p(a,~) with respect to a,- gives the maximum entropy distribution for 01,. So, from derivatives with respect to a,, one gets a relationship between a,- and the Lagrange variables x and y. Using this expression of a,- in terms of x and y one can write down the previously chosen macroscopic variables, such as mean fitness, K1, and fitness variance, K2 . When the defining equations of K1 and K2 are written for the counting-ones problem, and the average over all possible crossover operations and mutation operations is taken, one sees that average mean fitness or the fitness variance does not depend on the mean allele values. Thus, the expected values of K1 and K2 can be calculated after mutation or crossover, and, we can find the values of x and y using them. Finally, using these values of Lagrange multipliers, one finds the 39 values of mean allele, 62,-. Then, ai’s are used to estimate K3 and K4 values. Finally, the shape of the fitness distribution for the next generation is estimated using the first four cumulants. Recently, the maximum entropy technique is applied by Whitley [2004], also to determine the shape of the fitness distribution from schema frequencies. 1.5 The Need for a New Model Studying the OneMax problem is important not for solution of that problem, per se, but because many real-world problems solved via genetic algorithms consist of a set of separable sub-problems for which the optimum is to optimize each individually, which is reminiscent of OneMax. When we look at all the models mentioned in this chapter, we have the following picture: The Markov Chain Model uses huge matrices to model the dynamics of a GA and is not applicable in practice, although it gives a good idea about some general features of GA evolution when applied to simple cases with too small population size and chromosome lengths. Its application to infinite populations give an exact description of this case, but, as pointed out by Prl'igel-Bennett [2002], only very large population size gives results close to infinite population case. This model, yet, is not applicable to problems with typical population sizes. It also requires the full knowledge of the fitness function. 40 Schema Theory assumes that the building blocks are already (partially) known. Assembly of building blocks is useful to observe to understand GA performance for many problems. This theory is hard to generalize to more realistic cases since it has many simplifying assumptions, and loses its applicability as populations lose their initial random character. The Statistical Mechanics Method requires explicit knowledge of the fitness function. This method makes very good predictions about the evolution of the fitness distribution. However, it is applicable in practice only to problems with very simple fitness functions. Effects of crossover are particularly hard to estimate using this method for most classes of real-world problems. So far, none of these models have been successfully applied to the analysis of parallel genetic algorithm behavior that involves migration before the population converges. Estimation of optimal times for performing such migrations, for real-world problems such as that of McCleer Power, Section 1.3, was among the initial motivators for the models developed in this thesis research. The examples in Section 1.2.3, illustrate that a model that predicts the mean allele evolutions can be applied to migration analysis. Although the models mentioned above are applicable to some cases of OneMax problem, there is no complete model predicting dynamics of deceptive functions (a class of functions that misleads the GA in finding the optimum. See Section 3.1 for a type of deceptive function). 41 As a result, we need a new model that can be applied to cover crossover rates lower than 100%, and also be used for migration analysis. Moreover, this model should also be simple enough, both theoretically and computationally, to potentially apply to a real-life problem. 42 Chapter 2 A MODEL OF GA DYNAMICS FOR THE ONEMAX PROBLEM 2.1 Problem Description and a Visual Representation of the OneMax Dynamics In this section, we study the OneMax problem. We consider the simple genetic algorithm in which two-point crossover, fitness-proportional selection and mutation are applied in the order given. Note that, "canonical” GAs typically apply selection before or after the crossover and mutation are applied. However, considering selection in between crossover and mutation does not make much of a difference in estimating the long term behavior of GAs since the essential difference between them is only at the very beginning or very end of the GA run, as illustrated in Figure 2.1. Also note that the order of mutation and crossover can be changed without making any difference in GAs dynamics since each gene remains in the population during these two operations and is equally likely to be changed by mutation before or after crossover. AA A chchs ) CSMXCSM)C VV V Figure 2.1 Selection-mutation-crossover versus crossover-selection-mutation 43 In this chapter, we develop a model of genetic algorithm behavior on the OneMax problem with a population consisting of P chromosomes of length L (see also Buyukbozkirli and Goodman [2004].) Let S(t) be the set of all chromosomes at time t, chrom an element of this set, and chrom(i) the allele at the f" locus of this chromosome, in other words, for chromk =(af,a§,...,af) the 1‘“ allele is chromk (i) = a!‘ .The fitness of a chromosome, chrom, will be denoted as f (chrom). So, for the OneMax problem, f(chromk) = Zchrom,c (i) = of + aé‘ + + at, l (2.1) where the values of of ’s are 1 or 0. The population-level variables that we are interested in are the mean fitness ml), the variance of the fitness mt), and the set of means of the alleles at each locus i {ai(t)};=1 ,,,,, L , at time t. They are given by the formulae PL P 1:10) = %Zf(chr0mk ) = %ZZa,-k, k-l _ k=l i=1 P K2(t)= i[Zf(chromk )2]- )'(1(t)2 P k=1 ' 1 . 1 P k all-(t) = —Zchromk (i) =—Zai (t). Pk=l Pk=l (2.2) 44 In the case of the “weighted fitness function” for a OneMax problem, each allele i of the chromosome is weighted by a constant weight, w;’. In this case, the fitness of a chromosome is given by the weighted sum f(chr0mk) = Zwi ~chr0mk (i) = wlaft + wzaé‘ + + wLaf. t' (2.3) Figure 2.2 shows a sample population at time ttogether with definitions of some of its parameters. mean allele at the ith locus : a,- chrom, = 1 O 1 0 fitness chrom: = O 0 1 1 fitness ehrom,= O O 0 . . . 1 1 fitness w, w: W3 . . . WL_1WL weights of alleles "‘03“ fitness 3 K1 variance of fitness : K2 = 0'2 Figure 2.2 The chromosomes in the population, mean allele and other notations In Table 2.1, we see a list of the notations used in this chapter in the development of the model. 45 Table 2.1 Notations P :population size L :chromosome length a," (t) :the 1"” allele of the k‘h chromosome at generation: a, (r) : the mean allele at the i'“ locus at generationr f (chromk) : the fitness of the k‘” chromosome at generation: [(10) :the mean fitness K2 (1) : the variance of the fitness pm : mutation probability of each allele, in decimal form p, : crossover rate, in decimal form S(t) : the set of all chromosomes at time t p,(t) : the probability that a chromosome that is selected randomly at time t with the probability scheme of the selection, has 1 at its ith locus When we study a particular run of a GA it is useful to look at its mean allele values for each locus and observe the way they change at each generation. In Figure 2.3, we see the mean allele values at times 0, 10, 20, 30, 50 and 100. The GA parameters of this OneMax problem are: chromosome length L=100, population size P=50, crossover rate pa = 0.25 and mutation rate pm = 0.001. 46 H'w} ld\r.li.'i ’IIAI. . I... : r. .3 K .a . M . 1. 0.8* 0.6B;::£;:30. cg... o time=0 0.4 .a 0.2L 0» l -0.2 ‘ ‘ 0 50 1(1) locus, i tlme=30 1- oo o o o a... mas” a.“ . . 0.8. - ° " . ' . ~ 0.6» ‘ . - .‘s . ' 0.4: D 0 ' . J(1 o 0.2» o o , . o ‘ on . 0?. . 000' o e: on -0.2 ‘ 0 50 100 locus, I pc=0.25 , pm=0.001 time =10 1 * e .' O . 4 o o o 0 Bi .0 :0 . I . A ' .. .. 3. g . o .0 u 0.6* 0. ’00 . ." ‘1... . . . . . 0.4i .2. ..0 .'e ‘a e. so 0 0‘2 1.. .. . . . 0 0.; 0. o oi -O.2 ‘ 0 50 1(1) locus, i time = 50 1 can." 0 -:.-HH 0 o 0 0a"- , 0.6» " . o o 4 . ° ° ' . - . . . . (I 0.2» . ' . . .. C . 0. o o 0:0 so. so 0 d' -O.2 ‘ * 0 5O 1m locus, i time=20 1L0...‘oe so so 0.... e . o cage. 1:. "" h ‘ 06’ ..~. . ..o. .0. o: 0 o .0 0 0.4 00 ' g 0' o . e 02?. . -.. -0. ' 0’ 0 o o -0.2 ‘ 0 50 100 locus I tlme=100 1L“..- o—qo- o 0.8. . ° -« '. o' . o .. 0.6 1 . o .. o... 0.4- . .. 0.23 " One 00.... one CI -0.2 0 50 100 locus I Figure 2.3 The mean allele values, ai’s at each locus i for times t =0, 10, 20, 30, 50 and 100 of a GA run with pc=0.25 and pm=0.001. Define Ah(t) to be the number of ai(t)’s whose values are less than or equal to h, Ah(t) =#{ a,(t)| a,(t) s h, i = 1,... 47 .L} (2.4) By this definition, for example, A0(t) gives the number of loci where all of the chromosomes have value 0, while A0_6(t) gives the number of loci where at most 60% of the chromosomes have value 1. For instance, in Figure 2.3, we have A0(3O)=9; i.e., at the 30th generation, at 9 of the loci, all the chromosomes have value 0. So, for this example of the GA, at generation 30, the allele value 1 is completely lost at 9 positions of the chromosomes. A1(t), by definition, is always equal to L. In Figure 2.4, we see how 1404 (t) is defined and the time evolution of Ah(t) for h=0, 0.1, 0.2, 0.9. The time evolution graphs of Aha) in Figure 2.4 are obtained by taking the average of 100 GA runs. 48 pc=0.25 , pm=0.001 time =80 1» a. .. .0 no...” 0 so- 0.. a... 0-..... m o . . o o 0.8~ . ' ,' — o o . . o o a 0.6~ o _ d o o O_4>¢ oooooooooooooooooooooooooooooooooooooo g.............. ooooooo Q nnnnnnnnnnnnnnn ‘oollo-h .- A0.4(t) :-' # 9f dots b ' 0.2» , level 0.4 - 0 o o o 0 ° 0 ..° 0 o o o o o _02 1 1 l L l l 20 40 60 80 100 allele nunber, i One-Max with : pc=0.25 , pm=0.001 Figure 2.4 Definition of Ah (t) 49 The values of the variables (Km), 19(1), {m(1)}i=1 L) and Ah(t) change from one 9"! GA run to another even if we have the same initial population. In terms of experimental results, we run a GA, with fixed parameters of selection, mutation and crossover, many times. For each run of the GA, we measure these quantities at each generation and take the average over all of the runs. The goal of our model is to estimate average values of these variables, hence the average behavior of the GA. In order to simplify the notation, we will use the same symbols (Km), 19(1), {ai(t)};=1,,,,L) and Ah(t) for values of a specific run of a GA, or for an experimental average of these values, or for the estimated theoretical average in our model. Which one is denoted will be clear from the context. We will use the superscripts c, cs or csm in order to distinguish these variables after crossover, selection or mutation is applied, respectively. So, cit-“(0 represents the mean of alleles at the f" locus at the 1‘" generation after crossover and selection have been applied, and 0493’"(t) equals a;(t+ 1). Note that the crossover operation does not change the mean allele values in OneMax problem. Thus, aq(t) equals ail-"(t)- The study of the time evolution of Ah(t)’s for several values of h gives a very practical insight into the behavior of the GA. Figure 2.5 shows the graphs of Ah (t) for h = 0, 0.1, 0.9, where the crossover rate is 25%, the mutation rate is 0.1%, and fitness-proportional selection is used. The population size is taken as 50 chromosomes and the chromosome length is 100 genes. Each time slice of such a graph can be seen as a “bar graph” of the mean allele distribution at the 50 given instant. In other words, the vertical distance between two curves gives the number of gene locations at which the mean allele is between the corresponding values, averaged across runs. For example, at t=100, at about 18 gene locations, none of the chromosomes (i.e. h=0) have value 1; at about 5 locations from 1 to 5 chromosomes (i.e., more than 0% and up to 10% of the population, i.e. 0). n=l (2.12) when the crossover rate is “high enough”. In summary, we have the following formula that is used in the code simulating the model: _M_g_g: (a) After crossover with high enough rate is applied, the probability that a randomly selected chromosome that is drawn with the probability scheme of fitness proportional selection, has a 1 at (1 - a,- (t))a,. (t) K10) . its 1'" locus is pim = 04(1) + 56 (b) The mean allele values after crossover and selection for high enough crossover rates are given by the formula P a,“ (t) = %2n . B(n,P, p, (t)) . n=l 2.2.2 Mutation In this section, we want to estimate a)“’"(t) given the values of af"(t). Each gene of a chromosome has the probability pm of changing its value from 1 to 0 or from 0 to 1 by mutation. When we consider the possible changes at the f“ locus only, the expected number, N, of total allele changes due to mutation can be found by using a binomial distribution as P N = Zn-B(n,P,pm). n=l (2.13) Since the percentage of 1’s at the it“ locus is 0408(1), a)“(t)N of these changes are going to be from 1 to 0, and (1-aF’(t))N of the changes are from 0 to 1, on the average. This means that the number of 1’s at the ith locus, which is Fatwa), will become Pa,“(t)- mcs(t)N+(1-a)°s(t))N after the mutation. Simplifying this quantity and dividing by P gives the mean allele for the next generation as _ .68 dig-(H1):aI,-"‘""(t)=(x,-"s(t)+1 2a, (”N (2.14) 57 ., 1334*».“13 run-MN... Mu As a result we have the following recipe to use in the model simulation: M 2.3: The mean allele after mutation is given by P N , where N = Zn-B(n,P,pm), n=l ailt+1)=afsm(t)=af‘(t)+———l-2:f (1) 2.2.3 Fitness Variance for “High Enough” Crossover Rates The fitness variance by definition is P K2 (t) = i-{Z f (chromk )2] - K1 (t)2 . k=l (2.15) If we write the fitness of chromk as the sum of its gene values a!‘ and change the order of summation after expanding the square sign above, we obtain 1 L P 19(1) =K1(t)+—Z Zafaf —K1(t)2 . Pit-xj k=l (2.16) P The term Zafaf in Equation (2.16), counts the number of chromosomes in k=1 which loci i and j both contain 1’s. In the case of “high enough” crossover rates, this count is estimated by using af’ and 01st as follows. The probability, p(i, j,n), that locations i and j have n common 1’s is given by the formula 4- P CS], where 11 could take any value a. J 170313") =( n Pa? - n 58 between max(O,Pa,-“ + Pale-3 — P) and min(Pa,-“,Paj-‘) and the product of P with a’s is rounded to the nearest integer in order to calculate the combinations. Thus, the estimation of the fitness variance in the case of “high enough” crossover rates is found by using CS CS 1 L - - CS x2 (t)=l(1 (t)+-FZXn-p(l,],n)—Kl (t)2_ i¢j n (2.17) The estimation of fitness variance after mutation is done by Prt’igel-Bennett and Shapiro, [1997]. Their formula gives us L 2 1 2 K?“ = (1 - 2pm) If?” +[1_F]pm(1— Pm)ZWi i=1 (2.18) where w. is the weight of the ith locus. In other words, the fitness of a chromosome (a1,a2,...,aL) is calculated by the weighted summation 2W1“: . In our special case, the values of w,-’s are all 1. So, we use the formula 2 1 2 L Kgsm(t)=(1-2pm) K§3(t)+[1-;)(pm — me1 i=1 CS 1 =(1-2pm)2K2 +L[1—F](pm—p31)=l(2(t+l) , (2.19) 59 to estimate the fitness variance after mutation. As a result we have the following two statements to use in the model simulations: M 2. 4: The fitness variance after crossover and selection is found 1 L . . by using K250) = K1“ (t)+FZXH'P(1.J,n)-Kfs(t)2, where iatj n Pa.“ P — Pa,“ P p(i,j,n)= ' x + cs . The same formula n Pa? —n Pa,- can be used to estimate K2 (0) using [(1 (0) since the initial values of the allele are chosen randomly which makes it equivalent to a highly mixed population. M 2. 5: The fitness variance after mutation is found by using 3 2 1 K5 "’ =(1-2pm) K?” +L(1-;)(pm #73.)- 2.2.4 Lower Crossover Rates Equation (2.11) gives the probability p. when the crossover rate is very high. In such a case, as in Section 2.2.1, we are able to treat the 1’s at a fixed locus of different chromosomes as identical to each other in terms of their roles in selection because of the high mixing rate of the crossover operator, which makes chromosomes look similar to each other, on the average. However, for lower and more realistic crossover rates, there will be some correlation between alleles within a chromosome and Equation (2.11) will no longer hold. Let’s keep the 60 usage of notation pi(t) for the probability of selecting a 1 at the ith locus in the case of the “high enough” crossover rate and denote the corresponding probability in the case of a lower crossover rate by p, (t). To remedy this situation and estimate 5,- (t) correctly, we consider artificial weights, c,-(t), for each locus in order to reflect the average change in the role of 1’s played in the selection process due to correlation between alleles. The correction weights, c,-(t), are defined implicitly by (1 — a, (1))a, (0c, (1) K1 (1) . EU) = 6310‘) + (2.20) The reason why we defined the correction weights as in the equation above is because if we write Equation (2.11) for a fitness function of the form f (chrom) = 2w,- -chrom(i) , with weights w), we would get an equation exactly I like Equation (2.20) with c; replaced by w). Our correction weights play a similar role at each locus as Ms would, except that 01’s change over time. The next step will be to estimate the c,’s statistically by means of some data gathered from experiments. In order to do this, the GA is run with fixed rates of pm and pc up to a pre-selected generation, say to. Let us call this generation Go. The crossover operation with the current rate, pc, is applied to Go many times. Each time, W t ) values are calculated from the experimental data for each locus i, and the corresponding 0,- values are found using Equation (2.20). 61 ‘1 ufluflflk.....z 3.13.5 .. . fit This process is repeated for many runs of the GA to obtain statistical measures. It is observed that the value of c) strongly depends on the values of at, as expected. Because of this dependence, it makes more sense to group the or according to their corresponding a) values before finding the statistics of the data gathered from the experiments. So, define Ciao) as the set { c,- values of the k‘h experiment of GA such that h s a,(t0) < h + 0.1}. for h = 0, 0.1, ...0.9 . The mean of the correction weights is obtained by finding ,u(h',t0):mekan(mean(C,:'(to)», h' = 1, 2, ...,10, where the relationship between the index h and h' is given by h = (h' -1)/10, to be consistent with definition (2.4). In order to measure how much the correction weights vary from one experiment to another, we also calculate the standard deviation 0(h',t0) = s£d(mean(C,:’ 00)». The experimental results show that, when pc is not too low (below about 4%), p and 0 remain more or less at the same value regardless of the time, to. Moreover, 11 shows a linear-like behavior while a shows a quadratic-like behavior as a function of h'. This behavior of the crossover operator allows us to use the linear approximation of p(h',5) to predict 5,1! ) for the following generations. Figure 2.6 shows the graphs of p for two different rates of crossover with to at generations 5, 15 and 30. We have observed that the inclusion of u in our model is good enough for describing the effects of c; distributions and the information coming from a does not play a significant role in the counting-ones problem. 62 However, for other problems, such as OneMax with Boltzmann scaling, 0 might be needed in the model. The deviations from the linear behavior, in Figure 2.6, at h'=1 or 10 are due to effects of statistical averaging in which there were not enough data points available for these border values. 3 2.51 2 5 pc = 0.25 2t pc = 0,75 2» p6 = 3 2 1.51 ' 1.5L 1 1M 1 1‘ . 0.5» 0.5, - Iiiearfitforb=5 0* 0* 0’ Estoppedatto=3o . * stoppedatto=5 e. :' '0.5‘ '0.5 "u stopped atto=15 -1 ' . J _1 1 a _1 1 l 0 5 10 0 5 h. 10 0 5 10 Figure 2.6 The mean value of the correction weights as a function of mean allele levels, h', for crossover rates pc = 0.25, 0.75 and 3. The statistical average is found over 100 runs of the GA. Population size is 50 and chromosome length is 100 genes M 2. 6: The mean allele values after crossover and selection for a crossover rate pc are given by the formula P — . . . Cit-“(0 = i2]; - B(n,P,p'iU», Where fit“) ___ 0’10) + (1 al(t))al(t)cl(t) Pn=l K1 (I) and the values of c, (t) are determined as described above for the corresponding crossover rate. 63 411...... EH13 ..wflfi... . 1 M. 2.2.5 The Algorithm of the Model Diagram 1, page 65, shows the flowchart of the algorithm that is used in the simulation of the model that has been developed in the previous sections. We apply this algorithm N times. The estimations of mean fitness and fitness variance are found by taking the averages over these N runs. a,(t) values of each run are used to find Ah (t) values for that particular run, h = 0, 0.1, 0.2, 0.9, Taking the average of these measures for each t, in turn, gives us the actual Ah (t) evolutions, that describe average GA behavior. The simulation of the model for high enough crossover rates starts with selecting a set of ai(0) values chosen by considering a binomial distribution for each locus in which we have P selections with a 50% chance of selecting a 1 each time. _M_g_.g (page 56) and M2_.3 (page 58) are applied to estimate the mean alleles after the crossover, selection and mutation operations. This process is iterated for each generation to obtain a dynamic simulation of the mean allele. At any moment, the mean fitness is estimated by MA (Section 2.2), and the fitness variance in the case of “high enough” crossover rates is estimated using M_2_.g or M (Section 2.2.3), depending on whether we are considering the variance right after the selection process or after the mutation, respectively. In the case of lower crossover rates, M 2.2 is replaced by M 2.6, in which the crvalues are pro-determined by the linear approximation of the data gathered at the 5th eneration of a set of GA runs as described in Section 2.2.4, Fi ure 2.6. 9 9 64 Diagram 1 The Flowchart of the Simulation Algorithm Select a, (O)’s randomly, i = 1, ,2 , , L using binomial distribution for each i, P selections with 0.5 success probability V Apply M 2.1, (page 54), to find K1(O) Apply M 2.4 ,(page 60), to find K2 (0) Fort: 1 to Maximumieneration number Fori=1toL: Apply M 2.2(a), (page 56), to find the value of p,- (t) Apply M 2.2(b), (page 56), to find a sample value for mean allele, af‘ (t), using a binomial distribution, P selection with p, (t) as the probability of success i=i+1 J Apply M 2.1, (Sec. 2.2), to find it,“ (t) Apply M 2.4,(Sec. 2.2.3), to find K? (t) Apply M 2.3, (See. 2.2.2), to find mean allele, afsm (t) = a,- (t +1), after mutation Apply M 2.1, (Sec. 2.2), to find mean fitness of the next generation K?“ (t) = K1 (t + 1) Apply M 2.5, (See. 2.2.3), to find the fitness variance of the next generation K§sm (t) = K2 (t -l- 1) 2.3 Weighted OneMax Fitness Function Now, we will consider the case in which each allele contributes to the fitness with different weights. In other words the fitness of a chromosome chromk =(a1k,a§,...,af) is f(chromk) = Zwi -a{‘ . I (2.21) Fitness function for weighted counting l's problem: IIIIOIIOO Fame”; “1 “a “a “4 “5 “0 “1 “a “1 “1*“21'w31'w41’w01'w1 Figure 2.7 Fitness with weights In Figure 2.7, we see an example of the weighted fitness of a chromosome of length 9. When we study the GA with the weighted fitness function of Equation (2.21) at time t, for the 1‘“ locus, Equation (2.9) would change to 66 42flchr0m) = P(1- 0!. (1)th (t) — ma,- (0) chrom e 53 (t) and 2 f (chrom) = Pat,-(t)(1(1 (t) + (1 — at. (t))Wi ) chrome S{(t) (2.22) Then, Equation (2.11) would be replaced by (1 - a, (t))a,. (ow, . K10) [7,-(1): at“) '1' (2.23) This would change our simulation model formula as M 2.2’: (a) After crossover with high enough rate is applied, the probability that a randomly selected chromosome that is drawn with the probability scheme of fitness proportional selection, has a 1 at (1‘ at “Dar (”Wt K10) is f" locus is p,- (t) = a, (t) + (b) The mean allele values after crossover and selection for high enough crossover rates are given by the formula P af‘(t>=%Zn-B(n.P.p.-(t>)- n=1 The existence of weights changes Equation (2.16) as 67 L P K2(t)=Kl(t)+-l—Z Zwmfwja’; «10)? tab j k=1 (2.24) Then, Equation (2.17) is replaced by 1 L 2 K55 (1) = K1“ (t) + 7522" - wiwjp(i, j,n) - K1“ (t) , i¢j n (2.25) where p(i, j,n) remains the same as Pa.“ P - Pa,“ P P“, j,n) = l X + cs ’ n Pa? —n Pa} (2.26) as in Section 2.2.3. The fitness variance after mutation is already given by Equation (2.18) for the weighted case. The modification for the lower crossover rates is straightfonrvard. The defining equation for the correction weights, Equation (2.20) is now (1-a,.(z))a,.(t)w,.c,.(t) K1 (1) . 13.0): ai(t)+ (2.27) 68 2.4 OneMax with Boltzmann Scaling One of the most common ways to scale the fitness values so that the chromosomes with higher fitness have more chance to survive the selection operation compared to the usual fitness-proportional selection is Boltzmann scaling. In this scaling, the fitness value f (chromi) of the ith chromosome is replaced by f(chromi) = efikhmm‘), (2.28) where ,6 , which controls the pressure of the selection, might be a function of some parameters of the population or it might be a constant. Then, the probability, q,- that the ith chromosome would be selected by the Boltzmann- scaled fitness proportional selection is given by the quotient efi‘ (chromi) i = p ' Zefikhmmj) i=1 q (2.29) The advantage of using a Boltzmann scaling is that the value of q,- is shift invariant. In other words, if the fitness values of all chromosomes present in the population at any time t are shifted by a value c, then q,- would remain unchanged for chrom). One can see this property easily from 69 .I'E 037.5, m-r .. ...se..r.. .7 m .9..A efl(f(chromi )+c) qi : p Zefl(f(chrom,)+c) i=1 eficeflflchrom) eflc ieflkhmmi) j=l efi‘(chrom,) = p = qi Zefllchmmi) j=l In the case of scaled ,6, the selection pressure, ,8, is sealed at each generation inversely proportional to the standard deviation, 0‘ = ,iKz of the fitness. So, .33 =___V21“(P)fi is used instead of ,8 and the fitness values, f (chrom), are 0' replaced by Then q,- is ,(21n(P) )flf( chrom) f(chrom): e a (2.30) i J i _mnmfl(chrom.) f (chromi) = e a t ,. ,_[______21n(r>) (chrom) Zea ___—131mm,.) =1 j= —l 70 When ,8 is scaled like this, then the value of f(chromflis invariant under multiplication of fitness values by a constant. This is because when all the fitness values in a population are multiplied by c, then the new variance of the 2 population would be c [(2, where K2 is the variance of the original population. Then, i—gflflcflchrom) V 21“(1’),5cf(chrom) f(chrom)’ = e “C K2 = e CJK—Z = f(chrom). Hence, using a Boltzmann selection with sealed ,6 , the selection operation would be invariant under constant multiples or shifts of fitness values. We will study two types of Boltzmann selection for the high enough crossover rates only. In the first one, ,6 is kept constant throughout the whole GA run. We call this case the ‘Boltzmann selection with fixed ,6 ’. In the second one, ,6 changes inversely proportional to the standard deviation of the fitness distribution of the current population. This case will be called ‘Boltzmann selection with sealed ,6 .’ Using the same notation as in Section 2.2.1, we define the sets Sf, (t) = { chrom e S (t) I chrom(i) = O} and 71 31.4.4422»... ...l . «A Sf(t)={chrome S(t)| chr0m(i)=1}, where S(t) is the set of all chromosomes at time t. Let us consider an instance of a GA run and study this instance for the ith locus. We define probability p), similar to that in Section 2.2.1, as the probability that a chromosome that is selected randomly with the Boltzmann probability scheme after the application of crossover has 1 at its 1th locus. Then, we have Zeflflchrom) chrome S i (t) Zefiuhrom) + Zefikhrom) ° chrome S f (t) chrome S3 (t) 1910‘) = (2.31) We want to find the expected value of the probability, pi(t), when the crossover rate is ‘high enough’. If the ith locus values of all chromosomes in this population are replaced by 1’s then we would have a fitness distribution whose mean fitness is Kl + (1 — (1,), where [(1 is the mean fitness of the original population. At this point, we assume that the variance of this new population is almost the same as the variance of the original population. Further, assuming that all the fitness distributions under consideration are normal distributions, we propose a model distribution for the set Sf (r) : M 2. 7: Fitness values of Sf (t) are assumed to have a normal distribution with mean [(1 + (1— a,) and variance K2. 72 Similarly, for the fitness values of S60) we have: M 2.8: Fitness values of 53(1‘) are assumed to have a normal distribution with mean K1 — ai and variance K2. 73 Chapter 3 A MODEL OF GA DYNAMICS FOR THE DECEPTIVE FUNCTION PROBLEM GAs, with the help of the selection operator, have a tendency to treat the direction in which an increase in fitness is observed when a gene value is changed as the direction to go to seek the optimum value (essentially, hill climbing). However, not all such changes, in general, lead toward a gene value that is part of the optimum chromosome. In particular, there are some test functions that are especially designed to mislead GAs that take advantage of this weakness of the algorithm. An important class of such functions is called “deceptive functions”. In this chapter, we will describe a form of deceptive function and develop a model for the GA with this function. 3.1 Deceptive Function First, we define an N-bit deceptive function. For this function, the chromosome is made up of some number of N-bit blocks and the fitness of the chromosome is found by adding up the fitness contribution of each block. The fitness contribution of an N-bit block is given by 74 A ifai=1foralli f(a ,a ,...,a )= . . 1 2 N Bm it some a, sare 0, wherem=# of zeros (3.1) The chromosome consists of many such N-bit blocks and its fitness is the sum of each of their contributions given by Equation (3.1). The constants A and B are such that A>BN. In other words, a deceptive block with the highest fitness contribution would have all 1’s as its alleles. Figure 3.1 shows an example of a 3-bit deceptive fitness function. In this example the fitness of (111 101 100) is A+1B+ZB = A+SB. We see that the fitness of the chromosome would increase if the last deceptive block, 100, would change to 000. However, it is clear that the optimum chromosome is the one with all 1’s. This is the reason why these functions are called deceptive: From the GA- selection operator's point of view 111 101 000 is better than 111 101 100, while the later is bitwise closer to the optimum. Fitness function for n-bit Deceptive problem: 34)“ case: me" | I 1 I o 1 I o o I “(C I" deceptive b106k3) “7" i/ H -- o 9:01 0's) A 9 a A 4' 39 Figure 3.1 The definition of the fitness of a 3-bit deceptive function 75 3.2 A Model for the Deceptive Function with Fitness-Proportional Selection When a GA is applied to find the optimum value of the deceptive function defined in the previous section, it tries to increase the number of both 0’s and 1’s at the same time. So, the problem involves a counting-0’s and a counting-all-l- deceptive blocks (i.e., the N-bit blocks with all their allele values equal to 1) problem, competing with each other. Both the counting-0’s and counting-all-1- deceptive blocks work similarly to the OneMax problem modeled in Chapter 2. In this section, we apply this idea and develop a model for the deceptive function, using the results of the OneMax model. We use the index letter j to index the deceptive blocks, and the index letter i to index the locus in the chromosome. Similarly to the mean allele values defined in Section 2.1, Figure 2.2, we define a _ # of 0's at the jth locus of all chromosomes J. _ P and y _ # of all -1 deceptive blocks at the 1"" N - bit deceptive loci in the population ’ _ P (3.2) where P is the population size. Figure 3.2 shows the definitions of a and y for a 3-bit deceptive function. 76 chrom1 *** *** **=l¢ chrom2 *** *** *** chrom? *** *** *** 1t 0's (1.: __ 11 ill—blocks 4 Population Size yi " Population Size Figure 3.2 Variables a and yin a 3-bit deceptive function problem Similarly to Equation (2.4), Figure 2.4, define A:(t)=#{aj(t)| aj(t).<_h,j=l,...,L} and A; (t) =#{ 7,-(t)| 7,-(t) S h, i = Lug-1%}, (3.3) where L is an integer multiple of N. In the following subsections, we develop a model to estimate the average expected values of a j (t) and 7,-(t) from their previous values. 77 3.2.1 Selection and Crossover with “High Enough” Crossover Rate Similarly to Section 2.2.1, we first consider the case in which the crossover rate is so high that the alleles at any locus are distributed essentially randomly among the chromosomes. We will call this crossover rate a “high enough” crossover rate. We have the following simple equation, following directly from the definition of the mean fitness M 3. 1 At any stage of the model the mean fitness is given by L/3 L i j=l Define p? (t) as the probability at generation tthat the f‘ locus of a chromosome that is selected randomly with the fitness proportional selection scheme after the application of crossover, has 1 as its gene value. p! (t) is defined as the probability at generation tthat the f“ N-bit deceptive block of a chromosome that is selected randomly with the fitness proportional selection scheme after the application of crossover, has all 1’s as its alleles. In Figure 3.3, we see a pictorial definition of p? (t) and pf (t) for a 3-bit deceptive function. 78 00‘39 b\ocY~ . \ W16 {9 I?“ 66cc?“ = chrom: . .® . Select one with the fitness / proportional selection p,’ (i) : Probability that this block is 111 probability scheme Population after pffl) : Probability that this allele is 1 generation t- 1 Figure 3.3 Definitions of p? (t) and pl?’ (t) fora 3-bit deceptive function Let S(t) be the population after crossover is applied with a high enough crossover rate to the population of generation t— 1. Define the subsets 753 (t), (sf (t), “55' (t) and “311(1) of S(t) as 7510) = chrome S(t) i'h deceptive block Of chrom is 11...1 , 1 7530) = S(t)-(Sf (0. 05110) = {chrome S(t) j’h locus 01 chrom iS 1 }. “55' (z) = S(t)-“511' (t). (3.4) Then, by definition of pj'(t) and p," (t) we have 79 2f (chr om) 2 f (chrom) pg“) ___ 4 chromeasgfl) 4 : chromeaSJU) 2 f (chrom) + 2 f (chrom) K1“ chromeaSJ (t) chromeaSljU) and 2f (Ch? om) 2 f (chrom) p170) = _ chronie78{(t) J = chromersffl) 2 f (chrom) + 2 f (chrom) Kf chromersli (t) chrome’S3(t) (3.5) In order to simplify notation, from now on we assume that the deceptive function is 3-bit. All the arguments below can easily be generalized to a deceptive function with any number of bits. Then, for the set 781" (t) we have iZflchromFP[A+(14(1)-Arf(t)-B(aj+t(t)+a).2(t)+aj+a(t)»1. chrome 751‘ (t) (3.6) where the values of land 1 are related to each other by j = N (i - 1), (N = 3 in our case). For the set “Slj (t), we have Zflchmm) = P[3 + (Ki (t) - Aria) - 309(0)]. chrome "SJ (t) (3.7) 80 where the values of land j are related to each other by i = int(j/ N ) + 1, (“int” is the function giving the integer value of its argument and N = 3 in our case). In both Equations (3.6) and (3.7), the bar over the summation means the average over all possible configurations of gene distributions, In which we assume that the genes are distributed randomly satisfying the given mean allele values, since the crossover rate is “high enough”. The superscript “d’ in both equations means the values of the corresponding variables after crossover is applied. Note that, in the case of the deceptive function, the crossover operation changes the value of mean fitness, K1(t), as well as 7,-(t), while 051- (1) remains unchanged. In the OneMax problem, the mean fitness was not changed by CI‘OSSOVGI’. Equations (3.6) and (3.7) give us the probabilities p? (t) and p,” (t) as y ___ yf(A+(xf —Ay,."—B(czj+1+arj+2 +698» ' ,where '=3i—1 p. Ki 1 ( ) (3.8) and 23+ K‘—A F-Ba. Pf: J( (I y, 1)),wherei=int(j/3)+1. Ki (3.9) In order to use equations (3.8) and (3.9), we need to estimate the values of Kf and yf - i.e., the mean fitness and deceptive block percentages at each 81 deceptive locus after crossover is applied. For high enough crossover rates, they are easy to estimate. yf is determined by c _ G(Pa—afil)’ P(1-(Zj+2), P(1-aj+3)) yi_ P i (3.10) where G(nl,n2,n3) is the function that gives the expected number of 111’s in the population when the genes are completely shuffled with nk being the number of 1’s at the corresponding locus of all the chromosomes. Then, Kf is given by L/3 r. Kf = Any + B a}. i j=l (3.11) In the process of fitness-proportional selection, we apply selection of chromosomes P times with replacement. Each time, the probability that the selected chromosome has 1 as its 1‘“ allele, is p? and the probability that the f“ deceptive block is 111 is p}’ . So, we use a binomial distribution to find the expected number of 1’s at the f“ locus and expected number of all-1-deceptive blocks at the f“ deceptive place. We use the notation B(n,P, r) to denote the probability of having n successes after P trials, when the success probability is r for each trial. Then, the expected theoretical value of a§‘(t) is 82 1 P 61?“) zign-B(n,P,p3-z(t)), (3.12) when the crossover rate is “high enough”. Similarly, the expected theoretical value of yf‘(t) is 7,-“(0 =%ZP:"'B("’P”"'7(’)) ' n=l (3.13) Then, in the model simulation, we have M 3.2: The values of a? (t) after crossover and selection for high enough crossover rates are given by Equation (3.12), where p? (t) is given by (3. 9) . Having found the values of 15:10), ale-:20) and 1.130) in the simulation, we need to modify Equation (3.13), in order to find yf‘(t), where j =3(i-1), as follows. Since the values of 615.110), ajiza) and a§i3(t) are already determined, this means that we have only 13=nnn(P(1—a;:,(i)), p(i-a3t120», P(1—a§-i3(t))). (3.14) 83 chromosomes left in the population which have the possibility of having 111- deceptive blocks at their 1”“ deceptive places. Moreover, the probability p! (I) would be modified as pi’ pi7(t)_____ _ " Pk (3.15) where k is the index of a for which we get the minimum value in {P(1— 075110)), P(1— a§12(t)). P(l — ale-:30»). Thus, the expected theoretical value of y,“ (t) is now given by 1 P r."‘(t)=;Zn-B(n P 1),-70>) "—1 (3.16) As a result, we have 3 3.3 After 014,10), 0"- +20) and a1+3( ) are determined by M 3. 2, the values of yf (t) are given by Equation (3.16), where P and p 7(t) are determined by Equations (3. 14) and (3. 15), respectively. 3.2.2 Mutation The modeling of mutation is done exactly as in Section 2.2.2. 34 Thus we have 3.4 The mean allele after mutation is given by csm CS 1 _ 2a§s (t) aj(t+1)=aj (t) =aj (t)+ N , where B(n,P,pm) u _M'b After the values of aj(t+ 1) are determined as above, the values of yi(t+ 1) are estimated from them by using the function G, which Is defined in Section 3.2.1, G(P(1—aj+1)9 P(1_aj+2)’ P(1_aj+3)) (3.17) 3.2.3 Fitness Variance Fitness variance by definition is 1 P L 2 2 [(2 (t) = — Zf(chromk) — Kl(t) P k=l (3.18) In order to write the fitness of a chromosome more easily in the calculations, define variables gi and a, as _ 1if the i’” deceptive block is 111 ' 0 othen~ise 85 a _ lif the I“ alleleis0 I 0 othenlvise Then the fitness of a chromosome is L/3 _ k k k k f (chr omit) — 2(148 i 1‘ B(a3(i-1)+l 1’ a3(i—l)+2 1' a3(i—1)+3))' l (3.19) Then, P ,(ZEflChmmk) = F 223018; + mam-0+1 + “3(1—1)+2 + “3(j-l)+3))* = 1,1 k k k 1. (A81 "' B(a3(i—1)+1 1' a3(i—l)+2 1' “3(i-1)+3)) (3.20) Separating the terms with i=j from the rest of the summation yields an estimation of the above expression, given by 86 1 P L/3 — 2 f (chromk )2 2'— E[Azyi + (33?;- P k=1 i=1 2 + (23) a3(i—l)+l 'a3(i—1)+2 '(1‘ a3(i-l)+3) 2 + (23) a3(i-l)+1 ' (1 ‘ a3(i—1)+2) ' a3(i—1)+3 2 + (23) (1" a3(i—l)+1)'a3(i—l)+2 'a3(i—l)+3 2 '1" B a3(,-_1)+1 ° (1 "" a3(i—l)+2) ' (1 _ a3(i-l)+3) 2 + B (1‘ a3(i-1)+1) ' ash-1H2 '(1‘ a3(i-l)+3) 2 + B (1 — a3(i—1)+1) ° (1 " “3(1-1)+2)' ash—n+3] L/3 + E[(B(a3(i—l)+l 1' “3(1—1)+2 + a3(i-1)+3)+ Ai’i)* tat j (3 (ax j—1)+l + a3( j—1)+2 + a3(j—1)+3)+ A7 j )1 (3.21) In the estimation above, 5,- is the estimated percentage of deceptive blocks which are 000 at the f" deceptive place. As a result, we have M 3.5 The estimation of fitness variance for high enough crossover rates is given by Equation (3.18), where the first term is estimated using Equation (3.21 ). 87 3.2.4 The Algorithm of the Model Diagram 2, page 89, shows the flowchart of the algorithm that is used in the simulation of the model developed in the previous sections. We apply this algorithm several times (~10 times). The estimations of mean fitness and fitness variance are found by taking their averages over these runs. Then, a j (t) and 79-0) values of each run are used to find A? 0) and A; 0) values, h = 0, 0.1, 0.2, 0.9, whose averages, in turn, give us the average Ah 0) evolutions. 88 Diagram 2 The Flowchart of the Simulation Algorithm Select a]. (O) ’s randomly, j = 1, ,2 , , L using binomial distribution for each i, P selections with 0.5 success probability Estimate 71(0) by Equation (3.10), i=1, , U3 Apply M 3.1, (page 78), to find K1 (0) Apply M 3.5,(page 87), to find K2 (0) For t = 1 to maximum generation number do the following | For each n =1 to U3: Estimate yg0) by Equation (3.10), n=1, , U3 Apply M 3.1.. (page 78), to find K‘f (t) For eachi: 1 to US: Apply M 3.2, (page 83), to find the values of a? (t) , forj={3(i—1)+l, 3(1-1)+2, 3(i—1)+3} Apply M 3.3, (page 84), to find to find the value of new), Apply M 3.1. (Page 78), to find K1“ (1‘) Apply M 3.5.(page 87), to find 19“ (t) Apply M 3.4, (page 85), to find mean allele, afsm (t) = or, 0+1), after mutation Apply M 3.3, (page 84), to find to find the value of yfsm 0), Apply M 3.1, (page 78), to find mean fitness of the next generation 14"" (t) = x, (t +1) . Apply M 3.5.(page 87), to find the fitness variance of the next generation K5” 0) = K2 0+ 1) 89 3.3 Weighted Deceptive Fitness Function We define the weighted fitness function for an N-bit deceptive problem by assigning weights both for each locus and for each N-bit deceptive block. Let’s use the notation W]. for the weight of f“ locus, and v,- for the weight of the i“ deceptive N-bit block. Define variables g,- and a, as = {1 if the i‘” deceptive block is all 1 's 81 . 0 othervwse a __ lit the 1'“ alleleiso I 0 otherwise Then, the fitness of a chromosome for a 3-bit case, for example, would be given by f (chromk ) = “3 k k k k - E[Awigi 1' B(v3(i—l)+la3(i—l)+l 1' V3(i—1)+2“3(i-1)+2 1' V3(i-1)+3“3(i-1)+3)] i (3.22) Figure 3.4 shows an example fitness calculations for the chromosome 111101100, with a 3-bit weighted deceptive function. 90 Fitness function for 3—bit weighted deceptive problem: I l I §l o I $1 0 0 Fitness: Allele weights ill| w, w,§w4 w5 w,’ £111., w, w, Av, + B(Wstl'wsdrwq) Deceptive block V. V2 V, weights Figure 3.4 An example of the weighted fitness function for a 3-bit deceptive problem For the mean fitness, in this case, we have M 3.1’ At any stage of the model the mean fitness is given by L/3 L K1 =A2viy, +Bijaj. r j=1 The Equations (3.6) and (3.7) would be changed as 2 f (chrom) chrom e ’5: (t) = P[Av,- + rcf (t) — Aviyf (t)— B(wj+1aj+1(t) + wj+2aj+2 (t) + wj+3aj+3 (0)] where j = 30 — 1) , (3.23) and 91 h 2 f(chrom) = P[Bwj + (Kf (t) — Aviyf (t) — Bwjaj (0)] , c rome a ({(t) (3.24) where i=int(j/N)+1. These modifications give us the probabilities p?(t) and pf (t) as p7 = l 7:? [Avi +ch(t)—Aviyic(t)- B(Wj+laj+l(t)+Wj+2aj+2(t)+Wj+3aj+3(t))] Kf (3.25) where j = 3(i — 1) , and pa _ alewj + (Kf(t)- Aviyf(t)-Bwjaj(t))J j — ,. (3.26) where i=int(j/3)+1. 3.4 Deceptive Function with Boltzmann Scaling In Section 2.4, we have defined the Boltzmann scaling for two cases, one with fixed ,6 and the other with ,6 adjusted inversely proportional to the standard deviation of the fitness distribution. The development of the model of the deceptive function with Boltzmann scaling is similar to Section 2.4. Using the 92 definitions of 123-'0) and p,-7(t) and the sets 7S3(t), 75(0), “SJ (t) and “51} (t), in Section 3.2.1, the equation corresponding to (3.5) would be Zeflflchrom) chromEaSJU) “0) - pf _ Zeflwhmm) + Zefi‘whrom) chromeang) chromeaslj (t) and Zeflkhmm) chromE’SHt) 7’ _ pi (t) _ Zeflkhmm) + Zeflkhmm) ° chromersffl) chrom€753(t) (3.27) In the model, we need to estimate expected values of Zemdm’m), chromeaSlj (t) Zemmm'"), Zefifwhm’") and Zefi (Ch’om).When the crossover chromeaSJ (t) dimmers: (t) chrome 756 (t) rate is high enough, we have for the y counting fiz f(chrom) = P[A + (Kf (t) — A7,? (t) - B(aja) + a 1+1 (t) + a 1+2 (t)))] chrome 75(0) (t) + all-+10) + aj+2(t))] _ (a, h =P KC —A F B 1 2f (c ram) [ 1(t) 7, (0+ 1-7f(t) chrom e ’53 (t) (3.28) and for the a counting 93 2 f (chrom) = chrom e “S((t) P[xf (t) — Bar]- (t) + A(Common(1— a 1+1 (0,1 - a 1,2 (t))-— yf (0)] Z f(chrom) = P[B + (xfo) -- Ayfo) — Ba 1(0)] . chrome “SJ (t) (3.29) Then, for the model simulation we have the assumptions M 3.6 The fitness distributions of the sets 756 (t) and "S11' (t) are normal with the mean values given by Equation (3.28). M 3.7 The fitness distributions of the sets “SJ (1) and “Slj (t) are normal with the mean values given by Equation (3.29). 94 Chapter 4 COMPARISON OF THE MODEL WITH GA EXPERIMENTS In this chapter, we explore the model for OneMax and deceptive functions developed in Chapter 2 and 3, with various sets of GA parameters, and compare the model predictions with the data obtained from GA runs. This way, we both examine the GA behavior under different circumstances and also test the validity of the model assumptions. The exploration of the OneMax function includes both “high enough” and lower crossover rates, while for the deceptive function, we have only the results for “high enough” crossover rates since the model for lower rates is a subject for future research. In all the graphs of this chapter comparing the model with the actual GA runs, the black lines represent the experimental GA results and the thick gray lines represent the results obtained from our model. For notation purposes only, in some of the graphs, we denote the fitness proportional selection without the Boltzmann scaling as )3 = _1, although it is clear that there is no such ,6 variable in this case. 95 4.1 OneMax Problem 4.1.1 Fitness Proportional Selection In this section we apply the model developed in Sections 2.2.1 through 2.2.5. The population size is 50 and the chromosome length is 100 genes. The experimental results are obtained by averaging 100 runs of the GA. The “high enough” crossover rate in our case is p6 = 300%, which means that 100% crossover is applied 3 times in a row before the selection. This rate of crossover is verified experimentally as “high enough" by observing that there is no significant change in the graphs of Ah(t), K1(t) and K2(t) if a higher value of pc is used. It can also be verified from Figure 2.6, since the correction weights, when p0 = 300%, are all very close to 1. In Figure 4.2 and Figure 4.2, the time variation of A, is shown for crossover rates of pa = 4%, 25%, 75% and 300%. We observed in our experiments with the model that when the crossover rate is too low, such as p0 = 4%, the statistics of correction weights taken only from the 5th generation are not enough and we needed adjustment by using the statistics at the 15th generation. Figure 4.2(a) shows the graph with this adjustment. For p6 = 25% and 75%, the statistics from only the 5th generation are used. The simulation for pa = 300% is obtained by taking all cis as 1. In all four cases, no mutation is applied — i.e., pm = 0. The graphs look quite similar to each other, except that there is some variation in the value to which the lines converge as time approaches 200 generations. We see that the limit value decreases from around 40 to 30 as pc increases from 4% to 300%. This slight decrease observed in the experimental graphs is well captured by the model simulations. Such a change in 96 the converged values has a significant effect in the final fitness values of the population as a whole. In Figure 4.3, the time evolution of An is shown for two different mutation rates, in both of which pc is kept constant at 50%. In the first case the mutation rate is very low at pm = 0.1%, while in the second case it is pm = 2%. In both cases, the model predicts the mean allele behavior very well. 97 153:1.)3 .I I.) ”.1.- /I 0 so 160 150 200 (b) t Figure 4.1 Ah as a function of time for four different cases where the crossover rate is 4 and 25 percent, respectively. There is no mutation. Black lines are the experimental averages over 100 GA runs and thick gray lines are the results obtained by model simulations. Population size is 50 and the chromosome length is 100 genes 98 0 so 160 150 200 0 5b 160 150 200 (b) t Figure 4.2 A, as a function of time for four different cases where the crossover rate is 75 and 300 percent, respectively. There is no mutation. Black lines are the experimental averages over 100 GA runs and thick gray lines are the results obtained by model simulations. Population size is 50 and the chromosome length is 100 genes 99 Ci ...,-Lu. {P .N. .41 J“ (a) 100 150 200 (b) t Figure 4.3 Ah as a function of time for two different cases where the mutation rate is 0.1 and 2 percent, respectively. The crossover rate is 50% in both cases. Black lines are the experimental averages over 100 GA runs and thick gray lines are the results obtained by model simulations 100 The experimental results and the model estimations of mean fitness for several values of pm with crossover rates of p6 = 25%, 75% and 300% are shown in Figure 4.4. The impact on the mean fitness of changing pc from 300% to 25% is more visible when pm is low, around 0 or 0.1%. The effect of higher mutation rates dominates the dynamics of mean fitness evolution, and decreases the amount by which different crossover rates affect the mean fitness. The estimation of the fitness variance when pa = 300% is shown in Figure 4.5 for various mutation rates, together with experimental averages. In all these graphs, we see that these dynamics of mean fitness and fitness variance are well captured by the simulation of the model. 101 Figure 4.4 mean fitness (a) mean fitness (b) mean fitness (c) 0 so 160 13to 260 2350 360 The mean fitness for p6 = 25%, 75% and 300%. In each figure four different mutation rates, pm = 0%, 0.1%, 1% and 2%, are shown. Black lines are the experimental averages obtained by averaging over 100 GA runs and thick gray lines are the results obtained by model simulations 102 p -300% 25- m ' p =0.02 : 20- «..., . _g i" p =0.01 I... ‘ _3 g 15~-— ........ -— - ........ g “a!“ m 10 ------- 4., --------------------------------- C ~ .. E ‘ p =0.001 5 ____________ _- _ ........ E" ____________ .. fl pm=0 00 5b 160 t 130 I 200 250 Figure 4.5 The fitness variance for four different rates of mutation, pm = 0%, 0.1%, 1% and 2%, with crossover rate at 300%. Black lines are the experimental averages obtained by averaging over 100 GA runs and thick gray lines are the results obtained by model simulations 4.1.2 Boltzmann Scaling In this section, we examine the model for fl =0.1, 0.3 and 0.6 for both fixed and scaled ,6. The “high enough” crossover rate for Boltzmann selection is observed to be pc 2 3000%, in other words, full crossover (i.e. 100%) is applied at least 30 times in the crossover operation. In Figure 4.6 and Figure 4.7, we have the Ah curves. Figure 4.6 shows the case when no mutation is applied. The ,6 values are fixed at 0.1 or 0.6. 103 Figure 4.7 has the graphs for 0.1%, 1% and 2% mutation rates with ,6=0.1 selection pressure. The corresponding mean fitness graphs are shown in Figure 4.9 and Figure 4.9. This figure also contains the mean fitness curves for the corresponding GA parameters when ,8 is scaled. The comparison of mean fitness graphs for fixed-,6 Boltzmann selection for three different selection pressures, ,6 = 0.1, 0.3, 0.6 , without any mutation are seen in Figure 4.10. The corresponding fitness variance graphs are seen in Figure 4.11. When the selection pressure, ,6 is scaled, the Ah curves look as in Figure 4.12. Lastly, the comparison of fitness variance curves for different mutation rates of ,6 = 0.1 and ,6 = 0.6 are seen in Figure 4.13. 104 Figure 4.6 Ah curves for fixed ,6 Boltzmann selection. pm = 0, pc = 60. The first graph is for ,6 = 0.1, the second, for ,B = 0.6 105 100 pc=60 , pm=0.001 ,p=0.1 pc=60 . pm=0.01 .p=0.1 60 4O 20 o 10 20 30 40 so so t 100 80- \ eo- \ \ pc=60,pm=0.02,fi=0.1 <21: \-. 40» 20- “ \ 0 A“ w—— _J 0 1o 20 so 40 so so Figure 4.7 Ah curves for fixed ,6 = 0.1 Boltzmann selection. pm = 0.01, 0.1, 0.2 , p, = 60 106 1%! T 1 f 1m) 9.? pm=0.001 90y - as.) / pm=0.01 l 80” .. 'l l%-002 75— 70- 65*- w% g=30fi=01 55 .. m L 1 L t 1 1 -1 0 10 20 30 40 50 60 70 t gh=omn I pm = 0.02 1‘ lh=0 ( £- 9 l j 0 10 20 310 410 510 610 70 t Figure 4.8 The mean fitness graphs for fixed- ,6 Boltzmann selection. The graphs show mean fitness for different mutation rates with fl = 0.1 or ,6 = 0.6 107 1,110.52. .1. at». ..4 .. - / T l 85 .1 ___J 1 _____ L L 1 __._,..__ 0 10 20 30 4O 50 60 70 Figure 4.9 The mean fitness graphs for scaled— ,6 Boltzmann selection. The graphs show mean fitness for different mutation rates with fl = 0.1 or ,8 =0.6 108 1 1.....unuyiehnfiwvfimpw2 ”r _- . 1% l r I fir l = 0.3 100 ~ l %» - ’ 3:011 [3 = 0.6 4 :2— ~ 1 l e‘o go Figure 4.10 The mean fitness graphs for fixed ,6 Boltzmann selection for three different selection pressures ,6 = 0.1, 0.3, 0.6 with pm = O 109 25 1 I I l I Figure 4.11 The fitness variance for fixed ,6 Boltzmann selection for three different selection pressures ,6 = 0.1, 0.3, 0.6 with Pm = O 110 1'? 333..-; .. mill s d . . , Figure 4.12 fi'iT f‘ ‘f‘ m so 70 l pc=60, pm=0.001 ,p=0.1 so <‘ 50 , \ \ 40 x x \ w \ \ 20‘} ‘ \‘\\ 10‘ ‘\ ‘ 7_ 0 ’M. F» J o 10 20 so 40 so so i (3 = 0.6 , pm = 0.001 1007 w- 80, 7o 60 50 4o 30. 20 10‘ F 0 A ‘ , l 0 5 10 15 20 25 30 Ah curves for scaled- ,8 Boltzmann selection. Pm = 0.001, pc = 60. The first graph for ,6 = 0.1, the second ,6 = 0.6 111 1,.‘Hnw\‘ no Uri... h .A , , 25 T T j’ \ 20» \e - 2 \ . ‘ ‘ __ pm=0.02 Isl \ ' ‘ 4 ‘\ \‘ ‘ N \ ‘ \ 2 \ 10* pm=0.01 l pc=60 , p=0.1 _ 5% p m - 0.001 O__ l J 1 A _1_ 0 10 20 30 40 so so i (3 = 0.6 124-2;: 1 1 T r l 1 10» L ' it 8’ ‘ i. (3:. ”it N r a: 6 a . "52-, fl 4 1: - . w 2.222 2: 2 E ,2 15;“ 0.02 2 . a“. . . pm=0.01 2 .90... “'_‘::-‘_ J. __ - .: ______ .2__:--=_ = 0 = 0 pm \jm 000 - 2 . , . . .. 0 10 20 30 40 50 60 70 t Figure 4.13 The fitness variance for scaled ,6 Boltzmann selection for two different selection pressures ,6 = 0.1 and ,6 = 0.6 showing each selection pressure for different mutation rates 112 4.1.3 Weighted Fitness Function In this section, we study the GA with two different weight profiles. We will call them Profile-A and Profile-B. The graphs of the weight profiles as a function of the gene locus are shown in Figure 4.14 and their equations are given by —1§(i—1)+1.9 if ISiSSO 49 Profile-A wi =4 %(i—51)+O.1 if 51sis100 L 1 if Isisso Profile-B w, = i—l , 3—— If SISiSIOO (4.1) In Profile-A, the weight increases from 0.1 to 1.9 linearly as we go from the middle of the chromosome towards the ends. Hence the gene values in the middle of the chromosome contribute less to the fitness compared to the ones near the ends of the chromosome. The maximum possible fitness value of this profile is 100, which is obtained when all the alleles are 1. Profile-B is constant 1 for the first half of the chromosome. The weights of this profile decrease from 1 to -1 linearly in the second half of the chromosome. Note that Profile-B takes negative values in the last quarter of the chromosome. This means that having a 1 in this part of the chromosome causes the fitness to decrease. So, the GA would try to have 0’s instead of 1’s in the last quarter of the chromosome. The maximum possible fitness value of this profile is 63, which is 113 obtained when all the alleles from locus 1 to 75 are 1 and from locus 77 to 100 are 0. The value of locus 76 does not matter in this case since its weight is 0. First we study Profile-A. The A}, curves for fitness proportional selection and Boltzmann selection with ,8 = 0.6 , without mutation are shown in Figure 4.15 and Figure 4.16. In Figure 4.15 (b) we also see the graphs for fitness proportional selection when the mutation rate is 2%. Figure 4.17 and Figure 4.18 have the mean fitness curves for several cases. In Figure 4.17, we see the graphs for different mutation rates of the fitness proportional selection. In Figure 4.18(a), the fitness proportional selection is compared with ,6 = 0.1 and ,6 = 0.6 of Boltzmann selection with scaled ,6 . And, in Figure 4.18(b) ,6 =0.6 case of Boltzmann selection is shown for mutation rates of 0%, 0.1% and 1%. The corresponding fitness variance curves of all the cases of are shown in Figure 4.19 and Figure 4.20. When we consider Profile-B, Ah curves look as in Figure 4.22 and Figure 4.22, where we see the curves for ,6 = 0.1 and ,6 = 0.6 with no mutation and ,8 = 0.6 with 1% mutation. Comparison of mean fitness curves for fitness proportional selection and Boltzmann selection with ,8 = 0.1 and ,B = 0.6 is in Figure 4.23. In Figure 4.24, the mean fitnesses with Boltzmann selection ( ,6 = 0.6) for different mutation rates are shown for time > 9, to be able to show the details of the graphs. Lastly, the fitness variance curves comparing fitness 114 proportional selection and Boltzmann selection with ,6 = 0.1 and ,6 = 0.6 are in Figure 4.25. 2 six 1.5~ K 2" t» .9 1. .C .9 3 0s Ol- -0_5 1 __ l__--2__.-. I 1. 0 20 40 so so 100 locus,i Profile-A 1.5 222 .° cn locus weights wi s 01 O 20 40 60 80 100 locus, i -1.5 0 Profile-B Figure 4.14 Profile A and Profile B for the weighted fitness function; 115 r...1 ...r-Buh..,. .» . 5». id. 2 . [3 = -1 , Profile-A 250 (b) Figure 4.15 Ah curves for Profile-A with fitness proportional selection, pm = 0 and pm =0.2 116 l ...IN.. M” 1-4...» ..a... H1 B = 0.6 , p“ = 0 , Profile-A 1m 7“ Figure 4.16 Ah curves for Profile-A for scaled- ,6 Boltzmann selection. withfl = 0.6 and pm =O,pc =60. 117 2.7 J- i __l 45 l P _ l. l 0 50 100 150 200 250 Figure 4.17 Mean fitness curves for Profile-A for fitness proportional selection with different pm’s 118 pm = 0 , Profile-A 5‘ fl [3 = -1 l 2 45 50 1(1) 1‘30 200 (a) t 102 100 932 96‘ 92~ 90* 88- 86 " (b) Figure 4.18 Mean fitness curves for Profile-A with Boltzmann selection, scaled ,B: (a) comparing different ,6 cases; (b) for ,6 = 0.6 with different 0 Pms 119 1.1.49.5...2 ulhltl. , A.. B = -1 , Profile-A 35 'r——_~7 If” I 30[ 2 H A = 0.02 25l l 2 p25 .. w ” . I _ £13,. h | V» 20- - '2 _ , .2 . ‘ N p = 0 01 2 . m ' 15— , 4 102 pm = 0 d 5) 2 (a) t pm = 0 , Profile-A 35 . T l (b) Figure 4.19 Fitness variance curves for Profile-A: (a) for fitness proportional selection with different pm’s; (b) comparing different ,6 cases 120 - ;. li'e’IIl‘A“ B = 0.6 , p" = 0 , Profile-A T 35 l 30— _ 25 2 20 l l 10~ 20 ' 30 40 50 so 70 Figure 4.20 Fitness variance curves for Profile-A for )6 = 0.6 with different pm ’s. Boltzmann selection is with scaled ,6 121 _L-A: 1'1. B = 0.1 , pm = 0 , Profile-B (a) (b) Figure 4.21 Ah curves for Profile-B with scaled- ,6 Boltzmann selection. (a) ,6 = 0.1 , no mutation; (b) ,6 = 0.6 with no mutation 122 % ilh’l-u“ I» t. Ir724w B = 0.6 , pm = 0.01 , Profile-B 1CD Figure 4.22 Ah curves for Profile-B with scaled-,8 Boltzmann selection. ,6 = 0.6 with 1% mutation. In all cases pc = 60 123 pm = 0 , Profile-B 65 l 60 +— __ ' .2 55 T l 50 45 if“ 40 ~ l3 2 _1 — 35 » 2 30 252 * Figure 4.23 Mean fitness curves for Profile-B, showing fitness proportional selection and Boltzmann selection with ,8 = 0.1 and ,6 = 0.6 124 Figure 4.24 Mean fitness curves for Profile-B. Boltzmann selection with ,6 = 0.6, and different mutation rates 125 H.13031l1l. A pm = 0 , Profile-B Figure 4.25 Fitness variance curves for Profile-B, showing fitness proportional selection and Boltzmann selection with fl = 0.1 and ,6 = 0.6 126 4.2 Deceptive Function Problem In this Section, we apply the GA to a 3-bit deceptive function. We study only the case where the crossover rate is high enough. The chromosome length is chosen as L = 99, and the population size is P = 50. So, we have 33 deceptive blocks, each of which is 3 bits long. In all the cases studied in this section, the values of A and B are chosen as A=4 and 8:1. Note that the maximum possible fitness for a chromosome, when the fitness is not weighted, is A x 33 = 4 x 33 = 132, which is achieved when all the alleles are 1. 4.2.1 Fitness Proportional Selection We apply the model developed in Sections 3.2.1 through 3.2.4. In Figure 4.26, we have the curves for both the evolution of alleles with zero value, A2, and the evolution of the deceptive blocks, A7 , h =0,0.1,...,0.9, when the fitness proportional selection is applied. The mutation rate is zero and the crossover rate pc= “high enough”. In Figure 4.27, A: and A; curves are shown when the mutation rate is 0.1%. The graphs of mean fitness and fitness variance are shown in Figure 4.28 comparing cases with different mutation rates. 127 020406080100120140160100200 t l _ L o l 1 L l_, 1 1 -' i 0 20 40 60 80 100 120 140 160 1K) 200 t Figure 4.26 A: and A}; curves for fitness proportional selection. pc= “high enough”, pm = 0. Black lines are from GA run, gray lines are from model simulation 128 s=-1,qn=0.001 Figure 4.27 Af,’ and A; curves for fitness proportional selection with mutation rate pm = 0.001, and pc = “high enough” 129 250 B = -1 F as ~ j x F ‘- - a i ,2 m“ "h '6 ”1 ' m \ p =0.02 25~ N M 20~ ‘ 1 152 1 j - 155.7% 3,.” 10- ‘ I J a. pm = 0.001 5»— pm=° ‘ ‘ 0 2 J I l A 0 so 100 150 200 250 Figure 4.28 Mean fitness and fitness variance curves for fitness proportional selection. pc = “high enough”, pm = 0 , 0.001 ,and 0.2. Black lines are from GA run, gray lines are from model simulation 130 4.2.2 Boltzmann Selection Application of scaled- ,6 Boltzmann selection with 0.1 and 0.6 selection pressures to our deceptive function yield A}? and A; curves in Figure 4.29 and Figure 4.30, in which cases the mutation rate is taken as zero. Comparison of mean fitness and fitness variance for selection pressures of 0.1, 0.3 and 0.6 are in Figure 4.31. Lastly, in Figure 4.32, we have the mean fitness and fitness variance curves for ,6 =0.1 Boltzmann selection with four different mutation rates, pm =0, 0.001, 0.01 and 0.02. 131 l '12.: v we“... N..;.5,\,.:d. 120 120 Figure 4.29 A: and A; curves for selection with Boltzmann scaling. ,6 =0.1 pc = “high enough”, pm = 0. Black lines are from GA run, gray lines are from model simulation 132 «'2 :3. :: :e'zla’ :3 20 25 30 Figure 4.30 A}: and A; curves for selection with Boltzmann scaling. ,B = 0.6 pc = “high enough”, pm = 0. Black lines are from GA run, gray lines are from model simulation 133 almmvfiwb. «t .92 2‘22 ’3 2 z r .. . . v. 120 110 100- 70 120 35 30 ._ 25 - 20 - 15 — 10~ 120 Figure 4.31 Comparison of mean fitness and fitness variance curves for selection with Boltzmann scaling for )3 :01, 0.3 and 0.6. Pa = “high enough”, pm = 0. 134 115 ____@,,_2 —~v—— r . r 1 110 1(5 1GP 8 B = 0.1 40 A I f I I I III . . _.-_ :.- -.- .___.'_I..I-_- I {F'- ,' V: "l 22,222.21, ‘ VT 35- 22..-... fi —00 I "v I pm .. I 30 2 -.-.-_I II I is I ...1‘ 225‘ N “:32 ‘ — d 15 r III‘ 2‘“ . 10 r- ‘=. I I a ‘se, 5 .2 ‘IIIIII I - h ‘ {7:23. pm 0 922?“? '3 M JTS~ H. l......_.._ 0 1 1 1 AJMW‘J‘F— 0 20 4O 60 80 1a) 120 t Figure 4.32 Comparison of mean fitness and fitness variance curves for selection with Boltzmann scaling for ,6 :01. pa = “high enough”, for different mutation rates 135 Chapter 5 CONCLUSIONS AND FUTURE WORK 5.1 Conclusions In this thesis, we have developed a new and very practical model for the GA dynamics of the OneMax problem and for a form of deceptive function problem that is described in Section 3.1. The model can be used to find the mean allele and, in the case of deceptive function, mean all-1 deceptive blocks values. Then, it predicts mean fitness and fitness variance of the population. Although the problems for which our model proved successful were rather simple or idealized, they were often sufficiently involved to capture interesting nontrivial features of the GA dynamics. An attempt to develop a stochastic differential equation model to predict evolution of fitness distribution for the OneMax problem is presented in Appendix A. Although our attempt was not successful, it shows a strong connection between fitness evolution and certain diffusion equations and points toward the possibility of the existence of a diffusion-type equation that could describe the OneMax dynamics. 136 2r >Na..1.l 1‘14...” The model can be applied to these benchmark problems even when the crossover, mutation and the selection rates (in the case of Boltzmann scaling) are changed at predetermined generations during a GA run. Because of this capability of the model, it is unique, to the best of the author‘s knowledge, among the current models of the GA. The method of building blocks for modeling parallel genetic algorithms is applied by Cantfl-Paz [2001] in the case where the migration occurs only when all the populations are converged. Since our model estimates the mean value at each locus at any generation, it can be used to determine a suitable migration time as well as the migration rate for parallel genetic algorithms (in the island model case) when migrations are allowed at any generation for our benchmark problems. The OneMax model, Diagram 1 (page 65), for modeling the case of typical crossover rates, uses some statistics of early generations of the GA in order to predict the rest of the evolution. The simulation results in Section 4.1 show that the model describes the GA dynamics for the OneMax problem very well for different crossover and mutation rates with fitness proportional selection. The correction weights introduced by Equation (2.20) are a new way of analyzing the crossover operator, and they work very well for two-point crossover, in our GA problem. Note that our OneMax model for “high enough” crossover rates is covering a different case than the statistical mechanics model of Priigel-Bennett and 137 Shapiro [1994]. The maximum entropy assumption of Priigel-Bennett and Shapiro essentially models a situation in which the crossover operator is assumed to be effective enough to allow a relocation of the alleles which is probabilistically most likely to occur, under the constraints of the given mean fitness and fitness variance, when the alleles move freely. On the other hand, our model of “high enough” crossover rates does not assume any constraint in relation to how much the alleles can be mixed. The lower crossover rates are modeled relative to this extreme case using correction measures. Extension of the OneMax model to the weighted fitness function is described in Section 2.3. The graphs of the model simulations, Section 4.1.3, show that the model is tested for weight Profiles A and B and the results are in excellent agreement with the GA results. The author believes that the model is effective for any weight profile. Application of the OneMax model, modified as described in Section 2.4, to the Boltzmann selection gives very good estimations of this case for both the fixed and the scaled selection pressures, in the case of “high enough” crossover rates. The model for the deceptive function, Diagram 2 (page 89), is quite powerful in predicting the dynamics for both fitness proportional and Boltzmann selections. This model is unique, to the best of the author’s knowledge, in predicting the dynamics of a deceptive function to such an extent. It estimates the simultaneous 138 evolution of both the mean allele values and the mean all-1 deceptive blocks, as well as the mean fitness and fitness variance of the population. 5.2 Future Work One immediate improvement needed in our OneMax model is to modify it to include the lower crossover rates with Boltzmann selection. It is observed by the author that the correction weights, as defined in Section 2.2.4, are not by themselves sufficient to predict the Boltzmann selection behavior since they- graphs, Figure 2.6, of this case change significantly as a function of time, unlike the fitness proportional selection case in which the statistics from generation 5, for example, would be sufficient to predict future behaviors. Our model of the deceptive function also needs to be developed further to include lower crossover rates. In this case, one would need two sets of correction weights, one for modifying Equation (3.8) and the other for Equation (3.9). Then, the correction weights are going to be a function of two variables, a and 7. The future work to improve the model would also include the estimation of the fitness variance for lower crossover rates, and an investigation of the predictive power of the model in the presence of external noise. The next step in developing our model towards more complex fitness functions is to consider a fitness function which is made up several OneMax and deceptive parts as shown in Figure 5.1. Here, the chromosome has many 139 subdivisions, possibly ovenapping, and the fitness of the chromosome is found by adding up the contribution of each part, which is found by applying either a (weighted) OneMax or a (weighted) deceptive function. It is speculated that a model for such a GA problem might be established by “pasting together” models developed in this thesis. ¢ chromosome 2 I l l loul II J Lounting I's S-bit deceptive 41—bit deceptive “W““Q ‘3 34’“ de‘iV‘ Figure 5.1 A complicated fitness function a composition of several OneMax and deceptive parts Once a model is developed for a GA problem as described in Figure 5.1, the next step might be to consider a real-life problem and try to represent it with a fitness function that looks like Figure 5.1. A different direction to follow in developing a model for complex real-life problems is as follows. Note that, when a ”snapshot” of a GA is examined at some instant, from the point of view of the crossover + selection operators, in relation to determining the next mean allele (or mean all-1 deceptive block) values, any GA problem looks as if it is a weighted OneMax problem at that instant. In other words, any GA problem, in theory, can be seen as a series of weighted OneMax problems, the weights of which change dynamically from generation to generation. So, if one can estimate such a weighted OneMax decomposition of a real-life problem, it should be sufficient to predict the 140 dynamics the real-life problem by applying the OneMax model developed in this dissertation with fitness weights changing as a function of time. 141 APPENDIX A COMPARISON OF THE ONEMAX PROBLEM WITH A DIFFUSION MODEL In this section we will study the diffusion equation Lg ’) ..— .%[A, (2) f(x,t)]+ ggp, (x) f(x,t)] (M) where 2 2 A1(x)= x(l—%lnx)|:a+%—(l——l—lnx]—g—] (A2) and 2 A2 (x) = 02x2[1— imac) (A.3) _ K with initial condition lim f(x, t) = 6(x — x0) (A.4) t—->0 and explore its relationship to the evolution process of the one-max problem. The constants are chosen such that 0', K > O and 0 < a < 1 . 142 Equation (A.1) appear in the study of population growth processes in random environments [Capocelli,Ricciardi,1975], [Ricciardi,1977, 1985]. If we consider the equation fl = ax(l — ilnx) (A.5) K as modeling a growth process with fertility rate a, then 2" becomes the asymptotic population size. Changing a into a+ A(t), where A(t) is a white 2 a noise with intensity —2—, the resulting stochastic equation becomes ézax(l_-1_lnx)+X(l—ilnXJA(t) - (A6) dt K K Equation (A.1) is the Fokker-Plank equation, in other words the forward equation, for the transition probability density function, f (x, t), of the stochastic process described by Equation (A.6). Here, A1(x) and A2 (x) give the infinitesimal drift and variance of this diffusion process, respectively. From Equations (A2) and (A3) we see that the diffusion interval is (0,eK ). We can also write the so-called Kolmogorov or the backward equation for (A.6) as 143 af( 2 x)_ 62f T2: A(x og—l 2.2.2., 2;. (Ar) In this equation, the initial condition, x0, of the forward equation is considered as another variable of the function f . The solution of (A.7). or equivalently of (A.1), can be obtained by applying the transformation " d (r/(t,x)=I :1 -at, u[l——lnu] K 3?=i//(t,x), 3'0 =V(t,xo), to Equation (A.7). Defining 7 as x0) , (A.8) f(xtl~o)= [L5 ’0] flat Equation (A?) is transformed into the well known Wiener process described by the equation 67(“3 '“ )= 627 at 52,2 with the solution in the form of a Gaussian function 144 ~~ 2 76,435): (4m)‘“2exp[-— 15%] . (A.9) By (A.8) we obtain r 2 . [K 1n(__—II((_ 13:0] - as] -exp<— - x , > . (A.10) K f(x’t) 2 2(K —1nx)x[nrt 47f where x takes values between 0 and ex. Figure (A.1) shows the graph of this solution for different time values. The constants in f are taken as a =1.5, 0'2 = 0.4, and K = 4.6052. For any fixed value of t, say t = t1, the function f (x,t1) gives the probability density function for the value of population after t1 time units from the beginning. In particular Tflxetlfl" = eIf(x,tl)dx =1- 145 K=4.6052, a=1.5, 0 2=0.4 0.9 2 i a 2 2 . t=10 0.sL - 0.7— - i=9 0.s~ 2 Ofir i=8 "' 4» 2 2 {/0 t 7’ t=s t=s , 0.2 {:4 -< _ ’ initial distribution (=1 l- 2 ' '0’! i ‘ 2 J J l__ ._L2_.22___ l ___.._ l __2 30 40 50 so 70 so 90 Figure A.1 Evolution of transition probability distribution, f (x,t) On the other hand, in Figure (A.2) we see the evolution of fitness distribution for the OneMax problem with a population size of 50, (mutation rate 0.001 and Boltzmann selection with ,6 =0.3, in which two-point crossover is applied). The figure shows the fitness graphs for every second generation from 0, 180. For each generation the area under the curve remains equal to 1 as in the case of Figure (A.1). When we compare Figure (A.1) with Figure (A.2) we see a remarkable similarity between two figures. 146 In Figure (A.3), the initial distribution is chosen somewhat similar to the artificial migration case CASE2 of in Section 1.2.3. The solution of the differential equation for this initial condition is shown together with the solution with regular initial condition. The comparison shows a similar relationship as seen in Section 1.2.3 between CASE1 and CASE2. chrom slze=100, pop size=50, margari=200, maxruns=500, beta=0.3, pm =0.001 0.9 I I I I I I 0.8 # gen 180 2 0.7 2 ,5 § 0.6 2 i5 2 .3‘ 0.5 a C 8 0.4 2 E 2..., 0.3 2 9.080 “- am 60 ., - 9’" 50 .1] I g 02 2 gen 20 Ben 309°" 40 ‘r‘v/‘l‘l‘rfifr'hlalsI‘l/l n v « 'i'o'v'v's’té'i’i'igofitf'o’rlli.llWl l '0 O 99'6’0’0:0,’0,'¢l’i)09’f’i3(l WIN: 0.1 b i99¢6$ttttto,iit,ltj( My, 9” ’I’Q’QO’W 0”.th i‘h‘.‘\ \‘ll:1\‘\ I; 36,424,923;3.393.242.3222:-Ml.10:-t\» O *- 5 ’ ' A 2 o. .._l. at- -7 «x.» -. '-.~\ IIII‘. ~. . K I l 1 l i I 30 40 so 60 70 so 90 100 Figure A.2 Evolution of fitness distribution for 180 generations of OneMax problem 147 0.12_"——w "T 1” *TTF "y i I l T 0.1»- 0.08 0.06 0.04 o(F), Fitness Density Distribution 0.02 9; Figure A.3 Evolution of two transition probability distributions, f(x,t). For the dark curves, initial condition is given to resemble a migration case Although our attempts to determine suitable A1 and A2 and find an evolution equation in the form of Equation (A.1) for the fitness distribution failed, the similarity between two evolution graphs suggests that with a suitable modification to Equation (A.1), one might be able to obtain a differential equation which models the evolution of the fitness distribution for the one-max problem of genetic algorithms. 148 BIBLIOGRAPHY Buyukbozkirli, B., Goodman, E.D., A statistical model of GA dynamics for OneMax and deceptive functions, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Seattle (2004) Cantu-Paz, E.: Efficient and Accurate Parallel Genetic Algorithms. Kluwer Academic Publishers, (2001) Capocelli, R.M., Ricciardi L.M., A note on growth processes in random environments. Biol. Cybernetics 1 8 (1975), 105-109. Furutani, H., Schema Analysis of OneMax Problem: Evolution equation for first order schemata. Foundations of Genetic Algorithms (FOGA) 7, (2002) Goldberg D. E., Genetic Algorithms in Search. Optimization & Machine Learning, Addison Wesley (1989) Goldberg, D.E., Deb, K., Thierens, D.: Toward a Better Understanding of Mixing in Genetic Algorithms. Journal of the Society of Instrument and Control Engineers, (1993), 32(1), 10-16 Goldberg, DE: The Design of Innovation. Kluwer Academic Publishers, Boston, Dordrecht, London, (2002) Harik, G., Cantu-Paz, E., Goldberg, D., & Miller, B. L. The gambler’s ruin problem, genetic algorithms, and the sizing of populations. Evolutionary Computation, 7(3), 231 -253, (1 999) Holland J.H., Hierarchical descriptions of universal spaces and adaptive systems, Technical Report ORA Projects 01252 and 08226, University of Michigan, Department of Computer and Communication Sciences (1968) Holland J.H., Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press (1975) Nix, A.E., Vose, M.D.: Modeling Genetic Algorithms with Markov Chains. Ann. Math. Art. Intell., (1991), 5:79-88 149 Ozpineci, B., Tolbert, L.M., Chiasson, J. N., Harmonic optimization of multilevel converters using genetic algorithms. 35‘" Annual IEEE Power Electronics Specialists Conference, Aachen, Germany (2004) Poll, R., Exact schema theory for genetic programming and variable length genetic algorithms with one point crossover. Genetic Programming and Evolvable Machines, 2(2): 1 23—1 63, (2001 ) PriigeI-Bennett, A., Shapiro, J.L.: An Analysis of Genetic Algorithms Using Statistical Mechanics. Phys. Rev. Lett., (1994), 72(9): 1 305-1 309 PriigeI-Bennett, A., Shapiro, J.L.: The Dynamics of a Genetic Algorithm for Simple Random Ising Systems. Physica D, (1997), 104:75-114 Priigel-Bennett, A., Modeling finite populations. Foundations of Genetic Algorithms (FOGA) 7, (2002) Ricciardi L.M., Diffusion Processes and Related Topics in Biology. Lecture Notes in Biomathematics, 14, Springer-Venag (1977) Ricciardi L.M., Stochastic Population Theory: Diffusion Processes. (1985) Rattray, L.M.: Modeling the Dynamics of Genetic Algorithms Using Statistical Mechanics. PhD thesis, University of Manchester, Manchester, UK, (1996) Stephens C.R., Waelbroeck H., Schemata evolution and building blocks. Evolutionary Computation, 7(2): 1 09-1 24, (1999) Suzuki, J., A Markov chain analysis on simple genetic algorithms. IEEE Transactions on Systems, Man and Cybernetics, Vol. 25 No.4, April (1995) Vose, M.D., Liepins, G.E.: Punctuated Equilibria in Genetic Search. Complex Systems, (1991), 5:31-44 Vose, M. D., The Simple Genetic Algorithm, Foundations and Theory. The MIT Press, (1999) Whitley D., A Genetic Algorithm Tutorial, Statistics and Computing, vol. 4, pages 65-85 (1994) 150 Wright, A. H., Rowe, J.E., Stephens C. R., Poli, R., Bistability in a Gene Pool GA with Mutation. Foundations of Genetic Algorithms 7, FOGA, (2002) Wright, A., Poli, R., Stephens, C.,Langdon, W.B., and Pulavarty, S. An Estimation of Distribution Algorithm Based on Maximum Entropy. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Seattle (2004) 151 I1111111111111111