This is to certify that the dissertation entitled

ACHIEVING CONSISTENT EVOLUTION ACROSS ISOMETRICALLY EQUIVALENT SEARCH SPACES

presented by Arnold L. Patton has been accepted towards fulfillment of the requirements for the Ph.D. degree in Computer Science and Engineering.

Major Professor's Signature

Date

ACHIEVING CONSISTENT EVOLUTION ACROSS ISOMETRICALLY EQUIVALENT SEARCH SPACES

By

Arnold L. Patton

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science and Engineering

2004

ABSTRACT

ACHIEVING CONSISTENT EVOLUTION ACROSS ISOMETRICALLY EQUIVALENT SEARCH SPACES

By

Arnold L. Patton

Evolutionary Computation systems are stochastic search processes which attempt to locate the global optima of a search space. Some of these systems have been previously shown to be inconsistent in their behavior across isometric transformations, such as rotation of coordinate axes. This work attempts to study this issue in depth by categorizing behaviors of individual operators and composite EC systems. Several approaches toward creating modified operators which deal with these situations are also studied.

COPYRIGHT by Arnold L. Patton 2004

To my wife, Rachel, and my children, Noah and Bethany, for their patience, prayers, and sacrifice during this inordinately lengthy process. To my parents, John and June, without whose sacrifice and support I could not have begun my journey through the halls of higher education.

ACKNOWLEDGEMENTS

My advisor, William F. Punch, III, for his patience and consideration. The members of the Michigan State University GARAGe, the discussions with whom initiated much of this work; especially Michael Raymer, Terrance Dexter, Owen Mathews, Victor Mlagkikh, Sasha Topchy, and Marilyn Wulfekuhler. The faculty of the Department of Computer Science and Information Systems at Bradley University for their support. The members of the BEAR group and Dean Claire Etaugh at Bradley University, for creation and support of the Beowulf cluster on which much of this data was collected.

TABLE OF CONTENTS

LIST OF TABLES ..... xi
LIST OF FIGURES ..... xv

CHAPTER 1 INTRODUCTION ..... 1
1.1 INTENTION OF THIS THESIS ..... 5
1.2 PROBLEM DOMAIN ..... 7
1.3 TECHNIQUES ..... 9
1.3.1 Statistical Analysis of Distributions ..... 10
1.3.2 Comparative Experimental Analysis of Operators ..... 10

CHAPTER 2 BACKGROUND ..... 11
2.1 NO FREE LUNCH THEOREMS ..... 11
2.2 OVERVIEW OF EVOLUTIONARY COMPUTATION ..... 15
2.2.1 General Parametric Evolutionary Computation Algorithm ..... 20
2.2.2 Evolutionary Programming ..... 25
2.2.3 Genetic Algorithms ..... 27
2.2.4 Evolutionary Strategies ..... 37
2.3 OVERVIEW OF COMMON EVOLUTIONARY OPERATORS ..... 38
2.3.1 Selection Operators ..... 38
2.3.2 Recombination ..... 46
2.3.3 Mutation ..... 59
2.4 EMPIRICAL TEST FUNCTIONS ..... 62
2.4.1 Function Categorization and Terms ..... 63
2.4.2 Function Illustrations ..... 68
2.4.3 Square Function ..... 70
2.4.4 Sphere Function ..... 71
2.4.5 Schwefel's Problem 1.2 ..... 72
2.4.6 Schaffer's Function ..... 73
2.4.7 Schaffer's Function Modification 1 ..... 75
2.4.8 Schaffer's Function Modification 2 ..... 77
2.4.9 Schaffer's Function Modification 3 ..... 79
2.4.10 Schaffer's Function Modification 4 ..... 83
2.4.11 Ring Function ..... 85
2.4.12 Trapezoid Cross Function ..... 89
2.4.13 Rosenbrock's Saddle Function ..... 90
2.4.14 Spiral Function ..... 91
2.4.15 Ackley's Function ..... 94
2.4.16 Griewangk's Function ..... 96
2.4.17 Clover & Cross Function ..... 98
2.4.18 Bohachevsky's Function ..... 100
2.4.19 Rastrigin's Function ..... 103
2.4.20 Yip & Pao's Function ..... 105
2.4.21 Multi-modal Spiral Function ..... 107
2.4.22 Frequency Modulation Sounds (FMS) Problem ..... 110
2.4.23 Exponential Function ..... 113
2.4.24 Chain-Link Function ..... 114
2.4.25 Double Cos Function ..... 115
2.4.26 Inverse Exponential Function ..... 116
2.4.27 Worms Function ..... 118
2.4.28 Schwefel's Function ..... 120
2.4.29 Dynamic Control ..... 122

CHAPTER 3 ANALYSIS OF OPERATORS ..... 123
3.1 STATISTICAL DISTRIBUTION ANALYSIS ..... 124
3.1.1 Mean Disturbance ..... 125
3.1.2 Variance Disturbance ..... 128
3.1.3 Mean Focusing (Center Tending) ..... 132
3.1.4 Covariance Disturbance ..... 133
3.2 LOCAL STATISTICS AND HOMEOMORPHIC ENCODING INVARIANCE ..... 149
3.2.1 Translation of axes ..... 151
3.2.2 Linear rescaling ..... 152
3.2.3 Rotation of axes ..... 154
3.2.4 Underconstrained (Free) Parameters ..... 155
3.2.5 Validity of Local Statistical Extrapolation ..... 159
3.3 EMPIRICAL ANALYSIS ..... 161
3.3.1 Statistical Tests ..... 161
3.3.2 Statistical Test Cases ..... 167
3.3.3 Example Statistical Analysis ..... 174
3.3.4 Empirical Statistical Results ..... 180
3.3.5 Preliminary Operator Taxonomy ..... 195

CHAPTER 4 ALTERNATE POPULATION RELATIVE OPERATORS ..... 200
4.1 OVERVIEW OF BENCHMARK SYSTEMS AND EMPIRICAL COMPARISONS ..... 201
4.1.1 Random Rotational Presentation of Test Functions ..... 202
4.1.2 EP and Fast EP Systems ..... 203
4.1.3 BLX-α GA System ..... 204
4.1.4 Ranksum Comparisons ..... 204
4.2 PRINCIPAL COMPONENT OPERATORS FOR GA ..... 209
4.2.1 Principal Component Crossover (PCX) ..... 210
4.2.2 Principal Component Gaussian Mutation (PCGM) ..... 211
4.2.3 Sample Selection for Principal Component Analysis ..... 212
4.2.4 Difficulties with Relative Under-Constraint and Excessive Freedom ..... 213
4.2.5 PC GA System Parameterization ..... 214
4.3 DEMONSTRATION OF ROTATIONAL BIAS IN STANDARD APPROACHES ..... 215
4.3.1 BLX-α Rotational Bias ..... 215
4.3.2 EP and Fast EP Rotational Bias ..... 217
4.3.3 Principal Component GA Rotational Bias ..... 220
4.4 EMPIRICAL RELATIVE EFFICIENCY COMPARISON OF PCGA ..... 222
4.4.1 Comparison to EP and Fast EP ..... 222
4.4.2 Comparison to BLX-α GA ..... 226
4.5 EMPIRICAL EVALUATION OF PCX AND PCGM IN ISOLATION ..... 228
4.6 EMPIRICAL COMPARISON OF PRINCIPAL COMPONENT OPERATORS AND RANDOMLY ROTATED OPERATORS ..... 233
4.7 EVALUATION OF THE EFFECT OF SCALE ON PC OPERATORS ..... 235
4.8 EVALUATION OF THE EFFECT OF SAMPLE SIZE ON PC OPERATORS ..... 237
4.9 EVALUATION OF THE EFFECT OF POPULATION SIZE ON POPULATION RELATIVE OPERATORS ..... 240
4.9.1 PCGA Population Sensitivity ..... 240
4.9.2 BLX-α GA Population Sensitivity ..... 243
4.10 LOSS SENSITIVE OPERATORS FOR EC ..... 245
4.10.1 Variance Recapture and Convergence ..... 245
4.10.2 Global versus Dimensional Variance Targets ..... 247
4.10.3 Difficulty with Non-fixed Axes ..... 248
4.11 EMPIRICAL RELATIVE EFFICIENCY COMPARISON OF VREP ..... 249
4.11.1 Comparison to EP ..... 249
4.11.2 Comparison to Fast EP ..... 252
4.11.3 Comparison to BLX-α GA ..... 254
4.11.4 Comparison to PCGA ..... 254
4.12 EFFECT OF RECAPTURE PERCENTAGE ON VREP EFFICIENCY ..... 257
4.13 EFFECT OF POPULATION SIZE ON VREP EFFICIENCY ..... 259
4.14 GRAPHIC PERFORMANCE COMPARISON OF SYSTEMS ..... 261

CHAPTER 5 CONCLUSIONS ..... 289
5.1 CONCLUSIONS FROM THEORETICAL EVALUATIONS ..... 289
5.2 CONCLUSIONS FROM EMPIRICAL EVALUATIONS ..... 290
5.3 FUTURE WORK ..... 292

REFERENCES ..... 294

LIST OF TABLES

TABLE 3.1 RESULTS ON ALIGNED UNIFORM UNIMODAL DISTRIBUTION ..... 180
TABLE 3.2 RESULTS ON ROTATED UNIFORM UNIMODAL DISTRIBUTION ..... 180
TABLE 3.3 RESULTS ON ALIGNED NORMAL UNIMODAL DISTRIBUTION ..... 181
TABLE 3.4 RESULTS ON ROTATED NORMAL UNIMODAL DISTRIBUTION ..... 181
TABLE 3.5 RESULTS ON N-DIMENSIONAL HYPERSPHERE SURFACE DISTRIBUTION ..... 182
TABLE 3.6 RESULTS ON UNIFORM DENSITY N-DIMENSIONAL HYPERSPHERE DISTRIBUTION ..... 182
TABLE 3.7 RESULTS ON N-DIMENSIONAL NORMAL DISTRIBUTION ..... 183
TABLE 3.8 RESULTS ON N-DIMENSIONAL NORMAL RING DISTRIBUTION ..... 183
TABLE 3.9 RESULTS ON ROTATED N-DIMENSIONAL HYPERELLIPSOID DISTRIBUTION ..... 184
TABLE 3.10 RESULTS ON ROTATED N-DIMENSIONAL SKEWED HYPERELLIPSOID DISTRIBUTION ..... 184
TABLE 3.11 RESULTS FOR FIXED UNIFORM MUTATION CENTERED ON A SINGLE PARENT ..... 185
TABLE 3.12 RESULTS FOR FIXED NORMAL MUTATION CENTERED ON A SINGLE PARENT ..... 185
TABLE 3.13 RESULTS FOR FIXED CAUCHY MUTATION CENTERED ON A SINGLE PARENT ..... 186
TABLE 3.14 RESULTS FOR FIXED LOG-UNIFORM MUTATION CENTERED ON A SINGLE PARENT ..... 186
TABLE 3.15 RESULTS FOR FIXED UNIFORM MUTATION CENTERED ON THE MEAN OF 2 PARENTS ..... 186
TABLE 3.16 RESULTS FOR FIXED NORMAL MUTATION CENTERED ON THE MEAN OF 2 PARENTS ..... 187
TABLE 3.17 RESULTS FOR FIXED CAUCHY MUTATION CENTERED ON THE MEAN OF 2 PARENTS ..... 187
TABLE 3.18 RESULTS FOR FIXED LOG-UNIFORM MUTATION CENTERED ON THE MEAN OF 2 PARENTS ..... 187
TABLE 3.19 RESULTS FOR AVERAGING CROSSOVER ..... 188
TABLE 3.20 RESULTS FOR LINEAR CROSSOVER ..... 188
TABLE 3.21 RESULTS FOR EXTENDED LINEAR CROSSOVER ..... 188
TABLE 3.22 RESULTS FOR STANDARD BLX-0.5 ..... 189
TABLE 3.23 RESULTS FOR BLX-0.5 CENTERED ON A SINGLE PARENT ..... 189
TABLE 3.24 RESULTS FOR SIMPLEX CROSSOVER (SPX) ..... 189
TABLE 3.25 RESULTS FOR PRINCIPAL COMPONENT GAUSSIAN SAMPLING ..... 190
TABLE 3.26 RESULTS FOR PRINCIPAL COMPONENT CROSSOVER ..... 190
TABLE 3.27 RESULTS FOR FIELD-BASED CROSSOVER ..... 190
TABLE 3.28 RESULTS FOR GLOBAL DOMINANT RECOMBINATION ..... 191
TABLE 3.29 PRELIMINARY TAXONOMY ..... 198
TABLE 4.1 Z-STATISTIC OF RANKSUM MEASURES WITH TWO 90 MEMBER SAMPLE GROUPS ..... 207
TABLE 4.2 RANKSUM MEASURES FOR VARIOUS PROBABILITY LEVELS WITH TWO 90 MEMBER SAMPLE GROUPS ..... 208
TABLE 4.3 PERFORMANCE COMPARISON OF BLX-0.5 GA UNDER ROTATED AND NON-ROTATED PRESENTATION ..... 216
TABLE 4.4 PERFORMANCE COMPARISON OF EP UNDER ROTATED AND NON-ROTATED PRESENTATION ..... 218
TABLE 4.5 PERFORMANCE COMPARISON OF FAST EP UNDER ROTATED AND NON-ROTATED PRESENTATION ..... 219
TABLE 4.6 PERFORMANCE COMPARISON OF PCGA UNDER ROTATED AND NON-ROTATED PRESENTATION ..... 221
TABLE 4.7 PERFORMANCE COMPARISON OF AN EP SYSTEM AND PCGA ..... 224
TABLE 4.8 PERFORMANCE COMPARISON OF A FAST EP SYSTEM AND PCGA ..... 225
TABLE 4.9 PERFORMANCE COMPARISON OF A BLX-0.5 GA SYSTEM AND PCGA ..... 227
TABLE 4.10 PERFORMANCE COMPARISON OF PCX-ONLY AND PCX AND PCGM COMBINED IN A GA FRAMEWORK ..... 230
TABLE 4.11 PERFORMANCE COMPARISON OF PCGM-ONLY AND PCX AND PCGM COMBINED IN A GA FRAMEWORK ..... 231
TABLE 4.12 PERFORMANCE COMPARISON OF PCGM-ONLY AND PCX-ONLY IN A GA FRAMEWORK ..... 232
TABLE 4.13 PERFORMANCE COMPARISON OF A RANDOMLY ROTATED GA SYSTEM AND PCGA ..... 234
TABLE 4.14 EFFECT OF SCALE, S, ON PCGM OPERATOR IN PCGA SYSTEM ..... 236
TABLE 4.15 EFFECTS OF SAMPLE POOL SIZE ON PCGA SYSTEM ..... 239
TABLE 4.16 EFFECTS OF POPULATION SIZE ON PCGA PERFORMANCE ..... 242
TABLE 4.17 EFFECTS OF POPULATION SIZE ON BLX-0.5 PERFORMANCE ..... 244
TABLE 4.18 PERFORMANCE COMPARISON OF AN EP SYSTEM AND VREP-0.998 ..... 251
TABLE 4.19 PERFORMANCE COMPARISON OF A FAST EP SYSTEM AND VREP-0.998 ..... 253
TABLE 4.20 PERFORMANCE COMPARISON OF A BLX-α GA SYSTEM AND VREP-0.998 ..... 255
TABLE 4.21 PERFORMANCE COMPARISON OF PCGA AND VREP-0.998 ..... 256
TABLE 4.22 EFFECT OF RECAPTURE TARGET ON VREP PERFORMANCE .....
TABLE 4.23 EFFECT OF POPULATION SIZE ON VREP PERFORMANCE .....

LIST OF FIGURES

FIGURE 2.1 OUTLINE OF A BASIC EVOLUTIONARY ALGORITHM ..... 23
FIGURE 2.2 EXAMPLE OF 2-POINT CROSSOVER ..... 30
FIGURE 2.3 HISTOGRAM OF SAMPLE OF MUHLENBEIN'S LOG-UNIFORM DISTRIBUTION ..... 55
FIGURE 2.4 2D SQUARE FUNCTION ..... 70
FIGURE 2.5 2D SPHERE FUNCTION ..... 71
FIGURE 2.6 2D SCHWEFEL'S PROBLEM 1.2 FUNCTION ..... 72
FIGURE 2.7 2D SCHAFFER'S FUNCTION ..... 74
FIGURE 2.8 2D SCHAFFER'S FUNCTION - MODIFICATION 1 ..... 76
FIGURE 2.9 2D SCHAFFER'S FUNCTION - MODIFICATION 2 ..... 78
FIGURE 2.10 2D SCHAFFER'S FUNCTION - MODIFICATION 3, LOW RESOLUTION ..... 80
FIGURE 2.11 2D SCHAFFER'S FUNCTION - MODIFICATION 3, MEDIUM RESOLUTION ..... 81
FIGURE 2.12 2D SCHAFFER'S FUNCTION - MODIFICATION 3, HIGH RESOLUTION ..... 82
FIGURE 2.13 2D SCHAFFER'S FUNCTION - MODIFICATION 4 ..... 84
FIGURE 2.14 2D RING FUNCTION ..... 86
FIGURE 2.15 2D RING FUNCTION, CLOSER VIEW ..... 87
FIGURE 2.16 2D RING FUNCTION, NEAR GLOBAL OPTIMA ..... 88
FIGURE 2.17 2D TRAPEZOID CROSS FUNCTION ..... 89
FIGURE 2.18 2D ROSENBROCK'S SADDLE FUNCTION ..... 90
FIGURE 2.19 2D SPIRAL FUNCTION ..... 93
FIGURE 2.20 2D ACKLEY'S FUNCTION ..... 95
FIGURE 2.21 2D GRIEWANGK'S FUNCTION ..... 97
FIGURE 2.22 2D CLOVER AND CROSS FUNCTION ..... 99
FIGURE 2.23 2D BOHACHEVSKY'S FUNCTION ..... 101
FIGURE 2.24 2D BOHACHEVSKY'S FUNCTION, CLOSER VIEW ..... 102
FIGURE 2.25 2D RASTRIGIN'S FUNCTION ..... 104
FIGURE 2.26 2D YIP & PAO'S FUNCTION ..... 106
FIGURE 2.27 2D MULTI-MODAL SPIRAL FUNCTION ..... 109
FIGURE 2.28 FMS FUNCTION WITHIN INITIAL RANGE, x5 AND x6 ..... 111
FIGURE 2.29 FMS FUNCTION WITHIN INITIAL RANGE, x2 AND x4 ..... 112
FIGURE 2.30 2D EXPONENTIAL FUNCTION ..... 113
FIGURE 2.31 2D CHAIN-LINK FUNCTION ..... 114
FIGURE 2.32 2D DOUBLE COS FUNCTION ..... 115
FIGURE 2.33 2D INVERSE EXPONENTIAL FUNCTION ..... 117
FIGURE 2.34 2D WORMS FUNCTION ..... 119
FIGURE 2.35 2D SCHWEFEL'S FUNCTION ..... 121
FIGURE 3.1 EXAMPLE SITUATIONS WHICH INCREASE VARIANCE THROUGH SELECTION ..... 131
EQUATION 3.1 FORMULAS FOR SAMPLE COVARIANCE ..... 133
FIGURE 3.2 THREE EXAMPLE PARENT DISTRIBUTIONS ..... 134
EQUATION 3.2 GENERAL VARIANCE LOSS FORMULA ..... 136
FIGURE 3.3 CROSSOVER COVARIANCE MODIFICATION EXAMPLE ..... 137
EQUATION 3.2 COVARIANCE CONTRIBUTION OF PARENTS ..... 137
EQUATION 3.3 COVARIANCE CONTRIBUTION OF CHILDREN ..... 137
EQUATION 3.4 COVARIANCE MODIFICATION ..... 137
EQUATION 3.5 MAGNITUDE OF COVARIANCE MODIFICATION, SLOPE-DISTANCE FORM ..... 138
FIGURE 3.4 RELATIVE COVARIANCE DISTURBANCE AS FACTOR OF SLOPE BETWEEN PARENTS ..... 140
FIGURE 3.5 REGION OF COVARIANCE LOSS ≥ 70% ..... 140
EQUATION 3.6 EXPECTED RELATIVE MAGNITUDE OF COVARIANCE LOSS FOR FULLY DISSOCIATIVE OPERATORS ..... 141
FIGURE 3.6 EXPECTED DIFFERENCE OF TWO SAMPLES FROM U(A, A+2) ..... 142
EQUATION 3.7 PDF FOR EXPECTED DIFFERENCE BETWEEN TWO UNIFORM SAMPLES ..... 142
EQUATION 3.8 EXPECTED VALUE OF D2 FOR A UNIFORM DISTRIBUTION OF WIDTH W ..... 142
EQUATION 3.9 EXPECTED LEVEL OF COVARIANCE DISRUPTION PER CHILD FOR A UNIFORMLY DISTRIBUTED POPULATION ALONG A LINE SEGMENT WITH SLOPE M AND LENGTH W ..... 142
FIGURE 3.7 NORMAL DISTRIBUTION ALONG COVARIANT LINE SEGMENT ..... 143
FIGURE 3.8 SEARCH DISTRIBUTION OF A BLX-α OPERATOR ..... 145
EQUATION 4.1 EXPECTED MEAN OF RANKSUM VALUES ..... 205
EQUATION 4.2 EXPECTED VARIANCE OF RANKSUM VALUES ..... 205
EQUATION 4.3 Z-STATISTIC FOR AN OBSERVED RANKSUM MEASURE ..... 206
EQUATION 4.4 FORMULA FOR PCGM SAMPLE STANDARD DEVIATION ..... 211
EQUATION 4.5 FORMULA FOR VR MUTATION DISTRIBUTION VARIANCE CALCULATION ..... 249
FIGURE 4.1 AVERAGE BEST PERFORMANCE ON SQUARE ..... 262
FIGURE 4.2 RANKSUM PERFORMANCE ON SQUARE ..... 262
FIGURE 4.3 AVERAGE BEST PERFORMANCE ON SPHERE ..... 263
FIGURE 4.4 RANKSUM PERFORMANCE ON SPHERE ..... 263
FIGURE 4.5 AVERAGE BEST PERFORMANCE ON SCHWEFEL'S 1.2 ..... 264
FIGURE 4.6 RANKSUM PERFORMANCE ON SCHWEFEL'S 1.2 ..... 264
FIGURE 4.7 AVERAGE BEST PERFORMANCE ON SCHAFFER ..... 265
FIGURE 4.8 RANKSUM PERFORMANCE ON SCHAFFER ..... 265
FIGURE 4.9 AVERAGE BEST PERFORMANCE ON SCHAFFER MOD. 1 ..... 266
FIGURE 4.10 RANKSUM PERFORMANCE ON SCHAFFER MOD. 1 ..... 266
FIGURE 4.11 AVERAGE BEST PERFORMANCE ON RING ..... 267
FIGURE 4.12 RANKSUM PERFORMANCE ON RING ..... 267
FIGURE 4.13 AVERAGE BEST PERFORMANCE ON TRAPEZOID & CROSS ..... 268
FIGURE 4.14 RANKSUM PERFORMANCE ON TRAPEZOID & CROSS ..... 268
FIGURE 4.15 AVERAGE BEST PERFORMANCE ON ROSENBROCK SADDLE ..... 269
FIGURE 4.16 RANKSUM PERFORMANCE ON ROSENBROCK SADDLE ..... 269
FIGURE 4.17 AVERAGE BEST PERFORMANCE ON SCHAFFER MOD. 2 ..... 270
FIGURE 4.18 RANKSUM PERFORMANCE ON SCHAFFER MOD. 2 ..... 270
FIGURE 4.19 AVERAGE BEST PERFORMANCE ON SPIRAL ..... 271
FIGURE 4.20 RANKSUM PERFORMANCE ON SPIRAL ..... 271
FIGURE 4.21 AVERAGE BEST PERFORMANCE ON ACKLEY ..... 272
FIGURE 4.22 RANKSUM PERFORMANCE ON ACKLEY ..... 272
FIGURE 4.23 AVERAGE BEST PERFORMANCE ON GRIEWANGK ..... 273
FIGURE 4.24 RANKSUM PERFORMANCE ON GRIEWANGK ..... 273
FIGURE 4.25 AVERAGE BEST PERFORMANCE ON CLOVER & CROSS ..... 274
FIGURE 4.26 RANKSUM PERFORMANCE ON CLOVER & CROSS ..... 274
FIGURE 4.27 AVERAGE BEST PERFORMANCE ON BOHACHEVSKY ..... 275
FIGURE 4.28 RANKSUM PERFORMANCE ON BOHACHEVSKY ..... 275
FIGURE 4.29 AVERAGE BEST PERFORMANCE ON RASTRIGIN ..... 276
FIGURE 4.30 RANKSUM PERFORMANCE ON RASTRIGIN ..... 276
FIGURE 4.31 AVERAGE BEST PERFORMANCE ON YIP & PAO ..... 277
FIGURE 4.32 RANKSUM PERFORMANCE ON YIP & PAO ..... 277
FIGURE 4.33 AVERAGE BEST PERFORMANCE ON SCHAFFER MOD. 3 ..... 278
FIGURE 4.34 RANKSUM PERFORMANCE ON SCHAFFER MOD. 3 ..... 278
FIGURE 4.35 AVERAGE BEST PERFORMANCE ON SCHAFFER MOD. 4 ..... 279
FIGURE 4.36 RANKSUM PERFORMANCE ON SCHAFFER MOD. 4 ..... 279
FIGURE 4.37 AVERAGE BEST PERFORMANCE ON MULTIMODAL SPIRAL ..... 280
FIGURE 4.38 RANKSUM PERFORMANCE ON MULTIMODAL SPIRAL ..... 280
FIGURE 4.39 AVERAGE BEST PERFORMANCE ON FMS ..... 281
FIGURE 4.40 RANKSUM PERFORMANCE ON FMS ..... 281
FIGURE 4.41 AVERAGE BEST PERFORMANCE ON EXPONENTIAL ..... 282
FIGURE 4.42 RANKSUM PERFORMANCE ON EXPONENTIAL ..... 282
FIGURE 4.43 AVERAGE BEST PERFORMANCE ON CHAIN-LINK ..... 283
FIGURE 4.44 RANKSUM PERFORMANCE ON CHAIN-LINK ..... 283
FIGURE 4.45 AVERAGE BEST PERFORMANCE ON DOUBLE COS ..... 284
FIGURE 4.46 RANKSUM PERFORMANCE ON DOUBLE COS ..... 284
FIGURE 4.47 AVERAGE BEST PERFORMANCE ON INVERSE EXPONENTIAL ..... 285
FIGURE 4.48 RANKSUM PERFORMANCE ON INVERSE EXPONENTIAL ..... 285
FIGURE 4.49 AVERAGE BEST PERFORMANCE ON WORMS ..... 286
FIGURE 4.50 RANKSUM PERFORMANCE ON WORMS ..... 286
FIGURE 4.51 AVERAGE BEST PERFORMANCE ON SCHWEFEL ..... 287
FIGURE 4.52 RANKSUM PERFORMANCE ON SCHWEFEL ..... 287
FIGURE 4.53 AVERAGE BEST PERFORMANCE ON DYNAMIC CONTROL ..... 288
FIGURE 4.54 RANKSUM PERFORMANCE ON DYNAMIC CONTROL ..... 288

Chapter 1 Introduction

The No Free Lunch (NFL) theorems [Wolpert 97] provide a context for understanding and comparing the relative strengths and weaknesses of algorithmic search techniques. Simply stated, the NFL theorems prove that, for any finite representation, over the set of all possible enumerated orderings of the represented points, and given some criterion of performance measurement, no single search algorithm can outperform any other on average. That is to say, the performance of all search algorithms must be equal when averaged across all possible represented search spaces. Put yet another way, any performance gain that a given algorithm makes over another on a specific problem instance must necessarily be offset by an equivalent loss on one or more different problem instances of equivalent representational complexity. This applies equally to both complex systems such as genetic algorithms and simple ones such as uniform random search. Over the space of all possible finite problem spaces, both random search and genetic algorithms must be equally powerful.

To better understand how this result is possible, Wolpert [Wolpert 97] suggests considering how a given search algorithm selects a future search point given an existing set of known points in the space. Typically, the algorithm makes some assumption about the shape of the search space given the set of points already visited (e.g. gradient descent assumes a locally smooth gradient) and selects the next point based on this "information". However, when considering the set of all possible enumerated orderings of points for a given representation, it is equally likely that any given assumption leads to poorer or better solutions. (In fact, the stronger the assumption, the less likely it will lead to good points.) Equivalently, making the inverse assumption is also equally likely to be correct over the set of all possible encoded search spaces. Wolpert suggests that the degree of success of a given search algorithm upon a given search problem is directly related to how well the assumptions of the algorithm hold true for the given space. Equivalently, English [English 96] suggests that the set of previously visited points gives no actual "information" about the unvisited remainder of the search space, and that any apparent gains by "informed" search techniques result from providential alignments between assumptions and problem spaces.

Given the implications of these theorems, it seems somewhat pointless to compare the relative merits of individual search algorithms.
Paradoxically, however, the NFL theorems actually provide the key to understanding the relative strengths of various search techniques by pointing to the underlying assumptions made by each given system. By analyzing the underlying assumptions, we can better judge over what set of search spaces a given system should outperform other systems. To the degree we expect a given assumption to hold within a search space, we should expect a system which makes that assumption to have increased performance.

As this thesis is concerned with the performance of evolutionary search techniques, one of the most important factors is the set of operators employed to select new search points from an existing set of currently visited points. Analysis of the assumptions inherent to individual operators should provide a reasonable starting point. In general, an operator that performs best under certain assumptions (e.g. there are few or no covariant features in this landscape) tends to perform poorly under a corresponding transformation which violates that assumption (e.g. rotation of the encoding axes). In standard AI terms, such an operator is strong because it is specifically suited to, and therefore strongly tied to, certain assumptions about the search landscape. A weak AI method trades lower performance on a specific subset of problems (i.e. those for which the strong assumption applies) for better performance on problems outside of this subset. In fact, the NFL theorems provide a more formal basis for this strong/weak search classification continuum. Additionally, the NFL theorems prove that the tradeoff between performance on the strong subset and generality outside the strong subset is inescapable: a gain in one requires an equivalent loss in the other.

Once the assumptions inherent to a given operator are apparent, we may further consider whether a given assumption provides for a desired performance gain and/or loss over a given potential set of problems. It may be possible to weaken the operator by redesigning it such that the given assumption is removed or modified. Such a redesigned operator potentially trades a performance loss on a given subset of problems, namely the problems for which the original assumption held, for a performance gain on an alternative set of problems encompassed by the new assumption set.

Most evolutionary search operators are designed with the specific intention of either exploiting assumed "information" from previously sampled potential solutions, being purely explorative, or attempting to statically or dynamically provide a balance between these two competing drives. However, often little thought is given to the interplay between the effects of problem alignment and dimensionality and the levels of exploration or exploitation. It is often possible for a single operator to have greatly divergent effects on two identical landscapes which differ only in presentation, such as the alignment of the encoding axes. For example, [Salmon 96] studied the effect of axial rotation on the relative performance of EP and breeder genetic algorithm (BGA) systems. Such dependencies are troubling for two reasons. First, such divergent behavior indicates that the operator does not perform consistently based on the local attributes of the landscape itself, such as the local gradient or shape. Therefore, additional information is necessary to determine the expected performance of such operators.
Second, the end user of a search system, such as a genetic algorithm, often has a reasonable idea of what level of computational resources they would prefer to expend on search for new solutions versus refinement of current solutions. The constraints on computational resources may reflect real world limitations such as the available processor power and the need to meet commitment deadlines. However, for most evolutionary computation operators, the nature of the balance between exploration and exploitation is relatively unknown.

1.1 Intention of this Thesis

The intent of this thesis is to discover and categorize differences between evolutionary search operators over a variety of problem spaces such that their biases are clearly revealed. The outcome of this analysis is a taxonomy of operator biases and the creation of alternative search operators which are specifically designed to negate or circumvent these biases. A combination of empirical and analytical techniques is used to provide evidence of bias reduction and increased search performance under constrained circumstances in comparison to standard EC operators.

Through statistical analysis of operator distributions, we can discover and categorize certain forms of operator bias. This may proceed either from direct analysis of the design of the operator, or through analysis of the effects of the operator under established circumstances. These "case studies" characterize the distribution sampled by an operator during application to a pre-selected source population distribution. By carefully selecting the source population distribution we will be able to discover certain statistical biases. By understanding the forms of bias inherent to an operator, we can gain insight into what landscape characteristics are beneficial or detrimental to a system employing that operator. Specifically, we are interested in discovering what degree of invariance of behavior we can expect for a given operator under a specific local population distribution. Given a set of landscape transformations and suitable tests for operator invariance, we provide a "taxonomy of invariance" which allows for comparison between operators and provides an initial decision point for operator selection. Ideally these taxonomic tests should provide meaningful quantitative values, but we settle for more qualitative analysis on occasion. Empirical results on a given battery of test problems will be used to demonstrate the effectiveness of this taxonomy.

Once we understand the biases inherent to a given operator, we can attempt to modify the operator in order to remove a given bias, thereby creating a weaker version of the operator. The NFL results clearly specify that the resulting operator will lose relative power over a certain set of problems, but the benefit is an increase in the general applicability of the operator. It may be argued that limited gains in generality are more valuable than equivalent gains in strength for evolutionary computation operators, since for the majority of search landscapes the assumptions which may hold are relatively unknown. Empirical examination and comparison over a battery of test problems provide a method for evaluation of the relative strengths and weaknesses of these redesigned operators.

All evolutionary search systems select future search points in reference to current or previously visited search points.
In general, the relative step-size for a given EC operator may be dependent on a large number of factors, many of which may be artifacts of the choice of encoding. The result is a relatively uncontrollable stochastic process defined with little thought to addressing the balance between refining search and exploratory search. Operators which give the user some level of control over the balance between exploration and exploitation can rectify this situation. This provides a secondary impetus to the design of new operators.

The intention of this thesis is therefore threefold. First, examples of operator analysis to detect bias will be examined, providing a simplified taxonomy of invariance for classification of EC operators. Second, operators redesigned to increase generality (weakness) will be studied; and, finally, examples of operators designed to allow greater control of the balance between exploration and exploitation will be examined.

Chapter 2 provides sufficient technical background and is intended for those with only passing familiarity with evolutionary computation. The set of empirical test functions being used is also presented in Chapter 2. Chapter 3 presents numerous forms of statistical analysis of operators and a taxonomy of invariance for EC operators. Some of the biases uncovered in Chapter 3 are used as the basis for the redesign of operators in Chapter 4. These operators are subjected to a series of comparative evaluations under a number of well known test problems, as well as some designed specifically to test the relative generality and strength of the various systems. Finally, conclusions are presented in Chapter 5.

1.2 Problem Domain

The problem domain we have selected is real-valued function optimization. To facilitate empirical analysis we have selected test beds consisting primarily of a number of well known real-valued function optimization tasks. However, given the rather limited coverage of these functions in terms of relevant features (asymmetry, signal-to-noise ratio, co-dependency of variables, etc.), a greatly enlarged test set is proposed in Chapter 2 for the purposes of this analysis. Emphasis has been placed on moderate scale (10-30 parameter) function optimization tasks. Most of the operators and systems proposed here should scale to larger domains at least as well as other EC techniques are capable of doing so; however, no specific examination of the scalability of the proposed techniques and operators will be undertaken in this work.

Real-valued function optimization can loosely be defined as the collection of all parametric problems that map a set of n real numbers to the set of reals, i.e. f : R^n → R. Typically the task is to find a set of inputs which map to a minimal or maximal function value. Note that the landscape defined by a given mapping function may be multi-modal or unimodal, and may exhibit local smoothness (consistency between gradient and direction of nearest extremum) or extremely chaotic behavior. Since most evolutionary computation systems do not require a computable derivative, it is not necessary for f to be differentiable, or even piecewise differentiable. f will typically be defined over the set of all possible inputs (i.e. R^n), although some functions may present range constraints on the input parameters which may require additional consideration. Generally, the function f is assumed to remain fixed over time, although there is much interest in EC behavior on dynamical real-valued landscapes. The majority of the problems investigated here are completely stationary, with the exception of the occasional introduction of noise to the functional mapping, e.g. f : R^n → R + N(0, σ). The techniques developed in this work are not specifically designed to address any of the issues involved in searching dynamical landscapes.
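To make the preceding definitions concrete, the short sketch below (illustrative Python/NumPy code, not taken from the dissertation; all function and variable names are invented for this example) evaluates an axis-aligned ellipsoid, an isometrically equivalent presentation of the same landscape obtained by rotating the coordinate axes with a random orthogonal matrix, and an optionally noisy variant f(x) + N(0, σ). The rotation changes how the landscape is expressed in terms of the encoded parameters while leaving distances, and hence the location and value of the optimum, unchanged.

```python
import numpy as np

def ellipsoid(x):
    """Axis-aligned ellipsoid: f(x) = sum_i i * x_i^2, minimum 0 at the origin."""
    n = len(x)
    return float(np.sum(np.arange(1, n + 1) * x ** 2))

def random_rotation(n, rng):
    """Random orthogonal matrix (a rigid rotation of the coordinate axes),
    built from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))  # sign fix spreads the rotations more evenly

def rotated(f, rot):
    """Isometrically equivalent presentation of f: g(x) = f(rot @ x)."""
    return lambda x: f(rot @ x)

def noisy(f, sigma, rng):
    """Stationary landscape with additive evaluation noise N(0, sigma)."""
    return lambda x: f(x) + rng.normal(0.0, sigma)

rng = np.random.default_rng(42)
n = 10
g = rotated(ellipsoid, random_rotation(n, rng))
h = noisy(ellipsoid, 0.01, rng)

x = rng.uniform(-5.0, 5.0, size=n)
# Different values at an arbitrary point, but the same global optimum at the origin.
print(ellipsoid(x), g(x), h(x))
```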
While the functions being used in the test domain provide infinite resolution (i.e. they are defined over the set of all real values), the EC systems being tested all employ limited, finite resolution representations (although admittedly quite large ones). Far from being an incidental matter, the form and method of value representation is often a matter of concern and careful scrutiny in evolutionary computation. As discussed in Chapter 2, bias may easily be introduced via a poor choice of representation or a mismatch between representation and operators. Wherever possible, the systems described in this work will use standard 64-bit IEEE representations unless otherwise specified. Using a uniform representation should help eliminate representational bias from the empirical results.

1.3 Techniques

There are two basic forms of analysis that will be applied in this work. Through analysis of the distributions produced by operators used in evolutionary systems, relative to the source distributions, we may be able to characterize some of the fundamental biases inherent to these operators. This analysis is limited in that it treats each operator in isolation and does not consider behavior between multiple operators. For example, it is possible that two operators which have opposing biases tend to produce a system with minimal bias when used in conjunction. Also, such analysis does not account for potential emergent behaviors arising from interaction between operators in a search system.

Empirical analysis provides for direct comparison of systems employing various operator collections. Empirical results can confirm or deny hypotheses which result from distributional analysis, assuming such effects are not completely countered by any emergent behaviors. In order to obtain useful empirical data, we will require a control group for comparison. Therefore, we will test a number of well-established EC systems over the same test bed. Given the implications of the NFL theorems, the reader should be wary of extending such comparative empirical data beyond the given test bed functions. Inasmuch as we are evaluating the relative match between operator biases and landscape characteristics, one may possibly expect similar results over functions with characteristics similar to a given test bed function.

1.3.1 Statistical Analysis of Distributions

The distributions induced by a single pass of a given operator on one or more example source populations will be statistically analyzed. Comparisons between the statistical characteristics of the source population and the produced population will be demonstrated for specific instances. Where possible, complete closed form analytical evaluation will be developed for the general case as well.
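As an illustration of the shape of such a single-pass case study, the sketch below (illustrative Python/NumPy code, not taken from the dissertation; the parent distribution and all names are generic stand-ins) applies the usual formulation of BLX-0.5 blend crossover to a strongly covariant parent sample and compares the mean and covariance of the parents against those of the resulting children. The drop in the off-diagonal covariance term is an example of the covariance disturbance examined in Chapter 3.

```python
import numpy as np

def blx(p1, p2, alpha, rng):
    """BLX-alpha blend crossover: each child gene is drawn uniformly from the
    parents' interval extended by alpha times its width on both sides."""
    lo, hi = np.minimum(p1, p2), np.maximum(p1, p2)
    d = hi - lo
    return rng.uniform(lo - alpha * d, hi + alpha * d)

rng = np.random.default_rng(7)

# Source population: 1000 points from a strongly covariant 2-D normal distribution.
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
parents = rng.multivariate_normal([0.0, 0.0], cov, size=1000)

# One pass of the operator: pair parents at random, one child per pair.
idx = rng.permutation(len(parents))
children = np.array([blx(parents[i], parents[j], 0.5, rng)
                     for i, j in zip(idx[::2], idx[1::2])])

print("parent mean ", parents.mean(axis=0), " child mean ", children.mean(axis=0))
print("parent covariance\n", np.cov(parents.T))
print("child covariance\n", np.cov(children.T))  # off-diagonal term drops noticeably
```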
1.3.2 Comparative Experimental Analysis of Operators

For empirical testing, we will denote a separate system as being an evolutionary computation approach which incorporates a unique series of operators or a modified mixture of application rates of a given set of operators. For each tested system, data will be collected for multiple test runs on each test function. Results from all test runs for a given system/function pairing will be averaged and plotted for visual comparison. Also, the standard deviations will be computed to allow for distributional comparison of test runs. Since distributional analysis of test runs can be somewhat misleading (given that the underlying distribution may not be Gaussian), non-parametric tests, such as Wilcoxon rank sum testing, will be carried out pair-wise between the final results of the tested systems to establish probable statistical significance of the comparative results.
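The sketch below (illustrative Python code using NumPy and SciPy, not taken from the dissertation; the run data are fabricated placeholders) shows the form of such a pair-wise comparison: the best-of-run values from two systems are compared with a Wilcoxon rank-sum test, which makes no normality assumption about the distribution of run results.

```python
import numpy as np
from scipy.stats import ranksums

def compare_final_results(best_a, best_b, alpha=0.05):
    """Pair-wise non-parametric comparison of two systems on one test function.

    best_a, best_b: one best objective value per independent run of each system.
    Returns the rank-sum statistic, the p-value, and a significance flag."""
    stat, p = ranksums(best_a, best_b)
    return stat, p, p < alpha

# Placeholder best-of-run values for 30 independent runs of each system
# (in a real experiment these would come from the EC systems under test).
rng = np.random.default_rng(3)
system_a = rng.lognormal(mean=-3.0, sigma=0.7, size=30)
system_b = rng.lognormal(mean=-2.5, sigma=0.7, size=30)

print("mean best: A = %.3g, B = %.3g" % (system_a.mean(), system_b.mean()))
stat, p, significant = compare_final_results(system_a, system_b)
print("rank-sum z = %.2f, p = %.4f, significant at 0.05: %s" % (stat, p, significant))
```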
That set must include all possible mappings of the values of Y to the other | X | X|l - 1 members of the domain, hence the set must be of size |Y | I - . Since this value is independent of a, this demonstrates that for m = 1, P(d,¥I | f ,m,a) is independent of the choice of a. It follows that if P(d,),; | f ,m,a)is independent of a for m, then it is also independent of a for m + l. Wolpert [Wolpert 97] shows that the value of P(d 1):; +1 I f ,m +1,a) = IITIPM’)", | f ,m,a). In other words, having another sample from Y only narrows the number of potential matching functions, f, by |Y |. This provides an inductive proof that P(d,},; | f ,m,a)is independent of a for all possible choice of a when averaged (summed) across all possible functions. We can conclude that —> —+ P( c | f ,m,a1) = P( c | f ,m,a2)for any possible choice of a, and a2, and therefore, any performance measure based on Zwill be equally likely to favor a, or a; for a completely arbitrary f Keeping in mind that a; or (12 represent two distinct search algorithms, the implication is that for an arbitrarily selected f either is equally likely to produce the better results. -13- [English 95] uses an information theoretical analysis to reach a similar conclusion. That is, the history of prior sampled points provides no information about the values of any unsarnpled points in the search space when considered over the set of all possible mapping functions. Therefore, he concludes that the power of a search method is related solely to the degree that the prior assumptions (prior information) match the actual circumstances. Or in other words, a search method is most successful when the mapping looks like the search method assumed it would before the search began. Many have pointed to the possibility of using meta-level techniques to select between alternate search methods, thereby allowing a meta-search function to eliminate any biasing prior assumptions. However, [English 99] points out that such methods necessarily reduce their effective performance by attempting to pursue multiple potential assumptions simultaneously. Further, the only effective method of removing all bias is to return to completely stochastic random search, and that as any meta-level technique continues to reduce bias, it necessarily approaches random search. The NFL theorems allow for comparison of search algorithms on specific problem instances. It is important to note that the NFL theorems do not imply that all algorithms must behave with equal strength on each problem instance. For that matter, [Wolpert 95] suggests that min-max comparisons are possible within the framework of NFL. That is, an algorithm, a, for example, may do much better than another search algorithm, a; on a number of problems but there may be no problems where a; performs dramatically better than a 1. In order to satisfy NFL in such a case, it must be necessary that the set of problems over which a; performs better than a 1 is much larger than the set over which the reverse is true. Therefore the overall performance average for the two -14- algorithms remains equal. In other words, it is possible for a given algorithm to sacrifice generality for strength on a Specific subset of the possible problem spaces. Likewise, any search algorithm which maintains generality must necessarily sacrifice strength. Weak (general) search algorithms may be greatly outperformed on Specific problem instances, but will not necessarily equally outperform on a compensating set of problem instances. 
Rather, it is probable that a weak search algorithm will provide smaller gains on a larger set of potential problems to compensate. Thus, comparisons and analysis as to the relative generality (weakness) or strength of different search algorithms are still possible. [Wolpert 95] suggests a geometric interpretation of the NFL theorems where the performance success of a search algorithm depends directly on the degree of “alignment” between a vector representing the actual search space information and the search algorithm’s expectations (assumptions) about the search landscape. This conceptualization provides loose support for local behavior analysis of EC operators and systems, in order to better understand what assumptions operators and systems are making about the search landscape. 2.2 Overview of Evolutionary Computation Evolutionary computation (EC) is the broad category for a number of forms of automated computer search. While each of these forms claim separate founding inspirations and often hold strongly opposing philosophical Views, they are often more alike than they are different in practice. Evolutionary computation derives its power from the ability to successfirlly redistribute search resources over time. This successive refinement is often achieved through competitive selection, which favors quantitatively better solutions. Since the inspiration and metaphor of this selection is the mechanism of -15- natural selection as found in theories of natural evolution, such algorithms are most frequently classified as evolutionary algorithms. Much of the history and origins of evolutionary computation deals with simulation of adaptation and empirical modeling of natural systems. However, the ability of evolutionary algorithms to solve problems is not directly linked to the provability of the underlying metaphor in that modern evolutionary algorithms are at best greatly abstracted models of natural systems. Evolutionary algorithms are often compared with and occasionally confused with simulated annealing, Monte Carlo systems, stochastic gradient descent, and other simple stochastic search algorithms. These stochastic search algorithms typically proceed from a single search point, or landscape sample, and progress by modifying this solution or solution set either randomly or using locally sampled landscape information, such as the local value of the gradient (assuming a continuously or piecewise differentiable function). New search points are then accepted stochastically on the basis of improvement. Indeed at first blush, some EC systems appear to be simply massively parallel stochastic search processes since individual members of a given population are often modified randomly and new search points are accepted stochastically on the basis of competitive selection. However, it is exactly the mechanism of competitive selection which differentiates evolutionary algorithms from parallel stochastic search, since competitive selection is the mechanism which enables an evolutionary algorithm to rebalance search resources toward more productive search areas. Thus the narrow yet critically significant separation between parallel stochastic search and evolutionary computation is that the latter may abandon some search paths in order to apply those resources as branches to more favorable search paths, and the former cannot. 
As mentioned previously, there are a number of different approaches toward evolutionary computation and a number of different systems which are consistently championed by various professional circles. However, all forms of evolutionary computation may be categorized as either parametric or programmatic algorithms.

Parametric evolutionary algorithms are systems which attempt to solve individual instances of specific problems. The solutions are parameterized; that is, an individual solution to the problem instance consists of values for the parameters of the problem which we desire to optimize. Thus we know the parameters before we solve the problem, searching only for the values of those parameters. For example, a genetic algorithm might be designed to search for an optimal packing order for a specific sequence of elements to be packed. Evaluation in a parametric evolutionary algorithm consists of decoding each potential solution and providing a quantitative rating of this solution, or at least a qualitative comparison between two given solutions. Note that the former implies a single judging criterion, or at least the ability to combine multiple criteria into a single quantitative result. The latter can allow for richer multi-criteria forms, such as Pareto comparisons for multi-objective problems. In either case, parametric evolutionary algorithms always search out the best answer (or possibly a set of answers in a Pareto search) for a given explicit problem instance. In contrast to a programmatic evolutionary approach, there is no expectation that any information from the parametric search process for a given problem instance (such as the final population, history of points visited, etc.) will provide any benefit during a search of a new instance of the same problem. (An instance of a problem implies one with similar form, but with potentially different parameterization.) Therefore the applicability of the answer (or Pareto answer set) is extremely specific and narrow (dependent, of course, on the narrowness of the problem statement). In the previous example, the solution to the optimal packing sequence for a given sequence of items does not likely provide any information for the optimal packing of a different series of items.

Programmatic evolutionary systems attempt to build programmatic solutions to entire classes of similarly defined problems. While the primitives from which the programs are constructed are pre-selected, problem solutions are not parameterized. Input for a programmatic evaluation commonly consists of some generated program and a number of problem instances to be examined, which may be a fixed test set or a stochastically selected test set. For example, a genetic program could be developed which is capable of finding optimal packing sequences for several similar sequences of items. The intention is to find a more generally applicable solution engine, rather than a single one-time solution. However, the level of generality of the produced program is not guaranteed and depends on numerous factors. Evaluation for a programmatic evolutionary system consists of simulating the action of the programmatic solution on a number of the given problem instances and producing an average or summary quantitative evaluation value which may later be compared to the relative value of other programmatic solutions.
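As a loose illustration of this style of evaluation, the short Python sketch below averages a candidate program's score over a sample of problem instances; the toy sorting domain and all names (evaluate_program, score, and so on) are hypothetical and chosen only for illustration.

    import random

    def evaluate_program(program, instances, score):
        """Average the program's score over a set (or sample) of problem instances."""
        return sum(score(program(inst), inst) for inst in instances) / len(instances)

    if __name__ == "__main__":
        random.seed(0)
        # Toy domain: "programs" are candidate sorting strategies; instances are item lists.
        candidate = lambda items: sorted(items)
        quality = lambda result, inst: float(result == sorted(inst))
        test_set = [[random.randint(0, 9) for _ in range(5)] for _ in range(20)]
        print(evaluate_program(candidate, test_set, quality))   # 1.0 for this solver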
Note that the search domain of a programmatic evolutionary algorithm is much larger than that of a typical parametric evolutionary algorithm in that it attempts to solve an entire class of problems, rather than a single specific problem instance. As an example, a typical parametric EC problem might be to schedule a given production set for a given set of resources, while a corresponding programmatic problem might be to produce a program that is capable of scheduling a number of production sets having certain characteristics over a given set of resources (or even perhaps a number of different sets of resources). Given the broader nature of most programmatic search problems, and the sampled nature of the evaluation, programmatic search tasks often proceed at a much slower pace than related parametric search tasks.

There are a number of different forms which modern programmatic EC systems use to encode solutions. Early evolutionary programming approaches evolved non-deterministic finite automata (NDFA), and some current work continues in that vein, although the term evolutionary programming (EP) is now normally associated with a specific form of parametric evolutionary search. Classifier systems encode sets of production rules to create rule-based systems that carry out programmatic tasks. By far the most prevalent form of programmatic evolutionary search is genetic programming (GP), which encodes its programs as LISP-like function trees. There are also forms of programmatic evolutionary search that produce primitive directives similar to machine code.

There are three common forms or schools of parametric evolutionary computation. The earliest form is evolutionary programming (EP), which originated with Lawrence Fogel [Fogel 62]. The most commonly referenced form in the United States is genetic algorithms (GA), which was developed by John Holland in the 1960's and published in book form in his monograph Adaptation in Natural and Artificial Systems [Holland 75]. Evolutionary strategies (ES) originated with Bienert, Rechenberg, and Schwefel in Germany [Rechenberg 65] and has its strongest influence in European circles. Each of these systems employs an array of operators, and there are countless variants and variations of each. In the following sections we will describe the standard forms of each of these three systems and note some of the more commonly used variations and variants. Following the overview of these systems, the major operators and variants will be categorized and presented in further detail. First, however, a common framework of an evolutionary algorithm is presented to allow unification of terminology in the later presentations.

The analytic techniques developed in this research are designed for use in parametric evolutionary search systems. While it may be possible to adapt some of the concepts herein to programmatic approaches, certain assumptions such as continuity and natural ordinality of the parameter space, locality of movement within the parameter space, etc., need to be translated into equivalent assumptions on the nominal forms typical of most programmatic encodings before such adaptation could proceed. To the degree that such equivalent assumptions may not exist, it may not be possible to apply these techniques in a programmatic domain. Further, the form of these translated assumptions may dictate a different course than that presented here.
It is likely that the form of analysis presented here may be in some fashion applicable to programmatic EC, but it is doubtful that the specific techniques and operators proposed would have any direct parallels.

2.2.1 General Parametric Evolutionary Computation Algorithm

All evolutionary algorithms manipulate collections of (as opposed to individual) potential solutions, which represent populations in the metaphor of natural evolution. The individuals in a given population undergo successive rounds of modification and refinement through recombinative, mutative, and selective operations, producing individuals which may be incorporated into subsequent successor populations. Most evolutionary algorithms initialize the first source population through random sampling across a bounded segment of the parameter range. Search resources are commonly constrained; consequently population sizes remain constant for the majority of EC systems. It is important to note that standard population sizes are exponentially smaller than the enumerated search space. Also, since search spaces grow exponentially with linear expansion of the encoding space, population sizes do not scale with the size of the search space, and therefore we expect an exponential slowdown in performance and accuracy as the size of the search expands.

The course of a standard evolutionary algorithm is often represented as a progression from source population to successor population, with various modifications to the individuals and potentially limited growth and reduction of the population between them. However, since we will be examining the collective distributional effects of an operator on the distribution of the population free from any emergent behaviors arising from interaction with other operators, we present a basic outline for an evolutionary algorithm which shows multiple distinct successive population transformations for each applied operator. In order to bridge the gap between standard notation and our extended notation, we will label all intermediate populations pools. Therefore we will define an evolutionary algorithm as a series of transformations from a source, or parent, population through a succession of intermediate pools, finally producing a successor, or child, population. Note that these intermediate pools may be virtual populations in that the individuals which comprise these groups may never be collected together as such; however, they may still be viewed as a reasonable collection of individuals in a given state, and as such may be treated statistically as a population variant. For example, a GA may successively select two individuals from the source population, apply recombinative and mutative operators to the selected individuals, and deposit the modified solutions into the successor population before selecting the next pair for mating. Thus, there never is a physical "post-breeding-selection" pool. An outline of the various stages of a standard evolutionary algorithm is given in Figure 2.1, and a corresponding pool-level sketch is given below. Note that the order of the post-recombination and post-mutation pools is interchangeable. In fact, it is possible to have more than one recombinative operator or mutative operator, or none. For each mutative or recombinative operator, we will typically represent a separate post-recombination or post-mutation pool so that we can observe the effects of each operator in isolation. The initial population is copied to (or, equivalently, becomes) the source population.
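The pool-by-pool structure described above can be summarized in the following Python sketch; the operator arguments (breed_select, recombine, mutate, survive) are placeholders for whichever concrete operators a given system employs, and the trivial operators in the usage example are illustrative assumptions only.

    import random

    def evolutionary_search(init_population, breed_select, recombine, mutate,
                            survive, evaluate, max_iterations):
        """Generic parametric EA expressed as successive pool transformations."""
        source = list(init_population)                    # source (parent) population
        for _ in range(max_iterations):
            breeding = breed_select(source, evaluate)     # breeding pool
            post_recomb = recombine(breeding)             # post-recombination pool
            post_mut = [mutate(x) for x in post_recomb]   # post-mutation pool
            # survival selection may also admit members of the source population
            source = survive(source, post_mut, evaluate)  # successor population
        return source

    if __name__ == "__main__":
        random.seed(2)
        fit = lambda x: -x * x                                      # maximize near 0
        pop = [random.uniform(-5, 5) for _ in range(10)]
        best = evolutionary_search(
            pop,
            breed_select=lambda p, f: random.choices(p, k=len(p)),  # uniform sampling
            recombine=lambda pool: pool,                            # no recombination
            mutate=lambda x: x + random.gauss(0, 0.1),
            survive=lambda old, new, f: sorted(old + new, key=f)[-len(old):],
            evaluate=fit,
            max_iterations=50,
        )
        print(max(best, key=fit))

Each intermediate list corresponds to one of the (possibly virtual) pools, so the distributional effect of any single operator can be examined in isolation by inspecting the pool it produces.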
Individuals from the source population may be selected for breeding through an optional breeding selection operator. Note that, as with all intermediate pools, breeding pool sizes may vary or be fixed depending on the implementation of a given evolutionary algorithm. For each subject selected for breeding, a cohort pool may be selected. A cohort pool is a potentially limited selection of individuals from the general population. Membership is typically stochastic and may be based on various criteria such as similarity or dissimilarity to the initial subject (i.e., niching or incest reduction). Members from the cohort pool may provide supplementary data for operator action. Cohort pools may also be shared. For most common EC implementations the cohort pool is selected via the same mechanism as the breeding pool. However, we differentiate the two here since it is possible to select individuals via alternate mechanisms with the specific intent of selecting better breeding information for a given member of the breeding pool, rather than simply more fit individuals. After all recombinative and mutative operators have been applied, an optional survival selection operator may be applied to determine the composition of the successor population. Note that one or more individuals from the original source population may also participate in the survival selection competition (e.g., μ + λ selection, elitism). Finally, the successor population becomes the source population for the subsequent search iteration. This cycle repeats until some stopping criterion is reached.

Figure 2.1 Outline of a basic evolutionary algorithm (the initial population is copied to the source population; breeding selection and cohort selection produce the breeding pool and cohort pools; recombination produces the post-recombination pool; mutation produces the post-mutation pool; survival selection produces the successor population)

Those familiar with standard presentations of evolutionary computation systems may note the lack of explicit objective evaluation of individuals in the outline presented in Figure 2.1. Objective information, or some form of comparative capability, is necessary to make selective decisions; therefore, we assume some objective evaluation is carried out before or during breeding selection and survival selection (and possibly cohort selection). In practice, new solutions are commonly evaluated and assigned a quantitative value at initialization (in the case of the initial population) or after modification by whatever operators are being applied. However, logistically this information is typically not required outside of selection operators, so we include the task of assigning objective values as part of the selection operations. In particular, any operator which does require objective value information may be viewed as a composite of a pool selection component and an operator component within this model.

Each evolutionary algorithm must choose a method for representation of the parameter values being searched. Often the choice of representation leads to specific assumptions and search characteristics for a given EC implementation. EP and ES typically require parameters to be real-valued entities, and therefore are most suited to search spaces with continuous ordinal dimensions; however, any real-valued encoding such as IEEE floating-point representation, or a sufficiently fine-grained discrete encoding, is suitable. Standard GA implementations represent all entities as bit strings.
This is largely due to the strong influence of the schema theorem [Holland 75], the k-armed bandit metaphor [Goldberg 89b], and the concept of implicit parallelism. Typical GA approaches to real-valued function optimization use fine-grained binary discretization to represent parameter values. Alternate forms of genetic algorithms are more flexible in their form of parameter representation but require alternate operators to achieve equivalent results. The theory of interval schemata has been advanced as an equivalent alternative to standard schema theory for such alternate GA forms [Eshelman 93].

2.2.2 Evolutionary Programming

Evolutionary programming (EP) was originally visualized as a programmatic evolutionary algorithm, employing stochastically modified non-deterministic finite automata (NDFA) to encode programmatic solutions [Fogel 62]. Over time EP evolved into a parametric search algorithm used primarily for real-valued function optimization [Back 97]. While some active research still continues on NDFA evolution, the majority of current EP research focuses on parametric search. EP differs from the majority of other evolutionary algorithms in that it assumes that the population as a whole does not contain significant information about the most productive directions for future search. This equates to a metaphor of speciation in that individuals are seen to represent species of solutions rather than individuals. The lack of interbreeding between species in natural systems thus equates to a prohibition against recombinative operators in general.

The philosophy of design for EP operators stresses phenotypic versus genotypic manipulation. Genotypic manipulation implies that the level of manipulation should be at the encoded level. Thus the encoding designer has the onus of ensuring that operator manipulations translate to meaningful search actions in the parameter space. In contrast, phenotypic manipulation implies that the level of manipulation should be within the decoded parametric space. To minimize the potential for bias, meaningful search action for EP typically equates to favoring smaller search movements in phenotypic space. For this reason EP typically incorporates operators which focus on mutation, with emphasis toward continuity and localized search.

Since EP does not use population-level information to induce the magnitude of search activity, an alternative method for defining the magnitude and direction of mutation is necessary. Current EP implementations typically employ self-adaptive techniques to define mutative magnitudes. Self-adaptation is implemented through the addition of d meta-parameters to each encoded solution, where d is the number of parametric dimensions. Each of these meta-parameters is initialized to a fraction of the initial range for each corresponding solution parameter (usually scaled by the reciprocal of the square root of d in order to maintain constant operator variance regardless of dimensionality). These meta-parameters represent the width of the probability distribution used for mutation. A form of mutation is also applied to these meta-parameters simultaneously with the mutation of the solution parameters. Thus survival of individuals (or species) requires not only location of fruitful solution parameters but also fruitful mutation magnitudes. Note that the latter requires a longer time frame before successful feedback can be obtained.
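A minimal Python sketch of this style of self-adaptive mutation follows. The lognormal update of the meta-parameters and the specific learning rates are common choices borrowed from the EP/ES literature rather than details fixed by the description above, and the function names are illustrative.

    import math
    import random

    def self_adaptive_mutate(params, sigmas):
        """Mutate a solution and its d step-size meta-parameters together."""
        d = len(params)
        tau_global = 1.0 / math.sqrt(2.0 * d)            # shared lognormal learning rate
        tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(d))  # per-dimension learning rate
        common = random.gauss(0, 1)
        new_sigmas = [s * math.exp(tau_global * common + tau_local * random.gauss(0, 1))
                      for s in sigmas]
        # solution parameters are perturbed using the newly mutated step sizes
        new_params = [x + s * random.gauss(0, 1) for x, s in zip(params, new_sigmas)]
        return new_params, new_sigmas

    if __name__ == "__main__":
        random.seed(3)
        x = [0.5, -1.2, 3.0]
        sigma = [0.1] * len(x)   # initialized to a fraction of the initial parameter range
        print(self_adaptive_mutate(x, sigma))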
An alternative, earlier form of mutation step-size adaptation tied the magnitude of mutations to the ratio of successful advancement of children [Rechenberg 73]. Both of these techniques were originally pioneered under evolutionary strategies (ES) [Rechenberg 73] [Schwefel 77].

EP employs a form of survival selection known as (μ + λ) selection [Back 97]. Under EP selection a population of μ individual solutions is mutated to produce λ children, where λ is typically an integer multiple of μ. Note that all members of the source population participate equally in the production of children without any form of reproductive selection. After all λ children have been produced, a ranking tournament is held among the super pool of the μ parents and λ children combined, from which the best μ are selected as members of the successor population. Thus the post-mutation pool, including the addition of the untouched members of the source generation, is μ + λ in size, while the size of the source and successor populations is μ; hence the term (μ + λ) selection. EP selection arguably may achieve a high level of elitism. Elitism allows an elite or extremely fit individual to pass untouched from the source population to the successor generation. With (μ + λ) selection, all μ members of the source population may survive untouched as members of the successor population.

2.2.3 Genetic Algorithms

Genetic algorithms are largely focused on the concept of simulated genetic representation and modification. This is partially due to the original focus of some of the foundational and most influential works published on genetic algorithms. In Adaptation in Natural and Artificial Systems [Holland 75], Holland focuses primarily on understanding and simulating the mechanisms of natural adaptation. Much of the preliminary thrust in GA research simulated natural genetic manipulation and often borrowed concepts from theories of genetics and natural evolution. Also, Holland first proposed the schema theorem as an attempt to explain the apparent power of adaptive search techniques [Holland 75]. This theory was further amplified by the k-armed bandit analysis of binary encoded GA as presented in Genetic Algorithms in Search, Optimization, and Machine Learning [Goldberg 89b]. These theories are largely concerned with binary schemata and their interactions. Many researchers still claim that having a minimal alphabet size (i.e., binary encoding) provides optimal search power. There have been a number of criticisms against various portions of the schema and k-armed bandit theories [Fogel 98] [Fogel 00], and alternative theories have been advanced for GA incorporating other encoding forms [Eshelman 93].

GA stand as a partial antithesis to EP in that GA derive the majority of their search strength and guidance for future search directions from the relative locations of the members of the source population in the genotypic space, and their relative objective values. This stems from the fact that recombination plays a dominant role in genetic algorithms, while mutation is deemed necessary only to maintain sufficient diversity to support continued recombinative search. The justification for the strong reliance on recombination with minimal levels of mutation is based on the implicit parallelism argument of the building block theory. This strong emphasis on recombination has elicited criticism over the potential for revisitation of search points.
Such concerns arise from consideration of tabu search [Glover 89], which derives a portion of its power from explicit measures taken to discourage point revisitation, and from extended analysis of the No Free Lunch theorems, which implies that point revisitation may cause exponential losses in search performance relative to systems which explicitly avoid point revisitation [English 99]. There are a number of variants of GA, and a wide array of potential operators with varying degrees of popularity and levels of experimental results. First, we outline the most common form of genetic algorithm, which is basically equivalent to Goldberg's simple GA (SGA) [Goldberg 89b], following which we outline some of the more common operator variations and variant systems.

2.2.3.1 The Standard Genetic Algorithm

SGA employs binary encoding and binary operators exclusively. The initial population is initialized using uniform random sampling. The reproductive selection scheme distributes the probability of selection according to the relative objective values of the members of the source population (assuming that we desire to maximize fitness), otherwise known as proportional selection. Pairs of individuals from the breeding pool are selected for potential application of crossover. Binary mutation may be applied to individuals from the post-recombination pool; however, typically no intermediate breeding selection is performed between application of crossover and mutation. Crossover and mutation are applied on a probabilistic basis, with a portion of the breeding pool potentially passing directly into the successor population without modification. Typical crossover application rates are fairly high (typically at least 70%), with low mutation rates (at most 1 bit per encoded solution, often much less). Reproductive selection chooses exactly μ individuals from the source population with replacement (i.e., individuals from the source population may be repeated in the breeding pool). All operators replace their input individuals with modified individuals in the successive pool or population upon application. Therefore, all pool and population sizes are fixed at μ; hence, this selection may be termed (μ, λ) selection, where λ = μ, or (μ, μ) selection [Back 97]. No survival selection is applied. Selection and application of operators repeat until some ending criterion is reached or processing limits are reached.

There are multiple forms of recombination which have been employed in GA systems. Standard crossover operators, which are typically applied to binary encodings but which may also be applied to larger encodings with restricted crossing points, include 1-point crossover, 2-point crossover, and uniform crossover. These basic crossover forms are similar in that they produce two child solutions from two parent solutions. The value from each of the two parent solutions for each allele, or atomic value (typically a single bit), appears in one of the two children; thus the value information from the parents is conserved. The children are composed of blocks of values from alternating parents. The number of crossing points determines the number of contiguous parent blocks in the children: m crossing points imply m+1 contiguous blocks transferred to each child (thus in 2-point crossover, there are three contiguous blocks taken from alternating parents, as demonstrated in Figure 2.2). The position of the crossing points is typically selected uniformly from the set of all possible crossing points.
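The alternating-block construction can be sketched as follows; this is an illustrative Python rendering of m-point crossover on equal-length strings, not an implementation drawn from any particular GA package.

    import random

    def m_point_crossover(parent1, parent2, m):
        """Produce two children composed of m+1 alternating blocks of the parents."""
        n = len(parent1)
        points = sorted(random.sample(range(1, n), m))   # m unique crossing points
        child1, child2 = [], []
        src1, src2 = parent1, parent2
        prev = 0
        for cut in points + [n]:
            child1.extend(src1[prev:cut])
            child2.extend(src2[prev:cut])
            src1, src2 = src2, src1                      # alternate the source parent
            prev = cut
        return child1, child2

    if __name__ == "__main__":
        random.seed(4)
        p1, p2 = list("AAAAAAAA"), list("GGGGGGGG")
        c1, c2 = m_point_crossover(p1, p2, 2)            # 2-point crossover, 3 blocks
        print("".join(c1), "".join(c2))

Every allele from the parents appears in exactly one of the two children, so the parental value information is conserved as described above.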
Figure 2.2 Example of 2-point crossover (two parent strings exchange the block of alleles between the two crossing points, producing two children composed of three alternating parental blocks)

Uniform crossover allows from 0 to n crossing points, where n is the size of the set of all possible crossing points. Thus, uniform crossover simply allows each allele an independent 50% chance of being inherited from a given parent. (Note, however, that once a child solution inherits an allele from a given parent, the second child automatically receives the allele from the opposing parent.)

2.2.3.2 Alternate Genetic Algorithm Operators

A number of recombinative operators have been developed specifically for use by GA on real-valued function optimization problems. These include averaging crossover [Davis 91], linear crossover [Wright 91], flat crossover [Radcliffe 90], blend crossover (BLX) [Eshelman 92], unimodal normal distribution crossover (UNDX) [Ono 97], m-parent UNDX [Ono 99], and simplex crossover (SPX) [Tsutsui 99]. These operators are described in detail in Section 2.3.

Use of the raw objective values for proportioning selection probabilities can often cause difficulties; therefore a number of alternate forms of reproductive selection have been employed in GA. Scaled proportional selection uses a normalization factor before determining relative selection probabilities. Ranked selection assigns probabilities to individuals depending on their absolute rank within the current population. Boltzmann selection uses Boltzmann scaling to determine selection probabilities. All of these forms of reproductive selection are quantitative in that they require the objective function to be able to produce a consistent single quantitative value of merit for an individual solution. Tournament selection requires only that a comparative evaluation can be made between the quality of two or more competing solutions.

A number of optional operators and algorithms for genetic algorithms have been developed to help maintain population diversity. Niching attempts to maintain stable levels of diversity by forcing new individuals to compete with genetically similar individuals (of the previous or current generation) for survival [Goldberg 89b]. Incest reduction [Eshelman 91b] takes an opposing approach by selecting, for a given selected individual, a mate that is most genetically opposite (typically measured in Hamming distance). Elitism allows for objective-value-based survival of individuals from the parent generation directly into the child generation, thereby guaranteeing that the best solution in the parent population will survive untouched [Eshelman 91a]. These operators and algorithms have been widely studied and are often used in GA systems.

2.2.3.3 Alternate Genetic Algorithm Systems

Two fairly common alternative GA systems designed for real-valued function optimization are Eshelman's CHC [Eshelman 91a] and Mühlenbein's Breeder Genetic Algorithm (BGA) [Mühlenbein 95]. In their current form, these systems employ operators intentionally adapted to the real-valued function domain, and have shown enhanced performance on specific real-valued test problems over the standard binary GA approach [Eshelman 91a] [Mühlenbein 95]. CHC, first proposed by Eshelman [Eshelman 91a], was created with the express intention of overcoming difficulties with search effectiveness and efficiency in the standard GA approach.
CHC introduced several new operators which were departures from the popular thought about genetic search at that time. CHC uses population-elitist selection, which is survival oriented and uses a ranking of the combined parent and child solution sets reminiscent of EP selection. The current popularity of uniform crossover [Syswerda 89] is partially due to its incorporation into the CHC framework (although CHC uses a specific variant of uniform crossover, HUX, which always allocates exactly half of the alleles from each parent). CHC also introduced the concept of incest reduction as a method of maintaining population diversity. The CHC system does not employ a standard mutation operator, but instead enters a reinitialization phase once given convergence criteria have been met. The reinitialization is not a complete randomization; the population is reinitialized with copies of the best solution with a preconfigured percentage of random mutation applied (Eshelman suggests that 35% of alleles be mutated). Two years after its introduction, Eshelman introduced the blend crossover operator, BLX, and its most common variant, BLX-α [Eshelman 93]. BLX-α recombination is designed for continuous domains. Eshelman successfully demonstrated that BLX-α works best within the CHC framework, as opposed to the standard GA framework. Given the relative difficulty of dealing with continuous domain problems with a standard binary GA, and given the large number of continuous domain optimization problems available in the real world, CHC using BLX-α quickly became a popular tool.

Initially outlined by Mühlenbein [Mühlenbein 95], BGA represents a radical departure from standard genetic algorithm design. The philosophical metaphor for BGA is that of simulating expert human breeders, rather than simulating random natural genetic processes. Further, Mühlenbein attempts to justify the design of BGA and its operators using tools similar to those of expert human breeders, genetic theory, and statistical inference. BGA uses reproductive truncation selection, meaning that parents are selected randomly from among the top T% of the parent generation. Mühlenbein demonstrates a derivation from T to the selection intensity, I, and thereby to the expected convergence rate in the absence of mutation. BGA obtains most of its power from recombinative operators; in fact it utilizes three forms of recombination: discrete recombination, extended intermediate recombination, and extended line recombination. The mutation operator is unique to BGA and its predecessor PGA (parallel genetic algorithm), and consists of discrete log-uniform distributions scaled relative to the initial parameter range for a given variable. This mutation operator simulates the distribution induced by binary mutation on standard unsigned integral representations.

Parallelization of genetic algorithms provides a number of potential modifications to the basic GA system. Parallelization typically is categorized by the degree of isolation and independence between individual solutions in the population. Global parallelization is conceptually identical to the standard GA system, with the simple addition of more computing resources. Coarse-grained (island) parallelization provides for individual subpopulations with a specified rate of interchange of individuals. These migrated individuals may be selected at random, via competitive selection, or through various other mechanisms.
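A minimal sketch of such an interchange follows; the ring topology, single best-individual migrant, and random-displacement replacement policy are illustrative assumptions, since island models vary widely on these choices.

    import random

    def migrate(islands, fitness, migrants_per_island=1):
        """Copy selected individuals from each subpopulation to the next one in a ring."""
        emigrants = [sorted(isle, key=fitness)[-migrants_per_island:] for isle in islands]
        for idx, movers in enumerate(emigrants):
            target = islands[(idx + 1) % len(islands)]
            for migrant in movers:
                # displace a uniformly chosen resident of the target island
                target[random.randrange(len(target))] = migrant
        return islands

    if __name__ == "__main__":
        random.seed(5)
        f = lambda x: -abs(x)
        isles = [[random.uniform(-10, 10) for _ in range(6)] for _ in range(4)]
        migrate(isles, f)
        print([round(max(isle, key=f), 2) for isle in isles])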
Fine-grained, or cellular, genetic algorithms isolate individual solutions and provide a fixed interchange map (commonly a toroid) to designate which neighbors may be used during the recombinative operation [Whitley 93]. Note that it is also possible to simulate the logistics of one of these parallel systems using a single processor. Therefore, this is logical parallelism, or more precisely logical breeding control. In this sense, these concepts are similar to niching and incest prevention in their effects.

John Holland's monograph, Adaptation in Natural and Artificial Systems [Holland 75], outlines two forms of "reproductive plan". The Rd plan represents the model for the standard GA structure with its generation-based population replacement. However, the Rd plan is the result of Holland's schema theorem analysis of a different reproductive plan, R1 [DeJong 92]. The R1 plan maintains a fixed-size population and allows only a single operator application at a time. One or two parent solutions are selected from the population via proportional fitness-based selection, and the resulting child solutions potentially displace one or two solutions from the current population. The replacement selection mechanism in R1 is uniform random sampling; that is, the child solution replaces a uniformly randomly selected individual from the parent population without regard to fitness. (Much emphasis in evolutionary simulation is placed on the importance of survival until reproduction, not on continued survival of the individual.) Whitley's GENITOR system revived the R1 structure as an alternative "steady state GA" architecture [Whitley 88]. The popularity of GENITOR revived interest in DeJong's analysis of "generation gap" measures (i.e., the effects of having solutions of differing generational "age" competing in the same population) [DeJong 75]. Interestingly, Rechenberg proposed an identical system as an early form of ES [Rechenberg 73] and therefore deserves the credit for discovery [Rudolph 97]; however, modern steady state GA systems can clearly trace their origins to Holland's nearly parallel development [Holland 75]. Alternate implementations of the steady state GA model use other forms of replacement selection, such as replacement of the worst solution, or competitive comparison against the progenitor solution(s) used to create the new solution. DeJong concludes that any success from steady state GAs is more a product of the replacement strategy than a modification of the generation gap [DeJong 92]. However, Rogers and Prügel-Bennett ([Rogers 1999], [Rogers 2000]) conclude that a steady state GA produces twice the selection pressure and twice the potential for genetic drift as a population-level GA with a similar population size. The steady state GA is also related to parallelization of genetic algorithms in that allowing individual processors to proceed at an independent pace produces a similar generation gap effect.

More ambitious modifications of the GA framework focus on altering the method of solution representation, and thereby are less clearly classified as GA rather than general EC. One of these alternate systems, which still maintains a fairly clear GA character, is Goldberg's messy GA [Goldberg 89c]. The messy GA allows for multiple redundant representations of values within a solution encoding. Individual parameter values are not determined by their position in the encoded solution as with most other parametric EC systems.
Instead, encoded solution parameters define both their placement within the solution and their value. The central focus is on overcoming the encoding problem in terms of localizing the linkage between solution parameters, similar to Holland's inversion operator [Holland 75]. A messy GA system attempts to build up solutions by examining subsolution sets (in the so-called primordial phase) and then building solutions from these subsolutions. This continues recursively until complete solutions are built. Specialized messy operators are designed to perform recombination and mutation on these representations in a logical manner. A given messy GA solution may under-represent one or more solution parameters by failing to include any specification for them, while simultaneously over-representing other solution parameters. Multiple potential solutions to these problems have been suggested. Missing parameters may be handled through use of default values or through sampling; however, the former has the potential for introduction of bias, while the latter adds a potentially large stochastic component to evaluation. Over-representation is typically resolved via averaging or by taking the most recent specification over previous ones. As with genetic programming representations, messy GA solutions tend to grow unboundedly over time unless some form of limitation or resolution is employed. However, such modifications tend to destroy the structure of the underlying subsolutions and are generally avoided.

2.2.4 Evolutionary Strategies

The initial impetus for evolutionary strategies (ES) is attributed to experiments carried out by Bienert, Rechenberg, and Schwefel during the mid-1960's [Rudolph 97] and reported initially by Rechenberg [Rechenberg 65]. ES systems commonly employ a number of techniques and operators for real-valued function optimization. Rudolph [Rudolph 97] reports that the current focus of ES on real-valued function optimization may be largely due to Rechenberg's successful analysis of the simple version of ES in Euclidean space with continuous mutation [Rechenberg 73]. Early ES focused intently on mutation as the driving force of search, born largely from statistical analysis of complex systems. These analyses led to various improvements in the search technique, such as mechanisms for determination of the ideal mutative step size and later the mutative axial orientation. The success rate of mutations provided an early mechanism for determination of mutative step size, which is still in use in some current research. Rechenberg's analysis determined that, in order to successfully stave off premature convergence, 20% of all newly searched points should show improvement relative to the parent solution. This led to development of the well-known 1/5th rule, whereby the mutative step size is decreased if recent mutations have produced less than a 20% success ratio, and increased if the mutation success rate rises above 20% [Schwefel 95].

ES research developed mechanisms for self-adaptation of mutative step sizes. Self-adaptation allows co-evolution of the mutative parameters with the solution parameters. Each population element is extended to include step-size parameters for mutation along each axis. This form of mutation has been adopted by EP systems as the primary form of mutation. ES systems also incorporate recombinative operators.
These operators can be pairwise, as with GA systems; however, ES commonly uses global, population-level recombinative operators (i.e., operators which use all elements of the population as a joint parent set). More recent forms of evolutionary strategies move beyond self-adaptive mutative step sizes to include self-adaptation of the axial alignment [Rudolph 97]. The axial alignment can be represented as n(n-1)/2 rotation angles. ES systems may perform mutation and recombination on both the solution parameters and the self-adaptive, meta-level parameters; however, the actual operators employed for the meta-parameters are typically different from those used for the solution parameters.

2.3 Overview of Common Evolutionary Operators

2.3.1 Selection Operators

There are countless forms of selection commonly employed in parametric evolutionary computation systems. The following overview includes the most common selection operators. The selection operators are presented in two groups: those operators which are applied to selection for breeding, and those operators which are used to select survivors.

2.3.1.1 Reproductive Selection

Reproductive selection operators are used to choose individual solutions from a given population pool for use in reproduction. This includes both mutative and recombinative reproduction, although some selection operators, such as incest reduction, are specifically designed to select mates for recombinative operators.

2.3.1.1.1 Uniform Random Selection and Uniform Sampling

Some systems, especially those that employ post-reproductive (survival) selection, allow individuals for reproduction to be selected uniformly at random from the population pool. Using uniform random selection, both the worst and the best of the surviving solutions have an equal probability of reproducing. In fact, many systems force completely uniform sampling of the surviving population by producing c child solutions from each surviving solution. In this manner, the search process becomes somewhat less susceptible to stochastic effects such as genetic drift. This is more apparent with smaller population sizes, where stochastic effects tend to dominate more quickly.

2.3.1.1.2 Proportional (Roulette) Selection

Assume we are given a population pool of n solutions, $s_i$, each with a given fitness value, $f_i$. The assigned fitness values must be maximally oriented; that is, a more positive fitness value denotes a better solution. Select a uniform random value, t, from the range $[0, \sum_{i=0}^{n-1} f_i)$. The selected solution is $s_j$, where $\sum_{i=0}^{j-1} f_i \le t < \sum_{i=0}^{j} f_i$, with the convention $f_{-1} = 0$. This is the form of proportional selection used initially in GA by Holland [Holland 75] and Goldberg [Goldberg 89b]. Given its pivotal position in the schema theorem, many GA purists insist that this is the only acceptable form of selection for use within a GA framework. However, the central problem with proportional selection is that the intensity of selection varies with the variance of the fitness values. In landscapes with large fitness variance, proportional selection tends to converge quite early, while in fairly flat landscapes, proportional selection may fail to provide sufficient pressure to differentiate solutions effectively. Further, proportional selection assumes that all fitness values are positive, or alternately, that a minimal fitness value is known in advance.
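A direct Python sketch of this roulette-wheel procedure is given below, assuming maximally oriented, strictly positive fitness values as required above; the names are illustrative only.

    import random

    def proportional_select(solutions, fitnesses):
        """Roulette-wheel selection: probability of choice is proportional to fitness."""
        total = sum(fitnesses)                   # all fitness values assumed positive
        t = random.uniform(0, total)
        running = 0.0
        for solution, fit in zip(solutions, fitnesses):
            running += fit
            if t < running:
                return solution
        return solutions[-1]                     # guard against floating-point round-off

    if __name__ == "__main__":
        random.seed(6)
        pop = ["s0", "s1", "s2", "s3"]
        fit = [1.0, 2.0, 3.0, 4.0]
        picks = [proportional_select(pop, fit) for _ in range(10000)]
        print({s: picks.count(s) / len(picks) for s in pop})   # roughly 0.1, 0.2, 0.3, 0.4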
2.3.1.1.3 Rank Based Selection

Rank based selection is an algorithm that attempts to overcome the reservations about the effects of fitness variance and the requirement of a predetermined minimum fitness range which occur with proportional selection, while maintaining the same relative form of proportional selection. Given a population pool of n solutions $s_i$, each with a given fitness value $f_i$, sort the solutions in the population pool in order of their relative fitness, from most fit to least fit. Assume a positively ranged, monotonically decreasing discrete function f(i) defined for all integers $i \in [1, n]$. Assign the value f(i) as the selection value for each solution $s_i$ in the sorted population pool, where i is the sorted ordinal position of each solution. An example of a typical ranking function is $f(i) = 1 - \frac{i - 1}{n}$. The standard proportional selection algorithm is then performed on these selection values rather than on the raw fitness values. Note that any number of ranking functions may potentially be used. As with tournament selection, it is possible to carry out ranking selection without quantitative fitness information as long as the qualitative evaluations exhibit the transitive property that $f_i > f_j$ and $f_j > f_k$ imply $f_i > f_k$. However, in this case, a minimum of O(n log n) qualitative comparisons will be needed to produce the ranking.

2.3.1.1.4 GA Tournament Selection

Given a population pool of n solutions, $s_i$, each with a given fitness value, $f_i$, select two or more individuals, as elements of tournament set T, from the population pool with uniform random probability. The selected solution is $s_j$, where $f_j \ge f_k$ for all $s_k \in T$, $k \ne j$. Note that tournament selection is parameterized by the size of the tournament set, T, also known as the tournament size. The advantage of tournament selection over proportional selection is threefold. First, the relative fitness variance no longer impacts the selection intensity. Therefore, rescaling of the fitness does not modify the operation of selection if the relative order of fitness remains unmodified. Second, there is no requirement to maintain a positive fitness range or to have any predetermined minimal fitness value. As with ranked selection, there is no requirement for fitness to be positively biased (that is, this form of selection works equally well with minimization and maximization problems). Third, tournament selection does not require the fitness comparison to be quantitative. Qualitative comparison of solutions is all that is required for tournament selection, as long as the comparison remains transitive (that is, $f_i > f_j$ and $f_j > f_k$ imply $f_i > f_k$). In the qualitative situation, each selection will require O(|T|) qualitative comparisons.

2.3.1.1.5 Niche Mating and Incest Reduction

Incest reduction is actually a form of tournament selection for choosing a mate; however, the ranking criterion is no longer solely based on the fitness value, but rather on the degree of complementation to a previously selected mate. Given a population pool of n solutions, $s_i$, each with a given fitness value, $f_i$, select a single individual, $s_p$, from the population using any other fitness-based selection mechanism. Next, select a pool of potential mates, M, where $|M| \ge 2$. Typically members of M are selected uniformly from the initial population pool, but they alternately may be selected by any standard fitness-based selection mechanism.
Given $s_p$, select $s_q \in M$ such that $\|(s_q, s_p)\| \ge \|(s_j, s_p)\|$ for all $s_j \in M$, $j \ne q$, where $\|(s_a, s_b)\|$ represents the Hamming distance (bit difference) between solutions a and b. Niche mating selects mates which are as similar as possible, thereby forming effective subspecies within a population [Deb 89]. The technique is similar to incest reduction except that the most closely related mate in terms of Hamming distance is taken from the mate pool, M. Eshelman demonstrated that incest reduction in the CHC framework was more effective on selected function optimization tasks than niche mating under a standard GA framework [Eshelman 91].

2.3.1.1.6 Fitness Sharing

Although related to crowding (see 2.3.1.2.1 below), fitness sharing is typically implemented as a form of reproductive selection, whereas crowding is by definition a survivalist form of selection. Fitness sharing forces solutions to carry an effective fitness value which may be reduced from the actual fitness value if there are too many nearby solutions (measured either genotypically or phenotypically) [Goldberg 87]. Alternate forms of fitness sharing have been proposed based on tournament selection, including restricted tournament selection (RTS) [Georges 95] and adaptive restricted tournament selection (ARTS) [Roy 95].

2.3.1.2 Survival Selection

While reproductive selection attempts to determine which members of the current population will participate in reproduction and to what degree, survival selection determines which solutions are kept after a reproductive cycle. Note that these functions are similar in nature in that if a solution is not selected for mating, it effectively does not survive in successive generations. However, reproductive selection tends to be more stochastic, allowing even the worst solutions a non-zero probability of reproduction, while survival selection tends to be more deterministic: either a solution survives or it is terminated. While it is possible to use both selection forms in concert, this normally produces too much selection pressure for a given system. Therefore typical systems which employ survival selection will also employ uniform random selection or uniform sampling for reproductive selection.

2.3.1.2.1 Crowding

Like fitness sharing algorithms, crowding attempts to maintain diversity by controlling the number of solutions which can populate a given solution landscape location. DeJong [DeJong 75] introduced crowding as a form of niche creation in GA. DeJong's crowding algorithm forces individuals to dislodge similar individuals from the existing population. Thus the population tends to maintain its dispersion across the current peak locations in the population. However, given the nondeterministic character of stochastic reproduction processes, the populations still tend to drift toward the most prominent peak over time. Mahfoud attempts to solve this difficulty with a deterministic form of crowding [Mahfoud 92] [Mahfoud 95].

2.3.1.2.2 Boltzmann Selection

Boltzmann selection is a technique originating with simulated annealing algorithms. In this form of survival selection, the results of a reproductive operator (mutative or recombinative) are compared against the original (parent) solution(s). Individuals that exhibit increased or equal fitness are always allowed to pass into the survival pool. This is accomplished by use of the Boltzmann trial, whereby solution i attempts to maintain its position in the population against a potential replacement individual j.
The probability that i wins this competition is $\frac{1}{1 + e^{(f_i - f_j)/T(t)}}$, where T(t) is the current temperature [Mahfoud 97]. The cooling function, T(t), must be in the range [0, 1] and is typically a monotonically non-increasing function of t, although this is not a requirement. The current time, t, is typically represented by a count of the number of generations, etc. As the search progresses and t increases, Boltzmann selection increases the relative selection intensity until it degenerates to the selection mechanism used in the steady state GA. There are numerous proposed functions for T(t), many of which are taken from simulated annealing studies, or which attempt to induce certain behaviors such as niche formation, etc.

2.3.1.2.3 Truncation Selection and EP Tournament Selection

Truncation selection is the simplest of the survival selection forms. The population pool is sorted in order of fitness from most fit to least fit, as with rank based selection. Only solutions in the top T% of the population survive. Mühlenbein gives a direct formulation between the value of T and the selection intensity, I, and thereby the expected convergence rate of the population in the absence of mutation [Mühlenbein 95]. This form of selection is used in both BGA and genetic programming systems [Koza 92]. EP tournament selection is similar to truncation selection but is slightly more stochastic. Before ranking, each solution $s_i$ is randomly paired with t other solutions, producing tournament pool $T_i$. $s_i$ is assigned a score, $v_i$, equal to the number of solutions $t_j \in T_i$ for which $f(s_i) \le f(t_j)$ (assuming a minimization problem). Next, the population pool is sorted according to these tournament score values, and the top μ are allowed to survive. Fogel specifies that the value of t should be related to the population size, but no exact formulation is given.

2.3.2 Recombination

By definition, a recombination operator is one that forms a new solution by collecting "genetic information" from multiple individuals and combining this information in some fashion. While the metaphor of sexual reproduction suggests a strict biological interpretation with two parents, with separation of genetic information in simulated meiosis and fusing of genetic material in simulated fertilization, in actuality most genetic encodings and recombinative operators are greatly abstracted from the current understanding of these processes. In terms of an exercise in simulated evolution, a recombinative operator should endeavor to continue to reflect a reasonable abstraction of the actual physical processes. However, if we consider an EC system as primarily a heuristic search process for function optimization, then we are not strongly restricted to these terms. It is therefore possible to broaden the definition of recombination to include forms that are not observable in nature. For example, we are free to consider recombination of genetic material from more than two progenitors. Likewise, we can redefine what constitutes the "genetic information" that is being contributed. Rather than restricting ourselves to the direct interpretation of genetic information as a direct genetic representation in some binary or real-valued DNA analog, we can consider any information garnered from multiple members of the population, such as the parameter mean, covariance, etc., as potential fodder for recombinative operators.
An acceptable definition of recombination therefore includes all operators that produce new solutions, where the composition of the new solution is directly influenced by the composition, structure, and/or relationships between two or more sampled landscape points previously (or currently) visited by the search algorithm. Note that this would still allow us to differentiate purely mutative operations, since the decision as to which points to sample is typically based on the location of a single solution and a global or self-adaptive step-size parameter, without direct input from another solution.

Recombination is not a prerequisite to achieve an evolutionary simulation. Systems which do not employ recombination can still qualify as evolutionary algorithms, provided that they incorporate a mechanism of selection which allows shifting of search resources from apparently less fruitful portions of the solution space to more fruitful ones. Systems such as parallel simulated annealing, which always produce a single new solution to replace each independent existing solution according to some criteria, do not qualify as evolutionary computation. Whereas an identical system which allows one solution to produce two or more new solutions, with a balancing extinction of one or more existing solutions, where such decisions are made via some form of competitive evaluation (though not necessarily solely or primarily value related), definitely qualifies as an evolutionary algorithm.

The following review is not intended as an exhaustive review of current recombinative operators. The intention here is to introduce some of the subjectively more popular recombinative operators in the current literature and some variants which illustrate specific properties.

2.3.2.1 Discrete Recombination Forms

Discrete recombination treats the individual alleles as non-contiguous symbols without a natural ordinality in terms of the search space. In this sense, these operators are genomic, rather than phenotypic, since they focus directly on the representation of a solution, rather than the relative position of a solution within the search domain. Non-continuity of representation is not a requirement for discrete recombination, but discrete operators treat even alleles that can be expressed in continuous fashion as if they were not continuous. That is, these operators do not take advantage of or account for the continuity. Since the alleles are not interpreted geometrically, discrete recombination focuses on mixing the alleles found in two or more parent solutions. Note that the term discrete is somewhat of a misnomer, as it is possible to treat discrete integer values in a continuous manner through averaging, etc. A more correct term would be non-continuous or symbolic recombination.

Common forms of discrete recombination exchange information between two parents by selecting inversion areas defined by crossing points. For example, consider two parent solutions, $p_i$ and $p_j$, each consisting of n individual alleles. Assume the kth allele for each encoded solution is the same size. We can consider a discrete recombination operator to consist of selection of an ordered set of unique crossing points, $b_l$, where each $b_l \in [0, n-1]$. Given a function C(m), which returns the number of crossing points having a value less than or equal to m, we can compute the elements of the two child solutions, $c_0$ and $c_1$, for h = 0, 1, as:
$$c_{h,k} = \begin{cases} p_{i,k} & \text{if } C(k) + h \text{ is odd} \\ p_{j,k} & \text{if } C(k) + h \text{ is even} \end{cases}$$

Crossover is the standard name for this form of recombination in GA literature. GA crossing points are typically selected uniformly from the set of allowable crossing points. GA theorists use a binary representation when using crossover, with allowable crossing points between each two bits of the representation. This binary emphasis provides direct support for implicit parallelism as posited by the schema theorem. However, it is possible to use crossover with arbitrarily sized alleles. GA crossover operators are classified by the number of crossing points allowed. The minimal operator is one-point crossover, which exhibits a bias toward a higher frequency of disruption for longer schema patterns. Two-point crossover, also known as circular crossover, attempts to remove this bias by effectively treating the parent solutions as circular rather than linear encodings. Random uniform crossover allows the maximum number of crossing points. Rather than requiring n unique crossing points, uniform random crossover allows up to n crossing points to be selected, while duplicates are ignored. Alternately, uniform crossover may more easily be represented as arbitrary random assignment of allele pairs to the produced children. That is, assuming n uniform random samples, $f_k$, from the range [0, 1), the child alleles are selected according to the equations:

$$c_{0,k} = \begin{cases} p_{i,k} & \text{if } f_k < 0.5 \\ p_{j,k} & \text{otherwise} \end{cases} \qquad c_{1,k} = \begin{cases} p_{j,k} & \text{if } f_k < 0.5 \\ p_{i,k} & \text{otherwise} \end{cases}$$

An alternate variant, guaranteed uniform random crossover, forces each child to inherit exactly one half of its alleles from each parent. This may be achieved by creating an ordered list of all possible allele positions, then randomly reordering this list. The resulting permutation allows assignment of the alleles in the first n/2 numbered positions from $p_i$ to $c_0$ and from $p_j$ to $c_1$. The remaining alleles for each child are then taken from the opposite parent.

A common multi-parent recombination scheme involving more than two parents uses commonality or majority voting to determine the composition of child solutions
Guaranteed uniform crossover can be expressed in terms of a shuffle crossover where the crossover point is no longer random, but always selected as the most central cut point. 2.3.2.2 Intermediate Recombination Forms If we consider the components of individual solutions to represent points in a contiguous discrete or continuous domain, then we can potentially extend the -50- interpretation of the “information” being canied by a solution to include representation of its localized “neighborhood” within the domain. Note that this makes an implicit assumption that the functional behavior of nearby points can be extrapolated from sampled points (i.e. the fitness function is locally smooth). If we can accept this assumption, then it is possible to create operators that search within the area of the domain defined by two or more parents. There are a number of similar operators with various names, which implement these concepts. The ES operator, intermediate recombination, formulates a child solution by extrapolating between individual parameter values from two parents. Given two n-dimensional parent solutions, p,- and pi, and n uniform samples, on, , over the range [0,1), we can calculate a child solution’s parameters as: ck = Pi,k +ark (p 13" — P1316): Note that this is equivalent to uniform random sampling of the interior of the hypercube defined by the two parent solutions. In GA circles, this operator is known as blend crossover (BLX) [Eshehnan 93] or flat crossover [Radcliffe 90]. An alternate formulation of intermediate crossover is arithmetic crossover [Michalewicz 99]. Given two n-dimensional parent solutions, p; and pi, and n uniform samples, a], , over the range [0,1), we can calculate a child solution’s parameters according to the arithmetic crossover formula: Ck = ak Pi, k + (1 - ak )p j, k . Guaranteed average crossover [Davis 89] is identical to this formulation with choice of the a], = 0.5 for all k. A more generalized version of intermediate recombination is extended intermediate recombination, which extends the area of sampling by some multiple of -51- the distance between the two parent solutions. Miihlenbein defines extended intermediate recombination according to the same formula as intermediate recombination, but redefines the range of an, to be [-a, 1+ a), where a 2 0. [Miihlenbein 1993]. Milhlenbein suggests a value of 0.25. BLX-a is the same operator as simultaneously proposed by Eshelrnan [Eshehnan 93]. Both extended intermediate recombination and BLX—or attempt to address the problem of the bias toward the center found in their non-extended counterparts. However, unless the level of the extension is relative to the dimensionality of the problem, such operators potentially overcompensate by oversampling outside of the hypercube defined by the parent solutions. Some evolutionary systems also define a form of global intermediate recombination, which independently selects new parents for each child parameter, as opposed to each child solution. Various limitations and alternatives of this strategy may be used, for example, limiting the set of parents to a reduced subset of the entire population, etc. Modem evolutionary strategies extend the concept of global intermediate recombination to a generalized resampling of the existing population distribution. In effect, this is a reduction of the existing population to a series of distribution metrics and reinitialization of the population according to these metrics. 
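A corresponding sketch of intermediate and extended intermediate recombination (BLX-alpha) follows; again the name and defaults are illustrative assumptions rather than definitions used elsewhere in this work.

    import numpy as np

    def extended_intermediate(p1, p2, alpha=0.25, rng=None):
        # BLX-alpha / extended intermediate recombination: one uniform sample per
        # allele drawn from [-alpha, 1 + alpha); alpha = 0 gives plain intermediate
        # recombination (BLX / flat crossover).
        rng = np.random.default_rng() if rng is None else rng
        p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
        a = rng.uniform(-alpha, 1.0 + alpha, size=p1.shape)
        return p1 + a * (p2 - p1)

Fixing the sample at 0.5 for every allele recovers guaranteed average crossover, while drawing a single shared sample per child rather than one per allele yields the line recombination operator discussed below.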
For example, the sampled covariance matrix can be used to determine an eigenspace (similar to a principal component analysis) of the population distribution. The resarnpled population is a normal distribution aligned according to this eigenspace with the variances along each axis equivalent to the corresponding eigenvalue. This new distribution produces the same covariance as the original one, within the precision allowed by the degrees of -52- freedom (i.e. population size). Note, however, that the new population may have a completely different shape than the original. For example, the original may be non- normally distributed, highly asymetric, etc. 2.3.2.3 Linear Recombination Forms The concept of extrapolation can be more narrowly applied if we limit our exploration to the line between the parent solutions. In linear crossover [Wright 91], three offspring are produced from two parents. The three offspring are located at three fixed linear combinations of the parent solutions: %E+':‘;2_:':‘P—l‘%f’;: and - g; + g2. Geometrically, this is equivalent to the midpoint (guaranteed averaging crossover point), and two extensions from the ends along the line defined the parents at a half of the distance between the two points. If we consider the set of all offspring, both the mean and covaraince of the parents are preserved, but the average variance is increased by 1/3. Miihlenbein [Miihlenbein 1994] generalizes this concept with line recombination, which is identical in formulation to basic intermediate recombination, with the exception that all terms for a single recombination operation use the same a term (i.e. w,- is replaced by to). This causes all children to be drawn from a unimodal uniform random distribution along the line segment between the parent solutions. As with intermediate recombination, this operator exhibits bias toward the center of the two parents. As with extended intermediate recombination, extended line recombination -53- attempts to counter this bias by increasing the range of the a term to [-a, 1+ a), where a 2 0. For extended line recombination, a = 0.5 removes any central bias. Miihlenbein also introduced a much more complex version of extended line crossover, which attempts to mimic the distribution of binary crossover on standard binary integer values using a discrete version of the log-uniform distribution [Miihlenbein 1994]. Miihlenbein samples the log-uniform distribution discretely by first creating m uniform random binary samples am, where am = I with probability I/m, and 0 otherwise. On average at least one a", value will be equal to 1. Next, this binary sample is m o converted to a log-uniform sample by calculation of: ZaiZ' . Note that the resulting i=1 distribution is roughly uniformly distributed according to the log; of the points; however, certain points (those with less bits in the binary form of their log; value) are more heavily favored over those with more bits. A histogram of 1,000,000 samples of this distribution with m = 10 is illustrated in Figure 2.3. -54- Miihlenbein Log-unifonn Sampling j l l l [ | l M—iihlenbein: l_E _F..W_.v ~~~~~~ Number of Samples .§§§§§§§§ 1' 120 239 353471596 715 334‘ 953’ Sample Value Figure 2.3 Histogram of sample of Miihlenbein’s log-uniform distribution Given a sample, 50,.) 
from Miihlenbein’s discretized log-uniform distribution, the offspring values for two offspring are calculated as: Pi,k -pj,k Pi k 9 Ck = Pi, k +srfl6(m)[ — p 13k], where r is a constant proportional to the initial range of the given dimension (Miihlenbein suggests half of the range), s equals —1 for the first child, and +1 for the second, and ,6 is equal to 1 with probability 0.9, and equal to —1 otherwise. Note this fimction tends to favor the p; parent when both values are of the same sign (which should be the more fit of the two parents). Understanding the complete derivation of this function is a bit daunting, and the reader is referred to [Miihlenbein 1994] for further information. In essence this operator attempts to simulate binary crossover on integral binary values by incorporating a log—uniform distribution. -55- l l ! Also, this operator attempts to maintain relative scale both to the scale of the search space (as evidenced by the r term above) as well as the distance between individuals. 2.3.2.4 Unimodal Normal Distribution Crossover (UNDX) Ono and Kobayshi have proposed several variants to the standard BLX—a crossover operator. The first, unimodal normal distribution crossover (UNDX) [Ono 97] is somewhat closer to extended linear recombination in calculation. UNDX selects the axes of application from the selected parents. The first two parents fix the orientation of the first axis, the origin of the search distribution, and the variance for the operator distribution along this axis (half of the distance between the two parents). The origin is always taken to be the center point of the two parents. The search variances along the remaining (arbitrarily oriented) orthogonal dimensions are determined as ——1—times the .5 distance between the center of the first two parents and a third parent. One or more children can be produced from this search distribution. m-parent UNDX [Ono 99] enlarges the “population guided” nature of the standard 3-parent UNDX by incorporating m parent samples to determine the search distribution along all dimensions (it is assumed that m is significantly larger than the dimensionality to avoid Singularities, etc.). The origin of the distribution is the origin of the m parents. Also, a “separability” algorithm is applied to the resulting eigenspace to produce a skewed axis set which projects an approximately symmetric distribution from the original m-parent sample. This operator is essentially similar to recent ES versions of global intermediate recombination, but with a restricted sampling and an additional separability analysis component. ~56- m-parent UNDX is quite similar to the principal component mutation operator outlined in section 4.2; however, unlike UNDX, the proposed operator does not incorporate a center-tending bias since it is centered on a single arbitrary parent, and it does not make assumptions about the separability of dimensionality. For a more detailed - comparison of UNDX and the proposed operators, see section 4.2. 2.3.2.5 Simplex Crossover (SPX) Simplex crossover (SPX) [Tsutsui 99] is a multi-parent operator similar to UNDX in concept, yet much more sophisticated in design. First, for an n dimensional search space, n+1 parent individuals are selected. The center of mass, 0, of these solutions are computed, and a series of n random samples are drawn according to the formula: 1 rk = u k+1 , where u is a sample from a uniform random distribution on the interval [0, l), and k = [0,1,..n]. 
Note that as k increase, rk becomes exponentially more skewed toward 1 . Given the center of mass, 0, an externally assigned growth rate, a, and the n+1 parent “vectors”, Xi, calculate an expanded (or contracted) form of each parent vector X1, as Y,- for each i = 0, 1, n+1 according to the formula: 3'? = 5 + £07? - 5) Now, accumulate samples from the first n expanded vectors using the random samples, rk, as follows: -57- 56= 0.5?- = r._1(i’E—?ZI+EZDJ=10. 1. n/. This effectively combines a portion of the difference between two of the parent vectors with the previous accumulation and then rescales (shrinks) the resulting vector. Next, the final child value is given as: 23 = I; + a; Tsutsui and Goldberg offer the following claims regarding the distribution of search points by this operator [Tsutsui 99]: 1. It is independent of the encoding coordinate system (i.e. invariant across rotation, translation, and linear rescaling). 2. The mean vector of the parents and the children are identical (i.e. it is mean preserving). 3. The covariance matrix of the children is a rescaled version of that of the parent solutions. If we select a = W , then the covariance of the parents and children will be identical. Note that the title simplex crossover was previously used by Renders and Bersini [Renders 94], to denote a similar operator with more deterministic behavior. Their Simplex crossover is a fitness biased operator. Assuming k parents are selected initially, the centroid, c, of the fittest k-I parents is computed. Then the vector from this centroid to the worst parent solution is inverted about centroid position. Or equivalently, the point llpwom — c“ from c along the line between c and pwom is selected. -58- 2.3.2.6 Fitness Biased Recombination Numerous crossover schemes have been developed which specifically bias toward a favored parent. In most cases, the bias is directed toward the more fit of the two parents. These methods do not have a direct biological analog within the metaphor of evolution, but may be seen as an attempt to model dominance and recessiveness without incorporating full—fledged diploidity (with the assumption that preferable traits become more dominant than less favorable ones). Examples of fitness biased recombination include Wright’s heuristic crossover [Wright 94], and Eiben’s fitness-based scan [Eiben 94]. In heuristic crossover, each individual allele is computed as: c,- = rlxi —x j )+ x J- , where r is a uniform random sample on [0,1), and x; is the more fit of the two parent solutions. Note that although the intent to bias toward the more fit parent is apparent, as long as r samples the uniform distribution, this becomes equivalent to linear recombination and BLX. Fitness-based scan selects alleles based on the fitness of the associated parent solution relative to the fitness of all solutions in the parent pool. Thus, the probability that c,- = p11,,- is f (Pi: Pk i )/ f (P), where P is the set of all parents, and pi = p k i is the set of all parents with the same value in the i m allele. 2.3.3 Mutation Mutation is distinct from recombination in that a mutative operation selects search points based on information from a single member of the current population. Search operators that do not use any information from the current population are antithetical to -59- the metaphor of evolution and therefore are not typically considered mutative. 
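Before turning to specific mutation operators, it is worth making the simplex crossover (SPX) of section 2.3.2.5 concrete, since its printed formulas are difficult to read. The following Python sketch follows the standard formulation; the default expansion rate of sqrt(n + 2) is the commonly quoted covariance-preserving value and appears to correspond to the illegible constant in claim 3 above, so it should be read as an assumption rather than a transcription.

    import numpy as np

    def simplex_crossover(parents, epsilon=None, rng=None):
        # SPX: `parents` is an (n+1) x n array of parent vectors for an
        # n-dimensional search space.
        rng = np.random.default_rng() if rng is None else rng
        parents = np.asarray(parents, dtype=float)
        m, n = parents.shape
        if epsilon is None:
            epsilon = np.sqrt(n + 2.0)       # commonly quoted covariance-preserving rate
        o = parents.mean(axis=0)             # centre of mass of the parent simplex
        y = o + epsilon * (parents - o)      # expanded parent vectors
        r = rng.random(n) ** (1.0 / np.arange(1, n + 1))   # r_k = u^(1/(k+1)), k = 0..n-1
        c = np.zeros(n)
        for k in range(1, m):                # accumulate scaled differences along the simplex
            c = r[k - 1] * (y[k - 1] - y[k] + c)
        return y[m - 1] + c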
Typical mutation operators proceed by adding a level of noise to an individual solution, often in the form of addition of random samples from a given, typically predetermined, probability distribution. Binary mutation is possibly the simplest form of mutation in that a binary mutation causes a bit to change from 0 to 1, or vice versa; therefore, no thought needs to be given as to the distance of the mutative step size. Binary mutation operators focus instead on the application rate, that is, the probability of an individual bit being flipped. Since binary mutation operates on the genotypic representation, it is difficult to characterize its distribution in terms of phenotypic modification as this is dependent on the form of encoding employed for the individual allele. For standard unsigned integer representations, the distribution approximates a discretized log-uniform distribution such as those used by Milhlenbein [Mtihlenbein 95]. However, the effect of binary mutation on integer representations is necessarily dependent on the actual binary pattern of each encoded value. Some approaches to modifying the effects of binary mutation by using alternate encodings, such as grey encoding, have been pursued [Whitley 97] [Whitley 99]. Mutation of non-binary, continuous representations such as integer and real values requires selection and parameterization of a mutation distribution, including determination of an appropriate scale, variance, or mutative step size. Potential solutions for selection of the mutative step size include use of a predetermined fixed value, adaptation based on the level of success, and co—evolution of the parameters of the mutative distribution with the parameters of an individual solution. Note that it is -60- possible for a mutative operator to use separate step sizes, or possibly even separate distributions, for each dimension of the search space. Most mutative operators employ a predetermined distribution for determination Of mutative perturbance. Lee explores an interesting alternative by adaptively adjusting the parameterization of a Levy distribution, thereby adaptively modifying the shape of the mutative distribution [Lee 99]. Typical mutative operators employ common distributions such as uniform, normal (Gaussian), Cauchy, and Laplace probability distributions. These distributions differ in their focus on central sampling, and the length and shape of the distribution tails. EP systems employ centralized distributions to increase the likelihood of small mutations over larger ones. Use of a fixed mutation step size is a simple mechanism, which provides the mutative step size as an external parameter. In such systems, the mutative step size is typically selected to be a fi'action of the expected parameter range for each parameter. EC systems that employ a fixed mutation step size typically have difficulty finding answers within greater precision than the level of mutative noise. For example, consider a system employing a randomly aligned vector with a length uniformly distributed on the range [0,1) as a mutative operator. If a given solution, p, requires refinement on the order of 10e-6 distance, only 1 in 10e6 mutations will be sufficiently small to produce improvement. This argument holds for most other distributions as well. The analysis of ideal evolutionary search systems shows that in order to ensure reasonable progress, 20% of all newly created search points should demonstrate increased fitness [Schwefel 95]. 
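A minimal sketch of the step-size adaptation this guideline suggests is given below; the following paragraph describes the same rule in prose. The adaptation window of 20 trials and the adjustment factor of 0.85 are illustrative assumptions, not values taken from this work.

    import numpy as np

    def one_fifth_rule_search(f, x0, sigma=1.0, window=20, factor=0.85,
                              max_evals=2000, rng=None):
        # (1+1)-style minimization with fixed-distribution Gaussian mutation;
        # the step size is adapted every `window` trials from the success rate.
        rng = np.random.default_rng() if rng is None else rng
        x = np.asarray(x0, dtype=float)
        fx = f(x)
        successes = 0
        for t in range(1, max_evals + 1):
            y = x + sigma * rng.standard_normal(x.shape)
            fy = f(y)
            if fy < fx:
                x, fx = y, fy
                successes += 1
            if t % window == 0:
                rate = successes / window
                if rate > 0.2:
                    sigma /= factor          # too many successes: take larger steps
                elif rate < 0.2:
                    sigma *= factor          # too few successes: take smaller steps
                successes = 0
        return x, fx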
Therefore, one possible adaptive mechanism for mutative step size selection is to increase the mutative step size if greater than 20% of recent mutations have -61- demonstrated increased fitness, and likewise to decrease the mutative step size if less than 20% of recent mutations have demonstrated increased fitness. This algorithm is popularly known as the 1/5th rule. This algorithm assumes that the local landscape is a concentric hill climbing situation (i.e. the assumption is that reducing mutative size increases the success ratio). The most common form of mutative step size adaptation in modern ES and EP systems is self-adaptation. Self-adaptation is accomplished by inclusion of both the solution parameters and the mutation parameters within an individual solution. EP systems tend to employ self-adaptive step sizes across the axes of encoding, while ES systems may evolve either step sizes alone or both step sizes and axial orientation. The mutative parameters are modified by other (meta-level) mutation operators. This meta- level mutation typically modifies the exponent of the mutative step size (i.e. expands or contracts the mutative step Size) and employs a fixed meta-level step size. ES systems also apply recombinative operators to mutative parameters. 2.4 Empirical Test Functions Several of the selected test functions were chosen because they are widely used and highly regarded in EC and function optimization literature. One or two were selected because they provide novel, highly epistatic search landscapes. The remainder were created specifically for this work to provide representative samples of various combinations of the previously listed problem categories and descriptions. Specifically, the four modified versions of Shaffer’s function, the ring function, the trapezoid cross function, the spiral function, the clover & cross function, the multi-modal spiral function, -52- the chain-link function, the double cos function, the inverse exponential function, and the worms ftmction have been introduced Specifically for this work. 2.4.1 Function Categorization and Terms The following terms and categorizations are used in presentation of the test functions. 2.4.1.1 Unimodal Unimodal is a mathematical term which implies that the derivative has only a single extremum; that is, only at one point does the derivative become zero. Some of the functions being presented are not easily differentiated. Further we are not in general interested in all extrema of a function, only the minima or maxima (typically minima). Therefore, a direct geometric interpretation of the term unimodal focusing only on minima will be used here. A unimodal function will be defined as a function for which there is a path (not necessarily a linear path) from every point in the search space to the global optima which is non-increasing (for minimization problems) or non—decreasing (for maximization problems). 2.4.1.2 Monotonic Likewise we will use a geometric interpretation of the term monotonic. For our purposes, a monotonic function is one for which the most direct path between each point in the landscape and the global optima is non-increasing or non-decreasing for minimization and maximization problems respectively. -63- 2.4.1 .3 Center-Focused The term center-focused will be applied to any function for which the majority of points in the search space have a favorable instantaneous slope toward the global optimum. 
That is, a positive slope in the direction of a global maximum for maximization problems, and a negative slope toward the global minimum for a minimization problem. Determination of this attribute is largely based on visual inspection and analysis, although a complete mathematical treatment is possible for firnctions which can be differentiated. 2.4.1 .4 Independent Variables The complexity Of a test function is determined primarily by the level of interaction between the individual parameter fields within the function itself. In terms of the paradigm of evolution, the interdependence of parameters is somewhat equivalent to epistasis since the expression of the value of one parameter is being masked or mitigated by the value of another parameter. A function which uses variable values independently allows each parameter field to have a consistent contribution toward the function value regardless of the values of other parameters. This property allows parameters to be effectively context independent, in that the worth of a given parameter value in one situation is identical to its worth in all other situations. An EA must still determine each parameter’s contribution to the overall objective function value blindly; however, having independent contribution for each parameter fits most directly with the premise of EC. -64- 2.4.1.5 Relative Variable Relationship While a number of optimization test functions allow for independent contributions from function parameters, forcing parameters to be independent severely limits the complexity of the search space. For example, for all functions where the parameter contributions are each calculated by the same function, independence of the parameters results in a symmetric search space. Further independence implies that the function may be amenable to a Simpler approach such as an inductive search. To provide more complex search landscapes for testing, many test functions choose to combine the problem parameters in a non-linear manner. This causes the contribution of a given variable to become partially or completely dependent on other variable values. Various linear and exponential combinations of variable values are often used to cause linear and non-linear warping of otherwise simple landscapes. For example, consider the search landscape described by: f (x) = 100a2 +b2. This is a simple Sphere function stretched into an ellipse by weighting the a component more heavily than the b component. However, if we modify the input parameters to be x 2 . . . . . . - y In place of a, we have Imposed an exponentral relatronshrp between Input parameters x and y. That is, the optima of the 100 a 2 component follows the line x = y 2. If we further modify the b component to be an offset of x, such as x —— 1, this completes the transform fiom the simple ellipse function to the much more difficult Rosenbrock’s banana function. (Also, note that the 100 weight component effectively provides a signal-to-noise differentiation component.) -65- 2.4.1.6 Symmetric and Asymmetric Syrrunetry can be an important clue in optimization search. Many search techniques make use of axial symmetry (though often not explicitly or intentionally) to locate a global optimum which lies at an intersection between multiple symmetric local minima. Since we are concerned with the effects of symmetry on finding the global optima, we will usually want to consider the symmetry of the function centered on the global optima. 
That is, the fimction itself may be symmetrical about some other point (such as the origin) but may be considered asymmetrical in terms of the global optima placement. Note that symmetry and parameter independence are interrelated. For example, if a test function is composed of the sum of several independent functions, f ( x 1), then if f is symmetric the test function will also be symmetric. 2.4.1.7 Signal-to-Noise Discrimination Problems A fairly common model for creation of a reasonably difficult test function is to overlay a simple monotonic center-focused function with a secondary firnction such as a periodic sine wave. This can be achieved simply by composing the search problem as the sum of these two firnctions. The optima of the monotonic firnction is typically aligned with the global optima (or one of the global optima in the case of a periodic function) of the secondary masking function. To make the problem more difficult, the second term may be magnified relative to the first through multiplication by a relatively large constant (or likewise, the first term may be reduced relative to the first). -66- This type of problem is similar to situations where a signal is being received in the presence of noise. In this case, the noise would be the distracting local gradient information provided by the second term and the signal would be the potentially weakened value generated by the first term. Obviously, as the magnitude of the masking function relative to the signal function increases, the overall difficulty of the problem increases. A surprisingly large number of the test functions used in EC literature fall within this category. Ackley’s function, Bohachevsky’s function, Griewangk’s function, and Rastrigrin’s function all follow this basic formula with various methods of providing the base function and the periodic overlay function. In terms of signal-to-noise ratio, Griewangk’s function provides the lowest, while both Rastrigrin’s function and Ackley’s function use relatively mild noise components, and Bohachevsky’s function uses a strong signal and a relatively weak noise function. Yip and Pao’s function also fall in the category of signal discrimination problems, as do the spiral, multi-modal spiral, chain, double cos, and worm functions created in this work. However the latter functions can be differentiated in that the overlaying noise functions are not axially aligned periodic functions as is the case with Ackley’s function, Bohachevsky’s function, Griewangk’s function, Rastrigrin’s function, and Yip and Pao’s function. Note that the simplest form of a signal-to-noise discrimination problem is a simple summation of progressively weighted squares of individual parameters, such as with Schwefel’s problem 1.2. The successive geometric weighting of the parameter contributions causes modifications of lower indexed parameters to be masked by the amplitude of higher indexed ones. Thus, if we assume that all parameters have the same -67- initial search range, a search method will typically tend to refine the solution from higher indexed dimensions first. This need for successive refinement can be quite taxing on some EC techniques. Since some of the parameters have negligible effect on the objective value for a reasonably long period, there is a tendency for genetic drift to cause these values to converge prematurely. Thus the more heavily an EC system depends on population diversity, the more difficult such successive refinement problems will be. 
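Both the periodic-mask functions and the successive-refinement functions described above follow the same basic recipe: a weak signal term combined with a stronger masking term. As a rough illustration (the function name and parameters below are assumptions for this sketch, not definitions used elsewhere in this work), a masked sphere of this kind can be written as:

    import numpy as np

    def masked_sphere(x, amplitude=10.0, frequency=2.0 * np.pi):
        # A parabolic 'signal' term plus a periodic 'noise' mask whose local optima
        # surround the global optimum at the origin; `amplitude` sets the
        # signal-to-noise ratio and hence the difficulty of the problem.
        x = np.asarray(x, dtype=float)
        signal = np.sum(x ** 2)
        noise = np.sum(1.0 - np.cos(frequency * x))
        return signal + amplitude * noise

With the default amplitude of 10 and frequency of 2*pi this is algebraically identical to Rastrigrin's function of section 2.4.19; reducing the amplitude weakens the mask, while instead dividing the signal term by a large constant (as Griewangk's function does) makes the mask dominate.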
We can increase the level of successive refinement required by using exponential combinations of the input parameters (i.e. ( Z'xi )2 ) or by using the exponent of the parameter itself (i.e. ( x12 ) ). However, note that the second has the interesting property that once the values reach the | x,- | S 1 region, the successive refinement problem is effectively inverted (i.e. the earlier indexed parameters become dominant). This function is listed below as the Exponential Function. An even more drastic version might use i x. Also note that EC systems which require a minimum continuous level of mutation on all parameters may not ever be able to refine lower order components in such a successive refinement problems. The constant high noise levels injected into the objective function by the amplified mutational effects on higher level components may continuously mask feedback from modification of lower order components. 2.4.2 Function Illustrations Each of the following functions is illustrated with one or more height maps. These illustrations are views of two-dimensional projections of the fimction value along a selected x and y basis. All of other function variables are held at their optimal value unless otherwise specified. The darkness of the various pixels indicates the relative -68- height of each value (as compared to other values in the visible area of the illustration). The lowest points are darkest, while the highest points are white. So, effectively these illustrations provide a three dimensional View with two independent parameters (x and y), with the gray level as the dependent variable. -69— 2.4.3 Square Function The square function is simply the summation of the absolute values of each parameter. This function is unimodal, monotonic, symmetric and center-focused. Each parameter provides an independent contribution to the objective function. Equation: 2|in Type: Minimization Range: NO effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Unimodal, Monotonic, Symmetric, Center-Focused, Independent Variables Figure 2.4 2D Square Function -70- 2.4.4 Sphere Function The sphere fimction is simply the distance of each point in the search space from the center of the sphere (typically the origin). This function is unimodal, monotonic, symmetric and center-focused. Each parameter contribution is not independent. Equation: "2x12 Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Unimodal, Monotonic, Symmetric, Center-Focused Figure 2.5 2D Sphere Function -71- 2.4.5 Schwefel’s Problem 1.2 This function demonstrates a linear relationship between the relative ranges of successive parameters. This function is unimodal, monotonic, symmetric and center- focused. Parameter contributions are independent. Equation: 211x? Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Unimodal, Monotonic, Symmetric, Center-Focused, Linear Relative Variable Ranges, Independent Variables Figure 2.6 2D Schwefel’s Problem 1.2 Function -72- 2.4.6 Schaffer’s Function This function provides near infinite resolution near the origin and concentric rings of hills and troughs; however, the general slope trend is toward the center. The original function was designed for only two dimensions. This extended form projects each sequential pair of parameters onto Schaffer’s original function. 
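Because the printed equation above is difficult to read, a reconstruction of this extended form is sketched below. The exponents of 1/4 and 0.1 and the factor of 50 are taken from the apparent structure of the printed equation (and are consistent with the stated optimum value of 0.0 at the origin), so the sketch should be treated as a plausible reading rather than an authoritative transcription.

    import math

    def schaffer_extended(x):
        # Each successive pair of parameters is projected onto the original
        # two-dimensional Schaffer form.
        total = 0.0
        for i in range(len(x) - 1):
            s = x[i] ** 2 + x[i + 1] ** 2
            total += s ** 0.25 * (math.sin(50.0 * s ** 0.1) ** 2 + 1.0)
        return total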
An alternate form would be to substitute the sum of all square terms in place of the two existing square terms. This form provides a more energetic n-dimensional surface, while the alternate form would present smooth concentric hypershells. This function is multi-modal, symmetric and center-focused. Parameter contributions are independent. n-l l .1 2 Equation: 2(xi2+xi2+1)/4 sin[50(x;2+xi2+1 ] +1.0 i =0 Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Multi-modal, Symmetric, Center-Focused -73- Figure 2.7 2D Schaffer’s Function -74- 2.4.7 Schaffer's Function Modification 1 This firnction is a modification of Shaffer’s function which offsets the centroid to the point [l,l,...,l], and forces the landscape to be dependent on multiple variables at once. Note that by offseting the global optima, we lose the level of resolution of the original Schaffer function. The resolution is now limited to the precision of our representation. If we wish to restore the higher resolution capability, we can eliminate the —1 term from the ( xn - l ) clauses. This function is multi-modal and center-focused. The function is symmetric about the global optima, but not linearly symmetric. Parameter contributions are dependent and linearly relative. Equation: n—2 20((xi ‘xi+l)2 +(J.‘i+1 ‘xi+2)2)%1 Si'{50((xi‘xi+1)2 +(x,-+1—x,-+2)2)0'1]2 +1.0 + i: ((xn_1—xn)2 +(x, .1)2)%1 sin[50((x,,_1 _x,,)2 + (x, .1)2)°"]2 +1.0 Type: Minimization Range: No effective range limits Global Optima: [1, 1, ..., 1] Value at Global Optima: 0.0 Description: Multi-modal, Symmetric, Center-Focused, Linearly Relative Variables -75- Figure 2.8 2D Schaffer’s Function - Modification 1 -75- 2.4.8 Schaffer's Function Modification 2 This function is another modification of Shaffer’s function that is almost identical to the previous function, but with an additional square term. This modification causes the landscape to become warped in a similar fashion to the Rosenbrock saddle function. As in modification 1 we lose the level of resolution of the original Schaffer function, but we can restore this property if desired. This function is multi-modal, asymmetric and somewhat center-focused. Parameter contributions are dependent and relative. Equation: 2 n—2 ){1 0.1 2 [(‘i “x12+1)2 +(xi+l ‘xi+2)2] Si 5{(‘i ’x12+1)2 +(x,-+1 'xi+2)2] +1-0 + i=0 1 2 [(xn_1 —x,%)2 +(x,, —1)2]A sin 50[(xn_1—x3)2 +(x,, -1)2 J01 +1.0 Type: Minimization Range: No effective range limits Global Optima: [1, 1, ..., 1] Value at Global Optima: 0.0 Description: Multi-modal, Asymmetric, Center-Focused, Eponentially and Linearly Relative Variables -77- I -2.5 0.0 X: Figure 2.9 2D Schaffer’s Function - Modification 2 -73- 2.5 2.4.9 Schaffer’s Function Modification 3 This frmction uses the same mechanism as the original Schaffer function to provide extremely high levels of resolution; however, here rather than concentric rings we provide rapidly fluctuating high resolution energy bands. This is combined with a centralized tanh(distance) and the constant 0.001 term provide a signal-to-noise differentiation problem. This function is multi-modal and symmetric. Parameter contributions are independent. I! Equation: 2 tanh(x,-2 [0.001 + sin(SOQsin(x,- DO 1 )D i =0 Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Multi-modal, Symmetric, Independent Variables -79- l. '1’" ”1% «no :11“; Hill! 
ill 3%; *2 3:51.: $13111 3111 I 1111‘ 0111115 Ann 1111‘ A1115 A1111b . :V: 3:: alone all miii; 3m; iiii Illll‘ Gilli. Amt I‘llli ’vmrawe culls Imls v 0 C Mill! Hill! Gilli? “ill. Vllll “2 “it? 1e 11.! WE - nu- - =::I-I: O‘:: iiiiw :liil A111.» A111 - :3 3.311;: ‘ iiiiiv $333 .7- A1116 411 I -010: - - I -. .. - - .IlI'I C OI. - ...m..- ‘.. lmt IlllI Chile ”:3 A 33;: 5‘ ' 2 a..- 9.0.": Figure 2.10 2D Schaffer’s Function — Modification 3, Low Resolution -30- £1: M #10111? " , £11115» - «.21»: c» 1 3331233. ' ”‘0'“. .1, ‘ we? -5.0 0.0 5.0 Figure 2.11 2D Schaffer’s Function - Modification 3, Medium Resolution -31- xm Figure 2.12 2D Schaffer’s Function — Modification 3, High Resolution -82- 2.4.10 Schaffer‘s Function Modification 4 In this, the last of the fimctions based on the Shaffer function, we use the same mechanism as the original Schaffer function to provide extremely high levels of resolution; however, here the concentric rings form around multiple optima. This is combined again with a centralized tauh(distance) and the constant 0.001 term to provide a signal-to-noise differentiation problem. This function is multi-modal and symmetric. The areas between the local optima are semi-chaotic. Also, note that the global optima is not axially aligned with the pattern of local optima. Parameter contributions are not independent. "’1 2 2 o 1 Equation: Ztanh(x,- +xi+1I0.001+ sin(50(|sin(xi)sin(xi+1]) ' )D i =0 Type: Minimization Range: No effective range limits Global Optima: [0, O, ..., 0] Value at Global Optima: 0.0 Description: Multi-modal, Symmetric -33- mm: 2.4.11 Ring Function This function is intended to provide an asymmetric landscape where the global optima basin is non-linear. This function is uni-modal and asymmetric. Parameter contributions are not independent. The target distance value, 1 has been selected to be 5 in the illustrations, and throughout all empirical testing. Algorithm: 1. Convert each pair of variables, xi,x,-+1, to polar coordinates. (I.e. calculate D as the distance from the origin, and 0 as the incident angle from the origin. If D is 0, then 0= 0. Oshould be in the range [ 0 , 2n ]. 2. Calculate d6 = ln—4. 4 3. If d0>7t,then (16:272—(10. 4. Calculate dD = D —t , where t is a constant representing the target distance. d6 5. The function value is: QdD| + 0.00%?) — 0.001. Type: Minimization Range: No effective range limits, although useful range will be relative to t Global Optima: [— tg,—t‘/;...,—t\[—12:] Value at Global Optima: 0 Description: Unimodal, Asymmetric -35- 10.0 " I x 0.0 40.0 I 40.0 o_o 10.0 Figure 2.14 2D Ring Function -35- Figure 2.15 2D Ring Function, Closer View -37- Figure 2.16 2D Ring Function, Near Global Optima -33- 2.4.12 Trapezoid Cross Function This function is a simple asymmetric unimodal landscape. This function is unimodal, symmetric and center-focused. Parameter contributions are independent. " e _x. (tanh(x,-)+1) Equation' 2 — 1'5 l ' . 4.5 2 1:0 Type: Minimization Range: No effective range limits Global Optima: [0.6846512, 0.6846512, ..., 0.6846512] Value at Global Optima: 5.43005203631e-05*n Description: Multi-modal, Symmetric, Center-Focused 5.0 x 1+1 0.0 -5.0 Figure 2.17 2D Trapezoid Cross Function -89- 2.4.13 Rosenbrock’s Saddle Function This function, also known as Rosenbrock’s banana and the De Jong function number 2, is a standard asymetric, center-focused, unimodal function. Parameter contributions are exponentially related. 
n—l Equation: 2100(x,2 + xi_1)2 +(1 —x,-)2 i =0 Type: Minimization Range: No effective range limits Global Optima: [1, l, ..., 1] Value at Global Optima: 0.0 Description: Unimodal, Asymmetric, Center Focused 2.0 Figure 2.18 2D Rosenbrock’s Saddle Function -90- 2.4.14 Spiral Function This spiral function provides a non-linear, non-center-focused landscape with difficult cliffs. The formula in step 3 calculates an effective modulo on spiral band width w. The calculation in step 2 provides an initial offset distance which varies from [0,w] as Ovaries from 0 to 21!. These two combined provide a spiral shaped landscape where the profile of a cross-section of the spiral is a sawtooth wave. The third component provides a simple ring function at a target distance based on p. The global optima will therefore be the point on the spiral trough that intersects the circle at the target distance (which is designed to be at 45° so that xi+1 may be used as x,- in the next clause of the summation and still have the same target). The two terms work together similar to a signal-to-noise fimction. This function is actually unimodal, though it may appear to be multi-modal with high activation barriers when moving radially. Although the spiral portion of the landscape is somewhat symmetric, the function overall is asymmetric about the global optima. Parameter contributions are not independent. In the illustration and in all empirical testing within this work, we have select d = 1.51, and n = 4. Algorithm: 1. Convert each pair of variables, xi,xi+1, to polar coordinates. (I.e. calculate D,- as the distance from the origin, and 0,- as the incident angle from the origin. If D,- is 0, then 0,- = 0. 6; should be in the range [ 0 , 21: ]. -91- 2. Calculate Ii = D,- — w—Zfi- , where w is the width of the spiral grove. 7r 3. Calculate m,- =1,- —w|_l,- /w_]. 4. The function value is: 2 m,- +0.1D,- —d(p+%), where p is a constant 1' representing the target distance in terms of windings of the spiral. Note that the 1/8 offset forces the target to be at the 45° position so that the global optima coordinates are the same in each dimension. Type: Minimization Range: No effective range limits Global Optima: ['ild(p +9 ’{ldip + 313-). ..,W] Value at Global Optima: 0.0 Description: Unimodal, Asymmetric -92- XM 10.0 0.0 Figure 2.19 2D Spiral Function -93- 10.0 2.4.15 Ackley’s Function The first two terms of this fiinction provide an exponential funnel focused about the origin. The second two terms provide an egg crate form similar to that found in Griewangk’s function, Yip and Pao’s function, and Rastrigrin’s function. This function is multi-modal, symmetric and center-focused. Parameter contributions are not independent. 2 -02 2:; 2cos(2mci) Equation: 20 - 20 e n + e — e n Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Multi-modal, Symmetric, Center-Focused _94- -2.5 Figure 2.20 2D Ackley’s Function -95- 2.5 2.4.16 Gnewangk’s Function Griewangk’s function provides a strong signal-to-noise problem. The signal component is provided by the sum of squares factor, which provides a parabolic basin about the origin. The noise component is provided by the product term which provides an “egg-crate” shaped series of alternating hills and valleys which increase in frequency an decrease in width as the parameter index increases. The signal ratio is factored to be 1/4000th the maximum strength of the noise component. 
This causes the local minima of the masking product term to be quite attractive to a search process. This function is multi-modal, symmetric, and minimally center-focused. 2 . 2*: x- E nation: 1.0+ — co -i q 4000 H {J} ] Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Multi-modal, Symmetric, Very Minimally Center-Focused -96- 10.0 xm 0.0 40.0 40.0 0.0 Figure 2.21 2D Griewangk’s Function -97- 10.0 2.4.17 Clover & Cross Function The first term of the function f provides a bell shaped curve with a maximum at the origin. This term serves as a focusing product on the second term. The second term is a symmetric two-hill shape with a sharp valley near the origin and long sloping tails toward infinity. The net result is a landscape with a global optima at the origin surrounded by a steep hills and a long sloping field which leads away from the global optima. This function is unimodal and symmetric. Parameter contributions are independent. Equation: Zf(xi)3f(0)=0,f(p¢0)=-p—J=——e Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Unimodal, Symmetric, Independent Variables -93- Figure 2.22 2D Clover and Cross Function -99- 2.4.18 Bohashevsky’s Function The first three terms of this function provide yet another egg-crate masking frmction, this time with slightly more emphasis on the second axis of each variable pair. The second two terms provide an oval parabola which is slightly longer in the second dimension. The combination of the two provides yet another signal-to-noise combination, similar to Griewangk’s function, Rastrigrin’s function, etc. This function is multi-modal, symmetric and center-focused. Parameter contributions are not independent. n -1 2 2 Equation: 2 (0.7 - 0.3 cos(37rx,- )- 0.4 cos(3zzxi+1 )+ x,- + 2x1.+1 ) i =0 Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Multi-modal, Symmetric, Center-focused -100- Xm Figure 2.23 2D Bohashevsky’s Function -101- Figure 2.24 2D Bohashevsky’s Function, Closer View -102- 2.4.19 Rastrigrin’s Function Like the Ackley’s, Bohachevsky’s, Griewangk’s, and Yip and Pao’s functions, Rastrigrin’s function is a signal-to-noise differentiation problem with a central valley masked by a strong periodic signal. However, unlike these functions, Rastrigrin’s function maintains independence of its parameter contributions. This function is multi- modal, symmetric and center focused, with independent parameter contributions. Equation: 10n + Z (x2 — lOcos(2nx)) Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Multi-modal, Symmetric, Center-Focused, Independent Variables -103- Figure 2.25 2D Rastrigin’s Function -104- 2.4.20 Yip 8 Pao’s Function This function was introduced by Yip and Pao [Yip 95]. In its original version, 2 2 — cos(207vc1)cos(20mc2)+ 2 , the function was designed only for two inputs and used an offset of 2, for a final global optima value of 1. In this modified version of the Yip and Pao fimction, we have extended it beyond two dimensions and decreased the offset to produce the more standard 0 global optima value. As in the Griewangk function, this function frames a signal-to-noise discernment problem with the basic sphere function as the signal value, and a high frequency egg-crate function as the noise mask. 
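Since the printed equations for several of these functions are garbled, the standard formulations of Griewangk's and Ackley's functions, which appear consistent with the descriptions given above, are sketched below. They are the usual forms from the literature rather than transcriptions of the printed equations.

    import math

    def griewangk(x):
        # Weak parabolic signal (scaled by 1/4000) minus a product-of-cosines mask.
        s = sum(xi * xi for xi in x) / 4000.0
        p = 1.0
        for i, xi in enumerate(x):
            p *= math.cos(xi / math.sqrt(i + 1))
        return 1.0 + s - p

    def ackley(x):
        # Exponential funnel about the origin plus a cosine 'egg crate' mask.
        n = len(x)
        s1 = sum(xi * xi for xi in x) / n
        s2 = sum(math.cos(2.0 * math.pi * xi) for xi in x) / n
        return 20.0 - 20.0 * math.exp(-0.2 * math.sqrt(s1)) + math.e - math.exp(s2)

Both functions evaluate to 0.0 at the origin, matching the listed global optima.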
Note in the illustration the similarities to the Griewangk function, but the extreme difference in relative scale (due to the higher frequency of the masking component here). it?) Equation: —Z-—n—'-— —- H cos(2071x,-)+ 1 Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Multi-modal, Symmetric, Center-Focused -105- IIIIIIIIIIII IIIIIIIIIIIIII,“ I I I I I I I I I I I I I I I I t IIIIIIIIIIIIIII III-III-IIIIIIIII IIIIIIIIIIIIIII I I I I I I I I I I I I I I I I II II III-IIIIIIIII IIIIIIIIIIIIIIIIIII II III IIIIIII IIII 'IIIIIIIIIIIIIIIIII lIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIII lIIIIIIIIIIIII IIIII IIIIIIIIIIIIIIIIIIII iIIIIIIII IIII IIIII IIIIIIIIIIIIIIIIIIII IIIIIIII- III. IIIII III-IIIIIIIIIIIIIIII IIIIIIII- III IIIII ,'....CIIIIIIII.I..-. IIIIIIIIII IIIIIIIII IIIIIIIIIIIIIIIIIIII DIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIII lIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIII ..lIIIIIIIIIOIIIIIII lIIIIIIIIIIIIIIIIIII .IIIIIIIIIIIIIIIIIII I I I I I I I I I I o I I I I I I I I I IIIIIIIIIIIIIIIIIIII I I I I I I I I I I I I I I I I I I I IIIIIIIIIIIIIIIIII IIIIIIII IIIIII IIIUIIIIIIIIIII. IIIIIIIIIIIII III-IIIIII-au Figure 2.26 2D Yip & Pao’s Function -106- 2.4.21 Multi-modal Spiral Function This function, affectionately nicknamed the Dandelion function, is based directly on the spiral function with the addition of an extra periodic function on the distance that the spiral has wound from the center. The parameter r allows control of the frequency of this secondary periodic term. The combination of these two terms produces yet another signal-to-noise discrimination problem. However, given that both the signal and noise functions are in polar coordinate space, a linear encoded solution will most likely view these recurrent hills and valleys as similar to those in Griewangk and other signal-to- noise problems with periodic masking frmctions. However, significantly, the global optima in this case is not at the center of the symmetry, nor is the landscape strongly center-focused. For these reasons, we may expect this problem to provide a reasonably high degree of difficulty for search algorithms. This function is multi-modal, though it may appear to be multi-modal with high activation baniers when moving radially. Although the spiral portion of the landscape is somewhat symmetric, the function overall is asymmetric about the global optima. Parameter contributions are not independent. In the illustration and in all empirical testing within this work, we have select d = 1.51, and n = 4, and r = 7.5. Note that the value nr must be a positive integer. Algorithm: 1. Convert each pair of variables, xi,x,-+1, to polar coordinates. (I.e. calculate D; as the distance from the origin, and 0 3 as the incident angle -107- from the origin. If D; is 0, then 0,- = 0. 0,- should be in the range [ 0 , 2n ]. 2. Calculate l,- = D,- — w?— , where w is the width of the spiral grove. 7r 3. Calculate m,- =l,- —u{li /w_|. 4. The function value is: D,- —d(p+%)+sm(r(23l2dij+6i +%J)+l], where p is a constant representing the target distance in terms of windings of the spiral, Z[mi +0.1 1' and r is a constant representing the number of complete sin waves per spiral turn. Note that the 1/8 offset forces the target to be at the 45° position so that the global optima coordinates are the same in each dimension. The #4 term forces the one of the sin curve optima to coincide with the optima of the first two terms. 
Type: Minimization Range: No effective range limits Global Optima: ['i/d(p + g) r\:/d(p + ,1?) . WM] Value at Global Optima: 0.0 Description: Unimodal, Asymmetric -lO8- Figure 2.27 2DMulti-modal Spiral Function -109- 2.4.22 Frequency Modulation Sounds (FMS) Problem This function was introduced in [Tsutsui 93]. This function is a fixed dimensional problem with six input parameters. Nonetheless, due to the high level of parameter interaction, this function is one of the most epistatic functions in this list. This function is highly multirnodal, and many of the parameter interactions are fairly chaotic. 100 2 Equation: f f,m. = Z (“o—yo (0) , where (=0 y(t)-x sin 1: 91+): sin x fl+x sin(x 2g) and 1 2100 3 4100 5 6100 ’ y0(t)=1.Osin 5.02—m—+l.5sin 4.92—”'+ 2.0sin(4.83’5-) 100 100 100 Type: Minimization Range: Listed as [-6.4, 6.35] Global Optima: [1.0, 5.0, 1.5, 4.9, 2.0, 4.8] Value at Global Optima: 0.0 Description: Multi-modal, chaotic ~110- .04 0.0 64 X6 Figure 2.28 FMS Function within Initial Range, x5 and x; (All other parameter values held at optimal.) -lll- Figure 2.29 FMS Function within Initial Range, x; and x4 (All other parameter values held at optimal.) -112- 2.4.23 Exponential Function This fimction provides an exponential successive refinement problem if the initial parameter ranges are selected to be equal. It is interesting to note that once the values reach the Ix,- | 51 region, the successive refinement problem is effectively inverted. Equation: in Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Multi-modal, Symmetric, Center-Focused 45.0 0.0' so xi Figure 2.30 2D Exponential Function -113- 2.4.24 Chain-Link Function This function is similar to other signal-to—noise discrimination functions with periodic masks such as Griewangk’s function, etc. However, in this landscape, the local optima are long narrow valleys, rather than pits, and the symmetry of the landscape is non-axially aligned. Parameter contributions are not independent. n—l 2 Equation: Z(sin(x,-)- sin(x,-+1))2 + Z x, i=1 4000 Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Multi-modal, Symmetric, Center-Focused 40.0 0.0 10.0 Figure 2.31 2D Chain-Link Function -114— 2.4.25 Double Cos Function The double cos function is a highly eipstatic function which nonetheless has a fairly strong central basin and remains symmetric and axially aligned. n—l 2 Equation: 1 + Zeos(x,- cos(x,-+1))cos(x,- cos(x,-+1))+ Z 4330 i =1 Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Multi-modal, Symmetric, Chaotic lllfi‘ii- HI.” J'i'lll. .lll' .'-,'L' Figure 2.32 2D Double Cos Function -115- 2.4.26 Inverse Exponential Function This function is characterized by a weak slope toward the optima in the positive x irange, and a slope away from the global optima in the negative x ,- range, with an extremely high activation barrier to the immediate lefi of the global optima. Note that as the index increases, the tails of the slopes become flatter and the plunge to the optima becomes more precipitous. n . 
xi Equation: Zflxiai) where f(0,i) = 0 ’ f(a ¢ 0’1) = [0111)] i =1 Type: Minimization Range: Dependent on problem size and encoding precision Scalability: May not scale well to high dimensionality, depending on encoding precision Global Optima: [0, O, ..., 0] Value at Global Optima: 0.0 Description: Unimodal, Asymmetric, Independent variables -ll6- Figure 2.33 2D Inverse Exponential Function -ll7- 2.4.27 Worms Function This function provides yet another variation of the classic signal-to-noise discrimination problem; however, in this instance the noise function provides basins of attraction which are aligned diagonally to the axes, and which primarily do not lead toward the global optima. Algorithm: 1. Define the real-valued modulo function m, as m(a,b) = a — bB-J , and the real-valued floor function, q as q(a,b) = a — m(a,b). 2. For each pair of values, x,- , xi+1 , calculate di = l3q[xi+1 + %,w]—l—sin(xi+ w is a constant parameter of this w fimction. n—l 2 )1 x2 3. Ifdi> I, set d; = 2 - (1;. The function value is: 2d; + Z —'— . . . 4000 1:1 1:1 Type: Minimization Range: No effective range limits Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0 Description: Multi-modal -118- Xm 10.0 0.0 40.0 L A A ‘i A 40.0 0.0 10.0 Xi Figure 2.34 2D Worms Function -1 l9- 2.4.28 Schwefel’s Function This function presents a series of axially aligned local optima which increase in area and depth as the parameter values increase. In order to provide an effective limit, strict range enforcement is required. Interestingly, this function can be used for either minimization or maximization since it is inversely reflected across the x,- = x; + 1 planes. The parameter contributions are independent. I: Equation: Z-xi sin( lxil) i =1 Type: Minimization Range: [-500,500] Global Optima: [420.968745, 420.968745,..., 420.968745, ] Value at Global Optima: approx. 11 * 418.9828872724338 Description: Multi-modal, Symmetric, Independent Variables -120- x91 Figure 2.35 2D Schwefel’s Function -121- 2.4.29 Dynamic Control This function overlays a square function which provides dual pressure. The first clause forces all variables toward their neighbor values, while the second forces all variables toward zero. However, given the equal weighting of the two factors, moving an individual value to zero without also moving its neighbors results in a significantly worse objective value. Therefore all values must be moved simultaneously. No image is supplied for this function. This is similar to the dynamic control function found in [J anikow 91]. n—l n Equation: 20:,- wig-+1)2 + inz i=1 i=1 Type: Minimization Range: No effective range limitations Global Optima: [0,0,...,0] Value at Global Optima: 0.0 Description: Unimodal, Symmetric, Dependent Variables -122- Chapter 3 Analysis of Operators Analysis of the NFL theorems [Wolpert 97] (see Section 2.1) implies that the efficiency of a search process is dependent on the relative alignment between the assumptions made by a search operator and the problem landscape under investigation. One of the strongest assumptions that an operator demonstrates is the expected location of better individuals relative to a previously sampled point, or relative to the current population as a whole. For this reason, a study of the statistical biases of EC operators is warranted. Given the stochastic nature of most EC operators, bias analysis cannot effectively be carried out using a single application instance. 
However, analysis of the distribution induced by a number of operator applications in terms of population level statistical characteristics may provide insight into the relative bias of a given operator. These statistical biases equate to search biases or assumptions. For example, bias against -123- covariance implies assumptions of parameter independence or context sensitivity of parameter relationships. Variance reduction and center tending imply the assumption that the landscape contains concentric basins of attraction that are amenable to hill climbing. In section 3.1, the various statistical measures and their implications are outlined. This section includes closed form evaluation of several common operators as case studies to demonstrate various forms of bias. Section 3.2 discusses the relationship between local statistics and invariance to homeomorphic transformations of the encoding space, and its implication to EC system implementation. This section also includes a discussion as to the validity of using population local statistics during search. Finally, Section 3.3 presents a series of empirical tests and an initial battery of test cases that are then carried out on a selected number of representative search operators. The results of these tests are analyzed and a preliminary sample operator taxonomy is presented. 3. 1 Statistical Distribution Analysis In this section we will outline various statistical measures present in any group of sampled landscape points and discuss the forms of measurement of these values and what biases may be observed through modification of these values between parent and child populations. Where appropriate, direct closed form distributional analysis will be performed to provide examples of such bias as present in existing EC operators. In general our analysis will consist of statistical analysis of distributions produced from a single set of applications of a given operator (as in a single generation), as opposed to long term Markovian effects. Examples are typically presented as hypothetical input population distributions, without determination of how such conditions might arise in an evolutionary system. Analysis consists of comparing the statistical -124- differences between input and output distributions over a single set of operator applications. When discussing actual evolutionary computation systems, it may be necessary to differentiate between pre-breeding-selection and post-breeding-selection populations. A pre-breeding—selection population refers to the current population before selection for breeding takes place. A post-breeding—selection population refers to the actual or virtual pool of individuals selected for reproduction, whether or not the current operator is being applied to them. In systems where no breeding selection occurs, pre- breeding-selection and post-breeding-selection populations are equivalent. In order to separate the effects of breeding-selection and breeding operators, we will normally compare the results of operator application to the prior post-breeding-selection population. Equivalently, the operators may be considered as being applied in isolation without any effective breeding selection. The reader may refer to section 2.2.1 for a more general overview of breeding selection in evolutionary algorithms. Note that in these analyses, we typically ignore implementation issues such as rates of application for the various operators. 
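To make the notion of a single-generation distributional comparison concrete, the following sketch applies a two-parent operator repeatedly to a fixed (post-breeding-selection) parent pool and reports parent versus child means and covariances. It is an illustrative harness only, not the empirical battery of section 3.3, and the function name and sampling scheme are assumptions.

    import numpy as np

    def measure_operator_bias(operator, parents, trials=10000, rng=None):
        # Compare parent and child distribution statistics over many applications
        # of a two-parent recombination operator to a fixed parent pool.
        rng = np.random.default_rng() if rng is None else rng
        parents = np.asarray(parents, dtype=float)
        children = np.empty((trials, parents.shape[1]))
        for t in range(trials):
            i, j = rng.choice(len(parents), size=2, replace=False)
            children[t] = operator(parents[i], parents[j])
        return {
            "parent mean": parents.mean(axis=0),
            "child mean": children.mean(axis=0),
            "parent covariance": np.cov(parents, rowvar=False),
            "child covariance": np.cov(children, rowvar=False),
        }

For example, passing the extended_intermediate sketch given earlier as the operator would expose any mean shift or variance change it induces on a given parent pool.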
Certainly an EC system that selectively applies a given operator will only induce a portion of the bias that could be produced by maximal application of that operator. However, in this analysis we are more interested in determining the extent of these biases where present. Indeed, an EC system can avoid all possible bias by refusing to employ any operators at all; however, such a system would be of little practical use. 3.1.1 Mean Disturbance Disturbances to the population mean may be evaluated as statistical fluctuations or as intentional modifications. Statistical fluctuations occur due to the stochastic nature -125- of the operators and the fact that finite populations and finite sampling are being used. For an individual Operator, these fluctuations tend to follow random vectors whose magnitude is relative to the magnitude of the variance modification induced by the operator on a population associated with a landscape. As such, these mean disturbances are subject to the standard random walk analysis. For example, in l- and 2- dimensional search spaces the mean will revisit any given point an infinite number of times over infinite time; however in spaces at or above 3 dimensions, revisiting any individual point during a random walk is extremely improbable. However, we know that population means do indeed tend to revisit locations consistently within problem spaces of greater than 3 dimensions. Although the statistical fluctuations of the mean for individual Operators tend to be random, the selection operator tends to correct these random fluctuations over time thus negating them (assuming a local attractor in the landscape). Therefore, on the whole we can ignore such fluctuations as long as they tend to be symmetrical and unbiased except in the consideration of genetic drift. (See 3.1.2 Variance Disturbance for a discussion of genetic drift.) On the other hand, as the goal of most search techniques is to focus the sampling near the global optima, it is necessary to move the population mean toward the optima. This movement is achieved by leaving these fluctuations or a component of them uncorrected. Therefore the magnitude of these fluctuations provides information about the search speed of an EC system. Also it determines the system’s ability to escape local minima. It is important to keep in mind that the “mean” is a fictitious point in an EC in that it may or may not have been actually sampled, and that the fitness of the population mean is not necessarily related to the fitness mean of the individuals in the population. -126- Intentional modification of the population mean indicates a biasing of the search process, in that a given area of the search space is more heavily sampled based upon the assumptions of the operator. Typically the assumptions of the operator are guided by feedback from the currently sampled landscape positions (i.e. the fitness and location of individuals in the population). The assumption is that fitter individuals point toward more fruitful search areas while less fit individuals point toward less productive search regions. A standard example of intentional mean shifting is BLX-a-b. BLX-a-b operation is similar to the operation of BLX-a, with the exception that the search area is non-symmetrical about the mean of the two parents. That is, one side of the distribution (typically the one toward the more fit parent) is larger than the other. The intent is to bias the population toward searching closer to more fit individuals. 
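To make the preceding description concrete, the following fragment gives a minimal sketch of a BLX-α-β-style blend recombination in Python. The exact parameterization (extending the sampling interval by a factor β toward the fitter parent and a smaller factor α toward the less fit parent) and all variable names are illustrative assumptions rather than a definitive specification of the operator.

    import random

    def blx_alpha_beta(p_fit, p_less, alpha=0.1, beta=0.5):
        """Sample one child from a BLX-alpha-beta-style interval.

        p_fit / p_less are the fitter and less fit parent vectors.  For each
        gene the sampling interval spans the two parent values, extended by
        beta * range toward the fitter parent and alpha * range toward the
        less fit parent, so the distribution is skewed toward the fitter one.
        """
        child = []
        for a, b in zip(p_fit, p_less):
            span = abs(a - b)
            if a >= b:                       # fitter parent on the upper side
                lo, hi = b - alpha * span, a + beta * span
            else:                            # fitter parent on the lower side
                lo, hi = a - beta * span, b + alpha * span
            child.append(random.uniform(lo, hi))
        return child

    # Example: the child is drawn from an interval biased toward the fitter parent.
    print(blx_alpha_beta([1.0, 2.0], [0.0, 1.0]))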
Such biases tend to reduce the variance of the population in local hill-climbing situations. There is a potential for a strong philosophical objection toward such biasing operators. Chiefly, there already exists an operator that by nature tends to focus the search and reduce variance in local hill climbing situations, namely selection. In many EC systems the difficulty is not in achieving more rapid convergence, but more thorough search. Thoroughness and speed are antithetical goals as an increase in speed necessitates less opportunity for search, and therefore a reduction in search thoroughness. In situations in which EC systems tend to converge quickly, adding a second mechanism to amplify the focus of the search tends to reduce the overall thoroughness of the search. In our analysis here, we will tend to ignore operators that intentionally induce a movement of the mean (including selection), since such fundamental bias makes it difficult to determine a baseline for further evaluative statistics. For example, if the mean -127- of the child population is shifted relative to that of the parent population, should the variance of the child distribution be measured from the original mean of the population or the newly induced one? Similar issues come into play for other statistical measures as well. Furthermore, such biases typically require a fitness map of the search landscape which reduces the generality of the analysis. Operators that are not fitness sensitive can be analyzed in a fitness neutral manner. Further, most EC operators are intentionally unbiased (symmetric) in their search behavior, so a large number of existing and potential EC operators may still be evaluated. Extension of this evaluation to intentionally biased operators is left for future research. 3.1.2 Variance Disturbance While the selection operator is the primary population mean modifier, most evolutionary operators tend to modify the population variance. Traditional EC mutation operators tend to modify the population variance in a specific manner regardless of the current population distribution. Selection also tends to modify the population variance; however, here the magnitude and direction of variance modification can be highly dependent on the landscape and initial population distribution. Selection disregards certain members of the population and overselects other members, thereby decreasing the number of contributors to the variance. However, it is incorrect to conclude that selection is therefore always a variance reducing operation. For example, consider the situation illustrated in Figure 3.1a where the current population consists of four points. If we imagine a symmetric hill descending landscape where points A and D are favored over B and C, it is possible that breeding selection may choose A twice and D twice and disregard B and C altogether. In this case, the -128- population variance changes from 3 1/3 before selection to 6 afier selection, thereby nearly doubling the population variance. Likewise, we might consider a similar two- dimensional case as in Figure 3.1b where the population is currently distributed in a ring shape (where darker areas represent increased fitness). Again, if we assume a fitness landscape which favors hill descending, then it is expected that the selected population will be distributed further from the mean (the darker colored points) thereby again increasing the population variance. While hill descending is a common component to these examples, it is not a requirement. 
Selection requires multiple vectors of attraction in the fitness landscape in order to exhibit variance-increasing behavior, but this is possible even in landscapes which do not exhibit hill descending behavior. For example, the situation depicted in Figure 3.1a can as easily be imagined as a two-peak hill-climbing problem with competing peaks at points A and D. In monotonic, single-attractor (hill climbing) landscapes, the behavior of selection is nearly always variance reducing.

Genetic drift describes the effects of finite sampling through the use of finite populations, causing minute fluctuations in the variance of individual alleles. This fluctuation means that one individual is favored over equally fit individuals. Coupled with selection, this fluctuation produces a random dominance effect between equally fit individuals. In genetic terms, heterozygosity (the representation of multiple competitive alleles in the population) decreases through genetic drift. In fact, over infinite time heterozygosity decreases to zero due to genetic drift, which means that eventually one allele will dominate the population to the exclusion of all others. While in the short term selection may cause increases to the population variance, the fact that we are using finite populations means that ultimately genetic drift will cause the population variance to be reduced to zero.

Most mutation operators employed in EC increase the population variance. Given that the normal method of mutation is to impose a sampling distribution about one or more individuals in the current population, typically by addition of a random sample from a selected distribution, the net effect is to increase the variance of the resulting child population. For this reason, mutation is often synonymous with variance increase. That is to say, most operators that increase the population variance are viewed as containing an element of mutation.

One of the few examples of a consistent variance-reducing operator other than selection is the averaging crossover [Davis 91], where each two selected parents produce a child at their center of mass, or mean point. To demonstrate this, consider the average contribution of two parents to the population variance as compared to the variance contributed by their mean. For simplicity, assume the population mean is at the origin (i.e. zero). Since the variance is relative to the distance to the mean, not its absolute coordinate placement, this assumption does not invalidate the argument for populations where the mean is not at the origin. The average contribution of the parents is given as $\frac{x_1^2 + x_2^2}{2}$, whereas that of their mean point is given as $\left(\frac{x_1 + x_2}{2}\right)^2$. The difference is then given by $\frac{x_1^2 + x_2^2}{2} - \left(\frac{x_1 + x_2}{2}\right)^2$, which reduces to $\frac{(x_1 - x_2)^2}{4}$, a positive value for all $x_1 \neq x_2$. We conclude that the averaging crossover operator loses variance relative to the squared distance between the parents (the expectation of which is another expression of the population variance).

Figure 3.1 Example Situations which Increase Variance through Selection: (a) one dimensional hill descending, (b) two dimensional hill descending

In contrast to mutation operators, most crossover operators tend to preserve the current variance in the population. In order to differentiate between the effects of selection and crossover, we can measure the population variance difference between the post-breeding-selection pool (i.e.
the average variance of those selected for reproduction) and the collection of all children produced by a crossover operator. Standard field-based crossover does not modify individual field values. Further, since each allele represented in the post-breeding-selection pool will be present in some child solution, standard field-based crossover always has a zero net modification of the population variance. That is, the variance of the population after crossover is exactly the same as it was before crossover. Global intermediate recombination, which repopulates the population from the mean and variance of the parent population, also tends toward zero net population variance modification. However, there can be significant fluctuation in the variance modification under global intermediate recombination due to the fact that the sampling set (number of crossovers applied) is finite.

3.1.3 Mean Focusing (Center Tending)

It is possible for two operators to produce distinct distributions with identical mean and variance measures which nonetheless represent radically different search distributions. For example, one operator may create a search distribution which uniformly samples within a certain range of the population mean, while a second operator might sample densely about the population mean and produce a small number of search points far beyond the range of the first operator's distribution. If these distant search points are sufficiently distant and balanced about the mean, the resulting mean and variance measures may be identical to those of the first operator. Clearly the second operator demonstrates an assumption that densely sampled local search with occasional large jumps provides a more efficient search pattern; however, this assumption is not apparent in the normal statistical measurements between the parent and child populations.

There are numerous metrics that might provide some insight into the shape of the search distribution. Since our first two measures deal with the first two moments of the population, it is natural to consider the third moment, or skewness, of the distribution. In practice, the third moment is somewhat impractical to compute for multi-dimensional search spaces and requires an exponentially larger number of sample points. It may also be difficult to compare two relative co-skewness matrices in terms of similarity. A simpler metric is to estimate the bias of the distribution of the distances of the population members from the mean. While this metric is unable to distinguish between equivalent, but rotated, asymmetrical distributions, we may assume other metrics, such as the measure of covariance disruption, should allow us to detect these situations. Given the distribution of distances about the mean, we can use the first and second moments and their relation to the median distance (similar to the Pearson skewness coefficient) to produce a metric which measures the degree of bias toward the center of a given distribution. For more details on the exact formulation of this metric, see Section 3.2.1.4.

3.1.4 Covariance Disturbance

The covariance of two variates is an indication of the level of correlation between them. Two equivalent formulas for calculating the covariance of two variates are given in Equation 3.1. Since covariance is relative to a mean, the covariance of single points is meaningless, so we will typically address the covariance of an entire population or a subgroup of a population.

$\operatorname{cov}(x_i, x_j) = E\big((x_i - \mu_i)(x_j - \mu_j)\big) = E(x_i x_j) - E(x_i)E(x_j)$

Equation 3.1 Formulas for Sample Covariance
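As a quick illustration that the two forms in Equation 3.1 are interchangeable in practice, the following Python fragment computes both on a sampled population; the population size, the use of the biased (population) normalization, and the random seed are arbitrary choices for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    pop = rng.normal(size=(200, 2))          # 200 sampled points, 2 parameters
    x, y = pop[:, 0], pop[:, 1]

    # Form 1: E((x - mu_x)(y - mu_y))
    cov_centered = np.mean((x - x.mean()) * (y - y.mean()))

    # Form 2: E(xy) - E(x)E(y)
    cov_moments = np.mean(x * y) - x.mean() * y.mean()

    print(cov_centered, cov_moments)         # the two forms agree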
Disturbance of the covariance between parameters or fields by an evolutionary computation operator implies an assumption that the existing covariance is either an aberration or an indication of an underlying relationship inherent to the problem space. Depending on the assumptions made, the covariance may be dampened or amplified by application of a given operator. Historically, EC researchers have only recently begun to focus directly on the covariance effects of their operators; we therefore find that most operators tend to dampen covariant tendencies in the population, since they are not specifically designed to preserve them. Subsection 3.1.4.1 examines the covariance disturbance inherent in standard GA crossover, while 3.1.4.2 provides a similar analysis for BLX-α. Section 3.1.4.3 further discusses the significance of covariance loss during evolutionary search.

3.1.4.1 Crossover Covariance Disturbance

Consider the three example parent distributions illustrated in Figure 3.2. In each example, the parents are uniformly distributed along a line segment that is one unit in length (the source population points are distributed along the heavy black line segment). In parent distribution a, the slope of the line segment is 0; for b and c, the slope is 1/2 and 1 respectively. For each distribution, the gray area indicates the expected exploration area produced by application of standard field-oriented crossover. (Note that in example a the area of exploration is the same as the distribution of the parents.)

Figure 3.2 Three example parent distributions: (a) m = 0, (b) m = 1/2, (c) m = 1

Given these examples, it seems obvious that more exploration is taking place in example c than in b, which in turn demonstrates more exploration than example a. In fact, example a demonstrates that crossover carries out no exploration once one of the parameters has converged, while (as we shall prove) example c shows the maximal exploration produced by a crossover for a continuous uniform population along a line segment. Observing behaviors graphically in this manner can provide some insight into the overall behavior of an operator; however, a mathematical analysis provides a more useful framework for generalization and characterization of such properties.

An operator which retrieves information for parameters $x_i$ and $x_j$ independently from the post-breeding selection pool (i.e. from independent sources) may be classified as fully dissociative for parameters i and j. Dominant recombination with $\rho = \mu$ is an example of a fully dissociative operator. A standard crossover operator with a 100% application rate would also be fully dissociative. Given that the parents are chosen uniformly from the post-breeding selection pool, the expected covariance between $x_i$ and $x_j$ in the resulting children will be zero, since the expected covariance between any two independent variables is zero.

Given this result we can calculate the expected percentage of covariance loss for any partially dissociative operator. Consider a partially dissociative operator O. We can partition the children produced by O according to whether parameters $x_{i,m}$ and $x_{j,m}$ for child m came from separate sources. (Note that at this level of analysis we assume that independence of sources implies independence of values; however, this is not true if $x_{i,p}$ and $x_{i,q}$ from parents p and q of child m are not independent.
Since such issues concern long-term Markovian properties, we choose to ignore them for the moment.) Thus, we have created two sets, one for which $x_i$ and $x_j$ are independent, and one for which the expected covariance is the same as that of the original population. Also, the expected mean of both sets is the same as the mean of the original population. The formula for sample covariance in Equation 3.1 implies that covariance contributions are independent as long as the mean remains constant; therefore, the expected value of the covariance will be the ratio of the number of non-dissociated children to the total number of children, multiplied by the covariance of the original population. (I.e., the dissociated children are expected to contribute zero net covariance to the child population, so the covariance of the non-dissociated children is averaged across the entire population.) Thus the net proportion of expected covariance loss is equivalent to the net expected level of dissociation.

Note that we can also adjust this calculation for the possibility of two parents having a common ancestor for $x_i$ or $x_j$. Since selecting two parents which are not independent across $x_i$ or $x_j$ is equivalent to not dissociating $x_i$ and $x_j$ between the parents and the children, we can simply add any such occurrences to the set of non-dissociated children, even though O was successfully applied. Therefore, the final expected proportion of covariance loss is equivalent to the effective dissociation rate, which is the probability of application of O, times the probability of dissociation between $x_i$ and $x_j$ for a given application of O, multiplied by the probability that $x_i$ and $x_j$ are independent for randomly selected parents p and q (or equivalently, one minus the probability that $x_i$ or $x_j$ are not independent).

$\mathrm{Loss} = p_O \cdot p_{\mathrm{diss}}(x_i, x_j) \cdot p_{\mathrm{ind}}(x_i, x_j)$

Equation 3.2 General Covariance Loss Formula

Let us assume a population with fully independent values for $x_i$ and $x_j$ across all parents (i.e. each parent has a unique value for $x_i$ and $x_j$). We can consider the relative levels of dissociation for a number of forms of standard crossover by considering the probability that the values are dissociated for different loci within a fixed encoding.

Figure 3.3 Crossover covariance modification example

$(x + dx)(y + dy) + xy - \mu^2 = 2xy + dy\,x + dx\,y + dx\,dy - \mu^2$

Equation 3.2 Covariance contribution of parents

$(x + dx)y + x(y + dy) - \mu^2 = 2xy + dx\,y + dy\,x - \mu^2$

Equation 3.3 Covariance contribution of children

$(2xy + dx\,y + dy\,x - \mu^2) - (2xy + dx\,y + dy\,x + dx\,dy - \mu^2) = -dx\,dy$

Equation 3.4 Covariance modification

Figure 3.3 depicts the effects of an application of two-parent field-based crossover on two fields. The covariance contribution of the original parents may be expressed by Equation 3.2, and the covariance contribution of the children is expressed in Equation 3.3. Thus the change in the covariance of the population that occurs when these children are substituted for the parents may be expressed, as shown in Equation 3.4, as $-dx\,dy$. Since each crossover application produces two children, the effective covariance displacement per child is half of this value, $-dx\,dy/2$. Note that if the parents and children had been reversed in this example, the covariance modification would be $dx\,dy/2$. This result implies that the magnitude of covariance modification depends only on the distance between the parents in each dimension, or equivalently, on the distance between the parents and the slope between the parents. (A small numerical check of this per-child displacement is sketched below.)
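The per-pair result above can be confirmed with a direct numerical check. The fragment below uses hypothetical parent coordinates and treats the population mean as fixed at the origin, as assumed in the derivation.

    x, y = 2.0, 1.0          # parent 1 at (x, y)
    dx, dy = 0.5, 0.8        # parent 2 at (x + dx, y + dy)
    mu_x, mu_y = 0.0, 0.0    # population mean assumed fixed at the origin

    def contrib(px, py):
        # covariance contribution of a single point about the fixed mean
        return (px - mu_x) * (py - mu_y)

    parents  = contrib(x, y) + contrib(x + dx, y + dy)
    children = contrib(x + dx, y) + contrib(x, y + dy)   # field values exchanged

    print(children - parents, -dx * dy)    # both evaluate to -0.4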
Thus, the level of covariance loss is independent of the actual placement of the parents relative to the origin. By substituting the slope formula and the Pythagorean identity into Equation 3.4, we can translate this formula into slope-distance form, as given in Equation 3.5.

$\frac{dx\,dy}{2} = \frac{d^2 m}{2(1 + m^2)}$, where d and m are the distance and slope between the parents respectively

Equation 3.5 Magnitude of covariance modification, slope-distance form

Therefore, the relative magnitude of the covariance disturbance as a function of the slope, for equal distance between parents, is $m/(1 + m^2)$. This relationship forms a sigmoidal curve, depicted graphically in Figure 3.4 for d = 1. The relationship has the interesting properties of reflectivity about the origin, $f(-x) = -f(x)$, and symmetry between reciprocals, $f(1/x) = f(x)$. Note that if the slope is constant for a given parent distribution, we can treat the factor $m/(1 + m^2)$ as a constant; therefore, we can now compare the relative magnitudes of covariance loss of similar distributions with different alignments to the axes of encoding. For example, consider the relative magnitude of covariance loss for a fully dissociative operator operating on a population as in Figure 3.2 with slope $m = \tan(\pi/8)$, as compared to the maximal covariance loss when m = 1. The result shows that the relative covariance disruption is still 70% of the maximum. Moreover, since the covariance disruption is greater than this level for all slopes between $\tan(\pi/8)$ and $\tan(3\pi/8)$, as well as between the negated slopes $-\tan(\pi/8)$ and $-\tan(3\pi/8)$, the effective covariance loss is at least 70% of the maximum for half of the possible slopes (the shaded areas in Figure 3.5).

Figure 3.4 Relative covariance disturbance as a factor of the slope between parents

Figure 3.5 Region of covariance loss ≥ 70%

We can calculate the expected value of the magnitude of covariance loss, as a percentage of the maximum, given a uniform distribution of the possible slope (angle from the origin) by substituting $m = \tan(a)$ into the relative-magnitude factor from Equation 3.5, weighting by the uniform density $2/\pi$ of the angle a over $(0, \pi/2]$, doubling the result to express it relative to the maximum value of 1/2, and integrating from 0 to $\pi/2$, as shown in Equation 3.6. The expected level of covariance modification is therefore approximately 63% of the maximum.

$2 \cdot \frac{2}{\pi}\int_{0}^{\pi/2} \frac{\tan(a)}{1 + \tan^2(a)}\, da = \frac{2}{\pi}$

Equation 3.6 Expected relative magnitude of covariance loss for fully dissociative operators

Equation 3.6 supports our earlier intuitive assessment that the magnitude of covariance modification is related to the slope between the parent solutions. From Equation 3.5, we can now estimate the magnitude of covariance loss in the situations depicted in Figure 3.2. First, we need to find the distribution of the difference between two samples from a uniform distribution. It can be shown that Equation 3.7 is the probability density function of the quantity $|s_1 - s_2|$, where $s_1, s_2 \in U(a, a + w)$. This distribution is depicted in Figure 3.6 for w = 2.

$\frac{2}{w}\left(1 - \frac{|s_1 - s_2|}{w}\right)$, where $s_1, s_2 \in U(a, a + w)$

Equation 3.7 Pdf for the expected difference between two uniform samples

Figure 3.6 Expected difference of two samples from U(a, a+2)

From Equation 3.7 we can now estimate the expected value of $d^2$ needed in Equation 3.5 by substituting the variable x for the quantity $|s_1 - s_2|$ in Equation 3.7, and then integrating this expression multiplied by $x^2$ from 0 to w, resulting in the expression shown in Equation 3.8.

$\int_{0}^{w} x^2\,\frac{2}{w}\left(1 - \frac{x}{w}\right)\, dx = \frac{1}{6}w^2$, where $x = |s_1 - s_2|$, $s_1, s_2 \in U(a, a + w)$

Equation 3.8 Expected value of $d^2$ for a uniform distribution of width w

Substituting this result into Equation 3.5 produces the final result given in Equation 3.9. Note that to calculate the level of covariance loss, we now only need the slope of the line segment and the width of the distribution along that line segment. (A Monte Carlo check of this result is sketched below.)

$\frac{w^2 m}{12(1 + m^2)}$

Equation 3.9 Expected level of covariance disruption per child for a uniformly distributed population along a line segment with slope m and length w
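Equation 3.9 can be checked empirically. The following Monte Carlo sketch samples parent pairs uniformly along a segment of length w and slope m (both values chosen arbitrarily for the example) and compares the average per-child covariance change from Equation 3.5 with the closed form.

    import numpy as np

    rng = np.random.default_rng(1)
    w, m = 2.0, 0.5                     # segment length and slope (hypothetical values)
    theta = np.arctan(m)

    s1 = rng.uniform(0.0, w, size=200_000)
    s2 = rng.uniform(0.0, w, size=200_000)
    d  = s1 - s2                        # signed distance between two parents along the segment
    dx, dy = d * np.cos(theta), d * np.sin(theta)

    per_child_loss = (dx * dy) / 2.0    # Equation 3.5, per child
    print(per_child_loss.mean())        # Monte Carlo estimate
    print(w**2 * m / (12 * (1 + m**2))) # Equation 3.9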
The result in Equation 3.9 is of limited value, since it only applies to uniform parent distributions along a narrowly focused area, which may be expected to be somewhat rare under actual search conditions. However, we can extend this result to consider other properties of covariance loss under crossover. Consider a population which is aligned similarly to that in Figure 3.2c, and still strongly covariant, but which is distributed normally about the mean rather than uniformly. Figure 3.7 illustrates a sample discrete distribution matching this description. A point of interest would be to determine whether the centralization of the distribution decreases or increases the covariance modification, and to what extent. Intuitively, we would expect the level of covariance loss to decrease, given that the parents are more tightly clustered about the center of the distribution.

Figure 3.7 Normal distribution along a covariant line segment

It would be useful to extend this form of analysis even further, to include arbitrary relationships between two parameters in the post-breeding selection pool. This is possible provided that we are able to compute the joint probability distribution of d and m between members of the post-breeding selection pool, and either the probability distribution of d or of m (or both). If we have such information for continuous parent distributions, we can substitute the probability functions into Equation 3.5 and integrate across all possible m and all possible d. For discrete distributions, we can directly measure the average contribution for each possible two-parent combination (including two copies of the same parent, if the selection mechanism permits that). A direct computation of this kind is sketched below.
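For a small discrete pool, such a direct measurement is straightforward. The fragment below averages the per-child covariance modification ($-dx\,dy/2$, from Equations 3.4 and 3.5) over every ordered two-parent combination of a hypothetical four-point pool, including self-pairings.

    import itertools
    import numpy as np

    # A small discrete parent pool (hypothetical points in two dimensions).
    parents = np.array([[0.0, 0.1], [0.4, 0.5], [0.9, 0.8], [1.0, 1.2]])

    # Average the per-child covariance modification over every ordered
    # two-parent combination, including self-pairings (which contribute zero).
    losses = []
    for p, q in itertools.product(parents, repeat=2):
        dx, dy = q[0] - p[0], q[1] - p[1]
        losses.append(-dx * dy / 2.0)

    print(np.mean(losses))   # expected covariance change per child for this pool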
3.1.4.2 BLX-α Covariance Disturbance

In order to calculate the relative magnitude of covariance disturbance produced by BLX-α from a given parent distribution, we will make a few simplifying assumptions. First, we assume that the mean of the current population is at the origin, thus alleviating the need to normalize the covariance contributions. Further, we assume that the mean of the population remains at the origin after operator application. This is a reasonable approximation, since the distribution for standard BLX-α is symmetric along the axes, so for reasonably large populations the expected mean movement is minor. In this analysis, we will again address an example of two parents selected from a uniform linear distribution in two dimensions, similar to the situations depicted in Figure 3.2. We assign one parent the coordinates (a, b) and the second the coordinates (a+v, b+w), as illustrated in Figure 3.8. The child solutions will be selected uniformly from the grey shaded region. We can easily determine the covariance contribution of the parents as $ab + (a+v)(b+w)$, since we are assuming the population mean is zero in both dimensions.

Figure 3.8 Search distribution of a BLX-α operator

Determining the potential covariance contribution of a child solution toward the covariance of the next generation is a more difficult computation, since its position is not fixed. The contribution can be determined by finding the average expected covariance contribution for a child. To accomplish this, we need to integrate the expression xy over the given limits and average the result by dividing by the total area being integrated (vw). The result of this calculation is given in Equation 3.10.

$\frac{1}{vw}\int_{x=a}^{a+v}\int_{y=b}^{b+w} xy \; dy\, dx = \frac{1}{vw}\left(abvw + \frac{a v w^2}{2} + \frac{b w v^2}{2} + \frac{v^2 w^2}{4}\right) = \left(a + \frac{v}{2}\right)\left(b + \frac{w}{2}\right)$

Equation 3.10 Expected covariance contribution of a child produced by BLX-α

$\left(a + \frac{v}{2}\right)\left(b + \frac{w}{2}\right) - \frac{ab + (a+v)(b+w)}{2} = -\frac{vw}{4}$

Equation 3.11 Expected magnitude of covariance loss produced by BLX-α

To find the expected difference in the covariance of the population when a child replaces one of the parents, we subtract out the average of the parent covariance contributions and arrive at the expression given in Equation 3.11. In our earlier analysis of standard crossover, the distance between the two parents along each dimension was labeled dx and dy; therefore, if we substitute v = dx and w = dy into Equation 3.11 (which would be equivalent to BLX-0.0), we get $-\frac{dx\,dy}{4}$, which is exactly equal to Equation 3.5 divided by 2. We conclude that fully dissociative BLX-0.0 loses one half of the covariance that a fully dissociative standard crossover does in this situation.

Since the constant probability factor in the integral in Equation 3.10 and the ranges between the limits are both relative to the width of the distribution of x and y, the result is the same regardless of the selected widths, and therefore the result is independent of α. Further, since the average covariance contribution of the parents remains constant, Equation 3.11 is correct for all values of α. Therefore, the magnitude of covariance loss remains constant for all values of α. The function of α is apparently that of a scaling value, in that the shape of the distribution (as characterized by the level of covariance) remains constant, but the total variance in the child population increases or decreases proportionally with α.

It is important to note that this result is extremely limited, as it only applies to the covariance modification during a single operator application to a single pair of parents in two dimensions. These results are likely to change when situations with higher dimensionality are considered. Further, in order to predict the overall level of covariance loss for an entire single generation with multiple applications of a given operator, we would need to determine the expected distribution of distances between the parents.

3.1.4.3 Significance of covariance preservation

Many systems tend to ignore covariance altogether, both in terms of landscape-relative indications and in terms of general search mechanisms. What significance, if any, is there to the relative relationship between variables in a search landscape? What are the limitations on the information we can gather, and what are its relative value and cost? There are two potential sources for covariant tendencies among parameters.
These are spurious alignment and selective pressure. Like other population statistics, we may assume that both factors are normally present, and the degree to which either dominates is related to numerous factors, including the population size, local landscape conditions, etc. Therefore, some portion of the covariance of parameter values within the population is likely directly due to selection, in the same manner that some portion of the survivors of selection are likely to inhabit locally (even possibly globally) fruitful search areas. This latter assumption is seldom questioned; however, many EC systems are quick to discard other potential information resulting from selection. The level of information available in the population covariance is likely more strongly tied to the population size than other population statistics, such as allele diversity, thereby requiring larger populations. The required calculations are relatively expensive when compared with simpler recombinative and mutative operators, but are reasonably low when compared with most real-world application fitness function costs. A more theoretical cost factor is related to the NFL theorems. By making assumptions about the validity of the covariance of a given population in terms of directing further search, NFL implies that such a system should be stronger where such an assumption holds true, and necessarily weaker where it does not. Thus, it is possible that use and preservation of covariance information causes a loss of generality; however, the same argument can be applied to all EC systems in comparison to more general systems such as random sampling. The potential benefits of using such information would include more efficient and effective search, and invariance to rotation and diversity preservation in covariant landscapes. -l48- 3.2 Local Statistics and Homeomorphlc Encoding Invariance One of the central contentions put forth earlier is that evolutionary search operators and thereby evolutionary search systems should operate consistently relative to local landscape features regardless of other details. Specifically, any method which encodes a given landscape and which does not disturb the relative scale and distances between search points should produce equivalent results. Encodings such as shifting to logarithmic encoding or from Cartesian to polar coordinates may be discounted since the effects of these transformations modify relative local landscape features; effectively creating a new, if related, search landscape. At first one might consider that an ideal search system would indeed allow for all such transformations, or indeed would perhaps seek them out in order to facilitate search. However, using NFL as an analysis tool, it becomes clear that such a system is not a feasible reality. Since any possible search space may be mapped onto any given single search space given an arbitrarily complex mapping frmction, any system capable of remaining consistent across all such transformations would necessarily have to remain consistent across the space of all possible search problems for a given encoding size. However, the NFL theorems clearly state such a system is not obtainable. An argument may be made that for any given transformation a new landscape has been created — a landscape which remains independent and which cannot (or should not) be effectively analyzed in terms of its relationship to the original. 
However, this argument overextends the intentions of the NFL theorems and reaches conclusions not fully in evidence. The NFL theorems do not require that each problem be treated fully in -149- isolation with no relationship to other problems regardless of similarities. In fact [Wolpert 97] suggests problems should be treated in terms of similarities. Therefore for the set of transformations from which we desire invariant behavior we must select a non-empty (hopefully non-trivial) subset of the set of all possible transformations. These transformations are selected primarily on their generality, that is, the probability that such a transformation is known to occur between various alternate forms of solution encoding. Candidates include any homeomorphic transformation, such as a standard affine transform. Any transformations exhibiting isometry, that is, a continuous transformation that preserves distance, should be included. Common isometric transformations include coordinate translation, linear rescaling, and rotation of coordinate axes. Note that we may consider these transformations to take place in the genotype space, the phenotype space, or in a fitness-relative or other operator-specific manner. Given the number of operators included in this study, the potential differences in genotypic representation, and the naturalness of expression in R" for the class of functions we are studying, we will use the phenotypic distance (distances between the represented points regardless of the form of the actual representation) when evaluating landscape transformations. In the remainder of this subsection we will examine several common types of transformations. Subsection 3.2.1 addresses all forms of linear scaling, 3.2.2 examines rotation of the axis set, and 3.2.3 examines origin translation. -150- 3.2.1 Translation of axes One of the simplest possible transformations to a landscape is translation of the origin. In terms of encoding, this can simply be accomplished by adding or subtracting a linear term to each instance of an encoded parameter x,- within the fitness evaluation (e. g. substituting (xi— 1) for all x,- tenns within the fitness function). Thus, translation is often a byproduct of modification of some of the constants used within the fitness evaluation. Translation appears at first glance to be such a simple transformation that it is tempting simply to dismiss it as trivial. However, it is important to note that the level of precision of certain real valued encodings, namely IEEE floating point, are significantly higher near zero than about other integers (several hundred or thousands of magnitudes greater precision). Further, given that half of most typical floating point encodings cluster between zero and-one, it is certainly possible for careless treatment of the floating- point constituents to achieve strong bias toward the origin. Such biases coupled with the natural tendency of researchers to design encodings and test problems that place known optima near or at the origin may produce an inflated level of performance, which becomes irreproducible on truly unknown landscapes. Such systems tend to degrade quickly under modest translation of these same landscapes. 3.2.2 Linear rescaling There are three progressively more limited subsets of the set of all potential forms of linear rescaling which we will consider here. The most inclusive is asymmetrical linear rescaling along arbitrary vectors. 
The vectors selected may or may not be orthogonal (though any set of non-orthogonal rescalings may be reduced to one or more -151- sets of equivalent rescalings across orthogonal axes). These rescalings may be asymmetrical in that the magnitude of the rescaling may differ across the different vectors. A more limited form is asymmetrical linear rescaling along the original axes of encoding, and the most restrictive form is symmetrical linear rescaling along the original axes of encoding, where the scale factor remains constant across all axes. All three subsets have valid circumstances under which a researcher may hope to achieve equivalent results. In the presence of covariant behavior among the parameters of a search space, it is possible for arbitrary external influences to cause non-axially aligned linear rescalings. For example, suppose a given fitness landscape is defined by the error function: f(x) + g(x) + Cl * ( p; - pi), where p,- and pi are two of the problem parameters, c1 is a constant, g(x) is a non—linear function dependent on both i and j, and fix) is an unknown additional factor in the fitness function which is independent of p,- and 1)}. As c1 increases in value, the landscape is effectively compressed along the vector < I,-1 > , and stretched along the <1,1> vector in the i, j plane. Note that if g(x) were linearly dependent on i and j, then the same compression could be achieved with a pair of linear rescalings along the i and j axes. The magnitude of the diagonal compression/expansion is directly dependent on the value of c;. It is possible that the value of c,- is a fixed value, perhaps even of key importance to the specific problem instance. However, such scaling factors are often selected quite arbitrarily in practice. A search system which is not invariant over asymmetrical linear rescaling of arbitrary -152- vectors places the onus of selecting an optimal value of C] squarely on the shoulders of the researcher, normally without any method for evaluation short of trial and error. The circumstances that make it desirable to have invariant behavior across asymmetric linear rescaling along the original encoding axes are more common. Often a researcher may be uncertain of the actual relative effects of various parameter interactions. Certainly the researcher is almost always blind to pressures caused by localized landscape conditions which may be encountered during the search process. Thus, for a number of search problems the researcher might desire the search process to be insensitive to the relative scaling of the input parameters. Otherwise, simply deciding the relative ordering of the scaling factors which should be applied to the parameter set is on the order of d!, where d is the dimensionality of the problem (i.e. the number of parameters). Note that for most linear rescalings some points in the search landscape may become unreachable due to the relative precision and/or capacity of the underlying encoding. Likewise, some points which were previously unreachable may become reachable after any linear rescaling. While this may appear to increase the effective size of the search space, it is actually the density of the represented points which is increased. Therefore, we would expect rather than prolonging the search this should actually allow a higher level of refinement over time. However, it is not our intent here to evaluate alternative methods of encoding, or to explore the interplay between precision of representation and search behavior. 
Similarly, we may normally expect a level of speed up when the search space is linearly expanded if the initialization space becomes effectively more condensed. -153- However, we will normally assume that the population initialization occurs across the same bounds regardless of and scaling which is employed (i.e. the scaling is employed to the initialization bounds as well). 3.2.3 Rotation of axes Rotation of the axes of encoding can occur when parameters are used in linear recombination within the fitness firnction. Arbitrary rotation may seldom be an accidental result of an attempt to encode a given problem space within an EC system; although certainly some problem domains probably tend toward expression in arbitrary linear combination of parameters (such as solution of linear combinations of equations, etc.) However, the appeal of maintaining invariance across rotation is that the relative distances and local fitness relationships remain largely intact, thereby producing an effectively equivalent landscape in terms of local search characteristics. Given that there is no effective change to the actual problem landscape, only its orientation, this transformation approaches axial translation in its simplicity. Nonetheless, arbitrary free rotation of a problem landscape can drastically effect the performance of numerous search methods [Salmon 1998] [Fogel 1990] [Patton 1999]. This is largely due to the fact that many EC systems and operators do not attempt to preserve or estimate covariance information relative to a given local landscape. Rotation can cause previously independent parameters to be expressed as covariant combinations of axially aligned parameters in the rotated landscape. An arbitrary coordinate rotation is easily achieved by inserting a series of d(d-1)/2 arbitrary rotations (rotating axis 1' and j clockwise while holding all other axes fixed, for each possible paring i and j in a d dimensional space) between the encoded parameters -154- and the fitness evaluation. (I.e., the encoded parameters are rotated to the new coordinate space, and then these coordinates are used in place of the original encoded parameters within the fitness function.) The encoded parameters now receive fitness feedback based on their partial contribution toward one or more (potentially all) of the rotated parameters. This highly covariant situation requires multiple specific simultaneous modifications to simulate a single parameter modification in the natural (unrotated) encoding space. 3.2.4 Underconstrained (Free) Parameters An underconstrained parameter is one which is ignored (relatively or absolutely) during fitness evaluation. Addition of unconstrained parameters is a common issue when attempting to design a problem encoding for a specific problem type. Often the researcher is uncertain what variables are expected to directly or indirectly affect a given quality measure. Therefore, it is often tempting to include as many potential parameters as possible. During actual operation, a number of these parameters may not be used at all during fitness evaluation. A parameter in the encoding that provides no contribution to the fitness generation throughout the search landscape is a free parameter, or equivalently, an absolutely underconstrained parameter. Extending the dimensionality of the problem landscape without modifying the underlying fitness evaluation easily transforms any problem landscape into an underconstrained landscape. 
A second form of underconstraint is covariant underconstraint, which occurs when two or more parameters always operate in linear combination within the fitness landscape. For example, consider a transformation from an arbitrary landscape to a similar one with an additional degree of freedom, where the expression (x,- - x,,+1), is -155- substituted for each occurrence of the parameter x,- in the previous fitness function. Parameters x,- and xn+1 exhibit covariant underconstraint in that there is no fixed value required for either within the landscape, but rather there are an infinite number of equivalent solutions along each vector x,- — xn+1 = c. This situation closely resembles pareto optimal search landscapes, which are known to cause difficulty with many EC systems. We can modify the relative slope of these vectors by introduction of a constant multiplier. This substitution can even find non-linear paths of equivalent solutions by use of a non-linear combination of x,- and an. Further, the equivalence can be extended to hypercubic regions, or arbitrary hypercubic shapes by creation of n-way covariant underconstraints in which three or more parameters are combined. Such forms of underconstraint are seldom intentionally introduced; however, when dealing with unknown fitness landscapes, redundant parameterization is fairly common. Relative underconstraint implies that a parameter may not actually be fiee, but the relative contribution of the parameter is so small when compared to the absolute value of the local fitness landscape that it is effectively underconstrained. Complete relative underconstraint is a product of rounding errors within the level of precision of the representation being used. The parameter impact cannot be discerned, because it is effectively discarded during calculation. General relative underconstraint is closely related to signal-to-noise issues, in that random fluctuations from other parameters may mask feedback from the parameter. Unlike absolute and covariance underconstraint, relative underconstraint tends to vary across the fitness landscape. -156- Relative underconstraint is a common problem in the creation of EC fitness functions. Much of the relative success or failure of a given EC approach may be more related to the form of the fitness function and the levels of relative underconstraint than on the forrrr of the EC system being employed. A simple method to create relative underconstraint is to scale a linear component of the fitness function that is composed from a subset of the encoding parameters by a relatively large (or very small) factor. Likewise, relative underconstraint can be “tuned” by intentional introduction of such constants. Punch [Punch 1991] and Rayrner [Rayrner 2000] explore such modifications using evolved scaling values and masks for evolving various pattern discriminator systems. All forms of underconstraint provide no or little feedback about the quality of values in the underconstrained dimensions. Two common outcomes observed in EC systems working on underconstrained landscapes are loss of diversity and loss of focus. Hitchhiking and genetic drift are both forms of diversity loss and are more likely to occur where parameters are relatively underconstrained. On the other hand, systems with an especially strong mutation component tend to pump nearly limitless energy unchecked into underconstrained parameters. 
Both situations cause difficulty in relative underconstraint situations in that by the time the population reaches areas of the search space where the relative underconstraint has been reduced or eliminated, the population may not have sufficient diversity or may be too wide spread to find an optima. Some EC systems use global landscape information such as the population variance, etc. taken from the current or previous population samplings in order to guide further search directions. Such systems may find highly underconstrained landscapes to -157- present problems in differentiating effective noise in underconstrained dimensions from useful search information. Further, many systems, such as CHC, use convergence measures to estimate completion or other operational modifications. If convergence is measured in genetic or phenotypic terms, underconstraint can short-circuit such systems. Some EC systems depend on the relative ordering of parameter locations within a given encoding. Insertion of unconstrained parameters into these encodings may have negative or positive effects, but in either case definitely modify the degree of linkage between parameters. Some studies have suggested intentional transformations of this sort precisely to bias such parameter linkage aspects.[Forrest 1992] These transformations are modeled after the concept of introns in DNA; although Daida [Daida 1999] points out borrowing such terminology is imprecise at best. 3.2.5 Validity of Local Statistical Extrapolation There are two potential attitudes toward population-level statistics, one is that they are completely arbitrary and meaningless (essentially stochastic noise) at worst and potentially incorrectly biased at best; the other is that they contain useful information which points toward profitable locations for further exploration. To an extent, the majority of evolutionary systems correspond to modifications of population-level statistics in that the actions of evolutionary systems are typically tied to the location of the individuals in the current population. However, individual operators may choose to employ information from a single individual only or additional information from the current population. Multi-sourced operators, those that use information from more than a single solution, are often labeled multi-parent operators. However, multi-parent usually implies that all sources have successfully passed a similar selection phase (e. g. are all part -158- of the post-breeding—selection pool). We will use the broader term cohort-driven operator (CDC) to refer to any operator which samples one or more individuals from the current pre-breeding-selection or post-breeding-selection population for the purposes of extracting additional guidance in selection of firture search directions. Note that cohotr- driven operators may extract information from one or several additional individuals, up to and including the entire population or even from past populations. Also, cohort-driven operators may employ alternate techniques for selection of the cohort for a given operator application, even possibly employing information from both pre—breeding-selection and post-breeding-selection populations or tailoring selections to match currently selected individuals (e.g. selection of “neighboring” points for a given point). Examples of cohort-driven operators include all forms of crossover, BLX-a, and intermediate recombination. 
Most mutation operators are single-sourced, and as such are representative of those operators which choose to ignore further information in the population. There are several potential reasons for ignoring additional cohort-level information. Reducing the potential for bias is the typical motivation for most non-CDO mutation operators. Other motivations for creating single-sourced operators (SSO) include the assumption that the space may be locally noisy, and allowing independence between operator distributions and population distributions (i.e. the assumption that operator distributions may be as well or better determined through alternative means). Examples of single-sourced operators are binary mutation and self-adaptive mutation. Note that, for the most part, since SSO are independent of population-level statistics, they frequently remain neutral in population-level bias tests.

However, we cannot conclude that simply because an operator is single-sourced it will perforce be completely unbiased and independent of specific problem information beyond that represented by the local landscape. The majority of SSO mutation operators are strongly dependent on the presentation of the landscape in terms of the selected axes of encoding. Equivalently, we may say that such operators are sensitive to the degree of covariance expressed in the encoding of the given landscape. Again, the corollaries of the NFL theorems provide a good basis for evaluation of the potential benefits of CDO versus SSO. CDO typically exhibit certain biases in the way in which they select distributions of search points from existing points. To the degree that the biases of an operator align with a given landscape, a system using that operator is likely to perform better. Conversely, when the operator bias does not match the given landscape, a system using less biased SSO is more likely to obtain better performance.

3.3 Empirical Analysis

The following tests are proposed for measurement of mean modification, variance modification, covariance modification, and center tending by reproductive operators. Each test is performed by applying an EC operator in a single pass to one of the standard "test case" distributions specified in section 3.4.6, and comparing the measured statistical difference between the sampled test case points and the offspring produced by the operator. To completely characterize each operator, it would be necessary to obtain several measurements and estimate the probability distribution for each characterizing statistic. For our purposes here, we will simply compute the mean, variance, min, and max for each statistic over a number of measurements, rather than attempt to graphically display the histogram of each probability distribution. Note that since we are primarily interested in functions with real domains, all of these measurements will be geometrically interpreted (i.e. in phenotypic space). However, similar analysis is possible in terms of genomic or even fitness-relative measures.

3.3.1 Statistical Tests

The following statistical measures are designed to allow characterization of the distribution induced by various operators. These measures are natural extensions of the statistical measures discussed in section 3.1.

3.3.1.1 Mean Modification

This statistic is quite simply the distance between the center of the geometric locations of all solutions selected for breeding and the center of all solutions produced via the given operator. In mathematical terms, given solution sets P, representing the selected parent solutions (including duplications), and C, representing the set of all children produced, the total mean modification is given as

$\sqrt{\sum_{j}\left(\frac{\sum_{i} P_{i,j}}{\rho} - \frac{\sum_{k} C_{k,j}}{\lambda}\right)^{2}}$,

where $P_{i,j}$ and $C_{k,j}$ represent the jth real component of the ith parent and kth child vectors respectively, $\rho$ is the total number of parents sampled, and $\lambda$ is the total number of children produced. Alternately, the same formula may be expressed as $\sqrt{\sum_{j}\big(E(P_{x,j}) - E(C_{y,j})\big)^{2}}$, where x and y are uniformly selected variables in the ranges $[0, \rho)$ and $[0, \lambda)$ respectively.
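A direct implementation of this statistic is straightforward; the following sketch assumes the parent and child pools are given as real-valued arrays, with the example pools chosen purely for illustration.

    import numpy as np

    def mean_modification(parents, children):
        """Euclidean distance between the centroid of the selected parents
        (duplicates included) and the centroid of the produced children."""
        parents, children = np.asarray(parents), np.asarray(children)
        return np.linalg.norm(parents.mean(axis=0) - children.mean(axis=0))

    # Example with hypothetical parent / child pools in three dimensions.
    P = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
    C = np.array([[0.5, 0.5, 0.5], [1.5, 0.5, 0.5]])
    print(mean_modification(P, C))   # 0.5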
In mathematical terms, given solution sets p5, representing the selected parent solutions (including duplications), and c,- representing the set of all children produced, the total mean modification is given as: 2 Pi, j Ck,j y — A 9 where Pi,j and Ck; represent the jth real j i k component of the ith parent, and kth child vectors, respectively, and p is the total number of parents sampled, and A is the total number of children produced. Altemately, the same formula may be expressed as: ~161- \[Z((E(Px,j)- E(Cy,j V), where x and y are uniformly selected variables in the ranges [0, p) , and [0, ,1) respectively. 3.3.1.2 Variance Modification The proposed variance modification statistic is also relatively simple. As with the mean modification, we are interested in the change of the total population variance between the set of all selected parent solutions and the set of all produced solutions for a given operator. However, there are two alternate candidates for this measurement. The . . . 2 2 . Simplest rs to compute the total variance for each set as a p , and a' c, respectively and then report a'p - ac. This formula will be labeled the global variance modification (GVM). Altemately, given the vectors of variances across each axis of encoding, we can 21%.)“ Tap,j)2 calculate the variance modification as: d I d , where vp, ,- represents the . th . . . . . variance of the parent pool along the j axrs and likewrse, vc, ,- rs the variance of the children along the same axis. This measure will be denoted as the axial variance modification (AVM). Note that the AVM will always be at least as large in magnitude as the GVM; however, the GVM will ignore variance modifications that occur due to variance shift between dimensions. Operators that maintain overall population variance while shifting that variance among dimensions will be characterized by a large difference between the GVM and AVM measures. Note that the AVM is a limited form of covariance modification -l62- measure as well, since such operators much necessarily realign the covariances present in the two sets. 3.3.1.3 Covariance Modification As with variance modification, there are two possible methods of measuring the covariance modification: globally, and per instance. The covariance modification measure should ideally be invariant to mean shifting and variance modification if the underlying covariance relationships (i.e. the general shape of the distribution as characterized by the covariance measures) are maintained. Thus, we require a normalized instance of the covariance matrices for the parent and child sets as input for this metric. In order to compute this statistic, we first need to compute the covariance matrix for both the source population pool and the pool of produced solutions. Each element of the covariance matrix can be determined by the equation: it . th ci,j = 2(xi, k - p,- Xx j, k - ,u j ), where xi, k represents the t parameter of the k M k=0 . . , tlt . . sample in a pool, and ,u ,- rs the mean of the l parameter. Next, each covariance matrix is normalized by dividing each entry c 131' where i at j, by the product of the square roots of Cl, J. -——————————,i¢j,ci,,- ¢0’cj,j $0. In «Chi ticj,j the case, where c i, ,- = 0, or cid- = 0, we can effectively ignore, since all covariances in entries c i, i» and cl; j, unless cs, ,- = 0. That is, these rows and columns should be zero as well and therefore have no contribution to this metric. 
This procedure effectively normalizes the covariances relative to the individual variable variances. From this normalized matrix, we can calculate the level of covariance disruption without regard to mean modification or uniform rescaling. As with the variance modification measure, there are two possible methods of measuring the total covariance disruption: globally and pairwise. However, since we are explicitly interested in the form and alignment of the population shape when evaluating covariance modification, the global form of this statistic does not provide much information (except to demonstrate a general tendency toward overall reduction or increase in covariance, which the pairwise statistic will also demonstrate). The resulting measure of covariance modification is taken to be the average of the squares of the pairwise differences between the upper-triangular entries of the normalized parent and child sample covariance matrices. More concisely:

CMod = \frac{2 \sum_{i=1}^{d-1} \sum_{j=i+1}^{d} \left( c^{(p)}_{i,j} - c^{(c)}_{i,j} \right)^2}{d(d-1)},

where d is the dimensionality of the problem, c^{(p)}_{i,j} is the entry in row i and column j of the normalized covariance matrix calculated from the parent samples, and c^{(c)}_{i,j} is the corresponding entry in the normalized covariance matrix from the child pool. Since the number of components in the calculation grows quadratically, rather than linearly, with the size of the problem space, this metric is normalized as a function of the dimensionality.

3.3.1.4 Center Tending

In order to measure the degree of center tending, the following measure is proposed, which is similar in nature to the Pearson skewness coefficient, with the exception that the third moment is not used. First, we calculate the average and median distance from the center for both the set of selected parents and the set of child solutions. Given these distributions of distances from the population center, consider their averages \mu_p and \mu_c, medians m_p and m_c, and standard deviations \sigma_p and \sigma_c. Calculate the measure of center tending, CTM, as:

CTM = \frac{\mu_p - m_p}{\sigma_p} - \frac{\mu_c - m_c}{\sigma_c}.

For the case where \sigma_p or \sigma_c becomes 0, the corresponding term may be considered to be zero (since the associated difference between the mean and median values will also be zero). Note that this formula normalizes the measure from each set individually, thereby ignoring modifications to the variance that do not also modify the shape of the population distribution. Center-tending operators will exhibit a negative value on this statistic, while center-avoiding operators will exhibit positive values. A center-neutral operator (e.g. spherical normal mutation) will still tend to exhibit a positive value in multidimensional domains simply due to the volumetric differentials. If we consider an n-dimensional hypersphere (e.g. a spherical uniform mutation) with its center intersecting the surface of a second hypersphere (the hypersphere of all points distance d and less from the center of the set of selected parents), the volume of intersection with the second hypersphere will always be less than the volume which does not intersect. This is true for all multi-dimensional situations, and the effect increases as the dimensionality increases (even though the effective volume approaches zero as the dimensionality increases). The ratio of the volume of intersection to the volume of the initial hypersphere (e.g. the mutation distribution) approaches 0.5 as the radius of the initial hypersphere approaches zero.
As the radius shrinks, the interface of the intersection appears less curved and begins to approximate a linear boundary. It is possible to simulate volumetric normalization on a distribution of distance measures given the dimensionality of the domain; however, this adds a heuristic component to an otherwise deterministic measure without known benefit. The unnormalized instance of this statistic correctly reflects the probability of an operator moving toward, or away from, the center in terms of distribution modification. Additionally, the value of this statistic for known center-neutral operators can be used as a baseline for comparison in n-dimensional domains. A volumetrically normalized version of this measure might have an advantage in that it would be more sensitive to shifts toward the center than to shifts outward when compared to the unnormalized instance.

3.3.2 Statistical Test Cases

There is an unlimited number of possible distributions that could be used as the source parent distributions for these statistical tests. In the vein of the NFL theorems, we intend to limit our selections to analogs of common localized situations. Hopefully, the measurement of the effects of a given operator on these test distributions may give some insight as to the expected outcome of using that operator on a search landscape which presents similar distributions. Note that each of these distributions is homeomorphic to a symmetrical distribution (that is, no non-linear warping is represented).

3.3.2.1 Unidimensional Uniform Aligned

The simplest test distribution, also arguably the most artificial, is a uniform sampling along a one-dimensional unit vector which is aligned with one of the axes of encoding and remains perpendicular to the remaining encoding axes. An example of such a distribution would be all samples of the form {c_1, c_2, c_3, u_i, c_5}, where c_1, c_2, c_3, and c_5 are all arbitrarily selected constants, u_i \in U(c_4, c_4 + 1), and U(a, b) represents a uniform sample taken from the range (a, b). Note that although the distribution itself is unidimensional, it is cast within an n-dimensional encoding space. This allows for observation of covariance modification. Therefore, this distribution is characterized by n+2 parameters: the number of dimensions, n, the set of constants, c_i, and which dimension is selected for uniform sampling. For consistency, we select c_i = 2(i mod 5) + 5, and we assume the first dimension is the one sampled, allowing us to characterize the distribution with the single parameter, n. Note that the variable values for this test case are completely independent.

3.3.2.2 Unidimensional Uniform Rotated

The second distribution we choose to test is identical to the first, with the exception that the unit vector is rotated in n-dimensional space. This causes the sample to become highly covariant with strongly dependent variables. An arbitrary rotation is applied by n(n-1)/2 rotations between each pair of axes (requiring n(n-1)/2 angles selected uniformly from (0, 2\pi]). These rotations may be collected together by multiplication of the given rotation matrices, resulting in a normalized rotation matrix (i.e. one that rotates without rescaling). The previous unit distribution can be sampled, and each sample can then be rotated by multiplication with this rotation matrix.
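A minimal Python/NumPy sketch of this construction (illustrative only, not taken from the dissertation; the function name, the example dimensionality, and the sample size are assumptions) composes the pairwise plane rotations into a single rotation matrix and applies it to each sample:

```python
import numpy as np

def pairwise_rotation_matrix(n, angles):
    """Compose a plane (Givens) rotation for each pair of axes into one
    n x n rotation matrix.  `angles` maps an axis pair (i, j) to its angle."""
    r = np.eye(n)
    for (i, j), theta in angles.items():
        g = np.eye(n)                       # rotation in the (i, j) plane
        g[i, i] = g[j, j] = np.cos(theta)
        g[i, j] = -np.sin(theta)
        g[j, i] = np.sin(theta)
        r = g @ r
    return r

# Example: rotate 100 samples of an axis-aligned unidimensional uniform
# distribution in 3 dimensions (constants follow c_i = 2(i mod 5) + 5).
rng = np.random.default_rng(0)
n = 3
angles = {(i, j): rng.uniform(0.0, 2.0 * np.pi)
          for i in range(n) for j in range(i + 1, n)}
aligned = np.column_stack([rng.uniform(7.0, 8.0, 100),    # sampled dimension
                           np.full(100, 9.0),             # fixed constants
                           np.full(100, 11.0)])
rotated = aligned @ pairwise_rotation_matrix(n, angles).T  # rotate each sample
```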
Note that this new distribution is located in a different locality than the matching uniform aligned distribution, since the distribution is rotated about the origin, not the center of the aligned distribution. While the use of randomly sampled rotation is ideal, it requires the average of multiple sampling measurements to obtain a reasonable estimate for the given statistical analyses. Further, this statistic becomes much less stable and reproducible without large levels of sampling. These difficulties increase rapidly with the dimension of the encoding space, n. For these reasons, we will fix the rotations for this test case to sequential 45° (i.e. \pi/4 radian) rotations between successive dimensions. That is, for each integer i \in {1, ..., n-1}, rotate dimension i by 45° in the direction of dimension i+1. This rotation results in uniform maximal covariance among the variables of the resulting rotated distribution.

3.3.2.3 Unidimensional Normal Aligned

Flat uniform distributions are expected to be somewhat rare in evolutionary computation, especially since positive selection pressure tends to cause more normally distributed points in localized hill-climbing situations. Therefore, a slightly less artificial distribution might be a normally distributed unidimensional sample. As with the unidimensional uniform aligned distribution, each example of this distribution will be samples of the form {c_1, c_2, c_3, u_i, c_5}, where c_1, c_2, c_3, and c_5 are all arbitrarily selected constants, u_i \in N(c_4, d), and N(a, b) represents a normal sample with mean a and variance b. Again, we will reduce the number of free parameters by arbitrarily selecting d = 1, c_i = 2(i mod 5) + 5, and the first dimension as the sampled dimension. Since the normal distribution is more centralized (and therefore is a closer approximation to a single-point distribution), we expect most statistical effects to be less pronounced with this distribution than with the unidimensional uniform aligned distribution. Note that since the normal distribution is an infinite distribution, there is no range limit on this distribution; however, for all practical purposes, an arbitrary range limit of 6d from the center of the distribution should not cause significant loss.

3.3.2.4 Unidimensional Normal Rotated

Given that we have a rotated version of the unidimensional uniform distribution, it seems natural to investigate a rotated version of the normal revision of this distribution as well. As with the rotated uniform unidimensional distribution, while random rotation would be most general, we will limit our usage here to successive 45° rotations.

3.3.2.5 N-dimensional hypersphere surface

An ideal distribution for determining center tending is one that consists of all points along the surface of an n-dimensional hypersphere. This distribution may also be viewed as the collection of all points exactly distance d from the center point of the hypersphere. This distribution is characterized by n + 2 parameters: n, d, and the center point coordinates, c_i. Again, in the interest of reducing irrelevant parameterization, we arbitrarily select d = 1 and c_i = 2(i mod 5) + 5. This is certainly an artificial distribution, in that its production as a population distribution within an EC framework would be quite unusual; however, it provides good insight into the hill-climbing/hill-descending tendencies of operators.
As this distribution is symmetrical, we would not expect the addition of arbitrary rotation to enhance our understanding of the operator statistics. Therefore, there is no associated rotated version of this distribution. In order to create unbiased sampling, each point is selected by arbitrary rotation (using n(n-1)/2 two-dimensional uniform random rotations) of a fixed point on the surface (e.g. the point at distance d from the center along the first axis of encoding) about the center point. This produces the least biased sampling along the surface. Other standard techniques, such as selecting points within the hypercube bounding the hypersphere and rescaling the resulting vector from the center of the hypersphere to unit length, cause bias toward the corners of the hypercube. Even if we modify this algorithm to discard any initial points outside the sphere (i.e. only allow positive rescaling of the hypercube samples), the difference in the level of representational density causes a slight bias which is not apparent in the rotation method.

3.3.2.6 N-dimensional uniform hypersphere

Given the distribution consisting of all points on the surface of a given hypersphere, a natural extension is to include the distribution consisting of all points within the hypersphere. This distribution selects points uniformly at random from the interior of a unit hypersphere centered on the coordinates c_i. This test distribution allows us to examine hill-climbing tendencies in a less artificial setting. The lack of bias toward the center should allow operators with strong center-tending and center-avoiding tendencies to be readily apparent. The least biased method for sampling this distribution is to produce arbitrary points within the hypercube bounding the hypersphere and discard points which are greater than d from the center of the hypersphere. However, this method becomes exponentially slow as n increases, due to the increase in the number of points found outside of the hypersphere. A more computationally tractable method is to use random rotations and set the length of the vector equal to l = \sqrt{\rho}, where \rho is a uniform sample from the range [0, 1). To reduce parameterization of this distribution, we assume that d = 1.

3.3.2.7 N-dimensional normal distribution

This distribution most naturally depicts a standard hill-climbing situation, in that the points are strongly clustered about a center point with the density tapering off as a function of the distance from this central point. This distribution is simple to produce, requiring n samples from the normal unit distribution, N(0, 1). Each sample may then be described as {c_1 + n_1, c_2 + n_2, ..., c_n + n_n}, where n_i is an independent sample from the N(0, 1) distribution. This is equivalent to the set of samples of the form {s_1, s_2, ..., s_n}, where s_i \in N(c_i, 1). This distribution is characterized by n+1 parameters: n, and the center point coordinates c_i, for all integers i \in {1, ..., n}. In order to reduce extraneous parameterization, we will again choose c_i = 2(i mod 5) + 5, thereby allowing characterization of this distribution with the number of dimensions, n, alone. This distribution is symmetric and therefore should be invariant under rotation.

3.3.2.8 N-dimensional ring distribution

As an EC system begins to converge on a symmetric optimum, a possible expected population distribution is to have a few solutions near the optimum, with increasing density toward a given distance d, followed by decreasing density beyond that point. This distribution often arises as follows.
First, a point is located near distance d from the local optimum randomly during the search process. Suppose that this point is the first sample from within the attraction basin of the given local optimum. Assuming this is a favorable point, the EC system will begin to search in the local neighborhood of the landscape. Large jumps have a much higher probability of failure than smaller ones, so soon we have several points in a normal distribution about the initial point. If we assume that the fitness isobars surrounding the local optimum are convex (e.g. circles centered on the optimum), then selection is likely to begin biasing this distribution along these isobars. Thus, over time, we may develop a semi-ring-shaped density where the distance from the center is roughly normal in shape. To produce this distribution, we begin with the n-dimensional hypersphere surface and add a symmetric n-dimensional normal sample. In order to achieve the desired distribution, the variance of the normal sample must be small relative to the radius of the hypersphere; here the standard deviation of the normal sample is set to r/3.5, where r is the radius of the hypersphere. As before, we assume r = 1 to reduce extraneous parameterization.

3.3.2.9 N-dimensional normal ellipsoid rotated

All of the multidimensional distributions outlined to this point are symmetric. In order to produce an effective non-symmetric distribution, we can consider a scaled version of the n-dimensional normal distribution. This can be accomplished by scaling each dimension arbitrarily by s_i (or, equivalently, setting the variance to s_i), where s_i is a uniform sample from the range [1/n, n). In order to reduce the stochastic nature of this test and to eliminate additional parameterization, we will fix s_i = i for each dimension i \in {1, ..., n}. In order to study the effects of this asymmetry fully, we choose to study this ellipsoid under rotation. While we can define the general test case in terms of arbitrary rotation, again to reduce stochastic effects and excess parameterization we arbitrarily select the same rotation used with the rotated unidimensional uniform distribution.

3.3.2.10 N-dimensional skewed rotated

While the previous distribution shows some level of asymmetry in terms of the distributional width along various cross-sections, that distribution is still symmetric across the axes of the distribution. In order to create asymmetry across individual axes, we choose to scale half of the ellipsoidal distribution of the previous test case by a constant term b. This distribution can be computed by scaling all normal samples before rotation. For example, if we select b = 0.5, we can first create an n-dimensional sample from the non-rotated ellipsoid as {c_1, c_2, ..., c_n}, where c_i is a sample from the normal distribution with variance i. Next, we rescale all negative components by multiplication by b. Finally, we apply the required rotation. This creates a distribution that is foreshortened in each rotated dimension.

3.3.3 Example Statistical Analysis

To demonstrate these empirical tests and the forms of differentiation possible through them, we will evaluate a number of methods for producing new distributions. First, we focus on simple mutative operators that add a random sample from a given distribution with a fixed variance. The distributions evaluated are uniform, normal, Cauchy, and log-uniform.
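As a point of reference for the operator descriptions that follow (the exact operators are specified in sections 3.3.3.1 through 3.3.3.4 below), a minimal Python/NumPy sketch of drawing such fixed-variance mutative samples might look like the following; the function names are illustrative and are not taken from the dissertation:

```python
import numpy as np

rng = np.random.default_rng(0)

def fixed_uniform(x):
    """Add a zero-mean uniform sample with variance 1 to every component."""
    half_width = np.sqrt(12) / 2     # (b - a)^2 / 12 = 1  =>  b - a = sqrt(12)
    return x + rng.uniform(-half_width, half_width, size=x.shape)

def fixed_normal(x):
    """Add an independent N(0, 1) sample to every component."""
    return x + rng.standard_normal(x.shape)

def fixed_cauchy(x):
    """Add a Cauchy(0, 1) sample to every component (infinite variance)."""
    return x + rng.standard_cauchy(x.shape)
```

The log-uniform variant would be constructed analogously from the sampling algorithm given in section 3.3.3.4.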
Since numerous algorithms choose to simulate recombination by sampling points distributed symmetrically about the mean of two or more parents, we will simulate these fixed-variance mutative operators centered both on a single parent and on the center of two parents. Second, we will evaluate three linear forms of recombination: averaging, linear, and extended linear. Two forms of dominant crossover, the two-parent (a.k.a. uniform crossover) and n-parent versions, are also evaluated. PC crossover is similar in form to uniform crossover; however, the axes of application are determined by a sample of the population. The BLX-0.5, SPX, and PC Gaussian operators use position information from two or more parents to determine the search distribution. Also, we evaluate a modification to the BLX-0.5 algorithm that is parent centered, rather than mean centered. Details as to the implementation of each of these operators in these tests are provided in the following sections.

3.3.3.1 Fixed Uniform

For this operator, an independent uniform random sample is added to each component of each solution. The range of the uniform sample is [-\sqrt{12}/2, \sqrt{12}/2], which produces a zero-mean sample with a variance of 1. Thus, the expected total variance addition is dependent on the number of dimensions. For the mean centered version of the operator, the center of mass of two solutions is calculated, and two child solutions are produced by adding two independent sample sets to this center of mass.

3.3.3.2 Fixed Normal

For this operator, an independent normal random sample is added to each component of each solution. The normal distribution sampled is that with a zero mean and a variance of 1. Thus, the expected total variance added is dependent on the number of dimensions. As with the fixed uniform operator, the mean centered version of this operator produces two children by adding two independent normal sample sets to the components of the average of two parent solutions.

3.3.3.3 Fixed Cauchy

This operator functions in the same manner as the fixed uniform and fixed normal operators above, except that the samples are drawn from a Cauchy distribution with mean 0 and "width" of 1. Note that the variance of the Cauchy distribution is infinite.

3.3.3.4 Fixed Log-Uniform

This operator adds samples from the log-uniform distribution across all dimensions. The log-uniform distribution is defined as a distribution that is uniform in the distribution of the log_b of the samples for some integer radix b (b > 1). (Since all logarithms are related by a constant multiplicative factor, the selection of b matters only relative to the width of the uniform distribution.) For this operator, we choose the sample using the following algorithm:
1. Select two random uniform samples, s_1 and s_2, on the range (0, b).
2. Calculate l_u = log_b(s_1).
3. If s_2 falls in the lower half of its range, multiply l_u by -1.
Note that the maximum value in this distribution is 1. In order to modify the relative variance to approximate 1 across each dimension, we also multiply each sample by the constant value 9.12.

3.3.3.5 Averaging

This operator simply replaces the parent solutions with a solution that is the center of mass, or mean, of two parent solutions. In order to facilitate computation, we select three parent solutions, p_1, p_2, and p_3, and create three child solutions as the means of p_1 and p_2, p_2 and p_3, and p_1 and p_3.

3.3.3.6 Linear

This operator produces two child solutions from two parents by selecting two points uniformly along the line between the two parents in n-space.
Note that all of the child solutions are geometrically "between" the parent solutions. Two child solutions are produced from each pair of parent solutions.

3.3.3.7 Extended Linear

This operator is the same as the linear operator, but the sampling range along the line between the two parents is extended beyond the two parents. The size of this extension is equal to half of the distance between the two parents; thus, the child solutions should be equally distributed between the two parents and outside of this area. Two child solutions are produced from each pair of parent solutions.

3.3.3.8 Field-Based Uniform Crossover

This operator treats each dimensional parameter as an indivisible unit, or allele, and performs crossover between two parents by arbitrarily assigning an allele from one of the two parents. The two child solutions produced are complementary, in that for each allele assigned to a child solution from parent 1, the same allele is assigned from parent 2 to the opposing child solution.

3.3.3.9 Global Dominant Recombination

This operator also treats each dimensional parameter as an indivisible unit, or allele, similar to field-based uniform crossover. However, rather than restricting recombination to two parents, we allow arbitrary selection of a given allele to any individual solution from a randomly selected pool of parents. For this evaluation, we fix the pool size at 50; 50 children are produced from each pool of 50 parents.

3.3.3.10 PC Crossover

This operator uses the eigenspace of the covariance matrix as a basis for application of uniform crossover. This eigenspace chooses a basis set that aligns one dimension with the vector of maximal variance in the population; the second axis is aligned with the maximal variance in an orthogonal direction to the first, and so on. This eigenspace is also the basis of principal component analysis. First, the covariance matrix is computed from a uniformly selected pool of solutions. For this evaluation, the pool size is fixed at 2d, where d is the dimensionality of the problem space. Next, the eigenspace of this covariance matrix is calculated. The two parent solutions are remapped into this eigenspace basis, and uniform crossover is performed on the components of the two solutions as represented in the eigenspace. The two produced child solutions are then mapped back into the standard encoding axes. Each pair of parent samples produces two child solutions, with an arbitrarily selected pool for each operator application.

3.3.3.11 BLX-0.5

This operator is a direct application of the standard extended blend crossover operator, BLX-0.5. The components of a child solution are determined as c_k = p_{1,k} + s_k (p_{2,k} - p_{1,k}), s_k \in [-0.5, 1.5), where s_k is a uniform sample. This operator is identical to the extended linear operator, except that the uniform sampling is independent across each dimension. Each pair of child solutions samples the region delineated by the two parent solutions.

3.3.3.12 BLX-0.5 Parent Centered

This operator is the same as the BLX-0.5 operator, except that the range for each parameter is shifted so that the sampling is centered on the first parent solution. That is, given the formula for BLX-0.5, c_k = p_{1,k} + s_k (p_{2,k} - p_{1,k}), the parent centered version of this operator can be calculated by simply shifting the range of the uniform samples to s_k \in [-1, 1). Each pair of parent samples produces two child solutions.

3.3.3.13 SPX

This operator directly implements the simplex crossover, SPX.
Each operator application uses n+1 parent samples to produce a single child solution. Each n+1 parent samples are used to produce n+1 child solutions. -178- 3.3.3.14 PC Gaussian This operator is similar in design to the PC Gaussian operator, except that after rotation into the eigenspace a Gaussian mutation rather than uniform crossover is performed. The variances of this mutative distribution are determined by the eigenvalues associated with each eigenvector. This operator produces one child solution from one parent, using a pool of associated solutions to provide the covariance sample. The pool size for this evaluation is fixed as 2d, where d is the dimensionality of the problem space. 3.3.4 Empirical Statistical Results The following tables demonstrate the results from 100 sets of 10,000 operator evaluations. Each operator is applied on each of the 9 test distributions. The statistical measures outline previously: mean modification (Au), GVM, AVM, covariance modification (CM), and center focus measure (CF M) are measured for each set of 10,000 evaluations. The averages of all 100 sets are presented in these tables. 3.3.4.1 Results by Distribution The following tables present the results of the statistical tests organized by the test distributions. This allows for relative behavioral comparison of operators in the same environment. -179- Operator An evil AVM CM cFM Fixed Uniform, P 0.095302 9.9929 9.99692 0.001004 0.0278 Fixed Normal, P 0.095259 9.97957 9.98858 0.001021 -0.0456 Fixed Cauchy, P 22.3083 2055480 6358350 12.9643 -0.13016 Fixed Log-Uniform, P 0.093714 9.93702 10.1305 0.000987 -0.5402 Fixed Uniform, M 0.096881 9.92675 9.93145 0.00099 0.028305 Fixed Normal, M 0.095265 9.95056 9.96111 0.000985 -0.0459 Fixed Cauchy, M 27.4192 2156770 6569580 72.3596 -0.12082 Fixed Log-Uniform, M 0.094141 9.98873 10.1718 0.000989 -0.53653 Averaging 0.001926 -0.03188 0.140812 0.000288 -0.16362 Linear 0.00314 -0.02768 0.087546 0 -0.09826 Extended Linear 0.006505 0.014019 0.044333 0 -0.13481 BLX-alpha 0.005899 0.014005 0.044289 0 -0.12823 BLX-alpha, P 0.009177 0.055007 0.173946 0 -0.19751 SPX 0.067286 0.608672 1.92479 0 003284 PO Gaussian 0.007076 0.083353 0.263585 8.46E-13 019183 PC Crossover 1.85E—13 -1.5E-10 2.68E-10 3.6E—25 2.24E-15 Field-based Crossover 1.64E-15 1.64E-15 5.18E-15 0 1.7E-15 Global Dominant 0.007308 0.000343 0.004803 0 0.00108 Table 3.1 Results on Aligned Uniform Unimodal Distribution Operator Art Gv'M AVM CM CFM Fixed Uniform, P 0.095913 9.99228 9.99621 0.007789 0.029806 Fixed Normal, P 0.092222 10.0121 10.0207 0.007892 -0.04467 Fixed Cauchy, P 80.2332 1.535+08 4.845+08 134.089 -0.11168 Fixed Log-Uniform. 
P 0.096776 9.92752 10.1025 0.007963 0.53995 Fixed Uniform, M 0.098863 9.94697 9.95071 0.00862 0.034669 Fixed Normal, M 0.0954 9.92685 9.93693 0.008592 -0.04115 Fixed Cauchy, M 41.8603 11546600 35893200 9821.28 -0.12106 Fixed Log-Uniform, M 0.096395 9.95437 10.129 0.008667 0.53417 Averaging 0.001935 -0.03215 0.055517 0.000914 -0.16272 Linear 0.003169 -0.02778 0.027776 0.000283 -0.10262 Extended Linear 0.005266 0.014089 0.014089 5.755-05 -0.13019 BLX-alpha 0.007217 0.013643 0.013951 0.002418 0.026734 BLX-alpha, P 0.011313 0.055432 0.055697 0.000435 -0.10184 SPX 0.07345 0.620532 0.620532 0.030364 -0.04096 PC Gaussian 0.007573 0.082807 0.082807 0.001421 -0.18419 PC Crossover 1.35E-16 -3.25-17 1.27E-16 7.215-33 -25-16 Field-based Crossover 4.545-17 -1.5E-18 5.745-17 0.002098 0.062642 Global Dominant 0.00896 415-05 0.002258 0.008007 0.034699 Table 3.2 Results on Rotated Uniform Unimodal Distribution -l80- Qperator I An GVM AVM CM CF'M Fixed Uniform,P 0.100764 9.99772 10.0039 0.001118 0.169404 Fixed Normal,P 0.096511 9.98041 9.99064 0.001071 0.149286 Fixed Cauchy, P 34.4028 5726270 17883100 787.14 0.085567 Fixed Log-Uniforrn,P 0.101021 10.1174 10.317 0.001123 -0.19573 Fixed Uniform,M 0.095777 9.49617 9.61634 0.001072 0.209971 Fixed NorrnaI,M 0.099884 9.48912 9.62041 0.001039 0.158183 Fixed Cauchy, M 35.2497 12002600 37712500 335.802 0.079093 Fixed Log-Uniform,M 0.096681 9.5047 9.80657 0.001057 -0.24116 Averaging 0.002021 -0.49377 1.58471 0.00023 -0.0026 Linear 0.010425 -0.33299 1.05299 0 -0.00928 Extended Linear 0.020283 0.16698 0.528036 0 -0.03563 BLX-alpha 0.020453 0.181205 0.573021 0 -0.03704 BLX-alpha, P 0.028741 0.669798 2.11809 0 -0.06833 SPX 0.236523 7.4097 23.4315 0 0.005203 PC Gaussian 0.022965 0.99181 3.13638 5.4E-13 -0.01048 PC Crossover 1.84E-13 -1.5E-10 2.67E-10 1.46E-14 3E-17 Field-based Crossover 1.32E-17 2.22E-18 1.19E-16 0 4.44E-18 Global Dominant 0.024284 -0.00891 0.108572 0 -0.00166 Table 3.3 Results on Aligned Normal Unimodal Distribution Operator Au GVM AW CM 3M md Uniform]? 
0.099686 9.99125 9.99654 0.049393 0.184469 Fixed Normal, P 0.097443 9.96539 9.97613 0.049763 0.153426 Fixed Cauchy, P 17.621 964917 2886880 30.7709 0.072889 Fixed Log-Uniform, P 0.099198 10.0557 10.2486 0.048962 -0.19401 Fixed Uniform,M 0.097616 9.49966 9.50411 0.07228 0.212425 Fixed Normal, M 0.101171 9.48936 9.49952 0.072987 0.159413 Fixed Cauchy, M 32.5046 9013020 28243400 307.191 0.083529 Fixed Log-Uniform,M 0.097137 9.42228 9.61821 0.071652 -0.24128 Averaging 0.002124 -0.48912 0.493643 0.008906 -0.00479 Linear 0.010654 -0.33276 0.332757 0.003391 -0.00674 Extended Linear 0.021178 0.158937 0.158937 0.000637 -0.03515 BLX-alpha 0.02544 0.16653 0.173064 0.029077 0.061905 BLX-alpha, P 0.041024 0.676501 0.680602 0.005111 -0.00096 SPX 0.256775 7.36582 7.36582 0.359701 0.001964 PC Gaussian 0.026285 1.00032 1.00032 0.017237 -0.01029 PC Crossover 2.58E-17 -3.9E-16 4.09E-16 7.32E-33 -2.2E-17 Field-based Crossover 1.23E-17 2.11E-17 5.15E-17 0.025234 0.072309 Global Dominant 0.030563 0.000742 0.043299 0.096032 0.151748 Table 3.4 Results on Rotated Normal Unimodal Distribution -181- Operator Au GVM AVM CM CFM Fixed Uniform,P 0.098612 9.98519 9.99042 0.001126 0.051685 Fixed Normal, P 0.098227 9.96494 9.97535 0.001178 -0.00106 Fixed Cauchy, P 30.2911 6301540 19694200 243.852 0.08353 Fixed Log-Unifonn, P 0.095049 9.85522 10.0283 0.001108 0.52844 Fixed Uniform, M 0.100658 9.49271 9.52734 0.001098 0.05559 Fixed Normal,M 0.09743 9.51404 9.55566 0.0011 0.00564 Fixed Cauchy, M 38.5881 9015040 28206400 58.8462 0.0781 Fixed Log-Uniform, M 0.100468 9.6409 9.85611 0.001113 -0.4865 Averaging 0.001882 0.49304 0.912014 4.575-05 0.19271 Linear 0.012134 0.33254 0.60782 4.345-05 0.322579 Extended Linear 0.023395 0.167295 0.309606 7.53505 0.06361 BLX-alpha 0.023746 0.166974 0.309999 0.000125 0.00705 BLX-alpha, P 0.038326 0.667384 1.22863 0.000156 0.15128 SPX 0.283269 7.33396 13.4766 0.006962 0.023562 PC Gaussian 0.0283 0.9978 1.82206 0.000165 0.06194 PC Crossover 5.575-17 4.7E—16 0.015094 4.545-05 0.091521 Field-based Crossover 1.315-17 1.11E-18 9.75-17 9.11505 0.132339 Global Dominant 0.028838 0.5505 0.046306 0.000175 0.161629 Table 3.5 Results on n-dimensional Hypersphere Surface Distribution Operator An GVM AVM CM 515M Fixed Uniform, F 0.100395 9.99787 10.0027 0.001087 -0.25882 Fixed Normal,P 0.096509 10.0026 10.013 0.001101 0.31474 Fixed Cauchy, P 35.9693 5706930 17715500 8089.22 0.39358 Fixed Log-Uniform,P 0.095129 10.0709 10.2489 0.001101 0.8432 Fixed Uniform, M 0.096207 9.58539 9.58976 0.001083 0.25185 Fixed Normal, M 0.095354 9.57453 9.58485 0.001128 -0.3265 Fixed Cauchy, M 57.9317 91366900 2.89E+08 46.7645 0.39994 Fixed Log-Uniforrn, M 0.098437 9.58394 9.77334 0.001119 -0.83219 Averaging 0.001847 0.40953 0.412962 3.8E-05 -0.24269 Linear 0.01124 0.27766 0.278466 3.68E-05 0.2259 Extended Linear 0.022863 0.139376 0.143509 6.535-05 0.42957 BLX-alpha 0.023095 0.138798 0.143449 0.00011 0.30909 BLX-alpha, P 0.036031 0.555205 0.560778 0.000127 -0.35085 spx 0.269243 6.10068 6.16809 0.006122 -0.20386 PC Gaussian 0.028658 0.829183 0.831829 0.000144 0.33211 PC Crossover 4.15-17 1.24E-16 0.022671 3.87E-05 0.16004 Field-based Crossover 0 -2.2E-18 6.595-17 7.73505 0.21706 Global Dominant 0.027097 0.00087 0.030967 0.000154 0.29034 Table 3.6 Results on Uniform Density n—dimensional Hypersphere Distribution -182- Sperator r Art CW AVM CM CFM Fixed Uniform,P 0.096781 10.0163 10.0349 0.001564 0.019011 Fixed Normal,P 0.098614 9.98732 10.0146 0.001596 0.004682 Fixed Cauchy, P 34.0543 5849480 18323500 23866.2 -0.07132 Fixed 
Log-Uniform, P 0.100604 10.0979 10.283 0.001624 -0.26729 Fixed Uniform, M 0.096629 5.49506 5.68638 0.001819 0.029271 Fixed Normal, M 0.099634 5.47853 5.68391 0.001865 0.004272 Fixed Cauchy,M 86.3381 2.1E+08 6.64E+08 2338.61 -0.0699 Fixed Log-Uniform, M 0.098674 5.51143 6.0279 0.001796 -0.33956 Averaging 0.003239 4.52368 4.73485 0.000318 -0.00076 Linear 0.038609 -3.0239 3.16976 0.00044 -0.03271 Extended Linear 0.074817 1.52227 1.64713 0.000829 -0.1248 BLX-alpha 0.07913 1.4953 1.62682 0.001272 -0.03883 BLX-alpha,P 0.116761 6.07729 6.41829 0.001504 -0.07271 SPX 0.860061 66.4272 70.3316 0.074228 -0.00477 PC Gaussian 0.087542 9.07689 9.52179 0.001643 -0.03598 PC Crossover 1.73E-16 2.68E-15 0.261213 0.000406 -0.00486 Field-based Crossover 4.27E-17 5.33E-17 6.99E-16 0.000917 0.000276 Global Dominant 0.092101 -0.00686 0.405776 0.001787 0.002844 Table 3.7 Results on n-dimensional Normal Distribution Operator Au GVM AVM CM CFM Fixed UnifonTP 0.097507 10.0107 10.0175 0.001179 0.03021 Fixed Normal, P 0.0987 9.97933 9.99292 0.001261 -0.0166 Fixed Cauchy, P 51.5956 615511001.94E+08 3908.48 -0.10254 Fixed Log-Uniform, P 0.097922 10.1388 10.3286 0.00121 -0.46019 Fixed Uniform, M 0.096643 9.08709 9.12477 0.00127 0.033942 Fixed Norrnal,M 0.099034 9.07354 9.11768 0.001255 -0.01953 Fixed Cauchy, M 39.3995 18933700 59678700 635.377 -0.09394 Fixed Log-Uniform, M 0.101053 9.10455 9.33673 0.001224 -0.47771 Averaging 0.002431 -0.89823 1.17863 6.73E-05 -0.00429 Linear 0.016863 -0.60516 0.790924 8.51E-05 -0.00475 Extended Linear 0.03376 0.297963 0.40044 0.000162 -0.13151 BLX-alpha 0.032015 0.294687 0.391116 0.000261 -0.06882 BLX-alpha, P 0.051667 1.20896 1.60121 0.000309 -0.12177 SPX 0.38288 13.3774 17.5227 0.015336 0.00803 PC Gaussian 0.042073 1.82104 2.37735 0.000315 0081 PC Crossover 7.6E-17 8.1E-16 0.038964 8.69E-05 0.001561 Field-based Crossover 2.1E-17 -2.2E-18 1.66E-16 0.000183 0.002821 Global Dominant 0.042751 -0.00375 0.079754 0.000341 -0.00908 Table 3.8 Results on n-dimensional Normal Ring Distribution -l83- Operator Au GVM AVM CM CFM md Uniform, P 0.100504 9.87642 10.5683 0.005865 0.00116 Fixed Normal, P 0.095766 10.065 10.8199 0.005581 0.00093 Fixed Cauchy, P 43.5692 30896300 97514600 287.722 002964 Fixed Log-Uniform, P 0.099188 9.94697 10.7487 0.005822 0.001264 Fixed Uniform, M 0.097138 482.687 233.85 0.151382 0.00308 Fixed Normal, M 0.098206 482.619 233.734 0.152666 0.006394 Fixed Cauchy, M 32.3487 6224850 19487400 78.8244 0.02611 Fixed Log-Uniform, M 0.097173 482.429 233.53 0.150938 0.007776 Averaging 0.013394 492.791 241.779 0.104551 0.003993 Linear 0.243481 427.922 161.018 0.052632 0.02314 Extended Linear 0.482358 63.153 81.4166 0.042842 0.08719 BLX-alpha 0.48023 63.0731 81.4285 0.358389 0.03457 BLX-alpha, P 0.798515 257.221 325.178 0.118477 0.07127 SPX 5.58187 2805.79 3539.42 7.09697 0.00319 PC Gaussian 0.571992 384.981 482.734 0.249266 0.02335 PC Crossover 1.06E-15 8.3E-14 9.92877 0.015938 3.29E-06 Field-based Crossover 3.03E-16 -5.7E-16 3.31 E-14 0.30106 0.005396 Global Dominant 0.585842 0.72733 21.4282 1.09751 0.002894 Table 3.9 Results on Rotated n-dimensional Hyperellipsoid Distribution Operator Au GVM AVM CM - CFM Fixed Uniform, P 0.096076 10.0785 10.4659 0.007042 0.0034 Fixed Normal, P 0.09656 9.98929 10.4375 0.007319 0.007344 Fixed Cauchy, P 27.5999 3940060 12194800 65.7525 0.027919 Fixed Log-Uniform, P 0.095092 10.1606 10.7219 0.007051 0.006294 Fixed Uniform,M 0.09352 402.644 133.575 0.101616 0.027528 Fixed Normal, M 0.096024 -102.61 133.464 0.102183 0.02505 Fixed Cauchy, M 185.41 1.99E+09 
6.27E+09 4178.72 0.01249 Fixed Log-Uniform,M 0.101524 -102.74 133.647 0.10145 0.026274 Averaging 0.011062 -112.581 141.131 0.060139 0.023516 Linear 0.180892 -75.2073 94.4249 0.031193 0.0074 Extended Linear 0.391856 37.1472 48.4681 0.024738 0.05863 BLX-alpha 0.351107 37.0258 48.1889 0.212002 0.00388 BLX-alpha, P 0.578277 150.157 189.22 0.072682 0.0382 SPX 4.27396 1662.75 2105.17 4.07067 0.014845 PC Gaussian 0.463965 227.011 284.753 0.15255 0.018949 PC Crossover 4.6E-15 4.32E-14 5.86794 0.009837 0.005653 Field-based Crossover 5.96E-16 -1.7E-15 2.19E-14 0.177148 0.025954 Global Dominant 0.444635 0.46726 12.1322 0.649314 0.045027 Table 3.10 Results on Rotated n-dimensional Skewed Hyperellipsoid Distribution -l84- 3.3.4.2 Results by Operator The following tables represent the same data as the previous section reorganized by Operator. Fixed Uniform, P Au GVM AVM CM CFM Aligned Uniform Unimodal 0.095302 9.9929 9.99692 0.001004 0.0278 Rotated Uniform 0.095913 9.99228 9.99621 0.007789 0.029806 Aligned Normal Unimodal 0.100764 9.99772 10.0039 0.001118 0.169404 Rotated Normal Unimodal 0.099686 9.99125 9.99654 0.049393 0.184469 n-Sphere Surface 0.098612 9.98519 9.99042 0.001126 0.051685 Uniform Density 0.100395 9.99787 10.0027 0.001087 0.25882 n-Dim. Normal 0.096781 10.0163 10.0349 0.001564 0.019011 n-Dim. Ring 0.097507 10.0107 10.0175 0.001179 0.03021 n-Dim. Rotated Ellipsoid 0.100504 9.87642 10.5683 0.005865 000116 n-Dim. Ellipsoid Skewed 0.096076 10.0785 10.4659 0.007042 0.0034 Table 3.11 Results for Fixed Uniform Mutation Centered on a Single Parent Fixed Normal, P Ap GVM AVM CM CFM Aligned Uniform Unimodal 0.095259 9.97957 9.98858 0.001021 0.0456 Rotated Uniform 0.092222 10.0121 10.0207 0.007892 0.04467 Aligned Normal Unimodal 0.096511 9.98041 9.99064 0.001071 0.149286 Rotated Normal Unimodal 0.097443 9.96539 9.97613 0.049763 0.153426 n-Sphere Surface 0.098227 9.96494 9.97535 0.001178 0.00106 Uniform Density 0.096509 10.0026 10.013 0.001101 0.31474 n-Dim. Normal 0.098614 9.98732 10.0146 0.001596 0.004682 n-Dim. Ring 0.0987 9.97933 9.99292 0.001261 0.0166 n-Dim. Rotated Ellipsoid 0.095766 10.065 10.8199 0.005581 0.00093 n-Dim. Ellipsoid Skewed 0.09656 9.98929 10.4375 0.007319 0.007344 Table 3.12 Results for Fixed Normal Mutation Centered on a Single Parent -185- rlfied Cauchy, P AJL GVM AVM CM CF M Aligned Uniform Unimodal Rotated Uniform Unimodal Aligned Normal Unimodal Rotated Normal Unimodal n-Sphere Surface Uniform Density n-Dim. Normal n-Dim. Ring n-Dim. Rotated Ellipsoid n-Dim. Ellipsoid Skewed 22.3083 2055480 6358350 80.2332 1.53E+08 4.84E+08 5726270 1 78831 0 964917 2886880 6301 540 1 969420 5706930 1 771 550 5849480 1832350 6155110 1.94E+08 3089630 9751460 3940060 121 9480 34.4028 1 7.621 30.291 1 35.9693 34.0543 51 .5956 43.5692 27.5999 12.9643 -0.13016 134.089 -0.11168 787.14 0.085567 30.7709 0.072889 243.852 -0.08353 8089.22 -0.39358 23866.2 -0.07132 3908.48 -0.10254 287.722 -0.02964 65.7525 0.027919] Table 3.13 Results for Fixed Cauchy Mutation Centered on a Single Parent Fixed Log-Uniform, P All GVM AVM CM CFM Aligned Uniform Unimodal Rotated Uniform Aligned Normal Unimodal Rotated Normal Unimodal n-Sphere Surface Uniform Density n-Dim. Normal n-Dim. Ring n-Dim. Rotated Ellipsoid n-Dim. 
Ellipsoid Skewed 0.093714 0.096776 0.101 021 0.0991 98 0.095049 0.0951 29 0.1 00604 0.097922 0.0991 88 0.095092 9.93702 9.92752 10.1 174 10.0557 9.85522 10.0709 10.0979 10.1388 9.94697 10.1606 10.1305 0.000987 10.1025 0.007963 10.317 0.001123 10.2486 0.048962 10.0283 0.001108 10.2489 0.001101 10.283 0.001624 -0.26729 10.3286 0.00121 -0.46019 10.7487 0.005822 0.001264 10.7219 0.007051 0.006294 -0.5402 -0.53995 -0. 1 9573 -0. 1 9401 -0.52844 -0.8432 Table 3.14 Results for Fixed Log-Uniform Mutation Centered on a Single Parent Fixed Uniform, M Au GVM AVM CM CFM Aligned Uniform Unimodal 0.096881 9.92675 9.93145 0.00099 0.028305 Rotated Uniform 0.098863 9.94697 9.95071 0.00862 0.0346691 Aligned Normal Unimodal 0.095777 9.49617 9.61634 0.001072 0.209971 Rotated Normal Unimodal 0.097616 9.49966 9.50411 0.07228 0.212425 n-Sphere Surface 0.100658 9.49271 9.52734 0.001098 0.05559! Uniform Density 0.096207 9.58539 9.58976 0.001083 -0.25185 n-Dim. Normal 0.096629 5.49506 5.68638 0.001819 0.029271 n-Dim. Ring 0.096643 9.08709 9.12477 0.00127 0.033942 n-Dim. Rotated Ellipsoid 0.097138 482.687 233.85 0.151382 0.00308 n-Dim. Ellipsoid Skewed 0.09352 402.644 133.575 0.101616 0.027528 -186- Table 3.15 Results for Fixed Uniform Mutation Centered on the Mean of 2 Parents Fixed Nomial, M An GVM AVM CM CFM Aligned Uniform Unimodal 0.095265 9.95056 9961110000985 004591 Rotated Uniform Unimodal 0.0954 9.92685 9.93693 0.008592 0.04115 Aligned Normal Unimodal 0.099884 9.48912 9.62041 0.001039 0.158183 Rotated Normal Unimodal 0.101171 9.48936 9.49952 0.072987 0.159413 n-Sphere Surface 0.09743 9.51404 9.55566 0.0011 0.00564 Uniform Density 0.095354 9.57453 9.58485 0.001128 0.3265 n-Dim. Normal 0.099634 5.47853 5.68391 0.001865 0.004272 n-Dim. Ring 0.099034 9.07354 9.11768 0.001255 0.01953 n-Dim. Rotated Ellipsoid 0.098206 -182.619 233.734 0.152666 0.006394 n-Dim. Ellipsoid Skewed 0.096024 -102.61 133.464 0.102183 0.02505 Table 3.16 Results for Fixed Normal Mutation Centered on the Mean of 2 Parents Fixed Cauchy, M An GVM AVM CM CFM Aligned Uniform Unimodal 27.4192 2156770 6569580 72.3596 0.12082 Rotated Uniform Unimodal 41.8603 11546600 35893200 9821.28 012106 Aligned Normal Unimodal 35.2497 12002600 37712500 335.802 0.079093 Rotated Normal Unimodal 32.5046 9013020 28243400 307.191 0.083529l n-Sphere Surface 38.5881 9015040 28206400 58.8462 0.0781 Uniform Density 57.9317 91366900 2.89E+08 46.7645 0.39994 n-Dim. Normal 86.3381 2.1E+08 6.64E+08 2338.61 006991 n-Dim. Ring 39.3995 18933700 59678700 635.377 009394 n-Dim. Rotated Ellipsoid 32.3487 6224850 19487400 78.8244 0.02611 n-Dim. Ellipsoid Skewed 185.41 1.99E+09 6.27E+09 4178.72 0.01249l Table 3.17 Results for Fixed Cauchy Mutation Centered on the Mean of 2 Parents Fixed Log-Uniform, M Ap GVM AVM CM CFM Aligned Uniform Unimodal 0.094141 9.98873 10.1718 0.000989 0.53653 Rotated Uniform Unimodal 0.096395 9.95437 10.129 0.008667 0.53417 Aligned Normal Unimodal 0.096681 9.5047 9.80657 0.001057 0.24116 Rotated Normal Unimodal 0.097137 9.42228 9.61821 0.071652 0.24128 n-Sphere Surface 0.100468 9.6409 9.85611 0.001113 0.4865 Uniform Density 0.098437 9.58394 9.77334 0.001119 0.83219 n-Dim. Normal 0.098674 5.51143 6.0279 0.001796 0.33956 n-Dim. Ring 0.101053 9.10455 9.33673 0.001224 0.47771 n-Dim. Rotated Ellipsoid 0.097173 482.429 233.53 0.150938 0.007776 n-Dim. 
Ellipsoid Skewed 0.101524 -102.74 133.647 0.10145 0.026274 Table 3.18 Results for Fixed Log-Uniform Mutation Centered on the Mean of 2 Parents -187- Averaging Att GVM AVM CM CFM Aligned Uniform Unimodal 0.001926 0.03188 0.140812 0.000288 0.16362 Rotated Uniform 0.001935 0.032155 0.055517 0.000914 0.16272 Aligned Normal Unimodal 0002021 0.49377 1.58471 0.00023 0.0026 Rotated Normal Unimodal 0.002124 0.48912 0.493643 0.008906 -0.00479[ n-Sphere Surface 0.001882 0.49304 0.912014 4.57E-05 0.19271 Uniform Density 0.001847 0.40953 0.412962 3.8E-05 024269 n-Dim. Normal 0.003239 4.52368 4.73485 0.000318 0.00076 n-Dim. Ring 0.002431 0.89823 1.17863 6.73E-05 000429 n-Dim. Rotated Ellipsoid 0.013394 492.791 241.779 0.104551 0.003993 n-Dim. Ellipsoid Skewed 0.011062 412.581 141.131 0.060139 0.023516 Table 3.19 Results for Averaging Crossover Linear gt GVM AVM CM CFM Aligned Uniform Unimodal 0.00314 0.02768 0.087546 0 009826 Rotated Uniform 0.003169 0.02778 0.027776 0.000283 0.10262 Aligned Normal Unimodal 0.010425 0.33299 1.05299 0 000928 Rotated Normal Unimodal 0.010654 0.33276 0.332757 0.003391 0.00674 n-Sphere Surface 0.012134 0.33254 0.60782 4.34E-05 0.322579 Uniform Density 0.01124 0.27766 0.278466 3.68E-05 02259 n-Dim. Normal 0.003239 4.52368 4.73485 0.000318 000076 n-Dim. Ring 0.016863 0.60516 0.790924 8.51E-05 000475 n-Dim. Rotated Ellipsoid 0.243481 427.922 161.018 0.052632 0.02314 n-Dim. Ellipsoid Skewed 0.180892 -75.2073 94.4249 0.031193 0.0074 Table 3.20 Results for Linear Crossover Extended Linear Au GVM AVM CM CFM Aligned Uniform Unimodal 0.006505 0.014019 0.044333 0 013481 Rotated Uniform 0.005266 0.014089 0.014089 5.75E-05 013019l Aligned Normal Unimodal 0.020283 0.16698 0.528036 0 003563 Rotated Normal Unimodal 0.021178 0.158937 0.158937 0.000637 0.03515 n-Sphere Surface 0.023395 0.167295 0.309606 7.53E-05 006361 Uniform Density 0.022863 0.139376 0.143509 6.53E-05 042957 n-Dim. Normal 0.074817 1.52227 1.64713 0.000829 0.1248 n-Dim. Ring 0.03376 0.297963 0.40044 0.000162 0.13151 n-Dim. Rotated Ellipsoid 0.482358 63.153 81.4166 0.042842 0.08719 n-Dim. Ellipsoid Skewed 0.391856 37.1472 48.4681 0.024738 0.05863 Table 3.21 Results for Extended Linear Crossover ~188- BLX-alpha All GVM AVM CM CJFM Aligned Uniform Unimodal Rotated Uniform Aligned Normal Unimodal Rotated Normal Unimodal 0.005899 00140050044289 0 -0.12823 0.007217 0.013643 0.013951 0.002418 0.026734 0.020453 0.181205 0.573021 0 -0.03704 0.02544 0.16653 0.173064 0.029077 0.061905 n-Sphere Surface 0.023746 0.166974 0.309999 0.000125 0.00705 Uniform Density 0.023095 0.138798 0.143449 0.00011 0.30909 n-Dim. Normal 0.07913 1.4953 1.62682 0.001272 0.03883 n-Dim. Ring 0.032015 0.294687 0.391116 0.000261 0.06882 n-Dim. Rotated Ellipsoid 0.48023 63.0731 81.4285 0.358389 0.03457 n-Dim. Ellipsoid Skewed 0.351107 37.0258 48.1889 0.212002 0.00388 Table 3.22 Results for standard BLX-0.5 BLX-alpha, P Ap GVM AVM CM CFM Aligned Uniform Unimodal 0.00917 0.05500 0.17394 0 -0.19751 Rotated Uniform 0.01131 0.05543 0.05569 0.00043 -0.10184 Aligned Normal Unimodal 0.02874 0.66979 2.11809 0 -0.06833 Rotated Normal Unimodal 0.04102 0.67650 0.68060 0.00511 -0.00096 n-Sphere Surface 0.03832 0.66738 1.22863 0.00015 -0.15128 Uniform Density 0.03603 0.55520 0.56077 0.00012 -0.35085 n-Dim. Normal 0.11676 6.07729 6.41829 0.00150 -0.07271 n-Dim. Ring 0.05166 1.20896 1.60121 0.00030 -0.12177 n-Dim. Rotated Ellipsoid 0.79851 257.221 325.178 0.11847 -0.07127 n-Dim. 
Ellipsoid Skewed 0.57827 150.157 189.22 0.07268 -0.0382 Table 3.23 Results for ELK-0.5 centered on a Single Parent SPX A14 GVM AVM CM CFM Aligned Uniform Unimodal 0.067286 0.6086? 1.92779 0 003284 Rotated Uniform Aligned Normal Unimodal Rotated Normal Unimodal n-Sphere Surface Uniform Density n-Dim. Normal n-Dim. Ring n-Dim. Rotated Ellipsoid n-Dim. Ellipsoid Skewed 0.07345 0.620532 0.620532 0.030364 -0.04096 0.236523 0.256775 0.283269 0.269243 0.860061 0.38288 5.581 87 4.27396 7.4097 7. 36582 7.33396 6. 1 0068 66.4272 1 3.3774 2805.79 1662.75 23.4315 0 0.005203 7.36582 0.359701 0.001964 13.4766 0.006962 0.023562 6.16809 0.006122 -0.20386 70.3316 0.074228 -0.00477 17.5227 0.015336 -0.00803 3539.42 7.09697 -0.00319 2105.17 4.07067 0.014845 Table 3.24 Results for Simplex Crossover (SPX) -189- PC Gaussian Ap. GVM AVM CM CFM Aligned Uniform Unimodal 0.007076 0.083353 0.263585 8.46E—13 019183 Rotated Uniform 0.007573 0.082807 0.082807 0.001421 018419l Aligned Normal Unimodal 0.022965 0.99181 3.13638 5.4543 0.01048 Rotated Normal Unimodal 0.026285 1.00032 1.00032 0.017237 0.01029 n-Sphere Surface 0.0283 0.9978 1.82206 0.000165 0.06194 Uniform Density 0.028658 0.829183 0.831829 0.000144 0.33211 n-Dim. Normal 0.087542 9.07689 9.52179 0.001643 0.03598 n-Dim. Ring 0.042073 1.82104 2.37735 0.000315 0.081 n-Dim. Rotated Ellipsoid 0.571992 384.981 482.734 0.249266 0.02335 n-Dim. Ellipsoid Skewed 0.463965 227.011 284.753 0.15255 0.018949 Table 3.25 Results for Principal Component Gaussian Sampling PC Crossover Apt GVM AVM CM CFM Aligned Uniform Unimodal 1.85543 4.5E-10 2.68E-10 3.6E-25 2.24E-15 Rotated Uniform 1.35E-16 -3.2E47 1.27E-16 7.21E-33 -2E-16 Aligned Normal Unimodal 1.84E-13 4.5E-10 2.67E-10 1.46E-14 3E-17 Rotated Normal Unimodal 2.58E47 -3.9E-16 4.09E-16 7.32E-33 -2.2E47 n-Sphere Surface 5.57E-17 4.7E-16 0.015094 4.54E-05 0.091521 Uniform Density 4.1E-17 1.24E-16 0.022671 3.87E-05 016004 n—Dim. Normal 1.73E-16 2.68E—15 0.261213 0.000406 0.00486 n-Dim. Ring 7.6E-17 8.1E-16 0.038964 8.69E-05 0.001561 n-Dim. Rotated Ellipsoid 1.06E-15 8.3E—14 9.92877 0.015938 3.29E-06 n-Dim. Ellipsoid Skewed 4.6E-15 4.32E-14 5.86794 0.009837 0.005653 Table 3.26 Results for Principal Component Crossover Field-based Crossover Ap. GVM AVM CM CFM Aligned Uniform Unimodal 1.64E-15 1.64E-15 5.18E-15 0 1.71545 Rotated Uniform 4.54E-17 4.5E-18 5.74E47 0.002098 0.062642 Aligned Normal Unimodal 1.32E-17 2.22E-18 1.19E-16 0 4.44E-18 Rotated Normal Unimodal 1.23E-17 2.11E-17 5.15E-17 0.025234 0.072309 n-Sphere Surface 1.31E-17 1.11E-18 9.7E-17 9.11E—05 0.132339 Uniform Density 0 -2.2E-18 6.59E-17 7.73E-05 021706 n-Dim. Normal 4.27E-17 5.33E-17 6.99E-16 0.000917 0.000276 n-Dim. Ring 2.1E-17 -2.2E-18 1.66E-16 0.000183 0.002821 n-Dim. Rotated Ellipsoid 3.03E-16 -5.7E46 3.31E44 0.30106 0.005396 n-Dim. Ellipsoid Skewed 5.96E-16 4.7E-15 2.19E-14 0.177148 0.025954 Table 3.27 Results for Field-Based Crossover ~190- GlobaTDominant Au GVM AVM CM CFM Aligned Uniform Unimodal 0.00730 0.00034 0.00480 0 0.00108 Rotated Uniform 0.00896 -4E-05 0.00225 0.00800 0.03469 Aligned Normal Unimodal 0.02428 0.00891 0.10857 0000166 Rotated Normal Unimodal 0.03056 0.00074 0.04329 0.09603 0.15174 n-Sphere Surface 0.02883 -3.5E-05 0.04630 0.00017 0.16162 Uniform Density 0.02709 0.00087 0.03096 0.00015-0.29034 n-Dim. Normal 009210000686 0.40577 0.00178 0.00284 n-Dim. Ring 0.04275 0.00375 0.07975 0.00034 0.00908 n-Dim. Rotated Ellipsoid 0.58584 0.72733 21.4282 1.09751 0.00289 n-Dim. 
Ellipsoid Skewed 0.44463 0.46726 12.1322 0.64931 0.04502
Table 3.28 Results for Global Dominant Recombination

3.3.4.3 Analysis of Results

In comparing the uniform, normal, and log-uniform fixed mutative schemes, note that all three provide nearly identical results in terms of variance disruption, covariance disruption, and mean modification. These features are all relative to the variance of the distribution being sampled; since we selected the variances to be identical, the results are nearly identical. Therefore, these values appear to be independent of the choice of normal or uniform distribution. However, the center focusing measure shows that the fixed normal mutation provides a more strongly centered child distribution than the uniform, as would be expected. Surprisingly, the log-uniform mutation operator shows greater center tending even than the Cauchy mutation operator does. However, the variance of the log-uniform operator is nearly the same as that produced by the normal mutation operator. This implies that either the log-uniform mutation moves more strongly toward the center, or that it produces a significant number of high-variance individuals that skew the mean of the distance distribution outward. Since we know from its design that the log-uniform mutation operator is strongly centered on individual parent solutions, we conclude that the second situation must be true. Note that the Cauchy distribution also produces a significant level of high-variance members; however, here the number produced is large enough to significantly increase the variance, and it effectively produces a flatter distribution than the log-uniform mutation operator in all cases. The general result seems to be that the skew of the mutation distribution toward the center determines the degree of center focus induced in the child population. While this is hardly an unexpected result, it does validate these statistical measures to some degree.

A clearer result, demonstrated by analysis of the parent centered and mean centered versions of the various operators, is that, for mean centered operators, a great deal of variance can be lost if the width of the applied operator is not relative to the distance between the parents. The magnitude of this variance loss is relative to the variance of the parent distribution, and can be hidden by the relative magnitude of the variance of the mutative distribution. For example, the first four test cases, which provide relatively little population variance compared to the variance of the fixed normal, fixed uniform, and fixed log-uniform mutation operators, show an average 0.05 differential between the mean centered and parent centered versions. However, for the ellipsoid test cases, which have much larger initial population variances, the mean centered versions show huge variance losses.

The effect of mean versus parent centering on the covariance loss measure is consistent across the various operator pairs. In test cases where the uniform mutation operator shows reduced covariance loss when centered on a single parent, the normal, log-uniform, and BLX-0.5 operators show similar-magnitude reductions when parent centered. This is not true of the Cauchy mutation operator. Also, the test cases having rotated presentations demonstrate the greatest covariance loss when using the mean centered versions of the various operators. This is not surprising, since these distributions have the larger covariances and therefore have more variance to lose.
A direct conclusion is that the use of mean centered rather than parent centered operators causes a direct increase in the degree of covariance loss. Note that for the BLX-0.5 operators the parent centered version provides both an increase in the variance and a reduction in the covariance loss. This is an expected result, since averaging adds a component of both variance loss and covariance loss. Interestingly, the parent centered BLX-0.5 also increases the degree of center focusing when compared to its mean centered version. This tendency is somewhat present in other mean/parent-centered pairs, but is not as pronounced as it is in the BLX-0.5 case. This implies that the parent centered version increases the variance of the search distribution without a matching increase in the median distance, thus extending the tails of the distribution without excessively flattening the center.

The PC crossover operator shows maximal invariance on all statistics across all test cases. This implies that the PC crossover operator preserves the population mean, variance, covariance, and distribution shape to a high degree. Note that although the overall variance is preserved, the variance may be redistributed across the dimensions, as evidenced by the high AVM values. Field-based crossover also preserves the mean and population variance to a high degree, but can disrupt the covariance and center tending characteristics under certain circumstances. The dominant recombination operator preserves these characteristics to a much lesser degree, as the use of multiple parents provides a much more varied sampling.

Interestingly, the SPX operator shows the greatest level of covariance modification; however, unlike the other operators with high levels of covariance modification, this is not due to a loss or reduction of covariance but rather to a sharpening of the covariance that already exists in the population. Unfortunately, the unsigned nature of the covariance modification statistic does not allow us to directly distinguish between covariance loss and covariance gain, so this observation must be made through direct evaluation of the distribution of the produced children. In these cases, a clue as to the covariance enhancing nature of the operator is that the magnitude of covariance modification on the last two test distributions is greater than the total expected normalized covariance measure. A further metric that could be of use in such situations would be a measure of the relative alignment between the covariances of the parent and child distributions. However, given the possibility of underconstraint (i.e. singularities in the covariance matrix), construction of such a metric would be more difficult than simple direct comparison of the eigenspaces of the two covariance matrices, although that would be a reasonable first approach.

The uniform density sphere test case consistently demonstrates the highest degree of center focus for each operator. Likewise, the hypersphere surface consistently elicits the highest degree of center avoidance from each operator. The reason for these observations is still under investigation.

3.3.5 Preliminary Operator Taxonomy

Using the data from these statistical measurements and the given test distributions, we can estimate the relative tendencies of the tested operators and thereby produce an overall categorization of their operation.
Note that this is no doubt an oversimplification of the behavior of a number of these operators, but it does provide a reasonable starting point for categorization of operators. Therefore, we can produce a preliminary form of taxonomical classification for these operators.

3.3.5.1 Mean modification

None of the operators tested shows any direct bias in terms of mean modification. As discussed previously, most mean modifying operators tend to use information from the objective function to induce bias; however, none of the operators tested are fitness biased. While some of the operators, such as Cauchy mutation, tend to show higher-magnitude displacement of the population mean, the magnitude of displacement, taken in proportion to the level of variance modification, is lower for Cauchy mutation than it is for several other operators. In summary, for this operator set, the mean modification statistic does not provide any useful differentiation between the operators.

3.3.5.2 Variance modification

There are basically five classes of variance modification behaviors observed through these tests: high variance addition, variance addition, variance loss, variance preservation, and variance rescaling. Cauchy mutation adds an extremely high level of variance, and therefore the Cauchy operators are isolated in this categorization. Variance losing operators tend to reduce the population variance, often in proportion to the overall initial population variance. Both the averaging and linear crossover operators, not surprisingly, demonstrate reduced variance in the produced child distributions. Variance preserving operators are those that neither add nor reduce the variance of the population. Of the three variance preserving operators, PC crossover is the most consistently variance preserving, followed by field-based crossover and global dominant recombination. The majority of the fixed mutation schemes demonstrate basic variance addition, as expected. However, several operators tend to add a degree of variance proportional to the current variance of the population. This effectively produces a rescaling of the population variance (e.g. doubling, tripling, etc.). Extended linear crossover, BLX-α, SPX, and PC Gaussian are all variance rescaling operators.

3.3.5.3 Covariance Modification

Four potential classifications of covariance modification behavior have been composed from the statistical test data: covariance losing, covariance preserving, covariance altering, and covariance enhancing. Again, we place the Cauchy mutation operators in a separate categorization, covariance altering, since the large variances tend to overwhelm the covariance measures (effectively overshadowing the population covariance). These operators effectively induce an arbitrary alternate covariance on the population. Covariance losing operators include global dominant recombination, field-based recombination, and BLX-α. These operators are characterized by nearly complete neutralization of the population covariance. We include most of the small-variance fixed mutation operators as covariance preserving, since the magnitude of the covariance disruption is relative to the mutative variance in these cases. Note that a better classification would be variance-relative covariance loss, but until we test multiple versions of these operators under various variances, this assumed categorization cannot be proven empirically (although from a theoretical standpoint it seems fairly obvious).
The PC Crossover operator stands out as the most truly covariance-preserving operator of those tested. Both the SPX and PC Gaussian operators can be categorized as covariance enhancing, in that both operators tend to increase the magnitudes of the covariance present in the parent population.

3.3.5.4 Center Focusing

Three categories of center focus behavior provide characterization of the shape-modifying tendencies of various operators. These categories are defined as center focusing, center neutral, and center avoiding. Note that while center focus indicates a relative shape change, variance loss (or lower degrees of variance addition) indicates a more direct shift toward the population center. Thus, an operator such as averaging crossover often demonstrates a flatter distribution, but over a much more centralized range. Operators that use a centralized search distribution, such as the Cauchy and log-uniform distributions, tend to induce the search distribution onto the population distribution. The degree of reshaping is determined by the relative magnitude of the variance of the search distribution compared to the variance of the initial population. Tested operators demonstrating various degrees of center focusing are those using log-uniform and Cauchy distributions, extended linear crossover, the parent-centered version of BLX-0.5, and the PC Gaussian operator. Center-neutral operators include the fixed normal sampling mutation operators, PC Crossover, SPX, and BLX-0.5. Center-avoiding operators include averaging crossover, global dominant recombination, field-based crossover, and linear crossover.

3.3.5.5 Preliminary Taxonomy

The following table presents a summary of the preliminary operator taxonomy as presented above.

Operator               Mean Mod.       Var. Mod.     Covar. Mod.   Center Focus
Fixed Uniform, P       Var. Relative   Adding        Preserving*   Avoiding
Fixed Normal, P        Var. Relative   Adding        Preserving*   Neutral
Fixed Cauchy, P        Var. Relative   High Adding   Altering      Focusing
Fixed Log-Uniform, P   Var. Relative   Adding        Preserving*   Focusing
Fixed Uniform, M       Var. Relative   Adding        Preserving*   Avoiding
Fixed Normal, M        Var. Relative   Adding        Preserving*   Neutral
Fixed Cauchy, M        Var. Relative   High Adding   Altering      Focusing
Fixed Log-Uniform, M   Var. Relative   Adding        Preserving*   Focusing
Averaging              Var. Relative   Reducing      Losing        Avoiding
Linear                 Var. Relative   Reducing      Losing        Avoiding
Extended Linear        Var. Relative   Rescaling     Preserving    Focusing
BLX-alpha              Var. Relative   Rescaling     Losing        Neutral
BLX-alpha, P           Var. Relative   Rescaling     Losing        Focusing
SPX                    Var. Relative   Rescaling     Enhancing     Neutral
PC Gaussian            Var. Relative   Rescaling     Enhancing     Focusing
PC Crossover           Var. Relative   Preserving    Preserving    Neutral
Field-Based            Var. Relative   Preserving    Losing        Avoiding
Global Dominant        Var. Relative   Preserving    Losing        Avoiding
* Most likely mutation variance magnitude relative loss.
Table 3.29 Preliminary Taxonomy

Note that this preliminary taxonomy selects the division points between the various categories in a fairly arbitrary manner. A more quantitative categorization is possible through the test data; however, the exact formulation of such quantitative scales requires more extensive research into the interplay of the operators and the test distributions.

Chapter 4 Alternate Population Relative Operators

Given the analysis in Chapter 3, it should be clear that numerous standard EC operators exhibit bias in terms of modification of the search distribution, as well as invariance or sensitivity to various homeogenic transformations of the search landscape.
The question of the potential value or usefulness of such inherent biases is determined by the alignment between the assumptions such biases imply and the validity of these assumptions for a given search space. Non-invariant behavior implies a certain fragility in terms of the form of encoding used for a given problem, and thereby seems to place undue burden on the users of such operators. A potential result of this analysis is the creation of new operators that remain neutral to the preexisting population distribution and that are invariant in regards to the specific encoding employed.

A potential approach to producing such operators is to modify existing operators such that they are invariant and distribution neutral. Two possible methods for producing invariance to the selected axes of encoding are either to use unrestricted, or "free," axes for operator application (i.e. randomly reselecting the axes of application for each operator application) or to restrict the axes of rotation to the most dominant set revealed through analysis of the distribution. While the second seems more likely to also produce distribution-neutral operators, both forms will be compared. To demonstrate the effectiveness and efficiency of any modified operators, the relative performance of a system employing the operator will be compared to established EC systems operating on identical search landscapes. The evaluative comparisons will take place over the set of test problems outlined in Chapter 2.

4.1 Overview of Benchmark Systems and Empirical Comparisons

For empirical comparisons three standard evolutionary computation approaches were selected. These systems include standard evolutionary programming, modified evolutionary programming substituting Cauchy samples for the normal mutative samples, and a version of the standard GA using blend plus alpha crossover (BLX-α). The selected parameterization and operation of each of these systems is detailed below.

The NFL theorems state that no search system can be considered more powerful than another over the set of all possible problem spaces; therefore, any comparisons based on empirical results are restricted by the problems used for evaluation. Attempts to extrapolate on such results are likely to lead to erroneous conclusions. Nonetheless, empirical examination remains the simplest and most direct method to measure the relative strength (in the sense of "strong" and "weak" AI) of various systems. Empirical examination can provide a hint for characterizing the types of landscapes on which a given search technique is likely to perform well.

4.1.1 Random Rotational Presentation of Test Functions

Except where explicitly noted, all test functions are presented in an offset and rotated fashion. The method of rotation is determined for a d dimensional problem by selecting 5d random pairs of dimensions uniformly and 5d associated random angles from 0 to 360 degrees. The individual rotation matrices for these rotations are built in the standard method, and then the accumulated rotation matrices are multiplied to produce a final conglomerated rotation. Note that since the standard test battery consists of 100 runs, the initial populations and associated random rotation sets are fixed for all 100 runs for all tested systems (i.e. the initial population on the Sphere function and rotation matrix for the run number r is identical for the EP test series, the EPGA test series, etc.).
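To make the construction concrete, the following minimal numpy sketch builds such a composite rotation from planar (Givens) rotations. It is an illustration rather than the dissertation's own code; the function name, the random-generator interface, and the seeding convention are assumptions.

import numpy as np

def random_rotation(d, rng):
    """Compose 5d random planar (Givens) rotations into one d x d rotation matrix."""
    rotation = np.eye(d)
    for _ in range(5 * d):
        i, j = rng.choice(d, size=2, replace=False)   # random pair of dimensions
        theta = rng.uniform(0.0, 2.0 * np.pi)         # random angle over the full circle (0 to 360 degrees)
        givens = np.eye(d)
        givens[i, i] = givens[j, j] = np.cos(theta)
        givens[i, j], givens[j, i] = -np.sin(theta), np.sin(theta)
        rotation = givens @ rotation                  # accumulate the composite rotation
    return rotation

rng = np.random.default_rng(seed=1)    # a fixed seed per run number keeps the rotation identical across systems
R = random_rotation(10, rng)           # e.g. a 10-dimensional problem

In such a setup, each run number would reuse its fixed rotation R (together with the offset described next) for every system under test, matching the protocol described above.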
The initial population is computed in the standard coordinate space for the test problem, then rotated and translated to the active problem coordinate system. All test problems, even non-rotated test sequences, are redesigned to move the global optima away from the origin (by shifting the effective origin). While this offset reduces the available resolution around the global optima drastically (thereby potentially affecting functions requiring high resolution such as the original Schaffer function and the Rosenbrock function), it prevents rewarding origin-seeking behavior. Note that for non-rotated and rotated comparisons the same initial population for a given run number is used; however, the initial population is only translated for the non-rotated tests.

4.1.2 EP and Fast EP Systems

The EP and Fast EP systems used for testing incorporate standard EP selection and mutation operators. The EP and fast EP systems are identical with the exception that the EP mutations are drawn from a normal distribution while the fast EP mutation samples are drawn from a Cauchy distribution. Each solution to a d dimensional problem is encoded as a vector of d 64-bit floating point values (doubles), and an associated vector of d 64-bit mutative step-sizes for scaling the mutation. The initial values for the solution parameters are independently and uniformly initialized across the selected initial range for the problem. The initial values for the mutative step-sizes are initialized to the inverse square of d.

EP-style tournament selection is used in both the EP and fast EP systems presented here. This selection operator is classified as a (μ + λ) scheme, which indicates that the parent pool (of size μ) for the next generation is selected from the general pool of the combined parents and children from the current generation. EP tournament selection begins by selecting 10 competing solutions uniformly for each solution in the population. The tested solution receives a score from 0 to 10 based on the number of competing solutions in the sample with a worse (less fit) fitness value. After all solutions are scored, the solutions are sorted by this score and only the top μ survive as parents for the next round (former parents have precedence on ties; all other ties are decided randomly). In all EP and fast EP results presented here the value for μ is 100, and the value for λ is 6μ.

Mutation occurs using the standard EP self-adaptive technique to adjust the mutative step-sizes. During each generation, each parent solution spawns 6 child solutions through mutation. The modification of the step-size is performed before the step-size is used in mutation as outlined in [Back97].

4.1.3 BLX-α GA System

The BLX-α GA system used in this empirical testing is a modified standard GA. Although [Eshelman 93] demonstrates that BLX-α is apparently more efficient within the CHC framework, we chose to test it within a standard GA framework in order to provide more direct comparisons between the relative power of the EPGA and BLX-α operators. Other than the difference in operators, both systems are parameterized identically, using the same code for all other components and initialized with the same set of initial populations and randomly selected rotations for each test run. The population size for all BLX-α performance data is 200 unless otherwise specified. Single element elitism is used (the best parent is directly copied into the child generation).
Selection is tournament selection where two individuals are selected uniformly at random (with replacement, which allows a non-zero probability for the least fit member to survive) from the previous population and the fittest individual is selected as a parent solution. BLX-α is applied to a pair of selected parents, producing a pair of independently produced children. Other than the single elite individual, all individuals in the new generation are produced through crossover. Since BLX-α applies a random sample, no mutation operator is incorporated into the system.

4.1.4 Ranksum Comparisons

The results presented in these tables are compiled from 90 of 100 runs, discounting the 5 best and 5 worst of 100 runs for each system type (EP, BLX-α, etc.). The best value found in those 90 runs is given as well as the number of evaluations used to find that value. The average best of all 90 runs is also reported. In multi-way or side-by-side comparisons, runs are ranked according to the best fitness value found during the run and the sum of these rank values is then reported. The Wilcoxon rank sum test allows us to use these values in side-by-side comparisons to estimate the likelihood that the two sample groups are drawn from populations with distinct means. The Wilcoxon test is strongly invariant to distribution shapes, so unlike a standard t-test, we do not need to prove that the underlying distribution is normal. Given the large sample sizes, we can use the following formulas to compute the expected mean and variance of the ranksum distribution. Since the Wilcoxon values are expected to be normally distributed regardless of the underlying distributions of the sample sets, we can use these values in a normal Z-test against the viewed ranksum values to estimate the probability that the two means are distinct.

μ_A = n_A (n_A + n_B + 1) / 2, where n_A and n_B are the sizes of sample sets A and B respectively
Equation 4.1 Expected Mean of Ranksum Values

σ_A = sqrt( n_A n_B (n_A + n_B + 1) / 12 ), where n_A and n_B are the sizes of sample sets A and B respectively
Equation 4.2 Expected Variance of Ranksum Values

Therefore, we can calculate the probability of a given ranksum value being produced if the two sample groups did come from the same mean. The formula for the relevant Z-statistic is given in Equation 4.3.

z = (R_A − μ_A) / σ_A, where R_A is the observed ranksum value
Equation 4.3 Z-statistic for an Observed Ranksum Measure

Using Equation 4.3, we can compute the Z-statistic and probability level for the ranksum values observed in the results. Table 4.1 presents Z-statistics and probability levels for a number of observed values covering the possible range, while Table 4.2 provides the ranksum values corresponding to probability levels of interest. In all side-by-side performance comparison data, all ranksum comparisons providing a 99.9% or greater level of significance are highlighted.

Ranksum     Z-statistic   Probability
4095        -11.7823        0.00%
4545        -10.4731        0.00%
4995         -9.16399       0.00%
5445         -7.85485       0.00%
5895         -6.54571       0.00%
6345         -5.23657       0.00%
6795         -3.92742       0.00%
7245         -2.61828       0.44%
7695         -1.30914       9.52%
8145          0            50.00%
8595          1.309142     90.48%
9045          2.618283     99.56%
9495          3.927425    100.00%
9945          5.236566    100.00%
10395         6.545708    100.00%
10845         7.854849    100.00%
11295         9.163991    100.00%
11745        10.47313     100.00%
Table 4.1 Z-statistic of Ranksum Measures with Two 90 Member Sample Groups
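Both Table 4.1 above and Table 4.2 below follow directly from Equations 4.1 through 4.3. The following minimal Python sketch of the calculation is illustrative only; the function names are not taken from the dissertation, and the normal CDF is computed from the standard error function.

import math

def ranksum_z(rank_sum_a, n_a, n_b):
    """Z-statistic for an observed rank sum (Equations 4.1-4.3)."""
    mu_a = n_a * (n_a + n_b + 1) / 2.0                        # Equation 4.1
    sigma_a = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)   # Equation 4.2
    return (rank_sum_a - mu_a) / sigma_a                      # Equation 4.3

def normal_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Two 90-run samples: a rank sum near 7244.7 for system A corresponds to the 0.5% level,
# i.e. a rank sum this low would arise only about 0.5% of the time under equal means (cf. Table 4.2).
z = ranksum_z(7244.656, 90, 90)
print(round(z, 5), round(100 * normal_cdf(z), 3))   # approximately -2.57583 and 0.5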
Probability μ_A < μ_B   Z-statistic   Ranksum
 0.001%      -4.26504    6654.216
 0.010%      -3.71909    6845.046
 0.050%      -3.29056    6994.833
 0.100%      -3.09025    7064.847
 0.500%      -2.57583    7244.656
 1.000%      -2.32635    7331.859
 5.000%      -1.64485    7570.065
10.000%      -1.28155    7697.052
90.000%       1.281552   8592.948
95.000%       1.644853   8719.935
99.000%       2.326347   8958.141
99.500%       2.575831   9045.344
99.900%       3.090253   9225.153
99.950%       3.29056    9295.167
99.990%       3.71909    9444.954
99.999%       4.265043   9635.784
Table 4.2 Ranksum Measures for Various Probability Levels with Two 90 Member Sample Groups

Similar tests exist for multi-way ranksum comparisons, but they can only indicate whether any of the samples deviates from the norm. Therefore these values are of limited value in multi-way comparisons and are used mainly to determine the validity of the average best values. Note that it is quite possible for a given system in a side-by-side comparison to demonstrate a weaker average best value on a given test function, while its ranksum indicates that it statistically outperforms the other system. This can occur if the system produces a number of outlier runs (beyond the 5% removal limit) which exhibit poor performance. For example, if system A produces an error value of 10^-3 once and 10^-20 for all other runs, while system B produces a consistent 10^-15 on all runs, the average error for A will be larger than that for B. However, the ranksum test clearly would indicate that A consistently outperforms B.

4.2 Principal Component Operators for GA

One fairly simple approach toward removal of rotational bias would be to ignore the bases of encoding altogether, treating the search points as existing in a non-oriented d-dimensional space. However, most operators require some form of axial or dimensional decomposition for application. One possible approach would be to select randomly aligned axes for each operator application. This effectively simulates a non-oriented approach. Unfortunately, given that the number of possible alignments grows exponentially with the dimensionality of the problem, the sampling of a fixed number of operator applications is likely to provide a relatively poor sampling of this space. The result is likely to produce a less stable system than those using fixed bases. Randomly rotated axes for operator application may offer an opportunity to reduce, or at least normalize, alignment bias. Ideally we would like to be able to select a correctly aligned basis set, or at least one with known properties, such as one that minimizes the area of distribution for the offspring. An optimal rotation would be one which aligns the longest dimension of the hyper-ellipsoid of the mutation distribution with the dimension having the largest variance in the population, and which aligns the second longest dimension of the hyper-ellipsoid with the dimension having the next largest variance which is also orthogonal to the dimension already selected, and so on. Note that this basis set may be located by solving for the eigenvectors of the covariance matrix and ordering the eigenvectors in descending order by the absolute value of their associated eigenvalue. This procedure is identical to that used in creating principal component projections for data viewing [Jain 88]; hence, these operators are designated principal component (PC) operators. In fact, this is somewhat a misnomer, since principal component analysis assumes reduction of the dimensionality complexity of data for presentation by limiting the number of dimensions being projected.
However, the term has also been used in [Kita 1999] and connotes the intention of using the eigenspace of the covariance matrix.

Computation of an eigenspace requires sufficient samples to prevent a singularity in the computation. A singularity occurs when the sampled set has no projection along one or more dimensions. When a singularity occurs, the eigenvector calculation may produce random vectors for the smallest eigenvalues. If the population has not converged in its level of dimensionality, the covariance matrix requires at least d+1 samples. For the PCGM and PCX operators a suggested sample size of 2d is used. Since the PCGM operator uses the eigenvalue as the basis of the mutation size, the random eigenvectors are not expected to have much impact on the operator. Likewise, if the singularity occurs because of loss of dimensionality due to convergence, it is unlikely that a pair of parents will have significant variance to exchange across the random orthogonal eigenvector. The only situation which could cause unexpected effects is if the sample set has a singularity while the parents for a PCX operation have a significant amount of variance across the random eigenvector.

4.2.1 Principal Component Crossover (PCX)

The concept of using the principal component analysis basis space for crossover is straightforward. Two parents are selected using the standard breeding selection technique. An additional pool of the desired size is selected uniformly without replacement (and disallowing the two parents). The covariance of the combined pool (parents and pool members) is measured, and the eigenvector analysis of the resulting covariance matrix is calculated. The result of this analysis provides a series of orthogonal unit vectors in d space, and a set of associated eigenvalues (representing the magnitude of the variance measured along each of these vectors). The eigenvalues are then discarded. The two selected parents are rotated into the basis represented by the eigenspace of the covariance matrix. Standard two-point field-based crossover is then performed on the rotated parents, and the resulting children are then counter-rotated and submitted to the next generation (or for mutation with PCGM). This operator was first introduced in the tech report [Patton 1999].

4.2.2 Principal Component Gaussian Mutation (PCGM)

Principal Component Gaussian Mutation (PCGM) proceeds in a parallel fashion to PCX. A single parent is selected using the standard breeding selection technique, or is provided as a result of a PCX operation. An additional pool of the desired size is selected uniformly without replacement. The covariance of the combined pool (solutions to be mutated and pool members) is measured, and the eigenvector analysis of the resulting covariance matrix is performed. The result of this analysis provides a series of orthogonal unit vectors in d space, and a set of associated eigenvalues (representing the magnitude of the variance measured along each of these vectors).

σ_i = sqrt( e_i / S ), where e_i is the i-th eigenvalue and S is the PCGM scale factor
Equation 4.4 Formula for PCGM Sample Standard Deviation

The d zero-mean normal samples are drawn, where the standard deviation of each sample is given in Equation 4.4. The vector of these samples is counter-rotated using the rotational inverse of the eigenvector matrix and then is added to the solution being mutated. This operator was first introduced in the tech report [Patton 1999].
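The two operators can be summarized in a short sketch. The following Python/numpy code is an illustrative reconstruction from the descriptions above rather than the dissertation's implementation; in particular, the segment exchange used for the two-point crossover in the rotated coordinates and the Equation 4.4 scaling are assumptions drawn from the surrounding text, and all function names are hypothetical.

import numpy as np

def pc_basis(sample):
    """Eigen-decomposition of the sample covariance; columns of 'basis' are orthonormal eigenvectors."""
    eigvals, basis = np.linalg.eigh(np.cov(sample, rowvar=False))
    return eigvals, basis

def pcx(parent_a, parent_b, pool, rng):
    """Principal Component Crossover: two-point crossover applied in the pool's eigenbasis."""
    _, basis = pc_basis(np.vstack([parent_a, parent_b, pool]))
    a, b = basis.T @ parent_a, basis.T @ parent_b          # rotate parents into the eigenbasis
    d = len(parent_a)
    lo, hi = sorted(rng.choice(d + 1, size=2, replace=False))
    child_a, child_b = a.copy(), b.copy()
    child_a[lo:hi], child_b[lo:hi] = b[lo:hi], a[lo:hi]    # exchange the fields between the two cut points
    return basis @ child_a, basis @ child_b                # counter-rotate back to problem coordinates

def pcgm(solution, pool, rng, scale=3.0):
    """Principal Component Gaussian Mutation: per-axis normal steps scaled by the eigenvalues."""
    eigvals, basis = pc_basis(np.vstack([solution, pool]))
    sigma = np.sqrt(np.maximum(eigvals, 0.0) / scale)      # Equation 4.4 (tiny negative round-off clipped)
    step = rng.normal(size=len(solution)) * sigma          # d zero-mean normal samples
    return solution + basis @ step                          # counter-rotate the step and add it

Note that, because each eigenvalue is paired with its own eigenvector by the decomposition, neither operator actually needs the descending eigenvalue ordering used in conventional principal component projections.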
4.2.3 Sample Selection for Principal Component Analysis

The most obvious candidate for application of the principal component analysis is the covariance of the entire population. This would require less computation than use of multiple smaller pools. However, the use of the entire population would allow a single outlier to potentially dominate the variance measures and therefore the axial alignment. Further, using multiple smaller samples allows more stochastic behavior in the sampling process (under the assumption that it is better to be correct for part of each generation than to be wrong for one or more entire generations). Preliminary empirical testing demonstrates that using the entire population for eigenvector analysis of the covariance matrix provides less effective search in the majority of situations than using smaller random samples from the population.

If we decide to use a subsection of the population for operator application, the decisions of sample size and the method of selection come into play. Clearly the sample size must scale relative to the population size in order to avoid singularities (resulting in unfixed or free axes) in the covariance matrix as much as possible. A minimum sample size of at least d + 1, where d is the dimensionality of the problem space, is necessary; however, we suggest a sample size of at least 2d.

The simplest and least biased form of selection is to select uniformly from the population without replacement (replacement would allow multiple instances of the same point, which would increase the likelihood of a singularity). Alternately, we could perform some form of tournament or other performance-based selection for pool entry; however, this would likely compress the location of the pool members and again increase the likelihood of reaching singularity as the population converges. Similarly, we would prefer to apply some form of localization to the pool selection, allowing points "nearer" to the targeted parent solution(s) to have a better opportunity to participate in axis selection. Yet this would add additional cost and potentially increase the number of singularities. Use of uniform random sampling is the least biased of these options and should provide a sufficient basis for our current work. Investigations into other forms of selection for pool membership are left to future work.

4.2.4 Difficulties with Relative Under-Constraint and Excessive Freedom

An anticipated drawback in the use of principal component analysis is that the analysis may become artificially constrained under conditions of excessive parameter freedom. Specifically, suppose that a given problem parameter has no impact on the evaluative value of a given solution. This parameter is then free to assume any value. Initially, this should not be a great concern, as the distribution of this parameter will be random. However, as the evolutionary simulation progresses, it is possible that the parameter would drift toward alignment with one or more parameters. If the variance of this parameter is still fairly large, which might be expected since no selection pressure is being directly applied toward convergence of this value, an anomalously large covariance may be created. Since the principal component analysis fixes the new basis such that variance along successive dimensions is maximized, it is likely that one of the early dimensions will include this artificial relationship. In addition, all subsequent axes will be constrained to be orthogonal to this artificial relationship, resulting in a potentially skewed basis. Note that the same analysis should apply, to a degree, to landscapes which suffer from relative under-constraint (i.e. the expected contribution of a given parameter over the initial range is exponentially smaller than that of another). However, the actual performance of these systems may depend upon the relative speed of convergence for these parameters, since slower convergence allows more time for drift. None of the testbed problems outlined in Chapter 2 specifically incorporates free parameters. However, several show varying degrees of relative under-constraint.

4.2.5 PCGA System Parameterization

For the PCGA system used to obtain the empirical results found in this section, both PCGM and PCX are employed, unless otherwise specified. Single element elitism is used (the best parent is directly copied into the child generation), and all other elements in the child population are produced through the combined application of PCX and PCGM. Unless otherwise specified, the scale factor, S, for the PCGM operator is set to 3 (selected because it directly matches the amount of variance injected with a similar BLX-0.5 application). Selection is tournament selection where two individuals are selected uniformly at random (with replacement, which allows a non-zero probability for the least fit member to survive) from the previous population and the fittest individual of the pair is selected as a parent solution. The default population size is 200, and the default pool size for both operators is 2d, where d is the dimensionality of the problem space.

4.3 Demonstration of Rotational Bias in Standard Approaches

Salmon [Salmon 96] has previously demonstrated the rotational bias in CHC [Eshelman 1993], and how EP shows a far lesser degree of performance degradation over rotated landscapes. The following analysis reproduces and extends the work of [Salmon 96], including analysis of Fast EP and the proposed PCGA system.

4.3.1 BLX-α Rotational Bias

Table 4.3 provides an empirical comparison of a BLX-0.5 GA system under rotation and without rotation. Interestingly, the analysis demonstrates that the BLX-0.5 operator actually performs consistently better for a number of functions when the landscape is presented in a rotated manner. The Clover & Cross and Yip & Pao functions especially exhibit enhanced performance when rotated. Analysis of the Yip & Pao function shows that rotation can bring its local optima into alignment with the rotated axes (i.e. movement along the rotated axes is more likely to move from optimum to optimum). No similar explanation presents itself for the increased performance on the Clover & Cross function under rotation. The pattern of these two functions demonstrating enhanced performance under rotation remains consistent across all the systems tested here, except PCGA.
In addition, all subsequent axes will -213 - be constrained to be orthogonal to this artificial relationship, resulting in a potentially skewed basis. Note that the same analysis should apply to landscapes which suffer from relative under-constraint (i.e. the expected contribution of a given parameter over the initial range is exponentially smaller than that of another) to a degree. However, the actual performance of these systems may depend upon the relative speed of convergence for these parameters, since slower convergence allows more time for drift. None of the testbed problems outlined in Chapter 2 specifically incorporates free parameters. However, several show varying degrees of relative under-constraint. 4.2.5 PCGA System Parameterization For the PCGA system used to obtain the empirical results found in this section, both PCGM and PCX are employed, unless otherwise specified. Single element elitism is used (the best parent is directly copied into the child generation), and all other elements in the Child population are produced through the combined application of PCX and PCGA. Unless otherwise specified, the scale factor, S, for the PCGM operator is set to 3 (selected because it directly matches the amount of variance injected with a similar BLX- 0.5 application). Selection is tournament selection where two individuals are selected uniformly at random (with replacement, which allows a non-zero probability for the least fit member to survive) from the previous population and the fittest individual of the pair is selected as a parent solution. The default population size is 200, and the default pool size for both operators is 2d, where d is the dimensionality of the problem space. -214 - 4.3 Demonstration of Rotational Bias in Standard Approaches Salmon [Salmon 96] has previously demonstrated the rotational bias in CHC [Eshelrnan 1993], and how EP shows a far lesser degree of performance degradation over rotated landscapes. The following analysis reproduces and extends the work of [Salmon 96], including analysis of Fast EP and the proposed PCGA system. 4.3.1 BLX-O Rotational Bias Table 4.3 provides an empirical comparison of a BLX-0.S GA system under rotation and without rotation. Interestingly, the analysis demonstrates that the BLX—0.5 operator actually performs consistently better for a number of functions when the landscape is presented in a rotated manner. The Clover & Cross and Yip & Pao functions especially exhibit enhanced performance when rotated. Analysis of the Yip & Pao function shows that a rotated alignment allows for alignment of local optima in an aligned fashion (i.e. movement along the rotated axes is more likely to move fiom optima to optima). NO similar explanation presents itself for the increased performance on the Clover & Cross function under rotation. The pattern of these two functions demonstrating enhanced performance under rotation remains consistent across all the systems tested here, except PCGA. -215 - sees-8.2.- 38582 e8 ease: .85- 5 033m .5 seated-=8 bodes-eta.— n... 2.3. m comma-N- o axle-8.33m: «abound- FBw 3.- _oaom-scow omnmRm. o 89% omrwmlmmd cho>>l 8 580m oobowp comma-mm; dxm 0282-. 08 F 34mm 8% oovmmmmé woo 93:00 chow Sod ooooww VEDA-=30 8-3:...- oobomv _mzcocoaxw mums 8%.? 80F win. oomomomd oooomv _o-am 5855:5- is 0.888- ooome 888N- Elooemmmm- 88..- 00550; a .8: 585m coo: w E's c.8233 8o8 c.888- moi-898$ m 8.2 eta-om oommmmm. F J o o o o o o o o o Egg; e 8o a a; Egafilgig 88m 8888 e ease-am EEEEEE 88.. 
4.3.2 EP and Fast EP Rotational Bias

Table 4.4 and Table 4.5 compare standard EP and Fast EP systems under rotated and non-rotated landscapes. Interestingly, both the EP and Fast EP systems demonstrate more rotational bias than the BLX-0.5 system. Fast EP shows increased rotational bias, both in the magnitude of the performance differences as well as the number of functions where a statistically significant performance difference is observed, based on ranksum values. Again we see that the Clover & Cross and Yip & Pao functions consistently demonstrate enhanced performance under rotation.

A potential explanation for the rotational biases displayed here is that the high levels of elitism in these systems may effectively allow a form of inductive search, where each parameter is fine-tuned individually while the others are held relatively constant. Inductive search typically proceeds best when parameters are relatively independent. (Rotation of the axially aligned function space may also be seen as recasting the encoding such that individual parameters become less independent.) This would be consistent with the observation that Fast EP enjoys a greater performance boost, in that the larger mutative jumps provided by the Cauchy samples would potentially allow quicker tuning of individual parameters. Note, however, that there is insufficient evidence to completely substantiate this hypothesis, and the actual cause of this rotational bias is likely to be much more complex. This subject bears further scrutiny in the future; however, for now we simply conclude that both EP and Fast EP systems demonstrate considerable rotational bias.

[Table 4.4 Performance Comparison of EP under Rotated and Non-Rotated Presentation: per-function Best, Evals, Mean, and Ranksum values are not legible in the source scan.]
[Table 4.5 Performance Comparison of Fast EP under Rotated and Non-Rotated Presentation: per-function Best, Evals, Mean, and Ranksum values are not legible in the source scan.]

4.3.3 Principal Component GA Rotational Bias

In examination of Table 4.6, it is clear that the use of principal component guided operators provides the least amount of bias under rotation of all systems examined. Those functions which still revealed a rotational bias were those which provide either strong trenches of attraction along the aligned axes, such as the Trapezoid Cross and Clover & Cross functions, or those which provide independent, relatively under-constrained parameters across the non-rotated axes. (Interestingly, the Clover & Cross function demonstrates increased performance in its non-rotated presentation for this system, in opposition to the results for all other tested systems.) Only the Schaffer Modification 1 function demonstrates increased performance under rotation. Given that this function presents a rotated ellipsoid structure, a given random rotation may possibly provide better alignment to the axes of this hyper-ellipsoid. The increased performance on the exponential and inverse exponential functions seems to reinforce this conclusion. Therefore, we can conclude that PCGA does not show significant rotational bias except in cases of functions with hyper-ellipsoid shaped isobars, and long axially aligned trenches. Comparison with the results from other system tests demonstrates that PCGA does effectively present a greatly reduced level of rotational bias. Therefore, we see that the level of guidance extracted from the population remains fairly consistent under both rotated and non-rotated presentations. An open question remains as to whether this guidance is effectively equivalent to a random rotational presentation. This issue is addressed in section 4.6.

[Table 4.6 Performance Comparison of PCGA under Rotated and Non-Rotated Presentation: per-function Best, Evals, Mean, and Ranksum values are not legible in the source scan.]
4.4 Empirical Relative Efficiency Comparison of PCGA

The following comparisons show the relative effectiveness of the PCGA search system as compared to EP, Fast EP, and BLX-α on the test problems listed in Chapter 2. The intention here is not to demonstrate dominance of one technique or another, but rather to simply demonstrate that the PCGA system is relatively as strong as the others, and in some situations outperforms the other systems. Note that, as previously discussed in section 4.1.1, all test functions being evaluated here are presented in rotated and translated fashion to the individual search systems.

4.4.1 Comparison to EP and Fast EP

Analysis of the data in Table 4.7 and Table 4.8 shows that PCGA is capable of favorable performance on a number of functions. If we attempt to categorize the functions on which PCGA seems to provide enhanced performance, we see that they are largely unimodal functions, or functions with strong unimodal components. Conversely, EP and Fast EP provide better performance on the third and fourth modified versions of Schaffer's function, the Multi-modal Spiral function, and the Double Cos and Worms functions, all of which are strongly multi-modal functions with very low signal-to-noise ratios in terms of information pointing toward the global optima.

The apparent reason for the reduced relative performance of PCGA on these functions is the lack of strong feedback toward a single optima. This allows the system to spread the population further and further without significant convergence toward a single optima. This demonstrates a potential danger of the effective "feedback" of variance through the PCGA operators. Under circumstances where selection is not likely to significantly reduce the population variance, these operators are likely to continue increasing the variance of the population. This hypothesis has been verified through preliminary measurements of the population variance on these landscapes over time. Note that this feature is not necessarily a negative one. When the landscape is sufficiently chaotic, it may be more useful to continue exploration of the space than to force exploitation of the single most promising peak. However, it is possible for PCGA to completely fail to converge under such circumstances. Ideally, we desire to have more direct control over the balance of exploration and exploitation. This is a topic we will explore again in section 4.10.

The results in Table 4.7 and Table 4.8 demonstrate that use of population sample information is often as effective as or more effective than use of a self-adaptive mechanism. However, we cannot determine whether the enhancement is provided by use of the population-sampled variance for the mutative step size or by the ability to bypass the bases of encoding.
Still, these results show that using the population for guidance is an equally valid approach under certain circumstances.

[Table 4.7 Performance Comparison of EP and PCGA: per-function Best, Evals, Mean, and Ranksum values are not legible in the source scan.]

[Table 4.8 Performance Comparison of Fast EP and PCGA: per-function Best, Evals, Mean, and Ranksum values are not legible in the source scan.]

4.4.2 Comparison to BLX-α GA

Given the similarity in the approach of BLX-α and PCGA, it is not surprising that both systems demonstrate similar performance on the given test functions. Table 4.9 illustrates the effective performance differences between these two systems. The individual graphic performance analyses for individual test functions in section 4.13 further reinforce their similarity, since the PCGA and BLX-α curves are very similar in many cases. While several possible trends appear in the individual strengths of these two systems, there do not seem to be clear characteristics or trends which allow us to categorize the general relative preferences of either system. Coupled with the previously presented data on rotational bias, we can conclude that PCGA achieves a similar level of performance with less rotational bias on these test cases. Conversely, this implies that BLX-α provides enhanced performance under circumstances where parameters are known to be fairly independent (e.g. on the non-rotated test cases).

[Table 4.9 Performance Comparison of BLX-α GA and PCGA: per-function Best, Evals, Mean, and Ranksum values are not legible in the source scan.]
4.5 Empirical Evaluation of PCX and PCGM in Isolation

Studying the PCX and PCGM operators in concert provides some measure of validation that the operators offer similar performance to standard evolutionary computation approaches with less apparent bias, albeit with potentially faster convergence. It is entirely possible that the majority of the behavior of this system is provided by only one of the two PC operators. By comparison of the performance of PCGA systems with one of the two operators disabled, we can determine whether the performance of the system is provided primarily through PCX or PCGM, or as an emergent property of the two when used in concert.

From evaluation of Table 4.10 and Table 4.11, we can see that neither the PCGM mutation operator nor the PCX operator working alone provides as much search capability as the combination of the two. All systems use a scale factor of 3 for the PCGM mutation operator range. Interestingly, direct comparison between PCX-only and PCGM-only performance, as demonstrated in Table 4.12, shows that PCX performs better on twice as many of the test functions as the PCGM operator. This makes some sense, as the level of exploration under PCX should be relative to the distance between the parents, while the level of exploration under PCGM will be one third of that when the scale factor is 3.

Consistent with the previous discussion (Section 4.4.1) on the divergent behavior of PCGA on multi-modal landscapes with relatively weak global bias, we note that the PCGM-only and PCX-only systems tend to outperform the combination on these functions. This is the expected result, since these systems provide reduced variance addition, and therefore have a lower probability of overwhelming the variance-reducing effects of selection.

[Table 4.10 Performance Comparison of PCGA and a single-operator (PCX-only or PCGM-only) variant: per-function Best, Evals, Mean, and Ranksum values are not legible in the source scan.]
[Table 4.11 Performance Comparison of PCGA and the remaining single-operator (PCX-only or PCGM-only) variant: per-function Best, Evals, Mean, and Ranksum values are not legible in the source scan.]

[Table 4.12 Performance Comparison of the PCX-only and PCGM-only variants: per-function Best, Evals, Mean, and Ranksum values are not legible in the source scan.]
4.6 Empirical Comparison of Principal Component Operators and Randomly Rotated Operators

A question arises as to whether the rotation applied by the principal component analysis of the covariance matrix of the population sample provides useful search information or simply an expensive form of basis randomization. In order to evaluate the value of using the principal component basis, we can compare the performance of an identical system where the basis set is selected at random. To accomplish this, we replace the population sample in the principal component analysis phase with a set of randomly selected points. The variance of the actual population sample is then computed across the basis computed during the principal component analysis (i.e. the sample is rotated into the random basis and the variance across each axis is calculated). Both systems use the measured population variance to determine mutative step size. The only difference between the two systems is the method in which the basis set for operator application is derived.

The data in Table 4.13 demonstrates that using the principal component analysis of an actual population sample provides better performance characteristics on nearly all of the tested functions. We can safely conclude that the use of population guidance does not provide a more expensive form of basis randomization, but rather provides actual useful information in determining future search directions for a majority of these functions.

[Table 4.13 Performance Comparison of a Randomly Rotated Basis GA and PCGA: per-function Best, Evals, Mean, and Ranksum values are not legible in the source scan.]

4.7 Evaluation of the Effect of Scale on PC Operators

Using the direct variance of the population as the basis for the variance of the mutative distribution effectively doubles the variance of the portion of the population which is mutated. If this mutation is too large to be compensated for through selection, the population will fail to converge. Ideally, this mutation size should be calibrated directly against the level of selection pressure, a concept which we more fully address in section 4.10. Two possible methods for reducing the size of the mutation distribution are to reduce the rate of application, i.e., mutating only a portion of the population each generation, and rescaling the mutative distribution. The first option has the tendency to increase the effective level of elitism as the population converges, greatly modifying the character of the search process.
The second approach has the benefit of maintaining more even mutation while reducing the mutation level. The following data examine several alternative mutative scales and their effects on system performance. The presented values are the average best performance of 90 out of 100 runs (the 5 best and 5 worst results are discarded). The shadings indicate which (if any) of the systems demonstrates a statistically significant ranksum value when compared across all other represented systems.

[Table 4.14 Effect of Mutative Scale on PCGA Performance: average best values for the tested scale factors are not legible in the source scan.]

As an interesting note, given that the formula for the variance of a uniform distribution is (b − a)^2 / 12, the effective average variance for BLX-0.5 is then 2^2 / 12 = 1/3, when the distance between the two parent solutions is 1. Similarly, if the average variance is 1, a scale value of 3 would provide the same mutative variance size (however, centered about a parent, not the center). So it is possible that the relative scale of the mutative size is the main factor in the success of BLX-0.5.

4.8 Evaluation of the Effect of Sample Size on PC Operators

In exploration of the tunable features of the PCGA system, the previously raised question (Section 4.2.3) of the optimal sample size for the required principal component analysis presents itself. The data in Table 4.15 demonstrates the effective relative performance of identical PCGA systems with various pool sizes. All tests were performed using a population size of 200 and a scale factor of 3. The presented values are the average best performance of 90 out of 100 runs (the 5 best and 5 worst results are discarded). The shadings indicate which (if any) of the systems demonstrates a statistically significant ranksum value when compared across all other represented systems. Note that for the majority of the test cases evaluated, a smaller pool size is preferential.
A possible reason for this is that a smaller pool is less likely to contain a population outlier, and therefore will have smaller variance measures. This seems especially likely given that those functions which demonstrate enhanced performance include those which we have previously shown to react favorably to a reduction in variance addition in PCGA systems.

For the majority of the tested functions, there is no statistical difference in the performance of these variants. Further, the magnitude of the performance difference on the remaining functions is relatively small. Nonetheless, this data supports the use of a minimal pool size within the restrictions of the singularity avoidance issues discussed in Section 4.2.3.

[Table 4.15 Effect of Sample Pool Size on PCGA Performance: average best values for the tested pool sizes are not legible in the source scan.]

4.9 Evaluation of the Effect of Population Size on Population Relative Operators

An expectation of evolutionary computation systems is that as the population size increases, the level of diversity is maintained for a longer duration. The typical result is that performance increases as population size is increased for an equal number of generations. However, since each generation requires more evaluations when a larger population is used, the net gain for an equal number of evaluations will not favor the largest population sizes either, unless the search system performs no better than random search (e.g. a needle-in-a-haystack function, or a completely random function). EC systems which employ population relative operators, such as the BLX-α and PCGA systems being studied here, are dependent on population relative statistics to determine their actions.
Therefore, such systems may be especially sensitive to the choice of population size. For example, a large population tends to maintain diversity longer, which equates to lower levels of variance loss through selection. If selection becomes sufficiently weakened in comparison to the variance addition operators, the population may fail to converge. In this section, we present empirical test results from PCGA and BLX-α systems with various population sizes.

4.9.1 PCGA Population Sensitivity

Table 4.16 presents the results of identical PCGA systems operating with population sizes of 50, 100, 200, and 400 respectively. Each system executed an identical number of evaluations, and the average best result over 90 of 100 runs (discarding the 5 best and 5 worst results) for each system on each function is reported. Systems demonstrating statistically significant ranksum values are highlighted.

PCGA systems demonstrate definite sensitivity toward population size selection. The magnitude of the performance differences and the number of functions which show significant change are both far greater than in the previous comparison of modified pool sizes. The population size 200 variant shows clear dominance over the other selections. Further, tests which run the larger 400 population size variant for the same number of generations (therefore using twice the number of evaluations) continue to show markedly and significantly reduced performance. The reason for this performance reduction has been shown to be a reduction in the relative selection strength, through measurement of average relative mutation magnitudes between the two systems (i.e., the 400 population size system does not converge as well overall). While we expect any EC system to show an eventual preference for a given population size on a given landscape in terms of optimal search efficiency, the analysis of PCGA performance here indicates that the system has an additional level of sensitivity. This result implies that selection of an appropriate population size may be extremely important when using PCGA, and it may extend to other systems which employ population relative operators as well.

[Table 4.16: average best performance of identical PCGA systems with population sizes of 50, 100, 200, and 400]
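For reference, the following sketch reconstructs the reporting protocol used for these tables; it is an assumption based on the description above rather than the author's code. It computes the trimmed average of the best-of-run results (discarding the 5 best and 5 worst of 100 runs) and a pooled rank-sum score per system, to which a significance test such as the Wilcoxon rank-sum test can then be applied.

    # Assumed reconstruction of the reporting protocol, not the author's code.
    import numpy as np
    from scipy.stats import rankdata

    def trimmed_mean_best(results, drop=5):
        # Average of the middle 90 of 100 best-of-run error values.
        ordered = np.sort(np.asarray(results))
        return ordered[drop:len(ordered) - drop].mean()

    def pooled_rank_sums(results_by_system):
        # Pool every system's runs, rank them jointly (rank 1 = lowest error),
        # and return the total rank credited to each system.
        names = list(results_by_system)
        pooled = np.concatenate([np.asarray(results_by_system[n]) for n in names])
        ranks = rankdata(pooled)                  # average ranks on ties
        sums, start = {}, 0
        for n in names:
            k = len(results_by_system[n])
            sums[n] = ranks[start:start + k].sum()
            start += k
        return sums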
4.9.2 BLX-α GA Population Sensitivity

The analysis of the PCGA population sensitivity raises the question of whether this is a standard feature of all population relative operators. Table 4.17 demonstrates the same test using the BLX-α GA system. Note that this system does not appear as directly sensitive to population size, as increasing the population (at least to the 400 per generation level) does not seem to adversely affect performance. Upon closer examination, the BLX-α system only appears to exhibit population limiting effects on two of these test functions, Dynamic Control and Rastrigrin. We cannot establish a direct correlation between population relative operators and population size sensitivity, though the existence of such a relationship is apparent with PCGA systems. Possibly other factors, such as the shape of the search distribution (normal vs. uniform), provide a mitigating influence. No general conclusion can be reached for the approach of population relative operators at this juncture.

[Table 4.17: average best performance of identical BLX-α GA systems with population sizes of 50, 100, 200, and 400]

4.10 Loss Sensitive Operators for EC

As Sections 4.7 and 4.9 demonstrate, one of the difficulties with population relative operators is adapting the scale of the mutation appropriately.
An alternative approach would be to scale the amplitude of the mutative response to the magnitude of loss inflicted by the selection process. We will categorize such approaches as variance recapture, or VR, operators. In general, a VR mutation operator will measure the variance of the previous generation, σ²_{g-1}, and the variance of the selected parent pool, σ²_p, for the current generation (note that this assumes that the entire selected parent pool can be determined before operator application begins, which requires a minor reorganization of some standard EC systems). A reactive VR mutation applies a mutative random sample to each produced child such that the total variance of the next population will be the variance of the parent population, σ²_p, plus some percentage, t, of the variance loss (σ²_{g-1} - σ²_p). This is achieved by selecting the mutative distribution such that the variance of the mutation is proportional to t (σ²_{g-1} - σ²_p). Assuming that the two samples are independent, the sum of samples from the two distributions has a variance equal to the sum of the variances of the two distributions. Note that this implies use of a mutative distribution with finite variance (therefore we cannot directly apply a Cauchy mutative distribution without violating the design principle of VR operators).

4.10.1 Variance Recapture and Convergence

The goal of most evolutionary computation systems is to locate the global optima of a given function through search. However, since we cannot know the location of the global optima in advance, most EC systems make the assumption that the global optima will be located in the neighborhood of other near-optimal solutions. Therefore a secondary goal of most evolutionary computation systems is gradual convergence on the perceived area of the best solutions in order to increase the probability of finding the global optima. In terms of VR operators, this implies that the percentage of recapture must, over the long term, be less than 100% if we expect to allow for convergence. (That is, if we force more variance into the population than selection is capable of eliminating, the population will be prevented from converging.) Note that it is possible for short-term bursts of recapture to target greater than 100% of the lost variance, which allows for the possibility of annealing-type cooling curves; however, as this greatly increases the potential scope and level of parameterization, we will restrict ourselves here to fixed-target VR operators and leave exploration of dynamic VR targeting to future work. A side effect of the decision to use fixed VR targets is that the initial range should encompass the global optima, since it may be difficult for the search to move very far outside this boundary. While the derivation of variance recapture may seem intuitive, it appears to run counter to standard EC convention. Typical EC systems tend to achieve large amounts of convergence early in the evolutionary process, tapering off as diversity drops (a necessary side effect of convergence). In contrast, VR operators eschew early convergence in favor of a slower, steady-state convergence rate, in the hope that the accompanying added diversity provides greater long-term payoffs.
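A minimal sketch of a reactive VR mutation for a real-valued, EP-style system follows. It illustrates the description above and is not the author's implementation; in particular, the per-dimension form and the (c + 1)/c correction for c children per parent are assumptions consistent with that description rather than the exact formula used in the experiments.

    # Minimal sketch of a reactive variance-recapture (VR) mutation.
    # Assumptions (not the author's exact formula): per-dimension variances and
    # a (c + 1)/c correction so that the combined parents-plus-children pool
    # carries the target variance.
    import numpy as np

    def vr_mutate(parents, prev_population, t, children_per_parent, rng):
        # parents: (mu, d) selected parent pool; prev_population: (N, d)
        var_prev = prev_population.var(axis=0)      # sigma^2_{g-1}, per dimension
        var_par = parents.var(axis=0)               # sigma^2_p, per dimension
        loss = np.maximum(var_prev - var_par, 0.0)  # variance removed by selection
        c = children_per_parent
        mut_var = t * loss * (c + 1) / c            # recapture fraction t of the loss
        children = np.repeat(parents, c, axis=0)
        children = children + rng.normal(0.0, np.sqrt(mut_var), size=children.shape)
        return children

Because the added variance is tied to the measured loss, a fixed target t below 1 still permits net convergence over the long run.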
Additionally, the steady-state nature of VR systems allows for a form of resource scheduling, in that an experimenter may determine how much time is to be spent in exploration, and therefore how slow the convergence rate is allowed to be (provided selection pressure is sufficient to supply at least that level of variance reduction).

VR operators should be good candidates for both EP and GA systems (specifically, systems with '+' and ',' style selection); however, in preliminary empirical testing VREP systems greatly outperformed VRGA systems. Upon inspection it appears that the variance loss due to selection maintains better stability under the high levels of elitism possible with +-selection. With ,-selection, the reaction of VR operators appears to directly affect the level of variance loss due to selection in the next generation; the population therefore seems to oscillate between generations with high exploration and low loss and generations with high selection and low exploration. The VR systems used for the results presented here are exclusively VREP systems.

4.10.2 Global versus Dimensional Variance Targets

The concept of variance recapture seems to provide a choice between using an overall global population variance target (i.e., dimensionless variance) or individual variance targets for each parameter. If the expected parameter ranges are well known, then individual parameter variance targets seem ideal. However, by forcing the variance of a given parameter to remain high, we may be unduly retarding the progress of the search. For example, in relatively under-constrained problems where one parameter has an exponentially larger effect on the evaluative value of a given solution than another parameter, forcing one parameter to maintain a high variance can effectively mask the contributions of other parameters for a long time. Nonetheless, individual parameter variance targets provide better search characteristics than use of a single global target.

The difficulty in employing a global variance goal becomes apparent if we consider problems with free parameters. If a given parameter is free, making no contribution toward the evaluative value of a solution, then we might expect the variance of this parameter to remain larger than that of parameters which receive some form of direct selective pressure. Further, we should expect that the level of variance loss across this dimension would be less than that on other dimensions, again due to the lack of direct selective pressure. The cumulative effect produced is that the population maintains its overall variance by shifting randomness out of the other dimensions and into the free dimension. We might expect this situation to provide very poor search characteristics in problems which are relatively under-constrained. In such situations, variance is shifted out of highly sensitive parameters and into less sensitive dimensions. The less sensitive dimensions are then swamped with excess variance, which slows the effectiveness of the search. While this mechanism seems to provide a method to measure the relative sensitivity of the various dimensions, it does not directly provide an effective mechanism for directing the magnitude of mutative steps.

4.10.3 Difficulty with Non-fixed Axes

While ideally we would like to employ both the technique of variance recapture and population guided basis selection, it is difficult to formulate a method whereby variances across one set of axes are transferred as targets across another basis.
The entire covariance measure could be transferred; however, using this rotated covariance matrix has the effect of ignoring the new basis. Therefore, the combination of these methods seems to require use of a single global variance measure. However, given the severe limitations of global variance targets in the presence of free or under-constrained parameters as discussed previously, the combination of these methods does not appear to be easily rectifiable. Further study in this area is left for future work.

4.11 Empirical Relative Efficiency Comparison of VREP

For the purposes of empirical comparison, a VREP system was created which operates identically to the standard EP system previously employed, with the elimination of the self-adaptive parameters. The self-adaptive mutation is replaced with a VR mutation operator which fixes the variance of the normal sampling applied on each dimension as proportional to the variance recapture target percentage, t, the variance loss (σ²_{g-1} - σ²_p), and the number of children produced per parent, c, by the formula in Equation 4.5, which results in a target +-population with the required level of variance.

[Equation 4.5: formula for the VR mutation distribution variance calculation]

All other parameters for the VREP systems were maintained identically to the EP and fast EP systems previously evaluated (population of 100, 6 children per parent, EP tournament selection with 10 votes per solution) unless otherwise specified. Also, as with all other systems evaluated in this chapter, performance was measured over the same initial 100 populations and the same 100 random rotations.

4.11.1 Comparison to EP

As previously stated, the VREP system evaluation is parallel to the EP evaluation. With the exception of the form of mutation, there is very little difference. The same code and testing environment were used in both cases. Therefore, the results in Table 4.18 should reflect as unbiased an evaluation of the relative effectiveness of the two mutation operators as is possible within the given set of test problems. Note that, as always, we must be careful with any extrapolations we attempt to make from this evaluation, since NFL dictates that all search techniques must necessarily be equal over the set of all possible problem instances. The actual VREP system tested used a 99.8% recapture target; hence the label VREP-0.998. As the data in Table 4.18 demonstrate, the VREP-0.998 system greatly outperforms EP on a majority of the test functions.

[Table 4.18: performance comparison of the EP and VREP-0.998 systems]
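Returning to Equation 4.5 above, one derivation consistent with its stated role is the following; it is offered as an assumed reconstruction rather than a quotation of the original formula. Treat each child as its parent plus an independent normal perturbation of variance σ²_m. A plus-population formed from one parent and c mutated children per parent then has variance of approximately σ²_p + (c/(c+1)) σ²_m (ignoring between-group mean differences), and equating this to the target σ²_p + t(σ²_{g-1} - σ²_p) gives

$$\sigma_m^2 \;=\; \frac{c+1}{c}\, t \left( \sigma_{g-1}^2 - \sigma_p^2 \right).$$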
4.11.2 Comparison to Fast EP

Again, in order to provide a fair comparison against general EP-style mutative techniques, we include a direct evaluation of VREP and Fast EP using Cauchy mutative sampling. As in the evaluation of PCGA, we note that a fairer comparison would be against a modified VREP employing Cauchy sampling within its VR mutation; however, as discussed in the introduction to this section, use of the Cauchy distribution violates the spirit of the design of VREP. Nonetheless such a system may be feasible (after all, fast EP does not seem to suffer from divergence even while employing an infinite variance mutative distribution). Such exploration is again left to future work, and we herein present a direct comparison of Fast EP and VREP using a normal mutative sampling distribution. As with the standard EP approach, the VREP system greatly outperforms Fast EP on a clear majority of the test functions. In fact, Table 4.19 demonstrates that Fast EP outperforms VREP only on the same functions that the standard EP approach does. This may indicate a particular match between the EP methodology and these functions. Additionally, these three functions, Clover and Cross, Schaffer Modification 3, and Schaffer Modification 4, are among those on which the randomly rotated version of PCGA was not outperformed by PCGA, although the relationship between the two occurrences, if any, is unclear.

[Table 4.19: performance comparison of the Fast EP and VREP-0.998 systems]

4.11.3 Comparison to BLX-α GA

Table 4.20 demonstrates that the BLX-α GA and VREP approaches each outperform the other on roughly half of the tested functions. The BLX-α GA seems to outperform mostly on functions with a strongly unimodal component, while VREP tends to outperform on complex multi-modal landscapes. This outcome is logically consistent with the design of the VR mutation operator, in that VREP intentionally delays convergence. Therefore BLX-α is able to converge more quickly on strongly unimodal landscapes, while VREP maintains more diversity and performs more exploration, resulting in the location of better optima.
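For reference, the BLX-α operator referred to throughout is assumed here to follow the standard blend-crossover definition, sketched below; this is an assumption about the operator's form, not necessarily the exact implementation evaluated in this chapter. The point relevant to this comparison is that its sampling interval, and hence its effective mutative variance, contracts automatically as the parents draw together.

    # Standard BLX-alpha blend crossover (assumed definition, not necessarily
    # the exact implementation evaluated in this chapter).
    import numpy as np

    def blx_alpha(p1, p2, alpha, rng):
        # p1, p2: (d,) real-valued parents; returns one (d,) child sampled
        # uniformly from the parents' interval widened by alpha on each side.
        lo, hi = np.minimum(p1, p2), np.maximum(p1, p2)
        spread = hi - lo
        return rng.uniform(lo - alpha * spread, hi + alpha * spread)

Under this definition the sampling interval has length (1 + 2α)d for parent separation d, so for α = 0.5 and d = 1 the variance is 2²/12 = 1/3, matching the figure noted earlier in this chapter.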
4.11.4 Comparison to PCGA

The data in Table 4.21 demonstrate a strong similarity to the comparison of the BLX-α GA and VREP, except that the PCGA system outperforms VREP on a slightly larger number of functions. Similarly, we again see that VREP seems especially well suited to the most complex multi-modal search landscapes.

[Table 4.20: performance comparison of the BLX-0.5 GA and VREP-0.998 systems]

[Table 4.21: performance comparison of the PCGA and VREP-0.998 systems]

4.12 Effect of Recapture Percentage on VREP Efficiency

As with the PCGA operators, it is instructive to evaluate how the "tunable" parameters of the operator affect system performance. In this case, a crucial parameter seems to be the percentage target for variance recapture. Table 4.22 presents results for 4 selected recapture targets. The systems tested were identical in all aspects except for the recapture target. The presented values are the average best performance of 90 out of 100 runs (the 5 best and 5 worst results are discarded). The shadings indicate which (if any) of the systems demonstrates a statistically significant ranksum value when compared across all other represented systems. The 99.8% variant seems to outperform all other tested systems over the majority of tested functions.
Note that the functions which responded favorably to a reduced recapture target are those which also tended to be dominated by EP and Fast EP, although there is no clear connection apparent (other than the possibility that EP and Fast EP can more rapidly reduce their mutative magnitudes).

[Table 4.22: effect of the recapture target percentage on VREP performance across four selected targets]

4.13 Effect of Population Size on VREP Efficiency

As with PCGA and BLX-α in Section 4.9, in this section we explore the effects of population size on VREP performance. Table 4.23 demonstrates the performance differences for 4 different population size choices. The data again support a strong sensitivity to the choice of population size. Since the recapture target is relative to the operation of selection, and the effect of selection is dependent on the population size, the VR operator is sensitive to population size in a similar fashion to the PCGA operators. However, unlike the PCGA operators, the VR operator cannot cause direct population divergence as long as the target rate remains below 100%.

[Table 4.23: effect of population size on VREP performance across four population sizes]
4.14 Graphic Performance Comparison of Systems

In order to demonstrate the relative operation of the major systems tested in this chapter, we have provided the following graphs. The first of each pair of graphs shows a semi-log trace of the average best evaluation value found for 90 out of 100 runs (discarding the 5 best and 5 worst terminating cases). The second of each pair shows the relative ranksum value for the same 90 out of 100 runs at the given number of evaluations (i.e., if the runs had been terminated at that point, these would be the assigned ranksum values).

[Figure 4.1 Average Best Performance on Square: semi-log average error versus evaluations (x1000) for the EP, Fast EP, BLX-0.5, PCGA, and VREP systems]

[Figure 4.2 Ranksum Performance on Square: ranksum (x1000) versus evaluations (x1000) for the same five systems]

[Figure 4.3 Average Best Performance on Sphere]
[Figure 4.4 Ranksum Performance on Sphere]
[Figure 4.5 Average Best Performance on Schwefel's 1.2]
[Figure 4.6 Ranksum Performance on Schwefel's 1.2]
[Figure 4.7 Average Best Performance on Schaffer]
[Figure 4.8 Ranksum Performance on Schaffer]
[Figure 4.9 Average Best Performance on Schaffer Mod. 1]
[Figure 4.10 Ranksum Performance on Schaffer Mod. 1]
[Figure 4.11 Average Best Performance on Ring]
[Figure 4.12 Ranksum Performance on Ring]
[Figure 4.13 Average Best Performance on Trapezoid & Cross]
[Figure 4.14 Ranksum Performance on Trapezoid & Cross]
[Figure 4.15 Average Best Performance on Rosenbrock Saddle]
[Figure 4.16 Ranksum Performance on Rosenbrock Saddle]
[Figure 4.17 Average Best Performance on Schaffer Mod. 2]
[Figure 4.18 Ranksum Performance on Schaffer Mod. 2]
[Figure 4.19 Average Best Performance on Spiral]
[Figure 4.20 Ranksum Performance on Spiral]
[Figure 4.21 Average Best Performance on Ackley]
[Figure 4.22 Ranksum Performance on Ackley]
[Figure 4.23 Average Best Performance on Griewangk]
[Figure 4.24 Ranksum Performance on Griewangk]
[Figure 4.25 Average Best Performance on Clover & Cross]
[Figure 4.26 Ranksum Performance on Clover & Cross]
[Figure 4.27 Average Best Performance on Bohachevsky]
[Figure 4.28 Ranksum Performance on Bohachevsky]
[Figure 4.29 Average Best Performance on Rastrigrin]
[Figure 4.30 Ranksum Performance on Rastrigrin]
[Figure 4.31 Average Best Performance on Yip & Pao]
[Figure 4.32 Ranksum Performance on Yip & Pao]
[Figure 4.33 Average Best Performance on Schaffer Mod. 3]
[Figure 4.34 Ranksum Performance on Schaffer Mod. 3]
[Figure 4.35 Average Best Performance on Schaffer Mod. 4]
[Figure 4.36 Ranksum Performance on Schaffer Mod. 4]
[Figure 4.37 Average Best Performance on Multimodal Spiral]
[Figure 4.38 Ranksum Performance on Multimodal Spiral]
[Figure 4.39 Average Best Performance on FMS]
[Figure 4.40 Ranksum Performance on FMS]
[Figure 4.41 Average Best Performance on Exponential]
[Figure 4.42 Ranksum Performance on Exponential]
[Figure 4.43 Average Best Performance on Chain-Link]
[Figure 4.44 Ranksum Performance on Chain-Link]
[Figure 4.45 Average Best Performance on Double Cos]
[Figure 4.46 Ranksum Performance on Double Cos]
[Figure 4.47 Average Best Performance on Inverse Exponential]
[Figure 4.48 Ranksum Performance on Inverse Exponential]
[Figure 4.49 Average Best Performance on Worms]
[Figure 4.50 Ranksum Performance on Worms]
[Figure 4.51 Average Best Performance on Schwefel]
[Figure 4.52 Ranksum Performance on Schwefel]
[Figure 4.53 Average Best Performance on Dynamic Control]
Chapter 5
Conclusions

This chapter presents a summary of the general conclusions reached in the previous chapters. Potential areas for expansion of this work are also summarized.

5.1 Conclusions from Theoretical Evaluations

An informal taxonomy of testing functions was provided at the end of Chapter 2. Such a taxonomy could prove extremely useful in determining similarities and differences among operators and EC systems if it were better formalized. Even the limited grouping and analysis provided the impetus for creating a number of interesting test problems which proved useful for differentiating systems in Chapter 4.

A taxonomy of EC operators by their characteristics may provide a basis for understanding and predicting behavior under various circumstances. Chapter 3 presented an attempt at such a preliminary taxonomy. However, the proposed taxonomy did not provide adequate or correct behavior predictions for the systems studied in Chapter 4; therefore, either dynamic emergent behaviors prove stronger than individual operator characteristics, or the methodology of characterization proposed in Chapter 3 was flawed.

5.2 Conclusions from Empirical Evaluations

The central focus of the experiments in Chapter 4 was to demonstrate reduced behavioral change under invariant landscape modification. The experiments conclusively demonstrate that the proposed modification of operators to reduce bias successfully reduced the behavior changes induced by coordinate rotation. Further, and perhaps more importantly, the experiments in Section 4.6 demonstrate that using existing population bias for operator axis selection outperforms random rotation on a number of test problems. This implies that such population-sampling techniques may provide useful search information which is effectively ignored by self-adaptive and other non-population-based operators.

The performance of the PCGA system appears to be an emergent property of the combination of mutation and crossover operators. Therefore, both variance mixing (performed by crossover, which is variance neutral) and variance addition contribute to the search capabilities of these systems.

In many cases, self-adaptive mutation techniques appear more sensitive to a rotated presentation than population-relative techniques such as BLX-α and PCGA. The implication is that self-adaptive techniques may more closely mirror inductive search techniques; however, more study is required before such conjectures can be fully substantiated.
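Because these comparisons turn on presenting each system with a coordinate-rotated, isometrically equivalent version of a test function, a brief Python sketch of how such a rotated variant can be constructed may be useful. The helper names and the QR-based construction of the orthonormal transform are illustrative assumptions, not necessarily the procedure used in the Chapter 4 experiments.

    import numpy as np

    def random_rotation(dim, rng=None):
        # Random orthonormal matrix (a rotation or reflection of R^dim), taken
        # from the QR decomposition of a Gaussian matrix; column signs are fixed
        # so the result is uniformly distributed over orthonormal matrices.
        rng = np.random.default_rng() if rng is None else rng
        q, r = np.linalg.qr(rng.normal(size=(dim, dim)))
        return q * np.sign(np.diag(r))

    def rotated(f, rotation, center=0.0):
        # Evaluate f in rotated coordinates about `center`; the resulting
        # landscape is isometrically equivalent to the original.
        return lambda x: f(rotation @ (np.asarray(x, dtype=float) - center) + center)

    # Example: an axis-aligned ellipsoid and an isometrically rotated variant of it.
    ellipsoid = lambda x: float(np.sum(np.arange(1, len(x) + 1) * np.square(x)))
    rot = random_rotation(10, np.random.default_rng(0))
    ellipsoid_rot = rotated(ellipsoid, rot)
    x = np.full(10, 1.0)
    print(ellipsoid(x), ellipsoid_rot(x))

The two objectives share the same landscape up to rotation, but the rotated version is no longer separable along the coordinate axes, which is exactly the property that exposes coordinate-system bias in an operator.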
Population-relative approaches such as BLX-α and PCGA perform poorly when there are many small, scattered, nearly equivalent optima. This appears to be due to the implied additional parameter variance, which is not eliminated by selection; the mutative operator thus begins to overpower the variance-reduction effects of selection.

For the problems examined, PCGA is relatively insensitive to the choice of pool size, although empirically the smallest pool size proved most effective. PCGA is somewhat sensitive to the scale of the mutative operation relative to the population variance; in general, a scale of 1/3 appeared empirically to be the most generally effective for the test cases evaluated. However, PCGA is extremely sensitive to the population size. If the population size grows too large, the total population inertia appears to overpower the convergence strength of the selection operator. This situation is similar to the selection of the mutative scale and is related to the inability to match variance gain to variance loss during selection. Note that BLX-α does not appear to suffer the same level of difficulty, although it is not known why.

Both population-relative and self-adaptive mutation approaches generally lack the ability to control mutative step sizes so as to maintain balance with the selection operator. The Variance Recapture mutation operator, which works well within the EP framework, does provide such a mechanism. Empirical testing revealed results comparable to the other systems on a number of the tested functions. The Variance Recapture operator did not work well within the GA framework. This is likely because EP provides a higher level of selection pressure and allows for more elitism; these characteristics provide greater stability in the level of variance reduction from generation to generation.

5.3 Future Work

This work provides an initial examination of a number of interesting phenomena. The design and evaluation of experiments to further explore and verify the conclusions reached here could provide a long agenda for future exploration. A number of specific tangential explorations were also mentioned during this exposition; the following provides a summary of those areas.

The end of Chapter 2 provides a fairly detailed, if somewhat informal, analysis of the test functions in terms of various characteristics. This taxonomy of test cases would be much more useful if formalized. The underlying collection of test problems should also be evaluated and modified to provide a more balanced representation of those characteristics.

While Chapter 3 provides a classification of operators by their statistical operational characteristics, this taxonomy does not provide useful predictions of the performance differences found under empirical examination. Either the emergent dynamic properties of search systems render such classification fruitless, or a more complete and effective operator classification system is possible. The existing classification system may prove more effective if more consideration is given to the effects of dimensionality.

In Section 4.2.3, we suggested that alternate forms of pool selection may have merits over uniform random selection. Composition and evaluation of such selection schemes could provide an area of further research.

Section 4.3.2 proposes that the tendency of EP to exhibit more bias under rotation than the BLX-α and PCGA systems may imply a similarity between EP and inductive search. This is a serious contention which should be expected to invoke immediate and detailed study.

The relationship between scale and problem dimensionality is briefly alluded to in Section 4.7, and further exploration of it is left to future work. In fact, any number of multifaceted interactions among the various algorithmic effectors studied (population size, scale, etc.) could provide a basis for expanded study.

During discussion of the Variance Recapture operator, the use of alternate non-linear annealing schedules for recapture target selection was mentioned (see Section 4.10.1). This could provide a fruitful area of future research.
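To make the idea of a non-linear annealing schedule concrete, the following sketch decays a variance-recapture target geometrically from an initial to a final value over the course of a run. The function name, parameter values, and the exponential form are illustrative assumptions only, not the schedule examined in this work.

    def annealed_variance_target(gen, max_gen, v_start, v_end):
        # Geometric (exponential) interpolation from v_start at generation 0
        # down to v_end at generation max_gen.
        t = gen / max_gen
        return v_start * (v_end / v_start) ** t

    # Example: a recapture target shrinking from 4.0 to 0.01 over 200 generations.
    print([round(annealed_variance_target(g, 200, 4.0, 0.01), 4) for g in (0, 50, 100, 150, 200)])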
Section 4.10.3 discusses the difficulty of using global variance recapture targets when the problem space is underconstrained. Evaluation of the validity of this conjecture, and of any proposed methods for dealing with these difficulties, might provide a useful area for future work.

In Section 4.11.2, the possibility of using Cauchy mutative sampling to provide a form of Fast-VREP was alluded to; a minimal sketch of such a Cauchy step appears at the end of this section. This topic could provide a quick and insightful area for future research.

Finally, and perhaps most importantly, this evaluation could be made much more substantive if the battery of test cases included a number of real-world problems. For example, side-by-side evaluation across one or more tests from the various MINPACK test suites might provide quite insightful results.
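The Cauchy sampling mentioned above can be sketched as a drop-in replacement for a Gaussian mutation step: the heavier tails of the Cauchy distribution occasionally produce much larger jumps, which is the mechanism behind "fast" EP-style variants. The sketch below is illustrative only; the function names and scaling are assumptions and do not reproduce the VREP machinery.

    import numpy as np

    def gaussian_step(x, sigma, rng):
        # Conventional Gaussian mutation: add N(0, sigma) noise per dimension.
        return x + rng.normal(0.0, sigma, size=x.shape)

    def cauchy_step(x, sigma, rng):
        # Cauchy mutation of comparable scale; its heavier tails occasionally
        # yield very large steps, improving escape from local optima.
        return x + sigma * rng.standard_cauchy(size=x.shape)

    rng = np.random.default_rng(2)
    x = np.zeros(10)
    print(np.abs(gaussian_step(x, 0.5, rng)).max(), np.abs(cauchy_step(x, 0.5, rng)).max())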