This is to certify that the
dissertation entitled
ACHIEVING CONSISTENT EVOLUTION ACROSS
ISOMETRICALLY EQUIVALENT SEARCH SPACES
presented by
Arnold L. Patton
has been accepted towards fulfillment
of the requirements for the
Ph.D. degree in Computer Science and Engineering
Major Professor’s Signature
Date
MSU is an Affirmative Action/Equal Opportunity Institution
ACHIEVING CONSISTENT EVOLUTION ACROSS ISOMETRICALLY
EQUIVALENT SEARCH SPACES
By
Arnold L. Patton
A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
Department of Computer Science and Engineering
2004
ABSTRACT
ACHIEVING CONSISTENT EVOLUTION ACROSS ISOMETRICALLY
EQUIVALENT SEARCH SPACES
By
Arnold L. Patton
Evolutionary Computation systems are stochastic search processes which attempt to locate the global optima of a search space. Some of these systems have previously been shown to be inconsistent in their behavior across isometric transformations, such as rotation of the coordinate axes. This work studies this issue in depth by categorizing the behaviors of individual operators and composite EC systems. Several approaches toward creating modified operators which deal with these situations are also studied.
COPYRIGHT by Arnold L. Patton
2004
To my wife, Rachel, and my children, Noah and Bethany, for their patience, prayers, and sacrifice during this inordinately lengthy process.
To my parents, John and June, without whose sacrifice and support I could not have begun my journey through the halls of higher education.
ACKNOWLEDGEMENTS
My advisor, William F. Punch, III, for his patience and consideration.
The members of the Michigan State University GARAGe, discussions with whom initiated much of this work; especially Michael Raymer, Terrance Dexter, Owen Mathews, Victor Miagkikh, Sasha Topchy, and Marilyn Wulfekuhler.
The faculty of the Department of Computer Science and Information
Systems at Bradley University for their support.
The members of the BEAR group and Dean Claire Etaugh at Bradley
University, for creation and support of the Beowulf cluster on which much
of this data was collected.
TABLE OF CONTENTS
LIST OF TABLES ............................................................................................................. xi
LIST OF FIGURES .......................................................................................................... xv
CHAPTER 1 INTRODUCTION 1
1.1 INTENTION OF THIS THESIS ......................................................................................... 5
1.2 PROBLEM DOMAIN ...................................................................................................... 7
1.3 TECHNIQUES ............................................................................................................... 9
1.3.1 Statistical Analysis of Distributions .................................................................. 10
1.3.2 Comparative Experimental Analysis of Operators ........................................... 10
CHAPTER 2 BACKGROUND 11
2.1 No FREE LUNCH THEOREMS ..................................................................................... 11
2.2 OVERVIEW OF EVOLUTIONARY COMPUTATION ........................................................ 15
2.2.1 General Parametric Evolutionary Computation Algorithm. ............................. 20
2.2.2 Evolutionary Programming ............................................................................... 25
2.2.3 Genetic Algorithms ............................................................................................ 27
2.2.4 Evolutionary Strategies ..................................................................................... 37
2.3 OVERVIEW OF COMMON EVOLUTIONARY OPERATORS ............................................. 38
2.3.1 Selection Operators ........................................................................................... 38
2.3.2 Recombination ................................................................................................... 46
2.3.3 Mutation ............................................................................................................ 59
2.4 EMPIRICAL TEST FUNCTIONS .................................................................................... 62
2.4.1 Function Categorization and Terms ................................................................. 63
2.4.2 Function Illustrations ........................................................................................ 68
2.4.3 Square Function ................................................................................................ 70
2.4.4 Sphere Function ................................................................................................ 71
2.4.5 Schwefel’s Problem 1.2 ..................................................................................... 72
2.4.6 Schaffer’s Function ........................................................................................... 73
2.4.7 Schaffer’s Function Modification 1 ................................................................... 75
2.4.8 Schaffer’s Function Modification 2 ................................................................... 77
2.4.9 Schaffer’s Function Modification 3 ................................................................... 79
2.4.10 Schaffer’s Function Modification 4 ................................................................. 83
2.4.11 Ring Function .................................................................................................. 85
2.4.12 Trapezoid Cross Function ............................................................................... 89
2.4.13 Rosenbrock’s Saddle Function ........................................................................ 90
2.4.14 Spiral Function ................................................................................................ 91
2.4.15 Ackley’s Function ............................................................................................ 94
2.4.16 Griewangk’s Function ..................................................................................... 96
2.4.17 Clover & Cross Function ................................................................................ 98
2.4.18 Bohachevsky’s Function ................................................................................ 100
2.4.19 Rastrigin’s Function ...................................................................................... 103
2.4.20 Yip & Pao’s Function .................................................................................... 105
2.4.21 Multimodal Spiral Function ......................................................................... 107
2.4.22 Frequency Modulation Sounds (FMS) Problem ........................................... 110
2.4.23 Exponential Function .................................................................................... 113
2.4.24 Chain-Link Function ..................................................................................... 114
2.4.25 Double Cos Function .................................................................................... 115
2.4.26 Inverse Exponential Function ....................................................................... 116
2.4.27 Worms Function ............................................................................................ 118
2.4.28 Schwefel’s Function ...................................................................................... 120
2.4.29 Dynamic Control ........................................................................................... 122
CHAPTER 3 ANALYSIS OF OPERATORS 123
3.1 STATISTICAL DISTRIBUTION ANALYSIS .................................................................. 124
3.1.1 Mean Disturbance ........................................................................................... 125
3.1.2 Variance Disturbance ..................................................................................... 128
3.1.3 Mean Focusing (Center Tending) ................................................................... 132
3.1.4 Covariance Disturbance ................................................................................. 133
3.2 LOCAL STATISTICS AND HOMEOMORPHIC ENCODING INVARIANCE ........................ 149
3.2.1 Translation of axes .......................................................................................... 151
3.2.2 Linear rescaling .............................................................................................. 152
3.2.3 Rotation of axes ............................................................................................... 154
3.2.4 Underconstrained (Free) Parameters ............................................................. 155
3.2.5 Validity of Local Statistical Extrapolation ...................................................... 159
3.3 EMPIRICAL ANALYSIS ............................................................................................. 161
3.3.1 Statistical Tests ................................................................................................ 161
3.3.2 Statistical Test Cases ....................................................................................... 167
3.3.3 Example Statistical Analysis ........................................................................... 174
3.3.4 Empirical Statistical Results ........................................................................... 180
3.3.5 Preliminary Operator Taxonomy .................................................................... 195
CHAPTER 4 ALTERNATE POPULATION RELATIVE OPERATORS .......... 200
4.1 OVERVIEW OF BENCHMARK SYSTEMS AND EMPIRICAL COMPARISONS .................. 201
4.1.1 Random Rotational Presentation of Test Functions ....................................... 202
4.1.2 EP and Fast EP Systems ................................................................................. 203
4.1.3 BLX-α GA System ............................................................................................ 204
4.1.4 Ranksum Comparisons .................................................................................... 204
4.2 PRINCIPAL COMPONENT OPERATORS FOR GA ........................................................ 209
4.2.1 Principal Component Crossover (PCX) .......................................................... 210
4.2.2 Principal Component Gaussian Mutation (PCGM) ....................................... 211
4.2.3 Sample Selection for Principal Component Analysis ...................................... 212
4.2.4 Difficulties with Relative Under-Constraint and Excessive Freedom ............. 213
4.2.5 PC GA System Parameterization ..................................................................... 214
4.3 DEMONSTRATION OF ROTATIONAL BIAS IN STANDARD APPROACHES .................... 215
4.3.1 BLX-α Rotational Bias .................................................................................... 215
4.3.2 EP and Fast EP Rotational Bias ..................................................................... 217
4.3.3 Principal Component GA Rotational Bias ...................................................... 220
4.4 EMPIRICAL RELATIVE EFFICIENCY COMPARISON OF PCGA ................................... 222
4.4.1 Comparison to EP and Fast EP ...................................................................... 222
4.4.2 Comparison to BLX-α GA ............................................................................... 226
4.5 EMPIRICAL EVALUATION OF PCX AND PCGM IN ISOLATION ................................. 228
4.6 EMPIRICAL COMPARISON OF PRINCIPAL COMPONENT OPERATORS AND RANDOMLY
ROTATED OPERATORS ............................................................................................ 233
4.7 EVALUATION OF THE EFFECT OF SCALE ON PC OPERATORS ................................... 235
4.8 EVALUATION OF THE EFFECT OF SAMPLE SIZE ON PC OPERATORS ........................ 237
4.9 EVALUATION OF THE EFFECT OF POPULATION SIZE ON POPULATION RELATIVE
OPERATORS ............................................................................................................ 240
4.9.1 PCGA Population Sensitivity .......................................................................... 240
4.9.2 BLX-α GA Population Sensitivity .................................................................... 243
4.10 LOSS SENSITIVE OPERATORS FOR EC ................................................................... 245
4.10.1 Variance Recapture and Convergence .......................................................... 245
4.10.2 Global versus Dimensional Variance Targets .............................................. 247
4.10.3 Difficulty with Non-Fixed Axes ..................................................................... 248
4.11 EMPIRICAL RELATIVE EFFICIENCY COMPARISON OF VREP ................................. 249
4.11.1 Comparison to EP ......................................................................................... 249
4.11.2 Comparison to Fast EP ................................................................................. 252
4.11.3 Comparison to BLX-α GA ............................................................................. 254
4.11.4 Comparison to PCGA .................................................................................... 254
4.12 EFFECT OF RECAPTURE PERCENTAGE ON VREP EFFICIENCY ............................... 257
4.13 EFFECT OF POPULATION SIZE ON VREP EFFICIENCY ............................................ 259
4.14 GRAPHIC PERFORMANCE COMPARISON OF SYSTEMS ............................................ 261
CHAPTER 5 CONCLUSIONS 289
5.1 CONCLUSIONS FROM THEORETICAL EVALUATIONS ................................................ 289
5.2 CONCLUSIONS FROM EMPIRICAL EVALUATIONS ..................................................... 290
5.3 FUTURE WORK ....................................................................................................... 292
REFERENCES 294
LIST OF TABLES
TABLE 3.1 RESULTS ON ALIGNED UNIFORM UNIMODAL DISTRIBUTION .......................... 180
TABLE 3.2 RESULTS ON ROTATED UNIFORM UNIMODAL DISTRIBUTION ......................... 180
TABLE 3.3 RESULTS ON ALIGNED NORMAL UNIMODAL DISTRIBUTION ........................... 181
TABLE 3.4 RESULTS ON ROTATED NORMAL UNIMODAL DISTRIBUTION .......................... 181
TABLE 3.5 RESULTS ON N-DIMENSIONAL HYPERSPHERE SURFACE DISTRIBUTION ........... 182
TABLE 3.6 RESULTS ON UNIFORM DENSITY N-DIMENSIONAL HYPERSPHERE DISTRIBUTION
................................................................................................................................. 182
TABLE 3.7 RESULTS ON N-DIMENSIONAL NORMAL DISTRIBUTION ................................... 183
TABLE 3.8 RESULTS ON N-DIMENSIONAL NORMAL RING DISTRIBUTION .......................... 183
TABLE 3.9 RESULTS ON ROTATED N-DIMENSIONAL HYPERELLIPSOID DISTRIBUTION ...... 184
TABLE 3.10 RESULTS ON ROTATED N-DIMENSIONAL SKEWED HYPERELLIPSOID
DISTRIBUTION .......................................................................................................... 184
TABLE 3.11 RESULTS FOR FIXED UNIFORM MUTATION CENTERED ON A SINGLE PARENT . 185
TABLE 3.12 RESULTS FOR FIXED NORMAL MUTATION CENTERED ON A SINGLE PARENT .. 185
TABLE 3.13 RESULTS FOR FIXED CAUCHY MUTATION CENTERED ON A SINGLE PARENT .. 186
TABLE 3.14 RESULTS FOR FIXED LOG-UNIFORM MUTATION CENTERED ON A SINGLE
PARENT .................................................................................................................... 186
TABLE 3.15 RESULTS FOR FIXED UNIFORM MUTATION CENTERED ON THE MEAN OF 2
PARENTS .................................................................................................................. 186
TABLE 3.16 RESULTS FOR FIXED NORMAL MUTATION CENTERED ON THE MEAN OF 2
PARENTS .................................................................................................................. 187
TABLE 3.17 RESULTS FOR FIXED CAUCHY MUTATION CENTERED ON THE MEAN OF 2
PARENTS .................................................................................................................. 187
TABLE 3.18 RESULTS FOR FIXED LOG-UNIFORM MUTATION CENTERED ON THE MEAN OF 2
PARENTS .................................................................................................................. 187
TABLE 3.19 RESULTS FOR AVERAGING CROSSOVER ........................................................ 188
TABLE 3.20 RESULTS FOR LINEAR CROSSOVER ............................................................... 188
TABLE 3.21 RESULTS FOR EXTENDED LINEAR CROSSOVER ............................................. 188
TABLE 3.22 RESULTS FOR STANDARD BLX-0.5 ............................................................... 189
TABLE 3.23 RESULTS FOR BLX-0.5 CENTERED ON A SINGLE PARENT ............................. 189
TABLE 3.24 RESULTS FOR SIMPLEX CROSSOVER (SPX) .................................................. 189
TABLE 3.25 RESULTS FOR PRINCIPAL COMPONENT GAUSSIAN SAMPLING ...................... 190
TABLE 3.26 RESULTS FOR PRINCIPAL COMPONENT CROSSOVER ..................................... 190
TABLE 3.27 RESULTS FOR FIELD-BASED CROSSOVER ..................................................... 190
TABLE 3.28 RESULTS FOR GLOBAL DOMINANT RECOMBINATION ................................... 191
TABLE 3.29 PRELIMINARY TAXONOMY ........................................................................... 198
TABLE 4.1 Z-STATISTIC OF RANKSUM MEASURES WITH TWO 90 MEMBER SAMPLE GROUPS
................................................................................................................................. 207
TABLE 4.2 RANKSUM MEASURES FOR VARIOUS PROBABILITY LEVELS WITH TWO 90
MEMBER SAMPLE GROUPS ...................................................................................... 208
TABLE 4.3 PERFORMANCE COMPARISON OF BLX-0.5 GA UNDER ROTATED AND NON-
ROTATED PRESENTATION ........................................................................................ 216
TABLE 4.4 PERFORMANCE COMPARISON OF EP UNDER ROTATED AND NON-ROTATED
PRESENTATION ........................................................................................................ 218
TABLE 4.5 PERFORMANCE COMPARISON OF FAST EP UNDER ROTATED AND NON-ROTATED
PRESENTATION ........................................................................................................ 219
TABLE 4.6 PERFORMANCE COMPARISON OF PCGA UNDER ROTATED AND NON-ROTATED
PRESENTATION ........................................................................................................ 221
TABLE 4.7 PERFORMANCE COMPARISON OF AN EP SYSTEM AND PCGA ......................... 224
TABLE 4.8 PERFORMANCE COMPARISON OF A FAST EP SYSTEM AND PCGA .................. 225
TABLE 4.9 PERFORMANCE COMPARISON OF A BLX-0.5 GA SYSTEM AND PCGA ............ 227
TABLE 4.10 PERFORMANCE COMPARISON OF PCX-ONLY AND PCX AND PCGM
COMBINED IN A GA FRAMEWORK ........................................................................... 230
TABLE 4.11 PERFORMANCE COMPARISON OF PCGM-ONLY AND PCX AND PCGM
COMBINED IN A GA FRAMEWORK ........................................................................... 231
TABLE 4.12 PERFORMANCE COMPARISON OF PCGM-ONLY AND PCX-ONLY IN A
GA FRAMEWORK ..................................................................................................... 232
TABLE 4.13 PERFORMANCE COMPARISON OF A RANDOMLY ROTATED GA SYSTEM AND
PCGA ...................................................................................................................... 234
TABLE 4.14 EFFECT OF SCALE, S, ON PCGM OPERATOR IN PCGA SYSTEM .................... 236
TABLE 4.15 EFFECTS OF SAMPLE POOL SIZE ON PCGA SYSTEM ..................................... 239
TABLE 4.16 EFFECTS OF POPULATION SIZE ON PCGA PERFORMANCE ........................... 242
TABLE 4.17 EFFECTS OF POPULATION SIZE ON BLX-0.5 PERFORMANCE ........................ 244
TABLE 4.18 PERFORMANCE COMPARISON OF AN EP SYSTEM AND VREP-0.998 ............. 251
TABLE 4.19 PERFORMANCE COMPARISON OF A FAST EP SYSTEM AND VREP-0.998 ....... 253
TABLE 4.20 PERFORMANCE COMPARISON OF A BLX-α GA SYSTEM AND VREP-0.998 ... 255
TABLE 4.21 PERFORMANCE COMPARISON OF PCGA AND VREP-0.998 .......................... 256
TABLE 4.22 EFFECT OF RECAPTURE TARGET ON VREP PERFORMANCE .......................... 258
LIST OF FIGURES
FIGURE 2.1 OUTLINE OF A BASIC EVOLUTIONARY ALGORITHM ......................................... 23
FIGURE 2.2 EXAMPLE OF 2-POINT CROSSOVER ................................................................. 30
FIGURE 2.3 HISTOGRAM OF SAMPLE OF MUHLENBEIN’S LOG-UNIFORM DISTRIBUTION .... 55
FIGURE 2.4 2D SQUARE FUNCTION .................................................................................. 70
FIGURE 2.5 2D SPHERE FUNCTION ................................................................................... 71
FIGURE 2.6 2D SCHWEFEL’S PROBLEM 1.2 FUNCTION ..................................................... 72
FIGURE 2.7 2D SCHAFFER’S FUNCTION ............................................................................. 74
FIGURE 2.8 2D SCHAFFER’S FUNCTION - MODIFICATION 1 ............................................... 76
FIGURE 2.9 2D SCHAFFER’S FUNCTION - MODIFICATION 2 ............................................... 78
FIGURE 2.10 2D SCHAFFER’S FUNCTION - MODIFICATION 3, LOW RESOLUTION ............... 80
FIGURE 2.11 2D SCHAFFER’S FUNCTION - MODIFICATION 3, MEDIUM RESOLUTION ......... 81
FIGURE 2.12 2D SCHAFFER’S FUNCTION - MODIFICATION 3, HIGH RESOLUTION .............. 82
FIGURE 2.13 2D SCHAFFER’S FUNCTION - MODIFICATION 4 ............................................. 84
FIGURE 2.14 2D RING FUNCTION ..................................................................................... 86
FIGURE 2.15 2D RING FUNCTION, CLOSER VIEW ............................................................. 87
FIGURE 2.16 2D RING FUNCTION, NEAR GLOBAL OPTIMA ............................................... 88
FIGURE 2.17 2D TRAPEZOID CROSS FUNCTION ................................................................. 89
FIGURE 2.18 2D ROSENBROCK’S SADDLE FUNCTION ....................................................... 90
FIGURE 2.19 2D SPIRAL FUNCTION .................................................................................. 93
FIGURE 2.20 2D ACKLEY’S FUNCTION ............................................................................. 95
FIGURE 2.21 2D GRIEWANGK’S FUNCTION ...................................................................... 97
FIGURE 2.22 2D CLOVER AND CROSS FUNCTION ............................................................. 99
FIGURE 2.23 2D BOHACHEVSKY’S FUNCTION .................................................................. 101
FIGURE 2.24 2D BOHACHEVSKY’S FUNCTION, CLOSER VIEW .......................................... 102
FIGURE 2.25 2D RASTRIGIN’S FUNCTION ....................................................................... 104
FIGURE 2.26 2D YIP & PAO’S FUNCTION ......................................................................... 106
FIGURE 2.27 2D MULTIMODAL SPIRAL FUNCTION ........................................................... 109
FIGURE 2.28 FMS FUNCTION WITHIN INITIAL RANGE, x5 AND x6 .................................. 111
FIGURE 2.29 FMS FUNCTION WITHIN INITIAL RANGE, x2 AND x4 .................................. 112
FIGURE 2.30 2D EXPONENTIAL FUNCTION ....................................................................... 113
FIGURE 2.31 2D CHAIN-LINK FUNCTION ......................................................................... 114
FIGURE 2.32 2D DOUBLE COS FUNCTION ........................................................................ 115
FIGURE 2.33 2D INVERSE EXPONENTIAL FUNCTION ........................................................ 117
FIGURE 2.34 2D WORMS FUNCTION ................................................................................ 119
FIGURE 2.35 2D SCHWEFEL’S FUNCTION ....................................................................... 121
FIGURE 3.1 EXAMPLE SITUATIONS WHICH INCREASE VARIANCE THROUGH SELECTION 131
EQUATION 3.1 FORMULAS FOR SAMPLE COVARIANCE ................................................... 133
FIGURE 3.2 THREE EXAMPLE PARENT DISTRIBUTIONS .................................................... 134
EQUATION 3.2 GENERAL VARIANCE LOSS FORMULA ...................................................... 136
FIGURE 3.3 CROSSOVER COVARIANCE MODIFICATION EXAMPLE .................................... 137
EQUATION 3.2 COVARIANCE CONTRIBUTION OF PARENTS ............................................... 137
EQUATION 3.3 COVARIANCE CONTRIBUTION OF CHILDREN ............................................. 137
EQUATION 3.4 COVARIANCE MODIFICATION ................................................................... 137
EQUATION 3.5 MAGNITUDE OF COVARIANCE MODIFICATION, SLOPE-DISTANCE FORM ... 138
FIGURE 3.4 RELATIVE COVARIANCE DISTURBANCE AS FACTOR OF SLOPE BETWEEN
PARENTS .................................................................................................................. 140
FIGURE 3.5 REGION OF COVARIANCE LOSS ≥ 70% .......................................................... 140
EQUATION 3.6 EXPECTED RELATIVE MAGNITUDE OF COVARIANCE LOSS FOR FULLY
DISSOCIATIVE OPERATORS ....................................................................................... 141
FIGURE 3.6 EXPECTED DIFFERENCE OF TWO SAMPLES FROM U(A,A+2) ........................... 142
EQUATION 3.7 PDF FOR EXPECTED DIFFERENCE BETWEEN TWO UNIFORM SAMPLES ...... 142
EQUATION 3.8 EXPECTED VALUE OF D2 FOR A UNIFORM DISTRIBUTION OF WIDTH w ...... 142
EQUATION 3.9 EXPECTED LEVEL OF COVARIANCE DISRUPTION PER CHILD FOR A
UNIFORMLY DISTRIBUTED POPULATION ALONG A LINE SEGMENT WITH SLOPE M AND
LENGTH W ................................................................................................................ 142
FIGURE 3.7 NORMAL DISTRIBUTION ALONG COVARIANT LINE SEGMENT ......................... 143
FIGURE 3.8 SEARCH DISTRIBUTION OF A BLX-α OPERATOR ........................................... 145
EQUATION 4.1 EXPECTED MEAN OF RANKSUM VALUES .................................................. 205
EQUATION 4.2 EXPECTED VARIANCE OF RANKSUM VALUES ........................................... 205
EQUATION 4.3 Z-STATISTIC FOR AN OBSERVED RANKSUM MEASURE ............................. 206
EQUATION 4.4 FORMULA FOR PCGM SAMPLE STANDARD DEVIATION ........................... 211
EQUATION 4.5 FORMULA FOR VR MUTATION DISTRIBUTION VARIANCE CALCULATION 249
FIGURE 4.1 AVERAGE BEST PERFORMANCE ON SQUARE ................................................. 262
FIGURE 4.2 RANKSUM PERFORMANCE ON SQUARE .......................................................... 262
FIGURE 4.3 AVERAGE BEST PERFORMANCE ON SPHERE .................................................. 263
FIGURE 4.4 RANKSUM PERFORMANCE ON SPHERE ........................................................... 263
FIGURE 4.5 AVERAGE BEST PERFORMANCE ON SCHWEFEL’S 1.2 .................................... 264
FIGURE 4.6 RANKSUM PERFORMANCE ON SCHWEFEL’S 1.2 ............................................. 264
FIGURE 4.7 AVERAGE BEST PERFORMANCE ON SCHAFFER .............................................. 265
FIGURE 4.8 RANKSUM PERFORMANCE ON SCHAFFER ...................................................... 265
FIGURE 4.9 AVERAGE BEST PERFORMANCE ON SCHAFFER MOD. 1 ................................. 266
FIGURE 4.10 RANKSUM PERFORMANCE ON SCHAFFER MOD. 1 ........................................ 266
FIGURE 4.11 AVERAGE BEST PERFORMANCE ON RING .................................................... 267
FIGURE 4.12 RANKSUM PERFORMANCE ON RING ............................................................. 267
FIGURE 4.13 AVERAGE BEST PERFORMANCE ON TRAPEZOID & CROSS ........................... 268
FIGURE 4.14 RANKSUM PERFORMANCE ON TRAPEZOID & CROSS ................................... 268
FIGURE 4.15 AVERAGE BEST PERFORMANCE ON ROSENBROCK SADDLE ......................... 269
FIGURE 4.16 RANKSUM PERFORMANCE ON ROSENBROCK SADDLE ................................. 269
FIGURE 4.17 AVERAGE BEST PERFORMANCE ON SCHAFFER MOD. 2 ............................... 270
FIGURE 4.18 RANKSUM PERFORMANCE ON SCHAFFER MOD. 2 ........................................ 270
FIGURE 4.19 AVERAGE BEST PERFORMANCE ON SPIRAL ................................................. 271
FIGURE 4.20 RANKSUM PERFORMANCE ON SPIRAL ......................................................... 271
FIGURE 4.21 AVERAGE BEST PERFORMANCE ON ACKLEY ............................................... 272
FIGURE 4.22 RANKSUM PERFORMANCE ON ACKLEY ....................................................... 272
FIGURE 4.23 AVERAGE BEST PERFORMANCE ON GRIEWANGK ........................................ 273
FIGURE 4.24 RANKSUM PERFORMANCE ON GRIEWANGK ................................................. 273
FIGURE 4.25 AVERAGE BEST PERFORMANCE ON CLOVER & CROSS ................................ 274
FIGURE 4.26 RANKSUM PERFORMANCE ON CLOVER & CROSS ........................................ 274
FIGURE 4.27 AVERAGE BEST PERFORMANCE ON BOHACHEVSKY .................................... 275
FIGURE 4.28 RANKSUM PERFORMANCE ON BOHACHEVSKY ............................................ 275
FIGURE 4.29 AVERAGE BEST PERFORMANCE ON RASTRIGIN ........................................... 276
FIGURE 4.30 RANKSUM PERFORMANCE ON RASTRIGIN ................................................... 276
FIGURE 4.31 AVERAGE BEST PERFORMANCE ON YIP & PAO ........................................... 277
FIGURE 4.32 RANKSUM PERFORMANCE ON YIP & PAO .................................................... 277
FIGURE 4.33 AVERAGE BEST PERFORMANCE ON SCHAFFER MOD. 3 ............................... 278
FIGURE 4.34 RANKSUM PERFORMANCE ON SCHAFFER MOD. 3 ........................................ 278
FIGURE 4.35 AVERAGE BEST PERFORMANCE ON SCHAFFER MOD. 4 ............................... 279
FIGURE 4.36 RANKSUM PERFORMANCE ON SCHAFFER MOD. 4 ........................................ 279
FIGURE 4.37 AVERAGE BEST PERFORMANCE ON MULTIMODAL SPIRAL .......................... 280
FIGURE 4.38 RANKSUM PERFORMANCE ON MULTIMODAL SPIRAL .................................. 280
FIGURE 4.39 AVERAGE BEST PERFORMANCE ON FMS .................................................... 281
FIGURE 4.40 RANKSUM PERFORMANCE ON FMS ............................................................. 281
FIGURE 4.41 AVERAGE BEST PERFORMANCE ON EXPONENTIAL ...................................... 282
FIGURE 4.42 RANKSUM PERFORMANCE ON EXPONENTIAL .............................................. 282
FIGURE 4.43 AVERAGE BEST PERFORMANCE ON CHAIN-LINK ......................................... 283
FIGURE 4.44 RANKSUM PERFORMANCE ON CHAIN-LINK ................................................. 283
FIGURE 4.45 AVERAGE BEST PERFORMANCE ON DOUBLE COS ....................................... 284
FIGURE 4.46 RANKSUM PERFORMANCE ON DOUBLE COS ................................................ 284
FIGURE 4.47 AVERAGE BEST PERFORMANCE ON INVERSE EXPONENTIAL ........................ 285
FIGURE 4.48 RANKSUM PERFORMANCE ON INVERSE EXPONENTIAL ................................ 285
FIGURE 4.49 AVERAGE BEST PERFORMANCE ON WORMS ................................................ 286
FIGURE 4.50 RANKSUM PERFORMANCE ON WORMS ........................................................ 286
FIGURE 4.51 AVERAGE BEST PERFORMANCE ON SCHWEFEL ........................................... 287
FIGURE 4.52 RANKSUM PERFORMANCE ON SCHWEFEL .................................................... 287
FIGURE 4.53 AVERAGE BEST PERFORMANCE ON DYNAMIC CONTROL ............................ 288
FIGURE 4.54 RANKSUM PERFORMANCE ON DYNAMIC CONTROL ..................................... 288
Chapter 1
Introduction
The No Free Lunch (NFL) theorems [Wolpert 97] provide a context for understanding and comparing the relative strengths and weaknesses of algorithmic search techniques. Simply stated, the NFL theorems prove that, for any finite representation, over the set of all possible enumerated orderings of the represented points and given some criterion of performance measurement, no single search algorithm can outperform any other on average. That is to say, the performance of all search algorithms must be equal when averaged across all possible represented search spaces. Put yet another way, any performance gain that a given algorithm makes over another on a specific problem instance must necessarily be offset by an equivalent loss on one or more different problem instances of equivalent representational complexity. This applies equally to complex systems such as genetic algorithms and simple ones such as uniform random search: over the space of all possible finite problem spaces, random search and genetic algorithms must be equally powerful.
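Stated formally, in the notation of [Wolpert 97] (where $d_m^y$ denotes the histogram of cost values observed after $m$ evaluations of $f$ by algorithm $a$), the theorems assert that for any pair of algorithms $a_1$ and $a_2$

$$\sum_f P(d_m^y \mid f, m, a_1) = \sum_f P(d_m^y \mid f, m, a_2).$$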
To better understand how this result is possible, Wolpert [Wolpert 97] suggests
considering how a given search algorithm selects a future search point given an existing
set of known points in the space. Typically, the algorithm makes some assumption about the shape of the search space given the set of points already visited (e.g. gradient descent assumes a locally smooth gradient) and selects the next point based on this "information". However, when considering the set of all possible enumerated orderings of points for a given representation, it is equally likely that any given assumption leads to poorer or better solutions. (In fact, the stronger the assumption, the less likely it will lead to good points.) Equivalently, making the inverse assumption is also equally likely to be correct
over the set of all possible encoded search spaces. Wolpert suggests that the degree of
success of a given search algorithm upon a given search problem is directly related to
how well the assumptions of the algorithm hold true for the given space. Equivalently, English [English 96] suggests that the set of previously visited points gives no actual "information" about the unvisited remainder of the search space, and that any apparent
gains by "informed" search techniques result from providential alignments between
assumptions and problem spaces.
Given the implications of these theorems, it seems somewhat pointless to compare
the relative merits of individual search algorithms. Paradoxically, however, the NFL
theorems actually provide the key to understanding the relative strengths of various
search techniques by pointing to the underlying assumptions made by each given system.
By analyzing the underlying assumptions, we can better judge over what set of search
spaces a given system should outperform other systems. To the degree we expect a given
assumption to hold within a search space, we should expect a system which makes that
assumption to have increased performance.
As this thesis is concerned with the performance of evolutionary search
techniques, one of the most important factors is the set of operators employed to select
new search points from an existing set of currently visited points. Analysis of the
assumptions inherent to individual operators should provide a reasonable starting point.
In general, an operator that performs best under certain assumptions (e.g. there are few or no covariant features in this landscape) tends to perform poorly under a corresponding transformation which violates that assumption (e.g. rotation of the encoding axes). The standard AI term is that the operator is strong because it is
speciﬁcally suited to, and therefore strongly tied to, certain assumptions about the search
landscape. A weak AI method trades lower performance on a speciﬁc subset of
problems (i.e. those for which the strong assumption applies) for better performance on
problems outside of this subset. In fact, the NFL theorems provide a more formal basis
for this strong/weak search classification continuum. Additionally, the NFL theorems prove that the trade-off between performance on the strong subset and generality outside the strong subset is inescapable: a gain in one requires an equivalent loss in the other.
Once the assumptions inherent to a given operator are apparent, we may further consider whether a given assumption provides for a desired performance gain and/or loss over a
given potential set of problems. It may be possible to weaken the operator by redesigning
it such that the given assumption is removed or modiﬁed. Such a redesigned operator
potentially trades a performance loss on a given subset of problems, namely the problems
for which the original assumption held, for a performance gain on an alternative set of
problems encompassed by the new assumption set.
Most evolutionary search operators are designed with the specific intention of either exploiting assumed "information" from previously sampled potential solutions, being purely explorative, or attempting to statically or dynamically provide a balance between these two competing drives. However, often little thought is given concerning
the interplay between the effects of problem alignment and dimensionality and the levels
of exploration or exploitation. It is often possible for a single operator to have greatly
divergent effects on two identical landscapes which present differences such as the
alignment of the encoding axes. For example, [Salomon 96] studied the effect of axial rotation on the relative performance of EP and breeder genetic algorithm (BGA) systems.
Such dependencies are troubling for two reasons. First, such divergent behavior
indicates that the operator does not perform consistently based on the local attributes of
the landscape itself, such as the local gradient or shape. Therefore additional information
is necessary to determine the expected performance of such operators. Second, the end
user of a search system, such as a genetic algorithm, often has a reasonable idea of what
level of computational resources they would prefer to expend on search for new solutions
versus reﬁnement of current solutions. The constraints on computational resources may
reﬂect real world limitations such as the available processor power and the need to meet
commitment deadlines. However, for most evolutionary computation operators, the
nature of the balance between exploration and exploitation is relatively unknown.
1.1 Intention of this Thesis
The intent of this thesis is to discover and categorize differences between evolutionary search operators over a variety of problem spaces such that their biases are clearly revealed. The outcome of this analysis is a taxonomy of operator biases and the creation of alternative search operators which are specifically designed to negate or circumvent these biases. A combination of empirical and analytical techniques is used to provide evidence of bias reduction and increased search performance under constrained circumstances in comparison to standard EC operators.
Through statistical analysis of operator distributions, we can discover and
categorize certain forms of operator bias. This may proceed either from direct analysis of
the design of the operator, or through analysis of the effects of the operator under
established circumstances. These “case studies” characterize the distribution sampled by
an operator during application to a preselected source population distribution. By
carefully selecting the source population distribution, we will be able to discover certain statistical biases. By understanding the forms of bias inherent to an operator, we can gain insight into what landscape characteristics are beneficial or detrimental to a system employing that operator. Specifically, we are interested in discovering what degree of invariance of behavior we can expect for a given operator under a specific local
population distribution.
Given a set of landscape transformations and suitable tests for operator
invariance, we provide a “taxonomy of invariance” which allows for comparison between
operators and provides an initial decision point for operator selection. Ideally these
taxonomic tests should provide meaningful quantitative values, but we settle for more
qualitative analysis on occasion. Empirical results on a given battery of test problems
will be used to demonstrate the effectiveness of this taxonomy.
Once we understand the biases inherent to a given operator, we can attempt to modify the operator in order to remove a given bias, thereby creating a weaker version of the operator. The NFL results clearly specify that the resulting operator will lose relative power over a certain set of problems, but the benefit is an increase in the general applicability of the operator. It may be argued that limited gains in generality are more valuable than equivalent gains in strength for evolutionary computation operators, since for the majority of search landscapes the assumptions which may hold are relatively unknown. Empirical examination and comparison over a battery of test problems provide a method for evaluating the relative strengths and weaknesses of these redesigned operators.
All evolutionary search systems select future search points in reference to current
or previously visited search points. In general, the relative stepsize for a given EC
operator may be dependent on a large number of factors, many of which may be artifacts
of the choice of encoding. The result is a relatively uncontrollable stochastic process
deﬁned with little thought to addressing the balance between reﬁning search and
exploratory search. Operators which give some level of control over the balance between
exploration and exploitation to the user can rectify this situation. This provides a
secondary impetus to the design of new operators.
The intention of this thesis is threefold. First, examples of operator analysis to
detect bias will be examined, providing a simpliﬁed taxonomy of invariance for
classiﬁcation of EC operators. Second, operators redesigned to increase generality
(weakness) will be studied; and, ﬁnally, examples of operators designed to allow greater
control of the balance between exploration and exploitation will be examined. Chapter 2
provides sufﬁcient technical background and is intended for those with only passing
familiarity with evolutionary computation. The set of empirical test functions being used
is also presented in Chapter 2. Chapter 3 presents numerous forms of statistical analysis of operators and a taxonomy of invariance for EC operators. Some of the biases
uncovered in Chapter 3 are used as the basis for the redesign of operators in Chapter 4.
These operators are subjected to a series of comparative evaluations under a number of
well known test problems, as well as some designed speciﬁcally to test the relative
generality and strength of the various systems. Finally, conclusions are presented in
Chapter 5.
1.2 Problem Domain
The problem domain we have selected is real-valued function optimization. To facilitate empirical analysis we have selected test beds consisting primarily of a number of well known real-valued function optimization tasks. However, given the rather limited coverage of these functions in terms of relevant features (asymmetry, signal-to-noise ratio, codependency of variables, etc.), a greatly enlarged test set is proposed in Chapter 2 for the purposes of this analysis.
Emphasis has been placed toward moderate scale (10-30 parameter) function optimization tasks. Most of the operators and systems proposed here should scale to larger domains at least as well as other EC techniques are capable of doing so; however, no specific examination of the scalability of the proposed techniques and operators will be undertaken in this work.
Real-valued function optimization can loosely be defined as the collection of all parametric problems that map a set of $n$ real numbers to the set of reals, i.e. $f: \mathbb{R}^n \to \mathbb{R}$. Typically the task is to find a set of inputs which map to a minimal or maximal range value. Note that the landscape defined by a given mapping function may be multimodal or unimodal, and may exhibit local smoothness (consistency between gradient and direction of nearest extremum) or extremely chaotic behavior. Since most evolutionary computation systems do not require a computable derivative, it is not necessary for $f$ to be differentiable, or even piecewise differentiable. $f$ may well be defined over the range of all possible inputs (i.e. $\mathbb{R}^n$), although some functions may present range constraints on the input parameters which may require additional consideration. Generally, the function $f$ is assumed to remain fixed over time, although there is much interest in EC behavior on
dynamical real-valued landscapes. The majority of the problems investigated here are completely stationary, with the exception of the occasional introduction of noise to the functional mapping, e.g. $f(\mathbb{R}^n) \Rightarrow \mathbb{R} + N(0, \sigma)$. The techniques developed in this work are not specifically designed to address any of the issues involved in searching dynamical landscapes.
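As a concrete illustration, the following minimal Python sketch expresses this kind of mapping: a stationary real-valued test function with optional additive Gaussian noise. (The sphere function and the noise level here are illustrative choices, not definitions drawn from the Chapter 2 test bed.)

    import random

    def noisy_sphere(x, sigma=0.0):
        """Map a list of n reals to a single real; optional additive
        Gaussian noise N(0, sigma) models a noisy functional mapping."""
        value = sum(xi * xi for xi in x)
        return value + random.gauss(0.0, sigma)

    # Evaluate a 3-parameter candidate with and without noise.
    print(noisy_sphere([0.5, -1.0, 2.0]))             # deterministic: 5.25
    print(noisy_sphere([0.5, -1.0, 2.0], sigma=0.1))  # noisy evaluation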
While the functions being used in the test domain provide infinite resolution (i.e. they are defined over the set of all real values), the EC systems being tested all employ representations of limited, finite resolution (although admittedly quite large ones). Far from being an incidental matter, the form and method of value representation is often a matter of concern and careful scrutiny in evolutionary computation. As discussed in Chapter 2, bias may be easily introduced via a poor choice of representation or a mismatch between representation and operators. Wherever possible, the systems described in this work will use standard 64-bit IEEE representations unless otherwise specified. Hopefully, using a uniform representation will eliminate representational bias from the empirical results.
1.3 Techniques
There are two basic forms of analysis that will be applied in this work. Through
analysis of the distributions produced by operators used in evolutionary systems relative
to the source distributions, we may be able to characterize some of the fundamental
biases inherent to these operators. This analysis is limited in that it treats each operator in
isolation and does not consider interactions between multiple operators. For example, it is possible that two operators with opposing biases produce a system with minimal bias when used in conjunction. Also, such analysis does not account for
potential emergent behaviors from interaction between operators in a search system.
Empirical analysis provides for direct comparison of systems employing various
operator collections. Empirical results can conﬁrm or deny hypotheses which result from
distributional analysis, assuming such effects are not completely countered by any
emergent behaviors. In order to obtain useful empirical data, we will require a control
group for comparison. Therefore, we will test a number of wellestablished EC systems
over the same test bed. Given the implications of the NFL theorems, the reader should be
wary of extending such comparative empirical data beyond the given test bed functions.
Inasmuch as we are evaluating the relative match between operator biases and landscape
characteristics, one may possibly expect similar results over functions with similar
characteristics to a given test bed function.
1.3.1 Statistical Analysis of Distributions
The distributions induced by a single pass of a given operator on one or more
example source populations will be statistically analyzed. Comparisons between the
statistical characteristics of the source population and the produced population will be
demonstrated for speciﬁc instances. Where possible, complete closed form analytical
evaluation will be developed for the general case as well.
1.3.2 Comparative Experimental Analysis of Operators
For empirical testing, we will denote a separate system as being an evolutionary
computation approach which incorporates a unique series of operators or a modiﬁed
mixture of application rates of a given set of operators. For each tested system, data will
be collected for multiple test runs on each test function. Results from all test runs for a
given system/function pairing will be averaged and plotted for visual comparison. Also,
the standard deviations will be computed to allow for distributional comparison of test
runs. Since distributional analysis of test runs can be somewhat misleading (given that the underlying distribution may not be Gaussian), nonparametric tests, such as Wilcoxon rank-sum testing, will be carried out pairwise between the final results of the tested systems to establish probable statistical significance of the comparative results.
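As a sketch of how such a pairwise test might be computed, the following uses the rank-sum implementation from scipy.stats (the final best-fitness values shown are invented for illustration only):

    from scipy.stats import ranksums

    # Final best fitness from repeated runs of two systems on one
    # test function (illustrative values only; lower is better).
    system_a = [0.012, 0.034, 0.008, 0.051, 0.027, 0.019, 0.044, 0.031]
    system_b = [0.091, 0.066, 0.104, 0.087, 0.072, 0.095, 0.059, 0.080]

    # Wilcoxon rank-sum test of whether the two sets of run results
    # are drawn from distributions with the same location.
    statistic, p_value = ranksums(system_a, system_b)
    print(f"z-statistic = {statistic:.3f}, p = {p_value:.4f}")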
Chapter 2
Background
2. 1 No Free Lunch Theorems
This subsection provides an overview of the No Free Lunch theorems published by David H. Wolpert and William G. Macready [Wolpert 97] (originally published in working form in 1996) and further refined by Tom English [English 96]. Roughly stated, these theorems show that all "black box" optimization procedures (i.e. any optimization procedure which treats the function to be optimized as an unknown system) must perform equally well when averaged across the set of all possible problem sets for a given encoding size. Therefore, any performance advantage of one method over another on a given problem instance must be offset by an equal performance loss on the remaining problem instances within the set of all possible problem sets.
The initial proof is actually quite simple in concept. The majority of the NFL
theorems deal with overcoming potential objections to and providing extensions of the
initial proof. The basic theorem makes the assumption that both the range and domain of
the function being optimized are ﬁnite. However, both domain and range may be quite
large; therefore, the basic theorem applies to all search methods which are realized via
implementation using a ﬁxed representation in a digital computer.
Wolpert and Macready [Wolpert 97] denote the search space as $X$ and the space of possible cost values as $Y$. An optimization problem provides a mapping from $X$ to $Y$, so the size of the space of all possible problems is $|Y|^{|X|}$ (i.e. $|Y|$ possible values at each of $|X|$ possible search points), which is a large but finite set. Further, we may define the probability that a given search method will reach a given performance level (e.g. the minimum or maximum $Y$ value obtained so far) within a given number of evaluations, based solely on the mapping function and the history of previously visited points. That is, $P(\vec{c} \mid f, m, a)$, where $\vec{c}$ is a histogram of the $Y$ values of the visited points, $f$ is the mapping function (i.e. landscape) being optimized, $m$ is the number of points visited so far, and $a$ is the search method. This formulation works for any comparison made through evaluation of the histogram of function values obtained by a search function. Therefore, this analysis holds for measures such as efficiency (best function value earliest) and effectiveness (best function value reached), which are two of the most common search function evaluators.
$P(\vec{c} \mid f, m, a)$ can be shown to be dependent on $P(d_m^y \mid f, m, a)$, where $d_m^y$ is the history of $Y$ values visited up to step $m$. For $m = 1$, it can be demonstrated that the sum of $P(d_1^y \mid f, m, a)$ over the set of all possible mapping functions satisfies

$$\sum_f P(d_1^y \mid f, m, a) = |Y|^{|X| - 1}.$$

Intuitively, the set of functions over which our first $Y$ sample is exactly $d_1^y$ is the set of all functions where the selected $X$ value is mapped to $d_1^y$ by $f$. That set must include all possible mappings of the values of $Y$ to the other $|X| - 1$ members of the domain, hence the set must be of size $|Y|^{|X| - 1}$. Since this value is independent of $a$, this demonstrates that for $m = 1$, $P(d_1^y \mid f, m, a)$ is independent of the choice of $a$.
It follows that if $P(d_m^y \mid f, m, a)$ is independent of $a$ for $m$, then it is also independent of $a$ for $m + 1$. Wolpert [Wolpert 97] shows that

$$\sum_f P(d_{m+1}^y \mid f, m + 1, a) = \frac{1}{|Y|} \sum_f P(d_m^y \mid f, m, a).$$

In other words, having another sample from $Y$ only narrows the number of potential matching functions $f$ by a factor of $|Y|$. This provides an inductive proof that $P(d_m^y \mid f, m, a)$ is independent of the choice of $a$ when averaged (summed) across all possible functions. We can conclude that $P(\vec{c} \mid f, m, a_1) = P(\vec{c} \mid f, m, a_2)$ for any possible choice of $a_1$ and $a_2$, and therefore, any performance measure based on $\vec{c}$ will be equally likely to favor $a_1$ or $a_2$ for a completely arbitrary $f$. Keeping in mind that $a_1$ and $a_2$ represent two distinct search algorithms, the implication is that, for an arbitrarily selected $f$, either is equally likely to produce the better results.
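This result can be verified by brute force on a toy space. The following Python sketch (constructed for this overview, not part of the dissertation's experiments) enumerates all $|Y|^{|X|} = 16$ mappings from a four-point domain to binary cost values and confirms that two deterministic, non-retracing visit orders obtain the same average best-found value:

    from itertools import product

    X = range(4)      # search space: 4 points
    Y = (0, 1)        # possible cost values
    m = 2             # evaluations allowed

    def best_after(f, order):
        """Best (minimum) cost seen after m evaluations in the given order."""
        return min(f[x] for x in order[:m])

    a1 = [0, 1, 2, 3]  # algorithm 1: ascending visit order
    a2 = [3, 2, 1, 0]  # algorithm 2: descending visit order

    # Enumerate all |Y|^|X| = 16 possible cost functions.
    functions = list(product(Y, repeat=len(X)))
    avg1 = sum(best_after(f, a1) for f in functions) / len(functions)
    avg2 = sum(best_after(f, a2) for f in functions) / len(functions)
    print(avg1, avg2)  # both 0.25: identical averages, as NFL requires

Both averages come out to 0.25, since over all sixteen functions the minimum of the two sampled values is 1 only when both visited points happen to map to 1.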
English [English 96] uses an information-theoretic analysis to reach a similar conclusion: the history of prior sampled points provides no information about the values of any unsampled points in the search space when considered over the set of all possible mapping functions. Therefore, he concludes that the power of a search method is related solely to the degree that the prior assumptions (prior information) match the actual circumstances. In other words, a search method is most successful when the mapping looks like the search method assumed it would before the search began.
Many have pointed to the possibility of using meta-level techniques to select between alternate search methods, thereby allowing a meta-search function to eliminate any biasing prior assumptions. However, [English 99] points out that such methods necessarily reduce their effective performance by attempting to pursue multiple potential assumptions simultaneously. Further, the only effective method of removing all bias is to return to completely stochastic random search, and as any meta-level technique continues to reduce bias, it necessarily approaches random search.
The NFL theorems allow for comparison of search algorithms on specific problem instances. It is important to note that the NFL theorems do not imply that all algorithms must behave with equal strength on each problem instance. For that matter, [Wolpert 95] suggests that minimax comparisons are possible within the framework of NFL. That is, an algorithm $a_1$, for example, may do much better than another search algorithm $a_2$ on a number of problems, while there may be no problems where $a_2$ performs dramatically better than $a_1$. In order to satisfy NFL in such a case, the set of problems over which $a_2$ performs better than $a_1$ must be much larger than the set over which the reverse is true. Therefore the overall performance average for the two
algorithms remains equal. In other words, it is possible for a given algorithm to sacrifice generality for strength on a specific subset of the possible problem spaces. Likewise, any search algorithm which maintains generality must necessarily sacrifice strength. Weak (general) search algorithms may be greatly outperformed on specific problem instances, but need not outperform equally dramatically on a compensating set of problem instances. Rather, it is probable that a weak search algorithm will provide smaller gains on a larger set of potential problems to compensate. Thus, comparisons and analysis as to the relative generality (weakness) or strength of different search algorithms are still possible.
[Wolpert 95] suggests a geometric interpretation of the NFL theorems where the
performance success of a search algorithm depends directly on the degree of “alignment”
between a vector representing the actual search space information and the search
algorithm’s expectations (assumptions) about the search landscape. This
conceptualization provides loose support for local behavior analysis of EC operators and
systems, in order to better understand what assumptions operators and systems are
making about the search landscape.
2.2 Overview of Evolutionary Computation
Evolutionary computation (EC) is the broad category for a number of forms of
automated computer search. While each of these forms claims separate founding inspirations and often holds strongly opposing philosophical views, they are often more alike than they are different in practice. Evolutionary computation derives its power from the ability to successfully redistribute search resources over time. This successive refinement is often achieved through competitive selection, which favors quantitatively better solutions. Since the inspiration and metaphor for this selection is the mechanism of
natural selection as found in theories of natural evolution, such algorithms are most
frequently classiﬁed as evolutionary algorithms. Much of the history and origins of
evolutionary computation deals with simulation of adaptation and empirical modeling of
natural systems. However, the ability of evolutionary algorithms to solve problems is not
directly linked to the provability of the underlying metaphor in that modern evolutionary
algorithms are at best greatly abstracted models of natural systems.
Evolutionary algorithms are often compared with and occasionally confused with
simulated annealing, Monte Carlo systems, stochastic gradient descent, and other simple
stochastic search algorithms. These stochastic search algorithms typically proceed from a
single search point, or landscape sample, and progress by modifying this solution or
solution set either randomly or using locally sampled landscape information, such as the
local value of the gradient (assuming a continuously or piecewise differentiable function).
New search points are then accepted stochastically on the basis of improvement. Indeed
at ﬁrst blush, some EC systems appear to be simply massively parallel stochastic search
processes since individual members of a given population are often modiﬁed randomly
and new search points are accepted stochastically on the basis of competitive selection.
However, it is exactly the mechanism of competitive selection which differentiates
evolutionary algorithms from parallel stochastic search, since competitive selection is the
mechanism which enables an evolutionary algorithm to rebalance search resources
toward more productive search areas. Thus the narrow yet critically signiﬁcant
separation between parallel stochastic search and evolutionary computation is that the
latter may abandon some search paths in order to apply those resources as branches to
more favorable search paths, and the former cannot.
As mentioned previously, there are a number of different approaches toward
evolutionary computation and a number of different systems which are consistently
championed by various professional circles. However, all forms of evolutionary
computation may be categorized as either parametric or programmatic algorithms.
Parametric evolutionary algorithms are systems which attempt to solve individual
instances of specific problems. The solutions are parameterized; that is, an individual
solution to the problem instance consists of values for the parameters of the problem
which we desire to optimize. Thus we know the parameters before we solve the problem,
searching only for the values of those parameters. For example, a genetic algorithm
might be designed to search for an optimal packing order for a speciﬁc sequence of
elements to be packed.
Evaluation in a parametric evolutionary algorithm consists of decoding each
potential solution and providing a quantitative rating of this solution or at least a
qualitative comparison between two given solutions. Note that the former implies a
single judging criterion, or at least the ability to combine multiple criteria into a single
quantitative result. The latter allows for richer multicriteria forms, such as Pareto comparisons for multiobjective problems. In either case, parametric evolutionary algorithms always search out the best answer (or set of possible answers in a Pareto
search) for a given explicit problem instance. In contrast to a programmatic evolutionary
approach, there is no expectation that any information from the parametric search process
for a given problem instance (such as ﬁnal population, history of points visited, etc.) will
provide any beneﬁt during a search of a new instance of the same problem. (An instance
of a problem implies one with similar form, but with potentially different
parameterization.) Therefore the applicability of the answer (or pareto answer set) is
extremely speciﬁc and narrow (dependent, of course, on the narrowness of the problem
statement). In the previous example, the solution to the optimal packing sequence for a
given sequence of items does not likely provide any information for the optimal packing
of a different series of items.
Programmatic evolutionary systems attempt to build programmatic solutions to
entire classes of similarly deﬁned problems. While the primitives from which the
programs are constructed are preselected, problem solutions are not parameterized. Input
for a programmatic evaluation commonly consists of some generated program and a
number of problem instances to be examined, which may be a ﬁxed test set or a
stochastically selected test set. For example, a genetic program could be developed
which is capable of ﬁnding optimal packing sequences for several similar sequences of
items. The intention is to ﬁnd a more generally applicable solution engine, rather than a
single one-time solution. However, the level of generality of the produced program is not
guaranteed and depends on numerous factors.
Evaluation for a programmatic evolutionary system consists of simulating the
action of the programmatic solution on a number of the given problem instances and
producing an average or summary quantitative evaluation value which may be later
compared to the relative value of other programmatic solutions. Note the search domain
of a programmatic evolutionary algorithm is much larger than that of a typical parametric
evolutionary algorithm in that it attempts to solve an entire class of problems, rather than
a single speciﬁc problem instance. As an example, a typical parametric EC problem
might be to schedule a given production set for a given set of resources, while a
corresponding programmatic problem might be to produce a program that is capable of
scheduling a number of production sets having certain characteristics over a given set of
resources (or even perhaps a number of different sets of resources). Given the broader
nature of most programmatic search problems, and the sampled nature of the evaluation,
programmatic search tasks often proceed at a much slower pace than related parametric
search tasks.
There are a number of different forms which modern programmatic EC systems
use to encode solutions. Early evolutionary programming approaches evolved nondeterministic finite automata (NDFA), and some current work continues in that vein,
although the term evolutionary programming (EP) is normally associated with a speciﬁc
form of parametric evolutionary search. Classiﬁer systems encode sets of production
rules to create rule-based systems that carry out programmatic tasks. By far the most
prevalent form of programmatic evolutionary search is genetic programming (GP), which
encodes its programs as LISP-like function trees. There are also forms of programmatic
evolutionary search that produce primitive directives similar to machine code.
There are three common forms or schools of parametric evolutionary
computation. The earliest form is evolutionary programming (EP) which originated with
Lawrence Fogel [Fogel 62]. The most commonly referenced form in the United States is
genetic algorithms (GA) which was developed by John Holland in the 1960’s and
published in book form in his monograph Adaptation in Natural and Artiﬁcial Systems
[Holland 75]. Evolutionary strategies (ES) originated with Bienert, Rechenberg, and Schwefel in Germany [Rechenberg 65] and have their strongest influence in European
circles. Each of these systems employs an array of operators, and there are countless
variants of each. In the following sections we will describe the standard forms of each of these three systems and note some of the more commonly used variants. Following the overview of these systems, the major operators and variants will be categorized and presented in further detail. First, however, a common framework of an evolutionary algorithm is presented to allow unification of
terminology in the later presentations.
The analytic techniques developed in this research are designed for use in
parametric evolutionary search systems. While it may be possible to adapt some of the
concepts herein to programmatic approaches, certain assumptions such as continuity and
natural ordinality of the parameter space, locality of movement within the parameter
space, etc. need to be translated into equivalent assumptions on the nominal forms typical
to most programmatic encodings before such adaptation could proceed. To the degree
that such equivalent assumptions may not exist, it may not be possible to apply these
techniques in a programmatic domain. Further, the form of these translated assumptions
may dictate a different course than that presented here. It is likely that the form of
analysis presented here may be in some fashion applicable to programmatic EC, but it is
doubtful that the speciﬁc techniques and operators proposed would have any direct
parallels.
2.2.1 General Parametric Evolutionary Computation Algorithm
All evolutionary algorithms manipulate collections of (as opposed to individual)
potential solutions, which represent populations in the metaphor of natural evolution.
The individuals in a given population undergo successive rounds of modiﬁcation and
reﬁnement through recombinative, mutative, and selective operations producing
individuals which may be incorporated into subsequent successor populations. Most
evolutionary algorithms initialize the ﬁrst source population randomly through random
sampling across a bounded segment of the parameter range. Search resources are
commonly constrained; consequently population sizes remain constant for the majority of
EC systems. It is important to note that standard population sizes are exponentially smaller than the enumerated search space. Also, since search spaces grow exponentially with linear expansion of the encoding, population sizes do not scale with the size of the search space, and we therefore expect an exponential slowdown in performance and accuracy as the size of the search expands.
The course of a standard evolutionary algorithm is often represented as
progression from source population to successor population with various modiﬁcations to
the individuals and potentially limited growth and reduction of the population between
them. However, since we will be examining the collective distributional effects of an
operator on the distribution of the population free from any emergent behaviors from
interaction with other operators, we present a basic outline for an evolutionary algorithm
which shows multiple distinct successive population transformations for each applied
operator. In order to bridge the gap between standard notation and our extended notation,
we will label all intermediate populations, pools. Therefore we will deﬁne an
evolutionary algorithm as a series of transformations from a source, or parent population
through a succession of intermediate pools, ﬁnally producing a successor, or child
population. Note that these intermediate pools may be virtual populations in that the
individuals which comprise these groups may never be collected together as such;
however, they may still be viewed as a reasonable collection of individuals in a given
state, and as such may be treated statistically as a population variant. For example, a GA may successively select two individuals from the source population, apply recombinative and mutative operators to the selected individuals, and deposit the modified solutions into the successor population before selecting the next pair for mating. Thus, there never is a physical "post-breeding-selection" pool.
An outline of the various stages of a standard evolutionary algorithm is given in
Figure 2.1. Note that the order of the post-recombination and post-mutation pools is interchangeable. In fact, it is possible to have more than one recombinative operator or mutative operator, or none. For each mutative or recombinative operator, we will typically represent a separate post-recombination or post-mutation pool so that we can observe the effects of each operator in isolation. The initial population is copied to (or, equivalently, becomes) the source population. Individuals from the source population
may be selected for breeding through an optional breeding selection operator. Note that
as with all intermediate pools, breeding pool sizes may vary or be ﬁxed depending on the
implementation of a given evolutionary algorithm. For each subject selected for breeding
a cohort pool may be selected. A cohort pool is a potentially limited selection of
individuals from the general population. Membership is typically stochastic and may be
based on various criteria such as similarity or dissimilarity to the initial subject (i.e.
niching or incest reduction). Members from the cohort pool may provide supplementary
data for operator action. Cohort pools may also be shared. For most common EC
implementations the cohort pool is selected via the same mechanism as the breeding pool.
However, we differentiate the two here since it is possible to select individuals via
alternate mechanisms with the speciﬁc intent of selecting better breeding information for
a given member of the breeding pool, rather than simply more ﬁt individuals. After all
recombinative and mutative operators have been applied, an optional survival selection
operator may be applied to determine the composition of the successor population. Note
that one or more individuals from the original source population may also participate in
the survival selection competition (e.g., μ + λ selection, elitism). Finally, the successor
population becomes the source population for the subsequent search iteration. This cycle
repeats until some stopping criterion is reached.
Initial Population --(copy)--> Source Population
Source Population --(breeding-selection)--> Breeding Pool; --(cohort-selection)--> Cohort Pools
Breeding Pool + Cohort Pools --(recombination)--> Post-Recombination Pool
Post-Recombination Pool --(mutation)--> Post-Mutation Pool
Post-Mutation Pool --(survival-selection)--> Successor Population

Figure 2.1 Outline of a basic evolutionary algorithm
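To make the pooled structure concrete, the following Python sketch traces one pass through the cycle of Figure 2.1. It is illustrative only: the operator arguments (breeding_selection, recombine, mutate, survival_selection, stop) are hypothetical placeholders standing in for whatever concrete operators a particular EC system supplies, and cohort pools are drawn here by simple uniform sampling.

```python
import random

def evolve(init_population, breeding_selection, recombine, mutate,
           survival_selection, stop):
    """Sketch of the pooled evolutionary cycle of Figure 2.1.

    Every argument except init_population is a caller-supplied operator;
    the names are placeholders, not part of any standard library.
    """
    source = list(init_population)       # initial population copied to source
    while not stop(source):
        breeding_pool = breeding_selection(source)
        # Each breeding subject may draw supplementary data from a cohort pool.
        cohort_pools = [random.sample(source, k=2) for _ in breeding_pool]
        post_recombination = recombine(breeding_pool, cohort_pools)
        post_mutation = [mutate(ind) for ind in post_recombination]
        # Survival selection may also consider members of the source population.
        source = survival_selection(source, post_mutation)
    return source
```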
Those familiar with standard presentations of evolutionary computation systems
may note the lack of explicit objective evaluation of individuals in the outline presented
in Figure 2.1. Objective information, or some form of comparative capability is
necessary to make selective decisions; therefore, we assume some objective evaluation is
carried out before or during breeding selection and survival selection (and possibly cohort
selection). In practice, new solutions are commonly evaluated and assigned a
quantitative value at initialization (in the case of the initial population) or after
modiﬁcation by whatever operators are being applied. However, logistically this
information is typically not required outside of selection operators, so we include the task
of assigning objective values as part of the selection operations. In particular, any
operator which does require objective value information may be viewed as a composite
of a pool selection component and an operator component within this model.
Each evolutionary algorithm must choose a method for representation of the
parameter values being searched. Often the choice of representation leads to specific
assumptions and search characteristics for a given EC implementation. EP and ES
typically require parameters to be real-valued entities, and therefore are most suited to search spaces with continuous ordinal dimensions; however, any real-valued encoding, such as IEEE floating point representation, or a sufficiently fine-grained discrete encoding, is suitable. Standard GA implementations represent all entities as bit strings.
This is largely due to the strong influence of the schema theorem [Holland 75], the k-armed bandit metaphor [Goldberg 89b], and the concept of implicit parallelism. Typical GA approaches to real-valued function optimization use fine-grained binary
discretization to represent parameter values. Alternate forms of genetic algorithms are
more ﬂexible in their form of parameter representation but require alternate operators to
achieve equivalent results. The theory of interval schema has been advanced as an
24
equivalent alternative to standard schema theory for such alternate GA forms [Eshelman
93].
2.2.2 Evolutionary Programming
Evolutionary programming (EP) was originally visualized as a programmatic
evolutionary algorithm, employing stochastically modiﬁed nondeterministic ﬁnite
automata (NDFA) to encode programmatic solutions [Fogel 62]. Over time EP evolved
into a parametric search algorithm used primarily for realvalued function optimization
[Back 97]. While some active research still continues on NDFA evolution, the majority
of current EP research focuses on parametric search.
EP differs from the majority of other evolutionary algorithms in that it assumes
that the population as a whole does not contain signiﬁcant information about the most
productive directions for future search. This equates to a metaphor of speciation in that
individuals are seen to represent species of solutions rather than individuals. The lack of
interbreeding between species in natural systems thus equates to a prohibition against
recombinative operators in general.
The philosophy of design for EP operators stresses phenotypic versus genotypic
manipulation. Genotypic manipulation implies that the level of manipulation should be
at the encoded level. Thus the encoding designer has the onus of ensuring that operator
manipulations translate to meaningful search actions in the parameter space. In contrast,
phenotypic manipulation implies that the level of manipulation should be within the
decoded parametric space. To minimize the potential for bias, meaningful search action
for EP typically equates to favoring smaller search movements in phenotypic space. For
this reason EP typically incorporates operators which focus on mutation with emphasis
toward continuity and localized search.
Since EP does not use population level information to induce the magnitude of
search activity, an alternative method for deﬁning the magnitude and direction of
mutation is necessary. Current EP implementations typically employ selfadaptive
techniques to define mutative magnitudes. Self-adaptation is implemented through the addition of d parameters to each encoded solution, where d is the number of parametric
dimensions. Each of these metaparameters is initialized to a fraction of the initial range
for each corresponding solution parameter (usually relative to the reciprocal of the square
root of d in order to maintain constant operator variance regardless of dimensionality).
These metaparameters represent the width of the probability distribution used for
mutation. A form of mutation is also applied to these metaparameters simultaneously
with the mutation of the solution parameters. Thus survival of individuals (or species)
requires not only location of fruitful solution parameters but also fruitful mutation
magnitudes. Note that the latter requires a longer time frame before successful feedback
can be obtained. An alternative, earlier form of mutation step-size adaptation tied the magnitude of mutations to the ratio of successful advancement of children [Rechenberg 73]. Both of these techniques were originally pioneered under evolutionary strategies
(ES) [Rechenberg 73] [Schwefel 77].
EP employs a form of survival selection known as (μ + λ) selection [Back 97]. Under EP selection, a population of μ individual solutions is mutated to produce λ children, where λ is typically an integer multiple of μ. Note that all members of the source population participate equally in the production of children without any form of reproductive selection. After all λ children have been produced, a ranking tournament is held among the super pool of the μ parents and λ children combined, from which the best μ are selected as members of the successor population. Thus the post-mutation pool, including the addition of the untouched members of the source generation, is μ + λ in size, while the size of the source and successor populations is μ; hence the term (μ + λ) selection.
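A minimal sketch of one EP generation under these conventions follows, assuming a minimization problem. The lognormal update of the step sizes and the learning rate tau are one common choice from the self-adaptation literature, not the only possibility.

```python
import math
import random

def ep_generation(population, f, lam_per_parent=1):
    """One (mu + lambda) EP generation, minimizing objective f.

    Each individual is (x, sigma): a parameter vector and its per-dimension
    self-adaptive mutation step sizes.
    """
    mu = len(population)
    d = len(population[0][0])
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(d))   # one common learning-rate choice
    children = []
    for x, sigma in population:                 # every parent breeds equally
        for _ in range(lam_per_parent):
            # Mutate the step sizes first (lognormal perturbation), then use
            # the new step sizes to mutate the solution parameters.
            new_sigma = [s * math.exp(tau * random.gauss(0, 1)) for s in sigma]
            new_x = [xi + si * random.gauss(0, 1)
                     for xi, si in zip(x, new_sigma)]
            children.append((new_x, new_sigma))
    # Ranking tournament over the super pool of mu parents + lambda children;
    # the best mu survive.
    pool = population + children
    pool.sort(key=lambda ind: f(ind[0]))
    return pool[:mu]
```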
EP selection arguably may achieve a high level of elitism. Elitism allows an elite or extremely fit individual to pass untouched from the source population to the successor generation. With (μ + λ) selection, all μ members of the source population may survive untouched as members of the successor population.
2.2.3 Genetic Algorithms
Genetic algorithms are largely focused on the concept of simulated genetic
representation and modiﬁcation. This is partially due to the original focus of some of the
foundational and most inﬂuential works published on genetic algorithms. In Adaptation
in Natural and Artiﬁcial Systems [Holland 75], Holland focuses primarily on
understanding and simulating the mechanisms of natural adaptation. Much of the
preliminary thrust in GA research simulated the natural genetic manipulation and often
borrowed concepts from theories of genetics and natural evolution. Also, Holland ﬁrst
proposed the schema theorem as an attempt to explain the apparent power of adaptive
search techniques [Holland 75]. This theory was further amplified by the k-armed bandit
analysis of binary encoded GA as presented in Genetic Algorithms in Search,
Optimization, and Machine Learning [Goldberg 89b]. These theories are largely
concerned with binary schemata and their interactions. Many researchers still claim that
having a minimal alphabet size (i.e. binary encoding) provides optimal search power.
There have been a number of criticisms against various portions of the schema and k-armed bandit theories [Fogel 98] [Fogel 00], and alternative theories have been advanced
for GA incorporating other encoding forms [Eshelman 93].
GA stand as a partial antithesis to EP in that GA derive the majority of their
search strength and guidance for future search directions from the relative locations of the
members of the source population in the genotypic space, and their relative objective
values. This stems from the fact that recombination plays a dominant role in genetic algorithms, while mutation is deemed necessary only to maintain sufficient diversity to
support continued recombinative search. The justiﬁcation for the strong reliance on
recombination with minimal levels of mutation is based on the implicit parallelism
argument of the building block theory. This strong emphasis on recombination has
elicited criticism over the potential for revisitation of search points. Such concerns arise
from consideration of TABU search [Glover 89] which derives a portion of its power
from explicit measures taken to discourage point revisitation, and extended analysis of
the No Free Lunch theorems which imply that point revisitation may cause exponential
losses in search performance relative to those systems which explicitly avoid point
revisitation [English 99].
There are a number of variants of GA, and a wide array of potential operators
with varying degrees of popularity and levels of experimental results. First, we outline
the most common form of genetic algorithm which is basically equivalent to Goldberg’s
simple GA (SGA) [Goldberg 89b] following which we outline some of the more common
operator variations and variant systems.
2.2.3.1 The Standard Genetic Algorithm
SGA employs binary encoding and binary operators exclusively. The initial
population is initialized using uniform random sampling. The reproductive selection
scheme distributes the probability of selection according to the relative objective values
of the members of the source population (assuming that we desire to maximize ﬁtness),
otherwise known as proportional selection. Pairs of the individuals from the breeding
pool are selected for potential application of crossover. Binary mutation may be applied
to individuals from the post-recombination pool; however, typically no intermediate
breeding selection is performed between application of crossover and mutation.
Crossover and mutation are applied on a probabilistic basis with a portion of the breeding
pool potentially passing directly into the successor population without modiﬁcation.
Crossover application rates are fairly high (typically at least 70%), with low
mutation rates (at most 1 bit per encoded solution, often much less). Reproductive
selection chooses exactly μ individuals from the source population with replacement (i.e., individuals from the source population may be repeated in the breeding pool). All operators replace their input individuals with modified individuals in the successive pool or population upon application. Therefore, all pool and population sizes are fixed at μ; hence, this selection may be termed (μ, λ) selection, where λ = μ, or (μ, μ) selection [Back 97]. No survival selection is applied. Selection and application of operators
repeats until some ending criterion is reached or processing limits are reached.
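A minimal sketch of one SGA generation follows, assuming bit-string individuals (lists of 0/1 integers), strictly positive fitness values, and an even population size; the parameter names and defaults are illustrative, not canonical.

```python
import random

def sga_generation(population, fitness, p_cross=0.7, p_mut=0.001):
    """One generation of a simple GA over bit strings (maximization assumed).

    Proportional selection fills a breeding pool of size mu; pairs then
    undergo probabilistic one-point crossover and per-bit mutation.
    Assumes positive fitness values and an even population size.
    """
    mu = len(population)
    weights = [fitness(ind) for ind in population]   # proportional selection
    breeding = random.choices(population, weights=weights, k=mu)
    successor = []
    for a, b in zip(breeding[::2], breeding[1::2]):
        if random.random() < p_cross:
            cut = random.randrange(1, len(a))        # one-point crossover
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        for child in (a, b):
            # Independent per-bit mutation with probability p_mut.
            successor.append([bit ^ (random.random() < p_mut) for bit in child])
    return successor
```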
There are multiple forms of recombination which have been employed in GA
systems. Standard crossover operators which are typically applied to binary encodings,
but which may also be applied to larger encodings with restricted crossover points,
29
include 1-point crossover, 2-point crossover, and uniform crossover. These basic
crossover forms are similar in that they produce two child solutions from two parent
solutions. The value from each of the two parent solutions for each allele, or atomic
value (typically a single bit), appears in one of the two children; thus the value information
from the parents is conserved. The children are composed of blocks of values from
alternating parents. The number of crossing points determines the number of contiguous
parent blocks in the children. m crossing points implies m+1 contiguous blocks transferred to each child (thus in 2-point crossover, there are three alternating contiguous blocks taken from alternating parents, as demonstrated in Figure 2.2). The position of the
crossing points is typically selected uniformly from the set of all possible crossing points.
Figure 2.2 Example of 2-point crossover

Uniform crossover allows from 0 to n crossing points, where n is the size of the set of all possible crossing points.
Thus, uniform crossover simply allows each allele an independent 50% chance of being
inherited from a given parent. (Note however, that once a child solution inherits an allele
from a given parent, the second child automatically receives the allele from the opposing
parent.)
2.2.3.2 Alternate Genetic Algorithm Operators
A number of recombinative operators have been developed speciﬁcally for use by
GA on realvalued function Optimization problems. These include averaging crossover
[Davis 91], linear crossover [Wright 91], ﬂat crossover [Radcliffe 90], blend crossover
(BLX) [Eshelman 92], unimodal normal distribution crossover (UNDX) [Ono 97], m-parent UNDX [Ono 99], and simplex crossover (SPX) [Tsutsui 99]. These operators are
described in detail in Section 2.3.
Use of the raw objective values for proportioning selection probabilities can often
cause difﬁculties; therefore a number of alternate forms of reproductive selection have
been employed in GA. Scaled proportional selection uses a normalization factor before
determining relative selection probabilities. Ranked selection assigns probabilities to
individuals depending on their absolute rank within the current population. Boltzmann selection uses Boltzmann scaling to determine selection probabilities. All of these forms
of reproductive selection are quantitative in that they require the objective function to be
able to produce a consistent single quantitative value of merit for an individual solution.
Tournament selection requires only that a comparative evaluation can be made between
the quality of two or more competing solutions.
A number of optional operators and algorithms for genetic algorithms have been
developed to help maintain population diversity. Niching attempts to maintain stable
levels of diversity by forcing new individuals to compete with genetically similar
individuals (of the previous or current generation) for survival [Goldberg 89b]. Incest
reduction [Eshelman 91b] takes an opposing approach by selecting a mate for a given
selected individual that is most genetically opposite (typically measured in Hamming
distance). Elitism allows for objective value based survival of individuals from the parent generation directly into the child generation, thereby guaranteeing that the best solution in the parent population will survive untouched [Eshelman 91a]. These
operators and algorithms have been widely studied and are often used in GA systems.
2.2.3.3 Alternate Genetic Algorithm Systems
Two fairly common alternative GA systems designed for real-valued function optimization are Eshelman's CHC [Eshelman 91a] and Mühlenbein's Breeder Genetic Algorithm (BGA) [Mühlenbein 95]. In their current form, these systems employ operators intentionally adapted to the real-valued function domain, and have shown enhanced performance on specific real-valued test problems over the standard binary GA approach [Eshelman 91a] [Mühlenbein 95].
CHC, first proposed by Eshelman [Eshelman 91a], was created with the express
intention of overcoming difﬁculties with search effectiveness and efﬁciency in the
standard GA approach. CHC introduced several new operators which were departures
from the popular thought about genetic search at that time. CHC uses population elitist
selection, which is survival oriented and uses a ranking of the combined parent and child
solution sets reminiscent of EP selection. The current popularity of uniform crossover [Syswerda 89] is partially due to its incorporation into the CHC framework (although CHC uses a specific variant of uniform crossover, HUX, which always allocates exactly half of the alleles from each parent). CHC also introduced the concept of incest reduction
as a method of maintaining population diversity. The CHC system does not employ a
standard mutation operator, but instead enters a reinitialization phase once given
convergence criteria have been met. The reinitialization is not a complete randomization
— the population is reinitialized with copies of the best solution with a preconfigured percentage of random mutation applied (Eshelman suggests 35% of alleles be mutated).
Two years after its introduction, Eshelman introduced the blend crossover operator, BLX, and its most common variant, BLX-α [Eshelman 93]. BLX-α recombination is designed for continuous domains. Eshelman successfully demonstrated that BLX-α works best within the CHC framework, as opposed to the standard GA framework. Given the relative difficulty of dealing with continuous domain problems with a standard binary GA, and given the large number of continuous domain optimization problems available in the real world, CHC using BLX-α quickly became a
popular tool.
Initially outlined by Mühlenbein [Mühlenbein 95], BGA represents a radical departure from standard genetic algorithm design. The philosophical metaphor for BGA is that of simulating expert human breeders, rather than simulating random natural genetic processes. Further, Mühlenbein attempts to justify the design of BGA and its
operators using tools similar to those of expert human breeders, genetic theory, and
statistical inference.
BGA uses reproductive truncation selection, meaning that parents are selected randomly from among the top T% of the parent generation. Mühlenbein demonstrates a derivation relating T to the selection intensity, I, and thereby to the expected convergence rate in the absence of mutation. BGA obtains most of its power from recombinative operators; in fact, it utilizes three forms of recombination: discrete recombination, extended intermediate recombination, and extended line recombination. The mutation operator is unique to BGA and its predecessor PGA (parallel genetic algorithm), and
consists of discrete log-uniform distributions scaled relative to the initial parameter range
for a given variable. This mutation operator simulates the distribution induced by binary
mutation on standard unsigned integral representations.
Parallelization of genetic algorithms provides a number of potential modiﬁcations
to the basic GA system. Parallelization typically is categorized by the degree of isolation
and independence between individual solutions in the population. Global parallelization
is conceptually identical to the standard GA system, with the simple addition of more
computing resources. Coarse-grained (island) parallelization provides for individual
subpopulations with a speciﬁed rate of interchange of individuals. These migrated
individuals may be selected at random, via competitive selection, or through various
other mechanisms. Fine-grained, or cellular, genetic algorithms isolate individual
solutions and provide a ﬁxed interchange map (commonly a toroid) to designate which
neighbors may be used during the recombinative operation [Whitley 93]. Note that it is
also possible to simulate the logistics of one of these parallel systems using a single processor. Therefore, this is logical parallelism, or more precisely logical breeding
control. In this sense, these concepts are similar to niching and incest prevention in their
effects.
John Holland’s monograph, Adaptation in Natural and Artiﬁcial Systems
[Holland 75], outlines two forms of “reproductive plan”. The Rd plan represents the
model for the standard GA structure with its generation based population replacement.
However, the Rd plan is the result of Holland’s schema theorem analysis of a different
reproductive plan, R1 [DeJong 92]. The R1 plan maintains a fixed-size population and
allows only a single operator application at a time. One or two parent solutions are
selected from the population via proportional ﬁtness based selection, and the resulting
child solutions potentially displace one or two solutions from the current population. The
replacement selection mechanism in R1 is uniform random sampling, that is, the child
solution replaces a uniformly randomly selected individual from the parent population
without regard to ﬁtness. (Much emphasis in evolutionary simulation is placed on the
importance of survival until reproduction, not on continued survival of the individual.)
Whitley’s GENITOR system revived the R1 structure as an alternative “steady state GA”
architecture [Whitley 88]. The popularity of GENITOR revived interest in DeJong’s
analysis of “generation gap” measures (i.e., the effects of having solutions of differing
generational “age” competing in the same population) [DeJong 75]. Interestingly,
Rechenberg proposed an identical system as an early form of ES [Rechenberg 73] and
therefore deserves the credit for discovery [Rudolph 97]; however, modern steady state
GA systems can clearly trace their origins to Holland’s nearly parallel development
[Holland 75].
Alternate implementations of the steady state GA model use other forms of
replacement selection, such as replacement of the worst solution, or competitive
comparison against the progenitor solution(s) used to create this solution. DeJong
concludes that any success from steady state GAs is more a product of the replacement
strategy than a modiﬁcation of the generation gap [DeJong 92]. However, Rogers and
Prügel-Bennett [Rogers 99] [Rogers 00] conclude that a steady state GA produces
twice the selection pressure and twice the potential for genetic drift as a population level
GA with a similar population size. The steady state GA is also related to parallelization
of genetic algorithms in that allowing individual processors to proceed at an independent
pace produces a similar generation gap effect.
More ambitious modiﬁcations of the GA framework focus on altering the method
of solution representation, and thereby are less clearly classiﬁed as GA rather than
general EC. One of these alternate systems, which still maintains a fairly clear GA
character, is Goldberg’s messy GA [Goldberg 89c]. The messy GA allows for multiple
redundant representation of values within a solution encoding. Individual parameter
values are not determined by their position in the encoded solution as with most other
parametric EC systems. Instead, encoded solution parameters deﬁne both their
placement within the solution and their value. The central focus is in overcoming the
encoding problem in terms of localizing the linkage between solution parameters, similar
to Holland’s inversion operator [Holland 75]. A messy GA system attempts to build up
solutions by examining sub-solution sets (in the so-called primordial phase) and then building solutions from these sub-solutions. This continues recursively until complete
solutions are built. Specialized messy operators are designed to perform recombination
and mutation on these representations in a logical manner.
A given messy GA solution may underrepresent one or more solution parameters
by failing to include any speciﬁcation for them, while simultaneously overrepresenting
other solution parameters. Multiple potential solutions to these problems have been
suggested. Missing parameters may be handled through use of default values or through
sampling; however, the former has the potential for introduction of bias, while the latter
adds a potentially large stochastic component to evaluation. Overrepresentation is
typically resolved via averaging or by taking the most recent speciﬁcation over previous
ones. As with genetic programming representation, messy GA solutions tend to grow
unboundedly over time unless some form of limitation or resolution is employed.
However, such modiﬁcations tend to destroy the structure of the underlying subsolutions
and are generally avoided.
2.2.4 Evolutionary Strategies
The initial impetus for evolutionary strategies (ES) is attributed to experiments carried out by Bienert, Rechenberg, and Schwefel during the mid-1960s [Rudolph 97]
and reported initially by Rechenberg [Rechenberg 65]. ES systems commonly employ a
number of techniques and operators for real-valued function optimization. Rudolph [Rudolph 97] reports that the current focus of ES on real-valued function optimization
may be largely due to Rechenberg’s successful analysis of the simple version of ES in
Euclidean space with continuous mutation [Rechenberg 73].
Early ES focused intently on mutation as the driving force of search, born largely
from statistical analysis of complex systems. These analyses led to various
improvements in the search technique, such as mechanisms for determination of the ideal
mutative step size and later the mutative axial orientation. The success rate of mutations
provided an early mechanism for determination of mutative step size, which is still in use
in some current research. Rechenberg's analysis determined that, in order to successfully stave off premature convergence, 20% of all newly searched points should show improvement relative to the parent solution. This led to development of the well-known 1/5th rule, whereby the mutative step size is decreased if recent mutations have produced less than a 20% success ratio, and increased if the mutation success rate rises above 20% [Schwefel 95].
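In code, the 1/5th rule reduces to a simple multiplicative update. The damping factor c in the sketch below is an assumption on our part; values near 0.85 are often quoted in the ES literature, but the exact constant and the success-ratio window are implementation choices.

```python
def one_fifth_rule(sigma, successes, trials, c=0.85):
    """Adapt a mutative step size via the 1/5th success rule.

    successes/trials is the fraction of recent mutations that improved on
    the parent; c is a damping factor (values near 0.85 are typical).
    """
    ratio = successes / trials
    if ratio < 0.2:
        return sigma * c        # too few successes: shrink the step size
    if ratio > 0.2:
        return sigma / c        # too many successes: expand the step size
    return sigma                # exactly 1/5: leave the step size unchanged
```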
ES research developed mechanisms for self-adaptation of mutative step sizes. Self-adaptation allows coevolution of the mutative parameters with the solution
parameters. Each population element is extended to include step size parameters for
mutation along each axis. This form of mutation has been adopted by EP systems as the
primary form of mutation. ES systems also incorporate recombinative operators. These
operators can be pairwise, as with GA systems; however, ES commonly uses global,
population-level recombinative operators (i.e., operators which use all elements of the population as a joint parent set). More recent forms of evolutionary strategies move beyond self-adaptive mutative step sizes to include self-adaptation of the axial alignment [Rudolph 97]. The axial alignment can be represented as n(n−1)/2 rotation angles. ES
systems may perform mutation and recombination on both the solution parameters and
the self-adaptive, metalevel parameters; however, the actual operators employed for the meta-parameters are typically different from those used for the solution parameters.
2.3 Overview of Common Evolutionary Operators
2.3.1 Selection Operators
There are countless forms of selection commonly employed in parametric
evolutionary computation systems. The following overview includes the most common
selection operators. The selection operators are presented in two groups: those operators
which are applied to selection for breeding, and those operators which are used to select
survivors.
2.3.1.1 Reproductive Selection
Reproductive selection operators are used to choose individual solutions from a
given population pool for use in reproduction. This includes both mutative and
recombinative reproduction, although some selection operators, such as incest reduction,
are speciﬁcally designed to select mates for recombinative operators.
2.3.1.1.1 Uniform Random Selection and Uniform Sampling
Some systems, especially those that employ post-reproductive (survival)
selection, allow individuals for reproduction to be selected uniformly at random from the
population pool. Using uniform random selection, both the worst and the best of the
surviving solutions have an equal probability to reproduce. In fact, many systems force
completely uniform sampling of the surviving population by producing c child solutions
from each surviving solution. In this manner, the search process becomes somewhat less
susceptible to stochastic effects such as genetic drift. This is more apparent with smaller
population sizes, where stochastic effects tend to dominate more quickly.
2.3.1.1.2 Proportional (Roulette) Selection
Assume we are given a population pool of n solutions, s_i, each with a given fitness value, f_i. The assigned fitness values must be maximally oriented; that is, a more positive fitness value denotes a better solution. Select a uniform random value, t, from the range $[0, \sum_{i=0}^{n-1} f_i)$. The selected solution is s_j, where

$$\sum_{i=0}^{j-1} f_i \le t < \sum_{i=0}^{j} f_i, \qquad f_{-1} = 0.$$
This is the form of proportional selection used initially in GA by Holland [Holland 75]
and Goldberg [Goldberg 89b]. Given its pivotal position in the schema theorem, many
GA purists insist that this is the only acceptable form of selection for use within a GA
framework. However, the central problem with proportional selection is that the intensity
of selection varies with the variance of the fitness values. In landscapes with large fitness
variance, proportional selection tends to converge quite early while in fairly ﬂat
landscapes, proportional selection may fail to provide sufﬁcient pressure to differentiate
solutions effectively. Further, proportional selection assumes that all fitness values are positive, or alternately, that a minimal fitness value is known in advance.
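A direct transcription of this procedure into Python (a minimal sketch, assuming strictly positive, maximally oriented fitness values):

```python
import random

def proportional_select(solutions, fitnesses):
    """Roulette-wheel selection: pick s_j with probability f_j / sum(f).

    Assumes all fitness values are positive and that higher is better.
    """
    t = random.uniform(0.0, sum(fitnesses))   # spin: t in [0, sum of f)
    running = 0.0
    for solution, f in zip(solutions, fitnesses):
        running += f
        if t < running:            # first j whose partial sum exceeds t
            return solution
    return solutions[-1]           # guard against floating-point round-off
```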
2.3.1.1.3 Rank Based Selection
Rank based selection is an algorithm that attempts to overcome the reservations
about the effects of ﬁtness variance and the requirement of a predetermined minimum
ﬁtness range which occur with proportional selection, while maintaining the same
relative form of proportional selection. Given a population pool of n solutions, s_i, each with a given fitness value, f_i, sort the solutions in the population pool in order of their relative fitness from most fit to least fit. Assume a positively ranged, monotonically decreasing discrete function f(i) defined for all integers i ∈ [1, n]. Assign the value f(i) as the selection value for each solution s_i in the sorted population pool, where i is the sorted ordinal position of each solution. An example of a typical ranking function is $f(i) = 1 - \frac{i-1}{n}$. The standard proportional selection algorithm is then performed on
these selection values rather than on the raw ﬁtness values. Note that any number of
ranking functions may potentially be used. As with tournament selection, it is possible to carry out ranking selection without quantitative fitness information as long as the qualitative evaluations exhibit the transitive property (that is, if f_i > f_j and f_j > f_k, then f_i > f_k). However, in this case, a minimum of O(n log n) qualitative comparisons will be needed to produce the ranking.
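Rank-based selection is then simply proportional selection applied to ranking values rather than raw fitnesses. The sketch below uses the linear ranking function given above, with random.choices performing the proportional step:

```python
import random

def rank_select(solutions, fitnesses):
    """Rank-based selection with the linear ranking f(i) = 1 - (i-1)/n.

    Solutions are sorted most fit to least fit (maximization assumed), then
    proportional selection is run on the ranking values instead of raw fitness.
    """
    n = len(solutions)
    ranked = sorted(zip(solutions, fitnesses), key=lambda p: p[1], reverse=True)
    rank_values = [1.0 - i / n for i in range(n)]   # i = 0 is the best rank
    return random.choices([s for s, _ in ranked], weights=rank_values, k=1)[0]
```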
2.3.1.1 .4 GA Tournament Selection
Given a population pool of n solutions, s_i, each with a given fitness value, f_i, select two or more individuals, as elements of tournament set T, from the population pool with uniform random probability. The selected solution is s_j, where f_j ≥ f_k for all s_k ∈ T, k ≠ j. Note that tournament selection is parameterized by the size of the tournament set, |T|, also known as the tournament size.
The advantage of tournament selection over proportional selection is threefold.
First, the relative ﬁtness variance no longer impacts the selection intensity. Therefore,
rescaling of the ﬁtness does not modify the operation of selection if the relative order of
fitness remains unmodiﬁed. Second, there is no requirement to maintain a positive
fitness range or to have any predetermined minimal fitness value. As with ranked
selection, there is no requirement for ﬁtness to be positively biased (that is, this form of
selection works equally well with minimization and maximization problems). Third,
tournament selection does not require the ﬁtness comparison to be quantitative.
Qualitative comparison of solutions is all that is required for tournament selection, as
long as the comparison remains transitive. (That is, if f_i > f_j and f_j > f_k, then f_i > f_k.) In the qualitative situation, each selection will require O(|T|) qualitative comparisons.
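A sketch of a single tournament selection, written to require only a qualitative comparison (better is a caller-supplied predicate, not a standard name), using exactly |T| − 1 comparisons:

```python
import random

def tournament_select(solutions, better, size=2):
    """Tournament selection of one individual.

    better(a, b) need only provide a transitive qualitative comparison
    (True when a beats b); no quantitative fitness values are required.
    """
    contestants = random.sample(range(len(solutions)), size)
    winner = contestants[0]
    for k in contestants[1:]:          # |T| - 1 pairwise comparisons
        if better(solutions[k], solutions[winner]):
            winner = k
    return solutions[winner]
```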
2.3.1.1.5 Niche Mating and Incest Reduction
Incest reduction is actually a form of tournament selection for choosing a mate;
however, the ranking criterion is no longer solely based on the ﬁtness value, but rather on
the degree of complementation to a previously selected mate. Given a population pool of
n solutions, s_i, each with a given fitness value, f_i, select a single individual, s_p, from the population using any other fitness-based selection mechanism. Next, select a pool of potential mates, M, where |M| ≥ 2. Typically members of M are selected uniformly from the initial population pool, but they alternately may be selected by any standard fitness-based selection mechanism. Given s_p, select s_q ∈ M such that H(s_q, s_p) ≥ H(s_j, s_p) for all s_j ∈ M, where j ≠ q, and H(s_a, s_b) represents the Hamming distance (bit difference) between solutions a and b.
Niche mating selects mates which are as similar as possible, thereby forming
effective subspecies within a population [Deb 89]. The technique is similar to incest
reduction except that the most closely related mate in terms of Hamming distance is taken
from the mate pool, M. Eshelman demonstrated that incest reduction in the CHC
framework was more effective on selected function optimization tasks than niche mating under a standard GA framework [Eshelman 91].
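A minimal sketch of incest-reduction mate choice over bit strings follows; swapping max for min yields niche mating instead:

```python
import random

def incest_reduction_mate(s_p, population, pool_size=2):
    """Choose a mate for s_p that maximizes Hamming distance (incest reduction).

    Individuals are assumed to be equal-length bit strings. For niche
    mating, replace max with min to take the most similar mate instead.
    """
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    mate_pool = random.sample(population, pool_size)   # uniform mate pool M
    return max(mate_pool, key=lambda s: hamming(s, s_p))
```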
2.3.1.1.6 Fitness Sharing
Although related to crowding (see 2.3.1.2.1 below), ﬁtness sharing is typically
implemented as a form of reproductive selection, whereas crowding is by deﬁnition a
survivalist form of selection. Fitness sharing forces solutions to carry an effective ﬁtness
value which may be reduced from its actual fitness value if there are too many nearby solutions (measured either genotypically or phenotypically) [Goldberg 87]. Alternate
forms of ﬁtness sharing have been proposed based on tournament selection, including
restricted tournament selection (RTS) [Georges 95], and adaptive restricted tournament
selection (ARTS) [Roy 95].
2.3.1.2 Survival Selection
While reproductive selection attempts to determine which members of the current
population will participate in reproduction and to what degree, survival selection
determines what solutions are kept after a reproductive cycle. Note that these functions
are similar in nature in that if a solution is not selected for mating, it effectively does not
survive in successive generations. However, reproductive selection tends to be more
stochastic, allowing even the worst solutions a nonzero probability of reproduction,
while survival selection tends to be more deterministic — either a solution survives or it is
terminated. While it is possible to use both selection forms in concert, this normally
produces too much selection pressure for a given system. Therefore typical systems
which employ survival selection will also employ uniform random selection or uniform
sampling for reproductive selection.
2.3.1.2.1 Crowding
Like ﬁtness sharing algorithms, crowding attempts to maintain diversity by
controlling the number of solutions which can populate a given solution landscape
location. DeJong [DeJong 75] introduced crowding as a form of niche creation in GA.
DeJong’s crowding algorithm forces individuals to dislodge similar individuals from the
existing population. Thus the population tends to maintain its dispersion across the
current peak locations in the population. However, given the non-deterministic character
of stochastic reproduction processes, the populations still tend to drift toward the most
prominent peak over time. Mahfoud attempts to solve this difficulty with a deterministic form of crowding [Mahfoud 92] [Mahfoud 95].
2.3.1.2.2 Boltzmann Selection
Boltzmann selection is a technique originating with simulated annealing algorithms. In this form of survival selection, the results of a reproductive operator (mutative or recombinative) are compared against the original (parent) solution(s). Individuals that exhibit increased or equal fitness are always allowed to pass into the survival pool. This is accomplished by use of the Boltzmann trial, whereby solution i attempts to maintain its position in the population against a potential replacement individual j. The probability that i wins this competition is

$$P(i \text{ wins}) = \frac{1}{1 + e^{(f_i - f_j)/T(t)}},$$

where T(t) is the current temperature [Mahfoud 97].
The cooling function, T(t), must be in the range [0,1] and is typically a monotonically nonincreasing function of t, although this is not a requirement. The current time, t, is typically represented by a count of the number of generations, etc. As
the search progresses and t increases, Boltzmann selection increases the relative selection intensity until it degenerates to the selection mechanism used in the steady state GA. There are numerous proposed functions for T(t), many of which are taken from simulated
annealing studies, or which attempt to induce certain behaviors such as niche formation,
etc.
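The Boltzmann trial itself is a one-line probabilistic comparison. The sketch below follows the logistic form reconstructed above and assumes a minimization problem, so the incumbent i almost always survives when f_i is much lower than f_j and the temperature is low:

```python
import math
import random

def boltzmann_trial(f_i, f_j, temperature):
    """Return True if incumbent i keeps its population slot against j.

    Uses the logistic acceptance probability 1 / (1 + e^((f_i - f_j)/T));
    assumes a minimization problem (lower f is better).
    """
    p_i_wins = 1.0 / (1.0 + math.exp((f_i - f_j) / temperature))
    return random.random() < p_i_wins
```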
2.3.1.2.3 Truncation Selection and EP Tournament Selection
Truncation selection is the simplest of the survival selection forms. The
population pool is sorted in order of fitness from most fit to least fit, as with rank based selection. Only solutions in the top T% of the population survive. Mühlenbein gives a direct formulation relating the value of T to the selection intensity, I, and thereby the expected convergence rate of the population in the absence of mutation [Mühlenbein 95]. This form of selection is used in both BGA and genetic programming systems [Koza 92].
EP tournament selection is similar to truncation selection but is slightly more
stochastic. Before ranking, each solution, s_i, is randomly paired with t other solutions, producing tournament pool T_i. s_i is assigned a score, v_i, equal to the number of solutions t_j ∈ T_i for which f(s_i) ≤ f(t_j) (assuming a minimization problem). Next, the population pool is sorted according to these tournament score values, and the top μ are allowed to survive. Fogel specifies that the value of t should be related to the population size, but no exact formulation is given.
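A sketch of EP tournament survival selection under these definitions (minimization assumed; in this simplified version an opponent sample may occasionally include the solution itself):

```python
import random

def ep_tournament_survivors(solutions, f, t, mu):
    """EP tournament survival selection (minimization).

    Each solution scores one point per opponent (of t random opponents)
    that it meets or beats; the mu highest scorers survive.
    """
    scored = []
    for s in solutions:
        opponents = random.sample(solutions, t)   # may include s in this sketch
        v = sum(1 for opp in opponents if f(s) <= f(opp))
        scored.append((v, s))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:mu]]
```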
2.3.2 Recombination
By deﬁnition, a recombination operator is one that forms a new solution by
collecting “genetic information” from multiple individuals and combining this
information in some fashion. While the metaphor of sexual reproduction suggests a strict biological interpretation, with two parents, separation of genetic information in simulated meiosis, and fusion of genetic material in simulated fertilization, in actuality
most genetic encodings and recombinative operators are greatly abstracted from the
current understanding of these processes. In terms of an exercise in simulated evolution,
a recombinative operator should endeavor to continue to reflect a reasonable abstraction
of the actual physical processes. However, if we consider an EC system as primarily a
heuristic search process for function optimization, then we are not strongly restricted to
these terms.
It is therefore possible to broaden the deﬁnition of recombination to include forms
that are not observable in nature. For example, we are free to consider recombination of
genetic material from more than two progenitors. Likewise, we can redeﬁne what
constitutes the “genetic information” that is being contributed. Rather than restricting
ourselves to the direct interpretation of genetic information as a direct genetic
representation in some binary or real valued DNA analog, we can consider any
information garnered from multiple members of the population, such as parameter mean,
covariance, etc. as potential fodder for recombinative operators. An acceptable deﬁnition
of recombination therefore includes all operators that produce new solutions, where the
composition of the new solution is directly inﬂuenced by the composition, structure,
and/or relationships between two or more sampled landscape points previously (or
46
currently) visited by the search algorithm. Note that this would still allow us to
differentiate purely mutative operations, since the decision as to which points to sample is
typically based on the location of a single solution and a global or self-adaptive step-size
parameter, without direct input from another solution.
Recombination is not a prerequisite to achieve an evolutionary simulation.
Systems which do not employ recombination can still qualify as evolutionary algorithms,
provided that they incorporate a mechanism of selection which allows shifting of search
resources from apparently less fruitful portions of the solution space to more fruitful
ones. Systems such as parallel simulated annealing, which always produce a single new
solution to replace each independent existing solution according to some criteria, do not
qualify as evolutionary computation. Whereas an identical system which allows one
solution to produce two or more new solutions with a balancing extinction of one or more
existing solutions, where such decisions are made via some form of competitive
evaluation (though not necessarily solely or primarily value related), deﬁnitely qualiﬁes
as an evolutionary algorithm.
The following review is not intended as an exhaustive survey of current recombinative operators. The intention here is to introduce some of the subjectively
more popular recombinative operators in the current literature and some variants which
illustrate speciﬁc properties.
2.3.2.1 Discrete Recombination Forms
Discrete recombination treats the individual alleles as noncontiguous symbols
without a natural ordinality in terms of search space. In this sense, these operators are
genomic, rather than phenotypic, since they focus directly on the representation of a
solution, rather than the relative position of a solution within the search domain. Non-continuity of representation is not a requirement for discrete recombination, but discrete operators treat even alleles that can be expressed in continuous fashion as if they were not continuous. That is, these operators do not take advantage of or account for the continuity. Since the alleles are not interpreted geometrically, discrete recombination focuses on mixing the alleles found in two or more parent solutions. Note that the term discrete is somewhat of a misnomer, as it is possible to treat discrete integer values in a continuous manner through averaging, etc. A more correct term would be non-continuous or symbolic recombination.
Common forms of discrete recombination exchange information between two
parents by selecting inversion areas deﬁned by crossing points. For example, consider
two parent solutions, $p_i$ and $p_j$, each consisting of $n$ individual alleles. Assume the
$k$th allele for each encoded solution is the same size. We can consider a discrete
recombination operator to consist of the selection of an ordered set of unique crossing
points, $b_l$, where each $b_l \in [0, n-1]$. Given a function $C(m)$, which returns the
number of crossing points having a value less than or equal to $m$, we can compute the
elements of the two child solutions, $c_0$ and $c_1$, as:

$c_{h,k} = \begin{cases} p_{i,k} & \text{if } C(k)+h \text{ is odd} \\ p_{j,k} & \text{if } C(k)+h \text{ is even} \end{cases}$   for $h = 0, 1$.
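As a concrete illustration, the following minimal Python sketch implements this definition; the function name, the list-of-alleles representation, and the uniform selection of crossing points are our own illustrative assumptions.

    import random

    def k_point_crossover(p_i, p_j, num_points):
        # Select an ordered set of unique crossing points b in [0, n-1].
        n = len(p_i)
        points = sorted(random.sample(range(n), num_points))
        c0, c1 = [], []
        crossings = 0  # running value of C(k)
        for k in range(n):
            while crossings < len(points) and points[crossings] <= k:
                crossings += 1
            if crossings % 2 == 1:   # C(k) + h odd: child h takes from p_i
                c0.append(p_i[k]); c1.append(p_j[k])
            else:                    # C(k) + h even: child h takes from p_j
                c0.append(p_j[k]); c1.append(p_i[k])
        return c0, c1

With num_points = 1 this reduces to one-point crossover, and with num_points = 2 to two-point (circular) crossover, both discussed below.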
Crossover is the standard name for this form of recombination in GA literature.
GA crossing points are typically selected uniformly from the set of allowable crossing
points. GA theorists use a binary representation when using crossover, with allowable
crossing points between each two bits of the representation. This binary emphasis
provides direct support for implicit parallelism as posited by the schema theorem.
However, it is possible to use crossover with arbitrarily sized alleles.
GA crossover operators are classified by the number of crossing points allowed.
The minimal operator is one-point crossover, which exhibits a bias toward a higher
frequency of disruption for longer schema patterns. Two-point crossover, also known as
circular crossover, attempts to remove this bias by effectively treating the parent
solutions as circular rather than linear encodings. Uniform random crossover allows
the maximum number of crossing points: rather than requiring $n$ unique crossing points,
up to $n$ crossing points may be selected, with duplicates ignored. Alternately, uniform
crossover may more easily be represented as arbitrary random assignment of allele pairs
to the produced children. That is, assuming $n$ uniform random samples, $f_k$, from the
range $[0,1)$, the child alleles are selected according to the equations:
$c_{0,k} = \begin{cases} p_{i,k} & \text{if } f_k < 0.5 \\ p_{j,k} & \text{otherwise} \end{cases}$   $c_{1,k} = \begin{cases} p_{j,k} & \text{if } f_k < 0.5 \\ p_{i,k} & \text{otherwise.} \end{cases}$
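A sketch of this formulation, with one independent coin flip per allele pair (names are illustrative):

    import random

    def uniform_crossover(p_i, p_j):
        # Each allele pair is assigned to the two children by a coin flip.
        c0, c1 = [], []
        for a, b in zip(p_i, p_j):
            if random.random() < 0.5:
                c0.append(a); c1.append(b)
            else:
                c0.append(b); c1.append(a)
        return c0, c1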
An alternate variant, guaranteed uniform random crossover, forces each child
to inherit exactly one half of its alleles from each parent. This may be achieved by
creating an ordered list of all possible allele positions, then randomly reordering this list.
The resulting permutation allows assignment of the alleles in the first $n/2$ numbered
positions from $p_i$ to $c_0$, and from $p_j$ to $c_1$. The remaining alleles for each child
are then taken from the opposite parent.
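A sketch of the permutation-based construction just described, assuming an even number of alleles:

    import random

    def guaranteed_uniform_crossover(p_i, p_j):
        # Each child inherits exactly half of its alleles from each parent.
        n = len(p_i)
        positions = list(range(n))
        random.shuffle(positions)
        c0, c1 = list(p_j), list(p_i)   # default: the opposite parent
        for k in positions[:n // 2]:    # first n/2 permuted positions
            c0[k] = p_i[k]
            c1[k] = p_j[k]
        return c0, c1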
A common recombination scheme involving more than two parents uses
commonality or majority voting to determine the composition of child solutions [Pa 94]
[Mühlenbein 89]. In these schemes, allele values which recur with more than a
preselected frequency within a selected parent pool are passed on to the child solution,
while other alleles are selected at random.
Dominant recombination can be yet another synonym for discrete
recombination, especially when referring to 2-parent recombination; however, dominant
recombination can denote a global form of discrete recombination, where each allele is
chosen randomly from any member of the population uniformly (or via some selection
method). Hence, the most heavily expressed, or dominant, expression in any single allele
position, if one exists, will be the most likely to be inherited. This closely resembles
voting forms of discrete recombination.
The shuffle crossover variant involves a temporary inversion operation
performed on the components before crossover, which is reversed immediately after
crossover takes place. While this may appear similar to uniform crossover, this
inversion, when coupled with a one-point crossover, creates a uniform distribution on the
total number of alleles exchanged. By contrast, the size of the exchange in uniform
crossover is binomially distributed, with the most frequent exchanges being half of the
encoding size. Guaranteed crossover forces exactly half of the alleles for each child to be
inherited from each parent (assuming an even number of alleles). Guaranteed uniform
crossover can be expressed in terms of a shufﬂe crossover where the crossover point is no
longer random, but always selected as the most central cut point.
2.3.2.2 Intermediate Recombination Forms
If we consider the components of individual solutions to represent points in a
contiguous discrete or continuous domain, then we can potentially extend the
interpretation of the “information” being carried by a solution to include representation of
its localized “neighborhood” within the domain. Note that this makes an implicit
assumption that the functional behavior of nearby points can be extrapolated from
sampled points (i.e. the ﬁtness function is locally smooth). If we can accept this
assumption, then it is possible to create operators that search within the area of the
domain deﬁned by two or more parents.
There are a number of similar operators with various names, which implement
these concepts. The ES operator, intermediate recombination, formulates a child
solution by interpolating between individual parameter values from two parents. Given
two $n$-dimensional parent solutions, $p_i$ and $p_j$, and $n$ uniform samples, $\alpha_k$, over the range
$[0,1)$, we can calculate a child solution's parameters as: $c_k = p_{i,k} + \alpha_k (p_{j,k} - p_{i,k})$.
Note that this is equivalent to uniform random sampling of the interior of the hypercube
defined by the two parent solutions. In GA circles, this operator is known as blend
crossover (BLX) [Eshelman 93] or flat crossover [Radcliffe 90].
An alternate formulation of intermediate crossover is arithmetic crossover
[Michalewicz 99]. Given two $n$-dimensional parent solutions, $p_i$ and $p_j$, and $n$ uniform
samples, $\alpha_k$, over the range $[0,1)$, we can calculate a child solution's parameters
according to the arithmetic crossover formula: $c_k = \alpha_k p_{i,k} + (1 - \alpha_k) p_{j,k}$. Guaranteed
average crossover [Davis 89] is identical to this formulation with the choice of $\alpha_k = 0.5$
for all $k$.
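Both formulations sample the same family of points, so a single sketch covers intermediate recombination, BLX, arithmetic crossover (with the roles of the parents exchanged), and, for $a > 0$, the extended variant discussed below; the names and parameterization are our own:

    import random

    def intermediate_recombination(p_i, p_j, a=0.0):
        # Each child parameter is drawn uniformly between (and, when
        # a > 0, slightly beyond) the two parents. a = 0 gives plain
        # intermediate recombination / BLX; a > 0 gives the extended
        # variant, with alpha_k drawn from [-a, 1 + a).
        return [x + random.uniform(-a, 1.0 + a) * (y - x)
                for x, y in zip(p_i, p_j)]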
A more generalized version of intermediate recombination is extended
intermediate recombination, which extends the area of sampling by some multiple of
the distance between the two parent solutions. Mühlenbein defines extended intermediate
recombination according to the same formula as intermediate recombination, but
redefines the range of $\alpha_k$ to be $[-a, 1+a)$, where $a \geq 0$ [Mühlenbein 1993]. Mühlenbein
suggests a value of $a = 0.25$. BLX-$a$, proposed simultaneously by Eshelman
[Eshelman 93], is the same operator. Both extended intermediate recombination and BLX-$a$
attempt to address the bias toward the center found in their non-extended
counterparts. However, unless the level of the extension is relative to the dimensionality
of the problem, such operators potentially overcompensate by oversampling outside of
the hypercube defined by the parent solutions.
Some evolutionary systems also deﬁne a form of global intermediate
recombination, which independently selects new parents for each child parameter, as
opposed to each child solution. Various limitations and alternatives of this strategy may
be used, for example, limiting the set of parents to a reduced subset of the entire
population, etc.
Modern evolutionary strategies extend the concept of global intermediate
recombination to a generalized resampling of the existing population distribution. In
effect, this is a reduction of the existing population to a series of distribution metrics and
a reinitialization of the population according to these metrics. For example, the sampled
covariance matrix can be used to determine an eigenspace (similar to a principal
component analysis) of the population distribution. The resampled population is a
normal distribution aligned according to this eigenspace with the variances along each
axis equivalent to the corresponding eigenvalue. This new distribution produces the
same covariance as the original one, within the precision allowed by the degrees of
freedom (i.e. population size). Note, however, that the new population may have a
completely different shape than the original. For example, the original may be
non-normally distributed, highly asymmetric, etc.
2.3.2.3 Linear Recombination Forms
The concept of extrapolation can be more narrowly applied if we limit our
exploration to the line between the parent solutions. In linear crossover [Wright 91],
three offspring are produced from two parents. The three offspring are located at three
fixed linear combinations of the parent solutions: $\frac{1}{2}p_1 + \frac{1}{2}p_2$, $\frac{3}{2}p_1 - \frac{1}{2}p_2$, and
$-\frac{1}{2}p_1 + \frac{3}{2}p_2$. Geometrically, this is equivalent to the midpoint (the guaranteed
averaging crossover point) and two extensions beyond the ends of the line defined by the
parents, each at half of the distance between the two points. If we consider the set of all
offspring, both the mean and covariance of the parents are preserved, but the average
variance is increased by 1/3.
Mühlenbein [Mühlenbein 1994] generalizes this concept with line
recombination, which is identical in formulation to basic intermediate recombination,
with the exception that all terms for a single recombination operation use the same $\alpha$
term (i.e. $\alpha_k$ is replaced by $\alpha$). This causes all children to be drawn from a unimodal
uniform random distribution along the line segment between the parent solutions. As
with intermediate recombination, this operator exhibits bias toward the center of the two
parents. As with extended intermediate recombination, extended line recombination
attempts to counter this bias by increasing the range of the $\alpha$ term to $[-a, 1+a)$, where
$a \geq 0$. For extended line recombination, $a = 0.5$ removes any central bias.
Mühlenbein also introduced a much more complex version of extended line
crossover, which attempts to mimic the distribution of binary crossover on standard
binary integer values using a discrete version of the log-uniform distribution [Mühlenbein
1994]. Mühlenbein samples the log-uniform distribution discretely by first creating $m$
uniform random binary samples $a_i$, where $a_i = 1$ with probability $1/m$, and 0 otherwise.
On average one $a_i$ value will be equal to 1. Next, this binary sample is converted to a
log-uniform sample by calculating:

$\delta(m) = \sum_{i=1}^{m} a_i 2^{-i}.$

Note that the resulting distribution is roughly uniform in the $\log_2$ of the points; however,
certain points (those with fewer bits in the binary form of their $\log_2$ value) are more
heavily favored over those with more bits. A histogram of 1,000,000 samples of this
distribution with $m = 10$ is illustrated in Figure 2.3.
Figure 2.3 Histogram of samples of Mühlenbein's log-uniform distribution (number of samples vs. sample value)
Given a sample, $\delta(m)$, from Mühlenbein's discretized log-uniform distribution, the
offspring values for two offspring are calculated as:

$c_k = p_{i,k} + s\,r\,\beta\,\delta(m)\,(p_{i,k} - p_{j,k}),$

where $r$ is a constant proportional to the initial range of the given dimension
(Mühlenbein suggests half of the range), $s$ equals $-1$ for the first child and $+1$ for the
second, and $\beta$ is equal to 1 with probability 0.9, and equal to $-1$ otherwise. Note this
function tends to favor the $p_i$ parent when both values are of the same sign (which
should be the more fit of the two parents). Understanding the complete derivation of this
function is a bit daunting, and the reader is referred to [Mühlenbein 1994] for further
information. In essence this operator attempts to simulate binary crossover on integral
binary values by incorporating a log-uniform distribution.
Also, this operator attempts to maintain relative scale both to the scale of the search space
(as evidenced by the $r$ term above) and to the distance between individuals.
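A sketch of the complete operator under the formulation above; the text does not specify whether $\beta$ and $\delta(m)$ are drawn per allele or per child, so we draw them once per child here, and $r$ must be supplied by the caller:

    import random

    def extended_line_recombination(p_i, p_j, r, m=10):
        # Discretized log-uniform sample: a_i = 1 with probability 1/m.
        delta = sum(2.0 ** -i for i in range(1, m + 1)
                    if random.random() < 1.0 / m)
        children = []
        for s in (-1.0, +1.0):   # s = -1 for the first child, +1 for the second
            beta = 1.0 if random.random() < 0.9 else -1.0
            children.append([xi + s * r * beta * delta * (xi - xj)
                             for xi, xj in zip(p_i, p_j)])
        return children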
2.3.2.4 Unimodal Normal Distribution Crossover (UNDX)
Ono and Kobayashi have proposed several variants to the standard BLX-$a$
crossover operator. The first, unimodal normal distribution crossover (UNDX) [Ono
97], is somewhat closer to extended line recombination in calculation. UNDX selects
the axes of application from the selected parents. The first two parents fix the orientation
of the first axis, the origin of the search distribution, and the variance for the operator
distribution along this axis (half of the distance between the two parents). The origin is
always taken to be the center point of the two parents. The search variances along the
remaining (arbitrarily oriented) orthogonal dimensions are determined as $1/\sqrt{n}$ times the
distance between the center of the first two parents and a third parent. One or more
children can be produced from this search distribution.
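The following NumPy sketch reproduces this sampling under the stated interpretation (an isotropic secondary variance in all directions orthogonal to the primary axis; names are illustrative):

    import numpy as np

    def undx(p1, p2, p3, rng=np.random.default_rng()):
        # Center at the midpoint of p1/p2; std. dev. of half their distance
        # along the p1-p2 axis; a smaller std. dev. (distance from the
        # center to p3, scaled by 1/sqrt(n)) along orthogonal directions.
        p1, p2, p3 = map(np.asarray, (p1, p2, p3))
        n = p1.size
        center = (p1 + p2) / 2.0
        axis = p2 - p1
        d1 = np.linalg.norm(axis)
        if d1 > 0:
            axis = axis / d1
        sigma_primary = d1 / 2.0
        sigma_secondary = np.linalg.norm(p3 - center) / np.sqrt(n)
        # Sample isotropically, then rescale the component along `axis`.
        z = rng.normal(0.0, sigma_secondary, n)
        along = z @ axis
        z += (rng.normal(0.0, sigma_primary) - along) * axis
        return center + z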
m-parent UNDX [Ono 99] enlarges the “population guided” nature of the
standard 3-parent UNDX by incorporating m parent samples to determine the search
distribution along all dimensions (it is assumed that m is significantly larger than the
dimensionality to avoid singularities, etc.). The origin of the distribution is the centroid
of the m parents. Also, a “separability” algorithm is applied to the resulting eigenspace to
produce a skewed axis set which projects an approximately symmetric distribution from
the original m-parent sample. This operator is essentially similar to recent ES versions of
global intermediate recombination, but with a restricted sampling and an additional
separability analysis component.
m-parent UNDX is quite similar to the principal component mutation operator
outlined in section 4.2; however, unlike UNDX, the proposed operator does not
incorporate a center-tending bias, since it is centered on a single arbitrary parent, and it
does not make assumptions about the separability of the dimensions. For a more detailed
comparison of UNDX and the proposed operators, see section 4.2.
2.3.2.5 Simplex Crossover (SPX)
Simplex crossover (SPX) [Tsutsui 99] is a multi-parent operator similar to
UNDX in concept, yet much more sophisticated in design. First, for an $n$-dimensional
search space, $n+1$ parent individuals are selected. The center of mass, $o$, of these
solutions is computed, and a series of $n$ random samples are drawn according to the
formula:

$r_k = u^{1/(k+1)},$

where $u$ is a sample from a uniform random distribution on the interval $[0, 1)$, and
$k = 0, 1, \ldots, n-1$. Note that as $k$ increases, $r_k$ becomes exponentially more skewed toward 1.

Given the center of mass, $o$, an externally assigned growth rate, $\varepsilon$, and the $n+1$
parent “vectors”, $x_i$, calculate an expanded (or contracted) form of each parent vector $x_i$,
as $Y_i$, for each $i = 0, 1, \ldots, n$ according to the formula:

$Y_i = o + \varepsilon\,(x_i - o).$

Now, accumulate samples from the expanded vectors using the random samples, $r_k$, as
follows:

$C_0 = 0, \qquad C_k = r_{k-1}\left( Y_{k-1} - Y_k + C_{k-1} \right), \quad k = 1, \ldots, n.$

This effectively combines a portion of the difference between two of the parent vectors
with the previous accumulation and then rescales (shrinks) the resulting vector. Next, the
final child value is given as:

$c = Y_n + C_n.$
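A direct transcription of these steps (names are illustrative; epsilon defaults to the covariance-preserving value from the claims that follow):

    import numpy as np

    def spx(parents, epsilon=None, rng=np.random.default_rng()):
        # `parents` is an (n+1) x n array of parent vectors.
        X = np.asarray(parents, dtype=float)
        n = X.shape[1]
        if epsilon is None:
            epsilon = np.sqrt(n + 2.0)
        o = X.mean(axis=0)                     # center of mass
        Y = o + epsilon * (X - o)              # expanded parent vectors
        r = [rng.random() ** (1.0 / (k + 1)) for k in range(n)]
        C = np.zeros(n)
        for k in range(1, n + 1):
            C = r[k - 1] * (Y[k - 1] - Y[k] + C)
        return Y[n] + C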
Tsutsui and Goldberg offer the following claims regarding the distribution of
search points by this operator [Tsutsui 99]:
1. It is independent of the encoding coordinate system (i.e. invariant across
rotation, translation, and linear rescaling).
2. The mean vector of the parents and the children are identical (i.e. it is mean
preserving).
3. The covariance matrix of the children is a rescaled version of that of the
parent solutions. If we select $\varepsilon = \sqrt{n+2}$, then the covariance of the parents
and children will be identical.

Note that the title simplex crossover was previously used by Renders and Bersini
[Renders 94] to denote a similar operator with more deterministic behavior. Their
simplex crossover is a fitness-biased operator. Assuming $k$ parents are selected initially,
the centroid, $c$, of the fittest $k-1$ parents is computed. Then the vector from the centroid
to the worst parent solution is inverted about the centroid position; or equivalently, the
point at distance $\|p_{worst} - c\|$ from $c$ along the line between $c$ and $p_{worst}$, on the far side
of $c$, is selected.
2.3.2.6 Fitness Biased Recombination
Numerous crossover schemes have been developed which speciﬁcally bias toward
a favored parent. In most cases, the bias is directed toward the more ﬁt of the two
parents. These methods do not have a direct biological analog within the metaphor of
evolution, but may be seen as an attempt to model dominance and recessiveness without
incorporating full-fledged diploidy (with the assumption that preferable traits become
more dominant than less favorable ones).
Examples of fitness-biased recombination include Wright's heuristic crossover
[Wright 94] and Eiben's fitness-based scan [Eiben 94]. In heuristic crossover, each
individual allele is computed as:

$c_i = r\,(x_i - x_j) + x_j,$

where $r$ is a uniform random sample on $[0,1)$, and $x_i$ is the more fit of the two parent
solutions. Note that although the intent to bias toward the more fit parent is apparent, as
long as $r$ samples the uniform distribution this becomes equivalent to line recombination
and BLX. Fitness-based scan selects alleles based on the fitness of the associated parent
solution relative to the fitness of all solutions in the parent pool. Thus, the probability
that $c_i = p_{k,i}$ is $f(\{p \in P : p_i = p_{k,i}\})/f(P)$, where $P$ is the set of all parents and
$\{p \in P : p_i = p_{k,i}\}$ is the set of all parents sharing the same value in the $i$th allele.
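A sketch of heuristic crossover as defined above; the caller is assumed to pass the fitter parent first:

    import random

    def heuristic_crossover(fit_parent, other_parent):
        # Sample along the line from the less fit toward the fitter parent.
        r = random.random()
        return [r * (xi - xj) + xj
                for xi, xj in zip(fit_parent, other_parent)]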
2.3.3 Mutation
Mutation is distinct from recombination in that a mutative operation selects search
points based on information from a single member of the current population. Search
operators that do not use any information from the current population are antithetical to
the metaphor of evolution and therefore are not typically considered mutative. Typical
mutation operators proceed by adding a level of noise to an individual solution, often in
the form of addition of random samples from a given, typically predetermined,
probability distribution.
Binary mutation is possibly the simplest form of mutation in that a binary
mutation causes a bit to change from 0 to 1, or vice versa; therefore, no consideration
needs to be given to the size of the mutative step. Binary mutation operators focus
instead on the application rate, that is, the probability of an individual bit being flipped.
Since binary mutation operates on the genotypic representation, it is difficult to
characterize its distribution in terms of phenotypic modification, as this is dependent on
the form of encoding employed for the individual allele. For standard unsigned integer
representations, the distribution approximates a discretized log-uniform distribution such
as those used by Mühlenbein [Mühlenbein 95]. However, the effect of binary mutation
on integer representations is necessarily dependent on the actual binary pattern of each
encoded value. Some approaches to modifying the effects of binary mutation by using
alternate encodings, such as Gray encoding, have been pursued [Whitley 97] [Whitley
99].
Mutation of non-binary, continuous representations such as integer and real
values requires selection and parameterization of a mutation distribution, including
determination of an appropriate scale, variance, or mutative step size. Potential solutions
for selection of the mutative step size include use of a predetermined fixed value,
adaptation based on the level of success, and co-evolution of the parameters of the
mutative distribution with the parameters of an individual solution. Note that it is
possible for a mutative operator to use separate step sizes, or possibly even separate
distributions, for each dimension of the search space.
Most mutative operators employ a predetermined distribution for determination of
the mutative perturbation. Lee explores an interesting alternative by adaptively adjusting
the parameterization of a Lévy distribution, thereby adaptively modifying the shape of the
mutative distribution [Lee 99]. Typical mutative operators employ common distributions
such as the uniform, normal (Gaussian), Cauchy, and Laplace probability distributions.
These distributions differ in their focus on central sampling and in the length and shape of
the distribution tails. EP systems employ centralized distributions to increase the
likelihood of small mutations over larger ones.
Use of a fixed mutation step size is a simple mechanism, which provides the
mutative step size as an external parameter. In such systems, the mutative step size is
typically selected to be a fraction of the expected parameter range for each parameter.
EC systems that employ a fixed mutation step size typically have difficulty finding
answers within greater precision than the level of mutative noise. For example, consider
a system employing a randomly aligned vector with a length uniformly distributed on the
range $[0,1)$ as a mutative operator. If a given solution, $p$, requires refinement on the order
of $10^{-6}$ distance, only 1 in $10^6$ mutations will be sufficiently small to produce
improvement. This argument holds for most other distributions as well.
The analysis of ideal evolutionary search systems shows that in order to ensure
reasonable progress, 20% of all newly created search points should demonstrate increased
fitness [Schwefel 95]. Therefore, one possible adaptive mechanism for mutative step size
selection is to increase the mutative step size if greater than 20% of recent mutations have
demonstrated increased fitness, and likewise to decrease it if less than 20% have. This
algorithm is popularly known as the 1/5th rule. It assumes that the local landscape is a
concentric hill-climbing situation (i.e. that reducing the mutative step size increases the
success ratio).
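A sketch of the rule; the adjustment factor is an illustrative choice, not taken from the text:

    def one_fifth_rule(step_size, success_ratio, factor=1.22):
        # Expand the step size when more than 20% of recent mutations
        # improved fitness; contract it when fewer than 20% did.
        if success_ratio > 0.2:
            return step_size * factor
        if success_ratio < 0.2:
            return step_size / factor
        return step_size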
The most common form of mutative step size adaptation in modern ES and EP
systems is self-adaptation. Self-adaptation is accomplished by inclusion of both the
solution parameters and the mutation parameters within an individual solution. EP
systems tend to employ self-adaptive step sizes across the axes of encoding, while ES
systems may evolve either step sizes alone or both step sizes and axial orientation. The
mutative parameters are modified by other (meta-level) mutation operators. This
meta-level mutation typically modifies the exponent of the mutative step size (i.e. expands or
contracts the mutative step size) and employs a fixed meta-level step size. ES systems
also apply recombinative operators to mutative parameters.
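A sketch of one common realization: log-normal perturbation of each step size, i.e. mutation of its exponent. The meta-level step size $\tau = 1/\sqrt{2n}$ is a conventional assumption, not taken from the text:

    import math
    import random

    def self_adaptive_mutation(solution, sigmas, tau=None):
        # Mutate the step sizes first, then apply them to the solution.
        n = len(solution)
        tau = tau or 1.0 / math.sqrt(2.0 * n)
        new_sigmas = [s * math.exp(tau * random.gauss(0.0, 1.0))
                      for s in sigmas]
        new_solution = [x + s * random.gauss(0.0, 1.0)
                        for x, s in zip(solution, new_sigmas)]
        return new_solution, new_sigmas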
2.4 Empirical Test Functions
Several of the selected test functions were chosen because they are widely used
and highly regarded in EC and function optimization literature. One or two were selected
because they provide novel, highly epistatic search landscapes. The remainder were
created speciﬁcally for this work to provide representative samples of various
combinations of the previously listed problem categories and descriptions. Specifically,
the four modified versions of Schaffer's function, the ring function, the trapezoid cross
function, the spiral function, the clover & cross function, the multimodal spiral function,
the chain-link function, the double cos function, the inverse exponential function, and the
worms function have been introduced specifically for this work.
2.4.1 Function Categorization and Terms
The following terms and categorizations are used in presentation of the test
functions.
2.4.1.1 Unimodal
Unimodal is a mathematical term which implies that a function has only a
single extremum; that is, only at one point does the derivative become zero. Some of the
functions being presented are not easily differentiable. Further, we are not in general
interested in all extrema of a function, only the minima or maxima (typically minima).
Therefore, a direct geometric interpretation of the term unimodal focusing only on
minima will be used here. A unimodal function will be defined as a function for which
there is a path (not necessarily a linear path) from every point in the search space to the
global optimum which is non-increasing (for minimization problems) or non-decreasing
(for maximization problems).
2.4.1.2 Monotonic
Likewise we will use a geometric interpretation of the term monotonic. For our
purposes, a monotonic function is one for which the most direct path between each point
in the landscape and the global optimum is non-increasing or non-decreasing for
minimization and maximization problems respectively.
2.4.1.3 Center-Focused
The term center-focused will be applied to any function for which the majority of
points in the search space have a favorable instantaneous slope toward the global
optimum. That is, a positive slope in the direction of a global maximum for
maximization problems, and a negative slope toward the global minimum for a
minimization problem. Determination of this attribute is largely based on visual
inspection and analysis, although a complete mathematical treatment is possible for
functions which can be differentiated.
2.4.1.4 Independent Variables
The complexity of a test function is determined primarily by the level of
interaction between the individual parameter fields within the function itself. In terms of
the paradigm of evolution, the interdependence of parameters is somewhat equivalent to
epistasis, since the expression of the value of one parameter is being masked or mitigated
by the value of another parameter. A function which uses variable values independently
allows each parameter field to have a consistent contribution toward the function value
regardless of the values of other parameters. This property allows parameters to be
effectively context independent, in that the worth of a given parameter value in one
situation is identical to its worth in all other situations. An EA must still determine each
parameter's contribution to the overall objective function value blindly; however, having
independent contributions for each parameter fits most directly with the premise of EC.
2.4.1.5 Relative Variable Relationship
While a number of optimization test functions allow for independent contributions
from function parameters, forcing parameters to be independent severely limits the
complexity of the search space. For example, for all functions where the parameter
contributions are each calculated by the same function, independence of the parameters
results in a symmetric search space. Further, independence implies that the function may
be amenable to a simpler approach such as an inductive search.
To provide more complex search landscapes for testing, many test functions
choose to combine the problem parameters in a nonlinear manner. This causes the
contribution of a given variable to become partially or completely dependent on other
variable values. Various linear and exponential combinations of variable values are often
used to cause linear and nonlinear warping of otherwise simple landscapes.
For example, consider the search landscape described by: $f(x) = 100a^2 + b^2$.
This is a simple sphere function stretched into an ellipse by weighting the $a$ component
more heavily than the $b$ component. However, if we substitute $x^2 - y$ in place of $a$, we
have imposed an exponential relationship between input parameters $x$ and $y$. That is, the
optimum of the $100a^2$ component follows the curve $y = x^2$. If we further modify the $b$
component to be an offset of $x$, such as $x - 1$, this completes the transform from the
simple ellipse function to the much more difficult Rosenbrock's banana function. (Also,
note that the 100 weight component effectively provides a signal-to-noise differentiation
component.)
2.4.1.6 Symmetric and Asymmetric
Symmetry can be an important clue in optimization search. Many search
techniques make use of axial symmetry (though often not explicitly or intentionally) to
locate a global optimum which lies at an intersection between multiple symmetric local
minima. Since we are concerned with the effects of symmetry on finding the global
optimum, we will usually want to consider the symmetry of the function centered on the
global optimum. That is, the function itself may be symmetrical about some other point
(such as the origin) but may be considered asymmetrical in terms of the global optimum
placement.

Note that symmetry and parameter independence are interrelated. For example, if
a test function is composed of the sum of several independent functions, $f(x_i)$, then if $f$
is symmetric the test function will also be symmetric.
2.4.1.7 Signal-to-Noise Discrimination Problems
A fairly common model for creation of a reasonably difficult test function is to
overlay a simple monotonic center-focused function with a secondary function such as a
periodic sine wave. This can be achieved simply by composing the search problem as the
sum of these two functions. The optimum of the monotonic function is typically aligned
with the global optimum (or one of the global optima in the case of a periodic function) of
the secondary masking function. To make the problem more difficult, the second term
may be magnified relative to the first through multiplication by a relatively large constant
(or likewise, the first term may be reduced relative to the second).
This type of problem is similar to situations where a signal is being received in
the presence of noise. In this case, the noise would be the distracting local gradient
information provided by the second term, and the signal would be the potentially
weakened value generated by the first term. Obviously, as the magnitude of the masking
function relative to the signal function increases, the overall difficulty of the problem
increases.

A surprisingly large number of the test functions used in EC literature fall within
this category. Ackley's function, Bohachevsky's function, Griewangk's function, and
Rastrigin's function all follow this basic formula with various methods of providing the
base function and the periodic overlay function. In terms of signal-to-noise ratio,
Griewangk's function provides the lowest, while both Rastrigin's function and Ackley's
function use relatively mild noise components, and Bohachevsky's function uses a strong
signal and a relatively weak noise function. Yip and Pao's function also falls in the
category of signal discrimination problems, as do the spiral, multimodal spiral, chain-link,
double cos, and worms functions created in this work. However, the latter functions can be
differentiated in that the overlaying noise functions are not axially aligned periodic
functions, as is the case with Ackley's function, Bohachevsky's function, Griewangk's
function, Rastrigin's function, and Yip and Pao's function.
Note that the simplest form of a signal-to-noise discrimination problem is a
simple summation of progressively weighted squares of individual parameters, such as
with Schwefel's problem 1.2. The successive weighting of the parameter
contributions causes modifications of lower indexed parameters to be masked by the
amplitude of higher indexed ones. Thus, if we assume that all parameters have the same
initial search range, a search method will typically tend to refine the solution from higher
indexed dimensions first. This need for successive refinement can be quite taxing on
some EC techniques. Since some of the parameters have negligible effect on the
objective value for a reasonably long period, there is a tendency for genetic drift to cause
these values to converge prematurely. Thus, the more heavily an EC system depends on
population diversity, the more difficult such successive refinement problems will be. We
can increase the level of successive refinement required by using exponential
combinations of the input parameters (i.e. $(\sum_i 2^i x_i)^2$) or by using the exponent of the
parameter itself (i.e. $x_i^{2i}$). However, note that the second has the interesting property
that once the values reach the $|x_i| \leq 1$ region, the successive refinement problem is
effectively inverted (i.e. the earlier indexed parameters become dominant). This function
is listed below as the exponential function. An even more drastic version might use $x_i^{2^i}$.
Also note that EC systems which require a minimum continuous level of mutation on all
parameters may never be able to refine lower order components in such successive
refinement problems. The constant high noise levels injected into the objective function
by the amplified mutational effects on higher order components may continuously mask
feedback from modification of lower order components.
2.4.2 Function Illustrations
Each of the following functions is illustrated with one or more height maps.
These illustrations are views of two-dimensional projections of the function value along a
selected x and y basis. All other function variables are held at their optimal values
unless otherwise specified. The darkness of the various pixels indicates the relative
height of each value (as compared to other values in the visible area of the illustration).
The lowest points are darkest, while the highest points are white. So, effectively these
illustrations provide a three-dimensional view with two independent parameters (x and y),
with the gray level as the dependent variable.
2.4.3 Square Function
The square function is simply the summation of the absolute values of each
parameter. This function is unimodal, monotonic, symmetric, and center-focused. Each
parameter provides an independent contribution to the objective function.
Equation: $\sum_{i=1}^{n} |x_i|$    Type: Minimization
Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Unimodal, Monotonic, Symmetric, Center-Focused, Independent Variables
Figure 2.4 2D Square Function
2.4.4 Sphere Function
The sphere function is simply the distance of each point in the search space from
the center of the sphere (typically the origin). This function is unimodal, monotonic,
symmetric, and center-focused. The parameter contributions are not independent.
Equation: $\sqrt{\sum_{i=1}^{n} x_i^2}$    Type: Minimization
Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Unimodal, Monotonic, Symmetric, Center-Focused
Figure 2.5 2D Sphere Function
2.4.5 Schwefel’s Problem 1.2
This function demonstrates a linear relationship between the relative ranges of
successive parameters. This function is unimodal, monotonic, symmetric, and
center-focused. Parameter contributions are independent.
Equation: $\sum_{i=1}^{n} i \cdot x_i^2$    Type: Minimization
Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Unimodal, Monotonic, Symmetric, Center-Focused, Linear Relative Variable Ranges, Independent Variables
Figure 2.6 2D Schwefel’s Problem 1.2 Function
2.4.6 Schaffer’s Function
This function provides near-infinite resolution near the origin and concentric rings
of hills and troughs; however, the general slope trend is toward the center. The original
function was designed for only two dimensions. This extended form projects each
sequential pair of parameters onto Schaffer's original function. An alternate form would
be to substitute the sum of all square terms in place of the two existing square terms.
This form provides a more energetic $n$-dimensional surface, while the alternate form
would present smooth concentric hypershells. This function is multimodal, symmetric,
and center-focused. Parameter contributions are independent.
Equation: $\sum_{i=0}^{n-2} \left( x_i^2 + x_{i+1}^2 \right)^{1/4} \left[ \sin^2\!\left( 50 \left( x_i^2 + x_{i+1}^2 \right)^{0.1} \right) + 1.0 \right]$
Type: Minimization    Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Multimodal, Symmetric, Center-Focused
Figure 2.7 2D Schaffer's Function
2.4.7 Schaffer's Function Modiﬁcation 1
This function is a modification of Schaffer's function which offsets the centroid to
the point [1, 1, ..., 1], and forces the landscape to be dependent on multiple variables at
once. Note that by offsetting the global optimum, we lose the level of resolution of the
original Schaffer function. The resolution is now limited to the precision of our
representation. If we wish to restore the higher resolution capability, we can eliminate
the $-1$ term from the $(x_n - 1)$ clauses. This function is multimodal and center-focused.
The function is symmetric about the global optimum, but not linearly symmetric.
Parameter contributions are dependent and linearly relative.
Equation:
$\sum_{i=0}^{n-2} \left( (x_i - x_{i+1})^2 + (x_{i+1} - x_{i+2})^2 \right)^{1/4} \left[ \sin^2\!\left( 50 \left( (x_i - x_{i+1})^2 + (x_{i+1} - x_{i+2})^2 \right)^{0.1} \right) + 1.0 \right] +$
$\left( (x_{n-1} - x_n)^2 + (x_n - 1)^2 \right)^{1/4} \left[ \sin^2\!\left( 50 \left( (x_{n-1} - x_n)^2 + (x_n - 1)^2 \right)^{0.1} \right) + 1.0 \right]$
Type: Minimization    Range: No effective range limits
Global Optima: [1, 1, ..., 1]    Value at Global Optima: 0.0
Description: Multimodal, Symmetric, Center-Focused, Linearly Relative Variables
Figure 2.8 2D Schaffer's Function - Modification 1
2.4.8 Schaffer's Function Modiﬁcation 2
This function is another modification of Schaffer's function that is almost identical
to the previous function, but with an additional square term. This modification causes the
landscape to become warped in a similar fashion to the Rosenbrock saddle function. As
in modification 1, we lose the level of resolution of the original Schaffer function, but we
can restore this property if desired. This function is multimodal, asymmetric, and
somewhat center-focused. Parameter contributions are dependent and relative.
Equation:
$\sum_{i=0}^{n-2} \left( (x_i - x_{i+1}^2)^2 + (x_{i+1} - x_{i+2})^2 \right)^{1/4} \left[ \sin^2\!\left( 50 \left( (x_i - x_{i+1}^2)^2 + (x_{i+1} - x_{i+2})^2 \right)^{0.1} \right) + 1.0 \right] +$
$\left( (x_{n-1} - x_n^2)^2 + (x_n - 1)^2 \right)^{1/4} \left[ \sin^2\!\left( 50 \left( (x_{n-1} - x_n^2)^2 + (x_n - 1)^2 \right)^{0.1} \right) + 1.0 \right]$
Type: Minimization    Range: No effective range limits
Global Optima: [1, 1, ..., 1]    Value at Global Optima: 0.0
Description: Multimodal, Asymmetric, Center-Focused, Exponentially and Linearly Relative Variables
Figure 2.9 2D Schaffer's Function - Modification 2
2.4.9 Schaffer’s Function Modiﬁcation 3
This function uses the same mechanism as the original Schaffer function to
provide extremely high levels of resolution; however, here rather than concentric rings
we provide rapidly fluctuating high resolution energy bands. This is combined with a
centralized tanh(distance) term and the constant 0.001 term to provide a signal-to-noise
differentiation problem. This function is multimodal and symmetric. Parameter
contributions are independent.
Equation: $\sum_{i=0}^{n-1} \tanh(x_i^2) \left[ 0.001 + \sin^2\!\left( 50 \left( |\sin(x_i)| \right)^{0.1} \right) \right]$
Type: Minimization    Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Multimodal, Symmetric, Independent Variables
79
l. '1’"
”1% «no :11“;
Hill! ill
3%;
*2
3:51.:
$13111
3111
I
1111‘ 0111115 Ann
1111‘ A1115 A1111b .
:V: 3::
alone all
miii; 3m; iiii
Illll‘ Gilli. Amt
I‘llli
’vmrawe
culls Imls
v
0
C
Mill! Hill! Gilli? “ill. Vllll
“2
“it?
1e
11.!
WE

nu
 =::II:
O‘::
iiiiw :liil
A111.» A111
 :3 3.311;:
‘ iiiiiv $333
.7 A1116 411
I
010:   I
. ..  
.IlI'I C OI.
 ...m.. ‘..
lmt IlllI Chile
”:3
A
33;:
5‘
'
2
a..
9.0.":
Figure 2.10 2D Schaffer’s Function — Modiﬁcation 3, Low Resolution
Figure 2.11 2D Schaffer's Function - Modification 3, Medium Resolution
Figure 2.12 2D Schaffer's Function - Modification 3, High Resolution
2.4.10 Schaffer's Function Modification 4
In this, the last of the functions based on the Schaffer function, we use the same
mechanism as the original Schaffer function to provide extremely high levels of
resolution; however, here the concentric rings form around multiple optima. This is
combined again with a centralized tanh(distance) term and the constant 0.001 term to
provide a signal-to-noise differentiation problem. This function is multimodal and
symmetric. The areas between the local optima are semi-chaotic. Also, note that the
global optimum is not axially aligned with the pattern of local optima. Parameter
contributions are not independent.
Equation: $\sum_{i=0}^{n-2} \tanh(x_i^2 + x_{i+1}^2) \left[ 0.001 + \sin^2\!\left( 50 \left( \sin(x_i)\sin(x_{i+1}) \right)^{0.1} \right) \right]$
Type: Minimization    Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Multimodal, Symmetric
2.4.11 Ring Function
This function is intended to provide an asymmetric landscape where the global
optimum basin is nonlinear. This function is unimodal and asymmetric. Parameter
contributions are not independent. The target distance value, $t$, has been selected to be 5
in the illustrations and throughout all empirical testing.
Algorithm:
1. Convert each pair of variables, $x_i, x_{i+1}$, to polar coordinates (i.e. calculate $D$ as the distance from the origin, and $\theta$ as the incident angle from the origin). If $D$ is 0, then $\theta = 0$. $\theta$ should be in the range $[0, 2\pi]$.
2. Calculate $d\theta = \left| \theta - \frac{5\pi}{4} \right|$.
3. If $d\theta > \pi$, then $d\theta = 2\pi - d\theta$.
4. Calculate $dD = D - t$, where $t$ is a constant representing the target distance.
5. The function value is: $\left( |dD| + 0.001 \right)\left( \frac{d\theta}{\pi} + 1 \right) - 0.001$.
Type: Minimization    Range: No effective range limits, although the useful range will be relative to $t$
Global Optima: $[-t\sqrt{1/2}, -t\sqrt{1/2}, \ldots, -t\sqrt{1/2}]$    Value at Global Optima: 0
Description: Unimodal, Asymmetric
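A sketch of the ring function as reconstructed above; sliding pairs of variables are assumed, and the step-5 formula follows our reconstruction, chosen so that the value at the global optimum is exactly 0:

    import math

    def ring(x, t=5.0):
        total = 0.0
        for i in range(len(x) - 1):
            d = math.hypot(x[i], x[i + 1])
            theta = math.atan2(x[i + 1], x[i]) % (2 * math.pi) if d > 0 else 0.0
            d_theta = abs(theta - 5 * math.pi / 4)
            if d_theta > math.pi:
                d_theta = 2 * math.pi - d_theta
            d_dist = abs(d - t)
            total += (d_dist + 0.001) * (d_theta / math.pi + 1.0) - 0.001
        return total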
35
10.0 "
I
x 0.0
40.0
I
40.0 o_o 10.0
Figure 2.14 2D Ring Function
35
Figure 2.15 2D Ring Function, Closer View
37
Figure 2.16 2D Ring Function, Near Global Optima
33
2.4.12 Trapezoid Cross Function
This function provides a simple unimodal landscape which is asymmetric along
each axis. The function is symmetric across parameters and center-focused. Parameter
contributions are independent.
" e _x. (tanh(x,)+1)
Equation' 2 — 1'5 l
' . 4.5 2
1:0
Type: Minimization Range: No effective range limits
Global Optima: [0.6846512, 0.6846512, ..., 0.6846512]
Value at Global Optima: 5.43005203631e-05 * n
Description: Unimodal, Symmetric, Center-Focused, Independent Variables
Figure 2.17 2D Trapezoid Cross Function
2.4.13 Rosenbrock’s Saddle Function
This function, also known as Rosenbrock's banana and De Jong function
number 2, is a standard asymmetric, center-focused, unimodal function. Parameter
contributions are exponentially related.
Equation: $\sum_{i=0}^{n-2} \left[ 100\left( x_{i+1} - x_i^2 \right)^2 + \left( 1 - x_i \right)^2 \right]$
Type: Minimization    Range: No effective range limits
Global Optima: [1, 1, ..., 1]    Value at Global Optima: 0.0
Description: Unimodal, Asymmetric, Center-Focused
Figure 2.18 2D Rosenbrock's Saddle Function
2.4.14 Spiral Function
This spiral function provides a nonlinear, non-center-focused landscape with
difficult cliffs. The formula in step 3 calculates an effective modulo on the spiral band
width $w$. The calculation in step 2 provides an initial offset distance which varies over
$[0, w]$ as $\theta$ varies from 0 to $2\pi$. These two combined provide a spiral shaped landscape
where the profile of a cross-section of the spiral is a sawtooth wave. The third component
provides a simple ring function at a target distance based on $p$. The global optimum will
therefore be the point on the spiral trough that intersects the circle at the target distance
(which is designed to be at 45° so that $x_{i+1}$ may be used as $x_i$ in the next clause of the
summation and still have the same target). The two terms work together similarly to a
signal-to-noise function.

This function is actually unimodal, though it may appear to be multimodal with
high activation barriers when moving radially. Although the spiral portion of the
landscape is somewhat symmetric, the function overall is asymmetric about the global
optimum. Parameter contributions are not independent. In the illustration and in all
empirical testing within this work, we have selected $d = 1.51$ and $p = 4$.
Algorithm:
1. Convert each pair of variables, $x_i, x_{i+1}$, to polar coordinates (i.e. calculate $D_i$ as the distance from the origin, and $\theta_i$ as the incident angle from the origin). If $D_i$ is 0, then $\theta_i = 0$. $\theta_i$ should be in the range $[0, 2\pi]$.
2. Calculate $l_i = D_i - \frac{w\,\theta_i}{2\pi}$, where $w$ is the width of the spiral groove.
3. Calculate $m_i = l_i - w\lfloor l_i / w \rfloor$.
4. The function value is: $\sum_i \left( m_i + 0.1\left| D_i - d\left(p + \frac{1}{8}\right) \right| \right)$, where $p$ is a constant representing the target distance in terms of windings of the spiral. Note that the 1/8 offset forces the target to be at the 45° position so that the global optimum coordinates are the same in each dimension.
Type: Minimization    Range: No effective range limits
Global Optima: $\left[ \frac{d(p+1/8)}{\sqrt{2}}, \frac{d(p+1/8)}{\sqrt{2}}, \ldots, \frac{d(p+1/8)}{\sqrt{2}} \right]$
Value at Global Optima: 0.0
Description: Unimodal, Asymmetric
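A sketch following the same conventions as the ring function above (sliding pairs; $w$, $d$, and $p$ as described; the defaults are illustrative):

    import math

    def spiral(x, w=1.0, d=1.51, p=4):
        total = 0.0
        for i in range(len(x) - 1):
            D = math.hypot(x[i], x[i + 1])
            theta = math.atan2(x[i + 1], x[i]) % (2 * math.pi) if D > 0 else 0.0
            l = D - w * theta / (2 * math.pi)
            m = l - w * math.floor(l / w)      # modulo on the groove width
            total += m + 0.1 * abs(D - d * (p + 1.0 / 8.0))
        return total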
92
XM
10.0
0.0
Figure 2.19 2D Spiral Function
93
10.0
2.4.15 Ackley’s Function
The first two terms of this function provide an exponential funnel focused about
the origin. The second two terms provide an egg-crate form similar to that found in
Griewangk's function, Yip and Pao's function, and Rastrigin's function. This function
is multimodal, symmetric, and center-focused. Parameter contributions are not
independent.
Equation: $20 - 20\,e^{-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}} + e - e^{\frac{1}{n}\sum_{i=1}^{n} \cos(2\pi x_i)}$
Type: Minimization    Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Multimodal, Symmetric, Center-Focused
Figure 2.20 2D Ackley's Function
2.4.16 Griewangk's Function
Griewangk's function provides a strong signal-to-noise problem. The signal
component is provided by the sum of squares factor, which provides a parabolic basin
about the origin. The noise component is provided by the product term, which provides
an “egg-crate” shaped series of alternating hills and valleys which increase in frequency
and decrease in width as the parameter index increases. The signal ratio is factored to be
1/4000th the maximum strength of the noise component. This causes the local minima of
the masking product term to be quite attractive to a search process. This function is
multimodal, symmetric, and minimally center-focused.
Equation: $1.0 + \sum_{i=1}^{n} \frac{x_i^2}{4000} - \prod_{i=1}^{n} \cos\!\left( \frac{x_i}{\sqrt{i}} \right)$
Type: Minimization    Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Multimodal, Symmetric, Very Minimally Center-Focused
Figure 2.21 2D Griewangk's Function
2.4.17 Clover & Cross Function
The first term of the function $f$ provides a bell-shaped curve with a maximum at
the origin. This term serves as a focusing product on the second term. The second term
is a symmetric two-hill shape with a sharp valley near the origin and long sloping tails
toward infinity. The net result is a landscape with a global optimum at the origin
surrounded by steep hills and a long sloping field which leads away from the global
optimum. This function is unimodal and symmetric. Parameter contributions are
independent.
Equation: Zf(xi)3f(0)=0,f(p¢0)=p—J=——e
Type: Minimization Range: No effective range limits
Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0
Description: Unimodal, Symmetric, Independent Variables
Figure 2.22 2D Clover and Cross Function
2.4.18 Bohachevsky's Function
The first three terms of this function provide yet another egg-crate masking
function, this time with slightly more emphasis on the second axis of each variable pair.
The second two terms provide an oval parabola which is slightly longer in the second
dimension. The combination of the two provides yet another signal-to-noise
combination, similar to Griewangk's function, Rastrigin's function, etc. This function is
multimodal, symmetric, and center-focused. Parameter contributions are not
independent.
Equation: $\sum_{i=0}^{n-2} \left( 0.7 - 0.3\cos(3\pi x_i) - 0.4\cos(4\pi x_{i+1}) + x_i^2 + 2x_{i+1}^2 \right)$
Type: Minimization    Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Multimodal, Symmetric, Center-Focused
Figure 2.23 2D Bohachevsky's Function
Figure 2.24 2D Bohachevsky's Function, Closer View
2.4.19 Rastrigin's Function
Like the Ackley, Bohachevsky, Griewangk, and Yip and Pao functions,
Rastrigin's function is a signal-to-noise differentiation problem with a central valley
masked by a strong periodic signal. However, unlike these functions, Rastrigin's
function maintains independence of its parameter contributions. This function is
multimodal, symmetric, and center-focused, with independent parameter contributions.
Equation: $10n + \sum_{i=1}^{n} \left( x_i^2 - 10\cos(2\pi x_i) \right)$
Type: Minimization    Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Multimodal, Symmetric, Center-Focused, Independent Variables
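A direct implementation of the equation above:

    import math

    def rastrigin(x):
        # Parabolic signal masked by a strong axially aligned cosine term;
        # minimum 0 at the origin.
        return 10 * len(x) + sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi)
                                 for xi in x)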
Figure 2.25 2D Rastrigin's Function
2.4.20 Yip & Pao's Function
This function was introduced by Yip and Pao [Yip 95]. In its original version,
$\frac{x_1^2 + x_2^2}{2} - \cos(20\pi x_1)\cos(20\pi x_2) + 2$, the function was designed only for two inputs and
used an offset of 2, for a final global optimum value of 1. In this modified version of the
Yip and Pao function, we have extended it beyond two dimensions and decreased the
offset to produce the more standard 0 global optimum value. As in the Griewangk
function, this function frames a signal-to-noise discernment problem with the basic
sphere function as the signal value and a high frequency egg-crate function as the noise
mask. Note in the illustration the similarities to the Griewangk function, but the extreme
difference in relative scale (due to the higher frequency of the masking component here).
Equation: $\frac{\sum_{i=1}^{n} x_i^2}{2n} - \prod_{i=1}^{n} \cos(20\pi x_i) + 1$
Type: Minimization    Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Multimodal, Symmetric, Center-Focused
105
IIIIIIIIIIII
IIIIIIIIIIIIII,“
I I I I I I I I I I I I I I I I t
IIIIIIIIIIIIIII
IIIIIIIIIIIIIII
IIIIIIIIIIIIIII
I I I I I I I I I I I I I I I I
II II IIIIIIIIIIII
IIIIIIIIIIIIIIIIIII
II III IIIIIII IIII
'IIIIIIIIIIIIIIIIII
lIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIII
lIIIIIIIIIIIII IIIII
IIIIIIIIIIIIIIIIIIII
iIIIIIIII IIII IIIII
IIIIIIIIIIIIIIIIIIII
IIIIIIII III. IIIII
IIIIIIIIIIIIIIIIIII
IIIIIIII III IIIII
,'....CIIIIIIII.I...
IIIIIIIIII IIIIIIIII
IIIIIIIIIIIIIIIIIIII
DIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIII
lIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIII
..lIIIIIIIIIOIIIIIII
lIIIIIIIIIIIIIIIIIII
.IIIIIIIIIIIIIIIIIII
I I I I I I I I I I o I I I I I I I I I
IIIIIIIIIIIIIIIIIIII
I I I I I I I I I I I I I I I I I I I
IIIIIIIIIIIIIIIIII
IIIIIIII IIIIII
IIIUIIIIIIIIIII.
IIIIIIIIIIIII
IIIIIIIIIau
Figure 2.26 2D Yip & Pao’s Function
106
2.4.21 Multimodal Spiral Function
This function, affectionately nicknamed the dandelion function, is based directly
on the spiral function with the addition of an extra periodic function on the distance that
the spiral has wound from the center. The parameter $r$ allows control of the frequency of
this secondary periodic term. The combination of these two terms produces yet another
signal-to-noise discrimination problem. However, given that both the signal and noise
functions are in polar coordinate space, a linearly encoded solution will most likely view
these recurrent hills and valleys as similar to those in Griewangk and other signal-to-noise
problems with periodic masking functions. Significantly, however, the global
optimum in this case is not at the center of the symmetry, nor is the landscape strongly
center-focused. For these reasons, we may expect this problem to provide a reasonably
high degree of difficulty for search algorithms.

This function is multimodal, with high activation barriers when moving radially.
Although the spiral portion of the landscape is somewhat symmetric, the function overall
is asymmetric about the global optimum. Parameter contributions are not independent.
In the illustration and in all empirical testing within this work, we have selected
$d = 1.51$, $p = 4$, and $r = 7.5$. Note that the value $2r$ must be a positive integer.
Algorithm:
1. Convert each pair of variables, $x_i, x_{i+1}$, to polar coordinates (i.e. calculate $D_i$ as the distance from the origin, and $\theta_i$ as the incident angle from the origin). If $D_i$ is 0, then $\theta_i = 0$. $\theta_i$ should be in the range $[0, 2\pi]$.
2. Calculate $l_i = D_i - \frac{w\,\theta_i}{2\pi}$, where $w$ is the width of the spiral groove.
3. Calculate $m_i = l_i - w\lfloor l_i / w \rfloor$.
4. The function value is:
$\sum_i \left[ m_i + 0.1\left( \left| D_i - d\left(p + \frac{1}{8}\right) \right| + \sin\!\left( r\left( \frac{2\pi D_i}{d} + \theta_i + \frac{\pi}{4} \right) \right) + 1 \right) \right]$,
where $p$ is a constant representing the target distance in terms of windings of the spiral, and $r$ is a constant representing the number of complete sine waves per spiral turn. Note that the 1/8 offset forces the target to be at the 45° position so that the global optimum coordinates are the same in each dimension. The $\pi/4$ term forces one of the sine curve optima to coincide with the optimum of the first two terms.
Type: Minimization    Range: No effective range limits
Global Optima: $\left[ \frac{d(p+1/8)}{\sqrt{2}}, \frac{d(p+1/8)}{\sqrt{2}}, \ldots, \frac{d(p+1/8)}{\sqrt{2}} \right]$
Value at Global Optima: 0.0
Description: Multimodal, Asymmetric
Figure 2.27 2D Multimodal Spiral Function
2.4.22 Frequency Modulation Sounds (FMS) Problem
This function was introduced in [Tsutsui 93]. This function is a fixed-dimensional
problem with six input parameters. Nonetheless, due to the high level of parameter
interaction, this function is one of the most epistatic functions in this list. This function is
highly multimodal, and many of the parameter interactions are fairly chaotic.
Equation: $f_{fms} = \sum_{t=0}^{100} \left( y(t) - y_0(t) \right)^2$, where
$y(t) = x_1 \sin\!\left( x_2 \frac{2\pi t}{100} + x_3 \sin\!\left( x_4 \frac{2\pi t}{100} + x_5 \sin\!\left( x_6 \frac{2\pi t}{100} \right) \right) \right)$ and
$y_0(t) = 1.0 \sin\!\left( 5.0 \frac{2\pi t}{100} + 1.5 \sin\!\left( 4.9 \frac{2\pi t}{100} + 2.0 \sin\!\left( 4.8 \frac{2\pi t}{100} \right) \right) \right)$
Type: Minimization    Range: Listed as [-6.4, 6.35]
Global Optima: [1.0, 5.0, 1.5, 4.9, 2.0, 4.8]    Value at Global Optima: 0.0
Description: Multimodal, Chaotic
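A direct implementation of the objective as given above; x is assumed to be a sequence of the six parameters:

    import math

    def fms(x):
        # Squared error between a candidate FM waveform y(t) and the
        # target y0(t), accumulated over t = 0..100.
        def y(t, a1, w1, a2, w2, a3, w3):
            th = 2 * math.pi * t / 100.0
            return a1 * math.sin(w1 * th + a2 * math.sin(w2 * th
                                 + a3 * math.sin(w3 * th)))
        target = (1.0, 5.0, 1.5, 4.9, 2.0, 4.8)
        return sum((y(t, *x) - y(t, *target)) ** 2 for t in range(101))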
Figure 2.28 FMS Function within Initial Range, x5 and x6 (all other parameter values held at optimal)
Figure 2.29 FMS Function within Initial Range, x3 and x4 (all other parameter values held at optimal)
2.4.23 Exponential Function
This function provides an exponential successive refinement problem if the initial
parameter ranges are selected to be equal. It is interesting to note that once the values
reach the $|x_i| \leq 1$ region, the successive refinement problem is effectively inverted.
Equation: $\sum_{i=1}^{n} x_i^{2i}$
Type: Minimization    Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Multimodal, Symmetric, Center-Focused
Figure 2.30 2D Exponential Function
2.4.24 Chain-Link Function
This function is similar to other signal-to-noise discrimination functions with
periodic masks, such as Griewangk's function. However, in this landscape the local
optima are long narrow valleys, rather than pits, and the symmetry of the landscape is
non-axially aligned. Parameter contributions are not independent.
Equation: $\sum_{i=1}^{n-1} \left( \sin(x_i) - \sin(x_{i+1}) \right)^2 + \sum_{i=1}^{n} \frac{x_i^2}{4000}$
Type: Minimization    Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Multimodal, Symmetric, Center-Focused
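A direct implementation under the reconstruction above; the minus sign inside the squared term is inferred from the described diagonal, non-axially aligned valleys:

    import math

    def chain_link(x):
        # Diagonal valleys where neighboring parameters have equal sines,
        # plus a weak parabolic signal term.
        noise = sum((math.sin(x[i]) - math.sin(x[i + 1])) ** 2
                    for i in range(len(x) - 1))
        return noise + sum(xi ** 2 for xi in x) / 4000.0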
Figure 2.31 2D Chain-Link Function
2.4.25 Double Cos Function
The double cos function is a highly epistatic function which nonetheless has a
fairly strong central basin and remains symmetric and axially aligned.
Equation: $\sum_{i=1}^{n-1} \left( 1 - \cos\!\left( x_i \cos(x_{i+1}) \right) \cos\!\left( x_{i+1} \cos(x_i) \right) \right) + \sum_{i=1}^{n} \frac{x_i^2}{4000}$
Type: Minimization    Range: No effective range limits
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Multimodal, Symmetric, Chaotic
Figure 2.32 2D Double Cos Function
2.4.26 Inverse Exponential Function
This function is characterized by a weak slope toward the optimum in the positive
$x_i$ range, and a slope away from the global optimum in the negative $x_i$ range, with an
extremely high activation barrier to the immediate left of the global optimum. Note that
as the index increases, the tails of the slopes become flatter and the plunge to the
optimum becomes more precipitous.
Equation: $\sum_{i=1}^{n} f(x_i, i)$, where $f(0, i) = 0$ and $f(a \neq 0, i) = (a + 1)^{1/i}$
Type: Minimization    Range: Dependent on problem size and encoding precision
Scalability: May not scale well to high dimensionality, depending on encoding precision
Global Optima: [0, 0, ..., 0]    Value at Global Optima: 0.0
Description: Unimodal, Asymmetric, Independent Variables
Figure 2.33 2D Inverse Exponential Function
2.4.27 Worms Function
This function provides yet another variation of the classic signal-to-noise
discrimination problem; however, in this instance the noise function provides basins of
attraction which are aligned diagonally to the axes and which primarily do not lead
toward the global optimum.
Algorithm:
1. Define the real-valued modulo function, $m$, as $m(a,b) = a - b\lfloor a/b \rfloor$, and the real-valued floor function, $q$, as $q(a,b) = a - m(a,b)$.
2. For each pair of values, $x_i, x_{i+1}$, calculate $d_i = \frac{1}{w}\left|\, q\!\left( x_{i+1} + \frac{w}{2},\, w \right) \,\cdots\, \sin(x_i \cdots) \right|$, where $w$ is a constant parameter of this function.
3. If $d_i > 1$, set $d_i = 2 - d_i$. The function value is: $\sum_{i=1}^{n-1} d_i^2 + \sum_{i=1}^{n} \frac{x_i^2}{4000}$.
Type: Minimization Range: No effective range limits
Global Optima: [0, 0, ..., 0] Value at Global Optima: 0.0
Description: Multimodal
Figure 2.34 2D Worms Function
2.4.28 Schwefel’s Function
This function presents a series of axially aligned local optima which increase in
area and depth as the parameter values increase. In order to provide an effective limit,
strict range enforcement is required. Interestingly, this function can be used for either
minimization or maximization, since it is inversely reflected across the $x_i = 0$ planes.
The parameter contributions are independent.
Equation: $-\sum_{i=1}^{n} x_i \sin\!\left( \sqrt{|x_i|} \right)$
Type: Minimization    Range: [-500, 500]
Global Optima: [420.968745, 420.968745, ..., 420.968745]
Value at Global Optima: approx. $-n \cdot 418.9828872724338$
Description: Multimodal, Symmetric, Independent Variables
Figure 2.35 2D Schwefel's Function
2.4.29 Dynamic Control
This function overlays two quadratic components which provide dual pressure. The first
clause forces all variables toward their neighbor values, while the second forces all
variables toward zero. However, given the equal weighting of the two factors, moving an
individual value to zero without also moving its neighbors results in a signiﬁcantly worse
objective value. Therefore all values must be moved simultaneously. No image is
supplied for this function. This is similar to the dynamic control function found in
[Janikow 91].
Equation: $\sum_{i=1}^{n-1}(x_i - x_{i+1})^2 + \sum_{i=1}^{n} x_i^2$
Type: Minimization Range: No effective range limitations
Global Optima: [0,0,...,0]
Value at Global Optima: 0.0
Description: Unimodal, Symmetric, Dependent Variables
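A minimal Python sketch, which also illustrates the dual-pressure property described above; zeroing one value without also moving its neighbors worsens the objective:

    import numpy as np

    def dynamic_control(x):
        # First clause pulls each variable toward its neighbor;
        # second clause pulls every variable toward zero.
        x = np.asarray(x, dtype=float)
        return np.sum((x[:-1] - x[1:]) ** 2) + np.sum(x ** 2)

    print(dynamic_control([1.0, 1.0, 1.0]))  # 3.0
    print(dynamic_control([1.0, 0.0, 1.0]))  # 4.0: worse after a partial move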
Chapter 3
Analysis of Operators
Analysis of the NFL theorems [Wolpert 97] (see Section 2.1) implies that the
efﬁciency of a search process is dependent on the relative alignment between the
assumptions made by a search operator and the problem landscape under investigation.
One of the strongest assumptions that an operator demonstrates is the expected location
of better individuals relative to a previously sampled point, or relative to the current
population as a whole. For this reason, a study of the statistical biases of EC operators is
warranted.
Given the stochastic nature of most EC operators, bias analysis cannot effectively
be carried out using a single application instance. However, analysis of the distribution
induced by a number of operator applications in terms of population level statistical
characteristics may provide insight into the relative bias of a given operator. These
statistical biases equate to search biases or assumptions. For example, bias against
covariance implies assumptions of parameter independence or context sensitivity of
parameter relationships. Variance reduction and center tending imply the assumption that
the landscape contains concentric basins of attraction that are amenable to hill climbing.
In section 3.1, the various statistical measures and their implications are outlined.
This section includes closed form evaluation of several common operators as case studies
to demonstrate various forms of bias. Section 3.2 discusses the relationship between
local statistics and invariance to homeomorphic transformations of the encoding space,
and its implication to EC system implementation. This section also includes a discussion
as to the validity of using population local statistics during search. Finally, Section 3.3
presents a series of empirical tests and an initial battery of test cases that are then carried
out on a selected number of representative search operators. The results of these tests are
analyzed and a preliminary sample operator taxonomy is presented.
3.1 Statistical Distribution Analysis
In this section we will outline various statistical measures present in any group of
sampled landscape points and discuss the forms of measurement of these values and what
biases may be observed through modiﬁcation of these values between parent and child
populations. Where appropriate, direct closed form distributional analysis will be
performed to provide examples of such bias as present in existing EC operators.
In general our analysis will consist of statistical analysis of distributions produced
from a single set of applications of a given operator (as in a single generation), as
opposed to long term Markovian effects. Examples are typically presented as
hypothetical input population distributions, without determination of how such conditions
might arise in an evolutionary system. Analysis consists of comparing the statistical
differences between input and output distributions over a single set of operator
applications. When discussing actual evolutionary computation systems, it may be
necessary to differentiate between pre-breeding-selection and post-breeding-selection populations. A pre-breeding-selection population refers to the current population before selection for breeding takes place. A post-breeding-selection population refers to the actual or virtual pool of individuals selected for reproduction, whether or not the current operator is being applied to them. In systems where no breeding selection occurs, pre-breeding-selection and post-breeding-selection populations are equivalent. In order to separate the effects of breeding selection and breeding operators, we will normally compare the results of operator application to the prior post-breeding-selection population. Equivalently, the operators may be considered as being applied in isolation
general overview of breeding selection in evolutionary algorithms.
Note that in these analyses, we typically ignore implementation issues such as
rates of application for the various operators. Certainly an EC system that selectively
applies a given operator will only induce a portion of the bias that could be produced by
maximal application of that operator. However, in this analysis we are more interested in
determining the extent of these biases where present. Indeed, an EC system can avoid all
possible bias by refusing to employ any operators at all; however, such a system would
be of little practical use.
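As an illustration of the measurement style used in this chapter, the following Python sketch applies an operator repeatedly to a fixed parent population and summarizes the single-generation mean and variance disturbances. The array-based population layout and the Gaussian mutation operator are illustrative assumptions, not operators analyzed here.

    import numpy as np

    def bias_statistics(parents, operator, trials=1000, seed=0):
        # Compare parent and child population statistics over repeated
        # single-generation applications of the operator.
        rng = np.random.default_rng(seed)
        shifts, ratios = [], []
        for _ in range(trials):
            children = operator(parents, rng)
            shifts.append(np.linalg.norm(children.mean(0) - parents.mean(0)))
            ratios.append(children.var(0).sum() / parents.var(0).sum())
        return np.mean(shifts), np.mean(ratios)

    def gaussian_mutation(parents, rng, sigma=0.1):
        return parents + rng.normal(0.0, sigma, parents.shape)

    pop = np.random.default_rng(42).normal(size=(50, 5))
    print(bias_statistics(pop, gaussian_mutation))  # small shift, ratio > 1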
3.1.1 Mean Disturbance
Disturbances to the population mean may be evaluated as statistical ﬂuctuations
or as intentional modiﬁcations. Statistical ﬂuctuations occur due to the stochastic nature
of the operators and the fact that ﬁnite populations and ﬁnite sampling are being used.
For an individual operator, these fluctuations tend to follow random vectors whose
magnitude is relative to the magnitude of the variance modiﬁcation induced by the
operator on a population associated with a landscape. As such, these mean disturbances
are subject to the standard random walk analysis. For example, in 1- and 2-dimensional
search spaces the mean will revisit any given point an inﬁnite number of times over
inﬁnite time; however in spaces at or above 3 dimensions, revisiting any individual point
during a random walk is extremely improbable.
However, we know that population means do indeed tend to revisit locations
consistently within problem spaces of greater than 3 dimensions. Although the statistical
fluctuations of the mean for individual operators tend to be random, the selection operator
tends to correct these random ﬂuctuations over time thus negating them (assuming a local
attractor in the landscape). Therefore, on the whole we can ignore such ﬂuctuations as
long as they tend to be symmetrical and unbiased except in the consideration of genetic
drift. (See 3.1.2 Variance Disturbance for a discussion of genetic drift.)
On the other hand, as the goal of most search techniques is to focus the sampling
near the global optima, it is necessary to move the population mean toward the optima.
This movement is achieved by leaving these ﬂuctuations or a component of them
uncorrected. Therefore the magnitude of these fluctuations provides information about the search speed of an EC system, and also determines the system's ability to escape local minima. It is important to keep in mind that the "mean" is a fictitious point in an EC system, in that it may or may not have been actually sampled, and that the fitness of the population
mean is not necessarily related to the ﬁtness mean of the individuals in the population.
Intentional modiﬁcation of the population mean indicates a biasing of the search
process, in that a given area of the search space is more heavily sampled based upon the
assumptions of the operator. Typically the assumptions of the operator are guided by
feedback from the currently sampled landscape positions (i.e. the ﬁtness and location of
individuals in the population). The assumption is that ﬁtter individuals point toward
more fruitful search areas while less ﬁt individuals point toward less productive search
regions. A standard example of intentional mean shifting is BLX-α-β. BLX-α-β operation is similar to that of BLX-α, with the exception that the search area is non-symmetrical about the mean of the two parents. That is, one side of the distribution
(typically the one toward the more ﬁt parent) is larger than the other. The intent is to bias
the population toward searching closer to more ﬁt individuals. Such biases tend to
reduce the variance of the population in local hillclimbing situations.
There is a potential for a strong philosophical objection toward such biasing
operators. Chieﬂy, there already exists an operator that by nature tends to focus the
search and reduce variance in local hill climbing situations, namely selection. In many
EC systems the difﬁculty is not in achieving more rapid convergence, but more thorough
search. Thoroughness and speed are antithetical goals as an increase in speed
necessitates less opportunity for search, and therefore a reduction in search thoroughness.
In situations in which EC systems tend to converge quickly, adding a second mechanism
to amplify the focus of the search tends to reduce the overall thoroughness of the search.
In our analysis here, we will tend to ignore operators that intentionally induce a
movement of the mean (including selection), since such fundamental bias makes it
difﬁcult to determine a baseline for further evaluative statistics. For example, if the mean
of the child population is shifted relative to that of the parent population, should the
variance of the child distribution be measured from the original mean of the population or
the newly induced one? Similar issues come into play for other statistical measures as
well. Furthermore, such biases typically require a ﬁtness map of the search landscape
which reduces the generality of the analysis. Operators that are not ﬁtness sensitive can
be analyzed in a ﬁtness neutral manner. Further, most EC operators are intentionally
unbiased (symmetric) in their search behavior, so a large number of existing and potential
EC operators may still be evaluated. Extension of this evaluation to intentionally biased
operators is left for future research.
3.1.2 Variance Disturbance
While the selection operator is the primary population mean modiﬁer, most
evolutionary operators tend to modify the population variance. Traditional EC mutation
operators tend to modify the population variance in a speciﬁc manner regardless of the
current population distribution. Selection also tends to modify the population variance;
however, here the magnitude and direction of variance modiﬁcation can be highly
dependent on the landscape and initial population distribution.
Selection disregards certain members of the population and overselects other
members, thereby decreasing the number of contributors to the variance. However, it is
incorrect to conclude that selection is therefore always a variance reducing operation.
For example, consider the situation illustrated in Figure 3.1a where the current population
consists of four points. If we imagine a symmetric hill descending landscape where
points A and D are favored over B and C, it is possible that breeding selection may
choose A twice and D twice and disregard B and C altogether. In this case, the
population variance changes from 3 1/3 before selection to 6 after selection, thereby nearly doubling the population variance. Likewise, we might consider a similar two-dimensional case as in Figure 3.1b where the population is currently distributed in a ring
shape (where darker areas represent increased ﬁtness). Again, if we assume a ﬁtness
landscape which favors hill descending, then it is expected that the selected population
will be distributed further from the mean (the darker colored points) thereby again
increasing the population variance. While hill descending is a common component to
these examples, it is not a requirement. Multiple vectors of attraction in the fitness landscape are required for selection to exhibit variance-increasing behavior, but this can be possible in landscapes which do not exhibit hill descending behavior. For example, the situation depicted in Figure 3.1a can as easily be imagined as a two-peak hill climbing problem with competing peaks at points A and D. In monotonic, single-attractor (hill climbing) landscapes, the behavior of selection is nearly always variance reducing.
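The one-dimensional example is easy to check numerically. The unit-spaced coordinates below are illustrative stand-ins for the figure's points (the source's own coordinates yield 3 1/3 and 6); either choice shows the same near-doubling of variance under two-peak selection:

    import numpy as np

    pop = np.array([1.0, 2.0, 3.0, 4.0])     # points A, B, C, D
    sel = np.array([1.0, 1.0, 4.0, 4.0])     # A and D each selected twice
    print(pop.var(ddof=1), sel.var(ddof=1))  # 1.67 -> 3.0: variance increases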
Genetic drift deﬁnes the effects of ﬁnite sampling through use of ﬁnite
populations causing minute ﬂuctuations in the variance of individual alleles. This
ﬂuctuation means that one individual is favored over equally ﬁt individuals. Coupled
with selection this ﬂuctuation produces a random dominance effect between equally ﬁt
individuals. In genetic terms, heterozygosity (the representation of multiple competitive
alleles in the population) decreases through genetic drift. In fact, over inﬁnite time
heterozygosity decreases to zero due to genetic drift, which means that eventually one
allele will dominate the population to the exclusion of all others. While in the short term
selection may cause increases to the population variance, the fact that we are using ﬁnite
populations means that ultimately genetic drift will cause the population variance to be
reduced to zero.
Most mutation operators employed in EC increase the population variance. Given
that the normal method of mutation is to impose a sampling distribution about one or
more individuals in the current population, typically by addition of a random sample from
a selected distribution, the net effect is to increase the variance of the resulting child
population. For this reason, mutation is often synonymous with variance increase. That
is to say, most operators that increase the population variance are viewed as containing an
element of mutation.
One of the few examples of a consistent variancereducing operator other than
selection is the averaging crossover [Davis 91], where each two selected parents produce
a child at their center of mass, or mean point. To demonstrate this, consider the average contribution of two parents to the population variance as compared to the variance contributed by their mean. For simplicity, assume the population mean at the origin (i.e. zero). Since the variance is relative to the distance to the mean, not its absolute coordinate placement, this assumption does not invalidate the argument for populations where the mean is not at the origin. The average contribution of the parents is given as $\frac{x_1^2 + x_2^2}{2}$, whereas that of their mean point is given as $\left(\frac{x_1 + x_2}{2}\right)^2$. The difference is then given by $\frac{x_1^2 + x_2^2}{2} - \left(\frac{x_1 + x_2}{2}\right)^2$, which reduces to $\frac{(x_1 - x_2)^2}{4}$, a positive value for all $x_1 \neq x_2$. We conclude that the averaging crossover operator loses variance relative to the square of the average distance between parents (which is another definition of the population variance).
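A minimal sketch of averaging crossover, assuming random pairing of parents; as the derivation above predicts, the child population retains only about half of the parent variance:

    import numpy as np

    def averaging_crossover(parents, rng):
        # Each child is the mean point of two randomly paired parents.
        idx = rng.permutation(len(parents))
        return (parents[idx[::2]] + parents[idx[1::2]]) / 2.0

    rng = np.random.default_rng(1)
    pop = rng.normal(size=(100_000, 3))
    kids = averaging_crossover(pop, rng)
    print(pop.var(0).sum(), kids.var(0).sum())  # child variance is about half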
(a) one-dimensional hill descending    (b) two-dimensional hill descending
Figure 3.1 Example Situations which Increase Variance through Selection
In contrast to mutation operators, most crossover operators tend to preserve the
current variance in the population. In order to differentiate between the effects of
selection and crossover, we can measure the population variance difference between the
post-breeding-selection pool (i.e. the average variance of those selected for reproduction), and the collection of all children produced by a crossover operator. Standard field-based crossover does not modify individual field values. Further, since each allele represented in the post-breeding-selection pool will be present in some child solution, standard field-based crossover always has a zero net modification of the population variance. That is,
the variance of the population after crossover is exactly the same as it was before
crossover. Global intermediate recombination, which repopulates the population from
the mean and variance of the parent population, also tends toward zero net population
variance modiﬁcation. However, there can be signiﬁcant ﬂuctuation in the variance
modiﬁcation under global intermediate recombination due to the fact that the sampling
set (number of crossovers applied) is ﬁnite.
3.1.3 Mean Focusing (Center Tending)
It is possible for two operators to produce distinct distributions with identical
mean and variance measures, which nonetheless represent radically different search
distributions. For example, one operator may create a search distribution which
uniformly samples within a certain range of the population mean, while a second operator
might sample densely about the population mean and produce a small number of search
points far beyond the range of the ﬁrst operator’s distribution. If these distant search
points are sufﬁciently distant and balanced about the mean, the resulting mean and
variance measures may be identical to that of the ﬁrst operator. Clearly the second
operator demonstrates an assumption that highly sampled local search with occasional
large jumps provides a more efﬁcient search pattern; however, this assumption is not
apparent in the normal statistical measurements between the parent and child populations.
There are numerous metrics that might provide some insight into the shape of the
search distribution. Since our ﬁrst two measures deal with the ﬁrst two moments of the
population, it is only natural to think about use of the third moment, or skewness of the
distribution. In practice, the third moment is somewhat impractical to compute for multi-dimensional search spaces and requires an exponentially larger number of sample points.
Also, it may be difﬁcult to compare two relative coskewness matrices in terms of
similarity.
A simpler metric is to estimate the bias of the distribution of the distances of the
population members from the mean. While this metric is unable to distinguish between equivalent, but rotated, asymmetrical distributions, we may assume other metrics such as
the measure of covariance disruption should allow us to detect these situations. Given
the distribution of distances about the mean, we can use the ﬁrst and second moments and
their relation to the median distance (similar to the Pearson skewness coefﬁcient) to
produce a metric which measures the degree of bias toward the center of a given
distribution. For more details on the exact formulation of this metric, see Section 3.2.1.4.
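As a sketch of the idea, one such distance-bias measure can be computed as below; the Pearson-style formulation is an illustrative assumption, and the exact metric is the one given later in the source.

    import numpy as np

    def center_tending(pop):
        # Distances of population members from the population mean.
        d = np.linalg.norm(pop - pop.mean(axis=0), axis=1)
        # Pearson-style skewness of the distance distribution: positive
        # values indicate dense sampling near the center with a long tail.
        return 3.0 * (d.mean() - np.median(d)) / d.std()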
3.1.4 Covariance Disturbance
The covariance of two variates is an indication of the level of correlation between
them. Two equivalent formulas for calculating the covariance of two variates are given
in Equation 3.1. Since covariance is relative to a mean, covariance of single points is
meaningless, so we will typically address the covariance of an entire population or
subgroup of a population.
$\mathrm{Cov}(x_i, x_j) = E\big((x_i - \mu_i)(x_j - \mu_j)\big) = E(x_i x_j) - E(x_i)E(x_j)$

Equation 3.1 Formulas for Sample Covariance
Disturbance of the covariance between parameters or ﬁelds by an evolutionary
computation operator implies an assumption that the existing covariance is either an
aberration, or an indication of an underlying relationship inherent to the problem space.
Depending on the assumptions made, the covariance may be dampened or ampliﬁed by
application of a given operator. EC researchers have only recently begun to focus on the covariance effects of their operators directly; therefore, we find that typically
most operators tend to dampen covariant tendencies in the population since they are not
speciﬁcally designed to preserve them. Subsection 3.1.4.1 examines the covariance
disturbance inherent in standard GA crossover, while 3.1.4.2 provides similar analysis for
BLX-α. Section 3.1.4.3 further discusses the significance of covariance loss during
evolutionary search.
3.1.4.1 Crossover Covariance Disturbance
Consider the three example parent distributions illustrated in Figure 3.2. In each
example, the parents are uniformly distributed along a line segment that is one unit in
length (the source population points are distributed along the heavy black line segment).
In parent distribution a, the slope of the line segment is 0, for b and c, the slope is 1/2 and
1 respectively. For each distribution, the gray area indicates the expected exploration
area produced by application of standard field-oriented crossover. (Note that in example
a the area of exploration is the same as the distribution of the parents.)
(a) m = 0    (b) m = 1/2    (c) m = 1
Figure 3.2 Three example parent distributions
Given these examples, it seems obvious that more exploration is taking place in
example c than in b, which in turn demonstrates more exploration than in example a. In
fact, example a demonstrates that crossover carries out no exploration once one of the
parameters has converged, while (as we shall prove) example c shows the maximal
exploration produced by a crossover for a continuous uniform population along a line
segment. Observing behaviors graphically in this manner can provide some insight into
the overall behavior of an operator; however, a mathematical analysis provides a more
useful framework for generalization and characterization of such properties.
An operator which retrieves information for parameters $x_i$ and $x_j$ independently from the post-breeding-selection pool (i.e. from independent sources) may be classified as fully dissociative for parameters $i$ and $j$. Dominant recombination with $\rho = \mu$ is an example of a fully dissociative operator. A standard crossover operator with a 100% application rate would also be fully dissociative. Given that the parents are chosen uniformly from the post-breeding-selection pool, the expected covariance between $x_i$ and $x_j$ will be zero, since the expected covariance between any two independent variables is
zero. Given this result we can calculate the expected percentage of covariance loss for
any partially dissociative operator.
Consider a partially dissociative operator $O$. We can partition the children produced by $O$ according to whether parameters $x_{i,m}$ and $x_{j,m}$ for child $m$ came from separate sources. (Note that for this level of analysis we assume that independence of sources implies independence of values; however, this is not true if $x_{i,p}$ and $x_{i,q}$ from parents $p$ and $q$ of child $m$ are not independent. Since such issues concern long term Markovian properties, we choose to ignore them for the moment.) Thus, we have created two sets, one for which $x_i$ and $x_j$ are independent, and one for which the expected covariance is the same as the original population. Also, the expected mean of both sets is the same as the mean of the original population. The formula for sample covariance in Equation 3.1 implies that covariance contributions are independent as long as the mean remains constant; therefore, the expected value of the covariance will be the ratio of the number of non-dissociated children to the total number of children, multiplied by the covariance of the original population. (I.e., the dissociated children are expected to contribute zero net covariance to the child population, so the covariance of the non-dissociated children is averaged across the entire population.) Thus the net proportion of
expected covariance loss is equivalent to the net expected level of dissociation. Note that
we can also adjust this calculation for the possibility of two parents having a common
ancestor for $x_i$ or $x_j$. Since selecting two parents which are not independent across $x_i$ or $x_j$ is equivalent to not dissociating $x_i$ and $x_j$ between the parents and the children, we can simply add any such occurrences to the set of non-dissociated children, even though $O$ was successfully applied. Therefore, the final expected proportion of covariance loss is equivalent to the effective dissociation rate: the probability of application of $O$, times the probability of dissociation between $x_i$ and $x_j$ for a given application of $O$, multiplied by the probability that $x_i$ and $x_j$ are independent for randomly selected parents $p$ and $q$ (or equivalently, one minus the probability that $x_i$ and $x_j$ are not independent).
$\mathrm{Loss} = p_O \cdot p_{\mathrm{diss}}(x_i, x_j) \cdot p_{\mathrm{ind}}(x_i, x_j)$

Equation 3.2 General Covariance Loss Formula
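A Monte Carlo sketch of this result, under the assumption that dissociation can be modeled by permuting one parameter among the affected children; the surviving covariance tracks one minus the effective dissociation rate:

    import numpy as np

    rng = np.random.default_rng(0)
    n, rate = 100_000, 0.6                   # population size, dissociation rate
    xi = rng.normal(size=n)
    parents = np.column_stack([xi, 0.9 * xi + 0.1 * rng.normal(size=n)])

    children = parents.copy()
    mask = rng.random(n) < rate              # children drawing x_j independently
    children[mask, 1] = rng.permutation(children[mask, 1])

    c_parent = np.cov(parents.T)[0, 1]
    c_child = np.cov(children.T)[0, 1]
    print(c_child / c_parent)                # approaches 1 - rate = 0.4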
Let us assume a population with fully independent values for $x_i$ and $x_j$ across all parents (i.e. each parent has a unique value for $x_i$ and $x_j$). We can consider the relative
levels of dissociation for a number of forms of standard crossover by considering the
probability that the values are dissociated for different loci within a ﬁxed encoding.
Figure 3.3 Crossover covariance modification example
$(x+dx)(y+dy) + xy - \mu^2 = 2xy + dy\,x + dx\,y + dx\,dy - \mu^2$

Equation 3.2 Covariance contribution of parents

$(x+dx)\,y + x\,(y+dy) - \mu^2 = 2xy + dx\,y + dy\,x - \mu^2$

Equation 3.3 Covariance contribution of children

$(2xy + dx\,y + dy\,x - \mu^2) - (2xy + dx\,y + dy\,x + dx\,dy - \mu^2) = -dx\,dy$

Equation 3.4 Covariance modification
Figure 3.3 depicts the effects of an application of two-parent field-based crossover on two fields. The covariance contribution of the original parents may be expressed by Equation 3.2, and the covariance contribution of the children is expressed in Equation 3.3. Thus the change in the covariance of the population that occurs when these children are substituted for the parents may be expressed as shown in Equation 3.4 as $-dx\,dy$. Since each crossover application produces two children, the effective covariance displacement per child is half of this value: $-dx\,dy/2$. Note that if the parents and children had been reversed in this example, the covariance modification would be $dx\,dy/2$. This result implies that the magnitude of covariance modification depends only on the distance between the parents in each dimension, or equivalently, on the distance and the slope between the parents. Thus, the level of covariance loss is independent of the actual placement of the parents relative to the origin. By substituting the slope formula and the Pythagorean identity into Equation 3.4, we can translate this formula into slope-distance form as given in Equation 3.5.
$\frac{dx\,dy}{2} = \frac{d^2 m}{2(1+m^2)}$, where $d$ and $m$ are the distance and slope between the parents, respectively

Equation 3.5 Magnitude of covariance modification, slope-distance form
Therefore, the relationship of the relative magnitude of the covariance disturbance and the slope, for equal distance between parents, is $m/(1+m^2)$. This relationship forms a sigmoidal curve, as depicted graphically in Figure 3.4 for $d = 1$. This relationship has the interesting properties of reflectivity about the origin, $f(-x) = -f(x)$, and symmetry between reciprocals, $f(1/x) = f(x)$. Note that if the slope is constant for a given parent distribution, we can treat the factor $m/(1+m^2)$ as a constant; therefore, we can now compare relative magnitudes of covariance loss of similar distributions with different alignments to the axes of encoding. For example, consider the relative magnitude of covariance loss for a fully dissociative operator operating on a population as in Figure 3.2 with slope $m = \tan(\pi/8)$, as compared to the maximal covariance loss when $m = 1$. The result shows that the relative covariance disruption is still 70% of the maximum; however, this also means that since the covariance disruption will be greater than this level between slopes $\tan(\pi/8)$ and $\tan(3\pi/8)$, as well as between the negated slopes $-\tan(\pi/8)$ and $-\tan(3\pi/8)$, the effective covariance loss is within 70% of the maximum for half of the possible slopes (the shaded areas in Figure 3.5). We can calculate the expected value of the magnitude of covariance loss, as a percentage of the maximum, given a uniform distribution of the possible slope (angle from the origin), by substituting $m = \tan(a)$ into the relative magnitude factor $2m/(1+m^2)$ (since the angle $a$ is uniformly dense from 0 to $\pi/2$, with density $2/\pi$) and integrating from 0 to $\pi/2$, as shown in Equation 3.6. So the expected level of covariance modification is approximately 63% of the maximum.
Figure 3.4 Relative covariance disturbance, $2m/(1+m^2)$, as a factor of slope between parents
Figure 3.5 Region of covariance loss ≥ 70%
$\frac{2}{\pi}\int_{0}^{\pi/2} \frac{2\tan(a)}{1+\tan^2(a)}\,da = \frac{2}{\pi} \approx 0.63$

Equation 3.6 Expected relative magnitude of covariance loss for fully dissociative operators
Equation 3.6 supports our earlier intuitive assessment that the magnitude of
covariance modiﬁcation is related to the slope between the parent solutions. From
Equation 3.6, we can now estimate the magnitude of covariance loss in the situations
depicted in Figure 3.2. First, we need to ﬁnd the distribution of the difference between
two samples from a uniform distribution. It can be shown that Equation 3.7 is the
probability distribution function for the quantity $|s_1 - s_2|$ where $s_1, s_2 \in U(a, a+w)$.
This distribution is depicted in Figure 3.6 for $w = 2$. From Equation 3.7 we can now estimate the expected value of $d^2$ necessary for Equation 3.5 by substituting the variable $x$ for the quantity $|s_1 - s_2|$ in Equation 3.7, and then integrating this expression multiplied by the quantity $x^2$ from 0 to $w$, resulting in the expression shown in Equation 3.8. Substituting this result into Equation 3.5 produces the final result given in Equation 3.9. Note that to calculate the level of covariance loss, we now only need the slope of the line segment and the width of the distribution along that line segment.
Figure 3.6 Expected difference of two samples from U(a, a+2)
$\frac{2}{w}\left(1 - \frac{|s_1 - s_2|}{w}\right)$, where $s_1, s_2 \in U(a, a+w)$

Equation 3.7 Pdf for expected difference between two uniform samples
$\int_{0}^{w} x^2\,\frac{2}{w}\left(1 - \frac{x}{w}\right)dx = \frac{1}{6}w^2$, where $x = |s_1 - s_2|$, $s_1, s_2 \in U(a, a+w)$

Equation 3.8 Expected value of $d^2$ for a uniform distribution of width $w$
$\frac{w^2 m}{12(1 + m^2)}$

Equation 3.9 Expected level of covariance disruption per child for a uniformly distributed population along a line segment with slope $m$ and length $w$
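A Monte Carlo sketch checking Equation 3.9: parents are drawn uniformly along a segment of length w with slope m, and the average per-child modification dx dy / 2 from Equation 3.5 is compared against the closed form.

    import numpy as np

    rng = np.random.default_rng(0)
    m, w = 1.0, 2.0
    s1 = rng.uniform(0, w, 200_000)          # parent positions along the segment
    s2 = rng.uniform(0, w, 200_000)

    dx = (s1 - s2) / np.sqrt(1 + m**2)       # per-axis parent separations
    dy = m * dx
    print(np.mean(np.abs(dx * dy) / 2))      # empirical per-child magnitude
    print(w**2 * m / (12 * (1 + m**2)))      # Equation 3.9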
The result in Equation 3.9 is of limited value, since it only applies to uniform parent distributions along a narrowly focused area, which may be expected to be somewhat rare under actual search conditions. However, we can extend this result to consider other
properties of covariance loss under crossover. Consider a population which is similarly
aligned as that in Figure 3.2 c), and still strongly covariant, but which is distributed
normally about the mean rather than uniformly. Figure 3.7 illustrates a sample discrete
distribution matching this description. A point of interest would be to determine if the
centralization of the distribution decreases or increases the covariance modiﬁcation, and
to what extent. Intuitively, we would expect the level of covariance loss to decrease,
given that the parents are more tightly clustered about the center of the distribution.
Figure 3.7 Normal distribution along covariant line segment
It would be useful to extend this form of analysis even further, to be able to include arbitrary relationships between two parameters in the post-breeding-selection pool. This is possible given that we are able to compute the joint probability distribution of $d$ and $m$ between members of the post-breeding-selection pool, and either the probability distribution of $d$ or of $m$ (or both). If we have such information for continuous parent distributions, we can substitute the probability functions into Equation 3.5 and integrate across all possible $m$ and all possible $d$. For discrete distributions, we can directly measure the average contribution for each possible two-parent combination (including two copies of the same parent, if the selection mechanism permits that).
3.1.4.2 BLX-α Covariance Disturbance
In order to calculate the relative magnitude of covariance disturbance produced by BLX-α from a given parent distribution, we will make a few simplifying assumptions.
First, we assume that the mean of the current population is the origin, thus alleviating the
need to normalize the covariance contributions. Further, we assume that the mean of the
population remains at the origin after operator application. This is a reasonable
approximation since the distribution for standard BLX-α is symmetric along the axes, so for reasonably large populations, the expected mean movement is minor.
In this analysis, we will again address an example of two parents selected from a
uniform linear distribution in two dimensions similar to the situations depicted in Figure
3.2. For this analysis, we assign one parent the coordinates (a,b) and the second the
coordinates $(a+v, b+w)$, as illustrated in Figure 3.8. The child solutions will be selected uniformly from the grey shaded region. We can easily determine the covariance contribution of the parents as $ab + (a+v)(b+w)$, since we are assuming the population mean is zero in both dimensions.
Figure 3.8 Search distribution of a BLX-α operator
Determining the potential covariance contribution of a child solution toward the
covariance of the next generation is a more difﬁcult computation, since its position is not
necessarily ﬁxed. The contribution can be determined by ﬁnding the average expected
covariance contribution for a child. To accomplish this, we need to integrate the
expression xy over the given limits and average the result by dividing over the total area
being integrated (vw). The result of this calculation is given in Equation 3.10.
$\int_{x=a}^{a+v}\int_{y=b}^{b+w} \frac{1}{vw}\,xy\;dy\,dx = ab + \frac{aw}{2} + \frac{bv}{2} + \frac{vw}{4} = \left(a + \frac{v}{2}\right)\left(b + \frac{w}{2}\right)$

Equation 3.10 Expected covariance contribution of a child produced by BLX-α

$\left(a + \frac{v}{2}\right)\left(b + \frac{w}{2}\right) - \left(ab + (a+v)(b+w)\right)/2 = -\frac{vw}{4}$

Equation 3.11 Expected magnitude of covariance loss produced by BLX-α
To ﬁnd the expected difference between the covariance of the population when
the child replaces one of the parents, we subtract out the average of the parent covariance
contribution and arrive at the expression given in Equation 3.11.
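A quick Monte Carlo sketch of Equations 3.10 and 3.11 for BLX-0.0; the box coordinates are arbitrary illustrative values:

    import numpy as np

    rng = np.random.default_rng(0)
    a, b, v, w = 0.7, -0.3, 1.5, 0.9
    cx = rng.uniform(a, a + v, 500_000)      # children uniform over the box
    cy = rng.uniform(b, b + w, 500_000)

    child = np.mean(cx * cy)                 # Equation 3.10: (a + v/2)(b + w/2)
    parent_avg = (a * b + (a + v) * (b + w)) / 2
    print(child - parent_avg, -v * w / 4)    # Equation 3.11: -vw / 4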
In our earlier analysis of standard crossover, the distance between two parents along each dimension was labeled $dx$ and $dy$; therefore, if we substitute $v = dx$ and $w = dy$ into Equation 3.11 (which would be equivalent to BLX-0.0), we get $-\frac{dx\,dy}{4}$, which is exactly equal to Equation 3.5 divided by 2. We conclude that fully dissociative BLX-0.0
loses one half of the covariance that a fully dissociative standard crossover does in this
situation. Since the constant probability factor in the integral in Equation 3.10 and the
ranges between the limits are both relative to the width of the distribution of x and y, the
result is the same regardless of the selected widths, and therefore the result is independent of $\alpha$. Further, since the average covariance contribution of the parents remains constant, Equation 3.11 is correct for all values of $\alpha$. Therefore, the magnitude of covariance loss remains constant for all values of $\alpha$. The function of $\alpha$ is apparently that of a scaling value, in that the shape of the distribution (as characterized by the level of covariance) remains constant, but the total variance in the child population increases or decreases proportionally with $\alpha$.
It is important to note that this result is extremely limited as it only applies to the
covariance modiﬁcation during a single operator application to a single pair of parents in
two dimensions. These results are likely to change when situations with higher
dimensionality are considered. Further, in order to predict the overall level of covariance
loss for an entire single generation with multiple applications of a given operator, we
would need to be able to determine the expected distribution of distances between the
parents.
3.1.4.3 Signiﬁcance of covariance preservation
Many systems tend to ignore covariance altogether, both in terms of landscape
relative indications as well as general search mechanisms. What signiﬁcance, if any, is
there to the relative relationship between variables in a search landscape? What are the
limitations to the information we can gather and what is its relative value and cost?
There are two potential sources for covariant tendencies among parameters.
These are spurious alignment and selective pressure. Like other population statistics,
we may assume that both factors are normally present, and the degree to which either
dominates is related to numerous factors, including the population size, local landscape
conditions, etc. Therefore, some portion of the covariance of parameter values within the
population is likely directly due to selection, in the same manner that some portion of the
survivors of selection are likely to inhabit locally (even possibly globally) fruitful search
areas. This latter assumption is seldom questioned; however, many EC systems are quick
to discard other potential information resulting from selection.
The level of information available in the population covariance is likely more
strongly tied to the population size than other population statistics, such as allele
diversity, thereby requiring larger populations. The required calculations are relatively
expensive when compared with simpler recombinative and mutative operators, but are
reasonably low when compared with most real-world application fitness function costs.
A more theoretical cost factor is related to the NFL theorems. By making assumptions
about the validity of the covariance of a given population in terms of directing further
search, NFL implies that such a system should be stronger where such an assumption
holds true, and necessarily weaker where it does not. Thus, it is possible that use and
preservation of covariance information causes a loss of generality; however, the same
argument can be applied to all EC systems in comparison to more general systems such
as random sampling. The potential beneﬁts of using such information would include
more efficient and effective search, invariance to rotation, and diversity preservation in covariant landscapes.
3.2 Local Statistics and Homeomorphic Encoding Invariance
One of the central contentions put forth earlier is that evolutionary search
operators and thereby evolutionary search systems should operate consistently relative to
local landscape features regardless of other details. Speciﬁcally, any method which
encodes a given landscape and which does not disturb the relative scale and distances
between search points should produce equivalent results. Encodings such as shifting to
logarithmic encoding or from Cartesian to polar coordinates may be discounted since the
effects of these transformations modify relative local landscape features; effectively
creating a new, if related, search landscape.
At ﬁrst one might consider that an ideal search system would indeed allow for all
such transformations, or indeed would perhaps seek them out in order to facilitate search.
However, using NFL as an analysis tool, it becomes clear that such a system is not a
feasible reality. Since any possible search space may be mapped onto any given single
search space given an arbitrarily complex mapping function, any system capable of
remaining consistent across all such transformations would necessarily have to remain
consistent across the space of all possible search problems for a given encoding size.
However, the NFL theorems clearly state such a system is not obtainable.
An argument may be made that for any given transformation a new landscape has
been created — a landscape which remains independent and which cannot (or should not)
be effectively analyzed in terms of its relationship to the original. However, this
argument overextends the intentions of the NFL theorems and reaches conclusions not
fully in evidence. The NFL theorems do not require that each problem be treated fully in
isolation with no relationship to other problems regardless of similarities. In fact
[Wolpert 97] suggests problems should be treated in terms of similarities.
Therefore for the set of transformations from which we desire invariant behavior
we must select a nonempty (hopefully nontrivial) subset of the set of all possible
transformations. These transformations are selected primarily on their generality, that is,
the probability that such a transformation is known to occur between various alternate
forms of solution encoding. Candidates include any homeomorphic transformation, such
as a standard afﬁne transform. Any transformations exhibiting isometry, that is, a
continuous transformation that preserves distance, should be included. Common
isometric transformations include coordinate translation, linear rescaling, and rotation of
coordinate axes.
Note that we may consider these transformations to take place in the genotype
space, the phenotype space, or in a ﬁtnessrelative or other operatorspeciﬁc manner.
Given the number of operators included in this study, the potential differences in
genotypic representation, and the naturalness of expression in $R^n$ for the class of
functions we are studying, we will use the phenotypic distance (distances between the
represented points regardless of the form of the actual representation) when evaluating
landscape transformations.
In the remainder of this subsection we will examine several common types of transformations. Subsection 3.2.1 addresses translation of the origin, 3.2.2 examines all forms of linear rescaling, and 3.2.3 examines rotation of the axis set.
3.2.1 Translation of axes
One of the simplest possible transformations to a landscape is translation of the
origin. In terms of encoding, this can simply be accomplished by adding or subtracting a constant term in each instance of an encoded parameter $x_i$ within the fitness evaluation (e.g. substituting $(x_i - 1)$ for all $x_i$ terms within the fitness function). Thus, translation is often a byproduct of modification of some of the constants used within the fitness evaluation.
Translation appears at ﬁrst glance to be such a simple transformation that it is
tempting simply to dismiss it as trivial. However, it is important to note that the level of
precision of certain real-valued encodings, namely IEEE floating point, is significantly higher near zero than around other integers (several hundreds or thousands of orders of magnitude greater precision). Further, given that half of most typical floating point encodings cluster between zero and one, it is certainly possible for careless treatment of the floating
point constituents to achieve strong bias toward the origin. Such biases coupled with the
natural tendency of researchers to design encodings and test problems that place known
optima near or at the origin may produce an inﬂated level of performance, which
becomes irreproducible on truly unknown landscapes. Such systems tend to degrade
quickly under modest translation of these same landscapes.
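The precision asymmetry is easy to observe directly; a short sketch using NumPy's np.spacing, which reports the gap to the next representable IEEE double:

    import numpy as np

    # The gap between adjacent IEEE doubles grows with magnitude, so encodings
    # are far denser near zero than near a translated optimum.
    for v in [1e-6, 1.0, 1e6]:
        print(v, np.spacing(v))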
3.2.2 Linear rescaling
There are three progressively more limited subsets of the set of all potential forms
of linear rescaling which we will consider here. The most inclusive is asymmetrical
linear rescaling along arbitrary vectors. The vectors selected may or may not be
orthogonal (though any set of non-orthogonal rescalings may be reduced to one or more sets of equivalent rescalings across orthogonal axes). These rescalings may be
asymmetrical in that the magnitude of the rescaling may differ across the different
vectors. A more limited form is asymmetrical linear rescaling along the original axes of
encoding, and the most restrictive form is symmetrical linear rescaling along the original
axes of encoding, where the scale factor remains constant across all axes.
All three subsets have valid circumstances under which a researcher may hope to
achieve equivalent results. In the presence of covariant behavior among the parameters
of a search space, it is possible for arbitrary external influences to cause non-axially aligned linear rescalings. For example, suppose a given fitness landscape is defined by the error function $f(x) + g(x) + c_1(p_i - p_j)^2$, where $p_i$ and $p_j$ are two of the problem parameters, $c_1$ is a constant, $g(x)$ is a non-linear function dependent on both $i$ and $j$, and $f(x)$ is an unknown additional factor in the fitness function which is independent of $p_i$ and $p_j$. As $c_1$ increases in value, the landscape is effectively compressed along the vector $\langle 1, -1 \rangle$ and stretched along the $\langle 1, 1 \rangle$ vector in the $i,j$ plane. Note that if $g(x)$ were linearly dependent on $i$ and $j$, then the same compression could be achieved with a pair of linear rescalings along the $i$ and $j$ axes. The magnitude of the diagonal compression/expansion is directly dependent on the value of $c_1$. It is possible that the
value of $c_1$ is a fixed value, perhaps even of key importance to the specific problem instance. However, such scaling factors are often selected quite arbitrarily in practice. A search system which is not invariant over asymmetrical linear rescaling of arbitrary vectors places the onus of selecting an optimal value of $c_1$ squarely on the shoulders of the researcher, normally without any method for evaluation short of trial and error.
The circumstances that make it desirable to have invariant behavior across
asymmetric linear rescaling along the original encoding axes are more common. Often a
researcher may be uncertain of the actual relative effects of various parameter
interactions. Certainly the researcher is almost always blind to pressures caused by
localized landscape conditions which may be encountered during the search process.
Thus, for a number of search problems the researcher might desire the search process to
be insensitive to the relative scaling of the input parameters. Otherwise, simply deciding the relative ordering of the scaling factors which should be applied to the parameter set presents on the order of $d!$ choices, where $d$ is the dimensionality of the problem (i.e. the number of parameters).
Note that for most linear rescalings some points in the search landscape may
become unreachable due to the relative precision and/or capacity of the underlying
encoding. Likewise, some points which were previously unreachable may become
reachable after any linear rescaling. While this may appear to increase the effective size
of the search space, it is actually the density of the represented points which is increased.
Therefore, we would expect rather than prolonging the search this should actually allow a
higher level of reﬁnement over time. However, it is not our intent here to evaluate
alternative methods of encoding, or to explore the interplay between precision of
representation and search behavior.
Similarly, we may normally expect a level of speed up when the search space is
linearly expanded if the initialization space becomes effectively more condensed.
However, we will normally assume that the population initialization occurs across the same bounds regardless of any scaling which is employed (i.e. the scaling is applied to the initialization bounds as well).
3.2.3 Rotation of axes
Rotation of the axes of encoding can occur when parameters are used in linear combination within the fitness function. Arbitrary rotation may seldom be an accidental result of an attempt to encode a given problem space within an EC system, although certainly some problem domains probably tend toward expression in arbitrary linear combinations of parameters (such as the solution of systems of linear equations,
etc.) However, the appeal of maintaining invariance across rotation is that the relative
distances and local ﬁtness relationships remain largely intact, thereby producing an
effectively equivalent landscape in terms of local search characteristics. Given that there
is no effective change to the actual problem landscape, only its orientation, this
transformation approaches axial translation in its simplicity.
Nonetheless, arbitrary free rotation of a problem landscape can drastically affect the performance of numerous search methods [Salmon 1998] [Fogel 1990] [Patton 1999].
This is largely due to the fact that many EC systems and operators do not attempt to
preserve or estimate covariance information relative to a given local landscape. Rotation
can cause previously independent parameters to be expressed as covariant combinations
of axially aligned parameters in the rotated landscape.
An arbitrary coordinate rotation is easily achieved by inserting a series of $d(d-1)/2$ arbitrary rotations (rotating axes $i$ and $j$ clockwise while holding all other axes fixed, for each possible pairing $i$ and $j$ in a $d$-dimensional space) between the encoded parameters
and the ﬁtness evaluation. (I.e., the encoded parameters are rotated to the new coordinate
space, and then these coordinates are used in place of the original encoded parameters
within the ﬁtness function.) The encoded parameters now receive ﬁtness feedback based
on their partial contribution toward one or more (potentially all) of the rotated
parameters. This highly covariant situation requires multiple speciﬁc simultaneous
modiﬁcations to simulate a single parameter modiﬁcation in the natural (unrotated)
encoding space.
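A sketch of this construction; the composed planar (Givens) rotations and the axis-sensitive ellipsoid fitness are illustrative assumptions:

    import numpy as np

    def random_rotation(d, rng):
        # Compose d(d-1)/2 planar rotations, each rotating axes i and j
        # while holding all other axes fixed.
        R = np.eye(d)
        for i in range(d):
            for j in range(i + 1, d):
                theta = rng.uniform(0.0, 2.0 * np.pi)
                G = np.eye(d)
                G[i, i] = G[j, j] = np.cos(theta)
                G[i, j], G[j, i] = -np.sin(theta), np.sin(theta)
                R = G @ R
        return R

    rng = np.random.default_rng(0)
    R = random_rotation(5, rng)
    ellipsoid = lambda x: float(np.sum(np.arange(1, 6) * np.asarray(x) ** 2))
    # Encoded parameters are rotated before fitness evaluation.
    rotated_fitness = lambda x: ellipsoid(R @ np.asarray(x))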
3.2.4 Underconstrained (Free) Parameters
An underconstrained parameter is one which is ignored (relatively or absolutely)
during ﬁtness evaluation. Addition of unconstrained parameters is a common issue when
attempting to design a problem encoding for a speciﬁc problem type. Often the
researcher is uncertain what variables are expected to directly or indirectly affect a given
quality measure. Therefore, it is often tempting to include as many potential parameters
as possible. During actual operation, a number of these parameters may not be used at all
during ﬁtness evaluation. A parameter in the encoding that provides no contribution to
the ﬁtness generation throughout the search landscape is a free parameter, or
equivalently, an absolutely underconstrained parameter. Extending the dimensionality of
the problem landscape without modifying the underlying ﬁtness evaluation easily
transforms any problem landscape into an underconstrained landscape.
A second form of underconstraint is covariant underconstraint, which occurs
when two or more parameters always operate in linear combination within the ﬁtness
landscape. For example, consider a transformation from an arbitrary landscape to a
similar one with an additional degree of freedom, where the expression $(x_n - x_{n+1})$ is substituted for each occurrence of the parameter $x_n$ in the previous fitness function. Parameters $x_n$ and $x_{n+1}$ exhibit covariant underconstraint in that there is no fixed value required for either within the landscape, but rather there are an infinite number of equivalent solutions along each vector $x_n - x_{n+1} = c$. This situation closely resembles
Pareto optimal search landscapes, which are known to cause difficulty with many EC systems. We can modify the relative slope of these vectors by introduction of a constant multiplier. This substitution can even create non-linear paths of equivalent solutions by use of a non-linear combination of $x_n$ and $x_{n+1}$. Further, the equivalence can be extended to hypercubic regions, or arbitrary hypercubic shapes, by creation of $n$-way covariant underconstraints in which three or more parameters are combined. Such forms of underconstraint are seldom intentionally introduced; however, when dealing with unknown fitness landscapes, redundant parameterization is fairly common.
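A sketch of the covariant-underconstraint transformation described above; the sphere fitness is an illustrative stand-in:

    import numpy as np

    def add_covariant_underconstraint(fitness):
        # Extend an n-parameter fitness to n+1 parameters by substituting
        # (x_n - x_{n+1}) for the original last parameter: every point on
        # x_n - x_{n+1} = c now receives identical fitness.
        def wrapped(x):
            x = np.asarray(x, dtype=float)
            return fitness(np.append(x[:-2], x[-2] - x[-1]))
        return wrapped

    sphere = lambda x: float(np.sum(np.asarray(x) ** 2))
    f = add_covariant_underconstraint(sphere)
    print(f([1.0, 2.0, 5.0, 3.0]), f([1.0, 2.0, 9.0, 7.0]))  # equal: 9.0 9.0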
Relative underconstraint implies that a parameter may not actually be free, but the
relative contribution of the parameter is so small when compared to the absolute value of
the local ﬁtness landscape that it is effectively underconstrained. Complete relative
underconstraint is a product of rounding errors within the level of precision of the
representation being used. The parameter impact cannot be discerned, because it is
effectively discarded during calculation. General relative underconstraint is closely
related to signal-to-noise issues, in that random fluctuations from other parameters may mask feedback from the parameter. Unlike absolute and covariant underconstraint,
relative underconstraint tends to vary across the ﬁtness landscape.
Relative underconstraint is a common problem in the creation of EC ﬁtness
functions. Much of the relative success or failure of a given EC approach may be more
related to the form of the ﬁtness function and the levels of relative underconstraint than
on the form of the EC system being employed. A simple method to create relative
underconstraint is to scale a linear component of the ﬁtness function that is composed
from a subset of the encoding parameters by a relatively large (or very small) factor.
Likewise, relative underconstraint can be “tuned” by intentional introduction of such
constants. Punch [Punch 1991] and Raymer [Raymer 2000] explore such modifications
using evolved scaling values and masks for evolving various pattern discriminator
systems.
All forms of underconstraint provide no or little feedback about the quality of
values in the underconstrained dimensions. Two common outcomes observed in EC
systems working on underconstrained landscapes are loss of diversity and loss of focus.
Hitchhiking and genetic drift are both forms of diversity loss and are more likely to occur
where parameters are relatively underconstrained. On the other hand, systems with an
especially strong mutation component tend to pump nearly limitless energy unchecked
into underconstrained parameters. Both situations cause difﬁculty in relative
underconstraint situations in that by the time the population reaches areas of the search
space where the relative underconstraint has been reduced or eliminated, the population
may not have sufficient diversity or may be too widespread to find an optimum.
Some EC systems use global landscape information such as the population
variance, etc. taken from the current or previous population samplings in order to guide
further search directions. Such systems may ﬁnd highly underconstrained landscapes to
present problems in differentiating effective noise in underconstrained dimensions from
useful search information. Further, many systems, such as CHC, use convergence
measures to estimate completion or other operational modiﬁcations. If convergence is
measured in genetic or phenotypic terms, underconstraint can short-circuit such systems.
Some EC systems depend on the relative ordering of parameter locations within a
given encoding. Insertion of unconstrained parameters into these encodings may have
negative or positive effects, but in either case definitely modify the degree of linkage between parameters. Some studies have suggested intentional transformations of this sort precisely to bias such parameter linkage aspects [Forrest 1992]. These transformations are modeled after the concept of introns in DNA, although Daida [Daida 1999] points out that borrowing such terminology is imprecise at best.
3.2.5 Validity of Local Statistical Extrapolation
There are two potential attitudes toward population-level statistics: one is that they are completely arbitrary and meaningless (essentially stochastic noise) at worst, and potentially incorrectly biased at best; the other is that they contain useful information
which points toward proﬁtable locations for further exploration. To an extent, the
majority of evolutionary systems correspond to modifications of population-level statistics, in that the actions of evolutionary systems are typically tied to the location of
the individuals in the current population. However, individual operators may choose to
employ information from a single individual only or additional information from the
current population. Multi-sourced operators, those that use information from more than a single solution, are often labeled multi-parent operators. However, multi-parent usually implies that all sources have successfully passed a similar selection phase (e.g. are all part
of the post-breeding-selection pool). We will use the broader term cohort-driven operator (CDO) to refer to any operator which samples one or more individuals from the current pre-breeding-selection or post-breeding-selection population for the purposes of extracting additional guidance in the selection of future search directions. Note that cohort-driven operators may extract information from one or several additional individuals, up to and including the entire population, or even from past populations. Also, cohort-driven operators may employ alternate techniques for selection of the cohort for a given operator application, even possibly employing information from both pre-breeding-selection and post-breeding-selection populations, or tailoring selections to match currently selected individuals (e.g. selection of “neighboring” points for a given point). Examples of cohort-driven operators include all forms of crossover, BLX-α, and intermediate recombination.
Most mutation operators are single-sourced, and as such are representative of
those operators which choose to ignore further information in the population. There are
several potential reasons for ignoring additional cohort-level information. Reducing the
potential for bias is the typical motivation for most non-CDO mutation operators. Other
motivations for creating single-sourced operators (SSO) include the assumption that the
space may be locally noisy, and the desire for independence between operator distributions
and population distributions (i.e., the assumption that operator distributions may be as
well or better determined through alternative means). Examples of single-sourced
operators are binary mutation and self-adaptive mutation. Note that, for the most
part, since SSO are independent of population-level statistics, they frequently remain
neutral in population-level bias tests.
However, we cannot conclude that simply because an operator is
single-sourced, it will perforce be completely unbiased and independent of specific
problem information beyond that represented by the local landscape. The majority of
SSO mutation operators are strongly dependent on the presentation of the landscape in
terms of the selected axes of encoding. Equivalently, we may specify that such operators
are sensitive to the degree of covariance expressed in the encoding of the given
landscape.
Again, the corollaries of the NFL theorems provide a good basis for evaluation of
the potential benefits of CDO versus SSO. CDO typically exhibit certain biases in
regards to the way in which they select distributions of search points from existing points.
To the degree that the biases of an operator align with a given landscape, a system using
that operator is likely to perform better. Conversely, when the operator bias does not
match the given landscape, a system using less biased SSO is more likely to obtain better
performance.
3.3 Empirical Analysis
The following tests are proposed for measurement of mean modiﬁcation, variance
modiﬁcation, covariance modiﬁcation, and center tending by reproductive operators.
Each test is performed by applying an EC operator in a single pass to one of the standard
"test case" distributions specified in section 3.3.2, and comparing the measured
statistical difference between the sampled test case points and the offspring produced by
the operator. To completely characterize each operator, it would be necessary to obtain
several measurements and estimate the probability distribution for each characterizing
statistic. For our purposes here, we will simply compute the mean, variance, min, and
max for each statistic over a number of measurements, rather than attempt to graphically
display the histogram of each probability distribution. Note that since we are primarily
interested in functions with real domains, all of these measurements will be geometrically
interpreted (i.e., in phenotypic space). However, similar analysis is possible in terms of
genomic or even fitness-relative measures.
3.3.1 Statistical Tests
The following statistical measures are designed to allow characterization of the
distribution induced by various operators. These measures are natural extensions of the
statistical measures discussed in section 3.1.
3.3.1.1 Mean Modification
This statistic is quite simply the distance between the center of the geometric
locations of all solutions selected for breeding and the center of all solutions produced via
the given operator. In mathematical terms, given the solution set ps, representing the
selected parent solutions (including duplications), and c, representing the set of all
children produced, the total mean modification is given as:

\[ \Delta\mu = \sqrt{\sum_{j}\left(\frac{\sum_{i} P_{i,j}}{p} - \frac{\sum_{k} C_{k,j}}{\lambda}\right)^{2}} \]

where P_{i,j} and C_{k,j} represent the jth real component of the ith parent and kth child
vectors, respectively, p is the total number of parents sampled, and λ is the total number
of children produced. Alternately, the same formula may be expressed as:

\[ \sqrt{\sum_{j}\left(E(P_{x,j}) - E(C_{y,j})\right)^{2}} \]

where x and y are uniformly selected variables in the ranges [0, p) and [0, λ),
respectively.
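As a concrete illustration, the statistic reduces to a few lines of code. The following
sketch (ours, not part of the original experimental harness) computes the total mean
modification for parent and child sets stored as NumPy arrays of shape (count, dimensions):

    import numpy as np

    def mean_modification(parents, children):
        # Distance between the centroid of the selected parents and
        # the centroid of the produced children.
        return np.linalg.norm(parents.mean(axis=0) - children.mean(axis=0))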
3.3.1.2 Variance Modification
The proposed variance modiﬁcation statistic is also relatively simple. As with the
mean modiﬁcation, we are interested in the change of the total population variance
between the set of all selected parent solutions and the set of all produced solutions for a
given operator. However, there are two alternate candidates for this measurement. The
simplest is to compute the total variance for each set, as \sigma^2_p and \sigma^2_c respectively,
and then report \sigma^2_p - \sigma^2_c. This formula will be labeled the global variance modification
(GVM). Alternately, given the vectors of variances across each axis of encoding, we can
calculate the variance modification as:

\[ \mathrm{AVM} = \sum_{j}\sqrt{\left(v_{c,j} - v_{p,j}\right)^{2}} = \sum_{j}\left|v_{c,j} - v_{p,j}\right| \]

where v_{p,j} represents the variance of the parent pool along the jth axis and, likewise,
v_{c,j} is the variance of the children along the same axis. This measure will be denoted
the axial variance modification (AVM). Note that the AVM will always be at least as large in magnitude as
the GVM; however, the GVM will ignore variance modifications that occur due to
variance shift between dimensions.
Operators that maintain overall population variance while shifting that variance
among dimensions will be characterized by a large difference between the GVM and
AVM measures. Note that the AVM is a limited form of covariance modification
measure as well, since such operators must necessarily realign the covariances present in
the two sets.
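Under the reconstruction above, both variance measures are straightforward to compute;
the sketch below (ours) treats the total variance of a set as the sum of its per-axis
variances:

    import numpy as np

    def gvm(parents, children):
        # Global variance modification: difference of total variances.
        return parents.var(axis=0).sum() - children.var(axis=0).sum()

    def avm(parents, children):
        # Axial variance modification: summed magnitudes of per-axis
        # variance changes, so variance shifted between dimensions is
        # not hidden by cancellation.
        diff = children.var(axis=0) - parents.var(axis=0)
        return np.abs(diff).sum()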
3.3.1.3 Covariance Modification
As with variance modiﬁcation, there are two possible methods of measuring the
covariance modiﬁcation: globally, and per instance. The covariance modiﬁcation
measure should ideally be invariant to mean shifting and variance modiﬁcation if the
underlying covariance relationships (i.e. the general shape of the distribution as
characterized by the covariance measures) are maintained. Thus, we require a
normalized instance of the covariance matrices for the parent and child sets as input for
this metric.
In order to compute this statistic, we ﬁrst need to compute the covariance matrix
for both the source population pool and the pool of produced solutions. Each element of
the covariance matrix can be determined by the equation:

\[ c_{i,j} = \sum_{k}\left(x_{i,k} - \mu_i\right)\left(x_{j,k} - \mu_j\right) \]

where x_{i,k} represents the ith parameter of the kth sample in a pool, and \mu_i is the
mean of the ith parameter. Next, each covariance matrix is normalized by dividing each
entry c_{i,j}, where i ≠ j, by the product of the square roots of the diagonal entries
c_{i,i} and c_{j,j}. That is,

\[ c'_{i,j} = \frac{c_{i,j}}{\sqrt{c_{i,i}}\,\sqrt{c_{j,j}}}, \qquad i \neq j,\; c_{i,i} \neq 0,\; c_{j,j} \neq 0 \]

In the case where c_{i,i} = 0 or c_{j,j} = 0, the entry can effectively be ignored, since
all covariances in these rows and columns should be zero as well and therefore have no
contribution to this metric. This effectively normalizes the covariances relative to the
independent variable
variances. From this normalized matrix, we can calculate the level of covariance
disruption without regard to mean modiﬁcation or uniform rescaling.
As with the variance modiﬁcation measure, there are two possible methods of
measuring the total covariance disruption — globally and pairwise. However, since we
are explicitly interested in the form and alignment of the population shape when
evaluating covariance modiﬁcation, the global form of this statistic does not provide
much information (except to demonstrate general tendency toward overall reduction or
increase in covariance, which the pairwise statistic will also demonstrate). The resulting
measure of covariance modiﬁcation is taken to be the average of the squares of the
pairwise differences between the upper triangular entries of the normalized parent and
child sample covariance matrices. Or more concisely:
\[ \mathrm{CMod} = \frac{\displaystyle\sum_{i=1}^{d-1}\sum_{j=i+1}^{d}\left(c'^{(p)}_{i,j} - c'^{(c)}_{i,j}\right)^{2}}{d(d-1)/2} \]

where d is the dimensionality of the problem, c'^{(p)}_{i,j} is the jth entry on the ith row
of the normalized covariance matrix calculated from the parent samples, and c'^{(c)}_{i,j}
is the corresponding entry in the normalized covariance matrix from the child pool. Since
the number of components in the calculation grows quadratically, not linearly, with the
size of the problem space, this metric is normalized as a function of the dimensionality.
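A sketch of the full covariance modification computation, assuming the normalization and
averaging reconstructed above (the helper names are ours; note that np.cov's constant
1/(k-1) factor cancels in the normalization):

    import numpy as np

    def normalized_covariance(pool):
        c = np.cov(pool, rowvar=False)       # sample covariance matrix
        sd = np.sqrt(np.diag(c))
        sd[sd == 0] = 1.0                    # zero-variance axes contribute nothing
        return c / np.outer(sd, sd)

    def cmod(parents, children):
        d = parents.shape[1]
        cp = normalized_covariance(parents)
        cc = normalized_covariance(children)
        i, j = np.triu_indices(d, k=1)       # upper-triangular entries only
        return ((cp[i, j] - cc[i, j]) ** 2).sum() / (d * (d - 1) / 2)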
3.3.1.4 Center Tending
In order to measure the degree of center tending, the following algorithm is
proposed, which is similar in nature to the measure of the Pearson skewness coefficient,
with the exception that the third moment is not used. First, we calculate the
average and median distance from the center for both the set of selected parents and the
set of child solutions. Given the distributions of distances from the population center,
consider the averages μ_p and μ_c, medians m_p and m_c, and standard deviations σ_p and σ_c
of these distributions. The measure of center tending, CTM, is calculated as:

\[ \mathrm{CTM} = \frac{\mu_p - m_p}{\sigma_p} - \frac{\mu_c - m_c}{\sigma_c} \]

For the case where σ_p or σ_c becomes 0, the corresponding term may be considered to be
zero (since the associated difference between the mean and median values will also be
zero). Note that this formula normalizes the measure from each set individually, thereby
ignoring modifications to the variance that do not also modify the shape of the population
distribution. Center-tending operators will exhibit a negative value on this statistic, while
center-avoiding operators will exhibit positive values.
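A minimal sketch of the CTM computation, assuming (as we read the definition) that
distances for both pools are measured from the center of the selected parents:

    import numpy as np

    def ctm(parents, children):
        center = parents.mean(axis=0)
        def skew_term(pool):
            dist = np.linalg.norm(pool - center, axis=1)
            s = dist.std()
            # A zero deviation implies mean == median; the term vanishes.
            return 0.0 if s == 0 else (dist.mean() - np.median(dist)) / s
        return skew_term(parents) - skew_term(children)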
A center-neutral operator (e.g., spherical normal mutation) will still tend to exhibit
a positive value in multidimensional domains, simply due to volumetric differentials.
If we consider an n-dimensional hypersphere (e.g., a spherical uniform mutation) with its
center intersecting the surface of a second hypersphere (the hypersphere of all points
distance d and less from the center of the set of selected parents), the volume of intersection
with the second hypersphere will always be less than the volume which does not intersect.
This is true for all multidimensional situations, and the effect increases as the
dimensionality increases (even though the effective volume approaches zero as the
dimensionality increases). The ratio of the volume of intersection to the volume of the
initial hypersphere (e. g. the mutation distribution) approaches 0.5 as the radius of the
initial hypersphere approaches zero. As the radius shrinks, the interface of the
intersection appears less curved and begins to approximate a linear boundary. It is
possible to simulate volumetric normalization on a distribution of distance measures
given the dimensionality of the domain; however, this adds a heuristic component to an
otherwise deterministic measure without known beneﬁt.
The unnormalized instance of this statistic correctly reﬂects the probability of an
operator moving toward, or away from the center in terms of distribution modiﬁcation.
Additionally, the value of this statistic for known center neutral operators can be used as
a baseline for comparison for ndimensional domains. A volumetrically normalized
version of this measure might have an advantage in that it would be more sensitive
toward shifts toward center than shifts outward when compared to the unnormalized
instance.
3.3.2 Statistical Test Cases
There is an unlimited number of possible distributions that could be used as the
source parent distributions for these statistical tests. In the vein of the NFL theorems, we
intend to limit our selections to analogs to common localized situations. Hopefully, the
measurement of the effects of a given operator on these test distributions may give some
insight as to the expected outcome of using that operator on a search landscape which
may present similar distributions. Note that each distribution is homeomorphic to a
symmetrical distribution (that is, no nonlinear warping is represented).
3.3.2.1 Unidimensional Uniform Aligned
The simplest test distribution, also arguably the most artiﬁcial, is a uniform
sampling along a one dimensional unit vector which is aligned with one of the axes of
encoding and remains perpendicular to the remaining encoding axes. An example of
such a distribution would be all samples of the form {c_1, c_2, c_3, u_4, c_5}, where c_1, c_2, c_3,
and c_5 are all arbitrarily selected constants, u_4 ∈ U(c_4, c_4 + 1), and U(a, b) represents a
uniform sample taken from the range (a, b). Note that although the distribution itself is
unidimensional, it is cast within an n-dimensional encoding space. This allows for
observation of covariance modification. Therefore, this distribution is characterized by
n + 2 parameters: the number of dimensions, n, the set of constants, c_i, and which
dimension is selected for uniform sampling. For consistency, we select
c_i = 2(i mod 5) + 5, and we assume the first dimension is the one sampled, allowing us to
characterize the distribution with the single parameter, n. Note that the variable values
for this test case are completely independent.
3.3.2.2 Unidimensional Uniform Rotated
The second distribution we choose to test is identical to the ﬁrst, with the
exception that the unit vector is rotated in ndimensional space. This causes the sample
to become highly covariant, with strongly dependent variables. An arbitrary rotation is
applied by n(n−1)/2 rotations between each pair of axes (requiring n(n−1)/2 angles
selected uniformly from (0, 2π]). These rotations may be collected together by
multiplication of the given rotation matrices, resulting in a normalized rotation matrix (i.e.
one that rotates without rescaling). The previous unit distribution can be sampled, and
each sample can be rotated by multiplication with this rotation matrix. Note that this new
distribution is located in a different locality than the matching uniform aligned
distribution, since the distribution is rotated about the origin, not the center of the aligned
distribution.
While the use of randomly sampled rotation is ideal, it requires the average of
multiple sampling measurements to obtain a reasonable estimate for the given statistical
analyses. Further, this statistic becomes much less stable and reproducible without large
levels of sampling. These difﬁculties increase exponentially with the dimension of the
encoding space, it. For these reasons, we will ﬁx the rotations for this test case to that of
sequential 45° (i.e. 1r/4 radians) rotations between successive dimensions. That is, for
each integer ie(1, nl), rotate dimension i 45° in the direction of dimension i+1. This
rotation results in uniform maximal covariance among the variables of the resulting
rotated distribution.
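The fixed rotation can be built by composing n−1 planar (Givens) rotations; a sketch
(ours) using NumPy:

    import numpy as np

    def fixed_rotation(n):
        # Compose successive 45-degree rotations of axis i toward axis i+1.
        R = np.eye(n)
        c = np.sqrt(0.5)                      # cos(pi/4) == sin(pi/4)
        for i in range(n - 1):
            G = np.eye(n)
            G[i, i], G[i, i + 1] = c, -c
            G[i + 1, i], G[i + 1, i + 1] = c, c
            R = G @ R
        return R

    # Rotate row-vector samples: rotated = samples @ fixed_rotation(n).T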
3.3.2.3 Unidimensional Normal Aligned
Flat uniform distributions are expected to be somewhat rare in evolutionary
computation, especially since positive selection pressure tends to cause more normally
distributed points in localized hill climbing situations. Therefore, a slightly less artiﬁcial
distribution might be a normally distributed unidimensional sample. As with the
unidimensional uniform aligned distribution, each example of this distribution will be
samples of the form {c_1, c_2, c_3, u_4, c_5}, where c_1, c_2, c_3, and c_5 are all arbitrarily selected
constants, u_4 ∈ N(c_4, d), and N(a, b) represents a normal sample with mean a and variance
b. Again, we will reduce the number of free parameters by arbitrarily selecting d = 1,
c_i = 2(i mod 5) + 5, and the first dimension as the one sampled. Since the normal distribution is more
centralized (and therefore is a closer approximation to a single-point distribution), we
expect most statistical effects to be less pronounced with this distribution than with
the unidimensional uniform aligned distribution. Note that since the normal distribution
is an infinite distribution, there is no range limit on this distribution; however, for all
practical purposes, an arbitrary range limit of 6d from the center of the distribution
should not cause significant loss.
3.3.2.4 Unidimensional Normal Rotated
Given that we have a rotated version of the unidimensional uniform distribution,
it seems natural to investigate a rotated version of the normal revision of this distribution
as well. While random rotation would be most general, we will limit our usage here to
the same successive 45° rotations used with the rotated uniform unidimensional
distribution.
3.3.2.5 N-dimensional hypersphere surface
An ideal distribution for determining center tending is one that consists of all
points along the surface of an ndimensional hypersphere. This distribution may also be
viewed as the collection of all points exactly distance d from the center point of the
hypersphere. This distribution is characterized by n + 2 parameters: n, d, and the center
point coordinates ⟨c_1, …, c_n⟩. Again, in the interest of reducing irrelevant
parameterization, we arbitrarily select d = 1 and c_i = 2(i mod 5) + 5. This is certainly an
artiﬁcial distribution, in that its production as a population distribution within an EC
framework would be quite unusual; however, it provides good insight into the hill
climbing/hill descending tendencies of operators.
As this distribution is symmetrical, we would not expect the addition of arbitrary
rotation to enhance our understanding of the operator statistics. Therefore, there is no
associated rotated version of this distribution.
In order to create unbiased sampling, each point is selected by arbitrary rotation
(using n(n−1)/2 two-dimensional uniform random rotations) of a point at distance d from
the center about the center point ⟨c_1, …, c_n⟩. This produces the least biased sampling
along the surface. Other standard techniques, such as selection within the uniform
hypercube denoted by the corner points ⟨c_1 − d, …, c_n − d⟩ and ⟨c_1 + d, …, c_n + d⟩ and
rescaling the resulting vector from the center of the hypersphere to unit length, cause bias
toward the corners of the hypercube. Even if we modify this algorithm to discard any
initial points outside the sphere (i.e., only allow positive rescaling of the hypercube
samples), the difference in the level of representational density causes a slight bias which
is not apparent in the rotation method.
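For reference, a uniform surface sample can also be produced by normalizing an
independent normal vector; this Gaussian method is distributionally equivalent to the
rotation scheme described above, though it is not the procedure used here:

    import numpy as np

    def sphere_surface_sample(center, d=1.0):
        # The direction of an isotropic normal vector is uniform on the sphere.
        v = np.random.standard_normal(len(center))
        return np.asarray(center) + d * v / np.linalg.norm(v)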
3.3.2.6 N-dimensional uniform hypersphere
Given the distribution consisting of all points on the surface of a given
hypersphere, a natural extension is to include the distribution consisting of all points
within the hypersphere. This distribution selects points uniformly at random from the
interior of a unit hypersphere centered on the coordinates ⟨c_1, …, c_n⟩. This test
distribution allows us to examine hill-climbing tendencies in a less artificial setting. The
lack of bias toward the center should allow operators with strong center-tending and
center-avoiding tendencies to be readily apparent.
The least biased method for sampling this distribution is to produce arbitrary
points within the hypercube denoted by the corner points ⟨c_1 − d, …, c_n − d⟩ and
⟨c_1 + d, …, c_n + d⟩, discarding points which are greater than d from the center of
the hypersphere. However, this method becomes exponentially slow as n increases, due
to the increase in the proportion of points found outside of the hypersphere. A more
computationally tractable method is to use random rotations and set the length of the
vector equal to l = p^{1/n}, where p is a uniform sample from the range [0, 1). To reduce
parameterization of this distribution, we assume that d = 1.
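A sketch of the tractable sampler, combining a uniform direction with the radius
correction l = p^{1/n} (our reading of the reconstructed formula):

    import numpy as np

    def ball_sample(center, d=1.0):
        n = len(center)
        v = np.random.standard_normal(n)
        v /= np.linalg.norm(v)                     # uniform direction
        r = d * np.random.uniform() ** (1.0 / n)   # uniform density in volume
        return np.asarray(center) + r * v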
3.3.2.7 N-dimensional normal distribution
This distribution most naturally depicts a standard hillclimbing situation, in that
the points are strongly clustered about a center point with the density tapering off as a
function of the distance from this central point. This distribution is simple to produce,
requiring n samples from the normal unit distribution, N(0, 1). Each sample may then be
described as {c_1 + n_1, c_2 + n_2, …, c_n + n_n}, where n_i is an independent sample from the
N(0, 1) distribution. This is equivalent to the set of samples of the form
{s_1, s_2, …, s_n}, where s_i ∈ N(c_i, 1). This distribution is characterized by n + 1 parameters:
n, and the center point coordinates c_i for all integers i ∈ {1, …, n}. In order to reduce
extraneous parameterization, we will again choose c_i = 2(i mod 5) + 5, thereby allowing
characterization of this distribution by the number of dimensions, n, alone. This
distribution is symmetric and therefore should be invariant under rotation.
3.3.2.8 N-dimensional ring distribution
As an EC system begins to converge on a symmetric optima, a possible expected
population distribution is to have a few solutions near the optima, with increasing density
toward a given distance d, followed by decreasing density beyond that point. The
This distribution often arises as follows. First, a point is located near
distance d from the local optima randomly during the search process. Suppose that this
point is the first sample from within the attraction basin of the given local optima.
Assuming this is a favorable point, the EC system will begin to search in the local
neighborhood of the landscape. Large jumps have a much higher probability of failure
than smaller ones, so soon we have several points in a normal distribution about the
initial point. If we assume that the fitness isobars surrounding the local optima are
convex (e.g., circles centered on the optima), then selection is likely to begin biasing this
distribution along these isobars. Thus, over time, we may develop a semi-ring-shaped
density where the distance from the center is roughly normal in shape.
To produce this distribution, we begin with the n—dimensional hypersphere
surface and add a symmetric ndimensional normal sample. In order to achieve the
desired distribution, the variance of the normal sample must be small relative to the
radius of the hypersphere. To achieve this, we set the standard deviation of the normal
sample to r/3.5, where r is the radius of the hypersphere, placing the effective tails of the
normal sample at the radius. As before, we assume r = 1 to reduce extraneous
parameterization.
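A sketch of the ring sampler under the assumptions above (sigma = r/3.5 is our reading
of the tail-placement rule):

    import numpy as np

    def ring_sample(center, r=1.0):
        n = len(center)
        v = np.random.standard_normal(n)
        surface = np.asarray(center) + r * v / np.linalg.norm(v)
        # Symmetric normal jitter about the surface point; assumed sigma = r/3.5.
        return surface + np.random.normal(0.0, r / 3.5, size=n)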
3.3.2.9 N-dimensional normal ellipsoid rotated
All of the multidimensional distributions outlined to this point are symmetric. In order
to produce an effective non-symmetric distribution, we can consider a scaled version of
the n-dimensional normal distribution. This can be accomplished by scaling each
dimension arbitrarily by s_i (or, equivalently, setting the variance to s_i²), where s_i is a
uniform sample from the range [1/n, n). In order to reduce the stochastic nature of this
test and to eliminate additional parameterization, we will fix s_i = i for each dimension
i ∈ {1, …, n}.
In order to study the effects of this asymmetry fully we choose to study this
ellipsoid under rotation. While we can deﬁne the general test case in terms of arbitrary
rotation, again to reduce stochastic effects and excess parameterization we arbitrarily
select the same rotation used with the rotated unidimensional uniform distribution.
3.3.2.10 N-dimensional skewed rotated
While the previous distribution shows some level of asymmetry in terms of the
distributional width along various crosssections, that distribution is still symmetric
across the axes of the distribution. In order to create asymmetry across individual axes,
we choose to scale half of the ellipsoidal distribution in the previous distribution by a
constant term b. This distribution can be computed by scaling all normal samples before
rotation. For example, if we select b = 0.5, we can first create an n-dimensional sample
from the non-rotated ellipse as ⟨c_1, c_2, …, c_n⟩, where c_i is a sample from the normal
distribution with variance i. Next, we rescale all negative components by multiplying
by b. And finally, we apply the required rotation. This creates a distribution that is
foreshortened in each rotated dimension.
3.3.3 Example Statistical Analysis
To demonstrate these empirical tests and the forms of differentiation possible
through them, we will evaluate a number of methods for producing new distributions.
First, we focus on simple mutative operators that add a random sample from a random
distribution with a fixed variance. The distributions evaluated are uniform, normal,
Cauchy, and log-uniform. Since numerous algorithms choose to simulate recombination
by sampling points distributed symmetrically about the mean of two or more parents, we
will simulate these fixed-variance mutative operators centered both on a single parent and
on the center of two parents. Second, we will evaluate three linear forms of
recombination: averaging, linear, and extended linear. Two forms of dominant crossover,
the two-parent (a.k.a. uniform crossover) and n-parent versions, are also evaluated. PC
crossover is similar in form to uniform crossover; however, the axes of application are
determined by a sample of the population. The BLX-0.5, SPX, and PC Gaussian
operators use position information from two or more parents to determine the search
distribution. Also, we evaluate a modification to the BLX-0.5 algorithm that is parent
centered, rather than mean centered. Details as to the implementation of each of these
operators in these tests are provided in the following sections.
3.3.3.1 Fixed Uniform
For this operator, an independent uniform random sample is added to each
component of each solution. The range of the uniform sample is [−√12/2, √12/2], which
produces a zero-mean sample with variance of 1. Thus, the expected total variance
addition is dependent on the number of dimensions. For the mean-centered version of the
operator, the center of mass of two solutions is calculated, and two child solutions are
produced by adding two independent sample sets to this center of mass.
3.3.3.2 Fixed Normal
For this operator, an independent normal random sample is added to each
component of each solution. The normal distribution sampled is that with a zero mean
and a variance of 1. Thus, the expected total variance added is dependent on the number
of dimensions. As with the ﬁxed uniform operator, the mean centered version of this
operator produces two children by adding two independent normal sample sets to the
components of the average of two parent solutions.
3.3.3.3 Fixed Cauchy
This operator functions in the same manner as the ﬁxed uniform and ﬁxed normal
operators above, except that the samples are drawn from a Cauchy distribution with mean
0 and “width” of 1. Note that the variance of the Cauchy distribution is inﬁnite.
3.3.3.4 Fixed LogUniform
This operator adds samples from the log-uniform distribution across all
dimensions. The log-uniform distribution is defined as a distribution that is uniform in
the distribution of the log_b of the samples for some integer radix b (b > 1). (Since all
logarithms are related by a constant multiplicative factor, the selection of b matters only
relative to the width of the uniform distribution.)

For this operator, we choose the sample using the following algorithm:

1. Select two random uniform samples, s_1 and s_2, on the range (0, 18).

2. Calculate l_u = 10^(−s_1).

3. If s_2 < 0.5, multiply l_u by −1.

Note that the maximum magnitude in this distribution is 1. In order to modify the
relative variance to approximate 1 across each dimension, we also multiply each sample
by the constant value 9.12.
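A direct transcription of the algorithm as reconstructed (the exponent range and the
9.12 rescaling constant are taken from the text above):

    import numpy as np

    def log_uniform_sample():
        s1 = np.random.uniform(0.0, 18.0)
        s2 = np.random.uniform()
        lu = 10.0 ** (-s1)        # uniform in log10 of the magnitude, radix 10
        if s2 < 0.5:
            lu = -lu              # symmetrize about zero
        return 9.12 * lu          # rescale the variance toward 1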
3.3.3.5 Averaging
This operator simply replaces the parent solutions with a solution that is the center
of mass, or mean, of two parent solutions. In order to facilitate computation, we select
three parent solutions, p_1, p_2, and p_3, and create three child solutions as the means of
p_1 and p_2, p_2 and p_3, and p_1 and p_3.
3.3.3.6 Linear
This operator produces two child solutions from two parents by selecting two
points uniformly along the line between the two parents in nspace. Note that all of the
child solutions are geometrically “between” the parent solutions. Two child solutions are
produced from each pair of parent solutions.
3.3.3.7 Extended Linear
This operator is the same as the linear operator, but the sampling range is
extended along the line between two parents to extend beyond the two parents. The size
of this extension is equal to half of the distance between the two parents, thus, the child
solutions should be equally distributed between the two parents and outside of this area.
Two child solutions are produced from each pair of parent solutions.
3.3.3.8 Field-Based Uniform Crossover
This operator treats each dimensional parameter as an indivisible unit, or allele,
and performs crossover between two parents by arbitrarily assigning an allele from one of
the two parents. The two child solutions produced are complementary, in that for each
allele assigned to a child solution from parent 1, the same allele is assigned from parent 2
to the opposing child solution.
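A sketch of the complementary two-child construction (ours):

    import numpy as np

    def field_uniform_crossover(p1, p2):
        # Each allele goes to child 1 from a randomly chosen parent;
        # child 2 receives the complementary allele.
        mask = np.random.rand(len(p1)) < 0.5
        return np.where(mask, p1, p2), np.where(mask, p2, p1)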
3.3.3.9 Global Dominant Recombination
This operator also treats each dimensional parameter as an indivisible unit, or
allele, similar to field-based uniform crossover. However, rather than restricting
recombination to two parents, we allow arbitrary selection of a given allele from any
individual solution in a randomly selected pool of parents. For this evaluation, we fix
the pool size at 50; 50 children are produced from each pool of 50 parents.
3.3.3.10 PC Crossover
This operator uses the eigenspace of the covariance matrix as a basis for
application of uniform crossover. This eigenspace chooses a basis set that aligns one
dimension with the vector of maximal variance in the population; the second axis is
aligned with the maximal variance in an orthogonal direction to the ﬁrst, and so on. This
eigenspace is also the basis of the principal component analysis.
First, the covariance matrix is computed from a uniformly selected pool of
solutions. For this evaluation, the pool size is ﬁxed at 2d, where d is the dimensionality
of the problem space. Next, the eigenspace of this covariance matrix is calculated. The
two parent solutions are remapped into this eigenspace basis, and uniform crossover is
performed on the components of the two solutions as represented in the eigenspace. The
two produced child solutions are then mapped back into the standard encoding axes.
Each pair of parent samples produces two child solutions, with an arbitrarily selected pool
for each operator application.
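A sketch of PC crossover under the description above (implementation details such as
the eigensolver are our choices, not specified by the text):

    import numpy as np

    def pc_crossover(p1, p2, pool):
        cov = np.cov(pool, rowvar=False)
        _, basis = np.linalg.eigh(cov)        # columns: orthonormal eigenvectors
        q1, q2 = p1 @ basis, p2 @ basis       # coordinates in the eigenspace
        mask = np.random.rand(len(p1)) < 0.5  # uniform crossover per eigen-axis
        c1 = np.where(mask, q1, q2)
        c2 = np.where(mask, q2, q1)
        return c1 @ basis.T, c2 @ basis.T     # map back to the encoding axes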
3.3.3.11 BLX-0.5
This operator is a direct application of the standard extended blend crossover
operator, BLX-0.5. The components of a child solution are determined as
c_k = p_{1,k} + s_k (p_{2,k} − p_{1,k}), with s_k ∈ [−0.5, 1.5), where s_k is a uniform sample. This
operator is identical to the extended linear operator, except that the uniform sampling is
independent across each dimension. Each pair of child solutions samples the region
delineated by two parent solutions.
3.3.3.12 BLX-0.5 Parent Centered
This operator is the same as the BLX-0.5 operator, except the range for each
parameter is shifted so that the sampling is centered on the first parent solution. That is,
given the formula for BLX-0.5, c_k = p_{1,k} + s_k (p_{2,k} − p_{1,k}), the parent-centered
version of this operator can be calculated by simply shifting the range of the
uniform samples to s_k ∈ [−1, 1). Each pair of parent samples produces two child solutions.
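Both BLX variants differ only in the sampling range; a combined sketch (ours):

    import numpy as np

    def blx(p1, p2, low=-0.5, high=1.5):
        # Standard BLX-0.5 uses s in [-0.5, 1.5); the parent-centered
        # variant shifts the range to [-1, 1).
        s = np.random.uniform(low, high, size=len(p1))
        return p1 + s * (p2 - p1)

    child = blx(np.array([0.0, 0.0]), np.array([1.0, 1.0]))            # BLX-0.5
    child_pc = blx(np.array([0.0, 0.0]), np.array([1.0, 1.0]), -1, 1)  # parent centered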
3.3.3.13 SPX
This operator directly implements the simplex crossover, SPX. Each operator
application uses n+1 parent samples to produce a single child solution. Each set of n+1
parent samples is used to produce n+1 child solutions.
3.3.3.14 PC Gaussian
This operator is similar in design to the PC crossover operator, except that after
rotation into the eigenspace a Gaussian mutation, rather than uniform crossover, is
performed. The variances of this mutative distribution are determined by the eigenvalues
associated with each eigenvector. This operator produces one child solution from one
parent, using a pool of associated solutions to provide the covariance sample. The pool
size for this evaluation is ﬁxed as 2d, where d is the dimensionality of the problem space.
3.3.4 Empirical Statistical Results
The following tables present the results from 100 sets of 10,000 operator
evaluations. Each operator is applied to each of the 10 test distributions. The statistical
measures outlined previously are measured for each set of 10,000 evaluations: mean
modification (Δμ), GVM, AVM, covariance modification (CM), and the center focus
measure (CFM). The averages of all 100 sets are presented in these tables.
3.3.4.1 Results by Distribution
The following tables present the results of the statistical tests organized by the test
distributions. This allows for relative behavioral comparison of operators in the same
environment.
Operator Δμ GVM AVM CM CFM
Fixed Uniform, P 0.095302 9.9929 9.99692 0.001004 0.0278
Fixed Normal, P 0.095259 9.97957 9.98858 0.001021 0.0456
Fixed Cauchy, P 22.3083 2055480 6358350 12.9643 0.13016
Fixed LogUniform, P 0.093714 9.93702 10.1305 0.000987 0.5402
Fixed Uniform, M 0.096881 9.92675 9.93145 0.00099 0.028305
Fixed Normal, M 0.095265 9.95056 9.96111 0.000985 0.0459
Fixed Cauchy, M 27.4192 2156770 6569580 72.3596 0.12082
Fixed LogUniform, M 0.094141 9.98873 10.1718 0.000989 0.53653
Averaging 0.001926 0.03188 0.140812 0.000288 0.16362
Linear 0.00314 0.02768 0.087546 0 0.09826
Extended Linear 0.006505 0.014019 0.044333 0 0.13481
BLXalpha 0.005899 0.014005 0.044289 0 0.12823
BLXalpha, P 0.009177 0.055007 0.173946 0 0.19751
SPX 0.067286 0.608672 1.92479 0 0.03284
PC Gaussian 0.007076 0.083353 0.263585 8.46E-13 0.19183
PC Crossover 1.85E-13 1.5E-10 2.68E-10 3.6E-25 2.24E-15
Field-based Crossover 1.64E-15 1.64E-15 5.18E-15 0 1.7E-15
Global Dominant 0.007308 0.000343 0.004803 0 0.00108
Table 3.1 Results on Aligned Uniform Unimodal Distribution
Operator Δμ GVM AVM CM CFM
Fixed Uniform, P 0.095913 9.99228 9.99621 0.007789 0.029806
Fixed Normal, P 0.092222 10.0121 10.0207 0.007892 0.04467
Fixed Cauchy, P 80.2332 1.53E+08 4.84E+08 134.089 0.11168
Fixed LogUniform, P 0.096776 9.92752 10.1025 0.007963 0.53995
Fixed Uniform, M 0.098863 9.94697 9.95071 0.00862 0.034669
Fixed Normal, M 0.0954 9.92685 9.93693 0.008592 0.04115
Fixed Cauchy, M 41.8603 11546600 35893200 9821.28 0.12106
Fixed LogUniform, M 0.096395 9.95437 10.129 0.008667 0.53417
Averaging 0.001935 0.03215 0.055517 0.000914 0.16272
Linear 0.003169 0.02778 0.027776 0.000283 0.10262
Extended Linear 0.005266 0.014089 0.014089 5.75E-05 0.13019
BLXalpha 0.007217 0.013643 0.013951 0.002418 0.026734
BLXalpha, P 0.011313 0.055432 0.055697 0.000435 0.10184
SPX 0.07345 0.620532 0.620532 0.030364 0.04096
PC Gaussian 0.007573 0.082807 0.082807 0.001421 0.18419
PC Crossover 1.35E-16 3.2E-17 1.27E-16 7.21E-33 2E-16
Field-based Crossover 4.54E-17 1.5E-18 5.74E-17 0.002098 0.062642
Global Dominant 0.00896 4E-05 0.002258 0.008007 0.034699
Table 3.2 Results on Rotated Uniform Unimodal Distribution
Operator Δμ GVM AVM CM CFM
Fixed Uniform,P 0.100764 9.99772 10.0039 0.001118 0.169404
Fixed Normal,P 0.096511 9.98041 9.99064 0.001071 0.149286
Fixed Cauchy, P 34.4028 5726270 17883100 787.14 0.085567
Fixed LogUniforrn,P 0.101021 10.1174 10.317 0.001123 0.19573
Fixed Uniform,M 0.095777 9.49617 9.61634 0.001072 0.209971
Fixed NorrnaI,M 0.099884 9.48912 9.62041 0.001039 0.158183
Fixed Cauchy, M 35.2497 12002600 37712500 335.802 0.079093
Fixed LogUniform,M 0.096681 9.5047 9.80657 0.001057 0.24116
Averaging 0.002021 0.49377 1.58471 0.00023 0.0026
Linear 0.010425 0.33299 1.05299 0 0.00928
Extended Linear 0.020283 0.16698 0.528036 0 0.03563
BLXalpha 0.020453 0.181205 0.573021 0 0.03704
BLXalpha, P 0.028741 0.669798 2.11809 0 0.06833
SPX 0.236523 7.4097 23.4315 0 0.005203
PC Gaussian 0.022965 0.99181 3.13638 5.4E-13 0.01048
PC Crossover 1.84E-13 1.5E-10 2.67E-10 1.46E-14 3E-17
Field-based Crossover 1.32E-17 2.22E-18 1.19E-16 0 4.44E-18
Global Dominant 0.024284 0.00891 0.108572 0 0.00166
Table 3.3 Results on Aligned Normal Unimodal Distribution
Operator Δμ GVM AVM CM CFM
Fixed Uniform, P 0.099686 9.99125 9.99654 0.049393 0.184469
Fixed Normal, P 0.097443 9.96539 9.97613 0.049763 0.153426
Fixed Cauchy, P 17.621 964917 2886880 30.7709 0.072889
Fixed LogUniform, P 0.099198 10.0557 10.2486 0.048962 0.19401
Fixed Uniform,M 0.097616 9.49966 9.50411 0.07228 0.212425
Fixed Normal, M 0.101171 9.48936 9.49952 0.072987 0.159413
Fixed Cauchy, M 32.5046 9013020 28243400 307.191 0.083529
Fixed LogUniform,M 0.097137 9.42228 9.61821 0.071652 0.24128
Averaging 0.002124 0.48912 0.493643 0.008906 0.00479
Linear 0.010654 0.33276 0.332757 0.003391 0.00674
Extended Linear 0.021178 0.158937 0.158937 0.000637 0.03515
BLXalpha 0.02544 0.16653 0.173064 0.029077 0.061905
BLXalpha, P 0.041024 0.676501 0.680602 0.005111 0.00096
SPX 0.256775 7.36582 7.36582 0.359701 0.001964
PC Gaussian 0.026285 1.00032 1.00032 0.017237 0.01029
PC Crossover 2.58E-17 3.9E-16 4.09E-16 7.32E-33 2.2E-17
Field-based Crossover 1.23E-17 2.11E-17 5.15E-17 0.025234 0.072309
Global Dominant 0.030563 0.000742 0.043299 0.096032 0.151748
Table 3.4 Results on Rotated Normal Unimodal Distribution
Operator Δμ GVM AVM CM CFM
Fixed Uniform,P 0.098612 9.98519 9.99042 0.001126 0.051685
Fixed Normal, P 0.098227 9.96494 9.97535 0.001178 0.00106
Fixed Cauchy, P 30.2911 6301540 19694200 243.852 0.08353
Fixed LogUniform, P 0.095049 9.85522 10.0283 0.001108 0.52844
Fixed Uniform, M 0.100658 9.49271 9.52734 0.001098 0.05559
Fixed Normal,M 0.09743 9.51404 9.55566 0.0011 0.00564
Fixed Cauchy, M 38.5881 9015040 28206400 58.8462 0.0781
Fixed LogUniform, M 0.100468 9.6409 9.85611 0.001113 0.4865
Averaging 0.001882 0.49304 0.912014 4.57E-05 0.19271
Linear 0.012134 0.33254 0.60782 4.34E-05 0.322579
Extended Linear 0.023395 0.167295 0.309606 7.53E-05 0.06361
BLX-alpha 0.023746 0.166974 0.309999 0.000125 0.00705
BLX-alpha, P 0.038326 0.667384 1.22863 0.000156 0.15128
SPX 0.283269 7.33396 13.4766 0.006962 0.023562
PC Gaussian 0.0283 0.9978 1.82206 0.000165 0.06194
PC Crossover 5.57E-17 4.7E-16 0.015094 4.54E-05 0.091521
Field-based Crossover 1.31E-17 1.11E-18 9.7E-17 9.11E-05 0.132339
Global Dominant 0.028838 3.5E-05 0.046306 0.000175 0.161629
Table 3.5 Results on ndimensional Hypersphere Surface Distribution
Operator Δμ GVM AVM CM CFM
Fixed Uniform, P 0.100395 9.99787 10.0027 0.001087 0.25882
Fixed Normal,P 0.096509 10.0026 10.013 0.001101 0.31474
Fixed Cauchy, P 35.9693 5706930 17715500 8089.22 0.39358
Fixed LogUniform,P 0.095129 10.0709 10.2489 0.001101 0.8432
Fixed Uniform, M 0.096207 9.58539 9.58976 0.001083 0.25185
Fixed Normal, M 0.095354 9.57453 9.58485 0.001128 0.3265
Fixed Cauchy, M 57.9317 91366900 2.89E+08 46.7645 0.39994
Fixed LogUniform, M 0.098437 9.58394 9.77334 0.001119 0.83219
Averaging 0.001847 0.40953 0.412962 3.8E-05 0.24269
Linear 0.01124 0.27766 0.278466 3.68E-05 0.2259
Extended Linear 0.022863 0.139376 0.143509 6.53E-05 0.42957
BLX-alpha 0.023095 0.138798 0.143449 0.00011 0.30909
BLX-alpha, P 0.036031 0.555205 0.560778 0.000127 0.35085
SPX 0.269243 6.10068 6.16809 0.006122 0.20386
PC Gaussian 0.028658 0.829183 0.831829 0.000144 0.33211
PC Crossover 4.1E-17 1.24E-16 0.022671 3.87E-05 0.16004
Field-based Crossover 0 2.2E-18 6.59E-17 7.73E-05 0.21706
Global Dominant 0.027097 0.00087 0.030967 0.000154 0.29034
Table 3.6 Results on Uniform Density n—dimensional Hypersphere Distribution
Operator Δμ GVM AVM CM CFM
Fixed Uniform,P 0.096781 10.0163 10.0349 0.001564 0.019011
Fixed Normal,P 0.098614 9.98732 10.0146 0.001596 0.004682
Fixed Cauchy, P 34.0543 5849480 18323500 23866.2 0.07132
Fixed LogUniform, P 0.100604 10.0979 10.283 0.001624 0.26729
Fixed Uniform, M 0.096629 5.49506 5.68638 0.001819 0.029271
Fixed Normal, M 0.099634 5.47853 5.68391 0.001865 0.004272
Fixed Cauchy,M 86.3381 2.1E+08 6.64E+08 2338.61 0.0699
Fixed LogUniform, M 0.098674 5.51143 6.0279 0.001796 0.33956
Averaging 0.003239 4.52368 4.73485 0.000318 0.00076
Linear 0.038609 3.0239 3.16976 0.00044 0.03271
Extended Linear 0.074817 1.52227 1.64713 0.000829 0.1248
BLXalpha 0.07913 1.4953 1.62682 0.001272 0.03883
BLXalpha,P 0.116761 6.07729 6.41829 0.001504 0.07271
SPX 0.860061 66.4272 70.3316 0.074228 0.00477
PC Gaussian 0.087542 9.07689 9.52179 0.001643 0.03598
PC Crossover 1.73E-16 2.68E-15 0.261213 0.000406 0.00486
Field-based Crossover 4.27E-17 5.33E-17 6.99E-16 0.000917 0.000276
Global Dominant 0.092101 0.00686 0.405776 0.001787 0.002844
Table 3.7 Results on ndimensional Normal Distribution
Operator Δμ GVM AVM CM CFM
Fixed Uniform, P 0.097507 10.0107 10.0175 0.001179 0.03021
Fixed Normal, P 0.0987 9.97933 9.99292 0.001261 0.0166
Fixed Cauchy, P 51.5956 61551100 1.94E+08 3908.48 0.10254
Fixed LogUniform, P 0.097922 10.1388 10.3286 0.00121 0.46019
Fixed Uniform, M 0.096643 9.08709 9.12477 0.00127 0.033942
Fixed Norrnal,M 0.099034 9.07354 9.11768 0.001255 0.01953
Fixed Cauchy, M 39.3995 18933700 59678700 635.377 0.09394
Fixed LogUniform, M 0.101053 9.10455 9.33673 0.001224 0.47771
Averaging 0.002431 0.89823 1.17863 6.73E-05 0.00429
Linear 0.016863 0.60516 0.790924 8.51E-05 0.00475
Extended Linear 0.03376 0.297963 0.40044 0.000162 0.13151
BLX-alpha 0.032015 0.294687 0.391116 0.000261 0.06882
BLX-alpha, P 0.051667 1.20896 1.60121 0.000309 0.12177
SPX 0.38288 13.3774 17.5227 0.015336 0.00803
PC Gaussian 0.042073 1.82104 2.37735 0.000315 0.081
PC Crossover 7.6E-17 8.1E-16 0.038964 8.69E-05 0.001561
Field-based Crossover 2.1E-17 2.2E-18 1.66E-16 0.000183 0.002821
Global Dominant 0.042751 0.00375 0.079754 0.000341 0.00908
Table 3.8 Results on ndimensional Normal Ring Distribution
Operator Δμ GVM AVM CM CFM
Fixed Uniform, P 0.100504 9.87642 10.5683 0.005865 0.00116
Fixed Normal, P 0.095766 10.065 10.8199 0.005581 0.00093
Fixed Cauchy, P 43.5692 30896300 97514600 287.722 0.02964
Fixed LogUniform, P 0.099188 9.94697 10.7487 0.005822 0.001264
Fixed Uniform, M 0.097138 482.687 233.85 0.151382 0.00308
Fixed Normal, M 0.098206 482.619 233.734 0.152666 0.006394
Fixed Cauchy, M 32.3487 6224850 19487400 78.8244 0.02611
Fixed LogUniform, M 0.097173 482.429 233.53 0.150938 0.007776
Averaging 0.013394 492.791 241.779 0.104551 0.003993
Linear 0.243481 427.922 161.018 0.052632 0.02314
Extended Linear 0.482358 63.153 81.4166 0.042842 0.08719
BLXalpha 0.48023 63.0731 81.4285 0.358389 0.03457
BLXalpha, P 0.798515 257.221 325.178 0.118477 0.07127
SPX 5.58187 2805.79 3539.42 7.09697 0.00319
PC Gaussian 0.571992 384.981 482.734 0.249266 0.02335
PC Crossover 1.06E-15 8.3E-14 9.92877 0.015938 3.29E-06
Field-based Crossover 3.03E-16 5.7E-16 3.31E-14 0.30106 0.005396
Global Dominant 0.585842 0.72733 21.4282 1.09751 0.002894
Table 3.9 Results on Rotated ndimensional Hyperellipsoid Distribution
Operator Δμ GVM AVM CM CFM
Fixed Uniform, P 0.096076 10.0785 10.4659 0.007042 0.0034
Fixed Normal, P 0.09656 9.98929 10.4375 0.007319 0.007344
Fixed Cauchy, P 27.5999 3940060 12194800 65.7525 0.027919
Fixed LogUniform, P 0.095092 10.1606 10.7219 0.007051 0.006294
Fixed Uniform,M 0.09352 402.644 133.575 0.101616 0.027528
Fixed Normal, M 0.096024 102.61 133.464 0.102183 0.02505
Fixed Cauchy, M 185.41 1.99E+09 6.27E+09 4178.72 0.01249
Fixed LogUniform,M 0.101524 102.74 133.647 0.10145 0.026274
Averaging 0.011062 112.581 141.131 0.060139 0.023516
Linear 0.180892 75.2073 94.4249 0.031193 0.0074
Extended Linear 0.391856 37.1472 48.4681 0.024738 0.05863
BLXalpha 0.351107 37.0258 48.1889 0.212002 0.00388
BLXalpha, P 0.578277 150.157 189.22 0.072682 0.0382
SPX 4.27396 1662.75 2105.17 4.07067 0.014845
PC Gaussian 0.463965 227.011 284.753 0.15255 0.018949
PC Crossover 4.6E-15 4.32E-14 5.86794 0.009837 0.005653
Field-based Crossover 5.96E-16 1.7E-15 2.19E-14 0.177148 0.025954
Global Dominant 0.444635 0.46726 12.1322 0.649314 0.045027
Table 3.10 Results on Rotated ndimensional Skewed Hyperellipsoid Distribution
3.3.4.2 Results by Operator
The following tables present the same data as the previous section, reorganized
by operator.
Fixed Uniform, P Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.095302 9.9929 9.99692 0.001004 0.0278
Rotated Uniform 0.095913 9.99228 9.99621 0.007789 0.029806
Aligned Normal Unimodal 0.100764 9.99772 10.0039 0.001118 0.169404
Rotated Normal Unimodal 0.099686 9.99125 9.99654 0.049393 0.184469
nSphere Surface 0.098612 9.98519 9.99042 0.001126 0.051685
Uniform Density 0.100395 9.99787 10.0027 0.001087 0.25882
nDim. Normal 0.096781 10.0163 10.0349 0.001564 0.019011
nDim. Ring 0.097507 10.0107 10.0175 0.001179 0.03021
nDim. Rotated Ellipsoid 0.100504 9.87642 10.5683 0.005865 0.00116
nDim. Ellipsoid Skewed 0.096076 10.0785 10.4659 0.007042 0.0034
Table 3.11 Results for Fixed Uniform Mutation Centered on a Single Parent
Fixed Normal, P Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.095259 9.97957 9.98858 0.001021 0.0456
Rotated Uniform 0.092222 10.0121 10.0207 0.007892 0.04467
Aligned Normal Unimodal 0.096511 9.98041 9.99064 0.001071 0.149286
Rotated Normal Unimodal 0.097443 9.96539 9.97613 0.049763 0.153426
nSphere Surface 0.098227 9.96494 9.97535 0.001178 0.00106
Uniform Density 0.096509 10.0026 10.013 0.001101 0.31474
nDim. Normal 0.098614 9.98732 10.0146 0.001596 0.004682
nDim. Ring 0.0987 9.97933 9.99292 0.001261 0.0166
nDim. Rotated Ellipsoid 0.095766 10.065 10.8199 0.005581 0.00093
nDim. Ellipsoid Skewed 0.09656 9.98929 10.4375 0.007319 0.007344
Table 3.12 Results for Fixed Normal Mutation Centered on a Single Parent
Fixed Cauchy, P Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 22.3083 2055480 6358350 12.9643 0.13016
Rotated Uniform Unimodal 80.2332 1.53E+08 4.84E+08 134.089 0.11168
Aligned Normal Unimodal 34.4028 5726270 17883100 787.14 0.085567
Rotated Normal Unimodal 17.621 964917 2886880 30.7709 0.072889
n-Sphere Surface 30.2911 6301540 19694200 243.852 0.08353
Uniform Density 35.9693 5706930 17715500 8089.22 0.39358
n-Dim. Normal 34.0543 5849480 18323500 23866.2 0.07132
n-Dim. Ring 51.5956 61551100 1.94E+08 3908.48 0.10254
n-Dim. Rotated Ellipsoid 43.5692 30896300 97514600 287.722 0.02964
n-Dim. Ellipsoid Skewed 27.5999 3940060 12194800 65.7525 0.027919
Table 3.13 Results for Fixed Cauchy Mutation Centered on a Single Parent
Fixed LogUniform, P Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.093714 9.93702 10.1305 0.000987 0.5402
Rotated Uniform 0.096776 9.92752 10.1025 0.007963 0.53995
Aligned Normal Unimodal 0.101021 10.1174 10.317 0.001123 0.19573
Rotated Normal Unimodal 0.099198 10.0557 10.2486 0.048962 0.19401
n-Sphere Surface 0.095049 9.85522 10.0283 0.001108 0.52844
Uniform Density 0.095129 10.0709 10.2489 0.001101 0.8432
n-Dim. Normal 0.100604 10.0979 10.283 0.001624 0.26729
n-Dim. Ring 0.097922 10.1388 10.3286 0.00121 0.46019
n-Dim. Rotated Ellipsoid 0.099188 9.94697 10.7487 0.005822 0.001264
n-Dim. Ellipsoid Skewed 0.095092 10.1606 10.7219 0.007051 0.006294
Table 3.14 Results for Fixed LogUniform Mutation Centered on a Single Parent
Fixed Uniform, M Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.096881 9.92675 9.93145 0.00099 0.028305
Rotated Uniform 0.098863 9.94697 9.95071 0.00862 0.034669
Aligned Normal Unimodal 0.095777 9.49617 9.61634 0.001072 0.209971
Rotated Normal Unimodal 0.097616 9.49966 9.50411 0.07228 0.212425
nSphere Surface 0.100658 9.49271 9.52734 0.001098 0.05559
Uniform Density 0.096207 9.58539 9.58976 0.001083 0.25185
nDim. Normal 0.096629 5.49506 5.68638 0.001819 0.029271
nDim. Ring 0.096643 9.08709 9.12477 0.00127 0.033942
nDim. Rotated Ellipsoid 0.097138 482.687 233.85 0.151382 0.00308
nDim. Ellipsoid Skewed 0.09352 402.644 133.575 0.101616 0.027528
Table 3.15 Results for Fixed Uniform Mutation Centered on the Mean of 2 Parents
Fixed Normal, M Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.095265 9.95056 9.96111 0.000985 0.0459
Rotated Uniform Unimodal 0.0954 9.92685 9.93693 0.008592 0.04115
Aligned Normal Unimodal 0.099884 9.48912 9.62041 0.001039 0.158183
Rotated Normal Unimodal 0.101171 9.48936 9.49952 0.072987 0.159413
nSphere Surface 0.09743 9.51404 9.55566 0.0011 0.00564
Uniform Density 0.095354 9.57453 9.58485 0.001128 0.3265
nDim. Normal 0.099634 5.47853 5.68391 0.001865 0.004272
nDim. Ring 0.099034 9.07354 9.11768 0.001255 0.01953
nDim. Rotated Ellipsoid 0.098206 482.619 233.734 0.152666 0.006394
nDim. Ellipsoid Skewed 0.096024 102.61 133.464 0.102183 0.02505
Table 3.16 Results for Fixed Normal Mutation Centered on the Mean of 2 Parents
Fixed Cauchy, M Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 27.4192 2156770 6569580 72.3596 0.12082
Rotated Uniform Unimodal 41.8603 11546600 35893200 9821.28 0.12106
Aligned Normal Unimodal 35.2497 12002600 37712500 335.802 0.079093
Rotated Normal Unimodal 32.5046 9013020 28243400 307.191 0.083529
n-Sphere Surface 38.5881 9015040 28206400 58.8462 0.0781
Uniform Density 57.9317 91366900 2.89E+08 46.7645 0.39994
n-Dim. Normal 86.3381 2.1E+08 6.64E+08 2338.61 0.0699
n-Dim. Ring 39.3995 18933700 59678700 635.377 0.09394
n-Dim. Rotated Ellipsoid 32.3487 6224850 19487400 78.8244 0.02611
n-Dim. Ellipsoid Skewed 185.41 1.99E+09 6.27E+09 4178.72 0.01249
Table 3.17 Results for Fixed Cauchy Mutation Centered on the Mean of 2 Parents
Fixed LogUniform, M Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.094141 9.98873 10.1718 0.000989 0.53653
Rotated Uniform Unimodal 0.096395 9.95437 10.129 0.008667 0.53417
Aligned Normal Unimodal 0.096681 9.5047 9.80657 0.001057 0.24116
Rotated Normal Unimodal 0.097137 9.42228 9.61821 0.071652 0.24128
nSphere Surface 0.100468 9.6409 9.85611 0.001113 0.4865
Uniform Density 0.098437 9.58394 9.77334 0.001119 0.83219
nDim. Normal 0.098674 5.51143 6.0279 0.001796 0.33956
nDim. Ring 0.101053 9.10455 9.33673 0.001224 0.47771
nDim. Rotated Ellipsoid 0.097173 482.429 233.53 0.150938 0.007776
nDim. Ellipsoid Skewed 0.101524 102.74 133.647 0.10145 0.026274
Table 3.18 Results for Fixed LogUniform Mutation Centered on the Mean of 2
Parents
Averaging Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.001926 0.03188 0.140812 0.000288 0.16362
Rotated Uniform 0.001935 0.032155 0.055517 0.000914 0.16272
Aligned Normal Unimodal 0.002021 0.49377 1.58471 0.00023 0.0026
Rotated Normal Unimodal 0.002124 0.48912 0.493643 0.008906 0.00479
n-Sphere Surface 0.001882 0.49304 0.912014 4.57E-05 0.19271
Uniform Density 0.001847 0.40953 0.412962 3.8E-05 0.24269
n-Dim. Normal 0.003239 4.52368 4.73485 0.000318 0.00076
n-Dim. Ring 0.002431 0.89823 1.17863 6.73E-05 0.00429
n-Dim. Rotated Ellipsoid 0.013394 492.791 241.779 0.104551 0.003993
n-Dim. Ellipsoid Skewed 0.011062 112.581 141.131 0.060139 0.023516
Table 3.19 Results for Averaging Crossover
Linear Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.00314 0.02768 0.087546 0 0.09826
Rotated Uniform 0.003169 0.02778 0.027776 0.000283 0.10262
Aligned Normal Unimodal 0.010425 0.33299 1.05299 0 0.00928
Rotated Normal Unimodal 0.010654 0.33276 0.332757 0.003391 0.00674
n-Sphere Surface 0.012134 0.33254 0.60782 4.34E-05 0.322579
Uniform Density 0.01124 0.27766 0.278466 3.68E-05 0.2259
n-Dim. Normal 0.038609 3.0239 3.16976 0.00044 0.03271
n-Dim. Ring 0.016863 0.60516 0.790924 8.51E-05 0.00475
n-Dim. Rotated Ellipsoid 0.243481 427.922 161.018 0.052632 0.02314
n-Dim. Ellipsoid Skewed 0.180892 75.2073 94.4249 0.031193 0.0074
Table 3.20 Results for Linear Crossover
Extended Linear Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.006505 0.014019 0.044333 0 0.13481
Rotated Uniform 0.005266 0.014089 0.014089 5.75E-05 0.13019
Aligned Normal Unimodal 0.020283 0.16698 0.528036 0 0.03563
Rotated Normal Unimodal 0.021178 0.158937 0.158937 0.000637 0.03515
n-Sphere Surface 0.023395 0.167295 0.309606 7.53E-05 0.06361
Uniform Density 0.022863 0.139376 0.143509 6.53E-05 0.42957
n-Dim. Normal 0.074817 1.52227 1.64713 0.000829 0.1248
n-Dim. Ring 0.03376 0.297963 0.40044 0.000162 0.13151
n-Dim. Rotated Ellipsoid 0.482358 63.153 81.4166 0.042842 0.08719
n-Dim. Ellipsoid Skewed 0.391856 37.1472 48.4681 0.024738 0.05863
Table 3.21 Results for Extended Linear Crossover
BLX-alpha Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.005899 0.014005 0.044289 0 0.12823
Rotated Uniform 0.007217 0.013643 0.013951 0.002418 0.026734
Aligned Normal Unimodal 0.020453 0.181205 0.573021 0 0.03704
Rotated Normal Unimodal 0.02544 0.16653 0.173064 0.029077 0.061905
n-Sphere Surface 0.023746 0.166974 0.309999 0.000125 0.00705
Uniform Density 0.023095 0.138798 0.143449 0.00011 0.30909
n-Dim. Normal 0.07913 1.4953 1.62682 0.001272 0.03883
n-Dim. Ring 0.032015 0.294687 0.391116 0.000261 0.06882
n-Dim. Rotated Ellipsoid 0.48023 63.0731 81.4285 0.358389 0.03457
n-Dim. Ellipsoid Skewed 0.351107 37.0258 48.1889 0.212002 0.00388
Table 3.22 Results for standard BLX-0.5
BLX-alpha, P Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.00917 0.05500 0.17394 0 0.19751
Rotated Uniform 0.01131 0.05543 0.05569 0.00043 0.10184
Aligned Normal Unimodal 0.02874 0.66979 2.11809 0 0.06833
Rotated Normal Unimodal 0.04102 0.67650 0.68060 0.00511 0.00096
nSphere Surface 0.03832 0.66738 1.22863 0.00015 0.15128
Uniform Density 0.03603 0.55520 0.56077 0.00012 0.35085
nDim. Normal 0.11676 6.07729 6.41829 0.00150 0.07271
nDim. Ring 0.05166 1.20896 1.60121 0.00030 0.12177
nDim. Rotated Ellipsoid 0.79851 257.221 325.178 0.11847 0.07127
nDim. Ellipsoid Skewed 0.57827 150.157 189.22 0.07268 0.0382
Table 3.23 Results for BLX-0.5 Centered on a Single Parent
SPX Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.067286 0.608672 1.92479 0 0.03284
Rotated Uniform 0.07345 0.620532 0.620532 0.030364 0.04096
Aligned Normal Unimodal 0.236523 7.4097 23.4315 0 0.005203
Rotated Normal Unimodal 0.256775 7.36582 7.36582 0.359701 0.001964
n-Sphere Surface 0.283269 7.33396 13.4766 0.006962 0.023562
Uniform Density 0.269243 6.10068 6.16809 0.006122 0.20386
n-Dim. Normal 0.860061 66.4272 70.3316 0.074228 0.00477
n-Dim. Ring 0.38288 13.3774 17.5227 0.015336 0.00803
n-Dim. Rotated Ellipsoid 5.58187 2805.79 3539.42 7.09697 0.00319
n-Dim. Ellipsoid Skewed 4.27396 1662.75 2105.17 4.07067 0.014845
Table 3.24 Results for Simplex Crossover (SPX)
PC Gaussian Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.007076 0.083353 0.263585 8.46E-13 0.19183
Rotated Uniform 0.007573 0.082807 0.082807 0.001421 0.18419
Aligned Normal Unimodal 0.022965 0.99181 3.13638 5.4E-13 0.01048
Rotated Normal Unimodal 0.026285 1.00032 1.00032 0.017237 0.01029
n-Sphere Surface 0.0283 0.9978 1.82206 0.000165 0.06194
Uniform Density 0.028658 0.829183 0.831829 0.000144 0.33211
n-Dim. Normal 0.087542 9.07689 9.52179 0.001643 0.03598
n-Dim. Ring 0.042073 1.82104 2.37735 0.000315 0.081
n-Dim. Rotated Ellipsoid 0.571992 384.981 482.734 0.249266 0.02335
n-Dim. Ellipsoid Skewed 0.463965 227.011 284.753 0.15255 0.018949
Table 3.25 Results for Principal Component Gaussian Sampling
PC Crossover Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 1.85E-13 1.5E-10 2.68E-10 3.6E-25 2.24E-15
Rotated Uniform 1.35E-16 3.2E-17 1.27E-16 7.21E-33 2E-16
Aligned Normal Unimodal 1.84E-13 1.5E-10 2.67E-10 1.46E-14 3E-17
Rotated Normal Unimodal 2.58E-17 3.9E-16 4.09E-16 7.32E-33 2.2E-17
n-Sphere Surface 5.57E-17 4.7E-16 0.015094 4.54E-05 0.091521
Uniform Density 4.1E-17 1.24E-16 0.022671 3.87E-05 0.16004
n-Dim. Normal 1.73E-16 2.68E-15 0.261213 0.000406 0.00486
n-Dim. Ring 7.6E-17 8.1E-16 0.038964 8.69E-05 0.001561
n-Dim. Rotated Ellipsoid 1.06E-15 8.3E-14 9.92877 0.015938 3.29E-06
n-Dim. Ellipsoid Skewed 4.6E-15 4.32E-14 5.86794 0.009837 0.005653
Table 3.26 Results for Principal Component Crossover
Field-based Crossover Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 1.64E-15 1.64E-15 5.18E-15 0 1.7E-15
Rotated Uniform 4.54E-17 1.5E-18 5.74E-17 0.002098 0.062642
Aligned Normal Unimodal 1.32E-17 2.22E-18 1.19E-16 0 4.44E-18
Rotated Normal Unimodal 1.23E-17 2.11E-17 5.15E-17 0.025234 0.072309
n-Sphere Surface 1.31E-17 1.11E-18 9.7E-17 9.11E-05 0.132339
Uniform Density 0 2.2E-18 6.59E-17 7.73E-05 0.21706
n-Dim. Normal 4.27E-17 5.33E-17 6.99E-16 0.000917 0.000276
n-Dim. Ring 2.1E-17 2.2E-18 1.66E-16 0.000183 0.002821
n-Dim. Rotated Ellipsoid 3.03E-16 5.7E-16 3.31E-14 0.30106 0.005396
n-Dim. Ellipsoid Skewed 5.96E-16 1.7E-15 2.19E-14 0.177148 0.025954
Table 3.27 Results for FieldBased Crossover
Global Dominant Δμ GVM AVM CM CFM
Aligned Uniform Unimodal 0.00730 0.00034 0.00480 0 0.00108
Rotated Uniform 0.00896 4E-05 0.00225 0.00800 0.03469
Aligned Normal Unimodal 0.02428 0.00891 0.10857 0 0.00166
Rotated Normal Unimodal 0.03056 0.00074 0.04329 0.09603 0.15174
n-Sphere Surface 0.02883 3.5E-05 0.04630 0.00017 0.16162
Uniform Density 0.02709 0.00087 0.03096 0.00015 0.29034
n-Dim. Normal 0.09210 0.00686 0.40577 0.00178 0.00284
n-Dim. Ring 0.04275 0.00375 0.07975 0.00034 0.00908
n-Dim. Rotated Ellipsoid 0.58584 0.72733 21.4282 1.09751 0.00289
n-Dim. Ellipsoid Skewed 0.44463 0.46726 12.1322 0.64931 0.04502
Table 3.28 Results for Global Dominant Recombination
3.3.4.3 Analysis of Results
In comparing the uniform, normal, and log-uniform fixed mutative schemes, note that
all three provide nearly identical results in terms of variance disruption, covariance
disruption, and mean modification. These features are all relative to the variance of the
distribution being sampled; since we selected the variances to be identical, the results are
nearly identical. The results therefore appear to be independent of the choice of normal
or uniform distribution. However, the center focusing measure shows that the fixed
normal mutation provides a more strongly centered child distribution than the uniform,
as would be expected.
Surprisingly, the log-uniform mutation operator shows greater center tending than even the Cauchy mutation operator. However, the variance of the log-uniform operator is nearly the same as that produced by the normal mutation operator. This implies either that the log-uniform mutation moves more strongly toward the center or that it produces a significant number of high-variance individuals that skew the mean of the distance distribution outward. Since we know from its design that the log-uniform mutation operator is strongly centered on individual parent solutions, we conclude that
the second situation must be true. Note that the Cauchy distribution also produces a significant number of high-variance members; however, here the number produced is large enough to significantly increase the variance, effectively producing a flatter distribution than the log-uniform mutation operator in all cases. The general result seems to be that the skew of the mutation distribution toward the center determines the degree of center focus induced in the child population. While this is hardly an unexpected result, it does validate these statistical measures to some degree.
A clearer result, demonstrated by analysis of the parent-centered and mean-centered versions of the various operators, is that a great deal of variance can be lost by mean-centered operators if the width of the applied operator is not scaled relative to the distance between the parents. The magnitude of this variance loss is relative to the variance of the parent distribution, and can be hidden by the relative magnitude of the variance of the mutative distribution. For example, the first four test cases, which provide relatively little population variance compared to the variance of the fixed normal, fixed uniform, and fixed log-uniform mutation operators, show an average 0.05 differential between the mean-centered and parent-centered versions. However, for the ellipsoid test cases, which have much larger initial population variances, the mean-centered versions show huge variance losses. The effect of mean versus parent centering on the covariance loss measure is consistent across the various operator pairs. In test cases where the uniform mutation operator shows reduced covariance loss when centered on a single parent, the normal, log-uniform, and BLX-0.5 operators show reductions of similar magnitude when parent-centered. This is not true of the Cauchy mutation operator. Also, the test cases having rotated presentations demonstrate the greatest covariance loss when using the mean-centered versions of the various operators. This is not surprising, since these distributions have the larger covariances and therefore more covariance to lose. A direct conclusion is that the use of mean-centered rather than parent-centered operators directly increases the degree of covariance loss.
Note that for the BLX-0.5 operators the parent-centered version provides both an increase in variance and a reduction in covariance loss. This is an expected result, since averaging adds a component of both variance loss and covariance loss. Interestingly, the parent-centered BLX-0.5 also increases the degree of center focusing when compared to its mean-centered version. This tendency is somewhat present in other mean/parent-centered pairs, but is not as pronounced as in the BLX-0.5 case. This implies that the parent-centered version increases the variance of the search distribution without a matching increase in the median distance, thus extending the tails of the distribution without excessively flattening the center.
The PC crossover operator shows maximal invariance on all statistics across all test cases. This implies that the PC crossover operator preserves the population mean, variance, covariance, and distribution shape to a high degree. Note that although the overall variance is preserved, the variance may be redistributed across the dimensions, as evidenced by the high AVM values. Field-based crossover also preserves the mean and population variance to a high degree, but can disrupt the covariance and center tending characteristics under certain circumstances. The dominant recombination operator preserves these characteristics to a much lesser degree, as its use of multiple parents provides a much more variant sampling.
Interestingly, the SPX operator shows the greatest level of covariance modification; however, unlike the other operators with high levels of covariance modification, this is due not to a loss or reduction of covariance but rather to a sharpening of the covariance that already exists in the population. Unfortunately, the unsigned nature of the covariance modification statistic does not allow us to distinguish directly between covariance loss and covariance gain, so this observation must be made through direct evaluation of the distribution of the produced children. In these cases, a clue to the covariance-enhancing nature of the operator is that the magnitude of covariance modification on the last two test distributions is greater than the total expected normalized covariance measure. A further metric that could be of use in such situations would be a measure of relative alignment between the covariances of the parent and child distributions. However, given the possibility of under-constraint (i.e. singularities in the covariance matrix), construction of such a metric would be more difficult than a simple direct comparison of the eigenspaces of the two covariance matrices, although that would be a reasonable first approach.
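To make that first approach concrete, the following is a minimal sketch of such an eigenspace comparison, assuming numpy and populations stored as arrays of row vectors; the eigenvalue weighting and best-match pairing are illustrative choices of ours, not measures defined in this work.

```python
import numpy as np

def eigenspace_alignment(parent_pop, child_pop, tol=1e-9):
    """Rough alignment score between the covariance eigenspaces of two
    populations: mean absolute cosine between each parent eigen-axis and
    its best-matching child eigen-axis, weighted by the parent
    eigenvalues. Near-singular directions (eigenvalue below tol) are
    ignored, sidestepping the under-constraint problem noted above."""
    cp = np.cov(parent_pop, rowvar=False)
    cc = np.cov(child_pop, rowvar=False)
    wp, vp = np.linalg.eigh(cp)          # eigenvalues ascending, orthonormal columns
    _, vc = np.linalg.eigh(cc)
    keep = wp > tol                      # drop under-constrained directions
    cos = np.abs(vp[:, keep].T @ vc)     # |cosine| between every axis pair
    best = cos.max(axis=1)               # best-matching child axis per parent axis
    return float(np.average(best, weights=wp[keep]))
```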
The uniform density sphere test case consistently demonstrates the highest degree
of center focus for each operator. Likewise, the hypersphere surface consistently elicits
the highest degree of center avoidance from each operator. The reason for these
observations is still under investigation.
3.3.5 Preliminary Operator Taxonomy
Using the data from these statistical measurements and the given test
distributions, we can estimate the relative tendencies of the tested operators and thereby
produce an overall categorization of their operation. Note that this is no doubt an
oversimpliﬁcation of the behavior of a number of these operators, but it does provide a
reasonable starting point for categorization of operators. Therefore, we can produce a
preliminary form of taxonomical classiﬁcation for these operators.
3.3.5.1 Mean Modification
None of the operators tested shows any direct bias in terms of mean modification. As discussed previously, most mean-modifying operators tend to use information from the objective function to induce bias; however, none of the operators tested is fitness biased. While some of the operators, such as Cauchy mutation, tend to show higher-magnitude displacement of the population mean, the magnitude of displacement, taken proportional to the level of variance modification, is lower for Cauchy mutation than for several other operators. In summary, for this operator set, the mean modification statistic does not provide any useful differentiation between the operators.
3.3.5.2 Variance Modification
There are basically ﬁve classes of variance modiﬁcation behaviors observed
through these tests: high variance addition, variance addition, variance loss, variance
preservation, and variance rescaling. Cauchy mutation adds an extremely high level of
variance, and therefore the Cauchy operators are isolated in this categorization. Variance
losing operators tend to reduce the population variance, often in proportion to the overall
initial population variance. Both the averaging and linear crossover operators, not
surprisingly, demonstrate reduced variance in the produced child distributions. Variance
preserving operators are those that neither add nor reduce the variance of the population.
Of the three variance preserving operators, PC crossover is the most consistently variance
preserving, followed by field-based crossover and global dominant recombination. The majority of the fixed mutation schemes demonstrate basic variance addition, as expected. However, several operators tend to add a degree of variance proportional to the current variance of the population. This effectively produces a rescaling of the population variance (e.g. doubling, tripling, etc.). Extended linear recombination, BLX-α, SPX, extended linear crossover, and PC Gaussian are all variance rescaling operators.
3.3.5.3 Covariance Modification
Four potential classifications of covariance modification behavior have been composed from the statistical test data: covariance losing, covariance preserving, covariance altering, and covariance enhancing. Again, we place the Cauchy mutation operators in a separate categorization, covariance altering, since their large variances tend to overwhelm the covariance measures (effectively overshadowing the population covariance). These operators effectively induce an arbitrary alternate covariance on the population. Covariance losing operators include global dominant recombination, field-based recombination, and BLX-α. These operators are characterized by nearly complete neutralization of the population covariance. We include most of the small-variance, fixed mutation operators as covariance preserving, since the magnitude of the covariance disruption is relative to the mutative variance in these cases. Note that a better classification would be variance-relative covariance loss, but until we test multiple versions of these operators under various variances, this assumed categorization cannot be proven empirically (although from a theoretical standpoint it seems fairly obvious). The PC crossover operator stands out as the most truly covariance preserving operator of those tested. Both the SPX and PC Gaussian operators can be categorized as covariance
enhancing, in that both operators tend to increase the magnitudes of the covariance
present in the parent population.
3.3.5.4 Center Focusing
Three categories of center focus behavior provide characterization of the shape
modifying tendencies of various operators. These categories are deﬁned as center
focusing, center neutral, and center avoiding. Note that while center focus indicates a
relative shape change, variance loss (or lower degrees of variance addition) indicates a
more direct shift toward the population center. Thus, an operator such as averaging
crossover often demonstrates a ﬂatter distribution, but over a much more centralized
range.
Operators that use a centralized search distribution, such as the Cauchy and log-uniform distributions, tend to impose the search distribution onto the population distribution. The degree of reshaping is determined by the relative magnitude of the variance of the search distribution compared to the variance of the initial population. Tested operators demonstrating various degrees of center focusing are those using log-uniform and Cauchy distributions, extended linear crossover, the parent-centered version of BLX-0.5, and the PC Gaussian operator.
Center neutral operators include the fixed normal sampling mutation operators, PC crossover, SPX, and BLX-0.5. Center avoiding operators include averaging crossover, global dominant recombination, field-based crossover, and linear crossover.
3.3.5.5 Preliminary Taxonomy
The following table presents a summary of the preliminary operator taxonomy as
presented above.
Operator                 Mean Mod.      Var. Mod.    Covar. Mod.  Center Focus
Fixed Uniform, P         Var. Relative  Adding       Preserving*  Avoiding
Fixed Normal, P          Var. Relative  Adding       Preserving*  Neutral
Fixed Cauchy, P          Var. Relative  High Adding  Altering     Focusing
Fixed Log-Uniform, P     Var. Relative  Adding       Preserving*  Focusing
Fixed Uniform, M         Var. Relative  Adding       Preserving*  Avoiding
Fixed Normal, M          Var. Relative  Adding       Preserving*  Neutral
Fixed Cauchy, M          Var. Relative  High Adding  Altering     Focusing
Fixed Log-Uniform, M     Var. Relative  Adding       Preserving*  Focusing
Averaging                Var. Relative  Reducing     Losing       Avoiding
Linear                   Var. Relative  Reducing     Losing       Avoiding
Extended Linear          Var. Relative  Rescaling    Preserving   Focusing
BLX-alpha                Var. Relative  Rescaling    Losing       Neutral
BLX-alpha, P             Var. Relative  Rescaling    Losing       Focusing
SPX                      Var. Relative  Rescaling    Enhancing    Neutral
PC Gaussian              Var. Relative  Rescaling    Enhancing    Focusing
PC Crossover             Var. Relative  Preserving   Preserving   Neutral
Field-Based Crossover    Var. Relative  Preserving   Losing       Avoiding
Global Dominant          Var. Relative  Preserving   Losing       Avoiding
* Most likely a mutation-variance-magnitude-relative loss.
Table 3.29 Preliminary Taxonomy
Note that this preliminary taxonomy selects the division points between the
various categories in a fairly arbitrary manner. A more quantitative categorization is
possible through the test data; however, the exact formulation of such quantitative scales
requires more extensive research into the interplay of the operators and the test
distributions.
Chapter 4
Alternate Population Relative Operators
Given the analysis in Chapter 3, it should be clear that numerous standard EC operators exhibit bias in terms of modification of the search distribution, as well as invariance or sensitivity to various homeogenic transformations of the search landscape. Whether such inherent biases are potentially valuable or useful depends on how well the assumptions those biases imply match the actual structure of a given search space. Non-invariant behavior implies a certain fragility with respect to the form of encoding used for a given problem, and thereby seems to place an undue burden on the users of such operators. A potential result of this analysis is the creation of new operators that remain neutral to the pre-existing population distribution and that are invariant with regard to the specific encoding employed.
A potential approach to producing such operators is to modify existing operators such that they are invariant and distribution neutral. Two possible methods for producing invariance to the selected axes of encoding are either to use unrestricted, or "free," axes for operator application (i.e. randomly reselecting the axes of application for each operator application) or to restrict the axes of rotation to the most dominant set revealed through analysis of the distribution. While the second seems more likely to also produce distribution neutral operators, both forms will be compared.
To demonstrate the effectiveness and efﬁciency of any modiﬁed operators, the
relative performance of a system employing the operator will be compared to established
EC systems operating on identical search landscapes. The evaluative comparisons will
take place over the set of test problems outlined in Chapter 2.
4.1 Overview of Benchmark Systems and Empirical Comparisons
For empirical comparisons, three standard evolutionary computation approaches were selected. These systems include standard evolutionary programming, modified evolutionary programming substituting Cauchy samples for the normal mutative samples, and a version of the standard GA using blend-plus-alpha crossover (BLX-α). The selected parameterization and operation of each of these systems is detailed below.
The NFL theorems state that no search system can be considered more powerful than another over the set of all possible problem spaces; therefore, any comparison based on empirical results is restricted by the problems used for evaluation. Attempts to extrapolate from such results are likely to lead to erroneous conclusions. Nonetheless, empirical examination remains the simplest and most direct method to measure the relative strength (in the sense of "strong" and "weak" AI) of various systems. Empirical
examination can provide a hint for characterizing the types of landscapes on which a
given search technique is likely to perform well.
4.1.1 Random Rotational Presentation of Test Functions
Except where explicitly noted, all test functions are presented in an offset and rotated fashion. The method of rotation is determined for a d-dimensional problem by selecting 5d random pairs of dimensions uniformly and 5d associated random angles from 0 to 360 degrees. The individual rotation matrices for these rotations are built in the standard method, and the accumulated rotation matrices are then multiplied to produce a final conglomerated rotation. Note that since the standard test battery consists of 100 runs, the initial populations and associated random rotation sets are fixed for all 100 runs for all tested systems (i.e. the initial population on the Sphere function and the rotation matrix for run number r are identical for the EP test series, the EPGA test series, etc.). The initial population is computed in the standard coordinate space for the test problem, then rotated and translated to the active problem coordinate system.
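A minimal sketch of this rotation construction, assuming numpy; the per-run seeding convention shown in the usage line is illustrative.

```python
import numpy as np

def random_rotation(d, rng, n_givens=None):
    """Compose 5d random plane (Givens-style) rotations into a single
    d x d rotation matrix, following the construction described above."""
    n_givens = 5 * d if n_givens is None else n_givens
    r = np.eye(d)
    for _ in range(n_givens):
        i, j = rng.choice(d, size=2, replace=False)  # random pair of dimensions
        theta = rng.uniform(0.0, 2.0 * np.pi)        # random angle in [0, 360) degrees
        g = np.eye(d)
        g[i, i] = g[j, j] = np.cos(theta)
        g[i, j], g[j, i] = -np.sin(theta), np.sin(theta)
        r = g @ r                                    # accumulate the rotations
    return r

# Usage: seed the generator with the run number so every tested system
# sees the same rotation for run r (seed choice is illustrative).
R = random_rotation(10, np.random.default_rng(seed=7))
```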
All test problems, even non-rotated test sequences, are redesigned to move the global optima away from the origin (by shifting the effective origin). While this offset drastically reduces the available resolution around the global optima (thereby potentially affecting functions requiring high resolution, such as the original Schaffer function and the Rosenbrock function), it prevents rewarding origin-seeking behavior. Note that for non-rotated and rotated comparisons the same initial population for a given run number is used; however, the initial population is only translated for the non-rotated tests.
4.1.2 EP and Fast EP Systems
The EP and fast EP systems used for testing incorporate standard EP selection and mutation operators. The two systems are identical, with the exception that the EP mutations are drawn from a normal distribution while the fast EP mutation samples are drawn from a Cauchy distribution. Each solution to a d-dimensional problem is encoded as a vector of d 64-bit floating point values (doubles) and an associated vector of d 64-bit mutative step sizes for scaling the mutation. The initial values for the solution parameters are independently and uniformly initialized across the selected initial range for the problem. The initial values for the mutative step sizes are initialized to the inverse square of d.
EP-style tournament selection is used in both the EP and fast EP systems presented here. This selection operator is classified as a (μ + λ) scheme, which indicates that the parent pool (of size μ) for the next generation is selected from the general pool of the combined parents and children from the current generation. EP tournament selection begins by selecting 10 competing solutions uniformly for each solution in the population. The tested solution receives a score from 0 to 10 based on the number of competing solutions in the sample with a worse fitness value. After all solutions are scored, the solutions are sorted by this score and only the top μ survive as parents for the next round (former parents have precedence on ties; all other ties are decided randomly). In all EP and fast EP results presented here the value of μ is 100, and the value of λ is 6μ.
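A sketch of this selection scheme, assuming numpy and minimization; listing the parents first combined with a stable sort gives former parents precedence on ties, while random tie-breaking among children is omitted for brevity.

```python
import numpy as np

def ep_tournament_select(parents, children, fitness_fn, mu, q=10, rng=None):
    """(mu + lambda) EP tournament selection: each solution in the
    combined pool meets q uniformly drawn opponents and scores one point
    per opponent with a worse (larger, for minimization) fitness; the
    top mu scorers survive as the next parent pool."""
    rng = np.random.default_rng() if rng is None else rng
    pool = list(parents) + list(children)       # parents first, for tie precedence
    fit = [fitness_fn(s) for s in pool]
    n = len(pool)
    scores = [sum(fit[i] < fit[int(j)] for j in rng.integers(0, n, size=q))
              for i in range(n)]
    order = sorted(range(n), key=lambda i: -scores[i])  # stable sort on score
    return [pool[i] for i in order[:mu]]
```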
Mutation occurs using the standard EP self-adaptive technique to adjust the mutative step sizes. During each generation, each parent solution spawns 6 child solutions through mutation. The modification of the step size is performed before the step size is used in mutation, as outlined in [Back97].
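The update itself is deferred to [Back97]; the sketch below shows the widely cited log-normal form of EP self-adaptation, with conventional default learning rates that are assumptions here rather than values taken from this work.

```python
import numpy as np

def ep_self_adaptive_mutate(x, sigma, rng):
    """One common form of EP self-adaptation: step sizes are perturbed
    log-normally first, then used to scale the mutation of the solution.
    For fast EP the final normal draw would be replaced by a Cauchy draw
    (e.g. rng.standard_cauchy(size=d))."""
    d = len(x)
    tau_prime = 1.0 / np.sqrt(2.0 * d)           # global learning rate (conventional)
    tau = 1.0 / np.sqrt(2.0 * np.sqrt(d))        # per-dimension learning rate
    new_sigma = sigma * np.exp(tau_prime * rng.normal()
                               + tau * rng.normal(size=d))
    new_x = x + new_sigma * rng.normal(size=d)   # step sizes applied after update
    return new_x, new_sigma
```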
4.1.3 BLX-α GA System
The BLX-α GA system used in this empirical testing is a modified standard GA. Although [Eshelman 93] demonstrates that BLX-α is apparently more efficient within the CHC framework, we chose to test it within a standard GA framework in order to provide more direct comparisons between the relative power of the EPGA and BLX-α operators. Other than the difference in operators, both systems are parameterized identically, using the same code for all other components and initialized with the same set of initial populations and randomly selected rotations for each test run.
The population size for all BLX-α performance data is 200 unless otherwise specified. Single element elitism is used (the best parent is directly copied into the child generation). Selection is tournament selection, where two individuals are selected uniformly at random (with replacement, which allows a nonzero probability for the least fit member to survive) from the previous population and the fitter individual is selected as a parent solution. BLX-α is applied to a pair of selected parents, producing a pair of independently produced children. Other than the single elite individual, all individuals in the new generation are produced through crossover. Since BLX-α applies a random sample, no mutation operator is incorporated into the system.
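A minimal sketch of the BLX-α operator as used here, assuming numpy and real-valued parent vectors.

```python
import numpy as np

def blx_alpha(p1, p2, alpha=0.5, rng=None):
    """BLX-alpha: each child gene is drawn uniformly from the interval
    spanned by the two parent genes, extended by alpha times the
    interval width on each side. Returns two independently drawn
    children, matching the usage described above."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = np.minimum(p1, p2), np.maximum(p1, p2)
    width = hi - lo
    a, b = lo - alpha * width, hi + alpha * width   # extended sampling interval
    return rng.uniform(a, b), rng.uniform(a, b)
```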
4.1.4 Ranksum Comparisons
The results presented in these tables are compiled from 90 of 100 runs, discounting the 5 best and 5 worst of the 100 runs for each system type (EP, BLX-α, etc.). The best value found in those 90 runs is given, as well as the number of evaluations used to find that value. The average best of all 90 runs is also reported. In multiway or side-by-side comparisons, runs are ranked according to the best fitness value found during the run, and the sum of these rank values is then reported.
The Wilcoxon rank-sum test allows us to use these values in side-by-side comparisons to estimate the likelihood that the two sample groups are drawn from populations with distinct means. The Wilcoxon test is strongly invariant to distribution shape, so unlike a standard t-test we do not need to show that the underlying distribution is normal. Given the large sample sizes, we can use the following formulas to compute the expected mean and standard deviation of the ranksum distribution. Since the Wilcoxon values are expected to be normally distributed regardless of the underlying distributions of the sample sets, we can use these values in a normal Z-test against the observed ranksum values to estimate the probability that the two means are distinct.
$$\mu_A = \frac{n_A(n_A + n_B + 1)}{2}$$, where $n_A$, $n_B$ are the sizes of sample sets A and B respectively

Equation 4.1 Expected Mean of Ranksum Values
$$\sigma_A = \sqrt{\frac{n_A n_B(n_A + n_B + 1)}{12}}$$, where $n_A$, $n_B$ are the sizes of sample sets A and B respectively

Equation 4.2 Expected Standard Deviation of Ranksum Values
Therefore, we can calculate the probability of a given ranksum value being produced if the two sample groups did come from the same mean. The formula for the relevant Z-statistic is given in Equation 4.3.

$$z = \frac{R_A - \mu_A}{\sigma_A}$$, where $R_A$ is the observed ranksum value

Equation 4.3 Z-statistic for an Observed Ranksum Measure
Using Equation 4.3, we can compute the Z-statistic and probability level for the ranksum values observed in the results. Table 4.1 presents Z-statistics and probability levels for a number of observed values covering the possible range, while Table 4.2 provides the ranksum values corresponding to probability levels of interest. In all side-by-side performance comparison data, all ranksum comparisons providing a 99.9% or greater level of significance are highlighted.
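A sketch of the ranksum computation and its normal approximation (Equations 4.1 through 4.3), assuming numpy; midrank handling for tied fitness values is omitted for brevity.

```python
import numpy as np

def ranksum_z(best_a, best_b):
    """Rank-sum Z-statistic for two groups of per-run best fitness
    values (90 values each after trimming the 5 best and 5 worst of 100
    runs). Ties keep argsort order here; a full implementation would
    assign midranks."""
    a, b = np.asarray(best_a, float), np.asarray(best_b, float)
    na, nb = len(a), len(b)
    order = np.concatenate([a, b]).argsort()
    ranks = np.empty(na + nb)
    ranks[order] = np.arange(1, na + nb + 1)           # 1-based ranks
    r_a = ranks[:na].sum()                             # observed ranksum R_A
    mu = na * (na + nb + 1) / 2.0                      # Equation 4.1
    sigma = np.sqrt(na * nb * (na + nb + 1) / 12.0)    # Equation 4.2
    return r_a, (r_a - mu) / sigma                     # Equation 4.3
```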
Ranksum   Z-statistic   Probability μA < μB
4095      -11.78228     0.00%
4545      -10.47313     0.00%
4995      -9.16399      0.00%
5445      -7.85485      0.00%
5895      -6.54571      0.00%
6345      -5.23657      0.00%
6795      -3.92742      0.00%
7245      -2.61828      0.44%
7695      -1.30914      9.52%
8145      0             50.00%
8595      1.309142      90.48%
9045      2.618283      99.56%
9495      3.927425      100.00%
9945      5.236566      100.00%
10395     6.545708      100.00%
10845     7.854849      100.00%
11295     9.163991      100.00%
11745     10.47313      100.00%
Table 4.1 Z-statistic of Ranksum Measures with Two 90-Member Sample Groups
Probability μA < μB   Z-statistic   Ranksum
0.001%                -4.265043     6654.216
0.010%                -3.719090     6845.046
0.050%                -3.290560     6994.833
0.100%                -3.090253     7064.847
0.500%                -2.575831     7244.656
1.000%                -2.326347     7331.859
5.000%                -1.644853     7570.065
10.000%               -1.281552     7697.052
90.000%               1.281552      8592.948
95.000%               1.644853      8719.935
99.000%               2.326347      8958.141
99.500%               2.575831      9045.344
99.900%               3.090253      9225.153
99.950%               3.290560      9295.167
99.990%               3.719090      9444.954
99.999%               4.265043      9635.784
Table 4.2 Ranksum Measures for Various Probability Levels with Two 90-Member Sample Groups
Similar tests exist for multiway ranksum comparisons, but they can only indicate whether any of the samples deviates from the norm. Therefore these values are of limited use in multiway comparisons and are used mainly to determine the validity of the average best values.
Note that it is quite possible for a given system in a side-by-side comparison to demonstrate a weaker average best value on a given test function while its ranksum indicates that it statistically outperforms the other system. This can occur if the system produces a number of outlier runs (beyond the 5% removal limit) which exhibit poor performance. For example, if system A produces an error value of 10^-3 once and 10^-20 for all other runs, while system B produces a consistent 10^-15 on all runs, the average error for A will be larger than that for B. However, the ranksum test clearly would indicate that A consistently outperforms B.
4.2 Principal Component Operators for GA
One fairly simple approach toward removing rotational bias would be to ignore the bases of encoding altogether, treating the search points as existing in a non-oriented d-dimensional space. However, most operators require some form of axial or dimensional decomposition for application. One possible approach would be to select randomly aligned axes for each operator application. This effectively simulates a non-oriented approach. Unfortunately, given that the number of possible alignments grows exponentially with the dimensionality of the problem, a fixed number of operator applications is likely to provide a relatively poor sampling of this space. The result is likely to be a less stable system than those using fixed bases.
Randomly rotated axes for operator application may offer an opportunity to reduce, or at least normalize, alignment bias. Ideally, we would like to be able to select a correctly aligned basis set, or at least one with known properties, such as one that minimizes the area of distribution for the offspring. An optimal rotation would be one which aligns the longest dimension of the hyperellipsoid of the mutation distribution with the dimension having the largest variance in the population, aligns the second longest dimension of the hyperellipsoid with the dimension having the next largest variance which is also orthogonal to the dimension already selected, and so on. Note that this basis set may be located by solving for the eigenvectors of the covariance matrix and ordering the eigenvectors in descending order by the absolute values of their associated eigenvalues. This procedure is identical to that used in creating principal component projections for data viewing [Jain 88]; hence, these operators are designated principal component (PC) operators. In fact, this is somewhat of a misnomer, since principal component analysis assumes reduction of the dimensional complexity of data for presentation by limiting the number of dimensions being projected. However, the term has also been used in [Kita 1999] and connotes the intention of using the eigenspace of the covariance matrix.
Computation of an eigenspace requires sufficient samples to prevent a singularity in the computation. A singularity occurs when the sampled set has no projection along one or more dimensions. When a singularity occurs, the eigenvector calculation may produce random vectors for the smallest eigenvalues. If the population has not converged in its level of dimensionality, the covariance matrix requires at least d+1 samples. For the PCGM and PCX operators a suggested sample size of 2d is used. Since the PCGM operator uses the eigenvalue as the basis of the mutation size, the random eigenvectors are not expected to have much impact on that operator. Likewise, if the singularity occurs because of loss of dimensionality due to convergence, it is unlikely that a pair of parents will have significant variance to exchange across the random orthogonal eigenvector. The only situation which could cause unexpected effects is if the sample set has a singularity while the parents for a PCX operation have a significant amount of variance across the random eigenvector.
4.2.1 Principal Component Crossover (PCX)
The concept of using the principal component analysis basis space for crossover is straightforward. Two parents are selected using the standard breeding selection technique. An additional pool of the desired size is selected uniformly without replacement (and disallowing the two parents). The covariance of the combined pool (parents and pool members) is measured, and the eigenvector analysis of the resulting covariance matrix is calculated. The result of this analysis provides a series of orthogonal unit vectors in d-space and a set of associated eigenvalues (representing the magnitude of the variance measured along each of these vectors). The eigenvalues are then discarded.
The two selected parents are rotated into the basis represented by the eigenspace of the covariance matrix. Standard two-point field-based crossover is then performed on the rotated parents, and the resulting children are then counter-rotated and submitted to the next generation (or for mutation with PCGM). This operator was first introduced in the tech report [Patton 1999].
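A minimal sketch of this procedure, assuming numpy, a population stored as an array of row vectors from which the two selected parents have already been excluded, and one reasonable reading of the two-point field-based crossover step.

```python
import numpy as np

def pcx(p1, p2, population, pool_size, rng):
    """Principal Component Crossover sketch: measure the covariance of
    the two parents plus a uniformly drawn pool, rotate the parents into
    the eigenbasis, apply two-point crossover on the rotated coordinate
    fields, and counter-rotate the children back."""
    d = len(p1)
    idx = rng.choice(len(population), size=pool_size, replace=False)
    sample = np.vstack([p1, p2, population[idx]])
    cov = np.cov(sample, rowvar=False)
    _, vecs = np.linalg.eigh(cov)        # eigenvalues are discarded for PCX
    a, b = p1 @ vecs, p2 @ vecs          # rotate parents into the eigenbasis
    cut1, cut2 = sorted(rng.choice(d + 1, size=2, replace=False))
    c1, c2 = a.copy(), b.copy()
    c1[cut1:cut2], c2[cut1:cut2] = b[cut1:cut2], a[cut1:cut2]
    return c1 @ vecs.T, c2 @ vecs.T      # counter-rotate back to problem space
```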
4.2.2 Principal Component Gaussian Mutation (PCGM)
Principal Component Gaussian Mutation (PCGM) proceeds in a fashion parallel to PCX. A single parent is selected using the standard breeding selection technique, or is provided as the result of a PCX operation. An additional pool of the desired size is selected uniformly without replacement. The covariance of the combined pool (solutions to be mutated and pool members) is measured, and the eigenvector analysis of the resulting covariance matrix is performed. The result of this analysis provides a series of orthogonal unit vectors in d-space and a set of associated eigenvalues (representing the magnitude of the variance measured along each of these vectors).
$$\sigma_i = \frac{\sqrt{e_i}}{S}$$, where $e_i$ is the $i$th eigenvalue, and $S$ is the PCGM scale factor

Equation 4.4 Formula for PCGM Sample Standard Deviation
Then d zero-mean normal samples are drawn, where the standard deviation of each sample is given in Equation 4.4. The vector of these samples is counter-rotated using the rotational inverse of the eigenvector matrix and is then added to the solution being mutated. This operator was first introduced in the tech report [Patton 1999].
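A matching sketch of PCGM under the same assumptions; the clipping of tiny negative eigenvalues is a numerical safeguard of ours rather than part of the operator's definition.

```python
import numpy as np

def pcgm(x, population, pool_size, scale, rng):
    """Principal Component Gaussian Mutation sketch: draw zero-mean
    normal steps along the sample eigenbasis with standard deviation
    sqrt(e_i)/S (Equation 4.4), counter-rotate the step vector, and add
    it to the solution being mutated."""
    idx = rng.choice(len(population), size=pool_size, replace=False)
    sample = np.vstack([x, population[idx]])
    cov = np.cov(sample, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    sigmas = np.sqrt(np.clip(vals, 0.0, None)) / scale  # Equation 4.4; clip guards round-off
    step = rng.normal(size=len(x)) * sigmas             # samples along each eigen-axis
    return x + vecs @ step                              # counter-rotate into problem space
```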
4.2.3 Sample Selection for Principal Component Analysis
The most obvious candidate for application of the principal component analysis is the covariance of the entire population. This would require less computation than the use of multiple smaller pools. However, using the entire population would allow a single outlier to potentially dominate the variance measures and therefore the axial alignment. Further, using multiple smaller samples allows more stochastic behavior in the sampling process (under the assumption that it is better to be correct for part of each generation than to be wrong for one or more entire generations). Preliminary empirical testing demonstrates that using the entire population for eigenvector analysis of the covariance matrix provides less effective search in the majority of situations than using smaller random samples from the population.
If we decide to use a subsection of the population for operator application, the decisions of sample size and method of selection come into play. Clearly the sample size must scale relative to the population size in order to avoid singularities (resulting in unfixed or free axes) in the covariance matrix as much as possible. A minimum sample size of at least d + 1, where d is the dimensionality of the problem space, is necessary; however, we suggest a sample size of at least 2d.
The simplest and least biased form of selection is to select uniformly from the population without replacement (replacement would allow multiple instances of the same
point, which would increase the likelihood of a singularity). Alternately, we could perform some form of tournament or other performance-based selection for pool entry; however, this would likely compress the locations of the pool members and again increase the likelihood of reaching singularity as the population converges. Similarly, we might prefer to apply some form of localization to the pool selection, allowing points "nearer" to the targeted parent solution(s) a better opportunity to participate in axis selection. Yet this would add additional cost and potentially increase the number of singularities. Uniform random sampling is the least biased of these options and should provide a sufficient basis for our current work. Investigations into other forms of selection for pool membership are left to future work.
4.2.4 Difficulties with Relative Under-Constraint and Excessive Freedom
An anticipated drawback in the use of principal component analysis is that the analysis may become artificially constrained under conditions of excessive parameter freedom. Specifically, suppose that a given problem parameter has no impact on the evaluative value of a given solution. This parameter is then free to assume any value. Initially, this should not be a great concern, as the distribution of this parameter will be random. However, as the evolutionary simulation progresses, it is possible that the parameter will drift toward alignment with one or more other parameters. If the variance of this parameter is still fairly large, which might be expected since no selection pressure is being applied toward convergence of this value, an anomalously large covariance may be created. Since the principal component analysis fixes the new basis such that variance along successive dimensions is maximized, it is likely that one of the early dimensions will include this artificial relationship. In addition, all subsequent axes will
be constrained to be orthogonal to this artiﬁcial relationship, resulting in a potentially
skewed basis.
Note that the same analysis should apply, to a degree, to landscapes which suffer from relative under-constraint (i.e. where the expected contribution of a given parameter over the initial range is exponentially smaller than that of another). However, the actual performance of these systems may depend upon the relative speed of convergence for these parameters, since slower convergence allows more time for drift.
None of the testbed problems outlined in Chapter 2 specifically incorporates free parameters. However, several show varying degrees of relative under-constraint.
4.2.5 PCGA System Parameterization
For the PCGA system used to obtain the empirical results found in this section, both PCGM and PCX are employed unless otherwise specified. Single element elitism is used (the best parent is directly copied into the child generation), and all other elements in the child population are produced through the combined application of PCX and PCGM. Unless otherwise specified, the scale factor, S, for the PCGM operator is set to 3 (selected because it directly matches the amount of variance injected by a similar BLX-0.5 application).
Selection is tournament selection, where two individuals are selected uniformly at random (with replacement, which allows a nonzero probability for the least fit member to survive) from the previous population and the fitter individual of the pair is selected as a parent solution. The default population size is 200, and the default pool size for both operators is 2d, where d is the dimensionality of the problem space.
4.3 Demonstration of Rotational Bias in Standard Approaches
Salomon [Salomon 96] has previously demonstrated the rotational bias in CHC [Eshelman 1993], and how EP shows a far lesser degree of performance degradation over rotated landscapes. The following analysis reproduces and extends the work of [Salomon 96], including analysis of Fast EP and the proposed PCGA system.
4.3.1 BLX-α Rotational Bias
Table 4.3 provides an empirical comparison of a BLX-0.5 GA system under rotation and without rotation. Interestingly, the analysis demonstrates that the BLX-0.5 operator actually performs consistently better on a number of functions when the landscape is presented in a rotated manner. The Clover & Cross and Yip & Pao functions especially exhibit enhanced performance when rotated. Analysis of the Yip & Pao function shows that a rotated presentation aligns local optima such that movement along the rotated axes is more likely to move from optimum to optimum. No similar explanation presents itself for the increased performance on the Clover & Cross function under rotation. The pattern of these two functions demonstrating enhanced performance under rotation remains consistent across all the systems tested here, except PCGA.
Table 4.3 Performance Comparison of BLX-0.5 GA under Rotated and Non-Rotated Presentation
4.3.2 EP and Fast EP Rotational Bias
Table 4.4 and Table 4.5 compare standard EP and Fast EP systems under rotated and non-rotated landscapes. Interestingly, both the EP and Fast EP systems demonstrate more rotational bias than the BLX-0.5 system. Fast EP shows increased rotational bias, both in the magnitude of the performance differences and in the number of functions where a statistically significant performance difference is observed, based on ranksum values. Again we see that the Clover & Cross and Yip & Pao functions consistently demonstrate enhanced performance under rotation.
A potential explanation for the rotational biases displayed here is that the high levels of elitism in these systems may effectively allow a form of inductive search, where each parameter is fine-tuned individually while the others are held relatively constant. Inductive search typically proceeds best when parameters are relatively independent. (Rotation of the axially aligned function space may also be seen as recasting the encoding such that individual parameters become less independent.) This would be consistent with the observation that Fast EP enjoys a greater performance boost, in that the larger mutative jumps provided by the Cauchy samples would potentially allow quicker tuning of individual parameters. Note, however, that there is insufficient evidence to completely substantiate this hypothesis, and the actual cause of this rotational bias is likely to be much more complex. This subject bears further scrutiny in the future; for now, we simply conclude that both EP and Fast EP systems demonstrate considerable rotational bias.
Table 4.4 Performance Comparison of EP under Rotated and Non-Rotated Presentation
Table 4.5 Performance Comparison of Fast EP under Rotated and Non-Rotated Presentation
4.3.3 Principal Component GA Rotational Bias
Examination of Table 4.6 makes clear that the use of principal component guided operators provides the least bias under rotation of all systems examined. Those functions which still revealed a rotational bias were those which provide either strong trenches of attraction along the aligned axes, such as the Trapezoid and Cross and Clover and Cross functions, or those which provide independent, relatively under-constrained parameters across the non-rotated axes. (Interestingly, the Clover and Cross function demonstrates increased performance in its non-rotated presentation for this system, in opposition to the results for all other tested systems.)
Only the Schaffer Modification 1 function demonstrates increased performance under rotation. Given that this function presents a rotated ellipsoid structure, a given random rotation may possibly provide better alignment to the axes of this hyperellipsoid. The increased performance on the exponential and inverse exponential functions seems to reinforce this conclusion. Therefore, we can conclude that PCGA does not show significant rotational bias except in cases of functions with hyperellipsoid-shaped isobars or long axially aligned trenches.
Comparison with the results from other system tests demonstrates that PCGA does effectively present a greatly reduced level of rotational bias. Therefore, we see that the level of guidance extracted from the population remains fairly consistent under both rotated and non-rotated presentations. An open question remains as to whether this guidance is effectively equivalent to a random rotational presentation. This issue is addressed in section 4.6.
Table 4.6 Performance Comparison of PCGA under Rotated and Non-Rotated Presentation
4.4 Empirical Relative Efﬁciency Comparison of PCGA
The following comparisons show the relative effectiveness of the PCGA search system as compared to EP, Fast EP, and BLX-α on the test problems listed in Chapter 2. The intention here is not to demonstrate dominance of one technique or another, but rather to demonstrate that the PCGA system is relatively as strong as the others and in some situations outperforms them. Note that, as previously discussed in section 4.1.1, all test functions evaluated here are presented in rotated and translated fashion to the individual search systems.
4.4.1 Comparison to EP and Fast EP
Analysis of the data in Table 4.7 and Table 4.8 shows that PCGA is capable of favorable performance on a number of functions. If we attempt to categorize the functions on which PCGA seems to provide enhanced performance, we see that they are largely unimodal functions, or functions with strong unimodal components. Conversely, EP and Fast EP provide better performance on the third and fourth modified versions of Schaffer's function, the Multimodal Spiral function, and the Double Cos and Worms functions, all of which are strongly multimodal functions with very low signal-to-noise ratios in terms of information pointing toward the global optima.
The apparent reason for the reduced relative performance of PCGA on these functions is the lack of strong feedback toward a single optimum. This allows the system to spread the population further and further without significant convergence toward a single optimum. This demonstrates a potential danger of the effective "feedback" of variance through the PCGA operators. Under circumstances where selection is not likely to significantly reduce the population variance, these operators are likely to continue increasing the variance of the population. This hypothesis has been verified through preliminary measurements of the population variance on these landscapes over time.
Note that this feature is not necessarily a negative one. When the landscape is sufficiently chaotic, it may be more useful to continue exploration of the space than to force exploitation of the single most promising peak. However, it is possible for PCGA to completely fail to converge under such circumstances. Ideally, we desire to have more direct control over the balance of exploration and exploitation. This is a topic we will explore again in section 4.10.
The results in Table 4.7 and Table 4.8 demonstrate that use of population sample information is often as effective as or more effective than use of a self-adaptive mechanism. However, we cannot determine whether the enhancement is provided by use of the population sampled variance for the mutative step size or by the ability to bypass the bases of encoding. Still, these results show that using the population for guidance is an equally valid approach under certain circumstances.
Table 4.7 Performance Comparison of EP and PCGA
Table 4.8 Performance Comparison of Fast EP and PCGA
4.4.2 Comparison to BLX-α GA
Given the similarity in the approaches of BLX-α and PCGA, it is not surprising that both systems demonstrate similar performance on the given test functions. Table 4.9 illustrates the effective performance differences between these two systems. The individual graphic performance analyses for individual test functions in section 4.13 further reinforce their similarity, since the PCGA and BLX-α curves are very similar in many cases. While several possible trends appear in the individual strengths of these two systems, there do not seem to be clear characteristics or trends which allow us to categorize the general relative preferences of either system. Coupled with the previously presented data on rotational bias, we can conclude that PCGA achieves a similar level of performance with less rotational bias on these test cases. Conversely, this implies that BLX-α provides enhanced performance under circumstances where parameters are known to be fairly independent (e.g. on the non-rotated test cases).
Table 4.9 Performance Comparison of BLX-α GA and PCGA
4.5 Empirical Evaluation of PCX and PCGM in Isolation
Studying the PCX and PCGM operators in concert provides some measure of validation that the operators offer performance similar to standard evolutionary computation approaches with less apparent bias, and potentially faster convergence. It is entirely possible that the majority of the behavior of this system is provided by only one of the two PC operators. By comparing the performance of PCGA systems with one of the two operators disabled, we can determine whether the performance of the system is provided primarily through PCX or PCGM, or arises as an emergent property of the two when used in concert.
From evaluation of Table 4.10 and Table 4.11, we can see that neither the PCGM mutation operator nor the PCX operator working alone provides as much search capability as the combination of the two. All systems use a scale factor of 3 for the PCGM mutation operator range. Interestingly, direct comparison between PCX-only and PCGM-only performance, as demonstrated in Table 4.12, shows that PCX performs better on twice as many of the test functions as the PCGM operator. This makes some sense, as the level of exploration under PCX should be relative to the distance between the parents, while the level of exploration under PCGM will be one third of that when the scale factor is 3.
Consistent with the previous discussion (Section 4.4.1) on the divergent behavior of PCGA on multimodal landscapes with relatively weak global bias, we note that the PCGM-only and PCX-only systems tend to outperform the combination on these functions. This is the expected result, since these systems provide reduced variance addition, and therefore have a lower probability of overwhelming the variance-reducing effects of selection.
Table 4.10 Performance Comparison of the PCX-only System and Full PCGA
Table 4.11 Performance Comparison of the PCGM-only System and Full PCGA
Table 4.12 Performance Comparison of the PCX-only and PCGM-only Systems
232 
4.6 Empirical Comparison of Principal Component Operators and Randomly Rotated Operators
A question arises as to whether the rotation applied by the principal component analysis of the covariance matrix of the population sample provides useful search information or is simply an expensive form of basis randomization. In order to evaluate the value of using the principal component basis, we can compare the performance of an identical system where the basis set is selected at random. To accomplish this, we replace the population sample in the principal component analysis phase with a set of randomly selected points. The variance of the actual population sample is then computed across the basis computed during the principal component analysis (i.e., the sample is rotated into the random basis and the variance across each axis is calculated). Both systems use the measured population variance to determine mutative step size. The only difference between the two systems is the method by which the basis set for operator application is derived.
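To make the comparison concrete, the following minimal sketch (our illustration in Python, not the code used for these experiments; all names are hypothetical) derives both a principal component basis from a population sample and a random orthonormal basis, then measures the sample variance along each basis axis:

```python
# A minimal sketch, assuming numpy; not the dissertation's implementation.
import numpy as np

def pca_basis(sample):
    """Principal component basis: eigenvectors of the sample covariance matrix."""
    cov = np.cov(sample, rowvar=False)
    _, vectors = np.linalg.eigh(cov)   # columns form an orthonormal basis
    return vectors

def random_basis(dim, rng):
    """Random orthonormal basis via QR decomposition of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def axis_variances(sample, basis):
    """Rotate the sample into the basis and measure variance along each axis."""
    return (sample @ basis).var(axis=0, ddof=1)

rng = np.random.default_rng(0)
sample = rng.standard_normal((30, 5))                 # hypothetical population sample
pca_var = axis_variances(sample, pca_basis(sample))   # population-guided basis
rand_var = axis_variances(sample, random_basis(5, rng))  # randomized basis
```

Both variants then use the measured per-axis variances to set the mutative step size; only the basis derivation differs.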
The data in Table 4.13 demonstrate that using the principal component analysis of an actual population sample provides better performance characteristics on nearly all of the tested functions. We can safely conclude that the use of population guidance does not merely provide a more expensive form of basis randomization, but rather provides actual useful information in determining future search directions for a majority of these functions.
[Table 4.13 Performance Comparison of a Randomly Rotated System and PCGA (best evaluations, means, and ranksums per function). The rotated table is not legible in the source scan.]
4.7 Evaluation of the Effect of Scale on PC Operators
Using the direct variance of the p0pu1ation as the basis for the variance of thc
mutative distribution effectively doubles the variance of the portion of the populatior
which is mutated. If this mutation is too large to be compensated for through selection.
the population will fail to converge. Ideally, this mutation size should be calibrated
directly against the level of selection pressure, a concept which we more fully address in
section 4.10.
Two possible methods for reducing the size of the mutation distribution are to reduce the rate of application (i.e., mutating only a portion of the population each generation) and to rescale the mutative distribution. The first option has the tendency to increase the effective level of elitism as the population converges, greatly modifying the character of the search process. The second approach has the benefit of maintaining more even mutation while reducing the mutation level; both options are sketched below. The following data examine several alternative mutative scales and their effects on system performance. The presented values are the average best performance of 90 out of 100 runs (the 5 best and 5 worst results are discarded). The shadings indicate which (if any) of the systems demonstrates a statistically significant ranksum value when compared across all other represented systems.
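Before turning to the data, the two reduction options above can be sketched as follows (a minimal illustration assuming numpy; following the variance argument in the next paragraph, the scale factor is taken to divide the mutative variance):

```python
# A minimal sketch of the two variance-reduction options; assumes numpy and
# assumes the scale factor divides the mutative variance.
import numpy as np

def mutate_partial(pop, variance, rate, rng):
    """Option 1: apply full-variance mutation to only a fraction of the population."""
    noise = rng.standard_normal(pop.shape) * np.sqrt(variance)
    mask = rng.random(len(pop)) < rate          # which individuals get mutated
    return np.where(mask[:, None], pop + noise, pop)

def mutate_rescaled(pop, variance, scale, rng):
    """Option 2: mutate every individual, with the distribution rescaled by 1/scale."""
    noise = rng.standard_normal(pop.shape) * np.sqrt(variance / scale)
    return pop + noise
```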
[Table 4.14 Effect of Mutative Scale on PCGA Performance (mean best results for several scale settings). The rotated table is not legible in the source scan.]
As an interesting note, given that the formula for the variance of a uniform distribution over an interval of length w is w²/12, the effective average variance for BLX-0.5 is then 2²/12 = 1/3 when the distance between the two parent solutions is 1 (the sampling interval then having length 2). Similarly, if the average variance is 1, a scale value of 3 would provide the same mutative variance size (however, centered about a parent, not the center). So it is possible that the relative scale of the mutative size is the main factor in the success of BLX-0.5.
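The 1/3 figure is easy to verify numerically; the snippet below (illustrative only, assuming BLX-0.5 samples uniformly over the parent interval extended by half the parent distance on each side) confirms it:

```python
# Numeric check of the uniform-variance argument for BLX-0.5 (illustrative).
import numpy as np

rng = np.random.default_rng(0)
p1, p2, alpha = 0.0, 1.0, 0.5           # parents at distance 1
span = abs(p2 - p1)
lo = min(p1, p2) - alpha * span          # extended interval: length 2
hi = max(p1, p2) + alpha * span
children = rng.uniform(lo, hi, size=1_000_000)
print(children.var())                    # approximately 1/3 = 2**2 / 12
```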
4.8 Evaluation of the Effect of Sample Size on PC Operators
In exploration of the tunable features of the PCGA system, the previously raised
question (Section 4.2.3) of the optimal sample size for the required principal component
analysis presents itself. The data in Table 4.15 demonstrates the effective relative
performance of identical PCGA systems with various pool sizes. All tests were
performed using a population size of 200 and a scale factor of 3. The presented values
are the average best performance of 90 out of 100 runs (the 5 best and 5 worst results are
discarded). The shadings indicate which (if any) of the systems demonstrates a
statistically signiﬁcant ranksum value when compared across all other represented
systems.
Note that for the majority of the test cases evaluated, a smaller pool size is preferable. A possible reason for this is that a smaller pool is less likely to contain a population outlier, and therefore will have smaller variance measures. This seems especially likely given that those functions which demonstrate enhanced performance include those which we have previously shown to react favorably to a reduction in variance addition in PCGA systems.
For the majority of the tested functions, there is no statistical difference in the
performance of these variants. Further, the magnitude of the performance difference on
the remaining functions is relatively small. Nonetheless, this data supports the use of a
minimal pool size within the restrictions of singularity avoidance issues as discussed in
Section 4.2.3.
[Table 4.15 Effects of Sample Pool Size on PCGA Performance (mean best results for four pool sizes). The rotated table is not legible in the source scan.]
4.9 Evaluation of the Effect of Population Size on Population Relative Operators
An expectation of evolutionary computation systems is that as the population size increases, the level of diversity is maintained for a longer duration. The typical result is that performance increases as population size is increased for an equal number of generations. However, since each generation requires more evaluations when a larger population is used, the net gain for an equal number of evaluations will not favor the largest population sizes either, unless the search system performs no better than random search (e.g., a needle-in-a-haystack function, or a completely random function).
EC systems which employ population relative operators, such as the BLX-α and PCGA systems being studied here, are dependent on population relative statistics to determine their actions. Therefore, such systems may be especially sensitive to the choice of population size. For example, a large population tends to maintain diversity longer, which equates to lower levels of variance loss through selection. If selection becomes sufficiently weakened in comparison to the variance addition operators, the population may fail to converge. In this section, we present empirical test results from PCGA and BLX-α systems with various population sizes.
4.9.1 PCGA Population Sensitivity
Table 4.16 presents the results of identical PCGA systems operating with population sizes of 50, 100, 200, and 400, respectively. Each system executed an identical number of evaluations, and the average best result over 90 of 100 runs (discarding the 5 best and 5 worst results) for each system on each function is reported. Systems demonstrating statistically significant ranksum values are highlighted.
PCGA systems demonstrate definite sensitivity toward population size selection. The magnitude of performance difference and the number of functions which show significant change are both far greater than in the previous comparison with modified pool sizes. The population size 200 variant shows clear dominance over the other selections. Further, tests which ran the larger 400 population size variant for the same number of generations (therefore having twice the number of evaluations) continued to show markedly and significantly reduced performance results. The reason for this performance reduction has been shown to be the reduction in the relative selection strength, through measurement of average relative mutation magnitudes between the two systems (i.e., the 400 population size system does not converge as well overall).
While we expect any EC system to show eventual preference for a given
population size for a given landscape in terms of optimal search efﬁciency, analysis of
PCGA performance here indicates that the system has an additional level of sensitivity.
This result implies that selection of an appropriate population size may be extremely
important when using PCGA. This result may extend to other systems which employ
population relative operators as well.
[Table 4.16 Effects of Population Size on PCGA Performance (mean best results for population sizes 50, 100, 200, and 400). The rotated table is not legible in the source scan.]
4.9.2 BLX-α GA Population Sensitivity
The analysis of the PCGA population sensitivity raises the question as to whether this is a standard feature of all population relative operators. Table 4.17 demonstrates the same test using the BLX-α GA system. Note that this system does not appear as directly sensitive to population size, as increasing the population (at least to the 400 per generation level) does not seem to adversely affect performance. Upon closer examination, the BLX-α system only appears to exhibit population limiting effects on two of these test functions, Dynamic Control and Rastrigrin.
We cannot establish a direct correlation between population relative operators and population size sensitivity, though the existence of such a relationship is apparent with PCGA systems. Possibly other factors, such as the shape of the search distribution (normal vs. uniform), act as a mitigating factor. No general conclusion can be reached for the general approach of population relative operators at this juncture.
[Table 4.17 Effects of Population Size on BLX-α GA Performance (mean best results for population sizes 50, 100, 200, and 400). The rotated table is not legible in the source scan.]
4.10 Loss-Sensitive Operators for EC
As Sections 4.7 and 4.9 demonstrate, one of the difficulties with population relative operators is adapting the scale of the mutation appropriately. An alternative approach would be to scale the amplitude of the mutative response to the magnitude of loss inflicted by the selection process. We will categorize such approaches as variance recapture or VR operators. In general, a VR mutation operator will measure the variance of the previous generation, σ²_{g-1}, and the variance of the selected parent pool, σ²_p, for the current generation (note that this assumes that the entire selected parent pool can be determined before operator application begins, which requires a minor reorganization of some standard EC systems). A reactive VR mutation applies a mutative random sample for each produced child such that the total variance of the next population will be the variance of the parent population, σ²_p, plus some percentage, t, of the variance loss (σ²_{g-1} - σ²_p). This is achieved by selecting the mutative distribution such that the variance of the mutation is proportional to t(σ²_{g-1} - σ²_p). Assuming that the two samples are independent, the sum of samples from the two distributions will have a variance equal to the sum of the variances of the two distributions. Note that this implies use of a mutative distribution with finite variance (therefore we cannot directly apply a Cauchy mutative distribution without violating the design principle of VR operators).
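A minimal sketch of such a reactive VR mutation step follows, assuming numpy and taking the proportionality constant as 1 for simplicity; this is our illustration, not the implementation used for the experiments below:

```python
# Reactive VR mutation sketch: size the normal mutation so that it restores a
# fraction t of the per-dimension variance lost to selection. Assumes numpy;
# the proportionality constant is taken as 1 for simplicity.
import numpy as np

def vr_mutate(parents, prev_gen_var, t, rng):
    parent_var = parents.var(axis=0, ddof=1)           # variance of the parent pool
    loss = np.maximum(prev_gen_var - parent_var, 0.0)  # variance lost to selection
    sigma = np.sqrt(t * loss)                          # finite-variance normal step
    return parents + rng.standard_normal(parents.shape) * sigma
```

Since the parents and the added noise are independent, the mutated pool's per-dimension variance is approximately parent_var + t * loss, which is exactly the recapture condition described above.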
4.10.1 Variance Recapture and Convergence
The goal of most evolutionary computation systems is to locate the global optima
for a given function through search. However, since we cannot know the location of the global optima in advance, most EC systems make the assumption that the global optima will be located in the neighborhood of other near-optimal solutions. Therefore, a
secondary goal of most evolutionary computation systems is gradual convergence on the perceived area of the best solutions in order to increase the probability of finding the global optima. In terms of VR operators, this implies that the percentage of recapture must, over the long term, be less than 100% if we expect to allow for convergence. (That is, if we force more variance into the population than selection seems capable of eliminating, the population will be prevented from converging.) Note that it is possible for short-term bursts of recapture to target greater than 100% of the lost variance, which allows for the possibility of annealing-type cooling curves; however, as this greatly increases the potential scope and level of parameterization, we will restrict ourselves here to fixed-target VR operators and leave exploration of dynamic VR targeting to future work. A side effect of the decision to use fixed VR targets is that the initial range should encompass the global optima, since it may be difficult for the search to progress very far outside this boundary.
While the derivation of variance recapture may seem intuitive, it appears to run counter to standard EC convention. Typical EC systems tend to achieve large levels of convergence early in the evolutionary process, which tapers off as diversity drops (a necessary side-effect of convergence). In contrast, VR operators eschew early convergence for a slower steady-state convergence rate in the hope that the accompanying added diversity provides greater long-term payoffs. Additionally, the steady-state nature of VR systems allows for a form of resource scheduling, in that an experimenter may determine how much time may be spent in exploration and therefore how slow the convergence rate may be (provided selection pressure is sufficient to provide at least that level of variance reduction).
VR operators should be good candidates for both EP and GA systems (specifically, systems with +- and ,-style selection); however, in preliminary empirical testing VREP systems greatly outperformed VRGA systems. Upon inspection, it appears that the variance loss due to selection maintains better stability under the high levels of elitism possible with +-selection. With ,-selection, the reaction of VR operators appears to directly affect the level of variance loss due to selection in the next generation; therefore the population seems to oscillate between generations with high exploration and low loss and generations with high selection and low exploration. The VR systems used for the results presented here are exclusively VREP systems.
4.10.2 Global versus Dimensional Variance Targets
The concept of variance recapture seems to provide a choice of using either an
overall global population variance target (i.e. dimensionless variance), or individual
variance targets for each parameter. If the expected parameter ranges are well known,
then individual parameter variance targets seem ideal. However, by forcing the variance
of a given parameter to remain high, we may be unduly retarding the progress of the
search. For example, in relatively under—constrained problems where one parameter has
an exponentially larger effect on the evaluative value of a given solution than other
parameter, forcing one parameter to maintain a high variance can effectively mask the
contributions of other parameters for a long time. Nonetheless, individual parameter
variance targets provide better search characteristics than use of a single global target.
~247 
The difﬁculty in employing a global variance goal becomes apparent if we
consider problems with free parameters. If a given parameter is free, making no contribution toward the evaluative value of a solution, then we might expect the variance of this parameter to remain larger than that of parameters which receive some form of direct selective pressure. Further, we should expect that the level of variance loss across this dimension
would be less than that on other dimensions, again due to the lack of direct selective
pressure. The cumulative effect produced is that the population maintains its overall
variance by shifting randomness out of the other dimensions and into the free dimension.
We might expect this situation to provide very poor search characteristics in problems
which are relatively underconstrained. In such situations, variance is shifted out of
highly sensitive parameters and into less sensitive dimensions. The less sensitive
dimensions are then swamped with excess variance which slows the effectiveness of the
search. While this mechanism seems to provide a method to measure the relative
sensitivity of the various dimensions, it does not directly provide an effective mechanism
for directing the magnitude of mutative steps.
4.10.3 Difficulty with Non-fixed Axes
While ideally we would like to employ both the techniques of variance recapture
and population guided basis selection, it is difﬁcult to formulate a method whereby
variances across one set of axes are transferred as targets across another basis. The entire
covariance measure could be transferred; however, using this rotated covariance matrix
has the effect of ignoring the new basis. Therefore, the combination of these methods
seems to require use of a single global variance measure. However, given the severe limitations of global variance targets in the presence of free or underconstrained parameters as discussed previously, the combination of these methods does not appear to be easily reconcilable. Further study in this area is left for future work.
4.11 Empirical Relative Efficiency Comparison of VREP
For the purposes of empirical comparison, a VREP system was created which operated identically to the standard EP system previously employed, with the elimination of the self-adaptive parameters. The self-adaptive mutation is replaced with a VR mutation operator which fixes the variance of the normal sampling applied on each dimension as proportional to the variance recapture target percentage, t, the variance loss, (σ²_{g-1} - σ²_p), and the number of children produced per parent, c, by the formula in Equation 4.5, which results in a target +population with the required level of variance.
σ²_m = (1 + 1/c) · t · (σ²_{g-1} - σ²_p)

Equation 4.5 Formula for VR Mutation Distribution Variance Calculation
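Read this way (the (1 + 1/c) factor is our reconstruction of the garbled source and should be treated as an assumption), the per-dimension mutation variance can be computed as in the following sketch:

```python
# Per-dimension VR mutation variance per the reconstruction of Equation 4.5;
# the (1 + 1/c) factor is an assumption recovered from context, not verified.
def vr_mutation_variance(prev_gen_var, parent_var, t, c):
    return (1.0 + 1.0 / c) * t * max(prev_gen_var - parent_var, 0.0)

# Example: 99.8% recapture target, 6 children per parent, unit variance loss.
print(vr_mutation_variance(2.0, 1.0, 0.998, 6))   # ~1.164
```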
All other parameters for the VREP systems were maintained identically to the EP and fast EP systems previously evaluated (population of 100, 6 children per parent, EP tournament selection with 10 votes per solution) unless otherwise specified. Also, as with all other systems evaluated in this chapter, the performance over the same initial 100 populations and the same 100 random rotations was measured.
4.11.1 Comparison to EP
As previously stated, the VREP system evaluation is parallel to the EP evaluation. With the exception of the form of mutation, there is very little difference. The same code and testing environment was used in both cases. Therefore, the results in Table 4.18
should hopefully reﬂect as unbiased an evaluation between the effectiveness of the two
mutation operators as is possible within the given set of test problems. Note that as
always we must be careful with any extrapolations we attempt to make from this
evaluation since NFL dictates that all search techniques must necessarily be equal over
the set of all possible problem instances.
The actual VREP system tested used a 99.8% recapture target, hence the label VREP-0.998. As the data in Table 4.18 demonstrate, the VREP-0.998 system greatly outperforms EP on a majority of the test functions.
[Table 4.18 Performance Comparison of EP and VREP-0.998 (best evaluations, means, and ranksums per function). The rotated table is not legible in the source scan.]
4.11.2 Comparison to Fast EP
Again, in order to provide a fair comparison against general EP style mutative techniques, we include a direct evaluation of VREP and Fast EP using Cauchy mutative sampling. As in the evaluation of PCGA, we note that a fair comparison would be between a modified VREP using Cauchy mutative sampling with VR mutation; however, as discussed in the introduction to this section, use of the Cauchy distribution violates the spirit of the design of VREP. Nonetheless, such a system may be feasible (after all, fast EP does not seem to suffer from divergence even while employing an infinite variance mutative distribution). Such exploration is again left to future work, and we herein present a direct comparison of Fast EP and VREP using a normal mutative sampling distribution.
As with the standard EP approach, the VREP system greatly outperforms Fast EP on a clear majority of the test functions. In fact, Table 4.19 demonstrates that Fast EP outperforms VREP only on the same functions that the standard EP approach does. This may indicate a particular match between EP methodology and these functions. Additionally, these three functions, Clover and Cross, Schaffer Modification 3, and Schaffer Modification 4, are among those on which the randomly rotated version of PCGA was not outperformed by PCGA, although the relationship between the two occurrences, if any, is unclear.
[Table 4.19 Performance Comparison of a Fast EP System with VREP-0.998 (best evaluations, means, and ranksums per function). The rotated table is not legible in the source scan.]
4.11.3 Comparison to BLX-α GA
Table 4.20 demonstrates that the BLX-α GA and VREP approaches each outperform the other on roughly half of the tested functions. The BLX-α GA seems to outperform mostly on functions with a strongly unimodal component, while VREP tends to outperform on complex multimodal landscapes. This outcome is logically consistent with the design of the VR mutation operator, in that VREP intentionally delays convergence. Therefore, BLX-α is able to converge more quickly on strongly unimodal landscapes, while VREP maintains more diversity and performs more exploration, resulting in the location of better optima.

4.11.4 Comparison to PCGA

The data in Table 4.21 demonstrate a strong similarity to the comparison of BLX-α and PCGA, except that the PCGA system outperforms on a slightly larger number of functions. Similarly, we again see that VREP seems especially well suited to the most complex multimodal search landscapes.
[Table 4.20 Performance Comparison of a BLX-α GA and VREP-0.998 (best evaluations, means, and ranksums per function). The rotated table is not legible in the source scan.]
[Table 4.21 Performance Comparison of PCGA and VREP-0.998 (best evaluations, means, and ranksums per function). The rotated table is not legible in the source scan.]
4.12 Effect of Recapture Percentage on VREP Efficiency
As with the PCGA operators, it is instructive to evaluate how the “tunable”
parameters of the operator affect system performance. In this case, a crucial parameter
seems to be the percentage target for variance recapture. Table 4.22 presents results for 4
selected recapture targets. The systems tested were identical in all aspects except for the
recapture target. The presented values are the average best performance of 90 out of 100
runs (the 5 best and 5 worst results are discarded). The shadings indicate which (if any)
of the systems demonstrates a statistically signiﬁcant ranksum value when compared
across all other represented systems.
The 99.8% variant seems to outperform all other tested systems over the
majority of tested functions. Note that those functions which responded favorably to a
reduced recapture target are those which also tended to be dominated by EP and Fast EP,
although there is no clear connection apparent (other than the possibility that EP and Fast
EP can more rapidly reduce their mutative magnitudes).
[Table 4.22 The Effect of Recapture Targets on VREP Performance (mean best results for four recapture targets, including 99.8%). The rotated table is not legible in the source scan.]
4.13 Effect of Population Size on VREP Efficiency
As with PCGA and BLX-α in Section 4.9, in this section we explore the effects of population size on VREP performance. Table 4.23 demonstrates the effective performance differences for 4 different population size choices. The data again support a strong sensitivity to the choice of population size. Since the recapture target is relative to the operation of selection, and the effect of selection is dependent on the population size, the VR operator is sensitive to population size in a similar fashion to the PCGA operators. However, unlike the PCGA operators, the VR operator cannot cause direct population divergence as long as the target rate remains below 100%.
[Table 4.23 The Effect of Population Size on VREP Performance (mean best results for four population sizes). The rotated table is not legible in the source scan.]
4.14 Graphic Performance Comparison of Systems
In order to demonstrate the effective relative operation of the major systems tested in this chapter, we have provided the following graphs. The first of each pair of graphs shows a semi-log trace of the average best evaluation value found for 90 out of 100 runs (discarding the 5 best terminal results and the 5 worst terminating cases). The second of each pair demonstrates the relative ranksum value for the same 90 out of 100 runs at the given number of generations (i.e., if the runs had been terminated at that point, these would be the assigned ranksum values).
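As we read this statistic, the ranksum for a system on a given function is obtained by pooling the 90 retained best-of-run values from all compared systems, ranking the pooled values, and summing each system's ranks (lower is better); the sketch below (assuming scipy and hypothetical data) illustrates the computation:

```python
# Pooled rank-sum sketch (our reading of the plotted statistic; illustrative).
import numpy as np
from scipy.stats import rankdata

def ranksums_by_system(results):
    """results: dict mapping system name -> array of best-of-run errors."""
    names = list(results)
    pooled = np.concatenate([results[n] for n in names])
    ranks = rankdata(pooled)                  # average ranks used for ties
    sums, start = {}, 0
    for n in names:
        k = len(results[n])
        sums[n] = ranks[start:start + k].sum()
        start += k
    return sums
```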
[Figure 4.1 Average Best Performance on Square; Figure 4.2 Ranksum Performance on Square. Each graph plots average error (log scale) or ranksum (x1000) against evaluations (x1000) for the EP, Fast EP, BLX-0.5, PCGA, and VREP systems; the plotted traces are not recoverable from the scan.]
[Figure 4.3 Average Best Performance on Sphere; Figure 4.4 Ranksum Performance on Sphere. Traces not recoverable from the scan.]
[Figure 4.5 Average Best Performance on Schwefel's 1.2; Figure 4.6 Ranksum Performance on Schwefel's 1.2. Traces not recoverable from the scan.]
[Figure 4.7 Average Best Performance on Schaffer; Figure 4.8 Ranksum Performance on Schaffer. Traces not recoverable from the scan.]
[Figure 4.9 Average Best Performance on Schaffer Mod. 1; Figure 4.10 Ranksum Performance on Schaffer Mod. 1. Traces not recoverable from the scan.]
[Figure 4.11 Average Best Performance on Ring; Figure 4.12 Ranksum Performance on Ring. Traces not recoverable from the scan.]
[Figure 4.13 Average Best Performance on Trapezoid & Cross; Figure 4.14 Ranksum Performance on Trapezoid & Cross. Traces not recoverable from the scan.]
[Figure 4.15 Average Best Performance on Rosenbrock Saddle; Figure 4.16 Ranksum Performance on Rosenbrock Saddle. (Panel titles read "Rosenbrock Banana Function.") Traces not recoverable from the scan.]
[Figure 4.17 Average Best Performance on Schaffer Mod. 2; Figure 4.18 Ranksum Performance on Schaffer Mod. 2. Traces not recoverable from the scan.]
[Figure 4.19 Average Best Performance on Spiral; Figure 4.20 Ranksum Performance on Spiral. Traces not recoverable from the scan.]
[Figure 4.21 Average Best Performance on Ackley; Figure 4.22 Ranksum Performance on Ackley. Traces not recoverable from the scan.]
[Figure 4.23 Average Best Performance on Griewangk; Figure 4.24 Ranksum Performance on Griewangk. Traces not recoverable from the scan.]
[Figure 4.25 Average Best Performance on Clover & Cross; Figure 4.26 Ranksum Performance on Clover & Cross. Traces not recoverable from the scan.]
[Figure 4.27 Average Best Performance on Bohachevsky; Figure 4.28 Ranksum Performance on Bohachevsky. Traces not recoverable from the scan.]
[Figure 4.29 Average Best Performance on Rastrigrin; Figure 4.30 Ranksum Performance on Rastrigrin. Traces not recoverable from the scan.]
[Figure 4.31 Average Best Performance on Yip & Pao; Figure 4.32 Ranksum Performance on Yip & Pao. Traces not recoverable from the scan.]
[Figure 4.33 Average Best Performance on Schaffer Mod. 3; Figure 4.34 Ranksum Performance on Schaffer Mod. 3. Traces not recoverable from the scan.]
[Figure 4.35 Average Best Performance on Schaffer Mod. 4; Figure 4.36 Ranksum Performance on Schaffer Mod. 4. Traces not recoverable from the scan.]
Multimodal Spiral Function
10
._L
l
Sysnm
EP
""""""" Fast EP
"""""" BLXO.5
............ _ PCG A
 VREP
O
Average Error (log)
‘.'iII0I.oy..
I ﬁt I I
71 95 119
Evaluations (x1000)
Figure 4.37 Average Best Performance on Multimodal Spiral
Multimodal Spiral Function
36 l “._.. System i .
.,..._,..___ .. . .L ‘39:}: 77..."
.4 h* — ‘v....f;..='l—.:.."o'..: ‘s— ”m. 4 EP L? l c‘ .
"""" * W.” ..........
29 ...._._... __“? Fast EP
rh‘ W‘ ‘ ev~ ' c ’  ~ b—
.....I‘:  I]  BLXo.5 :::t:
8 ‘ ”HI” .41... j . ...........  PCGA Eu.“
9 . TEL'W..‘ . "_::
o +..._..._. .... —————— VREP
.— 22
._....ﬁm..” 1....
...,“..H‘ ....... 7w _
E _...rmﬂu‘w.. _
a +1. .M— N  . f: “ "m"::*_‘.m:..:T.:,;;€§i
x 1 ......“m.. — ~ L3,. ..," .. . .... — _____ . Iii‘3‘.”.":.' :— ....
C _.....9‘...“ V... ﬁs.. . _ ___...5AAA'QP. .....   .. . . _ _ _ .. ....
g L.  m...‘o.. .. ... , .— ... ...., ..._...mum.
1'1: ' ﬂ 0:” _ “ M  . .—
0:. A ..M:;o«\r"—~~ .. ___
.. ._. . 0. .. . ~
8 1.. ' "77 x _1':‘*""“"*‘.‘ “ t ..:".':1::'1”f:""
q ban . . . _..7 .ﬁ _..g...._.;.—. ""d' f...
6 I I I1 I T I —I I F f T —F I
23
i—I I
47 71 119
Evaluations (x1000)
95
Figure 4.38 Ranksum Performance on Multimodal Spiral
~280 
Average Error (log)
Ranksum (x1000)
Fred uenoy Modulation Sounds Function
10
10
""""""" Fast EP
............ BLX_05
 ........... _ PCG A
 VREP
I I I I I I I I I I I I T I I
23 47 71 95 119
Evaluations (X1 000)
Figure 4.39 Average Best Performance on FMS
Frequency Modulation Sounds Function
36
.7} ”I"  . ‘ ——V— :E!.~mvnvmv«OOOOMn
1 5 ”"1 ___._ _W ___._
g I _
a I _ _ ~___ _.. __
K —.  ~ * wag=
1_S “ﬁf2235531”. if _ _ n.
X _, .
8 1:. .. ._.“.. "WW”:T"W”M
two..” a": ._Zf::' MW W.:.'::'.:VMMH
i:::t7"m _ 'tf*"mm 'W .H_"_
..I LIV—_...“ _ _ .. .. .... ._....  _...._ ...... ..., ... _.—___._.
6 j 1 I I I I I f I r I I I I I
23 47 71 95 119
Evaluations (x1 0M)
Figure 4.40 Ranksum Performance on FMS
~281
ExponenﬁalFUncﬁon
C)
System
EP
""""""" Fast EP
............ BLXO.5
............ _ PCG A
O
_;
Q
or
O
.LQ‘chuw
)
b9
3
c:8
(Lam
O
A
__L
O
(I;1 I
AverageError(
Docs—58
“Surname:
23 47 71 95 1 19
Evaluations (X1 000)
Figure 4.41 Average Best Performance on Exponential
ExponenﬁalFuncﬁon
SSMLW
r__ ‘ _;;ll._'r:.——,“v. ‘ . .~
)m __....M. ... .....
..“p.'..AIl.. _.... . _...___i.. . ..... . ______ a ........ . . __
‘ ....
. .—._. ‘.‘  v»—v   — —~   l. 
.
29 ' _. '""'"""""..“"I'.'“" .  _.....
A ..._..P. ...... ._........,..
,...._A......_.._. ........._i......_...........,..... ... "...'3 7L) .. .
C.
.I
q _. .‘__..._.\_,_ .~...._...._ ....._ .
22 .  __._.,...__...'al. ... ..
dbﬂl“ t—M~ _. '.—._ ..._. —'a~.—.—u ——————— WI,—xp__._.+n~u_ﬁ._._—.. _.. . ......
.. .  ... .W .. “..
 _...\ 
..—....
Ran ksum (x1000)
i
‘.
1
d —=——=H ~f" .. On.” Q A:;:  ...........  P CGA 3':
4 L....__......n_m‘.m._w
I ﬁTTT—rﬁ l ﬁﬁrjj
23 47 71 95 HQ
Evaluations (x1000)
Figure 4.42 Ranksum Performance on Exponential
282 
Average Error (log)
ChainLink Function
System
EP
""""""" Fast EP
............ BLX_05 _
. ........... _ PCGA —
 VREP
~~~
~—_o
IIIIIIIIIIIIIII—
23 47 71 95 119
Evaluations (x1 0(1))
Figure 4.43 Average Best Performance on ChainLink
Ranksum (x1000)
ChainLink Function
.... Sysm i
—— 5p 41:;
{_ifTi """""""" FastEP £355.13
.. ...._.:_:;':::T:.‘_i‘:  BLXO.5 ::::;
7'.fff"::_:;:i~fli‘:"_'ji'_"iii: ' """""" ‘ PCGA 3..;
 VREP :.;'
— . miz¥—a . W—P'Mﬂ _.....— _ ....  _...—
'
.  . ..._.~...,J..11~.Il.nu. . Ina.‘ . 3 an"
I _______ ___________
  ;;""fT''a''‘m"
' b. ' ————'.V.__ —n.—————u—_
. ...__._...” . ._ . _.. ._.._. _ _
.... .. ..
. ....»‘x.,_.."—__ ._________._,_. _n..._....—......——
.. .. ..,m .. W..__
..~_..— ‘.~ __.____. A
I l l l I
47 71 95 119
Evaluations (x1 0(1))
Figure 4.44 Ranksum Performance on ChainLink
283 
Average Error (log)
0
_a.
O
to
Double Cos Function
...—L
1_
l I I
47 71 119
Evaluations (x1000)
Figure 4.45 Average Best Performance on Double Cos
Ranksum (x1000)
36
29
i;}‘t§g}j‘_{i:.i;I;_'_:'_i_;_._g_r_g—4  Fast EP :55};
Double Cos Function
L  _ ...EH... M—v.h___..u a»
Wag,“ _. . _'. ~  'm—IH'“ sysm
—~~“h.v  .. _....“
15}
............ BLX'0.5
._..
ﬂ _... . ﬁ—qﬂu—T—n .—
. ...... ~  W_*._~_. v.5” ..
‘vauw. _...._  . . _v ... . ..... ....“
—..'——~_..—— 'a' .‘ .‘ ——..—‘~
' s
..=_. ... ...IA.. "lull. I.Al ...I. I. sunnun um“.
.. _ — r.~—~—o
“*Wi — . . ..\v~ . —.   . . .     . . .  _ . .. ~.——..~.~ . .
a4 hr“6v.— .— —  .—.  . —. — __ — ..
~1bow«.v —~— — .— ...»  . ~ . .A...
....___. ...  ., _. .. .. _
<>— ... .. ..._. .. .. .,
1w......> ,_ «‘v .. .
4ru~...‘. _. .— . _
. p‘WﬁC—h ‘w — a  e.~—r— ~— v v .— — ._.cm. a —
In bro—ra— ~\.~m ‘v ——'\ M  a..“ — v.. _ _..
m‘.———‘——_——.
m. — .— — . — . .— —— —— ... _..—n ..
I I I I I I I F I I
47 71
Evaluations (x1000)
Figure 4.46 Ranksum Performance on Double Cos
284 
Inverse Exponential Function
. 101‘
Average Error (log)
°~.   Fast EP _.
I I I I I I I I I ﬁ I I I I I
23 47 71 95 119
Evaluations (x1003)
Figure 4.47 Average Best Performance on Inverse Exponential
Inverse Exponential Function
36 “we...“ .ﬁwww... . . ... .. .... —..‘
. "v—~'..av  u.. ,,....... .. . .. __.... —‘.. ..  .. . . .  ... .. ..«.... . .‘n ~~   .~~  .. .   ~ 
F'uvr'.—. ..‘...,V..  .....,,.._._‘.._.... ._. .  ..— .. .. ._.._ ..  r. —... .. .—
1 .. .. .. .
r—v
——
. ... .. . ———
_~—...w —... . “...—. .....
m~'——hqn—~%. .. .\ ._._~....,,,.. .. ..vuns,... ...—. ... _. ¢—
”wVﬁmrv“ —  '— ‘ ~
’ he l—po‘
~‘\s‘—x‘.‘——dlﬁ ... . ......» .. ...."f' h.. .—.—.~.._.,... _.5... _...
‘
...._ ..F..._. .. . ...
_0IllO
. ....._.  .. .___ . ..“.. . _. .. ..... '.__ A:
.p —
Ranksum (x1000)
i
i
i
51
I I j I r I
23 47 71 95 119
Evaluations (3110(1))
Figure 4.48 Ranksum Performance on Inverse Exponential
285 
Average Error (log)
Ranksum (x1000)
Worms Function
1 0 Sysbm
EP
""""""" Fast EP
............ BLXO.5
............ _ PCG A
IjIIWIIIIIIIIII
23 47 71 95 1 9
Evaluations (x10!!!)
Figure 4.49 Average Best Performance on Worms
_L
Worms Function
36 _ ___...u Sysbm f.
4L———..V—...\—~~.m »>. 4
—‘—i — EP 1.....“
AM..“ ....._._._._.. .... ._—..,..f...‘ u—m.
. ......W. ...,._...
fl“, ‘7é'._.'._'_‘_)—.“._.T',ipﬁ:: ............. Fast EP WK
"""" 3' 'I '
, f _..
’f"*:’:'i:.::;"":_'""1": """""" BLXQS _....
‘ .  . \_—  u ,— ——— a — ... w  — . —~ .., _ . .. ... .. — § "
  _  . . x.. _. ......._ .... __. . _

— y_m—m ‘ . .r 
8  ._.._,. . ....._... \
.. ﬁ...._._... .... . . ._. .... _.
~ —& _ _“ _ 
.,.=_. — ..w..........‘v.r.——4..~
.1 pm“..— ...,—__.
 M~..—A ——~\‘, ... u.—~..'~V   v——' —~.— .  wx ——. ..      .e  —— .., .  — — 4' .   ...  V _ ..,.  _ ..
wag” .  w ___ w ._. . ..— ——.—~.o—a.. . —  ......   ..— .—_  A _ ..,...
_q.__._ ..  s ..  .. ._.. .. _ . ...... .. —.—.——  .....__ .._
‘ﬁww_ ..,. .. .‘<.. ,._._..__,__ _
éjIﬁTIrTIYR—IIITII
23 47 95 119
Evaluations (x1000)
Figure 4.50 Ranksum Performance on Worms
286 
Ranksum (x1000)
10
Average Error (log)
Schwaefei Function
Sysvm
EP
""""""" Fast EP
""""""" BLX0.5
............  PCG A
 VREP
I I f I I I T I T I
T I I I I—
23 47 71 95 119
Evaluations (x1000)
Figure 4.51 Average Best Performance on Schwefel
29
22“
Schwaefel Function
.' ﬁfti'iimwﬂu System 1::
__ .__.__. “Iii": EP
— ~ i'OfOhvM
’W t__:....a_._...__,_i ............. Fast EP
_..... ..
r. ‘ __..—
In..
“3‘  BlX'o.5 ”_m
"_. m—
 .. _..—._.“ W.“ ._ ..
  _...— — ...—
  .. ._.
.. _...  ... W __.—
...— ..“ _ ...... .. ...r ..... .. , .. ....
................................................
........ ..— . .  .. .. .. _.. .—
_   .. .  ...—._.. ......
.... ._.. ..   ..  ..
.— .....   v.. ..  ... . 
..  .. ... .. ~ _ ..  ...... _ — _ 4...
._ .2 _. .. __._ ...
.... ...—_..... .. ._.. .. .... i...
... ..._ .. .. ...  . 
...... ..... _ _ ..,.._...__....._...._
W
W”.. ‘m_ ...—”_‘..
, ~ ._ _ ._.. _ ....... _..._..m...._._,
._, 9.._.v,, __sue _ﬁ....,«a_~._ ...,...
I FT—I FTIITI'j Ij I
23 47 71 95 119
Evaluations (x10!!!)
Figure 4.52 Ranksum Performance on Schwefel
287 
Average Error (log)
Dynamic Control Function
23 47 71
Evaluations (x1 0(1))
95 119
Figure 4.53 Average Best Performance on Dynamic Control
Ranksum (x1000)
22;;
4».f—... ~._.. __._. ~... *—
yu...—_..—...___........~.._.._..._.‘au ..—.
w‘hy.—i  s...— . s: y l .‘\_a~‘*' \— ... ..A.—\. /\..—~
.. ...4.,\,,s..“. '..._, .._._.._ 
Dynamic Control Function
Sysbm
EP “‘ “j“
""""""" Fast EP _
............ BLX_0. 5 ...,...:.._ .,
. ........... .. PCGA : if“
‘ Wn—yaw—Qo unswe   __.A. .. —_—..
"..  ““0..“ .vvo. ms s. ... _u.
. .~._._.¢.ao._u uh ...—
~‘InI‘.IJl...Il.I‘LI.Q.I.. INA IJII,—II..J_—JI_l l_j_j‘!_L
WI _ _ ...—‘ .— — . . .. ....— .. ——  . .—
~4 y‘pwu .. v _ ~ ..—
I" xwwmy ' ~— —  —... ... ""\v'.'—l  ...~ ''\‘I‘ ~  ._v  . . .   — .—~ ......— —.ou.—o Q.
A r;— — x .. oa
.. ...—— .. . o — ~ . —.. _ ... _ _._—_ .*...— _.,.‘_._..r‘. _—  _..—
a M— u. — .— .— “___......
pwk.~_=‘~r__ M—“vv V. =_._. _a,_.._ .. , —_ ...,—L... ..‘_,__ ‘_ , .._~ .7“ ._ﬁv_mr_~..»,.,
l 1 ﬁT'T
23
FT—I
47
Evaluations (x1000)
r—YTI
95
W1
119
rﬁ
71
Figure 4.54 Ranksum Performance on Dynamic Control
288 
Chapter 5
Conclusions
This chapter summarizes the general conclusions reached in the previous chapters and outlines potential areas for expanding this work.
5.1 Conclusions from Theoretical Evaluations
An informal taxonomy of testing functions was provided at the end of Chapter 2. Such a taxonomy could prove extremely useful in determining similarities and differences among various operators and EC systems if it were better formalized. Even the limited grouping and analysis provided the impetus for creating a number of interesting test problems, which proved useful for differentiating systems in Chapter 4.
A taxonomy of EC operators by their characteristics may provide a basis for understanding and predicting behavior under various circumstances and conditions. Chapter 3 presented an attempt at such a preliminary taxonomy. However, the proposed taxonomy did not provide adequate or correct behavior predictions for the systems studied in Chapter 4; therefore, either dynamic emergent behaviors prove stronger than individual operator characteristics, or the methodology of characterization proposed in Chapter 3 was flawed.
5.2 Conclusions from Empirical Evaluations
The central focus of the experiments in Chapter 4 was to demonstrate reduced behavioral change under invariant landscape modification. The experiments demonstrate conclusively that the proposed modification of operators to reduce bias successfully reduced the behavior changes induced by coordinate rotation.
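To make the protocol concrete, such a comparison can be set up by composing a test function with a random orthonormal rotation about the center of the search domain and running the same system on both versions. The following is a minimal sketch of that construction (in Python with NumPy; the function names are illustrative, not those of the dissertation's test harness):

    import numpy as np

    def random_rotation(n, rng):
        # QR factorization of a Gaussian matrix yields a random orthonormal
        # matrix; the sign correction makes the rotation uniformly distributed.
        q, r = np.linalg.qr(rng.standard_normal((n, n)))
        return q * np.sign(np.diag(r))

    def rotated(f, center, rotation):
        # Compose f with a rigid rotation about `center`.  The two landscapes
        # are isometrically equivalent, so a rotation-invariant EC system
        # should behave identically (in distribution) on both.
        return lambda x: f(center + rotation @ (x - center))

    rng = np.random.default_rng(0)
    sphere = lambda x: float(np.sum(x ** 2))   # trivially rotation invariant
    rotated_sphere = rotated(sphere, np.zeros(10), random_rotation(10, rng))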
Further, and perhaps more importantly, the experiments in Section 4.6 demonstrate that using existing population bias to select operator axes outperforms random rotation on a number of test problems. This implies that such population-sampling-based techniques may exploit useful search information which is effectively ignored by self-adaptive and other non-population-based operators.
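One way to read "using existing population bias for operator axes selection" is to mutate along the eigenvectors of the population covariance rather than along the coordinate axes. The sketch below illustrates the idea under that assumption; it is not necessarily the exact mechanism of the operators studied here:

    import numpy as np

    def population_axes(pop):
        # Eigen-decomposition of the sample covariance (rows = individuals)
        # gives the directions of current population spread and the variance
        # along each direction.
        eigvals, eigvecs = np.linalg.eigh(np.cov(pop, rowvar=False))
        return eigvals, eigvecs

    def mutate_along_population_axes(x, pop, scale, rng):
        # Draw an axis-aligned Gaussian step in the population's eigenbasis,
        # sized by the population spread, then rotate it back.
        eigvals, eigvecs = population_axes(pop)
        step = rng.standard_normal(x.shape) * np.sqrt(np.maximum(eigvals, 0.0))
        return x + eigvecs @ (scale * step)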
The performance of the PCGA system appears to be an emergent property of the combination of its mutation and crossover operators. Therefore, both variance mixing (performed by crossover, which is variance neutral) and variance addition contribute to the search capabilities of these systems.
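Variance neutrality is an empirical property that can be checked directly by generating offspring under random mating and comparing offspring variance to parent variance. A generic test harness for any pairwise crossover (a sketch, not the measurement code used in Chapter 4) might look like:

    import numpy as np

    def variance_ratio(crossover, n=100000, rng=None):
        # Estimate offspring variance / parent variance under random mating;
        # a ratio near 1.0 indicates a variance-neutral operator.
        rng = rng or np.random.default_rng(0)
        parents = rng.standard_normal(n)
        mates = rng.permutation(parents)
        return float(np.var(crossover(parents, mates, rng)) / np.var(parents))

    # Midpoint crossover halves the variance (ratio near 0.5), so it is
    # variance reducing rather than variance neutral.
    print(variance_ratio(lambda p1, p2, rng: (p1 + p2) / 2.0))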
In many cases, self-adaptive mutation techniques appear more sensitive to a rotated presentation than population-relative techniques such as BLX-α and PCGA. The implication is that self-adaptive techniques may more closely mirror inductive search techniques; however, more study is required before such conjectures can be fully substantiated.
Population-relative approaches, such as BLX-α and PCGA, perform poorly when there are many small, scattered, and nearly equivalent optima. This appears to be due to the implied additional parameter variance, which is not eliminated by selection; the mutative operator thus begins to overpower the variance-reduction effects of selection.
For the problems examined, PCGA is relatively insensitive to the choice of pool size, although empirically the smallest pool size proved most effective. PCGA is somewhat sensitive to the scale of the mutative operation relative to the population variance; in general, a scale of 1/3 appeared empirically to be the most general for the test cases evaluated. However, PCGA is extremely sensitive to the population size. If the population size grows too large, the total population inertia appears to overpower the convergence strength of the selection operator. This situation is similar to the selection of the mutative scale and is related to the inability to match variance gain to variance loss during selection. Note that BLX-α does not appear to suffer the same level of difficulty, although it is not known why.
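The mutative scale referred to here multiplies the current population spread. A minimal sketch of a population-relative mutation of this general kind (an illustration of the idea, not PCGA's exact operator), using the empirically favored scale of 1/3:

    import numpy as np

    def population_relative_mutation(pop, scale=1.0 / 3.0, rng=None):
        # Tie per-dimension step sizes to the current population standard
        # deviation, so mutation shrinks automatically as the population
        # converges; scale = 1/3 was the most general setting observed.
        rng = rng or np.random.default_rng()
        sigma = pop.std(axis=0) * scale
        return pop + rng.standard_normal(pop.shape) * sigma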
Both population-relative and self-adaptive mutation approaches generally lack the ability to control mutative step sizes so as to maintain balance with the selection operator. The Variance Recapture mutation operator, which works well within the EP framework, does provide such a mechanism. Empirical testing revealed results comparable to the other systems on a number of the tested functions. The Variance Recapture operator did not work well within the GA framework, likely because EP provides a higher level of selection pressure and allows for more elitism; these characteristics provide greater stability in the level of variance reduction from generation to generation.
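The variance-matching core of the Variance Recapture idea can be sketched as follows (a simplified reading; the operator described in Chapter 4 includes additional machinery such as the annealed recapture target of Section 4.10.1). Because Var(x + e) = Var(x) + Var(e) for independent noise, choosing the mutation variance as the deficit below a target restores that target in expectation:

    import numpy as np

    def variance_recapture_mutation(pop, target_var, rng=None):
        # Selection removes variance; add back exactly the per-dimension
        # deficit so the post-mutation variance matches the target.
        rng = rng or np.random.default_rng()
        deficit = np.maximum(target_var - pop.var(axis=0), 0.0)
        return pop + rng.standard_normal(pop.shape) * np.sqrt(deficit)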
5.3 Future Work
This work provides an initial examination of a number of interesting phenomena. The design and evaluation of experiments to further explore and verify the conclusions reached here could provide a long agenda for future exploration. A number of specific tangential explorations were also mentioned during this exposition; the following summarizes these areas.
The end of Chapter 2 provides a fairly detailed, if somewhat informal, analysis of the test functions in terms of various characteristics. This taxonomy of test cases would be much more useful if formalized. The underlying collection of test problems should also be evaluated and modified to provide a more balanced representation of those characteristics.
While Chapter 3 provides a classification of operators by their various statistical operational characteristics, this taxonomy does not provide useful predictions of the performance differences found under empirical examination. Either the emergent dynamic properties of search systems render such classification fruitless, or a more complete and effective operator classification system is possible. The existing classification system may prove more effective if more consideration is given to the effects of dimensionality.
In Section 4.2.3, we suggested that alternate forms of pool selection may have merit over uniform random selection. The composition and evaluation of such selection schemes could provide an area of further research; one hypothetical example is sketched below.
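For instance, the pool could be drawn from the spatial neighborhood of the individual being operated on rather than uniformly from the whole population (a speculative example, not a scheme evaluated in this work):

    import numpy as np

    def nearest_neighbor_pool(pop, index, pool_size):
        # Select the pool from the nearest neighbors of individual `index`
        # instead of uniformly at random from the population.
        dists = np.linalg.norm(pop - pop[index], axis=1)
        order = np.argsort(dists)[1:]       # skip the individual itself
        return pop[order[:pool_size]]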
Section 4.3.2 proposes that the tendency for EP to present more bias under rotation than the BLX-α and PCGA systems may imply a similarity between EP and inductive search. This is quite a serious contention, which should be expected to invoke immediate and detailed study.
The relationship between scale and problem dimensionality is briefly alluded to in Section 4.7, and further exploration of it is left for future work. Indeed, any number of multifaceted interactions between the various algorithmic effectors studied (population size, scale, etc.) could provide a basis for expanded study.
During the discussion of the Variance Recapture operator, the use of alternate nonlinear annealing schedules for recapture target selection was mentioned (see Section 4.10.1). This could provide a fruitful area of future research.
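For concreteness, one hypothetical family of such schedules decays the recapture target polynomially over the run:

    def recapture_target(t, t_max, v0, p=2.0):
        # Polynomial decay of the target variance from v0 toward 0; p = 1
        # recovers a linear schedule, while p > 1 holds the target high
        # early and anneals more aggressively near the end of the run.
        return v0 * (1.0 - t / float(t_max)) ** p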
Section 4.10.3 discusses the difficulty of using global variance recapture targets when the problem space is underconstrained. Evaluating the validity of this conjecture, along with any proposed methods to deal with these difficulties, might provide a useful area for future work.
In Section 4.11.2, the possibility of using Cauchy mutative sampling to provide a form of Fast-VREP was alluded to. This topic could provide a quick and insightful area for future research.
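The obvious construction (a starting point, not a worked-out Fast-VREP) substitutes heavy-tailed Cauchy steps, as Fast EP does, into the recapture mutation. Note that the Cauchy distribution has no finite variance, so such an operator would have to recapture a robust spread measure rather than the variance itself:

    import numpy as np

    def cauchy_mutation(pop, scale, rng=None):
        # Heavy-tailed Cauchy steps in place of Gaussian steps; `scale`
        # would need to be tied to a robust measure of population spread
        # (e.g., an interquartile range), since Cauchy variance is undefined.
        rng = rng or np.random.default_rng()
        return pop + rng.standard_cauchy(pop.shape) * scale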
Finally, and perhaps most importantly, this evaluation could be made much more substantive if the battery of test cases included a number of real-world problems. For example, side-by-side evaluation across one or more tests from the various MINPACK test suites might provide quite insightful results.