PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5/08 K:IProj/Acc&Pres/ClRC/Date0uerindd

 

EVOLUTIONARY OPTIMIZATION METHODS
FOR ACCELERATOR DESIGN

By

Alexey A. Poklonskiy

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Physics and Astronomy
and
Mathematics

2009

ABSTRACT

EVOLUTIONARY OPTIMIZATION METHODS
FOR ACCELERATOR DESIGN

By

Alexey A. Poklonskiy

Many problems from the ﬁelds of accelerator physics and beam theory can be
formulated as optimization problems and, as such, solved using optimization meth-
ods. Despite growing efﬁciency of the optimization methods, the adoption of modern
optimization techniques in these ﬁelds is rather limited. Evolutionary Algorithms
(EAs) form a relatively new and actively developed optimization methods family.
They possess many attractive features such as: ease of the implementation, modest
requirements on the objective function, a good tolerance to noise, robustness, and
the ability to perform a global search efficiently which make them the tool of choice
for many design and optimization problems. In this work we study the application of
EAs to problems from accelerator physics.

We review the most commonly used methods of unconstrained optimization and
_ describe the GATool, evolutionary algorithm and the software package, used in this
work. Then we use a set of test problems to assess its performance in terms of com-
putational resources and the quality of the obtained result. We justify the choice of
GATool as a heuristic method to generate cutoff values for the COSY-GO rigorous
global optimization package. We design the model of their interaction and demon-
strate that the quality of the result obtained by GATool increases as the information

about the search domain is reﬁned, supporting the usefulness of this model. We dis-

cuss GATool’s performance on the problems with static and dynamic noise and study
useful strategies of GATool parameter tuning for these and other difﬁcult problems.

We review the challenges of constrained optimization with EAs and then describe
REPA, a new constrained optimization method based on repairing, in exquisite detail,
including the properties of its two repairing techniques: REF IND and REPROPT.
We assess REPROPT’S performance on the standard constrained optimization test
problems for BA with and suggest optimal default parameter values based on the
results. Then we study the performance of the REPA method on the same set of test
problems and compare the obtained results with those of several commonly used con-
strained optimization methods with EA. Based on the obtained results, particularly
on the outstanding performance of REPA on test problem that presents signiﬁcant
difficulty for other reviewed EAs, we conclude that the proposed method is useful
and competitive. We discuss REPA parameter tuning for difﬁcult problems and crit-
ically review some of the problems from the de-facto standard test problem set for
the constrained optimization with EA.

We study several different problems of accelerator design and demonstrate how
they can be solved with GATool. These problems include a simple accelerator design
problem (design a quadrupole triplet to be stigmatically imaging, ﬁnd all possible
solutions), a complex real-life accelerator design problem (an optimization of the
front end section for the future neutrino factory), and a problem of the normal form
defect function optimization used to rigorously estimate the stability of the beam
dynamics in circular accelerators. The positive results we obtained suggest that the
application of EAs to problems from accelerator theory has large potential. The

developed optimization scenarios and tools can be used to approach similar problems.

 

 

Dedicated to
life in all forms and appearances

and the evolution that drives it to perfection

iv

ACKNOWLEDGMENTS

They say: “Everything that has a beginning has an end”. Now, as the very moment
of inserting the ﬁnal piece into the large puzzle called thesis has come, it is utterly
satisfactory, and at the same time somewhat sad, to see this long-lasting endeavour
coming to an end. The work that was done during all these years of graduate school
has taken its ﬁnal shape, is summarized, and written up in a logical sequence. The
only thing it is waiting for is to be put to an ultimate examination by a strict yet fair
committee of the established and renowned professionals, and thus meet its concluding
test.

Even if your goal is not as big as the Universe, chances are that you do not have the
power to reach it immediately. The road to fulﬁllment, assuming you venture to try,
often consists of many small steps. Some of them would be easy to make, especially
if you were in the right place at the right time, some would require more attention
and efforts. Some would be so hard to make, that it would knock you off your feet
and test your ability not to surrender to the circumstances, test your persistence,
willpower and optimism. It would be hard not to lose your way and keep going, it
would be hard not to dilute your ultimate goal in short-term accomplishments and
everyday chores, but it is still possible. Even though there were all kind of steps on
my road through the graduate school, here I am, seeing the end, several steps away
from completion.

It is only when you get to your target, you can stop satisﬁed, look back and
observe the whole trip, thinking of and thanking all the wonderful people that helped
you along the way. So, ﬁrst and foremost, I would like to thank all the people who
were helping me throughout the Ph.D. program. Those who taught me, supported

me, inspired me, served as living examples of various great personal qualities, cheered

me up, and enlightened me in numerous friendly conversations. I want to thank all
the people who will not get their personal words of gratitude. I do remember you
and I am very grateful for what you have done for me. Thank you!

Undoubtedly, the most important person in the professional (and often personal)
life of a graduate student is his or her scientiﬁc adviser. I consider myself very lucky to
have several. All of them heavily inﬂuenced my professional development, the contents
and style of this work, and my view of the world of science. I thank my main adviser,
Dr. Martin Berz, Michigan State University, for his deep involvement with all of his
students, including me, for his willingness to help, his deep understanding of scientiﬁc
endeavors, and a great talent to explain even the most complicated topics such that
they are comprehensible by a mere student. In addendum, I thank him for showing
me so many great examples of the fundamental principles of the scientiﬁc research
and ethics. I also thank Dr. Dmitriy Ovsyannikov and Dr. Alexandre Ovsyannikov,
Saint-Petersburg State University, for inspiring me to join the graduate program and,
together with Dr. Berz, for giving me a great opportunity to come and study here
at Michigan State University. Another person who has continuously been working to
make the international exchange programs available to Russian students is Dr. Victor
Yarba, Fermilab. I am grateful to him for these efforts.

I thank two of my advisers “in the ﬁeld”, Dr. Carol Johnstone and Dr. David Neuf—
fer, Fermilab, for demonstrating to me how the practical problems of the frontiers of
science are being attacked and solved, and for teaching me to trust scientiﬁc intu-
ition and insights as much as rigorous step-by-step derivations. Additional words of
gratitude have to be said to Dr. Carol Johnstone for her endless help in my everyday
life in the United States and for making the accommodation process much smoother

than it could have been. I am also very grateful to Dr. Patricia Lamm, Mathemat—

vi

 

ics Department, Michigan State University, for being such a great professor and role
model of a great teacher. I thoroughly enjoyed her lectures, her creative application of
the modern technologies in the classroom, and even her carefully thought homework
assignments. I am grateful to Dr. Sheldon Newhouse, h-I'Iathernatics Department,
Michigan State University, for his enthusiasm and help with exams, to Dr. Keith
Promislow and Dr. Huyi Hu, Mathematics Department, Michigan State University
for their support, and to all other members of my guidance committee: Dr. S.D. Ma-
hanti, Dr. Joey Huston and Dr. Kyoko Makino, Physics & Astonomy Department,
Michigan State University, for their help. Personal kudos to Dr. Kyoko Makino for
her very extensive proofreading of this work and valuable technical corrections.

I separately thank my colleagues from the Beam Theory group: Dr. Kyoko Makino,
Dr. Shashikant Manikonda, Johannes Grote, Youn-Kyung Kim, Alex Wittig and, of
course, my comrade Dr. Pavel Snopok for sharing my professional and personal life
in all their ups and downs for almost the entire period of my graduate studies. I
will always remember our scientiﬁc discussions, small talks about science, conference
travels, dinners, and parties. I am grateful for the help in any circumstances and the
pleasure that I had, in working with all these wonderful people.

A great portion of my gratitude belongs to my mother, Larisa Poklonskaya, and
my father, Alexander Poklonskiy, for raising me, for being my ﬁrst teachers of ev-
erything, for giving me this inﬁnite supply of the curiosity that drives me through
the life, and for their unconditional love and support. Many thanks go to my friend,
Alyona Myasnikova, for being there whenever I needed her and for'her encouragement
and inspiration. I would also like to thank Dmytro Berbasov and Oksana Odnovol for
being understanding, forgiving, patient, and cheerful friends and roommates, espe-

cially during the thesis writing phase when I would rarely get off the work and did not

vii

bother about “unimportant” things like dirty dishes or general mess. Personal thanks
to Oksana for proofreading parts of this work. Additional thanks and deep gratitude
to my colleagues from Microsoft who helped me to ﬁnish up this work by ﬁxing and
polishing the language: Kristen Lovin, Bill Donkervoet, and Richard Zadorozny.

I thank Debbie Simmons and Brenda Wenzlick, secretaries of the Physics & As-
tronomy Department, Michigan State University, for their good work and helpfulness.

Thanks to Fermi National Accelerator Laboratory for ﬁnancial support which is,
of course, very important for scientiﬁc success.

Finally, I would like to thank you for reading this work and myself for actually

writing it.

viii

TABLE OF CONTENTS

ix

LIST OF TABLES .............................. xiii
LIST OF FIGURES ............................. xviii
1 Introduction ................................ l
1.1 Beam and Accelerator Theory ...................... 1
1.1.1 Differential Algebra and Map Methods ............. 1
1.1.2 Beam Dynamics .......................... 3
1.2 Neutrino Factories ............................ 16
1.2.1 Purpose and History ....................... 16
1.2.2 Design Overview ......................... 18
1.2.3 Front End ............................. 23
1.2.4 Decay, Bunching and Phase Rotation .............. 25
1.3 Optimization Problems .......................... 30
1.3.1 Introduction ............................ 30
1.3.2 Unconstrained Optimization ................... 33
1.3.3 Constrained Optimization .................... 35
Unconstrained Optimization ...................... 40
2.1 Optimization Methods .......................... 40
2.1.1 Derivative-based Methods .................... 40
2.1.2 Direct Search Methods ...................... 47
2.1.3 Evolutionary Algorithms ..................... 52
2.1.4 No Free Lunch Theorems for Optimization ........... 56
2.2 Rigorous Global Optimization ...................... 58
2.2.1 Conventional Interval Methods ................. 58
2.2.2 Taylor Methods .......................... 60
2.3 GATool Evolutionary Optimizer ..................... 66
2.3.1 Principles, Concepts and Building Blocks ............ 66
2.3.2 Design and Implementation ................... 70
2.3.3 Statistics, Diversity and Convergence .............. 82
2.3.4 Summary, Notes on Performance and Parallelization ...... 87
2.3.5 Noisy Data Handling ....................... 90
2.3.6 Studies on Integration with COSY-GO Rigorous Global Opti-
mizer ................................ 97
2.4 Conclusions ................................ 119

3 Constrained Optimization ....................... 121
3.1 Challenges in Constrained Optimization with Evolutionary Algorithms 121

3.2 Overview of the Methods ......................... 124
3.2.1 Killing ............................... 124
3.2.2 Penalty Functions ......................... 125
3.2.3 Special Genetic Operators .................... 138
3.2.4 Selection .............................. 139
3.2.5 Repairing ............................. 142
3.2.6 Other Methods .......................... 143

3.3 The REPA Constrained Optimization Method ............. 144
3.3.1 Introduction ............................ 144
3.3.2 REFIND: REpair by Feasible INDividual ............ 146
3.3.3 REPROPT: REpair by PRojecting through OPTimization . . 148
3.3.4 REPA: REPair Algorithm .................... 150
3.3.5 Studies on Constraint Projection by Standard COSY Infinity

Optimizers ............................. 154

3.3.6 Performance ............................ 173

3.4 Conclusions ................................ 188

4 Optimization Problems in Accelerator Design ............ 191

4.1 Quadrupole Stigmatic Imaging Triplet Design ............. 192

4.2 Normal Forms Defect Function Optimization .............. 208

4.3 Neutrino Factory Front End Design Optimization ........... 218
4.3.1 Problem Description and Motivation .............. 218
4.3.2 Optimization of the Front End Production Parameters . . . . 220

4.4 Conclusions ................................ 229

A COSY++ Macroprogramming Extension for COSY Inﬁnity . . . 232

Al COSYScript ................................ 232
A11 Introduction ............................ 232
A2 Syntax ................................... 234
A.2.1 Problems ............................. 236
A.3 COSY++ ................................. 245
A31 Introduction and Features .................... 245
A.3.2 Sections Assembler ........................ 246
A33 Active Blocks ........................... 253
A34 Full Processing .......................... 261
A35 Libraries .............................. 262
A.3.6 Compatibility Mode ....................... 264
A.3.7 Standard Libraries ........................ 265
A38 Front End ............................. 267
x

 

A.3.9 Additional Features and Notes .................. 271

APPENDICES ................................ 232
B The Genetic Algorithm Tool (GATool) in COSY Inﬁnity ..... 274
B. 1 Introduction ................................ 274
B2 Conﬁguration ............................... 275
83 Usage Scenarios .............................. 281
8.4 Access to Statistics ............................ 286
85 Default Parameters Set .......................... 287
86 Miscellaneous ............................... 288
8.7 Advanced Conﬁguration via Active Blocks ............... 288
C Test Problems in Unconstrained Optimization ........... 292
C.1 Sphere Function .............................. 293
C2 Rastrigin’s Function ........................... 295
C3 CosExp Function ............................. 297
C4 Rosenbrock’s Function .......................... 300
C5 Ackley’s Function ............................. 303
C6 Griewank’s Function ........................... 305
C? An Function ................................ 307
C8 SinSin Function .............................. 310
C9 Paviani’s Function ............................ 312
D Test Problems in Constrained Optimization with Evolutionary Al-

gorithms .................................. 315

xi

2.1

2.2

2.3

2.4

2.5

LIST OF TABLES

Results of one run of GATool on the 10-dimensional Rastrigin function
problem (see section C.2) performed with different GATool settings.
The values of the parameters that are different from the default ones
(see Figure B.1) are given in boldface ..................

Euclidean distance from the true minimizer to the current best found
objective function value and the objective function value averaged
by formula (2.3.14) with 91 = 5, for the 5-dimensional Sphere test
problem (true minimizer 0 see section C.1) minimized with GATool
with the default set of parameters (see Figure 8.1), population size :
10*dimension= 50 and dynamic noise in the range [—1, 1] .......

COSY-GO performance on the test problems (see Appendix C) with
increasing dimensionality; V is the volume of the search space and t is
the execution time in seconds. NO is the Taylor model expansion order
(see section 2.2). SinSin problem with NO 2 8 for 8 and 9 dimensions
takes too long to compute and is thus omitted .............

GATool performance on the test problems from Appendix C with de-
fault settings (see Figure 8.1) and population size = dim*100. V is
the volume of the search space, t is the execution time in seconds and
Q is the quality factor calculated as the difference between the best
obtained upper bound and the value of the global minimum (smaller
is better, 0 means that the global minimum is found) .........

GATool performance on the test problems from Appendix C, with
default settings (see Figure B.1) and population size : dim*10. V
is the volume of the search space, t is the execution time in seconds
and Q is the quality factor calculated as the difference between the best
obtained upper bound and the value of the global minimum (smaller
is better, 0 means that the global minimum is found) .........

xii

 

89

97

101

106

2.6

3.1

3.2

3.3

3.4

3.5

3.6

3.7

3.8

Distribution of the results of 100 runs of the GATool on the 10-
dimensional Rosenbrock’s function test problem (see section C .4), with
the default set of parameters (see Figure B.1) by the e neighborhoods
of the global minimum for different initial search domains with decreas-
ing volumes. Average run time is around % 35 seconds independent of
the domain size ...............................

Success rate of the constraints projection for the Problem G00 (3.3.2)
on 1000 random points from {—100, 100]”. Here 12 = 2, problem has
one nonlinear equality constraint and four linear inequality constraints.
Three rows for each method correspond to the all combined, equality
combined + inequality combined and separate Optimization problem
formulation methods. Best methods in terms of the percentage of the
results are listed in boldface .......................

Average number of steps of the constraints projection for the Problem
G00 (3.3.2) on 1000 random points from [—100, 100]” .........

Success rate of the constraints projection for the Problem G00 (3.3.2)
on 1000 random points from {—1000, 1000]” ..............

Average number of steps of the constraints projection for the Problem
G00 (3.3.2) on 1000 random points from [—1000,1000]” ........

Success rate of the constraints projection for the Problem G03 (Figure
D.3) on 1000 random points from [—100, 100]”. Here 2) = 10, problem
has one nonlinear equality constraint. One row for each method cor-
respond to the separate optimization problem formulation methods.
Best methods in terms of the percentage of the results are listed in
boldface ..................................

Average number of steps of the constraints projection for the Problem
G03 (Figure D.3) on 1000 random points from [-—100, 100]” ......

Success rate of the constraints projection for the Problem G03 (Figure
D.3) on 1000 random points from [—1000,1000]” ............

Average number of steps of the constraints projection for the Problem
G03 (Figure D.3) on 1000 random points from [—1000,1000]”

xiii

112

161

162

163

164

165

165

166

3.9

3.10

3.11

3.12

3.13

3.14

3.15

3.16

Success rate of the constraints projection for the Problem G07 (Figure
D.7) on 1000 random points from {—100, 100]”. Here v : 10, prob—
lem has 8 inequality constraints (3 linear, 5 nonlinear). Two rows for
each method correspond to the All combined and separate optimization
problem formulation methods. Best methods in terms of the percent-
age of the results are listed in boldface .................

Average number of steps of the constraints projection for the Problem
G07 (Figure D.7) on 1000 random points from {—100, 100]” ......

Success rate of the constraints projection for the Problem G07 (Figure
D.7) on 1000 random points from [—1000,1000]” ............

Average number of steps of the constraints projection for the Problem
G07 (Figure D?) on 1000 random points from [—1000,1000]”

Best constraint satisfaction approaches according to the tests per-
formed at 1000 random points from {—100, 100]”. Here 2) is deﬁned
by a problem (see Appendix D). Success rates for the problem G05
are too low, hence II and III best methods for it are not listed . . . .

Best constraint satisfaction approaches according to the tests per-
formed at 1000 random points from [—1000,1000]”. Here 2) is deﬁned
by a problem (see Appendix D). Success rates for the problem G05
are too low, hence I, II and III best methods for it are not listed . . .

Percentage of the successful runs for different methods (from 100 runs
total). HEre v is the problem dimension, n is the number of constraints,
“Diff.” column lists difﬁculties of the problems according to [128]. Here
E,A,D,VD mean EASY, AVERAGE, DIFFICULT and VERY DIFFI-

CULT, correspondingly ..........................

Summary of the performance of the killing method, after 150 genera-
tions. The ﬁrst six columns describe the problem (for full description
see Appendix D) with the sixth column, “Optimum”, listing the value
of the global feasible optimum or the best known value of the opti-
mum. The next four columns describe the results of the 100 runs with
the killing method: the best, median, mean and the worst values that
were found. “Failed” is listed if the method failed to produce feasible
members in all runs ............................

xiv

166

167

168

169

177

178

 

 

3.17

3.18

3.19

4.1

4.2

Summary of the performance of the annealing penalty method, after
150 generations. The ﬁrst six columns describe the problem (for full
description see Appendix D) with the sixth column, “Optimum”, listing
the value of the global feasible optimum or the best known value of the
optimum. The next four columns describe the results of the 100 runs
with the annealing penalty method: the best, median, mean and the
worst values that were found. “Failed” is listed if the method failed to
produce feasible members in all runs ..................

Summary of the performance of the killing+penalty method, after 150
generations. The ﬁrst six columns describe the problem (for full de-
scription see Appendix D) with the sixth column, “Optimum”, listing
the value of the global feasible optimum or the best known value of
the optimum. The next four columns describe the results of the 100
runs with the killing+penalty method: the best, median, mean and
the worst values that were found. “Failed” is listed if the method failed
to produce feasible members in all runs .................

Summary of the performance of REPA method, after 150 generations.
The ﬁrst six columns describe the problem (for full description see
Appendix D) with the sixth column, “Optimum”, listing the value of
the global feasible optimum or the best known value of the optimum.
The next four columns describe the results of the 100 runs with the
REPA method: the best, median, mean and the worst values that
were found. “Failed” is listed if the method failed to produce feasible
members in all runs ............................

Triplet stigmatic imaging design statistics ...............

GATool’s performance for different population sizes compared to the
performance of the Taylor model methods-based global optimizer (TM-
MGO) and the Naive Sampling method, on the synthetic normal form
defect function (see Figure 4.8). TMMGO was executed on 256 IBM
SP POWER3 processors 375 MHz each, GATool and Naive Sampling
were executed on 1 Intel Pentium IV 2 Mhz processor. *For TMMGO
time is given as a number of processors x wall clock time of the run .

XV

179

180

213

 

4.3 GATool’s performance for different population sizes compared to the
performance of the Taylor model methods-based global optimizer (TM-
MGO) and Naive Sampling methods on the Tevatron normal form de-
fect function (see Figure 4.13). TMMGO was executed on 256 IBM
SP POWER3 processors 375 MHz each, GATool and Naive Sampling
were executed on 1 Intel Pentium IV 2 Mhz processor. *For TMMGO
time is given as a number of processors x wall clock time of the run . 215

4.4 Results of the Front End design optimization (in ascending order sorted
on the production rate for the 8000 particles initial beam) ...... 228

xvi

 

1.1

1.2

1.3

1.4

1.5

1.6

2.1

2.2

2.3

LIST OF FIGURES

Phase space trajectories in FODO cell, obtained for 1000 turns by
applying one turn map to the vector with initial coordinates 1000 times;
in conventional (left) and normal form (right) coordinates .......

Normal form coordinate space divided into rings and schematic view
of particles motion in one of those rings .................

Neutrino Factory schematics from the Feasibility Study Ila (RLA ac-
celeration variant) [10] ..........................

Distribution of particles energies 12m from the target calculated by
MARS, Etotal = E0 + T, where E0 is a rest energy (105.6 MeV for
muons), T — kinetic energy .......................

Example of the longitudinal beam dynamics in the Front End (phase-
energy plane), courtesy of David Neuffer [143] .............

The baseline Front End schematics from the latest International Scop-
ing Study [62] ...............................

One-point greedy iterative search strategy ...............

First steps performed by the LMDIF (COSY Inﬁnity built-in opti-
mizer) on the 2-dimensional Sphere test function (see section C.1)
starting from the initial guess (10, 10) ..................

First steps performed by the ANNEALING (005 Y Inﬁnity built-in
optimizer) on the 2-dimensional Sphere test function (see section C.1
starting from the initial guess (10, 10) ..................

Nelder—Mead method iteration ......................
First steps performed by SIMPLEX (COS Y Inﬁnity built-in optimizer)

on the 2-dimensional Sphere test function (see section C.1) starting
from the initial guess (10, 10) ......................

xvii

13

15

20

21

22

41

47

51

53

 

2.6 Evolutionary Algorithm .........................
2.7 LDB range bounding algorithm based on Taylor model (2.2.1)
2.8 COSY—GO Veriﬁed Global Optimizer box processing algorithm . . . .

2.9 Output of the results of the minimization of the 2D Rosenbrock’s func-
tion (see section C4) by COSY-GO ...................

2.10 Uniform mutation example: x is a member scheduled for mutation,
me are some of the possible one-coordinate mutants (all such mutants
are located on one of the two dashed lines), xm,2 are some of the
possible two-coordinate mutants (can be anywhere in S) .......

2.11 Continuous Crossover examples for the common scaling factor [5.
Points xp,b and xp,w are the parents with the better and worse
ﬁtnesses, correspondingly; xci for various i are the children gener-
ated with different values of the scaling factor: i = 1 corresponds to
5 E (0.5,1), i = 2 corresponds to B > 1, i = 3 corresponds to [3 = 0.5
(intermediate crossover, values of the ﬁtnesses neglected) .......

2.12 Continuous Crossover examples for the per-coordinate scaling factors
ﬁzz Points xp,b and xp,w are the parents with the better and worse
ﬁtnesses, correspondingly; xC i for various i are the children generated
with different values of the scaling factor for different coordinates. Here
the dotted rectangle contains all the children generated with 0 < 6,- <

2.13 Statistics gathered during one run of GATool on the 10-dimensional
Sphere function problem (see section C1). The horizontal axis for all
plots is the generation number (Max/ avg/ min function values, normal
axis) .....................................

2.14 Statistics gathered during one run of GATool on the 10-dimensional
Sphere function problem (see section C.1). The horizontal axis for all
plots is the generation number (Max/avg/min function values, loga-
rithmic axis). ...............................

xviii

65

82

83

85

 

2.15

2.16

2.17

2.18

2.19

2.20

2.21

2 .22

2.23

Statistics gathered during one run of GATool on the 10-dimensional
Sphere function problem (see section CI). The horizontal axis for all
plots is the generation number (Estimated average Euclidean distance
between population members) .......................

Statistics gathered during one run of GATool on the 10—dimensional
Sphere function problem (see section C.1). The horizontal axis for
all plots is the generation number (Min function value improvement
(absolute value)). .............................

GATool search algorithm .........................

Distribution of the results of 100 runs of GATool on the 5-dimensional
Sphere (left) and Rastrigin (right) test function problems (see Ap-
pendix C) with the default set of parameters (see Figure 8.1), pop-
ulation size = 10*dimension, by the e neighborhoods of the global
minimum. .................................

Distribution of the results of 100 runs of the GATool on the 5-
dimensional Rastrigin (right) test function problem (see Appendix C)
with the default set of parameters (see Figure B.1), population size :
20*dimension, by the e neighborhoods of the global minimum.

GATool’s performance in the 5-dimensional Sphere function problem
(see section C.1), population size 50, default set of parameters (see
Figure 8.1), without noise (left) and with the dynamic noise in the
range [—1, 1] (right). Generation number versus 23:1(13: — $i,true)v
where x* is the best minimizer found by GATool and xtrue is the true
global minimizer (in this case 0), is plotted. ..............

Global optimization of the spacecraft trajectories: pruned search space
in the epoch /epoch plane (courtesy of Roberto Armellin) [11]) . . . .

Growth of complexity factors of global optimization with COSY-GO
with dimension ..............................

Example of the execution time scaling for different scaling strategies
for Rastrigin’s function test problem (see section C.2) minimization
with GATool. The volume of the search space (logarithmic scale) is
shown to better demonstrate scaling issues ................

xix

86

87

89

93

94

95

99

103

108

 

2.24

2.25

2.26

2.27

2.28

2.29

2 .30

Example of the result quality scaling for different scaling strategies for
Rastrigin’s function test (see section C.2) problem minimization with
GATool. ..................................

Distribution of the results of 1000 runs of the GATool on the 5-
dimensional Rastrigin function test problem (see section C.2) with the
default set of parameters (see Figure B.1), population size : 10*di-
mension, by the e neighborhoods of the global minimum. Average run
time is 5.22 seconds. ...........................

Boxes generated during COSY-GO minimization of the 2-dimensional
Rosenbrock’s function (see section C.4) .................

Distribution of the results of 100 runs of the GATool on the 10-
dimensional Rosenbrock’s function test problem (see section CA), with
the default set of parameters (see Figure B.1) by the e neighborhoods
of the global minimum for different initial search domains with de-

creasing volumes. Average run time is z 35 seconds independent of
the domain size ([—5, 10110, V = 5.6710“). .............

Distribution of the results of 100 runs of the GATool on the 10-
dimensional Rosenbrock’s function test problem (see section C.4), with
the default set of parameters (see Figure 8.1) by the 5 neighborhoods
of the global minimum for different initial search domains with de-
creasing volumes. Average run time is z 35 seconds independent of

the domain size ([[—1.5, 15]“), V = 5.9 . 104]) ..............

Distribution of the results of 100 runs of the GATool on the 10-
dimensional Rosenbrock’s function test problem (see section C .4), with
the default set of parameters (see Figure 8.1) by the e neighborhoods
of the global minimum for different initial search domains with de-
creasing volumes. Average run time is z 35 seconds independent of

the domain size ([[0,1.5]10, V = 5.76- 101]). ..............

Distribution of the results of 100 runs of the GATool on the 10-
dimensional Rosenbrock’s function test problem (see section C.4), with
the default set of parameters (see Figure B.1) by the e neighborhoods
of the global minimum for different initial search domains with de-

creasing volumes. Average run time is z 35 seconds independent of
the domain size ([[o.5, 1.5110, V = 1.0- 100]) ...............

109

109

113

114

115

116

117

2.31 Distribution of the results of 100 runs of the GATool on the 10-
dimensional Rosenbrock’s function test problem (see section C.4), with
the default set of parameters (see Figure 8.1) by the e neighborhoods
of the global minimum for different initial search domains with de-
creasing volumes. Average run time is z 35 seconds independent of

the domain size (“0.7, 1.3110, V = 0.6- 10—21) ..............

3.1 Example of the generation produced by EA for a constrained optimiza-
tion problem. Here points represent members of the population, the
cross represents the sought feasible minimum, 5 is a search space, and
F is a feasible set .............................

3.2 Example of the one-dimensional inequality constraint function and cor-
responding power penalty function of the type (3.2.10), a = 1.

3.3 Left to right, top to bottom: colormap plots (scales are different, i.e. the
same color on different plots may correspond to different function val-
ues) of P0(h(x)), P1(h(x)), and the actual Euclidian distance from
x to F where h is given by the formula (3.2.11) and F is a set of all
x E S = [—5, 5] x {—5, 5] such that the constraint h(x) S 0 is satisﬁed

3.4 REF IND: REpair by Feasible INDividual algorithm ..........

3.5 An example of the repairs performed by REFIND and REPROPT
repair methods (F is large compared to S) ...............

3.6 An example of the repairs performed by the REF IND and REPROPT
repair methods (F is small compared to S) ...............

3.7 REPA algorithm .............................

4.1 Objective function for triplet stigmatic imaging f(q1,q2), q, 6
[—-1,1], i = 1,2 (3D plot) .........................

4.2 Objective function for triplet stigmatic imaging f(q1,q2), q, E
[—1,1], i = 1,2 (contour lines plot) ...................

4.3 Ray tracing of the triplet, solution 1 ..................

118

123

131

132

149

151

152

198

199

 

4.4

4.5

4.6

4.7

4.8

4.9

4.10

4.11

4.12

4.13

4.14

4.15

4.16

A.1

A2

A.3

A.4

Ray tracing of the triplet, solution 2 .................. 202

Ray tracing of the triplet, solution 3 .................. 203
Ray tracing of the triplet, solution 4 .................. 204
Synthetic normal form defect function domain of interest ....... 209

Synthetic normal form defect function plots. Function values vs two
phase angles (3D plot) .......................... 210

Synthetic normal form defect function plots. Function values vs two
phase angles (contour lines plot) ..................... 211

COSY-GO output on synthetic normal form defect function maximiza—
tion .................................... 212

GATool’s parameters used for synthetic normal form defect function

maximization ............................... 212
Tevatron’s normal form defect function domain of interest ...... 214
The Tevatron normal form defect function. Function values vs two

phase angles ................................ 215
The Tevatron normal form defect function. Function values vs two

phase angles ................................ 216
Particles dynamics in the Tevatron ................... 216
COSY-GO output on the Tevatron normal form defect function maxi-

mization .................................. 217
COS YScript program structure ..................... 237
Example of inclusion workaround .................... 240
Example of the inclusion chain ...................... 241
Dynamic size of the variable in COSYScript (non-working) ...... 243

xxii

 

A.5 Dynamic size of the variable in COS YScript (working) ........
A.6 Including and included ﬁles with their sections marked up ......
A.7 Assembled ﬁle ...............................
A.8 Description section of the logging.fh library ..............

A.9 COSYScript ﬁle as a Perl ﬁle view from Active Blocks perspecitve . .

B.1 GATool’s default parameters .......................

C.1 Sphere function (3D plot) ........................
C.2 Sphere function (contour lines) .....................
C.3 Rastrigin’s function (3D plot) ......................
C.4 Rastrigin’s function (contour lines plot) .................
C.5 CosExp function (3D plot) ........................
C.6 CosExp function (contour lines plot) ..................
C.7 Rosenbrok’s function (3D plot) .....................
C.8 Rosenbrok’s function (contour lines plot) ................
C.9 Rosenbrock’s function contours near the minimum ...........
C.10 Ackley’s function (3D plot) ........................
C.11 Ackley’s function (contour lines plot) ..................
C.12 Griewank’s function (3D plot) ......................
C.13 Griewank’s function (contour lines plot) ................

C. 14 An function ................................

xxiii

288

295

296

297

301

301

303

304

307

C.15 An function (contour lines plot) ..................... 308

C16 SinSin function (3D plot) ......................... 310
C17 SinSin function (contour lines plot) ................... 311
C.18 Paviani’s function (3D plot) ....................... 312
C.19 Paviani’s function (contour lines plot) .................. 313
D.1 g01 Test problem ............................. 317
D2 g02 Test problem (best known value from [163]) ............ 318
D3 g03 Test problem ............................. 318
D4 g04 Test problem ............................. 319
D5 g05 Test problem ............................. 320
D6 g06 Test problem ............................. 321
D7 g07 Test problem ............................. 322
D8 g08 Test problem ............................. 323
[3.9 g09 Test problem ............................. 323
D. 10 g10 Test problem ............................. 324
D. 11 g11 Test problem ............................. 325
D. 12 g12 Test problem ............................. 325
13.13 g13 Test problem ............................. 326

13.14 Design of a Pressure Vessel (vess) [100] (best known value from [41]) . 327

13.15 Design of a Tension/ Compression Spring (tens) [9] (best known value
from [41]) ................................. 328

xxiv

CHAPTER 1

Introduction

1.1 Beam and Accelerator Theory

1.1.1 Differential Algebra and Map Methods

The dynamics of the various objects in Physics are often described by a system of the

nonlinear ordinary differential equations

dx
_ = f x,t , 1.1.1
Where x is a vector of coordinates of the considered object, t is time, and f is a

nonlinear vector function that describes various forces acting on the object and thus

goveI‘ning the dynamics. Initial conditions
x(0) 2 xi (1.1.2)

Specify the initial position of the object, i.e. its position at the moment of time that
S C0Ilsidered initial. It is often advantageous to describe the action of the system
(1 ' 1 - 1) with a so-called ﬂow operator, MT, which establishes a mapping between the

in‘ ~ . . . . . . .
1t131-1 posmon xi of the object at t = 0 and its ﬁnal posmon xf that the object

 

assumes at time T:

Xf = MT(xi)' (1.1.3)

The ﬂow operator approach is especially useful for studying various properties of the
dynamics in systems that are periodic in t. Since it captures the essential properties
of the dynamics in the system, it is possible to assess the properties of the flow MT ‘
instead of the dynamics of individual objects with varying initial conditions. A good
example of such system is a circular particle accelerator.

The problem here is that even in the cases of relatively simple functions f it is
frequently not possible to determine the system map in a closed form, so for the
practical purposes MT is often calculated via numerical integration of the equations
(1.1.1). However, if the function f is only weakly nonlinear, i.e. if its behaviour is
mostly determined by the linear component, then its map is also only weakly non-
linear and thus can be represented as a Taylor expansion with practically acceptable

precision. Developments in the ﬁeld of Differential Algebra (DA) and its applications
to Automatic Differentiation have have Opened the possibility to compute the Taylor
series for maps of such systems to an arbitrary high order. A detailed treatment
0f the Differential Algebra framework and its numerous applications including map
methods for Accelerator Physics can be found in [18].
particle accelerators typically consist of numerous subsystems inﬂuencing different
aspeCts of particle dynamics. The original method of map calculation, which involves
propagation of functional dependencies through a numeric solver of differential equa-
tions Using automatic differentiation technique, is slow and imprecise. Computation
of the ﬂow for individual devices and then application of the composition property
to obtain the ﬂow of the whole accelerator can be performed within the Differential

lgebra framework quite efﬁcrently and With unlimited preCISlOD.

The law of map composition tells us that if we have two maps: MtOvtl that relates
the initial position at the to to the ﬁnal position at time t1 and Mt1.t2 that relates
the initial position at time t1 to the ﬁnal position at time t2 then the map that relates
the initial position at time to to the ﬁnal position at time t2 can be constructed via

a map composition:

Mt0,t2 = MtlitQ o MtOvtl' (1.1.4)

Using this property we can assemble the transfer map for the entire accelerator if we

have transfer maps of all its elements which we can compute using DA.

1.1.2 Beam Dynamics

Particles in accelerators are rarely studied as standalone objects. Usually ensembles

of particles that have similar coordinates are used. These ensembles are called particle

beams. Since the particles in a beam are separated from each other by a relatively

small distance, it is often convenient to select one imaginary particle that represents

the motion of the whole beam inside the accelerator and then describe the motion

Of other particles in the beam in the coordinates that are relative to those of the
r ef'erence particle [18,89,170,175].

In the laboratory coordinate system the particle state is usually represented by

a. VeCtor that consists of its space coordinates and the components of its momentum

Vector corresponding to the coordinate axes. Time usually serves as an independent

Va'ria-Eﬂe:

Z(t) : ($ipIiyipyizipZ)T- (1.1.5)

I . . . . .
n the curVilinear coordinate system that is attached to a reference particle the ar-

C . . . .
length along the reference trajectory is usually serves as an independent variable. In

this coordinate system (often called the curvilinear coordinate system) the particle

state is represented by the following coordinates:

l . )
a =pcc/po
Z(3) : y , (1.1.6)
b =Py/P0
l= k(t — t0)

 

 

\5 = (E - Eo)/Eol
where the .73, y denote the position of the particle in this relative coordinate system,
p0 is an arbitrary ﬁxed momentum (usually the one of the reference particle), E0
and to are the energy and the time of flight of the reference particle, E is a total
energy of the particle, and k is a scaling coefﬁcient that transforms time coordinate
to space-like coordinate. In those coordinates the reference particle corresponds to a
z = 0.

The motion of a particle in the electromagnetic ﬁeld is governed by the Lorentz

force [93]:

d
d—‘t’zq(E+va). (1.1.7)

In order to study the motion of the particles that form the beam in the curvilinear
COordinates, those equations of the form (1.1.5) in the laboratory coordinate system,
are transformed into the curvilinear coordinate system, for the special case when the
reference trajectory is restricted to a plane (which is the case for most particle acceler—

ators) - Applying these transformations and using the reference trajectory simplifying

assumption they can be brought to the following form:

23': a(1+hx)p—0 (1.1.8)
=b(1+h:r):—g (1.1.9)
1 E B B
:’=-——( ””08 i+b—i—BQ——i)(i+hnf)+h0 (1.1.10)
1 + 770 Ps xeo xmo es XmO PS
E
b’=(—1——+"p—9—-—+y +-—B—‘f——a —p—0)(1+he)Bz (1.1.11)
1 + 770 p3 X60 X7720 X7710 p3
[=((1+h:r)1)-l——+nI—)—O 1)£ (1.1.12)
Hines v0
5’20 (1.1.13)

where I is a derivative with respect to the arclength s, h. is a radius of the curvature

of the reference trajectory,

—12
239: ( 71(2+n) 17f_a2_b2) /

 

Ps 770(2 + 770) 771(2)
E — eV(:r y s)
T] = 2 7
me
P_Q
XmO “ ze
is the magnetic rigidity,
: P0”0
X60 ze

is the electric rigidity, Bx» By, Bz and Ex, By, E; are :13, y and 2 components of
the magnetic and electric ﬁeld in the laboratory coordinate system, correspondingly.
A rigorous deﬁnition of the coordinate system and the detailed derivation of the
eqllations of motion in this system (1.1.8)—(1.1.13) can be found in [18].

Once the ﬁelds and the reference trajectory are known, these equations can be di-
rectly integrated (analytically for simplest cases, numerically for most real-life prob-

1e
ms) in order to determine the dynamics of the particles. More efﬁciently, map

methods, mentioned earlier, can be used for this purpose. The latter approach is used

by the Beam Physics package for COS Y Inﬁnity scientiﬁc computing code [22,23].

In this framework the Taylor expansion of the map is actually an array of Taylor

expansions of the dependencies of the ﬁnal coordinates on the initial coordinates.

Employing notation that is frequently used in optics in order to emphasize the nature

of the coordinate dependencies, it can be written in the following form:

33f = (flail-Ti +($lalai+(Ily)yi+($|b)bi+($|1)11+($|5)5i

+ (:clzzc)ati2 + (.rlacaﬂiai + (:r]:ry);riyi + (.i:|;rb).ribi + . . ..

In this notation the Taylor expansion for the map of the system (1.1.8)~(1.1.13)

takes the form:

If = Z(:r[:ri1ai2yi3bi4li56i6)x[1ai'Qyiz3bi'4lz5df6 (1.1.14)

af = Z(a|:ci1ai2yi3bi4[2562-6).r:1a:2yiz3biz4liz56f6 (1.1.15)

yf = 20;]inai2yi3bi4lf5di6)r:1aEZyz3bE4l:5di6 (1.1.16)

_ ,2: iiiz‘ it .i1i2i37747751:

(if—Zairemar/311415156»:i ai yi b, l, 56 (1.1.17)

If = Zﬂllea22y23b‘4lz5626)xi1ai2y13bi4li0626 (1.1.18)

a, = Z(a[;r7:1ai2yi36i4li56i6)6:1a:2yiz3biz4li256i6 (1.1.19)

(1.1.20)

where the summation is performed on all indices i1,i2, . . . 17:6 such that 22:1 ik g n,

77» is the Taylor expansion order.

1\Tol‘lhal Form Methods

epetitive systems such as synchrotrons and storage rings are the main component of

t
he Ihost modern high-energy particle accelerators. In those circular lattices, particles

ought to remain conﬁned for many turns. Hence their trajectories should be stable,

which usually requires them to be bounded in some way. The study of the dynamics of
particles in these structures and the stability of the dynamics is very important both
theoretically and practically. The advantages of map methods for such studies lie in
the ability to calculate a map of motion (see section 1.1.1) representing the action of
all accelerator elements in one full revolution. Then the repeated application of this
map, the so called Poincare map, can be studied to evaluate the stability of the entire
device for large number of turns.

The linear theory of repeated motion has been fully developed since its introduc-
tion by Courant and Snyder [49]. It relies on the well-known matrix methods from
Linear Algebra (see [17] for detailed treatment). Since the transfer map to ﬁrst order
is a matrix, the so called transfer matrix, the stability of the motion is determined
by its eigenvalues: if any of the eigenvalues has absolute value > 1, then the motion
is unstable. Since the motion in such structure is volume preserving, the product of
the eigenvalues of the transfer matrix must be one. This means that if there exists

an eigenvalue > 1, there must be another inversely proportional to it and thus < 1.
However, such an arrangement would make the motion unstable. Therefore for the
motion to be stable and volume preserving, all eigenvalues must have a magnitude
Of One. However, real-valued eigenvalues that have the magnitude of one can be per-
tur bed from this value under a small perturbation in the system parameters rather
easily, hence they should be complex. In sum, for our motion to be stable we need
the System’s linear transfer matrix to have only complex conjugate eigenvalues with
Ina'gnitudes of one. Further development of this theory and other conditions imposed

n the matrix and its eigenvalues by stability conSIderations can be found in any of

th . .
(2 Sources mentioned earlier.

 

Nonlinear motion is in general much more difﬁcult to study. In accelerator theory
nonlinear studies are usually divided into the study of the parameter-dependent linear
motion, where parameters include the particle energy spread, magnet misalignments,
etc., with perturbation theory; and the study of fully nonlinear dynamics. Many use-
ful properties of nonlinear motion can be obtained exactly by employing the method
of normal forms, ﬁrst introduced (to low orders, no more than 3) by Dragt in 1979 [54]
and developed over almost two decades, then brought to its full practical power (max-
imum order is theoretically inﬁnite, i.e. limited only by the available computational
resources) by Berz in 1992 within the Differential Algebra framework [16,17]. This
approach provides an algorithm to build a nonlinear change of variables to remove
all removable non-linearities and present motion in the set of variables where it is
circular with amplitude—dependent frequency.

Assume, that we obtained the nonlinear transfer map of a particle optical system

under consideration:

Zf=M(ZZ',5), (1.1.21)

Where 2 is the 2v-dimensional vector of the phase space coordinates, 6 is the vector
0f the system parameters and the indices i and f correspond to initial and ﬁnal
coordinates. We want to build a sequence of the coordinate transformations A of the

map

AoMoA_1 (1.1.22)

to reniove all nonlinearities of every order up to the desired.
The ﬁrst transformation is performed in order to make the map origin-preserving
for a11y 6:

M(0,6) = 0.

 

 

 

DA methods are employed to move the map to the new parameter—dependent ﬁxed
point 21: so that

ZF =M(ZF,(5). (1.1.23)

It is possible if and only if 1 is not an eigenvalue of the linear part of the map. For
stable repetitive systems such a condition always holds as we mentioned earlier.

The diagonalization of the linear part of the map, so called linear diagonalization,
is performed on the next step. From Linear Algebra we know that in this case
diagonalization is possible if the matrix has exactly 21) distinct eigenvalues, which is
true for most modern circular accelerators. In this case it is possible to represent all

. . . iin -
eigenvalues as complex conjugate pairs r -e 3.

J It is easy to show that for symplectic

systems [17], the condition for the determinant to be unity entails

rj:1,;1jElR, forj=1,...,v.

If we now transfer the matrix to the new basis of complex conjugate eigenvectors v-i

corresponding to complex conjugate eigenvalues, it assumes the diagonal form

 

 

7 r1e+2ﬂ1 0' 0 0 \
O T1€_Z'u1 ... 0 0
R = 5 g g g . (1.1.24)
0 0 rve+wv 0
K 0 0 0 rue-WU I

On subsequent steps we iteratively build a sequence of non-linear transformations
of the form (1.1.22), such that on each step the constructed transformation tries to
remove one particular order of nonlinearity. The ultimate goal is to remove all non-
linearities up to a speciﬁed order but, as it turns out, this is not always possible.
Nonlinearities that can be removed by means of this transformation are called remov—
able, all other nonlinearities are called non-removable. Non-removable nonlinearities

usually characterize the non-linear nature of the system under consideration.

It is worth noting that all these transformations are nonlinear and thus they do
not affect the diagonal form of the linear map obtained in the ﬁrst two steps. Since
the process is iterative, it is sufﬁcient to describe the algorithm to make the m-th
step in order to fully determine it. Having one step of the algorithm we can proceed
in applying it from order 2 to the desired order.

On the m—th step we try to remove nonlinearities of the order 772 only. In order
to achieve it, we start by splitting the map M into a linear part ’R and a nonlinear
part Sm: M = ’R + Sm. Then we perform a transformation using a map that to the

m-th order has the form

Am =I+Tmi (1-1-25)

where I is a linear unity map, Tm has only zero terms up to order (m —— 1). The
linear part of the transformation is a unity matrix that is invertible, hence the map
Am itself is invertible. Using relations for transfer map inversion from [17], we obtain

the inverse to order m:

Applying the transformation from (1.1.22), we obtain

AmoMoAgll =m (I+Tm)o(R+Sm)o (I— 7...)
=m (1+Tm)o(7z+sm—7zo7m), (1.1.27)

=mR+Sm+(TmOR—ROTm)

Where we used the fact that any nonlinear map composed with Tm is zero to the
Order m since Tm is of the order m and does not have smaller-order terms. If we now

could choose Tm so that for communicator

10

the following condition holds

Cm = —Sm, (1129)

then result of the transformation in (1.1.27) can be simpliﬁed to
Amen/1014,31 =m 72 (1.1.30)

and the transformation Am deﬁned by (1.1.25) removes all nonlinearities of the map
up to the order m. However, such choice of Tm is usually not possible.
In order to ﬁnd the conditions for its existence, we consider the Taylor expansion
j:

of the Tm in the coordinates s]. in the eigenvector basis vi”. The Taylor expansion

for the j—th component of the Tm, has the form
:1: —- k+ — k‘ kT - h?
at = Zea-Ice Hsi) 1 (a) 1 ...(sii . (a) ., own
where
:t + —
(ijlk ,1. )

are Taylor expansion coefﬁcients for corresponding exponents of st-z; k+ and k— are

vectors of exponents

k+ = (k+....,k,jr)

k7 = (kl—,...,h,j).

Now if we, using the same notation for Taylor expansion of Cm, substitute relations
(1.1.31) for Tm and the exact expression (1.1.24) for the R into the deﬁnition of the
Communicator (1.1.28), and then use the fact that polynomials are equal if the corre-
sponding coefficients are equal to equate coefficients of the corresponding exponents

in Taylor polynomials, we obtain the expression for the Taylor expansion coefﬁcients

11

of Cm components:

'0 + —
kl +kl ,ein(k+—k—)

i + — _ .
(ijIk ,k )_ (HIT, J

int i + -—
—r -e J -(ij[k ,k ).
(1.1.32)
Substituting this expression into the condition (1.1.29) and solving the resulting

equation for the coefﬁcients of the Taylor expansion (1.1.31) of Tm, we obtain:

:L- _
—(Sm,-lk+1k )

k++k‘ . _ .-, '
((Hyzl 7.11 l ) ,ezn(k+—k )_ 7.}, ,ezlzzlij)

Now we see that the existence of the transformation (1.1.30) depends on conditions

(1.1.33)

 

i __
(ijlk+’k )2

for which the expression in the denominator in the formula (1.1.33) is not zero. If it is
zero for certain values of (k+, k“), then the corresponding Taylor expansion term of
Sm cannot be removed. Some special cases, such as symplectic systems, which often
arise in accelerator physics, as well as quantities of interest (in particular resonances
of different kinds) that can be obtained from the normal form transformation, are

discussed in details in [17].

Normal Form Defect Function

Normal forms (see [17], section 1.1.2) are a valuable part of the map methods in Dif-
ferential Algebra (DA) framework and a powerful tool in studying the dynamics of the
particles in circular accelerators. As was mentioned, the motion in these coordinates
follows nearly perfect circles around a ﬁxed point (Fig. 1.1).

If the motion has perfectly circular nature, it entails constancy of the radii that
are thus invariants of motion. This, in turn, demonstrates the stability of the motion

for the inﬁnite time and number of turns. If the motion is non-perfectly circular, a

12

. , - s - V 1‘ - ‘ n -.
. .. :1 ‘ﬁ‘:.%&3%
h-.a"’=’—"” " -.

a

       

Figure 1.1: Phase space trajectories in FODO cell, obtained for 1000 turns by ap—
plying one turn map to the vector with initial coordinates 1000 times; in conventional
(left) and normal form (right) coordinates

measure of the defect in the invariants of the motion I is:
d: max(I(M) -—I), (1.1.34)

where M is the Poincare map (see section 1.1.2), and can be introduced using
Nekhoroshev-type estimates [140]. It then could be studied to estimate the time
and the number of turns that particles stay in the accelerator and make assertions
on motion stability [16,18,20]. Note, however, that the presented approach to calcu-
lation of the invariants of motion involves Taylor expansions of maps to a speciﬁed
order, hence it allows one to obtain the expansions of the invariant radii to that order
only. This means that with this method we can obtain only approximate invariants
from (1.1.34). Nevertheless, the quality of the approximation for the weakly nonlinear
systems that particle accelerators are in most cases, improves rapidly with the order
of the approximation order.

Suppose all trajectories in normal form coordinates are perfect circles. Then we
know we have found an invariant of the system for all degrees of freedom. Now,
if the transformation from normal form coordinates to conventional coordinates is
continuous, then the set of trajectories is bounded and the motion is stable for an
inﬁnite time. For most systems under consideration this, however, is not the case.
One reason for this is that the normal form defects can be very small and, being

calculated on a computer, can be caused by floating point operations errors. Another

13

 

reason is that the particle dynamics is calculated by means of Taylor expansions up to
a speciﬁed order, so the invariants we obtain are only approximate. Here the defects
of this approximation (decreasing with increasing order) produce the deviation from
circularity. A third reason that many systems are non-integrable, i.e. they do not have
an invariant for every degree of freedom. In this case the motion is non-circular even
if the dynamics are calculated exactly, with no approximations or numerical errors.
The non-integrability of the system indicates itself in the form of small denominators
in (1.1.33) in some step of the normal form transformation algorithm applied to its
map (see section 1.1.2). The circularity of the motion in the Figure 1.1 is disturbed
for all three reasons but the non-integrability of the system under consideration most
likely has the largest impact.

Any real physical system has some construction defects and real values of pa—
rameters can deviate from designed. Rigorous estimations of the stability ranges for
perturbed motion exist, but stability predictions are possible for only for very small
perturbations and totally dominated by realistic construction errors. While the defec—
tive nature of the invariants of motion prevents us from making statements on global
stability for an inﬁnite period of time, it is still possible to estimate stability for a
ﬁnite, but still practically useful period of time, applying principles established by
Nekhoroshev [140].

In order to do so, we divide the normal form coordinate space for each degree of
freedom into a set of rings such that in each of them motion is almost circular, as
demonstrated in Figure 1.2(a). Suppose that for the ring it the defect is not larger

than Arn. Then all particles launched from ring (n — 1) need to make at least

Tn —7‘,n 1
N11 _ 1.1-
ATn ( 35)

turns before they reach the n-th ring (see Figure 1.2(b)). If we want to estimate the

14

 

 

 

 

 

 

 

co
rn
f ________________
f--_-_----_-_--1
)--' f ...........
f _____________
____________ 5 """""°
______ I
I """"" F ..-
rn-1 I """ ‘ ------- f .........
0
0 Zn
(a) Normal form coordi- (b) Particles motion in the ring: arrows
nate space divided into a point at particle positions, step corre-
set of rings where we esti- sponds to one turn, height of the step
mate the maximum defect corresponds to the defect, maximum

height corresponds to Ari, number of
steps is the number of turns N,- particle
stay in this ring. Note that the defect
gets larger towards the outer radii.

Figure 1.2: Normal form coordinate space divided into rings and schematic view of
particles motion in one of those rings

minimal number of turns it would take particles to get from the inner circle bounded
by rmin (initial region) to the outer ring bounded by Tmax (restricted region, particles

that have reached it are considered lost), we can perform the subdivision
Tmin =T1<T2 < <Tn=Tmax.

Then if maximal defects on each of the rings bounded by those radii are Ari, i =
2,. . . ,n correspondingly, the total minimal number of turns, particles need to get to

the restricted region from the initial region, is given by

(1.1.36)
i=2
Usually the normal form defect function grows quickly with radii (as can be seen,

for example, in Figures 4.8 and 4.9), hence large values of n help us to get a better
estimate of N. Since in most cases of interest the Ar,- are small, motion stability can

be assured for a large number of turns.

The practical usefulness of this method heavily depends on the ability to determine
tight and rigorous bounds for the defects of Ari. In practice, defect functions (Figs.
4.8, 4.13) are multi-dimensional polynomials of high order, with many of the high-
order elements canceling each other out. Thus they pose difﬁculties for conventional
interval methods. Studies on obtaining rigorous bounds for the maxima of normal
form defect functions [20,26,126] have lead to many interesting numerical algorithms
applicable to much wider class of problems (see section 2.2.2 for description of a
rigorous global optimizer based on the DA framework and Taylor Models [118], section
4.2 on its application to the normal form defect function). The behaviour of these
functions is highly oscillatory, the number of local extrema is high so they also present
signiﬁcant difﬁculty for conventional minimization methods. These properties are
employed in section 4.2 to test the GATool genetic algorithm-based heuristic optimizer

described in section 2.3 and in appendix B.

1.2 Neutrino Factories

1.2.1 Purpose and History

A neutrino is a special kind of elementary particle that was believed not to have
any mass, charge or color. Recent studies, however, have demonstrated that neutri-
nos have a very small (estimated to be much less than 1 MeV), but non-zero mass.
They are the most abundant constituent of the universe and have an important im-
pact on astrophysical processes, from the ﬁrst minutes after the Big Bang itself to
supernovae explosions observed today. Neutrinos are created as a result of certain
types of radioactive decay or nuclear reactions such as those that take place in the

sun, in nuclear reactors, or when cosmic rays hit atoms. There are three types, or

16

"ﬂavors", of neutrinos, named after their partner leptons in the Standard Model:
electron neutrinos Ve, muon neutrinos ”H and tau neutrinos VT. Each of those types
also has an antimatter partner, called an antineutrino (176, I7”, 177— — electron, muon
and tau antineutrinos, correspondingly). In 1998, experiments began to show that
solar and atmospheric neutrinos change ﬂavors. The processes leading to these unex-
pected masses and mixing parameters are suggested to take place at energies never
seen since the Big Bang, perhaps connected to the uniﬁcation of all forces. Precise
determinations of the masses and mixing angles of the three families of neutrinos
opens a unique window of observation into these early times [63]. These fascinating
questions of physics require an ambitious accelerator-based neutrino experimental
program [1,2].

The Neutrino Factory is a very important facility for the long-term neutrino
physics program. Modern technologies of particle accelerators, both already devel-
oped and being researched, open the possibility of building an accelerator complex

020 muons per year. The idea of an accelerator

to produce and capture more than 1
where the pious are injected into a storage ring, decay to produce muons captured
within the same ring, and then further decay into a neutrino beam was proposed
several times by different researchers starting from Koshkarev in 1974, but it has the
basic problem that the resulting neutrino beam intensity was low [39,106]. The Neu-
trino Factory idea in its current form was proposed by Geer in 1997 [75]. He suggested
creating muons from an intense pion source at low energies, then compressing their
phase space to produce a bright beam which is then accelerated to the energies of

several tens of GeV and injected into a storage ring with long straight sections where

they decay into highly intense neutrino beams
p— -—> €~Vﬂl7€, 11+ ——> elixir/e. (1.2.1)

17

 

 

Beams of such brightness can be used for the extensive study of the neutrino oscilla-
tions [5] and neutrino interactions with the required high precision.

In the US, the Neutrino Factory and Muon Collider Collaboration [146] is a
collaboration of 130 scientists and engineers engaged in carrying out the accelerator
R&D that is needed before a Neutrino Factory can be actually built. Much techni-
cal progress has been made over the last few years, and the required key accelerator
experiments are now in the process of being pr0posed and approved. In addition to
the US. effort, there are active Neutrino Factory R&D groups in Europe and Japan,
and much of the R&D is performed and organized as an international endeavor. Neu-
trino Factory R&D is an important part of the present global neutrino program. The
Neutrino Factory requires an intense multi-GeV proton source capable of producing a
primary proton beam with a beam power of 2 MW or more on the target. This is the
same proton source required in the near future for Neutrino Superbeams [33]. There-
fore, there is a natural evolution from Superbeam experiments to Neutrino Factory
experiments over time. Studies performed so far have shown that the Neutrino Fac-
tory gives the best performance among all considered neutrino sources over virtually
all of the parameter space. Its practical possibility and cost remain, however, impor-
tant questions that are being actively researched. Numerous articles and technical
reports on the progress are published. The summary reports, including international

ones, are produced every year [1,3,86,150,178].

1.2.2 Design Overview

The Neutrino Factory is a secondary beam machine; that is, a production beam is
used to create secondary beams that eventually provide the desired ﬂux of neutrinos.

For the Neutrino Factory, the production starts from a high intensity proton beam

18

that is accelerated to a moderate energy (beams of 2-50 GeV have been considered
by various groups) and impinges on a target, typically made from a high-Z material
(baseline choice is a liquid Hg jet). The collisions between the proton beam and
the target nuclei produce secondary beams of pions that quickly decay (26.0 as) into
longer-living (2.2 ,us) muon beams. The remaining part of the Neutrino Factory is
used to condition the muon beam, rapidly accelerate it to the desired ﬁnal energy
of a few tens of GeV, and then store it in a decay ring with long straight sections
where the intense beam of neutrinos is produced from the decaying muons (1.2.1).
The resulting beam can then be used, for example, to hit a detector located hundreds
or thousands of kilometers from the source.

The Feasibility Study II [150] that was carried out jointly by the Brookheaven
National Laboratory (BNL) and the US. Neutrino Factory and Muon Collider Col-
laboration, established most of the current Neutrino Factory design ideas. Although
a number of other ideas or variations of existing ones was proposed since FS II, later
studies mainly concentrated on the exploration of already proposed concepts and
their combinations. Their main goals were conducting a cost/ performance analysis
and developing consensus on a baseline design for the facility [10]. It is worthwhile to
note that the details of the FS II design are highly inﬂuenced by a speciﬁc scenario
of sending a neutrino beam from BNL to a detector in Carlsbad, New Mexico. The
results that came out of the Feasibility Studies demonstrated technical feasibility of
the Neutrino Factory, established its cost baseline, and the expected range of per-
formance. Another important feature of this design is that such a Neutrino Factory
could be comfortably constructed on the site of an existing U.S. laboratory, such as
BNL or Fermi National Accelerator Laboratory (FNAL).

Here we list the main components of the Neutrino Factory (see example of the

19

RLA-acceleration based variant of the Study Ila design in Figure 1.3) and their pri-

mary functions:

 

Proton Driver

   
  
  
 

Hg Target

Capture

Drift v beam
Buncher

Bunch Rotation Storage

 

Cooﬁng

  
 

Acceleration

 

 

 

 

Figure 1.3: Neutrino Factory schematics from the Feasibility Study IIa (RLA ac-
celeration variant) [10]

o The Proton Driver provides z 2 MW beam of a moderate energy (several

GeV) protons on target.

0 Target. A high-Z target is put inside a 20 T solenoidal ﬁeld (superconducting
solenoid) to capture pions produced in the interactions of the inciding proton

beam with the nuclei of the target material (liquid Hg jet) (see the longitudinal

20

 

distribution of the particles 12 m from the target obtained from the MARS

simulation code [138] in Figure 1.4).

 

750 ' ‘ ' Turf}.-

. ,r'. -<
. I.’ . J; ’1:
-. yr. . r s; -‘ -
500 ~ .avssa»ssi- .
‘- ". ‘_ 1‘" -_.i'.“‘ ,1; - .
.,- r'i ‘

Etotal(MeV)
~ ~i

‘3?!

s." i"

r E;

4
A .. z . —‘_'.I .
AMI ':._ _,.~.“11_;‘. .-_"
250 . .,~ .. 1. J
‘1'. .- .0 .1 .. - '
I, , ﬂu” ; ( ... . ,..
. ti * 1‘s: cut-1’ ‘
b ( ~/‘ to v1::i:;'_-L A .

 

 

0 a . LMML

0 500 A #10000 A A #1500
cT (cm)

 

Figure 1.4: Distribution of particles energies 12m from the target calculated by
MARS, Etotal = E0 + T, where E0 is a rest energy (105.6 MeV for muons), T —
kinetic energy

0 The Front End consists of the parts of the Neutrino Factory between the tar-
get and the acceleration section. It collects the pions coming from the target,
conditions them to form a beam of the muons that are produced by pion decay,
and then manipulates this beam to prepare it for the acceleration by efﬁciently
matching the beam to the accelerator acceptance (see example of the longitu-
dinal dynamics of a beam with a relatively small initial phase space in Figure

1.5, courtesy of David Neuffer [143]). It consists of the following subsystems:

— Capture. The magnetic ﬁeld at the target is smoothly tapered down to
a much lower value, 2 T, which is then maintained through the bunching

and phase rotation sections to keep the beam conﬁned in the channel.

21

 

 

 

;.
Drift l _ [[ 1'} Buncher [

.1
111111111111Hiinu

 

 

 

 

 

 

 

 

V

[ [in _ (HE)rotItor ‘ «1 Cooler
:[[[[[[[[[[[[l111111]illlminim, $3M (”111111111 1111, gnaw”
ii. , _ 1 1-1...

 

 

 

 

 

 

 

 

 

Figure 1.5: Example of the longitudinal beam dynamics in the Front End (phase-

energy plane), courtesy of David Neuffer [143]
— Decay. This region is just an empty magnetic lattice where pions are
allowed to decay to muons and where the particles of the resulting beam

develop a correlation between a temporal coordinate and an energy.

— Bunching and Phase Rotation. First the large beam of muons is
bunched with RF cavities of modest gradient, whose frequencies decreases
as we proceed down the beam line. After bunching, another set of RF cav-
ities, with changing frequencies, is used to rotate the beam in longitudinal
phase space in order to reduce its energy spread and match the frequency

to the one of the downstream RF cavities for efﬁcient acceleration.

— Ionization Cooling. A solenoidal focusing channel, ﬁlled with high-
gradient RF cavities and LiH absorbers, cools the transverse normalized
RMS emittance of the beam [60]. In this stage muons in the momentum

range of 150—400 MeV/c pass through the absorbers (from LiH in this de-

22

sign) thus reducing the total momentum (both longitudinal and transverse
components). They are then reaccelerated in RF cavities to regain the lon-
gitudinal momentum component only. The total effect is a decrease in the

transverse momentum spread and, therefore, the transverse emittance.

o Acceleration. Increases the beam kinetic energy from z 138 MeV to a ﬁnal
energy in the range of 20~50 GeV. A superconducting pre-acceleration linear
accelerator (linac) with solenoidal focusing is used to raise the muon beam
energy to 1.5 GeV. It is then followed by a Recirculating Linear Accelerator
(RLA), arranged in a dogbone geometry, that increases the beam energy to 5
GeV. Finally a pair of cascaded F ixed-Field, Alternating Gradient (FFAG) rings
with combined-function doublet magnets, is used to bring the beam energy up
to 20 GeV. Additional FFAG stages could be added to reach a higher beam

energy, deemed necessary for physical reasons.

0 Storage and Decay Ring. A compact racetrack-shaped superconducting
storage ring in which z 35% of the stored muons decay to neutrinos and are
sent toward the detector located approximately 3500 km from the ring. Muons

survive in a ring for z 500 turns.

1.2.3 Front End

Since the focus of our research is primarily in the exploration and optimization (see
Section 4.3) of the Neutrino Factory Front End section, we describe its design here
in more detail. Since there are different variations of the Front End suggested by
different research groups, including a Japanese FFAG study [81] and CERN linear

channel studies [30], here we describe only the scheme based on the Neuffer phase

23

 

rotation [74,141,142,145] and z 201 MHz RFs in cooling and acceleration. The latest
“International scoping study of a future Neutrino Factory and super beam facility”
accepts this design as the currently preferred scheme. Its additional advantage is that
it captures muons of both signs with equal efﬁciency. As most neutrino experiments
are aimed at both neutrino and anti-neutrino studies, such setup doubles the overall
efﬁciency. Finally, it replaces expensive induction linac-based design with a relatively
inexpensive array of high-frequency RF cavities thus making the overall scheme better
in terms of cost / performance.

As can be seen in Figure 1.4, pions that are produced by the nuclear collisions
on target occupy a signiﬁcantly large longitudinal phase space. The transverse phase
space is mainly determined by the magnetic ﬁeld strength of the solenoidal capture
channel. According to the properties of the dynamics of particles in a solenoid [119],

particles with the transverse momentum satisfying the following condition

p .L < 0.3—B2—R,
where B is the solenoidal ﬁeld strength and R is the radius of the solenoid are cap-
tured after the target. In order to effectively accelerate the beam, it needs to be
preconditioned to be fully contained within the capture transverse acceptance (30 it
mm-rad) and the longitudinal acceptance (150 mm) of the subsequent accelerating
section. Another constraint that the resulting beam has to satisfy is that only the
particles that are contained within the longitudinal bucket of the accelerating system
(bucket area depends on the RF frequency, phase and a ﬁeld gradient) are captured
into the accelerating regime. Transverse emittance should be decreased by cooling in

order to achieve optimal intensity. Hence the main ﬁgure of merit for the Front End

is the number of captured muons at the exit per incoming pions.

24

 

 

1.2.4 Decay, Bunching and Phase Rotation

Pions, and the muons that are produced by their decay, are generated in the target
over a very wide range of energies (see, for example, Fig.1.4), but in a short time pulse
(% 3 ns rms). Preparation of the muon beam for acceleration thus requires signiﬁcant
conditioning that includes reducing the energy spread and forming the beam into a
train of bunches. Beam splitting into multiple bunches demonstrated itself as the best
technique since in this case the bucket areas are signiﬁcantly larger than the beam
area hence a very good acceptance is expected [62].

First, the beam is allowed to drift to develop an energy correlation, with higher
energy particles at the head and lower energy particles at the tail of the beam. Next,
the long beam is separated into a number of short bunches suitable for capture and
acceleration in a 201-MHz RF system. This is done with a series of RF cavities that
have decreasing frequencies and increasing gradients along the beam line, separated by
a suitably chosen drift spaces. The resultant bunch train still has a substantial energy
correlation, with the higher energy bunches places ﬁrst and progressively lower energy
bunches coming behind. The large energy tilt is then phase rotated into a bunch train
with a longer time duration and a lower energy spread using additional RF cavities
of decreasing frequencies but constant gradient and drifts.

And example 2D simulation of the dynamics of the particles in the structure is
shown in Fig.1.5. The beam at the end of the buncher and phase rotation section
has an average momentum of about 220 MeV/c. The proposed [142] system is based
on standard RF technology, and is expected to be much more cost effective than the
induction-linac—based system considered in [150]. An additional beneﬁt of the RF-
based system is the ability to transport both signs of muons simultaneously. Finally,

we note that there are many variations of the proposed scheme in order to study

25

 

 

performance/cost relations and/or better ﬁt to different designs of other sections of
the Neutrino Factory. Examples include low-frequency rotation, phase rotation with
a scaling F FAG, variations of gradients, phases, number of different RF frequencies,
geometry of windows in RF cavities, gas-ﬁlled cavities, and other alternative designs.

The baseline Front End schematic from the latest International Scoping Study [62]
is demonstrated in Figure 1.6. The baseline proton driver has an energy of 10 GeV.
The capture system is a 12 m long channel with the solenoidal ﬁeld dropping from
initial 20 T to 2 T and the channel radius increasing from 75 mm to 250 mm. It is
followed by a 100 m long decay section where the pions decay to muons and develop
a correlation between the temporal position and an energy. This correlation is then
employed by the 50 m long bunching section to split the beam into a train of bunches
via a set of RF cavities of a modest gradient and decreasing frequencies. Then another
set of RF cavities with higher gradients in the 50 in long rotator section are employed
to rotate the beam in the longitudinal phase space to reduce its energy spread. We
describe the logic behind the choice of the frequencies of the buncher and phase
rotator later in detail later in this section. The ﬁnal RMS energy spread in this
scheme is x 10.5 %. Then an 80 m long channel ﬁlled with high-gradient 201.25 MHz
RF cavities and LiH absorbers in the solenoidal ﬁeld is used to cool the transverse
normalized RMS emittance from 17 7r mm-rad to a: 7 mm-rad at a central muon
momentum of a: 220 MeV/c.

To set up buncher parameters we choose some ideal particle to be the main central
particle of the beam. Usually this is a particle with coordinates in the center of the
beam particles coordinates distribution. We then set phases of RF cavities in such
a way that this particle enters every cavity in the same phase (993 = 0) of EM ﬁeld

oscillations. By the virtue of the equations of motion in such a structure (see, for

26

 

phase

. coolin
rotation 9

capture decay bunching

 

——> —>

 

 

 

 

 

 

 

0 12 111 162 216 295m

 

 

Figure 1.6: The baseline Front End schematics from the latest International Scoping
Study [62]
example, Chapters 13-14, [170]), particles near the central one in (1,9 — 6E) phase
space are then formed into a stable group called a “bunch”. This group then oscillates
around the central particle in the longitudinal phase space along with the motion of
this particle in the accelerator. As a consequence of our choice of the main central
particle, phase and cavity parameters, other particles are passing all cavities in the
same 903 = 0 phase and thus forming bunches around themselves. In the following text
we will call them central particles and the one chosen ﬁrst the main central particle.
Of course, all central particles are not real particles, they are just an idealization
chosen to make equations of motion simpler.

Each cavity in the buncher has its frequency set to maintain the following con-
dition: the difference in the time of arrival of any two central particles in a place of
RF ﬁeld application remains equal to a ﬁxed integer number of RF oscillation periods

and this condition is maintained as the beam propagates through the buncher

1 1 ’\rf
At=tn——tc=z —-——— =nTrf=n~—C—-,nEZ, (1.2.2)

on be
where n is the number of the bunch counted from the main central particles, to, tn
and 110, on are time-of-arrival of main central and n-th central particle (main central
particle has n = 0) and their velocities respectively, Trf is the period of RF ﬁeld

oscillations, )‘rf is the RF wavelength, c is the speed of light.

27

As the E ﬁeld phase in the RF cavities is zero for the main central particle, it is
also zero for other central particles since they pass the RFs when the ﬁeld has zero
strength and therefore their energies stay constant through the buncher. We keep the
ﬁnal frequency of the buncher and rotator ﬁxed to match the beam into 201.25 MHz
cooling and/or accelerating sections. Thus, setting n = 1, )‘rf = 51, z = L in (1.2.2),
where z is the longitudinal coordinate with z = 0 at the beginning of the drift, 5. is

the ﬁnal RF wavelength in buncher (deﬁned by matching to the following cooler), L

is the longitudinal coordinate of the last RF in buncher, we can deﬁne

1 1 1 i
6 — = —- —- — = T, 1.2.3
(ﬁ) (31 13c) L ( )
where ﬁg, ﬁn are the main central particle and n-th central particle’s normalized

velocities, and then rewriting (1.2.2) we get

 

2311—1 = 231; + 716 (g) . (1.2.4)
Therefore for kinetic energies of central particles in the buncher we have following
relation
2 —1/2
.. (11(1)) = .. 1— 11.15121 1 —1 1
c as)

where W0 is the rest energy of the particle and Tn is the kinetic energy of the n-th
central particle. From (1.2.3),(1.2.4) it follows that in order to keep the time of arrival
difference between two central particles constant, the frequencies of RFs in a buncher
should depend on the longitudinal coordinate through

C

Arf(z) = z . 6 (g) :> Vrf(2) = W, (1.2.6)

In the buncher the RF gradient is adiabatically increased over the length of the

buncher. The goal here is to perform an adiabatic capture, in which the beam within

28

 

each bunch is compressed in phase such that it is concentrated near the central par-

ticle’s phase. We arbitrarily choose this gradient to be increasing quadratically
(z — 2D)

(2 — z )2
V110) = 131—T— + C—Z—D—,

(1.2.7)
where Vrf is RF voltage, ZD is the longitudinal coordinate of the beginning of the
buncher (equal to the drift length), B and C are positive constants, deﬁned by chosen
initial and ﬁnal RF gradients in a buncher, L is the length of the buncher. Note that,
since each of the bunches is centered at a different energy, they all have different
longitudinal oscillation frequencies, and a simultaneously matched compression for
all bunches is not possible. Instead a quasi-adiabatic capture is performed in order
to achieve an approximate bunch length minimization in each bunch.

Following the buncher is the (1p — 6E) uernier rotation system in which the RF
frequency is almost ﬁxed to the matched value at the end of the buncher and the
RF voltage is constant. In this system the energies of the central particles of the
low—energy bunches increase, while those of the high-energy bunches decrease. So the
whole energy spread reduces to the point where the beam is a string of similar-energy
bunches, which are captured into the ~201 MHz ionization cooling system matched
to the central energy of the beam.

We now describe the rotator parameters calculation in more detail. At the end of
the buncher we choose two reference particles (711 and n2) which were kept (n2 — n1)
RF periods from each other along the buncher, and the vernier offset 6. We then
keep the second central particle at ((n2 — n1) + 6)’\rf wavelengths from the ﬁrst
one through the rotator. With this choice, the second central particle passes all RF
cavities in a constant accelerating phase 1,0112 having constant energy change ATn2.
F fter [Tnl — T112] /ATn2 cavities, the energies of the ﬁrst particle (usually we choose

main central particle as ﬁrst central particle) and the chosen second central one will be

29

nearly equal. From this consideration we can derive the relation between the energy
change of the n-th central particle in each cavity of the rotator and the rotator

parameters:

, 7?, — 721
ATn(Erf,(S,n1,n2) = Erf81n(2ﬂéﬂ) , (1.28)

where AT", is the energy change of the n-th central particle, 6 is the vernier parameter,
n1 and n2 are the numbers of chosen central particles, Eff is the RF gradient of the
cavities in rotator. This process also aligns the energies of other central particles and
their bunches, hence at the end of the rotator we have the beam rotated in (1,9 — 6E)
space with a signiﬁcantly reduced energy spread. A simulation of the process in
(1,0 — 6E) phase space is shown in Figure 1.5.

Combining equations (1.2.5) and (1.2.8) we can obtain the formula for the central

energy of the n-th bunch after buncher and phase rotator in terms of their design

 

parameters:
T71?"(ﬂc,6(316),Erf,6,n1,n2) = (1.2.9)
2 -1/2
. 711—71.
W0 1— ”C —1 +mErfSin(27rdn2—_7}1—)

1+nﬁc5(315)

where m is the number of the RF cavities in rotator.

1.3 Optimization Problems

1.3. 1 Introduction

We live and work to reach our goals in a world where all available resources are
restricted. There is always a limit on the amount of time, money or technologies

that we have a control on. Most often we want to achieve our goals with maximum

30

satisfaction, spending a minimal amount of the limited resources, and producing a
minimal amount of unwanted side effects. In vague terms this is a formulation of
the optimization problem. Many design problems of science and industry can be
formulated as the optimization of a certain objective function under a particular set
of constraints. Many problems of accelerator and beam physics can be formulated
as optimization problems. As of the last feasibility study [62] most of the questions
of the Neutrino Factory design R&D (see section 1.2.2) were questions of optimiza-
tion, e.g. Optimum beam energy, repetition rate, and bunch length for the Proton
Driver. Target material, optimal production of the Front End (delivery of the most
muons fully contained within the capture transverse acceptance) and the longitudi-
nal acceptance of the Accelerating section (see section 4.3), were investigated. The
simple question of how to design of the one of the basic accelerator lattice building
blocks (see section 4.1) and the complicated problem of the stability estimating for
the particle dynamics in large and complex circular accelerators (see section 4.2) are
solved using the framework of optimization. Thus we see that optimization problems
and, of course, Optimization methods, are of great importance for many areas of the
modern science.

If we can construct a function that maps the properties we control to the measures
of the properties we want to optimize (minimize unwanted, maximize wanted), we
get a mathematical optimization problem. Properties under control are called control
variables or parameters, the function of measures is called an objective function. If
we seek the minimum of the objective function, it is typically called a cost function;
if the minimum is known to be zero, it is called an error function. In cases where the
maximum is sought, it is referred to as a ﬁtness function. Since the maximization can

be turned into minimization by ﬂipping the sign of the objective function, without

31

loss of generality we can restrict our consideration to minimization problems.

Optimization Of even a single performance measure is already a hard and complex
problem, but simultaneous optimization of several measures poses additional qualita-
tive complexity. Here optimal values of different measures are frequently achievable
with different combinations of control parameters and even the deﬁnition of the op-
timal solution itself is a non-trivial problem. Therefore it is usually preferable to use
mathematical models with a scalar objective function which provides some combined
performance characteristic of the system under consideration (see [102] for a discussion
on constructing combined objective functions). The problem of optimizing a single
Objective function is typically called an optimization problem or a single-objective
optimization problem while the problem of optimizing several objective functions is
referred to as a multi-objective optimization problem.

As was mentioned earlier, resources in real-life problems are typically limited.
Those limits can usually be modelled as equality and inequality constraints imposed
on control variables. An Optimization problem in their presence is called a constrained
optimization problem.

There exist many problems that can be formulated as optimization problems and
many methods to build a mathematical model for a problem. Such a variety of
methods produces a large variety of optimization problems. Apart from the properties

mentioned earlier they are often characterized with respect to:
0 Parameter types: on/off, discrete, continuous, functions of a certain type, etc.
o Dimensionality: number of control parameters.

0 Presence of noise: noise could be present in parameters and in the objective

function values.

32

0 Properties of the objective function: modality, time-dependence, continuity, dif-

ferentiability, smoothness, separability, etc.

Classiﬁcation of optimization problems is done with respect to these properties.
For example, combinatorial optimization deals with discrete control variables from
the space that usually contains a ﬁnite number of candidate solutions. Continuous
optimization typically considers control parameters from the space of real numbers
thus the number of potential solutions is inﬁnite. Continuous Optimization, in turn,
is divided into linear programming (linear objective function, linear constraints) [167]
and quadratic programming (quadratic objective function, linear constraints) [147].
Here well-known polynomial algorithms can solve a problem provided that certain
solution existence conditions are met. Au contrary, in non-linear programming (non-
linear non-quadratic objective function, nonlinear constraints) there are no general
algorithms with guaranteed convergence and estimate on the number of operations
required.

In this work we restrict our consideration to the optimization problems for which
the control parameters are real values, and the Objective function is scalar and is
generally nonlinear. A large number of design problems can be formulated as Opti-
mization problems of this type (see sections 4.2, 4.1, 4.3). In accelerator design control
parameters usually represent physical properties of the accelerator components, e.g.

magnet positions, strengths, lengths and apertures.

1.3.2 Unconstrained Optimization

Under our assumptions we can formulate the Optimization problem as follows. Let

S C_: R” be a search domain, x E S be a vector of v control parameters assuming real

33

values, and

fi£;H—+R (L31)

be an objective function. Let x be subjected to equality and inequality constraints
g.,-(x) =0, i: 1,...,n (1.3.2)

h..i(x) S 0, i = 1,. . . ,m (1.3.3)

Then the general problem of mathematical optimization is to ﬁnd f * E R such that

f* = min f(x) (1.3.4)
x63
and corresponding x* E S:
f * = f (X*)
which is usually written as
x* = arg min f(x), (1.3.5)
xES

such that it satisﬁes constraints (1.3.2), (1.3.3).
In some cases the optimization problem (1.3.4), (1.3.5) is softened. It is considered

solved if we ﬁnd x* E S such that there exist (5 6 IR and

f* = f(x*) = min f(x), (1.3.6)
X653

where 55 Q S is a delta-neighborhood of x*. In this case f * is called a local minimum
and the problem is called a local optimization problem. The original problem (1.3.4),
(1.3.5) is called global optimization problem and the corresponding f * is called a global
minimum (x* is also called a global minimum or, to make a distinction, a minimizer).
In this work we primarily consider global minimization problems therefore for all
considered optimization problems a global minimum is sought unless it is explicitly

stated otherwise. We cannot make any assumptions about the uniqueness of this

34

minimum for an arbitrary Optimization problem. Therefore by a minimum in these
problems we mean any of the non-unique ones, if there are several, unless stated
otherwise.

In some cases a global Optimization problem is restricted by adding additional
conditions. If it is required to prove that the found minimum is global, and provide
rigorous bounds for its value, then such subﬁeld of optimization is called a rigorous
global optimization (covered in more detail in section 2.2). If the constraints (1.3.2),
(1.3.3) are deﬁned, the problem is called a constrained optimization problem, other-
wise it is called an unconstrained optimization problem. In Chapter 2 we review of
the methods of unconstrained optimization with Evolutionary Algorithms, describe
the implemented GATool continuous unconstrained optimization EA, present studies

on its performance and potential of the integration with the rigorous optimization

package COSY-GO.

1.3.3 Constrained Optimization

In this section we consider constrained optimization problems, i.e. problems (1.3.1),
(1.3.5), and (1.3.4) in the presence of constraints (1.3.2) and (1.3.3), in more detail
and introduce the relevant terminology.

When constraints are imposed, the set
F: [xe Scnvlg,(x) =0, 71,-(x) go, 1': 1,...,n, j: 1,...,m} (1.3.7)

is called a feasible set. It contains all vectors from the search domain that simultane-
ously satisfy all constraints. Such vectors x E F are called feasible, all other vectors
are called unfeasible. If at some point x E S the inequality constraint hj(x) holds

as an equality (hj(x) = 0), it is called active at x. Equality constraints are consid-

35

ered active everywhere in S. Using these deﬁnitions we can rewrite a constrained

optimization problem formulation as

f* = min f(x) (1.3.8)
xEF
x* =ar min f x,
gxeF ()

where a sought minimum is also called a feasible minimum.

Search domain S is usually given as a v-dimensional box
S={XERU|:rl§:rl§fz,l=1,...,v} (1.3.9)

and thus can be treated as a set of inequality constraints included into the feasible set
deﬁnition (1.3.7) (note that by this deﬁnition F E S). However, for most real-world
problems S can be determined rather easily by estimating physically reasonable ranges
for control parameters and thus has a simple convex structure with linear boundaries.
Feasible set, in contrast, can be speciﬁed by a large number of complex non-uniform
constraints and therefore can have an extremely complex structure. Depending on
the problem F can have nonlinear boundaries, be non-convex, not connected, have
measure of zero, or even be empty (in this case the constrained optimization problem
has no solution) and is thus hard to study and visualize. Moreover, for many test
and real-life problems IF | << (S I, hence the distinction between a search space and a
feasible set is fully justiﬁed. The quantity

_ lﬂ
p— lSl 6 [0,1] (1.3.10)

is often used as one of the measures of the difficulty of the constrained optimiza-
tion problem. Empirically it can be viewed as a measure of the difﬁculty that the
constrains are adding to the problem, in comparison with the difﬁculty of the un-

constrained problem with the same objective function. Generally, the smaller p is,

36

the harder it is for the algorithm to ﬁnd feasible points in the search space. Note,
however, that for an arbitrary problem this factor is hard to estimate because of the
unknown and frequently complex structure of F. Random sampling of the search
space is usually employed for such estimation [130].

It is worth noting that the p factor alone does not determine the constrained
problem’s difﬁculty completely. However, the theoretically developed framework for
such analysis and comparison of different problems does is not established yet. Most
of the difﬁculty ratings are assigned heuristically and are derived from the practice.
For example, it is well-known that the problems with convex feasible sets are easier
to solve than the ones with non-convex feasible sets; that the problems with disjoint
F are harder to solve than the ones with connected F, etc [135]. It is also known
that the difﬁculty of the problem often increases as the number of constraints that
are active at the sought feasible minimum increases (for an arbitrary problem this
information is not available before the minimum is found). It is also worth noting,
that the optimization performance is algorithm-dependent (see section 2.1.4), hence it
cannot be measured for the problem itself without the considerations on the algorithm.
Some work towards characterizing constrained problems and determining if they are
EA-hard can be found in [57].

Inequality constraints (1.3.3) can be transformed into equality constraints by in-

troducing “dummy” variables éj’ j = 1, . . . ,m. In this case each inequality constraint
x) g 0

is converted into an equivalent equality constraint

h.j(x) + :32- : 0.

Each equality constraints (1.3.2) can in turn be transformed into two inequality con-

37

straints:

—g,l-(x) < 0, i=1,...,m (1.3.11)

gi(X)SO, i=1,...,m,

or, for the methods that do not rely on smoothness of the constraint functions to one

inequality constraint

]g.,;(x)| S O, i = 1,... ,m. (1.3.12)

For practical purposes of non-rigorous Optimization
]g,i(x)|—e§0, i=1,...,m, (1.3.13)

where e is an acceptable tolerance for equality constraint satisfaction is also frequently
used. Using these transformation we can limit our consideration to the problems
with either equality-only or inequality-only constraints without loss of generality. For
simplicity we consider only inequality constraints, i.e. constraints of the type (1.3.3),
treating n as a total number of constraints. In this case the feasible set (1.3.7) is
deﬁned as

F={x€SC_Zle]hj(x)gO,j=1,...,n}. (1.3.14)

If certain conditions on the constrained problem are satisﬁed, methods to solve
it analytically can be applied. For example, the well-known and widely applied La-
grange Multipliers Method requires an objective function and constraint functions be
written in an algebraic form, be deterministic, and differentiable. The generalization
of these conditions are Karush-Kuhn-Tucker (KKT) conditions [110] which formu-
late the necessary conditions for a point to be a conditional extrema. Additional
assumptions formulated in a variety of different forms and called regularity condi-

tions assure that the solution is non-degenerate. Under additional assumptions about

38

constraint functions, for example, when inequality constraint functions are afﬁne and
equality constraint functions are convex, sufficient conditions for the point to be a
global minimum can be formulated [7].

Unfortunately, many real-life problems are posed in such a way that their objec-
tive and/or constraint functions make the KKT conditions not applicable. In these
cases various numerical optimization methods are usually employed. Constraints are
often incorporated into an objective function or used to transform the problem into a
multi-objective optimization problem with help of penalty and barrier functions (see
Chapter 3) or the Lagrange multipliers method. After such simplifying transforma-
tion the optimization methods for the unconstrained problems can be applied to solve
the constrained problems. In general, most optimization methods for constrained
problems are based on the methods designed for unconstrained problems [147]. In
Chapter 3 we review constrained optimization methods (mostly those used in Evolu-
tionary Algorithms), propose a new method for a constrained optimization and study

its performance.

39

 

 

CHAPTER 2

Unconstrained Optimization

2.1 Optimization Methods

Once the mathematical model of the problem is developed, the types of the control
parameters are chosen and the objective function is constructed, the problem is most
frequently need to be solved. For all but rigorous optimization problems, point-based
iterative methods are most often used. Each step, they operate on a pOpulation
of points (for single-point methods it consists of one point) to generate the next,
supposedly better population in order to eventually converge to the sought minimum.
Single-point methods (also called descent methods) typically use the greedy iterative

search strategy from Figure 2.1.

2.1.1 Derivative-based Methods

If the objective function is sufﬁciently differentiable, derivative-based methods can be
utilized. The classic and the most well-known among them are: Newton’s, Steepest

Descent, Conjugate Gradient and Quasi-Newton methods, which all are iterative

4O

 

 

1. Start from the initial guess x0.

2. Compute the search direction pk.

3. Choose the step )‘k to achieve <p(/\k) = f(x]C +’\kpk) < f(xk) 2 90(0)
(more or less extensive line search).

4. If the move is successful, move to the next point xk+1 = xk + Ach'

5. Repeat steps 2-4 until “f(xk) — f(xk+1)|| < e, where e is a required

precision.

 

Figure 2.1: One-point greedy iterative search strategy

single-point local minimizers. Multi-start and clustering techniques were developed
in an attempt to turn them into multi-point global minimizers [172]. These techniques
increase the method’s chances of ﬁnding a global minimum rather than being trapped
in one of the local minima.

If the derivative cannot be obtained or is too expensive to calculate, derivative-free
direct search heuristic methods can be employed. These methods can be determinis-
tic or stochastic depending on the usage of random numbers in the search procedure.
Examples of the direct search methods are: Random Walk, Simulated Annealing,
and Hooke-Jeeves, which are single-point methods, and Nelder—Mead (nonlinear sim-
plex), Evolutionary Algorithms, and Particle Swarm Optimization, which are multi-
point methods. Rigorous optimization typically employs interval methods (see section
2.2.2).

If the objective function f is analytic, then denoting the gradient of the function
f (x) at the point x0 as g(xO) and its Hessian matrix at the same point as H(x0),

we can write its Taylor expansion at x0 in the form:
f(X) = f(xo) + s(><0)(x - x0) + g(x - X0)TH(X0)(X - X0) + - -- - (21.1)
If we then differentiate this expression, we get the expansion for the gradient:
g(x) = g(xO) + H(x0)(x — x0) + . . .. (2.1.2)

41

 

Taking into account that the necessary condition for the point to be a minimum is
that g(x*) = 0, substituting into (2.1.2) and neglecting terms of the order 3 and
higher since the function is approximately quadratic near the minimum, we obtain

the formula for the minimizer
x* = X0 — H_1(X0)g(X0) (2.1.3)

that is obviously exact for quadratic f (x). For non-quadratic twice differentiable
functions, it can be turned into the iterative procedure given in Figure 2.1. By
replacing x* with xk+1, x0 with xk and multiplying the direction by the step size

Ak, we obtain the multi-dimensional Newton’s method formula:

xk+1=xk —/\k-H_1(xk)g(xk). (2.1.4)

This method basically approximates a function at the current point with the quadratic
part of the Taylor polynomial and then makes a step to the minimum using the exact
formula (2.1.3). Since the formula is exact for the quadratic functions only, this step
does not reach the minimum but hopefully produces a next point that is closer to it.
In the sufﬁciently small neighborhood of the minimum the function is dominated by
the quadratic terms of the expansion (2.1.1) so the approximation gets more accurate
and the convergence speed increases.

However, the calculation of the Hessian matrix on each step is computationally
expensive. Its inversion is also an expensive and, moreover, numerically unstable
operation. Various methods like Gauss-Newton, Fletcher-Reeves, Davidon-Fletcher-
Powell, Broyden-Fletcher-Goldfarb-Snanno and Levenberg-Marquardt [147] were de-
veloped to avoid this problem.

One of the simplest of the gradient-based methods, the method of the Steepest De-

scent (also called Gradient Descent), is based on the fact that the function decreases

42

with the largest rate in the direction Opposite to the direction of the function’s gra-

dient at this point. Hence the step direction pk is chosen as
p), = —g<xk> (2.1.5)
and the iterative formula is
xk+1= xk — “\k - g(xk). (2.1.6)

Formula (2.1.6) can also be viewed as formula (2.1.4) with inverse Hessian approx-
imated by the identity matrix. However, such an approximation is very crude and
leads to the step size control problem due to the loss of the information about the
function curvature contained in the Hessian. This leads to very slow convergence
rates for functions like Rosenbrock’s function (see section C4). The anti-gradient in
the narrow valleys seen on its contour plot is directed towards another wall, while
the direction that leads to the minimum positions itself along the walls, i.e. almost
orthogonal to the direction calculated by the Steepest Descent method. Hence a typ-
ical path to the minimum consists of a series of zigzags from one wall to another,
the overall progress to the minimum is slow and the search process could be termi-
nated prematurely by Spending all its budgeted number of steps. This problem is
well-known in optimization and is referred to as the “error valley” problem.

Other derivative-based methods designed to solve the problems of slow conver-
gence and high computational cost can roughly be divided into two categories. Quasi-
Newton methods use formula (2.1.4) with various approximations for the inverse Hes-
sian while Conjugate Gradient methods employ another scheme to select directions p k
(based on the conjugate gradient method developed for fast minimization of quadratic
functions). Some methods combine both approaches adding heuristics to determine

directions and step sizes. The disadvantage of these derivative-based methods is that

43

in order to work they require the differential of the objective function or the second
differential which are often not deﬁned or are expensive to obtain. Also, while their
convergence on the quadratic functions is very fast, it generally does not hold for ar-
bitrary nonlinear functions. Finally, they are all local minimization methods so they
are best suited for unimodal objective functions. For multimodal objective functions
several approaches were developed: Multistart techniques where optimization with
one of the iterative methods from Figure 2.1 is started several times from different
initial points and Clustering methods which attempt to identify basins of attraction
(or clusters) for each extremum in order to determine the number of initial points
needed to ﬁnd all minima.

Here we describe one of the heuristic Quasi-Newton methods, namely the
Levenberg-Marquardt method. It forms the core of the LMDIF optimizer, one of the
built-in COS Y Inﬁnity [23] optimization methods. We start with the Gauss-Newton
method that is designed to solve nonlinear least squares problems, i.e. problems where

the objective function has a special form:

m
2
500 = Z (g(x)) . (2.1.7)
i=1
Here x typically consists of 1) parameters to be ﬁtted, fi, i = 1, . . . , m are the functions

of x, typically experimental results. If we denote f(x) = (f1(x), f2(x), . . . ,fm(x))T,

we can write the objective function as
_ T
S(x) — f(x)f(x) . (2.1.8)
Then its gradient is given by

g(S(x)) = 21f(x)Tr(x), (2.1.9)

44

where

 

Jf(x) = Jacf(x) = {agi:) } 1 . 1 (2.1.10)
2 m g

and its Hessian is given by
HS(x) =2JfT( x)J +()2§:f,-(x) x)Hfz,(x) (2.1.11)

where H fl, is the Hessian of fi~
Usually the objective function S is constructed so that its minimum value is zero.
By (2.1.7) it is attained only at the point x* where all f]; are zeros. For continuous
fz- this means that in the neighborhood of the minimum the second term in the
expression (2.1.11) for the Hessian of S (x) is getting close to zero and the Hessian
can be approximated as
HS(x) z 2Jf(x)TJf(x). (2.1.12)
Substituting expressions (2.1.9) and (2.1.12) into Newton’s method iterative formula

(2.1.4), we obtain an iterative formula for the Gauss-Newton method:

Xk+1= Xk — /\k ' (Jf(xk)TJf(Xk))_1Jf(Xk)Tf(Xk). (2.1.13)

The advantage of this method is that while it does not require a computation of
the second derivatives, it still uses information from them (although only approxi-
mately). In cases where the sought minimum is greater than zero, the neglected term
in expression (2.1.11) for the Hessian can become signiﬁcant thus making the approx-
imation (2.1.12) crude and decreasing the quality of the search procedure (2.1.13).
In this case the Leuenberg-Marquardt algorithm, which is a heuristic combination of
the Gauss-Newton algorithm and the Gradient Descent could be a better approach.
It is generally more robust (albeit sometimes slower) than the Gauss-Newton or the
Gradient Descent algorithms alone [77], in the sense of reliably ﬁnding solutions even

if the initial guess is far from the resulting minimum.

45

The Levenberg-Marquardt algorithm iterative formula is a slightly changed version

of the Gauss-Newton formula (2.1.13):

xk+1 = Xk - Mt : (JflxleJfkal + 71)‘13r(X/.-)Tf(Xk)- (2114)

Here I is the identity matrix, 7 is a non-negative value called damping parameter. It
is adjusted on each iteration using the following logic: if the value of the objective
function S decreases rapidly, the damping factor is decreased to make the algorithm’s
behaviour closer to that of the Gauss-Newton algorithm. If an iteration results in
insufﬁcient change of an objective function value, the damping factor is increased to
make algorithm’s behaviour closer to the one of the Steepest Descent. The choice of
the damping parameter and a scaling strategy is usually a matter of heuristic and
might require ﬁne-tuning for the problem.

Formula (2.1.14) is actually a development of Levenberg, while the insight Of Mar-
quardt was to replace the identity matrix with the diagonal of the approximated Hes-
sian (2.1.12) in order to use information contained in it even when the damping factor
is high and the method behaves like the Gradient Descent. This helps to avoid the
classic “error valley” problem mentioned earlier. The Levenberg-Marquardt method
is essentially heuristic, which makes it hard to theoretically prove its convergence, but
it is known to work extremely well in practice and thus is often considered as one of
the standard methods of the nonlinear Optimization. Note, however, that for higher-
dimensional problems its performance is signiﬁcantly reduced by the expensive and
ill-conditioned matrix inversion needed on each step of the iteration (2.1.14). Several
initial steps of the LMDIF minimization process (which is a heuristically enhanced
implementation of Levenberg-Marquardt algorithm) on the 2-dimensional Sphere test
function (see section C.1) starting from the initial guess (10,10) are shown in Figure

2.2.

46

 

\‘H \‘
‘\ \. \
\. \\
\\. \1
\ \
\ \
\\‘
\\ \ \
K "\ \ \ I
\ \ ‘\\ \c
\\ \\ \ N
\\ \[ \ ‘
i \x x
\\

/
14”“ j i
-r/
l

q

 

- - ‘ l
i , l
\V/ // (l l l
// ’ / /
_4 .\~..L . / . / . / . / .
'4 '2 0 2 4 6 8 1 0

q

 

 

 

Figure 2.2: First steps performed by the LMDIF (COS Y Inﬁnity built-in optimizer)
on the 2-dimensional Sphere test function (see section C.1) starting from the initial
guess (10,10)

2.1.2 Direct Search Methods

The methods described so far require the Objective function to be at least one time
differentiable and its derivatives to be cheap to obtain. Unfortunately, for most real-
life problems these conditions do not hold. Therefore we cannot use derivatives to
select the direction and the step size for the greedy strategy from Figure 2.1. The
Direct Search heuristic algorithms, also known as “generate—and-test” methods, are
heavily used for such problems. Their distinctive feature is that they divide the next
point search into generation and selection phases. Possible moves generated during
the ﬁrst phase are either accepted on the second phase and the iteration advances or
they are rejected and then a new move is generated.

The simplest direct search method is probably the Brute Force method. Here

the search domain is covered by a grid which is then visited point by point and

47

the best found minimizer is updated every time a better one is found. Due to its
search method, this algorithm is also called the Naive Sampling method. It suffers
from several obvious drawbacks: strong dependence of the Optimal grid size on the
problem and an exponential growth of the number of the points in the grid with
dimension. Since the algorithm visits all points in the grid during search, it leads to
an exponential growth of the search time.

The Random Walk method uses a v-dimensional Gaussian distribution to generate

trial step vectors Ax randomly. Then the trial points on k-th step are given by
xk,trial = xk—l + AX-
The selection is greedy, i.e. the ﬁrst trial point such that

f (Xk,trial) < f (Xi—1)

is accepted as the new iterate. While the method seems to not be as badly affected
by the “dimensionality curse”, the problem of determining the optimal parameters for
the Gaussian distribution that is used to generate step sizes still remains.

Hooke and Jeeves method also known as Pattern Search attempts to dynamically
adjust step size by exploring coordinate axes separately using per-axis step sizes,
which are reduced if the trial move is unsuccessful. In practice, this approach is
more effective than the Random Walk and Brute Force methods. More sophisticated
techniques to ﬁnd the direction and step size exist but methods of this type are still
typically used only in combination with other methods.

Greediness of the selection process in direct search algorithms often leads to their
convergence at a local minimum once they get into its basin of attraction. To avoid
this problem, the Simulated Annealing algorithm [105] modiﬁes the selection criteria

to also accept some “uphill” moves on the function’s landscape. This method is

48

frequently used for metaheuristic algorithms and is also one of the built-in COS Y
Inﬁnity optimizers called ANNEALING.

Strictly speaking, Simulated Annealing is not a method, it is a selection strategy
that replaces the greedy selection from step 3 of the greedy iterative search algorithm
from Figure 2.1. It helps to avoid being trapped in a local minimum, which is a
common case for greedy methods, and increases the chances of ﬁnding the global
minimum. As such it is often used in conjunction with direct search methods, most
frequently with the Random Walk method.

The inspiration for the method is the annealing process from metallurgy when a
material is ﬁrst heated (recovery phase) and then slowly cooled to change its properties
such as strength and hardness (recrystallization phase). The heating causes atoms to
move freely and the slow cooling gives them time to ﬁnd conﬁgurations with minimal
energy. A strategy built on this analogy allows the current search point to move to
the next point with a probability that depends on the difference of the function values

at the current and the candidate points (energy difference)

dz: = f (Kama) — f (Xi—1)

and the value of a parameter T (temperature) that is gradually decreased as the
search progresses. There are several parameters that influence the performance of the

method:

1. Probability of acceptance: deﬁnes the probability with which the next move
is accepted. It has to posses the following properties: be non—zero for any
values of d and T, but such that the probability of accepting a move with
d > 0 (function value increases) decreases as temperature decreases, while the

probability of accepting moves with d < 0 (function value decreases) increases

49

or stays constant (in classical Simulated Annealing it is equal to 1 for all such
moves). Frequently used is the following formula for the probability satisfying
these requirements:
1 ,d < 0 or T = 0
P(d, T) = _ d . (2.1.15)
e T ,Otherwise

Since there is a non-zero probability of accepting moves that worsen the ﬁnal

result, the best found point is typically saved separately.

. Annealing temperature schedule: determines the change of temperature with
iterations T = T(k). If the temperature decreases too fast, the method might
converge prematurely. If it decreases too slowly, calculations might take an

unnecessarily long time. Also note that for T = 0 the strategy turns to greedy.

. Trial point generation method: is not a part of the Simulated Annealing algo-
rithm itself but since the choice of the probability of acceptance and annealing
schedule depend both on a problem and on an exploration method, all three

control parameters should be selected and ﬁne-tuned together.

Further developments and enhancements of the algorithm like Adaptive Simulated

Annealing, Boltzman Annealing, Simulated Quenching, Fast Annealing and Rean-

nealing are discussed in [91]. Several initial steps of the ANNEALING minimization

process (which is an implementation of the Random Walk method with Simulated

Annealing selection strategy) on the 2-dimensional Sphere test function (see section

C.1) starting from initial guess (10,10) are presented in Figure 2.3.

Another heuristic search method that tries to avoid a convergence to a local min-

imum is the well-known Nelder-Mead method, also known as the Deforming Polyhe-

dron Search or the Nonlinear Simpler Method [92]. The main idea of the method is

50

12

 

10

 

our-boon

I
N
I
I,/’a“‘\
/ /
sf
M,
Mg- A
\ — ‘, '
‘F M

 

 

 

 

81012

Figure 2.3: First steps performed by the ANNEALING (005 Y Inﬁnity built-in
optimizer) on the 2-dimensional Sphere test function (see section C.1 starting from
the initial guess (10, 10)

to use a “search Object” which in this case is a polyhedron with (v + 1) vertices in
v-dimensional search space called a simplex. After creation of the initial polyhedron,
search proceeds with the search object being transformed and moved in the search
space in order to reach the minimum. Objective function evaluations in its vertices
are used to measure the performance and to select appropriate transformations and
movements. The initial simplex is either generated randomly, or is set as one of the
parameters. Note that if it is too small, the algorithm can be trapped in a local min-
imum. After the simplex is generated, the iterative search process given in Figure 2.4
can be initiated. There is a certain level of ﬂexibility in deﬁning the control flow and
conditional transitions, which sprouted different variations of the method, so Figure
2.4 demonstrates only one of those existing variations. Classic values for a, ﬂ, 7, ando

— the reﬂection, expansion, contraction and shrinking coefﬁcients — are 1, 2, 0.5,

51

and 0.5, respectively.

The advantage Of the simplex is that it can adapt to the Objective function surface
and thus efﬁciently control the step size. However, for complicated objective functions
(v + 1) points might not be enough to build a good model of the landscape, hence
there exist methods that use different “search objects” with more sample points, for
example, complex, which contains 2v points [136]. The Nelder-Mead method is the
last one of the built-in COS Y Inﬁnity optimizers and is called SIMPLEX. Several
initial steps of the SIMPLEX minimization process on the 2-dimensional Sphere test
function (see section C.1) starting from the initial guess (10, 10) are shown in Figure

2.5.

2.1.3 Evolutionary Algorithms

Another family of methods that use many points to explore the objective function
landscape is inspired by the process of evolution described by Darwin in his revo—
lutionary work “Origin of Species”, ﬁrst published in 1859 [51]. According to it, the
main driving forces of evolution are the variability in living organisms and the natural
selection implicitly performed on them by the environment. Over time these forces
shape different species to be very sophisticated inhabitants of the environment, i.e.
make them ﬁt to it.

If we view an Objective function as an environment and points in a search space
as organisms evolving to ﬁnd the best places in this environment (which are for
our purposes minima), we can easily sketch a general model of evolution suitable for
optimization which is called an Evolutionary Algorithm (EA) (see Figure 2.6). Having
the evidence of the efﬁciency of this algorithm in a variety of very well-ﬁt organisms

on Earth, there emerged a strong belief that its main principles can be applied to

52

 

0. All vertices are ordered and relabeled according to the corresponding
function values:
f(X1)S f(X2) S .<_ f(xv+1)'
centroid of all points except the worst one is calculated:
_ 1 v .
x’m — v 21:139-
1. Reflection is performed:
Xr = xv+1+ a(Km ‘- xv+1)-
If f(xr) < f(xl), reflection improved the best point,
go to step 2.
If f(x1)< f(xr) < f(xv), reflection improved the next worst point,
xv+1 = x7‘ ' '
go to step 0.
If f(xv) < f(xr) < f(xv+1), reflection improved the worst point,
go to step 3.
Else reflection failed to improve the worst point,
go to step 4.
2. Expansion is performed:
Xe = Xr + ﬁ<Xm — Xv+1).
If f(xe) < f(xr), expansion improved reflection,

xv+1 2 X6-
Else expansion failed to improve reflection,
xv+1 = 3‘7"

Go to step 0.
3. Contraction is performed:

x0 = xv+1+ 7(xm - xv+1)-
If f(xc) 5 f(x)»), contraction improved reflection,

xv+1 = xr.
go to step 0.
Else contraction failed to improve reflection,
go to step 4.
4. Shrinking of the whole simplex around the best point is performed:
x, =x1+a(xi —x1),i=2,...,v+1.
Go to step 0.

 

 

Figure 2.4: Nelder-Mead method iteration

53

 

 

 

 

 

 

Figure 2.5: First steps performed by SIMPLEX (COS Y Inﬁnity built-in Optimizer)
on the 2-dimensional Sphere test function (see section C.1) starting from the initial
guess (10,10)

function optimization problems equally successfully.

 

 

Generate initial population, evaluate fitness
While stop condition not satisfied do
Produce next population by
Selection
Recombination
Evaluate fitness
End while

 

Figure 2.6: Evolutionary Algorithm

Note that EA is actually a meta-algorithm and that all algorithms described earlier
can be formulated in this form. For single-point algorithms the population consists
of just one individual, selection and recombination are not applicable, the process

of producing a next search point can be viewed as a mutation. Multi-start methods

54

 

differ from the single-point ones only in the size of the population. The Nelder-Mead
method (see Figure 2.4) resembles EA more closely: it maintains a population of
(v + 1) points and uses both selection (by sorting its points using function values and
replacing the worst on each step) and recombination (during reﬂection a new point is
generated by means of other points in the population) to perform the optimization.
Expansion, contraction and shrinking here can be considered as different types of
mutation.

Because of such generality, it is commonly agreed that the family of Evolutionary
Algorithms includes only the ones that directly imitate the processes Of evolution
and use evolutionary terminology to describe their search strategies. A particularly
important distinctive feature of EAs is that the members of the population actively ex-
change information about the search space. Despite these distinctions, the boundary
is still blurry and some EAs, for example the Differential Evolution [171] algorithms,
are closer to multi-point direct search methods than to the “true” Evolutionary Algo-
rithms.

It is worth noting that EA does not pose any restrictions on the search space and
members of the population, which, multiplied by a variety of different approaches
to deﬁne ﬁtness, selection, recombination and mutation, leads to a very broad ﬁeld
of applications. Examples include a wide variety of optimization problems: numer—
ical optimization, combinatorial optimization, circuit design, scheduling problems,
video and sound quality optimization, control systems [53], image analysis [50], mar-
keting [164] and economics [12], trafﬁc control [35], manufacturing [90], and many
others. While EAs do not explicitly guarantee to ﬁnd even a local minimum, practi-
cal applications demonstrate that frequently they are able to ﬁnd a global minimum

or at least produce a practically acceptable solution.

55

Each of these applications is usually tied to a particular ﬂavour of the Evolu—
tionary Algorithms. Genetic Algorithms (GAS) [78,79] Often encode parameters as
strings of bits and modify them with logical operators and thus are better suited
for combinatorial optimization, for example for the class of problems equivalent to
the famous Traveling Salesman Problem [158]. Genetic and Evolutionary Program-
ming [107] evolve computer programs and are used, for example, to design snippets
Of code to ﬁlter out unimportant events from the ﬂow of the events coming from a
detector in high-energy physics. Evolution Strategies (ES) [82] and Differential Evo-
lution (DE) [171] both use real numbers and arithmetic evolutionary Operators for
continuous function optimization. It is also worth noting rapidly increasing inter-
est in the development of the optimizers mimicking various optimization and search
processes of nature: Particle Swarm Optimization, Ant Colony Optimization, Tabu
Search, Cultural algorithm, etc. [165] and their successful application to many real-
world problems.

In this work we consider EAs primarily in the context Of the real-valued functions
optimization. However, the overabundance of the variations of the Evolutionary Al-
gorithms suitable for the task does not allow us to cover them all. Our primary
goal was to ﬁnd and implement the one that is proven to be robust and efﬁcient,
with a default set of parameters effective for many applications and then to assess
its applicability to our problems. The description of the Evolutionary Algorithm we

implemented is presented in section 2.3.

2.1.4 No Free Lunch Theorems for Optimization

With such a large variety of optimization methods, having described only a few well-

known ones, one can conclude that there is no need for new algorithms. However,

56

each method has its own strengths and weaknesses and the variety of optimization
problems is much larger than the variety of methods. Thus development actively
continues, especially along with the rapid growth of the available computing resources,
which is opening the possibility to solve increasingly complex problems.

TO combine the best features of different methods and compensate for their weak-
nesses, combinations of methods are sometimes used. The Levenberg—Marquardt
method described earlier, is a heuristic combination of the Steepest Descent and
Gauss-Newton methods, and it works very well in practice. Even more sophisticated
combinations can be constructed using several direct search methods. Consider, for
example, a perfectly viable combination: a multi-start method, sampling points of the
search space by using them as initial points for the greedy direct search method com-
bined with the Simulated Annealing to compensate for its greediness. Combinations
of rigorous methods with good heuristic methods could signiﬁcantly speed up their
convergence (see section 2.3.6) and increase their robustness (see section 3.3.5). The
number of parameters for state-of-the-art methods and for their combinations can be
so large and their interactions so complex, that a separate Optimization technique to
adapt method parameters to the problem might be used.

Much controversy was caused by the so called “No free lunch theorems for opti-
mization” [176] which in rough terms states that the difference in performance of all
search algorithms averaged over all optimization problems is negligible. Every method
can perform better at certain classes of problems only at the price of performing worse
on other classes and this performance depends on the amount of information about
the problem incorporated into algorithm, i.e. on ﬁne-tuning of the algorithm for the
problem.

While this means that theoretically there is no algorithm that outperforms all

57

other algorithms on all problems, in practice we are always solving a particular prob-
lem or a class of problems. Therefore, we can interpret these theorems as an additional
reason to ﬁnd the best matching optimizer for each circumstance. N o efﬁcient general
purpose optimization method exists, hence there is always a SCOpe for improving al-
gorithms for better performance on particular problems. It must be noted, however,
that since human time is getting more and more expensive with computation time
becoming increasingly cheap and since we usually need a good solution in a reasonable
time rather than the best solution in an optimal time, running an algorithm which is
not optimal but capable of ﬁnding satisfactory solution in an acceptable time might
overall be more beneﬁcial than spending precious human time on ﬁnding the optimal

method.

2.2 Rigorous Global Optimization

2.2.1 Conventional Interval Methods

While local optimization methods solve only the problem of ﬁnding the extremum of
the function and in most cases they settle at a local extremum, global Optimization
methods have an additional goal to achieve. They need to prove that the extremum
they ﬁnd is global in the given search domain. Rigorous, veriﬁed or validated global
optimization is dealing with the problem of ﬁnding mathematically rigorous enclo-
sures for the extrema and the points in which they are attained, accounting also for
numerical computational inaccuracies. Note that for this purpose the function under
consideration is usually required to be sufﬁciently smooth on the search domain.
The most straightforward approach to validated global Optimization is a version

of the well-known general “divide and conquer” strategy: the large problem that is

58

hard to solve as a whole, is divided into a set of smaller problems which are easier to
solve (or conquer, following the terminology). For the global Optimization this strat-
egy takes the form of the branch—and-bound method that divides the original search
domain (1.3.9) into a stack of sub-boxes and conquers them one-by-one, removing
each box, proven not to contain a minimum, from further consideration, reducing its
size, or dividing into even smaller boxes for subsequent processing. This strategy is
applied on each step until the desired accuracy is reached. A rigorous estimate of
the upper bound of the minimum, a so called cutoff value, is frequently employed to
speed up the elimination. It can be calculated, for example, using interval arithmetic
evaluation of the function at some point in the box. Typically, interval calculus is
also used to obtain rigorous estimates for the maximum and minimum values of the
function on a box.

Algorithms based on the outward rounding interval arithmetic allow one to ob-
tain a rigorous estimate Of the global extrema. However, there are several problems

connected to their usage, of which the two most important are [26]:

1. Dependency Problem: resulting bound is not tight due to a signiﬁcant overesti-
mation introduced by the complexity of the function and features of the interval

arithmetic operations (these features at the same time make them rigorous).

2. Cluster Effect: the number of boxes in the stack that are located near the local
extrema remains almost constant for a prolonged period of algorithm execution

thus slowing down the elimination process.

The ﬁeld of global optimization is very wide and there exist many different ap-
proaches and algorithms which we do not cover in detail in this work. A detailed

survey of interval global Optimization methods can be found, for example, in [56,101].

59

2.2.2 Taylor Methods

Efﬁcient Taylor model methods that solve the dependency problem were ﬁrst de-
veloped in [125] for the problem of rigorously bounding a particularly complicated
function, which was introduced to solve a practical problem from the ﬁeld of nonlin-
ear dynamics (normal form defect function Optimization is discussed in section 4.2).
Detailed coverage of the current Taylor model methods research status, theory, im-
plementation, applications of the Taylor models, and numerous Taylor model-based
methods to solve many important problems of scientiﬁc computing, can be found
in [117]. Here we just outline the basic ideas behind these methods and describe their
application to veriﬁed global Optimization.

For the function f that is (n + 1) times continuously partially differentiable on a
domain D, a Taylor model of the order n consists of the Taylor polynomial P for f,
expanded at the point .230 E D up to n-th order and a remainder error bound interval
I such that

f(z) E P(:r,a:0) + 1, Va: 6 D. (2.2.1)

Given the Taylor model for the function on a search domain, one could perform
naive bounding by merely evaluating the polynomial P in interval arithmetic and then
summing results with remainder interval I. Even such a simplistic approach outper-
forms nai’ve interval methods or more advanced Centered Form methods applied to a
function f directly [117]. However, more sophisticated and efﬁcient range bounders
based on the Taylor model representation (2.2.1) were developed and implemented.
Two of them, namely the Linear Dominated Bounder (LDB) and the Quadratic Fast
Bounder (QF B), outlined in this section and covered in detail in [26,117], form the
core of the Taylor model-based veriﬁed global optimization package COSY-GO, im-

plemented in COS Y Inﬁnity [23].

60

Linear Dominated Bounder

The Linear Dominated Bounder (LDB) is based on the observation that in the Taylor

model representation (2.2.1) the linear part of the Taylor polynomial P frequently

dominates model’s behaviour and thus plays the main role in range bounding. This

linear part is utilized by the LDB in order to reduce the search domain that encloses

extrema. To ﬁnd the lower bound of M, the minimum Of the Taylor model (2.2.1) in D,

it performs the steps given in Figure 2.7. All re-expansion errors and point function

evaluations from step 3 are accounted for by including them into the remainder error

bound interval. Note, that even if there is no linear part in the original Taylor model,

it can often be introduced by shifting the expansion point.

 

 

1.

. Flip the coordinate directions to make all the linear coefficients L,- of

. 0n the n-th step compute the bound of the linear (IL) and nonlinear (IN)

Re-expand P at c, mid-point of D, to obtain polynomial Pm on the
centered domain D1.

the Pm positive so as to obtain the polynomial P+.

parts of P+ = L+ N on Dn. Then, according to the rules of interval
arithmetic, the minimum is bounded by [M , Min] = IL + IN.

If possible, reduce Min using the minimum of the current cutoff value,
the function values calculated at the left endpoint and the mid-point.

4. Calculate width of the resulting range: d = width(M,Min).

a) If (1 < e, desired accuracy 5 is reached, stop. M is a lower bound of
the minimum.

b) Else, if L2- # 0, the domaini containing the minimum is reduced by
setting

D—n+1,i = min(_l_)_n,z' + d/Lii-D-nﬂl-

Perform steps 1 and 2 on Dn+1 to obtain new 13+. Return to step 3

to perform (n+1)-st iteration.

 

 

Figure 2.7: LDB range bounding algorithm based on Taylor model (2.2.1)

61

Quadratic Fast Bounder

~The Quadratic Fast Bounder (QFB) is designed to work in cases where the linear
part of the Taylor model (2.2.1) is not dominant (most importantly in the vicinity
of the local minimizer) and it utilizes the quadratic part of P. Range bounding of a
quadratic polynomial on an arbitrary interval generally has exponential complexity
growth with the dimension and thus can be very expensive. However, Obtaining a
lower bound of the quadratic polynomial in an isolated neighborhood of the local
minimum (which is also an important problem of global optimization) turns out to
be a much simpler task. Indeed, in such a neighborhood the Hessian of a function f is
positive deﬁnite, so the purely quadratic part of the Taylor model (2.2.1) has a positive
deﬁnite Hessian matrix H. Positive deﬁniteness itself can be tested in a veriﬁed way
via a common LDL or extended Cholesky decomposition, as demonstrated in [26]. If
a purely quadratic part is, indeed, positive deﬁnite, QFB provides a lower bound of
the Taylor model rather cheaply based on the following observations.

Suppose we obtain the Taylor model (2.2.1) for the given function f on the given
domain D. Let H be the Hessian matrix of P and let it be positive deﬁnite. We now
represent P as

P+I=(P—Q)+I+Q (2.2.2)

and Observe that then the lower bound for P + I is obtained as
l(P+I)=1(P—Q)+l(1)+l(Q). (2.2.3)
If we now choose Q such that
1 T

for any £0 E D, then, since H is positive deﬁned, l(Q) = O in (2.2.3) and is actually

attained at :c = 3:0. By this choice of Q, the remaining polynomial (P — Q) does not

62

 

 

contain purely quadratic monomials: it consists only of ﬁrst (linear), third and higher
order terms. If we now choose $0 to be the minimizer of the quadratic part P2 of P
in D, then, by the consequence of the Kuhn-Tucker conditions, it is also a minimizer
of the remaining linear part, so the lower bound (2.2.3) is dominated by the orders
2 3 and is thus optimally sharp.

Hence, by choosing .230 to be sufﬁciently close to a minimizer of P2 in D, we can
make the contribution of P2 — Q to the lower bound (2.2.3) sufﬁciently small. To
determine this minimizer we can utilize the fast iterative directional minimization
method descending in the direction of —VP2 limiting at the same time the obtained

values to remain inside D.

Validated Global Optimization Package COSY-GO

The methods described earlier, combined together, form the basis for the veriﬁed
global optimization COSY-GO package for COS Y Inﬁnity. It uses a branch-and-
bound scheme to manage the list of boxes that represents the current state of the
search space. The core of the package itself consists of two global optimization meth-
ods: interval bounding and interval bounding with Centered Forms which is roughly
equivalent to the ﬁrst order TM method (used mainly as an auxiliary tool for simple
tasks). The other, most sophisticated, efﬁcient and thus heavily employed method is
based on the Taylor model methods described earlier in this section [26,117,122,125].
At every step, the algorithm from Figure 2.8 is applied to each box from the list.
Then the search process either stops if the desired accuracy is achieved, or reduces
the search space volume by re-splitting it to smaller boxes and inserting them back
into a list and continues. An example output of the COSY-GO rigorous minimization

of the 2-dimensional Rosenbrock’s function (see section C4) is presented in Figure

63

2.9. It demonstrates different methods COSY-GO utilized to eliminate and reduce

boxes in order to obtain a rigorous enclosure of the minimum.

 

 

1. A lower bound is obtained by applying the various available bounding
schemes sequentially in the order described below. If the obtained lower
bound is below the cutoff value, the box is eliminated, otherwise it
is bisected. Each subsequent method is applied only if the previous
one fails. The following bounding methods are used:

a) Simple interval bounding of the function f.

b) Naive Taylor model bounding based on the evaluation of the Taylor
polynomial ID in interval arithmetic.

c) LDB bounding. If it fails, the LDB domain reduction is performed as
described earlier.

d) QFB bounding, if the quadratic part of the 1’ is positive

definite.

2. The cutoff value is heuristically updated using following methods:
a) The result of the function evaluation at the midpoint of the
current box.
b) The linear and quadratic parts of I3 are utilized to obtain a

potential cutoff update.

 

 

Figure 2.8: COSY-GO Veriﬁed Global Optimizer box processing algorithm

64

 

 

4t:

COSY—GO result for Rosenbrock: ID=0, NV=2, N0=S, GORUN=1
Output 1evel=3

COSY-GO Computation time : 0.2002880000000001 sec
Initial search volume : 9.000000000000016
Remaining volume : 0.7655209967109666E—19
Total number of steps : 119

Number of boxes eliminated

By retained lower bound : 0
By interval lower bound : 43
By naive TM lower bound

By LDB cutoff

By QFB cutoff

By splitting : 53

Number of boxes reduced

By LDB : 32
By QFB : 23
- QFB and no box splitting : 12

Number of remaining boxes
In the master list
In the local lists : 0
Smaller than SIZE : 1

SIZE= 0.1000000000000000E—OS

Maximum number of active boxes in the local list : 14

Enclosure of the minimum
[-0.2004168360008975E-291, 0.1112491091587934E-026]

 

Figure 2.9: Output of the results of the minimization of the 2D Rosenbrock’s func-
tion (see section CA) by COSY-GO

65

 

2.3 GATool Evolutionary Optimizer

As it was mentioned in section 2.1, Evolutionary Algorithms (EAs) are successfully
employed to solve many different optimization problems of science and industry. Their

distinctive advantages include:
0 relative ease of implementation,

0 ability to efﬁciently ﬁnd global optima avoiding local ones even in very large

search spaces (see section 2.3.6),

o no requirements on the objective function other than the ability to calculate its

value at every point of the search space,
0 good tolerance to noise (see section 2.3.5),
o ability to work even when the traditional search methods fail.

General interest in the ﬁeld of EAs is steadily growing. Active research on the devel-
opment of EAs and their applications has produced a large number of publications;
bibliography on Evolutionary Computation as of now contains more than 4000 en-
tries on Evolutionary Computation and related areas [65]. To save space and avoid
repeating work we describe only the concepts of Evolutionary Algorithms that essen-
tial to the considered Optimizer. For more detailed treatment we refer the reader to

an excellent introduction to Evolutionary Algorithms in [128].

2.3.1 Principles, Concepts and Building Blocks

A generic scheme of the Evolutionary Algorithm is presented in Figure 2.6. Because

of the EA’s generality, in order to select or construct a speciﬁc EA for the problem or

66

a class of problems, we need to make certain design decisions. Usually the EA design
process starts with selecting an appropriate representation of the possible solutions
from the search space. Effective encoding of the solutions is the ﬁrst and one of the
most important components of successful EA usage. For the Travelling Salesman
Problem [6], it can be a string of numbers denoting the visited cities, for the physical
device design optimization it can be an array of the control parameters in the ﬂoating
point format.

Some algorithms make a distinction between genotypic and phenotypic represen-
tations. Genotypic representation here is an encoding Of the solution that has to
be decoded to a phenotypic representation before evaluation. Typical example of
such encoding is a genotypic binary encoding of the phenotypic real-valued param-
eters that is frequently employed by Genetic Algorithms [78]. Most frequently used
representations include binary encoding, binary gray encoding, integer, real or even
standard data structures such as lists, trees, etc.

Each step of the execution, EA works with a set of the representations of the cur-
rent potential solutions called a population (P). Individual members of the population

are called individuals:
P={x1,x2,...,xN}, (2.3.1)

where N is the population size. As the algorithm progresses with the search, the pop-
ulation changes. The population on the i-th step of the EA is called i-th generation.

The deﬁnition of the solution representation is, Of course, meaningless without a
method to measure the performance of the represented solution. The ﬁtness function

is used for this purpose (typically ﬁtness is a real, non-negative number):

ﬁtness: P e———> RF (2.3.2)

67

The ﬁtness function serves as the main connection between the objective function
and the Evolutionary Algorithm. Its main purpose is to rank individuals according
to the optimization goals. Hence it is constructed such that the individuals that are
better in terms of the underlying optimization problem have higher ﬁtnesses than
the ones that are worse. Typically the ﬁtness is calculated from the value of the
objective function via a process called ﬁtness scaling. This process can, for example,
convert the function values obtained during the minimization process where smaller
values correspond to better solutions, to ﬁtnesses in [0, 1] such that a larger ﬁtness

corresponds to a better individual:
ﬁtness(x) = ﬁtness(f(x)), x E P. (2.3.3)

Fitness scaling plays an important role in a successful EA application. It can be used
to increase or decrease the evolutionary pressure by inﬂuencing the selection methods
(especially the prOportional selection described later in this section) increasing or
decreasing the difference in ﬁtness between the members of the pOpulation that have
different objective function values.

In order to progress in the evolutionary search the selection and reproduction
processes must also be designed. The selection process selects individuals for repro-
duction with a probability that is related to their ﬁtnesses. If we want the evolution to
progress, we must select better individuals more often. One class of selection methods
is called proportional selection. It includes roulette, stochastic remainder, universal
stochastic, deterministic sampling and other methods where individuals are selected
with a probability that is directly proportional to their ﬁtness. Another class Of meth-
ods includes tournament selection where a series of the tournaments among the ﬁxed
size (two or more) samplings of the individuals of from the population is held. The

ﬁtnesses of all the tournament participants are compared in order tO determine the

68

 

..u

 

v,- .,

~58...

V‘ 1

ﬁttest one that is the winner. The important problem here is to choose and tune the
selection method so that it exerts an optimal amount of evolutionary pressure in order
to keep the population diverse and avoid premature convergence of the algorithm on
the one hand but, on the other hand, not to suppress the convergence at all and keep
it at a reasonable level.

During the reproduction phase, the next generation is produced from the current
one and the results of the selection phase. Two evolutionary operators usually em-
ployed for reproduction are mutation, which produces a new individual (mutant) via
modiﬁcation of the single selected individual, and crossover, which uses two or more
individuals (parents) to produce a new individual (child). These two Operators are
typically connected with two main processes of the EA search procedure: exploration
and exploitation. The ﬁrst Of them is a search of the potentially interesting zones
of the search space, i.e. zones where the location of the Optima is suspected. The
second is an examination of these zones in order to ﬁnd the Optima. Mutation is
usually responsible for the exploration while crossover is driving the exploitation.

An important concept that inﬂuences reproduction is elitism. It is an operator
employed by the EA algorithm to preserve a certain number Of the best members of the
current population and transfer them to the next population intact, without mutation
or crossover (though they also participate in the selection and can be selected to
produce both mutants and children). Elitism guarantees that the best found value
Of the next generation is not worse than the best found value of the current one and
thus is very important for steady convergence. Care must be taken, however, to keep
the number of elite members relatively small in order to allow an EA to explore the
search space even when some very good potential solutions have already been found

and thus to avoid the convergence to a local optimum. Under additional assumption

69

about the run time being inﬁnite (or a maximum number of generations, depending on
the selected stopping criteria) the convergence to the global Optimum can be proven
for the Evolutionary Strategies that are a type Of EAs [161].

TO start the search we also need a method to generate the initial population
(typically by generating random samples uniformly over the whole search space),
stopping criteria (typical criteria include maximum number of generations, maximum
number of stall generations and maximum run time) and the algorithm parameters
(population size N, tolerances mutation and crossover rates, number of elite members,
etc.). Only when the design process is completed can the algorithm be used.

Here we should note that despite all their attractive features, Evolutionary Algo-

rithms also have certain weaknesses and complications:

0 it is possible to choose a “right” representation but “wrong” genetic operators
or to set method parameters to non-optimal values, which results in degraded

performance in both speed and quality,

0 extensive ﬁne-tuning via trial-and-error and intuition might be required to tune

the method for reasonable performance on speciﬁc problems,

0 no complete theoretical methods to unambiguously select or design the Evolu-

tionary Algorithm for the problem are developed up to date.

2.3.2 Design and Implementation

The algorithm we implemented uses the best features of Evolutionary Strategies (ES),
Genetic Algorithms (GA) and Differential Evolution (DE). We named the algorithm
and its implementation in COS Y Inﬁnity system GATool because it most closely re-

sembles the logic of GA (while it is deﬁnitely not a classic GA). It is worth noting that

70

the widely popular Matlab scientiﬁc computations package [139] includes Genetic Al-
gorithms Toolbox that provides a very similar algorithm in its standard distribution.
This helps demonstrate that this algorithm is, indeed, well-tested and proven to be
efﬁcient.

From Evolutionary Strategies we adopted the representation of a potential solution

as a vector of real numbers, i.e. a vector of problem arguments:
x: ($1,1‘2,...,$U)T. (2.3.4)
Then the pOpulation members are:

)T,i=1,...,N

Xi = (I.i1,113,l'2, . . . “in,

and
f = (f(xl),f(x2),...,f(xN))T = (f1,f2, . . . ,fN)T (2.3.5)

is a vector containing function evaluations for the members of the population and i
and 35 denote the minimum and maximum function values of the population members
correspondingly. Noting the success of the ES and DE (see references in section
2.1) both using such a representation, we suggest that it is more adequate for the
optimization of the problems with real-valued parameters than the binary encoding
frequently used in GAs. Note that this representation is phenotypic, i.e. a member
of the population does not need to be decoded in order to be evaluated.

Since we implemented the algorithm for the minimization of real-valued functions,

ﬁtness scaling mapping had to satisfy two requirements:
0 smaller function values mapped to larger real ﬁtness values and

0 resulting ﬁtness is non-negative for all function values.

71

For convenience, after the function values are mapped to ﬁtnesses they are normalized

to be in the [0, 1] range. Several ﬁtness scaling functions are currently used:

0 Linear:

ﬁtness(x,j) = ﬁtness,- 2 7 — f, 2 0 (2.3.6)

o Proportional: ﬁrst the following transformation

ﬁtness,- 2 (ﬁg—é — fi) , (2.3.7)

which effectively rotates the function values around the center of the function
values range, is applied. Then, if i < 0, it is added to the resulting ﬁtness to

make it non-negative.

0 Rank: function values are sorted in ascending order and then the ﬁtness is
assigned to individuals according to the indices of their ﬁtnesses in the resulting
array. Practically, the square root of the inverse of the index has demonstrated
itself as an efﬁcient formula. This technique ensures that the best members
(with smaller indices) are further apart than the worst members (with larger

indices). Hence the evolutionary competition between best members is stronger.

From these ﬁtness scaling methods, rank scaling has demonstrated itself as the most
eﬂicient. Note that albeit being most computationally expensive of the implemented
methods, it is also the most numerically stable since it does not involve the Operation
of subtraction which can lead to a cancellation effect and thus to the loss of accuracy
if the function values are close to machine precision.

The initial population in EAs is generated by producing uniformly random points

from the initial box (the most common method). Here we assume that the search

72

domain is given as a v-dimensional box:
5 = [G1,b1] X [G2,D2] X . .. X [Cl/Lubv]. (2.38)

GATool makes a distinction between the global search box and initial box, which is
usually (but not necessarily) contained within the global box or is equal to it. As
it was mentioned, the initial box is utilized to generate the initial population, while
the global box is employed to control the population for the presence of the outside
members. If the elimination mode is on (by default), all members of the population
that are initially generated or produced during the search outside of the global box,
are killed and then regenerated in the initial box. This strategy can be easily extended
on a collections of boxes, both local and global. It can be used if we want to direct
the search to certain zones of the search domain and is particularly important for the
COSY-GO rigorous Optimization package interaction (see section 2.3.6).

The initial pOpulation can also be seeded, i.e. initialized with predeﬁned members.
For example, if we have reasons to suspect certain zones of the search space for the lo-
cation of the minima, we might want to pre-generate some members of the population
located there in order to direct the search process. In most cases, however, seeding
is not recommended because additional pressure can decrease GATOOl’s chances tO
ﬁnd a global minima and in the worst case trap GATool in the local minima. It
must also be noted that a search space that does not enclose the sought minima with
enough tightness can produce similar effects since the uniformly randomly generated
initial population can be too sparsely distributed over the search space and thus be
to distant from the minima for a successful search.

The process of selection of individuals for reproduction is also Of great importance

73

for the algorithm’s efﬁciency. For ease of notation we denote by
5 = rand(a, b), (2.3.9)

where “(” is either [ or (; “)” is either ] or ), a uniformly distributed random number
from the corresponding interval: 6 6 (a,b). For example E = rand[a,b) denotes
a uniformly distributed random number 6 from [a,b). We denote the number of

members we need to select as NS The following frequently used methods of

elect‘

selection are used b GATool:

0 Roulette Wheel: suppose

k
Sic = Z ﬁtnessi, k = 1,. . .,N, (2.3.10)
i=1

SO 2 0 and t : rand[0, SN]. Then the index 5 such that
S€_1 < t < 55

denotes the member selected by a random turn of the roulette wheel. Note that
here

Sk — Sk_1= fitnessk,

i.e. sizes Of the sectors of the wheel are equal to the ﬁtnesses of the correspond-
ing members. Thus members with larger ﬁtnesses have higher chances to be

selected. The procedure is repeated Nselect times.

0 Stochastic Uniform: suppose partial sums of ﬁtnesses are deﬁned as in the

Roulette Wheel selection method (2.3.10). Let h be a selection step:

3
h: N

N select '

 

74

 

Let t1 = rand[0, h] and
t:=q+%f—Umj=2HqN&mW

Then

tj e [0,SNl,j=1v~-:1Nselect

and we select Nselecic indices 9 of the population members by choosing those

that satisfy the relation
S€j_1<tj< 56],, j = 1,. . . ’NSGlGCt'

0 Tournament: suppose T is a number of the members participating in a tour-
nament. We randomly select T members of the population with equivalent
probability for each member to be selected. Suppose the indices of the chosen
members form the set I. Then the index of the member that is a result of the

tournament selection is determined as
5 = arg max{ﬁtness,-}.
iEI ’

The procedure is repeated Nselect times.

From these selection methods Stochastic Uniform stood out as the most efﬁcient
and robust in practice. Elite members are selected as the Nelite members of the
population with the largest ﬁtnesses. Members needed for the evolutionary Operators
in order to produce the next generation are selected by the chosen selection method.
It must be noted that one member of the population can be selected several times.
The next population, is produced during the reproduction phase on the basis of
the current one and the results of the selection. Elite transfer and two evolutionary

Operators — mutation and crossover — are employed. While elite transfer is a simple

75

process of COpying the best individuals from the current population to the next one
(evolution of neutrality, preservation of already found results), mutation (variability,
innovation, exploration) and crossover (ancestry, information exchange, exploitation)
are more advanced. The number of members of the next generation generated by each
of these methods is determined by two parameters: mutation rate and the number
of elite members. Mutation rate is a number in the [0, 1] range that determines the
percentage of the next population that is generated by mutation. The number of the
elite members or elite rate determines the number or percentage of elite members
transferred to the next population. The remaining members of the next population
are generated by the crossover. Hence the next generation completely replaces the
current one with the exception of elite members.

There are two types Of mutation used:

a Uniform: usually employed by classic GAS [78], it involves random change
in one or several genes that occurs with a certain probability. For classic GAS
operating on binary codes, it involves ﬂipping the bit value or several bit values.
In our case for every member x2- selected for mutation for each coordinate I’lj’
j = 1,... ,v, the random number Q- j = rand[0, 1] is generated. If it is less or
equal to the predeﬁned coordinate mutation rate pc E [0,1], the coordinate is

replaced with the random value of this coordinate from the search domain:

$ij,m = ran(l[aj,bj].

Since the distribution of 5,: j is uniform, the coordinate mutation rate is also a
probability for exactly one coordinate of the member scheduled for mutation

to change. Hence the case of pc 2 0 corresponds to no mutations at all while

the case of pc 2 1 corresponds to a mandatory mutation in every coordinate

76

of every member selected for mutation. Frequently the value of pg 2 1/v is
used so that on average only one coordinate is changed per the mutation of
the member. From probability theory we know that the probability Of the
combination Of the independent events (replacement of the certain coordinate)
is equal to the product of the probabilities of the mutations of the individual
events (replacement of the several coordinates). Since usually pc is chosen such
that pc << 1 and it is shared between all coordinates, the probability for several
coordinates of one member to mutate is rapidly decreasing as the coordinate
mutation rate raised to the power of the number of coordinates. This also means
that the probability for the mutant of any point from S to be any other point
from S (i.e. the probability of transition between any two pairs of points in S
during the search) is positive if pc is positive (even if it is very small). This, in
turn, means that any point of the search space has a certain probability to be
considered during optimization, which is important to ensure that the search
for the optimum is global (in the search space). The example of the uniform
mutation is demonstrated in Figure 2.10. Here x is the population member
selected for mutation, Xm,1 (the ones on the dashed lines) are the mutants
with only one coordinate changed, xm,2 are the mutants with two coordinates
changed. Here for every x scheduled for mutation, the probability of mutating

to any of the xx,1 is pc while the probability of mutating to any of the xx.2 is
Pg S Pc-

Gaussian: usually employed by the E85 [161]. In this type Of mutation a differ-
ence vector is added to each member selected for mutation in order to produce a

mutant member of the new generation. Each coordinate of the difference vector

is generated as a random number from the Gaussian distribution with the mean

77

of 0 and a standard deviation equal to half the length of the corresponding range

of the search domain S:
b- — a,-
No.02) = N (0» 7i) '

Due to the properties of the Gaussian distribution this means that even though
all coordinates are changed for every mutant, most changes in coordinates are
small compared to a search range for the corresponding coordinate. We note
that in this case the result of the mutation is not guaranteed to remain in S
hence additional suppression of these cases might be needed if such requirement
is imposed. In our case we do not perform this check since the outside members
can still be a valuable source of information about the search space and they
will transfer this information during the recombination phase. We also added an
Option for this mutation type to be adaptive by deﬁning the shrinking schedule

for the standard deviation:

 

02 = 02(9) 2 (1— agnfax)’ (2.3.11)

where g is a generation number, gmax is the maximum allowed number of
generations and a is the shrinking factor (typically in the [0, 1] range). Of
course, adaptive parameters are deﬁned only if the maximum allowed number
of generations is set. The mutant produced by the Gaussian mutation can be
any point of the search space (even outside of it) with the probability that is
decreasing with the growth of the distance from the mutated point. Due to the
form of the Gaussian distribution there is a positive probability for any point

in the search space S to be generated by the mutation of this type.

Practical experience shows that there is no universally best mutation type and that

different problems beneﬁt from different approaches. Some problems might perform

78

better with both types working simultaneously. For Gaussian mutation with adap-
tation, the shrinking factor must be chosen carefully to allow enough time for ex-
ploration in the beginning of the search by generating large jumps but to, allow the
algorithm to converge at the end of the search by producing relatively small deviations

in the neighborhood Of the potential result location.

A

 

 

 

 

l
S l
X 2 x
O m' 6 m,1
I
l
X x
_____ {»T£___________*_-__----—
I
X 2 l
o m, i Oxm'2
oxm,2 [
éxmj
Xmlz I
0 l
l
)

 

 

Figure 2.10: Uniform mutation example: x is a member scheduled for mutation,
me are some of the possible one-coordinate mutants (all such mutants are located
on one of the two dashed lines), xm,2 are some of the possible two-coordinate mutants
(can be anywhere in S)

The process of the generation of new, untested solutions is essential for the success
of the method. But the process of the exchange of the information about already
explored points of the search space is also an integral part of the algorithm. The
evolutionary operator of crossover is designed to perform this task. A usual crossover

method employed by GAS is the Uniform Crossover (also called n-point Crossover).

79

For each of these zones the contributing parent is selected randomly or by some other
means. Then the genes in each such zone are copied from the contributing parent
into the corresponding zone of the child. Therefore the child have a chance to share
a part of its genes with one parent and another part with another. The number of
points in n-point crossover determines how parents’ genes are being mixed in a child.

Despite the success Of this crossover method used in a variety of the different EAS
we decided to use another crossover method, which could be viewed as a very simple
form of the line search method of the classic continuous Optimization methods (step
3, Figure 2.1). This method is usually called Arithmetic or Continuous Crossover.
Suppose two parents are selected for crossover: xp,1 and xpg. We compare their
ﬁtnesses to determine the better ﬁt and the worse ﬁt parents and relabel them as pr

and xp,w, correspondingly. Then their child is generated by the following formula:

XC = Xpaw + C(Xp’b — Xp’w), (2.3.12)

where S is a scaling factor. It can be shared between all coordinates or selected
individually for each coordinate (then the formula (2.3.12) is applied coordinate-
wise). Since all parents are selected independently, the situation when one population
member is chosen for the role Of both parents can occur. Particularly often this
happens when the population size is small. From (2.3.12) it follows that in such
a case the child duplicates this member, which leads to a stagnation of the search
process. We implemented a process of the suppression of such situations: in case
it occurs, another parent may be re—selected until different parent is chosen or the
budgeted number of trials is exhausted.

From (2.3.12) it can be seen that the child is generated on the line connecting two
parents and the scaling factor ﬂ (typically ﬂ 6 (0, 2]) determines its location on this

line relative to the better and worse parents. If S < 0.5 then the child is generated

80

 

1!

"f"

It

closer to the worse parent, if 0.5 < S < 1, then the child is generated closer to the
better parent, between parents, if S > 1, then the child is generated closer to the
better parent, outside the segment Of line between parents and the case of S := 0.5
corresponds to the intermediate crossover when the child is generated in the middle
of this segment (in this case, parents’ ﬁtness values are effectively ignored). Examples
of the different crossover children can be seen in Figure 2.11.

Equivalent notes can be made for per-coordinate scale factors, only coordinate-
wise, with the line replaced by a box with parents occupying opposite vertices of its
main diagonal (see examples in Figure 2.12). In this case, it is possible, for example, to
generate a child that is closer to the worse parent in the ﬁrst coordinate but closer to
the better parent in the second coordinate, or with all coordinates closer to the better
parent’s coordinates but each one with its own scale. This feature can be useful for
the ﬁne adjustment of the algorithm to a problem, if an additional knowledge about
its landscape is available.

Since this type of crossover for S > 0.5 introduces additional pressure to converge
(although in practice it does work better) additional randomization can be added
to preserve the diversity of the population and avoid the stagnation of the search
Process:

xc = xp,w + rand(0, 1) - ﬁ(xp,b — Xp,w)- (2.3.13)

Random multiplier can be generated once for a member or generated anew for each
COOrdinate. Values of S 6 [075,085] have demonstrated a reliable performance in
most practical cases.
It is interesting to note that for the DE optimization approach a very similar
method is employed to perform the differential mutation which is an essential feature

of the algorithm (there the crossover is of the n-point type). The difference is that in

81

It is interesting to note that for the DE optimization approach a very similar
method is employed to perform the differential mutation which is an essential feature
of the algorithm (there the crossover is of the n-point type). The difference is that in

classic DE all three vectors in the right hand side of (2.3.12) are different.

A

 

 

 

 

i

 

Figure 2.11: Continuous Crossover examples for the common scaling factor ,8.
Points xp,b and Xp,w are the parents with the better and worse ﬁtnesses, corre—
SPondingly; XCJ for various i are the children generated with different values of the
scaling factor: 2' = 1 corresponds to ,8 G (0.5,1), i = 2 corresponds to ,6 > 1, i = 3
COrresponds to )6 = 0.5 (intermediate crossover, values of the ﬁtnesses neglected)

23.3 Statistics, Diversity and Convergence

se"'el‘al types of statistics are gathered during the search process. Some of them are
used to determine if the stopping condition is met and some provide various measures
of the algorithm’s performance. One important performance characteristic of any EA

”IS ability to maintain a diverSity 1n the population in order to avmd the convergence

82

 

 

 

Xp.b

X
3‘
w

1
I
I
I
I
I
I
I
I
I
I
I
I
l
---—-—J
O

O
I
I
I
I
I
l
I
I
I
I
I
I
I
I
I.

 

 

 

 

F

 

Figure 2.12: Continuous Crossover examples for the per-coordinate scaling factors
[3,. Points xp,b and xp,w are the parents with the better and worse ﬁtnesses, corre-
spondingly; xc,i for various i are the children generated with different values of the
scaling factor for different coordinates. Here the dotted rectangle contains all the
children generated with 0 < B,- < 1, i = 1,. . . ,v

83

wards the end of the search we want an optimizer to converge at the sought result.
Convergence here means that the diversity drops down and the search process stalls,
i.e. the improvement achieved by a next generation remains less than the required
tolerance. Therefore we actually want the diversity to decrease but we do not want
it to decrease prematurely in order to keep the balance between the exploration of
the search space and the exploitation of the potentially interesting zones. Achieving
the balance between the fast but possibly, premature convergence and slow but more
robust performance is a matter of GATool parameters settings, particularly those of
the selection and reproduction. The minimum, average and maximum function values
in the current population and their change from generation to generation measure the
performance of the search. If the minimum is known, the distance to a minimum both
in the search space and in the space of the objective function values can also be used
(see section 2.3.6).

An example Of the values of these performance characteristics and their behaviour
during the search is demonstrated in Figures 2.13—2.16. These results are gathered
during one run of the GATool on the 10-dimensional Sphere function test problem
(see section C.1). Even thOugh GATool is a stochastic search method and thus the
results of the different runs are different if the random number generator generates
different random numbers, the qualitative picture generally remains the same. Two
stages of the search process can be clearly recognized. The ﬁrst stage, which typically
takes 1—20 generations, is exploration, i.e. the search for the zones of interest. It can
be observed on the graphs of the statistics from Figure 2.13—2.16 as relatively fast in
the beginning, slowing towards the end improvement of the minimum, average and
maximum function values, as well as the average distance between the members of

the population. After the exploration is done, exploitation performs its job Of ﬁnding

84

the minimum in the zones of interest (typically one zone) found during exploration.
This can be noticed for all statistics as the relatively slow change of the minimum

function value while other statistics randomly oscillate aroun an equilibrium state.

 

 

 

250 l I I I T t I I I
I
2m 1. .-
150 )' i
C
100 - .
. I
O 0
50 " ’~ ... O ' g . ﬂ. ' O. '1
. h.~uu‘~.w.tm.'. .0.-~".‘~ ....IN
.0 '
0 50-5:

 

 

0 10 20 30 4O 50 60 70 8O 90 100

Figure 2.13: Statistics gathered during one run of GATool on the 10-dimensional
Sphere function problem (see section C.1). The horizontal axis for all plots is the
generation number (Max/ avg/ min function values, normal axis).

Now that we have considered all the building block Of the algorithm, in order to
be able to start the search, we must deﬁne the stopping criteria. We implemented 4

stopping criteria commonly used by EAS:

0 Maximum stall generations: maximum allowed number of stall generations, i.e.
generations with the improvement to the obtained minimum function remains

less than the desired tolerance. Usually it means that the search converged.

o Desired objective function value: useful for design Optimization when we know
we will be satisﬁed and will not need further search if the values of the param—

eters that produce the desired value of the objective function are found. If the

85

 

%

 

1000

 

I I I I I I U I

100 F‘-

 

 

 

 

:0. ’- ...... I ... :3'. .W.......'C..ﬂi
10 re '. 1
' g u c
1 f .0 ‘5‘. . i
p a. “ﬁv'ﬂwlf‘:~'.ﬂ “-ﬂgfw‘.~W~50-ﬂg.lﬁm‘q
0.1 ’ ' ‘
r -, 1
0.01 L '-. J
0.001 E ‘5, ;
. .L .
0.0001 ) 1 ,
b w .
18‘005 P 1
. "II-l
1&006 1 I n l 1 n

0 10 20 304050 60 70 80 90100

Figure 2.14: Statistics gathered during one run of GATool on the 10-dimensional
Sphere function problem (see section C.1). The horizontal axis for all plots is the
generation number (Max/ avg/ min function values, logarithmic axis).

 

16 I I r I U I I T U

 

 

o
14 ~ '1
12 - ..
0
10 - 1
a .' .
6 I- O ..
I
4 . .l
.l
2 ' . d
.... o e I 0. 0
o [ ‘MW'M nag-“5.1

 

0 10 20 30 40 50 60 70 80 90 100

Figure 2.15: Statistics gathered during one run of GATool on the 10-dimensional
Sphere function problem (see section C.1). The horizontal axis for all plots is the

generation number (Estimated average Euclidean distance between population mem-
bers).

86

 

CD

(‘3

1m I I I I I I I I

 

 

 

I
10 .9 J
I
l I- . u:
...
0.1 " ' d
O
0.01 " I1
"I
0.001 " .. II
0.0001 1 “ -
I ' I.
1e-005 _ v... '3 . - ,
0' . .-
1£9006 - ' o ..f ‘ I . ... . 0 .
OI . ﬁ 0 ..I .0 .0...
1e-007 ~ , - . "u.
o I. '
leooa M A 1 n M I II 1 1 L

 

0 10 20 304050 60 70 80 90100

Figure 2.16: Statistics gathered during one run of GATool on the 10—dimensional
Sphere function problem (see section C.1). The horizontal axis for all plots is the
generation number (Min function value improvement (absolute value)).

algorithm preserves the best found values by elitism, this criteria can be less
useful since the smaller values of the objective function that can still be found

even after the desired value is reached are usually more preferable.
0 Maximum total generations: maximum allowed number of generations, 9mm:-

0 Maximum run time: maximum CPU time allowed to be used by a search pro-

CCSS.

By default GATool uses the maximum number of stall generations and maximum

total number Of generations stopping criteria.

2.3.4 Summary, Notes on Performance and Parallelization

The GATool search algorithm is demonstrated in Figure 2.17, technical details of

its implementation in COS Y Inﬁnity [23], user interface and default values of the

87

parameters are described in Appendix B. Performance of the method on standard test
problems is assessed in section 2.3.6, on real-life problems from Accelerator Physics
— in sections 4.2, 4.1, 4.3, noisy data handling methods are discussed in section
2.3.5, and constrained optimization in pair with the proposed REPA repair algorithm
is examined in section 3.3. Results of several example runs on the 10—dimensional
Rastrigin function problem (see section C .2) performed with different GATool settings
are summarized in Table 2.1. Even though the results obtained from different runs
would be different, it can be seen that the results of the Optimization highly depend
on the choice of parameters.

Since the EAS are heuristic in many ways and since most of their parameters
are interdependent in a non-trivial way, a rigorous analysis of the method is too
complex and not available for the general case. Rather the statistical approach to
performance evaluation on the various test problems is adOpted. However, as it is
discussed in section 2.1.4, good performance of the algorithm on the arbitrary large
class of the optimization problems does not guarantee good performance on problems
that do not belong to this class. Therefore we suggest considering these tests only as
an estimate of the algorithm’s behaviour and always performing tests on the problems
that are to be studied by any EA including GATool.

Note that, in principle, GATool can be parallelized almost trivially. If the objective
function evaluation is expensive relative to the GATool’s computational expenses, it
can be distributed over several (several hundreds or thousands) computers, evaluated
there in parallel and then gathered and passed back to the GATool running sequen-
tially for processing. This can be done with no extra effort using the PLOOP “parallel
100p” construct that was recently added to COS Y Inﬁnity [103,104]. In cases where

a large population is needed or any other purpose, the algorithm itself can be paral-

88

 

 

 

Randomly generate initial population, set predefined members, if any
Calculate objective function values, scale to fitnesses
Update statistics
While any of the stop conditions is not satisfied do
Perform Roulette Wheel/Stochastic Uniform/Tournament Selection
Generate next population
Produce mutants by Uniform/Gaussian Mutation
Produce children by Continuous Crossover
Copy elite members
Replace old population with a newly generated one
Calculate objective function values, scale to fitnesses
Update statistics
End while

 

 

Figure 2.17: GATool search algorithm

Table 2.1: Results of one run of GATool on the 10-dimensional Rastrigin function
PrOblem (see section C.2) performed with different GATool settings. The values of

the parameters that are different from the default ones (see Figure B.1) are given in
bOIdface

 

Scaling Elite Mutation Crossover Result Time
Rank 10 Unif(0.1) Heur(0.8, 1) 0.593E—02 0h 4m 308
Rank 10 Unif(0.01) Heur(0.8, 1) 0.196 Oh 4m 275
Rank 10 Gauss(1, l) Heur(0.8, 1) 3.082 0h 4m 255
Rank 10 Unif(0.01) Heur(0.8, O) 0.100E—01 0h 4m 435
Rank 0 Unif(0.1) Heur(0.8, 1) 0.125E-03 0h 4m 29s

Linear O Unif(0.1) Heur(0.8, 1) 7.4327 0h 4m ls

 

89

 

 

lelized by using a co-evolutionary model where several populations co-evolve together
starting from different initial populations, possibly using different algorithm param-
eter values. Each of these populations can be evolved on a separate machine. After
certain number of generations (or in predeﬁned time quants) a transfer of members
between these populations is performed in order to exchange the Obtained informa-
tion. This model of the parallel EA execution is frequently called the Island Model,
and the process of exchange, the Migration Operator.

Also it should be noted that GATool is a modular algorithm, i.e. it allows easy
modiﬁcation or substitution of its operators. Other types of selection, mutation,
crossover, initial pOpulation generation and stopping criteria can be added if deemed
necessary. Another important note is that GATool is a very general-purpose optimizer
with almost no requirements imposed on the considered problem. In fact, the only
requirement is the ability to evaluate an objective function value at any point in search
space. However, because of this generality, if there exists a method of optimization
that is specialized for a particular class of problems and it uses the extra information
about this class, then it is likely to outperform than GATool. On the other hand,
even in this case it can still be advantageous to build a hybrid algorithm (see section
2.3.6) or at least use GATool to cheaply explore the search space and generate some

800d starting points for the specialized method.

2.3.5 Noisy Data Handling

All physical devices operate with errors: they could not be manufactured exactly as
designed without construction errors and the operational information coming from
Various detectors contain noise and due to imperfections and limited precision. Here

We discuss how such problems can be treated with GATool.

90

 

We consider two classes of problems both containing noise in the function values:

0 Static: the function values contain errors but these errors remain constant across

function evaluations:
“'17): ftruefxl + Aff'xla V513.

0 Dynamic: the function values contain errors that change every time the function
is evaluated:
f(x) = ftruell‘) + I'and(-l-\f(-73), +Af($)).
where rand is a random number whose distribution is determined by the con-

sidered problem. For simplicity, in this section we consider only uniformly

distributed random numbers.

First we consider static noise problems. Since the noise in the function values is
usually several orders smaller than the function range, for this class of the problems we
can use the same GATool parameter values that we would utilize for the undisturbed
function values, possibly averaging results to decrease the contribution of noise. Of
course, if the function range is of the same order as the noise itself, then its presence
Signiﬁcantly changes the properties of the function and the true value of the minimum
is unlikely to be recovered.

Many test functions in Appendix C can be viewed as a sum of the “main function”
that determines the large-scale behaviour, main properties and a global minimum, and
a “noise function” that adds oscillatory behaviour of a smaller scale, thus introducing
many local extrema and making optimization harder. Consider, for example, the
Sphere function (see section C.1):

f(x) = Z 2,2
i=1

91

and the Rastrigin function (see section C2):

n
f(x) = 10n + Z (11:;2 — 10 cos(2nri)) .
i=1

The “main function” here is a Sphere function and the “noise function” is
n
10n + Z (—10cos(2m,-)) .
i=1

Both Sphere and Rastrigin’s functions have the same global minimum: f (O) = 0,
x* = 0.

We ran GATool on both test problems in 5 dimensions 100 times using a default set
of parameters (see Figure 8.1), population size : 10*dimension : 50. The statistical
distributions of the solutions by the neighborhoods of the known global minima is
presented for both functions in Figure 2.18. The average run time for the Sphere
function is 4.09 seconds, for Rastrigin function — 5.22 seconds; quality reduction is
clearly noticeable. However, if we increase the population size to 20*dimension :
100 and repeat the simulation, we can bring the quality back (see Figure 2.19), i.e.
compensate for the noise at the cose of an increased average runtime increased of
11.92 seconds. Hence, our recommendation for noisy problems with the static noise
is to increase the population size and the maximum number of generations allowed
to retain the quality at the expense of an increased run time.

Dynamic noise in the function values can occur, for example, in the problem of
On-line optimization of the control parameters of some complicated physical system:
a car autopilot, a self-tuning nuclear reactor or particle accelerator. Here the primary
Source Of the noise is the limited accuracy of the physical measurements. Here our
Population members are vectors that store sets Of control parameters. The Objective

function is evaluated by taking measurements of the performance from detectors.

92

100

 

 

 

 

‘3
'E 80 -
8
‘6'»
'03 50 .
C
.E
g 40 -
_o
5
8 20 -
5°
0
5.0 1.0 0.5 0.1 0.05 0.010.001
Neighborhood radius
(a) Sphere
'O
8
-E
8
.C
.9
d)
C
.E
V)
C
O
.5
2
8
S”

 

 

 

5.0 1.0 0.5 0.1 0.05 0.010.001
Neighborhood radius
(b) Rastrigin

Figure 2.18: Distribution of the results of 100 runs of GATool on the 5-dimensional
Sphere (left) and Rastrigin (right) test function problems (see Appendix C) with the
default set of parameters (see Figure B.1), population size : 10*dimension, by the e
neighborhoods of the global minimum.

93

100

 

 

 

E

‘E 80 - <
.8

‘51

.5 60 . .
r:

.E

g 40 r .
.o

3

8 20 ~

5"

0

5.0 1.0 0.5 0.1 0.05 0.010.001
Neighborhood radius

Figure 2.19: Distribution of the results of 100 runs of the GATool 0n the 5—
dimensional Rastrigin (right) test function problem (see Appendix C) with the default
set of parameters (see Figure B.1), population size = 20*dimension, by the e neigh-
borhoods of the global minimum.
Different measurements performed with the same set of control parameters will most
likely return different values of the objective function, so the noise is indeed dynamic.
The main problem here is that the mechanism of the elite members that preserves
the best values found to this step and assures convergence does not work since func-
tion values change between generations and the best members might not remain best
after re-evaluation. This effect does not let the method converge which can be seen in
Figure 2.20. Here we use the sum of the coordinate—wise differences of the best found
minimizer with the true one instead of their squares or modules in order to demon-
Strate that the current minimizer’s position relative to the true minimum is changing
not just the distance. Note that for the problem without noise, the method converged
in the 52—nd generation, while for the noisy problem it reached the maximum number

Of generations oscillating around the true minimizer without convergence.

94

 

 

 

Difference

 

-2 7 1 . . . . . . . .
10 20 30 40 50 60 70 80 90 100
Generation

 

(a) No noise

2 I I I I I I

 

H
J
Y
L

 

Difference

 

_2 L . . . . . . . .
10 20 30 40 50 60 7O 80 90 100
Generation

 

(b) Dynamic noise

Figure 2.20: GATool’s performance in the 5-dimensional Sphere function problem
(388 section C.1), population size 50, default set of parameters (see Figure B.1), with-
01“? noise (left) and with the dynamic noise in the range [—1, 1] (right). Generation
nurIlber versus 23:1(1’9‘ — Jeanne), where x* is the best minimizer found by GATool

i
and Xtrue is the true global minimizer (in this case 0), is plotted.

The approach recommended earlier for the static problem (increased population
size and / or maximum number of generations in evolution) can also be utilized in this
case. Another alternative is to use the mean value of the population’s best member

averaged over generations as suggested in [102]:

92
— 1
x* = x* = —— E : xi. Is 91 s 92 s gmax- (2.3.14)
92 _ 91 +1i=91

As can be seen in Figure 2.20, the best value obtained by GATool oscillates around the
true minimum. Hence, by running GATool for a sufﬁcient number of generations and
skipping several initial ones (typically g1 : 5.. .20) where the method is searching
for the area of interest and then averaging the best obtained minimizer using (2.3.14),
we can reduce the effect of noise. Note that the noise distribution must be taken into
account when averaging. In our case we used a uniform noise distribution, hence we
employ the simple arithmetic mean.

Minimizing the 5-dimensional Sphere function problem with the true minimizer 0
(see section C.1) by GATool with the default set of parameters (see Figure B.1), pop-
UIation size : 10*dimension= 50 and the uniform dynamic noise in the range [—-1, 1]
With a different maximum number of generations we obtain the results summarized in
Table 2.2. While GATool’s stochastic nature and the fact that noise is dynamic make
the results run-dependent, we can repeat these simulations to statistically observe
that the averaged value of the minimizer indeed gets better with an increase in the
nuInber of generations GATool is allowed to run. It should be noted, however, that
if the noise is dynamic but its level is of the same order as the required precision, the
nunlber of generations where the positive effect of the averaging starts to show up

can get very large.

96

Table 2.2: Euclidean distance from the true minimizer to the current best found
objective function value and the objective function value averaged by formula (2.3.14)
with 91 = 5, for the 5-dimensional Sphere test problem (true minimizer 0 see section
C.1) minimized with GATool with the default set of parameters (see Figure B.1),
population size : 10*dimension= 50 and dynamic noise in the range [—1, 1]

 

 

 

 

 

 

G . Distance to minimizer
eneratlon

current averaged
100 0.18567 0.22973
200 0.17075 0.31166
500 0.13479 0.07508
1000 0.21228 0.06281

 

 

2.3.6 Studies on Integration with COSY-GO Rigorous Global
Optimizer

The COSY-GO veriﬁed global optimizer is based on the box processing algorithm
described in section 2.2.2. In the second step of the algorithm (see Figure 2.8), it
employs various heuristics in order to update the current cutoff value, which is the
best rigorous upper bound for a minimum. This value is then used in the ﬁrst step to
help trimming branches of the search tree when they are evaluated to lie above the
cutoff in the objective function range space. The better the cutoff, the more boxes
Can be eliminated from the relatively expensive processing performed by COSY-GO
to maintain the rigor. Thus the overall execution time is also getting better. In the
Current version, the heuristics used for this purpose are an evaluation Of the function
Value at the middle of each box, a gradient line search, and a representation of the
f1lflction as a convex quadratic form to update the cutoff value [26].

For small-dimensional problems, even such a naive approach works ﬁne but for
larger problems where days of computational time on thousands of CPUS are needed
to ﬁnish the search, it can be beneﬁcial to use a better heuristic algorithm. While

there exists a plethora of heuristic optimization methods (see section 2.1), we claim

97

that GATool is particularly suitable for this purpose since it is fast, robust, multi-
purpose and tolerant to noise (see sections 2.3, 2.3.5, 4.2, 4.1, 4.3). We should note,
however, that while generally EAS are capable of ﬁnding good estimates of extrema
even for very complicated functions, they are not guaranteed to succeed and may
require extensive ﬁne-tuning to the problem in order to achieve good results.

In this section we describe the problems that rigorous optimization with COSY-
G0 is dealing with as the dimension of the problem increases and discuss how the
GATool optimization method can be utilized to reduce their negative impact. We
also demonstrate some example simulations and overview some ideas on the optimal
choice of GATool parameters in order to obtain a good balance between computation
time and the quality of the result.

In order to rigorously bound the optimum, COSY-GO represents the current
search space as a list of boxes and employs Taylor model methods [4] to eliminate or
Split them, effectively reducing the search space volume. The package was efﬁciently
applied to rigorous global minimization of the well-known test problems for the global
Optimizers [26,117,122] as well as to solve some real life problems such as spacecraft
trajectory optimization [11] and normal form defect function optimization [26,103]. A
Collection of the boxes generated during the 2-dimensional Rosenbrock’s test problem
(See section C.4) minimization and the spacecraft trajectory optimization are shown
in Figures 2.26 and 2.21, correspondingly (courtesy of Roberto Armellin). Perfor-
Inance of the COSY-GO on some of the test problems listed in Appendix C with
increasing dimensionality is summarized in Table 2.3.

It. should be noted that even for such a small sampling of the test problems,
some of the results demonstrate a visible correlation between the dimension and the

minimization time. Some problems scale well with dimension (An), i.e. the time

98

 

 

 

6500 '1‘ .
6000 f -
70
. '60
’1
.]
Sf . ‘50
if“
: - -40
.l

 

 

 

 

 

 

2000 3000 4000 5000 6000
151t1n0]

Figure 2.21: Global Optimization of the spacecraft trajectories: pruned search space
in the epoch / epoch plane (courtesy of Roberto Armellin) [11])

99

I]?

.1“

required to ﬁnd the minimum does not grow as the dimensionality increases, some
scale not so well (CosExp), and some scale poorly (Paviani, SinSin). Note that for the
An function, computation time is very small and is dominated by the input/output
time rather than by actual numerical computations.

Generally, we cannot expect problems to scale well with dimension. Only when
they have some special properties and symmetries are such expectations justiﬁed.
There are several reasons for that. First, as the dimensionality of the problem in-
creases, so does the volume of the search Space. Suppose, for simplicity, that our
search space is a v-dimensional cube with the length Of one side equal to d. Then its
volume is given by

V = d“. (2.3.15)

so the volume of the space searched for the extrema grows exponentially (see Figure
2.22(a)). Note that if 0 < d < 1 then the search space volume actually decreases
with the dimension, which is the case for the An test function from Table 2.3. As
the volume increases, in general the number of the boxes that COSY~GO needs to
eliminate to enclose the minima with the desired accuracy increases. The number of
local minima that should be reviewed and rejected in order to ﬁnd the global one, can
grow exponentially as well (as it does for most test problems from Appendix C). Hence
the complexity of global optimization generally grows exponentially with dimension.
Note, that for the local optimizers, complexity Often grows polynomially [26].
Another problem, directly connected to the Taylor methods used by COSY—GO,
is that the number of monomials in the Taylor expansion Of the function grows with

the dimension:
(n + v)!
nlv!

M = (2.3.16)

Here v is the number of variables (dimension) and n is the order of the expansion

100

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2; as 85 86 mos Nos :3 Nos 3 . a
was”: was.” «6:: ~33 N32.” «636 is? Eons > m .. 07. <
NEE. 2.9.3 3.3». 2.3 a: Se s .

. . . . . . . . w .1 oz Emsm
«+62 a $68 a stoma H 2.69. e 768 m :68 H 8 w 8 a. >
8.3m ES 3a was a: as mos 8s s . _ tax 8
32.2 “+16%; wigs Press 133 $68.1. «+33. Traces > m 1 07 m 0
Same: mesa 2.8“ 8.8 we.“ was 2s as s .
. . . . . . . . w 0 oz _§$§
wins. H :63 2 also a $68 a «+63 m. «+68 a. «+6: a $68 6 >
a w _ a W e _ m r a. _ m _ a
56585 maouoEmSm EoEOanH

 

 

 

 

 

@6338 3:: mm was $5956 3 weed 03 waxes. 286586 a 98 w 8m w H 0 Z ﬁts Eoﬁoa
Emﬁm ANN :osoom oomv $20 Swansea 3on .8381 23 mm OZ .mwcoOOm E. was 55388 2: ma N was Seam :88"... 65 mo
25:? 2: E > ”ﬁascoﬁcoﬁww wEmwoaoE 5:5 6 ﬁwcmaaaa oemv 98an $3 as... :o mocwﬁeotoa OO->mOO "mm Each.

101

”(In

(NO in Table 2.3). The graph of the number of monomials against the dimensions
Obtained for different Taylor expansion orders is shown in Figure 2.22(b). As the
number of monomials increases, so does the evaluation time for P from the Taylor
model representation (2.2.1). Since in high-order multi-dimensional Taylor models,
many higher-order monomials are frequently zero, an efﬁcient technique to handle
this sparsity helps to reduce the impact of this factor. Such a technique is efﬁciently
implemented in COS Y Inﬁnity: monomials that are equal to zero are not stored and
do not participate in calculations [13]. This allows COS Y Inﬁnity to be memory and
computation-efﬁcient and allows it to operate on Taylor expansions up to higher orders
with a reasonable amount of the computing resources. A good heuristic cannot help
in solving the problem of growing numbers of monomials, but it could help to reduce
the bad impact of exponential growth of the search space volume by providing good
cutoff values and thus allowing elimination of more boxes on each step of COSY—GO.

Here we consider GATool as a potential candidate cutoff values generator. The
rationale for such a consideration is that it frequently succeeds in ﬁnding a good upper
bound for the global minimum even for high—dimensional and complex problems in a
reasonable time (see sections 4.2, 4.1, 2.3) and that it scales well with dimension. We
start our examination by applying GATool with the default set of parameters (see
Figure B.1) to the set Of test problems from Table 2.3 (this table shows the COSY-GO
performance).

Since the population size in EA is what largely determines its ability to thoroughly
explore a search space, we tested different population size scaling schemes. Here we
present results for two different mechanisms: the population size is 100*dimension and
the population size is 10*dimension. Results obtained from random runs using these

two strategies are summarized in Tables 2.4 and 2.5 respectively. For each problem,

102

 

le+008 .

 

Initial volume = 1

 

 

 

 

 

 

  
     

 

 

   

 

1e+007 ', Initial volume: 2 +
’ Initial volume = 3 +
1e+006 } Initial volume = 5 +
: Initial volume = 10 - —._,
1(DOOO ’
“5’ 10000 _
2 .
g 1000
100 I
10 ' '
l
o l ‘ 1‘ 4 l n .
Dimension
(a) Search space volume for different initial volumes
100000 ?-____—1==—1 , 1 I '
, N0 = 2 +
N0 = 3 +
’ N0 = 4 +
10000 r N0 = 5 .
.02 ' m = 6 .u..-
c 1000 P» N0 = 8
o
E u
“5
g 100 ,
E .
3
z I
10

 

 

    

1 2 3 4 5 6 7 8
Dimension

(b) Number of monomials for different expansion orders

Figure 2.22: Growth of complexity factors of global optimization with COSY-GO
with dimension

103

three parameters are listed: V is the volume of the search space, t is the execution
time in seconds and Q is the quality factor calculated as the difference between the
best obtained upper bound and the value of the global minimum (smaller is better, 0
means that the global minimum is found). Note that the Sphere function test problem
is not listed for the large population size due to its simplicity.

We note that GATool is by design a stochastic algorithm. Hence, even if it provides
good results on an occasional run, there is no guarantee for results to be consistently
good each time. However, statistical studies demonstrate its robustness in providing
a good upper estimate Of the minimum. See, for example, Table 2.6 in this section
and Table 4.1 in section 4.1.

Albeit an increase in the population size usually increases the quality of the Ob-
tained estimate (by quality we mean its proximity to the global minimum), population
size is also one of the main factors that inﬂuences the computation time Of GATool.
It is directly connected to the required number of function evaluations, it increases
the execution time for the most ﬁtness scaling algorithms, increases selection time
and number of times crossover and mutation must be performed to generate the next
pOpulation. Finally, it increses the size Of the memory footprint, which increases the
computation time due to a high price of the memory operations. The important prop-
erty of the heuristic cutoff search algorithm is not only its ability to ﬁnd a good result
but also its ability to perform this search in a reasonable amount of time. GATool has
several features that allow it to perform well even with relatively small population
sizes (which also greatly reduces the computation time).

Now we investigate the tradeoff between the quality of the result and execution
time in more detail. Between Tables 2.4 and 2.5, there is the difference of almost two

orders of magnitude of the execution time can be easily seen, yet for some problems

104

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

fame was; n6m«.« 66$.« «.82 wsow.« «.686 «62.« 0
news” :43 one: «new :5 «a: 8.3 nos 3 sarcasm
«+83 «+82 «+33 383 n+6«3 «+33 «+8.3 «+62: >
«-6w«.« «6««.«. «2«.« «as; «6.34 «in? «sews «some a
«was 8.«w« 3n: «2“: «he «was 8.8 «e: s «seesaw
Sign ««+6««.« Stems 2+6wm.« 2+6w«.« 2+6B.« “is: «+33 >
o-§.« c-6«m.« 96:: @168.« «$3 5.2? ~68.« 8d 0
2...? 8.2:. 8.3... «33 «31.2 3.3 8.2 3.3 s 595
«+33 «+83 «+22 18% 1.83 2+8“: 8s co.« >
«we ow.« «an 76$...” 76%.” $8.... «6:: 268.... a
o«.«o« m«.«:. «33 sees «53 «SE «is «5.: s Sascha
2+3?” a+8n.« «+62; n+6»: stems «+83 ”+63...” «+2..«.« >
«6%; «$3 mags was; n6«o.« eases 96%; @138.” o
«foe mamas 5.9.x ««.8« «38 8.2: «3: n«.«« ... 182$ 5
«+3.2 “+1.5; e+68.« 322.. «+33 «+83 «+23 1686 > m
2-68s Tags Slows En; Tana is; Sam: 238s 0
«was «£2: «23 «man ««.«o« 3.32 «is 2.8 s axmmoo
«+2.3 :33 e+68.« m+s«o.« «+33 «+83 «+33 Steve >
«in? «$3 «$3 «6%.... «63¢ is?” «.636 216:; o
ow.««w «3: were Eden «new :82 8.3 «a: s 5.
«-83 «.23 «6:: «$2 «in; «.33 Is? Eon.« >
a w _ N. mowmcmcwa m L w m _ m EmpoEaBm Eoﬁem

 

 

 

 

 

3:58 2 83555 Eco—m 2:
3:... £52: o .333 mm $=wﬁmv 82558 Ban 2: mo egg 2: was wagon Sam: «5:850 amen 2: 5on3 magnate on» we
wouﬂsoﬁo 888 32.25 oﬁ mm U «:8 238% E 2:: 283888 2: mm a ,8QO 53% 2: mo @833 2: ﬂ > 63.48% H 38
cosﬂsaoa was Cd madman— oomv mwcmﬁom :9&% 4:3 0 ﬁwcoaaa‘ 80¢ 928an 0.6.8 23 no monwﬁaofoa _ooB<U "EN 2an

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

«.2«.« «-23 «-2«.« «-23 «-23 «.23 «-23 2.2«.« a
««.«« ««.«3 ««.«3 ««.« ««.« 33 ««.« 3«.« a 222
«+23 «+23 «+23 «+23 «+23 «+23 «+23 «+2«.« >
«23 «-2«.« «2«.« «2«.« «.2«.« «.2«.« 3-23 «23 a
««.«« ««.2 ««.«« ««.« ««.« ««.« «3 ««.« 3 52522
«+23 «+23 «+23 «+23 «+23 «+23 «+23 «+23 >
323 «...-«««.« «.23 «+2«.« 323 3.23 «.2«.« «.23 a
««.«« ««.«« ««.«« 2.2 ««.« ««.« ««.« ««.« a «2an
««+23.« ««+2«.« ««+2«.« ««+2«.« ««+2«.« ««+2«.« «+23 «+2.3 >
«-««.« «2«.« «-3«.« «««.« «23 «23 «-23.« «2«.« d
««.«« ««.«« ««.«« ««.« ««.« ««.« «3 ««.« a «2.2
«+23 «+2«.« «+23 «+2«.« «+2«.« «+23 ««.« ««.« >
««.« ««.« ««.« ««.« «3 «3 ««.« «3 a
««.«: ««.«« ««.2 ««.« ««.« ««.« ««.« ««.« a «85582
««+2«.« «+2«.« «+2«.« «+23 «+2«.« «+2«.« «+2«.« «+2«.« >
«2«.« «.23 3-23 «2«.« «.23 «2«.« «.23 «.23 a
««.«« ««.«« 2.2 ««.« ««.« ««.« ««.« ««.« p 6222
«+2«.« «+23 «+2«.« «+2«.« «+2«.« «+2«.« «+23 «+2«.« >
32«.« T2«.« 3.2«.« T23 3««.« 3.23 32«.« 3-23.« 0
««.«« ««.«« «33 ««.« ««.« ««.« «3 ««.« 3 9280
«+23 «+23 «+2«.« «+2«.« «+2«.« «+2«.« «+23 «+2«.« >
«23 «23 «-23 «23 «23 «23 «.23 «-23 o
««.«« :2 ««.«« ««.«« ««.2 ««.« ««.« ««.« « 5
«.23 «2«.« «.2«.« «.23 «.23 «2«.« 323 323 >
« « « « « « « «
852585 «««.««33& 8238A

 

 

 

 

A858 2 88558 228% 2:

106

«2: «328 o .2225 2 «2382 8:558 132m 2: «o 2:? 23 28 «553 «23: 258% «men 2: 5253 882236 2: mm
Eadie—8 “89$ 3:38 2: «3 0 ~28 £882 5 o8$ cowsuoxo 2: 2 a 63% 48:22 2: mo «85? 2: «m > 6358 H 36

«2222 «2. 3 «SE «2 «25% «:22 5;, “o 222% «E «233 «g «s 2 «2352 332$ ««.« «32

 

the quality of the result remains the same. For some problems, the quality decrease
is lower than the increase in the time of the execution. Comparing the time of the
execution of GATool from Table 2.5 with that of COSY-GO from Table 2.3, we see
that for poorly scaling, high-dimensional problems (SinSin, Paviani and also complex
multi-dimensional normal form defect function optimization discussed in section 4.2,
Tables 4.2, 4.3), the time difference between the two methods becomes signiﬁcant
enough for GATool to be a useful heuristic.

It is important to note a relatively low quality of the several estimates provided
by GATool. However, the interaction algorithm interaction of COSY-GO is such
that the search domain for GATool is being reduced with time by the COSY-GO
box elimination process. Domain reduction greatly improves GATool’s accuracy, as is
demonstrated later in this section. Also note that there might be a different optimal
strategy for population size scaling with dimension. This choice is largely heuristic
and depends on the problem under consideration. In Figures 2.23 and 2.24, we demon-
strate how the execution time and the result quality result change with dimension for

different multipliers m from the
PopSize = m - Dimension

scaling strategy. Results are presented for the random run on the Rastringin’s function
test problem (see section C?) with the default set of parameters from Figure 8.1 and
may vary from run to run. However, qualitatively the relation between the quality
and the population size remains unchanged; larger population statistically provides
better search results at the cost of increased execution time.

The statistical demonstration of the consistency of results is summarized in Figure
2.25. Data is gathered from 1000 runs of GATool on the 5-dimensional Rastrigin

function test problem with the default set of parameters (see Figure 8.1), population

107

size = 10*dimension. The average run time is 5.22 seconds. Note that while the
quality of the result from Table 2.5 is only achieved in 5.8% of the runs, almost 50%
of the times, the result is consistent with the general trend demonstrated in Figures
2.23 and 2.24. Also note that since we are studying the quality of .the obtained cutoff
value, we measure the proximity of the results to the global minimum in the range
space of the function. Hence, for some functions, even if the values of the function at
some points are in the same sufﬁciently small neighborhood of the global minimum

value, the points themselves may be far from each other.

 

 

 

  
 
  
 

350 ——'l= - . r=F u u le+010
pop=10*dlm + _
pop= 20*dim + ‘

300 - pop=30*dim + “"009

pop = 50*dim +

250 p pop=100*dlm -- :-
volume

1e+008

 

 

 

 

le+007

 

 

E 200 0
g le+006 5
'4: 150 9
100000
100 10000
50 1000
0 100

2 3 4 S 6 7 8 9
dimension

Figure 2.23: Example of the execution time scaling for different scaling strategies
for Rastrigin’s function test problem (see section C.2) minimization with GATool.
The volume of the search space (logarithmic scale) is shown to better demonstrate

SCﬁling issues.

The interaction mechanism between COSY-GO and GATool could be turned from
exploitation into true symbiosis. Suppose both methods start working on the same

Problem at the same time. In section 2.3.2 it was mentioned that GATool starts its

108

 

quality

   

 

pop = 10*dim +
pop = 20*dim ———A—— 4
pop = 30*dim + '
pop = 50*dim + '

 

 

 

 

 

. . pop = 100*dim ‘
4 5 6 7 8 9
dimension

Figure 2.24: Example of the result quality scaling for different scaling strategies for
Rastrigin’s function test (see section 0.2) problem minimization with GATool.

 

% Solutions in neighborhood

 

 

 

5.0 1.0 0.5 0.1 0.05 0.010.001
Neighborhood radius

Figure 2.25: Distribution of the results of 1000 runs of the GATool on the 5-
dimensional Rastrigin function test problem (see section C.2) with the default set of
parameters (see Figure B.1), population size ; 10*dimension, by the 5 neighborhoods
of the global minimum. Average run time is 5.22 seconds.

109

search in a box and ends up with the minimum value of the function in this box that
it was able to obtain. By that time COSY-GO, working in parallel on the same box, is
likely to have already partitioned it into a set of smaller boxes and eliminated some of
them. Using the cutoff value provided by GATool, COSY-GO likely could eliminate
more boxes and send GATool the update of the current search space in the form of its
current boxes list. COSY-GO rigorously guarantees that the global minimum resides
in one of the boxes from this list. Then GATool runs again using this set of boxes as
an initial search space and, in most cases, since its volume is smaller, obtains a better
cutoff value. It is then returned to COSY-GO to let it eliminate more boxes. The
process continues until the global minimum is bounded by COSY-GO with desired
accuracy.

Since the whole method is heuristic, there might be several different strategies of
using the information about the search space obtained from COSY-GO. For example,
if We have some information about more suspicious and less suspicious boxes, we can
r1111 GATool on a set of the more suspicious boxes with the hope that a smaller volume
reSUlts in a better cutoff. Alternatively, we can run GATool in each of the boxes for
a Very small number of generations with a very small population size to obtain a
VaIUe of the minimum for each of the boxes in the collection in a time comparable to
that of the midpoint test. Then the cutoff is selected as the best value among the
Ones obtained. Some other strategies could also be invented possibly using additional
infOlrmation about the problem. Regardless of the choice of the strategy, we claim that
the smaller volume of the search space that encloses the sought minimum generally
leﬁds to a. better upper bound on the minimum obtained by GATool. Since COSY-

GO often provides smaller enclosures of the minimum on each step, this would lead

t0 an increased quality of the cutoff provided by GATool, thus these two methods

110

performances form a synergy. The exact strategy formulation and implementation is
a direction for future work.

In order to support our hypothesis, we selected the Rosenbrock’s function test
problem in 10 dimensions (see section (3.4) as the worst case scenario for which
GATool alone was unable to provide a reasonable estimate with a small population
size (see Table 2.5) and/ or high dimensionality. Since it is also one of the COSY-GO
performance test problems, we ﬁrst run it with COSY-GO to obtain the list of boxes
it generates during execution (see the example of the 2-dimensional Rosenbrock’s
function minimization process in Figure 2.26). Then we used several boxes similar to
the generated ones with the decreasing volume as initial search domains for GATool
and performed the statistical study of the quality of the estimates obtained for each
Of these initial boxes in 100 runs. The results are summarized in Table 2.6 and in
Figures 2.27—2.31.

Note that while generally the distribution of results by neighborhood improves
as the volume of the search space decreases, the volume itself is not the only factor
determining performance. Comparing the results for the search domains of {—5, 10]10
and [—1.5, 1.5]10, we see that, while for the second box 100% of the results in the
Se(10nd domain lie within the radius 10 around the global minimum, there are no
SolIltions that lie within the radii of 5 or 1. However for the ﬁrst box, having a
v01llme 7 orders of magnitude larger, 45% and 7% of the results lie within these
CIOSer neighborhoods, correspondingly. It was observed that for the [—1.5, 1.5]10
initial search domain GATool converges on the function values around 7 and is not

progressing towards the global minimum.

The reason behind such behaviour is that the different choices of the search do-

main can reveal or hide different properties of the studied function thereby changing

111

 

 

 

 

 

 

mm o o o o :2;

a 2” o o o :2:

2: a o o o 2;

2: 2: o o o 3

2: 2: 2 o m 3

2: 2: on o a. :

2: 2: 2: o 2. m.

2: 2: 2: 2: e. S

2: 2: 2: 2: 2w om

2: 2: 2: 2: 2: 2:
2-822s u > $22.: n > 183 n > is? u > 2+3; u >

3:: 2.2 23.122 2:312 23.121 2:: :1 ages

QEEOU

 

 

.mﬁm EwEov 2: mo 2:82:23on 828% mm R 2598 2 «ES 8: momenta 2:239» wﬁvémaoov £25 38:26 58% FEE
pamamwﬂv :8 8:82:22 132w 2: mo mwooﬁonzwﬁc m 23 3 3d Sawm— wwmv 88223.3 mo 8m inﬂow on... ﬁrs AVG 22.6%
$3 28382 ammo 228:3 mkooﬁnmmom 32065866: o5 no 308420 2: m0 m5: 2: mo 2:52 23 no 205255me "9N 3nt

112

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1.5 L2 . . s . ﬂ . ‘
1 - .
0.5 - .
0 - .
-0.5 t .
-1 _ .
-1.5 - .l
-1l5 -i -0.5 0 0.5 i 1.5

Figure 2.26: Boxes generated during COSY-GO minimization of the 2-dimensional
Rosenbrock’s function (see section CA)
the performance of the algorithm trying to exploit them. Therefore some particular
Choices of the search domain might work better than the other even if the volume
Size-based logic suggests otherwise. The problem of the optimal initial box selection
is Similar to the problem of ﬁnding a good initial point for the local optimization
methods. Determination of the optimal initial search domain for an arbitrary func-
tion, when we generally have no information about its behaviour and the location
of the minimum, is in this case a matter of trial and error. However, COSY-GO
exploring the search space and eliminating regions that are guaranteed not to contain
the sought minimum, provides GATool this additional information about the search
domain, thereby greatly increasing the quality of search, as is demonstrated in Table
2-6 and in Figures 2.27—2.31. It is also worth noting that in this case the average run

time for all domain sizes is around 35 seconds and it is not observed to depend on

113

 

% Solutions in neighborhood

 

 

 

 

100 50 10 5 1 0.5 0.1 0.05 0.010.001
Neighborhood radius

Figure 2.27: Distribution of the results of 100 runs of the GATool on the 10-
diInensional Rosenbrock’s function test problem (see section CA), with the default
set, of parameters (see Figure 8.1) by the e neighborhoods of the global minimum for
different initial search domains with decreasing volumes. Average run time is z 35
Seccmds independent of the domain size ([—5, 10110, V = 5.67 - 1011).

114

 

100

§80~
860i 1
"2
.5
§40~ .
3
8
69 20-
0 4L: J 1

 

 

 

 

100 50 10 S 1 0.5 0.1 0.05 0.010.001
Neighborhood radius

Figure 2.28: Distribution of the results of 100 runs of the GATool on the 10-
dil'nensional Rosenbrock’s function test problem (see section C.4), with the default
set of parameters (see Figure 8.1) by the 5 neighborhoods of the global minimum for
different initial search domains with decreasing volumes. Average run time is z 35
SecOnds independent of the domain size ([[—1.5, 15]“), V = 5.9 - 104]).

 

% Solutions in neighborhood

 

 

 

100 50 10 5 1 0.5 0.1 0.05 0.010.001
Neighborhood radius

Figure 2.29: Distribution of the results of 100 runs of the GATool on the 10-
dirIlensional Rosenbrock’s function test problem (see section CA), with the default
3913 Of parameters (see Figure B.1) by the e neighborhoods of the global minimum for
diﬂ‘erent initial search domains with decreasing volumes. Average run time is w 35
SecOnds independent of the domain size ([[0, 15]“), V = 5.76 - 101]).

116

% Solutions in neighborhood

 

 

 

100 50 10 5 1 0.5 0.1 0.05 0.010.001
Neighborhood radius

Figure 2.30: Distribution of the results of 100 runs of the GATool on the 10-
dimensional Rosenbrock’s function test problem (see section C.4), with the default
set of parameters (see Figure 8.1) by the e neighborhoods of the global minimum for
different initial search domains with decreasing volumes. Average run time is z 35
seconds independent of the domain size ([[0.5, 15]“), V = 1.0 - 100]).

117

% Solutions in neighborhood

 

 

 

100 50 10 5 1 0.5 0.1 0.05 0.010.001
Neighborhood radius

Figure 2.31: Distribution of the results of 100 runs of the GATool on the 10-
dimensional Rosenbrock’s function test problem (see section C.4), with the default
set of parameters (see Figure B.1) by the a neighborhoods of the global minimum for
different initial search domains with decreasing volumes. Average run time is z 35
seconds independent of the domain size ([[0.7, 1.3]10, V = 0.6 - 10_2]).

118

the volume of the search space.

As an additional note, we need to mention that during the ﬁnal steps of the COSY-
GO execution, the function and parameter values at which the algorithm operates
can get very small. In such a case, the precision of the calculations might suffer from
ﬂoating point arithmetic rounding errors. Hence it is also important for the cutoff
search method to be numerically stable. As mentioned in section 2.3.2, GATool with
the Rank selection method does not perform any numerically unstable operations.

Hence it does not add numeric instability to the problem.

2.4 Conclusions

We reviewed the commonly used methods of unconstrained optimization and de-
scribed the implemented GATool Evolutionary Algorithm for unconstrained continu-
ous optimization in detail. We assessed its performance in terms of computation time
and the quality of the obtained result, studied the tradeoff between the computational
resources needed and the resulting quality GATool provides. We discussed GATool’s
performance in the presence of static and dynamic noise, suggested useful strategies
of performance tuning for the EA-hard problems and demonstrated their usefulness
on examples. We justiﬁed the choice of GATool as a heuristic method to generate cut-
off values for COSY-GO rigorous optimization package, outlined the scheme of their
interaction, and presented sample runs and statistics that support these choices. We
demonstrated that the quality of the result increases as the information about the
search domain is reﬁned, which is an essential feature for integration with the box
elimination scheme of COSY-GO. Full implementation of the combined GATool and

COSY-GO algorithm is a matter of technical details of integration and is a promising

119

direction of future research.

120

CHAPTER 3

Constrained Optimization

3-1 Challenges in Constrained Optimization with

Evolutionary Algorithms

As noted in section 2.1, EAS are successfully applied to problems for which conven—

tional methods are not applicable or fail, and they have proven themselves useful for
many of such real-life problems. The issue here is that Evolutionary Algorithms were
not originally designed to handle constraints. Even though unconstrained EAS had
already demonstrated themselves to be very efﬁcient general-purpose optimizers, the
ability to handle constraints would significantly increase their range of applications
and help in solving many important optimization problems.

This motivation drove the development of a large number of different approaches
for constraints handling for EA and their successful usage in a number of different
Constrained optimization problems. In this section, we give just a short review of the
most commonly used constrained optimization methods for EAS. Necessary modiﬁca-

tions are made to the way they were originally presented to make the review uniﬁed.

121

For more methods, detailed descriptions, critique and comprehensive bibliography on
the topic, we refer to [129], [42], and [128].

There are certain challenges in adapting EAS for constrained minimization. For a
general EA optimizer, the only operation that connects an algorithm with a problem
is the evaluation of an objective function. This operation alone then serves as a basis
for the ﬁtness evaluation of the population members (see section 2.3) and constraints
are ignored. Suppose now that we are implementing an EA optimizer for constrained
problems and have started from producing the initial generation demonstrated in
Figure 3.1. Here the dots represent members of the population, the cross represents
the sought feasible minimum, and S and F denote the search space and feasible set
correspondingly.

Before we proceed further, we need to design the method to handle constraints and
this poses several important problems. As can be seen, the population contains both
feasible and unfeasible members positioned at different distances from the solution.
Since we are interested in a feasible minimum at the end of the search, there is an ob-
vious strategy to completely eliminate unfeasible members from further consideration.
However, this strategy is too naive since keeping unfeasible members in a population
might be beneﬁcial for the whole process as demonstrated in Figure 3.1. Here the
arithmetic crossover (see section 2.3) between unfeasible points a and c or between
feasible point d and unfeasible point 0 would likely produce a new member that is
much closer to the optimum than the one produced by a crossover between feasible
points f and (1. Hence, keeping unfeasible members might increase the probability of
success and speed of convergence.

If unfeasible members are allowed to remain in a population, the questions of

comparison between feasible and unfeasible, and between two unfeasible members,

122

arise during the ﬁtness calculation. Repairing unfeasible members in order to make
them feasible also seems like a worthwhile approach. Increased number of factors to
deal with and a variety of possible modiﬁcations to the algorithm that can be made to
handle constraints greatly increase the diversity of various approaches to constrained

EAS.

 

 

 

 

 

Figure 3.1: Example of the generation produced by EA for a constrained optimiza-
tion problem. Here points represent members of the population, the cross represents
the sought feasible minimum, S is a search space, and F is a feasible set

In the general Evolutionary Algorithm from Figure 2.6 we can pick out several op-
erations to modify for constrained optimization: ﬁtness evaluation, selection, genetic
operators of recombination and mutation and reproduction process. Combinations of
approaches and various heuristics are also possible and, in fact, are widely adopted.
Also popular are co—evolutionary techniques where several populations are evolved
using different ﬁtness evaluation methods and / or different genetic operators. Such an

approach can be viewed as a higher order meta-method that combines several EAs.

123

3.2 Overview of the Methods

In this section we describe various approaches and their variations in arbitrary order
trying to cover them from the simplest, most widely adopted and general methods
to the novel, more sophisticated, and more problem-speciﬁc algorithms. Note that
EAS are very heuristic optimization methods and many of them are combinations of
several techniques which makes them hard to classify. So for the reason of uniﬁcation
and for the sake of simplicity of comparison we mostly cover only the main ideas
introduced in these methods. Their implementation details, testing and applications
can be found in references given along with descriptions. As a last note before starting
the review, we have to add that examples of successful applications of constrained
EA to real-world problems from various ﬁelds are numerous and could be found, to

list a few, in [36,37,45,55,59,80,99, 112,149].

3.2.1 Killing

Perhaps, the most obvious and frequently used approach to constraint handling in EAS
is to eliminate unfeasible members of the population. Usually elimination is performed
after genetic Operators are applied but before ﬁtness evaluation. Replacements are
then generated using the selected new members generation method (usually uniformly
distributed random points from S, see section 2.3). To allow some unfeasible members
in population for the reasons described earlier, members are often eliminated non-
deterministically, with certain probability that increases with the amount of constraint
violation (usually estimated by some combined penalty function, see section 3.2.2).
To increase the number of the feasible members, the elimination-regeneration process

can be repeated several times or until a certain percent of the population members

124

becomes feasible. Note, that in its simplest form, i.e. when all the unfeasible members
are eliminated independently of the amount of the constraint violation, this method
uses minimal amount of information about the problem and thus is expected to be
inefficient.

From the description it is logical to suggest that the killing method performs
reasonably well only in the cases where the p factor is large [42]. Its practical usage
demonstrated that this is, indeed, true and that its performance is far from acceptable
(see section 3.3.6) only the cases of the small ,0, i.e. when the feasible set is signiﬁcantly
smaller that the search space [133], which frequently happens when constraints are
hard to satisfy. This method is particularly inefficient for problems where the global
minimum is attained at the point where some constraints are active, i.e. hold as
equalities, because they are hard to satisfy. This observation supports the claim that
limiting the EA search to only a feasible space may reduce its performance since in
this case EA is omitting the information about a search space provided by already

generated unfeasible individuals.

3.2.2 Penalty Functions

The penalty functions paradigm was not invented speciﬁcally for EAS. Rather it
was suggested as a general numerical method applicable to constrained optimization
problems. Its basic idea is to transform the original constrained minimization prob-
lem (1.3.1), (1.3.3) into an equivalent unconstrained minimization problem. Here
equivalence means that the feasible minimum of the original constrained problem is
a minimum of the resulting unconstrained problem or at least is acceptably close to
it.

This transformation is performed via a set of so—called penalty functions

125

PJ-(hJ-(x)), j = 1,. . . ,7). corresponding to a set of constraints. Here penalty func-
tion Pj calculates the amount of penalty assigned to a vector x for violating j-
th constraint. Utilizing those functions the problem of constrained minimization
(1.3.7), (1.3.8) could be transformed into an unconstrained multi-objective minimiza-

tion problem

x"< = arg min <I>(x), (3.2.1)
xES

T that could be solved

where M) = (1010100), P201200). . . . .Pn<hn<x)>, f(X))
by multi-objective optimization techniques. It could also be converted even further

to an unconstrained single-objective minimization problem

x* = arg min cp(x), (3.2.2)
xES

where cp = cp(<I>(x)) is the function that combines the original objective function
and penalty functions into a single objective function. Usually penalty functions are
chosen such that [[9900 — f (x)|[ ———+ 0 as x —+ F. Function 4p also has to be balanced
to guide the search process to a feasible set F and hold it there, but not to interfere
with the search of the minimum inside F. Care must be taken to achieve this balance
in terms of the inﬂuence of the original objective function and penalties on a combined
function (,0. In case penalties are dominating a value of (p, the pressure to produce
feasible points might prevent the algorithm from ﬁnding an optimum. In the opposite
situation, i.e. if the original objective function dominates in calculating the value of
(,0, the optimization result tends to be optimal but unfeasible and thus useless.

A variety of methods to deﬁne penalty functions for (I), combine them and the
original objective function into a function cp(x), produced a large number of differ-
ent constrained minimization methods. Nevertheless, since different problems have

different properties of the constraint functions sets, there seem to be no universally

126

optimal penalty function deﬁnition strategy, similarly to a nonexistence of the uni-
versally best general optimizer as discussed in section 2.1.4. The optimal approach
should instead be selected and ﬁne-tuned based on knowledge about the posed prob-
lem. It should also be noted that multi—objective optimization problems are generally
harder to solve (see section 2.1), so it is often more desirable to convert a constrained
problem into a single-objective unconstrained problem (3.2.2) by choosing appropriate
P1,P2,...,Pn,<p.

The most frequently used method to deﬁne the combining function 4,9 is a linear

combination of the individual penalties:
n
99(131, P2,...,Pn;f) = 2 ijj + wnﬂf, (3.2.3)
k=1

where wj are freely chosen weight constants. Under this choice of cp the constrained
optimization problem (1.3.7), (1.3.8) is transformed into an unconstrained optimiza-
tion problem (3.2.2). Since wn+1 is a weight coeﬂicient of the objective function

in the original constrained problem, usually chosen to be unity for simplicity. The

objective function then assumes the form
n
we) = f(x) + ZjlePJ-(hﬂxﬂ. (3.2.4)
J:

For a general-purpose optimizer in cases where no information about a problem is
available, all weight coefficients for penalties are usually set to unity, at least initially.
Since in practice constraints and thus penalty functions often have different ranges
of values, weight coefficients wj can be selected to either normalize penalty values to
balance their inﬂuence on the combined objective function or to increase the relative

impact of some constraints if they are known to be harder or more important to

satisfy.

127

Historically, the penalty function method applied to nonlinear programming was
prOposed by Fiacco, McCormick and Zangwill [72,127]. Their proposal was to replace
an original constrained optimization problem with a single-objective unconstrained
optimization problem via (.9 from (3.2.4), where wj = r > 0, Vj, r is an arbitrary
constant. Then by carefully choosing penalty functions it can be guaranteed that the
search process applied to the unconstrained Optimization problem (3.2.2) converges
to a feasible minimum. Two most frequently used types of penalty functions that

satisfy these requirements are inverse:

 

1
Pj(Z) = —h,j(X) (3.2.5)
and logarithmic:
P313) = —10g(—hj(X)), (3-2-6)

j = 1,... ,n, so substituting them to (3.2.4) we get

 

" 1
tax) = f(x) — r 2 , (32.7)
F, h,(x>
and
990:) = f(x) — r E: log ( — g(x)) (3.2.8)
j=1
correspondingly.

Note that the penalty part of the objective functions (3.2.7) and (3.2.8) demon-
strates fast and unbounded growth if one of the constraint functions hj(X) approaches
zero. Therefore it creates an inﬁnite barrier on the boundary of the feasible set in the
objective function landscape and thus prevents iterative unconstrained minimization
methods from stepping outside of the feasible region once they start their search inside
it. Also note that the inﬂuence of the penalties falls off rapidly as we move from the

boundary. So if we choose the parameter r so that the unconstrained minimum of the

128

combined objective function (3.2.7) or (3.2.8) is reached at the feasible minimum of
the original objective function, we can approach the original constrained optimization
problem with most of the unconstrained minimization methods.

In practice, this method is applied by solving a sequence of the unconstrained
optimization problems (3.2.7) or (3.2.8) with different values of the parameter r = rk
for k = 1,2,... such that rk —> 0 as It increases. This approach is called Sequential
Unconstrained Minimization Technique (SUMT) and is proven to converge under cer-
tain assumptions on the objective function, the constraint functions and the sequence
{rk} [147]. Penalty functions of the barrier type are called barrier functions, so the
method is also called Barrier Functions Method.

The disadvantages of this approach are its computational expensiveness and that
in order to start the search it requires a feasible initial point which is frequently hard
to ﬁnd for all but the most trivial problems. Penalty functions of the barrier type are
also called interior penalty functions since they prevent optimization methods from
considering any unfeasible members during the execution, thereby pushing them to
search in the interior of the feasible region. Exterior penalty functions allow unfeasible
members to be considered during the search process but assign them a penalty that
generally grows with their distance from the feasible set. Usually exterior penalty
functions are such that Pj = P-(z) 2 0, z E R, j = 1,...,n and deﬁned in the

J

following way

0 < O
P(z) = Z — , . (3.2.9)
penalty(z) > 0 otherw1se
The most frequently used penalty functions of this type are from the power penalty
family:
0 < O
Pa(z) = { a z _ , = (max{0, 2})a, (3.2.10)
2 otherw15e

Where a = 0, 1, and 2 are most frequently used.

129

If we then substitute the value of the constraint function into a penalty function

of the type (3.2.9)
@0500).

we obtain a non-negative penalty assigned to a vector x for not satisfyng the j-th
constraint or zero if the j-th constraint is not violated (here index j of the penalty
function is given because generally penalty functions could be selected separately for
each constraint function). Power penalty functions (3.2.10) use a violated constraint
function value at the unfeasible point raised to the a—th power as a penalty (see
example of the power penalty for a = 1, one-dimensional problem in Figure 3.2).

In order to demonstrate how penalty function values correlate with the distance

of the unfeasible x from the feasible set F we use the simple constraint
h(X) = IIXIloo — 1 S 0, (3-2-11)

where

def
llxlloo = ._max In:

’00.,

is the inﬁnity norm of x. This constraint deﬁnes F to be the u-dimensional cube cen-
tered at the origin of the coordinates, with the length of the side equal 1. Colormaps
of the power penalties (3.2.10) with a = 0, 1, as well as the actual Euclidean distance

from x to F

d<x. F) dif min d<x.y>,
yEF

for u = 2 are demonstrated on the Figure 3.3. It can be seen that that for a % 0
a power penalty function provides an approximation of the actual distance to the
feasible set, although there can be cases where such approximation can be too crude.
In this case special penalty functions based on the knowledge about a particular set

Of constraints are preferred.

130

 

All h(x)

 

 

 

/>x
(a) Inequality constraint function

A’ 1
P (h(X))

 

 

 

 

 

 

 

J»

 

 

 

 

(b) Power penalty for inequality constraint function

Figure 3.2: Example of the one-dimensional inequality constraint function and cor-
reSpending power penalty function of the type (3.2.10), a = 1.

131

 

 

 

 

 

 

 

 

 

Figure 3.3: Left to right, top to bottom: colormap plots (scales are different, i.e. the
same color on different plots may correspond to different function values) of P0(h(x)),
P1(h(x)), and the actual Euclidian distance from x to F where h is given by the

formula (3.2.11) and F is a set of all x E S = [——5, 5] X [—5, 5] such that the constraint,
h(x) S 0 is satisﬁed

132

Since changing the ﬁtness evaluation procedure by using combined penalty func—

tions (3.2.2) requires relatively small effort and is relatively easy to analyze due to
moderate changes to EA, this approach is widely adopted for constrained EA opti-
mization and has demonstrated its practical usefulness [129]. A common weakness
Of such an approach is that the choice of penalty functions, combining function (,0
and their parameters are strongly problem-dependent. Most Of the time it requires
extensive ﬁne-tuning for the problem in order to achieve Optimal performance. Since
EAS are often applied to the problems with little knowledge available in advance, this
turns out to be a non-trivial task. Solving it is a matter of experience, trial and error,
and good heuristics. Little to no theory about choosing the right penalty functions
for the problem and method have been developed. There is, however, a promising
potential solution in the development of penalty function methods that are adapting
and self-adapting (see later in this section).

A general observation about penalty functions coming from experience is that
penalty functions Of just the number Of the violated constraints generally perform
worse than the penalty functions using the information about the degree Of the con-
straint violation [162]. This observation supports the claim of the “NO Free Lunch
Theorem for Optimization” (see section 2.1.4) stating the direct dependence of algo-
rithm performance on the amount of information about. the problem it utilizes. From

the family of power penalty functions (3.2.10),

0 2:30,

1 otherwise

190(3) = (3.2.12)

is the only one that does not use information about the degree of constraint. violation

and thus should be avoided.
The penalty function methods described so far, can be used not only with EAS

but with most other unconstrained Optimization methods (including those described

133

 

in section 2.1) and, in general, are frequently used for constrained Optimization. Later
in this section we review several EA-speciﬂc penalty function methods. All of these
use transformation to a single-objective minimization problem (3.2.2) via some sort
of combining function, «,0. It is also worth noting that other methods of constrained

EA optimization frequently use penalty functions for various auxiliary tasks.

Levels Of Violation

This method was prOposed by Homaifar, Lai and Qi [87]. For each constraint, a user

defines several levels Of constraint violation:
0 = sz < le < < 2)] < Zl+1,j = +00, j = 1,...,n, (3.2.13)

where l is the chosen number of levels of violation and penalty level coefficients are

le’RQj’” "RH-1,]. SO that
Rj(:) = Rkj’ zk—1,j< Z S skj. (3.2.14)

Then the combined objective function is built using a linear combining function (3.2.4)

and power penalties (3.2.10) with

1),-(z) = 192(2), (3.2.15)
wj = 1111(3): Rj(z), (3.2.16)
forj =1,...,n. Hence
99(x) = f(x) + Z RJ-(hJ-(x))P2(hJ-(x)). (3.2.17)
j=1

The main idea here is to give a user the ability to precisely balance the contribution

of each constraint to the combined objective function by making weight coefficients

134

wj dependent not only on the index of the constraint function but also on the vio—
lation level for this constraint. However, this method requires levels of violation and
violation coefficients to be defined for each constraint, thus the total number of pa-
rameters of this method is n(2l + 1). Therefore their determination for the problem
is a non-trivial problem itself and can easily get quite taxing. At the same time,
practical studies from [133] indicate that the quality of solutions obtained via this

method heavily depends on the choice of these parameters.

Multiplicative

An interesting approach was proposed by Carlson [34]; she suggested constructing
the combined objective function for unconstrained optimization by multiplying the

original Objective function by a penalty:

20‘) = f(X)P(X)a (3-2-18)

where P is designed such that P(x) 2 1, Vx E S. Note that this method also
requires f (x) 2 0 , Vx E 3. Studies on various multiplicative penalties demonstrate

reasonably good performance [37].

Dynamic

Joines and Houck [98] propose using a dynamic penalty function, i.e. a function
with parameters depend on the number of the current EA generation, similar to the
parameter rk in the SUMT method. In their method, the combined Objective function

iS built using the linear combining function (3.2.4) and power penalties (3.2.10) with

PJ-(z) 2 195(2), (3.2.19)

iv] = (01:)0‘, (3.2.20)

135

for j = 1,. . .,n, where k is the generation number and constants C, 01 and 6 are
method’s parameters that can be used to tune it to the problem. With this choice of
penalties and the penalty combination method, the unconstrained Objective function

assumes the form

g(x) = f(x) + (Ci-)0 2 Pﬁ(hj(x)). (3.2.21)
j=1

Parameters must be chosen such that the penalty multiplier (Ck)a Of the objective
function increases with generations, thus increasing the pressure to produce feasible
individuals after a given time to explore the whole search space S. This method, albeit
reported to be efficient and having much smaller number of parameters to set, suﬂ'ers
from the problem of most penalty function ,ethods: the sensitivity of the quality of
a result to the choice of method parameters. Even though the values C = 0.5 and
13 = 1 or 2 were found by authors to work reasonably well, there is an evidence that
these values lead to premature convergence at either an unfeasible solution or at. a
feasible solution impractically far from the optimum (see examples in [133]).
Another idea is to change the penalties dynamically in similarly to the Simulated
Annealing method described in section 2.1. The method proposed by M ichalewicz and
Attia [130] divides all constraints into four types: linear equalities, linear inequalities,
nonlinear equalities and nonlinear inequalities. th the uses a random single point.
that satisﬁes all linear constraints as a seed and maintains linear constraints satisﬁed
via a set of specially designed genetic Operators, ﬁnally, it applies a linear combining

function (3.2.4) and power penalties (3.2.10) with

Pj(z) = 192(2), (3.2.22)
1
11’} = an (3.223)
f01‘ j = 1,. . . ,n and {Tie}, a. cooling schedule deﬁned by the user. The summation
136

 

Of penalties in the resulting Objective function is done over all nonlinear equality and
inequality constraints. Linear constraints are considered taken care of, hence

1

2?); Z P2(hJ-(x)), (3.2.24)

jEA

99(X) = f(X) +

where A is a set Of indices of the violated nonlinear constraints and Tk is decreasing
as the evolution progresses. The process is stopped upon reaching a ﬁnal, predefined
“freezing temperature”, 7f This method provides not only a good performance on
many test functions with To = 1, ff 2 0.000001 and Tk+1 = 0.1rk. [130] but also a
signiﬁcant. sensitivity to the choice of cooling schedule.

Another annealing objective function is proposed by .loines and Houck [98] and is

based on their penalty combination function (3.2.21):

(CW 11 ,PﬁthJ-txn

99(x) = f(x) + e F (3.2.25)

However, this function requires very precise normalization of penalties in order to
avoid ﬂoating point overﬂows during computations.

It is worth noting the results published by Hilton and Culver [37]. They tested the
multiplicative penalty function method (3.2.18) and additive penalty method (3.2.4).
They observed that the linear change of weights with generations resulted in faster
convergence when they were kept constant and the multiplicative penalty method

(MPM) was more robust than the additive penalty method (APM) for their problem.

Adaptive

The main idea behind adaptive penalty function methods is to introduce a feedback
about the performance of the search process into the penalized objective function. An
intention behind this idea is to increase the penalty if the algorithm is experiencing

insufficient pressure to generate feasible members or decrease it in the other order to

137

redirect effort towards the search for the minimum of the objective function. Weights
Of the individual penalties could be redistributed based on the statistics about the
number of members of the population satisfying particular constraints over gener-
ations (see, for example [57]). However, since the adaptive penalty approach itself
requires some parameters that are set by the user, this method still suffers from the
same parameter-dependence performance problem as all penalty function methods.
Research in this direction looks very promising and rewarding, but it is out of the
scope Of this work. For a detailed description Of the family of the adaptive and self-
adaptive penalties we refer to [42] and for examples of the adaptive penalty functions

and their applications we refer to [45,58].

3.2.3 Special Genetic Operators

Special genetic operators could be applied to preserve the feasibility of the population.
They are particularly important for problems where ﬁnding even one feasible solution
is problematic. These operators are heavily utilized in the GENOCOP framework
(GEnetic algorithm for Numerical Optimization for COnstrained Problems) invented
and developed by Michalewicz. They are used to handle constraints in cases where
the feasible space F is convex [131]. The first versions were capable of processing
linear constraints only; later versions used co—evolutionary techniques and a repairing
method to extend the method to process non-convex spaces as well I 132,134]. We note
that the task of designing such operators is very problem dependent and might not
be Solvable for the general feasible set F deﬁned by generally nonlinear inequalities

(1.3.14).

138

3.2.4 Selection

Since changes in an objective function produced by penalty functions methods di-
rectly affect the ﬁtness of the individuals, they also implicitly affect the selection
process. For the penalty function methods, we refer to section 3.2.2 and here we de-
scribe only methods that modify the selection process explicitly. Methods Of selection
modification avoid the weakness of penalty function methods, which is a. sensitivity
of their performance on the choice Of the penalty functions and various parameters of
the method. An example of the. most straightforward approach of deﬁning a selection
for constrained problems can be found in the method of Powell and Skolnick [157].
They suggest using the heuristic selection rule: “every feasible solution is better than
any unfeasible.” However, for problems with small p factor, the algorithm is Often
trapped in an unfeasible solution [133] and its behaviour is similar to the behaviour
of the killing method described in section 3.2.1.

Coello [43] suggests splitting the population into a number of sub-populations
equal to the number of constraints plus one and then performing the selection as
follows: in each of the sub-populations, selection is based on the corresponding con-
straint violation; in the last one it is based on the objective function value. This
approach is reported to produce good results on several test problems albeit with a
Somewhat slow convergence rate.

An interesting algorithm to compare individuals was proposed by Jiménes and
Verdegay [94]. Selection in their method is performed via a series of deterministic

binary tournaments with the winner determined based on the following set of rules:

1. If both members are feasible, selection is performed on the objective function

value.

139

2. If one member is feasible and another one is unfeasible, select the feasible mem-

her.

3. If both members x and y are unfeasible, select the one with the smaller maxi-

mum constraint violation:

{: argmin . max hJ-(x), . max hj(y)
]=l,...,71 y=1,...,n

This method strongly favors feasible points Over unfeasible ones, thus might be omit-
ting useful information about the problem provided by unfeasible members. Hence it
might get trapped in a wrong part of the feasible space, i.e. far from the feasible min-
imum. However, the method may be useful if the feasible space is heavily constrained
and/or relatively small.

An extension of this technique, inspired by the concept of the non-dominance from
Game Theory and multi-objective optimization and based on so called Niched-Pareto
Genetic Algorithm [88] for multi-objective unconstrained Optimization, was suggested
by Coello and Mezura [41]. Suppose we are given a multi-objective Optimization

problem: to ﬁnd x* E S such that

x* = arg min F(x), (3.2.26)
xES

Where F(x) = (f1(x), f2(x), . . . , fN(x))T.

The deﬁnition Of the minimum for the unconstrained multi-objective optimization
PFOblem is not trivial since, generally, objective functions f,- (x) attain minimal values
at. different points. Therefore the problem is not ﬁnding a vector that minimizes
all objective functions simultaneously but rather ﬁnding a set of all Pareto optimal
DOints in a search space [151]. To deﬁne Pareto optimality for the problem (3.2.26)

“'9 first deﬁne the concept of Pareto dominance. Vector x is said to dominate vector

140

y in the Pareto sense if it is not worse than y in all Objective functions values
F1003 nty). (=1.....N

and is strictly better in at least one objective function value, i.e.:

3i; g(x) < F[(y).

Basically, this deﬁnition says that any of the objective function’s values at x could
not be improved by any y from the search space without. worsening another objective
function value. Then the point x* is called Pareto optimal (or non-dominated) for
the problem (3.2.26) if it is not dominated by any other point x E S, i.e. if there

* in the search space.

does not exist a better compromise for the x
Considering penalty functions of constraints as additional objectives to minimize
and thus transforming a single—objective constrained minimization problem into a

multi-Objective unconstrained problem (3.2.1), they propose to use this definition to

modify the selection rules as follows:

1. If both members are feasible, selection is performed on the objective function

value.

2. If one member is feasible and another one is unfeasible, select the feasible mem—

ber.

3. If both are unfeasible, one is dominated, and another one is non-dominated,

select the non-dominated member.

4. If both are unfeasible and dominated or both are unfeasible and non-dominated,

select the one with the minimal constraint violation.

141

As could be seen, the first two rules remained intact and the last rule is changed
using the deﬁnition of non-dominance. This approach is demonstrated to be useful
and computationally efficient on various test problems. Based on this method and
the concept of non-dominance, Oyama, Shimoyama and Fujii developed an extension

also applicable to multi-Objective constrained optimization problems [149].

3.2.5 Repairing

Repair algorithms are based on the idea of “repairing” the unfeasible members of
the population to make them feasible and then either using the repaired version to
evaluate the ﬁtness of the original member or to replace it altogether. Particularly
popular in combinatorial optimization problems, they are not widely used in non-
linear numerical optimization problems. The GENOCOP algorithm, mentioned in
section 3.2.3, employs repair algorithms [132]. During the search process it maintains
two separate populations. The ﬁrst one is kept feasible against linear constraints via
specialized genetic operators and is used for the search of an Optimum. The second
population is used for constraint satisfaction and consists only of fully feasible mem-
bers. These fully feasible members are used to repair linearly feasible members of the
ﬁrst population before ﬁtness calculation.

In principle, any optimization algorithm (see section 2.1) can be used for repairing.
Since these methods put high pressure on keeping population feasible, they might be
particularly useful for heavily constrained problems, problems with small feasible
space and problems where constraint satisfaction is critical like engineering problems
or veriﬁed optimization (see section 2.2). The disadvantages of these methods are
that they increase computational cost and might actually harm the search process, if

repair algorithm is not tuned for a problem by turning potentially useful unfeasible

142

members into useless feasible members (see examples of useful unfeasible members on

Figure 3.1, section 1.3.3).

3.2.6 Other Methods

Handling constraints in some pre-deﬁned order was suggested by Schoenauer and
Xanthakis [166]. Their method starts with a randomly generated population and
evolves it to minimize the violation of j-th constraint on j-th step, i.e. its objective

function on j-th step is

gal-(x) : Pn(hj(x)). (3.2.27)

The population from step (j — 1) is used as an initial population for step j. Points
that do not satisfy the first (j - 1) constraints are eliminated during the j-th step.
The search process st0ps when a certain threshold on a number of members of the
population that satisfy the current constraint is reached. On the last step, the method
Optimizes the objective function itself. This algorithm was primarily designed for
engineering problems where the search space is small and sparse and shows reasonable
performance in these cases. However, it poses a problem of selecting a particular order
of constraints. It is reported that different orders produce different results in terms
of the run time and precision [133].

Koziel and Michalewicz [108] proposed and reported successful application of the
interesting method based on establishing a homomorphous mapping between a gen-
erally nonlinear, non-convex feasible search space and a v-dimensional cube that is
convex, linear and is much easier to optimize with EA. Using this method, an,original
constrained problem is transformed into a topologically equivalent but simpler uncon-
strained problem. This method, apart from being elegant and efﬁcient, suffers from

the complexity of the transformation in general case and the additional computational

143

expense it requires.

The co—evolutionary model [8] constitutes another popular direction Of constrained
optimization with EA. As was mentioned earlier, some algorithms are maintaining
several populations in which evolution is performed based on various criteria. Some-
times different flavours of EA are used. GENOCOP (see section 3.2.3), for example,
maintains two populations, one with members satisfying linear constraints and an-
other one with members satisfying all constraints.

Other approaches include using Immune System Emulation [44,84], Cultural Al-
gorithms I40], Ant Colony Optimization I28], and many others. However, there is no
universally superior constrained optimization method found yet; most of the described
methods are applicable to different problems with reasonable average performance yet
they perform particularly well on some speciﬁc problem or a class of the problems.
There is a variety of other methods that are unique or combine several approaches
and thus fall off our classiﬁcation. For thees methods we refer to the references in the
beginning of the section. Spreading interest in the ﬁeld and active contributions from
many researchers, have increased their number rapidly. We present our own approach

to constrained Optimization with BA in section 3.3.

3.3 The REPA Constrained Optimization Method

3. 3.1 Introduction

AS already noted in section 1.3.3, constrained Optimization problems (1.3.8) form
an important class of all optimization problems (1.3.4), which is the reason for the
existence of a large number of constrained optimization methods for Evolutionary Al-

gorithms (see section 3.2). These methods are usually divided into several subclasses

144

by the main approach employed to handle constraints in originally unconstrained
EA: killing, penalty functions, special genetic operators, selection modiﬁcation, re-
pairing unfeasible individuals and others. Each of these approaches has advantages
and examples of successful applications as well as disadvantages demonstrated in test
cases. It was also noted that existence of the universally best constrained optimiza-
tion method for EA is not very probable due to the “No Free Lunch Theorem for
Search and Optimization” (see section 2.1.4) [135].

However, non-existence of the best method does not eliminate the possibility to
design better general methods. Since most of the time Optimization methods are
applied to a particular problem, there is also a deﬁnite need for new methods that
perform better on this problem or class of the problems even if they fail on some other
problems. Also, since the repair approach to constraints handling was successfully
applied to a combinatorial optimization with EA but is not very pOpular in numeric
mathematical programming, it was worth exploring. Our main motivations were to
create a method which would be useful for constrained veriﬁed global optimization (see
section 2.2 for global Optimization and section 2.3.6 for the description of the scenario
Of integration Of our EA with COS Y-GO global Optimization package) and to use for
the problems of constrained optimization in accelerator physics where constraints
are often imposed by physical limitations and thus must not be violated. For these
purposes, the main requirement is not to produce the Optimal value but rather to
produce good value to serve as an excellent cutoff update as fast as possible. For
constrained veriﬁed global optimization the result must be feasible, otherwise it is
useless.

Also note that while the found value is feasible but not optimal, it could still be

a good cutoff value. Most of the methods described in section 3.2 are constructing

145

feasible members by generating the initial population and then performing stochastic
mutation and crossover. Most of the methods, while shown to be efficient given
enough time and good at preserving diversity in the population, might not put enough
pressure on constraints satisfaction to work in conjunction with COS Y-GO. The
repair type of the constrained satisfaction methods seems the most promising for our
purpose. Finally, this method might also get useful if the evaluation of the Objective
function is much more expensive than evaluation Of constraints, which, nevertheless,
have to be satisﬁed since it does not require objective function evaluations in order

to perform repairing. The approach suggested in this work is Of this repair type and

is called REPair Algorithm (REPA).

3.3.2 REFIND: REpair by Feasible INDividual

The REPair Algorithm (REPA) consists of two repairing techniques working together
in order to transform an unfeasible member of the population into a feasible one and
then replacing it in the population with the result if repair was successful. These two
techniques are called REpair by Feasible INDividual (REFIND) and REpair by PRO-
jecting through OPTimization (REPROPT). The first one was originally suggested
by Michalewicz, then implemented in his GENOCOP package [132] and then adapted
and extended by us. The idea of the method is to use already feasible members Of
the population to repair unfeasible ones by searching for a replacement along the line
connecting an unfeasible member Xu 6 S \ F with the feasible member xf E F. In
most cases there is at least some neighborhood of xf that. belongs to F or the line
might cross F in several places. Hence, such searches have high chances of producing
a replacement feasible member that is not xf, thus not trading off the diversity of

the population. The algorithm for the REFIND is presented on the Figure 3.4. Here

146

P is a penalty function Of the type (3.2.9) that also includes penalties for leaving the
search space S'. The search for a feasible point along the line is performed via one
of the COSY Inﬁnity built-in optimization algorithms (see section 2.1) starting from
the pre—deﬁned initial parameter value A0. The method has the following parameters

that are to be set by user:

0 The feasible member search algorithm: currently the search is sequential and
is is performed until the maximum allowed number of feasible members (5 in
this work) is found or the last member of the pOpulation is checked. At this
point the one that is closest in a sense of the Euclidean distance to the repaired

member is selected.

0 The algorithm to search for the repair candidate: the search is performed along
the line (thus only one parameter is ﬁtted) that includes at least one feasible
vector xf. Hence, the probability of ﬁnding a repair in this case is high (we
are not considering the case of A = 1 since it would produce a duplicate of
the x f and thus reduce diversity). In order to keep some randomness of the
result we currently employ the ANNEALING optimization algorithm (see sec~
tion 2.1). The initial value A0 is common for all members in all generations
and quite possibly influences the results as well (see notes on the dependence
of the iterative, single-point search methods on the initial point in section 2.1).
We suggest avoiding A0 = 0 since it corresponds to xu and we know that it
is unfeasible in advance. On the other hand, values of A0 close to 1 make the
initial point for the search closer to the xf, which is known to be feasible, so
the search might stop prematurely and the result might be too close to the
x f which also unnecessarily decrease the diversity of the population. Thus, by

choosing A0 6 (0,05) we increase the probability of ﬁnding a feasible member

147

faster. In this work, A0 = 0.1 is used. Tolerance to the ﬁnal value and maximum
number of steps allowed inﬂuence the quality of a result and execution time.
We used a tolerance of 0, since the expectation of ﬁnding a feasible member by
this method is high and the maximum number Of steps equal to 15 since we
are dealing with a 1-dimensional optimization problem so the search should not
take a lot of steps. We also do not want to set up the maximum allowed number
of steps to a higher value to keep the repair method reasonably computationally

expensive.

The penalty function method: deﬁnes penalty function to minimize in order
to ﬁnd a feasible vector. For the description of the penalty function methods
see section 3.2.2. In this work, we employed quadratic penalty functions of the
power family (3.2.10) and combining function (3.2.4) with wj = 1,j 2 1,. . . ,n,
wn+1 = 0 since we are minimizing penalty only. It must be noted, however, that
the search algorithm and penalty function method should generally be selected
together. For example, some algorithms might be able to solve multi-objective

penalized problems (3.2.1) better than single-objective combined problems.

3.3.3 REPROPT: REpair by PRojecting through OPTimiza—

tion

The second method used to repair individuals is REPROPT. Its main idea is to

perform a projection of the unfeasible member to the feasible set by Optimizing the

penalty function via some single-point iterative method (see section 2.1). The un-

feasible point serves as an initial value for the optimizer. Note that the projection

here means a possibly existing element in the feasible set F that could be found via

148

 

 

Find feasible individuals from the current population
R: {xf,1vxf,21""xf,N}

If at least one feasible individual is found
Find xf E R such that d(xf,xu) = mianR d(x,xu)
Search for a feasible point along the line connecting xu and x f by I

 

solving the Optimization problem X“ = arg minA P(xu(1 — A) + Axf), j
where I) is a penalty function for constraints violation
If the resulting penalty is within tolerance

Repair succeeded, return x = x-u(1 — X“) + Xfxf
Else

Repair failed
End if

Else
Repair failed

End if

 

 

 

Figure 3.4: REFIND: REpair by Feasible INDividual algorithm

the optimization process; hence it depends on the method and method parameters.
IVIoreover, if the method is stochastic (for example, Simulated Annealing), the results
of the projection are not unique.

This method follows the same logic as REFIND in Figure 3.4. However, for RE-
PROPT, there is no feasible member in the population thus there is no parametriza-
tion of the line between xf and xu as in REFIND, hence instead of minimizing the
penalty function by changing one parameter /\ (1-dimensional problem), we perform
minimization by changing all coordinates (or a subset) of the repair candidate (multi—
dimensional problem). This, of course, increases the complexity of the problem (see
also section 2.3.6 for discussion of the increased complexity of the multi-objective
0Dtimization), since the direction towards a feasible set is generally not known. Thus
for high-dimensional problems, this method might be inefficient and might have to be

replaced with some other strategy, for example, killing. Some other strategy to find at

149

least one feasible member for REFIND can also be employed. Another possibility is
to use quasi-projection, i.e. to project using a relatively large penalty tolerance. Then
the points successfully projected by REFIND, would reside in some neighborhood of
F but not necessarily inside F itself.

Parameters of this algorithm are the choice of the penalty functions method, opti-
mizer for projection, penalty satisfaction tolerance, and a maximum number of steps
allowed. Parameters for REFIND and REPROPT can be tuned separately. In this
work we used the LMDIF Optimization algorithm (see section 2.1) and transforma-
tion of a single objective constrained problem into a multi-objective problem (3.2.1)
via power penalties (3.2.10) (see section 3.3.5 for the justiﬁcation of the choice) with
penalty tolerance 1_5 and the maximum number of steps allowed, 50. Note, that.
since LMDIF is capable of handling multi-objective optimization problems, we trans-
form our single-Objective constrained problems into a multi-objective unconstrained

problem (3.2.1).

3.3.4 REPA: REPair Algorithm

A simpliﬁed demonstration of the possible results of the application of those two
algorithms to the example initial population from Figure 3.1 is presented in Figure 3.5.
Here, points 6 and g are repaired by the REFIND algorithm, and a is repaired by the
REPROPT. Dotted lines represent optimization paths and primed points represent
possible repair candidates.

For the unfeasible point e, the closest feasible point is f so the successful repair
candidates 6’ and e” are located on the line connecting e and f, 6’ corresponds to
N“ < 1 and 8” corresponds to X" > 1. For the unfeasible point 9, the closest feasible

point is h. Even though h lies on the boundary of F, the repair candidates 9’, g”,

150

9'”, and 9”” lie inside F. Note that the line connecting g and )2 leaves and enters

F. This demonstrates that repair candidates obtained via some feasible point might

not necessarily be close to it. thus the diversity is preserved. Here 9’ corresponds to

”II

A* < (l. 9” corresponds to 0 < A < 1, and g”, and 9 correspond to A > 1.

 

S

 

 

 

 

Figure 3.5: An example of the repairs performed by REFIND and REPROPT repair
methods (F is large compared to S)

Even though there are feasible members in the population, point a is shown re-
paired by the REPROPT method as a demonstration. It. can be seen that. the repair
candidate (1’ obtained via some iterative Optimization method is not the closest feasi-
ble point from F that. we can obtain knowing the structure of F. However, since the
structure of F is complex and it. is given as a set. of nonlinear equalities and inequal-
ities. such projection might. not be feasible to perform. Also note that the feasible

’ is in some sense “worse” than the unfeasible point. a. itself since it

repair candidate a.
is farther from the feasible minimum. To avoid this, we can include the original ob-

jective to optimization done by REPROPT but generally this should be decided on a

per-problem basis, since, as we noted, the objective function itself might be expensive
to calculate. Also it is worth noting that here we demonstrate only successful repairs
and the volume of the example F compared to the volume of S is relatively large. If
we start from the population from Figure 3.5 but reduce the feasible space as shown
in Figure 3.6, we notice that there are no feasible members in the population to use
for REFIND and that. it. is much more difficult for REPROPT to produce a feasible
repair candidate. Thus. only (1, which was already close to F. is repaired to all while

h’. (L'. and c, are considered repair failures.

 

S
s3 '
I .0
9‘ ~Qh \
9 I ‘0 a'
O I O -
O-h- .o_ o .. .b
O f .
P
. ‘dl a p‘ \ .
\o I o C
0 ..C-6
0
0
.e
. O

 

 

 

Figure 3.6: An example of the repairs performed by the REFIND and REPROPT
repair methods (F is small compared to S)

The REPA algorithm uses both of these methods to perfm'm repairs of an unfea—
sible member as shown in Figure 3.7 (see (2.3.9) for the definition of rand). The user
can control the percentage of the population repaired and the required penalty toler-
ance. Note that this REPA does not. change the (‘ilfijective function nor does it change

the selection process, thus additional modifications to the EA might be needed. In

this work we used the EA from section 2.3 and the penalty functions method (3.2.2)

with a linear combining function (3.2.4) (11." : 1, Vj) and quadratic power penalties

1
(3.2.10).
An Additional beneﬁt of the REPA method is that it is only applied if the mem—

bers of the population are unfeasible. As long as the population remains feasible, it

produces no effect on the optimization.

 

 

If combined penalty > penalty tolerance
If rand[0,1] < percent repaired
If succeeded x = REFIND(xu)
Repair succeeded, replace xu in population with x
Else
If succeeded x = REPROPT(xu)
Repair succeeded, replacetxu in population with x
Else
Repair failed
End if
End if
Else
Repair skipped
End if
Else
Repair not needed
End if

 

Figure 3.7: REPA algorithm

 

3.3.5 Studies on Constraint Projection by Standard COSY
Inﬁnity Optimizers
Introduction

Both the REFIND and REPROPT methods have certain parameters to be set by the
user. Both use one of the built-in COSY Inﬁnity optimizers (SIMPLEX, LMDIF,
ANNEALING, all described in section 2.1). The choice of ANNEALING as the
Optimization method for REFIND was justiﬁed by two reasons: ﬁrst, the task is one-
dimensional, thus relatively simple, and is solvable with high probability, as explained
in section 3.3.2 so all search methods work efficiently, and, second we wanted to add
randomness to the process so that there is no pattern in the results of the repair such
as a repair candidate being close to the boundary of F or close to the feasible point
used for repair. Therefore we selected ANNEALING as the “most random” of all three
methods. The REPROPT method, however, deals with much harder v-dimensional
optimization problems, hence the question of the best optimization algorithm it is not
solvable as easily. Although generally this selection should be made for the problem
(see section 2.1.4), we investigated this question on the set of test problems commonly
used in the ﬁeld of constrained EA Optimization [135] (see Appendix D).

The parameters of REPROPT include the penalty functions method, the algo-
rithm used for projection, penalty satisfaction tolerance and the maximum number
of steps allowed. To determine good default values of these parameters, we studied
the performance of this method on a standard set of test problems for constrained op-
timization with Evolutionary Algorithms [128,135] (see section 2.1). Built-in COSY
Inﬁnity unconstrained optimizers (see section 2.1) and their combinations were used

for this purpose. The choice of the optimization methods is based on their long-term

reputation of being versatile, robust and efﬁcient tools (see section 2.1). They are
frequently used by many nonlinear optimization packages and are readily available in
the COS Y Inﬁnity system where GATool (see section 2.3) is implemented. Different
methods to construct penalty functions and formulate the unconstrained optimiza-
tion problem from the constrained problem by their means (see section 3.2.2) were
explored. The transformations and conventions applied to all the test problems and

the general setup of the tests are described in the next section.

Methodology

All the equality constraints of the type (1.3.2) were converted into the equivalent
inequality constraints (1.3.3) using the transformation (1.3.12) so that the feasible
set. is given by (1.3.1-4).

All constraints in the test; set were known to be satisﬁable. Since we were not
interested in the global minima of the constraint functions but rather in the simul-
taneous satisfaction of all constraints, the set of constraint functions was converted
to a set of penalties using power penalties (3.2.10) with a = 0, 1, and 2. Using the
property (3.2.9), which the power penalty functions satisfy, the problem of projecting
point x0 onto F via a chosen optimizer can be formulated as follows: using x0 as a

starting value, ﬁnd Xf such that

Pi(}z.,j(xf)) = Egg Pifhz'fx» = 0, 2' = 1... .,n. (3.3.1)

Such xf is then automatically feasible. Note that this method is equivalent to the ap-
proach (3.2.1) that allows a conversion of a single-objective constrained Optimization
problem to a multi—objective unconstrained problem via penalty functions. The differ-
ence from it is that in our case we do not have an Objective function to minimize. Note

also, that in (3.3.1) we can be satisﬁed with non-zero penalty values if they are within

the desired tolerance from zero (e.g. because of the practical considerations). This is
particularly applicable to the converted equality constraints because they might be
non-zero simply due to the limited precision of the ﬂoating-point arithmetic.

Three types of the objective functions were tested:

0 all combined: a multi-(‘ibjective problem (3.3.1) was converted into a single-

objective problem (3.2.4) with all “'i = 1.

o equality combined 1 inequality combined: a multi-objective problem (3.3.1) was
converted into a two-objective optimization problem with inequality constraints
and equality constraints (transformed to inequality constraints using (1.3.12)
but still more difﬁcult. to satisfy than the true inequalities) converted into two
separate Objective functions using the method from all combined approach.
This distinction was made because the equality constraints are usually harder
to satisfy thus they might require more severe penalties in order to be satisﬁed

by optimizer.

0 separate: a multi-objective optimization problem (3.3.1) was treated as is. It
must be noted, however, that for ANNEALING and SIMPLEX methods, the
problem was internally converted into a single-Objective optimization problem of
optimizing the sum of the squares of the objective functions, i.e. this formulation
is equivalent to the all combined method for a = 2. LMDIF has the ability to

solve multidimensional problems directly.

The following abbreviations for the search methods are used: S -— SIMPLEX,
L — LMDIF, A — ANNEALING Optimization methods. Combined methods were
implemented by making several steps using one method and then making several

steps using another method with the idea to combine the strengths of both methods

and compensate for their weaknesses. Combinations of methods and their respective
abbreviations are: S+A ——- SIMPLEX + ANNEALING, S+L —— SIMPLEX + LMDIF,
L+A — LMDIF + ANNEALING.

Each combination of the penalty function (a = 0,1.2, selected separately for
equality and inequality constraints) and the optimization problem formulation (all
combined, equality combined + inequality combined, separate) was tested on each
of the simple (S, L, A) and combined (S+A, S+L, L+A) methods. For the prob-
lems without equality constraints, Optimization problem formulations all combined
and equality combined l inequality wmbined are equivalent, hence only all combined
was tested. For problems with only one constraint all formulations of optimization
problems are equivalent.

Special abbreviation for each variant of the problem formulation and optimization
strategy is employed. The description starts with the abbreviation of the optimiza-
tion method (S, L, A, Si A, S+L, L+ A) followed by the type of the penalty function
used for the constraints, enclosed by parentheses. For problems with equality and
inequality constraints, both types are separated by a comma and the ﬁrst type cor-
responds to equality constraints. The following types are used: 1 for power 0, z
for power 1, and z2 for the power 2. For problems with inequality or equality con-
straints only, one type denotes the type of the penalty used for the corresponding
constraints. For optimization problems of the all combined type “:c” is added after
the method abbreviation before parenthesis. For problems with both equality and
inequality constraints type equality cmnbined r inequality combined is also marked

9'.

with “:c”, types of the penalties are separated by “+ instead of a comma. Examples:
S+L:c(22) denotes SIMPLEX+LMDIF combined method, problem with inequality

constraints only, all combined objective function, and a penalty power is 2. L(z)

157

denotes LMDIF method, separate objective functions, power of the penalty equals
1. L+A:c(z + 2:2) denotes combined LMDIF+ANNEALING optimization method,
equality combined -/ inequality combined Optimization problem with penalty power 1
for equality constraints and 2 for inequality constraints.

Test problems were constructed by taking constraints from the standard con-
strained optimization test bench for EAs [128,135] (see Appendix D). Since it mostly
consists of the inequality-only constrained problems, a simple 2-dimensional problem
(3.3.2) with one equality and four inequality constraints was suggested by Dr. Martin

Berz [27].

gl(x) : a“? + 1'3 —- 1.12 = 0

h1(x) 2:131 — 1 g 0
h2(x) = —:l’1 — 1 S 0 (3.3.2)
h3(x)::r2—ISO

h4(x) = —.7:2——1S0

Initial points were generated randomly from the uniform distribution over S =
[~100.100]v and S = [—1000,1000]v. The total number of the different points tested
for each combination was 1000.

For all these methods the maximum number of steps was 1000 and the precision
was 10‘5. For the combined methods, the maximum number of steps with the ﬁrst
and second methods in one step of the combined algorithm was 10, the total number
of steps was counted by summing steps made by both methods, and the maximum was
set to 1000. The projection was considered successful if all the objective functions were
within the tolerance from the global minimum of zero. The projection was considered

failed if the desired tolerance was not reached and the method either converged or

158

reached the maximum allowed number of steps.

Results and Conclusions

Using all those conventions a series of tests was performed. Outcome of those tests
is summarized in the tables, where for every combination of the method, the penalty
functions and the objective function construction method, the percentage of the suc-
cessful runs and the average number of steps including the failed runs are listed. For
each problem a set of the tables similar to the Tables 3.1, 3.2, 3.3, and 3.4 for the
problem (3.3.2) is constructed. The best methods in terms of the number of the
successful runs are listed in these tables in boldface, the number of the steps of
these best methods is highlighted using the same style. Headers of the columns show
powers of the penalty functions as described in methodology.

For each method three rows contain the results for the all combined, equality com-
bined / inequality combined and separate objective function construction methods.
In the cases where there are no equality constraints or no inequality constraints, the
equality combined .r' inequality combined method is equivalent to the all combined
hence it was not tested. Therefore the number of the rows for each method in this
case is two. The problems G03 and G11 have one constraint each, hence the number
of rows in these cases is one.

For the sake of space we do not. list all tables for all the cases. Results of all test
can be found in [156]. Here we list the results for the constraints set (3.3.2) since
this problem has both an equality constraint and several inequality constraints (see
Tables 3.1, 3.2, 3.3, and 3.4). We also list. the results for the problem G03 in Figure
D.3) because it has only one constraint (see Tables 3.5, 3.6, 3.7, and 3.8) and for

the problem G07 in Figure D.7 because it has the largest number of constraints (see

159

Tables 3.9, 3.10, 3.11, and 3.12).

Results of all tests are summarized in two performance tables: Table 3.13 and Ta-
ble 3.14 [156]. For every problem the three best approaches to constraint satisfaction
are listed. Different. tables correspond to the different initial point sampling ranges.
The comparison is based on the percentage of the successful runs and the average
number of the steps made during the search process (including failed ones).

From those tables it could be clearly seen that the optimal approach to constraint

satisfaction on the selected set of problems is:
o optimizer: LMDIF

o objective function type: separate, i.e. penalties for individual constraints are

treated as separate objectives in a multi-objective optimization problem (3.2.1)

0 power for the penalty function: a = 1 for both equality and inequality con-

straints

This approach is the ﬁrst best for problems G01, G02, G04, G06, G07 and G11,
second best for G00 and “tens" and the third best for G08, G13 and “vess” (see
Appendix D). The combined LMDIF+ANNEALING search method used with the
same penalty function and the objective function type is the second best. approach
with a slightly larger number of steps. However, for some problems (G03, G13), it
demonstrated signiﬁcantly better performance; and, for most of them it does not.
perform signiﬁcantly worse than the leader. We believe that this is caused by the fact
that the random and very heuristic ANNEALING method helps the deterministic and
analytic LMDIF method to avoid getting stuck on difﬁcult landscapes in the search
space of the complicated problems. We also believe that. a good performance of the

next best SIMPLEX-+LMDIF combined method is also due to the LMDIF while the

l. 60

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

8.8 8.8: 8.8 8.8: 8.8: 8.8 8.8 8.8 88 .
8.8 8.8 8.8 8.2: 8.8: 8.8 8.8 8.8 8.8 zz<+8§
8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8
8.8 8.8 8.8 88 8.8 8.8 8.8 8.8 8.8
8.8 8.8 8.8 88 8.8 8.8 8.8 8.8 8.8 22448.8
8.: 8.8 8.8 88 8.8 8.8 8.8 8.8 8.8
8.8: 8.8: 8.8 8.8: 8.2: 8.8 8.8 8.8 8.8
8.2: 8.8 8.8 8.2: 8.2: 8.8 8.8 8.8 8.8 8.888
8.8 8.2: 8.8 8.8 8.8 8.8 8.8 8.8 8.8
8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8 .
88 8.8 8.8 88 88 8.8 8.8 8.8 8.8 02.88224
8.8 8.8 8.8 8.8 88 8.8 8.8 88 8.8
8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8
8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8 88.5
8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8 88
8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8
8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8 8.8 888.8
8.8 8.8 8.8 8.8. 8.8 8.8 8.8 8.8 8.8
mam: 8%. am; ms.” 6 as _ 1N Fm”: 3H3 882
mmGQUng

 

 

161

88202 5 88m: 88 8:88 8: mo muggﬁma .8: ..o 8:8... 5 £80me 88m
8558: #5555888 :8an 5888::ch 888:8. 88 8:388 8338:.“ l- 8:388 8.83.88 «8:388 28 8: 8 8888
-.So @058: :28 :8 8.6: $25. .maﬁabmccu 328685 .83: .58 «Ed “588:8 5:853 885:0: mac .8: :8an .m t a. 8th
.333 53.; an 8an 828:: cos :0 3.03 80 EmEci 2: 88 53835 85.8850 83 .6 3,8 883m "H.” 2an

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

can 9.: can va Nw HoH 9:: GE: oocH

Nwm HE. bHN NMN 05H 2: OCH: OCH: cocH ZZ<+QEQ
mun Em man mwm ova w H H H52 H53 83

wow mmv ch H53 83 25H :9: GE: oooH .

o? wmv Hmv 83 2:: GE: c9: 83 oooH 224 + m Em
w? mmv wmv 82 GE: 2:: COCH Ex: 83

©NH mHH QNH HON on wvm OCH: :2: COOH

th mam mNH Hun ovH wvm OCH: OCH: OCH: DEAanﬁm
mm H w H H mm H com m9V omm ccoH GE: 25H

OCH: 2:: 83 83 GE: H53 08H 2:: CASH .
:2: 2:: 83 83 2:: cos 92: cos 9:: OZS4QZZ4
CE: 2:: OCH: 83 GE: OCH: OCH: CE: 02:

m3 mm m: mw on on mm 3 w

at new NHH Ha an mm m.» mm m ED 24
mm: me 3: mu 3 m: 3.. mm m

3..“ mmm mmm mmv mmv mmv v v v

awn awn mmm mmv mmv mmw w v v XMALZE
mmm mmm mom va mew wmm cw .5 w.

n. m n. m H. N NLN Nd H5. ”.H m.H H.H

N: “Cr _ L r 8 8 .2...

3.88 we mm mg

 

 

.88 .81

Sea 850a E258 2:: cc 333$ 80 :53th 8: 88 88288.33 8:888:00 8: hc 858 go 828:: mmfeﬁw "a.” @368

162

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

ovﬁw odoﬁ chum odoH Odo“ 066a cod cod cod
8.3 oond 225$ 92: 92: omda cod cod ocd ZZ<+QS§
8.5 323 ondm oodn coda ovdmw cod cod cod
8.: 3.3. cméh 3d oﬁo omd cod co. cod . . .
oﬁao OWE. CNS. cvd omd and cod cod cod 27x. +n:zm
cmdn 3.3. 8.3 3.9 cmd cad cod cod cod

odo~ odoﬁ coda odoﬁ 92: 84% cod cod cod

odoH cdoﬁ owdm 92: 92: oméw cod cod cod Q24+m2m
22.3 92: oudm om.ww cmﬁo cwdm cod cod cod
cod cod cad cad cod cod cod cod cod .

cod cod ccd ccd ccd cod cod cod cod 0254m224.
cod ocd cc: cad ocd cod cod cod cod
cw.mm oeda 3.5 oﬁmm owdm oﬁmw ocd cod cod
ow.ww oodw emu; 3km cwdm :36 cod cod 8.: “:93
8.3 omdw Eng cmAw 36m cwhv cod cod cod
3.? oodv cog. cmdm coda coda cod ocd cod .
9.3 owdv cwdw cmdm ccdm ocdm cod cod ocd Kmamwﬁm
OWE. cad» owns» cwgm cm.wm Exam cod cod ocd

m.m N.N a...» nu... vim in n; NJ H;

. . m _ . g . . . r _ _ ......

mmcuSE K.

 

 

2.802 dog; Soc €on 83.5: 9:: :c 3.98 8.0 Eaﬁohm 2: .5.“ ceatmﬂoa $53?ch 2: ...: $5 mmcuczm an.” mEmH.

163

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

wan wHN Nam 2% ca: Mm; c9: coca 2::
www cog man an” Eh mm: cos 83 cos ZZ<+QS§
mom gm 5mm 59. mam 1.: c2: 83 cos
mum. hmm m H m oooﬁ coca cog o2: GE: 83 .
mam gm won 83 83 GE: o2: cog 83 224, + $25
mun 5m NF. coca coed coca GE: 83 coca
H: v3 v3 mmm om Em ass 83 cog .
3m m3“ «.3 mmm mm: EN 83 cos 2:: Q24+m2m
mm: m: mm: Sm mg Em oooﬁ cog c9:
83 o2: coca o2: 83 83 02: 83 DE:
02: cog o2: coca 83 cos cos 2:: 83 UZE<mZZ<
see 83 cog cog oooﬁ cog c9: 2:: can:
53 2: m2 a» 3» H2 mm 2 o
gm Km 2: v: mm 3: mm on w EOE;
mom new 3m HS mg t3 ww mm o
mom com com 5% ES. 5% w w v
com com com 5% 5% 5% v w v qumEHm
3m. mvm Sm mmv mam own mm 5 v
N. N m. n im win win in NA NA 7H

1 1 1. _ _ .r H E ......

38% Mo % was

 

 

182 .081

So: EECQ Ecccﬁ o2: co $.03 80 52935 2: p8 cosucaca 326.558 2: .3 327. .3 #55:: mmﬁoﬁx "v.” 23m?

164

Table 3.5: Success rate of the constraints projection for the Problem G03 (Fig—
ure D.3) on 1000 random points from [—100,100]v. Here 22 _, 10, problem has one
nonlinear equality constraint. One row for each method correspond to the separate
optimization problem formulation methods. Best methods in terms of the percentage
of the results are listed in boldface

 

 

 

 

 

 

 

 

 

 

 

 

% success

Method 1 I 2 F ;2
SIMPLEX 0.00 100.0 99.90
LMDIF 0.00 83.30 77.70
ANNEALING 0.00 0.00 0.00
SIMPLEX+LMDIF 0.00 100.0 100.0
SIMPLEX+ANNEALING 0.00 46.80 55.20
LMDIF+ANNEALING 0.00 100.0 100.0

 

Table 3.6: Average number of steps of the constraints projection for the Problem
G03 (Figure D.3) on 1000 random points from [—100,100]v

 

 

 

 

 

 

 

 

 

Method avg # or steps

1 I z I :2
SIMPLEX 7 258 257
LMDIF 9 338 430
ANNEALING 1000 1000 1000
SIMPLEX+LMDIF 1000 333 488
SIMPLEX+ANNEALING 1000 924 901
LMDIF+ANNEALING 1000 270 361

 

 

 

 

Table 3.7: Success rate. of the ('tonstraints projection for the Problem G03 (Figure
D3) on 1000 random points from {—1000 1000]"

 

 

 

 

 

 

 

 

 

 

 

 

Method "/r, success
1 I z 1 .22

SIMPLEX 0.00 99.50 99.50
LMDIF 0.00 69.40 68.80
ANNEALING 0.00 0.00 0.00
SIlV‘lPLEX+Ll\/'IDIF 0.00 99.90 100.0
SIMPLEX+ANNEALING 0.00 0.00 0.00
LMDIF+ANNEALING 0.00 100.0 100.0

 

165

Table 3.8: Average number of steps of the constraints projection for the Problem
G03 (Figure D.3) on 1000 random points from [-1000,1000]v

 

 

 

 

 

 

 

 

 

, avg #1 of steps
Method 1 ] z ] z2
SIMPLEX 7 343 343
LMDIF 9 475 526
ANNEALING 1000 1000 1000
SIMPLEX+LMDIF 1000 466 691
SIMPLEX—+ ANNEALING 1000 1000 1000
LMDIF+ANNEALING 1000 419 614

 

 

 

 

Table 3.9: Success rate of the constraints projection for the Problem G07 (Figure
D.7) on 1000 random points from {—100. 100]”. Here 1) — 10, problem has 8 inequality
constraints (3 linear, 5 nonlinear). Two rows for each method correspond to the All
combined and separate optimization problem formulation methods. Best. methods in
terms of the percentage of the results are listed in boldface

 

 

 

 

 

 

 

 

 

 

 

 

Method (7]: success
1 ] z ] :2
0.00 48.20 64.70
SIMPLEX 0.00 0.00 0.00
0.00 100.0 90.10
LMDIF 0.00 100.0 99.30
ANNEALING 8:33 8:88 8:88
, 0.00 0.00 0.00
SIMPLEX+LMDIF 0.00 000 0.00
, 0.00 0.10 0.00
SIMPLEX+ANNEALING 0.00 000 0,00
0.00 0.00 0.00
LMDIF+ANNEALING 0.00 0.00 0,00

 

166

Table 3.10: Average number of steps of the constraints projection for the Problem
G07 (Figure D.7) on 1000 random points from [—100,100]v

 

 

 

 

 

 

 

 

 

 

 

 

Method avg # 0f steps

1 [ z ] 22
SIMPLEX 88 723 7:831
LMDIF if} 888 828
ANNEALING i888 i888 i888
SIMPLEX i LMDIF 8888 8888 8888
SIMPLEX é ANNEALING 8888 8888 8888
LMDIF + ANNEALING 8888 8888 8888

 

Table 3.11: Success rate of the constraints projection for the Problem G07 (Figure
D.7) on 1000 random points from [—1000,1000]v

 

 

 

 

 

 

 

 

 

 

 

 

Method % success
1 ] z ] 22
, 0.00 25.20 40.40
SIMPLEX 0.00 0.00 0.00
0.00 99.80 92.90
LMDIF 0.00 100.0 97.20
ANNEALING 8:88 8288 8288
SIMPLEX+ LMDIF 8:88 888 888
, , . , 0.00 0.00 0.00
SIMPLEMANNEALING 0.00 0.00 0.00
LMDIF + ANNEALING 888 888 888

 

Table 3.12: Average number of steps of the constraints projection for the Problem
G07 (Figure D.7) on 1000 random points from [—1000,1000]v

 

 

 

 

 

 

 

 

 

 

 

 

Method avg # 0f steps

1 ] z ] .22
SIMPLEX 88 988 888
LMDIF ii 888 888
ANNEALING i888 i888 i888
SIMPLEX+LMDIF 8888 8888 8888
SIMPLEX+ANNEALING 8888 8888 8888
LMDIF+ANNEALING }888 8888 8888

 

heuristic SIMPLEX method helps LMDIF‘to not get stuck. Therefore we consider
LMDIF (possibly paired with heuristic “helper method”) as the best selection for the
constraint satisfaction on the presented set of problems. ANNEALING method alone
demonstrated the worse results and SIMPLEX showed generally average performance.

Considering “No Free Lunch Theorems for Search and Optimization” (see section
2.1.4) such a superior performance of one optimization method over others can be
explained since it uses the largest amount of the information about the problem to
guide the search process. While SIMPLEX and ANNEALING are purely heuristic
methods and do not use any information about a problem apart from the function
values, LMDIF uses both the ﬁrst derivative and approximation of the second deriva-
tive [77] to determine the direction to the minimum. As one can see in Appendix
D, most of the constraints in the presented set of the problems are given in a form
of nice, twice continuously differentiable functions. Hence it is possible to use this
available information in order to run the specialized method. We speculate that for

general constraint functions LMDIF would not. work so well do not, in terms of the

168

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

«8 EN Amaefm 8m 25 4...: 8m mam Am”: Es
8 2% a: 2: 95 3?; as 35 £33,; as,
as. 93 34 Rm ”.8 Sim an 23 A... <3 20
2m 3:: $3.7m m2 :2: $.14 m2 92: 34+m So
on $8 34.3 on 92: 34+m om 92: A”: :0
a? is Eﬂi an 3: $3.76 own as Am”: 20
mg 9% 343 2m 93 €3.33 Rm :5 Nmmvam 80
8 0% a... E :2: 3.7% .8 3:: 3T; 80
- - - N3“ 2a ANN: NS :2: 3; Sc
z: 92: 3.7 m 5 25 Ans: mm 26 3.. 80
- - - - - - as Z: n + ms: 80
cm 3:: Am” 4 mm 3:: 3.1g 2 :2: 3g 30
an ode 34-. m EN 32 Efi mg :2: 3m 80
g 9.3 ﬁver m 3 Em @%é.+m 8 3m 3: 80
EN 92: Am”: a 5% ES 3 :2: a: 80
E 92: s .wgtm cm. :3 w .....3 mm 3:: NM .914 80
88m ﬁ Him as _ @222: macaw _ 82m caH ~55me 88m E 85 c5 ﬂ $052: .

H: E H EcEcE

 

 

 

team: He: 8e ﬂ no.“

£52.38 amen H: can E 85; .32 c8 Ex new :8an 2: “8 $55 mmooosm .AQ ﬁccoaasq. away 52:95 a 3 tacccw ﬂ 2. 953

.L

o9 63L 89c 350a 832:: 92: ﬁt. 388:3 8mg 2: 3 @3383 853859“ ccsoﬂmsam 28:55: amem "Mad 2nt

169

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

as 3: Anaheim w: 3 3: .2: 3 an”: 2E
a: 5m 3on m2 «.8 Amwem mg 25 Amﬂeim we»
New .98 3: mom :3 Sim E 25 32+: 20
mam :2: maxim t: :2: 34.3 E 3:: 34$ 20
E 2:: 3.1: a: 25 Sim mm 23 3: :0
new 9.8 3:: o3 3:. 9...:ij was wt We: 20
m3. :3 35.: 2m 3s $.33 m: 23 £39m 80
cm :3 3.: a 3:: 3.1m mm 3:: 33+: woo
- - - Em v.3 as: a: 3:: 3.: So
9: :2: Sim mm: 33 Am”: a: 2a A”: 80
- - - - - - - - - 80
m2 :2: £3: a: :2: 34.3 3 3a a: EU
a; 3:: 3.1: 8:. 3m 3.1m my.“ n.3, 3m 20
an 3: £30235 mom was 3..§.+m Q: is d3 80
w: 3:: an: 3 $3 3.5 at 92: 3: 80
ca 3:: a iim 3 go. W ...q: 9: Ex: .... if; 80
38m _ 82m 05 :5:qu macaw. H 85 c\c _ CECE: mamﬂ 85 CAL «55.2:

E : : E305

 

 

9:5; .32 c3 8e 80 8222:: 2: 5.. mega: mmcuozm

Eoc EEca 53.52 ace: .5 «55.5th $8: 2: 3 mzzzcuum mmcowcaaa :osoﬂmcwm 365250 ammm

33¢: :0: 2w ﬁ :8 £53ng amen E can : q
.5 x€c¢aa< oomv E2205 .m 3 cocci: mm a. 8mm «Loco: dog;

3L5 @358

170

best constraint satisfaction method might be quite different.

The data in the summary tables 3.13, 3.14 can also be used to help in selecting
an optimal number of steps for guaranteed constraint satisfaction. However, we are
interested in the computational cost as well as the performance. Hence, a different set
of the tests might be needed in order to determine the minimum maximum number
of steps allowed in order to reach a desired rate of successful runs to all runs. Here
we can only conclude that this level would depend on the maximum allowed number
of steps. Setting it to the values less than the average from the tables would most
likely lead to a degradation in performance.

We also note that problems with equality constraints (G03, G05, G13) and high—
dimensional problems (G03, G07, G09, G10) have indeed demonstrated themselves
as harder to solve. However, high-dimensional problem G02 and problem G11 with
equality constraint only, did not obey this empirical rule. Hence we suggest the
estimation of the difficulty of the problem based on this rule to be taken with care
and always veriﬁed by simulations.

We see that for these problems the power penalty functions (3.2.10) with a = 1
are the best choice, while a = 2 is a significantly inferior variant. However, this
result is not only a problem-dependent but also an optimizer-dependent hence we
can not conclude that these functions are the best. choice for any combination of the
problem and the optimizer. We believe that since the step penalty functions (a = 0)
demonstrated near zero successful runs in our test (see tables in section “Results”
in |156]), they should generally be avoided as they do not provide any information
about the direction in which the penalty is changing. Since they only indicate if
the point is feasible or not, the search landscape for such penalties is flat, which is

causing most optimization methods to fail because of their inability to make a move

171

to a point that is better than the initial. This conclusion is in accordance with the
study on penalty functions in [162].

Wherever it applies (problems G00, G05), our studies do not demonstrate a signif—
icant difference in performance between the all combined and the equality combined
1 inequality combined optimization problem formulation methods except for the G05
tested on 1000 random points from {—100, 100]4 where it demonstrated 2.5 times bet-
ter performance than any other method. However, these results were not veriﬁed by
the test performed on the search domain from the problem formulation (see Figure
D.5) [156]. Both these objective function types were outperformed by the separate
method and thus are not recommended.

Poor results for the problem G05 in both test ranges are observed to be ob-
tained due to a difference in 3 orders of magnitude between search domains for
11,332 6 [0,1200] and 233,134 6 [—0.55,0.55] from the problem formulation (see Fig-
ure D.5) that are inconsistent with the common search domains of [—100,100]4 and
[—1000,1000]4 used in testing. Additional testing on the suggested search domain
supported all the observations about the best optimization and the objective func-
tion construction methods presented earlier. It must be noted, however, that the best
results were obtained when the quadratic power penalties were used for the equality
constraints, i.e. when penalties for violating the equality constraints were steeper
than the ones for violating the inequality constraints.

Based on our tests we conclude that the transformation of the constraint satis-
faction problem into the multi-objective unconstrained optimization problem (3.2.1)
via the power penalty functions (3.2.10) with a = 1 and successive treatment of the
resulting optimization problem with the LMDIF COSY Inﬁnity optimizer is a rea-

sonable choice for the default parameters of REPROPT. However, we should note

that the considered problem set is not very large and it does not cover all the pos-
sible cases, hence the results might not be general enough and thus might not be
universally applicable. In case of the poor performance of the REPROPT method we
suggest tuning the parameters based on the information about the problem, possibly

after studies similar to the ones performed for this work.

3.3.6 Performance

In this section we assess the performance of the REPA constraint handling method for
Evolutionary Algorithm. The REPA repair algorithm does not work as a standalone
method, but rather is designed to work with any Evolutionary Algorithm for the
continuous optimization on the real domain. Here we study its performance together
with GATool (see section 2.3). The population size for GATool was chosen using
the dimension*20 formula, to compensate for additional difﬁculty that constrained
problems have compared to unconstrained ones, all parameters were set to their de-
fault values (see Figure B.1). As the test problems we used the standard constrained
optimization test bench for EAS [128,135] (see appendix D for problem formulations
and properties).

Three methods are chosen for comparison. The ﬁrst one is the most straight-
forward and expectedly least efﬁcient, killing method (see section 3.2.1). For each
member of the population the sum of the quadratic penalties (3.2.10) is calculated.
Members of the population that have this combined penalty higher than the allowed
tolerance (10_5 in our simulations) are eliminated and then regenerated using the
selected method of the new members generation (randomly uniform by default). This
process is repeated several times up to the maximum allowed number (50 in our cases)

or until all the penalties are zeros, i.e. until the resulting member is feasible. The

173

objective function is not modiﬁed and the elimination process is the only connection
of the algorithm with the problem’s constraints.

Another one is based on the modiﬁcation of the objective function via the penalty
function method (see section 3.2.2). Due to its efﬁciency despite the simplicity we
employed the annealing dynamic penalty (3.2.24) where T0 = 100, Tk+1 = 0.9Tk and
T f is not needed since the number of generations we used for testing is relatively small
and the termination upon reaching the ﬁnal temperature is not required. Despite the
existence of the more sophisticated techniques, the penalty function methods are still
efficient and quite popular.

Another strategy that we tested is a killing method combined with the objective
function modiﬁcation method. This algorithm has only 5 kill/regenerate cycles but
the objective function was calculated using the formula (3.2.4) where all in] = 1 and
Pj are quadratic penalty functions from the power penalty family (3.2.10).

For the REPair Algorithm (REPA) following values for the parameters were used:
the percentage of the repaired unfeasible members was set to 95% tolerance was
set to 10-5. The search space in our tests was given in a form of a v-dimensional
box. This box was represented as a set of linear inequality constraints and added to
the deﬁnition of the feasible set (1.3.14). This was done since the repairing process
technically can produce repair candidates out of the search space hence we need to
penalize the repair algorithm for leaving the search space.

For the repair methods of REPA the following parameters were used:

0 REPair by Feasible lNDividuals (REFIND): the feasible member to repair with
was chosen among 5 feasible members as the closest to the unfeasible one being
repaired; the objective function for constraint satisfaction is built by using the

multi-objective optimization approach (3.2.1) with the quadratic power penal-

174

ties (3.2.10); the optimization algorithm was ANNEALING, the maximum steps

allowed was set to 15 and the tolerance was set to 0.

o REpair by PRojecting through OPTimization (REPROPT): constraint satisfac-
tion optimization problem formulation and solution methods were selected based
on studies presented in section 3.3.5. single-objective constrained problem was
converted into a multi-objective optimization problem via linear power penal-
ties (3.2.10), then solved via LMDIF optimization algorithm with the maximum
number of steps allowed set to 50 (compromise between speed and efﬁciency)

and the tolerance set to 10—5.

For each combination of the problem and the method 100 runs were performed.
Percentage of the successful runs is given in Table 3.15. Results on the obtained values
are summarized in Tables 3.16, 3.17, 3.18, 3.19 for the killing, annealing penalty,
killing+ penalty and the REPA methods, respectively. In those tables ﬁrst six columns
describe the problem (for full description see Appendix D) with the sixth column,
“Optimum”, listing the value of the global feasible optimum or the best known value
of the optimum. Next four columns describe the results of the 100 runs with the
selected optimizer: the best, median, mean and the worst values that were found.

To select the resulting value from the ﬁnal population produced by the chosen
method, the member with the minimum function value among those whose penalties
were smaller than the allowed tolerance. If the emphasize is on the constraints sat-
isfaction rather than on the minimum function value (as in the rigorous constrained
optimization), the selection of the result can be based on the minimum penalty vi-
olation (among those within tolerance, of course) instead. If no population member
with its constraint violation penalty within required tolerance was found, the run was

considered failed. Mean and median results were calculated over the successful runs

175

only.

It is also worth mentioning that due to the fact that one of the stopping criteria
was the maximum number of generations allowed and some of those problems are
signiﬁcantly difﬁcult (particularly high-dimensional), most constrained optimization
methods were frequently not convergent. Rather they exhausted the allowed number
of generations and stopped, which is particularly true to the least efficient naive
killing strategy. Results obtained using this method, even though they can be worse
than the one obtained with no such restriction, are particularly important for the
problem of cutoff value update generation in rigorous optimization (see section 2.3.6)
and allows to keep the execution time under control. In this study we tried to show
that at a reasonable computational expense REPA method is capable of generating
good solutions. Another study might be required in order to ﬁnd out if it is able to
ﬁnd optimal solutions to all the problems. Another note that has to be made is that
here all the constrained optimization methods were used in conjunction with GATool
(see section 2.3) but in principle are not tied to it. For different pairing unconstrained
continuous EA-based optimizers results can be different.

By studying the rate of success in Table 3.15 one can conclude that the REPA
method is the best one for the considered problem set. Apart from its poor per-
formance on the problems G01. and G10 (see Figure D.10) it was able to produce a
feasible result in almost all runs with the default settings. The second best was the
annealing penalty method, the third and the fourth places are shared between the
killing and the killing-+penalty methods.

It is worthwhile to note that for the problem G06, G10, G11 the killing method
outperforms the killing+penalty method. If we compare the numerical values of the

results produces by those two methods (Tables 3.16, 3.18), we can notice that none

176

Table 3.15: Percentage of the successful runs for different methods (from 100 runs
total). HEre v is the problem dimension, n is the number of constraints, “Diff.”
column lists difﬁculties of the problems according to [128]. Here E,A,D,VD mean
EASY, AVERAGE, DIFFICULT and VERY DIFFICULT, correspondingly

 

 

 

 

 

Problem Diff. v n Success Rate (%)
KillingT Killing-t Penalty ] Anneal. Penalty 8 REPA

G01 D 13 9 2 3 100 9
G02 D 20 2 100 100 100 100
G03 D 10 1 3 2 100 100
G04 A 5 6 100 100 99 100
G05 VD 4 5 0 0 0 100
G06 A 2 2 23 2 54 99
G07 A 10 8 0 0 100 100
G08 E 2 2 100 100 100 100
G09 A 7 4 100 100 100 100
G 10 D 8 6 10 0 0 11
G 11 E 2 1 12 1 100 99
G12 E 3 1 100 95 100 100
G 13 VD 5 3 0 0 76 100
tens E 3 4 96 44 89 100
vess A 4 4 100 100 100 100

 

 

 

 

 

 

 

 

 

of those two methods is superior to another. They both failed on the problems G05,
G07, G10 and G13 and demonstrated similar quality of the result on the problems
G04, G09 and G12. On the problems G01, G02, G03, G11 the killing+penalty has
demonstrated better performance, while on the problems G06, G08, “tens” and “vess”
the naive killing performed better. On almost all the problems marked as EASY those
methods were able to achieve the optimum or a value close to the optimal. Both the
killing and the killing+penalty methods were clearly outperformed by the annealing
penalty and REPA techniques (see additionaly Tables 3.17, 3.19) both in the success
rate and the quality of the results.

It is worth noting that the killing method can be adequate for problems with a
relatively large p factor. However, for the cases when it is relatively small or very

small, random regeneration has a very small chance of generating a feasible member.

177

 

 

 

 

 

 

 

 

 

 

 

 

 

 

FNhnchw. Hmcwwv wvH NacHodmnmw H HvavadehmmvH wnocvmwmﬁmooN vaovmdnoc Nosed» : v < mm?
wNONHmnNd 3333.: 25:38.: :mewvHod waNHod nmmnd v m m 83
:28.“ :21: :28: :23: wavammod ooood n m Q> «H 0
$8536. Nownwamd- $3.886- ewcwamamdi H- 33.: H m m NHO
thHmmgd 33::de newowmwnd 8:023: mud 38.: H N m H H O
:28.“ :28.“ :28: :2 8.: 53.33. 88.: o w O S U
8338.:an 9”:me Enema ovmvaomdom £593:de nnmccmcdww mmHnd : h < :00
wmmowhood- Nnoncvcod- 3.2.98.0- 33386. 3386. Hwnwd N N m wow
:23: :25: :28.“ :28.“ HaONcoméN 88.: m OH 4 .50
05:38.ng - mHnnHNdemm- 5333.53- HoNvawEdeo- wmechmc- Rood N N .4. wow
:28: :28: :23 :23. HwadeHm 8:: n : Q> mow
:nvmcwmndomwN- noomwNmoNncmN- 33:39:83- 3333.38? 0.3688- agoNN : m < :00
wNmmNHood- can 836. 53886. 38836. H- :Nood H OH D 80
NoNomemd- 3.338.? 3338:- Hchchnd- memowd- nummda N ON D New
33.5%.? $3336- 3333:. nvwaoNMN. H - m H - wooed m 2 Q 50

f 8.53 _ :82 _ :2:on % «mom _ 858:3 L a. R _ : A > L :5 _ 82no.~nH _

 

m8: :8 8
838o8 2:32 8::8: o: :23 :oﬁo8 2: .: :82 m: 2:28.:3 .282 295 :2: v.22?» :20? 2: :8 :38 £238 £25 2:
62:68 25:2 2: 3:3 2:: co: 2: .«o 252 2: 2:88: 8838 82 :2. 2:. 8:8:ao 2: .8 2:6» 8523 $2: 2: 8
8:8:ao 222:2 :32w 2: Mo 28> 2: w::m: ...8:8::O: £838 5:: 2: 5:? a 8:525 mcm 8:285: :3 .5: 82:8:
2: 2:58: 8838 x: :8: 22H. .m:o:.m.$:¢w on: 5:: 60:88 25:2 2: mo 8:88:85: 2: mo inEEsm 5H.” 2nt

178

 

 

 

 

 

 

 

 

 

 

 

 

 

 

N HNNwvavaov Oanm HwH .8: H nvwwo HNndova NONmomdeH Ha vaovmdmo: Nchodm v v < 2w.»
:Nammva .o 3338.: ©3339: thwNNHod :woNHod 33.0 v n m 25:
vmmmehmHN monNon .o $3:on .o 3333.: mavammod 88.: n m O> 2 O
3:33:6- ocammmomd- 3883:- aaaaaaaad- H- 33.: H n m NHU
Novanvvwd 335:5: 3332.: H H 3255.: Ed 3.8.: H N m H H O

:28 :2:8 :28 :2:8 Bandit. ONood c w G c: U
vomonnmnéwo amchme. Hm: mHmwNNNv. Hwo Hhvmmmcndwc mnmoomodww 3H wd v N < 80
$3386. omvgtod- H H858:- vanwmacéu 32:86. 53.: N N m woo
$2 mwomdm $333.3 $833.3. wwvawccH .oN HchwoméN good w 3 ¢. SO

3303:. HEN- Hancvmmdmuv- wmnmkw H .39.- 2.8: Hndvnc- wwm Hw. 8%. “wood N N < 80

:28 :2:8 :28 :2:8 mevéNHm coed w v Q> moo
Hmonmmco. SEN- Hm wavadovom- nvwwNmNoﬁmvon- Hvommb $688- mmm.m©com- 2.8.3 c m. < 30
«33586- omMNaommd- o Huvmchd- NumNONnmd- H- cNood H OH Q 80
333.36. nwomumwwd- mam Hmwwcd- 33:355- :Hemowd- 93:5: N ON 0 Nov
5838.:- NVNvomvde- ngHnde- $583.2- mH- wooed m NH O 50

_ :203 _ :82 _ 5:222 _ :mom _ 858::0 H a: R _ c _ > _ :5 _ 82:8: _

 

28.: :5 8 22:8:8 2822 82:2: 0:
:28 :2::8 2: : :22: 2 2:28.”? :85: 225 :2: v28; :20? 2: :2: 82:8 822:8 :52: 2: n:2:¢8 5:28: wczdoccw
2: 2:3 28: OH: 2: :o 2:52 2: 2:52: 2:882: 88: 82: 2E. .8:8::c 2: .8 2:2, 852:: :m2: 2: .5 828:2: 282:8
:32»: 2: .8 2:2 2: 25m: ...8:8::O: .8888 ﬁx: 2: 8:? AD 288:9: 8m 82:282.: :8 8.: 82:8: 2: 2:88:
2:888 x: :2: 2E. .m:o_:.2oaow SH :28 82:28 2228: 28:28:: 2: :0 2:58:88: 2: :o bu88=m "NH.” 29:8

179

 

 

 

 

 

 

 

 

 

 

 

 

 

 

3338on H: unmovmmwdpwomH wwnmmvwm.8va H 3:83.982: 333.38 35.3. v v < mm?
@328”: Banged owcHtccd «2:88.: 588.: nmmnd v m m 98»
:8H8 :m:8 :m:8 :o:8 wavmmmod good m m Q> ”H 0
558386. vcwvoammd- woamchmd- anmaaaaé- H- 3:»: H n m m HO
mhwnmcmad H 583%.: £855.: mnoovombé mud aged H N m H H U
:28 :o:8 :w:8 :w:8 hommdvon omood : m O OH 0
5338.:de EwmmcmaﬁooH whooH wovdwa 3:353 65. 9.308.:me 8H md v N. < 80
82.38.: mammonmod- mvnomeod- vmnm Hnaod- nmwmaod- Hwnwd m m m wow
:28 :v:8 :o:8 :o:8 Samoan. vm 88.: m A: < 30
8585.8: - mammwomwdmvm- wmmnwomwdmvm- :Hcmwmmndn Hm- wwm H w. 5%- Rood a m < 80
H528 :28 :28 :e:8 HwavémHn coed m : Q> now
mnvmmwcﬁnngm- mm H mHomc. Hanan- ammcmwawdcmam- 353va .88»... @3639”- 28.3 c m < vow
#33256. N. Hovmnmcd- n 53:36. Hogans .o- H- 88.: H oH O no.0
mnmgwmmd- a 5:886- «3:856. Hoomamzd- 38%.? 3.8.8 m on D New
manage.” ommwmw: .m- $3355- :32: H H .v- 3. wooed m nH Q How

_ 6.83 F :82 a 58:82 _ umom _ 858590 _ o. R ﬂ: _ > _ . :51— 8:395 _

 

2:: :a E 82::me @388 95:05 8
:28 :oﬁmE .2: .: :83 m: 2:28?“ :58 v.83 :8: mos—S, $83 2: :5 52:: 58:2: £ch 2: :0:me migmapwwE—i
2: 5:: 2:: OCH 2:. .0 $38: 2: 9:88: 92:38 .58 38: 25. .EEEEO on: :o 33> 925:: :8: 2: :o SEEK:
2:38 Ego—m on: .6 :25: 2: wﬁam: ,..E:::EO= 5838 5x6 2: 8:3 a 8:559: 9% 5598mm: :8 Sb :8an 2:
9:88: 8638 5w :8: 2:: £55823: 9.: $8: €288 3_::o:+w:::x m5 .6 8:588.th 2: mo Dwﬁﬁsm "a.” 2an

180

 

 

 

 

 

 

 

 

 

 

 

 

 

 

mmwnvnaH .mmnov 33336va H wovnmmHv.§oH wmnnmon .mmww $83.38 35.2” v v < $9,.
832.36 @3323. 8H HmmHod nnmweusd chmHod nmmud v n m m5;
58836 38336 mnwwmmmod Hvemmnmcd waammcd oocod n m Q> «H 0
323256. wwgmamad- $8385- aaamaaamd- H- Send H m m NH O
vmvwgmwd ophmHmmﬁo «wwwnomnd unoccowhé mud 3.86 H m m :0
333863: mowonmwmdwom $338.35 $333.33 5362:. omood c w O CH 0
$835 .mwo voowmwwn. Sc 8333. Hwo mmmmmewdwo numoomodwo 836 v n < 80
mvanamod- Hme Hnmcd- wavmwnaod- vcmawmaod- mammaod- 53¢ a m m woo
3335.3 omoﬁéowdm «6385.3 wawvnvccdm Haemooném 886 w 3 < 3O
mgnwmou. Hnmm- EnnmmwﬁH H Hm- mvavawmvHoco- mwmmmomegc- £88. 58. nmood m m < $90
unmwwwhmdem manmmhcémHn awothmdem ammonavéNHm mevémHm oood m. w 92 new
HuHmome. H Hmom- o3 Hmmwwdeon- $53: .38”. mnvnwtodcoon- 3038”- 2.8.3 o m < 30
oH Hmvwnnd- 38585- mammmmmmd- @3336. H- cmood H oH a new
mmmﬁmwmd- gagged- mwmmwwmod- nmmmwownd- 3386. «$3.3 m on D New
momma??? 3833.9- $332.3- $838.3- nH- wooed m m H Q How
_ ”.803 ~ .5on _ 5:52 _ 6cm @ EEEEO _ Q R a = ﬂ > ﬂuﬁﬂ EcEEL _

 

m5: =a
S. E382: Snag 8355 8 3:3 H552: 2: a team: ﬂ :vazam: .358 $95 HS: was?» $8.5 2: was 538 6.238 £8:
.2: ”H5238 $55 2: 53» m5: 2: 2: me 3:58 2: carommv 3838 Sou 38: 25. .Esﬁﬁmo 2: .Ho wig :32; amen
.2: .8 Engage oﬁﬁmﬂ =3on 2: .5 egg 23 mag: $82530: 5838 5me 2: ﬁts HQ 5.9595 9% cosatomoc :3 .8:
52295 2: mntomwv maﬁa—8 5m HE: 23. .mcosﬁmcmm omH Eda 6052c <nHmHm .Ho 8588th 2: Ho ﬁxEEHHm 5H.» 0558

181

Here the directional search for the feasible set that is done by REPA demonstrated
itself to be more efﬁcient. Speciﬁcally this is demonstrated on the problems with
equality constraints, i.e. cases with the theoretical p equal to zero.

Comparing the annealing penalty and REPA methods (Tables 3.15, 3.17, 3.19)
we notice that while the success rate for the problem G01 is higher for the an-
nealing penalty method, the quality of the best and median results was better for
REPA method. The annealing penalty method failed on the problems G05 and G10
(VERY DIFFICULT and DIFFICULT), demonstrated a similar performance on the
best achieved results for the problems G08, G09, G11 and G12 (EASY and AVER-
AGE). It was able to ﬁnd optimal values for the problems G08, G11, G12 (EASY).
However the mean and the median results for all the problems except G09 were much
better for REPA than for the annealing penalty method (for problem G09 the mean
and the average values are of the same quality). Therefore we can conclude that
REPA method provides more consistent results over the runs and thus is a more
robust algorithm. REPA method found the optimal values for the problems G05,
G08, G11, G12, G13 and “tens” and the results close to the optimum for the prob-
lems G03, G04, G06, G09. For two problems that were classiﬁed by Mezura [128]
as VERY DIFFICULT (G05, G13) it was able to ﬁnd an optimal value consistently
(the mean and the median values are nearly Optimal). REPA method outperforms
the annealing penalty method on all the problems but G09, where the results are of
the same quality.

Based on our tests we can conclude that. the proposed REPA method was the
best from the tested methods for the continuous constrained Optimization. It also
demonstrates itself as being robust which is important for consistent generation of

good cutoff values for rigorous constrained optimization. Due to the repairing process

182

performed by it, the REPA method is relatively computationally expensive but for
rigorous optimization it is a valid tradeoff for the quality of the result. Another
important case is the optimization of the physical device design, where the objective
function can be calculated on the base of the expensive simulations and thus be very
taxing to calculate. Constraints in this case are very important to satisfy.

If we compare the results obtained by REPA method with the results obtained
with other constrained Evolutionary Algorithms by other authors, we can conclude
that the proposed approach is, indeed, efﬁcient and competitive. Those experiments
include the studies performed by Michalewicz [133] (problems 1, 2, 3 and 5 in his
test set are problems G01, G10, G09 and G07 in our test set, correspondingly); by
Coello and Mezura [41] (examples 2, 3, 4 in his test set are the problems “vess”,
“tens” and G12 in our test. set, correspondingly) and by Mezura [128] (see Chapter
5, test functions are the same). We also note that for the problems G05, G13 that
are efﬁciently and robustly solved by our method, most of the other state of the
art approaches to constrained optimization with EAs failed even to ﬁnd a feasible
nunnberl128l

Trying to improve the results for the test problems marked DIFFICULT (G01,
G02, G10) and those marked AVERAGE (G07, “vess”) that were not completely
solved because the feasible optimum was not reached with the default settings, we
tested if the increase in the maximum allowed number of generations from 150 to 250
increases the quality of the results (this suggestion is more or less generic for GA-hard
problems, see for example section 2.3.5). For G01 we did not. obtain any statistically
signiﬁcant improvement. For G02 the increased number of generations resulted in the

increase in the quality of the results (compare with values from Table 3.19):

 

best
median

-0.746413744215766

-0.7896210467822932 1

183

—0.74038332818324S
—0.6797819227868159

ll

mean
WOI‘St

 

For G10 the increased number of generations produced an increase in the success rate

(grows to 19%) with reduction in the quality of the results:

 

 

best = 7407.208490599209
median = 9378.23832879927
mean = 9870.55475611721
worst = 19827.95976815720

 

For G07 the increased number of generations resulted in the increase in the quality

of the results:

 

 

best = 25.22498907620453
median = 28.1881923101962
mean = 28.2941232844445
worst = 35.71934554079836

 

For “vess” the increased number of generations resulted in the increase in the quality

of the results:

 

best = 8797.914097806060
median = 9724.07325335419
mean = 10449.0859049756
worst = 23374.47976868584

 

 

However, for G02, p z 99.997370, hence the main impact on the performance
for this problem is an effective optimization inside the feasible region, which REPA
method is not. interfering with. Hence the performance is mainly affected by the
performance of the underlying genetic algorithm and the problem should be tuned
for the optimal performance by tuning GATool parameters. We suggest that the
better performance of GATool in this case is a main reason for the better results of
the GATool + REPA method. For other problems changing REPA settings might be

more relevant.

184

 

 

 

Since the problem G01 is not only high-dimensional (13) but has the largest num-
ber Of constraint functions (nine) deﬁning the feasible set, we suspected that the
increase of performance for this function can be reached by changing the settings for
the optimizer used by REPROPT method tO project unfeasible members to the feasi-
ble set. Experiments demonstrate that by decreasing the projection penalty tolerance
to 1 and increasing the maximum allowed number of steps for projection to 70 we can

improve the success rate from 9% up to the 100% and increase the quality of results

 

 

to

best = -14.95789279338429
median = -14.3710717847534

mean = -14.3276103831189

worst = -13.12539263469034

 

This case also demonstrates an important technique of repairing. By setting the
desired constraint satisfaction penalty to 1 we allow the resulting repaired point to
be in the close proximity to the feasible set (provided the constraint functions are
continuous, see section 3.2.2), but not necessarily inside it or on the boundary. It
turned out that this relaxation of the repairing conditions still can be useful for the
optimization process. The determination of the Optimal set of parameters for the
given constrained Optimization problem is not a trivial task and generally done by
trial and error. Little to no theory exist on the topic.

If we consider optimizers constructed from GATool and REPA by changing their
parameters as different optimizers rather than the same version of the one Optimizer
(this assumption is valid since most of them can influence the search process to
a large extent) from “NO Free Lunch Theorems for Search and Optimization” or
NFL theorems (see section 2.1.4), we can conclude that there is no universally best

set of those parameters, no matter how hard and extensively we try to find them.

185

 

There will always exist problems for which the default parameters will not work in
an Optimal way, hence knowing the strategies to tune the method for the problem is
also important. For the parameters of GATool and REPA see sections 2.3 and 3.3,
correspondingly. Note that the number of parameters is large and some Of them are
dependent, which on one hand makes it hard to tune them, but on the other hand
gives the user a great flexibility. Since we view our Optimizer as a general-purpose
tool applicable to a wide range Of different problems, we consider this mostly as its
advantage.

Few words of critique must be said about the standard problem set for EA-based
constrained Optimization that we used. First, we note that the difﬁculty of the prob—
lem according to NFL theorems is method-dependent, hence the classiﬁcation of the
difﬁculty made by Mezura [128] and we adopted in this work (see Appendix D) might
not be adequate for all methods. Indeed, it turns out, that two of the problems: G05
and G13 marked as VERY DIFFICULT due to a presence of one or more nonlinear
equality in the constraint functions set, did not pose a signiﬁcant problem for our
method (see Tables 3.15, 3.19) thus can not be considered VERY DIFFICULT with
respect to it. The problems G01, G02, G03, G10 marked as DIFFICULT, due to a
high dimensionality and small p (except for G02, which has the largest dimensional-
ity) have demonstrated themselves as being moderately difﬁcult since the values close
to the Optimal were found. However, the problems G07 and vess marked as AVER-
AGE posed for our method the same or even higher difﬁculty than the DIFFICULT
problems. The problems G08, G11 and G12, marked as EASY, pose no difﬁculty for
our method thus the classiﬁcation here is adequate.

We claim that the good performance of our approach on the VERY DIFFICULT

problems G05 and G13 is due to the fact that their extreme difﬁculty was suggested

186

based on the small size of the feasible set (p z 0) and the presence of the rela-
tively large number of the nonlinear and linear equalities in the set of constraint
functions. However, all those constraint functions are analytic therefore REPROPT
repair method having the LMDIF as a default Optimizer and thus using the extra
information coming from differentiability (see the description of LMDIF in section
2.1), is capable of efﬁciently repairing unfeasible members (see section 3.3.5). For
functions that do not possess those desirable properties, results can be different.

We claim that the difficulty of the DIFFICULT problems is mostly due to their
high dimensionality which is known to increase the problem’s complexity (see section
2.3.6). Additional studies with the increased maximum number Of generations for
GATool, increased maximum number of steps for repairing and the decreased con-
straint projection tolerance for REPA demonstrate an increase in the quality of the
result.

We additionally note that the problem G12 originally suggested to be a test for the
cases when where a feasible set is disjoint and thus the problem is difﬁcult, demon-
strated itself as an easy problem, successfully solved even by the most inefﬁcient
methods (see Tables 3.15, 3.17, 3.19, the studies in Chapter 5 of [128]). It seems like
the disjoint pieces of the feasible set (3-dimensional balls, see Figure D.12) are large
enough, placed close enough to each other and cover the search space dense enough
(p x 4.7697) that they do not model the main difﬁculty they were supposed to model
well enough. Moreover, for this problem the unconstrained and the constrained min-
imumizers coincide (which is easy to verify) hence the results of both unconstrained
and constrained minimization should coincide as well. Therefore it does not test
the speciﬁc difﬁculty of the constrained minimization. We conclude that other more

adequate test problems for disjoint feasible sets are needed.

187

3.4 Conclusions

In this section we reviewed the challenges of the constrained optimization with the
Evolutionary Algorithms. We presented a uniﬁed overview of the commonly used
constrained optimization methods for EA. We described our proposed constrained
optimization by the repairing method called REPA in the exquisite details including
its two repairing methods: REFIND and REPROPT. For REPROPT we assessed
its performance on the standard constrained Optimization test problems for EA with
a variety of conﬁgurations and we discussed the results and suggested the optimal
default conﬁguration. We also studied the performance of the REPA method with
the default settings on the same set Of test problems and compared its results with
several of the commonly used constrained Optimization methods for EA described in
the review section. We outlined and demonstrated with examples several strategies
for REPA performance tuning for the hard problems. We also presented some critique
on the de-facto standard test problems set used in this work.

More extensive testing of REPair Algorithm (REPA) on other synthetic problems
(see, for example, Chapter 8 in [128]) and real-life problems is needed and therefore
is a direction for the future research. The problems that pose difﬁculties for our
method should be identiﬁed and their distinctive properties should be studied and
described. Here it must be noted that the optimization methods used for the con-
straint projection by REPA can be changed. Hence the optimization methods that
are more appropriate for the considered constrained problem can be used to increase
the REPA efﬁciency.

For the practical purposed REPA repair method should be used together with
penalty function methods that augment the objective function with penalties for

constraint violation. This is done in order to direct the evolutionary search performed

188

by GATool towards a feasible optimum, not just an Optimum of the objective function
itself. Otherwise all additional computations required to repair unfeasible members
and bring them to the feasible region can become worthless. For example, if the
unconstrained optimum is outside of the feasible set and is strongly pronounced,
the unconstrained search based on the original unmodiﬁed objective function tends
to converge to this unconstrained optimum thus driving the constrained optimum
search in the wrong direction.

To conclude the section we present. some ideas that can be used to further enhance
REPA method. The concept of elitism is important for the convergence proofs (see, for
example, Chapter 9 in [128])) and in practice it improves convergence. However, in the
case of the GATOOI+REPA Optimization the result of the elitism operator application
in GATool are frequently destroyed by REPA since every unfeasible member can be
repaired and thus has its function value changed. The concept of the feasible elitism
might be introduced to perform a similar task, i.e. the best feasible members can
be preserved (note, however, that ﬁnding even one feasible member is Often the main
difﬁculty of the constrained optimization). Preservation of all feasible members found
during the optimization might also be a valuable enhancement, especially for the
problems with a small p, i.e. a. feasibility set. that is a much smaller than the search
space. First, it guarantees the success in terms of ﬁnding a feasible result even if
it is not optimal. Second, it allows REPA to use the much less expensive and the
more efficient REpair by Feasible INDividuals (REFIND) repair method instead Of
the more general-purpose REpair by PRojecting through OPTimization (REPROPT)
(see sections 3.3.2, 3.3.3).

Based on the results described in this Chapter, particularly on the outstand-

ing performance of REPA on the test problems that presented signiﬁcant difﬁculty

189

for other constrained optimization EAS, we conclude that the method is useful and
competitive. Important consequence Of this development is that REPA constrained
optimization method can be used to extend the GATool integration with COSY-GO
(see sections 2.3.6 and 2.4). Since COSY-GO is capable of the constrained rigorous
global optimization as well as of the unconstrained one, GATool, given the ability to
handle constrained Optimization problems, can be used as is suggested in the scheme
of GATool & COSY-GO integration for the constrained rigorous global optimization

as well as for the unconstrained one.

190

CHAPTER 4

Optimization Problems in Accelerator

Design

Evolutionary Algorithms (EAS), as was mentioned in section 2.1, have many proper-
ties that make them attractive for various applications in Accelerator Physics. They
can ﬁnd globally optimal or nearly Optimal solutions even for very high-dimensional
functions (when the objective function depends on many control parameters) with
moderate computational expenses (see section 2.3.4. They can handle noisy prob-
lems which Often arise in physical systems, have no requirements on the objective
function other than the ability to calculate its value for all points from the search
space, and, ﬁnally, are relatively easy to implement (see Chapter 2). These properties
make EAs a great tool to explore and solve problems that. were previously considered
unsolvable, or ﬁnd newer, better, and even unpredicted solutions to well-known prob-
lems. EAs are capable of the both unconstrained and constrained Optimization (see
Chapter 3). This is particularly important for design problems that usually include
a large number of constraints imposed by physical limitations, cost considerations,

and available technology. EA’s ability to generate new solutions mimics the nature’s

191

ability to produce new species. It has proved itself useful in the ﬁeld of design in gen-
eral and shows great potential for application to accelerator design in particular (see
section 4.1 in this chapter). The results presented earlier in this work extend EA’s
application to the ﬁeld of rigorous global Optimization which is used, for example, for
rigorous estimation of the stability of the particles in large accelerators that consist
of thousands of elements (see section 4.2).

Despite these attractive features that have created growing interest in Evolution-
ary Algorithms in science and industry, their usage in the ﬁeld of Accelerator Physics
and Beam Theory is not very common. The number of publications that we were
able to ﬁnd on accelerator design with Evolutionary Algorithms is surprisingly small
( [32,64, 160]) especially in comparison with the number of papers on their applica-
tion to various other design problems. In this chapter we present several different
problems from the various stages of accelerator design that were treated with the
GATool Evolutionary Algorithm (see section 2.3). These problems include a simple
accelerator design problem, a complex real-life accelerator design problem, and the
problem of Optimizing the normal form defect function that. is connected to the rig-
orous stability estimation Of the particle motion in a circular accelerator (see section
1.1.2). The usefulness Of the obtained results encourages us to suggest that the EAs
application to these problems is indeed very promising and has great potential for

future research.

4.] Quadrupole Stigmatic Imaging Triplet Design

Different types of electromagnetic elements are used to conﬁne and manipulate

charged particle beams in accelerators. The most. common are dipoles, which usually

1.92

generate a constant magnetic ﬁeld to bend the beam, and quadrupoles which employ
a linear ﬁeld gradient to focus the beam in the transverse dimension. The questions
of conﬁnement and focusing are of the utmost importance for circular accelerators
and storage rings. Here particles must remain in the central channel for millions of
turns in order to be accelerated to the required energy, or focused, as is the case in
a collider, to provide continuous collisions of beams circulating in the opposite di-
rection. The three main types of focusing employed in most accelerators are: weak
focusing, strong focusing, and solenoidal edge focusing.

Weak focusing was historically the ﬁrst type of focusing developed for accelerators
and is primarily achieved by shaping the poles of the dipole magnets such that the

By component of the magnetic ﬁeld decreases with orbit radius r

883/
(9r

 

<0,

Such shaping of the poles generates a linear restoring force which forces particles to
oscillate around a stable reference orbit as they travel through the channel. These
oscillations are called betatron oscillations, since the ﬁrst. machines built employing
this design principle were betatrons ]47,170]. However, the principle of weak focusing
has one serious drawback: when the circumference of the accelerator is large, the
amplitude of the betatron oscillations is large as well, which leads to either increased
particle loss or large magnet apertures to avoid it. Large apertures, in turn, lead to an
increase in the magnetic lattice cost and increased complexity in the manufacturing
process.

The main principle of strong focusing is in alternating focusing and defocussing
magnetic lenses in order to achieve an overall focusing effect. Quadrupole lens are
commonly used in accelerators to achieve focusing. However, they focus only in one

direction and defocus in the other direction. Hence, in order to to achieve simulta-

193

neous focusing in both planes one needs to combine at least two of the quadrupole
lenses.

The smallest accelerator element that is focusing in both at and y directions using
alternating focusing is called a FODO cell. Here F denotes the lens focusing in one
direction (and defocussing in the other direction), D denotes the lens defocussing in
the same direction, and O denotes a drift. FODO cells are optically stable if the
distance between lenses of equal strength in a sequence is less than their focal length
in the focusing plane.

The edge ﬁeld of the solenoids can be used to focus in both directions simultane-
ously. However it introduces non-linear components to the particles motion, couples
the motions in the (x-a) and (y-b) planes and thus makes the dynamics signiﬁcantly
more complex [119]. Hence this type of focusing is typically limited to the cases where
the beam has a large phase space size and a large momentum distribution [144] so
the quadrupole focusing cannot be used.

Most modern high-energy accelerators utilize strong focusing by quadrupole lens
combinations. It is worth noting that the combination of the magnetic quadrupole
lenses with alternating north—south pole orientation (which makes them alternately
focusing in (x-a) and (y-b) planes) has a stronger net focusing effect than a series
of solenoid lenses of the same ﬁeld strength. Thus quadrupole lattices are gener-
ally favored over solenoidal ones because of a reduced size, power consumption, and
decoupled particle motion in the transverse planes. These are the key reasons why al-
ternating gradient (strong) focusing with the quadrupole lens is the main technology
employed in many large-scale high-energy modern accelerators.

The map methods described in section 1.1.1 are typically employed to study trans-

verse motion of the particles around the reference trajectory. The combined effect of

194

accelerator lattice elements on the particle coordinates can be calculated by applying
series of the maps of individual elements to the particles’ initial phase space coordi-
nates. In the linear case these maps can be obtained rather easily. They are simple
matrices and their combined effect is calculated via a matrix multiplication. In linear
optics (both light and magnetic), studies on the maps’ properties provide a simple
and elegant. derivation of well-known physical laws of optics [31].

For nonlinear cases transfer maps can be described as polynomials for which the
particle coordinates are variables (see (1.1.20)). As the required order of calculations
goes up, their computation becomes more and more costly. In weakly nonlinear
systems (which most accelerators are by design) the Differential Algebra framework
can be successfully applied to easily Obtain maps up to an arbitrary order [18].

If we consider a rotationally symmetric optical system and set a: to be the position
of the ray and m its slope, then the transfer matrix for its Optical element can be
conveniently denoted as

(.13, :12) (:13, m)

A! 2 (4.1.1)

(mar) (ﬁlm)
If the initial coordinates of the ray are given as (.r,m)T then its coordinates after

this element are computed via

CEf _ (513,23) (30,777.) :13,-
mf _ (m,:13) (m,m) m,- ' (4.1.2)

Applying the maps of the elements in the system along the ray trajectory, we obtain
the ﬁnal coordinates. Imaging systems, i.e. optical systems in which the ﬁnal position
of a ray is independent of its initial angle and depends only on the initial position,

must have a transfer map satisfying the following condition:

(.r,m) = O.

Now consider the linear uncoupled motion of the particles in the accelerator in the
(x-a) and (y-b) planes. Suppose it is not radially symmetric. Applying a similar
principle we see that in order for such system to be simultaneously imaging in both

planes, its transfer matrix must possess the following property:
(age) : (y,b) = 0. (4.1.3)

In light Optics the smallest imaging system consists of drift, rotationally symmetri-
cal lens and one more drift. As was mentioned earlier, in accelerators the quadrupole
lens focuses in one plane and defocuses in other plane, hence the system that consists
Of the drift, quadrupole, and one more drift cannot achieve simultaneous imaging in
both planes. Optical systems that are imaging in both planes are called .stigmatz'cally
imaging or point—to-point systems.

A system that consists of two quadrupoles with alternating focusing planes is
called a quadrupole doublet. Its net effect is focusing in both planes, but the focusing
effects in x and y directions are different. If we use the convention that the ﬁrst.
quadrupole is focusing in the :c direction and defocussing in the y direction, then
the calculations show that the orbit displacement is larger in the .7: direction and
the focusing action in this direction is stronger. Hence the focal points for x and
y are different and simultaneous imaging cannot be achieved (see ]170] for detailed
derivation).

Stigmatic imaging can be achieved by a system of three quadrupole lenses called a
triplet. In order to achieve this property two outside quadrupoles are set to the equal
strength and the same focusing direction, and the quadrupole in the middle is set
to focus the beam in the perpendicular plane with a different strength. Quadrupoles
in the system are almost always separated by drifts. The parameters of such system

could be tuned so that it performs stigmatic imaging.

196

An example of tuning quadrupole triplet parameters to achieve stigmatic imaging
can be found in the standard COS Y Inﬁnity Beam Theory package [22] in one of the
demonstrations from the demo.fox ﬁle. In this particular case the system is built
from a quadrupole of length 0.1 meters and strength —q1 Teslas (negative strength
convention denotes that it is defocussing in :13 direction); a drift of length 0.06 meters,
another quadrupole of the same length and strength of q2 Tesla, followed by the drift,
and a quadrupole with the same parameters as the ﬁrst two (i.e. system is symmetric
with respect to the center quadrupole). In COS Y Inﬁnity such system is conveniently

defined by the following sequence of commands:

 

 

M0 .1 —q1 .025 ;
DL .06 ;

M0 .1 q2 .035 ;
DL .06 ;

M0 .1 -q1 .025 ;

 

If we now deﬁne the function

f(91»(12)=l(1‘lall+l(ylbllZ 0. W142, (4-1-4)

where (.rla) and (ylb) are the corresponding elements of the triplet. transfer map, then
for q] and (1.2 such that
f(qilacizl = 0.

also (:rla) = 0 and (y, b) = 0 hence the system is stigmatically imaging, according to
condition (4.1.3). From the deﬁnition of this function it is clear that its minimum is
at zero. The graph of the function (4.1.4) is shown in Figures 4.1 and 4.2.

By looking at the contour plot we can roughly guess that the objective function has
four extrema. From the 3D plot we can deduce that all four of them are minima. Using
the COS Y Inﬁnity’s built-in SIMPLEX method as it is demonstrated in demo.fox

we can find all four of them. But the problem here is that it requires us to provide

197

 

140 -«

120~

100~

80~

 

 

 

Figure 4.]: Objective function for triplet stigmatic imaging f(q1,q2), qz- E
[—1,1], '=1,2 (3D plot)

198

 

 

/ t // \\ l
/ f .... \
I ." // ‘\ \ v _
0.8 [ // / x/ // \\\\ \K
. / /' //’"‘"—’\\\;—; “‘1
a, la ’/ /_F’
0.6 l- ”, /J // / /~—:'/ / // I.
/ '1 / ﬂﬂ/ /
/ . .1" / \th/k// / I / x /
0'4 ,_ / l/ /l/ // '/,/.
/ ,/ / / /// // //
. ’ // ’1 r/
0,2 / 'x’ / // / / ,
/ l/ // / / / /
’ / l/ / / “/
o >— / / / v
I / / / / / /
/ / // I" / /
// / / III I/ /
'0.2 " I, /.‘ I / / ’ /
. / / / / / / I /
// II/ ,/ ”\ / / / I /
-0 4 I // /,— H// // / /
x , I ,/ﬂ ,
/C/ “32? / / ,’
'06 f / /’: // ___:/o———’ I / / ,I/ ’
V/ 4’ _ _\\ 1/ / f/ //
~—~T\\ \ ‘\‘K\\ m ‘1’ 'I [I ’ I
-0 8 ._ ‘\\ \\ \x\\ ‘ // // //
‘\“ v If,
_1 1 1 \\- 1 4 I 1 1 ’1 1
-1 -0 8 -0.6 -0 4 -0.2 0 0 2 0 4 0 6 0 8 1

Figure 4.2: Objective function for triplet stigmatic imaging f(q1,q2), qi 6
[—1,1], 2' = 1, 2 (contour lines plot)

199

good initial guesses in the domain of attraction of each minimum (starting from the
same initial guess the SIMPLEX method always converges to the same result, i.e.
it is deterministic, see Figure 2.4). In a two-dimensional problem we can search for
such points relatively easily, using visual tools such as a contour plot. However, for a
high—dimensional problem this task is often far from trivial. Even if we perform very
ﬁne-grained sampling of the search space (which gets prohibitively expensive as the
dimensionality goes up) we may not ﬁnd good starting points since the search space
volume is too large.

Once we know the locations of these four extrema, we can ﬁnd them using one of

the standard Optimization methods. Their values with a precision up to 10"5 are:
1. ql x 0.452, (12 z 0.58,
2. ql m 0.288, qQ m 0.504,
3. ql m —0.288, q2 a: —0.504,
4. ‘71 x —().452, Q? a: —0.58.

Points 1 and 4, and 2 and 3 are symmetric relative to the origin, since the quadrupole
is reflection symmetric in both planes even though the focusing/defocussing pattern
reverses (the change of the sign of the quadrupole strength just flips the direction
of focusing). With these quadrupole strengths the system is imaging in both planes,
hence it stays imaging. It is worth noting that the paths of the particles in the (x-
a) and (y—b) planes, corresponding to symmetric solutions, also interchange between
Figures 4.3, 4.4 and 4.5, 4.6), correspondingly.

We ran GATool on the same objective function using the default parameters (see

Figure B.1, p.288), choosing the population size to be 100 times the dimension which

200

 

-.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

/
l
/ /l” r— I‘Nh. \\
[V ./ \ \.
/ /'.l/ ,4// \\ \ \ . v'/../
/ / \ \\,‘ /’/
,./"/ -- " l" ’ ‘i ‘~~‘ \“~ .. ../" H- -"
,»"'_ .-- ”7“” -~—>:’;\ /—“”-l
V::-__“ ‘~ _’_‘/":’7 \ “-—-_ ‘_
‘\.. __ ‘ -—' /’ \_ “‘
-‘ \ \\\ KI 1" TV H4 I \>\“~
\ [~\ \ / / If \\
‘0‘ H’J/ ﬂ
0 0
(a) (x-z) projection
4‘ - ""
“I, \. ‘ ,‘I’l I
.- /'
I — I P ____._ ‘1 ‘ .."d -- ‘ .— — ‘1 i- __ M / I __
_ 2% _::T— I ' If I
' - ,/ \ ' ~ - ' ' ' F I ‘
i h 4 ,1 -,\ - .J— - \
\\
‘ _/'

 

 

 

 

 

 

 

 

 

 

(b) (y-z) projection

Figure 4.3: Ray tracing of the triplet, solution 1

201

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

f"! - ‘w
/ \
/ \‘V
// \
l \
I
/ ‘\
(aL/ 14%,; ~ ‘1.\ \‘n. /
/// ”I \‘ \f‘x /
/ T ‘l \ ~ /"
// I ’ \ §\ [\ ‘~‘ ,./
__,-- ~33” “ -~—~-~-::.-,\ xiii-”’T
‘11: -~——- -‘ _ ,/-- -*";/’ \\_:-~__‘H
\‘\ “s\ V ’_ u--- .a/ .\ ‘s
\x \ \ ‘N /]V/ I "// \\\
\ k ‘\‘ I] I r,/ \\
-__ _ / 5
I'/
\ ,II
\
\ /
/
I
\ /
I
\\
l] A
C O
(a) (x-z) pr0jection
,, ’ — “N , "' _- “W ,H
/// ‘ ‘ "1]" \f\'\ ,,/’/
/ ' -..“... L _>____‘_, “\ _ ../
gay-’- 3 ~~ —— - - a . ~~-~. -33., ,-./;,:-_H/
<~_—~~-x _,-—---/ “ ~_-\
\\ “‘* ~~ _ w— -* _‘ c - _. H -"' // \'~\‘ “~—
\ .
\ r l - 3 - x ’ \‘\-\
\F J ’7 “fr/l ‘\“‘
_ 1

 

 

 

 

(b) (y-z) projection

Figure 4.4: Ray tracing of the triplet, solution 2

202

 

 

 

 

 

,/
u"
/
. /
b — >— j —- —< H._ \\ , .H
_.. -— ‘ _ \_ _,,r _ __-—
.a’ 4 ‘ — -‘-\ .’_.—" I
“3%, ,H- -— .
\._ ._ ——- /- \_ _ ‘ H .
_\ i _] .,__ __ _ , - \- - H-
- ——

 

 

 

 

 

 

 

 

 

 

 

(a) (x-z) projection

 

 

 

 

 

..- ‘\‘_
/. '/" .
“z ‘3." .../
l n/
, . ./'
,1 ,— H— _, > ‘ ,0 ,.
,—”.-— — ‘\. 'I'v“ r
L
W
\\ ~ ./ .\ "——-.‘
\ . .
\‘ _ _ - ~. -‘ __
m \.
\
,, _ \
[ ‘ /' ‘\

 

 

 

 

 

 

 

 

 

 

(b) (y-z) projection

Figure 4.5: Ray tracing of the triplet, solution 3

203

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

r '— _ ~ "_ ‘LK
/ / I "A, \ \\ . ’//
r / ‘ /
’I , " ---- “is “x .’ _- "" “**~- \ / _,.H-
/"_-*‘“ , WWW-HA ‘.':_-4"— T“ ~ “—‘F‘x /...-— w“
‘<:~ - 1 _ ,,-—«-"7 ‘xt‘ ..... - ' '7 <‘"
\ \ a q [- .- I '— ‘ r/ \V \ \ H _ p P ' .,H I / \\ \ _‘
x \ /" \
\~ \ x/ \ ,z" \
\ ,JV \ /' \x
N. K _ ’ I; \ ‘ - ’4/ \
C I
(a) (x-z) prolection
I
1.7 " ‘ \
/ V". ...—H —_,_‘_V \\
z r.’ N ‘\
. / \ [ /
//'/ V, “in \ ,x/
4' ,r' “a \ //
/ ,/ \‘\ -\‘ /
/ ’_ _ ,_ __> ‘ ,/ ’_.
Ha- ~' “ - - ..--~\ , /_H-~ 7’
VTI‘ ‘ § - ._—- \ ‘7‘
\\ “7" '1“ ,. i—"" ,/ F‘x —“"“~
\\ \\ I" ’/ \.\
e ' / ".
\- .\ 1 I p ' / \u“
“a \ .' ,4 / K‘
\ ./ .r f
\ /

 

 

 

 

(b) (y-z) projection

Figure 4.6: Ray tracing of the triplet, solution 4

204

 

 

 

 

 

Table 4.1: Triplet stigmatic imaging design statistics

 

 

 

 

 

 

 

 

 

 

# Runs Solution found (%)

1 [ 2 1 3 ] 4
200 12.0 46.5 36.0 5.5
1000 9.0 46.9 37.0 7.1
3000 «1.7 31.3 60.3 3.7
10000 8.18 47.27 38.19 6.36

 

 

 

 

is 2 in our case. The stopping criteria was set to the maximum number of stall
generations (see section 2.3) and the search domain was {—10.10] x [~10,10]. Tak-
ing into account physical considerations about the strengths of non-superconducting
quadrupoles, we should have searched for q1, q2 in a realistic interval [-1,1] but we
wanted to demonstrate GATool’s usefulness for exploration even if the information
about a search domain is scarce and approximate. Constraints in general reduce the
search space volume thus simplifying the task of the optimizer.

Each run of the GATool successfully ﬁnished the search on one of the solutions
1-4. Since the Genetic Algorithm is a stochastic optimization method, it does not
guarantee the result will be the same on each run. Running the algorithm 200, 1000,
3000, and 10,000 times with the same set of parameters and search domain and then
analyzing the resulting solutions we get Table 4.1. One important. observation here is
that GATool was able to ﬁnd one of the global minima on every run without needing
to specify an initial search point. Second observation is it was able to find all solutions
to the problem in just 200 runs. These observations demonstrate that GATool is a
valuable method to perform at least an initial study of a system with complicated
behaviour, complex dependences on control parameters, a sophisticated structure, and
a non-analytic Objective function. Such conditions often arise during the accelerator
design. Even though this method does not guarantee a global minimum, it frequently

provides excellent. insight into the behaviour of the system and is usually able to ﬁnd

205

 

a good upper bound for the minimum. It can then serve as a starting point or a cutoff
value for another optimization method as it is described in section 2.3.6.

Some of the minima (l-st and 4-th solutions) are more difﬁcult for the optimizer
to ﬁnd. We believe that non-symmetric percentages for symmetric solutions were
introduced by the random number generator implementation details but this question
requires further investigation which is out of the sc0pe of this work. We suggest
running GATool a sufﬁciently large number of times in order to get a better estimate
of the minimum value (or values).

In the case of the objective function (4.1.4), insights into the fact that solutions 1
and 4 are harder to ﬁnd can be obtained by studying the contour lines of the function
(see Figures 4.1 and 4.2). The contour lines indicate that these minima are sharper
than the other two and that their domains of attraction are much smaller. As we
noted earlier, a graphical method of system investigation is not always available.

It is also worth noting that, in principle, the linear transfer map of the quadrupole

can be calculated analytically using the expressions for the magnetic ﬁeld

where k is a quadrupole strength, and :c, y are transverse coordinates. Plugging them
into the equations of motion (1.1.8)—(1.1.13), linearizing them and then solving, we
can directly obtain the transfer matrix of the system. The linear map of the drift.
is the same as it is in light Optics. Thus we can calculate the map describing the
combined action of the triplet by multiplying known maps. Then, using strengths of
the quadrupoles as variables, we can obtain an analytic expression for the objective

function (4.1.4).

206

 

However, the algorithm described in this section and its implementation in COS Y
Inﬁnity are applicable to any map element, including non-linear elements (which are
much harder to obtain), and its combinations, including non-analytic. COSY Inﬁnity
efﬁciently calculates transfer maps to an arbitrary order, giving the user a powerful
tool to build complex yet relatively computationally inexpensive objective functions
that describe the desired properties of accelerating structures. Then, applying the
method described in this section, the user can tune the control parameters to achieve
the design goals.

It is worth noting that this particular example problem is relevant to the frequently
encountered collider Interaction Region design problem. Here, strong Final Focus
Telescope (FFT) quadrupoles are required to focus the beam in both planes to extreme
sizes at the low-beta Interaction Point (IP) where the beams collide. All of the current
approaches and techniques fail to ﬁnd an adequate minima for the IP optics I97].
These accelerator codes start from the assumption of the optically small beta sizes
at the IP and attempt to ﬁt. and match the strengths of FFT quadrupoles to optical
regions outside of the IR. Even with this simpliﬁcation, the sensitivities are such that
the solutions are highly oscillatory, thus additional constraints are often imposed
to ﬁnd an acceptable solution. The optical designs of the advanced systems such
as forefront colliders, the International Linear Collider, the Muon Collider, and the
Linear Hadron Collider with most modern codes create a difﬁcult and often intractable
problem.

The considered example highlights the power of GATool in uncovering solutions
that are difﬁcult to ﬁnd for the problems originating in advanced accelerator op-
tics. Thus this work is of a particular importance to the advanced accelerator design

ﬁeld, especially since most modern Optimization algorithms break down for parame-

207

ter spaces of large dimension and volume. Based on this evidence we conclude that

GATool is a signiﬁcant addition to an advanced accelerator designer’s tool set.

4.2 Normal Forms Defect Function Optimization

Normal form defect functions, described in section 1.1.2, are very useful for rigorous
estimation of the stability of a circular accelerator. Deviations from the invariant
circles they are measuring, directly influence the number of stable turns particles
stay in an accelerator before being lost. The maximum of a deviation allows us to get
a lower bound of the number of turns particles stays within the considered region.
The difﬁculty in this seemingly solved problem is that these functions are multidi-
mensional polynomials of the order of up to 200 ]125] (thus they are very oscillatory)
with many high-order terms that cancel each other out during the function evalua-
tion. Conventional optimization methods do not perform well on such functions and
usually get stuck in one of the local extrema. See Figures 4.8 and 4.13, particu-
larly phase angle dependence plots for examples of the landscapes of these functions.
Conventional methods of global optimization (such as various ﬂavors of interval meth-
ods) [83,101] suffer from the cancellation and the clustering effects [26]. Taylor model
methods (see section 2.2.2), however, allow one to obtain tight rigorous estimates of
the maximum, even under these unfavorable conditions. While in this case tight esti-
mation is practically doable, such a daunting task still requires a tremendous amount.
of computation time. This time can usually be reduced when there is a method to
provide good cutoff values to the box elimination algorithm. Here by a cutoff we
mean a lower bound for the maximum. Knowing the lower bound we can safely cut

off all the candidates (in our case boxes for maximum) that are below this bound

208

from the future considerations, thus saving computing resources and speeding up the
search process.

The box elimination algorithm is one of the main parts of the rigorous global
optimization process of COSY-GO, hence its execution time heavily contributes to
the execution time of the whole search. We claim that having a cutoff value generated
by the GATool optimizer (see section 2.3.6) is advantageous for COSY-GO and it leads
to a reduced computation time.

In this section we study the performance of GATool on normal form defect function

optimization in order to:
o assess its performance on the complex high-dimensional multi-modal function

0 compare the computation time and the quality of the results with the ones
obtained by the Taylor model methods-based rigorous global optimizer devel-

0ped by Youn-Kyung Kim [103] in order to study the potential for combining

COSY-GO and GATool into one hybrid method

At first, a less complex (and thus potentially easier to optimize) synthetic defect
function in 6 variables (three pairs of phase radii and angles) based on the generated
polynomials available at. the http://bt .pa.msu.edu was considered on the search do-

main from Figure 4.7.

 

 

[ 0.499999999E-001, 0.100000001 ] [ —3.14159266, 3.14159266 ]
[ 0.499999999E—001, 0.100000001 ] [ —3.14159266, 3.14159266 ]
[ 0.499999999E-001, 0.100000001 ] [ -3.14159266, 3.14159266 ]

 

Figure 4.7: Synthetic normal form defect function domain of interest

Projection on the space of two phase angles is demonstrated in Figure 4.8 (ﬁxed

values of the radii are equal to 0.1). Note that the oscillatory behavior of the function

209

 

is very prominent.

 

 

Figure 4.8: Synthetic normal form defect function plots. Function values vs two
phase angles (3D plot)

The results obtained from the Taylor model methods-based global optimizer used
for the rigorous estimation of this function’s maximum [26,103] are summarized in
Figure 4.10,

Results obtained from GATool with parameters from Figure 4.11 and initial box
from Figure 4.7 are summarized in Table 4.2. Results obtained by naive sampling
of the search space are presented for comparison. The number of samples is equal
to the maximum number of function evaluations made by GATool during the search
(over all population sizes). Timing for Taylor model methods-based global optimizer

(TMMGO) is calculated as a product of the number of processors used and the total

210

   

 

4,. ﬂf)

-3 -2 -1

a t
@%

Figure 4.9: Synthetic normal form defect function plots. Function values vs two
phase angles (contour lines plot)

211

 

 

SAMPLE NORMAL FORM DEVIATION FUNCTION (RADII IN [0.0S,O.1])
STOPPING CONDITION 1 HAS BEEN MET.

NUMBER OF PROCESSES: 256

NUMBER OF ITERATIONS: 2013

WALL CLOCK TIME: 0 hr 54 min 57.32927064225078 sec
WALL CLOCK TIME IN SECONDS: 3297.329270642251 sec
ORDER OF TAYLOR MODELS USED: 5

TAYLOR MODEL BOUNDING METHOD: REDB

MAXIMUM LIST SIZE: 503064

FINAL LIST SIZE: 16

NUMBER OF SMALL BOXES IN THE LIST: 0

INTERVAL ENCLOSURE FOR THE MAXIMUM:
[0.3373698730538533E—O4, 0.3373699188996994E—O4]
WIDTH: .4584584624369067E-11

 

Figure 4.10: COSY-GO output on synthetic normal form defect function maximiza-
Hon

 

 

Reproduction: number of elite = 10, mutation rate = 0.2
Mutation: UNIFORM, gene mutation probability = 0.1
Crossover: HEURISTIC, ratio = 0.8, randomization is on

Fitness scaling: RANK

Selection: STOCHASTIC UNIFORM

Creation: UNIFORM
killing is on

Stopping: max generations = 1000,
stall generations = 25,
tolerance = 1E-9

 

Figure 4.11: GATool‘s parameters used for synthetic normal form defect function
maximization

212

 

 

Table 4.2: GATool’s performance for different population sizes compared to the
performance of the Taylor model methods-based global Optimizer (TMMGO) and the
Naive Sampling method, on the synthetic normal form defect function (see Figure
4.8). TMMGO was executed on 256 IBM SP POWER3 processors 375 MHz each,
GATool and Naive Sampling were executed on 1 Intel Pentium IV 2 Mhz processor.
*For TMMGO time is given as a number of processors x wall clock time of the run

 

 

 

 

 

 

 

 

[Method f Time (3y ] Max Value ] Difference with COSY-G0 ]

TMMGO 256 x 3297' - |-, -1

Naive Sampling 109 0.20907529234 [1.2329453035, 1.28294626E—5]
GATool, pop—60 17 0.32741614234 [9.9537309237, 99537767737]
GATool, pep-130 33 031952463734 [1.78451855E-6, 1.78452314E-6]
GATool, pop7300 30o 033204450234 153253704937, 5.32541634E-7]
GATool, pop:600 373 0.33169447734 156753957737, 5.67544162E—7]
GATool, pep:1000 553 0.33203547334 [5.28439469E-7, 52344405437]
GATool, pop:1200 613 0.33651573534 [35403731333 8.54133164E-8]
GATool, popr'2000 3459 033701063034 [3.592423263335923367133]

 

execution time measured by a wall clock. We should note, however, that parallel
execution introduces unavoidable overhead for communications which should be sub-
tracted from the total execution time, but for our problem it is relatively small hence
this correction is neglected.

From the comparison table it is evident that even with a population size that is
just 10 times the dimension of the problem (60), GATool is able to provide a good
estimate (naive sampling estimates the range of the function when the values are in
the search box as 0.5 - 10_4) Of the lower bound of the maximum. As can be seen
from the table, an increase in the population size generally leads to an increase in
the quality of the estimate. However, as could be seen for the population of the size
180, this is not always the case and even runs with larger populations can occasionally
perform worse than runs with smaller populations. Performing a statistically sufﬁcient
number of runs for each population size, we can verify that for the problem under
consideration the quality Of the Obtained estimate averaged across runs is getting

better as the pOpulation size increases.

213

Another normal form defect function is computed for a real large circular accel-
erator and thus contains a lot of nonlinear elements which makes it generally harder
to optimize. It is computed by P. Snopok for the Tevatron accelerator [159] located
at Fermi National Accelerator Laboratory. Here we were interested in estimating a
maximal defect on a particular circular region of the phase space deﬁned in Figure

4.12.

 

 

[ 0.199999999E-004, 0.400000001E-004 ] [ -3.14159266, 3.14159266 ]
[ 0.199999999E-004, 0.400000001E—004 ] [ -3.14159266, 3.14159266 ]

 

Figure 4.12: Tevatron’s normal form defect function domain of interest

A projection on the space of the two phase angles is demonstrated in Figure 4.13
(ﬁxed values of radii are equal to 0.4 - 10—4). The oscillatory behavior similar to the
synthetic function is clearly recognizable. The dynamics Of the particles in Tevatron
that resulted from applying the one-turn transfer map in conventional and normal
form coordinates is demonstrated in Figure 4.15. This map was used for the normal
form defect function computation.

Results of applying COSY-GO to the rigorous estimation of this function’s maxi-
mum [26,103] are summarized in Figure 4.16,

Applying GATool with parameters from Figure 4.11 (only relative tolerance was
changed tO 1-10‘25 to reﬂect much smaller function values), and the initial box from
Figure 4.12, we Obtained results summarized in Table 4.3. As in Table 4.2, results
Obtained by nai've sampling are presented for comparison.

Notice that the last value Of the maximum obtained by GATool for a popula-
tion size of 1000 is inside the rigorous enclosure established by COSY-GO, hence we
might conclude that GATool improved the lower bound for the maximum obtained by

COSY-GO. However, this effect might also be attributed to floating point operation

214

 

 

 

Figure 4.13: The Tevatron normal form defect function. Function values vs two
phase angles

Table 4.3: GATool’s performance for different population sizes compared to the
performance of the Taylor model methods-based global optimizer (TMMGO) and
Naive Sampling methods on the Tevatron normal form defect function (see Figure
4.13). TMMGO was executed on 256 IBM SP POWER3 processors 375 MHz each,
GATool and Naive Sampling were executed on 1 Intel Pentium IV 2 Mhz processor.
*For TMMGO time is given as a number of processors x wall clock time of the run

 

 

 

 

 

 

 

Method Time (3) Max Value Difference with COSY-G0
COSY-GO 1024 x 935‘ - ]-, -]

Naive Sampling 46 0.384215054E—18 [4.01596187322, 7.11441777E—14]
GATool, p0p"40 5 0.380347985E-18 [4.26866555E—21, 7.11441816E—14]
GATool, popr200 18 0.382665745E—18 l1.95090547E—21, 7.11441793E—14]
GATool, pop—400 75 0.384126132E—18 [4.9051810313—22, 7.1144177813—14]
GATool, pop‘*600 177 0384406960318 [2.09690285E-22, 7.1144177513-14]
GATool, pop~800 117 0.384035970E—18 [5.80680790E-22, 7.11441779E—14]
GAToti rum-1000 230 0.384644775E—18 {-2.81241401E—23, 7.11441773E—14]

 

215

2W@y@©@
:0 @nt

1

 

.3

Figure 4.14: The Tevatron normal form defect function. Function values vs two
phase angles

   

(a) Conventional coordinates (b) Normal form coordinates

Figure 4.15: Particles dynamics in the Tevatron

216

 

NORMAL FORM DEVIATION FUNCTION FOR TEVATRON
STOPPING CONDITION 0 HAS BEEN MET.

NUMBER OF PROCESSES: 1024

NUMBER OF ITERATIONS: 326

WALL CLOCK TIME: 0 hr 15 min
WALL CLOCK TIME IN SECONDS:
ORDER OF TAYLOR MODELS USED: 7
TAYLOR MODEL BOUNDING METHOD: REDB

MAXIMUM LIST SIZE: 999424

FINAL LIST SIZE: 998400

NUMBER OF SMALL BOXES IN THE LIST: 0

INTERVAL ENCLOSURE FOR THE MAXIMUM:
[.3846166509606185E—18 , . 71144 56197038863E-13]

WIDTH: . 71144177 3 5373770E—13

    
    
    
  
   

35.44252496212721 sec
935.4425249621272 sec

 

Figure 4.16: COSY-GO output on the Tevatron normal form defect function max-
imization

errors that is made signiﬁcant. by the values Of the considered function being close
to the machine precision. Such errors are treated by COSY-GO in a rigorous way
via outward rounding for the interval calculations. GATool uses standard ﬂoating
points operations and thus is susceptible to numerical inaccuracies. Therefore the

numbers from the table can be used to demonstrate only the growth of the quality of

the estimate obtained by GATool with the growth of the population size.

If a more rigorous result is needed, COS Y Inﬁnity ’s object-oriented features allow
the user to easily overload standard floating point arithmetic with high-precision

arithmetic. Development. of the high-precision arithmetic package for COSY Inﬁnity

is currently underway [27,174]. It is worth noting that this effort is partially inspired

by a normal form defect function rigorous bounding problem.

In this section we demonstrated that GATool can be satisfactory applied to the

practically useful problem of estimating the extrema of a complex multidimensional

217

function. We showed that the quality of the result together with the computation time
support the usage of GATool as a fast generator of good cutoff values for COSY-GO.

It should be noted, however, that a cutoff is a lower bound of the maximum. GATool
itselfcan not he used to estimate the upper bound of the maximum of the normal form
defect function. COSY-GO is needed to accomplish this task with GATool working
in parallel to reduce the overall computation time. As is discussed in section 2.4, the
integration of these methods is a topic for the future research. Normal form defect
function bounding problem would server a good test to assess the performance Of this

combined tool.

4.3 Neutrino Factory Front End Design Optimiza-
tion

4.3.1 Problem Description and Motivation

The Neutrino Factory, as is described in section 1.2, is an important facility for a fu-

ture of the neutrino research program, and is currently in the active R&D stage [178].

Its designs are frequently changed and explored in search for the Optimal solution and

the cost /performance ratio. Such a solution would allow for the international collab-

oration (mostly members are from USA universities and laboratories) to realistically

consider building this next-generation accelerator [10]. The front end section plays

an important role in the overall performance of the factory. It conditions the high

emittance beam coming from the production target for the subsequent acceleration
(see section 1.2.3).

Effective delivery Of the particles with optimal phase space formation (to match

218

the transverse acceptance and the acceleration regime) and minimal particle losses

are key performance characteristics of the Front End. Thus one quantity that requires

optimization is the ratio of the muons matching the accelerating regime at the end
of the channel to the number of the pions coming from the target, i.e. production
efﬁciency. It is also one of the main factors in achieving the primary goal of the
whole accelerator, namely producing high-intensity beams of neutrinos for various
experiments.

From the description of a currently accepted baseline front end in section 1.2.3
it can be easily concluded that there are many variations in the lattice parameters
that can potentially lead to different performance characteristics. Since the front end
is just a subsection of the Neutrino Factory, it needs to ﬁt into a general scheme,

which means that its performance cannot be considered alone. Rather it should be

tuned to ﬁt optimally into the overall accelerator design. However, there are different
suggested designs of the subsections that precede and follow the Front End, and
different variants of the Front End itself, therefore different optimizations might be
required in order to explore all possibilities to their full extent. Hence, it is important

to establish a general scheme of exploration and optimization that can be applied

to study any of these variants. Some of the factors that should be considered or
explored for the Front Ends include: physical limits on the maximum gradients that
can be obtained in RF cavities or a number of RF cavities with different frequencies;
schemes that provide shorter or longer bunch trains; optimization of the production
parameters (the number of muons captured into the accelerating regime); different
central energies of exiting bunches; different allowed energy spread, and, of course, the
cost considerations. Matching the beam into different accelerating/cooling structures

following the considered lattice also has to be taken into account.

219

Optimization studies that address some of these issues ]10,71,74,95,109,145,152,
154,155] are summarized in the yearly summary reports produced by the international
collaboration working on the Neutrino Factory project ]1,62,86,150,178]. However,
given the fact that there was not general agreement on the design and there are still
a lot of the variations that were to be considered, we were motivated to perform such
a study ourselves. Another motivation lied in the fact that Evolutionary Algorithms
had demonstrated themselves as an efﬁcient tool for design exploration and optimiza—
tion (see section 2.1). Thus the application of the GATool algorithm (see section 2.3)
to the problem of a front end optimization was interesting both from the practical
point of obtaining new optimal designs and as a test of the algorithm performance

on a. complicated real-life problem. Here the complexity of the problem lied in the
objective function that was not deﬁned analytically and included stochastic simula-
tions. Moreover, successful application of the algorithm to this problem would have
established a general scheme of front end optimization which could be used for the
subsequent studies. Successful application of the EA to an exploration of one of the

front end designs [32] served as another factor that lead us to believe in the success

Of the experiment.

4.3.2 Optimization of the Front End Production Parameters

As can be seen from the description of the Front End design, parameters that can be

changed for different sections (see section 1.2.3) include:

1. Capture and Decay: the length of the section LD and the focusing ﬁelds.

2. Bunching: the length of the section LB, RF voltages Vé, z' = linrfs or initial

and final voltage and the voltage increase formula (linear, quadratic, etc.). Final

220

frequency is usually strictly speciﬁed by the cooling/accelerating subsections of

the whole accelerator, but can be varied if it positively inﬂuences the overall

Neutrino Factory performance.

3. Phase Rotation: the length LtpR’ RF voltage chR of the phase-energy rotation
section, number N of RF ﬁeld oscillation periods between chosen second central
particle and the main central particle (with n = 0), and the vernier parameter 6.
Also the kinetic energy TC Of the main central particle can be changed (usually

Tc is taken as the peak Of energy distribution of the particles of the beam).

4. Ionization Cooling: parameters of the RF cavrties (Vrf,cool’ Vrf,cooli er,cool)i

position, width, material, and the position of absorbers, and focusing ﬁeld.

For our study we explore the cooling section. We varied RF cavities parameters
and the momentum of the central particles in the beam within the ranges obtained
from the physical considerations. We also optimized the mathematical model of the

structure in order to ﬁnd a conﬁguration which would provide the maximum particle
production described earlier.

Most of the numerical studies of beam dynamics in the Front End are performed

in ICOOL —— the de-facto standard Muon Collaboration particle tracking code. It

was originally developed in 1999 for ionization cooling simulations Of muon beams [66]

and has been actively developed over years to include new elements and models [69]
(available at. http://pubweb.bnl.gov/users/fernow/www/icool/readme.html).

ICOOL belongs to the family Of the so-called ray tracing codes. It calculates

particle dynamics employing the Runge-Kutta or the Boris numerical integration

methods to integrate equations of motion. The dynamics is studied in Frenet-Serret

coordinate system [29], which is a right-handed system where s is tangent to the

221

reference orbit, y is vertical, and :L‘ is the third orthogonal coordinate. In a circular
orbit .7: grows in the radial direction. The reference orbit is deﬁned to be the path
where the transverse coordinates .r and y and the transverse momenta p1: and py
remain zero. The shape Of the reference orbit. in a global Cartesian coordinate system
is determined by the curvature parameter.

The electromagnetic ﬁeld can be speciﬁed using built-in models that include most.
common accelerator elements and their approximations. It can be calculated from the
ﬁeld maps or Fourier coefﬁcients, or read from external sources. ICOOL accurately
models the decay processes and particle interactions with matter including energy
loss, energy straggling and multiple Coulomb scattering [68,70]. The beam can be
generated from the uniform or Gaussian distributions or read from an input ﬁle.

Various tools are developed to analyze the results produced by ICOOL. The stan-
dard code for the emittance calculation is called ECALC9 [67]. It allows a user to
compute the number Of particles in the ﬁxed phase space volume. The input is read
from a ﬁle that contains the particle type, maximum, and minimum value for p; in
GeV/c, two different cuts for the transverse acceptance in m-rad (to obtain a number

of particles that are left after two different acceptance cuts when all other cuts staying
the same at once), a longitudinal acceptance cut in m-rad and an RF frequency to

determine the RF bucket area for the longitudinal cut.

The tool chain for the Optimization of the production parameters was assembled

from the following pieces:

0 COS Y Inﬁnity: provided the implementation Of the GATool optimization

method (see section 2.3 for the description Of the algorithm and Appendix B

for the technical details Of the implementation).

0 ICOOL: performed actual simulations of the beam dynamics in the Front End

222

with the parameters values passed from COSY.

0 ECALC9: performed analysis Of the results of the ICOOL simulations, cal-
culated the number of particles within the desired acceptance (and thus the

production ratio), that served as the Objective function value.

0 Perl: used to control other programs in the tool chain and pass parameters
and values between them. It was used to set up the Front End lattice for
ICOOL based on the control parameters provided by COSY, run ICOOL and
then ECALC9 to obtain the objective function value, and ﬁnally pass it back

to COSY to complete one optimization step.

The initial distribution Of particles coming from the target contains 8000 particles.
It was generated by MARS simulation code for the 24 GeV proton beam on the Hg
jet target [138]. The Front End lattice that was used for this study started from the
target and included capture, decay, bunching, and phase rotation regions as well as

a cooling section and a matching between phase rotation and cooling subsystems:

0 Capture: 15.25 m of the vacuum channel in a solenoidal ﬁeld that falls off
starting from 20 T on the target tO 2 T at the end of the channel. At the same

time the radius Of the channel increases from 0.075 m to 0.3 m.

o Decay: vacuum channel of a constant aperture of 0.3 m in a constant solenoidal

ﬁeld of 2 T.

o Bunching: vacuum channel of a constant aperture of 0.3 m and a total length of
L = 21 m in a constant solenoidal ﬁeld of 2 T. An array of RF cavities separated
by drifts where the parameters were calculated according to logic described in

section 1.2.3 so as to perform the adiabatic bunching (28 cells, each consists of

223

the drift of 0.125 m, followed by RF cavity of 0.5 m and another drift of 0.125
m). Particles are bunched around the central momentum of 0.280 GeV/c. An
integer number of wavelengths that separate two reference particles (n in (1.22))
is 7 (hence the momentum of the second central particle according to (1.2.5) is
0.154 GeV/c) and the initial RF gradient is set to 15 MV/m, the formula for
the RF gradient dependence on the longitudinal coordinate 2, measured from
the start of the buncher, is

, z
Mrf Z V0,rf f '

0 Phase rotation: vacuum channel Of a constant aperture of 0.3 m and a total
length of L = 24 m in a constant. solenoidal ﬁeld of 2 T. An array of RF
cavities separated by the drifts with the parameters calculated according to
logic described in section 1.2.3 so as to perform the rotation of the beam in
the longitudinal space by decelerating higher-energy bunches and accelerating
lower-energy ones (32 cells, each consists of the drift of 0.125 m, followed by
RF cavity of 0.5 m and another drift of 0.125 m). Vernier offset 6 from (1.2.8)

is 0.1, RF gradient is 15 h'lani for all cavities.

0 Cooling: vacuum channel of a constant aperture of 0.3 m and a total length
of L = 93 m in an alternating solenoidal ﬁeld Of the maximum strength a: 2.5
T. An array of the 124 cells (0.75 m each), with LiH absorbers to achieve total
momentum loss and an RF ﬁelds to achieve longitudinal momentum regain, are
combined together in order to cool the transverse emittance of the beam. The
ﬁrst four cells have the solenoidal ﬁeld designed so as to match the transverse
particle dynamics in phase rotation section to the one of the cooling section.

All RF cavities have a frequency of 201.25 MHz, ﬁeld gradient is 18 MV/m and

224

the RF phase is 30 degrees.

This particular design was shorter than the one of the baseline and it was developed to
study the cost gain versus the performance losses resulted from shortening the Front
End. It achieved this tradeoff by removing some of the elements of the baseline.

Another goal was to study the potential applicability of this design for the Muon

Collider project [46].

We used the described lattice as a reference design and explored its performance

related to changes in the following control parameters:

RF frequency in the cooling section (also influences the downstream accelerator

section): Vrf,cool 6 [200.204] MHz.

RF ﬁeld gradient. in the cooling section: Vrfcool 6 [12,20] h‘lVy’m.

RF ﬁeld phase in the cooling section: “prfcool 6 [0,360] degrees.

Central momentum in the ﬁrst four matching sections of the cooling channel:

pc,match_cool 6 [022,024] GeV/c.

The values of the cuts for ECALC9 analysis were selected to estimate of the accep-

tance of the subsequent acceleration subsystem [143]:

minimum and maximum p;: 0.100 GeV/c and 0.300 GeV/c, correspont'lingly;

o transverse acceptance cut: 30E—3 m-rad;

longitudinal acceptance cut: 0.25 m-rad;

RF frequency for the bucket calculation set to a value used by RF cavities of

the cooling section (on of the control parameters).

The number of particles within the speciﬁed acceptance (n2) [67] at the end of the
lattice was chosen as an objective value to be maximized. The initial number of
particles was kept constant so the targeted quantity was the production ratio. GATool
parameters were set to default values (see Figure 8.1, p.288) and the population size
for this 4-dimensional problem was set to 250 (dimension x 62.5). Such choice provided
a good compromise between the total time of the search given the expensiveness of
the objective function calculation (see below), and the quality of the GATool result
(see studies in section 2.3.6).

Several of the best obtained results (elite in GATool terminology) from three runs
(each of them took several months to complete on a single machine) were evaluated
using the described scheme and the full initial number of particles of 8000 (2000 of
which were used in optimization to reduce computation time). The control parameters
and the objective function values for the discovered designs are listed along with the
reference design provided by Neuffer [143] in Table 4.4. The range of the values
of the objective function that. was Obtained during the Optimization runs is 15 to
497. From the table it can be seen that. the optimization of the current scheme with
control parameters in the speciﬁed ranges was unable to achieve designs that have
statistically (simulation includes stochastic processes) signiﬁcantly better production
efficiency. Although this cannot serve as a rigorous proof of the nonexistence of such
designs, we can take into account generally good performance of the GATool on other
problems (see section 2.3.4) and suggest that this gives a good reason to believe that
the reference design is, indeed, optimal. Relatively small deviations of the Optimal
RF frequency (20120—20155) and RF gradient (17.67-18.88) among all the solutions
(except for the first optimization run) support the assumption about the reference

parameters being robust and located near the global Optimum. This observation is

226

particularly important, since the parameters of the devices that are calculated by
numerical simulations eventually have to be realized in the physical devices Operating
with ﬁnite precision and subject to construction errors.

The best solution obtained from the ﬁrst optimization run, demonstrated one of
the best performances and, along with it, the smallest RF gradient among other solu-
tions. Smaller-gradient RFs are generally easier/cheaper to manufacture, which can
signiﬁcantly reduce in the potential cost of the lattice. However, the ﬁnal frequency
from this solution is different from the frequency of the current baseline accelerating
section. Hence additional studies on the combined performance are needed to reveal
additional beneﬁts and drawbacks of this solution. The best solution from the third
Optimization run provides similar performance on a frequency that is much closer to
the reference 201.25 and thus might be preferable. Some of the other sets of parame-
ters that provide similar production performances can also be useful since they might
be easier or cheaper to Obtain, or provide additional opportunities for the designers
of the downstream sections of the Neutrino Factory. Hence the main beneﬁts of the
GATool for this problem are in the exploration Of the space of the solutions that
provide some interesting candidate solutions. It would otherwise only be possible
through extensive trial and error, similar to the method used to obtain the reference
solution over the years of the research. Also we veriﬁed the quality of this solution
obtained through such a laborious process.

Yet another important result that was Obtained is that we established a framework
for the Neutrino Factory Front. End lattice numerical Optimization. It can be used for
many optimization scenarios, including, for example, a simultaneous Optimization of
all control parameters of the most realistic Front. End simulation on the large ranges

of the parameter values.

227

A555 Ase:
monEwQ ooow 2: :8 SA: :oﬁesvoa 2: :0 “03:8 :2vo chcoovAA EV :cAuaNAEAEo :mAmmc cam— Acoi «A: .Ao SAN—mom "Ev oEmH.

 

 

BN3 5...: AAWANAAV 8A. 8N5 E..NN we: ONAON 55A .5: .55. EN

ANQN. a 83 ANANQ ...NA. ANNN. 85: we: NEON :5 5A. .5: .36 RN.

AA.AN. e E: AALMN. 8 ANA. MANNAL 858 A3: A.N.AAAN :8 EN .5: :5. EN

AAAAN. e 5.: Ac o.A.N 8 EA. eNNd ONoNA we: A.N.SN 55A .5 .5: .30 EN

ANAN. 8 S: An m.A.N 8 SA. eNNs 3...: we: NA..AoN :3 EN .5: .50 A53. 3
Ac NNS Ne: AN. N..A.N e 5AA. NNNAA 2A.: 5: SAN :8 we .5: .55 En my.
ANNN. 8 ﬂ: AN NA.N. 8 AAA. eNNd 3A.: 8.: NNAAAN 55A .5 .5: :5. RN

ANNN. 8 cm: AA. MN. 8 EA. ONNs ENeN 3.: SEN :3 .5: :5. A5

AmNNa Nw: Ac FAN 8 Na. .82. 53.: 8.: NEON A5: .5: .55 RN.

AA.NN.e a: As A..A.N e on. NNNd ANN: t: eA..AA.N 5.5 5: .5: :5. EN.

ANANA: 5.: ANANE we. ANNA: 25cm 5:: A..N.8N Ame: 553555 85:35

33:8 3555A _e\>eo_ :35: E93: 3:);

Socw\m5 m: AooonS m: 300138586: 68.23 38.3» Goof: mcepeﬁﬁem

 

 

 

 

 

 

 

 

 

An additional note must be made on performance. Even though 2000 particles
were used for simulations during optimization instead of the 8000 used in the baseline
simulations, one simulation run of the optimized front end lattice still took approxi-
mately 0.4 hours on a Pentium IV 2Mhz computer with le of memory. Therefore
the calculations needed to perform one step of GATool Optimization (which include
the evaluation of the objective function for every population member, see section
2.3) took approximately 100 hours. Since a typical number of generations needed
for GATool to explore a space of parameters and converge can easily get above 100,
the computing time can get prohibitively long. For subsequent studies (possibly on
more realistic and thus more computationally expensive lattices), a potentially bene-
ficial strategy is to employ the parallelization of the objective function evaluation as

discussed in section 2.3.4.

4.4 Conclusions

In this section we considered an application of the GATool evolutionary optimizer
presented in this work to set of the cases covering a broad spectrum of the problems

from the accelerator design ﬁeld:

0 an example of the quadrupole stigmatic imaging triplet design that is relatively
simple on the surface yet demonstrates certain complexities under closer in-
vestigation (this problem is also directly connected to the real-life complicated

problem of the collider interaction region design);
0 an estimation of particle dynamics stability via a normal form defect function;

0 optimization of the control parameters of the front end for the next generation

accelerator. All of these optimization problems are formulated in such a way

229

that their objective functions are either very hard to treat for most standard
optimizers or are even non-treatable due to their small domains of attraction
for the extrema, highly oscillatory landscape that contains a large number of
local extrema, or, as in the third problem, general unavailability of the function
in the algebraic form (it is computed through numerical simulations) and its

stochasticity (some devices in accelerators are simulated with stochastic effects).

We demonstrated how GATool is capable of solving these problems (or helping to
solve by dramatically reducing the computational time as is the case for the normal
form defect function Optimization). It proved itself to possess attractive features such
as: moderate requirements on the computational resources, no requirements on the
objective function except for the trivial ability to compute its value for a given set of
control parameters, and a surprisingly high (considering such modern requirements)
quality of the obtained result.

GATool also demonstrated a very useful ability to ﬁnd furtive or unpredicted
solutions to accelerator design problems, thus enabling easy exploration of the space
of the different optimal solutions. Exploration phase usually require a domain expert,
a lot of the time spent. in trial and error, with fair amount of intuition and even blind
luck. With the help of GATool both the initial exploration of the design and the ﬁnal
ﬁne tuning phases become much more efﬁcient and/or rich since the number of the
solutions that can be considered and their quality dramatically increase with almost
no additional human effort. Summing up all the evidences we conclude that GATool
demonstrated itself to be a valuable addition to a tool set of a modern accelerator

scientist.

230

APPENDICES

231

APPENDIX A

COSY++ Macroprogramming

Extension for COSY Inﬁnity

A.1 COSYScript

A.1.1 Introduction

COS Y Inﬁnity [22,23,48] is a powerful software package for scientiﬁc computations.
It was originally created by Dr. Martin Berz [14], who is currently maintaining and
further developing the package with Dr. Kyoko Makino. Contributions, additions and
enhancements, accumulated for over a decade, experience and feedback obtained from
different users working on different scientiﬁc problems, along with careful design and
the implementation have made this powerful tool even better.

It is built around the Fortran77 kernel which implements the Differential Algebra
arithmetic [16]. Other packages implement graphical interface, optimization methods,
and even own scripting language interpreter. This language is called COS YScm’pt and

has simple yet rich syntax closely resembling Pascal. Despite a relatively small number

232

of built-in Operators, this language gives a demanding user full access to not only
real numbers, vectors (with Optimizations for vectorizing supercomputers), complex
numbers, logical type variables, and strings but also to such complex data types as
Interval Numbers, Differential Algebra Vectors, Taylor Models [118,121] and their
complex arithmetic via a transparent set Of operators and functions. The concept of
polymorphism from Object-Oriented programming is carefully used in design to allow
a user to easily switch between different. data types and mix them in calculations, thus
giving this package the ability to easily manage the complexity of computations, type
and precision Of the Obtained results.

Another interesting feature Of the language is an optimization built into the stan-
dard syntax. After each step of the Optimization process with one of three built-in
optimizers (each one has its own strengths and weaknesses) a user is given the current.
best value and execution control, and is from there free to make decisions about the
subsequent execution flow. This feature allows one to build complex Optimization
scenarios combining automatic Optimization by built-in Optimizers with the user in-
put. It is also worth noting the COS Y Inﬁnity is a multi-platform system and is
capable of producing a graphical output on every platform it is supported on (includ-
ing Windows, Linux and MaxOS), interfaces modern programming languages such as
C+ + and Fortan90 and, ﬁnally, it is easily extensible. The code base is still under
active development, new features are being developed and added, and, as such, are
available to users upon request. The general policy is to include them in a standard
distribution available to the entire user community only after extensive testing and
ﬁne-tuning. This policy ensures that system remains consistent and robust. Current
features under development includes the language—level parallelism and high-precision

numbers arithmetic.

233

Such a framework provides a user with a versatile set Of tools capable Of ele-
gantly solving a lot Of otherwise computationally hard (or even unsolved) problems.

COSYScript applications include, but are not limited to:

high-order Automatic Differentiation of the functions [4],

— the verified and non-veriﬁed integration of the Ordinary Differential Equations and

Differential Algebraic Equations [168],

rigorous and veriﬁed numerical methods with highly suppressed dependencies [168],

— rigorous global Optimization [118],

Beam Theory, where it is applied to a variety Of problem, e.g. the analysis of

the high-order effects, high-order fringe ﬁelds treatment in accelerators and
spectrographs, rigorous long-term stability studies [15,19,21,24,25,76,96,113~

116,119,120,123,124,126,154,177 .

A.2 Syntax

From a programmer’s point. of view COSY Inﬁnity consists of the three main parts:

1. An elementary Operations package written in Fortran 77 which implements oper-
ations on various COSY data types such as Differential Algebra, Taylor Models

and interval arithmetic: dafox.f.

2. An optimization package (foxﬁt.f), a graphics package (foxgraf.f) and a com-
piler and executor package (foxy.f) which combines all these to implement the
COSYScript language and COSY Inﬁnity front. end. All these are written in

Fortran 77, same as the kernel.

234

3. Packages written in COS YScript: the Beam Theory package (cosy.fox), the
Rigorous Computing package (TM.f0x), the Rigorous Global Optimizer and

Rigorous Global Integrator (currently distributed separately).

Although it is such an extensive and developed, environment for scientiﬁc compu-
tations, COS Y Inﬁnity still has some aspects that could beneﬁt from improvement.
At the time COSY Inﬁnity was initially designed and developed it was not possi-
ble to predict how successful it. would be. Thus, the built—in mechanism to allow
COS YScript code to be modular was relatively simple and rudimentary (by the code
modularity we hereby mean the ability to store the source code in more or less self~
contained modules). When the code is modular, most Of the services are provided by
well-deﬁned interfaces to these modules with the implementation details hidden from
a user.

A common program in such a framework simply includes the required modules to
import their services and call the imported procedures and functions. The amount of
the code in the modules is typically much larger than the code in the user program.
If a user decides to build a larger program or a set Of programs, he might want to
implement. some functionality in his own modules. This approach allows code to be
clean and well-structured and permits easy reuse Of the already written code. In such
cases the amount of the user code can be comparable to one of the modules shipped
with the system itself.

In general, COSYScript program is a set of nested blocks and each of them consists
of three sections. The blocks are marked by the beginning and ending statements for

the outermost block (the main program block):

 

 

BEGIN ;

END;

 

 

and for all the inner blocks:

 

 

FUNCTION <name> {<args>};

ENDFUNCTION ;

 

OI‘

 

 

PROCEDURE <name> {<args>};

ENDPROCEDURE ;

 

The three sections that build up a block are

1. Variables
2. Nested blocks (functions and procedures)

3. Executable code

placed in this exact order. The ﬁrst two Of these sections are optional and can be
omitted, while the absence of the executable code results in a compilation error.
The structure of a generic COSYScript program along with some hints on the
name scoping rules (for variables, procedures and functions) are demonstrated by
the example in Figure A.1. Note that COSYScript is a case-insensitive language,
hence the variables, functions, and procedures that are deﬁned in the same scope
whose names differ only in case refer to the same variable, function, or procedure,

correspondingly.

A.2.1 Problems

Inclusion Mechanism

While the code’s nested structure makes it tree-like, the original inclusion mechanism

supported by COSY Inﬁnity is linear. It is implemented via a pair of commands:

236

 

 

 

 

BEGIN;
VARIABLE main_var1 1;

VARIABLE main_var8 10;

PROCEDURE Procl arg1 arg2;
VARIABLE proc1_var1 5 2;
PROCEDURE Proc1_Proc1 argl;
VARIABLE proc1_proc1_var1 11;

{ Commentary: code fbr Proc1_Proc1 }
Procl—Proc1_var1 := ’Hello’;
Pr0C1_var1 := ’world!’;

ENDPROCEDURE;
{ Commentary: code fbr Procl }
procl_var1 := ’Goodbye!’;
ENDPROCEDURE;
FUNCTION Funcl argl arg2 arg3;
{ COmmentary: code fbr PUncl }

Funcl := (argl + arg2 + arg3) * main_var1;
ENDFUNCTION;

{ COmmentary: code fbr main block }
write 6 ’Hello, world!’;

END;

 

Figure A.1: COSYScript program structure

237

 

 

 

SAVE <name>

 

and

 

 

INCLUDE <name>

 

The ﬁrst of these commands is used at the end of the COSYScript ﬁle that is included.
It precompiles the source code from the <name>.f0x ﬁle and then saves it in the

binary form to the file ﬁle named <name>.bin. Then the

 

 

INCLUDE <name>

 

command used in some other ﬁle includes the compiled <name>.bin into it. How-
ever, each ﬁle can contain only one inclusion command (at the beginning), thereby

only linear “chain inclusion” is supported. The ﬁrst ﬁle calls

 

 

SAVE <name>

 

in its last line and every next ﬁle starts from

 

 

INCLUDE <name_previous>

 

and ends with

 

 

SAVE <name_current>

 

in order to incrementally save both the code it includes from the previous ﬁles and

the code it contains. The last ﬁle starts with

 

 

INCLUDE <name_previous>

 

and includes all the code that was gathered by the name_previous.fox (and thus
the code of all the ﬁles from the inclusion chain).

Note that while the code saved by SAVE is precompiled (which saves the source
processing time), the action of the INCLUDE is essentially equivalent to copying the
content Of the ﬁle that was saved with the SAVE statement (not including the statement

itself) and then replacing corresponding INCLUDE statement with this copied code.

238

 

 

 

 

 

 

 

The problem with this approach is in the fact that the sections in the blocks of
any COS YScript program have to be exactly in the order mentioned previously and
cannot be broken into parts. Suppose the ﬁrst ﬁle in the inclusion chain Opens the
main block with BEGIN, then speciﬁes variables, functions, and procedures, and then
precompiles and saves its contents as described.

The next ﬁle includes it in-place, thereby starting where the previous ﬁle left off,
i.e. in the middle of the section with functions and procedures. In this case some
of them are already deﬁned in the included ﬁles. Thus the including ﬁle cannot
add variables to the main block, rather it can only continue adding functions and
procedures to the Nested Blocks section, and then some code to the Executable Code
section. If the ﬁrst. ﬁle is saved only after some code is added to the Executable Code
section, then the next ﬁle in inclusion chain can only add code to the Executable Code
section of the main block; it has no access to Variables or Nested Blocks sections.

The frequently used workaround for such an approach is to deﬁne the procedure
called RUN and then use it as shown in Figure A.2. Note that we have to put END at
the end of this code even though we did not explicitly put a corresponding BEGIN; to
this ﬁle. It is hidden behind the INCLUDE statement in the ﬁle we are including.

Names deﬁned in the enclosing block are visible and accessible in the enclosed
block. They can even be overridden by the names local to the enclosed block. How-
ever, the Opposite is not true: the variables deﬁned in the enclosed block are not
visible to the enclosing block and thus cannot be used by it. This is perfectly legit
from the encapsulation point of view, since the external block should be isolated from
the intrinsic implementation details Of the internal block. This concept is successfully

applied to enhance modularity in many existing programming languages.

239

 

 

INCLUDE ’file’
PROCEDURE RUN;
{ Private variables }
VARIABLE i 1;

{ Private nested blocks }
FUNCTION max 8 b;

ENDFUNCTION;
{ Private code }
i := max(1,2);

ENDPROCEDURE;
{ Main block code, just call the package code }
RUN;

END;

 

Figure A.2: Example of inclusion workaround

Suppose, however, we want to save the code in the ﬁle from Figure A2 and then
include the result into another ﬁle. We have several options: we can save the code
inside the RUN procedure, outside of it, in the main block before the ﬁrst line of
code, in the nested blocks section, or in the code section Of the main block. Let
us consider each of these Options. If we save the ﬁle anywhere outside of the RUN
procedure, and then include it in the next ﬁle in chain, the next ﬁle can only call
the RUN procedure. It will have no access to the variables, procedures or functions
deﬁned in the RUN procedure. Hence either RUN should serve as a call dispatcher,
providing the access to the names defined inside it (which is a cumbersome and hardly
maintainahle solution), or we have to call SAVE inside the RUN procedure. But then
with the next ﬁle in chain we will be in the same situation: it would have to deﬁne

its own RUN enclosed in the RUN of the previous ﬁle in chain. As the chain grows, the

240

 

resulting ﬁle ﬁlel

 

 

 

filel code filel code
file2 code SAVE filel;

 

 

 

ﬁle2

 

INCLUDE filel;
file2 code
SAVE file2;

 

 

 

ﬁle11

 

... INCLUDE filelO;
filell code filell code

 

 

 

 

 

Figure A.3: Example Of the inclusion chain

level Of nesting increases, and keeping track of the names visibility and nesting level
can become an issue.

Consider the case where a user did not build the code of the whole chain but
receives only a ﬁle in precompiled binary form. He than has no way to determine
the number of ﬁles that was used to compile the received ﬁle and the combination
of methods they used to COpe with the nesting problem in inclusion. It makes the
derivation Of the proper syntax for ending statements a matter of trial and error and
clutters user’s program with unneeded code.

We can see that such an approach to inclusion is suitable for small projects.
However, for large projects, especially the ones developed by by several authors, it
does not provide enough support to maintain modularity of the program and manage
the dependencies transparently. All these consequences follow from each ﬁle in the
inclusion chain not being a syntactically separate compilation unit and is bound to

be made aware of its place in the chain of inclusion (see Figure A.3).

241

Size of Dynamic Global Variables

COS YScript is an interpreted language, and it each program is interpreted from
beginning to end, following a nested blocks structure. In each block interpreter starts
from the variables section, processing all the variables declarations and storing the
corresponding entries in the symbols table. Then, it checks the declarations of the
procedures and functions in the Nested Blocks section without looking into their
code. It then proceeds to the Executable Code. When the procedure or function call
is found, the interpreter searches for the name of the procedure or function in the
symbol table, initializes the arguments (provided the name was found) and recursively
continues interpreting the nested block deﬁned by a procedure or function, thereby
following the tree-like structure of the program.

COS YScript uses the following syntax for the variable declaration:

 

 

VARIABLE <name> <size> { <dim1> {<d1'm2>...}};

 

Here name is a variable name, size is a variable size (where unit is the size of REAL data
type), and dimensions are used to make multi-dimensional arrays of the Objects. Note
that the type of the variable is not declared and is deduced dynamically at run time.
In this deﬁnition, size and dimensions can be valid COS YScript expressions. The
validity of the expression includes both the syntactic validity (i.e. it should form a
syntactically correct COSYScript expression) and interpretation validity (i.e. should
be interpretable: all the names used in the expression must have their declarations
already processed). It is thus possible to have the size of a variable or its dimensions
to be set by nother variables, but the variable containing size must be declared earlier.

Consider an example in Figure A.4 where we want the variable dynamic_var to be
declared such that its size is deﬁned by another variable. Here we want to set the size

of the dynamic_var to 10 but this is not what would happen. The declaration of the

242

 

 

 

VARIABLE size 1;
VARIABLE dynamic_var size;

 

Figure AA: Dynamic size Of the variable in COSYScript (non-working)

dynamic_var is processed after the declaration of the size variable but before the size
variable gets the value of 10. The syntax of the variable declaration in COS YScript
does not allow to deﬁne the initial variable value. All variables in COSYScript are
initially initialized to RE(O) which is 0 of the REAL data type. Hence the dynamic_var
in the example is not declared as we intended. It is not. even a valid declaration of a
COSYScript variable since it really declares a variable of zero size and in COSYScript
variable size must be a positive integer.

To solve this problem it is currently suggested to use a technique similar to the
one to avoid the inclusion problems [23]. Some procedure or function encloses the
variable whose size is to be deﬁned by another variable, thus placing the size variable
in the outer block. Going back to the example in the Figure A.4, we modify it using
the current guidelines to get a proper dynamically sized variable deﬁnition. Such
modiﬁcation ensures that both a declaration and an initialization of the variable size
are done before the declaration of the dynamic_var thus we achieve the result desired.

However, there are two problems with the proposed solution. First, it clutters
a program with the unnecessary code, second, it cannot. be used for the variables
declared in the main block since it serves as a root of the tree of all blocks. Hence it
is therefore not. enclosed by any other block where the size variable can be declared

and initialized.

243

 

 

 

VARIABLE size 1;
PROCEDURE RUN;
VARIABLE dynamic_var size;
ENDPROCEDURE;
size := 10;

RUN ;

 

Figure A.5: Dynamic size of the variable in COSYScript (working)

User Interface

Apart from the problems already mentioned, the execution options for the
COSYScript interpreter are very limited. In fact, it does not accept any Options
directly from user. During startup, it looks for the ﬁle foxyinp.dat in the current
directory, reads its first. string and then displays the COS Y Inﬁnity logo and the user
prompt. In the prompt it allows the user to either enter a new COS YScript ﬁlename
for execution or press «Enter» to execute the ﬁle read from foxyinp.dat. If the
ﬁlename is correct, it interprets the ﬁle and displays either the list of interpretation
errors if interpretation fails or the output of the program if it succeeds. The text out-
put is shown in COSY Inﬁnity window, optional (depending on the driver selected)
windows containing the graphical output can be opened separately.

After executing the script, the ﬁle interpreter exits, leaving two intermediate pro-
cessing ﬁles with the names <name>.cod and <name>.lis (useful for debugging)

and the ﬁles produced by the script. The user cannot specify the name of the script. to

244

 

execute from the command line, cannot ask to cleanup the intermediate ﬁles, specify
the path to search for the ﬁles to include or pass command line arguments to the
COSYScript script. The command to call external programs through the OS shell
was added to COS YScript just recently so the only way to interface COSY Inﬁnity
was through the ﬁles containing input parameters and output results.

Much of these limitations for such a powerful language and framework are at-
tributes to the fact. that it was designed and developed by a very small group of
people in a very limited amount of time with primary focus on scientiﬁc methods and
new algorithms. User interface was never a top priority. Enhancing COS Y Inﬁnity
user interface and providing tight integration with another well-developed and ma-
ture programming language to use some Of its features seemed as a good approach to

make the scientiﬁc computing features of COSY Inﬁnity more attractive to users.

A.3 COSY++

A.3.1 Introduction and Features

To address the problems described in the previous section and enrich the user’s ex-
perience with COSY Inﬁnity we designed and implemented the COSY! r" extension

package. Its main features include, but are not. limited to:

- new mechanism for COSYScript source ﬁles inclusion that provides better separa—

tion of modules from user code;

- Active Blocks mechanism that enables the use ofthe Perl programming language
( [153]) as a macrolanguage for COS YScript programs. Applications: con-

ditional COS YScript code generation, command—line arguments processing,

 

macroprogramming, data preprocessing, and COS YScript libraries conﬁgura-

tion;

— new libraries for vector manipulations, coordinate conversions to/from MARS [137]

and ICOOL [68], logging, timing scripts execution and debug output;

— GATool library for real-valued functions Optimization with callback interface (al-
lows a user to interact with the Optimizer on each step, ﬁts into the general

COS Y Inﬁnity optimization package design (see Appendix B for details);

— front end for the COS YScript interpreter that allows a user to specify the script
(and even several scripts) to interpret and execute from the command line, pass
command line arguments to the scripts, use different COSY Inﬁnity executables
(e.g. complied with different compiler optimizations to benchmark the speed
and precision), perform cleanup after an execution, and save script output to
a ﬁle and specify a search path for the library ﬁles (this allows user to store
common libraries in one central location and use them from any script in any

directory);
— automatic conversion of old COS YScript scripts, i.e. compatibility mode.

We will now describe these features and implementation details in greater depth.

A.3.2 Sections Assembler

Sections Assembler addresses inclusion mechanism problems discussed in section A.2.
Instead Of linear inclusion it uses the concept of assembly. In order to produce the
resulting ﬁle from the user’s code and the code Of the modules (libraries) it assembles

them together using the structure of the COSYScript program, its blocks ordering,

246

and special markup commands inside COS YScript comments (this makes them trans-
parent tO the conventional COS YScript interpreter).

As we discussed in section A2), a COSYScript program is con-
structed from blocks. Each of these is built from the sections

(some can be omitted). Consider the root. block that consists from

HEADER section

 

 

BEGIN ;

 

VARIABLES section

 

 

VARIABLE main_var1 1;

VARIABLE main_var8 10;

 

FUNCTIONS section

 

 

PROCEDURE Procl arg1 arg2;
VARIABLE procharl S 2;

PROCEDURE ProcLProcl argl;
VARIABLE prochrocharl 11;

{ Commentary: code fbr Proc1_Proc1 }
Procl—Proc1_var1 := 'Hello’;
Proc1_var1 := ’world!';

ENDPROCEDURE;
{ Commentary: code fbr Procl }
proc1_var1 := ’Goodbyel’;

ENDPROCEDURE;
FUNCTION Funcl arg1 arg2 arg3;
{ Commentary: code for Funcl }

Funcl := (argl + arg2 + arg3) * main_var1;
ENDFUNCIION ;

 

247

 

 

 

CODE section

 

 

{ Commentary: code fbr main block }
write 6 'Hello, world!’;

 

FOOTER section

 

END;

 

 

The goal here is to ﬁrst mark up sections of the root blocks of the COSYScript ﬁles.
Then, if a COS YScript ﬁle requests to add marked up ﬁle to an assembly, the result
is created by merging the corresponding sections of the included ﬁle with the ones of
the including ﬁle, instead of performing in-place insertion.

Intrinsically, the assembly is represented by a list of sections. All the contents
of these sections are empty on initialization. The included ﬁle is parsed into the
sections, and these sections are added to the assembly, and then the same procedure
is repeated for the including ﬁle. Thus the including ﬁle’s sections are effectively
appended to the corresponding sections of the included ﬁle. If there are more ﬁles
to assemble, the process becomes multi-step. The assembler goes through the list of
the ﬁles to assemble, parsing them into sections and adding sections to the assembly.
When the parsing process is completed, sections are output to the assembled ﬁle in
the speciﬁed order. Thus they form correctly structured COS YScript code which can
then be interpreted by COS YScript.

As an example, consider the two files with their sections marked up in Figure A.6.
The result that comes out of the Sections Assembler after processing and merging is
in Figure A.7. Note that the function FUNCl and the variable F001 used in including
ﬁle were not. declared in it, hence any attempt to use this ﬁle without assembly would
result in COS YScript interpretation errors. Also note, that after assembly these

names are correctly ordered, thus the compilation and the execution Of the resulting

248

 

 

ﬁle is possible. The order Of the ’Hello, world—1,2’ statements in the assembled ﬁle
is meant to give hints on the assembly order.
The general rule is that the content of included ﬁles in assembled sections precede

the contents of corresponding sections in including ﬁles. Section mark up statements

 

 

{ #section <name> }

{ #section <name> }

 

are preserved in the form of sections beginning markers in order to increase the
readability of the resulting ﬁle. These statements are transparent for COSYScript
since they look like valid comments to its parser.

The instruction to add a ﬁle to an assembly has the form

 

 

{ #assemble ’file_name' }

 

and can be put anywhere in the ﬁle. Assembly instructions are processed before run-
ning COS YScript interpreter hence they are not ruled by the COS YScript syntax
rules. The name of a ﬁle speciﬁed in the instruction is searched in the current direc-
tory and in the assembly path, which can be speciﬁed via the COS YI I front-end.
Standard sufﬁxes (by default .111 and .fox) can be omitted and will be automatically
appended during library search. See section A.3.8 on conﬁguring the COS Y/ 1 front
end for details on the search path and standard sufﬁxes.

The ﬁle added to assembly can itself contain assembly instructions. In this case
processing continues recursively, forming the assembly tree, which is traversed from
the leaves to the root in the depth-ﬁrst manner. For example, if ﬁlel instructs the
assembler to assemble ﬁle2, which in turn requests to assemble ﬁle3, then every
section’s contents will be built from the contents of the corresponding sections: ﬁrst
from ﬁle3, then from file2 and only then from ﬁlel. Such ordering rules allow ﬁle2

and file3 to be libraries, and for the ﬁle2 to use ﬁle3 (possibly to implement parts

249

 

 

Including ﬁle

Included ﬁle (lib_ﬁle)

 

 

 

{ #assemble 'lib_file’ }

{ #section HEADER }
BEGIN;
{ #endsection }

{ #section VARIABLES }
VARIABLE F002 1;
{ #endsection }

{ #section FUNCTIONS }
PROCEDURE PROCZ;

RES := FUNC1(F001, F002);
ENDPROCEDURE;
{ #endsection }
{ #section CODE }

WRITE 6 ’Hello, world—2!’;
{ #endsection }

{ #section FOOTER }
END;
{ #endsection }

 

{ #section VARIABLES }
VARIABLE F001 1;
{ #endsection }

{ #section FUNCTIONS }
FUNCTION FUNCl;

ENDFUNCTION;
{ #endsection }

{ #section CODE }
WRITE 6 ’Hello, world—1!’;
{ #endsection }

 

 

 

Figure A.6: Including and included ﬁles with their sections marked up

 

 

 

{ #section HEADER }
BEGIN;

{ #section VARIABLES }
VARIABLE F001 1;

VARIABLE F002 1;

{ #section FUNCTIONS }
FUNCTION FUNCl;

mnruucnon;
PROCEDURE PROCZ;
RES := FUNC1(F001, F002);
ma...
{ #section CODE }
WRITE 6 ’Hello, world—1!’;

WRITE 6 ’Hello, world-2!’;

{ #section FOOTER }
END;

 

Figure A.7: Assembled ﬁle

 

of its own functionality). In this example ﬁle3 can be a user application using the
features from the library in ﬁle2 (and thus implicitly from ﬁle3).

The Sections Assembler also has a mechanism to prevent the multiple ﬁle inclusion
into an assembly in order to avoid duplicating code across sections. If the ﬁle that
is already in the assembly tree is encountered again during assembly processing, the
warning is issued and the ﬁle is not processed for the second time around. Note
that this mechanism also prevents the assembly tree from the lOOps (ﬁle3 includes
ﬁle2, which, in turn includes ﬁle3) that can lead to an inﬁnite cycle in the assembly
process.

The algorithm is straightforward: whenever a ﬁlename is encountered in the as-
sembly process, a unique name based on the ﬁle’s base name, size, and creation time
is created. The name generation process is guaranteed to create the same unique
name for the same ﬁle (in the same location with the same attributes) whether it is
speciﬁed by an absolute path or a relative path. This generated name is then looked
up in the list Of the unique ﬁlenames for the ﬁles that are already in assembly. If this
name is found, the ﬁle is considered to be already assembled, otherwise its unique
name is stored in the table and the ﬁle is added to the assembly.

Apart from the HEADER, VARIABLES, FUNCTIONS, CODE and FOOTER
sections described earlier, COS Y; 1 recognizes a DESCRIPTION section that pre-
cedes all other sections in the assembly. It is intended to serve as a placeholder for
the library descriptions but it can also contain assembly instructions and the Ac-
tive Blocks initialization code. An example of the description section taken from the

logging.fh library is shown in Figure A.8.

252

 

{ #section DESCRIPTION }

{**********1?*zhi'**********************************:I'************************}
{ Copyright (C) Michigan State university 2007, All Rights Reserved, }
{ \COSYS{} logging routines library by Alexey Poklonskiy }
{*************************************************************************}
{ Control variables and their defaults }
{ LogLevel = 1 (0 -— logging is off, 1 -- logging is on) }
{ nMAxLogFiles = 10 }
{ iFirstLogFileDescriptor = 50 }
{ LogFilesDir = './log/’ }
{*************************************************************************}

{ #endsection }

 

 

 

Figure A.8: Description section of the logging.fh library

A.3.3 Active Blocks

The idea of Active Blocks is inspired by the Active Server Pages technology developed
by Microsoft [85]. It allows one to build interactive HyperText Markup Language
(HTML) pages by adding the ability to embed VBScript or any other Active Scripting
Engine language code into the otherwise static web page source code.

The embedded code in the HTML text is recognized by special beginning and
ending markers, <% and %>. Everything in between these markers is treated as a code
written in one of the scripting languages mentioned earlier and is executed during
the HTML page rendering phase. Since VBScript is a feature-rich scripting language
with an extensive object model providing access to many OS services, it gives user
a plethora Of tools to make HTML pages dynamic. A detailed description of the
technology is out Of scope for this work and can be found in [85].

Similar design lies in the foundation of COSY; t Active Blocks (ABS). Everything
enclosed by {% and %} markers in the COSYScript source code is treated as Perl

programming language [173] source code and is executed during file processing.

253

There are two types of the Active Blocks:

non-inclusive

 

{%

%}

 

 

 

and

inclusive

 

{%=

%}

 

 

 

The difference between them is that apart from the effects produced by the execu-
tion of the Perl code inside the non-inclusive Active Block, it does not affect the
COS YScript source code in any way. A return value of the inclusive Active Block
is inserted back into the ﬁle and is thus interpreted as a part of the COS YS'cript
program. This makes inclusive Active Blocks useful for conditional COS YScript code
generation. As an example, consider the code of the tracing procedure from the

trace.fh library:

 

 

procedure Trace sMsg;
{%=
my $result = "";
if($TraceLeve1){
$result = " write 6 sMsg;";
}
else{
Sresult = " CONTINUE;";
}
return $result;
%}
endprocedure ;

 

 

If' $TraceLeve1 is set to any nonzero value in one of the earlier Active Blocks, this

254

 

 

 

 

inclusive Active Block deﬁnition inside the COS YScript procedure declaration is re-

placed by:

 

procedure Trace sMsg;
write 6 sMsg;
endprocedure;

 

 

But if it is not set or set to a zero, this Active Block is replaced by:

 

procedure Trace sMsg;
CONTINUE;
endprocedure;

 

 

One more example of Active Block usage comes from the logging library logging.fh:

 

{%

# Create the directory for log files if it was not created and the

# logging is on

if($LogLeve1 and not —d $LogFilesDir){

mkdir($LogFilesDir) or
AB::die(”Can’t create directory \"$LogFiIesDir\” “.
“for log files: 5!");

}

%}

 

 

Here AB is used during library initialization. It checks if the directory to store the
log ﬁles exists and creates it if it does not. In the code from utils.fh, ActiveBlock is

used to access Perl’s random number generator:

 

{ Get the inintial randOm value by using Perl’s rand() which }
{ calls srand() to get random seed automatically }
time := {%= rand(); %}*1000;

 

 

COS YScript does not provide an interface to directory creation and its pseudorandom
number generation capabilities are limited by the fact that it always starts from the
same seed. Therefore these examples clearly demonstrate how COS YS'cript can be
Enhanced by providing access to Perl’s services via Active Blocks. Other examples can

be found in the examples directory of the COSY; / distribution. It. is also worth

255

 

 

noting that all Active Blocks in one COSYScript ﬁle share the same Perl environment
thus they can communicate with each other via shared variables.

In the previous paragraph we briefly considered the aspects of Active Block ex-
ecution. Now we will examine them in more detail. After an Active Block is found
by COS Y/ t , its type (inclusive, non-inclusive) is determined, beginning and ending
marks are stripped, its contents are extracted, and then it is executed as Perl code.
Perl, as well as many other scripting languages, provides dynamic compilation and
execution of its own code during the main program execution. This feature allows a
user to generate subroutines in the run-time or execute code passed to the script as
text. The compilation and execution of this code is performed in Perl via the eval
operator. It accepts a string with Perl code as an argument, compiles it, and then
executes it in the context of the script. AB execution order is the order in which they
are speciﬁed in the source ﬁle.

The problem with this approach is that in this case the code passed to the eval
Operator has access to all variables of the program executing the eval within the
scope of the eval statement. This code can accidentally or intentionally modify
these variables thereby altering the calling program execution flow, and in the worst
case leading to a data corruption or a crash. COSY/ / itself is written in Perl, it
executes ABs written in Perl using the mentioned operator, and thus it is potentially
susceptible to this problem.

The implemented solution is in to use special Perl Safe module which provides so
called “sandboxes” or compartments for a safe code evaluation. A code executed in
such compartment is unaware of the fact that it is being executed this way, but it is
effectively unable to access any data of the program calling eval (and in some cases

even to some of the Perl services) unless such permissions are explicitly granted. This

 

 

way all the sensitive COSY I I data is protected from the code in Active Blocks and
all its services are provided only through the well-deﬁned interface described later.

There are two different symbol scoping mechanisms in Perl: dynamic and lexical
(for furhter details on Perl programming refer to [173]).

Dynamically scoped variables are accessible globally. They always belong to some
package and can be accessed via fully a qualiﬁed name (e.g. $Package::Variab1e).
Alternatively, a short name (e.g. $Variab1e) can be used as an equivalent
to a fullly qualiﬁed name built using the current package maintained by Perl
($CurPackage: :Variable). The current package can be set via package <name> Oper-
ator, default package name is main. A current package declaration is in effect from
the place it was made in the current program to the end of the block it is speciﬁed
in.

Lexically scoped variables visibility is deﬁned by the blocks they are declared
in. Their seeping rules are the same as the one of the COSYScript variables, i.e
they are accessible after their declaration from the block they are deﬁned in and all
the blocks enclosed by the block they are deﬁned in. The difference between Perl’s
lexically scoped variables and COSYScript variables is that Perl’s variables can be
declared anywhere in the block since Perl does not separate the block into sections
while COS YScript does and allows variable declarations only in the ﬁrst of them. The
lexically scoped variables do not belong to any package and are independent from the
package statements.

The reason for such an extensive treatment of the seemingly irrelevant tOpic is that
internally individual Active Blocks are executed as if they were different blocks of the
same Perl program. Hence all the hereby mentioned seeping rules apply to them. It.

is possible for Active Blocks to communicate and control each other via this shared

257

 

 

 

 

COSYScript ﬁle Corresponding Perl ﬁle
{ #package PackageName } {
COSYScript code package PackageName;
{% Active Block 1 code
Active Block 1 code }
%} {
\COSYS{} code package PackageName;
{%= Active Block 2 code
Active Block 2 code
%}
\COSYS {} code package PackageName;
{% Active Block 3 code
Active Block 3 code }
%}

 

 

 

 

 

Figure A.9: COSYScript ﬁle as a Perl ﬁle view from Active Blocks perspecitve

execution environment, deﬁned by variables and functions. Active Blocks execution
order is the order in which they are declared in the source ﬁle. The representation of
the COS YScript ﬁle containing Active Blocks as a Perl script, shared by all Active
Blocks, is demonstrated in Figure A.9. Note that since Active Block 2 is of the
inclusive type, its return value (deﬁned by Perl return statement or by the result of
the evaluation of the last statement in a block if the return statement is not speciﬁed)
is inserted into COS YScript program after evaluation.

Therefore in the Active Block body dynamic variables used without package name
are binded to the package PackageName. For example, unqualiﬁed dynamic variable
SVariable in any of the Active Blocks 1,2,3 in the above example is really a qualiﬁed
dynamic variable $PackageName::Variab1e. Dynamic variables accessed via a fully
qualiﬁed name with a package name explicitly given, are bound to this package, e.g.

$AnotherPackageName: :Variable. Dynamic variables are accessible from anywhere in

258

l

 

 

the program across the AB boundaries and, as though, are useful to pass information
between different Active Blocks in the same COS YScript program.

Lexical variables are declared via my Perl Operator. They are visible only in the
lexical block they are declared in, thus they cannot be accecced outside the Active
Block itself. If the AB consist Of several Perl blocks, then each block’s lexical variables
are visible in the corresponding block only.

PackageName in the example earlier is a package name for the current ﬁle. It is
either an automatically generated unique name based on the ﬁle name or a name set

by user via package pragma:

 

 

{ #package package_name }

 

 

In the case there are several package name declarations. the name from the ﬁrst
pragma is taken.

In both cases PackageName is set once per ﬁle. Practically user can change it
from Active Blocks using Perl, but such practice is not recommended in order to
avoid possible problems from such unexpected behaviour. We recommend to name
packages to give some clue on the services provided by the package. For example,
timers.fh library uses Timers package name (thus the variables are syntactically
COSY: :Timers: : package), logging.fh uses Logging package name (thus the variables
are syntactically in COSY: :Loggingzz package). Note however, that cosy.fh library
uses BeamTheory package name as it better describes the library’s services.

A user’s code can omit package name declaration and rely on the automatic
package name generation unless his intent is to build library to share with other
people. In this case it strongly advised to invest some time to chose a descrip~
tive package name in order to make the library conﬁguration variables accessible via

COSY: :<PackageName>:: preﬁx. The unique package names generated for the ﬁle in

259

 

 

the absence of the explicit deﬁnition by package pragma are not guaranteed not to
change in the next COSY+ I run.

Also note, that Perl itself internally maintains _PACKAGE_ variable to store the
current package’s name. It is often used by the COS Y+ ~-/ libraries to mark their

variables and functions sections:

 

 

{%= "{s‘m’n'r ”._PACKAGE_." functions *M}\n" %}

 

which is replaced by the COSYScript commentary

 

 

{st-ah!- package_name functions *zht'}

 

during COSY/ /' processing.

Based on the experience we suggest the following scheme of the Perl variables
usage in Active Blocks: in order to create the variables used in the current Active
Block only, use the lexical scoping. For the variables accessible in all Active Blocks
of a single ﬁle (e.g. ﬁle-speciﬁc conﬁguration Options and flags) use the unqualiﬁed
dynamic variables (and it is advised not to change the current package in Active
Blocks to be able to access them). In order to create variables visible for other ﬁles
(e.g. public conﬁguration Options) or access such variables deﬁned in other ﬁles,
use the dynamic variables that are fully qualiﬁed by the name of the package. For
example, SCOSY: :Timers: :nMaxTimers is a variable to deﬁne a maximum number of
timers supported by a timers library and it can be accessed from any ﬁle not just
from the ﬁle with the library itself. The dynamic variables should be used with great
care and be properly initialized prior to their usage since they are accessible from
anywhere in the program and thus can be accidentally or intentionally set to the

unexpected values thus causing the problems that are hard to debug.

260

 

 

 

 

A.3.4 Full Processing

SO far we reviewed the individual pieces of the COS Y/ I, here we review the gen-
eral scheme it follows to process a ﬁle. When the ﬁle for processing is speciﬁed by
user, COS Y4 I initializes a new processing session by setting the parameters Of the
processing, checking the ﬁle for existence, creating and initializing safe compartments
for Active Blocks execution. Then it creates new Sections Assembler, initializes it
and ﬁnally adds the speciﬁed ﬁle to the assembly. Before parsing the ﬁle into the

sections and creating the assembly, Sections Assembler pre-processes ﬁle’s pragmas
and Active Blocks in the three passes:
1. package pragmas: searches for the ﬁrst one and saves the package name for

Active Blocks evaluation;

2. Active Blocks: executes all in the respective package; replaces their deﬁnitions

in the source code with a result of the execution if AB is inclusive or with

nothing, if non-inclusive);
3. All assembly pragmas as described in section A.3.2.

Note that all Active Blocks in the ﬁle are processed and executed prior to assembly
pragmas processing, whether they are located before or after the pragmas.

For the assembly pragmas processing the logic is the following: every assembly
rsragma is parsed, theﬁle name to assemble is extracted and added to the assembly.
Sections Assembler then pre-processes this ﬁle using the same 3—pass processing and
adds it to the assembly. On the third pass it can encounter other assembly pragmas
Idence the assembly process recursively continues forming assembly tree along the way.

If we consider an assembly tree with the assemblying ﬁles on the top and the

assembled ﬁles on the bottom, then the Active Blocks are executed starting from the

261

root (the ﬁrst ﬁle added to the assembly) to the leaves, depth ﬁrst. The sections,
in contrary, are assembled from the leaves to root, depth ﬁrst. This approach allow
user to conﬁgure libraries they are adding to the assembly before adding their code.
This is the common logic of the source ﬁle inclusion processing used by many other

programming languages and maerolanguages. Note however, two big differences with

the other languages:

0 All Active Blocks in a ﬁle are executed prior to processing the ﬁrst assembly

pragma.

0 Assembly method works differently from the commonly used whole—ﬁle—in-place

inclusion method as it was described earlier.

A.3.5 Libraries

Libraries generally provide some additional functionality to a user and are pack-
agded together by using a common factor in the services they provide. For
example, logging.fh library provides functions to open, write to and close
log ﬁes, conversions.fh provides beam physics coordinate conversion routines,
utils.fh provides various utility functions to operate on vectors and other built-
in COSYScript datatypes. Some of the libraries included to COSY! I pack-
age provide conﬁguration interfaces via Active Block variables, for example
SLogLevel, SnMaxLogFiles, $iFirstLogFileDescriptor, $LogFilesDir in the log-
gingfh. Special care must be taken in providing the default values for these variables
Since a user can change them prior to using a library in order to conﬁgure it, so we
need to check if the variable is assigned a value prior to its initialization to a default

Value in the library.

262

Other issue is that the errors in Active Blocks does not. stop the processing of the
ﬁle and assembly, these errors are trapped in eval’shown to a user during processing
and then ignored. Sometimes the severity of these errors is such that we want to stop
further processing and exit with error message. Since Active Blocks are executed by
Perl in safe compartments (see the details before), termination of the processing is

not possible unless explicitly permitted to.

To address these (and, possibly, other) issues, the AB package provides two sub-
routines: init_config_var and die. First subroutine takes a conﬁguration variable
name and a reference to its initial value (could be of the SCALAR, ARRAY, HASH
Perl’s basic data types [173]) or SCALAR, value in case of the scalar variable ini-
t ialization (a shortcut for the most frequently used datatype). Then it checks if the
T. his variable with this data type was not already assigned a value. If it is not, then
i t. is initialized with the provided value, otherwise the subroutine does nothing thus
preserving the value set by user or some other library. The second subroutine, die,
takes one argument: a message to show to a user. It outputs this message to the
Standard error stream and then stops the subsequent processing. Note, that from
.Active Blocks these two subroutines should be called as AB: :init_config_var and
AB: :die correspondingly.

A self-explanatory example of the library variablees initialization is taken from

logging.fh:

 

~{%
AB::init_config_var(LogLeve1, 1);
AB::init_config_var(nMaxLogFiles. 10);
AB::init_config_var(iFirstLogFileDescriptor, SO);
AB::init_config_var(LogFilesDir, ”./Iog/");

 

[36}

263

 

A.3.6 Compatibility Mode

In order to make user experience with the new system as painless as possible and to

make it possible to reuse the large code base accumulated over the years Of COS Y
Inﬁnity active usage, the old syntax conversion mode is added to COSY/ /' . Before
processing a ﬁle COSY! r tries to determine if the ﬁle was is written in the pure
COS YScript, by checking for section pragmas and Active Blocks. If they are not
found, the ﬁle is considered to be using the old syntax and the attempt to convert
it. to the new syntax is performed prior to the further processing. After replacing
original COSYScript inclusion with the assembly instructions and marking up the
Sections automatically, the ﬁle is processed by COSY?!’ I as a ﬁle written using all
t be new features.

Here are some more details Of the process. COS Yt ~/~ ﬁrst checks if the ﬁle begins
with INCLUDE statement or BEGIN statement. If it starts from the inclusion statement,
COS Y! I- extracts the ﬁle name from the statement and searches if there is a new
Version substitute for the included ﬁle in the list. of the available libraries. Two most
(éommonly used COS YScript libraries are cosy.fox and tm.fox and they both have
substitutes: cosy.fh and tm.fh, correspondingly. Ifthe substitute was found, INCLUDE
i 8 replaced by a corresponding assemble pragma.

During the second and ﬁnal step of the conversion, COS Y»; / tries to determine
the boundaries of the sections of the root block (see section A.2) and properly mark
them with section pragmas. Note that this search is heuristic, hence it can fail for
some ﬁles. Our tests, however, show that most of the old ﬁles can be converted,

processed and executed without problems.

One case of errors that can occur is if the sections’ boundaries are placed to the

comments put before the actual section boundary. COSY/ I does not recognize

264

COS Y Inﬁnity comments so the sections in this case are marked up incorrectly. Care
is taken to correctly process the cases of the missing sections, but we can not guarantee
it to work correctly in all cases. Correct treatment of the sections requires a complete
rewrite Of the COS YScript parser which is out of the scope Of this work. However,
it can potential] be done provided signiﬁcant demand from the users. Some work in
this direction is already done as a part Of the development of the autoconversion tool

for the COS YScript to C ++ programming language source ﬁle conversion [38].

A.3.7 Standard Libraries

COSY! / libraries are a part of COSY/ z package and can be found in the include

subdireetory of the package root directory. Here is the list of the currently imple-

mented general purpose libraries:

0 conversions.fh provides coordinate conversions between COS Y Inﬁnity par-

ticle coordinates and ICOOL coordinates, between COS Y Inﬁnity coordinates

and MARS coordinates [68,137]
0 logging.fh provides log ﬁles manipulation and logging interface

0 physics.fh contains the deﬁnitions of the physical constants, their initializa-

tions and various functions for commongly used formulae from accelerator

physics

0 timers.fh provides timing for the COSY Inﬁnity scripts or parts of them which

is useful for code proﬁling

o tracing.fh provides interface to the tracing procedures, that generate output

only when the certain tracing level is set; useful for degugging or output ver-

265

bosity level consistency across different parts of the pgoram

0 utils.fh deﬁnes various convenience functions that are not built-in into COS Y
Inﬁnity: seed—based random numbers generators including random numbers
from Gaussian distribution, vector constructors, arithmetic and logical oper-
ations on vectors and matrices, logical indexing for them similar to Matlab,

vector distances; domain scaling procedures for optimization functions

There are also the libraries from the to optimization in optimization subdirectory

ofthe COSY; I package:
0 gatool.fh implements the real-valued functions optimizer based on Genetic

Algorithm (see Appendix B for details)

0 test_functions.fh is a collection of the test functions for global minimizers,

most possess certain properties that make them hard to treat for optimizers

o lienard_jones.fh is a set of the test functions based on the various Lienard
Potential calculation problems; accessible from the test_functions.fh or as a

standalone library

0 tm.fh is a TM.fox standard 005' Y Inﬁnity package containing Taylor Models

manipulation interface, marked up for COSY/ /
and Beam Theory libraries in cosy subdirectory:

o cosy.fh is a standard COS Y Inﬁnity package COSY.fox containing vari-
ous Beam Theory computation and visualization algorithms, marked up for

COSY+¢

o cosy_wrappers.fh provides convenience functions to access certain elements

of the cosy.fh

266

Examples Of the library usage can be found in examples subdirectory Of the

COS Yr‘ # distribution root directory.

A.3.8 Front End

In order to make all these features available to a user, the front end to Sections
Assembler and Active Blocks processor is written in Perl programming language. It
exists in the form Of the command-line script cosy++.pl and can be found in the
root directory of the C05 Y I I distribution along with readme ﬁle that briefly covers
its features and installation procedure. More complete and detailed description of the
command line parameters and the usage modes of the cosy++.pl can be Obtained

anytime by the calling the script from the command interpreter with -h switch:

[ >> cosy++.p1 -h

COSY/ I is under the active deveIOpment, details can change, but at the current

rnoment it outputs the following information about the current version, developer,

copyright and usage:

Usage:
cosy++.pl I-h]
cosy++.pl [options] filel [file2 ...]
cosy++.p1 [options] -a file [argl ...]
--h[e1p]
Print usage information

-—v[erbose]
Print information about the parsing process. Additive, i.e could be

used several times for increased level of verbosity. Currently

supported levels 0,1,2

-—co[sy]=cosy_exec
Execute the processed file with COSYInfinity using cosy_exec

executable by default uses "cosy_ni" in the directory set up
during COSY installation

 

267

 

 

 

 

——[no]e[xec]
COSYInfinity execution flag. It is run if the flag is on and not
run if it is off

——cl[eanup]
Perform cleanup after execution. Additive. Supported levels:
0: no cleanup ‘
1: delete ".lis", ".cod" and "foxyinp.dat" COSY intermediate
files
2: delete processed file too

——o[ut]
Stores COSY Infinity output to file after execution. If not
set or set to "", output to STDOUT (default)

—-i[nc1ude]=path
Semicolon—separated list of directories to search in during
assembly pragmas processing (usually where COSY++ libraries
for COSY Infinity are stored)

——s[uffixes]
Semicolon—separated list of suffixes to be appended to files
specified in assembly directives during assembly

--a[arg-mode]
In case this option is set everything that follows first file name
is treated as an argument to \COSYS{} and it could be accessed from
the active blocks via Perl internal @ARGV array

—-p[roc—name]=processed_template_string
String used to generate the name of the processed file. Could
contain special variable names $base, $ext, $full which are
replaced by basename of the file, its extension and the full
name of file during processing correspondingly (defaul:
"processed_$full)

file

File to process. In arg—mode everything that follows it is treated
as a list of arguments to script accessible through Active Blocks

268

 

 

filel [file2 ...]
If arg—mode is off, expect a list of files to process in the order
they are supplied. All options set are shared between all files
in the list

 

As is seen from the detailed help, apart from the name(s) of the ﬁles to process
and arguments passed to them, a user can specify the path to search for the ﬁles
from assembly pragmas, suffixes to be appended to ﬁlenames during this search,
template for the processed ﬁlename, a name of the 005 Y Inﬁnity executable to run
the resulting ﬁle, a filename to store the textual output of the run, cleanup options
and a level of the output verbosity for COSY / t .

There exists another method to specify COS Y/ r” parameters that is particularly
useful if the user typically executes COS Y! t with the same set of parameters and
rarely needs to modify them. All these parameters can be stored in COSY/— + con-
ﬁguration ﬁles. All conﬁguration ﬁles used by COS Y-l— 1' are named .cosy++ but
can be stored in different directories. Whenever COS Yl l is executed it sets its

conﬁguration parameters in the following order:
1. Default parameter values deﬁned in the cosy++.pl source code
2. Parameters set in .cosy++ stored in the directory with cosy++.pl script

3. Parameters set in .cosy++ stored in the user home directory (obtained from

OS)

4. Parameters set in .cosy++ stored in the current directory (current as of the

moment of the execution)

. Parameters set from the command line

CI!

269

 

Parameters that are not set on any of these stages will have the values set by the
previous stage. The value of the parameter set on any of these stages overrides the
value from all previous stages.

Such scheme allows very flexible conﬁguration of the execution parameters where
the global parameters that are used most of the times are stored in the main con-
ﬁguration ﬁle (processed on the stage 2) and the conﬁguration parameters that are
unique to some source ﬁle are stored along with the ﬁle itself. A user does not need to
set all of the parameters in this conﬁguration ﬁle, only the ones he intends to override.
For example during the deveIOpment phase user can set cleanup level to a minimum
in the local conﬁguration ﬁle in order to closely track all syntax processing errors.
Later, when the code is stable and he does not need cleanup, he can simply remove
the local conﬁguration ﬁle to automatically switch to the settings in the global one.
For trial runs with parameters changing from run to run command line parameters
conﬁguration is more useful.

Syntax of the .cosy++ conﬁguration ﬁle is the one that is typically used by the

Unix conﬁguration ﬁles. It consists of the lines containing

[ name = value

pairs. Here name is a name of the conﬁguration variable to set, value is a value to

 

assign to this variable. Everything after # symbol is considered a comment and thus
is ignored. Value string can contain expressions in the form %name% which will be
replaced by the value stored in the variable name. This feature is useful for adding
values to conﬁguration variables instead of completely overriding them. For example,

to add directory c:/dir to the search path for assembly ﬁles, use

 

[:ASSEMBLY_PATH = %ASSEMBLY_PATH%;C:/dir

 

 

270

 

 

 

in the conﬁguration ﬁle. File .cosy++ in the root directory of the COSY: 1 dis-
tribution contains all the conﬁguration parameters that COSY! + recognizes along

with their initializtion instructions.

A.3.9 Additional Features and Notes

One useful feature of the COSY; I is that the result of its processing (in case it was
successful) is a valid COSYScript ﬁle. It can then be executed by COSY Inﬁnity
without any more modiﬁcations. Hence if there is a need to share the code that
actively uses COS Yet / features with the COSY Inﬁnity user who does not use or
does not have COS Y/ -/ package, one can just. process the ﬁle by COS Yi- I and then
share the results of processing executable by a standalone COSYScript interpreter.

Additional feature of the COS Y/ I which does not ﬁt into the general list of
features provided by Sections Assembler and Active Blocks is the extended veriﬁcation
of the COS YScript syntax. If the corresponding conﬁguration ﬂag is set, COS Yi- I
performs the check for global variables names clashes and warns user if the global
variable name is deﬁned more than once.

By global variables we hereby mean variables deﬁned in the VARIABLES section
of the root block (see section A.2). By default, if there are several declarations of the
variable with the same name, COSYScript silently uses the last one. With the new
assembly mechanism in place the number of the global variables can easily get very
large. Some of these can be declared in the ﬁles developed by different people who can
unintentionally use the same variable names for different purposes. The behaviour
of the program that results from assemblying these ﬁles is, at best, unpredictable.
Bugs like this are very hard to ﬁnd and ﬁx, hence this feature can be of great help

to developers. Additional COS YScript syntax checking can be added. For example,

271

 

to issue warnings about other common mistakes COSYScript programmers make. In
order to avoid problems with the global variables names clashes we recommend user
to preﬁx all global variables in the library by an acronym of the library name and to
use long descriptive names, e.g. GAToolStatus.

Interaction between Active Blocks syntax and the COSYScript syntax is subtle.
During the Active Blocks processing phase COS YI I totally ignores the COS YScript
syntax. It searches for the Active Block beginning and ending marks, executes what-
ever it ﬁnds inside. Then it replaces Active Block in the source code with the results
of processing if the Active Block is of the inclusive type or with nothing if it is of
the non-inclusive type and proceeds to next Active Block. Process then continues
until the end of the ﬁle is reached. However, COS YI I ignores COS YScript com-
ments or any other syntax elements. Hence, for example, Active Block inside a valid
COS YScript comment would still be processed. What is important is to ensure that
the results of the Active Blocks processing are forming a valid COS YScript ﬁle. Also
note that the contents of the Active Blocks are treated as a Perl code, hence all com-
ments inside them should be written in Perl style, not in COS YScript style. Nested
Active Blocks are not supported and should be avoided.

The parsing algorithm of the COS YI I is very heuristic and utilizes regular ex-
pressions [173]. The COS YScript structure is recursive and it is well-known that
parsing of the recursive structures with regular expressions is very hard and in gen-
eral case is not even possible [73]. To change this unfortunate situation we need to
completely reimplement COSYScript language parser. For example, we need to omit
searching for the beginnings and endings of the sections in COSYScript comments
which can be nested and thus are recursive as well. It is worth noting that we did not

ﬁnd a lot of cases where the heuristic COSY I I parser failed. To avoid problems with

272

 

sections recognition user should try to avoid words such as “variable” in comments
before VARIABLES section, “functions” or “procedure” before FUNCTIONS section.

Also note that while the assembly model greatly increases the flexibility and mod-
ularity of the code, it slows down the interpretation process. In the conventional
COS YScript inclusion model the ﬁle to be included is already precompiled while in
the assembly mode it is inserted as text each time and thus is recompiled each time
the ﬁle is processed. If the library to include/assemble is large, the conventional
model provides faster startup (the time passed between passing ﬁlename to execute
to COS Y Inﬁnity and the moment the program execution actually starts). We how-
ever believe that for complex and thus frequently large projects assembly model still
constitutes a good tradeoff of some Speed for the signiﬁcantly better code quality.
For the small projects it might still be worthwhile to use old model. However new
convenience libraries provide a lot of useful services that speed up the development
process so there are still some considerations to switch to COS YI I. Old syntax con-
version mode can help in making the transition from the conventional COS YScript
usage model the to new one easy.

The missing features can be added and known limitations can be resolved in the
future given signiﬁcant demand generated by users, but as of now it is out of the
scope of this work. As a ﬁnal note we remind that COS YI‘ I code is modular, well-

documented and easily extensible, thus additional features could be easily added.

273

-.d'glr- Ir:

APPENDIX B

The Genetic Algorithm Tool

(GATool) in COSY Inﬁnity

B. 1 Introduction

GATool is a real-valued function optimization package for COS Y Inﬁnity implement-
ing Evolutionary Optimizer from section 2.3. It designed in such a way that it should
work well for the most problems with the default values of parameters and a minimal
amount of fine-tuning. However, some problems might be solved better or faster (in
some cases both) with the non-default algorithm parameters. Therefore there was
developed the parameter conﬁguration interface for GATool. The parameter initial-
ization must be done before starting the actual minimization process, the change of
the parameters during the run process is not supported. Moreover, it is highly not
recommended and the results of such change are unpredictable. The methods to get
access to the statistics of the current run and the best value found on each step are
also provided. Interfaces, conﬁguration parameters and the values they can take,

the default parameter set and typical GATool usage patterns, are described in this

274

appendix.

B.2 Conﬁguration

 

 

procedure SetCreationParams CreationAlg;

 

 

Sets the creation algorithm used in the initial population generation and regeneration
of the eliminated members of the population. Supports the following CreationAlg

values:

 

 

CreationAlg = 1 { UNIFORM creation algorithm, i.e. new member of the }
{ population is any point in the initial box with }
{ uniform probability }

 

 

 

 

procedure SetArealParams ivInitBox Scale ileobalBox IsKillingOn;

 

 

 

Sets the initial box and the global box parameters. The ﬁrst parameter, ivInitBox is a
vector of intervals that deﬁnes search ranges for the coordinates. The second pareme-
ter, Scale is a scaling coefﬁcient. Effective initial box is generated from ivInitBox
by multiplying the lengths of its sides by this coefﬁcient (note that in such case the
volume of the box increases as the n-th power of the Scale). The scaling coefﬁcient is
introduced to simplify the exploration of the problem, i.e. if a user wants to try run-
ning the algorithm with a smaller or larger box, he does not have to manually rescale
each interval in ivInitBox. Instead, he can simply change the scaling coefﬁcient. The
initial population is generated in the scaled ivInitBox.

The third parameter, ileobalBox is a vector of intervals and in most cases
it should contain the scaled initial box. It is used along with the last parameter,

IsKillingOn. In the case this parameter set to a non—zero value, all the members of

275

 

 

 

t. he population outside of the ileobalBox are eliminated and then replaced by the
new members generated using the creation algorithm speciﬁed by SetCreationParams.
This rule is applied on each step of the Optimization process. Hence if the killing is on,
GATool guarantees that all the members of every generation stay inside ileobalBox.
If the scaled initial box is not a proper subset of the global box, GATool issues a warn—
ing, but proceeds with the execution. In this case the members of the pOpulation that

are in the scaled initial box but not in the global box are eliminated if the killing is on.

 

 

procedure SetInitialPopulation nInitPopSize aInitPop;

 

 

Some portion of or the whole initial population can be predeﬁned with this procedure.
This feature is particularly useful if a user has some insights about the function’s
behaviour (perhaps, obtained by using GATool with the parameters tuned for
the exploration of the search space, or some other optimizer, or analysis). This
subroutine provides an interface to transfer these insights as a hint for the GATool
optimizer. The information in our case consists of the points in the search space
that user considers as the potential minimizers or to be in their close proximity. The
two parameters the procedure takes are nInitPopSize and aInitPop. They deﬁne
the size of the initial population (must be less than or equal to g_nPopSize) and its
elements, correspondingly. Here aInitPop is an array with g_nDim elements. Each of
the elements is a vector of the length equal to g_nPopSize, so that aInitPop(i) is

a vector containing all the i-th coordinates of all the members of the initial population.

 

procedure SetReproductionParams nElite MutationRate;

 

 

Sets the ratio of the members of the next generation generated by each of the avail-

able new members generation methods. There are three types of the members in the

276

 

 

next generation: the elite, mutated and produced by a crossover. The elite mem-
bers are the best members (the points that are providing the smallest values of the
minimized function) and as such they are transferred from the previous generation
without changes. A number of these members is set by the nElite parameter and
must be a non-negative integer less than or equal to the population size g_nPopSize.
The mutated members are the ones produced by mutating members of the previous
generation using the chosen mutation algorithm. MutationRate parameter deﬁnes
the percentage of the new population that is generated by the mutation. It must
be a real value from the [0,1] range. The actual number of the mutants is then
g_nPOpSize*MutationRate. The number of the elite children plus the number of the
mutants must be less then or equal than the population size. If this sum is less than
the population size, all the remaining members of the pOpulation are generated by
the crossover.

There are three forces that affect the evolution in Genetic Algorithm: the
exploration, exploitation and conservation (see [148I for the more detailed study and
the explanation of the similar concepts of compression, transmission and neutrality
selection and their interplay in the evolution process). The exploration is responsible
for an exploration of the search space with essentially random moves in random
directions in a hope to ﬁnd the areas of interest. The exploitation is a more careful
examination and reﬁnement of these areas in the hope to ﬁnd a minimum. The
conservation is responsible for preserving the best values found so far. The elite
members of the population drive the preservation, mutated members drive the
exploration and the members produced by the crossover drive the exploitation.
Hence by controlling the reproduction parameters, a user controls the impact of

these forces on the evolutionary search and, as such, the performance of the method

277

 

 

which can be made more exploratory or quickly converging to a local minimum. The
process of selecting a right set of parameter values is mostly heuristic, involves trial

and error, and non-trivially depends on the problem.

 

 

procedure SetFitScalingParams FitScalingAlg;

 

Sets the ﬁtness scaling algorithm. Currently supported algorithms (described in de-

tails in section 2.3):

 

 

FitScalingAlg = 1 { LINEAR }
FitScalingAlg = 2 { PROPORTIONAL }
FitScalingAlg = 3 { RANK }

 

The fitness scaling ”transforms the function values in any ﬁnite range to the
fitnesses in the [0,1] range in order to make comparisons between different function
values domain-independent. Since the search is directed towards the minimum of the
function, the larger ﬁtnesses correspond to the smaller values of the function. In order
to perform this transformation LINEAR and PROPORTIONAL scaling algorithms
map the function values to the desired interval by means of the multiplication and
addition; RANK sorts them and then assigns the ﬁtnesses according to the position
in the sorted list. From these methods RANK is the slowest due to the sorting
which is of the order of 0(nlog n), where n = g_nPopSize in the worst case. But
at the same time this scaling algorithm is the least sensitive to numerical errors
since it does not involve any mathematical operations on function values. The other
two involve the subtraction which can lead to the cancellation errors if the function

assumes small values on the search domain.

 

 

procedure SetCrossoverParams CrossoverAlg CrossoverParams;

 

278

 

 

 

Sets the crossover parameters. First parameter, CrossoverAlg sets the type of the

algorithm. Currently GATool supports only one crossover algorithm:

 

 

CrossoverAlg = 1 { HEURISTIC }

 

The second argument, CrossoverParams, is an array with the parameters of the
crossover algorithm. For HEURISTIC algorithm the following parameters are sup—

ported:

 

 

CrossoverParams(1) { ratio (scalar or vector) of the distance between two }
{ parents where the child is created }

CrossoverParams(Z) { randomization flag which determines if the effective }
{ ratio is multiplied by a random number before usage }

 

This algorithm creates the child on the line connecting two parents if the ratio
is a scalar, or in the hypercube determined by two parents and the ratio, if the
ratio is a vector. If the ratio (or the ratio’s component in case of the vector ratio)
is > 0.5, then the child is created closer to the better parent (in this coordinate
in case of the vector ratio). It is created closer to the worse parent if the ratio
is in (0.0.5). If it is negative, the directions are reversed. Recommended range

of the values for the ratio is [0, 2]. Details of the algorithm are described in section 2.3.

 

 

procedure SetHeurCrossover Ratio IsRandomize;

 

More user-friendly interface to set HEURISTIC crossover algorithm and its parame-
ters. See the description of the SetCrossoverParams in this sectin for the description

of the parameters.

 

 

procedure SetMutationParams MutationAlg MutationParams;

 

279

 

 

 

 

Sets the mutation algorithm parameters. The ﬁrst parameter, MutationAlg selects

the mutation algorithm. Currently supported are the following types of the mutation

 

 

algorithms:
MutationAlg = 1 { UNIFORM }
MutationAlg = 2 { FADING GAUSSIAN }

 

UNIFORM algorithm ﬁrst checks the mutation probability for each coordinate of ev-
ery member and, if the mutation occurs, replaces the coordinate value by a randomly
generated number from the initial box’ corresponding range. FADING GAUSSIAN
algorithm generates the vector with each coordinate generated in the Gaussian dis-
tribution centered at 0 with the mean equal to the width of the corresponding range
from the initial box’ multiplied by the scale and fading parameter and then adds it to
the mutated member to produce the mutant. Details of the algorithms are described
in section 2.3.

Second argument, MutationParams is an array containing parameters of the se-
lected mutation algorithm. For UNIFORM mutation only one parameter is sup—

ported:

 

 

MutationParam(1) { gene mutation probability which specifies the }
{ probability with which each gene of every member
{ of the population selected for mutation is mutated }

He

 

For FADING GAUSSIAN following parameters are supported:

 

 

MutationParam(1) { scale to determine the Gaussian distribution’s mean }
{ value (scale = 1 corresponds to the full length of }
{ the box along coordinate) }
MutationParam(2) { shrink factor, that determines the speed with which }
{ mean value shrinks with generations (shrink factor = 0}
{ corresponds to no shrinking; allowed values range is }

{ [0,1]) }

 

280

 

 

 

 

 

procedure SetUnifMutation GeneMutProb;
procedure SetGaussMutation Scale ShrinkFactor;

 

 

More user-friendly way to set. UNIFORM or FADING GAUSSIAN mutation algo-

rithms and their parameters.

 

 

procedure SetSelectionParams SelectionAlg;

 

 

Sets the selection algorithm parameters. Determines the algorithm that selects the
members of the population for the mutation and crossover. The currently supported

selection algorithms are:

 

 

SelectionAlg = 1 { ROULETTE }
SelectionAlg = 2 { STOCHASTIC UNIFORM }
SelectionAlg = 3 { TOURNAMENT }

 

 

All of these algorithms use an information about members’ ﬁtnesses to select the
members with better ﬁtness with higher probability in order to use them for the
reproduction. The method used to exploit this information depends on the algorithm.

The details of the algorithms are described in section 2.3.

B.3 Usage Scenarios

A typical scheme of the GATool usage is:

 

 

GA_Init ProblemDim PopulationSize RandomSeed;
{ Set initial population }

{ Set various algorithm parameters }
{ Set stopping criteria }

281

 

P'J'ﬂl

 

 

GA_InitProb1em;

while g_GAToolStatus#0;
g_vFValues := OBJ_FUNC(g_aNextPopulation, g_nDim);
GA.Step;

endwhile;

GA_FinalizeProb1em;

 

 

 

Here the comments denote the placing of the optional GATool conﬁguration pro-
cedures described in the previous section, OBJ_FUNC is a function that GATool is
minimizing. Note that the function value computation method is left to a user. In
order to proceed with the search GATool only needs function values evaluated at the
points stored in g_aNextPopulation which is an array that contains g_nDim vectors
with population members coordinates. The population array’s i-th element is a vector
containing i-ths coordinates of all the members of the population. Function values
must be stored in g_vFValues in the same order they are stored in the pOpulation
array.

The member can be extracted from the array as a vector by calling
GetPopulationMember, the function described in this section. Note that this is in—
volves the usage of the temporary variables and does not exploit the vector operators
optimized by COS Y Inﬁnity, hence it is generally ineffective and should be avoided.
More efficient (however, not always applicable) method is to design OBJ_FUNC such that
it takes the population array and dimension as its arguments, returns the vector with
function values at these points and uses the vector Operators to compute this result.
The vector manipulation functions from the utils.fh library (see section A.3.7) might
prove themselves particularly useful for the task. Here is an example of the function
designed to take advantage of the vector operations (from the test_functions.fh

library):

 

lfunction Schwefechn x nDim;

282

 

 

variable i 1;

Schwefechn := 0;

loop 1 1 nDim-l;
Schwefechn := Schwefechn + (x(i)*sin(SQRT(VectorAbs(x(i)))));
endloop;

Schwefechn := 418.9829*nDim — Schwefechn;
endfunct ion ;

 

A user is free to put arbitrary code before calling the GA_Step procedure to proceed
with the next step. Such ﬂexibility allows one to build arbitrarily complex optimiza-
tion scenarios on the base of GATool. A user might perform some data manipulations
of his own, change the initial and global boxes (which is particularly important for
the COSY-GO interaction described in details in section 2.3.6), use other optimizers,
get. interactive input, etc., in order to build complex optimization scenarios. If any of
the stopping criteria are satisﬁed, GATool sets the g_GATOOIStatus variable to a zero
value hence the main while l00p in the example stops execution. The g_StopReason
variable indicates the reason for stepping,

Here is the list of the procedures used to initialize and ﬁnalize GATool, set the

stepping criteria and perform one step of the optimization.

283

 

 

 

procedure GA_Init Dim PopSize Seed;

 

Initializes the GATool. Sets the dimension and the population size (can be shared by
several otherwise different problems), which also implicitly deﬁnes a lot Of internal
parameters’ and buffers’ sizes. Sets the default values Of the parameters. The last
parameter, Seed, sets the initial seed for the pseudorandom number generator and
can be used to reproduce GATool runs. The pseudorandom numbers generator
implemented in COSY Inﬁnity, produces exactly the same sequence of the random
numbers if started from the same seed. Hence the two runs on the same problem
with the same value of Seed (and other parameters) will be identical in both the
intermediate and final results. If the Seed is set. to —1, the value of the seed is
generated randomly using the computer’s internal clock as a source of randomness.

This procedure should be called before any other GATool subroutine.

 

 

procedure SetStoppingCriteria nMaxGenerations nMaxStallGenerations
DesiredMinFValue IsStopDesiredMinFVal RelTol;

 

Sets various stopping criteria for the algorithm. The ﬁrst argument, nMaxGenerations,
if positive, sets the limit on the maximum number Of the generation produced during
the Optimization (essentially the maximum number of steps). The second argument,
nMaxStallGenerations, if positive, sets the limit on the maximum number of steps on
which the best function value (minimal in our case) changes by less than the tolerance
set by the last argument, RelTOI. The third and fourth arguments determine if the
algorithm stops when the desired minimal function value is reached or exceeded. Here
DesiredMinFValue speciﬁes the desired minimal value, the IsStopDesiredMinFVal

flag turns the checking on (when set to any non-zero value) or off (when set to a zero).

284

 

 

 

 

procedure SetMaxRunTime MaxRunTime;

 

 

Sets another stopping criteria for the algorithm: maximum run time in seconds,

speciﬁed by MaxRunTime; must. be positive.

 

 

procedure GA_InitProblem;

 

 

Performs the initialization of the problem-speciﬁc data structures. It Opens the log
ﬁles, starts the timers, initializes the statistics, generates the initial population,
outputs the method parameters to results log ﬁle. Then it sets the g_GAT0018tatus
variable to a non-zero value to indicate that the method is running. This procedure
should be called after all the parameters of the method are set but before the ﬁrst

call to GA_Step.

 

 

procedure GA_Step;

 

 

Performs one step of the search process. In order to perform it correctly, needs
g_vFValues to contain the values of the function calculated at the points stored in
g_aNextPopulation. The points themselves are generated by GATool but the com-
putation of the function values is left to a user. The procedure also updates the
statistics (including the current best minimizer) and writes the information to the
log ﬁles. It also checks the stopping criteria. If at least one Of them is satisﬁed, it
sets g_StopReason to the non-zero value indicating the exact reason for stopping and
g_GAToolstatus to zero indicating that the minimization is completed. In the current
version the following reasons for stopping are supported (correspond to the GATool

stopping criteria):

 

g_StopReason = 1 { Maximum number of generations is reached }

285

 

 

 

g_StopReason = 2 { Maximum number of stall generations (when the }
{ minimum value of the function changes by less then }
{ the specified tolerance) is reached }
g_StopReason = 3 { Desired minimal function value is reached }
g_StopReason = 4 { Time limit is reached }

 

If the minimization process is not completed, the procedure then generates the
next population and stores it in the g_aNextPopulation. It should be called after

GA_InitProblem.

 

 

procedure GA_FinalizeProblem;

 

Closes the log ﬁles, shutdowns the timers, performs the internal cleanup and prints the

execution timings. It should be called when the Optimization process is completed.

B.4 Access to Statistics

There are many internal variables used by GATool to store the statistics, the current
and the next populations, the function values and ﬁtnesses, the stOpping criteria, the
log ﬁles’ descriptors, timers, etc. Since all Of them are deﬁned as global variables a
user can potentially access these variables directly, but is strongly discouraged to do
so. The internal representations of the structures are implementation details and are
subject to a future changes by the GATool developer. What user should rely on is
an open interface in the form of the procedures and functions designed to provide an
access to internally available information. This interface forms a contract between a
tool developer and a user. Hence, for example if the procedure is designed to return
the number of the elite members of the population to a user, it would return this

number even if the internal name of the variable holding this value or the whole set of

286

 

 

 

 

the internal structures was changed. Were a user accessing this variable by its name
he would have to change this name in all the code. These user interface subroutines
can also perform transformations of the internal representation to another format and
change otherwise misleading internal names to a more user-readable.

In the current version of GATool the following routines provide the access to its

internal statistics:

 

 

procedure GetCurBestMemberVec vCurBestMember FMin;
procedure GetCurBestMemberArr aCurBestMember FMin;

 

A pair of the procedures that return the current best member of the population
(point in the search Space) and the corresponding value of the function at this point
(minimal since the method is performing a minimization). The difference between
these procedures is that. the ﬁrst of them returns the current best member as a
vector while the second returns it as an array. The values are returned through the

procedures’ arguments.

 

 

function GetPopulationMember aPopulation iIndex;

 

Takes the population array and an index Of the member of the population and returns
the population member (a point in the search space) in the vector format. In the
current version of GATool there are two population arrays: g_aCurPopulation and

g_aNextPopulation, iIndex could assume values from 1 to g_nPopSize.

B.5 Default Parameters Set

The default. set of the configuration parameters is shown on the Figure 8.1 It is tested

to work for a large class Of the Optimization problems reasonably well.

287

 

 

 

 

 

 

Reproduction: number of elite = 10, mutation rate = 0.2
Mutation: UNIFORM, gene mutation probability = 0.1
Crossover: HEURISTIC, ratio = 0.8, randomization is on

Fitness scaling: RANK

Selection: STOCHASTIC UNIFORM
Creation: UNIFORM
Areal: initial box = {—10.10} x ... x {-10.10}

global box
killing is off
Stopping: max generations = 1000,

{—10.10} x ... x {—10.10}

stall generations = 25,
tolerance = 1E-5

 

Figure B.l: GATOOl’s default parameters

B.6 Miscellaneous

 

 

procedure SetDumpFValues;
procedure UnsetDumpFValues;

 

Turns on and off the mode when all the points where the function was evaluated
during the Optimization along with the function values, are stored in the f_values
log ﬁle in the log ﬁles’ directory. Can be used, for example, to plot the function that

is expensive to calculate.

8.7 Advanced Conﬁguration via Active Blocks

Apart. from the COSYScript. subroutines, some Of the GATool parameters can be
conﬁgured using Active Blocks (described in details in Appendix A). Some conﬁg-
uration parameters are available through Active Blocks only (they deﬁne dynamic
variable sizes and internal GATool implementation details, especially experimental

ones), some through both Active Blocks and COSYScript subroutines, some only

288

 

 

through COSYScript subroutines. In order to conﬁgure GATool using Active Blocks,
a user should set the values of the variables described in this section before adding
library ﬁle to an assembly (as described in the section A.3.5).

Conﬁguration variables that can be set from Active Blocks only and their default

values are:

 

 

SCOSY::GATool::MaxDim = 20

 

 

Speciﬁes the maximum dimensionality of the problem that could be set via GA_Init.

 

 

SCOSY::GATool::MaxPopSize = $COSY::GAT001::MaxFim * 100;

 

 

Speciﬁes the maximum population size that could be set via GLInit.

 

 

SCOSYchTool::nMaxMutationParams = 3

 

 

Speciﬁes the maximum number of mutation params allowed.

 

 

$COSY:GATOOI::nMaxCrossoverParams = 2

 

 

Speciﬁes the maximum number Of crossover params allowed.

 

 

$C08Y:GATool::IsSuppressIncestCrossover = 1

 

 

The ﬂag that determines if the incests are suppressed during the crossover. The incest
in our terminology is a case when both parents in crossover correspond to the same
point in the search space. By the nature Of the crossover algorithm implemented in
GATool, it would result in a child that is exactly replicating this point which, in turn,
leads to a premature convergence of the method and thus should be avoided. The
incest suppression is particularly important for the small population sizes. The pop-
ulation can be small, for example, due to the expensiveness of the Objective function

calculation.

 

 

$COSY:GATOOI::IsShuffleAfterSelection = 1

 

 

289

 

 

If the flag is set to a non-zero value, an additional shuffling of indices is performed
after selection. This is done tO prevent the premature convergence and increase the

diversity by additionally mixing the population.

 

 

$C08Y::GAT001::nE1ite = 10
SCOSY::GATool::MutationRate = 0.2
SCOSY::GATool::MaxInitBoxSize = 10
$C08Y::GATOOI::RelTol = 1E-5
SCOSY::GATool::InitPopSize = 10
$COSY::GATOOI::IsKilling = 0
$COSY::GATool::PopCreationAlg = 1
SCOSY::GATool::FitScalingA1g = 3
$COSY::GATool::CrossoverAlg = 1
SCOSY::GATool::CrossoverRatio = 0.8
$COSY::GAT001::CrossoverIsRandomize = 1
SCOSY::GAToolzzMutationAlg = 1
SCOSY::GATool::MutationScale = 0.8
$COSY::GATool::MutationShrinkFactor = 0.6
SCOSY::GATOOl::MutationGeneMutProb = 0.1
SCOSY::GATool::Se1ectionA1g = 2
$COSY::GATool::StOppingMaxGens = 1000
SCOSY::GATool::StoppingStallGens = 25
SCOSY::GATool::StoppingMinFValue = undef
SCOSY::GATool::MaxRunTime = undef

 

 

 

These variables set. the default values Of the GATool parameters (see Figure 8.1).

 

 

SCOSY::GATOOl::RandomSeed = -l

 

 

This variable sets the random seed used if the last argument to GLInit is non-
positive. If this variable is set to a positive value, GATool always starts from the
predeﬁned seed. Note, however, that a positive argument to GA_Init overrides the
value set, from the Active Block. It is not recommended to change the value of this

variable; it is mainly provided for debugging purpose.

 

 

SCOSY::GATool::IsDumpFVa1ues = 0

 

 

290

 

 

 

 

 

 

Active Block COSYScript Random Seed’s Value
- - randomly generated
- + COS YScript: from argument to GA_Init
+ — Active Block: from SCOSY: :GATool: :RandomSeed
+ + COSYScript: from argument to GLInit

 

 

Default value Of the flag that controls the mode function values to the log ﬁle dumping

mode as described earlier.

291

APPENDIX C

Test Problems in Unconstrained

Optimization

Every new Optimization method has to prove its worthiness and justify the time and
effort spent in its development, implementation, and testing. Some of the optimiza-
tion methods aim at achieving a reasonable performance on a larger class of the prob—
lems, Others willingly narrow the target class in order to achieve better performance
on it. To assess the performance of different methods, compare their strengths and
weaknesses, reveal the unforeseen aspects of their behaviour, and stress-test them, a
large number of test problems was invented and examined [61]. In order to test the
behaviour of GATool (see section 2.3) we selected some of the most commonly used
test problems representing different aspects of of the difﬁculty of the optimization
process. In this appendix we present the formulations of these problems along with
their characteristics and example plots for the 2-dimensional cases (most of these

problems are deﬁned for a general n-dimensional case).

292

C.1 Sphere Function

 

Figure C.1: Sphere function (3D plot)

293

 

 

 

 

Figure 0.2: Sphere function (contour lines)

 

(a) Contour lines plot

 

 

# Definition
#ff"): 5:117?
# Search domain
atria—6,6], z‘=1,2,...,v
# Local minima

# One, same as global

# Global minimum
#x*=(0,...,0), f(x*)=0
# Description

# The simplest function for the conventional minimization methods:

# symmetric and unimodal.

smooth,

The gradient is directed towards the global

# minimum at any point. Using a right step size, a conventional minimizer

# can reach the global minimum in one step starting from any initial
# point. This problem is used to test the performance of the global

# Optimizer on the simplest case.

 

294

 

1011

’s Funct

lgin

Rastr'

C.2

 

Figure 0.3: Rastrigin’s function (3D plot)

295

 

 
   

‘t

MC 1' ><<§> me we)? Q Qgcég
©© >< >28“ ){OX(3)-(©><O‘ ' ‘
assassmsassaeax
0 34m») «<@>)(<(0))I(«OM0> (@)X((\))X<‘®)"X(@J}k.€
ZGQSQORSQSQSQXWO”

)«(Q (63):
Yﬁwewcs» “OC@
90s? 3

A“ s:

    

 

   
    
 
 

 

Figure 0.4: Rastrigin’s function (contour lines plot)

 

# Definition

# f(x)— — 101) + Z?_1(x2 — 10 cos(21rzi))

# Search domain

# z,- e [~6,6], i=1,2,...,v

# Local minima

# Lots

# Global minimum

# x* = (0,...,0), f(x*) =0

# Description

# Sphere function with added oscillatory behaviour which leads to a
# large number of the local minima. Conventional methods get stuck at
# one of the local minima.

 

296

 

C.3 CosExp Function

 

 

Figure 0.5: CosExp function (3D plot)

297

 

)

Figure C.6: CosExp function (contour lines plot

298

 

 

#Definition

#f(x=) sz _lcos(:cz-)— 2exp(— 1027-” ___1(:1:2t—1)2)
# Search domain

#$i€[—4,4], i=1,2,...,’U

# Local minima

#Lots

# Global minimum

#x*=(1,...,1)

# n = 2, f(x*) ~ —1.7081
# n = 3, f(x*) z —1.8422
# n = 4, f(x*) a —1.9147
# n = 5, f(x*) x —1.9539
# n = 6, mm x —1.9751
# n = 7, f(x*) 5:1 —1.9885
# n 2 8, f(x*) x —1.9927
# n — 9, f(x*) z —1.9960
# n =10, f(x*) a: 1.9978

# Description

# Highly oscillatory function with many local minima and a very
# sharp and well-pronounced global minimum that has a tiny domain
# of attraction. This property makes this minimum extremely hard
# to find especially for high-dimensional formulations.

 

299

 

C.4 Rosenbrock’s Function

 

-10 —10

Figure 0.7: Rosenbrok’s function (3D plot)

300

 

 

 

 

 

Figure 0.9: Rosenbrock’s function contours near the minimum

301

 

 

# Definition

# f(x) = 21’: (100(1? - mug + (a — 1)?)

# Search domain

# x,- 6 [—5,10], i=1,2,...,v

# Local minima

# One, same as global

# Global minimum

# x* =(1,...,1), f(x*) =0

# Description

# Is also called “banana" function for its banana-shaped contour lines
# (see Figure C.9 for magnified contour plots near the minimum).

# It poses difficulties for conventional minimizers due to its flat

# landscape and behaviour near the minimum. In the very narrow contour
# “valleys" around it, the gradient is pointing almost perpendicular

# to the direction towards the minimum. This can result in zigzag

# movements with small step sizes during the optimization. Minimization
# process frequently stops exhausting an allowed number of steps.

 

302

 

C.5 Ackley’s Function

 

Figure C.10: Ackley’s function (3D plot)

303

 

Eli‘s...tIris:...—....c<t£titli¢at¢£<¢l¢1¢tér Htslc<csi€t¢<txict

: -oOuonnoDoc09cno°°®0¢o0090000808009000002301000.09uoo.b

3U

   

0806900
5000-.3: - K

xi. r

Ackley’s function (contour lines plot)

Figure C.11

 

 

ion

t

# Defini

# f(X)

1 cos(27rrz-))

’U
2:

— experl :-

a1 rel/2)

_lz:

20 + 20 exp(—5"1(v

# Search domain

’0

i

,2,-

1

i

# z,- 6 [-30,30],

inima

#Local m

# Several
# Global

minimum

=0

x*)

(

, f

.,0)

#x*=(0,..

# Description

th its

imum wi

very sharp global min

# Oscillations with small amplitude

Conventional methods

# pit hidden in a large number of the local minima.

# get trapped at one of the local minima.

 

 

304

C.6 Griewank’s Function

 

Figure 0.12: Griewank’s function (3D plot)

305

 

 

Figure 0.13: Griewank’s function (contour lines plot)

 

’ # Definition

# f(x) = 4000—12321;;22— f=1cos(z,--r1/2) +1

# Search domain

# ac,- 6 [—600,600], i=1,2,...,v

# Local minima

# Several

# Global minimum

# x* = (0,...,0), f(x*) = 0

# Description

# Sphere function modulated by oscillations introduced by the cosine
# component. Number of local minima increases exponentially with

# dimension. It is frequently used as a test problem for the global
# optimization methods. See, however [111] for its criticism due to a
# complexity reduction with the dimension and proposed modifications to
# improve the difficutly.

 

 

 

306

 

C.7

An Function

 

 

Figure 0.14: An function

307

 

 

 

u-b

   

    

   
  

   

      
     

  

 

  

// 7/ .......... z ...................
/ 7 2’

J 1' /
. .f .- , ' /
, '1 .I _‘ l l
I . , .
(l'.‘ -‘>‘ O
/

 

 

Figure 0.15: An function (contour lines plot)

308

 

 

 

# Definition

# f(x)=(:c1—(v+ M+Z_ —(v+1) 2)2‘sz'$i—1
# Search domain

# 2:,- e [~0.25,0.25], i=1,2,...,v

# Local minima

# One, same as global

# Global minimum

# n = 2, f(x *) a: —0.0247;
# n = 3, f(x* ) x —0.0273;
# n = 4 f(x *)z —0.0256;
# n = 5, f(x* ) z —0.0231;
# n = 6, f(x*) 2 —0.0208;
# n = 7, f(x*) z —0.0188;
# n = 8, f(x*) z —0.0170;
# n- _ 9, f(x *).~e —0.0156;

# Description

# Similar to the Sphere function but very flat. Poses difficulties

# for conventional optimization methods because of its flatness which
# makes them take very small steps towards the minimum. Hence

# minimization process takes prohibitively many steps to complete.

 

309

 

C.8 SinSin Function

 

Figure 0.16: SinSin function (3D plot)

310

 
   

 

  

Dr

     
    
  

    

Figure 0.17: SinSin function (contour lines plot)

 

 

# Definition

# f(x) = “3:1511100332‘)

# Search domain

#xiE[—1,1],i=l,2 ..... v

# Local minima

# Several, same as global

# Global minimum

# S eeeee 1

# x* = (-—1r/20 + 27rk1, . . . , ~7r/20 + 27rkn), 1% E Z,i = 1,2 ..... n

# f (X*) = -

# Description

# Highly oscillatory function with many global minima. Poses significant
# difficulty for the rigorous global minimizers due to a large number of
# the regions of interest which cannot be eliminated from

# the consideration during the search process.

 

311

 

 

C.9 Paviani’s Function

 

Figure C.18: Paviani’s function (3D plot)

312

 

 

 

 

Figure C.19: Paviani’s function (contour lines plot)

31.3

 

 

 

# Definition

# f(x) = 23:1(ln(.r,j— 2))2 + (ln(10 -— .’z:.l-))2 — 3:1 2372/1}
# Search domain

# x,- e [2001,9999], i=1,2,...,v

# Local minima

# One, same as global

# Global minimum

# n = 2, x* z (972,972); f(x*) 2 ~82.883;

# n = 3, x* z (964,964,964); f(x*) x —77.396;

# n = 4, x* z (9.58, . . .,9.58); f(x*) x —72.357;

# n = 5, x* z (9.53, . . . ,953); f(x*) 2: —67.591;

# n = 6, x* z (9.49,. . .,949); f(x*) z —63.013;

# n = 7, x* z (9.45,. . .,945); f(x*) at: —58.57;

# n = 8, x* z (9.40, . . .,940); f(x*) z —54.23;

# n = 9, x* z (9.38,. . .,938); f(x*) 3 —49.97;

# Description

# Is a challenge for interval methods because even though it is flat, it
# has a high level of dependencies. This property makes it hard for the
# interval methods to reject boxes until they are very finely split [27].

 

314

 

APPENDIX D

Test Problems in Constrained
Optimization With Evolutionary

Algorithms

The set Of test problems for single—Objective constrained Optimization was suggested
as a standard benchmark for EAS by Michalewicz [135] and later adopted to test
the performance of all newly developed methods by the EA community [41,52,133,
149,169]. This testbench includes various synthetic problems (G01-G13) that expose
different properties Of the constraints, the feasible set, and the sought minimum.
Several real-life design problems originally solved by constrained EAS (vess, tens)
are also used. Problems in this appendix are listed using the conventions from the
optimization problem formulation (1.3.1), (1.3.2), (1.3.3), (1.3.4), (1.3.5). The search
space S is given as a box, i.e. set of ranges for 11,-, 27 = 1,. . . ,v. The values for global
minima are listed if known. The best known values are given where the true minima

are not. known.

315

A rough empirical classiﬁcation of problem difﬁculty and estimates for the

IFI

p= —-—--100
ISI

parameter are taken from I128]. Note that generally the most important factors that
increase the difﬁculty of a constrained Optimization problem are. the presence of at
least one nonlinear inequality and a high dimensionality. Note also, that even though
any feasible set where one of the constraints is an equality theoretically has a measure
zero, the value Of the parameter p obtained by a ﬁnite sampling Of the search space
to find the feasible points can be non-zero. For practical purposes such an estimate
is more useful than a purely theoretical measure. First, because, for the general
set Of constraints precise determination of F can be extremely difﬁcult. Second, for
practical purposes set F that. consists of a single point is harder to treat than F that
consists of a single line, which is, in turn harder to treat than F that consists of
a plane. The small deviations Of p from the theoretical zero obtained by sampling
allow us to make such a distinction (even though only approximately). Values Of p in
the problem descriptions are obtained by sampling the search space S by 1,000,000

random points.

316

 

 

DIFFICULT
p 2 0.0003
1) = 13

n=9 (9 linear inequalities; h1, h2, h3, h4, h5, h6 active)

quadratic objective function

f(x) = 523:1(5’32‘ — 513,2)— 21:5 1‘2‘

h4x =—2.T4—£E5+;L‘10_<_0
h6x =—2.’L‘8—$9+$12$O
’17 X = -—8.1'1 +1310 SO
h8 x = —8.132+$11_<_0

122-E[O,1l, i=1,...,9
xi6[0,100l, i=10,...,12
$13E[0,1]

x* = (1,1,1,1,1,1,1,1,1,3,3,3,1)
f(x*) = -15

 

 

Figure D.1: g01 Test problem

317

 

 

DIFFICULT

p z 99.9973

2) = 20

n = 2 (1 linear inequality, 1 nonlinear inequality; hl almost
active (—10-8))

nonlinear objective function

f(x) = *1 < 25:1 WSW-i) - 2 Ht;1cos2<x..>)(zt;1revs—”I

h1(x) = 0.75 — “33:1 511,- S 0
h2(x) = 3:13;, — 7.51) S 0

.171" 6 [0,10], 2': 1,...,1)

best known f(x*) = —0.803619

 

Figure D.2: g02 Test problem (best. known value from [163])

 

 

DIFFICULT

p a: 0.0026

1) = 10

n =1 (1 nonlinear equality; 91 active)

nonlinear objective function

f(X) = ~v2/v 115:1172;
g(x) = $21.73? — 1 = 0
x,e[0,10], i=1,...,v

x* = 1/ﬁ(:l:1,:t1,...,:t1), any combination of il’s such that their product
is positive

f(X*) = —1

 

Figure D.3: g03 Test problem

318

 

 

 

 

AVERAGE

p 2: 27.0079

1) = 5

n = 6 (4 linear inequalities, 2 nonlinear inequalities; h1, h6 active)
quadratic objective function

f(x) = 5357854713 + 0835689137125 + 3729323951 — 40792.141

61(x) = 85.334407 + 000568585255 + 0.0006262rlr4 —0.0022053x31‘5 — 92 g 0
( ) ——85.334407 — 00056858332335 — 000062623124 + 000220532355 3 0
7730:) = 80.51249 + 0007131722555 + 000299555:le + 00021813333 — 110 g 0
h4(x) —80.51249 — 0.0071317x2x5 — 0002995552152 — 0002181353 + 90 g 0

( )

( )

hgx

[15 x = 9.300961 + 00047026333275 + 0.0012547x1x3 + 00019085233174 — 25 _<_ 0
——9.300961 - 00047026123235 — 0.001254171131133 — 00019085233334 + 20 S 0

h6x

1‘1 E [78,102]
1‘2 6 [33,45]
132' 6 [27,45], i: 3,...,5

x* = (78, 33, 29.995256025682, 45, 36775812905788)
f(x*) = —30665.539

 

Figure DA: g04 Test problem

319

 

 

 

VERY DIFFICULT
p 2 0.0000
v = 4

n = 5 (2 linear inequalities, 3 nonlinear equalities; 91, 92, 93 are active}

nonlinear objective function

f(x) = 321 + 00000015513 + 25:2 + (0.000002/3);r:23

h1(x) 2 —1‘4 + $3 - 0.55 S 0

h2(x) = —:r3 + x4 —- 0.55 S 0

g(x) = 10003in(—x3 —- 0.25) + 10005in(-:64 — 0.25) + 894.8 — 5:1 = 0
92(x) = 1000 sin(:r3 — 0.25) + 1000 sin(;1:3 — 1:4 — 0.25) + 894.8 — 1‘2 = 0
g3(x) = 1000 sin(.r4 -— 0.25) +10008in(:r4 — 1:3 — 0.25) + 1294.8 = 0

.r, 6 [0.1200], 721.2
.r, 6 [455,055], 2': 3,4

best known x* = (679.9453,1026.067,0.1188764,—0.3962336)
f(x*) = 5126.498]

 

 

Figure D.5: g05 Test problem

320

 

 

AVERAGE

p 2 0.0057

21 = 2

n == 2 (2 nonlinear inequalities; hl, hg active)

nonlinear objective function
f(x) = (1:1 — 10)3 + (1‘2 — 20)3

h1(x) = —(:1:1— 5)2 — (5:2 — 5)2 +100 3 0
h2(x) = (.421 — 6)2 + (:52 — 5)2 -— 82.81 g 0

131 63 [13,100]
:62 6 [0,100]

x* = (14.095,0.84296)
f(x*) = —6961.81388

 

Figure D.6: g06 Test. problem

321

 

 

 

AVERAGE

p z 0.0000

'1) = 10

71. = 8 (3 linear inequalities, 5 nonlinear inequalities; hl, h2, h3, h4, h5
h6 active)

quadratic objective function

f(x) = I? + 13% + :1?le -— 141‘1 —16.’L'2 + (1‘3 —10)2 + 4(234 — 5)2 + ($5 — 3)2
+2(5:6 —- 1)2 + 5.5% + 7(28 —- 11)2 + 2(ch -—10)32 + (.510 — 7)2 + 45
h1(x) = —105 +4er +5932 — 3:137 +9er < 0
h2(x) z 1011 — 8.272 —17:r7 + 2:238 < 0
h3( )2 —8:L‘1 + 2:132 + 5179— 22310 — 12 < 0
h4(x)=3.(r1— 2)2 + 4(22 — 3)2 + 213% -— 7334 — 120 S 0
775(x)=5r1+822+(0:3—6)2—2:r4-4030
() “Tl +2 (172—2)2—2;r1.r2+14.r5—6.276£0
( )-03(11—8)2+2(.r2—4)2+3.rg—:66—30$0
( ) —3:r1 + 612 +12(1‘9 — 8)“2 — 73:10 _<_ 0

3‘0

6X
h7x

H

I18 x
172' 6 [—10,10], i=1,...,10
= (2.171996, 2.363683, 8.773926, 5.095984, 0.9906548, 1.430574, 1.321644,

9.828726, 8.280092, 8.375927)
f(x*) = 243062091

‘

 

 

Figure D.7: g07 Test problem

322

 

 

 

 

EASY

p R 0.8581

v : 2

n = 2 (2 nonlinear inequalities)

nonlinear objective function
f(x) = —sin3(27r.7:1)sin(27r:L'2)(:L-:l3(:r1 + (132))—

h1(X)=CE¥—IL‘2+1<O
h2(x)=1—xl+(a:2— 4)2<0

5,6[010], 5:1,2

= (1.2279713, 4.2453733)
f(x*) = —0.095825

 

 

Figure D.8: g08 Test problem

 

 

AVERAGE

p z 0.5199

7) = 7

n = 4 (4 nonlinear inequalities; h1, h4 active)

nonlinear objective function

f(x) = (1:1 —10)2 + 5(1‘2 —12)2 + 13% + 3(r4 — 11)2 +1017g+ 71% + 31% — 4236367

—10:E6 - 8:117
h(x) = —127+2x§ +3.521 +53 +453, + 55:5 < 0
h2(x) = -—282 + 7231+ 3:132 + 101133 + :64 — :35 < 0
h3(x) = —1296 + 2321+ 2:3 + 6:1:2 — 8x7 < 0
X)

114(x :2 4:131 +33% -3.rl:rg +2r3 + 5.736 —11:67 < 0
Ii E [—10, 10], i: 1,...,7

= (2.330499, 1.951372, —0.4775414, 4.365726, —0.6244870. 1.038131, 1.594227)
f(x*) = 680.6300573

 

 

Figure D.9: g09 Test problem

323

 

 

DIFFICULT

p a: 0.0020

1) = 8

n = 6 (3 linear inequalities, 3 nonlinear inequalities; hl, hg, h3 active)
linear objective function

f(x)=$1+1'2+$3

[g(x) = —1 + 0.0025(1’4 + x6) 3 0

h2(x) = —1 + 0.0025(55 + :57 — 44) s 0

71300 = —1 + 0.01(.’138 — 1'5) S 0

h4(x) = “$1376 + 8333325224 + 100361 — 83333.333 S 0
745(x) = —$2.r7 + 1250275 + 562.724 — 125034 S 0

h6(x) = —.233:l‘8 + 1250000 + 17315 -- 2500.235 S 0

5:1 6 [10010000]
x, e [1000,10000], 2': 2,...,3
33,-E [10,1000], i=4,...,8

x* : (579.3167, 1359.943, 5110.071, 182.0174,295.5985,217.9799, 286.4162, 395.5979)
f(x*) = 7049.3307

 

Figure D.10: g10 Test, problem

324

 

 

 

EASY

p z 0.0973

1) = 2

n =1 (1 nonlinear equality; 91 active)
linear objective function

n0=4%+92—02

gl(x) =r2—r%=0
33,: 6 [—1,1], i=1,2

x* = (:l:1/\/2,1/2)

 

 

 

f(x*) = 0.75
Figure D.11: g1], Test problem
EASY
p x 4.7697
1) =2 3

n =1 (93 nonlinear inequalities joined by logical 0R instead of AND,
disjoint F)
quadratic objective function

.ﬂm=~4m—Rmo—0l—mZ—ez—52—eg—m%
h,(x):(51_p)2 +(1L‘2 —-q)2 + (:13 —r)2 —- 0.0625 g 0, 5:1,...,93,

p,q,r=1,...,9
x is feasible if it satisfies one of h,-

1'.i€[0,10],2'.=1,2,3

x* 2 (5,5,5)
f(x*) = —1

 

Figure D.12: g12 Test problem

325

 

 

 

 

VERY DIFFICULT

p z 0.0000

v =

n = 3 (1 linear equality, 2 nonlinear equalities; g1, g2, 93 active)

nonlinear objective function

f(X) : 6.151175211331415

g(x) = 2?:11‘72— 10 = 0
92(x) = 172033 — 5.764335 2 0
g3(x)=r‘15+x%+1=0

.r, e {—23.23}, 5:1,2
x,- e {—32.32}, 2': 3,4,5

x* = (—1.717143,1.595709,1.827247,—0.7636413,—0.763645)
f(x*) = 0.0539498

 

Figure D.13: g13 Test problem

326

 

 

 

AVERAGE

p z 39.6762

v = 4

n == 4 (3 linear inequalities, 1 nonlinear inequality)
quadratic objective function

f(x) = 06224515354 + 177815253 + 3.166155%, + 19.84.4453
h1(x) = —:I:1 + 0.0193583 S 0

h2(x) = -IL‘2 + 000954333 S 0

h3(x) = 4.5354 — 4/3453 + 1296000 g 0

h4(x) = 1:4 — 240 S 0

(Ci E [1,99], 221,2
52,6[10200], 5:3,4

best known: f(x*) = 6059946341

 

Figure D.14: Design of a Pressure Vessel (vess) [100] (best known value from [41])

327

 

 

 

EASY

p z 0.7537

1) = 3

n = 4 (1 linear inequality, 3 nonlinear inequalities)

quadratic objective function

f(x) = (5:3 + 2325:?

h.1(x)=1-— 5353(717855‘11r1 g 0

h2(x) = (4.5% - 515:2)(12566(525§ — .4.~‘11))—1 + (5108.1.~¥)—1 — 1 g 0
h,3(x) : 1 — 14045212?— xgl g 0

h4(x) : (.172 + 1‘1)1.5_ — 1 S 0

1‘1 E [005,2]

5:2 6 [025,13]

173 E [2,15]

best known: f(x*) = 0.012681

 

Figure D.15: Design of a Tension/Compression Spring (tens) ]9] (best known value
from [41])

328

 

BIBLIOGRAPHY

329

Ill

l2l

|3|

M

[7]

l8|

l9]

l10l

IHI

C. Albright, V. Barger, J. Beacom, S. Brice, J. .l. Gomez-Cadenas, M. Good-
man, D. Harris, P. Huber, A. Jansson, M. Lindner, O. Mena, P. Rapidis,
K. Whisnant, W. Winter, The Neutrino Factory, and Muon Collider Collabora-
tion. The neutrino factory and beta beam experiments and development. Tech-
nical Report BNL-723692004, FNAL-TM-2259, LBNL-55478, BNL, FNAL,
LBNL, July 2004.

Carl H. et al. Albright. Physics at a neutrino factory. Technical Report.
FERMILAB-FN-0692, Fermi National Accelerator Laboratory, 2000.

Mohammad M. Alsharoa et al. Recent progress in neutrino factory and muon
collider research within the muon collaboration. Phys. Rev. ST Accel. Beams,
6:081001, 2003.

Martin Berz an Kyoko Makino. Higher order multivariate automatic differen-
tiation and validated computation of remainder bounds. WSEAS' Yl‘ansactions
on Mathematics, 3(Issue 1):37, January 2004. see http://www.wseas.org.

M. Apollonio et al. Oscillation physics with a neutrino factory. ((g)) ((11)).
CERN Yellow Report on the Neutrino Factory, 2002.

David L. Applegate, Robert E. Bixby, Vasek Chvatal, and William J. Cook. The
Traveling Salesman Problem: A Computational Study. Princeton University
Press, 2006.

Mordecai Avriel. Nonlinear Programming: Analysis and Methods. Dover Pub-
lications, 2003.

Helio J. C. Barbosa. A Coevolutionary Genetic Algorithm for Constrained
Optimization. In Proceedings of the Congress on Evolutionary Computation
19.99 (CEC’QQ), volume 3, pages 1605—1611, Piscataway, New Jersey, July 1999.
IEEE Service Center.

Ashok Dhondu Belegundu. A Study of Mathematical Programming Methods for
Structural Optimization. PhD thesis, University of Iowa, Iowa, USA, 1982.

J. S. Berg, S. A. Bogacz, S. Caspi, J. Cobb, R. C. Fernow, J. C. Gallardo,
S. Kahn, H. Kirk, D. Neuffer, R. Palmer, K. Paul, H. Witte, and M. Zis—
man. Cost-effective design for a neutrino factory. Phys. Rev. ST Accel. Beams,
9(1):011001, Jan 2006.

F. Bernelli-Zazzera, M. Berz, M. Lavagna, R. Armellin, P. Di Lizia, and F. Top-
puto. Global trajectory optimisation: Can we prune the solution space when
considering deep space maneuvers? Final Report. 0674101, ARIADNA Study,
December 2007.

330

112]

l13l

WI

1151

[WI

1171

I18l

WI

[20)

IQ‘ZI

|23l

R. H. Berry and G. D. Smith. Using a genetic algorithm to investigate tax-
ation induced interactions in capital budgeting. In Proc. of the International
Conference on Neural Networks and Genetic Algorithms, pages 567—574, 1993.

M. Berz. Differential algebraic description of beam dynamics to very high orders.
Particle Accelerators, 24:109—124, 1989.

M. Berz. Computational aspects of design and simulation: COSY INFINITY.
Technical Report 740, National Superconducting Cyclotron Laboratory, Michi-
gan State University, East Lansing, MI 48824, 1990.

M. Berz. High-Order Computation and Normal Form Analysis of Repetitive
Systems, in: M. Month (Ed), Physics of Particle Accelerators, volume 249,
pages 456—489. American Institute of Physics Publishing, New York, 1991.

M. Berz. Differential algebraic formulation of normal form theory. In S. Martin
M. Berz and K. Ziegler, editors, M. Berz, S. Martin and K. Ziegler (Eds), Proc.
Nonlinear Effects in Accelerators, page 77, London, 1992. Institute of Physics
Publishing.

M. Berz. Modern map methods for charged particle optics. Nuclear Instruments
and Methods, 363:100, 1995.

M. Berz. Modern Map Methods in Particle Beam Physics. Academic Press, San
Diego, 1999. Also available at http://bt.pa.msu.edu/pub.

M. Berz, B. Erdélyi, and K. Makino. Fringe ﬁeld effects in small rings of large
acceptance. Physical Review S T-A B, 3:124001, 2000.

M. Berz and G. Hoffstéitter. Exact bounds of the long term stability of weakly
nonlinear systems applied to the design of large storage rings. Interval Compu-
tations, 2268589, 1994.

M. Berz and K. Makino. Normal form methods and optimization for nonlinear
properties of cooling channels — part I. In Proc. of the APS/DPF/DPB Summer
Study on the Future of Particle Physics (Snowmass 2001), 2002. number T504.

M. Berz and K. Makino. COSY INFINITY Version 9.0 beam physics man-
ual. Technical Report MSUHEP-060804, Department of Physics and Astron-
omy, Michigan State University, East Lansing, MI 48824, 2006. see also
http://cosyinﬁnity.org.

M. Berz and K. Makino. COSY INFINITY Version 9.0 programmer’s man-
ual. Technical Report MSUHEP-060803, Department of Physics and Astron-
omy, Michigan State University, East Lansing, MI 48824, 2006. see also
http:,/’/cosyinfinity.org.

331

l24l

l25l

[26]

I27]

[28]

IQ9I

l30|

l3ll

l32|

I33]

l34l

l35l

I36]

M. Berz, K. Makino, and B. Erdelyi. Fringe ﬁeld effects in muon rings. AIP
CP, 530238—47, 2000.

M. Berz, K. Makino, and C. J. Johnstone. Propagation of a large-emittance
muon beam through a straight, quadrupole-based precooling channel. In Neu-
trino Factories and Superbeams, volume 721, page 413. AIP Conference Pro-
ceedings, 2004.

M. Berz, K. Makino, and Y.-K. Kim. Long-term stability of the Tevatron by
verified global optimization. Nuclear Instruments and Methods, 558:1—10, 2005.

Martin Berz. Private communication.

George Bilchev and Ian C. Parmee. Constrained and multi-modal optimisation
with an ant colony search model. In C. Parmee and M. J. Denham, editors,
Proceedings of 2nd International Conference on Adaptive Computing in Engi-
neering Desing and Control. University of Plymouth, Plymouth, UK, March
1996.

Donato Bini, Robert T Jantzen, and Andrea Merloni. Geometric interpretation
of the frenet-serret frame description of circular orbits in stationary axisymmet-
ric spacetimes. Classical and Quantum Gravity, 16:1333—1348, 1999.

A. Blondel and et al. (ed.). cha/cern studies of a european neutrino factory
complex. Technical Report CERN—2004002, CERN, 2004.

M. Born and E. Wolf. Principles of Optics. Pergamon Press, Oxford, 6 th
edition, 1980.

Stephen Brooks. Muonl Distributed Particle Accelerator Design. on-line.

J. Burguet-Castell, M. B. Gavela, J. J. Gomez-Cadenas, P. Hernandez, and
Olga Mena. Superbeams plus neutrino factory: The golden path to leptonic cp
violation. Nucl. Phys, 8646:301—320, 2002.

Susan E. Carlson, R. Shonkwiler, S. Babar, and M. Aral. Annealing a genetic
algorithm over constraints. on-line.

G. Casadei, A. Palareti, and G. Proli. Classifier system in trafﬁc management.
In Proc. of the International Conference on Neural Networks and Genetic Al-
gorithms, pages 620-627, 1993.

Weng Tat Chan, T.F. Fwa, and Kh. Zahidul Hoque. Constraint Handling Meth-
ods in Pavement Maintenance Programming. Transportation Research Part C
- Emerging Technologies, 9(3):]75—190, June 2001.

332

 

 

]37] Amy B. Chan-Hilton and Teresa B. Culver. Constraint-handling methods for
optimal groundwater remediation design by genetic algorithms. In Proc. of
IEEE International Conference on Systems, Man, and Cybernetics, vol.4, pages

3937—3942. IEEE, 1998.

I38] L. M. Chapin, J. Hoefkens, and M. Berz. The COSY language independent
architecture: Porting COSY source files. Insitute of Physics CS, 175:37—45,
2004.

]39] D. Cline and D. Neuffer. A muon storage ring for neutrino oscillation experi-
ments. In AIP Conf. Proc. 68, page 846, 1980.

[40] Carlos A. Coello Coello and Ricardo Landa Becerra. Constrained Optimiza-
tion Using an Evolutionary Programming-Based Cultural Algorithm. In I.C.
Parmee, editor, Proceedings of the Fifth International Conference on Adaptive
Computing in Design and Manufacture (A CDM’2002), volume 5, pages 317-—
328, University of Exeter, Devon, UK, April 2002. Springer-Verlag.

]41] Carlos A. Coello Coello and Efren Mezura—Montes. Constraint-handling in
genetic algorithms through the use of dominance-based tournament selection.
Advanced Engineering Informatics, 16(3):193—203, July 2002.

]42] Carlos A. Coello Coello. A Survey of Constraint Handling Techniques used
with Evolutionary Algorithms. Technical Report Lania-RJ-99-04, Laboratorio
Nacional de Informatica Avanzada, Xalapa, Veracruz, Mexico, 1999.

[43] Carlos A. Coello Coello. Treating Constraints as Objectives for Single-Objective
Evolutionary Optimization. Engineering Optimization, 32,(3,):275—308,, 2000.

[44] Carlos A. Coello Coello and Nareli Cruz Cortes. Use of Emulations of the Im-
mune System to Handle Constraints in Evolutionary Algorithms. In Cihan H.
Dagli, Anna L. Buczak, Joydeep Ghosh, Mark J. Embrechts, Okan Erson, and
Stephen Kercel, editors, Intelligent Engineering Systems through Artiﬁcial Neu-
ral Networks (ANNIE’QOOI), volume 11, pages 141-—146, St. Louis Missouri,
USA, November 2001. ASME Press.

[45] David W. Coit and Alice E. Smith. Penalty guided genetic search for reliabil-
ity design optimization. Computers and Industrial Engineering, 30(4):895—904,
September 1996. Special Issue on Genetic Algorithms.

[46] Muon Collider Collaboration. u+p_ collider: A feasibility study. Technical
Report BNL-52503, Fermilab-Conf-96/092, LBNL-38946, Muon Collider Col-
laboration, jul 1996. pres. at the Snowmass’96 workshop.

333

[47] M. Conte and W. W. MacKay. An Introduction to the Physics of Particle
Accelerators. World Scientific, Singapore, 1991.

[48] COSY Infinity home page. on-line. http://www.cosyinfinity.org/.

[49] Courant and Snyder. Theory of the alternating gradient synchrotron. Annals
of Physics, 3(1), 1958.

[50] R. Cucchiara. Analysis and comparison of different genetic models for the
clustering problem in image analysis. In Proc. of the International Conference
on Neural Networks and Genetic Algorithms, pages 423—427, 1993.

[51] Charles Darwin. The Origin of Species by Means of Natural Selection, or the
Preservation of Favoured Races in the Struggle for Life. London: John Murray,
6 edition, 1872.

[52] Kalyanmoy Deb. hI‘Iulti—objective genetic algorithms: Problem difﬁculties and
construction of test problems. Evolutionary Computation, 7(3):205—230, 1999.

[53] D. Deugo and F. Oppacher. Achieving self—stabilization in a distributed system
using evolutionary strategies. In Proc. of the International Conference on Neural
Networks and Genetic Algorithms, pages 400—407, 1993.

[54] A. J. Dragt and J. M. Finn. Normal form for mirror machine Hamiltonians.
Journal of Mathematical Physics, 20(12):2649, 1979.

[55] Lin Du and John Bigham. Constrained Coverage Optimisation for Mobile Cel-
lular Networks. In Giinther Raidl et al., editor, Applications of Evolutionary
Computing. Evoworkshops 2003: EvoBIO, EvoCOP, EvoIASP, EvoMUSART,
EvoROB, and EvoSTIM, pages 199-210, Essex, UK, April 2003. Springer Ver-
lag. Lecture Notes in Computer Science Vol. 2611.

[56] Eldon Hansen (Ed) and G. William Walster (Edu) Global Optimization Using
Interval Analysis. Pure and Applied Mathematics. A Dekker Series of Mono-
graphs and Textbooks. CRC, 2nd edition, 2003.

[57] A. E. Eiben, P.-E. Raué, and ZS. Ruttkay. GA-easy and GA-hard constraint sat-
isfaction problems. In M. Meyer, editor, Proceedings of the ECAI’94 Workshop
on Constraint Processing, pages 267—284. Springer—Verlag, 1995.

[58] A. E. Eiben and J. K. van der Hauw. Adaptive penalties for evolutionary graph
coloring. In .1.-K Hao, E. Lutton, E. Ronald, M. Schoenauer, and D. Sny-
ers, editors, Artiﬁcial Evolution’97, volume 1363 of Lecture Notes in Computer
Science, pages 95—106. Springer-Verlag, Berlin, 1998.

334

[59] Fuat Erbatur, Ogﬁzhan Hasancebi, Ilker Tiitiincii, and Hakan Kilic. Optimal
design of planar and space structures with genetic algorithms. Computers and
Structures, 75(2):209—224, March 2000.

[60] D. Errede, K. Makino, M. Berz, C. J. Johnstone, and A. van Ginneken. Stochas-
tic processes in muon ionization cooling. Nuclear Instruments and Methods A,
519:466—471, 2004.

[61] Christodoulos A. Floudas et al. Handbook of Test Problem in Local and
Global Optimization, volume 33 of Nonconver Optimization and Its Applica-
tions. Springer, 1999.

[62] J .S. Berg et al. Iss scoping study: Accelerator design concept for future neutrino
factories. Technical Report RAL-TR-2007-23, RAL, 2007.

[63] W.-M. Yao et. al. Neutrino mass, mixing, and flavor change. Journal of Physics,
G, 33(1), 2006.

[64] Neal C. Evans and David L. Shealy. Design of a gradient-index beam shaping
system via a genetic algorithm optimization method. In Fred M. Dickey and
Scott C. Holswade, editors, Proceedings of SPIE, volume 4095, pages 26—39,
October 2000.

[65] Bibliography on Evolutionary Computation. on-line. http://www.fmi.
uni-stuttgart.de/fk/evolalg/eabib.html.

[66] RC. Fernow. Icool, a simulation code for ionization cooling of muon beams.
In Proc. 1999 Particle Accelerator Conference, page 3020, New York, 1999. see
http:,-”’/pubweb.bnlgov/people)"fernow/.

[67] RC. Fernow. Physics analysis performed by ecalc9. Neutrino Factory and Muon
Collider Notes MUC-NOTE-COOL_THEOR.Y-280, Brookhaven National Lab-
oratory, 2003. see http://www-mucool.fnal.gov/notes/notes.html.

[68] RC. Fernow. ICOOL User’s Guide. Brookhaven National Laboratory, June
2005. see http: / / pubweb.bnl. gov / people /fernow / icool / readme.html.

[69] RC. Fernow. Recent developments on the muon-facility design code icool. In
Proc. 2005 Particle Accelerator Conference, page 2651, Knoxville, Texas, 2005.

[70] RC. Fernow. Scattering in icool. Neutrino Factory /Muon Collider Notes MUC-
NOTE-COOL-THEORY-0336, Brookhaven National Laboratory, April 2006.

[71] RC. Fernow and J .C. Gallardo. Examination of the us study 2a neutrino factory
front-end design. Neutrino Factory/Muon Collider Notes MUC-NOTE-COOL-
THEORY-0331, Fermi National Accelerator Laboratory, January 2006.

335

[72] AV. Fiacco and GP. McCormick. Nonlinear Programming: Sequential Uncon-
strained Minimiaztion Techniques. Wiley, New York, 1968.

[73] Jeffrey Fried]. Mastering Regular Expressions. O’Reilly Media, Inc, 3rd edition,
August 2006.

[74] .l.C. Gallardo, J.S. Berg, R.C. Fernow, H. Kirk, R..B. Palmer, D. Neuffer, and
K. Paul. New and efﬁcient neutrino factory front-end design. Neutrino Facto-
ry/Muon Collider Notes MUC-NOTE-COOL-THEORY-0316, Fermi National
Accelerator Laboratory, 2005.

[75] S. Geer. Neutrino beams from muon storage rings: Characteristics and physics
potential. Phys. Rev., D57z6989—6997, 1998.

[76] A. Geraci, T. Barlow, M. Portillo, J. Nolen, K. Shepard, M. Berz, and
K. Makino. High-order maps with acceleration for optimization of electro-
static and radio-frequency ion-optical elements. Review of Scientiﬁc Instru-
ments, 73,9:3174—3180, 2002.

[77] Philip E. Gill and Walter Murray. Algorithms for the solution of the nonlinear
least-squares problem. SIAM Journal on Numerical Analysis, 15(5):977~992,
1978.

[78] David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine
Learning. Addison-Wesley Professional, lst. ed. edition, 1989.

[79] David E. Goldberg. The Design of Innovation. Lessons from and for Competent
Genetic Algorithms. Springer, 1st ed. edition, June 2002.

[80] J. Gottlieb, E. Marchiori, and C. Rossi. Evolutionary Algorithms for the Sat-
isfiability Problem. Evolutionary Computation, 10(1):35—50, 2002.

[81] NuFactJ Working Group. A feasibility study of a neutrino factory in japan.
Technical Report v. 1.0, NufactJ Working Group, May 2001.

[82] Schwefel H.-P. Evolution and optimium seeking. Wiley, New York, 1991.

[83] J. Rokne H. Ratschek. New Computer Methods for Global Optimization. Ellis
Horwood Limited, Chichester, England, 1988.

[84] P. Hajela and J. Lee. Constrained genetic search via schema adaptation.

an immune network solution. Structural and Multidisciplinary Optimization,
12(1):11—15, August 1996.

[85] Scot Hillier and Daniel Mezick. Programming Active Server Pages. Microsoft
Press, 1997.

336

[86] N. Holtkamp, D. Finley, and (eds). A feasibility study of a neutrino source

I871

l88l

I89]

I901

I91l

I92]

|93l

I94]

l95l

l96l

I97l
|98l

based on a muon storage ring. Technical Report F ERMILAB-PUB—OO—lOB-E,
Fermi National Accelerator Laboratory, April 2000. see http://library.fnal.
gov/archive/test—preprint/fermilab—pub-OO-108-e.shtml.

A. Homaifar, S. H.-Y. Lai, and X. Qi. Constrained optimization via genetic
algorithms. Simulation, 62(4):242-254, 1994.

J. Horn, N. Nafpliotis, and DE. Goldberg. A niched pareto genetic algorithm
for multiobjective optimization. In Proceedings of the First IEEE Conference
on Evolutionary Computation, 1994. IEEE World Congress on Computational
Intelligence, pages 82—87, Orlando, FL, USA, June 1994.

S. Humphries. Charged Particle Beams. Wiley, New York, 1990.

Philip Husbands, Frank Mill, and Stephen Warrington. Genetic algorithms, pro-
duction plan optimisation and scheduling. In Proceedings of the 1st Workshop
on Parallel Problem Solving from Nature, pages 80-84, 1991.

L. Ingber. Adaptive simulated annealing (asa): Lessons learned. Control and
Cybernetics, 25(1):33—34, 1996.

Nelder J .A. and Mead R. A simplex method for function minimization. Com-
puter Journal, 7:308—313, 1965.

J. D. Jackson. Classical Electrodynamics. Wiley, New York, 1975.

Fernando Jiménes and Jose L. Verdegay. Evolutionary techniques for con-
strained optimiation problems. In 1th European Congress on Intelligent Tech-
niques and Soft Computing, 1999.

C. J. Johnstone, M. Berz, D. Errede, and K. Makino. Optimization and beam

control in large-emittance accelerators: Neutrino factories. In 2003 Interna-
tional Conference Physics and Control, pages 964—973. IEEE, 2003.

C. J. Johnstone, M. Berz, D. Errede, and K. Makino. Muon beam ionization
cooling in a linear quadrupole channel. Nuclear Instruments and Methods A,
519:472-482, 2004.

Carol Johnstone. Private communication.

Jeffrey A. Joines and Christopher R. Houck. On the use of non-stationary
penalty functions to solve nonlinear constrained optimization problems with
ga’s. In Proceedings of the First IEEE Conference on Evolutionary Computa—
tion, 1.994. IEEE World Congress on Computational Intelligence, pages 579—
584. IEEE Press, 1994.

337

l99l

[100]

[101]

[102)

[103)

[104]

[105)

[106]

[107]

[108]

[109]

|110|

[111]

George Katodrytis. Genetic algorithms in design: Theory and application. In
Proc. of Computer Graphics, Imaging and Vision: New Trends, 2005, pages
426—430, 2005.

S. Kazarlis and V. Petridis. Varying fitness functions in genetic algorithms:
Studying the rate of increase of the dynamic penalty terms. In A. E. Eiben,
T. Biick, M. Schoenauer, and H.-P. Schwefel, editors, Proceedings of the 5th
Parallel Problem Solving from Nature (PPSN V), pages 211—220, Heidelberg,
Germany, September 1998. Springer-Verlag.

R. B. Kearfott. Rigorous Global Search: Continuous Problems. Kluwer, 1996.

.louni A. Lampinenk Kenneth V. Price, Rainer M. Storn. Differential Evolution:
A Practical Approach to Global Optimization. Natural Computing. Springer, 1st
ed. edition, December 2005.

Y.-K. Kim. Private communication.

Y.-K. Kim and M. Berz. Parallel constructs in COSY INFINITY. MSU High
Energy Phyisics Preprint MSUHEP-060805, Michigan State University, 2006.

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated
annealing. Science, 220(4598):671~680, 1983.

D. G. Koshkarev. Proposal for a decay ring to produce intense secondary particle
beams at the sps.

John R. Koza. Genetic Programming: On the Programming of Computers by
Means of Natural Selection. Complex Adaptive Systems. The MIT Press, De-
cember 1992.

Slawomir Koziel and Zbigniew Michalewicz. Evolutionary algorithms, homo-
morphous mappings, and constrained parameter optimization. Evolutionary
Computation, 7(1):19—44, 1999.

K.Paul and C. Johnstone. Optimizing the pion capture and decay channel. Neu-
trino Factory/Muon Collider Notes MUC-NOTE-COOL-THEORY-0280, Fermi
National Accelerator Laboratory, 2004.

H. W. Kuhn and A. W. Tucker. Nonlinear programming. In Proceedings of 2nd
Berkeley Symposium, Berkeley, pages 481—492. University of California Press.,
1951.

M. Locatelli. A note on the griewank test function. Journal of Global Opti—
mization, 25(2):]69—174, February 2003. 2003.

338

[112] Kai-Yew Lum, Pierre-Marie Jacquart, and Mourad Sefrioui. Constrained Opti-
mization of Multilayered Anti-Reﬂection Coatings using Genetic Algorithms. In
Lipo Wang, Kay Chen Tan, Takeshi Furuhashi, Jong—Hwan Kim, and Xin Yao,
editors, Proceedings of the 4th Asia-Paciﬁc Conference on Simulated Evolution
and Learning (SEAL’2002), volume 1, pages 172—177, Orchid Country Club,
Singapore, November 2002. Nanyang Technical University.

[113] C. O. Maidana, M. Berz, and K. Makino. Muon beam ring cooler simulations
using COSY INFINITY. Insitute of Physics CS, 175:211—218, 2004.

[114] K. Makino and M. Berz. Effects of kinematic correction on the dynamics in
muon rings. AIP CP, 530:217-227, 2000.

[115] K. Makino and M. Berz. Map-based muon cooling channel sim-
ulations with COSY INFINITY. In B. King, editor, Proc. of 6-
Month Feasibility Study on High Energy Muon Colliders, 2001.
httpz/ /pubweb.bnl. gov /people/bking/mucoll/index.html.

[116] K. Makino and M. Berz. Recent applications of COSY to nonlinear beam
dynamics problems. In Proceedings of the APS/DPF/DPB Summer Study on
the Future of Particle Physics (Snowmass 2001 ), 2002. number T510.

[117] K. Makino and M. Berz. Taylor models and other validated functional inclusion
methods. International Journal of Pure and Applied Mathematics, 4(4):379-
456, 2003.

[118] K. Makino and M. Berz. \z'eriﬁed global Optimization with Taylor model meth-
ods. International Journal of Computer Research, 12,2:245—252, 2003.

[119] K. Makino and M. Berz. Solenoid elements in COSY INFINITY. Institute of
Physics CP, 175:219~228, 2004.

[120] K. Makino and M. Berz. Tetra cooler ring simulation in cosy inﬁnity. In
Neutrino Factories and Superbeams, volume 721, page 418. AIP Conference
Proceedings, 2004.

[121] K. Makino and M. Berz. Range bounding for global optimization with Taylor
models. Transactions on Computers, 4,11:1611—1618, 2005.

[122] K. Makino and M. Berz. Veriﬁed global optimization with taylor model based
range bounders. Transactions on Computers, 4(11):1611—1618, November 2005.

[123] K. Makino, M. Berz, D. Errede, and C. J. Johnstone. High order map treatment
of superimposed cavities, absorbers, and magnetic multipole and solenoid ﬁelds.
Nuclear Instruments and Methods A, 519:162—174, 2004.

339

[124.)

[125)

[126]

[127)

[128]

|129|

mm

[131]

I132]

|133|

[134)

K. Makino, D. Errede, and M. Berz. Cooling channel simulations based on
map methods. In Proc. of the APS/DPF/DPB Summer Study on the Future of
Particle Physics (Snowmass 2001), 2002. number T702,.

Kyoko Makino. Rigorous Analysis of Nonlinear Motion in Particle Accelerators.
PhD thesis, Michigan State University, East Lansing, Michigan, USA, 1998.

Kyoko Makino, Martin Berz, Youn-Kyung Kim, and Pavel Snopok. Long-term
stability of large accelerators. ECMI Newsletter, 39, 2006.

GP McCormick. Penalty function versus nonpenalty function methods for con—
strained nonlinear programming problems. Mathematical Programming, 1:217-
238, 1971.

Efren Mezura—Montes. Alternative Techniques to Handle Constraints in Evolu-
tionary Optimization. PhD thesis, Computer Science Section, Electrical Eng.
Depart.ment., CINVESTAV-IPN, Mexico City, Mexico, December 2004.

Z. Michalewicz. A survey of constraint handling techniques in evolutionary
computation methods. In J. R. McDonnell, R. G. Reynolds, and D. B. Fogel,
editors, Proc. of the Fourth Annual Conference on Evolutionary Programming,
pages 135—155, Cambridge, MA, 1995. The MIT Press.

Z. Michalewicz and N. Attia. Evolutionary optimization of constrained prob-
lems. In Proceedings of the 3rd Annual Conference on Evolutionary Program-
ming, pages 98—108. World Scientiﬁc, 1994.

Z. Michalewicz and C. Janikow. Handling constraints in genetic algorithms.
In Proceeddings of the Fourth International Conference on Genetic Algorithms,
pages 151—157, Los Altos, CA, 1991. Morgan Kaufmann Publishers.

Z. Michalewicz and G. Nazhiyath. Genocop III: A co-evolutionary algorithm
for numerical Optimization problems with nonlinear constraints. In Proceedings
of the 2nd IEEE International Conference on Evolutionary Computation, vol.
2, pages 647—651, 1995.

Zbigniew Michalewicz. Genetic Algorithms, Numerical Optimization, and Con-
straints. In Larry J. Eshelman, editor, Proceedings of the Sixth International
Conference on Genetic Algorithms (ICGA-95), pages 151—158, San Mateo, Cal-
ifornia, July 1995. University of Pittsburgh, Morgan Kaufmann Publishers.

Zbigniew M ichalewicz. Genetic Algorithms 1 Data Structures — Evolution Pro-
grams. Springer, 2nd edition, 1998.

340

 

[135] Zbigniew Michalewicz and Marc Schoenauer. Evolutionary algorithms for con-
strained parameter optimization problems. Evolutionary Computation, 4(1):1—
32, 1996.

[136] Box M.J. A new method of constrained Optimization and a comparison with
other methods. Computer Journal, 8:42—52, 1965.

[137] N. V. Mokhov and O. E. Krivosheev. MARS Code Status. Technical Re-
port FERMILAB-Conf-OO/ 181, Fermi National Accelerator Laboratory, August
2000. Presented Paper at the Monte Carlo 2000 International Conference, Lis-
bon, Portugal, October 23-26, 2000.

[138] N. V. Mokhov and A. van Ginneken. Pion production and targetry at p+u—
colliders. Technical Report Fermilab—Conf-98/O41, Fermi National Accelerator
Laboratory, January 1998. Published Proceedings Of the 4th International Con—
ference on Physics Potential and Development of Muon Colliders, San Francisco,
California, December 10—12, 1997.

[139] Holly Moore. MATLAB for Engineers. ESource. Prentice Hall, 1st edition,
April 2006.

[140] N.N. Nekhoroshev. An exponential estimate Of the time of stability of nearly
integrable hamiltonian systems. Uspekhi Mat. Nauk, 32(6), January 1977.

[141] D. Neuffer. Exploration of the “high-frequency” buncher concept.
Neutrino Factory/Muon Collider Notes MUC-NOTE-DECAY_CHANNEL-
0269, Fermi National Accelerator Laboratory, 2003. see http://www-
mucool.fnal.gov/notes/notes.html.

[142] D. Neuffer and A. Van Ginneken. High-frequency bunching and cp— 6E rotation
for a muon source. In Proceedings of 2001 Particle Accelerators Conference,
page 2029, Chicago, 2001.

[143] David Neuffer. Private communication.

[144] David Neuffer. Beam dynamics problems of the muon collaboration: V-factories
and [PL—p“ colliders. Neutrino Factory/Muon Collider Notes MUC-NOTE-
COOL-THEORY-0266, Fermi National Accelerator Laboratory, 2002.

[145] David Neuffer. More studies on cooling with high—frequency phase rotation.
Neutrino Factory/Muon Collider Notes MUC-NOTE—COOL-THEORY-0335,
Fermi National Accelerator Laboratory, 2006.

[146] The Neutrino Factory and Muon Collider Collaboration Web page. on-line.
http://www.cap.bnl.gov/mumu/.

341

[147]

[148]

]149]

]150]

[151]

]152]
]153]

[154]

]155|

[157]

Jorge Nocedal and Stephen Wright. Numerical Optimization. Springer Series
in Operations Research and Financial Engineering. Springer, July 2006.

Collier TC. Ofria C, Adami C. Selective pressures on genomes in molecular
evolution. Journal of Theoretical Biology, 4(222):447—483, June 2003.

Akira Oyama, Koji Shimoyama, and K020 Fujii. New Constraint-Handling
Method for Multi-Objective Multi-Constraint Evolutionary Optimization and
Its Application to Space Plane Design. In R. Schilling, W. Haase, J. Periaux,
H. Baier, and G. Bugeda, editors, Evolutionary and Deterministic Methods for
Design, Optimization and Control with Applications to Industrial and Societal
Problems (EUROGEN 2005), Munich, Germany, 2005.

S. Ozaki, R. Palmer, M. Zisman, J. Gallardo, and (eds). Feasibility study-II Of
a muon—based neutrino source. Technical Report BNL-52623, Muon Collider
Collaboration, June 2001. see http:,r’,/www.cap.bnl.gov/mumu/studyii/FS2-
report.html.

Vilfredo Pareto. Cours D’Economie Politique, volume I. F. Rouge, Lausanne,
1896.

Kevin Paul. Frontend optimization, development and application. presentation.

Perl Programming Language. on-line. http://www.perl.org.

A. A. Poklonskiy, D. Neuffer, C. J. Johnstone, M. Berz, K. Makino, D. A.
Ovsyannikov, and A. D. Ovsyannikov. Optimizing the adiabatic buncher and
phase rotator. Nuclear Instruments and Methods, 558:135-141, 2005.

AA. Poklonskiy, D.A. Ovsyannikov, A.D. Ovsyannikov, D. Neuffer, and
M. Berz. Exploring the bunching section of the neutrino factory. In Physics
and Control, 2005. Proceedings. 2005 International Conference, pages 272—277,
August 2005.

Alexey Poklonskiy. Studies on performance of the COSY Inﬁnity Optimizers on
constraints satisfaction. High Energy Physics Preprint HEP-080317, Michigan
State University, East Lansing, MI, 48823, USA, April 2008. available at http:

//www.bt.pa.msu.edu/pub/.

D. Powell and M. M. Skolnick. Using genetic algorithms in engineering design
optimization with non-linear constraints. In Proceedings of the Fifth Interna—
tional Convference on Genetic Algorithms, pages 424—430, Los Altos, CA, 1993.
Morgan Kaufmann Publishers.

342

[158]

[161]

[162]

[163]

[164]

[165]

[166]

]167]

[168]

[169]

P. Prinetto, M. Rebaudengo, and M. Sonza Reorda. Hybrid genetic algorithms
for the traveling salesman problem. In Proc. of the International Conference on
Neural Networks and Genetic Algorithms, pages 559—566, 1993.

P.Snopok, C.Johnstone, and M.Berz. Simulation and optimization of the teva-
tron accelerator. Springer, 50:199—209, 2005.

S Ramberger and Stephan Russenschuck. Genetic algorithms for
the optimal design of superconducting accelerator magnets. Tech-
nical Report LHC-PROJECT—REPORT-275, CERN Document Server
[http:f/cdsweb.cern.ch/oai2d.py] (Switzerland), 1999.

Ingo Rechenberg. Evolutionsstrategie - Optimierung technischer Systems nach
Prinzipien der biologischen Evolution. PhD thesis, Technical University of
Berlin, 1971.

J. Richardson, M. Palmer, G. Liepins, and M.Hillard. Some guidelines for
genetic algorithms with penalty functions. In Proceedings of the Third Interna-
tional Conference on Genetic Algorithms, pages 191—197, San Francisco, CA,
USA, 1989. Morgan Kaufmann Publishers Inc.

Thomas P. Runarsson and Xin Yao. Stochastic ranking for constrained evo-
lutionary optimization. IEEE Transactions on Evolutionary Computation,
4(3):284—294, September 2000.

S. Sandqvist. On ﬁnding optimal potential customers from a large marketing
database — A genetic algorithm approach. In Proc. of the International Con-
ference on Neural Networks and Genetic Algorithms, pages 528—535, 1993.

Johannes J. Schneider and Scott Kirkpatrick. Stochastic Optimization. Scientiﬁc
Computing. Springer, lst edition, November 2006.

M. Schoenauer and S. Xanthakis. Constrained ga optimization. In Proceedings
of the Fifth International Convference on Genetic Algorithms, pages 573—580,
Los Altos, CA, 1993. Morgan Kaufmann Publishers.

Alexander Schrijver. Theory of Linear and Integer Programming. John Wiley
and Sons, 1998.

Rigorous computation * self-veriﬁed methods page. on-line. http: //bt .pa.msu.
edu/index_se1fva1idated.htm.

Ankur Sinha, Aravind Srinivasan, and Kalyanmoy Deb. A Population-Based,
Parent Centric Procedure for Constrained Real-Parametrer Optimization. In
2006 IEEE Congress on Evolutionary Computation (CEC’2006), pages 943—
949, Vancouver, BC, Canada, July 2006. IEEE.

343

[170]

[171]

[172[

[173[

[174]

[175]

[176]

[177]

[178]

Jr. Stanley Humphries. Principles of Charged Particle Acceleration. John Wiley
and Sons, 1999.

Rainer Storn and Kenneth Price. Differential evolution - a simple and efﬁcient
adaptive scheme for global optimization over continuous spaces. Technical Re-
port TR-95-012, Berkeley, CA, 1995.

Almo TOrn and Antanas Zilinskas. Global Optimization, volume 350 of Lecture
Notes in Computer Science. Springer, 1989.

Larry Wall, Tom Christiansen, and Jon Orwant. Programming Perl. O’Reilly
Media, 3rd ed. edition, July 2000.

Alex Wittig. Private communication.

H. Wollnik. Optics of Charged Particles. Academic Press, Orlando, Florida,
1987.

DB. Wolpert and WC. Macready. NO free lunch theorems for optimization.
Evolutionary Computation, IEEE Transactions, 1(1):67—82, April 1997.

F. Zimmermann, C. Johnstone, M. Berz, B. Erdélyi, K. Makino, and W. Wan.
Fringe ﬁelds and dynamic aperture in muon storage rings. Neutrino Factory/-
Muon Collider Notes MUC-NOTE—NEUTRINO-SRC-0095, Fermi National Ac—
celerator Laboratory, 2000. see http://www-mucool.fnal.gov/notes/notes.html,
also CERN SL 2000-011 A-P, and CERN NuFact Note 21, CERN.

Michael S. Zisman. International scoping study of a future accelerator neutrino
complex. In Proceedings of EPAC 2006, pages 2427—2429, Edinbrugh, Scotland,
2007.

344