SOLVING COMPUTATIONALLY EXPENSIVE PROBLEMS USING
SURROGATE-ASSISTED OPTIMIZATION: METHODS AND APPLICATIONS
                                    By
                               Julian Blank
                           A DISSERTATION
                               Submitted to
                       Michigan State University
               in partial fulfillment of the requirements
                            for the degree of
              Computer Science – Doctor of Philosophy
                                   2022


                                          PUBLIC ABSTRACT
Optimization is omnipresent in many research areas and has become a critical component across
industries. However, while researchers often focus on a theoretical analysis or convergence proof
of an optimization algorithm, practitioners face various other challenges in real-world applications.
This thesis focuses on one of the biggest challenges when applying optimization in practice:
computational expense, often caused by the necessity of calling a third-party software package.
To address the time-consuming evaluation, we propose a generalizable probabilistic surrogate-
assisted framework that dynamically incorporates predictions of approximation models. Besides the
framework’s capability of handling multiple objectives and constraints simultaneously, the novelty
is its applicability to all kinds of metaheuristics. Moreover, often multiple disciplines are involved in
optimization, resulting in different types of software packages utilized for performance assessment.
Therefore, the resulting optimization problem typically consists of multiple independently evaluable
objectives and constraints with varying computational expenses. Besides providing a taxonomy
describing different ways of independent evaluation calls, this thesis also proposes a methodology
to handle inexpensive constraints with expensive objective functions and a more generic concept
for any type of heterogeneously expensive optimization problem. Furthermore, two case studies
of real-world optimization problems from the automobile industry are discussed, a blueprint for
solving optimization problems in practice is provided, and a widely used optimization framework
focusing on multi-objective optimization (founded and maintained by the author of this thesis) is
presented. Altogether, this thesis aims to pave the way to solving (computationally expensive)
real-world optimization problems more efficiently and to bridge the gap between theory and practice.


                                            ABSTRACT
Significant effort has been made to solve computationally expensive optimization problems in
the last two decades. To address the expense of objective and constraint functions, the usage
of surrogate models during optimization has emerged as a predominant approach. Numerous
surrogate-based algorithms have been proposed, each answering what to model, what to model it
with, and how to utilize the additional information in different ways. However, many optimization
methods are tailored to a specific problem or make impractical assumptions, such as having only a
single objective, not having any constraints to satisfy, or all objectives and constraints being equally
time-consuming. Such assumptions make it challenging for practitioners to find suitable optimization
algorithms for their applications and show the need for more generalized methods as
well as best practices for solving computationally expensive applications. Thus, this dissertation
addresses computationally expensive optimization problems in a holistic manner, including their
interdisciplinary character and practicalities. It discusses different viewpoints on computationally
expensive optimization and demonstrates relevance by providing an overview of applications in
various research fields. Moreover, it proposes a generalizable surrogate-assisted framework for
solving computationally expensive problems. The framework endows an existing optimization
algorithm fulfilling minimal requirements with surrogate assistance, improving the convergence
behavior. The algorithm’s search pattern on the computationally inexpensive surrogate is used
to determine the subsequent designs to run for the time-consuming simulation. Moreover, this
thesis investigates an often overlooked fact that objectives and constraints may be independently
computable in practice. Independent target functions result in a heterogeneously expensive optimization
problem with some objectives or constraints being less and some more time-consuming
than others. A typical scenario of heterogeneity is objectives requiring a time-consuming simulation
but computationally inexpensive geometrical or physical constraints. For such cases, this thesis
proposes an optimization method exploiting the inexpensiveness of constraints during optimization.
Furthermore, the approach is generalized to any type of heterogeneity, which requires handling partial
information on a solution level. Because little attention has been paid to such a fundamentally
important aspect of expensive optimization, the proposed method needs a novel way of treating
missing information during optimization. The evaluation order of targets is determined by
an information gain sorting that takes the trade-off between prediction error and evaluation time into
account. The proposed method then iterates over targets with more information gain first and
discards solutions early on during the evaluation process without evaluating them entirely. Besides
algorithmic aspects of optimizing time-consuming functions, this dissertation also addresses
more practical matters. It presents the architecture and usage of the multi-objective optimization
framework pymoo (founded and maintained by the author of this thesis), a widely used and well-
known toolkit across academia and industry. While some end-users directly employ the provided
state-of-the-art algorithms, others utilize rapid prototyping capabilities or other features such as
multi-criteria decision-making and visualization. Moreover, this thesis portrays optimization as a
product of a collaboration between two or more disciplines. The interdisciplinary characteristics
should be a substantial part of the problem-solving procedure. Thus, the proposed blueprint for
collaborative optimization provides guidance and best practices for practitioners. Altogether,
the optimization of time-consuming functions is examined from different angles. Practitioners who
face time-consuming real-world problems can use the proposed methods for solving various types
of optimization problems. Thus, this thesis aims to pave the way to solving (computationally expensive)
real-world optimization problems more efficiently and to bridge the gap between theory and practice.


                                    ACKNOWLEDGEMENTS
First and foremost, I am highly grateful to my supervisor, Kalyanmoy Deb, for his continuous
support, valuable advice, and patience during my Ph.D. study. His knowledge and experience
have encouraged me throughout my academic research and daily life. Moreover, I would
like to thank my Ph.D. committee, Erik Goodman, Sandeep Kulkarni, and Ronald Averill, for
their valuable feedback and comments throughout my academic journey. I would also like to
thank all my co-workers and fellow students for their steady support in and outside the laboratory
and classroom; especially Yashesh Dhebar for the fruitful discussions, Zichao Lu for the steady
feedback concerning my optimization framework, Ritam Guha for accompanying me on my lunch
walks, Proteek Chandan Roy for being my patient desk neighbor, Rayan Hussein for his advice and
reliability at any time, and Yash Vesikar for the great experience working together on a real-world
problem. Also, I would like to thank Nicholle Robertson for reviewing my thesis for language and
grammar.
    Additionally, I would like to express my gratitude to my whole family and relatives, especially
my mother, Barbara Blank, for the emotional and financial support, my son, Emil Babilon, for
being the most beautiful enrichment of my life, my grandmother, Hannelore Blank, for the steady
support, my grandfather, Rudi Blank, for teaching me many useful lessons for succeeding in life,
and my uncle, Holger Blank, for mentoring me during my stay. Moreover, I would like to thank
all my friends in the Lansing area, especially Justin Miller and his children Elinor and Cora Miller,
his dad Bill Miller, and Carolyn Jehners, for the family-like environment during my stay and the
adventures we have had together. Finally, I would like to thank everyone I have been in touch with
within the last few years and who accompanied me on my journey in any way.


                                 TABLE OF CONTENTS
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    ix
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   xi
LIST OF ALGORITHMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
KEY TO ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . .            . . . 1
   1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
       1.1.1 Problem Characteristics and Challenges . . . . . . . . . . . . . . . .       . . . 2
       1.1.2 Facing a Computationally Expensive Problem (CEP) . . . . . . . . .           . . . 5
       1.1.3 The Role of Surrogate Model in Optimization and Some Terminology             . . . 6
   1.2 Research Goals and Contributions . . . . . . . . . . . . . . . . . . . . . . .     . . . 10
   1.3 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  . . . 13
CHAPTER 2 FUNDAMENTALS . . . . . .              . . . . . . . . . . . . . . . . . . . . . . . . 14
   2.1 Single-Objective Optimization (SOO)      . . . . . . . . . . . . . . . . . . . . . . . . 14
   2.2 Multi-Objective Optimization (MOO)       . . . . . . . . . . . . . . . . . . . . . . . . 15
   2.3 Genetic Algorithms (GAs) . . . . . .     . . . . . . . . . . . . . . . . . . . . . . . . 17
   2.4 Data Modeling and Predictors . . . . .   . . . . . . . . . . . . . . . . . . . . . . . . 19
       2.4.1 Polynomial Regression (PR) .       . . . . . . . . . . . . . . . . . . . . . . . . 19
       2.4.2 Radial Basis Functions (RBFs)      . . . . . . . . . . . . . . . . . . . . . . . . 21
       2.4.3 Kriging . . . . . . . . . . . .    . . . . . . . . . . . . . . . . . . . . . . . . 22
   2.5 Efficient Global Optimization (EGO) .    . . . . . . . . . . . . . . . . . . . . . . . . 23
   2.6 Summary of the Chapter . . . . . . .     . . . . . . . . . . . . . . . . . . . . . . . . 24
CHAPTER 3 LITERATURE REVIEW . . . . . . . . . . . . . . . . . .               . . . . . . . . . 25
   3.1 Different Viewpoints on Computationally Expensive Optimization         . . . . . . . . . 25
   3.2 Surrogate-Based Optimization: History and Recent Trends . . . .        . . . . . . . . . 27
   3.3 Relevant Literature and Applications . . . . . . . . . . . . . . . .   . . . . . . . . . 30
   3.4 Summary of the Chapter and Open Issues . . . . . . . . . . . . .       . . . . . . . . . 43
CHAPTER 4 A GENERALIZED PROBABILISTIC SURROGATE-ASSISTED FRAMEWORK . . . . . . . . . . . 44
   4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
   4.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   47
   4.3 Interfacing Metaheuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
   4.4 Probabilistic Surrogate-Assisted Framework (PSAF) . . . . . . . . . . . . . . . . .      53
       4.4.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      53
               4.4.1.1 Influence of Surrogate through Tournament Selection Pressure (𝛼)         55
               4.4.1.2 Continue Optimization on Surrogate (𝛽) . . . . . . . . . . . . . .       56
              4.4.1.3 Balancing the Utilization of Surrogate (𝜌) . . . . . . . . . . .     . .  58
              4.4.1.4 Surrogate Management . . . . . . . . . . . . . . . . . . . . .       . .  59
      4.4.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . .     . .  60
              4.4.2.1 What are suitable values for (𝛼, 𝛽 and 𝜌)? . . . . . . . . . . .     . .  60
              4.4.2.2 Is it beneficial to update 𝜌 each iteration? . . . . . . . . . . .   . .  63
              4.4.2.3 How does PSAF perform compared to other surrogate-based
                       algorithms? . . . . . . . . . . . . . . . . . . . . . . . . . . .   . .  63
      4.4.3 Summary of Section 4.4 . . . . . . . . . . . . . . . . . . . . . . . . .       . .  65
  4.5 Generalized Probabilistic Surrogate-Assisted Framework (GPSAF) . . . . . . .         . .  66
      4.5.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      . .  66
              4.5.1.1 Generalized 𝛼 and 𝛽-phase . . . . . . . . . . . . . . . . . . .      . .  67
              4.5.1.2 Balancing the Exploration and Exploitation (𝛾) . . . . . . . .       . .  70
              4.5.1.3 Surrogate Management . . . . . . . . . . . . . . . . . . . . .       . .  71
      4.5.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . .     . .  72
              4.5.2.1 (Unconstrained) Single-Objective Optimization . . . . . . . .        . .  74
              4.5.2.2 Constrained Single-Objective Optimization . . . . . . . . . .        . .  75
              4.5.2.3 (Unconstrained) Multi-Objective Optimization . . . . . . . .         . .  77
              4.5.2.4 Constrained Multi-Objective Optimization . . . . . . . . . .         . .  79
      4.5.3 Summary of Section 4.5 . . . . . . . . . . . . . . . . . . . . . . . . .       . .  80
  4.6 Summary of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     . .  81
CHAPTER 5 HETEROGENEOUS EXPENSIVE OBJECTIVES AND CONSTRAINTS                                 .  83
  5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  83
  5.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   .  85
  5.3 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   .  89
      5.3.1 Evaluation Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      .  90
      5.3.2 Job Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     .  91
  5.4 Computationally Expensive Objectives and Inexpensive Constraints . . . . . . .         .  93
      5.4.1 Design of Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . .      .  94
      5.4.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      .  96
      5.4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    .  99
      5.4.4 Summary of Section 5.4 . . . . . . . . . . . . . . . . . . . . . . . . . .       . 103
  5.5 Constrained Multi-Objective Optimization Problems With Heterogeneous
      Evaluation Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
      5.5.1 How to Exploit Heterogeneity of an Optimization Problem? . . . . . . .           . 105
      5.5.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      . 107
              5.5.2.1 Survival Under Uncertainty . . . . . . . . . . . . . . . . . . .       . 107
              5.5.2.2 Probabilistic Surrogate-Guided Mating . . . . . . . . . . . . .        . 109
              5.5.2.3 Heterogeneously Expensive Evolutionary Algorithm (HE-EA) .             . 110
              5.5.2.4 Surrogate Management . . . . . . . . . . . . . . . . . . . . . .       . 115
      5.5.3 Results and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . .    . 115
      5.5.4 Summary of Section 5.5 . . . . . . . . . . . . . . . . . . . . . . . . . .       . 122
  5.6 Summary of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     . 122
CHAPTER 6    REAL-WORLD APPLICATIONS . . . . . . . . . . . . . . . . . . . . . . . 124
  6.1 Case Study I: Cylinder Head Water Jacket Optimization . . . . . . . . . . . . .    . . 124
      6.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
      6.1.2 Proposed Proximity-Based Surrogate-Assisted Optimization Method . .          . . 126
             6.1.2.1 Selection of Initial High-Fidelity Solutions . . . . . . . . . .    . . 126
             6.1.2.2 Parallel Infill Strategy . . . . . . . . . . . . . . . . . . . . .  . . 127
             6.1.2.3 Management of the Surrogate Model . . . . . . . . . . . . .         . . 130
             6.1.2.4 Optimization of the Approximate Function(s) . . . . . . . . .       . . 132
             6.1.2.5 Flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . .   . . 133
      6.1.3 Descriptive Experiments . . . . . . . . . . . . . . . . . . . . . . . . .    . . 133
             6.1.3.1 Effect of 𝑅prox . . . . . . . . . . . . . . . . . . . . . . . . .   . . 135
             6.1.3.2 Effect of 𝑁cycle . . . . . . . . . . . . . . . . . . . . . . . . .  . . 137
             6.1.3.3 Effect of 𝑟 R2R . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
      6.1.4 Numerical Comparison . . . . . . . . . . . . . . . . . . . . . . . . . .     . . 139
      6.1.5 Application: Cylinder Head Water Jacket Design . . . . . . . . . . . .       . . 144
             6.1.5.1 Problem Description . . . . . . . . . . . . . . . . . . . . . .     . . 144
             6.1.5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   . . 144
      6.1.6 Summary of Section 6.1 . . . . . . . . . . . . . . . . . . . . . . . . .     . . 148
  6.2 Case Study II: Electric Machine Design . . . . . . . . . . . . . . . . . . . . .   . . 148
      6.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
      6.2.2 Electric Machine Design and Optimization Problem Formulation . . . .         . . 151
             6.2.2.1 Selection of Machine Topology, Objective Functions, and
                        Evaluation Method . . . . . . . . . . . . . . . . . . . . . . .  . . 152
             6.2.2.2 Definition of Feasible Search Space . . . . . . . . . . . . . .     . . 153
             6.2.2.3 Selection of Operating Point for Optimization . . . . . . . . .     . . 153
      6.2.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    . . 155
             6.2.3.1 Repair Operator . . . . . . . . . . . . . . . . . . . . . . . .     . . 157
             6.2.3.2 Surrogate Incorporation . . . . . . . . . . . . . . . . . . . .     . . 157
             6.2.3.3 NSGA-II-WR-SA . . . . . . . . . . . . . . . . . . . . . . . .       . . 158
      6.2.4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . .   . . 160
             6.2.4.1 Analysis of Constraints . . . . . . . . . . . . . . . . . . . . .   . . 160
             6.2.4.2 Impact of Repair Operator . . . . . . . . . . . . . . . . . . .     . . 161
             6.2.4.3 Parameter Study for Surrogate-Assisted Optimization . . . . .       . . 162
             6.2.4.4 Convergence Analysis with and without Surrogates . . . . . .        . . 166
             6.2.4.5 Analysis of Pareto-optimal Solutions . . . . . . . . . . . . .      . . 168
             6.2.4.6 Selection of Preferred Solutions . . . . . . . . . . . . . . . .    . . 168
      6.2.5 Summary of Section 6.2 . . . . . . . . . . . . . . . . . . . . . . . . .     . . 173
  6.3 Summary of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   . . 174
CHAPTER 7 OPTIMIZATION IN PRACTICE . . .             . . . . . . . . . . . . . . . . . . . . 175
  7.1 Collaborative Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
      7.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
      7.1.2 Related Work . . . . . . . . . . . . .   . . . . . . . . . . . . . . . . . . . . 177
      7.1.3 SOLVeR: Collaborative Optimization       . . . . . . . . . . . . . . . . . . . . 178
      7.1.4 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
      7.1.5 Summary of Section 7.1 . . . . . . .     . . . . . . . . . . . . . . . . . . . . 187
   7.2 pymoo: Multi-Objective Optimization in Python     . . . . . . . . . . . . . . . . . . . 188
       7.2.1 Introduction . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . . . . 189
       7.2.2 Related Work . . . . . . . . . . . . . .    . . . . . . . . . . . . . . . . . . . 190
       7.2.3 Architecture . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . . . . 192
       7.2.4 Problems . . . . . . . . . . . . . . . .    . . . . . . . . . . . . . . . . . . . 194
              7.2.4.1 Implementations . . . . . . .      . . . . . . . . . . . . . . . . . . . 194
              7.2.4.2 Parallelization . . . . . . . .    . . . . . . . . . . . . . . . . . . . 194
       7.2.5 Optimization Module . . . . . . . . . .     . . . . . . . . . . . . . . . . . . . 197
              7.2.5.1 Algorithms . . . . . . . . . .     . . . . . . . . . . . . . . . . . . . 197
              7.2.5.2 Operators . . . . . . . . . . .    . . . . . . . . . . . . . . . . . . . 197
              7.2.5.3 Termination Criterion . . . .      . . . . . . . . . . . . . . . . . . . 199
              7.2.5.4 Decomposition . . . . . . . .      . . . . . . . . . . . . . . . . . . . 200
       7.2.6 Analytics . . . . . . . . . . . . . . . .   . . . . . . . . . . . . . . . . . . . 201
              7.2.6.1 Performance Indicators . . . .     . . . . . . . . . . . . . . . . . . . 201
              7.2.6.2 Visualization . . . . . . . . .    . . . . . . . . . . . . . . . . . . . 202
              7.2.6.3 Decision Making . . . . . . .      . . . . . . . . . . . . . . . . . . . 204
       7.2.7 Summary of Section 7.2 . . . . . . . .      . . . . . . . . . . . . . . . . . . . 206
   7.3 Summary of the Chapter . . . . . . . . . . . .    . . . . . . . . . . . . . . . . . . . 207
CHAPTER 8 CONCLUSIONS AND FUTURE WORK . . . . . . . . . . . . . . . . . . . 208
   8.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
   8.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214


                                       LIST OF TABLES
Table 1.1: Terminology of different aspects using surrogates. . . . . . . . . . . . . . . . . .   7
Table 3.1: Overview of surrogate-assisted optimization being used in different kinds of
           applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Table 4.1: Categorization regarding the surrogate’s role in an algorithm. . . . . . . . . . . . 50
Table 4.2: Rankings of best performing hyper-parameters. . . . . . . . . . . . . . . . . . . 63
Table 4.3: Ranking with adaptive 𝜌. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Table 4.4: A comparison of DE, GA, PSO, and CMAES with their GPSAF variants
           on unconstrained single-objective problems with four other surrogate-assisted
           algorithms. The rank of the best performing algorithm in each group is shown
           in bold. The overall best performing algorithm for each problem is highlighted
           with a gray shade. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Table 4.5: A comparison of DE, GA, PSO, and ISRES with their GPSAF variants on
           constrained single-objective problems with SACOBRA – the current
           state-of-the-art algorithm for constrained optimization. . . . . . . . . . . . . . . 76
Table 4.6: A comparison of NSGA-II, SMS-EMOA, and SPEA2 with their GPSAF variants
           with four surrogate-assisted algorithms on bi-objective optimization problems. 78
Table 4.7: A comparison of NSGA-III, SMS-EMOA, and SPEA2 with their GPSAF variants
           with four surrogate-assisted algorithms on three-objective optimization
           problems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Table 4.8: A comparison of NSGA-III, SMS-EMOA, and SPEA2 with their GPSAF
           variants with four surrogate-assisted algorithms on constrained multi-objective
           optimization problems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Table 5.1: The median normalized Inverted Generational Distance (IGD) values out of
           11 runs for NSGA-II, SA-NSGA-II and IC-SA-NSGA-II on constrained
           bi-objective optimization problems. The best performing method and other
           statistically similar methods are marked in bold. . . . . . . . . . . . . . . . . . 101
Table 5.2: Average IGD values for unconstrained bi-objective problems from the ZDT
           test suite. NSGA-II does not use heterogeneous evaluation time information,
           hence produces identical IGD values for all evaluation time combinations. . . . . 117
Table 5.3: Average IGD values for the constrained bi-objective problem TNK. . . . . . . . 118
Table 5.4: Average IGD values for the constrained bi-objective problem Welded Beam. . . . 119
Table 5.5: Average IGD values for the three-objective problem DTLZ2. . . . . . . . . . . . 120
Table 6.1: Median of the performance indicator (𝑓best − 𝑓∗ for single-objective problems
           and HVR for multi-objective problems) and the outcome of the statistical test
           for the performance of the tested methods for 𝐷 = 5. 𝛼ref defines the selected
           reference point for calculation of HVR. . . . . . . . . . . . . . . . . . . . . . . . 142
Table 6.2: Median of the performance indicator (𝑓best − 𝑓∗ for single-objective problems
           and HVR for multi-objective problems) and the outcome of the statistical test
           for the performance of the tested methods for 𝐷 = 10. 𝛼ref defines the selected
           reference point for calculation of HVR. . . . . . . . . . . . . . . . . . . . . . . . 142
Table 6.3: Parameters of IPM machine used for optimization. . . . . . . . . . . . . . . . . 151
Table 6.4: Values of geometric variables used for optimization. . . . . . . . . . . . . . . . 154
Table 6.5: The constraint violation of each constraint value from 𝑔1 to 𝑔10 . . . . . . . . . . 161
Table 6.6: Optimization setup and results. Evaluations (Evals) correspond to total
           functional evaluations performed in five runs. The reported hypervolume is
           calculated after normalization of objective functions. . . . . . . . . . . . . . . . 162
Table 6.7: Complete setup for analyzing impact of hyperparameters on performance of
           surrogate-assisted optimization. For all cases, number of functional
           evaluations is limited to 200 in a single run. Each case is repeated 5 times, thus,
           making total evaluations 1,000 for each case. . . . . . . . . . . . . . . . . . . . 164
Table 6.8: Results for Setups A, B, and C defined for analyzing impact of
           hyperparameters on performance of surrogates. Hypervolume (HV) is calculated
           after normalization of objective functions. . . . . . . . . . . . . . . . . . . . . 164
Table 6.9: Performance comparison of five preferred solutions found using domain
           specific a posteriori MCDM method and trade-off analysis. Preferred values are
           highlighted in bold for the five solutions. . . . . . . . . . . . . . . . . . . . . . . 171
Table 7.1: Multi-objective optimization frameworks in Python. . . . . . . . . . . . . . . . 192
Table 7.2: Multi-objective optimization test problems. . . . . . . . . . . . . . . . . . . . . 195
Table 7.3: Multi-objective optimization algorithms. . . . . . . . . . . . . . . . . . . . . . . 198


                                        LIST OF FIGURES
Figure 1.1: Characteristics of optimization problems. . . . . . . . . . . . . . . . . . . . . .      5
Figure 1.2: The relation between optimization and machine learning. . . . . . . . . . . . .          8
Figure 2.1: The design and objective spaces during optimization. . . . . . . . . . . . . . . 16
Figure 2.2: Flowchart of a genetic algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . 18
Figure 3.1: An interdisciplinary research area with different terminologies based on the
            perspective: goal, method, and problem. . . . . . . . . . . . . . . . . . . . . . 26
Figure 3.2: Overview of the overall percentage of each viewpoint. . . . . . . . . . . . . . . 27
Figure 3.3: Analysis of the literature regarding publications related to surrogate-assisted
            optimization from 1995 to 2021. . . . . . . . . . . . . . . . . . . . . . . . . . 28
Figure 3.4: Overview of surrogate-assisted and related publications at The Genetic and
            Evolutionary Computation Conference (GECCO) from 2005 to 2020. . . . . . . 29
Figure 4.1: Robustly adding surrogate-assistance to population-based algorithms
            (illustration inspired by [1]). . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Figure 4.2: Different roles of surrogates in the design of an algorithm. . . . . . . . . . . . . 49
Figure 4.3: Tournament selection with 𝛼 competitors to create surrogate-influenced
            infill solutions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Figure 4.4: Continuation of the algorithm’s run for 𝛽 iterations on the surrogate. . . . . . . 57
Figure 4.5: Hyper-parameter Analysis for PSAF-CMA-ES with varying 𝛼, 𝛽 and 𝜌.
            Shown are the baseline algorithm CMA-ES (blue), the 2nd to 10th best
            (orange), and the best (red). The performance 𝑓 (any) is normalized with
            respect to the baseline algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . 62
Figure 4.6: Comparison of the average performance of PSAF with the original algorithms
            and other surrogate-based algorithms. . . . . . . . . . . . . . . . . . . . . . . . 65
Figure 5.1: Strategies for the evaluation procedure considering a set of solutions 𝑋 and
            target values 𝑉 to be calculated. . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Figure 5.2: A comparison of ·/B and ·/E strategies assuming parallel processing of all
            jobs, requiring different average times for evaluation. . . . . . . . . . . . . . . . 92
Figure 5.3: Job schedule using a queue. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Figure 5.4: The two steps in each iteration: exploitation and exploration. . . . . . . . . . . 97
Figure 5.5: Sampling the design of experiments only in the feasible space using Rejection-
            Based Sampling (RBS), Niching Genetic Algorithm (NGA) and Energy Method. 100
Figure 5.6: Solutions in the objective space of representative runs for CTP2, CTP4, CTP8,
            C3DTLZ4, TNK, and OSY. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Figure 5.7: One iteration of HE-EA consisting of an ordered target evaluation ( 𝑓1 , 𝑔1 , 𝑓2 )
            and offspring eliminations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Figure 5.8: An illustration of the objective space for different types of unconstrained and
            constrained multi-objective problems. The results are based on a representative
            run of the median performance for each problem. The different expensiveness
            of target functions and termination criteria are set analogously to the other
            experiments. The visibly better red-colored points are obtained using the
            proposed HE-NSGA-III procedure. . . . . . . . . . . . . . . . . . . . . . . . . 121
Figure 6.1: Adaptive trust region approach restricts the search within the cross-patterned
            region allowing the optimization algorithm to focus near high fidelity solutions. 129
Figure 6.2: Selection Error Probability: Pairwise comparison between high-fidelity and
            prediction values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Figure 6.3: Flowchart of PSA-EA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Figure 6.4: Effect of 𝑟 ini and 𝜏𝑅 on the performance of PSA-EA on each test problem
            when FEmax = 100 and 𝑁cycle = 100. . . . . . . . . . . . . . . . . . . . . . . . 136
Figure 6.5: Effect of 𝑁cycle on the performance of PSA-EA (𝜏𝑅 = 2, 𝑟 ini = 0.4) for the
            corresponding test problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Figure 6.6: Effect of 𝑟 R2R on the performance of PSA-ES when 𝜏𝑅 = 2, 𝑁cycle = 20, and
            FEmax = 100. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Figure 6.7: Predicted values of new solutions and their true values after CFD simulation
            for both methods, both problems, and both objectives. . . . . . . . . . . . . . . 146
Figure 6.8: All generated solutions by PSA-EA and CS for problems 𝐵34 and 𝐵38. . . . . . 147
Figure 6.9: Generated solutions in the vicinity of the interest region for problems 𝐵34
             and 𝐵38. The base and the selected designs for fabrication are demonstrated
             by arrows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Figure 6.10: IPM machine used for optimization. . . . . . . . . . . . . . . . . . . . . . . . 152
Figure 6.11: Torque profile of reference design at rated operating conditions. . . . . . . . . . 152
Figure 6.12: Geometric variables used for optimization. . . . . . . . . . . . . . . . . . . . . 154
Figure 6.13: 2010 Prius motor efficiency contours for 650 Vdc [2]. . . . . . . . . . . . . . . 155
Figure 6.14: Ranking selection of solutions obtained by optimizing the surrogate-based
             optimization problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Figure 6.15: Objective space illustrating dominated and non-dominated solutions for opti-
             mization runs completed using NSGA-II and NSGA-II-WR. . . . . . . . . . . . 162
Figure 6.16: Objective space illustrating Pareto-optimal fronts for cases of Setups A, B, and C. 165
Figure 6.17: Comparison of objective space with Pareto-optimal fronts and normalized
             design space of Pareto optimal sets. Pareto-optimal sets are obtained from
             two optimization methods, NSGA-II-WR and NSGA-II-WR-SA. . . . . . . . . 166
Figure 6.18: Exploration of objective space and MSE in prediction of objective functions,
             for each generation using NSGA-II-WR-SA. . . . . . . . . . . . . . . . . . . . 167
Figure 6.19: Objective space highlighting the selected solutions using the a posteriori
             MCDM method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Figure 6.20: Five selected Pareto-optimal solutions highlighted in normalized design space. . 173
Figure 6.21: Magnetic flux density plots of Solutions 1, 2, 3 and reference design at rated
             operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Figure 7.1: Collaborative optimization practice using SOLVeR. . . . . . . . . . . . . . . . 179
Figure 7.2: Phases and responsibilities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Figure 7.3: Software architecture of pymoo. . . . . . . . . . . . . . . . . . . . . . . . . . . 193
Figure 7.4: Illustration of some crossover operators for different variables types. . . . . . . 199
Figure 7.5: Different visualization methods coded in pymoo. . . . . . . . . . . . . . . . . . 203


                                 LIST OF ALGORITHMS
Algorithm 4.1: Infill-And-Advance Interface . . . . . . . . . . . . . . . . . . . . . . . . . 52
Algorithm 4.2: PSAF: A Probabilistic Surrogate-Assisted Framework . . . . . . . . . . . . 54
Algorithm 4.3: GPSAF: Generalized Probabilistic Surrogate-Assisted Framework . . . . . 68
Algorithm 4.4: Probabilistic Knockout Tournament (PKT) . . . . . . . . . . . . . . . . . . 70
Algorithm 5.1: IC-SA-NSGA-II: Inexpensive Constrained Surrogate-Assisted NSGA-II . . 96
Algorithm 5.2: Probabilistic Survival: Subset Selection under Uncertainty . . . . . . . . . 108
Algorithm 5.3: Probabilistic Surrogate-Guided Mating . . . . . . . . . . . . . . . . . . . . 109
Algorithm 5.4: Heterogeneously Expensive Evolutionary Algorithm (HE-EA) . . . . . . . 111
Algorithm 6.1: NSGA-II-WR-SA: NSGA-II with Repair and Surrogate Assistance. . . . . . 159
Algorithm A.1: Scopus Query: Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
Algorithm A.2: Scopus Query: Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
Algorithm A.3: Scopus Query: Goal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213


                           KEY TO ABBREVIATIONS
ASE   Approximate Solution Evaluation
BO    Bayesian Optimization
CEP   Computationally Expensive Problem
CFD   Computational Fluid Dynamics
DE    Differential Evolution
EA    Evolutionary Algorithm
EGO   Efficient Global Optimization
EI    Expected Improvement
EOP   Expensive Optimization Problem
ESE   Exact Solution Evaluation
ET    Evaluation Times
FEA   Finite Element Analysis
GA    Genetic Algorithm
GECCO Genetic And Evolutionary Computation Conference
GPSAF Generalized Probabilistic Surrogate-Assisted Framework
HE-EA Heterogeneously Expensive Evolutionary Algorithm
MOP   Multi-Objective Optimization Problem
MTPA  Maximum Torque Per Ampere
PF    Pareto Front
PI   Probability Of Improvement
PKT  Probabilistic Knockout Tournament
PMSM Permanent Magnet Synchronous Machine
PR   Polynomial Regression
PS   Pareto Set
PSAF Probabilistic Surrogate-Assisted Framework
PSO  Particle Swarm Optimization
RBF  Radial Basis Function
SE   Solution Evaluation
SOO  Single-Objective Optimization
TPE  Tree-Structured Parzen Estimator


                                              CHAPTER 1
                                           INTRODUCTION
1.1    Motivation
Optimization is omnipresent and has become an inherent part of our lives. The products we use
and the services we consume result from a development process where all kinds of decisions
regarding design and functionality had to be made. Such decisions are usually based on criteria to
be met and measures to be minimized or maximized. For instance, the development of a washing
machine requires implementing a controller that decides the amount of soap and water to add
given the extent of dirt and grease with the ultimate goal of cleaning the laundry as effectively and
efficiently as possible; smartphones are designed to maximize the user’s experience considering
functionality, usability, and aesthetics; or music streaming services aim to maximize the availability
of songs and streaming quality while minimizing the consumer’s bandwidth usage. For all these
products and services, optimization plays a vital role during development and allows companies
to keep a competitive advantage through steady improvements. Being aware of the importance
of optimization, one might want to know what optimization is and how it is defined. Indeed, this
is an open question with different answers. On the one hand, mathematicians might state
that the goal of optimization is to find inputs that satisfy the mathematical definition of optimality
given a function to be either minimized or maximized. On the other hand, practitioners might view
optimization generally as a tool to provide insights and support for decision-making when facing an
application problem. A well-known scientist, George Dantzig, who made significant contributions
to industrial engineering, once said:
       True optimization is the revolutionary contribution of modern research to decision
       processes.
                                                                             — George Dantzig
    The usage of the term true optimization implies that optimization seems to be more than
its mathematical definition and emphasizes that optimization as a process shall have a purpose
connected to an overall goal: contributing to modern research. Such an emphasis on applicability
shows the importance of the fusion of theory and practice. When facing a real-world optimization
problem, identifying characteristics and possible challenges is always a good starting point.
1.1.1   Problem Characteristics and Challenges
Facing a real-world optimization problem often starts by investigating the problem’s characteristics.
For most applications, such an analysis will be manifold, because each trait must either implicitly
or explicitly be considered by the optimization method. In the following, important characteristics
of optimization problems are briefly discussed.
Variable Types: The search space of an optimization problem is based on variables and their types.
For instance, common variable types are continuous, discrete, binary, or permutation. Whereas
some problems have variables of a single type, others are of a mixed nature. For example,
mixed-integer programming refers to a problem type with continuous and discrete variables to be
optimized at the same time [3], and a real-world tour planning problem, the traveling thief problem,
consists of permutation and binary variables [4]. Moreover, it is worth noting that the type of
variables also indirectly impacts the search space size. While the search space spanned by
real-valued variables is infinite and uncountable, the cardinality of a discrete search space is
known; in practice, however, it is often intractable to iterate over.
Number of Variables: Not only the type but also the number of variables is critical. Different
algorithms are proposed explicitly for single-variable, small-scale, or large-scale optimization
problems. For example, a relatively simple optimization method such as the golden section method
is known to be effective for optimizing a single variable [5]. However, the golden section search
does not apply to problems with more than one variable. On the other extreme, algorithms have
been customized to work efficiently on a larger scale. For instance, a genetic algorithm has been
customized to solve a problem with one billion binary variables [6], and stochastic gradient
descent-based methods with backpropagation [7] are used to optimize thousands or millions of
variables in (deep) neural networks [8].
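As a concrete instance of the single-variable case mentioned above, golden section search can be sketched in a few lines. This is a minimal illustration, not taken from this dissertation: the function name `golden_section_search`, the tolerance, and the quadratic test function are assumptions, and the method presumes a unimodal function on the given bracket.

```python
import math


def golden_section_search(f, a, b, tol=1e-8):
    """Minimize a unimodal function f on [a, b] (illustrative helper)."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


# Hypothetical test function with its minimum at x = 2.
x_best = golden_section_search(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
```

Each iteration reuses one of the two interior points, so only one new function evaluation is needed per step. The bracketing logic relies on a one-dimensional interval, which is why the method does not carry over to several variables.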
Number of Objectives: In the early days of optimization research, the focus was on optimization
problems with a single objective. However, many optimization problems consist of multiple
conflicting objectives and thus are multi-objective in nature [9]. Consequently, researchers have
started to propose algorithms to simultaneously optimize multiple objectives with the goal of
obtaining a so-called Pareto-optimal set of solutions. Because not a single solution but a whole
set of solutions shall be found, population-based algorithms are predominantly used [10, 11]. It
is worth mentioning that a multi-dimensional objective space has its own challenges, such as
normalization [12], visualization [13], and decision making [14].
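To make the notion of a set of trade-off solutions concrete, the following sketch filters the non-dominated points from a small, made-up list of objective vectors. It assumes that all objectives are minimized; the helper names `dominates` and `non_dominated` and the sample data are hypothetical. Note that it only identifies the non-dominated subset of a finite sample, which approximates rather than equals the Pareto-optimal set of a problem.

```python
def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical bi-objective values (f1, f2) of five solutions.
F = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 3.5)]
front = non_dominated(F)
```

The quadratic-time filter is sufficient for illustration; population-based algorithms typically use more efficient non-dominated sorting procedures.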
Constraints: Criteria that a solution needs to satisfy are referred to as constraints [15]. From
a user’s perspective, constraints have priority over objective values. No matter how good the
solution’s objectives are, a solution is considered infeasible if it violates one or more constraints [16].
Researchers distinguish between equality and inequality constraints when defining an optimization
problem. Equality constraints are rather strict and may or may not be supported depending on
the type of algorithm. Inequality constraints can have less or more impact depending on their
definition. In some (rare) cases, the optimum with and without constraints might remain the same;
in other cases, the optimum is shifted to the boundary of one or multiple constraints. Generally,
constraints can have a non-negligible impact on the optimization problem’s complexity.
Differentiability: When derivatives are available, they can be used directly by an
optimization method [17]. Most commonly, algorithms use first- or second-order derivatives to improve
convergence. However, for many real-world optimization problems, obtaining derivatives is
impractical or impossible. Thus, gradient-free optimization must be applied, which is, in its more
general form, also known as black-box optimization where no problem-dependent information is
utilized [18].
Fitness Landscape: Most of the previously discussed problem characteristics are known before the
optimization is applied. However, the nature of the fitness landscape is usually less obvious yet
critical. In contrast to relatively simple uni-modal functions, a multi-modal function with a few
or many local optima increases the problem complexity significantly. Many suboptimal regions
require balancing exploitation and exploration during optimization. Whereas global optimization
focuses on finding the single best solution of a function with possibly many local optima,
multi-modal optimization attempts to obtain all locally optimal solutions simultaneously [19]. When
multiple local or global optima have been found, a common post-processing task is deciding what
solution to choose, which often involves comparing the robustness or sensitivity of solutions.
Uncertainty: Many optimization algorithms assume that the objective and constraint functions
are deterministic. However, especially in applications based on simulations, some underlying
randomness may exist, and thus an evaluation with the same input produces different results. In
other words, this type of problem consists of a non-deterministic function introducing uncertainty
during evaluation. Some researchers address the uncertainty by converting the problem to be
deterministic. The evaluation is repeated with different random seeds in such approaches, and the
average and variance can then serve as objectives or constraints and be optimized. Even though this
might be suitable for some problems, algorithms directly addressing the functions’ stochasticity
are preferred. Different kinds of optimization methods have been proposed in the research field of
stochastic optimization [20].
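The conversion to a deterministic problem described above can be sketched as follows. The noisy objective (a sphere function with additive Gaussian noise), the noise level, and the number of seeds are assumptions chosen purely for illustration.

```python
import random
import statistics


def noisy_objective(x, rng):
    """Hypothetical stochastic objective: sphere function plus Gaussian noise."""
    return sum(xi ** 2 for xi in x) + rng.gauss(0.0, 0.1)


def evaluate_with_repeats(x, seeds):
    """Repeat the evaluation once per seed and summarize the outcomes."""
    values = [noisy_objective(x, random.Random(seed)) for seed in seeds]
    return statistics.mean(values), statistics.variance(values)


# The mean and the variance can then serve as deterministic objectives or constraints.
mean_f, var_f = evaluate_with_repeats([1.0, 2.0], seeds=range(30))
```

Fixing the seeds makes the summarized values reproducible, at the price of multiplying the number of evaluations by the number of repetitions. For expensive functions this cost is one reason why methods that address stochasticity directly are preferred.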
Evaluation Time: For practitioners, the time spent on optimization is traded off against the
solution’s quality. The time spent on optimization can be divided into two categories: first, the time
for evaluating solutions and, second, the algorithmic overhead. In real-world optimization, the
evaluation of a solution often requires utilizing domain-specific third-party software to carry out
simulations, for instance, Computational Fluid Dynamics (CFD) [21] or Finite Element Analysis
(FEA) [22]. The optimization of problems with computationally expensive evaluations justifies
more algorithmic overhead with the goal of selecting solutions to be evaluated more carefully.
                        Figure 1.1: Characteristics of optimization problems.
    Certainly, this shall not be an exhaustive list of problem characteristics but give the reader of
this dissertation an idea of how fundamentally different optimization problems are. In Figure 1.1,
some of the key challenges are exemplified in a word cloud forming a light bulb. The illustration
figuratively demonstrates the number of possible combinations of difficulties one might face in
practice. Since they all need to be addressed, this also shows the challenge of finding a suitable
optimization method for a specific application problem.
1.1.2   Facing a Computationally Expensive Problem (CEP)
The optimization of computationally expensive objective and constraint functions has already
been briefly discussed. However, since computational expense is one of the biggest challenges
practitioners face in practice, some more details are provided next.
    One may raise the question, what kind of problems are computationally expensive? Technically,
computational expensiveness is directly related to the number of necessary calculations until the
evaluation has terminated. The number of calculations, in turn, is correlated to the problem’s
complexity and definition. Computationally expensive problems (CEPs) occur in many different
research areas and applications, and the computational expense is frequently caused
by a simulation that needs to be run. For instance, discrete-time simulations are employed to model a
complex scenario. Let us consider the manufacturing process in a medium-sized enterprise where
optimization aims to maximize productivity. The manufacturing process itself is rather complex
and includes, amongst other things, arranging a production line or planning the production schedule.
The complexity of the process makes it nearly impossible to reduce the objective to a few equations.
Therefore, a discrete-time simulation is a common approach to forecast performance. However, to
conclude with enough certainty, the simulation needs to be run for a large number of time steps,
and thus the performance evaluation becomes computationally expensive. Besides discrete-time
simulations, other simulations commonly faced in practice are computational fluid dynamics (CFD)
or finite element analysis (FEA). Both computationally intense models are used to forecast fluid
flows or other physical phenomena using partial differential equations. Especially in engineering,
they are popularly utilized to make design decisions during development. Apart from simulations,
time-consuming functions are an inherent part of analyzing a large amount of data. Thus, modeling
or predicting based on a large data set requires a longer processing time since it is often unclear
what model with what hyper-parameters to choose.
    Instead of addressing the complex problem directly, one might suggest attempting to simplify
the problem to reduce its complexity and, thus, the evaluation time. Simplifications, however,
result in a different optimization problem to be solved, and obtained solutions are unlikely to be
optimal regarding the more complicated model. Thus, we argue simplifications can be misleading
and oversimplify the application to be investigated. For this reason, directly optimizing the CEP
is recommended. However, the optimization method needs to consider the computational expense
for a function call and deal with a limited evaluation budget.
1.1.3    The Role of Surrogate Model in Optimization and Some Terminology
When optimizing a CEP, the overall goal is typically not to converge to the true optimum with
many solution evaluations but to obtain a near-optimal solution with only a few. The execution of
a time-consuming evaluation dominates the algorithm’s computational overhead for finding new
solutions in each iteration. Thus, an algorithm has more time for carefully selecting new designs
than traditional optimization algorithms; however, the evaluation budget is usually only limited
to a few hundred evaluations instead of a few thousand. A standard method to speed up the
convergence of existing methods is using a surrogate model (also called metamodel, approximation
model, simulation model, data-driven model, response surface), which approximates the time-
consuming function. Incorporating a surrogate into the optimization process is indicated by adding
"surrogate-assisted" or "metamodel-based" to the algorithm’s name or description.
    Some more terminology used in this thesis is listed in Table 1.1. We distinguish between the time
for a solution evaluation by referring to a computationally inexpensive evaluation using a surrogate
and a time-consuming or computationally expensive evaluation running the application’s evaluation
function. Moreover, in some cases, we emphasize the accuracy levels by differentiating between the
low-fidelity evaluation of the surrogate and the high-fidelity of the application evaluation procedure.
As an abbreviation, we denote the surrogate evaluation by approximate solution evaluation (ASE)
and the application-based assessment simply by solution evaluation (SE) or, in some cases, to
highlight that no approximation error exists, by exact solution evaluation (ESE).
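The distinction between ESE and ASE can be illustrated with a toy example. The sketch below is not one of the surrogate models used in this thesis: the inverse-distance-weighted interpolator, the class name `IDWSurrogate`, and the cheap stand-in for the expensive function are assumptions made only to keep the example self-contained.

```python
def expensive_evaluation(x):
    """Stand-in for an exact solution evaluation (ESE); hypothetical function."""
    return (x - 0.3) ** 2


class IDWSurrogate:
    """Toy surrogate: inverse-distance-weighted interpolation in one variable."""

    def __init__(self, power=2.0):
        self.power = power
        self.X, self.y = [], []

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        weight_sum, weighted_values = 0.0, 0.0
        for xi, yi in zip(self.X, self.y):
            dist = abs(x - xi)
            if dist == 0.0:
                return yi  # reproduce known data exactly
            w = 1.0 / dist ** self.power
            weight_sum += w
            weighted_values += w * yi
        return weighted_values / weight_sum


# High-fidelity data collected from past exact evaluations (ESE).
archive_X = [0.0, 0.25, 0.5, 0.75, 1.0]
archive_y = [expensive_evaluation(x) for x in archive_X]
surrogate = IDWSurrogate().fit(archive_X, archive_y)

# Low-fidelity prediction (ASE) at an unevaluated design, and its error.
x_new = 0.4
ase = surrogate.predict(x_new)
ese = expensive_evaluation(x_new)
error = abs(ase - ese)
```

The prediction is cheap but carries an approximation error, which only becomes known once the exact evaluation has been spent on the same design.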
    Figure 1.2 illustrates the relation between the time-consuming simulation, the surrogate model,
and the optimization method. The optimization method uses the surrogate model to obtain pre-
dictions of the simulation’s outcome. The surrogate is generated based on data from the time-
consuming simulations in the past. Inevitably, such an approximation model comes along with
                    Table 1.1: Terminology of different aspects using surrogates.
     Aspect        Model / Simulation                           Surrogate
     Time          computationally expensive, time-consuming    computationally inexpensive
     Accuracy      high-fidelity                                low-fidelity
     Evaluation    Exact Solution Evaluation (ESE)              Approximate Solution Evaluation (ASE)


[Figure 1.2 diagram. Labels: Computationally Expensive Model / Simulation; Exact Solution Evaluation (high-fidelity); Surrogate (fit, update); Approximate Solution Evaluation (low-fidelity); Optimization (Algorithmic Computation).]
                Figure 1.2: The relation between optimization and machine learning.
an inaccuracy. The most fundamental questions in terms of the design of an algorithm using a
surrogate are:
What? Before thinking about how the surrogate is used, one might ask what should be modeled
at all? Since optimization problems might not have only a single target function, objectives and
constraints must be considered during optimization. Thus, for different optimization problems
and algorithms, a different granularity of data exists. For instance, should the constraints be
aggregated to the constraint violation before modeling, or should each constraint value be modeled
separately? Similarly, should the objective values be aggregated with a decomposition function,
or each objective be considered separately? These fundamental questions are interwoven with the
design of an algorithm but need to be answered to propose any type of surrogate-assisted algorithm.
For more information, we encourage the interested reader to look at the taxonomy that describes
all possible combinations to model objectives and constraints [23].
With What? After determining what values a surrogate model should predict, one has to decide
what type of model to use. Numerous models have been proposed in the literature, each having
upsides and downsides. Apart from deciding on the surrogate type beforehand, researchers have
investigated selecting the most suitable surrogate dynamically to improve the robustness of the
surrogate assistance. A dynamic surrogate selection includes finding the surrogate type and the
surrogate’s hyper-parameters, which rapidly increases the time for modeling. For such a selection,
the suitability of a surrogate needs to be assessed, which requires choosing training and test sets and
defining an error metric. Both are critical and impact the overall performance of a surrogate-assisted
algorithm.
How? Lastly, the question of how an optimization algorithm utilizes the surrogate is of importance.
The usage of a surrogate is a more open question and can vary entirely from algorithm to
algorithm. Whereas in some algorithms the surrogate serves as an assistant, in others the whole
design is based on it. Moreover, the role of the surrogate can differ from being used in
the vicinity of a solution providing local approximations or attempting to forecast the overall trend
of the fitness landscape. Aside from usage, the updating procedure (also referred to as surrogate
management) needs to be addressed and is another critical component for the performance of a
surrogate-assisted algorithm.
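The two granularities raised under the What question, aggregating constraints before modeling versus modeling each constraint separately, lead to different training targets. The sketch below assumes inequality constraints of the form g_j(x) <= 0 and a sum-of-positive-parts aggregation; both are common conventions used here for illustration, and the data are made up.

```python
def constraint_violation(g_values):
    """Aggregate constraints g_j(x) <= 0 into one violation value (assumed convention)."""
    return sum(max(0.0, g) for g in g_values)


# Hypothetical constraint values (g1, g2) for three evaluated solutions.
G = [(-0.5, -0.1), (0.2, -0.3), (0.4, 0.1)]

# Option A: a single surrogate is trained on the aggregated violation.
aggregated_target = [constraint_violation(g) for g in G]

# Option B: one surrogate per constraint, each trained on its own column.
per_constraint_targets = [list(column) for column in zip(*G)]
```

Option A needs only one model, but the `max` operation discards how far a feasible solution is from each constraint boundary. Option B keeps that information at the cost of one model per constraint.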
     These three fundamental questions must be answered when proposing a surrogate-assisted
algorithm. We answer the What question throughout this thesis always by modeling each target
function (objective and constraint) independently. A downside of such an approach is the increase
in the modeling effort for problems with many target objectives and constraints. However, we
argue that the additional computational burden is negligible compared with the time-consuming functions.
Moreover, it avoids prediction error accumulation through function aggregation, leading to higher
accuracy. The answer to the With what question follows the recent development of selecting the
best type and parameters of a surrogate dynamically. Finding the most suitable model for each
target function has been shown to be a robust solution in surrogate-assisted optimization. More details,
including the metric used for surrogate selection, are discussed in each of the chapters. Lastly, the
How question: different ways in which surrogates can be utilized during optimization
are discussed in this thesis. Besides discussing how surrogates have been used throughout
literature, we answer this question on a very generic level with only a few assumptions. This keeps
the proposed ideas applicable to many different types of optimization algorithms. Moreover, the
How question is approached from different viewpoints, such as generalizability, asynchronicity,
and practicability.
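Selecting the most suitable surrogate for a target function, as referred to in the answer to the With what question, can be sketched with a leave-one-out comparison. The two candidate models, the mean-squared-error metric, and the data below are illustrative assumptions and not the specific models or metrics used in the later chapters.

```python
def fit_linear(X, y):
    """Least-squares line through one-dimensional data."""
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in X)
    slope = sum((x - mean_x) * (t - mean_y) for x, t in zip(X, y)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_nearest(X, y):
    """Nearest-neighbor predictor: return the value of the closest known point."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]


def leave_one_out_mse(fit, X, y):
    """Error metric: mean squared error when each point is held out once."""
    errors = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors.append((model(X[i]) - y[i]) ** 2)
    return sum(errors) / len(errors)


# Hypothetical archive of evaluated solutions for one target function.
X = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [2.0 * x + 1.0 for x in X]

candidates = {"linear": fit_linear, "nearest": fit_nearest}
scores = {name: leave_one_out_mse(fit, X, y) for name, fit in candidates.items()}
best_name = min(scores, key=scores.get)
```

When each target function is modeled independently, such a selection would be repeated per objective and per constraint, so the modeling effort grows with the number of target functions.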
1.2    Research Goals and Contributions
Having introduced the optimization of computationally expensive functions using surrogate-assisted
algorithms, we now state this dissertation’s research goals and contributions.
(i) Analysis and Classification of Surrogate-Based Algorithms: Optimization problems with ex-
pensive evaluation functions have become a vital research direction over the last few years. This
dissertation analyzes relevant publications and categorizes them regarding their viewpoints. More-
over, it provides a thematic overview of surrogate-assisted optimization. It covers essential topics
such as what type of surrogates have been used, how constraints or multiple objectives have been
handled in the past, and other recent research trends. Moreover, it presents a large number of
applications demonstrating the importance and practicability of surrogate-assisted optimization.
(ii) Proposal of a Generalizable Surrogate-Assisted Framework: In the last two decades, significant
effort has been made to solve computationally expensive optimization problems using surrogate
models. Regardless of whether surrogates are the primary drivers of an algorithm or improve the
convergence of an existing method, most proposed concepts are rather specific and not very generic.
Some important considerations are selecting a baseline optimization algorithm, a suitable surrogate
methodology, and the surrogate’s involvement in the overall algorithm design. This dissertation
proposes a probabilistic surrogate-assisted framework, demonstrating its applicability to a broad
category of optimization algorithms [24]. The framework injects knowledge from a surrogate
into an existing algorithm through a tournament-based procedure and continues the optimization
run on the surrogate’s predictions. The surrogate’s involvement is determined by updating a
replacement probability based on the accuracy from past iterations. The proposed framework
enables the incorporation of surrogates into an existing optimization algorithm and, thus, paves
the way for new surrogate-assisted algorithms dealing with challenges in less-frequently addressed
computationally expensive functions, such as different variable types, a large number of variables,
multiple objectives, and constraints.
(iii) Proposal of a Methodology for Heterogeneous Expensive Optimization: A significant amount
of research has been done in optimizing computationally expensive functions using different kinds
of surrogate modeling approaches. Most studies, however, assume that the objective
and constraint functions are non-separable and that their values are available at the same time.
However, in practice, the target functions can often be evaluated independently and are differently
time-consuming or, in other words, are heterogeneously expensive. In this dissertation, we first
investigate problems with separately computable computationally inexpensive constraint functions,
while the objectives may still be time-consuming [25]. This scenario is probably the simplest
case of handling heterogeneous and multi-scale surrogate modeling in the presence of constraints.
Second, we generalize the proposed concept to be applied to any kind of heterogeneously expen-
sive problem [26]. The proposed method sends batches to an evaluator asking only a subset of
objectives or constraints to be evaluated. This also requires dealing with partial information during
optimization. Investigating heterogeneously expensive problems is vital for optimizing real-world
optimization problems.
(iv) A Blueprint for Collaborative Optimization: One of the related issues for surrogate-based
optimization is the need for executing the optimization task in collaboration between the domain
experts and the optimization experts. Collaboration among different stakeholders in achieving a
problem-solving task is increasingly recognized as a vital component of applied research today. For
instance, in various research areas in engineering, economics, medicine, and society, optimization
methods are used to find efficient solutions. Such a problem-solving task involves at least two
types of collaborators – optimization experts and domain experts. Neither collaborator can solve a
problem efficiently and meaningfully alone; instead, a systematic collaborative effort that utilizes each
other’s expert knowledge plays a critical role. While many articles on the outcome
of such collaborations have been published, and the justification of domain-specific information
within an optimization has been established, systematic approaches to collaborative optimization
have not been proposed yet. In this dissertation, methodical descriptions and challenges of col-
laborative optimization in practice are provided, and a blueprint illustrating the essential phases of
the collaborative process is proposed [27]. Moreover, collaborative optimization is illustrated by
case studies of previous optimization projects with several industries. The study should encourage
and pave the way for optimization researchers and practitioners to come together and embrace each
other’s expertise as they solve complex problems of the twenty-first century.
(v) Demonstrating Optimization of Computationally Expensive Functions in Practice: Many real-
world problems are associated with computationally expensive and time-consuming simulations
for evaluation. Each candidate design should be selected carefully in such problems, even though
it means extra algorithmic complexity. In this dissertation, two real-world engineering problems
are investigated. First, we demonstrate the optimization of an engine cylinder head design, which
has eight design parameters and two conflicting objectives [28]. Each design evaluation requires a
detailed CFD simulation which takes about one hour using 32 CPUs. The optimization budget is
limited to 61 design evaluations in total. Second, we present the design optimization of an electric
machine, namely a Toyota Prius 2010 motor. The problem consists of 10 design variables, two
computationally expensive objectives, and 10 geometric computationally inexpensive constraints.
The proposed method exploits the independently computable objective and constraint functions
and their difference in evaluation time. The case studies demonstrate the relevance of optimizing
computationally expensive functions in practice.
(vi) Insights on the Design and Usage of an Optimization Framework: Developing an optimization
method entirely from scratch can be tedious and very time-consuming. Thus, choosing an optimiza-
tion framework or customizing existing algorithms is wise and time-saving. As an example, this
dissertation discusses the software design and features of pymoo, a multi-objective optimization
framework in Python [29]. As the leading developers of the framework, we would like to give insights
into its class structure and organization, as well as the meaning and responsibility of being in charge of
a framework. We cover the architecture’s high-level overview to show its capabilities, followed by
explaining each module and its corresponding sub-modules. The algorithm implementations are
customizable, and the source code can be easily modified/extended by supplying custom operators.
The latter is crucial for rapid prototyping, often employed in practice.
1.3    Structure of the Thesis
This thesis is structured in the following way. Chapter 2 provides the necessary background
information for the reader to follow along with this dissertation. On the one hand, this includes
the basics of single and multi-objective optimization and, on the other hand, some state-of-the-art
surrogate models. A brief history and a review of surrogate-assisted optimization are presented
in Chapter 3. Moreover, we provide a categorization of existing surrogate-based optimization
methods and give an overview of applications with computationally expensive functions across
research fields. In Chapter 4, a generic surrogate-assisted method is proposed, first for single-
objective optimization and then for constrained multi-objective optimization. In contrast to most
existing approaches, the proposed method can be applied to a variety of metaheuristics. So far, most
methods assume that the evaluation procedure evaluates objectives and constraints all at once. However, since
this is often not the case in practice, we look into the optimization of independently computable
functions with varying expenses in Chapter 5. The proposed method exploits the less expensive
objectives and defines an information gain metric to find a suitable order for the evaluation process.
In Chapter 6, we present the optimization of two engineering problems with time-consuming
evaluations, together with insights into collaborative optimization in practice. The design of an
optimization framework is described in Chapter 7. Finally, conclusions and future research directions
are discussed in Chapter 8.


                                              CHAPTER 2
                                          FUNDAMENTALS
This chapter lays the foundation for understanding the components of surrogate-assisted optimiza-
tion. First, the goal of (single-objective) optimization is verbally and mathematically defined.
Second, multiple-objective optimization is introduced by explaining fundamental concepts such as
Pareto-dominance and Pareto-optimality. Moreover, we discuss the history and design of genetic
algorithms as they have emerged as the predominant choice for solving multi-objective optimization
problems. Apart from the optimization, three widely used types of surrogate models are presented,
and their benefits and drawbacks are discussed. Lastly, one of the most well-known surrogate-based
algorithms called efficient global optimization (EGO) is described.
2.1   Single-Objective Optimization (SOO)
Before exploring the usage of surrogates in optimization, optimization itself needs to be defined.
Nocedal and Wright describe optimization as “the minimization or maximization of a function
subject to constraints on its variables” [30]. Mathematically, an optimization problem is given by
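\[
\begin{aligned}
\text{Minimize} \quad & f(\mathbf{x}), \\
\text{subject to} \quad & g_j(\mathbf{x}) \le 0, && \forall j \in (1, \ldots, J), \\
& h_k(\mathbf{x}) = 0, && \forall k \in (1, \ldots, K), \\
& x_d^{(L)} \le x_d \le x_d^{(U)}, && \forall d \in (1, \ldots, D),
\end{aligned}
\tag{2.1}
\]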
where $\mathbf{x} \in \Omega$ denotes a vector of length $D$ in the search space $\Omega$, $f(\mathbf{x})$ the objective function mapping
to a real number, $g_j(\mathbf{x})$ the $j$-th inequality constraint, and $h_k(\mathbf{x})$ the $k$-th equality constraint. For
each dimension $d$ of the variable vector $\mathbf{x}$, a box constraint is given by $x_d^{(L)} \le x_d \le x_d^{(U)}$. The
definition only considers minimization as it is known that
                                𝑥 ∗ = argmax 𝑓 (x) = argmin − 𝑓 (x),                            (2.2)
and thus, any maximization problem can be converted to a minimization problem. In the remainder of this
thesis, we explicitly refer to single-objective optimization (SOO) whenever it needs to be emphasized that the
optimization problem consists of a single objective function.
2.2   Multi-Objective Optimization (MOO)
In practice, optimization problems often consist not of one but of multiple conflicting objectives.
To some readers, this may seem to be a relatively minor change in the problem definition;
however, it significantly impacts the complexity of the optimization problem. A multi-objective
optimization problem (MOP) is mathematically defined by
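\[
\begin{aligned}
\text{Minimize} \quad & f_m(\mathbf{x}), && \forall m \in (1, \ldots, M), \\
\text{subject to} \quad & g_j(\mathbf{x}) \le 0, && \forall j \in (1, \ldots, J), \\
& h_k(\mathbf{x}) = 0, && \forall k \in (1, \ldots, K), \\
& x_d^{(L)} \le x_d \le x_d^{(U)}, && \forall d \in (1, \ldots, D).
\end{aligned}
\tag{2.3}
\]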
    In contrast to single-objective optimization problems (see Equation 2.1), multiple objectives
$f_m$ with $m \in (1, \ldots, M)$ are minimized, which results in a multi-dimensional objective space
$\mathbb{R}^M$. The existence of a multi-dimensional design and objective space is illustrated in Figure 2.1.
In this example, the design space consists of three variables ($D = 3$), and the objective space
has a dimensionality of two ($M = 2$). Each three-dimensional point $\mathbf{x}$ in the design space maps
to a two-dimensional point $\mathbf{f}(\mathbf{x})$ in the objective space. It is interesting to observe that the
two-dimensional objective space does not allow us to conclude if one solution p is better than another
solution q by simply comparing two scalar values. Because of an additional dimension in the
objective space, two vectors instead of two scalars have to be compared, and a more general
concept is necessary.
Figure 2.1: The design and objective spaces during optimization. (Panels: the decision space containing the Pareto set (PS) and the objective space containing the Pareto front (PF); a point x in the decision space maps to f(x) in the objective space.)
One of the most important relations between two solutions in multi-objective
optimization is Pareto-dominance:
Definition 2.2.1 (Pareto-dominance). A solution p dominates another solution q with the Pareto-
dominance relation if 𝑓𝑖 (p) ≤ 𝑓𝑖 (q) holds ∀𝑖 ∈ {1, . . . , 𝑀 } and ∃ 𝑗 ∈ {1, . . . , 𝑀 } such that
 𝑓 𝑗 (p) < 𝑓 𝑗 (q).
      In other words, a solution p dominates another solution q if (i) p is not worse in any objective
and (ii) is strictly better in at least one objective. Having defined the relation between two solutions,
one can decide if a solution is optimal in a multi-objective context:
Definition 2.2.2 (Pareto-optimal). A solution x∗ is Pareto-optimal if there does not exist any solution
p which Pareto-dominates x∗ .
      First, one can note that claiming that one solution is Pareto-optimal requires considering its
relation to all other solutions in the search space. Second, the definition allows more than one
solution to be Pareto-optimal. Commonly, researchers refer to all Pareto-optimal solutions in the
design space as the Pareto set (PS). The PS maps to the objective space where it is called Pareto
front (PF) defined by PF = {f (x) | x ∈ PS}. The goal of multi-objective optimization is to obtain
PS and thus PF as quickly as possible.
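Definitions 2.2.1 and 2.2.2 can be sketched in a few lines of Python. The snippet below is only an illustration under the minimization convention of Equation 2.3; the function names `dominates` and `non_dominated` and the four example objective vectors are assumptions made for this sketch and are not taken from the dissertation or from pymoo. Note that it filters a finite set of evaluated solutions, whereas Pareto-optimality in the strict sense refers to the entire search space.

```python
from typing import List, Sequence


def dominates(fp: Sequence[float], fq: Sequence[float]) -> bool:
    """Return True if objective vector fp Pareto-dominates fq (minimization)."""
    # (i) fp is not worse than fq in any objective
    not_worse = all(a <= b for a, b in zip(fp, fq))
    # (ii) fp is strictly better than fq in at least one objective
    strictly_better = any(a < b for a, b in zip(fp, fq))
    return not_worse and strictly_better


def non_dominated(F: List[Sequence[float]]) -> List[int]:
    """Indices of the vectors in F that are not dominated by any other vector in F."""
    return [
        i for i, fi in enumerate(F)
        if not any(dominates(fj, fi) for j, fj in enumerate(F) if j != i)
    ]


# Hypothetical objective vectors of four evaluated solutions (M = 2).
F = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = non_dominated(F)  # (3.0, 3.0) is dominated by (2.0, 2.0)
```

The pairwise check is quadratic in the number of solutions, which is acceptable for the small sets that arise when evaluations are expensive.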
     Even though the definition of multi-objective problems (see Equation 2.3) includes any value
for the number of objectives $M$, it is worth noting that researchers often refer to optimization
problems with four or more objectives ($M \ge 4$) as many-objective optimization problems. One
reason for choosing $M \ge 4$ is the increasing difficulty of visualizing results
in more than three dimensions and the need for a different search methodology to handle
many dimensions. For more details about multi- and many-objective optimization, we encourage
interested readers to have a look at Kalyanmoy Deb’s book [9], one of the most formative and cited
books in this research area.
2.3   Genetic Algorithms (GAs)
In the late 1950s and early 1960s, John Holland proposed to mimic sexually reproducing organ-
isms in computer science [31]. Because it did not address recombination but only the mutation,
researchers did not pay much attention to this initial study. Nevertheless, Holland continued his
research and published his seminal book Adaptation in Natural and Artificial Systems [32], which,
together with David Goldberg’s book Genetic Algorithms in Search, Optimization and Machine
Learning [1], is considered the birth of genetic algorithms. From there on, researchers and
practitioners started to apply evolutionary computation to problems in various research fields, and
it quickly gained popularity. Most other existing optimization methods during that time were based
on a single solution that is iteratively improved. The major drawback of such
point-by-point search algorithms is their lack of a global view of the fitness landscape, which significantly
increases the likelihood of getting stuck in a local optimum. In contrast, genetic algorithms
(GAs) are based on a set of solutions and implicitly explore the search space but still exploit the
knowledge of well-performing solutions using evolutionary operators.
     Before describing the overall procedure of GAs, some terms frequently used in evolutionary
computing will be discussed. Researchers commonly refer to the set of solutions as population and
a single solution as individual. Moreover, each iteration of a genetic algorithm is called generation.
An important hyper-parameter of genetic algorithms is the population size, which determines the
number of individuals kept in each generation.
Figure 2.2: Flowchart of a genetic algorithm. (Stages: Initialization; Mating, consisting of Selection, Crossover, and Mutation; Survival, in which Population and Offsprings are combined into a Merged set that is truncated to the Survivors; Termination.)
Figure 2.2 illustrates the procedure of genetic
algorithms (GAs). First, the initial population is (usually randomly) created and evaluated. Then,
the current population is used to create offspring by executing a procedure called mating (also
known as recombination). Mating itself consists of three steps: selection, crossover, and mutation.
The selection creates a pool of individuals serving as parents. The crossover operator recombines the
parents to generate one or multiple offspring solutions. The crossover’s design has the challenging
task of incorporating information from all parents into the new solutions. Then, the
mutation is applied to each offspring by perturbing it with a specific pre-defined probability.
Once the mating has been completed, the offspring are merged with the current population. This
results in a population twice as large as initially. The (environmental) survival applies a natural
selection to the merged population and reduces it back to its original size. This reduction
mimics the survival of the fittest in biology and is an essential aspect of evolutionary computation.
After the survival, the algorithm either proceeds with the next generation or terminates.
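The loop described above can be sketched as follows. This is a minimal illustration, not the implementation used in this dissertation or in pymoo: the test objective `sphere`, the binary tournament selection, the uniform crossover, the Gaussian mutation with step size 0.3, and all default parameter values are assumptions chosen only to keep the example short.

```python
import random


def sphere(x):
    """Illustrative single-objective test function (sum of squares)."""
    return sum(v * v for v in x)


def genetic_algorithm(f, n_var=5, pop_size=20, n_gen=50, p_mut=0.2,
                      lower=-5.0, upper=5.0, seed=1):
    rng = random.Random(seed)

    # Initialization: create and evaluate a random population of (fitness, x) pairs.
    pop = []
    for _ in range(pop_size):
        x = [rng.uniform(lower, upper) for _ in range(n_var)]
        pop.append((f(x), x))
    history = [min(fx for fx, _ in pop)]

    def select():
        # Selection (assumed here: binary tournament on the objective value).
        a, b = rng.sample(pop, 2)
        return a[1] if a[0] <= b[0] else b[1]

    for _ in range(n_gen):
        offspring = []
        for _ in range(pop_size):
            p1, p2 = select(), select()
            # Crossover (assumed here: uniform, each variable from either parent).
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Mutation: perturb each variable with probability p_mut, then clip to the box.
            child = [min(upper, max(lower, v + rng.gauss(0.0, 0.3)))
                     if rng.random() < p_mut else v for v in child]
            offspring.append((f(child), child))
        # Survival: merge parents and offspring, truncate back to pop_size.
        merged = sorted(pop + offspring, key=lambda ind: ind[0])
        pop = merged[:pop_size]
        history.append(pop[0][0])
    return pop[0], history


best, history = genetic_algorithm(sphere)
```

Because the survival step truncates the merged population, the best objective value stored in `history` never gets worse from one generation to the next.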
    GAs are known as a meta-heuristic because the realization of the evolutionary operators allows
incorporating domain-specific knowledge into the algorithm. The usage of such custom operators in
genetic algorithms is also referred to as customization [33]. One might have realized that processing
a set of solutions in each iteration is ideal for obtaining not a single but multiple optimal solutions
in the end. Thus, GAs are a suitable candidate for finding a set of optimal solutions due to their
search behavior.
2.4    Data Modeling and Predictors
Data modeling is a well-studied research direction with broad applicability in all kinds of disciplines.
Thus, numerous approximation and interpolation models have been proposed over the last decades.
A model aims to approximate the relation between input $\mathbf{X}$ and output $\mathbf{y}$. Let the $i$-th data point
of the input be denoted by $\mathbf{X}^{(i)}$ with $i \in (1, \ldots, N)$ and be a vector of length $D$, i.e., $\mathbf{X}^{(i)} \in \mathbb{R}^D$. We
refer to the $d$-th value of the $i$-th data point by $x_d^{(i)}$ and to the corresponding function value of $\mathbf{X}^{(i)}$
by $y^{(i)} = f(\mathbf{X}^{(i)})$. Given another set of data points $\mathbf{X}'$ and $\mathbf{y}'$ originating from the same source,
a predictor uses a model based on $\mathbf{X}$ and $\mathbf{y}$ to provide predictions $\hat{\mathbf{y}}'$ given $\mathbf{X}'$. Note that in this
dissertation, we generally denote predicted values from models by the hat symbol ($\hat{\ }$).
Generally, the deviation between the predictions ŷ′ and the true values y′ should be minimized.
Some popularly used error metrics for measuring the deviation are the mean squared error, mean
absolute error, or the coefficient of determination.
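The three error metrics named above can be written down directly. The following sketch uses plain Python and hypothetical example values; the function names are chosen for this illustration only.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared deviations between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute deviations between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true values y' and predictions y-hat'.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

mse = mean_squared_error(y_true, y_pred)   # 0.125
mae = mean_absolute_error(y_true, y_pred)  # 0.25
r2 = r2_score(y_true, y_pred)              # 0.9
```

The first two metrics are minimized, whereas the coefficient of determination is maximized and equals one for a perfect prediction.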
    Since our purpose of using models is their utilization as surrogates during optimization, three
common choices for surrogates – polynomial regression (PR), radial basis function (RBF), and
Kriging – are discussed next.
2.4.1    Polynomial Regression (PR)
One type of surrogate model which has been used very early on in optimization is polynomial
regression (PR). An important parameter of PR is the degree of the polynomial, which determines
the complexity of the model. Let 𝑃(X) be a function mapping an input X to its polynomial
representation. Then, one can formulate a system of linear equations
                                            y = 𝑃(X) 𝛽 + 𝜖,                                        (2.4)
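where $\mathbf{y}$ is the output, $\beta$ are the coefficients to be found, and $\epsilon$ is the approximation error. Minimizing
the approximation error $\epsilon$ in the least-squares sense results in
\[
\beta = \left( P(\mathbf{X})^T P(\mathbf{X}) \right)^{-1} P(\mathbf{X})^T \mathbf{y}. \tag{2.5}
\]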
    The shape and values of $P(\mathbf{X})$ differ depending on the polynomial degree. The simplest
form of polynomial regression is constant, where $P(\mathbf{X})$ is given by
\[
P(\mathbf{X}) = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}, \tag{2.6}
\]
which plays only a minor role in practice. More interesting, however, is a linear
\[
P(\mathbf{X}) = \begin{pmatrix}
1 & X_1^{(1)} & \cdots & X_d^{(1)} \\
\vdots & \vdots & \ddots & \vdots \\
1 & X_1^{(n)} & \cdots & X_d^{(n)}
\end{pmatrix}, \tag{2.7}
\]
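or, quadratic regression
\[
P(\mathbf{X}) = \begin{pmatrix}
1 & X_1^{(1)} & \cdots & X_d^{(1)} & \left(X_1^{(1)}\right)^2 & \cdots & X_i^{(1)} X_j^{(1)} & \cdots & \left(X_d^{(1)}\right)^2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \ddots & & \vdots \\
1 & X_1^{(n)} & \cdots & X_d^{(n)} & \left(X_1^{(n)}\right)^2 & \cdots & X_i^{(n)} X_j^{(n)} & \cdots & \left(X_d^{(n)}\right)^2
\end{pmatrix}, \tag{2.8}
\]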
which are employed to estimate first- or second-order derivatives. It is worth noting that a smaller
degree might not capture the function entirely but indicate trends. In contrast, a larger degree can
model more complex functions but often suffers from overfitting. Moreover, in comparison to other
models, PR does not necessarily fit through all data points exactly.
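As an illustration of Equations 2.4 to 2.8, the sketch below fits a quadratic polynomial with NumPy. It is a sketch under assumptions: the helper names `quadratic_features` and `fit_polynomial`, the two-dimensional test function, and the sample size are hypothetical. The columns are ordered as constant, linear, then all degree-two products, which differs slightly from the column order in Equation 2.8 but spans the same model. `numpy.linalg.lstsq` is used instead of forming the inverse in Equation 2.5 explicitly; both give the same coefficients when $P(\mathbf{X})^T P(\mathbf{X})$ is invertible.

```python
import numpy as np


def quadratic_features(X):
    """Build P(X) with columns [1, x_1..x_d, x_i * x_j for all i <= j]."""
    X = np.atleast_2d(X)
    n, d = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)


def fit_polynomial(P, y):
    """Least-squares coefficients beta minimizing ||y - P beta||^2."""
    beta, *_ = np.linalg.lstsq(P, y, rcond=None)
    return beta


# Hypothetical data: 30 samples of a known quadratic function in two variables.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 1]

beta = fit_polynomial(quadratic_features(X), y)
y_hat = quadratic_features(X) @ beta  # predictions at the training points
```

Since the data are generated by a noise-free quadratic, the fitted coefficients recover the generating ones; with noisy or non-polynomial data the model would not pass through the data points exactly.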
2.4.2   Radial Basis Functions (RBFs)
Radial basis functions (RBFs) [34] are real-valued functions used to predict new points through
interpolation. The interpolation is based on the relation between pairs of solutions $\mathbf{X}^{(i)}$ and $\mathbf{X}^{(j)}$.
Given a distance $r = \lVert \mathbf{X}^{(i)} - \mathbf{X}^{(j)} \rVert$ between two points, their relation is expressed by a kernel
function $k(r)$. Some well-known kernel functions are:
\[
\begin{aligned}
k(r) &= r && \text{(linear)}, \\
k(r) &= r^3 && \text{(cubic)}, \\
k(r) &= \sqrt{r^2 + \sigma^2} && \text{(multiquadric)}, \\
k(r) &= e^{-r^2 / \sigma^2} && \text{(gaussian)}, \\
k(r) &= r^2 \log(r) && \text{(tps)}.
\end{aligned}
\tag{2.9}
\]
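    The kernel function determines the type of relation between points. Given two sets of points
$\mathbf{A}$ and $\mathbf{B}$, the kernel function $k$ defines the kernel matrix $K(\mathbf{A}, \mathbf{B})$ by applying $k$ to all
possible point pairs $\mathbf{a}^{(i)} \in \mathbf{A}$ and $\mathbf{b}^{(j)} \in \mathbf{B}$:
\[
K(\mathbf{A}, \mathbf{B}) = \begin{pmatrix}
k(\mathbf{a}^{(1)}, \mathbf{b}^{(1)}) & \cdots & k(\mathbf{a}^{(1)}, \mathbf{b}^{(|B|)}) \\
\vdots & \ddots & \vdots \\
k(\mathbf{a}^{(|A|)}, \mathbf{b}^{(1)}) & \cdots & k(\mathbf{a}^{(|A|)}, \mathbf{b}^{(|B|)})
\end{pmatrix}. \tag{2.10}
\]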
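    To fit the RBF model, $\mathbf{A} = \mathbf{B} = \mathbf{X}$, and thus the kernel matrix $K(\mathbf{X}, \mathbf{X})$ and the tail $P(\mathbf{X})$ form a linear
system of equations
\[
\begin{bmatrix} K(\mathbf{X}, \mathbf{X}) & P(\mathbf{X}) \\ P(\mathbf{X})^T & 0 \end{bmatrix}
\begin{bmatrix} \lambda \\ c \end{bmatrix}
=
\begin{bmatrix} \mathbf{y} \\ 0 \end{bmatrix}, \tag{2.11}
\]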
    to be solved for 𝜆 and 𝑐. The predictions for an unknown set of points X′ are then given by
                                      ŷ′ = 𝐾 (X′, X)𝜆 + 𝑐𝑇 𝑃(X′).                             (2.12)
     RBFs have the advantage of fitting through each data point and of basing a prediction on the distances
to other solutions in its neighborhood. However, one open question is which distance function
(most researchers use the Euclidean distance) and which kernel function are the most suitable for
modeling a specific data set.
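The fitting and prediction steps of Equations 2.10 to 2.12 can be sketched with NumPy as shown below. The cubic kernel, the linear tail $P(\mathbf{X}) = [1, x_1, \ldots, x_D]$, the Euclidean distance, the helper names, and the test data are all assumptions made for this illustration; the tail contribution is computed as $P(\mathbf{X}')\,c$ so that the result is a vector with one prediction per row of $\mathbf{X}'$.

```python
import numpy as np


def kernel_matrix(A, B):
    """Kernel matrix K(A, B) for the cubic kernel k(r) = r^3 on Euclidean distances."""
    r = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return r ** 3


def tail(X):
    """Linear polynomial tail with columns [1, x_1, ..., x_D] (an assumed choice)."""
    return np.column_stack([np.ones(len(X)), X])


def fit_rbf(X, y):
    """Solve the block system of Equation 2.11 for the weights lam and tail coefficients c."""
    K, P = kernel_matrix(X, X), tail(X)
    m = P.shape[1]
    lhs = np.block([[K, P], [P.T, np.zeros((m, m))]])
    rhs = np.concatenate([y, np.zeros(m)])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:len(X)], sol[len(X):]


def predict_rbf(X_new, X, lam, c):
    """Predictions as in Equation 2.12: kernel part plus polynomial tail."""
    return kernel_matrix(X_new, X) @ lam + tail(X_new) @ c


# Hypothetical data: 20 random points in two dimensions.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

lam, c = fit_rbf(X, y)
y_fit = predict_rbf(X, X, lam, c)  # reproduces y up to numerical error
```

Evaluating the model at the training points returns the training values, which illustrates the interpolation property mentioned above.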
2.4.3    Kriging
Kriging [35] (also known as Gaussian Process) is a powerful tool frequently used in Geostatistics
and Machine Learning. Similar to RBFs, it uses a kernel function to make inferences [36].
Nevertheless, the motivation of Kriging is entirely different and based on a normal distribution of
functions. The input X and output y are both assumed to be standardized, thus having a mean of
zero and variance of one. Therefore, some data pre-processing might be necessary to apply Kriging
if this is not the case. Predictions are based on a joint distribution between known inputs X and y
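and points to predict X′ with their true function values y′:
\[
\begin{bmatrix} \mathbf{y} \\ \mathbf{y}' \end{bmatrix}
\sim \mathcal{N} \left( \mathbf{0},
\begin{bmatrix}
K(\mathbf{X}, \mathbf{X}) + \sigma_n^2 \mathbf{I} & K(\mathbf{X}, \mathbf{X}') \\
K(\mathbf{X}', \mathbf{X}) & K(\mathbf{X}', \mathbf{X}')
\end{bmatrix} \right). \tag{2.13}
\]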
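     Thus, we can derive the conditional distribution
\[
\mathbf{y}' \mid \mathbf{X}, \mathbf{y}, \mathbf{X}' \sim \mathcal{N}\left( \mu_{y'}, \sigma_{y'} \right), \tag{2.14}
\]
where
\[
\mu_{y'} = K(\mathbf{X}', \mathbf{X}) \left( K(\mathbf{X}, \mathbf{X}) + \sigma_n^2 \mathbf{I} \right)^{-1} \mathbf{y}, \tag{2.15}
\]
and
\[
\sigma_{y'} = K(\mathbf{X}', \mathbf{X}') - K(\mathbf{X}', \mathbf{X}) \left( K(\mathbf{X}, \mathbf{X}) + \sigma_n^2 \mathbf{I} \right)^{-1} K(\mathbf{X}, \mathbf{X}'). \tag{2.16}
\]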
     The values of $\mu_{y'}$ serve as predictions $\hat{\mathbf{y}}'$, and the diagonal of the covariance matrix $\sigma_{y'}$ as
the estimated prediction error. Providing not only a prediction but also an uncertainty measure
is one of the major benefits of Kriging. Moreover, an interesting feature of Kriging is automatic
relevance determination, which optimizes over the influence of a specific dimension (or feature). This is
done by adding $D$ additional parameters to the kernel function, each representing the influence of a
particular dimension. The relevance parameters are usually optimized using maximum likelihood
estimation. However, this additional layer of optimization further increases the computational
burden of fitting the model, which is already known to be cubic with respect to the number of data points.
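Equations 2.15 and 2.16 translate almost literally into NumPy. The sketch below is an illustration under assumptions: it uses the Gaussian kernel from Equation 2.9 with a fixed, hand-picked width `sigma = 0.5` and noise level `noise = 1e-6` instead of optimizing them by maximum likelihood, it skips the standardization step and the relevance parameters, and the function names and test data are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=0.5):
    """Kernel matrix for k(r) = exp(-r^2 / sigma^2) on Euclidean distances."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dist / sigma ** 2)


def kriging_predict(X, y, X_new, noise=1e-6, sigma=0.5):
    """Posterior mean (Eq. 2.15) and covariance (Eq. 2.16) at the points X_new."""
    K = gaussian_kernel(X, X, sigma) + noise * np.eye(len(X))
    K_s = gaussian_kernel(X_new, X, sigma)
    K_ss = gaussian_kernel(X_new, X_new, sigma)
    # Solve linear systems instead of forming the inverse explicitly.
    mu = K_s @ np.linalg.solve(K, y)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mu, cov


# Hypothetical one-dimensional data.
X = np.linspace(-2.0, 2.0, 8)[:, None]
y = np.sin(X).ravel()

mu_train, cov_train = kriging_predict(X, y, X)               # at known points
mu_far, cov_far = kriging_predict(X, y, np.array([[10.0]]))  # far from all data
```

At the known points, the predicted variance is close to zero, while far away from all data the prediction falls back to the prior mean of zero with a variance close to one. The linear solves dominate the runtime, which reflects the cubic cost mentioned above.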
2.5     Efficient Global Optimization (EGO)
One of the most popular surrogate-based algorithms is efficient global optimization (EGO) [37]
(also known as Bayesian optimization (BO)) which shall be briefly explained. Efficient global
optimization (EGO) is based on Kriging and has essentially three phases: (i) fit a Kriging model
given all previously evaluated solutions, (ii) define an optimization problem using the predictions
and uncertainty measure, (iii) solve the optimization problem to obtain a new infill point to be
evaluated. This fit-define-optimize procedure has been widely used over the last years, and many
different variants and extensions have been proposed. The definition of the optimization problem in
(ii) is also known as the acquisition function or infill criterion. Based on the prediction of Kriging,
a common acquisition function is the probability of improvement (PI) defined by
\[
\mathrm{PI}(\mathbf{x}) = P\left( f(\mathbf{x}) \ge f(\mathbf{x}^+) \right) = \Phi\left( \frac{\mu_{y'} - f(\mathbf{x}^+)}{\sigma_{y'}} \right), \tag{2.17}
\]
where $\mu_{y'}$ and $\sigma_{y'}$ represent the output of Kriging, and $\mathbf{x}^+$ the current best solution. One downside
of the probability of improvement is that it measures only the likelihood of an improvement
but not its amount. Thus, researchers have proposed expected improvement (EI)
defined by
                               EI(x) = (𝜇 𝑦 ′ − 𝑓 (x+ )) Φ(𝑍) + 𝜎𝑦 ′ 𝜙(𝑍),                       (2.18)
where
\[
Z = \begin{cases}
\dfrac{\mu_{y'} - f(\mathbf{x}^+)}{\sigma_{y'}} & \text{if } \sigma_{y'} > 0, \\[1ex]
0 & \text{if } \sigma_{y'} = 0,
\end{cases} \tag{2.19}
\]
to counter this drawback. More details about the history of EGO and its extensions will be
provided in Section 3.3. It is worth noting that with an increasing number of data points, the
Kriging model itself becomes computationally expensive. Also, studies have shown that with
increasing dimensionality, the algorithm’s performance deteriorates significantly.
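The two acquisition functions can be sketched in plain Python as follows. The code follows Equations 2.17 to 2.19 exactly as written above, including the sign convention $\mu_{y'} - f(\mathbf{x}^+)$ and the choice $Z = 0$ for a zero standard deviation. It is an illustration under assumptions: `sigma` is taken to be the predictive standard deviation (for instance, the square root of a diagonal entry of Equation 2.16), the behavior of PI for `sigma == 0` is not defined in the text and is an assumed fallback, and the candidate predictions are made-up numbers.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function (Phi)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal probability density function (phi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probability_of_improvement(mu, sigma, f_best):
    """PI as in Equation 2.17."""
    if sigma <= 0.0:
        # Assumed fallback: without uncertainty the outcome is deterministic.
        return 1.0 if mu >= f_best else 0.0
    return norm_cdf((mu - f_best) / sigma)


def expected_improvement(mu, sigma, f_best):
    """EI as in Equations 2.18 and 2.19 (Z is set to 0 when sigma is 0)."""
    z = (mu - f_best) / sigma if sigma > 0.0 else 0.0
    return (mu - f_best) * norm_cdf(z) + sigma * norm_pdf(z)


# Hypothetical Kriging outputs (mu, sigma) for three candidate infill points.
candidates = [(1.0, 0.1), (1.0, 1.0), (0.8, 2.0)]
f_best = 1.0

ei = [expected_improvement(mu, s, f_best) for mu, s in candidates]
infill_index = max(range(len(candidates)), key=lambda i: ei[i])
```

In this made-up example, the third candidate has the lowest predicted value but the largest uncertainty and obtains the highest EI, which illustrates how the criterion rewards uncertain regions in addition to good predictions.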
2.6   Summary of the Chapter
In this chapter, we have introduced some fundamental principles and algorithms referred to through-
out this thesis. First, we have presented the basics of single and multi-objective optimization. Sec-
ond, genetic algorithms have been explained, and some basic (surrogate) models were introduced.
Finally, a well-known surrogate-based optimization method called EGO was presented. Altogether,
a brief introduction of fundamentals shall help the reader to follow along with this thesis.


                                           CHAPTER 3
                                    LITERATURE REVIEW
Even though handling computationally expensive function evaluations seems to be a rather practical
matter, many relevant papers have been published in recent years. In this comprehensive literature
review, we seek to identify the most relevant publications and keywords to get an idea of the recent
developments. It is worth noting that optimizing computationally expensive functions is a task
performed in many different research areas and is thus rather interdisciplinary. This interdisciplinarity
causes the focus and viewpoints of the studies to be diverse. Therefore, this chapter aims to
keep the big picture in mind and consider different viewpoints on optimizing computationally
expensive functions.
3.1   Different Viewpoints on Computationally Expensive Optimization
Existing publications can be roughly put into three categories based on their focus: goal, method, or
problem. Each viewpoint is reasonable and often determined by the authors’ personal preference.
An overview of each category and its keywords is illustrated in Figure 3.1. Moreover, each type is
discussed in detail next.
Goal: The metadata of a publication, such as the title or keywords, often focuses on the goal
or purpose. It is critical to converge to the (global) optimum with a limited number of function
evaluations when optimizing time-consuming functions. Thus, relevant articles have been published
under the umbrella of efficient global optimization. Moreover, it is worth noting that this is related
to anytime optimization. As indicated by the name, the goal is to propose methods that can be
interrupted at any time while having achieved the best result possible up to that point. Thus, anytime
optimization focuses on developing algorithms considering the whole convergence curve and not
only the final result. Comparing convergence curves generally favors algorithms that converge quickly
to near-optimal solutions over slower converging algorithms that find the exact optimum.
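To illustrate why judging the whole convergence curve favors quickly converging algorithms, the following sketch compares two hypothetical runs. The measure used here, the mean of the best-so-far curve under minimization, is only one possible anytime measure and is an assumption of this example, as are the two made-up runs.

```python
def best_so_far(values):
    """Convergence curve: best (lowest) objective value seen after each evaluation."""
    curve, best = [], float("inf")
    for value in values:
        best = min(best, value)
        curve.append(best)
    return curve


def anytime_score(values):
    """Mean of the convergence curve; lower is better (assumed anytime measure)."""
    curve = best_so_far(values)
    return sum(curve) / len(curve)


# Hypothetical objective values of two runs with six evaluations each.
fast = [5.0, 1.2, 1.1, 1.1, 1.1, 1.1]  # near-optimal early, never exact
slow = [5.0, 4.0, 3.0, 2.0, 1.5, 1.0]  # reaches the optimum 1.0 only at the end

fast_is_better_anytime = anytime_score(fast) < anytime_score(slow)
slow_is_better_finally = best_so_far(slow)[-1] < best_so_far(fast)[-1]
```

Both flags evaluate to true: the slow run wins on the final result, whereas the fast run wins when the whole curve is considered.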
Goal
    Keywords: Efficient Global Optimization (EGO); Anytime Optimization
    Viewpoint: Design a method to converge as quickly as possible to the global optimum.
    Focus: How to design an algorithm with a fast convergence by trading off the algorithmic overhead?
Method
    Keywords: Surrogate-Assisted Optimization (SAO); (Sequential) Model-based Optimization; Bayesian Optimization
    Viewpoint: Incorporate surrogate(s) into an algorithm to accelerate the convergence.
    Focus: What algorithm should be used to become surrogate-assisted, and how is the surrogate used?
Problem
    Keywords: Expensive Black-Box Optimization; Simulation-Based Optimization (SBO); Data-Driven Optimization
    Viewpoint: Solve computationally expensive problems (often simulations).
    Focus: How can the specific simulation problem be solved? More domain-specific solutions, such as
    different granularities of a simulation or data separation, might be performed.
Figure 3.1: An interdisciplinary research area with different terminologies based on the perspective:
goal, method, and problem.
Method: Instead of the goal of optimization, publications frequently focus on the methodology used
to achieve it. Thus, the authors emphasize the usage of surrogates during optimization as it is an
inherent part of the proposed algorithm. It is worth noting that researchers commonly refer to such a
model as a surrogate; however, other terms such as metamodel, response surface model,
approximation model, or simulation model have also been used in the past [38]. Although some
terms might have (slightly) different meanings, one might find relevant publications by refining the
search considering a different terminology.
Problem: Apart from emphasizing the goal or method, some researchers focus on the computationally
expensive application itself. Problem-related literature requires a search regarding different
types of applications that are either computationally expensive or supposed to be optimized with a
minimal budget of function evaluations. Typically, simulation-based problems are expensive by
nature and are thus worth searching for. An evaluation time of a couple of hours or even days
is quite common. Moreover, one can think of optimization problems where a large amount of data
needs to be processed. Data-intensive studies frequently address the use of distributed systems to
parallelize and speed up function evaluation. So far, less attention has been paid to the usage of
surrogates in such fields.
[Figure 3.2 chart: share of publications per viewpoint. Method: 63.4% (6437); Problem: 23.8% (2414); Goal: 12.9% (1306).]
                  Figure 3.2: Overview of the overall percentage of each viewpoint.
     Identifying categories and keywords is essential for a comprehensive literature review. Next,
we provide some more context and statistics about the viewpoints on optimizing computationally
expensive problems described above.
3.2    Surrogate-Based Optimization: History and Recent Trends
Next, the importance and frequency of each of the viewpoints are evaluated. To visualize the
interest and activity in the whole field and show the authors’ usage of different viewpoints, we have
performed a keyword analysis using Scopus [39]. The keywords and queries of the analyses can be
found in Appendix 13. In Figure 3.2, the distribution of keywords mentioned in the publication’s
title is shown. It is apparent that most publications (63.4%) focus on the method itself and directly
mention the surrogate or synonyms somewhere in the title or abstract. Nevertheless, a significant
amount of research has also been conducted targeting the problem (23.8%) or the goal (12.9%).
The distribution of the different viewpoints demonstrates the importance of looking at all facets of
optimizing computationally expensive functions.
[Figure 3.3 plot: number of publications (y-axis, 0 to 1200) per year (x-axis, 1995 to 2020); curves: Aggregated, Goal, Method, Problem.]
Figure 3.3: Analysis of the literature regarding publications related to surrogate-assisted optimiza-
tion from 1995 to 2021.
   Besides the overall distribution of viewpoints, the development over the last years shall be
discussed. Figure 3.3 shows the number of publications in each year categorized by goal, method,
and problem. The x-axis represents each year from 1995 to 2021, and the y-axis is the
absolute number of publications. Each line is based on the number of publications in the specific
year if the metadata includes at least one representative keyword of the category. Please note that
one publication can be assigned to more than one category in this case. The aggregated curve,
however, represents the sum of all publications, counting each publication exactly once. First, one
can observe an increasing trend throughout all categories. This increase is not least caused by
the relevancy of addressing the computational expensiveness in practice. Second, for publications
focusing on the method, an almost exponential trend can be observed, whereas the goal and problem
seem to behave more linearly. One possible reason for this trend is the usage of the term surrogate,
which has become the first choice of many researchers.
[Figure 3.4 plot: bar chart titled "Genetic and Evolutionary Computation Conference (GECCO)"; x-axis: year (2005 to 2021); y-axis: percentage of publications (0.00 to 0.09).]
Figure 3.4: Overview of surrogate-assisted and related publications at the Genetic and Evolutionary
Computation Conference (GECCO) from 2005 to 2021.
   One venue where the usage of surrogates has found a place is the Genetic and Evolutionary
Computation Conference (GECCO) [40]. GECCO is one of the largest peer-reviewed conferences in
the field of Evolutionary Computation and the leading conference of the Special Interest Group on
Genetic and Evolutionary Computation (SIGEVO) of the Association for Computing Machinery
(ACM). In addition to evolutionary methods, researchers started to discuss approaches aiming
to solve computationally expensive problems. To draw some conclusions about the development
in the field, we have analyzed the number of publications addressing computationally expensive
problems from 2005 to 2021. We have used the BibTeX files provided by ACM for this analysis and
have filtered out all publications with two pages or less (poster publications). We have identified
publications related to surrogate-assisted optimization by a keyword search¹ in the titles. Since the
conference’s focus is methods and applications related to optimization, no other precautions to
avoid finding irrelevant studies needed to be taken.
¹ Keywords: surrogate, metamodel, model-based, computationally expensive, Kriging, Gaussian process, radial
basis, response surface, EGO, efficient global optimization
    In Figure 3.4, the years are shown on the x-axis and the percentage of publications matching
the criteria for being related to surrogate-assisted optimization on the y-axis. The number of
publications is located on top of each bar, and the number of each year’s accepted publications is
printed below. One can observe an increasing trend, especially from 2017 onward. In 2020, more
than eight percent of all publications are related to surrogate-assisted optimization and address the
optimization of computationally expensive problems, which is encouraging.
    So far, this chapter has introduced different viewpoints and provided some insights into the
research field’s development. Next, a thematic literature review of state-of-the-art methods for
optimizing computationally expensive problems is provided, and recent trends are discussed.
3.3   Relevant Literature and Applications
All algorithms utilizing surrogates during optimization have to find an answer to the What?, With
what?, and How? questions discussed in Chapter 1.1.3. These questions are directly related
to important topics in surrogate-assisted optimization, such as what type of surrogate or what
optimization algorithm is used.
Type of Surrogate: The type of surrogate is a critical answer to the With what? question. The
surrogate is generated based on information obtained through time-consuming evaluations from
the past. Fitting a model based on data is a common task for Machine Learning methods [41]
and thus a research field where many ideas originate or are borrowed from, especially in data-
driven optimization [42]. Frequently, surrogates are used to predict the objective and constraint
values directly. In general, for such a prediction, one can distinguish between approximation and
interpolation models. Approximation models do not necessarily return the exact values of the
training data during prediction. For instance, PR [43, 44, 45] with a predefined degree is used to
approximate the objective function. In [45], a linear regression model has been utilized, for which
the prediction variance and inference quality can be determined. Many optimization algorithms use
a replacement strategy and keep the newly evaluated solution if it outperforms the current one. For
such decision-based optimization algorithms, Support Vector Machines (SVMs) [46, 47] have been
employed to predict the outcome of the replacement decision. The predictions are based on binary
classification (will the new solution outperform the current one) by drawing a decision boundary
in the design space [48]. By predicting the outcome of decisions, surrogate-assisted algorithms
attempt to look one or multiple iterations into the future using a surrogate [46] and thus improve
their convergence behavior. It is worth noting that approximation models, such as polynomial
regression with a smaller degree and SVMs, minimize the training error but will often not reduce it
to zero due to the models’ simplicity. In contrast, interpolation models attempt to fit each point of
the data set exactly. A widely used interpolation model for surrogate-based optimization is Kriging
(also known as Gaussian Process) [37, 49, 50, 51, 52, 53]. Kriging is known to be a good choice
for problems with a low number of decision variables. Two challenges of incorporating Kriging into
an algorithm are the choice of a suitable hyperparameter configuration and the numerical instability
for imbalanced data sets [54]. Numerous implementations of Kriging in all kinds of programming
languages exist, but the Matlab DACEfit toolbox [55] made available by Lophaven et al. is one
of the most commonly used by researchers. For optimization problems with a larger number of
variables, researchers have frequently chosen RBFs as surrogates [54, 56, 57, 58, 59, 60]. In
general, RBFs have not only been used to model the function globally but also locally [57].
Similar to other surrogates, RBFs use kernel functions of a specific type (for instance, linear,
multiquadric, or Gaussian), which is an important hyperparameter to be considered [58]. In contrast to
Kriging, RBFs do not provide any information about the estimated error or uncertainty; however,
distances to existing high-fidelity evaluated points have been used instead [59]. Moreover, Neural
Networks [61, 62, 63, 64, 65] have been utilized to model data sets. Even though neural networks
are very effective in fitting data accurately, they have only been sporadically used for surrogate-
assisted optimization. The main reason for that is the challenge of using neural networks with a
limited amount of data [66].
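The distinction between approximation and interpolation models drawn above can be illustrated with a small, self-contained sketch. It is not taken from any of the cited works: the one-dimensional data, the least-squares line (polynomial regression of degree one), the Gaussian RBF with shape parameter `eps`, and all function names are assumptions chosen for illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares line (polynomial regression of degree 1): an approximation model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_rbf(xs, ys, eps=1.0):
    """Gaussian RBF interpolant: weights are chosen so each training point is reproduced."""
    def kernel(a, b):
        return math.exp(-(eps * (a - b)) ** 2)

    weights = solve([[kernel(xi, xj) for xj in xs] for xi in xs], ys)
    return lambda x: sum(w * kernel(x, xi) for w, xi in zip(weights, xs))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.8, 0.9, 0.1]
line, rbf = fit_line(xs, ys), fit_rbf(xs, ys)

# Largest absolute error on the training data for each model.
line_err = max(abs(line(x) - y) for x, y in zip(xs, ys))
rbf_err = max(abs(rbf(x) - y) for x, y in zip(xs, ys))
```

On this data, the line leaves a clearly nonzero training error, whereas the RBF interpolant reproduces the training values up to floating-point precision.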
Most of the surrogate types mentioned above are designed for real-valued input data spanning a
continuous search space. For discrete variables, however, Random Forests have shown promising
results [67, 68, 69, 70, 71]. For instance, Hutter et al. [67] have proposed a sequential model-based
algorithm configuration (SMAC) method, which is well-known for finding optimal hyperparameters
without any manual tuning. Also, some researchers have investigated surrogates for
mixed variable types, for instance, a combination of continuous and discrete variables [72, 73].
Furthermore, Walsh functions [74] have recently been discovered to serve as surrogates for discrete
optimization problems [75, 76]. Walsh functions can decompose any function of the Hilbert space
and be naturally used as a basis for the space of pseudo-boolean problems [75]. Compared to other
surrogates, Walsh functions are computationally inexpensive and can be extended to
other variable types such as permutations [76]. They have been used for mutation in evolutionary
algorithms [77] and might be one promising research direction for surrogate-assisted
optimization. Furthermore, the tree-structured Parzen estimator (TPE) [78, 79] has been utilized as
a surrogate for continuous, discrete, as well as mixed variable problems. Whereas Kriging directly
models 𝑝(𝑦|𝑥), TPE models 𝑝(𝑥|𝑦) and 𝑝(𝑦) (where 𝑥 denotes the variables and 𝑦 the
objectives or constraints) [78]. Among other things, TPEs have been used for hyperparameter
optimization for neural networks and computationally expensive multi-objective optimization [79].
When having to choose from several surrogate types, it can be challenging to commit to a specific
type during algorithm design. Also, intuitively the suitability of a surrogate depends on the fitness
landscape [80]. Thus, the performance of different surrogates for different types of problems
has been compared [81]. Moreover, methods using an online surrogate selection, also known as
surrogate ensembles, have been proposed [54, 82, 83, 84, 85]. The ensemble of surrogates can
be incorporated in different ways. For instance, a set of solutions can be obtained where each
solution corresponds to an optimum found on a specific surrogate. Alternatively, one may combine
all surrogates by taking the average of the predictions [82]. In [83], the authors have addressed the
so-called curse of uncertainty by employing a weighted average of the surrogates’ predictions where
the weight is proportional to the surrogate’s root mean squared error. In contrast, the variance
of predictions from a surrogate ensemble has been utilized as a measure for robustness in [56].
The number of publications addressing different kinds of surrogates is quite large. Each surrogate
has its benefits and drawbacks, which is indicated by the variety of surrogates used in different
studies. However, a general recommendation for what type of surrogate is suitable for what type of
optimization problem is still an open question [38, 86].
Type of Algorithm: Early on, researchers have realized that response surfaces – a synonym for
surrogates – are helpful for optimization [87, 88, 89]. A response surface has been
fitted through a data set referred to as the design of experiments created in a space-filling manner.
Then, an optimization algorithm is executed on the response surface, and an optimal solution is
obtained [87]. This procedure became especially popular in engineering and is also known as
offline or non-adaptive surrogate-assisted optimization. However, such a non-adaptive approach
assumes the model to be relatively accurate and does not account for any model error. Because
an underlying modeling error is inevitable for most applications, a sequential surrogate update has
emerged as one of the crucial components of surrogate-based optimization methods [90].
Sequentially updating surrogates opens up many different possibilities of making use of an approx-
imation model during optimization. One type of algorithm where the surrogate is a substantial
part of the optimization procedure is Bayesian optimization [91], also known as EGO [37, 92]. In
EGO, the surrogate – typically Kriging [35] – provides a prediction and an uncertainty measure
used to find a trade-off between exploitation and exploration. Both aspects are addressed by so-
called acquisition functions (or infill criteria) such as expected improvement [37] or probability of
improvement [92]. Especially in lower-dimensional search spaces, the exploration aspect in the
acquisition function is important [93]. Other types of surrogate and scalarization techniques have
been studied, such as radial basis functions with distance-based uncertainty measure [94] or linear
spline function with a customized probability function [95]. All EGO approaches have in common
an embedded (global) optimization algorithm using the surrogate’s predictions. The limitation of
finding only a single new solution in each iteration has been investigated thoroughly, and multi-point
EGO approaches have been proposed [96, 97, 98, 99, 100]. Some known challenges of EGO are
dealing with a large number of variables and numerical instabilities introduced by a biased solution
distribution in the search space [54].
Instead of letting surrogates be the most substantial part of an algorithm’s design, they have also
been used to guide the search of existing algorithms. For instance, for evolutionary algorithms,
a prescreening with surrogates has been commonly employed to improve the convergence [101,
102, 103, 104]. For prescreening, a larger number of offspring is created through mating, and
the surrogate predicts their fitness. Then, an environmental selection is applied to reduce the
offspring to a few that are evaluated on the time-consuming function [101]. Such incorporation of
surrogates is also known as generation-wise evolutionary control [105] and naturally introduces a
bias towards solutions predicted to be more promising. Moreover, surrogates are incorporated
into memetic algorithms [106] – evolutionary algorithms with an embedded local search – where
they assist the local search in becoming more efficient [83].
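The prescreening idea described above can be sketched as follows. This is a toy illustration under stated assumptions, not the method of any cited paper: `expensive` is a stand-in sphere function, the surrogate is a simple nearest-neighbor predictor, mating is an averaged crossover with Gaussian noise, and the environmental selection is reduced to sorting by predicted fitness. All names and parameter values are hypothetical.

```python
import random


def expensive(x):
    """Stand-in for the time-consuming function (assumption: sphere function)."""
    return sum(v * v for v in x)


def surrogate_predict(x, archive):
    """Toy surrogate: objective value of the nearest already evaluated point."""
    nearest = min(archive,
                  key=lambda entry: sum((a - b) ** 2 for a, b in zip(x, entry[0])))
    return nearest[1]


def prescreen_generation(population, archive, n_offspring=40, n_evals=4, rng=random):
    """Create many offspring, rank them on the surrogate, evaluate only the best few."""
    offspring = []
    for _ in range(n_offspring):
        p1, p2 = rng.sample(population, 2)
        child = [(a + b) / 2 + rng.gauss(0.0, 0.1) for a, b in zip(p1, p2)]
        offspring.append(child)
    # Selection on predicted fitness: only n_evals candidates survive.
    offspring.sort(key=lambda x: surrogate_predict(x, archive))
    evaluated = [(x, expensive(x)) for x in offspring[:n_evals]]
    archive.extend(evaluated)  # the new data would be used to update the surrogate
    return evaluated


rng = random.Random(1)
population = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(10)]
archive = [(x, expensive(x)) for x in population]
new_points = prescreen_generation(population, archive, rng=rng)
```

In this sketch, 40 offspring are created but only 4 expensive evaluations are spent, which is where the bias towards promising predictions comes from.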
Another well-known metaheuristic and type of evolutionary algorithm is differential evolution
(DE) [107], known to be especially effective for the global optimization of continuous
variables [108]. In DE, the mating is based on a crossover with a mutated individual. The mutation
is based on the addition of a weighted differential vector. An offspring replaces an individual of
the current population if its performance is superior. In [109], the surrogate serves as a classifier
predicting whether a solution will be replaced or not, that is, whether an offspring outperforms the
current solution. This is in contrast to [110], where the authors propose only evaluating the most promising
offspring based on the surrogate’s prediction. Furthermore, the algorithm’s global search has been
complemented by a local search assisted with a surrogate [111]. The combination of global and
local search provides an indirect impact through biased recombination and direct impact through a
local refinement. Another approach has been proposed in [60], where the surrogate filters out only
the most promising solutions analogously to the prescreening for genetic algorithms.
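The DE mechanics summarized above (mutation by a weighted difference vector, crossover with the mutated individual, and replacement only on improvement) can be sketched without any surrogate as follows. The variant shown is the common DE/rand/1/bin scheme with in-place replacement; the parameter values `F = 0.5` and `CR = 0.9` and all names are assumptions of this example. The comparison in `de_step` is the replacement decision that a classifier as described for [109] would try to predict.

```python
import random


def de_offspring(population, i, F=0.5, CR=0.9, rng=random):
    """DE/rand/1/bin: weighted difference mutation followed by binomial crossover."""
    others = [j for j in range(len(population)) if j != i]
    a, b, c = (population[j] for j in rng.sample(others, 3))
    mutant = [a_k + F * (b_k - c_k) for a_k, b_k, c_k in zip(a, b, c)]
    target = population[i]
    j_rand = rng.randrange(len(target))  # at least one variable comes from the mutant
    return [m if (rng.random() < CR or k == j_rand) else t
            for k, (m, t) in enumerate(zip(mutant, target))]


def de_step(population, fitness, f, rng=random):
    """One generation: an offspring replaces its parent only if it performs better."""
    for i in range(len(population)):
        trial = de_offspring(population, i, rng=rng)
        trial_f = f(trial)  # the expensive evaluation in a real application
        if trial_f < fitness[i]:
            population[i], fitness[i] = trial, trial_f


def sphere(x):
    """Stand-in objective function (assumption)."""
    return sum(v * v for v in x)


rng = random.Random(0)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(8)]
fit = [sphere(x) for x in pop]
initial_fit = list(fit)
for _ in range(30):
    de_step(pop, fit, sphere, rng=rng)
```

Because of the greedy replacement, the fitness of each population slot can never deteriorate from one generation to the next.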
Besides DE, variants of the well-known model-based evolutionary algorithm covariance matrix
adaptation evolution strategy (CMA-ES) [112, 113] with assisting surrogates have been pro-
posed [46, 48, 114, 115, 116, 117]. In CMA-ES new candidate solutions are sampled according
to a multivariate normal distribution which is continuously updated based on the best perform-
ing individuals determined by their rank. In [115], locally weighted (quadratic) regression [114]
approximates the ranking before the expensive evaluation takes place. One drawback of such a
relatively simple approximation model is the lack of a suitable fit for more complex
fitness landscapes. Moreover, a full quadratic model requires a large number of solutions with
increasing search space dimensionality, which has been addressed in [46]. Later on, the authors have
proposed to replace the quadratic regression with a rank-based Support Vector Machine [48] with
a Gaussian kernel. As a kernel matrix, the covariance matrix adapted from CMA-ES itself is used.
Although the proposed method has been shown to be more efficient, the results also indicate premature
convergence on problems with a multi-modal landscape. These issues have been addressed, and
the method has been further improved by the authors in [116], which only exploits the surrogate if
it is sufficiently accurate.
Moreover, surrogates have been incorporated into particle swarm optimization (PSO) [118] in
various ways. Similar to evolutionary algorithms, a generation can be simulated on the surrogate
before solutions are evaluated on the computationally expensive function [119]. Another interesting
way of using surrogate knowledge is the modification of the algorithm’s social component. For
instance, in [120], attractors are derived from an embedded optimization on the ensemble of local
and global surrogates and, thus, improve the algorithm’s convergence.
Multiple objectives: Optimizing multiple conflicting objectives at a time has been extensively
studied in the last decades. However, improving the convergence by incorporating surrogates re-
quires rethinking current approaches. Evolutionary algorithms are commonly employed to solve
multi-objective problems because their population-based search fits the desire of obtaining a non-
dominated set of solutions. A comprehensive overview of evolutionary algorithms assisted by
surrogates can be found in [121]. The authors discuss 45 different algorithms proposed in the
literature and categorize the approaches regarding their type and the number of surrogates. More-
over, we encourage interested readers to look at [122] for a more general overview of methods,
challenges, applications, and recent developments in the research field. Commonly, multi-objective
optimization algorithms are either based on dominance, decomposition, or indicators to address the
existence of larger dimensional objective spaces [121]. For each approach, incorporating a surrogate
has to be implemented differently. For instance, in [64, 123] a classifier has been learned to
determine whether an offspring is dominated. The decision boundary drawn by the classifier to make
the decisions regarding dominance is used to filter out non-promising offspring without evaluating
them expensively. Moreover, the approximations of objectives and constraints can be utilized to
prescreen a set of solutions. For instance, Chugh et al. proposed K-RVEA [124] – an extension of
RVEA [125] with Kriging as a surrogate – performing the survival selection with the predictions of
a surrogate before evaluating solutions expensively. In 2006, Joshua Knowles proposed ParEGO [126],
where an iteration of the well-known EGO [37] algorithm is performed on scalarized objective
values. The decomposition is based on the augmented Tchebycheff aggregation function [127]
with a weight vector uniformly drawn from a set of reference directions. The rather expensive
surrogate implementation has been further improved later on to handle a solution set up to a size
of 500 [128]. In [129], Li et al. proposed a Kriging Metamodel Assisted Multi-Objective Genetic
Algorithm (K-MOGA), which keeps expensively evaluated solutions and solutions predicted by the
surrogate in the same population. The algorithm determines dynamically whether the predicted values need to be
evaluated by the time-consuming function based on their domination status. The domination status
considers the distances between non-dominated and dominated solutions and aims to represent the
prediction error of the surrogate. Instead of predicting each individual separately, the surrogate has
also been utilized to predict a multi-objective performance indicator. For instance, in [130], the authors
proposed an S-Metric-Selection-based Efficient Global Optimization (SMS-EGO) algorithm, which uses the
S-metric [131] as an infill criterion. The infill criterion was further adapted to consider a step-wise
uncertainty reduction in [132], and a computationally cheaper infill criterion with similar performance
was proposed in [133]. In 2010, Zhang et al. proposed MOEA/D-EGO [134], which combines the
EGO idea with MOEA/D [11] and was able to outperform ParEGO. MOEA/D-EGO
fits local surrogates in the neighborhood determined by fuzzy clusters [135] and can evaluate more
than one solution in each generation. Moreover, Habib et al. proposed Hybrid Surrogate-Assisted
Multi-objective Algorithm (HSMEA) addressing computationally expensive many-objective opti-
mization problems [136]. The proposed method is capable of handling constraints and keeps two
archives to improve the performance for irregular Pareto fronts. Also, it incorporates a local search
subject to the angle of reference directions to improve the quality of infill solutions. The results on
a wide range of test problems indicate that HSMEA performs significantly better than CSEA and
K-RVEA. Recently, an adaptive Bayesian approach has been proposed by Wang et al. [137], which
is based on the EGO principle but tunes the hyperparameter of the acquisition function according
to search dynamics. The proposed method balances the exploration and exploitation by switching
between an angle-based distance and an angle-penalized distance throughout the optimization. The
method showed promising results on test problems and one real-world problem with a budget of
300 function evaluations.
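The scalarization step mentioned above for ParEGO, an augmented Tchebycheff aggregation with a weight vector drawn from a set of reference directions, can be sketched as follows. The value `rho = 0.05`, the assumption that the objectives are already normalized to [0, 1], and the small set of reference directions are illustrative choices of this example rather than details confirmed by the text.

```python
import random


def augmented_tchebycheff(objectives, weights, rho=0.05):
    """Scalarize objective values: max_j(w_j * f_j) + rho * sum_j(w_j * f_j).

    Assumes minimization and objectives normalized to [0, 1]; rho is a small
    positive constant (0.05 here is an assumed value).
    """
    weighted = [w * f for w, f in zip(weights, objectives)]
    return max(weighted) + rho * sum(weighted)


# Hypothetical set of reference directions for two objectives.
reference_directions = [(0.0, 1.0), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1.0, 0.0)]

# In each iteration, one weight vector is drawn uniformly from the set.
weights = random.Random(0).choice(reference_directions)

# Scalarized values of already evaluated solutions would serve as the
# single-objective training data for one EGO iteration.
evaluated = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
scalarized = [augmented_tchebycheff(f, weights) for f in evaluated]
```

Drawing a different weight vector in each iteration lets a single-objective method target different regions of the Pareto front over time.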
Constraints: Efficient constraint handling plays a vital role in computationally expensive prob-
lems [59, 60, 94, 111, 138]. In the early phase of surrogate-assisted optimization, rather simple
approaches have been used to address constraints. For instance, in [94] mostly box constraints
have been considered for which it was not necessary to fit a surrogate because of their simplicity.
Later on, methods have been proposed to handle more realistic computationally expensive problems
with constraints using the feasibility first principle. In [59] the algorithm’s performance has been
shown on a real-world problem from the car industry with 68 inequality constraints. The proposed
method uses the predictions of surrogates to approximate the constraint functions to find candidate
solutions with the least constraint violation. A few solutions are selected from these candidates by
a weighted ranking considering the predicted objective values and their distance to existing already
evaluated solutions. Similarly, a feasibility prescreening has been proposed in [60], where the
surrogate is utilized to improve the probability of evaluating a feasible solution in each generation.
A more indirect way of surrogate-assisted constraint handling using a tournament selection based
on predictions has been investigated in [111]. Based on predictions, two individuals compete with
each other intending to choose the better one. If both solutions are feasible, the one with the better
predicted objective value wins; if both are infeasible, the less infeasible of the two. This selection pressure naturally
introduces bias to evaluate feasible solutions in a genetic algorithm. Also, a common technique
to deal with constraints by penalizing the objective function has been investigated using surro-
gates. As known for computationally inexpensive problems, the constraint violation’s weighting is
important to balance the impact of feasibility and infeasibility. In [139] the authors proposed to
weigh the penalty in proportion to the number of feasible solutions. Results have shown that such
an approach outperforms a constant weighting throughout the run of an algorithm. In [138], the
proposed method first minimizes the constraint violation until a feasible solution has been found and,
second, aggregates the objective and constraint using a modified expected improvement function
to proceed. Altogether, constraint handling is an essential aspect of optimization in practice, and
it should be considered during the algorithm design. However, existing studies do not allow us to
derive a clear tendency toward a surrogate-based constraint handling approach performing better
than others.
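The feasibility-first principle applied to surrogate predictions can be illustrated with a short sketch. It is a generic binary comparison under stated assumptions, not the exact procedure of any of the cited studies: the `Prediction` container and the convention that a predicted constraint violation of zero means feasible are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Surrogate prediction for one candidate (hypothetical container)."""
    f: float   # predicted objective value (minimization)
    cv: float  # predicted aggregated constraint violation, 0.0 means feasible


def feasibility_first_winner(a: Prediction, b: Prediction) -> Prediction:
    """Return the preferred candidate of a binary tournament."""
    a_feasible, b_feasible = a.cv <= 0.0, b.cv <= 0.0
    if a_feasible and b_feasible:
        # Both predicted feasible: the better predicted objective wins.
        return a if a.f <= b.f else b
    if a_feasible != b_feasible:
        # Exactly one predicted feasible: it wins regardless of objective.
        return a if a_feasible else b
    # Both predicted infeasible: the smaller predicted violation wins.
    return a if a.cv <= b.cv else b
```

Because the comparison relies on predictions only, no expensive evaluation is spent on deciding the tournament.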
Heterogeneous Computation of Functions: Less attention has been paid to problems with
objective and constraint functions with varying expensiveness. Studies are limited to bi-objective
problems where one objective is computationally expensive and the other computationally inexpensive
(cheap) [140, 141, 142, 143, 144]. In 2013, Allmendinger et al. [140] laid the foundation for
investigating heterogeneously expensive optimization problems and proposed baseline methods,
such as waiting for the slow objective to be evaluated or using the nearest neighbor approximation
with Gaussian noise as a prediction. The authors extended their initial study in [141] in which
they proposed evolutionary algorithms where the cheap objective is used as a look-ahead function
for one or more generations. In 2018, Chugh et al. proposed HK-RVEA [142], which uses
Kriging as a surrogate to approximate the expensive objective. Moreover, a trust-region-based
algorithm with quadratic approximations for objectives has been investigated later on in [143].
Wang et al. proposed the usage of transfer learning to make inferences about the more expensive
objective from the cheap one [144]. It is worth noting that the difference in expensiveness between
the objectives is critical. In existing studies, it has been assumed that one objective is two, five, 10,
or 15 times slower than another [142, 144].
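The baseline of predicting the slow objective by a nearest neighbor approximation with Gaussian noise can be sketched as follows. This is a minimal illustration, not the implementation used in [140]; the function name, the archive layout, and the noise parameter `sigma` are assumptions made for this example.

```python
import math
import random


def nearest_neighbor_prediction(x, archive, sigma=0.0, rng=None):
    """Predict the slow objective at x from the closest evaluated point.

    archive: list of (point, slow_objective_value) pairs already evaluated.
    sigma:   standard deviation of the added Gaussian noise (assumed parameter).
    """
    rng = rng or random.Random()
    # Take the objective value of the archive point closest to x.
    _, value = min(archive, key=lambda entry: math.dist(entry[0], x))
    # Perturb it so that identical neighbors do not yield identical predictions.
    return value + rng.gauss(0.0, sigma)
```

With `sigma=0.0` the prediction is simply the value of the closest evaluated solution.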
Miscellaneous: Different characteristics and challenges of optimization problems have already been
discussed in Chapter 1. A few of them shall now be reviewed with respect to surrogate-assisted
optimization. The optimization problem might be computationally expensive but also have a
large number of variables that need to be paid extra attention. To deal with such problems,
either the algorithm handles the large-scaled decision space directly [145], or a feature selection is
performed [146, 147]. Some real-world problems do not contain only a single optimization problem
but multiple, which can be put in a hierarchy. These problems are especially challenging due to
the embedded optimization which needs to be performed. For a bi-level optimization problem,
the lower-level can be approximated by a surrogate [148]. Moreover, it is worth mentioning that
simulations are often not deterministic and, thus, stochastic optimization shall be used [149]. Non-
determinism implies that two uncertainties, one from the simulation and one from the surrogate,
accumulate. Some simulations allow defining different levels of accuracy. For instance, running a
simulation for one hour provides function values with a confidence level of 95%, whereas running
it for two hours increases it to 99%. Optimizing such problems is commonly referred to as
multi-fidelity optimization [150]. For example, the existence of multiple fidelity levels has been
addressed with linear regression models in [45], and a test problem suite was proposed in [151].
Furthermore, the usage of surrogates in dynamic optimization has to be investigated. For instance,
in [152] the concept drift during optimization is addressed using a sliding window approach for the
surrogate data to be fitted. All these challenges of real-world optimization problems show what
types of problems need to be looked at in connection with expensive functions.
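The sliding window idea for surrogate data in dynamic optimization can be illustrated with a small data container. This is only a sketch of the general concept and not the method of [152]; the class name and the fixed window size are assumptions.

```python
from collections import deque


class SlidingWindowData:
    """Keep only the most recent evaluations for surrogate fitting."""

    def __init__(self, window_size: int):
        self._data = deque(maxlen=window_size)

    def add(self, x, y) -> None:
        # Appending beyond maxlen silently drops the oldest entry.
        self._data.append((x, y))

    def training_set(self):
        """Return inputs and outputs of the evaluations inside the window."""
        xs = [x for x, _ in self._data]
        ys = [y for _, y in self._data]
        return xs, ys
```

Fitting the surrogate only on `training_set()` discards outdated evaluations once the problem has drifted.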
Applications: Many practical optimization problems involve expensive function evaluations.
Thus, surrogate-assisted optimization has been widely applied in all kinds of interdisciplinary
research areas. In the following, we highlight applications and research fields that have been shown to
be of importance and have had a significant impact. Table 3.1 provides an overview of research
fields and concrete applications addressed in the literature.
Some problems directly related to Computer Science have an expensive evaluation function [67,
71, 153, 154, 155]. Most existing algorithms in the literature have parameters, also referred to
as hyperparameters, which shall be tuned to maximize the performance. Hyperparameter-tuning
is especially expensive for non-deterministic algorithms where performance assessment requires
multiple runs [67]. One such example is finding a well-performing architecture of a neural
network. Each network design requires optimizing the corresponding weights, which usually
requires lots of computational resources and time [71, 153]. Moreover, learning decision policies,
as commonly done in reinforcement learning, is a time-consuming task and requires many iterations.
Thus, instead of interacting with the live environment, surrogates have been used to speed up
convergence [154, 155].
Besides Computer Science, many other multidisciplinary research fields require the optimization
of computationally expensive functions. For instance, in environmental optimization, various
simulations of different kinds of optimization problems have been applied. One way of addressing
risk is by simulation of possible scenarios in the future. For instance, time-consuming simulations
have been optimized to forecast wildfire behavior [156], the seismic risk based on stochastic
ground motion [157], or the spread of a pandemic [69]. Apart from assessing the risk or predicting
the future outcome, the performance evaluation of configurations or policies is frequently based
on simulations. For instance, surrogates have been used to obtain an optimal robotic milking
barn facility allocation and to investigate the design’s relation to herd size, feeding routine, and
management practices [158]. Also, surrogates have frequently been employed to improve the
convergence on problems related to water distribution systems in different ways, such as data-
driven, projection, and hierarchical-based approaches [159, 160, 161, 162]. Furthermore, other
environmental problems with expensive simulations related to wind farm layout [163], reservoir
management [164, 165], aquifer systems [166], sustainable transportation [167], wood-based
composite materials [168], nuclear power plants [169], and wind waves [170] have been investigated.
Table 3.1: Overview of surrogate-assisted optimization being used in different kinds of applications.
  Research Field          Topic
  Computer Science        Algorithm Configuration [67], Neural Architecture
                          Search [153, 71], Reinforcement Learning [154, 155]
  Environment             Seismic Risk Assessment [157], Wild Fire Spread [156],
  Agriculture             Pandemic Forecasting [69], Crop Yield [171, 172], Aquifer
                          Systems [166], Water Management [159, 160, 161, 162],
                          Reservoir Management [164, 165], Robotic Milking Barn [158],
                          Sustainable Transportation [167], Wind Farm Layout [163],
                          Wood-based Composite Materials [168], Nuclear Power
                          Plant [169], Wind Wave [170]
  Engineering             Aircraft Structural Components [173, 174, 175, 176],
                          Helicopter Rotor Design [177], Aeronautical Combustion
                          Chambers [178], Vehicle Crashworthiness Design [179, 180],
                          Vehicle Body Lightweight [181], DC Motor [182], Building
                          Performance Simulation [183, 184, 185, 186], Fused Magnesium
                          Furnaces [187, 188], Antenna [189, 190], Microwave
                          Structure [191, 192, 193], Circuit Design Centering [194],
                          Modular Flowsheet Optimization [195]
  Medicine                Health Care Operation Management [196, 197], Resource
  Biology                 Planning Emergency Department [198], Trauma System [70],
                          Breast Cancer [199, 200], Medical Image Registration [201],
                          Protein Engineering [202]
  Business                Production Planning [203], Supply Chain [204], Traffic [205,
  Operations              206, 207], Inventory Management [208, 209], Job
                          Scheduling [210, 211], Enterprise Architecture [212],
                          Manufacturing [213], Recommender Systems [214], Portfolio
                          Optimization, Maintenance [215]
In general, assessing the goodness of an engineering design often requires simulations. One
research field is the design of aircraft structural components where CFD simulations
are performed to predict aerodynamic forces and aerodynamic efficiency [173, 174, 175, 176].
Besides optimization related to the exterior shape of an airplane, surrogates have been employed
to model turbulent reacting flows of aeronautical combustion chambers [178]. Also, simulations
are not limited to airplanes but have also been applied to design the helicopter rotor in [177].
Another critical research direction is simulations related to automobiles. For instance, studies to
maximize the vehicle’s crashworthiness [179, 180], minimize the vehicle body weight [181], or
control the direct speed of a DC Motor [182] have been conducted. Moreover, the design of
buildings has been investigated [183, 184, 185, 186]. Such optimization problems mostly have a
discrete search space and multiple objectives such as environmental quality or building energy
consumption. Moreover, expensive simulations are performed in electrical and chemical engineering.
For instance, surrogates have been utilized to optimize the antenna’s design with regard to maximum
gain, maximum front-to-back ratio, and minimal ground plane area [189, 190], the microwave
design [191, 192, 193], circuit design centering [194], or chemical modular flowsheets [195].
Apart from engineering, expensive simulation problems frequently occur in the medical and bi-
ological research field. For instance, studies about simulations of schedules of operations in a
hospital in general [196, 197] or resource planning in an emergency department [198] have been
conducted. Moreover, the design of a Trauma System [70] in Colorado has been investigated with
regard to the waiting time until a patient is treated and the suitability and quality of treatment. The problems were
driven by a data set with 100,000 emergency records of 72 hospitals with five different capability
levels. Furthermore, more medical-related tasks such as the prediction of the growth of breast
cancer [199, 200] or image registration (a process of transforming different sets of data into one
coordinate system) [201] have been assisted by surrogates.
Also, in operations research, simulations are necessary to evaluate the outcome. Expensive
simulations can take place anywhere along the value chain, such as manufacturing [213], production
planning [203], supply chain management [204], or inventory management [208, 209]. For each
task, the management has to choose one implementation out of many possible combinations by
optimizing the company’s objectives. For companies where logistics play a more critical role,
simulations have been employed to optimize the (freight) traffic [205, 206, 207]. For marketing,
so-called recommender systems suggest products to customers based on the current shopping cart
or purchase history. Obtaining such suggestions can be computationally expensive because they
are often based on a large amount of data, and thus surrogates have been used [214].
The variety of applications requiring expensive simulations demonstrates the need for suitable
algorithms. For most such applications and simulations, surrogates have been employed to address
the time-consuming simulations and make approximations available to the algorithm.
3.4   Summary of the Chapter and Open Issues
This chapter has provided an overview of surrogate-assisted optimization from different view-
points. Moreover, we have presented a number of applications with time-consuming simulations
demonstrating the relevance of research investigating the optimization of computationally expensive
functions.
    Despite all achievements in surrogate-assisted optimization over the last years, a few research
directions have received only little attention [38]. Many algorithms that have been proposed
are rather specific and not very generalizable. Thus, a lot of surrogate-assisted methods address
only a specific problem class. This results in numerous surrogate-assisted variants of algorithms
where it is unclear what method is the most appropriate for an application problem. Thus, there
is a need for a more general methodology to incorporate surrogates into an existing algorithm.
Moreover, most studies assume that the objective and constraints are not separately evaluable and
have the same evaluation time. However, for many real-world problems, the evaluation consists
of calling different functions or third-party software products. Therefore, the independence with
possibly varying evaluation times shall be directly exploited by an optimization method. Such
practicable aspects have mostly been neglected so far in literature and are worth investigating.
Besides aspects directly related to the optimization of time-consuming functions, optimization
often is interdisciplinary and requires collaborations. The characteristics and realizations of
such collaborations have rarely been the focus of research. However, in practice, collaboration is essential for the
overall success of the project.
                                             CHAPTER 4
      A GENERALIZED PROBABILISTIC SURROGATE-ASSISTED FRAMEWORK
This chapter proposes a methodology for incorporating surrogates into metaheuristics and op-
timization in general. After showing the importance of considering the computational expense
during optimization, different surrogate-based approaches are categorized regarding the role of the
surrogate during optimization and the interdependency of the algorithm’s design. First, we focus
on computationally expensive single-objective optimization problems and propose a framework to
incorporate surrogates into the optimization procedure. The incorporation is based on a probabilis-
tic selection from a search pattern created by optimizing the surrogate. Afterward, we generalize
this concept to solve constrained multi-objective problems by incorporating the Pareto-dominance
principle and constraint violation of solutions into the probabilistic selection scheme. The majority
of this chapter is based on [24] and [216] with some minor modifications to ensure consistency
throughout this thesis.
4.1    Introduction
Many optimization problems are computationally expensive and require the execution of one or mul-
tiple time-consuming functions to evaluate a solution. Expensive optimization problems (EOPs) are
especially important in practice and are omnipresent in all kinds of research and application areas,
for instance Agriculture [172], Engineering [179], Health Care [196], or Computer Science [153].
Often the expensiveness of the evaluation is caused by the requirement of running a simulation, such
as Computational Fluid Dynamics (CFD) [21], Finite Element Analysis (FEA) [22], or processing
a large amount of data [217, 70]. The majority of simulation-based data-intensive problems are
black-box in nature [218], and gradient information is not available or even more time-consuming
to derive. Thus, it is vital to address the time-consuming objective and/or constraint functions as
an inherent part of the optimization problem and limit the overall evaluation budget significantly.
    The computational expensiveness is most commonly addressed by the usage of so-called surrogates
or metamodels [104]. Substantial effort has been made based on a well-known approach called
efficient global optimization (EGO) [37]. The solutions being evaluated in each iteration are
based on the optimum of a utility optimization problem – also known as infill criterion – commonly
defined by the surrogate’s value and error predictions from Kriging [35]. Original limitations such
as the evaluation of a single point per iteration, the lack of constraint handling, or dealing with
multiple objectives have been investigated, and extensions have been proposed [126, 219]. Overall,
the algorithm can be reduced to a fit-define-optimize procedure where the utility problem definition
becomes more challenging when new problem complexities need to be handled. Moreover, the
surrogate model is the core of all EGO approaches, and its accuracy is inevitably more critical for
the algorithm’s performance.
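The fit-define-optimize procedure can be made concrete with a toy example for a one-dimensional problem. The sketch below is not EGO itself: as labeled assumptions, an inverse-distance-weighted model with a distance-based uncertainty stands in for Kriging, a lower confidence bound with weight `kappa` stands in for the infill criterion, and random sampling stands in for the optimizer of the utility problem.

```python
import random


def fit(archive):
    """'Fit' a toy surrogate and return predict(x) -> (mean, uncertainty).

    Stand-in for Kriging: inverse-distance-weighted mean, and the distance
    to the nearest evaluated point as a crude uncertainty estimate.
    """
    def predict(x):
        dists = [abs(x - xi) for xi, _ in archive]
        nearest = min(dists)
        if nearest == 0.0:
            # x was already evaluated: return its value without uncertainty.
            return archive[dists.index(0.0)][1], 0.0
        weights = [1.0 / d ** 2 for d in dists]
        mean = sum(w * yi for w, (_, yi) in zip(weights, archive)) / sum(weights)
        return mean, nearest
    return predict


def define(predict, kappa=1.0):
    """Define the utility problem: a lower-confidence-bound infill criterion."""
    def utility(x):
        mean, uncertainty = predict(x)
        return mean - kappa * uncertainty
    return utility


def optimize(utility, lower, upper, rng, n_candidates=200):
    """Optimize the cheap utility by random sampling (placeholder optimizer)."""
    candidates = [rng.uniform(lower, upper) for _ in range(n_candidates)]
    return min(candidates, key=utility)


def fit_define_optimize(expensive_f, lower, upper, n_init=4, budget=20, seed=0):
    """Run the loop until the evaluation budget is spent; return the best pair."""
    rng = random.Random(seed)
    archive = [(x, expensive_f(x)) for x in
               (rng.uniform(lower, upper) for _ in range(n_init))]
    while len(archive) < budget:
        predict = fit(archive)                        # fit
        utility = define(predict)                     # define
        x_new = optimize(utility, lower, upper, rng)  # optimize
        archive.append((x_new, expensive_f(x_new)))   # one expensive evaluation
    return min(archive, key=lambda entry: entry[1])
```

Only the `define` step changes when constraints or multiple objectives are added, which is where the utility problem definition becomes more challenging.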
    Another category of surrogate-assisted algorithms uses an existing optimization method but in-
corporates one or multiple surrogates more organically [105]. Such approaches aim to improve the
convergence behavior of the baseline algorithm and, thus, the anytime performance. Researchers
have explored different ways of incorporating surrogates into population-based algorithms, such
as genetic algorithms (GA) [1], differential evolution (DE) [220], or particle swarm optimization
(PSO) [118] over the last years. All surrogate-assisted algorithms must find a reasonable trade-off
between exploiting the knowledge of the surrogate and exploring the search space. On the one
hand, researchers have looked into methods adding a surrogate with lighter influence on the original
algorithm, for instance, using surrogate-based pre-selection in evolution strategies [221] or a
predictor for the individual replacement in DE [109]. On the other hand, an existing algorithm’s
behavior can be entirely biased by a surrogate by guiding the search more significantly. A global
and local surrogate have been incorporated into PSO to solve expensive large-scale optimization
problems [111] or into DE for expensive constrained optimization problems [222]. The numerous
variants of surrogate-assisted algorithms indicate that many different ways of incorporating surrogates
into an optimization method exist, but also that no best practice procedure has been established
yet [38].
[Figure: axes Problem Type (horizontal) and Efficiency/Computation (vertical); curves labeled Robust, Specialized, Robust With Surrogate, and Specialized With Surrogate]
Figure 4.1: Robustly adding surrogate-assistance to population-based algorithms (illustration
inspired by [1]).
   The need for more generalizable concepts in surrogate-assisted optimization has been rec-
ognized, and frameworks aiming to solve a broad category of optimization problems have been
proposed [83, 223]. These frameworks provide a generic method for solving unconstrained and con-
strained, single- and multi-objective optimization problems using the fit-define-optimize strategy.
Despite these frameworks being applicable to numerous optimization problems, their design makes
it rather challenging to incorporate and apply research conducted on computationally less expensive
problems. Thus, a framework generalizing different algorithms is desired. In this chapter, we
propose a novel surrogate-assisted framework that makes it possible to add surrogate assistance to
population-based algorithms. Whereas most other surrogate-assisted algorithms aim to incorporate
surrogates into specialized algorithms, the goal of this study is to provide a scheme to add surrogate
assistance to a whole algorithm class (see Figure 4.1). Even though specialized surrogate-assisted
algorithms are likely to outperform a generic concept on a specific problem type, the merit of
this study is its broad applicability to different algorithms being proposed in literature. Because
of the variety of existing algorithms for all kinds of problems, our framework is also generalized
to numerous problem classes, such as unconstrained or constrained and single or multi-objective
problems. The main contributions of this chapter can be summarized as follows:
  (i) We provide a categorization of existing surrogate-assisted algorithms regarding their sur-
      rogate usage. We distinguish methods regarding their surrogate’s impact and importance
       during optimization and identify what has received less attention in the past.
  (ii) We propose a framework that uses an existing population-based algorithm’s search pattern
       and improves the convergence behavior by incorporating surrogate assistance. In contrast to
       existing surrogate-assisted methods, we use the entire search pattern of an algorithm
       run on the surrogate and not only the final solutions. This allows transferring
       features of existing algorithms to expensive optimization problems.
 (iii) The exploitation of the surrogate and the search space exploration is addressed using a
       probabilistic tournament selection based on points suggested by the algorithm. The surrogate
       prediction error is incorporated into the tournament selection and reliably balances the
       exploitation-exploration trade-off based on the surrogate’s accuracy.
 (iv) Our proposed method has truly surrogate-assisted characteristics. The surrogate guides the
       search depending on its accuracy and can have more or less impact on the baseline algorithm.
       Moreover, the maximum amount of impact can be regulated by setting a hyperparameter. In
       an extreme case, if the surrogate turns out to be a disadvantage during the search, it might
       even be disabled temporarily, and the method falls back to the default algorithm.
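The idea behind contribution (iii) can be illustrated with a simplified sketch. It is not the selection scheme proposed in this chapter, only one way a prediction error could enter a tournament: as an assumption, each predicted value is perturbed by Gaussian noise whose standard deviation `error_std` reflects the estimated surrogate error, so that an accurate surrogate yields near-deterministic winners (exploitation) and an inaccurate one yields near-random winners (exploration).

```python
import random


def probabilistic_tournament(candidates, predictions, error_std, rng=None):
    """Select one winner per pair of candidates using noisy predicted values.

    candidates:  list of points suggested by the baseline algorithm
    predictions: surrogate prediction for each candidate (minimization)
    error_std:   estimated standard deviation of the surrogate error (assumed
                 to be known, e.g. from cross-validation)
    """
    rng = rng or random.Random()
    order = list(range(len(candidates)))
    rng.shuffle(order)
    winners = []
    # Pair up shuffled candidates; an unpaired last candidate is skipped.
    for i, j in zip(order[0::2], order[1::2]):
        noisy_i = predictions[i] + rng.gauss(0.0, error_std)
        noisy_j = predictions[j] + rng.gauss(0.0, error_std)
        winners.append(candidates[i] if noisy_i <= noisy_j else candidates[j])
    return winners
```

Setting `error_std` to zero reduces the sketch to a purely prediction-driven tournament, whereas a very large value makes the surrogate effectively irrelevant.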
4.2    Background
In this section, the short overview of existing methods in the previous section will be enriched with
details, and existing surrogate-assisted algorithms will be categorized regarding their surrogate
incorporation.
    Existing surrogate-assisted methods can be roughly put into one of the following categories
based on the surrogate’s involvement: aided, customized, dependent, or once (see Figure 4.2).
The latter describes the early development of optimization using the approximation model, fitted
exactly once during optimization and never updated (once). Algorithms that perform an update
of the surrogate can mostly depend on its predictions (dependent) or use it as an assistant in an
existing method to improve the convergence behavior (aided, customized). The surrogate’s role and
dependency on the algorithm’s design are vital for generalizing and, thus, shall be given special
attention in this study. A thematic overview of these different types of surrogate involvement in
an algorithm is given next. Especially in the early phase of surrogate-based optimization, the
surrogate was fitted only once and optimized. Thus, the optimization’s outcome entirely depends
on the accuracy of the surrogate model.
    The limitation of fitting a surrogate only once has soon been overcome by a more adaptive
approach known as EGO (Efficient Global Optimization) [37]. The surrogate guiding the search is
Kriging [35], which provides predictions and a measure of uncertainty for each point. The prediction
and uncertainty together define the so-called acquisition function (or infill criterion), such as the
expected improvement [37] or probability of improvement [92] aiming to balance exploitation
and exploration simultaneously. The optimization of the acquisition function results in an infill
solution, which is first evaluated and then added to the model. The procedure is repeated until a
termination criterion is met. The limitation of finding only a single new solution in each iteration has
been investigated thoroughly, and multi-point EGO approaches have been proposed [96, 99, 100].
Moreover, the concept has been generalized to solve multi-objective optimization problems by
using decomposition [126, 134] or replacing the objective with a performance indicator based
metric [130]. The idea has also been extended to handle constraints, which is especially important
for solving real-world optimization problems [138]. Instead of using acquisition functions to address
the surrogate’s uncertainty, algorithms based on trust regions have been proposed. Inevitably,
updating the trust-region radii becomes vital for the algorithm’s performance [224]. Whereas
original studies were limited to unconstrained single-objective optimization, the surrogate-assisted
trust-region concept has been generalized to constrained and multi-objective optimization [83, 223].
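For reference, the expected improvement for a minimization problem with a Gaussian prediction has a well-known closed form, which the following sketch evaluates for a single point. The function and argument names are chosen for this example; the prediction `mean` and its standard deviation `std` are assumed to come from a model such as Kriging.

```python
import math


def expected_improvement(mean, std, f_best):
    """Expected improvement over f_best for minimization.

    mean, std: surrogate prediction and its standard deviation at a point
    f_best:    best (smallest) objective value evaluated so far
    """
    if std <= 0.0:
        # Without uncertainty, the improvement is deterministic.
        return max(f_best - mean, 0.0)
    z = (f_best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # The first term rewards a good prediction, the second rewards uncertainty.
    return (f_best - mean) * cdf + std * pdf
```

The two terms make the exploitation-exploration balance explicit: the value grows both when the predicted mean improves on `f_best` and when the uncertainty at the point is large.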
Figure 4.2: Different roles of surrogates in the design of an algorithm.
  Surrogate-Assisted
    Surrogate-Aided: An algorithm retrieves some aid from a surrogate, which plays only
      a minor role in the algorithm's design.
    Surrogate-Customized: A surrogate is part of the algorithm's design and serves as
      an assistant to improve the algorithm's convergence.
  Surrogate-Based
    Surrogate-Centered: The surrogate is a substantial component of the algorithm and
      its design. Without any surrogate, the algorithm cannot be executed.
    Just Once / Not Adaptive: The surrogate is fitted exactly once and then optimized.
      Thus, the performance strongly depends on the surrogate's prediction error.
    Apart from the approaches discussed above, the direct usage of surrogates in an algorithm has
been explored in various areas, for instance, bi-level optimization [56, 227] or mixed-integer opti-
mization [228]. All these approaches have one thing in common – the algorithm has been designed
based on an approximating model and thus has a significant surrogate dependency. Therefore, the
surrogate’s suitability and accuracy are critical for the optimization’s success. Inaccurate surrogate
predictions and error estimations, inevitably occurring in large-scale optimization problems, are
known to be problematic [38].
    In contrast to algorithms designed based on surrogates, researchers have investigated the
incorporation of surrogates into existing optimization methods. Such approaches are also known as
surrogate-assisted algorithms, emphasizing the surrogate’s role as an assistant during optimiza-
tion. In our categorization, surrogate-assisted algorithms are split up into two categories. On
the one hand, algorithms being aided by a surrogate where only minor changes of the original
algorithm design are made; on the other hand, surrogate-customized methods where the surrogate
has a significant impact on the algorithm’s design. Because the judgment of impact is subjective,
the transition between both categories is somewhat fluid.
              Table 4.1: Categorization regarding the surrogate’s role in an algorithm.
                       Category            Algorithm / Study
                       Aided               MAES [221]
                       Customized          MOEA/D-EGO [134], K-RVEA [225], HSMEA [136],
                                           CSEA [64], PAL-SAPSO [119], CAL-SAPSO [120]
                       Centered            EGO [37], ParEGO [126], SMS-EGO [130],
                                           Max-Min SAEA [56], SACOBRA [226], SABLA [227],
                                           GS-MOMA [83], GSGA [223], MISO [228],
                                           GOSAC [229]
                       Just Once           [230], [87], [231]
    The benefit of surrogate-aided algorithms is that with relatively minor modifications, a surrogate
can be incorporated and the performance improved [104]. One well-known approach is
a pre-selection (or pre-filtering), which uses a surrogate to select a subset of solutions that usually
would be evaluated on the expensive problem [221]. Moreover, instead of changing the behavior in
a generation, surrogates have also been used across generations by switching between the expensive
evaluation and surrogate predictions entirely for some iterations [102, 105]. Another example of
a surrogate-influenced algorithm is modifying a memetic algorithm (genetic algorithm with local
search) by executing an evaluation-intensive local search on the surrogate [51].
     Apart from surrogate-assisted methods with relatively minor modifications of existing
algorithms, researchers have proposed substantial customizations of well-known algorithms.
This has resulted in surrogate-assisted variants such as lqCMAES [232]
derived from CMAES [112], K-RVEA [225] and HSMEA [136] based on RVEA [125],
MOEAD-EGO [134] as an improvement of MOEAD [11], or CAL-SAPSO [120] based on PSO [118],
to name a few. Each surrogate-assisted variant is in principle based on an algorithm originally
developed for computationally inexpensive optimization problems but customizes the default
behavior, for instance, by using one or multiple local or global surrogates or by implementing a sophisticated
selection after surrogate-based optimization.
     The increasing number of surrogate-assisted algorithms shows the importance and relevance
of optimizing computationally expensive functions in practice. Indisputably, approaches and ideas
directly designed for and dependent on one or multiple surrogates have their legitimacy but are rather
difficult to use for newly proposed algorithms. The increasing number of surrogate-customized
algorithms indicates the need for a best practice procedure and more generalizable methods. Thus,
this study aims to investigate a surrogate-aided framework of algorithms applicable to a broad
category of optimization methods.
4.3   Interfacing Metaheuristics
One of the major challenges when proposing a generalized optimization framework is the number
and strictness of assumptions being made. On the one hand, too many assumptions restrict the
applicability; on the other hand, too few assumptions limit the usage of existing elements in al-
gorithms. In this study, we target any type of population-based algorithm with two phases in an
  Algorithm 4.1: Infill-And-Advance Interface
     Input : Algorithm Φ
  1  while Φ has not terminated do
  2      𝑋 ← Φ.infill()
  3      𝐹, 𝐺 ← evaluate(𝑋)
  4      Φ.advance(𝑋, 𝐹, 𝐺)
  5  end
iteration: the process of generating new solutions to be evaluated (infill) and a method processing
the evaluated infill solutions (advance). With these two methods, running an algorithm can be
summarized by the pseudo-code shown in Algorithm 4.1. Until the algorithm Φ has terminated,
the infill method returns a set of new designs 𝑋 to be evaluated. After obtaining the objective values 𝐹
and constraint values 𝐺 for these designs, the algorithm is advanced by providing the evaluated solutions
{𝑋, 𝐹, 𝐺}. Regarding this interface, we further make two (weak) assumptions. First, the designs 𝑋
passed to advance do not need to be identical to the designs suggested by infill (Lines 2 and 4) but
may be modified in between. Second, the infill method is non-deterministic, resulting in different designs
𝑋 whenever it is called. Both assumptions can be considered weak because most population-based
algorithms already fulfill them. So, how can existing optimization methods be decomposed into
infill and advance phases? Genetic algorithms (GAs) generate new solutions using evolutionary
recombination and mutation operators and then process them using an environmental survival selection
operator [1]; PSO methods create new solutions based on each particle’s current velocity, personal
best, and global best, and process the solutions using a replacement strategy [118]; CMAES samples
new solutions from a normal distribution, which is then updated in each iteration [112]. Because
these well-known state-of-the-art algorithms either follow this design pattern or can be implemented
in it, the assumption seems reasonable for a
generic framework. Moreover, it is worth noting that some researchers and practitioners also refer
to the pattern as the ask-and-tell interface.
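The interface of Algorithm 4.1 can be illustrated with a minimal Python sketch. The class RandomSearch, the function sphere, and the driver run are hypothetical names introduced only for this illustration; constraints (𝐺) are omitted for brevity, and the sketch is not the implementation used in this study.

```python
import random


class RandomSearch:
    """Toy population-based algorithm exposing infill() and advance()."""

    def __init__(self, n_var, n_infills, max_evals, seed=1):
        self.n_var, self.n_infills, self.max_evals = n_var, n_infills, max_evals
        self.rng = random.Random(seed)
        self.n_evals = 0
        self.best_x, self.best_f = None, float("inf")

    def has_terminated(self):
        return self.n_evals >= self.max_evals

    def infill(self):
        # Non-deterministic: every call proposes different designs in [0, 1]^n_var.
        return [[self.rng.random() for _ in range(self.n_var)]
                for _ in range(self.n_infills)]

    def advance(self, X, F):
        # Process evaluated designs; X need not be exactly what infill() suggested.
        self.n_evals += len(X)
        for x, f in zip(X, F):
            if f < self.best_f:
                self.best_x, self.best_f = x, f


def sphere(x):
    # Stand-in for the expensive evaluation (optimum at 0.5 in every variable).
    return sum((xi - 0.5) ** 2 for xi in x)


def run(algorithm, evaluate):
    # Algorithm 4.1 without constraints: infill, evaluate, advance.
    while not algorithm.has_terminated():
        X = algorithm.infill()
        F = [evaluate(x) for x in X]
        algorithm.advance(X, F)
    return algorithm.best_x, algorithm.best_f


algo = RandomSearch(n_var=2, n_infills=5, max_evals=100)
best_x, best_f = run(algo, sphere)
```

Because the loop only talks to infill and advance, any step that modifies 𝑋 between the two calls can be added without touching the algorithm itself.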
     However, how should this interface be utilized, and what role can surrogates play in improving
the algorithm’s performance? Precisely this is the subject of this work. Nevertheless, before
moving on to the proposed framework, some more specifications of the surrogate usage are to be
defined. First, the surrogate shall only be used as an assistant (in contrast to other methods where
everything is developed around the surrogate). Second, the proposed method should be
adaptive, allowing the impact of surrogate usage to be decreased or increased and, if desired, even
falling back to the original pseudo-code shown in Algorithm 4.1. Third, the surrogate prediction
error needs to be addressed to ensure both exploitation and exploration. Altogether, these design
goals are formulated to make the optimization framework and surrogate incorporation flexible.
4.4   Probabilistic Surrogate-Assisted Framework (PSAF)
In the following, we propose the probabilistic surrogate-assisted framework (PSAF), a framework for
solving computationally expensive single-objective optimization problems.
4.4.1   Methodology
In contrast to most existing surrogate-assisted algorithms, PSAF uses not only the final solution(s)
obtained by optimizing the surrogate but the whole search pattern. By making use of the search
pattern, the exploration-exploitation balance is found by taking the surrogate’s accuracy into ac-
count. To allow even more flexible exploitation of the surrogate, we propose two phases. First,
derive a solution set that is influenced by the surrogate, and second, introduce surrogate bias by
optimizing the surrogate for a number of iterations. Both procedures are important to incorporate
surrogates into existing methods effectively.
     The overall outline of the algorithm is shown in Algorithm 4.2. The PSAF concept requires a
baseline algorithm Φ which implements two methods: infill() and advance(X, F). Moreover,
three hyper-parameters, 𝛼, 𝛽, and 𝜌^(max), are passed to balance the surrogate’s influence on the
baseline algorithm. First, the design of experiments 𝑋^(doe) is generated in a space-filling manner
and evaluated using the time-consuming evaluation function to obtain 𝐹^(doe) (Line 1). An iterative procedure
introducing surrogate bias to the baseline algorithm continues until the user-defined termination
criterion is met. Each iteration begins with asking the baseline algorithm for new infill solutions.
All steps from there on until the evaluation of 𝑋 and the advancement of the algorithm (Lines 29
  Algorithm 4.2: PSAF: A Probabilistic Surrogate-Assisted Framework
    Input : Algorithm Φ with infill() and advance(X, F), Surrogate Tournament Pressure 𝛼 (≥ 1),
             Number of Simulated Iterations 𝛽 (≥ 0), Maximum Surrogate-Bias 𝜌^(max) (≥ 0.0)
    /* Sample Design of Experiments (DOE) */
  1 𝑋^(doe) ← doe(); 𝐹^(doe) ← evaluate(𝑋^(doe))
  2 Φ.advance(𝑋^(doe), 𝐹^(doe))
  3 while not terminated do
         /* Default infill solutions from baseline algorithm */
  4      𝑋 ← Φ.infill()
         /* Fit a surrogate and predict values for infills */
  5      𝑆 ← fit(𝑋^(doe), 𝐹^(doe))
  6      𝐹ˆ ← 𝑆.predict(𝑋)
         /* Surrogate-assisted Tournament Pressure (𝛼) */
  7      foreach 𝑘 ← 2 to 𝛼 do
  8           𝑋^{𝛼_𝑘} ← Φ.infill()
  9           𝐹ˆ^{𝛼_𝑘} ← 𝑆.predict(𝑋^{𝛼_𝑘})
 10           foreach 𝑗 ← 1 to size(𝐹ˆ^{𝛼_𝑘}) do
 11                 if 𝐹ˆ_𝑗^{𝛼_𝑘} < 𝐹ˆ_𝑗 then 𝑋_𝑗 ← 𝑋_𝑗^{𝛼_𝑘}; 𝐹ˆ_𝑗 ← 𝐹ˆ_𝑗^{𝛼_𝑘};
 12           end
 13      end
         /* Bias by Continuing the Algorithm Φ′ on Surrogate (𝛽) */
 14      Φ′ ← copy(Φ)
 15      𝑋^𝛽 ← 𝑋; 𝐹ˆ^𝛽 ← 𝐹ˆ
 16      foreach 𝑘 ← 1 to 𝛽 do
 17           𝑋^{𝛽_𝑘} ← Φ′.infill()
 18           𝐹ˆ^{𝛽_𝑘} ← 𝑆.predict(𝑋^{𝛽_𝑘})
 19           foreach 𝑗 ← 1 to size(𝐹ˆ^{𝛽_𝑘}) do
 20                 𝑖 ← closest(𝑋_𝑗^{𝛽_𝑘}, 𝑋^𝛽)
 21                 if 𝐹ˆ_𝑗^{𝛽_𝑘} < 𝐹ˆ_𝑖^𝛽 then 𝑋_𝑖^𝛽 ← 𝑋_𝑗^{𝛽_𝑘}; 𝐹ˆ_𝑖^𝛽 ← 𝐹ˆ_𝑗^{𝛽_𝑘};
 22           end
 23           Φ′.advance(𝑋^{𝛽_𝑘}, 𝐹ˆ^{𝛽_𝑘})
 24      end
 25      𝜌 ← min(estm_surr_bias(𝑆), 𝜌^(max))
 26      foreach 𝑗 ← 1 to size(𝑋^𝛽) do
 27           if random() < 𝜌 then 𝑋_𝑗 ← 𝑋_𝑗^𝛽;
 28      end
         /* Next iteration of the overall algorithm */
 29      𝐹 ← evaluate(𝑋)
 30      Φ.advance(𝑋, 𝐹)
 31      𝑋^(doe) ← 𝑋^(doe) ∪ 𝑋
 32      𝐹^(doe) ← 𝐹^(doe) ∪ 𝐹
 33 end
and 30) introduce surrogate bias. The surrogate bias consists of two phases: the 𝛼-phase with
a light influence using a tournament selection based on surrogate predictions (Line 7 to 13); and
the 𝛽-phase simulating the algorithm for a number of generations on the surrogate and accepting
solutions with the probability 𝜌 derived from the surrogate’s accuracy (Line 14 to 28).
4.4.1.1    Influence of Surrogate through Tournament Selection Pressure (𝛼)
A well-known concept in evolutionary computation to introduce a bias toward more promising
solutions is tournament selection. An individual from the population has to win a tournament to
contribute to the mating process. The number of competitors (𝛼) balances how greedy the selection
procedure is: a larger value of 𝛼 allows only elitist solutions to participate
in mating, while a smaller value introduces less selection pressure. For genetic algorithms, the
most frequently used tournament mode is the binary tournament (𝛼 = 2), which compares a pair
of solutions regarding one or multiple metrics. A standard binary tournament implementation for
constrained single-objective optimization declares the less infeasible solution the winner if one
or both solutions are infeasible, and otherwise the solution with the smaller function value.
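The binary tournament rule for constrained single-objective problems can be sketched as follows. The dictionary keys "f" (objective value) and "cv" (aggregated constraint violation, with 0 meaning feasible) are assumptions made for this illustration and are not prescribed by the text.

```python
def binary_tournament(a, b):
    """Return the winner of two solutions given as dicts with keys 'f' and 'cv'."""
    if a["cv"] > 0.0 or b["cv"] > 0.0:
        # At least one solution is infeasible: the less infeasible one wins.
        return a if a["cv"] <= b["cv"] else b
    # Both are feasible: the smaller function value wins.
    return a if a["f"] <= b["f"] else b


feasible_good = {"f": 1.0, "cv": 0.0}
feasible_bad = {"f": 2.0, "cv": 0.0}
infeasible = {"f": 0.1, "cv": 0.3}

winner_feasible = binary_tournament(feasible_good, feasible_bad)
winner_mixed = binary_tournament(infeasible, feasible_bad)
```

Note that the infeasible solution loses even though its function value is the smallest of the three.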
     In the context of surrogate assistance, the tournament selection introduces surrogate bias during
the generation of new infill solutions. Whereas in genetic algorithms evaluated solutions (using
ESE) compete with each other during mating selection, in PSAF solutions evaluated on the surrogate
(ASE) are compared. Figuratively, the surrogate serves as a referee in a tournament by providing
predictions before the evaluation. In Figure 4.3, surrogate-assisted tournament selection with three
competitors (𝛼 = 3) for four infill solutions is shown. Initially, the algorithm’s infill function
is called three times to generate 𝑋^{𝛼_1}, 𝑋^{𝛼_2}, and 𝑋^{𝛼_3}. Then, a tournament takes place in which
the 𝑖-th solutions 𝑋_𝑖^{𝛼_𝑗} of all infill solution sets 𝑗 compete with each other. For instance, for the
first tournament, the winner among 𝑋_1^{𝛼_1}, 𝑋_1^{𝛼_2}, and 𝑋_1^{𝛼_3} has to be declared. As a comparison function, the
surrogate’s prediction 𝐹ˆ_𝑖^{𝛼_𝑗} is used. In general, setting 𝛼 = 1 disables the tournament
selection and serves as a fallback. By involving the surrogate in the tournament selection (𝛼 > 1),
the surrogate’s influence on the infill solutions 𝑋 becomes smaller or larger depending on the number of competitors.
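A minimal sketch of the 𝛼-phase (Lines 4 to 13 of Algorithm 4.2) is given next. The callables infill and predict stand in for Φ.infill() and 𝑆.predict(); the random one-dimensional infill and the quadratic stand-in surrogate are assumptions for illustration only.

```python
import random


def alpha_phase(infill, predict, alpha):
    """Surrogate-assisted tournament among alpha infill sets."""
    X = infill()
    F_hat = [predict(x) for x in X]
    for _ in range(2, alpha + 1):          # alpha = 1 skips the tournament
        X_k = infill()
        F_hat_k = [predict(x) for x in X_k]
        for j in range(len(X_k)):
            if F_hat_k[j] < F_hat[j]:      # the j-th candidates compete
                X[j], F_hat[j] = X_k[j], F_hat_k[j]
    return X, F_hat


rng = random.Random(0)


def infill():
    # Hypothetical baseline infill: four random one-dimensional designs.
    return [[rng.random()] for _ in range(4)]


def predict(x):
    # Hypothetical surrogate prediction.
    return (x[0] - 0.3) ** 2


X, F_hat = alpha_phase(infill, predict, alpha=3)
```

Each returned design is the one with the smallest prediction among the three candidates generated for its position, which mirrors the row-wise tournaments of Figure 4.3.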
[Figure 4.3 (image not reproduced): for 𝛼 = 3 and four infills, each row shows the candidates
𝑋_𝑖^{𝛼_1}, 𝑋_𝑖^{𝛼_2}, and 𝑋_𝑖^{𝛼_3} compared by their predictions 𝐹ˆ_𝑖^{𝛼_𝑗}; the winner of each row becomes 𝑋_𝑖.]
Figure 4.3: Tournament selection with 𝛼 competitors to create surrogate-influenced infill solutions.
4.4.1.2   Continue Optimization on Surrogate (𝛽)
While the tournament is an effective concept to incorporate the surrogate’s approximation, it is
limited by looking only a single iteration into the future. To further increase the surrogate’s impact,
the baseline algorithm is continued for 𝛽 more consecutive iterations on the surrogate’s
approximations. Inevitably, the question of how many iterations are suitable arises, which indicates
the importance of tuning 𝛽. Even more critically, how should the algorithm profit
from simulating its run on the surrogate? An inappropriate choice of 𝛽 will cause the
surrogate’s optimum to be found repeatedly and will entirely discard the baseline algorithm’s
default infill procedure. This also causes a loss of diversity among the infill solutions and does not account for
the surrogate’s approximation error. Thus, we propose a probabilistic surrogate-assisted approach
that balances the surrogate’s impact on the baseline algorithm to address these issues.
   The probabilistic procedure is described in Algorithm 4.2 from Line 14 to 28. Because the
iterations are only simulated on the surrogate, the original algorithm object Φ must be copied to
Φ′ to avoid any modification of the current algorithm’s state (Line 14). Then, the run of Φ′
is continued on the surrogate model for 𝛽 iterations: in each iteration 𝑘, the
infill method returns 𝑋^{𝛽_𝑘}, and the approximations 𝐹ˆ^{𝛽_𝑘} are fed back to the algorithm by calling
advance (Lines 17, 18, and 23). The goal of these iterations is to introduce more surrogate bias
into 𝑋. Therefore, a surrogate-biased population 𝑋^𝛽 is obtained by initializing 𝑋^𝛽 = 𝑋 and 𝐹ˆ^𝛽 = 𝐹ˆ
(Line 15) and assigning, in each iteration 𝑘, every solution 𝑋_𝑗^{𝛽_𝑘} to its closest solution 𝑖 in 𝑋^𝛽. The
closest solution is determined based on the smallest (normalized) Euclidean distance in the design
space. The infill 𝑋_𝑖^𝛽 and the corresponding prediction 𝐹ˆ_𝑖^𝛽 are replaced if the newly found solution
performs better according to the surrogate’s prediction. Finally, a biased candidate solution 𝑋_𝑗^𝛽
replaces 𝑋_𝑗 with probability 𝜌, which is bounded by 𝜌^(max). Clearly, the value of 𝜌 determines the impact of
the 𝛽-phase on the baseline algorithm.

[Figure 4.4 (image not reproduced): for 𝛽 = 5 and four infills 𝑋_1 to 𝑋_4, the solution sets of the five
simulated iterations are assigned to their closest infill; panels "Select best for each 𝑋_𝑖" and
"Replace with probability 𝜌".]
          Figure 4.4: Continuation of the algorithm’s run for 𝛽 iterations on the surrogate.
    An example with five iterations (𝛽 = 5) and four infill solutions 𝑋_1, 𝑋_2, 𝑋_3, and 𝑋_4 is also
illustrated in Figure 4.4. Calling the infill function of the baseline algorithm in each iteration results in five solution
sets with four solutions each. When running the algorithm, the assignment takes place, and, for
instance, four solutions are closest to 𝑋_1 and six are closest to 𝑋_4. The assignment to the closest
solution will show cluster-like arrangements and preserve diversity.
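The 𝛽-phase (Lines 14 to 28 of Algorithm 4.2) can be sketched in Python as follows. GaussianSearch is a hypothetical baseline algorithm invented for this illustration, 𝜌 is passed as a fixed value instead of being estimated, and the design space is assumed to be normalized already so that plain Euclidean distances can be used.

```python
import copy
import math
import random


class GaussianSearch:
    """Hypothetical baseline: samples around the best design seen so far."""

    def __init__(self, x0, sigma=0.15, n_infills=4, seed=1):
        self.best_x, self.best_f = list(x0), float("inf")
        self.sigma, self.n_infills = sigma, n_infills
        self.rng = random.Random(seed)

    def infill(self):
        return [[xi + self.rng.gauss(0.0, self.sigma) for xi in self.best_x]
                for _ in range(self.n_infills)]

    def advance(self, X, F):
        for x, f in zip(X, F):
            if f < self.best_f:
                self.best_x, self.best_f = list(x), f


def beta_phase(algorithm, predict, X, F_hat, beta, rho, rng):
    """Simulate beta iterations on the surrogate and bias X with probability rho."""
    sim = copy.deepcopy(algorithm)          # never modify the real algorithm state
    X_beta, F_beta = [list(x) for x in X], list(F_hat)
    for _ in range(beta):
        X_k = sim.infill()
        F_k = [predict(x) for x in X_k]
        for x, f in zip(X_k, F_k):
            # Assign the simulated solution to its closest biased infill.
            i = min(range(len(X_beta)), key=lambda m: math.dist(x, X_beta[m]))
            if f < F_beta[i]:
                X_beta[i], F_beta[i] = x, f
        sim.advance(X_k, F_k)               # feed predictions back to the copy
    # Accept each biased candidate with probability rho.
    return [xb if rng.random() < rho else x for x, xb in zip(X, X_beta)]


def predict(x):
    # Hypothetical surrogate prediction.
    return sum((xi - 0.2) ** 2 for xi in x)


algo = GaussianSearch(x0=[0.5, 0.5])
X = algo.infill()
F_hat = [predict(x) for x in X]
X_biased = beta_phase(algo, predict, X, F_hat, beta=5, rho=1.0, rng=random.Random(2))
X_unbiased = beta_phase(algo, predict, X, F_hat, beta=5, rho=0.0, rng=random.Random(2))
```

With 𝜌 = 1 every infill is replaced by its biased counterpart, and with 𝜌 = 0 the default infills are returned unchanged, which corresponds to the fallback behavior.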
       In general, optimizing the surrogate model is a common technique used in surrogate-assisted
algorithms. However, a crucial aspect is addressing the utilization of knowledge from these
iterations. An assignment-based and probabilistic approach keeps the balance between the default
algorithm’s behavior and surrogate bias. The strategy to determine the “bottleneck” variable 𝜌 of
the 𝛽-phase is described next.
4.4.1.3      Balancing the Utilization of Surrogate (𝜌)
The number of infill solutions that are finally biased by the 𝛽 iterations on the surrogate is critical and
balances the surrogate’s overall involvement. In industry projects, finding a suitable surrogate can
be rather challenging and is often done manually. Comparing different types of surrogates and
selecting the most suitable one is usually based on a metric that judges a model’s trustworthiness.
A well-known metric to estimate the accuracy is the coefficient of determination (also known as
𝑅²):

    R^2 = 1 - \frac{\sum_i (y_i - \hat{f}_i)^2}{\sum_i (y_i - \bar{y})^2} = 1 - \frac{\mathrm{MSE}(y, \hat{f})}{\mathrm{MSE}(y, \hat{f}_{\bar{y}})},    (4.1)

where 𝑦_𝑖 represents the output and 𝑓ˆ_𝑖 the prediction of the 𝑖-th value, and 𝑦¯ the arithmetic mean of
all output values. The denominator Σ_𝑖 (𝑦_𝑖 − 𝑦¯)² corresponds, up to the factor 1/𝑁 for 𝑁 data points
that cancels in the ratio, to the Mean Squared Error
(MSE) of a surrogate 𝑓ˆ_𝑦¯ that always predicts the average of all output values. This error serves as
the normalization constant for the coefficient of determination. For a surrogate performing worse than
𝑓ˆ_𝑦¯, the ratio on the right-hand side of the equation is greater than 1 and, thus, 𝑅² becomes
negative. If the surrogate performs equally well, the value is zero, and otherwise it is positive. The
upper bound of the 𝑅² metric is 1, which could theoretically be reached with an MSE of zero.
These characteristics turn out to be very suitable for defining a probability and thus have been the
inspiration for balancing the surrogate bias.
The replacement probability 𝜌 is given by bounding 𝑅² on the lower end to zero:

    \rho = \max\left(0,\; 1 - \frac{\mathrm{MSE}(y, \hat{f})}{\mathrm{MSE}(y, \hat{f}_b)}\right),    (4.2)

where 𝑓ˆ_𝑏 represents a baseline predictor. The formula of 𝜌 generalizes the definition of 𝑅² by
considering an arbitrary baseline predictor 𝑓ˆ_𝑏 instead of 𝑓ˆ_𝑦¯. Given that the generalized 𝑅² has an upper bound of
one and 𝜌 is bounded below by zero, 𝜌 ∈ [0, 1] holds, and thus 𝜌 is a valid probability.
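Equation (4.2) can be checked numerically with a few lines of Python. The data values are made up for illustration, and the mean predictor is used as the baseline here, so that 𝜌 coincides with 𝑅² clipped at zero.

```python
def mse(y, y_hat):
    """Mean squared error between outputs y and predictions y_hat."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def replacement_probability(y, f_hat, f_base):
    """Eq. (4.2): rho = max(0, 1 - MSE(y, f_hat) / MSE(y, f_base))."""
    return max(0.0, 1.0 - mse(y, f_hat) / mse(y, f_base))


y = [1.0, 2.0, 3.0, 4.0]                      # made-up validation outputs
mean = sum(y) / len(y)
f_mean = [mean] * len(y)                      # baseline always predicting the average

# An accurate surrogate: MSE = 0.025 versus a baseline MSE of 1.25.
rho_good = replacement_probability(y, [1.1, 1.9, 3.2, 3.8], f_mean)
# A surrogate worse than the baseline: R^2 would be -3, so rho is clipped to 0.
rho_bad = replacement_probability(y, [4.0, 3.0, 2.0, 1.0], f_mean)
```

In this example the accurate surrogate yields 𝜌 = 0.98, whereas the surrogate that reverses the trend yields 𝜌 = 0 and would therefore never replace a default infill.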
      But what does 𝜌 as a metric defining the surrogate bias imply in the context of an algorithm’s
iteration? If the surrogate performs worse than 𝑓ˆ_𝑏 (𝜌 = 0), no solution will be replaced.
Conversely, if the surrogate has no prediction error, all solutions will be surrogate biased. Even more
importantly, between these two extreme cases, the value of 𝜌, and therefore the surrogate
bias, is adjusted proportionally to the accuracy of the model normalized by the performance of
𝑓ˆ_𝑏. In our implementation, we chose a 𝑘-nearest neighbor model (𝑘 = 𝑛 + 1, where 𝑛 represents the
number of variables) as the baseline predictor 𝑓ˆ_𝑏 and average 𝜌 over the last five iterations (sliding
window). For estimating the model’s accuracy, we use all data points observed until the last
generation as the training set and the newly evaluated infill solutions as the validation set. In the initial iteration,
where only the design of experiments and no infill solutions exist, a 𝑘-fold cross-validation is
performed. Our experiments have shown that assessing the surrogate prediction error is essential
and, thus, directly incorporating it into the algorithm design is recommended.
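The two implementation details mentioned above, a 𝑘-nearest neighbor baseline predictor and a sliding-window average of 𝜌, can be sketched as follows. The names knn_predict and SlidingRho are hypothetical, the unweighted neighbor average is an assumption, and applying the bound 𝜌^(max) after averaging is also an assumption, since the text does not fix the order.

```python
import math
from collections import deque


def knn_predict(X_train, y_train, x, k):
    """Baseline predictor: mean output of the k nearest training designs."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)


class SlidingRho:
    """Averages the most recent rho estimates (window of five, as in the text)."""

    def __init__(self, window=5, rho_max=1.0):
        self.values = deque(maxlen=window)   # old estimates drop out automatically
        self.rho_max = rho_max

    def update(self, rho):
        self.values.append(rho)
        return min(sum(self.values) / len(self.values), self.rho_max)


# One variable (n = 1), hence k = n + 1 = 2 neighbors.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 4.0, 9.0]
pred = knn_predict(X_train, y_train, [1.2], k=2)

tracker = SlidingRho(window=5, rho_max=0.7)
history = [tracker.update(r) for r in [0.2, 0.4, 0.9, 1.0, 1.0, 1.0]]
```

The averaging damps the effect of a single iteration in which the surrogate happens to validate unusually well or badly.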
4.4.1.4     Surrogate Management
The algorithm’s outline has already shown that the surrogate model has to be fitted to data
points and is used as a predictor for new infill solutions. In general, all matters related to fitting
or updating a surrogate are referred to as surrogate management. It is noteworthy that in practice,
not only a single but multiple surrogates are recommended to provide a more robust model with less
approximation error. With multiple surrogates, we refer not only to the type of surrogate but also
to concrete hyper-parameter settings. In our implementation, a total of 15 surrogates, consisting of the model
types RBF [34] and Kriging [35], are validated. The hyper-parameters instantiate models with
different mean functions, kernels, and noise settings. Finally, the model with the highest 𝜌 value is chosen.
On the one hand, an increasing number of points from optimization increases the time spent on
surrogate management and, on the other hand, can lead to precision issues. The precision issues are
caused by solutions being very close to each other in the design space. Thus, we employ an 𝜖-clearing
approach, which iteratively selects the remaining solution with the smallest function value and then clears all
solutions with less than 𝜖 distance to it (𝜖 = 0.005). We reduce the overall number of points by
only considering the 200 best solutions from the 𝜖-cleared solution set.
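A minimal sketch of the 𝜖-clearing step is shown next. The function name epsilon_clearing is hypothetical, and the distances are assumed to be Euclidean distances in the normalized design space.

```python
import math


def epsilon_clearing(X, F, eps=0.005, n_max=200):
    """Return indices of at most n_max designs that are pairwise at least eps apart."""
    remaining = sorted(range(len(X)), key=lambda i: F[i])   # best first
    selected = []
    while remaining and len(selected) < n_max:
        best = remaining.pop(0)              # smallest function value left
        selected.append(best)
        # Clear every design closer than eps to the one just selected.
        remaining = [i for i in remaining if math.dist(X[i], X[best]) >= eps]
    return selected


X = [[0.0, 0.0], [0.001, 0.0], [0.5, 0.5], [0.5, 0.502]]
F = [1.0, 0.5, 2.0, 3.0]
kept = epsilon_clearing(X, F)
```

Here the second design is kept because it has the smallest function value, its near-duplicate (the first design) is cleared, and the same happens for the pair around (0.5, 0.5).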
4.4.2   Experimental Results
This study focuses on computationally expensive functions by limiting the evaluation budget for
test problems. This is a commonly used principle in surrogate-assisted research, especially for
more general approaches that need to be tested on several optimization problems. In this study, we
have limited the function evaluations to 200-300 and considered problems with up to 10 variables.
For comparison, we use well-known test problems, such as Sphere, Ackley, Rosenbrock, and others
from the single-objective BBOB test suite [233]. To demonstrate the generalizability of PSAF,
we (i) conduct an experiment focusing on the most suitable hyper-parameter combination, (ii)
investigate the impact of dynamically determining the surrogate-bias by 𝜌 and (iii), lastly, compare
the proposed framework of surrogate-assisted algorithms with other recently proposed methods for
computationally expensive problems.
4.4.2.1   What are suitable values for (𝛼, 𝛽 and 𝜌)?
In our first experiment, 𝜌 is kept fixed and not updated. This shall give insights into whether 𝜌 is a
problem-dependent variable and, in fact, benefits from being updated based on the surrogate’s prediction
error. Moreover, the impact of 𝛼 and 𝛽 on an algorithm’s performance shall be of interest.
    For this hyper-parameter study, we select CMA-ES [112] as a baseline algorithm. We normalize
each variable between zero and one to avoid any scaling irregularities and initialize the algorithm
with a standard deviation of 𝜎 = 0.15. The algorithm’s initial starting point is determined by the best
solution found by generating 20 points using Latin Hypercube Sampling [234]. We employ
grid-based optimization by setting the hyper-parameters 𝛼 ∈ {1, 2, 3, 5, 10}, 𝛽 ∈ {0, 5, 10, 20, 30, 40, 50},
and 𝜌 ∈ {0.1, 0.2, ..., 0.9, 1.0} for PSAF-CMA-ES. Because 𝛽 = 0 makes the value of 𝜌 irrelevant,
there is no need to additionally consider any run with 𝜌 = 0. Because of the stochastic nature of the
algorithm, we execute each parameter combination 11 times. This resulted in 60,588 runs in total
for all test problems. As a performance criterion, we address the so-called anytime performance
𝑓^(any) of the algorithm and calculate the integral of the convergence curve based on the gap to the
optimum 𝑓^(gap) = 𝑓 − 𝑓^(opt). Measuring the convergence and not only the final function value
addresses the desire for a surrogate-assisted algorithm to converge as quickly as possible with a very
limited function evaluation budget.
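The anytime performance can be illustrated with a short sketch. The use of the best-so-far gap, the trapezoidal rule with unit spacing between evaluations, and the two convergence histories are assumptions made for this example; the text only specifies that the convergence curve of 𝑓^(gap) is integrated.

```python
def anytime_performance(f_history, f_opt):
    """Area under the best-so-far gap curve (trapezoidal rule, unit spacing)."""
    gaps, best = [], float("inf")
    for f in f_history:
        best = min(best, f)                  # convergence curve is monotone
        gaps.append(best - f_opt)
    return sum((a + b) / 2.0 for a, b in zip(gaps, gaps[1:]))


# Made-up function values per evaluation for a baseline and a surrogate-assisted run.
baseline_history = [4.0, 3.0, 3.0, 2.0, 1.0]
assisted_history = [4.0, 1.0, 0.5, 0.5, 0.25]

f_any_baseline = anytime_performance(baseline_history, f_opt=0.0)
f_any_assisted = anytime_performance(assisted_history, f_opt=0.0)
relative = f_any_assisted / f_any_baseline   # below 1.0 means faster convergence
```

Dividing by the baseline value gives the kind of relative number used for the last axis of the parallel coordinate plots, where the baseline is 1.0 by construction.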
     In Figure 4.5 results of the hyper-parameter experiment for three exemplary optimization prob-
lems are shown in the form of a parallel coordinate plot [235]. The first three vertical lines represent
the parameters 𝛼, 𝛽, and 𝜌, and the last the performance metric 𝑓 (any) , relative to the baseline al-
gorithm CMA-ES. Thus, the baseline algorithm’s performance (blue) always ends up being 1.0,
and the resulting values indicate the proportional improvement/deterioration. Moreover, the best
performing parameter combination (red) and the second to tenth best (yellow) are highlighted.
(i) One can observe that adding surrogate bias has successfully improved the baseline algorithm’s
performance: PSAF achieved values less than one for almost all parameter combinations.
Moreover, for the most suitable hyper-parameter
values, PSAF showed a remarkable improvement by reducing the convergence integral to 30% of the baseline’s.
(ii) One might think that introducing a strong bias in the 𝛽-phase makes the 𝛼-phase irrelevant.
However, results indicate that it is beneficial to employ pre-filtering. The surrogate-influence in the
𝛼 phase is applied no matter how good the surrogate performs but only provides surrogate influence
to a specific extent. Nevertheless, more experiments need to be conducted to determine the most
suitable value for 𝛼 across all problems.
(iii) It becomes evident that for sphere and rosenbrock, rather large values of 𝛽 and 𝜌, and
thus a stronger surrogate bias, are a better choice. This is in contrast to bbob-f07-1, where less
surrogate involvement has turned out to be more effective. Moreover, even for a relatively simple
10-dimensional quadratic function, the surrogate may not be used 100% (𝜌 = 1.0) of the time.
Analysis has shown that this can be attributed to the limited number of initial points. Even for
problems with almost no complexity, a small initial design of experiments (here 20 points)
requires the baseline algorithm to do some exploration until the surrogate starts to recognize the
characteristics of the function and has a suitable accuracy.

Figure 4.5: Hyper-parameter Analysis for PSAF-CMA-ES with varying 𝛼, 𝛽 and 𝜌. Shown are
the baseline algorithm CMA-ES (blue), the 2nd to 10th best (orange), and the best (red). The
performance 𝑓^(any) is normalized with respect to the baseline algorithm.

(iv) Besides visualization, we have also performed a ranking-based analysis to find suitable
parameter combinations. Thus, we have averaged the ranking in percentage across all problems. For
instance, rank 30 out of 307 results in a value of ≈ 0.0977. The average percentage ranks with their
standard deviations are shown in Table 4.2. Results indicate that an 𝛼 value between 5 and 10, a 𝛽
value between 30 and 50, and a value of 𝜌 between 0.3 and 0.5 perform best.
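The percentage rank and its aggregation across problems amount to simple arithmetic, sketched below. Only the example of rank 30 out of 307 is taken from the text; the other ranks are made up, and the use of the population standard deviation is an assumption.

```python
import statistics


def normalized_rank(rank, n_configurations):
    """Rank of a parameter combination on one problem as a fraction of all combinations."""
    return rank / n_configurations


example = normalized_rank(30, 307)           # the example from the text

# Made-up ranks of one parameter combination on three problems.
ranks = [30, 12, 75]
norm = [normalized_rank(r, 307) for r in ranks]
mean_rank = statistics.mean(norm)
std_rank = statistics.pstdev(norm)
```

A smaller mean indicates a combination that ranks well on average, while the standard deviation shows how much its rank varies from problem to problem.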
                     Table 4.2: Rankings of best performing hyper-parameters.

                          𝛼    𝛽    𝜌     norm. rank (mean)   norm. rank (std)
                          10   40   0.4   0.2485              0.1556
                          5    40   0.5   0.2485              0.1949
                          10   50   0.3   0.2586              0.1664
                          10   30   0.3   0.2595              0.1914
                          5    40   0.3   0.2606              0.1728

4.4.2.2   Is it beneficial to update 𝜌 each iteration?
Our next study addresses the impact of updating 𝜌 in each iteration. The relatively small 𝜌 values
found to perform best might indicate that trusting the surrogate too much slows down the overall
convergence. Thus, the effect of updating 𝜌 in each iteration based on the surrogate’s prediction error
should be investigated (Algorithm 4.2 and Section 4.4.1.3). Considering the insights gained from
the hyper-parameter study, we define an upper bound for 𝜌, determining the maximum influence
of the 𝛽-phase to a reasonable value of 𝜌 (max) = 0.7. Moreover, we have used good-performing
parameter combinations from the previous hyper-parameter study. The experiment reveals that an
update of 𝜌 performs significantly better than the best parameter combination from before and is,
thus, recommended (see Table 4.3).
                                Table 4.3: Ranking with adaptive 𝜌.

                          𝛼    𝛽    𝜌          norm. rank (mean)   norm. rank (std)
                          10   40   adaptive   0.1375              0.1831
                          10   40   0.4        0.2485              0.1556

4.4.2.3   How does PSAF perform compared to other surrogate-based algorithms?
For the remainder of this study, we fix the hyper-parameters to a 𝛼 = 10, 𝛽 = 40 and perform a
dynamic update of 𝜌 with 𝜌 (max) = 0.7. So far, our experiments have been based on CMA-ES
to avoid an immense amount of runs for drawing conclusions about suitable hyper-parameters.
However, for a comparison with other methods we have applied PSAF to the following well-known
population-based algorithms besides CMA-ES [112]: DE [220], PSO [118] with adaptive 𝑐_1 and
𝑐_2 [236], and a standard genetic algorithm [1]. For all algorithms, the population size and number
of infill solutions (depending on the algorithm, also called particles or offspring) have been set to
10. We have used the standard implementations of the algorithms mentioned above available in
pymoo [29]. All other hyperparameters are set to each algorithm’s default settings.
    First, we would like to confirm that PSAF improves the convergence of the considered baseline
algorithms on various test problems. Figure 4.6 shows the convergence plots (averaged over 11
runs) of a variety of single-objective optimization problems. The PSAF variants are plotted using
solid lines and the baseline algorithms using dashed lines. The convergence curves demonstrate the
superiority of surrogate-assisted approaches across all test problems except for bbob-f04-1-10d.
We attribute the superiority of PSO on this problem to the problem complexity and the fact that no
algorithm can converge with the limited evaluation budget of 300.
    Second, the performance compared to other surrogate-based algorithms shall be demonstrated.
For comparison, we have chosen a standard EGO implementation from GPyOpt [237] (with 10
infill points in each iteration), a recently proposed method called 𝜖-shotgun [238] (with a batch
size of 10 and 𝜖 = 0.1), and lqCMA-ES [232]. First, one can observe that lqCMA-ES, based on a
quadratic model approximation, converges closer to the optimum if a solution near the optimum
is found. Nevertheless, for problems where this is not the case, PSAF variants show superior
performance. Extending PSAF to perform a local search using a quadratic model might show
similar convergence behavior near an optimum. Moreover, EGO and 𝜖-shotgun are outperformed
by almost all PSAF variants except for bbob-f05-1-10d, where a solution close to the optimum is
found right away. Comparing the algorithms of the PSAF framework with each other does not reveal
a clear winner. Whereas PSAF-PSO seems to perform well for most problems, PSAF-CMA-ES
converges faster for the problems with only two variables. Altogether, considering an algorithm
with a gap to the optimum of less than 10^−6 as converged, at least one PSAF variant was better 50%
(6/12) and equally good 33% (4/12) of the time. This can be considered a remarkable achievement
for a generalizable framework.
Figure 4.6: Comparison of the average performance of PSAF with the original algorithms and other
surrogate-based algorithms.
4.4.3    Summary of Section 4.4
In this section, we have proposed a framework of generalized probabilistic surrogate-assisted
algorithms. The idea is to improve the convergence of an existing algorithm by incorporating a
surrogate’s knowledge. The concept combines a surrogate’s influence, realized through a tournament-based
procedure with 𝛼 competitors, with a stronger surrogate bias, realized by replacing solutions with
probability 𝜌 by ones derived from continuing the optimization for 𝛽 iterations on the surrogate. Experiments with a
parametric study on 𝛼, 𝛽, and 𝜌 have shown that the proposed approach effectively improves the
convergence behavior on a variety of problems. While 𝛼 = 10 and 𝛽 = 40 worked the best overall,
we have presented an adaptive procedure of updating 𝜌 depending on the surrogate’s prediction
error inspired by the well-known 𝑅² metric. PSAF variants of CMA-ES, DE, GA, and PSO have
shown competitive performance compared to other surrogate-based algorithms. Applying PSAF to
other variable types to further demonstrate the approach’s capabilities will be part of future work.
Additionally, the effect of a local search to improve the convergence behavior near the optimum is
worth investigating. Other interesting future studies for PSAF are extensions to handle constraints
and multiple objectives. This will require a suitable baseline algorithm and a modification of the 𝜌
estimation based on more than one surrogate. Altogether, the proposed probabilistic surrogate-assisted
concept shall pave the way for new algorithms. PSAF makes it possible to exploit the benefits of existing
algorithms to solve computationally expensive problems efficiently using a surrogate. Thus, it offers
an alternative to the widely used fit-and-optimize method employed in EGO and other algorithms.
4.5    Generalized Probabilistic Surrogate-Assisted Framework (GPSAF)
Next, PSAF shall be extended to handle multiple objectives and constraints. The generalized
probabilistic surrogate-assisted framework (GPSAF) follows the same two-phase concept as PSAF.
However, the 𝛼-phase and the 𝛽-phase now have to consider multiple criteria when comparing
solutions.
4.5.1    Methodology
Before describing the responsibilities and modifications of each of the phases, the outline of the
algorithm is discussed (see Algorithm 4.3). Before any surrogate can be fit, a solution archive 𝐴
is initialized by a design of experiments whose points A.X are generated in a space-filling manner. A good
spread of solutions is recommended to allow surrogates to capture the overall fitness landscape as
accurately as possible. A.X is evaluated using exact solution evaluation (ESE) resulting in A.F
and A.G (Line 2). Then, while the number of evaluations is less than the maximum solution
evaluation budget ESE (max) , infill solutions P.X are generated by calling the non-deterministic
infill method of the baseline algorithm Φ. The default execution of algorithm Φ would immediately
evaluate P.X using ESE and directly feed the solutions back to the algorithm by executing
Φ.advance(P.X, P.F, P.G) (Line 35 and 36). However, instead of doing so, GPSAF modifies
P.X in a way to be influenced and biased by surrogates (Line 6 to 30) and advances the algorithm
at the end of the iteration (Line 35 to 36). After having estimated the surrogate error and fitted
the surrogates for objective and constraint functions, the 𝛼-phase adds surrogate influence to P.X
by replacing solutions with ones predicted to be better (Line 8 to 14). Thereafter, the 𝛽-phase runs
algorithm Φ for multiple generations (evaluations only on ASE) and assigns each solution to its
closest solution in P.X. For each of the resulting candidate solution pools U[j] assigned to P[j], a
probabilistic tournament determines the winning candidate (Line 15 to 30). Afterward, the replacement phase
takes place where either the solution originating from the 𝛼-phase P[j] is kept or replaced with
the winner V[j] of U[j] from the 𝛽-phase (Line 31 to 34). The solution set P.X is then evaluated, and
the algorithm Φ is advanced (Line 35 and 36). Finally, the prediction error is updated before starting
the next iteration, and the newly evaluated solutions are added to archive A.
4.5.1.1    Generalized 𝛼 and 𝛽-phase
For single-objective optimization, the 𝛼-phase has already been described (see Section 4.4.1.1).
There, the comparison of the two solutions is based only on one single objective value. For the more
generic version with constraints and objectives, the winner of each solution pool is determined as
follows: if all solutions are infeasible, select the least infeasible solution; otherwise, select a non-
dominated solution (break ties randomly). For both the constraint and objective values, only ASEs
are used. Otherwise, the 𝛼-phase remains the same, including its responsibilities and mechanics.
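The winner rule described above (least infeasible if nothing is feasible, otherwise a random non-dominated solution) can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function names, the convention that g_i ≤ 0 means a constraint is satisfied, and the restriction of the dominance check to feasible solutions are assumptions made here.

```python
import random


def constraint_violation(g):
    """Total violation, assuming g_i <= 0 means constraint i is satisfied."""
    return sum(max(0.0, gi) for gi in g)


def dominates(f_a, f_b):
    """Pareto dominance for minimization: no worse everywhere, better somewhere."""
    no_worse = all(a <= b for a, b in zip(f_a, f_b))
    better = any(a < b for a, b in zip(f_a, f_b))
    return no_worse and better


def select_winner(pool, rng=random):
    """Pick one solution from a pool of (F_hat, G_hat) surrogate predictions."""
    feasible = [s for s in pool if constraint_violation(s[1]) == 0.0]
    if not feasible:
        # All solutions infeasible: the least infeasible one wins.
        return min(pool, key=lambda s: constraint_violation(s[1]))
    # Otherwise: any non-dominated feasible solution, ties broken randomly
    # (assumption: dominance is checked among feasible solutions only).
    front = [s for s in feasible
             if not any(dominates(o[0], s[0]) for o in feasible)]
    return rng.choice(front)
```

Only predicted values (ASEs) enter the comparison, so each pool entry is a pair of predicted objective and constraint vectors.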
     Analogously to PSAF, GPSAF further increases the surrogate’s impact by looking 𝛽 iterations
into the future by repeatedly calling infill and advance of the baseline algorithm. To obtain
the 𝛽-solution for constrained multi-objective problems, we use a so-called probabilistic knockout
tournament (PKT) to select solutions from each cluster with the goal of self-adaptively exploiting
  Algorithm 4.3: GPSAF: Generalized Probabilistic Surrogate-Assisted Framework
    Input : Algorithm Φ, Surrogate Tournament Pressure 𝛼 (≥ 1), Number of Simulated Iterations 𝛽 (≥ 0), Replacement Probability
            Exponent 𝛾, Maximum Number of Solution Evaluations ESE (max)
    /* Sample Design of Experiments (DOE) */
  1 A ← ∅; P ← ∅; Q ← ∅; U ← ∅; e ← ∅
  2 A.X ← doe(); A.F, A.G ← evaluate(A.X)
  3 while size(A) < ESE (max) do
         /* Infill sols. from baseline algorithm */
  4      P.X ← Φ.infill()
         /* Estimate error - only initially */
  5      if e = ∅ then e ← estm_error(A.X, A.F, A.G);
         /* Surrogates for each obj. and constr. */
  6      S ← fit(A.X, A.F, A.G)
  7      P.F̂, P.Ĝ ← S.predict(P.X)
         /* Surrogate Influence (𝛼) */
  8      foreach k ← 2 to 𝛼 do
  9            Q.X ← Φ.infill()
 10            Q.F̂, Q.Ĝ ← S.predict(Q.X)
 11            foreach j ← 1 to size(Q) do
 12                  if not dominates(P[j], Q[j]) then P[j] = Q[j];
 13            end
 14      end
         /* Surrogate Bias (𝛽) */
 15      Φ′ ← copy(Φ)
 16      U ← ∅
 17      foreach k ← 1 to 𝛽 do
 18            Q.k ← k
 19            Q.X ← Φ′.infill()
 20            Q.F̂, Q.Ĝ ← S.predict(Q.X)
 21            foreach j ← 1 to size(Q) do
 22                  i ← closest(P.X, Q[j].X)
 23                  U[i] ← U[i] ∪ Q[j]
 24            end
 25            Φ′.advance(Q.X, Q.F̂, Q.Ĝ)
 26      end
 27      V ← list()
 28      foreach j ← 1 to size(U) do
 29            V ← V ∪ prob_knockout_tourn(U[j])
 30      end
         /* Replacement (𝛾) */
 31      foreach j ← 1 to size(P) do
 32            𝜌 ← repl_prob(U[j], U, 𝛾)
 33            if rand() < 𝜌 then P[j] ← V[j];
 34      end
         /* Evaluate on ESE */
 35      P.F, P.G ← evaluate(P.X)
         /* Prepare next iteration of GPSAF */
 36      Φ.advance(P.X, P.F, P.G)
 37      e ← update_error(P.F, P.G, P.F̂, P.Ĝ)
 38      A ← A ∪ P
 39 end
surrogates. The goal is to use surrogates more when they provide accurate predictions but use them
more carefully when they provide only rough estimations. Necessary for generalization, PKT also
applies to problems with multiple objectives and constraints, often with varying complexities and
surrogate errors to be considered.
     Generally, we define PKT as a subset selection of 𝑘 solutions from a set of solutions 𝐶 by
applying pairwise comparisons under noise as shown in Algorithm 4.4. Initially, the solution set 𝐶
to select from is shuffled to randomize the matches (Line 1). If the current number of participants
|𝐶 (𝑡) | is odd, a random solution is chosen to compete twice (Line 4). Each competition occurs
under noise, based on the current prediction error of the surrogates. The noise is added to each
objective and constraint independently before comparing the solutions. After adding the noise, the
comparison is identical to the subset selection explained in Section 4.5.1.1 (feasibility, dominance,
random tie break) with two competitors (𝛼 = 2). The winner of each round moves on to the next
and is added to 𝐶 (𝑡+1) (Line 8). If too many solutions have been eliminated, randomly choose some
losers from the last round (Line 13). This results in a set of solutions of size 𝑘 being returned as
tournament winners under noise. The design of PKT applies to the most general case of constrained
multi-objective optimization because the selection procedure can be reduced to a comparison of
two solutions.
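The tournament described above can be sketched in a few lines. The sketch is illustrative only: the thesis specifies that noise based on the prediction error is added before each comparison but not its distribution, so the Gaussian noise in `compare_noisy`, the single-objective stand-in comparison, the dictionary key `f_hat`, and the helper `unique` are assumptions introduced here.

```python
import random


def compare_noisy(a, b, e, rng=random):
    """Single-objective stand-in: perturb predictions with N(0, e) noise, smaller wins."""
    fa = a["f_hat"] + rng.gauss(0.0, e)
    fb = b["f_hat"] + rng.gauss(0.0, e)
    return a if fa <= fb else b


def unique(items):
    """Drop repeated objects (by identity) while keeping their order."""
    out = []
    for s in items:
        if all(s is not o for o in out):
            out.append(s)
    return out


def pkt(C, e, k, compare=compare_noisy, rng=random):
    """Select k winners from C by noisy pairwise knockout rounds."""
    if k < 1:
        raise ValueError("k must be at least 1")
    current = list(C)
    rng.shuffle(current)                 # randomize the matches
    previous = []
    while len(current) > k:
        if len(current) % 2 == 1:        # odd field: one solution competes twice
            current.append(rng.choice(current))
        previous = current
        winners = [compare(previous[i], previous[i + 1], e)
                   for i in range(0, len(previous), 2)]
        current = unique(winners)
    if len(current) < k:                 # too many eliminated: refill with losers
        losers = [s for s in unique(previous)
                  if all(s is not w for w in current)]
        current += rng.sample(losers, min(k - len(current), len(losers)))
    return current
```

With a prediction error of zero the tournament is deterministic and the best predicted solution always survives; with a growing error the outcome becomes increasingly random, which is how the surrogate is trusted less when it is inaccurate.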
     Returning to the cluster-wise selection in the 𝛽-phase, PKT is executed with 𝑘 = 1 to obtain a
winner for each solution set 𝑈𝑗 . An example with five iterations (𝛽 = 5) and four infill solutions
𝑋1 , 𝑋2 , 𝑋3 , and 𝑋4 is illustrated in Figure 4.4. Calling the infill and advance function of the baseline
algorithm results in five solution sets (𝛽1 to 𝛽5 ) with four solutions each. The advancement of
multiple iterations is based on ASEs. In each iteration, all solutions are directly assigned to the
closest 𝑋𝑖 solution from the 𝛼-phase forming the cluster 𝑈𝑖 . The cluster search pattern division is
essential to preserve diversity. For each cluster, a winner 𝑉𝑖 is declared by performing the PKT.
For instance, in this example, 𝑋1 has four solutions in 𝑈1 where one from the fourth iteration 𝛽4
is finally selected. At the end of the 𝛽-phase, each cluster 𝑈𝑖 has at most one winner 𝑉𝑖
(some clusters may stay empty because no solutions are assigned to them).
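The cluster assignment can be illustrated with a short sketch. The thesis only states that each solution is assigned to its closest 𝛼-phase solution; the Euclidean distance in the design space and the function name `assign_to_clusters` are assumptions made for this example.

```python
import math


def assign_to_clusters(P_X, generations):
    """Group surrogate-phase solutions by their nearest alpha-phase solution.

    P_X:         list of alpha-phase points (tuples of floats)
    generations: one list of points per simulated beta iteration
    Returns clusters[i], the list of (iteration, point) pairs assigned to P_X[i].
    """
    clusters = [[] for _ in P_X]
    for k, Q_X in enumerate(generations, start=1):
        for x in Q_X:
            # Index of the closest alpha-phase solution (Euclidean distance assumed).
            i = min(range(len(P_X)), key=lambda j: math.dist(P_X[j], x))
            clusters[i].append((k, x))
    return clusters
```

Clusters that attract no solution stay empty, which matches the remark that some 𝑈𝑖 may have no winner.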
   Algorithm 4.4: Probabilistic Knockout Tournament (PKT)
     Input : Solution Set 𝐶, Prediction errors 𝑒, Number of winners 𝑘
  1  𝐶^(1) ← shuffle(𝐶)
  2  𝑡 ← 1
  3  while |𝐶^(𝑡)| > 𝑘 do
  4       if |𝐶^(𝑡)| is odd then 𝐶^(𝑡) ← 𝐶^(𝑡) ∪ rselect(𝐶^(𝑡), 1);
  5       𝐶^(𝑡+1) ← ∅
  6       foreach 𝑖 ← 1 to |𝐶^(𝑡)|/2 do
  7             𝑤 ← compare_noisy(𝐶^(𝑡)_{2𝑖−1}, 𝐶^(𝑡)_{2𝑖}, 𝑒)
  8             𝐶^(𝑡+1) ← 𝐶^(𝑡+1) ∪ 𝑤
  9       end
 10       𝑡 ← 𝑡 + 1
 11  end
 12  if |𝐶^(𝑡)| < 𝑘 then
 13       𝐶^(𝑡) ← 𝐶^(𝑡) ∪ rselect(𝐶^(𝑡−1) \ 𝐶^(𝑡), 𝑘 − |𝐶^(𝑡)|)
 14  end
 15  return 𝐶^(𝑡)
4.5.1.2      Balancing the Exploration and Exploitation (𝛾)
PSAF has defined the probability 𝜌 for choosing between either keeping the 𝛼-solution or replacing
it with the 𝛽-solution based on the error of the surrogate model. However, now multiple surrogate
models exist, and for instance, the model for the first objective might have almost no prediction
error, but the one for the second objective might be highly inaccurate. Thus, a different logic has to be
implemented.
     GPSAF uses another particularly useful piece of information for this decision: the distribution
of assigned solutions across clusters. The search pattern derived from surrogates with a high-
density area indicates a region of interest. Thus, we propose to set the replacement probability to
                    \rho = \left( \frac{|U_j|}{\max_j |U_j|} \right)^{\gamma}.                        (4.3)
     The denominator max 𝑗 |𝑈 𝑗 | normalizes the number of points assigned to the current cluster,
|𝑈 𝑗 |, with respect to the largest cluster. The exponent 𝛾 can be used to control the importance of the
distribution and was kept constant at 𝛾 = 0.5. For the cluster with the highest density, the solution from the
𝛽-phase is always chosen because the numerator and denominator will be equal. This will always be the case for
baseline algorithms returning only one infill solution where stronger surrogate bias is generally
desirable. After the replacement, the solutions will finally be sent to the time-consuming solution
evaluation.
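As a small worked example of Equation 4.3, the replacement probability of each cluster can be computed from the cluster sizes alone. The function name and the handling of the case where every cluster is empty (returning zero, so that all 𝛼-solutions are kept) are assumptions made for this sketch.

```python
def replacement_probabilities(cluster_sizes, gamma=0.5):
    """rho_j = (|U_j| / max_j |U_j|) ** gamma for every cluster j."""
    largest = max(cluster_sizes)
    if largest == 0:
        # Assumption: with no assigned solutions, nothing can be replaced.
        return [0.0 for _ in cluster_sizes]
    return [(n / largest) ** gamma for n in cluster_sizes]


# Four clusters with 4, 1, 0, and 9 assigned solutions and gamma = 0.5:
rho = replacement_probabilities([4, 1, 0, 9])
```

For these sizes the densest cluster is replaced with certainty, the empty cluster is never replaced, and the other two receive the square roots of 4/9 and 1/9. A smaller 𝛾 pushes all non-empty clusters toward a probability of one, whereas a larger 𝛾 concentrates replacement on the densest clusters.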
4.5.1.3   Surrogate Management
Besides using surrogates in an algorithmic framework, more needs to be said about the models
themselves. First, one should note that only the predictions of data points need to be provided by
surrogates and no additional error estimation (the error estimates are kept track of by our method
directly). Not requiring an error estimation does not limit the models to a specific type, unlike other
surrogate-based algorithms. Second, each of the objective and constraint functions is modeled
independently, known as M1 in the surrogate usage taxonomy in [23]. Even though modeling all
functions increases the algorithmic overhead, it avoids the larger prediction errors caused by
aggregating the complexity of multiple functions into one model. Third, a generic framework for optimizing computationally
expensive functions requires a generic surrogate model implementation. Clearly, some model
types are more suitable for some problems than others. Thus, to provide a more robust framework,
each function is approximated with a set of surrogates, and the best one is chosen to be used. The
surrogate types in this section consist of the model types RBF [34] and Kriging [35], both initialized
with different hyper-parameters (normalization, regressions, kernel). A pre-normalization step
referred to as PLOG [226] is attempted and selected, if well-performing, for constraint functions.
Two metrics assess the performance of a model: first, the Kendall Tau Distance [239], which compares
the ranking of solutions and is therefore less sensitive to outliers with a large prediction error; second, the
Maximum Absolute Error (MAE) to break any ties. The value of MAE is also used as an error
approximation when noise is added to individuals. The error estimation in the first iteration is
based on k-fold cross-validation (𝑘 = 5) to get a rough estimate of how well a surrogate can capture
the function type. The performance metrics are updated in each iteration by fitting a surrogate
based on all solutions seen so far (training set) and assessing their error on the newly evaluated
solutions (test set). Finally, a moving average over five iterations, used to obtain a smoother and more
robust estimation, provides the data for selecting the best surrogate and estimating the prediction error for
each objective and constraint.
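The model selection described above can be sketched as follows. The sketch assumes a normalized Kendall Tau Distance (fraction of discordant pairs), a lexicographic comparison in which MAE only matters when the averaged distances are exactly equal, and hypothetical names (`SurrogateScore`, `best_surrogate`); none of these details are taken from the actual implementation.

```python
from collections import deque
from itertools import combinations


def kendall_tau_distance(y_true, y_pred):
    """Fraction of solution pairs that the prediction orders differently."""
    pairs = list(combinations(range(len(y_true)), 2))
    if not pairs:
        return 0.0
    discordant = sum(
        1 for i, j in pairs
        if (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j]) < 0
    )
    return discordant / len(pairs)


def max_abs_error(y_true, y_pred):
    """Largest absolute deviation between exact and predicted values."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


class SurrogateScore:
    """Moving average of both metrics over the last `window` iterations."""

    def __init__(self, window=5):
        self.tau = deque(maxlen=window)
        self.mae = deque(maxlen=window)

    def update(self, y_true, y_pred):
        self.tau.append(kendall_tau_distance(y_true, y_pred))
        self.mae.append(max_abs_error(y_true, y_pred))

    def averages(self):
        return (sum(self.tau) / len(self.tau), sum(self.mae) / len(self.mae))


def best_surrogate(scores):
    """Name of the model with the smallest (tau distance, MAE) pair."""
    return min(scores, key=lambda name: scores[name].averages())
```

In each iteration, `update` would be called with the exact values of the newly evaluated solutions and the predictions made before their evaluation, once per candidate model and per objective or constraint.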
4.5.2    Experimental Results
In this section, we present the performance of GPSAF applied to various population-based
algorithms solving unconstrained and constrained, single- and multi-objective optimization problems.
Proposing an optimization framework requires comparing a group of algorithms, which is not a
trivial task itself. Benchmarking is further complicated when non-deterministic algorithms are
compared, in which case not only a single but multiple runs need to be considered.
    For a fair comparison of optimization methods across test problems and to measure the impact
of GPSAF on a baseline algorithm, we use the following ranking-based procedure:
    i) Statistical Domination: After collecting the data for each test problem and algorithm (A ∈ Ω)
       from multiple runs, we perform a pairwise comparison of performance indicators (PI) between
       all algorithms using the Wilcoxon Rank Sum Test (𝛼 = 0.05). The null-hypothesis 𝐻0 is that
       no significant difference exists, whereas the alternative hypothesis is that the performance
       indicator of the first algorithm (PI(A)) is smaller than the one of the second (PI(B)). For
       single-objective optimization, the PI function consists of the gap to the optimum (if known)
       or the best function value found. For multi-objective optimization IGD [240] (if optimum is
       known) or Hypervolume [241] is used. The dominance function between two algorithms, A
       and B , is then defined by
                             𝜙(A, B) = RANKSUM(PI(B), PI(A), alt=’less’),                       (4.4)
       where the function 𝜙(A, B) returns zero if the null hypothesis is accepted or a one if it is
       rejected.
   ii) Number of Dominations: The performance 𝑃(A) of algorithm A is then determined by the
       number of methods that are dominating it:
                           P(A) = \sum_{B \in \Omega, \, A \neq B} \phi(B, A).                   (4.5)
       This results in a domination number 𝑃(A) for each method, which is zero if no other
       algorithm outperforms it.
  iii) Ranking: Finally, we sort the methods by their 𝑃(A). This may result in a partial ordering
       with multiple algorithms with the same 𝑃(A) values. In order to keep the overall sum of
       ranks equal, we assign their average ranks in case of ties. For instance, let us assume five
       optimization methods A, B, C, D, and E: algorithm A outperforms all others; between the
       performances of B, C, and D, no significant difference exists; E performs the worst. In this
       case, method A gets rank 1, the group of methods B, C, and D, rank (2 + 3 + 4)/3 = 9/3 = 3,
       and E ranks 5. Averaging the ranks for ties penalizes an optimization method for being
       dominated by the same number of algorithms as others and keeps the rank sum for each
       problem the same.
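Steps ii) and iii) of the procedure above can be sketched as follows. The dominance function is passed in as `phi(a, b)`, which is assumed to return 1 if algorithm a significantly outperforms algorithm b and 0 otherwise; in the example it is replaced by a hypothetical lookup that reproduces the five-method illustration instead of running an actual rank-sum test.

```python
def domination_counts(algorithms, phi):
    """P(A): number of other algorithms B with phi(B, A) = 1."""
    return {a: sum(phi(b, a) for b in algorithms if b != a) for a in algorithms}


def average_ranks(counts):
    """Rank by domination count; tied algorithms share the average rank."""
    ordered = sorted(counts, key=counts.get)
    ranks = {}
    pos = 0
    while pos < len(ordered):
        end = pos
        while end + 1 < len(ordered) and counts[ordered[end + 1]] == counts[ordered[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2   # mean of the 1-based positions pos+1 .. end+1
        for name in ordered[pos:end + 1]:
            ranks[name] = shared
        pos = end + 1
    return ranks


# Hypothetical outcome: A beats everyone, B/C/D are indistinguishable, E is worst.
tier = {"A": 0, "B": 1, "C": 1, "D": 1, "E": 2}
phi = lambda a, b: 1 if tier[a] < tier[b] else 0
ranks = average_ranks(domination_counts(list(tier), phi))
```

For this outcome the ranks are 1 for A, 3 for each of B, C, and D, and 5 for E, so the rank sum stays at 15 as for any strict ordering of five methods.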
    This conveniently provides a ranking for each test problem. To evaluate the performance of a
method on a test suite, we finally average the ranks across problems. If an algorithm fails to solve a
specific problem for all runs, it gets the maximum rank and becomes the worst performing algorithm.
Otherwise, all failing runs will be ignored (this has only rarely happened, and only for a competitor
algorithm). The ranks are used to compare the performances of methods in this manuscript;
the values of the performance indicators for the methods on all test problems can be found in the
Supplementary Document. Each algorithm has been executed 11 times on each test problem. If not
explicitly mentioned in the specific experiment, the total number of solution evaluations has been
set to ESE (max) = 300. For some simpler constrained problems, even fewer evaluations have been
used. A relatively limited evaluation budget also means that more complicated problems might
not be solved (near) optimally. However, comparing how well algorithms perform under such a budget
imitates the situation researchers face in practice. If a test problem is scalable in the number of
variables, the number of variables is set to 10. The results are presented in ranking tables where the overall
best performing algorithm(s) are highlighted with a gray cell background for each ranking-based
comparison for a test problem. The best-performing ones in a group are shown in bold.
     Moreover, some more details about our implementation should be mentioned. For the baseline
algorithms, we use implementations of population-based algorithms available in the well-known
multi-objective optimization framework pymoo¹ [29] developed in Python. For all methods, the
default parameters provided by the framework are kept unmodified, except the population size
(=20) and the number of offspring (=10) to create a more greedy implementation of the methods.
The surrogate implementation of Kriging is based on a Python clone² of DACEFit [55], originally
implemented in Matlab. The RBF models are a re-implementation based on [226]. The
hyper-parameters of GPSAF were determined through numerous empirical experiments during the
algorithm development. A reasonable and well-performing configuration given by 𝛼 = 30, 𝛽 = 5,
and 𝛾 = 0.5 is fixed throughout all experiments.
4.5.2.1      (Unconstrained) Single-Objective Optimization
The first experiment investigates the capabilities of GPSAF for improving the performance of
existing algorithms on unconstrained single-objective problems. We use the BBOB test problems (24
functions in total) available in the COCO platform [233], which is a widely used test suite with a
variety of more and less complex problems. Four well-known population-based optimization
methods, DE [107], GA [1], PSO [118], and CMAES [112], serve as baseline optimization algorithms,
and their GPSAF variants provide a surrogate-assisted version. The results are compared with four
other surrogate-assisted algorithms, SACOSO [222], SACC-EAM-II [242], SADESammon [243], and
SAMSO [244], available in the PlatEMO [245] framework. The rankings from the experiment are
shown in Table 4.4. First, one can note that GPSAF outperforms the other four existing surrogate-
assisted algorithms. One possible reason for the significant difference could be their development
for a different type of test suite (for instance, problems with a larger number of variables). In
    1 http://pymoo.org  (Version 0.5.0)
    2 https://pypi.org/project/pydacefit/
Table 4.4: A comparison of DE, GA, PSO, and CMAES with their GPSAF variants on unconstrained
single-objective problems with four other surrogate-assisted algorithms. The rank of the best
performing algorithm in each group is shown in bold. The overall best performing algorithm for
each problem is highlighted with a gray shade.
  Problem          DE    GPSAF-DE          GA    GPSAF-GA         PSO   GPSAF-PSO       CMAES GPSAF-CMAES      SACOSO SACC-EAM-II  SADESammon       SAMSO
  f01            11.0         4.0         7.5         2.5         5.5         1.0         5.5         2.5         7.5         9.5        12.0         9.5
  f02            10.0         4.0         4.0         1.0         4.0         2.0         7.5         6.0        12.0         7.5        10.0        10.0
  f03             8.5         3.0         5.0         1.5         7.0         1.5         8.5         5.0        10.0         5.0        11.5        11.5
  f04             8.5         4.0         4.0         2.0         4.0         1.0         6.5         6.5        10.0         8.5        11.0        12.0
  f05             5.5         1.0         7.5         3.0         5.5         2.0         7.5         4.0        11.0         9.5         9.5        12.0
  f06             7.0         7.0         2.0         2.0         4.5         4.5         9.0         2.0        10.0         7.0        11.0        12.0
  f07             9.5         5.5         5.5         2.0         5.5         2.0         5.5         2.0         9.5         8.0        11.0        12.0
  f08             8.0         6.0         8.0         2.0         5.0         2.0         4.0         2.0        10.0         8.0        11.0        12.0
  f09            10.0         3.5         8.0         3.5         3.5         3.5         7.0         3.5         3.5         9.0        11.5        11.5
  f10            10.5         5.5         5.5         1.5         5.5         1.5         5.5         5.5        10.5         5.5        10.5        10.5
  f11            10.5         3.5         7.0         1.5         7.0         3.5         7.0         7.0        10.5         1.5        12.0         7.0
  f12             2.5         6.5         6.5         6.5         2.5         2.5         2.5         6.5        10.0         9.0        11.0        12.0
  f13             7.5         5.0         7.5         2.5         5.0         1.0         5.0         2.5        10.0         9.0        11.0        12.0
  f14             9.5         5.0         7.5         2.0         5.0         2.0         5.0         2.0         9.5         7.5        11.0        12.0
  f15            10.0         3.5         6.5         1.5         6.5         1.5         6.5         3.5         9.0         6.5        11.0        12.0
  f16             9.0         6.0         6.0         2.0         9.0         2.0         4.0         2.0        11.5         6.0        11.5         9.0
  f17             9.5         5.0         7.0         3.0         7.0         3.0         3.0         1.0        11.0         7.0         9.5        12.0
  f18             9.5         4.0         6.0         2.0         6.0         2.0         6.0         2.0         9.5         8.0        11.0        12.0
  f19            11.0         5.0        10.0         1.0         5.0         5.0         8.5         5.0         5.0         2.0         8.5        12.0
  f20             9.5         2.5         7.0         2.5         2.5         2.5         5.5         5.5         9.5         8.0        11.5        11.5
  f21             8.5         7.0         3.5         3.5         3.5         3.5         3.5         3.5        10.0         8.5        11.5        11.5
  f22             9.5         6.0         6.0         2.5         6.0         2.5         2.5         2.5         9.5         8.0        11.0        12.0
  f23            10.0         5.5         5.5         1.5         5.5         5.5        11.0        12.0         9.0         1.5         5.5         5.5
  f24            10.0         2.0         8.0         2.0         4.0         2.0         8.0         5.5         8.0         5.5        11.0        12.0
  Total         8.958       4.583       6.292       2.292       5.188       2.479       6.021       4.146       9.417       6.896      10.667      11.062
this test suite, some problems are rather complicated and exploiting the surrogate too much will
cause the optimizer to be easily trapped in local optima. Also, we attribute the efficiency of
GPSAF to the significant effort for finding the most suitable surrogate. The order of relative rank
improvement is given by GA (6.292/2.292 = 2.7452), PSO (2.0927), DE (1.9546), and CMAES
(1.4522). Besides GPSAF-GA having the biggest relative rank improvement, it also is the overall
best performing algorithm in this experiment, closely followed by GPSAF-PSO. Altogether, a
significant and quite remarkable improvement is achieved by applying GPSAF for (unconstrained)
single-objective optimization.
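The relative rank improvements quoted above follow directly from the totals in Table 4.4, as the ratio of the baseline's average rank to that of its GPSAF variant. The short computation below reproduces them; the variable and function names are chosen here for illustration only.

```python
# Average ranks from the "Total" row of Table 4.4: (baseline, GPSAF variant).
totals = {
    "DE": (8.958, 4.583),
    "GA": (6.292, 2.292),
    "PSO": (5.188, 2.479),
    "CMAES": (6.021, 4.146),
}


def relative_rank_improvement(baseline_rank, gpsaf_rank):
    """Ratio above 1 means the GPSAF variant ranks better than its baseline."""
    return baseline_rank / gpsaf_rank


improvement = {name: relative_rank_improvement(*ranks) for name, ranks in totals.items()}
order = sorted(improvement, key=improvement.get, reverse=True)
```

Sorting by this ratio yields the order GA, PSO, DE, CMAES stated in the text. The ratio measures how much a baseline gains from surrogate assistance, not which variant ranks best overall.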
4.5.2.2    Constrained Single-Objective Optimization
Rarely are optimization problems unconstrained in practice. Thus, especially for surrogate-assisted
methods aiming to solve computationally expensive real-world problems, the capability of dealing
Table 4.5: A comparison of DE, GA, PSO, and ISRES with their GPSAF variants on constrained
single-objective problems with SACOBRA, the current state-of-the-art algorithm for constrained
optimization.
  Problem  ESE (max)          DE    GPSAF-DE          GA    GPSAF-GA         PSO   GPSAF-PSO GPSAF-ISRES     SACOBRA
  G1              75         5.5         3.5         7.5         3.5         7.5         5.5         1.0         2.0
  G2             300         3.5         7.0         1.0         3.5         6.0         3.5         8.0         3.5
  G4              75         6.5         3.5         6.5         5.0         8.0         3.5         1.5         1.5
  G6              75         7.0         4.0         7.0         4.0         7.0         4.0         1.5         1.5
  G7              75         7.0         4.0         7.0         4.0         7.0         4.0         2.0         1.0
  G8             100         7.0         4.5         7.0         4.5         7.0         3.0         1.0         2.0
  G9             300         7.0         4.5         7.0         2.5         4.5         2.5         7.0         1.0
  G10            300         8.0         3.5         6.5         3.5         6.5         3.5         3.5         1.0
  G11            300         7.0         2.5         5.5         2.5         5.5         2.5         2.5         8.0
  G12            300         6.0         4.5         8.0         4.5         7.0         3.0         1.5         1.5
  G16            300         5.5         2.5         5.5         8.0         7.0         2.5         2.5         2.5
  G18            300         7.0         3.5         7.0         3.5         7.0         3.5         3.5         1.0
  G19            300         7.5         2.0         6.0         2.0         7.5         2.0         5.0         4.0
  G24            300         7.5         4.5         7.5         4.5         6.0         2.0         2.0         2.0
  Total                    6.571       3.857       6.357       3.964       6.679       3.214       3.036       2.321
with constraints is essential. The so-called G-problems or G-function benchmark [246, 247] was
proposed to develop optimization algorithms dealing with different kinds of constraints regarding
the type (equality and inequality), number, and complexity, resulting in feasible and infeasible regions
of the search space. The original 13 test functions were extended in a CEC competition in 2006 [248] to 24
constrained single-objective test problems [249]. In this study, G-problems with only inequality
constraints (and no equality constraints) are used. Besides the GPSAF variants of DE, GA, and PSO,
GPSAF is applied to the improved stochastic ranking evolutionary strategy (ISRES) [250]. ISRES
implements an improved mating strategy using differentials between solutions in contrast to its
predecessor SRES [251]. ISRES follows the well-known 1/7 rule, which means with a population
size of 𝜇 individuals 𝜆 = 7 · 𝜇 offspring are created. For this study, GPSAF creates a steady-state
variant of ISRES by using the proposed probabilistic knockout tournament to choose one out
of the 𝜆 solutions. This ensures a fair comparison with SACOBRA [226], which also evaluates
one solution per iteration. To the best of our knowledge, SACOBRA implemented in R [252]
is currently the best-performing algorithm on the G problem suite. The constrained single-
objective results are presented in Table 4.5. First, it is apparent that the GPSAF variants improve
the baseline algorithms. Only for G2 does the genetic algorithm outperform its own and the other
surrogate-assisted variants, which we attribute to the very restricted feasible search space (this
has also been shown to be a difficult problem for surrogate-assisted algorithms in [226]). Second,
GPSAF-ISRES shows the best results out of all GPSAF variants. This indicates that it is beneficial
if the baseline method has been proposed with a specific problem class in mind. Even though DE, GA,
and PSO can handle constraints (for instance, naively using the parameter-less approach), they are
known not to perform particularly well on complex constrained functions without modifications.
In contrast, ISRES was tested on the G-problems in the original study and proved to be
effective. Adding surrogate assistance has further improved its results. Third,
GPSAF-ISRES performs competitively with the state-of-the-art algorithm SACOBRA. In this
experiment, out of all 14 test problems, GPSAF variants outperformed SACOBRA four
times and a baseline algorithm (GA) did so once; five times the performance of at least one GPSAF
variant was similar; and four times SACOBRA showed significantly better results. Altogether, one
can say GPSAF has created surrogate-assisted methods competing with the state-of-the-art method
for constrained single-objective problems.
4.5.2.3   (Unconstrained) Multi-Objective Optimization
Many applications have not one but multiple conflicting objectives to optimize. For this reason,
this experiment focuses specifically on multi-objective optimization problems. As a test suite, we
choose ZDT [253], a well-known test suite proposed when multi-objective optimization was gaining
popularity. Throughout this experiment, we set the number of variables to 10, except for the highly
multi-modal problem ZDT4, where the number of variables is limited to 5. The WFG [254] test
suite provides even more flexibility by being scalable with respect to the number of objectives. Here,
we set the number of objectives to two to create another bi-objective test suite. Moreover, the
number of variables has been set to 10, four of which are positional. NSGA-II [10], SMS-EMOA [255],
and SPEA2 [256] serve as baseline algorithms. The results
are compared with four other surrogate-assisted algorithms: AB-SAEA [137], K-RVEA [225],


Table 4.6: A comparison of NSGA-II, SMS-EMOA, and SPEA2 with their GPSAF variants with
four surrogate-assisted algorithms on bi-objective optimization problems.
      Problem  NSGA-II  GPSAF-NSGA-II  SMS-EMOA  GPSAF-SMS-EMOA  SPEA2  GPSAF-SPEA2  AB-SAEA  K-RVEA  ParEGO  CSEA
      ZDT1       9.0      2.5     9.0     2.5     9.0     2.5    6.0       5.0     2.5     7.0
      ZDT2       9.0      1.5     9.0    6.0      9.0    5.0     3.0       4.0     1.5     7.0
      ZDT3       9.0      4.5     9.0    4.5      9.0    4.5     4.5       1.0     2.0     7.0
      ZDT4       6.5      2.0     6.5     2.0     6.5     2.0    9.0       6.5    10.0     4.0
      ZDT6       7.5      3.0     9.5    6.0      9.5    3.0     5.0       3.0     1.0     7.5
      WFG1       9.0      6.0     9.0    6.0      9.0    6.0     4.0       2.0     2.0     2.0
      WFG2       7.0      1.5     8.0    9.0      4.5    4.5     4.5       1.5    10.0     4.5
      WFG3       8.0      3.5    10.0    3.5      8.0     1.5    5.5       1.5     5.5     8.0
      WFG4       6.5      1.5     9.0    5.0      6.5     1.5    3.5       3.5     9.0     9.0
      WFG5       8.0      2.5     8.0    4.0     10.0    5.0     6.0       2.5     1.0     8.0
      WFG6       9.0      2.0    10.0    6.5      4.5     2.0     2.0      4.5    6.5     8.0
      WFG7       5.5      4.0     8.5     2.0     8.5     2.0    5.5       7.0     2.0    10.0
      WFG8       7.5      2.5    10.0    5.0      9.0     2.5     2.5      6.0     2.5     7.5
      WFG9       7.5      3.0     7.5     3.0     6.0     3.0     3.0     10.0     9.0     3.0
      Total     7.786    2.857   8.786  4.643   7.786   3.214   4.571     4.143  4.607   6.607
Table 4.7: A comparison of NSGA-III, SMS-EMOA, and SPEA2 with their GPSAF variants with
four surrogate-assisted algorithms on three-objective optimization problems.
     Problem  NSGA-III  GPSAF-NSGA-III  SMS-EMOA  GPSAF-SMS-EMOA  SPEA2  GPSAF-SPEA2  AB-SAEA  K-RVEA  ParEGO  CSEA
     DTLZ1       1.5      3.5      1.5    6.5      5.0    8.0     10.0      9.0     6.5     3.5
     DTLZ2       8.5       2.0     8.5     2.0     8.5     2.0    6.0       4.0     5.0     8.5
     DTLZ3       2.0      5.5      2.0     2.0     5.5    8.0     10.0      9.0     5.5     5.5
     DTLZ4       6.5      6.5      9.0    6.5      6.5    3.5     2.0       1.0    10.0     3.5
     DTLZ5       6.0      3.0      9.0     1.0     9.0    2.0     6.0       6.0     4.0     9.0
     DTLZ6       7.5      5.5      7.5    3.5      9.0    5.5      1.5      1.5    10.0     3.5
     DTLZ7       8.0      3.5      8.0    3.5      8.0    3.5     3.5       1.0    10.0     6.0
     Total     5.714     4.214     6.5   3.571   7.357   4.643   5.571      4.5   7.286   5.643
ParEGO [126], and CSEA [64], all available in PlatEMO [245].
    The results on the two multi-objective test suites are shown in Table 4.6. First, one can note
that all surrogate-assisted algorithms outperform the ones without. This indicates that surrogate
assistance effectively improves the convergence behavior. Second, GPSAF-NSGA-II performs
best with an average rank of 2.857, followed by GPSAF-SPEA2 and K-RVEA. It is worth noting that
ParEGO is penalized by being terminated for ZDT4 and WFG2, where the surrogate model could not
be built.
    To examine the behavior on three-objective optimization problems, we have replaced NSGA-II
with NSGA-III and run all algorithms on the DTLZ [257] test suite. The results
are shown in Table 4.7. Whereas for most problems, the GPSAF variants outperform the baseline
algorithms, for DTLZ1 and DTLZ3, this is not the case. Both problems consist of multi-modal
convergence functions, which cause a large amount of surrogate error. Thus, surrogate-assisted
algorithms (including the four GPSAF is compared with) are misguided. This is an important
observation that deserves more detailed investigation in the future. Nevertheless, GPSAF improves
the performance of baseline algorithms for the other problems. GPSAF-SMS-EMOA shows overall
the best results in this experiment with an average rank of 3.571, followed by GPSAF-NSGA-III.
4.5.2.4   Constrained Multi-Objective Optimization
Lastly, we evaluate GPSAF on constrained multi-objective optimization problems, which often
occur in real-world optimization. The challenge of dealing with multiple objectives and constraints
in combination with computationally expensive solution evaluations truly mimics the complexity
of industrial optimization problems. We have compared our results with HSMEA [136], a recently
proposed algorithm for constrained multi-objective optimization. In consultation with the authors,
some minor modifications of the publicly available source code had to be made for dealing with
computationally expensive constraints, as this is an assumption made in this study. The results on
CDTLZ [258], BNH [259], SRN [260], TNK [261], and OSY [262] are shown in Table 4.8. Again,
one can observe that the GPSAF variants consistently improve the performance of the baseline
optimization methods. The only exception is C1-DTLZ1, where no method could find a feasible
solution, and thus, an equal rank is assigned. We attribute this to the complexity of the test
problem given by the constraint violation and the multi-modality of the objective functions. For
OSY and TNK, the GPSAF variants show a significantly better performance than HSMEA; for
C3-DTLZ4, the performance is similar; and for C2-DTLZ2, BNH, and SRN, HSMEA performs better.
Altogether, GPSAF-NSGA-II can obtain a better rank than HSMEA, but it is fair to say that for
three out of the seven constrained multi-objective optimization problems, HSMEA is the winner.
Nevertheless, GPSAF improved the performance of baseline algorithms and showed results
competitive with another surrogate-assisted optimization method.


Table 4.8: A comparison of NSGA-II, SMS-EMOA, and SPEA2 and their GPSAF variants with
HSMEA on constrained multi-objective optimization problems.
       Problem     ESE (max)  NSGA-II  GPSAF-NSGA-II  SMS-EMOA  GPSAF-SMS-EMOA  SPEA2  GPSAF-SPEA2  HSMEA
       C1-DTLZ1      300        4.0        4.0       4.0      4.0       4.0       4.0      4.0
       C2-DTLZ2      300       4.5         4.5      4.5       4.5      4.5       4.5       1.0
       C3-DTLZ4      300       5.0         1.5      5.0       5.0      5.0       5.0       1.5
       BNH           100       5.5         2.5      7.0       4.0      5.5       2.5       1.0
       SRN           100       6.0         3.0      6.0       3.0      6.0       3.0       1.0
       TNK           100       6.0         2.0      6.0       2.0      6.0        2.0      4.0
       OSY           300       5.0         1.5      5.0       3.0      5.0        1.5      7.0
      Total                   5.143       2.714    5.357     3.643    5.143     3.214     2.786
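The tied ranks in Table 4.8 (for instance, 4.0 for all seven methods on C1-DTLZ1) are consistent with fractional ranking, where tied methods share the mean of the positions they occupy. The following is a minimal sketch under that assumption. The function name `fractional_ranks`, the tolerance `tol`, and the example scores are hypothetical, and how ties are declared in the study (for example, through a statistical test) is not reproduced here.

```python
def fractional_ranks(scores, tol=0.0):
    """Rank scores (lower is better); tied entries share the mean of their positions.

    `tol` is a hypothetical tolerance: scores within `tol` of the first score of a
    group are treated as tied with it.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next score is within `tol` of the group's first score.
        while end + 1 < len(order) and scores[order[end + 1]] - scores[order[start]] <= tol:
            end += 1
        # Positions are 1-based, so the group occupies start+1 .. end+1.
        mean_rank = ((start + 1) + (end + 1)) / 2.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Seven methods with identical (hypothetical) scores all receive rank 4.0.
all_tied = fractional_ranks([5.0] * 7)

# Hypothetical scores producing the rank pattern 6, 3, 6, 3, 6, 3, 1.
mixed = fractional_ranks([3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 0.5])
```

With seven methods, a complete tie occupies positions 1 to 7, whose mean is 4, which matches the C1-DTLZ1 row.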
4.5.3    Summary of Section 4.5
This section has proposed a generalized probabilistic surrogate-assisted framework applicable to
any type of population-based algorithm. GPSAF incorporates two different phases to provide
surrogate assistance, one using the current state of the baseline algorithm and the
other looking multiple iterations into the future. In contrast to other existing surrogate-assisted
algorithms, the surrogate search is not reduced to the final solutions on the surrogate, but the whole
search pattern is utilized. Solutions are selected using a probabilistic tournament that considers
surrogate prediction errors for objectives and constraints from the search pattern. GPSAF has
been applied to multiple well-known population-based algorithms proposed for unconstrained and
constrained single and multi-objective optimization. We have provided comprehensive results on
test problem suites indicating that GPSAF competes with and, on a number of problems, outperforms
existing surrogate-assisted methods. The combination of well-performing surrogate-assisted
algorithms with the simplicity and broad applicability of GPSAF is very promising.
     The encouraging results provide scope for further exploring generalized surrogate-assisted
algorithms. One main challenge of a generalized approach is the recommendation of hyper-parameter
configurations (𝛼, 𝛽, 𝜌, or 𝛾). The parameters have been set through empirical experiments;
however, because of the broad applicability, the different mechanisms of baseline algorithms on very
different optimization problems make it difficult to draw generally valid conclusions. A more systematic and
possibly resource-intensive study would show how different hyper-parameter settings
impact the performance of different algorithms. In addition, experiments investigating the
sensitivity to these parameters would be of particular interest.
    The focus of this study was to explore different types of problems with multiple objectives and
constraints. Thus, the number of variables was kept relatively small, as is often the case for
computationally expensive problems. Even though the search space dimension does not
directly affect the idea proposed in this section, how surrogate assistance performs on
large-scale problems should be part of a future study. Moreover, the number of solution evaluations
per run has been limited to at most 300, which allows using all solutions for modeling without a
large modeling overhead. However, more solution evaluations might be feasible for moderately
expensive optimization problems.
    Nevertheless, this extensive explorative study on the use of surrogates in single and multi-
objective optimization with and without constraints has indicated a viable new direction in
congruence with existing emerging studies for a generic optimization methodology.
4.6   Summary of the Chapter
In this chapter, we have proposed a framework of generalized probabilistic surrogate-assisted
algorithms. The idea is based on improving the convergence of an existing algorithm by incorporating
a surrogate’s knowledge. First, we have proposed PSAF for computationally expensive
single-objective problems, where a surrogate influences the solutions to be evaluated through a
tournament-based procedure with 𝛼 competitors. An even stronger surrogate bias is introduced by
using solutions with probability 𝜌 derived from continuing the optimization for 𝛽 iterations on the
surrogate. Experiments with a parametric study on 𝛼, 𝛽, and 𝜌 have shown that the proposed approach
effectively improves the convergence behavior on various problems. Second, we have extended
PSAF to a more general framework, GPSAF, which can handle multiple objectives and constraints.
We have proposed comparing two solutions by using a constrained Pareto-dominance relation under
uncertainty by introducing some error noise to the objectives and constraints. The selection of a
single solution from a set is defined based on the pairwise comparisons in a knockout tournament.
Both tasks are incorporated into a generalized framework to use surrogates as assistance during
convergence. Results indicate that GPSAF, applied to eight different unconstrained and constrained
as well as single- and multi-objective algorithms, shows competitive performance. Altogether, the
proposed probabilistic surrogate-assisted concepts shall pave the way for new algorithms. PSAF
and GPSAF allow using existing algorithms’ benefits to solve computationally expensive problems
efficiently using surrogates. Thus, this could be an alternative to the widely used fit-and-optimize
method used in EGO and other algorithms.
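The pairwise comparison under uncertainty and the knockout tournament summarized above can be illustrated as follows. This is a minimal sketch and not the GPSAF implementation. It assumes that the prediction error of each objective and constraint is modeled as zero-mean Gaussian noise with a given standard deviation, that constraint violations are summed, and that incomparable pairs are resolved by a coin flip. All function names and the dictionary layout (`"F"`, `"G"`) are hypothetical.

```python
import random


def constrained_dominates(f_a, g_a, f_b, g_b):
    """True if a constrained-dominates b (minimization, g <= 0 means satisfied)."""
    cv_a = sum(max(0.0, g) for g in g_a)
    cv_b = sum(max(0.0, g) for g in g_b)
    if cv_a > 0.0 or cv_b > 0.0:
        # At least one solution is infeasible: the smaller violation wins.
        return cv_a < cv_b
    return all(x <= y for x, y in zip(f_a, f_b)) and any(x < y for x, y in zip(f_a, f_b))


def noisy(values, errors, rng):
    # Assumption: surrogate error is zero-mean Gaussian with std equal to the error estimate.
    return [v + rng.gauss(0.0, e) for v, e in zip(values, errors)]


def compare(a, b, f_err, g_err, rng):
    """Return the winner of one comparison on noise-perturbed surrogate predictions."""
    fa, ga = noisy(a["F"], f_err, rng), noisy(a["G"], g_err, rng)
    fb, gb = noisy(b["F"], f_err, rng), noisy(b["G"], g_err, rng)
    if constrained_dominates(fa, ga, fb, gb):
        return a
    if constrained_dominates(fb, gb, fa, ga):
        return b
    return rng.choice([a, b])  # incomparable: pick one at random (assumption)


def knockout_tournament(candidates, f_err, g_err, rng):
    """Select one candidate through successive pairwise knockout rounds."""
    pool = list(candidates)
    rng.shuffle(pool)
    while len(pool) > 1:
        winners = [compare(pool[i], pool[i + 1], f_err, g_err, rng)
                   for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2 == 1:
            winners.append(pool[-1])  # odd pool size: the last candidate gets a bye
        pool = winners
    return pool[0]


# Hypothetical surrogate predictions for three candidates (one objective, one constraint).
cands = [{"F": [1.0], "G": [-1.0]}, {"F": [2.0], "G": [-1.0]}, {"F": [0.5], "G": [1.0]}]
winner = knockout_tournament(cands, f_err=[0.1], g_err=[0.1], rng=random.Random(0))
```

With larger error estimates the outcome of each comparison becomes more random, so the surrogate's influence on the selection weakens as its predictions become less trustworthy.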


                                              CHAPTER 5
          HETEROGENEOUS EXPENSIVE OBJECTIVES AND CONSTRAINTS
This chapter focuses on efficiently optimizing problems with heterogeneously expensive objectives
and constraints. First, some examples demonstrate why this is an essential topic for optimization and
why it is worth exploiting varying times during evaluation. Second, different evaluation procedures
in an algorithm regarding the job granularity and their scheduling are discussed. Then, a more
specific but frequently occurring case of computationally expensive objectives with inexpensive
constraints is investigated. Lastly, any kind of heterogeneity of objectives and constraints is
considered, which requires handling partial information during evaluation and optimization. The
majority of this chapter is based on the article published in [26], except Section 5.4 which is based
on [25].
5.1   Introduction
Many real-world optimization problems require the consideration of multiple conflicting objectives
to reflect the complexity of the application [9]. Additionally, constraints must be satisfied
to guarantee that a solution found is indeed feasible [263]. Let us quickly review the definition
of an optimization problem
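\begin{equation}
\begin{aligned}
\text{Minimize}   \quad & f_m(\mathbf{x}),                   && \forall m \in (1, \ldots, M), \\
\text{subject to} \quad & g_j(\mathbf{x}) \leq 0,            && \forall j \in (1, \ldots, J), \\
                        & x_d^{(L)} \leq x_d \leq x_d^{(U)}, && \forall d \in (1, \ldots, D),
\end{aligned}
\tag{5.1}
\end{equation}
where $\mathbf{x}$ are the variables to optimize, $x_d^{(L)}$ and $x_d^{(U)}$ are the lower and upper bounds for the
$d$-th variable, $f_m$ is the $m$-th objective, and $g_j$ the $j$-th constraint function. For brevity, no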
equality constraints are considered here. The above mathematical description makes it apparent
that assessing the performance of one design (solution) x requires evaluating a set of functions: 𝑀
objective and 𝐽 constraint functions. This results in evaluating a total of 𝑀 + 𝐽 functions, from here
on, referred to as target functions.
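To make the notion of target functions concrete, the problem definition in (5.1) can be sketched as a small container holding 𝑀 objectives, 𝐽 constraints, and the variable bounds. The class `Problem`, its method names, and the toy instance are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Target = Callable[[Sequence[float]], float]


@dataclass
class Problem:
    objectives: List[Target]   # f_1, ..., f_M (to be minimized)
    constraints: List[Target]  # g_1, ..., g_J (satisfied if g_j(x) <= 0)
    lower: List[float]         # lower bounds x_d^(L)
    upper: List[float]         # upper bounds x_d^(U)

    @property
    def n_targets(self) -> int:
        """Total number of target functions, M + J."""
        return len(self.objectives) + len(self.constraints)

    def in_bounds(self, x: Sequence[float]) -> bool:
        return all(lo <= v <= hi for lo, v, hi in zip(self.lower, x, self.upper))

    def evaluate(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        """Evaluate all M + J target functions for one design x."""
        F = [f(x) for f in self.objectives]
        G = [g(x) for g in self.constraints]
        return F, G


# Hypothetical toy instance with M = 2 objectives, J = 1 constraint, and D = 2 variables.
toy = Problem(
    objectives=[lambda x: x[0] ** 2 + x[1] ** 2,
                lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2],
    constraints=[lambda x: x[0] + x[1] - 1.0],
    lower=[0.0, 0.0],
    upper=[1.0, 1.0],
)
F, G = toy.evaluate([0.5, 0.0])
```

Here `evaluate` calls all target functions in one go, which corresponds to the common assumption that all values become available together.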
    Because of the multi-disciplinary nature of real-world problems, some of the target functions (a
functional group) may require calling a single third-party software package, running a simulation
[21, 22], or performing other compute-intensive tasks [67, 153]. Depending on the software being
used, the performance assessment might become fairly time-consuming, taking, for instance, a couple
of hours or even days [28]. A typical practical optimization problem may have two to five such
functional groups, which must be independently called and evaluated. In addition, there may be
certain simple target functions, which are mathematically defined and are much quicker to compute
than the software or simulation codes.
    When optimizing computationally expensive functions, special attention needs to be paid to
the limited solution evaluation budget. Over the last decades, researchers have predominantly
used surrogate-assisted optimization methods, where an interpolation or approximation model
of the computationally expensive target function is utilized during optimization [264]. Most
existing surrogate-assisted algorithms assume that the values of all target functions (objectives and
constraints) are evaluated within one computing job and become available only at the end of the
most expensive target evaluation.
    A point-based optimization method requires a single new solution to be evaluated at a time to
complete an iteration. Thus, the possibilities of exploiting the heterogeneity in the evaluation time
of different target functions are limited. However, for a population-based optimization algorithm,
such as a generational evolutionary computation (EC) algorithm, a set of offspring population
members must be evaluated to proceed to the next generation. If
some target functions are relatively quick to compute in contrast to others, the partial evaluation
of population members can be utilized to determine whether a population member needs to go through
the more expensive target function evaluations. Thus, there is a need for building an evaluation
plan that handles heterogeneous target functions to save computational time, specifically in a
population-based optimization algorithm. This is the main crux of this chapter.
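The idea of using a partial evaluation to decide whether the expensive evaluation is worthwhile can be sketched as follows. The screening rule used here, skipping the expensive objectives whenever a cheap constraint is violated, is an assumption chosen for illustration and not the method proposed in this chapter. The function name and the toy targets are hypothetical.

```python
def evaluate_with_screening(population, cheap_constraints, expensive_objectives):
    """Evaluate cheap constraints for every member; run expensive objectives only if all pass."""
    results = []
    n_expensive_calls = 0
    for x in population:
        G = [g(x) for g in cheap_constraints]
        feasible = all(v <= 0.0 for v in G)
        F = None  # stays None when the expensive evaluation is skipped
        if feasible:
            F = [f(x) for f in expensive_objectives]
            n_expensive_calls += len(expensive_objectives)
        results.append({"x": x, "G": G, "F": F})
    return results, n_expensive_calls


# Hypothetical targets: one cheap constraint x0 <= 1 and one "expensive" objective x0^2.
population = [[0.5], [2.0]]
results, n_calls = evaluate_with_screening(
    population,
    cheap_constraints=[lambda x: x[0] - 1.0],
    expensive_objectives=[lambda x: x[0] ** 2],
)
```

In this toy run the second member violates the cheap constraint, so only one expensive call is made instead of two.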
    The majority of the surrogate-assisted optimization methods create new infill solutions based
on optimization of the surrogate models. This is useful in its own right for finding
good solutions quickly. However, since infill solutions are to be evaluated
fully for all target functions before they can be used to update surrogate models or proceed with
the algorithm, the ideas must be modified for heterogeneous target functions to take full
advantage of their quick partial evaluation. Because neither independently computable nor
heterogeneously expensive functions have been a major focus of research in the past, barring a few
studies [25, 140, 141], researchers and practitioners continue to use existing optimization methods
by letting the optimizer wait until the calculation of all targets has finished. In such a case, the
most time-consuming target function determines the waiting time for a solution to be evaluated
entirely [140]. The waiting is caused by the optimization method not being capable of processing
partial information and results in unused time, which slows down the convergence. Other challenges,
such as managing computational resources, software licenses, or dealing with hardware or software
failures, must also be addressed when optimizing real-world problems.
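The effect of waiting for the slowest target can be made explicit with a small calculation. The sketch below assumes that all targets of one solution are started at the same time on separate resources. The function name and the evaluation times are hypothetical.

```python
def waiting_analysis(target_times):
    """Return the time until a solution is fully evaluated and the unused time per target.

    `target_times` maps a target name to its evaluation time in seconds. Targets are
    assumed to run in parallel, so the slowest one determines the completion time.
    """
    finish = max(target_times.values())
    idle = {name: finish - t for name, t in target_times.items()}
    return finish, idle


# Hypothetical times: a one-hour simulation, a one-minute solver call, a one-second formula.
times = {"f1": 3600.0, "f2": 60.0, "g1": 1.0}
finish, idle = waiting_analysis(times)
```

In this example the values of `f2` and `g1` are available almost immediately, yet an optimizer that waits cannot use them for nearly an hour.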
5.2   Related Work
Some effort has been made in the past to investigate the heterogeneous expensiveness of target
functions. Mostly, bi-objective problems with no constraints have been considered so far. This
implies one objective being computationally inexpensive (cheap) and one being expensive. Since
only two target functions are considered, authors also refer to the difference as a delay in evaluating
the objective functions.
    The first paper directly addressing heterogeneously expensive objectives was published by
Allmendinger et al. [140] in 2013. The authors have proposed three different ways of dealing with
a missing objective value caused by such a delay. First, the missing objective value can be filled
by randomly drawing pseudo values within the boundaries of the objective space (random). Second,
some Gaussian noise is added to the corresponding objective value of a randomly chosen individual
(being evaluated on all objectives) and assigned (noise-based). Third, the missing value is replaced
by the objective value of the nearest neighbor in the design space that has been evaluated on all
objectives, which can be interpreted as fitness approximation (fitness inheritance). Moreover, for
the evaluation selection, the authors have proposed to always select the most recently generated
offspring (sweep selection) or to select them based on a priority score obtained by full or partial
non-dominated rank (priority selection). The results indicate that the fitness-inheritance-based
pseudo value assignment combined with the sweep selection performs the best.
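The three pseudo-value assignments described above can be sketched as follows. This is an illustration based only on the description given here and not the implementation from [140]. The uniform distribution for the random strategy, the Euclidean distance for the nearest neighbor, the noise level `sigma`, and all names (`f_slow`, `fill_random`, and so on) are assumptions.

```python
import math
import random


def fill_random(lower, upper, rng):
    """Random: draw a pseudo value within the bounds of the delayed objective."""
    return rng.uniform(lower, upper)


def fill_noise_based(evaluated, sigma, rng):
    """Noise-based: value of a randomly chosen fully evaluated individual plus Gaussian noise."""
    donor = rng.choice(evaluated)
    return donor["f_slow"] + rng.gauss(0.0, sigma)


def fill_inheritance(x, evaluated):
    """Fitness inheritance: copy the value of the nearest fully evaluated neighbor in design space."""
    nearest = min(evaluated, key=lambda ind: math.dist(x, ind["x"]))
    return nearest["f_slow"]


# Hypothetical archive of individuals that have been evaluated on all objectives.
archive = [{"x": [0.0, 0.0], "f_slow": 1.0}, {"x": [1.0, 1.0], "f_slow": 5.0}]
rng = random.Random(1)
new_x = [0.9, 0.8]
pseudo_values = {
    "random": fill_random(0.0, 10.0, rng),
    "noise-based": fill_noise_based(archive, 0.5, rng),
    "inheritance": fill_inheritance(new_x, archive),
}
```

Only the inheritance variant uses the position of the new solution, which may explain why it is the most informative of the three.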
    This initial study has been extended by a number of schemes for handling the delay of an
objective function [141]. The authors proposed four different approaches: wait for all objectives to
be finished (Waiting); optimize the cheap objective first and evaluate the expensive objective for the
optima found (Fast-First); use the cheap objective to look ahead at possible promising offspring
each generation (Brood Interleaving); and incorporate even more selection pressure for the expensive
objective evaluations by running a single-objective optimization algorithm on the cheap objective
(Speculative Interleaving). The experimental study revealed that the performance is affected by
the amount of delay for the objective. Speculative Interleaving turned out to perform well when
the termination criterion is based on a shorter time limit and the delay of the objectives is rather
significant. Unsurprisingly, the Fast-First strategy outperformed other methods when the objectives
were highly positively correlated. The authors also found that Waiting and Brood Interleaving
became increasingly competitive with a longer running time of the algorithms.
    In 2018, Chugh et al. [142] proposed HK-RVEA, an extension of K-RVEA [124] that can handle
two objective functions with different latencies. Moreover, in contrast to other existing methods,
where the expensive objective is predicted by a relatively simple approximation, the authors have
used Kriging (also known as Gaussian process), a powerful approximation model frequently used
in surrogate-assisted optimization. Significant changes compared to the original K-RVEA are
related to the training and update mechanism of the surrogate, driven by a single-objective
evolutionary algorithm. A comparison on bi-objective test problems with previously proposed
approaches [140, 141] showed that HK-RVEA works especially well in cases with low latencies.
    Thomann et al. have developed the trust region-based algorithm MHT that employs quadratic
approximations for objectives not being evaluated yet [143]. The Tammer-Weidner functional is
used for finding descent directions to make use of the heterogeneity of the objective functions.
The trust region limits the surrogate’s underlying error and serves as a step size in each iteration.
The authors have used the concept of local ideal points given by the minimum of the local quadratic
model to calculate the search direction in each step. Because of the limited function evaluation
budget and one computationally expensive function, the goal of this initial study was to obtain only
a single Pareto-efficient solution. In [265] the same authors have proposed a method that starts
from the Pareto-efficient solution found by MHT and attempts to explore the neighborhood of the
solution further to cover the whole or at least parts of the Pareto-front.
    A surrogate-assisted approach called Tr-SAEA has been proposed by Wang et al. for
heterogeneously expensive bi-objective problems [144, 266]. Inevitably, the existence of a
computationally inexpensive objective and an expensive one quickly leads to knowledge asymmetry.
Thus, the authors propose a transfer learning scheme within a surrogate-assisted evolutionary
algorithm to transfer knowledge from the fast objective to the slower one. The transfer is achieved
by fitting models where knowledge about the variables and the fast objective serves as input. The
approach has been shown to be more robust to varying levels of latency and correlation between the
objectives.
    Another informative resource about the state-of-the-art of heterogeneous objectives and future
research can be found in [267]. The article focuses on unconstrained multi-objective optimization
with heterogeneous objective functions. Heterogeneity is discussed in a general manner with a focus
on the heterogeneity of the evaluation times. The authors give an overview of recent developments
and possible gaps in this research direction.
    In [268], independently computable functions in constrained single-objective optimization
have been investigated. The authors have proposed eight different constraint handling techniques
by combining the ranking of infeasible/feasible solutions, the evaluation type, and the constraint
violation aggregation function. In this first study, the sequence in which the constraints are
evaluated is determined randomly. Results have shown that this can already significantly improve the
convergence of an optimization algorithm. Later on, the work has been extended by incorporating a
feasibility relaxation mechanism to permit constraint evaluation for potentially important solutions
close to the constraint boundary and by using the feasibility ratio to determine the sequence for
constraint evaluation [269]. By ordering the constraints based on the likelihood of violation,
computational resources can be saved by stopping the evaluation of a solution as soon as the
first violated constraint is discovered. Instead of using only knowledge of the feasibility ratio
of constraints gathered from the past, a surrogate for predicting a solution’s likelihood to violate
a specific constraint is used in [270]. Other novelties presented by the authors are a modified
infeasibility-driven ranking for ordering the partially evaluated solutions and an adaptive switching
between partial and complete evaluation. Whereas the ranking is essential to give the potentially
infeasible solution a chance to survive, the switching guarantees a minimum amount of entirely
evaluated solutions.
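The ordering of constraints by their likelihood of violation, combined with stopping at the first violated constraint, can be sketched as follows. This is a simplified illustration and not the method from [269] or [270]. It assumes that `feasibility_ratio[j]` is the fraction of past evaluations in which constraint `j` was satisfied, so a low value means the constraint is likely to be violated. The function name and the toy constraints are hypothetical.

```python
def evaluate_until_violation(x, constraints, feasibility_ratio):
    """Evaluate constraints in order of increasing feasibility ratio; stop at the first violation.

    Returns a dict mapping the index of each evaluated constraint to its value.
    Constraints that were skipped do not appear in the dict.
    """
    order = sorted(range(len(constraints)), key=lambda j: feasibility_ratio[j])
    evaluated = {}
    for j in order:
        evaluated[j] = constraints[j](x)
        if evaluated[j] > 0.0:
            break  # violated: remaining constraints are not evaluated
    return evaluated


# Hypothetical constraints: g_0 is rarely violated, g_1 is violated often.
constraints = [lambda x: x[0] - 10.0, lambda x: x[0] - 1.0]
ratio = [0.9, 0.2]

partial = evaluate_until_violation([2.0], constraints, ratio)  # g_1 violated, g_0 skipped
full = evaluate_until_violation([0.5], constraints, ratio)     # both constraints evaluated
```

For the infeasible point only one of the two constraints is evaluated, which is where the savings come from when constraint evaluations are expensive.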
     Besides publications directly addressing heterogeneously expensive objectives, the connection
to related research directions shall be discussed. One way of addressing the expensive objective
is approximating the value with a surrogate before doing the time-consuming evaluation. This
introduces a low-fidelity evaluation (using the approximation model) with an underlying prediction
error and a high-fidelity model (time-consuming but without any error) for the corresponding
objective. Having an objective function with different fidelity levels is also known as multi-fidelity
optimization [271]. However, in contrast to the heterogeneously expensive problems addressed in
this chapter, in multi-fidelity optimization, not only one but multiple functions with often different
expenses exist for the same target. Moreover, the existence of multiple independently computable
target functions requires thinking about the design of distributed and asynchronous algorithms [143].
Distributing the evaluation of a set of solutions and their objectives and constraints on different
computing nodes causes asynchronicity. An implementation has to address the asynchronicity,
for instance, by waiting for all information necessary to obtain and return the results – the most
common implementation in algorithms – or by processing asynchronous events and thus partial
evaluations. Furthermore, the situation of having partially evaluated solutions is in a way related
to not considering some objectives or targets temporarily. Some studies have addressed the more
specific case of removing (redundant) objectives for the whole optimization run [272]. In the case
of heterogeneous expensiveness, one usually knows beforehand which objective is expensive and,
thus, might be available less frequently for all individuals early on. This changes the viewpoint
from which objective is being removed to how to decide whether it is worthwhile to spend time
on the time-consuming evaluation of the expensive objective for some individuals or not.
Moreover, accessing only partial information of objectives or constraints is related to analyzing a
data set with missing values. The occurrence of missing values has been studied thoroughly in data
science and machine learning and thus is worth a look [273].
     To the best of our knowledge, the combination of heterogeneously expensive functions for
constrained multi-objective optimization has not been explored yet. Thus, this work shall provide a
starting proof-of-principle study for different evaluation times considering both multiple objectives
and constraints and should encourage more attention in the near future.
5.3    Background
Before different approaches for optimizing problems with heterogeneous target functions are
discussed, the evaluation process must be looked at systematically. In general, the evaluation process
itself can be split into two interdependent parts: i) the jobs being submitted by the algorithm, and
ii) the scheduling of these jobs. For the former, we propose a scheme of different ways for an
algorithm to submit jobs regarding a set of solutions and target functions. This defines the
frequency and granularity of the information the algorithm retrieves. The scheduling, in turn,
determines the point in time at which the algorithm is notified that a job has finished. The purpose
of the scheduler is to decide which job should be executed next. Since resources are commonly
limited, a scheduler often uses a job queue or even more sophisticated load-balancing techniques.
Even though this may sound like a minor implementation detail, for practitioners running
optimization methods in a distributed computing environment, this can become crucially important.
[Figure: 2 × 2 grid of evaluation strategies. Rows: the set of solutions 𝑋 is submitted elementwise (E/·) or as a batch (B/·). Columns: the target values 𝑉 (objectives and/or constraints) are submitted elementwise (·/E) or as a batch (·/B).
  E/E strategy: for each i in 1 … |X| and each j in 1 … |V|, enqueue(X[i], V[j]); |J| = |X| · |V|.
  E/B strategy: for each i in 1 … |X|, enqueue(X[i], V); |J| = |X|.
  B/E strategy: for each j in 1 … |V|, enqueue(X, V[j]); |J| = |V|.
  B/B strategy: enqueue(X, V); |J| = 1.]
Figure 5.1: Strategies for the evaluation procedure considering a set of solutions 𝑋 and target values
𝑉 to be calculated.
5.3.1    Evaluation Jobs
The frequency and granularity of information the algorithm retrieves open up new possible ways
of asynchronous calculations to efficiently use computing resources. The evaluation of a set of
solutions 𝑋 with multiple target values 𝑉 can be achieved in several ways. First, the
algorithm can evaluate each solution in 𝑋 separately by submitting one job per entry of 𝑋,
or it can submit a single job containing all solutions in 𝑋 as a batch. Second, the algorithm can decide
what target values each of the jobs should include: Should it be all targets in 𝑉 that provide complete
information about a solution, or just a subset of targets? These choices result in four different ways
of packaging the computation jobs defining how the evaluation takes place (see Figure 5.1). We
refer to a strategy as Y/Z, where Y is replaced by E (elementwise) if only a single solution and
by B (batch) if multiple solutions are submitted per job. The naming convention is applied analogously to Z
with respect to the target values. The B/B strategy is most commonly used, where the algorithm
schedules the calculation of all solutions 𝑋 and target values 𝑉 only once and retrieves the resulting
values when the job has finished. This reduces the number of scheduled jobs to one; however, it
does not allow the algorithm to retrieve any intermediate information during evaluation. Contrary
to scheduling everything as a single job, the calculation can be split into many small jobs using the E/E
strategy. For each solution 𝑋𝑖 and each target value 𝑉𝑗, a separate job is submitted. The resulting
number of jobs |𝐽| is equal to |𝑋| · |𝑉|, and the algorithm receives a notification whenever one of
the jobs has finished. Thus, it has the highest frequency and granularity of information considered
in this scheme. However, calculations that could be shared across target
values are done repeatedly. The E/B strategy schedules a single solution at a time but multiple
target values, which results in a job list of size |𝑋|. Since the number of solutions evaluated at a
time is usually larger than the number of target values, this is the strategy with the second-highest
frequency of information. The B/E strategy submits one job for each target value but does not split
up the set of solutions. This results in |𝑉| jobs to be processed. Analogously to the E/E strategy,
variables necessary for the calculation of multiple target values cannot be shared. Nevertheless,
if some target values are obtained significantly faster than others, the algorithm is notified without
waiting for more computationally expensive target values. It is worth mentioning that a target 𝑣 ∈ 𝑉
can also be a group of targets always being evaluated together. Each partitioning has benefits and
drawbacks regarding its flow of information to the algorithm and the concrete implementation.
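The four ways of packaging jobs can be sketched in a few lines of Python. This is only an illustration of the Y/Z naming scheme: the function name `package_jobs` and the solution and target labels are hypothetical and do not come from any library or from the implementation used in this work.

```python
from itertools import product


def package_jobs(X, V, strategy):
    """Return the list of jobs (solutions, targets) for a Y/Z strategy.

    The first letter (Y) controls the solutions, the second (Z) the targets:
    'E' submits one item per job, 'B' submits all items together.
    """
    by_solution, by_target = strategy.split("/")
    solution_groups = [[x] for x in X] if by_solution == "E" else [list(X)]
    target_groups = [[v] for v in V] if by_target == "E" else [list(V)]
    # Every combination of a solution group and a target group is one job.
    return list(product(solution_groups, target_groups))


X = ["x1", "x2", "x3", "x4"]   # hypothetical solution labels
V = ["f1", "f2", "g1"]         # two objectives and one constraint

job_counts = {s: len(package_jobs(X, V, s)) for s in ("E/E", "E/B", "B/E", "B/B")}
print(job_counts)  # {'E/E': 12, 'E/B': 4, 'B/E': 3, 'B/B': 1}
```

The job counts reproduce |𝑋| · |𝑉|, |𝑋|, |𝑉|, and 1 for the E/E, E/B, B/E, and B/B strategies, respectively.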
5.3.2    Job Scheduling
Job partitioning defines the general frequency and granularity of information, but the time of
retrieving pieces of information remains unknown without concrete timing. The most
straightforward implementation of a scheduler is a FIFO or priority queue, allowing new jobs to be added and
the next job to compute to be retrieved. For now, let us assume all jobs in the queue are processed in
parallel, which requires distributing jobs to at least |𝑋| · |𝑉| workers.
    Further, let the optimization problem consist of two objectives, 𝑓1 and 𝑓2, and one constraint, 𝑔1.
We assume different execution times for each target value: 𝑡(𝑓1) for the first, 𝑡(𝑓2) for the second
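[Figure: timeline of the targets 𝑓1, 𝑓2, and 𝑔1 evaluated for each solution in 𝑋. The ·/B strategy returns (X, V) with 𝑉 = 𝑓1 ∪ 𝑓2 ∪ 𝑔1 once, at time 𝑡(𝑓2); the ·/E strategy returns (X, 𝑓1) at 𝑡(𝑓1), (X, 𝑔1) at 𝑡(𝑔1), and (X, 𝑓2) at 𝑡(𝑓2).]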
Figure 5.2: A comparison of ·/B and ·/E strategies assuming parallel processing of all jobs, requiring
different average times for evaluation.
objective, and 𝑡(𝑔1) for the constraint. The evaluation times are assumed to satisfy 𝑡(𝑓1) < 𝑡(𝑔1) < 𝑡(𝑓2).
Figure 5.2 demonstrates a possible evaluation procedure for a solution set of size three. Moreover,
it shows the information flow for the ·/B (E/B and B/B) and ·/E (B/E and E/E) strategies. The ·/B
strategy returns the result given by the union of all target values 𝑉 = 𝑓1 ∪ 𝑓2 ∪ 𝑔1 exactly once. This
implies the algorithm is waiting to obtain full information about all solutions and is idle meanwhile.
The waiting time is given by max(𝑡(𝑓1), 𝑡(𝑓2), 𝑡(𝑔1)) or in general by max(𝑡(𝑉1), …, 𝑡(𝑉|𝑉|)). A
single outlier (with a considerably larger evaluation time) will increase the overall waiting time and greatly
impact the algorithm’s overall performance. In this example, the maximum evaluation time is
given by 𝑡(𝑓2), which is almost three times larger than 𝑡(𝑓1) (see Figure 5.2). In contrast,
the ·/E strategy provides some information to the algorithm whenever the calculation of a target
value has finished. Thus, for the three target values, the algorithm receives the first notification
at 𝑡(𝑓1) with 𝑓1, the second at 𝑡(𝑔1) with 𝑔1, and the third at 𝑡(𝑓2) with 𝑓2. This implies that the
optimization algorithm retrieves multiple partial function evaluations at different times, which allows it to change
the evaluation schedule by step-wise eliminations, thereby making the optimization task more
efficient. A batch-wise evaluation of targets would not allow any such advantage.
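The difference in information flow can be made concrete with a small sketch. The evaluation times below are hypothetical values in arbitrary units, chosen only to satisfy 𝑡(𝑓1) < 𝑡(𝑔1) < 𝑡(𝑓2), and the function `notifications` is an illustrative helper that assumes all jobs run in parallel.

```python
# Hypothetical evaluation times (arbitrary units) with t(f1) < t(g1) < t(f2).
eval_time = {"f1": 1.0, "g1": 2.0, "f2": 2.9}


def notifications(eval_time, batch_targets):
    """Return (time, targets) notifications assuming all jobs run in parallel."""
    if batch_targets:
        # ./B: a single notification once the slowest target has finished.
        return [(max(eval_time.values()), sorted(eval_time))]
    # ./E: one notification per target, ordered by completion time.
    return sorted((t, [name]) for name, t in eval_time.items())


batch = notifications(eval_time, batch_targets=True)
elementwise = notifications(eval_time, batch_targets=False)
print(batch)        # [(2.9, ['f1', 'f2', 'g1'])]
print(elementwise)  # [(1.0, ['f1']), (2.0, ['g1']), (2.9, ['f2'])]
```

With the ·/B strategies, nothing is known before the maximum evaluation time has passed, whereas the ·/E strategies deliver 𝑓1 and 𝑔1 earlier, which is the information a partial-evaluation scheme can act on.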
[Figure: E/E strategy with scheduling. Four workers process the queued jobs (𝑋(𝑖), 𝑣) with 𝑣 ∈ {𝑓1, 𝑓2, 𝑔1}; below the time axis, the jobs reported as finished at successive points in time are listed.]
Figure 5.3: Job schedule using a queue.
    A different execution order of jobs might occur if a computing unit is not available for every job
simultaneously. Figure 5.3 illustrates a possible execution of jobs with four workers 𝑊1 to
𝑊4 using the E/E strategy. Because the scheduler decides which job to execute next, no prediction
about the availability of target values can be made. Moreover, the dispatcher will report different
amounts of partial information; thus, an asynchronous algorithm design is desired.
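A queue-based schedule with a limited number of workers can be simulated as follows. This is a minimal sketch under stated assumptions: jobs are served in FIFO order, each job takes the hypothetical duration of its target, and `simulate_fifo` is an illustrative helper rather than the scheduler used in this work.

```python
import heapq
from collections import deque


def simulate_fifo(jobs, duration, n_workers):
    """Simulate a FIFO queue served by n_workers; return sorted (finish_time, job) pairs.

    jobs: list of (solution, target) pairs in submission order.
    duration: dict mapping a target name to its evaluation time.
    """
    queue = deque(jobs)
    free_at = [0.0] * n_workers          # time at which each worker becomes idle
    heapq.heapify(free_at)
    finished = []
    while queue:
        solution, target = queue.popleft()
        start = heapq.heappop(free_at)   # earliest available worker takes the job
        end = start + duration[target]
        heapq.heappush(free_at, end)
        finished.append((end, (solution, target)))
    return sorted(finished)


duration = {"f1": 1.0, "g1": 2.0, "f2": 3.0}   # hypothetical evaluation times
# E/E strategy: one job per (solution, target) pair.
jobs = [(x, v) for x in ("x1", "x2", "x3") for v in ("f1", "f2", "g1")]
schedule = simulate_fifo(jobs, duration, n_workers=4)
makespan = schedule[-1][0]
```

For these durations, four workers finish the nine jobs at time 6.0, whereas nine workers (one per job) would finish at time 3.0. The order in which results arrive depends on the queue, which is why the algorithm cannot rely on a fixed availability pattern.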
    The systematic analysis of the evaluation process will help researchers think of possible
implementations and their impact on the algorithm’s design. Based on the above evaluation strategies for
dealing with heterogeneously expensive objectives and constraints, we propose a population-based
optimization method with a B/E evaluation strategy, which makes use of partial evaluations of target
functions to carefully eliminate the evaluation of expensive targets for potentially inferior population
members, thereby making the overall method computationally efficient.
5.4   Computationally Expensive Objectives and Inexpensive Constraints
In the past years, a significant amount of research has been done in optimizing computationally
expensive and time-consuming objective functions using various surrogate modeling approaches.
Constraints have often been neglected or assumed to be a by-product of the expensive objective
computation, thereby being available after executing expensive evaluation routines. However,
many optimization problems in practice have separately evaluable, computationally inexpensive
geometrical or physical constraint functions, while the objectives may still be time-consuming. This
scenario is probably the simplest case of handling heterogeneous and multi-scale surrogate
modeling in the presence of constraints. Even though computationally expensive objective functions
have been studied extensively [104, 23, 38], computationally inexpensive constraints have
received little attention in the literature. Thus, we propose IC-SA-NSGA-II, an inexpensive constraint
handling method using a surrogate-assisted version of NSGA-II [10]. First, we describe three
different methods to generate a feasible design of experiments, and second, the outline of the
algorithm. Our proposed method makes explicit use of the computationally inexpensive constraint
functions and guarantees a solution’s feasibility before running the time-consuming simulation of
objective functions.
5.4.1    Design of Experiments
In surrogate-assisted optimization [104], the so-called Design of Experiments (DOE) needs to be
generated to build the initial model(s). The number of initial points 𝑁 DOE depends on different
factors, such as the number of variables or the complexity of the fitness landscape. A standard
method frequently used to generate a well-spaced set of points is Latin Hypercube Sampling
(LHS) [234]. However, with the availability of an efficient feasibility check, more sophisticated
approaches would be preferred. Therefore, three different methods returning a well-spaced set of
feasible solutions using inexpensive constraint functions are described.
Rejection Based Sampling (RBS): A relatively simple approach is modifying LHS, which already
provides a well-spaced set of solutions, to consider feasibility. A feasible and well-spaced set
of points can be obtained by using LHS to produce a point set 𝑃 and rejecting all infeasible solutions
to obtain a set of feasible solutions 𝑃(feas). Because infeasible solutions have been discarded, the
resulting point set usually does not have the desired number of points (|𝑃(feas)| < 𝑁 DOE) and, thus, the
process shall be repeated until enough feasible solutions have been found. If the size of the obtained
point set exceeds 𝑁 DOE , a subset is selected randomly.
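A minimal sketch of rejection-based sampling is given below. It uses a basic Latin Hypercube sampler written with the standard library instead of the LHS implementation referenced above, and the disk-shaped constraint `in_disk` as well as all function names are hypothetical choices for illustration.

```python
import random


def latin_hypercube(n_points, n_var, rng):
    """Basic Latin Hypercube Sampling in the unit hypercube [0, 1]^n_var."""
    columns = []
    for _ in range(n_var):
        # One random value in each of n_points equally sized strata, shuffled.
        strata = [(i + rng.random()) / n_points for i in range(n_points)]
        rng.shuffle(strata)
        columns.append(strata)
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def rejection_based_sampling(n_doe, n_var, is_feasible, seed=1, max_rounds=1000):
    """Repeat LHS and keep feasible points until at least n_doe are found."""
    rng = random.Random(seed)
    feasible = []
    for _ in range(max_rounds):
        points = latin_hypercube(n_doe, n_var, rng)
        feasible.extend(p for p in points if is_feasible(p))
        if len(feasible) >= n_doe:
            # Too many points: select a random subset of the desired size.
            return rng.sample(feasible, n_doe)
    raise RuntimeError("not enough feasible points found")


def in_disk(p):
    """Hypothetical inexpensive constraint: inside a disk of radius 0.4."""
    return (p[0] - 0.5) ** 2 + (p[1] - 0.5) ** 2 <= 0.4 ** 2


doe = rejection_based_sampling(n_doe=20, n_var=2, is_feasible=in_disk)
```

Note that merging several LHS rounds and subsampling randomly no longer guarantees the stratification of a single LHS run, which is one reason why the more elaborate methods described next can yield a better-spaced set.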
Niching Genetic Algorithm (NGA): The sampling process itself can be seen as an optimization
problem where the objective is given by the constraint violation 𝑓(x) = cv(x), which takes a value of
zero for a feasible solution x, and a positive value proportional to the sum of the normalized constraint
violations of all constraints if x is infeasible. Such an objective function results in a multi-modal
optimization problem where a diverse solution set with objective values of zero shall be found.
One type of algorithm used for multi-modal problems is the niching-based genetic algorithm (NGA),
where the diversity is ensured by an 𝜖-clearing based environmental survival [274]. For
single-objective optimization problems, the 𝜖-clearing occurs in the design space and guarantees a minimum distance
(usually the Euclidean distance is used) from one solution to another. The survival always selects the
best performing solution that has not already been selected or cleared and then clears its neighborhood within less
than 𝜖 distance. Thus, the spread of solutions is accomplished by disfavoring solutions in each
other’s vicinity. After setting the objective to be the constraint violation, a suitable 𝜖 has to be
found. The suitability of a given 𝜖 depends on the size of the feasible region(s) of the corresponding
optimization problem and is unknown beforehand. On the one hand, if 𝜖 is too large, the number
of optimal solutions found by the algorithm will not reach 𝑁 DOE. On the other hand, if 𝜖 is too
small, the solution set’s spread has room for improvement. For tuning the hyper-parameter 𝜖, we
start with 𝜖 = 𝜖0 (a number close to 1.0) and execute NGA. If the size of the obtained solution set
is less than 𝑁 DOE, we set 𝜖 = 0.9 · 𝜖 and repeat this procedure until a solution set of at least size
𝑁 DOE is found.
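The 𝜖-clearing survival and the tuning of 𝜖 can be sketched as follows. The sketch makes a simplifying assumption that should be stated clearly: instead of re-executing a genetic algorithm for every 𝜖, a fixed pool of random candidates stands in for the population of NGA. The function names and the disk-shaped feasible region are hypothetical.

```python
import math
import random


def epsilon_clearing(points, cv, eps):
    """Select indices in order of increasing constraint violation and skip
    (clear) every point closer than eps to an already selected one."""
    order = sorted(range(len(points)), key=lambda i: cv[i])
    selected = []
    for i in order:
        if all(math.dist(points[i], points[j]) >= eps for j in selected):
            selected.append(i)
    return selected


def tune_epsilon(points, cv, n_doe, eps0=1.0, shrink=0.9, min_eps=1e-6):
    """Multiply eps by 0.9 until at least n_doe feasible points survive."""
    eps = eps0
    while eps > min_eps:
        survivors = [i for i in epsilon_clearing(points, cv, eps) if cv[i] == 0.0]
        if len(survivors) >= n_doe:
            return eps, survivors
        eps *= shrink
    raise RuntimeError("eps became too small")


# Assumption: a random candidate pool replaces the NGA population.
rng = random.Random(3)
pool = [(rng.random(), rng.random()) for _ in range(500)]
# Hypothetical constraint violation: distance outside a disk of radius 0.4.
cv = [max(0.0, math.dist(p, (0.5, 0.5)) - 0.4) for p in pool]

eps, doe = tune_epsilon(pool, cv, n_doe=15)
```

Because feasible points have zero violation, they are processed first, and all returned points are pairwise at least `eps` apart.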
Riesz s-Energy Optimization (Energy): The Riesz s-Energy [275] is a generalization of the potential
energy concept and is defined for the point set z as
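\[
U(\mathbf{z}) = \frac{1}{2} \sum_{i=1}^{|\mathbf{z}|} \sum_{\substack{j=1 \\ j \neq i}}^{|\mathbf{z}|} \frac{1}{\left\lVert \mathbf{z}^{(i)} - \mathbf{z}^{(j)} \right\rVert^{s}}, \quad \mathbf{z} \in \mathbb{R}^{n \times M}, \tag{5.2}
\]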
where the inverse norm to the power 𝑠 of each pair of points (z (𝑖) and z ( 𝑗) ) is summed up. It has
already been shown in [276] that Riesz s-Energy can be used to achieve a well-spaced point set by
  Algorithm 5.1: IC-SA-NSGA-II: Inexpensive Constrained Surrogate-Assisted NSGA-II
    Input : Number of Variables 𝑛, Expensive Objective Function 𝑓(x), Inexpensive Constraint
            Function 𝑔(x), Maximum Number of Solution Evaluations ESEmax, Number of Design of
            Experiments 𝑁 DOE, Exploration Points 𝑁(explr), Exploitation Points 𝑁(exploit), Number of
            generations for exploitation 𝑘, Multiplier of offsprings for exploration 𝑠.
        /* initialize feasible solutions using the inexpensive function 𝑔 */
   1  X ← constrained_sampling(’energy’, 𝑁 DOE, 𝑔)
   2  F ← 𝑓(X)
   3  while |X| < ESEmax do
          /* exploitation using the surrogate */
   4      𝑓ˆ ← fit_surrogate(X, F)
   5      X(cand), F(cand) ← optimize(’nsga2’, 𝑓ˆ, 𝑔, X, F, 𝑘)
   6      X(cand), F(cand) ← eliminate_duplicates(X, X(cand), F(cand))
   7      𝐶 ← cluster(’k_means’, 𝑁(exploit), F(cand))
   8      X(surrogate) ← ranking_selection(X(cand), 𝐶, crowding(F(cand)))
          /* exploration using mating and least crowded selection */
   9      X′, F′ ← survival(X, F)
  10      X(mat) ← mating(X′, F′, 𝑠 · 𝑁(explr))
  11      X(explr) ← feas_and_max_distance_selection(X(mat), X(cand), X, 𝑔)
          /* evaluate and merge to the archive */
  12      F(explr) ← 𝑓(X(explr)); F(surrogate) ← 𝑓(X(surrogate));
  13      X ← X ∪ X(explr) ∪ X(surrogate)
  14      F ← F ∪ F(explr) ∪ F(surrogate)
  15  end
executing a gradient-based algorithm. The restriction made there that all points have to lie on the
unit simplex has been replaced by the feasibility check provided by the computationally inexpensive
constraint. Thus, a point is only replaced by its successor obtained by the gradient update if the
successor is feasible. The algorithm’s initial point set, which must be provided, is first
attempted to be obtained by RBS and, if RBS cannot find a sufficient number of feasible solutions, by NGA.
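The idea of a feasibility-preserving gradient update on the Riesz s-Energy can be sketched as below. This is a simplified illustration, not the algorithm of [276]: it uses plain normalized gradient descent with a fixed step size, and the unit-square constraint, the step size, and all function names are assumptions made for the example.

```python
import math


def riesz_energy(points, s):
    """Riesz s-energy as in Eq. (5.2): half the sum over all ordered pairs."""
    total = 0.0
    for i, zi in enumerate(points):
        for j, zj in enumerate(points):
            if i != j:
                total += 1.0 / math.dist(zi, zj) ** s
    return 0.5 * total


def energy_gradient(points, i, s):
    """Gradient of the energy with respect to point i."""
    grad = [0.0] * len(points[i])
    for j, zj in enumerate(points):
        if j != i:
            d = math.dist(points[i], zj)
            for k in range(len(grad)):
                grad[k] += -s * (points[i][k] - zj[k]) / d ** (s + 2)
    return grad


def feasible_descent_step(points, is_feasible, s=2.0, step=1e-3):
    """Move each point along its normalized negative gradient and keep the
    move only if the successor is feasible."""
    new_points = list(points)
    for i in range(len(points)):
        grad = energy_gradient(new_points, i, s)
        norm = math.sqrt(sum(g * g for g in grad)) or 1.0
        successor = tuple(z - step * g / norm for z, g in zip(new_points[i], grad))
        if is_feasible(successor):
            new_points[i] = successor
    return new_points


def in_unit_square(p):
    """Hypothetical inexpensive constraint: stay inside [0, 1]^2."""
    return all(0.0 <= c <= 1.0 for c in p)


points = [(0.45, 0.45), (0.55, 0.45), (0.45, 0.55), (0.55, 0.55)]
energy_before = riesz_energy(points, s=2.0)
for _ in range(200):
    points = feasible_descent_step(points, in_unit_square, s=2.0, step=0.005)
energy_after = riesz_energy(points, s=2.0)
```

Starting from a tight cluster, the points repel each other and spread out, while any update that would leave the feasible region is rejected, so the point set stays feasible throughout.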
5.4.2   Methodology
The outline of IC-SA-NSGA-II is shown in Algorithm 5.1. The initial design of experiments of
size 𝑁 DOE is obtained by the proposed initialization method based on Riesz s-Energy and then
evaluated by executing the expensive simulation 𝑓(x) (Lines 1 and 2). Afterward, while the
[Figure: two panels, each showing the population (high-fidelity) and the candidates (surrogate).
(a) Exploitation, shown in the objective space (𝑓1, 𝑓2): select solutions from the candidate set obtained by optimizing on the surrogate, using ranking selection based on the crowding distance.
(b) Exploration, shown in the design space (𝑥1, 𝑥2): select from a solution set obtained through evolutionary operators (mating) by maximizing the distance to existing solutions and candidates, that is, select the least crowded.]
Figure 5.4: The two steps in each iteration: exploitation and exploration.
number of solution evaluations ESEmax is not exceeded (Line 3), the algorithm continues to generate
𝑁(exploit) solutions derived from the surrogate for exploitation and 𝑁(explr) solutions obtained
by mating and a distance-based selection for exploration. The exploitation starts with fitting
surrogate model(s), which results in the approximation function 𝑓ˆ (Line 4). Using the surrogates
𝑓ˆ and the computationally inexpensive function 𝑔, the optimization is continued, or in other words
simulated, assuming 𝑓 = 𝑓ˆ for 𝑘 more generations (Line 5). From the last simulated generation,
the candidates X(cand) and F(cand) are extracted from the optimum, and duplicates with respect
to F are eliminated (Line 6). This ensures that F(cand) consists only of non-dominated solutions
with respect to F. From X(cand), only 𝑁(exploit) solutions are chosen for expensive evaluation by
clustering F(cand) (Line 7) and executing ranking selection [1] in each cluster (Line 8), where the
ranking is based on the crowding distance in F(cand). Figure 5.4a illustrates the exploitation
procedure of a population with five solutions (circles). The algorithm found 10 candidate solutions
(triangles) by optimizing the surrogate and the inexpensive constraint functions. In this example,
𝑁(exploit) = 3 and, thus, the K-means algorithm is instantiated to find three clusters. From each
cluster, the solutions obtained by ranking selection based on the crowding distance are assigned
to X(surrogate).
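The selection of one candidate per cluster can be illustrated with the following sketch. It makes two simplifying assumptions: the cluster labels are given directly instead of being computed by K-means, and the rank-based selection is replaced by deterministically picking the candidate with the largest crowding distance in each cluster. The crowding distance follows the usual NSGA-II definition; the function names and data are hypothetical.

```python
def crowding_distance(F):
    """NSGA-II crowding distance for a list of objective vectors."""
    n, n_obj = len(F), len(F[0])
    dist = [0.0] * n
    for m in range(n_obj):
        order = sorted(range(n), key=lambda i: F[i][m])
        span = F[order[-1]][m] - F[order[0]][m]
        # Boundary solutions always receive an infinite distance.
        dist[order[0]] = dist[order[-1]] = float("inf")
        if span == 0:
            continue
        for pos in range(1, n - 1):
            gap = F[order[pos + 1]][m] - F[order[pos - 1]][m]
            dist[order[pos]] += gap / span
    return dist


def select_per_cluster(F, labels):
    """Pick, for each cluster, the index with the largest crowding distance."""
    cd = crowding_distance(F)
    best = {}
    for i, label in enumerate(labels):
        if label not in best or cd[i] > cd[best[label]]:
            best[label] = i
    return sorted(best.values())


# Hypothetical candidate objective values and cluster labels (three clusters).
F_cand = [(0.0, 1.0), (0.2, 0.7), (0.3, 0.6), (0.6, 0.3), (0.7, 0.25), (1.0, 0.0)]
labels = [0, 0, 0, 1, 1, 2]
chosen = select_per_cluster(F_cand, labels)
print(chosen)  # [0, 3, 5]
```

Selecting within clusters spreads the expensive evaluations across the candidate front, while the crowding distance favors less crowded candidates inside each cluster.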
      Besides exploitation, some exploration is essential in a surrogate-assisted
algorithm. The exploration is based on the evolutionary recombination of NSGA-II with
post-filtering based on the distance in the design space (Lines 9 to 11). First, the environmental
survival is executed because the mating should not be based on the whole archive but instead on a subset of
more promising solutions X′. Second, mating takes place to produce 𝑠 · 𝑁(explr) solutions X(mat) and,
third, the feasible solutions from X(mat) that are maximally away from X and X(cand) are assigned
to X(explr). Figure 5.4b demonstrates this explorative step in more detail. All infeasible solutions
generated through mating have already been eliminated. The solution with the maximum distance
to others is selected, which represents the least crowded solution with respect to X and X(cand). The
exploration step purposefully chooses solutions not suggested by the surrogate and helps to escape
from local optima if necessary. After selecting the first solution with the maximum distance to
others, the solution is marked as selected and considered in the distance calculations for the second
iteration in the selection procedure. Finally, the infill solutions X(surrogate) and X(explr) are evaluated
on the expensive objective functions 𝑓 and merged with X and F (Lines 12 to 14).
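The distance-based selection in the exploration step can be sketched as a greedy procedure. The sketch interprets "maximum distance to others" as the largest distance to the nearest existing or already selected point, which is an assumption; the function name, the unit-square constraint, and the data are hypothetical.

```python
import math


def max_distance_selection(candidates, existing, n_select, is_feasible):
    """Greedily pick n_select feasible candidates, each time taking the one
    whose nearest neighbor among existing and selected points is farthest."""
    pool = [c for c in candidates if is_feasible(c)]
    reference = list(existing)   # must be non-empty
    selected = []
    while pool and len(selected) < n_select:
        best = max(pool, key=lambda c: min(math.dist(c, r) for r in reference))
        pool.remove(best)
        selected.append(best)
        reference.append(best)   # selected points count in later iterations
    return selected


def in_unit_square(p):
    """Hypothetical inexpensive constraint: stay inside [0, 1]^2."""
    return all(0.0 <= c <= 1.0 for c in p)


existing = [(0.0, 0.0), (1.0, 1.0)]   # archive and surrogate candidates
offspring = [(0.1, 0.1), (1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (2.0, 2.0)]
explr = max_distance_selection(offspring, existing, n_select=2, is_feasible=in_unit_square)
print(explr)  # [(1.0, 0.0), (0.0, 1.0)]
```

The infeasible offspring (2.0, 2.0) is discarded first. After (1.0, 0.0) has been selected, the nearby offspring (0.9, 0.1) becomes crowded, so the second pick is (0.0, 1.0).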
     In our implementation, we set the number of the initial design of experiments to 𝑁 DOE = 11𝑛 − 1,
where 𝑛 is the number of variables. Moreover, we simulate NSGA-II for 𝑘 = 20 generations with
100 offspring in each generation on the surrogate model. We have used the NSGA-II implementation
available in pymoo [29] for optimization and the Radial Basis Function (RBF) implementation with
a cubic kernel and linear tail available in pySOT [277] for the surrogate model. In each iteration,
five new solutions are evaluated using the expensive objective function, where 𝑁(exploit) = 3 and
𝑁(explr) = 2. Furthermore, we set the multiplier of offspring during exploration to 𝑠 = 100.
It is worth pointing out that we compare our method with SA-NSGA-II, which does
not assume the constraints are inexpensive but otherwise follows the same procedure. In contrast to
IC-SA-NSGA-II, it uses regular Latin Hypercube Sampling for the initial design of experiments and
fits a surrogate of the constraint function(s) to evaluate feasibility.
5.4.3   Results
The proposed method exploits the inexpensiveness of the constraint function(s), and, thus, its
performance on constrained multi-objective optimization problems will be evaluated. In this chapter,
we focus on bi-objective problems with up to 10 constraint functions. In contrast to the constraint
functions, we treat all objectives as computationally expensive.
    To evaluate the algorithm’s performance, we use the CTP test problem suite [278], which has
been designed to address constraints of varying difficulty. Moreover, the performance on other
bi-objective constrained optimization problems frequently used in the literature, such as OSY [9], TNK [9],
SRN [9], C2DTLZ2 [258], C3DTLZ4 [258], and Car Side Impact (CAR) [258], shall be evaluated.
The number of solution evaluations ESEmax is kept relatively small to mimic the evaluation budget
of time-consuming simulations. In the following, first, the performance of the methods proposed to
generate a feasible solution set for the design of experiments is visually analyzed, and, second,
the algorithm’s performance on test problems is discussed.
    In Figure 5.5, the results of Rejection Based Sampling (RBS), Niching GA (NGA), and Riesz
s-Energy (Energy) are shown. Compared to RBS and NGA, Energy obtains a very uniform and
well-spaced point set inside the feasible region across all problems. Also, it is worth noting
that points on the constraint boundary are found, which can be very valuable to start with because,
in practice, optima frequently lie on constraint boundaries. For the purpose of visualization, the
CTP8 problem with two variables (and nine disconnected feasible regions) has been investigated.
All methods were able to obtain more than one feasible solution in all regions.
    Table 5.1 lists the median values (obtained from 11 runs) of the Inverted Generational Distance
(IGD) [240] indicator for 14 constrained bi-objective optimization problems. The obtained results
have been normalized with respect to the ideal and nadir point of each problem’s true front. The
best performing method and other statistically similar methods (Wilcoxon rank test, 𝑝 = 0.05) are
marked in bold. Besides IC-SA-NSGA-II, we ran a more steady-state version of NSGA-II with five
offspring in each generation and an initial population of size 11𝑛 − 1 sampled by LHS. Moreover,
SA-NSGA-II is the proposed method without the modifications made to exploit the availability
[Figure: sampled point sets for the problems TNK, SRN, CTP8, and OSY.]
Figure 5.5: Sampling the design of experiments only in the feasible space using Rejection-Based
Sampling (RBS), Niching Genetic Algorithm (NGA) and Energy Method.
Table 5.1: The median normalized Inverted Generational Distance (IGD) values out of 11 runs for
NSGA-II, SA-NSGA-II and IC-SA-NSGA-II on constrained bi-objective optimization problems.
The best performing method and other statistically similar methods are marked in bold.
       Problem      Variables  Constraints   ESEmax     NSGA-II  SA-NSGA-II       IC-SA-NSGA-II
         CTP1          10            2        200        3.6399      0.0237           0.0196
         CTP2          10            1        200        1.4422      0.1721           0.0173
         CTP3          10            1        200        1.2282      0.2752           0.0357
         CTP4          10            1        400        0.8489      0.3969           0.0736
         CTP5          10            1        400        0.7662      0.1145           0.0139
         CTP6          10            1        400        7.7155      0.1909           0.0117
         CTP7          10            1        400        1.5517      0.0164           0.0032
         CTP8          10            2        400       11.6452      0.5963           0.0074
          OSY           6            6        500        0.4539      0.0273           0.0381
          SRN           2            2        200        0.0263      0.0112           0.0108
          TNK           2            2        200        0.1281      0.0200           0.0092
      C2DTLZ2          12            1        200        0.3787      0.1185           0.0484
      C3DTLZ4           7            2        200        0.2622      0.1210           0.0481
          CAR           7           10        200        0.2362      0.0168           0.0147
of inexpensive constraints. Clearly, IC-SA-NSGA-II outperforms the other approaches for most
of the problems. For 11 out of 14 optimization problems, our proposed method performs
significantly best; for two problems (CTP1 and SRN), SA-NSGA-II performs statistically
similarly; and for one problem (OSY), SA-NSGA-II shows slightly better results. NSGA-II is not
able to find a near-optimal set of solutions with the limited ESEmax for any of the selected problems.
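For readers unfamiliar with the indicator, a normalized IGD value can be computed along the following lines. The sketch assumes the common definition of IGD as the mean Euclidean distance from each point of the reference (true) front to its nearest obtained solution, with both sets normalized by the ideal and nadir point of the reference front; the function name and the small fronts are hypothetical and are not taken from the experiments.

```python
import math


def normalized_igd(front, reference_front):
    """Mean distance from each reference point to its nearest obtained point,
    after normalizing by the ideal and nadir point of the reference front."""
    n_obj = len(reference_front[0])
    ideal = [min(p[m] for p in reference_front) for m in range(n_obj)]
    nadir = [max(p[m] for p in reference_front) for m in range(n_obj)]

    def normalize(p):
        return [(p[m] - ideal[m]) / (nadir[m] - ideal[m]) for m in range(n_obj)]

    ref = [normalize(p) for p in reference_front]
    obtained = [normalize(p) for p in front]
    return sum(min(math.dist(r, a) for a in obtained) for r in ref) / len(ref)


# Hypothetical bi-objective reference front and an incomplete approximation.
true_front = [(0.0, 10.0), (5.0, 5.0), (10.0, 0.0)]
approximation = [(0.0, 10.0), (10.0, 0.0)]

igd_perfect = normalized_igd(true_front, true_front)
igd_partial = normalized_igd(approximation, true_front)
```

A front that covers the reference front exactly yields an IGD of zero, while missing the middle point of the reference front is penalized. In this example, the value is √0.5 / 3 ≈ 0.236.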
    Figure 5.6 shows the obtained solution set for each method of representative runs for CTP2,
CTP4, CTP8, C3DTLZ4, TNK, and OSY, for which well-converged and well-diversified
non-dominated solutions are found with 200 to 500 solution evaluations. For the difficult problem
CTP2, our proposed method converges near the true optima, which lie on the constraint boundary.
The same holds for CTP8, where nine feasible islands exist, of which three contain the Pareto-optimal set,
and, thus, a good exploration of the search space is needed. For C3DTLZ4, IC-SA-NSGA-II has
obtained a better diversity in the solution set than SA-NSGA-II.
Figure 5.6: Solutions in the objective space of representative runs for CTP2, CTP4, CTP8,
C3DTLZ4, TNK, and OSY.
5.4.4    Summary of Section 5.4
Solving computationally expensive optimization problems has become more important
in practice. Often such problems have physical or geometrical constraints that are relatively
computationally inexpensive and can be formulated as equations without running the simulation.
To solve these kinds of problems, we have proposed IC-SA-NSGA-II, a surrogate-assisted NSGA-II,
which efficiently handles inexpensive constraint functions in the initial design of experiments
as well as in each iteration. We have tested our proposed method on 14 constrained bi-objective
optimization problems, and our results indicate that efficiently handling the inexpensive constraints
helps to converge faster.
    This section has focused on solving constrained optimization problems with inexpensive
constraints and expensive objective functions with limited solution evaluations. However, other
heterogeneous problems with different time scales of objective and constraint evaluations must be
considered next. For instance, the constraints can be even more time-consuming than the objective
function, or the objectives and constraints can have different time scales within themselves. After
addressing such cases, there is a need for developing a unified algorithm for generic heterogeneous
optimization problems. Moreover, studies about heterogeneously expensive optimization problems
have so far been limited to bi-objective optimization problems. This section has started to add
some more complexity by adding constraint functions. However, studying the effect of heterogeneity for
many-objective optimization problems shall provide more insights into exploiting the discrepancy
of evaluation times and asynchronicity.
5.5    Constrained Multi-Objective Optimization Problems With Heterogeneous Evaluation
       Times
After investigating a scenario with computationally inexpensive constraints and expensive
objectives, this shall now be generalized to independently computable heterogeneously expensive target
functions. Most existing algorithms do not particularly exploit the practical fact that the objectives
and constraints are often independently computable. The independent evaluation usually originates
from different software packages being executed to determine the performance of a solution. The
procedure of assessing the performance of a solution might be fundamentally different for each
software package, and thus most of the time, the computing time will vary. This results in an op-
timization problem with independently heterogeneously expensive objectives and constraints that
the optimization method must evaluate and use.
    For most practical problems, at least one of the target functions involves a time-consuming
evaluation process (referred to here as a high-fidelity evaluation), thereby causing an optimization
run to go on for hours or days. In order to optimize such problems, surrogate-assisted optimization
methods are prevalent. Using a few high-fidelity solution evaluations, a surrogate model (an
approximate mathematical function) of each target function is created. Instead of using high-
fidelity expensive target functions, surrogate models are usually optimized to find a set of infill
points. Evaluating a solution using the surrogate models is referred to as low-fidelity evaluation.
The created infill points are evaluated using high-fidelity target functions, and the surrogate models
are updated. Since the optimization task is performed on the surrogate models (low-fidelity
evaluation), there is usually a substantial gain in computational time compared to optimization with
high-fidelity evaluations. In general, there is a trade-off between gain in computational effort and
the resulting accuracy of the obtained solutions in surrogate-assisted optimization methods. If only
a few high-fidelity solutions are used to build surrogate models, the computational time will be
small, but the resulting surrogate model may be inaccurate. Thus, optimizing the surrogate models may
result in inferior infill solutions in terms of the high-fidelity target functions. Surrogate-assisted
optimization algorithms must strike a fine balance between the building cost of surrogate models and
how extensively they are optimized.
    However, it is important to note that this section does not propose another surrogate-assisted
optimization algorithm, nor does it compare the proposed methodology with an existing
surrogate-assisted optimization method. Here, we focus on handling heterogeneously expensive
target functions within an optimization algorithm utilizing surrogate models. Thus, no effort is
made to directly find new infill points using the surrogates. Instead, the models are used to
evaluate new solutions to estimate their expected target function values without computing them
with high-fidelity evaluation procedures. This is not to say that the built surrogate models cannot
be exploited further beyond the scope of our proposed methodology; rather, this is something we
plan to pursue in a subsequent study. Here, we address the presence of heterogeneity in practical
target function evaluations and how a population-based algorithm can exploit it to arrive at a
computationally efficient algorithm.
5.5.1    How to Exploit Heterogeneity of an Optimization Problem?
Most existing population-based algorithms do not assume target functions are independently com-
putable and thus wait to update the old population until all new population members are evaluated for
all target functions. If all target functions must be computed simultaneously by a single evaluation
procedure, or the evaluation time for all target functions is negligible compared to the desired time
for executing an optimization run, no special treatment for any evaluation schedule is necessary.
     However, let us consider that not all target functions can be evaluated as a single block of
computation, but rather independent groups of target functions must be evaluated using different
evaluation schemes. Moreover, some groups of target functions require considerably more time
to evaluate than other groups, so there is heterogeneity in the computing efforts.
Such problems are common in practice, including in science and engineering. A
practical design must be evaluated from multi-physics considerations, such as aerodynamics, fluid
mechanics, solid mechanics, aesthetics, and others. In such a scenario, if a new design has already
been found to be worse for certain relatively inexpensive target functions, it can be excluded from
further processing within the algorithm, thereby saving computational time by not evaluating the
expensive target functions. Importantly, such decisions can only be made relative to a set of
solutions and cannot be made for a single solution. This is the reason why EC methods are ideal
candidates for handling heterogeneous target functions.
     For instance, consider a genetic algorithm that has executed a few iterations while
solving a bi-objective optimization problem with one constraint. The algorithm has completed the
mating process and created a set of new offspring solutions 𝑋 to be evaluated for two objectives
 𝑓1 and 𝑓2 and constraint 𝑔1 . Instead of evaluating all solutions at once for 𝑓1 , 𝑓2 , and 𝑔1 , only one
or multiple solutions can be chosen from 𝑋 to retrieve one target function value at a time, say, the
constraint 𝑔1 . Depending on the values of 𝑔1 , some solutions can be discarded already because
they are found to be infeasible. The more “promising” solutions are kept and will continue to be
evaluated on the next target 𝑓1 . Analogously, some solutions can be eliminated, and only a few
are finally sent to obtain the values 𝑓2 . By processing partial information, not all solutions will
be evaluated for all targets, which speeds up the overall evaluation process. Such exploitation of
partial information requires answering two elementary questions.
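The staged evaluation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the target functions `g1`, `f1`, and `f2`, the helper names, and the keep-the-best-half rule are hypothetical placeholders and are not the elimination rule developed in the following subsections.

```python
def staged_evaluation(offspring, stages):
    """Evaluate targets one at a time, discarding solutions between stages.

    `stages` is a list of (name, evaluate, keep) triples; `keep` receives the
    partially evaluated records and returns those worth evaluating further.
    """
    records = [{"x": x} for x in offspring]
    n_calls = 0
    for name, evaluate, keep in stages:
        for rec in records:
            rec[name] = evaluate(rec["x"])
            n_calls += 1
        records = keep(records)
    return records, n_calls


# Hypothetical targets, for illustration only.
def g1(x):
    return x - 0.8            # feasible if g1(x) <= 0

def f1(x):
    return x ** 2

def f2(x):
    return (x - 1.0) ** 2

def feasible(records):
    return [r for r in records if r["g1"] <= 0.0]

def best_half_by_f1(records):
    # Hypothetical rule: keep the better half with respect to f1.
    ranked = sorted(records, key=lambda r: r["f1"])
    return ranked[: max(1, len(ranked) // 2)]

def keep_all(records):
    return records


X = [i / 10 for i in range(11)]   # 0.0, 0.1, ..., 1.0
stages = [("g1", g1, feasible), ("f1", f1, best_half_by_f1), ("f2", f2, keep_all)]
survivors, n_calls = staged_evaluation(X, stages)
full_cost = 3 * len(X)            # every solution evaluated on every target
print(len(survivors), n_calls, full_cost)
```

In this toy setting, fewer target evaluations are spent than when every solution is evaluated on all three targets, which is the saving the text refers to.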
     (Q1) Target Order Problem: A relevant question arises: “In what order should the targets be
evaluated?” Intuitively, this should depend on the actual evaluation times of the target functions
and their predicted target values. For the example discussed above, if the computational times
satisfy 𝑡(𝑓1) < 𝑡(𝑔1) < 𝑡(𝑓2), then an ordering of the targets by evaluation time runs from least to
most expensive: 𝑓1, followed by 𝑔1, followed by 𝑓2. But the
values of the target functions must also play an essential role in deciding on the order of evaluation.
Suppose 𝑔1 is found to be positive. This indicates that the solution is infeasible, and a
smart algorithm may decide not to evaluate 𝑓1 and 𝑓2 at all for this solution to save computational
time. That is well and good, but there is a problem with the above method. In order to know
their relative target function values, the solution has to be evaluated for all target functions. If all
functions are already evaluated, there is no need to do any ordering. This circular argument can
be resolved by making low-fidelity evaluations of offspring population members using the current
surrogate models. But since surrogate models are approximate, each surrogate model may have
a different prediction accuracy. Thus, the order of evaluation should depend on a solution’s rank
in the population based on its predicted target values (for example, in multi-objective NSGA-II,
a combination of non-dominated rank and crowding distance), the accuracy of the predictions, and
the computational time for high-fidelity evaluation of the target functions. Despite the importance
of 𝑔1 in determining the feasibility of a solution, it may turn out that computing the cheapest
objective function (𝑓1) first is more beneficial than the constraint evaluation for determining
whether the most expensive objective (𝑓2) needs to be evaluated at all. A combined metric for
ordering target functions is provided in Subsection 5.5.2.3.
     (Q2) Elimination Problem: The next question is as follows: “Under what circumstances should
an offspring solution be eliminated and not continued to be evaluated for the remaining targets?”
If all targets were evaluated for all solutions, the optimization method would not make any
use of separately computable targets. Therefore, some solutions need to be eliminated during the
evaluation process. Whereas the order defines what partial information should be made available
next, the elimination decides whether a solution is worth keeping and evaluating for more targets.
The decision is based on partial information: some targets have been evaluated and their
high-fidelity values are available, whereas others are only predicted by a surrogate. Another
valuable piece of information is the surrogate accuracy derived from the past, which helps to judge
how reliable the predictions are.
5.5.2    Methodology
5.5.2.1    Survival Under Uncertainty
An environmental selection, or survival, decides which solutions in a given set are the fittest
and shall survive. Under certainty, numerous survivals have been proposed in the literature,
for instance, a selection based simply on the objective values for unconstrained single-objective
genetic algorithms [1], or, in NSGA-II, survival based on non-dominated sorting and crowding
distance [10]. However, most environmental survivals proposed in the evolutionary computation
literature assume that the exact values for objectives and constraints are known. The goal of the
proposed probabilistic survival is to make existing survival procedures applicable under uncertainty.
By uncertainty, we refer to the situation in which some target values originate from a prediction
with an underlying error. Besides the situations where either all targets are based on predictions
or all are exact, the survival also needs to handle cases of mixed uncertainty, where some targets
are exact and some are predicted.
   Algorithm 5.2: Probabilistic Survival: Subset Selection under Uncertainty
     Input : Population 𝑃, Offspring 𝑄, Uncertain Targets 𝑉, Prediction errors 𝑒, Iterations 𝛾
     /* Initialize the survival counters */
     𝛼𝑖 ← 0 ∀𝑖 ∈ (1, . . . , |𝑄|)
     /* Repeat the experiment 𝛾 times */
     foreach 𝑘 ← 1 to 𝛾 do
         𝑀 ← merge_and_copy(𝑃, 𝑄)
         foreach 𝑖 ← 1 to |𝑀| do
             /* Add noise for uncertain targets */
             foreach 𝑣 ∈ 𝑉 do
                 𝑀𝑖[𝑣] ← 𝑀𝑖[𝑣] + $\mathcal{N}(0, e_v^2)$
             end
         end
         𝑀′ ← survival(𝑀)
         foreach 𝑖 ← 1 to |𝑄| do
             /* If survived, increase the counter */
             if 𝑄𝑖 ∈ 𝑀′ then 𝛼𝑖 ← 𝛼𝑖 + 1
         end
     end
     /* Convert survival counts to probabilities */
     foreach 𝑖 ← 1 to |𝑄| do 𝛼𝑖 ← 𝛼𝑖 / 𝛾
     return 𝛼
     The proposed probabilistic survival does not change an existing survival but calls it repeatedly
with some introduced error noise for predicted targets. The procedure is illustrated in Algorithm 5.2.
The inputs are a parent population 𝑃, the offspring 𝑄, a set of uncertain targets 𝑉, the average
prediction error 𝑒𝑣 of each target 𝑣 ∈ 𝑉, and the number of iterations 𝛾 for which the experiment is
repeated; the output is a survival probability 𝛼𝑖 for each offspring 𝑄𝑖. In total, the survival under
uncertainty calls the survival under certainty exactly 𝛾 times. In each iteration, first, the
population 𝑃 and the offspring 𝑄 are merged and copied to 𝑀. Then, for each solution and each
uncertain target 𝑣 ∈ 𝑉, Gaussian noise $\mathcal{N}(0, e_v^2)$ is added. The population 𝑀 with error
noise is sent to the survival selection, and the survivors are assigned to 𝑀′. If offspring 𝑖 has
survived and thus is in 𝑀′, its counter 𝛼𝑖 is increased by one. Finally, the survival counters 𝛼𝑖 are
converted to probabilities by dividing by the number of experiments conducted 𝛾.
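The procedure can be sketched in Python as follows. This is a minimal sketch, not the implementation used in this study: solutions are represented as dictionaries of target values, the function and parameter names are illustrative, and `truncation_survival` is a hypothetical single-objective stand-in for whatever deterministic survival (for example, the one of NSGA-II) is wrapped.

```python
import random


def probabilistic_survival(P, Q, uncertain, errors, gamma, survival, rng=None):
    """Estimate the survival probability of each offspring (cf. Algorithm 5.2).

    P, Q      : lists of dicts mapping target name -> value
    uncertain : names of the targets whose values are surrogate predictions
    errors    : dict mapping target name -> average prediction error e_v
    survival  : deterministic survival; takes a list of solutions and
                returns the indices of the survivors
    """
    rng = rng or random.Random()
    counts = [0] * len(Q)                      # initialized once, before the loop
    for _ in range(gamma):
        # Work on a perturbed copy so that the inputs stay untouched.
        M = [dict(s) for s in P + Q]
        for s in M:
            for v in uncertain:
                s[v] += rng.gauss(0.0, errors[v])   # std e_v, i.e. variance e_v^2
        survivors = set(survival(M))
        for i in range(len(Q)):
            if len(P) + i in survivors:        # offspring i sits at index |P| + i
                counts[i] += 1
    return [c / gamma for c in counts]


def truncation_survival(M, n_survive=2):
    """Hypothetical survival: keep the n_survive solutions with the smallest f."""
    order = sorted(range(len(M)), key=lambda i: M[i]["f"])
    return order[:n_survive]


P = [{"f": 1.0}, {"f": 2.0}]
Q = [{"f": 0.0}, {"f": 1.5}, {"f": 10.0}]      # predicted values
alpha = probabilistic_survival(P, Q, ["f"], {"f": 0.5}, gamma=200,
                               survival=truncation_survival, rng=random.Random(0))
exact = probabilistic_survival(P, Q, [], {}, gamma=5, survival=truncation_survival)
print(alpha, exact)
```

When no target is uncertain, no noise is added and the probabilities collapse to zero or one, as the second call illustrates.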
     With a mix of certain and uncertain targets, the proposed survival might look as follows. Assume
that the objective values 𝑓1 have already been assessed using the high-fidelity evaluation, and that
$\hat{f}_2$ and $\hat{g}_1$ are predicted by a surrogate with known prediction errors $e_{\hat{f}_2}$
and $e_{\hat{g}_1}$, respectively. In each iteration, error noise $\mathcal{N}(0, e_{\hat{f}_2}^2)$ is
added to $\hat{f}_2$, and error noise $\mathcal{N}(0, e_{\hat{g}_1}^2)$ is added to the constraint
$\hat{g}_1$. The outcome of multiple survival experiments with different realizations of the error
noise for the uncertain targets (𝑓2 and 𝑔1) lets us derive the probability of a solution to survive in
a mixed certain and uncertain environment.

  Algorithm 5.3: Probabilistic Surrogate-Guided Mating
    Input : Population 𝑃, Surrogate 𝑆, Prediction errors 𝑒, Mating Iterations 𝛽, Prob. Surv.
            Iterations 𝛾
    /* Regular mating used by EA */
    𝑄 ← mating()
    foreach 𝑘 ∈ (1, . . . , 𝛽) do
        /* Double the number of offspring */
        𝑄′ ← 𝑄 ∪ mating()
        /* Surv. prob. of each offspring */
        𝛼 ← prob_surv(𝑃, 𝑄′, 𝑉, 𝑒, 𝛾)
        /* Discard unpromising offspring */
        𝑄 ← top(𝑄′, |𝑄|, 𝛼, ’descending’)
    end
    return 𝑄
5.5.2.2       Probabilistic Surrogate-Guided Mating
Given the prediction and the average error for each target, one can calculate the survival probability
𝛼 for each offspring originating from mating. Since the mating only uses information about the
parents and no predictions, many solutions may have a relatively low survival probability and might
be directly discarded. In order to increase the survival probability of the offspring, the surrogate
predictions can be used directly during mating.
    Algorithm 5.3 shows the proposed modified mating. The idea is to repeat
the original mating procedure mating() multiple (𝛽) times while keeping only the most promising
offspring solutions. Initially, the offspring population 𝑄 is created. Then, in each iteration,
another offspring population is merged with it to create 𝑄′. For each solution in 𝑄′, the survival
probability is determined in order to keep the solutions most likely to survive. This is achieved by
taking the top |𝑄| solutions from 𝑄′ based on a descending sorting by 𝛼. After the process has been
repeated 𝛽 times, the solutions that have repeatedly survived are returned.
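The repeated mating can be sketched as below. This is a sketch under stated assumptions: `mating` and `prob_surv` are passed in as callables, and the toy `mating` and `prob` functions in the usage part are hypothetical stand-ins for the evolutionary operators and for the probabilistic survival on surrogate predictions.

```python
import random


def surrogate_guided_mating(mating, prob_surv, beta):
    """Repeat mating beta times, keeping the offspring most likely to survive.

    mating    : callable returning a list of new offspring
    prob_surv : callable mapping a list of offspring to survival probabilities
    """
    Q = mating()
    n = len(Q)
    for _ in range(beta):
        candidates = Q + mating()                  # doubles the number of offspring
        alpha = prob_surv(candidates)
        ranked = sorted(range(len(candidates)), key=lambda i: alpha[i], reverse=True)
        Q = [candidates[i] for i in ranked[:n]]    # top |Q| by survival probability
    return Q


# Toy usage: offspring are numbers in [0, 1), smaller values are assumed to be
# more likely to survive (hypothetical stand-in for prob_surv).
rng = random.Random(1)
generated = []

def mating():
    batch = [rng.uniform(0.0, 1.0) for _ in range(3)]
    generated.extend(batch)
    return batch

def prob(candidates):
    return [1.0 - x for x in candidates]

Q = surrogate_guided_mating(mating, prob, beta=5)
print(sorted(Q))
```

Because the best |𝑄| candidates are carried over in every round, the returned set consists of the most promising solutions among all offspring generated across the 𝛽 + 1 mating calls.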
5.5.2.3     Heterogeneously Expensive Evolutionary Algorithm (HE-EA)
The pseudo-code of the proposed heterogeneously expensive evolutionary algorithm (HE-EA) is
shown in Algorithm 5.4. HE-EA assumes that an approximation of evaluation times (ET) for each
target exists beforehand. However, if this should not be the case, the evaluation time can be kept
track of using a book-keeping approach after evaluating each target. Initially, a list of all targets
𝑉 is created where first all objectives and then all constraints appear. Afterward, HE-EA creates
a space-filling set of designs 𝑃. Then, for each target 𝑣 ∈ 𝑉, the initial population 𝑃 is evaluated
by calling the high-fidelity evaluation function evaluate(P,v), the survival error 𝜌𝑣 is set to
one, the surrogate 𝑆 𝑣 is fit, and the mean absolute error 𝑒 𝑣 is estimated using cross-validation.
Cross-validation provides a measure of how difficult each target is to model.
    Until the time limit of running the optimization procedure has been met, the algorithm’s main
loop is repeated. It starts by performing the mating procedure to generate the offspring population
𝑄 and predicting the objective and constraints predict(S,Q) using the surrogate 𝑆. Analogously,
the current population 𝑃 is copied to 𝑃′, and the surrogate is used to obtain approximations.
Next, the order in which the targets are supposed to be evaluated needs to be determined. The
order should be based on the trade-off between evaluation time ET and surrogate error 𝑒. The
function order(ET, 𝜌) first calculates an indicator value for each target. We propose a metric
called information gain (IG 𝑘 ) of the 𝑘-th target as the survival error (𝜌 𝑘 ) per unit evaluation time
(ET 𝑘 ), as follows:
$$\mathrm{IG}_k = \frac{\rho_k}{\mathrm{ET}_k}. \tag{5.3}$$
Targets with larger information gain are preferred to be evaluated first, and thus the target
evaluation order 𝜏 is given by sorting IG in descending order. To illustrate the intuition behind this
order, let us consider a few examples where two targets, 𝑣1 and 𝑣2, are compared with each other.
Assume both targets have the same evaluation time ET1 = ET2, but target one has a larger survival
error 𝜌1 > 𝜌2. This results in IG1 > IG2, and the first target is evaluated first. Intuitively, this is
the right decision because the effort for evaluation is identical, and the target with the larger
survival error can potentially eliminate more solutions already and thus save computation time. On
the other hand, let two targets have the same survival error 𝜌1 = 𝜌2, but let the first one have a
larger evaluation time ET1 > ET2. This results in IG1 < IG2. In this case, the survival error is
identical, and the target that is cheaper to evaluate is evaluated first. The intuition behind
information gain is to define a target order that gives preference to targets that are difficult to
predict by the surrogates or are computationally less expensive to evaluate.

  Algorithm 5.4: Heterogeneously Expensive Evolutionary Algorithm (HE-EA)
    Input : Evaluation Times ET, Min. Survival Prob. 𝛼^(min)
    /* Initialize the target vector */
    𝑉 ← (𝑓1, . . . , 𝑓𝑚, 𝑔1, . . . , 𝑔𝐽)
    /* Sample design of experiments */
    𝑃 ← doe()
    foreach 𝑣 ∈ 𝑉 do
        evaluate(𝑃, 𝑣)
        𝜌𝑣 ← 1.0
        𝑆𝑣 ← fit_surrogate(𝑃, 𝑣)
        𝑒𝑣 ← estm_mae(𝑆, 𝑃, 𝑣)
    end
    while time left do
        /* Create the offspring population */
        𝑄 ← prob_mating(𝑃, 𝑄, 𝑆, 𝑒, 𝛾, 𝛽)
        /* Prediction of P and Q */
        𝑄′ ← predict(𝑆, 𝑄)
        𝑃′ ← predict(𝑆, 𝑃)
        /* The order to eval. targets */
        𝜏 ← order(ET, 𝜌)
        /* Targets with uncertainty */
        𝑉^(𝑈) ← 𝑉
        /* Survival prob. before evaluation */
        𝛼^(0) ← prob_surv(𝑃′, 𝑄′, 𝑉^(𝑈), 𝑒)
        foreach 𝑘 ← 1 to |𝑉| do
            𝑣 ← 𝑉[𝜏𝑘]
            /* Evaluate and copy targets */
            copy(𝑃, 𝑃′, 𝑣); evaluate(𝑄′, 𝑣)
            𝑉^(𝑈) ← 𝑉^(𝑈) \ {𝑣}
            𝛼^(𝑘) ← prob_surv(𝑃′, 𝑄′, 𝑉^(𝑈), 𝑒)
            /* Calculate the survival error */
            𝜌𝑣 ← $\sum_{i=1}^{|Q'|} |\alpha_i^{(k)} - \alpha_i^{(k-1)}|$
            /* Fit target surrogates and calc. 𝑒 */
            𝑆𝑣 ← fit_surrogate(𝑃, 𝑣)
            𝑒𝑣 ← mae(𝑆, 𝑃, 𝑣)
            /* Eliminate unpromising solutions */
            𝑄 ← eliminate(𝑄, 𝛼^(𝑘), 𝛼^(min))
            /* If all solutions were eliminated */
            if |𝑄| = 0 then break
        end
        𝑃 ← survival(𝑃 ∪ 𝑄)
    end
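The ordering by information gain in Equation (5.3) can be illustrated with a small sketch. The survival errors and evaluation times below are hypothetical values chosen only for illustration, and `target_order` is an illustrative name rather than a function of an existing library.

```python
def target_order(survival_error, eval_time):
    """Sort target names by information gain IG = rho / ET, in descending order."""
    ig = {v: survival_error[v] / eval_time[v] for v in survival_error}
    return sorted(ig, key=ig.get, reverse=True), ig


rho = {"f1": 0.20, "f2": 0.20, "g1": 0.10}   # hypothetical survival errors
et = {"f1": 1.0, "f2": 19.0, "g1": 5.0}      # hypothetical evaluation times
order, ig = target_order(rho, et)
print(order)
```

Here 𝑓1 and 𝑓2 have the same survival error, so the cheaper 𝑓1 comes first, and 𝑔1 precedes 𝑓2 because its lower evaluation time outweighs its smaller survival error.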
     Before starting with the evaluation procedure, the uncertain targets 𝑉^(𝑈) are initialized to be
all targets 𝑉, and the initial survival probabilities 𝛼^(0) are obtained by executing the probabilistic
survival (see Algorithm 5.2). The evaluation process of the offspring 𝑄 loops over the targets
𝑉 in the order of 𝜏 using the counter variable 𝑘. In each iteration, the target 𝑣 ← 𝑉[𝜏𝑘] is
evaluated for the offspring, and its exact values are copied over for the population. After removing
the current target 𝑣 from the remaining uncertain targets 𝑉^(𝑈), the new survival probability 𝛼^(𝑘) is
calculated, and the survival error 𝜌𝑣 is determined. The survival error is the mean absolute error
between the two different 𝛼’s and represents the error introduced by the prediction of target 𝑣.
Afterward, the surrogate used for the predictions of target 𝑣 is updated, and its prediction error is
recomputed. At the end of each target iteration, offspring with a survival probability
𝛼𝑖^(𝑘) ≤ 𝛼^(min) are eliminated. The elimination of unpromising offspring saves time because the
evaluation of their remaining targets is skipped.
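The two bookkeeping steps of each target iteration, the survival error and the elimination, can be sketched as follows. The sketch follows the description in the text (mean absolute difference of survival probabilities, elimination at 𝛼 ≤ 𝛼^(min)); the probability values and offspring labels are hypothetical.

```python
def survival_error(alpha_prev, alpha_new):
    """Mean absolute change in survival probability after evaluating a target."""
    diffs = [abs(a - b) for a, b in zip(alpha_new, alpha_prev)]
    return sum(diffs) / len(diffs)


def eliminate(offspring, alpha, alpha_min):
    """Drop offspring whose survival probability is at most alpha_min."""
    return [q for q, a in zip(offspring, alpha) if a > alpha_min]


alpha_0 = [0.9, 0.5, 0.4, 0.1]   # hypothetical, before evaluating target v
alpha_1 = [1.0, 0.2, 0.6, 0.1]   # hypothetical, after evaluating target v
rho_v = survival_error(alpha_0, alpha_1)
remaining = eliminate(["q1", "q2", "q3", "q4"], alpha_1, alpha_min=0.3)
print(rho_v, remaining)
```

A large value of `rho_v` indicates that the exact values of target 𝑣 changed the survival picture considerably, which is why such a target is preferred early in the evaluation order.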
     At the end of the evaluation process, the deterministic survival of the remaining fully evaluated
offspring population and the current population is performed. In some iterations, no offspring
might be left due to the iterative elimination, and one might wonder if the algorithm can be caught
in a deadlock. However, in such iterations, the surrogate for each evaluated target has been updated
using the already eliminated but partially evaluated offspring solutions, so the predictions change
in the next iteration. The procedure is then repeated
until the time limit of running the algorithm has been reached. A time limit as a termination
criterion instead of counting solution evaluations is recommended because it accounts for both full
and partial evaluations of individuals.
     Let us have a close look at one iteration of HE-EA for an optimization problem with two
objectives (𝑓1, 𝑓2) and one constraint (𝑔1). Let us assume that the target order has been determined
to be (𝑓1, 𝑔1, 𝑓2). Figure 5.7 illustrates the process of evaluating the offspring population as a
flowchart. Initially, the current population 𝑃 is copied and predicted by the surrogate to become 𝑃′.
This step can be essential because it ensures that predictions are only compared with predictions,
that is, apples with apples and oranges with oranges. However, if all surrogates fit through the data
set exactly, this step is skipped. After generating the predicted population, the offspring
population 𝑄 is created by surrogate-guided mating, and their predictions are available. In the
flowchart, targets predicted by surrogates are shown in blue, and the ones with exact values are
shown in orange. In the first iteration of the elimination-based evaluation, the parent and the
offspring population are predicted by the surrogate, and the survival probability 𝛼^(0) is calculated.
     After evaluating the first target 𝑣 ← 𝑓1, the survival probability 𝛼^(1), with no error noise for
𝑓1, is calculated. The survival error 𝜌𝑓1 is then determined as the mean absolute error between the
survival probabilities 𝛼^(0) and 𝛼^(1). Before moving on to the next target, all offspring members
with 𝛼^(1) not exceeding 𝛼^(min) are eliminated. Then, the target evaluation is continued by
evaluating 𝑔1, performing the probabilistic survival resulting in 𝛼^(2), and updating the survival
error 𝜌𝑔1. Analogously, in the last iteration, 𝑓2 is evaluated. It is worth pointing out that 𝛼^(3)
does not require any probabilistic survival because
after having evaluated all targets, no noise will be added, which results in a deterministic survival
procedure. Thus, the survival probabilities are either zero or one. Finally, the population and the
fully evaluated offspring population are sent to the survival operator to determine the population for
the next iteration. Notice that the above algorithm also degenerates to single-objective constrained
problems with heterogeneous evaluation times among the objective (𝑓) and multiple constraints (𝑔).
     Figure 5.7: One iteration of HE-EA consisting of an ordered target evaluation (𝑓1, 𝑔1, 𝑓2) and offspring eliminations.
5.5.2.4   Surrogate Management
Even though surrogate modeling is not the focus of this study, a few words need to be said to
complete the algorithm description. In this study, each target value is modeled independently to
avoid any error accumulation across targets. Thus, in total 𝑀 + 𝐽 surrogate models are built [23].
For each target, different surrogate models are fitted, and the one with the least mean absolute
error is chosen as the predictor. We have used two different types of models: Radial Basis Functions
(RBFs) [34] and Kriging [35, 55]. Each of them is instantiated with different hyper-parameters,
resulting in a list of potential models from which one is selected. It is worth mentioning that the
number of data points for each model may vary because of partial evaluations. In order to decrease
the computational burden of surrogate fitting, in this study, only the most recent 200 data points
are considered. The mean absolute error is estimated in each iteration by using the existing data
points as the training set and the newly evaluated infill solutions as the validation set. This
allows a realistic error estimation in each iteration.
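The model selection described above can be sketched as follows. To keep the example self-contained, a simple inverse-distance-weighting interpolator with different exponents stands in for the RBF and Kriging models with different hyper-parameters; this substitution, the function names, and the toy data are assumptions made for illustration only.

```python
def mae(model, X, y):
    """Mean absolute error of a model on the data set (X, y)."""
    return sum(abs(model(x) - t) for x, t in zip(X, y)) / len(y)


def fit_idw(X, y, power):
    """Inverse-distance-weighting interpolator (a stand-in for RBF/Kriging)."""
    def predict(x):
        num = den = 0.0
        for xi, yi in zip(X, y):
            d = sum((a - b) ** 2 for a, b in zip(x, xi)) ** 0.5
            if d == 0.0:
                return yi                      # interpolates training points exactly
            w = d ** -power
            num += w * yi
            den += w
        return num / den
    return predict


def select_surrogate(X_train, y_train, X_val, y_val, powers=(1, 2, 4), window=200):
    """Fit one candidate per hyper-parameter and keep the one with the least MAE."""
    X_train, y_train = X_train[-window:], y_train[-window:]   # most recent points only
    candidates = {p: fit_idw(X_train, y_train, p) for p in powers}
    errors = {p: mae(m, X_val, y_val) for p, m in candidates.items()}
    best = min(errors, key=errors.get)
    return candidates[best], errors[best], best


X_train = [(i / 10,) for i in range(11)]
y_train = [x[0] ** 2 for x in X_train]
X_val = [(0.05,), (0.55,), (0.95,)]            # plays the role of new infill solutions
y_val = [x[0] ** 2 for x in X_val]
model, err, best = select_surrogate(X_train, y_train, X_val, y_val)
print(best, err)
```

The validation error returned alongside the selected model plays the role of the prediction error 𝑒𝑣 that is passed to the probabilistic survival.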
5.5.3    Results and Discussions
The capability of the proposed method to exploit independently computable and heterogeneously
expensive optimization problems shall be examined next.
    We have chosen NSGA-II [10] with a population size of 100 for unconstrained and constrained
bi-objective problems. For problems with three objectives, we have used NSGA-III [279, 258, 12]
with 91 reference directions originating from uniform weight sampling [280] with a partition
number of 12. For both algorithms, the default parameter settings proposed in the papers and
defined in their implementations in the Python multi-objective optimization framework
pymoo [29] are used. In each experiment, three different algorithms are compared with each
other. The first is the baseline algorithm without any modifications; it represents an optimization
method waiting for all targets to be evaluated despite their heterogeneous evaluation times. The
second is a modification of the baseline algorithm incorporating the Elimination-Based Evaluation
(EBE) with the default mating procedure, where the survival probabilities are assessed by repeating
the survival a hundred times (𝛾 = 100). The third is the complete heterogeneously expensive
(HE) evolutionary algorithm with surrogate-guided mating (𝛽 = 30) and elimination-based
evaluation. Considering these two variations separately helps to attribute the gains to eliminating
unpromising solutions during evaluation and to creating solutions more likely to survive beforehand.
Throughout the experiments, we have fixed the minimum survival probability to 𝛼^(min) = 0.3.
Some numerical experiments have shown that this seems to be a reasonable value for discarding
unpromising individuals. When proposing a new method, the number of hyper-parameters should be
kept as small as possible. It is worth mentioning that in this study, the hyper-parameters 𝛾, 𝛽, and
𝛼^(min) have an intuitive meaning, which helps to set them properly. Their values were determined
through an empirical study, and the performance of configurations close to the suggested one has
been shown to be similar. Thus, no significant sensitivity could be observed.
     The performance of the proposed methods is assessed on unconstrained and constrained multi-
objective test problems where the time for objective and constraint functions has been systematically
varied. We have conducted 11 runs for each problem and algorithm to address the stochastic
behavior of the underlying optimization method. All tables presented in this section show the
average IGD values [240]. The best-performing algorithm for each problem is marked as the winner
(∗), other algorithms whose performance is not significantly different (Wilcoxon signed-rank test,
𝑝 = 0.05) are labeled by (≈), and those performing significantly worse are labeled by (−). Moreover,
we have denoted the number of variables by 𝑁, the number of objectives by 𝑀, and the number of
constraints by 𝐽 for each problem. Starting with bi-objective problems, we have used the ZDT [253]
problem suite, where the total evaluation time for a solution is fixed to 20 time units. The overall
evaluation budget is set to seven hours for ZDT1-3 and to 10 hours for ZDT4. This corresponds to 13
and 18 generations of fully evaluated individuals, respectively. For the experiment, the evaluation
time of the first objective is set to 1, 5, 10, 15, or 19, and that of the second is set
complementarily to make their sum equal to 20.
The results are shown in Table 5.2. First, one can note that the different evaluation times always
lead to identical results for the baseline algorithm, caused by the algorithm waiting for all targets to
be evaluated before proceeding. Second, EBE and HE were both able to outperform the NSGA-II
Table 5.2: Average IGD values for unconstrained bi-objective problems from the ZDT test suite.
NSGA-II does not use heterogeneous evaluation time information and hence produces an identical IGD
value for all evaluation time combinations.

    𝑡(𝑓1, 𝑓2)     NSGA-II          EBE-NSGA-II      HE-NSGA-II
    ZDT1 (𝑁 = 10, 𝑀 = 2, 𝐽 = 0)
    (1,19)         0.3258 (−)       0.1053 (−)       0.0166 (*)
    (5,15)         0.3258 (−)       0.1078 (−)       0.0169 (*)
    (10,10)        0.3258 (−)       0.1162 (−)       0.0120 (*)
    (15,5)         0.3258 (−)       0.0968 (−)       0.0119 (*)
    (19,1)         0.3258 (−)       0.0882 (−)       0.0099 (*)
    ZDT2 (𝑁 = 10, 𝑀 = 2, 𝐽 = 0)
    (1,19)         0.6457 (−)       0.1942 (−)       0.0099 (*)
    (5,15)         0.6457 (−)       0.2303 (−)       0.0107 (*)
    (10,10)        0.6457 (−)       0.2123 (−)       0.0139 (*)
    (15,5)         0.6457 (−)       0.2233 (−)       0.0148 (*)
    (19,1)         0.6457 (−)       0.2316 (−)       0.0143 (*)
    ZDT3 (𝑁 = 10, 𝑀 = 2, 𝐽 = 0)
    (1,19)         0.2009 (−)       0.0854 (−)       0.0376 (*)
    (5,15)         0.2009 (−)       0.0930 (−)       0.0474 (*)
    (10,10)        0.2009 (−)       0.0777 (−)       0.0348 (*)
    (15,5)         0.2009 (−)       0.0597 (−)       0.0324 (*)
    (19,1)         0.2009 (−)       0.0540 (−)       0.0194 (*)
    ZDT4 (𝑁 = 5, 𝑀 = 2, 𝐽 = 0)
    (1,19)        27.7984 (−)      19.4416 (*)      20.4623 (≈)
    (5,15)        27.7984 (−)      21.1689 (≈)      17.5283 (*)
    (10,10)       27.7984 (−)      17.7420 (≈)      16.9622 (*)
    (15,5)        27.7984 (−)      16.2275 (*)      16.2289 (≈)
    (19,1)        27.7984 (−)      14.9382 (*)      15.5179 (≈)
    Total          0 (*)            3 (*)           17 (*)
                   0 (≈)            2 (≈)            3 (≈)
                                          20 (−)          15 (−)       0 (−)
no matter what time variation of 𝑡 ( 𝑓1 , 𝑓2 ) has been chosen. By comparing EBE and HE with each
other, one can conclude that increasing the probability of a solution surviving during mating is in
general helpful. For ZDT1 and ZDT2, HE achieves much better results than EBE, and for ZDT3, the
results are still significantly better. For ZDT4, none of the methods can converge close enough to the


           Table 5.3: Average IGD values for the constrained bi-objective problem TNK.
                                                 TNK
                                         (𝑁 = 2, 𝑀 = 2, 𝐽 = 2)
                        𝑡( 𝑓 ,𝑔)    NSGA-II      EBE-NSGA-II        HE-NSGA-II
                        (1,19)                      0.0034 (*)       0.0043 (≈)
                        (5,15)                     0.0035 (−)        0.0030 (*)
                        (10,10)    0.0214 (−)      0.0040 (−)        0.0031 (*)
                        (15,5)                     0.0043 (−)        0.0030 (*)
                        (19,1)                     0.0046 (−)        0.0030 (*)
                                      0 (*)            1 (*)            4 (*)
                        Total         0 (≈)            0 (≈)            1 (≈)
                                      5 (−)            4 (−)            0 (−)
true Pareto front, given the limited evaluation budget. Thus, for ZDT4, even though both methods
outperform NSGA-II, no clear winner can be declared. Altogether, we can conclude that for the
considered unconstrained bi-objective problems, HE-NSGA-II shows the best results by winning
17/20 test instances and being statistically indistinguishable from the winner in the remaining ones.
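The tables in this section report IGD values without restating how the indicator is computed. As a minimal sketch, assuming the commonly used definition (the mean Euclidean distance from each point of a reference Pareto front to its nearest obtained solution), the computation can be illustrated as follows. The function name and the sample data are hypothetical and not taken from the experiments; the exact variant used in [240] may differ in detail.

```python
import math


def igd(reference_front, obtained_set):
    """Mean distance from each reference point to its nearest obtained point.

    Assumes the common IGD definition; smaller values are better.
    """
    total = 0.0
    for ref in reference_front:
        total += min(math.dist(ref, sol) for sol in obtained_set)
    return total / len(reference_front)


# Hypothetical two-objective data, for illustration only.
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
close_set = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]  # matches the reference
far_set = [(1.0, 1.0)]                            # a single poor solution

print(igd(reference, close_set))
print(igd(reference, far_set))
```

Because every reference point contributes to the average, the indicator penalizes both poor convergence and gaps in the coverage of the front.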
    One benefit of the proposed approach is considering groups of targets and the capability of
extending the concept of heterogeneously expensive functions to multiple objectives and constraints.
This will become apparent when discussing the following constrained multi-objective problems.
First, we investigate TNK [9, 261], which has two objectives and two constraints and a discontinuous
Pareto front. We have considered all objectives and all constraints, each as a group of targets. This
imitates the real-world scenario where Software A returns both objective values and Software B
calculates the constraints.
    The evaluation time variations have been set analogously to the unconstrained bi-objective
problems, and for each run, the time limit is set to three hours. The results listed in Table 5.3 show
the superiority of EBE and HE over the NSGA-II. Across all time variations, EBE and HE converge
to the Pareto-optimal set. For four out of five problems, HE-NSGA-II turns out to be significantly
the best performing method. Interestingly, for 𝑡 ( 𝑓 , 𝑔) = (1, 19), which represents the case of objectives
being much less computationally expensive than constraints, EBE performs marginally better.
    For the welded-beam design problem [281], the first objective 𝑓1 is the cost of fabricating the


        Table 5.4: Average IGD values for the constrained bi-objective problem Welded Beam.
                                              Welded Beam
                                           (𝑁 = 4, 𝑀 = 2 ,𝐽 = 4)
                                   NSGA-II    EBE-NSGA-II           HE-NSGA-II
                               𝑡   0.078 (−)    0.0577 (−)            0.0168 (*)
welded beam and can be written in a closed-form mathematical term. Thus, it is relatively quick to
compute. The second objective 𝑓2 and constraints 𝑔1 and 𝑔2 are the deflection of the beam-end and
hence belong to the same target function group. Constraint 𝑔4 is less time-consuming, but 𝑔3 is the
buckling load, requiring more computational effort. The following relative computational times are
considered for the target functions: 𝑡 ( 𝑓1 ) = 1, 𝑡 ({ 𝑓2 , 𝑔1 , 𝑔2 }) = 12, 𝑡 (𝑔3 ) = 12, 𝑡 (𝑔4 ) = 1. Again,
the time limit has been set to three hours.
      Overall, the results indicate that HE-NSGA-II performs significantly better than the two
competitors. However, it is also worth mentioning that the difference between the average IGD values
is relatively small. Some further analysis has shown that this is caused by the { 𝑓2 , 𝑔1 , 𝑔2 } group of
targets being responsible for a relatively high survival prediction error. Therefore, even though
its evaluation is more time-consuming than that of the others, it is scheduled first when the order of
targets 𝜏 is determined. This clearly shows the challenge of defining the evaluation order for
more complex target groups with mixed complexity and expense.
      Moreover, the proposed methods are analyzed on DTLZ2 [257], an unconstrained three-objective
optimization problem. The experiment is set up so that the evaluation times of all objectives
𝑓1 , 𝑓2 , and 𝑓3 sum up to 30. In total, we have run the methods for 16 different combinations,
varying the expensiveness from being completely homogeneous (10,10,10) to one objective being
28 times more computationally expensive to evaluate than the cheapest function. The time limit
for each run has been set to four hours. The average IGD values for all time variations
are shown in Table 5.5. Whereas EBE improves on the performance of NSGA-III, incorporating a
more sophisticated mating in HE-NSGA-III outperforms the other competitors significantly across
all problems. This implies that the prediction of values during the run was quite accurate and that


               Table 5.5: Average IGD values for the three-objective problem DTLZ2.
                                                    DTLZ2
                                             (𝑁 = 10, 𝑀 = 3, 𝐽 = 0)
                        𝑡 ( 𝑓1 , 𝑓2 , 𝑓3 ) NSGA-III EBE-NSGA-III HE-NSGA-III
                        (28,1,1)                        0.1992 (−)  0.0794 (*)
                        (1,28,1)                        0.2053 (−)  0.0806 (*)
                        (1,1,28)                        0.1926 (−)  0.0820 (*)
                        (25,4,1)                        0.2051  (−) 0.0802 (*)
                        (25,1,4)                        0.2077  (−) 0.0791 (*)
                        (1,25,4)                        0.2050  (−) 0.0811 (*)
                        (4,25,1)                        0.2017  (−) 0.0810 (*)
                        (1,4,25)           0.2824 (−)   0.2015  (−) 0.0825 (*)
                        (4,1,25)                        0.2011  (−) 0.0815 (*)
                        (15,10,5)                       0.2068  (−) 0.0916 (*)
                        (15,5,10)                       0.2095  (−) 0.0957 (*)
                        (5,15,10)                       0.2021  (−) 0.0930 (*)
                        (10,15,5)                       0.2088  (−) 0.0961 (*)
                        (5,10,15)                       0.2049  (−) 0.0889 (*)
                        (10,5,15)                       0.2027  (−) 0.0901 (*)
                        (10,10,10)                      0.2047 (−)  0.1040 (*)
                                              0 (*)        0 (*)      16 (*)
                        Total                 0 (≈)        0 (≈)       0 (≈)
                                             16 (−)       16 (−)       0 (−)
putting some more bias into the offspring population has shown its effect. Another interesting
fact is that the more heterogeneous the evaluations become, the better the results. For
homogeneous times (10,10,10), HE-NSGA-III achieved an average IGD value of 0.1040, which
decreases to 0.0794 for (28,1,1).
    Lastly, some visualizations of the objective space from different optimization problems are
discussed. Figure 5.8 shows the median performing runs for the baseline algorithms (NSGA-II or
NSGA-III) and their EBE and HE variants. We have chosen some representative evaluation times
for each problem. The scatter plots confirm the discussion of the results based on IGD values and
give the reader an idea of the differences in convergence and diversity to expect by exploiting the
heterogeneity. For ZDT1-3 (see Figure 5.8a to 5.8c), one can observe the significant difference
between the baseline approach and the proposed variants. For ZDT4 (see Figure 5.8d), the limited
evaluation budget was not sufficient to converge for any method. It is worth noting that the figure


         (a) ZDT1-(10,10)                     (b) ZDT2-(1,19)                    (c) ZDT3-(5,15)
          (d) ZDT4-(19,1)                    (e) Welded Beam                      (f) TNK-(1,19)
      (g) DTLZ2-(10,10,10)                 (h) DTLZ2-(28,1,1)                   (i) Carside Impact
Figure 5.8: An illustration of the objective space for different types of unconstrained and constrained
multi-objective problems. The results are based on a representative run of median performance
for each problem. The different expensiveness of target functions and termination criteria are set
analogously to the other experiments. The visibly better red-colored points are obtained using the
proposed HE-NSGA-III procedure.
reveals that EBE achieves a better diversity than HE. For Welded Beam (see Figure 5.8e), HE-
NSGA-II can find more solutions with a smaller value of 𝑓1 whereas for TNK (see Figure 5.8f),
visually no significant difference can be observed. For DTLZ2, Figures 5.8g and 5.8h show that a
larger amount of heterogeneity in fact helps to converge faster and find a more diverse set. Another
problem where the diversity of solutions has been shown to be significantly better is the Carside
Impact problem shown in Figure 5.8i. The problem consisting of three objectives and 10 con-
straints has been set up so that the objectives are computationally inexpensive and the constraints
computationally expensive. Altogether, the visual inspections of the median runs of the experiment
show how the exploitation of heterogeneously expensive functions can improve the convergence of
an existing algorithm.
5.5.4   Summary of Section 5.5
In this section, the differently expensive target functions have been exploited by an elimination-
based evaluation which discards partially evaluated solutions based on their likelihood of surviving.
The concept can consider each target function separately but also handle target groups. This can
especially be useful for practitioners when a software package returns more than one objective
or constraint to be used in optimization. Moreover, the proposed approach is applicable to other
evolutionary methods that incorporate an elitist environmental survival. This has been demonstrated
by adding support for differently expensive targets to two well-known multi-objective
algorithms. Results on unconstrained and constrained bi-objective and multi-objective problems
indicate that the proposed method can efficiently exploit differences in the evaluation times of the
target functions.
5.6   Summary of the Chapter
This chapter has investigated the evaluation of independently computable functions during opti-
mization. Four different strategies for evaluating the objectives and constraints of a solution set
have been proposed. Afterward, optimization problems with inexpensive constraints and expensive
objectives were studied. The optimization has started with a well-spaced point set in the feasible
space; the inexpensiveness of constraints is exploited by filtering out any kind of infeasible solu-
tion before evaluating the more expensive objectives. Then, problems with heterogeneity across
constraints and objectives have been considered. There, the strategy of evaluating a set of solutions
for a specific target (B/E) has been used to handle constrained multi-objective optimization prob-
lems with heterogeneous evaluation times. The proposed evolutionary algorithm has addressed the
order of targets during evaluation by sorting the targets by the survival prediction error divided by
evaluation time.
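The ordering rule stated above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the function name is hypothetical, the error values are made up, and the descending direction (highest error per unit time first) is inferred from the welded-beam discussion, where the group with a high survival prediction error was scheduled first despite being expensive. The evaluation times are the relative times used for the welded-beam problem.

```python
def order_targets(prediction_error, evaluation_time):
    """Sort target groups by survival prediction error divided by evaluation time.

    Groups with the highest ratio come first (assumed direction).
    """
    score = {
        name: prediction_error[name] / evaluation_time[name]
        for name in prediction_error
    }
    return sorted(score, key=score.get, reverse=True)


# Relative evaluation times from the welded-beam experiment.
evaluation_time = {"f1": 1.0, "f2,g1,g2": 12.0, "g3": 12.0, "g4": 1.0}
# Hypothetical survival prediction errors, for illustration only.
prediction_error = {"f1": 0.01, "f2,g1,g2": 0.60, "g3": 0.06, "g4": 0.02}

order = order_targets(prediction_error, evaluation_time)
print(order)
```

With these made-up errors, the expensive group { 𝑓2 , 𝑔1 , 𝑔2 } is still placed first because its error is large enough to outweigh its evaluation time.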
    The work in this chapter can be extended in three directions. First, a further complication of the
expensiveness of target functions is worth investigating. In this work, the evaluation time of each
target function is kept constant, independent of the solution being evaluated. However, this
need not be the case when solving real-world problems. Besides requiring a more
sophisticated book-keeping approach of evaluation times, this also introduces another level of
uncertainty to the target ordering problem, which needs to be addressed.
    Second, the generational evaluation approach proposed in this proof-of-principle study can be
extended to develop a steady-state method for further gain in overall computational time. In such
a method, each offspring member can be evaluated with respect to the parent population in an
appropriate adaptively obtained order of the target functions. Its acceptance and elimination can
be determined using surrogate models.
    Third, this chapter has focused on how to take advantage of heterogeneous functions for
objectives and constraints within a population-based optimization algorithm. Although surrogate
models are created and used to determine the order of evaluating target functions and eliminating
evaluation of some population members depending on their multi-objective rank and evaluation
time, the surrogate models themselves can be exploited further in arriving at better infill solutions.
We defer such studies, which will eventually allow a complete surrogate-assisted
heterogeneity-handling EC method that would be more practically viable than the usual all-at-a-time
evaluation-based algorithms. Nevertheless, our suggestion of a target function ranking scheme based
on a member’s worth in terms of non-domination and diversity in the population, the accuracy of the
prediction models, and the actual computational times is unique and marks a starting point for future
studies on solving heterogeneously expensive problems.


                                             CHAPTER 6
                                 REAL-WORLD APPLICATIONS
This chapter presents two case studies of real-world applications with computationally expensive
objective functions. The first case study consists of the optimization of a cylinder head water
jacket design considering two objectives. The article was originally published in [28] and is
presented with minor modifications to ensure consistency throughout this thesis. Two of the three
elementary questions of surrogate assistance, What and With What, have been answered analogously
to Chapter 4. However, the How question is addressed by employing a local search on the surrogates
using a trust-region approach. It is worth mentioning that this study is one of the outcomes of a
two-year collaboration with a well-known automobile company.
    The second case study addresses the optimization of the design of an electric machine with two
objectives and various (geometric) constraints. Analogously to the first case study, the notation has
been adapted for consistency based on the original article published in [282]. In total, 10 design
variables with a precision of two decimal places are considered. The problem consists of compu-
tationally expensive objectives but much less time-consuming constraints. The inexpensiveness of
constraints is an important fact to be exploited by the optimization method. Both case studies shall
give some insights into how to deal with computationally expensive problems in practice.
6.1    Case Study I: Cylinder Head Water Jacket Optimization
6.1.1    Introduction
Evolutionary algorithms are robust optimization tools that can handle different challenges of prac-
tical optimization problems such as uncertainty [283], multimodality [284], constraints [263], and
conflicting objectives [9]. These algorithms do not require any major simplification of the actual
problem, a matter which is often required by the classical point-based methods [285]. However,
this flexibility usually comes at the cost of a requirement for a high number of solution evaluations.
For example, the evaluation budget in the BBOB2009 test suite was set to 10^6 𝐷 [286], in which 𝐷 is
the number of variables in the problem.
    In many applications, commonly referred to as computationally expensive problems, the eval-
uation of each solution requires a significant amount of computation time and effort. For example,
in optimal mechanical design, the evaluation of each candidate design may require a finite element
or computational fluid dynamic analysis, which can take anywhere from a few hours to a few
days. In some extreme cases, a solution evaluation may even require a costly prototyping and
experimental testing process [287]. The computation time of a design evaluation is the bottleneck
of the optimization process even if the evaluation process can be parallelized; therefore, in such
applications, the evaluation budget is generally limited to a few hundred or less.
    The goal in such problems is to make the most out of the available evaluation budget, that
is, to fulfill the design objective(s) as much as possible. A limited evaluation budget underlines the
importance of a careful selection of candidate solutions for evaluation, which motivates the use of
surrogate models [104], also known as metamodels, function approximation [121], and response
surface methodologies [89].
    One decisive feature of surrogate-assisted algorithms is the metamodel update, which is also
known as model management or evolution control [105]. Quite often, a single metamodel is used,
which is updated during the optimization process [121]. The most suitable metamodel is not known
beforehand and the choice of the metamodel is often based on its popularity in the related field.
Such metamodel-assisted algorithms are generally less accurate than more sophisticated methods
which employ an ensemble of metamodels. Updating the surrogate is more complicated in the latter
group and requires a metric to assess candidate metamodels in order to select the most suitable
metamodel at each cycle. Another prominent algorithmic aspect of surrogate-assisted methods is
the way to select new solutions for high-fidelity evaluation, which is referred to as infill criterion or
infill strategy [121]. Given recent developments in parallel computing, it is practically crucial
that a surrogate-assisted method be able to select several infill solutions so that they can be evaluated
in parallel to reduce the wall-clock time of the optimization process.
    This section develops a surrogate-assisted evolutionary algorithm for single and multi-objective
optimization problems. This method, called Proximity-Based Surrogate-Assisted Evolutionary
Algorithm (PSA-EA), selects new candidate solutions following two goals. First, these solutions
should optimize the predicted objective values, and second, they should maximize the information
collected about specific regions of the problem landscape. This industry-motivated method is
tailored to problems with the following features:
     • The number of high-fidelity solution evaluations is limited to 50-100.
     • The number of variables varies between 5 and 10.
     • The required computational time for each solution evaluation is high (a few minutes to a
       few days); therefore, the computation time for other parts of the optimization process is
       negligible.
     • The number of cycles is limited; thus, the method must be able to select several infill solutions
       in parallel.
6.1.2    Proposed Proximity-Based Surrogate-Assisted Optimization Method
This section elaborates on the Proximity-Based Surrogate-Assisted Evolutionary Algorithm (PSA-EA) for
computationally expensive single-objective and multi-objective problems.
6.1.2.1    Selection of Initial High-Fidelity Solutions
The selection of initial high-fidelity solutions is an important phase because the sampling strategy
directly affects the goodness of the initial metamodel. The space-filling approach in this section
is based on the Maximin method [288], which aims to maximize the minimum distance among
all solutions. The employed algorithm in this study is an adaptation of the initialization procedure
in [289]: The first initial point is generated randomly. Then, a new random point is sampled and
accepted if it maintains a distance of at least 𝑅0ini from all already selected solutions; otherwise, it is
discarded and a new random point is generated. If several successive attempts are rejected (e.g.,
100 attempts), 𝑅0ini is slightly reduced (𝑅0ini ← 0.99 𝑅0ini ). This process continues until all initial
high-fidelity solutions are generated. The distance variable 𝑅0ini is initially set to a conservatively
large value, which is half of the maximum distance that can exist in the search space between two
points (𝑅0ini = 0.5 ∥𝑿^(𝑈) − 𝑿^(𝐿)∥_2 ).
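The initialization procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the function and parameter names are hypothetical, sampling is uniform within box bounds, and the rejection counter is reset after every acceptance or radius reduction. The values 100 and 0.99 are the examples given in the text.

```python
import math
import random


def sample_initial_points(n_points, lower, upper, max_fail=100, shrink=0.99, seed=0):
    """Rejection sampling with a shrinking minimum-distance radius.

    Returns the accepted points and the final value of the radius.
    """
    rng = random.Random(seed)
    # Start with half of the largest distance possible inside the box.
    radius = 0.5 * math.dist(upper, lower)

    def random_point():
        return tuple(rng.uniform(lo, up) for lo, up in zip(lower, upper))

    points = [random_point()]  # the first point is always accepted
    fails = 0
    while len(points) < n_points:
        candidate = random_point()
        if all(math.dist(candidate, p) >= radius for p in points):
            points.append(candidate)
            fails = 0
        else:
            fails += 1
            if fails >= max_fail:  # too many successive rejections
                radius *= shrink
                fails = 0
    return points, radius


points, final_radius = sample_initial_points(10, lower=(0.0, 0.0), upper=(1.0, 1.0))
print(len(points), final_radius)
```

Since the radius only ever decreases, every pair of accepted points is at least the final radius apart, which is what makes the final value usable as a starting proximity radius.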
6.1.2.2     Parallel Infill Strategy
Evolutionary algorithms should be able to maintain a reasonable trade-off between exploration
and exploitation. The likelihood of missing the global optimum increases if the search is not
exploratory enough. On the other hand, when focusing more on exploration, no time might be left
to exploit the gathered information about the design space. Evolutionary algorithms generally focus
on exploration in the beginning and gradually emphasize exploitation of the available information.
For example, the initialization of the population is an entirely exploratory process since the objective
values of the solutions are not considered.
    Finding a reasonable trade-off between exploration and exploitation is more complex for parallel
infill strategies. If exploration is ignored, infill solutions may be selected very close to each other.
To address this issue, Sóbester et al. [290] performed multimodal optimization of the surrogate
model and selected the optima of the approximate function as new infill solutions. This strategy,
however, limits the number of infill solutions in a cycle to the number of optima in the approximate
function. Furthermore, this strategy might be too exploratory for the final cycles, when the method
should make the most out of the existing information. To address this issue, Zhan et al. [291]
allowed selection of more infill solutions in the vicinity of fitter optima; however, their method
needed to tune a threshold value.
    As discussed, each new solution provides some new information on the landscape of the
objective function(s). If this solution is far from the existing ones, this information will be
significant. For surrogate-assisted methods, this information can provide an extra contribution:
It can improve the goodness of the metamodel in subsequent cycles if the new infill solution is
relatively far from the existing high-fidelity solutions. To take the diversity of new infill solutions
into account, this study defines an infeasible spherical region with radius 𝑅prox around each existing
high-fidelity solution so that subsequent infill solutions are not selected close to existing ones. A
greater 𝑅prox enforces that the new infill solutions considerably contribute to improving the goodness
of the surrogate model in future cycles (primary goal). Solutions with good predicted values are
preferred only if they are sufficiently far from the existing high-fidelity solutions (secondary goal).
    The accuracy of the predicted values strongly depends on the goodness of the metamodel.
Predictions close to observations are more accurate than predictions far from them. Therefore, the
concept of trust regions [292] is used to control the amount of acceptable uncertainty in the search
process. The trust region defines spherical regions of radius 𝑅trust around each solution. Subsequent
solutions for high-fidelity evaluation must be selected in the trust region. If 𝑅trust is large, points
with high predicted fitness are probably those with high uncertainty. This can mislead the search
to regions where the metamodel is less accurate. The value of 𝑅trust can control the exploitation of
the optimization process: a small 𝑅trust forces the algorithm to select new infill solutions close to
existing ones, where the prediction error is small. Combining both concepts, a permissible search
region is defined where all infill points must lie. Both radii, which satisfy 𝑅trust ≥ 𝑅prox > 0, should
gradually decrease to allow for a gradual transition from exploratory to exploitative search. In this
study, 𝑅trust is set to be proportional to 𝑅prox .
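The permissible search region combines both radii, and the membership test reduces to a check on the distance to the nearest existing high-fidelity solution. The sketch below is one way to express this, assuming Euclidean distance and inclusive boundaries; the function name and the sample points are hypothetical.

```python
import math


def is_permissible(candidate, evaluated, r_prox, r_trust):
    """Check whether a candidate lies in the permissible search region.

    The candidate must be at least r_prox away from every evaluated solution
    and within r_trust of at least one of them. Both conditions depend only on
    the distance to the nearest evaluated solution.
    """
    nearest = min(math.dist(candidate, x) for x in evaluated)
    return r_prox <= nearest <= r_trust


# Hypothetical high-fidelity solutions in a two-variable design space.
evaluated = [(0.0, 0.0), (1.0, 0.0)]

print(is_permissible((0.1, 0.0), evaluated, r_prox=0.2, r_trust=0.5))  # too close
print(is_permissible((0.4, 0.0), evaluated, r_prox=0.2, r_trust=0.5))  # permissible
print(is_permissible((0.5, 2.0), evaluated, r_prox=0.2, r_trust=0.5))  # too far
```

A check of this form can be passed to an optimizer as an additional constraint, so the search on the surrogate is restricted to the permissible region without changing the optimizer itself.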
    Figure 6.1 shows the change in the permissible region during the optimization process. Areas
with a horizontal hatch pattern represent the region defined by 𝑅prox , whereas the region beyond the
trust region is delineated by an inclined hatch pattern. The permissible search region is demarcated
with a square-shaped hatch pattern. In the initial cycle, exploration is emphasized by having a large
𝑅trust and a large 𝑅prox . In intermediate cycles, both radii have decreased to improve exploitation.
During the final cycles, both regions are relatively small to maximize exploitation.
    Ideally, the rate of the reduction in 𝑅trust and 𝑅prox can be controlled by a single user-defined
parameter. Although different functions can be used, an exponentially decreasing function is
preferred in this study:


[Figure 6.1 panels; labels in each panel: Search Region, 2𝑅prox , 𝑅trust ]
        (a) Early Generations             (b) Intermediate Generations                  (c) Final Generations
Figure 6.1: Adaptive trust region approach restricts the search within the cross-patterned region
allowing the optimization algorithm to focus near high fidelity solutions.
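        R_{\mathrm{prox}} = \max\left( R_0^{\mathrm{ini}} \left( 1 - \frac{\mathrm{FE} - r_{\mathrm{ini}}\,\mathrm{FE}_{\max}}{\mathrm{FE}_{\max}\,(1 - r_{\mathrm{ini}})} \right)^{\tau_R},\ \epsilon_{\mathrm{prox}} \right),        (6.1)
        R_{\mathrm{trust}} = r_{\mathrm{R2R}} \cdot R_{\mathrm{prox}},        (6.2)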
in which 𝑅0ini is the minimum distance between two solutions after generating the initial high-fidelity
solutions (see Section 6.1.2.1), FEmax is the solution evaluation budget, FE is the number of
evaluations so far, 𝑟ini is the fraction of the evaluation budget that was used for the initial
high-fidelity solutions, and 𝜏R specifies the reduction rate of 𝑅prox . 𝜖prox > 0 is the lower bound for the
distance between two solutions. It is set to 𝜖prox = 10^−6 ∥𝑿^(𝑈) − 𝑿^(𝐿)∥_2 , which is 10^−6 times the
Euclidean distance between the upper and lower bound vectors of the variables. This limit was defined
solely to prevent having multiple infill solutions on or extremely close to each other. Besides, it ensures
𝑅trust > 0. A parameter study is performed in Section 6.1.3 to find a good value for 𝜏R . Notably,
𝑅prox and 𝑅trust can be embedded into any optimization algorithm by adding constraints that declare
solutions as infeasible when they lie inside the proximity region or outside of the trust region.
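To make the schedule in Equations (6.1) and (6.2) concrete, the following sketch evaluates both radii for a given number of evaluations. The function names and all numeric values are hypothetical and chosen for illustration only; the clamp of the base at zero is an added guard that is not part of the equations.

```python
def proximity_radius(fe, fe_max, r_ini, r0_ini, tau_r, eps_prox):
    """Proximity radius following Eq. (6.1)."""
    progress = (fe - r_ini * fe_max) / (fe_max * (1.0 - r_ini))
    # Guard (assumption): avoid a negative base if fe exceeds fe_max.
    base = max(0.0, 1.0 - progress)
    return max(r0_ini * base ** tau_r, eps_prox)


def trust_radius(r_prox, r_r2r):
    """Trust radius following Eq. (6.2); r_r2r >= 1 keeps R_trust >= R_prox."""
    return r_r2r * r_prox


# Hypothetical setting: budget of 100 evaluations, 20% spent on initialization.
settings = dict(fe_max=100, r_ini=0.2, r0_ini=1.0, tau_r=2.0, eps_prox=1e-6)

for fe in (20, 60, 100):
    r_prox = proximity_radius(fe, **settings)
    print(fe, r_prox, trust_radius(r_prox, r_r2r=3.0))
```

Right after initialization the proximity radius equals 𝑅0ini , it shrinks polynomially with exponent 𝜏R as the budget is consumed, and it is floored at 𝜖prox once the budget is exhausted.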
6.1.2.3    Management of the Surrogate Model
PSA-EA employs an independent metamodel for each objective function. It utilizes the DACEFIT
module [55], a Kriging-based metamodel provided by the Matlab Surrogate Model Optimization
Toolbox. The module allows setting the regression type, correlation function, and the initial length
scale(s) 𝜃 of the corresponding kernel. The regression can either be constant, linear, or quadratic.
The quadratic option is excluded since it requires a large number of high-fidelity solutions, leaving
two options for the regression type. For the correlation, an exponential, generalized exponential,
Gaussian, linear, spherical, or cubic spline function can be used (six options). The bounds for 𝜃 were
set to [10^−6 , 10^2 ], and the candidate initial values are 10^−5 , 10^−4 , . . . , 10^0 (six options).
Since the accuracy of DACEFIT depends on the proper selection of the metamodel parameters,
the best parameter setting should be found for each objective at the beginning of each cycle. Each
parameter configuration results in a different metamodel. From this perspective, PSA-EA can be
regarded as a method that employs an ensemble of metamodels.
     For the first cycle, the set of candidate metamodels includes all settings for the parameters of
the metamodel, resulting in 2 × 6 × 6 = 72 metamodels. Each candidate metamodel is trained and
assessed, and the best one is selected for the current cycle. A fraction of the worst settings for
the metamodel parameters is discarded such that in the last cycle, only two parameter settings are
tested. For the case when there are 72 candidate metamodels in the beginning, this fraction is:
        r_{\mathrm{drop}} = \left( \frac{2}{72} \right)^{1/(N_{\mathrm{cycle}} - 1)}.        (6.3)
This reduction in the number of candidate metamodels significantly decreases the training and
assessment time by excluding metamodels that are unlikely to be suitable in future cycles.
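The arithmetic behind the candidate set and its reduction can be checked with a short sketch. It rests on an interpretation: the value from Eq. (6.3) is read as the ratio by which the candidate set is multiplied from one cycle to the next, which takes 72 candidates down to 2 after 𝑁cycle − 1 reductions. The option labels, the rounding rule, and the function names are assumptions made for illustration.

```python
from itertools import product

# Parameter options listed in the text (labels are illustrative).
REGRESSIONS = ["constant", "linear"]
CORRELATIONS = ["exponential", "generalized exponential", "gaussian",
                "linear", "spherical", "cubic spline"]
THETA_INIT = [10.0 ** e for e in range(-5, 1)]  # 1e-5, 1e-4, ..., 1e0

# Every combination is one candidate metamodel: 2 x 6 x 6 = 72.
CANDIDATES = list(product(REGRESSIONS, CORRELATIONS, THETA_INIT))


def shrink_ratio(n_first, n_last, n_cycle):
    """Per-cycle ratio (n_last / n_first)^(1 / (n_cycle - 1)); needs n_cycle >= 2."""
    return (n_last / n_first) ** (1.0 / (n_cycle - 1))


def candidates_per_cycle(n_first, n_last, n_cycle):
    """Number of candidate metamodels tested in each cycle (rounding is an assumption)."""
    r = shrink_ratio(n_first, n_last, n_cycle)
    return [max(n_last, round(n_first * r ** k)) for k in range(n_cycle)]


schedule = candidates_per_cycle(len(CANDIDATES), 2, 5)
print(schedule[0], schedule[-1])  # 72 in the first cycle, 2 in the last
```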
     A common strategy to assess a metamodel is to divide the existing high-fidelity solutions into
training and testing data, e.g. 80% for training and 20% for testing, as performed by [136]. However,
since the number of high-fidelity solutions is very limited, this study suggests using stratified 𝑛-fold
cross-validation, according to which all existing solutions are used as training and testing sets.
Each time one solution is excluded from the training set, and its value is predicted using the trained
metamodel. This process continues until the predicted values of all solutions are calculated. Having
the actual and predicted values of each solution, the goodness of each metamodel can be assessed.
The best-found metamodel is then selected and retrained using all available solutions as there is no
more need for test data.
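The assessment procedure above, in which one solution is left out at a time and the winner is then retrained on all data, can be sketched as follows. The inverse-distance-weighting predictor is only a stand-in for the Kriging metamodels, and all function names are hypothetical.

```python
import math


def fit_idw(xs, ys, power):
    """Stand-in metamodel (not Kriging): inverse-distance-weighted interpolation."""
    def predict(x):
        num = den = 0.0
        for xi, yi in zip(xs, ys):
            d = math.dist(x, xi)
            if d == 0.0:
                return yi  # exact at training points
            w = d ** -power
            num += w * yi
            den += w
        return num / den
    return predict


def mean_squared_error(actual, predicted):
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def leave_one_out_predictions(xs, ys, fit):
    """Predict each solution with a model trained on all the other solutions."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


def select_and_retrain(xs, ys, fits, error):
    """Score every candidate by cross-validation, then retrain the best on all data."""
    scores = {name: error(ys, leave_one_out_predictions(xs, ys, fit))
              for name, fit in fits.items()}
    best = min(scores, key=scores.get)
    return best, fits[best](xs, ys), scores


# Usage: two candidate settings that differ only in the IDW exponent.
xs = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
fits = {"idw_p1": lambda X, Y: fit_idw(X, Y, 1.0),
        "idw_p4": lambda X, Y: fit_idw(X, Y, 4.0)}
best_name, best_model, cv_scores = select_and_retrain(xs, ys, fits, mean_squared_error)
```

The `error` argument is deliberately pluggable, so the same loop works with a rank-based measure instead of the mean squared error.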
    There are several measures for comparing the goodness of a metamodel. The most commonly
used one [104] is Mean Squared Error (MSE):
                         E_{\mathrm{MSE}} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{f}(\boldsymbol{x}_i) - f(\boldsymbol{x}_i) \right)^{2},                    (6.4)
in which 𝑛 is the number of available solutions and 𝑓ˆ(𝒙𝑖 ) and 𝑓 (𝒙𝑖 ) are the predicted and real
values, respectively. This study, however, favors a goodness measure that takes only the rank of
solutions into account after they are sorted according to their function values. Jin et al. [293]
proposed a number of such measures based on the difference between the predicted and actual
rank of (selected) solutions after they have been sorted according to their predicted and actual
values. This study suggests a more intuitive measure, called selection error probability (SEP). SEP
considers all pairwise comparisons of solutions and sums up the number of times that comparing
the predicted values has resulted in incorrect identification of the better solution:
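                         E_{\mathrm{SEP}} = \frac{1}{0.5\,n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} q(\boldsymbol{x}_i, \boldsymbol{x}_j),                    (6.5)
where
                         q(\boldsymbol{x}_i, \boldsymbol{x}_j) =
                         \begin{cases}
                           1, & \text{if } \left( f(\boldsymbol{x}_i) - f(\boldsymbol{x}_j) \right) \left( \hat{f}(\boldsymbol{x}_i) - \hat{f}(\boldsymbol{x}_j) \right) < 0, \\
                           0, & \text{otherwise.}
                         \end{cases}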
    The number of possible pairs with 𝑛 solutions is 0.5 · 𝑛 · (𝑛 − 1). For each pair, 𝑞(𝒙𝑖 , 𝒙𝑗 ) returns
1 if comparing the two solutions by their predicted values gives a different result than comparing
them by their true values; otherwise, it returns 0. A metamodel with a smaller 𝐸SEP is considered better.
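A direct transcription of Eqs. (6.4) and (6.5) makes the difference between the two measures concrete. The function names and the sample values are illustrative assumptions; the formulas follow the definitions above and require at least two solutions.

```python
def mean_squared_error(actual, predicted):
    """Eq. (6.4): average squared difference between predicted and true values."""
    n = len(actual)
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n


def selection_error_probability(actual, predicted):
    """Eq. (6.5): fraction of pairs whose order is reversed by the predictions."""
    n = len(actual)
    wrong = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (actual[i] - actual[j]) * (predicted[i] - predicted[j]) < 0:
                wrong += 1
    return wrong / (0.5 * n * (n - 1))


actual = [1.0, 2.0, 3.0, 4.0]
biased = [11.0, 12.0, 13.0, 14.0]   # large constant offset, ordering preserved
swapped = [1.0, 3.0, 2.0, 4.0]      # small errors, one pair reversed

print(mean_squared_error(actual, biased), selection_error_probability(actual, biased))
print(mean_squared_error(actual, swapped), selection_error_probability(actual, swapped))
```

In this example the biased model has a much larger MSE but a SEP of zero, whereas the second model has a small MSE and reverses one of the six pairs. For a selection-based optimizer, only the second kind of error changes which solution is preferred.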
    Figure 6.2 illustrates how SEP works in a common situation in model-based optimization.
The true function values are shown in blue and the predictions in red. The absolute metamodel
error is the integral of the difference between both functions. The optimization algorithm aims
to minimize 𝑓 (𝑥) (True) by using 𝑓ˆ(𝑥) (Prediction), and it does so through
pairwise comparisons. For instance, if points 𝑥1 and 𝑥2 are compared, then 𝑓 (𝑥1 ) > 𝑓 (𝑥2 ) and
𝑓ˆ(𝑥1 ) > 𝑓ˆ(𝑥2 ), so the metamodel predicts the result of the comparison correctly. In contrast, when
𝑥1 and 𝑥3 are compared, 𝑓 (𝑥1 ) > 𝑓 (𝑥3 ), but 𝑓ˆ(𝑥1 ) < 𝑓ˆ(𝑥3 ). The metamodel prediction leads to
an incorrect comparison result: when minimizing with this metamodel, the algorithm will
favor solution 𝑥1 over 𝑥3 , which is wrong because 𝑓 (𝑥1 ) > 𝑓 (𝑥3 ). The measure is therefore intuitive
and mimics the decisions of an evolutionary algorithm during optimization.
Figure 6.2: Selection Error Probability: Pairwise comparison between high-fidelity and prediction
values.
6.1.2.4      Optimization of the Approximate Function(s)
After selecting and training the best metamodel, an evolutionary algorithm is employed to perform
optimization of the approximate function(s). In principle, any black-box optimization method can
be used for this purpose. In PSA-EA:
      • The Covariance Matrix Adaptation Evolution Strategy with restarts (IPOP-CMA-ES) [294]
         is used for single-objective problems. Since new solutions undergo high-fidelity evaluation
         at the end of each cycle, IPOP-CMA-ES is executed multiple times consecutively, generating
         one new point each time. This new solution does not affect the metamodel, but it changes
         the optimization problem landscape since each new point modifies the permissible region
       defined by 𝑅prox and 𝑅trust . At the same time, 𝑅prox and 𝑅trust are updated whenever a new
       solution is generated.
     • The recent unified version of Non-Dominated Sorting Genetic Algorithm III (NSGA-III) [279]
       is used for multi-objective problems. In this case, the solutions obtained by optimization are
       candidate solutions, from which some solutions are selected for high-fidelity evaluation. As
       a constraint, each solution must be in 𝑅trust , but not be in 𝑅prox . The selection procedure is
       similar to NSGA-III except that it uses a clearing strategy. Existing points are assigned to
       reference lines. The solution assigned to the least crowded reference line is selected for high-
       fidelity evaluation, and all points within the proximity region of this solution are cleared. If
       another point must be selected, but all points have been cleared, NSGA-III is run again after
       applying the effect of recent solutions on the permissible region. For practical problems, the
       user may manually select new points from candidate solutions.
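The clearing-based selection for the multi-objective case can be sketched as below. This is a simplified illustration, not the PSA-EA implementation: the assignment of points to reference lines is assumed to be precomputed, the proximity region is assumed to be a Euclidean ball of radius `r_prox`, ties are broken by candidate order, and counting a selected candidate towards its line's crowding is an additional assumption.

```python
import math
from collections import Counter


def select_with_clearing(candidates, line_of, existing_lines, r_prox, n_select):
    """Pick up to n_select candidate indices for high-fidelity evaluation.

    candidates     : list of candidate points
    line_of        : reference-line index of each candidate (assumed precomputed)
    existing_lines : reference-line index of each existing high-fidelity point
    """
    crowding = Counter(existing_lines)
    remaining = list(range(len(candidates)))
    chosen = []
    while remaining and len(chosen) < n_select:
        # Candidate on the least crowded reference line.
        pick = min(remaining, key=lambda i: crowding[line_of[i]])
        chosen.append(pick)
        crowding[line_of[pick]] += 1
        # Clear every candidate inside the proximity region of the pick.
        remaining = [i for i in remaining
                     if math.dist(candidates[i], candidates[pick]) > r_prox]
    return chosen
```

If the function returns fewer than `n_select` indices, all candidates have been cleared, which corresponds to the case in which NSGA-III is run again on the updated permissible region.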
    Since the surrogate provides a computationally cheap prediction of a solution's value, a budget
of 5000 D \sqrt{N_F} approximate evaluations is provided for each infill solution, in which 𝐷 and 𝑁𝐹
are the numbers of decision parameters and objectives in the problem, respectively. This setting is based on
numerical experiments conducted by the authors.
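As a quick numerical check of this budget, the expression can be evaluated directly. The function name and the rounding up to a whole number of evaluations are assumptions made for illustration.

```python
import math


def approximate_evaluation_budget(n_var, n_obj):
    """Surrogate evaluations per infill solution: 5000 * D * sqrt(N_F), rounded up."""
    return math.ceil(5000 * n_var * math.sqrt(n_obj))


print(approximate_evaluation_budget(10, 1))  # single-objective problem with D = 10
print(approximate_evaluation_budget(30, 2))  # two-objective problem with D = 30
```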
6.1.2.5    Flowchart
Figure 6.3 presents the flowchart of PSA-EA. After generating initial high-fidelity solutions, an
exhaustive search is performed to find the best metamodel. Then, a set of infill solutions is
generated by optimizing the approximate function(s). The new infill solutions are used to update
the metamodel(s), and this process continues until the budget for high-fidelity evaluation is depleted.
Figure 6.3 (flowchart content, in order):
    Start
    Sample initial high-fidelity solutions using the space-filling method
    Model Management:
      • Form the set of all candidate metamodels for each objective function
      • Assess the goodness of each candidate metamodel for each objective function
        using stratified n-fold cross-validation
      • Select the best metamodel for each objective function. Remove a fraction (𝑟drop)
        of the worst existing metamodels from the set of candidate metamodels for
        each objective function
      • Retrain the best metamodel using all existing high-fidelity solutions
    Infill Strategy:
      • Perform optimization using approximate function(s)
      • Select new infill solutions
      • Update 𝑅prox and 𝑅trust
    Have sufficient infill solutions been generated for this cycle? (Yes: continue; No: loop back)
    Perform high-fidelity evaluation of the infill solutions
    Has the budget for high-fidelity evaluations been used? (Yes: End; No: loop back)
    End
Figure 6.3: Flowchart of PSA-EA.
6.1.3    Descriptive Experiments
This section performs a few descriptive experiments to demonstrate the effects of different components
of PSA-EA. For this purpose, four test problems were selected:
     • Modified Rastrigin function, a highly multimodal function with a symmetric bowl-shaped
       global structure.
     • Schwefel function, a multimodal function in which good local minima lie close to the corners
       of the search space. Approaching any corner means moving away from the remaining good local
       minima.
     • Shifted Ackley function, which shows a sudden reduction in the objective function value
       when approaching the global minimum.
     • ZDT3 function, a two-objective test problem with a disjoint Pareto front.
    Each problem is optimized 50 times independently. For single-objective problems, the performance
indicator is the difference between the best value found by the method and the
global optimum value ( 𝑓best − 𝑓 ∗ ). For multi-objective problems, the performance indicator is the
hypervolume ratio (HVR), which is the hypervolume of the non-dominated
solutions divided by the hypervolume of the true Pareto front. Therefore, 0 ≤ HVR ≤ 1. The
reference point (𝑿ref ) for the calculation of HV is calculated as follows:
                                  𝑿ref = 𝛼ref ( 𝑿nadir − 𝑿ideal ) + 𝑿ideal ,                     (6.6)
in which 𝛼ref > 1. For a reasonable value of 𝛼ref , HVR ≈ 1 means that the final solutions
provide a good approximation of the Pareto front, regardless of the shape of the problem and the
nadir points. HVR is known to be a Pareto-compliant performance indicator [295]. By default, this
study sets 𝛼ref = 1.1, as recommended by [296].
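For a two-objective problem such as ZDT3, the reference point of Eq. (6.6) and the resulting HVR can be computed with a simple sweep. The sketch below assumes exactly two objectives, both minimized, and uses hypothetical function names; it is not tied to any particular hypervolume library.

```python
def reference_point(ideal, nadir, alpha_ref=1.1):
    """Eq. (6.6): X_ref = alpha_ref * (X_nadir - X_ideal) + X_ideal, per objective."""
    return [alpha_ref * (n - i) + i for i, n in zip(ideal, nadir)]


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (two minimized objectives)."""
    inside = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in inside:          # sweep in increasing f1
        if f2 < best_f2:           # only non-dominated points add a new strip
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def hypervolume_ratio(found, true_front, ideal, nadir, alpha_ref=1.1):
    """HVR: hypervolume of the found set divided by that of the true Pareto front."""
    ref = reference_point(ideal, nadir, alpha_ref)
    return hypervolume_2d(found, ref) / hypervolume_2d(true_front, ref)


# Usage with a toy three-point front (illustrative values only).
true_front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
found = [(0.5, 0.5)]
hvr = hypervolume_ratio(found, true_front, ideal=(0.0, 0.0), nadir=(1.0, 1.0))
```

Because the same reference point is used in the numerator and the denominator, a set that covers the whole true front yields a ratio of one, while points outside the reference box contribute nothing.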
6.1.3.1   Effect of 𝑅prox
In PSA-EA, a larger 𝜏𝑅 makes a faster transition from exploration to exploitation. 𝜏𝑅 = ∞, for
example, immediately reduces 𝑅prox to 𝜖prox , which effectively suppresses the 𝑅prox concept.
This section analyzes the effect of 𝜏𝑅 on the performance of PSA-EA.
[Figure 6.4 plots: panels (a) Modified Rastrigin Problem, (b) Schwefel Problem, (c) Shifted Ackley Problem, (d) ZDT3 Problem; horizontal axis 𝑟ini from 0 to 1; vertical axis 𝑓best − 𝑓 ∗ in panels (a) to (c) and HVR in panel (d); one curve each for 𝜏𝑅 = 0.5, 2, 10, and ∞.]
Figure 6.4: Effect of 𝑟ini and 𝜏𝑅 on the performance of PSA-EA on each test problem when
FEmax = 100 and 𝑁cycle = 100.
    The budget of high-fidelity
solutions is set to 20𝐷, and different values for the fraction of initial high-fidelity solutions (𝑟ini ) and
𝜏𝑅 are tried. For this experiment, 𝑟R2R is set to a very large value to suppress the effect of 𝑅trust .
Furthermore, only one solution is generated at each cycle. Figure 6.4 illustrates the performance
metric for each setting. As can be observed:
    • In general, a proper 𝜏𝑅 can significantly improve the final results. This is most noticeable
      for the Rastrigin, Schwefel, and Ackley functions. 𝜏𝑅 = 2 results in significantly better final
      solutions than 𝜏𝑅 = ∞ or 𝜏𝑅 = 0.5.
    • A gradual reduction of 𝑅prox (for example, 𝜏𝑅 = 2) improves the robustness of the method to
      the fraction of initial solutions (𝑟ini ).
    • When 𝜏𝑅 = ∞, exploration is limited to the initialization phase. For a small value of 𝑟 ini , this
       results in a more considerable performance drop compared to 𝜏𝑅 = 2, in which exploration
       diminishes gradually. For the same reason, a higher 𝑟 ini can be beneficial when 𝜏𝑅 → ∞
       since it improves exploration.
    • Suppressing the 𝑅prox concept is advantageous for the ZDT3 problem, possibly because of the
       simplicity of the objective functions in this problem or the fact that in multi-objective optimization,
       some diversity is automatically preserved in the selection process. Nevertheless,
       𝜏𝑅 = 2 is only slightly worse than 𝜏𝑅 → ∞.
    Consequently, a gradual reduction of 𝑅prox , motivated by a gradual shift from exploration to
exploitation, can improve both the quality of final solutions and robustness to the fraction of initial
solutions.
6.1.3.2   Effect of 𝑁cycle
A smaller 𝑁cycle is desirable from an application point of view since it allows for the parallel
evaluation of new infill solutions; however, it degrades the optimization performance by postponing
the exploitation of true values of new infill solutions. Therefore, it is practically important to
investigate the sensitivity of a surrogate-assisted method to this parameter. This section explores
whether a gradual shift from exploration to exploitation can contribute to the performance of PSA-
EA when 𝑁cycle is limited. For this experiment, 𝑟 R2R is set to a very large value to suppress the
effect of 𝑅trust . Figure 6.5 demonstrates the median, the first quartile (Q1), and the third quartile
(Q3) of 𝑓best − 𝑓 ∗ or HVR as a function of 𝑁cycle when 𝜏𝑅 = 2. It reveals that:
    • For the single-objective problems, the performance improves substantially as
       𝑁cycle increases up to 𝑁cycle = 5. After that, the rate of improvement diminishes, and for 𝑁cycle >
       16, no significant improvement was obtained by increasing 𝑁cycle . Knowing this trade-off
       makes it easier to choose the value of 𝑁cycle in light of the available computing
       resources.
    • For the ZDT3 problem, 𝑁cycle has no detectable effect on the performance. This demonstrates
       that the initial surrogate model, which is trained using the initial 40 solutions, is sufficiently
       accurate, and new infill solutions do not change the model much. This observation explains
       why for this specific problem, a gradual transition from exploration to exploitation was not
       beneficial in Section 6.1.3.1.
Figure 6.5: Effect of 𝑁cycle on the performance of PSA-EA (𝜏𝑅 = 2, 𝑟 ini = 0.4) for the corresponding
test problem. Panels: (a) M-Rastrigin, (b) Schwefel, (c) M-Ackley, (d) ZDT3. Each panel plots Q1,
median, and Q3 of 𝑓best − 𝑓 ∗ (HVR for ZDT3) against 𝑁cycle ∈ {1, 2, 4, 8, 16, 32, 64}.
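As a minimal sketch of how the three curves reported for each value of 𝑁cycle (Q1, median, and Q3) can be obtained from a set of independent runs, the quartiles of the performance indicator may be computed as shown below. The function name, the sample values, and the "inclusive" quartile convention are assumptions made for illustration; the quartile convention actually used for the figures is not stated here.

```python
import statistics


def summarize_runs(indicator_values):
    """Return (Q1, median, Q3) of a performance indicator over independent runs.

    The 'inclusive' method treats the data as the full population of runs;
    this choice is an assumption, not taken from the thesis.
    """
    q1, median, q3 = statistics.quantiles(indicator_values, n=4, method="inclusive")
    return q1, median, q3


# Hypothetical indicator values (e.g., f_best - f*) from seven runs.
runs = [3.1, 2.4, 5.0, 4.2, 3.7, 2.9, 4.8]
q1, median, q3 = summarize_runs(runs)
print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")
```

Repeating this summary for each tested value of 𝑁cycle gives one point per curve.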
6.1.3.3   Effect of 𝑟 R2R
𝑅trust may enforce exploitation by limiting the search to solutions near the existing high-fidelity
solutions. To check the effect of 𝑅trust , this section optimizes four test problems using different
values of 𝑟 R2R . For this experiment, 𝜏𝑅 = 2, 𝑁cycle = 20, and FEmax = 100. Figure 6.6 shows Q1,
Median, and Q3 of the performance indicator. As observed:
     • For the ZDT3 and Schwefel problems, a large value for 𝑟 R2R is advantageous. For these two
       problems, any 𝑟 R2R ≥ 4 is a reasonable choice.
     • No considerable effect of 𝑟 R2R can be detected for the shifted Ackley and modified Rastrigin
       problems.
    The results of this section suggest that the default value of 𝑟 R2R should be equal to or greater than
four; however, no upper limit can be defined at this stage. This observation can be explained as
follows: A smaller 𝑟 R2R results in a smaller difference between the predicted and the actual values
of new infill solutions because it restricts the search to the neighborhood of existing high-fidelity
solutions. On the other hand, this restriction limits the potential improvement from new infill
solutions since farther solutions are disregarded even though the updated surrogate model predicts
a good value for them.
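The trade-off described above can be illustrated with a small sketch of a trust-region filter: a candidate is kept only if it lies within a given radius of at least one already evaluated (high-fidelity) solution. The exact definition of 𝑅trust in terms of 𝑟 R2R is not reproduced in this section, so the `radius` argument, the Euclidean distance, and the function names below are assumptions for illustration only.

```python
import math


def within_trust_region(candidate, evaluated, radius):
    """True if `candidate` is within `radius` of at least one evaluated solution.

    Euclidean distance is an assumed metric, not taken from the thesis.
    """
    return any(math.dist(candidate, x) <= radius for x in evaluated)


def filter_candidates(candidates, evaluated, radius):
    """Keep only the candidates that fall inside the (assumed) trust region."""
    return [c for c in candidates if within_trust_region(c, evaluated, radius)]


# Hypothetical data: two evaluated solutions and two surrogate-proposed candidates.
evaluated = [(0.0, 0.0), (1.0, 1.0)]
candidates = [(0.1, 0.1), (3.0, 3.0)]

print(filter_candidates(candidates, evaluated, radius=0.5))  # small radius
print(filter_candidates(candidates, evaluated, radius=5.0))  # large radius
```

With the small radius only the nearby candidate survives, which limits the prediction error but discards the distant candidate even if the surrogate rates it highly; the large radius keeps both.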
6.1.4    Numerical Comparison
This section compares the performance of PSA-EA with two recently proposed surrogate-
assisted optimization methods:
     • Surrogate Optimization of Computationally Expensive Multi-Objective Problems (SOCEMO)
       [297], a surrogate-assisted optimization method for multi-objective problems.
     • Mixed Integer Surrogate Optimization (MISO) [228], a surrogate-assisted optimization
       method for mixed-variable single-objective problems.
Figure 6.6: Effect of 𝑟 R2R on the performance of PSA-EA when 𝜏𝑅 = 2, 𝑁cycle = 20, and FEmax =
100. Panels: (a) M-Rastrigin, (b) Schwefel, (c) M-Ackley, (d) ZDT3. Each panel plots Q1, median,
and Q3 of 𝑓best − 𝑓 ∗ (HVR for ZDT3) against 1/𝑟 R2R ∈ [0, 1].
    One main reason for selecting these methods for comparison is the availability of their source
code and documentation [298], which facilitates the simulation of arbitrary test problems. All
parameters of MISO/SOCEMO are set to their default values, including the number of initial
samples, which is set to 2𝐷 + 1. To the best of the author’s knowledge, these methods provide no
option to control the number of cycles. To ensure a fair comparison, the same number of
initial samples is used for PSA-EA. Moreover, two values for 𝑁cycle are considered for PSA-EA:
     • 𝑁cycle = FEmax (1 − 𝑟 ini ), in which only one solution is generated and evaluated at each cycle.
       It allows for a fair comparison with MISO/SOCEMO. This variant of PSA-EA is denoted by
       PSA-EA(S).
     • 𝑁cycle = 8, in which multiple solutions are generated in each cycle and evaluated in parallel.
       This practically demanding variant is denoted by PSA-EA(P).
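The relation between the evaluation budget, the number of initial samples, and 𝑁cycle can be made concrete with a short sketch. Since 𝑟 ini is the fraction of the budget spent on initial solutions, FEmax (1 − 𝑟 ini ) equals the number of infill evaluations, so choosing 𝑁cycle equal to this number yields one solution per cycle. The function name and the rule for distributing the remainder over the cycles are assumptions; how PSA-EA actually sizes each batch is not specified in this section.

```python
def infill_batch_sizes(fe_max, n_initial, n_cycle):
    """Split the infill budget (fe_max - n_initial) over n_cycle cycles.

    The split is as even as possible, with earlier cycles receiving the
    remainder. This distribution rule is an illustrative assumption.
    """
    budget = fe_max - n_initial
    if budget <= 0:
        raise ValueError("no evaluations left for infill solutions")
    if not 1 <= n_cycle <= budget:
        raise ValueError("n_cycle must be between 1 and the infill budget")
    base, extra = divmod(budget, n_cycle)
    return [base + 1 if i < extra else base for i in range(n_cycle)]


# Setting used for the comparison: D = 5, FEmax = 10 * D, 2 * D + 1 initial samples.
D = 5
fe_max, n_initial = 10 * D, 2 * D + 1

serial = infill_batch_sizes(fe_max, n_initial, n_cycle=fe_max - n_initial)
parallel = infill_batch_sizes(fe_max, n_initial, n_cycle=8)
print(serial)    # one solution per cycle
print(parallel)  # several solutions per cycle, evaluated in parallel
```

Under these assumptions, the 39 infill evaluations available for 𝐷 = 5 are spread as batches of four or five solutions when 𝑁cycle = 8.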
    Four single-objective and six multi-objective test problems are employed in this study for
numerical evaluation and comparison. The single-objective test problems are some of the most
commonly used test problems in the global optimization literature. This study uses slightly modified
variations of these problems, mainly obtained by shifting the fitness function to relocate the
global optimum. The ZDT [253] and DTLZ [257] families of test problems are widely used in the
multi-objective optimization literature. A few of them were excluded either because of excessive
simplicity (ZDT1 and ZDT2) or similarity to other selected problems. The mathematical definitions
of the customized test problems used in this study are provided in the supplementary document.
    These test problems are optimized for 𝐷 = 5 and 𝐷 = 10, with FEmax = 10𝐷. This setting is
based on the types of practical problems that motivated this study. For each problem, 50 independent
runs are performed and the performance indicator, which was explained in Section 6.1.3, is reported
for each method. By default, 𝛼ref = 1.1 (see Equation 6.6); however, for some more difficult
problems, this setting may result in HVR ≈ 0 for the tested methods. In such cases, the indicator
cannot determine which method has performed better, so a greater 𝛼ref was chosen such that
at least one method can reach HVR ≥ 0.5.
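The effect of the reference point on HVR can be illustrated with a small bi-objective sketch. It assumes that HVR is the ratio of the hypervolume of the obtained front to that of a reference front, both measured from the same reference point, and that both objectives are minimized. The mapping from 𝛼ref to the reference point is defined by Equation 6.6 and is not reproduced here, so the reference point is passed in directly. All function names and data are hypothetical.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (both objectives minimized)."""
    # Only points that are strictly better than the reference point contribute.
    inside = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in inside:  # ascending f1
        if f2 < best_f2:   # point is non-dominated among those seen so far
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def hypervolume_ratio(approx_front, reference_front, ref):
    """Assumed definition of HVR: HV(approximation) / HV(reference front)."""
    hv_ref = hypervolume_2d(reference_front, ref)
    if hv_ref == 0.0:
        raise ValueError("reference front does not dominate the reference point")
    return hypervolume_2d(approx_front, ref) / hv_ref


true_front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]  # hypothetical
poor_approx = [(5.0, 5.0)]                         # hypothetical

print(hypervolume_ratio(poor_approx, true_front, ref=(4.0, 4.0)))  # tight reference point
print(hypervolume_ratio(poor_approx, true_front, ref=(6.0, 6.0)))  # relaxed reference point
```

With the tight reference point, the poor approximation contributes no hypervolume, so the ratio is zero and cannot rank methods; relaxing the reference point yields a positive, discriminating value.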
    The Wilcoxon rank-sum test with significance level 𝛼 = 0.01 is employed to check whether there
is a statistically significant difference between the performance of PSA-EA and that of the other
tested method (MISO for single-objective and SOCEMO for multi-objective problems). Table 6.1
presents the performance indicator for the tested methods when 𝐷 = 5. It also provides the CPU time
for each method to optimize all the problems once. The signs +, =, and − denote that PSA-EA(S)
or PSA-EA(P) is statistically better than, equivalent to, or worse than MISO/SOCEMO,
respectively. The same data for 𝐷 = 10 are provided in Table 6.2.
Table 6.1: Median of the performance indicator ( 𝑓best − 𝑓 ∗ for single-objective problems and HVR
for multi-objective problems) and the outcome of the statistical test for the performance of the
tested methods for 𝐷 = 5. 𝛼ref defines the selected reference point for calculation of HVR.
              PID   Function         𝑁obj   𝛼ref   MISO or SOCEMO   PSA-EA(S)   PSA-EA(P)
                1   M-Rastrigin       1      −          7.43         3.32 (+)    3.92 (+)
                2   Schwefel          1      −         -1358        -1569 (+)   -1366 (=)
                3   M-Ackley          1      −          6.31         3.36 (+)    6.37 (=)
                4   M-Rosenbrock      1      −          110          250 (−)     512 (−)
                5   ZDT3              2     1.1        0.507        0.882 (+)   0.883 (+)
                6   M-ZDT4            2      7            0         0.597 (+)   0.522 (+)
                7   ZDT6              2      3         0.485        0.535 (=)   0.435 (=)
                8   M-DTLZ1           3     20         0.549        0.206 (−)   0.004 (−)
                9   DTLZ2             3     1.1        0.686        0.888 (+)   0.886 (+)
               10   DTLZ6             3     1.1        0.131        0.649 (+)   0.632 (+)
              CPU time (minutes)                         4.7           51          36
Table 6.2: Median of the performance indicator ( 𝑓best − 𝑓 ∗ for single-objective problems and HVR
for multi-objective problems) and the outcome of the statistical test for the performance of the
tested methods for 𝐷 = 10. 𝛼ref defines the selected reference point for calculation of HVR.
              PID   Function         𝑁obj   𝛼ref   MISO or SOCEMO   PSA-EA(S)   PSA-EA(P)
                1   M-Rastrigin       1      −         19.64         6.40 (+)    8.52 (+)
                2   Schwefel          1      −         -2644        -3088 (+)   -2619 (=)
                3   M-Ackley          1      −          8.37         4.31 (+)    8.69 (=)
                4   M-Rosenbrock      1      −          907          627 (=)    1249 (−)
                5   ZDT3              2     1.1        0.077        0.936 (+)   0.935 (+)
                6   M-ZDT4            2     25         0.147        0.668 (+)   0.535 (+)
                7   ZDT6              2      5         0.546        0.609 (+)   0.577 (=)
                8   M-DTLZ1           3     60           0          0.648 (+)   0.421 (+)
                9   DTLZ2             3     1.1        0.493        0.830 (+)   0.837 (+)
               10   DTLZ6             3     1.1        0.069        0.718 (+)   0.720 (+)
              CPU time (minutes)                        11.2          449         180
    The obtained results reveal that:
     • PSA-EA(S) outperforms MISO/SOCEMO for seven problems when 𝐷 = 5 and is outperformed
       by MISO/SOCEMO for two problems. For 𝐷 = 10, PSA-EA(S) outperforms MISO/SOCEMO
       in nine problems and is not outperformed in any problem. This observation suggests that
       the superiority of PSA-EA(S) over MISO/SOCEMO intensifies for problems of higher
       dimensions. For example, PSA-EA(S) is outperformed by MISO/SOCEMO for 5-D M-DTLZ1,
       but it excels for the 10-D version of this problem. In particular, PSA-EA(S) significantly
       outperforms MISO/SOCEMO for M-Rastrigin, M-Ackley, ZDT3, M-ZDT4, and DTLZ6.
     • Across both dimensions (20 problem instances in Tables 6.1 and 6.2), PSA-EA(P) still
       outperforms MISO/SOCEMO in 11 cases and falls behind in only three.
     • For some problems, a detectable performance drop can be observed for PSA-EA when the
       number of cycles is reduced to eight. In contrast, there is no considerable difference between
       PSA-EA(S) and PSA-EA(P) for PID=5, 9, and 10. Besides, both variants of PSA-EA could
       reach a relatively high HVR with the default value of 𝛼ref . This implies that for these problems,
       even the initial surrogate model could predict the rank of solutions with good reliability. An
       efficient surrogate-assisted method is an ideal tool for optimizing such problems.
     • None of the tested methods could provide a good approximation of the Pareto front of
       M-DTLZ1, even though the difficulty of the original problem was moderated; therefore, a
       large deviation from the recommended value of 𝛼ref was necessary to obtain discriminating
       values for HVR.
     • The CPU time of the PSA-EA variants is much higher than that of MISO/SOCEMO (for
       example, 51 and 36 minutes versus 4.7 minutes for 𝐷 = 5); however, for problems in which
       each solution evaluation may take a few hours or more, the computation time of PSA-EA
       is still negligible.
     • Compared to MISO/SOCEMO, PSA-EA(P) has an important practical advantage: It can
       submit the new infill designs for external evaluation in a group of the desired size. Based
       on the author’s evaluation, MISO/SOCEMO does not have this flexibility and requires the
       evaluation of each new infill solution before proceeding. This is a critically important
       feature when it comes to practical problems.
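The +, =, and − signs used in the statistical comparison of this section can be derived as in the following sketch. It implements the two-sided Wilcoxon rank-sum test with the large-sample normal approximation (no continuity or tie correction) using only the standard library. The function names, the use of the sign of the z-statistic to decide the direction, and the sample data are assumptions for illustration; the software actually used for the test is not stated here.

```python
import math


def rank_sum_test(a, b):
    """Return (z, p) of the two-sided Wilcoxon rank-sum test (normal approximation)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share the mean of their ranks
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


def compare(psa_ea, other, minimize=True, alpha=0.01):
    """Return '+', '=' or '-' from the perspective of the first sample."""
    z, p = rank_sum_test(psa_ea, other)
    if p >= alpha:
        return "="
    first_is_smaller = z < 0
    return "+" if first_is_smaller == minimize else "-"


# Hypothetical indicator values from 50 runs of each method.
method_a = [0.1 * k for k in range(1, 51)]
method_b = [0.1 * k for k in range(51, 101)]
print(compare(method_a, method_b, minimize=True))   # f_best - f*: smaller is better
print(compare(method_a, method_b, minimize=False))  # HVR: larger is better
```

The `minimize` flag reflects that 𝑓best − 𝑓 ∗ is to be minimized for single-objective problems, whereas HVR is to be maximized for multi-objective problems.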
6.1.5    Application: Cylinder Head Water Jacket Design
In this work, PSA-EA is employed to optimize the design of an engine cylinder head. In the
following, the problem is described, the optimization procedure explained, and finally, the obtained
results are discussed.
6.1.5.1    Problem Description
The problem has eight design parameters, which are the areas of four inlets and four outlets of the
cooling water jacket. These parameters are normalized with respect to the largest possible section,
so each area variable takes a value 𝑥𝑖 ∈ [0, 100], 𝑖 = 1, 2, . . . , 8. In the base design,
all inlets and outlets are set to their maximal size, which means 𝑥𝑖 = 100 for all 𝑖 = 1, 2, . . . , 8.
Two conflicting objectives were defined – 𝑓1 (𝑥) to be maximized and 𝑓2 (𝑥) to be minimized. Each
design evaluation requires a detailed CFD simulation which was performed by the engineering
team at Ford. The CFD model consists of 2.4 million volume cells, 13.7 million interior faces,
and 11.4 million vertices. The CFD analysis terminates after 4,000 iterations. Residuals such as
continuity and energy are monitored but not used as convergence criteria.
     Each design analysis takes about one hour using 32 CPUs. The optimization budget is limited
to 61 design evaluations to complete the optimization task in a reasonable time. Furthermore,
two different boundary conditions are considered for CFD simulation, resulting in two separate
problems. They are named B34 and B38 here. These problems have no constraints except the
bounds of the search range.
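Because 𝑓1 is maximized while 𝑓2 is minimized, comparing two designs requires a dominance check that respects both directions. The following sketch shows one way to express the box constraints and such a check; the names, the objective values, and the helper functions are hypothetical and are not taken from the optimization code used in this work.

```python
N_VAR = 8
LOWER, UPPER = 0.0, 100.0
BASE_DESIGN = [UPPER] * N_VAR  # all inlets and outlets at their maximal size


def is_feasible(x):
    """The only constraints are the bounds of the search range."""
    return len(x) == N_VAR and all(LOWER <= xi <= UPPER for xi in x)


def dominates(a, b):
    """a and b are (f1, f2) pairs; f1 is maximized and f2 is minimized."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(objectives):
    """Indices of the designs that are not dominated by any other design."""
    return [
        i for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]


# Hypothetical (f1, f2) values of four evaluated designs.
evaluated = [(3000.0, 12.0), (5000.0, 10.0), (4000.0, 11.0), (6000.0, 13.0)]
print(is_feasible(BASE_DESIGN))
print(non_dominated(evaluated))
```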
6.1.5.2    Results
Two optimization approaches are tested on problems B34 and B38 in parallel and independently. In
the first approach, a commercial software package is used by the engine design engineers for
surrogate model training and optimization. The selection of new infill solutions, however, is
performed manually by the engineering team. This approach is denoted by CS (Commercial Software).
    In the second approach, PSA-EA is employed by the research team at Michigan State University
(MSU) for design optimization. For PSA-EA, the selection of new infill solutions is performed
automatically by the algorithm, except for the final two cycles, in which the engineering team acted
as decision-makers to choose preferred solutions from PSA-EA results. At the end of each cycle,
three to five new solutions obtained by PSA-EA were sent to the engineering team at Ford for
evaluation.
    The CS approach was run independently by the engine design engineers, while PSA-EA was run
at MSU with assistance on solution evaluation. It is worth noting that at the end of the ninth cycle,
the search range for decision variables was manually reduced based on the range of non-dominated
solutions.
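The reduction of the search range was performed manually in this study. As a rough sketch of how such a reduction could be derived from the non-dominated solutions, the per-variable range of those solutions can be taken as the new bounds, optionally widened by a margin and clipped to the original range. The function name, the `margin` parameter, and the data are hypothetical.

```python
def reduced_bounds(designs, margin=0.0, lower=0.0, upper=100.0):
    """Per-variable (min, max) over `designs`, widened by `margin` and clipped.

    `designs` is a list of decision vectors of the non-dominated solutions.
    The margin is an illustrative safeguard, not part of the reported procedure.
    """
    if not designs:
        raise ValueError("at least one design is required")
    n_var = len(designs[0])
    bounds = []
    for k in range(n_var):
        values = [x[k] for x in designs]
        lo = max(lower, min(values) - margin)
        hi = min(upper, max(values) + margin)
        bounds.append((lo, hi))
    return bounds


# Hypothetical non-dominated designs with two variables shown for brevity.
front_designs = [[10.0, 90.0], [30.0, 100.0]]
print(reduced_bounds(front_designs, margin=5.0))
```

A nonzero margin keeps the reduced range from excluding promising designs that lie just outside the current non-dominated set.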
    Figure 6.7 shows the predicted values of new infill solutions, as well as their true values after
CFD simulation for both methods (CS and PSA-EA) and both problems (B34 and B38). All
solutions obtained by both methods are illustrated in Figure 6.8. Figure 6.9 zooms in on the region
of interest and also shows the final solutions selected by the engineering team for
fabrication and experimental testing. Based on the obtained results, the following conclusions can
be drawn:
    • PSA-EA and CS generate solutions that dominate most initial infill solutions (Figure 6.8).
       This can be considered a checkpoint for the validity of the optimization process, even when
       the evaluation budget is highly limited.
    • The selected solutions for fabrication are from PSA-EA results for problem B34 and from
       CS results for problem B38 (Figure 6.9). For both solutions, 𝑓2 is slightly greater than 10.
    • The prediction error is initially high for both methods and both objectives. The error remains
       high for the CS method until the end but gradually decreases for the PSA-EA method over the
       iterations (Figure 6.7). It is remarkable that the predictions of the proposed approach, which
       start with wide discrepancies between the true and surrogate model function values, match the
       true values so well after a few iterations. The trust region method and the overall surrogate
       modeling approach make this reliable correlation possible. This is more noticeable for 𝑓2 , for
       which the prediction error becomes almost zero in the final cycles when using PSA-EA. This
       advantage of PSA-EA is presumably the result of better exploration of promising regions in
       early cycles, better exploitation in the final cycles, and the manual reduction of the search range.
Figure 6.7: Predicted values of new solutions and their true values after CFD simulation for both
methods, both problems, and both objectives. Panels: 𝑓1 and 𝑓2 versus infill solution number (31 to
61) for problems B34 and B38. Series: PSA-EA:True, CS:True, PSA-EA:Predict, CS:Predict.
    • Compared with the base design, the selected solutions show a maximum of 88% and 114%
      improvement of 𝑓1 for problems B34 and B38, respectively. This result supports the
      superiority of design by optimization in comparison with intuition-based design
      methodologies.
    • One interesting and unexpected feature of the selected design for problem B34 is that for
      this design, the size of one of the inlets/outlets is close to zero (𝑥6 = 6.27). This unexpected
      observation demonstrates the possibility of using optimization to develop innovative
      knowledge about key features of optimal solution(s). Such information can be used in earlier
      stages of design to determine the number of inlets/outlets or even their locations. Although
      simultaneous optimization of different features is more challenging than optimization of only
      sizes of inlets/outlets, it is predictably much more rewarding as well.
    • For this problem, considering the soft constraint 𝑓2 ≤ 10 from the beginning could have been
      advantageous. It would automatically concentrate the search to the region of interest in the
      𝑓-space; however, such information about the upper bound of 𝑓2 may not have been known
      beforehand and may have resulted from the non-dominated solutions over cycles.
[Figure 6.8 plots: 𝑓2 versus 𝑓1 for problems B34 and B38; legend: Ini. Designs, PSA-EA, CS]
Figure 6.8: All generated solutions by PSA-EA and CS for problems 𝐵34 and 𝐵38.
[Figure 6.9 plots: 𝑓2 versus 𝑓1 for problems B34 and B38, with the base and selected designs marked]
Figure 6.9: Generated solutions in the vicinity of the interest region for problems 𝐵34 and 𝐵38.
The base and the selected designs for fabrication are indicated by arrows.
   Although both methods were tested independently, the CS approach was run by the design
engineering team at Ford, who knew the designers’ region of interest. This significant privilege
helped to introduce a focus in the search to a small region, which provides more infill solutions for
that region and improves the accuracy of the metamodel. For the PSA-EA, such information was
provided and used only for the last two cycles, when the exploration phase was almost completed.
6.1.6   Summary of Section 6.1
In this work, we have developed a proximity-based surrogate-assisted evolutionary algorithm (PSA-
EA) for single and multi-objective optimization of computationally expensive problems. PSA-EA
selects new infill solutions according to their predicted fitness (exploitation) but imposes a constraint
on their diversity to substantially improve the accuracy of the surrogate model for future cycles
(exploration). It follows a gradual transition from exploration to exploitation by reducing the
proximity radius over time. The importance of such a gradual transition has been numerically
demonstrated, especially when the number of optimization cycles is limited.
    The proposed surrogate-assisted method has been applied to optimize the cylinder head water
jacket design. Besides the optimization procedure itself, this case study demonstrates what
optimization in practice may look like. Practical considerations included the existence of two
different problems with similar goals, a reference design that served as a constant point of
comparison during optimization, solution evaluations carried out via e-mail communication, and
the introduction of a soft constraint toward the end of the optimization run. Altogether, this
should give the reader of this dissertation an idea of how diverse optimization in practice is and
how such challenges can be addressed.
6.2   Case Study II: Electric Machine Design
6.2.1   Introduction
Electric machine design is an iterative process where each iteration focuses on improving the
design’s quality. Machines are complex systems where many variables, geometric and physical,
interact non-linearly and affect the machine’s performance. These performance measures can
include evaluating electromagnetic, thermal, and structural performances, making electric machines
a multi-physics MOP. The need for design optimization is mainly application-driven, and since the
field of applications for electric machines is large, there is much scope for improvement.
    Research efforts in machine design optimization have been focused on improving solution
accuracy and quality while reducing the optimization run-time. Some of the early electric machine
design optimization studies include using pattern search and sequential unconstrained minimization
techniques to optimize induction motors [299, 300]. However, it was demonstrated that evolutionary
algorithms (EAs) are superior to point-by-point methods in finding global optima for complex
systems like electric machines [301]. Consequently, the use of EAs in the optimization of machine
design has increased over time [302, 303, 304, 305, 306, 307]. Since the electric machine is a
nonlinear system, finite element analysis (FEA) is the preferred method for its evaluation.
Examples of optimization studies where EAs were combined with FEA can be found in [308, 309].
Since EAs require many function evaluations to converge to the global optimum, using FEA
to evaluate each candidate design during optimization requires substantial computational
resources. In this regard, researchers have explored techniques, including surrogate-assisted
optimization methods, that replace FEA to reduce computation time, albeit at the expense of
solution accuracy [310, 311, 312, 313, 314]. For example, a second-order response
surface model (RSM) was combined with a GA to optimize the magnet shape and placement in the
rotor of an interior permanent magnet (IPM) machine to achieve the largest constant power speed
region (CPSR) [310]. The second-order RSM was used to predict the d-axis and q-axis inductances
and the magnet flux linkage of an IPM machine. Similarly, a surrogate-based optimization approach
using the multi-objective differential evolution (MODE) algorithm was employed to minimize active material
mass and total losses of an axial flux PM (AFPM) machine at rated operation [313]. Moreover,
a local refinement strategy improving a Pareto-optimal design further after termination of the
optimization method was presented [315]. Results showed that a posteriori local search, even with
fewer function evaluations, produced similar results compared to an approach solely relying on
global optimization.
    Although researchers have explored different optimization algorithms and methods to evaluate
objective functions, constraint handling in the literature on electric machine optimization remains
inefficient. In optimizing electric machines, constraints define feasible regions in the design and
objective space [316]. Typically, every machine optimization problem involves some geometric
constraints that must be respected to generate feasible designs. Unlike objective functions, these
geometric constraints can be quickly evaluated through analytical expressions. A procedure to iden-
tify the geometric feasibility of a candidate solution can be included in the optimization algorithm
itself. A common approach to handle geometrically infeasible solutions is to discard them and
rely on random initialization of variables and repeat this process until a feasible solution has been
found [317]. However, random sampling becomes relatively inefficient as the number of geometric
variables and constraints increases, which is eventually reflected in the quality of Pareto-optimal
solutions when the computational resources are limited. To tackle this problem, we propose an
embedded optimization problem where the information from constraint violation is used to repair
geometrically infeasible solutions and improve the Pareto-optimal front.
The main contributions of this work are the following:
     • The proposal of a repair operator ensuring the feasibility of designs. This operator exploits
       inexpensive constraints, e.g., geometric constraints calculated using analytical expressions,
       while respecting manufacturing accuracy limitations.
     • A demonstration of improvement in quality of the Pareto-optimal solution set by integrating
       a repair operator into the optimization cycle. Additionally, a performance validation of the
       proposed surrogate assistance for predicting the computationally expensive objectives.
     • A detailed physical explanation of Pareto-optimal solutions and recommendations for selecting
       preferred solutions based on two different approaches: (1) a domain-specific a posteriori
       multi-criteria decision-making (MCDM) method, which involves machine expertise, and (2)
       a trade-off analysis of the Pareto-optimal set.
                     Table 6.3: Parameters of IPM machine used for optimization.
                      Parameters              Values    Parameters         Values
                      Mechanical power         69 kW    Turns per coil       11
                      Rated speed            3000 rpm   Slots/pole/phase      2
                      Peak current             177 A    Slot fill factor    0.46
                      Stator outer diameter   264 mm    Air-gap           0.75 mm
                      Rotor outer diameter  160.4 mm    Stack length      50.8 mm
                      Average torque        214.8 Nm    Magnet type        NdFeB
                      Torque pulsations      36.2 Nm    DC-link voltage    650 V
    The remainder of this case study is structured as follows. Section 6.2.2 discusses the formulation
of the optimization problem. Section 6.2.3 explains the proposed optimization method, which
exploits the computationally inexpensive constraints by introducing a repair operator and the
computationally expensive objective functions by using surrogate models. The impact of the
proposed repair operator and surrogate assistance on the convergence of the algorithm is discussed
in Section 6.2.4. A detailed discussion about Pareto-optimal solutions and the selection of preferred
electric machine designs is also included. Finally, conclusions are drawn in Section 6.2.5.
6.2.2    Electric Machine Design and Optimization Problem Formulation
The optimization of electrical machines typically starts with the selection of a machine template.
Based on machine performance requirements, objective functions are formulated, followed by the
selection of the design variables, variable ranges, and constraints. Average torque, torque pulsations,
losses, and efficiency are some of the most common objective functions used in electric machine
design. Designers usually select variables based on the domain knowledge and then proceed to
sensitivity analysis to keep only the most significant variables during optimization [318]. Variable
ranges can be selected arbitrarily or based on the machine designer’s experience. Since the search
space is generally high dimensional, a proper definition of geometric constraints can ensure the
reliability of solutions.
[Figure 6.10 panels: (a) 2D Model, (b) Reduced Model; labeled parts: stator yoke, stator slots and copper windings, permanent magnets, rotor yoke, shaft]
                         Figure 6.10: IPM machine used for optimization.
[Figure 6.11 plot: rated torque (Nm) versus electrical position (degree), with the average torque and the torque pulsations annotated]
           Figure 6.11: Torque profile of reference design at rated operating conditions.
6.2.2.1   Selection of Machine Topology, Objective Functions, and Evaluation Method
Permanent magnet synchronous machines (PMSMs) are known for their high torque density and
efficiency. They are extensively used in hybrid and electric vehicle applications. In this work,
an electrical machine used in the 2010 Toyota Prius is chosen for optimization. The machine is
a 3-phase, 48-slot, 8-pole IPM machine with a single layer of V-shaped magnets. Details of
the machine parameters are given in Table 6.3, and a visualization is shown in Figures 6.10a and
6.10b [319]. Since IPM machines are highly nonlinear, FEA is chosen as the objective function
evaluation tool. Because FEA is time-consuming, periodicity is exploited, and only 1/8th of the
model is considered for evaluation. The goal is to maximize average torque while minimizing
torque pulsations, where pulsations are defined from peak to peak as shown in Figure 6.11.
6.2.2.2   Definition of Feasible Search Space
The following geometric variables are kept constant as in the original 2010 Prius motor:
     • Inner diameter (ID) and outer diameter (OD) of stator and rotor
     • Air-gap length between rotor and stator
     • Stack length of the machine
     • Number of turns per coil
     • Slot fill factor and maximum current density in stator slot
    Next, a sensitivity analysis is performed to identify the 10 most significant variables shown in
Figure 6.12. Out of the 10 variables, six define magnet shape and placement in the rotor, while
four control slot size and shape. Variable ranges are defined to be within 20% variation from
the reference design. All variable values are limited to have only two decimal places because of
manufacturing accuracy limitations. Variable values for the reference design (𝑥 (ref) ) along with
lower (𝑥 (𝐿) ) and upper bounds (𝑥 (𝑈) ) are given in Table 6.4. A total of 10 geometric constraints are
considered for the optimization problem. The geometric constraints allow a quick check of a design
for geometric feasibility without running the FEA simulation. The constraints are defined with
domain knowledge in mind, which is vital for a good torque profile, and account for a typical
manufacturing tolerance of ±0.05 mm.
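The ±20% variable ranges and the two-decimal precision can be checked with a few lines of arithmetic. The following minimal Python sketch assumes that each bound is obtained by scaling the reference value by 0.8 or 1.2 and rounding the result to two decimal places; the function name `bounds_from_reference` is illustrative and not part of the original study. Under this assumption, the computed bounds agree with the values listed in Table 6.4.

```python
# Reference values x^(ref) taken from Table 6.4, in the order x1 ... x10.
X_REF = [9.56, 7.16, 17.88, 145.35, 1.99, 13.9, 30.9, 6.69, 1.22, 1.88]


def bounds_from_reference(x_ref, variation=0.20, decimals=2):
    """Return (lower, upper) bounds as +/- `variation` around each reference value.

    Assumption: bounds are rounded to `decimals` places, mirroring the
    manufacturing precision stated in the text.
    """
    lower = [round(v * (1.0 - variation), decimals) for v in x_ref]
    upper = [round(v * (1.0 + variation), decimals) for v in x_ref]
    return lower, upper


if __name__ == "__main__":
    lower, upper = bounds_from_reference(X_REF)
    for d, (lo, ref, hi) in enumerate(zip(lower, X_REF, upper), start=1):
        print(f"x{d}: {lo:.2f} <= {ref:.2f} <= {hi:.2f}")
```
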
6.2.2.3   Selection of Operating Point for Optimization
The performance of an IPM machine directly depends on the speed and torque requirements. Figure
6.13 shows the efficiency contour map of the 2010 Prius motor with a dc-link voltage of 650 V
[2]. The peak torque rating of the machine stays constant until a particular speed called the base
speed of the machine, which is also very close to the rated speed. PMSMs are famous for their
torque density, and therefore, their operation at rated speed is of high importance. To increase
the efficiency of PMSMs in the speed region up to the base speed, they are operated so as to
minimize the excitation current fed to the stator copper windings and, therefore, Joule losses, while
meeting the torque requirements. This specific mode of operation is called maximum torque per
ampere (MTPA) operation. For the optimization problem formulation, the rotational speed of the
rotor and the excitation angle are kept equal to the values of the reference design at rated MTPA
operation. Moreover, it is worth noting that the excitation current is not kept constant. The slot fill
factor denotes the proportion of the slot area filled by copper windings. Because the slot fill factor
is kept constant while the slot cross-section varies, different designs have different current ratings.
                  Table 6.4: Values of geometric variables used for optimization.
                  𝑥𝑑     Variable                   Unit      𝑥(ref)    𝑥(𝐿)      𝑥(𝑈)
                  𝑥1     Height of rotor pole cap   mm          9.56      7.65     11.47
                  𝑥2     Magnet thickness           mm          7.16      5.73      8.59
                  𝑥3     Magnet width               mm         17.88     14.30     21.46
                  𝑥4     Angle between magnets      degree    145.35    116.28    174.42
                  𝑥5     Bridge height              mm          1.99      1.59      2.39
                  𝑥6     Q-axis width               mm         13.9      11.12     16.68
                  𝑥7     Slot height                mm         30.9      24.72     37.08
                  𝑥8     Slot width                 mm          6.69      5.35      8.03
                  𝑥9     Height of slot opening     mm          1.22      0.98      1.46
                  𝑥10    Width of slot opening      mm          1.88      1.50      2.26
[Figure 6.12 drawing: machine cross-section annotated with the geometric variables 𝑥1 to 𝑥10 (𝑥6 shown as 𝑥6/2)]
                         Figure 6.12: Geometric variables used for optimization.
                 Figure 6.13: 2010 Prius motor efficiency contours for 650 Vdc [2].
     Mathematically, the 10-variable (𝐷 = 10) MOP is now defined as:
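\begin{equation}
\begin{aligned}
\text{Maximize}   \quad & \text{Average Torque}(\mathbf{x}), \\
\text{Minimize}   \quad & \text{Torque Pulsation}(\mathbf{x}), \\
\text{subject to} \quad & g_j(\mathbf{x}) \leq 0, && \forall j \in (1, \ldots, 10), \\
                        & x_d^{(L)} \leq x_d \leq x_d^{(U)}, && \forall d \in (1, \ldots, 10), \\
\text{where}      \quad & \mathbf{x} \in \mathbb{R}^D,
\end{aligned}
\tag{6.7}
\end{equation}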
where $g_j(\mathbf{x})$ represents the geometric constraints, $\mathbf{x}$ the variables to optimize, and $x_d^{(L)}$ and
$x_d^{(U)}$ the lower and upper bounds of the $d$-th variable, respectively. All variables are restricted
to have a precision of two decimal places due to manufacturing accuracy limits. Both objective
functions, Average Torque(x) and Torque Pulsation(x), are based on the result of a 2D
transient electromagnetic analysis.
6.2.3    Methodology
When solving real-world optimization problems, the definition of the problem itself can be challeng-
ing, and numerous design decisions have to be made. A careful observation of the optimization
problem in the previous section reveals that during optimization, along with two objectives, 10
geometric constraints need to be dealt with as well. In particular, while objective functions are
expensive to compute, geometric constraints are relatively inexpensive and are computed using
mathematical expressions. A preliminary study [282] showed that the computational inexpensive-
ness of constraint evaluations could be exploited through a repair operator. The goal of the repair
operator is to convert an infeasible solution to a feasible one that satisfies all constraints. Addi-
tionally, design optimization of electric machines is an expensive problem to solve and an effort
must be made to reduce the computational cost. The evaluation of 1000 design solutions using a
2D transient magnetic study in Flux-2D on a single core of a PC-based workstation can take about
14 hours, even with only a 1/8th periodic model. In order to maintain the statistical significance of
results, an optimization run has to be repeated several times, making the whole procedure extremely
expensive. Therefore, this study also presents a strategy for the incorporation of surrogates into the
proposed optimization algorithm along with a repair operator.
    In this study, the well-known evolutionary multi-objective optimization (MOO) algorithm
NSGA-II [10] is used as the base optimization algorithm. NSGA-II is a modular, parameter-less
optimization algorithm well suited for bi-objective optimization problems, including optimization
of electric machines. NSGA-II starts with a population of random solutions called the parent
population. After evaluating the population members, pair-wise comparisons are made to select
non-dominated and less-crowded solutions [9] in order to meet the main goals of multi-objective
optimization. The selected population members are then recombined and mutated to create an
offspring population of the same size as the parent population. After their evaluation, the offspring
population is merged with the parent population to execute a final survival selection to pick the
top half of the population. The selected population becomes the parent population of the next
generation. This process is continued until a termination criterion is satisfied. The incorporation
of a repair operator and surrogate assistance is explained below.
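The two ingredients of the survival selection described above, Pareto dominance and crowding distance, can be sketched in plain Python. This is a minimal illustration and not the implementation used in the study: it assumes that all objectives are minimized (average torque would be negated), and the objective values in the usage example are hypothetical.

```python
import math


def dominates(a, b):
    """Return True if objective vector `a` Pareto-dominates `b` (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(F):
    """Return the indices of all points in F that are not dominated by another point."""
    return [
        i for i, fi in enumerate(F)
        if not any(dominates(fj, fi) for j, fj in enumerate(F) if j != i)
    ]


def crowding_distance(F):
    """Crowding distance of each point in F; boundary points receive infinity."""
    n = len(F)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(F[0])):
        order = sorted(range(n), key=lambda i: F[i][m])
        f_min, f_max = F[order[0]][m], F[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = math.inf
        if f_max == f_min:
            continue  # all points share this objective value
        for pos in range(1, n - 1):
            gap = F[order[pos + 1]][m] - F[order[pos - 1]][m]
            dist[order[pos]] += gap / (f_max - f_min)
    return dist


if __name__ == "__main__":
    # Hypothetical (negated average torque, torque pulsation) pairs.
    F = [(-220.0, 30.0), (-215.0, 25.0), (-210.0, 20.0), (-212.0, 28.0)]
    front = non_dominated(F)
    print(front, crowding_distance([F[i] for i in front]))
```

Less-crowded solutions are those with a larger crowding distance, which is why the boundary points of a front are always preferred within their rank.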
6.2.3.1    Repair Operator
The repair operator implementation consists of two phases: first, the geometric constraints are
satisfied by using an embedded but relatively simple optimization procedure; second, the required
precision of two decimal places is enforced by rounding each variable up or down such that the
geometric constraints remain satisfied, which keeps the implementation simple. Thus, the repair
operator finally returns a feasible solution considering all the electric machine design problem
specifications. The proposed repair ensures that a solution is feasible before its objectives are
evaluated. In order to ensure feasibility, the constraint functions are called more frequently than
the objective functions, exploiting the inexpensiveness of constraints. Incorporating this repair
operator into an evolutionary algorithm is relatively straightforward and yet an effective strategy
for adapting an existing optimization method to the needs of a real-world optimization problem.
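The second phase of the repair, rounding to two decimal places without losing feasibility, can be illustrated as follows. This is only a sketch under stated assumptions: the exhaustive search over all up/down combinations, the function name `round_preserving_feasibility`, and the constraint in the usage example are hypothetical choices and are not taken from the original implementation. The first phase (the embedded optimization procedure) is not covered here.

```python
import math
from itertools import product


def round_preserving_feasibility(x, constraints, decimals=2):
    """Round every variable of `x` up or down to `decimals` places.

    Returns the first combination for which all constraints g(x) <= 0 hold,
    trying the nearest rounding of each variable first, or None if no
    combination is feasible. Assumption: constraints are cheap, so checking
    up to 2^D combinations is affordable for small D.
    """
    scale = 10 ** decimals
    options = []
    for v in x:
        down = math.floor(v * scale) / scale
        up = math.ceil(v * scale) / scale
        # Nearest candidate first; a set avoids duplicates if v is already rounded.
        options.append(sorted({down, up}, key=lambda c: abs(c - v)))
    for candidate in product(*options):
        if all(g(candidate) <= 0.0 for g in constraints):
            return list(candidate)
    return None


if __name__ == "__main__":
    # Hypothetical constraint: the sum of both variables must not exceed 10.005.
    g = lambda x: x[0] + x[1] - 10.005
    print(round_preserving_feasibility([5.006, 4.998], [g]))
```

In the usage example, rounding both variables to their nearest values would violate the constraint, so the second variable is rounded down instead.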
6.2.3.2    Surrogate Incorporation
Commonly, surrogates – approximation or interpolation models – are utilized during optimization
to improve the convergence behavior. First, one shall distinguish between two different types of
evaluations: ESEs, which require running the computationally expensive evaluation, and ASEs,
which are computationally inexpensive approximations provided by the surrogate. Whereas the
overall optimization run is limited to ESEmax function evaluations, function calls of ASEs are
considered only as algorithmic overhead. In order to improve the convergence of NSGA-II, the
surrogates provide ASEs and let the algorithm look several iterations into the future without any
evaluation of ESEs. The surrogate models are used to create a set of infill solutions as follows:
First, NSGA-II is run for 𝑘 more iterations (starting from the best solutions found so far),
returning the solution set 𝑋(cand). The number of solutions in 𝑋(cand) corresponds to the
population size of the algorithm, which is fixed to 100 solutions in this study. After eliminating
duplicates in 𝑋(cand), the 𝑁 solutions to be evaluated using ESEs need to be selected. The
selection first creates 𝑁 clusters (in the objective space based on F(cand)) using the k-means
algorithm and then uses a roulette wheel selection based on the predicted crowding distances.
Note that this will introduce a bias towards boundary points as they are assigned a crowding
distance of infinity. Altogether, this results in 𝑁 solutions to be then evaluated using ESEs in
this optimization cycle.
[Figure 6.14 plot: objective space with axes 𝑓1 and 𝑓2, showing the population (high-fidelity), the candidates (surrogate), and the ranking selection based on crowding distance]
Figure 6.14: Ranking selection of solutions obtained by optimizing the surrogate-based optimiza-
tion problem.
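The selection of infill solutions from the clustered candidates can be sketched as follows. The sketch assumes that cluster labels (for example, from k-means) and predicted crowding distances are already available, that exactly one candidate is drawn per cluster, and that a candidate with an infinite crowding distance is always preferred within its cluster; these details and the function name `select_infills` are illustrative assumptions rather than a description of the original implementation.

```python
import math
import random


def select_infills(labels, crowding, rng=None):
    """Pick one candidate index per cluster by roulette wheel on crowding distance.

    labels[i]   -- cluster label of candidate i (e.g., from k-means)
    crowding[i] -- predicted crowding distance of candidate i
    """
    rng = rng or random.Random()
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    selected = []
    for label in sorted(clusters):
        members = clusters[label]
        boundary = [i for i in members if math.isinf(crowding[i])]
        if boundary:
            # Assumption: infinite weight means a boundary point always wins.
            selected.append(rng.choice(boundary))
            continue
        weights = [crowding[i] for i in members]
        if sum(weights) <= 0.0:
            # Degenerate case: fall back to a uniform choice.
            selected.append(rng.choice(members))
        else:
            selected.append(rng.choices(members, weights=weights, k=1)[0])
    return selected


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2]
    crowding = [math.inf, 0.5, 0.0, 1.0, 0.3]
    print(select_infills(labels, crowding, random.Random(1)))
```

Because the selection probability is proportional to the crowding distance, candidates in sparsely populated parts of the predicted front are more likely to receive an expensive evaluation.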
     A few more words shall be said about the surrogate itself. Since the electric machine design
is formulated with two objectives to be optimized, two different models are built. Separately
fitting a model for each objective corresponds to the M1 method proposed in the surrogate usage
taxonomy [23]. For each objective, the best model type is found by iterating over different model
realizations of RBF [34] and Kriging [35] varying normalization, regression, and kernel type.
Finally, the best model type is chosen based on the validation set’s performance.
6.2.3.3   NSGA-II-WR-SA
In Algorithm 6.1, a detailed pseudo-code demonstrating the solution repair and surrogate usage
is provided. The algorithm’s parameters are the expensive objective functions 𝑓(x) and the
inexpensive constraint functions 𝑔(x); the maximum number of exact solution evaluations ESEmax,
which serves as an overall termination criterion; the size of the initial design of experiments
𝑁DOE, which describes how many designs are evaluated before optimization starts; the number of
solutions 𝑁 evaluated in each optimization cycle; and the number of surrogate optimization
cycles 𝑘, or in other words, for how many iterations the surrogates are used to look into the future.
  Algorithm 6.1: NSGA-II-WR-SA: NSGA-II with Repair and Surrogate Assistance.
    Input : Expensive Objective Function 𝑓(x), Inexpensive Constraint Function 𝑔(x), Maximum
             Number of Exact Solution Evaluations ESEmax, Number of Design of Experiments 𝑁DOE,
             Number of ESEs in each Iteration 𝑁, Number of Surrogate Optimization Cycles 𝑘
    /* initialize feasible solutions using the inexpensive function 𝑔 */
  1 𝑋 ← constrained_sampling(𝑁DOE, 𝑔)
  2 F ← 𝑓(𝑋)
  3 while |𝑋| < ESEmax do
        /* exploitation using the surrogate */
  4     𝑓ˆ ← fit_surrogate(𝑋, F)
  5     𝑋(cand), F(cand) ← optimize('NSGA-II-WR', 𝑓ˆ, 𝑔, 𝑋, F, 𝑘)
  6     𝑋(cand), F(cand) ← eliminate_duplicates(𝑋, 𝑋(cand), F(cand))
  7     𝐶 ← cluster('k_means', 𝑁, F(cand))
  8     𝑋(surrogate) ← ranking_selection(𝑋(cand), 𝐶, crowding(F(cand)))
        /* evaluate and merge to the archive */
  9     F(surrogate) ← 𝑓(𝑋(surrogate))
 10     𝑋 ← 𝑋 ∪ 𝑋(surrogate)
 11     F ← F ∪ F(surrogate)
 12 end
    First, the algorithm starts by sampling 𝑁DOE solutions in the feasible space using a sampling
strategy producing only feasible solutions (for more details, we refer to [282]) and evaluates the
solution set (Lines 1 and 2). Then, while the overall evaluation budget ESEmax has not been used up,
surrogates 𝑓ˆ are built for the objectives (Line 4). By applying NSGA-II for 𝑘 optimization cycles
starting from 𝑋 using the surrogate models 𝑓ˆ(x) and the inexpensive constraint functions 𝑔(x), a
candidate set of solutions 𝑋(cand) and F(cand) is retrieved (Line 5). Depending on the surrogate
problem, some solutions in 𝑋(cand) can be identical to the ones already evaluated in 𝑋; thus, a
duplicate elimination ensures these solutions are filtered out (Line 6). Since the size of 𝑋(cand)
exceeds 𝑁, a subset selection based on the predicted crowding distances takes place (Lines 7 and 8).
Finally, the resulting solution set 𝑋(surrogate) of size 𝑁 is evaluated using ESEs and is appended to
the archive of solutions (Lines 9 to 11).
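The control flow of Algorithm 6.1 can be mirrored by a short Python skeleton in which every component is passed in as a callable. This is a sketch of the loop structure only, not the original implementation: the parameter names `sample`, `fit_surrogate`, `optimize`, and `select` are hypothetical placeholders for constrained sampling, surrogate fitting, the surrogate-based NSGA-II-WR run, and the cluster-based selection, and the two guards marked as assumptions are additions that keep the sketch from exceeding the budget or looping forever.

```python
def nsga2_wr_sa(f, g, ese_max, n_doe, n_infill, k,
                sample, fit_surrogate, optimize, select):
    """Surrogate-assisted loop following the structure of Algorithm 6.1.

    sample(n, g)                -> iterable of n feasible designs
    fit_surrogate(X, F)         -> callable approximating f
    optimize(f_hat, g, X, F, k) -> (X_cand, F_cand) after k surrogate iterations
    select(X_cand, F_cand, n)   -> at most n designs chosen for exact evaluation
    """
    X = [tuple(x) for x in sample(n_doe, g)]      # Line 1
    F = [f(x) for x in X]                         # Line 2
    while len(X) < ese_max:                       # Line 3
        f_hat = fit_surrogate(X, F)               # Line 4
        X_cand, F_cand = optimize(f_hat, g, X, F, k)  # Line 5

        # Line 6: drop candidates that are already archived or repeated.
        seen = set(X)
        unique_X, unique_F = [], []
        for x, fx in zip(X_cand, F_cand):
            x = tuple(x)
            if x not in seen:
                seen.add(x)
                unique_X.append(x)
                unique_F.append(fx)
        if not unique_X:
            break  # assumption: stop if the surrogate proposes nothing new

        # Lines 7 and 8; assumption: never exceed the remaining ESE budget.
        budget = min(n_infill, ese_max - len(X))
        infill = [tuple(x) for x in select(unique_X, unique_F, budget)]
        if not infill:
            break

        X.extend(infill)                          # Lines 9 to 11
        F.extend(f(x) for x in infill)
    return X, F
```

Passing the components as arguments keeps the skeleton independent of any particular surrogate model or optimizer, so each piece can be replaced or tested in isolation.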
6.2.4    Results and Discussion
In the following, the performance contribution of each of the components in the proposed method
shall be examined. This includes answering the following key questions:
     • Is the repair operator helpful during optimization, and what is its impact?
     • Does the usage of surrogates improve the convergence behavior?
     • What insights are gained from the Pareto-optimal designs, and what can we learn from them
       for the electric machine design?
    The first two questions are related to the optimization method itself and its convergence. The
latter is vital as it addresses the ultimate goal of optimization, which is gaining more insights and
finally choosing an electric machine design.
6.2.4.1    Analysis of Constraints
Before analyzing the impact of the repair operator, the constraints shall be investigated. As an initial
study, 100 random solutions are sampled using Latin hypercube sampling. To obtain statistically
robust estimates, the sampling is repeated 100 times, which yields 10,000 solutions in total. The results
show that 69.7% of these randomly sampled solutions are infeasible; in other words, only 30.3% are feasible.
A solution is considered infeasible if one or more constraints are violated. Analyzing each
constraint separately shows which constraints are more difficult to satisfy. Table 6.5 lists the
percentage of infeasible solutions for each constraint. The constraints
𝑔8 and 𝑔9 , which control the slot shape, and 𝑔1 , which controls the magnet placement close to the
shaft, have not been violated in any of the 10,000 solutions. In contrast, the constraints
𝑔2 and 𝑔7 , which relate to magnet placement close to the rotor OD, are violated by
31.43% and 36.28% of the solutions, respectively. It is worth mentioning that these 10,000 solutions are


              Table 6.5: The constraint violation of each constraint value from 𝑔1 to 𝑔10 .
  Value      𝑔1        𝑔2       𝑔3        𝑔4       𝑔5        𝑔6        𝑔7      𝑔8      𝑔9      𝑔10
  Infeas.   0.0%    31.43%   19.16%     19.16%   21.94%    19.59%    36.28%   0.0%    0.0%   20.25%
  Rank        7        2        6          6        3         5         1       7       7       4
generated from the ranges of the variables defined in Table 6.4. For a different search space with
arbitrarily defined variable bounds, the percentage of infeasible solutions could be even larger.
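
A sampling study of this kind can be sketched as follows. The sampler is a basic Latin hypercube written with the standard library only, and the two toy constraints are hypothetical placeholders (with 𝑔(𝑥) ≤ 0 meaning satisfied); they are not the constraints 𝑔1 to 𝑔10 of the machine design problem.

```python
import random

def latin_hypercube(n_samples, n_vars, rng):
    """n_samples points in [0, 1]^n_vars with exactly one point per stratum in each dimension."""
    columns = []
    for _ in range(n_vars):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

def infeasibility_rates(samples, constraints):
    """Share of samples violating each constraint, and share violating at least one."""
    n = len(samples)
    per_constraint = [sum(g(x) > 0 for x in samples) / n for g in constraints]
    overall = sum(any(g(x) > 0 for g in constraints) for x in samples) / n
    return per_constraint, overall

# Toy example with hypothetical constraints on normalized variables.
rng = random.Random(1)
samples = latin_hypercube(100, 3, rng)
toy_constraints = [lambda x: x[0] + x[1] - 1.5, lambda x: 0.2 - x[2]]
per_constraint, overall = infeasibility_rates(samples, toy_constraints)
```

Repeating the sampling with different seeds and averaging the rates corresponds to the repeated experiment described above.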
6.2.4.2    Impact of Repair Operator
In order to investigate the impact of the proposed repair operator, experiments with two optimization
methods are conducted: (1) NSGA-II and (2) NSGA-II-WR, which refers to NSGA-II with the
proposed repair operator. Both approaches use the binary tournament selection, simulated binary
crossover (SBX) operator with a probability of 0.9, and polynomial mutation. The distribution
indices used for the crossover and mutation operators are 𝜂𝑐 = 15 and 𝜂𝑚 = 20, respectively. For both
methods, a population size of 100, 20 offspring in each generation, and 1,500 function evaluations
in total are chosen for one optimization run. Five such optimization runs are completed for each
method to reduce the influence of random effects, making the total number of evaluations 7,500. The
overall setup and results of all experiments are shown in Table 6.6 and Figure 6.15.
     First of all, one can note that both algorithms have successfully converged to a set of Pareto-
optimal solutions showing trade-offs between average torque and torque pulsation. Second, NSGA-
II-WR outperforms NSGA-II, which demonstrates the positive impact of the repair operator. The use
of the repair operator obtains more non-dominated solutions, which also have a larger hypervolume
value. Moreover, it should be noted that for the calculation of hypervolume, the worst and the best
points are found from the combined set of the two Pareto-optimal fronts. Thereafter, the objective
functions are normalized to obtain the normalized hypervolume, as shown in Table 6.6. One
might ask why this experiment was necessary and how a repair operator could harm
convergence in the first place. One possible risk of adding a deterministic repair of infeasible
solutions is a loss of diversity because the natural exploration of an evolutionary algorithm has been


Table 6.6: Optimization setup and results. Evaluations (Evals) correspond to total functional
evaluations performed in five runs. The reported hypervolume is calculated after normalization of
objective functions.
       Algorithm            Description        Evals  Feasible    Non-dominated     Hypervolume
       NSGA-II             Conventional        7,500   5,446            27              0.7206
       NSGA-II-WR           With Repair        7,500   7,500            59              0.7382
              (a) NSGA-II (7,500 evaluations).               (b) NSGA-II-WR (7,500 evaluations).
Figure 6.15: Objective space illustrating dominated and non-dominated solutions for optimization
runs completed using NSGA-II and NSGA-II-WR.
interfered with. Thus, influencing the exploration by adding a repair can also have a negative
impact. However, results indicate that the proposed customization is well-suited for optimizing the
design of an IPM machine and increases the diversity of obtained solutions on the Pareto-optimal
front.
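
The normalized hypervolume comparison can be sketched for two objectives as follows. This is a minimal illustration under stated assumptions: both objectives are treated as minimized (average torque enters negated), the reference point is (1, 1) in normalized space, and the two fronts are hypothetical numbers rather than results of this study.

```python
def nondominated(points):
    """Return the points not dominated by any other point (all objectives minimized)."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    unique = list(dict.fromkeys(tuple(p) for p in points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]

def normalize(points, ideal, nadir):
    """Map each objective linearly so that ideal -> 0 and nadir -> 1."""
    return [tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, ideal, nadir)) for p in points]

def hypervolume_2d(points, ref=(1.0, 1.0)):
    """Area dominated by a two-objective minimization front and bounded by ref."""
    front = sorted(p for p in nondominated(points) if p[0] <= ref[0] and p[1] <= ref[1])
    area, previous_f2 = 0.0, ref[1]
    for f1, f2 in front:
        area += (ref[0] - f1) * (previous_f2 - f2)
        previous_f2 = f2
    return area

# Hypothetical fronts as (negated average torque, torque pulsation) pairs.
front_a = [(-240.0, 40.0), (-220.0, 20.0)]
front_b = [(-260.0, 50.0), (-230.0, 10.0)]
combined = front_a + front_b  # best and worst values are taken from the combined set
ideal = tuple(min(p[k] for p in combined) for k in range(2))
nadir = tuple(max(p[k] for p in combined) for k in range(2))
hv_a = hypervolume_2d(normalize(front_a, ideal, nadir))
hv_b = hypervolume_2d(normalize(front_b, ideal, nadir))
```

Taking the ideal and nadir values from the combined set puts both fronts on the same scale, so their hypervolume values are directly comparable.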
6.2.4.3   Parameter Study for Surrogate-Assisted Optimization
In this section, the following three hyperparameters related to the surrogate-assistance are varied
to analyze the performance of the proposed optimization method.
    • 𝑁: Number of ESEs in each iteration
    • 𝑘: Number of generations for exploitation using ASEs
    • 𝑁 DOE : Number of initial design of experiments


    To carefully analyze the impact of the above parameters, three optimization Setups, A, B, and
C, are defined in a sequential manner. Each setup consists of four cases with variations applied to
only one hyperparameter while keeping the other two constant. Each case is repeated five times
with 200 function evaluations in each run, making the total number of evaluations equal to 1,000.
To make fair comparisons, the same set of 𝑁 DOE initial solutions is used for all runs defined in Setups A and B, while
the same seed is used to generate the initial population for all runs defined in Setup C. The complete
setup for this study is shown in Table 6.7. Setups A, B, and C quantify the impact of 𝑁, 𝑘,
and 𝑁 DOE , respectively. In order to compare the performance of surrogates in different cases, three
criteria are selected: (1) the number of non-dominated solutions (𝑁𝑛𝑑𝑠 ), (2) hypervolume (HV),
and (3) the rate of change of HV with evaluations (RHVE). Since the study aims to find a suitable
hyperparameter setting in a sequential manner, results of Setup A are used to define Setup B, and
combined results of Setups A and B are used to define Setup C. The calculation of the three criteria
is explained below.
     • 𝑁𝑛𝑑𝑠 : All five runs of each case are combined to obtain one non-dominated (Pareto) front,
       yielding the number of non-dominated solutions.
     • HV: The best and the worst objective function values are found from the Pareto-optimal sets
       of the cases being analyzed, and objective functions values are normalized to calculate the
       corresponding HV.
     • RHVE: Five runs of each case provide five arrays of RHVE. Then the median of these five
       arrays is used to obtain the final RHVE for the corresponding case.
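
The first and the third criterion can be sketched as follows. The finite-difference form of RHVE is an assumption, since the text only describes it as the rate of change of HV with evaluations; the function names are hypothetical, and all objectives are assumed to be minimized.

```python
from statistics import median

def count_nondominated(runs):
    """N_nds: merge the solution sets of all runs and count the non-dominated points."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    merged = list(dict.fromkeys(tuple(p) for run in runs for p in run))
    return sum(1 for p in merged if not any(dominates(q, p) for q in merged))

def rhve(evaluations, hv):
    """Finite-difference rate of change of HV with respect to the number of evaluations."""
    return [(hv[i + 1] - hv[i]) / (evaluations[i + 1] - evaluations[i])
            for i in range(len(hv) - 1)]

def median_curve(curves):
    """Element-wise median over several equally long arrays (one per run)."""
    return [median(values) for values in zip(*curves)]
```

For one case, rhve would be applied to the HV history of each of the five runs, and median_curve would combine the five resulting arrays into the final RHVE.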
    Results for all the cases of three setups are shown in Table 6.8 and Figure 6.16. The observations
from this study are listed below.
     • Increasing 𝑁 initially improves the non-dominated front. However, a large
       value of 𝑁 may lead to premature convergence with a biased search, as can be seen for
       𝑁 = 20 and 𝑁 = 25. A possible way to overcome this could be to increase the total number


Table 6.7: Complete setup for analyzing impact of hyperparameters on performance of surrogate-
assisted optimization. For all cases, number of functional evaluations is limited to 200 in a single
run. Each case is repeated 5 times, thus, making total evaluations 1,000 for each case.
                                    Setup A             Setup B             Setup C
               Case    Runs
                                 𝑁    𝑘   𝑁 DOE      𝑁     𝑘   𝑁 DOE      𝑁    𝑘   𝑁 DOE
                  1     5         5  25    100       10   10   100       10   35    60
                  2     5        10  25    100       10   20   100       10   35    80
                  3     5        20  25    100       10   25   100       10   35   100
                  4     5        25  25    100       10   35   100       10   35   120
Table 6.8: Results for Setups A, B, and C defined for analyzing impact of hyperparameters on
performance of surrogates. Hypervolume (HV) is calculated after normalization of objective
functions.
                                 Setup A            Setup B             Setup C
                    Case
                              𝑁 𝑛𝑑𝑠    HV        𝑁 𝑛𝑑𝑠     HV        𝑁 𝑛𝑑𝑠    HV
                      1        42    0.8051       32     0.8062       47    0.8650
                      2        40    0.8304       33     0.7851       51    0.8269
                      3        56    0.7711       40     0.8304       43    0.8211
                      4        30    0.7475       43     0.8397       38    0.7983
       of evaluations. Nevertheless, that would also result in an unwanted increase in the associated
       computational cost.
    • Increasing the value of parameter 𝑘 generally leads to a better Pareto-optimal front, with
       𝑘 = 20 being the only exception in Table 6.8. This makes sense
       since more generations for exploitation means more ASEs before surrogates produce infill
       solutions.
    • Increasing the value of 𝑁 DOE results in smaller HV. For a smaller value of 𝑁 DOE , the surrogate
       has more ESEs to improve the model fit and generate better offspring in future generations.
       This means running the optimization cycle with more ESEs could lead to an improvement
       in the quality of the Pareto-optimal set for a larger value of 𝑁 DOE . However, this will also
       increase the computational cost of optimization, which is again undesirable.


[Figure 6.16, panels (a) to (f). Panels (b), (d), and (f) plot the normalized hypervolume against the number of evaluated solutions for 𝑁 = 5, 10, 20, 25 (Setup A), 𝑘 = 10, 20, 25, 35 (Setup B), and 𝑁 DOE = 60, 80, 100, 120 (Setup C), respectively.]
Figure 6.16: Objective space illustrating Pareto-optimal fronts for cases of Setups A, B, and C.


[Figure 6.17, panels (a) and (b). Panel (b) has one vertical axis per normalized variable 𝑥1 to 𝑥10, each ranging from 0.00 to 1.00.]
Figure 6.17: Comparison of objective space with Pareto-optimal fronts and normalized design
space of Pareto optimal sets. Pareto-optimal sets are obtained from two optimization methods,
NSGA-II-WR and NSGA-II-WR-SA.
6.2.4.4    Convergence Analysis with and without Surrogates
Based on the hyperparameter study, Case 1 from Setup C, with 𝑁 = 10, 𝑘 = 35, and 𝑁 DOE = 60,
is identified as the best setting for surrogate-assisted optimization and its results are compared
with those obtained from NSGA-II-WR. For the remainder of this section, we refer to the best
hyperparameter configuration found as NSGA-II-WR-SA.
    Figure 6.17a shows a comparison of the two Pareto-optimal sets obtained by NSGA-II-WR and
NSGA-II-WR-SA. One can observe that NSGA-II-WR-SA clearly outperforms NSGA-II-WR as
the Pareto-optimal front obtained with the former method dominates most of the Pareto-optimal
front obtained with the latter. To understand the convergence of each optimization method, the
design space of the two Pareto-optimal sets is plotted using a parallel coordinates plot (PCP) as
shown in Figure 6.17b. Each vertical axis in the PCP plot represents the normalized optimization
variable 𝑥 𝑑 with its lower and upper bounds as 0 and 1, respectively, and each horizontal line
represents a solution. The design space of the two Pareto-optimal sets shows that almost all the
variables have converged to an optimal value with NSGA-II-WR-SA, whereas, with NSGA-II-WR,
some of the variables still have significant variations with some further scope of convergence.
These observations validate the incorporation of surrogates in the proposed optimization method,


[Figure 6.18, panels (a) and (b). Panel (a): average torque (Nm) versus torque pulsations (Nm), with a color scale for generations. Panel (b): mean squared error versus generations for average torque and pulsations.]
Figure 6.18: Exploration of objective space and MSE in prediction of objective functions, for each
generation using NSGA-II-WR-SA.
demonstrating the improvement of the algorithm’s convergence.
    It should be noted that while NSGA-II-WR uses 7,500 ESEs, the evaluation budget for NSGA-II-WR-SA
is limited to only 1,000. However, there is more to optimization with surrogates than just the
number of function evaluations. As explained in the previous section, surrogates look into the future
for 𝑘 iterations, 35 in the case of NSGA-II-WR-SA, before producing 𝑁 infill solutions
for the next generation. In each of these iterations, the surrogates provide 100 ASEs in this study,
so 35 × 100 = 3,500 solutions are evaluated approximately before the infills are generated. This way, the effective number of
evaluations using surrogates exceeds that without surrogates, which leads to better convergence.
Further insights into the convergence can be obtained by analyzing how the surrogates explore
the objective space and how close the predicted objective values are to the actual expensive ones.
Figure 6.18 shows the objective space and the mean squared error (MSE) of predictions in each
generation using NSGA-II-WR-SA (in one single run). An important observation is that surrogates
can predict average torque with higher accuracy than torque pulsations, which are greatly influenced
by the electrical steel’s non-linearity (magnetic saturation). The sudden rise in MSE of pulsations
from generation 7 to 8 can be explained by analyzing the corresponding solutions in objective space.
Due to a sudden increase in average torque, more and more solutions are generated with higher
slot cross-section and magnet volume, which results in an operation in a high saturation region.
Consequently, it takes some time before the surrogates can predict the pulsations accurately.
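
The evaluation budget discussed at the beginning of this paragraph can be made explicit with a short calculation. The numbers are the settings stated for NSGA-II-WR-SA; the number of outer iterations is derived under the assumption that the budget remaining after the initial design is spent entirely in batches of 𝑁 infill solutions, and the function name is hypothetical.

```python
def surrogate_budget(total_eses, n_doe, n_infill, k, ases_per_generation):
    """Return (outer iterations, ASEs per outer iteration, ASEs per run)."""
    outer_iterations = (total_eses - n_doe) // n_infill
    ases_per_iteration = k * ases_per_generation
    return outer_iterations, ases_per_iteration, outer_iterations * ases_per_iteration

# One run of NSGA-II-WR-SA: 200 ESEs, N_DOE = 60, N = 10, k = 35, 100 ASEs per generation.
iterations, per_iteration, per_run = surrogate_budget(200, 60, 10, 35, 100)
print(iterations, per_iteration, per_run)  # 14 3500 49000
```

Under these assumptions, each run performs 14 surrogate-assisted iterations, which is in line with the generation axis of Figure 6.18.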


6.2.4.5    Analysis of Pareto-optimal Solutions
After discussing the optimization procedure in detail, Pareto-optimal solutions shall be analyzed
in order to gain insight into the electric machine design. For this purpose, the design space of
the Pareto-optimal set obtained by NSGA-II-WR-SA is analyzed, as shown in Figure 6.17b. Since
the size of the machine is kept constant, magnet width (𝑥 3 ), slot height (𝑥 7 ), and slot width (𝑥 8 )
converge to the higher end of variable ranges for most solutions. While a larger magnet width
increases magnet flux linkage leading to higher average torque, it also reduces the q-axis width
(𝑥 6 ), which has converged to the lower end of the variable range. Similarly, an increase in slot
cross-section area results in more space for winding, which directly translates to higher allowable
excitation current and an increase in average torque. Additionally, a reduction in bridge height (𝑥 5 )
directly increases the air-gap flux density, which increases average torque.
      On the other hand, torque pulsations are affected by the magnet pole arc and material saturation.
Magnet pole arc is directly proportional to magnet width (𝑥 3 ) and angle between the magnets (𝑥4 ).
Material saturation is a nonlinear behavior observed in magnetic materials, such as electrical
steel, introducing saturation harmonics in magnetic flux density. While a larger slot cross-section
increases average torque by means of more excitation current, it also increases magnetic material
saturation, leading to more torque pulsations. Lastly, the height and width of slot opening, 𝑥 9 and
𝑥 10 respectively, which are responsible for slot harmonics, have converged to the lower end of the
variable range. This makes sense as semi-closed slots are used as an effective method to reduce
slot harmonics contributing to torque-pulsations [320].
6.2.4.6    Selection of Preferred Solutions
The selection of an electric machine design is primarily application-dependent. A popular approach
uses a scalarized function for optimization, which yields a single optimal solution at the end of an
optimization run. However, selecting weights for a scalarized function is rather difficult. Scalar-
ization also prevents the possibility of analyzing trade-offs offered by Pareto-optimal solutions. In
this study, two different approaches are used to select the preferred solutions: (1) a domain-specific


a posteriori multi-criteria decision-making (MCDM) method which involves machine expertise,
and (2) trade-off analysis of the Pareto-optimal set to identify and choose the solutions with the
highest trade-off. Pareto-optimal solutions obtained from combined runs of NSGA-II-WR and
NSGA-II-WR-SA optimization methods are used in both approaches.
Domain-Specific A Posteriori MCDM Method: For the domain-specific a posteriori MCDM
method, the following performance measures, which are important to all PMSMs, are used in addition
to the two objective functions defined in Equation 6.7 to select preferred solutions from the
Pareto-optimal set.
     • Total harmonic distortion of no-load back EMF (THDV)
     • Peak of fundamental of back EMF (F-BEMF)
     • Magnet utilization factor (MUF)
     For electric machine design, a low value of THDV is desired since it is a direct measure of
noise, vibration, and harshness (NVH) during the operation of the electric machine. Similarly, an
increase in F-BEMF increases the average torque but reduces the maximum speed that the machine
can achieve, thus, presenting a trade-off. On the other hand, a higher value of MUF is desired,
where MUF is defined as the ratio of average torque to PM volume. Since PM material is expensive,
a higher value of MUF translates to a reduction in the machine cost. Figure 6.19a shows THDV
for Pareto-optimal solutions obtained from combined runs of NSGA-II-WR and NSGA-II-WR-SA
optimization methods. Although solutions lying in the bottom region of the Pareto-front have the
least torque pulsation, they have the highest THDV (more than 30%) and must be avoided during
selection. Since the remaining Pareto-optimal solutions have similar THDV (10-14%), it is easier
to select solutions based on the other four performance measures. Based on further evaluation,
three preferred solutions, 1, 2, and 3, are selected, which are also highlighted in Figure 6.19a. The
basis of the selection of these solutions is as follows.
     • Solution 1: maximum average torque


[Figure 6.19, panel (a): Selected solutions with domain-specific a posteriori MCDM method. Average torque (Nm) versus torque pulsations (Nm) with a THDV (%) color scale; Solutions 1, 2, and 3 and the reference design are marked. Panel (b): Selected solutions with trade-off analysis. Same axes; Solutions 3, 4, and 5 are marked.]
Figure 6.19: Objective space highlighting the selected solutions using the a posteriori MCDM
method.
     • Solution 2: maximum MUF
     • Solution 3: minimum pulsation and F-BEMF
Trade-Off Calculation Using Objective Functions: Trade-off analysis of the Pareto-optimal set is an
effective method to select preferred solutions without domain expertise. In this study, for a particular
solution $x^{(i)}$, a trade-off is calculated using the following equation on the neighborhood of points
ranked according to Euclidean distance (represented by $B(x^{(i)})$). The term $\sum_{k=1}^{M} \{1 \mid c_k > d_k\}$
calculates the number of 𝑘’s (out of 𝑀) for which the condition $c_k > d_k$ is valid. It should be
noted that for the trade-off calculation, only two objective functions defined in (6.7) are used, and
solutions with high trade-off values are desired.

\begin{align}
\text{Avg.Loss}(x^{(i)}, x^{(j)}) &= \frac{\sum_{k=1}^{M} \max\left(0,\; f_k(x^{(j)}) - f_k(x^{(i)})\right)}{\sum_{k=1}^{M} \{1 \mid f_k(x^{(j)}) > f_k(x^{(i)})\}}, \notag \\
\text{Avg.Gain}(x^{(i)}, x^{(j)}) &= \frac{\sum_{k=1}^{M} \max\left(0,\; f_k(x^{(i)}) - f_k(x^{(j)})\right)}{\sum_{k=1}^{M} \{1 \mid f_k(x^{(i)}) > f_k(x^{(j)})\}}, \tag{6.8} \\
\text{Trade-off}(x^{(i)}) &= \max_{j=1}^{|B(x^{(i)})|} \frac{\text{Avg.Loss}(x^{(i)}, x^{(j)})}{\text{Avg.Gain}(x^{(i)}, x^{(j)})}. \notag
\end{align}
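
The trade-off measure in Equation 6.8 can be sketched in a few lines. This is an illustration under assumptions that the text does not fix: objectives are minimized and already normalized, the neighborhood consists of the n_neighbors nearest solutions in objective space, and a neighbor that offers no gain is either skipped (no loss) or assigned an infinite ratio (some loss).

```python
import math

def avg_loss(f_i, f_j):
    """Average deterioration over the objectives in which solution j is worse than solution i."""
    worse = [b - a for a, b in zip(f_i, f_j) if b > a]
    return sum(worse) / len(worse) if worse else 0.0

def avg_gain(f_i, f_j):
    """Average improvement over the objectives in which solution j is better than solution i."""
    better = [a - b for a, b in zip(f_i, f_j) if a > b]
    return sum(better) / len(better) if better else 0.0

def trade_off(i, F, n_neighbors):
    """Largest loss-to-gain ratio of solution i with respect to its nearest neighbors in F."""
    others = sorted((j for j in range(len(F)) if j != i),
                    key=lambda j: math.dist(F[i], F[j]))
    ratios = []
    for j in others[:n_neighbors]:
        loss, gain = avg_loss(F[i], F[j]), avg_gain(F[i], F[j])
        if gain > 0:
            ratios.append(loss / gain)
        elif loss > 0:
            ratios.append(math.inf)  # neighbor j is worse everywhere it differs
    return max(ratios, default=0.0)
```

For the hypothetical normalized front F = [(0.0, 1.0), (0.1, 0.5), (0.6, 0.4), (1.0, 0.0)], the knee-like point at index 1 obtains a much larger trade-off value than the extreme point at index 0, which matches the intent of preferring solutions with high trade-off.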


Table 6.9: Performance comparison of five preferred solutions found using domain specific a
posteriori MCDM method and trade-off analysis. Preferred values are highlighted in bold for the
five solutions.
                         Avg torque          Pulsations          THDV           MUF           F-BEMF
       Solution
                            (Nm)                (Nm)              (%)        (Nm/mm3 )          (V)
             1            263.0374             47.4060          14.1263        0.0290         248.2401
             2            235.5986             14.2488          11.2982        0.0308         236.7291
             3            231.3853             9.9186           11.5016        0.0304         234.4451
             4            258.3494             38.2791          12.2894        0.0287         246.4461
             5            254.4529             33.7342          10.4722        0.0285         242.1732
      Reference           214.7760             36.1846          14.4093        0.0330         209.2622
    After performing a trade-off calculation based on the above definition, the three solutions with the
highest trade-off, Solutions 3, 4, and 5, are selected from the combined Pareto-optimal set, as shown
in Figure 6.19b. Interestingly, Solution 3 is picked again, with the highest trade-off among all
solutions. The basis of the selection of these solutions is as follows.
     • Solution 3: highest trade-off value (114.99)
     • Solution 4: second-highest trade-off value (50.79)
     • Solution 5: third-highest trade-off value (35.07)
Performance comparison of selected solutions: Performance details of the five selected solutions
along with reference design are given in Table 6.9. Further insights into the performance of these
solutions can be gained by analyzing the design space, as shown in Figure 6.20a. Some important
observations highlighting the trade-off among selected solutions are as follows.
     • Out of the five selected solutions, Solution 4 is outperformed in every performance measure
       by at least one other selected solution.
     • Solution 1 provides the maximum average torque but also maximizes the amplitude of pul-
       sations and F-BEMF. Both these characteristics can be explained by larger magnet thickness
       (𝑥 2 ), slot height (𝑥 7 ), slot width (𝑥 8 ), and slot opening height and width (𝑥 9 and 𝑥 10 ).
    • Solutions 2 and 3 perform quite similarly in all aspects, with slight variations observed in
       average torque and torque pulsations. This is because both solutions have nearly identical
       design variable values; the only significant difference is the angle between magnets
       (x4).
    • All selected solutions have larger F-BEMF compared to the reference design, which means
       that all five solutions will have a smaller speed range. The relation between F-BEMF and
       the maximum achievable speed can be seen in Figure 6.20b, which shows the torque/speed
       envelope of Solutions 1 and 3 along with the reference design. With a further increase in speed,
       one would observe that the torque produced by Solution 1 drops to zero more quickly compared
       to Solution 3.
    • Although Solution 5 has the least THDV and provides high average torque (only 3.26% less
       than Solution 1), it has the least MUF out of the selected solutions. Additionally, Solution
       5 has significantly smaller pulsations than Solution 1 because of its smaller slot width (x8),
       which leads to a smaller slot cross-section.
    • A comparison of magnetic flux density plots of Solutions 1, 2, and 3 at corresponding rated
       operating conditions shows that Solution 1 suffers from higher saturation in stator teeth, back
       iron, and rotor steel close to magnet edges, as explained above and shown in Figure 6.21.
    Based on the discussion presented in this work, one should select Solution 1, 4, or 5 for an
application with a high average torque requirement. If the focus is more on smooth operation with
a high speed range, Solution 2 or 3 should be selected. It is also worth mentioning that although trade-off
analysis picks Solution 3, it alone would have missed Solution 2, which has the highest MUF; identifying
it requires domain expertise. Ultimately, the selection of a single solution out of a Pareto-optimal set
will require further preference information.
   [Figure 6.20, panel (a): normalized values (0.00 to 1.00) of the design variables x1 to x10; panel (b): average torque (Nm) versus speed (RPM) for the reference design, Solution 1, and Solution 3.]
   Figure 6.20: Five selected Pareto-optimal solutions highlighted in normalized design space.
   [Panels: (a) Solution 1. (b) Solution 2. (c) Solution 3. (d) Reference design.]
Figure 6.21: Magnetic flux density plots of Solutions 1, 2, 3 and reference design at rated operation.
6.2.5   Summary of Section 6.2
In this section, a repair operator to efficiently handle inexpensive constraints has been presented.
The operator’s goal is to convert every infeasible solution to a feasible one before the expensive
objective function evaluation takes place. The repair operator has been incorporated into NSGA-
II, a popular multi-objective optimization algorithm, and its impact has been demonstrated by
solving a bi-objective optimization problem with expensive objectives and inexpensive constraint
function evaluations. Results have shown that the optimization with the proposed repair operator
(NSGA-II-WR) has led to an improved Pareto-optimal set compared to the baseline optimization
method (NSGA-II). Since the optimization of real-world problems is often time-consuming, an
extension of the proposed method with surrogate assistance has also been proposed. A sequential
parametric study has been performed to identify the optimal values of three parameters: (1) the number of
infills in each generation (𝑁), (2) the number of generations for exploitation (𝑘), and (3) the size of the
initial design of experiments (𝑁_DOE). Results with a tuned parameter configuration have indicated
that optimization with surrogates (NSGA-II-WR-SA) improves the algorithm’s convergence further
and outperforms NSGA-II-WR significantly. The ultimate goal of an optimization process is to
reach an optimal solution that can be implemented successfully. Thus, an a posteriori MCDM
approach focused on machine cost, noise, vibration, harshness (NVH), and speed range of the
electric machine has been presented to identify three preferred solutions out of the Pareto-optimal
set successfully. Additionally, trade-off calculations based on objective functions have also been
used to pick three preferred solutions, thereby helping the designer focus on solutions of the most
interest.
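The idea of repairing a design with inexpensive constraint checks before spending an expensive evaluation can be sketched as follows. This is a minimal illustration under stated assumptions, not the repair operator developed in this section: the constraint x0 + x1 <= 1, the scaling-based `repair` function, and the quadratic stand-in for the expensive objective are all hypothetical.

```python
import random

TOL = 1e-9           # numerical tolerance for the feasibility check
expensive_calls = 0  # counts how often the costly function is invoked


def cheap_constraint(x):
    """Hypothetical inexpensive constraint g(x) <= 0 with g(x) = x0 + x1 - 1."""
    return x[0] + x[1] - 1.0


def repair(x):
    """Hypothetical repair: scale x0 and x1 back onto the constraint boundary."""
    if cheap_constraint(x) <= 0.0:
        return list(x)  # already feasible, leave unchanged
    scale = 1.0 / (x[0] + x[1])
    return [x[0] * scale, x[1] * scale] + list(x[2:])


def expensive_objective(x):
    """Stand-in for a time-consuming simulation."""
    global expensive_calls
    expensive_calls += 1
    return sum(v * v for v in x)


def evaluate(population):
    """Repair every design first so that only feasible designs are simulated."""
    evaluated = []
    for x in population:
        x = repair(x)
        assert cheap_constraint(x) <= TOL
        evaluated.append((x, expensive_objective(x)))
    return evaluated


rng = random.Random(1)
population = [[rng.random() for _ in range(3)] for _ in range(20)]
results = evaluate(population)
print(expensive_calls)
```

In this sketch, every expensive call is spent on a feasible design, so no simulation budget is lost on designs that would be rejected by the cheap constraint anyway.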
6.3    Summary of the Chapter
This chapter has presented two case studies of real-world applications with computationally
expensive objective functions. First, we have provided a case study addressing the optimization of
a cylinder head water jacket. Results have shown that surrogate incorporation can effectively im-
prove the convergence behavior of optimization algorithms. The second case study has investigated
the optimization of the design of an electric machine. The optimization problem was based on
computationally expensive objectives but less time-consuming constraints. The proposed method
has exploited the discrepancy of expenses by ensuring the feasibility of a design before starting
the time-consuming simulation. Both case studies have demonstrated the practical relevance of
directly addressing computationally expensive functions in the algorithm design.
                                            CHAPTER 7
                                   OPTIMIZATION IN PRACTICE
The previous chapters have examined the optimization of computationally expensive functions
from a more technical point of view, focusing on how surrogates can be used during optimization.
Nevertheless, an essential and challenging aspect of most applications is their interdisciplinary character,
which has not been addressed yet. For application problems, knowledge and experience in
optimization and at least one other domain are necessary. In practice, this requires collaboration between
experts with (preferably) complementary expertise. However, how such collaborations should be carried out
and what the responsibilities of the domain and optimization experts are still need to be investigated. One of
the optimization expert’s tasks is to find a suitable optimization method or develop a prototype.
For both purposes, existing optimization frameworks are most commonly used, either directly by
executing an optimization function or indirectly by importing modules. Thus, as developers of a
widely used optimization framework called pymoo, we would like to give some insights into its
architecture, features, and usage. As collaboration and the usage of frameworks are critical for real-world
optimization success, both will be discussed next.
7.1    Collaborative Optimization
7.1.1   Introduction
Interdisciplinarity is a critical component of any applied research nowadays. Bringing multiple branches of
knowledge together requires mastering not only each discipline independently but also their
intersections. A discipline playing an essential role in various sciences regarding problem-solving
tasks is (mathematical) optimization. The interdisciplinary character of optimization becomes
apparent by studying related literature in various research fields, such as engineering, economics,
medicine, and society [172, 174, 200, 204]. While reading different kinds of studies, one will
realize that some publications focus on the domain and others more on the optimization method
itself; however, most of the attention is paid to the investigation’s outcome and not the collaborative
process. Since collaboration is vital for success, this chapter focuses on aspects of collaborative
research in the context of optimization, which will be referred to as collaborative optimization in
the remainder of this study.
    Collaborations are essential in an interwoven discipline like optimization, which requires knowl-
edge in optimization itself and one or multiple other domains. Because domain knowledge is the
foundation for the algorithm’s design, its incorporation requires a fundamental, or even deep, un-
derstanding of the domain and the desired method’s requirements. Naturally, this demonstrates the
need for domain and optimization knowledge, which is realized by initiating a collaboration.
Attempting to solve the domain-specific and optimization-related tasks separately is likely to fail;
however, this is still done in practice even today. For instance, such a clear separation of tasks
arises when a domain expert formulates the problem statement independently and the
optimization expert develops the algorithm from there on without any further feedback from the
domain expert. Even though each task should have a collaborator responsible for taking the lead,
communication and agreements are vital for true collaboration and success. Thus, in collaborative
optimization, the outcome is more than the sum of its parts, and success is achieved by effectively
addressing the fusion of multiple research fields.
    This chapter’s focus shall lie on the research collaboration in any kind of discipline where
optimization is needed and applied. Thus, the different phases of collaboration and all support-
ing activities are of importance. Furthermore, in this study, collaborative optimization is based
on human-human interactions; however, human-machine interaction can be a component of the
problem description. Nevertheless, while related works specify collaborative optimization only in
the context of multi-disciplinary design optimization [321], this study considers collaboration in
a more generic context. Moreover, it is worth mentioning that the term collaborative optimization
has also been used to refer to a specific type of algorithm for solving large-scale optimization
problems [322, 323], which is also not the focus of this study.
    First, we discuss work related to different aspects of collaborative optimization. In
Section 7.1.3, we propose a blueprint for collaborative optimization by describing primary and
supporting activities. Illustrative case studies are provided in Section 7.1.4, and conclusions are
discussed in Section 7.1.5.
7.1.2    Related Work
Collaboration can be defined as “the situation of two or more people working together to create
or achieve the same thing” [324]. Sharing the same goal while working together is essential to
understanding the word’s meaning. Another definition emphasizes the existence of conflicting
goals, which should be reduced to a common denominator, and the fact that collaboration is
more contentious than coordination or cooperation [325]. In the field of optimization, collaboration
is characterized by several well-studied subjects: projects and project management, interdisciplinarity,
communication, and (applied) research. Even though all of them have to be mastered simultaneously
in collaborative optimization, related work has so far considered them independently. Later in this
study, a more precise definition of collaborative optimization and these aspects’ interactions will
be provided.
    Projects have been well studied throughout the literature and are a fundamental part of
economics. Techniques to measure the success of a project have especially been of interest. A
well-known method for measuring success is the so-called iron triangle, describing success as a
trade-off of time, cost, and quality [326]. Whereas most authors agree that the criteria are critical,
the model has also been criticized for being too simple. Thus, more sophisticated models have
been proposed to measure the success or failure of a project. There is general agreement
that measuring the success of a project is challenging, not least because of the subjective views
of stakeholders and the time dependency [327]. The duration of a project is also referred to as
the project life cycle, which can be divided into four phases: conceptualization, planning,
execution, and termination [328]. More modern approaches, however, do not follow the traditional
waterfall model; instead, they pursue flexible and iterative project management strategies [329].
    Projects with goals regarding more than one discipline have to deal with interdisciplinary
challenges. Interdisciplinarity is characterized by a suitable combination of knowledge from different
specialties. The purpose of the combination is to exceed the sum of the individual
contributions [330]. A successful fusion of disciplines requires unifying separate ways of
understanding and approaching problems across disciplines [331]. In past years, attempts have been made
to root interdisciplinary research more firmly in society by promoting work across disciplines on many
research university campuses in the United States. However, the general superiority of interdisciplinary
over disciplinary knowledge has also been critically assessed [332].
    Collaboration across disciplines has to ensure efficient communication. Unavoidably, commu-
nication is a practical discipline and a vital skill for many different sciences [333]. It is a widespread
belief that interpersonal and social problems are caused by impaired communication and can be
alleviated by good communication [334].
    Besides essential aspects of collaboration itself, successful collaborations in optimization are
evident from the literature. Various studies show that optimization is almost ubiquitous, for instance,
in Agriculture [172], Engineering [174], Medicine [200], or Economics [204]. Different
research studies use different kinds of collaborations among different stakeholders. Collaboration
is also set up in different ways, for example, in the same laboratory between researchers, across
departments and research groups, across research institutes in the same or different countries, or
between academia and industry.
7.1.3   SOLVeR: Collaborative Optimization
Collaborative optimization describes a procedure involving at least two stakeholders – a domain
expert and an optimization expert – who aim to solve an optimization problem interactively. The
domain expert initially provides the problem, which is to be solved with the optimization expert’s
knowledge and experience. The interaction between both experts is crucial to solve the problem
successfully and can occur at different levels of involvement.
    Even though collaborations are carried out in different manners and have different challenges,
they often have analogous phases and supplemental activities. Thus, collaborative optimization
   [Figure 7.1 (diagram): the supporting activities (Project Management, Communication, Interdisciplinarity, Collaboration Type) accompany the primary phases S (Specification of the Problem), O (Optimization and Algorithm Design), L (Live Test), Ve (Verification of Method and Results), and R (Repetition and Lesson Learned); both lead toward the Goals.]
                    Figure 7.1: Collaborative optimization practice using SOLVeR.
shall be schematized to track the overall progress and highlight important aspects for a successful
collaboration. A blueprint for collaborative optimization is shown in Figure 7.1, presenting not
only the phases but also the supporting activities. The primary phases follow the SOLVeR acronym:
Specification of the Problem (‘S’), Optimization and Algorithm Design (‘O’), Live Test (‘L’),
Verification of Method and Results (‘Ve’), and Repetitions and Lessons Learned (‘R’). For each
phase, the domain and the optimization expert’s roles and responsibilities differ and shall be
discussed in detail. Moreover, the arrows between the phases on the bottom indicate that multiple
iterations of phases are inevitable in practice and an essential part of a collaboration. Furthermore,
the phases are accompanied by supporting activities, such as project management, communication,
interdisciplinarity, and the type of collaboration. The blueprint’s split of primary and supporting
activities is inspired by the well-known value chain model [335] with similar characteristics. Both
the primary and supporting activities are essential to reach the goals. In the following, the five
SOLVeR phases are discussed, and additionally, an overview of each phase’s characteristics is
provided in Figure 7.2. Moreover, all supporting activities are described in detail.
(i) Specification of the Problem (‘S’): In the first phase, all collaborators need to get a clear under-
standing of the optimization method’s overall goal. For the optimization expert, this often requires
understanding the fundamentals of a foreign research field. Thus, the domain expert’s responsibility
is to communicate efficiently and to define domain-related terminology if necessary. The primary
goal is not for all collaborators to understand every little detail but to grasp what the problem is
about. Thus, abstraction should be made whenever possible. Moreover, possible requirements
and meta-information about the problem should be discussed, for instance, the evaluation time
of a single design or the type and number of variables to be considered. After the problem has
been defined verbally, it should be stated mathematically, defining the objective(s), constraints,
and the underlying search space. With fundamental knowledge about the domain, the optimization
expert will often take the lead for the mathematical problem formulation. Nevertheless, the do-
main expert’s feedback is crucial to ensure the formulation fits the specifications and the domain
expert’s expectations. For instance, a target measure could be either incorporated into the problem
formulation as a constraint or an objective. Whereas both options might be legitimate ways of
considering this metric, domain knowledge can favor one or the other. Together with the optimiza-
tion expert’s knowledge about each option’s benefits and drawbacks in an optimization sense, the
domain knowledge demonstrates the benefits of a close collaboration from the beginning.
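To make the two options concrete, consider a design vector x in a search space Ω, a primary objective f, and a target measure m with a threshold m^max requested by the domain expert. These symbols are introduced here purely for illustration and are not taken from a specific problem in this thesis. The first formulation treats the measure as a constraint, the second as an additional objective:

```latex
% Option A: the target measure enters as a constraint
\begin{aligned}
\min_{\mathbf{x} \in \Omega} \quad & f(\mathbf{x}) \\
\text{subject to} \quad & m(\mathbf{x}) \le m^{\max}
\end{aligned}

% Option B: the target measure enters as a second objective
\min_{\mathbf{x} \in \Omega} \quad \bigl( f(\mathbf{x}),\; m(\mathbf{x}) \bigr)
```

Option A requires the domain expert to commit to a threshold before optimization starts, whereas Option B yields a set of trade-off solutions and postpones that decision until the results are available.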
(ii) Optimization and Algorithm Design (‘O’): After the problem has been defined mathematically,
the design of a suitable algorithm is of most interest. The selection or design of an algorithm
requires experience in optimization and can be rather challenging. Before starting with the al-
gorithm’s design, all problem-dependent information shall be analyzed. For instance, does the
evaluation also provide information about the gradient? Or, how many function evaluations are
affordable? However, it is important to note that some characteristics can only be assumed and
are not known beforehand. For example, a vital question concerns the modality of the function’s
fitness landscape because it determines whether a local or global search might be appropriate. If
there is an explicitly defined equality constraint, one of the variables can be expressed in terms
of the other variables – a process that eliminates one variable and ensures that every modified solution will
    S: Specification of the Problem
      • Specification of the problem verbally and mathematically
      • Domain expert clarifies doubts
      • Optimization expert translates the problem into a mathematical description
      • Specify all requirements the optimization needs to consider, for instance, execution time or
        algorithm overhead
    O: Optimization and Algorithm Design
      • Solve the optimization problem using optimization techniques
      • The optimization expert takes the lead in this phase
      • This could be to select an algorithm that already exists, modifying an existing one to meet the
        needs, or design an algorithm from scratch
      • This phase can include evaluating the performance on a test or the real optimization problem
    L: Live Test
      • This phase can be interwoven with the algorithm design or stay separate
      • Evaluate the performance of the optimization method on the real problem
      • This might require to set up infrastructure such as computational resources
      • Writing interfaces to be able to call the actual optimization method
    Ve: Verification of Method and Results
      • Verify the obtained results and check whether they satisfy the requirements defined during
        specification
      • Evaluate the optimization method itself considering being failsafe and other requirements
        defined beforehand
      • The optimization expert usually will take care of the technical and the domain expert of the
        domain-specific aspects
    R: Repetition and Lesson Learned
      • The last step should emphasize that multiple repetitions might be necessary to find a
        satisfying method
      • If the result or method is not satisfying, all previous phases need to be reviewed
      • Draw lessons learned from the optimization project after each iteration
                                            Figure 7.2: Phases and responsibilities.
automatically satisfy the equality constraint. The use of such information to redefine an original
problem requires collaboration between optimization and domain-specific experts at the start of the
optimization process. A standard optimization algorithm can be modified to suit the supplied problem
information. This can be done by modifying different operators of the algorithm. For example,
the initial solution(s) can be repaired to satisfy certain constraints so that the search can begin from
good solution(s). The generative operations for creating new solutions can likewise be guided by the
problem information so that new solutions are consistent with it.
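As a minimal sketch of the variable elimination described above, assume a hypothetical three-variable problem with the equality constraint x1 + x2 + x3 = 1. The names `decode`, `objective`, and `reduced_objective`, the quadratic objective, and the plain random search used as a placeholder optimizer are illustrative assumptions and are not taken from any particular framework.

```python
import random


def decode(z):
    """Map the reduced vector (x1, x2) to the full design (x1, x2, x3)."""
    x1, x2 = z
    x3 = 1.0 - x1 - x2  # eliminated variable, fixed by the equality constraint
    return [x1, x2, x3]


def objective(x):
    """Hypothetical objective defined on the full design vector."""
    return (x[0] - 0.2) ** 2 + (x[1] - 0.3) ** 2 + (x[2] - 0.5) ** 2


def reduced_objective(z):
    """Objective seen by the optimizer, which searches over two variables only."""
    return objective(decode(z))


def random_search(n_evals, seed=0):
    """Plain random search in the reduced space, used only as a placeholder optimizer."""
    rng = random.Random(seed)
    best_z, best_f = None, float("inf")
    for _ in range(n_evals):
        z = (rng.random(), rng.random())
        f = reduced_objective(z)
        if f < best_f:
            best_z, best_f = z, f
    return decode(best_z), best_f


x_best, f_best = random_search(2000)
print(x_best, f_best)
```

Every design returned by `decode` satisfies the equality constraint by construction, so the optimizer never has to handle it. If the eliminated variable is bounded, its bounds become an inequality on the remaining variables and still need to be treated, which this sketch does not do.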
      The fact that the mathematical problem definition and the optimization method are directly
linked to each other demonstrates the interdependence of the first two phases and the importance of
collaboration. After completing phase two, an algorithm has been developed, possible bugs during
development have been fixed, and source code or a binary file for running the method exists.
(iii) Live Test (‘L’): In the third phase, the developed algorithm is run in a live environment to
observe its performance on the real-world optimization problem. The testing phase is crucial
to ensure that the algorithm’s design is suitable for the original problem. This might require
interfacing between different programming languages or setting up the computational resources to
run the method in a live environment. The domain and optimization expert’s responsibilities in this
phase depend on the type of collaboration and agreements. On the one hand, the algorithm’s design
can be driven by test problems with characteristics similar to the real-world optimization problem.
Development on test problems is also often necessary because of a lack of computational
resources or software licenses on the algorithm developer’s end, or because the industrial partner prefers
not to make the problem accessible to outsiders. On the other hand, the problem’s evaluation
function might be delivered to the optimization expert – either open or closed source – and may
be directly used during the algorithm’s design. In some cases, the problem might have been
vaguely defined from the beginning, and the developer needs to implement a representative live
environment from scratch, for instance, by generating synthetic data with reasonable assumptions.
The variety of ways to realize a live test shows that different collaboration types require different
amounts of collaborative effort in this phase. However, no matter what type has been chosen, this
phase’s outcome is a method and results that have to be analyzed.
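Interfacing with an evaluation function that lives outside the optimizer's own language or process can be sketched as a thin wrapper around an external program. The example below is an assumption-laden illustration: the JSON-over-standard-input protocol, the function name `evaluate_external`, and the small Python script standing in for a third-party simulator are all hypothetical; a real simulator would define its own input and output format.

```python
import json
import subprocess
import sys

# Stand-in for an external simulator: reads a JSON design from standard input
# and writes a JSON result to standard output.
SIMULATOR_SOURCE = """
import json, sys
x = json.load(sys.stdin)["x"]
print(json.dumps({"f": sum(v * v for v in x)}))
"""


def evaluate_external(x, timeout=30.0):
    """Send one design to the external program and return its objective value."""
    proc = subprocess.run(
        [sys.executable, "-c", SIMULATOR_SOURCE],
        input=json.dumps({"x": list(x)}),
        capture_output=True,
        text=True,
        check=True,       # raise if the external program exits with an error
        timeout=timeout,  # avoid waiting forever on a hanging simulation
    )
    return json.loads(proc.stdout)["f"]


value = evaluate_external([1.0, 2.0, 3.0])
print(value)
```

Keeping the wrapper this small makes it easy to replace the stand-in by the real evaluation function once it becomes available, while the algorithm can already be developed against the same interface.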
(iv) Verification of Method and Results (‘Ve’): In the fourth phase, the goals and requirements
defined initially need to be critically assessed and verified. The verification is based on the results
obtained in the previous phase. Even though the verification procedure will vary from collaboration
to collaboration, some tasks employed in practice are to analyze the algorithm’s convergence over
time and carefully inspect the solutions being found. In some collaborations, the optimization is
only performed once; most attention is then paid to the obtained solution(s), and the method
plays a minor role. The obtained solutions need to be closely examined to make sure that all
requirements are satisfied. The examination often involves checking correctness and feasibility and
visualizing solutions and results. If discrepancies have been observed, the mathematically
defined problem might need to be refined or even entirely redefined, and a reiteration of phase one
may be necessary. Other collaborations might focus more on the method itself, primarily when the
algorithm is run repetitively, for instance, daily or weekly. Then, a thorough test of the method,
including possible boundary scenarios, is of importance. Moreover, for stochastic algorithms, a
statistical analysis of a set of runs, rather than a single run’s performance, is needed to address the
underlying randomness and ensure the method’s robustness. No matter where the priority of the
collaboration lies, verification is crucial to measure the overall success.
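The statistical analysis of repeated runs mentioned above can be sketched with a few lines of standard-library Python. The random search, the sphere function, and the choice of 21 seeded runs are stand-ins chosen for illustration; in a real project, the actual stochastic algorithm and problem would take their place.

```python
import random
import statistics


def sphere(x):
    """Toy objective used in place of the real problem."""
    return sum(v * v for v in x)


def random_search(seed, n_evals=200, n_var=5):
    """Toy stochastic optimizer; returns the best objective value found in one run."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_evals):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_var)]
        best = min(best, sphere(x))
    return best


def summarize(values):
    """Report robust statistics over the final values of independent runs."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "median": median, "max": max(values), "iqr": q3 - q1}


# One run per seed, so that every run can be reproduced individually.
finals = [random_search(seed) for seed in range(21)]
summary = summarize(finals)
print(summary)
```

Reporting the median and interquartile range alongside the best and worst run gives a more reliable picture of the method than any single run, and fixing the seeds keeps each run reproducible when a result has to be investigated.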
(v) Repetition and Lesson Learned (‘R’): As recommended for projects in general, the last phase
consists of reflecting on the collaboration and critically assessing the progress made. Practitioners
will agree that no project ends without future work and possible new collaborations. Thus, drawing
lessons learned to avoid pitfalls helps improve long-term efficiency and productivity.
Besides primary activities classified into phases, supporting activities are an essential part of
collaborative optimization. The supporting activities accompany all primary phases and
may play a different role at any time during the collaboration.
Project Management: A project is characterized by a project schedule with a clearly defined
beginning and end. Moreover, the project’s outcome is typically defined by milestones and project
goals, which should be achieved during or at the end of the project. In practice, goals can also
be conflicting, for instance, in a university-industry research collaboration where the researchers
prioritize a seminal publication, whereas the industry partner might want to keep the findings confidential
to retain a competitive advantage. Nevertheless, agreeing on the goals initially and keeping track
of them is good advice in all collaborations. Moreover, project management includes all matters
regarding funding, additional resources, and workforce during the collaboration.
Communication: Efficient communication is essential on many levels. The collaboration is ac-
companied by communication throughout all phases. The availability of collaborators and the
communication frequency can significantly impact the project’s outcome. While some collabo-
rators prefer frequent feedback, such as daily or weekly, others favor less frequent meetings, for
instance, monthly or biannually. Besides the frequency of regular meetings, the collaboration
should define several milestone meetings, consisting of at least a kick-off and final meeting. The
type of communication often depends on the geographical distance between collaborators. A rel-
atively small distance and convenient commute shall allow in-person meetings. Often, however,
this is not the case, and mostly online meetings are scheduled. Modern technology that allows participants to
turn on a webcam, share the screen, or even take over screen control can come in handy to increase
such meetings’ productivity. Moreover, consistent e-mail correspondence and a hybrid in-person
and online communication style are often used in practice. Challenges in communication
commonly arise from domain-specific terminology that is not clear to all collaborators or even from
language barriers in international collaborations.
Interdisciplinarity: Many collaborations originate from a subject that is interdisciplinary in nature.
Therefore, an expert for each of the involved disciplines significantly speeds up the research process
or makes meaningful insights possible at all. In collaborative optimization, interdisciplinarity is
given by the presence of optimization itself and at least one other discipline. For some projects, even
multiple other disciplines might be involved with possibly conflicting objectives. In the literature, such
a situation related to optimization is also referred to as Multi-Disciplinary Design Optimization
(MDO). During the collaboration, especially during the initial problem specification phase, a
fundamental understanding of each discipline is essential. Even rudimentary knowledge helps
develop an appreciation for each other’s research fields and facilitates meaningful discussions.
Collaboration Type: The type of collaboration has a significant impact on each collaborator’s
responsibility. With the type of collaboration, we refer to aspects related to the involvement, type,
and number of collaborators. In a light collaboration, the complete problem formulation
is available to the optimization experts, so little
collaboration between the two expert groups is required. In a medium collaboration, besides the details of the
problem formulation, further information is required either due to the complexity involved in the
problem or due to the nature of the problem. Optimization experts must share intermediate results
with domain-specific experts to get further information to improve the optimization method. In a
strong collaboration, both groups must work together closely to solve the problem. This can
happen if the objective and constraint functions cannot be shared with the optimization experts due
to confidentiality issues or the unavailability of computing resources within the optimization group.
7.1.4   Case Studies
The blueprint for collaborative optimization can be put into practice in different ways. We present
two case studies to illustrate this.
Case Study 1: Cylinder Head Water Jacket. As a case study, the collaboration with an automobile
company regarding the optimization of a Cylinder Head Water Jacket is discussed. A study focusing
on the optimization itself has already been published [28]; however, details of the collaborative
process itself were not part of the study. Initially, the industrial partner with domain-specific
expertise sought an optimization expert to solve an industrial design problem that could not be
solved suitably with a commercial solver. Most commercial solvers are generic and not ideal
candidate solution methods to find an acceptable solution within a limited solution evaluation budget. Thus,
a collaboration was initiated. The industrial collaborator had a background in engineering and
more than a decade of experience in engineering design. The optimization experts are specialized
in multi-objective and evolutionary optimization, and the team consisted of one professor and two
Ph.D. students. The goal to design an algorithm that can deal with a constrained multi-objective
optimization problem where each evaluation requires computationally expensive simulation was
defined (phase ‘S’). Due to the time-consuming evaluation function, the overall evaluation budget
was limited to 120 simulations per optimization run. In return, the algorithmic overhead was allowed
to be significant and could even reach a couple of minutes to find new solutions in each iteration.
Secondly, the algorithm was first developed on test problems with similar characteristics but
computationally inexpensive functions (phase ‘O’). Even though the algorithm has been designed
from scratch, the usage of existing modules and algorithms of pymoo [29] – a Python framework
for multi-objective optimization – was handy for prototyping and even sped up the algorithm’s
development. Bi-monthly discussions between all collaborators accompanied the research process.
Thirdly, multiple runs in the live environment (phase ‘L’) optimizing the Cylinder Head Water
Jacket were carried out. Because the optimization experts did not have access to the simulation
software, the optimization run was carried out manually by sending engineering designs back and
forth via e-mail. This way, multiple experiments were run, and at the same time, the results
were verified (phase ‘Ve’). Thus, the execution of phases ‘L’ and ‘Ve’ happened simultaneously.
Once the method had been confirmed to be suitable for the optimization problem, the source code was
delivered to the industrial partner. Delivering the source code ensured that the algorithm
can be reused in the future for similar problems (phase ‘R’). Moreover, a final meeting discussing the
method and assessing the project’s success took place between all collaborators and coworkers
from related departments.
Case Study 2: Engine Design. In another auto-industry project executed at the COIN Lab, the initial
task of the industry designers was to reduce the weight of an automobile engine from its current
weight by 10 kg (phase ‘S’). The problem involves 145 discrete variables, which can be varied within
specified lower and upper bounds, 146 constraints which all must be satisfied, and six conflicting
objectives, which all must be optimized. The objective and constraint functions were not available
in explicit form; rather, a black-box executable was supplied. Initial collaborations between the two
groups revealed that the functions’ gradients were also available from the executable routine. The
availability of gradient information allowed the optimization experts to devise a new operator – a
gradient-based local search approach – to improve a solution locally. Another study revealed that
when 2.5 million random solutions were evaluated, not a single solution was found to be feasible. The
majority of the search space being infeasible prompted optimization experts to devise an algorithm
to infinitely emphasize every feasible solution. A generic many-objective optimization algorithm
(NSGA-III [279]) was modified to develop a customized method (phase ‘O’) to find feasible non-
dominated solutions. The customized algorithm was directly applied to solve the engine design
problem (phase ‘L’). The developed method resulting from a close collaboration found a new
engine, 17 kilograms lighter than the current design, which is 7 kg better than originally desired
(phase ‘Ve’). Further information on obtained results can be found in [336]. This study mostly
used a light collaboration mode.
    The designers themselves demonstrated the power of collaborative optimization next. The multiplicity of
designs obtained by customized NSGA-III motivated the designers to set the next goal (phase ‘R’)
to find multiple engines with identical weight. This prompted the whole ‘SOLVeR’ procedure
to restart with a new specification (phase ‘S’). Optimization experts then introduced the concept of
niche preservation – survival of similar solutions as clusters – to develop a new optimization method
(phase ‘O’). Niche preservation is a new optimization technique whose development was possible
only through a collaborative problem-solving approach. The method was applied to the real problem
(phase ‘L’), and three different pairs of engines, each having an identical weight, were obtained
(phase ‘Ve’). The SOLVeR approach’s ability to reduce the engine weight by more than 10 kg
motivated the designers to repeat the process (phase ‘R’) to a third cycle in which they aspired to
reduce the weight further by relaxing the constraint bounds.
    The relaxation of constraints to improve the objective function was handled by formulating a
two-objective optimization problem (phase ‘O’). One of the objectives was to minimize the amount of
constraint violation from the current best solution; the second conflicting objective was to maximize
the amount of weight reduction from the current best solution. The bi-objective optimization method
found multiple trade-off solutions with different combinations of constraint violations and weight
reductions (phase ‘L’). The solutions allowed designers to better understand the trade-off before
choosing a final solution for implementation (phase ‘Ve’).
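To make the notion of trade-off solutions concrete, the following minimal Python sketch filters a set of candidate designs down to the non-dominated ones, assuming the first objective (constraint violation) is minimized and the second (weight reduction) is maximized. The candidate values and the names dominates and non_dominated are hypothetical and are not taken from the engine study.

```python
from typing import List, Tuple

# Hypothetical candidates: (constraint violation, weight reduction in kg).
# The numbers are invented purely for illustration.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one.

    Objective 1 (violation) is minimized, objective 2 (weight reduction) is maximized.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def non_dominated(cands: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


candidates = [(0.0, 0.0), (0.1, 1.5), (0.2, 1.0), (0.3, 2.5), (0.3, 2.0)]
trade_offs = non_dominated(candidates)
print(trade_offs)
```

Each remaining entry represents a different compromise: accepting a larger violation buys a larger weight reduction, which is the kind of information the designers needed before choosing a final solution.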
    None of these extensions achieved with specific and innovative optimization methods were
academic, nor were they standard optimization practices. However, they revealed alternate solutions
close to the designers’ interests, so they had a plethora of pertinent solutions before choosing one.
Such a design feat was possible only with a collaborative optimization procedure.
7.1.5   Summary of Section 7.1
Optimization is an interdisciplinary research field and a substantial part of various sciences. Thus,
collaboration is vital to tackle problem-solving tasks in all kinds of disciplines successfully.
Whereas most studies focus on the outcome of such collaborative optimization, this study puts
the collaborative process itself at the center of attention. To guide the process of collaboration, we
have proposed a blueprint following the SOLVeR approach consisting of five phases: Specification
of the Problem, Optimization and Algorithm Design, Live Test, Verification of Method and Results,
and Repetitions and Lesson. We have defined the roles and responsibilities of the domain and
optimization experts for each phase and highlighted the supporting activities that accompany collaborative
optimization. Moreover, two case studies have illustrated how the blueprint for collaborative
optimization was implemented in practice.
    This section has demonstrated the importance of performing a collaborative optimization rather
than a silo-based optimization without any intermediate interactions from domain-specific experts.
Collaborative optimization makes solving challenging problems quicker and opens up new avenues
for more flexible and practical optimization studies. Through collaboration, the experts benefit
from each other, which results in understanding different facets of the application and gaining
insights more efficiently.
7.2    pymoo: Multi-Objective Optimization in Python
Collaborative optimization implies that different kinds of experts are working together. However,
one should not assume that everyone can write code or is familiar with the usage of programming
languages. Thus, standard software is one way of making research accessible to a larger audience.
Nevertheless, such deliverables are limited by definition because the software package is mainly of a
black-box nature, and no further modifications can be made. In contrast to standard software, open-
source frameworks are publicly available and are ideal for customizing optimization methods. For
this reason, the usage of frameworks offers a good trade-off between not having to start developing
an optimization method from scratch and having access to existing state-of-the-art optimization
algorithms. As developers who have made an effort to build an optimization framework, we
would like to share the common design principles and features of our development. Our optimization
framework pymoo is an open-source (evolutionary) multi-objective optimization framework written
in Python, and we are proud to be able to say that it has gained some popularity over the last
years [29].
7.2.1   Introduction
Optimization plays an essential role in many scientific areas, such as engineering, data analytics,
and deep learning. These fields are fast-growing, and their concepts are employed for various
purposes, for instance, gaining insights from large data sets or fitting accurate prediction models.
Efficient implementation in a suitable programming language is essential whenever an algorithm
must handle a significantly large amount of data. Python [337] has become the programming
language of choice over the last few years for the research areas mentioned above because it is not
only easy to use but also has good community support. Python is a high-level, cross-platform,
and interpreted programming language that focuses on code readability. A large number of high-
quality libraries are available and support for any kind of scientific computation is ensured. These
characteristics make Python an appropriate tool for many research and industry projects where the
investigations can be complex.
    A fundamental principle of research is to ensure the reproducibility of studies and provide access
to the research materials whenever possible. In computer science, this translates to a sketch of an
algorithm and the implementation itself. However, the implementation of optimization algorithms
can be challenging, and, specifically, benchmarking is time-consuming. Having access to either a
good collection of different source codes or a comprehensive library is time-saving and reduces the
probability of an error-prone implementation from scratch.
    To address this need for multi-objective optimization in Python, we introduce pymoo. The goal
of our framework is not only to provide state-of-the-art optimization algorithms but also to cover
different aspects related to the optimization process itself. We have implemented single-, multi-,
and many-objective test problems, which can be used as a test-bed for algorithms. In addition to
the objective and constraint values of test problems, gradient information can be retrieved through
automatic differentiation [338]. Moreover, a parallel evaluation of solutions can be implemented
through vectorized computations, multi-threaded execution, and distributed computing. Further,
pymoo provides implementations of performance indicators to measure the quality of results
obtained by a multi-objective optimization algorithm. Tools for an explorative analysis through
visualization of lower and higher-dimensional data are available, and multi-criteria decision-making
methods guide selecting a single solution from a solution set based on preferences.
    Our framework is designed to be extendable through its modular implementation. For instance,
a genetic algorithm is assembled in a plug-and-play manner by making use of specific sub-modules,
such as initial sampling, mating selection, crossover, mutation, and survival selection. Each sub-
module takes care of an aspect independently, and, therefore, variants of algorithms can be initiated
by passing different combinations of sub-modules. This concept allows end-users to incorporate
domain knowledge through custom implementations. For example, in an evolutionary algorithm, a
biased initial sampling module created with the knowledge of domain experts can guide the initial
search.
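The plug-and-play idea can be illustrated with a minimal, standard-library sketch of a genetic algorithm whose sub-modules are plain functions passed in as arguments. This is not pymoo's implementation or API; all names (run_ga, random_sampling, biased_sampling, and so on) are hypothetical, and the biased sampling simply assumes that domain experts expect good designs near the origin.

```python
import random
from typing import List

Vector = List[float]


def sphere(x: Vector) -> float:
    """Toy single-objective test function (to be minimized)."""
    return sum(v * v for v in x)


def random_sampling(rng, n, n_var):
    return [[rng.uniform(-5.0, 5.0) for _ in range(n_var)] for _ in range(n)]


def biased_sampling(rng, n, n_var):
    # Stand-in for domain knowledge: start the search in a promising region.
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_var)] for _ in range(n)]


def tournament(rng, pop, fitness):
    # Binary tournament: pick two distinct individuals, keep the better one.
    a, b = rng.sample(range(len(pop)), 2)
    return pop[a] if fitness[a] <= fitness[b] else pop[b]


def uniform_crossover(rng, p1, p2):
    return [u if rng.random() < 0.5 else v for u, v in zip(p1, p2)]


def gaussian_mutation(rng, x, sigma=0.1):
    return [v + rng.gauss(0.0, sigma) for v in x]


def run_ga(sampling, selection, crossover, mutation,
           f=sphere, n_var=3, pop_size=20, n_gen=30, seed=1):
    """Assemble a simple elitist GA from interchangeable sub-modules."""
    rng = random.Random(seed)
    pop = sampling(rng, pop_size, n_var)
    for _ in range(n_gen):
        fitness = [f(x) for x in pop]
        offspring = []
        for _ in range(pop_size):
            p1 = selection(rng, pop, fitness)
            p2 = selection(rng, pop, fitness)
            offspring.append(mutation(rng, crossover(rng, p1, p2)))
        # Survival: keep the best pop_size individuals of parents and offspring.
        pop = sorted(pop + offspring, key=f)[:pop_size]
    return pop[0], f(pop[0])


# Swapping one sub-module yields a different algorithm variant.
best_x, best_f = run_ga(biased_sampling, tournament, uniform_crossover, gaussian_mutation)
print(best_f)
```

Replacing biased_sampling with random_sampling, or gaussian_mutation with another mutation function, changes the variant without touching run_ga, which mirrors the modular design described above.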
    Furthermore, we would like to mention that our framework is well-documented, with a large number
of available code snippets. We created a starter’s guide for users to become familiar with our
framework and demonstrate its capabilities. As an example, it shows the optimization results
of a bi-objective optimization problem with two constraints. An extract from the guide will be
presented in this chapter. Moreover, we explain each algorithm and code needed to run it on a
suitable optimization problem in our software documentation. Additionally, we show a definition
of test problems and provide a plot of their fitness landscapes. The framework documentation is
built using Sphinx [339], and the correctness of modules is ensured by automatic unit testing [340].
Most algorithms have been developed in collaboration with the second author and benchmarked
extensively against the original implementations.
7.2.2   Related Work
In the last decades, various optimization frameworks in diverse programming languages have been
developed. However, some of them only partially cover multi-objective optimization. In general,
the choice of a suitable framework for an optimization task is a multi-objective problem itself.
Moreover, some criteria are rather subjective, for instance, the usability and extendibility of a
framework. Therefore, the assessment regarding criteria and the decision-making process differ
from user to user. For example, one might have decided on a programming language first, either
because of personal preference or a project constraint, and then searched for a suitable framework.
Another might give more importance to the overall features of a framework, for example, parallelization
or visualization, over the programming language itself. An overview of some existing multi-
objective optimization frameworks in Python is listed in Table 7.1, each of which is described in
the following.
    Recently, the well-known multi-objective optimization framework jMetal [341] developed in
Java [342] has been ported to a Python version, namely jMetalPy [343]. The authors aim to further
extend it and to make use of the full feature set of Python, for instance, data analysis and data
visualization. In addition to traditional optimization algorithms, jMetalPy also offers methods for
dynamic optimization. Moreover, the post-analysis of performance metrics of an experiment with
several independent runs is automated.
    Parallel Global Multiobjective Optimizer, PyGMO [344], is an optimization library for the easy
distribution of massive optimization tasks over multiple CPUs. It uses the generalized island-model
paradigm for the coarse-grained parallelization of optimization algorithms and, therefore, allows
users to develop asynchronous and distributed algorithms.
    Platypus [345] is a multi-objective optimization framework that offers implementations of
state-of-the-art algorithms. It enables users to generate an experiment with various algorithms and
provides post-analysis methods based on metrics and visualization.
    Distributed Evolutionary Algorithms in Python (DEAP) [346] is a novel evolutionary computation
framework for rapid prototyping and testing of ideas. Even though DEAP does not focus
on multi-objective optimization, due to the modularity and extendibility of the framework, multi-
objective algorithms can be developed. Moreover, parallelization and load-balancing tasks are
supported out of the box.
    Inspyred [347] is a framework for creating bio-inspired computational intelligence algorithms
in Python, which is not focused on multi-objective algorithms directly, but on evolutionary computation
in general. However, an example for NSGA-II [10] is provided, and other multi-objective
algorithms can be implemented through the modular implementation of the framework.
                   Table 7.1: Multi-objective optimization frameworks in Python.
   Name        License       Focus on multi-objective   Pure Python   Visualization   Decision Making
   jMetalPy    MIT           ✓                          ✓             ✓               ✗
   PyGMO       GPL-3.0       ✓                          ✗             ✗               ✗
   Platypus    GPL-3.0       ✓                          ✓             ✗               ✗
   DEAP        LGPL-3.0      ✗                          ✓             ✗               ✗
   Inspyred    MIT           ✗                          ✓             ✗               ✗
   pymoo       Apache 2.0    ✓                          ✓             ✓               ✓
    If the search for frameworks is not limited to Python, other popular frameworks should be
considered: PlatEMO [245] in Matlab, MOEA [348] and jMetal [341] in Java, jMetalCpp [349]
and PaGMO [344] in C++. Of course, this is not an exhaustive list and readers may search for other
available options.
7.2.3   Architecture
Software architecture is fundamentally important to keep source code organized. On the one hand,
it helps developers and users to get an overview of existing classes, and on the other hand, it
allows flexibility and extendibility by adding new modules. Figure 7.3 visualizes the architecture of
pymoo. The first level of abstraction consists of optimization problems, algorithms, and analytics.
Each of the modules can be categorized into more detail and consists of multiple sub-modules.
                                      Figure 7.3: Software architecture of pymoo.
[Figure: pymoo comprises three top-level modules. Problems: single-, multi-, and many-objective;
gradients; parallelization. Optimization: sampling, crossover, mutation, mating selection, survival,
repair, constraint handling, decomposition, termination criterion. Analytics: visualization,
performance indicator, decision making.]
   (i) Problems: Optimization problems in our framework are categorized into single-, multi-, and
       many-objective test problems. Gradients are available through automatic differentiation, and
       parallelization can be implemented by using a variety of techniques.
  (ii) Optimization: Since most of the algorithms are based on evolutionary computations,
       operators such as sampling, mating selection, crossover, and mutation have to be chosen or
       implemented. Furthermore, because many problems in practice have one or more
       constraints, a methodology for handling those must be incorporated. Some algorithms are based
       on decomposition, which splits the multi-objective problem into many single-objective
       problems. Moreover, when the algorithm is used to solve the problem, a termination criterion
       must be defined either explicitly or implicitly by the implementation of the algorithm.
 (iii) Analytics: During and after an optimization run, analytics support the understanding of data.
       First, the design space, objective space, or other metrics can be explored intuitively through
       visualization. Moreover, to measure the convergence and/or diversity of a Pareto-optimal
       set, performance indicators can be used. For real-parameter problems, the recently proposed
       theoretical KKT proximity metric [350, 351] computation procedure is included in pymoo
       to compute the proximity of a solution to the true Pareto-optimal front, despite not knowing
       its exact location. Finally, decision-making methods support the selection of solutions, either by
       finding points close to the area of interest in the objective space or by identifying high trade-off
       solutions. This can be applied either during an optimization run to mimic interactive optimization
       or as a post-analysis.
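As an illustration of what a performance indicator computes, the following minimal sketch implements the inverted generational distance (IGD), one commonly used indicator that reflects both convergence and diversity. IGD is chosen here only as an example, since the text above does not name a specific indicator; the function name igd and the sample points are hypothetical, and the sketch does not use pymoo's own indicator classes.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def igd(obtained, reference_front):
    """Mean distance from each reference point to its nearest obtained point.

    Lower is better; the value is zero when every reference point is matched.
    """
    total = sum(min(dist(r, a) for a in obtained) for r in reference_front)
    return total / len(reference_front)


# Hypothetical bi-objective example: the obtained set misses the middle of the front.
reference_front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
obtained = [(0.0, 1.0), (1.0, 0.0)]
print(igd(obtained, reference_front))
```

Because the average is taken over the reference front, a solution set that converges well but leaves a gap (as in this example) is still penalized.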
     In the remainder of the chapter, we will discuss each of the modules mentioned in more detail.
7.2.4   Problems
It is common practice for researchers to evaluate the performance of algorithms on a variety of
test problems. Since it is known that no single best algorithm exists for all arbitrary optimization
problems [352], this helps to identify problem classes where the algorithm is suitable. Therefore,
a collection of test problems with different numbers of variables, objectives, or constraints and
varying complexity comes in handy for algorithm development. Moreover, in a multi-objective
context, test problems with different Pareto front shapes or varying variable densities close to the
optimal region are of interest.
7.2.4.1   Implementations
In our framework, we categorize test problems regarding the number of objectives: single-objective
(1 objective), multi-objective (2 or 3 objectives), and many-objective (more than 3 objectives). Test
problems implemented in pymoo are listed in Table 7.2. For each problem, the number of variables,
objectives, and constraints are indicated. If the test problem is scalable with respect to any of these
parameters, we label the entry with (s). If the problem is scalable, but a default number was originally
proposed, we state that number in parentheses. In case the category does not apply, for
example, because we refer to a test problem family with several functions, we use (·).
     The implementations in pymoo let end-users define what values of the corresponding problem
should be returned. On an implementation level, the evaluate function of a Problem instance
takes a list return_value_of, which contains the types of values to be returned. By default, the
objective values "F" and, if the problem has constraints, the constraint violation "CV" are included.
The constraint function values can be returned independently by adding "G". This gives developers
the flexibility to receive the values that are needed for their methods.
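The relationship between "F", "G", and "CV" can be sketched as follows. The class ToyProblem is a hypothetical stand-in and not pymoo's Problem class; it assumes constraints are written in the form g(x) <= 0 and that the constraint violation is the sum of the positive parts of g, which may differ from how pymoo aggregates or normalizes violations. The functions follow the BNH test problem as it is commonly stated.

```python
from typing import Dict, Sequence


class ToyProblem:
    """Hypothetical two-variable, two-objective, two-constraint problem."""

    def _objectives(self, x):
        return [4 * x[0] ** 2 + 4 * x[1] ** 2,
                (x[0] - 5) ** 2 + (x[1] - 5) ** 2]

    def _constraints(self, x):
        # Each entry is feasible when it is <= 0.
        return [(x[0] - 5) ** 2 + x[1] ** 2 - 25,
                7.7 - (x[0] - 8) ** 2 - (x[1] + 3) ** 2]

    def evaluate(self, x: Sequence[float],
                 return_value_of: Sequence[str] = ("F", "CV")) -> Dict[str, object]:
        g = self._constraints(x)
        values = {
            "F": self._objectives(x),
            "G": g,
            # Assumed aggregation: only violated constraints contribute.
            "CV": sum(max(0.0, v) for v in g),
        }
        return {key: values[key] for key in return_value_of}


problem = ToyProblem()
default_out = problem.evaluate([0.0, 3.0])                          # "F" and "CV"
raw_out = problem.evaluate([0.0, 3.0], return_value_of=("F", "G"))  # raw constraints
print(default_out, raw_out)
```

A method that only needs to rank solutions by feasibility can request "CV", whereas a method that models each constraint separately, such as a surrogate-assisted one, would request "G".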
Table 7.2: Multi-objective optimization test problems.
 Problem              Variables   Objectives   Constraints
                      Single-Objective
 Ackley               (s)         1            -
 Cantilevered Beams   4           1            2
 Griewank             (s)         1            -
 Himmelblau           2           1            -
 Knapsack             (s)         1            1
 Pressure Vessel      4           1            4
 Rastrigin            (s)         1            -
 Rosenbrock           (s)         1            -
 Schwefel             (s)         1            -
 Sphere               (s)         1            -
 Zakharov             (s)         1            -
 G1-9                 (·)         (·)          (·)
                      Multi-Objective
 BNH                  2           2            2
 Carside              7           3            10
 Kursawe              3           2            -
 OSY                  6           2            6
 TNK                  2           2            2
 Truss2D              3           2            1
 Welded Beam          4           2            4
 CTP1-8               (s)         2            (s)
 ZDT1-3               (30)        2            -
 ZDT4                 (10)        2            -
 ZDT5                 (80)        2            -
 ZDT6                 (10)        2            -
                      Many-Objective
 DTLZ 1-7             (s)         (s)          -
 CDTLZ                (s)         (s)          -
 DTLZ1^-1             (s)         (s)          -
 SDTLZ                (s)         (s)          -
 WFG                  (s)         (s)          -
7.2.4.2   Parallelization
If evaluation functions are computationally expensive, a serialized evaluation of a set of solutions
can become the bottleneck of the overall optimization procedure. For this reason, parallelization
is desired for utilizing existing computational resources more efficiently and the distribution of
long-running calculations. In pymoo, the evaluation function receives a set of solutions if the
algorithm uses a population. This empowers the user to implement any kind of parallelization as
long as the objective values for all solutions are written as an output when the evaluation function
terminates. In our framework, a couple of possibilities to implement parallelization exist:
   (i) Vectorized Evaluation: A common technique to parallelize evaluations is to use matrices
       where each row represents a solution. A vectorized evaluation then operates on whole
       columns, each of which holds the values of one variable for all solutions. By using vectors,
       the objective values of all solutions are calculated at once. To run calculations on a GPU,
       implementing support for
       PyTorch [353] tensors can be done with little overhead given suitable hardware and correctly
       installed drivers.
  (ii) Threaded Loop-wise Evaluation: If the function evaluation should occur independently, a
       for loop can be used to set the values. By default, the evaluation is serialized, and no
       calculations occur in parallel. By providing a keyword to the evaluation function, pymoo
       spawns a thread for each evaluation and manages those by using the default thread pool
       implementation in Python. This behavior is available out of the box, and the number
       of parallel threads can be modified.
 (iii) Distributed Evaluation: If the evaluation should not be limited to a single machine, the
       evaluation itself can be distributed to several workers or a whole cluster. We recommend
       using Dask [354] which enables distributed computations on different levels. For instance,
       the matrix operation itself can be distributed, or a whole function can be outsourced. Similar
       to the loop-wise evaluation, each individual can be evaluated element-wise by sending it to
       a worker.
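The threaded loop-wise option from the list above can be sketched with the Python standard library alone. The sketch is independent of pymoo; expensive_objective is a hypothetical placeholder for a time-consuming simulation, and the function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def expensive_objective(x):
    # Placeholder for a time-consuming call, e.g. to external simulation software.
    return sum(v * v for v in x)


def evaluate_serial(X):
    """Default behavior: one solution after the other."""
    return [expensive_objective(x) for x in X]


def evaluate_threaded(X, n_threads=4):
    """Evaluate each solution in its own task using a thread pool."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() returns the results in the same order as the inputs.
        return list(pool.map(expensive_objective, X))


X = [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]
F = evaluate_threaded(X)
print(F)
```

In CPython, threads mainly pay off when the evaluation waits on an external process or on I/O, as is typical for third-party simulation software; for pure-Python arithmetic, the vectorized or distributed options are usually the better fit.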
7.2.5    Optimization Module
The optimization module provides different kinds of sub-modules to be used in algorithms. Some
of them are more of a generic nature, such as decomposition and termination criterion, and others
are more related to evolutionary computing. By assembling those modules together, algorithms are
built.
7.2.5.1     Algorithms
Available algorithm implementations in pymoo are listed in Table 7.3. Compared to other op-
timization frameworks, the list of algorithms may look rather short; however, each algorithm is
customizable, and variants can be initialized with different parameters. For instance, a Steady-State
NSGA-II [355] can be initialized by setting the number of offspring to one. This can be achieved
by passing the number of offspring as a parameter to the initialization method. Moreover, it is worth mentioning
that many-objective algorithms, such as NSGA-III or MOEAD, require reference directions to be
provided. The reference directions are commonly desired to be uniform or to have a bias toward a
region of interest. Our framework offers an implementation of the Das and Dennis method [280]
for a fixed number of points (fixed with respect to a parameter often referred to as partition number)
and a recently proposed Riesz-Energy based method which creates a well-spaced point set for an
arbitrary number of points and is capable of introducing a bias towards preferred regions in the
objective space [276].
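Why the Das and Dennis method only yields certain numbers of points becomes clear from a short sketch. The function below enumerates all points on the unit simplex whose coordinates are multiples of 1/p, where p is the partition number; for M objectives this gives exactly C(M + p - 1, p) points. The function name das_dennis is chosen for illustration, and the code is a standalone sketch, not pymoo's implementation.

```python
from itertools import combinations
from math import comb


def das_dennis(n_obj, n_partitions):
    """Enumerate uniformly spaced reference directions on the unit simplex."""
    points = []
    n_slots = n_partitions + n_obj - 1
    # Stars and bars: place n_obj - 1 bars among n_slots positions;
    # the gaps between bars are the integer shares of each objective.
    for bars in combinations(range(n_slots), n_obj - 1):
        counts = []
        prev = -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(n_slots - prev - 1)
        points.append([c / n_partitions for c in counts])
    return points


ref_dirs = das_dennis(3, 12)
print(len(ref_dirs), comb(3 + 12 - 1, 12))
```

With three objectives, 12 partitions give 91 directions and 13 partitions give 105, so a count in between cannot be produced this way, which is the limitation that methods for an arbitrary number of points address.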
                        Table 7.3: Multi-objective optimization algorithms.
                                        Algorithm      Reference
                                        GA             [1, 32]
                                        BRKGA          [356]
                                        DE             [107]
                                        Nelder-Mead    [357]
                                        CMA-ES         [112, 113]
                                        NSGA-II        [10]
                                        RNSGA-II       [358]
                                        NSGA-III       [279, 258, 12]
                                        UNSGA-III      [359]
                                        RNSGA-III      [360]
                                        MOEAD          [11]
7.2.5.2     Operators
The following evolutionary operators are available:
   (i) Sampling: The initial population is mostly based on random sampling. In some cases, it
        might be based on domain knowledge or a set of existing solutions whose performance has
        already been assessed. Otherwise, it can be sampled randomly for real, integer, or binary
        variables. Additionally, Latin-Hypercube Sampling [234] can be used for real variables.
 (ii) Crossover: A variety of crossover operators for different types of variables is implemented.
      Some of them are presented in Figure 7.4. Figures 7.4a to 7.4d help to visualize the information
      exchange in a crossover with two parents being involved. Each row represents an offspring and
      each column a variable. The corresponding boxes indicate whether the values of the offspring
      are inherited from the first or from the second parent. For one- and two-point crossovers, it can
      be observed that either one or two cuts in the variable sequence exist. Contrarily, the Uniform
      Crossover (UX) does not have any clear pattern because each variable is chosen randomly
      either from the first or from the second parent. For the Half Uniform Crossover (HUX), half
      of the variables that differ between the parents are exchanged. For the purpose of illustration,
      we have created two parents that have different values in 10 positions. For real variables,
      Simulated Binary Crossover (SBX) [361] mimics the combination of binary encoded variables.
      Figure 7.4e shows the probability distribution of the offspring when the parents 𝑥1 = 0.2 and
      𝑥2 = 0.8, where 𝑥𝑖 ∈ [0, 1], are recombined with 𝜂 = 0.8. Analogously, in the case of integer
      variables, we subtract 0.5 from the lower bound and add (0.5 − 𝜖) to the upper bound before
      applying the crossover and round to the nearest integer afterwards (see Figure 7.4f).
(iii) Mutation: Polynomial Mutation [362, 9] is provided for real and integer variables, and
      Bitflip Mutation [1] for binary variables.
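The information exchange of the crossovers in item (ii) can be sketched with boolean masks, where a true entry means that the variable is swapped between the two parents. This is an illustrative plain-Python sketch rather than pymoo's implementation; all function names are hypothetical.

```python
import random


def one_point_mask(n_var, rng):
    """A single cut: every variable from the cut onwards is swapped."""
    cut = rng.randrange(1, n_var)
    return [i >= cut for i in range(n_var)]


def two_point_mask(n_var, rng):
    """Two cuts: only the segment between them is swapped (needs n_var >= 3)."""
    a, b = sorted(rng.sample(range(1, n_var), 2))
    return [a <= i < b for i in range(n_var)]


def uniform_mask(n_var, rng):
    """UX: each variable is swapped independently with probability 0.5."""
    return [rng.random() < 0.5 for _ in range(n_var)]


def half_uniform_mask(p1, p2, rng):
    """HUX: swap exactly half of the positions in which the parents differ."""
    differing = [i for i in range(len(p1)) if p1[i] != p2[i]]
    chosen = set(rng.sample(differing, len(differing) // 2))
    return [i in chosen for i in range(len(p1))]


def apply_mask(p1, p2, mask):
    """Swap the masked positions between the two parents and return both offspring."""
    c1 = [b if m else a for a, b, m in zip(p1, p2, mask)]
    c2 = [a if m else b for a, b, m in zip(p1, p2, mask)]
    return c1, c2


rng = random.Random(0)
# Two binary parents that differ in exactly 10 positions.
parent_a = [0] * 20
parent_b = [1] * 10 + [0] * 10
hux = half_uniform_mask(parent_a, parent_b, rng)
child_a, child_b = apply_mask(parent_a, parent_b, hux)
```

With parents differing in 10 positions, the HUX mask always selects five of them, whereas the number of swapped positions under UX varies from call to call.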
[Figure 7.4: Illustration of some crossover operators for different variable types. Panels (a) One Point, (b) Two Point, (c) UX, and (d) HUX plot Individuals against Variables; panels (e) SBX (real, eta=0.8) and (f) SBX (int, eta=3) plot p(x) against x.]
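The real and integer variants of SBX described in item (ii) can be sketched as follows. The sketch uses the basic unbounded form of SBX for a single variable and, as an assumption made here, enforces the extended integer bounds by clipping and rounds half up; pymoo's bounded implementation may differ in detail. The function names are hypothetical.

```python
import math
import random


def sbx_pair(p1, p2, eta, rng):
    """Basic (unbounded) SBX for one real variable; returns two children."""
    u = rng.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2


def sbx_pair_int(p1, p2, xl, xu, eta, rng, eps=1e-6):
    """Integer variant: extend the bounds by 0.5, recombine as reals, then round."""
    lo, hi = xl - 0.5, xu + (0.5 - eps)
    children = sbx_pair(float(p1), float(p2), eta, rng)
    # Clipping to [lo, hi] is an assumption of this sketch.
    return tuple(math.floor(min(max(c, lo), hi) + 0.5) for c in children)


rng = random.Random(1)
real_children = sbx_pair(0.2, 0.8, eta=0.8, rng=rng)
int_children = sbx_pair_int(2, 9, xl=1, xu=10, eta=3.0, rng=rng)
```

The two real children are placed symmetrically around the parents' midpoint, so their mean equals the parents' mean; extending the integer bounds by 0.5 gives the boundary integers the same rounding interval as the interior ones.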
   Different problems require different types of operators. In practice, if a problem is supposed
to be solved repeatedly and routinely, it makes sense to customize the evolutionary operators to
improve the convergence of the algorithm. Moreover, for custom variable types, for instance, trees
or mixed variables [363], custom operators [6] can be implemented easily and called by the algorithm
class. Our software documentation contains examples for custom modules, operators, and variable
types.
7.2.5.3   Termination Criterion
For every algorithm, it must be determined when it should terminate a run. This can be simply
based on a predefined number of function evaluations, iterations, or a more advanced criterion, such
as the change of a performance metric over time. For example, we have implemented a termination
criterion based on the variable and objective space difference between generations. In order to make
the termination criterion more robust, the last 𝑘 generations are considered. The largest movement
from a solution to its closest neighbor is tracked across generations, and whenever it is below
a certain threshold, the algorithm is considered to have converged. Analogously, the movement
in the objective space can also be used. In the objective space, however, normalization is more
challenging and has to be addressed carefully. The default termination criterion for multi-objective
problems in pymoo keeps track of the boundary points in the objective space and uses them, when
they have settled down, for normalization. More details about the proposed termination criterion
can be found in [364].
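The variable-space part of such a criterion can be sketched as follows. This is a simplified illustration and not the criterion of [364]: the class name, the tolerance `tol`, the window length `k`, and the rule that the largest movement must stay below the tolerance throughout the whole window are assumptions made for this example.

```python
from collections import deque
from math import dist


def largest_movement(prev_pop, curr_pop):
    """Largest distance from a current solution to its closest neighbor in the previous generation."""
    return max(min(dist(x, y) for y in prev_pop) for x in curr_pop)


class DesignSpaceTermination:
    def __init__(self, tol=1e-3, k=5):
        self.tol = tol
        self.history = deque(maxlen=k)  # movements of the last k generations
        self.prev = None

    def has_converged(self, pop):
        if self.prev is not None:
            self.history.append(largest_movement(self.prev, pop))
        self.prev = [list(x) for x in pop]
        window_full = len(self.history) == self.history.maxlen
        return window_full and max(self.history) < self.tol


term = DesignSpaceTermination(tol=1e-3, k=3)
# Toy run: a single solution whose step size halves every generation.
flags = [term.has_converged([[0.5 ** g, 0.0]]) for g in range(20)]
```

Considering a window instead of only the most recent generation avoids stopping on a single generation in which the population happens not to move.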
7.2.5.4   Decomposition
Decomposition transforms multi-objective problems into many single-objective optimization prob-
lems [365]. Such a technique can be either embedded in a multi-objective algorithm and solved
simultaneously or independently using a single-objective optimizer. Some decomposition methods
are based on the 𝑙𝑝-metrics with different 𝑝 values. For instance, a naive but frequently used
decomposition approach is the Weighted-Sum Method (𝑝 = 1), which is known to be incapable
of converging to the non-convex part of a Pareto front [9]. Moreover, instead of summing values,
the Tchebysheff Method (𝑝 = ∞) considers only the maximum value of the difference between the ideal
point and a solution. Similarly, the Achievement Scalarization Function (ASF) [366] and a modified
version, the Augmented Achievement Scalarization Function (AASF) [367], use the maximum of all
differences. Furthermore, Penalty Boundary Intersection (PBI) [11] is calculated by a weighted
sum of the norm of the projection of a point onto the reference direction and the perpendicular
distance. Also, it is worth noting that normalization is essential for any kind of decomposition. All
decomposition techniques mentioned above are implemented in pymoo.
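Three of the decomposition functions above can be sketched in a few lines. This is an illustrative sketch, not pymoo's implementation: the function names are hypothetical, the objective values are assumed to be normalized already, and the PBI penalty `theta=5.0` is a commonly used value chosen here as an assumption.

```python
from math import cos, sin, pi, sqrt


def weighted_sum(f, w):
    return sum(wi * fi for wi, fi in zip(w, f))


def tchebysheff(f, w, ideal):
    return max(wi * abs(fi - zi) for wi, fi, zi in zip(w, f, ideal))


def pbi(f, w, ideal, theta=5.0):
    norm_w = sqrt(sum(wi * wi for wi in w))
    diff = [fi - zi for fi, zi in zip(f, ideal)]
    # d1: length of the projection onto the reference direction w.
    d1 = abs(sum(di * wi for di, wi in zip(diff, w))) / norm_w
    # d2: perpendicular distance to the reference direction.
    d2 = sqrt(sum((di - d1 * wi / norm_w) ** 2 for di, wi in zip(diff, w)))
    return d1 + theta * d2


# A non-convex (concave) front for minimization: a quarter of the unit circle.
front = [(cos(i * pi / 20), sin(i * pi / 20)) for i in range(11)]
w, ideal = (0.5, 0.5), (0.0, 0.0)
best_ws = min(range(len(front)), key=lambda i: weighted_sum(front[i], w))
best_tch = min(range(len(front)), key=lambda i: tchebysheff(front[i], w, ideal))
```

On this front, the weighted sum with equal weights selects one of the two extreme points, while the Tchebysheff function selects the middle point, which illustrates the limitation of the Weighted-Sum Method on non-convex fronts.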
7.2.6    Analytics
7.2.6.1    Performance Indicators
The comparison regarding performance is rather simple for single-objective optimization algorithms
because each optimization run results in a single best solution. In multi-objective optimization,
however, each run returns a non-dominated set of solutions. In order to compare sets of solutions,
various performance indicators have been proposed in the past [368]. The most commonly used
performance indicators provided in pymoo are described below:
   (i) GD/IGD: Given the Pareto front PF, the deviation between the non-dominated set S found
       by the algorithm and the optimum can be measured. Following this principle, the Generational
       Distance (GD) indicator [369] calculates the average Euclidean distance in the objective
       space from each solution in S to the closest solution in PF. This measures the convergence of
       S, but does not indicate whether a good diversity on the Pareto front has been reached.
       Similarly, the Inverted Generational Distance (IGD) indicator [240] measures the average Euclidean
       distance in the objective space from each solution in PF to the closest solution in S. The
       Pareto front as a whole needs to be covered by solutions from S to minimize the performance
       metric. Thus, the lower the GD and IGD values, the better the set. However, IGD is known
       not to be Pareto compliant [370].
  (ii) GD+/IGD+: A variation of GD and IGD has been proposed in [370]. The Euclidean distance
       is replaced by a distance measure that takes the dominance relation into account. The authors
       show that IGD+ is weakly Pareto compliant.
 (iii) Hypervolume: Moreover, the dominated portion of the objective space can be used to
       measure the quality of non-dominated solutions [371]. The higher the hypervolume, the
       better the set. Instead of the Pareto front, a reference point needs to be provided. It
       has been shown that Hypervolume is Pareto compliant [241]. Because the performance
       metric becomes computationally expensive in higher dimensional spaces, the exact measure
      becomes intractable. However, we plan to include some proposed approximation methods in
      the near future.
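The indicators listed above can be sketched in plain Python as follows. The sketch assumes minimization of all objectives, uses the plain average distance for GD and IGD as described in the text, and computes the hypervolume exactly only for two objectives; the function names are chosen for this example and are not pymoo's API.

```python
from math import dist, sqrt


def gd(S, PF):
    """Average distance from each solution in S to its closest point in PF."""
    return sum(min(dist(s, p) for p in PF) for s in S) / len(S)


def igd(S, PF):
    """Average distance from each point in PF to its closest solution in S."""
    return gd(PF, S)


def igd_plus(S, PF):
    """IGD with a distance that only counts objectives in which s is worse than p."""
    def d_plus(p, s):
        return sqrt(sum(max(si - pi, 0.0) ** 2 for si, pi in zip(s, p)))
    return sum(min(d_plus(p, s) for s in S) for p in PF) / len(PF)


def hypervolume_2d(S, ref):
    """Exact hypervolume for two minimized objectives with reference point ref."""
    pts = sorted(p for p in S if p[0] < ref[0] and p[1] < ref[1])
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:  # skip dominated points
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv


PF = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
S = [(0.5, 0.5)]
print(gd(S, PF), igd(S, PF), igd_plus(S, PF))
print(hypervolume_2d([(1, 3), (2, 2), (3, 1)], ref=(4, 4)))
```

In this toy example GD is zero because the single solution lies on the Pareto front, whereas IGD is positive because the front is not covered, which mirrors the difference between convergence and diversity discussed in item (i).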
    Performance indicators are used to compare existing algorithms. Moreover, the development
of new algorithms can be driven by the goodness of different metrics themselves.
7.2.6.2   Visualization
The visualization of intermediate steps or the final result is indispensable. In multi- and many-objective
optimization, visualization of the objective space is of interest so that trade-off information among
solutions can be easily grasped from the plots. Depending on the dimension of the objective
space, different types of plots are suitable to represent a single solution or a set of solutions. In pymoo,
the implemented visualizations wrap the well-known Python plotting library Matplotlib [372].
Keyword arguments provided by Matplotlib itself are still available, which allows modifying, for
instance, the color, thickness, opacity of lines, points, or other shapes. Therefore, all visualization
techniques are customizable and extendable.
    For two or three objectives, scatter plots (see Figure 7.5a and 7.5b) can give a good intuition
about the solution set. Trade-offs can be observed by considering the distance between two points.
It might be desired to normalize each objective to make sure a comparison between values is based
on relative and not absolute values. Pairwise Scatter Plots (see Figure 7.5c) visualize more than
three objectives by showing each pair of axes independently. The diagonal is used to label the
corresponding objectives.
    Also, high-dimensional data can be illustrated by Parallel Coordinate Plots (PCP) as shown in
Figure 7.5d. All axes are plotted vertically, and each represents an objective. Each solution is illustrated
by a line from left to right. The intersection of a line and an axis indicates the value of the
solution regarding the corresponding objective. For the purpose of comparison, solution(s) can be
highlighted by varying color and opacity.
[Figure 7.5: Different visualization methods coded in pymoo. Panels: (a) Scatter 2D, (b) Scatter 3D, (c) Scatter ND, (d) PCP [235], (e) Radviz [373], (f) Star Coordinate Graph [374], (g) Heatmap [375], (h) Petal Diagram [376], (i) Spider-Web/Radar [377].]
    Moreover, a common practice is to project the higher dimensional objective values onto the
2D plane using a transformation function. Radviz (Figure 7.5e) visualizes all points in a circle,
and the objective axes are uniformly positioned around the perimeter. Considering a minimization
problem and a set of non-dominated solutions, an extreme point very close to an axis represents
the worst solution for that corresponding objective but is comparably "good" in one or many other
objectives. Similarly, Star Coordinate Plots (see Figure 7.5f) illustrate the objective space, except
that the transformation function allows solutions outside of the circle.
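The Radviz transformation described above can be sketched as a weighted average of anchor points placed on the unit circle. This is a minimal plain-Python sketch, not pymoo's plotting code; the per-objective min-max normalization and the function name `radviz` are assumptions made for this example.

```python
from math import cos, sin, pi


def radviz(F):
    """Project objective vectors (rows of F) into the unit circle."""
    n_obj = len(F[0])
    # One anchor per objective, uniformly positioned around the perimeter.
    anchors = [(cos(2 * pi * j / n_obj), sin(2 * pi * j / n_obj)) for j in range(n_obj)]
    lo = [min(f[j] for f in F) for j in range(n_obj)]
    hi = [max(f[j] for f in F) for j in range(n_obj)]
    points = []
    for f in F:
        v = [(f[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(n_obj)]
        total = sum(v)
        if total == 0.0:
            points.append((0.0, 0.0))
            continue
        x = sum(vj * a[0] for vj, a in zip(v, anchors)) / total
        y = sum(vj * a[1] for vj, a in zip(v, anchors)) / total
        points.append((x, y))
    return points


F = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
P = radviz(F)
```

A solution that is worst in the first objective and best in all others is mapped onto the anchor of the first objective, while a solution with equal normalized values in all objectives is mapped to the center of the circle.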
     Heatmaps (see Figure 7.5g) are used to represent the goodness of solutions through colors.
Each row represents a solution and each column a variable. We leave the choice to the end-user of
what color map to use and whether light or dark colors illustrate better or worse solutions. Also,
solutions can be sorted lexicographically by their corresponding objective values.
    Instead of visualizing a set of solutions, one solution can be illustrated at a time. The Petal
Diagram (Figure 7.5h) is a pie diagram where the objective value is represented by each piece’s
diameter. Colors are used to further distinguish the pieces. Finally, the Spider-Web or Radar
Diagram (Figure 7.5i) shows the value of each objective as a point on an axis. The ideal and
nadir points [9] are represented by the inner and outer polygons, respectively. By definition, the solution lies in
between those two extremes. If the objective space ranges are scaled differently, normalization for
the purpose of plotting can be enabled, and the diagram becomes symmetric. New and emerging
methods for visualizing efficient solutions in more than three dimensions, such as 2.5-dimensional
PaletteViz plots [378], will be implemented in the future.
7.2.6.3    Decision Making
In practice, after obtaining a set of non-dominated solutions, a single solution has to be chosen for
implementation. Pymoo provides a few “a posteriori” approaches for decision making [9].
   (i) Compromise Programming: One way of making a decision is to compute the value of
       a scalarized and aggregated function and select one solution based on the minimum or
       maximum value of the function. In pymoo, a number of scalarization functions described in
       Section 7.2.5.4 can be used to come to a decision regarding desired weights of objectives.
  (ii) Pseudo-Weights: However, a more intuitive way to choose a solution out of a Pareto front is the
       pseudo-weight vector approach proposed in [9]. The pseudo-weight 𝑤𝑖 for the 𝑖-th objective
       function is calculated by:
       \[
         w_i = \frac{(f_i^{\max} - f_i(x)) \,/\, (f_i^{\max} - f_i^{\min})}{\sum_{m=1}^{M} (f_m^{\max} - f_m(x)) \,/\, (f_m^{\max} - f_m^{\min})}. \tag{7.1}
       \]
       The normalized distance to the worst solution regarding each objective 𝑖 is calculated. It is
       interesting to note that for non-convex Pareto fronts, the pseudo-weight does not correspond
       to the result of an optimization using the weighted-sum method. A solution having the
       closest pseudo-weight to a target preference vector of objectives (for example, 𝑓1 being twice as
       important as 𝑓2 results in a target preference vector of (0.667, 0.333)) can be chosen as the
       preferred solution from the efficient set.
 (iii) High Trade-Off Solutions: Furthermore, high trade-off solutions are usually of interest but
       not straightforward to identify in higher-dimensional objective spaces. We have implemented
       the procedure proposed in [379]. It was described to be embedded in an algorithm to guide
       the search; we, however, use it for post-processing. The metric for each solution pair 𝑥𝑖 and
       𝑥 𝑗 in a non-dominated set is given by:
       \[
         T(x_i, x_j) = \frac{\sum_{m=1}^{M} \max[0, f_m(x_j) - f_m(x_i)]}{\sum_{m=1}^{M} \max[0, f_m(x_i) - f_m(x_j)]}, \tag{7.2}
       \]
       where the numerator represents the aggregated sacrifice and the denominator represents the
       aggregated gain. The trade-off measure 𝜇(𝑥𝑖 , 𝑆) for each solution 𝑥𝑖 with respect to a set of
       neighboring solutions 𝑆 is obtained by:
       \[
         \mu(x_i, S) = \min_{x_j \in S} T(x_i, x_j). \tag{7.3}
       \]
       It finds the minimum 𝑇(𝑥𝑖, 𝑥𝑗) from 𝑥𝑖 to all other solutions 𝑥𝑗 ∈ 𝑆. Instead of calculating the
       metric with respect to all others, we provide the option to only consider the 𝑘 closest neighbors
       in the objective space to reduce the computational complexity. Depending on the circumstances, the
       ‘min’ operator can be replaced with ‘average’, ‘max’, or any other suitable operator.
       Thereafter, the solution having the maximum 𝜇 can be chosen as the preferred solution,
       meaning that this solution causes a maximum sacrifice in one of the objective values for a
       unit gain in another objective value, which makes it the most valuable solution for implementation.
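The pseudo-weight selection of item (ii) can be sketched as follows. The sketch follows Equation (7.1) directly and is not pymoo's implementation; the function names are hypothetical, the squared Euclidean distance to the target vector is an assumption of this example, and the set is assumed to contain at least two distinct non-dominated solutions so that no denominator is zero.

```python
def pseudo_weights(F):
    """Pseudo-weight vector (Equation 7.1) for every row of the non-dominated set F."""
    n_obj = len(F[0])
    f_min = [min(f[i] for f in F) for i in range(n_obj)]
    f_max = [max(f[i] for f in F) for i in range(n_obj)]
    weights = []
    for f in F:
        # Normalized distance to the worst value in each objective.
        norm = [(f_max[i] - f[i]) / (f_max[i] - f_min[i]) for i in range(n_obj)]
        total = sum(norm)
        weights.append([v / total for v in norm])
    return weights


def select_by_pseudo_weight(F, target):
    """Index of the solution whose pseudo-weight vector is closest to the target."""
    W = pseudo_weights(F)
    return min(range(len(F)), key=lambda k: sum((w - t) ** 2 for w, t in zip(W[k], target)))


F = [(0.0, 1.0), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1.0, 0.0)]
W = pseudo_weights(F)
chosen = select_by_pseudo_weight(F, target=(0.667, 0.333))
```

For this linear front, the target (0.667, 0.333) selects the solution (0.25, 0.75), that is, a solution with a comparatively low value in the more important first objective.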
    The above methods are algorithmic but require user interaction to choose a single preferred
solution. However, in practice, a more problem-specific decision-making method must be
used, such as an interactive EMO method suggested elsewhere [380]. We would like to emphasize that
multi-objective frameworks should include multi-criteria decision-making methods and support
end-users in choosing a solution out of a trade-off solution set.
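As an illustration of the high trade-off procedure in item (iii), Equations (7.2) and (7.3) can be sketched in plain Python. The function names are hypothetical, and returning infinity when the aggregated gain is zero is an assumption of this sketch; it does not reproduce pymoo's implementation.

```python
from math import dist, inf


def tradeoff(fi, fj):
    """T(x_i, x_j) of Equation (7.2): aggregated sacrifice divided by aggregated gain."""
    sacrifice = sum(max(0.0, b - a) for a, b in zip(fi, fj))
    gain = sum(max(0.0, a - b) for a, b in zip(fi, fj))
    return sacrifice / gain if gain > 0.0 else inf


def tradeoff_measure(F, i, k=None):
    """mu(x_i, S) of Equation (7.3), optionally restricted to the k closest neighbors."""
    others = [j for j in range(len(F)) if j != i]
    others.sort(key=lambda j: dist(F[i], F[j]))
    neighbors = others if k is None else others[:k]
    return min(tradeoff(F[i], F[j]) for j in neighbors)


def high_tradeoff_solution(F, k=None):
    """Index of the solution with the maximum trade-off measure."""
    return max(range(len(F)), key=lambda i: tradeoff_measure(F, i, k))


# A small front with a pronounced knee at (0.1, 0.1).
F = [(0.0, 1.0), (0.1, 0.1), (1.0, 0.0)]
knee = high_tradeoff_solution(F)
```

Moving from the knee to either extreme point sacrifices about nine units in one objective per unit gained in the other, so the knee receives the largest trade-off measure and is selected.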
7.2.7   Summary of Section 7.2
This section has introduced pymoo, a multi-objective optimization framework in Python. We have
presented the overall architecture of the framework consisting of three core modules: Problems,
Optimization, and Analytics. Each module has been described in-depth and illustrative examples
have been provided. We have shown that our framework covers various aspects of multi-objective
optimization, including the visualization of high-dimensional spaces and multi-criteria
decision-making to finally select a solution out of the obtained solution set. One distinguishing feature of our
framework compared with existing ones is that we have provided a few options for various key aspects
of a multi-objective optimization task, providing standard evolutionary operators for optimization,
standard performance metrics for evaluating a run, standard visualization techniques for showcasing
obtained trade-off solutions, and a few approaches for decision-making.
    However, the framework can be extended to make it more comprehensive, and we are constantly
adding new capabilities based on practicalities learned from our collaboration with industries. In
the future, we plan to implement more optimization algorithms and test problems to provide end-
users with more choices. Also, we aim to implement some methods from classical literature on
single-objective optimization, which can also be used for multi-objective optimization through
decomposition or embedded as a local search. So far, we have provided a few basic performance
metrics. We plan to extend this by creating a module that automatically runs a list of algorithms on
test problems and provides statistics of different performance indicators.
    Furthermore, we would like to mention that any kind of contribution is more than welcome. We see our
framework as a collaborative collection from and for the multi-objective optimization community. By
adding a method or algorithm to pymoo, the community can benefit from a growing comprehensive
framework, and it can help researchers to advertise their methods. Interested researchers are
welcome to contact the authors. In general, different kinds of contributions are possible, and more
information can be found online. Moreover, we would like to mention that even though we try to
keep our framework as bug-free as possible, in case of exceptions during the execution or doubt of
correctness, please contact us directly or use our issue tracker.
7.3   Summary of the Chapter
This chapter has provided a blueprint for optimization in practice. We have proposed a methodology
following the SOLVeR acronym. While collaboratively solving an optimization problem, each
phase is accompanied by supporting activities such as project management or interdisciplinary
communication. The application of the SOLVeR method has been shown in two case studies.
Additionally, we have provided insights into the software architecture and usage of pymoo, a
well-known optimization framework used in academia and industry. The introduction to the framework
has given a fundamental understanding of the different software components and should help
end-users get started with solving their optimization problems using a standard toolkit.
                                            CHAPTER 8
                            CONCLUSIONS AND FUTURE WORK
8.1    Summary of Contributions
This thesis has focused on computational expense – an important problem characteristic in practice.
In Chapter 1, we have discussed the origin of time-consuming evaluation functions and why they
need to be addressed when solving real-world optimization problems. Moreover, we have presented
the idea of using surrogates, which is the predominant approach in academia and industry. In Chapter 2,
further basics and definitions of optimization, different well-known models, and a
standard surrogate-based method were provided. Related literature, including a categorization of
existing surrogate-based approaches, was discussed in Chapter 3. Besides a theme-wise overview
of previous works, we have identified current trends of surrogate-assisted optimization and open
issues to be addressed. Chapter 4 first presented a method that incorporates surrogate models
into different kinds of well-known metaheuristics proposed for single-objective optimization. With
generalizability in mind, this idea was then extended to multi-objective optimization
algorithms, as multi-objective problems commonly arise in practice. In Chapter 5, the optimization
of multiple independently computable and mixed computationally expensive functions was investi-
gated. This often occurs in practice when the performance assessment (constraints and objectives)
of a design requires running multiple third-party software packages or calculating closed-form
equations. We have proposed a method for the case where all objectives are computationally expensive and the
constraints are inexpensive. In such a case, one wants to ensure that only feasible designs are sent to
the time-consuming evaluation function. Moreover, we have developed an evolutionary algorithm
for mixed computationally expensive functions that evaluates the target functions in a specific order that
maximizes the information gain in each iteration. In Chapter 6, we have presented two case studies:
first, the optimization of a bi-objective head water jacket design problem with computationally
expensive functions, and second, a constrained bi-objective application exploring electric machine
design with computationally expensive objectives and inexpensive constraints. Some studies and
projects in this thesis required collaborating with other – sometimes less optimization-literate –
researchers. Thus, we have presented a blueprint for collaborative optimization in Chapter 7.
Besides the collaboration, implementing the optimization code itself can be quite challenging. As
founders and maintainers of an open-source package (which was also used for the majority of this
thesis), we have described the architecture and usage of our framework, which focuses on multi-objective
optimization.
8.2    Future Directions
Throughout this dissertation, various questions have been approached in different ways. Finally,
we would like to identify possible future research directions, including matters that
deserve to be investigated in more detail.
    We proposed a generic methodology for surrogate-assisted optimization applied to unconstrained
and constrained, single- and multi-objective problems in Chapter 4. Even though we tested
our method for a variety of metaheuristics, additional studies would broaden the scope of this
work further. Furthermore, the constraint handling for multi-objective optimization problems
has demonstrated results competitive with another algorithm; however, it has room for improvement.
Other interesting future research directions are the behavior of the proposed method with an increasing
number of variables (20 or more) and with a varying evaluation budget (imitating applications
with evaluations of moderate or high expense). This raises questions about the time needed to
fit a surrogate relative to the time of the evaluation itself. Throughout this thesis, we assumed that the expensive evaluation
of a solution justified any additional computational burden (for instance, running a whole surrogate
selection procedure for each objective and constraint). However, for moderately expensive functions,
say, one minute per evaluation, one has to reassess whether surrogate incorporation is still beneficial.
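The trade-off can be illustrated with a back-of-the-envelope calculation. The sketch below rests on a simplifying model that is our assumption for illustration only: a surrogate-assisted run needs fewer expensive evaluations but pays a fixed modeling overhead per evaluation. The function names and all numbers are hypothetical and not measurements from this thesis.

```python
def run_times(t_eval, t_overhead, n_evals_plain, n_evals_surrogate):
    """Return (plain, surrogate-assisted) wall-clock time in seconds.

    t_eval             seconds per expensive evaluation
    t_overhead         seconds of surrogate fitting/selection per evaluation
    n_evals_plain      evaluations needed without a surrogate
    n_evals_surrogate  evaluations needed with a surrogate
    """
    time_plain = n_evals_plain * t_eval
    time_surrogate = n_evals_surrogate * (t_eval + t_overhead)
    return time_plain, time_surrogate


def break_even_eval_time(t_overhead, n_evals_plain, n_evals_surrogate):
    """Evaluation time above which the surrogate-assisted run is faster."""
    saved = n_evals_plain - n_evals_surrogate
    if saved <= 0:
        return float("inf")  # no evaluations saved: the overhead never pays off
    return n_evals_surrogate * t_overhead / saved


# Illustrative numbers only: 1000 vs. 300 evaluations, 30 s overhead each.
for t_eval in (5, 60, 600):
    plain, assisted = run_times(t_eval, 30, 1000, 300)
    print(t_eval, plain, assisted, assisted < plain)
print(break_even_eval_time(30, 1000, 300))
```

Under this model, the break-even evaluation time grows linearly with the modeling overhead, so an elaborate surrogate selection procedure shifts the point at which surrogates pay off toward more expensive functions.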
    Heterogeneously expensive evaluations are a practical matter that needs attention
when solving a real-world problem. In Chapter 5, our focus was on dealing with different evaluation
expenses (for objectives and constraints) but not on the surrogate incorporation itself.
Thus, a future research direction should be the holistic consideration of both aspects at the same time
when attempting to solve a real-world optimization problem directly. For instance, suitable candidates
are multi-physics design problems, which often require multiple software packages to be executed.
Moreover, we have assumed that the evaluation time for each solution’s target is known and is
approximately the same for all designs. However, for some simulations, the evaluation time may
vary. For instance, let us assume the configuration of a car controller is optimized, where the
simulation runs until a time limit is reached or an accident happens. For some
solutions, the evaluation will be relatively quick (an accident happens early). For others, the
simulation is time-intensive (simulating until the time limit is met; no accident happens). Thus,
one cannot generally order targets by their expensiveness (or, as we have proposed, by information
gain) but needs to implement a more sophisticated procedure. Possible research directions are
dynamically updating the target’s evaluation time using a book-keeping approach or an external
predictor. One difficulty in handling problems with heterogeneous computing times is that one does not know
beforehand how much computational time a solution will take to evaluate. Usually, the
time to compute a function follows some distribution. Thus, the allocation of objectives and
constraints for the next evaluation may need to be done online, with a quick decision made as soon as
each evaluation is complete. Accounting for such practicalities in scheduling will make the whole approach even
more pragmatic.
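One possible form of the book-keeping idea is sketched below. It is an illustration under our own assumptions and not a method evaluated in this thesis: the class `EvaluationTimeBook`, its smoothing weight `alpha`, and the target names are hypothetical. Each target's evaluation time is tracked with an exponential moving average, and the targets are re-ordered online after every completed evaluation.

```python
class EvaluationTimeBook:
    """Running estimate of each target's evaluation time (hypothetical helper)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight given to the newest observation
        self.estimate = {}  # target name -> estimated seconds

    def record(self, target, seconds):
        """Update the estimate of a target with an exponential moving average."""
        old = self.estimate.get(target)
        if old is None:
            self.estimate[target] = seconds
        else:
            self.estimate[target] = (1 - self.alpha) * old + self.alpha * seconds

    def order(self, targets):
        """Sort targets from cheapest to most expensive; unseen targets come first."""
        return sorted(targets, key=lambda t: self.estimate.get(t, 0.0))


book = EvaluationTimeBook(alpha=0.5)
book.record("g1", 0.1)   # inexpensive constraint
book.record("f1", 50.0)  # simulation ran until the time limit
book.record("f1", 10.0)  # simulation stopped early
book.record("f2", 20.0)
print(book.estimate["f1"])
print(book.order(["f1", "f2", "g1"]))
```

A moving average is only one choice; since the evaluation time follows a distribution, an external predictor conditioned on the design variables could replace the `record` step.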
    Optimization is characterized by interdisciplinarity. In Chapter 3, we analyzed the literature and
provided a list of application problems in different sciences with the necessity to
address expensive functions. Additionally, we provided two case studies in Chapter 6. Nevertheless,
this is only a starting point, and more application problems need to be investigated. This future
research direction shall help pin down commonalities across research disciplines and further refine
the requirements of surrogate-assisted optimization. Furthermore, because addressing expensive
functions is necessary for solving real-world optimization problems, research and industry can
benefit from each other by pursuing collaborative optimization.
    Lastly, we offer some closing thoughts on our direct contributions to the community. Pymoo has
become a widely used tool in academia and industry. Over the last few years, the framework has become
a robust toolkit for (evolutionary) single- and multi-objective optimization. Many extensions and
features are possible in the future, each addressing a specific problem characteristic. Moreover,
in the long term, it will be necessary to build a self-sustaining community to incorporate new research
approaches and provide a well-maintained framework to end-users. The community should include
internationally well-known optimization researchers and a group of Python developers taking care
of technical matters. A network and an active community are important factors for keeping an open-
source product alive. As pymoo has become one of the standard tools in Python for multi-objective
optimization, we are planning to release another toolkit focusing on the optimization of expensive
functions. The new development shall be based on the findings in this thesis and rely on
pymoo for its optimization components. The new toolkit will directly serve as an extension for
researchers and practitioners to solve computationally expensive optimization problems efficiently.


APPENDIX


                     SCOPUS QUERIES FOR LITERATURE ANALYSIS
In Section 3.2, we performed an analysis of the literature related to surrogate-assisted algorithms.
We distinguished between publications with a focus on the Problem, Method, and Goal. Their
corresponding Scopus queries are listed in Algorithms A.1, A.2, and A.3, respectively.
  Algorithm A.1: Scopus Query: Problem
    TITLE ( ( "simulation optimization" OR "simulation-based optimization"
    OR "expensive black-box" OR "data-driven optimization" ) AND optimization )
    AND ( LIMIT-TO ( LANGUAGE, "English" ) )
  Algorithm A.2: Scopus Query: Method
    TITLE ( ( "simulation optimization" OR "simulation-based optimization"
    OR "expensive black-box" OR "data-driven optimization" OR bayesian OR "model-based"
    OR "Kriging assisted" ) AND optimization )
    AND ( LIMIT-TO ( LANGUAGE, "English" ) )
  Algorithm A.3: Scopus Query: Goal
    TITLE-ABS-KEY ( ( "Efficient Global Optimization" OR "Efficient Global Optimizer"
    OR "anytime optimization" OR "anytime algorithm" OR "EGO" ) AND optimization )
    AND ( LIMIT-TO ( LANGUAGE, "English" ) )


BIBLIOGRAPHY
[1]  D. E. Goldberg, Genetic algorithms in search, optimization and machine learning. USA:
     Addison-Wesley Longman Publishing Co., Inc., 1st ed., 1989.
[2]  T. A. Burress, S. L. Campbell, C. Coomer, C. W. Ayers, A. A. Wereszczak, J. P. Cunningham,
     L. D. Marlino, L. E. Seiber, and H.-T. Lin, “Evaluation of the 2010 toyota prius hybrid synergy
     drive system,” Tech. Rep. March, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN
     (United States), Mar. 2011.
[3]  Y. Pochet and L. A. Wolsey, Production planning by mixed integer programming. Springer
     Science & Business Media, 2006.
[4]  M. R. Bonyadi, Z. Michalewicz, and L. Barone, “The travelling thief problem: The first step
     in the transition from theoretical problems to realistic problems,” in 2013 IEEE congress on
     evolutionary computation, pp. 1037–1044, 2013.
[5]  J. Kiefer, “Sequential minimax search for a maximum,” Proceedings of the American Math-
     ematical Society, vol. 4, no. 3, pp. 502–506, 1953.
[6]  K. Deb and C. Myburgh, “A Population-Based Fast Algorithm for a Billion-Dimensional
     Resource Allocation Problem with Integer Variables,” European Journal of Operational
     Research, vol. 261, no. 2, pp. 460–474, 2017.
[7]  Y. Chauvin and D. E. Rumelhart, eds., Backpropagation: Theory, architectures, and appli-
     cations. USA: L. Erlbaum Associates Inc., 1995.
[8]  W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi, “A survey of deep neural
     network architectures and their applications,” Neurocomputing, vol. 234, pp. 11 – 26, 2017.
[9]  K. Deb, Multi-objective optimization using evolutionary algorithms. USA: John Wiley &
     Sons, Inc., 2001.
[10] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic
     algorithm: NSGA-II,” Trans. Evol. Comp, vol. 6, pp. 182–197, Apr. 2002.
[11] Q. Zhang and H. Li, “A multi-objective evolutionary algorithm based on decomposition,”
     IEEE Transactions on Evolutionary Computation, 2007.
[12] J. Blank, K. Deb, and P. Roy, “Investigating the normalization procedure of NSGA-III,”
     in Evolutionary multi-criterion optimization (K. Deb, E. Goodman, C. A. Coello Coello,
     K. Klamroth, K. Miettinen, S. Mostaghim, and P. Reed, eds.), (Cham), pp. 229–240, Springer
     International Publishing, 2019.
[13] T. Tušar and B. Filipič, “Visualization of pareto front approximations in evolutionary mul-
     tiobjective optimization: A critical review and the prosection method,” IEEE Transactions
     on Evolutionary Computation, vol. 19, no. 2, pp. 225–245, 2015.
[14] K. Miettinen, Nonlinear multiobjective optimization, vol. 12 of International series in operations
     research & management science. Boston, USA: Kluwer Academic Publishers, 1999.
[15] D. P. Bertsekas, Constrained optimization and lagrange multiplier methods (optimization
     and neural computation series). Athena Scientific, 1 ed., 1996.
[16] K. Deb, “An efficient constraint handling method for genetic algorithms,” Computer Methods
     in Applied Mechanics and Engineering, vol. 186, no. 2, pp. 311–338, 2000.
[17] J. A. Snyman and D. N. Wilke, Practical mathematical optimization: basic optimization
     theory and gradient-based algorithms; 2nd ed. Springer optimization and its applications,
     Cham: Springer, 2018.
[18] C. Audet and W. Hare, Derivative-free and blackbox optimization. Jan. 2017.
[19] R. Horst and P. Pardalos, Handbook of global optimization. Nonconvex optimization and its
     applications, Springer US, 2013.
[20] J. J. Schneider and S. Kirkpatrick, Stochastic optimization. Berlin Heidelberg, Germany:
     Springer-Ver, 2006.
[21] J. D. Anderson and J. Wendt, Computational fluid dynamics, vol. 206. Springer, 1995.
[22] B. Szabó and I. Babuška, Finite element analysis. John Wiley & Sons, 1991.
[23] K. Deb, R. Hussein, P. C. Roy, and G. Toscano-Pulido, “A taxonomy for metamodeling
     frameworks for evolutionary multiobjective optimization,” IEEE Transactions on Evolution-
     ary Computation, vol. 23, no. 1, pp. 104–116, 2019.
[24] J. Blank and K. Deb, “PSAF: A Probabilistic Surrogate-Assisted Framework for Single-
     Objective Optimization,” in GECCO ’21: Proceedings of the genetic and evolutionary
     computation conference companion, (New York, NY, USA), ACM, 2021.
[25] J. Blank and K. Deb, “Constrained bi-objective surrogate-assisted optimization of problems
     with heterogeneous evaluation times: Expensive objectives and inexpensive constraints,” in
     Evolutionary multi-criterion optimization (H. Ishibuchi, Q. Zhang, R. Cheng, K. Li, H. Li,
     H. Wang, and A. Zhou, eds.), pp. 257–269, Springer International Publishing, 2021.
[26] J. Blank and K. Deb, “Handling constrained multi-objective optimization problems with
     heterogeneous evaluation times: proof-of-principle results,” Memetic Computing, Mar. 2022.
[27] J. Blank and K. Deb, “SOLVeR: A Blueprint for Collaborative Optimization in Practice,”
     in International Multi-Conference on Complexity, Informatics and Cybernetics: IMCIC
     2021, 2021.
[28] A. Ahrari, J. Blank, K. Deb, and X. Li, “A proximity-based surrogate-assisted method
     for simulation-based design optimization of a cylinder head water jacket,” Engineering
     Optimization, pp. 1–19, 2020.
[29] J. Blank and K. Deb, “pymoo: Multi-objective Optimization in Python,” IEEE Access, vol. 8,
     pp. 89497–89509, 2020.
[30] J. Nocedal and S. J. Wright, Numerical optimization. New York, NY, USA: Springer, 2 ed.,
     2006.
[31] J. H. Holland, “Genetic Algorithms: Computer programs that "evolve" in ways that resemble
     natural selection can solve complex problems even their creators do not fully understand,”
     1960.
[32] J. H. Holland, Adaptation in natural and artificial systems. Ann Arbor, MI: University of
     Michigan Press, 1975.
[33] K. Deb and C. Myburgh, “Breaking the billion-variable barrier in real-world optimization
     using a customized evolutionary algorithm,” in Proceedings of the genetic and evolutionary
     computation conference 2016, GECCO ’16, (New York, NY, USA), pp. 653–660, Associa-
     tion for Computing Machinery, 2016.
[34] R. L. Hardy, “Multiquadric equations of topography and other irregular surfaces,” Journal
     of Geophysical Research (1896-1977), vol. 76, pp. 1905–1915, Mar. 1971.
[35] D. G. Krige, “A statistical approach to some basic mine valuation problems on the Witwa-
     tersrand, by D.G. Krige, published in the Journal, December 1951 : introduction by the
     author,” 1951.
[36] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning (adaptive
     computation and machine learning). The MIT Press, 2005.
[37] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimization of expensive
     black-box functions,” J. of Global Optimization, 1998.
[38] J. Stork, M. Friese, M. Zaefferer, T. Bartz-Beielstein, A. Fischbach, B. Breiderhoff, B. Nau-
     joks, and T. Tušar, “Open issues in surrogate-assisted optimization,” in High-performance
     simulation-based optimization (T. Bartz-Beielstein, B. Filipič, P. Korošec, and E.-G. Talbi,
     eds.), pp. 225–244, Cham: Springer International Publishing, 2020.
[39] “Scopus: Expertly curated abstract and citation database.”
[40] GECCO ’20: Proceedings of the 2020 genetic and evolutionary computation conference
     companion. Cancún, Mexico: Association for Computing Machinery, 2020.
[41] J. Zhang, Z. Zhan, Y. Lin, N. Chen, Y. Gong, J. Zhong, H. S. H. Chung, Y. Li, and
     Y. Shi, “Evolutionary computation meets machine learning: A survey,” IEEE Computational
     Intelligence Magazine, vol. 6, no. 4, pp. 68–75, 2011.
[42] Y. Jin, H. Wang, T. Chugh, D. Guo, and K. Miettinen, “Data-Driven Evolutionary Optimiza-
     tion: An Overview and Case Studies,” IEEE Transactions on Evolutionary Computation,
     vol. 23, pp. 442–458, June 2019.
[43] T. W. Simpson, T. M. Mauery, J. J. Korte, and F. Mistree, “Kriging models for global
     approximation in simulation-based multidisciplinary design optimization,” AIAA Journal,
     vol. 39, no. 12, pp. 2233–2241, 2001.
[44] Z. Zhou, Y.-S. Ong, M. H. Nguyen, and D. Lim, “A study on polynomial regression and
     Gaussian process global surrogate model in hierarchical surrogate-assisted evolutionary
     algorithm,” 2005 IEEE Congress on Evolutionary Computation, vol. 3, pp. 2832–2839,
     2005.
[45] Y. Zhang, N. H. Kim, C. Park, and R. T. Haftka, “Multifidelity surrogate based on single
     linear regression,” AIAA Journal, vol. 56, no. 12, pp. 4944–4952, 2018.
[46] I. Loshchilov, M. Schoenauer, and M. Sebag, “Comparison-based optimizers need
     comparison-based surrogates,” in Parallel problem solving from nature, PPSN XI (R. Schae-
     fer, C. Cotta, J. Kołodziej, and G. Rudolph, eds.), (Berlin, Heidelberg), pp. 364–373, Springer
     Berlin Heidelberg, 2010.
[47] A. Rosales-Pérez, J. A. Gonzalez, C. A. Coello Coello, H. J. Escalante, and C. A. Reyes-
     Garcia, “Surrogate-assisted multi-objective model selection for support vector machines,”
     Neurocomputing, vol. 150, pp. 163 – 172, 2015.
[48] J. A. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural
     processing letters, vol. 9, no. 3, pp. 293–300, 1999.
[49] D. Buche, N. N. Schraudolph, and P. Koumoutsakos, “Accelerating evolutionary algorithms
     with gaussian process fitness function models,” Trans. Sys. Man Cyber Part C, vol. 35,
     pp. 183–194, May 2005.
[50] B. Liu, Q. Zhang, and G. G. E. Gielen, “A gaussian process surrogate model assisted evolu-
     tionary algorithm for medium scale expensive optimization problems,” IEEE Transactions
     on Evolutionary Computation, vol. 18, no. 2, pp. 180–192, 2014.
[51] Z. Zhou, Y. S. Ong, P. B. Nair, A. J. Keane, and K. Y. Lum, “Combining global and local
     surrogate models to accelerate evolutionary optimization,” IEEE Transactions on Systems,
     Man, and Cybernetics, Part C (Applications and Reviews), vol. 37, no. 1, pp. 66–76, 2007.
[52] R. Hussein and K. Deb, “A generative kriging surrogate model for constrained and un-
     constrained multi-objective optimization,” in Proceedings of the genetic and evolutionary
     computation conference 2016, GECCO ’16, (New York, NY, USA), pp. 573–580, Associa-
     tion for Computing Machinery, 2016.
[53] A. Sinha, S. Bedi, and K. Deb, “Bilevel optimization based on kriging approximations of
     lower level optimal value function,” in 2018 IEEE congress on evolutionary computation
     (CEC), pp. 1–8, 2018.
[54] J. Müller and C. A. Shoemaker, “Influence of ensemble surrogate models and sampling
     strategy on the solution quality of algorithms for computationally expensive black-box global
     optimization problems,” Journal of Global Optimization, vol. 60, no. 2, pp. 123–144, 2014.
[55] S. Lophaven, H. B. Nielsen, and J. Søndergaard, “DACE – a MATLAB kriging toolbox,”
     2002.
[56] Yew-Soon Ong, P. B. Nair, and K. Y. Lum, “Max-min surrogate-assisted evolutionary
     algorithm for robust design,” IEEE Transactions on Evolutionary Computation, vol. 10,
     no. 4, pp. 392–404, 2006.
[57] Y. S. Ong, P. B. Nair, A. J. Keane, and K. W. Wong, “Surrogate-Assisted Evolutionary
     Optimization Frameworks for High-Fidelity Engineering Design Problems,” in Knowledge
     Incorporation in Evolutionary Computation (Y. Jin, ed.), pp. 307–331, Berlin, Heidelberg:
     Springer Berlin Heidelberg, 2005.
[58] Y. Tenne and S. W. Armfield, “A framework for memetic optimization using variable global
     and local surrogate models,” Soft Computing, vol. 13, no. 8, p. 781, 2008.
[59] R. G. Regis and C. A. Shoemaker, “A stochastic radial basis function method for the global
     optimization of expensive functions,” INFORMS Journal on Computing, vol. 19, no. 4,
     pp. 497–509, 2007.
[60] Z. Yang, H. Qiu, L. Gao, X. Cai, C. Jiang, and L. Chen, “Surrogate-assisted classification-
     collaboration differential evolution for expensive constrained optimization problems,” Infor-
     mation Sciences, vol. 508, pp. 50 – 63, 2020.
[61] M. Farina, “A neural network based generalized response surface multiobjective evolutionary
     algorithm,” in Proceedings of the 2002 congress on evolutionary computation. CEC’02 (cat.
     No.02TH8600), vol. 1, pp. 956–961, 2002.
[62] M. Hüsken, Y. Jin, and B. Sendhoff, “Structure optimization of neural networks for evolu-
     tionary design optimization,” Soft Computing, vol. 9, pp. 21–28, Jan. 2005.
[63] Y. Jin, M. Hüsken, M. Olhofer, and B. Sendhoff, “Neural Networks for Fitness Approximation
     in Evolutionary Optimization,” in Knowledge Incorporation in Evolutionary Computation
     (Y. Jin, ed.), pp. 281–306, Berlin, Heidelberg: Springer Berlin Heidelberg, 2005.
[64] L. Pan, C. He, Y. Tian, H. Wang, X. Zhang, and Y. Jin, “A classification-based surrogate-
     assisted evolutionary algorithm for expensive many-objective optimization,” IEEE Transac-
     tions on Evolutionary Computation, vol. 23, no. 1, pp. 74–88, 2019.
[65] M. Holeňa, D. Linke, U. Rodemerck, and L. Bajer, “Neural Networks as Surrogate Models
     for Measurements in Optimization Algorithms,” in Analytical and Stochastic Modeling
     Techniques and Applications (K. Al-Begain, D. Fiems, and W. J. Knottenbelt, eds.), (Berlin,
     Heidelberg), pp. 351–366, Springer Berlin Heidelberg, 2010.
[66] L. Niles, H. Silverman, G. Tajchman, and M. Bush, “How limited training data can allow
     a neural network to outperform an ‘optimal’ statistical classifier,” in International conference
     on acoustics, speech, and signal processing, pp. 17–20, 1989.
[67] F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Sequential model-based optimization for
     general algorithm configuration,” in Learning and intelligent optimization (C. A. C. Coello,
     ed.), (Berlin, Heidelberg), pp. 507–523, Springer Berlin Heidelberg, 2011.
[68] K. Eggensperger, M. Lindauer, H. H. Hoos, F. Hutter, and K. Leyton-Brown, “Efficient
     benchmarking of algorithm configurators via model-based surrogates,” Machine Learning,
     vol. 107, no. 1, pp. 15–41, 2018.
[69] E. O. Nsoesie, R. J. Beckman, S. Shashaani, K. S. Nagaraj, and M. V. Marathe, “A simulation
     optimization approach to epidemic forecasting,” PLOS ONE, vol. 8, pp. 1–10, June 2013.
[70] H. Wang and Y. Jin, “A random forest-assisted evolutionary algorithm for data-driven con-
     strained multiobjective combinatorial optimization of trauma systems,” IEEE Transactions
     on Cybernetics, vol. 50, no. 2, pp. 536–549, 2020.
[71] Y. Sun, H. Wang, B. Xue, Y. Jin, G. G. Yen, and M. Zhang, “Surrogate-Assisted Evolutionary
     Deep Learning Using an End-to-End Random Forest-Based Performance Predictor,” IEEE
     Transactions on Evolutionary Computation, vol. 24, pp. 350–364, Apr. 2020.
[72] L. Bajer and M. Holeňa, “Surrogate model for continuous and discrete genetic optimization
     based on RBF networks,” in Intelligent data engineering and automated learning – IDEAL
     2010 (C. Fyfe, P. Tino, D. Charles, C. Garcia-Osorio, and H. Yin, eds.), (Berlin, Heidelberg),
     pp. 251–258, Springer Berlin Heidelberg, 2010.
[73] L. P. Swiler, P. D. Hough, P. Qian, X. Xu, C. Storlie, and H. Lee, “Surrogate models for mixed
     discrete-continuous variables,” in Constraint programming and decision making (M. Ceberio
     and V. Kreinovich, eds.), pp. 181–202, Cham: Springer International Publishing, 2014.
[74] J. L. Walsh, “A closed set of normal orthogonal functions,” American Journal of Mathemat-
     ics, vol. 45, p. 5, 1923.
[75] S. Verel, B. Derbel, A. Liefooghe, H. Aguirre, and K. Tanaka, “A Surrogate Model Based
     on Walsh Decomposition for Pseudo-Boolean Functions,” in Parallel Problem Solving from
     Nature – PPSN XV (A. Auger, C. M. Fonseca, N. Lourenço, P. Machado, L. Paquete, and
     D. Whitley, eds.), (Cham), pp. 181–193, Springer International Publishing, 2018.
[76] G. Pruvost, B. Derbel, A. Liefooghe, S. Verel, and Q. Zhang, “Surrogate-assisted multi-
     objective combinatorial optimization based on decomposition and walsh basis,” in Proceed-
     ings of the 2020 genetic and evolutionary computation conference, GECCO ’20, (New York,
     NY, USA), pp. 542–550, Association for Computing Machinery, 2020.
[77] F. Chicano, D. Whitley, G. Ochoa, and R. Tinós, “Optimizing one million variable NK
     landscapes by hybridizing deterministic recombination and local search,” in Proceedings of
     the genetic and evolutionary computation conference, GECCO ’17, (New York, NY, USA),
     pp. 753–760, Association for Computing Machinery, 2017.
[78] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for hyper-parameter optimiza-
     tion,” in Proceedings of the 24th international conference on neural information processing
     systems, NIPS’11, (Red Hook, NY, USA), pp. 2546–2554, Curran Associates Inc., 2011.
[79] Y. Ozaki, Y. Tanigaki, S. Watanabe, and M. Onishi, “Multiobjective tree-structured parzen
     estimator for computationally expensive optimization problems,” in Proceedings of the 2020
     genetic and evolutionary computation conference, GECCO ’20, (New York, NY, USA),
     pp. 533–541, Association for Computing Machinery, 2020.
[80] M. N. Le, Y. S. Ong, S. Menzel, Y. Jin, and B. Sendhoff, “Evolution by adapting surrogates,”
     Evolutionary Computation, vol. 21, no. 2, pp. 313–340, 2013.
[81] R. Jin, W. Chen, and T. Simpson, “Comparative studies of metamodelling techniques under
     multiple modelling criteria,” Structural and Multidisciplinary Optimization, vol. 23, pp. 1–
     13, Dec. 2001.
[82] D. Lim, Y.-S. Ong, Y. Jin, and B. Sendhoff, “A study on metamodeling techniques, ensem-
     bles, and multi-surrogates in evolutionary computation,” in Proceedings of the 9th annual
     conference on genetic and evolutionary computation, GECCO ’07, (New York, NY, USA),
     pp. 1288–1295, Association for Computing Machinery, 2007.
[83] D. Lim, Y. Jin, Y. Ong, and B. Sendhoff, “Generalizing surrogate-assisted evolutionary
     computation,” IEEE Transactions on Evolutionary Computation, vol. 14, no. 3, pp. 329–
     355, 2010.
[84] K. S. Bhattacharjee, H. K. Singh, T. Ray, and J. Branke, “Multiple surrogate assisted multiob-
     jective optimization using improved pre-selection,” in 2016 IEEE congress on evolutionary
     computation (CEC), pp. 4328–4335, 2016.
[85] B. S. Saini, M. Lopez-Ibanez, and K. Miettinen, “Automatic surrogate modelling technique
     selection based on features of optimization problems,” in GECCO ’19: Proceedings of
     the genetic and evolutionary computation conference companion, (New York, NY, USA),
     pp. 1765–1772, ACM, 2019.
[86] A. Bhosekar and M. Ierapetritou, “Advances in surrogate based modeling, feasibility analysis,
     and optimization: A review,” Computers & Chemical Engineering, vol. 108, pp. 250 – 267,
     2018.
[87] T. Simpson, J. Poplinski, P. N. Koch, and J. Allen, “Metamodels for Computer-based
     Engineering Design: Survey and recommendations,” Engineering with Computers, vol. 17,
     pp. 129–150, July 2001.
[88] A. I. Khuri and S. Mukhopadhyay, “Response surface methodology,” Wiley Interdisciplinary
     Reviews: Computational Statistics, vol. 2, no. 2, pp. 128–149, 2010.
[89] R. H. Myers, D. C. Montgomery, and C. M. Anderson-Cook, Response surface methodology:
     process and product optimization using designed experiments. John Wiley & Sons, 2016.
[90] S. Kitayama, M. Arakawa, and K. Yamazaki, “Sequential approximate optimization using
     radial basis function network for engineering optimization,” Optimization and Engineering,
     vol. 12, no. 4, pp. 535–557, 2011.
[91] J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian optimization of machine
      learning algorithms,” in Advances in neural information processing systems, pp. 2951–2959,
      2012.
[92] D. R. Jones, “A taxonomy of global optimization methods based on response surfaces,”
      Journal of Global Optimization, vol. 21, no. 4, pp. 345–383, 2001.
[93] F. Rehbach, M. Zaefferer, B. Naujoks, and T. Bartz-Beielstein, “Expected improvement
      versus predicted value in surrogate-based optimization,” in Proceedings of the 2020 genetic
      and evolutionary computation conference, GECCO ’20, (New York, NY, USA), pp. 868–876,
      Association for Computing Machinery, 2020.
[94] R. G. Regis and C. A. Shoemaker, “Constrained global optimization of expensive black
      box functions using radial basis functions,” Journal of Global Optimization, vol. 31, no. 1,
      pp. 153–171, 2005.
[95] L. Wang, S. Shan, and G. Gary Wang, “Mode-pursuing sampling method for global op-
      timization on expensive black-box functions,” Engineering Optimization, vol. 36, no. 4,
      pp. 419–438, 2004.
[96] F. Viana and R. Haftka, “Surrogate-based optimization with parallel simulations using the
      probability of improvement,” in 13th AIAA/ISSMO multidisciplinary analysis optimization
      conference, 2010.
[97] A. Chaudhuri, R. T. Haftka, P. Ifju, K. Chang, C. Tyler, and T. Schmitz, “Experimental
      flapping wing optimization and uncertainty quantification using limited samples,” Structural
      and Multidisciplinary Optimization, pp. 957–970, Apr. 2015.
[98] L. Yaohui, “A Kriging-based global optimization method using multi-points infill search
      criterion,” Journal of Algorithms & Computational Technology, pp. 366–377, Dec. 2017.
[99] P. Beaucaire, C. Beauthier, and C. Sainvitu, “Multi-point infill sampling strategies exploiting
      multiple surrogate models,” in GECCO ’19: Proceedings of the genetic and evolutionary
      computation conference companion, (New York, NY, USA), pp. 1559–1567, ACM, 2019.
[100] N. Berveglieri, B. Derbel, A. Liefooghe, H. Aguirre, Q. Zhang, and K. Tanaka, “Designing
      parallelism in surrogate-assisted multiobjective optimization based on decomposition,” in
      Proceedings of the 2020 genetic and evolutionary computation conference, GECCO ’20,
      (New York, NY, USA), pp. 462–470, Association for Computing Machinery, 2020.
[101] J. J. Grefenstette and J. M. Fitzpatrick, “Genetic search with approximate function evaluations,”
      in Proceedings of an International Conference on Genetic Algorithms and Their Applications,
      1985.
[102] A. Ratle, “Accelerating the convergence of evolutionary algorithms by fitness landscape
      approximation,” in International Conference on Parallel Problem Solving from Nature, Springer,
      1998.
[103] Y. Jin, “A comprehensive survey of fitness approximation in evolutionary computation,” Soft
      Computing, vol. 9, pp. 3–12, Jan. 2005.
[104] Y. Jin, “Surrogate-assisted evolutionary computation: Recent advances and future challenges,”
      Swarm and Evolutionary Computation, vol. 1, no. 2, pp. 61–70, 2011.
[105] Y. Jin, M. Olhofer, and B. Sendhoff, “A framework for evolutionary optimization with
      approximate fitness functions,” IEEE Transactions on Evolutionary Computation, vol. 6,
      no. 5, pp. 481–494, 2002.
[106] F. Neri and C. Cotta, “Memetic algorithms and memetic computing optimization: A literature
      review,” Swarm and Evolutionary Computation, pp. 1–14, Feb. 2012.
[107] R. Storn and K. Price, “Differential Evolution – A Simple and Efficient Heuristic for Global
      Optimization over Continuous Spaces,” Journal of Global Optimization, pp. 341–359, Dec.
      1997.
[108] A. K. Qin, V. L. Huang, and P. N. Suganthan, “Differential evolution algorithm with strategy
      adaptation for global numerical optimization,” IEEE Transactions on Evolutionary
      Computation, vol. 13, no. 2, pp. 398–417, 2008.
[109] X. Lu, K. Tang, and X. Yao, “Classification-assisted Differential Evolution for computa-
      tionally expensive problems,” in 2011 IEEE congress of evolutionary computation (CEC),
      pp. 1986–1993, 2011.
[110] E. Krempser, H. Bernardino, H. Barbosa, and A. Lemonge, “Differential evolution assisted by
      surrogate models for structural optimization problems,” vol. 100 of Civil-Comp Proceedings,
      Jan. 2012.
[111] Y. Wang, D. Yin, S. Yang, and G. Sun, “Global and local surrogate-assisted differential
      evolution for expensive constrained optimization problems with inequality constraints,” IEEE
      Transactions on Cybernetics, vol. 49, no. 5, pp. 1642–1656, 2019.
[112] N. Hansen and A. Ostermeier, “Completely derandomized self-adaptation in evolution strate-
      gies,” Evolutionary Computation, vol. 9, pp. 159–195, June 2001.
[113] N. Hansen, “The CMA Evolution Strategy: A Comparing Review,” in Towards a New
      Evolutionary Computation: Advances in the Estimation of Distribution Algorithms (J. A.
      Lozano, P. Larrañaga, I. Inza, and E. Bengoetxea, eds.), pp. 75–102, Berlin, Heidelberg:
      Springer Berlin Heidelberg, 2006.
[114] C. G. Atkeson, A. W. Moore, and S. Schaal, “Locally weighted learning,” Artificial Intelli-
      gence Review, vol. 11, pp. 11–73, Feb. 1997.
[115] S. Kern, N. Hansen, and P. Koumoutsakos, “Local meta-models for optimization using
      evolution strategies,” in Parallel problem solving from nature - PPSN IX (T. P. Runarsson,
      H.-G. Beyer, E. Burke, J. J. Merelo-Guervós, L. D. Whitley, and X. Yao, eds.), (Berlin,
      Heidelberg), pp. 939–948, Springer Berlin Heidelberg, 2006.
[116] I. Loshchilov, M. Schoenauer, and M. Sebag, “Intensive surrogate model exploitation in self-adaptive
      surrogate-assisted CMA-ES (saACM-ES),” in Proceedings of the 15th annual conference
      on genetic and evolutionary computation, GECCO ’13, (New York, NY, USA), pp. 439–446,
      Association for Computing Machinery, 2013.
[117] L. Bajer, Z. Pitra, J. Repický, and M. Holeňa, “Gaussian process surrogate models for
      the CMA-ES,” in GECCO ’19: Proceedings of the genetic and evolutionary computation
      conference companion, (New York, NY, USA), pp. 17–18, ACM, 2019.
[118] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of ICNN’95 -
      international conference on neural networks, vol. 4, pp. 1942–1948, 1995.
[119] Z. Lv, L. Wang, Z. Han, J. Zhao, and W. Wang, “Surrogate-assisted particle swarm opti-
      mization algorithm with Pareto active learning for expensive multi-objective optimization,”
      IEEE/CAA Journal of Automatica Sinica, vol. 6, no. 3, pp. 838–849, 2019.
[120] H. Wang, Y. Jin, and J. Doherty, “Committee-based active learning for surrogate-assisted
      particle swarm optimization of expensive problems,” IEEE Transactions on Cybernetics,
      vol. 47, no. 9, pp. 2664–2677, 2017.
[121] T. Chugh, K. Sindhya, J. Hakanen, and K. Miettinen, “A survey on handling computa-
      tionally expensive multiobjective optimization problems with evolutionary algorithms,” Soft
      Computing, vol. 23, no. 9, pp. 3137–3166, 2019.
[122] R. Allmendinger, M. T. M. Emmerich, J. Hakanen, Y. Jin, and E. Rigoni, “Surrogate-assisted
      multicriteria optimization: Complexities, prospective solutions, and business case,” Journal
      of Multi-Criteria Decision Analysis, vol. 24, no. 1-2, pp. 5–24, 2017.
[123] J. Zhang, A. Zhou, and G. Zhang, “A classification and Pareto domination based multiobjective
      evolutionary algorithm,” in 2015 IEEE Congress on Evolutionary Computation (CEC),
      pp. 2883–2890, 2015.
[124] T. Chugh, Y. Jin, K. Miettinen, J. Hakanen, and K. Sindhya, “A surrogate-assisted reference
      vector guided evolutionary algorithm for computationally expensive many-objective opti-
      mization,” IEEE Transactions on Evolutionary Computation, vol. 22, no. 1, pp. 129–142,
      2018.
[125] R. Cheng, Y. Jin, M. Olhofer, and B. Sendhoff, “A reference vector guided evolutionary algo-
      rithm for many-objective optimization,” IEEE Transactions on Evolutionary Computation,
      vol. 20, no. 5, pp. 773–791, 2016.
[126] J. Knowles, “ParEGO: a hybrid algorithm with on-line landscape approximation for expen-
      sive multiobjective optimization problems,” IEEE Transactions on Evolutionary Computa-
      tion, vol. 10, no. 1, pp. 50–66, 2006.
[127] R. E. Steuer and E.-U. Choo, “An interactive weighted Tchebycheff procedure for multiple
      objective programming,” Mathematical Programming, vol. 26, no. 3, pp. 326–344, 1983.
[128] C. M. Cristescu and J. Knowles, “Surrogate-based multiobjective optimization: ParEGO
      update and test,” 2015.
[129] M. Li, G. Li, and S. Azarm, “A kriging metamodel assisted multi-objective genetic algorithm
      for design optimization,” Journal of Mechanical Design, vol. 130, Feb. 2008.
[130] W. Ponweiser, T. Wagner, D. Biermann, and M. Vincze, “Multiobjective optimization on
      a limited budget of evaluations using model-assisted S-metric selection,” in
      Parallel problem solving from nature – PPSN X (G. Rudolph, T. Jansen, N. Beume, S. Lucas,
      and C. Poloni, eds.), (Berlin, Heidelberg), pp. 784–794, Springer Berlin Heidelberg, 2008.
[131] E. Zitzler and L. Thiele, “Multiobjective optimization using evolutionary algorithms – A
      comparative case study,” in Parallel problem solving from nature – PPSN V (A. E. Eiben,
      T. Bäck, M. Schoenauer, and H.-P. Schwefel, eds.), (Berlin, Heidelberg), pp. 292–301,
      Springer Berlin Heidelberg, 1998.
[132] V. Picheny, “Multiobjective optimization using Gaussian process emulators via stepwise
      uncertainty reduction,” Statistics and Computing, vol. 25, no. 6, pp. 1265–1280, 2015.
[133] A. A. M. Rahat, R. M. Everson, and J. E. Fieldsend, “Alternative infill strategies for expensive
      multi-objective optimisation,” in Proceedings of the genetic and evolutionary computation
      conference, GECCO ’17, (New York, NY, USA), pp. 873–880, Association for Computing
      Machinery, 2017.
[134] Q. Zhang, W. Liu, E. Tsang, and B. Virginas, “Expensive multiobjective optimization by
      MOEA/D with Gaussian process model,” IEEE Transactions on Evolutionary Computation,
      vol. 14, no. 3, pp. 456–474, 2010.
[135] J. C. Bezdek, Pattern recognition with fuzzy objective function algorithms. USA: Kluwer
      Academic Publishers, 1981.
[136] A. Habib, H. K. Singh, T. Chugh, T. Ray, and K. Miettinen, “A multiple surrogate assisted
      decomposition-based evolutionary algorithm for expensive Multi/Many-Objective optimiza-
      tion,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 6, pp. 1000–1014, 2019.
[137] X. Wang, Y. Jin, S. Schmitt, and M. Olhofer, “An adaptive Bayesian approach to surrogate-
      assisted evolutionary multi-objective optimization,” Information Sciences, vol. 519, pp. 317–
      331, 2020.
[138] S. Bagheri, W. Konen, R. Allmendinger, J. Branke, K. Deb, J. Fieldsend, D. Quagliarella,
      and K. Sindhya, “Constraint handling in efficient global optimization,” in Proceedings of
      the genetic and evolutionary computation conference, GECCO ’17, (New York, NY, USA),
      pp. 673–680, Association for Computing Machinery, 2017.
[139] T. Chugh, K. Sindhya, K. Miettinen, J. Hakanen, and Y. Jin, “On constraint handling in
      surrogate-assisted evolutionary many-objective optimization,” in Parallel problem solving
      from nature – PPSN XIV (J. Handl, E. Hart, P. R. Lewis, M. López-Ibáñez, G. Ochoa, and
      B. Paechter, eds.), (Cham), pp. 214–224, Springer International Publishing, 2016.
[140] R. Allmendinger and J. Knowles, “‘Hang on a minute’: Investigations on the effects of
      delayed objective functions in multiobjective optimization,” in Evolutionary multi-criterion
      optimization (R. C. Purshouse, P. J. Fleming, C. M. Fonseca, S. Greco, and J. Shaw, eds.),
      (Berlin, Heidelberg), pp. 6–20, Springer Berlin Heidelberg, 2013.
[141] R. Allmendinger, J. Handl, and J. Knowles, “Multiobjective optimization: When objectives
      exhibit non-uniform latencies,” European Journal of Operational Research, vol. 243, no. 2,
      pp. 497–513, 2015.
[142] T. Chugh, R. Allmendinger, V. Ojalehto, and K. Miettinen, “Surrogate-assisted evolutionary
      biobjective optimization for objectives with non-uniform latencies,” in Proceedings of the
      genetic and evolutionary computation conference, GECCO ’18, (New York, NY, USA),
      pp. 609–616, Association for Computing Machinery, 2018.
[143] J. Thomann and G. Eichfelder, “A trust-region algorithm for heterogeneous multiobjective
      optimization,” SIAM Journal on Optimization, vol. 29, no. 2, pp. 1017–1047, 2019.
[144] X. Wang, Y. Jin, S. Schmitt, and M. Olhofer, “Transfer learning for Gaussian process assisted
      evolutionary bi-objective optimization for objectives with different evaluation times,” in
      Proceedings of the 2020 genetic and evolutionary computation conference, GECCO ’20,
      (New York, NY, USA), pp. 587–594, Association for Computing Machinery, 2020.
[145] X. Ruan, K. Li, B. Derbel, and A. Liefooghe, “Surrogate assisted evolutionary algorithm for
      medium scale multi-objective optimisation problems,” in Proceedings of the 2020 genetic
      and evolutionary computation conference, GECCO ’20, (New York, NY, USA), pp. 560–568,
      Association for Computing Machinery, 2020.
[146] F. Rehbach, L. Gentile, and T. Bartz-Beielstein, “Feature selection for surrogate model-based
      optimization,” in GECCO ’19: Proceedings of the genetic and evolutionary computation
      conference companion, (New York, NY, USA), pp. 399–400, ACM, 2019.
[147] F. Rehbach, L. Gentile, and T. Bartz-Beielstein, “Variable reduction for surrogate-based
      optimization,” in Proceedings of the 2020 genetic and evolutionary computation conference,
      GECCO ’20, (New York, NY, USA), pp. 1177–1185, Association for Computing Machinery,
      2020.
[148] J.-A. Mejía-de Dios and E. Mezura-Montes, “A surrogate-assisted metaheuristic for bilevel
      optimization,” in Proceedings of the 2020 genetic and evolutionary computation conference,
      GECCO ’20, (New York, NY, USA), pp. 629–635, Association for Computing Machinery,
      2020.
[149] Z. Wang and M. Ierapetritou, “A Novel Surrogate-Based Optimization Method for Black-
      Box Simulation with Heteroscedastic Noise,” Industrial & Engineering Chemistry Research,
      vol. 56, pp. 10720–10732, Sept. 2017.
[150] A. I. Forrester, A. Sóbester, and A. J. Keane, “Multi-fidelity optimization via surrogate
      modelling,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering
      Sciences, vol. 463, no. 2088, pp. 3251–3269, 2007.
[151] H. Wang, Y. Jin, and J. Doherty, “A generic test suite for evolutionary multifidelity opti-
      mization,” IEEE Transactions on Evolutionary Computation, vol. 22, no. 6, pp. 836–850,
      2018.
[152] J. Richter, J. Shi, J.-J. Chen, J. Rahnenführer, and M. Lang, “Model-based optimization with
      concept drifts,” in Proceedings of the 2020 genetic and evolutionary computation conference,
      GECCO ’20, (New York, NY, USA), pp. 877–885, Association for Computing Machinery,
      2020.
[153] Z. Lu, I. Whalen, V. Boddeti, Y. Dhebar, K. Deb, E. Goodman, and W. Banzhaf, “NSGA-
      Net: Neural architecture search using multi-objective genetic algorithm,” in Proceedings of
      the genetic and evolutionary computation conference, GECCO ’19, (New York, NY, USA),
      pp. 419–427, Association for Computing Machinery, 2019.
[154] A. Fabisch, “Empirical evaluation of contextual policy search with a comparison-based
      surrogate model and active covariance matrix adaptation,” in GECCO ’19: Proceedings of
      the genetic and evolutionary computation conference companion, (New York, NY, USA),
      pp. 251–252, ACM, 2019.
[155] O. Francon, S. Gonzalez, B. Hodjat, E. Meyerson, R. Miikkulainen, X. Qiu, and H. Shahrzad,
      “Effective reinforcement learning through evolutionary surrogate-assisted prescription,” in
      Proceedings of the 2020 genetic and evolutionary computation conference, GECCO ’20,
      (New York, NY, USA), pp. 814–822, Association for Computing Machinery, 2020.
[156] M. Rochoux, S. Ricci, D. Lucor, B. Cuenot, and A. Trouvé, “Towards predictive data-driven
      simulations of wildfire spread – Part I: Reduced-cost Ensemble Kalman Filter based on a
      Polynomial Chaos surrogate model for parameter estimation,” Natural Hazards and Earth
      System Sciences, vol. 14, no. 11, pp. 2951–2973, 2014.
[157] I. Gidaris, A. A. Taflanidis, and G. P. Mavroeidis, “Kriging metamodeling in seismic risk as-
      sessment based on stochastic ground motion models,” Earthquake Engineering & Structural
      Dynamics, vol. 44, pp. 2377–2399, Nov. 2015.
[158] I. Halachmi, J. Metz, A. Van’t Land, S. Halachmi, and J. Kleijnen, “Case study: Opti-
      mal facility allocation in a robotic milking barn,” Transactions of the American Society of
      Agricultural Engineers, vol. 45, no. 5, pp. 1539–1546, 2002.
[159] D. Broad, H. Maier, and G. Dandy, “Optimal operation of complex water distribution systems
      using metamodels,” Journal of Water Resources Planning and Management, vol. 136, no. 4,
      pp. 433–443, 2010.
[160] B. Horowitz, S. M. B. Afonso, and C. V. P. de Mendonça, “Surrogate based optimal waterflooding
      management,” Journal of Petroleum Science and Engineering, vol. 112, pp. 206–219,
      2013.
[161] B. Ataie-Ashtiani, H. Ketabchi, and M. Rajabi, “Optimal management of a freshwater lens in
      a small Island using surrogate models and evolutionary algorithms,” Journal of Hydrologic
      Engineering, vol. 19, no. 2, pp. 339–354, 2014.
[162] M. J. Asher, B. F. W. Croke, A. J. Jakeman, and L. J. M. Peeters, “A review of surrogate
      models and their application to groundwater modeling,” Water Resources Research, vol. 51,
      pp. 5957–5973, Aug. 2015.
[163] C. Zheng, Surrogate-Assisted Evolutionary Algorithms for Wind Farm Layout Optimisation
      Problem. PhD thesis, University of Waikato, 2016.
[164] H. Klie, “Physics-Based and Data-Driven Surrogates for Production Forecasting,” in SPE-
      173206-MS, (SPE), p. 18, Society of Petroleum Engineers, Feb. 2015.
[165] S. Agada, S. Geiger, A. Elsheikh, and S. Oladyshkin, “Data-driven surrogates for rapid
      simulation and optimization of WAG injection in fractured carbonate reservoirs,” Petroleum
      Geoscience, vol. 23, p. 270, May 2017.
[166] M. Hussain, A. Javadi, A. Ahangar-Asr, and R. Farmani, “A surrogate model for simulation-
      optimization of aquifer systems subjected to seawater intrusion,” Journal of Hydrology,
      vol. 523, pp. 542–554, 2015.
[167] R. Sayyadi and A. Awasthi, “A simulation-based optimisation approach for identifying
      key determinants for sustainable transportation planning,” International Journal of Systems
      Science: Operations and Logistics, vol. 5, no. 2, pp. 161–174, 2018.
[168] S. Deshpande, L. T. Watson, J. Shu, F. A. Kamke, and N. Ramakrishnan, “Data driven
      surrogate-based optimization in the problem solving environment WBCSim,” Engineering
      with Computers, vol. 27, no. 3, pp. 211–223, 2011.
[169] V. Drouet, S. Verel, and J.-M. Do, “Surrogate-assisted asynchronous multiobjective algo-
      rithm for nuclear power plant operations,” in Proceedings of the 2020 genetic and evo-
      lutionary computation conference, GECCO ’20, (New York, NY, USA), pp. 1073–1081,
      Association for Computing Machinery, 2020.
[170] N. O. Nikitin, P. Vychuzhanin, A. Hvatov, I. Deeva, A. V. Kalyuzhnaya, and S. V. Kovalchuk,
      “Deadline-driven approach for multi-fidelity surrogate-assisted environmental model cali-
      bration: SWAN wind wave model case study,” in GECCO ’19: Proceedings of the genetic
      and evolutionary computation conference companion, (New York, NY, USA), pp. 1583–
      1591, ACM, 2019.
[171] F. S. Royce, J. W. Jones, and J. W. Hansen, “Model-based optimization of crop management
      for climate forecast applications,” Transactions of
      the ASAE, vol. 44, no. 5, p. 1319, 2001.
[172] P. C. Roy, A. Guber, M. Abouali, A. P. Nejadhashemi, K. Deb, and A. J. M. Smucker,
      “Simulation Optimization of Water Usage and Crop Yield Using Precision Irrigation,” in
      Evolutionary Multi-Criterion Optimization (K. Deb, E. Goodman, C. A. Coello Coello,
      K. Klamroth, K. Miettinen, S. Mostaghim, and P. Reed, eds.), (Cham), pp. 695–706, Springer
      International Publishing, 2019.
[173] S. Grihon, E. Burnaev, M. Belyaev, and P. Prikhodko, “Surrogate modeling of stability
      constraints for optimization of composite structures,” in Surrogate-based modeling and
      optimization: Applications in engineering (S. Koziel and L. Leifsson, eds.), pp. 359–391,
      New York, NY: Springer New York, 2013.
[174] L. Leifsson, S. Koziel, E. Jonsson, and S. Ogurtsov, “Aerodynamic shape optimization by
      space mapping,” in Surrogate-based modeling and optimization: Applications in engineering
      (S. Koziel and L. Leifsson, eds.), pp. 213–245, New York, NY: Springer New York, 2013.
[175] S. Ulaganathan and N. Asproulis, “Surrogate models for aerodynamic shape optimisation,”
      in Surrogate-based modeling and optimization: Applications in engineering (S. Koziel and
      L. Leifsson, eds.), pp. 285–312, New York, NY: Springer New York, 2013.
[176] A. Amrit and L. Leifsson, “Applications of surrogate-assisted and multi-fidelity multi-
      objective optimization algorithms to simulation-based aerodynamic design,” Engineering
      Computations (Swansea, Wales), vol. 37, no. 2, pp. 430–457, 2019.
[177] A. J. Booker, J. E. Dennis, P. D. Frank, D. B. Serafini, and V. Torczon, “Optimization Using
      Surrogate Objectives on a Helicopter Test Example,” in Computational Methods for Optimal
      Design and Control: Proceedings of the AFOSR Workshop on Optimal Design and Control
      Arlington, Virginia 30 September–3 October, 1997 (J. Borggaard, J. Burns, E. Cliff, and
      S. Schreck, eds.), pp. 49–58, Boston, MA: Birkhäuser Boston, 1998.
[178] F. Duchaine, T. Morel, and L. Gicquel, “Computational-fluid-dynamics-based kriging opti-
      mization tool for aeronautical combustion chambers,” AIAA Journal, vol. 47, no. 3, pp. 631–
      645, 2009.
[179] H. Yin, H. Fang, G. Wen, Q. Wang, and Y. Xiao, “An adaptive RBF-based multi-objective
      optimization method for crashworthiness design of functionally graded multi-cell tube,”
      Structural and Multidisciplinary Optimization, vol. 53, no. 1, pp. 129–144, 2016.
[180] P. Zhu, F. Pan, W. Chen, and S. Zhang, “Use of support vector regression in structural
      optimization: Application to vehicle crashworthiness design,” Mathematics and Computers
      in Simulation, vol. 86, pp. 21–31, 2012.
[181] P. Zhu, Y. Zhang, and G.-L. Chen, “Metamodel-based lightweight design of an automotive
      front-body structure using robust optimization,” Proceedings of the Institution of Mechanical
      Engineers, Part D: Journal of Automobile Engineering, vol. 223, no. 9, pp. 1133–1147, 2009.
[182] A. Parnianifard, A. S. Azfanizam, M. K. A. Ariffin, and M. I. S. Ismail, “Kriging-assisted
      robust black-box simulation optimization in direct speed control of DC motor under uncer-
      tainty,” IEEE Transactions on Magnetics, vol. 54, no. 7, pp. 1–10, 2018.
[183] E. Gengembre, B. Ladevie, O. Fudym, and A. Thuillier, “A Kriging constrained efficient
      global optimization approach applied to low-energy building design problems,” Inverse
      Problems in Science and Engineering, vol. 20, no. 7, pp. 1101–1114, 2012.
[184] L. Van Gelder, P. Das, H. Janssen, and S. Roels, “Comparative study of metamodelling tech-
      niques in building energy simulation: Guidelines for practitioners,” Simulation Modelling
      Practice and Theory, vol. 49, pp. 245–257, 2014.
[185] A.-T. Nguyen, S. Reiter, and P. Rigo, “A review on simulation-based optimization methods
      applied to building performance analysis,” Applied Energy, vol. 113, pp. 1043–1058, 2014.
[186] S. Tseranidis, N. C. Brown, and C. T. Mueller, “Data-driven approximation algorithms for
      rapid performance evaluation and optimization of civil structures,” Automation in Construc-
      tion, vol. 72, pp. 279–293, Dec. 2016.
[187] D. Guo, T. Chai, J. Ding, and Y. Jin, “Small data driven evolutionary multi-objective
      optimization of fused magnesium furnaces,” in 2016 IEEE symposium series on computa-
      tional intelligence (SSCI), pp. 1–8, 2016.
[188] T. Chugh, N. Chakraborti, K. Sindhya, and Y. Jin, “A data-driven surrogate-assisted evolu-
      tionary algorithm applied to a many-objective blast furnace optimization problem,” Materials
      and Manufacturing Processes, vol. 32, no. 10, pp. 1172–1178, 2017.
[189] J. A. Easum, J. Nagar, and D. H. Werner, “Multi-objective surrogate-assisted optimization
      applied to patch antenna design,” in 2017 IEEE international symposium on antennas
      and propagation & USNC/URSI national radio science meeting, pp. 339–340, 2017.
[190] I. Danjuma, M. Akinsolu, C. See, R. Abd-Alhameed, and B. Liu, “Design and optimization
      of a slotted monopole antenna for ultra-wide band body centric imaging applications,” IEEE
      Journal of Electromagnetics, RF and Microwaves in Medicine and Biology, vol. 4, no. 2,
      pp. 140–147, 2020.
[191] J. P. Jacobs, S. Koziel, and L. Leifsson, “Bayesian support vector regression modeling of
      microwave structures for design applications,” in Surrogate-based modeling and optimiza-
      tion: Applications in engineering (S. Koziel and L. Leifsson, eds.), pp. 121–145, New York,
      NY: Springer New York, 2013.
[192] S. Koziel, Q. S. Cheng, and J. W. Bandler, “Fast EM modeling exploiting shape-preserving
      response prediction and space mapping,” IEEE Transactions on Microwave Theory and
      Techniques, vol. 62, no. 3, pp. 399–407, 2014.
[193] S. Koziel, L. Leifsson, and S. Ogurtsov, “Space mapping for electromagnetic-simulation-
      driven design optimization,” in Surrogate-based modeling and optimization: Applications
      in engineering (S. Koziel and L. Leifsson, eds.), pp. 1–25, New York, NY: Springer New
      York, 2013.
[194] A.-K. S. O. Hassan and A. S. A. Mohamed, “Surrogate-based circuit design centering,” in
      Surrogate-based modeling and optimization: Applications in engineering (S. Koziel and
      L. Leifsson, eds.), pp. 27–49, New York, NY: Springer New York, 2013.
[195] J. Caballero and I. Grossmann, “An algorithm for the use of surrogate models in modular
      flowsheet optimization,” AIChE Journal, vol. 54, no. 10, pp. 2633–2650, 2008.
[196] S. Lucidi, M. Maurici, L. Paulon, F. Rinaldi, and M. Roma, “A simulation-based multiob-
      jective optimization approach for health care service management,” IEEE Transactions on
      Automation Science and Engineering, vol. 13, no. 4, pp. 1480–1491, 2016.
[197] H. Guo, S. Gao, K. Tsui, and T. Niu, “Simulation optimization for medical staff configuration
      at emergency department in Hong Kong,” IEEE Transactions on Automation Science and
      Engineering, vol. 14, no. 4, pp. 1655–1665, 2017.
[198] F. Zeinali, M. Mahootchi, and M. Sepehri, “Resource planning in the emergency departments:
      A simulation-based metamodeling approach,” Simulation Modelling Practice and Theory,
      vol. 53, pp. 123–138, 2015.
[199] G. Deng, D. G. Fryback, and V. Kuruchittham, “Breast cancer epidemiology: calibrating
      simulations via optimization,” 2005.
[200] S. Bhattarai, S. Klimov, M. Aleskandarany, H. Burrell, A. Wormall, A. Green, P. Rida,
      I. Ellis, R. Osan, E. Rakha, and R. Aneja, “Machine learning-based prediction of breast
      cancer growth rate in vivo,” British Journal of Cancer, vol. 121, no. 6, pp. 497–504, 2019.
[201] C. Spanakis, E. Mathioudakis, M. Tsiknakis, N. Kampanis, and K. Marias, “Function
      approximation for medical image registration,” in 2018 41st international conference on
      telecommunications and signal processing (TSP), pp. 1–5, 2018.
[202] P. A. Romero, A. Krause, and F. H. Arnold, “Navigating the protein fitness landscape with
      Gaussian processes,” Proceedings of the National Academy of Sciences of the United States
      of America, vol. 110, pp. E193–E201, Jan. 2013.
[203] L. Y. Hsieh, E. Huang, and C. Chen, “Equipment utilization enhancement in photolithography
      area through a dynamic system control using multi-fidelity simulation optimization with
      big data technique,” IEEE Transactions on Semiconductor Manufacturing, vol. 30, no. 2,
      pp. 166–175, 2017.
[204] C. Almeder, M. Preusser, and R. F. Hartl, “Simulation and optimization of supply chains:
      alternative or complementary approaches?,” OR Spectrum. Quantitative Approaches in Man-
      agement, vol. 31, no. 1, pp. 95–119, 2009.
[205] C. Osorio and M. Bierlaire, “A surrogate model for traffic optimization of congested networks:
      an analytic queueing network approach,” Jan. 2009.
[206] C. Osorio and M. Bierlaire, “A simulation-based optimization framework for urban traffic
      control,” tech. rep., 2010.
[207] Y. Zhao, P. A. Ioannou, and M. M. Dessouky, “Dynamic multimodal freight routing using
      a co-simulation optimization approach,” IEEE Transactions on Intelligent Transportation
      Systems, vol. 20, no. 7, pp. 2657–2667, 2019.
[208] W. E. Biles, J. P. C. Kleijnen, W. C. M. van Beers, and I. van Nieuwenhuyse, “Kriging
      metamodeling in constrained simulation optimization: an explorative study,” in 2007 Winter
      Simulation Conference, pp. 355–362, Dec. 2007.
[209] W. Ye and F. You, “A computationally efficient simulation-based optimization method with
      region-wise surrogate modeling for stochastic inventory management of supply chains with
      general network structures,” Computers & Chemical Engineering, vol. 87, pp. 164–179,
      2016.
[210] S. Nguyen, M. Zhang, M. Johnston, and K. C. Tan, “Selection schemes in surrogate-assisted
      genetic programming for job shop scheduling,” in Simulated evolution and learning (G. Dick,
      W. N. Browne, P. Whigham, M. Zhang, L. T. Bui, H. Ishibuchi, Y. Jin, X. Li, Y. Shi, P. Singh,
      K. C. Tan, and K. Tang, eds.), (Cham), pp. 656–667, Springer International Publishing, 2014.
[211] F. Amiri, B. Shirazi, and A. Tajdin, “Multi-objective simulation optimization for uncer-
      tain resource assignment and job sequence in automated flexible job shop,” Applied Soft
      Computing Journal, vol. 75, pp. 190–202, 2019.
[212] M.-E. Iacob, D. Quartel, and H. Jonkers, “Capturing business strategy and value in enterprise
      architecture to support portfolio valuation,” pp. 11–20, 2012.
[213] M. G. Villarreal-Marroquín, J. D. Svenson, F. Sun, T. J. Santner,
      A. Dean, and J. M. Castro, “A comparison of two metamodel-based methodologies
      for multiple criteria simulation optimization using an injection molding case study,” Journal
      of Polymer Engineering, vol. 33, no. 3, pp. 193–209, 2013.
[214] T. Gabor and P. Altmann, “Benchmarking surrogate-assisted genetic recommender systems,”
      in GECCO ’19: Proceedings of the genetic and evolutionary computation conference com-
      panion, (New York, NY, USA), pp. 1568–1575, ACM, 2019.
[215] A. Antonakis and K. Giannakoglou, “Optimisation of military aircraft engine maintenance
      subject to engine part shortages using asynchronous metamodel-assisted particle swarm
      optimisation and Monte-Carlo simulations,” International Journal of Systems Science: Op-
      erations and Logistics, vol. 5, no. 3, pp. 239–252, 2018.
[216] J. Blank and K. Deb, “GPSAF: A Generalized Probabilistic Surrogate-Assisted Framework
      for Constrained Single- and Multi-objective Optimization,” Tech. Rep. 2022004, Compu-
      tational Optimization and Innovation Laboratory, Michigan State University, East Lansing,
      MI-48824, USA, 2022.
[217] W. Luo, R. Yi, B. Yang, and P. Xu, “Surrogate-assisted evolutionary framework for data-
      driven dynamic optimization,” IEEE Transactions on Emerging Topics in Computational
      Intelligence, vol. 3, no. 2, pp. 137–150, 2019.
[218] S. Olafsson and J. Kim, “Simulation optimization,” in Proceedings of the winter simulation
      conference, vol. 1, pp. 79–84, 2002.
[219] R. T. Haftka, D. Villanueva, and A. Chaudhuri, “Parallel surrogate-assisted global optimization
      with expensive functions – a survey,” Structural and Multidisciplinary Optimization,
      vol. 54, no. 1, pp. 3–13, 2016.
[220] R. Storn, “On the usage of differential evolution for function optimization,” in Proceedings
      of North American Fuzzy Information Processing, pp. 519–523, 1996.
[221] M. Emmerich, A. Giotis, M. Özdemir, T. Bäck, and K. Giannakoglou, “Metamodel-assisted
      evolution strategies,” in Parallel problem solving from nature – PPSN
      VII (J. J. M. Guervós, P. Adamidis, H.-G. Beyer, H.-P. Schwefel, and J.-L. Fernández-
      Villacañas, eds.), (Berlin, Heidelberg), pp. 361–370, Springer Berlin Heidelberg, 2002.
      tex.ids= 2002-emmerich-metamodel-assisted.
[222] C. Sun, Y. Jin, R. Cheng, J. Ding, and J. Zeng, “Surrogate-assisted cooperative swarm
      optimization of high-dimensional expensive problems,” IEEE Transactions on Evolutionary
      Computation, vol. 21, no. 4, pp. 644–660, 2017.
[223] X. Cai, L. Gao, and X. Li, “Efficient generalized surrogate-assisted evolutionary algorithm for
      high-dimensional expensive problems,” IEEE Transactions on Evolutionary Computation,
      vol. 24, no. 2, pp. 365–379, 2020.
[224] Y. S. Ong, P. B. Nair, and A. J. Keane, “Evolutionary optimization of computationally
      expensive problems via surrogate modeling,” AIAA Journal, vol. 41, no. 4, pp. 687–696,
      2003.
[225] T. Chugh, Y. Jin, K. Miettinen, J. Hakanen, and K. Sindhya, K-RVEA: A Kriging-assisted
      evolutionary algorithm for many-objective optimization. Mar. 2016.
[226] S. Bagheri, W. Konen, M. Emmerich, and T. Bäck, “Self-adjusting parameter control for
      surrogate-assisted constrained optimization under limited budgets,” Applied Soft Computing,
      vol. 61, pp. 377–393, 2017.
[227] M. M. Islam, H. K. Singh, and T. Ray, “A surrogate assisted approach for single-objective
      bilevel optimization,” IEEE Transactions on Evolutionary Computation, vol. 21, no. 5,
      pp. 681–696, 2017.
[228] J. Müller, “MISO: mixed-integer surrogate optimization framework,” Optimization and
      Engineering, vol. 17, pp. 177–203, Mar. 2016.
[229] J. Müller and J. D. Woodbury, “GOSAC: global optimization with surrogate approximation
      of constraints,” Journal of Global Optimization, vol. 69, pp. 117–136, Sept. 2017.
[230] W. J. Roux, N. Stander, and R. T. Haftka, “Response surface approximations for structural
      optimization,” International Journal for Numerical Methods in Engineering, vol. 42, no. 3,
      pp. 517–534, 1998.
[231] R. H. Myers, A. I. Khuri, and W. H. Carter, “Response surface methodology: 1966–1988,”
      Technometrics, vol. 31, no. 2, pp. 137–157, 1989.
[232] N. Hansen, “A global surrogate assisted CMA-ES,” in Proceedings of the genetic and
      evolutionary computation conference, GECCO ’19, (New York, NY, USA), pp. 664–672,
      Association for Computing Machinery, 2019.
[233] N. Hansen, A. Auger, R. Ros, O. Mersmann, T. Tušar, and D. Brockhoff, “COCO: A platform
      for comparing continuous optimizers in a black-box setting,” Optimization Methods and
      Software, 2020.
[234] M. D. McKay, R. J. Beckman, and W. J. Conover, “Comparison of three methods for selecting
      values of input variables in the analysis of output from a computer code,” Technometrics,
      vol. 21, no. 2, pp. 239–245, 1979.
[235] E. Wegman, “Hyperdimensional data analysis using parallel coordinates,” Journal of the
      American Statistical Association, vol. 85, pp. 664–675, Sept. 1990.
[236] Z. Zhan, J. Zhang, Y. Li, and H. S. Chung, “Adaptive particle swarm optimization,” IEEE
      Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 6,
      pp. 1362–1381, 2009.
[237] The GPyOpt authors, “GPyOpt: A Bayesian optimization framework in Python,” 2016.
[238] G. De Ath, R. M. Everson, J. E. Fieldsend, and A. A. M. Rahat, “ε-shotgun,”
      Proceedings of the 2020 Genetic and Evolutionary Computation Conference, June 2020.
[239] M. G. Kendall, “A new measure of rank correlation,” Biometrika, vol. 30, no. 1/2, pp. 81–93,
      1938.
[240] C. A. Coello Coello and M. Reyes Sierra, “A study of the parallelization of a coevolutionary
      multi-objective evolutionary algorithm,” in MICAI 2004: Advances in artificial intelligence
      (R. Monroy, G. Arroyo-Figueroa, L. E. Sucar, and H. Sossa, eds.), (Berlin, Heidelberg),
      pp. 688–697, Springer Berlin Heidelberg, 2004.
[241] E. Zitzler, D. Brockhoff, and L. Thiele, “The Hypervolume Indicator Revisited: On the
      Design of Pareto-compliant Indicators via Weighted Integration,” in Proceedings of the 4th
      International Conference on Evolutionary Multi-criterion Optimization, EMO’07, (Berlin,
      Heidelberg), pp. 862–876, Springer-Verlag, 2007.
[242] J. Blanchard, C. Beauthier, and T. Carletti, “A surrogate-assisted cooperative co-evolutionary
      algorithm using recursive differential grouping as decomposition strategy,” in 2019 IEEE
      congress on evolutionary computation (CEC), pp. 689–696, 2019.
[243] G. Chen, K. Zhang, X. Xue, L. Zhang, J. Yao, H. Sun, L. Fan, and Y. Yang, “Surrogate-
      assisted evolutionary algorithm with dimensionality reduction method for water flooding pro-
      duction optimization,” Journal of Petroleum Science and Engineering, vol. 185, p. 106633,
      2020.
[244] F. Li, X. Cai, L. Gao, and W. Shen, “A surrogate-assisted multiswarm optimization algo-
      rithm for high-dimensional computationally expensive problems,” IEEE Transactions on
      Cybernetics, vol. 51, no. 3, pp. 1390–1402, 2021.
[245] Y. Tian, R. Cheng, X. Zhang, and Y. Jin, “PlatEMO: A MATLAB platform for evolutionary
      multi-objective optimization,” IEEE Computational Intelligence Magazine, vol. 12, no. 4,
      pp. 73–87, 2017.
[246] Z. Michalewicz and M. Schoenauer, “Evolutionary algorithms for constrained parameter
      optimization problems,” Evolutionary Computation, vol. 4, pp. 1–32, Mar. 1996.
[247] C. A. Floudas and P. M. Pardalos, “A collection of test problems for constrained global
      optimization algorithms,” in Lecture notes in computer science, 1990.
[248] IEEE international conference on evolutionary computation, CEC 2006, part of WCCI 2006,
      Vancouver, BC, Canada, 16-21 July 2006. IEEE, 2006.
[249] J. Liang, T. Runarsson, E. Mezura-Montes, M. Clerc, P. Suganthan, C. Coello, and K. Deb,
      “Problem definitions and evaluation criteria for the CEC 2006 special session on constrained
      real-parameter optimization,” Nanyang Technological University, Singapore, Tech. Rep,
      vol. 41, Jan. 2006.
[250] T. Runarsson and X. Yao, “Search biases in constrained evolutionary optimization,” IEEE
      Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 35,
      no. 2, pp. 233–243, 2005.
[251] T. Runarsson and X. Yao, “Stochastic ranking for constrained evolutionary optimization,”
      IEEE Transactions on Evolutionary Computation, vol. 4, no. 3, pp. 284–294, 2000.
[252] R Core Team, “R: A language and environment for statistical computing,” manual, Vienna,
      Austria, 2020.
[253] E. Zitzler, K. Deb, and L. Thiele, “Comparison of multiobjective evolutionary algorithms:
      Empirical results,” Evolutionary Computation, vol. 8, no. 2, pp. 173–195, 2000.
[254] S. Huband, L. Barone, L. While, and P. Hingston, “A Scalable Multi-objective Test Problem
      Toolkit,” in Evolutionary Multi-Criterion Optimization (C. A. Coello Coello, A. Hernán-
      dez Aguirre, and E. Zitzler, eds.), (Berlin, Heidelberg), pp. 280–295, Springer Berlin Hei-
      delberg, 2005.
[255] N. Beume, B. Naujoks, and M. Emmerich, “SMS-EMOA: Multiobjective selection based
      on dominated hypervolume,” European Journal of Operational Research, vol. 181, no. 3,
      pp. 1653–1669, 2007.
[256] E. Zitzler, M. Laumanns, and L. Thiele, “SPEA2: Improving the strength Pareto evolutionary
      algorithm,” 2001.
[257] K. Deb, L. Thiele, M. Laumanns, and E. Zitzler, “Scalable test problems for evolutionary
      multiobjective optimization,” in Evolutionary multiobjective optimization: Theoretical ad-
      vances and applications (A. Abraham, L. Jain, and R. Goldberg, eds.), pp. 105–145, London:
      Springer London, 2005.
[258] H. Jain and K. Deb, “An evolutionary many-objective optimization algorithm using reference-
      point based nondominated sorting approach, part II: Handling constraints and extending to an
      adaptive approach,” IEEE Transactions on Evolutionary Computation, vol. 18, pp. 602–622,
      Aug. 2014.
[259] T. T. Binh and U. Korn, “MOBES: A multiobjective evolution strategy for constrained
      optimization problems,” in Proceedings of the Third International Conference on Genetic
      Algorithms, pp. 176–182, 1997.
[260] N. Srinivas and K. Deb, “Multi-Objective function optimization using non-dominated sorting
      genetic algorithms,” Evolutionary Computation Journal, vol. 2, no. 3, pp. 221–248, 1994.
[261] M. Tanaka, “GA-based decision support system for multi-criteria optimization,” in Proceed-
      ings of the international conference on systems, man and cybernetics, vol. 2, pp. 1556–1561,
      1995.
[262] A. Osyczka and S. Kundu, “A new method to solve generalized multicriteria optimization
      problems using the simple genetic algorithm,” Structural optimization, vol. 10, pp. 94–99,
      Oct. 1995.
[263] E. Mezura-Montes and C. A. C. Coello, “Constraint-handling in nature-inspired numerical
      optimization: past, present and future,” Swarm and Evolutionary Computation, vol. 1, no. 4,
      pp. 173–194, 2011.
[264] A. I. Forrester and A. J. Keane, “Recent advances in surrogate-based optimization,” Progress
      in Aerospace Sciences, vol. 45, no. 1, pp. 50–79, 2009.
[265] J. Thomann and G. Eichfelder, “Representation of the Pareto Front for heterogeneous multi-
      objective optimization,” vol. 1, pp. 293–323, Dec. 2019.
[266] X. Wang, Y. Jin, S. Schmitt, M. Olhofer, and R. Allmendinger, “Transfer learning based sur-
      rogate assisted evolutionary bi-objective optimization for objectives with different evaluation
      times,” Knowledge-Based Systems, vol. 227, p. 107190, 2021.
[267] R. Allmendinger and J. Knowles, “Heterogeneous objectives: State-of-the-art and future
      research,” 2021.
[268] K. H. Rahi, H. K. Singh, and T. Ray, “Investigating the use of sequencing and infeasibility
      driven strategies for constrained optimization,” in 2019 IEEE congress on evolutionary
      computation (CEC), pp. 1642–1649, 2019.
[269] K. H. Rahi, H. K. Singh, and T. Ray, “Feasibility-ratio based sequencing for computa-
      tionally efficient constrained optimization,” Swarm and Evolutionary Computation, vol. 62,
      p. 100850, 2021.
[270] K. H. Rahi, H. K. Singh, and T. Ray, “Partial evaluation strategies for expensive evolutionary
      constrained optimization,” IEEE Transactions on Evolutionary Computation, pp. 1–1, 2021.
[271] A. I. Forrester, A. Sóbester, and A. J. Keane, “Surrogate-assisted multicriteria optimization:
      Complexities, prospective solutions, and business case,” Proceedings of the Royal Society
      A: Mathematical, Physical and Engineering Sciences, vol. 463, no. 2088, pp. 3251–3269,
      2007.
[272] D. Brockhoff and E. Zitzler, “Are all objectives necessary? On dimensionality reduction in
      evolutionary multiobjective optimization,” in Parallel problem solving from nature - PPSN
      IX (T. P. Runarsson, H.-G. Beyer, E. Burke, J. J. Merelo-Guervós, L. D. Whitley, and X. Yao,
      eds.), (Berlin, Heidelberg), pp. 533–542, Springer Berlin Heidelberg, 2006.
[273] G. Batista and M. C. Monard, “An analysis of four missing data treatment methods
      for supervised learning,” Applied Artificial Intelligence, vol. 17, no. 5-6, pp. 519–533, 2003.
[274] A. Pétrowski, “A Clearing Procedure as a Niching Method for Genetic Algorithms,” in IEEE
      3rd ICEC’96, pp. 798–803, 1996.
[275] D. Hardin and E. Saff, “Minimal Riesz energy point configurations for rectifiable d-
      dimensional manifolds,” Advances in Mathematics, vol. 193, no. 1, pp. 174–204, 2005.
[276] J. Blank, K. Deb, Y. Dhebar, S. Bandaru, and H. Seada, “Generating Well-Spaced Points
      on a Unit Simplex for Evolutionary Many-Objective Optimization,” IEEE Transactions on
      Evolutionary Computation, pp. 1–1, 2020.
[277] D. Eriksson, D. Bindel, and C. A. Shoemaker, “pySOT and POAP: An event-driven asyn-
      chronous framework for surrogate optimization,” arXiv preprint arXiv:1908.00420, 2019.
[278] K. Deb, A. Pratap, and T. Meyarivan, “Constrained test problems for multi-objective evo-
      lutionary optimization,” in Evolutionary multi-criterion optimization (E. Zitzler, L. Thiele,
      K. Deb, C. A. Coello Coello, and D. Corne, eds.), (Berlin, Heidelberg), pp. 284–298,
      Springer Berlin Heidelberg, 2001.
[279] K. Deb and H. Jain, “An evolutionary many-objective optimization algorithm using reference-
      point-based nondominated sorting approach, Part I: Solving problems with box constraints,”
      IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 577–601, 2014.
[280] I. Das and J. E. Dennis, “Normal-boundary intersection: A new method for generating the
      Pareto surface in nonlinear multicriteria optimization problems,” SIAM Journal on Optimization,
      vol. 8, pp. 631–657, Mar. 1998.
[281] K. Deb and M. Goyal, “Optimizing engineering designs using a combined genetic search,”
      in Proceedings of the Sixth International Conference on Genetic Al-
      gorithms, pp. 521–528, Morgan Kaufmann Publishers, 1995.
[282] B. Khoshoo, J. Blank, T. Pham, K. Deb, and S. Foster, “Optimized electric machine de-
      sign solutions with efficient handling of constraints,” in 2021 IEEE symposium series on
      computational intelligence (SSCI), pp. 1–8, 2021.
[283] N. V. Sahinidis, “Optimization under uncertainty: state-of-the-art and opportunities,” Com-
      puters & Chemical Engineering, vol. 28, no. 6-7, pp. 971–983, 2004.
[284] X. Li, M. G. Epitropakis, K. Deb, and A. Engelbrecht, “Seeking multiple solutions: an up-
      dated survey on niching methods and their applications,” IEEE Transactions on Evolutionary
      Computation, vol. 21, no. 4, pp. 518–538, 2016.
[285] T. Bartz-Beielstein, M. Preuss, K. Schmitt, and H.-P. Schwefel, “Challenges for contempo-
      rary evolutionary algorithms,” 2010.
[286] N. Hansen, A. Auger, S. Finck, and R. Ros, “Real-parameter black-box optimization bench-
      marking 2010: Experimental setup,” Research report RR-7215, INRIA, Mar. 2010.
[287] S. Koziel, L. Leifsson, and X.-S. Yang, Solving computationally expensive engineering
      problems: methods and applications, vol. 97. Springer, 2014.
[288] V. R. Joseph, “Space-filling designs for computer experiments: A review,” Quality Engi-
      neering, vol. 28, no. 1, pp. 28–35, 2016.
[289] A. Ahrari, K. Deb, and M. Preuss, “Multimodal optimization by covariance matrix self-
      adaptation evolution strategy with repelling subpopulations,” Evolutionary computation,
      vol. 25, no. 3, pp. 439–471, 2017.
[290] A. Sóbester, S. J. Leary, and A. J. Keane, “A parallel updating scheme for approximat-
      ing and optimizing high fidelity computer simulations,” Structural and multidisciplinary
      optimization, vol. 27, no. 5, pp. 371–383, 2004.
[291] D. Zhan, J. Qian, and Y. Cheng, “Balancing global and local search in parallel efficient global
      optimization algorithms,” Journal of Global Optimization, vol. 67, no. 4, pp. 873–892, 2017.
[292] N. M. Alexandrov, J. Dennis, R. M. Lewis, and V. Torczon, “A trust-region framework for
      managing the use of approximation models in optimization,” Structural optimization, vol. 15,
      no. 1, pp. 16–23, 1998.
[293] Y. Jin, M. Hüsken, and B. Sendhoff, “Quality measures for approximate models in evolu-
      tionary computation,” in GECCO, pp. 170–173, 2003.
[294] A. Auger and N. Hansen, “A restart CMA evolution strategy with increasing population
      size,” in 2005 IEEE congress on evolutionary computation, vol. 2, pp. 1769–1776,
      2005.
[295] H. Ishibuchi, R. Imada, N. Masuyama, and Y. Nojima, “Comparison of hypervolume,
      IGD and IGD+ from the viewpoint of optimal distributions of solutions,” in International
      conference on evolutionary multi-criterion optimization, pp. 332–345, 2019.
[296] H. Ishibuchi, R. Imada, Y. Setoguchi, and Y. Nojima, “How to specify a reference point
      in hypervolume calculation for fair performance comparison,” Evolutionary computation,
      vol. 26, no. 3, pp. 411–440, 2018.
[297] J. Müller, “SOCEMO: Surrogate optimization of computationally expensive multiobjective
      problems,” INFORMS Journal on Computing, vol. 29, no. 4, pp. 581–596, 2017.
[298] J. Müller, “Codes for Surrogate Model Based Optimization,” 2020.
[299] R. Ramarathnam, B. G. Desai, and V. S. Rao, “A comparative study of minimization tech-
      niques for optimization of induction motor design,” IEEE Transactions on Power Apparatus
      and Systems, vol. PAS-92, no. 5, pp. 1448–1454, 1973.
[300] B. Singh, B. P. Singh, S. S. Murthy, and C. S. Jha, “Experience in design optimization of
      induction motor using ’SUMT’ algorithm,” IEEE Transactions on Power Apparatus and
      Systems, vol. PAS-102, no. 10, pp. 3379–3384, 1983.
[301] N. Bianchi and S. Bolognani, “Design optimisation of electric motors by genetic algorithms,”
      IEE Proceedings - Electric Power Applications, vol. 145, pp. 475–483, Sept. 1998.
[302] B. Mirzaeian, M. Moallem, V. Tahani, and C. Lucas, “Multiobjective optimization method
      based on a genetic algorithm for switched reluctance motor design,” IEEE Transactions on
      Magnetics, vol. 38, no. 3, pp. 1524–1527, 2002.
[303] S. D. Sudhoff, J. Cale, B. Cassimere, and M. Swinney, “Genetic algorithm based design of a
      permanent magnet synchronous machine,” 2005 IEEE International Conference on Electric
      Machines and Drives, pp. 1011–1019, 2005.
[304] D. Zarko, D. Ban, and T. Lipo, “Design optimization of interior permanent magnet (IPM)
      motors with maximized torque output in the entire speed range,” in 2005 European conference
      on power electronics and applications, 10 pp., 2005.
[305] Y. Duan, R. G. Harley, and T. G. Habetler, “Comparison of particle swarm optimization and
      genetic algorithm in the design of permanent magnet motors,” 2009 IEEE 6th International
      Power Electronics and Motion Control Conference, IPEMC ’09, vol. 3, pp. 822–825, 2009.
[306] Y. Duan and D. M. Ionel, “A review of recent developments in electrical machine design
      optimization methods with a permanent-magnet synchronous motor benchmark study,” IEEE
      Transactions on Industry Applications, vol. 49, no. 3, pp. 1268–1275, 2013.
[307] P. Zhang, G. Y. Sizov, D. M. Ionel, and N. A. Demerdash, “Design optimization of spoke-
      type ferrite magnet machines by combined design of experiments and differential evolution
      algorithms,” in 2013 international electric machines drives conference, pp. 892–898, 2013.
[308] G. Pellegrino and F. Cupertino, “FEA-based multi-objective optimization of IPM motor
      design including rotor losses,” in 2010 IEEE energy conversion congress and exposition,
      pp. 3659–3666, 2010.
[309] G. Pellegrino and F. Cupertino, “IPM motor rotor design by means of FEA-based multi-
      objective optimization,” in 2010 IEEE international symposium on industrial electronics,
      pp. 1340–1346, 2010.
[310] L. Jolly, M. Jabbar, and L. Qinghua, “Design optimization of permanent magnet motors using
      response surface methodology and genetic algorithms,” IEEE Transactions on Magnetics,
      vol. 41, no. 10, pp. 3928–3930, 2005.
[311] D. M. Ionel and M. Popescu, “Finite element surrogate model for electric machines with
      revolving field — application to IPM motors,” in 2009 IEEE energy conversion congress
      and exposition, pp. 178–186, 2009.
[312] G. Y. Sizov, D. M. Ionel, and N. A. O. Demerdash, “Modeling and parametric design of
      permanent-magnet AC machines using computationally efficient finite-element analysis,”
      IEEE Transactions on Industrial Electronics, vol. 59, no. 6, pp. 2403–2413, 2012.
[313] N. Taran, D. M. Ionel, and D. G. Dorrell, “Two-level surrogate-assisted differential evolution
      multi-objective optimization of electric machines using 3-D FEA,” IEEE Transactions on
      Magnetics, vol. 54, no. 11, pp. 1–5, 2018.
[314] J. Song, J. Zhao, F. Dong, J. Zhao, Z. Qian, and Q. Zhang, “A novel regression model-
      ing method for PMSLM structural design optimization using a Distance-Weighted KNN
      Algorithm,” IEEE Transactions on Industry Applications, vol. 54, no. 5, pp. 4198–4206,
      2018.
[315] G. Pellegrino, F. Cupertino, and C. Gerada, “Automatic design of synchronous reluctance
      motors focusing on barrier shape optimization,” IEEE Transactions on Industry Applications,
      vol. 51, no. 2, pp. 1465–1474, 2015.
[316] G. Bramerdorfer, J. A. Tapia, J. J. Pyrhönen, and A. Cavagnino, “Modern electrical
      machine design optimization: Techniques, Trends, and Best Practices,”
      IEEE Transactions on Industrial Electronics, vol. 65, no. 10, pp. 7672–7684,
      2018.
[317] S. Stipetic, W. Miebach, and D. Zarko, “Optimization in design of electric machines:
      Methodology and workflow,” in 2015 intl aegean conference on electrical machines power
      electronics (ACEMP), 2015 intl conference on optimization of electrical electronic equipment
      (OPTIM) 2015 intl symposium on advanced electromechanical motion systems (ELECTRO-
      MOTION), pp. 441–448, 2015.
[318] F. Gillon and P. Brochet, “Screening and response surface method applied to the numerical
      optimization of electromagnetic devices,” IEEE Transactions on Magnetics, vol. 36, no. 4,
      pp. 1163–1167, 2000.
[319] Altair, “Altair FluxMotor (version 2019.1.1),” manual, Altair Engineering Inc., 2019.
[320] J. Pyrhönen, T. Jokinen, and V. Hrabovcová, “Design of rotating electrical machines,” in
      Design of rotating electrical machines, pp. 1–512, 2008.
[321] E. J. Cramer, J. E. Dennis, Jr, P. D. Frank, R. M. Lewis, and G. R. Shubin, “Problem
      formulation for multidisciplinary optimization,” SIAM Journal on Optimization, vol. 4,
      no. 4, pp. 754–776, 1994.
[322] Y. Parte, D. Auroux, J. Clément, M. Masmoudi, and J. Hermetz, “Collaborative opti-
      mization,” Multidisciplinary design optimization in computational mechanics, pp. 321–368,
      Wiley, 2010.
[323] R. Braun, A. Moore, and I. Kroo, “Use of the collaborative optimization architecture for
      launch vehicle design,” in 6th symposium on multidisciplinary analysis and optimization,
      1996.
[324] Collaboration, The Cambridge dictionary of philosophy. Cambridge University Press, 1999.
[325] “Why is collaboration so difficult?,” 2017.
[326] R. Atkinson, “Project management: cost, time and quality, two best guesses and a phe-
      nomenon, its time to accept other success criteria,” International journal of project manage-
      ment, vol. 17, no. 6, pp. 337–342, 1999.
[327] A. De Wit, “Measurement of project success,” International journal of project management,
      vol. 6, no. 3, pp. 164–170, 1988.
[328] J. R. Adams and S. E. Barnd, “Behavioral Implications of the Project Life Cycle,” in Project
      Management Handbook, pp. 206–230, John Wiley & Sons, Ltd, 1997.
[329] J. Highsmith, Agile project management: creating innovative products. Pearson education,
      2009.
[330] G. D. Brewer, “The challenges of interdisciplinarity,” Policy sciences, vol. 32, no. 4, pp. 327–
      337, 1999.
[331] S. Lélé and R. B. Norgaard, “Practicing interdisciplinarity,” BioScience, vol. 55, no. 11,
      pp. 967–975, 2005.
[332] J. A. Jacobs and S. Frickel, “Interdisciplinarity: A Critical Assessment,” Annual Review of
      Sociology, vol. 35, no. 1, pp. 43–65, 2009.
[333] R. T. Craig, “Communication in the Conversation of Disciplines,” Russian Journal of Com-
      munication, vol. 1, no. 1, pp. 7–23, 2008.
[334] J. D. Peters, Speaking into the air: a history of the idea of communication. Chicago:
      University of Chicago Press, 1999.
[335] E. Curry, “The big data value chain: Definitions, concepts, and theoretical approaches,” in
      New horizons for a data-driven economy (J. M. Cavanillas, E. Curry, and W. Wahlster, eds.),
      pp. 29–37, Springer, 2016.
[336] A. Gaur, A. K. Talukder, K. Deb, S. Tiwari, S. Xu, and D. Jones, “Unconventional optimiza-
      tion for achieving well-informed design solutions for the automobile industry,” Engineering
      Optimization, vol. 52, no. 9, pp. 1542–1560, 2020.
[337] G. van Rossum, “Python Reference Manual,” tech. rep., CWI (Centre for Mathematics and
      Computer Science), Amsterdam, The Netherlands, 1995.
[338] M. Bücker, G. Corliss, P. Hovland, U. Naumann, and B. Norris, Automatic Differentiation:
      Applications, Theory, and Implementations (Lecture Notes in Computational Science and
      Engineering). Berlin, Heidelberg: Springer-Verlag, 2006.
[339] R. Lehmann, Sphinx Documentation. Universitaet Potsdam, 2019.
[340] A. Pajankar, Python Unit Test Automation: Practical Techniques for Python Developers and
      Testers. Berkeley, CA, USA: Apress, 1st ed., 2017.
[341] J. J. Durillo and A. J. Nebro, “jMetal: A Java framework for multi-objective optimization,”
      Advances in Engineering Software, vol. 42, pp. 760–771, 2011.
[342] J. Gosling, B. Joy, G. L. Steele, G. Bracha, and A. Buckley, The Java Language Specification,
      Java SE 8 Edition. Addison-Wesley Professional, 1st ed., 2014.
[343] A. Benítez-Hidalgo, A. J. Nebro, J. García-Nieto, I. Oregi, and J. D. Ser, “jMetalPy: A Python
      framework for multi-objective optimization with metaheuristics,” Swarm and Evolutionary
      Computation, vol. 51, p. 100598, 2019.
[344] D. Izzo, “PyGMO and PyKEP: open source tools for massively parallel optimization in
      astrodynamics (the case of interplanetary trajectory optimization),” in 5th International
      Conference on Astrodynamics Tools and Techniques (ICATT 2012), 2012.
[345] D. Hadka, Platypus: Multiobjective Optimization in Python.
[346] F.-A. Fortin, F.-M. D. Rainville, M.-A. Gardner, M. Parizeau, and C. Gagné, “DEAP:
      Evolutionary Algorithms Made Easy,” Journal of Machine Learning Research, vol. 13,
      pp. 2171–2175, July 2012.
[347] A. Garrett, inspyred: Python library for bio-inspired computational intelligence.
[348] D. Hadka, MOEA Framework: A free and open source Java framework for multiobjective
      optimization.
[349] E. López-Camacho, M. J. García-Godoy, A. J. Nebro, and J. F. A. Montes, “jMetalCpp: opti-
      mizing molecular docking problems with a C++ metaheuristic framework,” Bioinformatics,
      vol. 30, no. 3, pp. 437–438, 2014.
[350] K. Deb and M. Abouhawwash, “An optimality theory-based proximity measure for set-
      based multiobjective optimization,” IEEE Transactions on Evolutionary Computation, vol. 20, no. 4,
      pp. 515–528, 2016.
[351] K. Deb, M. Abouhawwash, and H. Seada, “A computationally fast convergence measure
      and implementation for single-, multiple-, and many-objective optimization,” IEEE Transactions
      on Emerging Topics in Computational Intelligence, vol. 1, no. 4, pp. 280–293, 2017.
[352] D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization,” IEEE Transactions
      on Evolutionary Computation, vol. 1, pp. 67–82, Apr. 1997.
[353] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison,
      L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” 2017.
[354] Dask Development Team, Dask: Library for dynamic task scheduling. 2016.
[355] S. Mishra, S. Mondal, and S. Saha, “Fast implementation of steady-state NSGA-II,” in 2016
      IEEE Congress on Evolutionary Computation (CEC), pp. 3777–3784, July 2016.
[356] J. C. Bean, “Genetic Algorithms and Random Keys for Sequencing and Optimization,” ORSA
      Journal on Computing, vol. 6, no. 2, pp. 154–160, 1994.
[357] J. A. Nelder and R. Mead, “A simplex method for function minimization,” Computer Journal,
      vol. 7, pp. 308–313, 1965.
[358] K. Deb and J. Sundar, “Reference point based multi-objective optimization using evolution-
      ary algorithms,” in Proceedings of the 8th annual conference on genetic and evolutionary
      computation, GECCO ’06, (New York, NY, USA), pp. 635–642, ACM, 2006.
[359] H. Seada and K. Deb, “A unified evolutionary optimization procedure for single, multiple,
      and many objectives,” IEEE Transactions on Evolutionary Computation, vol. 20, pp. 358–
      369, June 2016.
[360] Y. Vesikar, K. Deb, and J. Blank, “Reference point based NSGA-III for preferred solutions,”
      in 2018 IEEE symposium series on computational intelligence (SSCI), pp. 1587–1594, Nov.
      2018.
[361] K. Deb, K. Sindhya, and T. Okabe, “Self-adaptive simulated binary crossover for real-
      parameter optimization,” in Proceedings of the 9th annual conference on genetic and evolu-
      tionary computation, GECCO ’07, (New York, NY, USA), pp. 1187–1194, ACM, 2007.
[362] K. Deb and D. Deb, “Analysing mutation schemes for real-parameter genetic algorithms,”
      International Journal of Artificial Intelligence and Soft Computing, vol. 4, no. 1, pp. 1–28,
      2014.
[363] K. Deb and M. Goyal, “A robust optimization procedure for mechanical component design
      based on genetic adaptive search,” Transactions of the ASME: Journal of Mechanical Design,
      vol. 120, no. 2, pp. 162–164, 1998.
[364] J. Blank and K. Deb, “A Running Performance Metric and Termination Criterion for Eval-
      uating Evolutionary Multi- and Many-objective Optimization Algorithms,” in 2020 IEEE
      Congress on Evolutionary Computation (CEC), pp. 1–8, 2020.
[365] A. Santiago, H. J. F. Huacuja, B. Dorronsoro, J. E. Pecero, C. G. Santillan, J. J. G. Bar-
      bosa, and J. C. S. Monterrubio, “A Survey of Decomposition Methods for Multi-objective
      Optimization,” in Recent Advances on Hybrid Approaches for Designing Intelligent Systems
      (O. Castillo, P. Melin, W. Pedrycz, and J. Kacprzyk, eds.), pp. 453–465, Cham: Springer
      International Publishing, 2014.
[366] A. P. Wierzbicki, “The use of reference objectives in multiobjective optimization,” in Multiple
      criteria decision making theory and application, pp. 468–486, Springer, 1980.
[367] A. P. Wierzbicki, “A mathematical basis for satisficing decision making,” Mathematical
      Modelling, vol. 3, no. 5, pp. 391–405, 1982.
[368] J. Knowles and D. Corne, “On Metrics for Comparing Non-Dominated Sets,” in Proceedings
      of the 2002 Congress on Evolutionary Computation Conference (CEC02), (United States),
      pp. 711–716, Institute of Electrical and Electronics Engineers, 2002.
[369] D. A. V. Veldhuizen, “Multiobjective evolutionary algorithms: Classifications, analyses, and
      new innovations,” tech. rep., Evolutionary Computation, 1999.
[370] H. Ishibuchi, H. Masuda, Y. Tanigaki, and Y. Nojima, “Modified distance calculation in
      generational distance and inverted generational distance,” in Evolutionary multi-criterion
      optimization (A. Gaspar-Cunha, C. Henggeler Antunes, and C. C. Coello, eds.), (Cham),
      pp. 110–125, Springer International Publishing, 2015.
[371] E. Zitzler and L. Thiele, “Multiobjective Optimization Using Evolutionary Algorithms - A
      Comparative Case Study,” in Proceedings of the 5th International Conference on Parallel
      Problem Solving from Nature, PPSN V, (London, UK), pp. 292–304, Springer-Verlag,
      1998.
[372] J. D. Hunter, “Matplotlib: A 2D graphics environment,” Computing in Science & Engineer-
      ing, vol. 9, no. 3, pp. 90–95, 2007.
[373] P. Hoffman, G. Grinstein, and D. Pinkney, “Dimensional anchors: a graphic primitive
      for multidimensional multivariate information visualizations,” in Proc. Workshop on new
      Paradigms in Information Visualization and Manipulation in conjunction with the ACM
      International Conference on Information and Knowledge Management (NPIVM99), (New
      York, NY, USA), pp. 9–16, ACM, 1999.
[374] E. Kandogan, “Star coordinates: A multi-dimensional visualization technique with
      uniform treatment of dimensions,” in Proceedings of the IEEE Information Visualization
      Symposium, Late Breaking Hot Topics, pp. 9–12, 2000.
[375] A. Pryke, S. Mostaghim, and A. Nazemi, “Heatmap visualization of population based multi
      objective algorithms,” in Evolutionary multi-criterion optimization (S. Obayashi, K. Deb,
      C. Poloni, T. Hiroyasu, and T. Murata, eds.), (Berlin, Heidelberg), pp. 361–375, Springer
      Berlin Heidelberg, 2007.
[376] Y. S. Tan and N. M. Fraser, “The modified star graph and the petal diagram: two new
      visual aids for discrete alternative multicriteria decision making,” Journal of Multi-Criteria
      Decision Analysis, vol. 7, no. 1, pp. 20–33, 1998.
[377] E. Kasanen, R. Östermark, and M. Zeleny, “Gestalt system of holistic graphics: New
      management support view of MCDM,” Computers & Operations Research, vol. 18, no. 2, pp. 233–239, 1991.
[378] A. K. A. Talukder and K. Deb, “PaletteViz: A Visualization Method for Functional
      Understanding of High-Dimensional Pareto-Optimal Data-Sets to Aid Multi-Criteria Decision
      Making,” IEEE Computational Intelligence Magazine, vol. 15, no. 2, pp. 36–48, 2020.
[379] L. Rachmawati and D. Srinivasan, “Multiobjective evolutionary algorithm with controllable
      focus on the knees of the Pareto front,” IEEE Transactions on Evolutionary Computation,
      vol. 13, pp. 810–824, Aug. 2009.
[380] K. Deb, A. Sinha, P. Korhonen, and J. Wallenius, “An Interactive Evolutionary Multi-
      Objective Optimization Method Based on Progressively Approximated Value Functions,”
      IEEE Transactions on Evolutionary Computation, vol. 14, no. 5, pp. 723–739, 2010.