AN INTERACTIVE KNOWLEDGE-DRIVEN MULTI-OBJECTIVE OPTIMIZATION FRAMEWORK FOR ACHIEVING FASTER CONVERGENCE By Abhiroop Ghosh A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Electrical Engineering – Doctor of Philosophy 2022 ABSTRACT Users interested in solving real-world optimization problems often have many years of experience. Their intuition or ‘knowledge’ is often overlooked in academic studies due to concerns regarding loss of generality. Such knowledge can be expressed as inter-variable relationships or functions, which can provide some initial guidance to a suitably-designed optimization algorithm. Alternatively, knowledge about variable interactions can also be extracted algorithmically during the optimization by analyzing the better solutions progressively found over iterations – a process termed innovization. Any common pattern extracted from good solutions discovered during an optimization run can be used as a repair operator to modify candidate solutions, but the key aspect is to strike a balance between the relevance of the pattern identified and the extent of its use in the repair operator, lest the learned patterns turn out to be properties of unpromising search directions or ‘blind alleys’. In this dissertation, we propose a framework combining both user-supplied and algorithmically- extracted knowledge to repair solutions during an optimization run in an online fashion. Such a framework is also interactive, allowing the user to provide inputs at any point during the optimiza- tion. We show the step-wise modifications required for an evolutionary multi-objective (EMO) framework to allow for: (a) initial user-provided knowledge, (b) automated knowledge extraction and application using innovization methods, and (c) allowing the user to interact with the framework at any point during the optimization run. The path to creating such a framework is systematically performed one step at a time, starting from creating an efficient method of representing problem knowledge, designing a suitable automated innovization procedure, and finally interleaving human- provided and machine-extracted knowledge. We show that such a framework can achieve faster convergence across a variety of practical optimization problems. Some future research directions are also discussed. This dissertation is dedicated to my parents, friends, and all the other people who directly or indirectly supported my journey. iii ACKNOWLEDGEMENTS I would like to thank my advisor Dr. Kalyanmoy Deb for his invaluable guidance throughout my years as a doctoral student, and for believing in me. I am thankful to my Ph.D. committee members, Dr. Erik Goodman, Dr. Ronald Averill, and Dr. Alejandro Diaz, for their valuable guidance and input on my research. I would also like to thank Ming Liu, Qiren Gao, and fellow COIN lab members for giving me a helping hand when I got stuck. I would also like to acknowledge the financial support of DARPA and the Koenig Chair funding. I wish to thank my family, without whom I would not be able to reach this far. My teachers from high school and all the way to graduate school deserve a special shout out for their valuable lessons and inculcating the love of learning. I would like to thank my friends in East Lansing, who made this five-year journey special. Many of you have graduated and moved elsewhere, but I will always remember the obligatory Friday hangout and/or the Saturday brunch. 
I would also like to acknowledge my friends in high school and in my undergraduate university. Even if we are far away in different countries, the good times with you will never be forgotten. iv TABLE OF CONTENTS LIST OF ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 CHAPTER 2 BACKGROUND AND LITERATURE REVIEW . . . . . . . . . . . . . . . 5 CHAPTER 3 KNOWLEDGE REPRESENTATION . . . . . . . . . . . . . . . . . . . . . 14 CHAPTER 4 USER-GUIDED INNOVIZATION-BASED MOEA FRAMEWORK . . . . 18 CHAPTER 5 AN INTERACTIVE KNOWLEDGE-BASED EVOLUTIONARY MULTI-OBJECTIVE ALGORITHM FRAMEWORK . . . . . . . . . . . . 53 CHAPTER 6 IK-EMOVIZ: A SOFTWARE IMPLEMENTATION FOR THE IK-EMO FRAMEWORK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 CHAPTER 7 CONCLUSIONS AND FUTURE WORK . . . . . . . . . . . . . . . . . . . 97 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 v LIST OF ABBREVIATIONS DE Differential evolution DM Decision maker EA Evolutionary algorithm EMO Evolutionary multi-objective optimization GA Genetic algorithm MOEA Multi-objective evolutionary algorithm MOO Multi-objective optimization MOP Multi-objective optimization problem NSGA Non-dominated sorting genetic algorithm OPF Optimal power flow PF Pareto front PO Pareto optimal PS Pareto set SB Stepped beam SBX Simulated binary crossover SIV Semi-independent variable vi CHAPTER 1 INTRODUCTION Algorithms required to solve practical optimization problems often need to be customized in order to handle problem-specific difficulties. This is often at odds with academic optimization research, which focuses on finding the best possible solution using generalizable algorithms. While such optimization algorithms can perform reasonably well on practical problems without any modifications [1, 2], their potential is often underutilized by ignoring problem-specific information to avoid the risk of misguiding the algorithm towards sub-optimal search regions. Developing a framework which overcomes this limitation can be very useful, and this dissertation aims to achieve the same. 1.1 Motivation For practical multi-objective optimization problems (MOPs), existing broad and generic qualitative user knowledge is often ignored while devising an algorithm, anticipating that the use of such additional information may ‘dilute’ the overall study. However, when real-world problems are to be solved in practice, it is perfectly legitimate to use such information with an appropriate quantification during the optimization process, since computational resources may be limited by time and/or cost. Additionally, not using this information may be considered ‘insulting’ to the user’s many years of earned knowledge and intuition about the problem. If the users are ready to utilize additional qualitative user-guided knowledge to speed up the optimization process, challenges exist in determining how to quantify the knowledge and the process by which such knowledge can be incorporated within an optimization algorithm. This study addresses these issues of accepting qualitative user-knowledge and validating their quantification through careful analysis of evolved solutions and also discusses ways to utilize validated user-knowledge to improve solution quality. 
It has been discussed in the context of single-objective evolutionary algorithms (EAs) that an optimization algorithm with generic recombination and mutation operators (such as simulated binary crossover (SBX) [3] or differential evolution (DE) [4] may be too slow to lead to high- performing regions of the search space in complex practical problems involving a large number 1 of variables. Although these methods are probably ideal for solving benchmark problems due to non-availability of problem-specific information, real-world problems usually come with physical parameters that must be related in certain ways for a solution to be meaningful. In a recent billion- variable resource allocation problem originating from a casting scheduling task in a foundry [5], the linearity of constraint structures allowed the development of an efficient customized evolutionary algorithm that exploited the linearity. The outcome demonstrated a polynomial performance of the resulting customized genetic algorithm that scaled to solve similar scheduling problems having 50,000 to one billion variables. Another study used the concept of semi-independent variables [6], in which a redefinition of variables was proposed to handle user-specified monotonic relationships among variables in the form of 𝑥𝑖 ≤ 𝑥𝑖+1 ≤ 𝑥𝑖+2 ≤ . . . ≤ 𝑥 𝑗 as additional knowledge, so that every created solution automatically satisfies all the above monotonic constraints. However, for a different relationship (such as a monotonically decreasing relationship, or another pattern in variables), a new description of semi-independent variables needs to be defined and implemented. Pre-specifying problem information is not the only way to provide guidance to an optimization algorithm. Recent innovization studies [7, 8, 9] have demonstrated that additional problem infor- mation can be extracted from high-performing solutions created during the optimization process, thereby leading to a more efficient search and faster convergence. It cannot be denied that users with many years of experience in solving a given problem class can usually provide certain qualitative information about variables, such as monotonicity or other properties among variables. Although qualitative, it will not be wise to simply ignore such information in an optimization task, particularly when the problem being solved is large-scale or sufficiently nonlinear. However, integration of additional qualitative problem information raises several challenging issues. How can the qualitative information be quantified? How can the qualitative information be validated to be useful in speeding progress toward the optimal region? To what extent should such information be used in an optimization algorithm in order not to cause a premature or false convergence? Thus, an adaptive algorithm must assess the worth of user- 2 provided information systematically and utilize it differently as the algorithm navigates from its initial generation to the end. This study addresses the issues mentioned above and proposes generic knowledge-based methodologies for solving practical multi-objective optimization problems. The aim is to demon- strate how a multi-objective optimization algorithm can work in tandem with pre-specified qualita- tive problem information, automatically extract problem-specific knowledge, and keep the user in the loop so that large-scale problems can be solved relatively quickly. 
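To make the semi-independent variable idea mentioned above more concrete, the sketch below shows one possible way (not the exact formulation of [6]) to guarantee a user-specified monotonic pattern x_i ≤ x_{i+1} ≤ . . . ≤ x_j by construction: the algorithm evolves non-negative increments, and the actual variables are recovered as cumulative sums. The function name, bounds handling, and increment encoding are illustrative assumptions.

```python
import numpy as np

def decode_monotone(increments, lower, upper):
    """Map evolved non-negative increments to a variable vector that
    satisfies x[0] <= x[1] <= ... <= x[-1] within [lower, upper].

    Because only the increments are evolved, every decoded solution
    automatically respects the monotonic relationship.
    """
    x = lower + np.cumsum(np.abs(increments))  # non-decreasing by construction
    return np.clip(x, lower, upper)            # clipping preserves monotonicity

# Example: five evolved increments decoded into a monotone vector in [0, 10]
rng = np.random.default_rng(seed=1)
print(decode_monotone(rng.random(5), lower=0.0, upper=10.0))
```

A monotonically decreasing pattern, or any other variable pattern, would need a different decoding, which is precisely the inflexibility of pre-specified reformulations noted above.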
1.2 Challenges

The objective of this dissertation comes with several challenges that need to be addressed before the proposed methods can be applied to practical problems.

1.2.1 Knowledge representation

Collecting relevant problem knowledge from the user is not useful unless there exists a method to convert qualitative knowledge into a quantitative version. A standardized representation method allows the user to communicate with the optimization algorithm, which can only process mathematical representations of knowledge.

1.2.2 Quality of user-provided knowledge

While user guidance in the form of problem knowledge can be useful, it should not hinder the ability of an optimization algorithm to discover new solutions. Too much emphasis on user-provided knowledge is not desirable, as it can be imperfect or incorrect.

1.2.3 Knowledge discovery and application

A good automated method to learn useful information from the good solutions found during intermediate generations can be very useful for speeding up convergence. However, the same concerns as those regarding user-provided knowledge apply in this case. Any discovered knowledge should not be trusted blindly, since it can potentially steer the algorithm towards subpar solutions. Applying knowledge learned from solutions obtained prior to convergence should be done carefully.

1.2.4 User interaction method

In order to allow for effective user interaction, we need to develop methods that allow seamless information transfer between optimization algorithms and the user. Safeguards also need to be present to verify that the user-provided information is of high quality.

1.2.5 Large-scale problems

For problems with a high number of decision variables, interaction with the user can become complex. The computational cost of knowledge discovery also increases, thus requiring scalable solutions.

1.3 Organization of the dissertation

The rest of the dissertation is structured as follows. In Chapter 2, we provide the concepts and literature survey relevant to the dissertation topic. In Chapter 3, we define multiple ways to represent problem knowledge in order to standardize our experiments. In Chapter 4, we present the first iteration of our proposed framework, which provides a way for the user to give some initial guidance to the optimization algorithm, and we propose an online innovization method to learn certain types of inter-variable relations and apply them through one or more repair operators. Chapter 5 presents the final version of the proposed framework, which is able to handle multiple types of knowledge representation. Chapter 6 describes a software implementation of the proposed framework. Finally, we summarize the contributions of this dissertation and provide some promising directions for future work in Chapter 7.

CHAPTER 2
BACKGROUND AND LITERATURE REVIEW

In this chapter, we outline the necessary concepts and related works.

2.1 Multi-objective optimization

For real-world optimization problems, there are often multiple, potentially conflicting criteria or objectives a good design needs to satisfy [10, 11]. Such problems are also referred to as multi-objective optimization (MOO) problems. In traditional single-objective optimization algorithms, only one objective function can be optimized, which poses certain limitations for solving MOO problems. Approaches like weighted metric methods [12] can be used to tackle such problems using single-objective optimization algorithms.
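As a small illustration of the weighted metric idea just mentioned, the sketch below collapses a hypothetical bi-objective problem into a single objective with a weighted sum, so that any single-objective routine can be applied. The objective functions, weights, and solver choice are placeholders assumed for this example only.

```python
import numpy as np
from scipy.optimize import minimize  # any single-objective solver works here

# Hypothetical bi-objective problem: f1 favors x near 0, f2 favors x near 1
def f1(x):
    return float(np.sum(x ** 2))

def f2(x):
    return float(np.sum((x - 1.0) ** 2))

def weighted_sum(x, w1, w2):
    # Weighted-metric scalarization: one trade-off solution per weight vector
    return w1 * f1(x) + w2 * f2(x)

x0 = np.zeros(3)
for w1 in (0.1, 0.5, 0.9):
    res = minimize(weighted_sum, x0, args=(w1, 1.0 - w1), method="Nelder-Mead")
    print(f"w1={w1:.1f} -> x={np.round(res.x, 3)}, f1={f1(res.x):.3f}, f2={f2(res.x):.3f}")
```

Each weight vector yields at most one trade-off solution, which is one reason dedicated multi-objective algorithms, discussed next, are often preferred.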
On the other hand, true multi-objective optimization algorithms like MOGA [13], SPEA [14] and NSGA-II [15] aim to present a set of solutions to the decision maker (DM) representing a tradeoff between two or more objectives. Out of these, one of the solutions will be selected by the DM. An MOO problem can be formulated as follows: Minimize 𝑓𝑚 (x) 𝑚 = 1, 2, ..., 𝑀, (2.1) s.t. 𝑔 𝑗 (x) ≥ 0, 𝑗 = 1, 2, ..., 𝐽, (2.2) ℎ 𝑘 (x) = 0, 𝑘 = 1, 2, ..., 𝐾, (2.3) 𝑥𝑖𝐿 ≤ 𝑥𝑖 ≤ 𝑥𝑖𝑈 , 𝑖 = 1, 2, ..., 𝑁, (2.4) where x = [𝑥1 , 𝑥2 , ..., 𝑥 𝑁 ] 𝑇 is a solution represented as a vector of 𝑁 decision variables, 𝑓𝑚 (x) represents an objective function, 𝑔 𝑗 (x) represents an inequality constraint, ℎ 𝑘 (x) represents an equality constraint, 𝑀 is the number of objectives, 𝐽 is the number of inequality constraints and 𝐾 is the number of equality constraints. Equation 2.1 degenerates to a single-objective optimization problem formulation for 𝑀 = 1. Figure 2.1 represents a bi-objective optimization problem (𝑀 = 2) and the mapping from the feasible decision variable space to the objective space. In case of a single objective optimization problem, two solutions can be compared using the corresponding objective values. For a minimization problem, a lower objective value is considered to be better. However for the MOO problem formulation shown in Equation 2.1, such comparison 5 Figure 2.1 Decision and objective space of a multi-objective optimization problem. is not as straightforward as the single-objective optimization case. For a bi-objective problem (𝑀 = 2), if one solution (x (1) ) has a lower value of all the objectives ( 𝑓1 (x) and 𝑓2 (x)) compared to another solution (x (2) ), then x (1) can be declared as superior to x (2) . Alternately, x (1) may have a lower value for the first objective ( 𝑓1 (x)) compared to x (2) , but a higher objective value for the second objective ( 𝑓2 (x)). In such a case, x (1) cannot be said to be better than x (2) . To resolve this issue, the concept of dominance [11] can be used as defined below. Definition 1 A solution x (1) is said to dominate another solution x (2) , if 𝑓𝑖 (x (1) ) ≤ 𝑓𝑖 (x (2) )∀𝑖 ∈ {1, 2, ..., 𝑀 } and 𝑓 𝑗 (x (1) ) < 𝑓 𝑗 (x (2) )∃ 𝑗 ∈ {1, 2, ..., 𝑀 }. x (1) dominating x (2) is represented as x (1) ⪯ x (2) Definition 2 A set of solutions 𝑃 is said to be non-dominated if no member of 𝑃 is dominated by another member of the same set. In simple terms, definition 1 says that a solution x (1) is better than or ‘dominates’ another solution x (2) if the following conditions are satisfied: 1. x (1) is equal to or better than x (2) in all the objectives [ 𝑓1 (x), 𝑓2 (x), ..., 𝑓𝑚 (x)]. 2. x (1) is strictly better (has a lower value) than x (2) in at least one objective. 6 Definition 2 defines the concept of a non-dominated set where no solution in the set is better than the other according to the dominance criteria defined in definition 1. Figure 2.2 illustrates the dominance concept by showing 6 solutions (x (1) to x (6) ) in the objective space. After performing pairwise comparison among all the solutions, the non-dominated set 𝑃 = {x (1) , x (2) , x (3) }. Each solution in 𝑃 dominates the other solutions not in 𝑃. For example, according to the dominance criteria shown in definition 1, x (2) ⪯ x (5) and x (2) ⪯ x (6) . Figure 2.2 A multi-objective optimization problem. 2.2 Evolutionary multi-objective optimization (EMO) algorithms Practical design problems often contain complexities like multiple local optima, discrete variables, non-linearity, discontinuity, etc. 
Classical methods like gradient descent [16] can be ill-equipped to handle such situations. An evolutionary algorithm (EA) like the genetic algorithm (GA) [17], genetic programming [18], evolution strategy [19], and differential evolution [4] does not have such limitations. A significant feature of EAs is the use of multiple solutions across iterations or generations as opposed to point-based methods. Such a population of potential solutions allows EAs to explore the search space with a reduced risk of converging to a local optimum. Many practical problems often have multiple conflicting objectives and can be designed as 7 MOO problems. Population-based EAs are able to store and evaluate multiple solutions at each generation, a property suitable for solving MOO problems. While methods such as weighted sum and Tchebycheff metrics [12] can be used to apply single-objective EAs to MOO problems, they often have some shortcomings [11]. Thus, evolutionary multi-objective optimization (EMO) algorithms [11, 10, 15, 20, 21, 22, 23] were proposed in order to tackle MOO problems. Figure 2.3 A basic EMO framework. A basic EMO framework is illustrated in Figure 2.3. It consists of multiple components as described below: • Problem specification - This is the first step in the optimization. Here, the user specifies the objective functions, the decision variables, the constraints, and the model to be used. • Optimization strategy - Here, an EA is selected with suitable parameter values for solving the problem. • Generate new solutions - Create new candidate solutions according to the EMO algorithm. For genetic algorithm-based EAs, new solutions are generated randomly in the first generation and through crossover and mutation operators in the subsequent generations. • Evaluation and selection - Evaluate the objective functions and select best-performing solu- tions. 8 • Best solutions - These are the best-performing solutions selected in the evaluation and selection step. For an EMO algorithm the non-dominated solution set consists of the best- performing solutions. • Termination criteria - Decide when to stop the algorithm. In this dissertation, an upper bound on the number of objective function evaluations (FEs) performed by the EMO algorithm will be used as a termination criteria. 2.3 Human-computer interaction in optimization For real-world problems, designers often have some problem knowledge or intuition developed over multiple years. Developing methods to exploit such knowledge can help in improving the optimization performance. In the following sections, we review existing work which proposes methods to incorporate problem knowledge into an optimization process, either through human input or via heuristics. We also review works which propose automated methods to algorithmically extract problem knowledge. For complex single-objective practical problems, evolutionary algorithms (EAs) with generic recombination and mutation operators [3, 4] may be too slow to lead to high-performing regions of the search space. Good performance of an algorithm in solving benchmark problems such as ZDT [24], DTLZ [25], WFG [26], and CEC 2009 [27] does not always translate to good performance on practical problems. For such cases, creating customized algorithms leveraging additional problem information is necessary. Deb and Myburgh [28] proposed a customized evolutionary algorithm that exploited the linearity of constraint structures to solve a billion-variable resource allocation problem. 
A micro-genetic algorithm [29] combining range-adaptation and knowledge- based re-initialization was applied to an airfoil optimization problem. Semi-independent variables [6] can be used to handle user-specified monotonic relationships among variables in the form of 𝑥𝑖 ≤ 𝑥𝑖+1 ≤ 𝑥𝑖+2 ≤ . . . ≤ 𝑥 𝑗 . Every newly created solution is guaranteed to automatically satisfy the given monotonic constraints. However, a new description of semi-independent variables needs to be defined and implemented for a different relationship, such as a monotonically decreasing relation of the form 𝑥𝑖 ≥ 𝑥𝑖+1 ≥ 𝑥𝑖+2 ≥ . . . ≥ 𝑥 𝑗 . Some techniques for combining EAs with 9 problem knowledge are given in [30]. 2.3.1 Human-specified problem knowledge A common way to incorporate human intuition into an optimization algorithm for a practical problem is to design a customized EA or apply some heuristics. Customized algorithms have problem knowledge implicitly built-in, and often provide a better performance compared to the standard versions of popular EAs. Amouzgar et al. [31] proposed a modified GA with a customized solution representation for a multi-objective machining tool-indexing problem. Diaz et al. [32] proposed a simulation-based NSGA-II algorithm for optimizing reconfigurable manufacturing systems. Nourmohammadi et al. [33] created a a hybrid GA with variable neighborhood search in order to solve an integrated supermarket location and transport vehicles selection problem. It was shown to outperform traditional GA. Amouzgar et al. [34] proposed a GA with custom operators for minimizing the non-machining time of CNC machines. In combinatorial optimization problems, like the traveling salesman problem, specific studies exist [35]. An algorithm inspired by the knapsack problem has been used in [36] for solving a pharmaceutical clinical trial planning problem. A meta-heuristic optimization algorithm has been used in [37] to achieve collaboration between an engineer and an architect for designing pre-fabricated wall-floor building systems. A two-step heuristic-based algorithm is presented in [38] for performing network topology design. Another application exists in the layer patterning of plate fin heat exchangers [39]. Gandomi et al. [6] proposed a novel semi-independent variable formulation in order to enforce pre-defined monotonically increasing or decreasing variable patterns. 2.3.2 Computer-extracted problem knowledge In the previous sections, we have covered some works which made use of customized EAs or problem formulations to incorporate additional problem knowledge. In this section, we review some existing works which do not make use of pre-specified problem knowledge. Instead, learning methods are used to automatically extract such information. Rios et al. [40] proposed a novel deep neural network for learning compact geometric representations of car shapes which can be used to generate more optimal designs. Friess et al. [41] used artificial neural networks for feature extraction 10 among optimal solutions found by an EA. A simple transfer learning framework for continuous evolutionary single-objective optimization is proposed in [42] which captures knowledge from completed optimization runs as empirical probability distributions. Ruan et al. [43] performed a computational study to determine the effectiveness of knowledge transfer from one instance of an optimization run to another. 
Alternatives to pre-specifying problem information exist, such as cultural algorithms which encode domain knowledge inside a belief space [44]. Self-organizing maps (SOMs) can provide information about important design variable clusters [45]. Recent innovization studies [7, 8, 9] aim to extract additional problem information from high-performing solutions during the optimization process. This can lead to a faster convergence. 2.3.3 Interactive optimization For practical problems, designers may want to play a more active role in the optimization process rather than just providing information in an apriori manner. In this section, we review some existing works covering interactive optimization involving the exchange of problem knowledge. Saha et al. [46] proposed a method to mimic user behavior when using an EA to find the optimal shapes of 3D objects which satisfies the user’s aesthetic sensibilities. Karlsson et al. [47] combines NSGA-II, and a customized data mining algorithm, called Flexible Pattern Mining to extract knowledge in the form of rules and guide the optimization to converge towards a decision maker’s preferred region in the objective space. Morshedzadeh et al. [48] propose a method to optimize information management in product lifecycle management systems, which is consistently being changed and revised by users. Lidberg et al. [49] proposes a method to support the DM during the optimization real-world factory flows. Smedberg et al. [50] proposes Trend Mining – a visualization method aimed towards assisting DMs in understanding the effect of each variable on the objective space. Interactive optimization is when the user, referred to as the decision maker (DM), provides guidance during the optimization [51]. Multiple ways to interactively specify information exist, such as aspiration levels [52], importance of individual objectives [53, 54], pairwise comparison 11 of solutions [55], etc. Many other interactive optimization methods can be found in literature [56, 57, 58, 55, 59, 60, 61, 62, 63]. Using additional problem information comes with a set of challenges. An effective knowledge representation method needs to be designed which can be used effectively by an optimization algorithm. At the same time, it should also be comprehensible to the user. Validating any user- provided knowledge is necessary since the quality of supplied information may vary. This reduces the possibility of a premature or false convergence. The user may wish to periodically monitor and review optimization progress as well as any learned information. If necessary, the user can also supply information in a collaborative fashion [58, 64]. However, care needs to be taken to ensure that any user feedback does not lead the search process towards sub-optimal solutions. 2.3.4 Innovization Non-dominated solutions found by an MOEA may share common characteristics which may help us uncover hidden knowledge about the practical problem under consideration. Innovization [7] is the process of finding such hidden common characteristics, also referred to as ‘rules’, which can be retained as important knowledge, ready to be applied to hasten the convergence properties of an optimization algorithm. In early studies [8, 7], innovization was presented as a post-optimization step to extract key knowledge for future analysis. The problem first needed to be solved using an MOEA. Then a manual analysis was performed to discover hidden relationships existing among the decision variables, constraints and objective functions. 
Automated innovization was introduced in [65] which allowed the discovery of multiple design rules in a single step. Some applications of automated innovization on practical engineering problems were covered in [66]. Multiple Pareto fronts can be obtained from the same optimization problem when the problem parameters are changed. Higher level innovization [67, 68] is the process of finding common overarching rules across multiple such Pareto fronts. Lower level innovization [8, 69], on the other hand, involves finding hidden characteristics from a preferred region of a single Pareto front. Temporal innovization [70] determines the relative hierarchy of obtained rules by analyzing their 12 significance across previous generations. The principles appearing earlier are hierarchically more important than those which emerge later in the optimization. Further works [71, 72, 73] demonstrated the benefits of applying the derived innovization principles within the same optimization run in an online manner, as heuristics or as repair operators. It was shown that online application of innovization rules helped in achieving faster convergence on many benchmark and practical problems. 2.3.5 Optimization software Multiple software implementations of optimization algorithms exist in order to make existing single and multi-objective optimization algorithms accessible to academic and commercial users. A well- designed optimization software or ‘solver’ can often make formulating and solving optimization problems easier, as well as save time. In addition, they can provide useful visualization tools and parallel computation capabilities. Some of the popular commercial optimization solvers are CPLEX [74], GUROBI [75], XPRESS [76], modeFrontier [77], VisualDoc [78], HEEDS [79], Optimus [80], optiSLang [81], and iSight [82], to name a few. Open-source solvers also exist, such as jMetal [83, 84], pymoo [85], PISA [86], NIMBUS [87], and DESDEO [88]. The popularity of optimization solvers illustrate the importance to back up theoretical optimiza- tion frameworks with a robust software implementation. As a part of this dissertation, a graphical user interface (IK-EMOViz) was also developed. 13 CHAPTER 3 KNOWLEDGE REPRESENTATION Knowledge is a generic term and can be interpreted in many different ways depending on the context. For an optimization task, here, we restrict the definition of knowledge to be additional information provided or extracted about the optimization problem itself. Specifically, we are interested in variable-variable relationships that commonly exist in high-performing solutions of the problem. A practical optimization task minimizes a number of objectives and satisfies a number of constraints, all stated as functions of one or more variables. Thus, understanding the variable- to-variable relationships which are common to feasible solutions (each represented by a variable vector) with small objective values is critically important. A supply of such knowledge a priori by the users, in addition to the optimization problem description, or a discovery process of such knowledge from the evolving high-performing optimization solutions, can be directly utilized by the optimization algorithm to speed up its search process. Moreover, if such knowledge is discovered during the optimization process, users will benefit from having this knowledge in addition to the optimal solutions of the problem. 
3.1 Structure of rules considered in this study For an interactive knowledge-based optimization algorithm to work, a standard form of knowledge representation is necessary which is simple enough for users to understand but has enough com- plexity to capture problem knowledge accurately. Using algebraic expressions or ‘rules’ is one way of representing knowledge and has been extensively used in the ‘innovization’ literature [7, 9]. A rule can take the form of an equality or an inequality, as shown below: 𝜙(x) = 0, (3.1) 𝜓(x) ≤ 0. (3.2) Any arbitrary form of rules involving many variables from a decision variable vector (x) and complicated mathematical structures of functions 𝜙 or 𝜓 may be considered [89, 90], but such rules would not only be difficult to learn, they would also be difficult to interpret by the user. In 14 this study, we restrict the rules to have simple structures involving a maximum of two variables, as discussed below. 3.1.1 Constant rule This type of rule involves only one variable taking a constant value (𝑥𝑖 = 𝜅𝑖 ). In terms of Equation 3.1, for the 𝑖-th variable, the structure of the rule becomes 𝜙𝑖 (x) = 𝑥𝑖 − 𝜅𝑖 . This type of rule can occur if multiple high-performing solutions are expected to have in common a fixed value of a specific variable [91]. 3.1.2 Power law rule Power law rules [7] for two variables 𝑥𝑖 and 𝑥 𝑗 can be represented by Equation 3.1 as 𝜙𝑖 𝑗 (x) = 𝑥𝑖 𝑥 𝑏𝑗 −𝑐, where 𝑏 and 𝑐 are constants. This form makes power laws versatile enough to encode a wide variety of rules, such as proportionate or inversely proportionate relationships among two variables. Interestingly, an inequality power law using a 𝜓 function can also be implemented, but such a rule may represent a relationship loosely and we do not consider it here. 3.1.3 Equality rule This type of rule can express the equality principle of two variables 𝑥𝑖 and 𝑥 𝑗 observed in high- performing solutions. In terms of Equation 3.1, 𝜙𝑖 𝑗 (x) = 𝑥𝑖 − 𝑥 𝑗 is the rule’s structure. 3.1.4 Inequality rule This type of rule can represent relational properties of two variables 𝑥𝑖 and 𝑥 𝑗 as 𝑥𝑖 ≤ 𝑥 𝑗 or 𝑥𝑖 ≥ 𝑥 𝑗 . In terms of Equation 3.2, 𝜓𝑖 𝑗 (x) = 𝑥𝑖 − 𝑥 𝑗 or 𝜓𝑖 𝑗 (x) = 𝑥 𝑗 − 𝑥𝑖 are the respective rules. For example, the radius of two beams in a truss [92] might be related via this type of rule. After describing the chosen rule structures, we are now ready to discuss the procedures of extracting such rules from high-performing variable vectors and applying the extracted rules to the optimization algorithm. A summary of their representations and use in our analysis are provided in Table 3.1. 3.2 User-provided knowledge Before the start of the optimization, the user may provide some initial information which will affect how the framework operates, details of which are given below. 15 Table 3.1 Rule types and the corresponding mathematical representation. X represents the set of ND solutions. 𝑥𝑖 and 𝑥 𝑗 refer to the 𝑖-th and 𝑗-th variables, respectively, of an ND solution x ∈ X. The corresponding variables in a new solution x𝑟 to be repaired are labeled as 𝑥𝑖𝑟 and 𝑥 𝑗𝑟 , respectively. Normalized variables are represented by a hat (𝑥ˆ𝑖 , 𝑥ˆ 𝑗 ). Higher ranked rules are preferred while performing repair. The score (𝑠) is a measure of how well X follows the rule in the representation column. Satisfaction condition dictates whether x𝑟 follows the respective rule. 
Rule type        Representation                 Score
Constant         phi_i(x) = x_i − kappa_i = 0   s_phi_i = |A_i|/|X|, where A_i = {1 : ∀x ∈ X, |x_i − kappa_i| ≤ rho_i}, kappa_i = median of x_i over X
Power law        phi_ij(x) = x̂_i x̂_j^b − c = 0   s_phi_ij = R² score of the linear regression shown in Equation 5.2
Equality         phi_ij(x) = x_i − x_j = 0      s_phi_ij = |B_ij|/|X|, where B_ij = {1 : ∀x ∈ X, |x_i − x_j| ≤ epsilon_ij}
Inequality (≤)   psi_ij(x) = (x_i − x_j) ≤ 0    s_psi_ij = |C_ij|/|X|, where C_ij = {1 : ∀x ∈ X, x_i ≤ x_j}
Inequality (≥)   psi_ij(x) = (x_j − x_i) ≤ 0    s_psi_ij = |D_ij|/|X|, where D_ij = {1 : ∀x ∈ X, x_i ≥ x_j}

3.2.1 Variable grouping

For a problem with n variables, there can be n(n−1)/2 pairwise variable interactions. For any reasonable-sized problem, such a huge number of meaningful relationships may not exist. In practice, the user may be interested in only a handful of relationships that relate some critical decision variables. In order to reduce the complexity, variables can be divided into different groups G_k for k = 1, 2, ..., n_g. Each group consists of variables that the user thinks are likely to be related. Group information is specified prior to the optimization. If no group specification exists, then all n variables are considered as part of a single group and all n(n−1)/2 pairwise variable combinations will be considered. Inter-group relationships are not discoverable under this scheme. Variables that are not part of any group are assumed not to be related to other variables.

For example, assume there are two variable groups G_1 = {2, 3, 5} and G_2 = {1, 4, 7} for an 8-variable problem. For G_1, all pairwise combinations (x_2, x_3), (x_2, x_5), and (x_3, x_5) will be checked for the existence of any possible relationships. A similar process is repeated for G_2. Since inter-group relationships are not explored, combinations like (x_3, x_7) will not be considered. Variables x_6 and x_8 are not part of any group, hence they are assumed not to be related to the other variables in any meaningful way.

3.2.2 Rule hierarchy

In the proposed framework, we consider the four types of rules presented in Section 3.1. A pair of variables may be related by more than one rule type. In that case, we select one type of rule according to a predefined hierarchy. At the start of the optimization, each rule type is assigned a rank. The existence of a particular rule type for one or more variables is checked rank-wise. For example, if constant rules are ranked 1, followed by power laws (rank 2) and inequalities (rank 3), then the variables in every group will be checked for constant rules first. The variables which do not exhibit constant rules will then be checked for power laws, and so on. For relations having equal rank, a scoring criterion needs to be used to determine which rule better represents the non-dominated (ND) front and will be used by the algorithm.
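To illustrate the rule structures and scores summarized in Table 3.1, the sketch below checks a set of ND solutions for a constant rule, a power law (fitted by linear regression in log space, in the spirit of the score column), and an inequality rule. The thresholds, the log-space fit, and the helper names are assumptions made for this example only, not the framework's exact implementation.

```python
import numpy as np

def constant_rule_score(X, i, rho=0.05):
    """Score for x_i = kappa_i: fraction of ND solutions within rho of the median."""
    kappa = np.median(X[:, i])
    return float(np.mean(np.abs(X[:, i] - kappa) <= rho)), kappa

def power_law_rule(X, i, j):
    """Fit x_i * x_j^b = c by linear regression in log-log space.

    Returns (b, c, r2); the R^2 value plays the role of the rule score.
    Assumes strictly positive (e.g., normalized) variable values.
    """
    log_xi, log_xj = np.log(X[:, i]), np.log(X[:, j])
    slope, intercept = np.polyfit(log_xj, log_xi, deg=1)  # log x_i = slope*log x_j + intercept
    pred = slope * log_xj + intercept
    ss_res = np.sum((log_xi - pred) ** 2)
    ss_tot = np.sum((log_xi - log_xi.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return -slope, float(np.exp(intercept)), float(r2)    # x_i * x_j^(-slope) = c

def inequality_rule_score(X, i, j):
    """Score for x_i <= x_j: fraction of ND solutions satisfying the inequality."""
    return float(np.mean(X[:, i] <= X[:, j]))

# Example: a synthetic ND set in which x_0 * x_1 ~= 2 holds
rng = np.random.default_rng(0)
x0 = rng.uniform(0.5, 2.0, size=50)
X = np.column_stack([x0, 2.0 / x0])
print(power_law_rule(X, 0, 1))   # exponent near 1, c near 2, R^2 near 1
```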
CHAPTER 4
USER-GUIDED INNOVIZATION-BASED MOEA FRAMEWORK

In this chapter, we propose an extension of the base EMO framework described in Chapter 2.2 which combines initial user-provided guidance and online innovization to perform a more efficient search. It also makes provisions for verifying the correctness of each piece of provided information, adjusting its influence on the optimization process accordingly. This requires an optimization algorithm that is customizable. Classical point-based methods do not offer such flexibility, making MOEAs such as NSGA-II [15] a better choice. Fig. 4.1 illustrates the MOEA/I framework and covers the entirety of the optimization process.

Figure 4.1 User-guided innovization-based MOEA framework (MOEA/I) showing learning and repair agents. Blue blocks represent a normal EMO. Green blocks represent the components responsible for knowledge extraction and application.

The MOEA/I framework [93] consists of some major additions to the base EMO framework, as described below:

• User knowledge - An intermediate step between problem specification and devising an optimization strategy, where the user can provide some initial guidance to the optimization algorithm.
• Learning agent - An algorithm or a machine learning method that extracts knowledge from the best solutions found during the optimization.
• Learned model - The model created by the learning agent, which encodes knowledge in some pre-specified format.
• Knowledge base - Information about the learned models in each generation is stored in the knowledge base. This information can be retrieved later if necessary.
• Repair agent - Based on the information stored in the knowledge base, the repair agent modifies or 'repairs' the decision variable values of the newly generated solutions.
• Repaired solutions - The modified solution set to be sent for evaluation.

The subsequent sections present the user knowledge specification procedure and the four repair operators used in this study, three of which are extensions of the operators proposed by [92]. Each repair operator functions differently when explicit relations are specified compared to when only variable groups are specified without any explicit relation. In this study, the scope of the repair operators is limited to inequality relations. However, if the problem demands it, custom repair operators can be designed and plugged into the MOEA/I framework.

4.1 User knowledge

User knowledge can be of different types, relating two or more comparable variables or a combination of variables, objective or constraint functions in certain ways. Comparable variables mean that they have identical units and vary over a similar range. For example, two size-related variables varying within [a, b] are defined as comparable variables here. This information is in addition to the optimization problem formulation provided by the user and is typically acquired over many years of the user's experience in dealing with past solutions of the problem.

An upper-diagonal relationship matrix U = [u_ij], where i, j ∈ [1, n], i < j, is proposed to store all user-provided pair-wise variable information:

        0, if x_i and x_j are not likely to have any relationship,
        1, if x_i and x_j have a relationship, but it is unknown,
u_ij =  2, if x_i < x_j,                                             (4.1)
        3, if x_i > x_j,
        4, if x_i ≈ x_j.

As mentioned before, all variable pairs for which the relationship index u_ij > 0 are required to have the same scale and represent similar quantities. This is justified by some practical considerations. Two variables representing totally different quantities, such as length and weight, should not be expected to have any direct inequality relationship.

For LSMOPs, a large number of decision variables makes specifying all n(n−1)/2 relationships extremely cumbersome. However, in practical problems, variable patterns can be specified or envisioned to have relationships in groups (or variable clusters). K groups of variables can be specified (G_k, k = 1, 2, . . . , K). An example of a 6-variable U matrix and 3 groups (G_1-G_3) is given in the supplementary document.
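A minimal sketch of how the relationship matrix of Equation 4.1 and the variable groups could be stored in code is given below, using the 6-variable, 3-group example discussed next. The data-structure choices (a dense upper-triangular NumPy array, 0-based index lists, and the diagonal convention) are illustrative assumptions, not the framework's actual implementation.

```python
import numpy as np

# Relationship codes from Equation 4.1
NO_RELATION, UNKNOWN, LESS_THAN, GREATER_THAN, APPROX_EQUAL = 0, 1, 2, 3, 4

def build_relationship_matrix(n_var, relations):
    """Build the upper-triangular matrix U from (i, j, code) triples (0-based indices)."""
    U = np.zeros((n_var, n_var), dtype=int)
    for i, j, code in relations:
        if i >= j:
            raise ValueError("U is upper-diagonal: provide pairs with i < j")
        U[i, j] = code
    np.fill_diagonal(U, APPROX_EQUAL)  # mirrors the diagonal convention of the example in Equation 4.2
    return U

# Example: x1 >= x2 >= x3, x1 <= x5 <= x6, and (x4, x6) related in an unknown way
groups = [[0, 1, 2], [0, 4, 5], [3, 5]]          # G1, G2, G3 with 0-based indices
relations = [(0, 1, GREATER_THAN), (0, 2, GREATER_THAN), (1, 2, GREATER_THAN),
             (0, 4, LESS_THAN), (0, 5, LESS_THAN), (4, 5, LESS_THAN),
             (3, 5, UNKNOWN)]
U = build_relationship_matrix(6, relations)
print(U)
```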
For the unknown relationship (u_ij = 1), it is expected that the optimization task will analyze its best population members to try to discover an exact relationship between variables x_i and x_j. For u_ij = 2 to 4, the optimization task is expected to establish whether the specified relationship exists and utilize the relationship to fix any population member that violates it. Thus, the qualitative relationships provided at the start of the optimization task must first be validated and then utilized to achieve faster convergence. As a by-product, the user also gets a quantitative version of the relationships they provided. This repair-based quantification process is described below.

As an example, consider 6 decision variables divided into three groups of variables G_1 = {1, 2, 3}, G_2 = {1, 5, 6} and G_3 = {4, 6}. The matrix U can be defined as:

        | 4 3 3 0 2 2 |
        | 0 4 3 0 0 0 |
U =     | 0 0 4 0 0 0 |          (4.2)
        | 0 0 0 4 0 1 |
        | 0 0 0 0 4 2 |
        | 0 0 0 0 0 4 |

indicating the following relationships: x_1 ≥ x_2 ≥ x_3, x_1 ≤ x_5 ≤ x_6, and (x_4, x_6) are related via an unknown relationship.

4.2 Innovization procedure

The learning agent operation (Figure 4.1) is described in Algorithm 4.1. Every variable pair (i, j) in a group G_k is cross-checked with the relationship matrix U to determine which variable pairs are designated by the user to be connected by an unknown relationship (u_ij = 1). A reference vector (x_ref), calculated from the best solutions obtained so far, encodes the 'average' relationship between each variable pair among the good designs found so far. The calculation procedure varies among the different repair operators. A score p_ij is assigned, equal to the proportion of the good solution set that follows the relationship existing among variable pair (i, j), defined using x_ref. This score represents how well x_ref reflects the set of good solutions and also acts as the probability with which a particular offspring is repaired.

4.3 Repair procedure

For every variable group, if an explicit relation is defined (u_ij > 1), the repair operator ensures every offspring follows it. Otherwise, the relationships learned through innovization (u_ij = 1) are applied with the probability calculated during the innovization procedure. The repair operators used in this study are outlined in Table 4.1. Each operator uses the knowledge available during the optimization run to different extents. For example, operator IR1 constrains the offspring the least, whereas operator IR3 constrains them the most. In addition, an ensemble operator (I-ES) is also proposed that automatically switches between IR1, IR2 and IR3 based on their individual performances.

Algorithm 4.1 Learning agent algorithm
Require: Good solution set (X), variable groups (G), relationship matrix (U)
Ensure: Reference vector (x_ref), rule scores
1:  Calculate reference vector x_ref from X            ⊲ According to the selected repair operator
2:  for each group G_k in G do
3:    for each variable pair (i, j) in G_k do
4:      p_l ← 0                                        ⊲ Rule score for x_i ≤ x_j
5:      p_g ← 0                                        ⊲ Rule score for x_i ≥ x_j
6:      p_e ← 0                                        ⊲ Rule score for x_i ≈ x_j
7:      if u_ij = 1 then
8:        for each solution x in X do
9:          if x_i ≈ x_j and x_i,ref ≈ x_j,ref then
10:           p_e ← p_e + 1
11:         else if x_i ≤ x_j and x_i,ref ≤ x_j,ref then
12:           p_l ← p_l + 1
13:         else if x_i ≥ x_j and x_i,ref ≥ x_j,ref then
14:           p_g ← p_g + 1
15:         end if
16:       end for
17:       p_ij ← max(p_l, p_g, p_e) / (p_l + p_g + p_e)
18:     end if
19:   end for
20: end for
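A compact Python sketch of the scoring loop in Algorithm 4.1 is shown below for pairs marked u_ij = 1. The reference vector is taken here as the variable-wise mean of the good solution set, and the tolerance used for the ≈ comparison and the zero-division guard are assumptions of this sketch.

```python
import itertools
import numpy as np

def learning_agent(X, groups, U, tol=1e-3):
    """Score the relationship learned for each variable pair marked u_ij = 1.

    X      : (N, n_var) array of good (non-dominated) solutions
    groups : list of lists of 0-based variable indices
    U      : upper-triangular relationship matrix (Equation 4.1)
    Returns a dict {(i, j): p_ij}, the fraction of X agreeing with the
    relationship implied by the reference vector x_ref.
    """
    x_ref = X.mean(axis=0)  # variable-wise average as the reference vector
    scores = {}
    for group in groups:
        for i, j in itertools.combinations(sorted(group), 2):
            if U[i, j] != 1:
                continue
            p_e = p_l = p_g = 0
            for x in X:
                if abs(x[i] - x[j]) <= tol and abs(x_ref[i] - x_ref[j]) <= tol:
                    p_e += 1      # x_i ~= x_j, in agreement with x_ref
                elif x[i] <= x[j] and x_ref[i] <= x_ref[j]:
                    p_l += 1      # x_i <= x_j, in agreement with x_ref
                elif x[i] >= x[j] and x_ref[i] >= x_ref[j]:
                    p_g += 1      # x_i >= x_j, in agreement with x_ref
            total = p_e + p_l + p_g
            scores[(i, j)] = max(p_e, p_l, p_g) / total if total else 0.0
    return scores
```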
If no repair operators are used, the algorithm is referred to as ‘Base MOEA’ in this study. 4.3.1 MOEA/IR1 Procedure It is assumed that the user has already provided the relationship matrix 𝑈 and variable groups 𝐺. MOEA/IR1 computes the variable-wise average (𝑥𝑖 𝑎𝑣𝑔 ) from all non-dominated (ND) solutions at the end of a generation and repairs a pair of variables of an offspring solution (𝑥𝑖 and 𝑥 𝑗 ) based on the supplied problem information 𝑢𝑖 𝑗 . MOEA/IR1 uses the information of the current population and does not collaborate with the same of the past generation. In this sense, it uses instantaneous information and may not be trustworthy. The repaired variable value 𝑥𝑖 𝑟 for 𝑢𝑖 𝑗 = 1, 2, and 3 are shown in Table 4.1. If 𝑢𝑖 𝑗 = 0, no repair is performed and a free-form evolution is allowed. If 𝑢𝑖 𝑗 = 1, meaning that a relation is expected, but is unknown, MOEA/IR1 attempts to learn the evolved relationship between 𝑥𝑖 and 𝑥 𝑗 present among the non-dominated (ND) solutions and to enforce repair of both variable 22 values. If 𝑥𝑖 𝑎𝑣𝑔 is smaller than 𝑥 𝑗 𝑎𝑣𝑔 and 𝑥𝑖 is also smaller than 𝑥 𝑗 , the current (𝑥𝑖 , 𝑥 𝑗 ) pair matches the relationship among ND solutions and hence, the (𝑥𝑖 , 𝑥 𝑗 ) pair is not modified. However, if 𝑥𝑖 ≥ 𝑥 𝑗 , disagreeing with the relationship found between their average ND values, the repaired (𝑥𝑖 𝑟 , 𝑥 𝑗 𝑟 ) pair in Table 4.1 are closer to their (𝑥𝑖 𝑎𝑣𝑔 , 𝑥 𝑗 𝑎𝑣𝑔 ) values. After the repair, the difference |𝑥𝑖 𝑎𝑣𝑔 −𝑥𝑖 𝑟 | = |𝑥𝑖 𝑎𝑣𝑔 −0.5(𝑥𝑖 𝑎𝑣𝑔 +𝑥𝑖 )| = 0.5|𝑥𝑖 𝑎𝑣𝑔 −𝑥𝑖 | is smaller than the original difference |𝑥𝑖 𝑎𝑣𝑔 −𝑥𝑖 | by 50%. The same is true for repaired variable 𝑥 𝑗𝑟 . For 𝑢𝑖 𝑗 = 2 and 3, the 𝑥𝑖 and 𝑥 𝑗 are repaired carefully (shown in Table 4.1) so that the supplied relationship among the two variable is satisfied. For 𝑢𝑖 𝑗 = 4, 𝑥𝑖𝑟 = 𝑥 𝑗𝑟 = 𝑟𝑎𝑛𝑑𝑜𝑚(𝑥𝑖 𝑎𝑣𝑔 , 𝑥 𝑗 𝑎𝑣𝑔 ) is assigned. 4.3.2 MOEA/IR2 Procedure This repair operator follows a similar repair process to that of MOEA/IR1, except that 𝑥𝑖 𝑎𝑣𝑔 is replaced with 𝑥𝑖𝑟 𝑒 𝑓 , which uses a history of change of the average 𝑥𝑖 from the past to the current generation, as shown in Table 4.1. The notable difference between the IR1 and IR2 operators is the inclusion of a momentum parameter (𝛾), noting that 𝛾 = 0 makes both methods identical. The parameter is intended to provide a boost to the optimization algorithm toward a projected good solution. The average value x𝑎𝑣𝑔 of the variables under consideration reflects the approximate pattern followed by good solutions in the current generation. The momentum term takes into account the average values in the previous generation x𝑎𝑣𝑔 (𝑡 − 1), and allows for faster convergence toward optimal solutions. It also allows the algorithm to avoid following x𝑎𝑣𝑔 too closely, thus, preserving diversity. This repair method is more trustworthy than IR1, as an attempt is made to make the variable values close to historically agreeable average values, rather than current average value alone. The update of variables for 𝑢𝑖 𝑗 = 4 is identical to that in IR1. 4.3.3 MOEA/IR3 Procedure This repair operator is similar to IR2 for 𝑢𝑖 𝑗 = 2 and 3, but for 𝑢𝑖 𝑗 = 1, the values are repaired to be within one-sigma (𝜎𝑖 is the standard deviation of the 𝑥𝑖 values among ND solutions of the current generation) away from the historical average point. Thus, this method trusts the observed relationships of 𝑥𝑖 and 𝑥 𝑗 more closely than IR1 and IR2. The update of variables for 𝑢𝑖 𝑗 = 4 is 23 Table 4.1 Repair operator description. 
Relation: Unknown relationship (u_ij = 1)

IR1:
  x_i^r = (x_i^avg + x_i)/2 if (x_i^avg − x_j^avg)(x_i − x_j) < 0, and x_i otherwise;
  x_j^r = (x_j^avg + x_j)/2 if (x_i^avg − x_j^avg)(x_i − x_j) < 0, and x_j otherwise.

IR2:
  x_i^r = (x_i^ref + x_i)/2 if (x_i^ref − x_j^ref)(x_i − x_j) < 0, and x_i otherwise;
  x_j^r = (x_j^ref + x_j)/2 if (x_i^ref − x_j^ref)(x_i − x_j) < 0, and x_j otherwise;
  where x_i^ref(t) = x_i^avg(t) + 𝛾 (x_i^avg(t) − x_i^avg(t − 1)).

IR3:
  x_i^r = U(x_i^ref − 𝜎_i, x_i^ref + 𝜎_i),
  x_j^r = U(x_j^ref − 𝜎_j, x_j^ref + 𝜎_j),
  where U(a, b) ≡ uniform distribution between [a, b].

Relation: Direct relationship (u_ij = 2 or 3)

IR1:
  x_i^r = x_ij^avg − |x_ij^avg − x_i^avg|/2 for u_ij = 2, and x_ij^avg + |x_ij^avg − x_i^avg|/2 for u_ij = 3;
  x_j^r = x_ij^avg + |x_ij^avg − x_j^avg|/2 for u_ij = 2, and x_ij^avg − |x_ij^avg − x_j^avg|/2 for u_ij = 3;
  where x_ij^avg = (x_i^avg + x_j^avg)/2.

IR2:
  x_i^r = x_ij^ref − |x_ij^ref − x_i^ref|/2 for u_ij = 2, and x_ij^ref + |x_ij^ref − x_i^ref|/2 for u_ij = 3;
  x_j^r = x_ij^ref + |x_ij^ref − x_j^ref|/2 for u_ij = 2, and x_ij^ref − |x_ij^ref − x_j^ref|/2 for u_ij = 3;
  where x_i^ref(t) = x_i^avg(t) + 𝛾 (x_i^avg(t) − x_i^avg(t − 1)), and x_ij^ref is defined from x_i^ref and x_j^ref analogously to x_ij^avg.

IR3:
  Same repair equations as IR2 for u_ij = 2 and 3.

identical to that in IR1 and IR2.

4.3.4 Ensemble MOEA/I-ES Procedure

All repair operators designed for MOEA/I must be tested through separate experiments. However, in practical applications, such experiments might prove costly. A method to avoid this is to combine these operators into an ensemble. During the optimization run, the performance of each repair operator is tracked, and the number of solutions allowed to be repaired by each operator is based on the historical performance of the offspring generated by each. This relieves the need to analyze every operator, since the best-performing one will be selected automatically. The ensemble method also treats Base MOEA as a repair operator, which represents the case when the offspring is not repaired. This allows the optimization to remain relatively unaffected if bad repair operators are supplied. Thus, a total of four repair operators are used in the ensemble process.

The performance of the i-th repair operator is measured by its offspring survival rate (r_s^i). The greater the survival rate of the offspring created by an operator, the greater the probability of it being used to repair subsequent offspring. The probability (p̂_r^i) update operation for the i-th operator is described below:

  p_r^i(t + 1) = max( p_min, 𝛼 r_s^i / Σ_i r_s^i + (1 − 𝛼) p̂_r^i(t) ),     (4.3)
  p̂_r^i(t + 1) = p_r^i(t + 1) / Σ_i p_r^i(t + 1),                          (4.4)

where 𝛼 is the learning rate and r_s^i = n_s^i / n_off (n_s^i and n_off are the number of offspring created by the i-th operator that survive in generation t and the total number of offspring that survive in generation t, respectively). It is possible that at some point during the optimization, no solution generated by one of the repair operators survives. This might cause the corresponding selection probability to go down to zero without any possibility of recovery. To prevent this, the probability update step in Equation 4.3 ensures that a minimum selection probability (p_min) is always assigned to each repair operator present in the ensemble. Equation 4.4 normalizes the probability values for each operator so that their total sum is 1.
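The update of Equations 4.3 and 4.4 can be written compactly as in the sketch below, with the survival counts passed in directly and the 𝛼 and p_min values reported later in this section used as defaults; the zero-division guards are assumptions of this sketch.

```python
import numpy as np

def update_repair_probabilities(p_hat, n_survived, alpha=0.5, p_min=0.1):
    """One step of the ensemble probability update (Equations 4.3 and 4.4).

    p_hat      : current normalized selection probabilities, one per operator
    n_survived : surviving offspring counts per operator in the current generation
    """
    p_hat = np.asarray(p_hat, dtype=float)
    n_survived = np.asarray(n_survived, dtype=float)
    n_off = n_survived.sum()                                # total surviving offspring
    r_s = n_survived / n_off if n_off > 0 else n_survived   # survival rates r_s^i
    share = r_s / r_s.sum() if r_s.sum() > 0 else r_s
    # Equation 4.3: blend survival shares with the previous probabilities,
    # never letting any operator fall below p_min
    p_new = np.maximum(p_min, alpha * share + (1.0 - alpha) * p_hat)
    # Equation 4.4: renormalize so the selection probabilities sum to 1
    return p_new / p_new.sum()

# Example: four operators (Base MOEA, IR1, IR2, IR3); IR2 offspring survived most
print(update_repair_probabilities([0.25, 0.25, 0.25, 0.25], [2, 5, 12, 1]))
```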
The learning rate (𝛼) determines the rate of change of the repair probabilities. A high 𝛼 increases the sensitivity and can result in large changes in repair probabilities over a short period of time. A low 𝛼 exerts a damping effect which causes the probability values to update slowly. Through trial and error, 𝛼 = 0.5 and p_min = 0.1 were found to be suitable for the problems of this study.

4.4 Sensitivity analysis of momentum parameter

For the IR2 and IR3 repair operators, the calculation of x_i^ref requires specifying a momentum parameter (𝛾). A sensitivity analysis of 𝛾 is presented in this section. For scenarios S_2 and S_3 of the 820-member truss, 𝛾 was set at different values between 0 and 1, and 20 runs were performed for each setting using NSGA-II/IR2. The median HV values obtained after 2 million function evaluations are recorded for each 𝛾 setting. The results are outlined in Figure 4.2.

Figure 4.2 Performance of NSGA-II/IR2 for each 𝛾 = 0, 0.1, 0.2, ..., 1, combined with scenarios S_2 and S_3; 20 runs were performed for each setting. (a) Median HV after 2 million function evaluations. (b) Median function evaluations taken to reach HV_T = 0.91.

From Figure 4.2a, it is evident that the HV variation is very low for 𝛾 ≤ 0.6, whereas for higher values the median final HV drops sharply. In Figure 4.2b, it is seen that 0.2 ≤ 𝛾 ≤ 0.4 results in fewer function evaluations being required to achieve HV_T. The difference in the number of function evaluations is also low within this range, indicating a low parameter sensitivity in this region. The dotted lines signify that NSGA-II/IR2 was not able to reach HV_T within 2 million function evaluations. Taking the results of both the HV and function-evaluation sensitivity analyses into account, 𝛾 is set to 0.3 in order to ensure maximum performance and low parameter sensitivity.

4.5 Truss Design Problem

4.5.1 Problem background and formulation

Truss structures provide a wide variety of optimization applications [94, 95, 96, 97]; hence, a number of algorithms have been published to solve them. Genetic algorithms (GAs) have been applied to truss design problems by [98, 99] for single objectives, [100, 101] for multiple objectives, and [102] for many objectives. For this study, a scalable 3D truss design problem has been created based on [92, 103]. One such truss having 260 cylindrical members is shown in Fig. 4.3. It is simply supported at the extreme nodes in the lower portion. All members parallel to the x and y axes are 4 meters long. A vertical load of 10 kN is applied in the negative z-direction to each of the top nodes, as shown in Fig. 4.3. The number of truss members can be increased as necessary to scale up the problem.

Figure 4.3 A truss structure with support and vertical loadings. (Legend: ungrouped members; vertical members (G_1, G_4); top longitudinal members (G_2); bottom longitudinal members (G_3); nodes; fixed supports; forces.)
The truss design problem can be defined as a bi-objective optimization problem, with the objectives being minimized: (a) weight and (b) compliance. There are two types of design variables: size variables, defined by the member radii (r), and shape variables, defined by the lengths of the vertical members (L_v) parallel to the z-axis. The L_v for the vertical members at y = 0 is assumed to be the same as that for those at y = 4. This prevents too much arbitrariness in the truss shape. The problem can be scaled by adjusting the number of shape variables (n_s). Two constraints are to be satisfied: (a) the stress in each member should be less than 200 MPa, and (b) the displacement of each node should be less than 20 mm. The problem formulation is shown below:

  Minimize     f_1(r, L_v) = 𝜌 Σ_{i=1}^{n_m} V_i(r, L_v),                     (4.5)
  Minimize     f_2(r, L_v) = F(r, L_v) · D(r, L_v),                           (4.6)
  subject to   𝜎_i(r, L_v) ≤ S,       for i = 1, 2, . . . , n_m,              (4.7)
               |D_j(r, L_v)| ≤ 𝛿,     for j = 1, 2, . . . , n_n,              (4.8)
               r^L ≤ r_i ≤ r^U,       for i = 1, 2, . . . , n_m,              (4.9)
               l^L ≤ l_j ≤ l^U,       for j = 1, 2, . . . , n_n,              (4.10)

where V_i is the volume of the i-th member, 𝜌 is the material density (7,000 kg/m³), n_m is the total number of members, n_n is the total number of nodes, and F and D are vectors consisting of the forces in the members and the displacements at the nodes, respectively. 𝜎_i is the stress developed in the i-th member of the truss and is restricted to the material strength S = 200 MPa. Radius and length variables are restricted to [5, 300] mm and [4, 30] m, respectively. The absolute deflection of the nodes is restricted to 𝛿 = 20 mm. While f_1 is relatively easy to compute, f_2 and the two constraint evaluations require a finite element simulation, making the process computationally expensive. Two different trusses are considered here: one with 820 members and 236 nodes, and the other consisting of 1,380 members and 396 nodes. Due to the involvement of a large number of decision variables (n_s + n_m), this problem can be considered an LSMOP.

4.5.2 User-provided information

Truss design problems are very common in the literature, and an engineer can provide some expert knowledge about good truss shapes or weight distributions for a particular type of loading. In this problem, the radii of the members are comparable variables. The same is true for the lengths of the vertical members. Thus, a pair of variables (i, j) within each of these classes (radii and lengths) can be assigned a relationship index u_ij > 0. This, in turn, can be mathematically expressed and integrated into the MOEA/I algorithm. Two types of knowledge can be specified for this problem:

• Symmetry: Due to the symmetric nature of the loading and supports, it can be expected that an optimal truss design will be symmetric in terms of shape and weight distribution. For the truss shown in Fig. 4.3, there can be two planes of symmetry: (a) a plane parallel to the y-z plane and passing through the midpoint of the truss in the x-direction, and (b) a plane parallel to the x-z plane and passing through the midpoint of the truss in the y-direction.

• Monotonicity: For an optimal truss, the lengths of the vertical members will monotonically increase while moving from the support to the middle. In addition, for a simply-supported truss like the one considered here, it is known that the bending moment monotonically increases from the support towards the middle, so the radii of the corresponding members may also monotonically increase to withstand the large bending moment.

A possible way of encoding these two types of knowledge in the relationship matrix is sketched below.
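The sketch below shows one possible encoding of the symmetry and monotonicity knowledge above as entries of the relationship matrix U; the member-indexing convention (mirrored pairs supplied explicitly, radii or length indices ordered from the support towards the midspan) and the helper name are assumptions made for illustration.

```python
import numpy as np

def encode_truss_knowledge(n_var, monotone_chains, mirror_pairs):
    """Fill U with u_ij = 2 (x_i <= x_j) along each monotone chain and
    u_ij = 4 (x_i ~= x_j) for each mirrored pair of members."""
    U = np.zeros((n_var, n_var), dtype=int)
    for chain in monotone_chains:
        # e.g. vertical-member lengths ordered from a support towards the middle
        for i, j in zip(chain[:-1], chain[1:]):
            U[min(i, j), max(i, j)] = 2      # x_i <= x_j (monotonically increasing)
    for i, j in mirror_pairs:
        U[min(i, j), max(i, j)] = 4          # symmetry: x_i ~= x_j
    return U

# Toy example: 8 variables, one monotone chain of 4 lengths and two mirrored pairs
U = encode_truss_knowledge(8, monotone_chains=[[0, 1, 2, 3]],
                           mirror_pairs=[(0, 7), (1, 6)])
print(U)
```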
The incorporation of this a priori user knowledge into the optimization process requires defining the variable groups (𝐺) and the relationship matrix (𝑈) defining the associated relationships. Four groups of variables are considered in this study, as shown in Table 4.2. The truss members corresponding to each group are also highlighted in Fig. 4.3. Each group is also divided into two or more subgroups. These will be used in specifying symmetry relationships. Each group of size variables is designed taking into account the physical location of the members corresponding to those variables. For example, the members lying parallel to the x-axis at the top of the truss can be expected to be related to each other in some manner rather than to a member with a different location and physical orientation. Based on the variable groups, eight different scenarios (𝑆1 to 𝑆8 ) are created with varying extents of user knowledge being supplied, as shown in Table 4.3. Each scenario uses the available groups and subgroups in different ways. Each entry in the table represents the values (𝑢𝑖 𝑗 ) constituting the matrix 𝑈 for every variable pair (𝑖, 𝑗) ∈ 𝐺 𝑘 . The first four scenarios from 𝑆1 to 𝑆4 do not 29 Table 4.2 Variable groups for the 820 and 1,380-member truss cases. Each group has comparable variables having identical units and scales. Variable Indices Group Subgroup Variable Type 820-member truss 1,380-member truss 𝐺 11 [1 − 30] [1 − 50] 𝐺1 𝑙𝑖 of vertical members 𝐺 12 [30 − 59] [50 − 99] 𝐺 21 [60 − 88] [60 − 108] 𝐺 22 𝑟 𝑖 of top longitudinal [89 − 117] [109 − 157] 𝐺2 𝐺 23 members [118 − 146] [158 − 206] 𝐺 24 [147 − 175] [207 − 255] 𝐺 31 [176 − 204] [256 − 304] 𝐺 32 𝑟 𝑖 of bottom longitudinal [205 − 233] [305 − 353] 𝐺3 𝐺 33 members [234 − 262] [354 − 402] 𝐺 34 [263 − 291] [403 − 451] 𝐺 41 [292 − 321] [452 − 501] 𝐺 42 [321 − 350] [501 − 550] 𝐺4 𝑟 𝑖 of vertical members 𝐺 43 [351 − 380] [551 − 600] 𝐺 44 [380 − 409] [600 − 649] enforce any symmetry, making the optimization more challenging. The scenarios 𝑆5 to 𝑆8 enforce symmetry (𝑢𝑖 𝑗 = 4) in the truss. In scenario 𝑆1 , no knowledge is provided by the user. This is the reference scenario based on which the effectiveness of varying degrees of user knowledge coupled with different innovization- based repair operators is evaluated. In scenario 𝑆2 , only group 𝐺 1 is used. 𝑢𝑖 𝑗 is set to 1 for all variable pairs within subgroups 𝐺 11 and 𝐺 12 . This signifies that the length of each vertical member is expected to consistently follow a monotonically increasing or decreasing pattern, but it is not known beforehand what the exact relationship would be. It is up to the innovization process to determine the exact nature of the relationships among these two groups of variables dictated by the ND solutions. 𝑆3 extends 𝑆2 and applies 𝑢𝑖 𝑗 = 1 among all variables within each subgroup in all groups. Thus, relationships in both shape and size variables will now be obtained by the innovization process and will be enforced by our proposed procedure to various degrees dictated by the relative success of the three repair schemes. Scenario 𝑆4 provides more problem information to the optimization algorithm by specifying precise relationships among variables of each subgroup of the four groups. For example, within the 30 Table 4.3 User information matrix entries (𝑢𝑖 𝑗 ) and groups used for all scenarios. For 𝑢𝑖 𝑗 = 1, 2 and 3, the relations are restricted within a subgroup, and for 𝑢𝑖 𝑗 = 4, the relation is between the two subgroups. 
For example, for 𝑆6 and 𝐺 1 , 𝑢𝑖 𝑗 = 4 means 𝑟𝑖 ≈ 𝑟 60−𝑖 for 𝑖 = 1, . . . , 30. For 𝑆7 and 𝐺 2 , 𝑢𝑖 𝑗 = 4 means 𝑟𝑖 ≈ 𝑟 177−𝑖 where 𝑖 = 60, . . . , 117, for subgroups 𝐺 21 and 𝐺 22 . Scenario Group Subgroup 𝑆1 𝑆2 𝑆3 𝑆4 𝑆5 𝑆6 𝑆7 𝑆8 𝐺 11 1 1 2 1 1 2 𝐺1 0 4 4 4 4 𝐺 12 1 1 3 1 1 3 𝐺 21 1 2 1 2 0 0 0 0 4 4 𝐺 22 1 3 1 3 𝐺2 𝐺 23 1 2 1 2 0 0 0 0 4 4 𝐺 24 1 3 1 3 𝐺 31 1 2 1 2 0 0 0 0 4 4 𝐺 32 1 3 1 3 𝐺3 𝐺 33 1 2 1 2 0 0 0 0 4 4 𝐺 34 1 3 1 3 𝐺 41 1 2 1 2 0 0 0 0 4 4 𝐺 42 1 3 1 3 𝐺4 𝐺 43 1 2 1 2 0 0 0 0 4 4 𝐺 44 1 3 1 3 subgroup 𝐺 11 , variable pairs (𝑥𝑖 , 𝑥𝑖+1 ) are assigned a relationship (𝑢𝑖,𝑖+1 = 2): 𝑥𝑖 ≤ 𝑥𝑖+1 for 𝑖 = 1 to |𝐺 11 | − 1. For 𝐺 12 , (𝑢𝑖,𝑖+1 = 3): 𝑥𝑖 ≥ 𝑥𝑖+1 for 𝑖 = 1 to |𝐺 12 | − 1 is assigned. Scenarios 𝑆5 to 𝑆8 use the same grouping as scenarios 𝑆1 to 𝑆4 , respectively. The only addition is the application of symmetry relations within the respective variables of the subgroups in each group. Two planes of symmetry exist as mentioned earlier, and for the corresponding variable pairs (𝑖, 𝑗), 𝑢𝑖 𝑗 is set as 4. In this study, only inequality-based relationships are considered due to the nature of the problems considered. However, it is possible to extend the scope to cover other types of relations (such as power laws) as well. In order to test the robustness of the proposed repair operators, different knowledge levels are considered. Experiments are performed to determine how the algorithm performs in the most adverse conditions, such as too little or too much information, and scenarios involving asymmetric trusses. With every scenario, the user knowledge extent is increased, all the way until the final scenario, which specifies the exact relationships expected to be present in 31 optimal solutions. 4.5.3 Experimental Settings In this study, NSGA-II [15], a state-of-the-art MOEA, has been used for the MOEA/I frame- work. The original NSGA-II algorithm, without any modifications, will be referred to as the base optimization case or Base NSGA-II. The 4 types of repair methods, MOEA/IR1, MOEA/IR2, MOEA/IR3 and MOEA/I-ES, presented in Sections 4.3.1 to 4.3.4, were used, and are referred to as NSGA-II/IR1, NSGA-II/IR2, NSGA-II/IR3 and NSGA-II/I-ES, respectively. For scenarios 𝑆4 and 𝑆8 , additional experiments were performed using semi-independent variables, referred to as NSGA-II/SIV, described in Section 4.5.3.1. The parameter settings for NSGA-II as well as for the repair operators are presented in Table 4.4. Two truss cases, one with 820 members, and another with 1,380 members, were considered. The variable groups were set according to Table 4.2. Different experiments were performed with multiple levels of user knowledge integration, which are described in Section 4.5.2. The number of decision variables for scenarios 𝑆1 -𝑆8 are 879 and 1,479 for the 820 and 1,380-member trusses, respectively. For scenarios 𝑆5 to 𝑆8 , Base NSGA-II takes into account the symmetry relations by evolving only one of any pair of variables following a symmetry relation. Thus, the problems are run with reduced dimensions for these cases only for Base NSGA-II. For the 820 member truss, the number of decision variables in that case would be 850 for 𝑆5 -𝑆6 (shape symmetry) and 450 for 𝑆7 -𝑆8 (shape and size symmetry). For the 1380 member truss, the number of decision variables in that case would be 1439 for 𝑆5 -𝑆6 and 750 for 𝑆7 -𝑆8 . 20 runs were performed and for both truss cases, the maximum number of function evaluations available was set at 2 million. 
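The dimension reduction used by Base NSGA-II in the symmetric scenarios can be sketched as follows. This is only an illustrative reconstruction of the idea of evolving one variable of each symmetric pair and mirroring it back before evaluation; the pairing list sym_pairs and the function names are assumptions.

    import numpy as np

    def reduce_dimensions(x_full, sym_pairs):
        # keep only one variable of each (kept, mirrored) pair listed in sym_pairs
        mirrored = {j for _, j in sym_pairs}
        kept = [i for i in range(len(x_full)) if i not in mirrored]
        return np.asarray(x_full)[kept], kept

    def expand_solution(x_reduced, kept, n_var, sym_pairs):
        # rebuild the full design vector before the finite element evaluation
        x_full = np.empty(n_var)
        x_full[kept] = x_reduced
        for i, j in sym_pairs:
            x_full[j] = x_full[i]        # copy each kept variable to its mirror image
        return x_full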
The hypervolume (HV) metric [104, 105, 106] is used as the performance metric. The median HV of scenario S1 achieved by Base NSGA-II is set as the target hypervolume (HV_T). Over 20 runs, for each method, the mean and standard deviation of the number of function evaluations taken to achieve HV_T are used for performance comparison. The Wilcoxon rank-sum test [107, 108] is used to compare the statistical performance of the algorithms tested here with respect to the best-performing algorithm for each scenario. As an example, let x_1 and x_2 represent the performance metric values for two algorithms A_1 and A_2. For each run, x_1 and x_2 exist as paired observations. Here, the null hypothesis states that there is no statistically significant difference between x_1 and x_2. The hypothesis is tested at the 95% confidence level and the p-values are recorded. A p-value > 0.05 means that there is not a statistically significant performance difference between A_1 and A_2. In this case, the algorithm with the lowest median number of function evaluations (FE_median^min) required to achieve HV_T is chosen as the reference. The HV values reached at one standard deviation more than the lowest median FE (FE_median^min + σ_FE^min) are recorded for all algorithms over the 20 runs. Using this data, the Wilcoxon test is performed and the p-values are calculated.

Table 4.4 Parameter settings of NSGA-II/I.
    Population size: 500
    Maximum generations: 4,000
    Mutation operator: Polynomial mutation [11]
    Mutation probability (p_m) and index (η_m): 1/n_var, 50
    Crossover operator: SBX [3]
    Crossover probability (p_c) and index (η_c): 0.9, 30
    Momentum parameter (γ) for IR2: 0.3
    Minimum repair probability (p_min) in Equation 4.3: 0.1
    α in Equation 4.3: 0.5

4.5.3.1 Comparison with semi-independent variables

Scenarios S4 and S8 explicitly define the relationships to be imposed on any newly generated solution, and thus the role of the repair operators is limited to enforcing the user-defined constraints. It is therefore necessary to compare the performance of the proposed repair operators with methods that handle this type of scenario as constraints. Since many of the explicitly specified relationships are monotonic in nature, methods such as semi-independent variables (SIVs, v_i) proposed by [6] can be used to enforce the provided knowledge as constraints. In this study, this algorithm is called NSGA-II/SIV. For example, for subgroup G_11 with K_11 variables under scenario S4 or S8 defined in Table 4.2, the SIV representation is shown below. A similar process can be repeated for the other variable groups as well.

Specified relationship: $l_1 \leq l_2 \leq \ldots \leq l_{K_{11}}$,
SIVs: $l_1$ (base var.), $v_2, v_3, \ldots, v_{K_{11}}$ (derived vars.),
where $l_i = l_{i-1} + v_i (l^U - l_{i-1})$, for $i = 2, 3, \ldots, K_{11}$,
      $l^L \leq l_1 \leq l^U$, and $0 \leq v_2, v_3, \ldots, v_{K_{11}} \leq 1$.

The restriction of v_i ≤ 1 ensures that all u_{i,i+1} = 2 relationships are always met with the SIVs. Similar updates can be made for u_{i,i+1} = 3.

4.5.4 Results and Discussion

The optimization results for the 820 and 1,380-member truss cases are presented in Tables 4.5 and 4.6, respectively. In order to estimate the extent of knowledge specified by the user, a metric (K) is proposed here which calculates the number of relations specified as a proportion of the total number of possible relations. Here, 'crisp' or explicit knowledge (u_ij = 4) is given a weight of one, partial information with u_ij = 2 or 3 is assigned a weight of 0.75, and unknown knowledge (u_ij = 1) is given a weight of 0.5.
Albeit somewhat arbitrary, the following K measure is used as an indication of overall problem information provided to the algorithm: Í Í Í [𝑢𝑖 𝑗 = 4] + 0.75 [𝑢𝑖 𝑗 = 2 ∨ 3] + 0.5 [𝑢𝑖 𝑗 = 1] 𝑖< 𝑗 𝑖< 𝑗 𝑖< 𝑗 K= . (4.11) 𝑛(𝑛 − 1)/2 Scenarios 𝑆1 to 𝑆8 are designed to show the results of the proposed MOEA/I approach (imple- mented as NSGA-II/I), and also for NSGA-II paired with SIVs. In both the truss cases, the row for scenario 𝑆1 is marked as N/A for all the algorithms except for Base NSGA-II. This is because 𝑆1 is the no-knowledge case where all the elements in the matrix 𝑈 are set to 0. Thus, all the repair operators remain inactive, and the results are the same as Base NSGA-II. For most of the cases, it is seen that NSGA-II/IR2 performs the best (marked in bold), except for scenarios 𝑆5 and 𝑆8 (both cases), and 𝑆4 (only for 1380-member truss). For 𝑆5 , even though Base NSGA-II performs the best in terms of FEs, the other algorithms give a statistically similar performance (𝑝 > 0.05). For 𝑆8 , all the repair operators and NSGA-II/SIV have a statistically worse performance (𝑝 < 0.05) compared 34 Table 4.5 FEs (median and standard deviation) required to reach target HV𝑇 = 0.91 for 820- member, 236-node truss problem. Best performing algorithm for each scenario (𝑆1 to 𝑆4 does not use symmetry of any kind but more knowledge is supplied from 𝑆1 towards 𝑆4 , and similar knowledge embedding is done for 𝑆5 to 𝑆8 , except that symmetry knowledge is used for these latter scenarios) is marked in bold. Algorithms with performance not statistically different from the best algorithm are marked in italics. N/A indicates ‘Not Applied’ when the user-provided information is not used (Base NSGA-II), or used in specific instances (NSGA-II/SIV). Wilcoxon test p-values are given in braces. Algorithm Used Scn. K Base NSGA-II NSGA-II NSGA-II NSGA-II /I- NSGA-II NSGA-II /IR1 /IR2 /IR3 ES /SIV 𝑆1 0 2M N/A N/A N/A N/A N/A 𝑆2 7.63e-05 2M 1.6M±65k 1.4M±72k 1.7M±9k 1.5M±50k N/A (𝑝 = 0.0301) (𝑝 = 0.1343) (𝑝 = 0.0951) (𝑝 = 0.1520) 𝑆3 3.80e-04 2M 1.2M±30k 0.98M±33k 1.5M±25k 1.2M±27k N/A (𝑝 = 0.0213) (𝑝 = 0.1830) (𝑝 = 0.0961) (𝑝 = 0.1841) 𝑆4 5.60e-04 2M 1.9M±23k 1.7M±20k 2M 1.8M±10k 1.9M±20k (𝑝 = 0.0424) (𝑝 = 0.0411) (HV=0.52, (𝑝 = 0.1313) (𝑝 = 0.0313) 𝑝 = 0.0068) 𝑆5 6.50e-04 1.2M±8k 1.3M±18k 1.3M ±12k 1.4M±8k 1.3M±21k N/A (𝑝 = 0.1363) (𝑝 = 0.1413) (𝑝 = 0.1526) (𝑝 = 0.1201) 𝑆6 7.60e-04 1.2M±8k 0.99M±22k 0.96M±25k 1.2M±22k 0.98M±21k N/A (𝑝 = 0.0414) (𝑝 = 0.1328) (𝑝 = 0.0983) (𝑝 = 0.1154) 𝑆7 1.16e-03 1M±10k 0.72M±19k 0.44M±23k 0.92M±10k 0.65M±15k N/A (𝑝 = 0.0048) (𝑝 = 0.0612) (𝑝 = 0.0141) (𝑝 = 0.0596) 𝑆8 1.30e-03 1M±10k 1.3M±21k 1.6M±30k 1.8M±20k 1.5M±27k 1.8M±10k (𝑝 = 0.0362) (𝑝 = 0.0419) (𝑝 = 0.0261) (𝑝 = 0.0237) (𝑝 = 0.0225) to Base NSGA-II. Tables 4.5 and 4.6 show that a moderate usage of knowledge obtained through innovization gives the optimal results. Too little or too much knowledge is detrimental to the optimization performance. The ensemble approach’s (NSGA-II/I-ES) performance is statistically similar to the best algorithm in all scenarios except 𝑆8 . For example, with a Wilcoxon ranksum test, it is observed that for 𝑆7 in Table 4.5, NSGA-II/I-ES has a statistically similar performance (with 𝑝 = 0.0596) to NSGA-II/IR2, the best-performing algorithm. The ensemble approach (NSGA-II/I- ES) has the advantage that the user does not need to think about which repair operator to use, since it is handled automatically. 
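A minimal sketch of computing the knowledge-extent metric K of Equation 4.11 from the user information matrix U is given below; the helper is illustrative and not taken from the source implementation.

    import numpy as np

    def knowledge_extent(U):
        n = U.shape[0]
        vals = U[np.triu_indices(n, k=1)]            # each pair (i, j), i < j, counted once
        weighted = (1.0  * np.sum(vals == 4)                        # explicit ('crisp') relations
                    + 0.75 * np.sum((vals == 2) | (vals == 3))      # partial, monotonic information
                    + 0.5  * np.sum(vals == 1))                     # relations of unknown form
        return weighted / (n * (n - 1) / 2)          # fraction of all possible pairwise relations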
Figure 4.4 shows the variation of solution evaluations for different algorithms and scenarios, compiled from Table 4.5. It is clear from both plots (with and without the use of symmetry knowledge) that the best performance is achieved with an intermediate supply of problem knowledge (S3 and S7) combined with an intermediate usage extent (IR2).

[Figure 4.4: two panels, 'No Symmetry Knowledge' and 'Symmetry Knowledge', plotting solution evaluations against scenarios S1-S8 (increasing K) for Base NSGA-II, NSGA-II/IR1, NSGA-II/IR2, NSGA-II/IR3, and NSGA-II/I-ES.]
Figure 4.4 Variation of solution evaluations for different algorithms and different scenarios for the 820-member, 236-node truss problem.

Proceeding row-wise from top to bottom, it is seen that performance generally improves until a particular point (S3 and S7) and then drops (S4 and S8). Interestingly, S4 and S8 are the scenarios which explicitly specify the relationships instead of letting the IR operators determine them. A possible cause is that over-specification of knowledge complicates the search process and constrains the algorithm's efficiency. An interesting comparison is the difference in performance between the NSGA-II/IR approaches and NSGA-II/SIV [6] (applicable to cases with u_ij = 2 or 3 only). For the 820-member truss case in Table 4.5, the performance of NSGA-II/SIV is close to NSGA-II/I-ES and is inferior to NSGA-II/IR2 for S4. For S8, NSGA-II/SIV is notably worse than the NSGA-II/IR approaches. A greater amount of supplied knowledge seems to provide only a marginal improvement in the performance of NSGA-II/SIV. For the 1,380-member truss (Table 4.6), NSGA-II/SIV is not able to reach the desired performance within 2M function evaluations in either of the applicable scenarios. This shows that the SIV approach is still susceptible to knowledge over-specification for large-sized problems. Results for both trusses for scenario S7 are shown in Figures 4.5 and 4.6, respectively. The ND front plots (Figures 4.5a and 4.6a) clearly show NSGA-II/IR2 as the better performer, providing higher quality solutions than the others, with NSGA-II/I-ES following close behind. The median
Table 4.6 FEs required to reach target HV_T = 0.80 for 1,380-member, 396-node truss problem. Algorithm Used Scn.
K Base NSGA-II NSGA-II NSGA-II NSGA-II /I- NSGA-II NSGA-II /IR1 /IR2 /IR3 ES /SIV 𝑆1 0 2M N/A N/A N/A N/A N/A 𝑆2 3.14e-03 2M 1.8M±42k 1.6M±48k 1.8M±9k 1.7M±25k N/A (𝑝 = 0.2347) (𝑝 = 0.3212) (𝑝 = 0.3205) (𝑝 = 0.3153) 𝑆3 4.70e-03 2M 1.6M±20k 1.4M±25k 1.9M±10k 1.5M±17k N/A (𝑝 = 0.0153) (𝑝 = 0.1394) (𝑝 = 0.0265) (𝑝 = 0.1277) 𝑆4 6.27e-03 2M 2M 2M 2M 2M 2M (HV=0.31, (HV=0.44, (HV=0.57, (HV=0.66), (HV=0.30, 𝑝 = 0.0019) 𝑝 = 0.0058) 𝑝 = 0.0103) 𝑝 = 0.0216) 𝑝 = 0.0038) 𝑆5 6.52e-03 1.6M±25k 1.7M±15k 1.7M±23k 1.8M±12k 1.7M±11k N/A (𝑝 = 0.3120) (𝑝 = 0.2961) (𝑝 = 0.1849) (𝑝 = 0.2971) 𝑆6 3.26e-02 1.6M±25k 1.3M±20k 1.0M±30k 1.5M±10k 1.1M±18k N/A (𝑝 = 0.0431) (𝑝 = 0.0713) (𝑝 = 0.0481) (𝑝 = 0.0692) 𝑆7 3.59e-02 1.4M±15k 1.2M±19k 0.90M±23k 1.2M±10k 1.0M±15k N/A (𝑝 = 0.0303) (𝑝 = 0.0572) (𝑝 = 0.0576) (𝑝 = 0.0861) 𝑆8 4.83e-02 1.4M±15k 2M 2M 2M 2M 2M (HV=0.68, (HV=0.73, (HV=0.72, (HV=0.71, (HV=0.66, 𝑝 = 0.0422) 𝑝 = 0.0490) 𝑝 = 0.0481) 𝑝 = 0.0477) 𝑝 = 0.0302) 5 1.6 Base NSGA-II NSGA-II/IR3 Base NSGA-II NSGA-II/IR3 NSGA-II/IR1 NSGA-II/I-ES 1.4 NSGA-II/IR1 NSGA-II/I-ES Compliance (m/N) Hypervolume (HV) 4 NSGA-II/IR2 NSGA-II/IR2 Repair Start Point 1.2 1.0 3 0.8 2 0.6 0.4 1 0.2 0 0.0 0.5 1.0 1.5 2.0 0.0 0.0 0.5 1.0 1.5 2.0 Weight (kg) 1e5 Function Evaluations 1e6 (a) ND fronts. (b) Median hypervolume of 20 runs. Figure 4.5 Results for 820-member truss for scenario 𝑆7 (symmetry among subgroups, unknown relationships within subgroups). HV plots (Figures 4.5b and 4.6b) show that NSGA-II/IR2 and NSGA-II/I-ES achieve a higher median HV, faster, as shown previously in Tables 4.5 and 4.6. The NSGA-II/I-ES median repair probabilities plots (Figure 4.7) show the selection probability of each repair operator as well as base NSGA-II for generating new solutions in NSGA-II/I-ES. It is seen that in both truss cases, a high probability is assigned to IR2 compared to others. Probabilities 37 1.6 12 Base NSGA-II NSGA-II/IR3 Base NSGA-II NSGA-II/IR3 NSGA-II/IR1 NSGA-II/I-ES 1.4 NSGA-II/IR1 NSGA-II/I-ES Compliance (m/N) Hypervolume (HV) 10 NSGA-II/IR2 NSGA-II/IR2 Repair Start Point 1.2 8 1.0 6 0.8 0.6 4 0.4 2 0.2 0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0 0.0 0.5 1.0 1.5 2.0 Weight (kg) 1e5 Function Evaluations 1e6 (a) ND fronts. (b) Median hypervolume of 20 runs. Figure 4.6 Results for 1,380-member truss for scenario 𝑆7 (symmetry among subgroups, unknown relationships within subgroups). Base NSGA-II Base NSGA-II 1 1 NSGA-II/IR1 NSGA-II/IR1 NSGA-II/IR2 NSGA-II/IR2 0.8 Repair begins 0.8 Repair begins NSGA-II/IR3 NSGA-II/IR3 Probability Probability 0.6 0.6 0.4 0.4 0.2 0.2 0 0 0 0.5 1 1.5 2 0 0.5 1 1.5 2 Function Evaluations 10 6 Function Evaluations 10 6 (a) 820-member truss. (b) 1,380-member truss Figure 4.7 Repair probability variation for median run. of IR1 and IR3 are reduced to the minimum level very early on, and new solution generation is controlled mostly by IR2 and the base NSGA-II. This clearly indicates the ability of the proposed ensembled method (I-ES) to pick the most successful operator on the fly for solving a problem efficiently. It is interesting from Tables 4.5 and 4.6 that with more information (K) being provided, the performance of all algorithms does not improve monotonically. This means that there exists an optimal amount of problem information for every problem below or above which there is under- or over-specification of information provided. 
When less than necessary information is provided, an 38 algorithm needs to work its way to find relevant building blocks for solving the problem, thereby requiring more solution evaluations. On the other hand, if more than necessary information is provided, even if the information is correct, the algorithm may get restricted in searching for other required information needed to solve the problem. Providing just an optimal amount of additional problem information and without unnecessary restriction in creating novel solutions enables an MOEA to have a flexible search, thereby requiring to use an minimal number of solution evaluations to solve a problem. For both trusses, the 𝑆7 scenario provides the right amount of additional problem information. Combining it with the NSGA-II/IR2 algorithm which uses the right amount of available knowledge, the problem can be solved with at most 22% and 45% of the median number of solution evaluations required by the base NSGA-II procedure, for the 820- and 1,380-member problems, respectively. NSGA-II/I-ES provides a statistically comparable performance, thus negating the need to know the appropriate repair operator in advance. 4.5.4.1 Visualization of non-dominated solutions In this section, details about some of the best non-dominated solutions for the two truss case studies have been provided. For each truss case, three designs taken from the Pareto Front of one run for NSGA-II/IR2 for Scenario 𝑆7 have been visualized in Figures 4.8 and 4.9. The first two designs in each figure are the lowest weight and lowest compliance trusses, thus covering the two extremes of the Pareto front. The third design is taken from the middle of the Pareto front. The member size distribution for each design is also provided. 39 0.12 0.10 0.08 0.06 0.04 0.02 0.00 (a) Min. weight truss (b) Member size distribution (4,000 kg, 5.0 m/N) (Median = 0.03 m2 ) 0.12 0.10 0.08 0.06 0.04 0.02 0.00 (c) Min. compliance truss (d) Member size distribution. (200,000 kg, 0.5 m/N) (Median = 0.06 m2 ) 0.12 0.10 0.08 0.06 0.04 0.02 0.00 (e) Intermediate truss (f) Member size distribution. (102,000 kg, 0.8 m/N) (Median = 0.04 m2 ) Figure 4.8 Figures (a), (c) & (e) show 3 truss topologies found in one run of NSGA-II/IR2 for the 820-member truss problem and scenario 𝑆7 . The first two trusses correspond to the two extremes of the Pareto Front (minimum weight and compliance). The third truss represents a solution picked from the middle region of the Pareto Front. Figures (b), (d) & (f) show the variation in member sizes among the 3 trusses. 40 0.14 0.12 0.10 0.08 0.06 0.04 0.02 0.00 (a) Min. weight truss (b) Member size distribution (24,000 kg, 11.8 m/N) (Median = 0.04 m2 ) 0.14 0.12 0.10 0.08 0.06 0.04 0.02 0.00 (c) Min. compliance truss (d) Member size distribution. (325,000 kg, 1.7 m/N) (Median = 0.09 m2 ) 0.14 0.12 0.10 0.08 0.06 0.04 0.02 0.00 (e) Intermediate truss (f) Member size distribution. (175,000 kg, 1.9 m/N) (Median = 0.07 m2 ) Figure 4.9 Figures (a), (c) & (e) show the optimal truss designs found in one run of NSGA-II/IR2 for the 1380-member truss problem and scenario 𝑆7 . Figures (b), (d) & (f) show the variation in member sizes among the 3 trusses. 41 4.6 Solid Rocket Design Problem 4.6.1 Problem background and formulation Optimal design of solid rocket motors is a pertinent problem due to the advantages of solid fuels ([109]). 
The main concern in designing a solid rocket motor is with choosing a suitable propellant composition, thickness of layers, and core propellant geometry which can meet a set of pre-determined performance criteria. This heterogeneous propellant distribution problem can be formulated as an optimization problem as done by [91] and the same formulation is used in this study. There are two criteria a good design must satisfy: (a) it should be able to meet a certain target thrust profile (thrust versus time), and (b) it should minimize the amount of insulation required in the motor by ensuring the entirety of the propellant gets burnt out simultaneously. In traditional solid rocket propellants the composition is more or less uniform. In this case, however, the composite propellant is made up of multiple sub-regions of same or different propellant types, each with a separate burning rate, and following a specific geometry. The optimization algorithm has to find a design that specifies which propellants to use and how they are arranged. Current manufacturing technologies face a lot of challenge in implementing such complex motor designs. However, with advances in 3-D printing technology, eventually, it may be become feasible to manufacture such designs. The rocket under consideration has the propellant regions divided vertically into six cylindrical sections, plus the dome, with each having a maximum of 20 distinct layers arranged as concentric rings. Three sections possess some additional propellant arranged at the core, also referred to as a ‘star’ or finocyl shape. In addition, there are six smaller non-cylindrical ‘corner’ segments acting as an interface between the dome and cylindrical segment propellants. Lastly, the nozzle region is also considered a separate segment with different propellant layers. Propellant types and their burn rates are presented in Table 4.7. Figures 4.10a and 4.10b illustrate the burn progression for a specific propellant arrangement. The target thrust profile is defined as a monotonically decreasing curve defined at 0.5-second intervals, and lasting for 10 seconds, as shown by the blue band in Figure 4.12a. A good design 42 Table 4.7 Available propellant types and their burn rates. Type Reference Burn Rate (m/s) 0 2.54e-03 1 3.05e-03 2 3.63e-03 3 4.34e-03 4 5.21e-03 5 6.22e-03 6 7.44e-03 7 8.92e-03 8 1.064e-02 9 1.275e-02 10 1.524e-02 (a) Side view. (b) Sectional views at different segments. Figure 4.10 Side and top views of a rocket showing burn progression from ignition till burnout. 10 propellant types with varying burn rates are represented by separate colors. The side view only shows the propellants arranged in cylindrical sections. The middle 3 segments for the top view at t=0 are the ‘star’ segments with a non-cylindrical geometry initially. Adapted from [91]. should match the target thrust profile with a tolerance of ±5%. 43 The optimization problem formulation is given below: 𝑡𝑏 ∑︁ Minimize 𝑓1 (x) = (𝑇 (x, 𝑡) − 𝑇𝑟 (𝑡)) 2 , (4.12) 𝑡=0 Minimize 𝑓2 (x) = 𝜇𝑟𝑒𝑠 (x) + 𝜎𝑟𝑒𝑠 (x), (4.13) subject to 𝑃 𝐿 ≤ 𝑃(x, 𝑡) ≤ 𝑃𝑈 , (4.14) where x is the vector of decision variables specifying each type of propellant in each layer of each segment, thickness of each layer, and geometry of core (finocyl) propellant arrangement. 
𝑇 (x, 𝑡) is the thrust obtained at time 𝑡, 𝑡 𝑏 is the target burn time, 𝑇𝑟 (𝑡) is the target thrust at time 𝑡, 𝜇𝑟𝑒𝑠 (x) and 𝜎𝑟𝑒𝑠 (x) are the mean and standard deviation of segment residues, respectively, 𝑃(x, 𝑡) is the pressure at time 𝑡, 𝑃 𝐿 and 𝑃𝑈 are the minimum and maximum allowable pressures, respectively. The allowable pressure range is set to be between 1.4 MPa and 3.5 MPa. There are both discrete and continuous variables, which add to the problem complexity. In addition, the number of decision variables (544) is also high, making it an LSMOP. The breakdown of each type of decision variables is given in Table 4.8. Table 4.8 Decision variable (x) properties for the rocket problem. Variables Type Range Number Layer propellants Discrete [0, 10] 260 Layer thicknesses Continuous [1, 2.35] 260 Star geometry Discrete [0, 3] 18 Star propellants Discrete [0, 35] 3 Circularization propellant Discrete [0, 10] 3 The first objective function defined by Equation (4.12) measures the sum of squared errors (SSE) between the target thrust and that obtained by the current design. The error at time 𝑡 is considered as 0 if the obtained thrust is within ±5% of the target thrust. The second objective function defined by Equation (4.13) is an indirect measure of the amount of insulation required and the extent of simultaneous burnout. Here, residue is measured by the thickness of the cylindrical fuel layer (in metres) remaining in each segment at the end of the burn, when the propellant in at least one of the segments has fully burnt out. The mean (𝜇𝑟𝑒𝑠 ) of the 44 residues at all the segments aims to encourage the maximum propellant usage and lengthen the burn period. The standard deviation (𝜎𝑟𝑒𝑠 ) of all the segment residues measure the disparity in the residues across different segments. Minimizing both 𝜇𝑟𝑒𝑠 and 𝜎𝑟𝑒𝑠 is necessary for obtaining a good design. Equation (4.14) represents the pressure constraint which is necessary to ensure optimal rocket operation. Dropping below the minimum pressure (Pressure Deflagration Limit, or PDL) will cause the rocket to irrevocably cease burning, whereas, exceeding the maximum pressure risks explosion of the rocket. 4.6.2 User Knowledge Unlike the truss design problem which is very well-studied in literature, the solid rocket design problem is relatively new [91]. This problem is intended to demonstrate the effectiveness of our proposed approach in a new and less-analyzed practical problem. Due to the lack of previous knowledge, most of the supplied user knowledge cases analyzed here will not specify any exact relationships, but rather, leave it to the innovization process to figure them out. A logical way to define variable groups is to do it segment-wise, as shown in Table 4.9. Groups 𝐺 𝑟1 -𝐺 𝑟3 represent the star shape for the 3 star segments. 𝐺 𝑟4 -𝐺 𝑟9 represent the propellants for each cylindrical segment. 𝐺 𝑟10 -𝐺 𝑟15 represent the layer thicknesses for each cylindrical segment. Five scenarios can be designed based on different levels of user knowledge extent, shown in Table 4.10. Scenario 𝐶1 represents the no-knowledge case of the optimization. This is the reference scenario and the optimization is free to evolve all 544 variables in any fashion. Scenario 𝐶2 is based on the knowledge that star segments determine a large proportion of the initial part of the burn [91]. As a result, specific patterns might result in a better rocket performance. Thus, for groups 𝐺 𝑟1 -𝐺 𝑟3 , 𝑢𝑖 𝑗 is set as 1. 
Scenario C3 extends C2 by including the propellant variables for each cylindrical segment (groups G_r4-G_r9) in the scope of the innovization. Thus, any pattern that exists between the propellants in each segment will be captured, and repair performed accordingly. Scenario C4 builds upon C3 and also includes the layer thickness variables (groups G_r10-G_r15). This scenario uses all the groups defined in Table 4.9. Scenario C5 is intended to demonstrate how more knowledge may not necessarily work in favor of the optimization algorithm. In this case, the nature of the target thrust profile is considered, and an explicit relation among the cylindrical layer propellants is supplied. The target thrust profile in Figure 4.12a is decreasing in nature, and the burn progresses outwards from the core to the shell. Hence, an argument can be made that the near-core layers should have a faster-burning fuel, and the far-core layers should have a slower-burning fuel. This can be expressed by setting u_ij = 3 for every group G_r4-G_r9. However, due to the problem complexity, such a trend is not guaranteed to hold in every case. Such knowledge can be potentially 'imperfect' or 'incomplete'. Compared to the other scenarios, C5 has the most user-supplied information.

Table 4.9 Variable groups for the rocket problem. Each group has comparable variables with identical units and scales.
    Variable Type                          Group     Variable Indices
    Star segment geometry                  G_r1      [521-526]
                                           G_r2      [529-534]
                                           G_r3      [537-542]
    Cylindrical segment propellants        G_r4      [1-20]
                                           G_r5      [21-40]
                                           G_r6      [41-60]
                                           G_r7      [61-80]
                                           G_r8      [81-100]
                                           G_r9      [101-120]
    Cylindrical segment layer thickness    G_r10     [261-280]
                                           G_r11     [281-300]
                                           G_r12     [301-320]
                                           G_r13     [321-340]
                                           G_r14     [341-360]
                                           G_r15     [361-380]

Table 4.10 User information matrix entries (u_ij) and groups used for all scenarios for the rocket problem. More problem information is provided with increasing index in C.
    Group     C1   C2   C3   C4   C5
    G_r1       0    1    1    1    1
    G_r2       0    1    1    1    1
    G_r3       0    1    1    1    1
    G_r4       0    0    1    1    3
    G_r5       0    0    1    1    3
    G_r6       0    0    1    1    3
    G_r7       0    0    1    1    3
    G_r8       0    0    1    1    3
    G_r9       0    0    1    1    3
    G_r10      0    0    0    1    1
    G_r11      0    0    0    1    1
    G_r12      0    0    0    1    1
    G_r13      0    0    0    1    1
    G_r14      0    0    0    1    1
    G_r15      0    0    0    1    1

4.6.3 Results and Discussion

The experimental settings are the same as those for the truss design problem described in Section 4.5.3, with only the maximum number of generations increased to 100,000. For scenario C5, which consists of explicitly-specified information, NSGA-II/SIV is also compared to the other repair operators (IR1, IR2, IR3 and I-ES). The optimization results are presented in Table 4.11. The extent of supplied information increases row-wise from scenario C1 to C5, as can be seen from the values of K. For all the scenarios, the p-values of the Wilcoxon test with the best-performing algorithm as the reference are also given in brackets. For the no-knowledge case C1, all the elements of matrix U are set to 0. Thus, all repair operators remain inactive, so apart from Base NSGA-II, all the other cells are marked as N/A. It is seen that in all of the cases except C5, NSGA-II/IR2 is the best-performing algorithm (marked in bold), with NSGA-II/I-ES showing comparable performance. Scenario C4 combined with the NSGA-II/IR2 operator gives the best performance overall. This shows that for both the amount of user-supplied knowledge and the extent of online knowledge usage, an optimal level exists. Too little or too much knowledge usage is detrimental to the optimization performance.
Interestingly, for 𝐶5 the relationships between the propellants in the cylindrical layers are explicitly defined. But this seems to constrain the algorithm and degrades its performance to below that of Base NSGA-II. Since this problem is not very well-studied, a possible cause could be the lack of a direct relation between the thrust and the individual layer propellants. The same thrust value can be produced by a lot of different propellant combinations across all segments, and simple monotonic relationships may not exist. In terms of final HV obtained, for all the cases, NSGA-II/IR2 provides the fastest convergence, demonstrated by the low number of function evaluations taken to reach the target HV, with NSGA- II/I-ES following closely. Base NSGA-II always performs worse in scenarios 𝐶2 to 𝐶4 . For 𝐶5 , the perceived knowledge 𝑢𝑖 𝑗 = 3 for 𝐺 𝑟4 -𝐺 𝑟9 is found to be not correct and they harm the performance of NSGA-II/IR methods. One of the non-dominated fronts obtained by the four repair operators for scenario 𝐶4 is shown in Figure 4.11a. It can be seen that NSGA-II/IR2 provides better quality solutions than the others. The median HV plot in Figure 4.11b also shows that NSGA-II/IR2 achieves the same level of performance as that of the no-knowledge case with fewer function evaluations. The thrust 47 Table 4.11 FEs required to reach target HV𝑇 = 0.93 for 544-variable solid fuel rocket design. Algorithm Used Scn. K Base NSGA- NSGA- NSGA- NSGA-II/I- NSGA- NSGA-II II/IR1 II/IR2 II/IR3 ES II/SIV 𝐶1 0 50.0M N/A N/A N/A N/A N/A 𝐶2 1.5e-04 50.0M 32.0M±2.0M 30.5M±1.5M 36.0M±3.0M 31.5M±1.0M N/A (𝑝 = 0.0053) (𝑝 = 0.1205) (𝑝 = 0.0104) (𝑝 = 0.1516) 𝐶3 4.0e-03 50.0M 28.0M±3.0M 25.0M±4.0M 32.0M±1.0M 26.0M±1.0M N/A (𝑝 = 0.0061) (𝑝 = 0.0998) (𝑝 = 0.0051) (𝑝 = 0.1336) 𝐶4 7.8e-03 50.0M 19.0M±5.0M 12.0M±5.0M 22.0M±4.0M 13.0M±2.0M N/A (𝑝 = 0.0029) (𝑝 = 0.0350) (𝑝 = 0.0154) (𝑝 = 0.7710) 𝐶5 9.8e-03 50.0M 50.0M 50.0M 50.0M 50.0M 50.0M (HV=0.74, (HV=0.78, (HV=0.66, (HV=0.84, (HV=0.69, 𝑝 = 0.0233) 𝑝 = 0.0135) 𝑝 = 0.0104) 𝑝 = 0.0157) 𝑝 = 0.0119) profile and residues for one of the solutions on the ND front is shown in Figures 4.12a and 4.12b, respectively. Thrust is always within allowable ranges and the residue for every segment is lower than 1 mm. 0.0006 Base NSGA-II Base NSGA-II 1.4 NSGA-II/IR1 R1 NSGA-II/I-R1 NSGA-II/IR2 R2 NSGA-II/I-R2 NSGA-II/IR3 0.0005 1.2 NSGA-II/I-ES NSGA-II/I-R3 R3 Repair Start Point Residue m et ric (f2) Hypervolume (HV) NSGA-II/I-ES 0.0004 1.0 0.8 0.0003 0.6 0.0002 0.4 0.0001 0.2 0.0000 0.0 0 1 2 3 4 5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 1e7 Function Evaluations 1e7 Thrust m at ch m et ric (f1) (a) ND fronts. (b) HV variations. Figure 4.11 ND fronts and HV variations of scenario 𝐶4 for rocket design. 4.6.3.1 Visualization of non-dominated solutions Three non-dominated solutions from different regions of the Pareto Front found in one run of NSGA-II/IR2 for scenario 𝐶4 are visualized in Figures 4.13, 4.14, and 4.15. Two types of decision variables are visualized: the ‘star’ geometry, and the cylindrical segment propellants. One common pattern visible in all 3 propellant distribution figures is the presence of faster burning fuel (reddish regions) in the inner layers and slower burning propellants (bluish regions) in the outer layers. This 48 0.005 Residual fuel thickness (m) 8000 Obtained Thrust 0.004 Thrust (Newtons) Target Thrust 6000 0.003 4000 0.002 2000 0.001 0 0 0 5 10 15 0 1 2 3 4 5 6 7 8 9 10 11 12 Time (seconds) Segment (a) Thrust profile is within limits ( 𝑓1 ). 
(b) Residue ( 𝑓2 ) is less than 1 mm. Figure 4.12 Objectives 𝑓1 and 𝑓2 for a particular run for scenario 𝐶4 . intuitively makes sense since the target thrust profile requires a higher thrust early on, thus requiring a faster burning fuel compared to the later part of the burn. (a) Star shapes in segments 2-4. (b) Propellant distribution in the cylindrical region. Figure 4.13 Star shapes and cylindrical region propellant distribution for the rocket design with minimum thrust error in one run of NSGA-II/IR2 for scenario 𝐶4 . Blue regions in the star shape figures represent the propellant. The green circular rings represent the cylindrical region. In the cylindrical propellant distribution figures, blue represents the slowest-burning propellant and red represents the fastest burning propellant. 49 (a) Star shapes in segments 2-4. (b) Propellant distribution in the cylindrical region. Figure 4.14 Star shapes and cylindrical region propellant distribution for the rocket design with minimum residue in one run of NSGA-II/IR2 for scenario 𝐶4 . (a) Star shapes in segments 2-4. (b) Propellant distribution in the cylindrical region. Figure 4.15 Star shapes and cylindrical region propellant distribution for a rocket design with an intermediate thrust error and residue lying in the middle region of the Pareto Front in one run of NSGA-II/IR2 for scenario 𝐶4 . 50 4.7 Overall Performance of Proposed Methods Each of the three case studies (879- and 1,479-variable truss designs and 544-variable rocket design) are ranked based on the median FEs needed by each of the five algorithms (base NSGA-II, three innovized repair based NSGA-IIs, and ensembled NSGA-II) in 20 independent runs of each. The sums of ranks of each algorithm over all 3 case studies are calculated, and the algorithms are ranked accordingly, shown in Table 4.12. The final row of Table 4.12 indicates that NSGA-II/IR2 performs the best, followed by NSGA-II/I-ES. The base NSGA-II and NSGA-II/IR1 are tied in third place. NSGA-II/IR3 is ranked last, showing that an aggressive use of available and potentially imperfect information performs the worst. Interestingly, even though a single balance of problem information and its use within NSGA-II (IR2) performs the best for these problems, with all repair methods being included, our proposed ensembled NSGA-II performs the second best. 4.8 Limitations of the proposed MOEA/I framework The primary limitation of the proposed MOEA/I framework is the absence of any user interaction during the optimization. While some level of user involvement is present at the beginning of the optimization run, none is present during the run. In addition, MOEA/I is shown to only work with inequality relationships. Other versatile relations such as power laws are not compatible with such a framework. These factors limit the utility of the MOEA/I framework in practical applications. This motivates the next part of the dissertation where a truly interactive framework able to accommodate multiple relationship types is proposed. 51 Table 4.12 Ranking of different NSGA-IIs on three case studies. 
Scenario Base IR1 IR2 IR3 I-ES 879-variable Truss Design 𝑆2 5 3 1 4 2 𝑆3 5 3 1 4 2 𝑆4 4 3 1 5 2 𝑆5 1 3 2 5 4 𝑆6 4 3 1 5 2 𝑆7 5 3 1 4 2 𝑆8 1 2 4 5 3 Rank 4 3 1 5 2 1,479-variable Truss Design 𝑆2 5 4 1 3 2 𝑆3 5 3 1 4 2 𝑆4 1 5 4 3 2 𝑆5 1 3 4 5 2 𝑆6 5 3 1 4 2 𝑆7 5 4 1 3 2 𝑆8 1 5 2 3 4 Rank 3 5 1 4 2 544-variable Rocket Design 𝐶2 5 3 1 4 2 𝐶3 5 3 1 4 2 𝐶4 5 3 1 4 2 𝐶5 1 4 3 5 2 Rank 4 3 1 5 2 Rank-Sum 11 11 3 14 6 Final Rank 3 3 1 4 2 52 CHAPTER 5 AN INTERACTIVE KNOWLEDGE-BASED EVOLUTIONARY MULTI-OBJECTIVE ALGORITHM FRAMEWORK Figure 5.1 shows the proposed interactive knowledge-based EMO (IK-EMO) framework. The framework starts with a description of the multi-objective optimization problem, as shown in the top-left box in the figure. In addition, if any additional problem information is available, that is also specified. The penultimate step before starting the optimization is to select a suitable EMO and methods to algorithmically extract and apply any problem knowledge. The subsequent sections describe the various components in more detail. Figure 5.1 Interactive knowledge-based EMO framework (IK-EMO) showing user interaction, learning and repair agents. Blue blocks represent a normal EMO. Green blocks represent the components responsible for knowledge extraction and application. Orange blocks represent user interaction operations. 5.1 Learning agent A learning agent is a procedure used to identify different innovization rules present among the ND solutions in a population. The rules involve a single variable or a pair of variables, as required by a rule’s description. Each type of rule (inequality, power law, etc.) requires a different rule satisfaction condition. A score (within [0,1], as presented in Table 3.1) is assigned to each rule to 53 quantify how well the rule represents the ND set. Different learning agents applicable to the rule types covered in Section 3.1 are presented below. A summary of the various rules, their scoring procedures and satisfaction conditions are provided in Table 3.1. 5.1.1 Constant rule In order to learn constant rules, we have to analyze the values of the variable under consideration for every ND solution and check if one or more of them converge to specific values. Since variables can be of different scales and units, we need a generalized criterion to determine if a variable is taking on a constant value. First, the median (𝑥˜𝑖 ) is calculated. The proportion of ND solutions which satisfy |𝑥𝑖 − 𝑥˜𝑖 | ≤ 𝜌𝑖 is said to be the score (𝑠 𝜙𝑖 ) of the constant rule 𝑥𝑖 = 𝜅𝑖 = 𝑥˜𝑖 . 𝜌𝑖 is a small tolerance used for determining whether variable 𝑥𝑖 ’s value is in the neighborhood of 𝑥˜𝑖 . It must be defined separately for each variable. An alternative option is to normalize the variables and define a singular 𝜌 for the normalized variable space. To check whether a new solution (x𝑟 ) follows 𝑥𝑖𝑟 = 𝜅𝑖 , we check whether 𝑥𝑖𝑟 lies in the neighborhood of 𝜅𝑖 using the condition: |𝑥𝑖𝑟 − 𝜅𝑖 | ≤ 𝜌𝑖 . 5.1.2 Power law rule In order to learn power laws (𝑥𝑖 𝑥 𝑏𝑗 = 𝑐) we use the method proposed in [72] with a modification. Each variable is initially normalized to [1, 2]. A training dataset is created from the ND solution set with the logarithms of normalized variables 𝑥ˆ𝑖 and 𝑥ˆ 𝑗 as features, leading to Eqn. 5.2: 𝑥ˆ𝑖 𝑥ˆ 𝑏𝑗 = 𝑐, (5.1) ⇒ log 𝑥ˆ𝑖 = 𝛽 log 𝑥ˆ 𝑗 + 𝜖, (5.2) where 𝛽 = −𝑏 is the weight and 𝜖 = log 𝑐 is the intercept. Normalization prevents 0 or negative values from appearing in the logarithm terms. 
Then we apply ordinary least squares linear regression to the logarithm of 𝑥ˆ𝑖 and 𝑥ˆ 𝑗 . Linear regression finds the best-fit line for the training data defined by the parameters 𝛽 and 𝜖. In order to evaluate the quality of the fit, we use the coefficient of determination (𝑅 2 ) metric. A new solution (x𝑟 ) follows the power law given in Equation 5.1 if the difference between the actual value (𝑥𝑖𝑟 or 𝑥 𝑗𝑟 ) and the predicted value (𝑥ˆ𝑖𝑟 or 𝑥ˆ 𝑗𝑟 ) is lower than a pre-defined threshold error (𝑒𝑖min 𝑗 ). Table 3.1 shows the formulation for the satisfaction condition. 54 5.1.3 Equality rule Two variables can be considered equal if |𝑥𝑖 − 𝑥 𝑗 | ≤ 𝜀𝑖 𝑗 with 𝜀𝑖 𝑗 being a tolerance parameter for variable pair 𝑥𝑖 and 𝑥 𝑗 . The proportion of ND solutions following this condition is the score (𝑠 𝜙𝑖 𝑗 ) of the equality rule. The need to define 𝜀𝑖 𝑗 for every variable pair can be avoided if normalized variables are used. 5.1.4 Inequality rule Inequality rules can be of the form 𝑥𝑖 ≤ 𝑥 𝑗 or 𝑥𝑖 ≥ 𝑥 𝑗 . The proportion of ND solutions satisfying either condition is the score of the respective rules. After the learning agent identifies specific rules from a set of ND solutions, the rules can be used to repair offspring solutions of the next generation. The repair mechanism for each rule is described next. 5.2 Repair agent Once the rules are learned from the current ND solutions by the learning agent, the next task is to use these rules to repair the offspring solutions for the next few generations. There are two questions to ponder. First, how many rules should we use in the repair process? Second, how closely should we adhere to each rule while repairing? A small fraction of learned rules may not embed requisite properties present in the ND solutions in offspring solutions. But the usage of too many rules may reduce the effect of each rule. Similarly, a tight adherence to observed rules may encourage premature convergence to a non-optimal solution, while a loose adherence may not pass on properties of ND solutions to the offspring. We propose four different rule usage schemes (10% (RU1) to 100% (RU4)) and three rule adherence schemes (RA1 (tight) to RA3 (loose)) for power law and inequality rules. 5.2.1 Constant rule To apply a constant rule 𝑥𝑖 = 𝜅𝑖 to a particular offspring solution x (𝑘) , the variable 𝑥𝑖(𝑘) is simply set to 𝜅𝑖 , thereby implementing the learned rule from previous ND solutions to the current offspring solutions. Constant rules are always included in the rule set and used with tight adherence. 55 5.2.2 Power law rule For a power law rule 𝑥ˆ𝑖 𝑥ˆ𝑗 𝑏 = 𝑐, one variable is selected as the base (independent) variable and the other variable is set according to the rule. For example, for a particular offspring solution x (𝑘) , if 𝑥ˆ𝑖 (𝑘) is selected as the base variable, 𝑥ˆ (𝑘) (𝑘) 𝑐 𝑏1 𝑗 is set as follows: 𝑥ˆ 𝑗 = ( 𝑥ˆ𝑖 ) . Despite theoretically being able to represent constant relationships by having 𝑏 = 0, in practice, extremely low values of 𝑏 can cause the repaired variable 𝑥ˆ (𝑘) 𝑗 to have a large value outside the variable range. Hence, in this study, we first check whether a variable follows constant rules, and if it does, then that variable’s involvement in a power law rule is ignored. A repair of a power law rule is followed with three different confidence levels by adjusting to an updated 𝑐-value: 𝑥ˆ𝑖 𝑥ˆ𝑗 𝑏 = 𝑐𝑟 . 
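A minimal sketch of this learning step and of the basic power-law repair is given below, assuming natural logarithms and variables already normalized to [1, 2]; the adherence schemes described next only change how the repair value c_r is drawn.

    import numpy as np

    def fit_power_law(xi_hat, xj_hat):
        # least-squares fit of log xi_hat = beta * log xj_hat + eps (Equation 5.2)
        X, y = np.log(xj_hat), np.log(xi_hat)
        beta, eps = np.polyfit(X, y, deg=1)
        y_pred = beta * X + eps
        r2 = 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)
        b, c = -beta, np.exp(eps)                  # so that xi_hat * xj_hat**b = c
        return b, c, r2                            # r2 measures the quality of the fit

    def repair_power_law(xi_hat, b, c_r):
        # set the dependent variable from the base variable: xj_hat = (c_r / xi_hat)**(1/b)
        # (variables following a constant rule are excluded beforehand, so b is not near 0)
        return (c_r / xi_hat) ** (1.0 / b)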
PL-RA1 uses c_r = c (tight adherence); PL-RA2 uses c_r ∈ N(c, σ_c) (medium adherence); and PL-RA3 uses c_r ∈ N(c, 2σ_c) (loose adherence), where σ_c is the standard deviation of the c-values for the power law observed among the ND solutions during the learning process. PL-RA1 puts the greatest trust in the learned power law rule, whereas PL-RA3 has the least amount of trust and provides the most flexibility in the repair process.

5.2.3 Inequality and equality rules

In order to repair an offspring solution x^(k), we have to select one variable (x_i^(k)) as the base variable and the other (x_j^(k)) as the dependent variable to be repaired. The generalized inequality repair operation is shown below:

$x_j^{(k)} = x_i^{(k)} + \nu_{r1} \left( x_i^U - x_i^{(k)} \right)$, for $x_i^{(k)} \leq x_j^{(k)}$,    (5.3)
$x_j^{(k)} = \dfrac{x_i^{(k)} - \nu_{r2}\, x_i^U}{1 - \nu_{r2}}$, for $x_i^{(k)} \geq x_j^{(k)}$.    (5.4)

Three different rule adherence (RA) schemes are considered. IQ-RA1 uses ν_r1 = μ_ν1 and ν_r2 = μ_ν2 (tight adherence with no standard deviation), which are computed as the means of ν_1 and ν_2 from the ND solutions during the learning process, as follows:

$\nu_1 = \dfrac{x_j - x_i}{x_i^U - x_i}, \qquad \nu_2 = \dfrac{x_i - x_j}{x_i^U - x_j}$.

For IQ-RA2, ν_r1 ∈ N(μ_ν1, σ_ν1) and ν_r2 ∈ N(μ_ν2, σ_ν2) (medium adherence with one standard deviation) are used, where σ_ν1 and σ_ν2 are the standard deviations of ν_1 and ν_2, respectively. Both ν_r1 and ν_r2 are set to zero if they come out to be negative. For IQ-RA3, ν_r1, ν_r2 ∈ U(0, 1) (loose adherence with a uniform distribution) are used.

5.3 Ensemble repair agent

Both power law and inequality/equality rules have three rule adherence options for repair. For a new problem, it is not clear which option will work best, so we also propose an ensemble approach (PL-RA-E and IQ-RA-E) in which all three options are allowed, but more probability is assigned to an option based on its success. The ensemble method also considers a fourth option in which no repair to an offspring is made. The survival rate (r_s^i) of offspring generated by the i-th repair operator is a measure of its quality. The greater the survival rate of the offspring created by an operator, the higher the probability of its being used in subsequent offspring generation. The probability (p̂_r^i) update operation for the i-th operator is presented below:

$p_r^i(t+1) = \max\left( p_{\min},\; \alpha \dfrac{r_s^i}{\sum_i r_s^i} + (1 - \alpha)\, p_r^i(t) \right)$,    (5.5)
$\hat{p}_r^i(t+1) = \dfrac{p_r^i(t+1)}{\sum_i p_r^i(t+1)}$,    (5.6)

where α is the learning rate and r_s^i = n_s^i / n_off, with n_s^i and n_off being the number of offspring created by the i-th operator that survive in generation t and the total number of offspring that survive in generation t, respectively. It is possible that at some point during the optimization, no solution generated by one of the repair operators survives. This might cause the corresponding selection probability to go down to zero without any possibility of recovery. To prevent this, in Equation 5.5, the probability update step ensures that a minimum selection probability (p_min) is always assigned to each repair operator present in the ensemble. Equation 5.6 normalizes the probability values for each operator so that their total sum is one. The learning rate (α) determines the rate of change of the repair probabilities. A high α would increase the sensitivity and can result in large changes in repair probabilities over a short period of time. A low α exerts a damping effect which causes the probability values to update slowly.
Through trial and error, 𝛼 = 0.5 and 𝑝 min = 0.1 are found to be suitable for the problems of this study. 57 5.3.1 Mixed rule repair agent A mixed rule repair agent is designed to work on two or more different types of rules. Since multiple rules (for example, an inequality rule and a power law rule) can show up for the same variable pair, a rule hierarchy needs to exist as defined in Section 3.2.2. Table 5.1 shows the rule hierarchical rank used for all the repair agents in this study. Table 5.1 Rule hierarchy by rank for each repair agent. Repair agent Rule type Rank Constant 1 PL-RA1, PL-RA2, PL-RA3, PL-RA-E Power law 2 Constant 1 Equality 2 IQ-RA1, IQ-RA2, IQ-RA3, IQ-RA-E Inequality (≤) 3 Inequality (≥) 3 Constant 1 Power law 2 Mixed (Power law and inequality) Equality 2 Inequality (≤) 2 Inequality (≥) 2 5.4 User’s ranking of rules The user forms the basis of the interactivity of the IK-EMO framework. At any point during the optimization, the user has the option to review the optimization results and provide feedback to the optimization algorithm in one or more of the following ways: • Rule ranking: The user may provide a ranking of rules (rank 1 is most preferred) provided by the algorithm. The algorithm will then try to implement the rules in the rank order provided by the user. • Rule exclusion: The user may select to remove certain rules provided by the algorithm, based on their knowledge of the problem. • Rule specificity: The user may specify details for considering a rule further. For example, the user may specify that only variables having a correlation above a specified value should 58 be considered. Another criterion could be to select all rules having a score greater than a threshold as rank 1 and exclude the others. In this study, the proposed rule usage schemes (RU1-RU4) can also be considered as artificial users [56] who select a certain percentage of the learned rules every few generations. This systematically illustrates the interactive ability of IK-EMO while showing the effect of different numbers of rules used for repair on the performance. 5.5 Variable relation graph (VRG) 𝑛(𝑛−1) The possible number of pair-wise relations among 𝑛 variables is 2 or 𝑂 (𝑛2 ). Thus, for a large number of variables, the amount of bookkeeping required to track individual pairwise relations is large. Moreover, the observed relationships should not contradict each other. For example, for inequality rules 𝑥𝑖 ≤ 𝑥 𝑗 and 𝑥 𝑗 ≤ 𝑥 𝑘 , the transitive property can be maintained by choosing to repair 𝑥 𝑗 based on 𝑥𝑖 , followed by repairing 𝑥 𝑘 based on 𝑥 𝑗 using Equation 5.3. But repairing both 𝑥 𝑗 and 𝑥 𝑘 separately based on 𝑥𝑖 can potentially contradict the rule 𝑥 𝑗 ≤ 𝑥 𝑘 . To solve these two challenges, we propose using a graph-based data structure, called variable relation graph (VRG), to encode and track relationships observed between multiple variable pairs. A customized graph-traversal algorithm ensures that all repairs are performed with minimal or no contradictions. In the following sections, steps 1 to 5 show the process of using learning agents to construct a VRG (learning phase). A learning interval (𝑇𝐿 ) is defined as the number of generations or function evaluations (FEs) after which a new learning phase begins. Step 6 shows the process of applying the VRG to repair an offspring solution using one or more repair agents (repair phase). A repair interval (𝑇𝑅 ) is defined as the number of generations or FEs between any two repair phases. 
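As a concrete illustration of the ensemble repair agent's survival-rate-driven probability update (Equations 5.5 and 5.6), a minimal sketch is given below; the variable names are placeholders and the survival counts are assumed to be tallied once per generation.

    import numpy as np

    def update_repair_probabilities(p_prev, n_survived, n_off_survived, alpha=0.5, p_min=0.1):
        # p_prev: current selection probabilities of the repair options (including 'no repair')
        # n_survived[i]: surviving offspring created by option i; n_off_survived: total survivors
        r_s = np.asarray(n_survived, dtype=float) / max(n_off_survived, 1)
        p_new = np.maximum(p_min,
                           alpha * r_s / max(r_s.sum(), 1e-12)
                           + (1.0 - alpha) * np.asarray(p_prev, dtype=float))   # Equation 5.5
        return p_new / p_new.sum()                                              # Equation 5.6

    # e.g., update_repair_probabilities([0.25, 0.25, 0.25, 0.25], [10, 2, 0, 8], 20)
    # shifts probability toward the first and last options, while the floor p_min keeps
    # the unsuccessful option from being locked out permanently.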
5.5.1 Create a complete VRG A vertex (or node) of a VRG represents a variable and an edge connecting two nodes indicates the existence of a relationship between the corresponding variables. For every group 𝐺 𝑘 of variables, all pairwise variable combinations are connected by an edge. This will result in a complete graph where every pair of vertices is connected by a unique undirected edge. An example with two variable groups (𝐺 1 = {1, 2, 3, 6, 8} and 𝐺 2 = {4, 5, 7, 9, 10}) having five variables each is illustrated in 59 Figure 5.2. (a) Group 𝐺 1 . (b) Group 𝐺 2 . Figure 5.2 Ten variables in two non-interacting groups are represented in complete graphs. 5.5.2 Rule selection In this step, learned rules are used to modify the VRGs according to two selection criteria. First, all rules having a score (defined in Table 3.1) above a certain threshold (𝑠min ) are considered. Second, they are applied in the order of user’s preference ranking. A connection may be removed if it does not satisfy the selection criteria. If a single-variable (constant) rule satisfies the selection criterion, then the corresponding node is removed from the VRG and that rule will be implemented separately. If no two-variable rule involving 𝑥𝑖 and 𝑥 𝑗 satisfies the minimum score criterion, the corresponding VRG edge (𝑖- 𝑗) is removed. An example is shown in Figure 5.3, which uses the rule hierarchy for mixed rule repair operators (third row) shown in Table 5.1, except that inequalities are ranked 3 for illustration here. A blue or brown edge represents a power law rule or an inequality rule, respectively. An edge ranking is also assigned based on the rule hierarchy. In this case, edges representing power laws and inequalities will be ranked 1 and 2 by default, unless overruled by the user. Both graphs have a reduced number of edges after the rule selection process is complete. Node 8 in Figure 5.3a (marked in red) is found to have a constant rule associated with it and hence removed. In Figure 5.3b, variables (𝑥5 , 𝑥 9 ) and (𝑥 9 , 𝑥10 ) are not related by power laws having a 60 score greater than 𝑠min . However, they are found to follow inequality relationships with a score greater than 𝑠min . Hence, they are connected by brown edges. The rest of the edges represent power law rules and are marked by blue. The approach to set the direction of the edges is discussed next. (a) Group 𝐺 1 . (b) Group 𝐺 2 . Figure 5.3 Rule selection. 5.5.3 Create a directed acyclic VRG In order to apply a repair agent to the VRG, it needs to be converted to a directed acyclic graph (DAG). This step ensures graph traversal is possible without getting stuck in loops. The members of every group 𝐺 𝑘 are randomly permuted to create a sequence 𝐷 𝑘 . If 𝑖 appears before 𝑗 in 𝐷 𝑘 , an undirected edge between nodes 𝑖 and 𝑗 is converted to a directed edge from 𝑖 to 𝑗. In the example shown in Figure 5.4, two random sequences 𝐷 1 = (2, 1, 3, 6) and 𝐷 2 = (10, 4, 5, 9, 7) are created for groups 𝐺 1 and 𝐺 2 , respectively. Since node 2 appears before node 1 in 𝐷 1 , a blue arrow goes from node 2 to node 1, as shown in the figure. This process is repeated for every population member so as to create diverse VRGs. 5.5.4 Transitive reduction Next, a transitive reduction [110] is performed on the VRG corresponding to each variable group. For VRGs having both power law and inequality edges, transitive reduction is performed on subgraphs consisting only of the edges of the same type. This step eliminates redundant directed 61 (a) Group 𝐺 1 . (b) Group 𝐺 2 . 
5.5.5 Modify VRG according to user feedback
A user can provide feedback in the form of a ranking, or select only a subset of the available rules. In the former case, the VRG edge rankings are updated to reflect the user's choice. Edges corresponding to the rules discarded by the user are removed. Figure 5.6a shows an example where the rule involving 𝑥1 and 𝑥6 is ranked 1 (marked by arrows with a red border) and the rule involving 𝑥2 and 𝑥3 is ranked 2 (marked by arrows with a dark yellow border). The gray edges represent the rules discarded by the user. Figure 5.6b shows a similar ranking process.

Figure 5.6 Implementing user feedback for (a) Group 𝐺1 and (b) Group 𝐺2.

5.5.6 Repair new offspring solutions
For every new solution, the corresponding VRGs are traversed. A random rank-1 starting point is selected and the VRG is traversed recursively in a depth-first fashion. From every node, the algorithm first moves forward via the outgoing edges and repairs the connected node based on the current node. Once all outgoing edges are traversed and the algorithm comes back to the same node, traversal continues by following the incoming edges. This is repeated for all ranks. Algorithm 5.1 presents the pseudocode of the repair process. For ease of understanding, some of the terminology used in the pseudocode is explained here. The VRG data structure has the attributes Nodes and Edges. The Edges attribute representing an edge (𝑖, 𝑗) has multiple sub-attributes: StartVertex (𝑖 in this case), EndVertex (𝑗 in this case), EdgeType (rule type and corresponding repair agent), and EdgeRank (rank of the edge). A function TraverseGraph recursively traverses the VRG from a random start node for a particular rule rank. The function Repair, called by TraverseGraph, invokes the correct repair agent based on EdgeType.

Algorithm 5.1 VRG traversal and repair pseudocode.
Require: New solution set (X𝑟), variable groups (𝐺), VRGs for every solution and group, rule hierarchy.
Ensure: Repaired solution set X𝑟.
1: function TraverseGraph(x, Graph, CurrentNode, PreviousNode, NodesVisited, CurrentRank)
2:     if CurrentNode in NodesVisited then
3:         return
4:     end if
5:     CurrentEdges ← Graph.Edges[CurrentNode];
6:     for each outgoing edge e in CurrentEdges do
7:         NextNode ← e.EndVertex;
8:         if NextNode not in NodesVisited then
9:             EdgeType ← e.EdgeType;
10:            EdgeRank ← e.EdgeRank;
11:            if EdgeRank = CurrentRank then
12:                Repair(x, CurrentNode, NextNode, EdgeType, EdgeRank);
13:            end if
14:            TraverseGraph(x, Graph, NextNode, CurrentNode, NodesVisited, CurrentRank);
15:        end if
16:    end for
17:    for each incoming edge e in CurrentEdges do
18:        NextNode ← e.StartVertex;
19:        if NextNode not in NodesVisited and NextNode ≠ PreviousNode then
20:            EdgeType ← e.EdgeType;
21:            EdgeRank ← e.EdgeRank;
22:            if EdgeRank = CurrentRank then
23:                Repair(x, CurrentNode, NextNode, EdgeType, EdgeRank);
24:            end if
25:            TraverseGraph(x, Graph, NextNode, CurrentNode, NodesVisited, CurrentRank);
26:        end if
27:    end for
28:    Add CurrentNode to NodesVisited;
29: end function
30: for each group 𝐺𝑘 in 𝐺 do    ⊲ Repair procedure begins
31:     for each solution x in X𝑟 do
32:         CurrentGraph ← VRG assigned to x for 𝐺𝑘;
33:         for CurrentRank = 1, 2, . . . , 𝑛ranks do
34:             StartNode ← a random node having at least one edge of rank CurrentRank;
35:             TraverseGraph(x, CurrentGraph, StartNode, NULL, [ ], CurrentRank);
36:         end for
37:     end for
38: end for

5.6 Simply-supported stepped beam design
5.6.1 Problem background and formulation
Beam design problems are common in the literature [6, 111] and can be used to benchmark an optimization algorithm. In this study, we consider a simply-supported stepped beam design with multiple segments having a rectangular cross-section. An example with five segments is shown in Figure 5.7. A vertical load of 2 kN is applied at the middle of the beam.

Figure 5.7 Simply-supported stepped beam with five segments.

All 𝑛seg segments are of equal length. The area of the rectangular cross-section of the 𝑖-th segment is determined by its width (𝑏𝑖) and height (ℎ𝑖), where 𝑖 ∈ [1, 𝑛seg]. The volume (𝑉) and maximum deflection (Δ) are to be minimized by finding an optimal width 𝑏𝑖 and height ℎ𝑖 for each segment, totalling 2𝑛seg variables. The maximum stress 𝜎𝑖(x) of the 𝑖-th member and the deflection 𝛿𝑗(x) at the 𝑗-th node need to be kept below the material strength 𝜎max and a specified limit 𝛿max, respectively. The aspect ratio (ratio of height to width) of each segment is also restricted to a specified range [𝑎𝐿, 𝑎𝑈] as a constraint. The MOP formulation is shown below:

Minimize    𝑉(x) = Σ_{𝑖=1}^{𝑛seg} 𝑏𝑖 ℎ𝑖 𝑙𝑖,                     (5.7)
Minimize    Δ(x) = max_{𝑖=1,…,𝑛seg} 𝛿𝑖(x),                      (5.8)
Subject to  max_{𝑖=1,…,𝑛seg} 𝜎𝑖(x) ≤ 𝜎max,                      (5.9)
            max_{𝑗=1,…,𝑛seg} 𝛿𝑗(x) ≤ 𝛿max,                      (5.10)
            𝑎𝐿 ≤ 𝑎𝑖 ≤ 𝑎𝑈, for 𝑖 = 1, . . . , 𝑛seg.              (5.11)

Here, two cases with 39 and 59 segments are considered. The problem parameters are described in Table 5.2.

Table 5.2 Number of decision variables and constraints for the 39- and 59-segment stepped beams.
𝑛seg = 39: 𝜎max = 20 MPa, 𝛿max = 0.04 m, width 𝑏𝑖 ∈ [0.1, 40], height ℎ𝑖 ∈ [0.1, 40], aspect ratio 𝑎𝑖 ∈ [0.5, 2], 78 variables, 41 constraints
𝑛seg = 59: 𝜎max = 20 MPa, 𝛿max = 0.06 m, width 𝑏𝑖 ∈ [0.1, 60], height ℎ𝑖 ∈ [0.1, 60], aspect ratio 𝑎𝑖 ∈ [0.5, 2], 118 variables, 61 constraints

5.6.2 Experimental settings
NSGA-II [15], a state-of-the-art MOEA, is applied with the proposed IK-EMO procedure to solve both cases. This problem is intended to demonstrate the performance of the proposed algorithm with minimal initial user knowledge. Thus, no grouping information is provided, resulting in all variables being in a single group.
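Returning briefly to Algorithm 5.1, a compact Python sketch of the traversal is given below for readers who prefer an executable form. It assumes each VRG is a networkx DiGraph whose edges carry "rank" and "rule" attributes, and repair_pair is a hypothetical stand-in for the rule-specific repair agents described earlier; unlike the pseudocode, the node is marked as visited on entry so that the recursion terminates even when the underlying undirected structure contains cycles.

```python
import random

def repair_pair(x, i, j, rule):
    """Hypothetical placeholder: modify x[j] based on x[i] using the given rule,
    e.g. by enforcing a learned power law or inequality."""
    pass

def traverse_and_repair(x, vrg, node, prev, visited, rank):
    """Depth-first traversal of one VRG in the spirit of Algorithm 5.1: repair
    along outgoing edges first, then along incoming edges, restricted to edges
    of the given rank."""
    if node in visited:
        return
    visited.add(node)  # marked on entry so the recursion always terminates
    for _u, nxt, data in vrg.out_edges(node, data=True):
        if nxt not in visited:
            if data["rank"] == rank:
                repair_pair(x, node, nxt, data["rule"])
            traverse_and_repair(x, vrg, nxt, node, visited, rank)
    for nxt, _v, data in vrg.in_edges(node, data=True):
        if nxt not in visited and nxt != prev:
            if data["rank"] == rank:
                repair_pair(x, node, nxt, data["rule"])
            traverse_and_repair(x, vrg, nxt, node, visited, rank)

def repair_population(pop, vrgs_per_solution, n_ranks=2, rng=random.Random(0)):
    """Apply the repair phase to every offspring solution and every group VRG."""
    for x, vrgs in zip(pop, vrgs_per_solution):
        for vrg in vrgs:  # one DAG-structured VRG per variable group
            for rank in range(1, n_ranks + 1):
                starts = [n for n in vrg.nodes
                          if any(d["rank"] == rank
                                 for _a, _b, d in vrg.out_edges(n, data=True))
                          or any(d["rank"] == rank
                                 for _a, _b, d in vrg.in_edges(n, data=True))]
                if starts:
                    traverse_and_repair(x, vrg, rng.choice(starts), None, set(), rank)
```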
IK-EMO is combined separately with each repair agent described in Section 5.2. In addition, two cases with mixed relationships are used: the first combining PL-RA2 with IQ-RA2, and the second combining PL-RA-E with IQ-RA-E. The rule hierarchy is described in Table 5.1. Four rule usage schemes, RU1, RU2, RU3, and RU4, select the top 10%, 20%, 50%, and 100% of the learned rules sorted according to their scores. They also act as artificial users with a consistent behavior. Each rule usage scheme is paired with one or more repair agents. Eight cases with a single repair agent are considered: PL-RA1, PL-RA2, PL-RA3, PL-RA-E, IQ-RA1, IQ-RA2, IQ-RA3, and IQ-RA-E, along with the two mixed cases mentioned above. From Table 3.1, 𝜌𝑖 is set to 0.1, 𝜀𝑖𝑗 to 0.1, and 𝑒min𝑖𝑗 to 0.01. Table 5.3 shows the parameter settings for this problem. For each combination of a repair agent and user, 20 runs are performed and the hypervolume (HV) [104] values are recorded at the end of each generation. The Wilcoxon rank-sum test [108] is used to compare the statistical performance of the algorithms tested here with respect to the best-performing algorithm in each scenario.

5.6.3 Experimental results and discussion
Tables 5.4 and 5.5 show the optimization results for the 39- and 59-segment stepped beam problems, respectively. Base NSGA-II results without any rule extraction and repair are shown in the first row.

Table 5.3 Parameter settings of IK-EMO.
Population size: problem-specific
Maximum generations: problem-specific
Mutation operator: polynomial mutation [11]
Mutation probability (𝑝𝑚) and index (𝜂𝑚): 1/𝑛var, 50
Crossover operator: SBX [3]
Crossover probability (𝑝𝑐) and index (𝜂𝑐): 0.9, 30
Minimum rule score (𝑠min): 0.7
Learning interval (𝑇𝐿, in generations): 10
Repair interval (𝑇𝑅, in generations): 10
𝛼 and 𝑝min in Equation 5.5: 0.5, 0.1
Rule parameters 𝜌𝑖, 𝜀𝑖𝑗, 𝑒min𝑖𝑗: problem-specific
User feedback lag (𝑇𝑈, Section 5.10): user-dependent

The best performance in each row is marked in bold. For every column, the best-performing algorithm is marked with a shaded gray box. The Wilcoxon p-values show the relative performance of each algorithm with respect to the column-wise best. Algorithms with performance statistically similar to the column-wise best are shown in italics. The ND fronts obtained in a particular run using the power law repair operators with RU2 are shown in Figures 5.8a and 5.8c for the 39- and 59-segment cases, respectively. The corresponding median HV plots over the course of the optimization runs are shown in Figures 5.8b and 5.8d, respectively.

Table 5.4 FEs required to achieve HV𝑇 = 0.81 for 39-segment beams. The best-performing algorithm in each row is marked in bold and in each column by a shaded gray box. Algorithms with performance statistically similar to the column-wise best are marked in italics. The corresponding Wilcoxon p-values are given in parentheses.
Repair agent: RU1 | RU2 | RU3 | RU4
None (base): 10.6k ± 1.0k (p=0.0145) | 10.6k ± 1.0k (p=0.0118) | 10.6k ± 1.0k (p=0.0238) | 10.6k ± 1.0k (p=0.0412)
Power law rules:
PL-RA1: 10.4k ± 0.8k (p=0.0416) | 10.3k ± 1.0k (p=0.0225) | 10.5k ± 0.9k (p=0.0420) | 10.6k ± 1.0k (p=0.0319)
PL-RA2: 9.8k ± 0.8k (p=0.0661) | 9.4k ± 0.9k | 9.5k ± 1.2k | 10.2k ± 1.3k (p=0.0551)
PL-RA3: 10.4k ± 0.9k (p=0.0195) | 10.3k ± 1.0k (p=0.0422) | 9.8k ± 1.0k (p=0.0106) | 11.6k ± 1.0k (p=0.0147)
PL-RA-E: 9.6k ± 1.0k | 9.5k ± 1.0k (p=0.0762) | 9.6k ± 1.4k (p=0.0841) | 9.9k ± 1.2k
Inequality rules:
IQ-RA1: 10.7k ± 0.7k (p=0.0016) | 10.5k ± 1.0k (p=0.0471) | 10.6k ± 0.9k (p=0.0483) | 10.8k ± 1.0k (p=0.0308)
IQ-RA2: 10.2k ± 0.9k (p=0.0125) | 10.3k ± 1.1k (p=0.0263) | 10.3k ± 0.8k (p=0.0486) | 10.7k ± 1.2k (p=0.0210)
IQ-RA3: 10.7k ± 1.2k (p=0.0483) | 10.6k ± 1.1k (p=0.0340) | 10.5k ± 1.0k (p=0.0318) | 10.6k ± 1.4k (p=0.0247)
IQ-RA-E: 10.6k ± 0.8k (p=0.0463) | 10.5k ± 0.9k (p=0.0207) | 10.6k ± 1.3k (p=0.0342) | 10.6k ± 1.6k (p=0.0177)
Mixed rules:
PL-RA2+IQ-RA2: 9.8k ± 0.9k (p=0.0517) | 9.6k ± 1.0k (p=0.0957) | 9.7k ± 1.1k (p=0.0586) | 10.4k ± 0.8k (p=0.0778)
PL-RA-E+IQ-RA-E: 9.7k ± 1.2k (p=0.0913) | 9.6k ± 0.4k (p=0.0713) | 9.8k ± 0.7k (p=0.0616) | 10.0k ± 0.7k (p=0.0506)

Table 5.5 FEs required to achieve HV𝑇 = 0.75 for 59-segment beams. The best-performing algorithm in each row is marked in bold and in each column by a shaded gray box. Algorithms with performance statistically similar to the column-wise best are marked in italics. The corresponding Wilcoxon p-values are given in parentheses.
Repair agent: RU1 | RU2 | RU3 | RU4
None (base): 19.0k ± 2.5k (p=0.0102) | 19.0k ± 2.5k (p=0.0015) | 19.0k ± 2.5k (p=0.0011) | 19.0k ± 2.5k (p=0.0027)
Power law rules:
PL-RA1: 15.6k ± 1.8k (p=0.0105) | 14.1k ± 2.3k | 15.2k ± 3.1k (p=0.0164) | 16.1k ± 2.5k (p=0.0371)
PL-RA2: 14.8k ± 1.8k | 15.0k ± 2.0k (p=0.0225) | 14.4k ± 2.8k (p=0.1015) | 15.5k ± 1.8k (p=0.0510)
PL-RA3: 16.0k ± 2.2k (p=0.0042) | 15.8k ± 1.5k (p=0.0218) | 16.6k ± 3.5k (p=0.0215) | 17.0k ± 2.6k (p=0.0446)
PL-RA-E: 14.9k ± 3.1k (p=0.1165) | 14.2k ± 3.6k (p=0.0911) | 14.2k ± 2.9k | 14.9k ± 2.5k
Inequality rules:
IQ-RA1: 16.6k ± 4.1k (p=0.0215) | 16.0k ± 2.6k (p=0.0411) | 16.6k ± 2.0k (p=0.0182) | 18.1k ± 2.2k (p=0.0341)
IQ-RA2: 16.8k ± 4.0k (p=0.0193) | 15.9k ± 2.9k (p=0.0365) | 16.8k ± 3.0k (p=0.0179) | 17.6k ± 2.6k (p=0.0335)
IQ-RA3: 16.5k ± 4.3k (p=0.0317) | 17.2k ± 3.1k (p=0.0357) | 17.2k ± 3.2k (p=0.0155) | 17.9k ± 2.4k (p=0.0273)
IQ-RA-E: 16.6k ± 3.5k (p=0.0287) | 16.4k ± 4.2k (p=0.0282) | 16.9k ± 2.7k (p=0.0293) | 17.8k ± 1.9k (p=0.0228)
Mixed rules:
PL-RA2+IQ-RA2: 15.1k ± 3.0k (p=0.0583) | 14.5k ± 3.1k (p=0.0917) | 14.6k ± 3.0k (p=0.0715) | 15.4k ± 2.5k (p=0.0713)
PL-RA-E+IQ-RA-E: 14.9k ± 2.6k (p=0.0917) | 14.4k ± 2.9k (p=0.0663) | 14.6k ± 2.6k (p=0.0681) | 15.2k ± 2.2k (p=0.0558)
Figure 5.8 ND fronts (one run) and hypervolume plots (over 20 runs) obtained by IK-EMO with RU2 and the power law repair agents for the (a-b) 39-segment and (c-d) 59-segment stepped beam problems.

The results lead to several interesting observations, as stated below.

5.6.3.1 General observations
Despite the median FEs being close, base NSGA-II does not perform well statistically compared to the knowledge-based NSGA-II methods for both the 39- and 59-segment problems. A positive aspect of the proposed algorithm is that it is still able to achieve good performance with a significantly low population size. For problems with expensive evaluation functions, this can be beneficial for saving computational time.

5.6.3.2 Power law vs inequality rules
As can be seen from the tables, for both the 39- and 59-segment problems, power law repair alone results in the best performance for each user case. Inequality rule-based repair alone generally results in worse performance compared to power-law-based repair. One possible reason is the greater versatility of power laws in modeling complex relationships compared to simple inequality rules.

5.6.3.3 Best performing algorithm for each rule usage scheme
In the 39-segment case, PL-RA-E is the best performer for both RU1 and RU4. For RU2 and RU3, PL-RA2 is the best performer. It is to be noted that PL-RA-E has statistically similar performance to PL-RA2 for both RU2 and RU3. This shows that the ensemble method can be used to obtain good performance without the need to select a specific repair process. For the 59-segment case, PL-RA2 and PL-RA1 are the best performers for RU1 and RU2, respectively. For RU3 and RU4, PL-RA-E gives the best performance, with PL-RA2 having comparable performance. PL-RA1 and PL-RA3 do not show comparable performance with PL-RA2 or PL-RA-E in most cases. For PL-RA1, adhering too closely to the learned power law rules constrains NSGA-II in finding good solutions. PL-RA3 introduces a large amount of variance, which is detrimental to the optimization process. A compromise between these two extremes, provided by PL-RA2 or PL-RA-E, is the logical choice.
Figure 5.8 illustrates the results when RU2 is combined with the power law repair operators for both problem cases. The difference in the quality of solutions obtained after 20,000 FEs is prominent in the 59-segment ND front. The median HV plots show that the FEs required to reach HV𝑇 for PL-RA-E is close to the number needed by the best-performing repair agent. Base NSGA-II and the repair operators all give good-quality solutions at the end of the run for the 39-segment case. However, for the 59-segment case, base NSGA-II performs significantly worse.

5.6.3.4 Relative performance of each rule usage scheme
It can be seen from Tables 5.4 and 5.5 that RU2 produces the best performance in 6 out of 10 cases for the 39-segment problem, and in 8 out of 10 cases for the 59-segment problem. This shows that, in terms of rule usage, using too few or too many of the learned rules is not effective in improving the algorithm's performance.

5.6.3.5 Mixed relation repair agents
The mixed relation repair agents, (PL-RA2+IQ-RA2) and (PL-RA-E+IQ-RA-E), have statistically similar performance to the best algorithm for each user and both problem cases. Even though the inequality repair operators perform worse than the power law repair operators individually, their presence in the mixed repair agents does not hinder performance, since only the high-performing rules are added to the VRG during its creation.
The proposed framework is robust enough to give good performance irrespective of the number of repair agents and the type of rules.

5.7 Optimal Power Flow Problem
5.7.1 Problem background and formulation
Optimal power flow (OPF) is a common problem in power system engineering, and MOEAs have frequently been used to solve it [112, 113, 114, 115, 116, 117, 118]. The following objective functions are minimized: fuel cost, emissions, voltage deviation, and real power loss. In many studies, one or more of these objectives are considered, with the rest kept as constraints. In this study, we consider two objectives: minimizing fuel cost and reducing fossil fuel emissions. Voltage deviation and power loss are kept as constraints. This version of the OPF problem is also known as the environmental economic dispatch (EED) problem [113].

Minimize    𝐶𝐹(P𝐺, V𝐺) = Σ_{𝑖=1}^{𝑁𝐺} (𝑎𝑖 + 𝑏𝑖 𝑃𝐺𝑖 + 𝑐𝑖 𝑃𝐺𝑖²),                          (5.12)
Minimize    𝐶𝐸(P𝐺, V𝐺) = Σ_{𝑖=1}^{𝑁𝐺} (𝛼𝑖 + 𝛽𝑖 𝑃𝐺𝑖 + 𝛾𝑖 𝑃𝐺𝑖² + 𝜁𝑖 𝑒^(𝜆𝑖 𝑃𝐺𝑖)),           (5.13)
Subject to  Σ_{𝑖=1}^{𝑁bus} (𝑃𝑖 − 𝑃𝐷 − 𝑃𝐿) = 0,                                          (5.14)
            𝑉𝐷min ≤ 𝑉𝐷 ≤ 𝑉𝐷max,
            𝑃𝐿min ≤ 𝑃𝐿 ≤ 𝑃𝐿max,
            𝑄𝐺𝑖min ≤ 𝑄𝐺𝑖 ≤ 𝑄𝐺𝑖max,
            𝑃𝑠min ≤ 𝑃𝑠 ≤ 𝑃𝑠max,
            𝑉𝑠min ≤ 𝑉𝑠 ≤ 𝑉𝑠max,
            𝑉𝑃𝑄𝑖min ≤ 𝑉𝑃𝑄𝑖 ≤ 𝑉𝑃𝑄𝑖max,

where 𝐶𝐹 is the fuel cost, 𝐶𝐸 is the emission cost, 𝑁𝐺 is the number of generators, 𝑃𝐺𝑖 is the real power output and 𝑉𝐺𝑖 the voltage output of the 𝑖-th generator, (𝑎𝑖, 𝑏𝑖, 𝑐𝑖) are the fuel cost coefficients, and (𝛼𝑖, 𝛽𝑖, 𝛾𝑖, 𝜁𝑖, 𝜆𝑖) are the emission cost coefficients. 𝑉𝐷 is the total voltage deviation of all the load buses, 𝑃𝐿 is the total real power loss, 𝑄𝐺𝑖 is the reactive power output of the 𝑖-th generator, 𝑃𝑠 is the real power output and 𝑉𝑠 the voltage output of the slack bus, and 𝑉𝑃𝑄𝑖 is the voltage at the 𝑖-th load (P-Q) bus. 𝑃𝐷 is the power demand and 𝑁bus is the total number of buses. A load flow analysis must be performed to satisfy the equality constraint. We use MATPOWER [119] as the load flow solver. We consider the IEEE 118-bus and 300-bus systems in this study.

5.7.2 Experimental settings
The bus details, along with the numbers of decision variables and constraints, are given in Table 5.6. The types of decision variables and their corresponding ranges are given in Table 5.7.

Table 5.6 IEEE bus system specifications.
IEEE 118-bus: 54 generators, 11 transformers, 64 load buses, 115 decision variables, 240 constraints
IEEE 300-bus: 69 generators, 107 transformers, 231 load buses, 243 decision variables, 604 constraints

Table 5.7 OPF decision variable types and ranges.
Generator power output (𝑃𝐺𝑖): [30, 100]
Generator voltage (𝑉𝐺𝑖): [0.95, 1.05]
Transformer tap ratio (𝑇𝑖): [0.9, 1.1]

Experimental settings are the same as in the stepped beam problem, except that the population size is set to 50 and the maximum number of generations is set to 400 for both the IEEE 118- and 300-bus systems. From Table 3.1, 𝜌𝑖 and 𝜀𝑖𝑗 are set to 1, and 𝑒min𝑖𝑗 is set to 0.01. Two variable groups are defined, as shown in Table 5.8.

Table 5.8 OPF variable groups.
𝐺opf1 (generator power and voltage): 118-bus indices [1-104], 300-bus indices [1-136]
𝐺opf2 (transformer tap ratio): 118-bus indices [105-115], 300-bus indices [137-243]

5.7.3 Experimental results and discussion
Tables 5.9 and 5.10 show the optimization results for the IEEE 118- and 300-bus systems, respectively. The ND fronts obtained in a single run using the four power law repair methods with RU2 are shown in Figures 5.9a and 5.9c for the IEEE 118- and 300-bus systems, respectively. The corresponding median HV plots over the course of the optimization are shown in Figures 5.9b and 5.9d, respectively.
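Equations 5.12 and 5.13 are simple closed-form sums over the generators. The sketch below is only an illustration of how the two EED objectives can be evaluated for a candidate solution; the coefficient arrays are assumed to come from the bus-system data, and the power-flow constraints are assumed to be handled separately by a load-flow solver such as MATPOWER.

```python
import numpy as np

def eed_objectives(P_G, fuel_coeffs, emission_coeffs):
    """Evaluate the fuel cost (Eq. 5.12) and emission cost (Eq. 5.13).

    P_G             : array of real power outputs of the N_G generators
    fuel_coeffs     : array of shape (N_G, 3) holding (a_i, b_i, c_i)
    emission_coeffs : array of shape (N_G, 5) holding (alpha_i, beta_i,
                      gamma_i, zeta_i, lambda_i)
    """
    a, b, c = fuel_coeffs.T
    alpha, beta, gamma, zeta, lam = emission_coeffs.T
    fuel_cost = np.sum(a + b * P_G + c * P_G**2)
    emissions = np.sum(alpha + beta * P_G + gamma * P_G**2 + zeta * np.exp(lam * P_G))
    return fuel_cost, emissions
```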
Table 5.9 FEs required to achieve HV𝑇 = 0.74 for the IEEE 118-bus system. The best-performing algorithm in each row is marked in bold and in each column by a shaded gray box. Algorithms with performance statistically similar to the column-wise best are marked in italics. The corresponding Wilcoxon p-values are given in parentheses.
Repair agent: RU1 | RU2 | RU3 | RU4
None (base): 6.5k ± 0.6k (p=0.0216) | 6.5k ± 0.6k (p=0.0286) | 6.5k ± 0.6k (p=0.0337) | 6.5k ± 0.6k (p=0.0432)
Power law rules:
PL-RA1: 5.9k ± 0.2k (p=0.0101) | 5.8k ± 0.4k (p=0.0417) | 5.8k ± 0.5k (p=0.0119) | 6.0k ± 0.2k
PL-RA2: 4.5k ± 0.4k | 5.0k ± 0.9k (p=0.0805) | 5.2k ± 0.2k | 6.1k ± 0.3k (p=0.0813)
PL-RA3: 6.8k ± 0.5k (p=0.0152) | 6.3k ± 0.7k (p=0.0398) | 6.2k ± 0.4k (p=0.0817) | 7.0k ± 0.6k (p=0.0086)
PL-RA-E: 4.7k ± 0.4k (p=0.0656) | 4.2k ± 0.5k | 5.4k ± 0.3k (p=0.0680) | 6.1k ± 0.2k (p=0.0727)
Inequality rules:
IQ-RA1: 6.6k ± 0.4k (p=0.00119) | 6.9k ± 0.7k (p=0.0255) | 6.1k ± 0.1k (p=0.0341) | 7.0k ± 0.5k (p=0.0338)
IQ-RA2: 6.6k ± 0.2k (p=0.0065) | 6.8k ± 0.5k (p=0.0021) | 6.4k ± 0.3k (p=0.0279) | 6.8k ± 0.1k (p=0.0332)
IQ-RA3: 6.5k ± 0.4k (p=0.0138) | 6.4k ± 0.2k (p=0.0018) | 6.1k ± 0.3k (p=0.0275) | 7.5k ± 0.4k (p=0.0320)
IQ-RA-E: 6.5k ± 0.1k (p=0.0129) | 6.3k ± 0.4k (p=0.0121) | 6.8k ± 0.3k (p=0.0116) | 7.0k ± 0.2k (p=0.0112)
Mixed rules:
PL-RA2+IQ-RA2: 4.8k ± 0.3k (p=0.0722) | 4.8k ± 0.4k (p=0.0713) | 5.4k ± 0.3k (p=0.0841) | 6.2k ± 0.1k (p=0.0748)
PL-RA-E+IQ-RA-E: 4.6k ± 0.1k (p=0.0903) | 4.4k ± 0.2k (p=0.0667) | 5.6k ± 0.1k (p=0.0144) | 6.2k ± 0.2k (p=0.0919)

Table 5.10 FEs required to achieve HV𝑇 = 0.70 for the IEEE 300-bus system. The best-performing algorithm in each row is marked in bold and in each column by a shaded gray box. Algorithms with performance statistically similar to the column-wise best are marked in italics. The corresponding Wilcoxon p-values are given in parentheses.
Repair agent: RU1 | RU2 | RU3 | RU4
None (base): 17.5k ± 1.2k (p=0.0142) | 17.5k ± 1.2k (p=0.0205) | 17.5k ± 1.2k (p=0.0130) | 17.5k ± 1.2k (p=0.0091)
Power law rules:
PL-RA1: 14.8k ± 0.7k (p=0.0878) | 15.9k ± 0.8k (p=0.0110) | 16.2k ± 0.6k (p=0.0035) | 16.8k ± 0.6k (p=0.0063)
PL-RA2: 15.1k ± 0.6k (p=0.0753) | 14.9k ± 0.5k (p=0.0518) | 15.9k ± 0.2k (p=0.0239) | 14.9k ± 0.2k (p=0.0657)
PL-RA3: 18.5k ± 0.3k (p=0.0413) | 17.5k ± 1.0k (p=0.0017) | 19.3k ± 1.2k (p=0.0181) | 18.8k ± 0.5k (p=0.0025)
PL-RA-E: 14.5k ± 0.2k | 14.4k ± 0.3k | 14.8k ± 0.6k | 15.6k ± 0.9k
Inequality rules:
IQ-RA1: 15.2k ± 0.9k (p=0.0315) | 16.0k ± 1.1k (p=0.0033) | 18.5k ± 1.3k (p=0.0059) | 18.2k ± 1.0k (p=0.0024)
IQ-RA2: 16.9k ± 1.2k (p=0.0122) | 16.5k ± 1.0k (p=0.0015) | 17.6k ± 0.7k (p=0.0073) | 18.0k ± 1.3k (p=0.0022)
IQ-RA3: 16.8k ± 1.0k (p=0.0286) | 16.1k ± 0.8k (p=0.0112) | 16.6k ± 1.7k (p=0.0012) | 19.5k ± 1.5k (p=0.0016)
IQ-RA-E: 15.9k ± 0.8k (p=0.0252) | 16.8k ± 0.8k (p=0.0104) | 15.7k ± 0.6k (p=0.0076) | 18.3k ± 1.1k (p=0.0032)
Mixed rules:
PL-RA2+IQ-RA2: 14.9k ± 0.8k (p=0.0991) | 14.8k ± 0.7k (p=0.0836) | 16.3k ± 0.2k (p=0.0103) | 16.0k ± 3.0k (p=0.0528)
PL-RA-E+IQ-RA-E: 14.7k ± 0.7k (p=0.1013) | 14.6k ± 0.9k (p=0.0811) | 15.0k ± 0.8k (p=0.0713) | 16.2k ± 0.9k (p=0.661)
Figure 5.9 ND fronts (one run) and hypervolume plots (over 20 runs) obtained with RU2 and the power law repair agents for the (a-b) IEEE 118-bus and (c-d) IEEE 300-bus OPF problems.

5.7.3.1 General observations
Base NSGA-II is outperformed by PL-RA1, PL-RA2, and PL-RA-E, as well as by the mixed rule repair agents. However, for PL-RA3 and the inequality repair operators, base NSGA-II produces comparable performance in most cases. Good performance with a low population size is obtainable with the power-law-based repair agents.

5.7.3.2 Power law vs inequality rules
For both problem cases, a power law repair operator is the best performer for each user, as in the stepped beam problem. Inequality-rule-based repair operators generally result in worse performance compared to both the power-law-based repair operators and base NSGA-II. As in the stepped beam problem, this is a result of the power laws being able to model the inter-variable relationships more accurately. For PL-RA3, having a high variance (2𝜎𝑐) during repair is harmful to the optimization, generally resulting in comparable or worse performance than base NSGA-II.

5.7.3.3 Best performing algorithm for each user
In the IEEE 118-bus system, PL-RA-E is the best performer for RU2, with PL-RA2 showing comparable performance. For RU1 and RU3, PL-RA2 is the best performer, with PL-RA-E showing comparable performance. For RU4, PL-RA1 is the best, with PL-RA2 and PL-RA-E giving statistically similar performance. As in the case of the stepped beam problems, PL-RA-E is either the best or gives statistically similar performance. Thus, good performance can be obtained without the need to determine which power-law-based repair operator is the best.
Figure 5.9 illustrates the results when RU2 is combined with the power law repair operators for both problem cases. The difference in the quality of solutions obtained after 20,000 FEs is more prominent in the IEEE 300-bus case. The median HV plots show that the number of FEs required to reach HV𝑇 for PL-RA-E is close to that of PL-RA2. Base NSGA-II and PL-RA3 show worse performance than the others.

5.7.3.4 Relative performance of each user
It can be seen from Tables 5.9 and 5.10 that RU2 produces the best performance in 6 out of 10 cases for the IEEE 118-bus system, and in 7 out of 10 cases for the IEEE 300-bus system. This shows that, in terms of rule usage, using too few or too many of the learned rules is generally detrimental to the optimization performance, which is similar to the conclusion drawn from the stepped beam design problems.

5.7.3.5 Mixed relation repair agents
The mixed relation repair agents, (PL-RA2+IQ-RA2) and (PL-RA-E+IQ-RA-E), have statistically similar performance to the best algorithm for each user and both problem cases. As in the stepped beam problems, the worse individual performance of the inequality repair operators does not hinder the performance of the mixed relation operators.

5.8 Truss Design Problem
This problem uses the same problem formulation given in Section 4.5.

5.8.1 Experimental settings
Experimental settings are similar to those of the previous problems. The population size is set to 100 and the maximum number of generations is set to 10,000.
Thus, the total computational budget comes out to be 1 million FEs. From Table 3.1, 𝜌𝑖 and 𝜀𝑖𝑗 are set to 0.1, and 𝑒min𝑖𝑗 is set to 0.01. Multiple variable groups are defined for this problem based on the relative location and alignment of the beams, as shown in Table 5.11.

Table 5.11 Variable groups for the 1,100-member truss cases. Each group contains comparable variables having identical units and scales.
𝐺𝑡1 (𝑙𝑖 of vertical members): [1101-1179]
𝐺𝑡2 (𝑟𝑖 of top longitudinal members): [79-156], [235-312]
𝐺𝑡3 (𝑟𝑖 of bottom longitudinal members): [1-78], [157-234]
𝐺𝑡4 (𝑟𝑖 of vertical members): [313-391]

5.8.2 Experimental results and discussion
Results on the 1,100-member truss problem are given in Table 5.12. The Pareto fronts obtained in one run using the power law repair operators with RU2 are shown in Figure 5.10a. The corresponding median HV plots over the course of the optimization are shown in Figure 5.10b.

Table 5.12 FEs required to achieve HV𝑇 = 0.78 for the 1,100-member truss. The best-performing algorithm in each row is marked in bold and in each column by a shaded gray box. Algorithms with performance statistically similar to the column-wise best are marked in italics. The corresponding Wilcoxon p-values are given in parentheses.
Repair agent: RU1 | RU2 | RU3 | RU4
None (base): 874k ± 10k (p=0.0043) | 874k ± 10k (p=0.0017) | 874k ± 10k (p=0.0062) | 874k ± 10k (p=0.0053)
Power law rules:
PL-RA1: 802k ± 8k (p=0.0057) | 792k ± 8k (p=0.0129) | 786k ± 13k (p=0.0140) | 812k ± 6k (p=0.0023)
PL-RA2: 680k ± 11k (p=0.0633) | 678k ± 15k (p=0.0793) | 688k ± 10k | 744k ± 17k
PL-RA3: 963k ± 21k (p=0.0005) | 1M (HV=0.74) | 1M (HV=0.71) | 1M (HV=0.66)
PL-RA-E: 672k ± 9k | 656k ± 18k | 693k ± 16k (p=0.1016) | 754k ± 24k (p=0.1163)
Inequality rules:
IQ-RA1: 843k ± 15k (p=0.0015) | 828k ± 20k (p=0.0115) | 822k ± 23k (p=0.0169) | 851k ± 8k (p=0.0325)
IQ-RA2: 836k ± 12k (p=0.0039) | 838k ± 19k (p=0.0248) | 803k ± 18k (p=0.0465) | 834k ± 6k (p=0.0318)
IQ-RA3: 839k ± 21k (p=0.0036) | 819k ± 29k (p=0.0219) | 826k ± 10k (p=0.0454) | 837k ± 5k (p=0.0414)
IQ-RA-E: 840k ± 27k (p=0.0024) | 816k ± 22k (p=0.0351) | 798k ± 13k (p=0.0311) | 843k ± 11k (p=0.0223)
Mixed rules:
PL-RA2+IQ-RA2: 682k ± 10k (p=0.0669) | 677k ± 14k (p=0.0816) | 691k ± 8k (p=0.0772) | 751k ± 10k (p=0.0522)
PL-RA-E+IQ-RA-E: 676k ± 9k (p=0.0714) | 659k ± 12k (p=0.0814) | 696k ± 13k (p=0.0611) | 752k ± 23k (p=0.0699)

Figure 5.10 Pareto fronts (a, one run) and hypervolume plots (b, over 20 runs) obtained by IK-EMO with RU2 and the power law repair agents for the 1,100-member truss design problem.

The observations from the results are outlined below.

5.8.2.1 General observations
Base NSGA-II is outperformed by all repair operators except PL-RA3, which fails to achieve the target HV in most cases. This shows that PL-RA3 is a subpar repair operator that introduces too much uncertainty into the optimization process compared to the others. However, despite the low population size (almost one-tenth the number of variables), the repair operators are able to generate better solutions on an LSMOP. FE savings of up to 25% compared to base NSGA-II are obtained.
5.8.2.2 Power law vs inequality rules
Similar to the previous two problems, the power law repair operators are the best performers for each RU. Inequality rule-based repair operators generally result in worse performance than the power law-based repair operators, with the exception of PL-RA3. These results show that power laws are, in general, a better choice than inequality rules.

5.8.2.3 Best performing algorithm for each RU
PL-RA-E is the best performer for RU1 and RU2, with PL-RA2 showing comparable performance. For RU3 and RU4, PL-RA2 is the best performer, with PL-RA-E showing comparable performance. PL-RA3 fails to achieve the target. As in the previous problems, PL-RA-E is either the best or gives statistically similar performance to the best. The performance of PL-RA2 shows that an intermediate amount of trust in the learned power laws is the optimal choice.
Figure 5.10 illustrates the results when RU2 is combined with the power law repair operators. The difference in the quality of solutions obtained after 1 million FEs is more prominent than in the previous two problems, this being an LSMOP. PL-RA-E gives the best solutions, while PL-RA3 is significantly worse. The median HV plot shows that the FEs required to reach HV𝑇 for PL-RA-E is close to that of PL-RA2. Base NSGA-II, PL-RA1, and PL-RA3 show worse performance than the others.

5.8.2.4 Relative performance of each RU
It can be seen from Table 5.12 that RU2 gives the best performance in 5 out of 10 cases and RU3 in 4 out of 10 cases. This reinforces the previous results by showing that a moderate amount of rule usage gives better performance.

5.8.2.5 Mixed relation repair agents
The mixed relation repair agents, (PL-RA2+IQ-RA2) and (PL-RA-E+IQ-RA-E), have comparable performance to the best algorithm for each RU. As in the previous problems, the worse performance of PL-RA3 and the inequality repair operators does not affect the performance of the mixed agents.

5.9 Summary of results
For every problem, a total of 11 different algorithms, including base NSGA-II, are compared under 4 RUs. Table 5.13 presents a ranking of all the algorithms for each RU based on the FEs required to reach the target HV. Algorithms with statistically similar performance to the best are given a rank of 1. The results are collected and an overall ranking is assigned to each algorithm. The top-ranked algorithm is PL-RA-E, followed by PL-RA2+IQ-RA2, PL-RA-E+IQ-RA-E, and PL-RA2. If the ideal repair agent or rule types are not known beforehand, the ensemble operator with mixed relationships (PL-RA-E+IQ-RA-E) can be used without a significant performance drop.

5.10 Synchronous vs asynchronous user interaction
In the previous sections, it is assumed that the user's feedback is available soon after the learned rules are presented to the user. However, in the real world, users will always take some finite time to come up with a preferred ranking of the rules.

Table 5.13 Ranking of different repair agents on multiple problems.
Column order in each row: Base, PL-RA1, PL-RA2, PL-RA3, PL-RA-E, IQ-RA1, IQ-RA2, IQ-RA3, IQ-RA-E, PL-RA2+IQ-RA2, PL-RA-E+IQ-RA-E.

78-variable stepped beam design
RU1: 4 3 1 3 1 5 2 5 4 1 1
RU2: 4 2 1 2 1 3 2 4 3 1 1
RU3: 5 4 1 2 1 5 3 4 5 1 1
RU4: 2 2 1 5 1 4 3 2 2 1 1
Rank sum: 15 11 4 12 4 17 10 15 14 4 4
Rank: 9 6 1 7 1 11 5 9 8 1 1

118-variable stepped beam design
RU1: 7 2 1 3 1 5 6 4 5 1 1
RU2: 8 1 2 3 1 5 4 7 6 1 1
RU3: 7 2 1 3 1 3 4 6 5 1 1
RU4: 9 3 2 4 1 8 5 7 6 1 1
Rank sum: 31 8 6 13 4 21 19 24 22 4 4
Rank: 11 5 4 6 1 8 7 10 9 1 1

115-variable optimal power flow
RU1: 3 2 1 5 1 4 4 3 3 1 1
RU2: 5 2 1 3 1 7 6 4 3 1 1
RU3: 7 3 1 5 1 4 6 4 8 1 2
RU4: 2 1 1 4 1 4 3 5 4 1 1
Rank sum: 17 8 4 17 4 19 19 16 18 4 5
Rank: 7 5 1 7 1 10 10 6 9 1 4

243-variable optimal power flow
RU1: 7 1 2 8 1 3 6 5 4 1 1
RU2: 7 2 1 7 1 3 5 4 6 1 1
RU3: 7 4 3 10 1 9 8 6 2 5 1
RU4: 3 2 1 7 1 5 4 8 6 1 1
Rank sum: 24 9 7 32 4 20 23 23 18 8 4
Rank: 10 5 3 11 1 7 8 8 6 4 1

1179-variable truss design
RU1: 7 2 1 8 1 6 3 4 5 1 1
RU2: 7 2 1 8 1 5 6 4 3 1 1
RU3: 7 2 1 8 1 5 4 6 3 1 1
RU4: 7 2 1 8 1 6 3 4 5 1 1
Rank sum: 28 8 4 32 4 22 16 18 16 4 4
Rank: 10 5 1 11 1 9 6 8 6 1 1

Final rank sum: 47 26 10 42 5 45 36 41 38 8 8
Final rank: 11 5 4 9 1 10 6 8 7 2 2

Pausing the optimization until the user provides feedback (synchronous user interaction) may be inefficient for problems with expensive function evaluations. Continuing the optimization while the user finalizes their feedback (asynchronous user interaction) is more practical and is the more intriguing approach. We implement a simplistic asynchronous scenario here to investigate the effect of delayed feedback from users.

We consider both the 118- and 300-bus OPF problems for this purpose. The user feedback lag (𝑇𝑈) is expressed as the number of FEs that could have been executed between the time the user is presented with a set of rules and the time the user is ready with some feedback (a ranking of the rules). For simplicity, 𝑇𝑈 is assumed to be constant for every round of user interaction. In the synchronous user interaction case, 𝑇𝑈 is undefined, as the optimization is put on hold until the user comes up with a ranking of the rules. The learning interval (𝑇𝐿), the number of FEs executed between two consecutive rule learning tasks, and the repair interval (𝑇𝑅), the number of FEs executed between two consecutive repair operations, are both set to 500 FEs (with 50 population members, this corresponds to every 10 generations). When the user provides feedback, that is, a ranking of the previous rule set provided to them, new rules having a structure identical to the previous rules are given priority, and the rest are discarded. However, instead of using the previous rules' statistics (means and standard deviations of 𝑐, for example), the statistics of the new rules are used for repair. The optimization with repair operations then proceeds with the updated preferred ranking of rules.

In the asynchronous user interaction case, once a VRG is constructed for the first time, IK-EMO is ready to provide information to the user about the learned rules every 𝑇𝐿 FEs, if the user is available. However, we let the optimization run proceed normally without waiting for the user's feedback and perform repair operations using the learned rules. After 𝑇𝑈 FEs, the user is ready to provide feedback. Then, the user feedback on the immediately preceding rules is combined with the latest learned rules, as in the synchronous case, and repair is performed using the common rules but with the latest rules' statistics.
If 𝑇𝑈 ≤ 𝑇𝐿, the learned rules available to IK-EMO will be the same as the ones previously provided to the user, and repair is performed immediately after the user provides the feedback. If 𝑇𝑈 > 𝑇𝐿, the latest learned rules may be different from the ones the user was provided. It is to be noted that in the asynchronous case, 𝑇𝑅 is not an adjustable parameter, since repair is performed as soon as a learning phase is complete or the user has provided some feedback. A figure illustrating both processes in detail is provided in the supplementary document. In this study, we use 𝑇𝑈 values from 125 to 500 FEs (𝑇𝑈 ≤ 𝑇𝐿) and from 1,000 to 4,000 FEs (𝑇𝑈 > 𝑇𝐿).

Table 5.14 FEs required to achieve HV𝑇 for the IEEE 118- and 300-bus OPF problems over 20 runs with multiple instances of asynchronous user interaction. The best-performing case in each row is marked in bold; statistically similar results to the best are marked in italics.
Decision-making lag (𝑇𝑈): Base NSGA-II | 125 | 250 | 500 | 1000 | 2000 | 4000
IEEE 118-bus (HV𝑇 = 0.74): 6.5k ± 0.6k | 4.2k ± 0.1k | 4.4k ± 0.3k | 4.5k ± 0.4k | 4.8k ± 0.4k | 5.2k ± 0.2k | 6.0k ± 0.5k
IEEE 300-bus (HV𝑇 = 0.70): 17.5k ± 1.2k | 14.4k ± 0.3k | 14.5k ± 0.1k | 14.4k ± 0.5k | 14.7k ± 0.4k | 15.1k ± 0.2k | 16.0k ± 0.3k

Table 5.15 Final median HV obtained for the IEEE 118- and 300-bus OPF problems over 20 runs with synchronous and asynchronous user interaction under a fixed budget 𝑇𝑐 = 20k FEs. The best-performing case in each row is marked in bold; algorithms with statistically similar performance to the best are marked in italics. Results are presented graphically in the supplementary document.
Values given as Sync / Async for each 𝑇𝑈:
IEEE 118-bus: 𝑇𝑈 = 125: 0.86 / 0.84; 250: 0.86 / 0.84; 500: 0.80 / 0.79; 1,000: 0.74 / 0.79; 2,000: 0.69 / 0.77; 4,000: 0.51 / 0.76
IEEE 300-bus: 𝑇𝑈 = 125: 0.84 / 0.84; 250: 0.83 / 0.81; 500: 0.74 / 0.82; 1,000: 0.71 / 0.80; 2,000: 0.66 / 0.80; 4,000: 0.43 / 0.77

Table 5.14 shows that for both the IEEE 118- and 300-bus systems, a lag of up to 1,000 FEs gives statistically similar performance to the synchronous case. For lags of 2,000 FEs or higher, the performance deteriorates. But even with large lags, IK-EMO is robust enough to perform better than base NSGA-II. Quicker user feedback with 𝑇𝑈 ≤ 𝑇𝐿 produces the best performance, as expected. With a large lag time for user decisions, feedback based on old rules has a detrimental effect. Thus, asynchronous interaction is suitable for cases where the objective evaluation is overly expensive, giving users relatively more time to decide on a preferential ranking of rules.

A second study evaluates the effect of user feedback lag for a fixed overall computational time of 𝑇𝑐 units. For the synchronous case, the effective number of FEs allocated to the optimization operations becomes small, since a part of 𝑇𝑐 is now consumed by the user to make a decision. For multiple values of 𝑇𝑈, we compare the final HV obtained in the asynchronous (non-zero lag) cases with that of the synchronous user interaction case. Table 5.15 shows such a comparison with lag values varying from 125 to 4,000 FEs for a fixed overall execution time of 𝑇𝑐 = 20,000 FEs. From the results, we observe that the performance of the synchronous case drops drastically when 𝑇𝑈 is increased for both the 118- and 300-bus OPF problems. This is because, for a large decision-making lag, more time is spent on decision-making and less execution time is available for running the optimization operations.
For the asynchronous case, the performance drops slowly with 𝑇𝑈, since IK-EMO is able to use the full computational budget of 20k FEs for both the 118- and 300-bus cases. This outweighs any performance loss caused by using outdated rules. From the results presented in this section, it is evident that user feedback lag is an important practical factor affecting an interactive optimization procedure. These preliminary results suggest that for a small lag, both synchronous and asynchronous implementations are viable options. However, for a large anticipated lag, an asynchronous implementation is better.

CHAPTER 6
IK-EMOVIZ: A SOFTWARE IMPLEMENTATION FOR THE IK-EMO FRAMEWORK

IK-EMO Visualizer, or IK-EMOViz, is a software implementation of user interactivity in the IK-EMO framework. In this chapter, we briefly cover the IK-EMOViz implementation shown in Figure 6.1.

Figure 6.1 Interactive knowledge-based EMO framework (IK-EMO). Blue blocks represent a normal EMO. Green blocks represent the automated knowledge extraction and application. Information from the learning agents and variable relation graphs is presented to the user using IK-EMOViz. The user, in turn, can provide feedback using the same interface.

6.1 User interaction using the IK-EMOViz graphical user interface
IK-EMOViz is implemented in Python using Plotly and Dash [120], which provide a browser-based graphical user interface (GUI) for the IK-EMO framework. IK-EMOViz allows the user to access real-time data about the optimization run, such as convergence indicators, scatter plots, and parallel coordinate plots (PCP), through separate widgets. Figure 6.2 shows an example instance of IK-EMOViz where the user wants to analyze the results of a bi-objective optimization problem after 60 generations.

Figure 6.2 IK-EMOViz graphical user interface showing the optimization progress, scatter plot, and parallel coordinate plot widgets describing the results of a bi-objective optimization problem which was terminated after 60 generations.

The optimization progress in this case is represented by a hypervolume (HV) [104] evolution plot (left-most plot in Figure 6.2) over the generations completed. The plot is dynamically updated as each generation is completed. Indicators other than HV can also be used, if desired. The scatter plot widget (middle plot) shows the entire population in the objective space, with the non-dominated solutions marked in orange and the dominated solutions marked in blue. By using the generation slider shown below the plot, the user can also check the objective vectors of the population at any earlier generation. The software saves all earlier populations and can display any of them, if desired. The PCP widget (right-most plot) gives a visualization of the objective and decision variable values together. An additional widget is available through which the user can visualize any solution from the Pareto front scatter plot, but for brevity, it is not shown here. The ‘IK-EMO Controls’ widget has three buttons (not shown here). The first button can pause or resume the optimization run. This is used if the user wants the algorithm to wait until he/she analyzes the results and provides feedback, also known as ‘synchronous interaction’. The second button is used to refresh all the widgets, except the one showing optimization progress, to get the latest data.
This functionality is necessary if the user did not pause the optimization run (‘asynchronous interaction’) and wants to see the updated results and their associated rules. The third button saves any user feedback, which will be considered by IK-EMO in subsequent generations. Another important widget displays the latest rules and the corresponding VRGs generated by the learning agent for all variable groups. A group selector menu allows the user to switch among multiple groups. Figure 6.3 shows the rule list and VRG for the Group 2 variables of the example problem. Two variables found to possess a significant relation are connected by a gray edge whose thickness is proportional to the rule score. Nodes are colored on the basis of their degree, with reddish nodes indicating a high number of connected edges and bluish nodes representing a low number of connected edges.

Figure 6.3 IK-EMOViz graphical user interface showing the full VRG for variable group 2 for a specific optimization problem.

After analyzing the VRG and associated rules, the user provides feedback by modifying one or more nodes in the VRG. The following operations can be performed on the learned rules.
• Exclusion: The user may remove certain rules provided by the algorithm, based on their knowledge of the problem. The VRG is updated by removing the corresponding edges.
• Selection: The user may want to keep only certain rules. In that case, the corresponding VRG edges are retained and the rest are deleted.
• Filtering: If there is a large number of rules, the user can choose to filter them based on criteria such as rule scores, variable correlations, etc.
• Ranking: The user may provide a ranking of the rules (rank 1 being most preferred). The algorithm will then try to implement the rules according to the ranks, as demonstrated in [121].
Figure 6.3 shows a portion of the IK-EMOViz GUI for the example problem considered previously. For Group 2, the list of power laws obtained is shown on the left and the corresponding VRG on the right. Figure 6.4 shows the rules selected by the user by clicking the corresponding check-boxes. On clicking the green tick button (top corner of the right plot), the VRG is updated with only the selected rules. For example, node 12 in Figure 6.3 is absent in Figure 6.4, since the user did not select any rule involving variable 𝑥12. This type of operation can be useful when the user only wants to select a few rules involving a few important variables.

Figure 6.4 IK-EMOViz graphical user interface showing a rule selection operation.

Figure 6.5 shows the resulting VRG when some rules are excluded from Figure 6.3 by selecting them in the rule list and clicking the cross button at the top right of the VRG. This is useful when the user only wants to exclude specific rules provided by the algorithm.

Figure 6.5 IK-EMOViz graphical user interface showing a rule exclusion operation.

6.1.1 Synchronous vs asynchronous user interaction
Pausing the optimization until the user finishes providing feedback ensures that the user always has access to the latest data. This is also referred to as synchronous interaction. However, the user may take a long time to analyze the results and provide his/her feedback. It can be more practical to continue running the optimization in the background while the user analyzes the intermediate solutions and their associated rules. This is known as asynchronous interaction.
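To make the preceding description of the IK-EMOViz widgets concrete, the following is a small, self-contained Dash sketch of a dashboard with a periodically refreshed hypervolume plot and an objective-space scatter plot driven by a generation slider. It is not the actual IK-EMOViz source; the file names history.csv and population_<gen>.csv and their columns are assumptions made purely for illustration.

```python
import pandas as pd
import plotly.graph_objects as go
from dash import Dash, dcc, html, Input, Output

app = Dash(__name__)
app.layout = html.Div([
    html.H3("IK-EMOViz-style dashboard (sketch)"),
    dcc.Graph(id="hv-plot"),
    dcc.Graph(id="scatter-plot"),
    dcc.Slider(id="gen-slider", min=1, max=60, step=1, value=60),
    dcc.Interval(id="refresh", interval=5000),  # poll for new data every 5 s
])

@app.callback(Output("hv-plot", "figure"), Input("refresh", "n_intervals"))
def update_hv(_):
    # history.csv (assumed) holds one row per generation: gen, hypervolume
    hist = pd.read_csv("history.csv")
    fig = go.Figure(go.Scatter(x=hist["gen"], y=hist["hypervolume"], mode="lines"))
    fig.update_layout(xaxis_title="Generation", yaxis_title="Hypervolume")
    return fig

@app.callback(Output("scatter-plot", "figure"),
              Input("gen-slider", "value"), Input("refresh", "n_intervals"))
def update_scatter(gen, _):
    # population_<gen>.csv (assumed) holds columns f1, f2, nd (1 if non-dominated)
    pop = pd.read_csv(f"population_{gen}.csv")
    fig = go.Figure()
    for nd, name in [(0, "Dominated"), (1, "Non-dominated")]:
        sub = pop[pop["nd"] == nd]
        fig.add_trace(go.Scatter(x=sub["f1"], y=sub["f2"], mode="markers", name=name))
    fig.update_layout(xaxis_title="f1", yaxis_title="f2")
    return fig

if __name__ == "__main__":
    app.run(debug=True)
```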
In the previous section, the pause optimization functionality of IK-EMOViz was introduced. It allows the user to switch between synchronous and asynchronous interaction modes. Apart from the learning interval (𝑇𝐿) and repair interval (𝑇𝑅), another quantity, the user feedback lag (𝑇𝑈), was introduced. This is defined as the number of generations or SEs that can be performed during the time the user is analyzing the results and preparing the feedback. 𝑇𝑈 is undefined for synchronous user interaction. Figure 6.6 illustrates the asynchronous user interaction mechanism of IK-EMOViz, assuming 𝑇𝐿 = 𝑇𝑅. After every 𝑇𝐿 SEs, the learning agent generates a new VRG for each variable group, marked by the blue vertical dashed lines. IK-EMOViz allows the user to provide feedback at any point thereafter during the optimization run. The green dashed lines represent a user interaction phase lasting for 𝑇𝑈 SEs. In the example shown in Figure 6.6, the user launches IK-EMOViz between learning rounds 2 and 3. Since user interaction is active, no repair is performed during learning rounds 3 and 4. Once user feedback is complete, the user-modified VRG (VRG(𝑈)) is merged with the last learned VRG (VRG(4)) as specified in Section 5.5.5, and the combined VRG is used by the repair agent. Thus, a larger 𝑇𝑈 means the user is making a decision based on potentially outdated information, which may not be as efficient as using the latest knowledge.

Figure 6.6 Asynchronous user interaction example with 𝑇𝐿 = 𝑇𝑅. Blue dashed lines represent the learning phases taking place at intervals of 𝑇𝐿 SEs. Green dashed lines represent a user interaction phase.

6.2 Truss Design Problem
For this study, we use the scalable truss optimization problem described in Chapter 4.

6.2.1 Experimental settings
In this study, we use two instances of the truss design problem: a truss with 120 members, 36 nodes, 129 decision variables, and 156 non-linear constraints, and a larger truss with 820 members, 236 nodes, 879 decision variables, and 1056 non-linear constraints. Variable groups were created similar to [121], according to the physical orientation of the truss beams, as shown in Table 6.1.

Table 6.1 Variable groups for the 120- and 820-member truss design problems.
𝐺1 (𝑙𝑖 of vertical members): 120-member [120-128]; 820-member [820-878]
𝐺2 (𝑟𝑖 of top longitudinal members): 120-member [8-15], [24-31]; 820-member [0-57], [116-173]
𝐺3 (𝑟𝑖 of bottom longitudinal members): 120-member [0-7], [16-23]; 820-member [58-115], [174-231]
𝐺4 (𝑟𝑖 of vertical members): 120-member [36-53]; 820-member [236-353]

The 120-member truss is used to illustrate how IK-EMOViz can help the user obtain insights about an optimization problem. NSGA-II [15] is chosen as the optimization algorithm of IK-EMO in this study. The population size is set to 40 and the maximum number of generations to 100. After 100 generations, IK-EMOViz is launched. The results are presented in the next section. The 820-member truss design problem is used to illustrate the power of IK-EMO in finding good solutions with periodic user interactions. NSGA-II is run for a maximum of 12,500 generations (set by a trial-and-error process and required to work with 879 variables), giving a total computational budget of 500,000 SEs. To ensure consistency, three artificial users, U1, U2, and U3, are created to select, from all the rules generated by our rule learning method, those with scores above 0.9, 0.7, and 0.5, respectively. Thus, user U1 chooses fewer rules compared to U2 and U3.
For each user, four rule adherence schemes (RA1, RA2, RA3, and RA-E) are considered, making a total of 12 separate optimization runs. Moreover, two different scenarios are considered: synchronous interaction, where the optimization is paused while user interaction takes place after every 𝑇𝑈 = 10,000 SEs, and asynchronous interaction, in which the optimization does not wait during the user's analysis process. Since we would like to complete the runs within a fixed execution time, the synchronous interaction cases are allocated fewer overall SEs (for each analysis phase, 𝑇𝑈 SEs are discounted). For each user-repair agent combination, 20 runs are performed, involving a total of 480 optimization runs. The hypervolume (HV) [104] metric value for each run is recorded.

6.2.2 Experimental results and discussion on the 120-member truss
For the 120-member truss, the Pareto front and HV evolution plots are shown in Figure 6.7. The figures are extracted from IK-EMOViz.

Figure 6.7 Pareto front (a) and hypervolume plot (b) for the 120-member truss optimization problem obtained from IK-EMOViz.

From the Pareto-optimal solutions found so far, power law rules are extracted. Group 𝐺1, representing the lengths of the vertical members, is shown in Figure 6.8, highlighted in green.

Figure 6.8 A Pareto-optimal truss design with group 𝐺1 highlighted in green (a) and the corresponding variable relation graph (b) obtained from IK-EMOViz.

From Figure 6.8b, it can be seen that all the variables are related to each other through some significant power law. However, through the functionality of IK-EMOViz we can perform some simplifications. Variables 120-128 represent consecutive vertical members, shown in Figure 6.8a from right to left. So we perform rule selection and select rules of the form 𝑥𝑖 𝑥𝑖+1^𝑏 = 𝑐, where 𝑖 = 120, 121, . . . , 127. The reduced set of power law rules thus obtained for 𝐺1, along with their rule compliance values, is given in Table 6.2. Rule compliance is defined as the proportion of non-dominated solutions that follow a power law. The corresponding VRG is shown in Figure 6.9a.

Table 6.2 Power law rules found for group 𝐺1.
Rule 1: 𝑥120 𝑥121^(−0.57) = 0.92, rule compliance 1.00
Rule 2: 𝑥121 𝑥122^(−1.00) = 0.87, rule compliance 1.00
Rule 3: 𝑥122 𝑥123^(−0.92) = 0.96, rule compliance 1.00
Rule 4: 𝑥123 𝑥124^(−0.82) = 1.05, rule compliance 1.00
Rule 5: 𝑥124 𝑥125^(−1.13) = 0.98, rule compliance 1.00
Rule 6: 𝑥125 𝑥126^(−0.78) = 1.14, rule compliance 1.00
Rule 7: 𝑥126 𝑥127^(−1.06) = 1.10, rule compliance 1.00
Rule 8: 𝑥127 𝑥128^(−1.72) = 1.03, rule compliance 1.00

From Table 6.2, let us consider two of the power laws: 𝜙1(x) = 𝑥120 𝑥121^(−0.57) − 0.92 = 0 and 𝜙2(x) = 𝑥127 𝑥128^(−1.72) − 1.03 = 0. 𝑥120 and 𝑥128 represent the lengths of the vertical members at each end, and 𝑥121 and 𝑥127 represent the adjacent vertical members, respectively. Figure 6.9b, obtained from IK-EMOViz, plots the two power laws over the normalized variable range [1, 2], together with the line 𝑥𝑖 = 𝑥𝑗. The non-dominated solutions from which the power laws were extracted are also shown. An interesting observation is that 𝜙1(x) lies above and 𝜙2(x) lies below the 𝑥𝑖 = 𝑥𝑗 line. This indicates that 𝑥120 < 𝑥121 and 𝑥127 > 𝑥128 among good solutions. Analyzing the rest of the power laws in Table 6.2 and the loading condition of the truss, we can see that 𝑥120 < 𝑥121 < . . . < 𝑥124 and 𝑥124 > 𝑥125 > . . . > 𝑥128.
Figure 6.9 Simplified variable relation graph of group 𝐺1 (a) and the power laws between (𝑥120, 𝑥121) and (𝑥127, 𝑥128) (b), obtained from IK-EMOViz.

This indicates that the lengths of the vertical members increase from the ends to the middle of the truss, which is an expected outcome for this particular loading condition [92]. IK-EMO is able to successfully extract these rules, and IK-EMOViz provides useful functionality for the user to understand them.

6.2.3 Experimental results and discussion on the 820-member truss
The final median HV obtained at the end of 500k SEs for each user-repair agent combination is shown in Table 6.3. The row-wise best is marked in bold, with statistically similar algorithms marked in italics. The column-wise best is marked by a gray box. It can be seen that user U2, who selects a moderate number of rules, is the best performer for all the repair agents. Across the six scenarios, RA2 is the best performer in 4 out of 6 cases, and RA-E in 3 out of 6 cases. This shows that an intermediate amount of rule usage combined with a medium level of rule adherence works best, as shown in [121]. In all cases, asynchronous interaction results in better performance due to the high 𝑇𝑈. While asynchronous interaction runs the risk of the user making a decision based on outdated rules, the built-in safeguards of IK-EMO against incorrect user information [121] mitigate some of the risks. Thus, allowing the optimization to run in the background while the user deliberates on the available data is a better approach than pausing the optimization, and a tool like IK-EMOViz, which gives the user the option to keep running the optimization in the background, is necessary.

Table 6.3 Final median HV obtained at the end of 500k SEs for the 820-member truss optimization problem. The best-performing case in each row is marked in bold; statistically similar performance to the row-wise best is marked in italics; the gray boxes represent the column-wise best.
Repair agent: U1 Sync / Async | U2 Sync / Async | U3 Sync / Async
None (base): 0.79 / 0.79 | 0.79 / 0.79 | 0.79 / 0.79
RA1: 0.81 / 0.98 | 0.88 / 1.01 | 0.82 / 0.95
RA2: 0.84 / 1.06 | 0.91 / 1.08 | 0.77 / 0.99
RA3: 0.81 / 0.96 | 0.86 / 0.98 | 0.79 / 0.90
RA-E: 0.85 / 1.02 | 0.88 / 1.08 | 0.79 / 0.99

For user U2 and repair agent RA2, the PF and HV plots comparing the synchronous and asynchronous interaction cases are shown in Figure 6.10. Figure 6.10a shows how asynchronous interaction can provide better solutions compared to synchronous interaction. The HV plot in Figure 6.10b shows a portion of the optimization run between 50k and 200k SEs, with user interaction instances marked in red. Asynchronous interaction results in a better median HV.

Figure 6.10 Pareto fronts (a, one run) and median hypervolume plots (b, over 20 runs) obtained by IK-EMO with U2 and RA2 under synchronous and asynchronous interaction for the 820-member truss design problem.

CHAPTER 7
CONCLUSIONS AND FUTURE WORK

7.1 Contributions of this dissertation
This dissertation describes the process of converting a basic EMO algorithm into an interactive method which uses automated knowledge-extraction methods.
CHAPTER 7 CONCLUSIONS AND FUTURE WORK

7.1 Contributions of this dissertation

This dissertation describes the process of converting a basic EMO algorithm into an interactive method which uses automated knowledge-extraction methods. Knowledge, in this context, is defined as mathematical relations among two or more variables, such as power laws, inequalities, and constant rules. The first framework (MOEA/I) extracts inequality and equality rules during the optimization run. This knowledge is applied in the form of repair operators, where newly generated solutions are modified according to any learned rules. Three such repair operators are introduced, along with an ensemble method capable of combining multiple repair operators and adaptively shifting between them based on their performance. The performance is demonstrated on large-scale truss design problems (879 and 1,479 variables), as well as a solid rocket design problem (544 variables). MOEA/I is shown to provide better solutions overall as well as to exhibit faster convergence on all the problems.

The IK-EMO framework interleaves interactive optimization with automated knowledge extraction to obtain better-quality solutions faster. Power law, inequality, and mixed rules are extracted, together with their degrees of statistical adherence, from the non-dominated (ND) solutions at regular intervals of generations. A computationally efficient knowledge processing method based on a graph data structure, the variable relation graph (VRG), has been proposed to store and process multiple pairwise variable interactions. A user is then expected to provide a ranking of the learned rules based on his or her perception of their validity. A repair agent has been proposed to utilize the VRG with the user-supplied ranking to repair offspring solutions. The study has created six repair schemes with three different degrees of rule adherence: tight, medium, and loose. A mixed power law and inequality-based repair has also been used. Finally, three ensemble-based repair schemes which adaptively use power laws, inequalities, or both have been proposed. These 10 repair schemes have been implemented with four different rule usage schemes, RU1-RU4, which use from 10% (conservative) to 100% (liberal) of the learned rules.

The IK-EMO framework has been applied to multiple large-scale two-objective practical problems: 78- and 118-variable stepped beam design problems, 115- and 243-variable optimal power flow problems, and a 1,179-variable truss design problem. Experimental results on all problems have consistently shown that (i) usage of a moderate number of rules (20%) combined with a moderate degree of rule adherence produces the best performance, and (ii) power law rules, individually, produce better performance than inequality rules. Moreover, ensemble-based repair operators provide comparable performance to the best-performing individual repair operators. Use of ensembles eliminates the need to experiment to find the right rule adherence for a new problem. IK-EMO is also able to work with very low population sizes, even for a large-scale problem.

A preliminary study has also been made on a practical aspect: the inevitable lag time between presenting learned rules to the user and obtaining the user's feedback. Results on the OPF problem have shown that the proposed framework is able to maintain similar performance up to a certain lag period, beyond which the user response becomes too slow for the algorithm to maintain the same level of performance.
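As a purely illustrative view of the VRG-based knowledge processing summarized above, the sketch below stores pairwise power-law rules 𝑥𝑖 𝑥𝑗^𝑏 = 𝑐 as edges of a directed graph and repairs an offspring by enforcing each rule along those edges. It uses the networkx library; the function names, the topological traversal order, and the example values are assumptions of the sketch rather than the exact IK-EMO repair agents.

    import networkx as nx

    def build_vrg(rules):
        # Variable relation graph: edge (i, j) stores a pairwise power-law
        # rule x_i * x_j**b = c between decision variables i and j.
        vrg = nx.DiGraph()
        for i, j, b, c in rules:
            vrg.add_edge(i, j, b=b, c=c)
        return vrg

    def repair(x, vrg):
        # Repair one offspring (a dict of positive, normalized variable values)
        # by enforcing each stored power law.  Visiting variables in topological
        # order keeps every variable consistent with its already-repaired
        # predecessor; this ordering is an assumption of the sketch.
        for i in nx.topological_sort(vrg):
            for j in vrg.successors(i):
                b, c = vrg[i][j]["b"], vrg[i][j]["c"]
                x[j] = (c / x[i]) ** (1.0 / b)  # solve x_i * x_j**b = c for x_j
        return x

    # Two of the rules from Table 6.2, written as (i, j, b, c) tuples.
    rules = [(120, 121, -0.57, 0.92), (127, 128, -1.72, 1.03)]
    x = {i: 1.5 for i in range(120, 129)}  # toy design with all variables at 1.5
    print(repair(x, build_vrg(rules)))

A user-supplied rule ranking could then be honored simply by skipping low-ranked edges during the traversal, which is the spirit of the rule usage and adherence schemes described above.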
This study has introduced a knowledge-based interactive optimization tool, IK-EMOViz, for executing a better-informed optimization study. Through a 120-member truss design problem, we have shown how IK-EMOViz can help the user obtain useful insights about an optimization problem in the form of simple relationships among variables and a visualization of their interactions through relationship graphs. In addition, the user can provide feedback by stating preferences on the obtained relationships through the tool. IK-EMOViz also allows the user to perform asynchronous interaction by utilizing computing resources in the background while the user studies and analyzes the obtained relationships. As demonstrated on an 820-member truss problem, the option of asynchronous interaction can result in better performance. Moreover, a moderate number of relationships chosen by the user and applied with moderate adherence within the EMO algorithm has been found to produce better results.

7.2 Future work

This study opens up a number of avenues for future work, discussed below.

7.2.1 Varying learning and repair intervals

In this work, the learning and repair intervals are kept fixed. The effect of these parameters needs to be studied more closely. Learning every 10 generations can produce different outcomes than learning every generation. More frequent repair operations can result in a much more accurate probability update for the ensemble operators. On the other hand, too many repairs performed within a short period of time can steer the optimization algorithm towards undesired regions of the search space.

7.2.2 Locality of learned rules

In this study, the learning agents operated on the assumption that learned rules are of high quality if they are applicable across the entire Pareto front. However, some rules may exist in only certain parts of the Pareto-optimal front; the existence of transition points [67] is one such example. Exploring ways for IK-EMO to extract local rules and repair offspring population members accordingly will introduce additional challenges but may result in faster convergence. Appropriate evaluation metrics also need to be developed.

7.2.3 Integration of traditional user preference incorporation methods

This study uses a novel way to integrate user-provided knowledge into an optimization framework. Traditional user preference information, including but not limited to aspiration levels, relative importance of objective functions, and preferred regions of the Pareto-optimal front, can also potentially be integrated into IK-EMO.

7.2.4 Asynchronous user interaction

The asynchronous user interaction study is practical and must be investigated more thoroughly. One future direction would be to study the effect of a variable user interaction lag (𝑇𝑈). This is relevant since, in a practical setting, the user is unlikely to take the same amount of time for every round of interaction. The user is also not likely to interact as frequently as assumed in this study. Thus, studies comparing synchronous and asynchronous user interaction modes need to be more oriented towards practical situations.

7.2.5 Improvements to IK-EMOViz

Future work can focus on enhancing the functionality of IK-EMOViz by allowing direct modification of rule parameters. The user should also be able to introduce new rules if they are not present in the rule set extracted by the learning agents. Other modifications can include the introduction of new repair agents or the disabling of existing ones. The VRG widget can also be made interactive by turning it into a 3D model, allowing the user to directly modify the VRG.
Nevertheless, this study provides evidence of the importance of a robust software implementation for any knowledge-based interactive optimization method that combines human knowledge and machine intelligence to execute an optimization task faster than either of them could alone.

7.3 Concluding remarks

Designers often have years of knowledge and intuition about the problems they are interested in optimizing. Innovization methods also exist which allow us to extract such knowledge in an online fashion. In this work, we provide an EMO-based framework that is able to leverage both human-provided and computer-extracted problem knowledge to achieve faster convergence. The proposed framework is able to outperform NSGA-II on large-scale problems with up to 1,479 variables. The work in this dissertation demonstrates the potential of developing and applying interactive knowledge-driven optimization methods in practical applications.

BIBLIOGRAPHY

[1] M. O. Grond, N. H. Luong, J. Morren, and J. G. Slootweg, “Multi-objective optimization techniques and applications in electric power systems,” 2012.

[2] J. L. J. Pereira, G. A. Oliver, M. B. Francisco, S. S. Cunha, and G. F. Gomes, “A review of multi-objective optimization: Methods and algorithms in mechanical engineering problems,” Archives of Computational Methods in Engineering, vol. 29, pp. 2285–2308, 6 2022.

[3] K. Deb and R. B. Agrawal, “Simulated binary crossover for continuous search space,” Complex Systems, vol. 9, pp. 115–148, 1995.

[4] K. V. Price, R. M. Storn, and J. A. Lampinen, Differential Evolution: A Practical Approach to Global Optimization. Springer-Verlag, 2005.

[5] K. Deb and C. Myburgh, “A population-based fast algorithm for a billion-dimensional resource allocation problem with integer variables,” European Journal of Operational Research, vol. 261, pp. 460–474, 9 2017.

[6] A. H. Gandomi, K. Deb, R. C. Averill, S. Rahnamayan, and M. N. Omidvar, “Using semi-independent variables to enhance optimization search,” Expert Systems with Applications, vol. 120, pp. 279–297, 2019.
[7] K. Deb and A. Srinivasan, “Innovization: Innovating design principles through optimization,” vol. 2, pp. 1629–1636, 2006.

[8] S. Bandaru and K. Deb, “Higher and lower-level knowledge discovery from Pareto-optimal sets,” vol. 57, pp. 281–298, Springer, 10 2013.

[9] A. Gaur and K. Deb, “Adaptive use of innovization principles for a faster convergence of evolutionary multi-objective optimization algorithms,” pp. 75–76, Association for Computing Machinery, Inc, 7 2016.

[10] C. A. C. Coello, D. A. V. Veldhuizen, and G. B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, vol. 5. Springer US, 2007.

[11] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley & Sons, Inc., 2001.

[12] K. Miettinen, “Nonlinear multiobjective optimization,” vol. 12, 1998.

[13] C. M. Fonseca and P. J. Fleming, “Genetic algorithms for multiobjective optimization,” 1993.

[14] E. Zitzler and L. Thiele, “An evolutionary algorithm for multiobjective optimization: The strength Pareto approach,” 1998.

[15] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6, pp. 182–197, 4 2002.

[16] J. Nocedal and S. Wright, “Solutions to the selected problems in numerical optimization,” p. 524, 2006.

[17] D. Goldberg, “Genetic algorithms in search optimization and machine learning,” 1988.

[18] J. Koza, “Genetic programming: On the programming of computers by means of natural selection,” Biosystems, vol. 33, pp. 69–73, 1992.

[19] H.-P. P. Schwefel, Evolution and Optimum Seeking: The Sixth Generation. John Wiley & Sons, Inc., 1993.

[20] Q. Zhang and H. Li, “MOEA/D: A multiobjective evolutionary algorithm based on decomposition,” IEEE Transactions on Evolutionary Computation, vol. 11, pp. 712–731, 12 2007.

[21] E. Zitzler, M. Laumanns, and L. Thiele, “SPEA2: Improving the strength Pareto evolutionary algorithm,” TIK Report, vol. 103, 2001.

[22] K. Deb and H. Jain, “An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints,” IEEE Transactions on Evolutionary Computation, vol. 18, pp. 577–601, 2014.

[23] H. Jain and K. Deb, “An evolutionary many-objective optimization algorithm using reference-point based nondominated sorting approach, part II: Handling constraints and extending to an adaptive approach,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 602–622, 2014.

[24] E. Zitzler, K. Deb, and L. Thiele, “Comparison of multiobjective evolutionary algorithms: Empirical results,” Evolutionary Computation, vol. 8, pp. 173–195, 3 2000.

[25] K. Deb, L. Thiele, M. Laumanns, and E. Zitzler, “Scalable multi-objective optimization test problems,” vol. 1, pp. 825–830, IEEE Computer Society, 2002.

[26] S. Huband, L. Barone, L. While, and P. Hingston, “A scalable multi-objective test problem toolkit,” vol. 3410, pp. 280–295, Springer Verlag, 2005.

[27] Q. Zhang, A. Zhou, and S.-Z. Zhao, “Multiobjective optimization test instances for the CEC 2009 special session and competition,” University of Essex, pp. 1–30, 2008.

[28] K. Deb and C. Myburgh, “Breaking the billion-variable barrier in real-world optimization using a customized evolutionary algorithm,” pp. 653–660, Association for Computing Machinery, Inc, 7 2016.
[29] A. Szollos, M. Šmíd, and J. Hájek, “Aerodynamic optimization via multi-objective micro-genetic algorithm with range adaptation, knowledge-based reinitialization, crowding and 𝜖-dominance,” Advances in Engineering Software, vol. 40, pp. 419–430, 6 2009.

[30] R. Landa-Becerra, L. V. Santana-Quintero, and C. A. Coello, “Knowledge incorporation in multi-objective evolutionary algorithms,” Studies in Computational Intelligence, vol. 98, pp. 23–46, 2008.

[31] K. Amouzgar, A. Nourmohammadi, and A. H. Ng, “Multi-objective optimisation of tool indexing problem: A mathematical model and a modified genetic algorithm,” International Journal of Production Research, vol. 59, pp. 3572–3590, 2021.

[32] C. A. B. Diaz, T. Aslam, and A. H. Ng, “Optimizing reconfigurable manufacturing systems for fluctuating production volumes: A simulation-based multi-objective approach,” IEEE Access, 2021.

[33] A. Nourmohammadi, H. Eskandari, M. Fathi, and A. H. Ng, “Integrated locating in-house logistics areas and transport vehicles selection problem in assembly lines,” International Journal of Production Research, vol. 59, pp. 598–616, 2021.

[34] K. Amouzgar, A. H. Ng, and G. Ljustina, “Optimizing index positions on CNC tool magazines considering cutting tool life and duplicates,” vol. 93, pp. 1508–1513, Elsevier, 1 2020.

[35] A. Jaszkiewicz and P. Zielniewicz, “Pareto memetic algorithm with path relinking for bi-objective traveling salesperson problem,” European Journal of Operational Research, vol. 193, pp. 885–890, 3 2009.

[36] B. Christian and S. Cremaschi, “Planning pharmaceutical clinical trials under outcome uncertainty,” 1 2018.

[37] A. Baghdadi, M. Heristchian, and H. Kloft, “Design of prefabricated wall-floor building systems using meta-heuristic optimization algorithms,” Automation in Construction, vol. 114, p. 103156, 6 2020.

[38] H. Li, “Network topology design,” 1 2016.

[39] S. K. Seo, D. H. Cho, Y. Lim, and C. J. Lee, “Application of genetic algorithm to layer patterning of plate fin heat exchanger,” 10 2017.

[40] T. Rios, B. V. Stein, T. Back, B. Sendhoff, and S. Menzel, “Point2FFD: Learning shape representations of simulation-ready 3D models for engineering design optimization,” pp. 1024–1033, Institute of Electrical and Electronics Engineers (IEEE), 1 2022.

[41] S. Friess, P. Tino, Z. Xu, S. Menzel, B. Sendhoff, and X. Yao, “Artificial neural networks as feature extractors in continuous evolutionary optimization,” vol. 2021-July, Institute of Electrical and Electronics Engineers Inc., 7 2021.

[42] S. Friess, P. Tino, S. Menzel, B. Sendhoff, and X. Yao, “Representing experience in continuous evolutionary optimisation through problem-tailored search operators,” Institute of Electrical and Electronics Engineers Inc., 7 2020.

[43] G. Ruan, L. L. Minku, S. Menzel, B. Sendhoff, and X. Yao, “Computational study on effectiveness of knowledge transfer in dynamic multi-objective optimization,” Institute of Electrical and Electronics Engineers Inc., 7 2020.

[44] C. A. C. Coello and M. G. C. Tapia, “Cultural algorithms for optimization,” 7 2021.

[45] S. Obayashi and D. Sasaki, “Visualization and data mining of Pareto solutions using self-organizing map,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 2632, pp. 796–809, 2003.
[46] S. Saha, T. Rios, L. L. Minku, X. Yao, Z. Xu, B. Sendhoff, and S. Menzel, “Optimal evolutionary optimization hyper-parameters to mimic human user behavior,” pp. 858–866, Institute of Electrical and Electronics Engineers Inc., 12 2019.

[47] I. Karlsson, S. Bandaru, and A. H. Ng, “Online knowledge extraction and preference guided multi-objective optimization in manufacturing,” IEEE Access, vol. 9, pp. 145382–145396, 2021.

[48] I. Morshedzadeh, A. H. Ng, and M. Jeusfeld, “Managing manufacturing data and information in product lifecycle management systems considering changes and revisions,” International Journal of Product Lifecycle Management, vol. 13, pp. 244–264, 2021.

[49] S. Lidberg, T. Aslam, L. Pehrsson, and A. H. Ng, “Optimizing real-world factory flows using aggregated discrete event simulation modelling: Creating decision-support through simulation-based optimization and knowledge-extraction,” Flexible Services and Manufacturing Journal, vol. 32, pp. 888–912, 12 2020.

[50] H. Smedberg, S. Bandaru, A. H. Ng, and K. Deb, “Trend mining 2.0: Automating the discovery of variable trends in the objective space,” Institute of Electrical and Electronics Engineers Inc., 7 2020.

[51] B. Xin, L. Chen, J. Chen, H. Ishibuchi, K. Hirota, and B. Liu, “Interactive multiobjective optimization: A review of the state-of-the-art,” 7 2018.

[52] K. Deb and J. Sundar, “Reference point based multi-objective optimization using evolutionary algorithms,” vol. 1, pp. 635–642, ACM Press, 2006.

[53] K. Miettinen, F. Ruiz, and A. P. Wierzbicki, “Introduction to multiobjective optimization: Interactive approaches,” 2008.

[54] L. Rachmawati and D. Srinivasan, “Incorporating the notion of relative importance of objectives in evolutionary multiobjective optimization,” IEEE Transactions on Evolutionary Computation, vol. 14, pp. 530–546, 8 2010.

[55] S. Greco, B. Matarazzo, and R. Słowiński, “Interactive evolutionary multiobjective optimization using dominance-based rough set approach,” 2010.

[56] C. Barba-González, V. Ojalehto, J. García-Nieto, A. J. Nebro, K. Miettinen, and J. F. Aldana-Montes, “Artificial decision maker driven by PSO: An approach for testing reference point based interactive methods,” vol. 11101 LNCS, pp. 274–285, Springer Verlag, 2018.

[57] K. Deb, A. Sinha, P. J. Korhonen, and J. Wallenius, “An interactive evolutionary multiobjective optimization method based on progressively approximated value functions,” IEEE Transactions on Evolutionary Computation, vol. 14, pp. 723–739, 10 2010.

[58] M. Gombolay, R. Jensen, J. Stigile, T. Golen, N. Shah, S. H. Son, and J. Shah, “Human-machine collaborative optimization via apprenticeship scheduling,” Journal of Artificial Intelligence Research, vol. 63, pp. 1–49, 5 2018.

[59] S. Greco, V. Mousseau, and R. Słowiński, “Ordinal regression revisited: Multiple criteria ranking using a set of additive value functions,” European Journal of Operational Research, vol. 191, pp. 416–436, 12 2008.

[60] O. I. Larichev, “Cognitive validity in design of decision-aiding techniques,” Journal of Multi-Criteria Decision Analysis, vol. 1, pp. 127–138, 1992.

[61] V. Ojalehto, D. Podkopaev, and K. Miettinen, “Towards automatic testing of reference point based interactive methods,” vol. 9921 LNCS, pp. 483–492, Springer Verlag, 2016.

[62] S. Ruiyi, G. Liangjin, and F. Zijie, “Multi-objective collaborative optimization based on evolutionary algorithms,” Journal of Mechanical Design, Transactions of the ASME, vol. 133, 10 2011.
[63] X. Shen, Y. Guo, Q. Chen, and W. Hu, “A multi-objective optimization evolutionary algorithm incorporating preference information based on fuzzy logic,” Computational Optimization and Applications, vol. 46, pp. 159–188, 6 2010.

[64] J. Blank and K. Deb, “Solver: A blueprint for collaborative optimization in practice,” vol. 1, pp. 105–110, 2021.

[65] S. Bandaru and K. Deb, “Automated innovization for simultaneous discovery of multiple rules in bi-objective problems,” vol. 6576 LNCS, pp. 1–15, Springer, Berlin, Heidelberg, 2011.

[66] K. Deb, S. Bandaru, D. Greiner, A. Gaspar-Cunha, and C. C. Tutum, “An integrated approach to automated innovization for discovering useful design principles: Case studies from engineering,” Applied Soft Computing Journal, vol. 15, pp. 42–56, 2 2014.

[67] K. Deb and A. Srinivasan, “Innovization: Discovery of innovative design principles through multiobjective evolutionary optimization,” 11 2007.

[68] S. Bandaru, C. C. Tutum, K. Deb, and J. H. Hattel, “Higher-level innovization: A case study from friction stir welding process optimization,” pp. 2782–2789, 2011.

[69] K. Deb, S. Gupta, D. Daum, J. Branke, A. K. Mall, and D. Padmanabhan, “Reliability-based optimization using evolutionary algorithms,” IEEE Transactions on Evolutionary Computation, vol. 13, pp. 1054–1074, 2009.

[70] S. Bandaru and K. Deb, “Temporal innovization: Evolution of design principles using multi-objective optimization,” vol. 9018, pp. 79–93, Springer Verlag, 2015.

[71] A. Gaur and K. Deb, “Towards an automated innovization method for handling discrete search spaces,” pp. 2899–2906, Institute of Electrical and Electronics Engineers Inc., 9 2015.

[72] A. Gaur and K. Deb, “Adaptive use of innovization principles for a faster convergence of evolutionary multi-objective optimization algorithms,” pp. 75–76, Association for Computing Machinery, Inc, 7 2016.

[73] A. Gaur and K. Deb, “Effect of size and order of variables in rules for multi-objective repair-based innovization procedure,” pp. 2177–2184, Institute of Electrical and Electronics Engineers Inc., 7 2017.

[74] CPLEX, IBM ILOG, “V12.1: User’s manual for CPLEX,” International Business Machines Corporation, vol. 46, no. 53, p. 157, 2009.

[75] Gurobi Optimization, LLC, “Gurobi Optimizer Reference Manual,” 2022.

[76] FICO, “Xpress-Optimizer Reference Manual,” 2009.

[77] ESTECO - ModeFRONTIER, “Process automation and optimization in the engineering design process,” 2022.

[78] V. R. D, “VisualDOC: Software for Process Integration and Multidiscipline Design Optimization,” 2022.

[79] Siemens, “HEEDS – Discover better designs, faster,” 2022.

[80] Noesis-Optimus, “The industry-leading PIDO software platform,” 2022.

[81] Ansys, “Ansys optiSLang - Process Integration & Design Optimization,” 2022.

[82] Dassault-iSight, “Automate design exploration and optimization,” 2022.

[83] J. J. Durillo and A. J. Nebro, “jMetal: A Java framework for multi-objective optimization,” Advances in Engineering Software, vol. 42, pp. 760–771, 2011.

[84] J. Durillo, A. Nebro, and E. Alba, “The jMetal framework for multi-objective optimization: Design and architecture,” in CEC 2010, (Barcelona, Spain), pp. 4138–4325, July 2010.

[85] J. Blank and K. Deb, “pymoo: Multi-objective optimization in Python,” IEEE Access, vol. 8, pp. 89497–89509, 2020.
[86] S. Bleuler, M. Laumanns, L. Thiele, and E. Zitzler, “PISA: A platform and programming language independent interface for search algorithms,” in Evolutionary Multi-Criterion Optimization (EMO 2003) (C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb, and L. Thiele, eds.), Lecture Notes in Computer Science, (Berlin), pp. 494–508, Springer, 2003.

[87] K. Miettinen and M. M. Mäkelä, “NIMBUS: Interactive method for nondifferentiable multiobjective optimization problems,” pp. 50–57, 1996.

[88] G. Misitano, B. S. Saini, B. Afsar, B. Shavazipour, and K. Miettinen, “DESDEO: The modular and open source framework for interactive multiobjective optimization,” IEEE Access, vol. 9, pp. 148277–148295, 2021.

[89] Y. Dhebar and K. Deb, “Interpretable rule discovery through bilevel optimization of split-rules of nonlinear decision trees for classification problems,” IEEE Transactions on Cybernetics, 2020.

[90] A. Ghosh, Y. Dhebar, R. Guha, K. Deb, S. Nageshrao, L. Zhu, E. Tseng, and D. Filev, “Interpretable AI agent through nonlinear decision trees for lane change problem,” in 2021 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 01–08, 2021.

[91] A. Ghosh, E. Goodman, K. Deb, R. Averill, and A. Diaz, “A large-scale bi-objective optimization of solid rocket motors using innovization,” 2020 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8, 7 2020.

[92] A. Ghosh, K. Deb, R. Averill, and E. Goodman, “Combining user knowledge and online innovization for faster solution to multi-objective design optimization problems,” vol. 12654 LNCS, pp. 102–114, Springer, Cham, 3 2021.

[93] A. Ghosh, K. Deb, E. Goodman, and R. Averill, “A user-guided innovization-based evolutionary algorithm framework for practical multi-objective optimization problems,” Engineering Optimization, vol. 0, no. 0, pp. 1–13, 2022.

[94] Z. Tian, Y. Liu, L. Jiang, W. Zhu, and Y. Ma, “A review on application of composite truss bridges composed of hollow structural section members,” 2 2019.

[95] M. P. Bendsøe and O. Sigmund, “Topology design of truss structures,” 2004.

[96] A. Ahrari, A. A. Atai, and K. Deb, “Simultaneous topology, shape and size optimization of truss structures by fully stressed design based on evolution strategy,” Engineering Optimization, vol. 47, pp. 1063–1084, 8 2015.

[97] A. Ahrari, A. A. Atai, and K. Deb, “A customized bilevel optimization approach for solving large-scale truss design problems,” Engineering Optimization, vol. 52, pp. 2062–2079, 12 2020.

[98] M. Z. A. Elrehim, M. A. Eid, and M. G. Sayed, “Structural optimization of concrete arch bridges using genetic algorithms,” Ain Shams Engineering Journal, vol. 10, pp. 507–516, 2019.

[99] R. Cazacu and L. Grama, “Steel truss optimization using genetic algorithms and FEA,” Procedia Technology, vol. 12, pp. 339–346, 2014.

[100] A. C. Lemonge, J. P. Carvalho, P. H. Hallak, and D. E. Vargas, “Multi-objective truss structural optimization considering natural frequencies of vibration and global stability,” Expert Systems with Applications, vol. 165, p. 113777, 3 2021.

[101] V. Ho-Huu, D. Duong-Gia, T. Vo-Duy, T. Le-Duc, and T. Nguyen-Thoi, “An efficient combination of multi-objective evolutionary optimization and reliability analysis for reliability-based design optimization of truss structures,” Expert Systems with Applications, vol. 102, pp. 262–272, 7 2018.

[102] S. Bandyopadhyay and A. Mukherjee, “An algorithm for many-objective optimization with reduced objective computations: A study in differential evolution,” IEEE Transactions on Evolutionary Computation, vol. 19, pp. 400–413, 6 2015.

[103] B. O. I. S. C. in Structural Optimization, “Optimization problem of ISCSO 2019,” 2019.
[104] E. Zitzler and L. Thiele, “Multiobjective optimization using evolutionary algorithms: A comparative case study,” vol. 1498 LNCS, pp. 292–301, Springer Verlag, 1998.

[105] D. A. V. Veldhuizen, “Multiobjective evolutionary algorithms: Classifications, analyses and new innovations,” IRE Transactions on Education, vol. 8, pp. 125–147, 1999.

[106] H. Ishibuchi, H. Masuda, Y. Tanigaki, and Y. Nojima, “Modified distance calculation in generational distance and inverted generational distance,” vol. 9019, pp. 110–125, Springer Verlag, 2015.

[107] J. D. Gibbons and S. Chakraborti, “Nonparametric statistical inference,” 2011.

[108] M. Hollander, D. A. Wolfe, and E. Chicken, Nonparametric Statistical Methods. Wiley, 7 2015.

[109] G. P. Sutton and O. Biblarz, Rocket Propulsion Elements. Wiley, 9th ed., 2016.

[110] A. V. Aho, M. R. Garey, and J. D. Ullman, “The transitive reduction of a directed graph,” SIAM Journal on Computing, vol. 1, pp. 131–137, 7 1972.

[111] A. Rothwell, “Optimization of beams,” 4 2017.

[112] S. Datta, A. Ghosh, K. Sanyal, and S. Das, “A radial boundary intersection aided interior point method for multi-objective optimization,” Information Sciences, vol. 377, pp. 1–16, 2017.

[113] M. Basu, “Economic environmental dispatch using multi-objective differential evolution,” Applied Soft Computing, vol. 11, pp. 2845–2853, 3 2011.

[114] S. Tan, S. Lin, L. Yang, A. Zhang, W. Shi, and H. Feng, “Multi-objective optimal power flow model for power system operation dispatching,” IEEE Computer Society, 2013.

[115] M. Ghasemi, S. Ghavidel, M. M. Ghanbarian, M. Gharibzadeh, and A. A. Vahed, “Multi-objective optimal power flow considering the cost, emission, voltage deviation and power losses using multi-objective modified imperialist competitive algorithm,” Energy, vol. 78, pp. 276–289, 12 2014.

[116] S. K. Dash and S. Mohanty, “Multi-objective economic emission load dispatch with nonlinear fuel cost and noninferior emission level functions for IEEE 118-bus system,” pp. 1371–1376, Institute of Electrical and Electronics Engineers Inc., 6 2015.

[117] M. Ghasemi, S. Ghavidel, M. M. Ghanbarian, and M. Gitizadeh, “Multi-objective optimal electric power planning in the power system using Gaussian bare-bones imperialist competitive algorithm,” Information Sciences, vol. 294, pp. 286–304, 2 2015.

[118] F. Tooryan, A. Ghosh, Y. Wang, S. Srivastava, E. Arvanitis, and V. D. Angelis, “Comprehensive design of a microgrid energy storage with guaranteed optimality,” 2020.

[119] R. D. Zimmerman, C. E. Murillo-Sánchez, and R. J. Thomas, “MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education,” IEEE Transactions on Power Systems, vol. 26, pp. 12–19, 2 2011.

[120] S. Hossain, “Visualization of bioinformatics data with Dash Bio,” Proceedings of the 18th Python in Science Conference, 2019.

[121] A. Ghosh, K. Deb, E. Goodman, and R. Averill, “An interactive knowledge-based multi-objective evolutionary algorithm framework for practical optimization problems,” 2022.