A VARIABLE-LENGTH MANY-OBJECTIVE OPTIMIZATION APPROACH IN IMAGE SEGMENTATION PROBLEMS

By

Xuhui Huang

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Electrical Engineering - Master of Science

2018

ABSTRACT

A VARIABLE-LENGTH MANY-OBJECTIVE OPTIMIZATION APPROACH IN IMAGE SEGMENTATION PROBLEMS

By

Xuhui Huang

Image segmentation is a technique for dividing an image space into a number of meaningful homogeneous regions. Various data clustering techniques have been adapted to solve segmentation problems. In particular, data clustering is often posed as a multi-objective optimization problem so that different characteristics of the data can be captured by different objectives simultaneously. Traditional multi-objective optimization methods often require prior knowledge or assumptions about the data, and their performance is poor when these assumptions do not hold. A further limitation of established multi-objective methods is their inadequacy in handling a large number of objectives. The nondominated sorting genetic algorithm III (NSGA-III) [1] was proposed to alleviate this issue; however, NSGA-III is inefficient at removing bad solutions in a high-dimensional search space during evolution. In this thesis, we propose a variable string length many-objective genetic algorithm (VMOGA) whose framework has evolved from NSGA-III and whose encoding strategy, genetic operator, and evolutionary operator have been redesigned. The performance of VMOGA on image segmentation problems is further enhanced by an appropriate selection of objectives. Finally, we conduct unsupervised segmentation with the proposed clustering technique on magnetic resonance images (MRI) of the human brain; comparisons with other evolutionary algorithms are presented, and the superiority of VMOGA is demonstrated quantitatively. VMOGA is also applied to the detection of the delamination area caused by fatigue loading in Mode I glass fiber reinforced polymer (GFRP) samples. Results are compared with the fast marching algorithm (FMA), and the superiority of VMOGA suggests potential future application in fatigue detection.

Copyright by
XUHUI HUANG
2018

ACKNOWLEDGEMENTS

I would first like to thank my advisor Yiming Deng at Michigan State University. The door to Prof. Deng's office was always open whenever I ran into a trouble spot or had a question about my research or writing. I would also like to thank Prof. Udpa and Prof. Ulusoy on my thesis defense committee; without their passionate participation and comments, the validation of this thesis could not have been successfully conducted. I would also like to acknowledge Dr. Banerjee as the provider of the data for this thesis, and I am gratefully indebted to her for her detailed explanation of and guidance on the data. Finally, I must express my very profound gratitude to my parents for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
1.1 Brief Introduction of Data Clustering
1.2 Multi-objective Optimization and Data Clustering
1.3 Concept of Multi-objective Optimization
1.4 Genetic Algorithms (GAs)
1.4.1 Multi-objective Genetic Algorithm
1.4.2 General Framework of MOGA
1.5 Evolving the Number of Clusters
1.6 Structure of This Paper
CHAPTER 2 ENCODING STRATEGY AND GENETIC OPERATOR
2.1 General Encoding Strategy
2.2 Binary Encoding Strategy and Genetic Operator
2.3 Real-coded Encoding Strategy
2.4 Simulation of SBC
2.5 Simulation of Real-coded Mutation
2.6 Handling of Variable Length Chromosome in VMOGA
CHAPTER 3 EVOLUTIONARY OPERATOR
3.1 Evolutionary Strategy for Low-dimensional Space
3.1.1 Non-dominated Sorting Algorithm
3.1.2 Diversity Preservation Algorithm
3.1.3 Weakness
3.2 Evolutionary Strategy for High-dimensional Space
3.2.1 ε-Non-dominated Sorting Algorithm
3.2.2 Reference Based Diversity Preservation Rule
3.2.3 Density Estimation
3.2.4 Niche-Preservation Operation
3.3 Evolutionary Operator for VMOGA
3.3.1 Tchebycheff Metric
3.3.2 A Rejection Operator
CHAPTER 4 OBJECTIVE FUNCTIONS
4.1 Introduction
4.2 Distance Measure
4.3 Membership Function
4.4 Refined Membership Function
4.5 Objective Functions
4.5.1 Fuzzy C-means (FCM)
4.5.2 Xie-Beni Index (XB)
4.5.3 Overall Cluster Deviation (DEV)
4.5.4 Fuzzy Separation (FESP)
4.5.5 Global Separation (SEP)
4.5.6 Average Between Group Sum of Squares (ABGSS)
4.5.7 Intra-cluster Entropy (H)
4.5.8 Davies-Bouldin Index (DB)
4.5.9 Connectedness
CHAPTER 5 EVALUATION OF OBJECTIVES
5.1 Multivariate Mutual Information
5.1.1 Mutual Information Among Two Variables
5.1.2 Mutual Information Among Multi Variables
5.2 Total Correlation
CHAPTER 6 IMPLEMENTATION
6.1 Minor Specification
6.1.1 Handling of Infinity Solution in Objective Space
6.1.2 Handling of Solutions with Same Objectives or Diversity
6.1.3 Handling of Solutions with Same Membership on Two Clusters
6.2 Parameter and Experiment Setting
6.2.1 Initialization and Objective Selection
6.2.2 Parameter Encoding and Genetic Operator
6.2.3 Obtaining Final Solution
6.3 Experiment on MRI Data
6.3.1 Introduction of MRI Data
6.3.2 Adjusted Rand Index (ARI)
6.3.3 Evaluation of Segmentation Results
6.3.4 Detection of White and Gray Matter
6.4 Implementation on Fatigue Area Detection
CHAPTER 7 CONCLUSION AND FUTURE WORK
BIBLIOGRAPHY

LIST OF TABLES

Table 3.1 Main loop of NSGA-II
Table 3.2 General procedure of Niche-Preservation
Table 4.1 Distance measurement
Table 6.1 Initialization parameter
Table 6.2 Choice of objectives for FCM, SO, NSGA-II, MOVGA, NSGA-III and VMOGA
Table 6.3 Truth table for MRI brain image
Table 6.4 Contingency table
Table 6.5 Results for Z10, Z72, Z108, Z120 and Z140 planes
Table 6.6 Gray and White Matter extraction results for normal brain images for Z72, Z80, Z100, Z108 and Z120 planes

LIST OF FIGURES

Figure 1.1 General framework of MOGA
Figure 2.1 Conducting binary-coded crossover and mutation
Figure 2.2 Real-coded encoding strategy for crossover
Figure 2.3 Real-coded encoding strategy for mutation
Figure 2.4 Child population distribution after 10000 iterations given η_c = 5, 10, 15, 20 for crossover
Figure 2.5 Child population distribution after 50 iterations given η_c = 1 and 20 for crossover respectively
Figure 2.6 Child population distribution after 100, 500, 1000, 10000 iterations given η_c = 2 for crossover
Figure 2.7 Child population distribution after 10000 iterations given different d for mutation
Figure 2.8 Child population distribution after 50 iterations given d = 0.2 and 5 for mutation respectively
Figure 2.9 Child population distribution after 100, 500, 1000, 10000 iterations given d = 1 for mutation
Figure 2.10 Genetic operator handling variable K by random selection from gene pool
Figure 2.11 Genetic operator handling variable K by exchanging gene segments
Figure 2.12 Child population distribution by VMOGA genetic operator given 10000 iterations for std = 1, 2, 3, 4
Figure 3.1 Framework of conducting non-dominated sorting and generating solution front
Figure 3.2 Framework of conducting diversity preservation in NSGA-II
Figure 3.3 Non-dominated solutions proportion versus number of objectives
Figure 3.4 Framework of implementing niche preservation in NSGA-III
Figure 3.5 Evolutionary operator in VMOGA
Figure 6.1 (a) Original MRI image in Z10 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.2 (a) Original MRI image in Z72 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.3 (a) Original MRI image in Z108 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.4 (a) Original MRI image in Z120 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.5 (a) Original MRI image in Z140 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.6 (a) Gray Matter ground truth (b) Gray Matter segmentation produced by VMOGA on Z108 plane (c) White Matter ground truth (d) White Matter segmentation produced by VMOGA on Z108 plane
Figure 6.7 (a) Gray Matter ground truth (b) Gray Matter extraction produced by VMOGA+Extraction on Z108 plane (c) White Matter ground truth (d) White Matter segmentation produced by VMOGA-based extraction on Z108 plane
Figure 6.8 (a) Gray Matter ground truth (b) Gray Matter segmentation produced by VMOGA on Z120 plane (c) White Matter ground truth (d) White Matter segmentation produced by VMOGA on Z120 plane
Figure 6.9 (a) Gray Matter ground truth (b) Gray Matter extraction produced by VMOGA+Extraction on Z120 plane (c) White Matter ground truth (d) White Matter segmentation produced by VMOGA-based extraction on Z120 plane
Figure 6.10 Healthy sample being subjected to Mode I cyclic loading after (a) 20K cycles (b) 40K cycles (c) 60K cycles (d) 80K cycles (e) 100K cycles (f) 120K cycles (g) 140K cycles (h) 160K cycles
Figure 6.11 PZT cutting image of healthy sample being subjected to Mode I cyclic loading after (a) 20K cycles (b) 40K cycles (c) 60K cycles (d) 80K cycles (e) 100K cycles (f) 120K cycles (g) 140K cycles (h) 160K cycles
Figure 6.12 Edge effect and real delamination area in PZT cutting image of healthy sample after 160K cycles
Figure 6.13 Healthy sample being subjected to Mode I cyclic loading after 20K, 40K, 60K and 80K cycles
Figure 6.14 Healthy sample being subjected to Mode I cyclic loading after 100K, 120K, 140K and 160K cycles
Figure 6.15 Plot of number of load cycles versus delamination area detected by VMOGA+Extraction and FMM

CHAPTER 1
INTRODUCTION

1.1 Brief Introduction of Data Clustering

Data clustering has recently emerged as one of the fundamental techniques for data analysis. One of the primary applications of data clustering is image segmentation. The problem of image segmentation is usually considered as clustering image pixels in the intensity space [2] [1]; geographic distribution is sometimes also incorporated to enhance homogeneity. Real-life applications of image segmentation include segmentation of remote sensing satellite images [3], magnetic resonance imaging (MRI) medical imagery of the human body [4], and texture images [5]. Other applications such as social network analysis [6], time series data analysis [7], and web mining [8] involve more sophisticated data.

The purpose of clustering is to partition a given dataset into homogeneous groups. Given a dataset X = {x_1, x_2, ..., x_n} of size n in a multi-dimensional space, the goal of clustering is to produce a partition matrix U = {u_{k,j}}, where the value of u_{k,j} is the membership, or belongingness, of data point x_j to cluster C_k. Generally speaking, if u_{k,j} = max_{i∈[1,K]} {u_{i,j}}, then x_j ∈ C_k. Both crisp and fuzzy partition matrices exist in the literature, differing in how u_{k,j} is defined: for a crisp partition matrix, u_{k,j} is either 1 or 0, while belongingness ranges from 0 to 1 in a fuzzy matrix [9]. It is worth noting that fuzzy clustering is better at handling overlapping and noisy clusters.
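As a toy illustration of the two kinds of partition matrix (our own example, not from the thesis): for four points and K = 2 clusters, a crisp matrix and a fuzzy one might look as follows, with each column summing to 1:

```python
import numpy as np

# columns = data points x_1..x_4, rows = clusters C_1, C_2
U_crisp = np.array([[1, 1, 0, 0],
                    [0, 0, 1, 1]])          # u_{k,j} in {0, 1}
U_fuzzy = np.array([[0.9, 0.8, 0.3, 0.1],
                    [0.1, 0.2, 0.7, 0.9]])  # u_{k,j} in [0, 1]

labels = U_fuzzy.argmax(axis=0)             # x_j goes to its max-membership cluster
print(labels)                               # [0 0 1 1]
```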
1.2 Multi-objective Optimization and Data Clustering

The performance of a clustering solution is often evaluated by validity indices. Clustering can therefore be posed as an optimization problem by directly optimizing a validity index [5] [6] [10]. The principle underlying this strategy is to consider the partition matrix U as equivalent to the optimization search space. The search space is composed of all possible clustering results, denoted Z; notice that the mapping between U and Z is unique and bilateral. On the other hand, there is a strong connection between the performance of a clustering result and a validity index, because a validity index often represents properties of a cluster such as homogeneity, separation, and compactness. In fact, when data is classified into reasonable or suitable groups, the validity index is often optimized, and vice versa. Further analysis of this transformation requires a thorough study of the cluster properties reflected by a validity index; hence the selection of validity indices is important.

There are some evident limitations to the single-objective framework [11]. If only one validity index is optimized, the characteristics of the data are unlikely to be fully considered, which adds complexity in discriminating two data groups with similar features. It is also extremely difficult to choose an objective with no prior knowledge about the dataset; if a wrong assumption is made at the beginning, the clustering results will be corrupted [12]. Finally, it is impossible to implement a tradeoff between objectives with single-objective optimization. At one time, in order to solve this issue, single-objective frameworks incorporating multiple objectives were adopted, either by ordering the objectives according to importance or by using a weight vector to convert them into a single objective [13]. However, the relative significance of the various validity indices is unknown in real-life applications, since the data structure usually has not been specified. Therefore, optimizing multiple objectives simultaneously is demanded by the situation.

1.3 Concept of Multi-objective Optimization

In order to achieve higher accuracy, our framework must optimize many objectives simultaneously. Multi-objective optimization can be formally stated as

\min_{x \in X} \left( f_1(x), f_2(x), \ldots, f_K(x) \right),

where the f_i denote objective functions and X is the feasible region. Maximization can be expressed as \max_{x \in X} f_i(x) = -\min_{x \in X} \{-f_i(x)\}. The solution to this problem is not unique; however, good solutions can be extracted by some standard. The most important notion here is Pareto optimality [09]: a solution x* ∈ X is Pareto optimal if there exists no x ∈ X such that f_i(x) ≤ f_i(x*) for all i ∈ {1, 2, ..., K} and f_i(x) < f_i(x*) for at least one i. Similarly, the non-domination rule can be written (in minimization style): if f_i(x) ≤ f_i(y) for all i ∈ {1, 2, ..., K} and f_i(x) < f_i(y) for at least one i, then x dominates y.
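The domination rule translates directly into code. Below is a minimal sketch (our own illustration; the toy objective vectors are invented) that tests dominance between two objective vectors and filters a small population down to its non-dominated solutions:

```python
def dominates(f, g):
    """True if objective vector f dominates g (minimization)."""
    return all(fi <= gi for fi, gi in zip(f, g)) and any(fi < gi for fi, gi in zip(f, g))

def pareto_front(objs):
    """Keep the solutions that no other solution dominates."""
    return [f for f in objs if not any(dominates(g, f) for g in objs if g is not f)]

objs = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(pareto_front(objs))   # (3.0, 3.0) drops out: it is dominated by (2.0, 2.0)
```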
1.4 Genetic Algorithms (GAs)

One of the most efficient approaches for finding Pareto optimal solutions in multi-objective optimization problems is Genetic Algorithms (GAs) [14], proposed by John Holland. GAs have proven to be among the most powerful techniques for searching large spaces.

Inspired by the theory of evolution developed by Charles Darwin, GAs adopt similar concepts such as parent, child, population selection, mutation, and crossover. Initialization in GAs encodes solutions (variables) into strings called chromosomes, so that the algorithm operates on chromosomes instead of variables. GAs conduct the search from multiple points instead of a single point; therefore, GAs are less likely to get trapped in a local optimum than other search algorithms. Each solution is given a fitness value and is evaluated based on this fitness value, which measures the goodness of the solution encoded in the chromosome. In this article, the fitness value is equivalent to the objective value. Thereafter, chromosomes are selected based on fitness value; those with better fitness are given a greater chance of being selected and of reproducing other chromosomes. Potentially better solutions are reproduced from the selected chromosomes by applying operators such as mutation and crossover. The procedure that retains the best chromosome and replaces the worst during each generation is called elitism, and it guarantees fitness no worse than the previous generation. Models built upon these notions have developed rapidly in the last decade: the Strength Pareto Evolutionary Algorithm (SPEA) [15] and [16], the Pareto Envelope-Based Selection Algorithm (PESA) [17] and PESA-II [18], and the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) [19] were proposed and instantly became popular in applications of multi-objective evolutionary algorithms to clustering problems.

1.4.1 Multi-objective Genetic Algorithm

In recent studies, multi-objective genetic algorithm (MOGA) based methods have frequently been adopted in unsupervised image segmentation and in medical and geographical detection problems. NSGA-II has been one of the most widely used MOGA-based methods since its proposal. However, the procedures within the NSGA-II framework that benefit its performance on two-objective optimization problems have prevented it from achieving decent results on many-objective problems (with more than three objectives). NSGA-III was proposed to alleviate this issue and, as an evolutionary many-objective algorithm, instantly became popular. As a reference-point-based search algorithm, its solutions in each iteration are encouraged to lie close to vacant reference lines, thus improving coverage of the search space.

1.4.2 General Framework of MOGA

Existing MOGA techniques mainly differ in five aspects [20]. First of all, data clustering is posed as an optimization problem by optimizing some validity indices (objective functions); therefore, the selection of objective functions must be done in the first place. Then, each cluster solution is represented in the form of a chromosome. Encoding operators differ in terms of representation strategy; among them, label representation and coordinate representation, from label-based encoding and prototype-based encoding respectively, are the most commonly applied. The cluster solutions are initialized as the parent solution population in this stage.
In the next stage, the genetic operator produces the child solution population. The genetic operator is highly related to the encoding operator, so they are often discussed together. The key aspect of MOGA is the evolutionary operator, which makes a selection on the combined parent-child population and generates the solution population for the next generation. In some articles, the genetic operator and the evolutionary operator are bound together as a single evolutionary operator; here, we want to stress the importance of the selection strategy in evolution, so the evolutionary operator is discussed separately. In the end, a single clustering solution must be obtained from the last generation. The general framework of MOGA is illustrated in Fig. 1.1.

Figure 1.1 General framework of MOGA
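The stages above can be condensed into a short, runnable skeleton. The following is our own toy illustration of the generic MOGA loop, not the thesis implementation: the population is one real variable per solution, the two objectives and both operators are crude stand-ins for the operators developed in Chapters 2 and 3:

```python
import random

def evaluate(x):                      # two toy objectives, both minimized
    return (x ** 2, (x - 2.0) ** 2)

def dominates(f, g):
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def environmental_selection(pop, n):
    """Crude stand-in for Chapter 3: keep non-dominated solutions first."""
    objs = [evaluate(x) for x in pop]
    idx = list(range(len(pop)))
    front = [i for i in idx if not any(dominates(objs[j], objs[i]) for j in idx)]
    rest = [i for i in idx if i not in front]
    return [pop[i] for i in (front + rest)[:n]]

def genetic_operator(parents):
    """Crude stand-in for Chapter 2: blend two parents, then perturb."""
    children = []
    for _ in range(len(parents)):
        a, b = random.sample(parents, 2)
        children.append(0.5 * (a + b) + random.gauss(0.0, 0.1))
    return children

random.seed(0)
pop = [random.uniform(-4.0, 6.0) for _ in range(20)]
for _ in range(50):                   # the MOGA main loop sketched above
    pop = environmental_selection(pop + genetic_operator(pop), 20)
print(sorted(round(x, 2) for x in pop))   # drifts toward the Pareto set [0, 2]
```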
1.5 Evolving the Number of Clusters

Conventional MOGA methods require predetermination of the number of clusters K. In fact, only an appropriate choice of K yields acceptable clustering results. In this article, the choice of the number of clusters is tackled by evolving K automatically during iteration, using variable string-length encoding in GAs. However, previously proposed encoding strategies and evolutionary operators [21] are inefficient at generating K, since they ignore the inherent distribution of binary mutation and crossover. In this paper, we examine the child population distributions produced by different evolutionary operators; the performance of selecting K is also evaluated, and the superiority of the proposed evolutionary strategy is demonstrated.

1.6 Structure of This Paper

This paper is organized as follows. In Section 2, we introduce methods of constructing the encoding and genetic operators; inspired by the philosophy underlying conventional operators, we design a new strategy for VMOGA that is capable of handling a variable number of clusters. In Section 3, the evolutionary operator is introduced with respect to NSGA-II and NSGA-III; weaknesses of the existing strategies are discussed, and a selection scheme that removes "bad" solutions is proposed. In Section 4, conventional objective functions are first introduced, and refined objective functions are then constructed to better target image segmentation problems. In Section 5, we discuss the choice of objective functions, which are selected based on prior knowledge and the supplementary information provided by total correlation. In Section 6, implementation details of the proposed algorithm are presented. For MRI human brain images, evaluation of the segmentation results, which demonstrates the superiority of the proposed algorithm, is accomplished by comparing the adjusted Rand index (ARI) and accuracy (ACC) against the corresponding truth table. Results of the experiment on detecting delamination areas are also discussed.

CHAPTER 2
ENCODING STRATEGY AND GENETIC OPERATOR

2.1 General Encoding Strategy

An encoding strategy is adopted to represent a clustering solution. At first glance, clustering solutions can be represented by the partition matrix. This representation scheme provides full coverage of the entire solution space, and it is convenient for defining extra constraints during mutation or crossover. However, as the number of data points in the clustering problem increases, the complexity of direct encoding surges exponentially; even for a mid-sized clustering problem, the scale of this representation becomes too large to handle. Consequently, reduction of the search space is vital for encoding and searching. Previous research has chosen a more indirect approach: assigning each point to its closest cluster, representing each cluster by its centroid, and encoding the coordinates of the centroids. Hence each representation has length K × d, where d is the dimension of the data and K denotes the number of clusters; it is written as [Z_1, Z_2, ..., Z_K], where Z_i denotes the center of a cluster and is a d × 1 vector. In image segmentation, d = 3. Notice that K is a variable in VMOGA. Centroid-based encoding strategies produce a considerably smaller representation and reduce complexity significantly.

During initialization, a set of initial solutions is encoded in the form of chromosomes. These solution chromosomes are then manipulated by the genetic operator to produce the child solution population; the procedures that produce the child population are called mutation and crossover. During each generation, effective population maintenance strategies are also adopted to keep track of the current non-dominated front.

2.2 Binary Encoding Strategy and Genetic Operator

The solution population is encoded in the form of binary strings by the binary encoding strategy [22]. Mutation and crossover are performed on a binary string as illustrated in Fig. 2.1. One application of binary-coded GAs is feature selection: solution chromosomes are encoded as binary strings whose length is the total number of features. Bits '1' and '0' represent a selected and an ignored feature respectively, so every solution denotes a subset of features. In the centroid-based encoding framework, the positions of the cluster centers are encoded in the form of a binary string.

However, binary-coded GAs have some weaknesses in real-life implementations. The biggest one is that the Hamming cliff is likely to be triggered during the search: when searching the neighborhood of a particular solution, one might need to change many bits, e.g. 01111 → 10000, which greatly inconveniences gradual search in a continuous search space. This is inherently caused by the uneven distribution of importance among the different bit positions of a binary string. Moreover, it is difficult to achieve arbitrary precision, since a fixed string length limits the precision of the solution, and the appropriate length of the string is often not known.
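The Hamming cliff is easy to see numerically. The toy sketch below (our own illustration) decodes 5-bit strings to integers and shows that the adjacent values 15 and 16 sit a maximal Hamming distance apart:

```python
def hamming(a, b):
    """Number of differing bit positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

a, b = "01111", "10000"          # decode to 15 and 16: neighbors on the real line
print(int(a, 2), int(b, 2))      # 15 16
print(hamming(a, b))             # 5 -- every bit must flip to take one small step
```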
2.3 Real-coded Encoding Strategy

In order to solve these issues, real-coded GAs are adopted, in which population solutions do not need to be converted into binary strings. Real-coded GAs are simple and straightforward. The key to constructing a real-coded scheme is to imitate the child population distribution that results from the binary-coded scheme.

Figure 2.1 Conducting binary-coded crossover and mutation
Figure 2.2 Real-coded encoding strategy for crossover
Figure 2.3 Real-coded encoding strategy for mutation

2.4 Simulation of SBC

As shown above, the distribution of the child population is produced by repeating the binary-coded crossover and mutation processes. These distributions serve as references for constructing the real-coded scheme. Simulated binary crossover (SBC), proposed by [23], is widely used.

Notice that the probability of conducting crossover, the number of bits changed during one crossover, and the position of the crossover in the binary scheme all affect the distribution of the child population. Generally speaking, several properties of binary-coded crossover are important for estimating this distribution. First of all, symmetry plays a significant role: P_1 + P_2 = C_1 + C_2, which means we place equal importance on P_1 and P_2. Secondly, every bit position has the same probability of being selected as the crossover point; crossover at the lower bits and the higher bits results in small and large changes respectively. Finally, the child population is likely to be close to the parent population. SBC adopts the probability density function of the child distribution and simulates single-point crossover in binary-coded GAs (multiple-point crossover is a superposition of single-point crossovers). Given parents P_1 and P_2 ready for crossover,

C_1 = \frac{P_1 + P_2}{2} + \frac{\beta (P_2 - P_1)}{2}; \quad C_2 = \frac{P_1 + P_2}{2} - \frac{\beta (P_2 - P_1)}{2},

where β functions as a spreading factor: C_1 and C_2 are spread outward when β > 1 and inward when β < 1. The probability distribution of β should be similar to the probability distribution of the spread factor in binary-coded crossover. The probability density function (PDF) of β is defined as

f(\beta) = \begin{cases} 0.5(n+1)\,\beta^{n}, & \beta \le 1 \\ 0.5(n+1)/\beta^{n+2}, & \beta > 1 \end{cases} \quad (2.1)

with n = 2 to 5 mostly. Another refined version is given by [23]:

C_1 = \frac{(1+\beta)P_1}{2} + \frac{(1-\beta)P_2}{2}; \quad C_2 = \frac{(1-\beta)P_1}{2} + \frac{(1+\beta)P_2}{2}, \quad (2.2)

where β can be generated each time by

\beta = \begin{cases} (2u)^{1/(\eta_c+1)}, & u \le 0.5 \\ \left( \frac{1}{2(1-u)} \right)^{1/(\eta_c+1)}, & u > 0.5 \end{cases} \quad (2.3)

Here u is a uniform random number in [0, 1], and the distribution index η_c is a variable that controls the concentration of the distribution. As shown in Fig. 2.4, a larger η_c tends to generate children closer to the parents, while a small η_c allows children to be far from the parents. On the other hand, if η_c is comparatively small, the distribution converges slowly; in extreme cases where a large number of repetitions is unreachable, the distribution of the child population is no longer guaranteed to have a higher PDF around the parent population. This is illustrated in Fig. 2.5.

Figure 2.4 Child population distribution after 10000 iterations given η_c = 5, 10, 15, 20 for crossover
Figure 2.5 Child population distribution after 50 iterations given η_c = 1 and 20 for crossover respectively
Figure 2.6 Child population distribution after 100, 500, 1000, 10000 iterations given η_c = 2 for crossover

We see that the distribution of the child population gradually converges to a smooth curve as the number of simulations increases. Some valuable observations from the simulation need to be discussed. First of all, η_c can be considered a spread factor in constructing the crossover operator. Secondly, if η_c and the number of iterations are small, there will be some uncertainty in the children's distribution; therefore η_c should not be too small. On the other hand, the child population should be encouraged to spread out during the early search period, because parent solutions in this period are usually far away from the Pareto optimal front, and the search area should be large enough to reach out for better solutions. Lastly, the evolutionary operator should conduct its search in the neighborhood of the parent population as the iterations approach termination; this is largely because, in the later period, the parent population solutions are assumed to lie in the vicinity of the Pareto optimal front. In summary, a large η_c fails to satisfy our requirement in the early search stage, while a small η_c can result in wasteful search. As a result, η_c should be a function of the iteration count, with its value increasing as the iterations proceed.
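A minimal sketch of this crossover, assuming the form of Eqs. (2.2)-(2.3) above (our own illustration, not the thesis code):

```python
import random

def sbx(p1, p2, eta_c):
    """Simulated binary crossover on one real variable, per Eqs. (2.2)-(2.3)."""
    u = random.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta_c + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1.0))
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2

# A larger eta_c keeps children near the parents; compare eta_c = 2 vs. 20.
random.seed(0)
print([tuple(round(c, 3) for c in sbx(1.0, 3.0, 20)) for _ in range(3)])
```

Note that c1 + c2 = p1 + p2 for every β, which is exactly the symmetry property discussed above.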
2.5 Simulation of Real-coded Mutation

The design of the mutation operator is quite similar. There are two types of mutation: uniform mutation and non-uniform mutation. Uniform (random) mutation generates a solution randomly distributed within a vicinity of the original solution. Customarily, non-uniform mutation is adopted due to its preference for the original solution. There are several demands on the selection of the mutation operator. First of all, we wish mutated solutions to be distributed toward the parent population. Secondly, as the generation number increases, mutated solutions should be generated closer to the original solution. Finally, it is obvious that all mutated solutions should be feasible. The mutation operator proposed by [20] is given as

C = P + 0.5\,\tau\,\Delta \left( 1 - r^{\,(1 - t/t_{max})^{d}} \right), \quad (2.4)

where Δ is the maximum mutation step within the constraints, and τ is a direction index: τ = 1 and −1 indicate mutation toward the upper and lower bound respectively. The mutation step is also affected by r, which is randomly selected from 0 to 1; t and t_max denote the number of mutations so far and the maximum number of mutations respectively; and d is a variable that controls the concentration of the child population distribution after mutation. Usually, a large d tends to generate children closer to the parents, while a small d allows children to be far from the parents. From the convergence perspective, if d is comparatively small, the distribution obtained from an insignificant number of trials has some randomness; in other words, it is not guaranteed that the child population will be more likely to lie near the parent under insufficient simulation. On the other hand, if d is large enough, insufficient simulation is not a problem.

Figure 2.7 Child population distribution after 10000 iterations given different d for mutation
Figure 2.8 Child population distribution after 50 iterations given d = 0.2 and 5 for mutation respectively
Figure 2.9 Child population distribution after 100, 500, 1000, 10000 iterations given d = 1 for mutation

As the number of simulations increases, the distribution of the child population gradually approaches a smooth curve. Some valuable observations from the simulation again need to be discussed. First of all, d can be considered a spread factor in constructing the mutation operator. Secondly, if d and the number of iterations are small, there will be some uncertainty in the children's distribution; therefore d should not be too small. On the other hand, the child population should be encouraged to spread out during the early search period, because parent solutions in this period are usually far away from the Pareto optimal front, and the search area should be large enough to reach out for better solutions. Lastly, the evolutionary operator should conduct its search in the neighborhood of the parent population toward the end of the run; this is largely because, in the later period, the parent population solutions are assumed to lie in the vicinity of the Pareto optimal front. In summary, a large d fails to satisfy our requirement in the early search stage, while a small d can result in wasteful search. As a result, d should be a function of the iteration count, with its value increasing as the iterations proceed.
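A sketch of this non-uniform mutation, assuming the reading of Eq. (2.4) above (our own illustration; variable names are ours):

```python
import random

def nonuniform_mutation(p, delta, t, t_max, d):
    """Non-uniform mutation per Eq. (2.4): late generations take smaller steps."""
    tau = random.choice([1.0, -1.0])      # mutate toward upper or lower bound
    r = random.random()
    step = 0.5 * tau * delta * (1.0 - r ** ((1.0 - t / t_max) ** d))
    return p + step

random.seed(1)
early = [round(nonuniform_mutation(0.0, 2.0, t=10, t_max=100, d=1.0), 3) for _ in range(4)]
late  = [round(nonuniform_mutation(0.0, 2.0, t=95, t_max=100, d=1.0), 3) for _ in range(4)]
print(early, late)   # late-generation steps cluster much closer to the parent
```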
2.6 Handling of Variable Length Chromosome in VMOGA

One operator proposed by [21] is capable of performing crossover when K is a variable. Assume parent solutions P_1 and P_2 encode K_1 and K_2 cluster centers, and let K_max and K_min denote the upper and lower bounds respectively. Considering that data is always classified into at least 2 groups, K_min = 2. During iteration, the crossover operator randomly selects an integer from max(2, K_1 + K_2 − K_max) to min(K_max, K_1 + K_2 − 2) as the variable length for the child population. The procedure is illustrated in Fig. 2.10.

Figure 2.10 Genetic operator handling variable K by random selection from gene pool

Another crossover operation, proposed by [4], ensures that the exchange of solution information is conducted smoothly. Similarly, assume parent solutions P_1 and P_2 encode K_1 and K_2 cluster centers, and let λ_1 and λ_2 denote the locations of the crossover points in P_1 and P_2 respectively. λ_1 is generated by a random integer selection from 2 to K_1. λ_2 is then generated by a random integer selection from a lower bound \underline{\lambda}_2 to an upper bound \bar{\lambda}_2. If \underline{\lambda}_2 \le \bar{\lambda}_2, the bounds are given by

\underline{\lambda}_2 = \min\left[ 2,\; \max\left( 0,\; 2 - (K_1 - \lambda_1) \right) \right] \quad (2.5)

\bar{\lambda}_2 = K_2 - \max\left[ 0,\; 2 - \lambda_1 \right] \quad (2.6)

Otherwise λ_2 = 0. This is illustrated in Fig. 2.11.

Figure 2.11 Genetic operator handling variable K by exchanging gene segments
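A sketch of the first operator above (the gene-pool crossover of [21]) is given below; this is our own illustration, with centers represented as plain tuples:

```python
import random

def gene_pool_crossover(p1, p2, k_max=10, k_min=2):
    """Variable-K crossover: draw the child's length from the legal range,
    then pick that many centers at random from the pooled parent genes."""
    pool = p1 + p2
    k1, k2 = len(p1), len(p2)
    lo = max(k_min, k1 + k2 - k_max)
    hi = min(k_max, k1 + k2 - k_min)
    k_child = random.randint(lo, hi)
    return random.sample(pool, k_child)

random.seed(2)
p1 = [(0.1, 0.2, 0.3), (0.5, 0.5, 0.5), (0.9, 0.8, 0.7)]   # K1 = 3 centers
p2 = [(0.2, 0.9, 0.4), (0.7, 0.1, 0.6)]                     # K2 = 2 centers
child = gene_pool_crossover(p1, p2)
print(len(child), child)
```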
Despite being easy to implement, a genetic operator that handles variable K by random selection from the gene pool does not follow the binary crossover distribution: K'_1 and K'_2 are randomly selected from the interval with lower bound max(2, K_1 + K_2 − K_max) and upper bound min(K_max, K_1 + K_2 − 2) respectively, so the probability of a child is, to some extent, not higher when it is close to the parent population. The genetic operator that handles K by exchanging gene segments has similar limitations; moreover, it is unable to generate solutions with lengths different from those present after initialization.

In order to solve these issues, the genetic operator in VMOGA is introduced. First of all, to achieve a similar distribution when K is a variable, a similarity index is proposed to measure the similarity between child and parent populations:

S(P, C) = \begin{cases} \dfrac{|C|}{|P|}, & \text{if adding genes} \\[4pt] \dfrac{|P \cap C|}{|P|}, & \text{if losing genes} \end{cases} \quad (2.7)

Then, motivated by the parameter settings in the simulated crossover and mutation distributions, a spreading factor is generated from a normal distribution N(µ, σ²), with µ determined by the parent population and σ adopting the same strategy as η_c and d. In the simulation, the select operator conducts a random selection according to N(µ, σ²): C_1 = P_1 ∪ select(P_2) or C_1 = select(P_1), each with a 50 percent chance, representing expansion and contraction respectively. The results of the simulation are illustrated in Fig. 2.12.

Figure 2.12 Child population distribution by VMOGA genetic operator given 10000 iterations for std = 1, 2, 3, 4
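The following sketch is our loose reading of the VMOGA operator just described; the exact sampling details are not fully specified above, so every concrete choice here (drawing the child length from a normal distribution centered on the parent length, borrowing genes from the other parent on expansion) should be treated as an assumption:

```python
import random

def vmoga_crossover(p1, p2, sigma=2.0, k_min=2, k_max=10):
    """Sketch of VMOGA variable-length variation: the child's length is drawn
    from N(len(p1), sigma^2); genes are added from p2 or dropped from p1."""
    k_child = round(random.gauss(len(p1), sigma))
    k_child = max(k_min, min(k_max, k_child))
    if k_child > len(p1):                        # expansion: borrow genes from p2
        extra = random.sample(p2, min(k_child - len(p1), len(p2)))
        return p1 + extra
    return random.sample(p1, k_child)            # contraction: keep a subset of p1

random.seed(3)
p1 = [(0.1, 0.2, 0.3), (0.5, 0.5, 0.5), (0.9, 0.8, 0.7)]
p2 = [(0.2, 0.9, 0.4), (0.7, 0.1, 0.6)]
print(vmoga_crossover(p1, p2))
```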
CHAPTER 3
EVOLUTIONARY OPERATOR

3.1 Evolutionary Strategy for Low-dimensional Space

Various strategies exist in the literature for searching in low-dimensional space. The most widely used technique is NSGA-II. NSGA-II conducts non-dominated sorting based on fitness (objective) values while preserving diversity among solutions. The scheme for preserving diversity adopts a crowding comparison operator: in general, the operator prefers solutions with lower rank (sorted by non-domination), and if two solutions belong to the same rank, the solution located in the less crowded region is preferred. NSGA-III is a reference-point-based algorithm that adopts a framework similar to NSGA-II.

3.1.1 Non-dominated Sorting Algorithm

The concepts of domination and the Pareto front were given in the introduction. Here, we will see how these concepts are applied in evaluating the solution population. For \min_{x \in X} (f_1(x), f_2(x), \ldots, f_K(x)), let \vec{f} = (f_1, f_2, \ldots, f_K) denote the objective vector, and let x_1, x_2, \ldots, x_N be N solutions. The basic idea is to compare every \vec{f}(x_i) with \vec{f}(x_1), \ldots, \vec{f}(x_{i-1}), \vec{f}(x_{i+1}), \ldots, \vec{f}(x_N); pairwise comparison of all solutions brings a computational complexity of O(KN²). In order to find the first non-dominated front, each solution is assigned two labels: the domination count DC(x_i) and the dominated count DDC(x_i), representing the number of solutions that x_i dominates and the number of solutions by which x_i is dominated. The only requirement for x_i to enter the first front is DDC(x_i) = 0, which means x_i is dominated by no one; hence no solution in the first front is dominated. In order to find the solutions at the next level, the solutions of the first front are temporarily discounted, and the non-dominated solutions among the remaining ones are marked as the second front; in practice, this is done with the help of the stored domination relations, decrementing the dominated count of every solution dominated by a first-front member. The procedure above is repeated until all solutions are ranked, and it is shown in Fig. 3.1.

Figure 3.1 Framework of conducting non-dominated sorting and generating solution front

3.1.2 Diversity Preservation Algorithm

In NSGA-II, convergence of the solutions to the Pareto-optimal set is emphasized. This requires the evolutionary operator to maintain a good spread of solutions; in other words, our evolutionary operator must estimate the density of solutions surrounding a particular solution very quickly after non-domination sorting. Obviously, density estimation is restricted to a certain area, i.e. the neighborhood of a given point. However, it is very difficult to determine how large this neighborhood should be: if it is too small, there might be no points falling in the area for most solutions being examined, while it is computationally ineffective to construct a very large neighborhood. Moreover, density estimation is fundamentally unreliable in high-dimensional spaces. For relatively low-dimensional spaces, the crowding distance operator is proposed to handle this issue. The crowding distance requires sorting the solutions by each objective function value in ascending order. For each objective function f_i, the boundary solutions x_u = {x : f_i(x) = max f_i(x)} and x_l = {x : f_i(x) = min f_i(x)} are assigned an infinite value, which emphasizes the significance of preserving spread-out solutions; for the other solutions, the overall distance is calculated as the sum of the heights and widths of the enclosing cuboids. This is extremely useful in two-objective spaces, since only one number is used to describe density. On other occasions, where two or more features are required to depict density, the non-domination sorting rule can also be applied. In summary, the diversity-preserving operator gives every solution a "crowding estimation" DE and sorts the solutions in descending order; a larger DE(x_i) is preferred in diversity sorting, which indicates that solution x_i is located in a sparse area, and preserving x_i will guide the search toward a uniformly spread-out Pareto optimal front. The procedure for diversity preservation in NSGA-II is illustrated in Fig. 3.2.

Figure 3.2 Framework of conducting diversity preservation in NSGA-II

Table 3.1 Main loop of NSGA-II
Step 1: Initial parent solutions P_1 with population size N are created, and each individual is assigned its corresponding objective values.
Step 2: Create offspring solutions Q_1 of size N from P_1 by the genetic operator, and assign the corresponding objective values.
Step 3: Combine P_1 ∪ Q_1 = R_1. Sort R_1 by the non-domination rule to generate fronts F_1, F_2, ..., F_i, ....
Step 4: Select N individuals from R_1 to construct P_2: an l must exist satisfying |∪_{i=1}^{l-1} F_i| ≤ N and |∪_{i=1}^{l} F_i| ≥ N. The first l − 1 fronts F_1, ..., F_{l-1} are assigned to P_2, and the remaining N − |∪_{i=1}^{l-1} F_i| solutions are selected from F_l by the diversity-preserving operator: sorting the solutions in F_l by DE, the solution x with maximal DE(x) is selected first and assigned to P_2, then F_l = F_l \ x, and so on until |P_2| = N.
Step 5: Go to Step 2 until the iterations are over.

3.1.3 Weakness

When dealing with real-world problems, we often need to optimize four or more objectives. It is well known that with an increasing number of objectives, an increasing proportion of solutions become non-dominated. This is shown in Fig. 3.3.

Figure 3.3 Non-dominated solutions proportion versus number of objectives

Figure 3.3 shows the percentage of non-dominated solutions versus the number of objectives. For example, given |P_t| = N, |Q_t| = N and R_t = P_t ∪ Q_t, we have |R_t| = 2N. If the proportion of non-dominated solutions is more than 50%, then |F_1| > N, which suggests that non-domination sorting is not working in population selection. Consequently, all the selection pressure is placed on the diversity-preserving operator. Recall that it is computationally expensive to calculate diversity in high-dimensional space: the identification of neighbors required by the crowded-comparison operator brings a huge amount of calculation, and approximations made in density estimation to speed up the computation may cause an unacceptable distribution of solutions in the end. To make things even worse, the diversity-preserving operator will then sort N solutions in F_1 instead of the N − |∪_{i=1}^{l-1} F_i| solutions in F_l. In summary, the traditional evolutionary operator leaves no space for accommodating an adequate number of new solutions in the population and causes the diversity preservation operator to collapse during iteration.
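For concreteness, here is a compact, self-contained sketch of the NSGA-II selection described in Sections 3.1.1-3.1.2 (our own simplified illustration, not the reference implementation; it peels fronts by repeated scanning rather than by the book-keeping trick above):

```python
def dominates(f, g):
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def non_dominated_sort(objs):
    """Peel off fronts: F1 = solutions dominated by no one, then repeat."""
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def crowding_distance(objs, front):
    """Sum of normalized neighbor gaps per objective; boundaries get infinity."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[0])):
        order = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[order[0]][k], objs[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        for a, b, c in zip(order, order[1:], order[2:]):
            dist[b] += (objs[c][k] - objs[a][k]) / (hi - lo or 1.0)
    return dist

def nsga2_select(objs, n):
    """Fill the next population front by front; split the last front by crowding."""
    chosen = []
    for front in non_dominated_sort(objs):
        if len(chosen) + len(front) <= n:
            chosen += front
        else:
            dist = crowding_distance(objs, front)
            chosen += sorted(front, key=dist.get, reverse=True)[:n - len(chosen)]
            break
    return chosen

objs = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2.5, 2.5)]
print(nsga2_select(objs, 4))   # indices of the 4 survivors
```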
3.2 Evolutionary Strategy for High-dimensional Space

3.2.1 ε-Non-dominated Sorting Algorithm

In order to optimize many objectives simultaneously, a special domination rule was proposed by [24] by means of changing the dominance rule. In a minimization problem, ε-dominance can be expressed as: x ε-dominates y if f_i(x) + ε_i ≤ f_i(y) for all i ∈ {1, ..., K} and f_i(x) + ε_i < f_i(y) for at least one i. Another, multiplicative, way of conducting ε-dominance is quite similar: x ε-dominates y if f_i(x)(1 + ε_i) ≤ f_i(y) for all i and f_i(x)(1 + ε_i) < f_i(y) for at least one i. The concept of ε-Pareto optimality is given accordingly: x* ∈ X is ε-Pareto optimal if there exists no x ∈ X that ε-dominates x*. Notice that the ε_i must be predefined by some prior knowledge or universal rule. It has been proved that the size of the ε-Pareto optimal set is related to ε: assuming K objectives and 1 ≤ f_i(x) ≤ M for all i,

|F_{\varepsilon}| \le \left( \frac{\log M}{\log(1 + \varepsilon)} \right)^{K-1}. \quad (3.1)

In fact, many different concepts of ε-Pareto optimality and ε-dominance exist in the research literature. If we consider the performance of ε-non-dominated sorting on the objective space, the value of ε actually represents a relative tolerance or constraint on each objective. When ε → 0, ε-dominance places no extra constraint on the solution population; as the value of ε increases, the size of |F_ε| decreases. The idea behind this suggests a tradeoff between convergence speed and diversity.
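The additive rule above is a one-liner; the sketch below (our own illustration) shows that a larger ε demands a larger winning margin before one solution is allowed to dominate another:

```python
def eps_dominates(f, g, eps):
    """Additive epsilon-dominance (minimization): f epsilon-dominates g."""
    return (all(fi + e <= gi for fi, gi, e in zip(f, g, eps)) and
            any(fi + e < gi for fi, gi, e in zip(f, g, eps)))

f, g = (1.0, 2.0), (2.0, 3.0)
print(eps_dominates(f, g, eps=(0.5, 0.5)))   # True: f beats g by a wide margin
print(eps_dominates(f, g, eps=(1.5, 1.5)))   # False: the required margin is too strict
```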
3.2.2 Reference Based Diversity Preservation Rule

In NSGA-III, a predefined set of reference points is adopted to maintain diversity. In this paper, we use Das and Dennis's [021] systematic approach, which places points on a normalized hyperplane. Imagine a mapping from the objective space to a hyperplane; we now focus on the geometric characteristics of this hyperplane. The reference points are uniformly distributed, and the recombined population is mapped into this space. The number of reference points ρ_j ∈ ρ can be computed as \binom{d+P-1}{P}, where d and P denote the number of objectives and the number of divisions (along each objective) respectively. Similarly to before, we would like a solution to get a higher chance of being selected when it is located in a less crowded region. Crowdedness here is measured by counting the total number of solutions attached to each reference point, and a reliable measurement must be designed in order to provide a qualitative description of crowdedness.

3.2.3 Density Estimation

Firstly, in order to make the calculation reasonable, each objective function f_i(x) is normalized:

f'_i(x) = \frac{f_i(x) - \min f_i(x)}{\max f_i(x) - \min f_i(x)}, \quad x \in S_t.

Then, each reference point is associated with its closest solutions. Here, the distance between a reference point and a solution is measured by the projection onto the reference line: the reference line w_j is defined on the hyperplane by joining the reference point with the origin. The perpendicular distance of solution x to reference line w_j is

pd(x, w_j) = \left\| x - \frac{w_j^T x}{\|w_j\|^2} w_j \right\|. \quad (3.2)

Then we calculate

\delta(x_i, w_j) = \begin{cases} 1, & \text{if } pd(x_i, w_j) = \min_{w \in W} pd(x_i, w) \\ 0, & \text{otherwise} \end{cases} \quad (3.3)

where δ(x_i, w_j) = 1 represents that solution x_i is associated with reference line w_j. The number of solutions associated with each reference line w_j is counted as

\rho_j = \sum_{x \in X} \delta(x, w_j). \quad (3.4)

The count ρ_j for reference line w_j provides valuable information for estimating solution density, or crowdedness. Maximum diversity is then achieved by the niche-preservation operation.

3.2.4 Niche-Preservation Operation

We start by calculating, for each solution x, the niche count θ(x) of its associated reference line, and selecting

X' = \{ x : x \in X,\; \theta(x) = \min_{x} \theta(x) \}. \quad (3.5)

The least crowded reference lines are considered first: the solutions associated with these reference lines are selected randomly, and selection stops once the population size reaches N. Otherwise, after taking care of the solutions associated with these reference lines, the least crowded reference lines among the remaining ones are considered. The general procedure is written as Table 3.2.

Table 3.2 General procedure of Niche-Preservation
Step 1: Sort the combined solution population R_t (generation t) by the ε-non-domination rule, generating fronts F_1, F_2, ..., F_i, ....
Step 2: To select N individuals from R_t for P_{t+1}, an l must exist satisfying |∪_{i=1}^{l-1} F_i| ≤ N and |∪_{i=1}^{l} F_i| ≥ N. The first l − 1 fronts F_1, ..., F_{l-1} are assigned to P_{t+1}.
Step 3: The remaining N − |∪_{i=1}^{l-1} F_i| solutions from F_l are associated with reference points, and θ(x) is calculated for every solution x in F_l.
Step 4: Sort the solutions x ∈ F_l in order of θ(x); the solution with minimal θ(x) is selected first and assigned to P_{t+1}, then F_l = F_l \ x, and so on until |P_{t+1}| = N.

Figure 3.4 Framework of implementing niche preservation in NSGA-III
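A small sketch of the association step of Eqs. (3.2)-(3.4) (our own illustration using numpy; the reference lines here are hand-picked unit-simplex directions rather than the Das-Dennis construction):

```python
import numpy as np

def associate(F, W):
    """Associate each (normalized) objective vector with its nearest reference
    line by perpendicular distance, per Eqs. (3.2)-(3.4)."""
    # projection of f onto line w: (w.f / |w|^2) w ; pd = |f - projection|
    pd = np.array([[np.linalg.norm(f - (w @ f) / (w @ w) * w) for w in W]
                   for f in F])
    nearest = pd.argmin(axis=1)                      # delta(x, w) as an index
    niche = np.bincount(nearest, minlength=len(W))   # rho_j niche counts
    return nearest, niche

F = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.7]])   # normalized objectives
W = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])   # hand-picked reference lines
nearest, niche = associate(F, W)
print(nearest, niche)   # crowded lines show high niche counts
```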
3.3 Evolutionary Operator for VMOGA

3.3.1 Tchebycheff Metric

In practice, parameter setting in the ε-non-dominated sorting algorithm is difficult when prior knowledge about the data to be clustered is unknown. Usually ε_1 = ... = ε_d = ε, with d representing the number of objectives. A solution that achieves superb performance on one objective while behaving poorly on all the other objectives is still able to propagate to the next generation. For example, suppose the optimization problem is \min_{x \in X} (f_1(x), f_2(x), \ldots, f_d(x)) and there exists x_0 such that f_1(x_0) << f_1(x) for all x ∈ X \ x_0, while f_i(x_0) >> f_i(x) for all x ∈ X \ x_0 and i = 2, 3, ..., d. Even if ε is very large, still (1 + ε) f_1(x_0) < f_1(x) for all x ∈ X \ x_0; hence x_0 always has a great chance of being selected into the first non-dominated front. Generally speaking, this issue can be alleviated (not eliminated) by adopting ε-dominance with a large ε; however, a large ε will significantly decrease the size of the Pareto front, thus impairing diversity. The Tchebycheff metric offers us inspiration in terms of removing bad solutions from the population. Proposed by [002], the Tchebycheff metric is defined as

TCH(x, w, z^*) = \max_{i \in \{1, \ldots, d\}} w_i \left\| f_i(x) - z^*_i \right\|, \quad (3.6)

where z* denotes the projection of the ideal point on each objective axis, and the weight vector W = [w_1, ..., w_d] characterizes the significance of each objective. In order to reject "unacceptable" solutions, W could be set as w_i = 1, i = 1, ..., d. However, determining the ideal point is very hard, and the sophistication of determining z* prevents us from using this method directly. Furthermore, the order of priority in rejection should be considered: intuitively, solutions that deliver bad performance on d − 1 objectives ought to be removed first, then those bad on d − 2 objectives come next; finally, if the rejection quota has not been used up, solutions with only one corrupted objective should be deleted. If equal importance is placed upon the objectives, the analysis above illustrates that the Tchebycheff metric basically ignores this order of priority in rejection.

3.3.2 A Rejection Operator

Motivated by the Tchebycheff metric and aggregating techniques, a selection index V is proposed and defined as

V(x) = \sum_{i=1}^{d} w_i f^*_i(x), \quad (3.7)

where f*_i(x) denotes the normalized objective value of solution x. Notice that V can take advantage of the niche-preservation process, where normalization has already been done. ε-non-dominated sorting is still implemented in VMOGA after normalization of the objective values. Here, ε is determined by a previous experiment that illustrates the relationship between the non-dominated population proportion P(ε) and the value of ε; we need to ensure that

2N \cdot P(\varepsilon) < N. \quad (3.8)

As a supplement to non-dominated sorting, the rejection operator has a rejection rate of θ/(1+θ): the top θ/(1+θ) fraction of the population with the largest V(x) is rejected after the niche-preservation operation. In order to maintain the population, a new l must exist satisfying |∪_{i=1}^{l-1} F_i| ≤ N(1 + θ) while |∪_{i=1}^{l} F_i| ≥ N(1 + θ); N(1 + θ) − |∪_{i=1}^{l-1} F_i| solutions are selected by the reference-based diversity preservation operator, and then θ/(1+θ) of the combined solutions with population N(1 + θ) are rejected. The evolutionary operator in VMOGA is illustrated in Fig. 3.5.

Figure 3.5 Evolutionary operator in VMOGA
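A sketch of the rejection step (our own illustration of Eq. (3.7); the equal weights and the over-selection factor θ are toy values):

```python
import numpy as np

def reject_worst(F_norm, theta=0.25, w=None):
    """Keep N of the N*(1+theta) over-selected solutions by discarding the
    theta/(1+theta) fraction with the largest aggregate index V (Eq. 3.7)."""
    w = np.ones(F_norm.shape[1]) if w is None else w
    V = F_norm @ w                           # V(x) = sum_i w_i f*_i(x)
    n_keep = round(len(F_norm) / (1.0 + theta))
    return np.argsort(V)[:n_keep]            # indices of the surviving solutions

rng = np.random.default_rng(0)
F_norm = rng.random((10, 4))                 # 10 solutions, 4 normalized objectives
print(reject_worst(F_norm))                  # 8 survivors out of 10 when theta=0.25
```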
4.3 Membership Function

The membership function is vital in constructing objectives. In this paper, we wish to build a framework containing a set of universally applicable objective functions. Cluster validity indices, according to [022], are often selected as objective functions, and they depend heavily on the calculation of membership values. The membership function can be written as a matrix U whose values reflect the belongingness of each data point to the corresponding cluster; it is equivalent to a partition matrix. In image segmentation problems, the fuzzy membership value is calculated by

u_{i,j} = 1 / Σ_{k=1}^{K} ( D(z_i, x_j) / D(z_k, x_j) )^{2/(m−1)},   1 ≤ i ≤ K, 1 ≤ j ≤ n   (4.1)

where D(z_i, x_j) denotes the distance between cluster center z_i and pattern x_j. We have u_{k,j} ∈ [0, 1], and a simple calculation shows that

Σ_{k=1}^{K} u_{k,j} = 1,   Σ_{j=1}^{n} Σ_{k=1}^{K} u_{k,j} = n   (4.2)

Equation (4.2) shows that u_{k,j} reasonably reflects belongingness. It should be pointed out that D could be taken as D(1)(x_i, x_j) or D(2)(x_i, x_j) in the following studies, determined by practical demands: in image segmentation, D(1) and D(2) are adopted when the intensity distribution of pixels and the geographic information are considered, respectively. The fuzzy exponent m is usually set as 2, leading to 2/(m−1) = 2. During implementation, the centers encoded in a chromosome are first extracted (usually given by random generation) and then forwarded to the objective function.

4.4 Refined Membership Function

In practice, u_{k,j} is likely to be affected by noise and artifacts. In theory, we hope that a pixel or pattern gets a bigger chance of being classified into the same cluster as its neighboring pixels. As a result, a refined membership function was proposed by [024]. By evaluating the belongingness of the surrounding area, neighborhood information is incorporated into the refined membership value. Another advantage of the refined membership function is that ambiguity between adjacent patterns (i.e. overlapping) is considerably reduced. In order to construct the refined membership function, some preparations are needed. First of all, the diameter of the neighboring region is predefined (usually set as 3 or 5) and the crisp membership value is calculated via

g_{k,j} = 1 if u_{k,j} ≥ u_{l,j}, l = 1, 2, ..., K;   0 otherwise   (4.3)

Secondly, an estimation of the partition results of the neighboring region is given by

h_{k,j} = Σ_{j∈neighborhood} g_{k,j}   (4.4)

that is, h_{k,j} is defined as the summation of the crisp membership values g_{k,j} in a given neighboring region. In the end, the refined membership function is constructed as the product of h_{k,j} and u_{k,j} with normalization:

u*_{k,j} = u_{k,j} h_{k,j} / Σ_{k=1}^{K} u_{k,j} h_{k,j},   1 ≤ k ≤ K, 1 ≤ j ≤ n   (4.5)

Similarly, u*_{k,j} ∈ [0, 1], and a simple calculation shows that

Σ_{k=1}^{K} u*_{k,j} = 1,   Σ_{j=1}^{n} Σ_{k=1}^{K} u*_{k,j} = n   (4.6)

4.5 Objective Functions

4.5.1 Fuzzy C-means (FCM)

FCM [025][026] is a widely used objective in partition matrix evolution. In fact, single-objective optimization once took FCM as the sole optimizing objective because of its capability of updating the cluster centers every iteration. In this paper, J_m is considered as only one of the objective functions, and it is given by

FCM = J_m = Σ_{j=1}^{n} Σ_{k=1}^{K} u^m_{k,j} D²(z_k, x_j)   (4.7)

where n is the number of data points or patterns (equivalently, the number of pixels in the image) and D(z_k, x_j) denotes the Euclidean distance between data point x_j and center z_k. Notice that J_m sums the variance over all clusters.
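A minimal sketch of Eqs. (4.1) and (4.7), assuming Euclidean distance and m = 2 by default; the vectorized layout (a K × n membership matrix) is an implementation choice of this sketch, not prescribed by the thesis.

```python
import numpy as np

def fuzzy_memberships(X, Z, m=2.0, eps=1e-12):
    """Fuzzy membership matrix U (K x n) of Eq. (4.1).

    X : (n, d) data points; Z : (K, d) cluster centers.
    """
    D = np.linalg.norm(X[None, :, :] - Z[:, None, :], axis=2)   # (K, n)
    D = np.maximum(D, eps)                  # guard against zero distance
    ratio = (D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)          # columns sum to one, Eq. (4.2)

def fcm_objective(X, Z, m=2.0):
    """J_m of Eq. (4.7): membership-weighted within-cluster variance."""
    U = fuzzy_memberships(X, Z, m)
    D2 = np.linalg.norm(X[None, :, :] - Z[:, None, :], axis=2) ** 2
    return np.sum((U ** m) * D2)
```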
A lower value of J_m indicates better compactness, and each data point is classified to the cluster for which it achieves the largest membership value. It is quite intuitive that J_m decreases as the number of clusters K increases, and it can be proved that J_m takes its minimum value when K = n.

4.5.2 Xie-Beni Index (XB)

The Xie-Beni index [027] is a function of the ratio of the total variation σ to the minimum separation min_sep of the clusters. The total variation σ is given by

σ = Σ_{j=1}^{n} Σ_{k=1}^{K} u^m_{k,j} D²(z_k, x_j)   (4.8)

which is the same as J_m, and the minimum separation min_sep(V) = min_{i≠j} {‖z_i − z_j‖²} is a measure of the worst-case scenario of separation in clustering. XB is written as

XB(U, V; X) = σ(U, V; X) / (n × min_sep(V))   (4.9)

4.5.3 Overall Cluster Deviation (DEV)

Dev = Σ_{k=1}^{K} Σ_{x_j∈C_k} D²(z_k, x_j)   (4.10)

When the maximum membership of each data point is assigned 1 and the others 0,

J_m = Σ_{k=1}^{K} Σ_{x_j∈C_k} D²(z_k, x_j) = Dev   (4.11)

In fact, Dev can be viewed as the crisp version of J_m. During iteration, ambiguity on the boundary is eliminated, since the summation in every iteration proceeds right after crisp classification. Dev must be minimized in order to obtain compact clusters. This objective is practically the same as the objective of K-means clustering.

4.5.4 Fuzzy Separation (FSEP)

Fuzzy separation is given by [019]

S = Σ_{i=1}^{K} Σ_{j=1, j≠i}^{K} μ^m_{i,j} D²(z_i, z_j)   (4.12)

The membership degree follows a similar equation to measure the belongingness of each z_i to z_j, i ≠ j:

μ_{i,j} = 1 / Σ_{l=1, l≠j}^{K} ( D(z_j, z_i) / D(z_j, z_l) )^{2/(m−1)},   j ≠ i   (4.13)

S should be maximized in order to obtain well-separated clusters.

4.5.5 Global Separation (SEP)

Global separation [022] is introduced as a supplementary criterion for estimating separation, and it can be considered the crisp version of fuzzy separation. Maximizing sep(V) encourages clusters to be separated from each other; in other words, the distance between cluster centers should be large enough to avoid overlapping.

sep(V) = (2 / (K(K−1))) Σ_{i=1}^{K} Σ_{j≠i} D²(z_i, z_j)   (4.14)

4.5.6 Average Between Group Sum of Squares (ABGSS)

ABGSS, proposed by [028], also measures cluster separation; it computes the average distance of the cluster centers from the centroid of the dataset as follows:

ABGSS = Σ_{i=1}^{K} n_i D²(z_i, z̄) / K   (4.15)

Here, z̄ represents the center of the whole dataset and n_i is the number of data points having maximum membership value corresponding to z_i. ABGSS must be maximized in order to obtain well-separated clusters.

4.5.7 Intra-cluster Entropy (H)

Intra-cluster entropy was brought up in order to improve clustering homogeneity [029]. The average degree of similarity between each cluster center and the data points is given by

g(z_k) = (1/n) Σ_{j=1}^{n} ( 0.5 + CO(z_k, x_j)/2 )   (4.16)

The value of g(z_k) reflects the global probability of grouping all data points into the cluster C_k represented by center z_k, where the cosine distance is written as CO(x_i, x_j) = (x_i · x_j) / (‖x_i‖ ‖x_j‖). The intra-cluster entropy of partition C_i is given by

H(z_i) = −[ g(z_i) log₂ g(z_i) + (1 − g(z_i)) log₂(1 − g(z_i)) ]   (4.17)

In the end, the general intra-cluster entropy is defined as

H = [ Σ_{k=1}^{K} ( 1 − H(z_k) g(z_k) ) ]^{1/d}   (4.18)

where d is the dimension of the dataset. High intra-cluster entropy H is desirable, because low entropy within each cluster enhances the homogeneity of the results.
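Under the reconstruction of Eq. (4.18) above, a sketch of the intra-cluster entropy computation might look as follows; the numerical guards against zero norms and log(0) are assumptions added for robustness, not part of the original definition.

```python
import numpy as np

def intra_cluster_entropy(X, Z, d=None):
    """Intra-cluster entropy H of Eqs. (4.16)-(4.18) (sketch).

    X : (n, dim) data, Z : (K, dim) centers; d defaults to dim.
    """
    if d is None:
        d = X.shape[1]
    total = 0.0
    for z in Z:
        # cosine similarity CO of every point to the center
        cos = X @ z / (np.linalg.norm(X, axis=1) * np.linalg.norm(z) + 1e-12)
        g = np.mean(0.5 + cos / 2.0)                        # Eq. (4.16)
        g = np.clip(g, 1e-12, 1 - 1e-12)                    # keep log2 finite
        Hz = -(g * np.log2(g) + (1 - g) * np.log2(1 - g))   # Eq. (4.17)
        total += 1.0 - Hz * g
    return total ** (1.0 / d)                               # Eq. (4.18)
```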
4.5.8 Davies-Bouldin Index (DB)

The Davies-Bouldin index [030] is a function of the ratio of within-cluster scatter to between-cluster separation. The scatter of cluster Z_i is given as

S_i = Σ_{x_j∈Z_i} D²(z_i, x_j) / |Z_i|   (4.19)

The intra-cluster homogeneity and compactness of a single cluster, relative to its separation from the others, can be estimated by

R_k = max_{j≠k} { (S_k + S_j) / D²(z_k, z_j) }   (4.20)

Overall homogeneity and compactness is then given by

DB = Σ_{k=1}^{K} R_k / K   (4.21)

The Davies-Bouldin index calculates the ratio of within-cluster scatter to inter-cluster separation. For a good partition result, inter-cluster separation as well as intra-cluster homogeneity and compactness should be high. This affirms the idea that no cluster should be similar to another, and hence the best clustering scheme essentially minimizes the Davies-Bouldin index.

4.5.9 Connectedness

Inspired by the concept of the fuzzy adjacency relation from [01], for any pair of pixels (x, y) the affinity function μ_k(x, y) indicates the local hanging-togetherness and is written as

μ_k(x, y) = μ_α(x, y) g(μ_ψ(x, y), μ_φ(x, y))   (4.22)

μ_α(x, y) represents the adjacency relation, which is defined as

μ_α(x, y) = 1 if ‖x − y‖ ≤ 1;   0 otherwise   (4.23)

g should be a monotonically non-decreasing function, and its two arguments μ_ψ(x, y) and μ_φ(x, y) represent object-feature-based and homogeneity-based functions. In our case, the calculation of g(μ_ψ(x, y), μ_φ(x, y)) can be simplified to

g(μ_ψ(x, y), μ_φ(x, y)) = 1 if x, y ∈ C_k;   0 otherwise   (4.24)

The shortest path between two points x, y is denoted as d_short(x, y), measured along the relative neighborhood graph. First, all possible paths connecting x and y are found; assume x_1, x_2, ..., x_k are intermediate points forming a path C(x_1, x_2, ..., x_k) that satisfies

μ_k(x_k, x_{k+1}) = 1   (4.25)

Set |C(x_1, x_2, ..., x_k)| = k and d_short(x, y) = min_{c∈C} |c|. Otherwise, if there is no path between x and y, set d_short(x, y) = ∞. For a given cluster C_k, we calculate the number of its connected components and denote it as L(C_k). Then

Conn = max_{k∈{1,2,...,K}} L(C_k)   (4.26)

This cluster validity index is minimized during optimization.
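Counting connected components per cluster can be done with standard image-labelling tools; the sketch below assumes SciPy is available and uses a 4-connected structuring element, which mirrors the Euclidean adjacency relation ‖x − y‖ ≤ 1 of Eq. (4.23).

```python
import numpy as np
from scipy import ndimage

def connectedness(label_img, K):
    """Conn objective of Eq. (4.26) for an image labelled into K clusters.

    For each cluster, count its 4-connected components; Conn is the
    worst (largest) count over all clusters and is to be minimized.
    """
    structure = np.array([[0, 1, 0],
                          [1, 1, 1],
                          [0, 1, 0]])   # 4-connectivity: ||x - y|| <= 1
    counts = []
    for k in range(K):
        _, num = ndimage.label(label_img == k, structure=structure)
        counts.append(num)
    return max(counts)

# Illustrative usage: a 2-cluster labelling with a fragmented cluster 1.
img = np.array([[0, 0, 1],
                [0, 1, 0],
                [1, 0, 1]])
print(connectedness(img, K=2))  # cluster 1 splits into several components
```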
CHAPTER 5
EVALUATION OF OBJECTIVES

As mentioned before, there is no single objective function that works well for all kinds of datasets. Moreover, the performance of a multi-optimization algorithm is also likely to be harmed by objectives that provide insufficient coverage of data features. Therefore, the selection of objectives is quite important. An increasing number of objectives generally enhances optimizing performance while adding computational complexity. Multi-objective optimization is performed on a number of, often conflicting, objectives. The degree of conflict between two objectives increases if optimization of one objective violates the assumptions of the other. One example is optimizing XB and FCM together: FCM is strongly correlated with global variance, while the XB index is a combination of global variance and minimum separation as numerator and denominator, respectively. The separation between the closest clusters is taken as the worst-case scenario here; its value increases as clusters become well separated from each other, which in turn increases the global variance. It is shown in [015] that minimizing the XB index and FCM simultaneously yields good results. It turns out to be much more difficult to give a similar analysis for other objective functions based on such limited knowledge. Therefore, information theory is introduced to solve this issue.

5.1 Multivariate Mutual Information

5.1.1 Mutual Information Between Two Variables

Generally speaking, mutual information [031] measures the amount of information about one variable given knowledge of another variable, and it is mostly defined on two variables. In probability theory, mutual information gives a quantitative measurement of the amount of information obtained about one variable through another. Our problem here requires knowledge of the interaction between more than two variables. Firstly, in the two-objective case, assume X and Y have distributions P_X(x) and P_Y(y), respectively. The entropies are given by

H(X) = − Σ_{x∈X} P_X(x) log P_X(x)   (5.1)

H(Y) = − Σ_{y∈Y} P_Y(y) log P_Y(y)   (5.2)

and the joint entropy is written as

H(X, Y) = − Σ_{x∈X, y∈Y} P_XY(x, y) log P_XY(x, y)   (5.3)

Mutual information is defined in terms of the individual entropies and the joint entropy:

I(X; Y) = H(X) + H(Y) − H(X, Y)   (5.4)

Now we want to expand to the many-objective situation; the idea underlying this is dimension reduction.

5.1.2 Mutual Information Among Multiple Variables

Consider three variables U, V, Y. If V has no effect on U and Y, then I_V(U; Y) = I(U; Y). On the other hand, if V affects the relationship between U and Y, we can eliminate V by taking a weighted sum (over the probability of occurrence of each particular value of V) of the mutual information between U and Y for each value of V:

I_V(U; Y) = Σ_{v∈V} P_V(v) I(U; Y | V = v) = I(U; Y | V)   (5.5)

I(U; Y | V) = I(U, V; Y) − I(V; Y)   (5.6)

Since

I(U, V; Y) = H(U, V) + H(Y) − H(U, V, Y);   I(U; Y) = H(U) + H(Y) − H(U, Y)   (5.7)

we can infer that

I_V(U; Y) = H(U, V) − H(U, V, Y) − H(V) + H(V, Y)   (5.8)

A simple subtraction between I(U; Y) and I_V(U; Y) can be written as

I(U; Y) − I_V(U; Y) = H(U) + H(V) + H(Y) − (H(U, V) + H(V, Y) + H(U, Y)) + H(U, V, Y)   (5.9)

This is also called the mutual interaction between V, U and Y. The analysis above presumes that the probability distribution of each variable is known. Now we can easily extend the mutual information calculation to the general case:

I(X_1; X_2) = H(X_1) − H(X_1 | X_2)   (5.10)

I(X_1; X_2; X_3) = I(X_1; X_2) − I(X_1; X_2 | X_3)   (5.11)

and

I(X_1; X_2; ...; X_N) = I(X_1; X_2; ...; X_{N−1}) − I(X_1; X_2; ...; X_{N−1} | X_N)   (5.12)

These equations can be expressed in terms of entropies:

I(X_1; X_2; ...; X_N) = (H(X_1) + H(X_2) + ... + H(X_N)) − (H(X_1, X_2) + ...) + ... + (−1)^{N−1} H(X_1, X_2, ..., X_N)   (5.13)

When processing only two variables, I is always nonnegative: if I = 0, then X and Y are independent and knowing X does not give any information about Y; I > 0 implies information is shared by X and Y, and the larger I is, the more information is shared. The inefficiency of multivariate mutual information is that it can be either positive or negative. For instance, consider the case where U and Y are independent of each other and V is a variable dependent only on U; then I(U; V; Y) is negative.

5.2 Total Correlation

Total correlation is the amount of information shared among the variables in a set. In information theory, total correlation is one of several generalizations of mutual information. For a given set of N random variables {X_1, X_2, ..., X_N}, total correlation is defined as the Kullback-Leibler divergence from the joint distribution p(X_1, X_2, ..., X_N) to the independent distribution p(X_1)p(X_2)...p(X_N) [032]:

C(X_1, X_2, ..., X_N) = D_KL[ p(X_1, X_2, ..., X_N) ‖ p(X_1)p(X_2)...p(X_N) ]   (5.14)

Reducing the divergence to a simpler difference of entropies,

C(X_1, X_2, ..., X_N) = Σ_{i=1}^{N} H(X_i) − H(X_1, X_2, ..., X_N)   (5.15)

which can also be written as

C(X_1, X_2, ..., X_N) = Σ_{i,j} I(X_i; X_j) + Σ_{i,j,k} I(X_i; X_j; X_k) + ... + I(X_1; X_2; ...; X_N)   (5.16)
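Eq. (5.15) suggests a direct plug-in estimator of total correlation from a population of objective vectors; the sketch below assumes equal-width histogram discretization with a fixed bin count, which is an illustrative choice rather than the thesis procedure.

```python
import numpy as np

def entropy_from_counts(counts):
    """Shannon entropy (bits) of a discrete distribution given by counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def total_correlation(samples, bins=8):
    """Estimate C(X1,...,XN) = sum_i H(Xi) - H(X1,...,XN), Eq. (5.15).

    samples : (n_obs, N) array, e.g. objective values of a population,
    discretized into equal-width bins to estimate the probabilities.
    """
    n_obs, N = samples.shape
    digitized = np.empty_like(samples, dtype=int)
    for i in range(N):
        edges = np.linspace(samples[:, i].min(), samples[:, i].max(), bins + 1)
        digitized[:, i] = np.clip(np.digitize(samples[:, i], edges), 1, bins) - 1
    # sum of marginal entropies
    h_marg = sum(entropy_from_counts(np.bincount(digitized[:, i], minlength=bins).astype(float))
                 for i in range(N))
    # joint entropy via counting unique discretized tuples
    _, joint_counts = np.unique(digitized, axis=0, return_counts=True)
    return h_marg - entropy_from_counts(joint_counts.astype(float))
```

A high total correlation among a candidate set of objectives signals redundancy, which is exactly the information this chapter uses to guide objective selection.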
CHAPTER 6
IMPLEMENTATION

6.1 Minor Specifications

6.1.1 Handling of Infinite Solutions in Objective Space

It is not rare in practice that prior knowledge is inaccessible and the number of clusters in an implementation is unknown. An infinite value in objective space is then possible, for instance when some cluster Z_i is empty, so that |Z_i| = 0. Recall that the within-cluster scatter is given as

S_i = Σ_{x_j∈Z_i} D²(z_i, x_j) / |Z_i|   (6.1)

so S_i goes to infinity. Notice that normalization would then involve computing (∞ − C)/∞, which is unlikely to generate a reasonable result. In our implementation, solutions with infinite objective values are detected immediately and deleted.

6.1.2 Handling of Solutions with the Same Objectives or Diversity

Solutions with the same objectives or diversity are always selected randomly until the solution population reaches the threshold N.

6.1.3 Handling of Solutions with the Same Membership on Two Clusters

Assume that a data point x_j has equivalent maximum membership for clusters Z_1 and Z_2, i.e. u_{1,j} = u_{2,j}, due to fuzzy clustering. This often implies that x_j is on the boundary of Z_1 and Z_2. In order to take advantage of fuzzy clustering, data points with equivalent degrees of belongingness are assigned to more than one cluster. Some objectives, for instance deviation, require calculating Dev = Σ_{k=1}^{K} Σ_{x_j∈C_k} D²(z_k, x_j); in VMOGA, x_j is then assigned to both Z_1 and Z_2.

6.2 Parameter and Experiment Setting

6.2.1 Initialization and Objective Selection

Table 6.1 Initialization parameters

Parameter                            Value
Number of generations                50
Population size                      100
Parent solution population           100
Child solution population            100
Upper bound on number of clusters    30
Lower bound on number of clusters    2

Six different methods are selected for comparison. Fuzzy c-means (FCM) is a clustering method that allows one pixel to belong to two or more clusters. FCM was proposed by [9] and is frequently applied in pattern recognition. It is based on minimization of the FCM objective function

FCM = Σ_{j=1}^{n} Σ_{k=1}^{K} u^m_{k,j} D²(z_k, x_j)   (6.2)

Partitioning is carried out through an iterative optimization of the FCM objective shown above, with the update of the memberships u_{i,j} and the cluster centers z_k:

u_{i,j} = 1 / Σ_{k=1}^{K} ( D(z_i, x_j) / D(z_k, x_j) )^{2/(m−1)},   1 ≤ i ≤ K, 1 ≤ j ≤ n   (6.3)

z_k = Σ_{j=1}^{n} u^m_{k,j} x_j / Σ_{j=1}^{n} u^m_{k,j}   (6.4)

NSGA-II is one of the most widely adopted two-objective optimization genetic algorithms. In our experiment, MOVGA and NSGA-II optimize the same objectives (FCM and XB), while the number of clusters K is fixed for NSGA-II and set as a variable for MOVGA. Similarly, NSGA-III and VMOGA-1 both optimize FCM, XB, 1/FSEP, DB and 1/H, while K is fixed for NSGA-III and set as a variable for VMOGA-1. In VMOGA-2, another objective function, Conn, which evaluates connectedness, is incorporated into our framework. However, the introduction of Conn disrupts the balance of the previous objective set; the number of clusters K, which is set as a variable in VMOGA, is adopted to address this issue. SO (single-objective optimization of FCM + XB + 1/FSEP + DB + 1/H) is taken as a supplementary criterion in our comparison, with the corresponding weight of each objective set to one. Recall that a weight value, or priority, for each objective function is needed before implementing aggregating methods. Optimizing SO in our experiment also excludes the possibility that equal weights happen to be the right answer, since in that case the rejection operator would invalidate the rest of the evolutionary operator.
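For reference, the FCM baseline's alternating updates of Eqs. (6.3) and (6.4) can be sketched as follows; the random center initialization and fixed iteration budget are assumptions made for illustration.

```python
import numpy as np

def fcm(X, K, m=2.0, iters=100, seed=0, eps=1e-12):
    """Baseline FCM via the alternating updates of Eqs. (6.3)-(6.4).

    Returns final centers Z (K, d) and memberships U (K, n).
    """
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=K, replace=False)]   # random initial centers
    for _ in range(iters):
        D = np.maximum(np.linalg.norm(X[None] - Z[:, None], axis=2), eps)
        ratio = (D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=1)                    # Eq. (6.3)
        Um = U ** m
        Z = (Um @ X) / Um.sum(axis=1, keepdims=True)   # Eq. (6.4)
    return Z, U
```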
Table 6.2 Choice of objectives for FCM, SO, NSGA-II, MOVGA, NSGA-III and VMOGA

Methods     Jm   XB   1/FSEP   DB   1/H   Conn
FCM         √
SO¹         √    √    √        √    √     √
NSGA-II     √    √
MOVGA       √    √
NSGA-III    √    √    √        √    √
VMOGA       √    √    √        √    √     √

¹ SO optimizes normalized Jm + XB + 1/FSEP + DB + 1/H + Conn.

6.2.2 Parameter Encoding and Genetic Operators

Methods with a predefined number of clusters K (NSGA-II, NSGA-III, SO) adopt simulated binary crossover with η_c = 10 and a mutation operator with distribution index d = 10, with crossover probability and mutation probability equal to 1 and 0.2, respectively. In VMOGA, the crossover operation is determined by a normal distribution N(μ, σ²), where μ is the length of the parent solution chromosome and

σ = 4 − 3 (t / t_max)^{1/2}

with t denoting the iterations so far and t_max = 200 the total number of generations. The mutation operator parameter is

d = 0.1 + 9.9 (t / t_max)^{1/2}

Moreover, the rejection rate θ = 0.1 for VMOGA, and in the reference-based sorting algorithm the number of divisions is set as 4.

6.2.3 Obtaining the Final Solution

Notice that a set of non-dominated solutions is generated by the multi-optimization algorithm, and none of these solutions can be improved in one objective without degrading another. Thus, we need to make a tradeoff, or use a selection scheme, to choose one particular solution. Index I [6] is efficient as a supplementary validity index to measure the goodness of clustering results. It is given by

I(K) = (1/K) × (E_1 / E_K) × D_K   (6.5)

where K denotes the number of clusters. As observed from the equation, Index I is composed of 1/K, E_1/E_K and D_K. Here E_1 = Σ_{j=1}^{n} D(z_1, x_j) is taken as a constant, with z_1 representing the global center of the dataset, and E_K = Σ_{k=1}^{K} Σ_{x_j∈Z_k} D(z_k, x_j). Recall that E_K has the same expression as Dev, which needs to be minimized in order to enhance the compactness of clustering results. D_K = max_{i,j} D(z_i, z_j) denotes the largest distance between two cluster centers, and a larger D_K indicates better separation. In summary, Index I is to be maximized. During implementation, we select the solution from the last generation with the largest I.
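A minimal sketch of Index I per Eq. (6.5), computed from a crisp labelling; taking the data mean as the global center z_1 follows the definition of E_1 above, while the input layout is an assumption of this sketch.

```python
import numpy as np

def index_I(X, Z, labels):
    """Index I of Eq. (6.5), used to pick one solution from the final front.

    X : (n, d) data, Z : (K, d) centers, labels : (n,) crisp assignments.
    """
    K = len(Z)
    z1 = X.mean(axis=0)                                    # global center
    E1 = np.linalg.norm(X - z1, axis=1).sum()
    EK = sum(np.linalg.norm(X[labels == k] - Z[k], axis=1).sum()
             for k in range(K))
    DK = max(np.linalg.norm(Z[i] - Z[j])                   # largest center gap
             for i in range(K) for j in range(i + 1, K))
    return (1.0 / K) * (E1 / EK) * DK

# The non-dominated solution with the largest Index I is kept.
```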
6.3 Experiment on MRI Data

6.3.1 Introduction of MRI Data

Image segmentation plays an important role in medical image analysis. Automatic segmentation of MRI brain images into different classes is very important in clinical studies and neurological pathology. However, segmentation has always been challenging for MRI images, since these images are noisy and imprecise in nature. One important aspect of evaluating segmentation results is calculating regional volumes. In real-life applications, regional volume calculations often provide very useful diagnostic information. Among them, the quantization of gray and white matter volumes may be of major interest in neurodegenerative disorders such as Alzheimer's disease, movement disorders such as Parkinson's or Parkinson-related syndromes, white matter in metabolic or inflammatory disease, congenital brain malformations, perinatal brain damage, or post-traumatic syndrome. The normal brain image data provided by [22] is a great source for evaluating the efficiency of the proposed algorithm, as a ground truth table is available for these images. There are ten classes present in the normal brain images, each given an integer label between 0 and 9 representing its content.

Table 6.3 Truth table for MRI brain image

Label   Class
0       Background
1       CSF
2       Grey Matter
3       White Matter
4       Fat
5       Muscle/Skin
6       Skin
7       Skull
8       Glial Matter
9       Connective

A comparison of results is established with MOVGA, proposed by [4], and evaluated by the adjusted Rand index (ARI) [25].

6.3.2 Adjusted Rand Index (ARI)

Clustering quality is mostly evaluated by external measurements. The Rand index (RI) takes two partitions as input (one of which is the correct solution from the truth table); the number of pairwise co-assignments of data items between the two partitions is counted by RI. The adjusted Rand index (ARI) [25] additionally introduces a statistically induced normalization in order to yield values close to zero for random partitions. Suppose the true partition is available and denoted as T = {T_1, T_2, ..., T_r} for MRI data X = {X_1, X_2, ..., X_N}, and our segmentation result is written as C = {C_1, C_2, ..., C_s}. The overlap between T and C can be summarized in a contingency table, illustrated in Table 6.4, where each element denotes the number of objects common to T_i and C_j, n_ij = |T_i ∩ C_j|, and Σ_{i=1}^{r} a_i = Σ_{j=1}^{s} b_j = N.

Table 6.4 Contingency table

        C_1    C_2    ...   C_s    Sums
T_1     n_11   n_12   ...   n_1s   a_1
T_2     n_21   n_22   ...   n_2s   a_2
...     ...    ...    ...   ...    ...
T_r     n_r1   n_r2   ...   n_rs   a_r
Sums    b_1    b_2    ...   b_s

ARI = [ Σ_{i,j} C(n_ij, 2) − Σ_i C(a_i, 2) Σ_j C(b_j, 2) / C(N, 2) ] / [ (1/2)( Σ_i C(a_i, 2) + Σ_j C(b_j, 2) ) − Σ_i C(a_i, 2) Σ_j C(b_j, 2) / C(N, 2) ]   (6.6)

where C(·, 2) denotes the binomial coefficient. ARI = 1 when the clustering result perfectly matches the true partition T, and it is approximately 0 for a random classification.

6.3.3 Evaluation of Segmentation Results

The algorithms have been applied to the images corresponding to the Z planes Z10, Z72, Z108, Z120 and Z140. Results are presented in Figures 6.1–6.5. It appears that the proposed VMOGA has identified the different homogeneous regions very well.

Figure 6.1 (a) Original MRI image in Z10 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.2 (a) Original MRI image in Z72 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.3 (a) Original MRI image in Z108 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.4 (a) Original MRI image in Z120 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.5 (a) Original MRI image in Z140 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering

The ARI scores are reported in Table 6.5.

Table 6.5 ARI results for the Z10, Z72, Z108, Z120 and Z140 planes

Plane   FCM     SO      NSGA-II   NSGA-III   MOVGA   VMOGA
Z10     0.509   0.719   0.722     0.786      0.744   0.801
Z72     0.661   0.698   0.727     0.725      0.664   0.719
Z108    0.450   0.386   0.732     0.791      0.767   0.808
Z120    0.604   0.591   0.761     0.810      0.758   0.825
Z140    0.557   0.627   0.816     0.886      0.827   0.893

It is evident from Table 6.5 that the proposed VMOGA generally outperforms the other algorithms. Meanwhile, the VMOGA and NSGA-III results illustrate that more objectives provide extra coverage of data characteristics and hence enhance clustering performance. The comparison of FCM, NSGA-II and NSGA-III further illustrates the importance of optimizing many objectives simultaneously. Results of single-objective optimization are also presented in the table.
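For reference, the ARI of Eq. (6.6) is available off the shelf; a minimal usage sketch, assuming scikit-learn is installed and using stand-in random labelings in place of real segmentations:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# truth and pred are flattened per-pixel class labels of one Z plane;
# the random arrays here merely stand in for real segmentations.
rng = np.random.default_rng(0)
truth = rng.integers(0, 10, size=10000)
pred = rng.integers(0, 10, size=10000)

print(adjusted_rand_score(truth, pred))  # close to 0 for random partitions
```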
Recall that the rejection operator V is adopted in VMOGA to remove bad solutions. It is therefore also necessary to eliminate the possibility that the best results could be selected by the index V on its own. The single-objective optimization of V here shows that the rejection operator by itself is incapable of selecting good solutions.

6.3.4 Detection of White and Gray Matter

A further application of the MRI segmentation results is the estimation of white and gray matter. Gray matter is an important reference in diagnosing neurodegenerative disorders such as Alzheimer's disease and Parkinson's or Parkinson-related syndromes, while white matter is important for metabolic or inflammatory disease, congenital brain malformations, perinatal brain damage, or post-traumatic syndrome. Here the detection accuracy (ACC) is defined as

ACC = (TP + TN) / (TP + FP + FN + TN)   (6.7)

where TP, FP, TN and FN are interpreted as follows. True positive: white/gray matter correctly identified as white/gray matter; false positive: non-white/gray matter incorrectly identified as white/gray matter; true negative: non-white/gray matter correctly identified as non-white/gray matter; false negative: white/gray matter incorrectly identified as non-white/gray matter. Moreover, VA (volume prediction accuracy) is proposed to examine the accuracy of the predicted white or gray matter volume. VA is given by

VA = |P − T| / |T|   (6.8)

where T and P denote the true volume and the volume predicted by segmentation, respectively; being a relative error, a lower VA is better. Comparisons of gray and white matter segmentation against the truth labels and the VMOGA results are illustrated in Figures 6.6–6.9.
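A hedged sketch of Eqs. (6.7) and (6.8) for a single tissue class, operating on boolean masks (the mask representation is an assumption of this sketch):

```python
import numpy as np

def acc_and_va(truth_mask, pred_mask):
    """Detection accuracy (Eq. 6.7) and volume error (Eq. 6.8) for one
    tissue class, given boolean truth and prediction masks."""
    tp = np.sum(truth_mask & pred_mask)
    tn = np.sum(~truth_mask & ~pred_mask)
    fp = np.sum(~truth_mask & pred_mask)
    fn = np.sum(truth_mask & ~pred_mask)
    acc = (tp + tn) / (tp + fp + fn + tn)
    va = abs(pred_mask.sum() - truth_mask.sum()) / truth_mask.sum()
    return acc, va
```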
Figure 6.6 (a) Gray Matter ground truth (b) Gray Matter segmentation produced by VMOGA on Z108 plane (a) White Matter ground truth (b) White Matter segmentation produced by VMOGA on Z108 plane
Figure 6.7 (a) Gray Matter ground truth (b) Gray Matter extraction produced by VMOGA+Extraction on Z108 plane (a) White Matter ground truth (b) White Matter segmentation produced by VMOGA-based extraction on Z108 plane
Figure 6.8 (a) Gray Matter ground truth (b) Gray Matter segmentation produced by VMOGA on Z120 plane (a) White Matter ground truth (b) White Matter segmentation produced by VMOGA on Z120 plane
Figure 6.9 (a) Gray Matter ground truth (b) Gray Matter extraction produced by VMOGA+Extraction on Z120 plane (a) White Matter ground truth (b) White Matter segmentation produced by VMOGA-based extraction on Z120 plane

[Table 6.6 Gray and White Matter extraction results (ACC and VA per method: FCM, SO, NSGA-II, NSGA-III, MOVGA, VMOGA and VMOGA+Extraction) for normal brain images for the Z72, Z80, Z100, Z108 and Z120 planes]

Figures 6.6–6.9 show the gray and white matter clustered using the VMOGA technique. Visually, it provides a clustering structure similar to that of the corresponding truth table. However, VMOGA fails to identify skin and skin/muscle matter, which disturbs measurement accuracy. Therefore, connectedness-based extraction is adopted as a supplementary procedure in the quantization of gray and white matter, and the corresponding segmentation results for the Z108 and Z120 planes are shown in Figures 6.7 and 6.9. It is illustrated in Table 6.6 that the proposed VMOGA generally achieves higher accuracy when extracting white and gray matter. Meanwhile, the VMOGA and NSGA-III results illustrate that an increasing number of objectives provides growing coverage of data characteristics, and the comparison of FCM, NSGA-II and NSGA-III further illustrates the importance of optimizing many objectives simultaneously. Furthermore, Table 6.6 shows that evident discrepancies exist in the quantization of gray and white matter from VMOGA alone. The superiority of the proposed Extraction (results from VMOGA plus connectedness-based extraction) has been demonstrated: it significantly reduces both the detection error and the volume prediction error.

6.4 Implementation on Fatigue Area Detection

The proposed image processing method is validated on NDE data obtained from the inspection of delamination in glass fiber reinforced polymer (GFRP) specimens subjected to Mode I fatigue tests. A GFRP specimen is periodically imaged from its healthy state to its damaged state using optical transmission scanning (OTS) [26]. OTS has previously been used for tracking impact damage growth in GFRP specimens [27].
In the experiment, the sample is illuminated by a laser source, and the power transmitted through the sample is recorded by a photodetector to examine the extent of delamination in the GFRP sample. At first, the critical displacement at which the specimen cracks is recorded by applying monotonic loading to five similar samples; the process is repeated for all specimens and the average critical displacement is computed. Fatigue loading is then conducted on a new sample under cyclic loading at 5 cycles/sec and a displacement ratio of 0.1. OTS images of the Mode I GFRP sample subjected to 8 rounds of cyclic loading are presented in Figure 6.10.

Figure 6.10 Healthy sample subjected to Mode I cyclic loading after (a) 20K cycles (b) 40K cycles (c) 60K cycles (d) 80K cycles (e) 100K cycles (f) 120K cycles (g) 140K cycles (h) 160K cycles

From the OTS images, the extent of delamination can be observed as the region between the end of the Teflon insert and the beginning of the healthy part of the sample. The images show that the delamination inside the specimen grows slowly as the number of load cycles increases. The piezoelectric (PZT) sensors attached to the GFRP sample are marked as reference points and are used to calculate the physical area of delamination from the delaminated pixel count d_pix. During preprocessing, the locations of the two PZT sensors are identified and the pixel distance between their inner edges is recorded as l_pix; additionally, the upper and lower edges of the sample are identified and its pixel width is recorded as w_pix. The cutting results are presented in Figure 6.11.

Figure 6.11 PZT cutting image of healthy sample subjected to Mode I cyclic loading after (a) 20K cycles (b) 40K cycles (c) 60K cycles (d) 80K cycles (e) 100K cycles (f) 120K cycles (g) 140K cycles (h) 160K cycles

The area of delamination in the OTS image was originally computed using an image processing algorithm implemented in MATLAB. The delaminated area is identified using segmentation via the fast marching method (FMM) [02], whose corresponding threshold for extracting the fatigue area is known. The physical distance between the two PZT sensors, L_phy, and the width of the sample, W_phy, are measured before segmentation. Finally, the physical delamination area D_phy is calculated by

D_phy = d_pix / (l_pix × w_pix) × (L_phy × W_phy) cm²   (6.9)

where L_phy = 10 cm and W_phy = 2.5 cm. Notice that unsupervised VMOGA generates a result that contains an edge effect which in practice does not belong to the delamination.

Figure 6.12 Edge effect and real delamination area in PZT cutting image of healthy sample after 160K cycles

In order to separate the edge effect from the real delamination, a special window function is adopted. Recall the process of constructing the refined membership function, where

g_{k,j} = 1 if u_{k,j} ≥ u_{l,j}, l = 1, 2, ..., K;   0 otherwise

In fact, g_{k,j} is the crisp version of the fuzzy membership u_{k,j}. W = {w_i} is predetermined as a group of rectangular windows with width equal to one, and the length of w_i is denoted as l_i. Then

h_{k,j} = Σ_{j∈w_i} g_{k,j}   (6.10)

T = (l_i + 1) / 2   (6.11)

h*_{k,j} = 1 if h_{k,j} ≥ h_{l,j}, l = 1, 2, ..., K;   0 otherwise   (6.12)

The largest component of the ROI produced by h*_{k,j} is selected as our segmentation result. Binary images denoting the delamination area are shown in Figures 6.13 and 6.14.
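A one-line realization of Eq. (6.9); the example pixel counts in the usage line are purely illustrative.

```python
def physical_area(d_pix, l_pix, w_pix, L_phy=10.0, W_phy=2.5):
    """Physical delamination area (cm^2) from pixel counts, Eq. (6.9).

    d_pix : delaminated pixel count; l_pix, w_pix : pixel distance
    between the PZT inner edges and pixel width of the sample.
    """
    return d_pix / (l_pix * w_pix) * (L_phy * W_phy)

# e.g. 5000 delaminated pixels in a 400 x 100 pixel sample region
print(physical_area(5000, 400, 100))  # 3.125 cm^2
```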
Figure 6.13 Healthy sample subjected to Mode I cyclic loading after 20K, 40K, 60K and 80K cycles
Figure 6.14 Healthy sample subjected to Mode I cyclic loading after 100K, 120K, 140K and 160K cycles

In the end, the number of load cycles versus the detected delamination area is plotted in Figure 6.15. It is clear that the curve of the delamination area detected by VMOGA+Extraction fits the FMM result very well.

Figure 6.15 Plot of number of load cycles versus delamination area detected by VMOGA+Extraction and FMM

CHAPTER 7
CONCLUSION AND FUTURE WORK

In this article, a variable-length many-objective optimization technique (VMOGA) has been proposed to evolve the number of clusters automatically. NSGA-III has been adopted as the underlying optimization framework due to its capability of managing a large number of objectives. Essential components of existing multi-objective optimization algorithms have been discussed and developed. First of all, a new encoding and genetic operator was designed, following the principle underlying the binary operator, in order to reduce redundant computation. Then, weaknesses of the existing evolutionary operator were discussed, and a rejection scheme was proposed to remove "bad" solutions that occupy population space. Furthermore, total correlation was introduced as a supplementary tool for making a suitable choice of objectives. Conventional objective functions were presented, and spatial J_m, intra-cluster entropy and connectedness, which incorporate various aspects of spatial information, were also constructed in order to better target image segmentation problems. The superiority of VMOGA was demonstrated in segmenting MRI human brain images by comparison with several other optimization algorithms. Finally, VMOGA was applied to the detection of the delamination area caused by fatigue loading in Mode I glass fiber reinforced polymer (GFRP) samples; the results suggest potential future application in fatigue detection.

However, some limitations still exist in the proposed VMOGA. It is known that, among the large number of cluster validity indices, none performs satisfactorily for a wide range of datasets on its own. Therefore, it is important to select two or more objective functions that complement and compensate for one another; this is illustrated by the MRI experiment. We know that different tissues may hold similar intensity information at some locations, while the intensity value of a particular tissue may vary across some regions. This is caused by noise and intensity inhomogeneities in the data collection phase. As a result, it is possible to improve clustering results by incorporating extra spatial information. However, the amount of spatial information needed is still unknown. In future work, the balance between geography-based and intensity-based objectives should be examined.

BIBLIOGRAPHY

[1] K. Deb and H. Jain, "An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints," IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 577–601, 2014.

[2] S. Bandyopadhyay, U. Maulik, and A. Mukhopadhyay, "Multiobjective genetic clustering for pixel classification in remote sensing imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 5, pp. 1506–1511, 2007.
[3] A. Mukhopadhyay and U. Maulik, "Unsupervised pixel classification in satellite imagery using multiobjective fuzzy clustering combined with SVM classifier," IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 4, pp. 1132–1138, 2009.

[4] A. Mukhopadhyay and U. Maulik, "A multiobjective approach to MR brain image segmentation," Applied Soft Computing, vol. 11, no. 1, pp. 872–880, 2011.

[5] J. C. Bezdek and N. R. Pal, "Some new indexes of cluster validity," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 28, no. 3, pp. 301–315, 1998.

[6] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.

[7] S. Bandyopadhyay, U. Maulik, and R. Baragona, "Clustering multivariate time series by genetic multiobjective optimization," Metron, vol. 68, no. 2, pp. 161–183, 2010.

[8] G. N. Demir, A. S. Uyar, and S. G. Ögüdücü, "Graph-based sequence clustering through multiobjective evolutionary algorithms for web recommender systems," in Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pp. 1943–1950, ACM, 2007.

[9] N. R. Pal and J. C. Bezdek, "On cluster validity for the fuzzy c-means model," IEEE Transactions on Fuzzy Systems, vol. 3, no. 3, pp. 370–379, 1995.

[10] W. Wang and Y. Zhang, "On fuzzy cluster validity indices," Fuzzy Sets and Systems, vol. 158, no. 19, pp. 2095–2117, 2007.

[11] A. Ferligoj and V. Batagelj, "Direct multicriteria clustering algorithms," Journal of Classification, vol. 9, no. 1, pp. 43–61, 1992.

[12] J. Handl and J. Knowles, "An evolutionary approach to multiobjective clustering," IEEE Transactions on Evolutionary Computation, vol. 11, no. 1, pp. 56–76, 2007.

[13] Y. Jin, T. Okabe, and B. Sendhoff, "Adapting weighted aggregation for multiobjective evolution strategies," in International Conference on Evolutionary Multi-Criterion Optimization, pp. 96–110, Springer, 2001.

[14] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.

[15] E. Zitzler and L. Thiele, "An evolutionary algorithm for multiobjective optimization: The strength Pareto approach," TIK-Report, vol. 43, 1998.

[16] E. Zitzler, M. Laumanns, and L. Thiele, "SPEA2: Improving the strength Pareto evolutionary algorithm," TIK-Report, vol. 103, 2001.

[17] D. W. Corne, J. D. Knowles, and M. J. Oates, "The Pareto envelope-based selection algorithm for multiobjective optimization," in International Conference on Parallel Problem Solving from Nature, pp. 839–848, Springer, 2000.

[18] D. W. Corne, N. R. Jerram, J. D. Knowles, and M. J. Oates, "PESA-II: Region-based selection in evolutionary multiobjective optimization," in Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation, pp. 283–290, Morgan Kaufmann Publishers Inc., 2001.

[19] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.

[20] U. Maulik, S. Bandyopadhyay, and A. Mukhopadhyay, Multiobjective Genetic Algorithms for Clustering: Applications in Data Mining and Bioinformatics. Springer Science & Business Media, 2011.

[21] J.-M. Won, S. Ullah, and F. Karray, "Data clustering using multi-objective hybrid evolutionary algorithm," in Control, Automation and Systems, 2008. ICCAS 2008. International Conference on, pp. 2298–2303, IEEE, 2008.
[22] R.-S. Kwan, A. C. Evans, and G. B. Pike, "MRI simulation-based evaluation of image-processing and classification methods," IEEE Transactions on Medical Imaging, vol. 18, no. 11, pp. 1085–1097, 1999.

[23] K. Deb and R. B. Agrawal, "Simulated binary crossover for continuous search space," Complex Systems, vol. 9, no. 3, pp. 1–15, 1994.

[24] M. Laumanns, L. Thiele, K. Deb, and E. Zitzler, "Combining convergence and diversity in evolutionary multiobjective optimization," Evolutionary Computation, vol. 10, no. 3, pp. 263–282, 2002.

[25] K. Y. Yeung and W. L. Ruzzo, "An empirical study on principal component analysis for clustering gene expression data," Bioinformatics, vol. 17, no. 9, pp. 763–774, 2001.

[26] A. Khomenko, O. Karpenko, E. Koricho, M. Haq, G. L. Cloud, and L. Udpa, "Theory and validation of optical transmission scanning for quantitative NDE of impact damage in GFRP composites," Composites Part B: Engineering, vol. 107, pp. 182–191, 2016.

[27] P. Banerjee, O. Karpenko, L. Udpa, M. Haq, and Y. Deng, "Prediction of impact-damage growth in GFRP plates using particle filtering algorithm," Composite Structures, vol. 194, pp. 527–536, 2018.