A VARIABLE-LENGTH MANY-OBJECTIVE OPTIMIZATION APPROACH IN IMAGE SEGMENTATION PROBLEMS

By

Xuhui Huang

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Electrical Engineering - Master of Science

2018

ABSTRACT

A VARIABLE-LENGTH MANY-OBJECTIVE OPTIMIZATION APPROACH IN IMAGE SEGMENTATION PROBLEMS

By

Xuhui Huang

Image segmentation is a technique for dividing an image space into a number of meaningful homogeneous regions. Various data clustering techniques have been adapted to solve segmentation problems. In particular, data clustering is often posed as a multi-objective optimization problem so that different characteristics of the data can be captured by different objectives simultaneously. Traditional multi-objective optimization methods often require prior knowledge or assumptions about the data, and their performance is poor when these assumptions do not hold. A further limitation of established multi-objective methods is their inadequacy in handling a large number of objectives. The nondominated sorting genetic algorithm III (NSGA-III) [1] was proposed to alleviate this issue; however, NSGA-III is inefficient at removing bad solutions in a high-dimensional search space during evolution. In this thesis, we propose a variable string length many-objective genetic algorithm (VMOGA) whose framework has evolved from NSGA-III and whose encoding strategy, genetic operator, and evolutionary operator have been redesigned. The performance of VMOGA on image segmentation problems is further enhanced by an appropriate selection of objectives. Finally, we conduct unsupervised segmentation with the proposed clustering technique on magnetic resonance images (MRI) of the human brain; comparisons with other evolutionary algorithms are presented, and the superiority of VMOGA is demonstrated quantitatively. VMOGA is also applied to the detection of the delamination area caused by fatigue loading in Mode I glass fiber reinforced polymer (GFRP) samples. Results are compared with the fast marching algorithm (FMA), and the superiority of VMOGA suggests potential future application in fatigue detection.

Copyright by
XUHUI HUANG
2018

ACKNOWLEDGEMENTS

I would first like to thank my advisor Yiming Deng at Michigan State University. The door to Prof. Deng's office was always open whenever I ran into a trouble spot or had a question about my research or writing. I would also like to thank Prof. Udpa and Prof. Ulusoy on my thesis defense committee; without their passionate participation and comments, the validation of this thesis could not have been successfully conducted. I would also like to acknowledge Dr. Banerjee as the provider of the data for this thesis, and I am gratefully indebted to her for her detailed explanation of and guidance on the data. Finally, I must express my very profound gratitude to my parents for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
1.1 Brief Introduction of Data Clustering
1.2 Multi-objective Optimization and Data Clustering
1.3 Concept of Multi-objective Optimization
1.4 Genetic Algorithms (GAs)
1.4.1 Multi-objective Genetic Algorithm
1.4.2 General Framework of MOGA
1.5 Evolving the Number of Clusters
1.6 Structure of This Paper
CHAPTER 2 ENCODING STRATEGY AND GENETIC OPERATOR
2.1 General Encoding Strategy
2.2 Binary Encoding Strategy and Genetic Operator
2.3 Real-coded Encoding Strategy
2.4 Simulation of SBC
2.5 Simulation of Real-coded Mutation
2.6 Handling of Variable Length Chromosome in VMOGA
CHAPTER 3 EVOLUTIONARY OPERATOR
3.1 Evolutionary Strategy for Low-dimensional Space
3.1.1 Non-dominated Sorting Algorithm
3.1.2 Diversity Preservation Algorithm
3.1.3 Weakness
3.2 Evolutionary Strategy for High-dimensional Space
3.2.1 ε-Non-dominated Sorting Algorithm
3.2.2 Reference Based Diversity Preservation Rule
3.2.3 Density Estimation
3.2.4 Niche-Preservation Operation
3.3 Evolutionary Operator for VMOGA
3.3.1 Tchebycheff Metric
3.3.2 A Rejection Operator
CHAPTER 4 OBJECTIVE FUNCTIONS
4.1 Introduction
4.2 Distance Measure
4.3 Membership Function
4.4 Refined Membership Function
4.5 Objective Functions
4.5.1 Fuzzy C-means (FCM)
4.5.2 Xie-Beni Index (XB)
4.5.3 Overall Cluster Deviation (DEV)
4.5.4 Fuzzy Separation (FESP)
4.5.5 Global Separation (SEP)
4.5.6 Average Between Group Sum of Squares (ABGSS)
4.5.7 Intra-cluster Entropy (H)
4.5.8 Davies-Bouldin Index (DB)
4.5.9 Connectedness
CHAPTER 5 EVALUATION OF OBJECTIVES
5.1 Multivariate Mutual Information
5.1.1 Mutual Information Among Two Variables
5.1.2 Mutual Information Among Multi Variables
5.2 Total Correlation
CHAPTER 6 IMPLEMENTATION
6.1 Minor Specification
6.1.1 Handling of Infinity Solution in Objective Space
6.1.2 Handling of Solutions with Same Objectives or Diversity
6.1.3 Handling of Solutions with Same Membership on Two Clusters
6.2 Parameter and Experiment Setting
6.2.1 Initialization and Objective Selection
6.2.2 Parameter Encoding and Genetic Operator
6.2.3 Obtaining Final Solution
6.3 Experiment on MRI Data
6.3.1 Introduction of MRI Data
6.3.2 Adjusted Rand Index (ARI)
6.3.3 Evaluation of Segmentation Results
6.3.4 Detection of White and Gray Matter
6.4 Implementation on Fatigue Area Detection
CHAPTER 7 CONCLUSION AND FUTURE WORK
BIBLIOGRAPHY

LIST OF TABLES

Table 3.1 Main loop of NSGA-II
Table 3.2 General procedure of Niche-Preservation
Table 4.1 Distance measurement
Table 6.1 Initialization parameter
Table 6.2 Choice of objectives for FCM, SO, NSGA-II, MOVGA, NSGA-III and VMOGA
Table 6.3 Truth table for MRI brain image
Table 6.4 Contingency table
Table 6.5 Results for Z10, Z72, Z108, Z120 and Z140 planes
Table 6.6 Gray and White Matter extraction results for normal brain images for Z72, Z80, Z100, Z108 and Z120 planes

LIST OF FIGURES

Figure 1.1 General framework of MOGA
Figure 2.1 Conducting binary-coded crossover and mutation
Figure 2.2 Real-coded encoding strategy for crossover
Figure 2.3 Real-coded encoding strategy for mutation
Figure 2.4 Child population distribution after 10000 iterations given η_c = 5, 10, 15, 20 for crossover
Figure 2.5 Child population distribution after 50 iterations given η_c = 1 and 20 for crossover respectively
Figure 2.6 Child population distribution after 100, 500, 1000, 10000 iterations given η_c = 2 for crossover
Figure 2.7 Child population distribution after 10000 iterations given different d for mutation
Figure 2.8 Child population distribution after 50 iterations given d = 0.2 and 5 for mutation respectively
Figure 2.9 Child population distribution after 100, 500, 1000, 10000 iterations given d = 1 for mutation
Figure 2.10 Genetic operator handling variable K by random selection from gene pool
Figure 2.11 Genetic operator handling variable K by exchanging gene segments
Figure 2.12 Child population distribution by VMOGA genetic operator given 10000 iterations for std = 1, 2, 3, 4
Figure 3.1 Framework of conducting non-dominated sorting and generating solution front
Figure 3.2 Framework of conducting diversity preservation in NSGA-II
Figure 3.3 Non-dominated solutions proportion versus number of objectives
Figure 3.4 Framework of implementing niche preservation in NSGA-III
Figure 3.5 Evolutionary operator in VMOGA
Figure 6.1 (a) Original MRI image in Z10 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.2 (a) Original MRI image in Z72 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.3 (a) Original MRI image in Z108 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.4 (a) Original MRI image in Z120 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.5 (a) Original MRI image in Z140 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.6 (a) Gray Matter ground truth (b) Gray Matter segmentation produced by VMOGA on Z108 plane (c) White Matter ground truth (d) White Matter segmentation produced by VMOGA on Z108 plane
Figure 6.7 (a) Gray Matter ground truth (b) Gray Matter extraction produced by VMOGA+Extraction on Z108 plane (c) White Matter ground truth (d) White Matter segmentation produced by VMOGA-based extraction on Z108 plane
Figure 6.8 (a) Gray Matter ground truth (b) Gray Matter segmentation produced by VMOGA on Z120 plane (c) White Matter ground truth (d) White Matter segmentation produced by VMOGA on Z120 plane
Figure 6.9 (a) Gray Matter ground truth (b) Gray Matter extraction produced by VMOGA+Extraction on Z120 plane (c) White Matter ground truth (d) White Matter segmentation produced by VMOGA-based extraction on Z120 plane
Figure 6.10 Healthy sample being subjected to Mode I cyclic loading after (a) 20K cycles (b) 40K cycles (c) 60K cycles (d) 80K cycles (e) 100K cycles (f) 120K cycles (g) 140K cycles (h) 160K cycles
Figure 6.11 PZT cutting image of healthy sample being subjected to Mode I cyclic loading after (a) 20K cycles (b) 40K cycles (c) 60K cycles (d) 80K cycles (e) 100K cycles (f) 120K cycles (g) 140K cycles (h) 160K cycles
Figure 6.12 Edge effect and real delamination area in PZT cutting image of healthy sample after 160K cycles
Figure 6.13 Healthy sample being subjected to Mode I cyclic loading after 20K, 40K, 60K and 80K cycles
Figure 6.14 Healthy sample being subjected to Mode I cyclic loading after 100K, 120K, 140K and 160K cycles
Figure 6.15 Plot of number of load cycles versus delamination area detected by VMOGA+Extraction and FMM

CHAPTER 1
INTRODUCTION

1.1 Brief Introduction of Data Clustering

Data clustering has recently emerged as one of the fundamental techniques for data analysis. One of the primary applications of data clustering is image segmentation. The problem of image segmentation is usually considered as clustering image pixels in the intensity space [2] [1]; geographic distribution is sometimes also incorporated to enhance homogeneity. Real-life applications of image segmentation include segmentation of remote sensing satellite images [3], magnetic resonance imaging (MRI) medical imagery of the human body [4], and texture images [5]. Other applications such as social network analysis [6], time series data analysis [7], and web mining [8] involve more sophisticated data.

The purpose of clustering is to partition a given dataset into homogeneous groups. Given a dataset X = {x_1, x_2, ..., x_n} of size n in a multi-dimensional space, the goal of clustering is to produce a partition matrix U = {u_{k,j}}, where the value of u_{k,j} is the membership, or belongingness, of data point x_j to cluster C_k. Generally speaking, if u_{k,j} = max_{i∈[1,K]} {u_{i,j}}, then x_j ∈ C_k. Both crisp and fuzzy partition matrices exist in the literature, differing in how u_{k,j} is defined: for a crisp partition matrix, u_{k,j} is either 1 or 0, while belongingness ranges from 0 to 1 in a fuzzy matrix [9]. It is worth noting that fuzzy clustering is better at handling overlapping and noisy clusters.
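As a toy illustration of the two kinds of partition matrix (our own example, not from the thesis): for four points and K = 2 clusters, a crisp matrix and a fuzzy one might look as follows, with each column summing to 1:

```python
import numpy as np

# columns = data points x_1..x_4, rows = clusters C_1, C_2
U_crisp = np.array([[1, 1, 0, 0],
                    [0, 0, 1, 1]])          # u_{k,j} in {0, 1}
U_fuzzy = np.array([[0.9, 0.8, 0.3, 0.1],
                    [0.1, 0.2, 0.7, 0.9]])  # u_{k,j} in [0, 1]

labels = U_fuzzy.argmax(axis=0)             # x_j goes to its max-membership cluster
print(labels)                               # [0 0 1 1]
```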
1.2 Multi-objective Optimization and Data Clustering

The performance of a clustering solution is often evaluated by validity indices. Clustering can therefore be posed as an optimization problem by directly optimizing a validity index [5] [6] [10]. The principle underlying this strategy is to consider the partition matrix U as equivalent to the optimization search space. The search space is composed of all possible clustering results, denoted Z; notice that the mapping between U and Z is unique and bilateral. On the other hand, there is a strong connection between the performance of a clustering result and a validity index, because a validity index often represents properties of a cluster such as homogeneity, separation, and compactness. In fact, when data is classified into reasonable or suitable groups, the validity index is often optimized, and vice versa. Further analysis of this transformation requires a thorough study of the cluster properties reflected by a validity index; hence the selection of validity indices is important.

There are some evident limitations to the single-objective framework [11]. If only one validity index is optimized, the characteristics of the data are unlikely to be fully considered, which adds complexity in discriminating two data groups with similar features. It is also extremely difficult to choose an objective with no prior knowledge about the dataset; if a wrong assumption is made at the beginning, the clustering results will be corrupted [12]. Finally, it is impossible to implement a tradeoff between objectives with single-objective optimization. At one time, in order to solve this issue, single-objective frameworks incorporating multiple objectives were adopted, either by ordering the objectives according to importance or by using a weight vector to convert them into a single objective [13]. However, the relative significance of the various validity indices is unknown in real-life applications, since the data structure usually has not been specified. Therefore, optimizing multiple objectives simultaneously is demanded by the situation.

1.3 Concept of Multi-objective Optimization

In order to achieve higher accuracy, our framework must optimize many objectives simultaneously. Multi-objective optimization can be formally stated as

\min_{x \in X} \left( f_1(x), f_2(x), \ldots, f_K(x) \right),

where the f_i denote objective functions and X is the feasible region. Maximization can be expressed as \max_{x \in X} f_i(x) = -\min_{x \in X} \{-f_i(x)\}. The solution to this problem is not unique; however, good solutions can be extracted by some standard. The most important notion here is Pareto optimality [09]: a solution x* ∈ X is Pareto optimal if there exists no x ∈ X such that f_i(x) ≤ f_i(x*) for all i ∈ {1, 2, ..., K} and f_i(x) < f_i(x*) for at least one i. Similarly, the non-domination rule can be written (in minimization style): if f_i(x) ≤ f_i(y) for all i ∈ {1, 2, ..., K} and f_i(x) < f_i(y) for at least one i, then x dominates y.
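The domination rule translates directly into code. Below is a minimal sketch (our own illustration; the toy objective vectors are invented) that tests dominance between two objective vectors and filters a small population down to its non-dominated solutions:

```python
def dominates(f, g):
    """True if objective vector f dominates g (minimization)."""
    return all(fi <= gi for fi, gi in zip(f, g)) and any(fi < gi for fi, gi in zip(f, g))

def pareto_front(objs):
    """Keep the solutions that no other solution dominates."""
    return [f for f in objs if not any(dominates(g, f) for g in objs if g is not f)]

objs = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(pareto_front(objs))   # (3.0, 3.0) drops out: it is dominated by (2.0, 2.0)
```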
1.4 Genetic Algorithms (GAs)

One of the most efficient approaches for finding Pareto optimal solutions in multi-objective optimization problems is Genetic Algorithms (GAs) [14], proposed by John Holland. GAs have proven to be among the most powerful techniques for searching large spaces.

Inspired by the theory of evolution developed by Charles Darwin, GAs adopt similar concepts such as parent, child, population selection, mutation, and crossover. Initialization in GAs encodes solutions (variables) into strings called chromosomes, so that the algorithm operates on chromosomes instead of variables. GAs conduct the search from multiple points instead of a single point; therefore, GAs are less likely to get trapped in a local optimum than other search algorithms. Each solution is given a fitness value and is evaluated based on this fitness value, which measures the goodness of the solution encoded in the chromosome. In this article, the fitness value is equivalent to the objective value. Thereafter, chromosomes are selected based on fitness value; those with better fitness are given a greater chance of being selected and of reproducing other chromosomes. Potentially better solutions are reproduced from the selected chromosomes by applying operators such as mutation and crossover. The procedure that retains the best chromosome and replaces the worst during each generation is called elitism, and it guarantees fitness no worse than the previous generation. Models built upon these notions have developed rapidly in the last decade: the Strength Pareto Evolutionary Algorithm (SPEA) [15] and [16], the Pareto Envelope-Based Selection Algorithm (PESA) [17] and PESA-II [18], and the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) [19] were proposed and instantly became popular in applications of multi-objective evolutionary algorithms to clustering problems.

1.4.1 Multi-objective Genetic Algorithm

In recent studies, multi-objective genetic algorithm (MOGA) based methods have frequently been adopted in unsupervised image segmentation and in medical and geographical detection problems. NSGA-II has been one of the most widely used MOGA-based methods since its proposal. However, the procedures within the NSGA-II framework that benefit its performance on two-objective optimization problems have prevented it from achieving decent results on many-objective problems (with more than three objectives). NSGA-III was proposed to alleviate this issue and, as an evolutionary many-objective algorithm, instantly became popular. As a reference-point-based search algorithm, its solutions in each iteration are encouraged to lie close to vacant reference lines, thus improving coverage of the search space.

1.4.2 General Framework of MOGA

Existing MOGA techniques mainly differ in five aspects [20]. First of all, data clustering is posed as an optimization problem by optimizing some validity indices (objective functions); therefore, the selection of objective functions must be done in the first place. Then, each cluster solution is represented in the form of a chromosome. Encoding operators differ in terms of representation strategy; among them, label representation and coordinate representation, from label-based encoding and prototype-based encoding respectively, are the most commonly applied. The cluster solutions are initialized as the parent solution population in this stage.
In the next stage, the genetic operator produces the child solution population. The genetic operator is highly related to the encoding operator, so they are often discussed together. The key aspect of MOGA is the evolutionary operator, which makes a selection on the combined parent-child population and generates the solution population for the next generation. In some articles, the genetic operator and the evolutionary operator are bound together as a single evolutionary operator; here, we want to stress the importance of the selection strategy in evolution, so the evolutionary operator is discussed separately. In the end, a single clustering solution must be obtained from the last generation. The general framework of MOGA is illustrated in Fig. 1.1.

Figure 1.1 General framework of MOGA
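The stages above can be condensed into a short, runnable skeleton. The following is our own toy illustration of the generic MOGA loop, not the thesis implementation: the population is one real variable per solution, the two objectives and both operators are crude stand-ins for the operators developed in Chapters 2 and 3:

```python
import random

def evaluate(x):                      # two toy objectives, both minimized
    return (x ** 2, (x - 2.0) ** 2)

def dominates(f, g):
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def environmental_selection(pop, n):
    """Crude stand-in for Chapter 3: keep non-dominated solutions first."""
    objs = [evaluate(x) for x in pop]
    idx = list(range(len(pop)))
    front = [i for i in idx if not any(dominates(objs[j], objs[i]) for j in idx)]
    rest = [i for i in idx if i not in front]
    return [pop[i] for i in (front + rest)[:n]]

def genetic_operator(parents):
    """Crude stand-in for Chapter 2: blend two parents, then perturb."""
    children = []
    for _ in range(len(parents)):
        a, b = random.sample(parents, 2)
        children.append(0.5 * (a + b) + random.gauss(0.0, 0.1))
    return children

random.seed(0)
pop = [random.uniform(-4.0, 6.0) for _ in range(20)]
for _ in range(50):                   # the MOGA main loop sketched above
    pop = environmental_selection(pop + genetic_operator(pop), 20)
print(sorted(round(x, 2) for x in pop))   # drifts toward the Pareto set [0, 2]
```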
1.5 Evolving the Number of Clusters

Conventional MOGA methods require predetermination of the number of clusters K. In fact, only an appropriate choice of K yields acceptable clustering results. In this article, the choice of the number of clusters is tackled by evolving K automatically during iteration, using variable string-length encoding in GAs. However, previously proposed encoding strategies and evolutionary operators [21] are inefficient at generating K, since they ignore the inherent distribution of binary mutation and crossover. In this paper, we examine the child population distributions produced by different evolutionary operators; the performance of selecting K is also evaluated, and the superiority of the proposed evolutionary strategy is demonstrated.

1.6 Structure of This Paper

This paper is organized as follows. In Section 2, we introduce methods of constructing the encoding and genetic operators; inspired by the philosophy underlying conventional operators, we design a new strategy for VMOGA that is capable of handling a variable number of clusters. In Section 3, the evolutionary operator is introduced with respect to NSGA-II and NSGA-III; weaknesses of the existing strategies are discussed, and a selection scheme that removes "bad" solutions is proposed. In Section 4, conventional objective functions are first introduced, and refined objective functions are then constructed to better target image segmentation problems. In Section 5, we discuss the choice of objective functions, which are selected based on prior knowledge and the supplementary information provided by total correlation. In Section 6, implementation details of the proposed algorithm are presented. For MRI human brain images, evaluation of the segmentation results, which demonstrates the superiority of the proposed algorithm, is accomplished by comparing the adjusted Rand index (ARI) and accuracy (ACC) against the corresponding truth table. Results of the experiment on detecting delamination areas are also discussed.

CHAPTER 2
ENCODING STRATEGY AND GENETIC OPERATOR

2.1 General Encoding Strategy

An encoding strategy is adopted to represent a clustering solution. At first glance, clustering solutions can be represented by the partition matrix. This representation scheme provides full coverage of the entire solution space, and it is convenient for defining extra constraints during mutation or crossover. However, as the number of data points in the clustering problem increases, the complexity of direct encoding surges exponentially; even for a mid-sized clustering problem, the scale of this representation becomes too large to handle. Consequently, reduction of the search space is vital for encoding and searching. Previous research has chosen a more indirect approach: assigning each point to its closest cluster, representing each cluster by its centroid, and encoding the coordinates of the centroids. Hence each representation has length K × d, where d is the dimension of the data and K denotes the number of clusters; it is written as [Z_1, Z_2, ..., Z_K], where Z_i denotes the center of a cluster and is a d × 1 vector. In image segmentation, d = 3. Notice that K is a variable in VMOGA. Centroid-based encoding strategies produce a considerably smaller representation and reduce complexity significantly.

During initialization, a set of initial solutions is encoded in the form of chromosomes. These solution chromosomes are then manipulated by the genetic operator to produce the child solution population; the procedures that produce the child population are called mutation and crossover. During each generation, effective population maintenance strategies are also adopted to keep track of the current non-dominated front.

2.2 Binary Encoding Strategy and Genetic Operator

The solution population is encoded in the form of binary strings by the binary encoding strategy [22]. Mutation and crossover are performed on a binary string as illustrated in Fig. 2.1. One application of binary-coded GAs is feature selection: solution chromosomes are encoded as binary strings whose length is the total number of features. Bits '1' and '0' represent a selected and an ignored feature respectively, so every solution denotes a subset of features. In the centroid-based encoding framework, the positions of the cluster centers are encoded in the form of a binary string.

However, binary-coded GAs have some weaknesses in real-life implementations. The biggest one is that the Hamming cliff is likely to be triggered during the search: when searching the neighborhood of a particular solution, one might need to change many bits, e.g. 01111 → 10000, which greatly inconveniences gradual search in a continuous search space. This is inherently caused by the uneven distribution of importance among the different bit positions of a binary string. Moreover, it is difficult to achieve arbitrary precision, since a fixed string length limits the precision of the solution, and the appropriate length of the string is often not known.
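The Hamming cliff is easy to see numerically. The toy sketch below (our own illustration) decodes 5-bit strings to integers and shows that the adjacent values 15 and 16 sit a maximal Hamming distance apart:

```python
def hamming(a, b):
    """Number of differing bit positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

a, b = "01111", "10000"          # decode to 15 and 16: neighbors on the real line
print(int(a, 2), int(b, 2))      # 15 16
print(hamming(a, b))             # 5 -- every bit must flip to take one small step
```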
2.3 Real-coded Encoding Strategy

In order to solve these issues, real-coded GAs are adopted, in which population solutions do not need to be converted into binary strings. Real-coded GAs are simple and straightforward. The key to constructing a real-coded scheme is to imitate the child population distribution that results from the binary-coded scheme.

Figure 2.1 Conducting binary-coded crossover and mutation
Figure 2.2 Real-coded encoding strategy for crossover
Figure 2.3 Real-coded encoding strategy for mutation

2.4 Simulation of SBC

As shown above, the distribution of the child population is produced by repeating the binary-coded crossover and mutation processes. These distributions serve as references for constructing the real-coded scheme. Simulated binary crossover (SBC), proposed by [23], is widely used.

Notice that the probability of conducting crossover, the number of bits changed during one crossover, and the position of the crossover in the binary scheme all affect the distribution of the child population. Generally speaking, several properties of binary-coded crossover are important for estimating this distribution. First of all, symmetry plays a significant role: P_1 + P_2 = C_1 + C_2, which means we place equal importance on P_1 and P_2. Secondly, every bit position has the same probability of being selected as the crossover point; crossover at the lower bits and the higher bits results in small and large changes respectively. Finally, the child population is likely to be close to the parent population. SBC adopts the probability density function of the child distribution and simulates single-point crossover in binary-coded GAs (multiple-point crossover is a superposition of single-point crossovers). Given parents P_1 and P_2 ready for crossover,

C_1 = \frac{P_1 + P_2}{2} + \frac{\beta (P_2 - P_1)}{2}; \quad C_2 = \frac{P_1 + P_2}{2} - \frac{\beta (P_2 - P_1)}{2},

where β functions as a spreading factor: C_1 and C_2 are spread outward when β > 1 and inward when β < 1. The probability distribution of β should be similar to the probability distribution of the spread factor in binary-coded crossover. The probability density function (PDF) of β is defined as

f(\beta) = \begin{cases} 0.5(n+1)\,\beta^{n}, & \beta \le 1 \\ 0.5(n+1)/\beta^{n+2}, & \beta > 1 \end{cases} \quad (2.1)

with n = 2 to 5 mostly. Another refined version is given by [23]:

C_1 = \frac{(1+\beta)P_1}{2} + \frac{(1-\beta)P_2}{2}; \quad C_2 = \frac{(1-\beta)P_1}{2} + \frac{(1+\beta)P_2}{2}, \quad (2.2)

where β can be generated each time by

\beta = \begin{cases} (2u)^{1/(\eta_c+1)}, & u \le 0.5 \\ \left( \frac{1}{2(1-u)} \right)^{1/(\eta_c+1)}, & u > 0.5 \end{cases} \quad (2.3)

Here u is a uniform random number in [0, 1], and the distribution index η_c is a variable that controls the concentration of the distribution. As shown in Fig. 2.4, a larger η_c tends to generate children closer to the parents, while a small η_c allows children to be far from the parents. On the other hand, if η_c is comparatively small, the distribution converges slowly; in extreme cases where a large number of repetitions is unreachable, the distribution of the child population is no longer guaranteed to have a higher PDF around the parent population. This is illustrated in Fig. 2.5.

Figure 2.4 Child population distribution after 10000 iterations given η_c = 5, 10, 15, 20 for crossover
Figure 2.5 Child population distribution after 50 iterations given η_c = 1 and 20 for crossover respectively
Figure 2.6 Child population distribution after 100, 500, 1000, 10000 iterations given η_c = 2 for crossover

We see that the distribution of the child population gradually converges to a smooth curve as the number of simulations increases. Some valuable observations from the simulation need to be discussed. First of all, η_c can be considered a spread factor in constructing the crossover operator. Secondly, if η_c and the number of iterations are small, there will be some uncertainty in the children's distribution; therefore η_c should not be too small. On the other hand, the child population should be encouraged to spread out during the early search period, because parent solutions in this period are usually far away from the Pareto optimal front, and the search area should be large enough to reach out for better solutions. Lastly, the evolutionary operator should conduct its search in the neighborhood of the parent population as the iterations approach termination; this is largely because, in the later period, the parent population solutions are assumed to lie in the vicinity of the Pareto optimal front. In summary, a large η_c fails to satisfy our requirement in the early search stage, while a small η_c can result in wasteful search. As a result, η_c should be a function of the iteration count, with its value increasing as the iterations proceed.
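A minimal sketch of this crossover, assuming the form of Eqs. (2.2)-(2.3) above (our own illustration, not the thesis code):

```python
import random

def sbx(p1, p2, eta_c):
    """Simulated binary crossover on one real variable, per Eqs. (2.2)-(2.3)."""
    u = random.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta_c + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1.0))
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2

# A larger eta_c keeps children near the parents; compare eta_c = 2 vs. 20.
random.seed(0)
print([tuple(round(c, 3) for c in sbx(1.0, 3.0, 20)) for _ in range(3)])
```

Note that c1 + c2 = p1 + p2 for every β, which is exactly the symmetry property discussed above.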
2.5 Simulation of Real-coded Mutation

The design of the mutation operator is quite similar. There are two types of mutation: uniform mutation and non-uniform mutation. Uniform (random) mutation generates a solution randomly distributed within a vicinity of the original solution. Customarily, non-uniform mutation is adopted due to its preference for the original solution. There are several demands on the selection of the mutation operator. First of all, we wish mutated solutions to be distributed toward the parent population. Secondly, as the generation number increases, mutated solutions should be generated closer to the original solution. Finally, it is obvious that all mutated solutions should be feasible. The mutation operator proposed by [20] is given as

C = P + 0.5\,\tau\,\Delta \left( 1 - r^{\,(1 - t/t_{max})^{d}} \right), \quad (2.4)

where Δ is the maximum mutation step within the constraints, and τ is a direction index: τ = 1 and −1 indicate mutation toward the upper and lower bound respectively. The mutation step is also affected by r, which is randomly selected from 0 to 1; t and t_max denote the number of mutations so far and the maximum number of mutations respectively; and d is a variable that controls the concentration of the child population distribution after mutation. Usually, a large d tends to generate children closer to the parents, while a small d allows children to be far from the parents. From the convergence perspective, if d is comparatively small, the distribution obtained from an insignificant number of trials has some randomness; in other words, it is not guaranteed that the child population will be more likely to lie near the parent under insufficient simulation. On the other hand, if d is large enough, insufficient simulation is not a problem.

Figure 2.7 Child population distribution after 10000 iterations given different d for mutation
Figure 2.8 Child population distribution after 50 iterations given d = 0.2 and 5 for mutation respectively
Figure 2.9 Child population distribution after 100, 500, 1000, 10000 iterations given d = 1 for mutation

As the number of simulations increases, the distribution of the child population gradually approaches a smooth curve. Some valuable observations from the simulation again need to be discussed. First of all, d can be considered a spread factor in constructing the mutation operator. Secondly, if d and the number of iterations are small, there will be some uncertainty in the children's distribution; therefore d should not be too small. On the other hand, the child population should be encouraged to spread out during the early search period, because parent solutions in this period are usually far away from the Pareto optimal front, and the search area should be large enough to reach out for better solutions. Lastly, the evolutionary operator should conduct its search in the neighborhood of the parent population toward the end of the run; this is largely because, in the later period, the parent population solutions are assumed to lie in the vicinity of the Pareto optimal front. In summary, a large d fails to satisfy our requirement in the early search stage, while a small d can result in wasteful search. As a result, d should be a function of the iteration count, with its value increasing as the iterations proceed.
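A sketch of this non-uniform mutation, assuming the reading of Eq. (2.4) above (our own illustration; variable names are ours):

```python
import random

def nonuniform_mutation(p, delta, t, t_max, d):
    """Non-uniform mutation per Eq. (2.4): late generations take smaller steps."""
    tau = random.choice([1.0, -1.0])      # mutate toward upper or lower bound
    r = random.random()
    step = 0.5 * tau * delta * (1.0 - r ** ((1.0 - t / t_max) ** d))
    return p + step

random.seed(1)
early = [round(nonuniform_mutation(0.0, 2.0, t=10, t_max=100, d=1.0), 3) for _ in range(4)]
late  = [round(nonuniform_mutation(0.0, 2.0, t=95, t_max=100, d=1.0), 3) for _ in range(4)]
print(early, late)   # late-generation steps cluster much closer to the parent
```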
2.6 Handling of Variable Length Chromosome in VMOGA

One operator proposed by [21] is capable of performing crossover when K is a variable. Assume parent solutions P_1 and P_2 encode K_1 and K_2 cluster centers, and let K_max and K_min denote the upper and lower bounds respectively. Considering that data is always classified into at least 2 groups, K_min = 2. During iteration, the crossover operator randomly selects an integer from max(2, K_1 + K_2 − K_max) to min(K_max, K_1 + K_2 − 2) as the variable length for the child population. The procedure is illustrated in Fig. 2.10.

Figure 2.10 Genetic operator handling variable K by random selection from gene pool

Another crossover operation, proposed by [4], ensures that the exchange of solution information is conducted smoothly. Similarly, assume parent solutions P_1 and P_2 encode K_1 and K_2 cluster centers, and let λ_1 and λ_2 denote the locations of the crossover points in P_1 and P_2 respectively. λ_1 is generated by a random integer selection from 2 to K_1. λ_2 is then generated by a random integer selection from a lower bound \underline{\lambda}_2 to an upper bound \bar{\lambda}_2. If \underline{\lambda}_2 \le \bar{\lambda}_2, the bounds are given by

\underline{\lambda}_2 = \min\left[ 2,\; \max\left( 0,\; 2 - (K_1 - \lambda_1) \right) \right] \quad (2.5)

\bar{\lambda}_2 = K_2 - \max\left[ 0,\; 2 - \lambda_1 \right] \quad (2.6)

Otherwise λ_2 = 0. This is illustrated in Fig. 2.11.

Figure 2.11 Genetic operator handling variable K by exchanging gene segments
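A sketch of the first operator above (the gene-pool crossover of [21]) is given below; this is our own illustration, with centers represented as plain tuples:

```python
import random

def gene_pool_crossover(p1, p2, k_max=10, k_min=2):
    """Variable-K crossover: draw the child's length from the legal range,
    then pick that many centers at random from the pooled parent genes."""
    pool = p1 + p2
    k1, k2 = len(p1), len(p2)
    lo = max(k_min, k1 + k2 - k_max)
    hi = min(k_max, k1 + k2 - k_min)
    k_child = random.randint(lo, hi)
    return random.sample(pool, k_child)

random.seed(2)
p1 = [(0.1, 0.2, 0.3), (0.5, 0.5, 0.5), (0.9, 0.8, 0.7)]   # K1 = 3 centers
p2 = [(0.2, 0.9, 0.4), (0.7, 0.1, 0.6)]                     # K2 = 2 centers
child = gene_pool_crossover(p1, p2)
print(len(child), child)
```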
Despite being easy to implement, a genetic operator that handles variable K by random selection from the gene pool does not follow the binary crossover distribution: K'_1 and K'_2 are randomly selected from the interval with lower bound max(2, K_1 + K_2 − K_max) and upper bound min(K_max, K_1 + K_2 − 2) respectively, so the probability of a child is, to some extent, not higher when it is close to the parent population. The genetic operator that handles K by exchanging gene segments has similar limitations; moreover, it is unable to generate solutions with lengths different from those present after initialization.

In order to solve these issues, the genetic operator in VMOGA is introduced. First of all, to achieve a similar distribution when K is a variable, a similarity index is proposed to measure the similarity between child and parent populations:

S(P, C) = \begin{cases} \dfrac{|C|}{|P|}, & \text{if adding genes} \\[4pt] \dfrac{|P \cap C|}{|P|}, & \text{if losing genes} \end{cases} \quad (2.7)

Then, motivated by the parameter settings in the simulated crossover and mutation distributions, a spreading factor is generated from a normal distribution N(µ, σ²), with µ determined by the parent population and σ adopting the same strategy as η_c and d. In the simulation, the select operator conducts a random selection according to N(µ, σ²): C_1 = P_1 ∪ select(P_2) or C_1 = select(P_1), each with a 50 percent chance, representing expansion and contraction respectively. The results of the simulation are illustrated in Fig. 2.12.

Figure 2.12 Child population distribution by VMOGA genetic operator given 10000 iterations for std = 1, 2, 3, 4
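The following sketch is our loose reading of the VMOGA operator just described; the exact sampling details are not fully specified above, so every concrete choice here (drawing the child length from a normal distribution centered on the parent length, borrowing genes from the other parent on expansion) should be treated as an assumption:

```python
import random

def vmoga_crossover(p1, p2, sigma=2.0, k_min=2, k_max=10):
    """Sketch of VMOGA variable-length variation: the child's length is drawn
    from N(len(p1), sigma^2); genes are added from p2 or dropped from p1."""
    k_child = round(random.gauss(len(p1), sigma))
    k_child = max(k_min, min(k_max, k_child))
    if k_child > len(p1):                        # expansion: borrow genes from p2
        extra = random.sample(p2, min(k_child - len(p1), len(p2)))
        return p1 + extra
    return random.sample(p1, k_child)            # contraction: keep a subset of p1

random.seed(3)
p1 = [(0.1, 0.2, 0.3), (0.5, 0.5, 0.5), (0.9, 0.8, 0.7)]
p2 = [(0.2, 0.9, 0.4), (0.7, 0.1, 0.6)]
print(vmoga_crossover(p1, p2))
```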
CHAPTER 3
EVOLUTIONARY OPERATOR

3.1 Evolutionary Strategy for Low-dimensional Space

Various strategies exist in the literature for searching in low-dimensional space. The most widely used technique is NSGA-II. NSGA-II conducts non-dominated sorting based on fitness (objective) values while preserving diversity among solutions. The scheme for preserving diversity adopts a crowding comparison operator: in general, the operator prefers solutions with lower rank (sorted by non-domination), and if two solutions belong to the same rank, the solution located in the less crowded region is preferred. NSGA-III is a reference-point-based algorithm that adopts a framework similar to NSGA-II.

3.1.1 Non-dominated Sorting Algorithm

The concepts of domination and the Pareto front were given in the introduction. Here, we will see how these concepts are applied in evaluating the solution population. For \min_{x \in X} (f_1(x), f_2(x), \ldots, f_K(x)), let \vec{f} = (f_1, f_2, \ldots, f_K) denote the objective vector, and let x_1, x_2, \ldots, x_N be N solutions. The basic idea is to compare every \vec{f}(x_i) with \vec{f}(x_1), \ldots, \vec{f}(x_{i-1}), \vec{f}(x_{i+1}), \ldots, \vec{f}(x_N); pairwise comparison of all solutions brings a computational complexity of O(KN²). In order to find the first non-dominated front, each solution is assigned two labels: the domination count DC(x_i) and the dominated count DDC(x_i), representing the number of solutions that x_i dominates and the number of solutions by which x_i is dominated. The only requirement for x_i to enter the first front is DDC(x_i) = 0, which means x_i is dominated by no one; hence no solution in the first front is dominated. In order to find the solutions at the next level, the solutions of the first front are temporarily discounted, and the non-dominated solutions among the remaining ones are marked as the second front; in practice, this is done with the help of the stored domination relations, decrementing the dominated count of every solution dominated by a first-front member. The procedure above is repeated until all solutions are ranked, and it is shown in Fig. 3.1.

Figure 3.1 Framework of conducting non-dominated sorting and generating solution front

3.1.2 Diversity Preservation Algorithm

In NSGA-II, convergence of the solutions to the Pareto-optimal set is emphasized. This requires the evolutionary operator to maintain a good spread of solutions; in other words, our evolutionary operator must estimate the density of solutions surrounding a particular solution very quickly after non-domination sorting. Obviously, density estimation is restricted to a certain area, i.e. the neighborhood of a given point. However, it is very difficult to determine how large this neighborhood should be: if it is too small, there might be no points falling in the area for most solutions being examined, while it is computationally ineffective to construct a very large neighborhood. Moreover, density estimation is fundamentally unreliable in high-dimensional spaces. For relatively low-dimensional spaces, the crowding distance operator is proposed to handle this issue. The crowding distance requires sorting the solutions by each objective function value in ascending order. For each objective function f_i, the boundary solutions x_u = {x : f_i(x) = max f_i(x)} and x_l = {x : f_i(x) = min f_i(x)} are assigned an infinite value, which emphasizes the significance of preserving spread-out solutions; for the other solutions, the overall distance is calculated as the sum of the heights and widths of the enclosing cuboids. This is extremely useful in two-objective spaces, since only one number is used to describe density. On other occasions, where two or more features are required to depict density, the non-domination sorting rule can also be applied. In summary, the diversity-preserving operator gives every solution a "crowding estimation" DE and sorts the solutions in descending order; a larger DE(x_i) is preferred in diversity sorting, which indicates that solution x_i is located in a sparse area, and preserving x_i will guide the search toward a uniformly spread-out Pareto optimal front. The procedure for diversity preservation in NSGA-II is illustrated in Fig. 3.2.

Figure 3.2 Framework of conducting diversity preservation in NSGA-II

Table 3.1 Main loop of NSGA-II
Step 1: Initial parent solutions P_1 with population size N are created, and each individual is assigned its corresponding objective values.
Step 2: Create offspring solutions Q_1 of size N from P_1 by the genetic operator, and assign the corresponding objective values.
Step 3: Combine P_1 ∪ Q_1 = R_1. Sort R_1 by the non-domination rule to generate fronts F_1, F_2, ..., F_i, ....
Step 4: Select N individuals from R_1 to construct P_2: an l must exist satisfying |∪_{i=1}^{l-1} F_i| ≤ N and |∪_{i=1}^{l} F_i| ≥ N. The first l − 1 fronts F_1, ..., F_{l-1} are assigned to P_2, and the remaining N − |∪_{i=1}^{l-1} F_i| solutions are selected from F_l by the diversity-preserving operator: sorting the solutions in F_l by DE, the solution x with maximal DE(x) is selected first and assigned to P_2, then F_l = F_l \ x, and so on until |P_2| = N.
Step 5: Go to Step 2 until the iterations are over.

3.1.3 Weakness

When dealing with real-world problems, we often need to optimize four or more objectives. It is well known that with an increasing number of objectives, an increasing proportion of solutions become non-dominated. This is shown in Fig. 3.3.

Figure 3.3 Non-dominated solutions proportion versus number of objectives

Figure 3.3 shows the percentage of non-dominated solutions versus the number of objectives. For example, given |P_t| = N, |Q_t| = N and R_t = P_t ∪ Q_t, we have |R_t| = 2N. If the proportion of non-dominated solutions is more than 50%, then |F_1| > N, which suggests that non-domination sorting is not working in population selection. Consequently, all the selection pressure is placed on the diversity-preserving operator. Recall that it is computationally expensive to calculate diversity in high-dimensional space: the identification of neighbors required by the crowded-comparison operator brings a huge amount of calculation, and approximations made in density estimation to speed up the computation may cause an unacceptable distribution of solutions in the end. To make things even worse, the diversity-preserving operator will then sort N solutions in F_1 instead of the N − |∪_{i=1}^{l-1} F_i| solutions in F_l. In summary, the traditional evolutionary operator leaves no space for accommodating an adequate number of new solutions in the population and causes the diversity preservation operator to collapse during iteration.
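For concreteness, here is a compact, self-contained sketch of the NSGA-II selection described in Sections 3.1.1-3.1.2 (our own simplified illustration, not the reference implementation; it peels fronts by repeated scanning rather than by the book-keeping trick above):

```python
def dominates(f, g):
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def non_dominated_sort(objs):
    """Peel off fronts: F1 = solutions dominated by no one, then repeat."""
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def crowding_distance(objs, front):
    """Sum of normalized neighbor gaps per objective; boundaries get infinity."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[0])):
        order = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[order[0]][k], objs[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        for a, b, c in zip(order, order[1:], order[2:]):
            dist[b] += (objs[c][k] - objs[a][k]) / (hi - lo or 1.0)
    return dist

def nsga2_select(objs, n):
    """Fill the next population front by front; split the last front by crowding."""
    chosen = []
    for front in non_dominated_sort(objs):
        if len(chosen) + len(front) <= n:
            chosen += front
        else:
            dist = crowding_distance(objs, front)
            chosen += sorted(front, key=dist.get, reverse=True)[:n - len(chosen)]
            break
    return chosen

objs = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2.5, 2.5)]
print(nsga2_select(objs, 4))   # indices of the 4 survivors
```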
3.2 Evolutionary Strategy for High-dimensional Space

3.2.1 ε-Non-dominated Sorting Algorithm

In order to optimize many objectives simultaneously, a special domination rule was proposed by [24] by means of changing the dominance rule. In a minimization problem, ε-dominance can be expressed as: x ε-dominates y if f_i(x) + ε_i ≤ f_i(y) for all i ∈ {1, ..., K} and f_i(x) + ε_i < f_i(y) for at least one i. Another, multiplicative, way of conducting ε-dominance is quite similar: x ε-dominates y if f_i(x)(1 + ε_i) ≤ f_i(y) for all i and f_i(x)(1 + ε_i) < f_i(y) for at least one i. The concept of ε-Pareto optimality is given accordingly: x* ∈ X is ε-Pareto optimal if there exists no x ∈ X that ε-dominates x*. Notice that the ε_i must be predefined by some prior knowledge or universal rule. It has been proved that the size of the ε-Pareto optimal set is related to ε: assuming K objectives and 1 ≤ f_i(x) ≤ M for all i,

|F_{\varepsilon}| \le \left( \frac{\log M}{\log(1 + \varepsilon)} \right)^{K-1}. \quad (3.1)

In fact, many different concepts of ε-Pareto optimality and ε-dominance exist in the research literature. If we consider the performance of ε-non-dominated sorting on the objective space, the value of ε actually represents a relative tolerance or constraint on each objective. When ε → 0, ε-dominance places no extra constraint on the solution population; as the value of ε increases, the size of |F_ε| decreases. The idea behind this suggests a tradeoff between convergence speed and diversity.
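The additive rule above is a one-liner; the sketch below (our own illustration) shows that a larger ε demands a larger winning margin before one solution is allowed to dominate another:

```python
def eps_dominates(f, g, eps):
    """Additive epsilon-dominance (minimization): f epsilon-dominates g."""
    return (all(fi + e <= gi for fi, gi, e in zip(f, g, eps)) and
            any(fi + e < gi for fi, gi, e in zip(f, g, eps)))

f, g = (1.0, 2.0), (2.0, 3.0)
print(eps_dominates(f, g, eps=(0.5, 0.5)))   # True: f beats g by a wide margin
print(eps_dominates(f, g, eps=(1.5, 1.5)))   # False: the required margin is too strict
```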
3.2.2 Reference Based Diversity Preservation Rule

In NSGA-III, a predefined set of reference points is adopted to maintain diversity. In this paper, we use Das and Dennis's [021] systematic approach, which places points on a normalized hyperplane. Imagine a mapping from the objective space to a hyperplane; we now focus on the geometric characteristics of this hyperplane. The reference points are uniformly distributed, and the recombined population is mapped into this space. The number of reference points ρ_j ∈ ρ can be computed as \binom{d+P-1}{P}, where d and P denote the number of objectives and the number of divisions (along each objective) respectively. Similarly to before, we would like a solution to get a higher chance of being selected when it is located in a less crowded region. Crowdedness here is measured by counting the total number of solutions attached to each reference point, and a reliable measurement must be designed in order to provide a qualitative description of crowdedness.

3.2.3 Density Estimation

Firstly, in order to make the calculation reasonable, each objective function f_i(x) is normalized:

f'_i(x) = \frac{f_i(x) - \min f_i(x)}{\max f_i(x) - \min f_i(x)}, \quad x \in S_t.

Then, each reference point is associated with its closest solutions. Here, the distance between a reference point and a solution is measured by the projection onto the reference line: the reference line w_j is defined on the hyperplane by joining the reference point with the origin. The perpendicular distance of solution x to reference line w_j is

pd(x, w_j) = \left\| x - \frac{w_j^T x}{\|w_j\|^2} w_j \right\|. \quad (3.2)

Then we calculate

\delta(x_i, w_j) = \begin{cases} 1, & \text{if } pd(x_i, w_j) = \min_{w \in W} pd(x_i, w) \\ 0, & \text{otherwise} \end{cases} \quad (3.3)

where δ(x_i, w_j) = 1 represents that solution x_i is associated with reference line w_j. The number of solutions associated with each reference line w_j is counted as

\rho_j = \sum_{x \in X} \delta(x, w_j). \quad (3.4)

The count ρ_j for reference line w_j provides valuable information for estimating solution density, or crowdedness. Maximum diversity is then achieved by the niche-preservation operation.

3.2.4 Niche-Preservation Operation

We start by calculating, for each solution x, the niche count θ(x) of its associated reference line, and selecting

X' = \{ x : x \in X,\; \theta(x) = \min_{x} \theta(x) \}. \quad (3.5)

The least crowded reference lines are considered first: the solutions associated with these reference lines are selected randomly, and selection stops once the population size reaches N. Otherwise, after taking care of the solutions associated with these reference lines, the least crowded reference lines among the remaining ones are considered. The general procedure is written as Table 3.2.

Table 3.2 General procedure of Niche-Preservation
Step 1: Sort the combined solution population R_t (generation t) by the ε-non-domination rule, generating fronts F_1, F_2, ..., F_i, ....
Step 2: To select N individuals from R_t for P_{t+1}, an l must exist satisfying |∪_{i=1}^{l-1} F_i| ≤ N and |∪_{i=1}^{l} F_i| ≥ N. The first l − 1 fronts F_1, ..., F_{l-1} are assigned to P_{t+1}.
Step 3: The remaining N − |∪_{i=1}^{l-1} F_i| solutions from F_l are associated with reference points, and θ(x) is calculated for every solution x in F_l.
Step 4: Sort the solutions x ∈ F_l in order of θ(x); the solution with minimal θ(x) is selected first and assigned to P_{t+1}, then F_l = F_l \ x, and so on until |P_{t+1}| = N.

Figure 3.4 Framework of implementing niche preservation in NSGA-III
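A small sketch of the association step of Eqs. (3.2)-(3.4) (our own illustration using numpy; the reference lines here are hand-picked unit-simplex directions rather than the Das-Dennis construction):

```python
import numpy as np

def associate(F, W):
    """Associate each (normalized) objective vector with its nearest reference
    line by perpendicular distance, per Eqs. (3.2)-(3.4)."""
    # projection of f onto line w: (w.f / |w|^2) w ; pd = |f - projection|
    pd = np.array([[np.linalg.norm(f - (w @ f) / (w @ w) * w) for w in W]
                   for f in F])
    nearest = pd.argmin(axis=1)                      # delta(x, w) as an index
    niche = np.bincount(nearest, minlength=len(W))   # rho_j niche counts
    return nearest, niche

F = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.7]])   # normalized objectives
W = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])   # hand-picked reference lines
nearest, niche = associate(F, W)
print(nearest, niche)   # crowded lines show high niche counts
```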
3.3 Evolutionary Operator for VMOGA

3.3.1 Tchebycheff Metric

In practice, parameter setting in the ε-non-dominated sorting algorithm is difficult when prior knowledge about the data to be clustered is unknown. Usually ε_1 = ... = ε_d = ε, with d representing the number of objectives. A solution that achieves superb performance on one objective while behaving poorly on all the other objectives is still able to propagate to the next generation. For example, suppose the optimization problem is \min_{x \in X} (f_1(x), f_2(x), \ldots, f_d(x)) and there exists x_0 such that f_1(x_0) << f_1(x) for all x ∈ X \ x_0, while f_i(x_0) >> f_i(x) for all x ∈ X \ x_0 and i = 2, 3, ..., d. Even if ε is very large, still (1 + ε) f_1(x_0) < f_1(x) for all x ∈ X \ x_0; hence x_0 always has a great chance of being selected into the first non-dominated front. Generally speaking, this issue can be alleviated (not eliminated) by adopting ε-dominance with a large ε; however, a large ε will significantly decrease the size of the Pareto front, thus impairing diversity. The Tchebycheff metric offers us inspiration in terms of removing bad solutions from the population. Proposed by [002], the Tchebycheff metric is defined as

TCH(x, w, z^*) = \max_{i \in \{1, \ldots, d\}} w_i \left\| f_i(x) - z^*_i \right\|, \quad (3.6)

where z* denotes the projection of the ideal point on each objective axis, and the weight vector W = [w_1, ..., w_d] characterizes the significance of each objective. In order to reject "unacceptable" solutions, W could be set as w_i = 1, i = 1, ..., d. However, determining the ideal point is very hard, and the sophistication of determining z* prevents us from using this method directly. Furthermore, the order of priority in rejection should be considered: intuitively, solutions that deliver bad performance on d − 1 objectives ought to be removed first, then those bad on d − 2 objectives come next; finally, if the rejection quota has not been used up, solutions with only one corrupted objective should be deleted. If equal importance is placed upon the objectives, the analysis above illustrates that the Tchebycheff metric basically ignores this order of priority in rejection.

3.3.2 A Rejection Operator

Motivated by the Tchebycheff metric and aggregating techniques, a selection index V is proposed and defined as

V(x) = \sum_{i=1}^{d} w_i f^*_i(x), \quad (3.7)

where f*_i(x) denotes the normalized objective value of solution x. Notice that V can take advantage of the niche-preservation process, where normalization has already been done. ε-non-dominated sorting is still implemented in VMOGA after normalization of the objective values. Here, ε is determined by a previous experiment that illustrates the relationship between the non-dominated population proportion P(ε) and the value of ε; we need to ensure that

2N \cdot P(\varepsilon) < N. \quad (3.8)

As a supplement to non-dominated sorting, the rejection operator has a rejection rate of θ/(1+θ): the top θ/(1+θ) fraction of the population with the largest V(x) is rejected after the niche-preservation operation. In order to maintain the population, a new l must exist satisfying |∪_{i=1}^{l-1} F_i| ≤ N(1 + θ) while |∪_{i=1}^{l} F_i| ≥ N(1 + θ); N(1 + θ) − |∪_{i=1}^{l-1} F_i| solutions are selected by the reference-based diversity preservation operator, and then θ/(1+θ) of the combined solutions with population N(1 + θ) are rejected. The evolutionary operator in VMOGA is illustrated in Fig. 3.5.

Figure 3.5 Evolutionary operator in VMOGA
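A sketch of the rejection step (our own illustration of Eq. (3.7); the equal weights and the over-selection factor θ are toy values):

```python
import numpy as np

def reject_worst(F_norm, theta=0.25, w=None):
    """Keep N of the N*(1+theta) over-selected solutions by discarding the
    theta/(1+theta) fraction with the largest aggregate index V (Eq. 3.7)."""
    w = np.ones(F_norm.shape[1]) if w is None else w
    V = F_norm @ w                           # V(x) = sum_i w_i f*_i(x)
    n_keep = round(len(F_norm) / (1.0 + theta))
    return np.argsort(V)[:n_keep]            # indices of the surviving solutions

rng = np.random.default_rng(0)
F_norm = rng.random((10, 4))                 # 10 solutions, 4 normalized objectives
print(reject_worst(F_norm))                  # 8 survivors out of 10 when theta=0.25
```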
4.3 Membership Function

The membership function is vital in constructing objectives. In this paper, we wish to build a framework containing a set of universally applicable objective functions. Cluster validity indices, according to [022], are often selected as objective functions, and they depend heavily on the calculation of membership values. The membership function can be written as a matrix U whose values reflect the belongingness of each data point to the corresponding cluster; it is equivalent to a partition matrix. In image segmentation problems, the fuzzy membership value is calculated by

u_{i,j} = 1 / Σ_{k=1}^{K} ( D(z_i, x_j) / D(z_k, x_j) )^{2/(m−1)},   1 ≤ i ≤ K, 1 ≤ j ≤ n   (4.1)

where D(z_i, x_j) denotes the distance between cluster center z_i and pattern x_j. We have u_{k,j} ∈ [0, 1], and a simple calculation shows that

Σ_{k=1}^{K} u_{k,j} = 1,   Σ_{j=1}^{n} Σ_{k=1}^{K} u_{k,j} = n   (4.2)

Equation (4.2) shows that u_{k,j} reasonably reflects belongingness. It should be pointed out that D could be taken as D(1)(x_i, x_j) or D(2)(x_i, x_j) in the following studies, determined by practical demands: in image segmentation, D(1) and D(2) are adopted when the intensity distribution of pixels and the geographic information are considered, respectively. The fuzzy exponent m is usually set as 2, leading to 2/(m−1) = 2. During implementation, the centers encoded in a chromosome are first extracted (usually given by random generation) and then forwarded to the objective function.

4.4 Refined Membership Function

In practice, u_{k,j} is likely to be affected by noise and artifacts. In theory, we hope that a pixel or pattern gets a bigger chance of being classified into the same cluster as its neighboring pixels. As a result, a refined membership function was proposed by [024]. By evaluating the belongingness of the surrounding area, neighborhood information is incorporated into the refined membership value. Another advantage of the refined membership function is that ambiguity between adjacent patterns (i.e. overlapping) is considerably reduced. In order to construct the refined membership function, some preparations are needed. First of all, the diameter of the neighboring region is predefined (usually set as 3 or 5) and the crisp membership value is calculated via

g_{k,j} = 1 if u_{k,j} ≥ u_{l,j}, l = 1, 2, ..., K;   0 otherwise   (4.3)

Secondly, an estimation of the partition results of the neighboring region is given by

h_{k,j} = Σ_{j∈neighborhood} g_{k,j}   (4.4)

that is, h_{k,j} is defined as the summation of the crisp membership values g_{k,j} in a given neighboring region. In the end, the refined membership function is constructed as the product of h_{k,j} and u_{k,j} with normalization:

u*_{k,j} = u_{k,j} h_{k,j} / Σ_{k=1}^{K} u_{k,j} h_{k,j},   1 ≤ k ≤ K, 1 ≤ j ≤ n   (4.5)

Similarly, u*_{k,j} ∈ [0, 1], and a simple calculation shows that

Σ_{k=1}^{K} u*_{k,j} = 1,   Σ_{j=1}^{n} Σ_{k=1}^{K} u*_{k,j} = n   (4.6)

4.5 Objective Functions

4.5.1 Fuzzy C-means (FCM)

FCM [025][026] is a widely used objective in partition matrix evolution. In fact, single-objective optimization once took FCM as the sole optimizing objective because of its capability of updating the cluster centers every iteration. In this paper, J_m is considered as only one of the objective functions, and it is given by

FCM = J_m = Σ_{j=1}^{n} Σ_{k=1}^{K} u^m_{k,j} D²(z_k, x_j)   (4.7)

where n is the number of data points or patterns (equivalently, the number of pixels in the image) and D(z_k, x_j) denotes the Euclidean distance between data point x_j and center z_k. Notice that J_m sums the variance over all clusters.
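A minimal sketch of Eqs. (4.1) and (4.7), assuming Euclidean distance and m = 2 by default; the vectorized layout (a K × n membership matrix) is an implementation choice of this sketch, not prescribed by the thesis.

```python
import numpy as np

def fuzzy_memberships(X, Z, m=2.0, eps=1e-12):
    """Fuzzy membership matrix U (K x n) of Eq. (4.1).

    X : (n, d) data points; Z : (K, d) cluster centers.
    """
    D = np.linalg.norm(X[None, :, :] - Z[:, None, :], axis=2)   # (K, n)
    D = np.maximum(D, eps)                  # guard against zero distance
    ratio = (D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)          # columns sum to one, Eq. (4.2)

def fcm_objective(X, Z, m=2.0):
    """J_m of Eq. (4.7): membership-weighted within-cluster variance."""
    U = fuzzy_memberships(X, Z, m)
    D2 = np.linalg.norm(X[None, :, :] - Z[:, None, :], axis=2) ** 2
    return np.sum((U ** m) * D2)
```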
A lower value of J_m indicates better compactness, and each data point is classified to the cluster for which it achieves the largest membership value. It is quite intuitive that J_m decreases as the number of clusters K increases, and it can be proved that J_m takes its minimum value when K = n.

4.5.2 Xie-Beni Index (XB)

The Xie-Beni index [027] is a function of the ratio of the total variation σ to the minimum separation min_sep of the clusters. The total variation σ is given by

σ = Σ_{j=1}^{n} Σ_{k=1}^{K} u^m_{k,j} D²(z_k, x_j)   (4.8)

which is the same as J_m, and the minimum separation min_sep(V) = min_{i≠j} {‖z_i − z_j‖²} is a measure of the worst-case scenario of separation in clustering. XB is written as

XB(U, V; X) = σ(U, V; X) / (n × min_sep(V))   (4.9)

4.5.3 Overall Cluster Deviation (DEV)

Dev = Σ_{k=1}^{K} Σ_{x_j∈C_k} D²(z_k, x_j)   (4.10)

When the maximum membership of each data point is assigned 1 and the others 0,

J_m = Σ_{k=1}^{K} Σ_{x_j∈C_k} D²(z_k, x_j) = Dev   (4.11)

In fact, Dev can be viewed as the crisp version of J_m. During iteration, ambiguity on the boundary is eliminated, since the summation in every iteration proceeds right after crisp classification. Dev must be minimized in order to obtain compact clusters. This objective is practically the same as the objective of K-means clustering.

4.5.4 Fuzzy Separation (FSEP)

Fuzzy separation is given by [019]

S = Σ_{i=1}^{K} Σ_{j=1, j≠i}^{K} μ^m_{i,j} D²(z_i, z_j)   (4.12)

The membership degree follows a similar equation to measure the belongingness of each z_i to z_j, i ≠ j:

μ_{i,j} = 1 / Σ_{l=1, l≠j}^{K} ( D(z_j, z_i) / D(z_j, z_l) )^{2/(m−1)},   j ≠ i   (4.13)

S should be maximized in order to obtain well-separated clusters.

4.5.5 Global Separation (SEP)

Global separation [022] is introduced as a supplementary criterion for estimating separation, and it can be considered the crisp version of fuzzy separation. Maximizing sep(V) encourages clusters to be separated from each other; in other words, the distance between cluster centers should be large enough to avoid overlapping.

sep(V) = (2 / (K(K−1))) Σ_{i=1}^{K} Σ_{j≠i} D²(z_i, z_j)   (4.14)

4.5.6 Average Between Group Sum of Squares (ABGSS)

ABGSS, proposed by [028], also measures cluster separation; it computes the average distance of the cluster centers from the centroid of the dataset as follows:

ABGSS = Σ_{i=1}^{K} n_i D²(z_i, z̄) / K   (4.15)

Here, z̄ represents the center of the whole dataset and n_i is the number of data points having maximum membership value corresponding to z_i. ABGSS must be maximized in order to obtain well-separated clusters.

4.5.7 Intra-cluster Entropy (H)

Intra-cluster entropy was brought up in order to improve clustering homogeneity [029]. The average degree of similarity between each cluster center and the data points is given by

g(z_k) = (1/n) Σ_{j=1}^{n} ( 0.5 + CO(z_k, x_j)/2 )   (4.16)

The value of g(z_k) reflects the global probability of grouping all data points into the cluster C_k represented by center z_k, where the cosine distance is written as CO(x_i, x_j) = (x_i · x_j) / (‖x_i‖ ‖x_j‖). The intra-cluster entropy of partition C_i is given by

H(z_i) = −[ g(z_i) log₂ g(z_i) + (1 − g(z_i)) log₂(1 − g(z_i)) ]   (4.17)

In the end, the general intra-cluster entropy is defined as

H = [ Σ_{k=1}^{K} ( 1 − H(z_k) g(z_k) ) ]^{1/d}   (4.18)

where d is the dimension of the dataset. High intra-cluster entropy H is desirable, because low entropy within each cluster enhances the homogeneity of the results.
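Under the reconstruction of Eq. (4.18) above, a sketch of the intra-cluster entropy computation might look as follows; the numerical guards against zero norms and log(0) are assumptions added for robustness, not part of the original definition.

```python
import numpy as np

def intra_cluster_entropy(X, Z, d=None):
    """Intra-cluster entropy H of Eqs. (4.16)-(4.18) (sketch).

    X : (n, dim) data, Z : (K, dim) centers; d defaults to dim.
    """
    if d is None:
        d = X.shape[1]
    total = 0.0
    for z in Z:
        # cosine similarity CO of every point to the center
        cos = X @ z / (np.linalg.norm(X, axis=1) * np.linalg.norm(z) + 1e-12)
        g = np.mean(0.5 + cos / 2.0)                        # Eq. (4.16)
        g = np.clip(g, 1e-12, 1 - 1e-12)                    # keep log2 finite
        Hz = -(g * np.log2(g) + (1 - g) * np.log2(1 - g))   # Eq. (4.17)
        total += 1.0 - Hz * g
    return total ** (1.0 / d)                               # Eq. (4.18)
```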
4.5.8 Davies-Bouldin Index (DB)

The Davies-Bouldin index [030] is a function of the ratio of within-cluster scatter to between-cluster separation. The scatter of cluster Z_i is given as

S_i = Σ_{x_j∈Z_i} D²(z_i, x_j) / |Z_i|   (4.19)

The intra-cluster homogeneity and compactness of a single cluster, relative to its separation from the others, can be estimated by

R_k = max_{j≠k} { (S_k + S_j) / D²(z_k, z_j) }   (4.20)

Overall homogeneity and compactness is then given by

DB = Σ_{k=1}^{K} R_k / K   (4.21)

The Davies-Bouldin index calculates the ratio of within-cluster scatter to inter-cluster separation. For a good partition result, inter-cluster separation as well as intra-cluster homogeneity and compactness should be high. This affirms the idea that no cluster should be similar to another, and hence the best clustering scheme essentially minimizes the Davies-Bouldin index.

4.5.9 Connectedness

Inspired by the concept of the fuzzy adjacency relation from [01], for any pair of pixels (x, y) the affinity function μ_k(x, y) indicates the local hanging-togetherness and is written as

μ_k(x, y) = μ_α(x, y) g(μ_ψ(x, y), μ_φ(x, y))   (4.22)

μ_α(x, y) represents the adjacency relation, which is defined as

μ_α(x, y) = 1 if ‖x − y‖ ≤ 1;   0 otherwise   (4.23)

g should be a monotonically non-decreasing function, and its two arguments μ_ψ(x, y) and μ_φ(x, y) represent object-feature-based and homogeneity-based functions. In our case, the calculation of g(μ_ψ(x, y), μ_φ(x, y)) can be simplified to

g(μ_ψ(x, y), μ_φ(x, y)) = 1 if x, y ∈ C_k;   0 otherwise   (4.24)

The shortest path between two points x, y is denoted as d_short(x, y), measured along the relative neighborhood graph. First, all possible paths connecting x and y are found; assume x_1, x_2, ..., x_k are intermediate points forming a path C(x_1, x_2, ..., x_k) that satisfies

μ_k(x_k, x_{k+1}) = 1   (4.25)

Set |C(x_1, x_2, ..., x_k)| = k and d_short(x, y) = min_{c∈C} |c|. Otherwise, if there is no path between x and y, set d_short(x, y) = ∞. For a given cluster C_k, we calculate the number of its connected components and denote it as L(C_k). Then

Conn = max_{k∈{1,2,...,K}} L(C_k)   (4.26)

This cluster validity index is minimized during optimization.
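Counting connected components per cluster can be done with standard image-labelling tools; the sketch below assumes SciPy is available and uses a 4-connected structuring element, which mirrors the Euclidean adjacency relation ‖x − y‖ ≤ 1 of Eq. (4.23).

```python
import numpy as np
from scipy import ndimage

def connectedness(label_img, K):
    """Conn objective of Eq. (4.26) for an image labelled into K clusters.

    For each cluster, count its 4-connected components; Conn is the
    worst (largest) count over all clusters and is to be minimized.
    """
    structure = np.array([[0, 1, 0],
                          [1, 1, 1],
                          [0, 1, 0]])   # 4-connectivity: ||x - y|| <= 1
    counts = []
    for k in range(K):
        _, num = ndimage.label(label_img == k, structure=structure)
        counts.append(num)
    return max(counts)

# Illustrative usage: a 2-cluster labelling with a fragmented cluster 1.
img = np.array([[0, 0, 1],
                [0, 1, 0],
                [1, 0, 1]])
print(connectedness(img, K=2))  # cluster 1 splits into several components
```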
CHAPTER 5
EVALUATION OF OBJECTIVES

As mentioned before, there is no single objective function that works well for all kinds of datasets. Moreover, the performance of a multi-optimization algorithm is also likely to be harmed by objectives that provide insufficient coverage of data features. Therefore, the selection of objectives is quite important. An increasing number of objectives generally enhances optimizing performance while adding computational complexity. Multi-objective optimization is performed on a number of, often conflicting, objectives. The degree of conflict between two objectives increases if optimization of one objective violates the assumptions of the other. One example is optimizing XB and FCM together: FCM is strongly correlated with global variance, while the XB index is a combination of global variance and minimum separation as numerator and denominator, respectively. The separation between the closest clusters is taken as the worst-case scenario here; its value increases as clusters become well separated from each other, which in turn increases the global variance. It is shown in [015] that minimizing the XB index and FCM simultaneously yields good results. It turns out to be much more difficult to give a similar analysis for other objective functions based on such limited knowledge. Therefore, information theory is introduced to solve this issue.

5.1 Multivariate Mutual Information

5.1.1 Mutual Information Between Two Variables

Generally speaking, mutual information [031] measures the amount of information about one variable given knowledge of another variable, and it is mostly defined on two variables. In probability theory, mutual information gives a quantitative measurement of the amount of information obtained about one variable through another. Our problem here requires knowledge of the interaction between more than two variables. Firstly, in the two-objective case, assume X and Y have distributions P_X(x) and P_Y(y), respectively. The entropies are given by

H(X) = − Σ_{x∈X} P_X(x) log P_X(x)   (5.1)

H(Y) = − Σ_{y∈Y} P_Y(y) log P_Y(y)   (5.2)

and the joint entropy is written as

H(X, Y) = − Σ_{x∈X, y∈Y} P_XY(x, y) log P_XY(x, y)   (5.3)

Mutual information is defined in terms of the individual entropies and the joint entropy:

I(X; Y) = H(X) + H(Y) − H(X, Y)   (5.4)

Now we want to expand to the many-objective situation; the idea underlying this is dimension reduction.

5.1.2 Mutual Information Among Multiple Variables

Consider three variables U, V, Y. If V has no effect on U and Y, then I_V(U; Y) = I(U; Y). On the other hand, if V affects the relationship between U and Y, we can eliminate V by taking a weighted sum (over the probability of occurrence of each particular value of V) of the mutual information between U and Y for each value of V:

I_V(U; Y) = Σ_{v∈V} P_V(v) I(U; Y | V = v) = I(U; Y | V)   (5.5)

I(U; Y | V) = I(U, V; Y) − I(V; Y)   (5.6)

Since

I(U, V; Y) = H(U, V) + H(Y) − H(U, V, Y);   I(U; Y) = H(U) + H(Y) − H(U, Y)   (5.7)

we can infer that

I_V(U; Y) = H(U, V) − H(U, V, Y) − H(V) + H(V, Y)   (5.8)

A simple subtraction between I(U; Y) and I_V(U; Y) can be written as

I(U; Y) − I_V(U; Y) = H(U) + H(V) + H(Y) − (H(U, V) + H(V, Y) + H(U, Y)) + H(U, V, Y)   (5.9)

This is also called the mutual interaction between V, U and Y. The analysis above presumes that the probability distribution of each variable is known. Now we can easily extend the mutual information calculation to the general case:

I(X_1; X_2) = H(X_1) − H(X_1 | X_2)   (5.10)

I(X_1; X_2; X_3) = I(X_1; X_2) − I(X_1; X_2 | X_3)   (5.11)

and

I(X_1; X_2; ...; X_N) = I(X_1; X_2; ...; X_{N−1}) − I(X_1; X_2; ...; X_{N−1} | X_N)   (5.12)

These equations can be expressed in terms of entropies:

I(X_1; X_2; ...; X_N) = (H(X_1) + H(X_2) + ... + H(X_N)) − (H(X_1, X_2) + ...) + ... + (−1)^{N−1} H(X_1, X_2, ..., X_N)   (5.13)

When processing only two variables, I is always nonnegative: if I = 0, then X and Y are independent and knowing X does not give any information about Y; I > 0 implies information is shared by X and Y, and the larger I is, the more information is shared. The inefficiency of multivariate mutual information is that it can be either positive or negative. For instance, consider the case where U and Y are independent of each other and V is a variable dependent only on U; then I(U; V; Y) is negative.

5.2 Total Correlation

Total correlation is the amount of information shared among the variables in a set. In information theory, total correlation is one of several generalizations of mutual information. For a given set of N random variables {X_1, X_2, ..., X_N}, total correlation is defined as the Kullback-Leibler divergence from the joint distribution p(X_1, X_2, ..., X_N) to the independent distribution p(X_1)p(X_2)...p(X_N) [032]:

C(X_1, X_2, ..., X_N) = D_KL[ p(X_1, X_2, ..., X_N) ‖ p(X_1)p(X_2)...p(X_N) ]   (5.14)

Reducing the divergence to a simpler difference of entropies,

C(X_1, X_2, ..., X_N) = Σ_{i=1}^{N} H(X_i) − H(X_1, X_2, ..., X_N)   (5.15)

which can also be written as

C(X_1, X_2, ..., X_N) = Σ_{i,j} I(X_i; X_j) + Σ_{i,j,k} I(X_i; X_j; X_k) + ... + I(X_1; X_2; ...; X_N)   (5.16)
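Eq. (5.15) suggests a direct plug-in estimator of total correlation from a population of objective vectors; the sketch below assumes equal-width histogram discretization with a fixed bin count, which is an illustrative choice rather than the thesis procedure.

```python
import numpy as np

def entropy_from_counts(counts):
    """Shannon entropy (bits) of a discrete distribution given by counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def total_correlation(samples, bins=8):
    """Estimate C(X1,...,XN) = sum_i H(Xi) - H(X1,...,XN), Eq. (5.15).

    samples : (n_obs, N) array, e.g. objective values of a population,
    discretized into equal-width bins to estimate the probabilities.
    """
    n_obs, N = samples.shape
    digitized = np.empty_like(samples, dtype=int)
    for i in range(N):
        edges = np.linspace(samples[:, i].min(), samples[:, i].max(), bins + 1)
        digitized[:, i] = np.clip(np.digitize(samples[:, i], edges), 1, bins) - 1
    # sum of marginal entropies
    h_marg = sum(entropy_from_counts(np.bincount(digitized[:, i], minlength=bins).astype(float))
                 for i in range(N))
    # joint entropy via counting unique discretized tuples
    _, joint_counts = np.unique(digitized, axis=0, return_counts=True)
    return h_marg - entropy_from_counts(joint_counts.astype(float))
```

A high total correlation among a candidate set of objectives signals redundancy, which is exactly the information this chapter uses to guide objective selection.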
CHAPTER 6
IMPLEMENTATION

6.1 Minor Specifications

6.1.1 Handling of Infinite Solutions in Objective Space

It is not rare in practice that prior knowledge is inaccessible and the number of clusters in an implementation is unknown. An infinite value in objective space is then possible, for instance when some cluster Z_i is empty, so that |Z_i| = 0. Recall that the within-cluster scatter is given as

S_i = Σ_{x_j∈Z_i} D²(z_i, x_j) / |Z_i|   (6.1)

so S_i goes to infinity. Notice that normalization would then involve computing (∞ − C)/∞, which is unlikely to generate a reasonable result. In our implementation, solutions with infinite objective values are detected immediately and deleted.

6.1.2 Handling of Solutions with the Same Objectives or Diversity

Solutions with the same objectives or diversity are always selected randomly until the solution population reaches the threshold N.

6.1.3 Handling of Solutions with the Same Membership on Two Clusters

Assume that a data point x_j has equivalent maximum membership for clusters Z_1 and Z_2, i.e. u_{1,j} = u_{2,j}, due to fuzzy clustering. This often implies that x_j is on the boundary of Z_1 and Z_2. In order to take advantage of fuzzy clustering, data points with equivalent degrees of belongingness are assigned to more than one cluster. Some objectives, for instance deviation, require calculating Dev = Σ_{k=1}^{K} Σ_{x_j∈C_k} D²(z_k, x_j); in VMOGA, x_j is then assigned to both Z_1 and Z_2.

6.2 Parameter and Experiment Setting

6.2.1 Initialization and Objective Selection

Table 6.1 Initialization parameters

Parameter                            Value
Number of generations                50
Population size                      100
Parent solution population           100
Child solution population            100
Upper bound on number of clusters    30
Lower bound on number of clusters    2

Six different methods are selected for comparison. Fuzzy c-means (FCM) is a clustering method that allows one pixel to belong to two or more clusters. FCM was proposed by [9] and is frequently applied in pattern recognition. It is based on minimization of the FCM objective function

FCM = Σ_{j=1}^{n} Σ_{k=1}^{K} u^m_{k,j} D²(z_k, x_j)   (6.2)

Partitioning is carried out through an iterative optimization of the FCM objective shown above, with the update of the memberships u_{i,j} and the cluster centers z_k:

u_{i,j} = 1 / Σ_{k=1}^{K} ( D(z_i, x_j) / D(z_k, x_j) )^{2/(m−1)},   1 ≤ i ≤ K, 1 ≤ j ≤ n   (6.3)

z_k = Σ_{j=1}^{n} u^m_{k,j} x_j / Σ_{j=1}^{n} u^m_{k,j}   (6.4)

NSGA-II is one of the most widely adopted two-objective optimization genetic algorithms. In our experiment, MOVGA and NSGA-II optimize the same objectives (FCM and XB), while the number of clusters K is fixed for NSGA-II and set as a variable for MOVGA. Similarly, NSGA-III and VMOGA-1 both optimize FCM, XB, 1/FSEP, DB and 1/H, while K is fixed for NSGA-III and set as a variable for VMOGA-1. In VMOGA-2, another objective function, Conn, which evaluates connectedness, is incorporated into our framework. However, the introduction of Conn disrupts the balance of the previous objective set; the number of clusters K, which is set as a variable in VMOGA, is adopted to address this issue. SO (single-objective optimization of FCM + XB + 1/FSEP + DB + 1/H) is taken as a supplementary criterion in our comparison, with the corresponding weight of each objective set to one. Recall that a weight value, or priority, for each objective function is needed before implementing aggregating methods. Optimizing SO in our experiment also excludes the possibility that equal weights happen to be the right answer, since in that case the rejection operator would invalidate the rest of the evolutionary operator.
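For reference, the FCM baseline's alternating updates of Eqs. (6.3) and (6.4) can be sketched as follows; the random center initialization and fixed iteration budget are assumptions made for illustration.

```python
import numpy as np

def fcm(X, K, m=2.0, iters=100, seed=0, eps=1e-12):
    """Baseline FCM via the alternating updates of Eqs. (6.3)-(6.4).

    Returns final centers Z (K, d) and memberships U (K, n).
    """
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=K, replace=False)]   # random initial centers
    for _ in range(iters):
        D = np.maximum(np.linalg.norm(X[None] - Z[:, None], axis=2), eps)
        ratio = (D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=1)                    # Eq. (6.3)
        Um = U ** m
        Z = (Um @ X) / Um.sum(axis=1, keepdims=True)   # Eq. (6.4)
    return Z, U
```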
Table 6.2 Choice of objectives for FCM, SO, NSGA-II, MOVGA, NSGA-III and VMOGA

Methods     Jm   XB   1/FSEP   DB   1/H   Conn
FCM         √
SO¹         √    √    √        √    √     √
NSGA-II     √    √
MOVGA       √    √
NSGA-III    √    √    √        √    √
VMOGA       √    √    √        √    √     √

¹ SO optimizes normalized Jm + XB + 1/FSEP + DB + 1/H + Conn.

6.2.2 Parameter Encoding and Genetic Operators

Methods with a predefined number of clusters K (NSGA-II, NSGA-III, SO) adopt simulated binary crossover with η_c = 10 and a mutation operator with distribution index d = 10, with crossover probability and mutation probability equal to 1 and 0.2, respectively. In VMOGA, the crossover operation is determined by a normal distribution N(μ, σ²), where μ is the length of the parent solution chromosome and

σ = 4 − 3 (t / t_max)^{1/2}

with t denoting the iterations so far and t_max = 200 the total number of generations. The mutation operator parameter is

d = 0.1 + 9.9 (t / t_max)^{1/2}

Moreover, the rejection rate θ = 0.1 for VMOGA, and in the reference-based sorting algorithm the number of divisions is set as 4.

6.2.3 Obtaining the Final Solution

Notice that a set of non-dominated solutions is generated by the multi-optimization algorithm, and none of these solutions can be improved in one objective without degrading another. Thus, we need to make a tradeoff, or use a selection scheme, to choose one particular solution. Index I [6] is efficient as a supplementary validity index to measure the goodness of clustering results. It is given by

I(K) = (1/K) × (E_1 / E_K) × D_K   (6.5)

where K denotes the number of clusters. As observed from the equation, Index I is composed of 1/K, E_1/E_K and D_K. Here E_1 = Σ_{j=1}^{n} D(z_1, x_j) is taken as a constant, with z_1 representing the global center of the dataset, and E_K = Σ_{k=1}^{K} Σ_{x_j∈Z_k} D(z_k, x_j). Recall that E_K has the same expression as Dev, which needs to be minimized in order to enhance the compactness of clustering results. D_K = max_{i,j} D(z_i, z_j) denotes the largest distance between two cluster centers, and a larger D_K indicates better separation. In summary, Index I is to be maximized. During implementation, we select the solution from the last generation with the largest I.
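A minimal sketch of Index I per Eq. (6.5), computed from a crisp labelling; taking the data mean as the global center z_1 follows the definition of E_1 above, while the input layout is an assumption of this sketch.

```python
import numpy as np

def index_I(X, Z, labels):
    """Index I of Eq. (6.5), used to pick one solution from the final front.

    X : (n, d) data, Z : (K, d) centers, labels : (n,) crisp assignments.
    """
    K = len(Z)
    z1 = X.mean(axis=0)                                    # global center
    E1 = np.linalg.norm(X - z1, axis=1).sum()
    EK = sum(np.linalg.norm(X[labels == k] - Z[k], axis=1).sum()
             for k in range(K))
    DK = max(np.linalg.norm(Z[i] - Z[j])                   # largest center gap
             for i in range(K) for j in range(i + 1, K))
    return (1.0 / K) * (E1 / EK) * DK

# The non-dominated solution with the largest Index I is kept.
```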
6.3 Experiment on MRI Data

6.3.1 Introduction of MRI Data

Image segmentation plays an important role in medical image analysis. Automatic segmentation of MRI brain images into different classes is very important in clinical studies and neurological pathology. However, segmentation has always been challenging for MRI images, since these images are noisy and imprecise in nature. One important aspect of evaluating segmentation results is calculating regional volumes. In real-life applications, regional volume calculations often provide very useful diagnostic information. Among them, the quantization of gray and white matter volumes may be of major interest in neurodegenerative disorders such as Alzheimer's disease, movement disorders such as Parkinson's or Parkinson-related syndromes, white matter in metabolic or inflammatory disease, congenital brain malformations, perinatal brain damage, or post-traumatic syndrome. The normal brain image data provided by [22] is a great source for evaluating the efficiency of the proposed algorithm, as a ground truth table is available for these images. There are ten classes present in the normal brain images, each given an integer label between 0 and 9 representing its content.

Table 6.3 Truth table for MRI brain image

Label   Class
0       Background
1       CSF
2       Grey Matter
3       White Matter
4       Fat
5       Muscle/Skin
6       Skin
7       Skull
8       Glial Matter
9       Connective

A comparison of results is established with MOVGA, proposed by [4], and evaluated by the adjusted Rand index (ARI) [25].

6.3.2 Adjusted Rand Index (ARI)

Clustering quality is mostly evaluated by external measurements. The Rand index (RI) takes two partitions as input (one of which is the correct solution from the truth table); the number of pairwise co-assignments of data items between the two partitions is counted by RI. The adjusted Rand index (ARI) [25] additionally introduces a statistically induced normalization in order to yield values close to zero for random partitions. Suppose the true partition is available and denoted as T = {T_1, T_2, ..., T_r} for MRI data X = {X_1, X_2, ..., X_N}, and our segmentation result is written as C = {C_1, C_2, ..., C_s}. The overlap between T and C can be summarized in a contingency table, illustrated in Table 6.4, where each element denotes the number of objects common to T_i and C_j, n_ij = |T_i ∩ C_j|, and Σ_{i=1}^{r} a_i = Σ_{j=1}^{s} b_j = N.

Table 6.4 Contingency table

        C_1    C_2    ...   C_s    Sums
T_1     n_11   n_12   ...   n_1s   a_1
T_2     n_21   n_22   ...   n_2s   a_2
...     ...    ...    ...   ...    ...
T_r     n_r1   n_r2   ...   n_rs   a_r
Sums    b_1    b_2    ...   b_s

ARI = [ Σ_{i,j} C(n_ij, 2) − Σ_i C(a_i, 2) Σ_j C(b_j, 2) / C(N, 2) ] / [ (1/2)( Σ_i C(a_i, 2) + Σ_j C(b_j, 2) ) − Σ_i C(a_i, 2) Σ_j C(b_j, 2) / C(N, 2) ]   (6.6)

where C(·, 2) denotes the binomial coefficient. ARI = 1 when the clustering result perfectly matches the true partition T, and it is approximately 0 for a random classification.

6.3.3 Evaluation of Segmentation Results

The algorithms have been applied to the images corresponding to the Z planes Z10, Z72, Z108, Z120 and Z140. Results are presented in Figures 6.1–6.5. It appears that the proposed VMOGA has identified the different homogeneous regions very well.

Figure 6.1 (a) Original MRI image in Z10 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.2 (a) Original MRI image in Z72 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.3 (a) Original MRI image in Z108 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.4 (a) Original MRI image in Z120 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering
Figure 6.5 (a) Original MRI image in Z140 plane (b) Ground truth table (c) Corresponding segmented image produced by VMOGA clustering

The ARI scores are reported in Table 6.5.

Table 6.5 ARI results for the Z10, Z72, Z108, Z120 and Z140 planes

Plane   FCM     SO      NSGA-II   NSGA-III   MOVGA   VMOGA
Z10     0.509   0.719   0.722     0.786      0.744   0.801
Z72     0.661   0.698   0.727     0.725      0.664   0.719
Z108    0.450   0.386   0.732     0.791      0.767   0.808
Z120    0.604   0.591   0.761     0.810      0.758   0.825
Z140    0.557   0.627   0.816     0.886      0.827   0.893

It is evident from Table 6.5 that the proposed VMOGA generally outperforms the other algorithms. Meanwhile, the VMOGA and NSGA-III results illustrate that more objectives provide extra coverage of data characteristics and hence enhance clustering performance. The comparison of FCM, NSGA-II and NSGA-III further illustrates the importance of optimizing many objectives simultaneously. Results of single-objective optimization are also presented in the table.
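For reference, the ARI of Eq. (6.6) is available off the shelf; a minimal usage sketch, assuming scikit-learn is installed and using stand-in random labelings in place of real segmentations:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# truth and pred are flattened per-pixel class labels of one Z plane;
# the random arrays here merely stand in for real segmentations.
rng = np.random.default_rng(0)
truth = rng.integers(0, 10, size=10000)
pred = rng.integers(0, 10, size=10000)

print(adjusted_rand_score(truth, pred))  # close to 0 for random partitions
```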
Recall that the rejection operator V is adopted in VMOGA to remove bad solutions. It is therefore also necessary to eliminate the possibility that the best results could be selected by the index V on its own. The single-objective optimization of V here shows that the rejection operator by itself is incapable of selecting good solutions.

6.3.4 Detection of White and Gray Matter

A further application of the MRI segmentation results is the estimation of white and gray matter. Gray matter is an important reference in diagnosing neurodegenerative disorders such as Alzheimer's disease and Parkinson's or Parkinson-related syndromes, while white matter is important for metabolic or inflammatory disease, congenital brain malformations, perinatal brain damage, or post-traumatic syndrome. Here the detection accuracy (ACC) is defined as

ACC = (TP + TN) / (TP + FP + FN + TN)   (6.7)

where TP, FP, TN and FN are interpreted as follows. True positive: white/gray matter correctly identified as white/gray matter; false positive: non-white/gray matter incorrectly identified as white/gray matter; true negative: non-white/gray matter correctly identified as non-white/gray matter; false negative: white/gray matter incorrectly identified as non-white/gray matter. Moreover, VA (volume prediction accuracy) is proposed to examine the accuracy of the predicted white or gray matter volume. VA is given by

VA = |P − T| / |T|   (6.8)

where T and P denote the true volume and the volume predicted by segmentation, respectively; being a relative error, a lower VA is better. Comparisons of gray and white matter segmentation against the truth labels and the VMOGA results are illustrated in Figures 6.6–6.9.
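A hedged sketch of Eqs. (6.7) and (6.8) for a single tissue class, operating on boolean masks (the mask representation is an assumption of this sketch):

```python
import numpy as np

def acc_and_va(truth_mask, pred_mask):
    """Detection accuracy (Eq. 6.7) and volume error (Eq. 6.8) for one
    tissue class, given boolean truth and prediction masks."""
    tp = np.sum(truth_mask & pred_mask)
    tn = np.sum(~truth_mask & ~pred_mask)
    fp = np.sum(~truth_mask & pred_mask)
    fn = np.sum(truth_mask & ~pred_mask)
    acc = (tp + tn) / (tp + fp + fn + tn)
    va = abs(pred_mask.sum() - truth_mask.sum()) / truth_mask.sum()
    return acc, va
```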
Figure 6.6 (a) Gray Matter ground truth (b) Gray Matter segmentation produced by VMOGA on Z108 plane (a) White Matter ground truth (b) White Matter segmentation produced by VMOGA on Z108 plane
Figure 6.7 (a) Gray Matter ground truth (b) Gray Matter extraction produced by VMOGA+Extraction on Z108 plane (a) White Matter ground truth (b) White Matter segmentation produced by VMOGA-based extraction on Z108 plane
Figure 6.8 (a) Gray Matter ground truth (b) Gray Matter segmentation produced by VMOGA on Z120 plane (a) White Matter ground truth (b) White Matter segmentation produced by VMOGA on Z120 plane
Figure 6.9 (a) Gray Matter ground truth (b) Gray Matter extraction produced by VMOGA+Extraction on Z120 plane (a) White Matter ground truth (b) White Matter segmentation produced by VMOGA-based extraction on Z120 plane

[Table 6.6 Gray and White Matter extraction results (ACC and VA per method: FCM, SO, NSGA-II, NSGA-III, MOVGA, VMOGA and VMOGA+Extraction) for normal brain images for the Z72, Z80, Z100, Z108 and Z120 planes]

Figures 6.6–6.9 show the gray and white matter clustered using the VMOGA technique. Visually, it provides a clustering structure similar to that of the corresponding truth table. However, VMOGA fails to identify skin and skin/muscle matter, which disturbs measurement accuracy. Therefore, connectedness-based extraction is adopted as a supplementary procedure in the quantization of gray and white matter, and the corresponding segmentation results for the Z108 and Z120 planes are shown in Figures 6.7 and 6.9. It is illustrated in Table 6.6 that the proposed VMOGA generally achieves higher accuracy when extracting white and gray matter. Meanwhile, the VMOGA and NSGA-III results illustrate that an increasing number of objectives provides growing coverage of data characteristics, and the comparison of FCM, NSGA-II and NSGA-III further illustrates the importance of optimizing many objectives simultaneously. Furthermore, Table 6.6 shows that evident discrepancies exist in the quantization of gray and white matter from VMOGA alone. The superiority of the proposed Extraction (results from VMOGA plus connectedness-based extraction) has been demonstrated: it significantly reduces both the detection error and the volume prediction error.

6.4 Implementation on Fatigue Area Detection

The proposed image processing method is validated on NDE data obtained from the inspection of delamination in glass fiber reinforced polymer (GFRP) specimens subjected to Mode I fatigue tests. A GFRP specimen is periodically imaged from its healthy state to its damaged state using optical transmission scanning (OTS) [26]. OTS has previously been used for tracking impact damage growth in GFRP specimens [27].
In the experiment, the sample is illuminated by a laser source, and the power transmitted through the sample is recorded by a photodetector to examine the extent of delamination in the GFRP sample. At first, the critical displacement at which the specimen cracks is recorded by applying monotonic loading to five similar samples; the process is repeated for all specimens and the average critical displacement is computed. Fatigue loading is then conducted on a new sample under cyclic loading at 5 cycles/sec and a displacement ratio of 0.1. OTS images of the Mode I GFRP sample subjected to 8 rounds of cyclic loading are presented in Figure 6.10.

Figure 6.10 Healthy sample subjected to Mode I cyclic loading after (a) 20K cycles (b) 40K cycles (c) 60K cycles (d) 80K cycles (e) 100K cycles (f) 120K cycles (g) 140K cycles (h) 160K cycles

From the OTS images, the extent of delamination can be observed as the region between the end of the Teflon insert and the beginning of the healthy part of the sample. The images show that the delamination inside the specimen grows slowly as the number of load cycles increases. The piezoelectric (PZT) sensors attached to the GFRP sample are marked as reference points and are used to calculate the physical area of delamination from the delaminated pixel count d_pix. During preprocessing, the locations of the two PZT sensors are identified and the pixel distance between their inner edges is recorded as l_pix; additionally, the upper and lower edges of the sample are identified and its pixel width is recorded as w_pix. The cutting results are presented in Figure 6.11.

Figure 6.11 PZT cutting image of healthy sample subjected to Mode I cyclic loading after (a) 20K cycles (b) 40K cycles (c) 60K cycles (d) 80K cycles (e) 100K cycles (f) 120K cycles (g) 140K cycles (h) 160K cycles

The area of delamination in the OTS image was originally computed using an image processing algorithm implemented in MATLAB. The delaminated area is identified using segmentation via the fast marching method (FMM) [02], whose corresponding threshold for extracting the fatigue area is known. The physical distance between the two PZT sensors, L_phy, and the width of the sample, W_phy, are measured before segmentation. Finally, the physical delamination area D_phy is calculated by

D_phy = d_pix / (l_pix × w_pix) × (L_phy × W_phy) cm²   (6.9)

where L_phy = 10 cm and W_phy = 2.5 cm. Notice that unsupervised VMOGA generates a result that contains an edge effect which in practice does not belong to the delamination.

Figure 6.12 Edge effect and real delamination area in PZT cutting image of healthy sample after 160K cycles

In order to separate the edge effect from the real delamination, a special window function is adopted. Recall the process of constructing the refined membership function, where

g_{k,j} = 1 if u_{k,j} ≥ u_{l,j}, l = 1, 2, ..., K;   0 otherwise

In fact, g_{k,j} is the crisp version of the fuzzy membership u_{k,j}. W = {w_i} is predetermined as a group of rectangular windows with width equal to one, and the length of w_i is denoted as l_i. Then

h_{k,j} = Σ_{j∈w_i} g_{k,j}   (6.10)

T = (l_i + 1) / 2   (6.11)

h*_{k,j} = 1 if h_{k,j} ≥ h_{l,j}, l = 1, 2, ..., K;   0 otherwise   (6.12)

The largest component of the ROI produced by h*_{k,j} is selected as our segmentation result. Binary images denoting the delamination area are shown in Figures 6.13 and 6.14.
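A one-line realization of Eq. (6.9); the example pixel counts in the usage line are purely illustrative.

```python
def physical_area(d_pix, l_pix, w_pix, L_phy=10.0, W_phy=2.5):
    """Physical delamination area (cm^2) from pixel counts, Eq. (6.9).

    d_pix : delaminated pixel count; l_pix, w_pix : pixel distance
    between the PZT inner edges and pixel width of the sample.
    """
    return d_pix / (l_pix * w_pix) * (L_phy * W_phy)

# e.g. 5000 delaminated pixels in a 400 x 100 pixel sample region
print(physical_area(5000, 400, 100))  # 3.125 cm^2
```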
Figure 6.13 Healthy sample subjected to Mode I cyclic loading after 20K, 40K, 60K and 80K cycles
Figure 6.14 Healthy sample subjected to Mode I cyclic loading after 100K, 120K, 140K and 160K cycles

In the end, the number of load cycles versus the detected delamination area is plotted in Figure 6.15. It is clear that the curve of the delamination area detected by VMOGA+Extraction fits the FMM result very well.

Figure 6.15 Plot of number of load cycles versus delamination area detected by VMOGA+Extraction and FMM

CHAPTER 7
CONCLUSION AND FUTURE WORK

In this article, a variable-length many-objective optimization technique (VMOGA) has been proposed to evolve the number of clusters automatically. NSGA-III has been adopted as the underlying optimization framework due to its capability of managing a large number of objectives. Essential components of existing multi-objective optimization algorithms have been discussed and developed. First of all, a new encoding and genetic operator was designed, following the principle underlying the binary operator, in order to reduce redundant computation. Then, weaknesses of the existing evolutionary operator were discussed, and a rejection scheme was proposed to remove "bad" solutions that occupy population space. Furthermore, total correlation was introduced as a supplementary tool for making a suitable choice of objectives. Conventional objective functions were presented, and spatial J_m, intra-cluster entropy and connectedness, which incorporate various aspects of spatial information, were also constructed in order to better target image segmentation problems. The superiority of VMOGA was demonstrated in segmenting MRI human brain images by comparison with several other optimization algorithms. Finally, VMOGA was applied to the detection of the delamination area caused by fatigue loading in Mode I glass fiber reinforced polymer (GFRP) samples; the results suggest potential future application in fatigue detection.

However, some limitations still exist in the proposed VMOGA. It is known that, among the large number of cluster validity indices, none performs satisfactorily for a wide range of datasets on its own. Therefore, it is important to select two or more objective functions that complement and compensate for one another; this is illustrated by the MRI experiment. We know that different tissues may hold similar intensity information at some locations, while the intensity value of a particular tissue may vary across some regions. This is caused by noise and intensity inhomogeneities in the data collection phase. As a result, it is possible to improve clustering results by incorporating extra spatial information. However, the amount of spatial information needed is still unknown. In future work, the balance between geography-based and intensity-based objectives should be examined.

BIBLIOGRAPHY

[1] K. Deb and H. Jain, "An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints," IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 577–601, 2014.

[2] S. Bandyopadhyay, U. Maulik, and A. Mukhopadhyay, "Multiobjective genetic clustering for pixel classification in remote sensing imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 5, pp. 1506–1511, 2007.
[3] A. Mukhopadhyay and U. Maulik, "Unsupervised pixel classification in satellite imagery using multiobjective fuzzy clustering combined with SVM classifier," IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 4, pp. 1132–1138, 2009.

[4] A. Mukhopadhyay and U. Maulik, "A multiobjective approach to MR brain image segmentation," Applied Soft Computing, vol. 11, no. 1, pp. 872–880, 2011.

[5] J. C. Bezdek and N. R. Pal, "Some new indexes of cluster validity," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 28, no. 3, pp. 301–315, 1998.

[6] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.

[7] S. Bandyopadhyay, U. Maulik, and R. Baragona, "Clustering multivariate time series by genetic multiobjective optimization," Metron, vol. 68, no. 2, pp. 161–183, 2010.

[8] G. N. Demir, A. S. Uyar, and S. G. Ögüdücü, "Graph-based sequence clustering through multiobjective evolutionary algorithms for web recommender systems," in Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pp. 1943–1950, ACM, 2007.

[9] N. R. Pal and J. C. Bezdek, "On cluster validity for the fuzzy c-means model," IEEE Transactions on Fuzzy Systems, vol. 3, no. 3, pp. 370–379, 1995.

[10] W. Wang and Y. Zhang, "On fuzzy cluster validity indices," Fuzzy Sets and Systems, vol. 158, no. 19, pp. 2095–2117, 2007.

[11] A. Ferligoj and V. Batagelj, "Direct multicriteria clustering algorithms," Journal of Classification, vol. 9, no. 1, pp. 43–61, 1992.

[12] J. Handl and J. Knowles, "An evolutionary approach to multiobjective clustering," IEEE Transactions on Evolutionary Computation, vol. 11, no. 1, pp. 56–76, 2007.

[13] Y. Jin, T. Okabe, and B. Sendhoff, "Adapting weighted aggregation for multiobjective evolution strategies," in International Conference on Evolutionary Multi-Criterion Optimization, pp. 96–110, Springer, 2001.

[14] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.

[15] E. Zitzler and L. Thiele, "An evolutionary algorithm for multiobjective optimization: The strength Pareto approach," TIK-Report, vol. 43, 1998.

[16] E. Zitzler, M. Laumanns, and L. Thiele, "SPEA2: Improving the strength Pareto evolutionary algorithm," TIK-Report, vol. 103, 2001.

[17] D. W. Corne, J. D. Knowles, and M. J. Oates, "The Pareto envelope-based selection algorithm for multiobjective optimization," in International Conference on Parallel Problem Solving from Nature, pp. 839–848, Springer, 2000.

[18] D. W. Corne, N. R. Jerram, J. D. Knowles, and M. J. Oates, "PESA-II: Region-based selection in evolutionary multiobjective optimization," in Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation, pp. 283–290, Morgan Kaufmann Publishers Inc., 2001.

[19] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.

[20] U. Maulik, S. Bandyopadhyay, and A. Mukhopadhyay, Multiobjective Genetic Algorithms for Clustering: Applications in Data Mining and Bioinformatics. Springer Science & Business Media, 2011.

[21] J.-M. Won, S. Ullah, and F. Karray, "Data clustering using multi-objective hybrid evolutionary algorithm," in Control, Automation and Systems, 2008. ICCAS 2008. International Conference on, pp. 2298–2303, IEEE, 2008.
[22] R.-S. Kwan, A. C. Evans, and G. B. Pike, "MRI simulation-based evaluation of image-processing and classification methods," IEEE Transactions on Medical Imaging, vol. 18, no. 11, pp. 1085–1097, 1999.

[23] K. Deb and R. B. Agrawal, "Simulated binary crossover for continuous search space," Complex Systems, vol. 9, no. 3, pp. 1–15, 1994.

[24] M. Laumanns, L. Thiele, K. Deb, and E. Zitzler, "Combining convergence and diversity in evolutionary multiobjective optimization," Evolutionary Computation, vol. 10, no. 3, pp. 263–282, 2002.

[25] K. Y. Yeung and W. L. Ruzzo, "An empirical study on principal component analysis for clustering gene expression data," Bioinformatics, vol. 17, no. 9, pp. 763–774, 2001.

[26] A. Khomenko, O. Karpenko, E. Koricho, M. Haq, G. L. Cloud, and L. Udpa, "Theory and validation of optical transmission scanning for quantitative NDE of impact damage in GFRP composites," Composites Part B: Engineering, vol. 107, pp. 182–191, 2016.

[27] P. Banerjee, O. Karpenko, L. Udpa, M. Haq, and Y. Deng, "Prediction of impact-damage growth in GFRP plates using particle filtering algorithm," Composite Structures, vol. 194, pp. 527–536, 2018.