This is to certify that the thesis entitled

REPRESENTATIONS OF 3D FACES: TOWARD TIME COMPLEXITY REDUCTION AND EXPRESSION RECOGNITION

presented by Mayur Mudigonda has been accepted towards fulfillment of the requirements for the Master of Science degree in Computer Science.

Major Professor's Signature

Date

REPRESENTATIONS OF 3D FACES: TOWARD TIME COMPLEXITY REDUCTION AND EXPRESSION RECOGNITION

By

Mayur Mudigonda

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Computer Science

2010

ABSTRACT

REPRESENTATIONS OF 3D FACES: TOWARD TIME COMPLEXITY REDUCTION AND EXPRESSION RECOGNITION

By Mayur Mudigonda

We parametrize 3D faces with superquadrics and PCA techniques to support clustering. We verify the effectiveness of superquadrics in finding axes of symmetry for 3D faces on pre-normalized datasets. We show that a superquadric fit to 2.5D face data, in conjunction with other normalization techniques, improves the normalization. This is observed by comparing 100 x 100 meshes of all faces under various normalization techniques: the top 10 eigenvalues represent about 99% of the energy with our method, 95% with CFDM normalization alone, and about 83% with superquadric normalization alone.

One of the goals of parametrization is to show a robust partitioning of the dataset into smaller clusters, thus reducing the complexity of identification of faces. On the MSU data set we show that we can create about 20-30 clusters with an accuracy of about 83% using traditional k-means. We also show that using semi-supervised clustering techniques we are able to extend the number of clusters to 30-40. We analyze the various feature spaces proposed and show that shape curvatures (the exponential parameters of superquadrics) contain maximal information (6.7).

We attempt to model expressions in 3D faces using Gabor filters and a novel method that extracts the perimeter of clusters using convex hulls and applies PCA to parameterize the clusters. We show that such a model can discriminate, fully automatically, smile versus no expression with an accuracy of 78%. Future extensions using shape parametrization of local regions are needed to achieve practical performance on this problem.

ACKNOWLEDGMENT

This piece of research has come to fruition through the good will and support of many. First, I would like to acknowledge my parents (and Mash) who laid the foundation for my education by always encouraging me to question and providing ample opportunities to debate and learn. Without their countless sacrifices this research would not be possible. I dedicate this thesis to them.

I am grateful for the mirthful debate, patient ear and the challenging discussions with Dr. Stockman all through my time at Michigan State. Advice often masked through witty questioning has helped shape my thought processes; for this I am indebted to him. I thank Dr. Jain for his time and insightful advice. I thank Dr. Jin for helping develop a deeper appreciation of research in pattern recognition and machine learning.
I also thank Frank (Dr. Biocca) for his boundless encouragement, brainstorming sessions and for initially supporting me at Michigan State University. Along the paths of research I have stumbled every now and then, only to be helped and motivated by my close friends (and advisors) Naveen and Pavan (MNSSK). I acknowledge the patient listening and advice of my counsellors Adi, AK and Pavan (Ramkumar). I also acknowledge other friends and PRIPies (Abhishekh in particular) for their support, wonderful lunch discussions and help in times of need. Finally, I thank the many friends (Zubin, Kantha, Ashish) and events that have helped shape the person that I sometimes fondly look upon.

TABLE OF CONTENTS

List of Tables
List of Figures
1 Introduction
1.1 Faces
1.2 Biometrics
1.2.1 Biometric Modalities
1.2.2 3D Faces
1.2.3 Scanning 3D Faces
1.2.3.1 Motivation For This Thesis
1.2.4 Thesis Contributions
2 Superquadric Representation of 3D Faces
2.1 Superquadrics
2.1.1 Superquadrics
2.1.2 Superquadrics In General Position
2.1.3 Ambiguities in shape description
2.2 Fitting
2.2.1 A robust normalized co-ordinate space
2.2.2 Fitting Results
2.3 PCA and Eigenfaces
2.4 Toward a Parametric Face
3 Clustering Representations
3.1 A Note On Clustering
3.1.1 Unsupervised Learning - Kmeans
3.1.2 Semisupervised Clustering
3.2 Clustering Results
3.3 Time Complexity
3.4 Feature Analysis
3.4.1 Submeshed Projections
4 Parameterizing Expressions: Recognizing Smiles
4.1 Gabor Filters
4.2 Cluster Parametrization: A Convex Hull And PCA Approach
4.2.1 Parameterization of Expressions
4.2.2 Recognizing Smiles
5 Discussion
6 Appendix
6.1 A - Convex Hull
6.2 B - Dataset Description
6.3 C - Sample Superquadric Fitting
6.4 D - Expression Modeling Samples
Bibliography

LIST OF TABLES

1.1 Comparison of biometric technologies. The data are based on the perception of three biometric experts. This table is from the book by Jain et al. [1]
2.1 Fitting results on raw scans. Rotations are described in radians.
2.2 Fitting results on the CFDM data set. Rotations are described in radians.
3.1 List of processes/data formats and their corresponding times and memory requirements. Some data from Colbry et al. [2]
3.2 Table showing the variance captured by the top 3 Eigen ...
3.3 Table comparing entropy estimates of superquadric fitting, 100 x 100 Eigen projections and 25 x 25 Eigen projections
4.1 Performance of feature vector for classification on neutral vs smiling faces using 200 points on the convex hull
4.2 Performance of feature vector for classification on neutral vs smiling faces using varying points on the convex hull projected onto 10 principal components
4.3 Performance of feature vector for classification on neutral vs smiling faces using submeshed Eigen projection weights using ...
6.1 Operation of the Vivid 910 scanner (time in seconds). Details from the PhD dissertation of Dirk Colbry [3]

LIST OF FIGURES

Images in this thesis are presented in color.

1.1 Shows activation of the bilateral Parahippocampal Place Area (PPA), preferentially activating more for scenes [4,5]
2.1 Block diagram that explains the various steps involved in this chapter
2.2 The vertical axis varies the North-South exponential parameter, while the horizontal axis varies the East-West exponential parameter. The values are varied from 0.1 to 2 on both axes.
2.3 Superquadric fit to a CFDM face scan. The superquadric is plotted in green
2.4 Superquadric fit for a subject's 3 scans from the MSU-CFDM data. The numbers above the fits (left to right) are the 5 shape parameters a1, a2, a3, e1, e2
2.5 Superquadric fit for a subject's 3 scans from the MSU-CFDM data. The numbers above the fits (left to right) are the 5 shape parameters a1, a2, a3, e1, e2
2.6 Superquadric fit for subject 103's 3 scans. The numbers above the fits (left to right) are the 5 shape parameters a1, a2, a3, e1, e2
2.7 Histograms of the 5 superquadric shape parameters. X, Y and Z radii are measured in millimeters (mm).
2.8 Superquadric fit to a poor CFDM face scan. Note in the top left corner, the scan has many holes in it. Holes are a result of the reflected laser not being captured due to hair, etc. The superquadric is plotted in green. The blue points are the sampled data points for fitting the superquadric.
3.1 Plot of recall accuracy against cluster size for the CFDM and superquadric normalized dataset with no Eigen and 5 shape parameters used
3.2 Plot of recall accuracy against cluster size for the CFDM and superquadric normalized dataset with 10 Eigen parameters and no shape parameters used
3.3 Plot of recall accuracy against cluster size for the CFDM and superquadric normalized data set with 10 Eigen and 5 shape parameters used
3.4 Plot of recall accuracy against cluster size for the CFDM and superquadric normalized data set with 50 Eigen and 5 shape parameters used
3.5 Plot of recall accuracy against cluster size for the raw data set with 50 Eigen and 5 shape parameters used
3.6 Plot of recall accuracy against cluster size for the CFDM and superquadric normalized data set with 20 Eigen and 5 shape parameters using the MPCK-means algorithm
3.7 Rank accuracy: retrieving sister scans based on parameters
4.1 This figure shows a typical Gabor filter bank. Left to right are the changing frequencies. Top to bottom are the changing spatial orientations. Typically, an input face is convolved with each of the filters to give a Gabor intensity map. Features are extracted from the various intensity maps for the specific application
4.2 Frontal scan with no expression of subject 002 from the MSU-Face database.
4.3 Frontal scan with smile expression of subject 002 from the MSU-Face database
4.4 Convolved responses of the no-expression frontal scan 4.2 using the Gabor map described in Figure 4.1. Notice how some responses identify the regions around the eyes, nose and lips while others tend to identify regions that are not easily identifiable. Note also how the Gabor responses identify the boundary of the face scan from the input depth map.
4.5 Convolved responses of the smile-expression frontal scan 4.3 using the Gabor map described in Figure 4.1. Notice how some responses identify the regions around the eyes, nose and lips while others tend to identify regions that are not easily identifiable. Note also how the Gabor responses identify the boundary of the face scan from the input depth map.
4.6 Block diagram showing the procedure for parameterizing smiles.
4.7 Gabor responses for smile versus neutral expression along with the input scans
4.8 Blobs identified on the Gabor response of a neutral scan of subject 13 from the MSU database.
6.1 Convex hull (blue line) of a random set of points (red plus) from a normal distribution with mean (μ = [1, 2]) and covariance (Σ = [0.9, 0.2; 0.2, 0.5])
6.2 Figure showing 2.5D facescan of subject from the MSU-CFDM dataset
6.3 Figure showing 2.5D facescan of subject from the MSU-CFDM dataset along with a superquadric fit, with x-y axes visible
6.4 Figure showing 2.5D facescan of subject from the MSU-CFDM dataset along with a superquadric fit, with y-z axes visible
6.5 Figure showing 2.5D facescan of subject from the MSU-CFDM dataset along with a superquadric fit, with x-z axes visible. The superquadric fit parameters are as follows: [64.39, 95.5, 40, 0.9062, .8998]
6.6 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset
6.7 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset along with a superquadric fit. The fit parameters are as follows: [56.6, 98.2, 41.72, 1.4742, 0.6360]
6.8 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset
6.9 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset along with a superquadric fit. The fit parameters are as follows: [56.8, 92, 41.67, 1.4469, 0.7807]
6.10 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing a smile expression
6.11 Figure showing 2.5D frontal facescan with smile expression of subject from the MSU-CFDM dataset along with a superquadric fit. The fit parameters are as follows: [61.05, 90, 44.13, 1.5375, 0.9761]
6.12 Figure showing 2.5D frontal facescan that is both poorly normalized and has many holes
6.13 Figure showing 2.5D frontal facescan that is both poorly normalized and has many holes (previously described) fit with a superquadric. Note how superquadrics find the axes in spite of the holes and poor normalization
6.14 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression
6.15 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression and the absolute Gabor responses
6.16 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression
6.17 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression and the absolute Gabor responses
6.18 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression
6.19 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression and the real Gabor responses
6.20 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression
6.21 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression and the real Gabor responses

Chapter 1

Introduction

This work addresses the problem of person recognition using a large gallery of 3D face data. In other words, given a face scan $V_u$ of an unknown person (or expression), we need to decide whether $V_u$ is similar to any other scan $V_i$ in our dataset of $n$ scans. We further assume that $n$ is impractically large to allow for comparisons of $V_u$ with every $V_i$.

The organization of this thesis is as follows. Chapter 1 provides an overview of object, face and 3D face representation. It further tries to motivate the problems addressed in this thesis. Chapter 2 describes the problem of fitting superquadrics to 3D faces. It further describes the use of PCA for parameterizing 3D faces. Chapter 3 describes work regarding clustering (binning) the representations obtained in Chapter 2. Chapter 4 describes work in discriminating smiles from neutral expressions. Chapter 5 concludes this thesis with a discussion of the contributions and future work.

Representing knowledge for inferencing is key to many decision making tasks. A good representation of the problem helps in making decisions. By representation we mean a small set of parameters (features) by which we can explain the task accurately. A typical small image (100 x 100 pixels, say) from a low-resolution camera contains 10,000 pixels. If each pixel is considered a feature, then we have a large number of parameters that describe each image and the objects (or scenes) that they may depict. Dealing with such a large feature space is not only time consuming but also inefficient because, in many cases, not all pixels really contribute towards the description of the object (scene).

Representation - A Neuroscience Perspective

Representing objects is an important area of research not only for pattern recognition but also for cognitive neuroscience. Cognitive neuroscience is the study of the neural correlates of mental processes. Present day research in cognitive neuroscience often draws upon techniques in pattern recognition [6]. Similarly, research in neuroscience sometimes motivates research in pattern recognition [7].

Some of the earliest work in object representation was done by Biederman [8]. He observed that objects can be recognised simply by their shape. Further, information such as colour, texture, etc. did not contribute as much to their recognition.
Since Biederman, research in cognitive science has evolved from human responses to studying cortical data. The use of computational neuroimaging methods such as functional Magnetic Resonance Imaging (fMRI), Magnetoencephalography (MEG), Electroencephalography (EEG), etc. has helped advance the field of cognitive neuroscience. fMRI data are indirect measures of neuronal activity, based on the difference in magnetic properties between oxygenated and de-oxygenated blood; fMRI maps this difference as an image [9-11].

Faces have been the subject of significant attention in the neuroscience community. They are thought to be processed in the Fusiform Face Area (FFA) [12]. It is generally thought that the identification process precedes the recognition (verification) process. Humans recognize many visual cues such as expressions and other social signals very astutely and accurately. These cues seem to be universally interpretable [13]. It is thought that human faces are represented by their differences from an averaged face model [14]. Haxby et al. [6] showed that representations for faces and objects are distributed in the ventral temporal cortex, thus laying some groundwork for the representation of objects (and faces) in the brain. Similarly, scenes are thought to be processed in the Parahippocampal Place Area (PPA) [15]. A distributed approach to face (and object) processing seems to be the accepted model in the community today. This is of interest to the face research community because visual network models use information from such studies.

Figure 1.1: Shows activation of the bilateral Parahippocampal Place Area (PPA), preferentially activating more for scenes [4,5]

It is important for Artificial Intelligence (AI) researchers to understand how humans process information and synthesize new data, so as to be able to build artificial systems that can perform human-like activities or even exceed them. Thus, research in neuroscience (sometimes) goes hand in hand with the long term research goals of the AI community.

Representation - A Computer Vision Perspective

Recognizing and representing objects in computer systems is hard partly due to factors such as lighting, scale and orientation changes. There exist many different approaches to object recognition in the computer vision community. In the following paragraphs, we will briefly look at some of these models.

Contour modeling is a very popular approach in computer vision, the most popular variant being snakes, or active contour models [16]. Snakes are energy-minimizing splines that are guided by both extrinsic forces and image forces to align themselves as closely as possible to the lines and edges that we wish to model.

Methods to model high and low level vision have also been developed using Markov Random Fields (MRF) [17]. Such methods can encode the prior in the MRF and thus hope to solve larger vision problems. Huang et al. [18] proposed a hybrid face recognition method using MRFs. They break a face down into small patches and use MRFs to represent the relationship between the various patches and the patch's ID.

Object recognition through shape matching algorithms is a popular approach [19]. Contextual shape matching is done by solving the point correspondence problem and creating descriptors that describe the distribution around each point, such that the features generalize to represent the object fairly well.
Scale Invariant Feature Transform (SIFT) is also a very popular approach; it identifies key points, or descriptors, in each image. Typically, feature vectors are generated by taking representative descriptors for each cluster of descriptors [20].

Other approaches to modeling objects for various applications include geometric models such as superquadrics [21,22], deformable models [23], volumetric approaches [24], etc.

1.1 Faces

Human faces are a source of fascination to both artists and scientists. Faces are smooth and symmetrical [25]. Some of the earliest work to understand faces was done by Leonardo Da Vinci. Sir Francis Galton laid the foundation for characterizing profile faces [26]. He studied various component measurements of the face and defined key feature points on the profile. Formally, he developed a method for characterizing irregular curves.

Anthropometric research is the field of study where measurements are made on the surface of the face and then compared across race, gender, age and ethnicity. Anchor-point-based face identification has been strongly driven by anthropometric research [27]. The use of anthropometric measurements in medicine is also not uncommon [28].

Earlier work on characterizing 2D faces involved developing a face model based on manually located anchor points. Active Shape Modeling (ASM), proposed by Cootes [29], is such a model. ASM uses 68 key points identified along the contours of the face and key regions such as the eyes, nose and lips. The choice of the number of anchor points is based on empirical requirements such as the quality of the image, the size of the face (with respect to the image), etc. The location of the key points is intuitive because we wish to represent the face through silhouettes.

Methods of modeling faces based on features derived from profile curves are called analytic methods. Analytic methods are popular because they are potentially invariant to pose, lighting and other minor global changes. However, if the features are inaccurately detected, then analytic methods break down.

Other approaches to modeling faces include holistic methods such as eigenfaces [30]. The eigenfaces approach is easy to train, simple and fast. However, since it works primarily on finding the eigenvectors of the covariance matrix, if the faces are not very well correlated (due to misalignment, lighting, etc.) this approach will perform poorly.

Hybrid approaches such as those used by Manjunath et al. [27] address face modeling by using Gabor response maps for non-specific features. Hybrid methods offer a compromise, using a mixture of feature-based and analytic methods for face parameterization. In our work, we address the problem of 3D face parameterization through such a scheme.

1.2 Biometrics

The need to identify or recognize individuals has gained significant importance in today's society. Biometrics provides a way to authenticate transactions between humans and machines (and between humans). Biometric applications can largely be classified into two categories:
- Recognition (Verification) is the problem of matching a user against his or her own template scan that is stored in the database. Typical applications include personal data/device authentication, bank vaults, etc.

- Identification is the problem of matching an unknown individual (probe scan) against a set of persons (gallery of scans) stored in the database. Typical applications include detecting suspicious characters, detecting anomalous objects, etc.

The typical output of a biometric recognition system can be any of the following states:

- True Positive: a case of correctly identifying someone as the correct user of the system.

- True Negative: a case of correctly identifying someone as an incorrect user of the system.

- False Positive: a case of falsely identifying someone as the correct person, when in actuality it is another person.

- False Negative: a case of falsely identifying someone as an incorrect user, when in actuality it is the correct person.

In specific cases where the cost of making an incorrect decision outweighs the cost of making no decision, one may choose a reject option. This action may mean the added cost of a person being scanned again for a repeated match against the gallery.

1.2.1 Biometric Modalities

Biometrics are biological measurements that help identify a person repeatably and correctly. A biological measurement must satisfy the following criteria [31]:

- Universality: it should apply to every person, irrespective of whether they are going to be enrolled in the system

- Uniqueness: it should be sufficiently discriminative, such that each person's biometric characteristic is different from everyone else's

- Permanence: it should be time invariant as far as possible, i.e. less prone to the effects of aging

- Collectability: it should be easy to collect

Genetic markers such as DNA are unique identifiers of humans [32], but the process of obtaining and verifying DNA samples is laborious and sometimes not convenient. In practice, other factors come into play [33]. There exist many different types of biometric modalities, as seen in Table 1.1. Each biometric measurement has its own benefits, and no one measurement is expected to satisfy all requirements.

Table 1.1: Comparison of biometric technologies. The data are based on the perception of three biometric experts. This table is from the book by Jain et al. [1]

Current research suggests that a fusion of biometrics provides a much better system against spoofing and other such attacks [34,35]. Thus, research in biometrics strives to better the performance of each modality. Performance is affected by representation and matching algorithms. We refer the reader to Ross et al. for a good overview of the field [36].

1.2.2 3D Faces

Although 3D data from range sensors has been available since the 1970's [37], the popularity of such devices has only recently increased. Recognition of faces has largely been approached as a 2D problem. Although recognition accuracies for 2D face images are fairly high (> 99% in some cases), the performance quickly decays for large changes in lighting, illumination and pose. Although 3D faces are relatively recent, research based on 3D face models has long been in vogue in the graphics community [38-40].

Blanz and Vetter [41] showed that it is possible to fit 2D faces to 3D face models and then perform recognition. They show accuracies of approximately 78% on side views (CMU PIE dataset) and 86% on frontal views (FERET dataset).
In their work, they morph a given input 2D face to a 3D face model in the gallery. Correspondence between probe and gallery is established using image flow techniques. Once a 3D face model is constructed, recognition is performed by calculating the Mahalanobis distance [42].

Lu et al. [43] show a specific example of a multi-view 3D scan based recognition system. They construct a 3D model based on multiple views of subjects. Although it is perhaps infeasible for a commonplace application of such a system, high-end or high-security systems could definitely benefit from such research. Further, with reductions in scanner costs, such a system might be the way of the future.

Park and Jain [44] show 3D model construction from multiple non-frontal views for face recognition. They show a performance improvement of 40% on the CMU Face In Action (FIA) dataset. This method has benefits for more robust recognition from video streams.

Lu and Jain [43] tackle the problem of expression (which affects shape) by using a deformation model to handle the non-linearity of expressions for recognition. Kakadiaris et al. [45] show a face recognition system that uses a deformed face model representing 3D geometry in a 2D structure. Wavelet features are extracted and then matching is performed. If we consider a mesh as the simplest (but more computationally intensive to match) representation of a 3D face, then matching can be performed using a rigid surface alignment method such as the Iterative Closest Point (ICP) algorithm [46]. Colbry and Stockman present a 3D face verification system using a rigid alignment method in their work [47].

3D faces are a logical extension of 2D faces in that the z-depth of the surface points is included. This extra piece of information could be useful for estimating shape curvature more reliably than estimating it from 2D intensity maps. With the availability of the z-depth, pose can be estimated somewhat better than with 2D faces alone. Also, depth maps are not significantly affected by lighting changes or make-up.

Earlier research supposed that 3D data would address issues of poor recognition accuracy through the use of better shape curvature estimates as features. Preliminary results showed that matching accuracies using only 3D scans were merely comparable to those using 2D face scans [48,49]. Fusing 2D and 3D scans definitely improved performance accuracy. To be critical, such a matching scheme uses multiple representations of the same input face, so the results of such research should be interpreted appropriately [50]. Excellent surveys of 2D + 3D fusion biometrics may be found in Chang et al. [49] and Bowyer et al. [50].

The use of 3D faces helps feature-based methods of face recognition and modeling, since it allows for more reliable estimation of shape curvature and detection of anchor points [51,52]. Mian et al. [53] fit a surface on uniformly sampled 3D data. Features are constructed from these surfaces. Principal Component Analysis (PCA) is performed on the gallery images. Input and gallery images are projected onto these eigen subspaces and compared. A graph is constructed based on the matching features for the 3D data. Scale Invariant Feature Transforms (SIFT) are applied to the 2D face data. A fusion of both 2D and 3D features is then applied for matching. They show a 96.1% identification rate and a 98.6% verification rate on the Face Recognition Grand Challenge (FRGC) dataset [54].
1.2.3 Scanning 3D Faces

There are 3 major variants of scanning methodologies for 3D faces [50]:

- Passive Stereo: in a passive stereo setting, the geometric relationship between the two cameras used to acquire the images of the face is known. The location of feature points in 3D is then computed from matching 2D pixels. An example is the Geometrix system [55].

- Structured lighting: the scanner projects a horizontal plane of laser light onto the surface of an object; a 640 x 480 pixel color camera picks up the curve produced by the interaction of the laser with the object and triangulates the depth of the illuminated surface points. The Minolta sensor is such a system. In our research, we use data from the Minolta Vivid scanner [56].

- Hybrid: in such a setting, both structured light and a stereo camera setup are used to collect data. Such a system can improve the point density collected. The 3Q "Qlonerator" is one such system [57].

Although the literature is split on the actual benefits of using 3D face data for better recognition accuracy [50], there is sufficient reason to believe that 3D face data in conjunction with 2D face data helps improve performance. It may also be that 3D face research is where 2D face research was a few years back with FERET [58]. The 3D face research community has moved towards a standardized dataset for face recognition research with the Face Recognition Grand Challenge (FRGC) [54]. The FRGC v2 data set contains 4007 3D scans from 466 subjects and contains multiple expressions as well.

1.2.3.1 Motivation For This Thesis

Face recognition and biometrics have evolved into mature fields over the past few decades. Yet a large-scale practical implementation of a 3D face matching system is inhibited by the time complexity of matching. Even at a few seconds per comparison, recognition against a dataset with a few thousand subjects would require unacceptable processing time. Reduction of complexity can be achieved either by reducing the time taken for pairwise matching or by partitioning the search space. In this work we choose the latter scheme.

The idea of partitioning search spaces is not uncommon in the field of biometrics. The fields of fingerprint, iris, etc. have addressed similar problems. Germain et al. [59] present a nice method to cluster fingerprint data. Bowyer et al. [60] present a comprehensive survey of iris image understanding in their work.

The larger goal of such research would be to create a large number of robust clusters over a fairly comprehensive data set ($10^6$ subjects, say) so that it generalizes well to real-world settings. For example, when a person is scanned at an airport terminal, the face is first parameterized and a search is made in the parameter space to find the best cluster. Once the cluster has been identified, a matching procedure such as the ICP algorithm [46] can be employed to match the input face with all the faces in the cluster. If we have, say, 100 clusters with each cluster representing 1000 people, then we have reduced our matchings from $10^5$ to $10^3$. We know that a probe scan can be matched against a gallery of 330 scans in under 5 seconds [2]. Thus, we can effectively reduce the time taken to verify a probe to about 15 seconds. If we further parallelize some of these calculations, lower times could be achieved.
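To make this arithmetic concrete, the short script below reproduces the estimates. It is an illustration only: the gallery size and cluster count are assumptions made for the example, and only the 330-scans-in-about-5-seconds figure is taken from [2].

```python
# Back-of-the-envelope estimate of identification time with and without
# cluster-based partitioning of the gallery. Only the "330 scans in about
# 5 seconds" ICP figure comes from [2]; everything else is an assumption.
gallery_size = 10**5          # assumed number of gallery scans
n_clusters = 100              # assumed number of clusters
sec_per_match = 5.0 / 330     # roughly 15 ms per pairwise ICP match [2]

exhaustive = gallery_size * sec_per_match                    # match everyone
partitioned = (gallery_size // n_clusters) * sec_per_match   # one cluster only

print(f"exhaustive search : {exhaustive:7.0f} s")   # ~1515 s
print(f"partitioned search: {partitioned:7.1f} s")  # ~15 s
```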
If we can further rank all the faces within a cluster using the parameters, i.e., if faces that are nearer to each other with respect to the matching algorithm are in fact closer to each other in the parameter space, then we can further restrict our search from 1000 people to only, say, the top 10 neighbours of a face scan within a cluster. Thus, we can effectively recognize a person in about 30 seconds as opposed to $3 \times 10^5$ seconds.

If we do not find a face within the specified cluster, we can still intelligently search among the other nearest clusters and achieve good performance.

We parameterize human faces in our work using a hybrid model: a geometric model (superquadrics) and PCA-based eigen parameters. The motivation is for the geometric model to capture the larger features of human faces, such as the length, width and gross curvature of the face, while the PCA-based eigen parameters capture the finer details. Superquadrics are a powerful class of geometric surfaces originally proposed by Barr [61]. Superquadrics are endowed with two additional exponential parameters that extend the range of traditional quadrics. Very simply, superquadrics can be thought of as an extension of ellipsoids.

Colbry and Stockman [47] present a rudimentary parameterization as part of their work developing a Canonical Face Depth Map (CFDM), which aims to provide a robust normalization for 2.5D faces. Niu et al. [62] present a way to cluster facial landmarks and provide insight into how regions affect separability. Extended superquadrics have been employed by Zhang et al. [63] to track 3D faces, which provides support for a superquadric-based representation of faces. We choose a regular superquadric because of its ability to provide better normalization of 3D faces. Interesting work on shape retrieval and 3D faces can be found in the Shape Retrieval Contest (2008) [64].

One other problem we wish to address in this work is parameterizing expressions in 3D faces. We address this issue using Gabor filter banks and convex hull methods. We show that our method does not require detection of anchor points during training. To the best of our knowledge, our work novelly extends work in this field.

1.2.4 Thesis Contributions

The following is the list of problems this thesis attempts to tackle:

- We attempt to fit superquadrics to 3D faces using 3D point clouds. The hypothesis is that superquadrics capture the larger information in faces. Due to their inherent symmetry, we also hope that superquadrics find good axes of symmetry for faces. The larger goal is a good representation of 3D faces.

- We attempt to bin (cluster) representations to find many compact clusters of faces. A good representation that can model individual faces uniquely, while maintaining some information at the group level, can help reduce the time complexity of matching probe scans to a gallery.

- Lastly, we wish to study the parameterization of expressions (smile versus no smile) in 3D faces. Recognition of expressions is of importance to many computer applications. Understanding expressions is also important for achieving better recognition accuracies.

Chapter 2

Superquadric Representation of 3D Faces

For most problems in pattern recognition, it is both efficient and convenient to represent the problem in a feature space that describes it better for the task of recognition. Raw spaces are often poor choices for representation and the study of similarity. In this chapter, we address the problem of representing faces using superquadrics.
First, we look at the problem of fitting superquadrics to 3D face data. We then analyze the quality of fit by looking at a few sample fits. We find that the inherent symmetry in human faces and in superquadrics helps superquadrics find good axes of symmetry. These axes of symmetry help to normalize 3D faces. Although the superquadric representation captures some of the variability in human faces, it does not sufficiently represent faces. We therefore explore the use of PCA to describe the finer details of the face in the later sections. Figure 2.1 shows a block diagram of the various processes involved in this chapter.

Figure 2.1: Block diagram that explains the various steps involved in this chapter: face acquisition, rough symmetry selection, superquadric fitting, feature extraction (superquadrics/PCA/Gabors), and clustering.

For a good treatment of superquadrics and their properties, we refer the reader to [61]. Superquadrics found their place in the computer vision community through the work of Pentland on perceptual organization [65]. Solina and Bajcsy [22] present a detailed sketch of how superquadrics can be used with deformations. The book by Jaklic, Leonardis and Solina [66] provides a very good tutorial on superquadrics. Bardinet et al. [67-69] show the effectiveness of superquadrics with deformations for fitting in real-world applications. For superquadrics with deformations in a local and global setting, we refer the reader to [21].

2.1 Superquadrics

Before we study superquadrics, let us take a brief tour of their predecessor, the superellipse.

Superellipse. A superellipse is any curve defined as follows:

$$\left(\frac{x}{a}\right)^m + \left(\frac{y}{b}\right)^m = 1 \tag{2.1}$$

where $a$ and $b$ are positive real numbers describing the major and minor axes. As $m \to \infty$ we get more rectangular shapes. As $m \to 0$ it takes the shape of a cross. Superellipses are special cases of curves known as Lamé curves, where $m$ can be any rational number.

Superquadrics are a powerful class of geometric models. They were originally proposed by Barr for rendering a multitude of objects. Superquadrics are an extension of regular quadrics in that they are able to parameterize curvature in the North-South ($\epsilon_1$) and East-West ($\epsilon_2$) directions. Thus, superquadrics can generate a host of new shapes, ranging from pin cushions, to ellipsoids, to toroids, etc., through only a few parameters. Although there exist many sub-classes of superquadrics, we will deal only with the superellipsoids, since their inherent symmetry and axes can help model human faces.

Given two two-dimensional curves,

$$h(\omega) = \begin{bmatrix} h_1(\omega) \\ h_2(\omega) \end{bmatrix}, \quad \omega_0 \le \omega \le \omega_1 \tag{2.2}$$

$$m(\eta) = \begin{bmatrix} m_1(\eta) \\ m_2(\eta) \end{bmatrix}, \quad \eta_0 \le \eta \le \eta_1 \tag{2.3}$$

the spherical product of these two curves gives us a surface $x$, defined as

$$x = m \otimes h = \begin{bmatrix} m_1(\eta)\, h_1(\omega) \\ m_1(\eta)\, h_2(\omega) \\ m_2(\eta) \end{bmatrix}, \quad \omega_0 \le \omega \le \omega_1, \quad \eta_0 \le \eta \le \eta_1 \tag{2.4}$$

The two 2D curves modulate each other to generate a 3D surface $x$. Let $h(\omega)$ be a horizontal curve and $m(\eta)$ be a vertical curve. The spherical product then results in a surface where $h(\omega)$ is modulated vertically by the curve $m_1(\eta)$, while $m_2(\eta)$ raises or lowers the surface. Thus, $\eta$ is a north-south parameter (latitude) while $\omega$ acts as an east-west parameter (longitude).

Let us take a closer look at the spherical product now. The spherical product receives its name from the production of a sphere, when the half circle

$$m(\eta) = \begin{bmatrix} \cos\eta \\ \sin\eta \end{bmatrix}, \quad -\frac{\pi}{2} \le \eta \le \frac{\pi}{2} \tag{2.5}$$

is crossed with the full circle

$$h(\omega) = \begin{bmatrix} \cos\omega \\ \sin\omega \end{bmatrix}, \quad -\pi \le \omega \le \pi \tag{2.6}$$

The meridional half circle determines the radius of the sphere and also the height of the sphere above and below the X-Y plane.
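As a quick sanity check of (2.4)-(2.6), the following numpy sketch (our own illustration; the function names are ours, not part of the thesis) crosses the half circle with the full circle and verifies that every resulting surface point lies on the unit sphere.

```python
import numpy as np

def spherical_product(m1, m2, h1, h2, eta, omega):
    """Spherical product (2.4): surface x(eta, omega) = (m1*h1, m1*h2, m2)
    evaluated on the grid eta x omega."""
    E, W = np.meshgrid(eta, omega, indexing="ij")
    return np.stack([m1(E) * h1(W), m1(E) * h2(W), m2(E)], axis=-1)

# Half circle (2.5) crossed with full circle (2.6) produces a sphere.
eta = np.linspace(-np.pi / 2, np.pi / 2, 50)    # latitude
omega = np.linspace(-np.pi, np.pi, 100)         # longitude
sphere = spherical_product(np.cos, np.sin, np.cos, np.sin, eta, omega)

# Every surface point has unit norm, confirming we recovered a sphere.
assert np.allclose(np.linalg.norm(sphere, axis=-1), 1.0)
```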
Superquadrics are obtained by the spherical product of two such 2D curves. The parametric form of the superquadric is given by:

$$r(\eta, \omega) = \begin{bmatrix} a_1 \cos^{\epsilon_1}\eta \, \cos^{\epsilon_2}\omega \\ a_2 \cos^{\epsilon_1}\eta \, \sin^{\epsilon_2}\omega \\ a_3 \sin^{\epsilon_1}\eta \end{bmatrix}, \quad -\frac{\pi}{2} \le \eta \le \frac{\pi}{2}, \quad -\pi \le \omega \le \pi \tag{2.7}$$

An alternative, implicit form of the superquadric can be derived using the equality $\cos^2\theta + \sin^2\theta = 1$. We can rewrite (2.7) as

$$\left(\frac{x}{a_1}\right)^2 = \cos^{2\epsilon_1}\eta \, \cos^{2\epsilon_2}\omega \tag{2.8}$$

$$\left(\frac{y}{a_2}\right)^2 = \cos^{2\epsilon_1}\eta \, \sin^{2\epsilon_2}\omega \tag{2.9}$$

$$\left(\frac{z}{a_3}\right)^2 = \sin^{2\epsilon_1}\eta \tag{2.10}$$

Raising both sides of (2.8) and (2.9) to the power $1/\epsilon_2$ and adding them, then raising the sum to the power $\epsilon_2/\epsilon_1$ and adding (2.10) raised to the power $1/\epsilon_1$, we get the inside-outside function, or the function used to render:

$$F_{render}(x,y,z) = \left(\left(\frac{x}{a_1}\right)^{\frac{2}{\epsilon_2}} + \left(\frac{y}{a_2}\right)^{\frac{2}{\epsilon_2}}\right)^{\frac{\epsilon_2}{\epsilon_1}} + \left(\frac{z}{a_3}\right)^{\frac{2}{\epsilon_1}} \tag{2.11}$$

where $F = 1$ implies the point (x, y, z) is on the surface, $F < 1$ implies the point lies inside the superquadric, and $F > 1$ implies the point lies outside the superquadric.

The explicit form is useful for representation and rendering, but if we define an error function based on (2.11) for fitting, then for a given $\epsilon_1$, similar (x, y, z) tuples will not receive the same $F(x, y, z)$ values, making recovery harder. Since there is no explicit (closed-form) solution for the Euclidean distance measure used in fitting, we use the implicit equation defined by

$$F(x,y,z) = \left[\left(\left(\frac{x}{a_1}\right)^{\frac{2}{\epsilon_2}} + \left(\frac{y}{a_2}\right)^{\frac{2}{\epsilon_2}}\right)^{\frac{\epsilon_2}{\epsilon_1}} + \left(\frac{z}{a_3}\right)^{\frac{2}{\epsilon_1}}\right]^{\epsilon_1} \tag{2.12}$$

Note in Figure 2.2 how the shapes along the diagonal become increasingly pinched as the value of the exponential parameters increases. Superquadrics can take a wide range of shapes. When we set both $\epsilon_1$ and $\epsilon_2$ to 1, we obtain a normal ellipsoid (a sphere, when the radii are 1).

Figure 2.2: The vertical axis varies the North-South exponential parameter, while the horizontal axis varies the East-West exponential parameter. The values are varied from 0.1 to 2 on both axes.
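The two forms can be exercised together. The sketch below (our own illustration, with arbitrary parameter values) samples the parametric surface (2.7) and checks that the samples evaluate to $F \approx 1$ under (2.11); a signed power is needed because cosine and sine can be negative while the exponents are fractional, and raising (2.11) to the power $\epsilon_1$ gives the fitting form (2.12).

```python
import numpy as np

def spow(base, p):
    """Signed power: sign(base) * |base|**p, so fractional exponents are
    well-defined for negative values of cos and sin."""
    return np.sign(base) * np.abs(base) ** p

def superellipsoid_surface(a1, a2, a3, e1, e2, eta, omega):
    """Sample the parametric surface (2.7) on an eta x omega grid."""
    E, W = np.meshgrid(eta, omega, indexing="ij")
    x = a1 * spow(np.cos(E), e1) * spow(np.cos(W), e2)
    y = a2 * spow(np.cos(E), e1) * spow(np.sin(W), e2)
    z = a3 * spow(np.sin(E), e1)
    return x, y, z

def inside_outside(x, y, z, a1, a2, a3, e1, e2):
    """F of (2.11): F < 1 inside, F = 1 on the surface, F > 1 outside.
    Raising the result to the power e1 gives the fitting form (2.12)."""
    fxy = (np.abs(x / a1) ** (2 / e2) + np.abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    return fxy + np.abs(z / a3) ** (2 / e1)

# Points sampled from the surface should evaluate to F ~ 1.
eta = np.linspace(-np.pi / 2 + 0.01, np.pi / 2 - 0.01, 40)
omega = np.linspace(-np.pi + 0.01, np.pi - 0.01, 80)
x, y, z = superellipsoid_surface(55, 95, 35, 1.3, 0.7, eta, omega)
F = inside_outside(x, y, z, 55, 95, 35, 1.3, 0.7)
assert np.allclose(F, 1.0)
```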
Let us define .a homogenous transformation T to transform the superquadric (1'3, ys, 23) to its new origin, that of the 3D points (world) (1w,yw,zw) in the space. Thus, {L‘w ) 1‘5 yw = T ys (2.22) _ 3w _ L ‘33 _ Here, T is a homogenous transformation matrix defined as follows 22 (m 7713; Tia: 02: l m n 0 T 2 y 3/ 3/ y (223) [z m; 77.:y 0;: O 0 O 1 Thus, for a given point T first rotates the points using the rotation parameters l,m and n. The point is then translated by (01;, 0y, 0 2, HT. Note, that this transformation matrix T is invertible and can be used to go from world space to superquadric space and back, i.e. _ 1 - .. 1'3 .17“) y, = T_1 W, (2.24) 23 3w If we use Euler angles (0,640) to express the rotations in the transformation matrix T, we get the transformation matrix as follows. )- -1 cosocosdcosw — singbsinw singficosficosw + cososinfl —cos¢cos€sinw —sin.¢cost’) cosg’osinQ p3; —sin¢cos6’sin¢+cosocos6l sinésinfl py T : —sin6lcosw sinflsinw 0036 p; 0 0 O 1 (2.25) F(‘TU)1 3111)» ZUJ) : F(a‘1:a210‘32611627 Q5,6,'¢’,pr, pyi p3) (2'26) Thus, the inside-outside function of a superquadric has 11 parameters of which 5 are shape parameters (a1, (12, (13,61, 62) and 6 transformation parameters are used to describe the superquadric in 3D Euclidean space as seen in Equation (2.26) 23 2.1.3 Ambiguities in shape description The mapping between superquadrics and objects is many to one, by nature. For example, if we have a superellipsoid such as (1,2,1,0.4,0.4) by modifying the rotation parameters required to specify the superquadric, multiple superquadrics can represent the same object. Further, the same object can be represented by another superquadric with values (2,1,1,0.4,0.4) with appropriate rotation around the x-y axes. Lastly, as pointed out in [22], two parallelpiped with slightly rounded edges ((11 = a2 = 1,61 = 0.1 and £2 = 0.1) can be similarly represented after a rotation of 7r/4 around axis 2 of the object coordinate system, with 61 = 0.1,62 = 1.9 and a1=a2=\/2. The easiest way to deal with this problem is to constrain some of the shape parameters for the object we are trying to model. Thus, if we limit rotations from going beyond {—7r/4, 7r/4] for example, that would prevent inversions of parameters. Similarly, the exponents may be constrained to meaningful classes. In our problem, since we are only modeling human faces, this constraining can be easily done. Solina and Bacjsy [22] provided a new objective function to recover shapes that accounted for occclusions and chooses parameters such that they are minimized. R= WU“ 1) (2.27) where F, is the objective function in (2.12). The reason for , /a1a2a3 comes from the need to constrain the shape parameters to lie between [0.1.1]. thus implying meaningful recovery of shapes. 24 2.2 Fitting A superquadric can be fully rendered with just 5 parameters. To get its position in §R3 we need to include translation and rotations. These account for 6 other parameters. Our goal is to learn the optimal set of 11 superquadric parameters that represent a face scan approximately. To acquire a reliable sampling of points from the face scans for fitting, we first reliably identify key points on the face such as the tip of the nose, innermost eye points, etc. This is done by using the computed normals on the surface and then iterating to find the optimal point in a region. These key points are found using curvature estimation and localization as described in [47]. We use the key points for defining bounds to solve the optimization problem. 
2.1.3 Ambiguities in shape description

The mapping between superquadrics and objects is many-to-one by nature. For example, if we have a superellipsoid such as (1, 2, 1, 0.4, 0.4), then by modifying the rotation parameters required to specify the superquadric, multiple superquadrics can represent the same object. Further, the same object can be represented by another superquadric with values (2, 1, 1, 0.4, 0.4) and an appropriate rotation around the x-y axes. Lastly, as pointed out in [22], a parallelepiped with slightly rounded edges ($a_1 = a_2 = 1$, $\epsilon_1 = 0.1$ and $\epsilon_2 = 0.1$) can be similarly represented after a rotation of $\pi/4$ around the z axis of the object coordinate system, with $\epsilon_1 = 0.1$, $\epsilon_2 = 1.9$ and $a_1 = a_2 = \sqrt{2}$.

The easiest way to deal with this problem is to constrain some of the shape parameters for the object we are trying to model. Thus, if we limit rotations to $[-\pi/4, \pi/4]$, for example, that would prevent inversions of parameters. Similarly, the exponents may be constrained to meaningful classes. In our problem, since we are only modeling human faces, this constraining can be done easily.

Solina and Bajcsy [22] provided a new objective function to recover shapes that accounts for occlusions and chooses parameters such that it is minimized:

$$R = \sqrt{a_1 a_2 a_3} \, (F - 1) \tag{2.27}$$

where $F$ is the objective function in (2.12). The factor $\sqrt{a_1 a_2 a_3}$ favours the smallest superquadric that fits the data, and the exponential parameters are constrained to lie in [0.1, 1], implying meaningful recovery of shapes.

2.2 Fitting

A superquadric can be fully rendered with just 5 parameters. To get its position in $\mathbb{R}^3$ we need to include translations and rotations; these account for 6 other parameters. Our goal is to learn the optimal set of 11 superquadric parameters that approximately represent a face scan.

To acquire a reliable sampling of points from the face scans for fitting, we first reliably identify key points on the face, such as the tip of the nose, the innermost eye points, etc. This is done by using the computed normals on the surface and then iterating to find the optimal point in a region. These key points are found using curvature estimation and localization, as described in [47]. We use the key points for defining bounds for the optimization problem: the topmost point of the forehead (that we can reliably accept from our scans), the bottommost point (the chin), and the left and right cheek bones. We then sample uniformly within these bounds to get a point cloud.

We use a multi-stage optimization procedure with two different optimization runs. We first compute the shape parameters using the active-set algorithm. We maintain very rigid bounds on the rotations in this step because of the sensitivity of the algorithm. We then use the parameters obtained to re-compute a fit that includes rotations.

Suppose we have an objective function $g(x)$ and constraints $g_i(x) \ge 0$ that define the search space for $x$. A constraint is said to be active if $g_i(x) = 0$ and inactive otherwise. Thus, the active set is the set of constraints $g_i(x)$ that are active at the current values of the parameters. The active-set algorithm works by iteratively attempting to solve the equality problem defined by the active set through approximations. After computing the Lagrange multipliers of the active set, it removes a subset of the constraints with negative Lagrange multipliers, as well as other constraints that are infeasible.

Traditionally, conjugate gradient descent algorithms are applied to learn the parameters; these algorithms tend to be slower than alternatives because of the overhead of computing the direction from the gradient vector and Hessian matrix. We employ a Matlab package that uses a modified version of the active-set algorithm. The Hessians are computed empirically and modified accordingly. The details and analysis of the algorithm are beyond the scope of this thesis. We refer the reader to [71] and [72] for an introduction to these algorithms.
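The two-stage scheme can be condensed into a short sketch. The version below is our own illustration: it substitutes scipy's bounded least-squares solver for the MATLAB active-set routine used in the thesis, omits the outlier-pruning pass discussed with the fitting results, and uses purely illustrative bounds and initial values. residual() follows (2.27), and F_general() is the helper sketched in Section 2.1.2.

```python
import numpy as np
from scipy.optimize import least_squares

def residual(params, pts):
    """Per-point residual R of (2.27); F**e1 gives the fitting form (2.12)."""
    a1, a2, a3, e1, e2, phi, theta, psi, px, py, pz = params
    F = F_general(pts, a1, a2, a3, e1, e2, phi, theta, psi, (px, py, pz))
    return np.sqrt(a1 * a2 * a3) * (F ** e1 - 1.0)

def fit_superquadric(pts):
    """Two-stage fit: shape first with rotations pinned near zero, then a
    re-fit with rotations freed. All bounds here are illustrative."""
    x0 = np.array([50, 90, 35, 1.0, 1.0, 0, 0, 0, 0, 0, 0], dtype=float)
    lo = [20, 50, 10, 0.1, 0.1, -1e-6, -1e-6, -1e-6, -30, -30, -30]
    hi = [90, 150, 60, 2.0, 2.0,  1e-6,  1e-6,  1e-6,  30,  30,  30]
    stage1 = least_squares(residual, x0, bounds=(lo, hi), args=(pts,))

    lo[5:8] = [-np.pi / 4] * 3      # free the rotations, within [-pi/4, pi/4]
    hi[5:8] = [ np.pi / 4] * 3
    return least_squares(residual, stage1.x, bounds=(lo, hi), args=(pts,))
```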
2.2.1 A robust normalized co-ordinate space

PCA is a decomposition technique that has been applied to various applications in computer vision and pattern recognition. PCA tries to find the best orthogonal basis for the data by searching for axes that maximize projected variances. Thus, given a data matrix $X$, where every column in $X$ represents a grid of (approximately) 100 x 100 3D co-ordinates, we are looking to transform $X$ into its decomposed basis vectors. Since the principal axes of the data are the eigenvectors of its covariance matrix, we compute the eigenvectors of the covariance matrix. The empirical mean of each dimension is subtracted from $X$ to get $\hat{X}$. We now compute the covariance matrix of $\hat{X}$, given by:

$$C = \frac{1}{N} \hat{X} \hat{X}^T \tag{2.28}$$

We then perform a singular value decomposition to get the singular values:

$$C\,U = \Lambda\,U \tag{2.29}$$

The superquadric model captures the larger shape features in a face, as well as pose, and PCA is used to represent the finer features of the face. By looking at the singular values in the diagonal matrix $\Lambda$, we can choose a minimal set of appropriate eigenvectors that capture the energy efficiently. We then find the weights of the projections of the input data onto these eigenvectors and use these values for parametrization.

2.2.2 Fitting Results

The MSU dataset consists of 107 subjects with 3 scans per subject. Scans were collected using the Minolta scanner, whose scanning accuracy is about 0.1 mm. Anchor points, or facial landmarks, are located using the algorithms defined by Colbry and Stockman [47].

Table 2.1: Fitting results on raw scans. Rotations are described in radians.

Points   Rot X              Rot Y              Rot Z
2000     -.008  ± .001      -.0377 ± .002      -.0009  ± .002
1000     -.0084 ± .03       -.0053 ± .04       -.00027 ± .04
750      -.0024 ± .0395     -.0014 ± .0461     -.0045  ± .0464

Table 2.2: Fitting results on the CFDM data set. Rotations are described in radians.

Points   Rot Z              Rot Y              Rot X
1000     -.0010 ± .0053     -.0012 ± .0051     -.0011 ± .0134
750      -.0008 ± .0044     -.0017 ± .0061     -.0007 ± .0128
500      -.0009 ± .0049     -.0016 ± .0090     -.0019 ± .0162
100      -.0014 ± .0067     -.0018 ± .0081     -.0017 ± .0160

Table 2.1 and Table 2.2 describe the average rotations of the superquadric along each dimension (X, Y and Z), along with their standard deviations, for both raw scans and CFDM-normalized scans. We observed that the rotations were generally smaller in the CFDM-normalized data set.

Figure 2.3: Superquadric fit to a CFDM face scan. The superquadric is plotted in green.

A sample fit is shown in Figure 2.3. It is important to note that we use a double optimization method of fitting. The superquadric was first fit to the CFDM scan. Points were pruned (gross outliers) based on the first fit and the fitting procedure was re-initialized. We found this to give a more robust fitting. This dual procedure particularly helped with finding good rotations, since solving for all parameters simultaneously sometimes resulted in a very poor choice of axes. So, in our approach, in the first pass we do not estimate the rotations, and do so only in the second pass of fitting.

Sampling was done such that we chose points between the chin and the middle of the forehead along the y-axis. We limited our data to lie between the left and right cheek bones. We also took the extra precaution of pruning away data beyond a certain z depth; this helped with the fitting procedure. Points that lie between 0 and -10 (which usually included protrusions, like the tip of the nose) were pruned. This served two purposes: a) it helped in generating a more informative model that can capture the larger curvature of the face, and b) it increased the speed of the system.

We experimented with various choices of sampling points. We generally found that including regions like the tip of the nose and other protrusions forced the objective function to find an inappropriate X-Y coordinate space for the face. In our work, we have constrained sampling using only depth constraints, but an interesting idea could be to remove certain regions such as the eyes, nose and lips and resample for fitting.

Since our data is 2.5D data, we found that the parameter for the z-axis radius was very sensitive. Intuitively, this makes sense, since there can be multiple solutions that are optimal for a given point cloud. We found that modifying the objective function to penalize only the z-radii resulted in a poor choice of axes. The solution that we found worked was to constrain the space of values the z-radius parameter could take. This resulted in stable fits.

Now, let us compare Figures 2.4 and 2.5. We see that the N-S exponent (the 4th parameter) differs on average by about 0.3 between the two faces. This exponent basically tells us whether the superquadric is going to be pinched or not. Notice how in Figure 2.5 it tends to pinch a bit more than in the first figure. The horizontal exponential component tends to account for the elliptical nature of faces. Thus, the closer to an ellipse a face is, the larger this parameter. Both Figures 2.4 and 2.5 have only average values for the horizontal exponent. In Figure 2.6, the vertical (N-S) exponent is large, and we notice a small pinch in the superquadric.

Figure 2.4: Superquadric fit for a subject's 3 scans from the MSU-CFDM data. The numbers above the fits (left to right) are the 5 shape parameters a1, a2, a3, e1, e2: [53.18, 93.6, 36.87, 1.34, .48], [53.18, 88.72, 25.03, 1.11, .74], [53.10, 92.75, 30.30, 1.39, .6].

Figure 2.5: Superquadric fit for a subject's 3 scans from the MSU-CFDM data. The numbers above the fits (left to right) are the 5 shape parameters a1, a2, a3, e1, e2: [56.24, 111.5, 39.20, 1.74, .69], [58.33, 110.5, 42.48, 1.74, .70], [58.27, 106.75, 43.54, 1.74, .50].
Both Figures 2.4 and 2.5 have only average values for horizontal exponents. In Figure 2.6, the vertical (N-S) exponent is large and we notice a small pinch in 29 ' frontal ne 53.18,93.6,36.87,1.34,.48 53.18,88.72,25.03,1.11,.74 53.10.92.75,30.30,1.39,.6 .200 .7 0~ ,5 100 o ,/ 100 '50 ' -soj $0 -100 1 0 ~.. g -155‘ N .1 -1oo ‘“3\~-\\M/ ~c. 100 -100 o 100 -100 Figure 2.4: Superquadric Fit for a subject’s 3 scans from the MSU-CFDM data. . The numbers above the fits (Left to Right) are the 5 shape parameters al. (12, a3, 61, 62 ac 56.24.111.5,39.20.1.74,.69 frontal 58.33,110.5,42.48,1.74..70 58.27.106.75.43.54.1.74.50 Figure 2.5: Superquadric Fit for a subject’s 3 scans from the MSU-CFDM data . The numbers above the fits (Left to Right) are the. :3 shape parameters a1, a2, a3, 61, E2 30 the superquadric. The horizontal exponent (EVV or the 5th parameter) is a little above average thus giving a shape that is in between a box and an ellipse.If the horizontal exponential component is low then the superquadric tends to be more box-shaped (if it is not pinched). Figure 2.7, shows histogram plots of the various shape parameters on the MSU dataset. It is interesting to note the nature of their empirical distributions. The X- radii seemingly fits an approximate Poisson distribution, with a mean around 55mm. The Y-radii fits a normal distribution with low variance very well, thus implying that only the two extremum will help in separation. The Z-radii again would seemingly fit a normal distribution with mean about 30mm. The exponential parameter N—S fits a normal distribution with a large variance and a heavier left tail, while the E-W parameter fits a normal distribution with a large variance and a heavier right tail. Thus, both of these can contribute towards separability of faces. It is important to note that these histograms should be interpreted with due care because the sample size of our dataset is small and the fitting procedure only partially robust. It is important to note that the rotations of the superquadric along the YZ axis tends to be more stable with the increase in the number of points in the cloud. Although the error of fit tended to be a bit higher(more outliers) for larger sized point clouds, the parameters estimated tended to be similar. The idea of why our procedure works better may be intuitively understood by looking at a poorly normalize data sample from the CFDM dataset and its corre- sponding fit(using our model). Figure 2.8 shows one such case. We estimate some 20 such scans to be poorly normalized in our dataset, which roughly accounts for 3-4 more eigenvectors required to describe the CFDM well. 31 frontal 56.71 102.75 38.23 1.75 0.61 57.64 94.50 37.38 1.65 .69 O -50 100 me sm 56.49 96.50 37.57 1.69 .63 Figure 2.6: Superquadric Fit for a subject 1035 3 scans. The numbers above the fits (Left to Right) are the 5 shape parameters (11, a2, a3, 61, 62 1 00 80 60 40 No of faces 20 0 50 60 X Radii 70 1 00 so 60 40 No of faces 20 0 0 1 N-S Exponent No of faces No of faces 100 80 80 ,,, so 60 § 3: 4o 40 l g l 2 20 2o 0 o 50 100 150 o 50 Y Radii z Radii 150 100 50 o o 1 2 E-W Exponent Figure 2.7: Histograms of the 5 superquadric shape parameters. X,Y and Z radii are measured in millimeters(mm). 100 50 -1oo ' 0 ‘g_ . . 10° -100 -50 o 50 100 Figure 2.8: Superquadric fit to a poor CFDM face scan. Note in the top left corner, the scan has many holes in it. Holes are a result of the reflected laser not being captured due to hair, etc . The superquadric is plotted in green. 
The blue points are the sampled data points for fitting the superquadric.

2.3 PCA and Eigenfaces

Principal Component Analysis (PCA), or the Karhunen-Loève transform, was originally introduced by the mathematical statistician Karl Pearson [73]. PCA works by finding orthogonal axes that maximally capture the variance in the data. PCA as a method for characterizing human faces was first proposed in [74,75], and it was introduced as a tool for face recognition by Turk and Pentland [30,76].

The procedural details for PCA are as follows. Let us assume we have m faces in our dataset, and that the vectorized size of each face is d.

• We first organize a collection of faces that are normalized. Usually, normalization includes a transformation of the face such that the tip of the nose is aligned at the center of the image with a frontal pose.

• Now, create column vectors, one for each face, and arrange them next to each other, i.e. $X = [\vec{x}_1 \dots \vec{x}_m]$. Thus, X is of size d × m.

• We then compute the mean face $\bar{x}$ from this matrix and subtract the mean from each face vector i:
$$\vec{y}_i = \vec{x}_i - \bar{x} \qquad (2.30)$$

• We then compute the covariance matrix of the matrix $Y = [\vec{y}_1 \dots \vec{y}_m]$:
$$C = \frac{1}{m} Y Y^T \qquad (2.31)$$

• We then compute the eigenvectors and eigenvalues of the matrix C:
$$C V = V \Lambda \qquad (2.32)$$
where V is the set of eigenvectors and Λ is a diagonal matrix containing the eigenvalues.

• We then choose the k eigenvectors corresponding to the top k eigenvalues, which capture the maximal variance in the data. The choice of k is subject to the needs of the application; the trade-off is between accuracy and computational complexity. If we require a more accurate representation, i.e. more variance represented, then we choose more eigenvectors, but that increases the dimensionality of the system, and more computational effort is required at higher dimensions.

• Typically, it is computationally very expensive to perform the eigendecomposition of a d × d matrix. For example, if every face were a small image of 100 × 100 pixels, the covariance matrix would be of size 10000 × 10000. A large number of elements in this matrix do not contain useful information (resulting in a sparse matrix in some cases), so computing the eigenvectors of such a matrix is not only infeasible in some cases but also not very accurate. This problem was addressed by Turk and Pentland [30,76], who suggested that it is possible to compute the eigenvectors as follows:
$$C' = \frac{1}{m} Y^T Y \qquad (2.33)$$
$$C' V' = V' \Lambda' \qquad (2.34)$$
$$V = Y V' \qquad (2.35)$$
Thus, we can now compute the eigenvectors of the d × d matrix using the m × m matrix.

• On choosing the eigenvectors, we then project each face (mean subtracted) onto the eigenvectors to get the eigen projection weights:
$$\vec{w}_j = V^T \vec{y}_j \qquad (2.36)$$
where $\vec{w}_j$ is a k-dimensional vector of weights; stacking these for all faces gives a k × m matrix in which each row corresponds to the weights for one eigenvector (eigenface).

• From the previous step, reconstruction can easily be done for face j as
$$\hat{x}_j = \sum_{i=1}^{k} w_{ji} \vec{v}_i + \bar{x} \qquad (2.37)$$
The addition of the mean is important since, while computing the eigenvectors, we subtracted the mean to center the axes.

2.4 Toward a Parametric Face

The ramifications of having a 3D face that can be represented by a small set of parameters are plenty. Our primary goal of parameterization is to create a set of bins that, once created, can be used to partition the space of human faces. Now, when we want to recognize a face using matching algorithms, we would only need to match it with the set of faces in a bin. This reduction in search space leads to a large reduction in time complexity.
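The eigenface computation of Section 2.3 can be illustrated with a short sketch. This is a minimal numpy illustration of the procedure, not the code used in this work; the function and variable names are hypothetical.

```python
import numpy as np

def eigenfaces(X, k):
    """X: d x m matrix of vectorized faces (one column per face).
    Returns the mean face, the top-k eigenfaces and the k x m weights."""
    mean = X.mean(axis=1, keepdims=True)
    Y = X - mean                        # mean-subtracted faces (Eq. 2.30)
    m = X.shape[1]
    Cp = (Y.T @ Y) / m                  # small m x m matrix (Eq. 2.33)
    vals, Vp = np.linalg.eigh(Cp)       # eigendecomposition (Eq. 2.34)
    order = np.argsort(vals)[::-1][:k]  # indices of the top-k eigenvalues
    V = Y @ Vp[:, order]                # lift to d-dim eigenvectors (Eq. 2.35)
    V /= np.linalg.norm(V, axis=0)      # normalize each eigenface
    W = V.T @ Y                         # projection weights (Eq. 2.36)
    return mean, V, W

# Reconstruction of face j (Eq. 2.37): mean + V @ W[:, j]
```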
For representing the finer features of the face, we sampled approximately 100 × 100 points based on the superquadric coordinate space. We assumed (and verified through the trace of the eigenvalues) that the axes of the fitted superquadric better represent the symmetry of human faces compared to raw and CFDM scans. Then, using the steps described in Section 2.3, we computed the eigenvectors.

We compared an approximate 100 × 100 feature mesh under three different conditions:

1. MSU dataset with superquadric normalization (raw dataset, with no preprocessing)
2. MSU dataset with CFDM normalization (as described in [47], and no superquadric normalization)
3. MSU dataset with both CFDM and then superquadric normalization

To compare the meshes from these three different normalizations, we looked at the trace of the eigenvalues. We found that the top 10 eigenvalues for (1) contained 82% of the energy, while for (2) they contained about 97% of the energy. (3), on the other hand, did better with about 99% of the energy. Thus, we see that superquadric normalization along with CFDM normalization provides a better normalizing coordinate space.

Chapter 3

Clustering Representations

In this chapter, we will look at the results of clustering the representations described in earlier chapters. First, we provide an overview of the clustering methods used in our work and their objective functions.

3.1 A Note On Clustering

Traditionally, clustering is an unsupervised method used to discover previously unknown patterns in data by grouping samples based on some similarity measure. Depending on the application/problem, clustering can involve a variety of algorithms. Given a set of n d-dimensional vectors $\vec{x}_1 \dots \vec{x}_n$, the objective of clustering is to identify groups or partitions in the dataset. The challenges largely involved in clustering can be grouped into the following:

1. To identify clusters (of varied shapes), if knowledge of the total number of clusters k is given
2. To reliably discover k

To address problem 1, there are many solutions, and depending on the application the choice of algorithms is wide. Spectral clustering has been used in document clustering, speech recognition and other applications [77,78]. Hierarchical clustering is used when there exists a tree-like structure in the data, i.e. groups are better represented in a layered manner, with higher layers subsuming lower ones [79]. Other algorithms include (but are not restricted to) probabilistic k-means, learning vector quantization (LVQ), etc.

3.1.1 Unsupervised Learning - Kmeans

Of the many algorithms, k-means still continues to enjoy popularity, principally because of its simple yet efficient objective: that a sample is best associated with the cluster it is most "similar" to [80,81]. A good tutorial on its history and applications is provided by Jain [82]. The objective function is defined as:
$$\arg\min \sum_{i=1}^{k} \sum_{\vec{x}_j \in C_i} \| \vec{x}_j - \mu_i \|^2 \qquad (3.1)$$
where there are k clusters $C_i$, $i = 1, 2, \dots, k$, and $\mu_i$ is the centroid of cluster $C_i$. Thus, we aim to minimize intra-class variance and maximize inter-class variance. Since this is an unsupervised algorithm, the choice of k and the initial locations of the centroids is critical.
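A minimal sketch of Lloyd's algorithm for the objective in Equation 3.1 is given below. This is an illustration rather than the implementation used in our experiments; the iteration count and random initialization are hypothetical choices.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm for Eq. 3.1. X: (n, d) float data, k clusters.
    Returns cluster labels and the (k, d) centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        # assignment step: nearest centroid under squared L2 distance
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: each centroid becomes the mean of its members
        for i in range(k):
            if np.any(labels == i):
                centroids[i] = X[labels == i].mean(axis=0)
    return labels, centroids
```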
Although k-means remains popular, there are some inherent disadvantages to this model. First, k-means tends to discover only Gaussian-shaped clusters. This can be highly limiting if we wish to partition datasets whose clusters are not Gaussian shaped; methods such as spectral clustering can be used to discover such shapes. Further, association to a group is very rigid with this model. Fuzzy k-means, probabilistic k-means, etc. are commonly employed to create a soft association of a sample with a cluster. In probabilistic k-means, every sample is given a probability of association with each group, such that the probabilities sum to 1, thus modeling a probability distribution.

3.1.2 Semisupervised Clustering

In real-world problems, there often exists some information about the data that could be useful in discovering new information/patterns. This information can be in terms of user feedback, or constraints such as those imposed by the real world - e.g. the height of a human cannot be less than a foot or greater than 9 feet. Constraints can also be of a nature such that they determine similarity between particular pairs of points (not necessarily all). Semi-supervised clustering employs a small number of such constraints to aid the unsupervised clustering. We use the clustering model proposed by Bilenko et al. [83,84], who modified an unsupervised k-means based clustering model to include metric learning and constraints.

Let M be a set of must-link pairs of data points, where $(x_i, x_j) \in M$ implies $x_i$ and $x_j$ must lie in the same cluster. Similarly, $(x_i, x_j) \in D$ implies $x_i$ and $x_j$ must lie in different clusters. If we modify Equation (3.1) to include constraints, then we get the following equation:
$$\mathcal{J}_{\mathrm{pckmeans}} = \sum_{x_i \in X} \| x_i - \mu_{l_i} \|^2 + \sum_{(x_i, x_j) \in M} w_{i,j} \, \tau[l_i \neq l_j] + \sum_{(x_i, x_j) \in D} \bar{w}_{i,j} \, \tau[l_i = l_j] \qquad (3.2)$$
where $W = [w_{i,j}]$ is the weight matrix for forcing pairs of points to belong to the same cluster and $\bar{W} = [\bar{w}_{i,j}]$ is the weight matrix for forcing pairs of points not to belong to the same cluster. Lastly, τ is the indicator function: τ[true] = 1 and τ[false] = 0.

If we wish for tighter clustering, then we constrain the rank of the weight matrix A. If $A_{l_i}$ represents the weight matrix for each cluster, then Equation (3.2) can be modified to
$$\mathcal{J}_{\mathrm{mpckmeans}} = \sum_{x_i \in X} \left( \| x_i - \mu_{l_i} \|^2_{A_{l_i}} - \log(\det(A_{l_i})) \right) + \sum_{(x_i, x_j) \in M} w_{i,j} \, \tau[l_i \neq l_j] + \sum_{(x_i, x_j) \in D} \bar{w}_{i,j} \, \tau[l_i = l_j] \qquad (3.3)$$
In [83], they further modify Equation 3.3 so that the constraint penalties account for the appropriate cluster weight matrices, and they propose an optimization algorithm for solving the objective function.

3.2 Clustering Results

This section discusses the clustering experiments that we conducted during the course of our research. The data set in use consists of 107 subjects with 3 scans each. The accuracies reported are based on the effectiveness of the clustering algorithm in placing at least two of the sister scans in one bin; note that this is equivalent to calculating the precision of the system. The clustering procedure includes subtracting the mean and normalizing the data set before running the clustering algorithm. It is interesting to note that the choice of normalization affects recall accuracies; from our work we find that the L2 norm works best.

From Figure 3.1 and Figure 3.2, we can see that the accuracy decreases with the number of clusters. Perhaps the accuracy of the shape parameter based clustering is slightly better. We propose a hybrid method of clustering using shape and PCA based Eigen parameters on the CFDM and superquadric normalized data set.
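Under our reading of the measure just described (a subject counts as correctly recalled if at least two of its sister scans land in the same cluster), the accuracy computation can be sketched as follows. The three-scans-per-subject layout is taken from the text; the function and variable names are hypothetical.

```python
import numpy as np
from collections import Counter

def sister_scan_recall(cluster_labels, subject_ids):
    """Fraction of subjects with at least two sister scans in one cluster.
    cluster_labels[i] is the cluster of scan i; subject_ids[i] its subject."""
    labels = np.asarray(cluster_labels)
    ids = np.asarray(subject_ids)
    hits = 0
    subjects = np.unique(ids)
    for s in subjects:
        scans = labels[ids == s]
        # a subject is a hit if any single cluster holds >= 2 of its scans
        if Counter(scans.tolist()).most_common(1)[0][1] >= 2:
            hits += 1
    return hits / len(subjects)
```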
Figure 3.1: Plot of recall accuracy against cluster size for the CFDM and superquadric normalized dataset, with no Eigen and 5 shape parameters used.

From Figure 3.3, the hybrid method clearly shows promise. We can see that a cluster size between 10-15 is easily and robustly achievable (over 90%). From Figure 3.4 it is also evident that the use of too many PCA weights actually worsens the performance; the optimal number of PCA weights seems to be about 20. Finally, Figure 3.5 shows the cluster plots for the raw data set without CFDM normalization. Clearly, the performance of a hybrid clustering system on the raw data set outperforms k-means.

Figure 3.6 shows the performance of clustering on the CFDM and superquadric normalized data set. Clustering was done using both default constraints and manual constraints with the RISC MPCK-means toolbox; the equation in [83] is employed in the toolbox. We can see that the accuracies are slightly better for larger cluster numbers with the semi-supervised clustering method. Since it is beneficial for us to have as many clusters as possible, this method is interesting.

Figure 3.2: Plot of recall accuracy against cluster size for the CFDM and superquadric normalized dataset, with 10 Eigen parameters and no shape parameters used.

Figure 3.7 shows the rank accuracy of retrieving a sister scan from the test set based on the parameters. We see that an accuracy of above 80% is achievable by using a mixture of shape parameters and Eigen projection weights. Note that this ranking is based on the entire set. The ranking results must be interpreted in conjunction with the clustering procedures, i.e. if we searched for the sister scans in the appropriate cluster, then the number of comparisons could be lower. Since we might have a few 10s or 100s of highly dense clusters, it would save us a lot of overhead if we only had to match the probe face against the nearest gallery faces.

3.3 Time Complexity

Table 3.1 shows some details of the time and space complexity of the various procedures used in this project. As stated in Colbry et al. [2], a probe scan can be matched with an entire gallery (330 scans) in about 5 seconds. If we assume a 15:1 gain in performance by moving from Matlab to C++, the time taken to fit a probe scan to a superquadric is quite small. The only large computational cost is the time taken to sample a mesh from the superquadric space. Currently, we try to find the nearest point between a 100 × 100 mesh and a point on the 3D face; this can be simplified (approximated) and sped up by sampling from a fixed corner. So, if we had a large gallery (say, 10000 unique subject scans), we would still take about 150 seconds to verify a probe scan. This, although feasible, does not make for a very practical system.

Figure 3.3: Plot of recall accuracy against cluster size for the CFDM and superquadric normalized data set, with 10 Eigen and 5 shape parameters used.
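Before turning to partitioning, the arithmetic behind these estimates can be made explicit. This is a back-of-envelope sketch, not a measurement: the per-comparison cost is derived from the 5-second/330-scan figure above, and the gallery size and cluster count match the scenarios discussed in the text. It ignores the cost of selecting which cluster to search.

```python
# Rough verification-time estimate under clustering (figures from the text).
per_match = 5.0 / 330                  # seconds per gallery comparison
gallery = 10000                        # hypothetical large gallery
brute_force = gallery * per_match      # ~150 s, as stated above
clusters = 50
clustered = (gallery / clusters) * per_match  # match within one cluster only
print(f"brute force: {brute_force:.0f} s, clustered: {clustered:.1f} s")
```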
Now, if we are able to partition the data set into even 50 equal-sized clusters, then the time taken to verify can be reduced to 1/30th of a brute-force search, by matching only against the scans in a cluster. Further, if we could rank the clusters by similarity, we could parallelize the matching against multiple clusters and the verification within them. This could allow for a real-time system at large scale. The only roadblocks are the scanning time, robust partitioning and the time taken to extract features. Work on low-cost, high-speed 3D scanners promises to reduce both the time and cost of procuring good face scans. So, in the future, a robust partitioning schema, along with good features and a parallelized matching system, could make for a practical face recognition system even for large galleries.

Figure 3.4: Plot of recall accuracy against cluster size for the CFDM and superquadric normalized data set, with 50 Eigen and 5 shape parameters used.

3.4 Feature Analysis

3.4.1 Submeshed Projections

Our motivation is that a single mesh of approximately 100 × 100 data points centered around the nose is not effective in capturing all the variance in the human face. So, in this section we explore submeshing and the corresponding Eigen projections.

Figure 3.5: Plot of recall accuracy against cluster size for the raw data set, with 50 Eigen and 5 shape parameters used.

We split the 100 × 100 mesh into smaller meshes of 25 × 25, thus resulting in 16 submeshes. We then apply Eigen decompositions to each of the submeshes, project each submesh onto a few eigenvectors and compute weights. These weights are the parameters for each submesh. Table 3.2 studies the variance captured by the top 3 eigenvalues in each of the submeshes. The submeshes are arranged the same way we'd view a face, i.e. the entry (1,1) represents the right eye (user-centric) of a normalized scan. This provides a slightly intuitive understanding of human faces: we see that the regions around the lips and nose are more sensitive than the other regions.

From the experiments so far, it is clear that a simple parametrization using superquadrics will not suffice. To study this further, we can look at the entropy estimates of the parameters. The entropy H(x), where $f(x_i)$ is the empirical probability of a particular feature value/data point $x_i$, is defined as
$$H(x) = - \sum_{i=1}^{n} f(x_i) \log_2(f(x_i)) \qquad (3.4)$$

Figure 3.6: Plot of recall accuracy against cluster size for the CFDM and superquadric normalized data set, with 20 Eigen and 5 shape parameters, using the MPCK-means algorithm.

Table 3.3 shows a comparison of entropy estimates using Equation 3.4. We see that the entropy of the superquadric parametrization in conjunction with the submeshed Eigen projections clearly outperforms the other parametrizations. We apologize for the abrupt inclusion of the Gabor features' projections in this table, but we include it for completeness; the 100 here refers to the number of points sampled on the convex hull and the 10 refers to the number of eigenvectors employed.
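A sketch of how such entropy estimates can be computed from an empirical histogram follows (Equation 3.4). The thesis does not specify its estimator, so this is one plausible reading, and the bin count is a hypothetical choice.

```python
import numpy as np

def entropy_estimate(values, bins=20):
    """Empirical entropy (Eq. 3.4) of a 1-D feature from a histogram."""
    counts, _ = np.histogram(values, bins=bins)
    f = counts / counts.sum()          # empirical probabilities f(x_i)
    f = f[f > 0]                       # 0 * log(0) is taken as 0
    return -(f * np.log2(f)).sum()
```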
Intuitively, the submeshed parametrization makes sense as well. Superquadrics can capture large variances in faces: the length of a face, its general latitudinal and longitudinal curvatures, etc. In fact, the entropy estimates of just the two curvatures of superquadrics are fairly high, with the east-west curvature capturing more information.

Figure 3.7: Rank accuracy: retrieving sister scans based on parameters, plotted against the number of neighbors, for four parameter sets (no PCA; shape + 5 PCA; shape + 10 PCA; shape + 30 PCA).

Process                                     Details
(320 x 240) Scan Time                       ≈ 2 sec
Curvature Calculation                       0.42 sec
Symmetry Alignment (ICP)                    0.23 sec
Orthogonal Projection                       0.22 sec
Total CFDM Generation                       0.85 sec
Verifying Probe vs Gallery (330 scans)      ≈ 5 sec
Raw File                                    1.58 MB
Compressed Raw File                         0.88 MB
400 x 700 CFDM file (ppm)                   1.68 MB
400 x 700 Compressed CFDM                   0.24 MB
Fitting Superquadrics                       ≈ 5 sec (Matlab)
Sampling From Superquadric Space            ≈ 3 min (Matlab)
Euclidean Matching to a Cluster             << 1 sec (Matlab)
Time Taken to Calculate PCA weights         << 1 sec (Matlab)

Table 3.1: List of processes/data formats and their corresponding times and memory requirements. Some data from Colbry et al. [2].

Row/Col   Col1   Col2   Col3   Col4
Row1      95%    94%    93%    93%
Row2      97%    92%    92%    96%
Row3      97%    89%    90%    94%
Row4      94%    96%    94%    91%

Table 3.2: Table showing the variance captured by the top 3 eigenvalues of each submesh.

Parameter Type                              Entropy Estimate
Superquadric Parameters                     2.42
Superquadric Curvatures (ε1, ε2)            [1.6, 6.7]
Eigen Projection (100 x 100)                1.02
Eigen Projection (25 x 25)                  4.38
Gabor Features Projections (100, 10)        2.33

Table 3.3: Table comparing entropy estimates of superquadric fitting, 100 × 100 Eigen projections and 25 × 25 Eigen projections.

This goes to show that there is a lot more information in the curvatures of faces than in the points themselves in a Euclidean space. The real variance lies in finer details such as the length of a nose, the pouty nature of lips, high cheek bones and so forth. A submeshed projection thus captures these variances better, assuming a well normalized data set.

Chapter 4

Parameterizing Expressions: Recognizing Smiles

The study of expressions is important for multiple reasons. Detecting expressions benefits many applications, such as gauging user interest, stress levels, etc. Another key reason to study expressions is that face recognition is often confounded by expressions. There has been a lot of work studying expressions in the face research community, largely dealing with 2D images, and varying degrees of success have been achieved in detecting various affective states. Edwards et al. [85] use the Active Appearance Model (AAM) to model expressions, using key points and PCA for both texture and shape information. Gabor filter based expression recognition is also a well studied approach; Lyons et al. [86] use Gabor banks to model expressions. Other approaches to expression modeling include elastic bunch graph matching algorithms, as seen in Wiskott et al. [87].

As far as 3D faces are concerned, the following approaches have been explored with varying degrees of success. We present an overview only for completeness and not as a survey of existing methods. Wang et al. [88,89] present a 3D primitive feature analysis (Topographic Context), where they segment the face into 8 regions and label the sub-regions based on their convexity.
They then create feature vectors by computing the ratios between the various labels in a region. This ratio, they claim, is unique and sufficient to represent different expressions, and they show fairly successful results based on this model. Although they do compare their work with a Gabor bank based approach, we wish to point out that they merely computed a simple intensity calculation at manually located fiducial points. Huang et al. [90] present some work on calculating 3D image feature vectors based on Euclidean distance computation between manually selected anchor points. Other interesting work includes Mpiperis et al. [91], who propose a bilinear model that uses anchor points and regularizers to smooth the fit between the model (to be fit) and the surface (given), using both model-to-surface and surface-to-model distances. They claim that such a model helps to learn regions of the surface such as valleys and ridges.

Thus, although there exists a critical body of work studying expressions in 3D face data, there exists no truly automatic system for expression recognition that does not use pre-calculated or manually chosen anchor points (during training). Although there exist models, such as Active Shape Modeling using Point Distribution Models [29], that can alleviate this setback fairly well, they inevitably add more computational time. In this work, we attempt to parameterize expressions (smiles) without the use of anchor points. The chapter is organized as follows: we first review some theoretical concepts of Gabor filters, we then describe our method of parameterization, and lastly we describe our experimental results.

4.1 Gabor Filters

Early cortical cells in the V1 region of the visual cortex have been shown to be sensitive to both orientation and spatial frequency [7,92]. Daugman further suggested that a cell's response can be well approximated using a 2D Gabor filter. Thus, computer vision researchers have taken a keen interest in Gabor filters. Gabor filters have been used extensively in computer vision research, and even more so in biometrics research; they are often used for finding key points in recognition algorithms [87,93,94]. We refer the reader to survey papers for a clear overview of the algorithms that employ Gabors for authentication [95] and expression [96]. A brief but rigorous tutorial on Gabors by Movellan can be useful to the interested reader [97].

In our work, we use such a Gabor filter bank. A 2D generalization of the Gabor function is given by
$$\Psi_i(\vec{x}) = \frac{\|\vec{k}_i\|^2}{\sigma^2} \, e^{-\frac{\|\vec{k}_i\|^2 \|\vec{x}\|^2}{2\sigma^2}} \left[ e^{j \vec{k}_i \cdot \vec{x}} - e^{-\frac{\sigma^2}{2}} \right] \qquad (4.1)$$
Each $\Psi_i$ is a plane wave, characterized by the vector $\vec{k}_i$, enveloped by a Gaussian function, where σ is the standard deviation of this Gaussian. The center frequency of the ith filter is given by the wave vector
$$\vec{k}_i = \begin{pmatrix} k_{ix} \\ k_{iy} \end{pmatrix} = \begin{pmatrix} k_v \cos\theta_\mu \\ k_v \sin\theta_\mu \end{pmatrix} \qquad (4.2)$$
Each filter is represented by a scale $k_v$ and an orientation $\theta_\mu$. We computed the Gabor responses for a standard set of frequencies and orientations, $v \in \{0, \dots, 4\}$ and $\theta_\mu \in \{-\pi, -3\pi/4, \dots, \pi\}$. The first term in (4.1) deals with the periodicity of the filter, while the term within the square braces compensates for the DC value of the kernel. This operation makes the filter insensitive to illumination.

Figure 4.1: A typical Gabor filter bank. Left to right are the changing frequencies; top to bottom are the changing spatial orientations. Typically, an input face is convolved with each of the filters to give a Gabor intensity map, and features are extracted from the various intensity maps for the specific application.
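A minimal sketch of constructing one filter from Equations 4.1 and 4.2 and convolving it with a depth map is given below. The kernel size, σ = π, and the mapping from the scale index v to a frequency magnitude are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(v, mu, sigma=np.pi, size=31):
    """One filter from Eqs. 4.1-4.2: scale index v, orientation index mu."""
    kv = (np.pi / 2) * 2.0 ** (-v)        # assumed scale-to-frequency map
    theta = mu * np.pi / 4 - np.pi        # orientations -pi, -3pi/4, ...
    k = np.array([kv * np.cos(theta), kv * np.sin(theta)])
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    sq = x ** 2 + y ** 2                  # ||x||^2 over the kernel grid
    k2 = kv ** 2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * sq / (2 * sigma ** 2))
    wave = np.exp(1j * (k[0] * x + k[1] * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * wave                # complex kernel (Eq. 4.1)

def gabor_response(depth_map, v, mu):
    """Magnitude of the convolved response (one 'intensity map')."""
    return np.abs(fftconvolve(depth_map, gabor_kernel(v, mu), mode='same'))
```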
Gabor filters are nothing but products between a harmonic function (4.2) and a Gaussian envelope (4.1). Thus, they can be thought of as edge or contour detectors that detect both valleys and ridges. With respect to face research, they can help detect many key regions of a face: the eyes, the tip of the nose, lips, moles, scars, etc. The different orientations are controlled by the harmonic function, and the scale or frequency is controlled by how often the Gaussian envelope repeats itself. Each filter is convolved with the input face, and the high-response regions are the potential regions of interest in the original image. Some features that are larger (eye sockets) will show up at a lower scale, while features that are smaller (like moles) show up at larger scales. Depending on the application, the responses of a few filters can be combined to increase the robustness of detection.

Figure 4.2: Frontal scan with no expression of subject 002 from the MSU-Face database.

Figure 4.3: Frontal scan with smile expression of subject 002 from the MSU-Face database.

Figure 4.4: Convolved responses of the no-expression frontal scan (Figure 4.2) using the Gabor bank described in Figure 4.1. Notice how some responses identify the regions around the eyes, nose and lips, while others tend to identify regions that are not easily identifiable. Note also how the Gabor responses identify the boundary of the face scan from the input depth map.

Figure 4.5: Convolved responses of the smile-expression frontal scan (Figure 4.3) using the Gabor bank described in Figure 4.1. Notice how some responses identify the regions around the eyes, nose and lips, while others tend to identify regions that are not easily identifiable. Note also how the Gabor responses identify the boundary of the face scan from the input depth map.

Figure 4.6: Block diagram showing the procedure for parameterizing smiles: normalized input face scan → Gabor filter bank → blob detection → parameterize blobs (convex hulls + PCA on perimeter points).

Let us number the columns and rows of Figures 4.4 and 4.5 as 1 through 8 (left to right) and 1 through 5 (top to bottom). In Figure 4.4 we see at (2,5) the eyes, nose and lip regions identified. These features can also be identified with some other filters, but it is clearest here. Similarly, in Figure 4.5 we see the same 4 features identified, except that the shape of the blobs is different; they seem somehow "softer", thus denoting the smile. These features are almost repeatably identified in all input scans.

4.2 Cluster Parametrization: A Convex Hull And PCA Approach

4.2.1 Parameterization of Expressions

Our approach to parameterizing expressions (smile versus no expression) uses the simple idea of detecting regions of interest on faces using Gabor filter banks. These regions of interest, as shown in the figures above, vary in size and shape between faces that are smiling and those that are not. We then attempt to parameterize these clusters using an ASM-like approach; Figure 4.6 explains the procedure. As described in the previous section, we obtain a set of 40 convolved images for every input face. On our data, the filter with scale $v = 1$ and orientation $\theta_\mu = 0$ responds most discriminatively for the task of smile recognition.
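The hull-and-PCA parameterization of Figure 4.6 can be illustrated as follows, assuming blob pixel coordinates have already been extracted from the chosen Gabor response. The uniform perimeter resampling is our reading of "sample uniformly along the perimeter"; the function names and the eigenbasis handling are hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull

def perimeter_samples(blob_xy, n=100):
    """Uniformly resample n points along the convex hull of one blob."""
    hull = ConvexHull(blob_xy)                    # Qhull, as in the thesis
    loop = np.append(hull.vertices, hull.vertices[0])   # close the polygon
    verts = blob_xy[loop]
    seg = np.linalg.norm(np.diff(verts, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])   # arc length at each vertex
    s = np.linspace(0, t[-1], n, endpoint=False)  # n uniform arc positions
    x = np.interp(s, t, verts[:, 0])
    y = np.interp(s, t, verts[:, 1])
    return np.column_stack([x, y]).ravel()        # 2n-dim perimeter vector

def cluster_weights(perimeter_vecs, k=10):
    """PCA projection weights of the perimeter vectors (the parameters)."""
    Y = perimeter_vecs - perimeter_vecs.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return Y @ Vt[:k].T                           # one k-vector per blob
```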
Figure 4.7: Gabor responses for smile versus neutral expression, along with the input scans.

Figure 4.8: Blobs identified on the Gabor response of a neutral scan of subject 13 from the MSU database.

From the response shown in Figure 4.7, we perform blob detection. To do this effectively, we first remove the high-intensity responses around the border of the image and then perform blob detection based on simple relative geometric locations. We experimented with various clustering algorithms to identify the blobs robustly, but found that blob detection outperformed clustering methods such as nearest neighbour, shortest distance and average distance clustering. Figure 4.8 shows an example of blobs identified on Gabor responses.

Once the 4 blobs have been identified from the Gabor response, a convex hull is extracted (see Appendix A for more details on convex hulls). Our idea is to sample uniformly along the perimeter (hull) of each cluster. We then perform PCA on the perimeter points. The projection weights of the perimeter points of each cluster are the parameters for that cluster. Thus, we obtain 4 sets of parameters for each face.

4.2.2 Recognizing Smiles

We evaluated our proposed methods on the Michigan State University Canonical Face Depth Map (MSU-CFDM) dataset used by Colbry et al. [47], normalized using the procedures stated earlier. We mainly employed Support Vector Machines for classification; although we did experiment with quadratic kernels (the equivalent of QDA) and even tree classifiers, we found that SVMs worked best. All results reported below are based on 10 rounds of hold-out cross validation: 80% of the data was used to train the classifier and 20% was used to test its accuracy.

Prin. Components   Accuracy   Variance
5                  75%        .002
10                 76%        .003
15                 71%        .003

Table 4.1: Performance of the feature vector for classification of neutral vs smiling faces, using 200 points on the convex hull.

From Table 4.1, there seems to be a critical point in the number of principal components used with respect to classification accuracy: too many or too few projection subspaces resulted in poor classification accuracies.

No. of Points   Accuracy   Variance
200             76%        .003
100             78%        .003
50              74%        .002

Table 4.2: Performance of the feature vector for classification of neutral vs smiling faces, using varying numbers of points on the convex hull projected onto 10 principal components.

Table 4.2 compares performance accuracies using various perimeter point densities. Here we see that if we use too few points or too many, the accuracy of classification tends to decrease.
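The hold-out evaluation protocol described above can be sketched with scikit-learn as follows. This is an illustration of the protocol rather than our original code, and the SVM kernel and regularization settings are assumptions, since the thesis does not specify them.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def holdout_accuracy(X, y, rounds=10, test_size=0.2):
    """Mean and variance of accuracy over repeated 80/20 hold-out splits.
    X: feature vectors (e.g. hull-PCA weights); y: smile / no-smile labels."""
    accs = []
    for seed in range(rounds):
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        clf = SVC(kernel='rbf', C=1.0).fit(Xtr, ytr)   # assumed settings
        accs.append(clf.score(Xte, yte))
    return np.mean(accs), np.var(accs)
```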
No. of Eigen Projections   Accuracy   Variance
3                          80%        .001
5                          76%        6e-4
10                         73%        .001

Table 4.3: Performance of the feature vector for classification of neutral vs smiling faces, using submeshed Eigen projection weights.

We also experimented with training classifiers using the feature vector from the submesh Eigen projections; the accuracies are reported in Table 4.3. We see that it performs marginally better than the features extracted from Gabor filtering, with lower variances, and thus possibly more stable classification.

Chapter 5

Discussion

We have shown a novel parametric form for 2.5D faces, using superquadrics and PCA based parametrization. Colbry and Stockman [47] show a robust normalized face-space in their work, and show how a robust face-space can lead to faster and more accurate matching of face pairs. We show that we can find an even more robust normalized face-space. Such a face-space has beneficial impacts in other areas, such as robotic surgeries. We show that the average rotations are on the order of $10^{-3}$ radians for all 3 rotations on the CFDM data set. Although the mean squared error (MSE) of fitting is large, the axes obtained show that superquadrics help find a good coordinate space for faces. This result is again verified by the trace of the eigenvalues, showing that the sum of the top 10 eigenvalues captures about 99% of the energy.

It is important to note that we believe superquadrics cannot represent 3D faces completely (and accurately). They only help capture larger shape variations, which help to find about 20 clusters with an accuracy of about 85%. It would be fair to say that they only capture larger variations, such as overall curvature and diameters. Since the set of geometric shapes spanned by 3D faces is strictly ellipsoidal (or superellipsoidal), the results presented here support this idea.

To speed up traditional biometric verification and recognition systems, we propose a method of partitioning the data set. We show that cluster numbers between 20-30 can give a fairly accurate recall of over 90% using simple k-means clustering. Employing the MPCK-means algorithm, we show that we can create more clusters (30-40) with an accuracy of about 80%. Thus, clustering on the face data set could give us an order-of-10 speedup in large data systems using as few as 30 shape parameters. Given a probe face, we can find its sister scan (another scan of the same subject) within 20 matchings with a probability of 0.85, as opposed to matching it against all 214 scans. If each matching costs about a second, then even with a large gallery we can match a person in about 20 seconds.

We also propose a fully automatic method of extracting features for recognizing smiles, and we show that we can reliably recognize smiles at about 80% accuracy. There are many limitations to our current model; the responses of the Gabor filters are sometimes insufficient to reliably extract the convex hull. Thus, to achieve a truly automatic system, we will have to assume at least a robust normalization schema. It is important to note that the time complexity of our system is fairly trivial: given a normalized face, we were able to estimate the coefficients in under a few seconds. Assuming a 15-fold increase in performance using C++, we can perform recognition in real time. It is also important to note that our dataset consisted of samples with many holes in them, due to features such as hair and anomalies where the reflected laser light was not captured (examples in the Appendix and Chapter 2). Thus, the results reported in this thesis are a conservative estimate, with improved results expected on better data.

Future Work

We would like to explore how the clustering methods perform when the number of scans per person in the data set increases. We also plan to test our system on a larger data set, such as the Face Recognition Grand Challenge v2 (FRGC) data set [54], to verify the performance of our results. Another very interesting dimension to study would be the average number of clusters that must be evaluated before a match is made; we expect this value to be fairly low on large databases.

It is clear from the literature and Table 3.3 that shape curvatures carry a lot of discriminative power. Nevertheless, there is a strong requirement to obtain a truly automated feature space.
There exists room to try more hybrid approaches based on the extraction of facial features and surface features. We hypothesize that extracting shape curvatures for the regions enveloped by the convex hulls (proposed in our method) using Gabor filters could perform very well for parameterizing expressions. Extracting shape curvatures for such select regions would also greatly reduce computational cost while still maintaining autonomy. Our future work will involve working along these lines and testing our work on other datasets, such as those presented by Yin et al. [89]. Using shape curvature descriptive methods, such as those proposed by Wang et al. [88], could also be a good way to parameterize 3D faces for clustering.

The problem of representing 3D faces for clustering is a recent problem, and this thesis presents one approach to representing 3D faces. We also hope that it has shed some light on the usefulness of superquadrics for representation and clustering, and that this work lays some groundwork for practical performance on these problems.

Chapter 6

Appendix

6.1 A - Convex Hull

Given a set of points X, the convex hull is the set of points $\hat{X}$ such that $\hat{X}$ envelopes the set of points X. There are many algorithms to determine the convex envelope, or convex hull, of a given set of points; we refer the reader to [99] for some of them. Convex hulls have many applications in computer vision, computer graphics, Geographic Information Systems (GIS) and many other fields. The convex hull is related to the Delaunay triangulation and can also be related to the Voronoi diagram [100]. We extract the convex hull using the qhull package [101] through Matlab [102]. For a given set of points, we first extract the convex hull as shown in Figure 6.1.

Figure 6.1: Convex hull (blue line) of a random set of points (red plus) drawn from a normal distribution with mean μ = [1, 2] and covariance Σ = [0.9, 0.2; 0.2, 0.5].
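In the same spirit as the qhull/Matlab extraction above, a small scipy sketch (scipy's ConvexHull wraps the same Qhull library, and is the call used in the pipeline sketch of Section 4.2) reproduces the construction of Figure 6.1. The point count and random seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Random points comparable to Figure 6.1 (mean and covariance from the caption).
rng = np.random.default_rng(0)
pts = rng.multivariate_normal([1, 2], [[0.9, 0.2], [0.2, 0.5]], size=200)

hull = ConvexHull(pts)       # Qhull under the hood, as with Matlab's qhull
print(hull.vertices)         # indices of the points on the hull, in order
print(hull.area)             # in 2-D, the 'area' attribute is the perimeter
```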
6.2 B - Dataset Description

The data used in this research was collected by former labmate Dirk Colbry and professors George Stockman and Anil Jain. This appendix briefly describes the dataset collected. The 3D models used in this study were produced by the Minolta Vivid 910 3D scanner, which can generate a 2.5D image of the face in less than 2 seconds. A 2.5D image is a reduced representation of a 3D image in that it does not define a mathematical solid; that is to say, the 2.5D face image does not divide 3D space into 3 regions - the region outside the 3D face, the region on the surface of the 3D face and the region inside the 3D face (hollow space). The Vivid 910 provides scans accurate up to 0.2mm [103]. The Vivid 910 estimates structure from lighting: it projects a horizontal plane of laser light onto the surface of an object, and a 640 x 480 pixel color camera picks up the curve produced by the interaction of the laser light with the object. It then triangulates the depth of the illuminated surface points. Since this type of sensor is well calibrated, it produces scan depth values accurate to less than a millimeter (±0.10mm in fine mode).

There are two major modes of operation for the Vivid 910 scanner - fast and fine. Table 6.1 describes the operation modes and scan times in detail.

Operation      Fast    Fine    Description
Camera setup   14.2    14.2    Camera calibration and setup; required only during start up
Focus          3.4     3.4     Minor focus adjustment, if the subject moves
Load           0.072   0.072   Scanner setup time
Scan           0.87    3.74    Scanning time
Color          0.85    0.85    Record a color image
Release        1.72    4.59    Total time the subject needs to hold still
Download       1.04    1.92    Download image from scanner to computer
Total          2.83    6.58    Total time for processing (without setup/focus)

Table 6.1: Operation of the Vivid 910 scanner (times in seconds). Details from the PhD dissertation of Dirk Colbry [3].

The Michigan State University Canonical Face Depth Map (MSU-CFDM) dataset consists of scans taken only in the fast mode, with scan resolutions set at 320 x 240 pixels; this scanning takes place in less than a second. The dataset consists of scans from 111 subjects. For 107 subjects exactly 3 scans were collected (4 subjects had only 1 or 2 scans each and were exempt from our analysis). Thus, in total 321 scans were used in this research.

Institutional Review Board (IRB) Approval

All data collected in this research met IRB approved standards and procedures. The IRB number for this research is #03-851, titled "Face modeling for communication and recognition".

6.3 C - Sample Superquadric Fitting

This appendix describes some more examples of superquadric fitting.

Figure 6.2: 2.5D facescan of a subject from the MSU-CFDM dataset.

Figure 6.3: 2.5D facescan of a subject from the MSU-CFDM dataset along with a superquadric fit, with the x-y axes visible.

Figure 6.4: 2.5D facescan of a subject from the MSU-CFDM dataset along with a superquadric fit, with the y-z axes visible.

Figure 6.5: 2.5D facescan of a subject from the MSU-CFDM dataset along with a superquadric fit, with the x-z axes visible. The superquadric fit parameters are [64.39, 95.5, 40, 0.9062, 0.8998].

Figure 6.6: 2.5D frontal facescan of a subject from the MSU-CFDM dataset.

Figure 6.7: 2.5D frontal facescan of a subject from the MSU-CFDM dataset along with a superquadric fit. The fit parameters are [56.6, 98.2, 41.72, 1.4742, 0.6360].

Figure 6.8: 2.5D frontal facescan of a subject from the MSU-CFDM dataset.

Figure 6.9: 2.5D frontal facescan of a subject from the MSU-CFDM dataset along with a superquadric fit. The fit parameters are [56.8, 92, 41.67, 1.4469, 0.7807].

Figure 6.10: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing a smile expression.

Figure 6.11: 2.5D frontal facescan with smile expression of a subject from the MSU-CFDM dataset along with a superquadric fit. The fit parameters are [61.05, 90, 44.13, 1.5375, 0.9761].

Figure 6.12: 2.5D frontal facescan that is both poorly normalized and has many holes.

Figure 6.13: The 2.5D frontal facescan of Figure 6.12 (poorly normalized, with many holes) fit with a superquadric. Note how the superquadric finds the axes in spite of the holes and poor normalization.

6.4 D - Expression Modeling Samples

This appendix shows sample scans and the Gabor responses employed for the smile parameterization described in our work.
Figure 6.14: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression.

Figure 6.15: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression, and the absolute Gabor responses.

Figure 6.16: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression.

Figure 6.17: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression, and the absolute Gabor responses.

Figure 6.18: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression.

Figure 6.19: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression, and the real Gabor responses.

Figure 6.20: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression.

Figure 6.21: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression, and the real Gabor responses.

BIBLIOGRAPHY

[1] A. Jain, R. Bolle, and S. Pankanti, Biometrics: Personal Identification in Networked Society. Kluwer Academic Publishers, 1999.

[2] D. Colbry and G. Stockman, "Real time identification using a canonical face depth map," IET Computer Vision, Special Issue on 3D Face Processing, pp. 74-92.

[3] D. Colbry, "Human face verification by robust 3D surface alignment," PhD dissertation, Michigan State University, 2006.

[4] J. M. Henderson, C. Larson, and D. Zhu, "Cortical activation to indoor versus outdoor scenes: an fMRI study," Experimental Brain Research, vol. 179, no. 1, pp. 75-84, 2007.

[5] M. Mudigonda, P. Ramkumar, D. Zhu, G. Stockman, and R. Jin, "Multi-voxel pattern analysis identifies brain regions that discriminate indoor and outdoor scenes," vol. 10, no. 9.

[6] J. Haxby, M. Gobbini, M. Furey, A. Ishai, J. Schouten, and P. Pietrini, "Distributed and overlapping representations of faces and objects in ventral temporal cortex," Science, vol. 293, no. 5539, p. 2425, 2001.

[7] J. Daugman, "Two-dimensional spectral analysis of cortical receptive field profiles," Vision Research, vol. 20, no. 10, pp. 847-856, 1980.

[8] I. Biederman, "Perceiving real-world scenes," Science, vol. 177, no. 43, pp. 77-80, 1972.

[9] J. Belliveau, D. Kennedy Jr, R. McKinstry, B. Buchbinder, R. Weisskoff, M. Cohen, J. Vevea, T. Brady, and B. Rosen, "Functional mapping of the human visual cortex by magnetic resonance imaging," Science, vol. 254, no. 5032, p. 716, 1991.

[10] S. Ogawa, D. Tank, R. Menon, J. Ellermann, S. Kim, H. Merkle, and K. Ugurbil, "Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging," Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 13, p. 5951, 1992.

[11] K. Kwong, J. Belliveau, D. Chesler, I. Goldberg, R. Weisskoff, B. Poncelet, D. Kennedy, B. Hoppel, M. Cohen, and R. Turner, "Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation," Proceedings of the National Academy of Sciences, vol. 89, no. 12, p. 5675, 1992.

[12] G. McCarthy, A. Puce, J. Gore, and T. Allison, "Face-specific processing in the human fusiform gyrus," Journal of Cognitive Neuroscience, vol. 9, no. 5, pp. 605-610, 1997.

[13] P. Ekman and W. Friesen, "Facial action coding system," 1978.
[14] A. O'Toole, H. Abdi, K. Deffenbacher, and D. Valentin, "Low-dimensional representation of faces in higher dimensions of the face space," Journal of the Optical Society of America A, vol. 10, no. 3, pp. 405-411, 1993.

[15] R. Epstein, J. Higgins, and S. Thompson-Schill, "Learning places from views: variation in scene processing as a function of experience and navigational ability," Journal of Cognitive Neuroscience, vol. 17, no. 1, pp. 73-83, 2005.

[16] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," International Journal of Computer Vision, vol. 1, no. 4, pp. 321-331, 1988.

[17] S. Li, "Markov random field models in computer vision," Computer Vision - ECCV'94, pp. 361-370, 1994.

[18] R. Huang, V. Pavlovic, and D. Metaxas, "A hybrid face recognition method using Markov random fields," in Proceedings of the 17th International Conference on Pattern Recognition, 2004, pp. 157-160.

[19] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 509-522, 2002.

[20] D. Lowe, "Object recognition from local scale-invariant features," in ICCV. IEEE Computer Society, 1999, p. 1150.

[21] D. Terzopoulos and D. Metaxas, "Dynamic 3D models with local and global deformations: Deformable superquadrics," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 703-714, 1991.

[22] F. Solina and R. Bajcsy, "Recovery of parametric models from range images: The case for superquadrics with global deformations," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 131-147, 1990.

[23] D. Terzopoulos, A. Witkin, and M. Kass, "Constraints on deformable models: Recovering 3D shape and nonrigid motion," Artificial Intelligence, vol. 36, no. 1, pp. 91-123, 1988.

[24] B. Curless and M. Levoy, "A volumetric method for building complex models from range images," in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques. ACM, 1996, p. 312.

[25] G. Rhodes, F. Proffitt, J. Grady, and A. Sumich, "Facial symmetry and the perception of beauty," Psychonomic Bulletin and Review, vol. 5, pp. 659-669, 1998.

[26] F. Galton, "Personal identification and description," Journal of the Anthropological Institute of Great Britain and Ireland, vol. 18, pp. 177-191, 1889.

[27] B. Manjunath, R. Chellappa, and C. von der Malsburg, "A feature based approach to face recognition," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1992, pp. 373-378.

[28] D. DeCarlo, D. Metaxas, and M. Stone, "An anthropometric face model using variational techniques," in Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques. ACM, 1998, p. 74.

[29] T. Cootes, C. Taylor, D. Cooper, and J. Graham, "Active shape models - their training and applications," Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38-59, 1995.

[30] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

[31] R. Clarke, "Human identification in information systems: Management challenges and public policy issues," Information Technology & People, vol. 7, no. 4, pp. 6-37, 1994.

[32] Y. Itakura, M. Hashiyada, T. Nagashima, and S. Tsujii, "Proposal on personal identifiers generated from the STR information of DNA," International Journal of Information Security, vol. 1, no. 3, pp. 149-160, 2002.

[33] E. Newham, "The biometric report," SJB Services, New York, 1995.
[34] A. Ross and A. Jain, "Information fusion in biometrics," Pattern Recognition Letters, vol. 24, no. 13, pp. 2115-2125, 2003.

[35] ——, "Multimodal biometrics: An overview," in 12th European Signal Processing Conference (EUSIPCO), Vienna, Austria, 2004, pp. 1221-1224.

[36] A. Ross, K. Nandakumar, and A. Jain, Handbook of Multibiometrics. Springer-Verlag New York Inc, 2006.

[37] D. Nitzan, A. Brain, and R. Duda, "The measurement and use of registered reflectance and range data in scene analysis," in IEEE Proceedings, vol. 65, 1977, pp. 206-220.

[38] V. Blanz and T. Vetter, "A morphable model for the synthesis of 3D faces," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press/Addison-Wesley Publishing Co., 1999, pp. 187-194.

[39] T. Vetter and V. Blanz, "Estimating coloured 3D face models from single images: An example based approach," European Conference on Computer Vision, p. 499, 1998.

[40] T. Akimoto, Y. Suenaga, and R. Wallace, "Automatic creation of 3D facial models," IEEE Computer Graphics and Applications, pp. 16-22, 1993.

[41] V. Blanz and T. Vetter, "Face recognition based on fitting a 3D morphable model," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1063-1074, 2003.

[42] P. Mahalanobis, "On the generalized distance in statistics," Proceedings of the National Institute of Sciences, India, vol. 2, 1936.

[43] X. Lu, A. Jain, and D. Colbry, "Matching 2.5D face scans to 3D models," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 31-43, 2006.

[44] U. Park and A. Jain, "3D model-based face recognition in video," Advances in Biometrics, pp. 1085-1094, 2007.

[45] I. Kakadiaris, G. Passalis, G. Toderici, M. Murtuza, Y. Lu, N. Karampatziakis, and T. Theoharis, "Three-dimensional face recognition in the presence of facial expressions: An annotated deformable model approach," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 640-649, 2007.

[46] P. Besl and H. McKay, "A method for registration of 3-D shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, pp. 239-256, 1992.

[47] D. Colbry and G. C. Stockman, "Canonical face depth map: A robust 3D representation for face verification," in Computer Vision and Pattern Recognition, 2007.

[48] K. Chang, K. Bowyer, and P. Flynn, "Face recognition using 2D and 3D facial data," in ACM Workshop on Multimodal User Authentication, 2003, pp. 25-32.

[49] ——, "An evaluation of multimodal 2D+3D face biometrics," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 619-624, 2005.

[50] K. Bowyer, K. Chang, and P. Flynn, "A survey of approaches and challenges in 3D and multi-modal 3D+2D face recognition," Computer Vision and Image Understanding, vol. 101, pp. 1-15, 2006.

[51] D. Colbry, G. Stockman, and A. Jain, "Detection of anchor points for 3D face verification," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 93, 2005, p. 94.

[52] M. Irfanoglu, B. Gokberk, and L. Akarun, "3D shape-based face recognition using automatically registered facial surfaces," Pattern Recognition, vol. 4, pp. 183-186, 2004.

[53] A. Mian, M. Bennamoun, and R. Owens, "Keypoint detection and local feature matching for textured 3D face recognition," International Journal of Computer Vision, vol. 79, no. 1, pp. 1-12, 2008.
[54] P. Phillips, P. Flynn, T. Scruggs, K. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, "Overview of the face recognition grand challenge," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, pp. 947-954.

[55] "www.geometrix.com."

[56] "www.konicaminolta.com."

[57] "http://www.qlonerator.com/."

[58] P. Phillips, H. Moon, P. Rauss, and S. Rizvi, "The FERET evaluation methodology for face-recognition algorithms," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1997, pp. 137-143.

[59] R. S. Germain, A. Califano, and S. Colville, "Fingerprint matching using transformation parameter clustering," in IEEE Computational Science and Engineering, 1997, pp. 42-49.

[60] K. W. Bowyer, K. Hollingsworth, and P. J. Flynn, "Image understanding for iris biometrics: A survey," Computer Vision and Image Understanding, vol. 110, pp. 42-49, 2007.

[61] A. Barr, "Superquadrics and angle preserving transformations," IEEE Computer Graphics and Applications, vol. 1, pp. 11-23, 1981.

[62] Z. X. S. Niu and J. Li, "Comparisons of 3D shape clustering with different face area definitions," Second International Conference on Digital Human Modeling, vol. 5620, pp. 55-63, 2009.

[63] Y. Zhang and C. Kambhamettu, "Robust 3D head tracking under partial occlusion," in Proc. of the Fourth International Conference on Automatic Face and Gesture Recognition, 2000.

[64] F. Haar, M. Daoudi, and R. Veltkamp, "Shape retrieval contest 2008: 3D face scans," in Shape Modeling and Applications (SMI), 2008, pp. 225-226.

[65] A. P. Pentland, "Perceptual organization and the representation of natural form," Artificial Intelligence, vol. 28, pp. 293-331, 1986.

[66] A. Jaklic, A. Leonardis, and F. Solina, Superquadrics and Their Geometric Properties, Thessaloniki, Greece, 2000, ch. 2, pp. 13-39.

[67] E. Bardinet, L. D. Cohen, and N. Ayache, "Fitting 3-D data using superquadrics and free-form deformations," in 12th IAPR Int. Conf. on Pattern Recognition, Jerusalem, 1994, pp. 79-83.

[68] ——, "A parametric deformable model to fit unstructured 3D data," Computer Vision and Image Understanding, vol. 71, pp. 39-54, 1998.

[69] ——, "Tracking and motion analysis of the left ventricle with deformable superquadrics," Medical Image Analysis, vol. 1, pp. 129-149, 2004.

[70] A. H. Barr, "Superquadrics and angle preserving transformations," IEEE Computer Graphics and Applications, vol. 1, pp. 11-23, January 1981.

[71] S. Han, "A globally convergent method for nonlinear programming," Journal of Optimization Theory and Applications, vol. 22, p. 297, 1977.

[72] P. Gill, W. Murray, and M. H. Wright, Numerical Linear Algebra and Optimization, 1991, vol. 1.

[73] K. Pearson, "On lines and planes of closest fit to systems of points in space," Philosophical Magazine Series 6, vol. 2, no. 11, pp. 559-572, 1901.

[74] M. Kirby and L. Sirovich, "Application of the Karhunen-Loève procedure for the characterization of human faces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, 1990.

[75] L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces," Journal of the Optical Society of America, vol. 4, no. 3, pp. 519-524, 1987.

[76] M. Turk and A. Pentland, "Face recognition using eigenfaces," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 591, 1991, pp. 586-591.

[77] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," in Advances in Neural Information Processing Systems 14: Proceedings of the 2001 Conference, 2001, pp. 849-856.
[78] F. Bach and M. Jordan, "Learning spectral clustering, with application to speech separation," The Journal of Machine Learning Research, vol. 7, 2006.

[79] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in KDD Workshop on Text Mining, vol. 400, 2000, pp. 525-526.

[80] A. Jain and R. Dubes, Algorithms for Clustering Data, 1988.

[81] A. Jain, M. Murty, and P. Flynn, "Data clustering: a review," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264-323, 1999.

[82] A. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651-666, 2010.

[83] M. Bilenko, S. Basu, and R. Mooney, "Integrating constraints and metric learning in semi-supervised clustering," in Proceedings of the Twenty-First International Conference on Machine Learning. ACM, 2004, p. 11.

[84] S. Basu, M. Bilenko, and R. Mooney, "A probabilistic framework for semi-supervised clustering," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery. ACM, 2004, pp. 59-68.

[85] G. Edwards, T. Cootes, and C. Taylor, "Face recognition using active appearance models," Computer Vision - ECCV'98, p. 581, 1998.

[86] M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding facial expressions with Gabor wavelets," in Proceedings of the 3rd International Conference on Face & Gesture Recognition, 1998, p. 200.

[87] L. Wiskott, J. Fellous, N. Krüger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," in Computer Analysis of Images and Patterns. Springer, 1997, pp. 456-463.

[88] J. Wang and L. Yin, "Static topographic modeling for facial expression recognition and analysis," Computer Vision and Image Understanding, vol. 108, no. 1-2, pp. 19-34, 2007.

[89] L. Yin, X. Wei, Y. Sun, J. Wang, and A. M. Rosato, "A 3D facial expression database for facial behavior research," in 7th Int. Conf. on FGR, Southampton, UK, 2006.

[90] T. S. Huang, "3D facial expression recognition based on automatically selected features," 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1-8, June 2008.

[91] I. Mpiperis, S. Malassiotis, and M. G. Strintzis, "Bilinear models for 3-D face and facial expression recognition," IEEE Transactions on Information Forensics and Security, vol. 3, no. 3, pp. 498-511, September 2008.

[92] J. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters," Journal of the Optical Society of America A, vol. 2, no. 7, pp. 1160-1169, 1985.

[93] Y. Moses, Y. Adini, and S. Ullman, "Face recognition: The problem of compensating for changes in illumination direction," European Conference on Computer Vision, pp. 286-296, 1994.

[94] P. Phillips, H. Wechsler, J. Huang, and P. Rauss, "The FERET database and evaluation procedure for face-recognition algorithms," Image and Vision Computing, vol. 16, no. 5, pp. 295-306, 1998.

[95] W. Zhao, R. Chellappa, P. Phillips, and A. Rosenfeld, "Face recognition: A literature survey," ACM Computing Surveys (CSUR), vol. 35, no. 4, pp. 399-458, 2003.

[96] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, pp. 259-275, 2003.

[97] J. Movellan, "Tutorial on Gabor filters," Open Source Document, 2002.
[98] M. Mudigonda and G. C. Stockman, "Superquadric representation of 3D faces: Toward reducing time complexity of recognition," in SPIE, Biometric Technology for Human Identification VII, Orlando, FL, 2010.

[99] T. Cormen, C. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. The MIT Press, 2001.

[100] K. Brown, "Voronoi diagrams from convex hulls," Information Processing Letters, vol. 9, no. 5, pp. 223-228, 1979.

[101] "www.qhull.org."

[102] "www.mathworks.com."

[103] C. Boehnen and P. Flynn, "Accuracy of 3D scanning technologies in a face scanning scenario," 2005, pp. 310-317.