This is to certify that the thesis entitled

REPRESENTATIONS OF 3D FACES: TOWARD TIME COMPLEXITY REDUCTION AND EXPRESSION RECOGNITION

presented by Mayur Mudigonda has been accepted towards fulfillment of the requirements for the Master of Science degree in Computer Science.

Major Professor's Signature

Date

REPRESENTATIONS OF 3D FACES: TOWARD TIME COMPLEXITY REDUCTION AND EXPRESSION RECOGNITION

By

Mayur Mudigonda

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Computer Science

2010

ABSTRACT

REPRESENTATIONS OF 3D FACES: TOWARD TIME COMPLEXITY REDUCTION AND EXPRESSION RECOGNITION

By Mayur Mudigonda

We parametrize 3D faces with superquadrics and PCA techniques to support clustering. We verify the effectiveness of superquadrics in finding axes of symmetry for 3D faces on pre-normalized datasets. We show that a superquadric fit to 2.5D face data, in conjunction with other normalization techniques, improves the normalization. This is observed by comparing 100 x 100 meshes of all faces under various normalization techniques: the top 10 eigenvalues represent about 99% of the energy with our method, 95% with CFDM normalization alone, and about 83% with superquadric normalization alone.

One of the goals of parametrization is to show a robust partitioning of the dataset into smaller clusters, thus reducing the complexity of identification of faces. On the MSU data set we show that we can create about 20-30 clusters with an accuracy of about 83% using traditional k-means. We also show that using semi-supervised clustering techniques we are able to extend the number of clusters to 30-40. We analyze the various feature spaces proposed and show that shape curvatures (the exponential parameters of superquadrics) contain maximal information (6.7).

We attempt to model expressions in 3D faces using Gabor filters and a novel method that extracts the perimeter of clusters using convex hulls and applies PCA to parameterize the clusters. We show that such a model can discriminate, fully automatically, smile versus no expression with an accuracy of 78%. Future extensions using shape parametrization of local regions are needed to achieve practical performance on this problem.

ACKNOWLEDGMENT

This piece of research has come to fruition through the good will and support of many. First, I would like to acknowledge my parents (and Mash) who laid the foundation for my education by always encouraging me to question and providing ample opportunities to debate and learn. Without their countless sacrifices this research would not be possible. I dedicate this thesis to them.

I am grateful for the mirthful debate, patient ear and the challenging discussions with Dr. Stockman all through my time at Michigan State. Advice often masked through witty questioning has helped shape my thought processes; for this I am indebted to him. I thank Dr. Jain for his time and insightful advice. I thank Dr. Jin for helping develop a deeper appreciation of research in pattern recognition and machine learning.
I also thank Frank (Dr. Biocca) for his boundless encouragement, brainstorming sessions and for initially supporting me at Michigan State University. Along the paths of research I have stumbled every now and then, only to be helped and motivated by my close friends (and advisors) Naveen and Pavan (MNSSK). I acknowledge the patient listening and advice of my counsellors Adi, AK and Pavan (Ramkumar). I also acknowledge other friends and PRIPies (Abhishekh in particular) for their support, wonderful lunch discussions and help in times of need. Finally, I thank the many friends (Zubin, Kantha, Ashish) and events that have helped shape the person that I sometimes fondly look upon.

TABLE OF CONTENTS

List of Tables
List of Figures
1 Introduction
1.1 Faces
1.2 Biometrics
1.2.1 Biometric Modalities
1.2.2 3D Faces
1.2.3 Scanning 3D Faces
1.2.3.1 Motivation For This Thesis
1.2.4 Thesis Contributions
2 Superquadric Representation of 3D Faces
2.1 Superquadrics
2.1.1 Superquadrics
2.1.2 Superquadrics In General Position
2.1.3 Ambiguities in shape description
2.2 Fitting
2.2.1 A robust normalized co-ordinate space
2.2.2 Fitting Results
2.3 PCA and Eigenfaces
2.4 Toward a Parametric Face
3 Clustering Representations
3.1 A Note On Clustering
3.1.1 Unsupervised Learning - Kmeans
3.1.2 Semisupervised Clustering
3.2 Clustering Results
3.3 Time Complexity
3.4 Feature Analysis
3.4.1 Submeshed Projections
4 Parameterizing Expressions: Recognizing Smiles
4.1 Gabor Filters
4.2 Cluster Parametrization: A Convex Hull And PCA Approach
4.2.1 Parameterization of Expressions
4.2.2 Recognizing Smiles
5 Discussion
6 Appendix
6.1 A - Convex Hull
6.2 B - Dataset Description
6.3 C - Sample Superquadric Fitting
6.4 D - Expression Modeling Samples
Bibliography

LIST OF TABLES

1.1 Comparison of biometric technologies. The data are based on the perception of three biometric experts. This table is from the book by Jain et al. [1]
2.1 Fitting results on raw scans. Rotations are described in radians.
2.2 Fitting results on the CFDM data set. Rotations are described in radians.
3.1 List of processes/data formats and their corresponding times and memory requirements. Some data from Colbry et al. [2]
3.2 Table showing the variance captured by the top 3 Eigen ...
3.3 Table comparing entropy estimates of superquadric fitting, 100 x 100 Eigen projections and 25 x 25 Eigen projections
4.1 Performance of feature vector for classification on neutral vs smiling faces using 200 points on the convex hull
4.2 Performance of feature vector for classification on neutral vs smiling faces using varying points on the convex hull projected onto 10 principal components
4.3 Performance of feature vector for classification on neutral vs smiling faces using submeshed Eigen projection weights using ...
6.1 Operation of the Vivid 910 scanner (time in seconds). Details from the PhD dissertation of Dirk Colbry [3]

LIST OF FIGURES

Images in this thesis are presented in color.

1.1 Shows activation of the bilateral Parahippocampal Place Area (PPA), preferentially activating more for scenes [4,5]
2.1 Block diagram that explains the various steps involved in this chapter
2.2 The vertical axis varies the North-South exponential parameter, while the horizontal axis varies the East-West exponential parameter. The values are varied from 0.1 to 2 on both axes.
2.3 Superquadric fit to a CFDM face scan. The superquadric is plotted in green
2.4 Superquadric fit for a subject's 3 scans from the MSU-CFDM data. The numbers above the fits (left to right) are the 5 shape parameters a1, a2, a3, e1, e2
2.5 Superquadric fit for a subject's 3 scans from the MSU-CFDM data. The numbers above the fits (left to right) are the 5 shape parameters a1, a2, a3, e1, e2
2.6 Superquadric fit for subject 103's 3 scans. The numbers above the fits (left to right) are the 5 shape parameters a1, a2, a3, e1, e2
2.7 Histograms of the 5 superquadric shape parameters. X, Y and Z radii are measured in millimeters (mm).
2.8 Superquadric fit to a poor CFDM face scan. Note in the top left corner, the scan has many holes in it. Holes are a result of the reflected laser not being captured due to hair, etc. The superquadric is plotted in green. The blue points are the sampled data points for fitting the superquadric.
3.1 Plot of recall accuracy against cluster size for the CFDM and superquadric normalized dataset with no Eigen and 5 shape parameters used
3.2 Plot of recall accuracy against cluster size for the CFDM and superquadric normalized dataset with 10 Eigen parameters and no shape parameters used
3.3 Plot of recall accuracy against cluster size for the CFDM and superquadric normalized data set with 10 Eigen and 5 shape parameters used
3.4 Plot of recall accuracy against cluster size for the CFDM and superquadric normalized data set with 50 Eigen and 5 shape parameters used
3.5 Plot of recall accuracy against cluster size for the raw data set with 50 Eigen and 5 shape parameters used
3.6 Plot of recall accuracy against cluster size for the CFDM and superquadric normalized data set with 20 Eigen and 5 shape parameters using the MPCK-means algorithm
3.7 Rank accuracy: retrieving sister scans based on parameters
4.1 This figure shows a typical Gabor filter bank. Left to right are the changing frequencies. Top to bottom are the changing spatial orientations. Typically, an input face is convolved with each of the filters to give a Gabor intensity map. Features are extracted from the various intensity maps for the specific application
4.2 Frontal scan with no expression of subject 002 from the MSU-Face database.
4.3 Frontal scan with smile expression of subject 002 from the MSU-Face database
4.4 Convolved responses of the no-expression frontal scan 4.2 using the Gabor map described in Figure 4.1. Notice how some responses identify the regions around the eyes, nose and lips while others tend to identify regions that are not easily identifiable. Note also how the Gabor responses identify the boundary of the face scan from the input depth map.
4.5 Convolved responses of the smile-expression frontal scan 4.3 using the Gabor map described in Figure 4.1. Notice how some responses identify the regions around the eyes, nose and lips while others tend to identify regions that are not easily identifiable. Note also how the Gabor responses identify the boundary of the face scan from the input depth map.
4.6 Block diagram showing the procedure for parameterizing smiles.
4.7 Gabor responses for smile versus neutral expression along with the input scans
4.8 Blobs identified on the Gabor response of a neutral scan of subject 13 from the MSU database.
6.1 Convex hull (blue line) of a random set of points (red plus) from a normal distribution with mean (μ = [1, 2]) and covariance (Σ = [0.9, 0.2; 0.2, 0.5])
6.2 Figure showing 2.5D facescan of subject from the MSU-CFDM dataset
6.3 Figure showing 2.5D facescan of subject from the MSU-CFDM dataset along with a superquadric fit, with x-y axes visible
6.4 Figure showing 2.5D facescan of subject from the MSU-CFDM dataset along with a superquadric fit, with y-z axes visible
6.5 Figure showing 2.5D facescan of subject from the MSU-CFDM dataset along with a superquadric fit, with x-z axes visible. The superquadric fit parameters are as follows: [64.39, 95.5, 40, 0.9062, .8998]
6.6 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset
6.7 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset along with a superquadric fit. The fit parameters are as follows: [56.6, 98.2, 41.72, 1.4742, 0.6360]
6.8 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset
6.9 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset along with a superquadric fit. The fit parameters are as follows: [56.8, 92, 41.67, 1.4469, 0.7807]
6.10 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing a smile expression
6.11 Figure showing 2.5D frontal facescan with smile expression of subject from the MSU-CFDM dataset along with a superquadric fit. The fit parameters are as follows: [61.05, 90, 44.13, 1.5375, 0.9761]
6.12 Figure showing 2.5D frontal facescan that is both poorly normalized and has many holes
6.13 Figure showing 2.5D frontal facescan that is both poorly normalized and has many holes (previously described) fit with a superquadric. Note how superquadrics find the axes in spite of the holes and poor normalization
6.14 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression
6.15 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression and the absolute Gabor responses
6.16 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression
6.17 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression and the absolute Gabor responses
6.18 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression
6.19 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression and the real Gabor responses
6.20 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression
6.21 Figure showing 2.5D frontal facescan of subject from the MSU-CFDM dataset showing no expression and the real Gabor responses

Chapter 1

Introduction

This work addresses the problem of person recognition using a large gallery of 3D face data. In other words, given a face scan $V_u$ of an unknown person (or expression), we need to decide whether $V_u$ is similar to any other scan $V_i$ in our dataset of $n$ scans. We further assume that $n$ is impractically large to allow for comparisons of $V_u$ with every $V_i$.

The organization of this thesis is as follows. Chapter 1 provides an overview of object, face and 3D face representation. It further tries to motivate the problems addressed in this thesis. Chapter 2 describes the problem of fitting superquadrics to 3D faces. It further describes the use of PCA for parameterizing 3D faces. Chapter 3 describes work regarding clustering (binning) the representations obtained in Chapter 2. Chapter 4 describes work in discriminating smiles from neutral expressions. Chapter 5 concludes this thesis with a discussion of the contributions and future work.

Representing knowledge for inferencing is key to many decision making tasks. A good representation of the problem helps in making decisions. By representation we mean a small set of parameters (features) by which we can explain the task accurately. A typical small image (100 x 100 pixels, say) from a low-resolution camera contains 10,000 pixels. If each pixel is considered a feature, then we have a large number of parameters that describe each image and the objects (or scenes) that they may depict. Dealing with such a large feature space is not only time consuming but also inefficient because, in many cases, not all pixels really contribute towards the description of the object (scene).

Representation - A Neuroscience Perspective

Representing objects is an important area of research not only for pattern recognition but also for cognitive neuroscience. Cognitive neuroscience is the study of the neural correlates of mental processes. Present day research in cognitive neuroscience often draws upon techniques in pattern recognition [6]. Similarly, research in neuroscience sometimes motivates research in pattern recognition [7].

Some of the earliest work in object representation was done by Biederman [8]. He observed that objects can be recognised simply by their shape. Further, information such as colour, texture, etc. did not contribute as much to their recognition.
Since Biederman, research in cognitive science has evolved from human responses to studying cortical data. The use of computational neuroimaging methods such as functional Magnetic Resonance Imaging (fMRI), Magnetoencephalography (MEG), Electroencephalography (EEG), etc. has helped advance the field of cognitive neuroscience. fMRI data are indirect measures of neuronal activity, based on the difference in magnetic properties between oxygenated and de-oxygenated blood; fMRI maps this difference as an image [9-11].

Faces have been the subject of significant attention in the neuroscience community. They are thought to be processed in the Fusiform Face Area (FFA) [12]. It is generally thought that the identification process precedes the recognition (verification) process. Humans recognize many visual cues such as expressions and other social signals very astutely and accurately. These cues seem to be universally interpretable [13]. It is thought that human faces are represented by their differences from an averaged face model [14]. Haxby et al. [6] showed that representations for faces and objects are distributed in the ventral temporal cortex, thus laying some groundwork for the representation of objects (and faces) in the brain. Similarly, scenes are thought to be processed in the Parahippocampal Place Area (PPA) [15]. A distributed approach to face (and object) processing seems to be the accepted model in the community today. This is of interest to the face research community because visual network models use information from such studies.

Figure 1.1: Shows activation of the bilateral Parahippocampal Place Area (PPA), preferentially activating more for scenes [4,5]

It is important for Artificial Intelligence (AI) researchers to understand how humans process information and synthesize new data, so as to be able to build artificial systems that can perform human-like activities or even exceed them. Thus, research in neuroscience (sometimes) goes hand in hand with the long term research goals of the AI community.

Representation - A Computer Vision Perspective

Recognizing and representing objects in computer systems is hard partly due to factors such as lighting, scale and orientation changes. There exist many different approaches to object recognition in the computer vision community. In the following paragraphs, we will briefly look at some of these models.

Contour modeling is a very popular approach in computer vision, the most popular variant being snakes, or active contour models [16]. Snakes are energy-minimizing splines that are guided by both extrinsic forces and image forces to align themselves as closely as possible to the lines and edges that we wish to model.

Methods to model high and low level vision have also been developed using Markov Random Fields (MRF) [17]. Such methods can encode the prior in the MRF and thus hope to solve larger vision problems. Huang et al. [18] proposed a hybrid face recognition method using MRFs. They break a face down into small patches and use MRFs to represent the relationship between the various patches and the patch's ID.

Object recognition through shape matching algorithms is a popular approach [19]. Contextual shape matching is done by solving the point correspondence problem and creating descriptors that describe the distribution around each point, such that the features generalize to represent the object fairly well.
Scale Invariant Feature Transform (SIFT) is also a very popular approach; it identifies key points, or descriptors, in each image. Typically, feature vectors are generated by taking representative descriptors for each cluster of descriptors [20].

Other approaches to modeling objects for various applications include geometric models such as superquadrics [21,22], deformable models [23], volumetric approaches [24], etc.

1.1 Faces

Human faces are a source of fascination to both artists and scientists. Faces are smooth and symmetrical [25]. Some of the earliest work to understand faces was done by Leonardo Da Vinci. Sir Francis Galton laid the foundation for characterizing profile faces [26]. He studied various component measurements of the face and defined key feature points on the profile. Formally, he developed a method for characterizing irregular curves.

Anthropometric research is the field of study where measurements are made on the surface of the face and then compared across race, gender, age and ethnicity. Anchor-point-based face identification has been strongly driven by anthropometric research [27]. The use of anthropometric measurements in medicine is also not uncommon [28].

Earlier work on characterizing 2D faces involved developing a face model based on manually located anchor points. Active Shape Modeling (ASM), proposed by Cootes [29], is such a model. ASM uses 68 key points identified along the contours of the face and key regions such as the eyes, nose and lips. The choice of the number of anchor points is based on empirical requirements such as the quality of the image, the size of the face (with respect to the image), etc. The location of the key points is intuitive because we wish to represent the face through silhouettes.

Methods of modeling faces based on features derived from profile curves are called analytic methods. Analytic methods are popular because they are potentially invariant to pose, lighting and other minor global changes. However, if the features are inaccurately detected, then analytic methods break down.

Other approaches to modeling faces include holistic methods such as eigenfaces [30]. The eigenfaces approach is easy to train, simple and fast. However, since it works primarily on finding the eigenvectors of the covariance matrix, if the faces are not very well correlated (due to misalignment, lighting, etc.) this approach will perform poorly.

Hybrid approaches such as those used by Manjunath et al. [27] address face modeling by using Gabor response maps for non-specific features. Hybrid methods offer a compromise, using a mixture of feature-based and analytic methods for face parameterization. In our work, we address the problem of 3D face parameterization through such a scheme.

1.2 Biometrics

The need to identify or recognize individuals has gained significant importance in today's society. Biometrics provides a way to authenticate transactions between humans and machines (and between humans). Biometric applications can largely be classified into two categories:
- Recognition (Verification) is the problem of matching a user against his or her own template scan that is stored in the database. Typical applications include personal data/device authentication, bank vaults, etc.

- Identification is the problem of matching an unknown individual (probe scan) against a set of persons (gallery of scans) stored in the database. Typical applications include detecting suspicious characters, detecting anomalous objects, etc.

The typical output of a biometric recognition system can be any of the following states:

- True Positive: a case of correctly identifying someone as the correct user of the system.

- True Negative: a case of correctly identifying someone as an incorrect user of the system.

- False Positive: a case of falsely identifying someone as the correct person, when in actuality it is another person.

- False Negative: a case of falsely identifying someone as an incorrect user, when in actuality it is the correct person.

In specific cases where the cost of making an incorrect decision outweighs the cost of making no decision, one may choose a reject option. This action may mean the added cost of a person being scanned again for a repeated match against the gallery.

1.2.1 Biometric Modalities

Biometrics are biological measurements that help identify a person repeatably and correctly. A biological measurement must satisfy the following criteria [31]:

- Universality: it should apply to every person, irrespective of whether they are going to be enrolled in the system

- Uniqueness: it should be sufficiently discriminative, such that each person's biometric characteristic is different from everyone else's

- Permanence: it should be time invariant as far as possible, i.e. less prone to the effects of aging

- Collectability: it should be easy to collect

Genetic markers such as DNA are unique identifiers of humans [32], but the process of obtaining and verifying DNA samples is laborious and sometimes not convenient. In practice, other factors come into play [33]. There exist many different types of biometric modalities, as seen in Table 1.1. Each biometric measurement has its own benefits, and no one measurement is expected to satisfy all requirements.

Table 1.1: Comparison of biometric technologies. The data are based on the perception of three biometric experts. This table is from the book by Jain et al. [1]

Current research suggests that a fusion of biometrics provides a much better system against spoofing and other such attacks [34,35]. Thus, research in biometrics strives to better the performance of each modality. Performance is affected by representation and matching algorithms. We refer the reader to Ross et al. for a good overview of the field [36].

1.2.2 3D Faces

Although 3D data from range sensors has been available since the 1970's [37], the popularity of such devices has only recently increased. Recognition of faces has largely been approached as a 2D problem. Although recognition accuracies for 2D face images are fairly high (> 99% in some cases), the performance quickly decays for large changes in lighting, illumination and pose. Although 3D faces are relatively recent, research based on 3D face models has long been in vogue in the graphics community [38-40].

Blanz and Vetter [41] showed that it is possible to fit 2D faces to 3D face models and then perform recognition. They show accuracies of approximately 78% on side views (CMU PIE dataset) and 86% on frontal views (FERET dataset).
In their work, they morph a given input 2D face to a 3D face model in the gallery. Correspondence between probe and gallery is established using image flow techniques. Once a 3D face model is constructed, recognition is performed by calculating the Mahalanobis distance [42].

Lu et al. [43] show a specific example of a multi-view 3D scan based recognition system. They construct a 3D model based on multiple views of subjects. Although it is perhaps infeasible for a commonplace application of such a system, high-end or high-security systems could definitely benefit from such research. Further, with reductions in scanner costs, such a system might be the way of the future.

Park and Jain [44] show 3D model construction from multiple non-frontal views for face recognition. They show a performance improvement of 40% on the CMU Face In Action (FIA) dataset. This method has benefits for more robust recognition from video streams.

Lu and Jain [43] tackle the problem of expression (which affects shape) by using a deformation model to handle the non-linearity of expressions for recognition. Kakadiaris et al. [45] show a face recognition system that uses a deformed face model representing 3D geometry in a 2D structure. Wavelet features are extracted and then matching is performed. If we consider a mesh as the simplest (but more computationally intensive to match) representation of a 3D face, then matching can be performed using a rigid surface alignment method such as the Iterative Closest Point (ICP) algorithm [46]. Colbry and Stockman present a 3D face verification system using a rigid alignment method in their work [47].

3D faces are a logical extension of 2D faces in that the z-depth of the surface points is included. This extra piece of information could be useful for estimating shape curvature more reliably than estimating it from 2D intensity maps. With the availability of the z-depth, pose can be estimated somewhat better than with 2D faces alone. Also, depth maps are not significantly affected by lighting changes or make-up.

Earlier research supposed that 3D data would address issues of poor recognition accuracy through the use of better shape curvature estimates as features. Preliminary results showed that matching accuracies using only 3D scans were merely comparable to those using 2D face scans [48,49]. Fusing 2D and 3D scans definitely improved performance accuracy. To be critical, such a matching scheme uses multiple representations of the same input face, so the results of such research should be interpreted appropriately [50]. Excellent surveys of 2D + 3D fusion biometrics may be found in Chang et al. [49] and Bowyer et al. [50].

The use of 3D faces helps feature-based methods of face recognition and modeling, since it allows for more reliable estimation of shape curvature and detection of anchor points [51,52]. Mian et al. [53] fit a surface on uniformly sampled 3D data. Features are constructed from these surfaces. Principal Component Analysis (PCA) is performed on the gallery images. Input and gallery images are projected onto these eigen subspaces and compared. A graph is constructed based on the matching features for the 3D data. Scale Invariant Feature Transforms (SIFT) are applied to the 2D face data. A fusion of both 2D and 3D features is then applied for matching. They show a 96.1% identification rate and a 98.6% verification rate on the Face Recognition Grand Challenge (FRGC) dataset [54].
1.2.3 Scanning 3D Faces

There are 3 major variants of scanning methodologies for 3D faces [50]:

- Passive Stereo: in a passive stereo setting, the geometric relationship between the two cameras used to acquire the images of the face is known. The location of feature points in 3D is then computed from matching 2D pixels. An example is the Geometrix system [55].

- Structured lighting: the scanner projects a horizontal plane of laser light onto the surface of an object; a 640 x 480 pixel color camera picks up the curve produced by the interaction of the laser with the object and triangulates the depth of the illuminated surface points. The Minolta sensor is such a system. In our research, we use data from the Minolta Vivid scanner [56].

- Hybrid: in such a setting, both structured light and a stereo camera setup are used to collect data. Such a system can improve the point density collected. The 3Q "Qlonerator" is one such system [57].

Although the literature is split on the actual benefits of using 3D face data for better recognition accuracy [50], there is sufficient reason to believe that 3D face data in conjunction with 2D face data helps improve performance. It may also be that 3D face research is where 2D face research was a few years back with FERET [58]. The 3D face research community has moved towards a standardized dataset for face recognition research with the Face Recognition Grand Challenge (FRGC) [54]. The FRGC v2 data set contains 4007 3D scans from 466 subjects and contains multiple expressions as well.

1.2.3.1 Motivation For This Thesis

Face recognition and biometrics have evolved into mature fields over the past few decades. Yet a large-scale practical implementation of a 3D face matching system is inhibited by the time complexity of matching. Even at a few seconds per comparison, recognition against a dataset with a few thousand subjects would require unacceptable processing time. Reduction of complexity can be achieved either by reducing the time taken for pairwise matching or by partitioning the search space. In this work we choose the latter scheme.

The idea of partitioning search spaces is not uncommon in the field of biometrics. The fields of fingerprint, iris, etc. have addressed similar problems. Germain et al. [59] present a nice method to cluster fingerprint data. Bowyer et al. [60] present a comprehensive survey of iris image understanding in their work.

The larger goal of such research would be to create a large number of robust clusters over a fairly comprehensive data set ($10^6$ subjects, say) so that it generalizes well to real-world settings. For example, when a person is scanned at an airport terminal, the face is first parameterized and a search is made in the parameter space to find the best cluster. Once the cluster has been identified, a matching procedure such as the ICP algorithm [46] can be employed to match the input face with all the faces in the cluster. If we have, say, 100 clusters with each cluster representing 1000 people, then we have reduced our matchings from $10^5$ to $10^3$. We know that a probe scan can be matched against a gallery of 330 scans in under 5 seconds [2]. Thus, we can effectively reduce the time taken to verify a probe to about 15 seconds. If we further parallelize some of these calculations, lower times could be achieved.
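To make this arithmetic concrete, the short script below reproduces the estimates. It is an illustration only: the gallery size and cluster count are assumptions made for the example, and only the 330-scans-in-about-5-seconds figure is taken from [2].

```python
# Back-of-the-envelope estimate of identification time with and without
# cluster-based partitioning of the gallery. Only the "330 scans in about
# 5 seconds" ICP figure comes from [2]; everything else is an assumption.
gallery_size = 10**5          # assumed number of gallery scans
n_clusters = 100              # assumed number of clusters
sec_per_match = 5.0 / 330     # roughly 15 ms per pairwise ICP match [2]

exhaustive = gallery_size * sec_per_match                    # match everyone
partitioned = (gallery_size // n_clusters) * sec_per_match   # one cluster only

print(f"exhaustive search : {exhaustive:7.0f} s")   # ~1515 s
print(f"partitioned search: {partitioned:7.1f} s")  # ~15 s
```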
If we can further rank all the faces within a cluster using the parameters, i.e., if faces that are nearer to each other with respect to the matching algorithm are in fact closer to each other in the parameter space, then we can further restrict our search from 1000 people to only, say, the top 10 neighbours of a face scan within a cluster. Thus, we can effectively recognize a person in about 30 seconds as opposed to $3 \times 10^5$ seconds.

If we do not find a face within the specified cluster, we can still intelligently search among the other nearest clusters and achieve good performance.

We parameterize human faces in our work using a hybrid model: a geometric model (superquadrics) and PCA-based eigen parameters. The motivation is for the geometric model to capture the larger features of human faces, such as the length, width and gross curvature of the face, while the PCA-based eigen parameters capture the finer details. Superquadrics are a powerful class of geometric surfaces originally proposed by Barr [61]. Superquadrics are endowed with two additional exponential parameters that extend the range of traditional quadrics. Very simply, superquadrics can be thought of as an extension of ellipsoids.

Colbry and Stockman [47] present a rudimentary parameterization as part of their work developing a Canonical Face Depth Map (CFDM), which aims to provide a robust normalization for 2.5D faces. Niu et al. [62] present a way to cluster facial landmarks and provide insight into how regions affect separability. Extended superquadrics have been employed by Zhang et al. [63] to track 3D faces, which provides support for a superquadric-based representation of faces. We choose a regular superquadric because of its ability to provide better normalization of 3D faces. Interesting work on shape retrieval and 3D faces can be found in the Shape Retrieval Contest (2008) [64].

One other problem we wish to address in this work is parameterizing expressions in 3D faces. We address this issue using Gabor filter banks and convex hull methods. We show that our method does not require detection of anchor points during training. To the best of our knowledge, our work novelly extends work in this field.

1.2.4 Thesis Contributions

The following is the list of problems this thesis attempts to tackle:

- We attempt to fit superquadrics to 3D faces using 3D point clouds. The hypothesis is that superquadrics capture the larger information in faces. Due to their inherent symmetry, we also hope that superquadrics find good axes of symmetry for faces. The larger goal is a good representation of 3D faces.

- We attempt to bin (cluster) representations to find many compact clusters of faces. A good representation that can model individual faces uniquely, while maintaining some information at the group level, can help reduce the time complexity of matching probe scans to a gallery.

- Lastly, we wish to study the parameterization of expressions (smile versus no smile) in 3D faces. Recognition of expressions is of importance to many computer applications. Understanding expressions is also important for achieving better recognition accuracies.

Chapter 2

Superquadric Representation of 3D Faces

For most problems in pattern recognition, it is both efficient and convenient to represent the problem in a feature space that describes it better for the task of recognition. Raw spaces are often poor choices for representation and the study of similarity. In this chapter, we address the problem of representing faces using superquadrics.
First, we look at the problem of fitting superquadrics to 3D face data. We then analyze the quality of fit by looking at a few sample fits. We find that the inherent symmetry in human faces and in superquadrics helps superquadrics find good axes of symmetry. These axes of symmetry help to normalize 3D faces. Although the superquadric representation captures some of the variability in human faces, it does not sufficiently represent faces. We therefore explore the use of PCA to describe the finer details of the face in the later sections. Figure 2.1 shows a block diagram of the various processes involved in this chapter.

Figure 2.1: Block diagram that explains the various steps involved in this chapter: face acquisition, rough symmetry selection, superquadric fitting, feature extraction (superquadrics/PCA/Gabors), and clustering.

For a good treatment of superquadrics and their properties, we refer the reader to [61]. Superquadrics found their place in the computer vision community through the work of Pentland on perceptual organization [65]. Solina and Bajcsy [22] present a detailed sketch of how superquadrics can be used with deformations. The book by Jaklic, Leonardis and Solina [66] provides a very good tutorial on superquadrics. Bardinet et al. [67-69] show the effectiveness of superquadrics with deformations for fitting in real-world applications. For superquadrics with deformations in a local and global setting, we refer the reader to [21].

2.1 Superquadrics

Before we study superquadrics, let us take a brief tour of their predecessor, the superellipse.

Superellipse. A superellipse is any curve defined as follows:

$$\left(\frac{x}{a}\right)^m + \left(\frac{y}{b}\right)^m = 1 \tag{2.1}$$

where $a$ and $b$ are positive real numbers describing the major and minor axes. As $m \to \infty$ we get more rectangular shapes. As $m \to 0$ it takes the shape of a cross. Superellipses are special cases of curves known as Lamé curves, where $m$ can be any rational number.

Superquadrics are a powerful class of geometric models. They were originally proposed by Barr for rendering a multitude of objects. Superquadrics are an extension of regular quadrics in that they are able to parameterize curvature in the North-South ($\epsilon_1$) and East-West ($\epsilon_2$) directions. Thus, superquadrics can generate a host of new shapes, ranging from pin cushions, to ellipsoids, to toroids, etc., through only a few parameters. Although there exist many sub-classes of superquadrics, we will deal only with the superellipsoids, since their inherent symmetry and axes can help model human faces.

Given two two-dimensional curves,

$$h(\omega) = \begin{bmatrix} h_1(\omega) \\ h_2(\omega) \end{bmatrix}, \quad \omega_0 \le \omega \le \omega_1 \tag{2.2}$$

$$m(\eta) = \begin{bmatrix} m_1(\eta) \\ m_2(\eta) \end{bmatrix}, \quad \eta_0 \le \eta \le \eta_1 \tag{2.3}$$

the spherical product of these two curves gives us a surface $x$, defined as

$$x = m \otimes h = \begin{bmatrix} m_1(\eta)\, h_1(\omega) \\ m_1(\eta)\, h_2(\omega) \\ m_2(\eta) \end{bmatrix}, \quad \omega_0 \le \omega \le \omega_1, \quad \eta_0 \le \eta \le \eta_1 \tag{2.4}$$

The two 2D curves modulate each other to generate a 3D surface $x$. Let $h(\omega)$ be a horizontal curve and $m(\eta)$ be a vertical curve. The spherical product then results in a surface where $h(\omega)$ is modulated vertically by the curve $m_1(\eta)$, while $m_2(\eta)$ raises or lowers the surface. Thus, $\eta$ is a north-south parameter (latitude) while $\omega$ acts as an east-west parameter (longitude).

Let us take a closer look at the spherical product now. The spherical product receives its name from the production of a sphere, when the half circle

$$m(\eta) = \begin{bmatrix} \cos\eta \\ \sin\eta \end{bmatrix}, \quad -\frac{\pi}{2} \le \eta \le \frac{\pi}{2} \tag{2.5}$$

is crossed with the full circle

$$h(\omega) = \begin{bmatrix} \cos\omega \\ \sin\omega \end{bmatrix}, \quad -\pi \le \omega \le \pi \tag{2.6}$$

The meridional half circle determines the radius of the sphere and also the height of the sphere above and below the X-Y plane.
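As a quick sanity check of (2.4)-(2.6), the following numpy sketch (our own illustration; the function names are ours, not part of the thesis) crosses the half circle with the full circle and verifies that every resulting surface point lies on the unit sphere.

```python
import numpy as np

def spherical_product(m1, m2, h1, h2, eta, omega):
    """Spherical product (2.4): surface x(eta, omega) = (m1*h1, m1*h2, m2)
    evaluated on the grid eta x omega."""
    E, W = np.meshgrid(eta, omega, indexing="ij")
    return np.stack([m1(E) * h1(W), m1(E) * h2(W), m2(E)], axis=-1)

# Half circle (2.5) crossed with full circle (2.6) produces a sphere.
eta = np.linspace(-np.pi / 2, np.pi / 2, 50)    # latitude
omega = np.linspace(-np.pi, np.pi, 100)         # longitude
sphere = spherical_product(np.cos, np.sin, np.cos, np.sin, eta, omega)

# Every surface point has unit norm, confirming we recovered a sphere.
assert np.allclose(np.linalg.norm(sphere, axis=-1), 1.0)
```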
Superquadrics are obtained by the spherical product of two such 2D curves. The parametric form of the superquadric is given by:

$$r(\eta, \omega) = \begin{bmatrix} a_1 \cos^{\epsilon_1}\eta \, \cos^{\epsilon_2}\omega \\ a_2 \cos^{\epsilon_1}\eta \, \sin^{\epsilon_2}\omega \\ a_3 \sin^{\epsilon_1}\eta \end{bmatrix}, \quad -\frac{\pi}{2} \le \eta \le \frac{\pi}{2}, \quad -\pi \le \omega \le \pi \tag{2.7}$$

An alternative, implicit form of the superquadric can be derived using the equality $\cos^2\theta + \sin^2\theta = 1$. We can rewrite (2.7) as

$$\left(\frac{x}{a_1}\right)^2 = \cos^{2\epsilon_1}\eta \, \cos^{2\epsilon_2}\omega \tag{2.8}$$

$$\left(\frac{y}{a_2}\right)^2 = \cos^{2\epsilon_1}\eta \, \sin^{2\epsilon_2}\omega \tag{2.9}$$

$$\left(\frac{z}{a_3}\right)^2 = \sin^{2\epsilon_1}\eta \tag{2.10}$$

Raising both sides of (2.8) and (2.9) to the power $1/\epsilon_2$ and adding them, then raising the sum to the power $\epsilon_2/\epsilon_1$ and adding (2.10) raised to the power $1/\epsilon_1$, we get the inside-outside function, or the function used to render:

$$F_{render}(x,y,z) = \left(\left(\frac{x}{a_1}\right)^{\frac{2}{\epsilon_2}} + \left(\frac{y}{a_2}\right)^{\frac{2}{\epsilon_2}}\right)^{\frac{\epsilon_2}{\epsilon_1}} + \left(\frac{z}{a_3}\right)^{\frac{2}{\epsilon_1}} \tag{2.11}$$

where $F = 1$ implies the point (x, y, z) is on the surface, $F < 1$ implies the point lies inside the superquadric, and $F > 1$ implies the point lies outside the superquadric.

The explicit form is useful for representation and rendering, but if we define an error function based on (2.11) for fitting, then for a given $\epsilon_1$, similar (x, y, z) tuples will not receive the same $F(x, y, z)$ values, making recovery harder. Since there is no explicit (closed-form) solution for the Euclidean distance measure used in fitting, we use the implicit equation defined by

$$F(x,y,z) = \left[\left(\left(\frac{x}{a_1}\right)^{\frac{2}{\epsilon_2}} + \left(\frac{y}{a_2}\right)^{\frac{2}{\epsilon_2}}\right)^{\frac{\epsilon_2}{\epsilon_1}} + \left(\frac{z}{a_3}\right)^{\frac{2}{\epsilon_1}}\right]^{\epsilon_1} \tag{2.12}$$

Note in Figure 2.2 how the shapes along the diagonal become increasingly pinched as the value of the exponential parameters increases. Superquadrics can take a wide range of shapes. When we set both $\epsilon_1$ and $\epsilon_2$ to 1, we obtain a normal ellipsoid (a sphere, when the radii are 1).

Figure 2.2: The vertical axis varies the North-South exponential parameter, while the horizontal axis varies the East-West exponential parameter. The values are varied from 0.1 to 2 on both axes.
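The two forms can be exercised together. The sketch below (our own illustration, with arbitrary parameter values) samples the parametric surface (2.7) and checks that the samples evaluate to $F \approx 1$ under (2.11); a signed power is needed because cosine and sine can be negative while the exponents are fractional, and raising (2.11) to the power $\epsilon_1$ gives the fitting form (2.12).

```python
import numpy as np

def spow(base, p):
    """Signed power: sign(base) * |base|**p, so fractional exponents are
    well-defined for negative values of cos and sin."""
    return np.sign(base) * np.abs(base) ** p

def superellipsoid_surface(a1, a2, a3, e1, e2, eta, omega):
    """Sample the parametric surface (2.7) on an eta x omega grid."""
    E, W = np.meshgrid(eta, omega, indexing="ij")
    x = a1 * spow(np.cos(E), e1) * spow(np.cos(W), e2)
    y = a2 * spow(np.cos(E), e1) * spow(np.sin(W), e2)
    z = a3 * spow(np.sin(E), e1)
    return x, y, z

def inside_outside(x, y, z, a1, a2, a3, e1, e2):
    """F of (2.11): F < 1 inside, F = 1 on the surface, F > 1 outside.
    Raising the result to the power e1 gives the fitting form (2.12)."""
    fxy = (np.abs(x / a1) ** (2 / e2) + np.abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    return fxy + np.abs(z / a3) ** (2 / e1)

# Points sampled from the surface should evaluate to F ~ 1.
eta = np.linspace(-np.pi / 2 + 0.01, np.pi / 2 - 0.01, 40)
omega = np.linspace(-np.pi + 0.01, np.pi - 0.01, 80)
x, y, z = superellipsoid_surface(55, 95, 35, 1.3, 0.7, eta, omega)
F = inside_outside(x, y, z, 55, 95, 35, 1.3, 0.7)
assert np.allclose(F, 1.0)
```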
Let us define .a homogenous transformation T to transform the superquadric (1'3, ys, 23) to its new origin, that of the 3D points (world) (1w,yw,zw) in the space. Thus, {L‘w ) 1‘5 yw = T ys (2.22) _ 3w _ L ‘33 _ Here, T is a homogenous transformation matrix defined as follows 22 (m 7713; Tia: 02: l m n 0 T 2 y 3/ 3/ y (223) [z m; 77.:y 0;: O 0 O 1 Thus, for a given point T first rotates the points using the rotation parameters l,m and n. The point is then translated by (01;, 0y, 0 2, HT. Note, that this transformation matrix T is invertible and can be used to go from world space to superquadric space and back, i.e. _ 1 - .. 1'3 .17“) y, = T_1 W, (2.24) 23 3w If we use Euler angles (0,640) to express the rotations in the transformation matrix T, we get the transformation matrix as follows. )- -1 cosocosdcosw — singbsinw singficosficosw + cososinfl —cos¢cos€sinw —sin.¢cost’) cosg’osinQ p3; —sin¢cos6’sin¢+cosocos6l sinésinfl py T : —sin6lcosw sinflsinw 0036 p; 0 0 O 1 (2.25) F(‘TU)1 3111)» ZUJ) : F(a‘1:a210‘32611627 Q5,6,'¢’,pr, pyi p3) (2'26) Thus, the inside-outside function of a superquadric has 11 parameters of which 5 are shape parameters (a1, (12, (13,61, 62) and 6 transformation parameters are used to describe the superquadric in 3D Euclidean space as seen in Equation (2.26) 23 2.1.3 Ambiguities in shape description The mapping between superquadrics and objects is many to one, by nature. For example, if we have a superellipsoid such as (1,2,1,0.4,0.4) by modifying the rotation parameters required to specify the superquadric, multiple superquadrics can represent the same object. Further, the same object can be represented by another superquadric with values (2,1,1,0.4,0.4) with appropriate rotation around the x-y axes. Lastly, as pointed out in [22], two parallelpiped with slightly rounded edges ((11 = a2 = 1,61 = 0.1 and £2 = 0.1) can be similarly represented after a rotation of 7r/4 around axis 2 of the object coordinate system, with 61 = 0.1,62 = 1.9 and a1=a2=\/2. The easiest way to deal with this problem is to constrain some of the shape parameters for the object we are trying to model. Thus, if we limit rotations from going beyond {—7r/4, 7r/4] for example, that would prevent inversions of parameters. Similarly, the exponents may be constrained to meaningful classes. In our problem, since we are only modeling human faces, this constraining can be easily done. Solina and Bacjsy [22] provided a new objective function to recover shapes that accounted for occclusions and chooses parameters such that they are minimized. R= WU“ 1) (2.27) where F, is the objective function in (2.12). The reason for , /a1a2a3 comes from the need to constrain the shape parameters to lie between [0.1.1]. thus implying meaningful recovery of shapes. 24 2.2 Fitting A superquadric can be fully rendered with just 5 parameters. To get its position in §R3 we need to include translation and rotations. These account for 6 other parameters. Our goal is to learn the optimal set of 11 superquadric parameters that represent a face scan approximately. To acquire a reliable sampling of points from the face scans for fitting, we first reliably identify key points on the face such as the tip of the nose, innermost eye points, etc. This is done by using the computed normals on the surface and then iterating to find the optimal point in a region. These key points are found using curvature estimation and localization as described in [47]. We use the key points for defining bounds to solve the optimization problem. 
2.1.3 Ambiguities in shape description

The mapping between superquadrics and objects is many-to-one by nature. For example, if we have a superellipsoid such as (1, 2, 1, 0.4, 0.4), then by modifying the rotation parameters required to specify the superquadric, multiple superquadrics can represent the same object. Further, the same object can be represented by another superquadric with values (2, 1, 1, 0.4, 0.4) and an appropriate rotation around the x-y axes. Lastly, as pointed out in [22], a parallelepiped with slightly rounded edges ($a_1 = a_2 = 1$, $\epsilon_1 = 0.1$ and $\epsilon_2 = 0.1$) can be similarly represented after a rotation of $\pi/4$ around the z axis of the object coordinate system, with $\epsilon_1 = 0.1$, $\epsilon_2 = 1.9$ and $a_1 = a_2 = \sqrt{2}$.

The easiest way to deal with this problem is to constrain some of the shape parameters for the object we are trying to model. Thus, if we limit rotations to $[-\pi/4, \pi/4]$, for example, that would prevent inversions of parameters. Similarly, the exponents may be constrained to meaningful classes. In our problem, since we are only modeling human faces, this constraining can be done easily.

Solina and Bajcsy [22] provided a new objective function to recover shapes that accounts for occlusions and chooses parameters such that it is minimized:

$$R = \sqrt{a_1 a_2 a_3} \, (F - 1) \tag{2.27}$$

where $F$ is the objective function in (2.12). The factor $\sqrt{a_1 a_2 a_3}$ favours the smallest superquadric that fits the data, and the exponential parameters are constrained to lie in [0.1, 1], implying meaningful recovery of shapes.

2.2 Fitting

A superquadric can be fully rendered with just 5 parameters. To get its position in $\mathbb{R}^3$ we need to include translations and rotations; these account for 6 other parameters. Our goal is to learn the optimal set of 11 superquadric parameters that approximately represent a face scan.

To acquire a reliable sampling of points from the face scans for fitting, we first reliably identify key points on the face, such as the tip of the nose, the innermost eye points, etc. This is done by using the computed normals on the surface and then iterating to find the optimal point in a region. These key points are found using curvature estimation and localization, as described in [47]. We use the key points for defining bounds for the optimization problem: the topmost point of the forehead (that we can reliably accept from our scans), the bottommost point (the chin), and the left and right cheek bones. We then sample uniformly within these bounds to get a point cloud.

We use a multi-stage optimization procedure with two different optimization runs. We first compute the shape parameters using the active-set algorithm. We maintain very rigid bounds on the rotations in this step because of the sensitivity of the algorithm. We then use the parameters obtained to re-compute a fit that includes rotations.

Suppose we have an objective function $g(x)$ and constraints $g_i(x) \ge 0$ that define the search space for $x$. A constraint is said to be active if $g_i(x) = 0$ and inactive otherwise. Thus, the active set is the set of constraints $g_i(x)$ that are active at the current values of the parameters. The active-set algorithm works by iteratively attempting to solve the equality problem defined by the active set through approximations. After computing the Lagrange multipliers of the active set, it removes a subset of the constraints with negative Lagrange multipliers, as well as other constraints that are infeasible.

Traditionally, conjugate gradient descent algorithms are applied to learn the parameters; these algorithms tend to be slower than alternatives because of the overhead of computing the direction from the gradient vector and Hessian matrix. We employ a Matlab package that uses a modified version of the active-set algorithm. The Hessians are computed empirically and modified accordingly. The details and analysis of the algorithm are beyond the scope of this thesis. We refer the reader to [71] and [72] for an introduction to these algorithms.
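The two-stage scheme can be condensed into a short sketch. The version below is our own illustration: it substitutes scipy's bounded least-squares solver for the MATLAB active-set routine used in the thesis, omits the outlier-pruning pass discussed with the fitting results, and uses purely illustrative bounds and initial values. residual() follows (2.27), and F_general() is the helper sketched in Section 2.1.2.

```python
import numpy as np
from scipy.optimize import least_squares

def residual(params, pts):
    """Per-point residual R of (2.27); F**e1 gives the fitting form (2.12)."""
    a1, a2, a3, e1, e2, phi, theta, psi, px, py, pz = params
    F = F_general(pts, a1, a2, a3, e1, e2, phi, theta, psi, (px, py, pz))
    return np.sqrt(a1 * a2 * a3) * (F ** e1 - 1.0)

def fit_superquadric(pts):
    """Two-stage fit: shape first with rotations pinned near zero, then a
    re-fit with rotations freed. All bounds here are illustrative."""
    x0 = np.array([50, 90, 35, 1.0, 1.0, 0, 0, 0, 0, 0, 0], dtype=float)
    lo = [20, 50, 10, 0.1, 0.1, -1e-6, -1e-6, -1e-6, -30, -30, -30]
    hi = [90, 150, 60, 2.0, 2.0,  1e-6,  1e-6,  1e-6,  30,  30,  30]
    stage1 = least_squares(residual, x0, bounds=(lo, hi), args=(pts,))

    lo[5:8] = [-np.pi / 4] * 3      # free the rotations, within [-pi/4, pi/4]
    hi[5:8] = [ np.pi / 4] * 3
    return least_squares(residual, stage1.x, bounds=(lo, hi), args=(pts,))
```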
2.2.1 A robust normalized co-ordinate space

PCA is a decomposition technique that has been applied to various applications in computer vision and pattern recognition. PCA tries to find the best orthogonal basis for the data by searching for axes that maximize projected variances. Thus, given a data matrix $X$, where every column in $X$ represents a grid of (approximately) 100 x 100 3D co-ordinates, we are looking to transform $X$ into its decomposed basis vectors. Since the principal axes of the data are the eigenvectors of its covariance matrix, we compute the eigenvectors of the covariance matrix. The empirical mean of each dimension is subtracted from $X$ to get $\hat{X}$. We now compute the covariance matrix of $\hat{X}$, given by:

$$C = \frac{1}{N} \hat{X} \hat{X}^T \tag{2.28}$$

We then perform a singular value decomposition to get the singular values:

$$C\,U = \Lambda\,U \tag{2.29}$$

The superquadric model captures the larger shape features in a face, as well as pose, and PCA is used to represent the finer features of the face. By looking at the singular values in the diagonal matrix $\Lambda$, we can choose a minimal set of appropriate eigenvectors that capture the energy efficiently. We then find the weights of the projections of the input data onto these eigenvectors and use these values for parametrization.

2.2.2 Fitting Results

The MSU dataset consists of 107 subjects with 3 scans per subject. Scans were collected using the Minolta scanner, whose scanning accuracy is about 0.1 mm. Anchor points, or facial landmarks, are located using the algorithms defined by Colbry and Stockman [47].

Table 2.1: Fitting results on raw scans. Rotations are described in radians.

Points   Rot X              Rot Y              Rot Z
2000     -.008  ± .001      -.0377 ± .002      -.0009  ± .002
1000     -.0084 ± .03       -.0053 ± .04       -.00027 ± .04
750      -.0024 ± .0395     -.0014 ± .0461     -.0045  ± .0464

Table 2.2: Fitting results on the CFDM data set. Rotations are described in radians.

Points   Rot Z              Rot Y              Rot X
1000     -.0010 ± .0053     -.0012 ± .0051     -.0011 ± .0134
750      -.0008 ± .0044     -.0017 ± .0061     -.0007 ± .0128
500      -.0009 ± .0049     -.0016 ± .0090     -.0019 ± .0162
100      -.0014 ± .0067     -.0018 ± .0081     -.0017 ± .0160

Table 2.1 and Table 2.2 describe the average rotations of the superquadric along each dimension (X, Y and Z), along with their standard deviations, for both raw scans and CFDM-normalized scans. We observed that the rotations were generally smaller in the CFDM-normalized data set.

Figure 2.3: Superquadric fit to a CFDM face scan. The superquadric is plotted in green.

A sample fit is shown in Figure 2.3. It is important to note that we use a double optimization method of fitting. The superquadric was first fit to the CFDM scan. Points were pruned (gross outliers) based on the first fit and the fitting procedure was re-initialized. We found this to give a more robust fitting. This dual procedure particularly helped with finding good rotations, since solving for all parameters simultaneously sometimes resulted in a very poor choice of axes. So, in our approach, in the first pass we do not estimate the rotations, and do so only in the second pass of fitting.

Sampling was done such that we chose points between the chin and the middle of the forehead along the y-axis. We limited our data to lie between the left and right cheek bones. We also took the extra precaution of pruning away data beyond a certain z depth; this helped with the fitting procedure. Points that lie between 0 and -10 (which usually included protrusions, like the tip of the nose) were pruned. This served two purposes: a) it helped in generating a more informative model that can capture the larger curvature of the face, and b) it increased the speed of the system.

We experimented with various choices of sampling points. We generally found that including regions like the tip of the nose and other protrusions forced the objective function to find an inappropriate X-Y coordinate space for the face. In our work, we have constrained sampling using only depth constraints, but an interesting idea could be to remove certain regions such as the eyes, nose and lips and resample for fitting.

Since our data is 2.5D data, we found that the parameter for the z-axis radius was very sensitive. Intuitively, this makes sense, since there can be multiple solutions that are optimal for a given point cloud. We found that modifying the objective function to penalize only the z-radii resulted in a poor choice of axes. The solution that we found worked was to constrain the space of values the z-radius parameter could take. This resulted in stable fits.

Now, let us compare Figures 2.4 and 2.5. We see that the N-S exponent (the 4th parameter) differs on average by about 0.3 between the two faces. This exponent basically tells us whether the superquadric is going to be pinched or not. Notice how in Figure 2.5 it tends to pinch a bit more than in the first figure. The horizontal exponential component tends to account for the elliptical nature of faces. Thus, the closer to an ellipse a face is, the larger this parameter. Both Figures 2.4 and 2.5 have only average values for the horizontal exponent. In Figure 2.6, the vertical (N-S) exponent is large, and we notice a small pinch in the superquadric.

Figure 2.4: Superquadric fit for a subject's 3 scans from the MSU-CFDM data. The numbers above the fits (left to right) are the 5 shape parameters a1, a2, a3, e1, e2: [53.18, 93.6, 36.87, 1.34, .48], [53.18, 88.72, 25.03, 1.11, .74], [53.10, 92.75, 30.30, 1.39, .6].

Figure 2.5: Superquadric fit for a subject's 3 scans from the MSU-CFDM data. The numbers above the fits (left to right) are the 5 shape parameters a1, a2, a3, e1, e2: [56.24, 111.5, 39.20, 1.74, .69], [58.33, 110.5, 42.48, 1.74, .70], [58.27, 106.75, 43.54, 1.74, .50].
Both Figures 2.4 and 2.5 have only average values for horizontal exponents. In Figure 2.6, the vertical (N-S) exponent is large and we notice a small pinch in 29 ' frontal ne 53.18,93.6,36.87,1.34,.48 53.18,88.72,25.03,1.11,.74 53.10.92.75,30.30,1.39,.6 .200 .7 0~ ,5 100 o ,/ 100 '50 ' -soj $0 -100 1 0 ~.. g -155‘ N .1 -1oo ‘“3\~-\\M/ ~c. 100 -100 o 100 -100 Figure 2.4: Superquadric Fit for a subject’s 3 scans from the MSU-CFDM data. . The numbers above the fits (Left to Right) are the 5 shape parameters al. (12, a3, 61, 62 ac 56.24.111.5,39.20.1.74,.69 frontal 58.33,110.5,42.48,1.74..70 58.27.106.75.43.54.1.74.50 Figure 2.5: Superquadric Fit for a subject’s 3 scans from the MSU-CFDM data . The numbers above the fits (Left to Right) are the. :3 shape parameters a1, a2, a3, 61, E2 30 the superquadric. The horizontal exponent (EVV or the 5th parameter) is a little above average thus giving a shape that is in between a box and an ellipse.If the horizontal exponential component is low then the superquadric tends to be more box-shaped (if it is not pinched). Figure 2.7, shows histogram plots of the various shape parameters on the MSU dataset. It is interesting to note the nature of their empirical distributions. The X- radii seemingly fits an approximate Poisson distribution, with a mean around 55mm. The Y-radii fits a normal distribution with low variance very well, thus implying that only the two extremum will help in separation. The Z-radii again would seemingly fit a normal distribution with mean about 30mm. The exponential parameter N—S fits a normal distribution with a large variance and a heavier left tail, while the E-W parameter fits a normal distribution with a large variance and a heavier right tail. Thus, both of these can contribute towards separability of faces. It is important to note that these histograms should be interpreted with due care because the sample size of our dataset is small and the fitting procedure only partially robust. It is important to note that the rotations of the superquadric along the YZ axis tends to be more stable with the increase in the number of points in the cloud. Although the error of fit tended to be a bit higher(more outliers) for larger sized point clouds, the parameters estimated tended to be similar. The idea of why our procedure works better may be intuitively understood by looking at a poorly normalize data sample from the CFDM dataset and its corre- sponding fit(using our model). Figure 2.8 shows one such case. We estimate some 20 such scans to be poorly normalized in our dataset, which roughly accounts for 3-4 more eigenvectors required to describe the CFDM well. 31 frontal 56.71 102.75 38.23 1.75 0.61 57.64 94.50 37.38 1.65 .69 O -50 100 me sm 56.49 96.50 37.57 1.69 .63 Figure 2.6: Superquadric Fit for a subject 1035 3 scans. The numbers above the fits (Left to Right) are the 5 shape parameters (11, a2, a3, 61, 62 1 00 80 60 40 No of faces 20 0 50 60 X Radii 70 1 00 so 60 40 No of faces 20 0 0 1 N-S Exponent No of faces No of faces 100 80 80 ,,, so 60 § 3: 4o 40 l g l 2 20 2o 0 o 50 100 150 o 50 Y Radii z Radii 150 100 50 o o 1 2 E-W Exponent Figure 2.7: Histograms of the 5 superquadric shape parameters. X,Y and Z radii are measured in millimeters(mm). 100 50 -1oo ' 0 ‘g_ . . 10° -100 -50 o 50 100 Figure 2.8: Superquadric fit to a poor CFDM face scan. Note in the top left corner, the scan has many holes in it. Holes are a result of the reflected laser not being captured due to hair, etc . The superquadric is plotted in green. 
The blue points are the sampled data points for fitting the superquadric.

2.3 PCA and Eigenfaces

Principal Component Analysis (PCA), or the Karhunen-Loève transform, was originally introduced by the mathematical statistician Karl Pearson [73]. PCA works by finding orthogonal axes that maximally capture the variance in the data. PCA as a method for characterizing human faces was first proposed in [74,75], and it was introduced as a tool for face recognition by Turk and Pentland [30,76].

The procedural details for PCA are as follows. Let us assume we have m faces in our dataset, and that the vectorized size of each face is d.

• We first organize a collection of faces that are normalized. Usually, normalization includes a transformation of the face such that the tip of the nose is aligned at the center of the image with a frontal pose.

• Now, create column vectors, one for each face, and arrange them next to each other, i.e. $X = [\vec{x}_1 \dots \vec{x}_m]$. Thus, X is of size d × m.

• We then compute the mean face $\bar{x}$ from this matrix and subtract the mean from each face vector i:
$$\vec{y}_i = \vec{x}_i - \bar{x} \qquad (2.30)$$

• We then compute the covariance matrix of the matrix $Y = [\vec{y}_1 \dots \vec{y}_m]$:
$$C = \frac{1}{m} Y Y^T \qquad (2.31)$$

• We then compute the eigenvectors and eigenvalues of the matrix C:
$$C V = V \Lambda \qquad (2.32)$$
where V is the set of eigenvectors and Λ is a diagonal matrix containing the eigenvalues.

• We then choose the k eigenvectors corresponding to the top k eigenvalues, which capture the maximal variance in the data. The choice of k is subject to the needs of the application; the trade-off is between accuracy and computational complexity. If we require a more accurate representation, i.e. more variance represented, then we choose more eigenvectors, but that increases the dimensionality of the system, and more computational effort is required at higher dimensions.

• Typically, it is computationally very expensive to perform the eigendecomposition of a d × d matrix. For example, if every face were a small image of 100 × 100 pixels, the covariance matrix would be of size 10000 × 10000. A large number of elements in this matrix do not contain useful information (resulting in a sparse matrix in some cases), so computing the eigenvectors of such a matrix is not only infeasible in some cases but also not very accurate. This problem was addressed by Turk and Pentland [30,76], who suggested that it is possible to compute the eigenvectors as follows:
$$C' = \frac{1}{m} Y^T Y \qquad (2.33)$$
$$C' V' = V' \Lambda' \qquad (2.34)$$
$$V = Y V' \qquad (2.35)$$
Thus, we can now compute the eigenvectors of the d × d matrix using the m × m matrix.

• On choosing the eigenvectors, we then project each face (mean subtracted) onto the eigenvectors to get the eigen projection weights:
$$\vec{w}_j = V^T \vec{y}_j \qquad (2.36)$$
where $\vec{w}_j$ is a k-dimensional vector of weights; stacking these for all faces gives a k × m matrix in which each row corresponds to the weights for one eigenvector (eigenface).

• From the previous step, reconstruction can easily be done for face j as
$$\hat{x}_j = \sum_{i=1}^{k} w_{ji} \vec{v}_i + \bar{x} \qquad (2.37)$$
The addition of the mean is important since, while computing the eigenvectors, we subtracted the mean to center the axes.

2.4 Toward a Parametric Face

The ramifications of having a 3D face that can be represented by a small set of parameters are plenty. Our primary goal of parameterization is to create a set of bins that, once created, can be used to partition the space of human faces. Now, when we want to recognize a face using matching algorithms, we would only need to match it with the set of faces in a bin. This reduction in search space leads to a large reduction in time complexity.
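The eigenface computation of Section 2.3 can be illustrated with a short sketch. This is a minimal numpy illustration of the procedure, not the code used in this work; the function and variable names are hypothetical.

```python
import numpy as np

def eigenfaces(X, k):
    """X: d x m matrix of vectorized faces (one column per face).
    Returns the mean face, the top-k eigenfaces and the k x m weights."""
    mean = X.mean(axis=1, keepdims=True)
    Y = X - mean                        # mean-subtracted faces (Eq. 2.30)
    m = X.shape[1]
    Cp = (Y.T @ Y) / m                  # small m x m matrix (Eq. 2.33)
    vals, Vp = np.linalg.eigh(Cp)       # eigendecomposition (Eq. 2.34)
    order = np.argsort(vals)[::-1][:k]  # indices of the top-k eigenvalues
    V = Y @ Vp[:, order]                # lift to d-dim eigenvectors (Eq. 2.35)
    V /= np.linalg.norm(V, axis=0)      # normalize each eigenface
    W = V.T @ Y                         # projection weights (Eq. 2.36)
    return mean, V, W

# Reconstruction of face j (Eq. 2.37): mean + V @ W[:, j]
```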
For representing the finer features of the face, we sampled approximately 100 × 100 points based on the superquadric coordinate space. We assumed (and verified through the trace of the eigenvalues) that the axes of the fitted superquadric better represent the symmetry of human faces compared to raw and CFDM scans. Then, using the steps described in Section 2.3, we computed the eigenvectors.

We compared an approximate 100 × 100 feature mesh under three different conditions:

1. MSU dataset with superquadric normalization (raw dataset, with no preprocessing)
2. MSU dataset with CFDM normalization (as described in [47], and no superquadric normalization)
3. MSU dataset with both CFDM and then superquadric normalization

To compare the meshes from these three different normalizations, we looked at the trace of the eigenvalues. We found that the top 10 eigenvalues for (1) contained 82% of the energy, while for (2) they contained about 97% of the energy. (3), on the other hand, did better with about 99% of the energy. Thus, we see that superquadric normalization along with CFDM normalization provides a better normalizing coordinate space.

Chapter 3

Clustering Representations

In this chapter, we will look at the results of clustering the representations described in earlier chapters. First, we provide an overview of the clustering methods used in our work and their objective functions.

3.1 A Note On Clustering

Traditionally, clustering is an unsupervised method used to discover previously unknown patterns in data by grouping samples based on some similarity measure. Depending on the application/problem, clustering can involve a variety of algorithms. Given a set of n d-dimensional vectors $\vec{x}_1 \dots \vec{x}_n$, the objective of clustering is to identify groups or partitions in the dataset. The challenges largely involved in clustering can be grouped into the following:

1. To identify clusters (of varied shapes), if knowledge of the total number of clusters k is given
2. To reliably discover k

To address problem 1, there are many solutions, and depending on the application the choice of algorithms is wide. Spectral clustering has been used in document clustering, speech recognition and other applications [77,78]. Hierarchical clustering is used when there exists a tree-like structure in the data, i.e. groups are better represented in a layered manner, with higher layers subsuming lower ones [79]. Other algorithms include (but are not restricted to) probabilistic k-means, learning vector quantization (LVQ), etc.

3.1.1 Unsupervised Learning - Kmeans

Of the many algorithms, k-means still continues to enjoy popularity, principally because of its simple yet efficient objective: that a sample is best associated with the cluster it is most "similar" to [80,81]. A good tutorial on its history and applications is provided by Jain [82]. The objective function is defined as:
$$\arg\min \sum_{i=1}^{k} \sum_{\vec{x}_j \in C_i} \| \vec{x}_j - \mu_i \|^2 \qquad (3.1)$$
where there are k clusters $C_i$, $i = 1, 2, \dots, k$, and $\mu_i$ is the centroid of cluster $C_i$. Thus, we aim to minimize intra-class variance and maximize inter-class variance. Since this is an unsupervised algorithm, the choice of k and the initial locations of the centroids is critical.
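A minimal sketch of Lloyd's algorithm for the objective in Equation 3.1 is given below. This is an illustration rather than the implementation used in our experiments; the iteration count and random initialization are hypothetical choices.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm for Eq. 3.1. X: (n, d) float data, k clusters.
    Returns cluster labels and the (k, d) centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        # assignment step: nearest centroid under squared L2 distance
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: each centroid becomes the mean of its members
        for i in range(k):
            if np.any(labels == i):
                centroids[i] = X[labels == i].mean(axis=0)
    return labels, centroids
```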
Although k-means remains popular, there are some inherent disadvantages to this model. First, k-means tends to discover only Gaussian-shaped clusters. This can be highly limiting if we wish to partition datasets whose clusters are not Gaussian shaped; methods such as spectral clustering can be used to discover such shapes. Further, association to a group is very rigid with this model. Fuzzy k-means, probabilistic k-means, etc. are commonly employed to create a soft association of a sample with a cluster. In probabilistic k-means, every sample is given a probability of association with each group, such that the probabilities sum to 1, thus modeling a probability distribution.

3.1.2 Semisupervised Clustering

In real-world problems, there often exists some information about the data that could be useful in discovering new information/patterns. This information can be in terms of user feedback, or constraints such as those imposed by the real world - e.g. the height of a human cannot be less than a foot or greater than 9 feet. Constraints can also be of a nature such that they determine similarity between particular pairs of points (not necessarily all). Semi-supervised clustering employs a small number of such constraints to aid the unsupervised clustering. We use the clustering model proposed by Bilenko et al. [83,84], who modified an unsupervised k-means based clustering model to include metric learning and constraints.

Let M be a set of must-link pairs of data points, where $(x_i, x_j) \in M$ implies $x_i$ and $x_j$ must lie in the same cluster. Similarly, $(x_i, x_j) \in D$ implies $x_i$ and $x_j$ must lie in different clusters. If we modify Equation (3.1) to include constraints, then we get the following equation:
$$\mathcal{J}_{\mathrm{pckmeans}} = \sum_{x_i \in X} \| x_i - \mu_{l_i} \|^2 + \sum_{(x_i, x_j) \in M} w_{i,j} \, \tau[l_i \neq l_j] + \sum_{(x_i, x_j) \in D} \bar{w}_{i,j} \, \tau[l_i = l_j] \qquad (3.2)$$
where $W = [w_{i,j}]$ is the weight matrix for forcing pairs of points to belong to the same cluster and $\bar{W} = [\bar{w}_{i,j}]$ is the weight matrix for forcing pairs of points not to belong to the same cluster. Lastly, τ is the indicator function: τ[true] = 1 and τ[false] = 0.

If we wish for tighter clustering, then we constrain the rank of the weight matrix A. If $A_{l_i}$ represents the weight matrix for each cluster, then Equation (3.2) can be modified to
$$\mathcal{J}_{\mathrm{mpckmeans}} = \sum_{x_i \in X} \left( \| x_i - \mu_{l_i} \|^2_{A_{l_i}} - \log(\det(A_{l_i})) \right) + \sum_{(x_i, x_j) \in M} w_{i,j} \, \tau[l_i \neq l_j] + \sum_{(x_i, x_j) \in D} \bar{w}_{i,j} \, \tau[l_i = l_j] \qquad (3.3)$$
In [83], they further modify Equation 3.3 so that the constraint penalties account for the appropriate cluster weight matrices, and they propose an optimization algorithm for solving the objective function.

3.2 Clustering Results

This section discusses the clustering experiments that we conducted during the course of our research. The data set in use consists of 107 subjects with 3 scans each. The accuracies reported are based on the effectiveness of the clustering algorithm in placing at least two of the sister scans in one bin; note that this is equivalent to calculating the precision of the system. The clustering procedure includes subtracting the mean and normalizing the data set before running the clustering algorithm. It is interesting to note that the choice of normalization affects recall accuracies; from our work we find that the L2 norm works best.

From Figure 3.1 and Figure 3.2, we can see that the accuracy decreases with the number of clusters. Perhaps the accuracy of the shape parameter based clustering is slightly better. We propose a hybrid method of clustering using shape and PCA based Eigen parameters on the CFDM and superquadric normalized data set.
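Under our reading of the measure just described (a subject counts as correctly recalled if at least two of its sister scans land in the same cluster), the accuracy computation can be sketched as follows. The three-scans-per-subject layout is taken from the text; the function and variable names are hypothetical.

```python
import numpy as np
from collections import Counter

def sister_scan_recall(cluster_labels, subject_ids):
    """Fraction of subjects with at least two sister scans in one cluster.
    cluster_labels[i] is the cluster of scan i; subject_ids[i] its subject."""
    labels = np.asarray(cluster_labels)
    ids = np.asarray(subject_ids)
    hits = 0
    subjects = np.unique(ids)
    for s in subjects:
        scans = labels[ids == s]
        # a subject is a hit if any single cluster holds >= 2 of its scans
        if Counter(scans.tolist()).most_common(1)[0][1] >= 2:
            hits += 1
    return hits / len(subjects)
```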
Figure 3.1: Plot of recall accuracy against cluster size for the CFDM and superquadric normalized dataset, with no Eigen and 5 shape parameters used.

From Figure 3.3, the hybrid method clearly shows promise. We can see that a cluster size between 10-15 is easily and robustly achievable (over 90%). From Figure 3.4 it is also evident that the use of too many PCA weights actually worsens the performance; the optimal number of PCA weights seems to be about 20. Finally, Figure 3.5 shows the cluster plots for the raw data set without CFDM normalization. Clearly, the performance of a hybrid clustering system on the raw data set outperforms k-means.

Figure 3.6 shows the performance of clustering on the CFDM and superquadric normalized data set. Clustering was done using both default constraints and manual constraints with the RISC MPCK-means toolbox; the equation in [83] is employed in the toolbox. We can see that the accuracies are slightly better for larger cluster numbers with the semi-supervised clustering method. Since it is beneficial for us to have as many clusters as possible, this method is interesting.

Figure 3.2: Plot of recall accuracy against cluster size for the CFDM and superquadric normalized dataset, with 10 Eigen parameters and no shape parameters used.

Figure 3.7 shows the rank accuracy of retrieving a sister scan from the test set based on the parameters. We see that an accuracy of above 80% is achievable by using a mixture of shape parameters and Eigen projection weights. Note that this ranking is based on the entire set. The ranking results must be interpreted in conjunction with the clustering procedures, i.e. if we searched for the sister scans in the appropriate cluster, then the number of comparisons could be lower. Since we might have a few 10s or 100s of highly dense clusters, it would save us a lot of overhead if we only had to match the probe face against the nearest gallery faces.

3.3 Time Complexity

Table 3.1 shows some details of the time and space complexity of the various procedures used in this project. As stated in Colbry et al. [2], a probe scan can be matched with an entire gallery (330 scans) in about 5 seconds. If we assume a 15:1 gain in performance by moving from Matlab to C++, the time taken to fit a probe scan to a superquadric is quite small. The only large computational cost is the time taken to sample a mesh from the superquadric space. Currently, we try to find the nearest point between a 100 × 100 mesh and a point on the 3D face; this can be simplified (approximated) and sped up by sampling from a fixed corner. So, if we had a large gallery (say, 10000 unique subject scans), we would still take about 150 seconds to verify a probe scan. This, although feasible, does not make for a very practical system.

Figure 3.3: Plot of recall accuracy against cluster size for the CFDM and superquadric normalized data set, with 10 Eigen and 5 shape parameters used.
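Before turning to partitioning, the arithmetic behind these estimates can be made explicit. This is a back-of-envelope sketch, not a measurement: the per-comparison cost is derived from the 5-second/330-scan figure above, and the gallery size and cluster count match the scenarios discussed in the text. It ignores the cost of selecting which cluster to search.

```python
# Rough verification-time estimate under clustering (figures from the text).
per_match = 5.0 / 330                  # seconds per gallery comparison
gallery = 10000                        # hypothetical large gallery
brute_force = gallery * per_match      # ~150 s, as stated above
clusters = 50
clustered = (gallery / clusters) * per_match  # match within one cluster only
print(f"brute force: {brute_force:.0f} s, clustered: {clustered:.1f} s")
```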
Now, if we are able to partition the data set into even 50 equal-sized clusters, then the time taken to verify can be reduced to 1/30th of a brute-force search, by matching only against the scans in a cluster. Further, if we could rank the clusters by similarity, we could parallelize the matching against multiple clusters and the verification within them. This could allow for a real-time system at large scale. The only roadblocks are the scanning time, robust partitioning and the time taken to extract features. Work on low-cost, high-speed 3D scanners promises to reduce both the time and cost of procuring good face scans. So, in the future, a robust partitioning schema, along with good features and a parallelized matching system, could make for a practical face recognition system even for large galleries.

Figure 3.4: Plot of recall accuracy against cluster size for the CFDM and superquadric normalized data set, with 50 Eigen and 5 shape parameters used.

3.4 Feature Analysis

3.4.1 Submeshed Projections

Our motivation is that a single mesh of approximately 100 × 100 data points centered around the nose is not effective in capturing all the variance in the human face. So, in this section we explore submeshing and the corresponding Eigen projections.

Figure 3.5: Plot of recall accuracy against cluster size for the raw data set, with 50 Eigen and 5 shape parameters used.

We split the 100 × 100 mesh into smaller meshes of 25 × 25, thus resulting in 16 submeshes. We then apply Eigen decompositions to each of the submeshes, project each submesh onto a few eigenvectors and compute weights. These weights are the parameters for each submesh. Table 3.2 studies the variance captured by the top 3 eigenvalues in each of the submeshes. The submeshes are arranged the same way we'd view a face, i.e. the entry (1,1) represents the right eye (user-centric) of a normalized scan. This provides a slightly intuitive understanding of human faces: we see that the regions around the lips and nose are more sensitive than the other regions.

From the experiments so far, it is clear that a simple parametrization using superquadrics will not suffice. To study this further, we can look at the entropy estimates of the parameters. The entropy H(x), where $f(x_i)$ is the empirical probability of a particular feature value/data point $x_i$, is defined as
$$H(x) = - \sum_{i=1}^{n} f(x_i) \log_2(f(x_i)) \qquad (3.4)$$

Figure 3.6: Plot of recall accuracy against cluster size for the CFDM and superquadric normalized data set, with 20 Eigen and 5 shape parameters, using the MPCK-means algorithm.

Table 3.3 shows a comparison of entropy estimates using Equation 3.4. We see that the entropy of the superquadric parametrization in conjunction with the submeshed Eigen projections clearly outperforms the other parametrizations. We apologize for the abrupt inclusion of the Gabor features' projections in this table, but we include it for completeness; the 100 here refers to the number of points sampled on the convex hull and the 10 refers to the number of eigenvectors employed.
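A sketch of how such entropy estimates can be computed from an empirical histogram follows (Equation 3.4). The thesis does not specify its estimator, so this is one plausible reading, and the bin count is a hypothetical choice.

```python
import numpy as np

def entropy_estimate(values, bins=20):
    """Empirical entropy (Eq. 3.4) of a 1-D feature from a histogram."""
    counts, _ = np.histogram(values, bins=bins)
    f = counts / counts.sum()          # empirical probabilities f(x_i)
    f = f[f > 0]                       # 0 * log(0) is taken as 0
    return -(f * np.log2(f)).sum()
```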
Intuitively, the submeshed parametrization makes sense as well. Superquadrics can capture large variances in faces: the length of a face, its general latitudinal and longitudinal curvatures, etc. In fact, the entropy estimates of just the two curvatures of superquadrics are fairly high, with the east-west curvature capturing more information.

Figure 3.7: Rank accuracy: retrieving sister scans based on parameters, plotted against the number of neighbors, for four parameter sets (no PCA; shape + 5 PCA; shape + 10 PCA; shape + 30 PCA).

Process                                     Details
(320 x 240) Scan Time                       ≈ 2 sec
Curvature Calculation                       0.42 sec
Symmetry Alignment (ICP)                    0.23 sec
Orthogonal Projection                       0.22 sec
Total CFDM Generation                       0.85 sec
Verifying Probe vs Gallery (330 scans)      ≈ 5 sec
Raw File                                    1.58 MB
Compressed Raw File                         0.88 MB
400 x 700 CFDM file (ppm)                   1.68 MB
400 x 700 Compressed CFDM                   0.24 MB
Fitting Superquadrics                       ≈ 5 sec (Matlab)
Sampling From Superquadric Space            ≈ 3 min (Matlab)
Euclidean Matching to a Cluster             << 1 sec (Matlab)
Time Taken to Calculate PCA weights         << 1 sec (Matlab)

Table 3.1: List of processes/data formats and their corresponding times and memory requirements. Some data from Colbry et al. [2].

Row/Col   Col1   Col2   Col3   Col4
Row1      95%    94%    93%    93%
Row2      97%    92%    92%    96%
Row3      97%    89%    90%    94%
Row4      94%    96%    94%    91%

Table 3.2: Table showing the variance captured by the top 3 eigenvalues of each submesh.

Parameter Type                              Entropy Estimate
Superquadric Parameters                     2.42
Superquadric Curvatures (ε1, ε2)            [1.6, 6.7]
Eigen Projection (100 x 100)                1.02
Eigen Projection (25 x 25)                  4.38
Gabor Features Projections (100, 10)        2.33

Table 3.3: Table comparing entropy estimates of superquadric fitting, 100 × 100 Eigen projections and 25 × 25 Eigen projections.

This goes to show that there is a lot more information in the curvatures of faces than in the points themselves in a Euclidean space. The real variance lies in finer details such as the length of a nose, the pouty nature of lips, high cheek bones and so forth. A submeshed projection thus captures these variances better, assuming a well normalized data set.

Chapter 4

Parameterizing Expressions: Recognizing Smiles

The study of expressions is important for multiple reasons. Detecting expressions benefits many applications, such as gauging user interest, stress levels, etc. Another key reason to study expressions is that face recognition is often confounded by expressions. There has been a lot of work studying expressions in the face research community, largely dealing with 2D images, and varying degrees of success have been achieved in detecting various affective states. Edwards et al. [85] use the Active Appearance Model (AAM) to model expressions, using key points and PCA for both texture and shape information. Gabor filter based expression recognition is also a well studied approach; Lyons et al. [86] use Gabor banks to model expressions. Other approaches to expression modeling include elastic bunch graph matching algorithms, as seen in Wiskott et al. [87].

As far as 3D faces are concerned, the following approaches have been explored with varying degrees of success. We present an overview only for completeness and not as a survey of existing methods. Wang et al. [88,89] present a 3D primitive feature analysis (Topographic Context), where they segment the face into 8 regions and label the sub-regions based on their convexity.
They then create feature vectors by computing the ratios between the various labels in a region. This ratio, they claim, is unique and sufficient to represent different expressions, and they show fairly successful results based on this model. Although they do compare their work with a Gabor bank based approach, we wish to point out that they merely computed a simple intensity calculation at manually located fiducial points. Huang et al. [90] present some work on calculating 3D image feature vectors based on Euclidean distance computation between manually selected anchor points. Other interesting work includes Mpiperis et al. [91], who propose a bilinear model that uses anchor points and regularizers to smooth the fit between the model (to be fit) and the surface (given), using both model-to-surface and surface-to-model distances. They claim that such a model helps to learn regions of the surface such as valleys and ridges.

Thus, although there exists a critical body of work studying expressions in 3D face data, there exists no truly automatic system for expression recognition that does not use pre-calculated or manually chosen anchor points (during training). Although there exist models, such as Active Shape Modeling using Point Distribution Models [29], that can alleviate this setback fairly well, they inevitably add more computational time. In this work, we attempt to parameterize expressions (smiles) without the use of anchor points. The chapter is organized as follows: we first review some theoretical concepts of Gabor filters, we then describe our method of parameterization, and lastly we describe our experimental results.

4.1 Gabor Filters

Early cortical cells in the V1 region of the visual cortex have been shown to be sensitive to both orientation and spatial frequency [7,92]. Daugman further suggested that a cell's response can be well approximated using a 2D Gabor filter. Thus, computer vision researchers have taken a keen interest in Gabor filters. Gabor filters have been used extensively in computer vision research, and even more so in biometrics research; they are often used for finding key points in recognition algorithms [87,93,94]. We refer the reader to survey papers for a clear overview of the algorithms that employ Gabors for authentication [95] and expression [96]. A brief but rigorous tutorial on Gabors by Movellan can be useful to the interested reader [97].

In our work, we use such a Gabor filter bank. A 2D generalization of the Gabor function is given by
$$\Psi_i(\vec{x}) = \frac{\|\vec{k}_i\|^2}{\sigma^2} \, e^{-\frac{\|\vec{k}_i\|^2 \|\vec{x}\|^2}{2\sigma^2}} \left[ e^{j \vec{k}_i \cdot \vec{x}} - e^{-\frac{\sigma^2}{2}} \right] \qquad (4.1)$$
Each $\Psi_i$ is a plane wave, characterized by the vector $\vec{k}_i$, enveloped by a Gaussian function, where σ is the standard deviation of this Gaussian. The center frequency of the ith filter is given by the wave vector
$$\vec{k}_i = \begin{pmatrix} k_{ix} \\ k_{iy} \end{pmatrix} = \begin{pmatrix} k_v \cos\theta_\mu \\ k_v \sin\theta_\mu \end{pmatrix} \qquad (4.2)$$
Each filter is represented by a scale $k_v$ and an orientation $\theta_\mu$. We computed the Gabor responses for a standard set of frequencies and orientations, $v \in \{0, \dots, 4\}$ and $\theta_\mu \in \{-\pi, -3\pi/4, \dots, \pi\}$. The first term in (4.1) deals with the periodicity of the filter, while the term within the square braces compensates for the DC value of the kernel. This operation makes the filter insensitive to illumination.

Figure 4.1: A typical Gabor filter bank. Left to right are the changing frequencies; top to bottom are the changing spatial orientations. Typically, an input face is convolved with each of the filters to give a Gabor intensity map, and features are extracted from the various intensity maps for the specific application.
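A minimal sketch of constructing one filter from Equations 4.1 and 4.2 and convolving it with a depth map is given below. The kernel size, σ = π, and the mapping from the scale index v to a frequency magnitude are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(v, mu, sigma=np.pi, size=31):
    """One filter from Eqs. 4.1-4.2: scale index v, orientation index mu."""
    kv = (np.pi / 2) * 2.0 ** (-v)        # assumed scale-to-frequency map
    theta = mu * np.pi / 4 - np.pi        # orientations -pi, -3pi/4, ...
    k = np.array([kv * np.cos(theta), kv * np.sin(theta)])
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    sq = x ** 2 + y ** 2                  # ||x||^2 over the kernel grid
    k2 = kv ** 2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * sq / (2 * sigma ** 2))
    wave = np.exp(1j * (k[0] * x + k[1] * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * wave                # complex kernel (Eq. 4.1)

def gabor_response(depth_map, v, mu):
    """Magnitude of the convolved response (one 'intensity map')."""
    return np.abs(fftconvolve(depth_map, gabor_kernel(v, mu), mode='same'))
```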
Gabor filters are nothing but products between a harmonic function (4.2) and a Gaussian envelope (4.1). Thus, they can be thought of as edge or contour detectors that detect both valleys and ridges. With respect to face research, they can help detect many key regions of a face: the eyes, the tip of the nose, lips, moles, scars, etc. The different orientations are controlled by the harmonic function, and the scale or frequency is controlled by how often the Gaussian envelope repeats itself. Each filter is convolved with the input face, and the high-response regions are the potential regions of interest in the original image. Some features that are larger (eye sockets) will show up at a lower scale, while features that are smaller (like moles) show up at larger scales. Depending on the application, the responses of a few filters can be combined to increase the robustness of detection.

Figure 4.2: Frontal scan with no expression of subject 002 from the MSU-Face database.

Figure 4.3: Frontal scan with smile expression of subject 002 from the MSU-Face database.

Figure 4.4: Convolved responses of the no-expression frontal scan (Figure 4.2) using the Gabor bank described in Figure 4.1. Notice how some responses identify the regions around the eyes, nose and lips, while others tend to identify regions that are not easily identifiable. Note also how the Gabor responses identify the boundary of the face scan from the input depth map.

Figure 4.5: Convolved responses of the smile-expression frontal scan (Figure 4.3) using the Gabor bank described in Figure 4.1. Notice how some responses identify the regions around the eyes, nose and lips, while others tend to identify regions that are not easily identifiable. Note also how the Gabor responses identify the boundary of the face scan from the input depth map.

Figure 4.6: Block diagram showing the procedure for parameterizing smiles: normalized input face scan → Gabor filter bank → blob detection → parameterize blobs (convex hulls + PCA on perimeter points).

Let us number the columns and rows of Figures 4.4 and 4.5 as 1 through 8 (left to right) and 1 through 5 (top to bottom). In Figure 4.4 we see at (2,5) the eyes, nose and lip regions identified. These features can also be identified with some other filters, but it is clearest here. Similarly, in Figure 4.5 we see the same 4 features identified, except that the shape of the blobs is different; they seem somehow "softer", thus denoting the smile. These features are almost repeatably identified in all input scans.

4.2 Cluster Parametrization: A Convex Hull And PCA Approach

4.2.1 Parameterization of Expressions

Our approach to parameterizing expressions (smile versus no expression) uses the simple idea of detecting regions of interest on faces using Gabor filter banks. These regions of interest, as shown in the figures above, vary in size and shape between faces that are smiling and those that are not. We then attempt to parameterize these clusters using an ASM-like approach; Figure 4.6 explains the procedure. As described in the previous section, we obtain a set of 40 convolved images for every input face. On our data, the filter with scale $v = 1$ and orientation $\theta_\mu = 0$ responds most discriminatively for the task of smile recognition.
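The hull-and-PCA parameterization of Figure 4.6 can be illustrated as follows, assuming blob pixel coordinates have already been extracted from the chosen Gabor response. The uniform perimeter resampling is our reading of "sample uniformly along the perimeter"; the function names and the eigenbasis handling are hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull

def perimeter_samples(blob_xy, n=100):
    """Uniformly resample n points along the convex hull of one blob."""
    hull = ConvexHull(blob_xy)                    # Qhull, as in the thesis
    loop = np.append(hull.vertices, hull.vertices[0])   # close the polygon
    verts = blob_xy[loop]
    seg = np.linalg.norm(np.diff(verts, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])   # arc length at each vertex
    s = np.linspace(0, t[-1], n, endpoint=False)  # n uniform arc positions
    x = np.interp(s, t, verts[:, 0])
    y = np.interp(s, t, verts[:, 1])
    return np.column_stack([x, y]).ravel()        # 2n-dim perimeter vector

def cluster_weights(perimeter_vecs, k=10):
    """PCA projection weights of the perimeter vectors (the parameters)."""
    Y = perimeter_vecs - perimeter_vecs.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return Y @ Vt[:k].T                           # one k-vector per blob
```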
Figure 4.7: Gabor responses for smile versus neutral expression, along with the input scans.

Figure 4.8: Blobs identified on the Gabor response of a neutral scan of subject 13 from the MSU database.

From the response shown in Figure 4.7, we perform blob detection. To do this effectively, we first remove the high-intensity responses around the border of the image and then perform blob detection based on simple relative geometric locations. We experimented with various clustering algorithms to identify the blobs robustly, but found that blob detection outperformed clustering methods such as nearest neighbour, shortest distance and average distance clustering. Figure 4.8 shows an example of blobs identified on Gabor responses.

Once the 4 blobs have been identified from the Gabor response, a convex hull is extracted (see Appendix A for more details on convex hulls). Our idea is to sample uniformly along the perimeter (hull) of each cluster. We then perform PCA on the perimeter points. The projection weights of the perimeter points of each cluster are the parameters for that cluster. Thus, we obtain 4 sets of parameters for each face.

4.2.2 Recognizing Smiles

We evaluated our proposed methods on the Michigan State University Canonical Face Depth Map (MSU-CFDM) dataset used by Colbry et al. [47], normalized using the procedures stated earlier. We mainly employed Support Vector Machines for classification; although we did experiment with quadratic kernels (the equivalent of QDA) and even tree classifiers, we found that SVMs worked best. All results reported below are based on 10 rounds of hold-out cross validation: 80% of the data was used to train the classifier and 20% was used to test its accuracy.

Prin. Components   Accuracy   Variance
5                  75%        .002
10                 76%        .003
15                 71%        .003

Table 4.1: Performance of the feature vector for classification of neutral vs smiling faces, using 200 points on the convex hull.

From Table 4.1, there seems to be a critical point in the number of principal components used with respect to classification accuracy: too many or too few projection subspaces resulted in poor classification accuracies.

No. of Points   Accuracy   Variance
200             76%        .003
100             78%        .003
50              74%        .002

Table 4.2: Performance of the feature vector for classification of neutral vs smiling faces, using varying numbers of points on the convex hull projected onto 10 principal components.

Table 4.2 compares performance accuracies using various perimeter point densities. Here we see that if we use too few points or too many, the accuracy of classification tends to decrease.
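The hold-out evaluation protocol described above can be sketched with scikit-learn as follows. This is an illustration of the protocol rather than our original code, and the SVM kernel and regularization settings are assumptions, since the thesis does not specify them.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def holdout_accuracy(X, y, rounds=10, test_size=0.2):
    """Mean and variance of accuracy over repeated 80/20 hold-out splits.
    X: feature vectors (e.g. hull-PCA weights); y: smile / no-smile labels."""
    accs = []
    for seed in range(rounds):
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        clf = SVC(kernel='rbf', C=1.0).fit(Xtr, ytr)   # assumed settings
        accs.append(clf.score(Xte, yte))
    return np.mean(accs), np.var(accs)
```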
No. of Eigen Projections   Accuracy   Variance
3                          80%        .001
5                          76%        6e-4
10                         73%        .001

Table 4.3: Performance of the feature vector for classification of neutral vs smiling faces, using submeshed Eigen projection weights.

We also experimented with training classifiers using the feature vector from the submesh Eigen projections; the accuracies are reported in Table 4.3. We see that it performs marginally better than the features extracted from Gabor filtering, with lower variances, and thus possibly more stable classification.

Chapter 5

Discussion

We have shown a novel parametric form for 2.5D faces, using superquadrics and PCA based parametrization. Colbry and Stockman [47] show a robust normalized face-space in their work, and show how a robust face-space can lead to faster and more accurate matching of face pairs. We show that we can find an even more robust normalized face-space. Such a face-space has beneficial impacts in other areas, such as robotic surgeries. We show that the average rotations are on the order of $10^{-3}$ radians for all 3 rotations on the CFDM data set. Although the mean squared error (MSE) of fitting is large, the axes obtained show that superquadrics help find a good coordinate space for faces. This result is again verified by the trace of the eigenvalues, showing that the sum of the top 10 eigenvalues captures about 99% of the energy.

It is important to note that we believe superquadrics cannot represent 3D faces completely (and accurately). They only help capture larger shape variations, which help to find about 20 clusters with an accuracy of about 85%. It would be fair to say that they only capture larger variations, such as overall curvature and diameters. Since the set of geometric shapes spanned by 3D faces is strictly ellipsoidal (or superellipsoidal), the results presented here support this idea.

To speed up traditional biometric verification and recognition systems, we propose a method of partitioning the data set. We show that cluster numbers between 20-30 can give a fairly accurate recall of over 90% using simple k-means clustering. Employing the MPCK-means algorithm, we show that we can create more clusters (30-40) with an accuracy of about 80%. Thus, clustering on the face data set could give us an order-of-10 speedup in large data systems using as few as 30 shape parameters. Given a probe face, we can find its sister scan (another scan of the same subject) within 20 matchings with a probability of 0.85, as opposed to matching it against all 214 scans. If each matching costs about a second, then even with a large gallery we can match a person in about 20 seconds.

We also propose a fully automatic method of extracting features for recognizing smiles, and we show that we can reliably recognize smiles at about 80% accuracy. There are many limitations to our current model; the responses of the Gabor filters are sometimes insufficient to reliably extract the convex hull. Thus, to achieve a truly automatic system, we will have to assume at least a robust normalization schema. It is important to note that the time complexity of our system is fairly trivial: given a normalized face, we were able to estimate the coefficients in under a few seconds. Assuming a 15-fold increase in performance using C++, we can perform recognition in real time. It is also important to note that our dataset consisted of samples with many holes in them, due to features such as hair and anomalies where the reflected laser light was not captured (examples in the Appendix and Chapter 2). Thus, the results reported in this thesis are a conservative estimate, with improved results expected on better data.

Future Work

We would like to explore how the clustering methods perform when the number of scans per person in the data set increases. We also plan to test our system on a larger data set, such as the Face Recognition Grand Challenge v2 (FRGC) data set [54], to verify the performance of our results. Another very interesting dimension to study would be the average number of clusters that must be evaluated before a match is made; we expect this value to be fairly low on large databases.

It is clear from the literature and Table 3.3 that shape curvatures carry a lot of discriminative power. Nevertheless, there is a strong requirement to obtain a truly automated feature space.
There exists room to try more hybrid approaches based on the extraction of facial features and surface features. We hypothesize that extracting shape curvatures for the regions enveloped by the convex hulls (proposed in our method) using Gabor filters could perform very well for parameterizing expressions. Extracting shape curvatures for such select regions would also greatly reduce computational cost while still maintaining autonomy. Our future work will involve working along these lines and testing our work on other datasets, such as those presented by Yin et al. [89]. Using shape curvature descriptive methods, such as those proposed by Wang et al. [88], could also be a good way to parameterize 3D faces for clustering.

The problem of representing 3D faces for clustering is a recent problem, and this thesis presents one approach to representing 3D faces. We also hope that it has shed some light on the usefulness of superquadrics for representation and clustering, and that this work lays some groundwork for practical performance on these problems.

Chapter 6

Appendix

6.1 A - Convex Hull

Given a set of points X, the convex hull is the set of points $\hat{X}$ such that $\hat{X}$ envelopes the set of points X. There are many algorithms to determine the convex envelope, or convex hull, of a given set of points; we refer the reader to [99] for some of them. Convex hulls have many applications in computer vision, computer graphics, Geographic Information Systems (GIS) and many other fields. The convex hull is related to the Delaunay triangulation and can also be related to the Voronoi diagram [100]. We extract the convex hull using the qhull package [101] through Matlab [102]. For a given set of points, we first extract the convex hull as shown in Figure 6.1.

Figure 6.1: Convex hull (blue line) of a random set of points (red plus) drawn from a normal distribution with mean μ = [1, 2] and covariance Σ = [0.9, 0.2; 0.2, 0.5].
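In the same spirit as the qhull/Matlab extraction above, a small scipy sketch (scipy's ConvexHull wraps the same Qhull library, and is the call used in the pipeline sketch of Section 4.2) reproduces the construction of Figure 6.1. The point count and random seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Random points comparable to Figure 6.1 (mean and covariance from the caption).
rng = np.random.default_rng(0)
pts = rng.multivariate_normal([1, 2], [[0.9, 0.2], [0.2, 0.5]], size=200)

hull = ConvexHull(pts)       # Qhull under the hood, as with Matlab's qhull
print(hull.vertices)         # indices of the points on the hull, in order
print(hull.area)             # in 2-D, the 'area' attribute is the perimeter
```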
6.2 B - Dataset Description

The data used in this research was collected by former labmate Dirk Colbry and professors George Stockman and Anil Jain. This appendix briefly describes the dataset collected. The 3D models used in this study were produced by the Minolta Vivid 910 3D scanner, which can generate a 2.5D image of the face in less than 2 seconds. A 2.5D image is a reduced representation of a 3D image in that it does not define a mathematical solid; that is to say, the 2.5D face image does not divide 3D space into 3 regions - the region outside the 3D face, the region on the surface of the 3D face and the region inside the 3D face (hollow space). The Vivid 910 provides scans accurate up to 0.2mm [103]. The Vivid 910 estimates structure from lighting: it projects a horizontal plane of laser light onto the surface of an object, and a 640 x 480 pixel color camera picks up the curve produced by the interaction of the laser light with the object. It then triangulates the depth of the illuminated surface points. Since this type of sensor is well calibrated, it produces scan depth values accurate to less than a millimeter (±0.10mm in fine mode).

There are two major modes of operation for the Vivid 910 scanner - fast and fine. Table 6.1 describes the operation modes and scan times in detail.

Operation      Fast    Fine    Description
Camera setup   14.2    14.2    Camera calibration and setup; required only during start up
Focus          3.4     3.4     Minor focus adjustment, if the subject moves
Load           0.072   0.072   Scanner setup time
Scan           0.87    3.74    Scanning time
Color          0.85    0.85    Record a color image
Release        1.72    4.59    Total time the subject needs to hold still
Download       1.04    1.92    Download image from scanner to computer
Total          2.83    6.58    Total time for processing (without setup/focus)

Table 6.1: Operation of the Vivid 910 scanner (times in seconds). Details from the PhD dissertation of Dirk Colbry [3].

The Michigan State University Canonical Face Depth Map (MSU-CFDM) dataset consists of scans taken only in the fast mode, with scan resolutions set at 320 x 240 pixels; this scanning takes place in less than a second. The dataset consists of scans from 111 subjects. For 107 subjects exactly 3 scans were collected (4 subjects had only 1 or 2 scans each and were exempt from our analysis). Thus, in total 321 scans were used in this research.

Institutional Review Board (IRB) Approval

All data collected in this research met IRB approved standards and procedures. The IRB number for this research is #03-851, titled "Face modeling for communication and recognition".

6.3 C - Sample Superquadric Fitting

This appendix describes some more examples of superquadric fitting.

Figure 6.2: 2.5D facescan of a subject from the MSU-CFDM dataset.

Figure 6.3: 2.5D facescan of a subject from the MSU-CFDM dataset along with a superquadric fit, with the x-y axes visible.

Figure 6.4: 2.5D facescan of a subject from the MSU-CFDM dataset along with a superquadric fit, with the y-z axes visible.

Figure 6.5: 2.5D facescan of a subject from the MSU-CFDM dataset along with a superquadric fit, with the x-z axes visible. The superquadric fit parameters are [64.39, 95.5, 40, 0.9062, 0.8998].

Figure 6.6: 2.5D frontal facescan of a subject from the MSU-CFDM dataset.

Figure 6.7: 2.5D frontal facescan of a subject from the MSU-CFDM dataset along with a superquadric fit. The fit parameters are [56.6, 98.2, 41.72, 1.4742, 0.6360].

Figure 6.8: 2.5D frontal facescan of a subject from the MSU-CFDM dataset.

Figure 6.9: 2.5D frontal facescan of a subject from the MSU-CFDM dataset along with a superquadric fit. The fit parameters are [56.8, 92, 41.67, 1.4469, 0.7807].

Figure 6.10: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing a smile expression.

Figure 6.11: 2.5D frontal facescan with smile expression of a subject from the MSU-CFDM dataset along with a superquadric fit. The fit parameters are [61.05, 90, 44.13, 1.5375, 0.9761].

Figure 6.12: 2.5D frontal facescan that is both poorly normalized and has many holes.

Figure 6.13: The 2.5D frontal facescan of Figure 6.12 (poorly normalized, with many holes) fit with a superquadric. Note how the superquadric finds the axes in spite of the holes and poor normalization.

6.4 D - Expression Modeling Samples

This appendix shows sample scans and the Gabor responses employed for the smile parameterization described in our work.
Figure 6.14: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression.

Figure 6.15: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression, and the absolute Gabor responses.

Figure 6.16: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression.

Figure 6.17: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression, and the absolute Gabor responses.

Figure 6.18: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression.

Figure 6.19: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression, and the real Gabor responses.

Figure 6.20: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression.

Figure 6.21: 2.5D frontal facescan of a subject from the MSU-CFDM dataset showing no expression, and the real Gabor responses.

BIBLIOGRAPHY

[1] A. Jain, R. Bolle, and S. Pankanti, Biometrics: Personal Identification in Networked Society. Kluwer Academic Publishers, 1999.

[2] D. Colbry and G. Stockman, "Real time identification using a canonical face depth map," IET Computer Vision, Special Issue on 3D Face Processing, pp. 74-92.

[3] D. Colbry, "Human face verification by robust 3D surface alignment," PhD dissertation, Michigan State University, 2006.

[4] J. M. Henderson, C. Larson, and D. Zhu, "Cortical activation to indoor versus outdoor scenes: an fMRI study," Experimental Brain Research, vol. 179, no. 1, pp. 75-84, 2007.

[5] M. Mudigonda, P. Ramkumar, D. Zhu, G. Stockman, and R. Jin, "Multi-voxel pattern analysis identifies brain regions that discriminate indoor and outdoor scenes," vol. 10, no. 9.

[6] J. Haxby, M. Gobbini, M. Furey, A. Ishai, J. Schouten, and P. Pietrini, "Distributed and overlapping representations of faces and objects in ventral temporal cortex," Science, vol. 293, no. 5539, p. 2425, 2001.

[7] J. Daugman, "Two-dimensional spectral analysis of cortical receptive field profiles," Vision Research, vol. 20, no. 10, pp. 847-856, 1980.

[8] I. Biederman, "Perceiving real-world scenes," Science, vol. 177, no. 43, pp. 77-80, 1972.

[9] J. Belliveau, D. Kennedy Jr, R. McKinstry, B. Buchbinder, R. Weisskoff, M. Cohen, J. Vevea, T. Brady, and B. Rosen, "Functional mapping of the human visual cortex by magnetic resonance imaging," Science, vol. 254, no. 5032, p. 716, 1991.

[10] S. Ogawa, D. Tank, R. Menon, J. Ellermann, S. Kim, H. Merkle, and K. Ugurbil, "Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging," Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 13, p. 5951, 1992.

[11] K. Kwong, J. Belliveau, D. Chesler, I. Goldberg, R. Weisskoff, B. Poncelet, D. Kennedy, B. Hoppel, M. Cohen, and R. Turner, "Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation," Proceedings of the National Academy of Sciences, vol. 89, no. 12, p. 5675, 1992.

[12] G. McCarthy, A. Puce, J. Gore, and T. Allison, "Face-specific processing in the human fusiform gyrus," Journal of Cognitive Neuroscience, vol. 9, no. 5, pp. 605-610, 1997.

[13] P. Ekman and W. Friesen, "Facial action coding system," 1978.
[14] A. O'Toole, H. Abdi, K. Deffenbacher, and D. Valentin, "Low-dimensional representation of faces in higher dimensions of the face space," Journal of the Optical Society of America A, vol. 10, no. 3, pp. 405-411, 1993.

[15] R. Epstein, J. Higgins, and S. Thompson-Schill, "Learning places from views: variation in scene processing as a function of experience and navigational ability," Journal of Cognitive Neuroscience, vol. 17, no. 1, pp. 73-83, 2005.

[16] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," International Journal of Computer Vision, vol. 1, no. 4, pp. 321-331, 1988.

[17] S. Li, "Markov random field models in computer vision," Computer Vision - ECCV'94, pp. 361-370, 1994.

[18] R. Huang, V. Pavlovic, and D. Metaxas, "A hybrid face recognition method using Markov random fields," in Proceedings of the 17th International Conference on Pattern Recognition, 2004, pp. 157-160.

[19] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 509-522, 2002.

[20] D. Lowe, "Object recognition from local scale-invariant features," in ICCV. IEEE Computer Society, 1999, p. 1150.

[21] D. Terzopoulos and D. Metaxas, "Dynamic 3D models with local and global deformations: Deformable superquadrics," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 703-714, 1991.

[22] F. Solina and R. Bajcsy, "Recovery of parametric models from range images: The case for superquadrics with global deformations," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 131-147, 1990.

[23] D. Terzopoulos, A. Witkin, and M. Kass, "Constraints on deformable models: Recovering 3D shape and nonrigid motion," Artificial Intelligence, vol. 36, no. 1, pp. 91-123, 1988.

[24] B. Curless and M. Levoy, "A volumetric method for building complex models from range images," in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques. ACM, 1996, p. 312.

[25] G. Rhodes, F. Proffitt, J. Grady, and A. Sumich, "Facial symmetry and the perception of beauty," Psychonomic Bulletin and Review, vol. 5, pp. 659-669, 1998.

[26] F. Galton, "Personal identification and description," Journal of the Anthropological Institute of Great Britain and Ireland, vol. 18, pp. 177-191, 1889.

[27] B. Manjunath, R. Chellappa, and C. von der Malsburg, "A feature based approach to face recognition," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1992, pp. 373-378.

[28] D. DeCarlo, D. Metaxas, and M. Stone, "An anthropometric face model using variational techniques," in Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques. ACM, 1998, p. 74.

[29] T. Cootes, C. Taylor, D. Cooper, and J. Graham, "Active shape models - their training and applications," Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38-59, 1995.

[30] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

[31] R. Clarke, "Human identification in information systems: Management challenges and public policy issues," Information Technology & People, vol. 7, no. 4, pp. 6-37, 1994.

[32] Y. Itakura, M. Hashiyada, T. Nagashima, and S. Tsujii, "Proposal on personal identifiers generated from the STR information of DNA," International Journal of Information Security, vol. 1, no. 3, pp. 149-160, 2002.

[33] E. Newham, "The biometric report," SJB Services, New York, 1995.
[34] A. Ross and A. Jain, "Information fusion in biometrics," Pattern Recognition Letters, vol. 24, no. 13, pp. 2115-2125, 2003.

[35] ——, "Multimodal biometrics: An overview," in 12th European Signal Processing Conference (EUSIPCO), Vienna, Austria, 2004, pp. 1221-1224.

[36] A. Ross, K. Nandakumar, and A. Jain, Handbook of Multibiometrics. Springer-Verlag New York Inc, 2006.

[37] D. Nitzan, A. Brain, and R. Duda, "The measurement and use of registered reflectance and range data in scene analysis," in IEEE Proceedings, vol. 65, 1977, pp. 206-220.

[38] V. Blanz and T. Vetter, "A morphable model for the synthesis of 3D faces," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press/Addison-Wesley Publishing Co., 1999, pp. 187-194.

[39] T. Vetter and V. Blanz, "Estimating coloured 3D face models from single images: An example based approach," European Conference on Computer Vision, p. 499, 1998.

[40] T. Akimoto, Y. Suenaga, and R. Wallace, "Automatic creation of 3D facial models," IEEE Computer Graphics and Applications, pp. 16-22, 1993.

[41] V. Blanz and T. Vetter, "Face recognition based on fitting a 3D morphable model," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1063-1074, 2003.

[42] P. Mahalanobis, "On the generalized distance in statistics," Proceedings of the National Institute of Sciences, India, vol. 2, 1936.

[43] X. Lu, A. Jain, and D. Colbry, "Matching 2.5D face scans to 3D models," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 31-43, 2006.

[44] U. Park and A. Jain, "3D model-based face recognition in video," Advances in Biometrics, pp. 1085-1094, 2007.

[45] I. Kakadiaris, G. Passalis, G. Toderici, M. Murtuza, Y. Lu, N. Karampatziakis, and T. Theoharis, "Three-dimensional face recognition in the presence of facial expressions: An annotated deformable model approach," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 640-649, 2007.

[46] P. Besl and H. McKay, "A method for registration of 3-D shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, pp. 239-256, 1992.

[47] D. Colbry and G. C. Stockman, "Canonical face depth map: A robust 3D representation for face verification," in Computer Vision and Pattern Recognition, 2007.

[48] K. Chang, K. Bowyer, and P. Flynn, "Face recognition using 2D and 3D facial data," in ACM Workshop on Multimodal User Authentication, 2003, pp. 25-32.

[49] ——, "An evaluation of multimodal 2D+3D face biometrics," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 619-624, 2005.

[50] K. Bowyer, K. Chang, and P. Flynn, "A survey of approaches and challenges in 3D and multi-modal 3D+2D face recognition," Computer Vision and Image Understanding, vol. 101, pp. 1-15, 2006.

[51] D. Colbry, G. Stockman, and A. Jain, "Detection of anchor points for 3D face verification," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 93, 2005, p. 94.

[52] M. Irfanoglu, B. Gokberk, and L. Akarun, "3D shape-based face recognition using automatically registered facial surfaces," Pattern Recognition, vol. 4, pp. 183-186, 2004.

[53] A. Mian, M. Bennamoun, and R. Owens, "Keypoint detection and local feature matching for textured 3D face recognition," International Journal of Computer Vision, vol. 79, no. 1, pp. 1-12, 2008.
[54] P. Phillips, P. Flynn, T. Scruggs, K. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, "Overview of the face recognition grand challenge," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, pp. 947-954.

[55] "www.geometrix.com."

[56] "www.konicaminolta.com."

[57] "http://www.qlonerator.com/."

[58] P. Phillips, H. Moon, P. Rauss, and S. Rizvi, "The FERET evaluation methodology for face-recognition algorithms," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1997, pp. 137-143.

[59] R. S. Germain, A. Califano, and S. Colville, "Fingerprint matching using transformation parameter clustering," in IEEE Computational Science and Engineering, 1997, pp. 42-49.

[60] K. W. Bowyer, K. Hollingsworth, and P. J. Flynn, "Image understanding for iris biometrics: A survey," Computer Vision and Image Understanding, vol. 110, pp. 42-49, 2007.

[61] A. Barr, "Superquadrics and angle preserving transformations," IEEE Computer Graphics and Applications, vol. 1, pp. 11-23, 1981.

[62] Z. X. S. Niu and J. Li, "Comparisons of 3D shape clustering with different face area definitions," Second International Conference on Digital Human Modeling, vol. 5620, pp. 55-63, 2009.

[63] Y. Zhang and C. Kambhamettu, "Robust 3D head tracking under partial occlusion," in Proc. of the Fourth International Conference on Automatic Face and Gesture Recognition, 2000.

[64] F. Haar, M. Daoudi, and R. Veltkamp, "Shape retrieval contest 2008: 3D face scans," in Shape Modeling and Applications (SMI), 2008, pp. 225-226.

[65] A. P. Pentland, "Perceptual organization and the representation of natural form," Artificial Intelligence, vol. 28, pp. 293-331, 1986.

[66] A. Jaklic, A. Leonardis, and F. Solina, Superquadrics and Their Geometric Properties, Thessaloniki, Greece, 2000, ch. 2, pp. 13-39.

[67] E. Bardinet, L. D. Cohen, and N. Ayache, "Fitting 3-D data using superquadrics and free-form deformations," in 12th IAPR Int. Conf. on Pattern Recognition, Jerusalem, 1994, pp. 79-83.

[68] ——, "A parametric deformable model to fit unstructured 3D data," Computer Vision and Image Understanding, vol. 71, pp. 39-54, 1998.

[69] ——, "Tracking and motion analysis of the left ventricle with deformable superquadrics," Medical Image Analysis, vol. 1, pp. 129-149, 2004.

[70] A. H. Barr, "Superquadrics and angle preserving transformations," IEEE Computer Graphics and Applications, vol. 1, pp. 11-23, January 1981.

[71] S. Han, "A globally convergent method for nonlinear programming," Journal of Optimization Theory and Applications, vol. 22, p. 297, 1977.

[72] P. Gill, W. Murray, and M. H. Wright, Numerical Linear Algebra and Optimization, 1991, vol. 1.

[73] K. Pearson, "On lines and planes of closest fit to systems of points in space," Philosophical Magazine Series 6, vol. 2, no. 11, pp. 559-572, 1901.

[74] M. Kirby and L. Sirovich, "Application of the Karhunen-Loève procedure for the characterization of human faces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, 1990.

[75] L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces," Journal of the Optical Society of America, vol. 4, no. 3, pp. 519-524, 1987.

[76] M. Turk and A. Pentland, "Face recognition using eigenfaces," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 591, 1991, pp. 586-591.

[77] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," in Advances in Neural Information Processing Systems 14: Proceedings of the 2001 Conference, 2001, pp. 849-856.
[78] F. Bach and M. Jordan, "Learning spectral clustering, with application to speech separation," The Journal of Machine Learning Research, vol. 7, 2006.

[79] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in KDD Workshop on Text Mining, vol. 400, 2000, pp. 525-526.

[80] A. Jain and R. Dubes, Algorithms for Clustering Data, 1988.

[81] A. Jain, M. Murty, and P. Flynn, "Data clustering: a review," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264-323, 1999.

[82] A. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651-666, 2010.

[83] M. Bilenko, S. Basu, and R. Mooney, "Integrating constraints and metric learning in semi-supervised clustering," in Proceedings of the Twenty-First International Conference on Machine Learning. ACM, 2004, p. 11.

[84] S. Basu, M. Bilenko, and R. Mooney, "A probabilistic framework for semi-supervised clustering," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery. ACM, 2004, pp. 59-68.

[85] G. Edwards, T. Cootes, and C. Taylor, "Face recognition using active appearance models," Computer Vision - ECCV'98, p. 581, 1998.

[86] M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding facial expressions with Gabor wavelets," in Proceedings of the 3rd International Conference on Face & Gesture Recognition, 1998, p. 200.

[87] L. Wiskott, J. Fellous, N. Krüger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," in Computer Analysis of Images and Patterns. Springer, 1997, pp. 456-463.

[88] J. Wang and L. Yin, "Static topographic modeling for facial expression recognition and analysis," Computer Vision and Image Understanding, vol. 108, no. 1-2, pp. 19-34, 2007.

[89] L. Yin, X. Wei, Y. Sun, J. Wang, and A. M. Rosato, "A 3D facial expression database for facial behavior research," in 7th Int. Conf. on FGR, Southampton, UK, 2006.

[90] T. S. Huang, "3D facial expression recognition based on automatically selected features," 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1-8, June 2008.

[91] I. Mpiperis, S. Malassiotis, and M. G. Strintzis, "Bilinear models for 3-D face and facial expression recognition," IEEE Transactions on Information Forensics and Security, vol. 3, no. 3, pp. 498-511, September 2008.

[92] J. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters," Journal of the Optical Society of America A, vol. 2, no. 7, pp. 1160-1169, 1985.

[93] Y. Moses, Y. Adini, and S. Ullman, "Face recognition: The problem of compensating for changes in illumination direction," European Conference on Computer Vision, pp. 286-296, 1994.

[94] P. Phillips, H. Wechsler, J. Huang, and P. Rauss, "The FERET database and evaluation procedure for face-recognition algorithms," Image and Vision Computing, vol. 16, no. 5, pp. 295-306, 1998.

[95] W. Zhao, R. Chellappa, P. Phillips, and A. Rosenfeld, "Face recognition: A literature survey," ACM Computing Surveys (CSUR), vol. 35, no. 4, pp. 399-458, 2003.

[96] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, pp. 259-275, 2003.

[97] J. Movellan, "Tutorial on Gabor filters," Open Source Document, 2002.
[98] M. Mudigonda and G. C. Stockman, "Superquadric representation of 3D faces: Toward reducing time complexity of recognition," in SPIE, Biometric Technology for Human Identification VII, Orlando, FL, 2010.

[99] T. Cormen, C. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. The MIT Press, 2001.

[100] K. Brown, "Voronoi diagrams from convex hulls," Information Processing Letters, vol. 9, no. 5, pp. 223-228, 1979.

[101] "www.qhull.org."

[102] "www.mathworks.com."

[103] C. Boehnen and P. Flynn, "Accuracy of 3D scanning technologies in a face scanning scenario," 2005, pp. 310-317.