.,
{5.0%

.‘c,

H. F) .2 .531 .
n. r. .s 1»‘.v..l.4t$§,
, k,

, . h»... E
.r.J ”mummy. ‘
#3 EU.

3‘s

3%,. 7 .

.c
:....
«Ibikla
thank“.
. s .
3.5%. u 5.
.X“ 9-.le
i 6.;

. v . in

.. . :

.33....391ﬁiu
m: x! am“

Mr .
3.113.
‘lti

.IAIC

.qum

5.5»...-
1 I... ..
.1. to t.

 

 

:. 13.1%? ‘ .6:

E 7 .\...!,I m V3.41}
. .. 4:.ﬁﬁﬁ: z.”

THESIS LIBRARY
., ‘1 , Michigan State
c: i _ University

This is to certify that the
dissertation entitled

STATISTICAL MODELS FOR FINGERPRINT INDIVIDUALITY

presented by
YONGFANG ZHU

has been accepted towards fulﬁllment
of the requirements for the

degree in Statistics and Probability

 

 

WWW

Major Professor” 5 Signature
07 / 5L3 / o 8

Date

 

MSU is an affirmative-action, equal-opportunity employer

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5/08 KzlPrQIAOCAPreleIRC/Dateoue indd

STATISTICAL MODELS FOR FINGERPRINT INDIVIDUALITY
By

Yongfang Zhu

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Statistics and Probability

2008

ABSTRACT
STATISTICAL MODELS FOR FINGERPRINT INDIVIDUALITY
By

Yongfang Zhu

The US. Supreme Court in the 1993 case of Daubert vs. Merrell Dow Pharmaceuticals
ruled that scientiﬁc evidence presented in a court is subject to the principles of scientiﬁc
validation that include whether (i) the particular technique or methodology has been sub-
ject to statistical hypothesis testing, (ii) its error rate has been established, (iii) standards
controlling the technique’s Operation exist and have been maintained, (iv) it has been peer
reviewed, and (v) it has a general widespread acceptance. Following Daubert, forensic
evidence based on ﬁngerprints was ﬁrst challenged in the 1999 case of USA vs. Byron
Mitchell based on the “known error rate” condition mentioned above, and subsequently,
in 20 other cases involving ﬁngerprint evidence. The main concern with the admissibility
Of ﬁngerprint evidence is the problem of individualization, namely, that the fundamental
premise for asserting the uniqueness Of ﬁngerprints has not been objectively tested. In
other words, the ﬁngerprint matching error rate is unknown. The problem Of ﬁngerprint
individuality can be formulated as follows: Given a query ﬁngerprint, what is the proba-
bility of ﬁnding a ﬁngerprint in a target population having features Similar to that Of the
query? To answer this question, the variability Of ﬁngerprint features, namely, minutiae in
the target population needs to be understood and quantiﬁed in terms Of statistical models.

For minutiae interclass variability, a family of mixture models iS developed to model
the minutiae variability Of individual ﬁngers, including minutiae clustering tendencies and
dependencies in different regions of the ﬁngerprint image domain. For a heterogeneous
population Of ﬁngers, a hyper-mixture model is proposed to cluster the population into

homogeneous groups having similar distribution given by the mixture models. The group

densities and weights are acquired by clustering the mixture models ﬁtted to individual
ﬁngers from a sample of the population

Whereas mixture models take into account the minutiae interclass variability, a com-
pound stochastic model is developed for more sources of minutiae variability, i.e., besides
interclass variability, the model also considers intraclass variability, such as nonlinear de-
formation and variability due to partial prints.

The proposed models are shown to better describe the Observed variability in the minu-
tiae compared to the model by Pankanti et a1. [35]. To quantify ﬁngerprint individuality,
a mathematical model that computes the probability of a random correspondence (PRC)
between minutiae sets of two randomly selected different ﬁngers is derived. Compari-
son Of PRCS with empirical matching probability Shows that the PRCS from the proposed
model are closer to the empirical matching probability than those calculated by the model
of Pankanti et a1. [35].

Copyright by
Yongfang Zhu
2008

This dissertation is dedicated to my professors and my loving family.

ACKNOWLEDGMENTS

Being a doctoral student in statistics, it has been a great opportunity for me to work in the
ﬁeld of biometrics. With the help from my professors and colleagues, my graduate study
at Michigan State University has been fruitful.

I would like to express my gratitude to my advisor, Dr. Sarat C. Dass, for introducing
me into this very challenging and yet very interesting ﬁeld of biometrics. I would like
to thank him for his continuous guidance for both my research and study, as well as his
precious advice for my professional career. Being a non-native English speaker, I received
a lot of help from Dr. Dass. He read and corrected my research articles, which led to
my improvement on scientiﬁc writing. He listened to my talk every time before I gave a
presentation, and offered substantial suggestions.

I am grateful to Dr. Lijian Yang, Dr. Connie Page, Dr. Dennis Gilliland from the
Department of Statistics and Probability, and Dr. Sasha Kravchenko from the Department
of Crop and Soil Sciences, for serving in my thesis committee. Besides the help for my
thesis research, Dr. Yang gave me good suggestions on my professional development. I
accumulated practical experience from the consulting work at the Center for Statistical
Training and Consulting. Dr. Page, Dr. Gilliland, and Dr. Sandra Herman Shared with me
their experience and knowledge in dealing with practical problems, and guided me through
my projects.

I would like to thank Dr. James Stapleton for his help and encouragement on my teaching
and study. He has been very supportive to me, as well as to all the other students. When we
have problems with our study, or any problem in everyday life, he would spend time with
us, and gave very kind suggestions.

I would like to especially thank Dr. Mark M. Meerschaert and the Department of Statis-
tics and Probability, for being very considerate and kind to me and many other students,
helping us going through hard times, and trying the best to solve any possible problems.

I want to thank my colleagues in the PRIP lab at the Department of Computer Science

vi

and Engineering, Michigan State University, for their help and being supportive for my
research: Hong Chen, Yi Chen, Dirk Colbry, Steve Krawczyk, Pavan K. Mallapragada,
Karthik Nandakumar, and Unsang Park. I thank Dr. Anil K. Jain for letting me use the
PRIP lab for my research and for reading my thesis. I would like to thank Ms. Wenmei
Huang for discussions on course work and research. I thank Dr. Peter Xinya Zhang for
spending a lot of time with this thesis work, and for continuously being very supportive for
my entire study. As one of the ﬁrst readers, he gave me a lot of comments on the structure
and narration of the thesis from a perspective outside the ﬁeld. I also received from him
many helpful suggestions for my presentations.

I beneﬁted from the comments and suggestions from Dr. Sharath Pankanti and Dr. Salil
Prabhakar on some of my publications. I thank the National Science Foundation for sup-
porting this research.

Last but not least, I would like to thank my family for their continuous support, under-

standing and encouragement for everything in my life.

vii

TABLE OF CONTENTS

LIST OF TABLES ................................... x
LIST OF FIGURES .................................. xii
1 Introduction to Fingerprint Based Recognition ................. 1
1.1 Overview of Fingerprint Recognition ..................... 1

1.2 General Procedures in Fingerprint Recognition ............... 4
1.2.1 Fingerprint Enrollment ........................ 4

1.2.2 Fingerprint Acquisition ........................ 6

1.2.3 Fingerprint Feature Extraction .................... 7

1.2.4 Fingerprint Matching ......................... 12

1.3 Statistical Test of Hypothesis in Fingerprint Recognition .......... 13

1.4 Summary ................................... 17

2 Fingerprint Individuality ............................. 18
2.1 Importance of Fingerprint Individuality ................... 18

2.2 Early Studies on Fingerprint Individuality .................. 20

2.3 A Stochastic Model of Fingerprint Individuality ............... 21
2.3.1 Deﬁnition of a Random Minutiae Correspondence ......... 23

2.3.2 Estimation of Fingerprint Individuality by Uniform Model ..... 25

2.3.3 Corrected Uniform Model ...................... 28

2.3.4 Limitations of Corrected Uniform Model .............. 29

2.4 Contributions of the Thesis .......................... 31

2.5 Summary ................................... 34

3 Mixture Models for Fingerprint Minutiae .................... 35
3.1 Features for Fingerprint Individuality Model ................. 35

3.2 Mixture Models ................................ 36

3.3 EM algorithm for Estimating OG ....................... 40

3.4 Goodness-of-ﬁt Tests for the Mixture Models ................ 46

3.5 Summary ................................... 48

4 Fingerprint Individuality for a Pair of Fingerprints .............. 51
4.1 Assumptions for Estimating PRC ....................... 51
4.2 Model for Estimating Fingerprint Individuality ............... 53

4.3 Difﬁculties in Estimating Fingerprint Individuality ............. 56
4.4 Poisson Model ................................ 57
4.5 Justiﬁcation of the Poisson Model ...................... 61

viii

4.6 Overlapping Area Model: Comparison of Mixture Model with Corrected
Uniform Model ................................ 61
4.6.1 Determination of no, mo and the Overlapping Area ......... 62
4.6.2 Adaptation of Mixture Model to Overlapping Area Model ..... 62
4.7 Summary ................................... 64
5 Assessment of Fingerprint Individuality: Target Population ,,,,,,,,,, 66
5.1 The Hyper—mixture Density Model ...................... 66

5.2 Relationship between Clusters from Hyper-mixture Model and Fingerprint
Classes .................................... 70
5.3 Assessment of Fingerprint Individuality for a Target Population ....... 72
5.4 Summary ................................... 73
6 Assessing Fingerprint Individuality: Compound Stochastic Models ,,,,,, 74
6.1 Motivation ................................... 74
6.2 Compound Stochastic Model ......................... 75
6.2.1 Construction of Master Minutiae Set ................. 75

6.2.2 Mixture Model on the Centers: Adaptation of Mixture Models to
Compound Stochastic Models .................... 77
6.2.3 Local Perturbation Model ...................... 78
6.2.4 Modeling the Variability of Partial Prints .............. 80
6.3 Conditional Minutiae Synthesis ........................ 81
6.4 Description of Matchers ........................... 85
6.4.1 Matcher for Master Construction ................... 85
6.4.2 Matcher for Synthesized Minutiae Matching ............ 85
6.5 Summary ................................... 86
7 Experimental Results ............................... 87
7.1 Fingerprint Databases ............................. 87
7.2 Fitting the Mixture Models .......................... 91
7.3 Fitting the Hyper-mixture Models ...................... 91
7.3.1 A Check to See if the Clusters of Mixture are Meaningful ..... 91

7.3.2 Evaluation of Hyper-mixture Models on Assessment of Fingerprint
Individuality ............................. 97
7.4 Assessment of Fingerprint Individuality with the Hyper-mixture Models . . 99
7.5 Estimation of Fingerprint Individuality with the Compound Stochastic Model 103
7.6 Summary ................................... 107
8 Conclusions and Future Directions ,,,,,,,,,,,,,,,,,,,,,,,, 109
BIBLIOGRAPHY ................................... 112

ix

1.1

7.1

7.2

7.3

LIST OF TABLES

Comparison of selected biometric technologies adapted from Maltoni et
a1. [27]. UVSL = Universality, DSTC = Distinctiveness, PRMN = Perma-
nence, CLTB = Collectability, PRFM = Performance, ACPT = Acceptabil-
ity, and CRVN = Circumvention.The symbols H, M and L denote High,

Medium and Low, respectively [27]. .....................

Results of the Freeman-Tukey and Chi-square tests for testing the
goodness-of-ﬁt of the mixture and uniform models on NIST. (W, V) means
the whole minutiae location Space S is partitioned into W equal-Size rows
by W equal-Size columns and the minutiae direction space D is partitioned
into V equal-Size blocks initially, prior to merging blocks of insufﬁcient
minutiae with their neighboring blocks. Entries correspond to the number
of ﬁngerprints in each database with p-values above 0.05. The total number
of mixture models that are tested in NIST is 1,997 because three out of the

2,000 ﬁngerprints don’t have enough minutiae (at least 5) for the tests. . . .

Results of the Freeman-Thkey and Chi-square tests for testing the
goodness-of-ﬁt of the mixture and uniform models on DB1. (W, V) means
the whole minutiae location Space S is partitioned into W equal-size rows
by W equal-Size columns and the minutiae direction space D is partitioned
into V equal-Size blocks initially, prior to merging blocks of insufﬁcient
minutiae with their neighboring blocks. Entries correspond to the number
of ﬁngerprints in each database with p-valueS above 0.05. The total num-
ber of mixture models that are tested in this database is 100 since all master

minutiae sets have total number of minutiae more than 5. ..........

Results of the Freeman-Tukey and Chi-square tests for testing the goodness
of ﬁt of the mixture and uniform models on DB2 database. (W, V) means
the whole minutiae location space S is partitioned into W equal-Size rows
by W equal-Size columns and the minutiae direction Space D is partitioned
into V equal-Size blocks initially, prior to merging blocks of insufﬁcient
minutiae with their neighboring blocks. Entries correspond to the number
of ﬁngerprints in each database with p—values above 0.05. The total number
of mixture models that are tested in this database is 100 Since all master

minutiae sets have total number of minutiae more than 5. ..........

3

92

7.4 Comparison of PRCS estimated from three different methods: (1) Individual
mixture models, (H) Hyper-mixture models, and (HI) Random grouping.
The right-most column Shows the relative ratios of the PRCS ......... 98

7.5 The number of clusters, N *, as well as mean /\ and PRCa based on the
hyper-mixture models for the three databases. ................ 99

7.6 A comparison of the PRCa obtained from the mixture and uniform models
based on mean m, n with empirical values. ................. 100

7.7 A comparison of the mean number of matches obtained from the mixture
and uniform models and empirical matches. ................. 100

7.8 A comparison between PRCa obtained from the mixture and uniform mod-
elsform=n=46 andw= 12. ....................... 101

7.9 Table giving the mean m and n in the overlapping area, the mean overlap-
ping area and the value of M for each database. ............... 101

7.10 A comparison between the mean A obtained from the mixture and uni-
form models and the mean number of matched minutiae from the empirical
matches in the overlapping area. ....................... 102

7.11 A comparison between ﬁngerprint individuality estimates using the (a)
Poisson and mixture models, and (b) the corrected uniform model of
Pankanti et a1. [35]. ............................ - . . 102

7.12 The estimation of ﬁngerprint individuality from the compound Stochastic
model when w = 12. Notice that when m = n = 26, none of two synthe-
sized minutiae sets Share 12 or more matched minutiae. Hence the proba-
bility cannot be estimated accurately for the case of m = n = 26 ....... 106

7.13 A comparison of ﬁngerprint individuality estimates. n and m are the total

number of minutiae in the query and the template, respectively. 112 is the
number of matches between the query and template ﬁngerprints. ...... 107

xi

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

1.10

1.11

LIST OF FIGURES

Some examples of biometric traits: (a) ﬁngerprint [26], (b) face, (0) signa-

ture, ((1) voice and (e) hand geometry ....................

Schematic diagram Showing the processing tasks invOlved in the enroll—
ment, veriﬁcation and identiﬁcation modes of a ﬁngerprint-based authenti-

cation system [27]. ..............................

Fingerprints of different quality based on the clarity of ridge and valley

structures: (a) good, (b) medium, and (c) bad. Extracting features and '

establishing matches based on (c) can be difﬁcult. Images are from FVC

2002 DB1 [26] ................................

Fingerprints of different sources: (a) live [26], (b) ink [33], and (c) latent

ﬁngerprint scans [15] .............................

Examples of ﬁngerprint images from the major classes. Images are from

NIST 2000 SD 4 [33] ..............................

A ﬁngerprint image Showing the salient features [33] ............

Gradient directions and magnitudes of a partial ﬁngerprint region from

FVC 2002 DB1 [26] indicated by arrow heads and lengths, respectively. . .
Examples of different minutiae. Images are from FVC 2002 DB1 [26]. . . .

Locations and directions of bifurcation and ending .............

Minutiae feature extraction steps. Different ﬁngerprint processing stages
for extracting minutiae features: (a) original image from FV C 2002 DB1
[26], (b) enhanced image, (c) thinned image and (d) detected features with
minutiae locations indicated by black boxes, and directions represented by

solid lines. ..................................

Example of ﬁngerprint matching. Two impressions of the same ﬁnger from
FV C 2002 DB1 [26] with 37 and 38 minutiae, respectively. 25 true corre-

spondences are found here. ..........................

xii

7

8

9

10

14

1.12 Multiple impressions of the same ﬁnger illustrating the intraclass variabil-
ity [26] .................................... 15

1.13 Illustrating small interclass variability: A pair of impostor ﬁngerprints with
13 features (minutiae location and direction) in correspondence. ...... 16

2.1 Identifying the tolerance region for a query minutia .............. 24

2.2 Comparison of experimental and theoretical probabilities for the number
of matching minutiae: (a) MSU DBI database, and (b) MSU VERIDICOM
database. Figure is the reproduction of Figure 9 in Pankanti et a1. [35] . . . 30

2.3 Outline of the thesis contributions where the labels Cks indicate the chap-
ters where the corresponding contribution is made. ............. 32

3.1 Probability distribution plots of the Von-Mises distribution with center
ug = 37r/4, and with two different precisions, mg and mg, with reg < n3.
The values of 22(6) at 0 and 7r are equal to each other due to the cyclical
nature of the cosine function. ........................ 38

3.2 Assessing the ﬁt of the mixture models to minutiae location and direction:
Observed minutiae locations (white boxes) and directions (white lines) are
shown in panels (a) and (b) for two different ﬁngerprints from the NIST
2000 SD 4. Panels (c) and ((1), respectively, Show the cluster labels for
each minutia in (a) and (b). .......................... 44

3.3 Assessing the ﬁt of the mixture models to minutiae location and direction
observed for ﬁngerprint images (a) and (b) in Figure 3.2. Panels (a) and (b)
Show the BICs when 1 5 G _<_ 5 ........................ 45

3.4 The clusters in 3-D space for ﬁngerprint images in Figure 3.2 (a-b) are
shown in panels (a) and (b) with :r, y, z as the row, column, and the direc-
tion of the minutiae ............................... 49

3.5 Minutiae locations and directions Simulated from the proposed model ((c)
and (d)), and from the uniform distribution ((e) and (f)) for two different
images ((a) and (b)). The true minutiae locations and directions are marked

in (a) and (b). ................................. 50

4.1 Overlapping area of two ﬁngerprints during matching ............ 63

xiii

4.2

5.1

5.2

6.1

6.2

6.3

6.4

6.5

6.6

Convex hull of minutiae and best ﬁtting ellipses. (a) The minutiae from
image (b) and the best ﬁtting ellipse to the minutiae set. p1,p2, c, 0 are the
major axis, minor axis, center, and orientation of the ellipse. (b) Fingerprint

image with minutiae and the best ﬁtting ellipse ............... 64
Determination of the number of clusters for NIST 2000 SD 4. The number
of clusters estimated is 33. .......................... 69

The number of ﬁngerprints in the three main classes for the 33 clusters from
the hyper-mixture models on NIST 2000 SD 4. For each cluster label 1' as
Shown in the horizontal axis, the vertical coordinate of each point shows the
number of ﬁngers in loop (labeled with dots), whorl (labeled with triangles)
and arch (labeled with squares). ....................... 71

Flow chart for constructing compound stochastic model and assessing ﬁn-
gerprint individuality ............................. 75

Master minutiae set construction. Eight impressions are shown which in-
clude the reference impression (b) and the other seven impressions (a). The
number of minutiae in each impression in the ﬁrst row is 29, 30, 27, 32, 32,
38, 24. The number of minutiae in the reference impression in the second
row is 38. There are 70 minutiae centers kept in the master minutiae set . . . 77

The ability of the mixture model to capture clustering characteristics of the
master in (a). The eight impressions are shown in Figure 6.2. Three cluster
components labeled by circles, squares and asterisk in the mixture model
ﬁtted to the minutiae in (a) ........................... 78

Consolidating minutiae: (a) a partial ﬁngerprint image. (b) and (c) Show
the locations and directions of the two labeled minutiae in (a) from eight
aligned impressions. ............................. 80

Scatter plot of the area of ellipse [A( f, l)] versus the total number of minu-
tiae [m( f, l)], and the ﬁtted quadratic polynomial for the FV C 2002 D81 [26]. 82

Simulating m0 = 36 minutiae for FV C 2002 DB1: (a) Finger impression,
(b) Minutiae and minimal ellipse for the impression, (c) Random ellipse
and synthesized minutiae centers from the mixture model, ((1) Synthesized
minutiae after compounding with local perturbations, and (e) Synthesized
impression after rigid transformation. .................... 83

xiv

7.1

7.2

7.3

7.4

7.5

7.6

Examples of ﬁngerprint images from different databases. Images (a—b) are
from NIST database [30]; Images (c-d) are from D81 and images (e-t) are

from DB2 [26] .................................

Empirical distribution of the number of minutiae (m,n) in the NIST

database. The average number of minutiae is 62 ................

Empirical distributions of the number of minutiae (m,n) in the master prints
constructed from (a) DB1 database, and (b) DBZ database. Average num-
ber of minutiae in the master minutiae set for the two databases are 63 and

77, respectively .................................

Estimating the number of clusters for FVC database.The estimated number
of clusters for D31 and DBZ are 9 and 12, respectively. The horizontal
axis shows the number of clusters N and the vertical axis is the value of

gap statistic at N. ...............................

Analysis of Hyper-mixture model using the NIST. The ﬁgure shows the
cumulative distribution of the ranks for the ﬁngerprints in the validation
set. The horizontal axis shows the rank, and the vertical axis Shows the
number of ﬁngerprints that have a rank less than or equal to a given rank.

A comparison between the synthetic and empirical impostor distributions

88

89

90

95

96

for the number of minutiae matches. ..................... 105

XV

CHAPTER 1

Introduction to Fingerprint Based

Recognition

1.1 Overview of Fingerprint Recognition

Biometric recognition refers to the automatic authentication of humans using his/her
anatomical or behavioral characteristics. Biometric recognition is more reliable compared
to traditional approaches, such as password-based or token-based approaches, as biomet-
ric traits cannot be easily Stolen or forgotten. Some examples of biometric traits include
ﬁngerprint, face, Signature, voice and hand geometry (see Figure 1.1). Biometric recogni-
tion systems have been deployed and implemented for human recognition (e. g., US-VISIT
program and the e-biometric passport which stores the owner’s biometric information in a
chip inside a passport). With increasing applications involving human-computer interac-
tions, there is a growing need for fast authentication techniques that are reliable and secure.
Biometric recognition meets such demand.

For a particular biometric trait to be considered for human authentication, several re-
quirements need to be met, namely, (i) universality, (ii) distinctiveness, (iii) permanence,

and (iv) collectability ( [42], [27]). Universality requires that every human being possesses

 

(d) (9)

Figure 1.1: Some examples of biometric traits: (a) ﬁngerprint [26], (b) face, (c) signature,
(d) voice and (e) hand geometry

the biometric trait; distinctiveness requires that two persons have sufﬁciently different trait
characteristics; permanence requires that trait characteristics remain unchanged over time;
collectability requires that naits are quantitatively measurable. There are further consider-
ations for practical biometric systems, such as: (i) performance and authentication rates,
measured in terms of speed and recognition accuracy, (ii) public acceptance for use in our
daily lives, (iii) the extent of which the biometric recognition system can be attacked or
spoofed, and (iv) cost'efﬁciency.

Among biometric traits currently used, ﬁngerprint has the longest history, and has been
widely adopted in both forensic and civilian applications. This is because ﬁngerprint meets

the previously discussed requirements of a successful biometric trait, (see Table 1.1 taken

Table 1.1: Comparison of selected biometric technologies adapted from Maltoni et a1. [27].
UVSL = Universality, DSTC = Distinctiveness, PRMN = Permanence, CLTB = Collectabil-
ity, PRFM = Performance, ACPT = Acceptability, and CRVN = Circumvention.The sym-
bols H, M and L denote High, Medium and Low, respectively [27].

 

Biometric Trait UVSL DSTC PRMN CLTB PRFM ACPT CRVN

 

DNA H H H L H L L
Face H L M H L H H
Fingerprint M H H M H M M
Hand Geometry M M M H M M M
Iris H H H M H L L
Signature L L L H L H H
Voice M L L M L H H

 

from [27] comparing commonly-used biometric traits). Due to wide acceptance of ﬁnger-
prints, ﬁngerprint-based recognition systems continue to dominate the biometrics market
by accounting for 52% of current authentication systems [27].

Rapid development of mobile commerce and mobile banking in recent years has cre-
ated new demands for biometric authentication. Some biometric systems, such as ﬁnger-
print, voice and face, have appeared in some high-end mobile phones. Miniaturized ﬁnger-
print sensors, capable of being embedded in a cell phone, have been recently developed.
Different from commonly-used two-dimensional sensors, new line-scan sensors enable a
ﬁngerprint impression to be captured by swiping a ﬁnger across a line. Thus biometric
authentication, with increasing deployment in various applications, is here to stay.

There are two modes of biometric recognition, namely, veriﬁcation and identiﬁcation. In
the veriﬁcation mode, the recognition task is to verify the claimed identity, 10, of a user
based on an input ﬁngerprint, Q. A biometric system retrieves the template, T, of 10 that is
stored in its database, and extracts features from Q. The extracted and retrieved features are
tested by a ﬁngerprint matcher, and a similarity measure S (Q, T) is obtained (1:1 match).
When the Similarity measure is above (respectively, below) a pre-determined threshold A,

the claimed identity is accepted (respectively, rejected).

On the other hand, when the biometric system is in the identiﬁcation mode, the extracted
features of an input image Q are tested against the extracted features of every stored tem-
plate and a decision is made on whether a match is found or not. Compared to veriﬁcation,
identiﬁcation is much more difﬁcult because no claimed identity is available. For a sys-
tem with M templates, the system searches through the entire database to recognize an

individual (1 to M matches).

1.2 General Procedures in Fingerprint Recognition

After establishing a system database through ﬁngerprint enrollment, both the veriﬁcation
and identiﬁcation tasks can be divided into three different components, namely ﬁngerprint
acquisition, feature extraction, and ﬁngerprint matching. Figure 1.2 shows the basic tasks

in enrollment, veriﬁcation and identiﬁcation of a ﬁngerprint-based recognition system.

1.2.1 Fingerprint Enrollment

Enrollment is the procedure of sensing and Storing ﬁngerprint templates to establish a sys-
tem database. First, a ﬁngerprint image is captured by a sensor. After that, a quality checker
is applied to the image. If the quality is good, the image is retained. Otherwise, the cap-
tured image is deleted and a new ﬁngerprint image is acquired. The process is continued
till the quality of the acquired image iS sufﬁciently good. Image quality is determined by
the clarity of the ridge and valley structures and measured based on many different param-
eters, e.g., image resolution, sensed area, image contrast and the extent of deformation in
the enrolled ﬁnger. The quality checker ensures that the acquired images have low noise.
Noisy images can create problems for later processing (for example, during the ﬁngerprint
matching stage). Figure 1.3 shows three images with good, medium and poor quality from
FVC 2002 DB1 database [26]. Poor quality images produce Spurious features which lead

to ﬁngerprints from the same ﬁnger looking different, or ﬁngerprints from different ﬁngers

Enrollment

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Input Quality
Fingerprint Feature System
and —> Checker —’ Extractor Database
Username
ver'f'cat'on Claimed Identity
1
Input Finger
. Feature Matcher One System
and Claimed _' Extractor W 1:1 Match lemplate Database
IdentIty l
Accept/Reject
Identification
Input Feature Matcher M System
Fingerprint —> Extractor 1:M Match Templates Database

 

 

1

User’s identity or
User not identified

Figure 1.2: Schematic diagram Showing the processing tasks involved in the enrollment,
veriﬁcation and identiﬁcation modes of a ﬁngerprint-based authentication system [27].

   

(b) (C)

Figure 1.3: Fingerprints of different quality based on the clarity of ridge and valley struc-
tures: (a) good, (b) medium, and (c) bad. Extracting features and establishing matches
based on (c) can be difﬁcult. Images are from FVC 2002 DB] [26]

looking alike, and thus should be avoided. Finally, a feature extractor is applied to the

enrolled image and the extracted features are stored in the system database.

1.2.2 Fingerprint Acquisition

Fingerprint acquisition is to capture ﬁngerprint images during the recognition phase and
enrollment phases. There are two types of capture procedure, namely live scan and off-line
scan. Currently both types of ﬁngerprints are used in applications. For example, live ﬁn-
gerprints are used in the Automated Fingerprint Identiﬁcation System (AFIS) [27] whereas
off-line ﬁngerprints are still used in the forensic applications. In a live scan, ﬁngerprints are
acquired directly from the sensors. In an off-line scan, preliminary ﬁngerprint images are
obtained ﬁrst and the ﬁnal ﬁngerprint images are obtained by digitizing the preliminary im-
ages. TWo examples of off-line ﬁngerprint scans are ink-based and latent scans. During the
ink scans, ﬁngers are ﬁrst Spread with ink and rolled from nail edge to nail edge against a
paper ﬁngerprint card. The rolled images can be digitalized through either a paper scanner
or a high quality camera. Latent ﬁngerprint is a ﬁlm of moisture or grease from ﬁngerprint

ridges deposited on the surface of touched objects. Due to poor quality of the lifted image,

     

(b)

Figure 1.4: Fingerprints of different sources: (a) live [26], (b) ink [33], and (c) latent
ﬁngerprint scans [15]

later enhancement methods, such as powder dusting, ninhydrin spraying, iodine fuming,
and silver nitrate soaking, are involved for better ﬁngerprint detection [6]. Figures 1.4 (a-c)

show examples of a live ﬁngerprint, an ink ﬁngerprint and a latent ﬁngerprint.

1.2.3 Fingerprint Feature Extraction
Fingerprint Features

After the ﬁngerprint acquisition, salient features need to be extracted for later matching.
As previously discussed, there is also feature extraction in the enrollment procedure. These
two extractions are similar with one main difference: Feature extraction in the enrollment
stage is used to establish a database and the quality of the acquired image can be controlled,
whereas during the testing phase, especially in a latent scan, we can not guarantee such a
good quality image. Two groups of features, namely global features and local features, are
critical in ﬁngerprint matching. Local features, which are details that are believed to be
unique to an individual, are used for ﬁngerprint matching.

Global features are used for ﬁngerprint classiﬁcation and for ruling out erroneous types

of ﬁngerprints prior to matching. Fingerprint classiﬁcation is the problem of binning ﬁnger-

print images into different classes. Figure 1.5 shows examples of major ﬁngerprint classes
that include arch, loop (includes left loop and right loop), and whorl. Different ﬁngerprint
classes are differentiated by the global ridge structures. For example, the ﬁngerprints of
the left loop have ridges initiated from the left side of the ﬁngerprints and continue to the

center area and ﬁnally come back to the left side.

 

(c)Whorl

Figure 1.5: Examples of ﬁngerprint images from the major classes. Images are from NIST
2000 SD 4 [33].

Commonly-used global features include (i) singular points and (ii) the directional ﬁeld of

   
  

Core

Minutiae
Bifurcation

Delta

Minutiae
‘ Ending
' ~ .4!

 

Figure 1.6: A ﬁngerprint image showing the salient features [33]

ridge ﬂow. Singular points are the discontinuities of ﬁngerprints which consist of delta and
core. A delta is a point in a ﬁngerprint image which is the conﬂuence of ridge ﬂows in three
different directions and a core is the point of inner most ridge with maximum curvature.
The core and delta are labeled in Figure 1.6.

Detection of singularity (i.e., core and delta) has been the focus of many previous studies.
In Nakarnura et a1. [32], and Srinivasan and Murthy [46], singularities were detected by ﬁrst
ﬁnding high curvature regions and then classifying the regions into core, delta and non-
singular regions. In Rao and Jain [37], a method based on geometric theory of differential
equations was used to detect cores and deltas.

The directional ﬁeld reveals direction of the ridge ﬂow for each pixel or a block of pixels
in ﬁngerprint images. Ridge ﬂow direction can be described by an angle 0 with respect
to the x-axis. Opposite ridge ﬂow directions are equivalent, and therefore 0 can only be
determined in [0, 7r]. The main challenge in estimation of directional ﬁeld is that gradient

with opposite directions should not cancel each other, but rather reinforce them (see Figure

    

r' ' - I .
Figure 1.7: Gradient directions and magnitudes of a partial ﬁngerprint region from FVC
2002 DB1 [26] indicated by arrow heads and lengths, respectively.

1.7). Many studies have focused on estimating the directional ﬁeld. These studies include
neural- network based approaches (Wilson et a1. [51]), ﬁlter-based approaches (O’Gorman
and Nickerson [34]), and gradient-based approaches (Hong et a1. [20], Jain et a1. [1], and
Bazen and Gerez [5]).

Local features are anomalies along ridge and valley structures, which are usually called
minutiae. There are several types of minutiae: ending, bifurcation, island, spur, crossover,
lake and others. Figure 1.8 Shows examples of different types of minutiae. A minutia
ending is a point where a ridge terminates, and a minutia bifurcation is a point where a
ridge bifurcates into two almost parallel ridges. This thesis focuses on ridge bifurcation
and ending because the other types of minutiae occur much less frequently compared to
endings and bifurcations in ﬁngerprint images. Moreover, other types of minutiae can be
thought of as combinations of endings and bifurcations. Examples of a minutia ending
and bifurcation in a ﬁngerprint image are shown in Figure 1.6 and Figure 1.8 (a) and (b).
Direction of a minutia ending is normally represented by the angle between the horizontal
axis and the minutiae tangent pointing away from the ridge terminating point. Direction of
a minutia bifurcation is represented by the angle between the horizontal axis and the tangent
pointing into the ridge prior to the bifurcation. Hence the minutiae direction of an ending

or a bifurcation is in range [0, 27r]. Figure 1.9 shows illustrations of minutiae directions

Ki: 5:,

(a) Bifurcation (b) Ending (0) Island
V¢ 7";
(d) Spur (e) Crossover (f) Lake

Figure 1.8: Examples of different minutiae. Images are from FVC 2002 D81 [26].

and locations for both ending and bifurcation, where minutiae location and direction are

represented by s and D, respectively.

Minutiae Extraction

Local features or minutiae, are most important for ﬁngerprint matching. To extract minutiae
from ﬁngerprint images, various algorithms have been developed. Binarization and direct
gray scale approaches are the two most popular methods for extracting minutiae. Both of
them use gray scale images, and the difference is that the binarization approach requires
gray scale images to be converted into black-and-white images, whereas the direct gray
scale method detects minutiae without binarizing gray scale images. The feature extraction
algorithm used in this thesis is a binarization method reported in [1]. The enhancement
process involves strengthening the clarity of the ridge Structures using directional Gabor
ﬁlters. This is followed by thinning where the enhanced ridges are reduced to connected
components of one pixel wide. The minutiae locations are then detected in the thinned

image as breaks or bifurcations in the connected components. Figure 1.10 (a) shows a

ll

 

 

 

(a) Ending (b) Bifurcation

Figure 1.9: Locations and directions of bifurcation and ending

typical ﬁngerprint image in the FVC 2002. Figures 1.10 (b) and (c) Show, respectively,
images after applying enhancement and thinning algorithms to Figure 1.10 (a). Detected

minutiae locations and directions are Shown in Figure 1.10 ((1).

1.2.4 Fingerprint Matching

After features are extracted from the query image, they are matched to the template saved
in the system database. There are three approaches for ﬁngerprint matching [27], namely
correlation-based, minutiae-based and ridge-feature-based matching. Correlation-based
matching computes a correlation between pixel gray values of aligned images. Minutiae-
based matching optimizes the alignment by maximizing the number of matched minu-
tiae pairs between aligned minutiae patterns. Minutiae-based matching is the most popu-
lar among the commercial ﬁngerprint matchers and it is the basis of ﬁngerprint matching
used by forensic ﬁngerprint experts. An example of minutiae-based ﬁngerprint matching iS
Shown in Figure 1.11, where matched minutiae correspondences are marked by connecting
lines. Ridge-based matching is usually applied to ﬁngerprint images with low quality. Usu-

ally, for these low-quality images, correlation calculation is less consistent and minutiae

12

 

1"   i/ﬂ
‘\\\\ y

(a) (b) (C) (d)
Figure 1.10: Minutiae feature extraction steps. Different ﬁngerprint processing stages for
extracting minutiae features: (a) original image from FVC 2002 DB1 [26], (b) enhanced
image, (c) thinned image and (d) detected features with minutiae locations indicated by
black boxes, and directions represented by solid lines.

 

features are unavailable, but ridge-pattem features (e.g., local orientation and frequency,
texture information, and ridge shape) are more reliable. We used minutiae based matching

in this thesis to match two ﬁngerprint images.

1.3 Statistical Test of Hypothesis in Fingerprint Recogni-
tion

The task of biometric recognition can be described in terms of a statistical test of hypoth-

esis. Suppose a query image, Q, corresponding to the true but unknown identity, It, is

acquired. In order to carry out a test to determine whether Q belongs to a claimed identity

IC, template T corresponding to Ic is retrieved from the system database and is matched

with Q. The null hypothesis is that IC is not the owner of the ﬁngerprint Q (i.e., Q is an

impostor impression of 16), and the alternative hypothesis is that 16 is the owner of Q (i.e.,

Q is a genuine impression of Is). The null-altemative hypothesis testing scenario is

H0 2 1,375 [C vs. H1 1 It 216- (1.1)

13

\\‘\ V\ é;

\\\“:\\‘
\\\\.\\\‘: _

\\\‘\2\.

‘- .
. :\\\.r

 

Figure 1.11: Example Of ﬁngerprint matching. Two impressions of the same ﬁnger from
FVC 2002 DB1 [26] with 37 and 38 minutiae, respectively. 25 true correspondences are
found here.

Suppose the matching score between Q and T is S(Q, T). Large (respectively, small)
values of S(Q, T) indicate that T and Q are similar to (respectively, dissimilar to) each
other. If S (Q, T) is lower (respectively, higher) than a pre-speciﬁed threshold A, it leads to
rejection (respectively, acceptance) of H0. Consequently, two types of errors can be made,
the probabilities of which are false reject rate (FRR) and false acceptance rate (FAR),
respectively. By deﬁnition, false reject rate ,which can also be called false non-match rate
(FNMR), is the probability of incorrectly rejecting a genuine ﬁngerprint query, and false
acceptance rate, which can also be called the false match rate (FMR), is the probability of

incorrectly accepting an impostor query. The formulae of FRR and FAR are:

FRR(A) = FNMR(A) _ P(S(Q,T) g ,\|1, = 1,),
(1.2)
FAR(A) = FMR(A) = P(S(Q,T) > MI, ya la).

TWO sources of variability, namely large intraclass variability and small interclass vari-

ability cause erroneous decisions when testing between H0 and H1. Intraclass variability

14

 

Figure 1.12: Multiple impressions of the same ﬁnger illustrating the intraclass variability
[26]

 
  
   

\

  

.I. . ma: ‘6.
riffr “' ‘

‘ \
\t. /‘ ~,yx—g‘iﬁg¢t>i.l \: i

r\ ”ff/f} \:

 
 
 
 
 

   
    
 
  

I!" u. ” II l,‘\\
5.1/1 3 \\\\\\\.\.!..., (9)7 1’31 _“v ‘
' ‘7\\\‘{‘\‘\"\‘\‘\\ ///l

  

_. ‘
”is WI \‘5;
y; m/ \\_,, w/ \\\;\ ‘
s-__ /;,:::,~-

. ‘5 ('-
.’.M ’ ..
t- .rﬂs‘.:\.

  
   

“a.

Figure 1.13: Illustrating small interclass variability: A pair of impostor ﬁngerprints with
13 features (minutiae location and direction) in correspondence.

refers to the fact that ﬁngerprints from the same ﬁnger look different from one another.
Sources for this variability include non-linear deformation due to skin elasticity, partial
print, non uniform ﬁngertip pressure, poor ﬁnger-condition (e.g., dry ﬁnger), and noisy en-
vironment, etc. Non-linear deformation is introduced into a ﬁngerprint image when a three
dimensional ﬁngertip is projected onto a two dimensional sensing plane. Partial image is
due to small sensing surface, capturing only a portion of ﬁngerprint. Poor ﬁnger condition
and noisy environment can produce images whose features are unable to discriminate be-
tween different ﬁngers. In Figure l.12, multiple impressions of the same ﬁnger appear dif-
ferent from each other due to various sources of intraclass variability discussed above. On
the other hand, interclass variability is inherent in different ﬁngers in the population. Small
interclass variability causes ﬁngerprints from different individuals to look very similar to
each other (See Figure 1.13). The research presented in this thesis develops statistically

models for both the interclass and intraclass variability of ﬁngerprint minutiae.

1.4 Summary

This chapter gives an overview of ﬁngerprint-based recognition. Among various biometric
traits, ﬁngerprints are most commonly used. They are also universal, distinctive, permanent
and collectible, and thus have dominated the biometric market over a long time. There are
two modes of ﬁngerprint recognition, i.e., veriﬁcation and identiﬁcation. In a veriﬁcation
mode, a ﬁngerprint is matched against a claimed identity; whereas in an identiﬁcation
mode, a ﬁngerprint is matched against all ﬁngerprint images in the database.

The general procedure of ﬁngerprint recognition includes four steps, namely ﬁngerprint
enrollment, ﬁngerprint acquisition, feature extraction and ﬁngerprint matching. Fingerprint
veriﬁcation can be viewed as a statistical test of hypothesis that involves two types of errors,
namely, false acceptance rate and false reject rate. This thesis focuses on false acceptance
rate since it gives an estimate of the probability of random correspondence (that is, the
probability that a pair of different ﬁngers will match with each other), which is the measure

of ﬁngerprint individuality.

17

CHAPTER 2

Fingerprint Individuality

2.1 Importance of Fingerprint Individuality

There are two premises underlying ﬁngerprint based recognition. The ﬁrst premise is that
ﬁngerprints are permanent, i.e., ﬁngerprints do not change over a person’s life-time. The
second premise is that ﬁngerprints are unique, i.e., the characteristics of ﬁngerprint features
of different ﬁngers are different. The ﬁrst premise has been widely investigated and proven
to be valid based on the anatomy and morphogenesis of friction ridge skin [27]. However,
the second premise of uniqueness has not been thoroughly studied. In particular, when
ﬁngerprints are matched, statistical measures of the conﬁdence associated with the match
have not been thoroughly investigated. Fingerprint individuality is the study of the extent
of uniqueness of ﬁngerprints and this is the focus of the research presented in this thesis.
Investigation of ﬁngerprint individuality is most important in the legal setting. Expert
testimony based on ﬁngerprint evidence is delivered in a courtroom by comparing salient
features of a latent print query lifted from a crime scene with that of the defendant. Thus, we
are in the hypothesis testing scenario of Equation 1.1 where the court has to decide whether
the defendant is truly the criminal (reject H 0) or otherwise (accept H0). A reasonably high

degree of match between the query and template ﬁngerprints from the defendant leads the

18

experts to testify irrefutably that the owner of the latent print and the defendant are the
same person. For decades, the testimony of forensic ﬁngerprint experts was almost never
excluded from these cases, and on cross-examination, the foundation of this testimony was
rarely questioned. A prerequisite to establishing an identity based on ﬁngerprint evidence is
the assumption of discernible uniqueness, i.e., salient features of ﬁngerprints from different
individuals are different. Only when this is true can the experts conclude that the owners of
two different prints with reasonably high degree of similarity are one and the same person.
However, in reality, forensic experts are never questioned on the uncertainty associated with
their testimony (that is, how frequently an observable match between a pair of prints would
lead to errors in identiﬁcation of individuals). Thus, discernible uniqueness precludes the
opportunity to determine error rates of ﬁngerprint matching from analyzing inherent feature
variability and calculating the probability of two different persons sharing a set of common
features.

A signiﬁcant event that broke this trend occurred in 1993 in the case of Daubert v. Merrell
Dow Pharmaceuticals [13] where the US. Supreme Court ruled that in order for an expert
forensic testimony to be allowed in courts, it had to be subject to ﬁve main criteria of
scientiﬁc validation, that is, whether (i) the particular technique or methodology has been
scientiﬁcally tested, (ii) its error rates have been established, and (iii) known standards
of the technique have been developed and well maintained, (iv) the technique has been
peer-reviewed, and (v) the technique has gained broad public acceptance [35]. Forensic
evidence based on ﬁngerprints was ﬁrst challenged under Daubert’s ruling in 1999 in the
case of USA v. Byron Mitchell [49], stating that the fundamental prerrrise for asserting the
uniqueness of ﬁngerprints had not been objectively tested and its matching error rates were
unknown. The Brandon Mayﬁeld Case [36] is another case that challenges the reliability
of ﬁngerprint. In this case, Brandon Mayﬁeld, a lawyer from Oregon, was mistakenly
identiﬁed by FBI as the terrorist who attacked the commuter trains in Madrid in March,

2004. In October, 2007, ﬁngerprint evidence was excluded in the case of Bryan Rose for

19

the conviction of death penalty by a Baltimore County judge [10]. The judge challenged the
reliability of ﬁngerprints based on the error made from ﬁngerprint evidence in the Brandon
Mayﬁeld case [10]. Fingerprint-based identiﬁcation has been challenged in more than 20
court cases in the United States (see [9] for details). It is clear that there is a need to study
ﬁngerprint individuality which is the basis for the admissibility of ﬁngerprint evidence in

COLII’I cases.

2.2 Early Studies on Fingerprint Individuality

There have been a few previous studies that addressed the problem of ﬁngerprint individu-
ality using mathematical models on ﬁngerprint features. All these studies utilized minutiae
features (both location and direction information) to assess individuality. In 1892, Gal-
ton [16] raised the topic of ﬁngerprint individuality for the ﬁrst time. He assumed that
a full ﬁngerprint iS a combination of 24 disjoint and independent square regions, each of
which consists of six ridges. He found that the probability of correctly re-building each
of the 24 regions by only looking at its neighboring ridges is 1 / 2. Thus the probability of
correctly re-building all of the 24 regions is (1 / 2)24. Further, Since the probability of ﬁnd-
ing a speciﬁc ﬁngerprint class (i.e., arch, whorl, left loop, right loop, double loop) is 1 / 16,
and the probability of ﬁnding the correct number of ridges entering and exiting each region
is 1 / 256, the probability of ﬁnding a set of given neighboring ridges is (1 / 16) x (1 / 256).
Based on these assumptions, Galton estimated that the probability of ﬁnding each ﬁnger-
print conﬁguration in a given population is

. . , , 1 1 1 24
P(F1ngerpr1nt configuration): 1—6 x 2576 x (2)

= 1.45 x10'11. (2.1)
Later, Pearson [41] and Kingston [23] disputed Galton’S assumptions and suggested that,
given the fact that there are 36 possible minutiae locations within a six-ridge block, the

probability of observing a given ridge conﬁguration in one block is actually 1 / 36, instead

of 1 / 2 as proposed by Galton. Therefore, the probability of ﬁngerprint conﬁguration Should

20

be

1 124

— —— =1. ‘41. 2.2
36x256x(36) 09X“) ( )

 

P(Fingerprint configuration) =

After Pearson and Kingston, several subsequent models (e.g., Henry [18], Balthazard [3],
Bose [47], Wentworth and Wilder [50], and Cumrrrins and Midlo [11]) of ﬁngerprint indi-
viduality were proposed based on the number of minutiae in a ﬁngerprint and the proba-
bility (p) of occurrence of a minutia. The probability of a ﬁngerprint conﬁguration with N

minutiae is given by the general formula

P(Fingerprint configuration) = pN, (2.3)

with different values for p used in different studies. For example, Henry estimated p as 1 / 4,
and thus, for a given ﬁngerprint type and a given number of ridges between core and delta,
Henry determined the probability of ﬁngerprint conﬁguration to be pN+2. By contrast,
Wentworth and Wilder [50] chose p to be 1 / 50, and Cummin and Midlo [11] chose p to be
1/31.

The above investigations as well as many other later studies made assumptions that were
not validated on actual ﬁngerprint databases. This is a serious drawback Since estimates
of ﬁngerprint individuality obtained by these studies were also never validated. The ﬁrst
attempt to validate a ﬁngerprint individuality model on an actual database was carried out

by Pankanti et al. [35].

2.3 A Stochastic Model of Fingerprint Individuality

A Signiﬁcant improvement on earlier models of ﬁngerprint individuality was reported by
Pankanti et al. [35]. Since this thesis makes an effort to improve this model, the model by
Pankanti et a1. [35] is presented in this section in detail .

Pankanti et a1. [35] estimated ﬁngerprint individuality via probability of random corre-

spondence (PRC), which is deﬁned as the probability that two different ﬁngerprints from a

21

target population randomly match each other. Suppose the query ﬁngerprint Q has n “effec-
tive” minutiae and the template T has m “effective” minutiae, where “effective” minutiae
indicates minutiae in the overlapping region of two ﬁngerprints after alignment. A more
natural deﬁnition of PRC which is utilized in this thesis is the probability of match when
Q and T have n and m minutiae, respectively, in the whole ﬁngerprint instead of in the
overlapping region. Recall the hypothesis testing scenario of Equation 1.1 for biometric
authentication. When the similarity measure S (Q, T) is above the threshold A, the claimed
identity (IC) is accepted as the true identity It. Based on the statistical hypothesis in Equa-

tion 1.1 in Chapter 1, the PRC is deﬁned as the false acceptance rate, which is
PRCO) = P<S(Q, T) .>_ Alm, n, Ho), (2.4)

where the probability is computed under the assumption that H0 is true.

To estimate PRC, the following assumptions were made:

1. Only minutia ending and bifurcation are considered as salient ﬁngerprint features for
matching. No distinction was made between minutia ending and bifurcation. Other
types of minutiae, such as islands, Spur, crossover, lake, etc., rarely appear and can

be thought of as combination of endings and bifurcations.

2. Minutiae location and direction are uniformly distributed and independent of each

other. Further, minutiae locations can not occur very close to each other.

3. Different minutiae correspondences between Q and T are independent of each other,

and any two correspondences are equally important.
4. All minutiae are assumed true, that is there are no missed or spurious minutiae.
5. Ridge width is unchanged across the whole ﬁngerprint.

6. Alignment between Q and T exists, and can be uniquely determined. To align two
sets of minutiae, corresponding minutiae were ﬁrst determined by a matching al-

gorithm [1]. A rigid transformation consisting of rotation and translation was then

22

determined with a least square approximation between the corresponding minutiae

pairs [35].

2.3.1 Deﬁnition of a Random Minutiae Correspondence

For a query ﬁngerprint Q with n minutiae, Pankanti et a1. [35] estimated the probability of
Q sharing q minutiae out of the m minutiae in template T. Use the same letters Q and T
to denote the minutiae sets in ﬁngerprint Q and template T. These minutiae can be then

expressed as

Q 2 {{SQ,D?}.{S§’,D§3}, ....,{S,?,D.?}} (2.5)

T +— {isTJDlT}, {55.0%}, {35.0%}, (2.6)

where S and D refer to a generic rrrinutia location and direction pair. Assume that the
minutiae in Q have been aligned with minutiae in T. To assess ﬁngerprint individuality, a
random minutiae correspondence between Q and T needs to be deﬁned: a minutia in Q,
(SQ, DQ), is said to match or be in correspondence with a minutia in T, (ST, DT), if for

ﬁxed positive numbers 7‘0 and do, the following inequalities are valid:

 

ISQ — STls S 7‘0 and IDQ — DTld S do, (27)
where
|SQ — STIS a 3/(xQ — J)? + (1,0 — yT) (2.8)
is the Euclidean distance between the minutiae locations SQ E (xQ,yQ) and ST E
(HST, if)
and
|DQ — DTld E min(|DQ — DT|, 2n — IDQ — DT|) (2.9)

is the angular distance between the minutiae directions DQ and DT.
The choice of parameters To and d0 deﬁnes a tolerance region (see Figure 2.1), which

is critical in determining a match according to Equation 2.7. Large (respectively, small)

23

 

 

Sensing _
Plane, S

 

 

 

 

Fingerprint
Area, A

 

 

 

 

I Minutia i

 

 

 

 

Figure 2.1: Identifying the tolerance region for a query minutia.

values of the pair (70,010) will lead to spurious (missed) minutiae matches. Thus, it iS
necessary to select (r0, d0) judiciously so that both kinds of matching errors are minimized.
A discussion on how to select (To, do) is given subsequently.

Parameters (To, do) determine the matching region for a query minutia. In the ideal Situa-
tion, a genuine pair of matching minutiae in the query and template will correspond exactly,
which leads to the choice of (To, do) as (0,0). However, factors such as Skin elasticity and
non-uniform ﬁngertip pressure can cause the minutiae pair that is supposed to perfectly
match, to slightly deviate from one another. To avoid rejecting such pairs as non-matches,

non-zero values of To and do need to be speciﬁed for matching pairs of genuine minutiae.

24

The value of To is determined based on the distribution of the Euclidean distance between
every pair of matched minutiae in the genuine case. To ﬁnd the corresponding pairs of
minutiae, pairs of genuine ﬁngerprints were aligned, and Euclidean distance between each
of the genuine minutiae pairs was then calculated. The value of m was selected so that only
the upper 5% of the genuine matching location distances (corresponding to large values of
7') were rejected. In a Similar fashion, the value of do was determined to be the 95th per-
centile of the distribution of genuine matching angular distances (i.e., the upper 5% of the
genuine matching angular distances were rejected).

To ﬁnd the actual 1'0 and d0, Pankanti et al. [35] used a database of 450 mated ﬁngerprint
pairs from IBM ground truth database [38]. The true minutiae locations in this database
and the minutiae correspondences between each pair of genuine ﬁngerprints in the database
were determined by a ﬁngerprint expert. Using the ground truth correspondences, To and do
were estimated to be 15 and 22.5, respectively. These same values will be used to estimate

the PRC in the experiments presented in this thesis.

2.3.2 Estimation of Fingerprint Individuality by Uniform Model

In the model by Pankanti et al. [35], the similarity measure between the query Q and the
template T, namely S (Q, T), is taken as the number of common minutiae between Q and
T. When S (Q, T) is above a threshold w, the claimed identity (Ic) is accepted as true
identity It. Given Q with n minutiae and T with m minutiae, Pankanti et al. [35] evaluated
ﬁngerprint individuality by measuring the probability of ﬁnding exactly q matches between
them.

Let A be the overlapping area between Q and T, and To and do be the estimated threshold
for the tolerance region. Moreover, deﬁne C = 7T’r‘02. Pankanti et al. [35] claimed that

considering any subset of p minutiae in Q, the probability that all of these p minutiae and

25

only these p minutiae have correspondences in location (in whatever direction) in T is

mC.(m—1)C. .(m—p+l)C
A A—C A—(p—1)C

 

 

(2.10)
A—mC A—(m—1)C. .A—(m—(n—p+1))C

XA—pC'A—(p+1)c A—(n—1)C

 

 

The probability in Equation 2.10 can be derived sequentially as follows. First, let
{S1, Sg, ..., Sp} be locations of the p selected rrrinutiae in Q. Then, the probability that

S 1 is matched with one of the m minutiae in the template T is

m9
A .
Moreover, given that 81 is matched with a minutia in T, the probability that 52 is matched

with one of the remaining m — 1 minutiae in T is

(m — 1)C'

A -— C '
Recall that the second assumption of their model is that minutiae can not be very close to
each other. After introduction of the tolerance region by To, this assumption can be restated
as follows: The minimum distance between any pair of minutia locations in a ﬁngerprint
is larger than 27:0. Therefore, Sg cannot correspond with the same minutiae of T as that of
31. Similarly, given that all of {$1, SQ, ..., Sk_1} ﬁnd their matches in T, the probability

that Sk (k S p) ﬁnds a match in T is

(m—k+1)C
A—(k—1)C'

 

Furthermore, given that all of the p minutiae have found their correspondences in T, the
probability for minutiae 5;, (p < k g 71) not to match with any minutiae in T is (recall that

only these p minutiae have correspondences in T)

A—(mC—(k—p+l))C
A—(k—1)C

 

Combining these steps gives Equation 2.10.

26

Equation 2.10 gives the probability of p matches between Q and T for a given set of

p minutiae, {{Sl, D1}, {S2, D2}, ..., {Sp, Dp}}, in Q. However, in practice, it is more

It

important to ﬁnd the probability for any set of p minutiae in Q. Since there are (p

) ways
to select p minutiae from Q, the probability of ﬁnding exactly p minutiae pairs between Q

and T matched in location is

P(A, C, m, n, p) = (z)

 

 

.mC.(m——1 C. (m—p+1)C
A C mA—(p—1)C

 

 

A—
XA—mC.A—(m—1)C. .A—(m—(n—p+1))C
A—pC A—(p+1)C A—(n—1)C '
(2.11)

A
Equation 2.11 can be further Simpliﬁed. Deﬁne M : C' Since M is large (this is be-

cause C is much smaller compared to the ﬁngerprint area, A), it is realistic to take M as an
integer. Thus Equation 2.11 approaches the probability of a hyper-geometric distribution:
M ._
('3) X ( n-2,”)
(M) '
n

The model introduced above considers only minutiae matches in location, and ignores

 

P(M, m, n, p) = (2.12)

minutiae direction. To account for direction, Pankanti et al. [35] introduced a binomial
model. They assumed that minutiae direction is independent of minutiae location, and
therefore matching minutiae in location and in direction are also independent. They further

assumed that minutiae direction is uniformly distributed in [0, 27r]. Deﬁning
6
1: P009 — 071 5 do) = —0,
it

the probability that there are q pairs of minutiae matched in direction follows a binomial

(2)100 — 09—4.

Based on the hyper-geometric distribution for minutiae match in location and the bino-

distribution:

mial distribution for minutiae match in direction, the probability that there are p matches

in location and q matches in both location and direction (q 5 p) between Q and T is

P(M m n l p) = W X (p)lq(1—l)p_q. (2.13)
, ’ ’ , (if) q

27

Therefore, the probability of q minutiae matches in both location and direction is the sum

of Equation 2.13 over p, i.e.,

P _ min{m,n} (TS) (Nit?) p q p_q
(M,m,n,q) _ Z ——M—-— x I (1—1) . (2.14)
10:9 ( n ) q

2.3.3 Corrected Uniform Model

Pankanti et al. [35] validated their stochastic model on various databases. However, the
model predictions deviated signiﬁcantly from empirical results obtained through an auto-
matic ﬁngerprint matching system [35]. This is mainly because the assumption of uniform
distribution on minutiae location and direction does not hold true in practice. For example,
it is known that ﬁngerprint minutiae tend to form clusters [44], and minutiae only occur
on ﬁngerprint ridges instead of valleys. Therefore minutiae locations are not unifomrly
distributed. Moreover, minutiae in different regions of a ﬁngerprint are observed to be
associated with different region-Speciﬁc minutiae directions. Hence minutiae directions
are neither uniformly distributed nor independent of the location. Furthermore, minutiae
points that are spatially close to each other tend to have Similar directions. These observa-
tions on the distribution of ﬁngerprint minutiae need to be accounted for in eliciting reliable
statistical models for ﬁngerprint individuality.

Pankanti et al. [35] improved their model to better ﬁt the empirical results. First, to
account for non-homogeneity of minutiae location, they redeﬁned the parameter M in their

model as
A
27'00) ,

 

where w is the ridge width. This deﬁnition deviates from their uniform assumption of
minutiae location.

Second, when evaluating the parameter I, Pankanti et al. [35] found that instead of using

_ 2 x 22.5
_ 360

 

= 0.125,

28

as derived based on the uniform distribution, they instead used
I = 0.267,

based on empirical results on real databases which again deviates from their uniform as-

sumption minutiae directions.

2.3.4 Limitations of Corrected Uniform Model

A comparison between model predictions and empirical observations in Figure 2.2 [35]
based on two databases, MSU DB1 and MSU VERIDICOM [35], Showed that the cor-
rected uniform model grossly underestimated the probabilities. In this ﬁgure, there are
two different probability distributions of the number of matched minutiae pairs for each
database, namely the empirical distribution and the theoretical distribution. The empirical
distribution is obtained through an automatic ﬁngerprint matching system (AFMS) [1] and
theoretical distribution is computed from the corrected uniform model when M is taken
as the average value estimated from the database. As seen from the ﬁgures, the empirical
distributions are to the right of their corresponding theoretical distributions which indicates
that the corrected uniform model grossly underestimates the PRCS. This is mainly be-
cause the corrected uniform model didn’t model the minutiae clustering tendency and the
dependence between minutiae locations and directions.

A comparison of PRCS derived from their model with empirical PRCS based on NIST
2002 SD4 reached the same conclusion. For example, for a query ﬁngerprint Q and a
template ﬁngerprint T, each with 52 minutiae, the empirical probability that they share 12
or more minutiae is 3.9 x 10’3, differing greatly from the model estimation of 4.3 x 10—8
(See Table 7.11 based on the estimates from NIST 2000 SD 4 [30]).

The inherent limitation of the corrected uniform model motivated the research presented
in this thesis. The thesis develops statistical models that are Si gniﬁcant improvements over

the model by Pankanti et a]. [35].

29

 

-l- Empirical

 

 

 

 

 

 

 

 

I \
0.25 - . ’ ‘ ,
’1’ ‘3 , TheoretIcal
I I
0.2 - 1' ‘I
l 1
2? II I.
§ 0.15 31' '
‘8 .’
E I
0.11
0.05 - ,
o 4' I " ‘L J I ‘
0 5 10 15 20
Number of Matching Minutiae (q)
(a)
0.3- 1*, ' ' ' ' 1.. '
fr ‘3 —1— EmpIrIcal
r i, ----- Theoretical
025r ,’ I
f 1
I“ I“
0.2 "J l.
> I ‘
:1: , i
F 015 1
‘3 ‘1
6'. I
0.1 ‘,
\
K
0.05- \x 1
o 1 l “‘1"... r I
0 2 4 s a 10 12
Number of Matching Minutiae (q)

Figure 2.2: Comparison of experimental and theoretical probabilities for the number of
matching minutiae: (a) MSU DBI database, and (b) MSU VERIDICOM database. Figure

is the reproduction of Figure 9 in Pankanti et al. [35]
30

2.4 Contributions of the Thesis

The uncertainty involved in assessing ﬁngerprint individuality can be quantiﬁed as the
probability of ﬁnding a ﬁngerprint in a target population having minutiae Similar to that
of a given query ﬁngerprint. This probability is also known as the probability of random
correspondence (PRC). To compute this probability, ﬁngerprint samples from a target pop-
ulation are collected ﬁrst. Then the variability of the minutiae from various ﬁngerprints
Should be analyzed. After that, a notion of similarity between a pair of ﬁngerprints and the
probability that two different individuals share a set of Similar minutiae should be deﬁned.
In this thesis, it is assumed that a sample of prints is available from a target population
and a notion of similarity is given, and thus it does not address the issues and challenges
involved in sampling from a target population, which is worthy of separate investigations.
Instead, this thesis demonstrates how the proposed methodology can be used to obtain es-
timates of ﬁngerprint individuality based on the database that is assumed to be available.
If the available database is representative of the target population, the proposed estimates
will generalize to the target population. Figure 2.3 Shows the structure of the thesis.

To address the issue of individuality, candidate models must meet two important require-
ments: (i) ﬂexibility, i.e., the elicited model can represent the minutiae distribution from
a variety of ﬁngerprints in the target population, and (ii) computational efﬁciency, i.e., as-
sociated measures of ﬁngerprint individuality can be easily obtained from the model. In
practice, a forensic expert uses many ﬁngerprint features, such as minutiae location and di-
rection, ﬁngerprint class, inter-ridge distance, to match a pair of ﬁngerprints. In this thesis,
we only use minutiae locations and directions for Simplicity. Although Pankanti et a1. [35]
provided a stochastic model based on the same set of features (as discussed in Section
2.3), their model cannot satisfactorily represent minutiae Variability because the uniformity
assumption Of minutiae location and direction disagrees with observations from empirical
studies.

Empirical studies suggest that minutiae tend to cluster together, and minutiae close to

31

 

 

Assessment of Fingerprint Individuality

 

 

/

 

 

Inter-class Variability Model

 

 

L/

\

 

Whole Area Model

Overlapping Area Model

 

 

 

l

Mixture Model (C3)

 

 

 

 

 

 

 

l

 

 

Mixture Model (03)

 

 

 

\

 

 

Inter 8. lntra-class
Variability Model

 

 

I

 

 

Partial Print Model (C6)

 

 

 

PRC Poisson Model (C4)

 

 

PRC Poisson Model (C4)

L

Mixture Model (C3)

 

 

 

 

 

 

 

 

 

 

 

 

Hyper-mixture Model (05)

 

 

 

 

 

Comparison with
Uniform Model (C7)

 

 

 

Local Perturbation Model (06)

 

 

Comparison with
Corrected Uniform Model
by Pankanti et al. and
Empirical Estimation (C7)

 

 

 

 

 

Comparison with Uniform
Model and Empirical
Estimation (C7)

 

 

 

 

Figure 2.3: Outline of the thesis contributions where the labels Cks indicate the chapters
where the corresponding contribution is made.

32

each other Share Similar orientations (in Similar or almost opposite directions), which, in
turn, implies that minutiae locations and directions are dependent. To account for these
minutiae characteristics, a family of mixture models is proposed to represent the observed
distribution of minutiae (both location and direction) in Chapter 3. A mathematical model
for the PRC is derived in Chapter 4 and an approximation formula is derived to improve
computational efﬁciency. The approximate PRC follows a Poisson distribution, and thus
the corresponding model is called Poisson model.

The PRCS obtained in this thesis and those from Pankanti et al. [35] are estimated in
different scenarios, which makes it difﬁcult to compare these two models. To carry out
this model-comparison, an overlapping area model was also developed, which adopts the
deﬁnition of PRC proposed by Pankanti et al. [35], and which still estimates the PRC with
the mixture models developed in this thesis.

Estimation of ﬁngerprint individuality for a target population involves averaging over all
pairwise impostor ﬁngerprints in a sample database. For a reasonably large database, to
calculate PRCS for all impostor pairs is infeasible. To solve this problem, a hyper-mixture
model is developed in Chapter 5, assuming that the target population is composed of clus-
ters of mixture distributions. The model conveniently estimates the PRC by the clusters
without dealing with the individual mixture distributions, which greatly improves the com-
putational efﬁciency. Finally, experimental results that compare hyper-mixture model and
the uniform model are given in Chapter 7.

Between the two sources of minutiae variability, namely interclass and intraclass vari-
abilities, the intraclass variability has not been thoroughly investigated by previous research
on ﬁngerprint individuality. While the mixture and hyper-mixture models still only deal
with minutiae interclass variability, a compound stochastic model is developed in Chap-
ter 6 to account for both interclass variability and intraclass variability. In particular, we
address the two most important types of intraclass variability: (i) variability due to local

perturbations arising from non-linear distortions in multiple impressions of the same ﬁnger,

33

and (ii) variability due to partial prints in multiple acquisitions of the same ﬁnger. There-
fore, compared with the mixture model and hyper-mixture model, the compound stochastic

model iS a more realistic model, accounting for both interclass and intraclass variability.

2.5 Summary

Fingerprint individuality studies uniqueness of ﬁngerprints which is the premise for ﬁn-
gerprint evidence. This problem has not been extensively studied and as a consequence,
ﬁngerprint evidence has been challenged in multiple court cases in the last two decades.
Although there is some existing research on ﬁngerprint individuality, almost all early re-
searchers developed their models based on assumptions that were not validated on actual
databases.

The ﬁrst study validating model assumptions on an actual database was presented by
Pankanti et al. [35]. In their model, ﬁngerprint individuality was assessed by probability of
random correspondence, which is the false acceptance rate of ﬁngerprint recognition. Their
model made two simplifying assumptions, (i) the uniformity of minutiae locations and
directions and, (ii) independence between minutiae location and direction. Pankanti et al.
[35] did modify their hyper-geometric model to account for the non-uniformity of minutiae
locations and directions. However, fundamentally, this model is still based on the uniform
assumptions. Furthermore, it does not consider the dependence between minutiae location
and direction. AS a result, experimental results showed that the hyper-geometric model
grossly under-estimates ﬁngerprint individuality even after the empirical modiﬁcation (see
ﬁgure 2.2).

Section 2.4 summarizes the contributions by this thesis.The proposed model gives a bet-

ter estimate of the ﬁngerprint individuality as supported by experimental results.

34

CHAPTER 3

Mixture Models for Fingerprint

Minutiae

In this chapter, a mixture model is proposed to model the minutiae variability of a ﬁnger-
print. For each ﬁngerprint, this mixture model captures the minutiae clustering tendencies
in different ﬁngerprint regions and the dependence between minutiae locations and direc-

tions.

3.1 Features for Fingerprint Individuality Model

Minutiae are utilized as the features for ﬁngerprint matching by forensic experts and most
automatic ﬁngerprint matching systems. In this research, only the two dominant types,
minutia bifurcation and minutia ending are considered. Minutia bifurcation and ending
are not distinguished since it is often not easy to distinguish between them by automatic
systems and they can convert between each other under noisy environment during the cap-
turing procedure. Each minutia is characterized by its location and direction. Subsequently,
the term minutiae features will be used to refer to the location and direction of a rrrinutia in

a ﬁngerprint impression.

35

3.2 Mixture Models

Let S denote a generic random minutiae location and D denote its corresponding direc-
tion. Let A Q R2 denote the subset of the plane representing the ﬁngerprint domain.
It follows that S E A and D E [0, 27r). Further, denote the total number of minutiae
in the ﬁngerprint region by k. A joint distribution model for the k minutiae features
{ (Sj, Dj), j = 1, 2, . . . k} is proposed to account for (i) clustering tendencies (i.e., non-
uniforrnity) of minutiae, and (ii) dependence between minutiae location (8]) and direction
(Dj) in different regions of A.

The joint distribution model proposed is a mixture model consisting of G components.
Let cj, cj 6 {1,2, . . .,G}, be the cluster label of the j-th minutia, j = 1,2, . . .,k. The
labels cj’s are identically and independently distributed according to a multinomial distri-

bution with G classes, and the probabilities of the G classes (i.e., 71, 72, . . . ,TG) satisfy
C?

Tj 2 0 and Z Tj = 1. Given label cj = 9, we assume the minutiae location Sj is
3:1

distributed according to the density

f§9(3|#g, Z:g)=</)2(Slligazg)i (3-1)

where (02 is the bivariate Gaussian density with mean pg and covariance matrix 29. Equa-
tion 3.1 states that the minutiae locations arising from the g-th cluster follow a two-
dimensional Gaussian distribution with mean p9 and covariance matrix 29.

The Von-Mises distribution [28] is a typical distribution used to model angular random
variables, and we adopt it to model minutiae directions. We elicit the distribution Dj given

cj = g to be the density
ngWII/gifig, Pg) =pgv(0)-I{0 3 0 < 7r}+ (1 -—pg)v(l9—7r)-I{7r g 6 < 2w}, (3.2)

where [{A} is the indicator function of A (i.e., [{A} = 1 if A is true; and [{A} = 0,

otherwise), and 11(6) is the Von-Mises distribution given by

12(6) :—: v( 6 | ug, mg) 2 exp{rzg c052(6 — I/g)}, (3.3)

_L.
10059)

36

with 100:9) deﬁned as

27r
10(Isg) = [O exp{I~cg cos (9 — ug)} dd. (3.4)

In Equation 3.3, V9 and reg represent the mean angle and the precision (inverse of the
variance) of the Von-Mises distribution, respectively. Figure 3.1 plots two density functions
associated with Von-Mises distributions with common means l/g but with two different
precisions rag < leg. The ﬁgure shows that I/g represents the “center” (or modal value)
while my controls the degree of Spread around the center (thus, the density with precision
K; has higher concentration around Vg). The density f? in Equation 3.2 can be interpreted
in the following way: The ridge ﬂow orientation, w, is assumed to follow the Von-Mises
distribution in Equation 3.3 with mean ug and precision reg. Subsequently, minutiae arising
from the g-th component have directions that are either 02 or w + it with probabilities pg
and 1 — pg, respectively.

Combining the distributions of the minutiae location (S) and direction (D), it follows

that each (S, D) is distributed according to the mixture density

G
me I 90) = Z rgfgseiug, 2:9) 15(611/9. mg), (3.5)
g=1

where f; (-) and ng (-) are deﬁned in Equations 3.] and 3.2, respectively.

In Equation 3.5, OG denotes all the unknown parameters in the mixture model which
includes the total number of mixture components, G, the mixture probabilities 79, g =
1,2, . . . ,G, the component means and covariance matrices of f9 ’5 given by “G 5
{p1, p2, ...., pg} and 2G E {21, 22, ..., 20}, the component mean angles and precisions
of f9 ’3 given by V0 2 {V1,1/2, . . . ,V0} and Isa E {831,832, . . . ,I‘Eg}, and the mixing
probabilities PG E {p1, p2, . . . ,pG}. The model described by Equation 3.5 has three ad-
vantages: (i) it allows for clustering tendencies in minutiae locations and directions via
G different clusters, (ii) it incorporates dependence between minutiae location and direc-

tion since if S J- is known to come from the g-th component, the direction Dj also comes

37

 

 

 

 

 

 

6 (radians)

Figure 3.1: Probability distribution plots of the Von-Mises distribution with center Vg =
37r / 4, and with two different precisions, rig and 5;, with I19 < 10;. The values of 12(0) at 0
and 7r are equal to each other due to the cyclical nature of the cosine function.

38

from the g-th component, and (iii) it is ﬂexible because it allows for two sub-components
with almost opposite minutiae directions in each component which is the novel part of this
mixture model.

The mixture density given in Equation 3.5 is deﬁned on the entire plane R2, and is not

restricted to the ﬁngerprint domain A. The constrained mixture model on the ﬁngerprint

 

domain is
3,6 9
fseA 0:0 f(s,6 | BC) d6ds
If the ﬁngerprint area A is large, it follows that A a: R2 and,
fA(Sa 9 I 90) % f(8, 9 | 90) (3.7)

because the denominator in (3.6)

f f: f( (S meg) )d6ds~ (3.8)
6A 6-

To estimate the unknown parameters in the model, an algorithm is developed based on the
expectation maximization (EM) algorithm [14, 31]. The optimal number of components,
0*, is selected using the Bayes Information Criteria (BIC). The BIC [43] has been widely
used in various model selection problems, and has the property that it selects a model that
is most parsimonious (with the least number of model parameters). Details of the EM
algorithm and the BIC are given in the next section.

The extension of mixture models to minutiae direction is a novel contribution. In each
component in the mixture model, as shown in Equation 3.5, ng (6 | 119, Kg, pg) is, according
to Equation 3.2, a sum of two weighted Von-mises distributions for minutiae direction. The
mean angles of the two Von-Mises distributions in Equation 3.2, i.e., 12(6) and 12(6 — ii),
are different by it, capturing two sub-groups of minutiae in each component with almost
opposite minutiae directions. This is motivated by the fact that neighboring minutiae tend

to have Similar orientations, i.e., with similar or almost opposite directions.

39

3.3 EM algorithm for Estimating 90

The Expectation Maximization (EM) algorithm [14] is a well-known iterative method for
ﬁnding the maximum likelihood estimate of parameters either in the presence of missing
data or when the model can be Simpliﬁed by adding latent variables. In such cases, the
original (or observed) likelihood can be obtained by marginalizing a complete likelihood

over the missing or latent variables. The EM algorithm consists of two main steps:

1) the E-Step, where the expectation of the logarithm of the complete likelihood is ob-
tained conditional on the observed data and parameter estimates at the current itera-

tion step, and

2) the M-step, where a maximization is performed to update the parameter estimates for

the subsequent iteration step.

The E- and M-Steps are cycled until the parameter estimates converge. For a more detailed
introduction of the EM algorithm, refer to [31].

The k minutiae features (Sj, Dj), j = 1, 2, . . . , k, are assumed to be independent of each
other and distributed according to the mixture density in Equation 3.5. In this case, the
missing component for the EM algorithm consists of the class labels, cj, j = 1, 2, . . . , k,

corresponding to each of the 10 features. The transformation

D- ifD- E [0 7r)
. = .7 .7 ’
02] { Dj — 7r if Dj 6 [7r, 271') (3'9)

converts the minutiae directions into orientations which take values in [0, 7r). The corre-

sponding distribution for each (Sj, 02]) then becomes

C

2: Ta fﬂsj My, 29) - ng(wj|I/g, rig), (3.10)
9:1

where ng (02]- | l/g, reg ) is aS given in Equation 3.3. Note that the expression in Equation

3.10 is now in the standard form for mixture models (see, for example, Section 2.7 of [31]),

40

and can be solved using general formulas for the E- and M-Steps. In this case, the E- and

M-steps can be combined into a single updating equation for each parameter linking the

current estimates to subsequent ones. The estimate of T9 at the (n + 1)-th iteration, 5n n+1),
is given by
+1)_1 k
9(7).
__ g 22;?“ ), (3.11)
j=1
where
z)? E P(cj = g | 3,312,, 9(3)) (3.12)

is the posterior probability that the j -th observation is from the g-th class, conditioned on

(n).

Sj, wj and the parameter estimates at the n-th iteration OG

 

The estimates of pg and 29 at the (n + 1)-th iteration, :57” n+1) and 29241), respectively,
are given by the equations
k
(71)
Z Z92 SJ
(’1)
Z zgj
j=1
and k
(n) (n+1) (n+1) T
Z Zgj (SJ _ #9 )(SJ _ g )
(n+1) __ 2:1
29 — k ( ) , (3.14)
TI
2 zgj
j=1

where 25”.) is as deﬁned in Equation 3.12.
We next proceed to estimate the parameters V9 and 21:9 at the (n + 1)-th iteration. The
component of the complete log-likelihood function (after the E-Step of the (n + 1)-th iter-

ation) involving only the parameters V9 and Hg is given by

k k
(n) n 2
E 29 gj log {f90(u2 II/g,- 239)} = E Zéj) {Kg cos2(u2j — Vg)+10g {10(Kg)}}.

i=1 i=1

 

41

Differentiating with respect to V9 and setting the derivative to zero, the estimate of V9 at

the (n + 1)-th step, l/gH-l), satisﬁes the equation
k < ) 1 +1)
2 29’]? 3111200]- — 129" ) = 0, (3.16)
i=1

which can be solved to give the closed form solution

 

k
1 Z 2;?) Sin 2023-
ujn“) = 5 tan—’1 3‘: (3.17)
(n) ,
Z 29]. cos 202]
i=1

Substituting Equation 3.17 in Equation 3.15, differentiating with respect to leg and setting

the derivative to zero, we note that the (n + 1)-th step estimate of mg, 213‘“), satisﬁes the

 

equation
k
n (n+1)
I (n+1) Z Zéj) “’5sz “ V9 )

W — - (- )

I0(“39 ) 2 2(11)

92'
121

The numerical method outlined in [19] is then used to compute 223‘“) from Equation 3.18.
The cluster label for observation (Sj, Dj) at the (n + 1)-th step, 091+”, is determined as

(n+1) _ (n)
cj — arg maxg zgj . (3.19)

and the estimate of pg is obtained as

k
Itcjn“) = g, D,- e 10.70}
1

1231“) = J: k . (3.20)
j=1

 

The E— and M-stepS are repeated till the parameter estimates converge. To ﬁnd the op-

timal number of clusters (0*), the EM algorithm was ﬁrst implemented to estimate the

42

model parameters for different values of G, and the BIC criteria

k
310(0) = 2 1: 2 log f(SJ-, D,- | (96) — legl log(k), (3.21)
j=1

is used to select 0*, where |Ogl is the cardinality of 90. i.e., the number of unknown
parameters in 90, and f is the mixture density as deﬁned in Equation 3.5. For the databases
used in this thesis, G was chosen to be less than or equal to 5. Based on the number of
minutiae typically observed in the database used here, choosing a larger value of C may
lead to model over-ﬁtting. The value of 0* is selected as the value of G’ that maximizes
BIC (C).

Figure 3.2 illustrates the ﬁt of the mixture model to two different ﬁngerprint images from
the NIST 2000 SD 4 . Observed minutiae locations (white boxes) and directions (white
lines) are Shown in panels (a) and (b). Panels (c) and ((1), respectively, give the cluster
assignment for each rrrinutia feature in (a) and (b). The cluster label of the jth minutiae
(53-, Dj) is estimated according to Equation 3.19 after the EM algorithm has converged.
Panels (a) and (b) in Figure 3.3 Shows the BIC values for different values of C. When
G = 3 (or G = 2), BIC is maximum for ﬁngerprint in Figure 3.3 (a) (or b). Figures 3.4
(a) and (b) plot the minutiae features in the 3-D (S, D) space for easy visualization of the
clusters (in both location and direction). The BIC criteria yields 0* to be 3 and 2 for panels
(a) and (b), respectively. Minutiae from the same cluster are labeled with the same Shape
and number.

Another way to Show the effectiveness of the ﬁt of the models to the observed data is to
Simulate a minutiae realization from the ﬁtted models. Figures 3.5 (a) and (b) Show two
ﬁngerprints whose minutiae features were ﬁtted with the mixture distribution in Equation
3.6. Figures (0) and ((1) Show a simulated realization when both S and D are assumed
to have the mixture distributions ﬁtted to (a) and (b), respectively. Figures 3.5 (e) and (1)
Show a simulated realization when both S and D are assumed to be uniformly distributed

and independent of each other. Note that there is a good agreement, in the distributional

43

 

(C) ((1)
Figure 3.2: Assessing the ﬁt of the mixture models to minutiae location and direction:
Observed minutiae locations (white boxes) and directions (white lines) are shown in panels
(a) and (b) for two different ﬁngerprints from the NIST 2000 SD 4. Panels (c) and ((1),
respectively, show the cluster labels for each minutia in (a) and (b).

 

-1600 I I I

-1610

-1620

-1630

-1640

BK)

-1650

-1660

-1670

-1680

 

 

 

 

-1690 J ‘ LT
1 2 3 4 5
Number of Components G

(a)

 

‘1490 I I I

-1495

-1500

-1505

-1510

BK:

-1515

-1520

-1525

-1530

 

 

 

I-

l

p-

 

-1535
1 2 3 4 5

Number of Components G

(b)

Figure 3.3: Assessing the ﬁt of the mixture models to minutiae location and direction
observed for ﬁngerprint images (a) and (b) in Figure 3.2. Panels (a) and (b) Show the BICS
when 1 S G S 5.

45

sense, between the observed minutiae locations and directions [Figures 3.5 (a) and (b)] and
those Simulated from the proposed models [Figures 3.5 (c) and (d)], but no such agreement

exists for the uniform model. [Figures 3.5 (e) and (1)]

3.4 Goodness-of-ﬁt Tests for the Mixture Models

To test the goodness-of-ﬁt of the mixture models to the observed minutiae, the following

null and alternative hypotheses have been considered:
H0 : fT(s, 6) = f5(s, 6 | Og) for some G and 90, versus H1 : not H0, (3.22)

where fT(s, 6) is the true distribution of minutiae location and direction. For a ﬁngerprint
with k minutiae, the above goodness-of-ﬁt test can be carried out by partitioning the (S, D)
, Space into W2 x V non-overlapping blocks, which means the Space S is partitioned into W
equal-size rows by W equal-Size columns and the Space D is partitioned into 12 equal-size

blocks, and computing

= observed number of (S j, Dj)’s that fall in the (1121., we, 12)-th block, and

0(wr1wC1v)

= k . P((S, D) e (wr, we: v)-th block I éG*)

600731001”)

= expected number of (S, D)’s that fall in the (112,», we, 12)-th block under the ﬁtted

mixture model.

The two tests discussed below require ewr,wc,v to be large for each block (10,-,1120, 12).
Let nwnwcm denote the number of minutiae in block (112T, 1126, 12). For the tests to be valid,
the expected frequency for all the blocks Should be at least 5 [45]. Consequently, the total
number of minutiae in the ﬁnger Should be at least 5 times the number of blocks. Therefore,
a threshold 7 = 5 was selected, so that blocks with ewnwcm < T for either the mixture
models or the uniform model are combined with neighboring blocks that have ewnwcw
greater than or equal to r. The set of blocks resulting from this merger is denoted by 8.

Two non-parametric test statistics are considered:

46

(i) the Freeman-Tukey statistic [24] given by

1 2
Z {Ow/r,wc,v + (Ownwcm + 1)1/2 — (4 X awnwcm + 1)1/2}21 (3-23)

(wr,wc,12)EB

and

(ii) the Chi-square statistic given by

2
Z — 6
(OwTIwCIv wr,w2iv) . (3.24)

ewriwcsv

 

(wr,wc,v)€3
Both the Freeman-Tukey and the Chi-square statistics have asymptotic chi-square distri-
butions (corresponding to the total number of minutiae being large) with |B| — 1 degrees
of freedom under H0, where |B| is the total number of blocks in B. The chi-square distri-
bution can be used to obtain a p-value to either accept or reject H0. Small (respectively,
large) p-values, typically below (respectively, above) 0.05, lead to rejection (respectively,
acceptance) of H0, which in the case of Equation 3.22, leads to a conclusion that a mixture
model is inadequate (respectively, adequate) as a model for minutiae.

To perform the good-neSS-of-ﬁt test, the parameters W and V are taken to be W = 10
and V = 4, resulting in W2V = 400 blocks. However, Since many of these blocks
contain less than or equal to 5 minutiae, the merging procedure discussed earlier results
in a smaller number of blocks. For example, the ﬁngerprint image in Figure 3.5 (a)
gives |B| = 8 blocks, with observed and expected frequencies of (3,9, 5, 7,8, 8, 10, 7)
and (5.1, 5.8, 6.3, 6.9, 9.0, 6.5, 10.9, 6.6), respectively. The Freeman-T‘ukey and Chi-square
tests give p-values of 0.88 and 0.84, respectively, based on a chi-square distribution with 7
degrees of freedom, resulting in the acceptance of H0.

In order to compare the adequateness of the mixture and uniform models as candidate
models on minutiae, it is necessary to perform the goodness-of-ﬁt test for the uniform
model as well. If a larger number of Ho’s are rejected for the uniform model compared
to the mixture model, it can be concluded that the mixture model is more adequate for

representing the distribution of minutiae. To obtain the goodness-of-ﬁt test for the uniform

47

model, we simply substitute f A(8, 6 | Og) in Equation 3.22 by the uniform distribution
1/ (27rA), where A is the ﬁngerprint area. The p-values for the Freeman-Tukey and Chi-
square tests for the uniform models were calculated in the same way as for the mixture
models. The results of p—values can then be used to decide either in favor, or against, H0.
For the ﬁngerprint image in Figure 3.5 (a), the expected frequencies under the uniform
model are (14.6, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 12.9). The p-values for the Freeman-Tukey and
Chi-square tests are 2 x 10"4 and 1.2 x 10’4, leading to the rejection of the uniform
model. Results of the model ﬁt on several ﬁngerprint databases are given in Section 7.2
based on which we can conclude that the mixture model is a far superior model to describe

the distribution of minutiae compared to the uniform model.

3.5 Summary

In order to model minutiae distribution, a G-component mixture model was developed in
this chapter. The model takes into account clustering tendency of minutiae and dependence
between minutiae location and direction. For each of the G components, minutiae location
was modeled by a bivariate Gaussian distribution, and minutiae direction was modeled
by a mixture of two Von-Mises distributions. To compare the mixture model with the
uniform model, goodness-of-ﬁt tests based on Freeman-Tukey test and Chi-square test were
performed for both models. The test results Showed the superiority of the mixture model

over the uniform model.

48

uoIreruauo

 

uoIterueuo

 

(b)

Figure 3.4: The clusters in 3-D space for ﬁngerprint images in Figure 3.2 (a—b) are Shown
in panels (a) and (b) with :13, y, 2 as the row, column, and the direction of the minutiae.

49

 

(e) (0
Figure 3.5: Minutiae locations and directions simulated from the proposed model ((c) and
(d)), and from the uniform distribution ((e) and (f)) for two different images ((a) and (b)).
The true minutiae locations and directions are marked in (a) and (b).

50

CHAPTER 4

Fingerprint Individuality for a Pair of

Fingerprints

4.1 Assumptions for Estimating PRC

To estimate ﬁngerprint individuality, a similarity measure between a pair of ﬁngerprints is
required. In this thesis, the Similarity measure, S (Q, T), is deﬁned as the number of minu-
tiae matches between a query ﬁngerprint Q and a template ﬁngerprint T. Thus, estimation
of PRC is equivalent to ﬁnding the probability distribution of the number of matched minu-
tiae pairs for every impostor (Q, T) pair from the target population. This estimation is
achieved by a newly developed mathematical model introduced in this chapter.

Suppose the query, Q has n minutiae and template T has m minutiae. Let (St-Q , D29), 1' =
1,2, . . . ,n, and (5}, DJT), j = 1,2, . . . ,m, be the minutiae in query ﬁngerprint Q and
template ﬁngerprint T, respectively. A ﬁngerprint recognition system accepts or rejects the
query ﬁngerprint Q based on a threshold 122, which is a positive integer. The PRC is the

probability of obtaining 112 or more matched minutiae pairs between Q and T and can be

51

expressed as

PRC/”(WWW = P(S(Q,T) 2 w l ",m, 1c # It)

-_—_ Z P(S(Q,T) =1'ln,m,Ic 75 It). (4-1)

wgigmin{n,m}
The right-hand side of Equation 4.1, i.e., the probability distribution of S (Q, T), will be
carefully investigated in this chapter.
Query and template minutiae are assumed to be independently distributed according to

the following mixture densities:
fQ(SQ, 09) = f(SQ. DQ I 9Q) (4.2)

and

fT(5T, DT) = f(5T, DT I 9T)- (4-3)

In this chapter, a match is deﬁned in the same way as used by Pankanti et al. [35] (Equa-
tion 2.7), and it depends on the two parameters ro and do. Using the method to estimate To
and do introduced in Chapter 2, the values of T0 and do are found to be 15 and 22.5, based
on the ground truth database [3 8], respectively. These values will be used in the subsequent
experiments to estimate the PRC.

As in corrected uniform model by Pankanti et al. [35], minutiae in Q are taken to be at
least a distance of 2ro apart from each other. The spatial region within a distance ro of S?

is deﬁned by

 

818?; m) = {(2:3) : 3/<:c — x9)? + (y — 32,913 To}, (4.4)

where1=1,2,...,n.

It follows from our assumption that the sets B0329; T0),i = 1, 2, 3, ..., m are non-
overlapping. A Similar condition is imposed on the template minutiae set. Subsequently,
the sets B(SJT; To) for j = 1, 2, . . . ,m, are also pairwise non-overlapping. It follows

from this assumption that there can be at most one match for each query minutiae point

52

(5?, D?) If (SI-Q, D?) matches with (SJT, D?) for some j, then (Sf, Dz) cannot match

with the other minutiae points of Q.

4.2 Model for Estimating Fingerprint Individuality

An analytical model for the PRC is obtained in this section. Initially, let the query minutiae

set, (8?, DZQ), 1' = 1, 2, . . . ,n, be ﬁxed. Deﬁne
11,: P{|ST — Sin g m and |DT — Din 3 do} (4.5)

to be the probability that a random minutiae from T, (S T, DT), distributed according to

Equation 4.3, is matched with (SI-Q , D?) Similarly, let
0, = P{|ST — SJT| g m and IDT — 0}] 3 do} (4.6)

denote the probability that (ST, DT) is matched with (3?, DE). The dependence of u,-
and vj on Q and T via the mixture distribution is implicit and subsequently suppressed.
We ﬁrst compute the probability that there is exactly one match between Q and T, with-

out loss of generality, between (S Q, B?) and (Sir , Dill"). This probability is given by

(l—iui\ (l—iui—vg\ {l—iui—vg—v3\

1—111 1—111—122 1—111—122—123

1 l l . m2 1 2
{l—gui—;Uj\j

 

 

 

 

 

 

 

 

 

 

 

m .
1 — 11.1 - Z ’Uj
A 2:2 A
(4.7)
The ﬁrst term in Equation 4.7, namely 111, corresponds to the event that there iS a match
between the minutiae points (5? , D?) and (S C111, D?) The second term is the probability

that (811,03) does not match with any of the other minutiae (8?, Big), 1' 2 2, given

53

that there is already a match for (SQ, D?) Given (SIT,D:1F) and (S;, Dg), the third
template minutiae (ST, D?) can be positioned anywhere in the region outside B(SQ, D?)
and B (ST, D31); B (ST, D; ) has to be considered as well, due to the imposed condition
that the template minutiae should not be close to each other. The requirement that (S31:w , B?)

should not match with any other query minutiae (Si-Q , D29), 1' 2 2, gives

77.
1—211, —122
i=1

(4.8)
1 - 111 - 122

 

as the required probability, which is the third term in Equation 4.7. Proceeding in this way,
the last term in Equation 4.7 is the probability that (8,15, D35) does not match any of the
previous template minutiae (Sf, Dir), j 2 2, due to the same condition that the template
minutiae should not be close to each other, or any of the query minutiae points. 1

Carrying this argument for w minutiae matches, the probability of obtaining matches
between (SRDIQ) and (ST,D;F) for l = 1,2, . . . ,w, and no matches between all re-
maining minutiae is Shown in Equation 4.9 and is denoted as g({(SQ, DZQ) : l =
1,2,---w},{(ST,DEF) :l=1,2,~-- ,w}.

g({(SQ,D,Q):I—_—1,2,.--w},{(S}’",Dg’") :l~—=1,2,~- ,w})

”2 ”3 ”112
_— ,_ _— 4.
u1x(1 1)x(1 1 2)x x 1 (9)

v

(1)

11) {l—iui—vw+1\ {1

121
X w XH-X

 

 

 

{1_i
\1—gui) \1_;ui—Uw+l) K1

-E 1

.=w+1

 

 

 

 

 

 

 

 

71
~12?)
i=1
w m-l
~52
1:1 j=w+1 ”1.)!

v

(2)

where the term (1) denotes the probability that minutiae {(SQ, DIQ) : l =

54

3,4, - - ~112}, {(SlT, DIT) : l = 3, 4, - .. ,w} are corresponded and term (2) denotes the
probability that there are no other matches except the p matches between Q and T.

Equation 4.9 is derived assuming that matches occurred between the ﬁrst 112 minutiae of
the query and template, and no other matches were found for the remaining minutiae. It is
the ﬁrst step towards estimation of probability of exactly 112 matches between Q and T. In
general, the match between Q and T can happen between any 112 minutiae from Q and T
respectively. Let {SIQ , D1622} be the 112 minutiae from Q that are matched to rrrinutiae in T,
where

IQ ={(z’1,1'2,---,1'w):151'1 <12 < <1w 311}.
Let {8}}, D51} be the 112 minutiae from T that are matched to minutiae in {SIQ ,ngz},
where
IT = {(21,22, - -- .212) = 1 s j11j21"'jw s m}-
Denote the minutiae in T that fail to match with any minutiae in Q as {8%, D26}, where

l0 —_— {jw+11jw+2a’” 1.7m: 1$j11j2a"'jw S m}.

The probability that minutiae {53211922} are matched to minutiae {S17}, D51}, and no

other minutiae are matched, is

21115? 21%)}, {(53}, 0,1911)

U" u‘ u.
= 111 x 22 x 23 x x 2‘”
1 — 11,1 1— uil — “(1.12 w—l

 

 

 

 

 

55

n n
l—Zui\ (l—Eui_vjw+l\
z:

 

 

 

 

 

 

 

X 1:] X 112 X
1 — u- 1 — 11- — 12-
,; 1k) K I; 2k Jw+1}
{ n m—l \
1 — Z: 11, — Z vjk
X i=1 k:=nw+l . (4.10)
(1 - Z “I Z ”ka
l=1k=w+l

In order to calculate the probability of obtaining exactly 112 matches, all possible 112 indices
out of the total m from the template have to be considered for matching with the ﬁrst 112
minutiae ofthe query. This can be done in m(m- 1)(m——2) - - - (m—w+1) = ml/(m—w)!
ways. Furthermore, 112 query minutiae can be selected for matching m (If, ) ways. Taking

into account the above facts, the probability of obtaining exactly w matches is given by

Z f({(S,QQ,D M1111 05.11) (4.11)

1511<12<-~<iwgn,
lsj13j21".tjwsm

In summary, the probability of exactly 112 matches between Q and T can be calculated as

follows,

(1) For each sequence of w minutiae in Q and T, 51% and S11}, calculate the probability

that minutiae 31% is matched with minutiae S5 from Equation 4.10.

(2) Calculate the sum of all the m! / (m — 112)! probabilities calculated in step (1) and
obtain the PRC by Equation 4.11. Thus the probability of exactly 112 matches between
Q and T is achieved.

4.3 Difﬁculties in Estimating Fingerprint Individuality

There are several difﬁculties in calculating the PRCS for a given ﬁngerprint database ac-

cording to Equation 4.11. The main challenge is that it involves a sum over all possible

56

subset of 112 distinct indices from {1, 2, . . . ,m}. Even for moderate values of w and m
(such as the values considered in Section 7.1), the number of terms in the summation, i.e.,
(3)), is very large. For example, when m = 26 and w = 12, the value of (3,) is 9, 657, 700.
Thus, any method that involves Simulation for computing the entire summation (or, an
estimate of the summation using, for example, bootstrap samples) becomes infeasible in
terms of computational time. Another challenge is that the above summation needs to be
computed for every pair of impostor ﬁngerprint images in the given database to estimate
ﬁngerprint individuality of a target population. For example, the NIST database used in
this thesis have, respectively, 3, 998, 000 pairs of impostor ﬁngerprint images, making any
simulation-based method both infeasible and impractical. In the next section, we Show that
Equation 4.11 can be approximated by a Poisson distribution which drastically simpliﬁes
estimation of ﬁngerprint individuality. The Poisson model solves the problems mentioned
above with regard to the summation over different 1' and j indices. The Poisson model
simpliﬁes the estimation of ﬁngerprint individuality for a pair (Q, T). In chapter 5, we
consider a population/database from which the pairs (Q, T) are generated. There are prob-
lems with computations in this scenario, too, and for this reason, the hyper-mixture models

are developed.

4.4 Poisson Model

In this section, we Show that Equation 4.11 can be approximated by a Poisson distribution
which drastically Simpliﬁes assessment of ﬁngerprint individuality and therefore solves the
main challenge discussed above.

Recall that 111, deﬁned in Equation 4.5, is the probability that (59,12?) matches
(ST , DT) . Let

pier) = Eel) = P(IST — SQI s m and IDT — DQI 3 do) (4.12)
denote the probability of a match when (S Q, DQ) and (ST, DT) are random pair of minu-

57

tiae from Equations 4.2 and 4.3. For a set of n minutiae in query Q sampled from Equation

4.2, and a set of m minutiae in query T sampled from Equation 4.3,

A(Q, T) E m np(Q, T) (4.13)

is the expected value of the number of matches between Q and T. Similarly, when both Q

and T arise from Equation 4.3, let
1/(Q, T) = m n E(121) (4.14)

be the expected value of the number of matches between them, where 121 is deﬁned in
Equation 4.6. The term (2) in Equation 4.9 is approximately e—A for large m and n. This
is derived as follows. When m and n are large (m x n > 100), the largest summation

involving the vj’s in Equation 4.9 (term (2)) is

m—l 1 m—l I! V
12- = m- — 12- szv =m(———)=:-. 4.15
3:. m2, (.1 nmn<>
J=w+1 J=w+1
m—l
. V .
Since — 13 close to zero for large 11, each of the terms 12w+1, 12w+1 + 12w+2, . . . , Z 123'

n
j=w+1

is also close to zero by Equation 4.15. Moreover, note that the difference
112 / n n A
I‘Z“1‘\1*Z“i T: the}; (4.16)
2:1 2:1 2=w+1

w
using an argument Similar to that in Equation 4.15. Furthermore, Since 2 u,- z 0 for large
i=1
m and n, we get the following two equations:

” 1 n ,\ A
l—Zui=1—n(;Zuz-)zl—n(n—1;L-)=1—T—n-, (4.17)
1:1 2:1
and

111
1—2 a,- z 1. (4.18)
121

58

In order to calculate (term (2)) in Equation 4.9, ﬁrst calculate the logarithm of ((2)) as

 

 

 

 

 

 

 

follows:
71 w
(m - w) 10230 - Z: 141) - (m - w) 105(1 - Z 14)
i=1 1:1
('1')
B B \3
m—l ( Z vj\ m—l K E v]
+ 2 log 1 — ———]=w+nl — 2 log 1 — ———J:wtul
B=w+1 1 _ Z “'2' B=w+1 1_ Z “2'
1 1 1 1
(17)
(II) can be Simpliﬁed as
m—l B m—l B
Z Z a, Z Z a,-
(II) = B=w+1j=w+1 _ B=w+1j=w+1
TI. 11)
1 -— Z 11, 1 — Z 11,-
i=1 i=1

m—l B

A/m
(T-T/ai Z Z ”1'

B=w+1 j=w+1
< C i (m K) (4.19)
m n
for some constant C for large m and 11 using equations 4.15, 4.16 and 4.17. From the last

line in Equation 4.19, it follows that (H) goes to zero as m and 11. go to to inﬁnity. Thus,

log(B) = (m — 112) log(1 —— Z 21,-) — (m — 112)log(1—- Z 11,-)
i=1 i=1
2 —(m — 112) Z 11,-
i=w+l

as m and 11 go to to inﬁnity. Therefore (term 2) in (4.9) can be approximated by
exp{—A(Q, T)} (4.20)

59

when both m and n are large, and when the number of matches 11) is moderate (i.e., 112/m
and 112/n are not too close to either 0 or 1). In applications, where m x n > 100 and
m x n x p(Q,T) < 10, m and n are considered sufﬁciently large and the number of
matches are considered as moderate [39].

Each term in the denominator of (term (1)) in equation 4.9 is close to 1 Since 11,- :e 0 for

each 1' = 1, 2, . . . ,112 — 1. Thus, equation 4.9 can be written approximately as
w
I] utexpt—A(Q.T)}- (4.21)
121

Similar to Equation 4.21, starting with Equation 4.10 (instead of Equation 4.9), the follow-

ing holds:
Q Q w

MS, ,DlQ», {(8311, 0,711)» = H at, exp{-A(Q,T)}- (4.22)
Applying Equation 4.22 to Equation 4.11, the probability of obtaining exactly 112 matches
is given by

m! w ,
11<12<m<1w k=1

where the summation in (4.23) is over all 112 distinct indices 1'1 < 12 < .. . < 1w from
{1,2,...,n}.

Equation 4.23 can be Simpliﬁed when n is large. Note that
1 w
(7.— Z [1 “1k e (E(ut))“’, (4.24)
w 2‘1<1'2<---<1'w k=1
where E (111) iS deﬁned as in Equation 4.12, because ui’s are independent and identically
distributed. Substituting Equation 4.24 in Equation 4.23 with E (111) = A(Q, T) / (mm)

from Equation 4.13, the probability of exactly 112 matches is

p(wiQ,T) a —""——,(”) (Myewt—MQTH

(m — 112). 112 mn

e—A(Q,T) A(QI T)w

112!

(4.25)

 

22

60

for large m and n and moderate 112, which corresponds to a Poisson distribution with mean

A(Q, T)-

4.5 Justiﬁcation of the Poisson Model

Equation 4.29 corresponds to a Poisson probability mass function with mean A (as deﬁned
in Equation 4.13), where A is the expected number of matches from the total number of
mn possible pairings between 71 minutiae in Q and m minutiae in T, with the probability
of each match being p(Q, T). Using a Poisson distribution to approximate the number
of matched minutiae pairs, which follows a binomial distribution, is valid because of the
following three properties. (i) In ﬁngerprint matching, a “success” is deﬁned as a minutia
match and the number of trials, mn, is large (_>_ 100) [39]. This can be conﬁrmed by
Figures 7.2 and 7.3 in Chapter 7, which Show that m and n are much greater than 10 for
most ﬁngerprints. (ii) The probability of a success, p(Q, T), is small. (iii) The number
of impostor matches between Q and T is moderate. The properties (ii) and (iii) can be
conﬁrmed by Table 7.7 where the right-most column Shows that, for all three databases,
the empirical value of the number of matched minutiae pairs between two ﬁngerprints is

always less than 10. In summary, it is appropriate to use Poisson model in Equation 4.29.

4.6 Overlapping Area Model: Comparison of Mixture

Model with Corrected Uniform Model

During ﬁngerprint matching, an overlapping area is formed after alignment between minu-
tiae regions of Q and T (see Figure 4.1). Assume that no out of the n minutiae in Q,
and mo out of the m minutiae in T are within the overlapping area. The PRC proposed
by Pankanti et al. [35] was based on the parameters within the overlapping area, namely

no and mo, whereas the PRC in this thesis is based on n and m of the entire ﬁngerprint.

61

Because of this difference, the PRCS from the corrected uniform model and those from
the mixture model on the entire ﬁngerprint area cannot be compared. For comparing with
the corrected uniform model, an overlapping area is developed, adopting the parameter no
and mo as used by Pankanti et al. [35], but still estimating the PRC based on the minutiae

density from the mixture models.

4.6.1 Determination of no, mo and the Overlapping Area

The ﬁrst step of overlapping area model is to ﬁnd no, mo and the overlapping area between
Q and T. To ﬁnd the overlapping area, the model has to ﬁrst determine the entire minutiae
region for both Q and T, which is deﬁned as the minimal ellipse that encompasses all the
minutiae in the ﬁngerprint, as follows. A convex hull encompassing all minutiae locations
is ﬁrst determined (see Figure 4.2, where minutiae locations are labeled as squares). Then,
an ellipse (denoted by a dashed line in Figure 4.2) is obtained by the direct least square
ﬁtting method [17]. Since some of the minutiae fall outside the dashed ellipse, the size
of the ellipse is increased until it encloses all the minutiae. The resulting ellipse (i.e., the
minimal ellipse) is denoted by a solid line in Figure 4.2. After determining the minutiae
regions, Q and T are aligned with a Procrustes transformation [29]. This completes the

ﬁrst step of the model construction, namely, ﬁnding no, mo and the overlapping area.

4.6.2 Adaptation of Mixture Model to Overlapping Area Model

The second step of the overlapping area model is to estimate the PRC adopting no and mo
as used by Pankanti et al. [35], and employing the minutiae density of the mixture model
truncated to the overlapping area. Assume that the minutiae densities of ﬁngerprints Q and
T are fQ(S, D) and fT(S, D) as in Equations 4.2 and 4.3, respectively. The alignment
of minutiae region Q with respect to minutiae region T, which is obtained from the Pro-
crustes transformation, is denoted as TR(Q, T). Further assume that after alignment, the

overlapping area is A(Q, T). It is obvious the minutiae density from ﬁngerprint Q in the

62

Template. T

Overlapping
Region

 

Figure 4.1: Overlapping area of two ﬁngerprints during matching

overlapping area is

fQ1TR'1(QiT)(S,D)) X M
from) fQ(TR‘1(Q, T)(S, 12)) x lJldeD'

 

(4.26)

where TR‘1 (Q, T)(S, D) is the transformation to obtain the original placement of query
Q without its alignment with T at which placement the minutiae density of Q is estimated
and |J| is absolute value of the Jacobian determinate. The minutiae density from T in the

overlapping area is
fT(Si D)
fA(Q,T) fT(S, D)deD

Let (SQ, DQ) be a random selected minutia from Q in the overlapping area, and let

 

(4.27)

(ST, DT) be a random selected minutiae from T in the overlapping area. Furthermore,

let the probability of a random match in the overlapping area be

P0(Q,T) = P(ISQ - 8T1 3 re and IDQ — 071 3 doing, m0)- (428)

63

   

(a)
Figure 4.2: Convex hull of minutiae and best ﬁtting ellipses. (a) The minutiae from image
(b) and the best ﬁtting ellipse to the minutiae set. p1, p2, c, 6 are the major axis, minor
axis, center, and orientation of the ellipse. (b) Fingerprint image with minutiae and the best
ﬁtting ellipse

Applying the Poisson model, the probability of obtaining exactly 112 minutiae matches is

(MW) Ao(Q, le

p0(w; QIT) z w, , (4.29)

Where A0(Q, T) = n0 X m0 >< 100(Q,T)-

4.7 Summary

A mathematical model was developed to estimate PRCS, given a pair of ﬁngerprints Q
and T with n and m minutiae, respectively. The minutiae density was estimated from
the mixture models developed in Chapter 3. Although the model calculates PRCS in a
closed form, its estimation is time-consuming for practical applications. Hence a Poisson
approximation was derived to improve the computational efﬁciency. The Poisson model
simpliﬁes the estimation of ﬁngerprint individuality for a pair (Q, T).

To compare PRCS from the mixture model with those from the corrected uniform model,

64

an overlapping area model was developed, adopting the deﬁnition of PRC proposed by
Pankanti et al. [35], while still assuming that the minutiae density is estimated from the
proposed mixture model. In chapter 5, we consider a population/database from which the
pairs (Q,T) are generated. There are problems with computations in this scenario, too, and

for this reason, the hyperrrrixture models are developed.

65

CHAPTER 5

Assessment of Fingerprint Individuality:

Target Population

There are two challenges involved in obtaining an assessment of ﬁngerprint individuality.
The ﬁrst challenge was resolved after Poisson model was developed in the last chapter. The
second challenge for the assessment of ﬁngerprint individuality is that computation for ev-
ery pair of impostor ﬁngerprint images in the given database is required to estimate ﬁnger-
print individuality of a target population. For example, the NIST 2000 SD4 database used
in this thesis have, respectively, 3, 998,000 pairs of impostor ﬁngerprint images, making
any simulation-based method both infeasible and impractical. The hyper-mixture density

model developed in this chapter, on the other hand, is meant to solve the second challenge.

5.1 The Hyper-mixture Density Model

Assume in a target population there are N * unknown different minutiae distribution groups

with class densities H1, H2, ..., H N* and corresponding proportions 7r1, 7r2, ..., ”N* (where
Nil!
71,- 2 0 for 1' :2 1, 2, ..., N *, and Zn,- 2 1). Thus, ﬁngerprints in different groups have

121
different distributions (H i’s), whereas those within the same group have Similar minutiae

66

distributions. Therefore, this hyper-mixture model is capable of capturing different minu—
tiae distributions in different ﬁngers in the population. Note that this assumption is needed
Since it is well-known that ﬁngerprints belong to ﬁve different classes (i.e., right-loop, left-
loop, whorl, arch and tented arch), and therefore are likely to have different class-Speciﬁc
minutiae distributions. Thus, using only one common minutiae distribution may smooth
out different distributions in the ﬁngerprint classes. Moreover, PRCS depend heavily on the
composition of each target population and different target populations may have different
composition of the ﬁngerprint classes [18]. For example, the proportion of occurrence of
the right-loop, left-loop, whorl, arch and tented arch classes of ﬁngerprints is estimated
to be 31.7%, 33.8%, 27.9%, 3.7% and 2.9%, respectively, in the general population [27].
Thus, PRCS computed for ﬁngerprints from the general population will be inﬂuenced more
by the mixture models ﬁtted to the right-loop, left-loop and whorl classes, than to the arch
and tented arch classes. In effect, the composition of target population needs to be Studied,
which is the goal of the hyper-mixture model. Besides the possible uneven proportions,
more important is the fact that the class proportions might change across different target
populations (for example, if the target population has an equal number of ﬁngerprints in
each class, or with class proportions different from the ones given above), which will lead
to change of the PRCS. With a hyper-mixture model comprising of N * clusters of minutiae
distributions, the methodology of obtaining PRCS for a pair of ﬁngerprints can be extended
to any target population.

To formally obtain the composition of a target population, an agglomerative hierarchi-
cal clustering procedure [21] was adopted on the space of all ﬁtted mixture models. The
dissimilarity measure between the estimated mixture densities f and g is taken to be the

Hellinger distance [25]

H(f,g)=/ / (3/f(s,9)—\/g(s,9))2drrd0- (5.1)
568 6€[0,27r)

The reasons for using Hellinger distance, H (f, g), is that it is a number bounded between

0 and 2, with H ( f, g) = 0 (respectively, H (f, g) = 2) if and only if f = g (respectively,

67

f and 9 have disjoint support). Thus, we avoid distance measures that are arbitrarily large
and therefore can focus on thresholds of clustering in [0,2] only.

For a database of F ﬁngers, a total of F(F — 1) / 2 Hellinger distances were obtained
corresponding to the F (F — 1) / 2 mixture pairs. The agglomerative hierarchical clustering
methodology with Ward’s method [22] gives a dendrogram that can be cut at an appropriate
level to form N clusters of mixture densities, 01,02, . . . ,CN. Note that N = 1 when
A = 2, and N increases to F (F — 1) / 2 as A decreases to 0. When the number of clusters is

N, the within cluster dissimilarity is deﬁned as

1
WN = 2:] 317.1 P(Ct), (5.2)
where
17(01): Z H(f.g) (5.3)
f,g€Ci

is the sum of all distances H (f, g) for f and g in 0,, and |C,-| is the number of mixture
densities in Ci. Note that as N increases to F, WN decreases to 0. To choose the optimal
number of clusters, the “Gap Statistic” [48] is applied as follows: Let G N = |WN — WN_1|
denote the absolute difference between the within cluster dissimilarities WN_1 and WN.
N * is selected as the number of clusters if the values of G N for N > N * are insigniﬁcant
(close to 0) compared to the value of GN*. Figure 5.1 shows the plot of G N against N for
NIST 2000 SD 4. G N doesn’t change Signiﬁcantly when the number of clusters is more
than 33. Hence N * iS chosen as 33. For now, N * is chosen by visual inspection of Figure
5.1; we tend to prefer larger N * values so as not to under-represent the interclass variability
of the population.

Once the number of clusters N * has been determined, the mean mixture density for each

cluster C,- is determined as

113.6) = 33.-I 2 213,6), (5.4)

68

 

 

 

 

 

35 . r i .
30' .
25' l
20. .1
Z
(9
15. .1
10~ -
5' .1
M
o l 1 r
0 10 20 33 40 50

Number of Clusters, N

Figure 5.1: Determination of the number of clusters for NIST 2000 SD 4. The number of
clusters estimated is 33.

69

where f (s, 6) is the mixture distribution from Equation 3.5. The weight 7r,- of cluster C,- is

(5.5)

At this point, all the parameters in the model have been estimated. Since the minutiae
distribution of each cluster is a mixture (average) of the densities of the mixture models
that are ﬁtted to each individual ﬁnger, the model is appropriately called a “hyper-mixture

model”.

5.2 Relationship between Clusters from Hyper-mixture
Model and Fingerprint Classes

Fingerprint clusters in the hyper-mixture model are determined by minutiae distribution,
whereas the deﬁnition of the ﬁngerprint classes is based on global ridge pattern. Thus the
relationship between clusters and ﬁngerprint classes is worthy of investigation. Finger-
prints in a cluster can belong to different classes. If the clusters and classes are totally
uncorrelated, every one of the three classes should be evenly distributed among all the 33
clusters. On the other hand, if a certain ﬁngerprint class tends to concentrate into some
clusters, it can be concluded that there is a strong correlation between clusters and classes.
Figure 5.2 shows the composition of the three major ﬁngerprint classes (i.e., the number
of ﬁngerprints for each class, loop, whorl, arch, where loop includes left loop and right
loop, arch includes arch and tented arch) for each of the 33 clusters in the NIST 2000 SD
4. The wide spread on the vertical axis indicates that the classes are not evenly distributed
among the 33 clusters (otherwise the plot should be approximately a ﬂat line). More quan-
titatively, for each ﬁngerprint class, after ranking the clusters according to the number of
ﬁngerprints in the class, the total number of ﬁngerprints from the top 16 out of the 33 clus-

ters were counted. If the clusters are evenly distributed, the total number should be close to

70

 

Fingerprint Classes of Clusters from Hyper-mixture Model

 

 

 

 

 

 

120’?

w .

g1oo,

§ 3

3,80 .‘r

c . l,

Eeo : "

- 1‘

E A lb. i'.‘ '
o r 1‘ rut '

1:40 r e D L
E ., l

3 .

z .

 

 

 

AI .
20" blu- l’ "d. r' .\
l A-l ' I‘l I \
\' " :“Au. ‘ '.
0 1 ‘W“““ ‘ ‘ “ A A A A A A A E
5

10 15 20 25 30 35

Cluster Number
-i- Loop --t--Whor1 -O-Arch

 

 

 

Figure 5.2: The number of ﬁngerprints in the three main classes for the 33 clusters from
the hyper-mixture models on NIST 2000 SD 4. For each cluster label 2' as shown in the
horizontal axis, the vertical coordinate of each point shows the number of ﬁngers in loop
(labeled with dots), whorl (labeled with triangles) and arch (labeled with squares).

16/33 z 50%. However, the top 16 whorl clusters consist of 97% of all whorl ﬁngerprints;
for the top 16 loop clusters and the top 16 arch clusters, the percentages are 73% and 80%,
respectively. Furthermore, the whorl clusters are so clustered that 69% of the whorl ﬁnger—
prints belong to the top 5 whorl clusters, and those 5 clusters have very few ﬁngerprints
from the other two classes, namely loop and arch. We conclude there is a strong correlation

between clusters of hyper-mixture models and the three main ﬁngerprint classes.

71

5.3 Assessment of Fingerprint Individuality for a Target
Population

Given the N * cluster densities (H1, H2, ..., H N*) and cluster weights {n1,7r2, ...7rN*},
ﬁngerprint individuality of the target population can be calculated as follows.

The mean parameter A( Q, T) in Equation 4.13 depends on Q and T via the mean mixture
densities of the clusters from which Q and T are taken. If Q and T belong to clusters C,- and
C}, respectively, then the mean mixture densities of C,- and Cj can be used in place of the
original mixture densities in Equation 4.12, i.e., A(Q, T) E A(Ci, 0]). Let p*(w; Ci, 0])
denote the Poisson probability

_ . . ,\
p*(w; 02363) =e A(CZ’CJ) (5.6)

11)!
For a ﬁngerprint database consisting of N * different clusters of distributions, the most rep-
resentative value for the probability of a random correspondence is reported as the estimate
of ﬁngerprint individuality for this database. There are a total of N *(N * — 1) possible
impostor pairs of ﬁngerprint images (Q, T), where Q and T come from different clusters.
Let T0 ={(z’,j):1§z’§ N* andl g j g N*,z’ 7é j}. The average PRC corresponding
to w minutiae matches is given by

__ 203356511) Wiﬁjp*(’w; Cucj)

PRC = ,
2 W1
(i,J')€To

(5.7)

 

where p*(w; Q, T) is deﬁned in Equation 4.29. Note that p*(w; Q, T) is symmetric in
Q and T, and thus it is sufﬁcient to consider only the N*(N* — 1) / 2 distinct impostor
pairs instead of the total N *(N * — 1). Each of the probabilities, p*(w; Q,T), is very
small, e.g., 10‘6 or 10‘7. Thus, the average PRC in Equation 5.7 is highly affected by
the largest of these probabilities, and is, therefore, not reliable as an estimate of typical
PRCS arising from the impostor pairs. A better measure would be to consider an average

of the trimmed probabilities. Let (1 denote the percentage of p*(w; Q, T) to be trimmed,

72

and let p* (w; a / 2) and p* (w; 1 — a/2), respectively, denote the lower and upper 100a/2-th
percentiles of these probabilities. Deﬁne the set of all trimmed p*(w ; 0,, 03-) probabilities
as T E {(z',j) : p*(w;oz/2) S p*(w; Ci,CJ-) S p*(w;1— a/2)}. Then, the a-trimmed

mean PRC is
Z(i,j)€T 7r,-7rj P*(w; CiaCj)
2 7n 7rj
(i,j)eT

The above discussion is general and holds true for any distribution of the query and

PRCa = (5.8)

 

template minutiae. In particular, when the distribution on the minutiae (both location and
direction) are chosen to be uniform as in the model by Pankanti et al. [35], the following

expression for A(Q, T) is obtained:

/\U(Q,T) = man pp, (5.9)

where p L (respectively, p D) is the probability that SQ and ST (respectively, DQ and DT)
will match. The probability of a location-and-direction match appears as the product p L
and p D since the minutiae location and direction are distributed independently of each

other.

5.4 Summary

A hyper-mixture model was proposed to cluster the mixture models of all the ﬁngers in the
sample database into clusters so that ﬁngerprints in the same cluster have similar distribu-
tions. The PRCs for the target population can be calculated by the weighted average of
PRC5 for the clusters from the hyper-mixture models. Study on the clusters in the hyper-
mixture model showed that ﬁngerprints of the same class are not uniformly distributed in
the clusters of hyper-mixture models and ﬁngerprints of the same class tend to belong to

the same cluster.

73

CHAPTER 6

Assessing Fingerprint Individuality:

Compound Stochastic Models

6.1 Motivation

There are two sources of ﬁngerprint variability in matching, namely interclass variability
and intraclass variability. Chapter 5 addressed interclass variability, i.e., minutiae variabil-
ity in different ﬁngers in target population. While still taking minutiae locations and di-
rections as the salient features, this chapter focuses on modeling both intraclass variability
and interclass variability. The previous studies discussed in Chapter 2 ( [3], [11], [16], [18],
[23], [41], [47], [50] ) did not model intraclass variability. Though the corrected uniform
model by Pankanti et al. [35] estimated parameter I from empirical genuine matching, their
model didn’t study the intraclass variability intensively. In this chapter, compound stochas—
tic models are developed to account for three sources of minutiae variability, namely, (i)
variability in minutiae distributions in different ﬁngers, (ii) variability due to local perturba-
tions arising from non-linear distortion effects in multiple impressions of the same ﬁnger,
and (iii) variability due to partial prints in multiple acquisitions of the same ﬁnger. The

three sources of variability mentioned here account for most of the variability in minutiae

74

 

Database of F fingers with L impressions per finger

D

Minutiae extraction and combination of L impressions to
obtain a master for each finger

JZL

Compound Stochastic Model i) finite mixture model for minutiae

 

 

 

 

 

 

 

 

 

 

for multiple sources of ii) local perturbation model

 

 

minutiae variability iii) sub-region sampling model

J

Validation of compound stochastic models

ll

Estimation of fingerprint individuality

 

 

 

 

 

 

 

 

 

Figure 6.1: Flow chart for constructing compound stochastic model and assessing ﬁnger-
print individuality

distributions. Since a compound stochastic model involves both interclass and intraclass
variability, it is a more realistic model compared to the hyper-mixture model introduced in
Chapter 5 which only addressed the ﬁrst variability mentioned above. The ﬂow chart in

Figure 6.1 gives the steps involved in constructing the compound stochastic model.

6.2 Compound Stochastic Model

6.2.1 Construction of Master Minutiae Set

Suppose a ﬁngerprint database consists of prints of F different ﬁngers with L impressions
per ﬁnger. Let .7: ( f, 1) denote the l-th impression of the f-th ﬁnger. As a ﬁrst step to-
wards constructing the compound stochastic model for ﬁnger f, the minutiae in all the L

impressions of a ﬁnger, .7-"( f, l),l = 1, 2, . . . ,L, are combined to obtain a “master” set. A

75

reference impression for each ﬁnger f, without loss of generality, f ( f, 1), say, the ﬁrst
impression, is chosen as follows: Since the quality and the sensed area corresponding to
different impressions of the same ﬁnger are typically different, the reference impression
is chosen as the one that has relatively good quality computed according to [8] and maxi-
mizes, on the average, the number of minutiae matches with all other impressions of that
ﬁnger.

Once the reference impression is determined, all other impressions are aligned to it via
a Procrustes transformation [29] based on the correspondences from the matcher reported
in Section 6.4.1. Thus, for each I = 2, 3, . . . ,L, .7-‘(f, l) is aligned to .7-"(f,1), using the
rigid transformation T ( f, l), and correspondences between minutiae in .7: ( f, l) and .77 ( f, l)
are found. The correspondence between the minutiae sets was achieved by both auto-
matic ﬁngerprint matching and manual veriﬁcation as described in section 6.4.1. When
a minutia in .7: ( f, I) does not have any corresponding minutiae in f ( f, 1), that minutiae
is appended to the list of minutiae in .7: ( f, 1). The consolidation of minutiae into the
master set in this way eventually results in a total of n consolidated minutiae in .7: ( f, 1)
with correspondence sets M k, k = 1, 2, . . . ,n. The elements in each Mk are denoted by
{(Skj, ij), j = 1, 2, . . . ,mk}, where SM and ij are, respectively, the j-th location and

direction of minutiae Is. For each set of correspondences, deﬁne the mean, or the center,

mk
_ 1 _
of Skj as 5;, = — Z Skj. The mean of ij, Dk, is taken to be the phase angle of the
mk j=l
mk
complex number 2 cos(ij) + isin(ij) (see also [28]). The deviations of locations
i=1

and directions from their respective centers for the k-th minutiae are given by
{(Skj‘gk,ij—Dk),j=1,2,-.-,mk}- (6.1)

An illustration of the construction of a master minutiae set is presented in Figure 6.2, in
which multiple impressions of the same ﬁnger (top panel) are aligned to the reference image
(bottom left panel) to obtain the master minutiae set with minutiae centers shown (bottom

right panel).

76

 

 

 

 

 

 

 

 

450 - 4 -
o 100 200 300 400

(b) Reference impression (c) Master

Figure 6.2: Master minutiae set construction. Eight impressions are shown which include
the reference impression (b) and the other seven impressions (a). The number of minutiae
in each impression in the ﬁrst row is 29, 30, 27, 32, 32, 38, 24. The number of minutiae in
the reference impression in the second row is 38. There are 70 minutiae centers kept in the
master minutiae set .

6.2.2 Mixture Model on the Centers: Adaptation of Mixture Models
to Compound Stochastic Models

The ﬁrst two stages of the compound stochastic models consist of developing statistical
models on (i) the centers, and (ii) the deviations of the observed minutiae from their re-
spective centers.

Each minutia center in the master, (5k, Dk), k = 1, 2, . . . ,n, is assumed to be indepen-
dently distributed according to the mixture density deﬁned in Equation 3.5. The parameters

of the mixture model are estimated using the method introduced in Chapter 3. Figure 6.3

77

450 a _50 . s
30‘
400 ' IE 0 ' a
350 ' I- 50 o :9
300 . m u 100’ I oh
250 . h u I u- ‘ 150 ’ “a ‘
'"' r 3 °
200 l n It i 200 ' ‘ it *
150 ' F F 250’ to? ‘7‘ * *
100 "q - ”I? ‘3‘ l
h In 300 . i' it”
so i 350 r, e ‘
0 - I: 400 ' I
l
-50 e L

 

 

 

 

0 100 200 300 400

 

 

 

 

0' r L ‘
0100 200 3004

00 500

(a) (b)

(a) Master (b) Components of mixture models

Figure 6.3: The ability of the mixture model to capture clustering characteristics of the
master in (a). The eight impressions are shown in Figure 6.2. Three cluster components
labeled by circles, squares and asterisk in the mixture model ﬁtted to the minutiae in (a).

shows an example of ﬁtted mixture model to a master minutiae set. Figure 6.3 (a) is the
master minutiae set obtained from eight different ﬁngerprint impressions of a ﬁnger. Fig-
ure 6.3 (b) shows three clusters, labeled with circles, squares and asterisks, obtained by the

mixture model ﬁtting.

6.2.3 Local Perturbation Model

The local perturbation model consists of a probability model to capture ﬁngerprint distor-
tions in different ﬁnger regions. For the local perturbation model, the domain of the master
is ﬁrst divided into a lattice of b0 non-overlapping blocks, 13 = {Bb, b = 1, 2, . . . ,bO }. If

the mean of the k-th minutiae 3,, belongs to 35, then the k-th minutiae is assigned to block

78

Bb.

In block Bb, the location deviations of all the minutiae that fall in 81,, i.e.,
{Skj — 3k : 3k E Bb},

are modeled as a bivariate normal distribution with mean zero and covariance matrix
COVBb. The covariance matrix COVBb allows for ﬂexible modeling of the dominant
directions of distortions via the eigenvalues and eigenvectors of COVBb- In this chapter,

the covariance matrix COVBb is estimated by

mk
6073,) =% Z Z (Skj — 5k) ' (Skj — EDT,
k : SkEBb j=1
where N = 2,615,663,) mk.

The minutiae direction deviations, on the other hand, {ij — Dk : 5'), E Bb}, are
modeled as a Von-Mises distribution with mean zero and precision 53b. The unknown
parameter 1:3,) is estimated from the observed deviations in each block Bb (based on the
estimation procedure given in [28]).

The local perturbation model assumes that the non-linear distortions of different minutiae
within the same block are independent and identically distributed, whereas the distortions
in different blocks can be different. Figure 6.4 (a) shows two minutiae, labeled as 1 and 2, in
the reference image. The locations and directions of minutiae 1 and 2 in seven other aligned
impressions are shown in Figures 6.4 (b) and (c). Note that there are multiple location
and direction values for the same minutiae in different impressions of the same ﬁnger.
The changes in the location and direction values are due to nonlinear distortion introduced
during sensing as the three-dimensional ﬁnger surface is projected onto a two-dimensional
plane. The amount of distortion is different in different regions of the ﬁnger. The distortion
is usually less in regions closer to the center of a ﬁnger, compared to peripheral regions
due to nonuniform pressure of the ﬁnger against the sensor [7]. In the area closer to the

center of the ﬁngerprint image, the pressure is high and the slippage is little and therefore

79

  

”g”

(b) minutiae 1 (c) minutiae 2

   

Figure 6.4: Consolidating minutiae: (a) a partial ﬁngerprint image. (b) and (c) show the
locations and directions of the two labeled minutiae in (a) from eight aligned impressions.

the distortion is small; However, in the outer region, the pressure is small and slippage can
be large which leads to large distortion. Computing the average location (5),) and average

direction (Dk) for each minutia smoothes out the noise.

6.2.4 Modeling the Variability of Partial Prints

The third and the ﬁnal component of the compound stochastic model accounts for minutiae
variability due to partial prints in multiple acquisitions of a ﬁnger. The partial print region
can be determined as the minimal ellipse discussed in Chapter 4.

For the impression .7-"( f, l), the following parameters uniquely determine the minimal
ellipse discussed in section 4.6: the area [A( f, l)], length of major axis [p1( f, l)], the ori-
entation [6( f, l)], and the center [c( f, l)]. In our experiments, the ratio of the lengths of the
major to minor axes of each ellipse (say, 7'0) is ﬁxed. Thus, the effective ellipse parame-
ters reduce to the triplet E(f, l) E {A(f, l), 0(f, l),c(f, 1)}. Denote the collection of all
the ellipse parameters for all the ﬁngerprint impressions in the ﬁngerprint database by 8,
and let T E {T_1( f, l)}, where T( f, l ) is a Procrustes transformation [29] used to align
.7-"( f, l ) to .7-"( f, 1). With these quantities deﬁned, a conditional minutiae synthesis method
can be applied to estimate the ﬁngerprint individuality, which is described in the following

section.

80

6.3 Conditional Minutiae Synthesis

In Chapter 4, an analytical method, namely the Poisson model, based solely on the in-
terclass variability was introduced for the assessment of ﬁngerprint individuality. When
additional sources of minutiae variability are considered, larger number of parameters are
involved in calculating ﬁngerprint individuality requiring a more elegant analytical for-
mula. An alternative synthesis method, namely the conditional minutiae synthesis method,
is developed in this section so that all sources of variability in the stochastic model, i.e.,
both interclass and intraclass variabilities, are considered in the simulation procedure and
the simulated minutiae sets are matched by a matcher to ﬁnd the PRC. This method re-
places the step of ﬁngerprint individuality calculation by matching the simulated minutiae
sets.

A minutiae set is synthesized for a ﬁnger consisting of a pre—speciﬁed number (m) of
minutiae. In order to synthesize this minutiae set, the minimal ellipse that encompasses all
minutiae needs to be simulated ﬁrst. The areas of best-ﬁtting ellipses are found, in general,
to be strongly correlated with the total number of minutiae, i.e., m( f, 1 ), in a ﬁngerprint
impression. As an example, an illustration is given based on FVC 2002 D31 database.
In this database the ellipse area A( f, 1) is positively correlated with m( f, 1) (see Figure
6.5), while the other variables had no signiﬁcant correlation. Consequently, a quadratic
polynomial in m(f,1), i.e., Q0(m(f, 1)), was ﬁtted to the scatter plot of (m(f,1),A(f, 1))
(Figure 6.5). The residuals from the ﬁt were found to follow a normal distribution with
mean u and standard deviation 00, where p, = 4.5 and 00 = 1.4 x 104. As a consequence,
the area of a partial print with a ﬁxed number of minutiae can be simulated.

An illustration of the conditional synthesis technique is given based on the FVC 2002
DB1 database. Panels (a) and (b) in Figure 6.6 give an instance of a ﬁnger f. Panel (a)
shows the constructed master set with m minutiae, and panel (b) shows the corresponding
minimal ellipse. The procedure of minutiae synthesis is as follows: (i) A random ellipse

is generated whose area is a sample from a normal distribution with mean Q0(m) and

81

x104
15

 

10_

Area of Ellipse

LII

 

 

 

 

50 60

Number of Minutiae

Figure 6.5: Scatter plot of the area of ellipse [A( f, l )] versus the total number of minutiae
[m( f, 1 )], and the ﬁtted quadratic polynomial for the FV C 2002 DB] [26].

standard deviation 00, and the orientation and center of the ellipse are randomly selected
from the second and third components of 8 . (ii) a minutia center is then generated from the
mixture model of master f. (iii) A deviation is generated according to the local perturbation
model, and compounded to the generated minutiae center from step (ii). This synthesized
minutiae is retained if its location lies within the ellipse in step (i) and rejected otherwise.
Steps (ii) and (iii) are repeated until m synthesized minutiae fall inside the ellipse. In Figure
6.6, panel (c) shows the m synthesized minutiae centers from the mixture model, whereas
panel ((1) shows the synthesized minutiae after compounding with the local perturbation
model. (iv) Finally, the m minutiae are transformed by a random rigid transformation from
T to form the synthesized impression (Figure 6.6 (e)). During this synthesis procedure,

the inter-minutiae distances in an impression are controlled so that they are no smaller than

82

 

   

 

 

l

.0. l

150 l 1

200 [ i

250 l

300 l i

-50 0 50 100 150 200 250 300 350

(a) (b)
100??? 5

150}
200L
250‘,
300!
350}

 

 

 

. l
150 200 250 300 350 100150 200 250 300 350
(C) (d) (c)

Figure 6.6: Simulating m0 = 36 minutiae for FVC 2002 DB1: (a) Finger impression,
(b) Minutiae and minimal ellipse for the impression, (c) Random ellipse and synthesized
minutiae centers from the mixture model, (d) Synthesized minutiae after compounding with
local perturbations, and (e) Synthesized impression after rigid transformation.

the minimal inter-minutiae distance estimated from the empirical databases. Note from
Figures 6.6 (c-e) that the synthesized ﬁngerprint has similar clustering characteristics and
dependence structure as the original ﬁngerprint (Figure 6.6 (a)). The ﬁngerprint synthesis
procedure described above is able to simulate any number of synthetic impressions of a
ﬁnger (as a minutia set) with a pre-determined number of minutiae (m) while preserving
the clustering and dependence characteristics of the minutiae.

Given a query ﬁngerprint Q with n minutiae and a template T with m minutiae, the prob-

ability that Q and T share exactly to minutiae is needed to estimate ﬁngerprint individuality.

83

This probability is given by the expression

p(wlm, n) = P{S(Q,T) = w I #Q = n, #T = m, IQ 75 IT}, (6.2)

where S (Q, T) is the number of minutiae matches between Q and T as determined by a
matcher.

The conditional synthesis technique, described earlier in this section, is applied to simulate
ﬁngerprint impressions so that each Q (respectively, T) has exactly 71 (respectively, m)
minutiae. Without loss of generality, we assume m = 71. Corresponding to each ﬁnger
f in the database, multiple synthetic impressions are generated based on the conditional

synthesis technique. The resulting synthetic database is denoted as follows,
{J-"*(f,1), 1 = 1, 2, ..., H, f = 1, 2, ..., F},

where .7-'*(f, 1) is the 1-th synthetic impression from ﬁnger f. In order to obtain the dis-
tribution of the number of impostor matches for the synthetic database, the ﬁngerprint
matcher reported in [40] is applied to each pair of impostor ﬁngerprints. A description of
the matcher is described in section 6.4. To compute the probability of 11) matches, namely
p(wlm, n), the number of impostor pairs that resulted in w minutiae matches is counted,
and this number is then divided by the total number of impostor pairs. Thus p(wlm, n) is

given by the following equation

H H F F

:2 Z Z Iw{(frl)r(f’rl,)}

l=11l=1f=1f1=1
f¢ ’

F(F — 1)H2 ’ (6'3)

 

p(wlm, 71) =

where Iw{(f,1), (f’,l’)} is 1 ifS(.F*(f,1),.7-'*(f’,1’)) equals to, and 0 otherwise. Note that
p(wlm, 72) provides an estimate of the probability in Equation 6.2 based on the synthetic
database.

Although the compound stochastic models incorporate more sources of minutiae vari-

ability compared to the hyper-mixture model, they have limited capability to estimate ﬁn-

84

gerprint individuality. While Equation 6.3 gives reliable estimates of ﬁngerprint individu-
ality for small and moderate values of w, the estimate obtained for large to is not reliable.
In the case of large w, the true value of Equation 6.2 is extremely small. As a consequence,
Equation 6.3 gives zero as the estimate of Equation 6.2, due to limitations of numerical

precision.

6.4 Description of Matchers

6.4.1 Matcher for Master Construction

During master construction, in order to ﬁnd correspondence between multiple impressions
of the same ﬁnger, an adaptive elastic string matcher, developed by Jain et al. [1], was
applied. One can reference the original article for details of this matcher. After applying
the matcher, the reported correspondence was manually checked to remove false matched

minutiae pairs and to add true matched minutiae pairs that were not detected by the matcher.

6.4.2 Matcher for Synthesized Minutiae Matching

In order to match the minutiae sets synthesized from the compound stochastic models, a
matcher developed by Ross et al. [40] was used. Unlike the matcher for construction above
which utilizes the ﬁngerprint ridges besides minutiae, this matcher utilizes only minutiae
information. The matcher was implemented as follows. First, two minutiae, one query
minutia and one template minutia were selected to form a reference minutiae pair. form a
reference minutiae pair. Then the two minutiae sets were aligned by translating the query
minutiae set, so that the reference minutiae pair had identical locations. Next, the query
minutiae set was rotated about its reference minutiae, which maximized the number of
minutiae that were paired (i.e., fell within a tolerance window) with the template minu-

tiae set. The above procedure was repeated till all possible reference minutiae pairs were

85

considered, and the maximum number of matched minutiae pairs was reported.

6.5 Summary

A family of compound stochastic models was developed to account for both interclass
variability and intraclass variability of ﬁngerprints. Based on the models, a conditional
minutiae synthesis method was introduced to simulate minutiae sets, which were then com-
pared between simulations of different models (i.e., ﬁngers) with a matcher. Based on the
obtained probability distribution of the number of matched minutiae pairs, ﬁngerprint in-
dividuality was then reported as the probability that the matched minutiae pairs exceeds a

given threshold.

86

CHAPTER 7

Experimental Results

7 .1 Fingerprint Databases

The methodology for assessing the individuality of ﬁngerprints is validated on three target
populations, namely, the NIST 2000 SD 4 [30] (denoted as “NIST” in this chapter), FVC
2002 DB1 (denoted as “DB 1” in this chapter) and FVC 2002 DB2 [26] (denoted as “DB2”
in this chapter) ﬁngerprint databases. All the three databases are publicly available. The
NIST database contains 2,000 8-bit gray scale ﬁngerprint image pairs of size 512-by-512
pixels. Because of the relative large size of the images in the NIST, the ﬁrst image of each
pair was used for statistical modeling. Minutiae could not be automatically extracted from
two images of the NIST due to poor quality. Thus, the total number of ﬁngerprints used in
the experiments for NIST is F = 1, 998. For the FVC 2002, the DB1 impressions (image
size = 388 x 374) are acquired using the “T ouchView H” optical sensor by Identix, while the
DBZ impressions (image size = 296 x 560) are acquired using the “FXZOOO” optical sensor
by Biometrika. Both DB 1 and DB2 databases consist of ﬁngerprints of 100 different ﬁngers
with 8 impressions per ﬁnger. Because of the small size of the DB1 and DB2 databases,
the minutiae consolidation procedure was adopted to obtain a master nrinutiae set for each

ﬁnger. The mixture models were subsequently ﬁtted to each master. Figure 7.1 shows two

87

 

(e) (D

Figure 7.1: Examples of ﬁngerprint images from different databases. Images (a—b) are from
NIST database [30]; Images (c-d) are from D81 and images (e-f) are from DBZ [26] .

88

 

 

Relative Frequency
N
°\°

 

 

 

 

0 50 100 150
Number of Minutiae

Figure 7.2: Empirical distribution of the number of minutiae (m,n) in the NIST database.
The average number of minutiae is 62.

ﬁngerprint images from each of the three databases.

The distribution of the number of minutiae (m, n) for the images in NIST is shown in
Figure 7.2 and those of the master minutiae sets from DB1 and DB2 are given in Figure
7.3, panels (a) and (b), respectively (The distribution of m and the distribution of n are
identical, and hence only one histogram is shown). The average number of minutiae for the
images in NIST and the master minutiae sets in DB1 and DB2 databases are approximately
62 , 63 and 77, respectively.

Based on the three databases, experiments were performed to validate the ﬁngerprint

individuality models developed in this thesis.

89

 

6%

01
o\°

A
o\°

m
o\°

Relative Frequency
(a)
°\°

ree-

 

 

 

 

 

 

 

 

l_

 

20 4o 60 8O ”7’ 100
Number of Minutiae

(a)

 

7%

Relative Frequency
0)
o\°

A
0

°\
I

or
o\°

.rs
o\°

co
o\°

N
o\°

 

O
O

 

I
L
L

 

 

 

 

 

 

 

 

 

50 100 150
Number of Minutiae

(b)

Figure 7.3: Empirical distributions of the number of minutiae (m,n) in the master prints
constructed from (a) DB1 database, and (b) DB2 database. Average number of minutiae in
the master minutiae set for the two databases are 63 and 77, respectively.

90

7 .2 Fitting the Mixture Models

The best ﬁtting mixture model (see Equation 3.5) was found for minutiae in each ﬁngerprint
and the goodness-of-ﬁt test was applied to each database. The results for the goodness
of ﬁt for the mixture model as well as those for the uniform model (see Equation 3.22)
are reported in Tables (7.1- 7.3) with W = 2,4,10,20,50 and V = 4,6,9. For all the
three databases, the numbers of ﬁngerprint images with p—values above (corresponding to
acceptance of H0) and below the threshold 0.05 (corresponding to rejection of H0) were
computed. The results show that the mixture model is generally a better ﬁt to the observed
minutiae distribution compared to the uniform. For example, when W = 10 and V = 4,
the mixture is a good ﬁt to 1,948 out of 1, 998 images from the NIST (corresponding to p-
values above 0.05) based on the Freeman-Tukey test. For the Chi-square test, this number
is 1, 945. In comparison, the uniform model is a good ﬁt to only 360 and 352 images based

on the Freeman-Tukey and Chi-square tests, respectively.

7 .3 Fitting the Hyper-mixture Models

Hyper-mixture model assumes that there are N * clusters in the target population, which
can be estimated based on the gap statistic GN. The gap statistic G N as a function of the
number of clusters for the NIST was shown in Figure 5.1, and the plots for the DB1 and
DB2 databases are shown in Figure 7.4. Based on these ﬁgures, N * for the NIST, DB1 and

DBZ databases are estimated as 9, 12 and 33, respectively.

7.3.1 A Check to See if the Clusters of Mixture are Meaningful

Replicability is an important feature for a meaningful cluster analysis. Ideally, the cluster
analysis is meaningful when re-performing it on a new sample produces similar results as
the original clustering. In reality, if the variation in the clusters is within a reasonable limit,

the cluster analysis can be considered to be reliable. To evaluate reliability of the hyper-

91

 

NIST
Freeman-Tukey Test (a = 0.05)

 

Chi-square Test (a = 0.05)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(W,V) Mixture Uniform Mixture Uniform Average#
Accepted Accepted Accepted Accepted Blocks
(2,4) 1,567 708 1,542 714 6.2
(4,4) 1,914 153 1,909 179 7.2
(10,4) 1,945 352 1,948 360 8.2
(20,4) 1,935 390 1,938 397 8.3
(50,4) 1,940 405 1,938 395 8.3
(2,6) 987 425 911 416 7.0
(4,6) 1,880 135 1,877 148 7.4
(10,6) 1,942 331 1,939 327 8.2
(20,6) 1,937 396 1,937 395 8.3
(50,6) 1,939 403 1,936 392 8.3
(2,9) 981 462 933 456 7.4
(4,9) 1,882 137 1,865 150 7.5
(10,9) 1,944 325 1,945 330 8.2
(20,9) 1,938 393 1,942 396 8.3
(50,9) 1,939 407 1,937 392 8.3

 

Table 7.1: Results of the Freeman-Tukey and Chi-square tests for testing the goodness-of-
ﬁt of the mixture and uniform models on NIST. (W, V) means the whole minutiae location
space S is partitioned into W equal-size rows by W equal-size columns and the minutiae
direction space D is partitioned into V equal-size blocks initially, prior to merging blocks
of insufﬁcient minutiae with their neighboring blocks. Entries correspond to the number
of ﬁngerprints in each database with p—values above 0.05. The total number of mixture
models that are tested in NIST is 1,997 because three out of the 2,000 ﬁngerprints don’t
have enough minutiae (at least 5) for the tests.

92

 

DB1
Chi-square Test (a = 0.05) Freeman-Tukey Test (a = 0.05)

 

 

(W,V) Mixture Uniform Mixture Uniform Average#
Accepted Accepted Accepted Accepted Blocks

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(2,4) 57 14 55 9 4.7
(4,4) 60 2 57 2 4.8
(10,4) 90 0 89 O 4.4
(20,4) 92 O 90 0 4.8
(50,4) 93 O 92 O 4.9
(2,6) 38 10 36 7 5.1
(4,6) 48 1 45 O 5.0
(10,6) 87 0 85 O 4.6
(20,6) 91 O 89 0 4.8
(50,6) 92 O 91 0 4.8
(2,9) 41 14 41 12 5.4
(4,9) 46 0 47 0 5.4
(10,9) 86 0 87 O 4.6
(20,9) 91 O 89 0 4.8
(50,9) 93 O 92 0 4.9

 

Table 7.2: Results of the Freeman-Tukey and Chi-square tests for testing the goodness-of-
ﬁt of the mixture and uniform models on DB 1. (W, V) means the whole minutiae location
space S is partitioned into W equal-size rows by W equal-size columns and the minutiae
direction space D is partitioned into V equal-size blocks initially, prior to merging blocks
of insufﬁcient minutiae with their neighboring blocks. Entries correspond to the number of
ﬁngerprints in each database with p-values above 0.05. The total number of mixture models
that are tested in this database is 100 since all master minutiae sets have total number of
minutiae more than 5.

93

 

DB2
Chi-square Test (a = 0.05) Freeman-Tukey Test (a = 0.05)

 

 

(W,V) Mixture Uniform Mixture Uniform Average#
Accepted Accepted Accepted Accepted Blocks

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(2,4) 47 4 44 8 6.9
(4,4) 67 2 64 5 7.4
(10,4) 92 1 94 2 8.8
(20,4) 94 5 94 5 10.2
(50,4) 95 3 93 4 10.5
(2,6) 37 6 35 6 7.5
(4,6) 57 2 52 2 8.0
(10,6) 95 2 94 2 9.0
(20,6) 96 5 95 5 10.2
(50,6) 95 4 93 4 10.5
(2,9) 27 4 28 6 8.0
(4,9) 37 l 38 1 8.5
(10,9) 94 2 93 3 9.1
(20,9) 95 4 94 5 10.2
(50,9) 95 4 93 4 10.5

 

Table 7.3: Results of the Freeman-Tukey and Chi-square tests for testing the goodness of
ﬁt of the mixture and uniform models on DB2 database. (W, V) means the whole minutiae
location space S is partitioned into lW equal-size rows by W equal-size columns and the
minutiae direction space D is partitioned into V equal-size blocks initially, prior to merging
blocks of insufﬁcient minutiae with their neighboring blocks. Entries correspond to the
number of ﬁngerprints in each database with p-values above 0.05. The total number of
mixture models that are tested in this database is 100 since all master minutiae sets have
total number of minutiae more than 5.

94

 

 

 

 

 

 

 

 

 

 

2.5 e r 4
2 ..
1.5 - -
Z
0
1 -
0.5 ~ -
0 r r r
O 9 20 30 4O 50
Number of Clusters, N
(a) DB1
3 r I ‘I
2.5 ~ ~
2 - .
(DZ 1 5 _
1 ~ -
0.5 ~ N\ «
O ‘ '
0 12 20 30 40
Number of Clusters, N
(b) DB2

Figure 7.4: Estimating the number of clusters for FV C database.The estimated number of
clusters for DB1 and DB2 are 9 and 12, respectively. The horizontal axis shows the number
of clusters N and the vertical axis is the value of gap statistic at N.

95

 

,,,,,

 

aouerdeoov )luea 1991103 10 requrnN

 

 

i i i r
5 10 15 20 25 30
rank

_-
u
.

p—.......uu.~ wuv<..>..v.r.....v......... .a......o- ..... .. .-
. n
u o
n u

600

 

Figure 7.5: Analysis of Hyper-mixture model using the NIST. The ﬁgure shows the cumu-
lative distribution of the ranks for the ﬁngerprints in the validation set. The horizontal axis
shows the rank, and the vertical axis shows the number of ﬁngerprints that have a rank less
than or equal to a given rank.

mixture model, a check [4] was implemented on the NIST database. In this database, there
are two impressions for each ﬁnger. Thus for the total of 2,000 ﬁngers, there are 2,000 pairs
of images. Most of the pairs (1,975 out of the 2,000 pairs) share at least 3 minutiae features
(the ﬁtting process of the mixture models requires at least 3 minutiae features), but 25 pairs
do not. The check was performed only for the minutiae shared by each of the 1,975 pairs.
First, the two databases were partitioned into two sets, namely a development set and
a validation set. Each set has one and only one impression for each ﬁnger. When apply-
ing the hyper-mixture model on the development set, the ﬁngerprints were grouped into
33 clusters. For each ﬁngerprint in the validation set, a Hellinger distance between the
ﬁngerprint and each of the 33 clusters was calculated. A ﬁngerprint is similar to a cluster
if the Hellinger distance between them is small. Ideally, the Hellinger distance between a
ﬁngerprint and a cluster containing its counterpart (i.e., the ﬁngerprint of the same ﬁnger)
should be the smallest, which is given a rank one. However, in reality, it is not always true.

Therefore, a rank k can be given to each ﬁngerprint, which describes that the Hellinger dis-

96

tance between the ﬁngerprint and the cluster containing its counterpart is the kth smallest
among the Hellinger distances for the 33 clusters. The rank k characterizes quantitatively
the reliability of the clustering method, i.e., the smaller the k, the more reliable the cluster-
ing method is. Figure 7.5 shows the cumulative distribution of the ranks for the ﬁngerprints
in the validation set. The horizontal axis shows the rank, and the vertical axis shows the
number of ﬁngerprints that have a rank less than or equal to a given rank. As the curve
shows, 1,564 out of the 1,975 mixture models in the validation set were within rank 5, indi-
cating good reliability of the clustering procedure, that is, the clustering is capturing some

aspects of inter-class variability in the population.

7 .3.2 Evaluation of Hyper-mixture Models on Assessment of Finger-

print Individuality

In order to improve computing efﬁciency, hyper-mixture model was introduced to replace
the individual mixture models with an average density of similar mixture models. Obvi-
ously, this modiﬁcation should not signiﬁcantly change the individuality estimates if the
clusters represent original characteristics of the population. Therefore, it is necessary to
compare the hyper-mixture model with the individual mixture models. Experiments were
carried out on the three databases. The PRCS estimated by the hyper-mixture model (Equa—
tion 5.8) are shown in the ﬁrst row in each block of Table 7.4. The PRCS estimated by the
individual mixture models, on the other hand, are calculated according to Equation 7.1, and

are shown in the middle row in each block of Table 7.4.

— Z: 1,2,...,L, ,2‘ ' P(w; iaj)
PRCNIixture = {(z'J)E{ L X61332; a (7-1)

 

where

_),(i,j) Mid)!” (7,2)

and A(z’, j) is estimated by Equation 4.13 (i.e., Poisson model) for ﬁngers z' and j.

97

 

Database (m, 71,111) Method PRC PRC Ratio
NIST (62,62, 12) Individual Mixture Model 6.1 x 10-3 l
NIST (62,62, 12) Hyper-mixture Moder 4.1 x 10-4 1/14
NIST (62, 62, 12) Random Growing 1.0 x 10-5 1/554
DB1 (63, 63, 12) Individual Mixture Model 2.7 x 10-2 1
331 (63,63,12) Hyper-mixture Model 5.9 x 10-3 1/5
DBl (63,63,12) Random Grouping 5.5 x 10-5 1/491
332 (77,77,12) Individual Mixture Model 3.2 x 10-2 1
332 (77,77,12) Hyper-mixture Model 8.4 x 10-3 1/4
DBZ (77,77, 12) Random Grouping 1.6 x 10’3 1 / 20

 

 

 

 

Table 7.4: Comparison of PRCS estimated from three different methods: (1) Individual
mixture models, (11) Hyper-mixture models, and (111) Random grouping. The right-most
column shows the relative ratios of the PRCS.

Comparing the top two rows in each block of Table 7.4, it appears that the PRCS from the
individual mixture model and those from the hyper-mixture model are similar. However,
without a reference, it is difﬁcult to see how similar they are. That is why a third row has
been included in each block of Table 7.4, which is the PRCs from random grouping, i.e.,
while keeping the same number of clusters and the same number of ﬁngerprints in each
cluster as in the hyper-mixture model, the members in each cluster are randomly selected
from the entire database with no replacement. In each block of Table 7.4, as the right-most
column shows, the PRCS from random grouping (the bottom row) is always at least 20 times
smaller than those from the individual mixture model (the top row); whereas the PRCS from
the hyper-mixture model (the middle row) is at most 14 times smaller than those from the
individual mixture model. Therefore, compared with the PRCS from random grouping,
those from the hyper-mixture model are more similar to those from the individual mixture

models, which gives support to the clustering based on hyper-mixture models.

98

 

Database (m, 11,111) N * Mean Mean A PRCa
Fingerprint area Hyper-Mixture Model

 

NTST (62,62,12) 33 2.5 x 105 2.5 4.1 x 10"4
D31 (63,63,12) 9 1.2 x 105 5.1 5.9 x 10-3
DB2 (77,77,12) 12 1.8 x 105 5.14 8.4 x 10-3

 

Table 7.5: The number of clusters, N *, as well as mean A and PRCa based on the hyper-
mixture models for the three databases.

7.4 Assessment of Fingerprint Individuality with the
Hyper-mixture Models

For the three databases, the agglomerative clustering procedure in Chapter 5 was canied
out for the ﬁtted mixture models to estimate the number of clusters, i.e., N * . The re-
sults are shown in Table 7.5, which also gives the following quantities for each database:
the numbers of minutiae in master minutiae sets (namely m and n), the ﬁngerprint area,
and the parameter A for the mixture model representing the expected mean number of im-
postor matches from the mixture models. The last column in Table 7.5 gives the mean
PRC, PRCQ, corresponding to w = 12 based on the hyper-mixture model (i.e., obtaining
12 or more matches). The parameter a was chosen to be 0.05 to correspond to the 5%
trimmed mean of the probabilities. Note that while the mean values of m and n for the
NIST and DB1 databases are similar, the mean of A for DB1 is much larger than that for
NIST database, resulting in a much larger mean PRC for DE] compared to that for NIST
database. Comparing DB1 and DBZ, the mean A remains the same but the mean value of
minutiae in DB2 is much larger than that in D31 (77 vs. 63). A larger number of total
minutiae implies a greater chance of obtaining a random match and hence a larger value
for the PRC.

A comparison of TWO, (Oz 2 0.05) was carried out for two different choices of A for

the Poisson model: (i) the A that was derived from the cluster of the mixture models (see

99

 

Database (m, n, w) Hyper-Mixture Model Uniform Empirical

 

NIST (62,62,12) 4.1 x 10-4 2.9 x 10-7 3.4 x 10'-3
331 (63,63,12) 5.9 x 10-3 1.0 x 10-4 1.4 x 10—2
332 (77,77,12) 8.4 x 10-3 8.4 x 10—5 1.9 x 10*2

 

Table 7.6: A comparison of the PRCa obtained from the mixture and uniform models based
on mean 711., n with empirical values.

 

Database (m,n,w) Hyper-Mixture Model Uniform Model Empirical

 

Mean A Mean A
NIST (62, 62, 12) 2.5 1.5 7.1
DB1 (63,63, 12) 5.1 3.0 8.0
DBZ (77,77, 12) 5.1 3.0 8.6

 

Table 7.7: A comparison of the mean number of matches obtained from the mixture and
uniform models and empirical matches.

Equations 4.29, 5.6 and 5.8), and (ii) the A that was derived from the uniform model (see
Equations 5.9 and 5.8). The values of m and n are taken to be the mean in each database.
The REG/s obtained from the mixture model are reported in Table 7.5. Table 7.6 gives
the PRC—a from the mixture and uniform models corresponding to w = 12 from the NIST
and FV C 2002 based on the ﬁngerprint area. Note that the ﬁngerprint individuality esti-
mates using the mixture models are at least one order of magnitude higher compared to the
uniform model. It is because when minutiae from the query and template have similar clus-
tering tendencies, a larger number of random matches will arise compared to the uniform
model. The empirical PRCS for w = 12 in each database is the proportion of impostor pairs
with 12 or more matches among all pairs that have m and n values within i5 of the mean.
Using the matcher reported in [40], the n query minutiae (5?, D29), 1' = 1, 2, . . . ,n, are
optimally aligned with the m template minutiae (S511, Dir), j = 1, 2, . . . ,m, to obtain the
best number of matches between each impostor pair. The mean number of impostor minu-
tiae matches for each database is reported in Table 7.7. Note that the empirical number of

matches and the PRCS are closer to the values derived from the mixture models compared

100

 

Database (m, n, to) N * Mean A Hyper-Mixture Uniform

 

Hyper-Mixture Model
NIST (46,46,12) 33 1.9 2.3 x 10—6 5.0 x 10-10
331 (46, 46, 12) 9 2.7 5.6 x 10—5 2.8 x 10—7
332 (46,46,12) 12 1.8 4.1 x 10“6 3.2 x 10‘9

 

Table 7.8: A comparison between PRCa obtained from the mixture and uniform models
form=n=46andw =12.

 

Database (m,n) Mean Overlapping Area (pixe12) M

 

NIST (52,52) 1 12,840 413
DB1 (51,51) 71,000 259
DB2 (63,63) 1 10,470 405

 

Table 7.9: Table giving the mean m and n in the overlapping area, the mean overlapping
area and the value of M for each database.

to those from the uniform model, suggesting the appropriateness of the mixture models in
representing the distribution of minutiae.

Since the mathematical model for the PRC was developed for any combination of m, n
and w, the trimmed mean PRC value corresponding to m = n = 46 and w = 12 can be
found for the three databases as an example. These PRCs are given in Table 7.8 for the
mixture and uniform distributions. Note, again, that the PRCS derived from the mixture
model are orders of magnitude higher compared to those from the uniform model.

In the following paragraphs, the results obtained from the proposed methodology in this
thesis are compared with those of Pankanti et a1. [35], introduced in Section 2.3. There
are two main differences between the experiments presented in this section and the ones
discussed in the previous paragraphs (i.e., Tables 7.6 and 7.8). First, the “corrected” uni-
form model of Pankanti et al. [35], instead of the fully uniform model, is considered (the
“corrected uniform model” was discussed in section 2.3.3). Second, the overlapping area

between the query and the template, instead of the whole ﬁngerprint area, is considered. In

101

 

Database (m, n, w) Empirical Mixture Model Pankanti

 

NIST (52, 52, 12) 7.1 3.1 1.2
331 (51, 51, 12) 8.0 4.9 2.4
332 (63,63, 12) 8.6 5.9 2.5

 

Table 7.10: A comparison between the mean A obtained from the mixture and uniform
models and the mean number of matched minutiae from the empirical matches in the over-
lapping area.

 

Database (m,n,w) Empirical Mixture Model Pankanti
NIST (52,52,12) 3.9 x 10-3 4.4 x 10-3 4.3 x 10—8
331 (51,51,12) 2.9 x 10—2 1.1 x 10—2 4.1 x 10‘-6
332 (63,63,12) 6.5 x 10-2 1.1 x 10-2 4.3 x 10-6

Table 7.11: A comparison between ﬁngerprint individuality estimates using the (a) Poisson
and mixture models, and (b) the corrected uniform model of Pankanti et al. [35].

 

 

other words, the overlapping area model was utilized to estimate ﬁngerprint individuality.
Since mixture models were used in the overlapping area model instead of hyper-mixture
models, the comparison is between mixture models and the “corrected uniform model”.

In order to compare the ﬁngerprint individuality estimates using the mixture model and
the model by Pankanti et al. [35], we ﬁrst need to ﬁnd the overlapping area between the
query and template. This is done as follows. The query and template ﬁngerprints in the
NIST and FV C databases are ﬁrst aligned using a Procrustes transformation [29] based on
the minutiae correspondence obtained from the matcher described in section 6.4.2. Then,
bounding boxes encompassing all minutiae points in the query and template ﬁngerprints
are determined. The overlapping area between the two bounding boxes is taken to be the
overlapping area between the query and template ﬁngerprints. Thus the ﬁngerprint indi-
viduality estimates presented here are dependent on the matcher. In order to compute the
Poisson probabilities, overlapping area model is used. Meanwhile, the ﬁngerprint individ-
uality estimates based on the corrected uniform model is also obtained. Table 7.10 gives

the mean A of the Poisson model in the overlapping area for the NIST and FVC databases.

102

The mean A’s (i.e., the theoretical mean numbers of matches) obtained from the hyper-
rnixture density model are closer to those from the empirical results, compared to those
from the corrected uniform model, which illustrates superiority of the mixture model over
the corrected uniform model. Table 7.11 shows the PRCS corresponding to the mean 111
and the mean 71, compared with the empirical PRCS. The empirical PRC is computed as
the proportion of impostor pairs with 12 or more matches among all pairs with m and n
values within :l:5 of the mean in the overlapping area. Note that as m or n or both increase,
the values of PRC for both models become large because it becomes much easier to obtain
spurious matches for larger m and n values. More important, however, is the fact that the
Poisson probabilities based on the mixture models are, again, orders of magnitude larger
than those from the corrected uniform model. Also the PRCs corresponding to the hyper-
mixture model are closer to the empirical counterparts, compared to those corresponding

to the corrected uniform model, conﬁrming again the reliability of the mixture models.

7 .5 Estimation of Fingerprint Individuality with the Com-
pound Stochastic Model

Instead of all the three databases as in other experiments, only two databases, namely DB1
and DB2, were used to demonstrate effectiveness of the proposed compound stochastic
models. There are two reasons not to use NIST. First, in the NIST database, there are only
two impressions for each ﬁnger; whereas in DB1 and DBZ, each ﬁnger has eight impres-
sions. Therefore the NIST might not have sufﬁcient data to model the local perturbation
for each ﬁnger. Second, the ink ﬁngerprints in the NIST database have very large area and
cover most of the ﬁngertips, which makes it difﬁcult to show the effectiveness of the partial
print model.

To validate the models, a synthetic database consisting of F ﬁngers with L impressions

per ﬁnger was generated. For ﬁnger f, a total of n (the actual number of consolidated

103

minutiae) minutiae were synthesized from the ﬁtted mixture model (for minutiae centers)
and the local perturbation model (for deviations from the minutiae centers). The param-
eterized ellipse for the l-th impression was then used to select a subset of the synthetic
minutiae set. Subsequently, the rigid transformation T ( f, 1 ) was used to obtain a synthetic
impression. Since the ellipses used in this synthesis are the ellipses from the original ﬁn-
gerprint impressions, this simulation is called the ﬁxed ellipse simulation. The distribution
of the number of impostor minutiae matches for this synthetic database is obtained using
the matcher described in [40]. This distribution is represented by the solid line with squares
(El) labeled as “ﬁxed-ellipse” in Figures 7.6 (a-b). Another synthetic database of F ﬁngers
with L impressions per ﬁnger was constructed using the conditional minutiae synthesis
technique so that the number of minutiae for the 1-th impression of ﬁnger f equals to the
observed number of minutiae in the l-th impression of ﬁnger f, namely m( f, 1). When
simulating the best ﬁtting ellipse, the ratio of lengths of major to minor axes for the DB1
and DB2 were taken to be the mean values, namely 1.48 and 1.90, respectively. Since this
synthesis simulates random ellipses from the partial-print model, it is called the random
ellipse synthesis. The corresponding distribution of the number of impostor matches is
represented in Figures 7.6 (a-b) by dashed lines labeled as “random ellipse”. The distribu-
tions based on the real ﬁngerprint impressions and based on the uniform distribution (i.e.,
with no clustering tendency) for the minutiae centers and deviations were also obtained
(denoted by the dot-dashed lines with circles and by the solid lines, respectively). Note
the close agreement between the impostor distributions of the synthesized and empirical
databases, demonstrating the adequateness of the proposed compound stochastic models in
representing the distribution of minutiae in the databases.

Fingerprint individuality estimates computed using Equation 6.3 (using 20 synthetic im-
pressions per ﬁnger, i.e., H = 20) are given in Table 7.12 for DB1 and DBZ databases.
For example, given a query and a template impression with 36 minutiae, the estimated

probability of getting more than or equal to 12 matches is 7.2 x 10”7 for DB1. When

104

0.7 T I I r T j

 

  
 
    
  

 

 

 

 

 

 

 

 

 

 

. -o- 1 Empirical distribution
—I- Fixed ellipse
0.6- "m" Randomeliipse "
— Uniform distribution
0.5 ~ 1
g 0.4 - ' ' ‘
.o
m
.o
2 0.3 - ‘
a
0.2
0.1
0 2 4 e , e 10 ' 1 _ 14
(a) Number of Minutiae Matches for FVCZOOZ DB1
0.5 r '
- -o-- Empirical distribution
O45 ,_ + Fixed ellipse '1
Randomeliipse
0.4 _ — Uniform distribution ~
0.35 — i
.4? 0.3 - , 1
g 0 25 - I \ '
9 9 ..
Q 0.2 b l \' ‘
.' ‘.
0.15 r ! 2 r. ”
0.1 ~ ~' 1
0.05 ~ 6 ‘
.I
0 _ ; _ : J
0 1o 15

 

(b) Number of Minutiae Matches for FVCZOOZ D82

Figure 7.6: A comparison between the synthetic and empirical impostor distributions for
the number of minutiae matches.

105

 

(m,n,w) 331 332
(26, 26, 12) 0* 0*
(36,36,12) 7.2 x 10-7 0*
(46,46,12) 2.6 x 10—5 7.9 x 10—7

Table 7.12: The estimation of ﬁngerprint individuality from the compound stochastic model
when w = 12. Notice that when m = n = 26, none of two synthesized minutiae sets share
12 or more matched minutiae. Hence the probability cannot be estimated accurately for the
case ofm = n = 26.

 

 

m = n = 26, none of two synthesized minutiae sets share 12 or more matched minutiae,
and thus the PRC is reported as zero. Since the ﬁngerprint individuality (i.e., the PRC) is
very small, to estimate such small probability, the sample size of the synthesized database
needs to be large; otherwise a trivial result (i.e., zero) as for m = n = 26 will occur.
The PRCS from the compound stochastic models are slightly smaller than those from the
hyper-mixture models as shown in Table 7.8. For example, when m. = n = 46, estimated
probability of obtaining more than or equal to 12 matches for D82 of hyper-mixture model
and compound stochastic model are 4.1 x 10‘6 and 7.9 x 10‘7, which can be explained by
the multiple sources of minutiae variability from the compound stochastic model compared
to single source of minutiae variability from the hyper-mixture model.

Table 7.13 gives ﬁngerprint individuality estimates from the compound stochastic model
for the “12-point match criteria” (see [2] and [35]) based on DB1 database. For com-
parison purposes, the ﬁngerprint individuality estimates of the corrected uniform model
by Pankanti et. a1 [35] are also given. Recall that the PRCS from the corrected uniform
model are computed based on the number of minutiae in the query and template that oc-
cur in the overlapping area. The parameters 11 and m from the mixture models represent
the total number of minutiae in a query and template, respectively. In order to make valid
comparisons, the mean number of minutiae occurring in the overlapping area is found.
When n = m = 36, this mean number is approximately 25. Consequently, the estimate

1.0 x 10‘10 in Table 7.13 was calculated using Equation 2.13 from the corrected uniform

106

 

(m, n, 10) Compound Stochastic model Corrected Uniform Model
(36,36,12) 7.2 x 10-7 1.0 x 10-10
(46,46,12) 2.6 x 10-5 3.9 x 10-8

Table 7.13: A comparison of ﬁngerprint individuality estimates. n and m are the total
number of minutiae in the query and the template, respectively. 10 is the number of matches
between the query and template ﬁngerprints.

 

 

model based on the combination (25, 25, 12). Note that the estimates of the compound
stochastic models are orders of magnitude higher compared to those of the corrected uni-
form model. This is due to the fact that the compound stochastic model accounts for the
clustering tendency of minutiae via mixture model whereas the corrected uniform model
does not, which indicates that the compound stochastic models gives a more realistic es-
timate of ﬁngerprint individuality compared to the corrected uniform model. In conclu-
sion, the experiment results indicates that the compound stochastic models consider more
sources of minutiae variability but are not able to estimate the PRC at the tails of the distri-
bution. However, if only the variability of minutiae in different ﬁngerprints are considered,
the hyper-mixture models are more computational efﬁcient (no minutiae synthesis and
matching is required) in ﬁngerprint individuality assessment via Poisson approximation.
The mixture models are also better models for representing the distributions of minutiae

compared to the uniform or “corrected” uniform models.

7 .6 Summary

To validate and compare models on ﬁngerprint individuality, the models were implemented
on three different databases, namely NIST 2000 SD 4, FVC 2002 DB1 and DB2. For all
the databases, goodness-of-ﬁt tests on the mixture models showed better performance than
the uniform model. For computational efﬁciency and without jeopardizing the inherent
interclass variability, the hyper-mixture model was used on the entire database/population

of ﬁngerprints. The compound stochastic model is a further development, accounting in-

107

traclass variability of ﬁngerprints. In order to compare between the compound stochastic
model and the uniform model, synthetic minutiae sets were generated based on each of the
models. Visually comparing the matching distributions of the synthetic minutiae sets with
empirical distribution (Figure 7.13) shows that the results from the compound stochastic
model is more similar than those from the uniform model to the empirical distribution, fa-
voring the compound stochastic model. Finally, the PRCS from the hyper-mixture model,
those from the compound stochastic model (based on the synthetic minutiae sets), those
from the uniform model, and those from the corrected uniform model were compared with
the empirical PRCS. The results show that the PRCs from the hyper-mixture model and the
compound stochastic model are orders of magnitude higher than those from the uniform
model and the corrected uniform model. Since the proposed models better represent minu-
tiae distributions, we believe that the PRCS reported in this thesis are better estimates of

ﬁngerprint individuality compared to previous works.

108

CHAPTER 8

Conclusions and Future Directions

The task in ﬁngerprint individuality is to develop statistical measures that characterize the
extent of uniqueness of a ﬁngerprint. The measure can be taken as the probability of ﬁnd-
ing another ﬁngerprint that is sufﬁciently similar to a given query ﬁngerprint in a target
population. A satisfactory estimate of ﬁngerprint individuality will make it possible for
forensic experts to determine the admissibility of ﬁngerprints as evidence in courts of law
where ﬁngerprint-based evidence is increasingly being challenged. The main issue in the
assessment of ﬁngerprint individuality is to satisfactorily model two sources of ﬁngerprint
variability, namely, the intraclass variability and interclass variability of ﬁngerprint fea-
tures. This thesis developed a mixture model for the interclass variability, and a com-
pound stochastic model for minutiae intraclass variability. Publications for this research
are [12], [53], and [52].

As models for minutiae interclass variability, the mixture models provide a ﬂexible way
to represent a variety of observed minutiae distributions in different ﬁngers. Goodness-
of-ﬁt tests showed that the mixture model better represents the characteristics of minutiae
features observed in ﬁngerprint images compared to the uniform model. For example, for
FVC 2002 DB2 with W = 10 and V = 4, the rejection rate for the mixture model and
the uniform models are 1 — 94 / 100 = 6% and 1 — 2 / 100 = 98%, respectively, based on

109

Freeman-Tukey test at the 0.05—level.

Although the proposed mixture model is successful in capturing interclass variability, it
does not address intraclass variability. Therefore a compound stochastic model was de-
veloped to model intraclass variability. More speciﬁcally, the model quantiﬁes three main
sources of intraclass variability, namely, the nonlinear deformation, local perturbation, and
partial ﬁngerprint. A synthesis technique was then used to validate the compound stochas-
tic model and to estimate ﬁngerprint individuality. Experimental results showed that the
impostor matching distributions of synthesized databases based on the compound stochas-
tic model were closer to the corresponding empirical matching distributions compared to
the distributions based on the uniform model. This observation indicates the superiority of
the compound stochastic model over the uniform model.

To estimate individuality of a target population using the above models of minutiae vari-
ability, there are two different approaches, namely, synthesis and analytical approaches.
In the synthesis approach, minutiae sets are synthesized by the models and a matcher is
applied to the synthesized minutiae sets to obtain impostor matching distribution. Fin-
gerprint individuality is then estimated based on the observed number of matches in the
synthetically generated database. On the other hand, in the analytical approach, ﬁngerprint
individuality is estimated based on a mathematical formula from the Poisson model. This
approach is implemented when minutiae density is calculated based on the hyper-mixture
model, which only considers minutiae interclass variability. The Poisson model enables
analytical estimation of ﬁngerprint individuality. However, the implementation of the Pois-
son model becomes infeasible for the compound stochastic model where additional sources
of variability are considered. As an alternative method, the synthesis approach is good at
incorporating multiple sources of minutiae variability, which produces more realistic esti-
mate. Therefore, the synthesis approach is used for the compound stochastic model, which
incorporates both interclass variability and intraclass variability. Although the synthesis

approach has many advantages, its matching procedure to obtain the impostor matching

110

distribution, nevertheless, is time-consuming and doesn’t produce reliable estimates of ﬁn-
gerprint individuality, especially for very small values of PRC. By contrast, the analytical
method has the advantage of high efﬁciency, but has the disadvantage of not accounting for
intraclass variability.

The PRCS obtained from the proposed models were reported and compared with those of
the “corrected” uniform model of Pankanti et al. [35] as well as with empirical results which
is matcher dependent. It was found that the estimation based on the proposed approach in
this thesis is closer to the empirical results compared with those from the “corrected” uni-
form model. Also, the PRCS from the proposed models are orders of magnitude larger than
those from the corrected uniform model which can be explained by the similar clustering
tendencies of minutiae from different ﬁngers. Since the proposed models better represent
minutiae distributions, we believe that the PRCs reported in this thesis are better estimates
of ﬁngerprint individuality compared to previous works.

There are different ways to improve the model presented in this thesis. First of all,
the mixture model can be improved. Instead of using a Gaussian mixture model, a t-
mixture model, with heavier tails can be applied. As many distributions for angular data
can be used to model minutiae directions, such as wrapped Cauchy distribution, wrapped
normal distribution, a detailed study on these distributions is needed to choose an optimal
distribution model for minutiae directions. Secondly, a non-homogenous Poisson process
model incorporating all aspects of minutiae variability (from sources such as superposition
of ghost points, thinning, censoring, and uncertainty in the correspondence function) can be
developed to incorporate intraclass variability and to develop analytical models. Another
direction is to explicitly model spatial dependence of neighboring minutiae. A model for
spatial correlation of minutiae distribution will shed more light on quantifying the tendency

of minutiae to cluster spatially.

111

BIBLIOGRAPHY

[1] S. Pankanti A. K. Jain, L. Hong and R. Bolle. An identity authentication system using
ﬁngerprints. Proc. IEEE, 85(9):103—110, 1997.

[2] J. Buscaglia B. Budowle and R. C. Perlman. Review of the scientiﬁc ba-
sis for friction ridge comparisons as a means of identiﬁcation: Committee
ﬁndings and recommendations. Forensic Science Communications,1(2) Online
at:http://www.fbi.gov/hq/lab/fsc/backissu/jan2006/research/2006 OI research02.htm,
2006.

[3] V. Balthazard. De lidentiﬁcation par les empreintes ditalis. Comptes Rendus, des
Academies des Sciences, 1862(152), 1911.

[4] Claudio Barbarnelli. Evaluating cluster analysis solutions: An application to the ital-
ian neo personality inventory. European Journal of Personality, 16:843-s55, 2002.

[5] A. M. Bazen and S. H. Gerez. Systematic methods for the computation of the direc-
tional ﬁelds and singular points of ﬁngerprints. IEEE Trans. PAMI, 24(7):905—919,
2002.

[6] John Berry and David A. Stoney. The history and development of ﬁngerprinting. In
Henry C. Lee and RE. Gaensslen, editors, Advances in Fingerprint Technology, pages
1-40. CRC Press, Florida, 2nd edition, 2001.

[7] R. Cappelli, D. Maio, and D. Maltoni. Modelling plastic distortion in ﬁngerprint
images. Proceedings of the Second International Conference on Advances in Pattern
Recognition (ICAPR 2001), pages 369—376, 2001.

[8] Y. Chen, S. Dass, and A. K. Jain. Fingerprint quality indices for predicting authentica-
tion performance. Proc. of Audio- and Video-based Biometric Person Authentication
(AVBPA), pages 160—170, 2005.

[9] Simon Cole. “Is Fingerprint Identiﬁcation Valid? Rhetorics of Reliability in Finger-
print Proponents Discourse”. Law & Policy, 28(1):]09—135, January, 2006.

[10] Suzanne Collins. Judge throws out ﬁngerprint evidence in murder.
http://wjz. com/topstori es/shooting. evidence. judge. 2. 43 I 460. html , 2007.

[11] H. Cumrrrins and C. Midlo. Fingerprints, Palms and Soles: An Introduction to Der-
matoglyphics. Dover Publications, Inc., New York, 1961.

112

[12] SC. Dass, Y. Zhu, and AK. Jain. Statistical models for assessing the individuality of
ﬁngerprints. Fourth IEEE workshop on Automatic Identiﬁcation Advanced Technolo-
gies, 2:1—7, 2005.

[13] Daubert v. Merrel Dow Pharmaceuticals Inc, 509 US. 579, 113 S. Ct. 2786, 125
L.Ed.2d 469 (1993).

[14] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum-likelihood for incomplete
data via the EM algorithm. Journal of the Royal Statistical Society. Series B, 39(1): 1—
38, 1977.

[15] LLC Elephant Engineering, a division of Syntronics. Latentrnaster 2004.
http://www.latentmaster. com/.

[16] F. Galton. Finger Prints. London: McMillan, 1892.

[17] Radim Halir and Jan Flusser. Numerically stable direct least square ﬁtting of ellipses.
Proc. of the 6th International Conference in Central Europe on Computer Graphics,
Visualization and Interactive Digital Media, pages 125—132, 1998.

[18] E. R. Henry. Classiﬁcation and Uses of Fingerprints. London: Routledge, 1900.

[19] Geoffrey W. Hill. Evaluation and inversion of the ratios of modiﬁed Bessel func-
tions, [1(z)/Io(a:) and 11(23)/I.5(:1:). ACM Transactions on Mathematica Software,
7(2):]99—208, 1981.

[20] Lin Hong, Yifei Wan, and Anil K. Jain. Fingerprint image enhancement: Algorithms
and performance evaluation. IEEE Transactions on PAMI, 20(8):777—789, Aug 1998.

[21] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, Inc.
Upper Saddle River, NJ, 1988.

[22] Joe H. Ward Jr. Hierarchical grouping to optimize an objective functions. Journal of
the American Statistical Association, 58(301):236—244, 1963.

[23] C. Kingston. Probabilistic analysis of partial ﬁngerprint patterns. PhD. The-
sis, University of California, Berkeley, 1964.

[24] H. Bayo Lawal. Comparison of the X2, Y2, freeman-tukey and williams’s improved
G2 test statistics in small samples of one-way multinorrrials. Biometrika, 71(2):415—
418, 1984.

[25] L. LeCam. Asymptotic Methods in Statistical Decision Theory. Springer-Verlag,
1986.

113

[26] D. Maio, D. Maltoni, R. Cappelli, J. L. Wayman, and Anil K. Jain.
FVC2002: Fingerprint veriﬁcation competition. In Proceedings of the Inter-
national Conference on Pattern Recognition, pages 744—747, 2002. Online:
http://bias.csr.unibo.it/fvc2002/databases.asp.

[27] Davide Maltoni, Dario Maio, Anil K. Jain, and Salil Prabhakar. Handbook of Finger-
print Recognition. Springer—Verlag, 2003.

[28] K. V. Mardia. Statistics of Directional Data. Academic Press, 1972.

[29] K. V. Mardia and I. L. Dryden. The complex Watson distribution and shape analysis.
J. R. Stat. Soc. Ser. B Stat. Methodol, 62(4):913—926, 1999.

[30] A. Martin and M. Przybocki. The NIST 1999 speaker recognition evaluation - an
overview. Digital Signal Processing, 10:1—18, 2000.

[31] G. J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley, 1997.

[32] O. Nakamura, K. Goto, and T. Minami. Fingerprint classiﬁcation by directional dis-
tribution patterns. Systems, Computers, Controls, l3(5):81—89, 1982.

[33] NIST: 8-bit gray scale images of ﬁngerprint image groups (FIGS). Online:
http://www.nist.gov/srd/nistsd4.htm.

[34] L. O’Gorrnan and J. V. Nickerson. An approach to ﬁngerprint ﬁlter design,. Pattern
Recognition, pages 362-385, 1987.

[35] Sharath Pankanti, Salil Prabhakar, and Anil K. Jain. On the individuality of ﬁnger-
prints. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(8): 1010—1025,
2002.

[36] FBI National Press. Statement on Brandon Mayﬁeld case. FBI National
Press:http://www.fbi. gov/pressrel/pressrelO4/mayﬁeld052404.htm, 2004.

[37] A. R. Rao and R. C. Jain. Computerized ﬂow ﬁeld analysis: Oriented texture ﬁelds.
IEEE Trans. Pattern Analysis and Machine Intelligence, 14(7):693—709, 1992.

[38] N. Ratha and R. Bolle. Fingerprint image quality estimation. BM Computer Science
Research Report RC 21622, 1999.

[39] Ralph Riddiough and John McColl. Maths in Action - Advanced Higher Statistics 1:
Advanced Higher Statistics 1. Nelson, 2001.

[40] A. Ross, S. Dass, and A. K. Jain. A deformable model for ﬁngerprint matching.
Pattern Recognition, 38(1):95-103, 2005.

114

[41] T. Roxburgh. Work on evidential value of ﬁngerprints. Sankhya: Indian Journal of
Statistics, l(50):189—214, 1933.

[42] S. Pankanti S. Prabhakar and A. K. Jain. Biometric recognition: Security and privacy
concerns. IEEE Security and Privacy Magazine, l(2):33—42, 2003.

[43] G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461—464,
1978.

[44] S. C. Scolve. The occurence of ﬁngerprint characteristics as a two dimensional pro-
cess. Journal of the American Statistical Association, 74(367):588—595, 1979.

[45] George W. Snedecor and William G. Cochran. Statistical Methods, Eighth Edition.
Iowa UNiversity Press, 1989.

[46] V. S. Srinivasan and N. N. Murthy. Detection of singular points in ﬁngerprint images.
Pattern Recognition, 25(2):139—153, 1992.

[47] D. A. Stoney and J. I. Thornton. A critical analysis of quantitative ﬁngerprint individ-
uality models. Journal of Forensic Sciences, 31(4): 1 187—1216, 1986.

[48] Robert Tibshirani, Guenther Walther, and Trevor Hastie. Estimating the number of
clusters in a data set via the gap statistic. Journal of the Royal Statistical Society.
Series B (Statistical Methodology), 63(2):411—423, 2001.

[49] U. S. v. Byron Mitchell. Criminal Action No. 96-407, U. S. District Court for the
Eastern District of Pennsylvania, 1999.

[50] B. Wentworth and H. H. Wilder. Personal Identiﬁcation,. R. B. Badger, Boston,,
1918.

[51] C. L. Wilson, G. T. Candela, and C. I. Watson. Neural network ﬁngerprint classiﬁac-
tion. J. Artiﬁcial Neural Networks, 2:203—228, 1994.

[52] Y. Zhu, S. C. Dass, and AK. Jain. Statistical models for assessing the individuality of
ﬁngerprints. IEEE Trans. on Information Forensics and Security, 2007, 2:391—401.

[53] Y. Zhu, S.C. Dass, and AK. Jain. Statistical models for ﬁngerprint individuality.
Proceedings of the International Conference on Pattern Recognition(ICPR), 3:532—
535, 2006.

115

       

"ll11111111111111iiiljllllillllllllililillis

4617