ON THE GENERALIZATION OF FINGERPRINT EMBEDDINGS

By

Steven A. Grosz

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computer Science—Doctor of Philosophy

2024

ABSTRACT

Fingerprint recognition is a long-standing and important topic in computer vision and pattern recognition research, supported by its diverse applications in real-world scenarios such as access control, consumer products, law enforcement, forensics, national identity, and border security. Recent advances in deep learning have greatly enhanced fingerprint recognition accuracy and efficiency alongside traditional hand-crafted fingerprint recognition methods, particularly in controlled settings. While state-of-the-art fingerprint recognition methods excel in controlled scenarios, like rolled fingerprint recognition, their performance tends to drop in uncontrolled settings, such as latent and contactless fingerprint recognition. These scenarios are often characterized by extreme degradations and image variations in the captured images. This performance drop is due to the inability of fingerprint embeddings (feature vectors obtained via deep networks) to generalize across variations in the captured fingerprint images between controlled and uncontrolled settings. The challenges in the generalization of fingerprint embeddings, from controlled to uncontrolled settings, encompass issues such as insufficient labeled data, varying domain characteristics (often referred to as the “domain gap”), and the misalignment of fingerprint features due to information loss.

This dissertation proposes a series of methods aimed at addressing these challenges in various unconstrained fingerprint recognition scenarios. We begin in chapter 2 with an examination of cross-sensor and cross-material presentation attack detection (PAD), where the sensing mechanism and encountered presentation attack instruments (PAIs) may be unknown. We present methods to augment the given training data to include a wider diversity of possible domain characteristics, while simultaneously encouraging the learning of domain-invariant representations. Next, we turn our attention in chapter 3 to the challenging scenario of contact to contactless fingerprint matching, where misaligned fingerprint features due to differences in contrast, perspective differences, and non-linear distortions are corrected via a series of deep learning-based preprocessing techniques to minimize the domain gap between contact and corresponding contactless fingerprint images. In chapter 4, we aim to improve the sensor-interoperability of fingerprint recognition by leveraging a diversity of deep learning representations, integrating convolutional neural network and attention-based vision transformer architectures into a single, multi-model embedding. Similarly, in chapter 5, we further improve the robustness and universality of fingerprint representations by fusing multiple local and global embeddings and demonstrate a marked improvement in latent to rolled fingerprint recognition performance, both in terms of accuracy and efficiency. Next, chapter 6 presents a method for synthetic fingerprint generation, capable of mimicking the distribution of real (i.e., bona fide) and PA (i.e., spoof) fingerprint images, to alleviate the lack of publicly available data for building robust fingerprint presentation attack detection algorithms.
Finally, in chapter 7 we extend our fingerprint generation capabilities toward generating universal fingerprints of any fingerprint class, acquisition type, sensor domain, and quality, all to improve fingerprint recognition training and generalization performance across diverse scenarios.

Copyright by
STEVEN A. GROSZ
2024

To my loving friends and family.

ACKNOWLEDGEMENTS

As I reflect on my time as a PhD student, I am filled with tremendous joy from the experience as well as immense gratitude for the friends, colleagues, and mentors who have guided and instructed me along the way. Although there are too many to mention, I would like to express my thanks and appreciation to a few individuals who have truly impacted me during this time. Foremost, my deepest gratitude to my advisor, Professor Anil K. Jain, for his unwavering support and encouragement in both my personal and professional life. Thank you, Dr. Jain, for your mentorship and dedication to teaching me how to become an independent and successful researcher. I would also like to thank my PhD committee, Dr. Arun Ross, Dr. Xiaoming Liu, and Dr. Kai Cao, for providing valuable feedback on this dissertation and for their collaboration throughout my PhD program. Thank you, Brenda Hodge, Amy King, and Vincent Mattison, for administrative and everyday assistance throughout the degree program. A special thank you to Russell Werner for not only all the impressive systems and computing support over the years, which have saved me countless hours, but also for the comical exchanges which made dealing with environment issues that much more bearable.

I am grateful for all my fellow labmates with whom I have shared many great memories over the years. A special thank you to Joshua Engelsma, who has been an incredible mentor and friend. Thank you also to Tarang Chugh, Debayan Deb, Sixue Gong, Yichun Shi, Vishesh Mistry, and Divyansh Aggarwal, with whom I had the pleasure of working. Thank you to Kanishka Wijewardena, Akash Godbole, and Xiao Guo for sharing many fond memories together in the lab.

Thank you to all my loving family and friends who have always supported me. I am especially thankful to my parents for how they helped nurture the abilities and fortitude that God gave me to pursue my dreams. Thank you to my sister, whose encouragement was much needed at times. Finally, thank you to my loving girlfriend, Allison Pasek, who has made the last few years that much more memorable. There are, of course, many other friends and family who have impacted me along the way, and I thank all of you.

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION
  1.1 History of Fingerprint Recognition
  1.2 Major Applications
  1.3 Pipeline of Automated Fingerprint Identification Systems
  1.4 Challenges in the Generalization of Fingerprint Representations
  1.5 Thesis Contributions

CHAPTER 2 SENSOR AND MATERIAL AGNOSTIC FINGERPRINT PRESENTATION ATTACK DETECTION
  2.1 Introduction
  2.2 Related Work
  2.3 Proposed Approach
  2.4 Evaluation Procedure
  2.5 Experimental Results
  2.6 Summary
  2.7 Acknowledgment

CHAPTER 3 CONTACT TO CONTACTLESS FINGERPRINT MATCHING
  3.1 Introduction
  3.2 Prior Work
  3.3 Methods
  3.4 Experiments
  3.5 Discussion
  3.6 Computational Efficiency
  3.7 Conclusion and Future Work
  3.8 Acknowledgment

CHAPTER 4 UNIVERSAL FINGERPRINT REPRESENTATION VIA MULTI-MODEL EMBEDDINGS
  4.1 Introduction
  4.2 Related Work
  4.3 AFR-Net: Attention-Driven Fingerprint Recognition Network
  4.4 Experimental Results
  4.5 Discussion
  4.6 Conclusion

CHAPTER 5 LATENT FINGERPRINT RECOGNITION: FUSION OF LOCAL AND GLOBAL EMBEDDINGS
  5.1 Introduction
  5.2 Related Work
  5.3 LFR-Net: Latent Fingerprint Recognition Network
  5.4 Experimental Results
  5.5 Discussion
  5.6 Conclusion
  5.7 Acknowledgment

CHAPTER 6 SYNTHETIC FINGERPRINT SPOOF IMAGES
  6.1 Introduction
  6.2 Related Work
  6.3 Proposed Synthetic Presentation Attack Fingerprint Generator
  6.4 Experimental Results
  6.5 Conclusion and Future Work

CHAPTER 7 UNIVERSAL FINGERPRINT GENERATION
  7.1 Introduction
  7.2 Related Work
  7.3 GenPrint: Controllable Multimodal Fingerprint Diffusion Model
  7.4 Experimental Results
  7.5 Conclusion
  7.6 Acknowledgment

CHAPTER 8 SUMMARY
  8.1 Contributions
  8.2 Suggestions for Future Work
  8.3 List of Publications

BIBLIOGRAPHY

CHAPTER 1
INTRODUCTION

The term “fingerprint recognition” evokes diverse mental images among individuals, ranging from cinematic spy scenarios featuring characters like James Bond to depictions of forensic leads in criminal investigative television series. In any case, fingerprint recognition has established itself as one of the most widespread and valuable methods of personal identification, the culmination of centuries of scientific and technological discovery. With its popularity, fingerprint recognition has permeated numerous areas of everyday life, including secure facility access, smartphone unlock, forensics, border control, and national identity programs [160]. On the surface, the ease with which many of us use fingerprint recognition in our daily lives can overshadow the technological advancements that underpin its development, as well as the need for future innovations to come.

In this chapter, we first explore the origins of fingerprint recognition and how it became so pervasive in modern society. After detailing its long history, we discuss some notable applications of fingerprint recognition in the real world that have aided the familiarity that most of us feel with the technology. Then, we introduce the basic components of an automated fingerprint recognition system before turning our attention to some of the potential pitfalls and challenges remaining with the technology. Finally, we conclude with a description of the specific solutions proposed in this dissertation to address many of those challenges.

1.1 History of Fingerprint Recognition

Human fascination with fingerprints dates back to 7000 BC, when thumbprints were first discovered in Neolithic bricks from the ancient city of Jericho, in the State of Palestine [127]. Fingerprints were also found embedded into clay tablets in Babylon from the years 1955-1913 BC, suspected of being used in business contracts. Similarly, according to the Chinese historian Kia Kung-Yen, fingerprints were used from 600-700 AD during the Tang dynasty to sign legal documents [105].

Figure 1.1 Example fingerprint images and corresponding minutiae from two identical twins, (a) twin A and (b) twin B, and one non-twin (shown in c) shown for comparison. Minutiae matching for (d) twin B–non-twin (matching of (b) and (c), matching score = 3 on a scale of 0–999) and (e) twin A–twin B (matching of (a) and (b), matching score = 38 on a scale of 0–999). The “matched” minutiae pairs are shown by bounding boxes. Though the match score of 38 between the twins is larger than the match score of 3 between twin B and the non-twin, this match score is still below the threshold for a genuine match. Figure reproduced from [122].

These accounts speak to some degree of early belief in fingerprints as a means of personal identification; however, it was not until nearly a millennium later that scientific studies on
the merits of fingerprint recognition began. As we will see, these studies would come to establish two core tenets of fingerprint recognition, namely uniqueness and permanence:

1. Uniqueness: Due to genetics and random forces in play during the formation of friction ridge details, no two fingers, even for the same individual, have identical fingerprints. Even twins are reported to have unique fingerprints [122], as illustrated in Figure 1.1.

2. Permanence: Friction ridge patterns are believed to be persistent during the lifetime of an individual in terms of their ability for personal identification. For example, the fingerprint images shown in Figure 1.2 are from the same finger captured over a 12-year timespan. Despite small variations due to the introduction of minor cuts and bruises, the ridge structure remained constant throughout the years of collection.

Figure 1.2 Time-lapse of the same finger over a 12-year period from the longitudinal dataset used in [262].

The uniqueness tenet of fingerprints came to fruition through multiple studies on the formation of ridges, furrows, and pore structures of fingerprints, which began in 1684 with the work of English plant morphologist Nehemiah Grew [52]. A century later, in 1788, a detailed description of the anatomical formations of fingerprints was made by Mayer [174]. In 1892, Sir Francis Galton, a polymath and cousin of Charles Darwin, wrote the landmark book Finger Prints, in which he stated the following on the potential of fingerprint ridges: “Let no one despise the ridges on account of their smallness, for they are in some respects the most important of all anthropological data. We shall see that they form patterns, considerable in size and of a curious variety of shape, whose boundaries can be firmly outlined, and which are little worlds in themselves. They have the unique merit of retaining all their peculiarities unchanged throughout life, and afford in consequence an incomparably surer criterion of identity than any other bodily feature.” [81]. Going further, Galton made note of the uniqueness of fingerprint minutiae, the various endings and bifurcations found throughout the fingerprint ridge structure, which are still one of the most common feature representations used in automated fingerprint identification systems today [160].

On the other hand, the notion of permanence was first established by Sir William James Herschel, a British civil servant stationed in India, who demonstrated the permanence of fingerprints in his 1916 book titled The Origin of Finger-Printing [109], where he collected longitudinal inked impressions of his son's finger at the ages of 7, 17, and 40 years old and concluded that fingerprints remained constant over time. Furthermore, Dr. Henry Faulds observed that not only were fingerprints permanent, but they also grew back into the exact same pattern when the outer skin of the fingertip was removed [52]. In other words, fingerprints were a permanent physical characteristic of an individual that remained with them throughout their lifetime.

Based on these two long-standing beliefs, fingerprint recognition began to gain prominence as a critical means of identification in law enforcement applications.

Figure 1.3 (a) Rolled ten-print fingerprint images of Francisca Rojas and (b) matching crime scene fingerprint found on the door post at the scene of the murder of her two children in Buenos Aires, Argentina. Images obtained from [208] and [105].

In 1892, fingerprints were
used for the first time to solve the murder of two children in Argentina [105]. Their mother, Francisca Rojas, confessed to the crime after being confronted with evidence matching one of her ten-print fingerprint images to a bloody fingerprint found on a door post at the scene (see Figure 1.3). In 1898, in the Bengal province of India, another murder case was solved using the remnants of two fingerprint impressions found on an almanac. Sir Edward Henry, Herschel's successor in India, found that the prints matched those of an ex-convict, Kangali Charan, whose thumbprint was already in the records due to a prior theft conviction [105].

In 1900, Sir Edward Henry introduced a scientific fingerprint classification system [108], which later became popularly known as the Henry System of Classification. Henry's classification system consisted of five major fingerprint classes, which are shown in Figure 1.4. It was officially introduced at New Scotland Yard for criminal identification in 1901 [105]. A major development occurred in 1924, when the United States Congress mandated the collection of fingerprints of criminals. Consequently, a new identification division was instituted within the Federal Bureau of Investigation (FBI). In 1933, a unit specializing in technical analysis of latent fingerprints, i.e., noisy finger marks unintentionally left at a crime scene (such as the marks left on the almanac by Kangali Charan), was also established by the FBI [181].

Figure 1.4 Example images from the 5 major fingerprint categories established by Henry.

With the increasing use of fingerprints in law enforcement around the world, a need arose to automate the process of searching a fingerprint against the growing repository of enrolled fingerprints. This need was further supported by a report compiled by the RAND Corporation [91], which showcased the advantages and opportunities of the effective use of fingerprints as physical evidence in improving crime-solving performance. Recognizing the potential of the technology, combined with an electronics revolution happening at the time, agencies including the FBI, the UK Home Office, and the Japanese and French police agencies undertook research initiatives that led to the development of Automated Fingerprint Identification Systems (AFIS) [133]. Consequently, the first algorithmic approach to fingerprint recognition was published in Nature in 1963 and the first AFIS became a reality in 1974 [237].1

As the accuracy and efficiency of AFIS improved, so did the scientific understanding of fingerprints, thanks to rigorous statistical studies on the central tenets of fingerprint recognition over the last few decades. In particular, it was long assumed that fingerprints were unique and permanent, but this belief was only backed up scientifically in 2002, when Pankanti, Prabhakar, and Jain published “On the individuality of fingerprints” [190]. The authors computed the likelihood of two fingerprints sharing the exact same patterns to be approximately 5.47 × 10⁻⁵⁹, an incredibly small possibility. Similarly, in 2015, Yoon and Jain published a study in the Proceedings of the National Academy of Sciences (PNAS) providing strong backing to the permanence of fingerprints [262]. The study involved 15,597 subjects over time lapses of 5 to 12 years and found that, although match scores drop slightly over time, the overall recognition accuracy remained stable.
These studies have helped solidify fingerprint recognition as a viable and secure means of person authentication, which has now permeated myriad commercial and governmental applications. A number of these notable, well-known applications are discussed in the following section.

1 https://www.secureidnews.com/news-item/a-history-of-afis/

Figure 1.5 Example prominent applications of fingerprint recognition.

1.2 Major Applications

Due to the exceptional performance in terms of accuracy and speed exhibited by contemporary AFIS, centuries of empirical observation, and robust statistical evidence of the merits on which it is founded, fingerprint recognition has proliferated across a diverse range of applications in our present-day world. The following list highlights some of the more notable and widely recognized applications. Examples of these applications are illustrated in Figure 1.5.

• Forensics - In the mid to late nineteenth century, pioneers such as Faulds, Herschel, and Henry engaged in manual fingerprint examinations to identify repeat criminals [52]. In 1924, the FBI formally established the Identification Division to collect and store inked ten-print cards from criminals. By 1999, this process evolved into the FBI's Integrated Automated Fingerprint Identification System (IAFIS), where fingerprints (ten-prints) were digitized, stored, and automatically compared [133]. In 2011, the FBI introduced the Next Generation Identification (NGI) system to replace the outdated IAFIS, enabling faster and more accurate fingerprint recognition capabilities. Today, approximately 22,294 federal, state, local, tribal, and international partners submit criminal and/or civil electronic entries to this system per month, according to statistics gathered at the end of December 2023 [180].

• Border Crossings - The Office of Biometric Identity Management (OBIM) oversees the largest biometric repository in the United States. A key aspect of OBIM's purpose is preventing criminals and dangerous individuals from entering the United States. The system employed, called the Automated Biometric Identification System or IDENT, currently holds approximately 300 million unique identities and processes more than 400,000 biometric transactions per day [179].

• National ID - India's Aadhaar program currently stands as the largest biometric recognition system in the world [1]. Aadhaar employs all 10 fingerprints, 2 irises, and a face image of each enrolled Indian citizen. One of the main objectives is to eliminate duplicates and associate each individual with a unique 12-digit identifier. It has proven to be an effective tool in facilitating financial transactions and in curbing instances of fraud that may occur when providing benefits to the marginalized population. As of November 7th, 2023, Aadhaar has enrolled over 1.3 billion Indian citizens [223].

• Consumer Products (e.g., laptops, smartphones, etc.) - According to a recent study, 81% of smartphones employ some kind of biometric authentication [45], with face and fingerprint among the most popular biometrics used for this purpose. However, consumers report that the most acceptable biometric technology is fingerprint recognition, with 86% of Americans feeling comfortable with the technology, compared to 30% who express concerns about the use of facial recognition.2

2 https://passport-photo.online/blog/biometric-statistics

• Payments - Major payment companies like Visa and Mastercard are incorporating fingerprint recognition directly into credit cards through a concept known as “Match on Card” [189].
This concept allows users to enroll their fingerprint template onto a chip embedded in their credit card, enabling them to conduct financial transactions without the need for a PIN. Importantly, the fingerprint template data remains securely within the credit card, ensuring its confidentiality, and is directly matched to the queried fingerprint on the chip during transactions.

• Access Control - Popularized in numerous spy movies and TV shows over the last few decades (e.g., James Bond [100], Mission Impossible [99], etc.), fingerprint recognition is well recognized as a secure and reliable means of restricting access to sensitive and top-secret buildings and information. In addition to protecting sensitive information, commercial entities may also employ fingerprint recognition to restrict access to paying customers at sporting events and concerts. One of the most well-known examples of this is perhaps Disney's use of fingerprint recognition to access their theme parks [50].

The aforementioned applications and statistics suggest that it is highly likely that the majority of the global population has personally encountered fingerprint recognition at some point in their lives. The data further indicates a potential for continued growth in these numbers. In the following section, we explore the technical aspects of the modern automated fingerprint recognition system and how these systems became so widespread in our everyday lives.

1.3 Pipeline of Automated Fingerprint Identification Systems

With advancements in fingerprint sensing technology and automated matching algorithms, fingerprint recognition has become highly accurate and efficient. A typical recognition system involves two key stages: enrollment and recognition. A diagram of these stages of a typical fingerprint recognition system is shown in Figure 1.6.

1. Enrollment: In this stage, an individual's fingerprint, acquired using a fingerprint reader, undergoes processing to extract salient features and generate a fingerprint template. This template is then tagged with a unique user identifier and stored with associated metadata in a database. This database is known as the reference, gallery, or enrollment database.

2. Recognition: Depending on the application context, the recognition of an individual can be for validating a claimed identity (verification) or establishing the identity of an unknown individual (identification). In both cases, a fingerprint is acquired and processed to generate a template, known as the query or probe template.

a) Authentication: In the authentication scenario, also known as verification, the query template is accompanied by a user identifier (the claimed identity), which is used to retrieve the enrolled template from the reference database. The system accepts or rejects the submitted claim of identity by performing a one-to-one comparison between the query template and the retrieved reference template. Examples include fingerprint-based access control and large-scale civil ID systems (e.g., Aadhaar), where the user provides a unique ID (e.g., employee RFID card or Aadhaar 12-digit unique ID) and a fingerprint impression for authentication.

b) Identification: In the identification scenario, the system aims to establish the identity of a subject by searching the entire reference database for a match.
Operating in the identification mode, a biometric system performs one-to-many comparisons to determine if the user is already enrolled in the database, returning the user identifier that matches. This scenario is common in criminal investigations, where a fingerprint left at the crime scene is searched against the database to determine whether the perpetrator is already enrolled.

The functionality of each of the fingerprint recognition modes is facilitated by a configuration of multiple sub-processes, including fingerprint acquisition, feature extraction, and matching. Each of these sub-modules is described in detail below.

Figure 1.6 Diagram of a typical fingerprint recognition pipeline.

1.3.1 Fingerprint Acquisition

In the early stages of fingerprint recognition, fingerprint images were acquired via a process that involved inking a user's fingers and having them press down on card stock paper. The captured fingerprints could be either rolled fingerprints, obtained by rolling the finger from one side to another, or slap/plain fingerprints, captured by pressing the fingers flat against the card. These fingerprints were then filed away and manually compared by an examiner. Over time, digital fingerprint readers were developed, offering significant convenience compared to the traditional “ink on paper” capture techniques. The various mechanisms by which these digital fingerprint readers capture images are discussed below.

1.3.1.1 Sensing Technologies

The major sensing technologies used in fingerprint readers over the years include, but are not limited to, ultrasound, frustrated total internal reflection (FTIR), direct-view imaging, capacitance, thermal, and pressure sensors. Depending on the imaging technology used, the acquired images may have starkly different characteristics and appearances, as illustrated in Figure 1.7. These differences can pose a significant challenge to fingerprint recognition algorithms designed to work on images acquired by one type of fingerprint reader when applied to images captured on a completely different device. This difficulty is exacerbated if the two readers employ drastically different sensing technologies. Chapter two of this dissertation will concentrate on enhancing the interoperability of fingerprint recognition systems across various sensor types. Further details on individual sensing mechanisms are outlined below:

Figure 1.7 Example images from the NIST SD 302 dataset [78] of the same finger captured across 6 different fingerprint readers.

1. Optical: Among optical sensing technologies, frustrated total internal reflection (FTIR) is the most widely used. FTIR sensing involves using a light source and a glass prism, with a camera capturing reflected light from the fingerprint ridges and valleys, resulting in a high-contrast fingerprint image. Other optical fingerprint readers use direct-view imaging, where a light source illuminates the finger, and light from both ridges and valleys is reflected back towards the camera. Contactless optical readers typically offer lower contrast compared to FTIR systems but are more hygienic and less impacted by moisture on the finger due to environmental humidity.

2. Solid-State: Solid-state sensing technology operates using an array of mini-sensors measuring differentials in capacitance, temperature, or pressure between ridges and valleys of the finger. Due to their small size and low cost, solid-state sensors are commonly deployed in mobile devices [160].
3. Ultrasound: Ultrasound sensing emits acoustic waves toward the fingertip on the imaging platen and uses a receiver to gather the echoed response to develop a depth profile of the fingerprint. Ultrasound sensing is advantageous for capturing a subsurface fingerprint image, which is useful for detecting fake fingerprint attacks and alleviating contaminants on the surface of fingerprints. Recently, Qualcomm Inc. developed an in-display ultrasound sensor for mobile phones, widely deployed in the Samsung smartphone series (Galaxy S10 onwards) [160].

With the onset of the COVID-19 pandemic, there has been a surge in interest toward contactless acquisition, especially utilizing direct-view cameras such as readily available smartphone cameras. However, for the field to move toward contactless acquisition, several significant challenges stemming from the domain gap between contactless images and images acquired via one of the aforementioned contact-based acquisition methods need to be addressed. In particular, the presence of non-linear distortion in contact prints introduces alignment errors with respect to undistorted contactless prints. Paired with marked differences in the appearance of the images due to differences in lighting exposure, varying backgrounds, and varying stand-off between the finger and the imaging device, drastically different images may be captured between contactless and contact-based images of the same fingers. Varying perspective, since the finger is unconstrained along 6 degrees of freedom in space as the image is captured, also presents another set of unique challenges, both for contactless to contactless matching and contact to contactless matching. Chapter three of the dissertation proposes several methods to reduce the domain gap between contactless fingerprint images newly acquired with mobile phones and corresponding contact-based images that are commonly found in legacy databases.

1.3.2 Feature Extraction

To determine the identity of a given fingerprint image, a measurement (feature) space is required in which multiple fingerprint images of one identity (finger) are clustered together in the space and images belonging to another finger occupy a separate cluster in the space. Two main requirements of a good feature space are that the chosen features should be salient, meaning that the representation contains discriminative information about the fingerprint, and effective, meaning that the representation can be easily obtained, stored in a compact manner, and be useful for matching [160].

For purposes of recognition, a number of different fingerprint features have been proposed, which capture distinct information from the input fingerprint ridge structures. These features are commonly categorized in a hierarchical order based on the scale of the characteristics which they contain, as described below and visualized in Figure 1.8:

• Level-1: These global features encompass fingerprint pattern types (arch, loop, whorl), singular points (cores, deltas), ridge orientation, and ridge spacing. While useful for indexing and fingerprint alignment, they do not typically provide enough discriminative information for unique finger identification. Techniques such as image processing, detection of ridges with maximum curvature, or deep learning approaches are employed to extract these features [160].

Figure 1.8 Illustration of the three scales of features typically extracted from a fingerprint image.
• Level-2: These local features pertain to salient points where ridges exhibit discontinuity, such as ridge endings and bifurcations, known as minutiae points. There can be over 100 minutiae found in a rolled fingerprint, but the number of corresponding minutiae considered sufficient for a high-confidence match is believed to be around 12 to 15. A study from 2013 on the sufficiency of information for latent fingerprint examination cites 44 countries as having a minimum number of minutiae required for evidence, which can range from 4 to 16 points. For example, the United Kingdom requires a minimum of sixteen points, while 24 other countries (e.g., Australia, Germany, etc.) only require twelve. The United States does not have a minimum [239].

• Level-3: These features involve fine-grained characteristics of fingerprints, such as sweat pores, incipient ridges, scars, creases, and dots between ridges. While providing additional uniqueness, they typically require a minimum scanning resolution of 1000 ppi for successful extraction. Mainly utilized by latent fingerprint examiners for manual comparison, these features are not commonly employed in AFIS due to their lack of robustness, the cost of fingerprint reader construction, and slow extraction and matching speeds. However, recent developments in low-cost, high-resolution readers have led to the development of algorithms utilizing level-3 features for matching [120].

Figure 1.9 Handcrafted feature learning vs. data-driven feature learning.

State-of-the-art commercial-off-the-shelf (COTS) matchers may leverage either knowledge-driven (i.e., hand-crafted) techniques and/or data-driven methods, such as Convolutional Neural Networks (CNNs), for extracting fingerprint features, including those previously mentioned. Much of chapters 4 and 5 of this dissertation explores the complementary nature of these representations, with an emphasis on how to successfully leverage a combination of them for improved accuracy and efficiency in fingerprint recognition. Each of these two paradigms, knowledge-driven and data-driven, is discussed further in the following sections within the context of fingerprint recognition (see also Figure 1.9 for an illustration of the differences).

1.3.2.1 Knowledge-driven Representation

The prominent characteristic of each fingerprint image is the organization of various connected ridges and valleys, which typically appear as dark pixels for ridges and bright pixels for valleys in the captured digital images [160]. Before the prevalence of big data and large-scale computing, designers of automated fingerprint recognition systems relied on extensive knowledge about the problem domain (fingerprints) to extract salient features from these ridges and valleys to discriminate between different fingerprint identities. Riding on the back of Galton's claims about the uniqueness of minutiae points, many of these hand-crafted approaches were aimed at reliably extracting points of discontinuity along the ridges. The points were commonly denoted by their type (e.g., termination, bifurcation, etc.), x and y coordinates within the image, and the angle between the tangent to the connecting ridge line and the horizontal axis [160]. A number of preprocessing steps have proved useful for reliable minutiae extraction, including fingerprint image enhancement, segmentation, and binarization.
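As a brief, concrete illustration of this minutiae convention (an illustrative sketch only, with hypothetical field names and values; standardized minutiae template formats such as ISO/IEC 19794-2 define their own binary encodings), a minutia can be represented as a small record of its type, location, and orientation, and two minutiae can be compared by their positional and angular differences:

from dataclasses import dataclass
from enum import Enum
import math

class MinutiaType(Enum):
    RIDGE_ENDING = "ending"
    BIFURCATION = "bifurcation"

@dataclass
class Minutia:
    x: int            # column coordinate within the fingerprint image (pixels)
    y: int            # row coordinate within the fingerprint image (pixels)
    theta: float      # angle (radians) between the ridge tangent and the horizontal axis
    type: MinutiaType

def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two minutiae angles, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

# Two hypothetical minutiae from the same impression.
m1 = Minutia(x=120, y=215, theta=0.52, type=MinutiaType.RIDGE_ENDING)
m2 = Minutia(x=123, y=219, theta=0.49, type=MinutiaType.BIFURCATION)
print(angular_difference(m1.theta, m2.theta))  # approximately 0.03 rad

Minutiae-based matchers, discussed in Section 1.3.3, operate on lists of such records, pairing points whose positional and angular differences fall within small tolerances after global alignment of the two minutiae sets.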
Traditional algorithms for each of these steps employed the use of well-known image processing techniques, such as Gabor Filters for image enhancement [111], Prewitt and Sobel filters for ridge orientation estimation [87], sinusoidal- shaped waves for frequency estimation [154], and local histograms and morphological operations for fingerprint segmentation and binarization [170]. Finally, minutiae are detected from thinned versions of the binarized images by summing the differences between pairs of adjacent pixels in a neighborhood of each pixel [160]. 1.3.2.2 Data-driven Representation As advancements in deep learning proved remarkably useful across a wide array of problems in computer vision and machine learning, researchers began exploring the use of deep networks for various fingerprint recognition tasks. Some of these methods supplanted the hand-designed algorithms used for fingerprint feature extraction in the past; some examples include fingerprint enhancement [116, 126, 137, 271], segmentation [30], and minutiae extraction [29, 51, 187, 233]. From these developments arose a new approach to fingerprint representation in which a fingerprint image is input to a deep network (typically a CNN) and a single, fixed-length embedding is obtained. Among the first of the kind was a fingerprint representation network nicknamed DeepPrint, which would output a single 192-dimensional feature embedding [68]. What set DeepPrint apart from other fixed-length representation networks was its incorporation of minutiae features directly into the learning process. Thus, DeepPrint was able to show remarkable improvement in fingerprint recognition accuracy by embedding minutiae domain knowledge into the data-driven learning process of CNNs, while at the same time, delivering significant speedup in matching. A major focus of chapter 4 of this dissertation also aims to combine domain knowledge with data-driven 15 representations for improved performance of latent fingerprint recognition. 1.3.3 Matching A fingerprint matching algorithm compares two given fingerprint templates and typically outputs a similarity score, usually a value between 0 and 1. A score close to 0 implies low similarity, while a value near 1 indicates very high similarity. A match score above a specified threshold (t) is considered a successful match. A strict threshold (close to 1) enhances security by minimizing false accepts but may lead to a poor user experience due to increased false rejects. Fingerprint matching poses a challenge due to significant variability between different impressions of the same finger (intra-class variability), influenced by factors such as noise, displacement, partial overlap, pressure, and skin conditions [160]. There are essentially three broad categories of fingerprint matching approaches: 1. Correlation-based matching: This technique involves overlaying two fingerprint images and calculating the correlation between corresponding pixels for different alignments, rotations, and displacements. Due to its resource-intensive nature, these techniques are not widely used. 2. Minutiae-based matching: This is the most popular and widely deployed technique for fingerprint matching, used by both automated algorithms and fingerprint examiners. It entails finding the alignment between the reference minutiae set and the input query minutiae set that results in the maximum number of paired minutiae. 3. 
Non-minutiae feature-based matching: In cases of low-quality images, such as latent finger- prints, minutiae extraction becomes challenging. This category of matching approaches may utilize ridge pattern characteristics (e.g., local ridge frequency and orientation) or texture information using hand-crafted or deep learning methods [31]. Combining minutiae-based and texture-based features can significantly enhance the matching performance of latent fingerprints, including state-of-the-art deep-learning based methods with fixed length repre- sentation [68]. Besides accuracy, another important component of fingerprint matching algorithms is the 16 latency or time taken to return a match. For 1:1 authentication systems, the latency time becomes a nuisance factor for patrons utilizing the system, but anything close to real-time is sufficient to achieve a satisfactory experience. For search applications (1:N comparison), however, the latency can have a significant impact on the usability of the system with a large number of comparisons (e.g., tens of millions). For example, with the FBI’s NGI system, which maintains a database of roughly 185 million fingerprints of criminal, civil, and military individuals, the match speed becomes a significant [101] factor when searching a latent fingerprint probe to find a potential suspect match. Chapter 5 of this dissertation discusses a mutli-stage search strategy to significantly reduce the time required to search a gallery given an input latent fingerprint probe. 1.4 Challenges in the Generalization of Fingerprint Representations Despite notable progress in the evolution and acceptance of fingerprint recognition over the past few decades, substantial challenges persist and require attention. As mentioned in the introduction, a significant portion of these challenges arises from the fragility of fingerprint feature extraction and matching in the face of noisy, low-quality prints, diverse capture characteristics, and the scarcity of data for acquiring robust and broadly applicable features. The ensuing sections outline some of these specific challenges which this dissertation aims to address. 1.4.1 Sensor-interoperability: Lack of Sensor-invariant Features A well known issue of data-driven machine learning approaches is their susceptibility to overfit to the training data [60]. The general idea of overfitting in machine learning is when a model memorizes the training data too well, including its noise and specificities, to the extent that it negatively impacts the model’s generalization performance on new, unseen data. Given the high variability in the appearance of fingerprint images captured via various sensing mechanisms, learning robust, sensor-invariant features is of particular importance and interest in the fingerprint recognition community. 17 1.4.1.1 Universal Fingerprint Representation As introduced in Section 1.3.2, many different types of features have been proposed for finger- print recognition within the last century, which can be broadly classified into one of two categories: knowledge-based and data-driven. The most popular knowledge-based feature is that of minutiae points, which have become an essential component of automated fingerprint recognition systems. More recently, deep learning-based fingerprint representations (i.e., embeddings) have emerged as a viable and complementary alternative to minutiae features. However, minutiae features and deep learning-based embeddings each have their own advantages and disadvantages. 
Minutiae, which are explainable and have a strong background in scientific understanding, perform exceptionally well in high-quality fingerprint images, particularly in large area fingerprints which may contain a large number of minutiae, but their performance degrades rapidly in partial and very noisy fingerprint images, such as latent fingerprints or individuals with degraded finger skin [98]. On the other hand, fixed-length embeddings seem to improve the performance in low-quality and partial fingerprints compared to minutiae, and are orders of magnitude faster to match [68]. However, they suffer from a lack of transparency and explainability. Research into improved methods on how to leverage both representations for robust, universal fingerprint recognition is an area of active research. 1.4.1.2 Cross-sensor and Cross-material Spoof Detection One of the most significant challenges threatening the security of fingerprint recognition devices today is that of presentation attacks from adversaries trying to gain unauthorized access to these systems [165, 178]. A presentation attack (PA), as defined by the ISO standard IEC 30107- 1:2016(E), is a “presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system.” [117]. Common presentation attack artifacts include fingerprint casts constructed from molds using readily available household materials (gelatin, silicone, wood glue, etc.) that aim to mimic the ridge-valley structure of an enrolled user’s fingerprint [25, 66, 162, 169, 261]. This concern has led to a series of competitions on fingerprint presentation attack detection (PAD) methods to alleviate the vulnerability of these systems to presentation attacks. 18 Figure 1.10 Example raw and segmented (a) contactless and (b) latent fingerprints. As of recently, convolutional neural network (CNN) approaches have shown the best perfor- mance on the respective genuine vs. PA benchmark datasets. However, due to the different optical and mechanical properties, it has been shown that the PAD error rates of these approaches suffer up to a three-fold increase when applied to datasets containing PA materials not seen during train- ing, denoted as “cross-material generalization" [163, 229]. Similarly, a performance gap exists for “cross-sensor generalization", in which presentation attack algorithms are applied to fingerprint images captured on new fingerprint sensor devices that were not seen during training. 1.4.2 Unconstrained Fingerprint Recognition Two notable applications of fingerprint recognition in the last few years are in the unconstrained capture scenarios of contactless fingerprints, mostly by mobile phones which obviates the need for a separate fingerprint reader, and latent fingerprints left unintentionally at crime scenes. Contactless acquisition of fingerprint images suffers from a high degree of variability due to the six degrees of freedom during capture and latent prints are often extremely smudgy, occluded, and distorted (see Figure 1.10 for examples). The following two sections will discuss each of these challenging capture scenarios in detail. 1.4.3 Contact to Contactless Fingerprint Matching Compatability Despite the benefits of contactless fingerprint acquisition, imaging and subsequently matching a contactless fingerprint presents its own set of unique challenges. 
These include (i) low ridge- valley contrast, (ii) non-uniform illumination, (iii) varying roll, pitch, and yaw of the finger, (iv) varying background, (v) and perspective distortions due to the varying distances of the finger from the camera. Of particular importance is the lack of cross-compatibility with legacy databases of 19 contact-based fingerprints. Indeed, for widespread adoption of contactless fingerprint acquisition, specific algorithms and methods need to be developed to minimize the domain gap between contactless fingerprints and corresponding contact-based impressions. 1.4.4 Latent Fingerprint Recognition The reliability of automatic latent to rolled fingerprint matching considerably lags that of rolled to rolled fingerprint matching. As a result, some innocent individuals, like in the case of Brandon Mayfield [182], have unfortunately been incarcerated due to inaccurate latent to rolled comparison by automatic fingerprint identification systems (AFIS) and failure of forensic examiners to follow the FBI’s ACE-V protocol established in the 1980s [8]. Some of the reasons for low performance in latent fingerprint recognition include poor ridge-valley contrast, occlusion, distortion, varying background, and incomplete fingerprint patterns. Because of these challenges, latent fingerprint recognition remains one of the most challenging problems in biometrics, akin to matching poor quality face images from CCTV surveillance frames to mugshot photos. Few studies have focused on an end-to-end system to improve the latent to rolled fingerprint recognition pipeline, which is necessary since optimizing individual components separately may lead to sub-optimal performance when integrated together and tested as a complete system. Of those studies that do report on an end-to-end recognition system [28, 31, 234], the highest rank-1 retrieval rate reported in the academic literature is 65.7% [31], computed on 258 latent probes from NIST SD 27 against a background of 100K rolled fingerprints. However, increased interest in latent fingerprint recognition has led to the creation of the on-going Evaluation of Latent Fingerprint Technologies (ELFT) competition held by the National Institute of Standards and Technology (NIST), where these results are contiually improving [2]. 1.4.5 Limited Public Domain Datasets Recently, the community has seen a push toward deep neural network (DNN) based models for fingerprint recognition [26,68,93,139,143,217,219]. These compact, fixed-length embeddings can be matched efficiently and combined with homomorphic encryption for added security [74]. Indeed, this push toward DNN-based fingerprint recognition comes in the wake of the success demonstrated 20 in the face recognition domain in applying DNN models to face recognition, which was aided by the availability of large-scale face recognition databases scraped from the web, despite the many ethical and privacy concerns which have led to many of these datasets to be recalled today. Arguably, at least in part, the reason for the delayed adoption of DNNs for fingerprint recognition has been the lack of publicly available, large-scale fingerprint recognition datasets and increased scrutiny over privacy of biometric data, which has led to many works to generate synthetic fingerprint images [6, 9, 11, 19, 21, 23, 36, 71, 76, 125, 171, 173, 198, 252, 268]. 
Similarly, there has been an increased interest in DNN-based models for fingerprint presentation attack detection (PAD), i.e., spoof detection, where the scale and amount of publicly available data are also limited. Compounding the problem is the difficulty in collecting large-scale fingerprint PA datasets due to the increased time and complexity in fabricating and imaging artifacts mimicking realistic fingerprint ridge-valley structures. Further advancements in synthetic fingerprint generation are needed to improve the performance of models in the presence of a limited amount of real fingerprint data.

1.5 Thesis Contributions

This dissertation aims to improve the generalization performance of fingerprint recognition and fingerprint spoof detection in each of the challenging applications previously mentioned.

• Sensor and Material Agnostic Fingerprint Presentation Attack Detection

To improve the generalization of fingerprint presentation attack detection across novel PA materials and fingerprint sensing devices, we propose an approach which builds off any existing CNN-based architecture trained for fingerprint liveness detection. First, we incorporate a style transfer network wrapper to augment the training data with additional domain characteristics. Next, we utilize adversarial representation learning (ARL) to learn sensor- and material-invariant representations. These two approaches combined were shown to improve the cross-sensor (cross-sensor and cross-material) generalization performance from a TDR of 88.36% (78.76%) to a TDR of 93.03% (88.49%) at an FDR of 0.2%.

• Contact to Contactless Fingerprint Matching

We present a comprehensive, end-to-end solution for contact-contactless fingerprint matching that addresses the challenges inherent to each step in the contact to contactless matching process (mobile capture, segmentation, enhancement, scaling, non-linear warping, representation extraction, and matching). Our approach utilizes several deep learning-based pre-processing techniques to minimize the domain gap between contact and contactless fingerprints. Additionally, a new dataset of 9,888 2D contactless and corresponding contact-based fingerprint images from 206 subjects (2 thumbs and 2 index fingers per subject) was collected as part of this work and made public to advance much-needed research in this area. The smartphone contactless fingerprint capture app that was used to collect the dataset was made available as well.

• Universal Fingerprint Representation via Multi-model Embeddings

To address difficult edge cases where accurate fingerprint recognition remains challenging, such as partial overlap between two candidate fingerprint images and cross-sensor interoperability (e.g., optical to capacitive, contact to contactless, latent to rolled fingerprints, etc.), we introduce a novel deep learning-based fingerprint recognition architecture, AFR-Net (Attention-Driven Fingerprint Recognition Network), consisting of a shared feature extraction and parallel CNN and attention classification layers. This dual CNN and Vision Transformer (ViT) network successfully leverages the complementary representations of CNN and attention-based networks for improved recognition accuracy across several diverse fingerprint datasets and sensor domains.
Furthermore, we propose a two-stage matching algorithm to significantly improve recognition performance in the presence of low-quality and/or partial fingerprints, which was motivated by the observation that the intermediate feature maps of deep learning-based fingerprint recognition networks encode local features that are also useful for relating two candidate fingerprint images. We use these corresponding local features between a pair of candidate fingerprint images to guide the network in placing attention on overlapping regions of the images. This leads to a more accurate determination of whether the images are from the same finger, especially for fingerprint pairs whose similarities are close to the match threshold.

• Latent Fingerprint Recognition: Fusion of Local and Global Embeddings

To address the degradations commonly found in latent fingerprints (poor ridge-valley contrast, occlusion, distortion, varying background, incomplete fingerprint patterns, etc.), we describe a novel, deep learning-based latent fingerprint enhancement method, coupled with an end-to-end pipeline for latent fingerprint recognition which leverages both a learned, global fingerprint representation (i.e., the entire friction ridge pattern) and local representations (i.e., minutiae and virtual minutiae3) for improved accuracy and search speed of latent to rolled fingerprint recognition. Unlike existing latent to rolled fingerprint matching pipelines, which are highly tuned specifically for latent fingerprints, our learned representation and matching pipeline is generalizable and effective across a wide range of fingerprint sensors (e.g., optical, capacitive, etc.) and image domains (e.g., latent, rolled, plain, contactless captures via mobile phone cameras, etc.).

3 Virtual minutiae are densely sampled points on an evenly spaced grid on the extracted fingerprint ridge area.

• Synthetic Fingerprint Spoof Images

To address the lack of large-scale fingerprint PA datasets, this dissertation describes a synthetic fingerprint generator, SpoofGAN, which is capable of generating highly realistic fingerprint impressions of both real (i.e., bona fide) and spoof (i.e., presentation attack) fingerprint images. SpoofGAN employs a multi-stage generative architecture for fingerprint generation which can generate multiple, realistic fingerprint impressions from the same finger and a large number of different, unique fingers. We validate the realism of our synthetic bona fide and PA images through extensive qualitative and quantitative metrics, including NFIQ2 [228], minutiae statistics, match scores from a SOTA fingerprint matcher, and t-SNE feature space analysis showing the similarity of real bona fide and PA embeddings to the embeddings of our synthetic bona fide and PA fingerprints. Besides verifying the realism of our synthetic PA generator, we also show how SpoofGAN fingerprints can be used to train a DNN for fingerprint PAD. We show this by improving the performance of a PAD model by augmenting an existing fingerprint PA dataset with additional samples from our synthetic generator.

• Universal Fingerprint Generation

Existing methods for generating fingerprints have limitations in creating varied impressions of the same finger with useful intra-class variations and diversity.
To tackle this challenge, we present GenPrint, a framework designed to produce fingerprint images of various types while maintaining identity and offering human-understandable control over different appearance factors such as fingerprint class, acquisition type, sensor device, and quality level. Unlike previous fingerprint generation approaches, GenPrint is not confined to replicating style characteristics from the training dataset alone: it enables the generation of novel sensor and style attributes from unseen fingerprint acquisition devices during inference without requiring additional fine-tuning. To accomplish these objectives, we developed GenPrint using latent diffusion models with multimodal conditions (text and image) for consistent generation of style and identity. Our experiments leverage a variety of publicly available datasets for training and evaluation. Results demonstrate the benefits of GenPrint in terms of identity preservation, explainable control, and universality of generated images. Importantly, models trained on GenPrint-generated images achieve comparable or even superior accuracy to models trained solely on real data, and performance further improves when GenPrint is used to augment the diversity of existing real fingerprint datasets.
CHAPTER 2
SENSOR AND MATERIAL AGNOSTIC FINGERPRINT PRESENTATION ATTACK DETECTION
This chapter addresses the poor generalization of existing presentation attack detection (PAD) solutions to new presentation attack (PA) materials and fingerprint sensors not seen during training. We present a robust PAD solution via a combination of adversarial representation learning (ARL) and a CNN-based architecture embedded with a cross-sensor style transfer network wrapper. The style transfer component aims to address the challenge of training robust deep networks on limited training data, whereas ARL is leveraged to encode generalized representations across all classes of sensors and PA materials. Experimental results on the MSU-FPAD, Government Controlled Test (GCT), and LivDet 2015 and 2017 public domain datasets exhibit the effectiveness of the proposed approach in improving the cross-material and cross-sensor generalization performance.
2.1 Introduction
It has been observed that the security of automated fingerprint recognition systems may be compromised by presentation attacks from adversaries trying to gain unauthorized access to these systems [165, 178]. A presentation attack (PA), as defined by the ISO standard IEC 30107-1:2016(E) [117], is a "presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system." Common presentation attack artifacts include fingerprint casts constructed from molds using readily available household materials (gelatin, silicone, wood glue, etc.) that aim to mimic the ridge-valley structure of an enrolled user's fingerprint [25, 66, 162, 169, 261]. Several examples of these artifacts are shown in Figure 2.1.
This concern has led to a series of competitions on fingerprint PAD methods to alleviate the vulnerability of these systems to various PAs. The First International Fingerprint Liveness Detection Competition debuted in 2009 [166], with subsequent competitions every two years, the most recent being held in 2019 [86, 175, 176, 186, 257].
This chapter was previously published as S. A. Grosz, T. Chugh, and A. K. Jain, "Fingerprint Presentation Attack Detection: A Sensor and Material Agnostic Approach", IEEE International Joint Conference on Biometrics, Houston, TX, Sept. 2020. Copyright 2020 by IEEE. Reprinted with permission.
In the literature, presentation attack detection (PAD) is also commonly referred to as spoof detection and liveness detection. In this work, we use these terms interchangeably.
Figure 2.1 Example PA artifacts. Due to the varying visual and physical properties, many PAD algorithms fail to detect PA materials not seen during training.
Figure 2.2 Illustration of the differences in textural appearance of live fingerprints captured on six different fingerprint readers. Images from LivDet 2015 [175], LivDet 2017 [176], and MSU-FPAD datasets [41].
Approaches to fingerprint PAD can be broadly classified as either hardware-based, software-based, or a combination of both. Hardware-based methods leverage the physical properties of various PA materials and use a number of additional sensors to gain further insight into the liveness of the presented fingerprint [12, 67, 135]. Some examples of sensing technologies that are inherently well-suited for liveness detection and that have been used for fingerprint PAD are the multispectral Lumidigm sensor and OCT-based sensors [43]. In contrast, software-based solutions use only the information contained in the captured fingerprint image (or a sequence of images) to classify a fingerprint as bonafide or PA [41, 84, 85, 164, 167, 177, 188]. Recently, convolutional neural network (CNN) approaches have shown the best performance on the respective genuine vs. PA benchmark datasets.
Table 2.1 Summary of published fingerprint cross-material generalization studies1.
Study | Approach | Database | Performance
Rattani et al. [196] | Weibull-calibrated SVM | LivDet 2011 | EER = 19.70%
Ding & Ross [58] | Ensemble of multiple one-class SVMs | LivDet 2011 | EER = 17.06%
Chugh & Jain [41] | MobileNet-v1 trained on minutiae-centered local patches | LivDet 2011-2015 | ACE = 1.48% (LivDet 2015), 2.93% (LivDet 2011, 2013)
Chugh & Jain [42] | Identify a representative set of spoof materials to cover the deep feature space | MSU-FPAD v2.0, 12 PA materials | TDR = 75.24% @ FDR = 0.2%
Engelsma & Jain [72] | Ensemble of generative adversarial networks (GANs) | Custom database of 12 PA materials | TDR = 49.80% @ FDR = 0.2%
Gonzalez-Soler et al. [88] | Feature encoding of dense-SIFT features | LivDet 2011-2015 | TDR = 7.03% @ FDR = 1.0% (LivDet 2015), ACE = 1.01% (LivDet 2011, 2013)
Tolosana et al. [235] | Fusion of two CNN architectures trained on SWIR images | Custom database of 8 PA materials | EER = 1.35%
Gajawada et al. [79] | Style transfer from spoof to live images with a few samples of target material | LivDet 2015, CrossMatch sensor | TDR = 78.04% @ FDR = 0.1%
Chugh & Jain [44] | Style transfer between known spoof materials | MSU-FPAD v2.0, 12 PAs, LivDet 2017 | TDR = 91.78% @ FDR = 0.2% (MSU-FPAD v2.0), TDR = 80.74% @ FDR = 1.0% (LivDet 2017)
Grosz et al. [92] | Style transfer with a few samples of target sensor fingerprint images + SARL | LivDet 2015, LivDet 2017, MSU-FPAD | TDR = 87.86% @ FDR = 0.2% (cross-sensor and cross-material LivDet 2015)
Proposed Approach | Style transfer with a few samples of target sensor fingerprint images + SARL + MARL | LivDet 2015, LivDet 2017, MSU-FPAD, GCT3 | TDR = 88.49% @ FDR = 0.2% (cross-sensor and cross-material LivDet 2015)
1 Several of the earlier studies used EER and ACE to characterize performance; however, in practice, TDR at a fixed FAR or the ROC curve is required.
However, due to the differing optical and mechanical properties (see Figure 2.2), it has been shown that the PAD error rates of these approaches suffer up to a threefold increase when applied to datasets containing PA materials not seen during training, denoted as "cross-material generalization" [163, 229]. Some published studies aimed at reducing the performance gap due to cross-material evaluations are summarized in Table 2.1. These approaches generally fall under one of two categories of techniques: (i) single-class PAD [59, 73, 197] and (ii) synthetic data augmentation via style transfer or domain generalization [44, 79]. Many of these methods have shown incredible promise toward cross-material PAD generalization and have been applied toward other biometric modalities as well, including face [7] and iris [255]. The single-class methods circumvent the need to collect extensive training sets of a diverse class of PA materials, but they tend to lag the performance of binary classifiers on known material attacks. On the other hand, synthetic data generators via domain generalization or style transfer are effective in augmenting the number of data samples of target materials of which we have only a few examples; however, they cannot approximate all possible data distributions of unknown attacks.
A similar performance gap exists for cross-sensor generalization, in which presentation attack algorithms are applied to fingerprint images captured on new fingerprint sensing devices that were not seen during training. One explanation for the challenge of cross-sensor generalization is the different textural characteristics in the fingerprint images from different sensors (see Figure 2.2). This discrepancy in the representation performance between the "seen source domain" and the "unseen target domain" has been referred to as the "domain gap" in the deep learning literature [16]. The cross-sensor evaluation can be considered as two separate cases: (i) all sensors in the evaluation employ the same sensing technology, e.g., all optical FTIR, and (ii) the sensors may vary in the underlying sensing mechanisms used, e.g., optical direct-view vs. capacitive.
In this work, we aim to improve fingerprint presentation attack detection generalization across novel PA materials and fingerprint sensing devices. Our approach builds on any existing CNN-based architecture trained for fingerprint liveness detection by encouraging cross-sensor generalization via a style transfer network wrapper. We also incorporate adversarial representation learning (ARL) in deep neural networks (DNN) to learn sensor- and material-invariant representations for presentation attack detection. (Generally, fingerprint sensor refers to the fingerprint sensing mechanism (e.g., camera and prism for FTIR optical, direct-view camera, thermal measurement device, etc.) and fingerprint reader refers to the entire process of converting a physical fingerprint into a digital image. For purposes of this study, we use these two terms interchangeably.)
Figure 2.3 Overview of the network architecture for the proposed approach toward a more generalizable presentation attack detector.
The main contributions of this chapter are enumerated below:
1. A robust PAD solution with improved cross-material and cross-sensor generalization performance.
2. Our solution can be built on top of any CNN-based fingerprint PAD solution for cross-sensor and cross-material PA generalization using adversarial representational learning.
3.
Experimental evaluation of the proposed approach on the publicly available datasets LivDet 2015, LivDet 2017, MSU-FPAD, and GCT3. Our approach is shown to improve the cross-sensor (cross-sensor and cross-material) generalization performance from a TDR of 88.36% (78.76%) to a TDR of 93.03% (88.49%) at an FDR of 0.2%.
4. Feature space analysis of cross-sensor domain separation of the embedded representations prior to and following adversarial representation learning.
5. Detailed discussion of the challenges and techniques involved in applying deep adversarial representation learning for fingerprint PAD.
2.2 Related Work
In this section, we review the previous approaches to cross-material generalization of fingerprint PAD, which generally fall under two broad categories: (i) single-class classifiers and (ii) synthetic data augmentation. We also discuss the preliminaries of domain adaptation and domain generalization in the context of machine learning. Csurka provides a more in-depth review of domain adaptation [46]. Similarly, Wang and Deng provide a specific survey of the recent deep domain adaptation methods [244]. We also describe adversarial representation learning (ARL) as it is applied to the tasks of domain adaptation and domain generalization.
2.2.1 Single-Class Classification for PAD
Unlike cross-sensor PAD generalization, generalization to unseen PA species has received greater interest in the biometrics research community. One popular technique toward overcoming the challenge of unknown PAs is to design a one-class classifier that is trained to precisely learn the data distribution surrounding bonafide training examples and form a tight decision boundary around this distribution. In this way, any future test example which falls outside of the decision boundary will be classified as a presentation attack. The advantage of the one-class approach is that the classification is independent of the location of any particular PA species in the high-dimensional feature space, so long as it does not intersect with the space occupied by the bonafide representations. Preliminary works in fingerprint [59, 73, 197], face [7], and iris [255] PAD have shown incredible promise toward cross-material generalization; however, these techniques tend to lag the performance of binary classifiers on seen materials.
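The one-class formulation described above can be made concrete with a minimal, illustrative sketch: a one-class SVM fit only on bonafide feature vectors, with any test sample falling outside the learned boundary flagged as a PA. The embedding dimensionality, the random stand-in features, and the nu/gamma settings below are assumptions for illustration and do not correspond to any of the cited studies.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Stand-in 1024-D embeddings of bonafide training samples (e.g., taken from the
# penultimate layer of a PAD CNN); values here are random placeholders.
rng = np.random.default_rng(0)
bonafide_train = rng.normal(0.0, 1.0, size=(500, 1024))

# Fit the one-class boundary on bonafide data only; no PA samples are required.
# nu upper-bounds the fraction of bonafide training samples left outside the boundary.
oc_svm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
oc_svm.fit(bonafide_train)

# At test time, +1 = inside the bonafide boundary (accept as live),
# -1 = outlier (flag as a presentation attack), regardless of the PA material.
test_embeddings = rng.normal(0.5, 1.2, size=(10, 1024))
print(oc_svm.predict(test_embeddings))
```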
2.2.2 Cross-Material PAD Generalization via Style Transfer
A separate approach to cross-material generalization leverages style transfer to synthesize additional training data to approximate unknown PA material representations. Gajawada et al. [79] first proposed data augmentation via a style transfer network wrapper to synthetically augment the amount of PA training data for a specific target PA material. This approach helps overcome the challenge of collecting sufficient training data amenable to training deep neural networks, which have shown superior PAD performance. Chugh et al. [44] extended this idea to interpolate between known PA materials to digitally synthesize new PAs, which improved the cross-material generalization. Style transfer is a very effective data augmentation tool to increase the robustness of the learned representations; however, the diversity of PA materials which can be synthesized does not cover the full spectrum of possible PAs to be seen.
2.2.3 Domain Adaptation and Domain Generalization
A domain refers to a probability distribution over which data examples are drawn. In this context, domain adaptation and domain generalization are approaches to machine learning aimed at minimizing the performance gap between training data examples from a seen "source" domain and testing data from a related but different "target" domain. Therefore, domain adaptation and domain generalization are applied to situations in which the training and testing data points are not both independently and identically sampled from the same distribution. While domain adaptation involves training on labeled examples from the source domain and unlabeled data from the target domain, domain generalization assumes no access to labeled or unlabeled data examples from the target domain.
2.2.4 Adversarial Representation Learning (ARL)
Adversarial representation learning is a machine learning technique that can be applied to both domain adaptation and domain generalization. Adversarial representation learning has been applied in DNN architectures to extract discriminative representations for a given target prediction task (e.g., face recognition), while obfuscating some undesired attributes present in the data (e.g., gender information) [65, 82, 238, 263]. The general setup of ARL involves (i) an encoder network, (ii) a target prediction network, and (iii) an adversary network. The encoder network aims to extract a latent representation (z) that is not only informative for the target prediction task (t), but also does not leak any information for the sensitive task (s). Meanwhile, the adversary network is tasked with extracting the sensitive information from the encoded latent representation. The entire network is trained in a minimax game similar to the generative adversarial networks introduced by Goodfellow et al. [89].
In Xie et al., the parameters of the adversary network are optimized to maximize the likelihood of the sensitive label prediction, whereas the encoder is trained to maximize the likelihood of the target task while minimizing the likelihood of the sensitive task [254]. In contrast, our proposed work is more aligned with the approach of Roy and Boddeti [206], where the adversary network is optimized to maximize the likelihood of the sensitive label prediction from the latent representation, and the encoder is trained to maximize the entropy of the sensitive label prediction. In this manner, the base network is encouraged to encode a representation that aims to confuse the sensitive label prediction such that the adversary predicts equal probabilities (maximum entropy) for all classes of the sensitive label. In tandem with our preliminary publication of this work combining style transfer with adversarial representation learning, Pereira et al. [193] also applied adversarial learning to encourage improved cross-material generalization. Our work stands out as we also incorporate adversarial learning on the sensor domains to learn a sensor-invariant as well as a material-invariant representation.
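The following is a minimal PyTorch sketch of the adversarial setup summarized above: an encoder, a target (live vs. PA) head, and a sensor adversary trained in alternation, with the encoder penalized toward a maximum-entropy adversary posterior in the spirit of Roy and Boddeti [206]. The stand-in fully connected encoder, layer sizes, optimizers, alpha value, and alternation schedule are illustrative assumptions, not the configuration used in this chapter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_SENSORS = 3  # sensitive label: which source sensor produced the patch (assumed)

# Stand-in encoder for a CNN backbone; target head predicts live vs. PA,
# adversary head predicts the sensor origin from the latent representation z.
encoder = nn.Sequential(nn.Linear(96 * 96, 1024), nn.ReLU())
target_head = nn.Linear(1024, 2)
adversary_head = nn.Linear(1024, NUM_SENSORS)

opt_enc = torch.optim.Adam(list(encoder.parameters()) + list(target_head.parameters()), lr=1e-4)
opt_adv = torch.optim.Adam(adversary_head.parameters(), lr=1e-4)
alpha_s = 0.1  # trade-off weight for the entropy penalty (assumed value)

def train_step(x, y_live, y_sensor):
    # (1) Adversary update: maximize the likelihood of the sensitive (sensor) label
    #     given a frozen copy of the encoded representation.
    with torch.no_grad():
        z = encoder(x)
    adv_loss = F.cross_entropy(adversary_head(z), y_sensor)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # (2) Encoder/target update: predict live vs. PA while pushing the adversary's
    #     posterior toward maximum entropy (equal probability for every sensor).
    z = encoder(x)
    task_loss = F.cross_entropy(target_head(z), y_live)
    p_sensor = F.softmax(adversary_head(z), dim=1)
    neg_entropy = (p_sensor * torch.log(p_sensor + 1e-8)).sum(dim=1).mean()
    enc_loss = task_loss + alpha_s * neg_entropy
    opt_enc.zero_grad()
    enc_loss.backward()
    opt_enc.step()
    return task_loss.item(), adv_loss.item()

# Example usage with random stand-ins for flattened 96 x 96 minutiae patches.
x = torch.randn(8, 96 * 96)
print(train_step(x, torch.randint(0, 2, (8,)), torch.randint(0, NUM_SENSORS, (8,))))
```

A material adversary would be added analogously, with its own entropy term weighted by a second coefficient.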
2.3 Proposed Approach
Our proposed approach toward fingerprint PAD generalization is a combination of style transfer and adversarial representational learning. For the style transfer component, we have adapted the Universal Material Generator (UMG) from Chugh et al. [44] to transfer style between sensor domains rather than between PA material types. Additionally, we include ARL to enforce invariance amongst the various sensing devices and unseen PA materials. An overview of the approach, which highlights each of the individual components, is shown in Figure 2.3. We discuss the workings and motivations of each individual component of the algorithm in the following sections.
2.3.1 Base CNN
What we refer to as the "base CNN" approach is a CNN trained on 96 x 96 aligned, minutiae-centered patches for classifying a given fingerprint impression as live or PA. It was shown by Chugh and Jain [41] that utilizing minutiae patches, as opposed to whole images, has two stark advantages: (i) it alleviates the difficulty of limited training data in the existing bonafide vs. PA public datasets and (ii) it encourages the network to learn local textural cues to robustly separate bonafide from fake fingerprints. This base CNN approach is illustrated in Figure 2.3 as the box enclosed by the green line. Importantly, this network is trained without any specific methods to improve cross-sensor and/or cross-material performance.
For most of our experiments, we utilize the MobileNet-v1 model [112] as our base CNN architecture (the same as in [41]; any other CNN-based approach could be used instead). To adapt the network for the two-class (live vs. PA) problem of fingerprint PAD, we replace the final 1000-unit softmax layer with a 2-unit softmax layer, and the network is trained with a batch size of 64 with an RMSProp optimizer. Additionally, to encourage robustness and avoid overfitting to minute variations of the input images, data augmentation tools of random distorted cropping, horizontal flipping, and random brightness were employed during training.
2.3.2 Adversarial Representational Learning (ARL)
ARL is an approach to domain generalization that aims to learn a generalized and robust feature representation for both source and target domains that performs well on predicting some target label while obfuscating the information related to a sensitive label. To accurately predict the target label in each domain, the goal of the "ARL" approach is to encourage an encoding network to learn a representation that is invariant between both domains. The ARL approach is shown in Figure 2.3 by the box enclosed by the red line. In the proposed approach, we incorporate two adversary networks with their own adversarial tasks: (i) a sensor adversary and (ii) a material adversary.
2.3.2.1 Sensor Adversarial Representational Learning (SARL)
SARL is used to learn a representation that is agnostic to which sensor generated the input fingerprint images (sensitive label), while accurately predicting live vs. PA (target label). In this setup, the encoder network is represented as a deterministic function, $\mathbf{z} = E(\mathbf{x}; \theta_E)$, the target prediction network estimates the conditional distribution $p(t \mid \mathbf{x})$ through $q_T(t \mid \mathbf{z}; \theta_T)$, and the sensor adversary network estimates the conditional distribution $p(s \mid \mathbf{x})$ through $q_{SA}(s \mid \mathbf{z}; \theta_{SA})$, where $\mathbf{x}$ denotes the input fingerprint image, and $p(t \mid \mathbf{x})$ and $p(s \mid \mathbf{x})$ represent the probabilities of the target label (PA vs. bonafide) $t$ and sensitive label (sensor origin) $s$, respectively. To learn this sensor-invariant representation, the sensor adversary network is trained to maximize the likelihood of predicting which sensor generated the input fingerprint image from the encoded representation. The parameters, $\theta_{SA}$, of the sensor adversary network are updated to minimize the loss defined in Eq. (2.1):

$$\mathcal{L}_{SA} = \mathbb{E}_{\mathbf{x}, s}\left[-\log q_{SA}(s \mid E(\mathbf{x}; \theta_E); \theta_{SA})\right] \quad (2.1)$$

2.3.2.2 Material Adversarial Representational Learning (MARL)
MARL is used to learn a representation that is agnostic to the fabrication material of a given PA sample.
The loss used to update the parameters of the material adversary is defined in Eq. (2.2), which mirrors that of the sensor adversary, except that the sensitive task of predicting a given sample's sensor origin (s) is replaced with predicting its material composition (m):

$$\mathcal{L}_{MA} = \mathbb{E}_{\mathbf{x}, m}\left[-\log q_{MA}(m \mid E(\mathbf{x}; \theta_E); \theta_{MA})\right] \quad (2.2)$$

The output of each adversary network is used to encourage the encoder to produce a representation that obfuscates the sensitive class labels by penalizing the parameters of the encoder, $\theta_E$, to minimize the loss in Eq. (2.3), where $\alpha_s$ and $\alpha_m$ are hyper-parameters that allow for a trade-off between obfuscation of the sensitive labels and prediction of the target label. Meanwhile, to accurately predict bonafide vs. PA, the parameters of the target prediction network, $\theta_T$, are optimized to minimize the loss in Eq. (2.4).

$$\mathcal{L}_{E} = \mathbb{E}_{\mathbf{x}, t}\left[-\log q_{T}(t \mid E(\mathbf{x}; \theta_E); \theta_T)\right] + \alpha_s\, \mathbb{E}_{\mathbf{x}}\left[\sum_{i=1}^{m} q_{SA}(s_i \mid E(\mathbf{x}; \theta_E); \theta_{SA}) \log q_{SA}(s_i \mid E(\mathbf{x}; \theta_E); \theta_{SA})\right] + \alpha_m\, \mathbb{E}_{\mathbf{x}}\left[\sum_{i=1}^{m} q_{MA}(m_i \mid E(\mathbf{x}; \theta_E); \theta_{MA}) \log q_{MA}(m_i \mid E(\mathbf{x}; \theta_E); \theta_{MA})\right] \quad (2.3)$$

$$\mathcal{L}_{T} = \mathbb{E}_{\mathbf{x}, t}\left[-\log q_{T}(t \mid E(\mathbf{x}; \theta_E); \theta_T)\right] \quad (2.4)$$

2.3.3 Naïve
A simple approach to cross-sensor generalization is to assume access to a limited number of training examples (100 live and 100 PA fingerprint images) from the target sensor. This is a reasonable assumption in the case of cross-sensor generalization, where we have access to the sensing device on which the system will be deployed, and it helps alleviate the necessity of collecting extensive amounts of data from the target domain. This is in contrast to generalization to unknown PA materials, where we cannot assume any prior knowledge of the unknown target materials. We denote this method as the "naïve" approach to cross-sensor PAD as it does not require any changes to the system architecture.
Figure 2.4 Example style transfer using the Universal Material Generator (UMG) to transfer the content of a source sensor minutiae patch (GreenBit) to the style of a target sensor minutiae patch (Biometrika).
2.3.4 Universal Material Generator (UMG)
The final aspect of our proposed approach is the incorporation of the style transfer network, which is used to augment the training data from the target sensor. The specific style transfer method we use is the Universal Material Generator (UMG) proposed in [44], which we have adapted for style transfer between sensor domains rather than between PA material types. Our version of the UMG accepts source and target domain minutiae patches as input and produces a large amount of synthetic training images in the target sensor domain. The style transfer is achieved through learning a mapping from the style of the source domain image patches to the style of the target domain image patches. Concretely, the network separates the content information, i.e., the fingerprint ridge structure, and the style, i.e., textural information, of a given fingerprint minutiae patch and produces a synthetic image that has the content of the source domain and the style of the target domain. An example of the style transfer between a content image from GreenBit to Biometrika is shown in Figure 2.4, and an overview of the "UMG" approach is shown as the box enclosed by the blue line in Figure 2.3.
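The UMG itself is a trained generative network [44]; purely to make the content/style separation above concrete, the snippet below shows a generic AdaIN-style statistics transfer, in which the channel-wise mean and standard deviation of a target-sensor feature map are imposed on a source-sensor feature map. This is a simplified stand-in under assumed tensor shapes, not the UMG architecture.

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    # Re-normalize the content feature maps (ridge layout of the source-sensor patch)
    # to carry the channel-wise mean/std of the style feature maps (texture statistics
    # of the target-sensor patch). Shapes: (N, C, H, W).
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return ((content_feat - c_mean) / c_std) * s_std + s_mean

# Toy example: feature maps of a GreenBit (content) patch and a Biometrika (style)
# patch, as if extracted by a shared encoder (shapes are assumptions).
content = torch.randn(1, 64, 24, 24)
style = torch.randn(1, 64, 24, 24)
print(adain(content, style).shape)  # torch.Size([1, 64, 24, 24])
```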
2.3.5 UMG + SARL + MARL (Proposed Approach)
The proposed approach applies ARL (both SARL and MARL) together with the UMG style transfer wrapper to further improve generalization performance. The "UMG + SARL + MARL" approach is illustrated in Figure 2.3 as everything enclosed by the solid black line. Like the naïve approach, this method inherently assumes knowledge of a limited set of examples from the target domain sensor. Specifically, we assume 100 live and 100 PA images from the target sensor. From this small set of images from the target sensor, we produce a much larger set of synthetic images in the target domain using the UMG wrapper to transfer the style of the target domain to the content of the source domain training images. The advantage of this approach is that we leverage the ability of the UMG wrapper to ensure a balanced dataset from all sensors (source and target), which we combine with ARL that forces the network to learn a sensor-invariant and material-invariant representation. In the following section, we demonstrate the performance gains over the previous approaches and show that the UMG coupled with ARL achieves the new state-of-the-art in cross-sensor and cross-material generalization for fingerprint PAD.
2.4 Evaluation Procedure
In this section we describe the experimental protocol of the various experiments carried out in this chapter, the datasets involved in each experiment, and the implementation details of the proposed approach.
2.4.1 Experimental Protocol
We adopt the leave-one-out protocol to evaluate cross-sensor PAD performance, where one sensor is set aside for testing and the network is trained on data from all remaining sensors. The setup is then repeated for all combinations of source and target sensors. To separately study the impact of cross-sensor from cross-material performance, we divide our evaluation into two cases: one which includes all materials from the testing set in training (cross-sensor only), and one in which none of the testing materials were included in training (cross-sensor and cross-material).
Table 2.2 Summary of the 2015 and 2017 Liveness Detection (LivDet) datasets.
Dataset | Fingerprint Reader | Model | Image Size | Resolution (dpi) | #Live Images Train / Test | #PA Images Train / Test
LivDet 2015 | Green Bit | DactyScan26 | 500 x 500 | 500 | 1000 / 1000 | 1000 / 1500
LivDet 2015 | Biometrika | HiScan-PRO | 1000 x 1000 | 1000 | 1000 / 1000 | 1000 / 1500
LivDet 2015 | Digital Persona | U.are.U 5160 | 252 x 324 | 500 | 1000 / 1000 | 1000 / 1500
LivDet 2015 | CrossMatch | L Scan Guardian | 640 x 480 | 500 | 1510 / 1500 | 1473 / 1448
LivDet 2017 | Green Bit | DactyScan84C | 500 x 500 | 569 | 1000 / 1700 | 1200 / 2040
LivDet 2017 | Orcanthus | Certis2 Image | 300 x n† | 500 | 1000 / 1700 | 1180* / 2018
LivDet 2017 | Digital Persona | U.are.U 5160 | 252 x 324 | 500 | 1000 / 1692 | 1199 / 2028
PA Materials (LivDet 2015): Ecoflex, Gelatine, Latex, Wood Glue, Liquid Ecoflex, RTV, Body Double, PlayDoh, OOMOO
PA Materials (LivDet 2017): Wood Glue, Ecoflex, Body Double, Gelatine, Latex, Liquid Ecoflex
† Fingerprint images captured by Orcanthus have a variable height (350 - 450 pixels) depending on the friction ridge content.
* A set of 20 Latex PA fingerprints were present in the training data of Orcanthus; these were excluded in our experiments because only Wood Glue, Ecoflex, and Body Double are expected to be in the training dataset.
Furthermore, we analyze separately the cross-sensor scenarios where all train and test sensors employ the same sensing technology and when the training and test sensors employ different sensing technology. The following defines the three test scenarios:
1. Cross-Sensor Only: Training on all but one sensor and evaluating generalization performance on the left-out sensor. Test datasets contain only materials seen during training and each sensor employs the same sensing technology.
2.
Cross-Sensor and Cross-Material: Training on all but one sensor and evaluating generalization performance on the left-out sensor. Test datasets contain only materials not seen during training and every sensor uses the same sensing technology.
3. Cross-Sensing Technology: Training on all but one sensor and evaluating generalization performance on the left-out sensor. Test datasets contain only materials seen during training and the left-out sensor employs a different sensing technology than the training sensors.
Table 2.3 Summary of the MSU-FPAD and GCT3 datasets.
Dataset | Fingerprint Reader | Model | Image Size | Resolution (dpi) | #Live Images Train / Test | #PA Images Train / Test
MSU-FPAD | CrossMatch | Guardian 200 | 750 x 800 | 500 | 2250 / 2250 | 3000 / 3000
MSU-FPAD | Lumidigm | Venus 302 | 400 x 272 | 500 | 2250 / 2250 | 2250 / 2250
GCT3¹ | Optical A | N/A | 1600 x 1600 | 500 | 10527 / 2632 | 641 / 161
GCT3¹ | Optical B | N/A | 180 x 256 | 385 | 6588 / 1647 | 179 / 45
GCT3¹ | Optical C | N/A | 500 x 500 | 500 | 6393 / 1599 | 305 / 77
GCT3¹ | Optical D | N/A | 500 x 468 | 1000 | 8288 / 2072 | 596 / 149
PA Materials (MSU-FPAD): Ecoflex, PlayDoh, 2D Print (Matte Paper), 2D Print (Transparency)
PA Materials (GCT3): Ecoflex, Gelatine, Silicone, Gummy Overlay with Conductive Silicone
1 Sponsor approval is required to release sensor name and model. Instead, descriptive identifiers are used based on the data types each sensor captures.
2.4.2 Datasets
The data used in the experiments are from the LivDet 2015, LivDet 2017, MSU-FPAD, and IARPA ODIN Government Control Test (GCT) datasets, which are summarized in Tables 2.2 and 2.3. The LivDet 2015 dataset consists of four sensors: Biometrika, CrossMatch, Digital Persona, and Green Bit. These sensors are all FTIR optical image capturing devices. We utilize this dataset to evaluate the generalization performance across different fingerprint readers with the same sensing technology. To evaluate performance on fingerprint readers with different sensing mechanisms, we experiment on fingerprint data from the Lumidigm sensor of the MSU-FPAD dataset. This sensor uses a different sensing technology from the four in LivDet 2015 as it is a multispectral, direct-view capture device. Furthermore, we incorporate a third dataset, LivDet 2017, which consists of three sensors: Digital Persona, Green Bit, and Orcanthus, where Orcanthus uses thermal-based imaging. Finally, a subset of four optical-FTIR sensors from phase III of the IARPA ODIN sponsored Government Control Test data collection (GCT3) was also used.
2.4.3 Implementation Details
The architecture of the encoder in the proposed approach is MobileNet-v1 with the final 1000-unit softmax layer removed, which is used to encode a latent representation $\mathbf{z} \in \mathbb{R}^d$. In our implementation, $d = 1024$. The target predictor is a single fully connected layer of 2 dimensions (for predicting live vs. PA) with a softmax activation. The sensor (material) adversary network consists of a fully connected layer with a softmax activation of output dimension equal to the number of source sensors (PA materials) in the training dataset.
Training adversarial losses often requires extensive hyper-parameter tuning. For example, it was found advantageous during training to update the parameters, $\theta_A$, of the adversary networks five times for every update of the encoder and target predictor. We also explored adjusting the number of hidden layers in the adversary networks, but no significant improvements over a single-layer network were observed.
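A minimal Keras sketch of the base-CNN configuration described above (MobileNet backbone on 96 x 96 patches, a 2-unit softmax in place of the 1000-unit ImageNet classifier, RMSProp, and simple augmentations) is given below. The augmentation parameters, learning rate, and random weight initialization are assumptions for illustration rather than the exact training recipe.

```python
import tensorflow as tf

# MobileNet (v1) backbone on 96 x 96 minutiae-centered patches; the original
# 1000-unit ImageNet classifier is dropped (include_top=False) and replaced by
# a 2-unit softmax for bonafide vs. PA.
backbone = tf.keras.applications.MobileNet(
    input_shape=(96, 96, 3), include_top=False, weights=None, pooling="avg")

model = tf.keras.Sequential([
    # Simple stand-ins for the random crop / horizontal flip / brightness jitter
    # mentioned in the text (exact settings are assumptions).
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomBrightness(0.1),
    backbone,
    tf.keras.layers.Dense(2, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# Training would then proceed with, e.g.:
# model.fit(train_patches, train_labels, batch_size=64, epochs=50)
```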
A grid search was performed over the values of $\alpha_s$ and $\alpha_m$ for selecting the influence of the adversarial loss on updating the parameters, $\theta_E$, of the encoder, and the optimal parameter values of $\alpha_s = 0.1$ and $\alpha_m = 0.05$ were selected for the sensor adversary and material adversary, respectively (see Eq. 2.3).
2.5 Experimental Results
Here we present the results of each experiment to evaluate the generalization performance of the proposed approach. This section is divided into several parts to facilitate an in-depth analysis of the generalization performance of the algorithm in each of the following cases: cross-sensor, cross-sensor and cross-material, and cross-sensing technology. A discussion on the effect of varying the number of assumed target domain images is included, as well as an analysis of the number of PA materials seen during training. We conclude this section by examining the deep feature space prior to and following the application of the proposed methodology for fingerprint PAD generalization. The feature space analysis is conducted utilizing a t-Distributed Stochastic Neighbor Embedding (t-SNE) visualization [153].
There has not been much prior work aimed specifically at improving cross-sensor generalization of fingerprint PAD; nonetheless, there are a few cross-sensor performance results reported in the literature.
Table 2.4 Cross-sensor generalization performance (TDR (%) @ FDR = 0.2%) with leave-one-out protocol on the LivDet 2015 dataset with materials common to training and testing, i.e., excluding cross-materials. Bio = Biometrika, CM = CrossMatch, DP = Digital Persona, and GB = GreenBit. The proposed method (UMG + SARL + MARL) provides the best cross-sensor generalization, indicated by the highest mean TDR and small s.d. on the target sensor. Each leave-one-out column lists Source TDR / Target TDR.
Method | CM, DP, GB → Bio | Bio, DP, GB → CM | Bio, CM, GB → DP | Bio, CM, DP → GB | Source Mean ± s.d. | Target Mean ± s.d.
Base CNN [112] | 90.34 / 75.16 | 88.20 / 3.33 | 98.40 / 10.76 | 92.82 / 70.74 | 92.44 ± 4.40 | 40.00 ± 38.21
SARL | 93.44 / 80.51 | 91.03 / 2.11 | 98.73 / 11.74 | 92.04 / 64.74 | 93.81 ± 3.43 | 39.78 ± 38.67
Naïve | 87.74 / 84.80 | 88.23 / 97.37 | 96.96 / 59.13 | 88.08 / 90.68 | 90.25 ± 4.48 | 83.00 ± 16.72
UMG [44] | 89.10 / 94.33 | 84.28 / 90.70 | 96.39 / 71.85 | 78.14 / 96.57 | 86.98 ± 7.71 | 88.36 ± 11.27
Naïve + SARL | 90.18 / 91.86 | 87.87 / 98.95 | 94.21 / 52.07 | 89.15 / 83.92 | 90.35 ± 2.74 | 81.70 ± 20.69
UMG + SARL [92] | 88.98 / 92.83 | 88.48 / 97.54 | 96.18 / 87.61 | 86.88 / 93.78 | 90.13 ± 4.13 | 92.94 ± 4.09
UMG + SARL + MARL | 96.22 / 91.10 | 87.60 / 100.0 | 97.72 / 84.10 | 93.15 / 96.90 | 93.67 ± 4.47 | 93.03 ± 7.00
1 We use FDR = 0.2% because this is the stringent metric being used by the IARPA Odin program. Due to space limits, it is challenging to show the complete Receiver Operating Characteristic (ROC) or Detection Error Tradeoff (DET) curve.
2 Liquid Ecoflex and RTV materials were excluded from the testing sets of Green Bit, Biometrika, and Digital Persona. Body Double, PlayDoh, and OOMOO were excluded from the testing set of CrossMatch.
Table 2.5 Cross-sensor generalization performance (TDR (%) @ FDR = 0.2%) with leave-one-out protocol on the GCT3 dataset. A = optical sensor type A, B = optical sensor type B, C = optical sensor type C, and D = optical sensor type D. Each leave-one-out column lists Source TDR / Target TDR.
Method | A, B, D → C | A, C, D → B | B, C, D → A | Source Mean ± s.d. | Target Mean ± s.d.
Base CNN [112] | 89.95 ± 4.12 / 81.21 ± 7.03 | 90.95 ± 2.88 / 49.10 ± 14.15 | 86.48 ± 1.55 / 92.73 ± 5.60 | 89.13 ± 2.85 | 74.35 ± 8.93
UMG [44] | 96.34 ± 0.32 / 91.42 ± 0.56 | 91.12 ± 2.65 / 68.02 ± 10.18 | 91.81 ± 0.94 / 96.45 ± 0.46 | 93.09 ± 1.30 | 85.30 ± 3.73
UMG + SARL [92] | 95.80 ± 0.42 / 92.23 ± 1.22 | 93.57 ± 3.07 / 74.62 ± 8.31 | 89.21 ± 0.74 / 96.45 ± 0.76 | 92.86 ± 1.41 | 87.77 ± 3.43
UMG + SARL + MARL | 95.80 ± 0.96 / 91.96 ± 0.95 | 92.64 ± 3.55 / 78.68 ± 5.88 | 90.75 ± 0.71 / 95.98 ± 0.08 | 93.06 ± 1.74 | 88.87 ± 2.30
Chugh and Jain report the cross-sensor performance of Fingerprint Spoof Buster, which shares the same architecture as our base encoder model [41]. Therefore, in the following sections we can expect the performance of Fingerprint Spoof Buster to be similar to that of the Base CNN model. Furthermore, Chugh and Jain report cross-sensor results in their work toward improving cross-material generalization with the introduction of their UMG network wrapper [44]. This approach is comparable to what we refer to as the UMG approach in the following tables of this section.
Table 2.6 Cross-sensor and cross-material generalization performance (TDR (%) @ FDR = 0.2%) with leave-one-out protocol on the LivDet 2015 dataset with materials exclusive to the testing datasets, i.e., cross-material only. Bio = Biometrika, CM = CrossMatch, DP = Digital Persona, and GB = GreenBit. Each leave-one-out column lists Source TDR / Target TDR.
Method | CM, DP, GB → Bio | Bio, DP, GB → CM | Bio, CM, GB → DP | Bio, CM, DP → GB | Source Mean ± s.d. | Target Mean ± s.d.
Base CNN [112] | 90.34 / 63.92 | 88.20 / 4.46 | 98.40 / 11.39 | 92.82 / 72.39 | 92.44 ± 4.40 | 38.04 ± 35.06
SARL | 92.78 / 72.58 | 91.03 / 6.06 | 98.73 / 13.08 | 92.04 / 49.69 | 93.65 ± 3.47 | 35.35 ± 31.33
Naïve | 87.74 / 77.11 | 88.23 / 96.80 | 96.96 / 42.62 | 88.08 / 85.69 | 90.25 ± 4.48 | 75.56 ± 23.39
UMG [44] | 89.10 / 87.01 | 84.28 / 81.37 | 96.39 / 54.43 | 78.14 / 92.23 | 86.98 ± 7.71 | 78.76 ± 16.82
Naïve + SARL | 90.18 / 86.19 | 87.87 / 97.45 | 94.21 / 35.65 | 82.51 / 65.44 | 88.69 ± 4.88 | 71.18 ± 27.15
UMG + SARL [92] | 89.31 / 89.07 | 88.48 / 92.69 | 96.18 / 78.69 | 86.88 / 91.00 | 85.51 ± 7.00 | 87.86 ± 6.29
UMG + SARL + MARL | 92.37 / 90.20 | 77.20 / 99.54 | 93.83 / 71.60 | 89.90 / 92.60 | 88.33 ± 7.59 | 88.49 ± 11.93
2.5.1 Cross-Sensor Performance
To evaluate cross-sensor generalization we utilize the LivDet 2015 dataset and the GCT3 dataset, which both consist of four different FTIR optical fingerprint imaging devices. The training and test sets of each GCT3 sensor contain exactly the same set of PA materials, whereas LivDet 2015 has additional PA materials in the testing sets. To separate the cross-sensor generalization performance from the related task of cross-material generalization on the LivDet 2015 dataset, we first remove all the non-overlapping materials between the testing dataset of the target sensor and the training datasets of the three source sensors. For this experiment, Liquid Ecoflex and RTV materials were excluded from the testing sets when Green Bit, Biometrika, and Digital Persona were the target sensors, whereas Body Double, PlayDoh, and OOMOO were excluded from the testing set with CrossMatch as the target sensor.
As shown in Tables 2.4 and 2.5, the proposed approach of UMG + SARL + MARL increases the average cross-sensor generalization in terms of True Detection Rate (TDR) at a False Detection Rate (FDR) of 0.2% from 88.36% to 93.03% over the UMG only method. (We consider this metric to be more representative of actual use cases than EER and ACE; space limitations do not allow us to show the full ROC curves.) The proposed approach also maintains higher performance (TDR = 93.67%) on the source domain sensors compared to the UMG only approach (TDR = 86.98%). Lastly, we note that the standard deviation (s.d.) across the four experiments of cross-sensor generalization on the LivDet 2015 dataset is significantly reduced for the UMG + SARL + MARL method (11.27% to 7.00%) in comparison to UMG only, indicating the robustness of the proposed approach.
For completeness, we include an evaluation using an additional CNN architecture, ResNet-v1-50 [106], as the base encoder to demonstrate the generality of the proposed approach. (ResNet-v1-50 was chosen since the authors of other SOTA fingerprint PAD algorithms were not willing to share their code and we found the details of their reported implementations insufficient for reproducing a fair evaluation.) In Table 2.7, we report the performance with ResNet-v1-50 as the Base CNN model on LivDet 2015 with Biometrika left out as the target sensor. We see that the performance improvement is consistent for both Base CNN models, supporting the generality of the approach to any existing CNN architecture trained for fingerprint PAD. In the remaining experiments, we continue to report results for only MobileNet-v1 as the Base CNN model.
Table 2.7 Cross-sensor generalization performance (TDR (%) @ FDR = 0.2%) on leave-out Biometrika (LivDet 2015) using ResNet-v1-50 as the base CNN model. Bio = Biometrika, CM = CrossMatch, DP = Digital Persona, and GB = GreenBit.
Test Dataset(s) | Base CNN [106] | SARL | Naïve | UMG [44] | Naïve + SARL | UMG + SARL [92] | UMG + SARL + MARL
Source: CM, DP, GB | 65.29 | 72.72 | 73.55 | 72.76 | 73.05 | 75.94 | 87.18
Target: Bio | 76.02 | 72.27 | 90.79 | 91.76 | 92.18 | 92.83 | 93.50
2.5.2 Cross-Sensor and Cross-Material Performance
In this section, we compare the performance of each solution in the cross-sensor and cross-material experiment by following the same procedure as the cross-sensor experiment, while including only materials exclusive to the test datasets of LivDet 2015. We saw that the adversarial learning improved the cross-sensor performance by enforcing a sensor-invariant representation, and we now evaluate whether we observe a similar benefit for cross-material generalization (Table 2.6).
The results of Table 2.6 agree with the results of the cross-sensor only experiment shown previously; however, there is a small absolute performance decline due to the evaluation on only unknown PA materials. Specifically, the average TDR at an FDR of 0.2% of the proposed approach decreased from 93.03% for cross-sensor only to 88.49% for cross-sensor and cross-material generalization on the target sensor. However, we notice that the relative performance degradation of the UMG + SARL + MARL method is less than the relative drop in performance of the UMG only approach, which further demonstrates the generalization benefits of incorporating ARL for fingerprint PAD.
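For reference, the TDR @ FDR = 0.2% numbers reported throughout these tables can be computed from per-image PA-detection scores as in the short NumPy sketch below; the score convention (higher score = more PA-like) and the synthetic scores are assumptions for illustration.

```python
import numpy as np

def tdr_at_fdr(bonafide_scores, pa_scores, fdr=0.002):
    # TDR at a fixed FDR. Scores follow the convention that a higher score means
    # the sample looks more like a presentation attack.
    # FDR = fraction of bonafide samples wrongly flagged as PA at the threshold;
    # TDR = fraction of PA samples correctly flagged at that same threshold.
    bonafide_scores = np.sort(np.asarray(bonafide_scores))
    k = int(np.ceil((1.0 - fdr) * len(bonafide_scores))) - 1
    threshold = bonafide_scores[k]
    return float(np.mean(np.asarray(pa_scores) > threshold))

# Toy example with synthetic score distributions.
rng = np.random.default_rng(1)
live_scores = rng.normal(0.2, 0.10, 5000)
pa_scores = rng.normal(0.8, 0.15, 5000)
print(f"TDR @ FDR = 0.2%: {100 * tdr_at_fdr(live_scores, pa_scores):.2f}%")
```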
2.5.3 Cross-Sensing Technology Performance
In this section, we expand our analysis to include generalization across different fingerprint sensing mechanisms, where the sensing technology of the source fingerprint readers during training is different from that of the target test reader. For the first experiment we incorporate the data from the Lumidigm multispectral sensor of the MSU-FPAD database as the test sensor and the four FTIR optical sensors of LivDet 2015 as our training sensors. Here the testing datasets include only PA material types that were seen during training. The results show that UMG + SARL + MARL achieves the highest generalization TDR of 89.68% on the target domain sensor (Lumidigm), an improvement of 9.08% over the UMG only approach (Table 2.8).
Figure 2.5 shows examples of bonafide and PA samples from Lumidigm that were correctly and incorrectly classified by the network trained with the proposed approach. Since the network averages predictions over minutiae patches extracted from each sample, the performance is dependent upon the reliability of the minutiae extraction. For example, the fingerprint ridge structure is ambiguous in many live samples that are incorrectly classified as PAs, whereas live samples with consistent, clear ridge structures tend to be correctly classified. To alleviate this dependence on the reliability of minutiae extraction, one could extract a minimum or fixed number of patches from each sample or even fuse the network predictions with a model trained on whole images, as was done by the authors of Spoof Buster [41].
Figure 2.5 Example live and PA samples from the testing set of Lumidigm from MSU-FPAD that are correctly and incorrectly classified by a network trained via the proposed approach with Lumidigm left out of training.
Table 2.8 Cross-sensing technology generalization performance (TDR (%) @ FDR = 0.2%) with the four sensors of the LivDet 2015 dataset included during training and Lumidigm from the MSU-FPAD dataset left out for testing. Bio = Biometrika, CM = CrossMatch, DP = Digital Persona, GB = GreenBit, and Lum = Lumidigm.
Test Dataset(s) | Base CNN [112] | SARL | Naïve | UMG [44] | Naïve + SARL | UMG + SARL [92] | UMG + SARL + MARL
Source: Bio, CM, DP, GB | 90.40 | 87.41 | 63.54 | 88.24 | 87.22 | 88.45 | 86.70
Target: Lum | 0.60 | 3.00 | 61.27 | 80.60 | 84.93 | 88.60 | 89.68
2.5.4 Varying Number of Target Domain Images
To study the effect of varying the number of assumed target domain images available during training, we repeat the experiments in the leave-out Biometrika (LivDet 2015) scenario. Specifically, we run experiments with 50 and 250 live and PA training images from the target domain. As shown in Table 2.9, increasing the number of target domain images greatly benefits the naïve approach but only marginally affects the UMG + SARL method. Therefore, the benefit of UMG + SARL is most pronounced in cases with limited target domain training examples. In the trade-off between time spent on data collection and performance, the proposed method can significantly help reduce the burden of expensive data collection.
Table 2.9 Cross-sensor generalization performance (TDR (%) @ FDR = 0.2%) on leave-out Biometrika (LivDet 2015) with a varying number of target sensor training images. Each cell lists Source TDR / Target TDR.
Method | 50 target domain training images | 250 target domain training images
Naïve | 91.21 / 90.15 | 91.04 / 95.29
UMG [44] | 93.19 / 90.47 | 91.00 / 89.19
Naïve + SARL | 85.64 / 91.43 | 95.50 / 95.40
UMG + SARL [92] | 90.76 / 93.25 | 90.71 / 93.04
2.5.5 Varying PA Material Selection in Target Domain Images
In addition to varying the total number of live and PA images from the target sensor, we also experiment with varying the types of PA materials among the target sensor PA images that we include.
Specifically, we first run the experiment including only live images from the target sensor during training. Note, we still include all PA materials from the training sets for each of the source sensors. In other words, we are only varying the diversity of PA material types in the target domain that are seen during training. Then, we retrain while including two additional PA materials from the target domain (Ecoflex and Gelatine). Finally, we retrain again while including four PA materials in total (Ecoflex, Gelatine, Latex, and Wood Glue) from the target domain sensor.
We compare the source and target sensor generalization performance for each of these cases in Table 2.10. We observe that the target sensor performance is highest when including the largest number of PA materials, where the difference is most pronounced in the case of the Digital Persona sensor as the target sensor. Plots of the t-SNE feature space for each of the three networks on the test data from Digital Persona are shown in Figure 2.6, which show that the separation of bonafide and PA samples is greatest as we add more materials. On the other hand, the performance on the source domain sensors drops in two out of the three cases as we incorporate more PA data from the target domain into the training. This further highlights the trade-off we observed in the previous experiments between improved generalization performance at the expense of lower source domain performance.
Figure 2.6 3-dimensional t-SNE feature embeddings of the target sensor (Digital Persona) for networks trained via the proposed method on (a) live only, (b) live, Ecoflex, and Gelatine, and (c) live, Ecoflex, Gelatine, Latex, and Wood Glue impressions from Digital Persona, in addition to the full training set of PA materials from GreenBit, Biometrika, and CrossMatch of LivDet 2015. The separation of bonafide vs. PA samples improves as the network is trained with more PA material types from the target sensor.
Table 2.10 Cross-sensor generalization performance of the UMG + SARL method (TDR (%) @ FDR = 0.2%) on LivDet 2015 with a varying number of target sensor PA materials during training. Bio = Biometrika, CM = CrossMatch, DP = Digital Persona, and GB = GreenBit. Each cell lists Source TDR / Target TDR.
Target sensor PA materials in training | CM, DP, GB → Bio | CM, Bio, GB → DP | CM, DP, Bio → GB
Live Only¹ | 94.51 / 90.80 | 91.88 / 77.50 | 96.54 / 88.3
Live, Ecoflex, Gelatine² | 92.26 / 89.6 | 91.11 / 80.7 | 94.26 / 89.80
Live, Ecoflex, Gelatine, Latex, Wood Glue³ | 92.42 / 94.60 | 96.70 / 90.10 | 94.95 / 92.50
1 All source sensor materials plus only Live images from the target sensor used during training.
2 All source sensor materials plus Live, Ecoflex, and Gelatine images from the target sensor used during training.
3 All source sensor materials plus Live, Ecoflex, Gelatine, Latex, and Wood Glue images from the target sensor used during training.
2.5.6 Effect of Different Training Sensor Technologies
Due to the large discrepancies between images from different sensing technologies, we analyze whether including additional data from vastly different sensing technology readers is beneficial or not. Intuitively, one would think that incorporating additional data should improve the predictions of the data-driven deep learning network; however, we show that this is not necessarily the case if the domain gap of the additional sensors is too large. To show this result, we train two networks on the LivDet 2017 dataset.
The first network is trained with the GreenBit sensor designated as the target sensor, and the second network is trained with Digital Persona as the target sensor. Both GreenBit and Digital Persona employ optical-FTIR based sensing, whereas the third sensor of LivDet 2017, Orcanthus, uses thermal-swipe based technology. We first train each network on only images of the other optical-FTIR sensor in the dataset (GreenBit or Digital Persona) with the few samples of the target sensor, as we did in all the previous experiments. Then, we compare the performance when incorporating training examples from the Orcanthus sensor. The results are shown in Table 2.11. We see that the performance for both the source and target domains is much better when Orcanthus is excluded from the training. This suggests that incorporating training data from sensors that are very different from the desired test sensor may actually degrade the performance on that test sensor.
Table 2.11 Cross-sensor generalization performance (TDR (%) @ FDR = 0.2%) with leave-one-out protocol for GreenBit and Digital Persona on the LivDet 2017 dataset with and without Orcanthus included in training. DP = Digital Persona, GB = GreenBit, and Orc = Orcanthus.
Method | Source DP (Orc) | Target GB | Source GB (Orc) | Target DP
UMG + SARL with Orc | 5.50 | 24.19 | 21.31 | 25.85
UMG + SARL without Orc | 10.24 | 59.19 | 75.66 | 32.55
2.5.7 Feature Space Analysis
To explore the benefits of incorporating ARL on top of the UMG only approach, we extract 2-dimensional t-SNE feature embeddings of the live and PA fingerprint minutiae patches from the final 1024-unit layer of the MobileNet-v1 encoder network, prior to the softmax non-linearity, for the UMG only network and the UMG + SARL network. For brevity, we show only the results of the leave-one-out protocol on the LivDet 2015 dataset with Biometrika, Green Bit, and Digital Persona as the source sensors and CrossMatch as the target sensor. In Figure 2.7, we plot these embeddings to analyze the effect of adversarially enforcing the learning of a sensor-invariant representation. Figure 2.7 (a) shows the separation between live and PA fingerprint minutiae patch embeddings of the UMG only network for minutiae patches from the target sensor, i.e., CrossMatch, whereas (b) shows the separation of the embeddings produced by the UMG + SARL approach. We can see that the proposed method provides noticeably better separation between the live and PA fingerprint patches, resulting in the improved PAD performance.
Figure 2.7 2-dimensional t-SNE feature embeddings of the target sensor fingerprint minutiae patches for the (a) UMG only and (b) UMG + SARL models trained on the LivDet 2015 dataset with Biometrika, Green Bit, and Digital Persona as the source sensors and CrossMatch as the target sensor. The blue and red dots represent live and PA minutiae patches of fingerprint impressions captured on the target sensor (CrossMatch), respectively.
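The kind of t-SNE projection used in this feature-space analysis can be reproduced with a few lines of scikit-learn, as in the illustrative sketch below; the random stand-in embeddings, perplexity, and plot styling are assumptions, not the exact settings behind Figures 2.6 and 2.7.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Stand-in 1024-D embeddings from the encoder's final layer (before the softmax),
# with labels 0 = live (bonafide) and 1 = PA.
rng = np.random.default_rng(2)
features = np.vstack([rng.normal(0.0, 1.0, size=(300, 1024)),
                      rng.normal(1.5, 1.0, size=(300, 1024))])
labels = np.array([0] * 300 + [1] * 300)

# Project to 2-D for visualization (Figure 2.6 used a 3-D projection).
proj = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(features)

plt.scatter(proj[labels == 0, 0], proj[labels == 0, 1], s=5, c="blue", label="live")
plt.scatter(proj[labels == 1, 0], proj[labels == 1, 1], s=5, c="red", label="PA")
plt.legend()
plt.title("t-SNE of PAD embeddings (illustrative)")
plt.savefig("tsne_pad_embeddings.png", dpi=150)
```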
2.6 Summary
Diverse and sophisticated presentation attacks pose a threat to the effectiveness of fingerprint recognition systems for reliable authentication and security. Previous PAD algorithms have demonstrated success in scenarios for which significant training data of bonafide and PA fingerprint images are available, but they are not robust enough to generalize well to novel PA materials unseen during training. Additionally, previous fingerprint PAD solutions are not generalizable across different fingerprint readers, meaning that a PAD algorithm trained on a specific fingerprint reader will not perform well when applied to different fingerprint sensing devices.
The approach toward fingerprint PAD presented in this chapter demonstrates an improvement over the state-of-the-art in terms of true detection rate (TDR) at a false detection rate (FDR) of 0.2% on cross-sensor and cross-material generalization. In particular, incorporating adversarial representation learning with the Universal Material Generator (UMG) improves the cross-sensor generalization performance from a TDR of 88.36 ± 11.27% to 93.03 ± 7.00% on the LivDet 2015 dataset, while maintaining higher performance on the sensors seen during training. Furthermore, including cross-materials with the cross-sensor evaluation leads to an improvement from 78.76 ± 16.82% to 88.49 ± 11.93%. Lastly, experiments involving cross-sensing technology show average improvements of 80.60% to 89.68% with the proposed approach over the state-of-the-art on the MSU-FPAD dataset. In the next chapter, we turn our attention to another case of cross-sensor fingerprint generalization, contact to contactless fingerprint matching, which has its own set of unique challenges stemming from the domain gap between contact and corresponding contactless fingerprint images.
2.7 Acknowledgment
This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA R&D Contract No. 2017-17020200004. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
CHAPTER 3
CONTACT TO CONTACTLESS FINGERPRINT MATCHING
Matching contactless fingerprints or finger photos to contact-based fingerprint impressions has received increased attention in the wake of COVID-19 due to the superior hygiene of contactless acquisition and the widespread availability of low-cost mobile phones capable of capturing photos of fingerprints with sufficient resolution for verification purposes. This chapter presents an end-to-end automated system, called C2CL, comprised of a mobile finger photo capture application, preprocessing, and matching algorithms to handle the challenges inhibiting previous cross-matching methods, namely i) low ridge-valley contrast of contactless fingerprints, ii) varying roll, pitch, yaw, and distance of the finger to the camera, iii) non-linear distortion of contact-based fingerprints, and iv) different image qualities of smartphone cameras. Our preprocessing algorithm segments, enhances, scales, and unwarps contactless fingerprints, while our matching algorithm extracts both minutiae and texture representations. A sequestered dataset of 9,888 contactless 2D fingerprints and corresponding contact-based fingerprints from 206 subjects (2 thumbs and 2 index fingers for each subject) acquired using our mobile capture app is used to evaluate the cross-database performance of our proposed algorithm.
Furthermore, additional experimental results on 3 publicly available datasets show substantial improvement in the state-of-the-art for contact to contactless fingerprint matching (TAR in the range of 96.67% to 98.30% at FAR = 0.01%).
3.1 Introduction
Most prevailing fingerprint readers in use today necessitate physical contact of the user's finger with the imaging surface of the reader; however, this direct contact presents certain challenges in processing the acquired fingerprint images. Most notably, elastic human skin introduces a non-linear deformation upon contact with the imaging surface which has been shown to significantly degrade matching performance [13, 35, 201]. Furthermore, contact with the surface is likely to leave a latent impression on the imaging surface [191], which presents a security risk, as an imposter could illegally gain access to the system through creation of a presentation (i.e., spoof) attack.
This chapter was previously published as S. A. Grosz, J. J. Engelsma, and A. K. Jain, "C2CL: Contact to Contactless Fingerprint Matching", IEEE Transactions on Information Forensics and Security, vol. 17, pp. 196-210, 2022. Copyright 2022 by IEEE. Reprinted with permission.
Figure 3.1 Overview of matching contactless fingerprint images with a legacy database of contact-based fingerprint impressions. While only a specific scenario is shown here where contact-based images are obtained from optical FTIR readers (slap or single finger capture) and contactless images are captured by a smartphone camera, our approach can be applied to any heterogeneous fingerprint matching problem.
In light of the ongoing COVID-19 pandemic, contactless fingerprint recognition has gained renewed interest as a hygienic alternative to contact-based fingerprint acquisition [183]. This is further supported by a recent survey that showed that the majority of users prefer touchless capture methods in terms of usability and hygiene considerations [194]. Prior studies have explored the use of customized 2D or 3D sensing for contactless fingerprint acquisition [110, 134, 136, 192, 220, 259], while others have explored the low-cost alternative of using readily available smartphone cameras to capture "finger photos" [157, 210, 224]. (In general, contactless fingerprints refers to fingerprint images acquired by a contactless fingerprint sensor, whereas finger photo refers to fingerprint images acquired by a mobile phone. In this chapter, we use the two terms interchangeably.)
Despite the benefits of contactless fingerprint acquisition, imaging and subsequently matching a contactless fingerprint presents its own set of unique challenges. These include (i) low ridge-valley contrast, (ii) non-uniform illumination, (iii) varying roll, pitch, and yaw of the finger, (iv) varying background, (v) perspective distortions due to the varying distances of the finger from the camera, and (vi) lack of cross-compatibility with legacy databases of contact-based fingerprints (see Figure 3.2).
Figure 3.2 Examples of contactless fingerprints (a) and their corresponding contact-based fingerprint images (b). Varying viewing angle, resolution, and illumination of contactless images and non-linear distortion of contact-based fingerprints contribute to the degradation of cross-matching performance. The contactless images shown are from the ZJU dataset.
For widespread adoption, contactless fingerprint recognition must overcome the aforementioned challenges and bridge the gap in accuracy compared to contact-contact fingerprint matching. The most significant factor limiting the adoption of contactless fingerprint technology is cross-compatibility with legacy databases of contact-based fingerprints, which is particularly important for governmental agencies and large-scale national ID programs such as India’s Aadhaar National ID program, which has already enrolled over 1 billion users based upon contact-based fingerprints. Several studies have aimed at improving the compatibility of matching legacy slap images to contactless fingerprint images [49, 53, 145, 146, 158, 250]; however, none have achieved the same levels of accuracy as state-of-the-art (SOTA) contact-contact fingerprint matching (such as the results reported in FVC-ongoing [62] and NIST FpVTE [247]). Furthermore, all of these works focus on solving only a subset of the challenges in an effort to push the contact-contactless matching accuracy closer to the SOTA contact-contact matching systems. Indeed, to the best of our knowledge, this study presents the most comprehensive, end-to-end solution in the open academic literature for contact-contactless fingerprint matching that addresses the challenges inherent to each step in the contact to contactless matching process (mobile capture, segmentation, enhancement, scaling, non-linear warping, representation extraction, and matching).

Figure 3.3 Example contactless and contact-based fingerprint image pairs from databases which we have obtained from different research groups: (a) IIT Bombay [17], (b) ISPFDv2 [158], (c) MSU [53], (d) PolyU [145], (e) UWA [270], and (f) ZJU datasets. In general, contactless fingerprints suffer from low ridge-valley contrast, varying roll, pitch, and yaw, and perspective distortions, especially those captured by smartphone cameras (e.g., (a), (b), (c) and (f)).

We believe our study involves the largest collection of public domain databases of contactless and contact-based fingerprints. We show that our end-to-end matcher, called C2CL, is able to significantly improve contact-contactless matching performance over the prevailing SOTA methods through experimental results on a number of different datasets, collected by various research groups using their own apps and fingerprint readers. We also demonstrate that our matcher generalizes well to datasets which were not included during training. This cross-database evaluation addresses a shortcoming of many existing studies, which train and evaluate algorithms on different training and test splits of the same contact-contactless dataset. Furthermore, despite multiple evaluation datasets, we train only a single model for our evaluations, rather than fine-tuning individual models to fit a specific dataset. Concretely, the contributions of this chapter are stated as the following:

1. An end-to-end system, called C2CL, for contact-contactless fingerprint matching. C2CL is comprised of preprocessing (segmentation, enhancement, scaling, and deformation correction), feature extraction (minutiae and texture representations), and matching modules. Our preprocessing also benefits the Verifinger 12.0 commercial fingerprint SDK.

Table 3.1 Summary of published cross-matching contact to contactless fingerprint recognition studies.
Study | Approach | Accuracy† | Database
Lin and Kumar, 2018 [145] | Robust TPS deformation correction model, minutiae and ridge matching | EER = 14.33% [145]; EER = 19.81% [270] | 1,800 contactless and contact fingerprints from 300 fingers [145]; 2,000 contactless and 4,000 contact fingerprints from 1,000 fingers [270]
Deb et al., 2018 [53] | COTS matcher | TAR = 92.4% - 98.6% @ FAR = 0.1% [53] | 2,472 contactless and contact fingerprints from 1,236 fingers [53]
Lin and Kumar, 2019 [146] | Fusion of three Siamese CNNs | EER = 7.93% [145]; EER = 7.11% [270] | 960 contactless and contact fingerprints from 160 fingers [145]; 1,000 contactless and 2,000 contact fingerprints from 500 fingers [270]
Wild et al., 2019 [250] | Filtering based on NFIQ 2.0 quality measure, COTS matcher | TAR = 95.5% - 98.6% @ FAR = 0.1% [250] | 1,728 contactless and 2,582 contact fingerprints from 108 fingers [250]
Dabouei et al., 2019 [49] | TPS spatial transformer network for deformation correction and binary ridge-map extraction network, COTS matcher | EER = 7.71% [270] | 2,000 contactless and 4,000 contact fingerprints from 1,000 fingers [270]
Malhotra et al., 2020 [158] | Feature extraction with deep scattering network, random decision forest matcher | EER = 2.11% - 5.23% [158] | 8,512 contactless and 1,216 contact fingerprints from 152 fingers [158]
Priesnitz et al., 2021 [194] | Neural network-based minutiae feature extraction, open-source minutiae matcher | EER = 15.71% and 32.02% [194] | 896 contactless from two different capture setups and 464 contact fingerprints from 232 fingers [194]
Proposed Approach | TPS spatial transformer for 500 ppi scaling and deformation correction of contactless fingerprints; fusion of minutiae and CNN texture representations | EER = 1.20% [158]; EER = 0.72% [270]; EER = 0.30% [145]; EER = 0.62% (ZJU Dataset) | 8,512 contactless and 1,216 contact fingerprints from 152 fingers [158]; 2,000 contactless and 4,000 contact fingerprints from 1,000 fingers [270]; 960 contactless and contact fingerprints from 160 fingers [145]; 9,888 contactless and 9,888 contact fingerprints from 824 fingers (ZJU Dataset)
† Some studies only report EER while other studies only report TAR @ FAR = 0.1%.

2. A fully automated preprocessing pipeline to map contactless fingerprints into the domain of contact-based fingerprints and a contact-contactless adaptation of DeepPrint [69] for representation extraction. Our preprocessing and representation extraction are generalizable across multiple datasets and contactless capture devices.

3. SOTA cross-matching verification and large-scale identification accuracy using C2CL on both publicly available contact-contactless matching datasets as well as on a completely
sequestered dataset collected at Zhejiang University, China. Our evaluation includes the most diverse set of contactless fingerprint acquisition devices, yet we employ just a single trained model for evaluation.

4. A smartphone contactless fingerprint capture app that was developed in-house for improved throughput and user convenience. This app is made available to the public to promote further research in this area. The project repository for the smartphone contactless fingerprint capture app is available at https://github.com/ronny3050/FingerPhotos.

5. A new dataset of 9,888 2D contactless and corresponding contact-based fingerprint images from 206 subjects (2 thumbs and 2 index fingers per subject), which is made available to advance much needed research in this area. The dataset application is available at https://person.zju.edu.cn/en/eryunliu.

Table 3.2 Summary of contact to contactless fingerprint recognition datasets used in this study.

Dataset | # Subjects | # Unique Fingers | # Images (Contactless/Contact) | Contactless Capture Device | Contact Capture Device
UWA Benchmark 3D Fingerprint Database, 2014 [270] | 150 | 1,500 | 3,000 / 6,000 | 3D Scanner (TBS S120E) | CROSSMATCH Verifier 300 LC2.0
ManTech Phase2, 2015 [75] | 496 | 4,960 | N/A / N/A* | AOS ANDI On-The-Go (OTG), MorphoTrak Finger-On-The-Fly (FOTF), IDair innerID on iPhone 4 | Cross Match Guardian R2, Cross Match SEEK Avenger, MorphoTrak MorphoIDent, MorphoTrust TouchPrint 5300, Northrop Grumman BioSled
PolyU Contactless 2D to Contact-based 2D Images Database, 2018 [145] | N/A | 336 | 2,976 / 2,976 | Low-cost camera and lens (specific device not given) | URU 4000
MSU Finger Photo and Slap Fingerprint Database, 2018 [53] | 309 | 1,236 | 2,472 / 2,472 | Xiaomi Redmi Note 4 smartphone | CrossMatch Guardian 200, SilkID (SLK20R)
IIT Bombay Touchless and Touch-Based Fingerprint Database, 2019 [17] | N/A | 200 | 800 / 800 | Lenovo Vibe k5 smartphone | eNBioScan-C1 (HFDU08)
ISPFDv2, 2020 [158] | 76 | 304 | 17,024 / 2,432 | OnePlus One (OPO) and Micromax Canvas Knight smartphones | Secugen Hamster IV
ZJU Finger Photo and Touch-based Fingerprint Database | 206 | 824 | 9,888 / 9,888 | HuaWei P20, Samsung s9+, and OnePlus 8 smartphones | URU 4500
* The number of contact and contactless images acquired per finger varies for each device; the exact number is not provided.

3.2 Prior Work

Prior studies on contact-contactless fingerprint matching primarily focus on only one of the sub-modules needed to obtain matching accuracy close to contact-contact based fingerprint matching systems (e.g., segmentation, distortion correction, or feature extraction only). These studies are categorized and discussed below.

3.2.1 Segmentation

The first challenge in contact-contactless matching is segmenting the relevant fingerprint region from the captured contactless fingerprint images. Malhotra et al. [158] proposed a combination of a saliency map and a skin-color map to segment the distal phalange (i.e., fingertip) of contactless fingerprint images in the presence of varying background, illumination, and resolution. Despite impressive results, the algorithm requires extensive hyperparameter tuning and still fails to accurately segment fingerprints in severe illumination conditions or noisy backgrounds. To alleviate these issues, we incorporate segmentation via an autoencoder trained to robustly segment the distal phalange of input contactless images.

3.2.2 Enhancement

One of the main challenges with contactless fingerprint images is the low ridge-valley contrast (Figure 3.3).
The literature has addressed this in a number of different ways, including adaptive histogram equalization, Gabor filtering, median filtering, and sharpening by subtraction of the Gaussian-blurred image from the captured image ([49, 146, 158]). We also incorporate adaptive contrast enhancement in our work; however, one consideration that is lacking in existing approaches is the ridge inversion that occurs with Frustrated Total Internal Reflection (FTIR) optical imaging. In particular, the ridges and valleys of an FTIR fingerprint image will appear dark and light, respectively, while the opposite is true in contactless fingerprint images. Therefore, a binary inversion of the contactless fingerprint images is expected to improve the correspondence with their contact-based counterparts.

Figure 3.4 System architecture of C2CL. (a) A contactless fingerprint is captured and used as input to the preprocessing module, consisting of segmentation, enhancement, 500 ppi ridge frequency scaling, and deformation correction; (b) the transformed image output by the preprocessing module is fed to DeepPrint [69], which extracts a texture representation (shown in red). Without performing any additional preprocessing, the corresponding contact-based fingerprint is again fed to DeepPrint to extract a texture representation (shown in blue). Simultaneously, a minutiae representation is extracted using the Verifinger 12.0 SDK from both the contactless and contact-based fingerprint images.

3.2.3 Scaling

After segmenting and enhancing a contactless fingerprint, the varying distance between the captured finger and the camera must be accounted for. In particular, since contact-based fingerprints are almost always captured at 500 pixels per inch (ppi), the contactless fingerprints need to be scaled to be as close to 500 ppi as possible. Previous studies have applied a fixed manual scaling, set for a specific dataset, or have employed contact-based fingerprint ridge frequency normalization algorithms that rely on accurate ridge extraction, which is often unreliable for contactless fingerprints. In contrast, we incorporate a spatial transformer network [119] which has been trained to automatically normalize the resolution of the contactless fingerprints to match that of the 500 ppi contact images. This scaling is performed dynamically, i.e., every input contactless fingerprint image is independently scaled.

3.2.4 Distortion Correction

A final preprocessing step for contact-contactless matching is non-linear distortion correction. To address this problem, [145] used thin-plate-spline (TPS) deformation correction models (previously applied for contact-contact matching [13, 47, 201, 202, 213, 215]) using the alignment between minutiae annotations of corresponding contactless and contact fingerprints. A limitation is that the transformation is restricted to one of six possible parameterizations. In a different study, Dabouei et al. [47] train a spatial transformer to learn a distortion correction that is dynamically computed for each input image. In [47], a contact-based image is used as the reference for learning the distortion correction for a contactless image. However, we argue that this is not a reliable ground truth since the deformation varies among different contact-based fingerprint impressions. In our attempt to re-implement their algorithm, we found that this lack of a reliable and consistent ground truth makes training unstable, making it difficult to learn sound distortion parameters.
In our work, rather than using the contact-based image as a reference, we use the match scores of our texture matcher as supervision for generating a robust distortion correction. In other words, the distortion correction is optimized to maximize match scores between genuine contact-contactless fingerprint pairs.

3.2.5 Representation Extraction and Matching

After preprocessing a contactless fingerprint image to lie within the same domain as a contact-based fingerprint, a discriminative representation must be extracted for matching. In the prior literature there are two main approaches to feature representation: (i) minutiae representation ([47, 145]) and (ii) deep learning representation ([146, 158]). Minutiae-based approaches rely on clever preprocessing and other techniques to improve the compatibility of contactless fingerprint images with traditional contact-based minutiae extraction algorithms. On the other hand, deep learning approaches place less emphasis on preprocessing to manipulate the contactless fingerprints to improve correspondence with contact-based fingerprints; rather, the responsibility is placed on the representation network to learn the correspondence despite the differences. For example, Lin and Kumar [144] and Dabouei et al. [47] both apply a deformation correction to the contactless image to improve the minutiae correspondence. In contrast, the deep learning approach taken in [146] applies very little preprocessing to the contactless image (just contrast enhancement and Gabor filtering) and leverages a Siamese CNN to extract features for matching. Similarly, Malhotra et al. [158] utilize a deep scattering network to extract multi-scale and multi-directional feature representations.

In contrast to prior studies, our approach utilizes both a texture representation and a minutiae representation. Given the lower contrast and quality of contactless fingerprints (causing missing or spurious minutiae) and the non-linear distortion and scaling discrepancies between contact and contactless fingerprints (negatively impacting minutiae graph matching algorithms), a global texture representation is useful to improve the contact-contactless matching accuracy. We demonstrate this hypothesis empirically in the experimental results.

3.3 Methods

Our matcher, C2CL, aims to improve contact to contactless fingerprint recognition through a multi-stage preprocessing algorithm and a matching algorithm comprised of both a minutiae representation and a texture representation. The preprocessing is employed to minimize the domain gap between the contactless fingerprints residing in a domain D_cl and contact-based fingerprints residing in another domain D_c, and consists of segmentation, enhancement, ridge frequency scaling to 500 ppi, and deformation correction through a learned spatial transformation network. After preprocessing, we extract deep-textural and minutiae representations (unordered, variable-length sets T = {(x_1, y_1, θ_1), ..., (x_n, y_n, θ_n)}) for matching. The final match scores are obtained via a score-level fusion between the texture and minutiae matching scores.

3.3.1 Preprocessing

Here we discuss the details of each stage of our preprocessing algorithm, as illustrated in Figure 3.6.

3.3.1.1 Segmentation

Many contactless fingerprint datasets are unsegmented; for example, the ISPFDv2 dataset [158] contains unsegmented (4,208 × 3,120) images with varying illumination, resolution, and background conditions.
Thus, the first step in our preprocessing pipeline is to segment the distal phalange of the fingerprint using a U-net segmentation network [200].

Figure 3.5 Example segmentation success (a) and failure (b) cases from images in the ISPFDv2 dataset using our segmentation algorithm. Sources of failure are the presence of skin-like color tones in the background and varying skin complexion due to varying illumination.

Figure 3.6 Illustration of our preprocessing pipeline including (a) segmentation, (b) enhancement, (c) scaling, and (d) warping. For reference, a corresponding contact-based fingerprint is shown in (e).

Our segmentation algorithm is a network S(·) which takes as input the unsegmented contactless fingerprint I_cl of dimension (m × n) and outputs a segmentation mask M̂ ∈ {0, 1} of dimension (m × n). The obtained segmentation mask, M̂, is element-wise multiplied with I_cl to (i) crop out only the distal phalange of the contactless fingerprints and (ii) eliminate the remaining background to avoid detection of spurious minutiae in the later representation extraction stage. The segmented image I′_cl is then resized to 480 × 480 by maintaining the aspect ratio with appropriate padding for further processing.

For training S(·), we manually marked segmentation masks M of the distal phalange of 496 contactless fingerprints from the ISPFDv2 dataset, using the open-source Labelme segmentation tool found on GitHub [243]. Initially, 200 images were randomly selected to have varying resolutions of either 5MP, 8MP, or 16MP and another 200 were selected with varying backgrounds and illumination. An additional 96 images were specifically selected for their greater perceived difficulty, particularly images with skin tone backgrounds. The optimization function for training S(·) is a pixel-wise binary cross-entropy loss between M̂ and M (Eq. 3.1):

\mathcal{L}_{seg}(I_{cl}, I_c, M) = -\sum_{i,j} \left[ M_{i,j} \log(\hat{M}_{i,j} \mid I_{cl}) + (1 - M_{i,j}) \log(1 - \hat{M}_{i,j} \mid I_{cl}) \right] \tag{3.1}

3.3.1.2 Enhancement

Following segmentation, we apply a series of image enhancements E(·) to increase the contrast of the ridge-valley structure of the contactless images, including (i) an adaptive histogram equalization to improve the ridge-valley contrast and (ii) pixel gray-level inversion to correct for the inversion of ridges between contact-based and contactless fingerprints. We also experimented with SOTA super-resolution and de-blurring techniques, such as RDN [267], to further improve the contactless image quality but found only minimal matching accuracy improvements at the expense of significant additional computational cost.
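To make the enhancement step concrete, the following is a minimal sketch of adaptive contrast enhancement followed by gray-level inversion, assuming OpenCV and NumPy are available; the CLAHE parameters and the optional masking step are illustrative assumptions rather than values specified in this chapter.

```python
import cv2
import numpy as np

def enhance_contactless(gray, mask=None, clip_limit=2.0, tile_grid=(8, 8)):
    """Sketch of E(.): (i) adaptive histogram equalization (CLAHE) to boost
    ridge-valley contrast, and (ii) gray-level inversion so that ridges appear
    dark, matching the polarity of FTIR contact-based images. Parameter values
    are illustrative, not taken from the text."""
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    enhanced = clahe.apply(gray.astype(np.uint8))
    inverted = 255 - enhanced                      # ridge polarity now matches FTIR captures
    if mask is not None:                           # optionally suppress background using S(.)'s mask
        inverted = inverted * (mask > 0).astype(np.uint8)
    return inverted

# Toy usage with a random stand-in for a segmented contactless fingerprint
img = (np.random.rand(480, 480) * 255).astype(np.uint8)
out = enhance_contactless(img)
```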
3.3.1.3 Distortion Correction and Scaling

After segmenting and enhancing the contactless fingerprints, the non-linear distortions that separate the domains of contactless and contact-based fingerprints must be removed. In particular, this includes both a perspective distortion (caused by the varying distance of a finger from the camera) and a non-linear distortion (caused when the elastic human skin flattens against a platen). To correct for these discrepancies, we train a spatial transformer network (STN) [119] T(·) that takes as input a segmented, enhanced contactless image I^e_cl = E(I′_cl) and aligns the ridge structure to better match the corresponding contact-based image domain D_c. The goal of the STN is two-fold: (i) an affine transformation T_s(·) to normalize the ridge frequency of the contactless images to match the 500 ppi ridge spacing of the contact-based impressions, and (ii) a TPS deformation warping T_d(·) of the contactless images to match the deformation present in contact-based images due to the elasticity of the human skin.

Both T_s(·) and T_d(·) are comprised of a shared localization network l(·, w) and individual differentiable grid-samplers. Given an enhanced contactless fingerprint I^e_cl, l(I^e_cl, w) outputs the scale (s), rotation (θ), and translation (t_x, t_y) of an affine transformation matrix A_s (Eq. 3.2) and a distortion field Θ, which is characterized by a grid of n × n pixel displacements {(x_1, y_1), ..., (x_n, y_n)}. Subsequently, a scaled, warped image I^w_cl is obtained via Equation 3.3.

To learn the weights w of the localization network such that T_s(·) and T_d(·) correctly scale the contactless fingerprints to 500 ppi and unroll them into a contact-based fingerprint, we minimize the distance between DeepPrint representations extracted from genuine pairs of scaled, warped contactless fingerprints (I^w_cl) and contact-based fingerprints (I_c). In particular, let f(·) be a frozen DeepPrint network pretrained on contact-based fingerprints. Then, we can obtain a pair of 192D DeepPrint identity representations R_cl and R_c via R_cl = f(I^w_cl) and R_c = f(I_c). Our loss can then be computed from Equation 3.4. By using the DeepPrint identity features extracted from contact-based fingerprint images to compute the loss, we are able to utilize the contact-based impressions as a ground truth of sorts. In particular, we are training our localization network to output better scalings and warpings such that the distortion and scale corrected contactless images have DeepPrint representations closer to their corresponding “ground truth” contact-based image.

We note that this approach has key differences from the approach proposed in [49], where the distortion corrected contactless image (scale was not learned in [49]) would be more directly compared to the ground-truth contact-based fingerprint via a cross-entropy loss between “binarized” versions of I^w_cl and I_c. We found that directly comparing the contactless and contact images via a cross-entropy loss was quite difficult in practice since the ground truth contact image and the corresponding contactless image will have different rotations and translations separating them (even after scaling and distortion correction), resulting in a high loss value even if the scaling and distortion are correct. Furthermore, the contact-based image itself varies based upon the pressure applied during acquisition, environmental conditions, sensor model, etc., meaning that directly using the contact-based image as ground truth is unreliable. In contrast, since DeepPrint has been trained to be invariant to pressure, environmental conditions, and sensor model, our ground truth (DeepPrint representations from contact-based images) will remain stable across different contact-based impressions. In short, unlike [49], we learn both distortion correction and scaling correction simultaneously, and we use the DeepPrint identity loss to stabilize training of T(·) and to enable predictions of warpings and scalings which better improve matching accuracy.

A_s = \begin{bmatrix} s\cos(\theta) & -s\sin(\theta) & t_x \\ s\sin(\theta) & s\cos(\theta) & t_y \end{bmatrix} \tag{3.2}

I^w_{cl} = T(I^e_{cl}; A_s, \Theta) = T_d(T_s(I^e_{cl}, A_s), \Theta) \tag{3.3}

\mathcal{L}_{STN} = \lVert R_{cl} - R_c \rVert_2^2 \tag{3.4}
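As a concrete illustration of Equations 3.2–3.4, the sketch below builds the affine matrix A_s from the localization network's outputs and computes the embedding-distance loss L_STN; the embedding vectors are random stand-ins for DeepPrint outputs, and the function names are hypothetical.

```python
import numpy as np

def affine_matrix(s, theta, tx, ty):
    """A_s of Eq. (3.2): isotropic scale s, rotation theta (radians),
    and translation (tx, ty) predicted by the localization network l(., w)."""
    return np.array([[s * np.cos(theta), -s * np.sin(theta), tx],
                     [s * np.sin(theta),  s * np.cos(theta), ty]])

def stn_loss(R_cl, R_c):
    """L_STN of Eq. (3.4): squared L2 distance between the DeepPrint embedding of the
    scaled/warped contactless image and that of its mated contact-based impression."""
    return float(np.sum((np.asarray(R_cl) - np.asarray(R_c)) ** 2))

# Toy usage: random 192-D vectors stand in for f(I^w_cl) and f(I_c)
rng = np.random.default_rng(0)
print(affine_matrix(s=1.1, theta=np.deg2rad(5), tx=2.0, ty=-3.0))
print(stn_loss(rng.normal(size=192), rng.normal(size=192)))
```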
Table 3.3 Number of contactless and contact fingerprint images used in training each component of C2CL (No. contactless/No. contact).

Dataset | Segmentation S(·) | Deformation Correction & Scaling T(·) | DeepPrint f(·)
UWA Benchmark 3D Fingerprint Database [270] | 0/0 | 0/0 | 1,000/2,000
ManTech Phase2, 2015 [75] | 0/0 | 0/0 | 21,352/28,574
PolyU Contactless 2D to Contact-based 2D Images Database [145] | 0/0 | 1,920/1,920 | 1,920/1,920
MSU Finger Photo and Slap Fingerprint Database [53] | 0/0 | 2,472/2,472 | 2,472/2,472
IIT Bombay Touchless and Touch-based Fingerprint Database [17] | 0/0 | 800/800 | 800/800
ISPFDv2 [158] | 496/0 | 8,400/1,200 | 8,400/1,200
ZJU Finger Photo and Touch-based Fingerprint Database | 0/0 | 0/0 | 0/0
Total | 496/0 | 13,592/6,392 | 35,944/36,966

3.3.2 Representation Extraction

After performing all of the aforementioned preprocessing steps, we enter the second major stage of our contact-contactless matcher, namely the representation extraction stage. Our representation extraction algorithm extracts both a textural representation (using a CNN) and a minutiae set. Scores are computed using both of these representations and then fused together using a sum score fusion.

3.3.2.1 Texture Representation

To extract our textural representation, we fine-tune the DeepPrint network proposed by Engelsma et al. in [69] on a training partition of the publicly available datasets which we aggregated (Table 3.3). Unlike the deep networks used in [146] and [158] for extraction of textural representations, DeepPrint is a deep network which has been specifically designed for fingerprint representation extraction via a built-in alignment module and minutiae domain knowledge. Therefore, in this work, we seek to adopt DeepPrint for contact-contactless fingerprint matching. As is common practice in the machine learning and computer vision communities, we utilize a pretrained DeepPrint network to warm start our model, which has been shown to improve over random initialization for many applications (for example, in fingerprint spoof detection [177]).

Formally, DeepPrint is a network f(·) with parameters w that takes as input a fingerprint image I and outputs a fixed-length fingerprint representation R (which encodes the texture-related features). During training, DeepPrint is guided to encode features related to fingerprint minutiae via a multi-task learning objective, including (i) a cross-entropy loss on both the minutiae branch identity classification probability ŷ_1 and the texture branch identity classification probability ŷ_2 (Eq. 3.5), (ii) a center loss (Eq. 3.6) to minimize the intra-class variance of class y, computed between the predicted minutiae feature vector R_1 and its mean feature vector R̄^y_1 and between the predicted texture feature vector R_2 and its mean feature vector R̄^y_2 (where R_1 concatenated with R_2 forms the full representation R), and (iii) a mean squared error loss between the predicted minutiae maps Ĥ output by DeepPrint’s minutiae branch and the ground truth minutiae maps H (Eq. 3.7). These losses are combined to form the DeepPrint identity loss, L_ID (Eq. 3.8), where λ_1 = 1, λ_2 = 0.00125, λ_3 = 0.095 are set empirically.

\mathcal{L}_1(I, y) = -\log(\hat{y}_1^{\,j=y} \mid I, w) - \log(\hat{y}_2^{\,j=y} \mid I, w) \tag{3.5}

\mathcal{L}_2(I, y) = \lVert R_1 - \bar{R}_1^{\,y} \rVert_2^2 + \lVert R_2 - \bar{R}_2^{\,y} \rVert_2^2 \tag{3.6}

\mathcal{L}_3(I, H) = \sum_{j,k,l} (\hat{H}_{j,k,l} - H_{j,k,l})^2 \tag{3.7}

\mathcal{L}_{ID}(I, y, H) = \operatorname*{argmin}_{w} \sum_{i=1}^{N} \left[ \lambda_1 \mathcal{L}_1(I_i, y_i) + \lambda_2 \mathcal{L}_2(I_i, y_i) + \lambda_3 \mathcal{L}_3(I_i, H_i) \right] \tag{3.8}
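For readers who prefer code, the following is a minimal, single-sample sketch of the combined identity loss of Equations 3.5–3.8; the λ weights are the values quoted above, while the array shapes and variable names are illustrative assumptions.

```python
import numpy as np

def deepprint_identity_loss(p1, p2, y, R1, R2, C1, C2, H_pred, H_true,
                            lam1=1.0, lam2=0.00125, lam3=0.095):
    """Single-sample version of L_ID (Eq. 3.8).
    p1, p2         : softmax identity probabilities from the minutiae / texture branches
    y              : ground-truth identity index
    R1, R2         : branch feature vectors; C1, C2 are their class-mean centers (Eq. 3.6)
    H_pred, H_true : predicted and ground-truth minutiae maps (Eq. 3.7)"""
    L1 = -np.log(p1[y]) - np.log(p2[y])                    # Eq. 3.5 (cross-entropy, both branches)
    L2 = np.sum((R1 - C1) ** 2) + np.sum((R2 - C2) ** 2)   # Eq. 3.6 (center loss)
    L3 = np.sum((H_pred - H_true) ** 2)                    # Eq. 3.7 (minutiae-map MSE)
    return lam1 * L1 + lam2 * L2 + lam3 * L3               # Eq. 3.8 (weighted sum)

# Toy usage with random stand-ins (10 identities, 96-D branch features, 12x12x6 minutiae maps)
rng = np.random.default_rng(1)
p1, p2 = rng.dirichlet(np.ones(10)), rng.dirichlet(np.ones(10))
R1, R2 = rng.normal(size=96), rng.normal(size=96)
C1, C2 = rng.normal(size=96), rng.normal(size=96)
H_pred, H_true = rng.normal(size=(12, 12, 6)), rng.normal(size=(12, 12, 6))
print(deepprint_identity_loss(p1, p2, 3, R1, R2, C1, C2, H_pred, H_true))
```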
Due to the differences in resolution, illumination, and backgrounds observed between different datasets of contactless fingerprint images, generalization to images captured on unseen cameras becomes critical. The problem of cross-sensor generalization in fingerprint biometrics (e.g., optical reader to capacitive reader), of which contact to contactless matching is an extreme example, has been noted in the literature [4, 5, 152, 203], with many previous works aimed at improving the interoperability [123, 161, 204]. Motivated by the recent work employing adversarial learning for cross-sensor generalization of fingerprint spoof detection [92], we incorporate an adversarial loss to encourage robustness of DeepPrint to differences between acquisition devices. The adversarial loss L_A is defined as the cross-entropy on the output of an adversary network q(·, θ_A) across C classes of sensors, where the adversarial ground truth y′ is assigned equal probabilities across these C classes (Eq. 3.9). The adversarial loss L_A and the identity loss L_ID form the overall loss function L_D used to train DeepPrint (Eq. 3.10), where λ_4 = 0.1 is empirically selected. The adversary network, q(·, θ_A), is a two-layer fully connected network, with weights θ_A, that predicts the probability of the class of input device used to capture each image, i.e., it minimizes the cross-entropy between the predicted device and the ground truth device label y (Eq. 3.11). Intuitively, if DeepPrint learns to fool the adversary, it has learned to encode identifying features which are independent of the acquisition device or camera.

\mathcal{L}_A(I, y') = -\sum_{c=1}^{C} y'_c \log q_A(y_c \mid f(I; w); \theta_A) \tag{3.9}

\mathcal{L}_D(I, y, H, y') = \operatorname*{argmin}_{w} \sum_{i=1}^{N} \left[ \mathcal{L}_{ID}(I_i, y_i, H_i) + \lambda_4 \mathcal{L}_A(I_i, y'_i) \right] \tag{3.10}

\mathcal{L}_C(I, y_c) = -y_c \log q_A(y_c \mid f(I; w); \theta_A) \tag{3.11}

In addition to the adversarial loss, we also increased the DeepPrint representation dimensionality from the original 192D to 512D and added perspective distortion and scaling augmentations during training. In an ablation study (Table 3.8), we show how each of our DeepPrint modifications (fine-tuning, adversarial loss, perspective and scaling augmentations, and dimensionality change) improves the contact-contactless fingerprint matching performance.

3.3.2.2 Minutiae Representation

Finally, after extracting a textural representation with our modified DeepPrint network, we extract a minutiae-based representation from our preprocessed contactless fingerprints with the Verifinger 12.0 SDK.

3.3.3 Matching

Following feature extraction, from which we obtain texture representations (R^c_t, R^cl_t) and Verifinger minutiae representations (R^c_m, R^cl_m) for a given pair of contact and contactless fingerprint images (I_c, I_cl), we compute a final match score as a weighted fusion of the individual scores computed between (R^c_t, R^cl_t) and (R^c_m, R^cl_m). Concretely, let s_t denote the similarity score between (R^c_t, R^cl_t) and s_m denote the similarity score between (R^c_m, R^cl_m); then the final similarity score is computed from the sum score fusion shown in Equation 3.12. For our implementation, w_t = w_m = 0.5 was selected empirically.

s = w_t s_t + w_m s_m \tag{3.12}
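A minimal sketch of the matching step is shown below: the texture score is taken as the cosine similarity between DeepPrint embeddings and fused with a minutiae score via Eq. 3.12. The assumption that the minutiae score has already been mapped to a range comparable to the cosine similarity is ours, not a detail stated in the text.

```python
import numpy as np

def texture_score(R_probe, R_gallery):
    """s_t: cosine similarity between two fixed-length DeepPrint embeddings."""
    a, b = np.asarray(R_probe), np.asarray(R_gallery)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def fused_score(s_t, s_m, w_t=0.5, w_m=0.5):
    """Sum-score fusion of Eq. (3.12) with w_t = w_m = 0.5 as stated above.
    s_m is assumed to be a minutiae match score normalized to a comparable range."""
    return w_t * s_t + w_m * s_m

# Toy usage with random 512-D embeddings and a placeholder minutiae score
rng = np.random.default_rng(2)
s_t = texture_score(rng.normal(size=512), rng.normal(size=512))
print(fused_score(s_t, s_m=0.8))
```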
3.4 Experiments

In this section, we give details on various experimental evaluations to determine the effectiveness of C2CL for contact to contactless fingerprint matching. We employed various publicly available datasets for the evaluation of our algorithms, as well as a new database of contactless and corresponding contact-based fingerprints which was collected using our mobile app in coordination with Zhejiang University (ZJU).

3.4.1 Datasets

Table 3.2 gives a detailed description of the publicly available datasets for contact to contactless matching used in this study, and Figure 3.3 shows some example images from these datasets. For comparison with previous studies, we use the same train/test split of the PolyU dataset that was used in [146], which consists of 160 fingers for training with 12 impressions each and the remaining 160 fingers for testing with 6 impressions each. Similarly, we split the UWA Benchmark 3D dataset into 500 training fingers and 1,000 unique test fingers. Furthermore, following the protocol of Malhotra et al. [158], we split the ISPFDv2 dataset evenly into 50% train and 50% test subjects. Finally, we captured and sequestered a new dataset of contactless fingerprints and contact-based fingerprint images in coordination with ZJU for a cross-database evaluation (i.e., not seen during training) to demonstrate the generalizability of our algorithm. The cross-database evaluation is much more stringent than existing approaches which only train/test on different partitions of the same dataset. Indeed, the cross-database evaluation is a much better measure of how C2CL would perform in the real world.

The ZJU Finger Photo and Touch-based Fingerprint Database contains a total of 206 subjects, with 12 contactless images and 12 contact-based impressions per finger. The thumb and index fingers of both hands were collected for each subject, giving a total of 9,888 contactless and contact-based images each. The contactless images were captured using three commodity smartphones: HuaWei P20, Samsung s9+, and OnePlus 8, whereas the contact-based fingerprint impressions were captured on a URU 4500 optical-based scanner at 512 ppi. An Android fingerphoto capture app was developed to improve the ease and efficiency of the data collection. To initiate the capturing process, a user or operator enters the transaction ID for the user and uses an on-screen viewing window to help guide and capture the fingerprint image. Furthermore, a counter displayed on the screen keeps track of subsequent captures to streamline the data collection process.

3.4.2 Implementation Details

All the deep learning components (segmentation network, deformation correction and scaling network, and DeepPrint) are implemented using the TensorFlow deep learning framework. Each network is trained independently, and the number of contactless and contact fingerprint images from each of the datasets used in training each component of our algorithm is given in Table 3.3.

3.4.2.1 Segmentation Network

A total of 496 contactless fingerprint images from the ISPFDv2 dataset, manually labeled with segmentation masks outlining the distal phalange, were used for training. Input images were downsampled to 256 × 256 during training to reduce the time to convergence, which occurred around 100,000 iterations using stochastic gradient descent (SGD) with a learning rate of 1e-3 and a batch size of 8 on a single NVIDIA GeForce RTX 2080 Ti GPU. During inference, the contactless fingerprint images are resized to 256 × 256 and the resulting segmentation masks are upsampled back to the original resolution. Due to the limited number of manually marked images, we employed random rotation, translation, and brightness augmentations to avoid over-fitting. Additionally, we incorporated random resizing of input training images within the range [128 × 128, 384 × 384] to encourage robustness to the varying resolution between capture devices.
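The augmentation scheme above can be sketched as follows; the rotation, translation, and brightness ranges are illustrative assumptions, while the random resizing range [128, 384] follows the text.

```python
import cv2
import numpy as np

def augment_pair(image, mask, rng=None):
    """Illustrative augmentation of an (image, mask) training pair: random rotation,
    translation, brightness shift, and random square resizing in [128, 384].
    Angle/shift/brightness ranges are assumptions, not values from the text."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    # random rotation + translation applied as a single affine warp
    M = cv2.getRotationMatrix2D((w / 2, h / 2), rng.uniform(-15, 15), 1.0)
    M[:, 2] += (rng.uniform(-0.1, 0.1) * w, rng.uniform(-0.1, 0.1) * h)
    image = cv2.warpAffine(image, M, (w, h))
    mask = cv2.warpAffine(mask, M, (w, h), flags=cv2.INTER_NEAREST)
    # random brightness shift
    image = np.clip(image.astype(np.float32) + rng.uniform(-30, 30), 0, 255).astype(np.uint8)
    # random resize to a square side in [128, 384] for resolution robustness
    side = int(rng.integers(128, 385))
    image = cv2.resize(image, (side, side))
    mask = cv2.resize(mask, (side, side), interpolation=cv2.INTER_NEAREST)
    return image, mask

img = (np.random.rand(256, 256) * 255).astype(np.uint8)
msk = (np.random.rand(256, 256) > 0.5).astype(np.uint8)
aug_img, aug_msk = augment_pair(img, msk)
```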
3.4.2.2 Deformation Correction and Scaling Network

The pretrained DeepPrint model in [69] was used to provide supervision of our spatial transformation network in line with Eq. 3.4. The motivation for using a network pretrained on contact-based fingerprints, rather than our new model fine-tuned on contactless fingerprints, is that the goal of our transformation network is to transform the contactless fingerprint images to better resemble their contact-based counterparts. Thus, a supervisory network trained solely on contact-based fingerprint images is more suitable for this purpose. The architectural details of our STN localization network are given in Table 3.4. For our implementation, we set the number of sampling points for the distortion grid to n = 4 × 4. Data augmentations of random rotations, translations, brightness adjustments, and perspective distortions were employed to avoid over-fitting. This network was trained for 25,000 iterations using an Adam optimizer with a learning rate of 1e-6 and a batch size of 16 on a single NVIDIA GeForce RTX 2080 Ti GPU.

Table 3.4 Deformation correction and scaling spatial transformation network architecture, T(·).

Layer | Output Dim. | #Filters, Filter Size, Stride
0. Input | 480 × 480 × 1 | 0, 0, 0
1. Convolution | 240 × 240 × 32 | 32, 3 × 3, 2
2. Convolution | 120 × 120 × 64 | 64, 3 × 3, 2
3. Convolution | 60 × 60 × 128 | 128, 3 × 3, 2
4. Convolution | 30 × 30 × 256 | 256, 3 × 3, 2
5. Max Pool | 13 × 14 × 256 | 256, 6 × 4, 2
6. Dense | 1024 | −
7. Dense | 2 × n_0 + 4 | −
The final dense layer contains output neurons for a 2 × n_0 grid of n_0 = n × n pixel displacements and 4 neurons for the affine transformation matrix (s, θ, t_x, and t_y). In our implementation, n = 4.
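The localization network in Table 3.4 can be reproduced almost line-for-line in Keras, as sketched below; the activation functions and "same" padding are assumptions made to match the listed output sizes, since they are not specified in the table.

```python
import tensorflow as tf

def build_localization_network(n=4):
    """Keras sketch of the localization network l(., w) in Table 3.4. The final dense
    layer outputs 2*n*n TPS grid displacements plus 4 affine parameters (s, theta, tx, ty)."""
    n0 = n * n
    inputs = tf.keras.Input(shape=(480, 480, 1))
    x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2D(256, 3, strides=2, padding="same", activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D(pool_size=(6, 4), strides=2)(x)   # -> (13, 14, 256)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    outputs = tf.keras.layers.Dense(2 * n0 + 4)(x)                     # 36 outputs for n = 4
    return tf.keras.Model(inputs, outputs)

model = build_localization_network()
model.summary()
```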
3.4.2.3 DeepPrint

The DeepPrint network was trained on two NVIDIA GeForce RTX 2080 Ti GPUs with an RMSProp optimizer, a learning rate of 0.01, and a batch size of 16. The added adversary network, which was trained in step with DeepPrint, also utilized an RMSProp optimizer with a learning rate of 0.01. A small validation set was partitioned from the DeepPrint fine-tuning data outlined in Table 3.3 to stop the training (which occurred at 73,000 steps). Lastly, random rotation, translation, brightness, cropping, and perspective distortion augmentations were utilized during training.

3.4.3 Evaluation Protocol

To evaluate the cross-matching performance of our algorithms, we conduct both verification (1:1) and identification (1:N) experiments. For verification, we report the Receiver Operating Characteristic (ROC) curves at specific operating points and equal error rates (EER). Note that we report the True Acceptance Rate (TAR) at a False Acceptance Rate (FAR) of 0.01%, which is a stricter threshold than is currently reported in the literature and is also a threshold expected for field deployment. For the search experiments, the rank-one search accuracy is given against an augmented large-scale gallery of 1.1 million contact fingerprints taken from an operational forensics database [262]. This is a much larger gallery than has previously been evaluated against in the literature and is again more indicative of what C2CL would face in the real world. Finally, we present ablation results on each significant component of our proposed system.

Table 3.5 Verification performance comparison.

Dataset | Verifinger 12.0 EER (%) / TAR (%) @ FAR=0.01% | DeepPrint EER (%) / TAR (%) @ FAR=0.01% | DeepPrint + Verifinger 12.0 EER (%) / TAR (%) @ FAR=0.01% | Previous SOTA EER (%)
PolyU | 0.46 / 97.20 | 2.37 / 72.07 | 0.30 / 97.74 | 7.93 [146]
UWA | 6.81 / 92.56 | 5.29 / 83.40 | 0.72 / 98.30 | 7.11 [146]
ISPFDv2 | 1.46 / 96.02 | 2.33 / 84.33 | 1.20 / 96.67 | 3.40¹ [158]
ZJU² | 0.79 / 96.86 | 2.08 / 86.42 | 0.62 / 97.56 | N/A
¹ [158] reports results on the ISPFDv2 dataset per individual capture condition; 3.40 is the average EER across these data splits.
² Cross-database evaluation, i.e., not seen during training.

3.4.4 Verification Experiments

The verification experiments are conducted in a manner consistent with previous approaches to facilitate a fair comparison. In particular, (i) the PolyU testing dataset yields 5,760 (160 × 6 × 6) genuine scores and 915,840 (160 × 159 × 6 × 6) imposter scores, (ii) the UWA Benchmark 3D dataset yields 8,000 (1,000 × 4 × 2) genuine and 7,992,000 (1,000 × 999 × 4 × 2) imposter scores, (iii) the ISPFDv2 dataset (which is split into 7 different capture variations) yields 68,096 ((152 × 8 × 8) × 7) genuine and 10,282,496 ((152 × 151 × 8 × 8) × 7) imposter scores, and (iv) the ZJU dataset yields 118,656 (824 × 12 × 12) genuine and 97,653,888 (824 × 823 × 12 × 12) imposter scores. The 7 ISPFDv2 scenarios consist of different background, illumination, and resolution variations (white background & indoor lighting, white background & outdoor lighting, natural background & indoor lighting, natural background & outdoor lighting, 5MP resolution, 8MP resolution, and 16MP resolution); for our evaluation, we combine each of these into a single dataset. Due to the very high number of possible imposter scores for ZJU, we limit the imposter scores computed to only include the first impression of each imposter fingerprint. This process results in 678,152 imposter scores out of the possible 97,653,888 scores. It is assumed for all experiments that the contactless fingerprints and contact-based impressions are the probe and enrollment images, respectively.

Table 3.5 provides the Equal Error Rate (EER) and TAR @ FAR=0.01% of C2CL on the different datasets. For comparison with previous methods, rather than implement the relevant SOTA approaches that have been proposed and risk under-representing those methods, we directly compare our approach to the results reported in each of the respective papers. In terms of EER, our method outperforms all the previous approaches in the verification setting. Not only does the individual performance of the minutiae and textural representations alone exceed that of the previous SOTA methods (in particular, even if we remove Verifinger, we still beat SOTA in all cases), the fusion performance attains matching accuracy (EER = 0.30% - 1.20%) which is much closer to contact-contact fingerprint matching [62]. Even in the most challenging cross-database evaluation (ZJU), C2CL attains competitive performance with contact-contact matching, demonstrating the generalizability of C2CL to unseen datasets.
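The operating points reported in this section (EER and TAR at a fixed FAR) can be computed directly from the genuine and imposter score sets described above; a minimal sketch, assuming score arrays are already available, is given below.

```python
import numpy as np

def tar_at_far(genuine, imposter, far=1e-4):
    """TAR at a fixed FAR (far=1e-4 corresponds to FAR = 0.01%): threshold at the
    (1 - far) quantile of the imposter scores, then count accepted genuine scores."""
    thr = np.quantile(imposter, 1.0 - far)
    return float(np.mean(np.asarray(genuine) >= thr))

def equal_error_rate(genuine, imposter):
    """Approximate EER via a sweep over candidate thresholds: the operating point
    where the false accept rate and false reject rate are (nearly) equal."""
    genuine, imposter = np.asarray(genuine), np.asarray(imposter)
    best = 1.0
    for thr in np.unique(np.concatenate([genuine, imposter])):
        far_ = np.mean(imposter >= thr)
        frr_ = np.mean(genuine < thr)
        best = min(best, max(far_, frr_))
    return float(best)

# Toy usage with synthetic, well-separated score distributions
rng = np.random.default_rng(3)
gen, imp = rng.normal(0.8, 0.1, 5000), rng.normal(0.2, 0.1, 50000)
print(tar_at_far(gen, imp), equal_error_rate(gen, imp))
```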
Note that we report the TAR @ FAR=0.01% only for C2CL since most of the prior approaches only report EER and none report TAR @ FAR=0.01%. Different from previous approaches, which train individual models on a train/test split for each evaluation dataset, we have trained just a single model for our evaluation across four different datasets. This protocol is actually more challenging than fine-tuning for each individual evaluation dataset: despite having a smaller number of training samples, higher verification performance can more easily be achieved by individually trained models. To support this claim, we have fine-tuned an additional model on just the PolyU dataset using the same train/test split specified in [146] and recorded the verification performance in Table 3.6. We observe that our accuracy improves on PolyU from 2.37% EER for the model trained on our full combination of training datasets to 1.90% EER for the model trained on just PolyU; however, because of the lower performance on the other three datasets, we can see that this model is indeed over-fit to PolyU.

Table 3.6 DeepPrint verification performance (fine-tuned on PolyU dataset only).

Test Dataset | EER (%) | TAR (%) @ FAR=0.01%
PolyU | 1.90 | 74.62
UWA | 8.35 | 35.42
ISPFDv2 | 3.87 | 57.10
ZJU | 2.99 | 68.99

3.4.4.1 Ablation study

We present an ablation study (Table 3.7) to fully understand the contribution of the main components of our algorithm; namely, segmentation, enhancement, 500 ppi frequency scaling, and TPS deformation correction. From the ablation, we notice there is a substantial improvement in both EER and TAR @ FAR=0.01% just from incorporating proper enhancement of the contactless images. In most cases, there is almost a 50% reduction in EER from including both contrast enhancement and binary pixel inversion. For brevity, the individual contribution of inverting the ridges of the contactless images, apart from contrast enhancement, is not shown in the table; for reference, the EER of DeepPrint on ZJU warped images with only contrast enhancement is 2.49%, compared to an EER of 2.08% on the warped images with both contrast enhancement and pixel inversion. Furthermore, we observe that for the smartphone-captured contactless fingerprints in the ISPFDv2 and ZJU datasets, there is a dramatic performance jump when incorporating our 500 ppi scaling network. Additionally, there is another noticeable improvement when incorporating the deformation correction branch of our STN, most notably for the ISPFDv2 dataset.

Since the ZJU dataset contains equal numbers of thumb and index fingers, whereas the majority of our training datasets contain mostly non-thumb fingers, we observed that the deformation correction is less beneficial on average for the ZJU dataset compared to ISPFDv2. In fact, from Table 3.9, we see that the EER on just the index fingers of ZJU is noticeably lower than the EER on thumbs. To investigate whether the lower performance on thumbs is a limitation of the available training data or whether thumbs require a different distortion correction from non-thumbs, we retrained separate warping models on thumb data only and non-thumb data only. The test results for the ZJU dataset are in Table 3.10. A couple of observations: (i) the performance (TAR @ FAR=0.01%) is highest for the model trained on both thumbs and non-thumbs, (ii) the model trained on non-thumbs performs slightly worse when applied to the test set of thumbs in the ZJU dataset, which indicates that the warping required for thumbs may be slightly different, and (iii) the performance of the thumb-only model decreases on both thumbs and non-thumbs due to the limited number of thumb training examples.
3.4.4.2 Multi-finger fusion verification

The final set of verification experiments investigates the effect of finger position and multi-finger fusion on the verification accuracy for the ZJU dataset. Table 3.9 shows the individual performance per finger position and the fusion of multiple fingers; namely, thumb only, index only, fusion of right thumb and right index, fusion of left thumb and left index, and four-finger fusion. The motivation for considering fusion of the thumb and index on each hand is that, from a usability standpoint, a user may be able to use their dominant hand when capturing their own fingerprints. Notably, when fusing multiple fingers (e.g., right index and left index), we obtain nearly perfect accuracy.

Table 3.7 Ablation study of C2CL using only Verifinger 12.0 for matching*. S = segmentation, E = enhancement, Ts = scaling, Td = deformation correction.

Dataset | Components | EER (%) | TAR (%) @ FAR = 0.01%
PolyU | S | 0.86 | 93.19
PolyU | S + E | 0.45 | 96.96
PolyU | S + E + Ts | 0.48 | 96.44
PolyU | S + E + Ts + Td | 0.46 | 97.20
UWA‡ | S | 6.62 | 91.05
UWA‡ | S + E | 6.81 | 92.56
ISPFDv2 | S | 13.76 | 23.93
ISPFDv2 | S + E | 7.83 | 38.53
ISPFDv2 | S + E + Ts | 2.02 | 93.3
ISPFDv2 | S + E + Ts + Td | 1.46 | 96.02
ZJU† | S | 3.35 | 82.8
ZJU† | S + E | 1.88 | 89.9
ZJU† | S + E + Ts | 0.90 | 96.97
ZJU† | S + E + Ts + Td | 0.79 | 96.86
* Ablation results for DeepPrint are not shown since only a single model was trained on the final E+S+Ts+Td images.
‡ We do not apply our STN here since these images are captured with a 3D scanner and are already unrolled and at a resolution of 500 ppi.
† Cross-database evaluation, i.e., not seen during training.

Table 3.8 DeepPrint ablation study*.

Method | ZJU EER (%)
DeepPrint [69] | 4.07
+ Finetune | 2.68
+ 512D | 2.64
+ Augmentations | 2.35
+ Adversarial Loss | 2.08
* Each row adds on to the previous row.

Table 3.9 Multi-finger fusion verification results on the ZJU dataset.

Finger Type | EER (%) | TAR (%) @ FAR = 0.01%
Thumb | 0.95 | 95.89
Index | 0.48 | 98.31
LT + LI | 0.00 | 99.77
RT + RI | 0.00 | 99.74
RT + LT | 0.00 | 99.80
RI + LI | 0.00 | 99.89

Table 3.10 Separate warping modules for thumbs vs. non-thumbs (TAR (%) @ FAR=0.01%).

Test Set | Trained on Thumbs and Non-Thumbs | Trained on Non-Thumbs | Trained on Thumbs
ZJU Non-Thumbs | 92.22 | 92.24 | 91.66
ZJU Thumbs | 80.66 | 77.50 | 76.80
ZJU All | 86.42 | 84.80 | 84.46

3.4.5 Search Experiments

For the identification (or search) experiments, we utilize the first impressions of both the contactless and contact-based fingerprints of the ZJU dataset. The contact-based fingerprints are placed in the gallery, which is augmented with 1.1 million fingerprint images from an operational forensic database [262]. The contactless fingerprint images serve as the probes. We note that our 1.1 million augmented gallery is significantly larger than any of the existing galleries used to evaluate contact-contactless fingerprint search and is more indicative of the real-world use case of cross fingerprint matching (e.g., in a National ID system like Aadhaar where a large gallery of contact-based fingerprints is already enrolled and used for de-duplication).

Table 3.11 Improvement in minutiae correspondence without and with warping correction on the ZJU dataset.

Condition | Avg. No. Paired Minutiae | Avg. No. Missing Minutiae | Avg. No. Spurious Minutiae | Goodness Index [195]
Without Warping | 28.06 | 67.95 | 71.13 | −0.0167
With Warping | 30.11 | 65.00 | 69.20 | −0.0157

We evaluate three different search algorithms on the ZJU augmented gallery: (i) Verifinger 1:N search, (ii) search via our DeepPrint texture matcher (scores s_t from Eq.
3.12 are computed between a given preprocessed contactless probe and all 1.1 million contact-based fingerprints in the gallery), and (iii) a two-stage search algorithm [69] where the DeepPrint texture scores are first used to retrieve the top-500 candidates, followed by a reordering using the 1:1 minutiae matching scores (s_m from Eq. 3.12) from Verifinger. The advantage of the two-stage search scheme is that it balances both speed and accuracy by utilizing the matching speed of DeepPrint to locate the first list of 500 candidates and the accuracy of Verifinger to further refine this list. From Table 3.12, we observe that Verifinger outperforms DeepPrint stand-alone, but at a search time against 1.1 million that is quite slow in comparison to DeepPrint. This motivates combining both approaches into the aforementioned two-stage search algorithm, which outperforms Verifinger at rank-1 and reduces the search time by 50 seconds. In short, our two-stage search algorithm obtains high levels of search accuracy on a large-scale gallery with significant search time savings.

Table 3.12 Search performance of the proposed matcher on the ZJU dataset with a gallery of 1.1 million.

Method | Search Time (s) | Rank-1 (%) | Rank-10 (%) | Rank-100 (%) | Rank-500 (%)
DeepPrint | 0.4 | 83.56 | 93.06 | 95.86 | 97.08
Verifinger 12.0 | 60.1 | 95.25 | 96.47 | 96.95 | 97.20
DeepPrint + Verifinger 12.0 | 10.5 | 95.49 | 96.10 | 96.95 | 97.08
DeepPrint + Verifinger 12.0 refers to indexing the top-500 candidates with DeepPrint and then re-sorting those 500 candidates using a fusion of the Verifinger and DeepPrint scores.
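A sketch of the two-stage search is given below, assuming the gallery's DeepPrint embeddings are precomputed and a `minutiae_scorer` callback (hypothetical, standing in for the Verifinger 1:1 comparison) is available; the equal fusion weights in the re-ranking step are an assumption.

```python
import numpy as np

def two_stage_search(probe_emb, gallery_embs, minutiae_scorer, k=500):
    """Stage 1: rank the full gallery by cosine similarity of DeepPrint embeddings and
    keep the top-k candidates. Stage 2: re-score the shortlist with a 1:1 minutiae
    matcher and re-sort by a fused score. Returns gallery indices, best match first."""
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    p = probe_emb / np.linalg.norm(probe_emb)
    texture_scores = g @ p                              # fast, fixed-length comparison
    shortlist = np.argsort(-texture_scores)[:k]
    fused = np.array([0.5 * texture_scores[i] + 0.5 * minutiae_scorer(i) for i in shortlist])
    return shortlist[np.argsort(-fused)]

# Toy usage: 10,000 gallery embeddings and a random stand-in for the minutiae matcher
rng = np.random.default_rng(4)
gallery = rng.normal(size=(10_000, 512))
probe = gallery[42] + 0.05 * rng.normal(size=512)       # a noisy copy of gallery entry 42
ranked = two_stage_search(probe, gallery, minutiae_scorer=lambda i: rng.uniform(0, 1), k=500)
print(ranked[:5])
```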
3.4.6 Segmentation Evaluation

A successful segmentation algorithm for contactless fingerprint images must not only reliably detect the distal phalange of the contactless fingerprint but also be robust to the varying illumination, background, and resolution that are expected to occur in highly unconstrained capture environments.

Table 3.13 Intersection Over Union (IOU) for Segmentation S(·).

Method | IOU
Baseline [158] | 0.747
Proposed | 0.899

The method by Malhotra et al. [158] performed well on the ISPFDv2 dataset using certain hyperparameters that were fit to this particular dataset; however, the authors did not evaluate it on unseen datasets. In contrast, our algorithm requires no hyperparameter tuning and still performs well across a variety of different evaluation datasets, both seen and unseen. Table 3.13 gives a comparison on the unseen ZJU dataset between our method and our implementation of the baseline approach of Malhotra et al., which was trained on the ISPFDv2 dataset. For this evaluation, we manually marked the first contactless fingerprint image of each unique finger in the ZJU dataset with ground truth segmentation masks of the distal phalange and then computed the Intersection Over Union (IOU) metric between the predicted segmentation masks of our algorithm and our implementation of the benchmark algorithm in [158]. Our method does not require any hyperparameter tuning and still achieves higher IOU compared to [158]. A qualitative analysis of our segmentation network (see Figure 3.5) shows our algorithm is robust to varying illumination, background, and resolution and generalizes across multiple datasets of contactless fingerprints. However, as seen in Figure 3.5 (b), the network may still fail in extremely challenging background and illumination settings.

An additional consideration, which is of importance for real-time deployment, is the processing speed of the segmentation network. Our segmentation algorithm is extremely fast compared to existing methods, requiring just 12.6 ms to segment a (900 × 1200) resolution image. In contrast, our parallel implementation of the baseline approach of Malhotra et al. requires 3 s per image.

3.5 Discussion

Despite the low error rates achieved across each dataset, there are many factors that complicate the cross-matching performance and lead to both type I (false reject) and type II (false accept) errors. Many of the type I and type II errors are attributed to a failure to correctly segment and scale only the distal phalange of the input contactless fingerprint. Incorrect segmentation can lead to large amounts of the image containing background rather than the relevant fingerprint region. Other errors can be attributed to the inherent low contrast of the contactless fingerprints, despite any effort of contrast or resolution enhancement. The only way to mitigate these types of failures is to include a quality assurance algorithm at the point of capture of the contactless fingerprint images. Lastly, minimal overlap in the fingerprint ridge structure between genuine probe and gallery fingerprint images is the cause of many false rejections, whereas very similar ridge structure between imposter fingerprint pairs leads to a number of false accepts. This challenge is present in contact-contact matching; however, it is exaggerated in C2CL because of the unconstrained pose variance of the finger in 3D space.

The potential for greater variance in the capture conditions when capturing contactless fingerprint images necessitates more robust preprocessing to reliably match contactless fingerprints. Thus, performance will likely be markedly lower in unconstrained scenarios compared to highly controlled capture environments that employ dedicated hardware for the image acquisition, such as the PolyU and UWA datasets. However, C2CL has pushed the SOTA forward both in matching more unconstrained fingerphotos and the more constrained, dedicated-device captured contactless fingerprints. Additionally, one might consider acquiring multiple image views of the same finger to build a complete 3D model of the finger to guide the preprocessing stage; however, this would add additional computational costs and latency to the acquisition process. Furthermore, in some capture scenarios, certainly the setup employed by our capture app, this process may be ergonomically challenging for the user.

As highlighted in the ablation study of Table 3.7, most of the improvement in interoperability between contactless and contact-based fingerprints is due to appropriate 500 ppi scaling of the contactless prints; however, incorporating a deformation correction module is also shown, with statistical significance, to further improve the compatibility. (The Mann-Whitney rank test [55] was used to compute the statistical significance between the ROC curves of S+E+Ts and S+E+Ts+Td; for all four datasets, the p-value is smaller than 0.05, indicating that the difference is statistically significant enough to reject, with 95% confidence, the hypothesis that the two curves are similar.) Figure 3.7 aims to highlight this fact through an overlay of the fingerprint ridge structure of one pair of corresponding contact and contactless fingerprints before and after applying the deformation correction.

Figure 3.7 Comparison of ridge overlap (a) without and (b) with the unwarping module. Use of the unwarping module results in better ridge alignment between contactless and contact-based images.
Table 3.14 DeepPrint performance pre-trained with the NIST N2N [78] dataset vs. the longitudinal dataset referenced in Yoon and Jain [262].

Dataset | Pretrained on NIST N2N Dataset (publicly available): EER (%) / TAR (%) @ FAR = 0.01% | Pretrained on Longitudinal Dataset: EER (%) / TAR (%) @ FAR = 0.01%
PolyU | 2.04 / 71.30 | 2.37 / 72.07
UWA | 5.62 / 56.99 | 5.29 / 83.40
ISPFDv2 | 2.60 / 81.83 | 2.33 / 84.33
ZJU† | 3.09 / 77.92 | 2.08 / 86.42
† Cross-database evaluation, i.e., not seen during training.

Additionally, Table 3.11 shows the average number of paired minutiae, missing minutiae, spurious minutiae, and Goodness Index (GI) [195] without and with the warping correction on the ZJU dataset. The GI, ranging from -1 to 3, is a combined measure of paired, missing, and spurious minutiae. The warping module improved the GI by 5.99%. Thus, the improved alignment indubitably leads to better minutiae-based and texture-based matching, as verified by our experiments.

Lastly, in order to utilize a large CNN, such as DeepPrint, for the task of contact-contactless fingerprint matching, we leveraged a large dataset [262] from the related domain of contact-contact fingerprint matching to pretrain our DeepPrint network. Since this dataset is not currently publicly available, we have repeated the verification experiments when pretraining DeepPrint on the publicly available NIST N2N dataset [78] (see Table 3.14). Due to the smaller dataset, we experience a slight degradation in the DeepPrint performance on some of the evaluation datasets; however, further data augmentation and incorporation of other publicly available datasets can be used to improve the performance.

3.6 Computational Efficiency

Our system architecture consists of a variety of deep networks (segmentation network, deformation correction and scaling STN, DeepPrint CNN feature extractor) and a minutiae feature extractor. The inference speeds of the segmentation network, STN, and DeepPrint are approximately 12.6 ms, 6.2 ms, and 26.3 ms using a single NVIDIA GeForce RTX 2080 Ti GPU and 143.8 ms, 19.5 ms, and 120.2 ms on an Intel Core i7-8700X CPU @ 3.70GHz, respectively. The Verifinger 12.0 feature extractor requires 600 ms on an Intel Core i7-8700X. In total, the inference speed of the end-to-end network is ≈ 643.6 ms with an NVIDIA GeForce RTX 2080 Ti GPU or ≈ 883.5 ms on an Intel Core i7-8700X CPU.

The deep network components of our algorithm are capable of very fast inference per input image; however, the system as a whole consumes a large amount of memory (400 MB). To fit into a resource-constrained environment, such as a mobile phone, further optimization of the system architecture can easily be implemented with very little, if any, performance drop. First, the intermediate step of generating a scaled image prior to the deformation correction is not required for deployment and was only included for the ablation study; instead, we can remove the affine transformation layer of our STN and directly scale and warp the input images in one step. As it stands, the main components of the algorithm, DeepPrint and Verifinger, require ≈ 1 s and ≈ 1.2 s on a mobile phone (Google Pixel 2), respectively. Thus, the inference time is estimated to be ≈ 2 seconds.
However, to further boost the speed, rather than rely on a COTS system for minutiae extraction and matching, we can directly use the minutiae sets output by DeepPrint together with a computationally efficient minutiae matcher, such as MSU's Latent AFIS Matcher [31], to obtain the minutiae match scores. Porting the model with these optimizations to a mobile phone remains a point of future work.

3.7 Conclusion and Future Work

In this chapter, we have presented an end-to-end system for matching contactless fingerprints (i.e., finger photos) to contact-based fingerprint impressions that significantly pushes the SOTA in contact-contactless fingerprint matching closer to contact-contact fingerprint matching. In particular, our contact to contactless matcher achieves less than 1% EER across multiple datasets employing a variety of contactless and contact-based acquisition devices with varying background, illumination, and resolution settings. Critical to the success of our system is our extensive preprocessing pipeline consisting of segmentation, contrast enhancement, 500 ppi scale normalization, deformation correction, and our adaptation of DeepPrint for contact-contactless matching. Our cross-database evaluations and large-scale search experiments are more rigorous than the evaluations reported in the open literature, and they enable us to confidently demonstrate a step toward a contact-contactless fingerprint matcher that is comparable to SOTA contact-contact fingerprint matching accuracy. The following chapter aims to further improve the sensor interoperability of fingerprint recognition systems by focusing on the diversity of fingerprint representations used for matching.

3.8 Acknowledgment

This material is based upon work supported by the Center for Identification Technology Research and the National Science Foundation under Grant No. 1841517. The authors would like to thank Debayan Deb for his help in developing the mobile phone contactless fingerprint capture application, Dr. Eryun Liu's research group at Zhejiang University for overseeing the data collection effort for this project, and the various research groups who have shared the datasets that were used in this study.

CHAPTER 4
UNIVERSAL FINGERPRINT REPRESENTATION VIA MULTIMODEL EMBEDDINGS

This chapter aims to improve the generalization capability of fingerprint recognition across a wide range of fingerprint sensors and cross-domain applications (contact to contactless, latent to rolled, etc.) by leveraging complementary features derived from multiple state-of-the-art deep learning architectures in a single architecture. The proposed architecture, AFR-Net (Attention-Driven Fingerprint Recognition Network), outperforms several baseline models, including a SOTA commercial fingerprint system by Neurotechnology, Verifinger v12.3, across intra-sensor, cross-sensor, and latent to rolled fingerprint matching datasets. Additionally, a novel realignment strategy using local embeddings extracted from intermediate feature maps within the networks is proposed to refine the global embeddings in low-certainty situations, which boosts the overall recognition accuracy. This realignment strategy requires no additional training and can be applied as a wrapper to any existing deep learning network (attention-based, CNN-based, or both) to boost its performance in a variety of computer vision tasks.
4.1 Introduction

This chapter was previously published as S. A. Grosz and A. K. Jain, "AFR-Net: Attention-Driven Fingerprint Recognition Network", IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 6, pp. 30-42, 2023. Copyright 2023 by IEEE. Reprinted with permission.

Automated fingerprint recognition systems have continued to permeate many facets of everyday life, appearing in many civilian and governmental applications over the last several decades [160]. As an example, India's Aadhaar civil registration system is used to authenticate approximately 70 million transactions per day, primarily with fingerprints (https://uidai.gov.in/aadhaar_dashboard/auth_trend.php). Due to the impressive accuracy of fingerprint recognition algorithms (0.626% False Non-Match Rate at a False Match Rate of 0.01% on the FVC-ongoing 1:1 hard benchmark [63]), researchers have turned their attention to addressing difficult edge cases where accurate recognition remains challenging, such as partial overlap between two candidate fingerprint images and cross-sensor interoperability (e.g., optical to capacitive, contact to contactless, latent to rolled fingerprints, etc.), as well as other practical problems like template encryption, privacy concerns, and matching latency for large-scale (gallery sizes on the order of tens or hundreds of millions) identification.

Figure 4.1 Example correspondence between local features extracted from the intermediate feature maps of our AFR-Net model for two images of the same finger. Note, these local features are not necessarily the same as minutiae points, which are commonly used in fingerprint recognition.

For many reasons, some of which are mentioned above (e.g., template encryption and latency), methods for extracting fixed-length fingerprint embeddings using various deep learning approaches have been proposed. Some of these methods were proposed for specific fingerprint-related tasks, such as minutiae extraction [51, 234] and fingerprint indexing [27, 218], whereas others were aimed at extracting a single "global" embedding [68, 140, 143]. Of these methods, the most common architecture employed is the convolutional neural network (CNN), often utilizing domain knowledge (e.g., minutiae [68]) and other tricks (e.g., specific loss functions, such as the triplet loss [61]) to improve fingerprint recognition accuracy. More recently, motivated by the success of attention-based Transformers [241] in natural language processing, the computer vision field has seen an influx of the use of the vision transformer (ViT) architecture for various computer vision tasks [38, 64, 104, 149]. In fact, two studies have already explored the use of ViT for learning discriminative fingerprint embeddings [95, 230], albeit with the following limitations: i.) the authors of [230] supervised their ViT model using a pretrained CNN as a teacher model and thus did not give the transformer architecture the freedom to learn its own representation, and ii.) the authors of [95] were limited in the data and choice of loss function used to supervise their transformer model, thereby limiting the fingerprint recognition accuracy compared to the baseline ResNet50 model. Nonetheless, the authors in [95] did note the complementary nature of the features learned by the CNN-based ResNet50 model and the attention-based ViT model.
This motivated us to evaluate additional attention-based models that bridge the gap between purely CNN-based and purely attention-based models, in order to leverage the benefits of each. Toward this end, we evaluate two ViT variants (vanilla ViT [64] and Swin [149]) along with two variants of a CNN model [106] (ResNet50 and ResNet101) for fingerprint recognition. In addition, we propose our own architecture, AFR-Net (Attention-Driven Fingerprint Recognition Network), consisting of a shared feature extraction and parallel CNN and attention classification layers.

Even though these models are trained to extract a single, global embedding representing the identity of a given fingerprint image, we make the observation that for both CNN-based and attention-based models, the intermediate feature maps encode local features that are also useful for relating two candidate fingerprint images. Correspondence between these local features can be used to guide the network in placing attention on overlapping regions of the images in order to make a more accurate determination of whether the images are from the same finger. Additionally, these local features are useful in explaining the similarity between two candidate images by directly visualizing the corresponding keypoints, as shown in Figure 4.1.

One remaining concern with regard to deep learning-based fingerprint matchers is their generalization across different fingerprint sensing technologies (e.g., optical, capacitive, etc.), fingerprint readers (e.g., CrossMatch, GreenBit, etc.), and fingerprint impression types (e.g., rolled, plain, contactless, etc.). This problem is often referred to as sensor interoperability, which has received some attention in recent years [4, 142, 152, 203]. In this paper, we demonstrate the generalizability of our learned representations via extensive experiments across a wide range of fingerprint sensors and types. As we show in the ablation study in section 4.4.5, much of the challenge of sensor interoperability is mitigated by training on a large, diverse training dataset; however, additional performance gains are achieved by incorporating both of the complementary CNN and attention-based features into our network.

Figure 4.2 Overview of the AFR-Net architecture. First, input fingerprint images are passed through a spatial alignment module for better alignment of two fingerprints under comparison, then passed through a shared feature extraction, followed by two classification heads (one CNN-based and the other attention-based). For our implementation, we followed the ResNet50 architecture as our backbone and CNN classification head and used 12 multi-headed attention transformer encoder blocks for the attention-based classification head.

More concisely, the contributions of this research are as follows:

• Analysis of various attention-based architectures for fingerprint recognition.

• A novel architecture for fingerprint recognition, AFR-Net, which incorporates attention layers into the ResNet architecture.

• State-of-the-art (SOTA) fingerprint recognition performance (authentication and identification) across several diverse benchmark datasets, including intra-sensor, cross-sensor, contact to contactless, and latent to rolled fingerprint matching.

• Novel use of local embeddings extracted from intermediate feature maps to improve both the recognition accuracy and the explainability of the model.
• Ablation analysis demonstrating the importance of each aspect of our model, including the choice of loss function, training dataset size, use of a spatial alignment module, use of both classification heads, and use of local embeddings to refine the global embeddings.

4.2 Related Work

Here we briefly discuss the prior literature on deep learning-based fingerprint recognition and the use of vision transformer models for computer vision. For a more in-depth discussion of these topics, refer to one of the many survey papers available (e.g., [172] for deep learning in biometrics and [128] for the use of transformers in vision).

4.2.1 Deep Learning for Fingerprint Recognition

Over the last decade, deep learning has seen a plethora of applications in fingerprint recognition, including minutiae extraction [51, 234], fingerprint indexing [27, 218], presentation attack detection [41, 72, 92, 240], synthetic fingerprint generation [71, 96, 129, 251], and fixed-length fingerprint embeddings for recognition [68, 140, 143]. For the purposes of this paper, we limit our discussion to fixed-length (global) embeddings for fingerprint recognition.

Among the first studies on extracting global fingerprint embeddings using deep learning was that of Li et al. [140], which used a fully convolutional neural network to produce a final embedding of 256 dimensions. The authors of [68] then showed improved performance of their fixed-length embedding network by incorporating minutiae domain knowledge as an additional supervision. Similarly, Lin and Kumar incorporated additional fingerprint domain knowledge (minutiae and core point regions) into a multi-Siamese CNN for contact to contactless fingerprint matching [143]. More recently, [230] and [95] proposed the use of the vision transformer architecture for extracting discriminative fixed-length fingerprint embeddings, and both showed that incorporating minutiae domain knowledge into ViT improved the performance.

4.2.2 Vision Transformers for Biometric Recognition

Transformers have led to numerous applications across the computer vision field in the past couple of years since they were first introduced for computer vision applications by Dosovitskiy et al. in 2021 [64]. The general principle of transformers for computer vision is the use of the attention mechanism for aggregating sets of features across the entire image or within local neighborhoods of the image. The notion of attention was originally introduced in 2015 for sequence modeling by Bahdanau et al. [10] and has since been shown to be a useful mechanism in general for operations on a set of features. Today, numerous variants of ViT have been proposed for a wide range of computer vision tasks, including image recognition, generative modeling, multi-modal tasks, video processing, low-level vision, etc. [128]. Some recent works have explored the use of transformers for biometric recognition across several modalities, including face [269], finger vein [115], fingerprint [95, 230], ear [3], gait [54], and keystroke recognition [225]. In this work, we improve upon these previous uses of transformers by evaluating additional attention-based architectures for extracting global fingerprint embeddings.

4.3 AFR-Net: Attention-Driven Fingerprint Recognition Network

Our approach consists of i.) investigating several baseline CNN and attention-based models for fingerprint recognition, ii.) fusing a CNN-based architecture with attention into a single model to leverage the complementary representations of each, iii.)
a strategy to use intermediate local feature maps to refine the global embeddings and reduce uncertainty in challenging pairwise fingerprint comparisons, and iv.) the use of a spatial alignment module to improve recognition performance. Details of each component of our approach are given in the following sections.

4.3.1 Baseline Methods

First, we improve on the initial studies [95, 230] applying ViT to fingerprint recognition in order to better establish a fair baseline performance of ViT compared to CNN-based models. This is accomplished by removing the limitations of the previous studies in terms of the choice of supervision and the size of the training dataset used to learn the parameters of the models. We then compare ViT with two variants of the ResNet CNN-based architecture, ResNet50 and ResNet101. For our specific choice of ViT, we decided on the small version with a patch size of 16, 6 attention heads, and a layer depth of 12. We selected this architecture as it presents an adequate trade-off between speed and accuracy compared to other ViT variants. In addition, we compare the performance of a popular ViT successor, Swin, which uses a hierarchical structure and shifted windows for computing attention within local regions of the image. Specifically, we used the small Swin architecture with a patch size of 4, a window size of 7, and an embedding dimension of 96.

For additional baseline comparisons with previous methods, we included the latest version of the commercial off-the-shelf (COTS) fingerprint recognition system from Neurotechnology, Verifinger v12.3 (https://neurotechnology.com/verifinger.html), and DeepPrint [68], a fingerprint recognition network based on an Inceptionv4 backbone that incorporates fingerprint domain knowledge into the learning framework. According to the FVC On-going competition, Verifinger is the top performing algorithm on the 1:1 fingerprint verification benchmark [63], and DeepPrint has also shown competitive performance with Verifinger on some benchmark datasets [68].

4.3.2 Proposed AFR-Net Architecture

Based on previous research suggesting the complementary nature of ViT and ResNet embeddings, we were motivated to merge the two into a single architecture, referred to as AFR-Net. As shown in Figure 4.2, AFR-Net consists of a spatial alignment module, a shared CNN feature encoder, a CNN classification head, and an attention classification head. The shared alignment module and feature encoder greatly reduce the number of parameters compared to the fusion of two separate networks and also allow the two classification heads to be trained jointly. Due to the two classification heads, we have two bottleneck classification layers which map each of the 384-d embeddings, Z_c and Z_a, into a softmax output representing the probability of a sample belonging to one of N classes (identities) in our training dataset. We employ the Additive Angular Margin (ArcFace) loss function to encourage intra-class compactness and inter-class discrepancy of the embeddings of each branch [56]. Through an ablation study, presented in section 4.4.5, we find that despite the relatively little use of this loss function in previous fingerprint recognition papers [95, 187], the ArcFace loss function makes an enormous difference in the performance of our model.
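For readers unfamiliar with ArcFace, the following is a minimal PyTorch sketch of an additive angular margin classification head of the kind used to supervise each branch; the 384-d embedding size and the margin of 0.5 follow the text, while the number of classes, the scale factor, and the weight initialization are illustrative assumptions rather than the exact implementation.

# Minimal sketch of an ArcFace (additive angular margin) classification head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    def __init__(self, embed_dim=384, num_classes=10000, margin=0.5, scale=64.0):
        super().__init__()
        # One weight vector per identity (class) in the training set.
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.margin, self.scale = margin, scale

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalized embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cosine.size(1)).bool()
        # Add the angular margin only to the target-class logit, then scale.
        logits = torch.where(target, torch.cos(theta + self.margin), cosine)
        return F.cross_entropy(self.scale * logits, labels)

# Hypothetical usage: loss for a batch of 8 embeddings and identity labels.
# loss = ArcFaceHead()(torch.randn(8, 384), torch.randint(0, 10000, (8,)))

The margin pushes same-identity embeddings into a tighter angular cone around their class weight, which is what drives the intra-class compactness and inter-class separation discussed above.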
4.3.3 Global Embedding Refinement via Local Embeddings

As noted in the introduction, and demonstrated in Figure 4.1, we find that the intermediate feature maps of our AFR-Net model (and, in general, of the other deep learning models evaluated in this work) encode local descriptors (i.e., embeddings) of the input images. These local descriptors can be matched between two fingerprint images and used to compute a correspondence between similar regions. Given the high accuracy of these local embeddings in locating corresponding points of interest between two images, we devise a strategy to use these corresponding regions of interest as a form of hard attention for the model to refine the global embeddings based on just the overlapping regions present in both images.

Some examples of this process are demonstrated in Figure 4.3, where the correspondence between local embeddings is used to compute an affine transformation between the image pairs. Then, the non-overlapping fingerprint regions are masked and each image is presented to the network a second time to yield a new set of embeddings. Finally, a second similarity score between the masked images is computed via a cosine similarity between the new embeddings. The similarity between the masked regions is combined via a weighted sum with the similarity score obtained from the original images to obtain a final similarity score.

For ResNet50, ResNet101, and AFR-Net, we take the last output of the Conv4 layer as our local embeddings, which has dimensions of 14×14×1024. For ViT and Swin, we take the final patch embeddings at the output of the last attention layer as the local embeddings, which have dimensions of 14×14×384. In all cases, each of these 196 local descriptors corresponds to a single 16×16 patch of the input fingerprint image. We assign the center of each patch as the keypoint associated with the corresponding local embedding when computing the correspondence points between two fingerprint images.

Admittedly, computing the correspondence between the sets of local descriptors of two images is time consuming, especially when performing a brute-force exhaustive search to establish a 1:1 correspondence between matched descriptors. For this reason, we only employ the re-weighting strategy in low-certainty scenarios (when the similarity score is close to the match threshold) to keep the amortized latency of our algorithm approximately the same as without the re-weighting process; that is, we only utilize the local descriptors if the similarity score between the original global embeddings falls within a specified range [s_l, s_h]. Values of 0.3 and 0.6 for s_l and s_h, respectively, were empirically determined on our validation dataset to work well across all models.

Figure 4.3 Example genuine (a and b) and imposter (c and d) pairs from FVC 2002 DB1A before and after realignment, with corresponding similarity scores from AFR-Net. Both genuine scores are pushed above the FAR=0.1% threshold of 0.36, whereas both imposter scores remained below the threshold.
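To make the correspondence step concrete before the full matching procedure in Algorithm 4.1, here is a minimal sketch of matching two 14×14 grids of local descriptors and estimating an alignment from the matched patch centers. The function names, the mutual-nearest-neighbor rule, and the OpenCV RANSAC homography are illustrative assumptions, not the dissertation's exact implementation.

# Minimal sketch: brute-force matching of local patch embeddings + homography estimation.
import numpy as np
import cv2

def match_local_embeddings(feat1, feat2, patch=16):
    # feat1, feat2: (14, 14, C) local descriptors; keypoint = center of each 16x16 patch.
    h, w, c = feat1.shape
    d1 = feat1.reshape(-1, c).astype(np.float32)
    d2 = feat2.reshape(-1, c).astype(np.float32)
    d1 /= np.linalg.norm(d1, axis=1, keepdims=True) + 1e-8
    d2 /= np.linalg.norm(d2, axis=1, keepdims=True) + 1e-8
    sim = d1 @ d2.T                                   # cosine similarity matrix
    # Mutual nearest neighbors as tentative 1:1 correspondences.
    nn12, nn21 = sim.argmax(1), sim.argmax(0)
    idx1 = np.where(nn21[nn12] == np.arange(len(d1)))[0]
    idx2 = nn12[idx1]
    grid = np.stack(np.meshgrid(np.arange(w), np.arange(h)), -1).reshape(-1, 2)
    kp = grid * patch + patch // 2                    # (x, y) patch centers in pixels
    kp1, kp2 = kp[idx1].astype(np.float32), kp[idx2].astype(np.float32)
    if len(kp1) < 4:                                  # not enough matches for a homography
        return kp1, kp2, None
    # Robust alignment between the two images from the matched keypoints.
    M, inliers = cv2.findHomography(kp1, kp2, cv2.RANSAC, 5.0)
    return kp1, kp2, M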
Algorithm 4.1 Compute similarity between input fingerprint pairs with AFR-Net.

Constants: w1 := 0.2, w2 := 1 − w1, w3 := 0.5, w4 := 1 − w3, [s_l, s_h] := [0.3, 0.6]

1: procedure Match(I1, I2)
2:   Z_c^1, Z_a^1, L1 ← AFRNet(I1)
3:   Z_c^2, Z_a^2, L2 ← AFRNet(I2)
4:   s ← w1 (Z_c^1 · Z_c^2) + w2 (Z_a^1 · Z_a^2)
5:   if s_l ≤ s ≤ s_h then
6:     kp1, kp2 ← getCorr(I1, I2, L1, L2)
7:     M ← getHomography(kp1, kp2)
8:     if homographyOK(M) then
9:       I1′ ← M I1
10:      C1, C2 ← cropOverlap(I1′, I2)
11:      Z_c^1, Z_a^1, _ ← AFRNet(C1)
12:      Z_c^2, Z_a^2, _ ← AFRNet(C2)
13:      s′ ← w1 (Z_c^1 · Z_c^2) + w2 (Z_a^1 · Z_a^2)
14:      s ← w3 s + w4 s′
15: return s

Furthermore, if a valid homography between corresponding local regions cannot be computed (e.g., if the scale, rotation, and/or translation parameters exceed expected limits), we fall back to the original similarity score so as not to further degrade the comparison by computing a new set of embeddings from images which have been corrupted by poorly behaved transformation matrices. Figure 4.4 shows the genuine and imposter score distributions for our AFR-Net model on the FVC 2002 DB3A dataset, where we experienced the biggest increase in performance after re-weighting the predictions using this method. In Figure 4.4, we show (a) the original score distributions, (b) the scores computed on the refined embeddings which had well-behaved homography matrices, and (c) the fused score distributions after the weighted averaging. The full algorithm of this process is detailed in Algorithm 4.1.

Figure 4.4 Similarity score distributions for (a) original image embeddings, (b) embeddings after refinement, and (c) weighted average of original and refined embeddings. Similarity scores are computed with the AFR-Net model on the FVC 2002 DB3A dataset, where the TAR @ 0.1% FAR for the original embeddings is 98.43%, 91.32% for the refined embeddings, and 99.36% after the weighted score fusion.

There are likely more sophisticated, faster alternatives to a brute-force algorithm for computing the correspondence between sets of local embeddings and for aggregating the similarity scores between the matched local embeddings themselves. However, we leave those areas of exploration for future work, as the current algorithm improves the results significantly across all models and datasets and has the nice interpretation of giving the network a second glance at regions of interest in uncertain cases, much like a human fingerprint examiner would. Nonetheless, some suggestions for future extensions would include the use of a graph neural network (GNN) or an attention mechanism to more intelligently aggregate the sets of local descriptors between two images.

4.3.4 Spatial Alignment Module

As has been noted in the previous literature on fingerprint recognition [49, 68, 93, 107, 231], Spatial Transformer Networks [119] have been shown to be highly effective in aligning input fingerprints for improved recognition accuracy across a wide range of tasks (e.g., contact to contactless fingerprint matching, partial fingerprint recognition, etc.). That, coupled with our observation that the local descriptors used in our realignment procedure are not rotation invariant, motivated us to include a spatial alignment module in the architecture of our AFR-Net model and each of the baseline models. Lastly, in the ablation portion of our experimental section, we further emphasize the benefit of incorporating the spatial alignment module into our fingerprint recognition network.
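The following is a minimal PyTorch sketch of a spatial transformer-style alignment module of the kind referenced in Section 4.3.4: a small localization network predicts an affine transform which is then used to resample the input. The localization architecture, input size, and use of a full 2×3 affine matrix are illustrative assumptions, not AFR-Net's exact module.

# Minimal sketch of a spatial-transformer alignment module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAlignment(nn.Module):
    def __init__(self):
        super().__init__()
        # Localization network: predicts the 6 parameters of a 2x3 affine matrix.
        self.loc = nn.Sequential(
            nn.Conv2d(1, 16, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, 6),
        )
        # Initialize to the identity transform so training starts from "no alignment".
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)  # resampled (aligned) input

# aligned = SpatialAlignment()(torch.randn(2, 1, 448, 448))  # hypothetical input size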
4.3.5 Training Details

AFR-Net and all baseline models, excluding COTS Verifinger and DeepPrint, were trained with an ArcFace loss function with a margin of 0.5, a learning rate of 1e-4, a weight decay of 2e-5, and a polynomial learning rate decay function with a power of 3 and a minimum learning rate of 1e-5. All models were initialized using the pre-trained ImageNet weights made available by the open-sourced pytorch-image-models git repository [249]. The AFR-Net, ResNet101, and Swin models were trained with a batch size of 64 across four Nvidia GeForce RTX 2080 Ti GPUs, whereas the ResNet50 and ViT models were trained with a batch size of 128. AFR-Net, ResNet50, and ResNet101 were trained with the Adam optimizer [130], and ViT and Swin were trained with the AdamW optimizer [151]. The maximum number of epochs for all models was set to 75; however, the number of epochs trained for the final saved models varied based on the highest validation accuracy on a hold-out validation dataset.
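As a minimal sketch, the hyperparameters above can be wired up in PyTorch roughly as follows, assuming a placeholder model and approximating the polynomial decay with a LambdaLR schedule; the exact scheduler implementation used in the dissertation is not stated, so this is only one reasonable realization.

# Minimal sketch: Adam optimizer + polynomial LR decay (power 3, 1e-4 down to 1e-5).
import torch

model = torch.nn.Linear(384, 1000)          # placeholder for the embedding network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=2e-5)

max_epochs, base_lr, min_lr, power = 75, 1e-4, 1e-5, 3

def poly_decay(epoch):
    # Multiplier of base_lr that decays polynomially toward min_lr over max_epochs.
    frac = (1 - epoch / max_epochs) ** power
    return (min_lr + (base_lr - min_lr) * frac) / base_lr

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=poly_decay)

for epoch in range(max_epochs):
    # ... one training epoch over the aggregated dataset would go here ...
    scheduler.step()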
4.4 Experimental Results

In this section, we discuss the datasets used in this study, the authentication and identification results achieved by our method in comparison with the baseline methods, the latency and performance trade-offs between the methods, and an ablation analysis to highlight the contributions of the individual components of our algorithm.

4.4.1 Datasets

For training our models, we aggregate a large number of fingerprint datasets with diverse characteristics, ranging from rolled fingerprints [248, 262], plain (i.e., slap) fingerprints, a mixture of rolled and plain fingerprints [78], contactless (e.g., from mobile phone cameras) fingerprints [17, 53, 75], latent fingerprints (from the Michigan State Police (MSP) Latent Database), and synthetic fingerprints [71, 96, 251]. Example images from each of these datasets are shown in Figure 4.5. A small portion of the total training dataset was reserved for validation. In total, our aggregated training dataset contains 1.3M images for training and 3,814 images for validation. Further information regarding the number of unique fingers, images per dataset, and fingerprint type is given in Table 7.1.

Figure 4.5 Example image pairs from each of the datasets used in this paper. See Table 7.1 for details and source of each dataset.

Table 4.1 Fingerprint datasets used in this study; train and test splits are disjoint. The training datasets are MSP† [262] (37,411 fingers; 447,988 images), NIST SD 302 [78] (1,600 fingers; 20,008 images), MSU Self-Collection† (4,582 fingers; 57,813 images), PrintsGAN [71] (34,985 fingers; 524,775 images), SpoofGAN [96] (10,000 fingers; 150,000 images), the MSU Finger Photo and Slap Database [17], the IIT Bombay Touchless and Touch-based Database [53], ManTech Phase 2 [75], Synthetic Latent Prints [251], and NIST SD 4† [248]. The validation datasets are the MSU Finger Photo and Slap Database, MSP Latent† [262], and NIST SD 302 [78]. The test datasets are FVC 2002 DB1A, DB2A, and DB3A [155], NIST SD 14† [246], NIST SD 302 [78], NIST SD 27† [83], the PolyU Contactless 2D to Contact-based 2D Database [145], and the ZJU Finger Photo and Touch-based Database [93]. † Not publicly available. NIST SD 4, NIST SD 14, and NIST SD 27 were publicly available but were later removed from the public domain by NIST. The MSP and MSP Latent databases are operational forensic datasets which cannot be released for privacy reasons and per our NDA with the Michigan State Police (MSP).

Our evaluation datasets are just as diverse as our training datasets and include challenging scenarios such as contact to contactless fingerprint matching [93, 145], varying sensor types for both rolled and slap prints (e.g., optical, capacitive, thermal swipe, etc.) [155, 246], latent to rolled fingerprint matching [83], and even rolled to plain fingerprint matching (as is the case in NIST SD 302 [78]). We have reserved 200 of the 2,000 unique fingers in NIST SD 302 for testing; these 200 fingers are completely disjoint from the fingers used in our training and validation partitions.

4.4.2 Authentication Results

We report the authentication performance of our method across 11 different evaluation datasets of varying characteristics. The results are given in Table 4.2 as the true accept rate (TAR) at a false accept rate (FAR) of 0.01% (FAR=0.1% in the case of the FVC datasets in order to follow the established protocols). Aside from the established protocol on the FVC datasets, we compute all possible genuine and imposter pairs for our evaluations.

According to the results in Table 4.2, AFR-Net outperforms all baseline methods on 9 out of the 11 datasets and shows competitive performance on the two datasets where it comes in second place (99.96% vs. 100% and 99.36% vs. 99.54% for FVC 2002 DB2A and DB3A, respectively). We show especially impressive performance in cross-sensor matching (TAR=96.11% on NIST SD 302) and contact to contactless matching (TAR=98.73% and TAR=98.70% on the PolyU and ZJU datasets, respectively), as well as latent to rolled fingerprint matching on the challenging NIST SD 27 dataset, where our method outperforms COTS Verifinger v12.3 (TAR=63.18% vs. TAR=61.63%). AFR-Net, and even our baseline ResNet and ViT variants, show substantial improvement over previous fixed-length, global representation networks for fingerprint recognition. For example, DeepPrint, one of the top performing models in the open literature, achieves a TAR of 98.55% on NIST SD 14, compared to 99.93% for AFR-Net.

For all the methods, we show improved performance when using the local embeddings to realign the images as a way to refine the global embeddings and improve the resulting similarity scores. The performance improvement was most pronounced for datasets with frequent partial fingerprints, such as FVC 2002 DB3A and DB1A. For example, the average performance across all the methods on FVC 2002 DB3A improved from 94.46% to 96.26%, a 32.5% reduction in error. Intuitively, this realignment process has the effect of slightly improving the similarity scores between borderline genuine fingerprint pairs by forcing the network to focus on overlapping regions in the images, and it does not appreciably affect the borderline imposter scores.

Comparing just the CNN-based models (ResNet50 and ResNet101) to the attention-based models (ViT and Swin), the performance in terms of matching accuracy is quite comparable; however, in terms of the number of parameters, ViT and Swin have substantially smaller footprints. For the most part, Swin outperformed ViT in terms of accuracy across many of the datasets, but it has more than twice the parameters and roughly three times the latency of ViT, making it perhaps not as preferable in some situations.
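For reference, the TAR at a fixed FAR operating point reported throughout this section can be computed from genuine and imposter score sets as in the short sketch below; this is the standard computation, not code from the dissertation, and the scores shown are placeholders.

# Minimal sketch: TAR at a fixed FAR from genuine and imposter similarity scores.
import numpy as np

def tar_at_far(genuine, imposter, target_far=1e-4):
    # Threshold = smallest score whose imposter pass rate does not exceed target_far.
    imposter = np.sort(np.asarray(imposter))
    k = int(np.ceil(len(imposter) * (1.0 - target_far)))
    threshold = imposter[min(k, len(imposter) - 1)]
    tar = float(np.mean(np.asarray(genuine) >= threshold))
    return tar, threshold

# Hypothetical scores (2,800 genuine and 4,950 imposter pairs, as in the FVC protocol).
rng = np.random.default_rng(1)
tar, thr = tar_at_far(rng.normal(0.7, 0.1, 2800), rng.normal(0.2, 0.1, 4950), 1e-3)
print(f"TAR @ FAR=0.1%: {100 * tar:.2f}% (threshold {thr:.3f})")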
Table 4.2 Authentication (1:1 comparison) results. TAR (%) is reported at FAR=0.1% for the FVC 2002 datasets (following the FVC protocol of 2,800 genuine pairs and 4,950 imposter pairs) and at FAR=0.01% for the remaining datasets. Inference speed is computed on an Nvidia GeForce RTX 2080 Ti. † denotes re-weighting using the local embeddings.

Model | Params. (M) | Inference Speed (ms) | FVC 2002 DB1A | FVC 2002 DB2A | FVC 2002 DB3A | NIST SD 14 | NIST SD 302 | NIST SD 27 | PolyU | ZJU
Verifinger v12.3 | N/A | 600 | 99.96 | 99.86 | 99.54 | 98.79 | 93.26 | 61.63 | 95.39 | 96.88
DeepPrint [68] | 78.81 | 17.53 | 95.32 | 94.64 | 69.93 | 98.55 | 84.01 | 24.81 | 72.07 | 86.42
ResNet50 | 62.21 | 4.34 | 99.50 | 99.93 | 93.96 | 99.93 | 94.70 | 53.88 | 96.94 | 98.28
ResNet101 | 81.20 | 7.58 | 99.57 | 99.61 | 94.5 | 99.93 | 93.79 | 56.59 | 96.48 | 97.71
ViT | 21.83 | 4.12 | 99.29 | 99.68 | 93.00 | 99.67 | 93.49 | 46.51 | 92.38 | 98.08
Swin | 52.69 | 11.66 | 99.75 | 99.79 | 92.43 | 99.89 | 95.46 | 44.96 | 96.15 | 98.33
AFR-Net | 85.02 | 8.42 | 99.86 | 99.96 | 98.43 | 99.93 | 95.46 | 63.18 | 98.23 | 98.68
ResNet50† | 62.21 | 10.37 | 99.86 | 100 | 95.54 | 99.93 | 95.40 | 53.88 | 97.60 | 98.04
ResNet101† | 81.20 | 14.19 | 99.75 | 99.75 | 96.11 | 99.89 | 94.62 | 56.59 | 97.15 | 97.78
ViT† | 21.83 | 10.11 | 99.54 | 99.75 | 95.04 | 99.74 | 94.25 | 46.51 | 94.08 | 98.29
Swin† | 52.69 | 19.00 | 99.86 | 99.82 | 95.25 | 99.89 | 95.70 | 44.96 | 95.95 | 98.07
AFR-Net† | 85.02 | 15.18 | 100 | 99.96 | 99.36 | 99.93 | 96.11 | 63.18 | 98.73 | 98.70

4.4.3 Identification Results

We used the NIST SD 27 latent fingerprint dataset and a gallery of 100K rolled fingerprints from the MSP fingerprint dataset to evaluate the closed-set identification (i.e., 1:N search) performance of our models; these 100K images are completely disjoint from the 448K fingerprint images from MSP used for training. According to the cumulative match characteristic (CMC) curve shown in Figure 4.7 and the identification performance at specific retrieval ranks given in Table 4.3, AFR-Net is competitive with Verifinger v12.3 and outperforms all the rest of the baseline methods by a substantial margin. The rank-1 accuracy of COTS Verifinger is 55.04%, compared to 53.10% for AFR-Net, but AFR-Net surpasses Verifinger at higher retrieval ranks. The next closest performing model was ResNet101, with a rank-1 accuracy of 44.96%.

Some example image retrievals when (a) the correct mate was returned at rank-1 and (b) when the correct mate was not returned in the top five candidates are shown in Figure 4.6. In the successful case, the latent probe image is of relatively high quality and is able to match with its corresponding mate with a similarity score far above the other returned matches. In the failure case, the latent image is of very poor quality and returns high similarity scores with other poor quality images in the gallery.

Figure 4.6 Example (a) successful and (b) unsuccessful search results for two NIST SD 27 latent probe fingerprints. In the successful case, the latent probe image is matched at rank-1 with its corresponding mate with a similarity score of 0.98; whereas, in the unsuccessful case, the poor quality latent image produces high similarity scores with other low quality images in the gallery and is not matched with the correct mate until rank 10 (with a similarity score of 0.84), not shown here since we are displaying only the top 5 retrievals.

Despite the impressive performance of our model compared to the baseline methods, we should note that latent fingerprint identification is an extremely challenging task that requires targeted segmentation, enhancement, and matching strategies to achieve SOTA performance, as is demonstrated in these prior latent identification studies [28, 31, 121, 187].
For our evaluation, we only used manual bounding box annotations to locate the latent fingerprints prior to matching, but we did not apply any other preprocessing or enhancement; thus, our performance could be further improved for latent to rolled fingerprint matching. Additionally, since we do not use minutiae or any other fingerprint domain knowledge in designing AFR-Net, our model may be at a disadvantage compared to the SOTA latent matchers, since minutiae have been shown to be a useful feature for matching very low quality latents [31]. Nonetheless, AFR-Net still performs reasonably well compared to Verifinger, which is also not intended for latent to rolled fingerprint matching but does incorporate some fingerprint domain knowledge (enhancement, minutiae, etc.).

Furthermore, we observed that the fusion of Verifinger v12.3 and AFR-Net leads to a significant boost in retrieval accuracy (rank-1 accuracy of about 64%, compared to 55.04% for Verifinger and 53.10% for AFR-Net). Still, there is room for improvement: according to Cao et al. [31], the SOTA rank-1 retrieval rate for NIST SD 27 against a gallery of 100K rolled fingerprints is 65.7%. We also evaluated the fusion of ResNet50 and ViT, which performed worse compared to using just AFR-Net (rank-1 retrieval rate of 49.61% vs. 53.10%). Thus, not only does incorporating both architectures into one model save on latency and model size, as is done in AFR-Net, it also leads to better fingerprint recognition performance than the fusion of the two individual models.

Figure 4.7 Cumulative match characteristic (CMC) curve with NIST SD 27 latent probes and a gallery of 100K rolled fingerprints plus the NIST SD 27 mated rolled fingerprint pairs.

Lastly, we evaluated our model's performance for rolled to rolled fingerprint search using NIST SD 14. Consistent with previous studies [68], we used the last 2,700 images from NIST SD 14 as probes and their corresponding mates, together with the same 100K rolled images from MSP, as the gallery. AFR-Net achieves a rank-1 retrieval rate of 99.78%, which is an improvement over the previous SOTA performance of 99.20% by DeepPrint [68].

Table 4.3 Closed-set identification performance on NIST SD 27 (rolled to latent comparison) at varying retrieval ranks (%) against a gallery of 100K rolled fingerprints.

Model | Rank 1 | Rank 5 | Rank 10 | Rank 20 | Rank 100
Verifinger v12.3 | 55.04 | 62.02 | 65.89 | 67.44 | 73.64
ResNet50 | 43.02 | 52.33 | 55.81 | 58.91 | 67.05
ResNet101 | 44.96 | 55.81 | 57.75 | 61.24 | 72.09
ViT | 38.37 | 44.57 | 48.06 | 50.78 | 58.91
Swin | 37.98 | 46.12 | 48.06 | 55.43 | 62.79
AFR-Net | 53.10 | 62.79 | 65.50 | 68.6 | 75.58

4.4.4 Latency

The inference speed of each method is given in Table 4.2, along with the number of parameters of each network. Of the models that we compared, the one with the least number of parameters
AFR-Net performed the best overall in terms of performance; however, AFR-Net does have a small added latency and increase in number of parameters compared to, for example, ResNet50. Still, the significant improvements in performance on many of the datasets seem to justify the added computational costs. Lastly, the realignment stage utilizing the local embeddings does incur some additional latency, which we will denote as 𝑡𝑅. For our implementation, the average value of 𝑡𝑅 is 29.36 ms. In addition to 𝑡𝑅, the realignment stage includes the time required for one additional inference time of the embedding network, 𝑡𝐼. However, since we only invoke the realignment stage for a fraction of the total comparisons, 𝑟, the amortized latency cost, 𝑡 𝐴, of the realignment is significantly lower, computed by the following equation: 𝑡 𝐴 = 𝑟 (𝑡𝑅 + 2𝑡𝐼) + (1 − 𝑟)𝑡𝐼 (4.1) 99 For example, with a specified range of [0.3, 0.6], the realignment process for AFR-Net is invoked 17.9% (r=0.179) of the time on average across all the datasets. Using the inference speed of AFR-Net from Table 5.7 of 8.42ms, the total latency of AFR-Net† (AFR-Net with realignment) is 15.18 ms per comparison. 4.4.5 Ablation Analysis In the abalation study of our AFR-Net model, we analyzed the effects of the loss function (cross entropy vs. ArcFace), training dataset size, use of a spatial transformer network (STN) for spatial alignment, number of classification heads, and our realignment strategy using the local feature embeddings. For the ablation on the training dataset size, we compared the performance of our algorithm when trained on only a subset of the full 1.3M training images. Specifically, we created the subset using only the publicly available fingerprint datasets, which included NIST SD 302, IIT Bombay Touchless and Touch-based, ManTech Phase 2, SpoofGAN, and PrintsGAN. This resulted in 760K training images, where 675K of these images are synthetic (from SpoofGAN and PrintsGAN). In comparison, our full training database consists of the same 675K synthetic images plus an additional 540K real fingerprint images. The results of the ablation study are given in Table 5.9. The largest increase in performance is attributed to the use of ArcFace loss rather than a cross entropy loss for supervision. Interestingly, training with ArcFace loss on a subset of only publicly available training data (85K real fingerprints + 675K synthetic compared to our full dataset of 540K real fingerprints + 675K synthetic) achieves competitive recognition performance across all datasets, where the benefit of additional data is most evident in the cross-sensor and latent matching scenarios. Even further improvements were obtained with the incorporation of the spatial alignment network. Finally, we noticed consistent improvements across all evaluation datasets with applying our realignment strategy, especially in the more challenging datasets such as NIST SD 302 and FVC 2002 DB3A, which have many partially overlapping fingerprints. 100 Table 4.4 Ablation study for AFR-Net. Loss # Imgs. STN Backbone Realign TAR (%) @ FAR=0.1%∗ FVC 2002 DB2A DB1A DB3A No No 760K 96.93 CNN+Attn Cross Entropy CNN+Attn ArcFace CNN+Attn ArcFace CNN only ArcFace Attn only ArcFace CNN+Attn ArcFace ArcFace CNN+Attn ∗ Following the FVC protocol of 2,800 genuine pairs and 4,950 imposter pairs. 
4.4.5 Ablation Analysis

In the ablation study of our AFR-Net model, we analyzed the effects of the loss function (cross entropy vs. ArcFace), the training dataset size, the use of a spatial transformer network (STN) for spatial alignment, the number of classification heads, and our realignment strategy using the local feature embeddings. For the ablation on the training dataset size, we compared the performance of our algorithm when trained on only a subset of the full 1.3M training images. Specifically, we created the subset using only the publicly available fingerprint datasets, which included NIST SD 302, IIT Bombay Touchless and Touch-based, ManTech Phase 2, SpoofGAN, and PrintsGAN. This resulted in 760K training images, 675K of which are synthetic (from SpoofGAN and PrintsGAN). In comparison, our full training database consists of the same 675K synthetic images plus an additional 540K real fingerprint images.

The results of the ablation study are given in Table 4.4. The largest increase in performance is attributed to the use of the ArcFace loss rather than a cross entropy loss for supervision. Interestingly, training with the ArcFace loss on a subset of only publicly available training data (85K real fingerprints + 675K synthetic, compared to our full dataset of 540K real fingerprints + 675K synthetic) achieves competitive recognition performance across all datasets, where the benefit of additional data is most evident in the cross-sensor and latent matching scenarios. Even further improvements were obtained with the incorporation of the spatial alignment network. Finally, we noticed consistent improvements across all evaluation datasets when applying our realignment strategy, especially on the more challenging datasets such as NIST SD 302 and FVC 2002 DB3A, which have many partially overlapping fingerprints.

Table 4.4 Ablation study for AFR-Net. TAR (%) is reported at FAR=0.1% for the FVC 2002 datasets (following the FVC protocol of 2,800 genuine pairs and 4,950 imposter pairs) and at FAR=0.01% for the remaining datasets.

Loss | # Imgs. | STN | Backbone | Realign | DB1A | DB2A | DB3A | NIST SD 14 | NIST SD 302 | NIST SD 27 | PolyU | ZJU
Cross Entropy | 760K | No | CNN+Attn | No | 96.50 | 96.93 | 80.21 | 98.23 | 69.15 | 21.32 | 74.05 | 87.09
ArcFace | 760K | No | CNN+Attn | No | 98.46 | 99.86 | 92.50 | 99.63 | 92.04 | 39.53 | 91.3 | 97.61
ArcFace | 1.3M | No | CNN+Attn | No | 99.79 | 99.82 | 97.82 | 99.89 | 94.82 | 58.14 | 97.00 | 98.66
ArcFace | 1.3M | Yes | CNN only | No | 99.64 | 99.96 | 96.68 | 99.89 | 95.17 | 55.43 | 96.72 | 98.63
ArcFace | 1.3M | Yes | Attn only | No | 99.86 | 99.93 | 98.34 | 99.93 | 95.21 | 60.85 | 98.45 | 98.57
ArcFace | 1.3M | Yes | CNN+Attn | No | 99.86 | 99.96 | 98.43 | 99.93 | 95.46 | 63.18 | 98.23 | 98.68
ArcFace | 1.3M | Yes | CNN+Attn | Yes | 100 | 99.96 | 99.36 | 99.93 | 96.11 | 63.18 | 98.73 | 98.70

4.5 Discussion

In this section, we discuss some remaining failure cases of our model and some possible future extensions to mitigate them. We also investigate the robustness of our model to partial fingerprints by manually generating affine and occlusion deformations of varying magnitudes.

4.5.1 Failure Case Analysis

Two example fingerprint image pairs that failed to be successfully matched by AFR-Net are shown in Figure 4.8. As demonstrated by these representative examples, the majority of the failure cases can be attributed to one of two factors: i.) extremely poor image quality or ii.) very little overlap between the images. The first cause can be avoided by implementing a quality check into the algorithm, whereas the second cause may be more difficult to detect and/or avoid in a practical system (especially one operating with a limited acquisition aperture). Our realignment strategy is effective at improving partial overlap pairs; however, when the overlap is severely limited, such as example (b) in Figure 4.8, the model may still fail.

Figure 4.8 Example fingerprint pairs that failed to match successfully by the proposed AFR-Net. (a) is from NIST SD 302 and (b) is from FVC 2002 DB2A. Similarity scores for (a) and (b) are 0.35 and 0.34, respectively; both below the match threshold of 0.36.

4.5.2 Robustness to Occlusions and Affine Transformations

To help understand the difference between CNN-based and attention-based embeddings, we conducted an experiment to visualize the saliency maps of each model on pairs of partial fingerprints. Specifically, we scan a mask of 16×16 pixels (with a stride of 16) across one image in the pair and compute the resulting similarity scores, which we use to draw a heatmap of salient regions for each patch in the image. We repeat this process for the other image and overlay the heatmaps onto the original images. Due to space limitations, we show just one representative example in Figure 4.9, along with the ridge overlays of the two images to better visualize the overlapping regions.

Figure 4.9 Saliency maps for a partial fingerprint pair from FVC 2002 DB1A computed with (a) ResNet50, (b) ViT, and (c) AFR-Net models. The reddish regions represent regions of the fingerprint where the similarity score drops the most when occluded (indicating its importance), whereas the blue regions represent the regions with high similarity scores even with that region occluded (indicating low importance). On the right, the ridge structures of each fingerprint are overlaid to highlight the overlapping area. (Best viewed in color).

Comparing the saliency maps of ResNet50 ((a) in Figure 4.9) to ViT ((b) in Figure 4.9), it seems that occluding some areas of the fingerprint has more of an effect on the similarity scores for ViT than it does for ResNet50. This suggests that ViT places more weight on specific regions of the fingerprint, whereas ResNet50 may be using more of the fingerprint area for its prediction. Finally, comparing both saliency maps with the saliency map of AFR-Net (shown in (c) of Figure 4.9), we can see that AFR-Net exhibits characteristics of both models.
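A minimal sketch of this occlusion-based saliency computation is given below, assuming a model callable that maps a single-channel C×H×W image tensor to a global embedding, and image dimensions that are multiples of the 16-pixel patch; everything beyond the 16×16 mask and stride described above is an illustrative choice.

# Minimal sketch: occlusion-based saliency map for a pair of fingerprint embeddings.
import numpy as np
import torch
import torch.nn.functional as F

def occlusion_saliency(model, img1, img2, patch=16):
    # Slide a patch x patch mask over img1 and record how the similarity to img2 changes.
    with torch.no_grad():
        z2 = F.normalize(model(img2.unsqueeze(0)), dim=1)
        base = float(F.normalize(model(img1.unsqueeze(0)), dim=1) @ z2.T)
        _, H, W = img1.shape                      # assumes H, W are multiples of `patch`
        heat = np.zeros((H // patch, W // patch), dtype=np.float32)
        for i in range(0, H, patch):
            for j in range(0, W, patch):
                masked = img1.clone()
                masked[:, i:i + patch, j:j + patch] = 0.0     # occlude one patch
                z1 = F.normalize(model(masked.unsqueeze(0)), dim=1)
                heat[i // patch, j // patch] = base - float(z1 @ z2.T)  # drop in similarity
    return heat  # large values = important regions (occluding them hurts the score most)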
We performed two additional experiments using manual occlusions and affine transformations to generate partial fingerprints and plotted the performance of each model vs. the amount of degradation. Specifically, we generated random occlusions and affine transformations at five different ratios, corresponding to the percentage of the fingerprint area being obscured. We repeated the experiment 5 times at each ratio and recorded the average performance of each model. For reference, example images with random affine transformations and occlusions at ratios of 40% and 20%, respectively, are shown in Figure 4.10.

Figure 4.10 Example manual occlusions and affine transformations used to generate challenging, partial fingerprint pairs.

According to Figure 4.11, the ResNet models appear to have a slight edge over the attention-based models (ViT and Swin) when subjected to random occlusions, whereas the ViT and Swin models show more robustness to severe degrees of affine transformation compared to the ResNet models. However, AFR-Net shows the best robustness to both occlusions and affine transformations, underscoring the benefit of merging the two complementary networks.

Figure 4.11 Comparison in TAR (%) at FAR=0.1% of each model under increasing degrees of random (a) occlusion and (b) affine transformations.

4.6 Conclusion

In this chapter, we evaluated attention-based fingerprint recognition networks against competitive CNN baselines and a state-of-the-art commercial fingerprint recognition system, Verifinger v12.3, and showed that our combined architecture, AFR-Net (Attention-Driven Fingerprint Recognition Network), outperforms all of the baselines on the majority of the evaluation datasets. These evaluations included challenging intra-sensor, cross-sensor, contact to contactless, and latent fingerprint matching scenarios. Furthermore, we introduced a realignment stage using the correspondence between local embeddings extracted from the intermediate feature maps of two fingerprint images, which consistently improved the performance across all the models, especially in challenging cases (e.g., partial overlap between the fingerprint images). This realignment strategy requires no additional training and can be applied as a wrapper to any deep learning network (CNN or attention-based). It also serves as an explainable visualization of the corresponding regions of two fingerprint images as ascertained by the network.

Future work will aim at improving the realignment strategy to reduce the latency introduced by the current brute-force correspondence implementation. The use of attention and/or graph neural networks for this purpose may be explored in order to more intelligently aggregate two sets of local embeddings. In the next chapter, we turn our attention to learning a diversity of scales (both global and local) encoded into the embeddings used for fingerprint recognition, with a particular emphasis on improving latent to rolled fingerprint matching, one of the most challenging, unconstrained applications.

CHAPTER 5
LATENT FINGERPRINT RECOGNITION: FUSION OF LOCAL AND GLOBAL EMBEDDINGS

One of the most challenging problems in fingerprint recognition continues to be establishing the identity of a suspect associated with partial and smudgy fingerprints left at a crime scene (i.e., latent prints or fingermarks).
Despite the success of fixed-length embeddings for rolled and slap fingerprint recognition, the features learned for latent fingerprint matching have mostly been limited to local minutiae-based embeddings and have not directly leveraged global representations for matching. In this chapter, we combine global embeddings with local embeddings for state-of-the-art latent to rolled matching accuracy with high throughput. The combination of both local and global representations leads to improved recognition accuracy across the NIST SD 27, NIST SD 302, MSP, MOLF DB1/DB4, and MOLF DB2/DB4 latent fingerprint datasets for both closed-set (84.11%, 54.36%, 84.35%, 70.43%, and 62.86% rank-1 retrieval rate, respectively) and open-set (0.50, 0.74, 0.44, 0.60, and 0.68 FNIR at FPIR=0.02, respectively) identification scenarios on a gallery of 100K rolled fingerprints. Not only do we fuse the complementary representations, we also use the local features to guide the global representations to focus on discriminatory regions in the two fingerprint images being compared. This leads to a multi-stage matching paradigm in which subsets of the retrieved candidate lists for each probe image are passed to subsequent stages for further processing, resulting in a considerable reduction in latency (requiring just 0.068 ms per latent to rolled comparison on an AMD EPYC 7543 32-Core Processor, roughly 15K comparisons per second). Finally, we show the generalizability of the fused representations for improving authentication accuracy across several rolled, plain, and contactless fingerprint datasets.

5.1 Introduction

This chapter was previously published as S. A. Grosz and A. K. Jain, "Latent Fingerprint Recognition: Fusion of Local and Global Embeddings", IEEE Transactions on Information Forensics and Security, vol. 18, pp. 5691-5705, 2023. Copyright 2023 by IEEE. Reprinted with permission.

Latent fingerprints are fingerprint impressions that are left behind, unintentionally, on surfaces such as glass, metal, and plastic, and are often invisible to the human eye.

Figure 5.1 Example latent and corresponding rolled images from (a) N2N Latent [78], (b) NIST SD 27 [83], (c) MSP Latent [262], and (d) MOLF [212] datasets. In all the above examples, the true mate for each query latent was returned at rank-1 by the proposed method against a gallery of 100K rolled fingerprints.

However, these prints can
As demonstrated in the four example latent fingerprints shown in Figure 5.1, some of the reasons for low performance in latent fingerprint recognition include poor ridge-valley contrast, occlusion, distortion, varying background, and incomplete 107 fingerprint patterns. Because of these challenges, latent fingerprint recognition remains one of the most challenging problems in biometrics, akin to matching poor quality face images from CCTV surveillance frames to mugshot photos. In general, AFIS optimized for rolled or plain fingerprint impressions do not achieve comparable accuracy for latent fingerprints, even when finetuned on publicly available latent fingerprint datasets. For example, the commercial software Verifinger v12.3 achieves a true accept rate (TAR) of 99.93% at a false accept rate (FAR) of 0.01% on the NIST SD 14 rolled fingerprint dataset, but only a TAR of 55.04% on NIST SD 27 latent dataset. This has sparked a number of studies that have focused on improving individual components of the fingerprint recognition pipeline to work better for latent impressions, such as those focusing on foreground segmentation of the ridge structure [30], latent enhancement [116, 126, 137, 271], minutiae extraction [29, 51, 187, 233], and orientation field estimation [77, 258]. However, few studies have focused on an end-to-end system to improve the latent to rolled fingerprint recognition pipeline, which is necessary since optimizing individual components separately may lead to sub-optimal performance when integrated together and tested as a complete system. Of those studies that do report on an end-to-end recognition system [28,31,234], the highest rank-1 retrieval rate achieved is 65.7% [31], computed on 258 latent probes from NIST SD 27 against a background of 100K rolled fingerprints. Furthermore, despite recent advancements in deep learning techniques for fixed-length repre- sentations (embeddings) for rolled/plain fingerprint recognition, these global representations have yet to be leveraged effectively for latents, likely due to the domain gap and limited availability of large-scale latent fingerprint datasets to learn such representations. Therefore, in this study, we propose an end-to-end pipeline for latent fingerprint recognition which leverages both a learned, global fingerprint representation (i.e., entire friction ridge pattern) and local representations (i.e., minutiae and virtual minutiae) for improved accuracy and search speed of latent to rolled fingerprint recognition. We not only fuse these complimentary local and global embeddings but utilize the local features to inform or guide the global representations to focus on discriminatory regions of Virtual minutiae are densely sampled points on an evenly spaced grid on the extracted fingerprint ridge area. 108 the input fingerprint image pairs for an improved matching accuracy. Lastly, existing latent to rolled fingerprint matching pipelines are highly tuned specifically for latent fingerprints, whereas our representation and matching pipeline is generalizable and effective across a wide range of fingerprint sensors (e.g., optical, capacitive, etc.) and image domains (e.g., latent, rolled, plain, contactless captures via mobile phone cameras, etc.). In a sense, we report on a universal fingerprint representation that is agnostic to fingerprint sensors and fingerprint capture mode. Concretely, the contributions of this research are the following: 1. 
1. Design of an end-to-end latent fingerprint recognition pipeline using deep learning methods, including algorithms for segmentation, enhancement, minutiae extraction, and a fusion of global and local embeddings.

2. State-of-the-art (SOTA) latent to rolled/plain fingerprint search across multiple datasets, including NIST SD 27 [83], NIST SD 302 Latents (N2N Latents) [78], MSP Latent [262], and the MOLF datasets [212].

3. Faster search speed (lower latency) due to our multi-stage search scheme, while maintaining SOTA recognition accuracy for both closed-set and open-set identification.

4. Generalization of the representation (embedding) from LFR-Net is shown via SOTA authentication performance across several rolled (NIST SD 14 [246]), plain (NIST SD 302 [78]), and contact to contactless fingerprint matching datasets (PolyU Contactless 2D to Contact-based 2D [145] and ZJU Finger Photo and Touch-based [93]) using the same network, a step toward a universal fingerprint recognition system.

5.2 Related Work

5.2.1 Latent Fingerprint Enhancement

A critical step in improving the accuracy of latent to rolled comparison is alleviating the effect of the various degradations present in latent fingerprints through preprocessing aimed at enhancing the contrast of the latent fingerprint ridge structure. A multitude of latent enhancement methods have been proposed over the years, ranging from classical computer vision techniques [37, 40, 77, 258, 260, 264, 265] to state-of-the-art deep learning methods [24, 48, 116, 126, 137, 147, 227, 234, 271].

Table 5.1 Summary of recently published latent fingerprint recognition studies.

Study | Approach | Database(s) | Rank-1
FingerNet, 2017 [234] | CNN methods for minutiae extraction, orientation field estimation, segmentation, and enhancement. Search results on a gallery of 40K. | NIST SD 27 [83] | ~35.0%
MSU-AFIS, 2019 [31] | CNN methods for enhancement, segmentation, minutiae, and virtual minutiae extraction. Search results on a gallery of 100K. | NIST SD 27 [83]; MSP Latent† [262] | 65.7%; 69.4%
Gu et al., 2020 [102] | Registration of latents via matching dense, undirected sampling points + virtual minutiae (not a fully automated system). Search results on a gallery of 100K. | NIST SD 27 [83]; MOLF DB3/DB4 [212] | 70.1%; 19.8%
MinNet, 2022 [187] | CNN-based minutiae patch embedding network + local similarity assignment (LSA) algorithm for matching. Search results on galleries of 5,560, 316, and 168 images from the EGM, FVC-Latent, and Tshingua-Latent databases, respectively. | EGM (private dataset); FVC-Latent [187]; Tshingua-Latent [187] | 92.39%; 95.57%; 99.40%
FingerGAN, 2023 [271] | GAN-based enhancement + Verifinger v12.1. Search results on a gallery of 27,258. | NIST SD 27 [83]; MOLF DB1/DB4 [212]; MOLF DB2/DB4 [212]; MOLF DB3/DB4 [212] | 59.69%; 25.34%; 22.23%; 29.43%
LFR-Net (proposed approach) | CNN-based latent enhancement, segmentation, and fusion of local (minutiae + virtual minutiae) and global (AFR-Net [97]) embeddings for matching. Search results reported on a gallery of 100K. | NIST SD 27 [83]; N2N Latent [78]; MSP Latent [262]; MOLF DB1/DB4 [212]; MOLF DB2/DB4 [212] | 84.11%; 54.36%; 84.35%; 70.43%; 62.86%
† Did not specify the test split.
This led to many subsequent studies on improving the ridge orientation estimation for latent fingerprints. For example, Yoon et al. [260] utilized a combination of polynomial models and Gabor filters to improve latent orientation estimation. Similarly, Feng et al. [77] utilized an orientation patch dictionary and Gabor filters for latent enhancement, and Yang et al. [258] extended this approach by utilizing local orientation dictionaries, which increased the flexibility of the approach to find better orientation fields. However, the variance in ridge frequency of distorted latent fingerprints limited the utility of these methods in improving overall matching accuracy.
Subsequent efforts introduced deep neural networks to improve the enhancement of latent fingerprints. In addition to a combination of short-time Fourier transform (STFT) and Gabor filters, Cao et al. [24] trained a convolutional neural network (CNN) autoencoder to enhance latent fingerprints. Variants of the CNN-based approach were also proposed in [137, 147, 227]. Generative adversarial networks (GANs) have also been adopted for latent fingerprint enhancement, and these methods have shown promise in restoring ridge and valley structures [48, 116, 126, 271]. However, as shown in Table 5.8, these methods have a tendency to hallucinate ridge lines and produce spurious minutiae that may degrade matching performance. Furthermore, critical to the success of many of these methods was access to large databases of mated rolled and latent fingerprint image pairs for training, many of which are unfortunately not publicly available to other researchers. In this work, we adopt the efficient CNN architecture of Squeeze U-Net [14] for latent enhancement without access to any latent training data. Instead, we employ a series of data augmentations on a dataset of rolled and plain fingerprint impressions in order to mimic the degradations present in latent fingerprints, and our network is trained to restore the degraded images to their original, unperturbed versions. A comparison between the performance of our enhancement network and several previous baselines is given in section 5.5.2.
5.2.2 Latent Fingerprint Recognition
Despite the recent success of deep learning-based global representations for fingerprint matching, all existing latent fingerprint recognition systems (to the best of our knowledge) utilize minutiae-based matchers for computing final similarity scores between latent and rolled image pairs. For example, Cao et al. [31] and Öztürk et al. [187] utilize variants of the local similarity assignment algorithm proposed in [34] for computing minutiae similarity scores; Tang et al. [234] utilized the extended clique model for minutiae matching; the authors of FingerGAN [271] used Verifinger v12.1 for matching; and Gu et al. [103] utilized multi-scale fixed-length embeddings for indexing to reduce the potential candidate list in combination with MSU-AFIS [31] for computing the similarity scores. Even though deep learning networks are used within many of these minutiae-based methods to produce local minutiae descriptors around minutiae points, no existing study has directly leveraged a global embedding as an additional similarity comparison. In this paper, we propose to use a global embedding score for improving the latent to rolled matching performance in conjunction with local minutiae embeddings for minutiae matching.
A brief summary of recent publications on latent fingerprint recognition is given in Table 5.1.
5.3 LFR-Net: Latent Fingerprint Recognition Network
Our approach for accurate and efficient latent fingerprint search consists of a combination of local (minutiae and virtual minutiae) and global features (AFR-Net embeddings [97]). Additionally, due to the low contrast, occlusion, and varying background present in many latent fingerprint images, we first incorporate automatic segmentation and enhancement of latent fingerprint images prior to feature extraction. The following sections describe each component of our latent to rolled fingerprint matcher, referred to as LFR-Net. These components include enhancement, segmentation, minutiae extraction, virtual minutiae extraction, global embedding, realignment for improved global embeddings, and a multi-stage search strategy. An overview of the pipeline is illustrated in Figure 5.2.
Figure 5.2 Overview of LFR-Net. An input latent probe image I_p is first automatically segmented and enhanced to generate an enhanced image I_p^e, orientation field O_p, and segmentation mask S_p. Then, I_p^e is passed to a minutiae extraction network, a minutiae descriptor network, and AFR-Net [97] to produce a minutiae feature set M_p, a virtual minutiae feature set V_p, and AFR-Net embeddings Z_p, which are combined into a template for matching (M_p, Z_p, V_p). Once extracted, the probe feature template is compared with each gallery template (M_g, Z_g, V_g) in the gallery G of size N via a similarity function s(I_p, I_g) in three stages.
5.3.1 Latent Enhancement and Segmentation
The terminology introduced for NIST SD 27 denotes the quality of latent fingerprints as either good, bad, or ugly depending on several factors, including the percentage of the fingerprint ridge structure occluded, noise obscuring the ridge structure, and low contrast of the ridges compared to the background content of the image. To make things even more challenging, the quality and appearance of latent fingerprints can vary drastically across different databases, whether collected in the lab (as is the case for the NIST SD 302 (N2N) [78] and IIIT-D MOLF [212] datasets) or from real crime scenes (as is the case for the NIST SD 27 [83] and MSP Latent [262] datasets). Therefore, latent enhancement is a critical yet challenging step for accurate and reliable latent to rolled fingerprint matching. See Figure 5.3 for example latent and rolled/plain fingerprint pairs showcasing the various differences between latent datasets.
Figure 5.3 Example enhanced latent images from the (a) NIST SD 27 [83], (b) MSP Latent [262], (c) N2N Latent [78], and (d) MOLF [212] datasets. In each subfigure, the left image is the original latent image, the middle image is the enhanced latent image using the proposed enhancement network, and the right image is the corresponding rolled mate.
To address the problem of latent enhancement, we focus on two key factors degrading the quality of latent prints, namely, the presence of noise occluding areas of the latent fingerprint ridge structure and the low contrast of the ridges. To remove noise from the latent images, we train a de-noising CNN to remove noise and fill in occluded regions of the fingerprint ridge structure. This network architecture is modeled on Squeeze U-Net [14], an efficient network proposed for image segmentation, which we have adapted for latent enhancement.
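As described in the next paragraphs, the adapted network predicts both an enhanced fingerprint image and a gray-scale ridge map. The sketch below shows a toy two-channel encoder-decoder in PyTorch that mirrors this input/output contract; it is an illustrative stand-in, not the Squeeze U-Net architecture actually used.

```python
import torch
import torch.nn as nn

class ToyLatentEnhancer(nn.Module):
    """Toy encoder-decoder with a two-channel head:
    channel 0 = enhanced fingerprint, channel 1 = gray-scale ridge map.
    (Stand-in for the adapted Squeeze U-Net; architecture details assumed.)"""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1), nn.Sigmoid(),  # two output channels in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.decoder(self.encoder(x))  # (B, 2, H, W)
        return out * 255.0                   # both channels in [0, 255], as in the text
```

In line with the training description later in this subsection, two MSE losses (one per output channel) could then supervise such a network with equal weight.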
Next, we aim to highlight the ridge structure of the latent fingerprints by constraining the network to segment the fingerprint ridge lines from the background. To accomplish this, we introduce an additional channel to the output of our enhancement network and optimize for both tasks in a single architecture. Thus, the output of the enhancement network is two channels, one for the enhanced image and another for the ridge lines. Note, the outputs of both channels are gray-scale and in the range [0, 255]. A few example enhancement outputs from this network are shown in the middle column of each sub-figure in Figure 5.3 and the bottom two rows of Figure 5.5.
To locate and segment the latent fingerprint area from the background image content, we use the predicted fingerprint ridges as a segmentation mask for localizing the latent fingerprint area by performing a series of simple image processing operations. First, a Gaussian filter with kernel size (5,5) is applied to the predicted ridge map, followed by a thresholding operation with a threshold of 150 on the pixel values to obtain the binary ridge lines in the range [0,1]. Next, a morphological closing operation with a kernel size of (9,9) is repeated 3 times, followed by three morphological opening operations with a kernel size of (9,9). Finally, to mitigate erroneous predictions, the mask defaults to the entire image if the resulting mask after processing has an area of less than 10,000 pixels. Since our enhancement network is fully convolutional, it can accept images of any resolution; however, the final segmented images are cropped to a height and width of 512 × 512 pixels at a resolution of 500 ppi. Figure 5.4 illustrates the process of converting a predicted gray-scale ridge image to a binary segmentation mask for an example latent fingerprint from NIST SD 27.
Figure 5.4 Example mask prediction for a latent image from NIST SD 27. (a) input latent image, (b) gray-scale ridge image output by the enhancement network, and (c) binary mask obtained after a series of Gaussian blurring, thresholding, and morphological operations on (b).
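A minimal OpenCV sketch of this mask post-processing is given below; the kernel sizes, threshold, and area fallback follow the description above, while the function name and dtype handling are assumptions of this sketch.

```python
import cv2
import numpy as np

def ridge_map_to_mask(ridge_map: np.ndarray, min_area: int = 10_000) -> np.ndarray:
    """Convert a predicted gray-scale ridge image (values in [0, 255]) into a
    binary segmentation mask using the steps described in the text."""
    # Gaussian smoothing with a (5, 5) kernel.
    blurred = cv2.GaussianBlur(ridge_map.astype(np.uint8), (5, 5), 0)
    # Threshold at 150 to obtain binary ridge lines in {0, 1}.
    _, binary = cv2.threshold(blurred, 150, 1, cv2.THRESH_BINARY)
    kernel = np.ones((9, 9), np.uint8)
    # Three morphological closings, then three openings, with a (9, 9) kernel.
    mask = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel, iterations=3)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=3)
    # Fall back to the whole image if the mask area is implausibly small.
    if int(mask.sum()) < min_area:
        mask = np.ones_like(mask)
    return mask
```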
Due to a lack of publicly available large-scale latent databases, we utilize several data augmentations to mimic the distribution of latent fingerprints using a collection of rolled and slap fingerprints. These data augmentations are illustrated in (b) of Figure 5.5 and consist of random amounts of Gaussian blurring, Gaussian noise, downsampling, partial occlusions, and contrast adjustments. The enhancement network is trained to remove these degradations from the augmented images via an MSE loss between the predicted, enhanced image and the original, unperturbed image. Furthermore, we compute an additional MSE loss between the predicted ridge images and the ridge images extracted from the original input fingerprints via Verifinger v12.3 (normalized to the range [0, 255]). Equal weight is given to the two MSE loss terms during training.
Figure 5.5 Example data augmentations used to train the latent enhancement network. Random Gaussian blurring, Gaussian noise, downsampling, partial occlusions, and contrast adjustments are applied to rolled fingerprint images to generate low quality fingerprints that resemble characteristics of latent fingerprints. (a) original rolled fingerprints, (b) simulated latent fingerprints after data augmentations, (c) predicted enhanced output, and (d) predicted binary ridge image.
The enhancement network is trained on the MSP longitudinal fingerprint dataset (rolled fingerprints only) [262], a subset of NIST SD 302 (rolled and plain fingerprints only) [78], and a dataset of plain fingerprint impressions referred to as the MSU Self-Collection. Details on the number of fingers/images contained in each of these datasets are provided in Table 7.1. Ground truth binary images for all the training images are obtained using Verifinger v12.3. The network was trained on 2 Nvidia RTX A6000 GPUs for 11 epochs utilizing an initial learning rate of 0.001, a polynomial learning rate schedule, and the Adam optimizer. As is shown in section 5.5.2, despite not being trained on any real latent images, our enhancement network is able to outperform many of the existing latent enhancement methods in the literature. For illustration, example enhancements from each of these methods are shown in Figure 5.6.
Figure 5.6 Two example comparisons of several baseline enhancement algorithms and the proposed enhancement network. (a) original latent images, (b) enhanced by MSU-AFIS [31], (c) enhanced by FingerNet [234], (d) enhanced by FingerGAN [271], and (e) enhanced by LFR-Net (proposed).
5.3.2 Minutiae Extraction
Our minutiae extraction network consists of a ResNet50 backbone, self-attention transformer layers, and a series of transpose convolutional layers to predict a 12-channel minutiae map, a representation for minutiae points introduced in [31]. This minutiae map is converted to a list of (x, y, θ) locations for each minutiae point, and a set of 96 × 96 image patches centered around each minutiae are aligned based on the orientation θ and fed into a separate ResNet50 model to extract a set of descriptors associated with each minutiae. These descriptors are each 96-dimensional and are used in the minutiae similarity calculation when comparing two sets of minutiae points extracted from a given fingerprint image pair. Thus, in conjunction with the (x, y, θ) locations of each minutiae point and assuming m minutiae points in total, a given minutiae template M will be of dimension M ∈ R^(m×99). The architecture details of the minutiae extraction network are given in Table 5.2.
For matching minutiae points, we compute a similarity matrix between all Euclidean normalized minutiae descriptors and utilize the local similarity with relaxation (LSS-R) algorithm (as described in Minutiae Cylinder-Code (MCC) [34]) to refine and remove false correspondences. Finally, the cosine similarities between the descriptors of corresponding minutiae points are summed to yield a final minutiae similarity score. Due to the nature of latent fingerprint formation, we find it is extremely useful to align the minutiae points prior to extracting the minutiae descriptors. This step imparts the similarity calculation with rotation invariance, a critical factor in unconstrained latent fingerprint recognition.
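As a rough illustration of this scoring step, the sketch below builds a descriptor similarity matrix and sums the cosine similarities of mutually-best matching pairs. It substitutes a simple mutual-nearest-neighbor assignment for the LSS-R relaxation actually used, so it should be read as a simplified stand-in rather than the authors' matcher.

```python
import numpy as np

def minutiae_similarity(M_p: np.ndarray, M_g: np.ndarray) -> float:
    """Simplified minutiae score between two (k, 99) templates.

    Columns 0-2 hold (x, y, theta); columns 3: hold the 96-d descriptors.
    Mutual nearest neighbors stand in for the LSS-R correspondence refinement.
    """
    D_p = M_p[:, 3:] / (np.linalg.norm(M_p[:, 3:], axis=1, keepdims=True) + 1e-12)
    D_g = M_g[:, 3:] / (np.linalg.norm(M_g[:, 3:], axis=1, keepdims=True) + 1e-12)
    S = D_p @ D_g.T                  # cosine similarity matrix (m_p x m_g)
    best_g = S.argmax(axis=1)        # best gallery minutia for each probe minutia
    best_p = S.argmax(axis=0)        # best probe minutia for each gallery minutia
    score = 0.0
    for i, j in enumerate(best_g):
        if best_p[j] == i:           # keep only mutually-best correspondences
            score += S[i, j]
    return score
```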
Table 5.2 Architecture details for the minutiae extractor network. Batch normalization and ReLU activation are applied after each convolution, except for the last layer which uses a Sigmoid activation. (Layer type: output dimension; parameters.)
  Conv2d: 64 × 112 × 112; k=7×7, padding=3, stride=2.
  Conv2d: 256 × 56 × 56; [k=1×1, ch=64; k=3×3, ch=64; k=1×1, ch=256] ×3.
  Conv2d: 512 × 28 × 28; [k=1×1, ch=128; k=3×3, ch=128; k=1×1, ch=512] ×4.
  Conv2d: 1024 × 14 × 14; [k=1×1, ch=256; k=3×3, ch=256; k=1×1, ch=1024] ×6.
  MLP: 384 × 196; in=1024, hid=1024, out=384.
  Self-Attention + MLP: 384 × 196; in=384, hid=1536, out=384.
  Conv2d Transpose: 384 × 28 × 28; k=2×2, stride=2.
  Conv2d: 384 × 28 × 28; [k=3×3, ch=384] ×2.
  Conv2d Transpose: 192 × 56 × 56; k=2×2, stride=2.
  Conv2d: 192 × 56 × 56; [k=3×3, ch=192] ×2.
  Conv2d Transpose: 96 × 112 × 112; k=2×2, stride=2.
  Conv2d: 96 × 112 × 112; [k=3×3, ch=96] ×2.
  Conv2d Transpose: 48 × 224 × 224; k=2×2, stride=2.
  Conv2d: 48 × 224 × 224; [k=3×3, ch=48] ×2.
  Conv2d: 12 × 224 × 224; k=1×1.
Our minutiae extraction and descriptor networks are trained on the MSP (rolled fingerprints only), NIST SD 302 (rolled and plain fingerprints only), and MSU Self-Collection (plain fingerprints only) training datasets. An MSE loss between predicted and ground truth minutiae points (obtained using the commercial Innovatrics v2.4.10 SDK) was used to supervise the minutiae extraction network. For training the minutiae descriptor model, minutiae patches of size 96 × 96 pixels were extracted from corresponding minutiae points between multiple impressions of each finger in the training set. To ensure the reliability of the ground truth corresponding minutiae patches, only corresponding minutiae points common among all impressions of the same finger were used and assigned a label for training. The Additive Angular Margin (ArcFace) loss function was used to supervise the descriptor model in classifying image patches belonging to the same minutiae point [56]. Both networks were trained on 4 Nvidia RTX A6000 GPUs for 56 epochs, with an initial learning rate of 0.0001, a polynomial learning rate schedule, and the Adam optimizer.
A visual comparison of four example latent images annotated with minutiae from our minutiae extractor (shown in green), Verifinger v12.3 (shown in red), and manually marked minutiae (shown in blue) is provided in Figure 5.7. Due to the difficulty in manually marking latent minutiae points, usually very few minutiae are manually annotated. On the other hand, automatic minutiae extractors tend to detect many false (e.g., spurious) minutiae due to noise in the image. Nonetheless, compared to Verifinger, our method detects fewer spurious minutiae (as can be seen in the bottom two examples of Figure 5.7).
Figure 5.7 Visual comparison of minutiae extracted by our method (shown in green), Verifinger v12.3 (shown in red), and manually marked minutiae (shown in blue). Best viewed in color.
5.3.3 Virtual Minutiae Extraction
Due to the severely low quality of the ridges in many latent fingerprints, minutiae extraction is often unreliable and may produce many spurious minutiae and/or fail to extract any minutiae points at all. Therefore, in order to enforce local features within the image as part of matching, we utilize virtual minutiae, originally suggested in [31]. These virtual minutiae points are evenly spaced throughout the fingerprint area and use the estimated orientation field within the neighborhood of each point as the orientation assigned to each virtual minutiae point. Through an ablation study presented in section 5.5.2, we show the importance/utility of incorporating virtual minutiae into our pipeline. For extracting virtual minutiae, we place a grid of virtual minutiae points at each (x, y) location of the segmented fingerprint area, separated by 16 pixels (in both the x and y directions).
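The sketch below illustrates one way to place such a 16-pixel grid inside the segmentation mask and assign each point a local orientation. The simple squared-gradient orientation estimate is a hedged stand-in for the orientation field algorithm of [40] referenced in the next paragraph, not the authors' exact implementation.

```python
import numpy as np

def place_virtual_minutiae(mask: np.ndarray, image: np.ndarray, spacing: int = 16):
    """Place virtual minutiae on a regular grid inside the ridge mask.

    Each grid point that falls inside the mask is assigned a local orientation
    estimated from averaged squared image gradients over a 16x16 neighborhood.
    Returns a list of (x, y, theta) tuples.
    """
    gy, gx = np.gradient(image.astype(np.float64))
    points = []
    h, w = mask.shape
    for y in range(spacing // 2, h, spacing):
        for x in range(spacing // 2, w, spacing):
            if mask[y, x] == 0:
                continue
            ys = slice(max(y - 8, 0), y + 8)
            xs = slice(max(x - 8, 0), x + 8)
            gxx = np.sum(gx[ys, xs] ** 2)
            gyy = np.sum(gy[ys, xs] ** 2)
            gxy = np.sum(gx[ys, xs] * gy[ys, xs])
            # Ridge orientation is perpendicular to the dominant gradient direction.
            theta = 0.5 * np.arctan2(2.0 * gxy, gxx - gyy) + np.pi / 2
            points.append((x, y, theta))
    return points
```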
The orientation of each 16 × 16 patch assigned to each virtual minutiae is estimated using the orientation field extraction algorithm described in [40]. Aligned image patches centered around each virtual minutiae are then fed to the same minutiae descriptor model described above to extract embeddings for each virtual minutiae. Since we are using the same minutiae descriptor extraction network, no additional training is required to obtain the virtual minutiae points. Assuming n virtual minutiae points are extracted in total, a given virtual minutiae template V will be of dimension V ∈ R^(n×99). The virtual minutiae similarity calculation between two virtual minutiae templates also utilizes the LSS-R matching algorithm [34].
5.3.4 Global Embedding Representation
For our global representation, we utilize the recently proposed AFR-Net architecture [97], which achieved high performance across a wide range of fingerprint types (rolled, plain, contactless) and sensors (optical, capacitive, etc.). AFR-Net is a combination of both CNN and ViT image recognition architectures, consisting of a shared CNN backbone and two separate classification heads (one CNN-based and the other utilizing attention blocks from ViT). The output of AFR-Net is two embeddings (Z_a and Z_c) of 384 dimensions each, and the similarity score calculation is performed via a weighted sum of the normalized dot products between the corresponding embeddings of a fingerprint pair. For simplicity, we denote the AFR-Net embeddings as Z, a concatenation of the two individual embeddings (2 × 384 = 768 dimensions).
AFR-Net is trained on a diverse training set consisting of a combination of rolled fingerprints [71, 248, 262], plain (i.e., slap) fingerprints [96], a mixture of rolled and plain fingerprints [78], contactless (e.g., from mobile phone cameras) fingerprints [17, 53, 75], and synthetic latent fingerprints [251]. In total, there are about 1.3 million images from 96,556 unique finger identities in training. Due to the lack of publicly available latent fingerprint datasets, we do not train on any real latent fingerprint databases and reserve the few latent datasets that we do have for evaluation. Interested readers are referred to [97] for complete training dataset details.
5.3.5 Minutiae Alignment of Global Embeddings
As proposed in [97], a strategy for improving the fingerprint representations obtained via deep learning networks is to align the regions of interest between two input images, remove the background and other non-overlapping regions of the fingerprint areas in both images, and pass the aligned images back into the embedding network to yield new, "refined" representations. In contrast to [97], where the local embeddings used to find corresponding regions of interest in both images are taken from an intermediate layer of the AFR-Net architecture, we directly use the minutiae correspondences between the two images to compute the affine transformation which best aligns the image pair. In a sense, we are informing the global representation to focus on regions of the images which share many local similarities, in order to better distinguish between genuine pairs and close imposters. We also experimented with using virtual minutiae correspondences to compute the transformation but observed no significant change in the overall search accuracy to warrant the additional latency incurred by matching a much larger number of points.
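A minimal sketch of this realignment step is shown below, assuming the corresponding minutiae (x, y) locations are already available from the minutiae matcher. The use of cv2.estimateAffinePartial2D as a robust least-squares solver, and the specific RANSAC threshold, are implementation choices of this sketch, not necessarily the authors'.

```python
import cv2
import numpy as np

def realign_with_minutiae(probe_img: np.ndarray, gallery_img: np.ndarray,
                          probe_pts: np.ndarray, gallery_pts: np.ndarray):
    """Warp the probe image onto the gallery image using an affine transform
    estimated from corresponding minutiae (x, y) locations, so that a new pair
    of global embeddings can be extracted from the aligned images."""
    # Robustly fit a 2x3 partial affine (rotation, translation, uniform scale).
    A, _inliers = cv2.estimateAffinePartial2D(
        probe_pts.astype(np.float32), gallery_pts.astype(np.float32),
        method=cv2.RANSAC, ransacReprojThreshold=10.0)
    if A is None:  # too few correspondences; fall back to the original images
        return probe_img, gallery_img
    h, w = gallery_img.shape[:2]
    aligned_probe = cv2.warpAffine(probe_img, A, (w, h))
    return aligned_probe, gallery_img
```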
5.3.6 Multi-Stage Search Strategy
Each of the feature sets in LFR-Net adds complementary information for improving the reliability of a potential match, yet incurs additional latency which can be prohibitively expensive for a large gallery size (e.g., N=100,000). Typically, computing the similarity between global, fixed-length feature vectors (such as AFR-Net embeddings) is extremely fast compared to local feature matching (e.g., minutiae graph similarity computation); however, performance on small area latent fingerprints suffers without the use of local features. Therefore, we propose a multi-stage search paradigm which reduces the size of the returned candidate list before invoking expensive local feature matching (e.g., virtual minutiae similarity computation) to refine the final ranked candidate list.
Specifically, our hierarchical matching procedure consists of three stages. First, we return the top K (e.g., K=1,000) candidate matches using a fusion of AFR-Net similarity and minutiae matching. Next, we re-rank the top K candidates using virtual minutiae matching and obtain a smaller candidate list of size L (e.g., L=500). Finally, we align each probe image to each of its L candidate gallery images (using an affine transformation computed from corresponding minutiae points) and obtain a new set of AFR-Net embeddings on the aligned images in order to further refine the final candidate list. An illustration of this multi-stage search strategy is shown in Figure 5.2. A discussion of the latency savings of our three-stage match procedure is given in section 5.4.5. The scores after each stage of matching are normalized to the range [0, 1] and combined using a set of weights (w1 = 0.4, w2 = 0.4, w3 = 0.18, and w4 = 0.02) determined empirically on a validation set of latent fingerprints from the MSP latent database (which is separate from the MSP latent test dataset). The overall algorithm for LFR-Net is given in Algorithm 5.1.
5.4 Experimental Results
In this section, we report the performance of our latent fingerprint recognition pipeline across multiple latent fingerprint datasets, as well as other plain, rolled, and contactless fingerprint datasets, to demonstrate the generalizability of our representations. First, we give the details of the datasets used in this study, followed by the closed-set and open-set identification results for several latent datasets, as well as the authentication performance across a diverse set of fingerprint sensors (e.g., capacitive, optical, etc.) and fingerprint types (plain, rolled, contactless, etc.). Next, we benchmark the performance of our enhancement network against previous enhancement methods, both in terms of minutiae detection accuracy and the authentication performance of Verifinger v12.3 on each of the enhanced image outputs. Finally, we conclude with a discussion on the speed and computational efficiency of our recognition pipeline and the trade-offs in speed and accuracy given our multi-stage search strategy.
5.4.1 Datasets
Details for all training, validation, and test datasets used in this study are given in Table 7.1. Unlike previous latent fingerprint papers, we do not have access to a large private dataset of paired latent and rolled fingerprints (e.g., the HiSign Latent Fingerprint database used in [102, 258], consisting of 10,458 latent and mated rolled pairs). In fact, we do not use any latent fingerprint datasets for training, yet our system is able to achieve new SOTA accuracy on many latent test datasets.
Since our system is not highly tuned for latent fingerprints, we are able to maintain SOTA accuracy on rolled, plain, and contactless fingerprints as well.
Algorithm 5.1 Return a ranked candidate list, given an input latent fingerprint probe (I_p) and a gallery of rolled fingerprint images (I_G), using the proposed LFR-Net matcher.
  procedure Match(I_p, I_G)
      // Initialize score weights.
      w1, w2, w3, w4 := 0.4, 0.4, 0.18, 0.02
      // Initialize no. of candidates passed to the 2nd and 3rd stages.
      K := 1000; L := 500
      // No. of candidates in the gallery.
      N ← len(I_G)
      // Initialize score lists.
      S1, S2, S3 := [0] * N, [0] * K, [0] * L
      // Extract probe and gallery features.
      M_p, V_p, Z_p ← Extract(I_p)
      M_G, V_G, Z_G ← Extract(I_G)
      // Stage 1 matching.
      for i in range(N) do
          M_g, Z_g := M_G[i], Z_G[i]
          S1[i] ← w1 · msim(M_p, M_g) + w2 · (Z_p^T · Z_g) / (|Z_p| |Z_g|)
      I_G^1, M_G^1, V_G^1, Z_G^1 ← SortAndFilterCandidates(S1)
      // Stage 2 matching.
      for i in range(K) do
          M_g, V_g, Z_g := M_G^1[i], V_G^1[i], Z_G^1[i]
          S2[i] ← w1 · msim(M_p, M_g) + w2 · (Z_p^T · Z_g) / (|Z_p| |Z_g|) + w3 · msim(V_p, V_g)
      I_G^2, M_G^2, V_G^2, Z_G^2 ← SortAndFilterCandidates(S2)
      // Stage 3 matching.
      for i in range(L) do
          M_g, I_g, Z_g, V_g := M_G^2[i], I_G^2[i], Z_G^2[i], V_G^2[i]
          Z'_p, Z'_g ← Realign(I_p, M_p, I_g, M_g)
          S3[i] ← w1 · msim(M_p, M_g) + w2 · (Z_p^T · Z_g) / (|Z_p| |Z_g|) + w3 · msim(V_p, V_g) + w4 · (Z'_p^T · Z'_g) / (|Z'_p| |Z'_g|)
      I_G^3, M_G^3, V_G^3, Z_G^3 ← SortAndFilterCandidates(S3)
      // Return the sorted candidate list.
      return I_G^3
Table 5.3 Fingerprint datasets used in this study. 200 1,600 20,008 # images 447,988 # Fingers 37,411 57,813 # Images 4,582 # Fingers Train Datasets MSP Rolled† [262] NIST SD 302 (N2N)‡ [78] (plain and rolled prints) MSU Self-Collection (plain prints) Validation Datasets NIST SD 302 (N2N)‡ [78] (plain and rolled prints) MSP Latent† [262] Test Datasets NIST SD 14 [246] NIST SD 302 (N2N)‡ [78] NIST SD 302 Latents (N2N Latents) [78] IIIT-D MOLF [209] MSP Latent [262] NIST SD 27 [83] PolyU Contactless 2D to Contact-based 2D Database [145] ZJU Finger Photo and Touch-based Database [93] † The MSP Rolled and MSP Latent datasets are completely disjoint and distinct in terms of finger identities. ‡ The train, validation, and test splits of N2N are disjoint and distinct in terms of finger identities. 933 # Fingers 2700 200 2,030 # Images 5,400 2,548 12,400 1,866 516 1,000 933 258 19,776 2,528 3,793 1,019 960 160 824
Table 5.4 Closed-set identification (1:N comparison) results of three matchers, including the proposed LFR-Net.
  MSU-AFIS [31]: feature extraction latency‡ 2,586 ms; match latency† 0.093 ms; template size (latent/rolled) 308 KB / 56 KB; closed-set rank-1 retrieval rate (%): NIST SD 27 61.63, N2N Latent 29.78, MSP Latent 67.64, MOLF DB1/DB4 44.84, MOLF DB2/DB4 27.86.
  AFR-Net [97]: feature extraction latency‡ 6.86 ms; match latency† 0.002 ms; template size 3 KB / 3 KB; rank-1 (%): NIST SD 27 39.92, N2N Latent 21.95, MSP Latent 59.27, MOLF DB1/DB4 42.41, MOLF DB2/DB4 38.48.
  LFR-Net (proposed): feature extraction latency‡ 553 ms; match latency† 0.068 ms*; template size 307 KB / 401 KB; rank-1 (%): NIST SD 27 84.11, N2N Latent 54.36, MSP Latent 84.35, MOLF DB1/DB4 70.43, MOLF DB2/DB4 62.86.
  ‡ Computed on an Nvidia RTX A6000 GPU. † 128 threads on an AMD EPYC 7543 32-Core Processor. * Time computed using 3-stage matching with K=1,000 and L=500. Recall, K is the no. of candidates from the gallery passed from stage 1 to stage 2 and L is the no. of candidates passed from stage 2 to stage 3.
Table 5.5 Rank-1 Retrieval Rate (%) of the proposed LFR-Net on NIST SD 27 for gallery sizes ranging from 50,000 to 250,000.
  Gallery size (N): 50K, 100K, 150K, 200K, 250K. Rank-1 retrieval rate (%): 85.66, 84.11, 83.33, 82.56, 82.17.
5.4.2 Identification Results
Particularly relevant to the use case of latent fingerprint recognition is the ability to quickly and accurately search a large database of known fingerprints given an input probe latent fingerprint image. To benchmark the performance of our proposed pipeline for latent to rolled fingerprint recognition, we compute the closed-set and open-set identification results across four different latent fingerprint datasets against a gallery of 100K rolled fingerprints. Each fingerprint image in the gallery is from a unique finger and is separate from the rolled mates included in each latent database. Furthermore, all gallery fingerprints used in these experiments are from a separate set of the MSP forensic database [262] and were not used in training.
Table 5.6 Open-set comparison of FNIR at FPIR=0.02 across 5 latent dataset evaluations (lower is better).
  MSU-AFIS [31]: NIST SD 27 0.72, N2N Latent 0.87, MSP Latent 0.76, MOLF DB1/DB4 0.86, MOLF DB2/DB4 0.69.
  LFR-Net (proposed): NIST SD 27 0.50, N2N Latent 0.74, MSP Latent 0.44, MOLF DB1/DB4 0.60, MOLF DB2/DB4 0.68.
Figure 5.8 Open-set performance of LFR-Net with a gallery of 100K. Only 50% of the latent probes in each dataset have mates in the gallery.
A comparison of the rank-1 retrieval rates of the proposed method and two baseline algorithms for latent fingerprint recognition is shown in Table 5.4. The two baseline methods include i.) the original AFR-Net as proposed in [97] and ii.) an optimized version of the MSU-AFIS latent recognition pipeline proposed in [31]. Currently, none of the commercial latent fingerprint vendors provide their latent fingerprint SDKs for inclusion in the evaluation. It is reported that Verifinger SDK v13 may have a latent matcher, but it is not yet available to developers at this time. Additionally, many other previous latent fingerprint algorithms in the literature either have not made their source code publicly available or take prohibitively long to run on a gallery of our size. Where available, we have included the performance of these baseline methods using the numbers reported in those publications in Table 5.1.
Not surprisingly, AFR-Net underperforms across each of the latent datasets compared to both MSU-AFIS and our proposed pipeline. MSU-AFIS performs reasonably well across each dataset because of its use of minutiae and virtual minutiae, which, according to our ablation study in section 5.5.2, makes a significant difference in the accuracy of latent to rolled comparisons. Nonetheless, our method, LFR-Net, outperforms all baseline methods due to a combination of improved enhancement, segmentation, and fusion of both local and global embeddings. In particular, our average rank-1 retrieval rate across the four datasets is 71.22%, compared to the average rank-1 performance of MSU-AFIS of 46.19%. Among the published results on NIST SD 27, LFR-Net outperforms the next best method, that of Gu et al. [102], 84.11% to 70.1% at rank-1.
Due to time and space constraints, most of our experiments are conducted on a background gallery of 100,000 unique rolled fingerprints; however, in order to investigate the scalability of LFR-Net to larger gallery sizes, we conducted closed-set identification experiments using NIST SD 27 on gallery sizes ranging from N=50,000 to N=250,000 rolled distractors.
The results given in Table 5.5 suggest that the decline in rank-1 retrieval rate up to a gallery size of 250,000 starts to converge toward 82%, down from an initial 85.66% for gallery size of 50,000. As part of future work, we will investigate this trend up toward a gallery size of 1,000,000 unique fingers, which is more indicative of practical real-world applications. To the best of our knowledge, previous studies on latent to rolled fingerprint matching have only reported closed-set identification results. With our improved performance across many of the latent to rolled datasets, we also report open-set identification results where 50% of the probes from each of our datasets are randomly selected to have no corresponding mates in the gallery. A plot https://neurotechnology.com/verifinger.html 126 Table 5.7 Authentication (1:1 comparison) results of three matchers, including the proposed LFR- Net. Model NIST SD 14 TAR (%) @ FAR=0.01% NIST SD 302 PolyU‡ ZJU‡ NIST SD 27† NIST SD 14 NIST SD 302 EER (%) PolyU‡ ZJU‡ 0.04 2.52 96.88 93.26 99.93 95.39 99.93 98.04 95.42 55.04 Verifinger v12.3 AFR-Net [97] LFR-Net (stage 1) LFR-Net (stage 1&2) † Using LFR-Net preprocessing and segmentation. ‡ Contactless images are preprocessed using the enhancement and unwarping method proposed in [93]. 59.69 94.30 99.93 98.78 68.99 99.00 98.61 95.56 98.61 99.96 68.99 99.00 1.96 2.03 0.45 0.45 1.01 1.87 0.04 0.34 0.04 0.04 0.50 0.45 0.45 0.83 NIST SD 27† 12.91 11.52 7.74 6.58 of false negative identification rate (FNIR) vs. false positive identification rate (FPIR), computed for rank-1 retrieval, for LFR-Net across all five latent evaluations is given in Figure 5.8, where a comparison of FNIR @ FPIR=0.02 with MSU-AFIS is given in Table 5.6. 5.4.3 Authentication Performance Due to the challenges inherent to latent fingerprint recognition, all the existing systems in the open literature are highly tuned to perform well for latent to rolled comparison and/or require expensive feature extraction and matching times due to the additional features required to achieve high accuracy. However, our system stands out in that the representations learned (both local and global embeddings) are generalizable across a wide range of fingerprint image characteristics, as demonstrated in the authentication results shown in Table 5.7. LFR-Net is competitive with and even outperforms the commercial fingerprint SDK Verifinger v12.3 on several datasets. Furthermore, our algorithm can be tuned to vary the latency for both feature extraction and matching depending on the confidence required and/or difficulty of the fingerprint image domain of interest. For example, both the feature extraction and matching latencies could be significantly decreased for rolled-to-rolled or plain-to-plain fingerprint matching by utilizing just the AFR-Net embeddings, which achieves very competitive authentication performance across all of the datasets. Similarly, at a modest cost in latency, one could also incorporate minutiae features for a slight boost in accuracy, as is done in stage 1 of LFR-Net. Virtual minutiae are effective in improving latent 127 to rolled matching accuracy; however, they introduce additional latency which is not required to achieve high accuracy across rolled and plain fingerprint datasets. 
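The identification and authentication metrics used throughout this section (rank-1 retrieval rate, TAR at a fixed FAR, and FNIR at a fixed FPIR) can be computed from raw similarity scores as in the sketch below. This is a generic illustration of the metric definitions, not the authors' evaluation code; in particular, the open-set function here assumes the top-1 candidate score and a mate/non-mate flag have already been extracted per probe.

```python
import numpy as np

def rank1_retrieval_rate(scores: np.ndarray, mate_idx: np.ndarray) -> float:
    """scores: (num_probes, gallery_size) similarity matrix;
    mate_idx[i] is the gallery index of probe i's true mate."""
    return float(np.mean(scores.argmax(axis=1) == mate_idx))

def tar_at_far(genuine: np.ndarray, imposter: np.ndarray, far: float = 1e-4) -> float:
    """True accept rate at a fixed false accept rate (e.g., FAR = 0.01%)."""
    threshold = np.quantile(imposter, 1.0 - far)
    return float(np.mean(genuine >= threshold))

def fnir_at_fpir(mated_top1: np.ndarray, top1_is_mate: np.ndarray,
                 nonmated_top1: np.ndarray, fpir: float = 0.02) -> float:
    """False negative identification rate at a fixed false positive
    identification rate, evaluated at rank 1: a mated probe counts as
    identified only if its top-1 candidate is the true mate and its score
    clears the threshold set by the non-mated searches."""
    threshold = np.quantile(nonmated_top1, 1.0 - fpir)
    identified = (mated_top1 >= threshold) & top1_is_mate.astype(bool)
    return float(np.mean(~identified))
```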
5.4.4 Latent Enhancement Performance
For a comparison of our enhancement method with several previous SOTA latent enhancement methods (FingerGAN [271], FingerNet [234], and MSU-AFIS [31]), we computed the statistics of false (spurious) and correctly predicted minutiae using the manually marked ground truth provided for NIST SD 27 by [77]. Specifically, we individually enhanced all 258 latent images from NIST SD 27 using each of the enhancement methods, extracted the minutiae points of the enhanced images using Verifinger v12.3, and compared the extracted minutiae to the human annotated ground truth minutiae points. We consider a correctly detected minutiae as one for which the type is the same, the (x, y) location is within 10 pixels, and the angle difference is less than 10 degrees compared to a ground truth minutiae. These thresholds are motivated by a previous study on the robustness of minutiae-based matchers, which showed that the performance of minutiae matching starts to decline with minutiae perturbations outside these ranges [94].
Results in Table 5.8 show that our enhancement network outperforms the previous methods in terms of the number of correctly predicted minutiae, while also introducing fewer spurious minutiae than the next best method, FingerGAN. Interestingly, the number of spurious predicted minutiae is the lowest for the unenhanced, original latent images. The increase in spurious minutiae may, at least partially, be attributed to each enhancement method hallucinating fingerprint ridges where there are none; however, the fact that the number of spurious minutiae increases dramatically across all methods suggests that the increase could also be attributed in some part to minutiae missed during the manual markup process, given the difficulty of annotating minutiae in unenhanced latent fingerprints. Nonetheless, according to previous research, spurious minutiae have much less of an effect on overall matching accuracy than does failing to detect correct minutiae [94]. In fact, we can see this is indeed the case in the improved authentication (1:1 matching) performance of Verifinger on our enhanced NIST SD 27 images (TAR=65.12% at FAR=0.1%) compared to the original images (TAR=57.75% at FAR=0.1%), where a total of 258 genuine scores and 66,564 imposter scores were computed.
Table 5.8 Number of correctly predicted minutiae and spurious minutiae introduced for the 258 latent prints in NIST SD 27 before and after enhancement by several methods. Predicted minutiae were extracted using Verifinger v12.3, and ground truth minutiae were manually marked by [77]. (Method: no. correctly predicted; no. spurious predictions; TAR (%) @ FAR=0.01% (0.1%); rank-1 (%) with a gallery of 100K.)
  Original images: 2,606; 3,676; 51.94 (57.75); 56.59.
  Enhanced by MSU-AFIS [31]: 2,329; 4,678; 38.76 (44.96); 48.06.
  Enhanced by FingerNet [234]: 2,460; 5,147; 39.15 (47.29); 47.29.
  Enhanced by FingerGAN [271]: 2,939; 7,385; 52.71 (57.36); 58.14.
  Enhanced by LFR-Net (proposed): 3,118; 5,536; 55.04 (65.12); 58.14.
Interestingly, the rank-1 performance using Verifinger is the same for our enhanced images as for those of FingerGAN [271]; however, we have the additional advantage that our enhanced images do not introduce a domain shift with respect to the original, gray-scale fingerprint images. This means that other components of our pipeline (e.g., the minutiae descriptor and AFR-Net) can also benefit from the enhancement without the need to be re-trained.
For example, the rank-1 performance of AFR-Net on NIST SD 27 enhanced by LFR-Net improves from 39.92% to 52.33% without re-training the network, but the performance drops to 6.20% using FingerGAN enhanced images. 5.4.5 Computational Efficiency Latency is a crucial aspect for large-scale identification applications, which tends to be in competition with accuracy. Thus, we were motivated to find a balance between accuracy and speed using a multi-stage search protocol, which has also been explored in previous works on fingerprint identification [68]. For a quantitative analysis on the latency of our approach, we will denote the size of our gallery as N (e.g., N=100,000) and the size of our probe dataset as Q (e.g., Q=258 in the case of NIST SD 27). Furthermore, since our algorithm consists of three stages of matching with variable number of top candidates per probe passed to subsequent stages, we will denote the number of candidates per probe image passed from the first stage to the second stage as K and the number of candidates passed to the third stage as L. 129 For our first stage matching, we utilize only AFR-Net and minutiae features to obtain a short list of top K candidates from the gallery for each probe fingerprint image. This stage takes on average 𝑡1=0.015 ms for a single latent to rolled comparison when utilizing 128 threads on an AMD EPYC 7543 32-Core Processor, where a total of NxQ comparisons are computed. In the second stage, we utilize virtual minutiae scores to re-rank the K list of candidates per latent and return a further condensed list of top L candidates to pass to the third stage. Here, a single virtual minutiae comparison between a latent and rolled image pair takes on average 𝑡2=0.984 ms, where a total of KxQ comparisons are computed. Finally, our third stage consists of re-aligning each of the L candidate images for each probe using the pairwise minutiae correspondences and recomputing AFR-Net scores for each pair. In this stage, there are a total of LxQ comparisons required, where each realignment plus AFR-Net inference per comparison takes an average of 𝑡3=8.626 ms. Note, the latency of stage 1 and stage 2 depends on the number of minutiae and virtual minutiae extracted per latent probe, respectively. The latency numbers reported here are computed for NIST SD 27 against a gallery augmented by 100,000 rolled fingerprints, where the average number of minutiae and virtual minutiae extracted per latent image is 45 and 363, respectively, and the average number of minutiae and virtual minutiae per rolled fingerprint is 119 and 886, respectively. In total, the average latency 𝑡 per comparison for the entire three stage matching process can be computed using equation 5.1: 𝑡 = 𝑡1 + 𝐾 𝑁 𝑡2 + 𝐿 𝑁 𝑡3 (5.1) Using equation 5.1 with N=100,000, K=1000 and L=500, the average latent to rolled comparison across each of the four latent datasets for our full matching pipeline takes about t=0.068 ms. As mentioned previously, the filtering of our candidate lists in each stage does incur some accuracy trade-off; however, we find that filtering 99% of the candidate list prior to stage 2 (with K=1,000 and N=100,000) leads to no difference in rank-1 retrieval rate for NIST SD 27 and only about a 1% decrease in accuracy at higher ranks. A plot of the Cumulative Match Characteristic (CMC) 130 Figure 5.9 Search results on NIST SD 27 (100K gallery) as the no. of candidates (K) sent to stage 2 is varied. 
Reducing K to 1% of the gallery results in a speed up of 0.352 ms to 0.068 ms (5.2× faster) per comparison with no change in rank-1 accuracy and only ∼ 1% decrease at higher ranks. for NIST SD 27 on a gallery of 100,000 as the value of K is varied from 100,000 to 10 is shown in Figure 5.9. The feature extraction speed is often less of a concern for fingerprint recognition since templates for the gallery can be extracted offline prior to matching; however, is still important in cases of updating the gallery for future improvements to the system. Nonetheless, our method is significantly faster compared to the baseline MSU-AFIS algorithm, taking just 553 ms on average per latent image or 1.88 images per second. In terms of template size, our algorithm is comparable to MSU- AFIS for latents; however, for rolled templates, MSU-AFIS performs several template compression and quantization techniques to reduce the size of the templates compared to ours, which can also be incorporated into our algorithm in future work. 5.5 Discussion In this section we discuss some of the failure cases of our pipeline and present plausible strategies to improve on these challenging cases. Furthermore, we present an ablation analysis to evaluate the contribution of each component of our pipeline to the overall identification accuracy. 131 5.5.1 Failure Case Analysis Despite the SOTA performance of our algorithm across all five latent evaluation fingerprint datasets utilized, there are still cases in which our algorithm fails to return the correct mate at rank one (see Figure 5.10 (c) and (d)). Figure 5.10 (a) and (b) show examples where the system was successful in returning the correct mate at rank-1, demonstrating the benefit of the enhancement network in removing some occlusions and enhancing the inter-ridge separation. However, as examples (c) and (d) show, there are many challenges that need to be tackled. One cause of failure, demonstrated in (c), is poor segmentation, where a large portion of the fingerprint image is cut-off. Additional failures can be attributed to noisy background and overlapping latent patterns, very low ridge-valley contrast, and extreme rotations. In case of small area latent fingerprints, it becomes difficult to estimate the correct rotation to align latent fingerprints prior to global feature extraction. A simple way to overcome this difficulty could be to rotate the latent image 4 times by 90° increments and take a max fusion of the global similarity scores obtained when matching each of the four rotated images with all the images in the gallery. Perhaps an even better approach would be to make the global embeddings inherently rotation invariant, which will be one focus of our future work. 5.5.2 Ablation Study To justify the use of each component (enhancement, minutiae, virtual minutiae, global embed- ding, and realignment stage), we perform an ablation analysis as each additional module is added to the matching pipeline. The results are given in Table 5.9, which show improved search performance with the addition of each module. We observe a significant jump in accuracy with the incorpo- ration of local minutiae features, another significant jump in accuracy with virtual minutiae, and a final improvement in using the realignment stage. The performance across all datasets starts to saturate after the second stage matching with virtual minutiae, where the third stage (realignment) adds the most noticeable benefit for datasets with extreme rotations (e.g., N2N Latent). 
We also see significant improvements in rank-1 retrieval with using our enhancement network vs. without any enhancement. For example, the performance on NIST SD 27 improves from 72.87% without 132 Figure 5.10 Example success (a and b) and failure (c and d) cases of the proposed LFR-Net on the NIST SD 27 latent database. 133 Table 5.9 LFR-Net ablation study. Global Emb. ✓ Minu. Modules Virtual Minu. Realign Enhance ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ Rank-1 accuracy (%) on a gallery of 100K NIST SD 27 39.92 57.75 65.12 72.48 72.87 52.33 67.69 75.58 84.11 84.11 N2N Latent 21.95 46.47 46.58 49.21 50.25 23.00 51.66 51.12 53.50 MSP Latent 59.27 72.45 78.67 80.71 81.78 60.34 78.24 81.46 83.92 MOLF (DB1/DB4) 42.41 45.05 54.86 60.34 60.77 53.48 60.52 67.18 70.14 MOLF (DB2/DB4) 38.48 33.36 45.20 50.25 50.52 49.85 50.32 59.68 62.68 54.36 84.35 70.43 62.86 enhancement to 84.11% with enhancement. 5.6 Conclusion In this chapter, we presented a pipeline for end-to-end latent fingerprint recognition and demon- strated its SOTA performance across five different latent fingerprint evaluations (for both closed-set and open-set identification), as well as its generalization across several rolled, plain, and contact to contactless fingerprint datasets. Our network incorporates a novel use of both local (minutiae and virtual minutiae) and global (AFR-Net) embeddings for improved latent fingerprint recogni- tion. We also present a multi-stage search strategy to decrease the time required for large-scale identification, which is adaptable for a desired trade-off in accuracy and search speed. Despite the performance improvement achieved by the methods proposed in this chapter, there still exists a gap in performance compared to controlled rolled to rolled fingerprint recognition. One of the significant factors limiting the performance of latent fingerprint recognition is the lack of large- scale, publicly available latent datasets for training. In fact, a lack of publicly available fingerprint recognition data is a significant inhibitor to progress in many aspects of fingerprint recognition. In the next chapter, we turn to synthetic fingerprint generation to help alleviate this challenge in one of these notable applications, fingerprint spoof (i.e., presentation attack) detection, where limited data is of particular consequence. 134 5.7 Acknowledgment Parts of this research were supported by a grant from the Department of Homeland Security via The Criminal Investigations and Network Analysis Center (CINA) at George Mason University. 135 CHAPTER 6 SYNTHETIC FINGERPRINT SPOOF IMAGES A major limitation to advances in fingerprint presentation attack detection (PAD) is the lack of publicly available, large-scale datasets - a problem which has been compounded by increased concerns surrounding privacy and security of biometric data. Furthermore, most state-of-the-art PAD algorithms rely on deep networks which perform best in the presence of a large amount of training data. This chapter aims to demonstrate the utility of synthetic (both bona fide and PA style) fingerprints in supplying these algorithms with sufficient data to improve the performance of fingerprint PAD algorithms beyond the capabilities when training on a limited amount of publicly available “real” datasets. First, we provide details of our approach in modifying a state-of-the- art generative architecture to synthesize high quality bona fide and PA fingerprints. 
Then, we provide quantitative and qualitative analysis to verify the quality of our synthetic fingerprints in mimicking the distribution of real data samples. We showcase the utility of our synthetic bona fide and PA fingerprints in training a deep network for fingerprint PAD, which dramatically boosts the performance across three different evaluation datasets compared to an identical model trained on real data alone. Finally, we demonstrate that only 25% of the original (real) dataset is required to obtain similar detection performance when augmenting the training dataset with synthetic data. We make our synthetic dataset and model publicly available to encourage further research on this topic: https://github.com/groszste/SpoofGAN. 6.1 Introduction Fingerprint recognition has had a long history in person identification due to the purported uniqueness and permanence of fingerprints, originally pointed out by Sir Francis Galton in his 1892 book titled Finger Prints [81] and reaffirmed in many works over the last century, including the well-known studies on the individuality and longitudinal permanence of fingerprint recogni- tion [190, 262]. Clearly, a significant contributor to their widespread adoption is the high level of This chapter was previously published as S. A. Grosz and A. K. Jain, “SpoofGAN: Synthetic Fingerprint Spoof Images”, IEEE Transactions on Information Forensics and Security, vol. 18, pp. 730-743, 2023. Copyright 2023 by IEEE. Reprinted with permission. 136 Figure 6.1 Example fabricated fingerprint PAs of various materials and corresponding fingerprint impressions captured on a CrossMatch Guardian200 fingerprint reader. (a) PlayDoh PA, (b) printed paper PA, (c) latex PA, (d) fingerprint impression from the PlayDoh artifact, (e) fingerprint impression from the printed paper artifact, and (f) fingerprint impression from the latex artifact. verification performance achieved by state-of-the-art (SOTA) algorithms for automated fingerprint recognition. However, despite the impressive accuracy achieved to date by the top-performing fingerprint recognition algorithms, there remain many ongoing efforts to further improve the ca- pabilities of fingerprint recognition systems - especially in terms of recognition speed and system security. As a result, there has been a recent push toward deep neural network (DNN) based models for fingerprint recognition [26,68,93,139,143,217,219]. These compact, fixed-length embeddings can be matched efficiently and combined with homomorphic encryption for added security [74]. For a more exhaustive account of existing deep learning approaches to fingerprint recognition and other biometric modalities, interested readers are encouraged to consult one of the many surveys on deep learning in biometrics (e.g., [172, 226]). Indeed, this push toward DNN-based fingerprint recognition comes in the wake of the success demonstrated in the face recognition domain in applying DNN models to face recognition, which was aided by the availability of large-scale face recognition databases which were easily crawled from the web despite the many ethical and privacy concerns which have led to many of these datasets to be recalled today. 
Arguably, at least in part, the reason for the delayed adoption of DNNs for fingerprint recognition has been the lack of publicly available, large-scale fingerprint recognition datasets and increased scrutiny over the privacy of biometric data, which has led many works to generate synthetic fingerprint images [6, 9, 11, 19, 21, 23, 36, 71, 76, 125, 171, 173, 198, 252, 268]. (Today's top-performing algorithm on the FVC-ongoing 1:1 hard benchmark achieves a False Non-Match Rate (FNMR) of just 0.626% at a False Match Rate (FMR) of 0.01% [63].)
Similarly, there has been an increased interest in DNN-based models for fingerprint PAD (i.e., spoof detection), where the scale and amount of publicly available data is also limited. Table 6.1 gives a list of the publicly available fingerprint PA datasets. Compared to the largest public fingerprint recognition dataset, NIST Special Database 302 [78], which contains fingerprints from 2,000 unique fingers, the largest publicly available fingerprint PA datasets, e.g., the LivDet competition datasets, contain at most 1,000 unique fingers (for the Swipe sensor in LivDet 2013 [86]; swipe sensors are no longer in vogue after Apple introduced the "area capacitive sensor" in Touch ID). Compounding the problem is the difficulty in collecting large-scale fingerprint PA datasets due to the increased time and complexity of fabricating and imaging artifacts mimicking realistic fingerprint ridge-valley structures, motivating the potential of synthetic data as a viable alternative. However, to the best of our knowledge, there does not exist a synthetic fingerprint PA generator to fill the gap between the amount of publicly available fingerprint PA data and the training needs of data-hungry deep learning-based models.
To address the lack of large-scale fingerprint PA datasets, we propose SpoofGAN. Inspired by the impressive results of the recently proposed PrintsGAN [71], SpoofGAN is a multi-stage generative architecture for fingerprint generation. SpoofGAN differs from PrintsGAN in the following ways:
• Generation of plain print fingerprints, which, compared to the rolled prints generated by PrintsGAN, are more representative of the publicly available PA fingerprint datasets and exhibit different textural characteristics, distortions, etc.
• Ability to synthesize representations of both bona fide (i.e., live) and PA fingerprint images of the same finger.
• Replacing the learned warping and cropping module with a statistical, controllable non-linear deformation model to synthesize multiple, realistic impressions per finger. This allows us to control the degree of distortion applied.
• Synthetic bona fide fingerprint image: synthetic images that mimic the distribution of finger- print images captured from a real human finger. • Synthetic PA fingerprint image: synthetic images that mimic the distribution of fingerprint images captured from presentation attack artifacts. We validate the realism of our synthetic bona fide and PA images through extensive qualitative and quantitative metrics including NFIQ2 [228], minutiae statistics, match scores from a SOTA fingerprint matcher, and T-SNE feature space analysis showing the similarity of real bona fide and PA embeddings to the embeddings of our synthetic bona fide and PA fingerprints. Besides verifying the realism of our synthetic PA generator, we also show how SpoofGAN fingerprints can be used to train a DNN for fingerprint PAD. We show this by improving the performance of a PAD model by augmenting an existing fingerprint PA dataset with additional samples from our synthetic generator. We also open the door to jointly optimizing for fingerprint PAD and recognition in an end-to-end learning framework with our ability to generate a large-scale dataset of multiple impressions per finger of both bona fide and PA examples. More concisely, the contributions of this research are as follows: • A highly realistic plain print synthetic fingerprint generator capable of generating multiple impressions per finger. 139 • The first, to the best of our knowledge, synthetic fingerprint PA generator which is capable of producing synthetic representations of both bona fide and PA impressions of the same finger. This opens the door to joint optimization of fingerprint PAD and recognition algorithms. • Quantitative and qualitative analysis to verify the quality of our generated bona fide and PA fingerprints. • Experiments showcasing improved fingerprint PAD on both seen and unseen PA material types when augmenting existing fingerprint datasets with our synthetic bona fide and PA fingerprints. • We release our code and a database of SpoofGAN images to encourage further research in this area https://github.com/groszste/SpoofGAN. 6.2 Related Work 6.2.1 Fingerprint Presentation Attack Detection One significant risk to the security of fingerprint recognition systems is that of presentation attacks, defined by the international standard ISO/IEC 30107-1:2016 as a “presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system” [117]. The most common type of presentation attacks are spoof attacks, i.e., physical representations of finger-like structures aimed at either mimicking the fingerprint ridge-valley structure of another individual or subverting the user’s own identity. Spoof attacks may come in many different forms and materials such as those shown in Figure 6.1. Several hardware-based and software-based solutions to detecting spoof attacks have been proposed. Hardware-based solutions include specialized sensors that leverage various “liveness” cues at the time of acquisition, such as conductivity of the material/finger, sub-dermal imaging, and multi-spectral lighting [12, 43, 67, 135, 221, 236]. On the other hand, software-based solutions typically rely on only the information captured in the grayscale image acquired by the fingerprint reader [41, 84, 85, 164, 167, 177, 188]. 
Table 6.1 Publicly available fingerprint PA datasets. Columns: Name; Sensors; # Train Images Bona Fide (PA); # Test Images Bona Fide (PA); PA types.
Name (Sensors): LivDet 2009 [166] (Biometrika, Italdata); LivDet 2011 [257] (Biometrika, DigitalPersona, ItalData, Sagem); LivDet 2013 [86] (Biometrika, ItalData, CrossMatch, Swipe); LivDet 2015 [175] (Biometrika, DigitalPersona, GreenBit, CrossMatch); LivDet 2017 [176] (GreenBit, Orcanthus, DigitalPersona); LivDet 2019 [186] (GreenBit, Orcanthus, DigitalPersona); LivDet 2021 [39] (GreenBit, Dermalog); MSU FPAD [41] (CrossMatch); MSU FPAD v2 [42] (CrossMatch).
# Train and # Test Images Bona Fide (PA), per sensor, as listed: 1000 (1000), 1000 (1000), 1000 (1000), 1000 (1000), 1000 (1000), 1000 (1000), 1000 (1000), 1000 (1000), 1000 (1000), 1000 (1000), 1250 (1000), 1250 (1000), 1000 (1000), 1000 (1000), 1000 (1000), 1510 (1473), 1000 (1200), 1000 (1200), 999 (1199), 1000 (1200), 1000 (1200), 1000 (1000), 1250 (1500), 1250 (1500), 2250 (3000), 2250 (2250), 4743 (4912), 1000 (1000), 1000 (1000), 1000 (1000), 1000 (1000), 1000 (1000), 1000 (1000), 1250 (1000), 1250 (1000), 1000 (1500), 1000 (1500), 1000 (1500), 1500 (1448), 1700 (2040), 1700 (2676), 1700 (2028), 1020 (1224), 990 (1088), 1019 (1224), 2050 (2460), 2050 (2460), 2250 (3000), 2250 (2250), 1000 (leave-one-out).
PA types (per dataset, in order): Ecoflex, Gelatine, Latex, Modasil, WoodGlue; Gelatine, Latex, PlayDoh, Silicone, Wood Glue, Ecoflex; Body Double, Latex, PlayDoh, Wood Glue, Gelatine, Ecoflex, Modasil; Ecoflex, Gelatine, Latex, Liquid Ecoflex, RTV, WoodGlue, Body Double, PlayDoh, OOMOO; Body Double, Ecoflex, Wood Glue, Gelatine, Latex, Liquid Ecoflex; Body Double, Ecoflex, Wood Glue, Gelatine, Latex, Liquid Ecoflex; Latex, RProFast, Nex Mix 1, Body Double, Elmer's Glue, GLS20, RFast30; Ecoflex, PlayDoh, 2D Matte Paper, 2D Transparency; 2D Printed Paper, 3D Universal Targets, Conductive Ink on Paper, Dragon Skin, Gelatine, Gold Fingers, Latex Body Paint, Monster Liquid Latex, PlayDoh, Silicone, Transparency, Wood Glue.
1 The dataset release agreement for all LivDet databases can be found at https://livdet.org/registration.php.
2 Similarly, the dataset release form for the MSU FPAD dataset can be found at http://biometrics.cse.msu.edu/Publications/Databases/MSU_FPAD/

Despite the limited publicly available fingerprint PA data, many of the state-of-the-art software-based solutions to fingerprint PAD leverage convolutional neural networks to learn the decision boundary between bona fide and PA images. (In this work, we use the terms spoof and presentation attack interchangeably.) Some researchers have proposed training their algorithms on smaller patches of the fingerprint images as a way to deal with limited amounts of available training data, which roughly increases the number of training images by a factor proportional to the number of patches [41]. However, given the increased scrutiny over privacy concerns related to biometric datasets, it is not certain whether any PA fingerprint datasets will remain available in the future, motivating the need for synthetic data. Another challenge related to limited training data is that of unseen PAs, i.e., fingerprint images arising from never-before-seen PA instruments. This problem is also commonly referred to in the literature as cross-material generalization.
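To make the patch-based training strategy mentioned above concrete, the following is a minimal sketch (not the authors' implementation) of splitting a fingerprint image into fixed-size patches so that each image contributes many training samples to a PAD model. For simplicity it uses grid patches with a crude foreground test rather than the minutiae-centered patches of [41]; the patch size and threshold values are illustrative assumptions.

import numpy as np

def extract_patches(img, patch=96, stride=96, fg_thresh=0.1):
    """Split a grayscale fingerprint image (2D uint8 array) into patches.

    A patch is kept only if enough of it contains ridge content, judged by a
    crude darkness test (fingerprint ridges are darker than the background).
    """
    h, w = img.shape
    patches = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            p = img[y:y + patch, x:x + patch]
            # Fraction of "dark" pixels; skip mostly-background patches.
            if (p < 128).mean() >= fg_thresh:
                patches.append(p)
    return patches

# Example: a random 512x512 array stands in for a real fingerprint scan.
dummy = (np.random.rand(512, 512) * 255).astype(np.uint8)
print(len(extract_patches(dummy)), "patches kept")

In practice, each kept patch simply inherits the bona fide/PA label of its source image.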
Some strategies proposed to improve the cross-material performance of PA detectors include learning a tighter boundary around the bona fide class via one-class classifiers [58, 72], incorporating adversarial representation learning to encourage robustness to varying material types [92, 193], or applying style transfer to mix textures from some known PA materials to better fill the space of unknown texture characteristics that may be encountered [44, 79]. Similar ideas may apply to synthetic data generation, where new material types can be synthesized by mixing characteristics of known PAs.

6.2.2 Synthetic Fingerprint Generation

Research on synthetic fingerprint generation began in the early 2000s with the introduction of SFinGe [33]. Since then, many subsequent works have followed, utilizing either hand-crafted approaches [125, 131, 268], learning-based approaches [6, 11, 19, 21, 23, 76, 171, 173], or a combination of both [71, 173, 252]. Classical methods, such as SFinGe, are useful for many applications due to the controllable nature of the generation process; however, they still lack the level of realism needed to close the domain gap to real fingerprint images. On the other hand, more recent learning-based approaches have generated increasingly realistic fingerprint ridge patterns but could not generate multiple impressions of the same finger. Fortunately, hybrid approaches, such as [71] and [252], incorporate domain knowledge into the learning-based generation process and can generate high quality fingerprints with multiple impressions per finger. Motivated by the success of hybrid approaches, we also employ a similar architecture to [71] to generate multiple, realistic fingerprint images of each finger. However, unique to our approach, we also have the ability to simulate realistic PA impressions for each finger in a variety of different PA artifact "styles". To the best of our knowledge, this is the first study on synthetic PA fingerprint image generation.

6.3 Proposed Synthetic Presentation Attack Fingerprint Generator

In this section, we detail the process of generating synthetic bona fide and PA fingerprints. Motivated by the success of previous multi-stage fingerprint generation methods (e.g., [36, 71, 252]), SpoofGAN generates highly realistic fingerprints in multiple stages. First, unique fingers are synthesized by generating binary master fingerprints, which define the fingerprint ridge structure of the finger. Following the master print synthesis stage, perturbations such as random rotation, translation, and non-linear deformation are applied to simulate realistic, repeat impressions. Finally, each generated fingerprint impression is input to a second neural network to impart realistic textures which mimic a database of real fingerprints. An overview of the entire process is given in Figure 6.2.

Figure 6.2 Overview of the SpoofGAN Architecture.
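Before detailing each stage, the overall flow of Figure 6.2 can be summarized in the sketch below. The three stand-in functions are trivial stubs that only mark where the master-print GAN, the statistical warping module, and the per-material texture renderer of Sections 6.3.1-6.3.3 would be invoked; the latent dimensions simply mirror the description in the text, and none of this is the actual SpoofGAN code.

import numpy as np

rng = np.random.default_rng(0)

# Trivial stubs; in SpoofGAN these are a BigGAN master-print generator, a
# statistical warping module, and a per-material texture-rendering GAN.
def sample_master_print(z_id):
    return (rng.random((256, 256)) > 0.5).astype(np.uint8)

def warp_impression(master):
    return np.roll(master, shift=rng.integers(-25, 26, size=2), axis=(0, 1))

def render_texture(warped, z_t, material):
    return warped.astype(np.float32)  # a real renderer would add sensor texture

def generate_impressions(n_impressions, material="bona fide"):
    z_id = rng.standard_normal(256)        # identity latent, z_id ~ N(0, 1)
    master = sample_master_print(z_id)     # binary master fingerprint
    out = []
    for _ in range(n_impressions):
        warped = warp_impression(master)   # rotation/translation/distortion
        z_t = rng.standard_normal(128)     # texture latent, z_t ~ N(0, 1)
        out.append(render_texture(warped, z_t, material))
    return out

prints = generate_impressions(3, material="ecoflex")
print(len(prints), prints[0].shape)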
6.3.1 Master Print Synthesis

The first step in generating synthetic fingerprints with SpoofGAN is generating binary master fingerprints I_id ∈ {0, 1}^(256×256) from a random vector z_id ∈ R^256 sampled from a standard normal distribution (i.e., z_id ~ N(0, 1)). In particular, we used a standard BigGAN [22] architecture for this task, consisting of a generator G_id and a discriminator D_id. Since many PA impressions can exhibit non-realistic fingerprint ridge structures, either from artifacts introduced in the fabrication (e.g., bubbles in the ridges) or in the presentation process (e.g., smudges due to the high elasticity of some PA material types), we chose to train G_id using a database of only bona fide fingerprint impressions consisting of 38,164 images captured on a CrossMatch Guardian200 fingerprint reader. As we will show later, these artifacts for the PA impressions can be introduced in the later texture rendering stage of our synthesis pipeline. The network is trained via the adversarial loss shown in Equation 6.1, where I is a binary fingerprint image extracted from a real fingerprint image using the Verifinger v12.0 SDK.

\mathcal{L}_{adv}(G, D) = \mathbb{E}_{I}[\log D(I)] + \mathbb{E}_{z}[\log(1 - D(G(z)))]    (6.1)

6.3.2 Generating Multiple Realistic Impressions

To generate multiple impressions from a single master print, we apply realistic rotation, translation, and non-linear deformation for each subsequent impression. Rotations are applied via uniform random sampling in the range [-30°, 30°], whereas translations in both the x and y directions are uniformly sampled in the range [-25, 25] pixels. Finally, realistic non-linear deformations are applied via a learned, statistical deformation model proposed by Si et al. in [215]. The distortion parameters were learned from a database of 320 distorted fingerprint videos in which the minutiae locations in the first and last frames were manually labeled, and the displacements between corresponding minutiae points were used to estimate a distortion field via a Thin-Plate-Spline deformation model [20]. The distortion fields were condensed into a subset of eigenvectors, e_i, computed from a Principal Component Analysis of the covariance matrix estimated from the 320 videos. By varying the coefficients, c_i, multiplied by each of the eigenvectors, we vary the magnitude of distortion, d, applied to an input fingerprint along realistic distortion directions according to Equation 6.2, where λ_i are the eigenvalues of each eigenvector and d̄ is the average distortion field:

d \approx \bar{d} + \sum_{i=1}^{t} c_i \sqrt{\lambda_i}\, e_i    (6.2)

For our implementation, we randomly sample the coefficients of the two largest eigenvectors from a normal distribution with mean 0 and standard deviation of 0.66, which were empirically determined to produce reasonable distortions.
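A minimal sketch of the impression-perturbation step follows, assuming the mean field d̄, eigen-displacement fields e_i, and eigenvalues λ_i of the statistical distortion model have already been estimated offline (random placeholders stand in for them here); only the two largest components are perturbed, with coefficients drawn from N(0, 0.66) as described above.

import numpy as np

rng = np.random.default_rng(0)

# Placeholder distortion statistics (in SpoofGAN these come from the PCA model
# of Si et al. [215]): mean field d_bar and t eigen-displacement fields e_i
# with eigenvalues lam_i, defined on a coarse H x W sampling grid.
H, W, t = 16, 16, 5
d_bar = np.zeros((H, W, 2))
e = rng.standard_normal((t, H, W, 2))
lam = np.sort(rng.random(t))[::-1]

def sample_impression_params(sigma=0.66, n_active=2):
    """Sample one impression's perturbation: rotation, translation, and a
    distortion field d ~ d_bar + sum_i c_i * sqrt(lam_i) * e_i (Eq. 6.2)."""
    theta = rng.uniform(-30.0, 30.0)              # degrees
    tx, ty = rng.uniform(-25.0, 25.0, size=2)     # pixels
    c = np.zeros(t)
    c[:n_active] = rng.normal(0.0, sigma, size=n_active)
    d = d_bar + np.tensordot(c * np.sqrt(lam), e, axes=1)
    return theta, (tx, ty), d

theta, (tx, ty), d = sample_impression_params()
print(round(theta, 1), (round(tx, 1), round(ty, 1)), d.shape)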
6.3.3 Texture Rendering

The final stage of our fingerprint generation process consists of imparting each fingerprint with a realistic texture that mimics the distribution of real bona fide and PA images. For the generator, G_t, we use an encoder-decoder architecture which translates an input warped binary image, I_w, into a realistic fingerprint impression, I_t. To promote diversity in the rendered images, a random texture vector sampled from a standard normal distribution (i.e., z_t ~ N(0, 1)) is injected into the network and encoded into γ and β parameters for performing instance normalization on the intermediate feature maps of G_t. Finally, the discriminator, D_t, utilizes the same architecture used in the binary master print synthesis network.

The goal of our texture renderer is two-fold: i.) generate realistic texture details and ii.) maintain the fingerprint ridge structure (i.e., identity) of the rendered fingerprint between corresponding impressions of the same finger. Thus, we introduce two losses in addition to the conventional GAN loss (Eq. 6.3) to maintain the ridge structure of textured fingerprints. The first is an identity loss to minimize the L2 distance between feature embeddings of corresponding fingerprint impressions using a SOTA fingerprint matcher, DeepPrint [68] (Eq. 6.4), and the other is an L2 pixel loss between ground truth binary images and binary images extracted from the textured fingerprints (Eq. 6.5). The L2 pixel loss is computed on the binary images, rather than the grayscale images, to allow the network to generate diverse "styles" in the generated fingerprints, simulating different pressure, moisture content, and contrast in subsequent impressions; all of these would lead to slightly different loss values compared to the ground truth image unless first converted to binary ridge images. To make the binarization of the generated fingerprints differentiable, we train a convolutional autoencoder to binarize input fingerprints, which is trained on 38,164 grayscale/binary image pairs. The overall losses for G_t and D_t are given in Equations 6.6 and 6.7, respectively.

1. GAN loss: classical min-max GAN loss between the discriminator, D_t(·), trying to classify each original fingerprint image, I, as real and each synthetic fingerprint I_t = G_t(I_w) as fake. Meanwhile, G_t(·) is trying to fool D_t(·) into thinking its outputs come from the original image distribution.

\mathcal{L}_{adv} = \mathbb{E}_{I}[\log D(I)] + \mathbb{E}_{I_w}[\log(1 - D(G(I_w)))]    (6.3)

2. DeepPrint loss: L2 distance between the DeepPrint embedding, R, extracted from the ground truth grayscale image and the DeepPrint embedding, R̂, extracted from the synthesized grayscale fingerprint image.

L_{dp} = \frac{1}{2} \sum (R - \hat{R})^2    (6.4)

3. Image/pixel loss: L2 loss between the ground truth binary fingerprint image, I_w, and the synthesized binary fingerprint image, Î_w.

L_{i} = \frac{1}{2} \sum_{x,y} (I_w(x, y) - \hat{I}_w(x, y))^2    (6.5)

4. Overall loss for G_t(·), with λ_1 = 1, λ_2 = 2, and λ_3 = 10 (determined empirically):

\mathcal{L}_{G_t} = \lambda_1 \mathcal{L}_{adv} + \lambda_2 \mathcal{L}_{dp} + \lambda_3 \mathcal{L}_{i}    (6.6)

5. Overall loss for D_t(·):

\mathcal{L}_{D_t} = \mathcal{L}_{adv}    (6.7)

Unlike the binary master print synthesis and warping stages, an individual texture rendering network is trained for each material type (bona fide, Ecoflex PA, PlayDoh PA, etc.). Due to the limited number of images in our PA dataset, we pretrained a texture rendering network on the 282K unique fingerprint database taken from the MSP longitudinal database introduced in [262]. Initially, following the pretraining, two texture rendering networks are trained further, one on the dataset of 38,164 bona fide only impressions and the other on the 3,366 PA fingerprint images consisting of all PA types aggregated together. Finally, we further finetune the model trained on all PAs for each of the individual PA types to give more fine-grained control over the specific PA style being generated. Thus, unlike the binary master print synthesis and warping stages, which are shared, each individual PA type has its own rendering network. Alternatively, a conditional GAN structure could be used to generate PA classes of each type within a single network; however, we found that due to the very limited number of training images for some PA types (e.g., 50 images), finetuning for just a few epochs on each PA type individually produced higher quality images.

6.3.4 Training Details

For training the master print generator, G_id, an Adam optimizer with a learning rate of 0.0001 was used, whereas a Moving Average Optimizer with an initial learning rate of 0.0004 was used to train the discriminator, D_id.
Furthermore, the master print generator was trained with a batch size of 8 on two NVIDIA GeForce RTX 2080 Ti GPUs for a total of 178 epochs, where each epoch contained 100 batches. To help balance the training, the generator was updated twice for every update of the discriminator. The architecture for both 𝐺𝑖𝑑 and 𝐷𝑖𝑑 are given in Table 6.2 and Table 6.3, respectively. Finally, 𝐺𝑡 and 𝐷𝑡 of the texture renderer utilized the same optimizers as 𝐺𝑖𝑑 and 𝐷𝑖𝑑, respec- tively; however, 𝐺𝑡 was updated 3 times for every update of 𝐷𝑡. To increase the diversity in the generated samples, multiple checkpoints of the texture renderer are used in generating the synthetic data that was used for training the PAD model. The full architectures for 𝐺𝑡 and 𝐷𝑡 are given in Tables 6.4 and 6.5, respectively. For completeness, the architecture for the CNN-based fingerprint binarizer used in training the texture render is given in Table 6.6, and the architecture for the texture encoder (which encodes 𝛾 and 𝛽 parameters from a random texture vector) utilized in 𝐺𝑡 is given in Table 6.7. This database is not publicly available, but the pretrained model can be made available upon request. 147 Table 6.2 Architecture for 𝐺𝑖𝑑 (·) (𝐶ℎ = 48). Layer 0. Input 1. ReLU(Dense) 2. Reshape 3. ResBlock Up† 4. ResBlock Up† 5. ResBlock Up† 6. ResBlock Up† 7. ResBlock Up† 8. Self Attention 9. ResBlock Up† 10. Tanh(Conv2d(Relu(Batch Norm)))) † Layer contains conditional batch norm. Table 6.3 Architecture for 𝐷𝑖𝑑 (·) (𝐶ℎ = 48). Layer 0. Input 1. ResBlock Down 2. ResBlock Down 3. Self Attention 4. ResBlock Down 5. ResBlock Down 6. ResBlock Down 7. ResBlock Down 8. ResBlock 9. Dense(Global Sum Pooling(ReLU)) 6.4 Experimental Results Output Dim. 512 12, 288 4 × 4 × 16𝑐ℎ 8 × 8 × 16𝑐ℎ 16 × 16 × 8𝑐ℎ 32 × 32 × 8𝑐ℎ 64 × 64 × 4𝑐ℎ 128 × 128 × 2𝑐ℎ 128 × 128 × 2𝑐ℎ 256 × 256 × 𝑐ℎ 256 × 256 × 1 Output Dim. 256 × 256 × 1 128 × 128 × 𝑐ℎ 64 × 64 × 2𝑐ℎ 64 × 64 × 2𝑐ℎ 32 × 32 × 4𝑐ℎ 16 × 16 × 8𝑐ℎ 8 × 8 × 8𝑐ℎ 4 × 4 × 16𝑐ℎ 4 × 4 × 16𝑐ℎ 1 In this section, we aim to validate the realism of our synthetic bona fide and PA images via several qualitative and quantitative experiments. First, we provide details on the datasets involved in the following experiments, followed by some example fingerprint images generated by SpoofGAN to qualitatively compare with real fingerprint images. Finally, several quantitative metrics are used to compare the utility and distribution of SpoofGAN generated fingerprint images compared to the real fingerprint images. 6.4.1 Datasets A main motivation for this paper is the lack of large-scale, publicly available fingerprint PAD datasets. Some of the largest datasets that are available have resulted from the biennial LivDet competition series dating as far back as 2009 [39,86,175,176,186,257]. A more comprehensive list of the fingerprint PA datasets currently available to the research community is given in Table 6.1, whereas the datasets used in this paper are given in Table 6.8. In this paper we focus our experiments 148 Table 6.4 Architecture for 𝐺𝑡 (·) (𝐶ℎ = 48). Layer 0. Input 1. ReLU(Batch Norm(Conv2d)) 2. ReLU(Batch Norm(Conv2d)) 3. ReLU(Batch Norm(Conv2d)) 4. ReLU(Batch Norm(Conv2d)) 5. ReLU(Batch Norm(Conv2d)) 6. Dense 7. Dense 8. Reshape 9. ResBlock Up† 10. ResBlock Up† 11. ResBlock Up† 12. ResBlock Up† 13. Self Attention 14. ResBlock Up† 15. ResBlock Up† 16. ResBlock Up† 17. Tanh(Conv2d(ReLU(Batch Norm))) † Layer contains conditional batch norm. Table 6.5 Architecture for 𝐷𝑡 (·) (𝐶ℎ = 48). Layer 0. Input 1. ResBlock Down 2. 
ResBlock Down 3. ResBlock Down 4. Self Attention 5. ResBlock Down 6. ResBlock Down 7. ResBlock Down 8. ResBlock Down 9. ResBlock 10. Dense(Global Sum Pooling(ReLU)) Output Dim. 256 × 256 × 1 128 × 128 × 32 64 × 64 × 64 32 × 32 × 128 16 × 16 × 256 8 × 8 × 512 512 16𝑐ℎ 4 × 4 × 𝑐ℎ 8 × 8 × 16𝑐ℎ 16 × 16 × 8𝑐ℎ 32 × 32 × 8𝑐ℎ 64 × 64 × 4𝑐ℎ 64 × 64 × 4𝑐ℎ 128 × 128 × 2𝑐ℎ 256 × 256 × 𝑐ℎ 512 × 512 × 𝑐ℎ 512 × 512 × 1 Output Dim. 512 × 512 × 1 256 × 256 × 𝑐ℎ 128 × 128 × 𝑐ℎ 64 × 64 × 2𝑐ℎ 64 × 64 × 2𝑐ℎ 32 × 32 × 4𝑐ℎ 16 × 16 × 8𝑐ℎ 8 × 8 × 8𝑐ℎ 4 × 4 × 16𝑐ℎ 4 × 4 × 16𝑐ℎ 1 on fingerprint images obtained via the CrossMatch optical reader from LivDet 2013, LivDet 2015, and the Government Controlled Test (GCT) dataset of bona fide and PA fingerprints collected as part of the IARPA ODIN program. Our training dataset for SpoofGAN, referred to as GCT 1-5, consists of 38,164 bona fide fingerprint images and 3,366 PA fingerprint images from 2,007 fingers and 11 different PA types (Dragon Skin, Ecoflex, Paper, Silicone, Transparency, Gelatine, Glue, PDMS, Knox Gelatine, Gummy Overlay, and Tattoo). For our evaluations, we have followed the same train/test protocol referenced in LivDet2015 and LivDet2013, as well as reserved a fraction of the GCT dataset, GCT 6, as an evaluation dataset. We selected CrossMatch for our experiments since it is one of the most popular slap (4-4-2) capture readers used in law enforcement, homeland security and civil registry applications. 149 Table 6.6 Architecture for Fingerprint Binarizer. Layer 0. Input 1. ReLU(Instance Norm(Conv2d))) 2. ReLU(Instance Norm(Conv2d))) 3. ReLU(Instance Norm(Conv2d))) 4. ReLU(Instance Norm(Conv2d))) 5. ReLU(Instance Norm(Conv2d))) 6. UpScale2d(ReLU(Layer Norm(Conv2d)))) 7. UpScale2d(ReLU(Layer Norm(Conv2d)))) 8. UpScale2d(ReLU(Layer Norm(Conv2d)))) 9. UpScale2d(ReLU(Layer Norm(Conv2d)))) 10. UpScale2d(ReLU(Layer Norm(Conv2d)))) 11. Tanh(Layer Norm(Conv2d)))) Output Dim. 256 × 256 × 1 128 × 128 × 64 64 × 64 × 128 32 × 32 × 256 16 × 16 × 384 8 × 8 × 512 16 × 16 × 384 32 × 32 × 256 64 × 64 × 128 128 × 128 × 2 256 × 256 × 2 256 × 256 × 1 Table 6.7 Architecture for the texture encoder used in the conditional batch norm layers of 𝐺𝑡 (·). Layer 0. Input 1. ReLU(Dense) 2. ReLU(Dense) 3. ReLU(Dense) 4. ReLU(Dense) Output Dim. 128 128 128 128 128 6.4.2 Qualitative Analysis of Synthetic Bona Fide and PA Images Example synthetic bona fide and PA fingerprints of varying material types (i.e., presentation Attack Instruments) are shown in column (b) of Figure 6.3 with corresponding real examples in column (a) shown for reference. Visually, SpoofGAN is generating PA examples that closely resemble the target material type, which can be seen in the different texture characteristics for each material seen in the synthetic examples (e.g., Paper vs. Tattoo, etc.). Additionally, looking across the rows of the synthetic SpoofGAN images, we notice that the underlying fingerprint ridge structure is successfully being preserved across the different PA styles being generated. Figure 6.4 further highlights SpoofGAN’s ability to successfully generate multiple impressions of each finger in both bona fide and PA styles. 
6.4.3 Quantitative Analysis of Synthetic Bona Fide and PA Images

Similar to previous works on synthetic fingerprint generation [11, 71], we have evaluated the quality of SpoofGAN fingerprints on several quantitative metrics, including PAD performance by a pretrained PA detector, the distribution of minutiae count, type, and quality extracted from SpoofGAN generated fingerprints compared to a real fingerprint dataset, NFIQ2 quality scores, match score distributions from a SOTA fingerprint matcher, and identity leakage experiments.

Figure 6.3 Example real bona fide and PA images and synthetic bona fide and PA images generated by SpoofGAN of various material types: (a) bona fide, (b) ecoflex, (c) body double, (d) conductive ink on paper, (e) tattoo, and (f) gelatine.

Table 6.8 Summary of the PA datasets used in our experiments. All images were captured at 500 dpi on CrossMatch readers.
LivDet 2013 — CrossMatch L Scan Guardian; Bona Fide images (Train/Test): 1,250 / 1,250; PA images (Train/Test): 500 / 440; PA materials: Body Double, Latex, PlayDoh, Wood Glue.
LivDet 2015 — CrossMatch L Scan Guardian; Bona Fide images (Train/Test): 1,510 / 1,500; PA images (Train/Test): 1,473 / 1,448; PA materials: Body Double, Ecoflex, PlayDoh, OOMOO, Gelatine.
GCT 1-5 — CrossMatch Guardian200; Bona Fide images (Train/Test): 38,164 / 0; PA images (Train/Test): 3,366 / 0; PA materials: Dragon Skin, Ecoflex, Paper, Silicone, Transparency, Gelatine, Glue, PDMS, Knox Gelatine, Gummy Overlay, Tattoo.
GCT 6 — CrossMatch Guardian200; Bona Fide images (Train/Test): 7,357 / 14,236; PA images (Train/Test): 2,550 / 1,829; PA materials: Ecoflex, Silicone, Knox Gelatine, Gummy Overlay, Tattoo.
MSU FPADv2 — CrossMatch Guardian200; Bona Fide images (Train/Test): 4,743 / 1,000; PA images: 4,912†; PA materials: Paper, Transparency, 3D Universal Targets, Conductive Ink on Paper, Dragon Skin, Gelatine, Gold Fingers, PlayDoh, Latex Body Paint, Monster Liquid Latex, Silicone, Wood Glue.
† There are 4,912 total PA images, but the number of train/test images depends on which PA is left out for the cross-material generalization evaluation.

6.4.3.1 Presentation Attack Detection Performance of Real vs. Synthetic Fingerprints

Our first evaluation to verify the quality of our synthetic bona fide and PA fingerprints is to see whether a pretrained PAD algorithm trained on similar, real fingerprints performs equally well on our synthetic fingerprints. In particular, we pretrained an Inception v3 network on the GCT 1-5 data to classify between bona fide and PA fingerprint samples. Then, we evaluated the PAD performance on LivDet 2015 CrossMatch images compared to an equivalent sized database of synthetic fingerprints. As shown in Table 6.9, the attack presentation classification error rate (APCER), computed at a threshold corresponding to a bona fide presentation classification error rate (BPCER) of 0.2%, is similar across multiple PA types of both datasets, supporting our hypothesis that the synthetic samples should be useful in training additional PAD models without access to a large database of real bona fide and PA fingerprints for training. (APCER and BPCER are the standard metrics according to ISO/IEC 30107-1:2016; however, other metrics have also been reported in the literature, including true detection rate at a fixed false detection rate, where BPCER=FDR and APCER=1-TDR, albeit with APCER computed per PAI.) Lastly, the embeddings of both real and synthetic images in the t-SNE embedding space suggest high similarity between the embeddings of real and synthetic images (see Figure 6.5).
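The operating point used throughout these experiments reduces to choosing the score threshold that yields the target BPCER on bona fide samples and then reporting APCER per PA type at that threshold. The sketch below illustrates the computation on randomly generated stand-in scores; it is not the authors' evaluation code, and the score convention (higher = more PA-like) is an assumption.

import numpy as np

def apcer_at_bpcer(bona_fide_scores, pa_scores, target_bpcer=0.002):
    """APCER at the threshold giving the target BPCER.

    Scores are "spoofness" scores (higher = more likely PA). The threshold is
    chosen so that roughly target_bpcer of bona fide samples score above it.
    """
    thr = np.quantile(bona_fide_scores, 1.0 - target_bpcer)
    return float((np.asarray(pa_scores) <= thr).mean())  # PAs wrongly accepted

rng = np.random.default_rng(0)
bona_fide = rng.normal(0.1, 0.05, 1500)   # stand-in bona fide test scores
ecoflex = rng.normal(0.8, 0.15, 270)      # stand-in Ecoflex PA scores
print(f"APCER @ BPCER=0.2%: {100 * apcer_at_bpcer(bona_fide, ecoflex):.2f}%")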
Figure 6.4 Example images of multiple impressions of the same finger generated by SpoofGAN. (a) and (b) show three impressions each of two fingers rendered in a bona fide style, whereas (c) and (d) show three impressions each of the same two fingers in a PA style (ecoflex and body double, respectively).

Table 6.9 PAD model trained on real fingerprints (GCT 1-5) and evaluated on LivDet 2015 CrossMatch images (top row) and an equivalent-sized synthetic fingerprint dataset (bottom row)¹. Results given in APCER at a threshold corresponding to BPCER=0.2%.
Real: Body Double 0%, Ecoflex 0.43%, PlayDoh 0.42%, OOMOO 0%, Gelatine 0.42%.
Synthetic: Body Double 0%, Ecoflex 1.42%, PlayDoh 0%, OOMOO 0%, Gelatine 0.88%.
¹ There are 1,500 bona fide and 1,448 PA fingerprint test images for CrossMatch in LivDet 2015, which we have replicated with synthetic data. Specifically, the PA images consist of 300 Body Double, 270 Ecoflex, 297 OOMOO, 281 PlayDoh, and 300 Gelatine images.

Figure 6.5 3D visualization of 2,048-dimensional embeddings of real bona fide and PA images from the LivDet 2015 CrossMatch dataset compared with embeddings of our synthetic bona fide and PA images generated from SpoofGAN. Best viewed in color.

6.4.3.2 Feature Similarity Between Real and Synthetic Fingerprints

For synthetic fingerprint images to be useful as a substitute for real fingerprint images, the features between a database of real fingerprint images and synthetic fingerprint images should closely align. For this analysis, we computed several statistics from the LivDet 2015 CrossMatch training dataset (bona fides only) and 1,500 SpoofGAN bona fide fingerprint images, which are shown in Table 6.10. In terms of fingerprint area, SpoofGAN images are, on average, smaller compared to the real fingerprint database. Since our training dataset consists of images of all 10 fingers, there is a bias toward smaller fingerprints, with the thumb as a minority class. Given this, it is perhaps unsurprising that a GAN-based generation approach might exaggerate this class imbalance and generate impressions with smaller fingerprint area. This problem is related to mode collapse and has been noted in several GAN-related works [89, 205, 222], with some recent papers proposing strategies to improve the generation process in class-imbalanced datasets [114, 168].

Next, we computed the NFIQ 2.0 quality metric [228] on both datasets (see Figure 6.6). The NFIQ 2.0 scores for SpoofGAN are, on average, lower compared to LivDet. However, since one of the features considered in NFIQ 2.0 is the fingerprint area, we recomputed the scores on a 256 × 256 center crop of each of the fingerprints and observed that, independent of fingerprint area, the NFIQ scores between SpoofGAN images and LivDet are much more aligned (36.88 ± 10.18 vs. 43.65 ± 14.12).

Figure 6.6 NFIQ2 quality scores for LivDet 2015 CrossMatch bona fide images and SpoofGAN synthetic bona fide images. Due to the smaller fingerprint area of SpoofGAN images compared to LivDet CrossMatch images, the NFIQ2 distributions of the full images are quite different between the two datasets; however, if we compute NFIQ2 scores on 256 × 256 center crops, the means of the distributions are much closer.
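The area-independent comparison above amounts to cropping a 256 × 256 window about the image center before scoring. A minimal sketch is given below; compute_nfiq2 is a hypothetical stand-in for an actual NFIQ 2.0 invocation (e.g., through the NIST tool), since the real scorer is not reproduced here.

import numpy as np

def center_crop(img, size=256):
    """Return a size x size crop about the center of a 2D image array."""
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def compute_nfiq2(img):
    # Hypothetical stand-in for a real NFIQ 2.0 call; returns a dummy score.
    return float(img.std())

rng = np.random.default_rng(0)
images = [(rng.random((512, 512)) * 255).astype(np.uint8) for _ in range(10)]
full_scores = [compute_nfiq2(im) for im in images]
crop_scores = [compute_nfiq2(center_crop(im)) for im in images]
print(np.mean(full_scores), np.mean(crop_scores))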
Lastly, we computed some additional metrics specific to the distribution of minutiae, since many of the state-of-the-art fingerprint algorithms incorporate minutiae information. The average minutiae count and minutiae quality computed by Verifinger 12.0 are given in Table 6.10. The average number of minutiae found in SpoofGAN images is lower compared to the CrossMatch images from LivDet 2015; however, the number of minutiae per megapixel is similar for both datasets (59.74 vs. 59.49). The minutiae quality given by Verifinger is also very similar between the two datasets (71.89 vs. 70.78).

Table 6.10 Metrics for real and SpoofGAN fingerprint images (mean ± std. dev.). Minutiae quality and NFIQ2 scores have a range of [0, 100].
Total Minutiae Count: LivDet 2015 CrossMatch 55.56 ± 18.43; SpoofGAN 40.45 ± 11.57.
Ridge Ending Minutiae Count: LivDet 2015 CrossMatch 30.58 ± 12.12; SpoofGAN 20.99 ± 6.67.
Ridge Bifurcation Minutiae Count: LivDet 2015 CrossMatch 24.98 ± 9.18; SpoofGAN 19.46 ± 6.61.
Verifinger Minutiae Quality: LivDet 2015 CrossMatch 71.89 ± 16.00; SpoofGAN 70.78 ± 15.15.
Fingerprint Area (Megapixels): LivDet 2015 CrossMatch 0.93 ± 0.30; SpoofGAN 0.68 ± 0.15.
Fingerprint Image Quality (NFIQ2): LivDet 2015 CrossMatch 60.18 ± 19.35; SpoofGAN 44.34 ± 14.38.

6.4.3.3 Diversity in the Generated Fingers

To verify that SpoofGAN generated fingerprints mimic the similarity score distribution of real bona fide and PA fingerprints, we have computed genuine and imposter matches with Verifinger v12.0. In particular, we computed match scores (genuine and imposter) between the bona fide impressions of the LivDet 2015 CrossMatch images as well as between the bona fide samples of synthetic SpoofGAN fingerprints. These distributions are shown in Figure 6.7 (a). This figure highlights that SpoofGAN is generating diverse fingers with similar intraclass and interclass variation as the real LivDet 2015 CrossMatch dataset, albeit producing slightly lower genuine scores compared to the real dataset. However, in terms of recognition performance at a fixed false acceptance rate (FAR), the performance between the two datasets is quite similar, despite the slightly shifted genuine distribution of the SpoofGAN images (see Table 6.11).

Furthermore, we computed genuine score distributions between the individual PA types generated by SpoofGAN and their corresponding bona fide impressions. These distributions are given in Figure 6.7 (b). Here we see that Verifinger is able to successfully match PA and bona fide fingerprint images belonging to the same finger, which we believe opens the door to synthesizing a large-scale PA and recognition dataset that can be used to train and evaluate joint PAD and recognition algorithms.

Figure 6.7 Verifinger 12.0 match score distributions of the real LivDet 2015 CrossMatch L Scan Guardian database vs. an equivalent sized synthetic SpoofGAN database: (a) match scores computed between the bona fide impressions of each dataset and (b) match scores computed between bona fide and PA impressions for the SpoofGAN images.

Table 6.11 TAR at Varying FAR Thresholds for LivDet 2015 CrossMatch Test Data vs. SpoofGAN Images.
FAR 0.01% @ threshold=48: LivDet 99.87%; SpoofGAN 100%.
FAR 0.001% @ threshold=60: LivDet 99.80%; SpoofGAN 100%.
FAR 0.0001% @ threshold=72: LivDet 99.74%; SpoofGAN 99.80%.
FAR 1e-05% @ threshold=84: LivDet 99.41%; SpoofGAN 99.47%.
FAR 1e-06% @ threshold=96: LivDet 99.35%; SpoofGAN 98.87%.
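Generating the genuine and imposter distributions above follows the usual recipe: same-finger impression pairs yield genuine scores and cross-finger pairs yield imposter scores. The sketch below shows that bookkeeping with a hypothetical match_score stand-in; the experiments in this section use the commercial Verifinger 12.0 matcher, whose API is not reproduced here.

import numpy as np
from itertools import combinations

def match_score(a, b):
    # Hypothetical stand-in for a fingerprint matcher; here: negative mean
    # absolute pixel difference (higher = more similar).
    return -float(np.abs(a - b).mean())

def genuine_imposter_scores(db):
    """db: dict mapping finger_id -> list of impression arrays."""
    genuine, imposter = [], []
    for fid, imps in db.items():
        genuine += [match_score(a, b) for a, b in combinations(imps, 2)]
    for fa, fb in combinations(db.keys(), 2):
        imposter.append(match_score(db[fa][0], db[fb][0]))
    return np.array(genuine), np.array(imposter)

rng = np.random.default_rng(0)
db = {f: [rng.random((64, 64)) for _ in range(3)] for f in range(20)}
gen, imp = genuine_imposter_scores(db)
print(gen.mean(), imp.mean())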
6.4.3.4 Identity Leakage

A major advantage of generating synthetic fingerprint data is that, theoretically, no fingerprint ridge structure matches that of an actual user in the training database. However, there remains a concern that synthesis methods, such as GANs, may inadvertently over-fit and leak private information from the training corpus [15]. Therefore, it is instructive to investigate whether, and to what degree, any of our SpoofGAN generated images are revealing, i.e., match with sufficient confidence, the identities present in our training database. Toward this end, we have computed match scores between 1,500 SpoofGAN generated bona fide fingers and each of the 38,164 real bona fide fingers in our training set. Out of the roughly 57.2 million (1,500 × 38,164) potential matches, only 50 comparisons exceeded the matching threshold of 48 set by Verifinger for a false acceptance rate of 0.01%, with a maximum match score of 81. Furthermore, the 50 matches resulted from just 29 SpoofGAN generated fingers out of the 1,500 evaluated. Some example matched SpoofGAN and real fingerprint image pairs are shown in Figure 6.8 along with their corresponding match scores.

Figure 6.8 Example SpoofGAN and real training database image pairs with corresponding match scores given by Verifinger. For each pair, the left image is a SpoofGAN fingerprint image and the right image is a real fingerprint image. The threshold for a genuine match at an FAR of 0.01% for Verifinger is 48, indicating that the identity of these training images has been "leaked" by SpoofGAN in the corresponding generated images.

As part of future work, the risk of identity leakage could be further mitigated with either i.) a larger training database, to prevent the network from simply memorizing training samples, ii.) performing an identity check against the training database upon generation of each fingerprint, triggering a re-sampling if a match with an existing finger is found (though potentially an expensive operation), or iii.) composing the master print pattern using SFinGe or another classical method (albeit, at the expense of realism in the generated friction ridge pattern).
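The leakage check above is an exhaustive synthetic-versus-training comparison with a count of threshold exceedances. A compact sketch follows, using a hypothetical matcher stand-in and the Verifinger operating threshold of 48 quoted above; it is not the code used for the reported numbers.

import numpy as np

def leakage_report(synthetic, real, match_fn, threshold=48.0):
    """Count synthetic-vs-training comparisons exceeding a match threshold
    and the number of distinct synthetic fingers involved."""
    hits, leaking_fingers = 0, set()
    for i, s in enumerate(synthetic):
        scores = np.array([match_fn(s, r) for r in real])
        n = int((scores > threshold).sum())
        if n:
            hits += n
            leaking_fingers.add(i)
    return hits, len(leaking_fingers)

# Toy stand-ins; a real run would use actual images and a real matcher.
rng = np.random.default_rng(0)
fake_match = lambda a, b: float(rng.normal(10, 8))   # hypothetical matcher
synthetic_fingers = [None] * 100
training_fingers = [None] * 500
print(leakage_report(synthetic_fingers, training_fingers, fake_match))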
6.4.4 Improved Presentation Attack Detection with Synthetic Fingerprints

Ultimately, our synthetic fingerprints should offer some utility in advancing the training of fingerprint PAD algorithms. Toward this end, we have augmented three existing, publicly available datasets of bona fide and PA fingerprints with our synthetic fingerprints in an effort to improve the performance beyond that achievable when training on the real bona fide and PA images from each dataset alone. For this evaluation, we have trained several PAD models on the following training set compositions: i.) synthetic bona fide and PA images only, ii.) real bona fide and PA images only, iii.) synthetic bona fide and PA images plus only real bona fide images, and iv.) synthetic bona fide and PA images plus real bona fide and PA images. We used the SpoofBuster model, which consists of two Inception v3 networks, one trained on the whole image input and the other trained on 96 × 96 minutiae-centered patches [41]. The final PA score is the weighted fusion of the two networks, with a minutiae patch score weight of 0.8 and a whole image score weight of 0.2. Each of the models is trained from scratch using Tensorflow on a single Nvidia TitanX 1080 GPU on its respective dataset with identical hyper-parameters (learning rate of 0.01, step decay learning rate schedule, Adam optimizer with default parameters, and a total of 200,000 training updates).

As shown in Table 6.12, we evaluated each of the models on its respective test set. Despite the lower performance when training on synthetic data alone compared to training on real data, we see improvement in the overall PA classification performance when the real training data is augmented with samples from our synthetic bona fide and PA generator. For example, the error of the minutiae patch model trained on real data from LivDet 2013 is reduced by 91.03% (from 15.60% to 1.40%) when augmented with synthetic data. Similarly, the error on LivDet 2015 is reduced from 0.48% to 0.0%, while the error on GCT 6 remained the same at 0.0%.

Interestingly, as seen in Table 6.13, we also see substantial improvements in cross-material generalization when incorporating synthetic SpoofGAN images into the training dataset of our PAD model. Specifically, we compared the cross-material generalization of a PAD model trained on LivDet 2013 with the same PAD model trained on LivDet 2013 augmented with SpoofGAN images. We observe a drastic reduction in APCER across 7 different PA types from the MSU FPADv2 dataset which are not included in the LivDet 2013 training dataset, nor were these PA types replicated in the synthetic images obtained from SpoofGAN that were used to augment the training dataset. Overall, the average APCER reduced from 76.71% to 54.03% due to the addition of SpoofGAN training images.

Furthermore, Figure 6.9 shows the trend in performance on LivDet 2015 as we keep the number of synthetic training samples fixed while varying the percentage of real data included when training the whole image-based PAD model. This figure suggests that, when augmenting the training set with synthetic data, just 25% of the original (real) data is required to obtain similar performance to training on 100% of the real data alone, which significantly reduces the time and resources required for data collection. In fact, the behavior of the real training data curve (shown in blue) in the early stages (e.g., 5% and 10% of the total training data) exhibits a very sharp decrease in APCER, suggesting that many of the PA vs. bona fide features can be learned from very few samples; however, the subtle features that give the PAD model the last 10% of improvement require 3 or 4 times as many data samples. Importantly, from the real plus synthetic curve (shown in orange), the additional data required to gain that extra 10% can instead be generated by SpoofGAN.

Figure 6.9 PAD performance as we keep the number of synthetic training samples fixed and vary the percentage of real data included in training. Performance reported as average APCER @ BPCER = 0.2% across all PA types in the LivDet 2015 dataset.

Lastly, even if researchers and practitioners have access to a large, private database of real bona fide fingerprint images, collecting an equivalent database of PA fingerprints is more difficult and costly; therefore, they may wish to augment their database of real bona fide images with synthetic PA images. As seen in Table 6.12, mixing synthetic data with only real bona fide fingerprint examples improves the performance over training on just the synthetic examples alone; however, synthetic PA data is still not a substitute for collecting real PA examples, as the performance still lags quite significantly.
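The four training compositions above (and the experiment that varies the fraction of real data) amount to simple list bookkeeping before training. A sketch with hypothetical file paths, not the authors' data pipeline:

import random

def build_training_set(real_bf, real_pa, synth_bf, synth_pa,
                       use_real_pa=True, use_synth=True, real_fraction=1.0):
    """Assemble (path, label) pairs for one training composition.

    label 0 = bona fide, 1 = presentation attack. real_fraction subsamples the
    real data, as in the experiment that varies the percentage of real images.
    """
    rng = random.Random(0)
    keep = lambda xs: rng.sample(xs, int(len(xs) * real_fraction))
    data = [(p, 0) for p in keep(real_bf)]
    if use_real_pa:
        data += [(p, 1) for p in keep(real_pa)]
    if use_synth:
        data += [(p, 0) for p in synth_bf] + [(p, 1) for p in synth_pa]
    rng.shuffle(data)
    return data

# Illustrative (hypothetical) file lists.
real_bf = [f"real/bf_{i}.png" for i in range(1510)]
real_pa = [f"real/pa_{i}.png" for i in range(1473)]
synth_bf = [f"spoofgan/bf_{i}.png" for i in range(1500)]
synth_pa = [f"spoofgan/pa_{i}.png" for i in range(1500)]
print(len(build_training_set(real_bf, real_pa, synth_bf, synth_pa,
                             real_fraction=0.25)))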
Table 6.12 PAD performance when trained on various combinations of real and synthetic data. We have followed the established train/test split protocols for each of the evaluation datasets, which are shown in Table 6.8. Results given as average APCER @ 0.2% BPCER.
Synthetic bona fide + synthetic PA: LivDet 2015 63.47%; LivDet 2013 57.70%; GCT 6 43.92%.
Real bona fide + synthetic bona fide + synthetic PA: LivDet 2015 19.89%; LivDet 2013 27.00%; GCT 6 13.14%.
Real bona fide + real PA: LivDet 2015 0.48%; LivDet 2013 15.60%; GCT 6 0.0%.
Real bona fide + real PA + synthetic bona fide + synthetic PA: LivDet 2015 0.0%; LivDet 2013 1.40%; GCT 6 0.0%.

Table 6.13 Cross-material PAD performance on unseen material types from the MSU FPADv2 dataset, comparing a PAD model trained on only LivDet 2013¹ vs. trained on LivDet 2013 plus synthetic data from SpoofGAN². Results reported as APCER @ 0.2% BPCER.
LivDet 2013¹: 3D Universal Targets 82.50%; Conductive Ink on Paper 100.0%; Dragon Skin 79.65%; Transparency 43.05%; Gold Fingers 95.84%; 2D Paper 36.69%; Silicone 99.27%; Average ± Std. Dev. 76.71 ± 0.26%.
LivDet 2013 + SpoofGAN²: 3D Universal Targets 65.00%; Conductive Ink on Paper 64.00%; Dragon Skin 70.53%; Transparency 34.92%; Gold Fingers 64.03%; 2D Paper 15.51%; Silicone 64.23%; Average ± Std. Dev. 54.03 ± 0.21%.
¹ Trained on bona fide and PAs from LivDet 2013.
² Trained on bona fide and PAs from LivDet 2013 plus synthetic bona fide and PAs (of the same material types as LivDet 2013) from SpoofGAN.

6.5 Conclusion and Future Work

In this chapter, we presented a GAN-based synthesis method for generating high quality 512 × 512 plain fingerprint impressions of both bona fide and PA varieties. We demonstrated the utility of our synthetically generated PA fingerprints in improving the performance of a PA detector beyond that achieved when training on real PA fingerprint datasets only, both in terms of seen PAs and cross-material generalization. Additionally, our synthetic fingerprints closely resemble a database of real fingerprints both qualitatively and quantitatively in terms of various statistics, such as the distribution of minutiae, NFIQ 2.0 image quality, PA classification, and match score distributions computed by the state-of-the-art Verifinger 12.0 SDK. Finally, since our method is capable of generating multiple genuine and imposter fingerprints of unique fingers in both bona fide and PA types, we open the door for large-scale training and evaluation of joint fingerprint PAD and recognition algorithms, overcoming a current limitation given the existing scale of publicly available PA fingerprint datasets.

Despite the demonstrated utility of our synthetic PA images, there remain several limitations that will be addressed in future work. First, given the lower standard deviation across the various fingerprint metrics presented in Table 6.10, the diversity of the generated images could be improved. A related issue, specific to fingerprint PAD, is the ever increasing novelty of PA materials and types that may be encountered in the future; thus, instilling the generation process with the ability to adapt to novel PA types is a promising future research direction. Lastly, training the synthetic fingerprint generator in an online fashion with a PAD network may provide additional supervision to generate more useful bona fide and PA examples and further improve PAD performance. In the next chapter, we extend our synthetic fingerprint generation capabilities toward universal fingerprint generation to address the limitations in intra-class diversity present in SpoofGAN and other existing fingerprint generation methods.

CHAPTER 7
UNIVERSAL FINGERPRINT GENERATION

The utilization of synthetic data to train fingerprint recognition models has garnered increased attention in the biometric field due to its potential to alleviate privacy concerns surrounding sensitive biometric data. However, current methods for generating fingerprints have limitations in creating varied impressions of the same finger with useful intra-class variations.
To tackle this challenge, this chapter presents GenPrint, a framework designed to produce fingerprint images of various types while maintaining identity and offering humanly understandable control over different appearance factors such as fingerprint class, acquisition type, sensor device, and quality level. Unlike previous fingerprint generation approaches, GenPrint is not confined to replicating style characteristics from the training dataset alone: it enables the generation of novel sensor and style attributes from unseen fingerprint acquisition devices during inference without requiring additional fine-tuning. To accomplish these objectives, we developed GenPrint using latent diffusion models with multimodal conditions (text and image) for consistent generation of style and identity. Our experiments leverage a variety of publicly available datasets for training and evaluation. Results demonstrate the benefits of GenPrint in terms of identity preservation, explainable control, and universality of generated images. Importantly, the GenPrint-generated images yield comparable or even superior accuracy to models trained solely on real data and further enhances performance when augmenting the diversity of existing real fingerprint datasets. Figure 7.1 Synthetic fingerprint images generated by various baseline methods and the proposed GenPrint. The four images in each panel are impressions of the same finger to show case the intra-class variance of each method. 163 7.1 Introduction The use of Artificial Intelligence Generated Content (AIGC) over the last few years has exploded due to advancements in model architectures and larger computation and data being used to train Generative AI (GAI) models [32]. In particular, text generation models, such as ChatGPT, have catapulted the field of GAI into the public view since its public release in November of 2022 [184]. Following in its wake came stunning advancements in image and video generation models, such as ImageGen [90] and SORA [185], utilizing denoising diffusion probabilistic model (DDPM) frameworks. Since then, DDPM models have proliferated as the center of attention in many top computer vision conferences and journals. Notably, their probabilistic framework and straight- forward optimization process makes DDPMs more stable and easy to train compared to generative adversarial networks (GANs) [89], one of the predominant frameworks for image generation pre- viously. Furthermore, the work of Dhariwal and Nichol further demonstrated the advantages of diffusion models over GANs for image generation in terms of image quality [57]. Indeed, the introduction of GANs by Goodfellow et al. [89] in 2014 and the recent surge in DDPM models have revolutionized GenAI capabilities across enumerable industries and applications. Artificial fingerprint generation is one application which has received increased interest for the potential of synthetic data for training and evaluation of algorithms, aided by recent privacy and ethical concerns as well as difficulty and cost associated with collecting biometric data. Before the explosion of deep learning techniques, fingerprint generation methods began with intelligent, hand-crafted methods to simulate convincing fingerprint patterns and textures [33]. Importantly, these methods allowed for generating multiple images of the same finger, opening the door to training and evaluation of fingerprint recognition algorithms. 
Early GAN-based methods drastically improved the realism of the generated prints but lacked control over the fingerprint identity being generated [11, 23, 76, 171, 173, 198]. Subsequent works aimed to fill this gap by replacing each stage of the mutli-stage generation pipeline of hand-crafted methods with GANs, preserving the identity of the generated fingerprints at each stage [71, 96, 253]. However, with the exception of identity, other appearance factors remained obscured and 164 uncontrollable, such as the specific fingerprint class (e.g., arch, loop, and whorl), acquisition type (e.g., rolled, slap, contactless, swipe, and latent), sensor characteristics (e.g., optical, capacitive, thermal, etc.), and quality level (e.g., high, average, and low) of the generated prints. Shoshan et al. [214] proposed FPGAN-Control to disentangle identity and appearance factors in the latent space and allowed for swapping between different appearance latent vectors to achieve some degree of control over intra-class variations (e.g., acquisition type, sensor, and pressure level); however, this method lacked explicit, humanly explainable control over appearance factors. Recent advancements in text to image generation models utilizing DDPMs have demonstrated very realistic and controlled image generation capabilities. In this chapter, we aim to leverage DDPM advancements for controllable fingerprint image generation utilizing multimodal conditions (text and image) for improved generation capabilities. We leverage text prompts to allow for guidance of explainable appearance factors and rely on image style embeddings for factors not easily expressed in language. Importantly, an added benefit of our novel image style condition is that the generation outputs are no longer constrained to interpolating between the domain of the seen training data, for it allows for zero-shot generation of novel fingerprint sensor characteristics not seen during training. For a visual comparison, figure 7.1 shows some example synthetic images generated from SFinGe, PrintsGAN, FPGAN-Control and the proposed model which we refer to as GenPrint. The four images in each panel are of impressions of the same finger identity to showcase the intra-class variance of each method, which demonstrates the improved diversity which GenPrint is capable of generating. More concisely, the contributions of this research are the following: 1. A controllable latent diffusion model, GenPrint, using text and image conditions for highly realistic and diverse synthetic fingerprint generation. 2. GenPrint is capable of generating fingerprints of any acquisition type, sensor, fingerprint class, and quality, including fingerprint styles not seen during training without any additional fine-tuning (e.g., zero-shot fingerprint style generation). 3. The generation process is controllable (both in appearance and identity preservation) and 165 Figure 7.2 Architecture of GenPrint. explainable with humanly interpretable text prompts. 4. The utility of GenPrint synthetic images is validated through experiments showcasing im- proved recognition performance of models trained on GenPrint images compared to real datasets and other benchmark fingerprint generation methods. 5. We also demonstrate the utility of GenPrint images for evaluating fingerprint recognition systems by replacing real data for large-scale identification experiments. 6. 
Open-sourcing a dataset of 100K synthetic finger identities with 15 impressions of various acquisition devices to the research community. 7.2 Related Work 7.2.1 Hand-crafted Fingerprint Generation Methods The seminal work of Cappelli et al. [33] utilized a combination of an elliptical shape generation model, mathematical ridge flow and Gabor filters for ridge pattern generation, and noise and distortion models to simulate realistic fingerprint patterns. Importantly, this model allowed for generating multiple impressions of the same finger leading to its adoption for aiding in training and evaluation of fingerprint recognition models. Despite its impressive capabilities and intelligent design, SFinGe is limited in its intra-class variations it is able to generate due to its hand-crafted nature (see subfigure (a) of Figure 7.1 for examples). More recent methods have turned to deep 166 learning techniques, starting with GANs, to learn the subtle intra-class variations that have led to more varied and realistic fingerprint images. 7.2.2 Fingerprint Generation via GANs The introduction of GANs gave way to more realistic fingerprint generation that captured more realistic texture characteristics that are difficult to hand-design [11, 23, 76, 171, 173, 198]; however, early uses of GANs lacked control over the fingerprint identity being generated - severely limiting the utility of the generated fingerprints. Wyzykowski et al. [253], aimed to fill this gap by adopting CycleGAN as a wrapper around SFinGe generated images to impart them with more realistic textures, while leveraging SFinGe’s ability to generate multiple impressions. However, the intra-class and inter-class variations were still limited by the hand-designed generation of SFinGe. Engelsma et al. went one step further and designed a multi-stage GAN method for generating highly realistic fingerprint ridge patterns with multiple impressions per finger and showed substantial improvement over SFinGe in utility for recognition model training [71]. Finally, Shoshan et al. [214] adopted a mixed variational autoencoder (VAE) and GAN architecture called FPGAN-Control to interpolate between latent identity and appearance vectors to be able to render fingerprint images in multiple different appearances. Still, this model lacked explicit control over the appearance factors, and the possible space of generated fingerprint styles is constrained to the distribution of styles belonging to the original training set. 7.2.3 DDPMs for Fingerprint Generation To the best of our knowledge, DDPMs have only just begun to be investigated for artificial fingerprint generation [138, 232]. Tang et al. applied a vanilla DDPM to synthesize unconditional fingerprint patches and validated the realism compared to real fingerprint patches using the Fréchet Inception Distance (FID) metric [232]. Li and Yang also applied an unconditional DDPM model trained on a dataset of latent, rolled, and plain (i.e., slap) fingerprint images to randomly generate fingerprint impressions of these types [138]. They demonstrated the realism of the DDPM generated fingerprint images both in terms of NFIQ quantitative values and t-SNE qualitative comparisons to the real fingerprint images. However, their model lacked control over both the identity and 167 appearance of the generated fingerprints, which is critical for training and evaluation of fingerprint recognition models. 
To the best of our knowledge, the proposed GenPrint model is the first use of DDPMs for fingerprint generation with explicit control over both the identity and appearance of generated fingerprint images. 7.3 GenPrint: Controllable Multimodal Fingerprint Diffusion Model GenPrint is a multimodal latent diffusion model [199] finetuned for fingerprint generation from a pretrained Stable Diffusion v1.5 model with weights made available from the Diffusers library [242]. In this section, we first describe the text to image fingerprint generation capabilities including the dataset curation process and fine-tuning procedure. Next, we describe the architectural design for incorporating style image embeddings into the Stable Diffusion pipeline and explain the zero-shot style generation capability it facilitates. Finally, the identity preservation process is described along with a detailed description of the full pipeline for generating synthetic fingerprint images with GenPrint. An overview of the architecture design is given in Figure 7.2. 7.3.1 Control Factors via Text Conditions The first step in fine-tuning Stable Diffusion for text to fingerprint generation is obtaining a large corpus of fingerprint images and associated text descriptions. For this purpose, we aggregated data from multiple fingerprint datasets from predominately publicly available sources. These training datasets are listed in Table 7.1 along with the acquisition label, sensor label, and number of images for each dataset. Our aggregated dataset consists of data from five different acquisition types (rolled, slap, swipe, contactless, and latent) and thirty different sensing devices ranging from optical readers as well as capacitive, thermal, contactless, and latent surfaces. Missing from many fingerprint datasets are annotations for fingerprint class (whorl, plain arch, tented arch, left loop, and right loop) and quality (low, average, and high quality), which are needed to impart the generator with this kind of control. To obtain these labels, we utilized Verifinger SDK v12.4 to extract class and NFIQ 2.0 [228] quality estimations. Since the NFIQ 2.0 metric was optimized for slap impressions utilizing frustrated total internal reflection (FTIR) optical imaging, https://www.neurotechnology.com/verifinger.html 168 the quality levels across each acquisition type may vary distinctly. Thus, we fit individual quality distributions according to a normal distribution using images belonging to each acquisition category and assigned low, average, and high quality labels to image clusters based on the mean ± standard deviation. Using these annotations we constructed text prompt labels for each training image utilizing the following template: “a {acquisition} fingerprint image, {class} pattern, {quality} quality, {sensor}, {sensing}", where the acquisition type is one of {rolled, slap, swipe, contactless, latent}, class is one of {whorl, plain arch, tented arch, left loop, right loop}, quality is one of {low, average, high}, sensor is one of the thirty training sensors listed in Table 7.1, and sensing type is one of {FTIR optical, direct-view optical, multispectral optical, capacitive, thermal}. For fine-tuning Stable Diffusion on our text to fingerprint image dataset, we utilize the low-rank adaptation (LoRA) strategy for more efficient training with a rank of 128 [113]. The LoRA weights are finetuned with a learning rate of 0.0001, cosine scheduler [150], default Adam optimizer [130], and batch size of 96 spread across 8 Nvidia A100 GPUs. 
The model is trained for 500,000 steps and trained on fingerprint images of a resolution of 512 × 512 pixels. 7.3.2 Zero-shot Style Generation Motivated by the fact that many of the textural intra-class variations present in fingerprint images are not easily expressed in language via simple text prompts, we turned toward a deep learning-based representation to capture those characteristics. In particular, we take a pretrained VGG [216] model trained on ImageNet to embed style embeddings for each training image. These style embeddings are injected into the diffusion model via cross-attention layers which are de- coupled from the cross-attention layers from the textual embeddings used to control the explainable style factors. Our choice of VGG embeddings for style representation is motivated from two key insights: i.) the previous use of VGG for neural style transfer [124] and ii.) visualizing the separation of VGG style embeddings for various fingerprint sensor types in the t-SNE embedding space (see Figure 7.5). During inference, style embeddings from various sensor types present in the training data can 169 Table 7.1 Training datasets for GenPrint. Train Dataset Acquisition Types Sensor Types No. Images (Fingers) NIST SD14 [246] Rolled Ink on paper 54,000 (27,000) FVC 2002 [155] FVC 2004 [156] PLUS-MSL- FP [132] MSU Infant Fingerprint [118] NIST SD302 (N2N) [78] NIST SD302 Latent [78] MSP Latent [262] IIITD SLF [211] MOLF [212] MUST [159] IIT Bombay Touchless and Touch-based [17] ISPFDv2 [158] Slap, Con- tactless Slap, Con- tactless UWA Benchmark 3D Fingerprint [270] Slap, Con- tactless ZJU Finger Photo and Touch-based Fingerprint [93] Slap, Con- tactless Slap Slap, Swipe Slap, Swipe Slap Desktop Scanner, TouchChip, DF90 2,400 (100) CrossMatch, Digital Persona, Fingerchip 2,400 (100) Eikon, Integrated Biometrics Columbo, Integrated Biometrics Curve, Lumidigm, Next Biometrics, Suprema RealScan G1, Digital Persona 106,712 (580) SilkID 9,683 (1,921) Slap, Con- tactless, Rolled CrossMatch, Eikon, GreenBit, ANDI, S120, MorphoWave, DactyScan, LIVETOUCH, Futronic, RaspiReader 45,072 (1,600) Rolled, Latent Rolled, Latent Slap, Latent Slap, Latent Slap, Latent Ink on paper, crime scene 7,586 (1,019) Ink on paper, crime scene 1,866 (933) CrossMatch, crime scene 480 (150) Lumidigm, Secugen, CrossMatch, crime scene 65,512 (1,000) CrossMatch, crime scene 20,247 (120) eNBioScan, smartphone 3,200 (200) Secugen, smartphone 57,600 (304) CrossMatch, 3D scanner 18,266 (1,500) Digital Persona, smartphone 39,580 (824) 170 be sampled to generate images of that sensor. On the other-hand, even style embeddings extracted from images of a completely new, unseen sensor can be used to generate images in that new sensor domain. Therefore, our method is generalizable and allows for “zero-shot" fingerprint style generations without any additional fine-tuning required. This fact is later supported by empirical evidence in section 7.4.3 to produce new fingerprint characteristics of latent, optical, capacitive, and contactless sensors outside those seen during training. 7.3.3 Fingerprint Identity Preservation Several strategies for identity preservation and personalization in diffusion models have been proposed. Some of these techniques, such as Textual Inversion [80] and DreamBooth [207], require additional fine-tuning for each new concept, whereas others, such as IP-Adapter [245] and PhotoMaker [141], can produce identity consistent generations for multiple subjects without inference time fine-tuning. 
Both IP-Adapter and PhotoMaker embed the identity of one or more input reference images into the diffusion process via cross-attention layers. This guides the diffusion model to generate images which are identity-consistent with the input reference images. Empirically, we tried IP-Adapter but found that it lacked the fine-grained spatial control needed to maintain the fingerprint ridge structure throughout the image. To solve this, we turned to ControlNet [266], another adaptation of the diffusion process in which a reference image is provided to the diffusion model to guide the generation toward spatially consistent outputs.

For fingerprints, the identity-discriminative features that remain consistent across different acquisition and sensor types are the silhouettes of the ridge flow pattern, which give rise to the relative locations and orientations of the minutiae points of each finger. We posit that ControlNet is a suitable choice for imparting our DDPM model with identity preservation. Therefore, we propose to adapt the ControlNet framework to provide explicit spatial consistency of the generated fingerprint ridge pattern by prepending a ridge extraction module to the input of our identity-preserving diffusion model, ID-Net. This ridge extractor removes sensor-dependent and other style characteristics from the input fingerprint control image, leaving only the ridge pattern silhouette image to guide the spatial preservation of the fingerprint identity, including the location and orientation of minutiae points. This, combined with the text and style embeddings providing the style information, allows our ID-Net to generate varying textural characteristics while maintaining the input fingerprint ridge pattern. The architecture for our ridge extraction model is the lightweight SqueezeUNet model, which has been successfully applied previously for fingerprint ridge extraction [98].

7.3.4 Generation Pipeline for GenPrint

The full generation pipeline for GenPrint consists of two stages. First, our finetuned Stable Diffusion model is used to generate full (i.e., rolled) fingerprint images of various fingerprint classes from a random noise vector. For this stage, the text prompt guiding the generation follows the template “a rolled fingerprint image, {class} pattern, high quality, ink on stock paper", where the fingerprint class is randomly selected from the five available classes. This provides a full fingerprint ridge pattern for use in the subsequent generation stage, which imparts controllable style variations to generate large intra-class variations. By varying the noise vector for each generation, completely new and unique fingerprint patterns are generated. This is supported in section 7.4.6, showcasing the separation between generated fingerprint identities, and section 7.4.7, highlighting the low similarity between generated identities and the training fingerprint identities.

In the second stage, the generated fingerprint images from the first stage are passed through ID-Net and imparted with varying appearances based on the style embeddings (from reference images either belonging to the training set or from new example images of unseen sensors) and different text prompts providing explainable acquisition, sensor, and quality factors.
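The two-stage pipeline can be sketched with the Hugging Face Diffusers API as shown below. This is a minimal illustration rather than the released GenPrint implementation: the LoRA and ControlNet checkpoint paths are placeholders, the ridge-extractor function is a trivial stand-in for the SqueezeUNet module, the base model identifier is the commonly used Stable Diffusion v1.5 checkpoint, and GenPrint's decoupled style-embedding cross-attention branch is omitted for brevity.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import (ControlNetModel, StableDiffusionControlNetPipeline,
                       StableDiffusionPipeline)

device = "cuda" if torch.cuda.is_available() else "cpu"

def extract_ridge_map(img: Image.Image) -> Image.Image:
    """Placeholder for GenPrint's SqueezeUNet ridge extractor: a simple
    intensity threshold standing in for the ridge silhouette image."""
    g = np.array(img.convert("L"))
    return Image.fromarray(((g < g.mean()) * 255).astype("uint8")).convert("RGB")

# Stage 1: sample a full rolled fingerprint ("master print") from random noise.
txt2img = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
txt2img.load_lora_weights("path/to/genprint_lora")  # placeholder fine-tuned LoRA weights
master = txt2img("a rolled fingerprint image, whorl pattern, high quality, ink on stock paper",
                 num_inference_steps=50).images[0]

# Stage 2: re-render the ridge pattern in a target acquisition/sensor/quality
# style, with the ControlNet branch (ID-Net) preserving the ridge structure.
controlnet = ControlNetModel.from_pretrained("path/to/idnet_controlnet")  # placeholder
idnet = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet).to(device)
impression = idnet("a latent fingerprint image, low quality, crime scene",
                   image=extract_ridge_map(master), num_inference_steps=50).images[0]
impression.save("synthetic_latent_impression.png")
```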
One critical observation was that, while the ControlNet framework was indeed very successful at preserving the local spatial details in the generated images, it often over-constrained the generation process to preserve every detail of the input control image. This is undesirable if, for example, the desired output is a slap fingerprint image while the input control image is a full, rolled fingerprint ridge pattern. The result is an unrealistic image with the full rolled fingerprint pattern in the style of the specified “slap" sensor input. Therefore, we apply a foreground mask, aligned with the input text prompt, to the output of the ridge extractor so that the fingerprint area is realistic for the specified acquisition type. For example, if the prompt is to produce a slap fingerprint image, then an extracted mask of the fingerprint foreground area from one of the slap training images is applied to the input rolled fingerprint ridge pattern to produce an image with a fingerprint area resembling a realistic slap fingerprint. If instead the prompt is to generate a latent fingerprint, then a mask from a training latent fingerprint image is applied to the input control image to produce a realistic-looking latent fingerprint with occluded areas of the ridge pattern.

Similarly, the ControlNet component of ID-Net will not introduce realistic non-linear distortions into the generated images on its own, since doing so would deviate from the input fingerprint pattern supplied as the ControlNet condition. Therefore, we also randomly sample realistic distortion grids to apply to the ControlNet image for each generation. These distortion grids are obtained by computing minutiae displacements between genuine fingerprint pairs within the training dataset. During inference, an example distortion grid, indexed by the specified fingerprint acquisition type, is sampled and applied to the input reference image.

7.4 Experimental Results

In this section, we first evaluate the realism of GenPrint-generated images compared to real fingerprint images and other baseline fingerprint generation methods. We then verify the validity of each of the explainable control factors which GenPrint is trained to generate, including control over the fingerprint class, acquisition, sensor, and quality. Next, we examine GenPrint’s adaptability for zero-shot style generation by using GenPrint to generate fingerprint images following the style characteristics of the unseen Latent Fingerprint in the Wild (LFIW) dataset, which consists of three new latent fingerprint types, one optical sensor, one capacitive sensor, and one contactless sensor [148]. Additionally, we evaluate the utility of GenPrint-generated images for training fingerprint recognition models and compare it with other fingerprint generation methods, both when training on only synthetic images and when augmenting a set of real fingerprint images with additional synthetic impressions. Furthermore, we show the potential for GenPrint images to be used for evaluation of fingerprint recognition models as a replacement for real fingerprint images in large-scale identification experiments.

Figure 7.3 AFR-Net similarity score distributions for (a) NIST SD302 and a similar GenPrint dataset and (b) NIST SD27 and a similar GenPrint latent dataset.

Next, we verify the uniqueness and independence of GenPrint-generated finger identities compared to the set of training fingerprint identities from which GenPrint was trained.
Finally, we include a discussion of the failure cases and limitations of GenPrint.

7.4.1 Realism of Generated Fingerprints

To validate the realism of GenPrint-generated fingerprints, we performed two experiments: i.) comparing the genuine and imposter score distributions of GenPrint images to those of a similar composition of real fingerprint images and ii.) comparing various fingerprint- and minutiae-related statistics between real images and GenPrint synthetic images. For the first experiment, we generated 400 unique synthetic fingers with 12 impressions each across a random selection of slap, rolled, and contactless fingerprints using GenPrint to mimic the size and sensor distribution of a test split of the NIST SD302 dataset consisting of 400 real finger identities with roughly 12 impressions each and a mix of different sensor and acquisition types. We then computed genuine (same identity) and imposter (different identity) score distributions using a pretrained AFR-Net [97] fingerprint recognition model for both the GenPrint-generated dataset and the test split of NIST SD302. The results are shown in subfigure (a) of Figure 7.3. Similarly, we repeated the experiment by generating a dataset of equivalent size and composition to the NIST SD27 dataset (258 unique latent and rolled fingerprint pairs) and compared the score distributions in subfigure (b) of Figure 7.3. We chose these two real datasets for comparison because they encompass many of the different acquisition types (rolled, slap, contactless, and latent) that GenPrint is trained to generate. The realism of GenPrint-generated images is evident from the overlap in the distributions compared to both real fingerprint datasets. The recognition performance on each of the datasets is also very similar. For NIST SD302 and the corresponding GenPrint dataset, the true accept rate (TAR) at a false accept rate (FAR) of 0.01% is 96.33% and 97.33%, respectively. Similarly, for NIST SD27 and the corresponding GenPrint latent dataset, the TAR is 63.57% and 68.33%, respectively.

Next, we compare various fingerprint statistics from 1,000 real, rolled fingerprint impressions from the NIST SD4 dataset to 1,000 synthetic rolled impressions generated by GenPrint and the baseline method PrintsGAN. The specific metrics being compared are summarized in Table 7.2 and include fingerprint area, minutiae count, and average minutiae quality. Both GenPrint and PrintsGAN images differ slightly from the real images in average fingerprint area: PrintsGAN tends to produce smaller fingerprints, while GenPrint tends to produce larger fingerprints. To normalize for the relative differences in fingerprint area, we computed the minutiae statistics on a center crop of 256 × 256 pixels. Compared to the real images, GenPrint images exhibit a higher degree of similarity than PrintsGAN images in terms of average minutiae count and quality. For example, GenPrint differs from the real fingerprint images in average minutiae count by 5.27, whereas PrintsGAN differs by 6.52. Similarly, GenPrint differs in minutiae quality by 0.02, whereas PrintsGAN differs by 1.15.

7.4.2 Consistency of Control Factors

In this section, we evaluate all the explicit control factors which GenPrint is trained to accommodate via text prompts, including control over the fingerprint class, acquisition, sensor, and quality level.
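Several of the following evaluations, as well as the realism comparison above, report verification accuracy as the TAR at a fixed FAR computed from genuine and imposter similarity scores. The sketch below shows one way such a metric can be computed; the score arrays are simulated placeholders rather than actual AFR-Net outputs.

```python
import numpy as np

def tar_at_far(genuine: np.ndarray, imposter: np.ndarray, far: float = 1e-4):
    """TAR at a fixed FAR: threshold at the (1 - FAR) quantile of the imposter
    scores, then measure the fraction of genuine scores at or above it."""
    threshold = np.quantile(imposter, 1.0 - far)
    return float(np.mean(genuine >= threshold)), float(threshold)

# Simulated similarity scores standing in for matcher outputs.
rng = np.random.default_rng(0)
genuine = rng.normal(0.70, 0.10, size=10_000)
imposter = rng.normal(0.10, 0.05, size=1_000_000)
tar, thr = tar_at_far(genuine, imposter, far=1e-4)   # FAR = 0.01%
print(f"TAR @ FAR=0.01%: {100 * tar:.2f}% (threshold = {thr:.3f})")
```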
7.4.2.1 Fingerprint Class

GenPrint is able to generate fingerprints of any of the five major classes of fingers: whorl, left loop, right loop, plain arch, and tented arch. Examples of each of the categories generated by GenPrint are shown in Figure 7.4. The consistency of GenPrint-generated images in following the fingerprint class prompt provided by the user is validated quantitatively using the commercially available fingerprint recognition software, Verifinger SDK v12.4. Specifically, we generate 100 unique finger identities using GenPrint in each of the five fingerprint classes, classify each of the generated fingerprints using Verifinger, and compute the accuracy between the Verifinger predictions and the ground truth class assigned by the input text prompts. The classification accuracy for whorl, left loop, and right loop fingerprints was 99%, indicating that 99 out of 100 generated fingerprints were classified by Verifinger as the class intended by GenPrint. Classification accuracy was lower for the plain arch (92%) and tented arch (25%) types, where Verifinger misclassified the arch type as either a left or right loop in all of the misclassifications. Understandably, these two fingerprint classes can be difficult to distinguish given the similarity in their ridge patterns.

Table 7.2 Fingerprint statistics comparison of GenPrint and PrintsGAN generated images to real fingerprint images.

Metric | MSP [262] (real dataset) | PrintsGAN | GenPrint
Minutiae count | 37.18 ± 9.75 | 30.66 ± 6.98 | 42.45 ± 8.52
Minutiae quality | 80.38 ± 10.36 | 81.53 ± 9.52 | 80.40 ± 9.83
Area (pixels) | 192,285 ± 34,368 | 175,460 ± 25,189 | 211,599 ± 19,241

Figure 7.4 Example GenPrint images of different fingerprint classes and corresponding classification accuracy of Verifinger v12.4 SDK.

Figure 7.5 T-SNE plots showing (a) separation of GenPrint-generated images from different acquisition devices, (b) similarity of GenPrint images and corresponding real images of the same acquisition device, and (c) similarity of zero-shot generated images to corresponding real images of novel acquisition devices which were not included in the training set of GenPrint.

7.4.2.2 Fingerprint Acquisition and Sensor Type

GenPrint is trained on data from 30 different acquisition devices spanning rolled, slap, swipe, contactless, and latent fingerprint acquisition types. Some example images from different devices are given in Figure 7.6, along with corresponding GenPrint-generated images in those same device domains, where the left image in each pair is a synthetic fingerprint generated by GenPrint and the right image is an example fingerprint image from a real fingerprint database. Comparing GenPrint images and corresponding real images of the same sensor and acquisition types highlights the realism and diversity of the possible generation space of GenPrint.

Figure 7.6 Example GenPrint-generated and real fingerprint images from corresponding acquisition device domains. In each pair, the left image is generated by GenPrint and the right image is a real fingerprint image of the same acquisition device, showing the similarity of GenPrint images to real images with corresponding sensor characteristics.

To visualize the separability of all 30 acquisition device characteristics that GenPrint is trained to generate, we first generated 100 example fingerprint images in each acquisition device domain.
Then, we extracted representation embeddings using a pretrained VGG network and plotted them in t-SNE [153] embedding space. The result is shown in subfigure (a) of Figure 7.5 which shows clear separation between very distinct acquisition devices and some small overlap in similar sensors, such as the large number of different slap FTIR optical devices sharing similar characteristics. Furthermore, we also generated VGG embeddings for 100 real fingerprint image examples in 5 dif- ferent acquisition devices and embedded them into the t-SNE space along with their corresponding generated images from GenPrint to show the similarity between corresponding real and synthetic images of the same acquisition device domains. 7.4.2.3 Quality Control There are two ways in which GenPrint can manipulate the quality of the generated images. The first is through the text prompt where the user can specify either low, average, or high quality, and the other is through passing a reference style image with a relatively low, average, or high quality appearance. Empirically, we found both approaches to work well. For validating the quality control of GenPrint, we generated datasets of 100 unique synthetic finger identities with 300 impressions of each of the five different acquisition types (rolled, slap, swipe, contactless, and latent) and used the text prompt to generate 100 of those impressions for each quality level (low, average, and high). Example low, average, and high quality images for three generated rolled fingerprints are given in Figure 7.7 for visualization. We then computed the NFIQ 2.0 quality score using the Verifinger SDK and plotted the quality distributions in Figure 7.8. There is clear separation among each of the quality levels across each of the acquisition types, verifying GenPrint’s appropriate control over the quality of the generated fingerprints. 179 Figure 7.7 Example low, average, and high quality rolled fingerprint impressions of a finger generated by GenPrint. Figure 7.8 NFIQ 2.0 score distributions for fingerprints generated by GenPrint across five different fingerprint acquisition types. 180 7.4.3 Zero-shot Fingerprint Style Generation To validate the quality of zero-shot fingerprint style generation, we performed one last experi- ment using t-SNE visualizations where we embed 100 example synthetic and real images from 6 different acquisition device domains from an unseen dataset which was not included in the training dataset for GenPrint. These images come from the recently released LFIW dataset [148]. Again, we observe very close similarity to corresponding real and synthetic images of the same acquisi- tion devices, demonstrating GenPrint’s adaptability toward zero-shot style generation from novel acquisition devices. 7.4.4 Utility for Training Fingerprint Recognition Models One of the most important criteria for the quality of synthetic fingerprint generators is their utility for training fingerprint recognition models. We evaluate the utility of GenPrint both when training on only synthetically generated images and when augmenting a set of real fingerprint images with additional synthetic data. We compare with several previous synthetic fingerprint generators as baselines including SFinGe, PrintsGAN, and FPGAN-Control. For the first experiment, we generate synthetic databases of 35,000 identities and 15 impressions per identity using each synthetic generation method. Some example images from each method are shown in Figure 7.1. 
We then train ResNet50 [106] recognition models using an ArcFace loss function on incremental subsets of each database using increments from 1,600 identities (the size of the real N2N fingerprint database) to 35,000 identities and plot the performance of the trained models on various evaluation datasets (see Figure 7.9). The evaluation datasets used are summarized in Table 7.3 and include fingerprint impressions of diverse acquisition devices including rolled, slap, contactless, and latent fingerprint types. We also summarize the TAR at an FAR of 0.1% for training on 35,000 identities from each method in Table 7.5. From Figure 7.9, we can clearly see that the performance of the recognition model trained on GenPrint images performs far better than any of the baseline synthetic methods and even surpasses the performance of training on the real N2N fingerprint dataset as the number of synthetic identities is increased. For the second experiment, we compared the utility of GenPrint to the next best performing 181 synthetic method FPGAN-Control in augmenting an existing set of real fingerprint data for training on a combination of real and synthetic. Starting from the initial set of 1,600 real finger identities from N2N, we add increasing amounts of synthetic identities and again plot the performance of the trained ResNet50 models as the number of identities is increased. The results in Figure 7.10 show that both synthetic methods improve the performance when used for augmentation, but the improvement from GenPrint images is far superior. The previous experiment showcased improvement of augmenting a limited set of real fingerprint data of only 1,600 unique finger identities, but naturally a question arises as to whether synthetic data augmentation is still helpful if the number of unique, real fingerprint identities in the training set is already large (e.g., 35,000). To investigate this question given that the number of identities is already large, rather than include additional synthetic identities, we instead take the existing real identities and use GenPrint to synthesize additional impressions in a more diverse range of acquisition devices. For this experiment, we use 35,000 unique fingerprint identities from the Michigan State Police (MSP) longitudinal fingerprint dataset [262] which has about 12 impressions per identity and augment each finger identity with an additonal 15 synthetic impressions of various acquisition devices. The result of augmenting MSP with GenPrint impressions is shown in Figure 7.10. As the number of identities increases, the plots show that GenPrint does indeed improve the performance significantly by augmenting the diversity of the already existing fingerprint images. This improvement is particularly evident when the test datasets contain sensor characteristics not included in the original MSP dataset but which GenPrint is able to synthesize (e.g., contactless and latent fingerprints). For reference, the TAR at FAR=0.1% using 35,000 training identities is given in Table 7.5. In both of the previous experiments, we trained only ResNet50 models for the comparison. Thus, we now study the impact of additional model architectures and examine whether similar trends arise. In particular, we train two additional model architectures on both GenPrint and FPGAN-Control datasets of 35,000 identities and 15 impressions per identity. These include a ResNet18 model and a vision transformer (ViT) [64] with a patch size of 16 and 12 layers. As 182 Table 7.3 Test Datasets. 
Sensor Types Digital Persona, smartphone Test Dataset PolyU Contactless 2D to Contact- based 2D [143] Acquisition Types Slap, Contactless NIST SD 4 [248] Rolled Ink on paper NIST SD 302 [78] Slap, Contactless, Rolled, CrossMatch, Eikon, GreenBit, ANDI, S120, MorphoWave, DactyScan, LIVETOUCH, Futronic, RaspiReader No. Images (Fingers) 1920 (336) 4000 (2,000) 2548 (400) NIST SD 27 [83] Rolled, Latent Ink on paper, crime scene 1032 (258) Figure 7.9 Authentication accuracy (TAR at FAR=0.1%) of ResNet50 trained on synthetic data from various fingerprint generation methods including the proposed GenPrint. shown in Table 7.6, the same relative performance gap between training on GenPrint vs. FPGAN- Control images across each model architecture is consistent with our more extensive experiments using ResNet50. 183 Table 7.4 Authentication accuracy (TAR at FAR=0.1%) of ResNet50 trained on synthetic data from various fingerprint generators including the proposed GenPrint. A ResNet50 model trained on N2N, a real dataset, is included as a baseline. Training Data No. IDs No. imgs/ID N2N slap-rolled- contactless NIST SD4 rolled- rolled PolyU contact- contact PolyU contactless- contactless PolyU contact- contactless NIST SD27 latent- rolled N2N [78] (real dataset) 1,600 SFinGe [33] 35,000 FPGAN- Control [214] 35,000 PrintsGAN [71] 35,000 GenPrint 35,000 12 15 15 15 15 85.73 87.9 94 95.79 47.08 13.95 7.62 37.25 52.63 78.42 74.52 63.66 86.08 83.6 89.38 95 96.15 97.85 89.58 97.58 96.92 97.71 3.07 37.86 61.51 75.26 1.55 12.02 18.6 39.53 Figure 7.10 Authentication accuracy (TAR at FAR=0.1%) of ResNet50 trained on a combination of real and synthetic data from FPGAN-Control and the proposed GenPrint evaluated on six different test scenarios. 184 Table 7.5 Authentication accuracy (TAR at FAR=0.1%) of ResNet50 trained on a combination of real and synthetic data from FPGAN-Control and the proposed GenPrint evaluated on six different test scenarios. Training Data No. IDs No. imgs/ID N2N slap-rolled- contactless NIST SD4 rolled- rolled PolyU contact- contact PolyU contactless- contactless PolyU contact- contactless NIST SD27 latent- rolled 85.73 87.9 94 95.79 47.08 13.95 89.71 88.8 95.33 96.83 68.25 23.64 1,600 35,000 12 15 35,000 13.5 94.69 98.9 99.54 99.17 90.9 46.51 35,000 35,000 12 27 96.04 96.49 99.80 99.79 99.71 97.29 62.02 99.70 99.75 99.75 98.07 69.38 N2N [78] (real dataset) N2N [78] + FPGAN [214] N2N [78] + GenPrint MSP [262] (real dataset) MSP [262] + GenPrint Table 7.6 Training on 35K IDs, 15 impressions with different model architectures and evaluated on six different test scenarios. Results reported as TAR @ FAR=0.1%. Model Training Dataset ResNet18 FPGAN [214] ResNet18 GenPrint ResNet50 FPGAN [214] ResNet50 GenPrint ViT ViT FPGAN [214] GenPrint N2N slap-rolled- contactless NIST SD4 rolled- rolled PolyU contact- contact PolyU contactless- contactless PolyU contact- contactless NIST SD27 latent- rolled 61.92 71.84 74.52 86.08 45.47 79.81 70.6 91.75 83.60 97.85 70.40 96.80 80.79 90.25 89.38 97.58 45.92 95.75 87.83 93.04 95.00 97.71 83.33 96.71 26.70 42.45 37.86 75.26 6.28 61.25 8.14 23.64 12.02 39.53 5.43 30.62 7.4.5 Utility for Evaluating Fingerprint Recognition Models In addition to being useful for training, synthetic fingerprints can also help with large-scale evaluation of fingerprint recognition algorithms, where collecting a dataset of potentially millions of unique real fingers can be prohibitively expensive. 
To demonstrate the feasibility of using GenPrint images for such purposes, we generated a large database of 64,000 unique rolled fingerprints to compare with a database of 64,000 real rolled fingerprint identities from the MSP dataset as a background gallery for latent to rolled fingerprint search, using latent probes and corresponding mates from the NIST SD 27 latent dataset. Ideally, the search performance should be similar when using the real fingerprint background images and the GenPrint fingerprint background images. We repeated the experiment using a database of 64,000 unique identities from FPGAN-Control as a baseline. The results on the three different gallery backgrounds are given in Figure 7.11, which shows better overlap in the search accuracies between the GenPrint background gallery and the real fingerprint gallery compared to the overlap between FPGAN-Control and the real dataset, indicating that GenPrint images make a more suitable replacement for real images in large-scale search evaluations than the baseline FPGAN-Control method. In particular, the rank-1 accuracy on the real background dataset is 82.17%, whereas it was 82.95% and 83.72% for GenPrint and FPGAN-Control, respectively.

Figure 7.11 Search results using probes from NIST SD 27 and a 64,000 identity background from GenPrint compared to the real MSP dataset and FPGAN-Control backgrounds.

7.4.6 Biometric Capacity

Ideally, every synthetic finger identity should be unique, but the probability of encountering “duplicate" identities, those which have a high similarity to each other, increases as the size of the dataset grows, which is true even for real fingerprint datasets. Nonetheless, the possible number of unique finger identities that a model can generate, referred to as the biometric capacity [18], is an important factor for comparison among synthetic biometric generators. Unfortunately, accurately measuring the biometric capacity is a difficult and open question, and empirically computing the similarity between all generated identities scales as O(n²), making it computationally demanding for anything above 100,000 identities. Recently, Boddeti et al. [18] proposed a geometrical model of capacity by embedding face images into a hyperspherical representation space and using a specified FAR to estimate the ratio of the overall embedding space volume to the intra-class separation. However, this approach only aims to estimate an upper bound on the capacity, and when we applied the code to our generated images and several other baselines, we obtained capacity estimates on the order of 10³². To obtain more practical insights, we used a pretrained AFR-Net fingerprint matcher to compute the percentage of “duplicate" identities as the number of generated identities increases for both PrintsGAN and GenPrint. We obtained duplicate identities by computing all possible imposter score comparisons between the generated identities and determining how many of the pairs produced similarity scores above the genuine match threshold of 0.35, computed on the real NIST SD4 dataset at a FAR of 0.01%. As shown in Table 7.7, the number of duplicate identities increases rapidly for PrintsGAN as the number of generated identities grows, whereas GenPrint closely follows the trend of the real MSP fingerprint dataset as the number of identities approaches 100,000.
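The duplicate-identity counting procedure described above can be sketched as follows, assuming each generated identity is summarized by a single L2-normalized embedding and cosine similarity serves as the match score. Counting an identity as a duplicate if it scores above the threshold against any other identity is one reasonable reading of the procedure; the random embeddings below are placeholders for actual AFR-Net features.

```python
import numpy as np

def duplicate_identity_rate(embeddings: np.ndarray, threshold: float = 0.35) -> float:
    """Fraction of generated identities whose embedding scores above the genuine
    match threshold against at least one other generated identity (all O(n^2)
    imposter comparisons via cosine similarity of L2-normalized embeddings)."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ e.T                      # pairwise cosine similarities
    np.fill_diagonal(sims, -1.0)        # exclude self-comparisons
    return float(np.mean(sims.max(axis=1) >= threshold))

# Illustrative usage with random vectors standing in for per-identity embeddings.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(5_000, 512)).astype(np.float32)
print(f"Duplicate identity rate: {100 * duplicate_identity_rate(fake_embeddings):.3f}%")
```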
Interestingly, PrintsGAN and GenPrint were both trained on fingerprint databases with a similar number of identities (38,291 for PrintsGAN and 37,351 for GenPrint); the difference is that GenPrint is based on diffusion models, which are believed to better capture the full data distribution compared to GANs [57].

Table 7.7 Percentage of duplicate identities generated by PrintsGAN and GenPrint as the number of generated identities increases from 20,000 to 100,000. A duplicate identity is counted whenever an imposter score between any of the generated identities is above a genuine match threshold of 0.35, which was computed on the real NIST SD4 dataset using a pretrained AFR-Net fingerprint recognition model.

Number of IDs | 20,000 | 40,000 | 60,000 | 80,000 | 100,000
MSP (real data) | 0.060% | 0.078% | 0.103% | 0.154% | 0.170%
GenPrint | 0.070% | 0.190% | 0.305% | 0.461% | 0.535%
PrintsGAN [71] | 4.255% | 7.408% | 9.885% | 12.23% | 14.43%

7.4.7 Identity Leakage

Besides the capacity of the biometric generator, a privacy-preserving model should also not leak sensitive information from the dataset on which it was trained. In other words, the generated finger identities from GenPrint should not have high similarity with any of the finger identities in the training set. We measured the potential identity leakage of GenPrint by generating 35,000 unique synthetic fingerprint identities and computing similarity scores against each of the 37,351 real finger identities in our training dataset using a pretrained AFR-Net fingerprint recognition model [97]. Out of these 35,000 synthetic identities, only 10 (0.03%) had a similarity score with any training identity above 0.231, the genuine match threshold computed on FVC 2002 DB1A at FAR=0.01%. Furthermore, even among those ten similarity scores that fell above the threshold, the maximum similarity score obtained was just 0.297, only slightly above the threshold.

7.4.8 Failures and Limitations

On occasion, the outputs generated by GenPrint can exhibit some noise and other color artifacts. Some of these failure cases are visualized in Figure 7.12. For the image in subfigure (a), the prompt was for a low quality contactless fingerprint from a smartphone camera, and the prompt used in (b) was for a low quality latent fingerprint from a crime scene. Empirically, we found that the probability of such artifacts occurring is higher when the quality of the fingerprint is prompted as low.

Figure 7.12 Example failure cases generated by GenPrint exhibiting noise and color artifacts.

Figure 7.13 Example images with mixed text prompts. The image in (a) was created with a prompt containing the acquisition type of “latent" and sensor type of “smartphone", whereas (b) was prompted with acquisition type “contactless" and sensor type “crime scene".

Another potential area for unexpected outputs is in mixing acquisition and sensor types that may not be realistic. For example, the image produced in subfigure (a) of Figure 7.13 was prompted with the acquisition type of “latent" and sensor type of “smartphone", whereas subfigure (b) was prompted with acquisition type “contactless" and sensor type “crime scene". These mixed prompts produce fingerprint images that resemble characteristics of both contactless and latent fingerprints but are not quite realistic. Nonetheless, even though mixing various acquisition and sensor types may produce unrealistic fingerprint images, the results may still prove to be a useful data augmentation tool for training fingerprint recognition models.
In fact, all experiments conducted in this paper are without editing or removing any images generated by GenPrint. One of the most significant limitations of GenPrint and DDPM models in general is the com- putational efficiency, both in terms of training and inference time and memory footprint. Training efficiency of GenPrint was partially mitigated by utilizing LoRA weights for training; however, the inference speed for GenPrint is still about 1.13 seconds per image (512×512 resolution) using an Nvidia A100 GPU and an AMD EPYC 7543 32-Core Processor - which is much slower compared to 189 GANs (e.g., 7.41 ms per image for FPGAN-Control on the same hardware). For offline generation of synthetic datasets, the latency and RAM usage are both just a nuisance; however, they prevent the model from being useful for online generation of synthetic data during training of recognition models. 7.5 Conclusion By employing latent diffusion models with multimodal conditions, GenPrint offers a versatile framework capable of generating diverse fingerprint images while preserving identity and providing explainable control over various appearance factors. Unlike previous approaches, GenPrint is not constrained by the characteristics of the training dataset alone, allowing for the generation of novel sensor and style attributes during inference without the need for additional fine-tuning. The experimental results showcase the efficacy of GenPrint in terms of identity preservation and narrowing the gap between synthetic and real domains. Moreover, the universality of GenPrint- generated images improves model training by augmenting the diversity of existing fingerprint datasets, thus enhancing the performance and generalization of fingerprint recognition systems. The same or similar model architecture can also be applied to other areas of biometrics (e.g., face, palmprint, iris, etc.) which we are currently undertaking. 7.6 Acknowledgment Parts of this research were supported by a grant from the Department of Homeland Security via The Criminal Investigations and Network Analysis Center (CINA) at George Mason University. 190 CHAPTER 8 SUMMARY 8.1 Contributions In this dissertation, we have presented several methods to improve the generalization of fin- gerprint embeddings to various challenging and unconstrained scenarios. These specific scenarios and contributions are enumerated below: • Sensor and material agnostic fingerprint PAD: – A robust PAD solution with improved cross-material and cross-sensor generalization performance. – The proposed solution using style transfer and adversarial representational learning can be built on top of any CNN-based fingerprint PAD solution for cross-sensor and cross-material PA generalization. – Experimental evaluation of the proposed method on publicly available datasets LivDet 2015, LivDet 2017, MSU-FPAD, and GCT3. The approach is shown to improve the cross-sensor (cross-sensor and cross-material) generalization performance from a TDR of 88.36% (78.76%) to a TDR of 93.03% (88.49%) at a FDR of 0.2%. • Contact to contactless compatible fingerprint recognition: – An end-to-end system, called C2CL, for contact-contactless fingerprint matching. C2CL is comprised of preprocessing (segmentation, enhancement, scaling, and de- formation correction), feature extraction (minutiae and texture representations), and matching modules aimed at reducing the domain gap between contact and correspond- ing contactless fingerprint images. 
– Our preprocessing is generalizable as it was shown to also benefit the Verifinger 12.0 commercial fingerprint SDK. – A contact-contactless adaptation of DeepPrint [69] for representation extraction. Our representation is generalizable across multiple datasets and contactless capture devices. – SOTA cross-matching verification and large-scale identification accuracy using C2CL 191 on both publicly available contact-contactless matching datasets as well as on a com- pletely sequestered dataset collected at Zhejiang University, China. Our evaluation includes the most diverse set of contactless fingerprint acquisition devices, yet we employ just a single trained model for evaluation. – A smartphone contactless fingerprint capture app that was developed in-house for improved throughput and user-convenience. This app will be made available to the public to promote further research in this area. – A new dataset of 9, 888 2D contactless and corresponding contact-based fingerprint images from 206 subjects (2 thumbs and 2 index fingers per subject), made publicly available to advance much needed research in this area. • Multimodel embeddings for improved sensor-interoperability of fingerprint recogni- tion: – Analysis of various attention-based architectures for fingerprint recognition. – Novel architecture for fingerprint recognition, AFR-Net, which incorporates attention layers into the ResNet architecture. – State-of-the-art (SOTA) fingerprint recognition performance (authentication and identi- fication) across several diverse benchmark datasets, including intra-sensor, cross-sensor, contact to contactless, and latent to rolled fingerprint matching. – Novel use of local embeddings extracted from intermediate feature maps to both improve the recognition accuracy and explainability of the model. – Ablation analysis demonstrating the importance of each aspect of our model, including choice of loss function, training dataset size, use of spatial alignment module, use of both classification heads, and use of local embeddings to refine the global embeddings. • Improved latent fingerprint recognition via fusion of global and local embeddings: – Design of an end-to-end latent fingerprint recognition pipeline using deep learning The project repository for the smartphone contactless fingerprint capture app is available at https://github.com/ ronny3050/FingerPhotos. The dataset application is available at https://person.zju.edu.cn/en/eryunliu. 192 methods, including algorithms for segmentation, enhancement, minutiae extraction, and a fusion of global and local embeddings. – State-of-the-art (SOTA) latent to rolled/plain fingerprint search across multiple datasets, including NIST SD 27 [83], NIST SD 302 Latents (N2N Latents) [78], MSP La- tent [262], and MOLF datasets [212]. – Faster search speed (low latency) due to our multi-stage search scheme, while main- taining SOTA recognition accuracy for both closed-set and open-set identification. – Generalization of representation (embedding) from LFR-Net is shown via SOTA au- thentication performance across several rolled (NIST SD 14 [246]), plain (NIST SD 302 [78]), and contact to contactless fingerprint matching datasets (PolyU Contactless 2D to Contact-based 2D [145] and ZJU Finger Photo and Touch-based [93]) using the same network, a step toward a universal fingerprint recognition system. 
• Synthetic fingerprint spoof images for improved fingerprint PAD: – A highly realistic plain print synthetic fingerprint generator capable of generating mul- tiple impressions per finger. – The first, to the best of our knowledge, synthetic fingerprint PA generator which is capable of producing synthetic representations of both bona fide and PA impressions of the same finger. This opens the door to joint optimization of fingerprint PAD and recognition algorithms. – Quantitative and qualitative analysis to verify the quality of our generated bona fide and PA fingerprints. – Experiments showcasing improved fingerprint PAD on both seen and unseen PA material types when augmenting existing fingerprint datasets with our synthetic bona fide and PA fingerprints. – We release our code and a database of SpoofGAN images to encourage further research in this area https://github.com/groszste/SpoofGAN. • Universal Fingerprint Generation: 193 – A controllable latent diffusion model, GenPrint, using text and image conditions for highly realistic and diverse synthetic fingerprint generation. – GenPrint is capable of generating fingerprints of any acquisition type, sensor, fingerprint class, and quality, including fingerprint styles not seen during training without any additional fine-tuning (e.g., zero-shot fingerprint style generation). – The generation process is controllable (both in appearance and identity preservation) and explainable with humanly interpretable text prompts. – The utility of GenPrint synthetic images is validated through experiments showcasing improved recognition performance of models trained on GenPrint images compared to real datasets and other benchmark fingerprint generation methods. – We also demonstrate the utility of GenPrint images for evaluating fingerprint recognition systems by replacing real data for large-scale identification experiments. – Open-sourcing a dataset of 100K synthetic finger identities with 15 impressions of various acquisition devices to the research community. 8.2 Suggestions for Future Work The following are some of the possible future directions within the scope of improving the generalization of fingerprint embeddings: • Infant fingerprint recognition: Recognition of infant fingerprints has shown some initial promise on recognizing infants enrolled between the ages of 2-3 months [70]; however, the performance still significantly lags that of adult fingerprint recognition. Methods presented in this dissertation, particularly style transfer, fusion of multiple deep networks, and fusion of local and global features, may be applicable to this scenario; however, further advanced methods specifically tailored to infant fingerprints is an area of important future research. • Mobile-based fingerprint recognition: With extending fingerprint recognition to appli- cations involving mobile devices, there is a need for computationally efficient fingerprint recognition algorithms which can be embedded into the mobile device. This would facilitate applications such as integrated image quality assessment, deduplication, and recognition in 194 real-time, as well as preserve the privacy of users by alleviating the need to send images and/or templates over the network. 8.3 List of Publications A list of publications during the course of my PhD program related to the topics of this thesis: Journal Articles • S. A. Grosz, and A. K. 
Jain, “Universal Fingerprint Generation: Controllable Diffusion Model with Multimodal Conditions", under review in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. Patent pending. • S. A. Grosz, A. Godbole, and A. K. Jain, “Mobile Contactless Palmprint Recognition: Use of Multiscale, Multimodel Embeddings", under review in IEEE Transactions on Information Forensics and Security, 2024. • S. A. Grosz, and A. K. Jain, “Latent Fingerprint Recognition: Fusion of Local and Global Embeddings", IEEE Trans. Information Forensics and Security, vol. 18, pp. 5691-5705, 2023. Technology licensed to Thales. Patent pending. • S. A. Grosz, and A. K. Jain, “AFR-Net: Attention-Driven Fingerprint Recognition Network", IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 6, pp. 30-42, 2023. • K. P. Wijewardena, S. A. Grosz, and A. K. Jain, “Fingerprint Template Invertibility: Minutiae vs. Deep Templates", IEEE Trans. Information Forensics and Security, vol. 18, pp. 744-757, 2023. • S. A. Grosz, and A. K. Jain, “SpoofGAN: Synthetic Fingerprint Spoof Images", IEEE Trans. Information Forensics and Security, vol. 18, pp. 730-743, 2023. • J. J. Engelsma, S. A. Grosz, and A. K. Jain, “PrintsGAN: Synthetic Fingerprint Generator", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 5, pp. 6111- 6124, 2023. Technology licensed to NEC. • S. A. Grosz, J. J. Engelsma and A. K. Jain, “C2CL: Contact to Contactless Fingerprint Matching", IEEE Trans. Information Forensics and Security, vol. 17, pp. 196-210, 2022. 195 Conference Proceedings and Technical Reports • A. Godbole, S. A. Grosz, and A. K. Jain, “Contactless Palmprint Recognition for Children", International Conference of the Biometrics Special Interest Group (BIOSIG), Sept. 2023. • S. A. Grosz, K. P. Wijewardena, and A. K. Jain, “ViT Unified: Joint Fingerprint Recognition and Presentation Attack Detection", IEEE International Joint Conference on Biometrics, Ljubljana, Sept. 2023. • S. A. Grosz, J. J. Engelsma and A. K. Jain, “White-Box Evaluation of Fingerprint Recognition Systems", ArXiv, 2021, arXiv:2008.00128. • S. A. Grosz, J. J. Engelsma, N. G. Paulter and A. K. Jain, “White-Box Evaluation of Finger- print Matchers: Robustness to Minutiae Perturbations”, IEEE International Joint Conference on Biometrics, Houston, TX, Sept. 2020 • S. A. Grosz, T. Chugh and A. K. Jain, “Fingerprint Presentation Attack Detection: A Sen- sor and Material Agnostic Approach", IEEE International Joint Conference on Biometrics, Houston, TX, Sept. 2020. 196 BIBLIOGRAPHY [1] Aadhaar - Unique Identification Authority of India. https://uidai.gov.in/. Accessed: January 16, 2024. [2] ELFT-1.X Results. https://pages.nist.gov/elft/elft_1_x/results/. Accessed: May 3, 2024. [3] M. B. Alejo. Unconstrained ear recognition using transformers. Jordanian Journal of Computers and Information Technology, 7(4), 2021. [4] [5] [6] [7] [8] F. Alonso-Fernandez, R. N. Veldhuis, A. M. Bazen, J. Fiérrez-Aguilar, and J. Ortega-Garcia. Sensor interoperability and fusion in fingerprint verification: A case study using minutiae- and ridge-based matchers. In 2006 9th International Conference on Control, Automation, Robotics and Vision, pages 1–6. IEEE, 2006. H. AlShehri, M. Hussain, H. AboAlSamh, and M. AlZuair. A large-scale study of fingerprint matching systems for sensor interoperability problem. Sensors, 18(4):1008, 2018. A. H. Ansari. Generation and storage of large synthetic fingerprint database. ME Thesis, Jul, 2011. S. R. Arashloo, J. 
Kittler, and W. Christmas. An anomaly detection approach to face spoofing detection: A new formulation and evaluation protocol. IEEE Access, 5:13868–13882, 2017. D. R. Ashbaugh. Quantitative-qualitative friction ridge analysis: an introduction to basic and advanced ridgeology. CRC press, 1999. [9] M. Attia, M. H. Attia, J. Iskander, K. Saleh, D. Nahavandi, A. Abobakr, M. Hossny, and S. Nahavandi. Fingerprint synthesis via latent space representation. In IEEE International Conference on Systems, Man and Cybernetics, pages 1855–1861. IEEE, 2019. [10] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014. [11] K. Bahmani, R. Plesh, P. Johnson, S. Schuckers, and T. Swyka. High fidelity fingerprint generation: Quality, uniqueness, and privacy. arXiv preprint arXiv:2105.10403, 2021. [12] D. Baldisserra, A. Franco, D. Maio, and D. Maltoni. Fake fingerprint detection by odor analysis. In Proc. ICB, pages 265–272. Springer, 2006. [13] A. M. Bazen and S. H. Gerez. Fingerprint matching by thin-plate spline modelling of elastic deformations. Pattern Recognition, 36(8):1859–1867, 2003. [14] N. Beheshti and L. Johnsson. Squeeze u-net: A memory and energy efficient image segmen- tation network. In Proceedings of the IEEE/CVF conference on computer vision and pattern 197 recognition workshops, pages 364–365, 2020. [15] S. M. Bellovin, P. K. Dutta, and N. Reitinger. Privacy and synthetic datasets. Stan. Tech. L. Rev., 22:1, 2019. [16] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798– 1828, 2013. [17] P. Birajadar, M. Haria, P. Kulkarni, S. Gupta, P. Joshi, B. Singh, and V. Gadre. Towards smartphone-based touchless fingerprint recognition. S¯adhan¯a, 44(7):1–15, 2019. [18] V. N. Boddeti, G. Sreekumar, and A. Ross. On the biometric capacity of generative face models. In 2023 IEEE International Joint Conference on Biometrics (IJCB), pages 1–10. IEEE, 2023. [19] P. Bontrager, A. Roy, J. Togelius, N. Memon, and A. Ross. Deepmasterprints: Generating masterprints for dictionary attacks via latent variable evolution. In IEEE 9th International Conference on Biometrics Theory, Applications and Systems, pages 1–9. IEEE, 2018. [20] F. L. Bookstein. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6):567–585, 1989. [21] R. Bouzaglo and Y. Keller. Synthesis and reconstruction of fingerprints using generative adversarial networks. arXiv preprint arXiv:2201.06164, 2022. [22] A. Brock, J. Donahue, and K. Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018. [23] K. Cao and A. Jain. Fingerprint synthesis: Evaluating fingerprint search at scale. In International Conference on Biometrics, 2018. [24] K. Cao and A. K. Jain. Latent orientation field estimation via convolutional neural network. In 2015 International Conference on Biometrics (ICB), pages 349–356. IEEE, 2015. [25] K. Cao and A. K. Jain. Hacking mobile phones using 2D Printed Fingerprints. MSU Tech. report, MSU-CSE-16-2 https://www.youtube.com/watch?v=fZJI_BrMZXU, 2016. [26] K. Cao and A. K. Jain. Fingerprint indexing and matching: An integrated approach. In 2017 IEEE International Joint Conference on Biometrics (IJCB), pages 437–445. IEEE, 2017. [27] K. Cao and A. K. Jain. 
Fingerprint Indexing and Matching: An Integrated Approach. In 2017 IEEE International Joint Conference on Biometrics, pages 437–445. IEEE, 2017. [28] K. Cao and A. K. Jain. Automated latent fingerprint recognition. IEEE Transactions on 198 Pattern Analysis and Machine Intelligence, 41(4):788–800, 2018. [29] K. Cao and A. K. Jain. Latent fingerprint recognition: role of texture template. In 2018 IEEE 9th international conference on biometrics theory, applications and systems (BTAS), pages 1–9. IEEE, 2018. [30] K. Cao, E. Liu, and A. K. Jain. Segmentation and enhancement of latent fingerprints: A coarse to fine ridgestructure dictionary. IEEE transactions on pattern analysis and machine intelligence, 36(9):1847–1859, 2014. [31] K. Cao, D.-L. Nguyen, C. Tymoszek, and A. K. Jain. End-to-end latent fingerprint search. IEEE Transactions on Information Forensics and Security, 15:880–894, 2019. [32] Y. Cao, S. Li, Y. Liu, Z. Yan, Y. Dai, P. S. Yu, and L. Sun. A comprehensive survey of ai-generated content (aigc): A history of generative ai from gan to chatgpt. arXiv preprint arXiv:2303.04226, 2023. [33] R. Cappelli, A. Erol, D. Maio, and D. Maltoni. Synthetic fingerprint-image generation. In Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, volume 3, pages 471–474. IEEE, 2000. [34] R. Cappelli, M. Ferrara, and D. Maltoni. Minutia cylinder-code: A new representation and matching technique for fingerprint recognition. IEEE transactions on pattern analysis and machine intelligence, 32(12):2128–2141, 2010. [35] R. Cappelli, D. Maio, and D. Maltoni. Modelling plastic distortion in fingerprint images. In International Conference on Advances in Pattern Recognition, pages 371–378. Springer, 2001. [36] R. Cappelli, D. Maio, and D. Maltoni. Synthetic fingerprint-database generation. In Object recognition supported by user interaction for service robots, volume 3, pages 744–747. IEEE, 2002. [37] R. Cappelli, D. Maio, and D. Maltoni. Semi-automatic enhancement of very low quality In 2009 Proceedings of 6th International Symposium on Image and Signal fingerprints. Processing and Analysis, pages 678–683. IEEE, 2009. [38] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko. End-to- end object detection with transformers. In European conference on computer vision, pages 213–229. Springer, 2020. [39] R. Casula, M. Micheletto, G. Orrù, R. Delussu, S. Concas, A. Panzino, and G. L. Marcialis. In 2021 IEEE Livdet 2021 fingerprint liveness detection competition-into the unknown. International Joint Conference on Biometrics (IJCB), pages 1–6. IEEE, 2021. 199 [40] S. Chikkerur, A. N. Cartwright, and V. Govindaraju. Fingerprint enhancement using stft analysis. Pattern Recognition, 40(1):198–211, 2007. [41] T. Chugh, K. Cao, and A. K. Jain. Fingerprint spoof buster: Use of minutiae-centered patches. IEEE Transactions on Information Forensics and Security, 13(9):2190–2202, September 2018. [42] T. Chugh and A. K. Jain. Fingerprint Presentation Attack Detection: Generalization and Efficiency. IEEE International Conference on Biometrics (ICB), pages 1–8, 2019. [43] T. Chugh and A. K. Jain. OCT Fingerprints: Resilience to Presentation Attacks. arXiv preprint arXiv:1908.00102, 2019. [44] T. Chugh and A. K. Jain. Fingerprint spoof detector generalization. IEEE Transactions on Information Forensics and Security, 16:42–55, 2020. [45] Cisco. Cisco report: all https://www.biometricupdate.com/202211/cisco-report-81-percent-of-all-smartphones- have-biometrics-enabled, 2022. 