SYNTHESIZING IRIS AND OCULAR IMAGES USING ADVERSARIAL NETWORKS AND DIFFUSION MODELS

By

Shivangi Yadav

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computer Science – Doctor of Philosophy

2024

ABSTRACT

Synthetic biometric data – such as fingerprints, face, iris and speech – can overcome some of the limitations associated with the use of real data in biometric systems. The focus of this work is on the iris biometric. Current methods for generating synthetic irides and ocular images have limitations in terms of quality, realism, intra-class diversity and uniqueness. Different methods are proposed in this thesis to overcome these issues while evaluating the utility of synthetic data for two biometric tasks: iris matching and presentation attack (PA) detection.

Two types of synthetic iris images are generated: (1) partially synthetic and (2) fully synthetic. The goal of "partial synthesis" is to introduce controlled variations in real data. This can be particularly useful in scenarios where real data are limited, imbalanced, or lack specific variations. We present three different techniques to generate partially synthetic iris data: one that leverages the classical Relativistic Average Standard Generative Adversarial Network (RaSGAN), a novel Cyclic Image Translation Generative Adversarial Network (CIT-GAN) and a novel Multi-domain Image Translative Diffusion StyleGAN (MID-StyleGAN). While RaSGAN can generate realistic-looking iris images, this method is not scalable to multiple domains (such as generating different types of PAs). To overcome this limitation, we propose CIT-GAN, which generates iris images using multi-domain style transfer. To further address the issue of quality imbalance across different domains, we develop MID-StyleGAN, which exploits the stable and superior generative power of a diffusion-based StyleGAN.

The goal of "full synthesis" is to generate iris images with both inter- and intra-class variations. In this regard, we propose two novel architectures, viz., iWarpGAN and IT-diffGAN. The proposed iWarpGAN focuses on generating iris images that are different from the identities in the training data using two transformation pathways: (1) Identity Transformation and (2) Style Transformation. On the other hand, Image Translative Diffusion-GAN (IT-diffGAN) projects input images onto the latent space of a diffusion GAN, identifying and manipulating the features most relevant to identity and style. By adjusting these features in the latent space, IT-diffGAN generates new identities while preserving image realism.

A number of experiments are conducted using multiple iris and ocular datasets in order to evaluate the quality, realism, uniqueness, and utility of the synthetic images generated using the aforementioned techniques. An extensive analysis conveys the benefits and the limitations of each technique. In summary, this thesis advances the state of the art in iris and ocular synthesis by leveraging the prowess of GANs and Diffusion Models.

ACKNOWLEDGMENTS

I am deeply grateful to my doctoral advisor and mentor, Professor Arun Ross, for his exceptional guidance and support throughout my academic journey. His profound expertise, insightful feedback, and unwavering commitment to excellence have been pivotal in shaping both my research and my growth as a scholar.
Professor Ross has consistently emphasized the importance of not only understanding the technical aspects of research but also probing deeper into the reasoning behind each method. His mentorship has provided me with countless opportunities to expand my knowledge and skills through conferences, workshops, and other academic events, for which I am sincerely thankful. I would also like to thank Professors Xiaoming Liu, Selin Aviyente, and Kristen Johnson for serving on my committee and offering their expert advice on my research. I am thankful to Professor Sandeep Kulkarni, Associate Chair of Graduate Studies, for their guidance and support that helped me stay motivated throughout my degree. I am also deeply grateful to my Master's advisors, Professor Mayank Vatsa and Professor Richa Singh, whose encouragement led me to pursue my doctoral studies at Michigan State University.

I have been fortunate to have the support of incredible lab mates and colleagues throughout this journey. Their patience, critiques, and guidance have been essential in helping me refine and defend my research. They have celebrated my successes and provided comfort during setbacks. I am also grateful to Vincent Mattison and Brenda Hodge for their unwavering support.

Lastly, I would like to express my deepest gratitude to my partner, Aman Chahar, for his unwavering support throughout my PhD journey. His constant encouragement and understanding have been a source of strength, helping me stay focused and motivated even during the most challenging times. Aman's belief in my abilities and his patience through the ups and downs of this journey have been invaluable. I am incredibly fortunate to have him by my side, and I couldn't have come this far without his love and support.

TABLE OF CONTENTS

CHAPTER 1  INTRODUCTION
  1.1 Biometric Systems
  1.2 Biometric Datasets
  1.3 Synthetic Biometric Data – Face, Fingerprint & Iris
    1.3.1 Synthetic Biometrics
    1.3.2 Applications of Synthetic Biometric Data
    1.3.3 Methods to Generate Synthetic Biometrics
      1.3.3.1 Pre-CNN Methods
      1.3.3.2 Post-CNN Methods
    1.3.4 Synthetic Iris Images
  1.4 Our Contribution

CHAPTER 2  GENERATING PARTIALLY SYNTHETIC IRIS IMAGES FOR ENHANCED PRESENTATION ATTACK DETECTION
  2.1 Introduction
  2.2 Background
    2.2.1 Standard Generative Adversarial Networks (SGANs)
    2.2.2 Relativistic Standard Generative Adversarial Networks (RSGANs)
    2.2.3 Relativistic Average Standard Generative Adversarial Networks (RaSGANs)
    2.2.4 Fréchet Inception Distance Score
  2.3 Proposed Method
    2.3.1 Synthesizing Irides using RaSGAN
    2.3.2 Relativistic Discriminator - A One-class Presentation Attack Detection Method (RD-PAD)
      2.3.2.1 Method-I: RD-PAD Trained with Bonafide Samples Only
      2.3.2.2 Method-II: RD-PAD Fine-tuned with Some PA Samples
  2.4 Experimental Protocols
    2.4.1 Dataset Used
    2.4.2 Image Realism Assessment
    2.4.3 Applications of RaSGAN based Iris images
      2.4.3.1 Baseline on Current PAD Algorithms
      2.4.3.2 Synthetic Iris as Bonafide Sample
      2.4.3.3 Synthetic Iris as Presentation Attack Sample
    2.4.4 RD-PAD for Seen and Unseen Presentation Attack Detection
      2.4.4.1 Seen Presentation Attacks
      2.4.4.2 Unseen Attack: Cosmetic Contact Lenses and Kindle Display
      2.4.4.3 Unseen Attack: Printed Eyes and Artificial Eyes
      2.4.4.4 Analysis
  2.5 Summary

CHAPTER 3  CYCLIC IMAGE TRANSLATION GENERATIVE ADVERSARIAL NETWORK (CIT-GAN)
  3.1 Introduction
  3.2 Proposed Method
    3.2.1 Generative Adversarial Network
    3.2.2 Styling Network
    3.2.3 Cycle Consistency
  3.3 Experimental Protocols
    3.3.1 Datasets Used
    3.3.2 Image Realism Assessment
    3.3.3 Utility of Synthetically Generated Images
  3.4 Results & Analysis
  3.5 Summary

CHAPTER 4  IWARPGAN: DISENTANGLING IDENTITY AND STYLE TO GENERATE SYNTHETIC IRIS IMAGES
  4.1 Introduction
  4.2 Background
  4.3 Proposed Method
    4.3.1 Disentangling Identity and Style to Generate New Iris Identities
  4.4 Datasets Used
  4.5 Experimental Protocols
    4.5.1 Experiment-1: Quality of Generated Images
    4.5.2 Experiment-2: Uniqueness of Generated Images
    4.5.3 Experiment-3: Utility of Synthetic Images
  4.6 Summary & Future Work

CHAPTER 5  IT-DIFFGAN: IMAGE TRANSLATIVE DIFFUSION GAN TO GENERATE SYNTHETIC IRIS IMAGES
  5.1 Introduction
  5.2 Background
    5.2.1 Generative Adversarial Networks (GANs)
      5.2.1.1 Standard Generative Adversarial Networks (SGANs)
      5.2.1.2 StyleGAN and its Latent Space
    5.2.2 Diffusion-GANs
  5.3 Proposed Method
    5.3.1 Mapping Image Input to Latent Space
    5.3.2 Identity & Style Disentanglement
    5.3.3 Manipulating Style & Identity in Latent Space
      5.3.3.1 Style Transfer
      5.3.3.2 Identity Transformation
    5.3.4 Datasets Utilized
  5.4 Experiments & Analysis
    5.4.1 Experiment 1: Test of Realism
    5.4.2 Experiment 2: Test of Uniqueness
    5.4.3 Experiment 3: Utility of Generated Iris Images
  5.5 Conclusion

CHAPTER 6  MULTI-DOMAIN IMAGE TRANSLATIVE DIFFUSION STYLEGAN WITH APPLICATION IN IRIS PRESENTATION ATTACK DETECTION
  6.1 Introduction
  6.2 Background
    6.2.1 Generative Adversarial Networks (GANs)
    6.2.2 Diffusion based GANs
  6.3 Proposed Method
  6.4 Experiments & Analysis
    6.4.1 Datasets & PA Detection Methods Used
    6.4.2 Realism Assessment
    6.4.3 Utility of Generated Dataset
      6.4.3.1 Baseline Experiment-0
      6.4.3.2 Utility Experiment-1
    6.4.4 Ablation Study
      6.4.4.1 Studying Components of Proposed Method
      6.4.4.2 Capacitive Study
  6.5 Conclusion and Future Work

CHAPTER 7  SUMMARY AND FUTURE WORK

BIBLIOGRAPHY

APPENDIX

CHAPTER 1

INTRODUCTION

Throughout human history, the capacity to differentiate individuals uniquely and to associate personal attributes, like name and nationality, with a person has played an essential role in human society. For example, people have traditionally used physical traits such as facial features, gait, speech, and surroundings as cues to recognize each other. Human abilities for identity management may suffice within small communities, but the limitations of the human brain [69], combined with the exponential growth of the global population, have necessitated the development of sophisticated systems to efficiently handle the task of managing many identities. This led to the emergence of biometrics, the science of recognizing individuals at a large scale based on their unique physical or behavioral characteristics [69]. This discipline encompasses the study and application of automated methods for identifying or verifying an individual's identity by analyzing their distinctive physical or behavioral traits. Unlike traditional identification methods such as ID cards or passwords, which can be lost, stolen, or forgotten, biometrics offers a highly reliable and convenient means of person recognition. Thus, biometrics enables accurate and secure person recognition through the utilization of intrinsic and distinctive human traits [69, 71]. Physical traits include, but are not limited to, fingerprints [96], face [70], iris [17] and hand geometry [133] (see Figure 1.1). Some examples of primarily behavioral traits include gait [118], speech [13], signature [48], and keystroke dynamics [139]. This research will primarily emphasize the iris modality as a biometric trait.

"Biometrics is the most human-centric technology of our time."
- John Mears, former Director of Biometric Standards and Testing at NIST, Biometrics: Personal Identification in Networked Society, 1999.

"Biometrics provides a bridge between our physical existence and our digital identity."
- Mark Lockie, Biometrics Institute, 2019.

Figure 1.1: Examples of physical biometric traits such as fingerprints [4], face in the visible spectrum and iris in the NIR spectrum [5].

1.1 Biometric Systems

Biometric systems have become an integral part of contemporary technology, revolutionizing the way individuals are recognized. These systems capitalize on unique physical or behavioral characteristics to establish a person's identity with a high degree of accuracy. A biometric system comprises two distinct subsystems: enrollment and recognition [15, 71]. In the enrollment subsystem, a sensor is utilized to capture raw biometric data from an individual. The captured data undergoes a quality check to ensure its suitability for further processing. A feature extractor then extracts relevant information from the data, which is securely stored in a database along with a unique ID assigned to the individual during the data acquisition stage. If the captured data fails to meet the required quality standards, it is discarded, prompting the enrollment process to restart.

Figure 1.2: Illustration of the key components of biometric systems, including data acquisition, quality assessment, feature extraction and matching, highlighting their roles in recognition.

Figure 1.3: Illustration of verification and identification modes. Verification mode is used to verify the claimed identity of an individual, i.e., a biometric sample provided by the individual is compared to a pre-stored template associated with their claimed identity. The objective is to determine if the sample and template correspond to the same identity. On the other hand, during the identification mode, the biometric sample provided by the individual is compared against all available templates in the system's database. Similarity scores are computed between the sample and each template to find the closest match.

The recognition subsystem can be further divided into two different recognition modes (as shown in Figure 1.3) [15, 69]:

• Verification: Also known as one-to-one matching, this mode is employed to verify the claimed identity of an individual. During verification, a biometric sample provided by the individual is compared to a pre-stored template associated with their claimed identity. The objective is to determine if the sample and template correspond to the same identity. In verification, the performance evaluation involves the use of genuine and impostor pairs. Genuine pairs consist of a biometric sample used for enrollment (stored template) and the sample presented during verification. The system's task is to correctly match these pairs, thereby confirming the claimed identity. Genuine pairs represent successful matches, resulting in positive verification outcomes. Conversely, impostor pairs consist of a biometric sample from an individual attempting to impersonate someone else and the stored template associated with the target individual. The system aims to detect the mismatch between the samples and reject the impostor's claim. Impostor pairs represent failed attempts to deceive the system, resulting in negative verification outcomes. To evaluate the performance of a verification system, metrics such as False Acceptance Rate (FAR) and False Rejection Rate (FRR) are utilized. FAR quantifies the rate at which impostor pairs are incorrectly accepted as genuine matches, highlighting security vulnerabilities. FRR measures the rate at which genuine pairs are incorrectly rejected as impostor matches, indicating the system's inability to correctly identify individuals.
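Both error rates depend on the threshold applied to the match score. Written compactly, and using generic notation that is not tied to any specific matcher in this thesis (similarity score $s$, decision threshold $\tau$), the two rates are:

\[
\mathrm{FAR}(\tau) = P\big(s \geq \tau \mid \text{impostor pair}\big), \qquad
\mathrm{FRR}(\tau) = P\big(s < \tau \mid \text{genuine pair}\big).
\]

Raising $\tau$ lowers the FAR at the expense of a higher FRR, and vice versa; reported operating points are therefore threshold-dependent.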
• Identification: During the identification mode, the biometric sample provided by the individual is compared against all available templates in the system's database. Similarity scores are computed between the sample and each template to find the closest match. The template associated with the highest similarity score is considered the identification result, indicating the likely identity of the individual. Identification can further be classified into positive and negative identification. Positive identification refers to a scenario where an individual's provided biometric sample successfully matches a template in the database with a high similarity score. In this case, the system accurately identifies the individual and confirms their true identity. Positive identification is achieved when the system correctly recognizes the person among the potential candidates, providing a reliable match. On the other hand, negative identification occurs when the individual's biometric sample does not match any template in the database, or the similarity scores fall below a predefined threshold. In such cases, the system fails to identify the individual or determine their true identity. Negative identification indicates that the given person's biometric data is not present in the database or the corresponding biometric sample does not sufficiently match any of the stored templates. It is important to note that the concepts of positive and negative identification help assess the effectiveness and accuracy of biometric identification systems in correctly identifying individuals from a pool of candidates.
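The two modes can be summarized with a short sketch. The code below is purely illustrative: the function names, the cosine-similarity matcher, and the threshold value are assumptions made for this example, not components of the systems described in this thesis.

```python
from __future__ import annotations

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two feature vectors (higher means more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(probe: np.ndarray, claimed_template: np.ndarray, threshold: float = 0.8) -> bool:
    """One-to-one matching: accept the claimed identity if the score clears the threshold."""
    return cosine_similarity(probe, claimed_template) >= threshold


def identify(probe: np.ndarray, gallery: dict[str, np.ndarray], threshold: float = 0.8) -> str | None:
    """One-to-many matching: return the best-matching identity, or None (negative identification)."""
    scores = {identity: cosine_similarity(probe, template) for identity, template in gallery.items()}
    best_identity = max(scores, key=scores.get)
    return best_identity if scores[best_identity] >= threshold else None
```

In practice the feature vectors would come from a feature extractor (e.g., an iris encoder), and the threshold would be chosen to meet a target FAR/FRR operating point.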
1.2 Biometric Datasets

A biometric dataset refers to a collection of data that consists of biometric samples or measurements obtained from individuals for the purpose of developing, testing, and evaluating biometric recognition systems. Biometric data is derived from the unique physical or behavioral characteristics of individuals, such as fingerprints, iris, face, speech, etc. The collection of a biometric dataset involves several steps [28, 69]. First, a target population or sample group is identified, which represents the intended user base of the biometric system. This population can vary based on the application, such as employees in an organization, individuals at border control points, or patients in a healthcare setting. Next, the individuals within the target population are enrolled in the biometric dataset. During the enrollment process, biometric samples are collected using specific sensors or devices designed for each biometric modality. For example, fingerprint scanners, iris cameras, or voice recording devices may be used to capture the respective biometric traits. Details of some publicly available iris datasets are given in Table 1.1, which lists iris datasets collected using various sensors in different spectra. Note that most of these datasets contain frontal-view images collected in a controlled environment. Biometric datasets play a crucial role in training algorithms, testing system performance, and conducting research in the field of biometrics.

However, the collection of biometric datasets involves various challenges that can lead to shortcomings [15, 28, 69, 145]:

• Informed Consent and Legal Issues: Collecting biometric data requires obtaining informed consent from individuals. Consent involves providing clear and understandable information about the purpose, scope, and potential risks associated with the data collection. Compliance with legal requirements, such as data protection laws and regulations specific to biometric data, is essential to ensure that the collection and use of biometric datasets are conducted lawfully and ethically.

• Privacy Concerns: Biometric data is highly personal and sensitive since it represents unique physical or behavioral characteristics of individuals. Collecting and storing such data must comply with privacy regulations and ensure the protection of individuals' privacy rights. Proper measures such as data encryption, secure storage, and controlled access to the dataset should be implemented.

• Inter-class variations: This refers to the differences or dissimilarities between individuals in a dataset. The importance of inter-class variation lies in its ability to capture the diversity present within the target population. Biometric systems are designed to recognize and differentiate individuals based on their unique traits. By including samples from individuals with diverse characteristics such as demographics, ethnicities, ages, and genders, the inter-class variation in the dataset ensures that the system can generalize well and perform accurately across a wide range of individuals. It helps to avoid bias and ensures fairness in the recognition process.

• Intra-class variations: This represents the natural variations that occur within an individual's biometric samples. No two samples of the same biometric trait from a person are identical due to factors such as environmental conditions, sensor variations, or changes in the presentation of the biometric trait. Understanding and modeling this variation is crucial for handling the inherent uncertainties and variations in real-world scenarios. By capturing and incorporating intra-class variation in the dataset, biometric systems can be trained to be more robust and tolerant to these natural variations, leading to improved recognition performance.

• Sample Size: The size of the biometric dataset is a critical factor in determining the effectiveness of biometric systems. A larger dataset provides a more representative and comprehensive coverage of the target population, enabling better algorithm training, evaluation, and testing. Increasing the dataset size reduces the risk of overfitting, where the system becomes too specialized to the limited samples in the dataset, resulting in poor performance on new samples.

• Data Quality: The quality of biometric samples within the dataset is essential for achieving reliable recognition performance. Factors such as the quality of the sensors used for data capture, environmental conditions during data collection (e.g., lighting, noise), and variations in sample presentation can impact the quality of the dataset and subsequently influence system performance. Quality control measures, including sensor calibration, data pre-processing techniques, and rigorous data validation, should be employed to enhance the quality of the collected biometric dataset.
• Annotation and Ground Truth: Biometric datasets often require annotation or ground truth labels, indicating the correct identity associated with each sample. This process can be labor-intensive and subjective, as it involves human judgment to determine the correct identities. The accuracy and reliability of the ground truth significantly influence the performance evaluation of biometric systems, making proper annotation a crucial aspect of dataset collection.

• Longitudinal Data and Aging: Longitudinal biometric datasets are collected over time from the same individuals to study the effects of aging on biometric traits. Collecting and maintaining longitudinal datasets present challenges in terms of tracking individuals over extended periods, managing data consistency and integrity, and addressing variations in biometric traits due to aging or other factors. Longitudinal data enables the development of age-invariant recognition algorithms and provides insights into the long-term performance and stability of biometric systems.

These challenges can lead to shortcomings that hamper the development and evaluation of reliable biometric systems; for example, the datasets used for training and evaluating such systems often lack an adequate number of samples and fail to encompass the full range of inter- and intra-class variations (see Table 1.1).

Table 1.1: Iris datasets captured in different spectra, such as visible (VIS) and near-infrared (NIR), using various sensors. The number of subjects is given in parentheses.

Dataset Name | Sensor(s) | Spectrum | Images (Subjects)
CASIA-Iris-Thousand [5] | Iris scanner (Irisking IKEMB-100) | NIR | 20,000 (1000)
CASIA-Iris-Interval [5] | CASIA close-up iris camera | NIR | 2,639 (249)
CASIA-Iris-Lamp [5] | OKI IRISPASS-h | NIR | 16,212 (411)
CASIA-Iris-Twins [5] | OKI IRISPASS-h | NIR | 3,183 (200)
CASIA-Iris-Distance [5] | CASIA long-range iris camera | NIR | 2,567 (142)
ICE-2005 [1] | LG2200 | NIR | 2,953 (132)
ICE-2006 [2] | LG2200 | NIR | 59,558 (240)
IIITD-CLI [80] | Cogent and VistaFA2E single iris sensor | NIR | 6,570 (240)
UBIRIS v1 [113] | Nikon E5700 | VIS | 1,877 (241)
UBIRIS v2 [112] | Canon EOS 5D | VIS | 11,102 (261)
MILES [43] | MILES camera | VIS | 832 (50)
MICHE DB [167] | iPhone 5, Samsung Galaxy (IV + Tablet II) | VIS | 3,732 (184)
CSIP [122] | Xperia Arc S, iPhone 4, THL W200, Huawei Ideos X3 | VIS | 2,004 (100)
WVU BIOMDATA [30] | Irispass | NIR | 3,099 (244)
IIT-Delhi Iris Dataset [6] | JIRIS, JPC1000 and digital CMOS camera | NIR | 1,120 (224)
CASIA-BTAS [167] | CASIA Module v2 | NIR | 4,500 (300)
IIITD Multi-spectral Periocular [127] | Cogent Iris Scanner | NIR, VIS, Night Vision | 1,240 (62)
CROSS-EYED [125] | Dual Spectrum Sensor | NIR, VIS | 11,520 (240)

To overcome these shortcomings, researchers are actively exploring the potential of synthetically generated biometric data [73, 164] that can capture the intricate characteristics of real biometric traits, such as facial features [16, 83, 114], fingerprints [45, 95, 103], iris patterns [126, 147, 173], etc. While generating synthetic biometric data, it is important to ensure that the synthetic samples exhibit a wide range of intra-class variations, bolstering the size and diversity of the biometric dataset and facilitating the development of more robust and accurate biometric systems. Also, the generated biometric data should be distinct from real individuals. This ensures that the risk of privacy breaches and unauthorized access to personal data is effectively mitigated, since the generated samples do not correspond to real individuals.
This aspect assumes great significance, as ensuring the confidentiality and security of individuals' biometric information is of paramount importance.

1.3 Synthetic Biometric Data – Face, Fingerprint & Iris

In recent years, biometric-based human recognition has gained significant attention and widespread adoption, with applications in various domains such as security systems, access control, forensics, and healthcare. However, as the field of biometrics advances, it presents a range of research challenges that require exploration and innovation. For example, one of the prominent challenges in biometrics research is the availability of datasets with sufficient size, high quality, and diverse intra-class variations. Despite remarkable progress in biometric technologies, the datasets used for training and evaluating these systems often lack an adequate number of samples and fail to encompass the full range of intra-class variations. This limitation hampers the development and evaluation of reliable biometric systems. Another critical challenge is to ensure the privacy of individuals who submit their biometric data. Biometric information, being inherently personal and unique, raises concerns about unauthorized access, data breaches, and potential identity theft. Protecting the privacy of individuals while utilizing their biometric data is of utmost importance to foster trust and encourage widespread adoption of biometric systems [145]. Researchers have been actively working on finding solutions to overcome these challenges. One such solution is to explore the potential of synthetic biometric data.

1.3.1 Synthetic Biometrics

Synthetic biometrics refers to the creation of artificial biometric data that replicates real-world characteristics. It encompasses the generation of synthetic samples that emulate the traits and patterns observed in authentic biometric data [35], such as fingerprints, facial features, iris patterns, or speech. These synthetic samples are designed to possess similar statistical properties and variations as genuine biometric data, providing a valuable resource for research, development, and evaluation in the field of biometrics. Therefore, synthetically generated biometrics can be utilized to generate more biometric data with both inter- and intra-class variations. This helps overcome the limitations and challenges associated with conventional biometric datasets. Traditional biometric datasets often suffer from restricted sample sizes, lack diversity, and present concerns regarding privacy and data sharing. In contrast, synthetically generated biometrics offer a controlled and scalable solution by generating artificial data that can mimic the complexity and diversity of real-world biometric traits [73].

By generating synthetic biometric data, researchers and developers gain access to larger and
The artificial nature of the generated data ensures that it is not directly linked to any specific individual, mitigating the risks of unauthorized access or misuse of personal infor- mation. Consequently, synthetic biometric datasets can be shared and distributed for research and evaluation purposes without compromising individuals’ privacy rights. Furthermore, synthetically generated biometrics play a crucial role in enhancing the training and testing of deep convolutional neural network (CNN) models. Deep CNNs have demonstrated remarkable performance in various biometric tasks but heavily rely on large labeled datasets for effective training. Synthetically gener- ated biometric data aids in augmenting the availability of labeled training data by creating synthetic samples with known ground truth annotations. This facilitates the creation of more extensive and diverse training sets, resulting in improved CNN model training and higher accuracy in biometric systems. Moreover, synthetically generated biometrics contribute to the development of robust spoof detection algorithms. Spoofing attacks, where impostors attempt to deceive biometric sys- tems using fabricated or altered biometric traits, pose significant security risks. Synthetic data can be generated to simulate various spoofing scenarios, producing a comprehensive dataset of spoofing attempts with diverse attack variations. This synthetic dataset enables the training and evaluation of spoof detection algorithms, enhancing their effectiveness and enabling the development of resilient countermeasures against evolving spoofing techniques. To summarize, synthetically generated biometrics provide a valuable approach to address the limitations and challenges associated with conventional biometric datasets. By offering more data with sufficient variations, mitigating privacy concerns, improving deep CNN training and testing, and enhancing spoof detection capabilities, synthetically generated biometrics significantly contribute to the advancement and effectiveness of biometric systems in real-world applications. 10 1.3.2 Applications of Synthetic Biometric Data Synthetic biometric data has gained significant attention as a valuable resource in various fields, offering a range of applications and benefits. Here, we discuss the diverse applications of syn- thetic biometric data and highlights its significance in addressing challenges and improving the performance of biometric systems [45, 58, 73, 145, 158]: • Algorithm Development and Testing: Synthetic biometric data serves as a valuable resource for algorithm development and testing in the field of biometrics. By generating synthetic biometric samples, researchers can assess and evaluate the performance of novel algorithms, compare different techniques, and benchmark their efficacy against a standardized dataset. Synthetic data allows for controlled experimentation, enabling researchers to precisely ma- nipulate specific biometric traits, variations, and noise levels to simulate real-world scenarios. • System Evaluation and Benchmarking: Synthetic biometric data plays a vital role in evalu- ating and benchmarking the performance of biometric systems. It provides a standardized dataset that enables fair comparisons between different systems and algorithms. By using synthetic data, researchers and developers can assess system accuracy, robustness, and vul- nerability to various attacks or spoofing attempts. 
This evaluation process aids in identifying system weaknesses, improving overall system performance, and guiding the development of countermeasures.

• Training Data Augmentation: Synthetic biometric data is utilized to augment training datasets, enhancing the performance of biometric recognition systems. By generating additional synthetic samples, researchers can increase the size and diversity of the training set, which helps improve the performance and generalization capabilities of the algorithms. This approach reduces overfitting, enhances the system's ability to handle intra-class variations, and improves overall recognition accuracy.

• Privacy-Preserving Studies: Synthetic biometric data is invaluable for privacy-preserving studies and research involving sensitive biometric information. It allows researchers to conduct studies, simulations, and experiments without the need for real individuals' personal biometric data. Synthetic data provides a privacy-friendly alternative that ensures data protection while enabling advancements in biometric research and system development.

• Modeling Abnormal or Rare Cases: Synthetic data generation can help model age progression and the effects of diseases in iris recognition, where real-world datasets are often scarce. Age progression may or may not alter iris patterns over time, but the lack of sufficient data collected over long periods makes a definitive analysis difficult. By using generative techniques like GANs, synthetic iris data can simulate these changes, improving the performance of biometric systems across different ages. Similarly, synthetic data can model disease effects on the iris, helping recognition systems account for conditions like cataracts or glaucoma, which impact iris textures. This allows for the development of more robust iris recognition systems that can handle aging and disease-related changes effectively.

Overall, the applications of synthetic biometric data contribute to the advancement of biometric systems, enhancing their accuracy, security, and privacy while facilitating research and development in the field. Apart from these, synthetic biometric data generation is widely used in the field of entertainment and gaming to generate avatars with realistic-looking human features, emotions and speech [23, 38, 68, 89]. Similarly, learning and generating human-like gait movements can help create simulations for training and testing autonomous vehicles [154].

1.3.3 Methods to Generate Synthetic Biometrics

Synthetic biometric data offers flexibility and convenience in the development and assessment of biometric systems. In this context, it is important to understand the different methods employed to generate synthetic biometric data and their respective approaches. These methods have evolved with advancements in technology, ranging from pre-CNN techniques that predate the widespread adoption of CNNs to post-CNN methods that harness the power of deep learning. By exploring
Some common pre-CNN methods are [73]: • Mathematical Models: These models form a fundamental approach for generating synthetic data by approximating the distribution of real human data through mathematical models. Synthetic samples are then created by sampling from the approximated model. For example, Shah and Ross [126] proposed an approach for generating digital renditions of iris images using a two-step technique. In the first stage, they utilized a Markov Random Field model to generate a background texture that accurately represents the global appearance of the iris. In the subsequent stage, various iris features, including radial and concentric furrows, collarette, and crypts, are generated and seamlessly embedded within the texture field. In another example, Cappelli et al. [18] introduced a widely recognized method called Synthetic Fingerprint Generation (SFinGe) that is based on mathematical modeling. The researchers leverage their domain expertise to establish a fingerprint orientation model, which incorpo- rates key characteristics such as the number and placement of fingerprint cores and deltas. The generation process begins by initializing the core and delta locations, followed by the generation of ridge orientations and densities. To obtain a high-quality fingerprint image, the authors employ space-invariant linear filtering techniques. Additionally, they introduce domain-specific noise to simulate realistic grayscale fingerprint images. There are also other methods that use mathematical models for synthetic generation for face [108], keystroke [99], hand recognition [27], etc. Examples of some biometric data generated using mathematical models are given in Figure 1.4. 13 Limitations: Mathematical models learn to generate synthetic biometric data by approximat- ing the distribution of real human data, creating a dependency on training data. Therefore, such models may struggle to capture complex variations present in biometric data, resulting in limited diversity in the generated synthetic data. For example, SFinGe generates synthetic fingerprints based on a predefined fingerprint orientation model. This limits the diversity and variability of the generated fingerprints, as they are confined to the characteristics defined by the model [45]. As a result, the generated fingerprints may not fully represent the wide range of natural fingerprint variations observed in real-world scenarios. Apart from limited variability, some models are also limited in terms of quality and realism [27, 45, 108, 156]. • Input Perturbations: It refers to the deliberate introduction of controlled variations or distortions to the input data in order to generate synthetic biometric data. This technique is commonly used in data augmentation to increase the diversity and robustness of biometric datasets. The perturbations can be either hand-crafted or dynamic in nature [46, 73]. Hand- crafted perturbations involve the application of predetermined modifications or manipulations to the original biometric data, with specific perturbations designed by experts based on their understanding of the underlying biometric characteristics and vulnerabilities. On the other hand, dynamic perturbations involve the introduction of variations to the biometric data in a more adaptive and context-aware manner. Examples of hand-crafted perturbations include geometric transformations, the addition of noise, modifications to texture, and the inclusion of occlusion patterns. 
Nojavanasghari et al. [107] proposed a method for synthesizing and recognizing occlusions caused by hands covering faces in images. The focus of the paper is on addressing the challenges posed by hand occlusions in facial recognition systems. The synthesis pipeline consists of three steps: (1) to synthesize faces with occlusions, the first step is to collect non-occluded faces and segmented occlusions such as hands, hair, hats, scarves, etc.; (2) align the faces with occlusions in terms of scale, pose, orientation and color correction; and (3) determine the areas of the faces where occlusion should be added using facial landmarks. This helps synthesize face images with occlusions that can help train a face recognition system to overcome the challenges posed by occluded faces. In [19], Cardoso et al. aimed to generate synthetic degraded iris images for evaluation purposes. The method utilizes various degradation factors such as blur, noise, occlusion, and contrast changes to simulate realistic and challenging iris image conditions. The degradation factors are carefully controlled to achieve a realistic representation of degraded iris images commonly encountered in real-world scenarios. Examples of some biometric data generated using perturbations are given in Figure 1.5.

Limitations: The purpose of applying input perturbations is to simulate the natural variability present in real biometric data. However, the range of hand-crafted perturbations may be limited compared to the extensive diversity observed in real biometric data. It can be challenging to fully encompass the complete spectrum of variations and complexities using input perturbations alone. Also, input perturbations are typically generic and may not incorporate domain-specific knowledge or expertise. This can result in synthetic data that does not fully capture the specific characteristics or challenges of the target biometric modality or application domain [46].

• 3-D Modelling & Rendering: 3D modeling is a technique used in synthetic image generation to create a representation of the three-dimensional surface of an object of interest. This approach involves constructing a digital model that accurately captures the shape, texture, and geometry of the biometric feature, such as a face, fingerprint, or eye. The advantage of using 3D modeling and rendering is the ability to incorporate extreme changes in illumination, viewpoint, occlusion, scale, and background, providing a diverse set of synthetic samples. Han et al. [61] emphasized the benefits of generating synthetic samples in 3D space, allowing for precise control over environmental conditions such as pose variations, lighting, and object geometry. This control enables accurate annotations, which are often obtained from real datasets. Other studies have explored the use of 3D rendering tools in various applications, including re-identification of individuals, face recognition, and gait recognition. Examples
Certain fine details, such as subtle variations in skin texture or small imperfections are challenging to accurately replicate in the synthetic images. This limitation could potentially affect the performance and generalization of biometric recognition algo- rithms trained on synthetic data. Also, these methods are seen to generate images that are similar to the training dataset i.e., the identities in generated dataset has high similarity with the identities in real dataset [73] • Hybrid Approaches: Hybrid approaches combine multiple techniques to generate synthetic data that exhibits diverse and realistic characteristics. These approaches leverage the strengths of both mathematical models and transformation techniques. For instance, researchers may use parametric models to capture the statistical properties of the data and then apply trans- formation techniques to introduce variations and enhance the realism of the synthetic data. By combining these approaches, researchers can create synthetic data that closely resembles the real data while incorporating desired variations. In [110], Park et al. combined 3-D modeling with separate models for shape and texture to generate effects of aging on human faces. This helped train a face recognition method with temporal in-variance. As discussed earlier, Cappelli et al. [18] proposed SFinGe that utilizes mathematical model to generate synthetic fingerprints by initializing the core and delta locations, followed by the generation of ridge orientations and densities. Also, domain-specific noise is added to the generated images to improve realism of the generated images. Limitations: While the hybrid approaches helps improve quality and realism of generated in some cases [18, 110], their generations capability is still limited to the training data affecting their diversity and fail to capture complex variations caused due to external factors [73]. 16 Figure 1.4: Examples of images generated by approximating the distribution of real human data through mathematical models. (i) In [107], authors used Synthetic Fingerprint Generation (SFinGe) to generate realistic looking fingerprints. (ii) Shah and Ross [126] utilized a Markov Random Field model to generate background texture for synthetic irides and various iris features, including radial and concentric furrows, collarette, and crypts, are generated and seamlessly embedded within the texture field and (iii) [108] generates multiple face images, starting from seed points representing different identities. Users can control the degree of variation from the seed point to create new faces while maintaining the same identity if the difference is below a certain threshold. Figure 1.5: (i) Nojavanasghari et al. [107] proposed an innovative approach to generate lifelike facial occlusions using a dataset of clear faces and separate hand images. This method reduces the need for labor-intensive data gathering and annotation and (ii) In [19], researchers proposed a method to generate synthetic irides with noise perturbations to achieve a realistic representation of degraded iris images commonly encountered in real-world scenarios. 17 Figure 1.6: (i) In [109], Öz et al. generated artificial eye images by utilizing UnityEyes [150], a 3D rendering tool. The synthetic dataset was then incorporated for evaluation purposes and (ii) Feng et al. 
[49] proposed to expand the face dataset by generating synthetic faces with 11 different yaw rotations (ranging from -50 to 50 in increments of 5) and 5 pitch rotations (ranging from -30 to 30). This augmentation technique effectively increased the size of the dataset when combined with real data.

Figure 1.7: (i) and (ii) are produced by an Adversarial Autoencoder [93] trained on the MNIST and Toronto Face (TFD) datasets. The final column displays the nearest training images, based on pixel-wise Euclidean distance, to the ones in the second-to-last column.

Figure 1.8: (i) In [156], Yadav et al. leveraged RaSGAN to generate high-resolution iris images. (ii) Choi et al. [26] developed an image-to-image translation GAN, StarGAN v2, that aims to learn mappings between various visual domains while ensuring diversity and scalability across multiple domains. In the figure, StarGAN v2 translates the input image to images with the target domain "female".

1.3.3.2 Post-CNN Methods

Post-CNN methods involve generating synthetic biometric data using CNNs, which have shown remarkable performance in various computer vision tasks. These methods leverage the power of deep learning models to learn and generate realistic biometric samples. Some common post-CNN methods include:

• Recurrent Neural Networks: A recurrent neural network (RNN) is a type of artificial neural network that is designed to process sequential data by capturing dependencies and patterns over time. Unlike feedforward neural networks, which process data in a strictly forward manner, RNNs have loops that allow information to be stored and propagated through time. This recurrent structure enables RNNs to handle inputs of variable lengths and make use of past information to influence future predictions. At each time step, an RNN takes an input vector and produces an output vector, while also maintaining an internal hidden state. This hidden state serves as the memory of the network and allows it to retain information about previous inputs. The output at each time step is influenced by both the current input and the previous hidden state [124].
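In its simplest ("vanilla") form, this recurrence can be written as follows; the notation is generic and not taken from the cited works:

\[
h_t = \sigma\big(W_x x_t + W_h h_{t-1} + b_h\big), \qquad
y_t = \phi\big(W_y h_t + b_y\big),
\]

where $x_t$ is the input at time step $t$, $h_t$ is the hidden state, $y_t$ is the output, $W_x$, $W_h$ and $W_y$ are learned weight matrices, and $\sigma$ and $\phi$ are nonlinearities.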
This can lead to the generation of synthetic biometric data that closely resembles the training set but fails to capture the full variability and generalization required for accurate representation [73, 102, 124].
• Autoencoders: An autoencoder is a type of neural network that can be used to generate synthetic biometric data by learning the underlying patterns and representations from a given dataset. It consists of an encoder and a decoder, where the encoder compresses the input data into a lower-dimensional representation (latent space), and the decoder reconstructs the data from this representation. The basic autoencoder architecture has been extended and modified to cater to specific requirements in generating synthetic biometric data. For example, Wan et al. [146] proposed a novel approach for synthetic data generation using a variational autoencoder (VAE) that enables the generation of new samples that exhibit similarities to the original dataset while introducing some level of variation. The proposed method is evaluated and compared against traditional synthetic sampling methods using multiple datasets and five evaluation metrics. The experimental results demonstrate the effectiveness of the approach in addressing the challenges of imbalanced learning. Other prominent autoencoders for synthetic image generation are the adversarial autoencoder [93] and the Wasserstein auto-encoder [141]. Some examples of biometric data generated using autoencoders are given in Figure 1.7.
Limitations: Many efforts have been made to improve the quality of generated images by introducing different variations of the autoencoder [93, 141]; however, these models still fall short in terms of the quality of the generated images [11, 29]. Also, autoencoders rely on the patterns and features present in the training data to generate synthetic samples. As a result, their creativity and ability to generate novel data beyond the training set may be limited [29, 56].
• Generative Adversarial Networks (GANs): GANs consist of a generator and a discriminator network that compete against each other [56]. The generator (G) learns to generate synthetic samples that are similar to real biometric data, while the discriminator network aims to distinguish between real and synthetic samples. GANs have been successfully applied to generate realistic face images, iris patterns, and other biometric traits. In [82], Kohli et al. proposed iDCGAN (iris Deep Convolutional Generative Adversarial Network), a novel framework that leverages deep convolutional generative adversarial networks and iris quality metrics to generate synthetic iris images that closely resemble real iris images. The purpose of this framework is to explore the impact of these synthetically generated iris images when used as presentation attacks on iris recognition systems. Some examples of biometric data generated using GANs are given in Figure 1.8.
Limitations: While GANs have proven highly effective in generating synthetic biometric data, including iris images, they are not without their challenges. One significant limitation is their dependency on training data, which can lead to generated images resembling real human data, restricting the ability to produce identities distinct from the training set [73, 140]. This limitation raises concerns about privacy, as synthetic images may share similarities with real-world subjects.
Moreover, GANs often struggle with producing sufficient intra-class variation, limiting their utility in biometric applications where diverse representations of the 21 same identity are needed [78]. Additionally, GANs are known for their unstable training process, which can result in suboptimal image quality and mode collapse, especially when scaling to high-resolution synthetic data generation. • Diffusion based Generative Adversarial Networks (GANs): Diffusion-based GANs are a novel approach that combines diffusion models with the traditional GAN architecture to improve the generation of high-quality synthetic images [147]. Diffusion models generate images by progressively denoising random noise through a series of iterative steps, moving from random patterns to structured images [31]. This iterative process allows them to better capture fine details and complex structures that traditional GANs may miss. By incorpo- rating diffusion techniques into GANs, diffusion-based GANs benefit from the strengths of both frameworks: the high image quality and detail retention of diffusion models and the adversarial training of GANs, which helps produce more realistic and diverse outputs. This method addresses common challenges in traditional GANs, such as instability during training and mode collapse, making diffusion-based GANs more robust and effective for generating complex synthetic data, especially in applications like biometric image generation [79, 135]. Limitations: Despite their advantages, diffusion-based GANs face challenges in generating distinct identities. One key limitation is the high computational cost due to the iterative denoising process, making it slower and less scalable for large datasets. Additionally, while these models generate realistic images, ensuring that the identities are sufficiently distinct from the training data remains a challenge. Diffusion-based GANs may also struggle with controlling specific identity features in the latent space, potentially leading to less diversity in the generated identities. • Hybrid Approaches: Hybrid approaches combine multiple techniques to generate synthetic data that exhibits diverse and realistic characteristics. Bamoriya et al. [9] proposed a novel ap- proach, called Deep Synthetic Biometric GAN (DSB-GAN), for generating realistic synthetic biometrics that can serve as large training datasets for deep learning networks, enhancing 22 their robustness against adversarial attacks. DSB-GAN builds upon a combination of convo- lutional autoencoder (CAE) and DCGAN and the evaluation of DSB-GAN is conducted on three biometric modalities: fingerprint, iris, and palmprint. One of the notable advantages of DSB-GAN is its efficiency due to a low number of trainable parameters compared to existing state-of-the-art methods. In [67], Huang et al. introduced an innovative approach called introspective variational autoencoder (IntroVAE) for the synthesis of high-resolution photo- graphic images. IntroVAE employs a streamlined architecture that combines the strengths of both VAEs and GANs, eliminating the need for additional discriminators. The generator component of IntroVAE follows the conventional VAE approach by reconstructing input images from noisy outputs of the inference model. Simultaneously, the inference model is trained to classify between real and generated samples, while the generator attempts to deceive it, similar to the concept in GANs. 
Limitations: Methods like [9] can successfully generate biometric data for multiple modalities, but they lack evidence of inter- and intra-class variation in the generated dataset. In [67], Huang et al. combined the stable training of VAEs with the superior generation capability of GANs to generate synthetic data. This helped improve the image quality of the synthetic data, but the work does not focus on generating images that are distinct from the training data.
1.3.4 Synthetic Iris Images
Synthetic iris images have become increasingly important for various applications, such as iris recognition system evaluation, algorithm development, and biometric data augmentation. Synthetic images offer several advantages, including scalability, diversity, and control over the generated data. In this literature review, we will explore different methods for generating synthetic iris images, discussing their strengths, limitations, and potential applications:
• Texture Synthesis: This technique has been widely used for generating synthetic iris images. These methods analyze the statistical properties of real iris images and generate new images based on those statistics.
Figure 1.9: Some examples of synthetically generated iris images using different methods. (i) and (ii) use mathematical models from [169] and [126]. On the other hand, (iii) and (iv) use GAN based methods from [156] and [82], respectively. As mentioned in Section 1.3, the mathematical models learn to generate synthetic biometric data by approximating the distribution of real human data, creating a dependency on training data. Therefore, such models may struggle to capture complex variations present in biometric data, resulting in limited diversity in the generated synthetic data. Apart from limited variability, some models are also limited in terms of quality and realism. On the other hand, GANs have paved the way to generate realistic looking synthetic images. However, the generated images resemble the training data from real humans, i.e., the identities generated by most of the current methods are not unique enough. Also, some of these methods lack intra-class variations and may not fully represent the wide range of natural variations observed in real-world scenarios.
Shah and Ross [126] proposed an approach for generating digital renditions of iris images using a two-step technique. In the first stage, they utilized a Markov Random Field model to generate a background texture that accurately represents the global appearance of the iris. In the subsequent stage, various iris features, including radial and concentric furrows, collarette, and crypts, are generated and seamlessly embedded within the texture field. In another example, Makthal and Ross [94] introduced a novel approach for synthetic iris generation using Markov Random Field (MRF) modeling. The proposed method offers a deterministic synthesis procedure, which eliminates the need for sampling a probability distribution and simplifies computational complexity. Additionally, the study highlights the distinctiveness of iris textures compared to other non-stochastic textural patterns. Through clustering experiments, it is demonstrated that the synthetic irises generated using this technique exhibit content similarity to real iris images. In a different approach, Wei et al. [149] proposed a framework for synthesizing large and realistic iris datasets by utilizing iris patches as fundamental elements to capture the visual primitives of iris texture.
Through patch-based sampling, an iris prototype is created, serving as the foundation for generating a set of pseudo irises with intra-class variations. Qualitative and quantitative analyses demonstrate that the synthetic datasets generated by this framework are well-suited for evaluating iris recognition systems. Limitations: These methods learn to generate synthetic iris images by approximating the distribution of real iris data, creating a dependency on training data. Therefore, such models may struggle to capture complex variations present in real iris datasets, resulting in limited diversity in the generated synthetic data. Furthermore, certain studies in this domain lack comprehensive analysis of the inter and intra-class variations exhibited in the generated synthetic data. For instance, a study by Wei et al. [149] demonstrates that the synthetic iris images bear a striking resemblance to real iris images in terms of visual appearance. However, the experiments conducted in this study do not thoroughly explore the extent to which the synthetic iris images resemble each other in terms of identity. • Morphable Models: Morphable models have been utilized for generating synthetic iris images by capturing the shape and appearance variations in a statistical model. These models represent the shape and texture of irises using a low-dimensional parameter space. By manipulating the parameters, synthetic iris images with different characteristics, such as size, shape, and texture, can be generated. Most of the research in this category focuses on generation synthetic iris images with gaze estimation and rendering eye movements. Wood et al. [150] proposed a 3-D morphable model for the eye region with gaze estimation and re-targeting gaze using a single reference image. Similarly, [10] focuses on achieving photo- realistic rendering of eye movements in 3D facial animation. The model is built upon 3D scans of a face captured from various gaze directions, enabling the capture of realistic motion of the eyeball, eyelid deformation, and the surrounding skin. To represent these deformations, 25 a 3D morphable model is employed. Limitations: Morphable models offer control over iris attributes and can generate a diverse range of iris images with unique identity. However, acquiring a comprehensive training dataset and accurate parameter estimation are crucial for the effectiveness of these models. An illustrative example can be found in [150], where the focus is on gaze estimation and re-targeting, enabling the generation of eye regions with realistic variations. However, it is worth noting that while emphasizing these variations, the model’s capacity to preserve identity information is constrained in the resulting generated images. This is a very common problem found with morphable models for gaze estimation. • Image Warping: Image warping techniques involve applying geometric transformations to real iris images to generate synthetic images. These transformations can include rotations, translations, scaling, and deformations. Image warping allows for the generation of synthetic iris images with variations in pose, gaze direction, and occlusions. In [19], Cardoso et al. aimed to generate synthetic degraded iris images for evaluation purposes. The method utilizes various degradation factors such as blur, noise, occlusion, and contrast changes to simulate realistic and challenging iris image conditions. 
The degradation factors are carefully con- trolled to achieve a realistic representation of degraded iris images commonly encountered in real-world scenarios. In [32], a novel iris image synthesis method combining principal component analysis (PCA) and super-resolution techniques is proposed. The study begins by introducing the iris recognition algorithm based on PCA, followed by the presentation of the iris image synthesis method. The proposed synthesis method involves the construction of coarse iris images using predetermined coefficients. Subsequently, super-resolution tech- niques are applied to enhance the quality of the synthesized iris images. By manipulating the coefficients, it becomes possible to generate a wide range of iris images belonging to specific classes. Limitations: This method is computationally efficient and can produce a large number of 26 synthetic images. However, image warping may not capture the complex textural details and iris-specific characteristics accurately. Also, some methods like [19] focuses on generating intra-class variations for the currently available iris datasets and don’t cater to the need of generating large-scale synthetic iris dataset with inter-class variations. • Generative Adversarial Networks (GANs): GANs have gained significant attention for generating realistic and diverse synthetic iris images. In a GAN framework, a generator network learns to generate synthetic iris images, while a discriminator network distinguishes between real and synthetic images. The two networks are trained in an adversarial manner, resulting in improved image quality over time. GANs can generate iris images with realistic features, including iris texture, color, and overall appearance. Minaee and Abdolrashidi [100] proposed a framework that utilizes a GAN to generate synthetic iris images sampled from a learned prior distribution. The framework is applied to two widely used iris datasets, and the generated images demonstrate a high level of realism, closely resembling the distribution of images within the original datasets. Similarly, Kohli et al. [82] proposed iDCGAN (iris Deep Convolutional Generative Adversarial Network), a novel framework that leverages a deep convolutional GAN to generate synthetic iris images that closely resemble real iris images. The purpose of this framework is to explore the impact of these synthetically generated iris images when used as presentation attacks on iris recognition systems. Bamoriya et al. [9] proposed a novel approach, called Deep Synthetic Biometric GAN (DSB-GAN), for generating realistic synthetic biometrics that can serve as large training datasets for deep learning networks, enhancing their robustness against adversarial attacks. DSB-GAN builds upon a combination of convolutional autoencoder (CAE) and DCGAN and the evaluation of DSB-GAN is conducted on three biometric modalities: fingerprint, iris, and palmprint. One of the notable advantages of DSB-GAN is its efficiency due to a low number of trainable parameters compared to existing state-of-the-art methods. Limitations: GANs have shown promise in generating synthetic iris images, but they also have certain limitations. The GAN-based methods introduced by Minaee et al. [100] and 27 Kohli et al. [82] demonstrate the ability to generate high-quality iris images, particularly in low-resolution settings. However, it is important to note that the image quality tends to degrade when applied to higher resolution scenarios [156]. 
While these methods exhibit promising capabilities for generating synthetic iris images, further improvements are necessary to ensure consistent quality across various image resolutions. Another common issue with current GAN-based methods for synthetic iris image generation is data dependency. Current GAN methods fail to generate iris images that are distinct from the real training data in terms of identity [73, 140]. This restricts the capability of GAN methods to generate large-scale synthetic iris datasets. Also, inadequate or biased training data can result in sub-optimal performance and limit the network’s generalization capabilities.
• Diffusion-based Generative Adversarial Networks: Diffusion-based GANs have recently emerged as a promising approach for generating realistic and high-quality synthetic iris images by combining the strengths of diffusion models and GANs. In these models, the image generation process involves iterative refinement, starting from noise and progressively producing structured images. Diffusion-based GANs can capture complex textures and features, making them particularly suited for the fine details required in synthetic iris image generation. Recent work like II3FDM [92] has shown the potential of diffusion models in tasks like iris inpainting, highlighting the effectiveness of diffusion-based approaches in reconstructing realistic iris textures. This method demonstrates that diffusion models can produce high-quality iris images, preserving intricate details such as iris patterns and textures, which are crucial for biometric applications. However, while studies like II3FDM have explored diffusion models for iris-related tasks, the application of diffusion-based GANs in the iris domain remains relatively underexplored. Most research has focused on traditional GANs or inpainting methods, leaving significant room for further development and innovation in using diffusion-based GANs for generating synthetic iris datasets. The limited research in this area suggests there is ample scope for exploring how diffusion-based GANs can overcome the challenges faced by traditional GANs, such as generating distinct identities and handling high-resolution images. By leveraging the stability and iterative nature of diffusion processes, future research could unlock new capabilities in iris synthesis, particularly in enhancing identity diversity and addressing issues related to mode collapse and training instability, which are common in GAN-based models.
Limitations: While diffusion-based GANs offer a promising direction, their application in iris synthesis has not been thoroughly investigated. The work on II3FDM [92] indicates their potential for high-quality iris reconstruction, but the challenge of generating distinct identities in large-scale datasets remains a largely unaddressed area. Additionally, the computational cost and complexity of diffusion-based models present a barrier to their widespread use, further highlighting the need for research to optimize these models for synthetic biometric data generation.
1.4 Our Contribution
In the field of biometrics, various techniques have been developed to generate synthetic data for different modalities, including face, fingerprint, and iris. These methods enable researchers and practitioners to create artificial biometric samples that mimic the characteristics of real-world data.
However, as discussed earlier the current methods for generating synthetic irides are limited in terms of quality, realism and uniqueness (inter and intra-class variations). In this research, we proposed different methods to overcome these issues and emphasize their usefulness through various experiments and analysis. Based on the nature of the generated data and their application, the contribution of our work can be categorized as follows: • Generating Partially-Synthetic Biometric Data: Partially-Synthetic biometric data refer to the synthetic samples that contain artificial components mixed with real biometric traits. The goal of partially-synthetic data is to introduce controlled variations or augmentations to the real data, thereby increasing the diversity and robustness of the dataset. This can be particularly useful in scenarios where the real data is limited, imbalanced, or lacks specific 29 variations. For example, in iris presentation attack (PA) detection where the detection methods aim to detect PA attacks (such as printed eyes, cosmetic contact lens, etc.), limited PA data is available to train the detection methods. This can limit the methods’ development and testing as well. Also, with the improvement in technology more advance PA attacks are present in the real world (such as good quality textured contact lens, replay attack using high definition screens, etc.) and the current detection methods are not generalized enough to detect these new and unseen attacks. To overcome these issues, we proposed two different methods: (1) Leveraging Relativistic Average Standard Generative Adversarial Network (RaS- GAN): This study leverages the capabilities of RaSGAN to produce high-quality iris images. Unlike traditional GANs, the introduction of a "relativistic" discriminator and generator in RaSGAN enhances the network’s generative power. This approach aims to maximize the probability that the real input data is more realistic than the synthetic data, and vice versa. The synthetic iris images generated through this method exhibit similarity to real iris images, capturing the intricate details and characteristics of the iris. Building upon this, we explore the usability of these synthetic images for training a PAD system that can effectively detect presentation attacks. By combining the power of RaSGAN for generating highly realistic iris images and the effectiveness of the resulting synthetic images for training a PAD system, this approach offers promising prospects for enhancing the security and reliability of iris recognition systems. The synthetic images generated through this technique can contribute to improving the detection and prevention of presentation attacks, enabling biometric systems to handle previously unseen attacks more effectively. (2) Cyclic Image Translation Generative Adversarial Network (CIT-GAN): We intro- duced a novel approach called CIT-GAN for achieving multi-domain style transfer. Our method incorporated a Styling Network, which learns the distinctive style characteristics of each domain represented in the training dataset. By leveraging the Styling Network, the generator is guided to translate images from a source domain to a reference domain, resulting 30 in the generation of synthetic images that possess the style characteristics of the reference domain. The learning process for style characteristics is influenced by both the style loss and domain classification loss, allowing for variability in style characteristics within each domain. 
In the context of iris presentation attack detection (PAD), we utilized the proposed CIT-GAN to generate synthetic presentation attack (PA) samples for classes that are under-represented in the training set. Through evaluation using state-of-the-art iris PAD methods, we demonstrated the effectiveness of using these synthetically generated PA samples for training PAD models. Additionally, we evaluated the realism of the synthetic images using the Fréchet Inception Distance (FID) score, which quantifies the similarity between the distributions of real and synthetic images. Our results indicate that the proposed method produces synthetic images of superior quality compared to other competing methods, including StarGAN v2.
(3) Multi-domain Image Translative Diffusion StyleGAN with Application in Iris Presentation Attack Detection (MID-StyleGAN): An iris biometric system can be vulnerable to presentation attacks (PAs), where artifacts like artificial eyes, printed eye images, or cosmetic contact lenses are used to deceive the system. To mitigate these threats, various presentation attack detection (PAD) methods have been proposed. However, the development and evaluation of iris PAD techniques face a significant challenge due to the lack of sufficient datasets, primarily because of the inherent difficulties in creating and capturing realistic PAs. To address this issue, we presented the Multi-domain Image Translative Diffusion StyleGAN (MID-StyleGAN), a novel framework designed to generate synthetic ocular images that effectively capture domain-specific information from iris PA datasets. MID-StyleGAN leverages the strengths of diffusion models and generative adversarial networks (GANs) to create realistic and diverse synthetic data. It utilizes a multi-domain architecture that enables seamless translation between bonafide ocular images and various PA domains, while maintaining the biometric identity features. The framework incorporates an adaptive loss function specifically tailored for ocular data to ensure domain consistency. Experimental results demonstrate that MID-StyleGAN surpasses existing methods in generating high-quality synthetic ocular images, significantly enhancing PAD system performance.
• Generating Fully-Synthetic Biometric Data: Fully-synthetic biometric data refer to entirely artificial biometric samples that do not correspond to any real individuals in the population. With fully-synthetic biometric data, the aim is to generate new iris images with both inter- and intra-class variations in order to mitigate the issue of small training sets by increasing the size of the dataset. This can improve the development of recognition systems and their testing. Also, by generating fully-synthetic identities that do not resemble anyone in this world, we can mitigate the privacy concerns associated with using a real person’s biometric data. We have proposed two different methods to achieve this:
(1) iWarpGAN: Disentangling Identity and Style to Generate Synthetic Iris Images: This framework incorporates two transformation pathways, namely the Identity Transformation and the Style Transformation. The Identity Transformation pathway is designed to modify the identity of the input iris image in the latent space, allowing the generation of iris images with identities that are distinct from those in the training set.
This is accomplished by learning a radial basis function (RBF)-based warp function, denoted as 𝑓𝑝, in the latent space of a Generative Adversarial Network (GAN). The gradient of this function enables the generation of non-linear paths along the 𝑝𝑡ℎ family of paths for each latent code 𝑧 ∈ 𝑅, resulting in diverse identities. On the other hand, the Style Transformation pathway focuses on generating iris images with different styles. The style attributes are extracted from a reference iris image and combined with the transformed identity code. By concatenating the reference style code with the modified identity code, iWarpGAN generates synthetic iris images with both inter-class and intra-class variations in terms of style. Through the integration of these two transformation pathways, iWarpGAN facilitates the generation of iris images that exhibit diverse identities and styles, providing a comprehensive exploration of the latent space. (2) Image Translative Diffusion GAN (IT-diffGAN): IT-diffGAN is introduced to address critical challenges posed by traditional GANs in generating synthetic biometric data, specif- 32 ically in producing distinct, realistic identities with sufficient variation. Traditional GANs often struggle with mode collapse, unstable training, and the generation of synthetic images that closely resemble the training data, which can compromise privacy and diversity. IT- diffGAN seeks to overcome these issues by leveraging the advantages of diffusion models, which offer a more stable and iterative approach to image generation, combined with the architectural power of StyleGAN-3. The proposed method first begins by projecting input images into the latent space of the diffusion-GAN. In this latent space, the model identifies key features that define individual identity and style, enabling precise manipulation of these features. By applying a specialized identity and style metric, IT-diffGAN calculates the displacement or distance between the original and generated images, providing a measure of how much these key features are being altered. This metric allows the model to learn which latent features most significantly affect the identity and style, making the manipulation of these attributes more controlled and effective. Once the relevant identity and style features are identified, IT-diffGAN is trained to generate entirely new identities by carefully adjusting these features within the latent space. This training enables the model to introduce both inter-class and intra-class variations in the synthetic images. Inter-class variations allow for the generation of completely distinct identities, while intra-class variations create different representations of the same identity, mimicking natural biometric variations seen in real- world data. By utilizing the diffusion-GAN framework, IT-diffGAN offers enhanced stability during training compared to traditional GANs. Diffusion models inherently reduce the risk of mode collapse by employing an iterative denoising process that refines the generated images progressively from noise to fully-formed, structured images. This leads to the production of more realistic and higher-quality synthetic iris images. Additionally, the StyleGAN-3 backbone helps in preserving the fine details and features essential for biometric recognition, further boosting the quality of the generated data. 
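To make the latent-space manipulation underlying the Identity Transformation pathway more concrete, the following is a minimal, illustrative sketch of an RBF-based warp in a GAN latent space: a scalar RBF function is defined over latent codes and its gradient is followed to trace a non-linear path that progressively alters identity. The latent dimension, the RBF centers and weights, and the generator that would decode the resulting codes are hypothetical placeholders, not the actual iWarpGAN components.

```python
import numpy as np

def rbf_warp(z, centers, weights, gamma=0.5):
    """Scalar RBF function f_p(z) = sum_i w_i * exp(-gamma * ||z - c_i||^2)."""
    d2 = np.sum((centers - z) ** 2, axis=1)          # squared distance to each center
    return np.sum(weights * np.exp(-gamma * d2))

def rbf_warp_grad(z, centers, weights, gamma=0.5):
    """Gradient of f_p with respect to z; defines a non-linear direction of travel."""
    diff = z - centers                                # shape: (num_centers, latent_dim)
    d2 = np.sum(diff ** 2, axis=1)
    coeff = -2.0 * gamma * weights * np.exp(-gamma * d2)
    return (coeff[:, None] * diff).sum(axis=0)

def identity_path(z0, centers, weights, steps=10, step_size=0.1):
    """Follow the gradient of the warp function to trace a non-linear path in latent
    space; each point corresponds to a progressively altered identity code."""
    path, z = [z0.copy()], z0.copy()
    for _ in range(steps):
        z = z + step_size * rbf_warp_grad(z, centers, weights)
        path.append(z.copy())
    return np.stack(path)

# Toy usage with hypothetical sizes (latent_dim=512, 8 RBF centers).
rng = np.random.default_rng(0)
z0 = rng.standard_normal(512)
centers = rng.standard_normal((8, 512))
weights = rng.standard_normal(8)
path = identity_path(z0, centers, weights)
print(path.shape)   # (11, 512): latent codes along one identity-altering path
# Each code in `path` would then be decoded by a (hypothetical) generator G(path[i]).
```

In practice, the centers and weights of such a warp would themselves be learned so that movement along the path changes identity while a separately injected style code controls intra-class appearance, as described above.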
CHAPTER 2
GENERATING PARTIALLY SYNTHETIC IRIS IMAGES FOR ENHANCED PRESENTATION ATTACK DETECTION
This work leverages RaSGAN to generate high-quality partially synthetic iris images in the NIR spectrum and evaluates the effectiveness and usefulness of these images as both bonafide and presentation attack samples. We also propose a novel one-class presentation attack detection method known as RD-PAD for unseen presentation attack detection, addressing the challenge of generalizability in PAD algorithms. This research has been published in [156] and [157].
2.1 Introduction
The rich texture of the iris, which is better discernible in the near-infrared spectrum, has been used as a biometric cue [37] in many recognition systems [71]. This has led to an increased interest in the texture and morphology of the iris. Consequently, researchers have strived to model the pattern of the iris. In this regard, a number of methods to generate synthetic digital irides have been developed (as discussed in Chapter 1). Cui et al. [32] used principal component analysis to select appropriate feature vector coefficients from real images, which were then used to generate synthetic irides. The quality of the generated data was improved using super-resolution. Zuo et al. [173] developed a model based on the morphology of the iris. Noise and light reflection were also added to the model to create more realistic looking samples. Shah and Ross [126] used a Markov Random Field to model the stromal texture of the iris [94] and then added anatomical entities such as collarette, crypts, radial and concentric furrows. Wei et al. [149] proposed a framework for synthesizing large and realistic iris datasets by utilizing iris patches as fundamental elements to capture the visual primitives of iris texture. Through patch-based sampling, an iris prototype is created, serving as the foundation for generating a set of pseudo irises with variations. In [144], Venugopalan and Savvides aimed to model synthetic iris codes from original irides that can be utilized to evaluate iris recognition systems. Other methods have also been proposed in the literature to generate synthetic iris images [19, 50, 148]. While these methods successfully generate digital iris images, they are still unable to truly model the distribution of real iris images at high resolution [156]. With the introduction of deep learning and generative adversarial networks (GANs) came the opportunity to overcome some of the challenges of pre-CNN methods and to generate realistic looking synthetic images of higher quality. Huang et al. [67] introduced an innovative approach called the introspective variational autoencoder (IntroVAE) for the synthesis of high-resolution photographic images. IntroVAE employs a streamlined architecture that combines the strengths of both VAEs and GANs, eliminating the need for additional discriminators. The generator component of IntroVAE follows the conventional VAE approach by reconstructing input images from noisy outputs of the inference model. Simultaneously, the inference model is trained to classify between real and generated samples, while the generator attempts to deceive it. In [82], Kohli et al. proposed iDCGAN (iris Deep Convolutional Generative Adversarial Network), a novel framework that leverages deep convolutional generative adversarial networks and iris quality metrics to generate synthetic iris images that closely resemble real iris images.
This framework aims to explore the impact of the synthetically generated iris images when used as presentation attacks on iris recognition systems. Bamoriya et al. [9] proposed a novel approach, Deep Synthetic Biometric GAN (DSB-GAN), to generate realistic synthetic biometrics that can serve as large training datasets for deep learning networks, enhancing their robustness against adversarial attacks. DSB-GAN builds upon a combination of convolutional autoencoder (CAE) and DCGAN and the evaluation of DSB-GAN is conducted on three biometric modalities: fingerprint, iris, and palmprint. One of the notable advantages of DSB-GAN is its efficiency due to a low number of trainable parameters compared to existing state-of-the-art methods. While these methods successfully generate synthetic data, the quality of the generated data deteriorates with increasing resolution, failing to capture the intricate biometric details. In this research, we propose to overcome these challenges by leveraging the generative capability of Relativistic Average Standard Generative Adversarial Network (RaSGAN) [72] with Frechet Inception Distance (FID) [63] to generate high-resolution iris images. Unlike other GAN methods, 35 Figure 2.1: Samples of real and spoof iris images from MSU-Iris-PA01 [156]: (a) bonafide samples and (b) presentation attack samples: (i) artificial eye, (ii) & (iii) printed iris, (iv) Kindle display and (v) cosmetic contact lens. RaSGAN trains a generator that aims to maximize the probability that a randomly sampled set of synthetic samples are more realistic than a given set of real samples. In [72], Jolicoeur- Martineau showed that this property can be implemented in a Standard GAN using a “relativistic discriminator" that competes with the generator to maximize the probability that the real data is more realistic than the synthetic data. The author studied different cost functions and compared the statistics of the generated samples using the Frechet Inception Distance (FID) score [12]. They reported that the RaSGAN obtained much lower (better) FID score on the CIFAR-10 dataset than SGAN, Least Squares GAN (LSGAN) [97] and Wasserstien GAN (WGAN) [8]. It was also observed that RaSGAN produces high-resolution images using fewer number of iterations even when other networks were not able to converge (especially for high resolution images). We further investigate the quality of irides generated using RaSGAN by evaluating whether state-of-the-art iris presentation attack detection (PAD) methods can distinguish between bonafide, synthetically generated irides and presentation attacks. Here, presentation attacks (PAs) refer to physical artifacts that are utilized to successfully circumvent iris biometric systems. For example, an adversary can present a printed image [60, 161] to an iris sensor to impersonate another subject, or use cosmetic contact lenses [80, 155] and artificial eyes [51] to either obfuscate their own identity or to create a virtual identity. Due to their serious impact on the security of a system, detecting such spoof or obfuscation attacks has become a key research topic in biometrics. Some of the commonly used iris presentation attack detection (PAD) algorithms are summarized below: • Print Attack: Attackers may use printed or static images of an enrolled iris to impersonate someone. Gupta et al. [60] used textual descriptors such as LBP, HOG and GIST to detect 36 print attacks. 
Raghavendra and Busch [115] used multi-scale binarized statistical image features (BSIF) combined with cepstral features for print attack detection. The printed images are often flat and can easily be detected using liveliness test [33, 74]. • Cosmetic Contact Lens: Attackers can utilize cosmetic contact lens to fool the iris recog- nition systems. Kohli et al. [81] used a variant of LBP to obtain useful textural features for contact lens detection. Other approaches to detect such attacks include weighted local binary pattern and deep features extracted from CNNs [98]. • Synthetic/Artificial Eye: This type of attack is less common than print and cosmetic contact lens but is gaining interest in recent times [82]. Some of the proposed methods for detection are based on multispectral imaging [22] and eye gaze tracking [88]. • Multiple Attacks: PAD algorithms can also be designed to address various types of PAs. In [64], Hoffman et al. designed a CNN that used patch information along with a segmentation mask from an un-normalized iris image to learn image characteristics that differentiate PA samples from bonafide samples. Menotti et al. [98] proposed a CNN based approach to detect spoofs in different modalities, viz., iris, face and fingerprint. While the aforementioned methods exhibit reasonably good performance on seen PAs, they do not generalize well over unseen PAs. We use the term “Seen PAs" to refer to PAs that are used or observed during the development or training stage of the detector. On the other hand, “Unseen PAs" refer to PAs that are not used or observed during the development or training stage of the detector. Some of the recent PAD methods attempted to improve generalizability by building deep convolutional networks that aims to learn the difference between bonafide irides and PAs. Gupta et al. [59] proposed a novel deep learning-based PAD method, known as MVANet, that leverages multiple representation convolutional layers to improve the generalization of the PA detector. MVANet also addresses the computational complexity inherent in training deep neural networks by adopting a pragmatic approach and utilizing a fixed base model. The comprehensive assessments on various databases using cross-database training-testing configurations shows the 37 efficacy of MVANet in generalizing over unseen PAs. Sharma and Ross [128], introduced an iris PAD method named D-NetPAD, built upon the DenseNet convolutional neural network architecture. D-NetPAD exhibits a high degree of adaptability when it comes to generalizing over various PA artifacts, sensors, and datasets. Through a series of experiments carried out on various iris PA datasets, they validated the efficacy of D-NetPAD in generalized PA detection. In [21], Chen and Ross proposed a joint iris detection and PA detection method that aims to predict the parameters of the iris bounding box and simultaneously assess the likelihood of a presentation attack using the input ocular image. Through a series of experiments, authors showed the efficacy of proposed method in detecting irides as well as iris PAs. Most of these PAD methods formulate presentation attack detection as a binary-class problem, which demands the availability of a large collection of both bonafide and PA samples to train classifiers. However, obtaining a large number of PA samples can be much more difficult than bonafide iris samples. 
Further, classifiers are usually trained and tested across similar PAs, but PAs encountered in operational systems can be diverse in nature and may not be available during the training stage. Also, in case of binary-class based detectors, the PAD methods need to be fine-tuned (or in some cases re-trained) whenever a new PA is introduced. Therefore, PAD algorithms based on binary classifiers might fail to generalize to unseen PAs. In the literature, researchers have attempted to impart generalizability to PAD algorithms by adopting an anomaly detection approach also known as one-class classification. In this approach, the PA detector learns the distribution of bonafide samples only and uses this information to detect “outliers" that would presumably correspond to PAs. In [39], Ding and Ross proposed an ensemble of one-class classifiers, trained on hand-crafted features, to detect unseen fingerprint PAs. Nikisins et al. [106] used the gaussian mixture model as a one-class classifier trained on image quality measure (IQM) features [50] for generalized face PA detection. In [44, 65, 123, 168] researchers used deep architectures such as CNNs and GANs [57] for anomaly detection in general image classification problems, which suggest the efficacy of deep-learning based features in anomaly detection. Using this as motivation, we propose a one-class classifier for generalized iris PA detection 38 known as RD-PAD. The relativistic discriminator (RD) of the ensuing RaSGAN learns to separate bonafide irides from their synthetic counterparts. In the process, the RD fits a tight boundary around the bonafide samples making it an effective one-class anomaly detector, which we refer to as RD-PAD. The proposed method, in principle, does not require any PA samples during training; only bonafide samples are needed during training. Consequently, anything that lies outside the learned distribution on bonafide samples is classified as PA. The major contributions of this research are summarized here: • We use RaSGAN with FID score to generate synthetic iris images that can effectively model the distribution of real iris images. • We investigate if state-of-the-art iris PAD algorithms can distinguish bonafide irides as well as presentation attack images (e.g. cosmetic contact lens, printed iris and artificial eye images) from the generated synthetic images. • We propose RD-PAD for unseen PA detection that utilizes the relativistic discriminator from a RaSGAN to discriminate bonafide samples from PAs. The proposed PAD algorithm requires only bonafide samples for training. • We analyze the performance of state-of-the-art PAD algorithms on unseen PAs and compare them with the proposed method. • We evaluate the performance of the proposed RD-PAD when it is fine-tuned using a few PA samples and tested on PAs that are not used during training. 2.2 Background Generative Adversarial Networks (GANs) [57] are neural networks that consist of two different components: a generator (𝐺) that learns how to synthesize the data (e.g., images), and a discrim- inator (𝐷) that aims to discriminate between real and synthetic data. These two networks are 39 alternatively updated against each other in a min-max game where the objective of the generator is to maximally fool the discriminator while the objective of the discriminator is to not be fooled. 
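To make the min-max game described above concrete, the following is a minimal PyTorch sketch of one alternating training step for a standard GAN. The toy network definitions, flattened image size and hyperparameters are illustrative placeholders, not the actual configuration used in this work.

```python
import torch
import torch.nn as nn

# Minimal sketch of the alternating GAN update: D learns to separate real from
# synthetic samples, then G is updated to fool D (common non-saturating variant).
latent_dim = 128
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 256 * 256))
D = nn.Sequential(nn.Linear(256 * 256, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

def train_step(x_real):
    """One alternating update of discriminator and generator."""
    b = x_real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator update: push D(x_real) towards 1 and D(G(z)) towards 0.
    z = torch.randn(b, latent_dim)
    x_fake = G(z).detach()                 # stop gradients into G for this step
    loss_d = bce(D(x_real), ones) + bce(D(x_fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: push D(G(z)) towards 1, i.e., try to fool D.
    z = torch.randn(b, latent_dim)
    loss_g = bce(D(G(z)), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

# Toy usage with a random "image" batch flattened to 256*256 values.
print(train_step(torch.randn(4, 256 * 256)))
```

The relativistic variants discussed next modify only the loss terms in such a loop, replacing the absolute real/fake targets with comparisons between real and synthetic discriminator outputs.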
2.2.1 Standard Generative Adversarial Networks (SGANs)
As mentioned previously, a standard GAN (SGAN) consists of two networks $D$ and $G$ that are wrapped in a min-max game to update their weights and compete against each other. This is achieved by alternately minimizing and maximizing the objective function $S$ as,
$$\min_{G}\max_{D} \; S(D, G) = \mathbb{E}_{x_r \sim \mathbb{P}}[\log(D(x_r))] + \mathbb{E}_{z \sim \mathbb{M}}[\log(1 - D(G(z)))]. \quad (2.1)$$
Here, $x_r \sim \mathbb{P}$ indicates that $x_r$ is from the true data distribution $\mathbb{P}$. Also, $D(x)$ is the output obtained after applying the sigmoid function ($sig$) to the non-transformed layer $NT(x)$,
$$D(x) = sig(NT(x)). \quad (2.2)$$
Here, $NT(x)$ refers to the output of the last convolutional layer before the application of logistic regression. Traditional GANs such as SGAN, WGAN and DCGAN design discriminators that optimize their ability to distinguish synthetically generated data from bonafide samples. While they have been reported to perform well [72] on low resolution datasets, unstable training and optimization have been observed when they are used with high-resolution data [72]. This instability can be explained in terms of the gradient of the traditional discriminator:
$$\nabla_{\theta} S_D = -\mathbb{E}_{x_r \sim \mathbb{P}}[(1 - D(x_r))\,\nabla_{\theta} NT(x_r)] + \mathbb{E}_{x_s \sim \mathbb{Q}}[D(x_s)\,\nabla_{\theta} NT(x_s)]. \quad (2.3)$$
Here, $x_s \sim \mathbb{Q}$ indicates that $x_s$ is from the model distribution $\mathbb{Q}$, i.e., synthetically generated data. During training, when the discriminator is optimized, $1 - D(x_r)$ converges to 0, indicating that the gradient of $D$ comes mostly from synthetically generated data. Consequently, the generator stops learning to generate natural looking images. This in turn restricts the ability of the discriminator to learn a good representation for bonafide irides. However, we would like to learn a stable model with a discriminator that has a better understanding of the distribution of bonafide irides.
2.2.2 Relativistic Standard Generative Adversarial Networks (RSGANs)
In [72], Jolicoeur-Martineau introduced the relativistic discriminator, $D_R$, which aims to maximize the probability that bonafide irides are more real than synthetically generated irides using the following objective function:
$$R(D_R) = -\mathbb{E}_{(x_r, x_s) \sim (\mathbb{P}, \mathbb{Q})}[\log(sig(NT(x_r) - NT(x_s)))]. \quad (2.4)$$
In this case, the training of the discriminator depends on both bonafide and synthetic data. From Equation (2.4), we can see that its gradient depends on $x_r$ as well as $x_s$, which ensures that the generator $G_R$ continues learning to synthesize real looking irides until convergence. In RSGAN, $G_R$ aims to generate images that maximize the probability that they are more real than bonafide samples:
$$R(G_R) = -\mathbb{E}_{(x_r, x_s) \sim (\mathbb{P}, \mathbb{Q})}[\log(sig(NT(x_s) - NT(x_r)))]. \quad (2.5)$$
Therefore, $D_R$ and $G_R$ compete with each other to generate realistic looking and high resolution synthetic irides.
2.2.3 Relativistic Average Standard Generative Adversarial Networks (RaSGANs)
In RSGAN, a sample in distribution $\mathbb{P}$ is compared with every sample in $\mathbb{Q}$ (and vice-versa), which might not be very efficient. Therefore, to make this adversarial network more efficient, Jolicoeur-Martineau [72] updated the objective functions of $D_R$ and $G_R$ to compare a sample in distribution $\mathbb{P}$ with the average of samples from $\mathbb{Q}$ (and vice-versa):
$$R_{avg}(D_R) = -\mathbb{E}_{x_r \sim \mathbb{P}}[\log(\hat{D}(x_r))] - \mathbb{E}_{x_s \sim \mathbb{Q}}[\log(1 - \hat{D}(x_s))], \quad (2.6)$$
$$R_{avg}(G_R) = -\mathbb{E}_{x_s \sim \mathbb{Q}}[\log(\hat{D}(x_s))] - \mathbb{E}_{x_r \sim \mathbb{P}}[\log(1 - \hat{D}(x_r))], \quad (2.7)$$
Figure 2.2: Schematic of the training process for the Relativistic Average Standard Generative Adversarial Network (RaSGAN) using real iris images.
The training images for RaSGAN are first aligned and center-cropped using the pupil-iris center. Cropped images of size 256×256 are then sent to the discriminator for training. The discriminator tries to detect synthesized images while the generator competes with it to generate more realistic synthetic images by back-propagating the loss after each training iteration and updating the weights. For each generated image, an FID score is calculated to evaluate its quality. This process is repeated until images with lower (i.e., better) FID scores are generated.
Here, the relativistic average discriminator $\hat{D}$ used in Equations (2.6) and (2.7) is defined as
$$\hat{D}(x) = \begin{cases} sig(NT(x) - \mathbb{E}_{x_s \sim \mathbb{Q}} NT(x_s)), & \text{if } x = x_r \\ sig(NT(x) - \mathbb{E}_{x_r \sim \mathbb{P}} NT(x_r)), & \text{if } x = x_s. \end{cases} \quad (2.8)$$
This network is referred to as the Relativistic Average Standard Generative Adversarial Network (RaSGAN). The generator and discriminator in RaSGAN use the relative information between bonafide and synthetic data to generate realistic looking and high resolution irides. When the generator competes with the discriminator in this fashion, it gives the discriminator the opportunity to learn a more effective distribution for bonafide irides. This is the key observation that we exploit in this work: the ability of the generator to synthesize more natural looking irides, and the ability of the discriminator to learn a more accurate distribution for bonafide irides.
Figure 2.3: Samples of generated irides from the trained Relativistic Average Standard Generative Adversarial Network (RaSGAN).
2.2.4 Fréchet Inception Distance Score
In [119], Salimans et al. proposed the Inception Score, which applies a pre-trained Inception-V3 network to the generated images and compares the marginal label distribution with the conditional label distribution. The larger the KL-divergence between these two distributions, the higher the Inception Score and the more realistic the generated data is deemed to be. The Inception Score is a useful metric for evaluating the realism of generated images, but it does not include statistics that compare real data against synthetic data. Instead of analyzing synthetic iris images in isolation, the Fréchet Inception Distance [63] compares the statistics of the generated synthetic samples against the real samples:
$$FID = \lVert \mu_r - \mu_s \rVert^2 + Tr\left(\Sigma_r + \Sigma_s - 2\sqrt{\Sigma_r \Sigma_s}\right), \quad (2.9)$$
where $\mu_r$, $\mu_s$, $\Sigma_r$ and $\Sigma_s$ represent the means and covariance matrices of the two distributions, and $Tr$ is the trace of the matrix $(\Sigma_r + \Sigma_s - 2\sqrt{\Sigma_r \Sigma_s})$. Since FID is measured as the distance between the distributions of real and generated data, the lower the FID score, the higher the similarity between real and generated data.
2.3 Proposed Method
In this research, we train RaSGAN under two different settings: (1) Train RaSGAN such that the generator can generate high resolution and realistic looking synthetic irides. The stopping criterion for this setting is designed to achieve the best image quality, which is evaluated using the Fréchet Inception Distance (FID) score; the lower the FID score, the better the image quality. (2) Train a one-class classifier, the Relativistic Discriminator (RD), for generalized iris PA detection. Here, the focus is to learn to separate bonafide irides from their synthetic counterparts. In the process, the RD fits a tight boundary around the bonafide samples, making it an effective one-class anomaly detector, known as RD-PAD.
2.3.1 Synthesizing Irides using RaSGAN
Here, our aim is to utilize the generative power of RaSGAN to generate synthetic irides that have good image quality. The network is trained until the best quality iris images are generated by the generator.
This is achieved by analyzing the quality of generated images at the end of each training iteration until the best possible FID score is obtained. This helps make sure that the trained network can generate realistic looking iris images. The RaSGAN architecture used in this work consists of two important components: relativistic discriminator and generator that are implemented using PyTorch libraries.1 The network is trained using only bonafide samples that are first pre-processed to align them using the center coordinates of the pupil and the iris. The coordinates themselves are obtained using VeriEye SDK.2 The aligned images are further center-cropped and then resized to obtain images of size 256×256. The input to the relativistic discriminator (𝐷 𝑅) are pre-processed bonafide samples and synthetically generated 1https://github.com/alexiajm/relativisticgan 2www.neurotechnology.com/verieye.html 44 irides from 𝐺 𝑅. The input to the generator is a noise sample 𝒛 of size 1×1×128, where 𝒛 is sampled from a normal noise distribution. The architecture of the RaSGAN used in this work is summarized below. • Relativistic Discriminator: The 𝐷 𝑅 in RaSGAN has been constructed using seven convo- lutional layers with kernel size 4×4 and stride=2 (apart from the last convolution layer where stride=1). The first convolutional layer is followed by leaky rectified linear (Leaky-ReLU) units while the remaining layers (except for the last convolutional layer) are followed by both batch normalization and leaky rectified units. • Relativistic Generator: The 𝐺 𝑅 aims to generate natural looking irides of size 256×256 from input 𝒛, and has been implemented using seven transposed convolutional layer. Each layer has a kernel size of 4×4 and stride=2, except for the first transposed layer that has stride=1. Batch normalization and rectified linear units are applied to the output of each transposed convolutional layer. 2.3.2 Relativistic Discriminator- A One-class Presentation Attack Detection Method (RD- PAD) The goal here is to develop a presentation attack detector that can generalize well over previously unseen PAs. Therefore, we focused on learning a good representation of bonafide samples for one- class classification. This has been achieved by utilizing the relativistic discriminator (𝐷 𝑅) from RaSGAN that is trained using only bonafide samples and the corresponding synthetically generated samples from RaSGAN. During training, 𝐷 𝑅 competes with 𝐺 𝑅 to distinguish between bonafide and synthetically generated irides. This enables the discriminator to better learn the distribution of bonafide irides thereby allowing it to distinguish bonafide samples from all types of PA samples. 45 2.3.2.1 Method-I: RD-PAD Trained with Bonafide Samples Only Training a good discriminator is an important aspect of the proposed method. Therefore, as the first step, RaSGAN is trained using bonafide irides only. All the samples used during training are center aligned and cropped to size 256×256, as described in Section 2.3.1. The 𝐷 𝑅 obtained after RaSGAN training outputs the probability that a given input sample belongs to the bonafide distribution, i.e., an ideally trained 𝐷 𝑅 should satisfy 𝐷 𝑅 (𝒙) ≈ 1, when 𝒙 belongs to the bonafide iris category, and 𝐷 𝑅 (𝒙) ≈ 0, when 𝒙 represents some PA sample. 2.3.2.2 Method-II: RD-PAD Fine-tuned with Some PA Samples The 𝐷 𝑅 in Method-I is familiar with the distribution of bonafide samples but has no knowledge of any PA distributions. 
So, it learns a tight boundary encompassing the bonafide class, which can lead to misclassification of some bonafide irides (especially in the cross-sensor scenario). In Method-II, we further expand the capabilities of RD-PAD by fine-tuning $D_R$ using bonafide samples and a few known PAs. This enables $D_R$ to learn the difference between bonafide irides and some PAs, albeit in a limited way. Further, since this research focuses on unseen iris PA detection, the PA types used to fine-tune $D_R$ are mutually disjoint with the PA types used in the test set.
2.4 Experimental Protocols
2.4.1 Dataset Used
In this research, we utilized image samples from multiple iris datasets, viz., Berc-iris-fake [90, 91], Casia-iris-fake [136], LivDet2015 [161], LivDet2017 [162], NDCLD15 [41] and a self-collected dataset named MSU-IrisPA-01, for training and testing under different experimental set-ups. Images in MSU-IrisPA-01 were collected using the IrisID 7000 scanner over multiple sessions. This dataset contains 1,343 bonafide samples, 1,938 printed iris images, 108 colored contact lens images, 352 artificial eyes and 125 Kindle replay-attack images. All the images in these iris datasets are pre-processed to produce images of size 256×256 using segmentation coordinates from VeriEye³. Images that could not be processed by VeriEye were removed from the training sets (as shown in Table 2.1).
³ www.neurotechnology.com/verieye.html
Table 2.1: The iris datasets used in this research. The gray cells represent PA types that are not present in some datasets. The iris images are first pre-processed to produce images of size 256×256. Images that could not be processed by VeriEye were removed from the training sets. On the other hand, all such images are labeled as PAs in the test sets. The datasets are further adjusted to balance the samples in the two classes (bonafide versus PA). Bonafide Printed eyes Cosmetic contact lenses Artificial eyes Kindle display Berc-Iris-Fake [91] Casia-Iris-Fake [136] NDCLD15 [41] LivDet15 [161] LivDet17 [162] MSU-Iris-PA01 [156] Total 2,778 1,200 140 80 Used 6,000 640 740 400 Total 6,000 640 740 400 Used 2,778 1,200 140 80 Used 10,763 7,336 5,287 Total 11,372 12,099 5,287 Total 2,606 4,473 Used 1,402 4,259 Total 1,100 Used 695 1,100 1,100 Total 1,343 1,938 108 352 125 Used 1,000 1,830 108 352 125
2.4.2 Image Realism Assessment
As mentioned earlier, the Fréchet Inception Distance (FID) score [63] can be utilized as a good metric for evaluating the realism of the synthetically generated images; it compares the statistics of the generated synthetic images against the real images to produce a distance score. Hence, the lower the FID score, the higher the similarity between real and generated data. As described in [131], this score can be as high as 400-600 (or even more, depending on the deviation of the generated data from the original distribution), but a score this high would indicate that the quality of the generated dataset is unacceptable. To analyze the quality of irides generated by the RaSGAN that is trained using 2,778 bonafide samples from the Berc-iris-fake dataset, we first generate 6,277 synthetic iris samples and compute an FID score against 6,277 real bonafide samples from the Casia-iris-fake, LivDet2015, NDCLD15 and MSU-IrisPA-01 datasets. With this, we obtained an overall score of 39.17, which is comparable to FID scores obtained in [42]. Hence, we conclude that the RaSGAN-based synthetically generated iris samples closely resemble bonafide iris samples.
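For illustration, the FID computation used above (Equation 2.9) can be sketched as follows, assuming feature vectors have already been extracted from the real and synthetic images with an Inception-V3 network; the feature dimension and the toy inputs shown here are placeholders rather than the exact pipeline used in this work.

```python
import numpy as np
from scipy import linalg

def compute_fid(real_feats, synth_feats):
    """FID between two sets of feature vectors (e.g., Inception-V3 pool features),
    following Equation (2.9): ||mu_r - mu_s||^2 + Tr(S_r + S_s - 2*sqrt(S_r S_s))."""
    mu_r, mu_s = real_feats.mean(axis=0), synth_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_s = np.cov(synth_feats, rowvar=False)

    diff = mu_r - mu_s
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_s, disp=False)  # matrix square root
    if np.iscomplexobj(covmean):                              # drop tiny imaginary parts
        covmean = covmean.real
    return diff @ diff + np.trace(sigma_r + sigma_s - 2.0 * covmean)

# Toy usage with 64-dimensional stand-in features (real Inception features are
# typically 2048-D); a small offset makes the two sets slightly different.
rng = np.random.default_rng(0)
real = rng.standard_normal((300, 64))
synth = rng.standard_normal((300, 64)) + 0.1
print(round(float(compute_fid(real, synth)), 2))
```

A lower value indicates that the two feature distributions, and hence the underlying image sets, are more similar.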
3www.neurotechnology.com/verieye.html 47 (a) Normalized PA score distribution from BSIF+SVM. (b) Normalized PA score distribution from fine-tuned VGG-16. Figure 2.4: Normalized PA score distribution of RaSGAN-based synthetic iris images for Experiment-1 when tested on two PAD algorithms: (a) BSIF+SVM [41] and (b) Fine-tuned VGG- 16 [53]. These histograms emphasize the similarity between bonafide samples and the generated dataset. 48 Table 2.2: Performance (in %) of PAD algorithms in Experiment-0 that is used as baseline for analysis and comparison with other experiments. Here, the PAD methods are trained using 4,312 bonafide samples and 5,538 PA samples. Also, the test set consists of 1,965 bonafide samples and 3,929 PA samples. BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [81] Iris-TLPAD EER TDR(@1%) TDR(@5%) 5.44 71.70 93.79 5.85 86.20 93.17 19.37 21.62 48.80 2.78 96.31 97.15 Table 2.3: Performance (in %) of PAD algorithms in Experiment-1 (top) and Experiment-2 (bottom) when RaSGAN-based synthetic iris images are used as bonafide samples. BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [81] Iris-TLPAD EER TDR(@1%) TDR(@5%) 7.42 81.01 90.45 6.74 82.89 92.33 16.07 28.98 56.25 3.19 95.65 97.25 BSIF+SVM [41] pre-trained VGG-16 [53] DESIST [81] Iris-TLPAD EER TDR(@1%) TDR(@5%) 10.38 60.36 84.27 14.11 51.69 68.36 18.53 43.05 54.50 23.85 53.82 62.93 2.4.3 Applications of RaSGAN based Iris images The generated images are analyzed and evaluated for their usefulness as both bonafide images and presentation attack images, using different PAD methods, viz., DESIST [81], BSIF+SVM [41], Iris-TLPAD [21] and pre-trained VGG-16 [53]. Seven different experiments are conducted with 6,277 RaSGAN-based synthetically generated irides, 6,277 bonafide irides and 9,467 PA samples from Casia-iris-fake, NDCLD15, LivDet2015 and MSU-IrisPA-01 datasets. Further division of these datasets for training and testing the PAD algorithms is explained in the experimental protocols described below. In all cases, train and test sets were mutually disjoint. 49 Table 2.4: Performance (in %) of PAD algorithms in Experiment-3 (top) and Experiment-4 (bottom) when RaSGAN-based synthetic iris images are used as presentation attack images. BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [81] Iris-TLPAD EER TDR(@1%) TDR(@5%) 16.03 52.06 74.96 10.25 75.77 85.14 10.37 83.87 86.06 2.47 95.11 95.17 BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [81] Iris-TLPAD EER TDR(@1%) TDR(@5%) 50.64 0.51 3.10 30.94 3.21 7.43 57.45 0.35 2.23 25.14 1.37 9.16 2.4.3.1 Baseline on Current PAD Algorithms Experiment-0: This experiment is used as a baseline to evaluate the performance of different PAD methods on traditional PAs such as cosmetic contact lenses, printed eyes, Kindle replay-attack and artificial eyes. The PAD algorithms are trained using 4,312 bonafide samples and 5,538 PA samples; the test set consists of 1,965 bonafide samples and 3,929 PA samples. Results for this experiment are summarized in Table 2.2, where Iris-TLPAD achieves the best performance with an Equal Error Rate (EER) as low as 2.78% followed by BSIF and VGG-16 with 5.44% and 5.85%, respectively. 2.4.3.2 Synthetic Iris as Bonafide Sample FID scores do not merely provide an estimate of the quality of the generated images, but also about the similarity between the distributions of the synthetic data and the real data. 
To further establish the “bonafide nature" of the generated synthetic images, we conducted two more experiments: Experiment-1: The PAD algorithms are trained using 4,312 bonafide and 5,538 PA samples including printed eye, cosmetic contact lens, artificial eye and Kindle images. The test set was created using 1,965 bonafide samples, 3,929 PA samples and 1,965 synthetically generated images (labeled as bonafide samples). 50 Experiment-2: This experiment focuses on evaluating the capability of the generated synthetic data to replace the need for bonafide samples. Thus, the PAD algorithms are trained using 4,312 RaSGAN- based synthetic iris images and 5,538 PA samples from Experiment-1. Testing is done on 1,965 bonafide irides and 3,929 PA samples. Analysis: From Table 2.2, we observe that even in the presence of synthetic data (labeled as bonafide) during testing, the performance of PAD algorithms in Experiment-0 and Experiment- 1 are comparable. There is an increase of only 1.98% in the EER of BSIF for Experiment-1. Congruent behavior is observed for other PAD algorithms implying that majority of RaSGAN based synthetic iris images are being classified as bonafide samples (see Figure 2.4). However, when PAD algorithms are trained using synthetic iris images (instead of bonafide images) in Experiment-2, an increase in EER is observed (see Table 2.3). But some of the PAD algorithms still achieve a competitive True Detection Rate (TDR) of 84.27% at 5% False Detection Rate (FDR). This signifies that even though the generated iris images closely resemble bonafide samples, there are some fundamental differences between the two sets of images. This suggests the possibility of exploiting the synthetic images in a different way to enhance PAD algorithms, as will be shown later. 2.4.3.3 Synthetic Iris as Presentation Attack Sample The synthetically generated dataset can be exploited by an adversary to impersonate someone’s identity. In the next two experiments, we study the impact of the synthetic data on PAD algorithms when used as a presentation attack. Experiment-3: In this experiment, we analyzed the performance of the PAD algorithms when the synthetic iris data is used as a “known" presentation attack. So, PAD algorithms are trained using 4,312 bonafide and 4,312 synthetic samples while testing is done using 1,965 samples from each class. Unlike Experiment-1 and 2, here synthetic images are labeled as PA. Experiment-4: In this experiment, we analyzed the performance of the PAD methods when the 51 generated iris data are used as “unseen" presentation attacks. Here, the PAD algorithms are trained using 4,312 bonafide and 4,312 PA samples while testing is done using 1,965 bonafide and 1,965 synthetic samples. Analysis: Comparing the results of Experiment-0 and Experiment-3, we observe a considerable increase in EER when RaSGAN-based synthetic iris images are used as PAs (except for Iris- TLPAD). A decrease in TDR is observed for all PAD algorithms (except for DESIST) that confirms the viability of using RaSGAN generated synthetic images as presentation attack vectors on current state-of-the-art methods. Also, when RaSGAN based synthetic data is used only in the test set as an unseen attack (Experiment-4), a very significant drop in the performance of PAD algorithms is observed. For example, in Table 2.4 (bottom), EER values for BSIF and DESIST are more than 50% with TDR at an FDR of 5% as low as 3.10% and 2.23%, respectively. Similar observation can be made for other PAD algorithms. 
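For clarity, the EER and TDR-at-fixed-FDR values reported in these experiments can be computed from per-image PAD scores in the standard way. The following is a minimal sketch, assuming the convention that a higher score indicates a presentation attack; the thesis does not prescribe a specific implementation.

```python
import numpy as np
from sklearn.metrics import roc_curve

def pad_metrics(bonafide_scores, pa_scores, fdr_targets=(0.01, 0.05)):
    """EER and TDR at fixed FDR operating points for a set of PAD scores.
    Assumed convention: higher score = more likely to be a presentation attack."""
    y_true = np.concatenate([np.zeros(len(bonafide_scores)), np.ones(len(pa_scores))])
    y_score = np.concatenate([bonafide_scores, pa_scores])
    fpr, tpr, _ = roc_curve(y_true, y_score)       # here FPR plays the role of FDR, TPR of TDR
    fnr = 1.0 - tpr                                 # fraction of PAs that are missed
    eer = fpr[np.nanargmin(np.abs(fpr - fnr))]      # operating point where FDR ~= missed-PA rate
    tdr_at = {f: (tpr[fpr <= f].max() if np.any(fpr <= f) else 0.0) for f in fdr_targets}
    return eer, tdr_at

# Example with random scores (illustration only, not the thesis data):
# eer, tdr = pad_metrics(np.random.rand(1965) * 0.6, 0.4 + np.random.rand(3929) * 0.6)
```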
2.4.4 RD-PAD for Seen and Unseen Presentation Attack Detection

In this section, we evaluate the efficacy of RD-PAD in detecting unseen PAs using publicly available iris datasets summarized in Table 2.1. We also compute the performance of current state-of-the-art PAD algorithms, viz., BSIF+SVM [41], pre-trained VGG-16 [53], DESIST [81] and Iris-TLPAD [21], and compare them against that of RD-PAD. A total of 22,638 bonafide irides and 23,597 PA samples are utilized to train and test these algorithms. The PA samples used in this work consist of multiple types of attacks, including cosmetic contact lenses, artificial eyes, Kindle display-attacks and printed eyes.

2.4.4.1 Seen Presentation Attacks

Experiment-5: This is the baseline experiment for RD-PAD, which demonstrates the performance of existing PAD algorithms and the proposed Method-I and Method-II on known PAs. In this experiment, the PAD algorithms were trained using 12,875 bonafide irides and 12,326 PAs containing cosmetic contact lenses, printed eyes, artificial eyes and Kindle display-attacks. On the other hand, the proposed Method-I was trained using only bonafide samples, while Method-II was trained using bonafide samples and only 800 randomly selected known PAs. All the trained algorithms were then tested on 6,207 bonafide and 6,529 PA samples consisting of cosmetic contact lenses, printed eyes, artificial eyes and Kindle display-attacks.

Table 2.5: Attack Presentation Classification Error Rate (APCER) at 0.1%, 1% and 5% Bonafide Presentation Classification Error Rate (BPCER) of existing PAD algorithms and the proposed RD-PAD (Method-I and Method-II) on known PAs as described in Experiment-5. The lower the APCER, the better the performance.

                              APCER(@0.1%)   APCER(@1%)   APCER(@5%)
  BSIF+SVM [41]                   81.83         51.26        27.54
  Pre-trained VGG-16 [53]         30.55         23.27         8.09
  DESIST [81]                     99.04         78.13        43.64
  Iris-TLPAD [21]                 13.34          6.64         0.89
  Method-I                        33.40         26.18        18.35
  Method-II                       24.65         15.68         8.43

2.4.4.2 Unseen Attack: Cosmetic Contact Lenses and Kindle Display

In this section, we evaluate the performance of the PAD algorithms for generalized PA detection when the training data does not include PAs such as cosmetic contact lenses and Kindle display-attacks.

Experiment-6: Here, the other PAD algorithms are trained using 2,778 bonafide samples from Berc-iris-fake [90, 91], and 3,007 printed eyes and artificial eyes from the other datasets. Note that Method-I is trained using only bonafide samples, and Method-II is first trained using only bonafide samples and then fine-tuned using only 800 PA samples. All the trained algorithms are then tested using 3,913 bonafide samples (excluding Berc-Iris-Fake) and 3,279 PA samples corresponding to cosmetic contact lenses and Kindle display-attacks.

Experiment-7: Here, the other PAD algorithms are trained using 6,000 bonafide samples from Casia-Iris-Fake [136], and 6,187 PA samples from the other datasets consisting of only printed eyes and artificial eyes. Similar to the previous experiment, Method-I is trained using only bonafide samples while Method-II is first trained using only bonafide samples and then fine-tuned using only 800 PA samples. These algorithms are tested using 5,634 bonafide samples from other datasets (excluding Casia-Iris-Fake) and 5,556 PAs consisting of cosmetic contact lenses and Kindle display-attacks.

Table 2.6: Attack Presentation Classification Error Rate (APCER) at 0.1%, 1% and 5% Bonafide Presentation Classification Error Rate (BPCER) of existing PAD algorithms and the proposed methods on unseen PAs as described in Experiment-6 and Experiment-7.
Lower the APCER, better is the performance. BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [?] Iris-TLPAD [21] Method-I Method-II APCER(@0.1%) APCER(@1%) APCER(@5%) 99.95 97.83 91.61 99.95 95.88 72.55 94.14 79.29 54.34 99.52 95.25 85.06 74.39 44.35 34.06 50.26 37.05 21.79 (a) Experiment-6 BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [?] Iris-TLPAD [21] Method-I Method-II APCER(@0.1%) APCER(@1%) APCER(@5%) 100 98.31 75.05 99.98 82.04 70.09 100 96.99 83.05 99.63 96.33 89.87 66.98 53.69 38.72 61.80 39.40 26.49 (b) Experiment-7 attacks. 2.4.4.3 Unseen Attack: Printed Eyes and Artificial Eyes In this section, we evaluate the performance of the PAD algorithms when printed eyes and artificial eyes are used as unseen presentation attacks. Experiment-8: In this experiment, the other PAD algorithms are trained using 2,778 bonafide samples from Berc-Iris-Fake and 3,093 PA samples from the other datasets. The PA samples in training set consists of only cosmetic contact lenses and Kindle display-attacks. The proposed Method-I is trained using only bonafide samples and Method-II is first trained using only bonafide samples and then fine-tuned using only 500 PA samples from the training set. The test set consists of 3,450 bonafide samples and 3,347 PA samples corresponding to printed eyes and artificial eyes. Experiment-9: Here, the PAD algorithms are trained using 6,000 bonafide samples from Casia- Iris-Fake [136] and 5,681 PA samples from the other datasets corresponding to cosmetic lenses and Kindle display-attacks. Similar to the previous experiment, Method-I is trained using only bonafide samples while Method-II is first trained using only bonafide samples and then fine-tuned using only 500 PA samples. These algorithms are tested using 8,517 bonafide samples from other datasets (excluding Casia-Iris-Fake) and 8,865 PAs corresponding to printed eyes and artificial eyes. 54 Table 2.7: Attack Presentation Classification Error Rate (APCER) at 0.1%, 1% and 5% Bonafide Presentation Classification Error Rate (BPCER) of existing PAD algorithms and the proposed RD- PAD (Method-I and Method-II) on unseen PAs as described in Experiment-8 and Experiment-9. Lower the APCER, better is the performance. BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [?] Iris-TLPAD [21] Method-I Method-II APCER(@0.1%) APCER(@1%) APCER(@5%) 90.29 90.29 87.75 90.26 80.79 66.78 97.88 93.55 81.35 N/A 27.49 17.06 58.34 45.72 36.74 37.13 27.19 19.56 (a) Experiment-8 BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [?] Iris-TLPAD [21] Method-I Method-II APCER(@0.1%) APCER(@1%) APCER(@5%) 95.37 90.74 81.52 99.92 94.19 77.69 100 98.90 93.68 N/A 34.86 17.59 60.71 38.30 25.06 32.49 23.30 17.58 (b) Experiment-9 Figure 2.5: ROC curve demonstrating the performance of existing PAD algorithms and the proposed methods on known PAs (as described in Experiment-5). 2.4.4.4 Analysis The results in Table 2.5 show that deep networks such as VGG-16, Iris-TLPAD and the proposed methods achieve good (low) Attack Presentation Classification Error Rate (APCER) at 5% Bonafide 55 (a) (b) Figure 2.6: ROC curves demonstrating the performance of existing PAD algorithms and the proposed RD-PAD methods on unseen PAs, as described in Experiment-6 and Experiment-7. 56 (a) (b) Figure 2.7: ROC curves demonstrating the performance of existing PAD algorithms and the proposed RD-PAD methods on unseen PAs, as described in Experiment-8 and Experiment-9. 57 Presentation Classification Error Rate (BPCER)4 when trained and tested on the same type of PAs. 
However, Tables 2.6 and 2.7 show that current PAD algorithms do not perform well when tested on unseen PAs. In Experiment-6, APCERs of 34.06% and 21.79% are obtained at 5% BPCER for the proposed Method-I and Method-II, respectively. On the other hand, current PAD algorithms obtained a much higher APCER, thereby highlighting the shortcomings of these algorithms for unseen PA detection. In Experiment-8 and Experiment-9, Method-II and TL-PAD obtained comparable performance at 5% BPCER for unseen printed eyes and artificial eyes. However, TL-PAD failed to produce any valid output (N/A) at 0.1% BPCER and has a higher APCER than Method-II at 1% BPCER. Also, TL-PAD performed poorly on unseen cosmetic contact lenses and Kindle display-attacks in Experiment-6 and Experiment-7, indicating its shortcoming in handling these unseen attack types. Comparing all the results, we conclude that the proposed algorithms generalize better over both seen and unseen attacks (see Figures 2.5, 2.6 and 2.7). Additionally, in [21], TL-PAD was evaluated on a subset of the LivDet-Iris 2017 dataset and achieved better performance than the three participating algorithms in that competition. Hence, our results also provide an indirect comparison against the algorithms published in LivDet-Iris 2017.

4APCER is equivalent to (1 - True Detection Rate (TDR)), while BPCER is equivalent to False Detection Rate (FDR).

2.5 Summary

In this work, we designed a new technique based on RaSGAN to generate synthetic irides. Our experimental results suggest that there are multiple applications for synthetic iris images: (1) they can be used to imitate real iris images, reducing the burden of large-scale data collection; (2) they can efficiently model bonafide samples (see Figure 2.4), making them potential presentation attack vectors; and (3) they can be used to train existing PAD algorithms for "unseen" presentation attack detection. While this method can generate realistic looking images with a low FID score, the biometric content of the generated images is not unique, i.e., identities in irides generated by RaSGAN show high resemblance to the training data and to each other (discussed in Chapter 4). Also, this method is not scalable to multiple domains, i.e., one model can only learn to generate a single type of distribution or style. We address this in the next chapter by developing a multi-domain synthetic iris generation method with style transfer.

Apart from analyzing the applications of the generated irides, we also proposed a one-class PA detection method for improved unseen PA detection. To facilitate this, we harnessed the relativistic discriminator of a RaSGAN that is trained to distinguish between bonafide iris samples and the corresponding synthetically generated iris samples. We hypothesize that such a discriminator more effectively learns the distribution of bonafide samples and will, therefore, reject PA samples that do not fall within this distribution. In this regard, the discriminator behaves as a one-class classifier since, in principle, it does not require data from PA samples during the training stage. Experimental results demonstrate the efficacy of the proposed method over current state-of-the-art PAD methods, especially on unseen attacks. However, for seen PAs, current deep learning-based binary PAD methods [98, 127] outperform the proposed method, highlighting its limitations.
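To make the one-class usage of the discriminator concrete, the sketch below shows how a trained relativistic discriminator could score test images and how an operating threshold could be chosen from bonafide validation scores at a target BPCER, along with a possible fine-tuning step for Method-II. The scoring rule, thresholding strategy and fine-tuning hyper-parameters are assumptions for illustration, not the exact implementation used in the experiments.

```python
import numpy as np
import torch

@torch.no_grad()
def pa_scores(d_r, images):
    """PA score = 1 - sigmoid(D_R logit): high for samples that do not resemble bonafide irides."""
    return (1.0 - torch.sigmoid(d_r(images))).cpu().numpy()

def threshold_at_bpcer(bonafide_val_scores, target_bpcer=0.05):
    """Choose the score threshold that misclassifies roughly target_bpcer of bonafide samples."""
    return np.quantile(bonafide_val_scores, 1.0 - target_bpcer)

# Method-I: score test images directly with D_R trained only against RaSGAN samples.
# Method-II (assumed recipe): briefly fine-tune D_R with bonafide (label 1) and a few known PAs (label 0).
def finetune_method2(d_r, loader, epochs=5, lr=1e-5):
    opt = torch.optim.Adam(d_r.parameters(), lr=lr)
    bce = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for imgs, is_bonafide in loader:        # is_bonafide: 1 for bonafide, 0 for PA
            loss = bce(d_r(imgs), is_bonafide.float())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return d_r
```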
59 CHAPTER 3 CYCLIC IMAGE TRANSLATION GENERATIVE ADVERSARIAL NETWORK (CIT-GAN) In this work, we discuss the lack of iris samples from both classes- bonafide and presentation attacks- in terms of number and diversity and how it affects the performance of various presentation attack detection methods. We aim to solve this problem by proposing a novel image translative GAN, known as CIT-GAN, to generate bonafide as well as different types of presentation attacks in NIR spectrum using a unified architecture. This work has been published in [158]. 3.1 Introduction As mentioned previously, the unique texture of the iris has made iris-based recognition sys- tems desirable for human recognition in a number of applications [71]. However, these systems are increasingly facing threats from presentation attacks (PAs) where an adversary attempts to obfuscate their own identity, impersonate someone’s identity, or create a virtual identity [117]. Grasping the threats posed by PAs, researchers have been working on devising methods for iris presentation attack detection (PAD) that aim to distinguish between bonafide and PAs. In [60,115], researchers used textural descriptors like Local Binary Pattern (LBP) and multi-scale binarized statistical image features (BSIF) to detect print attacks. Kohli et al. [81] proposed a variation of LBP to obtain textual information from iris images that helps in detecting cosmetic contact lens. More recently, deep features from Convolutional Neural Networks (CNNs) have been used to detect multiple iris presentation attacks [21] [65]. Yadav et al. [157] utilized the Relativistic Discriminator from a RaSGAN as a one-class classifier for PA detection. Gupta et al. [59] introduced a novel deep learning-based PAD method named MVANet, which harness multiple convolutional layers for enhanced representation and employs a fixed base model to mitigate the computational complex- ities often associated with training deep neural networks. Comprehensive evaluations conducted across various datasets, employing cross-database training-testing configurations, underscore the effectiveness of MVANet in its capacity to generalize over unseen Presentation Attacks (PAs). 60 In [128], Sharma and Ross, proposed D-NetPAD, an iris PAD method founded on the DenseNet convolutional neural network architecture. D-NetPAD demonstrates adaptability in generalizing across diverse PA artifacts, sensors, and datasets. Through a series of experiments conducted on various iris PA datasets, they substantiated the efficacy of D-NetPAD in achieving generalized PA detection. While these methods report high accuracy for iris PA detection, but their performance can be negatively impacted by the absence of a sufficient number of samples from different PA classes [34]. Therefore, we can conclude that current iris PAD methods need a copious amount of training data corresponding to different PA classes and scenarios. Unfortunately, in the real world, such a dataset is hard to acquire. In previous chapter, we leveraged RaSGAN to synthesize realistic and high resolution bonafide iris images. Nevertheless, the network’s scalability is constrained when it comes to handling various domains. To generate both bonafide and presentation attack (PA) images, it would necessitate the training of separate RaSGANs for each class (PA class has many sub-class such as printed eyes, cosmetic contact lens, etc.). This approach does not provide a viable solution to address the issue at hand. 
With recent advances in the field of deep learning, researchers have proposed different methods based on Convolutional Autoencoders [132,143] and Generative Adversarial Networks (GANs) [57] for image-to-image style translation. Here, image-to-image translation refers to learning a mapping between different visual domain categories each of which has its own unique appearance and style. Gatys et al. [53] proposed a neural architecture that could separate image content from style and then combine the two in different combinations to generate new natural looking styles. Their paper mainly focused on learning styles from well known artworks to generate new high quality and natural looking artwork. Karras et al. [77] introduced StyleGAN that uses a non-linear mapping function to embed a style driven latent code to generate images with different styles. However, since the input to the generator in StyleGAN is a noise vector, non-trivial efforts are required to transform image from one domain to another. Some researchers overcame this issue by enforcing an overlay between generator’s input and output for diversity in generated images using either marginal matching [7] or diversity regularization [163]. Others approached style transfer with the 61 guidance of some reference images [20, 24]. However, these methods are not scalable to more than two domains and often show instability in the presence of multiple domains [26]. Choi et al. [25, 26] proposed to solve this problem by using a unified GAN architecture called StarGAN v2 for style transfer that can generate diverse images across multiple domains. StarGAN v2 uses a single generator with a mapping and Styling Network to learn diverse mappings between all available domains. This method is scalable to multiple domain and aims to transfer style from source to target domain using a reference image. While this method offers a viable solution to our research problem, the images generated by StarGAN v2 needs improvement with respect to quality, especially while transferring style from one domain to another [158, 159]. In this work, we propose a GAN architecture that uses a novel Styling Network to drive the translation of input image into multiple target domains. Here, the generator takes an image as input along with a domain label and then generates images with characteristics of the target domains. Apart from the domain label, the generative capability of the network is enhanced using a multi- task Styling Network that learns the style codes for multiple PA domains and helps the generator to synthesize images reflecting the style components of the target PA domains. The domain- specific style characteristics learned using Styling Network depend on both style loss and domain classification. This ensures variability in style characteristics within each domain. Since there are multiple domains, the discriminator has multiple output branches to decide if a given image is real or synthetic for each of the domains. The primary contributions of this research are as follows: • We propose a unified GAN architecture, referred to as Cyclic Image Translation Generative Adversarial Network (CIT-GAN), with novel Styling Network to generate realistic looking and high resolution synthetic images for multiple domains. The quality of generated samples is evaluated using Fréchet Inception Distance (FID) Score [119]. • We demonstrate the usefulness of the synthetically generated data to train state-of-the-art iris PAD methods and improve their accuracy. 
62 Figure 3.1: Schematic of the proposed Cyclic Image Translation Generative Adversarial Network (CIT-GAN). The proposed architecture has three important components: (a) Generator: Unlike a standard generator, this network takes as an image 𝑋 from a domain with label as input (represented as [1,0,0] in the figure) and outputs an image 𝑌 ′ with style similar to a reference image 𝑌 with domain label [0,1,0]. (b) Styling Network: The image-to-image translation to multiple domains is driven by a Styling Network that learns the style code for each given domain. (c) Discriminator: Unlike a standard discriminator, the discriminator in the proposed method has multiple branches each of which determines whether the input image is real or synthetic pertaining to that domain. 3.2 Proposed Method Let 𝒙 ∼ X be an input image and 𝒅 ∼ 𝔇 be an arbitrary domain from domain space 𝔇. The proposed method aims to translate image 𝒙 to synthetic image 𝒚 with style characteristics of domain 𝒅. This is achieved using a Styling Network 𝑆 that is trained to learn domain specific style codes, and then train 𝐺 to generate synthetic images with the given target style codes (see Figure 3.2). 3.2.1 Generative Adversarial Network Unlike standard GAN, the generative adversarial network in the proposed method has been updated to include domain level information. These changes are reflected in each component of the proposed architecture (as shown in 3.1) : • Generator: For image-to-image translation between multiple domains, 𝐺 takes as input 𝒙 ∈ (𝒙, 𝒍), where 𝒙 is an image and 𝒍 is the domain label, and translates it to an image 𝐺 (𝒙, 𝒔) with the desired style code 𝒔. The style code 𝒔 is facilitated by Styling Network 𝑆 and injected into G. 63 Figure 3.2: Examples of image translation from source to reference domain via CIT-GAN using domain specific styling vector obtained from the Styling Network. • Discriminator: Discriminator in the proposed architecture has multiple branches, where each branch 𝐷𝑑 decides whether the input image 𝒙 is a real image in domain 𝒅 or a synthetic image. With the new objective for the generative adversarial network, the adversarial loss can be updated as, L𝑎𝑑𝑣 = E𝒙,𝒅 [𝑙𝑜𝑔(𝐷 𝒅 (𝒙))] 𝒅′ (𝐺 (𝒙, 𝒔′)))]. 𝒙,𝒅′ [𝑙𝑜𝑔(1 − 𝐷 +E (3.1) Here, 𝐷 𝒅 (𝑥) outputs a decision on image 𝒙 for domain branch 𝒅. The Styling Network 𝑆 takes an image 𝒚 from target domain 𝒅′ and outputs a style code 𝒔′. 𝐺 (𝒙, 𝒔′) generates image 𝑦′ with style characteristics of target domain 𝑑′. 3.2.2 Styling Network Given an input image 𝒙 belonging to domain 𝒅, the Styling Network 𝑆 encodes the image into a style code 𝒔. Similar to 𝐷, the Styling Network 𝑆 is a multi-task network that learns the style code for an input image and injects the style code into 𝐺 to generate images with the given style codes. 64 This is achieved using [26], L𝑠𝑡𝑦𝑙𝑒 = E 𝒙,𝒅′ [∥𝒔′ − 𝑆 𝒅′ (𝐺 (𝒙, 𝒔′)) ∥]. (3.2) Here, 𝑠′ = 𝑆(𝑦) is the style code of reference image 𝒚 belonging to target domain 𝒅′. This ensures that 𝐺 generates images with the specified style code. However, poor quality synthetic data in the initial training iterations can affect the quality of the domain specific style codes learned by 𝑆. To avoid this, we introduce a domain classification loss L𝑐𝑙𝑠 at the shared layer of 𝑆 from soft-max layer (as shown in Figure 3.1) to ensure that the learnt style code aligns with the correct domain. Further, this helps the Styling Network to learn style vectors (or feature characteristics) of varying samples from same domain. L𝑐𝑙𝑠 = −𝑙𝑜𝑔𝑃(𝐷 = 𝒅|𝑋 = 𝒙). 
(3.3) Here, 𝒅 is the true domain of input 𝒙. 3.2.3 Cycle Consistency While translating images from the source domain to the domains depicted by the reference images, it is important to preserve some characteristics of the input images (such as geometry, pose and eye lashes in case of irides). This is achieved using the cycle consistency loss [25], L𝑐𝑦𝑐𝑙𝑒 = E 𝒙,𝒅,𝒅′ [∥𝒙 − 𝐺 (𝐺 (𝒙, 𝒔′), 𝑠) ∥]. (3.4) Here, 𝒔 = 𝑆(𝑥) represents the style code of input image with domain 𝒅, and 𝒔′ = 𝑆(𝑦) is the style code of reference image in target domain 𝒅′. This ensures that image 𝒙 with style 𝒔 can be reconstructed using synthetic image 𝐺 (𝒙, 𝒔′). Hence, the overall loss function for the proposed Cyclic Image Translation Generative Adver- sarial Network can be defined as: L𝑡𝑜𝑡𝑎𝑙 = L𝑎𝑑𝑣 + 𝜆1L𝑠𝑡𝑦𝑙𝑒 + 𝜆2L𝑐𝑙𝑠 + 𝜆3L𝑐𝑦𝑐𝑙𝑒. (3.5) Here, 𝜆1, 𝜆2 and 𝜆3 represent the hyper-parameters for each loss term. 65 Figure 3.3: Given are some examples of synthetic samples generated using the proposed Cyclic Image Translation Generative Adversarial Network (CIT-GAN). (a)-(b) represent synthetic cosmetic contact lenses, (c)-(d) are two different types of synthetically generated print images (one with whole iris image and other with pupil cut-out), and (e)-(f) are synthetic artificial eyes with (f) representing a doll eye. 3.3 Experimental Protocols In this section, we describe different experimental setups that are used to evaluate the quality and usefulness of synthetic PA samples generated using CIT-GAN. We evaluated the performance of different iris PAD methods viz., VGG-16 [53], BSIF [41], DESIST [81], D-NetPAD [128] and AlexNet [84] under these different experimental setups for analysis purposes. Note that D-NetPAD is one of the best performing PAD algorithms in the iris liveness detection competition (LivDet-20 edition) [36]. 3.3.1 Datasets Used In this research, we utilized five different iris PA datasets, viz., Casia-iris-fake [136], Berc-iris- fake [91], NDCLD15 [41], LivDet2017 [162] and MSU-IrisPA-01 [156] for training and testing different iris presentation attack detection (PAD) algorithms. These iris datasets contain bonafide images and images from different PA classes such as cosmetic contact lenses, printed eyes, artificial eyes and kindle-display attack (as shown in Figure 2.1). The images in these datasets are pre- processed and cropped to a size of 256x256 around the iris using the coordinates from a software called VeriEye.1 The images that were not properly processed by VeriEye were discarded from the datasets as this research focuses primarily on image synthesis. This give us a total of 24,409 1www.neurotechnology.com/verieye.html 66 bonafide irides, 6,824 cosmetic contact lenses, 680 artificial eyes and 13,293 printed eyes. 3.3.2 Image Realism Assessment The proposed architecture is trained using 6,450 bonafide images, 2,104 cosmetic contact lenses, 4,482 printed eyes and 276 artificial eyes randomly selected from the aforementioned datasets. The trained network is then utilized to generate synthetic PA samples. To achieve this, 6,000 bonafide images were utilized as source images. The source images are then translated to different PA classes using 2,000 printed eyes, 2,000 cosmetic contact lens and 276 artificial eyes as reference images. Using this approach, we generated 8,000 samples for each PA class. The generated samples from CIT-GAN obtained an average FID score of 32.79. 
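The FID protocol used here, and for the comparisons reported next, can be sketched as follows. The thesis does not specify which FID implementation was used; the sketch below relies on the torchmetrics implementation as one possibility, and the handling of single-channel NIR crops (resizing and channel replication before feature extraction) is an assumption.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

def fid_between(real_batches, synthetic_batches):
    """FID between two image sets. Batches are assumed to be uint8 tensors of shape
    (N, 3, 299, 299); single-channel NIR crops would first be resized and repeated
    across three channels. Lower FID indicates closer distributions."""
    fid = FrechetInceptionDistance(feature=2048)
    for real in real_batches:
        fid.update(real, real=True)
    for fake in synthetic_batches:
        fid.update(fake, real=False)
    return fid.compute().item()

# Example usage (illustration only):
# real = [torch.randint(0, 255, (32, 3, 299, 299), dtype=torch.uint8)]
# fake = [torch.randint(0, 255, (32, 3, 299, 299), dtype=torch.uint8)]
# print(fid_between(real, fake))
```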
For comparison purposes, we used the same train and evaluation setup to generate synthetic samples using Star-GAN [25], Star-GAN v2 [26] and Style-GAN [77]. As mentioned before, Style- GAN and Star-GAN are not well-equipped to handle multi-domain image translation. Therefore, they obtained a high average FID score of 86.69 and 44.76, respectively. On the other hand, Star-GAN v2 is equipped to handle multi-domains using a styling and mapping network. A trained Star-GAN v2 utilizes the mapping network to generate diverse style codes to diversify images. The synthetic iris PAs generated using this method were diverse in nature, but failed to capture the true characteristics of PAs like artificial eyes. Hence, the average FID score of the generated image using Star-GAN v2 was 38.81 - much lower than that of Style-GAN and Star-GAN, but still a bit higher than CIT-GAN. This can also be seen in the FID score distribution in Figure 3.4 that compares the synthetically generated data using Star-GAN v2 with that of CIT-GAN. 3.3.3 Utility of Synthetically Generated Images In this section, we describe different experimental setups that are used to evaluate the quality and usefulness of synthetic PA samples generated using CIT-GAN. We evaluated the performance of different iris PAD methods viz., VGG-16 [53], BSIF [41], DESIST [81], D-NetPAD [128] and AlexNet [84] under these different experimental setups for analysis purposes. Note that D-NetPAD 67 Figure 3.4: Comparing the FID score distributions of Star-GAN v2 [26] and CIT-GAN for each synthetically generated PA domain. A lower FID is better. Table 3.1: Experiment-1: True Detection Rate (TDR in %) at 0.1%, 0.2% and 1.0% False Detection Rate (FDR) of existing iris PAD algorithms in baseline experiment (referred to as Experiment-1) when trained using imbalanced samples across different PA classes. BSIF+SVM [41] Fine-Tuned VGG-16 [53] DESIST [81] Fine-Tuned AlexNet [84] D-NetPAD [128] TDR (@0.1%) TDR (@0.2%) TDR (@1.0%) 3.32 6.15 28.11 85.25 83.86 89.07 4.25 5.85 17.15 86.10 87.29 90.51 87.94 88.91 92.54 Table 3.2: Experiment-2: True Detection Rate (TDR in %) at 0.1%, 0.2% and 1.0% False Detection Rate (FDR) of existing iris PAD algorithms in Experiment-2 to evaluate the equivalence between real and synthetic PAs. When comparing with the performances in Table 2.3, it can be seen that substituting some of the real PAs in the training set with synthetically generated samples has a limited impact on the performance of PAD algorithms, especially at a FDR of 1%. BSIF+SVM [41] Fine-Tuned VGG-16 [53] DESIST [81] Fine-Tuned AlexNet [84] D-NetPAD [128] TDR (@0.1%) TDR (@0.2%) TDR (@1.0%) 11.09 18.06 29.38 78.29 83.03 87.39 4.48 7.43 17.43 79.90 84.13 89.66 84.27 86.04 89.55 is one of the best performing PAD algorithms in the iris liveness detection competition (LivDet-20 edition) [36]. Experiment-1: This is the baseline experiment that demonstrates the performance of the current iris PAD methods on the previously mentioned datasets with imbalanced samples across different PA classes. The PAD methods are trained using 14,970 bonafide samples and 10,306 PA samples consisting of 276 artificial eyes, 4,014 cosmetic contact lenses and 6,016 printed eyes. The test set consists of 9,439 bonafide samples and 9,896 PA samples corresponding to 404 artificial eyes, 2,720 cosmetic contact lenses and 6,772 printed eyes. 
Experiment-2: In this experiment, our aim is to evaluate the equivalence between real PAs and 68 Table 3.3: Experiment-3: True Detection Rate (TDR in %) at 0.1%, 0.2% and 1.0% False Detection Rate (FDR) of existing iris PAD algorithms in Experiment-3 to evaluate the efficacy of proposed method, CIT-GAN, in generating synthetic PA samples that captures the real PA distribution across various PA domains. BSIF+SVM [41] Fine-Tuned VGG-16 [53] DESIST [81] Fine-Tuned AlexNet [84] D-NetPAD [129] TDR (@0.1%) TDR (@0.2%) TDR (@1.0%) 7.57 10.51 29.43 76.12 80.98 85.81 2.31 4.70 15.29 78.89 82.66 88.37 82.26 84.54 88.86 synthetically generated PAs. Therefore, the iris PAD methods are trained using 14,970 bonafide samples and PAs consisting of both real PA images and synthetic PA images. The real PA dataset has 138 artificial eyes, 2,007 cosmetic contact lenses and 3,008 printed eyes. The synthetic PA dataset is generated using the remainder of the real PA dataset as reference images (i.e., 138 artificial eyes, 2,007 cosmetic contact lenses and 3,008 printed eyes) in order to capture their style characteristics in the generated dataset. As before, the test set consists of 9,439 bonafide and 9,896 PA samples corresponding to 404 artificial eyes, 2,720 cosmetic contact lenses and 6,772 printed eyes. Experiment-3: This experiment aims to evaluate the efficacy of the proposed method, CIT-GAN, in generating synthetic PA samples that represent the real PA distribution across various PA domains. Here, the iris PAD methods are trained using 14,970 bonafide samples and synthetically generated 276 artificial eyes, 4,014 cosmetic contact lenses and 6,016 printed eyes. The test set consists of 9,439 bonafide and 9,896 PA samples corresponding to 404 artificial eyes, 2,720 cosmetic contact lenses and 6,772 printed eyes. Experiment-4: As mentioned in Experiment-1, current iris PAD methods are trained and tested on imbalanced samples from PA classes thereby affecting their accuracy. To overcome this, we train the iris PAD methods using 14,970 bonafide samples and a balanced set of 15,000 PA samples corresponding to 276 artificial eyes, 4,014 cosmetic contact lenses and 5000 printed eyes that are real; and 4,724 artificial eyes and 986 cosmetic contact lenses that are synthetic. This balances the number of samples across PA classes. The testing was done on 9,439 bonafide samples and 9,896 PA samples consisting of 404 artificial eyes, 2,720 cosmetic contact lenses and 6,772 printed eyes. 69 Table 3.4: Experiment-4: True Detection Rate (TDR in %) at 0.1%, 0.2% and 1.0% False Detec- tion Rate (FDR) of existing iris PAD algorithms in Experiment-4. When comparing against the performance numbers in Table 2.3, it can be seen that training using balanced samples from each PA class/domain helps improve the performance of current iris PAD algorithms. BSIF+SVM [41] Fine-Tuned VGG-16 [53] DESIST [81] Fine-Tuned AlexNet [84] D-NetPAD [129] TDR (@0.1%) TDR (@0.2%) TDR (@1.0%) 14.39 22.91 51.11 69.40 75.34 91.60 2.59 5.38 21.41 79.26 82.80 92.70 90.38 94.19 97.89 3.4 Results & Analysis In this section, we discuss the results obtained for the four different experiments described in the previous section. Experiment-1 is the baseline experiment that evaluates the performance of various iris PAD methods. The training set for this experiment contains 14,970 bonafide and 10,306 PA samples consisting of 276 artificial eyes, 4,014 cosmetic contact lenses and 6,016 printed eyes. 
Due to imbalance in the number of samples across various PA domains, the performance of the PAD methods is affected. This becomes apparent when comparing the results of Experiment-1 with that of Experiment-4 where PAD methods are trained using 9,439 bonafide samples and a balanced number of PA samples (i.e., 5,000 samples from each PA domain) containing both real and synthetic PAs. As seen from the results in Table 3.1 and Table 3.4, performance for each PAD method improves in Experiment-4. For example, in the case of D-NetPAD, the TDR at a 1% FDR improved from 92.54% in Experiment-1 to 97.89% in Experiment-4 (as shown in Figure 3.7). A huge increase in performance was also noticed for BSIF+SVM where TDR improved from 28.11% in Experiment-1 to 51.11% in Experiment-4, at a FDR of 1%. In addition, the equivalence of synthetically generated PA samples and real PA samples was established using Experiment-2 and Experiment-3. In Experiment-2, some of the real PA samples in the training set were replaced with synthetically generated PAs. Comparing the performance in Table 3.1 and Table 3.2, a very slight difference in PAD performance is observed (see Figure 3.5). Similarly, in Experiment-3, where all the real PAs are replaced with synthetically generated PA samples, only a slight decrease in performance was seen for the PAD methods (as shown in Figure 3.6) signifying underlying similarities between real and synthetically generated data. 70 3.5 Summary PA detection methods such as [59,65,128] have demonstrated their effectiveness in generalized PA detection through extensive experimentation on various iris PA datasets. While these methods achieve high accuracy in iris PA detection, their performance can be adversely affected when there is an insufficient number of samples available from diverse PA classes [34]. Consequently, it can be inferred that existing iris deep learning-based PAD methods require a substantial amount of training data encompassing various PA classes and scenarios. Unfortunately, acquiring such a dataset is a formidable challenge in real-world settings. To address this challenge, we proposed a novel GAN architecture known as the Cyclic Image Translation Generative Adversarial Network (CIT-GAN). CIT-GAN incorporates a unique Styling Network to guide the transformation of input images into multiple target domains. In this setup, the generator accepts an input image with a domain label, generating images with the characteristics of the specified target domain. To further enhance the generative capacity of the network, a multi-task Styling Network is employed. This network learns style codes for multiple PA domains, aiding the generator in synthesizing images that capture the style elements of the target domains. The domain-specific style features acquired through the Styling Network are influenced by both style loss and domain classification, ensuring variability in style characteristics within each domain. As there are multiple domains to consider, the discriminator incorporates multiple output branches to determine whether a given image is real or synthetic for each of the domains. The results obtained on various experimental setups show the equivalence of synthetically generated PA samples and real PA samples, showing how realistic the generated PA samples are. Furthermore, the results in Table 3.1 and 3.2 demonstrate that the performance of the iris PAD methods can be improved by adding synthetically generated data to different PA classes for balanced training. 
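To make the combined objective in Eq. (3.5) concrete, the sketch below shows how the four loss terms could be assembled for one generator update. The network interfaces (a Styling Network returning a style code plus domain logits from its shared layer, and a multi-branch discriminator returning one logit per domain), the non-saturating form of the adversarial term, the L1 norms, and the lambda values are assumptions made for illustration; they are not the exact implementation.

```python
import torch
import torch.nn.functional as F

def citgan_generator_step(G, D, S, x_src, x_ref, d_ref,
                          lam_style=1.0, lam_cls=1.0, lam_cycle=1.0):
    """One generator update combining the loss terms of Eqs. (3.1)-(3.5).
    G(x, s): generator conditioned on a style code; S(x) -> (style_code, domain_logits);
    D(x): (N, num_domains) logits, one branch per domain; d_ref: target-domain labels."""
    s_ref, _ = S(x_ref)                     # style code of the reference (target-domain) image
    x_fake = G(x_src, s_ref)                # translate the source image into the target domain

    # Adversarial term: fool the target-domain branch of the multi-branch discriminator (Eq. 3.1).
    fake_logit = D(x_fake).gather(1, d_ref.view(-1, 1))
    loss_adv = F.binary_cross_entropy_with_logits(fake_logit, torch.ones_like(fake_logit))

    # Style reconstruction: the Styling Network should recover s_ref from the fake image (Eq. 3.2).
    s_fake, _ = S(x_fake)
    loss_style = torch.mean(torch.abs(s_fake - s_ref))

    # Domain classification at the shared layer of S for a real input of known domain (Eq. 3.3).
    _, dom_logits_ref = S(x_ref)
    loss_cls = F.cross_entropy(dom_logits_ref, d_ref)

    # Cycle consistency: map the fake image back using the source style code (Eq. 3.4).
    s_src, _ = S(x_src)
    loss_cycle = torch.mean(torch.abs(G(x_fake, s_src) - x_src))

    # Total objective (Eq. 3.5).
    return loss_adv + lam_style * loss_style + lam_cls * loss_cls + lam_cycle * loss_cycle
```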
71 Figure 3.5: Comparing performance of D-NetPAD in Experiment-1 and Experiment-2 to evaluate the equivalence of synthetic PA samples in replacing real PA samples. D-NetPAD is one of the best performing PAD algorithms in iris liveness detection competition (LivDet-20 edition) [36]. 72 Figure 3.6: Comparing performance of D-NetPAD in Experiment-1 and Experiment-3 to evaluate the efficacy of the proposed method, CIT-GAN, in generating synthetic PA samples that represent the real PA distribution across various PA domains. 73 Figure 3.7: Comparing the performance of D-NetPAD in Experiment-1 and Experiment-4 to empha- size that the performance of current iris PAD methods are affected due to training with imbalanced samples from PA classes (Experiment-1). Improved performance is reported in Experiment-4 that utilizes synthetic samples for balanced training. 74 CHAPTER 4 IWARPGAN: DISENTANGLING IDENTITY AND STYLE TO GENERATE SYNTHETIC IRIS IMAGES So far, we focused on generating partially synthetic iris images in NIR spectrum where identities in the generated images is not the focus of our study. In this chapter, we explore the capabilities and weaknesses of current methods in generating iris images with identities that are different from training dataset and has sufficient intra-class variations to represent real iris datasets. We also proposed a novel method, known as iWarpGAN, to generate fully synthetic iris images NIR spectrum with identities that are different from training dataset (with both inter and intra-class variations). This work has been published in [159]. 4.1 Introduction With the advent of technology, iris sensors are now available in commercial and personal devices, paving the way for secure recognition and access control [55, 111]. However, the accuracy of iris recognition systems relies heavily on the quality and size of the dataset used for training. The limited availability of large-scale iris datasets due to the difficulty in collecting operational quality iris images, has become a major challenge in this field. For example, most of the iris datasets available in the literature have frontal view images [5, 86], and the number of subjects and total number of samples in these datasets are limited. Further, in some instances, collecting and sharing iris datasets may be stymied due to privacy or legal concerns [145]. Therefore, researchers have been studying the texture and morphology of the iris in order to model its unique patterns and to create large-scale synthetic iris datasets. For example, Cui et al. [32] utilized principal component analysis with super-resolution to generate synthetic iris images. Shah and Ross [126] used a Markov model to capture and synthesize the iris texture followed by embedding of elements such as spots and stripes to improve visual realism. In [173], Zuo et al. analyzed various features of real iris images, such as texture, boundary regions, eyelashes, etc. and used these features to 75 create a generative model based on the Hidden Markov Model for synthetic iris image generation. These methods while successfully generating synthetic iris images, are found lacking in terms of quality (visual realism and good-resolution) and diversity in the generated samples [156]. Over the past few years, deep learning-based approaches have set a benchmark in various fields including synthetic image generation and attribute editing, using Convolutional Autoencoders (CAEs) [143] and Generative Adversarial Networks (GANs) [56, 57]. 
In [82, 156, 158], authors proposed GAN-based synthetic image generation methods that input a random noise vector and output a synthetic iris image. While these methods address some of the concerns mentioned previously, the generated images are often similar to each other [140]. Additionally, due to insufficient number of training samples, the generator is often over-trained to synthesize images with patterns seen during training [140], which affects the uniqueness of the synthesized iris images. In this work, we address the following limitations of current synthetic iris generators: (1) difficulty in generating realistic looking and high resolution synthetic iris images, (2) failure to incorporate inter and intra class variations in the generated images, (3) generating images that are similar to the training data, and (4) utilizing domain-knowledge to guide the synthetic generation process. We achieve this by proposing iWarpGAN that aims to disentangle identity and style using two transformation pathways: (1) Identity Transformation and (2) Style Transformation. The goal of Identity Transformation pathway is to transform the identity of the input iris image in the latent space to generate identities that are different from the training set. This is achieved by learning RBF-based warp function, 𝑓 𝑝, in the latent space of a GAN, whose gradient gives non-linear paths along the 𝑝𝑡ℎ family of paths for each latent code 𝑧 ∈ R. The Style Transformation pathway aims to generate images with different styles, which are extracted from a reference iris image, without changing the identity. Therefore, by concatenating the reference style code with the transformed identity code, iWarpGAN generates iris images with both inter and intra-class variations. Thus, the contributions of this research are as follows: • We propose a synthetic image generation method, iWarpGAN, which aims to disentangle identity and style in two steps: identity transformation and style transformation. 76 • We evaluate the quality and realism of the generated iris images using ISO/IEC 29794-6 Standard Quality Metrics which uses a non-reference single image quality evaluation method. • We show the utility of the generated iris dataset in training deep-learning based iris matchers by increasing the number of identities and overall images in the dataset. 4.2 Background As mentioned previously, GANs [56] are generative models that typically take a random noise vector as input and output a visually realistic synthetic image. A GAN consists of two main components: (1) Generative Network known as Generator (𝐺), and (2) Discriminative Network known as Discriminator (𝐷) that are in competition with each other. The Generator aims to generate realistic looking images that can fool the discriminator, while Discriminator (𝐷) aims to distinguish between real and synthetic images generated by 𝐺. In the literature, different methods have been proposed to generate generate realistic looking biometric images such as face, iris and fingerprint. Some of these methods are discussed below: • Generation using Random Noise: Kohli et al. [82] proposed a GAN-based approach to synthesize cropped iris images using iris Deep Convolution Generative Adversarial Network (iDCGAN). While this method generates realistic looking cropped iris images of size 64×64, unrealistic distortions and noise were observed when trained to generate high resolution images. In [156], Yadav et al. 
overcame this issue by utilizing Relativistic Average Standard Generative Network (RaSGAN) that aims to generate realistic looking and high resolution iris images. However, since RaSGAN generates synthetic images from a random noise vector, it is hard to generate irides with intra-class variations. Also, as shown in Figure 4.4, the uniqueness of generated images is limited and the network was often observed to repeat certain patterns, restricting the diversity in the generated dataset. Wang et al. [147] proposed a method for generating iris images that exhibit a wide range of intra- and inter- class variations. Their approach incorporates contrastive learning techniques to effectively 77 disentangle identity-related features, such as iris texture and eye orientation, from condition- variant features, such as pupil size and iris exposure ratio, in the generated images. While their method seems promising but the experiments presented in their paper [147] are not sufficient to comment on realism and uniqueness of the generated iris images. • Generation via Image Translation: Image translation refers to the process of translating an image from one domain to another by learning the mapping between various domains. Therefore, image translation GANs focus on translating a source image to the target domain with the purpose of either changing some style attribute in the source image or adding/mixing different styles together. For example, StyleGAN [77] learns a mapping to different styles in face images (such as hair color, gender, expression, etc.) using a non-linear mapping function that embeds the style code of the target domain into the generated image. Unlike StyleGAN, StarGAN [26] and CIT-GAN [158] require paired training data to translate a source image to an image with the attributes of the target domain using style code of a reference image. This forces the generator to learn mappings across various domains, making it scalable to multiple domains. However, when trained using real iris images, StarGAN and CIT-GAN were seen to assume the identity of the source image (as shown in Figures 4.6 and 4.7). So, both methods fail to generate irides whose identities are not present in the training dataset. There are other GAN-based methods in the literature that aim to edit certain portions of the image using warp fields or color transformations. Warp fields have been widely used for editing images such as modifying eye-gaze [52], semantically adding objects to an image [169], reconstructing facial features [165], etc. Dorta et. al [40] argues that warp fields are more comprehensive than pixel differences that allow more flexibility in terms of partial edits. Geng et al. [54] proposed WG-GAN that aims to fit a dense warp field to an input source image to translate it according to the target image. This method showed good results at low resolution, but the quality of synthetic data deteriorates at high resolution. Also, as mentioned earlier, the source-target relationship in WG-GAN can restrict the uniqueness of the output image. Dorta et al. [40] overcame these issues 78 by proposing WarpGAN that allows partial edits without the dependency on the source-target image pair. The generator takes as input a source image and a target attribute vector and then learns the warp field to make the desired edits in the source image. This method has been proven to make more realistic semantic edits in the input image than StarGAN and CycleGAN [170]. 
Further, with the ability of controlled or partial edits, WarpGAN provides the mechanism to generate images with intra-class variations. However, using a real image as input to the generator restricts the number of unique images that can be generated from this network. 4.3 Proposed Method In this section, we will discuss the proposed method, iWarpGAN, that has the capability to synthesize an iris dataset in such a way that: (1) it contains iris images with unique identities that are not seen during training, (2) generates multiple samples per identity, (3) it is scalable to hundred thousand unique identities, and (4) images are generated in real-time. Let 𝑥𝑑1 𝑠1 ∈ P be an input image with identity 𝑑1 and style 𝑠1, and another input image 𝑥𝑑2 𝑠2 ∈ P with identity 𝑑2 and style 𝑠2. Here, 𝑠1 and 𝑠2 denote image with attribute 𝑦. The attribute vector 𝑦 is a 12-bit binary vector, where the first 5 bits correspond to a one-hot encoding of angle, the next 5 bits correspond to a one-hot encoding of position shift, and the last 2 bits denote contraction and dilation, respectively. Here, angle and position define eye orientation and the shift of iris center in the given image. The possible angles are 0𝑜, 10𝑜, 12𝑜, 15𝑜, 18𝑜 and the possible position shifts are [0,0], [5,5], [10,10], [-10,10], [-10,-10]. For example, an image with angle 10𝑜, position shift [0,0] and dilation, the attribute vector 𝑦 will be [0,1,0,0,0,1,0,0,0,0,0,1]. The angle value defines the image orientation and position defines the offset of the iris center from the image center. Given 𝑥𝑑1 𝑠1 and 𝑥𝑑2 𝑠2 with identity 𝑑3 different from the training data and possessing the style attribute 𝑠2 from 𝑥𝑑2 𝑠2 . To achieve this, as shown in Figure 4.1, the framework of iWarpGAN has been divided into five parts: (1) Style Encoder, 𝐸𝑆, that encodes 𝑠2 , our aim is to synthesize a new iris image 𝑥𝑑3 style of the input image, (2) Identity Encoder, 𝐸𝐷, that learns an encoding to generate an identity different from the input image, (3) Generative Network, 𝐺, that uses encoding from both 𝐸𝐷 and 79 Figure 4.1: The proposed iWarpGAN consists of five parts: (1) Style Encoder, 𝐸𝑆, that aims to encode the style of the input image as 𝑠, (2) Identity Encoder, 𝐸𝐷, that aims to learn encoding 𝑑 that generates an identity different from the input image, (3) Generative Network, 𝐺, that uses encoding from both 𝐸𝐷 and 𝐸𝑆 to generate an image with a unique identity and the given style attribute, (4) Discriminator, 𝐷, that inputs either a real or synthetic image and predicts whether the image is real or synthetic and also emits an attribute vector 𝑦′ ∈ {angle, position, contraction, dilation of pupil}, and (5) Pre-trained Classifier, 𝐶, that computes the distance score between the real input image and the new identity generated by 𝐺. 𝐸𝑆 to generate an image with a unique identity and the given style attribute, (4) Discriminator, 𝐷, that predicts whether the image is real or synthetic and emits an attribute vector 𝑦′ Pre-trained Classifier, 𝐶, that returns the distance score between a real input image and new the and (5) identity generated by 𝐺. 4.3.1 Disentangling Identity and Style to Generate New Iris Identities Generally, the number of samples available in the training dataset is limited. This restricts the latent space learned by 𝐺 thereby limiting the number of unique identities generated by the trained GAN. 
Some GANs focus too much on editing or modifying style attributes in the images while generating previously seen identities in the training dataset [26, 156, 158]. This motivated us to divide the problem into two parts: (1) Learning new identities that are different from those in the training dataset, and (2) Editing style attributes for ensuring intra-class variation. Inspired by [166], 80 we achieve this by training the proposed GAN using two pathways - Style Transformation Pathway and Identity Transformation Pathway. Style Transformation Pathway: Similar to StyleGAN, this pathway entirely focuses on learning the transformation of the style. Therefore, this sub-path aims to train the networks 𝐸𝑆, 𝐷 and 𝐺, while keeping the networks 𝐸𝐷 and 𝐶 fixed. Input to the generator 𝐺 is the concatenated latent vector 𝑑 and 𝑠 to generate an iris image with style attribute 𝑦. 𝐺 tries to challenge 𝐺 by maximizing, L𝐺−𝑆𝑡𝑦 = E 𝑥𝑑𝑖 𝑠𝑖 ,𝑥 𝑑𝑗 𝑠 𝑗 ∼P𝑟𝑒𝑎𝑙 [𝐷 (𝐺 (𝐸𝐷 (𝑥𝑑𝑖 𝑠𝑖 ), 𝐸𝑆 (𝑥𝑑𝑗 𝑠 𝑗 , 𝑦)))] (4.1) Here, ¯𝑥 = (𝐺 (𝐸𝐷 (𝑥𝑑𝑖 competes with 𝐺 by minimizing, 𝑠𝑖 ), 𝐸𝑆 (𝑥𝑑𝑗 𝑠 𝑗 , 𝑦))) is the image generated by 𝐺. At the same time, 𝐷 L𝐷−𝑆𝑡𝑦 = E 𝑥𝑑𝑖 𝑠𝑖 ,𝑥 𝑑𝑗 𝑠 𝑗 ∼P𝑟𝑒𝑎𝑙 [𝐷 (𝐺 (𝐸𝐷 (𝑥𝑑𝑖 𝑠𝑖 ), 𝐸𝑆 (𝑥𝑑𝑗 𝑠 𝑗 , 𝑦)))] −E𝑥 [𝐷 (𝑥)] (4.2) In order to enforce that an iris image is generated with style attributes 𝑦, the following loss function is utilized: L𝑆𝑡𝑦−𝑅𝑒𝑐𝑜𝑛 = ||𝐸𝑆 ( ¯𝑥) − 𝐸𝑆 (𝑥𝑑𝑗 𝑠 𝑗 )||2 2 (4.3) Identity Transformation Pathway: This pathway focuses on learning identities in latent space that are different from the training dataset. Therefore, this sub-path aims to train the networks 𝐸𝐷, 𝐷 and 𝐺, while keeping the networks 𝐸𝑆 and 𝐶 fixed. Therefore, L𝐺−𝐼 𝐷 = E 𝑥𝑑𝑖 𝑠𝑖 ,𝑥 𝑑𝑗 𝑠 𝑗 ∼P𝑟𝑒𝑎𝑙 L𝐷−𝐼 𝐷 = E 𝑥𝑑𝑖 𝑠𝑖 ,𝑥 𝑑𝑗 𝑠 𝑗 ∼P𝑟𝑒𝑎𝑙 [𝐷 (𝐺 (𝐸𝐷 (𝑥𝑑𝑖 𝑠𝑖 ), 𝐸𝑆 (𝑥𝑑𝑗 𝑠 𝑗 , 𝑦)))] [𝐷 (𝐺 (𝐸𝐷 (𝑥𝑑𝑖 𝑠𝑖 ), 𝐸𝑆 (𝑥𝑑𝑗 𝑠 𝑗 , 𝑦)))] (4.4) (4.5) −E𝑥 [𝐷 (𝑥)] Here, the goal is to learn encodings that represent identities different from those in the training dataset. For this, 𝐸𝐷 is divided into two parts (as shown in Figure 4.1) - Encoder 𝐸 that extracts the encoding from given input image and Warping Network 𝑊 that aims to learn 𝑀 warping functions ( 𝑓 1, ....., 𝑓 𝑀 ) to discover 𝑀 non-linear paths in the latent space of 𝐺. The gradient of these can be 81 utilized to define the direction at each latent code 𝑧 [142] such that new ¯𝑧 represents encoding of an identity different from the input image. In order to achieve this, the encoder 𝐸𝐷 is broken down to two parts - an encoder 𝐸 that extracts the latent code of the given input image and passes it on to the warping network 𝑊. For a vector space R𝑑, the function 𝑓 : R is defined as, 𝑓 (𝑧) = 𝐾 ∑︁ 𝑖=1 𝑏𝑖𝑒𝑥 𝑝(−𝑢𝑖 ||𝑧 − 𝑣𝑖 ||2) (4.6) Here, 𝑣𝑖 ∈ R𝑑 represents the center, 𝑏𝑖 ∈ R represents weight and 𝑢𝑖 ∈ R represents scale of 𝑖𝑡ℎ RBF. This function for warping is differentiable and for a specific value of 𝑧, the direction from Δ 𝑓 can be used to define a curve in R𝑑 by shifting 𝑧 as [142]: 𝛿𝑧 = 𝜖 Δ 𝑓 (𝑧) ||Δ 𝑓 (𝑧)|| (4.7) Here, 𝜖 is the shift magnitude that determines the shift from 𝑧 to ¯𝑧 using above equation. The Warping Network, 𝑊, contains two components: warper and reconstructor 𝑅. The warper can be parameterized using the triplet (𝑉 𝑚, 𝐵𝑚, 𝑈𝑚) denoting the center, weight and parameters. Here, 𝑚 = 1, 2, ....𝑀 and each triplet help warping the latent space in R𝑑. Also, the reconstructor is utilized to estimate the support set and magnitude shift that led to the transformation at hand. 
Therefore, the objective function for the Warping Network can be defined as,

$$\min_{V, B, U, R} \; \mathbb{E}_{z, \epsilon}\big[\mathcal{L}_{W\text{-}Reg}(\epsilon, \bar{\epsilon})\big]$$ (4.8)

Here, $\mathcal{L}_{W\text{-}Reg}$ refers to a regression loss. To further emphasize the uniqueness of the identities learned by $G$ in the latent space, we maximize,

$$\mathcal{L}_{Ident\text{-}Recon} = \| E(\bar{x}) - E(x^{d_i}_{s_i}) \|^2_2$$ (4.9)

$$\mathcal{L}_{Ident\text{-}Cls} = \| Feat(\bar{x}) - Feat(x^{d_i}_{s_i}) \|^2_2$$ (4.10)

Here, $Feat(x)$ denotes the features extracted by the trained iris classifier (i.e., matcher) $C$.

Figure 4.2: Some examples of images generated using iWarpGAN. A total of 20,000 irides corresponding to 2,000 identities were generated for each of the three training datasets.

By employing distinct pathways for style and identity, the proposed method enables the manipulation of identity features to generate synthetic images with distinct identities that diverge from the training dataset. Additionally, this methodology allows for the generation of images with varied styles for each identity. This is achieved by keeping the input image to the identity pathway constant and varying the input image to the style pathway, thereby enforcing that the generated images have the same identity $d$ but different styles (i.e., intra-class variation) $s_1, s_2, \ldots, s_n$.

4.4 Datasets Used

In this work, we utilized three publicly available iris datasets for conducting experiments and performing our analysis:

D1: CASIA-Iris-Thousand: This dataset [5], released by the Chinese Academy of Sciences Institute of Automation, has been widely used to study the distinctiveness of iris features and to develop state-of-the-art iris recognition methods. It contains 20,000 irides from 1,000 subjects (2,000 unique identities with left and right eye) captured using an iris scanner with a resolution of 640×480. The dataset is divided into train and test sets using a 70-30 split based on unique identities, i.e., 1,400 identities in the training set and 600 in the test set.

Figure 4.3: Examples of images generated using iWarpGAN with unique identities and intra-class variations. The figure shows the average similarity score (SScore) for both inter- and intra-class comparisons.

D2: CASIA Cross Sensor Iris Dataset (CSIR): For this work, we had access to only the train set of the CASIA-CSIR dataset [152], released by the Chinese Academy of Sciences Institute of Automation. This dataset consists of 7,964 iris images from 100 subjects (200 unique identities with left and right eye), which is divided into train and test sets using a 70-30 split on unique identities for training and testing deep learning based iris recognition methods, i.e., the training set contains 5,411 images and the test set contains 2,553 images.

D3: IITD-iris: This dataset [6] was released by the Indian Institute of Technology, Delhi, and was acquired in an indoor environment. It contains 1,120 iris images from 224 subjects captured using JIRIS, JPC1000 and digital CMOS cameras with a resolution of 320×240. This dataset is divided into train and test sets using a 70-30 split based on unique identities, i.e., images from 314 identities in the training set and images from 134 identities in the testing set.

Training Data for Proposed Method: The proposed method is trained using cropped iris images of size 256×256, where the style of each image is represented using the attribute vector $y$. Current datasets do not contain a balanced number of iris images across these attributes. Therefore, variations such as angle and position are added via image transformations on randomly selected images from the dataset.
In order to achieve this, iris coordinates are first obtained using the VeriEye iris matcher, the images are then translated to different angles and positions with respect to these centers, and cropped iris images of size 256×256 are extracted. This helps create a training dataset with balanced samples across the different attributes. Since the proposed method uses an image translation GAN, two images $x^{d_1}_{s_1}$ and $x^{d_2}_{s_2}$ are used as input during image synthesis to produce a new iris image $x^{d_3}_{s_2}$ whose identity $d_3$ is different from the training data and which possesses the style attribute $s_2$ (i.e., the attribute vector $y$) of image $x^{d_2}_{s_2}$.

4.5 Experimental Protocols

In this section, we discuss the different experiments used to study and analyze the performance of the proposed method. First, three sets of 20,000 iris images corresponding to 2,000 identities are generated. The three sets correspond to the three different training datasets, D1, D2 and D3. For some of the experiments below, a subset of the generated images was used in order to be commensurate with the corresponding real dataset.

4.5.1 Experiment-1: Quality of Generated Images

ISO/IEC 29794-6 Standard Quality Metrics: The quality of the generated images is compared with that of real images using the ISO/IEC 29794-6 Standard Quality Metrics [3]. We also evaluated the quality of images generated by other techniques, viz., WGAN [8], RaSGAN [156] and CITGAN [158], and compared them with the images generated using iWarpGAN. The ISO metric evaluates the quality of an iris image using factors such as usable iris area, iris-sclera contrast, sharpness, iris-pupil contrast, pupil circularity, etc., to generate an overall quality score. The quality score ranges from [0-100], with 0 representing poor quality and 100 representing the highest quality. Images that cannot be processed by this method (either due to extremely poor quality or an error during segmentation) are given a score of 255. As shown in Figure 4.4, the quality scores of iris images generated by iWarpGAN and CITGAN are comparable with those of real irides. On the other hand, WGAN and RaSGAN have many images with a score of 255 due to poor image quality. Also, when comparing the images in the three datasets, it can be seen that the CASIA-CSIR dataset has more images with a score of 255 than the IITD-iris and CASIA-Iris-Thousand datasets.

Figure 4.4: Histograms showing the quality scores of real iris images from three different datasets and the synthetically generated iris images. The quality scores were generated using the ISO/IEC 29794-6 Standard Quality Metrics [3] in the score range of [0-100]; the higher the score, the better the quality. Iris images that failed to be processed by this method are given a score of 255. (a) CASIA-Iris-Thousand dataset v/s synthetically generated images from different GANs. (b) CASIA-CSIR dataset v/s synthetically generated images from different GANs. (c) IITD-iris dataset v/s synthetically generated images from different GANs.

VeriEye Rejection Rate: To further emphasize the superiority of the proposed method in generating good quality iris images, we compare the rate at which the generated images are rejected by a commercial iris matcher known as VeriEye. We compare the rejection rate for images generated by iWarpGAN with that of the real images, as well as those generated by WGAN, RaSGAN and CITGAN:

(a) IITD-Iris-Dataset: This dataset contains a total of 1,120 iris images, out of which 0.18% are rejected by VeriEye.
For comparison, we generated 1,120 iris images each using iWarpGAN, WGAN, RaSGAN and CITGAN. For the generated images, the rejection rate is as high as 9.73% and 4.55% for WGAN and RaSGAN, respectively. However, the rejection rate for CITGAN and iWarpGAN is 2.85% and 0.73%, respectively.

(b) CASIA-CS Iris Dataset: This dataset contains a total of 7,964 iris images, out of which 2.81% are rejected by VeriEye. For comparison, we generated 7,964 iris images each using iWarpGAN, WGAN, RaSGAN and CITGAN. For the generated images, the rejection rate is as high as 4.17% and 2.06% for WGAN and RaSGAN, respectively. However, the rejection rate for CITGAN and iWarpGAN is 2.71% and 2.74%, respectively.

(c) CASIA-Iris-Thousand Dataset: This dataset contains a total of 20,000 iris images, out of which 0.06% are rejected by VeriEye. For comparison, we generated 20,000 iris images each using iWarpGAN, WGAN, RaSGAN and CITGAN. For the generated images, the rejection rate is as high as 0.615% and 0.34% for WGAN and RaSGAN, respectively. However, the rejection rate for CITGAN and iWarpGAN is 0.24% and 0.18%, respectively.

4.5.2 Experiment-2: Uniqueness of Generated Images

This experiment analyzes the uniqueness of the synthetically generated images, i.e., we evaluate whether iWarpGAN is capable of generating unique identities with intra-class variations.

Experiment-2A: Experiment-2A focuses on studying the uniqueness of the synthetic iris datasets generated using different GAN methods with respect to the training samples. For this, we studied the genuine and impostor distributions of the real iris images used to train the GAN methods and compared them with the distributions of the synthetically generated iris images. We utilized the VeriEye matcher in this experiment to evaluate the similarity score between a pair of iris images. The score ranges from [0, 1557], where a higher score denotes a better match.

Experiment-2B: Experiment-2B focuses on studying the uniqueness and intra-class variations within the generated iris dataset. For this, we studied the genuine and impostor distributions of the generated iris images and compared them with the distributions of the real iris datasets. As mentioned earlier, this study is done for various unique generated identities in order to study both uniqueness and scalability. We utilized the VeriEye matcher in this experiment to evaluate the similarity score between a pair of iris images.

Analysis: As shown in Figures 4.5, 4.6 and 4.7, unlike the other GAN methods, the iris images generated by iWarpGAN do not share high similarity with the real iris images used in training. This shows that iWarpGAN is capable of generating irides with identities that are different from those in the training dataset. Further, the impostor distribution of the synthetically generated images overlaps with the impostor distribution of the real iris images, from which we can conclude that the generated identities are different from each other. Note that the low similarity scores of WGAN for the real v/s synthetic and synthetic v/s synthetic distributions are due to the poor quality of the iris images generated by WGAN.

4.5.3 Experiment-3: Utility of Synthetic Images

In this experiment, we analyze the performance of deep learning algorithms trained and tested for iris recognition using a triplet training method, and compare it with the performance obtained when these algorithms are trained using both real and synthetically generated iris images.
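As a point of reference, the triplet objective used to train these recognition backbones can be sketched as follows; this is a minimal PyTorch illustration, and the margin value and function names are assumptions rather than the exact training code.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of a triplet objective for iris recognition backbones (EfficientNet / ResNet-101).
def triplet_loss(anchor, positive, negative, margin=0.2):
    """anchor/positive share an identity; negative comes from a different identity."""
    d_ap = F.pairwise_distance(anchor, positive)   # same-identity embedding distance
    d_an = F.pairwise_distance(anchor, negative)   # different-identity embedding distance
    return F.relu(d_ap - d_an + margin).mean()     # pull genuine pairs closer than impostor pairs

# Synthetic images from iWarpGAN simply contribute additional identities and samples
# from which such (anchor, positive, negative) triplets can be mined.
```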
Experiment-3A: Baseline Analysis: This is a baseline experiment where EfficientNet [66] and ResNet-101 [101] are trained with the training sets of the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets using the triplet training method. The trained networks are tested for iris recognition on the test sets of the above-mentioned datasets.

Experiment-3B: Cross-Dataset Analysis: In this experiment, we analyze the benefit of synthetically generated iris datasets in improving the performance of deep learning based iris recognition methods. EfficientNet and ResNet-101 are trained using the training sets of the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets, as well as the synthetically generated iris dataset from iWarpGAN.

Analysis: As shown in Figures 4.8 and 4.9, the performance of the deep learning based iris recognition systems improves when trained with more data, i.e., when combining real and synthetically generated iris images from iWarpGAN. While there is a slight improvement in the performance of ResNet-101, a significant improvement is seen for EfficientNet.

The baseline under-performance of EfficientNet can be attributed to two main factors. First, the architectural differences between EfficientNet and ResNet are significant. EfficientNet's design emphasizes parameter efficiency through compound scaling, optimizing depth, width, and resolution together. While effective in many domains, this approach might not align well with the specific requirements of iris recognition, particularly when training data is limited. In contrast, ResNet's residual connections and straightforward structure make it robust across diverse tasks, including biometric recognition, even with smaller datasets. Second, the training requirements for EfficientNet differ from those of ResNet. EfficientNet's compound scaling may necessitate tailored learning rates, augmentation strategies, and regularization techniques to effectively capture the intricate patterns in iris images. Without such adjustments, its capacity to learn critical iris features may be limited. The inclusion of synthetic data appears to address these challenges by diversifying the training set, allowing EfficientNet to better adapt to the unique characteristics of iris recognition. This finding underscores the importance of designing training strategies specific to an architecture's strengths and weaknesses. Future research could explore architecture-specific synthetic data generation or custom training pipelines to further enhance model performance across a variety of biometric recognition systems.

4.6 Summary & Future Work

The results in the previous section demonstrate that, unlike existing GANs, the proposed method can generate high-quality iris images with identities that are distinct from those in the training dataset. Moreover, the generated identities are unique with respect to each other, exhibiting slight variations to enhance diversity. This capability is critical in biometric systems, where unique and diverse identities are required to train and evaluate robust models. Additionally, the usefulness of the generated dataset in improving the performance of deep learning-based iris recognition methods has been established by augmenting training data with numerous unique synthetic identities.
With the dataset used in this study, the proposed method is capable of generating up to 7,680,000 images, comprising 768,000 unique identities, with each identity having 10 samples. This large-scale generation capability highlights the practical utility of the proposed method for creating diverse datasets to train advanced iris recognition systems.

The proposed method is based on an image transformation framework, where the network requires an input and a reference image to transform the identity and style and produce an output image. While effective, this approach inherently limits the feature space explored by iWarpGAN to the diversity present in the training data and the provided references. Furthermore, the warping transformations applied in the GAN latent space are often non-linear and complex, aimed at disentangling features such as identity and style. These transformations typically do not have an explicit inverse, which limits the ability to map generated images back to their precise original latent representations. Additionally, GAN latent spaces are generally high-dimensional, and warping manipulations might involve projecting onto subspaces or adding perturbations that result in some loss of information, further constraining the potential for inversion or precise reconstruction.

For future work, we aim to extensively study the capacity of the proposed method to maximize the number of unique identities it can generate. Additionally, we will explore strategies to make the method more generalizable, ensuring that the new identities learned by iWarpGAN are not constrained by the training set. This will involve developing techniques to expand the latent feature space, better decouple the identity and style representations, and address the inherent challenges of invertibility in WarpedGANSpace, ultimately enhancing the robustness and scalability of the method.

Figure 4.5: This figure shows the uniqueness of iris images generated using iWarpGAN when the GANs are trained using the CASIA-Iris-Thousand dataset. The y-axis represents the similarity scores obtained using VeriEye. Here, R=Real, S=Synthetic, Gen=Genuine and Imp=Impostor.

Figure 4.6: This figure shows the uniqueness of iris images generated using iWarpGAN when the GANs are trained using the CASIA-CS iris dataset. The y-axis represents the similarity scores obtained using VeriEye. Here, R=Real, S=Synthetic, Gen=Genuine and Imp=Impostor.

Figure 4.7: This figure shows the uniqueness of iris images generated using iWarpGAN when the GANs are trained using the IITD iris dataset. The y-axis represents the similarity scores obtained using VeriEye. Here, R=Real, S=Synthetic, Gen=Genuine and Imp=Impostor.

Figure 4.8: This figure shows the performance of ResNet-101 in the cross-dataset evaluation scenario. (a) Trained using the train sets of the CASIA-CSIR & IIT-Delhi datasets and tested using the test set of CASIA-Iris-Thousand. (b) Trained using the CASIA-Iris-Thousand & IIT-Delhi datasets and tested using the test set of the CASIA-CSIR dataset. (c) Trained using the CASIA-Iris-Thousand & CASIA-CSIR datasets and tested using the test set of the IIT-Delhi iris dataset.

Figure 4.9: This figure shows the performance of EfficientNet in the cross-dataset evaluation scenario. (a) Trained using the train sets of the CASIA-CSIR & IIT-Delhi datasets and tested using the test set of CASIA-Iris-Thousand. (b) Trained using the CASIA-Iris-Thousand & IIT-Delhi datasets and tested using the test set of the CASIA-CSIR dataset.
(c) Trained using the CASIA-Iris-Thousand & CASIA-CSIR datasets and tested using the test set of the IIT-Delhi iris dataset.

CHAPTER 5
IT-DIFFGAN: IMAGE TRANSLATIVE DIFFUSION GAN TO GENERATE SYNTHETIC IRIS IMAGES

5.1 Introduction

As discussed in previous chapters, Generative Adversarial Networks (GANs) [56] have emerged as a powerful tool in the field of computer vision, particularly for the generation of synthetic biometric images, such as iris [82, 140, 156, 158], face [47, 61, 67, 108, 138], fingerprint [45, 137, 151, 171] and speech, that closely resemble real-world data. Traditionally, a GAN takes random noise as input and uses it to generate a realistic looking image. For example, Kohli et al. [82] proposed a GAN-based method, the iris Deep Convolutional Generative Adversarial Network (iDCGAN), for synthesizing cropped iris images from random noise. Although successful in generating high-quality cropped iris images of size 64×64, this approach exhibited unrealistic distortions and noise when tasked with generating higher resolution images. In a subsequent study by Yadav et al. [156], this challenge was addressed by employing the Relativistic Average Standard Generative Adversarial Network (RaSGAN), specifically designed to produce high-quality, high-resolution iris images from random noise input.

Apart from the traditional GAN-based approaches, where images are generated using random noise as input to the generator, researchers have also studied image translative GANs, which take an image as input and then generate a synthetic image. These types of GANs are most commonly used for image editing, style transfer, etc. For example, Richardson et al. [116] proposed an image translative StyleGAN, pixel2style2pixel (pSp), that takes a face image as input and aims to generate face images with edited style attributes such as hair color, facial expression, etc. Similarly, StarGAN v2 [26] and CIT-GAN [158] require paired training data for the translation of a source iris image into a synthetic image possessing the attributes of the target domain (or reference iris image). This requirement compels the generator to acquire mappings across diverse domains, rendering it adaptable to multiple domains. Dorta et al. [40] proposed WarpGAN, which enables partial edits without reliance on a specific source-target image pair. In this framework, the generator receives a source image along with a target attribute vector, subsequently learning a warp field to implement the desired alterations in the source image. Comparative studies have demonstrated that this approach yields more realistic semantic edits in input images compared to methods such as StarGAN v2 [26] and CycleGAN [170].

Despite their superior generative power, the utilization of GANs for generating synthetic biometric images is not without its challenges. The three most pressing challenges that we would like to study and overcome in this work are: (1) The imperative need to safeguard the privacy of individuals whose biometric information is present in the training datasets. Currently, the synthetic images generated by GANs are seen to have identity leakage from the training data [140, 159, 160], i.e., the identity traits in the generated iris images are similar to those in the training data. (2) Most of the GANs in the literature lack a mechanism to introduce intra-class variations in the generated images, i.e., to generate multiple synthetic biometric images per identity [160].
(3) When trained on small datasets, GANs are prone to mode collapse and often suffer from unstable training, leading to the generation of images with alien artifacts and distortions [147].

In [159], Yadav and Ross overcame some of these challenges by proposing iWarpGAN, which focuses on disentangling identity and style through two distinct transformation pathways: Identity Transformation and Style Transfer. In the Identity Transformation pathway, the objective is to alter the identity of the input iris image within the latent space, thereby generating identities distinct from those present in the training set. This is accomplished by training a radial basis function (RBF)-based warp function, $f^m$, within the latent space of a GAN. The gradient of this warp function provides non-linear paths along the $m^{th}$ family of paths for each latent code $z \in \mathbb{R}^d$. On the other hand, the Style Transfer pathway focuses on generating images with varied styles, extracted from a reference iris image. Consequently, by combining the reference style code with the transformed identity code, iWarpGAN produces iris images exhibiting both inter- and intra-class variations. While this method was able to generate realistic looking iris images that are different from the training data in terms of identity, the generated images were seen to have some artifacts and distortions in the non-iris regions. Also, the dependency on paired images to generate the desired iris image restricts its capability to generate numerous new identities.

In this research, we address these challenges by proposing IT-diffGAN, an image translative diffusion-GAN [147] with StyleGAN-3 [76] as the backbone architecture. The proposed method projects input images onto the latent space of the diffusion-GAN and identifies the features most pertinent to individual identity and style. This is achieved using identity and style metrics that calculate the distance between the original image and the images generated by controlled manipulation of features in the latent space. Once the features affecting the identity and style attributes are identified, the proposed method is then trained to generate new identities by manipulating these features in the latent space. This helps IT-diffGAN generate images with inter- and intra-class variations while ensuring that the generated identities do not resemble anyone in the training data. By utilizing an image translative diffusion-GAN, this method is able to generate more realistic images and overcomes issues like mode collapse and unstable training that are often faced by GANs. Thus, the contributions of this work can be summarized as follows:

• We propose an image translative diffusion-GAN (IT-diffGAN) with StyleGAN-3 as the backbone architecture to generate realistic images, overcoming typical GAN issues like mode collapse and unstable training.

• We identify the features in the latent space of IT-diffGAN that are most pertinent to individual identity and style by manipulating those features and calculating the resulting displacement in style and identity.

• We manipulate the features in the latent space that affect the identity and style of an image and utilize this knowledge to learn to generate iris images with both inter- and intra-class variations.

• We utilize the proposed IT-diffGAN to generate more realistic and diverse images, addressing the limitations of traditional GANs and improving the quality and stability of image generation.
• We also evaluate the utility of the generated images for training deep learning based iris recognition methods by providing data with more identities and intra-class variations.

5.2 Background

In this section, we discuss the background needed to understand the proposed method.

5.2.1 Generative Adversarial Networks (GANs)

GANs [56] provide a framework for generating synthetic images that closely resemble real images. A GAN consists of two networks – a generator and a discriminator – engaged in a competitive game to produce realistic data distributions. The generator aims to generate synthetic samples, while the discriminator aims to differentiate between real and synthetic samples. Through iterative training, the generator refines its ability to produce increasingly realistic samples, while the discriminator enhances its capability to discern between real and synthetic data.

5.2.1.1 Standard Generative Adversarial Networks (SGANs)

As mentioned above, a GAN comprises two essential components: a generator $G$ and a discriminator $D$, engaged in a competitive process. Traditionally, $G$ takes a random noise vector $z$ as input and generates a realistic looking synthetic image closely resembling real data samples. Conversely, $D$ strives to differentiate between synthetically generated data and genuine data. This dynamic interplay between the generator and discriminator is encapsulated in a min-max objective function [56]:

$$\min_G \max_D L(D, G) = \mathbb{E}_{I \sim \mathcal{P}}[\log(D(I))] + \mathbb{E}_{z \sim \mathcal{Q}}[\log(1 - D(G(z)))]$$ (5.1)

Here, the real data is represented by $I \sim \mathcal{P}$ and the input to the generator $G$ comes from a Gaussian distribution, $z \sim \mathcal{Q}$. $D(I)$ determines whether $I$ is real or synthetic.

Figure 5.1: In order to disentangle style and identity in the latent space, the first step in the proposed method is to study the latent space and analyze the channels that affect the identity and style properties of the input image. With this analysis, the proposed method aims to manipulate these properties to generate images with varying styles and identities that are different from the training data. (a) Illustration of identity and style disentanglement for an input source image $I$ by first mapping it to the latent space and then manipulating the channels, replacing them with channels from some target image $I_T$. A distance score is calculated to analyze the channels that are most pertinent to identity and style properties. (b) Analysis of channels affecting the identity in the synthetic iris images generated using IT-diffGAN trained on images from the CASIA-Iris-Thousand dataset. Images are generated using source images from the test set of CASIA-Iris-Thousand; to manipulate the channels and study their effects, certain channels of the source image in the latent space are replaced with those of reference images from the test set. A distance score is used to analyze the channels that affect identity properties.
Figure 5.2: The training process of the proposed IT-diffGAN consists of six parts: (1) Encoder, $E$, that aims to encode the input image $I_S$ into an interpretable latent space; (2) Warping Network, $W$, that manipulates the identity information in the latent space; (3) Generative Network, $G$, that takes the transformed encoding from $W$ as input to generate an image $I_G$ with an identity different from $I_S$; (4) Discriminator, $D$, that inputs either a real ($I_S$) or synthetic ($I_G$) image and predicts whether the image is real or synthetic; (5) Attribute predictor, $IrisAttr$, that is trained to predict the style attribute of a given image; and (6) Iris matcher, $IrisID$, which is based on DRFNet and trained for iris recognition using the triplet training method.

5.2.1.2 StyleGAN and its Latent Space

Standard GANs were proposed by Goodfellow et al. [56] and, since then, many different variations of GANs have been proposed by researchers to improve the performance of SGANs and generate high-resolution, realistic looking images that closely resemble the real data. One such GAN is StyleGAN [77], which improved the generative capability of a standard GAN by introducing style modulation, which involves injecting style information at multiple layers of the generator network. This style information controls the appearance of features at different scales, enabling the generation of images with diverse styles and characteristics. Additionally, StyleGAN incorporates a progressive growing strategy during training, where the network gradually increases in complexity, starting from low-resolution images and progressively adding details as training progresses. This approach helps stabilize the training process and ensures the generation of high-quality images.

Figure 5.3: Overview of the Warping Network ($W$). The method begins with a latent code $z$ from the intermediate latent codes $w \in \mathcal{W}$. This latent code is modified by a vector shift determined by the warping function $f^m$, which is executed within the warping network $W$. This shift relies on a specifically chosen support set $V^m$, associated weights $B^m$, and scale parameters $U^m$.

StyleGAN-2 [76] further builds upon the success of the original StyleGAN by introducing several improvements to further enhance image quality and diversity. These include a redesign of the adaptive instance normalization (AdaIN) based style modulation, which allows for more fine-grained control over style, as well as the addition of skip connections to facilitate information flow between different layers of the network. In [87], researchers showed that the intermediate latent space $\mathcal{W}$ of StyleGAN and its variants is highly disentangled. Karras et al. [77] observed that specific channels within $\mathcal{W}$ correspond to distinct subsets of facial attributes, categorizing them into three groups: coarse channels (0 to 3) encapsulate high-level characteristics like pose, face shape, hairstyle, and eyewear; intermediate channels (4 to 7) represent features such as eye, mouth, and nose structure; and fine channels (8 to 17) encode the color scheme and micro-structure details. Previous studies have demonstrated how this disentanglement of features can be utilized to manipulate individual facial attributes [87].
Figure 5.4: Examples of images generated using IT-diffGAN with identities different from the training data and with intra-class variations. A total of 20,000 irides corresponding to 2,000 identities were generated for each of the three training datasets. (a) Various examples of images generated using the proposed method with identities different from the training dataset. (b) Genuine (red) and impostor (green) scores calculated for the generated images, showing inter- and intra-class variations in the generated dataset.

5.2.2 Diffusion-GANs

Many researchers have been working towards improving the quality and realism of the images generated via GANs. However, GANs still struggle to generate images free from artifacts, distortions, and mode collapse, where the generator fails to capture the full diversity of the target distribution. Diffusion Generative Adversarial Networks (Diffusion-GANs) [147] offer a promising solution to these challenges. In Diffusion-GANs, the generation process is based on the concept of iteratively refining a noise-corrupted image to generate high-quality samples. This iterative process involves gradually diffusing noise through the image, effectively smoothing out the noise to reveal the underlying structure and details.

One key advantage of Diffusion-GANs over traditional GANs lies in their ability to produce high-quality images with fewer artifacts and distortions. By iteratively refining the image through diffusion, Diffusion-GANs can generate samples that exhibit improved realism, making them well-suited for tasks that require high-quality image synthesis, such as image editing and synthesis. Additionally, Diffusion-GANs provide greater stability during training compared to traditional GANs. The diffusion process offers a more controlled and stable training environment, leading to smoother convergence and better overall performance. This stability helps mitigate issues such as mode collapse, allowing the network to capture a wider range of image variations.

The process of constructing a diffusion based GAN can be described using three steps [147]:

(1) Injecting Noise via Diffusion: A Diffusion-GAN aims to produce realistic images $I_g$ using a generator $G$, which transforms a latent variable $z$ drawn from a simple prior distribution $p(z)$ into a high-dimensional data space, such as images. Here, the generator distribution is $p_g(I_g) = \int p(I_g \mid z)\, p(z)\, dz$, and $I$ refers to a real image. To enhance the robustness and diversity of the generator, instance noise is introduced into the generated samples $I_g$ by employing a diffusion process, which adds Gaussian noise at each step. The noisy samples (say $I_n$) obtained at different steps of the diffusion, modeled by the mixture distribution $q(I_n \mid I, t)$, come from Gaussian distributions whose mean is directly proportional to $I$ and whose variance varies according to the influence of noise at step $t$. The same mixture distribution and diffusion process is applied to both real and generated samples to help $G$ learn the underlying structure and details of real images.

(2) Adversarial Training: In order to accommodate the diffusion process in generative adversarial training, the original min-max objective [56] is updated as follows:

$$\max_D L(D) = \mathbb{E}_{I \sim p(I),\, t \sim p_\pi,\, I_n \sim q(I_n \mid I, t)}[\log(D(I_n, t))] + \mathbb{E}_{z \sim p(z),\, t \sim p_\pi,\, I_g \sim q(I_g \mid G_\theta(z), t)}[\log(1 - D(I_g, t))]$$ (5.2)

Here, $p_\pi$ refers to a discrete distribution that assigns a weight to each diffusion step $t \in \{1, \ldots, T\}$.

(3) Discriminator for Adaptive Diffusion: With the introduction of diffusion and the timestep into the adversarial training, a diffusion based GAN requires a new optimization strategy for $D$ so that the GAN learns to generate realistic looking images.
This is achieved by making the discriminator learn the distinction between real and synthetic samples from the easiest examples (no noise) first, and then gradually increasing the noise-to-data ratio. This is done by utilizing a self-paced schedule for determining the number of diffusion steps ($T$), based on a discriminator over-fitting metric ($r_d$). This metric, derived from [75], evaluates the confidence of the discriminator relative to the data:

$$r_d = \mathbb{E}_{I_n, t \sim p(I_n, t)}\big[\operatorname{sign}\big(D_\phi(I_n, t) - 0.5\big)\big], \qquad T = T + \operatorname{sign}(r_d - d_{target}) \times C$$ (5.3)

The schedule adjusts $T$ based on the deviation of $r_d$ from the target value $d_{target}$, with a constant factor $C$ influencing the rate of change. $r_d$ is recalculated and $T$ updated every four minibatches [75].

5.3 Proposed Method

StyleGAN-3 has been shown to generate realistic images by utilizing random noise vectors as input [76]. In this method, the initial latent codes $z \in \mathcal{Z}$ undergo a transformation process facilitated by a mapper, which consists of a sequence of fully connected layers that produce intermediate latent codes $w \in \mathcal{W}$. This intermediate latent space, represented by $\mathcal{W}$, is structured as a two-dimensional array of 512-dimensional rows (16×512 in our configuration), with each row denoted as a channel. One of the useful features of StyleGAN-3 is the high degree of disentanglement present in its intermediate latent space $\mathcal{W}$. Karras et al. [77] examined this, revealing that specific layers within $\mathcal{W}$ correspond to distinct subsets of facial attributes. For example, they categorized these channels into three main groups for face images: coarse channels (0-3), middle channels (4-7), and fine channels (8-17). Coarse channels capture broad characteristics such as pose, facial structure, hairstyle, and the presence of eyewear. Middle channels encapsulate finer details like the structure of the eyes, mouth, and nose. Fine channels encode information related to color palettes and micro-structural intricacies. Various studies [76, 77, 87] have showcased how the disentangled features can be leveraged to manipulate individual facial features effectively. By carefully adjusting the latent codes, researchers have successfully altered specific features while preserving the overall identity of the face. This section delves deeper into this, exploring the possibilities and implications of such disentanglement in the context of manipulating identity and style in iris images. In order to achieve this, the proposed method can be divided into three steps, as described below.

5.3.1 Mapping the Input Image to the Latent Space

The proposed method, IT-diffGAN, is an image translative diffusion-GAN using StyleGAN-3 as the backbone architecture; it takes an iris image as input, which is mapped to a latent space using an image encoder. We utilized the encoder $E$ from pSp [116] as the backbone for our image encoder, which takes an image of size 256×256 as input and returns an encoding of size 1×512. This encoding is then passed through the mapper from StyleGAN-3 to output intermediate latent codes $w$ of size 16×512. With this, the adversarial loss is defined as,

$$\mathcal{L}_{Adv} = \mathbb{E}_{I \sim p(I),\, t \sim p_\pi,\, I_n \sim q(I_n \mid I, t)}[\log(D(I_n, t))] + \mathbb{E}_{I_s \sim p(I_s),\, t \sim p_\pi,\, I_g \sim q(I_g \mid G_\theta(E(I_s)), t)}[\log(1 - D(I_g, t))]$$ (5.4)

Here, $p_\pi$ refers to a discrete distribution that assigns a weight to each diffusion step $t \in \{1, \ldots, T\}$. Analyzing different channels using this method, we determined that Channel-1, Channel-8, Channel-9 and Channel-10 have the most effect on the identity (as shown in Figure 5.1).
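To make the channel analysis concrete, a minimal sketch of the channel-replacement probe (detailed in Section 5.3.2) is given below; encoder, generator and iris_id are placeholders for the trained $E$, $G$ and $IrisID$ networks, and the tensor shapes are assumptions rather than the exact implementation.

```python
import torch

# Minimal sketch of the channel-replacement probe used to locate identity-bearing rows of W.
@torch.no_grad()
def identity_displacement(encoder, generator, iris_id, img_src, img_tgt, channel):
    w_src = encoder(img_src)            # intermediate latent code, assumed shape (16, 512)
    w_tgt = encoder(img_tgt)
    w_mix = w_src.clone()
    w_mix[channel] = w_tgt[channel]     # replace a single channel (row) of W with the target's
    img_gen = generator(w_mix)
    # Identity displacement: squared distance between iris-matcher features (cf. Eq. 5.5).
    return torch.norm(iris_id(img_src) - iris_id(img_gen), p=2) ** 2

# Channels whose replacement produces the largest displacement are treated as identity
# channels; the remaining channels are treated as style channels.
```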
Figure 5.5: Histograms depicting the realism of iris images using FID, which compares the distribution of synthetically generated irides with real data. Since FID is a distance score, a lower FID score indicates that the synthetic iris images are more realistic. (a) Distribution of FID scores of generated iris images against real images from the CASIA-Iris-Thousand dataset when the GANs are trained using the CASIA-Iris-Thousand dataset. (b) Distribution of FID scores of generated iris images against real images from the CASIA-CSIR dataset when the GANs are trained using the CASIA-CSIR dataset. (c) Distribution of FID scores of generated iris images against real images from the IITD-iris dataset when the GANs are trained using the IITD-iris dataset.

5.3.2 Identity & Style Disentanglement

As mentioned earlier, each row of the intermediate latent space $\mathcal{W}$, of size 16×512, is denoted as a channel that can affect different properties during the generation of iris images. The proposed method aims to find the channels that affect the identity and style properties of the iris images it generates. This is achieved by studying the intermediate latent code $w_s$ of a source iris image $I_s$ and how the identity and style are affected when certain channels are replaced by those of a target code $w_t$. The updated code is then used by the image translative diffusion StyleGAN-3 to generate a synthetic iris image $I_g$. Further, we utilize a trained iris recognition method ($IrisID$) and a trained iris attribute predictor ($IrisAttr$) to evaluate the effect of channel replacement between $w_s$ and $w_t$ using,

$$\mathcal{L}_{ID\text{-}Recon} = \| Feat(I_s) - Feat(I_g) \|^2_2$$ (5.5)

$$\mathcal{L}_{Attr\text{-}Recon} = \| IrisAttr(I_s) - IrisAttr(I_g) \|^2_2$$ (5.6)

Here, $\mathcal{L}_{ID\text{-}Recon}$ and $\mathcal{L}_{Attr\text{-}Recon}$ help evaluate the displacement in identity and style when certain channels in $w_s$ are replaced by channels in $w_t$. Also, $Feat(\cdot)$ refers to the features extracted by the trained iris matcher $IrisID$, and $IrisAttr(\cdot)$ returns an attribute vector $y'$ of size 12. The initial 5 bits represent a one-hot encoding of the angle, while the subsequent 5 bits encode the position shift. The remaining 2 bits indicate contraction and dilation. In this context, the angle and position determine the orientation and the displacement of the iris center within the image. The angles considered for this study include 0°, 10°, 12°, 15°, and 18°, while the position shifts are [0,0], [5,5], [10,10], [-10,10], and [-10,-10]. For instance, for an image with a 12° angle, a position shift of [0,0], and contraction, the attribute vector $y$ would be [0,0,1,0,0,1,0,0,0,0,1,0].

5.3.3 Manipulating Style & Identity in the Latent Space

In this work, we aim to manipulate the style and identity of a given input image in the latent space to generate images with both inter- and intra-class variations.

5.3.3.1 Style Transfer

For the manipulation of style properties, as mentioned earlier, we exchange the style specific channels of the source identity $S$ with the corresponding channels of a target style. In order to enforce that the generated image has style attributes similar to the target image, the proposed method minimizes the attribute loss between the target image and the generated image,

$$\mathcal{L}_{Attr\text{-}Recon} = \| IrisAttr(I_t) - IrisAttr(I_g) \|^2_2$$ (5.7)

Here, $I_t$ refers to the target image and $I_g$ refers to the synthetically generated image.
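A minimal sketch of the style-transfer step just described is given below; STYLE_CHANNELS is an assumed, illustrative set of indices (not the channels identified in the text), and iris_attr, encoder and generator stand in for the trained $IrisAttr$, $E$ and $G$ networks.

```python
import torch

# Minimal sketch of the style-transfer step of Sec. 5.3.3.1: style-bearing channels of the
# source code are replaced by the target's, and Eq. (5.7) ties the result to the target style.
STYLE_CHANNELS = [2, 3, 4]   # illustrative indices only

def style_transfer_loss(encoder, generator, iris_attr, img_src, img_tgt):
    w = encoder(img_src).clone()
    w[STYLE_CHANNELS] = encoder(img_tgt)[STYLE_CHANNELS]   # inject the target's style channels
    img_gen = generator(w)
    # Attribute loss (Eq. 5.7): generated image should match the target's style attributes.
    return torch.norm(iris_attr(img_tgt) - iris_attr(img_gen), p=2) ** 2
```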
Figure 5.6: Analysis of the VeriEye rejection rate for different generative methods when trained using the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets.

5.3.3.2 Identity Transformation

In this work, we aim to manipulate the identity of the input image such that the generated image does not match any identity in the training set. This is achieved by shifting the source identity $S$ in a particular direction (away from the source) in the latent space to generate a new identity $S'$ using a Warping Network $W$. This network is applied to the latent channels most pertinent to identity in the given image (as shown in Figure ??). The Warping Network $W$ aims to learn $M$ functions $(f^1, \ldots, f^M)$ to warp the identity specific channels. The gradients of these functions help determine the direction at each latent code $z \in \mathbb{R}^d$, enabling the identity shift from the source identity $S$ to the new identity $S'$; i.e., $\mathbb{R}^d$ is transformed by $f^m : \mathbb{R}^d \rightarrow \mathbb{R}$, which is parameterized as a weighted sum of RBFs given by,

$$f(z) = \sum_{i=1}^{K} b_i \exp\big(-u_i \| z - v_i \|^2\big)$$ (5.8)

where $v_i$, $b_i$ and $u_i$ represent the center, weight, and scale of the $i^{th}$ RBF. For a given $z$, the direction of $\nabla f$ can be utilized to define a curve by shifting $z$ to $\bar{z}$ using,

$$\delta w = \epsilon \, \frac{\nabla f(z)}{\| \nabla f(z) \|}$$ (5.9)

Here, $\epsilon$ is the shift magnitude determining the transition from $z$ to $\bar{z}$. It is important that the shift magnitude is not too large, as that may lead to unrealistic image generation, while a shift that is too small may not affect the identity transformation. Therefore, similar to iWarpGAN, a reconstructor is utilized to estimate $\epsilon$ and the support set that is used to transform $z$ to $\bar{z}$:

$$\min_{V, B, U, R} \; \mathbb{E}_{w, \epsilon}\big[\mathcal{L}_{W\text{-}Reg}(\epsilon, \bar{\epsilon})\big]$$ (5.10)

Here, $(V^m, B^m, U^m)$ denote the centers, weights and scale parameters of the RBFs, with $m = 1, \ldots, M$. Further, we maximize the identity loss to emphasize the uniqueness of identity in the generated images:

$$\mathcal{L}_{ID\text{-}Recon} = \| Feat(I_s) - Feat(I_g) \|^2_2$$ (5.11)

$Feat(\cdot)$ refers to the features extracted by the trained iris matcher $IrisID$, as discussed earlier.

5.3.4 Datasets Utilized

In this study, we employed the three publicly accessible iris datasets utilized in [160] to evaluate and analyze the performance of the proposed method:

• D1: CASIA-Iris-Thousand: This dataset [5], collected by the Chinese Academy of Sciences Institute of Automation, serves as a benchmark for assessing the distinctiveness of iris features and developing iris recognition methods. It comprises 20,000 iris images from 1,000 subjects (2,000 unique identities, each with left and right eye), captured using an iris scanner with a resolution of 640×480. The dataset is partitioned into training and testing subsets using a 70-30 split based on unique identities, with 1,400 identities in the training set and 600 in the testing set.

• D2: CASIA Cross Sensor Iris Dataset (CSIR): Here, we utilized the training set of the CASIA-CSIR dataset [152], provided by the Chinese Academy of Sciences Institute of Automation, which comprises 7,964 iris images from 100 subjects (200 unique identities, each with left and right eye), divided into training and testing sets with a 70-30 split based on unique identities. The training set consists of 5,411 images, while the test set contains 2,553 images, aimed at training and evaluating deep learning-based iris recognition methods.
• D3: IITD-iris: The IITD-iris dataset [6], released by the Indian Institute of Technology, Delhi, was captured in an indoor environment. It contains 1,120 iris images from 224 subjects, acquired using JIRIS, JPC1000, and digital CMOS cameras with a resolution of 320×240. Similar to the previous datasets, this dataset is split into training and testing subsets using a 70-30 split based on unique identities, with 314 identities in the training set and 134 in the testing set.

The iris images in these datasets are cropped and resized to 256×256 pixels, with the style of each image represented by an attribute vector $y$. Since the existing datasets lack a balanced distribution of iris images across these attributes, we introduced variations such as angle and position via image transformations applied to randomly selected images. This process involved obtaining iris coordinates using the VeriEye iris matcher, translating images to different angles and positions relative to these centers, and then extracting cropped iris images of size 256×256. This approach facilitated the creation of a training dataset with balanced samples across the various attributes.

Figure 5.7: Histograms illustrating the quality of both real and synthetic irides, evaluated using the ISO/IEC 29794-6 Standard Quality Metrics. This metric scores the images on a scale from 0 to 100, with higher values indicating better quality. Images that could not be assessed by this standard were given a score of 255. (a) Distribution of ISO/IEC 29794-6 Standard Quality scores of iris images generated using StarGAN-v2 against real images when trained using D1, D2 and D3. (b) Distribution of ISO/IEC 29794-6 Standard Quality scores of iris images generated using StyleGAN-3 against real images when trained using D1, D2 and D3. (c) Distribution of ISO/IEC 29794-6 Standard Quality scores of iris images generated using diff-StyleGAN-3 against real images when trained using D1, D2 and D3. (d) Distribution of ISO/IEC 29794-6 Standard Quality scores of iris images generated using iWarpGAN against real images when trained using D1, D2 and D3. (e) Distribution of ISO/IEC 29794-6 Standard Quality scores of iris images generated using IT-diffGAN against real images when trained using D1, D2 and D3.

5.4 Experiments & Analysis

In this section, we discuss various experiments conducted to assess and evaluate the efficacy of the proposed approach. First, we generated three distinct sets of iris images, each comprising 20,000 images representing 2,000 unique identities. These sets correspond to the three different training datasets, D1, D2, and D3. In certain experiments outlined below, a subset of the generated images was employed to ensure comparability with the respective real dataset.

5.4.1 Experiment 1: Test of Realism

To assess the realism of the iris images generated using the proposed method, various generative methods – StarGAN-v2, StyleGAN-3, iWarpGAN, Diffusion-GAN and IT-diffGAN – are individually trained using real iris data from the CASIA-Iris-Thousand, CASIA Cross Sensor Iris, and IITD-iris datasets. The generated images are evaluated for realism and quality using two different methods: (1) the Fréchet Inception Distance (FID) score [119], and (2) the ISO/IEC 29794-6 Standard Quality Metrics [3].

Fréchet Inception Distance (FID) Score: The FID score is a metric that can be used to evaluate the quality of synthetic images by comparing the distribution of synthetically generated iris images to that of real images [119]. Since it is a distance score, the goal is to achieve a lower FID score, as this indicates a greater similarity between the synthetic and real datasets. In this research, we evaluate the realism of iris images generated using the proposed method and compare it with other existing methods for generating iris images.
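As a reference for how this metric is computed, a minimal sketch from Inception feature statistics is given below; it assumes the features have already been extracted and is illustrative rather than the exact evaluation code used here.

```python
import numpy as np
from scipy.linalg import sqrtm

# Minimal FID sketch: feat_real and feat_fake are Inception-v3 feature arrays (N x 2048).
def fid(feat_real, feat_fake):
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_f = np.cov(feat_fake, rowvar=False)
    cov_mean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(cov_mean):      # numerical noise can introduce tiny imaginary parts
        cov_mean = cov_mean.real
    # FID = ||mu_r - mu_f||^2 + Tr(cov_r + cov_f - 2 * (cov_r cov_f)^(1/2))
    return float(((mu_r - mu_f) ** 2).sum() + np.trace(cov_r + cov_f - 2 * cov_mean))
```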
For this, we generated 20,000 images (2,000 unique identities) using each of the methods mentioned earlier and calculated their FID scores against the real datasets used to train those networks. From this study, we found that the diffusion based GANs [147] produce the most realistic images, with the StyleGAN-3 based diffusion-GAN achieving average FID scores of 8.01, 9.34 and 8.64 when the networks are trained using the D1, D2 and D3 datasets, respectively. The proposed method, which is an image translative diffusion-GAN, obtained average FID scores of 11.03, 12.08 and 12.02 for D1, D2 and D3, respectively. A detailed analysis and comparison is shown in Figure 5.5.

ISO/IEC 29794-6 Standard Quality Metrics: The ISO standard uses a variety of criteria to assess the quality of an iris image. These criteria include the usable iris area, pupil shape, the contrast between the iris and sclera, sharpness, and the contrast between the iris and pupil, among others, to derive a comprehensive quality score. This score ranges from 0 to 100, with 0 representing the lowest quality and 100 the highest. Images that cannot be evaluated using this ISO metric, usually due to poor quality or segmentation errors, are given a score of 255. Similar to the experimental protocol described in [159], we compare the ISO scores obtained for real iris images and for synthetic iris images generated using StarGAN-v2, StyleGAN-3, iWarpGAN, Diffusion-GAN and IT-diffGAN. For a fair comparison, the number of generated images used in this experiment is equivalent to the number of irides in the real datasets D1, D2 and D3. As shown in Figure 5.7, the quality of images generated by our proposed method IT-diffGAN and by iWarpGAN is comparable to that of the real iris datasets. On the other hand, many images from StarGAN-v2 and StyleGAN-3 received a score of 255, and most of their other images received lower quality scores than the real images.

VeriEye Rejection Rate: We further emphasize the realism of the synthetic iris images generated by the proposed method by analyzing the number of images that are rejected by the commercial iris segmenter and matcher VeriEye. This software uses pre-defined parameters, similar to the ISO/IEC 29794-6 Standard Quality Metrics, to evaluate whether a given iris image meets the standard of an acceptable iris image, and it rejects images that do not meet those standards. As shown in Figure 5.6, the CASIA-Iris-Thousand dataset, containing 20,000 real iris images, exhibited a very low rejection rate of 0.06%; among the synthetic iris images obtained from networks trained using the train set of this dataset, StarGAN-v2 has the highest rejection rate of 0.27%. However, images generated using Diff-StyleGAN-3 and IT-diffGAN demonstrated better performance, with rejection rates of 0.11% and 0.13%, respectively. Similar behavior was observed for the IITD-iris dataset, containing 1,120 iris images, where the lowest rejection rate is obtained by Diff-StyleGAN-3 at 0.71%, followed by StyleGAN-3 and IT-diffGAN at 0.73% and 0.89%, respectively. Interestingly, it was observed that the real images in the CASIA-CSIR dataset have a higher rejection rate (2.81%) than the other real iris datasets, and also than the synthetic images from some of the generative methods, i.e., Diff-StyleGAN-3 has a rejection rate of 2.11%, followed by IT-diffGAN at 2.44%.

5.4.2 Experiment 2: Test of Uniqueness

This experiment investigates the ability of IT-diffGAN to generate identities different from the training data, with intra-class variations, in the synthetically generated iris images.
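The uniqueness protocol used in the two sub-experiments below can be summarized by the following sketch; similarity() is a hypothetical stand-in for the commercial VeriEye matcher, and the pairing of images with identity labels is an assumption made for illustration.

```python
from itertools import combinations

# Minimal sketch of the uniqueness protocol of Experiments 2(i) and 2(ii).
# real_set / synth_set: lists of (image, identity_label) pairs; similarity() is hypothetical.
def score_distributions(real_set, synth_set, similarity):
    # Experiment 2(i): similarity of every synthetic image to every real training image.
    real_vs_synth = [similarity(r, s) for r, _ in real_set for s, _ in synth_set]
    # Experiment 2(ii): genuine and impostor scores within the synthetic set.
    synth_genuine = [similarity(a, b)
                     for (a, ia), (b, ib) in combinations(synth_set, 2) if ia == ib]
    synth_impostor = [similarity(a, b)
                      for (a, ia), (b, ib) in combinations(synth_set, 2) if ia != ib]
    return real_vs_synth, synth_genuine, synth_impostor
```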
Uniqueness Test w.r.t. Training Data (Experiment-2(i)): This experiment evaluates the uniqueness of the identities in the generated images with respect to the training data used to train the proposed IT-diffGAN. This is done by computing the similarity score between the generated images and the training data using the commercial matcher VeriEye. We compare the uniqueness of the generated dataset with that of the images generated using iWarpGAN, StyleGAN-3 and StarGAN-v2.

Uniqueness Test within the Generated Dataset (Experiment-2(ii)): This experiment evaluates how unique the generated identities are from each other by computing the similarity between generated images using the VeriEye matcher. This helps evaluate the scalability of the proposed method in generating unique identities. For this experiment, we test for uniqueness among the synthetic images generated using the proposed method, iWarpGAN, StyleGAN-3 and StarGAN-v2.

Analysis: The experiments conducted in this work closely follow the protocols outlined in [159] to assess both inter-class and intra-class variations within the generated iris dataset. Additionally, the evaluation focuses on the distinctiveness of the generated iris identities when compared to the real iris images in the training dataset. As shown in Figure 5.11, the iris images created by IT-diffGAN and iWarpGAN demonstrate less similarity to the real training irides than those generated by other GAN methods [159]. However, the genuine score distribution of the dataset generated by the proposed method exhibits intra-class variations closer to the genuine score distribution of real images than iWarpGAN. This suggests that the proposed method effectively generates iris patterns with both inter- and intra-class variations. Furthermore, an analysis of the impostor distribution between the synthetic and genuine iris images reveals minimal overlap, reinforcing the uniqueness of the generated identities. Thus, it can be concluded that IT-diffGAN successfully creates fully synthetic iris images with identities distinct from those in the training set, while GAN based methods like StyleGAN-3 are limited to producing partially synthetic irides.

5.4.3 Experiment 3: Utility of Generated Iris Images

This experiment evaluates the performance of deep learning algorithms for iris recognition by training and testing them using a triplet training method. The study compares the outcomes when these algorithms are trained solely on real images versus a combination of real and synthetic iris images.

Baseline Experiment-3(i): In the baseline experiment, EfficientNet [66] and ResNet-101 [101] are trained following the triplet training method with hard triplet mining. The training and testing of these methods is done in a cross-dataset scenario, i.e., when trained using real irides from the CASIA-CSIR and IITD-iris datasets, testing is done on CASIA-Iris-Thousand.

Improvement Experiment-3(ii): In this experiment, EfficientNet [66] and ResNet-101 [101] are again trained following the triplet training method with hard triplet mining. However, the training and testing of these methods is done in a cross-dataset scenario augmented with the synthetically generated dataset, i.e., when trained using real and synthetic irides from the CASIA-CSIR and IITD-iris datasets, testing is done on real images from CASIA-Iris-Thousand.

Analysis: The results, as depicted in Figures 5.8, 5.9, and 5.10, show that the integration of synthetic data during training significantly enhances the performance of deep learning-based iris recognition systems.
This effect was particularly evident when both real and synthetic images were used, as it allowed the models to capture a broader range of intra-class and inter-class variations. For example, EfficientNet and ResNet-101, which initially demonstrated only moderate recognition capabilities when trained solely on real data, exhibited substantial accuracy improvements when synthetic images were included. The synthetic data, generated by iWarpGAN and IT-diffGAN, introduced additional variability in iris patterns that helped the models learn more robust and generalizable features.

The ROC analysis reveals that IT-diffGAN consistently improves the True Detection Rate (TDR) at various False Match Rate (FMR) thresholds. For instance, as shown in Figure 5.8, the ResNet-101 model achieves a TDR of 97.72% at FMR=1% when trained with the real dataset augmented using IT-diffGAN, outperforming both the baseline (87.87%) and iWarpGAN (94.13%). This trend continues with EfficientNet, which achieves a TDR of 96.64% at FMR=1%, again surpassing both the baseline and iWarpGAN. Similar improvements in performance are observed on the CASIA-CSIR and CASIA-Iris-Thousand datasets. The ROC curves clearly indicate an increase in the TDR for the cases where the dataset was augmented with the generated synthetic iris images. These observations highlight the advantage of using generative models like iWarpGAN and IT-diffGAN, which are capable of producing synthetic irides that not only preserve the variability necessary for training but also represent identities distinct from the original training set. This ability to generate new identities mitigates the risk of overfitting and improves the model's performance on unseen data, as demonstrated in our experiments.

As mentioned in the earlier chapter, the baseline under-performance of EfficientNet can be attributed to two main factors. First, the architectural differences between EfficientNet and ResNet are significant. EfficientNet's design emphasizes parameter efficiency through compound scaling, optimizing depth, width, and resolution together. While effective in many domains, this approach might not align well with the specific requirements of iris recognition, particularly when training data is limited. In contrast, ResNet's residual connections and straightforward structure make it robust across diverse tasks, including biometric recognition, even with smaller datasets. Second, the training requirements for EfficientNet differ from those of ResNet. EfficientNet's compound scaling may necessitate tailored learning rates, augmentation strategies, and regularization techniques to effectively capture the intricate patterns in iris images. Without such adjustments, its capacity to learn critical iris features may be limited. The inclusion of synthetic data appears to address these challenges by diversifying the training set, allowing EfficientNet to better adapt to the unique characteristics of iris recognition. This finding underscores the importance of designing training strategies specific to an architecture's strengths and weaknesses. Future research could explore architecture-specific synthetic data generation or custom training pipelines to further enhance model performance across a variety of biometric recognition systems.

5.5 Conclusion

The thorough analysis in the experimental section indicates that, unlike existing GANs, the proposed method excels in generating realistic iris images with identities distinct from those in the training dataset.
Additionally, the generated identities exhibit uniqueness among themselves, featuring some degree of variation. This study also highlights the practical benefits of the generated dataset, which enhances the performance of deep learning-based iris recognition systems by providing additional synthetic training data with numerous identities different from those in the training data. With the dataset used in this study, the proposed method demonstrates the capability to generate up to 11,139,840 images, encompassing 1,113,984 unique identities, with each identity having 10 samples. This large-scale generation capacity underscores the method’s potential for creating diverse and expansive datasets, which are essential for advancing the robustness of iris recognition systems.

Our method relies on image transformation, requiring an input image to modify the identity and an additional reference image to alter the style. This dependency may inherently constrain the number of distinct identities that can be generated by the proposed method. In future research, we aim to thoroughly investigate the method’s capacity for generating diverse identities and explore strategies to enhance the generalizability of the proposed framework. By addressing these limitations, we aim to ensure that the new identities learned by IT-diffGAN are not constrained by the original training set, thus expanding the feature space and improving its scalability and versatility.

(a) Performance of ResNet-101 when trained using the CASIA-Iris-Thousand and CASIA-CSIR datasets (additionally, synthetic iris images for the improvement experiment) and tested on the IITD-iris dataset. (b) Performance of EfficientNet-B0 when trained using the CASIA-Iris-Thousand and CASIA-CSIR datasets (additionally, synthetic iris images for the improvement experiment) and tested on the IITD-iris dataset.
Figure 5.8: This figure illustrates the performance of EfficientNet-B0 and ResNet-101 in a cross-dataset evaluation scenario.

(a) Performance of ResNet-101 when trained using the CASIA-Iris-Thousand and IITD-iris datasets (additionally, synthetic iris images for the improvement experiment) and tested on the CASIA-CSIR dataset. (b) Performance of EfficientNet-B0 when trained using the CASIA-Iris-Thousand and IITD-iris datasets (additionally, synthetic iris images for the improvement experiment) and tested on the CASIA-CSIR dataset.
Figure 5.9: This figure illustrates the performance of EfficientNet-B0 and ResNet-101 in a cross-dataset evaluation scenario.

(a) Performance of ResNet-101 when trained using the CASIA-CSIR and IITD-iris datasets (additionally, synthetic iris images for the improvement experiment) and tested on the CASIA-Iris-Thousand dataset. (b) Performance of EfficientNet-B0 when trained using the CASIA-CSIR and IITD-iris datasets (additionally, synthetic iris images for the improvement experiment) and tested on the CASIA-Iris-Thousand dataset.
Figure 5.10: This figure illustrates the performance of EfficientNet-B0 and ResNet-101 in a cross-dataset evaluation scenario.

(a) This figure illustrates the uniqueness of iris images produced by IT-diffGAN, where the GANs were trained on the IIT-Delhi iris dataset. (b) This figure illustrates the uniqueness of iris images produced by IT-diffGAN, where the GANs were trained on the CASIA-CSIR dataset.
Figure 5.11: This figure illustrates the uniqueness of iris images produced by IT-diffGAN. The y-axis displays similarity scores obtained from VeriEye. In the figure, R stands for Real, S for Synthetic, Gen for Genuine, and Imp for Impostor.
As can be seen, the similarity between images generated by IT-diffGAN is the lowest among the compared GANs, followed by iWarpGAN.

CHAPTER 6
MULTI-DOMAIN IMAGE TRANSLATIVE DIFFUSION STYLEGAN WITH APPLICATION IN IRIS PRESENTATION ATTACK DETECTION

So far, we have focused on generating partially and fully synthetic iris images in the NIR spectrum. In this chapter, we explore the capabilities and weaknesses of current methods in generating partially synthetic ocular PA images. We also propose a novel method, known as MID-StyleGAN, to generate ocular PA images (with both inter- and intra-class variations).

6.1 Introduction

Iris-based biometric systems are known for their reliability and contactless recognition [71]. However, as these systems become more widespread, they are increasingly targeted by presentation attacks (PAs), where attackers attempt to deceive the system using artifacts such as printed images, textured cosmetic contact lenses, artificial eyes, etc. to either impersonate a real individual or obfuscate their own identity [34]. Detecting such attacks is important for a secure iris recognition system, but is hampered by the limited availability of ocular datasets. This lack of data makes it difficult to adequately train models to recognize the subtle differences between bonafide images and PAs, particularly when considering the wide range of variations within and across different PAs (such as printed eyes and cosmetic contact lenses). One solution to overcome this challenge is to augment the training data with realistic synthetic iris images. These synthetic datasets can help in developing and evaluating PA detection algorithms, ensuring that they are robust against a wide range of attacks [82, 156, 158].

The generation of realistic synthetic biometric data has been explored in various studies. The more recent methods employ generative adversarial networks (GANs) to produce synthetic images [45, 47, 61, 140, 151, 156]. GANs typically take a random noise vector as input and generate a realistic image from it. For instance, Kohli et al. [82] proposed iDCGAN for synthesizing cropped iris images from random noise. While they showed good results for cropped iris images of size 64×64, this method struggles to generate higher-resolution images and fails at generating ocular images. Yadav et al. [156] utilized the Relativistic Average Standard Generative Adversarial Network (RaSGAN) to generate high-resolution iris images from random noise input. Another category of GANs, focused primarily on tasks such as image editing and domain-specific style transfer, consists of image-translative GANs that take an image as input and generate a synthetic image according to the conditions specified for image translation. For example, Richardson et al. [116] proposed pSp, an image-translative StyleGAN that takes a face image as input and generates a synthetic image with altered style attributes such as hair color and expression while keeping intact the characteristics that define the identity of the face in the given input image. In another work, Yadav et al. [158] proposed CIT-GAN, which utilizes paired training data to translate a source iris image into a synthetic image that incorporates the attributes from a target domain defined using a reference iris image. This process allows the generator to map across different domains, making it versatile for multiple applications.
While these methods offer significant improvements over traditional methods for synthetic iris image generation [32, 126, 173], the quality of the generated images degrades for ocular images, where GANs sometimes focus too much on the non-iris parts of the image (such as eyelashes) while failing to capture the intricate details of the iris. In this chapter, we address the problem of generating realistic, high-resolution ocular images while overcoming the shortcomings of GANs (mode collapse, unstable training, etc.).

Ocular images provide richer context and additional information compared to cropped iris images: they include not only the iris but also surrounding regions such as the sclera, eyelashes, and eyelids. These elements play a key role in many biometric applications such as PA detection (PAD), where adversarial artifacts might appear beyond the iris itself. Additionally, generating ocular images facilitates the development of more robust machine learning models that can handle diverse real-world scenarios. To generate such images with rich contextual information, we propose a novel approach, known as Multi-domain Image Translative Diffusion StyleGAN (MID-StyleGAN), to generate realistic, high-resolution synthetic ocular PA datasets. This method combines the strengths of StyleGAN [75, 76] and diffusion models [147, 153] for high-fidelity ocular image synthesis while utilizing a multi-domain discriminator and an image encoder for smooth transitions and variations across multiple PA domains, i.e., the discriminator is responsible for distinguishing between real and synthetic images, as well as classifying the domain of the image (e.g., determining whether the image belongs to the bonafide, printed eyes, or cosmetic contact lens domain). Also, the ocular image encoder utilizes feedback from the discriminator to learn domain-specific knowledge. This helps the network to better learn image translation from the source to the target domain. In Section 6.4, we will show that the images generated using the proposed method are not only more realistic than those from other GAN methods, but also capture the inter- and intra-class variations within the domains. We will also show how the generated dataset can be utilized to augment ocular PA training datasets for enhancing the performance of PA detection (PAD) methods. Hence, the contributions of this chapter can be summarized as follows:

• We propose a Multi-domain Image Translative Diffusion StyleGAN (MID-StyleGAN) to generate realistic, high-resolution synthetic ocular PA datasets to augment the training data for PA detection.

• The proposed method (a) utilizes diffusion models in combination with GANs to generate high-resolution, realistic synthetic images, (b) employs a multi-domain discriminator that is scalable to multiple domains, and (c) promotes domain transfer using conditional adversarial training and a domain transfer loss.

• We compare and analyze the realism of ocular images generated by our proposed method with other methods in the literature.

• We evaluate the utility of the generated ocular PA dataset for enhancing the performance of a DNN-based PA detector.
Figure 6.1: Illustration of the proposed method, which has three modules: (1) the Encoder, E, which takes an image and its domain label as input and outputs the encoded image, (2) the Generator, G, which takes the encoded image as input along with the target domain label to which the input image has to be translated, and (3) the Discriminator, D, which takes an image and label as input and outputs the probability of the image belonging to each domain as well as whether the image is real or synthetic.

Figure 6.2: Samples of ocular images generated using the proposed method, MID-StyleGAN. The proposed method is capable of generating not only multiple domains in the PA datasets but also the intra-class variations present in different types of PAs.

6.2 Background

6.2.1 Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) [56] are powerful generative models that typically take a random noise vector as input to output a realistic-looking synthetic image. A GAN comprises two core components: (1) a Generative Network, referred to as the Generator (G), and (2) a Discriminative Network, known as the Discriminator (D). These two networks are trained in a competitive setting, where the Generator’s goal is to create images that are realistic enough to deceive the Discriminator, while the Discriminator’s task is to differentiate between authentic images and those generated by G. The objective function for GANs is designed to capture this adversarial process and is typically formulated as a min-max game between the Generator and Discriminator. Mathematically, it can be expressed as [56],

\[
\min_{G} \max_{D} \; \mathcal{L}(D, G) = \mathbb{E}_{x \sim \mathcal{P}}[\log(D(x))] + \mathbb{E}_{z \sim \mathcal{Z}}[\log(1 - D(G(z)))]. \tag{6.1}
\]

Here, x ∼ P refers to the real source distribution and z ∼ Z refers to the random noise distribution. Also, D(x) determines whether x is real or synthetic, while G(z) outputs the generated image.

6.2.2 Diffusion based GANs

Researchers have made significant progress in improving the quality and realism of images generated by GANs. However, challenges such as artifacts, distortions, and mode collapse (where the generator fails to capture the full diversity of the target distribution) still persist. Diffusion Generative Adversarial Networks (Diffusion-GANs) [147] offer a promising approach to address these issues. In Diffusion-GANs, the image generation process involves iteratively refining a noise-corrupted image to produce high-quality samples. This refinement process gradually diffuses noise through the image, effectively smoothing it out to reveal the underlying structure and details,

\[
x_t = x_0 + \beta_t \epsilon \tag{6.2}
\]

where x_t is the image at time-step t, x_0 is the original image, β_t is the diffusion coefficient, and ε is the noise added at each step. Combining the strengths of GANs and diffusion models, Diffusion-GANs leverage the diffusion process to guide the generator in producing realistic samples, while the adversarial component ensures that the generated data is indistinguishable from real data. To integrate the diffusion process into generative adversarial training, the original min-max objective [56] is modified as follows [147]:

\[
\mathcal{L}(D, G) = \mathbb{E}_{x \sim p(x),\, t \sim p_\pi,\, y \sim q(y|x,t)}[\log(D(y, t))] + \mathbb{E}_{z \sim p(z),\, t \sim p_\pi,\, y_g \sim q(y|G_\theta(z),t)}[\log(1 - D(y_g, t))] \tag{6.3}
\]

Here, p(x) refers to the real data distribution and p_π refers to a discrete distribution that assigns weights to each diffusion step t ∈ {1, ..., T}. Also, y and y_g refer, respectively, to the noisy counterpart of the real image x and the noisy counterpart of the generated image.
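As a concrete illustration of Eq. (6.2), the following is a minimal sketch of the noise-injection step applied to a batch of images before they are passed to the discriminator. The linear schedule assumed for β_t and the tensor shapes used here are illustrative, not the exact settings of [147].

```python
import torch

def diffuse(x0, t, T, beta_max=0.5):
    # x_t = x_0 + beta_t * eps, with an assumed linear schedule beta_t = beta_max * t / T.
    beta_t = beta_max * float(t) / float(T)
    eps = torch.randn_like(x0)          # Gaussian noise added at this step
    return x0 + beta_t * eps

# Example: diffuse a batch of (real or generated) images at a sampled step.
x0 = torch.rand(8, 3, 256, 256)         # batch of images in [0, 1]
T = 10                                  # current maximum number of diffusion steps
t = int(torch.randint(1, T + 1, (1,)))  # uniform here; p_pi may weight the steps differently
x_t = diffuse(x0, t, T)
```

Both the real image x and the generator output are diffused in this way, and the discriminator always sees the noisy versions y and y_g together with the step index t.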
With the introduction of diffusion and time steps in adversarial training, Diffusion-GANs require a new optimization strategy for the discriminator (D) to effectively distinguish between real and synthetic images. This is achieved by having the discriminator learn from the simplest examples (with no noise) while gradually increasing the noise-to-data ratio. A self-paced schedule is used to determine the number of diffusion steps (T), based on a discriminator over-fitting metric (r_d). This metric, derived from [75], evaluates the discriminator’s confidence relative to the data:

\[
r_d = \mathbb{E}_{x_n, t \sim p(x_n, t)}[\mathrm{sign}(D(x_n, t) - 0.5)], \qquad T = T + \mathrm{sign}(r_d - d_{\text{target}}) \times C \tag{6.4}
\]

The schedule adjusts T based on the deviation of r_d from a target value, with a constant factor (C) influencing the rate of change. r_d is recalculated and T is updated after every four mini-batches [75].

6.3 Proposed Method

The proposed Multi-domain Image Translative Diffusion StyleGAN (MID-StyleGAN) model is designed to generate synthetic ocular images that accurately reflect the diversity found in real-world datasets (as shown in Figure 6.1). This method utilizes StyleGAN-3 as the backbone architecture while incorporating diffusion to generate high-resolution, realistic ocular images. The discriminator D in the proposed method is responsible for distinguishing between real and synthetic images, as well as classifying the domain of the image (e.g., determining whether the image is a bonafide or one of many PA types). Unlike traditional discriminators that output a binary decision, our discriminator uses a multi-class output to support domain-specific classification. Also, the image encoder E learns to encode an ocular image while utilizing feedback from the discriminator to learn domain-specific knowledge. This helps the network to learn a smooth image translation from the source to the target domain. To achieve this, the adversarial loss in Eqn. 6.3 has to be modified as,

\[
\mathcal{L}(D, G) = \mathbb{E}_{x \sim p(x),\, t \sim p_\pi,\, y \sim q(y|x,t)}[\log(D(y, t))] + \mathbb{E}_{t \sim p_\pi,\, y_g \sim q(y|G_\theta(E(x,s),c),t)}[\log(1 - D(y_g, t))] \tag{6.5}
\]

Here, c refers to the target domain and E(·) refers to the image encoder E that takes an image x and its domain label s as input. y and y_g refer to the real noisy and generated noisy images, respectively, at step t. The sub-network in D for domain classification helps promote domain transfer using the following loss functions,

\[
\mathcal{L}_{\text{domain}}^{\text{real}} = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D_{\text{domain}}(s|y)] \tag{6.6}
\]
\[
\mathcal{L}_{\text{domain}}^{\text{synthetic}} = \mathbb{E}_{x_g \sim q(x_g)}[\log D_{\text{domain}}(c|y_g)] \tag{6.7}
\]
\[
\mathcal{L}_{\text{domain}} = \mathcal{L}_{\text{domain}}^{\text{real}} + \mathcal{L}_{\text{domain}}^{\text{synthetic}} \tag{6.8}
\]

Here, D_domain represents the domain classifier component of the discriminator. In L_domain^real, the goal is to ensure that the discriminator assigns a high probability to the correct source domain s for a real image. Similarly, L_domain^synthetic encourages samples generated by G to be classified as belonging to the target domain c. In order to ensure that the encoder learns to translate the input image x to a latent code z that represents the iris and PA distributions, we define the content preservation losses as,

\[
\mathcal{L}_{\text{recon}} = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[ \| G(E(x, s), c) - x \|_2^2 \right] \tag{6.9}
\]
\[
\mathcal{L}_{\text{LPIPS}} = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[ \sum_{l} \| \phi_l(G(E(x, s), c)) - \phi_l(x) \|_2^2 \right] \tag{6.10}
\]

Here, φ_l(·) represents the features extracted at layer l of a pre-trained network (https://github.com/TreB1eN/InsightFace_Pytorch).
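The following is a minimal sketch of how the domain-classification terms in Eqs. (6.6)-(6.8) can be computed, assuming the discriminator exposes a domain-classification head alongside its real/synthetic output. The two-output interface and the function names are assumptions; how the two terms are split between the discriminator and generator updates follows the training schedule and is not prescribed here.

```python
import torch
import torch.nn.functional as F

def domain_classification_loss(D, y_real, s_real, y_gen, c_target, t):
    """Domain terms of Eqs. (6.6)-(6.8). D is assumed to return a pair
    (adv_logit, domain_logits); only the domain head is used here."""
    _, dom_logits_real = D(y_real, t)   # diffused real images with source-domain labels s_real
    _, dom_logits_gen = D(y_gen, t)     # diffused generated images with target-domain labels c_target

    # Cross-entropy is the negative of the log-probability terms in Eqs. (6.6)-(6.7),
    # so minimizing it pushes the corresponding domain probabilities up.
    loss_real = F.cross_entropy(dom_logits_real, s_real)
    loss_synthetic = F.cross_entropy(dom_logits_gen, c_target)
    return loss_real + loss_synthetic   # L_domain, Eq. (6.8)
```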
To ensure that the network does not drastically alter an image that is already in the target domain, and to encourage learning of intra-class variations, we define the following loss:

\[
\mathcal{L}_{\text{inr}} = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[ \| G(E(x, c), c) - x \|_2^2 \right] \tag{6.11}
\]

Further, as described in [75], to ensure diversity and encourage consistent image quality across domains, we employ style-mixing and path-length regularization techniques. The objective is to prevent the generator from becoming too reliant on a single latent vector for generating images. The regularization terms are defined as:

\[
\mathcal{L}_{\text{mix}} = \sum_{i=1}^{L} \| G(E(x_1, s_1), c) - G(E(x_2, s_2), c) \|_2^2 \tag{6.12}
\]
\[
\mathcal{L}_{\text{pl}} = \mathbb{E}_{w}\left[ \| \nabla_w G(w) - \alpha \|_2^2 \right] \tag{6.13}
\]

Here, E(x_1, s_1) and E(x_2, s_2) are embeddings obtained from the image encoder E for two different inputs, and L is the number of layers in the generator.

6.4 Experiments & Analysis

6.4.1 Datasets & PA Detection Methods Used

In this research, we utilized four different PA datasets, viz., D1: Berc-iris-fake [91], D2: CASIA-iris-fake [136], D3: LivDet-2017 [162] and D4: the test set of LivDet-2020 [36] (LivDet-2020 does not have a training set) for training and testing different iris presentation attack detection (PAD) algorithms. These ocular PA datasets contain bonafide images and images from different PA classes such as cosmetic contact lenses, printed eyes and artificial eyes. Each dataset is divided into train and test sets using a 70-30 split on each domain (bonafide, printed eyes and cosmetic contact lens).

The proposed generative network, MID-StyleGAN, used in this research is trained using the train set of the LivDet-2017 dataset, which contains bonafide, printed eyes and cosmetic contact lens images (three domains). Using the trained network, we generate 10,000 synthetic ocular images per domain. We evaluate these images for realism and utility in the sub-sections below.

6.4.2 Realism Assessment

With the rapid development of DeepFake technology, researchers have been exploring various approaches to assess the quality of synthetically generated data. Salimans et al. [120] introduced the use of a pre-trained Inception-V3 model to compute the inception score by comparing the marginal and conditional label distributions of the synthetic data. A higher inception score indicates better quality of the generated data. However, this method does not account for the real data distribution in its calculations. To address this, Heusel et al. [63] proposed the Fréchet Inception Distance (FID) score, which compares the statistics of the real data with those of the synthetically generated data:

\[
FID = \| \mu_r - \mu_s \|^2 + \mathrm{Tr}\left( \Sigma_r + \Sigma_s - 2\sqrt{\Sigma_r \Sigma_s} \right). \tag{6.14}
\]

In this equation, μ_s, μ_r, Σ_s, and Σ_r represent the statistics of the synthetic (s) and real (r) distributions, respectively. Since FID measures the distance between these two distributions, a lower FID score indicates better quality of the generated data. As described earlier, for this experiment we train MID-StyleGAN with the train set of the LivDet-2017 dataset and generate 10,000 images for each domain (bonafide, printed eyes and cosmetic contact lens) using test-set images from D1, D2, D3 and D4 as source images. For the generated images, a realism score is calculated against the distribution of real (source) images using FID. For the comparative study, we utilize CIT-GAN [158], StyleGAN-3 [76] and diffusion-based StyleGAN-3 (diff-StyleGAN-3) [147].
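A small sketch of Eq. (6.14) is given below, assuming that Inception-V3 activations have already been extracted for the real and synthetic image sets; the function and variable names are illustrative.

```python
import numpy as np
from scipy import linalg

def fid_score(act_real, act_syn):
    """Frechet Inception Distance (Eq. 6.14) from pre-extracted Inception activations,
    where rows are samples and columns are feature dimensions."""
    mu_r, mu_s = act_real.mean(axis=0), act_syn.mean(axis=0)
    sigma_r = np.cov(act_real, rowvar=False)
    sigma_s = np.cov(act_syn, rowvar=False)

    covmean, _ = linalg.sqrtm(sigma_r @ sigma_s, disp=False)   # matrix square root
    if np.iscomplexobj(covmean):
        covmean = covmean.real   # discard tiny imaginary parts caused by numerical noise

    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(sigma_r + sigma_s - 2.0 * covmean))
```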
Analysis: The analysis of FID scores reveals that MID-StyleGAN performs best, producing the highest-quality images with FID scores averaging 19.71, the lowest among the compared methods. In contrast, both StyleGAN-3 (average FID of 139.22) and CIT-GAN (average FID of 257.41) exhibit inconsistent performance, reflecting significant variability across domains. Specifically, the printed-eyes domain introduces higher FID scores, leading to poorer overall performance. The presence of multiple peaks suggests that these models struggle to maintain consistent quality across different types of synthetic data, especially for printed eyes (see Figure 6.3).

6.4.3 Utility of Generated Dataset

In this section, we describe the experimental setups used to evaluate the usefulness of the synthetic ocular PA dataset generated using MID-StyleGAN by evaluating the performance of different deep learning-based iris PA detection methods, viz., [130], VGG-19 [134], ResNet-101 [62], MobileNet-v2 [121], AlexNet [85] and D-NetPAD [129]. The experiments in this section are done in a cross-dataset scenario, i.e., if the PA detectors are trained on the train set of D1, then they are tested on the test sets from D3 and D4.

As mentioned earlier, D1: Berc-iris-fake [91] has a total of 2,778 bonafide and 1,820 PA images, divided using a 70-30 split on each domain (bonafide, printed eyes and cosmetic contact lens), i.e., the train set has 1,944 bonafide and 1,274 PA images while the test set has 834 bonafide and 546 PA images. For D3: LivDet-2017 [162], the train-test partition is already provided, with 6,563 bonafide and 9,137 PA images in the train set and 5,511 bonafide and 9,356 PA images in the test set. We have only the test set for the D4: LivDet-2020 [36] dataset, which has 5,330 bonafide and 6,007 PA images (excluding the post-mortem iris images in this dataset, which are not the focus of our study). Note that since the proposed method is an image-translative generative method, we set aside D2: CASIA-iris-fake [136] to be used for synthetic image generation with domain transfer. This ensures that the generated images have no overlap with any images in the test sets. This dataset has a total of 6,000 bonafide images and 1,780 PA images.

6.4.3.1 Baseline Experiment-0

In this experiment, we set the baseline for various PA detectors for ocular PA detection, i.e., training and testing of the detectors are done on real images from iris PA datasets. As mentioned above, the training and testing is done in the cross-dataset setup, which means that if training is done on the train set of D1, the testing is done on the test sets from D3 and D4. Similarly, when training is done on the train set of D3, the test sets have images from D1 and D4.

6.4.3.2 Utility Experiment-1

In this experiment, we aim to evaluate the usefulness of the synthetic dataset in training more reliable and secure PA detection techniques. For this, unlike the baseline experiment, the PA detection methods are trained using both real and synthetically generated PA datasets. The synthetic images for the various domains (bonafide, printed eyes and cosmetic contact lens) are generated using MID-StyleGAN trained on images from D2: CASIA-iris-fake. 7,000 images per domain are generated, which are then utilized to enhance the detectors’ performance by augmenting the training set with more images that have domain-level variations.
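All results below are reported as the True Detection Rate (TDR) at a fixed False Detection Rate (FDR). A minimal sketch of this metric is given here, assuming the detector outputs a score that is higher for PAs; the thresholding convention is an assumption for illustration.

```python
import numpy as np

def tdr_at_fdr(bonafide_scores, pa_scores, fdr=0.01):
    """TDR at a fixed FDR: the threshold is set so that at most `fdr` of the
    bonafide samples are falsely flagged as PAs, and TDR is the fraction of
    PA samples whose scores clear that threshold."""
    threshold = np.quantile(np.asarray(bonafide_scores), 1.0 - fdr)
    return float(np.mean(np.asarray(pa_scores) >= threshold))

# Example with illustrative random scores.
rng = np.random.default_rng(0)
bona = rng.normal(0.2, 0.1, 1000)   # PA scores of bonafide test images
pa = rng.normal(0.8, 0.1, 1000)     # PA scores of attack test images
print(tdr_at_fdr(bona, pa, fdr=0.01))
```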
Analysis: After evaluating the performance of various iris PA detectors on different datasets, we analyzed how the number of samples across domains and the variations in the training set affect the performance of the PA detectors. This is evident when comparing the baseline performance of the detectors trained with the comparatively smaller D1: Berc-iris-fake dataset (as shown in Table 6.3) versus when trained using the LivDet-2017 dataset (as shown in Table 6.1). As seen from the tables, the VGG-19 detector trained using D1 obtained a TDR of 27.40% at 1% FDR when tested on the D4: LivDet-2020 dataset, while it obtained a TDR of 73.08% at 1% FDR on D4 when trained using D3: LivDet-2017 (which has a larger number of samples and more variations). Similar behavior was noticed for the other PA detectors.

Further evidence of the effect of the number of samples per domain and the variations in the training set on the performance of PA detectors is obtained by comparing their performance when the training set is augmented with synthetic ocular samples, which introduce more samples per domain with intra-class variations. Comparing the performance of the PA detectors in Table 6.3 with Table 6.4 and Table 6.1 with Table 6.2, it can be clearly seen that the performance of the detectors improves after augmenting the training set with synthetic samples. For example, in Tables 6.1 and 6.2, the performance of D-NetPAD [129] when tested on D1 improves from 93.41% TDR to 98.72% TDR at 1% FDR. Similar behavior was seen for the other detectors as well.

Table 6.1: True Detection Rate (TDR) at different False Detection Rates (FDRs) for Baseline Experiment-0 when PA detectors are trained on D3: LivDet-2017 and tested using test sets from D1: Berc-iris-fake and D4: LivDet-2020.

Dataset  Metric          VGG-19 [134]  AlexNet [85]  ResNet-101 [62]  MobileNet-v2 [121]  D-NetPAD [129]
D1       TDR @ 1% FDR    92.12         93.77         99.08            97.44               93.41
D1       TDR @ 2% FDR    97.99         95.97         99.45            97.99               95.60
D1       TDR @ 5% FDR    99.08         97.99         99.63            98.90               99.08
D4       TDR @ 1% FDR    70.63         48.96         75.73            76.51               70.15
D4       TDR @ 2% FDR    74.86         54.47         80.66            79.96               75.16
D4       TDR @ 5% FDR    80.86         61.69         88.30            84.40               82.05

Table 6.2: True Detection Rate (TDR) at different False Detection Rates (FDRs) for Utility Experiment-1 when PA detectors are trained on D3: LivDet-2017 + Synthetic images and tested using test sets from D1: Berc-iris-fake and D4: LivDet-2020.

Dataset  Metric          VGG-19 [134]  AlexNet [85]  ResNet-101 [62]  MobileNet-v2 [121]  D-NetPAD [129]
D1       TDR @ 1% FDR    94.51         97.99         99.27            98.90               98.72
D1       TDR @ 2% FDR    97.99         97.99         99.45            99.08               98.72
D1       TDR @ 5% FDR    100           98.90         99.82            99.63               99.72
D4       TDR @ 1% FDR    73.08         54.60         83.20            79.39               70.88
D4       TDR @ 2% FDR    77.36         60.91         88.66            83.39               76.21
D4       TDR @ 5% FDR    84.07         68.89         92.73            86.97               83.05

Table 6.3: True Detection Rate (TDR) at different False Detection Rates (FDRs) for Baseline Experiment-0 when PA detectors are trained on D1: Berc-Iris-Fake and tested using test sets from D3: LivDet-2017 and D4: LivDet-2020.

Dataset  Metric          VGG-19 [134]  AlexNet [85]  ResNet-101 [62]  MobileNet-v2 [121]  D-NetPAD [129]
D3       TDR @ 1% FDR    38.04         24.07         44.60            55.70               51.33
D3       TDR @ 2% FDR    41.78         28.24         47.82            58.69               53.60
D3       TDR @ 5% FDR    49.59         35.44         53.78            65.28               58.12
D4       TDR @ 1% FDR    27.40         17.86         34.26            20.79               21.03
D4       TDR @ 2% FDR    38.72         24.90         45.60            24.72               25.36
D4       TDR @ 5% FDR    55.88         38.22         61.54            29.58               32.89

(a) This histogram shows the FID scores of generated images using four generative methods: the proposed MID-StyleGAN, diff-StyleGAN-3, StyleGAN-3, and CIT-GAN.
MID-StyleGAN achieves the lowest FID scores, indicating better image quality, while CIT-GAN and StyleGAN-3 show higher FID scores and inconsistencies, especially across domains like printed eyes. (b) This histogram breaks down MID-StyleGAN’s FID scores for the three domains: Bonafide, Printed eyes, and Cosmetic Contact Lens. The Bonafide set has the lowest average FID, showing higher quality, while Printed eyes introduces higher variability and poorer performance.
Figure 6.3: Comparison of FID scores across multiple generative methods with respect to the proposed method. The first plot shows performance across all methods, while the second focuses on the realism of images generated using MID-StyleGAN across different domains.

Table 6.4: True Detection Rate (TDR) at different False Detection Rates (FDRs) for Utility Experiment-1 when PA detectors are trained on D1: Berc-Iris-Fake + Synthetic images and tested using test sets from D3: LivDet-2017 and D4: LivDet-2020.

Dataset  Metric          VGG-19 [134]  AlexNet [85]  ResNet-101 [62]  MobileNet-v2 [121]  D-NetPAD [129]
D3       TDR @ 1% FDR    43.76         28.99         51.85            57.11               54.36
D3       TDR @ 2% FDR    49.19         34.67         56.02            60.71               57.87
D3       TDR @ 5% FDR    58.52         43.73         65.29            68.57               65.12
D4       TDR @ 1% FDR    14.99         18.91         42.07            24.75               52.99
D4       TDR @ 2% FDR    43.67         26.24         53.14            33.19               64.37
D4       TDR @ 5% FDR    60.51         39.74         68.35            50.36               75.73

6.4.4 Ablation Study

To rigorously evaluate the effectiveness of the proposed MID-StyleGAN, we conducted two distinct types of ablation studies. These studies provide a comprehensive understanding of the model’s contributions and limitations in the context of ocular presentation attack detection (PAD).

First, we systematically removed key components of the MID-StyleGAN architecture to assess their individual contributions to overall performance. By isolating elements such as the adaptive loss function, multi-domain architecture, and diffusion-based components, we measured how each feature impacts the quality and diversity of synthetic ocular images. The results of this analysis demonstrated that the inclusion of these components significantly enhances the model’s ability to generate realistic and domain-consistent images, which are crucial for improving PAD system performance.

Second, we performed a capacitive study to explore the iterative potential of synthetic data generation using MID-StyleGAN. In this approach, the proposed method was trained using real data to produce the first set of synthetic samples. These synthetic samples were then used to train the GAN again, generating a second set of synthetic data. The process was repeated iteratively to produce a third set of synthetic data. For each generated set, we computed the Fréchet Inception Distance (FID) score to measure its similarity to real data, allowing us to track how closely the synthetic samples approximated real data over successive iterations. Furthermore, each set of synthetic data was utilized to train PAD methods, and their performance was analyzed. The results revealed the extent to which iterative training cycles influence the quality of synthetic data and its effectiveness in improving PAD performance.

6.4.4.1 Studying Components of Proposed Method

Effect of Style Mixing Regularization: Style mixing regularization plays a vital role in encouraging diversity in the generated images by mixing styles from different layers. To assess its impact, we conducted experiments with and without this regularization. When style mixing was removed, we observed a noticeable drop in image quality and diversity.
The generated images tended to lack variability across different domains, which hindered their utility for cross-domain analysis in presentation attack detection. The average FID score increased (worsened) by approximately 9.79%, indicating degraded image quality.

Impact of Path Length Regularization: Path length regularization ensures smoother transitions in the latent space and improves the consistency of generated images. We performed experiments by disabling this regularization. The results showed that, without path length regularization, the generator produced less consistent outputs, with occasional abrupt changes in image features. The average FID score worsened by approximately 8.12%, and visual inspection of the generated images revealed artifacts that negatively impacted their realism. This regularization was particularly critical for maintaining smooth transitions between different ocular domains.

Role of Domain-Specific Discriminator: The multi-domain discriminator in MID-StyleGAN was specifically designed to handle domain transfer by discriminating images based on their target domain. We conducted an experiment by replacing the multi-domain discriminator with a standard single-domain discriminator. Without the multi-domain capability, the model struggled to enforce domain-specific characteristics in the generated images. The generated PA samples lacked clear domain-specific features, and domain confusion was evident. The average FID score worsened by 20.70%, suggesting a significant decrease in the quality of the generated domain-transferred images.

Effect of Content Preservation Loss: We further analyzed the effect of the content preservation loss (reconstruction loss) by removing it from the objective function. In this setting, the model generated images that diverged significantly from the input samples, with important features being lost during the domain transfer process. This loss function is crucial for ensuring that key ocular features are retained, even when the domain is altered. Without this component, the model’s capacity for realistic and recognizable presentation attack generation was severely compromised.

Each component of the proposed MID-StyleGAN contributes significantly to the overall success of the model. Style mixing and path length regularization enhance the diversity and smoothness of the generated images, while the multi-domain discriminator is critical for domain-specific image generation. Content preservation ensures that key ocular features are retained during domain transfer, while the absence of an explicit identity-preservation constraint is acceptable for the intended task of presentation attack detection, where the focus is on detecting attacks rather than on biometric recognition. The proposed architecture achieves an optimal balance of these components, resulting in high-quality domain-transferred images with favorably low FID scores across the various ablation settings.

6.4.4.2 Capacitive Study

For this study, the proposed method was trained using real data to produce the first set of synthetic samples (Synthetic-1) for multiple domains (bonafide, printed eyes and cosmetic contact lens). These synthetic samples were then used to train MID-StyleGAN again, generating a second set of synthetic data (Synthetic-2). The process was repeated iteratively to produce a third set of synthetic data (Synthetic-3).
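The iterative procedure can be summarized by the sketch below; `train_fn`, `generate_fn` and `fid_fn` are caller-supplied callables standing in for the actual MID-StyleGAN training, sampling and FID evaluation pipeline, and are hypothetical rather than part of any released code.

```python
def capacitive_study(real_data, train_fn, generate_fn, fid_fn, generations=3):
    """Retrain the generator on its own output for a few generations and
    track the FID of each synthetic set against the real data."""
    fids = []
    training_data = real_data
    for _ in range(generations):
        model = train_fn(training_data)            # trained on real data, then on Synthetic-k
        synthetic = generate_fn(model)             # e.g., 10,000 images per domain
        fids.append(fid_fn(real_data, synthetic))  # FID is always measured against real data
        training_data = synthetic                  # Synthetic-k trains the next generation
    return fids
```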
For each generated set, we computed the Fréchet Inception Distance (FID) score to measure its similarity to real data, allowing us to track how closely the synthetic samples approximated real data over successive iterations. Furthermore, each set of synthetic data was utilized to train PAD methods, and their performance was analyzed. For all the experiments in this section, we utilized the same experimental setup as described in the experiments section.

Analysis on Synthetic-1: This is the set of synthetic data that was generated when MID-StyleGAN was trained using real samples, and it has been utilized for the realism and utility analysis in the experimental section. As mentioned before, this set of synthetic data (Synthetic-1) obtained an average FID score of 19.71 over 30,000 images, with 10,000 images per domain, indicating the realism of the generated samples with respect to real data. Also, comparing the performance of the PA detectors in Table 6.3 with Table 6.4 and Table 6.1 with Table 6.2, it can be clearly seen that the performance of the detectors improves after augmenting the training set with synthetic samples. For example, in Tables 6.1 and 6.2, the performance of D-NetPAD [129] when tested on D1 improves from 93.41% TDR to 98.72% TDR at 1% FDR. Similar behavior was seen for the other detectors as well.

Analysis on Synthetic-2: This is the set of synthetic data that was generated when MID-StyleGAN was trained using Synthetic-1. The number of samples utilized for training is similar to the number of samples from real data, for a fair comparison. We observed an average FID score of 20.36 for the generated data (Synthetic-2), where, as before, the score was obtained on 30,000 images with 10,000 images from each domain. Some effect was observed on the performance of the PAD methods when trained using this dataset (Real + Synthetic-2). For example, the TDRs of VGG-19 and D-NetPAD at 1% FDR on the D4 test set improved by 0.23% and 0.18%, respectively, in comparison to when the detectors were trained using Real + Synthetic-1.

Analysis on Synthetic-3: This is the set of synthetic data that was generated when MID-StyleGAN was trained using Synthetic-2. The number of samples utilized for training is similar to the number of samples from real data, for a fair comparison. We observed an average FID score of 31.64 for the generated data (Synthetic-3), where, as before, the score was obtained on 30,000 images with 10,000 images from each domain. Some effect was observed on the performance of the PAD methods when trained using this dataset (Real + Synthetic-3). For example, the TDRs of VGG-19 and D-NetPAD at 1% FDR on the D4 test set improved by 2.11% and 3.42%, respectively, in comparison to when the detectors were trained using Real + Synthetic-1. The results revealed the extent to which iterative training cycles influence the quality of the synthetic data and also the performance of the PAD methods.

6.5 Conclusion and Future Work

The proposed approach for multi-domain image translation within the context of iris presentation attack detection effectively ensures that the generated ocular images are from the specified target domain. By leveraging the domain classification loss, the model is trained to produce images that not only exhibit realistic features but also align with the desired domain, facilitating more accurate and robust PA detection. At present, our approach does not specifically aim to generate entirely new identities.
This decision is based on the nature of the presentation attack detection task, where the primary concern is distinguishing between bonafide and attack images rather than deducing identities. Consequently, the model may replicate certain identity features from the training data, which is acceptable within the context of this specific application. However, we recognize the importance of privacy considerations in synthetic image generation. Moving forward, our goal is to refine this approach to be more privacy-conscious by ensuring that the generated images do not replicate identity characteristics from the training data. This would involve introducing additional constraints or adversarial techniques to separate identity features from domain-specific attributes, ensuring that the synthetic images are effective for PA detection while also protecting the privacy of individuals whose data is used in training. This enhancement aligns with broader ethical considerations in AI development and contributes to the responsible deployment of generative models, particularly in sensitive fields like biometric security.

(a) Performance of D-NetPAD in baseline Experiment-0, when it is trained using real images from the LivDet-2017 train set, compared with utility Experiment-1, when it is trained using real and synthetically generated images. The testing is done on the test set of Berc-iris-fake. (b) Performance of MobileNet-v2 in baseline Experiment-0, when it is trained using real images from the LivDet-2017 train set, compared with utility Experiment-1, when it is trained using real and synthetically generated images. The testing is done on the test set of LivDet-2020.
Figure 6.4: Performance of iris PA detectors when trained using only real images and when trained using real+synthetic images, showcasing the usefulness of the generated ocular PA dataset.

CHAPTER 7
SUMMARY AND FUTURE WORK

In the field of biometrics, the development of techniques for generating synthetic data has been pivotal across various modalities. This has motivated researchers to generate synthetic biometric data that exhibits the characteristics of real-world data. However, the existing methods for generating synthetic irides are confronted with limitations concerning quality, realism, and the capacity to capture both inter- and intra-class variations. In this research, we introduced novel approaches aimed at mitigating these issues, accompanied by a rigorous exploration of their practical utility via extensive experimentation and analysis:

Generating Partially-Synthetic Iris Images: As mentioned previously, the primary objective of partially-synthetic irides is to introduce controlled variations into genuine iris data, thereby enriching the diversity and robustness of the dataset. This proves particularly advantageous in scenarios where real data is scarce, imbalanced, or lacks specific variations. Consider, for instance, iris PA detection, where detection methods are tasked with identifying various PAs (e.g., printed eyes, cosmetic contact lenses) using a limited number of PA samples. In such a scenario, the training and testing of PA detection methods can be severely hampered due to the scarcity of PA data. Furthermore, as technology advances, novel and sophisticated PAs emerge (e.g., high-quality textured contact lenses, replay attacks with high-definition screens), rendering existing detection methods unable to generalize to new attacks.
We proposed three different approaches to address these challenges:

• Leveraging the Relativistic Average Standard Generative Adversarial Network (RaSGAN): Our study utilized RaSGAN to generate high-quality synthetic iris images, with a specific emphasis on their potential application in PA detection. Unlike conventional GANs, the incorporation of a "relativistic" discriminator and generator in RaSGAN bolsters the network’s generative power. The synthetic iris images generated through this method exhibit high resemblance to real iris images, capturing their intricate details and characteristics. We further investigated the utility of these synthetic images for training various PAD methods. Our experiments demonstrated that a PAD method trained on these synthetic images exhibits improved performance. Our approach holds promise in bolstering the security and reliability of iris recognition systems. We also proposed a one-class PA detector using the "relativistic" discriminator, known as RD-PAD, which aims to learn a decision boundary around bonafide samples and reject everything else as a PA, thus enabling it to generalize well to unseen PAs.

• Cyclic Image Translation Generative Adversarial Network (CIT-GAN): Here, we introduced a novel approach, known as CIT-GAN, designed for synthetic iris generation with style transfer to multiple target domains. This method incorporates a Styling Network, which learns the distinctive style characteristics of each domain represented in the training dataset. By leveraging the Styling Network, the generator is guided to translate images from a source domain to a reference domain, resulting in the generation of synthetic images imbued with the style characteristics of the reference or target domain. This approach is particularly pertinent in the context of iris PA detection, where we employ CIT-GAN to generate synthetic PA samples for under-represented classes in the training dataset. Through rigorous evaluation using various iris PAD methods, we demonstrate the effectiveness of these synthetically generated PA samples for training PAD models. Additionally, we gauge the realism of the synthetic images using the Fréchet Inception Distance (FID) score, which quantifies the similarity between the distributions of real and synthetic images. Our results emphasize the realism and utility of the synthetic images produced by the proposed method compared to other competing approaches.

• Multi-domain Image Translative Diffusion StyleGAN with Application in Iris Presentation Attack Detection (MID-StyleGAN): An iris biometric system can be vulnerable to presentation attacks (PAs), where artifacts like artificial eyes, printed eye images, or cosmetic contact lenses are used to deceive the system. To mitigate these threats, various presentation attack detection (PAD) methods have been proposed. However, the development and evaluation of iris PAD techniques face a significant challenge due to the lack of sufficient datasets, primarily because of the inherent difficulties in creating and capturing realistic PAs. To address this issue, we presented the Multi-domain Image Translative Diffusion StyleGAN (MID-StyleGAN), a novel framework designed to generate synthetic ocular images that effectively capture domain-specific information from iris PA datasets.
MID-StyleGAN leverages the strengths of diffusion models and generative adversarial networks (GANs) to create realistic and diverse synthetic data. It utilizes a multi-domain architecture that enables seamless translation between bonafide ocular images and various PA domains, while maintaining the biometric identity features. The framework incorporates an adaptive loss function specifically tailored for ocular data to ensure domain consistency. Experimental results demonstrate that MID-StyleGAN surpasses existing methods in generating high-quality synthetic ocular images, significantly enhancing PAD system performance.

While these methods can effectively generate partially synthetic iris images, they lack the capability of generating iris images with identities that are different from the training data, i.e., the identities generated by these methods resemble the identities present in the training data.

Generating Fully-Synthetic Iris Images: Fully-synthetic biometric data refers to entirely artificial biometric samples that do not correspond to any real individuals in the population. These synthetic samples are created using different methods to simulate the statistical properties and characteristics of real biometric data. To overcome the shortcomings of previous methods and generate fully-synthetic iris images, we introduced two novel frameworks:

• Disentangling Identity and Style to Generate Synthetic Iris Images using iWarpGAN: This framework consists of two transformation pathways: the Identity transformation pathway and the Style transformation pathway. The Identity transformation pathway is devised to modify the identity of the input iris image within the latent space, enabling the generation of iris images with identities distinct from those in the training set. On the other hand, the Style transformation pathway concentrates on generating iris images with varying styles. Style attributes are extracted from a reference iris image and a given attribute vector representing the attributes of that reference image. Together, these two transformation pathways in iWarpGAN facilitate the generation of iris images that exhibit diverse identities and styles and provide a comprehensive exploration of the latent space.

• Image Translative Diffusion GAN (IT-diffGAN): IT-diffGAN addresses the challenges posed by traditional GANs in generating synthetic biometric data. It leverages the stability and realism provided by diffusion models, using StyleGAN-3 as the backbone architecture. The method begins by projecting input images onto the latent space of the diffusion-GAN, where it identifies the features most pertinent to individual identity and style. A specialized identity and style metric is employed to calculate the distance between the original and generated images, enabling the model to learn which latent features affect identity and style. Once these features are identified, IT-diffGAN is trained to generate new identities by manipulating them in the latent space. This allows the generation of images with both inter- and intra-class variations while ensuring that the synthetic identities do not resemble anyone in the training data. By utilizing the diffusion-GAN framework, IT-diffGAN produces more realistic images and addresses common issues associated with traditional GANs, such as mode collapse and unstable training.
While the methods proposed in this thesis represent significant advancements in generating synthetic iris images, there remain several promising directions for future exploration. These directions could help further improve the realism, diversity, and practical application of synthetic biometric data, particularly in addressing current limitations in iris image generation. For example:

• Scaling Synthetic Data Generation: A critical objective for future research is to expand the scope and scale of synthetic datasets. Generative models should be capable of producing large-scale datasets that accurately reflect the diversity and complexity of real-world biometric data. This includes generating not only a higher number of images but also creating distinct identities that do not resemble those in the training data. Additionally, the scope of synthetic presentation attacks (PAs) needs to be broadened to include emerging and sophisticated attack methods such as high-resolution replay attacks, and other advanced techniques. Simulating combinations of multiple PA types would further improve the training of robust PAD systems to handle complex and unforeseen scenarios.

• Incorporating Multiple Controllable Attributes: Future advancements should explore generating iris images with controllable attributes such as gender, age, and other demographic or physical traits. For example: (1) Gender-Specific Features: generating datasets with distinct male and female iris characteristics can help improve recognition accuracy across different populations; (2) Age Progression and Variation: simulating age-related changes in iris patterns or generating images across various age groups can support studies on aging in biometrics and enhance the robustness of recognition systems; and (3) Customizable Features: attributes like iris color, environmental lighting, and even gaze direction could be integrated into generative models. This level of control would enable the creation of datasets tailored to specific application requirements, such as forensic analysis or testing under varied conditions.

• Text-to-Iris Generation: One emerging area of research is the use of text-to-image generation models for generating synthetic iris images. This involves employing natural language descriptions to guide the generation of biometric data. Current generative models like diffusion models and generative adversarial networks (GANs) have primarily relied on image-based input data, but the integration of text prompts allows for even greater control over the generated content. For instance, specifying text descriptions such as "dark brown iris with light radial streaks" or "large iris with distinct crypts" could enable the generation of highly specific iris images, potentially with desired features for certain applications, such as forensic analysis or iris recognition in low-light environments. Exploring text-to-iris generation could lead to more flexible and customizable generation techniques, allowing researchers to produce synthetic datasets tailored to specific needs, while also simplifying the dataset creation process.

• Leveraging Large Language Models (LLMs) for Iris Generation: Large Language Models (LLMs) such as GPT-4 have demonstrated impressive generative capabilities across a range of tasks, from text to code generation. Future work could explore the use of LLMs in the context of iris image generation, particularly for improving the realism and diversity of synthetic data.
LLMs could be integrated with existing image generation models to enhance the capability to generate new and diverse iris images. One possible direction is using LLMs to guide the latent space manipulation in GANs or diffusion models, improving the model’s ability to synthesize realistic and unique identities by interpreting detailed, semantic input. The incorporation of LLMs into the synthetic iris generation process could also aid in developing models that are more adept at creating specific variations required in biometric systems, like diverse lighting conditions or novel presentation attack scenarios.

• Multi-Spectrum Iris Generation (NIR and Visible): Another key area for future research is multi-spectrum iris image generation. Most iris recognition systems rely on near-infrared (NIR) images. However, there is growing interest in visible spectrum iris recognition for mobile and low-light environments, where NIR imagery may not always be feasible. Developing models capable of generating synthetic iris images across both the NIR and visible spectra would significantly expand the utility of synthetic iris datasets. For example, future work could focus on training generative models to learn the correlations between iris patterns captured in NIR and visible light, enabling the translation of a synthetic NIR iris image into a realistic visible spectrum image, and vice versa. This capability could be crucial for advancing biometric systems in mobile devices, where visible light conditions are more common, and for cross-spectrum iris recognition.

• Cross-Domain and Multi-Modal Iris Image Generation: As biometric systems increasingly incorporate multiple modalities (e.g., face and iris recognition), there is potential to explore multi-modal synthetic data generation. Future work could investigate how to leverage existing models to create synthetic data that captures both cross-domain and cross-modal biometric traits. For instance, generating images that simultaneously capture iris, face, and periocular features could provide comprehensive datasets for multi-biometric recognition systems. This direction could also enable better data augmentation techniques for systems that use multiple biometric traits for identity verification.

• Generative Models for Emerging Presentation Attack (PA) Techniques: While significant progress has been made in generating synthetic data to improve presentation attack detection (PAD) systems, as new PA techniques emerge, such as high-resolution display attacks, textured contact lenses, or replay attacks, future work will need to continue evolving generative models to account for these sophisticated attacks. Models like MID-StyleGAN could be further refined to simulate these more advanced PA scenarios, ensuring that PAD systems are robust to future threats. Additionally, exploring the integration of adversarial training and domain adaptation into synthetic data generation could help PAD systems better generalize across different PA types and environments.

• Enhancing Realism in Fully-Synthetic Identities: While IT-diffGAN and iWarpGAN make strides in generating fully synthetic iris images with distinct identities, future research can explore ways to enhance the realism of these fully-synthetic identities. This may involve improving the latent space representation to better capture the fine-grained details of iris patterns and incorporating feedback loops where the generated identities are compared with real-world data for further refinement.
Additionally, increasing the resolution and dynamic range of synthetic iris images could improve the quality of data used for training biometric systems, leading to more accurate recognition performance.

• Ethical Considerations and Privacy Preservation in Synthetic Data: As the field of synthetic biometric data continues to expand, future work must also consider the ethical implications and the importance of privacy preservation in generating fully synthetic biometric identities. Ensuring that synthetic data generation techniques do not inadvertently reproduce real-world identities, or mimic certain population characteristics disproportionately, is essential. Future research could involve developing privacy-preserving generative techniques that use differential privacy or privacy-aware learning frameworks to ensure that synthetic datasets remain ethically sound and do not compromise individual privacy.

The future of synthetic iris image generation holds immense promise, offering not only technological advancements but also solutions to some of the most pressing challenges in biometric systems. By exploring novel methods such as text-to-iris generation, leveraging large language models (LLMs), and developing multi-spectrum capabilities, the field is poised to overcome existing limitations in data diversity, realism, and security. At the same time, maintaining a balance between innovation and ethical responsibility is crucial. As synthetic data becomes more widespread in biometric research and applications, it is essential to prioritize privacy and fairness. The continued development of privacy-preserving techniques and frameworks will be critical in ensuring that the benefits of synthetic biometric data do not come at the expense of individual privacy or ethical integrity. Ultimately, the methods and directions proposed in this work pave the way for more robust, secure, and reliable biometric systems. By advancing the state of the art in synthetic iris generation, future research can contribute to the broader adoption of biometric technologies across various industries, ensuring their long-term viability and impact in a rapidly evolving digital world.

BIBLIOGRAPHY

[1] The Iris Challenge Evaluation (ICE) 2005 conducted by the National Institute of Standards and Technology (NIST). https://www.nist.gov/programs-projects/iris-challenge-evaluation-ice, 2005.

[2] The Iris Challenge Evaluation (ICE) 2006 conducted by the National Institute of Standards and Technology (NIST). https://www.nist.gov/programs-projects/iris-challenge-evaluation-ice, 2006.

[3] ISO/IEC 29794-6. Information technology – Biometric sample quality – Part 6: Iris image data. Standard, International Organization for Standardization, Geneva, CH. https://www.iso.org/standard/54066.html, 2014.

[4] CASIA Fingerprint Image Database Version 5.0. http://biometrics.idealtest.org/dbDetailForUser.do?id=4, 2017.

[5] CASIA Iris Image Database Version 4.0. http://biometrics.idealtest.org/dbDetailForUser.do?id=4, 2017.

[6] IIT Delhi Database. http://www4.comp.polyu.edu.hk/~csajaykr/IITD/Database_Iris.htm, 2017.

[7] Amjad Almahairi, Sai Rajeswar, Alessandro Sordoni, Philip Bachman, and Aaron Courville. Augmented CycleGAN: Learning many-to-many mappings from unpaired data. arXiv preprint arXiv:1802.10151, 2018.

[8] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning (ICML), pages 214–223, 2017.
[9] Pankaj Bamoriya, Gourav Siddhad, Harkeerat Kaur, Pritee Khanna, and Aparajita Ojha. Dsb-gan: Generation of deep learning based synthetic biometric data. Displays, 74:102267, 2022. [10] Michael Banf and Volker Blanz. Example-based rendering of eye movements. In Computer Graphics Forum, volume 28, pages 659–666. Wiley Online Library, 2009. [11] Dor Bank, Noam Koenigstein, and Raja Giryes. Autoencoders. arXiv preprint arXiv:2003.05991, 2020. [12] Shane Barratt and Rishi Sharma. A note on the inception score. International Conference on Machine Learning Workshops (ICMLW), 2018. [13] [14] Jacob Benesty, M Mohan Sondhi, Yiteng Huang, et al. Springer handbook of speech processing, volume 1. Springer, 2008. Jordan J Bird, Diego R Faria, Cristiano Premebida, Anikó Ekárt, and Pedro PS Ayrosa. Overcoming data scarcity in speaker identification: Dataset augmentation with synthetic mfccs via character-level rnn. In 2020 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), pages 146–151. IEEE, 2020. 149 [15] Ruud M Bolle, Jonathan H Connell, Sharath Pankanti, Nalini K Ratha, and Andrew W Senior. Guide to biometrics. Springer Science & Business Media, 2013. [16] Fadi Boutros, Marco Huber, Patrick Siebke, Tim Rieber, and Naser Damer. Sface: Privacy- friendly and accurate face recognition using synthetic data. In 2022 IEEE International Joint Conference on Biometrics (IJCB), pages 1–11. IEEE, 2022. [17] Kevin W Bowyer and Mark J Burge. Handbook of iris recognition. Springer, 2016. [18] Raffaele Cappelli, Dario Maio, Davide Maltoni, et al. SFinGe: Synthetic Fingerprint Generator. 2004. [19] Luís Cardoso, André Barbosa, Frutuoso Silva, António MG Pinheiro, and Hugo Proença. Iris biometrics: Synthesis of degraded ocular images. IEEE Transactions on information forensics and security, 8(7):1115–1125, 2013. [20] Huiwen Chang, Jingwan Lu, Fisher Yu, and Adam Finkelstein. Paired cycleGAN: Asymmet- ric style transfer for applying and removing makeup. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 40–48, 2018. [21] Cunjian Chen and Arun Ross. A multi-task convolutional neural network for joint iris In IEEE Winter Applications of Computer detection and presentation attack detection. Vision Workshops (WACVW), pages 44–51, 2018. [22] Rui Chen, Xirong Lin, and Tianhuai Ding. Liveness detection for iris recognition using multispectral images. Pattern Recognition Letters, 33(12):1513–1519, 2012. [23] Manuela Chessa, Guido Maiello, Alessia Borsari, and Peter J Bex. The perceptual quality of the oculus rift for immersive virtual reality. Human–computer interaction, 34(1):51–82, 2019. [24] Wonwoong Cho, Sungha Choi, David Keetae Park, Inkyu Shin, and Jaegul Choo. Image-to- image translation via group-wise deep whitening-and-coloring transformation. In Proceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10639– 10647, 2019. [25] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. StarGAN: Unified generative adversarial networks for multi-domain image-to-image trans- lation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8789–8797, 2018. [26] Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8188–8197, 2020. [27] Timothy F Cootes, Christopher J Taylor, David H Cooper, and Jim Graham. 
Active shape models-their training and application. Computer vision and image understanding, 61(1):38– 59, 1995. 150 [28] National Research Council, Whither Biometrics Committee, et al. Biometric recognition: Challenges and opportunities. 2010. [29] Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, and Anil A Bharath. Generative adversarial networks: An overview. IEEE signal processing magazine, 35(1):53–65, 2018. [30] S Crihalmeanu, Arun Ross, Stephanie Schuckers, and L Hornak. A protocol for multibiomet- ric data acquisition, storage and dissemination. Technical report, Technical Report, WVU, Lane Department of Computer Science and Electrical . . . , 2007. [31] Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10850–10869, 2023. [32] Jiali Cui, Yunhong Wang, JunZhou Huang, Tieniu Tan, and Zhenan Sun. An iris image synthesis method based on pca and super-resolution. In Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., volume 4, pages 471–474 Vol.4, 2004. [33] Adam Czajka. Database of iris printouts and its application: Development of liveness detection method for iris recognition. In 2013 18th International Conference on Methods & Models in Automation & Robotics (MMAR), pages 28–33. IEEE, 2013. [34] Adam Czajka and Kevin W Bowyer. Presentation attack detection for iris recognition: An assessment of the state-of-the-art. ACM Computing Surveys (CSUR), 51(4):1–35, 2018. [35] Fida K Dankar and Mahmoud Ibrahim. Fake it till you make it: Guidelines for effective synthetic data generation. Applied Sciences, 11(5):2158, 2021. [36] Priyanka Das, Joseph McGrath, Zhaoyuan Fang, Aidan Boyd, Ganghee Jang, Amir Moham- madi, Sandip Purnapatra, David Yambay, Sébastien Marcel, Mateusz Trokielewicz, et al. Iris liveness detection competition (LivDet-Iris)–the 2020 edition. In IEEE International Joint Conference on Biometrics (IJCB), 2020. [37] John Daugman. New methods in iris recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 37(5):1167–1175, 2007. [38] Parth Rajesh Desai, Pooja Nikhil Desai, Komal Deepak Ajmera, and Khushbu Mehta. A review paper on oculus rift-a virtual reality headset. arXiv preprint arXiv:1408.1173, 2014. [39] Yaohui Ding and Arun Ross. An ensemble of one-class SVMs for fingerprint spoof detection In IEEE International Workshop on Information across different fabrication materials. Forensics and Security (WIFS), pages 1–6, 2016. [40] Garoe Dorta, Sara Vicente, Neill DF Campbell, and Ivor JA Simpson. The GAN that warped: Semantic attribute editing with unpaired data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5356–5365, 2020. 151 [41] James S Doyle and Kevin W Bowyer. Robust detection of textured contact lenses in iris recognition using BSIF. IEEE Access, 3:1672–1683, 2015. [42] Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Mar- arXiv preprint tin Arjovsky, and Aaron Courville. Adversarially learned inference. arXiv:1606.00704, 2016. [43] Melissa Edwards, Agnes Gozdzik, Kendra Ross, Jon Miles, and Esteban J Parra. Quantitative measures of iris color using high resolution photographs. American journal of physical anthropology, 147(1):141–149, 2012. [44] [45] Joshua J Engelsma and Anil K Jain. Generalizing Fingerprint Spoof Detector: Learning a One-Class Classifier. 
arXiv preprint arXiv:1901.03918, 2019. Joshua James Engelsma, Steven Grosz, and Anil K Jain. Printsgan: Synthetic fingerprint generator. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):6111– 6124, 2022. [46] Michael D. Fairhurst and Charles I. Watson. Synthetic biometrics: The path ahead. In Handbook of Biometric Anti-Spoofing, pages 525–551. 2019. [47] Meiling Fang, Marco Huber, and Naser Damer. Synthaspoof: Developing face presentation attack detection based on privacy-friendly synthetic data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1061–1070, 2023. [48] Marcos Faundez-Zanuy. Signature recognition state-of-the-art. IEEE aerospace and elec- tronic systems magazine, 20(7):28–32, 2005. [49] Zhen-Hua Feng, Guosheng Hu, Josef Kittler, William Christmas, and Xiao-Jun Wu. Cas- caded collaborative regression for robust facial landmark detection trained using a mixture of synthetic and real images with dynamic weighting. IEEE Transactions on Image Processing, 24(11):3425–3440, 2015. [50] [51] Javier Galbally, Arun Ross, Marta Gomez-Barrero, Julian Fierrez, and Javier Ortega-Garcia. Iris image reconstruction from binary templates: An efficient probabilistic approach based on genetic algorithms. Computer Vision and Image Understanding, 117:1512–1525, 10 2013. Javier Galbally, Arun Ross, Marta Gomez-Barrero, Julian Fierrez, and Javier Ortega-Garcia. Iris image reconstruction from binary templates: An efficient probabilistic approach based on genetic algorithms. Computer Vision and Image Understanding, 117(10):1512–1525, 2013. [52] Yaroslav Ganin, Daniil Kononenko, Diana Sungatullina, and Victor Lempitsky. Deepwarp: Photorealistic image resynthesis for gaze manipulation. In European Conference on Com- puter Vision, pages 311–326. Springer, 2016. [53] Leon Gatys, Alexander S Ecker, and Matthias Bethge. Texture synthesis using convolutional In Advances in Neural Information Processing Systems (NIPS), pages neural networks. 262–270, 2015. 152 [54] Jiahao Geng, Tianjia Shao, Youyi Zheng, Yanlin Weng, and Kun Zhou. Warp-guided GANs for single-photo facial animation. ACM Transactions on Graphics (ToG), 37(6):1–12, 2018. [55] Edd Gent. A cryptocurrency for the masses or a universal id?: Worldcoin aims to scan all the world’s eyeballs. IEEE Spectrum, 2023. [56] [57] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Nets. pages 2672–2680, 2014. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672–2680, 2014. [58] Steven A Grosz and Anil K Jain. Spoofgan: Synthetic fingerprint spoof images. IEEE Transactions on Information Forensics and Security, 18:730–743, 2022. [59] Mehak Gupta, Vishal Singh, Akshay Agarwal, Mayank Vatsa, and Richa Singh. Generalized In 25th IEEE iris presentation attack detection algorithm under cross-database settings. International Conference on Pattern Recognition (ICPR), pages 5318–5325, 2021. [60] Priyanshu Gupta, Shipra Behera, Mayank Vatsa, and Richa Singh. On iris spoofing using print attack. In IEEE International Conference on Pattern Recognition (ICPR), pages 1681– 1686, 2014. [61] Jian Han, Sezer Karaoglu, Hoang-An Le, and Theo Gevers. performance with 3d-rendered synthetic data. arXiv preprint arXiv:1812.07363, 2018. 
Improving face detection [62] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. [63] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Günter Klam- bauer, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a Nash equilibrium. arXiv preprint arXiv:1706.08500, 2017. [64] Steven Hoffman, Renu Sharma, and Arun Ross. Convolutional neural networks for iris presentation attack detection: Toward cross-dataset and cross-sensor generalization. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1620–1628, 2018. [65] Steven Hoffman, Renu Sharma, and Arun Ross. Iris + Ocular: Generalized Iris Presentation Attack Detection Using Multiple Convolutional Neural Networks. In IAPR International Conference on Biometrics (ICB), 2019. [66] Cheng-Shun Hsiao, Chih-Peng Fan, and Yin-Tsung Hwang. Design and analysis of deep- learning based iris recognition technologies by combination of u-net and efficientnet. In 9th International Conference on Information and Education Technology (ICIET), pages 433–437. IEEE, 2021. 153 [67] Huaibo Huang, Ran He, Zhenan Sun, Tieniu Tan, et al. Introvae: Introspective variational autoencoders for photographic image synthesis. Advances in neural information processing systems, 31, 2018. [68] Muhammad Zahid Iqbal and Abraham G Campbell. Adopting smart glasses responsibly: potential benefits, ethical, and privacy concerns with ray-ban stories. AI and Ethics, 3(1):325– 327, 2023. [69] Anil K Jain, Patrick Flynn, and Arun A Ross. Handbook of biometrics. Springer Science & Business Media, 2007. [70] Anil K Jain and Stan Z Li. Handbook of face recognition, volume 1. Springer, 2011. [71] Anil K Jain, Karthik Nandakumar, and Arun Ross. 50 years of biometric research: Accom- plishments, challenges, and opportunities. Pattern recognition letters, 79:80–105, 2016. [72] Alexia Jolicoeur-Martineau. The relativistic discriminator: a key element missing from standard GAN. arXiv preprint arXiv:1807.00734, 2018. [73] Indu Joshi, Marcel Grimmer, Christian Rathgeb, Christoph Busch, Francois Bremond, and Antitza Dantcheva. Synthetic data in human analysis: A survey. arXiv preprint arXiv:2208.09191, 2022. [74] Masashi Kanematsu, Hironobu Takano, and Kiyomi Nakamura. Highly reliable liveness detection method for iris recognition. In SICE Annual Conference 2007, pages 361–364. IEEE, 2007. [75] Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. Advances in neural information processing systems, 33:12104–12114, 2020. [76] Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Alias-free generative adversarial networks. Advances in neural information processing systems, 34:852–863, 2021. [77] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4401–4410, 2019. [78] Patrik Joslin Kenfack, Daniil Dmitrievich Arapov, Rasheed Hussain, SM Ahsan Kazmi, and Adil Khan. On the fairness of generative adversarial networks (gans). In 2021 International Conference" Nonlinearity, Information and Robotics"(NIR), pages 1–7. IEEE, 2021. 
[79] Jihyun Kim, Changjae Oh, Hoseok Do, Soohyun Kim, and Kwanghoon Sohn. Diffusion- In Proceedings of the driven GAN Inversion for Multi-Modal Face Image Generation. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10403–10412, 2024. 154 [80] Naman Kohli, Daksha Yadav, Mayank Vatsa, and Richa Singh. Revisiting iris recognition with color cosmetic contact lenses. In IEEE International Conference on Biometrics (ICB), pages 1–7, 2013. [81] Naman Kohli, Daksha Yadav, Mayank Vatsa, Richa Singh, and Afzel Noore. Detecting med- ley of iris spoofing attacks using DESIST. In IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), pages 1–6, 2016. [82] Naman Kohli, Daksha Yadav, Mayank Vatsa, Richa Singh, and Afzel Noore. Synthetic iris presentation attack using idcgan. In 2017 IEEE International Joint Conference on Biometrics (IJCB), pages 674–680. IEEE, 2017. [83] Adam Kortylewski, Bernhard Egger, Andreas Schneider, Thomas Gerig, Andreas Morel- Forster, and Thomas Vetter. Analyzing and reducing the damage of dataset bias to face recognition with synthetic data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019. [84] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep In Advances in Neural Information Processing Systems, convolutional neural networks. pages 1097–1105, 2012. [85] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012. [86] Ajay Kumar and Arun Passi. Comparison and combination of iris matchers for reliable personal authentication. Pattern recognition, 43(3):1016–1026, 2010. [87] Minh-Ha Le and Niklas Carlsson. Styleid: Identity disentanglement for anonymizing faces. arXiv preprint arXiv:2212.13791, 2022. [88] Eui Chul Lee, You Jin Ko, and Kang Ryoung Park. Fake iris detection method using purkinje images based on gaze position. Optical Engineering, 47(6):1 – 16, 2008. [89] Lik-Hang Lee and Pan Hui. Interaction methods for smart glasses: A survey. IEEE access, 6:28712–28732, 2018. [90] Sung Joo Lee, Kang Ryoung Park, and Jaihie Kim. Robust fake iris detection based on vari- ation of the reflectance ratio between the iris and the sclera. In IEEE Biometrics Symposium: Special Session on Research at the Biometric Consortium Conference, pages 1–6, 2006. [91] Sung Joo Lee, Kang Ryoung Park, Youn Joo Lee, Kwanghyuk Bae, and Jai Hie Kim. Multifeature-based fake iris detection method. Optical Engineering, 46(12):127204, 2007. [92] Chenyang Li, Zhili Zhang, Peipei Li, and Zhaofeng He. I3FDM: IRIS Inpainting Via Inverse In ICASSP 2024-2024 IEEE International Conference on Fusion of Diffusion Models. Acoustics, Speech and Signal Processing (ICASSP), pages 1636–1640. IEEE, 2024. [93] Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015. 155 [94] Sarvesh Makthal and Arun Ross. Synthesis of iris images using markov random fields. In 2005 13th European Signal Processing Conference, pages 1–4, 2005. [95] Davide Maltoni, Dario Maio, Anil K Jain, and Salil Prabhakar. Synthetic fingerprint gener- ation. Handbook of fingerprint recognition, pages 271–302, 2009. [96] Davide Maltoni, Dario Maio, Anil K Jain, Salil Prabhakar, et al. Handbook of fingerprint recognition, volume 2. Springer, 2009. 
[97] Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smol- ley. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 2794–2802, 2017. [98] David Menotti, Giovani Chiachia, Allan da Silva Pinto, William Robson Schwartz, Hélio Pedrini, Alexandre Xavier Falcao, and Anderson Rocha. Deep representations for iris, face, In IEEE Transactions on Information Forensics and and fingerprint spoofing detection. Security, 10:864–879, 2015. [99] Denis Migdal and Christophe Rosenberger. Statistical modeling of keystroke dynamics samples for the generation of synthetic datasets. Future Generation Computer Systems, 100:907–920, 2019. [100] Shervin Minaee and Amirali Abdolrashidi. Iris-gan: Learning to generate realistic iris images using convolutional gan. arXiv preprint arXiv:1812.04822, 2018. [101] Shervin Minaee and Amirali Abdolrashidi. Deepiris: Iris recognition using a deep learning approach. arXiv preprint arXiv:1907.09380, 2019. [102] Yisroel Mirsky and Wenke Lee. The creation and detection of deepfakes: A survey. ACM Computing Surveys (CSUR), 54(1):1–41, 2021. [103] Mohammad Nabati, Hojjat Navidan, Reza Shahbazian, Seyed Ali Ghorashi, and David Windridge. Using synthetic data to enhance the accuracy of fingerprint-based localization: A deep learning approach. IEEE Sensors Letters, 4(4):1–4, 2020. [104] Kien Nguyen, Clinton Fookes, Raghavender Jillela, Sridha Sridharan, and Arun Ross. Long range iris recognition: A survey. Pattern Recognition, 72:123–143, 2017. [105] Ishan Nigam, Mayank Vatsa, and Richa Singh. Ocular biometrics: A survey of modalities and fusion approaches. Information Fusion, 26:1–35, 2015. [106] Olegs Nikisins, Amir Mohammadi, André Anjos, and Sébastien Marcel. On effectiveness of anomaly detection approaches against unseen presentation attacks in face anti-spoofing. In IAPR International Conference on Biometrics (ICB), 2018. [107] Behnaz Nojavanasghari, Charles E Hughes, Tadas Baltrušaitis, and Louis-Philippe Morency. In 2017 Hand2face: Automatic synthesis and recognition of hand over face occlusions. Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), pages 209–215. IEEE, 2017. 156 [108] Margarita Osadchy, Yan Wang, Orr Dunkelman, Stuart Gibson, Julio Hernandez-Castro, and Christopher Solomon. Genface: Improving cyber security using realistic synthetic face generation. In Cyber Security Cryptography and Machine Learning: First International Conference, CSCML 2017, Beer-Sheva, Israel, June 29-30, 2017, Proceedings 1, pages 19–33. Springer, 2017. [109] Melih Öz, TANER DANIŞMAN, Melih Günay, Esra Şanal, Özgür Duman, and JOSEPH LEDET. The use of synthetic data to facilitate eye segmentation using deeplabv3+. Annals of Emerging Technologies in Computing, 5(3), 2021. [110] Unsang Park, Yiying Tong, and Anil K Jain. Age-invariant face recognition. IEEE transac- tions on pattern analysis and machine intelligence, 32(5):947–954, 2010. [111] Alex Perala. Princeton identity tech powers galaxy s8 iris scanning. https://mobileidworld. com/princeton-identity-galaxy-s8-iris-003312, 2017. [112] H. Proenca, S. Filipe, R. Santos, J. Oliveira, and L.A. Alexandre. The UBIRIS.v2: A database of visible wavelength images captured on-the-move and at-a-distance. IEEE Trans. PAMI, 32(8):1529–1535, August 2010. [113] H. Proença and L.A. Alexandre. UBIRIS: A noisy iris image database. 
In 13th International Conference on Image Analysis and Processing - ICIAP 2005, volume LNCS 3617, pages 970–977, Cagliari, Italy, September 2005. Springer. [114] Haibo Qiu, Baosheng Yu, Dihong Gong, Zhifeng Li, Wei Liu, and Dacheng Tao. Syn- face: Face recognition with synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10880–10890, 2021. [115] R. Raghavendra and C. Busch. Presentation attack detection algorithm for face and iris In European Signal Processing Conference (EUSIPCO), pages 1387–1391, biometrics. 2014. [116] Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. Encoding in style: a StyleGAN encoder for image-to-image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2287–2296, 2021. [117] Arun Ross, Sudipta Banerjee, Cunjian Chen, Anurag Chowdhury, Vahid Mirjalili, Renu Sharma, Thomas Swearingen, and Shivangi Yadav. Some Research Problems in Biometrics: The Future Beckons. In IAPR International Conference on Biometrics (ICB), 2019. [118] Arun A Ross, Karthik Nandakumar, and Anil K Jain. Handbook of multibiometrics, volume 6. Springer Science & Business Media, 2006. [119] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and In Advances in Neural Information Improved techniques for training GANs. Xi Chen. Processing Systems (NIPS), pages 2234–2242, 2016. 157 [120] Tim Salimans, Han Zhang, Alec Radford, and Dimitris Metaxas. Improving GANs using optimal transport. arXiv preprint arXiv:1803.05573, 2018. [121] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4510–4520, 2018. [122] Gil Santos, Emanuel Grancho, Marco V Bernardo, and Paulo T Fiadeiro. Fusing iris and periocular information for cross-sensor recognition. Pattern Recognition Letters, 57:52–59, 2015. [123] Thomas Schlegl, Philipp Seeböck, Sebastian M Waldstein, Ursula Schmidt-Erfurth, and Georg Langs. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging (IPMI), pages 146–157, 2017. [124] Robin M Schmidt. Recurrent neural networks (rnns): A gentle introduction and overview. arXiv preprint arXiv:1912.05911, 2019. [125] Ana Sequeira, Lulu Chen, Peter Wild, James Ferryman, Fernando Alonso-Fernandez, Ki- ran B Raja, Ramachandra Raghavendra, Christoph Busch, and Joseph Bigun. Cross-eyed- cross-spectral iris/periocular recognition database and competition. In IEEE International Conference of the Biometrics Special Interest Group (BIOSIG), pages 1–5, 2016. [126] Samir Shah and Arun Ross. Generating synthetic irises by feature agglomeration. In 2006 international conference on image processing, pages 317–320. IEEE, 2006. [127] Anjali Sharma, Shalini Verma, Mayank Vatsa, and Richa Singh. On cross spectral periocular recognition. In IEEE International Conference on Image Processing (ICIP), pages 5007– 5011, 2014. [128] Renu Sharma and Arun Ross. D-netpad: An explainable and interpretable iris presentation attack detector. In 2020 IEEE international joint conference on biometrics (IJCB), pages 1–10. IEEE, 2020. [129] Renu Sharma and Arun Ross. D-NetPAD: An Explainable and Interpretable Iris Presentation Attack Detector. 
In IEEE International Joint Conference on Biometrics (IJCB), 2020. [130] Renu Sharma and Arun Ross. Viability of optical coherence tomography for iris presentation attack detection. In 25th IEEE International Conference on Pattern Recognition (ICPR), pages 6165–6172, 2021. [131] Joseph Shelton, Kaushik Roy, Brian O’Connor, and Gerry V Dozier. Mitigating iris-based replay attacks. International Journal of Machine Learning and Computing (JMLC), 4(3):204, 2014. [132] Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Joshua Susskind, Wenda Wang, and Russell Webb. Learning from simulated and unsupervised images through adversarial training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2107–2116, 2017. 158 [133] David P Sidlauskas and Samir Tamer. Hand geometry recognition. Handbook of Biometrics, pages 91–107, 2008. [134] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. [135] Michał Stypułkowski, Konstantinos Vougioukas, Sen He, Maciej Zięba, Stavros Petridis, and Maja Pantic. Diffused heads: Diffusion models beat GANs on talking-face generation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5091–5100, 2024. [136] Z. Sun, H. Zhang, T. Tan, and J. Wang. Iris image classification based on hierarchical visual codebook. In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(6):1120–1133, 2014. [137] Dwi Joko Suroso, Panarat Cherntanomwong, and Pitikhate Sooraksa. Synthesis of a small fingerprint database through a deep generative model for indoor localisation. Elektronika ir Elektrotechnika, 29(1):69–75, 2023. [138] Fariborz Taherkhani, Aashish Rai, Quankai Gao, Shaunak Srivastava, Xuanbai Chen, Fer- nando de la Torre, Steven Song, Aayush Prakash, and Daeil Kim. Controllable 3d generative In Proceedings of the adversarial face model via disentangling shape and appearance. IEEE/CVF Winter Conference on Applications of Computer Vision, pages 826–836, 2023. [139] Pin Shen Teh, Andrew Beng Jin Teoh, and Shigang Yue. A survey of keystroke dynamics biometrics. The Scientific World Journal, 2013, 2013. [140] Patrick Tinsley, Adam Czajka, and Patrick J. Flynn. Haven’t I Seen You Before? Assessing Identity Leakage in Synthetic Irises. In IEEE International Joint Conference on Biometrics (IJCB), pages 1–9, 2022. [141] Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf. Wasserstein auto-encoders. arXiv preprint arXiv:1711.01558, 2017. [142] Christos Tzelepis, Georgios Tzimiropoulos, and Ioannis Patras. WarpedGANSpace: Finding non-linear RBF paths in GAN latent space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6393–6402, 2021. [143] Aaron Van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. In Advances in Neural Information Conditional image generation with CNN decoders. Processing Systems (NIPS), pages 4790–4798, 2016. [144] Shreyas Venugopalan and Marios Savvides. How to generate spoofed irises from an iris code template. In IEEE Transactions on Information Forensics and Security (TIFS), 6(2):385–395, 2011. [145] Paul Voigt and Axel Von dem Bussche. The EU General Data Protection Regulation (GDPR). A Practical Guide, 1st Ed., Cham: Springer International Publishing, 10(3152676):10– 5555, 2017. 159 [146] Zhiqiang Wan, Yazhou Zhang, and Haibo He. 
Variational autoencoder based synthetic data generation for imbalanced learning. In 2017 IEEE symposium series on computational intelligence (SSCI), pages 1–7. IEEE, 2017. [147] Chen Wang, Zhaofeng He, Caiyong Wang, and Qing Tian. Generating intra-and inter-class iris images by identity contrast. In 2022 IEEE International Joint Conference on Biometrics (IJCB), pages 1–7. IEEE, 2022. [148] Lakin Wecker, Faramarz Samavati, and Marina Gavrilova. Iris synthesis: a reverse subdivi- sion application. In Proceedings of the 3rd International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, pages 121–125. ACM, 2005. [149] Zhuoshi Wei, Tieniu Tan, and Zhenan Sun. Synthesis of large realistic iris databases using patch-based sampling. In 2008 19th International Conference on Pattern Recognition, pages 1–4, 2008. [150] Erroll Wood, Tadas Baltrušaitis, Louis-Philippe Morency, Peter Robinson, and Andreas Bulling. A 3d morphable model of the eye region. Optimization, 1:0, 2016. [151] André Brasil Vieira Wyzykowski and Anil K Jain. Synthetic latent fingerprint generator. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 971–980, 2023. [152] Lihu Xiao, Zhenan Sun, Ran He, and Tieniu Tan. Coupled feature selection for cross- In IEEE Sixth International Conference on Biometrics: Theory, sensor iris recognition. Applications and Systems (BTAS), pages 1–6, 2013. [153] Zhisheng Xiao, Karsten Kreis, and Arash Vahdat. Tackling the generative learning trilemma with denoising diffusion gans. arXiv preprint arXiv:2112.07804, 2021. [154] Minrui Xu, Dusit Niyato, Junlong Chen, Hongliang Zhang, Jiawen Kang, Zehui Xiong, Shiwen Mao, and Zhu Han. Generative ai-empowered simulation for autonomous driving in vehicular mixed reality metaverses. arXiv preprint arXiv:2302.08418, 2023. [155] D. Yadav, N. Kohli, J. S. Doyle, R. Singh, M. Vatsa, and K. W. Bowyer. Unraveling the effect of textured contact lenses on iris recognition. In IEEE Transactions on Information Forensics and Security (TIFS), 9:851–862, 2014. [156] Shivangi Yadav, Cunjian Chen, and Arun Ross. Synthesizing iris images using rasgan with application in presentation attack detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019. [157] Shivangi Yadav, Cunjian Chen, and Arun Ross. Relativistic Discriminator: A One-Class Classifier for Generalized Iris Presentation Attack Detection. In IEEE Winter Conference on Applications of Computer Vision, pages 2635–2644, 2020. [158] Shivangi Yadav and Arun Ross. Cit-gan: Cyclic image translation generative adversarial net- work with application in iris presentation attack detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2412–2421, 2021. 160 [159] Shivangi Yadav and Arun Ross. iwarpgan: Disentangling identity and style to generate synthetic iris images. arXiv preprint arXiv:2305.12596, 2023. [160] Shivangi Yadav and Arun Ross. Synthesizing iris images using generative adversarial networks: Survey and comparative analysis. arXiv preprint arXiv:2404.17105, 2024. [161] D. Yambay, B. Walczak, S. Schuckers, and A. Czajka. LivDet-Iris 2015 - iris liveness detection competition 2015. In IEEE International Conference on Identity, Security and Behavior Analysis (ISBA), pages 1–6, 2017. 
[162] David Yambay, Benedict Becker, Naman Kohli, Daksha Yadav, Adam Czajka, Kevin W Bowyer, Stephanie Schuckers, Richa Singh, Mayank Vatsa, Afzel Noore, et al. LivDet Iris 2017 - Iris Liveness Detection Competition. In IEEE International Joint Conference on Biometrics (IJCB), pages 733–741, 2017.
[163] Dingdong Yang, Seunghoon Hong, Yunseok Jang, Tianchen Zhao, and Honglak Lee. Diversity-sensitive conditional generative adversarial networks. arXiv preprint arXiv:1901.09024, 2019.
[164] Svetlana N Yanushkevich, Adrian Stoica, Vlad P Shmerko, and Denis V Popel. Biometric inverse problems. CRC Press, 2018.
[165] Raymond Yeh, Ziwei Liu, Dan B Goldman, and Aseem Agarwala. Semantic facial expression editing using autoencoded flow. arXiv preprint arXiv:1611.09961, 2016.
[166] Bassel Zeno, Ilya Kalinovskiy, and Yuri Matveev. IP-GAN: Learning identity and pose disentanglement in generative adversarial networks. In International Conference on Artificial Neural Networks, pages 535–547. Springer, 2019.
[167] Man Zhang, Qi Zhang, Zhenan Sun, Shujuan Zhou, and Nasir Uddin Ahmed. The BTAS competition on mobile iris recognition. In IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), pages 1–7, 2016.
[168] Mei Zhang, Jinglan Wu, Huifeng Lin, Peng Yuan, and Yanan Song. The application of one-class classifier based on CNN in image defect detection. Procedia Computer Science, 114:341–348, 2017.
[169] Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, and Alexei A Efros. View synthesis by appearance flow. In European Conference on Computer Vision, pages 286–301. Springer, 2016.
[170] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223–2232, 2017.
[171] Tianlei Zhu, Junqi Chen, Renzhe Zhu, and Gaurav Gupta. StyleGAN3: Generative networks for improving the equivariance of translation and rotation. arXiv preprint arXiv:2307.03898, 2023.
[172] Hang Zou, Hui Zhang, Xingguang Li, Jing Liu, and Zhaofeng He. Generation textured contact lenses iris images based on 4DCycle-GAN. In 2018 24th International Conference on Pattern Recognition (ICPR), pages 3561–3566. IEEE, 2018.
[173] Jinyu Zuo, Natalia A Schmid, and Xiaohan Chen. On generation and analysis of synthetic iris images. IEEE Transactions on Information Forensics and Security, 2(1):77–90, 2007.
APPENDIX
SYNTHETIC IRIS IMAGES USING GENERATIVE ADVERSARIAL NETWORKS: SURVEY AND COMPARATIVE ANALYSIS
This work is nearing completion and is being prepared for submission to ACM Computing Surveys.
Introduction
Iris-based biometric recognition systems have gained significant attention in recent years, with applications in various domains [55, 105, 111]. However, as this field advances, it brings forth a range of research challenges that require exploration and innovation. Notably, a prominent challenge in biometrics research is the unavailability of datasets with sufficient size, quality, and diverse intra-class variations. Despite significant strides in biometric technology, the datasets commonly employed for training and assessing these systems frequently fall short in terms of sample quantity and coverage of the full spectrum of intra-class variability. This deficiency hinders the development and dependable evaluation of iris recognition systems.
Another critical challenge is to ensure the privacy of individuals who submit their biometric data. Biometric information, being inherently personal and unique, raises concerns about unauthorized access, data breaches, and potential identity theft. Protecting the privacy of individuals while utilizing their biometric data is of utmost importance to foster trust and encourage widespread adoption of biometric systems [145]. Researchers have been actively working on finding solutions to overcome these challenges. One such solution is to explore the potential of synthetically generated iris datasets.
Synthetic iris generation refers to the creation of artificial iris images that replicate real-world characteristics. It encompasses the generation of synthetic samples that emulate the traits and patterns observed in authentic iris images [35]. These synthetic samples are designed to possess statistical properties and variations similar to genuine iris data, providing a valuable resource for research, development, and evaluation in the field of biometrics. Synthetically generated iris images can therefore be utilized to produce more data with both inter- and intra-class variations. This helps overcome the limitations and challenges associated with conventional iris datasets. Traditional iris datasets often suffer from restricted sample sizes, lack diversity, and raise concerns regarding privacy and data sharing. In contrast, synthetically generated irides offer a controlled and scalable solution by generating artificial iris data that can mimic the complexity and diversity of real-world biometric traits [73].
By generating synthetic irides, researchers and developers can gain access to larger and more diverse datasets that facilitate comprehensive testing and optimization of iris-based biometric algorithms, enhancing their performance, reliability, security and generalization capabilities. Additionally, synthetically generated irides address privacy concerns associated with the utilization of genuine biometric data. The artificial nature of the generated data ensures that it is not directly linked to any specific individual, mitigating the risks of unauthorized access or misuse of personal information. Consequently, synthetic biometric datasets can be shared and distributed for research and evaluation purposes without compromising individuals' privacy rights. Furthermore, synthetically generated irides play a crucial role in enhancing the training and testing of deep convolutional neural network (CNN) models. Deep CNNs have demonstrated remarkable performance in various biometric tasks but rely heavily on large labeled datasets for effective training. Synthetically generated irides can aid in augmenting the availability of labeled training data by creating synthetic samples with known ground-truth annotations. This facilitates the creation of more extensive and diverse training sets, resulting in improved CNN model training and higher accuracy in iris recognition systems. Moreover, synthetically generated irides contribute to the development of robust presentation attack detection (PAD) algorithms. Presentation attacks (PAs), where impostors attempt to deceive biometric systems using fabricated or altered irides, pose significant security risks. Various presentation attack scenarios can be simulated by generating synthetic irides with diverse variations and attack types (for example, cosmetic contact lenses, printed eyes, artificial eyes, etc.).
Such synthetically generated datasets enable the training and evaluation of PA detection algorithms, enhancing their effectiveness and enabling the development of resilient countermeasures against evolving PA techniques. Thus, depending on their usage, the applications of synthetically generated irides can be described as follows [45, 58, 73, 145, 158]:
• Algorithm Development and Testing: Synthetic iris datasets serve as a valuable resource for algorithm development and testing in the field of biometrics. By generating synthetic iris samples, researchers can assess and evaluate the performance of novel algorithms, compare different techniques, and benchmark their efficacy against a standardized iris dataset. Synthetic data allows for controlled experimentation, enabling researchers to precisely manipulate specific biometric traits, variations, and noise levels to simulate real-world scenarios.
• System Evaluation and Benchmarking: Synthetic iris datasets play a vital role in evaluating and benchmarking the performance of iris recognition systems. They provide a standardized dataset that enables fair comparisons between different systems and algorithms. By using synthetic data, researchers and developers can assess system accuracy, robustness, and vulnerability to various attacks or spoofing attempts. This evaluation process aids in identifying system weaknesses, improving overall system performance, and guiding the development of countermeasures.
• Training Data Augmentation: Synthetic iris datasets can be utilized to augment training datasets, enhancing the performance of iris recognition systems. By generating additional synthetic samples, researchers can increase the size and diversity of the training set, which helps to improve the generalization capabilities of the algorithms. This approach reduces overfitting, enhances the system's ability to handle intra-class variations, and improves overall recognition accuracy.
• Privacy-Preserving Studies: Synthetic iris datasets are invaluable for privacy-preserving studies and research involving sensitive biometric information. They allow researchers to conduct studies, simulations, and experiments without the need for real individuals' personal biometric data. Synthetic iris datasets provide a privacy-friendly alternative that ensures data protection while enabling advancements in biometric research and system development.
With these various applications and advantages, synthetic iris datasets offer flexibility and convenience in the development and assessment of iris-based recognition systems and attack detection methods. In this context, it is important to understand the different methods employed to generate synthetic iris samples and their respective approaches. Therefore, in this comprehensive review we explore the current state-of-the-art synthetic iris image generation methods and take a closer look at the strengths and weaknesses of these methods in terms of quality, identity uniqueness and utility. By doing so, we aim for this survey to be a helpful source of accurate technical information for those interested in learning about the progress and difficulties in synthesizing good-quality iris images. Thus, this review makes the following specific contributions:
• Study the current state-of-the-art methods to generate synthetic irides and explain their pros and cons.
• An assessment of synthetic iris images generated by current state-of-the-art methods in terms of quality, uniqueness and utility.
• Analyze current deep learning based iris recognition systems and how synthetically generated iris datasets can enhance their performance.
• Analyze current deep learning based presentation attack detection algorithms and how synthetically generated iris datasets can enhance their performance.
• Discuss future directions to overcome the challenges of current methods and generate enhanced synthetic iris datasets.
Iris Recognition
In this section, we briefly discuss iris recognition to establish the technical context for this review. For a detailed exploration of iris recognition, we recommend referring to [17, 37, 104]. The foundational technology behind modern iris recognition systems can be traced back to the work of John Daugman [37], who is credited with the development of the core algorithms that make such systems possible. Daugman's work leverages the distinct patterns found in the human iris to create a method for secure and accurate human recognition. There have been significant improvements on his initial work for various security, identification and privacy applications. Iris recognition algorithms can be divided into four sub-problems:
• Iris Segmentation: Most iris images contain not only the iris but also regions such as the pupil, sclera and eyelashes. So, the first step towards iris recognition is to segment the iris from the captured image to remove this extraneous information. Most of the initial segmentation approaches, including Daugman's, involve identifying the pupil and iris boundaries. In traditional approaches, occlusions are minimized by edge detection and curve fitting over the eyelids.
• Normalization: Post segmentation, the variations in the segmented irides (caused by distance from the sensor, viewing angle or pupil size) are minimized via normalization, where the annular iris is unwrapped to a fixed resolution by transforming it from the Cartesian coordinate system to a polar coordinate system.
• Feature Encoding: After normalization, iris features are extracted and encoded so that they can be used in matching. The most common techniques for iris feature extraction involve Gabor filtering, BSIF, and similar texture operators that help capture the unique textural properties of the iris.
• Matching: Once features are encoded, various matching algorithms can be used for iris recognition. Daugman used Gabor phase-quadrant features to encode the iris and the Hamming distance to match iris samples; these have been improved over time to account for variations in image quality, occlusion and noise. (A simple illustrative sketch of normalization and matching is given after this list.)
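To make the normalization and matching steps concrete, the following is a minimal illustrative sketch (not taken from any of the cited implementations) of Daugman-style rubber-sheet normalization and fractional Hamming-distance matching. The circle parameters, iris codes and occlusion masks are assumed to come from upstream segmentation and encoding stages.

```python
import numpy as np

def rubber_sheet_normalize(eye_img, pupil_xyr, iris_xyr, out_h=64, out_w=512):
    """Daugman-style rubber-sheet model (minimal sketch): unwrap the annular
    iris region from Cartesian coordinates onto a fixed-size polar grid."""
    xp, yp, rp = pupil_xyr              # pupil centre (x, y) and radius
    xi, yi, ri = iris_xyr               # limbic (iris) centre (x, y) and radius
    thetas = np.linspace(0, 2 * np.pi, out_w, endpoint=False)
    radii = np.linspace(0, 1, out_h)
    norm = np.zeros((out_h, out_w), dtype=eye_img.dtype)
    for j, t in enumerate(thetas):
        # boundary points at this angle on the pupil and iris circles
        x_in, y_in = xp + rp * np.cos(t), yp + rp * np.sin(t)
        x_out, y_out = xi + ri * np.cos(t), yi + ri * np.sin(t)
        for i, r in enumerate(radii):
            # linear interpolation between the two boundaries
            x = int(round((1 - r) * x_in + r * x_out))
            y = int(round((1 - r) * y_in + r * y_out))
            if 0 <= y < eye_img.shape[0] and 0 <= x < eye_img.shape[1]:
                norm[i, j] = eye_img[y, x]
    return norm

def hamming_distance(code_a, code_b, mask_a, mask_b):
    """Fractional Hamming distance between two binary iris codes, counting
    only bits that are valid (unoccluded) in both masks; smaller is a better match."""
    valid = mask_a & mask_b
    if valid.sum() == 0:
        return 1.0
    return np.count_nonzero((code_a ^ code_b) & valid) / valid.sum()
```

In a full pipeline, the normalized strip would additionally be filtered (e.g., with Gabor filters) and binarized to produce the code and mask arrays consumed by the matcher above.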
With recent developments in the field of deep learning, deep networks have found application in all four stages of iris recognition. Iris segmentation and feature extraction have particularly benefited from deep learning-based approaches, since these handle noise and complexities in iris datasets better than traditional approaches. For example, Chen and Ross [21] proposed a novel multi-task learning framework using a convolutional neural network (CNN) designed to carry out iris localization alongside presentation attack detection (PAD) effectively. This multi-task PAD (MT-PAD) approach determines the iris boundaries and assesses the likelihood of a presentation attack from the input image at the same time. Through rigorous testing, this method demonstrated state-of-the-art performance on various iris datasets. In [104], Nguyen et al. studied how well pre-trained convolutional neural networks (CNNs) perform in the domain of iris recognition. The study reveals that features derived from off-the-shelf CNNs can efficiently capture the complex characteristics of irides. These features are adept at isolating distinguishing visual attributes, which leads to encouraging outcomes in iris recognition performance.
While the progress in the field of deep learning has helped improve the reliability of iris recognition systems by improving their performance under various conditions, the lack of iris data with sufficient inter- and intra-class variations limits the training and testing of these systems. Therefore, we need to explore generative methods that produce fully synthetic iris images, which can help obtain a large iris dataset with enough inter- and intra-class variations to train and test a robust iris recognition system.
Iris Presentation Attack Detection
Iris presentation attack detection (PAD) is an essential aspect of iris recognition systems. As the reliance on iris recognition systems grows, so does the sophistication of attacks designed to exploit them. Here, we briefly examine the nature of iris presentation attacks, the methodologies developed to detect them, and the challenges faced in enhancing iris PAD systems. Iris presentation attacks (PAs), also known as spoofs, refer to physical artifacts that aim to either impersonate someone or obfuscate one's identity in order to fool the recognition system. There are several types of presentation attacks on iris recognition systems:
• Print Attack: One of the simplest forms of iris PA involves the attacker presenting a high-quality photograph of a valid subject's iris to the biometric system. Basic systems might be misled by the photograph's visual fidelity unless they are designed to detect the absence of depth or natural eye movements.
• Artificial Eyes: Attackers may employ high-grade artificial (prosthetic or doll) eyes that replicate the iris's texture and three-dimensionality. These artificial eyes seek to deceive scanners that are not sophisticated enough to discern a genuine subject from an attacker based on liveness indicators such as the pupil's response to light stimuli.
• Cosmetic Contact Lens: A more nuanced approach involves cosmetic contact lenses that have been artificially created with iris patterns that can either conceal the attacker's true iris or mimic someone else's identity. This type of attack attempts to bypass systems that match iris patterns by introducing false textural elements.
• Replay Attack: Playing back a video recording of a genuine iris to the scanner constitutes another PA. Advanced iris recognition systems counter this by looking for evidence of liveness, like blinking or involuntary pupil contractions.
Many researchers have proposed different methods to effectively detect different types of PAs. The works in [60, 80] utilize textural descriptors such as GIST, LBP and HOG to detect printed eyes and cosmetic contact lenses. Similarly, Raghavendra and Busch [115] utilize cepstral features with binary statistical image features (BSIF) to distinguish between bonafide irides and print attacks. Another way to detect print attacks is a liveness test, since liveness is lacking in printed eyes [33, 74]. Liveness tests can also be helpful in detecting attacks such as artificial eyes. Eye-gaze tracking [88] and multi-spectral imaging [22] have shown good results in detecting printed eyes and artificial eyes. Deep-network-based PA detection methods have also been proposed; for example, Hoffman et al. [64] proposed a deep network that utilizes patch information along with a segmentation mask to learn features that can distinguish bonafide irides from iris PAs.
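Whether based on hand-crafted texture descriptors or deep networks, most of these detectors are ultimately trained as binary classifiers that separate bonafide irides from PAs. The following is a minimal, illustrative PyTorch sketch of such a binary PAD classifier; the network, hyperparameters and the `pad_loader` data loader are hypothetical placeholders rather than any of the cited architectures.

```python
import torch
import torch.nn as nn

# Illustrative binary PAD classifier: a small CNN trained to separate
# bonafide iris images (label 0) from presentation attacks (label 1).
# `pad_loader` is a hypothetical DataLoader yielding (image, label) batches.

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1))                       # single logit: PA score

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_one_epoch(pad_loader):
    model.train()
    for images, labels in pad_loader:       # images: (B,1,H,W), labels: (B,)
        logits = model(images).squeeze(1)   # (B,) PA logits
        loss = criterion(logits, labels.float())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

This binary formulation is precisely what makes balanced bonafide/PA training data so important, which motivates the partially synthetic generation methods discussed later in this appendix.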
While these iris PA detection methods perform well on various datasets, attackers are continuously finding new ways to bypass them, leading to an arms race between security experts and attackers. As a result, PAD methods need to be constantly updated (re-trained or fine-tuned) and tested against the latest forms of attack. This calls for PA detection methods that can generalize well to new (or unseen) PAs without the hassle of re-training or fine-tuning. Here, "seen PAs" are those to which the PAD methods have been exposed during the training phase. In contrast, "unseen PAs" are not included in the training phase, posing a concerning challenge for accurate PA detection. Recent developments in PAD methods have focused on enhancing the ability of systems to generalize, distinguishing bonafide irides from PAs even when encountering previously unseen PAs. Gupta et al. [59] proposed a deep network called MVANet, which uses multiple convolutional layers for generalized PA detection. This network not only improves PA detection accuracy, but also addresses the high computational costs typically associated with training deep neural networks by using a simplified base model structure. Evaluations across different databases indicate MVANet's proficiency in generalizing to detect new and unseen PAs. In [128], Sharma and Ross proposed D-NetPAD, a PAD method based on DenseNet designed to generalize over seen and unseen PAs. It has demonstrated a strong ability to generalize across diverse PAs, sensors, and data collections. Their rigorous testing confirms D-NetPAD's robustness in detecting generalized PAs.
Most PAD methods formulate PA detection as a binary-class problem, which demands the availability of a large collection of both bonafide and PA samples to train classifiers. However, obtaining a large number of PA samples can be much more difficult than obtaining bonafide iris samples. Further, classifiers are usually trained and tested on similar PAs, but PAs encountered in operational systems can be diverse in nature and may not be available during the training stage. Therefore, we need to explore generative methods that produce partially synthetic iris images (as identity is not the focus in PA detection) and can help build balanced iris PA datasets. This will help researchers better train and test their detection methods.
Generating Synthetic Irides
As mentioned earlier, synthetic iris images offer several advantages, including scalability, diversity, and control over the generated data. Some of the methods to generate such images are listed below, categorized on the basis of the underlying technique:
• Texture Synthesis: This technique has been widely used for generating synthetic iris images. These methods analyze the statistical properties of real iris images and generate new images based on those statistics. Shah and Ross [126] proposed an approach for generating digital renditions of iris images using a two-step technique. In the first stage, they utilized a Markov Random Field model to generate a background texture that accurately represents the global appearance of the iris. In the subsequent stage, various iris features, including radial and concentric furrows, collarette, and crypts, are generated and seamlessly embedded within the texture field.
In another example, Makthal and Ross [94] introduced a novel approach for synthetic iris generation using Markov Random Field (MRF) modeling. The proposed method offers a deterministic synthesis procedure, which eliminates the need for sampling a probability distribution and simplifies computational complexity. Additionally, the study highlights the distinctiveness of iris textures compared to other non-stochastic textural patterns. Through clustering experiments, it is demonstrated that the synthetic irides generated using this technique exhibit content similarity to real iris images. In a different approach, Wei et al. [149] proposed a framework for synthesizing large and realistic iris datasets by utilizing iris patches as fundamental elements to capture the visual primitives of iris texture. Through patch-based sampling, an iris prototype is created, serving as the foundation for generating a set of pseudo irides with intra-class variations. Qualitative and quantitative analyses demonstrate that the synthetic datasets generated by this framework are well suited for evaluating iris recognition systems.
• Morphable Models: Morphable models have been utilized for generating synthetic iris images by capturing the shape and appearance variations in a statistical model. These models represent the shape and texture of irides using a low-dimensional parameter space. By manipulating the parameters, synthetic iris images with different characteristics, such as size, shape, and texture, can be generated. Most of the research in this category focuses on generating synthetic iris images for gaze estimation and rendering eye movements. Wood et al. [150] proposed a 3D morphable model of the eye region with gaze estimation and gaze re-targeting from a single reference image. Similarly, [10] focuses on achieving photo-realistic rendering of eye movements in 3D facial animation. The model is built upon 3D scans of a face captured from various gaze directions, enabling the capture of realistic motion of the eyeball, eyelid deformation, and the surrounding skin. To represent these deformations, a 3D morphable model is employed.
• Image Warping: Image warping techniques involve applying geometric transformations to real iris images to generate synthetic images. These transformations can include rotations, translations, scaling, and deformations. Image warping allows for the generation of synthetic iris images with variations in pose, gaze direction, and occlusions. In [19], Cardoso et al. aimed to generate synthetic degraded iris images for evaluation purposes. The method utilizes various degradation factors such as blur, noise, occlusion, and contrast changes to simulate realistic and challenging iris image conditions. The degradation factors are carefully controlled to achieve a realistic representation of degraded iris images commonly encountered in real-world scenarios. In [32], a novel iris image synthesis method combining principal component analysis (PCA) and super-resolution techniques is proposed. The study begins by introducing an iris recognition algorithm based on PCA, followed by the presentation of the iris image synthesis method. The proposed synthesis method involves the construction of coarse iris images using predetermined coefficients. Subsequently, super-resolution techniques are applied to enhance the quality of the synthesized iris images. By manipulating the coefficients, it becomes possible to generate a wide range of iris images belonging to specific classes.
• Generative Adversarial Networks (GANs): GANs have gained significant attention for generating realistic and diverse synthetic iris images. In a GAN framework, a generator network learns to generate synthetic iris images, while a discriminator network distinguishes between real and synthetic images. The two networks are trained in an adversarial manner, resulting in improved image quality over time (a minimal sketch of this adversarial training loop is given after this list). GANs can generate iris images with realistic features, including iris texture, color, and overall appearance. Minaee and Abdolrashidi [100] proposed a framework that utilizes a generative adversarial network (GAN) to generate synthetic iris images sampled from a learned prior distribution. The framework is applied to two widely used iris datasets, and the generated images demonstrate a high level of realism, closely resembling the distribution of images within the original datasets. Similarly, Kohli et al. [82] proposed iDCGAN (iris Deep Convolutional Generative Adversarial Network), a novel framework that leverages deep convolutional generative adversarial networks and iris quality metrics to generate synthetic iris images that closely resemble real iris images. Bamoriya et al. [9] proposed a novel approach, called Deep Synthetic Biometric GAN (DSB-GAN), for generating realistic synthetic biometrics that can serve as large training datasets for deep learning networks, enhancing their robustness against adversarial attacks. Currently, GAN-based methods for generating synthetic biometrics have proven far superior at capturing the intricate details of various biometric cues. Therefore, in the remainder of this paper, we focus mainly on these methods and on the iris images they generate for our study and analysis.
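To illustrate the adversarial objective shared by the GAN-based methods above, the following is a minimal, self-contained DCGAN-style sketch in PyTorch. The network sizes, hyperparameters and the random placeholder batch are illustrative assumptions rather than the configuration of any specific method surveyed here.

```python
import torch
import torch.nn as nn

# Minimal DCGAN-style sketch for 64x64 grayscale (e.g., NIR) iris crops.
# Dataset loading is omitted; `real_batch` below is a stand-in tensor.

latent_dim = 100

G = nn.Sequential(                      # latent vector z -> 64x64 image
    nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),
    nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.BatchNorm2d(16), nn.ReLU(True),
    nn.ConvTranspose2d(16, 1, 4, 2, 1), nn.Tanh())

D = nn.Sequential(                      # image -> real/fake logit
    nn.Conv2d(1, 16, 4, 2, 1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(16, 32, 4, 2, 1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(128, 1, 4, 1, 0), nn.Flatten())

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

for step in range(1000):                              # toy training loop
    real_batch = torch.rand(32, 1, 64, 64) * 2 - 1    # placeholder for real iris crops
    z = torch.randn(32, latent_dim, 1, 1)
    fake_batch = G(z)

    # Discriminator update: real -> 1, fake -> 0
    d_loss = bce(D(real_batch), torch.ones(32, 1)) + \
             bce(D(fake_batch.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: try to fool the discriminator (fake -> 1)
    g_loss = bce(D(fake_batch), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In practice, the placeholder batch would be replaced by mini-batches of real iris images, and methods such as RaSGAN, CIT-GAN and StarGAN v2 build on this basic objective with relativistic losses, style networks or domain labels.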
proposed a novel approach, DSB-GAN, which is built upon a combination of a convolutional autoencoder (CAE) and a DCGAN; the evaluation of DSB-GAN is conducted on three biometric modalities: fingerprint, iris, and palmprint. One of the notable advantages of DSB-GAN is its efficiency, owing to a low number of trainable parameters compared to existing state-of-the-art methods. Yadav et al. [156, 157] leverage RaSGAN to generate high-quality partially synthetic iris images in the NIR spectrum and evaluate the effectiveness and usefulness of these images as both bonafide samples and presentation attacks. They also proposed a novel one-class presentation attack detection method, known as RD-PAD, for unseen presentation attack detection, addressing the challenge of generalizability in PAD algorithms. Zou et al. [172] proposed 4DCycle-GAN, which is designed to enlarge databases of iris PA images by generating synthetic iris images with cosmetic contact lenses. Building upon the Cycle-GAN framework, the 4DCycle-GAN algorithm adds two discriminators to the existing model to increase the diversity of the generated images. These additional discriminators are engineered to favor the images from the generators rather than those from real-life captures. This approach reduces the bias towards generating repetitive contact lens textures, which typically make up a significant portion of the training data. In [158], Yadav and Ross proposed to generate bonafide images as well as different types of presentation attacks in the NIR spectrum using a novel image-translative GAN, known as CIT-GAN. The proposed architecture translates the style of one domain to another using a styling network to generate realistic, high-resolution iris images.

Fully Synthetic Irides

Fully-synthetic biometric data refer to entirely artificial biometric samples that do not correspond to any real individuals in the population. These synthetic samples are created using mathematical models, statistical distributions, or generative algorithms to simulate the statistical properties and characteristics of real biometric data. Some of the texture-based methods focus on generating new iris identities (fully synthetic) that are unique from the training samples. These methods aim to generate new iris images with both inter- and intra-class variations, which helps mitigate the issue of small training sets by increasing dataset size and thereby improves the development and testing of recognition systems. Also, by generating fully synthetic identities that do not resemble anyone in the real world, we can address the privacy concerns attached to using a real person's biometric data. In [147], Wang et al. proposed a novel algorithm for generating diverse iris images, enhancing both the variety and the number of images available for analysis. The technique employs contrastive learning to separate features tied to identity (such as iris texture and eye orientation) from those that change with conditions (such as pupil size and iris exposure). This separation allows for precise identity representation in synthetic images. The algorithm uniquely processes iris topology and texture through a dual-channel input system, enabling the generation of varied iris images that retain specific texture details.
Yadav and Ross [159] proposed iWarpGAN, which aims to disentangle identity and stylistic elements within iris images. It achieves this through two distinct pathways: one that transforms identity features from real irides to create new iris identities, and another that captures style from a reference image to infuse it into the output. By merging these modified identity and style elements, iWarpGAN can produce iris images with a wide range of inter- and intra-class variations. Limited work has been done in this category to generate irides whose identity does not match any identity in the training data; this is an important and emerging topic that needs more attention.

Experiments & Results

In this section, we discuss the different experiments evaluating the capability of different GAN methods to generate fully and partially synthetic iris images. The GAN methods studied in this research are: RaSGAN [72, 156], CIT-GAN [157], StarGAN-v2, Stylegan-3 [171] and iWarpGAN [159].

Datasets Used

In this research, we explore the usefulness and utility of generated iris images for iris recognition and PA detection. The datasets used in this research are listed as follows:

Iris Datasets

In this research, we conducted our experiments and analysis using three iris datasets:

• CASIA-Iris-Thousand [5]: Developed by the Chinese Academy of Sciences Institute of Automation, the CASIA-Iris-Thousand dataset is a popular resource for studying iris patterns and for advancing iris recognition technologies. This dataset comprises 20,000 iris images from 1,000 participants, accounting for 2,000 distinct identities when images of both the left and right eyes are considered. The images are captured at a resolution of 640x480 pixels. The dataset has been partitioned into training and testing subsets, with a distribution of 70% for training (1,400 identities) and 30% for testing (600 identities).

• CASIA Cross Sensor Iris Dataset (CSIR) [152]: The training portion of the CASIA-CSIR dataset, provided by the Chinese Academy of Sciences Institute of Automation, was employed in our study. It includes a total of 7,964 iris images from 100 individuals, representing 200 unique identities when both eyes are considered. Similar to the first dataset, a 70-30 split based on unique identities was used to divide the images into training (5,411 images) and testing (2,553 images) sets, intended for the training and evaluation of deep learning models for iris recognition.

• IITD-iris [6]: Originating from the Indian Institute of Technology, Delhi, the IITD-iris dataset was collected in an indoor setting and consists of 1,120 iris images from 224 subjects. The images were captured using JIRIS JPC1000 and digital CMOS cameras, each with a resolution of 320x240 pixels. In line with the previous datasets, this one also uses a 70-30 split based on unique identities for its training (314 identities) and testing (134 identities) sets; a minimal sketch of such an identity-disjoint split is shown after this list.
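All three datasets are partitioned on unique identities rather than on individual images, so that no identity appears in both the training and testing subsets. The following is a minimal sketch of such an identity-disjoint 70-30 split; the function name and the (image path, identity label) input format are illustrative assumptions, not the actual partitioning script used in this work.

import random
from collections import defaultdict

def identity_disjoint_split(samples, train_fraction=0.70, seed=0):
    """Split (image_path, identity_label) pairs so that the training and
    testing subsets share no identities (70-30 split by default)."""
    by_identity = defaultdict(list)
    for path, identity in samples:
        by_identity[identity].append(path)

    identities = sorted(by_identity)
    random.Random(seed).shuffle(identities)

    n_train = int(round(train_fraction * len(identities)))
    train_ids = set(identities[:n_train])

    train = [(p, i) for i in identities if i in train_ids for p in by_identity[i]]
    test = [(p, i) for i in identities if i not in train_ids for p in by_identity[i]]
    return train, test

# Example: train, test = identity_disjoint_split(samples, train_fraction=0.70)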
For data preparation, we processed and resized all iris images to 256x256 pixels, centering on the iris region as determined by the iris and pupil coordinates obtained from VeriEye (www.neurotechnology.com/verieye.html).

Iris PA Datasets

Our exploration of synthetic images for iris PA detection leverages five distinct iris PA datasets: Casia-iris-fake [136], Berc-iris-fake [91], NDCLD15 [41], LivDet2017 [161], and MSU-IrisPA-01 [156], each comprising authentic iris images alongside various categories of PAs such as cosmetic contact lenses, printed iris images, artificial eyes, and display-based attacks. As mentioned earlier, we processed and resized the images to 256x256 pixels, centering on the iris region as determined by the iris and pupil coordinates from VeriEye. Images that VeriEye failed to process correctly were excluded from the study, since our focus is primarily on image synthesis. The resulting processed dataset contains 24,409 genuine iris images, 6,824 images with cosmetic contact lenses, 680 artificial eyes, and 13,293 printed iris images. The train and test division of this dataset is explained later in this section.

Quality of Generated Images

In order to evaluate the realism and quality of the generated iris images, the different GAN methods - RaSGAN, CITGAN, StarGAN-v2, Stylegan-3 and iWarpGAN - are trained using real irides from the CASIA-Iris-Thousand, CASIA Cross Sensor Iris and IITD-iris datasets, separately. Using the trained networks, we generate three sets of 20,000 synthetic bonafide images (one set per training dataset) for each of the GANs mentioned above. We then evaluate the realism of the generated images and the quality of the iris using three different methods: (1) the Fréchet Inception Distance [119], (2) the VeriEye rejection rate, and (3) the ISO/IEC 29794-6 Standard Quality Metrics [3].

Fréchet Inception Distance Score

The Fréchet Inception Distance (FID) score is a metric used to assess the quality of synthetically generated images by comparing their distribution to that of real images. The objective is to minimize this score, as a lower FID score indicates greater resemblance between the synthetic and real datasets. FID scores can span a broad range, with extremely high scores in the 400-600 range indicating significant deviation from the real data distribution and, consequently, poor synthetic image quality [119]. In our analysis of the synthetically generated iris images produced by the different GANs used in this study, we obtained average FID scores of 24.33 and 31.82 for RaSGAN and StarGAN-v2, and scores of 26.90, 15.72 and 17.62 for CIT-GAN, Stylegan-3 and iWarpGAN, respectively. As mentioned earlier, the lower the FID score, the more realistic the generated images are with respect to real images. Therefore, we can conclude that Stylegan-3 and iWarpGAN generate the most realistic iris images. The distribution of these FID scores is shown in Figure A.1.

Figure A.1: Histograms showing the realism scores (i.e., FID scores) of real iris images from the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets and of the synthetically generated iris images; the lower the FID score, the more realistic the generated iris images.
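For reference, the FID between real and generated images reduces to the Fréchet distance between two Gaussians fitted to their Inception-v3 features: ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2)). The sketch below computes this quantity from pre-extracted feature matrices; the feature-extraction step is assumed to have been performed separately, and the code is an illustrative implementation rather than the exact evaluation script used in this study.

import numpy as np
from scipy import linalg

def fid_score(real_feats, gen_feats):
    """Frechet Inception Distance between two (N, D) arrays of Inception-v3
    features (e.g., D = 2048 pooling features) for real and generated images."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)

    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard negligible imaginary parts from numerical error

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))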
VeriEye Rejection Rate

For this experiment, we followed the protocol described in [159] to evaluate the effectiveness of the various synthetic iris image generation methods through their acceptance by VeriEye, a commercial iris matcher. Here, we analyze the rate at which VeriEye rejects synthetic images produced by the different GAN methods. In the first set of comparisons, using the IITD-iris dataset, which comprises 1,120 real iris images, only 0.18% of the real images were rejected by VeriEye. In contrast, 1,120 synthetic images produced by RaSGAN and StarGAN-v2 had rejection rates of 4.55% and 4.64%, respectively. However, images generated by CITGAN, iWarpGAN and Stylegan-3 showed significantly lower rejection rates of 2.85%, 0.73% and 1.07%, respectively. The CASIA-CSIR dataset contains 7,964 real iris images, with a rejection rate of 2.81%. For 7,964 synthetic images, those generated by RaSGAN and StarGAN-v2 were rejected at rates of 2.06% and 2.65%, respectively, while CITGAN, iWarpGAN and Stylegan-3 produced images with rejection rates of 2.71%, 2.74% and 2.52%, respectively, all comparable to the rejection rate of the real images. Lastly, the CASIA-Iris-Thousand dataset, which includes 20,000 real iris images, saw a very low rejection rate of 0.06%. Among the synthetic images, those from RaSGAN and StarGAN-v2 had the highest rejection rates at 0.34% and 0.27%, respectively, while synthetic images from CITGAN, iWarpGAN and Stylegan-3 showed lower rejection rates of 0.24%, 0.18% and 0.16%, respectively.

ISO/IEC 29794-6 Standard Quality Metrics

As described in [159], the fidelity of the synthetically produced iris images is also assessed using the ISO/IEC 29794-6 Standard Quality Metrics [3]. This assessment was applied to the images generated by the different GANs utilized in this study. The ISO standard employs a set of criteria to evaluate the quality of an iris image, including the usable iris area, the iris-sclera contrast, image sharpness, the iris-pupil contrast, pupil shape, and more, culminating in a comprehensive quality score. This score is on a scale from 0 to 100, where 0 indicates the lowest image quality and 100 the highest. Images that fail to be evaluated by this ISO metric, typically due to substandard quality or errors in segmentation, are assigned a score of 255. As shown in Figure A.2, the quality scores for the 20,000 synthetic iris images obtained using iWarpGAN, CITGAN and Stylegan-3 are on par with those of real iris images. Conversely, a noticeable number of images generated by RaSGAN were assigned the score of 255, reflecting their inferior quality. Additionally, a comparison across the three datasets showed that the CASIA-CSIR dataset contained a higher proportion of images with the score of 255, in contrast to the IITD-iris and CASIA-Iris-Thousand datasets.

Figure A.2: Histograms depicting the quality of real irides alongside the quality of synthetic irides generated by the various GANs, for the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets. These evaluations follow the ISO/IEC 29794-6 Standard Quality Metrics, with the quality scale set between 0 and 100, where a higher score denotes superior quality; iris images that could not be assessed by this standard were assigned a score of 255.
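Because images that cannot be assessed are assigned a score of 255, the quality histograms in Figure A.2 are easiest to compare after separating such failure cases from the valid 0-100 scores. The short sketch below illustrates this bookkeeping; it assumes per-image overall quality scores have already been produced by an ISO/IEC 29794-6 implementation and is not itself part of the standard.

import numpy as np

FAILURE_SCORE = 255  # assigned when the ISO/IEC 29794-6 assessment fails

def summarize_quality(scores):
    """Summarize overall ISO quality scores (0-100, or 255 on failure)."""
    scores = np.asarray(scores)
    failed = scores == FAILURE_SCORE
    valid = scores[~failed]
    return {
        "num_images": int(scores.size),
        "failure_rate": float(failed.mean()),
        "mean_quality": float(valid.mean()) if valid.size else float("nan"),
        "median_quality": float(np.median(valid)) if valid.size else float("nan"),
    }

# Example: compare real irides against one GAN's synthetic irides.
# summarize_quality(real_scores); summarize_quality(synthetic_scores)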
Uniqueness of Synthetically Generated Irides

This experiment examines the uniqueness of the iris images generated synthetically using the different GANs, specifically assessing the ability of these methods to create distinct identities that exhibit some intra-class variation. For this, RaSGAN, CITGAN, StarGAN-v2, Stylegan-3 and iWarpGAN are trained using the real irides in the training sets of the CASIA-Iris-Thousand, CASIA Cross Sensor Iris and IITD-iris datasets, separately.

Unique-Experiment-1: This experiment focuses on the uniqueness of the synthetically generated iris datasets, produced using the various GAN techniques, with respect to the training examples. To achieve this, the analysis compares the impostor and genuine distributions of the real irides that were part of the training set of each GAN technique with those of the synthetically generated irides. The VeriEye matcher is utilized to obtain the similarity score between two iris images, with scores spanning from 0 to 1557, where a higher score indicates a better match.

Unique-Experiment-2: This experiment investigates the uniqueness and intra-class variability present within the synthetically generated iris dataset. This involves an analysis of both the genuine and impostor distributions within the generated dataset and their comparison with the distributions from the real iris datasets. As previously noted, this investigation is conducted across a range of uniquely generated identities to assess their uniqueness. The VeriEye matcher is again used to assess the similarity score between pairs of iris images.

Analysis: The above experiments replicate the experimental protocols defined in [159] to evaluate the inter- and intra-class variations in the generated iris datasets; they also analyze the distinctness of the generated samples from the real irides in the training datasets. As depicted in Figures A.3, A.4 and A.5, the iris images produced by iWarpGAN do not exhibit a high degree of resemblance to the real irides from the training set, unlike the outputs of the other GAN methods. This indicates iWarpGAN's ability to generate iris patterns with identities that diverge from those found in the training set. Moreover, by examining the overlap between the impostor distribution of the synthetically generated iris images and that of the real iris images, it becomes evident that the generated identities are distinct from one another. Hence, we conclude that iWarpGAN has the capability to generate fully synthetic iris images with identities that are distinct from the irides in the training set, while the other GANs can only generate partially synthetic irides.

Figure A.3: Uniqueness of iris images generated using iWarpGAN when the GANs are trained using the CASIA-Iris-Thousand dataset. The y-axis represents the similarity scores obtained using VeriEye. Here, R=Real, S=Synthetic, Gen=Genuine and Imp=Impostor.

Figure A.4: Uniqueness of iris images generated using iWarpGAN when the GANs are trained using the CASIA-CSIR dataset. The y-axis represents the similarity scores obtained using VeriEye. Here, R=Real, S=Synthetic, Gen=Genuine and Imp=Impostor.

Figure A.5: Uniqueness of iris images generated using iWarpGAN when the GANs are trained using the IITD-iris dataset. The y-axis represents the similarity scores obtained using VeriEye. Here, R=Real, S=Synthetic, Gen=Genuine and Imp=Impostor.
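Both uniqueness experiments reduce to comparing genuine (same identity) and impostor (different identity) score distributions. A compact sketch of how such distributions can be assembled from pairwise comparisons is given below; the match() callable is a stand-in for the VeriEye matcher (scores in [0, 1557]) and is an assumption, not an actual VeriEye API.

from itertools import combinations

def genuine_impostor_scores(labeled_images, match):
    """Build genuine (same identity) and impostor (different identity) score lists.

    labeled_images: list of (image, identity_label) pairs.
    match: any callable returning a similarity score for two images; here it is a
    stand-in for the VeriEye matcher, whose scores range from 0 to 1557."""
    genuine, impostor = [], []
    for (img_a, id_a), (img_b, id_b) in combinations(labeled_images, 2):
        score = match(img_a, img_b)
        (genuine if id_a == id_b else impostor).append(score)
    return genuine, impostor

# Unique-Experiment-1 compares the synthetic-versus-training-set scores against the
# real genuine/impostor distributions; Unique-Experiment-2 applies the same analysis
# within the synthetic set to measure its inter- and intra-class variation.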
Table A.1: PAD-Experiment-0: True Detection Rate (TDR, in %) at three different False Detection Rates (FDR) for different iris PAD methods in the baseline experiment (PAD-Experiment-0), where the PA detectors are trained using real bonafide and PA samples. The PA samples used in this experiment are imbalanced across the different PA classes [158].

                     BSIF+SVM [41]   Fine-Tuned VGG-16 [53]   Fine-Tuned AlexNet [84]   D-NetPAD [128]
TDR (@ 0.1% FDR)          3.32               85.25                    86.10                 87.94
TDR (@ 0.2% FDR)          6.15               83.86                    87.29                 88.91
TDR (@ 1.0% FDR)         28.11               89.07                    90.51                 92.54

Table A.2: PAD-Experiment-1: True Detection Rate (TDR, in %) at 1% False Detection Rate (FDR) for different iris PAD methods when trained using real bonafide irides, real PAs and synthetic PAs generated using different GAN methods.

                     BSIF+SVM [41]   Fine-Tuned VGG-16 [53]   Fine-Tuned AlexNet [84]   D-NetPAD [128]
RaSGAN                   28.31               81.11                    86.74                 87.97
CIT-GAN                  29.43               85.81                    88.37                 88.86
StarGAN-v2               29.73               83.47                    88.71                 88.45
Stylegan-3               34.05               88.14                    90.95                 90.75
iWarpGAN                 32.09               86.58                    89.18                 90.28

Table A.3: PAD-Experiment-2: True Detection Rate (TDR, in %) at 1% False Detection Rate (FDR) for different iris PAD methods when they are trained using real bonafide irides alongside a balanced collection of PA samples drawn from both real and synthetic irides.

                     BSIF+SVM [41]   Fine-Tuned VGG-16 [53]   Fine-Tuned AlexNet [84]   D-NetPAD [128]
RaSGAN                   50.52               88.62                    88.15                 94.90
CIT-GAN                  51.11               91.60                    92.70                 97.89
StarGAN-v2               50.69               92.39                    88.38                 95.59
Stylegan-3               56.14               95.02                    93.38                 98.39
iWarpGAN                 55.25               93.69                    92.99                 98.20

Utility of Synthetically Generated Irides

The experiments in this section evaluate the usefulness of the synthetically generated irides for presentation attack detection as well as iris recognition.

Presentation Attack Detection

As discussed earlier, the lack of a sufficient number and variety of PA samples can affect the generalizability of PA detection methods, especially PAD methods based on deep networks that need a large number of training samples for good performance. Therefore, we outline the experimental frameworks established to assess the utility of the synthetically generated iris PAs produced by the different GAN methods. To generate the synthetic data (for both bonafide samples and PAs), the GANs are trained using 14,970 bonafide iris images, 6,016 printed eyes, 4,014 cosmetic contact lenses and 276 artificial eyes from the iris PA datasets mentioned earlier. Note that CIT-GAN, StarGAN-v2, Stylegan-3 and iWarpGAN can perform style transfer from one domain to another, i.e., image generation with these GANs extends to multiple domains/styles. RaSGAN, however, supports only single-domain image generation; therefore, multiple RaSGAN networks are trained in order to generate bonafide images as well as the different types of PAs. We analyzed the efficacy of several iris presentation attack detection (PAD) techniques, including VGG-16 [53], BSIF [41], D-NetPAD [128] and AlexNet [84], within these experimental configurations. It is noteworthy that D-NetPAD has been recognized as one of the top-performing PAD algorithms, particularly for its performance in the iris liveness detection challenge (LivDet-20 edition).

PAD-Experiment-0: In this baseline experiment, we report the performance of the different iris PAD methods on the iris PA datasets mentioned in the Datasets section. The PAD methods are trained with 14,970 bonafide iris images and 10,306 PA instances, which include 4,014 cosmetic contact lenses, 276 artificial eyes and 6,016 printed eyes. For testing, the dataset contains 9,439 bonafide iris images alongside 9,896 PA instances, which break down into 2,720 cosmetic contact lenses, 404 artificial eyes and 6,772 printed eyes.

PAD-Experiment-1: Here, we aim to evaluate the realism of the synthetically generated PAs against real PAs by training the iris PA detection methods with 14,970 bonafide samples and synthetically generated PAs: 6,016 printed eyes, 276 artificial eyes and 4,014 cosmetic contact lenses. For testing, the set contains 9,439 bonafide iris images alongside 9,896 PA instances, consisting of 2,720 cosmetic contact lenses, 404 artificial eyes and 6,772 printed eyes.
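The PAD results in Tables A.1-A.3 are reported as the True Detection Rate (TDR) at a fixed False Detection Rate (FDR). The following is a minimal sketch of how one such operating point can be computed from PA-detection scores; it assumes higher scores indicate a PA and is an illustrative calculation, not the exact evaluation script used here.

import numpy as np

def tdr_at_fdr(bonafide_scores, pa_scores, target_fdr=0.01):
    """TDR at a fixed FDR for a PA detector whose scores are higher for PAs.

    FDR: fraction of bonafide samples incorrectly flagged as PAs.
    TDR: fraction of PA samples correctly flagged as PAs."""
    bonafide_scores = np.asarray(bonafide_scores)
    # Choose the threshold so that only target_fdr of bonafide scores exceed it.
    threshold = np.quantile(bonafide_scores, 1.0 - target_fdr)
    tdr = float((np.asarray(pa_scores) > threshold).mean())
    return tdr, float(threshold)

# Example: tdr, thr = tdr_at_fdr(bonafide_scores, pa_scores, target_fdr=0.01)  # TDR @ 1% FDR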
PAD-Experiment-2: Here, we aim to evaluate the usefulness of the generated PA samples for balanced training, where under-represented PA classes are augmented with synthetically generated PAs. In this experiment, the PAD methods are trained using 14,970 bonafide irides alongside a balanced collection of 15,000 PA samples. This collection comprises 276 real artificial eyes, 4,014 real cosmetic contact lenses and 5,000 real printed eyes, supplemented with 4,724 synthetic artificial eyes and 986 synthetic cosmetic contact lenses. As in the previous experiment, testing is done on 9,439 bonafide iris images alongside 9,896 PA instances, consisting of 2,720 cosmetic contact lenses, 404 artificial eyes and 6,772 printed eyes.

Analysis: The variation in the number of samples across the different PA categories influences the effectiveness of the PAD techniques. This impact is evident when the outcomes of PAD-Experiment-0 are compared with those of PAD-Experiments 1 and 2. In PAD-Experiment-2, the PAD methods are trained with 14,970 bonafide samples and an equalized set of PA samples (namely, 5,000 from each PA category), including both real and synthesized PAs. Based on the data in Table A.1 and Table A.3, there is a discernible enhancement in the performance of each PAD approach when trained with balanced samples from each class. Moreover, the comparison of synthetic and real PA samples was conducted through PAD-Experiment-1, in which the real PA samples in the training set were substituted with synthetic PAs. When comparing the performance metrics in Table A.1 and Table A.2, only a marginal discrepancy in PAD efficacy is noticeable.

Iris Recognition

As mentioned earlier, the lack of a sufficient number of unique identities with intra-class variations in a dataset can affect the training and testing of many iris recognition methods, especially recognition methods based on deep networks that need a large number of training samples for good performance. Therefore, we outline the experimental frameworks established to assess the utility of the synthetically generated irides (with both inter- and intra-class variations) for iris recognition. As seen from the previous experiments, among the GANs studied in this research, only iWarpGAN has the capability of generating irides whose identities are distinct from the training data. We therefore train iWarpGAN with the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets, separately, to generate synthetic irides with inter- and intra-class variations. The generated dataset is utilized in this experiment to evaluate its usefulness for improved iris recognition.

Recog-Experiment-0: In this baseline experiment, EfficientNet [66], ResNet-101 [101] and DenseNet-201 are trained using a triplet training approach. Training and testing follow a cross-dataset protocol, i.e., when the models are trained using real irides from CASIA-Iris-Thousand and CASIA-CSIR, testing is done on the IITD-iris dataset.

Recog-Experiment-1: This experiment assesses the impact of the synthetic iris dataset on the performance of deep learning based iris recognition methods. In this context, EfficientNet, ResNet-101 and DenseNet-201 are trained not only with the real irides from the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets but also with a synthetically generated iris dataset derived from iWarpGAN.
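Both recognition experiments train the backbones with a triplet objective, in which an anchor and a positive image share an identity while a negative image does not. The sketch below shows one such training step using PyTorch's built-in triplet margin loss; the backbone, embedding size, margin and learning rate are placeholder assumptions standing in for the EfficientNet / ResNet-101 / DenseNet-201 configurations described above.

import torch
import torch.nn as nn
import torchvision.models as models

# Placeholder backbone: EfficientNet, ResNet-101 or DenseNet-201 could be substituted;
# ResNet-18 is used here only to keep the sketch small.
backbone = models.resnet18(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, 256)   # 256-D embedding (assumed size)

triplet_loss = nn.TripletMarginLoss(margin=0.2)          # margin value is an assumption
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def training_step(anchor, positive, negative):
    """One triplet step: anchor and positive share an identity, negative does not."""
    backbone.train()
    optimizer.zero_grad()
    loss = triplet_loss(backbone(anchor), backbone(positive), backbone(negative))
    loss.backward()
    optimizer.step()
    return loss.item()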
Analysis: Figures A.6 and A.7 illustrate that the efficacy of the deep learning based iris recognition models improves with the incorporation of a larger training set. This enhancement is particularly evident when the models are trained using a combination of both real iris images and those synthetically produced by iWarpGAN. Although the baseline performance of ResNet-101 and EfficientNet is somewhat modest, both exhibit a substantial improvement in performance; similar behavior is seen for DenseNet-201.

Figure A.6: Performance of DenseNet-201, EfficientNet and ResNet-101 in the cross-dataset evaluation scenario, i.e., trained using the CASIA-Iris-Thousand and CASIA-CSIR datasets and tested on the IITD-iris dataset. An improvement in performance is seen when the size of the training set is increased using synthetic irides.

Figure A.7: Performance of DenseNet-201, EfficientNet and ResNet-101 in the cross-dataset evaluation scenario, i.e., trained using the CASIA-Iris-Thousand and IITD-iris datasets and tested on the CASIA-CSIR dataset. An improvement in performance is seen when the size of the training set is increased using synthetic irides.

Summary & Future Work

In this section, we summarize the studies presented in this research and discuss the future scope of synthetic iris generation.

Current Techniques & their Limitations

In this research, we studied different GAN methods for generating synthetic irides, both bonafide and different types of presentation attacks. The generated irides were evaluated for their realism, quality, uniqueness with respect to the training dataset, and utility. Using these experiments as our criteria for comparison, we conclude that: (1) GAN methods such as RaSGAN, StarGAN-v2 and CITGAN can generate partially synthetic datasets, but fail to generate enough samples that are unique from the training dataset, i.e., the generated dataset has high similarity with the training dataset and also within itself. Similar behavior was seen for Stylegan-3; however, the images generated by Stylegan-3 are highly realistic and very close to the original dataset in terms of quality (as seen in Figure A.2). (2) On the other hand, iWarpGAN showed the capability of generating fully synthetic irides with both inter- and intra-class variations, which can help augment limited iris datasets for training and testing iris recognition methods. Since this method is scalable to multiple domains (using an attribute vector), it can also be utilized to generate partially synthetic irides from various domains, i.e., bonafide samples and various PAs, which can be used to enhance the performance of different PAD methods (as shown in the PAD experiments above). While this method provides solutions for both fully and partially synthetic iris generation, iWarpGAN relies on image transformation, whereby the network requires both an input image and a style reference image to modify the identity and style and produce an output image. Such a process could constrain the range of features that iWarpGAN is able to explore. Also, some similarity is still observed between the training samples and the generated irides.

Future Work & Scope

Numerous researchers are dedicating their efforts to the creation of synthetic face images encompassing varied attributes, styles, identities, spectra, and more. However, very little work has been done in the field of synthetic iris generation. This opens up a wide array of opportunities that warrant in-depth investigation and exploration.
Some of the possible directions for future work in this field are listed as follows:

• Generalizable Solution for Fully Synthetic Iris Images: The first area of exploration involves developing a generalizable solution for creating fully synthetic iris images. This involves not just replicating the physical appearance of an iris but also ensuring that the synthetic images can adapt or respond to different lighting conditions and camera specifications, just as a real iris would. Such a solution would have significant implications for enhancing the realism and applicability of synthetic irides in various fields, including iris recognition and presentation attack detection.

• Generating Fully Synthetic Ocular Images: Another intriguing direction is the generation of complete ocular images, which include not only the iris but also other parts of the eye. So far, research in this field has mainly focused on generating cropped iris images, and in some cases the image quality deteriorates as more information is introduced into the image [157]. Therefore, this area needs attention from researchers in order to study the other distinguishing features of the eye apart from the iris. Creating realistic ocular images that accurately represent the myriad variations in human eyes could also aid in the development of more robust facial recognition technologies by providing a method to generate faces that capture the intricate details of a real iris, which is missing from most face generation methods.

• Synthetic Iris Videos to Mimic Liveness of Real Irides: The creation of synthetic iris videos that can mimic the liveness of real irides is a particularly challenging yet rewarding prospect. Such advancements would be beneficial in developing more robust PA detection methods. By simulating the natural movements and minute dynamic changes of the iris, these videos could provide an authentic and effective tool for training and improving liveness detection algorithms in iris recognition systems. As mentioned earlier, this could also aid in developing robust facial recognition technologies.

• Multi-spectrum Iris Image Generation: The generation of multi-spectrum iris images presents another frontier. The human iris exhibits different characteristics under various light spectra, a feature that is often leveraged in biometric systems. Developing synthetic iris images that can accurately reflect these multi-spectral properties would not only enhance the realism of these images but also expand their utility in biometric recognition systems.
Such multi-spectrum images could serve as a valuable resource for researchers and developers, offering a versatile tool for testing and improving multi-spectral iris recognition technologies. The potential applications of successfully generated synthetic iris images are vast and varied. In security and biometric recognition systems, these images can help improve the accuracy and robustness of systems by providing a diverse range of data for training and testing. In the medical field, synthetic iris images could be used for training purposes, enabling medical professionals to recognize and diagnose eye-related diseases more effectively. Furthermore, in the realm of entertainment and virtual reality, realistic synthetic iris images could enhance the visual experience by providing more lifelike and expressive characters. The ability to generate eyes that accurately mimic human emotions could revolutionize the way we interact with virtual environments and characters. In conclusion, while the generation of realistic and unique synthetic iris images is still in the development stage, it presents an opportunity for research and exploration.