SYNTHESIZING IRIS AND OCULAR IMAGES USING ADVERSARIAL NETWORKS AND DIFFUSION MODELS

By

Shivangi Yadav

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computer Science – Doctor of Philosophy

2024

ABSTRACT

Synthetic biometric data – such as fingerprints, face, iris and speech – can overcome some of the limitations associated with the use of real data in biometric systems. The focus of this work is on the iris biometric. Current methods for generating synthetic irides and ocular images have limitations in terms of quality, realism, intra-class diversity and uniqueness. Different methods are proposed in this thesis to overcome these issues while evaluating the utility of synthetic data for two biometric tasks: iris matching and presentation attack (PA) detection.

Two types of synthetic iris images are generated: (1) partially synthetic and (2) fully synthetic. The goal of "partial synthesis" is to introduce controlled variations in real data. This can be particularly useful in scenarios where real data are limited, imbalanced, or lack specific variations. We present three different techniques to generate partially synthetic iris data: one that leverages the classical Relativistic Average Standard Generative Adversarial Network (RaSGAN), a novel Cyclic Image Translation Generative Adversarial Network (CIT-GAN) and a novel Multi-domain Image Translative Diffusion StyleGAN (MID-StyleGAN). While RaSGAN can generate realistic-looking iris images, this method is not scalable to multiple domains (such as generating different types of PAs). To overcome this limitation, we propose CIT-GAN, which generates iris images using multi-domain style transfer. To further address the issue of quality imbalance across different domains, we develop MID-StyleGAN, which exploits the stable and superior generative power of a diffusion-based StyleGAN.

The goal of "full synthesis" is to generate iris images with both inter- and intra-class variations. In this regard, we propose two novel architectures, viz., iWarpGAN and IT-diffGAN. The proposed iWarpGAN focuses on generating iris images that are different from the identities in the training data using two transformation pathways: (1) Identity Transformation and (2) Style Transformation. On the other hand, Image Translative Diffusion-GAN (IT-diffGAN) projects input images onto the latent space of a diffusion GAN, identifying and manipulating the features most relevant to identity and style. By adjusting these features in the latent space, IT-diffGAN generates new identities while preserving image realism.

A number of experiments are conducted using multiple iris and ocular datasets in order to evaluate the quality, realism, uniqueness, and utility of the synthetic images generated using the aforementioned techniques. An extensive analysis conveys the benefits and the limitations of each technique. In summary, this thesis advances the state of the art in iris and ocular synthesis by leveraging the prowess of GANs and Diffusion Models.

ACKNOWLEDGMENTS

I am deeply grateful to my doctoral advisor and mentor, Professor Arun Ross, for his exceptional guidance and support throughout my academic journey. His profound expertise, insightful feedback, and unwavering commitment to excellence have been pivotal in shaping both my research and my growth as a scholar.
Professor Ross has consistently emphasized the importance of not only understanding the technical aspects of research but also probing deeper into the reasoning behind each method. His mentorship has provided me with countless opportunities to expand my knowledge and skills through conferences, workshops, and other academic events, for which I am sincerely thankful. I would also like to thank Professors Xiaoming Liu, Selin Aviyente, and Kristen Johnson for serving on my committee and offering their expert advice on my research. I am thankful to Professor Sandeep Kulkarni, Associate Chair of Graduate Studies, for their guidance and support that helped me stay motivated throughout my degree. I am also deeply grateful to my Master's advisors, Professor Mayank Vatsa and Professor Richa Singh, whose encouragement led me to pursue my doctoral studies at Michigan State University.

I have been fortunate to have the support of incredible lab mates and colleagues throughout this journey. Their patience, critiques, and guidance have been essential in helping me refine and defend my research. They have celebrated my successes and provided comfort during setbacks. I am also grateful to Vincent Mattison and Brenda Hodge for their unwavering support.

Lastly, I would like to express my deepest gratitude to my partner, Aman Chahar, for his unwavering support throughout my PhD journey. His constant encouragement and understanding have been a source of strength, helping me stay focused and motivated even during the most challenging times. Aman's belief in my abilities and his patience through the ups and downs of this journey have been invaluable. I am incredibly fortunate to have him by my side, and I couldn't have come this far without his love and support.

TABLE OF CONTENTS

CHAPTER 1  INTRODUCTION
  1.1 Biometric Systems
  1.2 Biometric Datasets
  1.3 Synthetic Biometric Data – Face, Fingerprint & Iris
    1.3.1 Synthetic Biometrics
    1.3.2 Applications of Synthetic Biometric Data
    1.3.3 Methods to Generate Synthetic Biometrics
      1.3.3.1 Pre-CNN Methods
      1.3.3.2 Post-CNN Methods
    1.3.4 Synthetic Iris Images
  1.4 Our Contribution

CHAPTER 2  GENERATING PARTIALLY SYNTHETIC IRIS IMAGES FOR ENHANCED PRESENTATION ATTACK DETECTION
  2.1 Introduction
  2.2 Background
    2.2.1 Standard Generative Adversarial Networks (SGANs)
    2.2.2 Relativistic Standard Generative Adversarial Networks (RSGANs)
    2.2.3 Relativistic Average Standard Generative Adversarial Networks (RaSGANs)
    2.2.4 Fréchet Inception Distance Score
  2.3 Proposed Method
    2.3.1 Synthesizing Irides using RaSGAN
    2.3.2 Relativistic Discriminator - A One-class Presentation Attack Detection Method (RD-PAD)
      2.3.2.1 Method-I: RD-PAD Trained with Bonafide Samples Only
      2.3.2.2 Method-II: RD-PAD Fine-tuned with Some PA Samples
  2.4 Experimental Protocols
    2.4.1 Dataset Used
    2.4.2 Image Realism Assessment
    2.4.3 Applications of RaSGAN based Iris images
      2.4.3.1 Baseline on Current PAD Algorithms
      2.4.3.2 Synthetic Iris as Bonafide Sample
      2.4.3.3 Synthetic Iris as Presentation Attack Sample
    2.4.4 RD-PAD for Seen and Unseen Presentation Attack Detection
      2.4.4.1 Seen Presentation Attacks
      2.4.4.2 Unseen Attack: Cosmetic Contact Lenses and Kindle Display
      2.4.4.3 Unseen Attack: Printed Eyes and Artificial Eyes
      2.4.4.4 Analysis
  2.5 Summary

CHAPTER 3  CYCLIC IMAGE TRANSLATION GENERATIVE ADVERSARIAL NETWORK (CIT-GAN)
  3.1 Introduction
  3.2 Proposed Method
    3.2.1 Generative Adversarial Network
    3.2.2 Styling Network
    3.2.3 Cycle Consistency
  3.3 Experimental Protocols
    3.3.1 Datasets Used
    3.3.2 Image Realism Assessment
    3.3.3 Utility of Synthetically Generated Images
  3.4 Results & Analysis
  3.5 Summary

CHAPTER 4  IWARPGAN: DISENTANGLING IDENTITY AND STYLE TO GENERATE SYNTHETIC IRIS IMAGES
  4.1 Introduction
  4.2 Background
  4.3 Proposed Method
    4.3.1 Disentangling Identity and Style to Generate New Iris Identities
  4.4 Datasets Used
  4.5 Experimental Protocols
    4.5.1 Experiment-1: Quality of Generated Images
    4.5.2 Experiment-2: Uniqueness of Generated Images
    4.5.3 Experiment-3: Utility of Synthetic Images
  4.6 Summary & Future Work

CHAPTER 5  IT-DIFFGAN: IMAGE TRANSLATIVE DIFFUSION GAN TO GENERATE SYNTHETIC IRIS IMAGES
  5.1 Introduction
  5.2 Background
    5.2.1 Generative Adversarial Networks (GANs)
      5.2.1.1 Standard Generative Adversarial Networks (SGANs)
      5.2.1.2 StyleGAN and its Latent Space
    5.2.2 Diffusion-GANs
  5.3 Proposed Method
    5.3.1 Mapping Image Input to Latent Space
    5.3.2 Identity & Style Disentanglement
    5.3.3 Manipulating Style & Identity in Latent Space
      5.3.3.1 Style Transfer
      5.3.3.2 Identity Transformation
    5.3.4 Datasets Utilized
  5.4 Experiments & Analysis
    5.4.1 Experiment 1: Test of Realism
    5.4.2 Experiment 2: Test of Uniqueness
    5.4.3 Experiment 3: Utility of Generated Iris Images
  5.5 Conclusion

CHAPTER 6  MULTI-DOMAIN IMAGE TRANSLATIVE DIFFUSION STYLEGAN WITH APPLICATION IN IRIS PRESENTATION ATTACK DETECTION
  6.1 Introduction
  6.2 Background
    6.2.1 Generative Adversarial Networks (GANs)
    6.2.2 Diffusion based GANs
  6.3 Proposed Method
  6.4 Experiments & Analysis
    6.4.1 Datasets & PA Detection Methods Used
    6.4.2 Realism Assessment
    6.4.3 Utility of Generated Dataset
      6.4.3.1 Baseline Experiment-0
      6.4.3.2 Utility Experiment-1
    6.4.4 Ablation Study
      6.4.4.1 Studying Components of Proposed Method
      6.4.4.2 Capacitive Study
  6.5 Conclusion and Future Work

CHAPTER 7  SUMMARY AND FUTURE WORK

BIBLIOGRAPHY

APPENDIX

CHAPTER 1

INTRODUCTION

Throughout human history, the capacity to differentiate individuals uniquely and to associate personal attributes, like name and nationality, with a person has played an essential role in human society. For example, people have traditionally used physical traits such as facial features, gait, speech, and surroundings as cues to recognize each other. Human abilities for identity management may suffice within small communities, but the limitations of the human brain [69], combined with the exponential growth of the global population, have necessitated the development of sophisticated systems to efficiently handle the task of managing many identities. This led to the emergence of biometrics, the science of recognizing individuals at a large scale based on their unique physical or behavioral characteristics [69]. This discipline encompasses the study and application of automated methods for identifying or verifying an individual's identity by analyzing their distinctive physical or behavioral traits. Unlike traditional identification methods such as ID cards or passwords, which can be lost, stolen, or forgotten, biometrics offers a highly reliable and convenient means of person recognition. Thus, biometrics enables accurate and secure person recognition through the utilization of intrinsic and distinctive human traits [69, 71]. Physical traits include, but are not limited to, fingerprints [96], face [70], iris [17] and hand geometry [133] (see Figure 1.1). Some examples of primarily behavioral traits include gait [118], speech [13], signature [48], and keystroke dynamics [139]. This research will primarily emphasize the iris modality as a biometric trait.

"Biometrics is the most human-centric technology of our time."
- John Mears, former Director of Biometric Standards and Testing at NIST, Biometrics: Personal Identification in Networked Society, 1999.

"Biometrics provides a bridge between our physical existence and our digital identity."
- Mark Lockie, Biometrics Institute, 2019.

Figure 1.1: Examples of physical biometric traits such as fingerprints [4], face in the visible spectrum and iris in the NIR spectrum [5].

1.1 Biometric Systems

Biometric systems have become an integral part of contemporary technology, revolutionizing the way individuals are recognized. These systems capitalize on unique physical or behavioral characteristics to establish a person's identity with a high degree of accuracy. A biometric system comprises two distinct subsystems: enrollment and recognition [15, 71]. In the enrollment subsystem, a sensor is utilized to capture raw biometric data from an individual. The captured data undergoes a quality check to ensure its suitability for further processing. A feature extractor then extracts relevant information from the data, which is securely stored in a database along with a unique ID assigned to the individual during the data acquisition stage. If the captured data fails to meet the required quality standards, it is discarded, prompting the enrollment process to restart.

Figure 1.2: Illustration of the key components of biometric systems, including data acquisition, quality assessment, feature extraction and matching, highlighting their roles in recognition.

Figure 1.3: Illustration of verification and identification modes. Verification mode is used to verify the claimed identity of an individual, i.e., a biometric sample provided by the individual is compared to a pre-stored template associated with their claimed identity. The objective is to determine if the sample and template correspond to the same identity. On the other hand, during the identification mode, the biometric sample provided by the individual is compared against all available templates in the system's database. Similarity scores are computed between the sample and each template to find the closest match.

The recognition subsystem can be further divided into two different recognition modes (as shown in Figure 1.3) [15, 69]:

• Verification: Also known as one-to-one matching, this mode is employed to verify the claimed identity of an individual. During verification, a biometric sample provided by the individual is compared to a pre-stored template associated with their claimed identity. The objective is to determine if the sample and template correspond to the same identity. In verification, the performance evaluation involves the use of genuine and impostor pairs. Genuine pairs consist of a biometric sample used for enrollment (stored template) and the sample presented during verification. The system's task is to correctly match these pairs, thereby confirming the claimed identity. Genuine pairs represent successful matches, resulting in positive verification outcomes. Conversely, impostor pairs consist of a biometric sample from an individual attempting to impersonate someone else and the stored template associated with the target individual. The system aims to detect the mismatch between the samples and reject the impostor's claim. Impostor pairs represent failed attempts to deceive the system, resulting in negative verification outcomes. To evaluate the performance of a verification system, metrics such as False Acceptance Rate (FAR) and False Rejection Rate (FRR) are utilized. FAR quantifies the rate at which impostor pairs are incorrectly accepted as genuine matches, highlighting security vulnerabilities. FRR measures the rate at which genuine pairs are incorrectly rejected as impostor matches, indicating the system's inability to correctly identify individuals.
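Both error rates depend on the threshold applied to the match score. Written compactly, and using generic notation that is not tied to any specific matcher in this thesis (similarity score $s$, decision threshold $\tau$), the two rates are:

\[
\mathrm{FAR}(\tau) = P\big(s \geq \tau \mid \text{impostor pair}\big), \qquad
\mathrm{FRR}(\tau) = P\big(s < \tau \mid \text{genuine pair}\big).
\]

Raising $\tau$ lowers the FAR at the expense of a higher FRR, and vice versa; reported operating points are therefore threshold-dependent.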
• Identification: During the identification mode, the biometric sample provided by the individual is compared against all available templates in the system's database. Similarity scores are computed between the sample and each template to find the closest match. The template associated with the highest similarity score is considered the identification result, indicating the likely identity of the individual. Identification can further be classified into positive and negative identification. Positive identification refers to a scenario where an individual's provided biometric sample successfully matches a template in the database with a high similarity score. In this case, the system accurately identifies the individual and confirms their true identity. Positive identification is achieved when the system correctly recognizes the person among the potential candidates, providing a reliable match. On the other hand, negative identification occurs when the individual's biometric sample does not match any template in the database, or the similarity scores fall below a predefined threshold. In such cases, the system fails to identify the individual or determine their true identity. Negative identification indicates that the given person's biometric data is not present in the database or the corresponding biometric sample does not sufficiently match any of the stored templates. It is important to note that the concepts of positive and negative identification help assess the effectiveness and accuracy of biometric identification systems in correctly identifying individuals from a pool of candidates.
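The two modes can be summarized with a short sketch. The code below is purely illustrative: the function names, the cosine-similarity matcher, and the threshold value are assumptions made for this example, not components of the systems described in this thesis.

```python
from __future__ import annotations

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two feature vectors (higher means more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(probe: np.ndarray, claimed_template: np.ndarray, threshold: float = 0.8) -> bool:
    """One-to-one matching: accept the claimed identity if the score clears the threshold."""
    return cosine_similarity(probe, claimed_template) >= threshold


def identify(probe: np.ndarray, gallery: dict[str, np.ndarray], threshold: float = 0.8) -> str | None:
    """One-to-many matching: return the best-matching identity, or None (negative identification)."""
    scores = {identity: cosine_similarity(probe, template) for identity, template in gallery.items()}
    best_identity = max(scores, key=scores.get)
    return best_identity if scores[best_identity] >= threshold else None
```

In practice the feature vectors would come from a feature extractor (e.g., an iris encoder), and the threshold would be chosen to meet a target FAR/FRR operating point.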
1.2 Biometric Datasets

A biometric dataset refers to a collection of data that consists of biometric samples or measurements obtained from individuals for the purpose of developing, testing, and evaluating biometric recognition systems. Biometric data is derived from the unique physical or behavioral characteristics of individuals, such as fingerprints, iris, face, speech, etc. The collection of a biometric dataset involves several steps [28, 69]. First, a target population or sample group is identified, which represents the intended user base of the biometric system. This population can vary based on the application, such as employees in an organization, individuals at border control points, or patients in a healthcare setting. Next, the individuals within the target population are enrolled in the biometric dataset. During the enrollment process, biometric samples are collected using specific sensors or devices designed for each biometric modality. For example, fingerprint scanners, iris cameras, or voice recording devices may be used to capture the respective biometric traits. Details of some publicly available iris datasets are given in Table 1.1, which lists iris datasets collected using various sensors in different spectra. Note that most of these datasets contain frontal-view images collected in a controlled environment. Biometric datasets play a crucial role in training algorithms, testing system performance, and conducting research in the field of biometrics.

However, the collection of biometric datasets involves various challenges that can lead to shortcomings [15, 28, 69, 145]:

• Informed Consent and Legal Issues: Collecting biometric data requires obtaining informed consent from individuals. Consent involves providing clear and understandable information about the purpose, scope, and potential risks associated with the data collection. Compliance with legal requirements, such as data protection laws and regulations specific to biometric data, is essential to ensure that the collection and use of biometric datasets are conducted lawfully and ethically.

• Privacy Concerns: Biometric data is highly personal and sensitive since it represents unique physical or behavioral characteristics of individuals. Collecting and storing such data must comply with privacy regulations and ensure the protection of individuals' privacy rights. Proper measures such as data encryption, secure storage, and controlled access to the dataset should be implemented.

• Inter-class variations: This refers to the differences or dissimilarities between individuals in a dataset. The importance of inter-class variation lies in its ability to capture the diversity present within the target population. Biometric systems are designed to recognize and differentiate individuals based on their unique traits. By including samples from individuals with diverse characteristics such as demographics, ethnicities, ages, and genders, the inter-class variation in the dataset ensures that the system can generalize well and perform accurately across a wide range of individuals. It helps to avoid bias and ensures fairness in the recognition process.

• Intra-class variations: This represents the natural variations that occur within an individual's biometric samples. No two samples of the same biometric trait from a person are identical due to factors such as environmental conditions, sensor variations, or changes in the presentation of the biometric trait. Understanding and modeling this variation is crucial for handling the inherent uncertainties and variations in real-world scenarios. By capturing and incorporating intra-class variation in the dataset, biometric systems can be trained to be more robust and tolerant to these natural variations, leading to improved recognition performance.

• Sample Size: The size of the biometric dataset is a critical factor in determining the effectiveness of biometric systems. A larger dataset provides a more representative and comprehensive coverage of the target population, enabling better algorithm training, evaluation, and testing. Increasing the dataset size reduces the risk of overfitting, where the system becomes too specialized to the limited samples in the dataset, resulting in poor performance on new samples.

• Data Quality: The quality of biometric samples within the dataset is essential for achieving reliable recognition performance. Factors such as the quality of the sensors used for data capture, environmental conditions during data collection (e.g., lighting, noise), and variations in sample presentation can impact the quality of the dataset and subsequently influence system performance. Quality control measures, including sensor calibration, data pre-processing techniques, and rigorous data validation, should be employed to enhance the quality of the collected biometric dataset.
• Annotation and Ground Truth: Biometric datasets often require annotation or ground truth labels, indicating the correct identity associated with each sample. This process can be labor-intensive and subjective, as it involves human judgment to determine the correct identities. The accuracy and reliability of the ground truth significantly influence the performance evaluation of biometric systems, making proper annotation a crucial aspect of dataset collection.

• Longitudinal Data and Aging: Longitudinal biometric datasets are collected over time from the same individuals to study the effects of aging on biometric traits. Collecting and maintaining longitudinal datasets present challenges in terms of tracking individuals over extended periods, managing data consistency and integrity, and addressing variations in biometric traits due to aging or other factors. Longitudinal data enables the development of age-invariant recognition algorithms and provides insights into the long-term performance and stability of biometric systems.

These challenges can lead to shortcomings that hamper the development and evaluation of reliable biometric systems; for example, the datasets used for training and evaluating such systems often lack an adequate number of samples and fail to encompass the full range of inter- and intra-class variations (see Table 1.1).

Table 1.1: Iris datasets captured in different spectra, such as visible (VIS) and near-infrared (NIR), using various sensors. The number of subjects is given in parentheses.

Dataset Name | Sensor(s) | Spectrum | Images (Subjects)
CASIA-Iris-Thousand [5] | Iris scanner (Irisking IKEMB-100) | NIR | 20,000 (1000)
CASIA-Iris-Interval [5] | CASIA close-up iris camera | NIR | 2,639 (249)
CASIA-Iris-Lamp [5] | OKI IRISPASS-h | NIR | 16,212 (411)
CASIA-Iris-Twins [5] | OKI IRISPASS-h | NIR | 3,183 (200)
CASIA-Iris-Distance [5] | CASIA long-range iris camera | NIR | 2,567 (142)
ICE-2005 [1] | LG2200 | NIR | 2,953 (132)
ICE-2006 [2] | LG2200 | NIR | 59,558 (240)
IIITD-CLI [80] | Cogent and VistaFA2E single iris sensor | NIR | 6,570 (240)
UBIRIS v1 [113] | Nikon E5700 | VIS | 1,877 (241)
UBIRIS v2 [112] | Canon EOS 5D | VIS | 11,102 (261)
MILES [43] | MILES camera | VIS | 832 (50)
MICHE DB [167] | iPhone 5, Samsung Galaxy (IV + Tablet II) | VIS | 3,732 (184)
CSIP [122] | Xperia Arc S, iPhone 4, THL W200, Huawei Ideos X3 | VIS | 2,004 (100)
WVU BIOMDATA [30] | Irispass | NIR | 3,099 (244)
IIT-Delhi Iris Dataset [6] | JIRIS, JPC1000 and digital CMOS camera | NIR | 1,120 (224)
CASIA-BTAS [167] | CASIA Module v2 | NIR | 4,500 (300)
IIITD Multi-spectral Periocular [127] | Cogent Iris Scanner | NIR, VIS, Night Vision | 1,240 (62)
CROSS-EYED [125] | Dual Spectrum Sensor | NIR, VIS | 11,520 (240)

To overcome these shortcomings, researchers are actively exploring the potential of synthetically generated biometric data [73, 164] that can capture the intricate characteristics of real biometric traits, such as facial features [16, 83, 114], fingerprints [45, 95, 103], iris patterns [126, 147, 173], etc. While generating synthetic biometric data, it is important to ensure that the synthetic samples exhibit a wide range of intra-class variations, bolstering the size and diversity of the biometric dataset and facilitating the development of more robust and accurate biometric systems. Also, the generated biometric data should be distinct from real individuals. This ensures that the risk of privacy breaches and unauthorized access to personal data is effectively mitigated, since the generated samples do not correspond to real individuals.
This aspect assumes great significance, as ensuring the confidentiality and security of individuals' biometric information is of paramount importance.

1.3 Synthetic Biometric Data – Face, Fingerprint & Iris

In recent years, biometric-based human recognition has gained significant attention and widespread adoption, with applications in various domains such as security systems, access control, forensics, and healthcare. However, as the field of biometrics advances, it presents a range of research challenges that require exploration and innovation. For example, one of the prominent challenges in biometrics research is the availability of datasets with sufficient size, high quality, and diverse intra-class variations. Despite remarkable progress in biometric technologies, the datasets used for training and evaluating these systems often lack an adequate number of samples and fail to encompass the full range of intra-class variations. This limitation hampers the development and evaluation of reliable biometric systems. Another critical challenge is to ensure the privacy of individuals who submit their biometric data. Biometric information, being inherently personal and unique, raises concerns about unauthorized access, data breaches, and potential identity theft. Protecting the privacy of individuals while utilizing their biometric data is of utmost importance to foster trust and encourage widespread adoption of biometric systems [145]. Researchers have been actively working on finding solutions to overcome these challenges. One such solution is to explore the potential of synthetic biometric data.

1.3.1 Synthetic Biometrics

Synthetic biometrics refers to the creation of artificial biometric data that replicates real-world characteristics. It encompasses the generation of synthetic samples that emulate the traits and patterns observed in authentic biometric data [35], such as fingerprints, facial features, iris patterns, or speech. These synthetic samples are designed to possess similar statistical properties and variations as genuine biometric data, providing a valuable resource for research, development, and evaluation in the field of biometrics. Therefore, synthetically generated biometrics can be utilized to generate more biometric data with both inter- and intra-class variations. This helps overcome the limitations and challenges associated with conventional biometric datasets. Traditional biometric datasets often suffer from restricted sample sizes, lack diversity, and present concerns regarding privacy and data sharing. In contrast, synthetically generated biometrics offer a controlled and scalable solution by generating artificial data that can mimic the complexity and diversity of real-world biometric traits [73].

By generating synthetic biometric data, researchers and developers gain access to larger and
The artificial nature of the generated data ensures that it is not directly linked to any specific individual, mitigating the risks of unauthorized access or misuse of personal infor- mation. Consequently, synthetic biometric datasets can be shared and distributed for research and evaluation purposes without compromising individuals’ privacy rights. Furthermore, synthetically generated biometrics play a crucial role in enhancing the training and testing of deep convolutional neural network (CNN) models. Deep CNNs have demonstrated remarkable performance in various biometric tasks but heavily rely on large labeled datasets for effective training. Synthetically gener- ated biometric data aids in augmenting the availability of labeled training data by creating synthetic samples with known ground truth annotations. This facilitates the creation of more extensive and diverse training sets, resulting in improved CNN model training and higher accuracy in biometric systems. Moreover, synthetically generated biometrics contribute to the development of robust spoof detection algorithms. Spoofing attacks, where impostors attempt to deceive biometric sys- tems using fabricated or altered biometric traits, pose significant security risks. Synthetic data can be generated to simulate various spoofing scenarios, producing a comprehensive dataset of spoofing attempts with diverse attack variations. This synthetic dataset enables the training and evaluation of spoof detection algorithms, enhancing their effectiveness and enabling the development of resilient countermeasures against evolving spoofing techniques. To summarize, synthetically generated biometrics provide a valuable approach to address the limitations and challenges associated with conventional biometric datasets. By offering more data with sufficient variations, mitigating privacy concerns, improving deep CNN training and testing, and enhancing spoof detection capabilities, synthetically generated biometrics significantly contribute to the advancement and effectiveness of biometric systems in real-world applications. 10 1.3.2 Applications of Synthetic Biometric Data Synthetic biometric data has gained significant attention as a valuable resource in various fields, offering a range of applications and benefits. Here, we discuss the diverse applications of syn- thetic biometric data and highlights its significance in addressing challenges and improving the performance of biometric systems [45, 58, 73, 145, 158]: • Algorithm Development and Testing: Synthetic biometric data serves as a valuable resource for algorithm development and testing in the field of biometrics. By generating synthetic biometric samples, researchers can assess and evaluate the performance of novel algorithms, compare different techniques, and benchmark their efficacy against a standardized dataset. Synthetic data allows for controlled experimentation, enabling researchers to precisely ma- nipulate specific biometric traits, variations, and noise levels to simulate real-world scenarios. • System Evaluation and Benchmarking: Synthetic biometric data plays a vital role in evalu- ating and benchmarking the performance of biometric systems. It provides a standardized dataset that enables fair comparisons between different systems and algorithms. By using synthetic data, researchers and developers can assess system accuracy, robustness, and vul- nerability to various attacks or spoofing attempts. 
This evaluation process aids in identifying system weaknesses, improving overall system performance, and guiding the development of countermeasures.

• Training Data Augmentation: Synthetic biometric data is utilized to augment training datasets, enhancing the performance of biometric recognition systems. By generating additional synthetic samples, researchers can increase the size and diversity of the training set, which helps improve the performance and generalization capabilities of the algorithms. This approach reduces overfitting, enhances the system's ability to handle intra-class variations, and improves overall recognition accuracy.

• Privacy-Preserving Studies: Synthetic biometric data is invaluable for privacy-preserving studies and research involving sensitive biometric information. It allows researchers to conduct studies, simulations, and experiments without the need for real individuals' personal biometric data. Synthetic data provides a privacy-friendly alternative that ensures data protection while enabling advancements in biometric research and system development.

• Modeling Abnormal or Rare Cases: Synthetic data generation can help model age progression and the effects of diseases in iris recognition, where real-world datasets are often scarce. Age progression may or may not alter iris patterns over time, but the lack of sufficient data collected over long periods makes a definitive analysis difficult. By using generative techniques like GANs, synthetic iris data can simulate these changes, improving the performance of biometric systems across different ages. Similarly, synthetic data can model disease effects on the iris, helping recognition systems account for conditions like cataracts or glaucoma, which impact iris textures. This allows for the development of more robust iris recognition systems that can handle aging and disease-related changes effectively.

Overall, the applications of synthetic biometric data contribute to the advancement of biometric systems, enhancing their accuracy, security, and privacy while facilitating research and development in the field. Apart from these, synthetic biometric data generation is widely used in the field of entertainment and gaming to generate avatars with realistic-looking human features, emotions and speech [23, 38, 68, 89]. Similarly, learning and generating human-like gait movements can help create simulations for training and testing autonomous vehicles [154].

1.3.3 Methods to Generate Synthetic Biometrics

Synthetic biometric data offers flexibility and convenience in the development and assessment of biometric systems. In this context, it is important to understand the different methods employed to generate synthetic biometric data and their respective approaches. These methods have evolved with advancements in technology, ranging from pre-CNN techniques that predate the widespread adoption of CNNs to post-CNN methods that harness the power of deep learning. By exploring
Some common pre-CNN methods are [73]: • Mathematical Models: These models form a fundamental approach for generating synthetic data by approximating the distribution of real human data through mathematical models. Synthetic samples are then created by sampling from the approximated model. For example, Shah and Ross [126] proposed an approach for generating digital renditions of iris images using a two-step technique. In the first stage, they utilized a Markov Random Field model to generate a background texture that accurately represents the global appearance of the iris. In the subsequent stage, various iris features, including radial and concentric furrows, collarette, and crypts, are generated and seamlessly embedded within the texture field. In another example, Cappelli et al. [18] introduced a widely recognized method called Synthetic Fingerprint Generation (SFinGe) that is based on mathematical modeling. The researchers leverage their domain expertise to establish a fingerprint orientation model, which incorpo- rates key characteristics such as the number and placement of fingerprint cores and deltas. The generation process begins by initializing the core and delta locations, followed by the generation of ridge orientations and densities. To obtain a high-quality fingerprint image, the authors employ space-invariant linear filtering techniques. Additionally, they introduce domain-specific noise to simulate realistic grayscale fingerprint images. There are also other methods that use mathematical models for synthetic generation for face [108], keystroke [99], hand recognition [27], etc. Examples of some biometric data generated using mathematical models are given in Figure 1.4. 13 Limitations: Mathematical models learn to generate synthetic biometric data by approximat- ing the distribution of real human data, creating a dependency on training data. Therefore, such models may struggle to capture complex variations present in biometric data, resulting in limited diversity in the generated synthetic data. For example, SFinGe generates synthetic fingerprints based on a predefined fingerprint orientation model. This limits the diversity and variability of the generated fingerprints, as they are confined to the characteristics defined by the model [45]. As a result, the generated fingerprints may not fully represent the wide range of natural fingerprint variations observed in real-world scenarios. Apart from limited variability, some models are also limited in terms of quality and realism [27, 45, 108, 156]. • Input Perturbations: It refers to the deliberate introduction of controlled variations or distortions to the input data in order to generate synthetic biometric data. This technique is commonly used in data augmentation to increase the diversity and robustness of biometric datasets. The perturbations can be either hand-crafted or dynamic in nature [46, 73]. Hand- crafted perturbations involve the application of predetermined modifications or manipulations to the original biometric data, with specific perturbations designed by experts based on their understanding of the underlying biometric characteristics and vulnerabilities. On the other hand, dynamic perturbations involve the introduction of variations to the biometric data in a more adaptive and context-aware manner. Examples of hand-crafted perturbations include geometric transformations, the addition of noise, modifications to texture, and the inclusion of occlusion patterns. 
Nojavanasghari et al. [107] proposed a method for synthesizing and recognizing occlusions caused by hands covering faces in images. The focus of the paper is on addressing the challenges posed by hand occlusions in facial recognition systems. The synthesis pipeline consists of three steps: (1) to synthesize faces with occlusions, the first step is to collect non-occluded faces and segmented occlusions such as hands, hair, hats, scarves, etc.; (2) align the faces with occlusions in terms of scale, pose, orientation and color correction; and (3) determine the areas of the faces where occlusion should be added using facial landmarks. This helps synthesize face images with occlusions that can help train a face recognition system to overcome the challenges posed by occluded faces. In [19], Cardoso et al. aimed to generate synthetic degraded iris images for evaluation purposes. The method utilizes various degradation factors such as blur, noise, occlusion, and contrast changes to simulate realistic and challenging iris image conditions. The degradation factors are carefully controlled to achieve a realistic representation of degraded iris images commonly encountered in real-world scenarios. Examples of some biometric data generated using perturbations are given in Figure 1.5.

Limitations: The purpose of applying input perturbations is to simulate the natural variability present in real biometric data. However, the range of hand-crafted perturbations may be limited compared to the extensive diversity observed in real biometric data. It can be challenging to fully encompass the complete spectrum of variations and complexities using input perturbations alone. Also, input perturbations are typically generic and may not incorporate domain-specific knowledge or expertise. This can result in synthetic data that does not fully capture the specific characteristics or challenges of the target biometric modality or application domain [46].

• 3-D Modelling & Rendering: 3D modeling is a technique used in synthetic image generation to create a representation of the three-dimensional surface of an object of interest. This approach involves constructing a digital model that accurately captures the shape, texture, and geometry of the biometric feature, such as a face, fingerprint, or eye. The advantage of using 3D modeling and rendering is the ability to incorporate extreme changes in illumination, viewpoint, occlusion, scale, and background, providing a diverse set of synthetic samples. Han et al. [61] emphasized the benefits of generating synthetic samples in 3D space, allowing for precise control over environmental conditions such as pose variations, lighting, and object geometry. This control enables accurate annotations, which are often obtained from real datasets. Other studies have explored the use of 3D rendering tools in various applications, including re-identification of individuals, face recognition, and gait recognition. Examples
Certain fine details, such as subtle variations in skin texture or small imperfections are challenging to accurately replicate in the synthetic images. This limitation could potentially affect the performance and generalization of biometric recognition algo- rithms trained on synthetic data. Also, these methods are seen to generate images that are similar to the training dataset i.e., the identities in generated dataset has high similarity with the identities in real dataset [73] • Hybrid Approaches: Hybrid approaches combine multiple techniques to generate synthetic data that exhibits diverse and realistic characteristics. These approaches leverage the strengths of both mathematical models and transformation techniques. For instance, researchers may use parametric models to capture the statistical properties of the data and then apply trans- formation techniques to introduce variations and enhance the realism of the synthetic data. By combining these approaches, researchers can create synthetic data that closely resembles the real data while incorporating desired variations. In [110], Park et al. combined 3-D modeling with separate models for shape and texture to generate effects of aging on human faces. This helped train a face recognition method with temporal in-variance. As discussed earlier, Cappelli et al. [18] proposed SFinGe that utilizes mathematical model to generate synthetic fingerprints by initializing the core and delta locations, followed by the generation of ridge orientations and densities. Also, domain-specific noise is added to the generated images to improve realism of the generated images. Limitations: While the hybrid approaches helps improve quality and realism of generated in some cases [18, 110], their generations capability is still limited to the training data affecting their diversity and fail to capture complex variations caused due to external factors [73]. 16 Figure 1.4: Examples of images generated by approximating the distribution of real human data through mathematical models. (i) In [107], authors used Synthetic Fingerprint Generation (SFinGe) to generate realistic looking fingerprints. (ii) Shah and Ross [126] utilized a Markov Random Field model to generate background texture for synthetic irides and various iris features, including radial and concentric furrows, collarette, and crypts, are generated and seamlessly embedded within the texture field and (iii) [108] generates multiple face images, starting from seed points representing different identities. Users can control the degree of variation from the seed point to create new faces while maintaining the same identity if the difference is below a certain threshold. Figure 1.5: (i) Nojavanasghari et al. [107] proposed an innovative approach to generate lifelike facial occlusions using a dataset of clear faces and separate hand images. This method reduces the need for labor-intensive data gathering and annotation and (ii) In [19], researchers proposed a method to generate synthetic irides with noise perturbations to achieve a realistic representation of degraded iris images commonly encountered in real-world scenarios. 17 Figure 1.6: (i) In [109], Öz et al. generated artificial eye images by utilizing UnityEyes [150], a 3D rendering tool. The synthetic dataset was then incorporated for evaluation purposes and (ii) Feng et al. 
[49] proposed to expand the face dataset by generating synthetic faces with 11 different yaw rotations (ranging from -50 to 50 in increments of 5) and 5 pitch rotations (ranging from -30 to 30). This augmentation technique effectively increased the size of the dataset when combined with real data.

Figure 1.7: (i) and (ii) are produced by an Adversarial Autoencoder [93] trained on the MNIST and Toronto Face (TFD) datasets. The final column displays the nearest training images, based on pixel-wise Euclidean distance, to the ones in the second-to-last column.

Figure 1.8: (i) In [156], Yadav et al. leveraged RaSGAN to generate high-resolution iris images. (ii) Choi et al. [26] developed an image-to-image translation GAN, StarGAN v2, that aims to learn mappings between various visual domains while ensuring diversity and scalability across multiple domains. In the figure, StarGAN v2 translates the input image to images with the target domain "female".

1.3.3.2 Post-CNN Methods

Post-CNN methods involve generating synthetic biometric data using CNNs, which have shown remarkable performance in various computer vision tasks. These methods leverage the power of deep learning models to learn and generate realistic biometric samples. Some common post-CNN methods include:

• Recurrent Neural Networks: A recurrent neural network (RNN) is a type of artificial neural network that is designed to process sequential data by capturing dependencies and patterns over time. Unlike feedforward neural networks, which process data in a strictly forward manner, RNNs have loops that allow information to be stored and propagated through time. This recurrent structure enables RNNs to handle inputs of variable lengths and make use of past information to influence future predictions. At each time step, an RNN takes an input vector and produces an output vector, while also maintaining an internal hidden state. This hidden state serves as the memory of the network and allows it to retain information about previous inputs. The output at each time step is influenced by both the current input and the previous hidden state [124].
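In its simplest ("vanilla") form, this recurrence can be written as follows; the notation is generic and not taken from the cited works:

\[
h_t = \sigma\big(W_x x_t + W_h h_{t-1} + b_h\big), \qquad
y_t = \phi\big(W_y h_t + b_y\big),
\]

where $x_t$ is the input at time step $t$, $h_t$ is the hidden state, $y_t$ is the output, $W_x$, $W_h$ and $W_y$ are learned weight matrices, and $\sigma$ and $\phi$ are nonlinearities.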
This can lead to the generation of synthetic biometric data that closely resembles the training set but fails to capture the full variability and generalization required for accurate representation [73, 102, 124].
• Autoencoders: An autoencoder is a type of neural network that can be used to generate synthetic biometric data by learning the underlying patterns and representations from a given dataset. It consists of an encoder and a decoder, where the encoder compresses the input data into a lower-dimensional representation (latent space), and the decoder reconstructs the data from this representation. The basic autoencoder architecture has been extended and modified to cater to specific requirements in generating synthetic biometric data. For example, Wan et al. [146] proposed a novel approach for synthetic data generation using a variational autoencoder (VAE) that enables the generation of new samples that exhibit similarities to the original dataset while introducing some level of variation. The proposed method is evaluated and compared against traditional synthetic sampling methods using multiple datasets and five evaluation metrics. The experimental results demonstrate the effectiveness of the approach in addressing the challenges of imbalanced learning. Other prominent autoencoders for synthetic image generation are the adversarial autoencoder [93] and the Wasserstein auto-encoder [141]. Some examples of biometric data generated using autoencoders are given in Figure 1.7.
Limitations: Many efforts have been made to improve the quality of generated images by introducing different variations of the autoencoder [93, 141]; however, these models still fall short in terms of the quality of the generated images [11, 29]. Also, autoencoders rely on the patterns and features present in the training data to generate synthetic samples. As a result, their creativity and ability to generate novel data beyond the training set may be limited [29, 56].
• Generative Adversarial Networks (GANs): GANs consist of a generator and a discriminator network that compete against each other [56]. The generator (G) learns to generate synthetic samples that are similar to real biometric data, while the discriminator network aims to distinguish between real and synthetic samples. GANs have been successfully applied to generate realistic face images, iris patterns, and other biometric traits. In [82], Kohli et al. proposed iDCGAN (iris Deep Convolutional Generative Adversarial Network), a novel framework that leverages deep convolutional generative adversarial networks and iris quality metrics to generate synthetic iris images that closely resemble real iris images. The purpose of this framework is to explore the impact of these synthetically generated iris images when used as presentation attacks on iris recognition systems. Some examples of biometric data generated using GANs are given in Figure 1.8.
Limitations: While GANs have proven highly effective in generating synthetic biometric data, including iris images, they are not without their challenges. One significant limitation is their dependency on training data, which can lead to generated images resembling real human data, restricting the ability to produce identities distinct from the training set [73, 140]. This limitation raises concerns about privacy, as synthetic images may share similarities with real-world subjects.
Moreover, GANs often struggle with producing sufficient intra-class variation, limiting their utility in biometric applications where diverse representations of the 21 same identity are needed [78]. Additionally, GANs are known for their unstable training process, which can result in suboptimal image quality and mode collapse, especially when scaling to high-resolution synthetic data generation. • Diffusion based Generative Adversarial Networks (GANs): Diffusion-based GANs are a novel approach that combines diffusion models with the traditional GAN architecture to improve the generation of high-quality synthetic images [147]. Diffusion models generate images by progressively denoising random noise through a series of iterative steps, moving from random patterns to structured images [31]. This iterative process allows them to better capture fine details and complex structures that traditional GANs may miss. By incorpo- rating diffusion techniques into GANs, diffusion-based GANs benefit from the strengths of both frameworks: the high image quality and detail retention of diffusion models and the adversarial training of GANs, which helps produce more realistic and diverse outputs. This method addresses common challenges in traditional GANs, such as instability during training and mode collapse, making diffusion-based GANs more robust and effective for generating complex synthetic data, especially in applications like biometric image generation [79, 135]. Limitations: Despite their advantages, diffusion-based GANs face challenges in generating distinct identities. One key limitation is the high computational cost due to the iterative denoising process, making it slower and less scalable for large datasets. Additionally, while these models generate realistic images, ensuring that the identities are sufficiently distinct from the training data remains a challenge. Diffusion-based GANs may also struggle with controlling specific identity features in the latent space, potentially leading to less diversity in the generated identities. • Hybrid Approaches: Hybrid approaches combine multiple techniques to generate synthetic data that exhibits diverse and realistic characteristics. Bamoriya et al. [9] proposed a novel ap- proach, called Deep Synthetic Biometric GAN (DSB-GAN), for generating realistic synthetic biometrics that can serve as large training datasets for deep learning networks, enhancing 22 their robustness against adversarial attacks. DSB-GAN builds upon a combination of convo- lutional autoencoder (CAE) and DCGAN and the evaluation of DSB-GAN is conducted on three biometric modalities: fingerprint, iris, and palmprint. One of the notable advantages of DSB-GAN is its efficiency due to a low number of trainable parameters compared to existing state-of-the-art methods. In [67], Huang et al. introduced an innovative approach called introspective variational autoencoder (IntroVAE) for the synthesis of high-resolution photo- graphic images. IntroVAE employs a streamlined architecture that combines the strengths of both VAEs and GANs, eliminating the need for additional discriminators. The generator component of IntroVAE follows the conventional VAE approach by reconstructing input images from noisy outputs of the inference model. Simultaneously, the inference model is trained to classify between real and generated samples, while the generator attempts to deceive it, similar to the concept in GANs. 
Limitations: Methods like [9] can successfully generate biometric data for multiple modalities, but they lack evidence of inter- and intra-class variation in the generated dataset. In [67], Huang et al. combined the stable training of VAEs with the superior generation capability of GANs to generate synthetic data. This helped improve the image quality of the synthetic data, but the work does not focus on generating images that are distinct from the training data.
1.3.4 Synthetic Iris Images
Synthetic iris images have become increasingly important for various applications, such as iris recognition system evaluation, algorithm development, and biometric data augmentation. Synthetic images offer several advantages, including scalability, diversity, and control over the generated data. In this literature review, we will explore different methods for generating synthetic iris images, discussing their strengths, limitations, and potential applications:
• Texture Synthesis: This technique has been widely used for generating synthetic iris images. These methods analyze the statistical properties of real iris images and generate new images based on those statistics.
Figure 1.9: Some examples of synthetically generated iris images using different methods. (i) and (ii) use mathematical models from [169] and [126]. On the other hand, (iii) and (iv) use GAN based methods from [156] and [82], respectively. As mentioned in Section 1.3, the mathematical models learn to generate synthetic biometric data by approximating the distribution of real human data, creating a dependency on training data. Therefore, such models may struggle to capture complex variations present in biometric data, resulting in limited diversity in the generated synthetic data. Apart from limited variability, some models are also limited in terms of quality and realism. On the other hand, GANs have paved the way to generate realistic looking synthetic images. However, the generated images resemble the training data from real humans, i.e., the identities generated by most of the current methods are not unique enough. Also, some of these methods lack intra-class variations and may not fully represent the wide range of natural variations observed in real-world scenarios.
Shah and Ross [126] proposed an approach for generating digital renditions of iris images using a two-step technique. In the first stage, they utilized a Markov Random Field model to generate a background texture that accurately represents the global appearance of the iris. In the subsequent stage, various iris features, including radial and concentric furrows, collarette, and crypts, are generated and seamlessly embedded within the texture field. In another example, Makthal and Ross [94] introduced a novel approach for synthetic iris generation using Markov Random Field (MRF) modeling. The proposed method offers a deterministic synthesis procedure, which eliminates the need for sampling a probability distribution and simplifies computational complexity. Additionally, the study highlights the distinctiveness of iris textures compared to other non-stochastic textural patterns. Through clustering experiments, it is demonstrated that the synthetic irises generated using this technique exhibit content similarity to real iris images. In a different approach, Wei et al. [149] proposed a framework for synthesizing large and realistic iris datasets by utilizing iris patches as fundamental elements to capture the visual primitives of iris texture.
Through patch-based sampling, an iris prototype is created, serving as the foundation for generating a set of pseudo irises with intra-class variations. Qualitative and quantitative analyses demonstrate that the synthetic datasets generated by this framework are well-suited for evaluating iris recognition systems. Limitations: These methods learn to generate synthetic iris images by approximating the distribution of real iris data, creating a dependency on training data. Therefore, such models may struggle to capture complex variations present in real iris datasets, resulting in limited diversity in the generated synthetic data. Furthermore, certain studies in this domain lack comprehensive analysis of the inter and intra-class variations exhibited in the generated synthetic data. For instance, a study by Wei et al. [149] demonstrates that the synthetic iris images bear a striking resemblance to real iris images in terms of visual appearance. However, the experiments conducted in this study do not thoroughly explore the extent to which the synthetic iris images resemble each other in terms of identity. • Morphable Models: Morphable models have been utilized for generating synthetic iris images by capturing the shape and appearance variations in a statistical model. These models represent the shape and texture of irises using a low-dimensional parameter space. By manipulating the parameters, synthetic iris images with different characteristics, such as size, shape, and texture, can be generated. Most of the research in this category focuses on generation synthetic iris images with gaze estimation and rendering eye movements. Wood et al. [150] proposed a 3-D morphable model for the eye region with gaze estimation and re-targeting gaze using a single reference image. Similarly, [10] focuses on achieving photo- realistic rendering of eye movements in 3D facial animation. The model is built upon 3D scans of a face captured from various gaze directions, enabling the capture of realistic motion of the eyeball, eyelid deformation, and the surrounding skin. To represent these deformations, 25 a 3D morphable model is employed. Limitations: Morphable models offer control over iris attributes and can generate a diverse range of iris images with unique identity. However, acquiring a comprehensive training dataset and accurate parameter estimation are crucial for the effectiveness of these models. An illustrative example can be found in [150], where the focus is on gaze estimation and re-targeting, enabling the generation of eye regions with realistic variations. However, it is worth noting that while emphasizing these variations, the model’s capacity to preserve identity information is constrained in the resulting generated images. This is a very common problem found with morphable models for gaze estimation. • Image Warping: Image warping techniques involve applying geometric transformations to real iris images to generate synthetic images. These transformations can include rotations, translations, scaling, and deformations. Image warping allows for the generation of synthetic iris images with variations in pose, gaze direction, and occlusions. In [19], Cardoso et al. aimed to generate synthetic degraded iris images for evaluation purposes. The method utilizes various degradation factors such as blur, noise, occlusion, and contrast changes to simulate realistic and challenging iris image conditions. 
The degradation factors are carefully con- trolled to achieve a realistic representation of degraded iris images commonly encountered in real-world scenarios. In [32], a novel iris image synthesis method combining principal component analysis (PCA) and super-resolution techniques is proposed. The study begins by introducing the iris recognition algorithm based on PCA, followed by the presentation of the iris image synthesis method. The proposed synthesis method involves the construction of coarse iris images using predetermined coefficients. Subsequently, super-resolution tech- niques are applied to enhance the quality of the synthesized iris images. By manipulating the coefficients, it becomes possible to generate a wide range of iris images belonging to specific classes. Limitations: This method is computationally efficient and can produce a large number of 26 synthetic images. However, image warping may not capture the complex textural details and iris-specific characteristics accurately. Also, some methods like [19] focuses on generating intra-class variations for the currently available iris datasets and don’t cater to the need of generating large-scale synthetic iris dataset with inter-class variations. • Generative Adversarial Networks (GANs): GANs have gained significant attention for generating realistic and diverse synthetic iris images. In a GAN framework, a generator network learns to generate synthetic iris images, while a discriminator network distinguishes between real and synthetic images. The two networks are trained in an adversarial manner, resulting in improved image quality over time. GANs can generate iris images with realistic features, including iris texture, color, and overall appearance. Minaee and Abdolrashidi [100] proposed a framework that utilizes a GAN to generate synthetic iris images sampled from a learned prior distribution. The framework is applied to two widely used iris datasets, and the generated images demonstrate a high level of realism, closely resembling the distribution of images within the original datasets. Similarly, Kohli et al. [82] proposed iDCGAN (iris Deep Convolutional Generative Adversarial Network), a novel framework that leverages a deep convolutional GAN to generate synthetic iris images that closely resemble real iris images. The purpose of this framework is to explore the impact of these synthetically generated iris images when used as presentation attacks on iris recognition systems. Bamoriya et al. [9] proposed a novel approach, called Deep Synthetic Biometric GAN (DSB-GAN), for generating realistic synthetic biometrics that can serve as large training datasets for deep learning networks, enhancing their robustness against adversarial attacks. DSB-GAN builds upon a combination of convolutional autoencoder (CAE) and DCGAN and the evaluation of DSB-GAN is conducted on three biometric modalities: fingerprint, iris, and palmprint. One of the notable advantages of DSB-GAN is its efficiency due to a low number of trainable parameters compared to existing state-of-the-art methods. Limitations: GANs have shown promise in generating synthetic iris images, but they also have certain limitations. The GAN-based methods introduced by Minaee et al. [100] and 27 Kohli et al. [82] demonstrate the ability to generate high-quality iris images, particularly in low-resolution settings. However, it is important to note that the image quality tends to degrade when applied to higher resolution scenarios [156]. 
While these methods exhibit promising capabilities for generating synthetic iris images, further improvements are necessary to ensure consistent quality across various image resolutions. Another common issue with current GAN-based methods for synthetic iris image generation is data dependency. Current GAN methods fail to generate iris images that are distinct from the real training data in terms of identity [73, 140]. This restricts the capability of GAN methods to generate large-scale synthetic iris datasets. Also, inadequate or biased training data can result in sub-optimal performance and limit the network’s generalization capabilities.
• Diffusion-based Generative Adversarial Networks: Diffusion-based GANs have recently emerged as a promising approach for generating realistic and high-quality synthetic iris images by combining the strengths of diffusion models and GANs. In these models, the image generation process involves iterative refinement, starting from noise and progressively producing structured images. Diffusion-based GANs can capture complex textures and features, making them particularly suited for the fine details required in synthetic iris image generation. Recent work like II3FDM [92] has shown the potential of diffusion models in tasks like iris inpainting, highlighting the effectiveness of diffusion-based approaches in reconstructing realistic iris textures. This method demonstrates that diffusion models can produce high-quality iris images, preserving intricate details such as iris patterns and textures, which are crucial for biometric applications. However, while studies like II3FDM have explored diffusion models for iris-related tasks, the application of diffusion-based GANs in the iris domain remains relatively underexplored. Most research has focused on traditional GANs or inpainting methods, leaving significant room for further development and innovation in using diffusion-based GANs for generating synthetic iris datasets. The limited research in this area suggests there is ample scope for exploring how diffusion-based GANs can overcome the challenges faced by traditional GANs, such as generating distinct identities and handling high-resolution images. By leveraging the stability and iterative nature of diffusion processes, future research could unlock new capabilities in iris synthesis, particularly in enhancing identity diversity and addressing issues related to mode collapse and training instability, which are common in GAN-based models.
Limitations: While diffusion-based GANs offer a promising direction, their application in iris synthesis has not been thoroughly investigated. The work on II3FDM [92] indicates their potential for high-quality iris reconstruction, but the challenge of generating distinct identities in large-scale datasets remains a largely unaddressed area. Additionally, the computational cost and complexity of diffusion-based models present a barrier to their widespread use, further highlighting the need for research to optimize these models for synthetic biometric data generation.
1.4 Our Contribution
In the field of biometrics, various techniques have been developed to generate synthetic data for different modalities, including face, fingerprint, and iris. These methods enable researchers and practitioners to create artificial biometric samples that mimic the characteristics of real-world data.
However, as discussed earlier the current methods for generating synthetic irides are limited in terms of quality, realism and uniqueness (inter and intra-class variations). In this research, we proposed different methods to overcome these issues and emphasize their usefulness through various experiments and analysis. Based on the nature of the generated data and their application, the contribution of our work can be categorized as follows: • Generating Partially-Synthetic Biometric Data: Partially-Synthetic biometric data refer to the synthetic samples that contain artificial components mixed with real biometric traits. The goal of partially-synthetic data is to introduce controlled variations or augmentations to the real data, thereby increasing the diversity and robustness of the dataset. This can be particularly useful in scenarios where the real data is limited, imbalanced, or lacks specific 29 variations. For example, in iris presentation attack (PA) detection where the detection methods aim to detect PA attacks (such as printed eyes, cosmetic contact lens, etc.), limited PA data is available to train the detection methods. This can limit the methods’ development and testing as well. Also, with the improvement in technology more advance PA attacks are present in the real world (such as good quality textured contact lens, replay attack using high definition screens, etc.) and the current detection methods are not generalized enough to detect these new and unseen attacks. To overcome these issues, we proposed two different methods: (1) Leveraging Relativistic Average Standard Generative Adversarial Network (RaS- GAN): This study leverages the capabilities of RaSGAN to produce high-quality iris images. Unlike traditional GANs, the introduction of a "relativistic" discriminator and generator in RaSGAN enhances the network’s generative power. This approach aims to maximize the probability that the real input data is more realistic than the synthetic data, and vice versa. The synthetic iris images generated through this method exhibit similarity to real iris images, capturing the intricate details and characteristics of the iris. Building upon this, we explore the usability of these synthetic images for training a PAD system that can effectively detect presentation attacks. By combining the power of RaSGAN for generating highly realistic iris images and the effectiveness of the resulting synthetic images for training a PAD system, this approach offers promising prospects for enhancing the security and reliability of iris recognition systems. The synthetic images generated through this technique can contribute to improving the detection and prevention of presentation attacks, enabling biometric systems to handle previously unseen attacks more effectively. (2) Cyclic Image Translation Generative Adversarial Network (CIT-GAN): We intro- duced a novel approach called CIT-GAN for achieving multi-domain style transfer. Our method incorporated a Styling Network, which learns the distinctive style characteristics of each domain represented in the training dataset. By leveraging the Styling Network, the generator is guided to translate images from a source domain to a reference domain, resulting 30 in the generation of synthetic images that possess the style characteristics of the reference domain. The learning process for style characteristics is influenced by both the style loss and domain classification loss, allowing for variability in style characteristics within each domain. 
In the context of iris presentation attack detection (PAD), we utilized the proposed CIT-GAN to generate synthetic presentation attack (PA) samples for classes that are under-represented in the training set. Through evaluation using state-of-the-art iris PAD methods, we demonstrated the effectiveness of using these synthetically generated PA samples for training PAD models. Additionally, we evaluated the realism of the synthetic images using the Fréchet Inception Distance (FID) score, which quantifies the similarity between the distributions of real and synthetic images. Our results indicate that the proposed method produces synthetic images of superior quality compared to other competing methods, including StarGAN v2.
(3) Multi-domain Image Translative Diffusion StyleGAN with Application in Iris Presentation Attack Detection (MID-StyleGAN): An iris biometric system can be vulnerable to presentation attacks (PAs), where artifacts like artificial eyes, printed eye images, or cosmetic contact lenses are used to deceive the system. To mitigate these threats, various presentation attack detection (PAD) methods have been proposed. However, the development and evaluation of iris PAD techniques face a significant challenge due to the lack of sufficient datasets, primarily because of the inherent difficulties in creating and capturing realistic PAs. To address this issue, we presented the Multi-domain Image Translative Diffusion StyleGAN (MID-StyleGAN), a novel framework designed to generate synthetic ocular images that effectively capture domain-specific information from iris PA datasets. MID-StyleGAN leverages the strengths of diffusion models and generative adversarial networks (GANs) to create realistic and diverse synthetic data. It utilizes a multi-domain architecture that enables seamless translation between bonafide ocular images and various PA domains, while maintaining the biometric identity features. The framework incorporates an adaptive loss function specifically tailored for ocular data to ensure domain consistency. Experimental results demonstrate that MID-StyleGAN surpasses existing methods in generating high-quality synthetic ocular images, significantly enhancing PAD system performance.
• Generating Fully-Synthetic Biometric Data: Fully-synthetic biometric data refer to entirely artificial biometric samples that do not correspond to any real individuals in the population. With fully-synthetic biometric data, the aim is to generate new iris images with both inter- and intra-class variations in order to mitigate the issue of small training sets by increasing the size of the dataset. This can improve the development of recognition systems and their testing. Also, by generating fully-synthetic identities that do not resemble anyone in this world, we can mitigate the privacy concerns associated with using a real person’s biometric data. We have proposed two different methods to achieve this:
(1) iWarpGAN: Disentangling Identity and Style to Generate Synthetic Iris Images: This framework incorporates two transformation pathways, namely the Identity Transformation and the Style Transformation. The Identity Transformation pathway is designed to modify the identity of the input iris image in the latent space, allowing the generation of iris images with identities that are distinct from those in the training set.
This is accomplished by learning a radial basis function (RBF)-based warp function, denoted as 𝑓𝑝, in the latent space of a Generative Adversarial Network (GAN). The gradient of this function enables the generation of non-linear paths along the 𝑝𝑡ℎ family of paths for each latent code 𝑧 ∈ 𝑅, resulting in diverse identities. On the other hand, the Style Transformation pathway focuses on generating iris images with different styles. The style attributes are extracted from a reference iris image and combined with the transformed identity code. By concatenating the reference style code with the modified identity code, iWarpGAN generates synthetic iris images with both inter-class and intra-class variations in terms of style. Through the integration of these two transformation pathways, iWarpGAN facilitates the generation of iris images that exhibit diverse identities and styles, providing a comprehensive exploration of the latent space. (2) Image Translative Diffusion GAN (IT-diffGAN): IT-diffGAN is introduced to address critical challenges posed by traditional GANs in generating synthetic biometric data, specif- 32 ically in producing distinct, realistic identities with sufficient variation. Traditional GANs often struggle with mode collapse, unstable training, and the generation of synthetic images that closely resemble the training data, which can compromise privacy and diversity. IT- diffGAN seeks to overcome these issues by leveraging the advantages of diffusion models, which offer a more stable and iterative approach to image generation, combined with the architectural power of StyleGAN-3. The proposed method first begins by projecting input images into the latent space of the diffusion-GAN. In this latent space, the model identifies key features that define individual identity and style, enabling precise manipulation of these features. By applying a specialized identity and style metric, IT-diffGAN calculates the displacement or distance between the original and generated images, providing a measure of how much these key features are being altered. This metric allows the model to learn which latent features most significantly affect the identity and style, making the manipulation of these attributes more controlled and effective. Once the relevant identity and style features are identified, IT-diffGAN is trained to generate entirely new identities by carefully adjusting these features within the latent space. This training enables the model to introduce both inter-class and intra-class variations in the synthetic images. Inter-class variations allow for the generation of completely distinct identities, while intra-class variations create different representations of the same identity, mimicking natural biometric variations seen in real- world data. By utilizing the diffusion-GAN framework, IT-diffGAN offers enhanced stability during training compared to traditional GANs. Diffusion models inherently reduce the risk of mode collapse by employing an iterative denoising process that refines the generated images progressively from noise to fully-formed, structured images. This leads to the production of more realistic and higher-quality synthetic iris images. Additionally, the StyleGAN-3 backbone helps in preserving the fine details and features essential for biometric recognition, further boosting the quality of the generated data. 
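To make the latent-space manipulation underlying the Identity Transformation pathway more concrete, the following is a minimal, illustrative sketch of an RBF-based warp in a GAN latent space: a scalar RBF function is defined over latent codes and its gradient is followed to trace a non-linear path that progressively alters identity. The latent dimension, the RBF centers and weights, and the generator that would decode the resulting codes are hypothetical placeholders, not the actual iWarpGAN components.

```python
import numpy as np

def rbf_warp(z, centers, weights, gamma=0.5):
    """Scalar RBF function f_p(z) = sum_i w_i * exp(-gamma * ||z - c_i||^2)."""
    d2 = np.sum((centers - z) ** 2, axis=1)          # squared distance to each center
    return np.sum(weights * np.exp(-gamma * d2))

def rbf_warp_grad(z, centers, weights, gamma=0.5):
    """Gradient of f_p with respect to z; defines a non-linear direction of travel."""
    diff = z - centers                                # shape: (num_centers, latent_dim)
    d2 = np.sum(diff ** 2, axis=1)
    coeff = -2.0 * gamma * weights * np.exp(-gamma * d2)
    return (coeff[:, None] * diff).sum(axis=0)

def identity_path(z0, centers, weights, steps=10, step_size=0.1):
    """Follow the gradient of the warp function to trace a non-linear path in latent
    space; each point corresponds to a progressively altered identity code."""
    path, z = [z0.copy()], z0.copy()
    for _ in range(steps):
        z = z + step_size * rbf_warp_grad(z, centers, weights)
        path.append(z.copy())
    return np.stack(path)

# Toy usage with hypothetical sizes (latent_dim=512, 8 RBF centers).
rng = np.random.default_rng(0)
z0 = rng.standard_normal(512)
centers = rng.standard_normal((8, 512))
weights = rng.standard_normal(8)
path = identity_path(z0, centers, weights)
print(path.shape)   # (11, 512): latent codes along one identity-altering path
# Each code in `path` would then be decoded by a (hypothetical) generator G(path[i]).
```

In practice, the centers and weights of such a warp would themselves be learned so that movement along the path changes identity while a separately injected style code controls intra-class appearance, as described above.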
CHAPTER 2
GENERATING PARTIALLY SYNTHETIC IRIS IMAGES FOR ENHANCED PRESENTATION ATTACK DETECTION
This work leverages RaSGAN to generate high-quality partially synthetic iris images in the NIR spectrum and evaluates the effectiveness and usefulness of these images as both bonafide and presentation attack samples. We also propose a novel one-class presentation attack detection method known as RD-PAD for unseen presentation attack detection, addressing the challenge of generalizability in PAD algorithms. This research has been published in [156] and [157].
2.1 Introduction
The rich texture of the iris, which is better discernible in the near-infrared spectrum, has been used as a biometric cue [37] in many recognition systems [71]. This has led to an increased interest in the texture and morphology of the iris. Consequently, researchers have strived to model the pattern of the iris. In this regard, a number of methods to generate synthetic digital irides have been developed (as discussed in Chapter 1). Cui et al. [32] used principal component analysis to select appropriate feature vector coefficients from real images, which were then used to generate synthetic irides. The quality of the generated data was improved using super-resolution. Zuo et al. [173] developed a model based on the morphology of the iris. Noise and light reflection were also added to the model to create more realistic looking samples. Shah and Ross [126] used a Markov Random Field to model the stromal texture of the iris [94] and then added anatomical entities such as collarette, crypts, radial and concentric furrows. Wei et al. [149] proposed a framework for synthesizing large and realistic iris datasets by utilizing iris patches as fundamental elements to capture the visual primitives of iris texture. Through patch-based sampling, an iris prototype is created, serving as the foundation for generating a set of pseudo irises with variations. In [144], Venugopalan and Savvides aimed to model synthetic iris codes from original irides that can be utilized to evaluate iris recognition systems. Other methods have also been proposed in the literature to generate synthetic iris images [19, 50, 148]. While these methods successfully generate digital iris images, they are still unable to truly model the distribution of real iris images at high resolution [156]. With the introduction of deep learning and generative adversarial networks (GANs) came the opportunity to overcome some of the challenges of pre-CNN methods and to generate realistic looking synthetic images of higher quality. Huang et al. [67] introduced an innovative approach called the introspective variational autoencoder (IntroVAE) for the synthesis of high-resolution photographic images. IntroVAE employs a streamlined architecture that combines the strengths of both VAEs and GANs, eliminating the need for additional discriminators. The generator component of IntroVAE follows the conventional VAE approach by reconstructing input images from noisy outputs of the inference model. Simultaneously, the inference model is trained to classify between real and generated samples, while the generator attempts to deceive it. In [82], Kohli et al. proposed iDCGAN (iris Deep Convolutional Generative Adversarial Network), a novel framework that leverages deep convolutional generative adversarial networks and iris quality metrics to generate synthetic iris images that closely resemble real iris images.
This framework aims to explore the impact of the synthetically generated iris images when used as presentation attacks on iris recognition systems. Bamoriya et al. [9] proposed a novel approach, Deep Synthetic Biometric GAN (DSB-GAN), to generate realistic synthetic biometrics that can serve as large training datasets for deep learning networks, enhancing their robustness against adversarial attacks. DSB-GAN builds upon a combination of convolutional autoencoder (CAE) and DCGAN and the evaluation of DSB-GAN is conducted on three biometric modalities: fingerprint, iris, and palmprint. One of the notable advantages of DSB-GAN is its efficiency due to a low number of trainable parameters compared to existing state-of-the-art methods. While these methods successfully generate synthetic data, the quality of the generated data deteriorates with increasing resolution, failing to capture the intricate biometric details. In this research, we propose to overcome these challenges by leveraging the generative capability of Relativistic Average Standard Generative Adversarial Network (RaSGAN) [72] with Frechet Inception Distance (FID) [63] to generate high-resolution iris images. Unlike other GAN methods, 35 Figure 2.1: Samples of real and spoof iris images from MSU-Iris-PA01 [156]: (a) bonafide samples and (b) presentation attack samples: (i) artificial eye, (ii) & (iii) printed iris, (iv) Kindle display and (v) cosmetic contact lens. RaSGAN trains a generator that aims to maximize the probability that a randomly sampled set of synthetic samples are more realistic than a given set of real samples. In [72], Jolicoeur- Martineau showed that this property can be implemented in a Standard GAN using a “relativistic discriminator" that competes with the generator to maximize the probability that the real data is more realistic than the synthetic data. The author studied different cost functions and compared the statistics of the generated samples using the Frechet Inception Distance (FID) score [12]. They reported that the RaSGAN obtained much lower (better) FID score on the CIFAR-10 dataset than SGAN, Least Squares GAN (LSGAN) [97] and Wasserstien GAN (WGAN) [8]. It was also observed that RaSGAN produces high-resolution images using fewer number of iterations even when other networks were not able to converge (especially for high resolution images). We further investigate the quality of irides generated using RaSGAN by evaluating whether state-of-the-art iris presentation attack detection (PAD) methods can distinguish between bonafide, synthetically generated irides and presentation attacks. Here, presentation attacks (PAs) refer to physical artifacts that are utilized to successfully circumvent iris biometric systems. For example, an adversary can present a printed image [60, 161] to an iris sensor to impersonate another subject, or use cosmetic contact lenses [80, 155] and artificial eyes [51] to either obfuscate their own identity or to create a virtual identity. Due to their serious impact on the security of a system, detecting such spoof or obfuscation attacks has become a key research topic in biometrics. Some of the commonly used iris presentation attack detection (PAD) algorithms are summarized below: • Print Attack: Attackers may use printed or static images of an enrolled iris to impersonate someone. Gupta et al. [60] used textual descriptors such as LBP, HOG and GIST to detect 36 print attacks. 
Raghavendra and Busch [115] used multi-scale binarized statistical image features (BSIF) combined with cepstral features for print attack detection. The printed images are often flat and can easily be detected using liveliness test [33, 74]. • Cosmetic Contact Lens: Attackers can utilize cosmetic contact lens to fool the iris recog- nition systems. Kohli et al. [81] used a variant of LBP to obtain useful textural features for contact lens detection. Other approaches to detect such attacks include weighted local binary pattern and deep features extracted from CNNs [98]. • Synthetic/Artificial Eye: This type of attack is less common than print and cosmetic contact lens but is gaining interest in recent times [82]. Some of the proposed methods for detection are based on multispectral imaging [22] and eye gaze tracking [88]. • Multiple Attacks: PAD algorithms can also be designed to address various types of PAs. In [64], Hoffman et al. designed a CNN that used patch information along with a segmentation mask from an un-normalized iris image to learn image characteristics that differentiate PA samples from bonafide samples. Menotti et al. [98] proposed a CNN based approach to detect spoofs in different modalities, viz., iris, face and fingerprint. While the aforementioned methods exhibit reasonably good performance on seen PAs, they do not generalize well over unseen PAs. We use the term “Seen PAs" to refer to PAs that are used or observed during the development or training stage of the detector. On the other hand, “Unseen PAs" refer to PAs that are not used or observed during the development or training stage of the detector. Some of the recent PAD methods attempted to improve generalizability by building deep convolutional networks that aims to learn the difference between bonafide irides and PAs. Gupta et al. [59] proposed a novel deep learning-based PAD method, known as MVANet, that leverages multiple representation convolutional layers to improve the generalization of the PA detector. MVANet also addresses the computational complexity inherent in training deep neural networks by adopting a pragmatic approach and utilizing a fixed base model. The comprehensive assessments on various databases using cross-database training-testing configurations shows the 37 efficacy of MVANet in generalizing over unseen PAs. Sharma and Ross [128], introduced an iris PAD method named D-NetPAD, built upon the DenseNet convolutional neural network architecture. D-NetPAD exhibits a high degree of adaptability when it comes to generalizing over various PA artifacts, sensors, and datasets. Through a series of experiments carried out on various iris PA datasets, they validated the efficacy of D-NetPAD in generalized PA detection. In [21], Chen and Ross proposed a joint iris detection and PA detection method that aims to predict the parameters of the iris bounding box and simultaneously assess the likelihood of a presentation attack using the input ocular image. Through a series of experiments, authors showed the efficacy of proposed method in detecting irides as well as iris PAs. Most of these PAD methods formulate presentation attack detection as a binary-class problem, which demands the availability of a large collection of both bonafide and PA samples to train classifiers. However, obtaining a large number of PA samples can be much more difficult than bonafide iris samples. 
Further, classifiers are usually trained and tested across similar PAs, but PAs encountered in operational systems can be diverse in nature and may not be available during the training stage. Also, in case of binary-class based detectors, the PAD methods need to be fine-tuned (or in some cases re-trained) whenever a new PA is introduced. Therefore, PAD algorithms based on binary classifiers might fail to generalize to unseen PAs. In the literature, researchers have attempted to impart generalizability to PAD algorithms by adopting an anomaly detection approach also known as one-class classification. In this approach, the PA detector learns the distribution of bonafide samples only and uses this information to detect “outliers" that would presumably correspond to PAs. In [39], Ding and Ross proposed an ensemble of one-class classifiers, trained on hand-crafted features, to detect unseen fingerprint PAs. Nikisins et al. [106] used the gaussian mixture model as a one-class classifier trained on image quality measure (IQM) features [50] for generalized face PA detection. In [44, 65, 123, 168] researchers used deep architectures such as CNNs and GANs [57] for anomaly detection in general image classification problems, which suggest the efficacy of deep-learning based features in anomaly detection. Using this as motivation, we propose a one-class classifier for generalized iris PA detection 38 known as RD-PAD. The relativistic discriminator (RD) of the ensuing RaSGAN learns to separate bonafide irides from their synthetic counterparts. In the process, the RD fits a tight boundary around the bonafide samples making it an effective one-class anomaly detector, which we refer to as RD-PAD. The proposed method, in principle, does not require any PA samples during training; only bonafide samples are needed during training. Consequently, anything that lies outside the learned distribution on bonafide samples is classified as PA. The major contributions of this research are summarized here: • We use RaSGAN with FID score to generate synthetic iris images that can effectively model the distribution of real iris images. • We investigate if state-of-the-art iris PAD algorithms can distinguish bonafide irides as well as presentation attack images (e.g. cosmetic contact lens, printed iris and artificial eye images) from the generated synthetic images. • We propose RD-PAD for unseen PA detection that utilizes the relativistic discriminator from a RaSGAN to discriminate bonafide samples from PAs. The proposed PAD algorithm requires only bonafide samples for training. • We analyze the performance of state-of-the-art PAD algorithms on unseen PAs and compare them with the proposed method. • We evaluate the performance of the proposed RD-PAD when it is fine-tuned using a few PA samples and tested on PAs that are not used during training. 2.2 Background Generative Adversarial Networks (GANs) [57] are neural networks that consist of two different components: a generator (𝐺) that learns how to synthesize the data (e.g., images), and a discrim- inator (𝐷) that aims to discriminate between real and synthetic data. These two networks are 39 alternatively updated against each other in a min-max game where the objective of the generator is to maximally fool the discriminator while the objective of the discriminator is to not be fooled. 
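To make the min-max game described above concrete, the following is a minimal PyTorch sketch of one alternating training step for a standard GAN. The toy network definitions, flattened image size and hyperparameters are illustrative placeholders, not the actual configuration used in this work.

```python
import torch
import torch.nn as nn

# Minimal sketch of the alternating GAN update: D learns to separate real from
# synthetic samples, then G is updated to fool D (common non-saturating variant).
latent_dim = 128
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 256 * 256))
D = nn.Sequential(nn.Linear(256 * 256, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

def train_step(x_real):
    """One alternating update of discriminator and generator."""
    b = x_real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator update: push D(x_real) towards 1 and D(G(z)) towards 0.
    z = torch.randn(b, latent_dim)
    x_fake = G(z).detach()                 # stop gradients into G for this step
    loss_d = bce(D(x_real), ones) + bce(D(x_fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: push D(G(z)) towards 1, i.e., try to fool D.
    z = torch.randn(b, latent_dim)
    loss_g = bce(D(G(z)), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

# Toy usage with a random "image" batch flattened to 256*256 values.
print(train_step(torch.randn(4, 256 * 256)))
```

The relativistic variants discussed next modify only the loss terms in such a loop, replacing the absolute real/fake targets with comparisons between real and synthetic discriminator outputs.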
2.2.1 Standard Generative Adversarial Networks (SGANs)
As mentioned previously, a standard GAN (SGAN) consists of two networks $D$ and $G$ that are wrapped in a min-max game to update their weights and compete against each other. This is achieved by alternately minimizing and maximizing the objective function $S$ as,
$$\min_{G}\max_{D} \; S(D, G) = \mathbb{E}_{x_r \sim \mathbb{P}}[\log(D(x_r))] + \mathbb{E}_{z \sim \mathbb{M}}[\log(1 - D(G(z)))]. \quad (2.1)$$
Here, $x_r \sim \mathbb{P}$ indicates that $x_r$ is from the true data distribution $\mathbb{P}$. Also, $D(x)$ is the output obtained after applying the sigmoid function ($sig$) to the non-transformed layer $NT(x)$,
$$D(x) = sig(NT(x)). \quad (2.2)$$
Here, $NT(x)$ refers to the output of the last convolutional layer before the application of logistic regression. Traditional GANs such as SGAN, WGAN and DCGAN design discriminators that optimize their ability to distinguish synthetically generated data from bonafide samples. While they have been reported to perform well [72] on low resolution datasets, unstable training and optimization have been observed when they are used with high-resolution data [72]. This instability can be explained in terms of the gradient of the traditional discriminator:
$$\nabla_{\theta} S_D = -\mathbb{E}_{x_r \sim \mathbb{P}}[(1 - D(x_r))\,\nabla_{\theta} NT(x_r)] + \mathbb{E}_{x_s \sim \mathbb{Q}}[D(x_s)\,\nabla_{\theta} NT(x_s)]. \quad (2.3)$$
Here, $x_s \sim \mathbb{Q}$ indicates that $x_s$ is from the model distribution $\mathbb{Q}$, i.e., synthetically generated data. During training, when the discriminator is optimized, $1 - D(x_r)$ converges to 0, indicating that the gradient of $D$ comes mostly from synthetically generated data. Consequently, the generator stops learning to generate natural looking images. This in turn restricts the ability of the discriminator to learn a good representation for bonafide irides. However, we would like to learn a stable model with a discriminator that has a better understanding of the distribution of bonafide irides.
2.2.2 Relativistic Standard Generative Adversarial Networks (RSGANs)
In [72], Jolicoeur-Martineau introduced the relativistic discriminator, $D_R$, which aims to maximize the probability that bonafide irides are more real than synthetically generated irides using the following objective function:
$$R(D_R) = -\mathbb{E}_{(x_r, x_s) \sim (\mathbb{P}, \mathbb{Q})}[\log(sig(NT(x_r) - NT(x_s)))]. \quad (2.4)$$
In this case, the training of the discriminator depends on both bonafide and synthetic data. From Equation (2.4), we can see that its gradient depends on $x_r$ as well as $x_s$, which ensures that the generator $G_R$ continues learning to synthesize real looking irides until convergence. In RSGAN, $G_R$ aims to generate images that maximize the probability that they are more real than bonafide samples:
$$R(G_R) = -\mathbb{E}_{(x_r, x_s) \sim (\mathbb{P}, \mathbb{Q})}[\log(sig(NT(x_s) - NT(x_r)))]. \quad (2.5)$$
Therefore, $D_R$ and $G_R$ compete with each other to generate realistic looking and high resolution synthetic irides.
2.2.3 Relativistic Average Standard Generative Adversarial Networks (RaSGANs)
In RSGAN, a sample in distribution $\mathbb{P}$ is compared with every sample in $\mathbb{Q}$ (and vice-versa), which might not be very efficient. Therefore, to make this adversarial network more efficient, Jolicoeur-Martineau [72] updated the objective functions of $D_R$ and $G_R$ to compare a sample in distribution $\mathbb{P}$ with the average of samples from $\mathbb{Q}$ (and vice-versa):
$$R_{avg}(D_R) = -\mathbb{E}_{x_r \sim \mathbb{P}}[\log(\hat{D}(x_r))] - \mathbb{E}_{x_s \sim \mathbb{Q}}[\log(1 - \hat{D}(x_s))], \quad (2.6)$$
$$R_{avg}(G_R) = -\mathbb{E}_{x_s \sim \mathbb{Q}}[\log(\hat{D}(x_s))] - \mathbb{E}_{x_r \sim \mathbb{P}}[\log(1 - \hat{D}(x_r))], \quad (2.7)$$
Figure 2.2: Schematic of the training process for the Relativistic Average Standard Generative Adversarial Network (RaSGAN) using real iris images.
The training images for RaSGAN are first aligned and center-cropped using the pupil-iris center. Cropped images of size 256×256 are then sent to the discriminator for training. The discriminator tries to detect synthesized images while the generator competes with it to generate more realistic synthetic images by back-propagating the loss after each training iteration and updating the weights. For each generated image, an FID score is calculated to evaluate its quality. This process is repeated until images with lower (i.e., better) FID scores are generated.
Here, the relativistic average discriminator $\hat{D}$ used in Equations (2.6) and (2.7) is defined as
$$\hat{D}(x) = \begin{cases} sig(NT(x) - \mathbb{E}_{x_s \sim \mathbb{Q}} NT(x_s)), & \text{if } x = x_r \\ sig(NT(x) - \mathbb{E}_{x_r \sim \mathbb{P}} NT(x_r)), & \text{if } x = x_s. \end{cases} \quad (2.8)$$
This network is referred to as the Relativistic Average Standard Generative Adversarial Network (RaSGAN). The generator and discriminator in RaSGAN use the relative information between bonafide and synthetic data to generate realistic looking and high resolution irides. When the generator competes with the discriminator in this fashion, it gives the discriminator the opportunity to learn a more effective distribution for bonafide irides. This is the key observation that we exploit in this work: the ability of the generator to synthesize more natural looking irides, and the ability of the discriminator to learn a more accurate distribution for bonafide irides.
Figure 2.3: Samples of generated irides from the trained Relativistic Average Standard Generative Adversarial Network (RaSGAN).
2.2.4 Fréchet Inception Distance Score
In [119], Salimans et al. proposed the Inception Score, which applies a pre-trained Inception-V3 network to the generated images and compares the marginal label distribution with the conditional label distribution. The larger the KL-divergence between these two distributions, the higher the Inception Score and the more realistic the generated data is deemed to be. The Inception Score is a useful metric for evaluating the realism of generated images, but it does not include statistics that compare real data against synthetic data. Instead of analyzing synthetic iris images in isolation, the Fréchet Inception Distance [63] compares the statistics of the generated synthetic samples against the real samples:
$$FID = \lVert \mu_r - \mu_s \rVert^2 + Tr\left(\Sigma_r + \Sigma_s - 2\sqrt{\Sigma_r \Sigma_s}\right), \quad (2.9)$$
where $\mu_r$, $\mu_s$, $\Sigma_r$ and $\Sigma_s$ represent the means and covariance matrices of the two distributions, and $Tr$ is the trace of the matrix $(\Sigma_r + \Sigma_s - 2\sqrt{\Sigma_r \Sigma_s})$. Since FID is measured as the distance between the distributions of real and generated data, the lower the FID score, the higher the similarity between real and generated data.
2.3 Proposed Method
In this research, we train RaSGAN under two different settings: (1) Train RaSGAN such that the generator can generate high resolution and realistic looking synthetic irides. The stopping criterion for this setting is designed to achieve the best image quality, which is evaluated using the Fréchet Inception Distance (FID) score; the lower the FID score, the better the image quality. (2) Train a one-class classifier, the Relativistic Discriminator (RD), for generalized iris PA detection. Here, the focus is to learn to separate bonafide irides from their synthetic counterparts. In the process, the RD fits a tight boundary around the bonafide samples, making it an effective one-class anomaly detector, known as RD-PAD.
2.3.1 Synthesizing Irides using RaSGAN
Here, our aim is to utilize the generative power of RaSGAN to generate synthetic irides that have good image quality. The network is trained until the best quality iris images are generated by the generator.
This is achieved by analyzing the quality of generated images at the end of each training iteration until the best possible FID score is obtained. This helps make sure that the trained network can generate realistic looking iris images. The RaSGAN architecture used in this work consists of two important components: relativistic discriminator and generator that are implemented using PyTorch libraries.1 The network is trained using only bonafide samples that are first pre-processed to align them using the center coordinates of the pupil and the iris. The coordinates themselves are obtained using VeriEye SDK.2 The aligned images are further center-cropped and then resized to obtain images of size 256×256. The input to the relativistic discriminator (𝐷 𝑅) are pre-processed bonafide samples and synthetically generated 1https://github.com/alexiajm/relativisticgan 2www.neurotechnology.com/verieye.html 44 irides from 𝐺 𝑅. The input to the generator is a noise sample 𝒛 of size 1×1×128, where 𝒛 is sampled from a normal noise distribution. The architecture of the RaSGAN used in this work is summarized below. • Relativistic Discriminator: The 𝐷 𝑅 in RaSGAN has been constructed using seven convo- lutional layers with kernel size 4×4 and stride=2 (apart from the last convolution layer where stride=1). The first convolutional layer is followed by leaky rectified linear (Leaky-ReLU) units while the remaining layers (except for the last convolutional layer) are followed by both batch normalization and leaky rectified units. • Relativistic Generator: The 𝐺 𝑅 aims to generate natural looking irides of size 256×256 from input 𝒛, and has been implemented using seven transposed convolutional layer. Each layer has a kernel size of 4×4 and stride=2, except for the first transposed layer that has stride=1. Batch normalization and rectified linear units are applied to the output of each transposed convolutional layer. 2.3.2 Relativistic Discriminator- A One-class Presentation Attack Detection Method (RD- PAD) The goal here is to develop a presentation attack detector that can generalize well over previously unseen PAs. Therefore, we focused on learning a good representation of bonafide samples for one- class classification. This has been achieved by utilizing the relativistic discriminator (𝐷 𝑅) from RaSGAN that is trained using only bonafide samples and the corresponding synthetically generated samples from RaSGAN. During training, 𝐷 𝑅 competes with 𝐺 𝑅 to distinguish between bonafide and synthetically generated irides. This enables the discriminator to better learn the distribution of bonafide irides thereby allowing it to distinguish bonafide samples from all types of PA samples. 45 2.3.2.1 Method-I: RD-PAD Trained with Bonafide Samples Only Training a good discriminator is an important aspect of the proposed method. Therefore, as the first step, RaSGAN is trained using bonafide irides only. All the samples used during training are center aligned and cropped to size 256×256, as described in Section 2.3.1. The 𝐷 𝑅 obtained after RaSGAN training outputs the probability that a given input sample belongs to the bonafide distribution, i.e., an ideally trained 𝐷 𝑅 should satisfy 𝐷 𝑅 (𝒙) ≈ 1, when 𝒙 belongs to the bonafide iris category, and 𝐷 𝑅 (𝒙) ≈ 0, when 𝒙 represents some PA sample. 2.3.2.2 Method-II: RD-PAD Fine-tuned with Some PA Samples The 𝐷 𝑅 in Method-I is familiar with the distribution of bonafide samples but has no knowledge of any PA distributions. 
So, it learns a tight boundary encompassing the bonafide class, which can lead to misclassification of some bonafide irides (especially in the cross-sensor scenario). In Method-II, we further expand the capabilities of RD-PAD by fine-tuning $D_R$ using bonafide samples and a few known PAs. This enables $D_R$ to learn the difference between bonafide irides and some PAs, albeit in a limited way. Further, since this research focuses on unseen iris PA detection, the PA types used to fine-tune $D_R$ are mutually disjoint with the PA types used in the test set.
2.4 Experimental Protocols
2.4.1 Dataset Used
In this research, we utilized image samples from multiple iris datasets, viz., Berc-iris-fake [90, 91], Casia-iris-fake [136], LivDet2015 [161], LivDet2017 [162], NDCLD15 [41] and a self-collected dataset named MSU-IrisPA-01, for training and testing under different experimental set-ups. Images in MSU-IrisPA-01 were collected using the IrisID 7000 scanner over multiple sessions. This dataset contains 1,343 bonafide samples, 1,938 printed iris images, 108 colored contact lens images, 352 artificial eyes and 125 Kindle replay-attack images. All the images in these iris datasets are pre-processed to produce images of size 256×256 using segmentation coordinates from VeriEye³. Images that could not be processed by VeriEye were removed from the training sets (as shown in Table 2.1).
³ www.neurotechnology.com/verieye.html
Table 2.1: The iris datasets used in this research. The gray cells represent PA types that are not present in some datasets. The iris images are first pre-processed to produce images of size 256×256. Images that could not be processed by VeriEye were removed from the training sets. On the other hand, all such images are labeled as PAs in the test sets. The datasets are further adjusted to balance the samples in the two classes (bonafide versus PA). Bonafide Printed eyes Cosmetic contact lenses Artificial eyes Kindle display Berc-Iris-Fake [91] Casia-Iris-Fake [136] NDCLD15 [41] LivDet15 [161] LivDet17 [162] MSU-Iris-PA01 [156] Total 2,778 1,200 140 80 Used 6,000 640 740 400 Total 6,000 640 740 400 Used 2,778 1,200 140 80 Used 10,763 7,336 5,287 Total 11,372 12,099 5,287 Total 2,606 4,473 Used 1,402 4,259 Total 1,100 Used 695 1,100 1,100 Total 1,343 1,938 108 352 125 Used 1,000 1,830 108 352 125
2.4.2 Image Realism Assessment
As mentioned earlier, the Fréchet Inception Distance (FID) score [63] can be utilized as a good metric for evaluating the realism of the synthetically generated images; it compares the statistics of the generated synthetic images against the real images to produce a distance score. Hence, the lower the FID score, the higher the similarity between real and generated data. As described in [131], this score can be as high as 400-600 (or even more, depending on the deviation of the generated data from the original distribution), but a score this high would indicate that the quality of the generated dataset is unacceptable. To analyze the quality of irides generated by the RaSGAN that is trained using 2,778 bonafide samples from the Berc-iris-fake dataset, we first generate 6,277 synthetic iris samples and compute an FID score against 6,277 real bonafide samples from the Casia-iris-fake, LivDet2015, NDCLD15 and MSU-IrisPA-01 datasets. With this, we obtained an overall score of 39.17, which is comparable to FID scores obtained in [42]. Hence, we conclude that the RaSGAN-based synthetically generated iris samples closely resemble bonafide iris samples.
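For illustration, the FID computation used above (Equation 2.9) can be sketched as follows, assuming feature vectors have already been extracted from the real and synthetic images with an Inception-V3 network; the feature dimension and the toy inputs shown here are placeholders rather than the exact pipeline used in this work.

```python
import numpy as np
from scipy import linalg

def compute_fid(real_feats, synth_feats):
    """FID between two sets of feature vectors (e.g., Inception-V3 pool features),
    following Equation (2.9): ||mu_r - mu_s||^2 + Tr(S_r + S_s - 2*sqrt(S_r S_s))."""
    mu_r, mu_s = real_feats.mean(axis=0), synth_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_s = np.cov(synth_feats, rowvar=False)

    diff = mu_r - mu_s
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_s, disp=False)  # matrix square root
    if np.iscomplexobj(covmean):                              # drop tiny imaginary parts
        covmean = covmean.real
    return diff @ diff + np.trace(sigma_r + sigma_s - 2.0 * covmean)

# Toy usage with 64-dimensional stand-in features (real Inception features are
# typically 2048-D); a small offset makes the two sets slightly different.
rng = np.random.default_rng(0)
real = rng.standard_normal((300, 64))
synth = rng.standard_normal((300, 64)) + 0.1
print(round(float(compute_fid(real, synth)), 2))
```

A lower value indicates that the two feature distributions, and hence the underlying image sets, are more similar.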
3www.neurotechnology.com/verieye.html 47 (a) Normalized PA score distribution from BSIF+SVM. (b) Normalized PA score distribution from fine-tuned VGG-16. Figure 2.4: Normalized PA score distribution of RaSGAN-based synthetic iris images for Experiment-1 when tested on two PAD algorithms: (a) BSIF+SVM [41] and (b) Fine-tuned VGG- 16 [53]. These histograms emphasize the similarity between bonafide samples and the generated dataset. 48 Table 2.2: Performance (in %) of PAD algorithms in Experiment-0 that is used as baseline for analysis and comparison with other experiments. Here, the PAD methods are trained using 4,312 bonafide samples and 5,538 PA samples. Also, the test set consists of 1,965 bonafide samples and 3,929 PA samples. BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [81] Iris-TLPAD EER TDR(@1%) TDR(@5%) 5.44 71.70 93.79 5.85 86.20 93.17 19.37 21.62 48.80 2.78 96.31 97.15 Table 2.3: Performance (in %) of PAD algorithms in Experiment-1 (top) and Experiment-2 (bottom) when RaSGAN-based synthetic iris images are used as bonafide samples. BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [81] Iris-TLPAD EER TDR(@1%) TDR(@5%) 7.42 81.01 90.45 6.74 82.89 92.33 16.07 28.98 56.25 3.19 95.65 97.25 BSIF+SVM [41] pre-trained VGG-16 [53] DESIST [81] Iris-TLPAD EER TDR(@1%) TDR(@5%) 10.38 60.36 84.27 14.11 51.69 68.36 18.53 43.05 54.50 23.85 53.82 62.93 2.4.3 Applications of RaSGAN based Iris images The generated images are analyzed and evaluated for their usefulness as both bonafide images and presentation attack images, using different PAD methods, viz., DESIST [81], BSIF+SVM [41], Iris-TLPAD [21] and pre-trained VGG-16 [53]. Seven different experiments are conducted with 6,277 RaSGAN-based synthetically generated irides, 6,277 bonafide irides and 9,467 PA samples from Casia-iris-fake, NDCLD15, LivDet2015 and MSU-IrisPA-01 datasets. Further division of these datasets for training and testing the PAD algorithms is explained in the experimental protocols described below. In all cases, train and test sets were mutually disjoint. 49 Table 2.4: Performance (in %) of PAD algorithms in Experiment-3 (top) and Experiment-4 (bottom) when RaSGAN-based synthetic iris images are used as presentation attack images. BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [81] Iris-TLPAD EER TDR(@1%) TDR(@5%) 16.03 52.06 74.96 10.25 75.77 85.14 10.37 83.87 86.06 2.47 95.11 95.17 BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [81] Iris-TLPAD EER TDR(@1%) TDR(@5%) 50.64 0.51 3.10 30.94 3.21 7.43 57.45 0.35 2.23 25.14 1.37 9.16 2.4.3.1 Baseline on Current PAD Algorithms Experiment-0: This experiment is used as a baseline to evaluate the performance of different PAD methods on traditional PAs such as cosmetic contact lenses, printed eyes, Kindle replay-attack and artificial eyes. The PAD algorithms are trained using 4,312 bonafide samples and 5,538 PA samples; the test set consists of 1,965 bonafide samples and 3,929 PA samples. Results for this experiment are summarized in Table 2.2, where Iris-TLPAD achieves the best performance with an Equal Error Rate (EER) as low as 2.78% followed by BSIF and VGG-16 with 5.44% and 5.85%, respectively. 2.4.3.2 Synthetic Iris as Bonafide Sample FID scores do not merely provide an estimate of the quality of the generated images, but also about the similarity between the distributions of the synthetic data and the real data. 
To further establish the “bonafide nature" of the generated synthetic images, we conducted two more experiments: Experiment-1: The PAD algorithms are trained using 4,312 bonafide and 5,538 PA samples including printed eye, cosmetic contact lens, artificial eye and Kindle images. The test set was created using 1,965 bonafide samples, 3,929 PA samples and 1,965 synthetically generated images (labeled as bonafide samples). 50 Experiment-2: This experiment focuses on evaluating the capability of the generated synthetic data to replace the need for bonafide samples. Thus, the PAD algorithms are trained using 4,312 RaSGAN- based synthetic iris images and 5,538 PA samples from Experiment-1. Testing is done on 1,965 bonafide irides and 3,929 PA samples. Analysis: From Table 2.2, we observe that even in the presence of synthetic data (labeled as bonafide) during testing, the performance of PAD algorithms in Experiment-0 and Experiment- 1 are comparable. There is an increase of only 1.98% in the EER of BSIF for Experiment-1. Congruent behavior is observed for other PAD algorithms implying that majority of RaSGAN based synthetic iris images are being classified as bonafide samples (see Figure 2.4). However, when PAD algorithms are trained using synthetic iris images (instead of bonafide images) in Experiment-2, an increase in EER is observed (see Table 2.3). But some of the PAD algorithms still achieve a competitive True Detection Rate (TDR) of 84.27% at 5% False Detection Rate (FDR). This signifies that even though the generated iris images closely resemble bonafide samples, there are some fundamental differences between the two sets of images. This suggests the possibility of exploiting the synthetic images in a different way to enhance PAD algorithms, as will be shown later. 2.4.3.3 Synthetic Iris as Presentation Attack Sample The synthetically generated dataset can be exploited by an adversary to impersonate someone’s identity. In the next two experiments, we study the impact of the synthetic data on PAD algorithms when used as a presentation attack. Experiment-3: In this experiment, we analyzed the performance of the PAD algorithms when the synthetic iris data is used as a “known" presentation attack. So, PAD algorithms are trained using 4,312 bonafide and 4,312 synthetic samples while testing is done using 1,965 samples from each class. Unlike Experiment-1 and 2, here synthetic images are labeled as PA. Experiment-4: In this experiment, we analyzed the performance of the PAD methods when the 51 generated iris data are used as “unseen" presentation attacks. Here, the PAD algorithms are trained using 4,312 bonafide and 4,312 PA samples while testing is done using 1,965 bonafide and 1,965 synthetic samples. Analysis: Comparing the results of Experiment-0 and Experiment-3, we observe a considerable increase in EER when RaSGAN-based synthetic iris images are used as PAs (except for Iris- TLPAD). A decrease in TDR is observed for all PAD algorithms (except for DESIST) that confirms the viability of using RaSGAN generated synthetic images as presentation attack vectors on current state-of-the-art methods. Also, when RaSGAN based synthetic data is used only in the test set as an unseen attack (Experiment-4), a very significant drop in the performance of PAD algorithms is observed. For example, in Table 2.4 (bottom), EER values for BSIF and DESIST are more than 50% with TDR at an FDR of 5% as low as 3.10% and 2.23%, respectively. Similar observation can be made for other PAD algorithms. 
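For clarity, the EER and TDR-at-fixed-FDR values reported in these experiments can be computed from per-image PAD scores in the standard way. The following is a minimal sketch, assuming the convention that a higher score indicates a presentation attack; the thesis does not prescribe a specific implementation.

```python
import numpy as np
from sklearn.metrics import roc_curve

def pad_metrics(bonafide_scores, pa_scores, fdr_targets=(0.01, 0.05)):
    """EER and TDR at fixed FDR operating points for a set of PAD scores.
    Assumed convention: higher score = more likely to be a presentation attack."""
    y_true = np.concatenate([np.zeros(len(bonafide_scores)), np.ones(len(pa_scores))])
    y_score = np.concatenate([bonafide_scores, pa_scores])
    fpr, tpr, _ = roc_curve(y_true, y_score)       # here FPR plays the role of FDR, TPR of TDR
    fnr = 1.0 - tpr                                 # fraction of PAs that are missed
    eer = fpr[np.nanargmin(np.abs(fpr - fnr))]      # operating point where FDR ~= missed-PA rate
    tdr_at = {f: (tpr[fpr <= f].max() if np.any(fpr <= f) else 0.0) for f in fdr_targets}
    return eer, tdr_at

# Example with random scores (illustration only, not the thesis data):
# eer, tdr = pad_metrics(np.random.rand(1965) * 0.6, 0.4 + np.random.rand(3929) * 0.6)
```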
2.4.4 RD-PAD for Seen and Unseen Presentation Attack Detection

In this section, we evaluate the efficacy of RD-PAD in detecting unseen PAs using publicly available iris datasets summarized in Table 2.1. We also compute the performance of current state-of-the-art PAD algorithms, viz., BSIF+SVM [41], pre-trained VGG-16 [53], DESIST [81] and Iris-TLPAD [21], and compare them against that of RD-PAD. A total of 22,638 bonafide irides and 23,597 PA samples are utilized to train and test these algorithms. The PA samples used in this work consist of multiple types of attacks, including cosmetic contact lenses, artificial eyes, Kindle display-attacks and printed eyes.

2.4.4.1 Seen Presentation Attacks

Experiment-5: This is the baseline experiment for RD-PAD, which demonstrates the performance of existing PAD algorithms and the proposed Method-I and Method-II on known PAs. In this experiment, the PAD algorithms were trained using 12,875 bonafide irides and 12,326 PAs containing cosmetic contact lenses, printed eyes, artificial eyes and Kindle display-attacks. On the other hand, the proposed Method-I was trained using only bonafide samples, while Method-II was trained using bonafide samples and only 800 randomly selected known PAs. All the trained algorithms were then tested on 6,207 bonafide and 6,529 PA samples consisting of cosmetic contact lenses, printed eyes, artificial eyes and Kindle display-attacks.

Table 2.5: Attack Presentation Classification Error Rate (APCER) at 0.1%, 1% and 5% Bonafide Presentation Classification Error Rate (BPCER) of existing PAD algorithms and the proposed RD-PAD (Method-I and Method-II) on known PAs as described in Experiment-5. The lower the APCER, the better the performance.

                              APCER(@0.1%)   APCER(@1%)   APCER(@5%)
  BSIF+SVM [41]                   81.83         51.26        27.54
  Pre-trained VGG-16 [53]         30.55         23.27         8.09
  DESIST [81]                     99.04         78.13        43.64
  Iris-TLPAD [21]                 13.34          6.64         0.89
  Method-I                        33.40         26.18        18.35
  Method-II                       24.65         15.68         8.43

2.4.4.2 Unseen Attack: Cosmetic Contact Lenses and Kindle Display

In this section, we evaluate the performance of the PAD algorithms for generalized PA detection when the training data does not include PAs such as cosmetic contact lenses and Kindle display-attacks.

Experiment-6: Here, the other PAD algorithms are trained using 2,778 bonafide samples from Berc-iris-fake [90, 91], and 3,007 printed eyes and artificial eyes from the other datasets. Note that Method-I is trained using only bonafide samples, and Method-II is first trained using only bonafide samples and then fine-tuned using only 800 PA samples. All the trained algorithms are then tested using 3,913 bonafide samples (excluding Berc-Iris-Fake) and 3,279 PA samples corresponding to cosmetic contact lenses and Kindle display-attacks.

Experiment-7: Here, the other PAD algorithms are trained using 6,000 bonafide samples from Casia-Iris-Fake [136], and 6,187 PA samples from the other datasets consisting of only printed eyes and artificial eyes. Similar to the previous experiment, Method-I is trained using only bonafide samples while Method-II is first trained using only bonafide samples and then fine-tuned using only 800 PA samples. These algorithms are tested using 5,634 bonafide samples from other datasets (excluding Casia-Iris-Fake) and 5,556 PAs consisting of cosmetic contact lenses and Kindle display-attacks.

Table 2.6: Attack Presentation Classification Error Rate (APCER) at 0.1%, 1% and 5% Bonafide Presentation Classification Error Rate (BPCER) of existing PAD algorithms and the proposed methods on unseen PAs as described in Experiment-6 and Experiment-7.
Lower the APCER, better is the performance. BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [?] Iris-TLPAD [21] Method-I Method-II APCER(@0.1%) APCER(@1%) APCER(@5%) 99.95 97.83 91.61 99.95 95.88 72.55 94.14 79.29 54.34 99.52 95.25 85.06 74.39 44.35 34.06 50.26 37.05 21.79 (a) Experiment-6 BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [?] Iris-TLPAD [21] Method-I Method-II APCER(@0.1%) APCER(@1%) APCER(@5%) 100 98.31 75.05 99.98 82.04 70.09 100 96.99 83.05 99.63 96.33 89.87 66.98 53.69 38.72 61.80 39.40 26.49 (b) Experiment-7 attacks. 2.4.4.3 Unseen Attack: Printed Eyes and Artificial Eyes In this section, we evaluate the performance of the PAD algorithms when printed eyes and artificial eyes are used as unseen presentation attacks. Experiment-8: In this experiment, the other PAD algorithms are trained using 2,778 bonafide samples from Berc-Iris-Fake and 3,093 PA samples from the other datasets. The PA samples in training set consists of only cosmetic contact lenses and Kindle display-attacks. The proposed Method-I is trained using only bonafide samples and Method-II is first trained using only bonafide samples and then fine-tuned using only 500 PA samples from the training set. The test set consists of 3,450 bonafide samples and 3,347 PA samples corresponding to printed eyes and artificial eyes. Experiment-9: Here, the PAD algorithms are trained using 6,000 bonafide samples from Casia- Iris-Fake [136] and 5,681 PA samples from the other datasets corresponding to cosmetic lenses and Kindle display-attacks. Similar to the previous experiment, Method-I is trained using only bonafide samples while Method-II is first trained using only bonafide samples and then fine-tuned using only 500 PA samples. These algorithms are tested using 8,517 bonafide samples from other datasets (excluding Casia-Iris-Fake) and 8,865 PAs corresponding to printed eyes and artificial eyes. 54 Table 2.7: Attack Presentation Classification Error Rate (APCER) at 0.1%, 1% and 5% Bonafide Presentation Classification Error Rate (BPCER) of existing PAD algorithms and the proposed RD- PAD (Method-I and Method-II) on unseen PAs as described in Experiment-8 and Experiment-9. Lower the APCER, better is the performance. BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [?] Iris-TLPAD [21] Method-I Method-II APCER(@0.1%) APCER(@1%) APCER(@5%) 90.29 90.29 87.75 90.26 80.79 66.78 97.88 93.55 81.35 N/A 27.49 17.06 58.34 45.72 36.74 37.13 27.19 19.56 (a) Experiment-8 BSIF+SVM [41] Pre-trained VGG-16 [53] DESIST [?] Iris-TLPAD [21] Method-I Method-II APCER(@0.1%) APCER(@1%) APCER(@5%) 95.37 90.74 81.52 99.92 94.19 77.69 100 98.90 93.68 N/A 34.86 17.59 60.71 38.30 25.06 32.49 23.30 17.58 (b) Experiment-9 Figure 2.5: ROC curve demonstrating the performance of existing PAD algorithms and the proposed methods on known PAs (as described in Experiment-5). 2.4.4.4 Analysis The results in Table 2.5 show that deep networks such as VGG-16, Iris-TLPAD and the proposed methods achieve good (low) Attack Presentation Classification Error Rate (APCER) at 5% Bonafide 55 (a) (b) Figure 2.6: ROC curves demonstrating the performance of existing PAD algorithms and the proposed RD-PAD methods on unseen PAs, as described in Experiment-6 and Experiment-7. 56 (a) (b) Figure 2.7: ROC curves demonstrating the performance of existing PAD algorithms and the proposed RD-PAD methods on unseen PAs, as described in Experiment-8 and Experiment-9. 57 Presentation Classification Error Rate (BPCER)4 when trained and tested on the same type of PAs. 
However, Tables 2.6 and 2.7 show that current PAD algorithms do not perform well when tested on unseen PAs. In Experiment-6, APCERs of 34.06% and 21.79% are obtained at 5% BPCER for the proposed Method-I and Method-II, respectively. On the other hand, current PAD algorithms obtained a much higher APCER, thereby highlighting the shortcomings of these algorithms for unseen PA detection. In Experiment-8 and Experiment-9, Method-II and TL-PAD obtained comparable performance at 5% BPCER for unseen printed eyes and artificial eyes. However, TL-PAD failed to produce any valid output (N/A) at 0.1% BPCER and has a higher APCER than Method-II at 1% BPCER. Also, TL-PAD performed poorly on unseen cosmetic contact lenses and Kindle display-attacks in Experiment-6 and Experiment-7, indicating its shortcoming in handling these unseen attack types. Comparing all the results, we conclude that the proposed algorithms generalize better over both seen and unseen attacks (see Figures 2.5, 2.6 and 2.7). Additionally, in [21], TL-PAD was evaluated on a subset of the LivDet-Iris 2017 dataset and achieved better performance than the three participating algorithms in that competition. Hence, our results also provide an indirect comparison against the algorithms published in LivDet-Iris 2017.

4APCER is equivalent to (1 - True Detection Rate (TDR)), while BPCER is equivalent to False Detection Rate (FDR).

2.5 Summary

In this work, we designed a new technique based on RaSGAN to generate synthetic irides. Our experimental results suggest that there are multiple applications for synthetic iris images: (1) they can be used to imitate real iris images, reducing the burden of large-scale data collection; (2) they can efficiently model bonafide samples (see Figure 2.4), making them potential presentation attack vectors; and (3) they can be used to train existing PAD algorithms for "unseen" presentation attack detection. While this method can generate realistic looking images with a low FID score, the biometric content of the generated images is not unique, i.e., identities in irides generated by RaSGAN show high resemblance to the training data and to each other (discussed in Chapter 4). Also, this method is not scalable to multiple domains, i.e., one model can only learn to generate a single type of distribution or style. We address this in the next chapter by developing a multi-domain synthetic iris generation method with style transfer.

Apart from analyzing the applications of the generated irides, we also proposed a one-class PA detection method for improved unseen PA detection. To facilitate this, we harnessed the relativistic discriminator of a RaSGAN that is trained to distinguish between bonafide iris samples and the corresponding synthetically generated iris samples. We hypothesize that such a discriminator more effectively learns the distribution of bonafide samples and will, therefore, reject PA samples that do not fall within this distribution. In this regard, the discriminator behaves as a one-class classifier since, in principle, it does not require data from PA samples during the training stage. Experimental results demonstrate the efficacy of the proposed method over current state-of-the-art PAD methods, especially on unseen attacks. However, for seen PAs, current deep learning-based binary PAD methods [98, 127] outperform the proposed method, highlighting its limitations.
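To make the one-class usage of the discriminator concrete, the sketch below shows how a trained relativistic discriminator could score test images and how an operating threshold could be chosen from bonafide validation scores at a target BPCER, along with a possible fine-tuning step for Method-II. The scoring rule, thresholding strategy and fine-tuning hyper-parameters are assumptions for illustration, not the exact implementation used in the experiments.

```python
import numpy as np
import torch

@torch.no_grad()
def pa_scores(d_r, images):
    """PA score = 1 - sigmoid(D_R logit): high for samples that do not resemble bonafide irides."""
    return (1.0 - torch.sigmoid(d_r(images))).cpu().numpy()

def threshold_at_bpcer(bonafide_val_scores, target_bpcer=0.05):
    """Choose the score threshold that misclassifies roughly target_bpcer of bonafide samples."""
    return np.quantile(bonafide_val_scores, 1.0 - target_bpcer)

# Method-I: score test images directly with D_R trained only against RaSGAN samples.
# Method-II (assumed recipe): briefly fine-tune D_R with bonafide (label 1) and a few known PAs (label 0).
def finetune_method2(d_r, loader, epochs=5, lr=1e-5):
    opt = torch.optim.Adam(d_r.parameters(), lr=lr)
    bce = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for imgs, is_bonafide in loader:        # is_bonafide: 1 for bonafide, 0 for PA
            loss = bce(d_r(imgs), is_bonafide.float())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return d_r
```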
59 CHAPTER 3 CYCLIC IMAGE TRANSLATION GENERATIVE ADVERSARIAL NETWORK (CIT-GAN) In this work, we discuss the lack of iris samples from both classes- bonafide and presentation attacks- in terms of number and diversity and how it affects the performance of various presentation attack detection methods. We aim to solve this problem by proposing a novel image translative GAN, known as CIT-GAN, to generate bonafide as well as different types of presentation attacks in NIR spectrum using a unified architecture. This work has been published in [158]. 3.1 Introduction As mentioned previously, the unique texture of the iris has made iris-based recognition sys- tems desirable for human recognition in a number of applications [71]. However, these systems are increasingly facing threats from presentation attacks (PAs) where an adversary attempts to obfuscate their own identity, impersonate someone’s identity, or create a virtual identity [117]. Grasping the threats posed by PAs, researchers have been working on devising methods for iris presentation attack detection (PAD) that aim to distinguish between bonafide and PAs. In [60,115], researchers used textural descriptors like Local Binary Pattern (LBP) and multi-scale binarized statistical image features (BSIF) to detect print attacks. Kohli et al. [81] proposed a variation of LBP to obtain textual information from iris images that helps in detecting cosmetic contact lens. More recently, deep features from Convolutional Neural Networks (CNNs) have been used to detect multiple iris presentation attacks [21] [65]. Yadav et al. [157] utilized the Relativistic Discriminator from a RaSGAN as a one-class classifier for PA detection. Gupta et al. [59] introduced a novel deep learning-based PAD method named MVANet, which harness multiple convolutional layers for enhanced representation and employs a fixed base model to mitigate the computational complex- ities often associated with training deep neural networks. Comprehensive evaluations conducted across various datasets, employing cross-database training-testing configurations, underscore the effectiveness of MVANet in its capacity to generalize over unseen Presentation Attacks (PAs). 60 In [128], Sharma and Ross, proposed D-NetPAD, an iris PAD method founded on the DenseNet convolutional neural network architecture. D-NetPAD demonstrates adaptability in generalizing across diverse PA artifacts, sensors, and datasets. Through a series of experiments conducted on various iris PA datasets, they substantiated the efficacy of D-NetPAD in achieving generalized PA detection. While these methods report high accuracy for iris PA detection, but their performance can be negatively impacted by the absence of a sufficient number of samples from different PA classes [34]. Therefore, we can conclude that current iris PAD methods need a copious amount of training data corresponding to different PA classes and scenarios. Unfortunately, in the real world, such a dataset is hard to acquire. In previous chapter, we leveraged RaSGAN to synthesize realistic and high resolution bonafide iris images. Nevertheless, the network’s scalability is constrained when it comes to handling various domains. To generate both bonafide and presentation attack (PA) images, it would necessitate the training of separate RaSGANs for each class (PA class has many sub-class such as printed eyes, cosmetic contact lens, etc.). This approach does not provide a viable solution to address the issue at hand. 
With recent advances in the field of deep learning, researchers have proposed different methods based on Convolutional Autoencoders [132,143] and Generative Adversarial Networks (GANs) [57] for image-to-image style translation. Here, image-to-image translation refers to learning a mapping between different visual domain categories each of which has its own unique appearance and style. Gatys et al. [53] proposed a neural architecture that could separate image content from style and then combine the two in different combinations to generate new natural looking styles. Their paper mainly focused on learning styles from well known artworks to generate new high quality and natural looking artwork. Karras et al. [77] introduced StyleGAN that uses a non-linear mapping function to embed a style driven latent code to generate images with different styles. However, since the input to the generator in StyleGAN is a noise vector, non-trivial efforts are required to transform image from one domain to another. Some researchers overcame this issue by enforcing an overlay between generator’s input and output for diversity in generated images using either marginal matching [7] or diversity regularization [163]. Others approached style transfer with the 61 guidance of some reference images [20, 24]. However, these methods are not scalable to more than two domains and often show instability in the presence of multiple domains [26]. Choi et al. [25, 26] proposed to solve this problem by using a unified GAN architecture called StarGAN v2 for style transfer that can generate diverse images across multiple domains. StarGAN v2 uses a single generator with a mapping and Styling Network to learn diverse mappings between all available domains. This method is scalable to multiple domain and aims to transfer style from source to target domain using a reference image. While this method offers a viable solution to our research problem, the images generated by StarGAN v2 needs improvement with respect to quality, especially while transferring style from one domain to another [158, 159]. In this work, we propose a GAN architecture that uses a novel Styling Network to drive the translation of input image into multiple target domains. Here, the generator takes an image as input along with a domain label and then generates images with characteristics of the target domains. Apart from the domain label, the generative capability of the network is enhanced using a multi- task Styling Network that learns the style codes for multiple PA domains and helps the generator to synthesize images reflecting the style components of the target PA domains. The domain- specific style characteristics learned using Styling Network depend on both style loss and domain classification. This ensures variability in style characteristics within each domain. Since there are multiple domains, the discriminator has multiple output branches to decide if a given image is real or synthetic for each of the domains. The primary contributions of this research are as follows: • We propose a unified GAN architecture, referred to as Cyclic Image Translation Generative Adversarial Network (CIT-GAN), with novel Styling Network to generate realistic looking and high resolution synthetic images for multiple domains. The quality of generated samples is evaluated using Fréchet Inception Distance (FID) Score [119]. • We demonstrate the usefulness of the synthetically generated data to train state-of-the-art iris PAD methods and improve their accuracy. 
62 Figure 3.1: Schematic of the proposed Cyclic Image Translation Generative Adversarial Network (CIT-GAN). The proposed architecture has three important components: (a) Generator: Unlike a standard generator, this network takes as an image 𝑋 from a domain with label as input (represented as [1,0,0] in the figure) and outputs an image 𝑌 ′ with style similar to a reference image 𝑌 with domain label [0,1,0]. (b) Styling Network: The image-to-image translation to multiple domains is driven by a Styling Network that learns the style code for each given domain. (c) Discriminator: Unlike a standard discriminator, the discriminator in the proposed method has multiple branches each of which determines whether the input image is real or synthetic pertaining to that domain. 3.2 Proposed Method Let 𝒙 ∼ X be an input image and 𝒅 ∼ 𝔇 be an arbitrary domain from domain space 𝔇. The proposed method aims to translate image 𝒙 to synthetic image 𝒚 with style characteristics of domain 𝒅. This is achieved using a Styling Network 𝑆 that is trained to learn domain specific style codes, and then train 𝐺 to generate synthetic images with the given target style codes (see Figure 3.2). 3.2.1 Generative Adversarial Network Unlike standard GAN, the generative adversarial network in the proposed method has been updated to include domain level information. These changes are reflected in each component of the proposed architecture (as shown in 3.1) : • Generator: For image-to-image translation between multiple domains, 𝐺 takes as input 𝒙 ∈ (𝒙, 𝒍), where 𝒙 is an image and 𝒍 is the domain label, and translates it to an image 𝐺 (𝒙, 𝒔) with the desired style code 𝒔. The style code 𝒔 is facilitated by Styling Network 𝑆 and injected into G. 63 Figure 3.2: Examples of image translation from source to reference domain via CIT-GAN using domain specific styling vector obtained from the Styling Network. • Discriminator: Discriminator in the proposed architecture has multiple branches, where each branch 𝐷𝑑 decides whether the input image 𝒙 is a real image in domain 𝒅 or a synthetic image. With the new objective for the generative adversarial network, the adversarial loss can be updated as, L𝑎𝑑𝑣 = E𝒙,𝒅 [𝑙𝑜𝑔(𝐷 𝒅 (𝒙))] 𝒅′ (𝐺 (𝒙, 𝒔′)))]. 𝒙,𝒅′ [𝑙𝑜𝑔(1 − 𝐷 +E (3.1) Here, 𝐷 𝒅 (𝑥) outputs a decision on image 𝒙 for domain branch 𝒅. The Styling Network 𝑆 takes an image 𝒚 from target domain 𝒅′ and outputs a style code 𝒔′. 𝐺 (𝒙, 𝒔′) generates image 𝑦′ with style characteristics of target domain 𝑑′. 3.2.2 Styling Network Given an input image 𝒙 belonging to domain 𝒅, the Styling Network 𝑆 encodes the image into a style code 𝒔. Similar to 𝐷, the Styling Network 𝑆 is a multi-task network that learns the style code for an input image and injects the style code into 𝐺 to generate images with the given style codes. 64 This is achieved using [26], L𝑠𝑡𝑦𝑙𝑒 = E 𝒙,𝒅′ [∥𝒔′ − 𝑆 𝒅′ (𝐺 (𝒙, 𝒔′)) ∥]. (3.2) Here, 𝑠′ = 𝑆(𝑦) is the style code of reference image 𝒚 belonging to target domain 𝒅′. This ensures that 𝐺 generates images with the specified style code. However, poor quality synthetic data in the initial training iterations can affect the quality of the domain specific style codes learned by 𝑆. To avoid this, we introduce a domain classification loss L𝑐𝑙𝑠 at the shared layer of 𝑆 from soft-max layer (as shown in Figure 3.1) to ensure that the learnt style code aligns with the correct domain. Further, this helps the Styling Network to learn style vectors (or feature characteristics) of varying samples from same domain. L𝑐𝑙𝑠 = −𝑙𝑜𝑔𝑃(𝐷 = 𝒅|𝑋 = 𝒙). 
(3.3) Here, 𝒅 is the true domain of input 𝒙. 3.2.3 Cycle Consistency While translating images from the source domain to the domains depicted by the reference images, it is important to preserve some characteristics of the input images (such as geometry, pose and eye lashes in case of irides). This is achieved using the cycle consistency loss [25], L𝑐𝑦𝑐𝑙𝑒 = E 𝒙,𝒅,𝒅′ [∥𝒙 − 𝐺 (𝐺 (𝒙, 𝒔′), 𝑠) ∥]. (3.4) Here, 𝒔 = 𝑆(𝑥) represents the style code of input image with domain 𝒅, and 𝒔′ = 𝑆(𝑦) is the style code of reference image in target domain 𝒅′. This ensures that image 𝒙 with style 𝒔 can be reconstructed using synthetic image 𝐺 (𝒙, 𝒔′). Hence, the overall loss function for the proposed Cyclic Image Translation Generative Adver- sarial Network can be defined as: L𝑡𝑜𝑡𝑎𝑙 = L𝑎𝑑𝑣 + 𝜆1L𝑠𝑡𝑦𝑙𝑒 + 𝜆2L𝑐𝑙𝑠 + 𝜆3L𝑐𝑦𝑐𝑙𝑒. (3.5) Here, 𝜆1, 𝜆2 and 𝜆3 represent the hyper-parameters for each loss term. 65 Figure 3.3: Given are some examples of synthetic samples generated using the proposed Cyclic Image Translation Generative Adversarial Network (CIT-GAN). (a)-(b) represent synthetic cosmetic contact lenses, (c)-(d) are two different types of synthetically generated print images (one with whole iris image and other with pupil cut-out), and (e)-(f) are synthetic artificial eyes with (f) representing a doll eye. 3.3 Experimental Protocols In this section, we describe different experimental setups that are used to evaluate the quality and usefulness of synthetic PA samples generated using CIT-GAN. We evaluated the performance of different iris PAD methods viz., VGG-16 [53], BSIF [41], DESIST [81], D-NetPAD [128] and AlexNet [84] under these different experimental setups for analysis purposes. Note that D-NetPAD is one of the best performing PAD algorithms in the iris liveness detection competition (LivDet-20 edition) [36]. 3.3.1 Datasets Used In this research, we utilized five different iris PA datasets, viz., Casia-iris-fake [136], Berc-iris- fake [91], NDCLD15 [41], LivDet2017 [162] and MSU-IrisPA-01 [156] for training and testing different iris presentation attack detection (PAD) algorithms. These iris datasets contain bonafide images and images from different PA classes such as cosmetic contact lenses, printed eyes, artificial eyes and kindle-display attack (as shown in Figure 2.1). The images in these datasets are pre- processed and cropped to a size of 256x256 around the iris using the coordinates from a software called VeriEye.1 The images that were not properly processed by VeriEye were discarded from the datasets as this research focuses primarily on image synthesis. This give us a total of 24,409 1www.neurotechnology.com/verieye.html 66 bonafide irides, 6,824 cosmetic contact lenses, 680 artificial eyes and 13,293 printed eyes. 3.3.2 Image Realism Assessment The proposed architecture is trained using 6,450 bonafide images, 2,104 cosmetic contact lenses, 4,482 printed eyes and 276 artificial eyes randomly selected from the aforementioned datasets. The trained network is then utilized to generate synthetic PA samples. To achieve this, 6,000 bonafide images were utilized as source images. The source images are then translated to different PA classes using 2,000 printed eyes, 2,000 cosmetic contact lens and 276 artificial eyes as reference images. Using this approach, we generated 8,000 samples for each PA class. The generated samples from CIT-GAN obtained an average FID score of 32.79. 
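The FID protocol used here, and for the comparisons reported next, can be sketched as follows. The thesis does not specify which FID implementation was used; the sketch below relies on the torchmetrics implementation as one possibility, and the handling of single-channel NIR crops (resizing and channel replication before feature extraction) is an assumption.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

def fid_between(real_batches, synthetic_batches):
    """FID between two image sets. Batches are assumed to be uint8 tensors of shape
    (N, 3, 299, 299); single-channel NIR crops would first be resized and repeated
    across three channels. Lower FID indicates closer distributions."""
    fid = FrechetInceptionDistance(feature=2048)
    for real in real_batches:
        fid.update(real, real=True)
    for fake in synthetic_batches:
        fid.update(fake, real=False)
    return fid.compute().item()

# Example usage (illustration only):
# real = [torch.randint(0, 255, (32, 3, 299, 299), dtype=torch.uint8)]
# fake = [torch.randint(0, 255, (32, 3, 299, 299), dtype=torch.uint8)]
# print(fid_between(real, fake))
```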
For comparison purposes, we used the same train and evaluation setup to generate synthetic samples using Star-GAN [25], Star-GAN v2 [26] and Style-GAN [77]. As mentioned before, Style- GAN and Star-GAN are not well-equipped to handle multi-domain image translation. Therefore, they obtained a high average FID score of 86.69 and 44.76, respectively. On the other hand, Star-GAN v2 is equipped to handle multi-domains using a styling and mapping network. A trained Star-GAN v2 utilizes the mapping network to generate diverse style codes to diversify images. The synthetic iris PAs generated using this method were diverse in nature, but failed to capture the true characteristics of PAs like artificial eyes. Hence, the average FID score of the generated image using Star-GAN v2 was 38.81 - much lower than that of Style-GAN and Star-GAN, but still a bit higher than CIT-GAN. This can also be seen in the FID score distribution in Figure 3.4 that compares the synthetically generated data using Star-GAN v2 with that of CIT-GAN. 3.3.3 Utility of Synthetically Generated Images In this section, we describe different experimental setups that are used to evaluate the quality and usefulness of synthetic PA samples generated using CIT-GAN. We evaluated the performance of different iris PAD methods viz., VGG-16 [53], BSIF [41], DESIST [81], D-NetPAD [128] and AlexNet [84] under these different experimental setups for analysis purposes. Note that D-NetPAD 67 Figure 3.4: Comparing the FID score distributions of Star-GAN v2 [26] and CIT-GAN for each synthetically generated PA domain. A lower FID is better. Table 3.1: Experiment-1: True Detection Rate (TDR in %) at 0.1%, 0.2% and 1.0% False Detection Rate (FDR) of existing iris PAD algorithms in baseline experiment (referred to as Experiment-1) when trained using imbalanced samples across different PA classes. BSIF+SVM [41] Fine-Tuned VGG-16 [53] DESIST [81] Fine-Tuned AlexNet [84] D-NetPAD [128] TDR (@0.1%) TDR (@0.2%) TDR (@1.0%) 3.32 6.15 28.11 85.25 83.86 89.07 4.25 5.85 17.15 86.10 87.29 90.51 87.94 88.91 92.54 Table 3.2: Experiment-2: True Detection Rate (TDR in %) at 0.1%, 0.2% and 1.0% False Detection Rate (FDR) of existing iris PAD algorithms in Experiment-2 to evaluate the equivalence between real and synthetic PAs. When comparing with the performances in Table 2.3, it can be seen that substituting some of the real PAs in the training set with synthetically generated samples has a limited impact on the performance of PAD algorithms, especially at a FDR of 1%. BSIF+SVM [41] Fine-Tuned VGG-16 [53] DESIST [81] Fine-Tuned AlexNet [84] D-NetPAD [128] TDR (@0.1%) TDR (@0.2%) TDR (@1.0%) 11.09 18.06 29.38 78.29 83.03 87.39 4.48 7.43 17.43 79.90 84.13 89.66 84.27 86.04 89.55 is one of the best performing PAD algorithms in the iris liveness detection competition (LivDet-20 edition) [36]. Experiment-1: This is the baseline experiment that demonstrates the performance of the current iris PAD methods on the previously mentioned datasets with imbalanced samples across different PA classes. The PAD methods are trained using 14,970 bonafide samples and 10,306 PA samples consisting of 276 artificial eyes, 4,014 cosmetic contact lenses and 6,016 printed eyes. The test set consists of 9,439 bonafide samples and 9,896 PA samples corresponding to 404 artificial eyes, 2,720 cosmetic contact lenses and 6,772 printed eyes. 
Experiment-2: In this experiment, our aim is to evaluate the equivalence between real PAs and 68 Table 3.3: Experiment-3: True Detection Rate (TDR in %) at 0.1%, 0.2% and 1.0% False Detection Rate (FDR) of existing iris PAD algorithms in Experiment-3 to evaluate the efficacy of proposed method, CIT-GAN, in generating synthetic PA samples that captures the real PA distribution across various PA domains. BSIF+SVM [41] Fine-Tuned VGG-16 [53] DESIST [81] Fine-Tuned AlexNet [84] D-NetPAD [129] TDR (@0.1%) TDR (@0.2%) TDR (@1.0%) 7.57 10.51 29.43 76.12 80.98 85.81 2.31 4.70 15.29 78.89 82.66 88.37 82.26 84.54 88.86 synthetically generated PAs. Therefore, the iris PAD methods are trained using 14,970 bonafide samples and PAs consisting of both real PA images and synthetic PA images. The real PA dataset has 138 artificial eyes, 2,007 cosmetic contact lenses and 3,008 printed eyes. The synthetic PA dataset is generated using the remainder of the real PA dataset as reference images (i.e., 138 artificial eyes, 2,007 cosmetic contact lenses and 3,008 printed eyes) in order to capture their style characteristics in the generated dataset. As before, the test set consists of 9,439 bonafide and 9,896 PA samples corresponding to 404 artificial eyes, 2,720 cosmetic contact lenses and 6,772 printed eyes. Experiment-3: This experiment aims to evaluate the efficacy of the proposed method, CIT-GAN, in generating synthetic PA samples that represent the real PA distribution across various PA domains. Here, the iris PAD methods are trained using 14,970 bonafide samples and synthetically generated 276 artificial eyes, 4,014 cosmetic contact lenses and 6,016 printed eyes. The test set consists of 9,439 bonafide and 9,896 PA samples corresponding to 404 artificial eyes, 2,720 cosmetic contact lenses and 6,772 printed eyes. Experiment-4: As mentioned in Experiment-1, current iris PAD methods are trained and tested on imbalanced samples from PA classes thereby affecting their accuracy. To overcome this, we train the iris PAD methods using 14,970 bonafide samples and a balanced set of 15,000 PA samples corresponding to 276 artificial eyes, 4,014 cosmetic contact lenses and 5000 printed eyes that are real; and 4,724 artificial eyes and 986 cosmetic contact lenses that are synthetic. This balances the number of samples across PA classes. The testing was done on 9,439 bonafide samples and 9,896 PA samples consisting of 404 artificial eyes, 2,720 cosmetic contact lenses and 6,772 printed eyes. 69 Table 3.4: Experiment-4: True Detection Rate (TDR in %) at 0.1%, 0.2% and 1.0% False Detec- tion Rate (FDR) of existing iris PAD algorithms in Experiment-4. When comparing against the performance numbers in Table 2.3, it can be seen that training using balanced samples from each PA class/domain helps improve the performance of current iris PAD algorithms. BSIF+SVM [41] Fine-Tuned VGG-16 [53] DESIST [81] Fine-Tuned AlexNet [84] D-NetPAD [129] TDR (@0.1%) TDR (@0.2%) TDR (@1.0%) 14.39 22.91 51.11 69.40 75.34 91.60 2.59 5.38 21.41 79.26 82.80 92.70 90.38 94.19 97.89 3.4 Results & Analysis In this section, we discuss the results obtained for the four different experiments described in the previous section. Experiment-1 is the baseline experiment that evaluates the performance of various iris PAD methods. The training set for this experiment contains 14,970 bonafide and 10,306 PA samples consisting of 276 artificial eyes, 4,014 cosmetic contact lenses and 6,016 printed eyes. 
Due to imbalance in the number of samples across various PA domains, the performance of the PAD methods is affected. This becomes apparent when comparing the results of Experiment-1 with that of Experiment-4 where PAD methods are trained using 9,439 bonafide samples and a balanced number of PA samples (i.e., 5,000 samples from each PA domain) containing both real and synthetic PAs. As seen from the results in Table 3.1 and Table 3.4, performance for each PAD method improves in Experiment-4. For example, in the case of D-NetPAD, the TDR at a 1% FDR improved from 92.54% in Experiment-1 to 97.89% in Experiment-4 (as shown in Figure 3.7). A huge increase in performance was also noticed for BSIF+SVM where TDR improved from 28.11% in Experiment-1 to 51.11% in Experiment-4, at a FDR of 1%. In addition, the equivalence of synthetically generated PA samples and real PA samples was established using Experiment-2 and Experiment-3. In Experiment-2, some of the real PA samples in the training set were replaced with synthetically generated PAs. Comparing the performance in Table 3.1 and Table 3.2, a very slight difference in PAD performance is observed (see Figure 3.5). Similarly, in Experiment-3, where all the real PAs are replaced with synthetically generated PA samples, only a slight decrease in performance was seen for the PAD methods (as shown in Figure 3.6) signifying underlying similarities between real and synthetically generated data. 70 3.5 Summary PA detection methods such as [59,65,128] have demonstrated their effectiveness in generalized PA detection through extensive experimentation on various iris PA datasets. While these methods achieve high accuracy in iris PA detection, their performance can be adversely affected when there is an insufficient number of samples available from diverse PA classes [34]. Consequently, it can be inferred that existing iris deep learning-based PAD methods require a substantial amount of training data encompassing various PA classes and scenarios. Unfortunately, acquiring such a dataset is a formidable challenge in real-world settings. To address this challenge, we proposed a novel GAN architecture known as the Cyclic Image Translation Generative Adversarial Network (CIT-GAN). CIT-GAN incorporates a unique Styling Network to guide the transformation of input images into multiple target domains. In this setup, the generator accepts an input image with a domain label, generating images with the characteristics of the specified target domain. To further enhance the generative capacity of the network, a multi-task Styling Network is employed. This network learns style codes for multiple PA domains, aiding the generator in synthesizing images that capture the style elements of the target domains. The domain-specific style features acquired through the Styling Network are influenced by both style loss and domain classification, ensuring variability in style characteristics within each domain. As there are multiple domains to consider, the discriminator incorporates multiple output branches to determine whether a given image is real or synthetic for each of the domains. The results obtained on various experimental setups show the equivalence of synthetically generated PA samples and real PA samples, showing how realistic the generated PA samples are. Furthermore, the results in Table 3.1 and 3.2 demonstrate that the performance of the iris PAD methods can be improved by adding synthetically generated data to different PA classes for balanced training. 
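To make the combined objective in Eq. (3.5) concrete, the sketch below shows how the four loss terms could be assembled for one generator update. The network interfaces (a Styling Network returning a style code plus domain logits from its shared layer, and a multi-branch discriminator returning one logit per domain), the non-saturating form of the adversarial term, the L1 norms, and the lambda values are assumptions made for illustration; they are not the exact implementation.

```python
import torch
import torch.nn.functional as F

def citgan_generator_step(G, D, S, x_src, x_ref, d_ref,
                          lam_style=1.0, lam_cls=1.0, lam_cycle=1.0):
    """One generator update combining the loss terms of Eqs. (3.1)-(3.5).
    G(x, s): generator conditioned on a style code; S(x) -> (style_code, domain_logits);
    D(x): (N, num_domains) logits, one branch per domain; d_ref: target-domain labels."""
    s_ref, _ = S(x_ref)                     # style code of the reference (target-domain) image
    x_fake = G(x_src, s_ref)                # translate the source image into the target domain

    # Adversarial term: fool the target-domain branch of the multi-branch discriminator (Eq. 3.1).
    fake_logit = D(x_fake).gather(1, d_ref.view(-1, 1))
    loss_adv = F.binary_cross_entropy_with_logits(fake_logit, torch.ones_like(fake_logit))

    # Style reconstruction: the Styling Network should recover s_ref from the fake image (Eq. 3.2).
    s_fake, _ = S(x_fake)
    loss_style = torch.mean(torch.abs(s_fake - s_ref))

    # Domain classification at the shared layer of S for a real input of known domain (Eq. 3.3).
    _, dom_logits_ref = S(x_ref)
    loss_cls = F.cross_entropy(dom_logits_ref, d_ref)

    # Cycle consistency: map the fake image back using the source style code (Eq. 3.4).
    s_src, _ = S(x_src)
    loss_cycle = torch.mean(torch.abs(G(x_fake, s_src) - x_src))

    # Total objective (Eq. 3.5).
    return loss_adv + lam_style * loss_style + lam_cls * loss_cls + lam_cycle * loss_cycle
```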
71 Figure 3.5: Comparing performance of D-NetPAD in Experiment-1 and Experiment-2 to evaluate the equivalence of synthetic PA samples in replacing real PA samples. D-NetPAD is one of the best performing PAD algorithms in iris liveness detection competition (LivDet-20 edition) [36]. 72 Figure 3.6: Comparing performance of D-NetPAD in Experiment-1 and Experiment-3 to evaluate the efficacy of the proposed method, CIT-GAN, in generating synthetic PA samples that represent the real PA distribution across various PA domains. 73 Figure 3.7: Comparing the performance of D-NetPAD in Experiment-1 and Experiment-4 to empha- size that the performance of current iris PAD methods are affected due to training with imbalanced samples from PA classes (Experiment-1). Improved performance is reported in Experiment-4 that utilizes synthetic samples for balanced training. 74 CHAPTER 4 IWARPGAN: DISENTANGLING IDENTITY AND STYLE TO GENERATE SYNTHETIC IRIS IMAGES So far, we focused on generating partially synthetic iris images in NIR spectrum where identities in the generated images is not the focus of our study. In this chapter, we explore the capabilities and weaknesses of current methods in generating iris images with identities that are different from training dataset and has sufficient intra-class variations to represent real iris datasets. We also proposed a novel method, known as iWarpGAN, to generate fully synthetic iris images NIR spectrum with identities that are different from training dataset (with both inter and intra-class variations). This work has been published in [159]. 4.1 Introduction With the advent of technology, iris sensors are now available in commercial and personal devices, paving the way for secure recognition and access control [55, 111]. However, the accuracy of iris recognition systems relies heavily on the quality and size of the dataset used for training. The limited availability of large-scale iris datasets due to the difficulty in collecting operational quality iris images, has become a major challenge in this field. For example, most of the iris datasets available in the literature have frontal view images [5, 86], and the number of subjects and total number of samples in these datasets are limited. Further, in some instances, collecting and sharing iris datasets may be stymied due to privacy or legal concerns [145]. Therefore, researchers have been studying the texture and morphology of the iris in order to model its unique patterns and to create large-scale synthetic iris datasets. For example, Cui et al. [32] utilized principal component analysis with super-resolution to generate synthetic iris images. Shah and Ross [126] used a Markov model to capture and synthesize the iris texture followed by embedding of elements such as spots and stripes to improve visual realism. In [173], Zuo et al. analyzed various features of real iris images, such as texture, boundary regions, eyelashes, etc. and used these features to 75 create a generative model based on the Hidden Markov Model for synthetic iris image generation. These methods while successfully generating synthetic iris images, are found lacking in terms of quality (visual realism and good-resolution) and diversity in the generated samples [156]. Over the past few years, deep learning-based approaches have set a benchmark in various fields including synthetic image generation and attribute editing, using Convolutional Autoencoders (CAEs) [143] and Generative Adversarial Networks (GANs) [56, 57]. 
In [82, 156, 158], authors proposed GAN-based synthetic image generation methods that input a random noise vector and output a synthetic iris image. While these methods address some of the concerns mentioned previously, the generated images are often similar to each other [140]. Additionally, due to insufficient number of training samples, the generator is often over-trained to synthesize images with patterns seen during training [140], which affects the uniqueness of the synthesized iris images. In this work, we address the following limitations of current synthetic iris generators: (1) difficulty in generating realistic looking and high resolution synthetic iris images, (2) failure to incorporate inter and intra class variations in the generated images, (3) generating images that are similar to the training data, and (4) utilizing domain-knowledge to guide the synthetic generation process. We achieve this by proposing iWarpGAN that aims to disentangle identity and style using two transformation pathways: (1) Identity Transformation and (2) Style Transformation. The goal of Identity Transformation pathway is to transform the identity of the input iris image in the latent space to generate identities that are different from the training set. This is achieved by learning RBF-based warp function, 𝑓 𝑝, in the latent space of a GAN, whose gradient gives non-linear paths along the 𝑝𝑡ℎ family of paths for each latent code 𝑧 ∈ R. The Style Transformation pathway aims to generate images with different styles, which are extracted from a reference iris image, without changing the identity. Therefore, by concatenating the reference style code with the transformed identity code, iWarpGAN generates iris images with both inter and intra-class variations. Thus, the contributions of this research are as follows: • We propose a synthetic image generation method, iWarpGAN, which aims to disentangle identity and style in two steps: identity transformation and style transformation. 76 • We evaluate the quality and realism of the generated iris images using ISO/IEC 29794-6 Standard Quality Metrics which uses a non-reference single image quality evaluation method. • We show the utility of the generated iris dataset in training deep-learning based iris matchers by increasing the number of identities and overall images in the dataset. 4.2 Background As mentioned previously, GANs [56] are generative models that typically take a random noise vector as input and output a visually realistic synthetic image. A GAN consists of two main components: (1) Generative Network known as Generator (𝐺), and (2) Discriminative Network known as Discriminator (𝐷) that are in competition with each other. The Generator aims to generate realistic looking images that can fool the discriminator, while Discriminator (𝐷) aims to distinguish between real and synthetic images generated by 𝐺. In the literature, different methods have been proposed to generate generate realistic looking biometric images such as face, iris and fingerprint. Some of these methods are discussed below: • Generation using Random Noise: Kohli et al. [82] proposed a GAN-based approach to synthesize cropped iris images using iris Deep Convolution Generative Adversarial Network (iDCGAN). While this method generates realistic looking cropped iris images of size 64×64, unrealistic distortions and noise were observed when trained to generate high resolution images. In [156], Yadav et al. 
overcame this issue by utilizing Relativistic Average Standard Generative Network (RaSGAN) that aims to generate realistic looking and high resolution iris images. However, since RaSGAN generates synthetic images from a random noise vector, it is hard to generate irides with intra-class variations. Also, as shown in Figure 4.4, the uniqueness of generated images is limited and the network was often observed to repeat certain patterns, restricting the diversity in the generated dataset. Wang et al. [147] proposed a method for generating iris images that exhibit a wide range of intra- and inter- class variations. Their approach incorporates contrastive learning techniques to effectively 77 disentangle identity-related features, such as iris texture and eye orientation, from condition- variant features, such as pupil size and iris exposure ratio, in the generated images. While their method seems promising but the experiments presented in their paper [147] are not sufficient to comment on realism and uniqueness of the generated iris images. • Generation via Image Translation: Image translation refers to the process of translating an image from one domain to another by learning the mapping between various domains. Therefore, image translation GANs focus on translating a source image to the target domain with the purpose of either changing some style attribute in the source image or adding/mixing different styles together. For example, StyleGAN [77] learns a mapping to different styles in face images (such as hair color, gender, expression, etc.) using a non-linear mapping function that embeds the style code of the target domain into the generated image. Unlike StyleGAN, StarGAN [26] and CIT-GAN [158] require paired training data to translate a source image to an image with the attributes of the target domain using style code of a reference image. This forces the generator to learn mappings across various domains, making it scalable to multiple domains. However, when trained using real iris images, StarGAN and CIT-GAN were seen to assume the identity of the source image (as shown in Figures 4.6 and 4.7). So, both methods fail to generate irides whose identities are not present in the training dataset. There are other GAN-based methods in the literature that aim to edit certain portions of the image using warp fields or color transformations. Warp fields have been widely used for editing images such as modifying eye-gaze [52], semantically adding objects to an image [169], reconstructing facial features [165], etc. Dorta et. al [40] argues that warp fields are more comprehensive than pixel differences that allow more flexibility in terms of partial edits. Geng et al. [54] proposed WG-GAN that aims to fit a dense warp field to an input source image to translate it according to the target image. This method showed good results at low resolution, but the quality of synthetic data deteriorates at high resolution. Also, as mentioned earlier, the source-target relationship in WG-GAN can restrict the uniqueness of the output image. Dorta et al. [40] overcame these issues 78 by proposing WarpGAN that allows partial edits without the dependency on the source-target image pair. The generator takes as input a source image and a target attribute vector and then learns the warp field to make the desired edits in the source image. This method has been proven to make more realistic semantic edits in the input image than StarGAN and CycleGAN [170]. 
Further, with the ability of controlled or partial edits, WarpGAN provides the mechanism to generate images with intra-class variations. However, using a real image as input to the generator restricts the number of unique images that can be generated from this network. 4.3 Proposed Method In this section, we will discuss the proposed method, iWarpGAN, that has the capability to synthesize an iris dataset in such a way that: (1) it contains iris images with unique identities that are not seen during training, (2) generates multiple samples per identity, (3) it is scalable to hundred thousand unique identities, and (4) images are generated in real-time. Let 𝑥𝑑1 𝑠1 ∈ P be an input image with identity 𝑑1 and style 𝑠1, and another input image 𝑥𝑑2 𝑠2 ∈ P with identity 𝑑2 and style 𝑠2. Here, 𝑠1 and 𝑠2 denote image with attribute 𝑦. The attribute vector 𝑦 is a 12-bit binary vector, where the first 5 bits correspond to a one-hot encoding of angle, the next 5 bits correspond to a one-hot encoding of position shift, and the last 2 bits denote contraction and dilation, respectively. Here, angle and position define eye orientation and the shift of iris center in the given image. The possible angles are 0𝑜, 10𝑜, 12𝑜, 15𝑜, 18𝑜 and the possible position shifts are [0,0], [5,5], [10,10], [-10,10], [-10,-10]. For example, an image with angle 10𝑜, position shift [0,0] and dilation, the attribute vector 𝑦 will be [0,1,0,0,0,1,0,0,0,0,0,1]. The angle value defines the image orientation and position defines the offset of the iris center from the image center. Given 𝑥𝑑1 𝑠1 and 𝑥𝑑2 𝑠2 with identity 𝑑3 different from the training data and possessing the style attribute 𝑠2 from 𝑥𝑑2 𝑠2 . To achieve this, as shown in Figure 4.1, the framework of iWarpGAN has been divided into five parts: (1) Style Encoder, 𝐸𝑆, that encodes 𝑠2 , our aim is to synthesize a new iris image 𝑥𝑑3 style of the input image, (2) Identity Encoder, 𝐸𝐷, that learns an encoding to generate an identity different from the input image, (3) Generative Network, 𝐺, that uses encoding from both 𝐸𝐷 and 79 Figure 4.1: The proposed iWarpGAN consists of five parts: (1) Style Encoder, 𝐸𝑆, that aims to encode the style of the input image as 𝑠, (2) Identity Encoder, 𝐸𝐷, that aims to learn encoding 𝑑 that generates an identity different from the input image, (3) Generative Network, 𝐺, that uses encoding from both 𝐸𝐷 and 𝐸𝑆 to generate an image with a unique identity and the given style attribute, (4) Discriminator, 𝐷, that inputs either a real or synthetic image and predicts whether the image is real or synthetic and also emits an attribute vector 𝑦′ ∈ {angle, position, contraction, dilation of pupil}, and (5) Pre-trained Classifier, 𝐶, that computes the distance score between the real input image and the new identity generated by 𝐺. 𝐸𝑆 to generate an image with a unique identity and the given style attribute, (4) Discriminator, 𝐷, that predicts whether the image is real or synthetic and emits an attribute vector 𝑦′ Pre-trained Classifier, 𝐶, that returns the distance score between a real input image and new the and (5) identity generated by 𝐺. 4.3.1 Disentangling Identity and Style to Generate New Iris Identities Generally, the number of samples available in the training dataset is limited. This restricts the latent space learned by 𝐺 thereby limiting the number of unique identities generated by the trained GAN. 
Some GANs focus too much on editing or modifying style attributes in the images while generating previously seen identities in the training dataset [26, 156, 158]. This motivated us to divide the problem into two parts: (1) Learning new identities that are different from those in the training dataset, and (2) Editing style attributes for ensuring intra-class variation. Inspired by [166], 80 we achieve this by training the proposed GAN using two pathways - Style Transformation Pathway and Identity Transformation Pathway. Style Transformation Pathway: Similar to StyleGAN, this pathway entirely focuses on learning the transformation of the style. Therefore, this sub-path aims to train the networks 𝐸𝑆, 𝐷 and 𝐺, while keeping the networks 𝐸𝐷 and 𝐶 fixed. Input to the generator 𝐺 is the concatenated latent vector 𝑑 and 𝑠 to generate an iris image with style attribute 𝑦. 𝐺 tries to challenge 𝐺 by maximizing, L𝐺−𝑆𝑡𝑦 = E 𝑥𝑑𝑖 𝑠𝑖 ,𝑥 𝑑𝑗 𝑠 𝑗 ∼P𝑟𝑒𝑎𝑙 [𝐷 (𝐺 (𝐸𝐷 (𝑥𝑑𝑖 𝑠𝑖 ), 𝐸𝑆 (𝑥𝑑𝑗 𝑠 𝑗 , 𝑦)))] (4.1) Here, ¯𝑥 = (𝐺 (𝐸𝐷 (𝑥𝑑𝑖 competes with 𝐺 by minimizing, 𝑠𝑖 ), 𝐸𝑆 (𝑥𝑑𝑗 𝑠 𝑗 , 𝑦))) is the image generated by 𝐺. At the same time, 𝐷 L𝐷−𝑆𝑡𝑦 = E 𝑥𝑑𝑖 𝑠𝑖 ,𝑥 𝑑𝑗 𝑠 𝑗 ∼P𝑟𝑒𝑎𝑙 [𝐷 (𝐺 (𝐸𝐷 (𝑥𝑑𝑖 𝑠𝑖 ), 𝐸𝑆 (𝑥𝑑𝑗 𝑠 𝑗 , 𝑦)))] −E𝑥 [𝐷 (𝑥)] (4.2) In order to enforce that an iris image is generated with style attributes 𝑦, the following loss function is utilized: L𝑆𝑡𝑦−𝑅𝑒𝑐𝑜𝑛 = ||𝐸𝑆 ( ¯𝑥) − 𝐸𝑆 (𝑥𝑑𝑗 𝑠 𝑗 )||2 2 (4.3) Identity Transformation Pathway: This pathway focuses on learning identities in latent space that are different from the training dataset. Therefore, this sub-path aims to train the networks 𝐸𝐷, 𝐷 and 𝐺, while keeping the networks 𝐸𝑆 and 𝐶 fixed. Therefore, L𝐺−𝐼 𝐷 = E 𝑥𝑑𝑖 𝑠𝑖 ,𝑥 𝑑𝑗 𝑠 𝑗 ∼P𝑟𝑒𝑎𝑙 L𝐷−𝐼 𝐷 = E 𝑥𝑑𝑖 𝑠𝑖 ,𝑥 𝑑𝑗 𝑠 𝑗 ∼P𝑟𝑒𝑎𝑙 [𝐷 (𝐺 (𝐸𝐷 (𝑥𝑑𝑖 𝑠𝑖 ), 𝐸𝑆 (𝑥𝑑𝑗 𝑠 𝑗 , 𝑦)))] [𝐷 (𝐺 (𝐸𝐷 (𝑥𝑑𝑖 𝑠𝑖 ), 𝐸𝑆 (𝑥𝑑𝑗 𝑠 𝑗 , 𝑦)))] (4.4) (4.5) −E𝑥 [𝐷 (𝑥)] Here, the goal is to learn encodings that represent identities different from those in the training dataset. For this, 𝐸𝐷 is divided into two parts (as shown in Figure 4.1) - Encoder 𝐸 that extracts the encoding from given input image and Warping Network 𝑊 that aims to learn 𝑀 warping functions ( 𝑓 1, ....., 𝑓 𝑀 ) to discover 𝑀 non-linear paths in the latent space of 𝐺. The gradient of these can be 81 utilized to define the direction at each latent code 𝑧 [142] such that new ¯𝑧 represents encoding of an identity different from the input image. In order to achieve this, the encoder 𝐸𝐷 is broken down to two parts - an encoder 𝐸 that extracts the latent code of the given input image and passes it on to the warping network 𝑊. For a vector space R𝑑, the function 𝑓 : R is defined as, 𝑓 (𝑧) = 𝐾 ∑︁ 𝑖=1 𝑏𝑖𝑒𝑥 𝑝(−𝑢𝑖 ||𝑧 − 𝑣𝑖 ||2) (4.6) Here, 𝑣𝑖 ∈ R𝑑 represents the center, 𝑏𝑖 ∈ R represents weight and 𝑢𝑖 ∈ R represents scale of 𝑖𝑡ℎ RBF. This function for warping is differentiable and for a specific value of 𝑧, the direction from Δ 𝑓 can be used to define a curve in R𝑑 by shifting 𝑧 as [142]: 𝛿𝑧 = 𝜖 Δ 𝑓 (𝑧) ||Δ 𝑓 (𝑧)|| (4.7) Here, 𝜖 is the shift magnitude that determines the shift from 𝑧 to ¯𝑧 using above equation. The Warping Network, 𝑊, contains two components: warper and reconstructor 𝑅. The warper can be parameterized using the triplet (𝑉 𝑚, 𝐵𝑚, 𝑈𝑚) denoting the center, weight and parameters. Here, 𝑚 = 1, 2, ....𝑀 and each triplet help warping the latent space in R𝑑. Also, the reconstructor is utilized to estimate the support set and magnitude shift that led to the transformation at hand. 
Therefore, the objective function for the Warping Network can be defined as,

$$\min_{V, B, U, R} \; \mathbb{E}_{z, \epsilon}\big[\mathcal{L}_{W\text{-}Reg}(\epsilon, \bar{\epsilon})\big]$$ (4.8)

Here, $\mathcal{L}_{W\text{-}Reg}$ refers to a regression loss. To further emphasize the uniqueness of the identities learned by $G$ in the latent space, we maximize,

$$\mathcal{L}_{Ident\text{-}Recon} = \| E(\bar{x}) - E(x^{d_i}_{s_i}) \|^2_2$$ (4.9)

$$\mathcal{L}_{Ident\text{-}Cls} = \| Feat(\bar{x}) - Feat(x^{d_i}_{s_i}) \|^2_2$$ (4.10)

Here, $Feat(x)$ denotes the features extracted by the trained iris classifier (i.e., matcher) $C$.

Figure 4.2: Some examples of images generated using iWarpGAN. A total of 20,000 irides corresponding to 2,000 identities were generated for each of the three training datasets.

By employing distinct pathways for style and identity, the proposed method enables the manipulation of identity features to generate synthetic images with distinct identities that diverge from the training dataset. Additionally, this methodology allows for the generation of images with varied styles for each identity. This is achieved by keeping the input image to the identity pathway constant and varying the input image to the style pathway, thereby enforcing that the generated images have the same identity $d$ but different styles (i.e., intra-class variation) $s_1, s_2, \ldots, s_n$.

4.4 Datasets Used

In this work, we utilized three publicly available iris datasets for conducting experiments and performing our analysis:

D1: CASIA-Iris-Thousand: This dataset [5], released by the Chinese Academy of Sciences Institute of Automation, has been widely used to study the distinctiveness of iris features and to develop state-of-the-art iris recognition methods. It contains 20,000 irides from 1,000 subjects (2,000 unique identities with left and right eye) captured using an iris scanner with a resolution of 640×480. The dataset is divided into train and test sets using a 70-30 split based on unique identities, i.e., 1,400 identities in the training set and 600 in the test set.

Figure 4.3: Examples of images generated using iWarpGAN with unique identities and intra-class variations. The figure shows the average similarity score (SScore) for both inter- and intra-class comparisons.

D2: CASIA Cross Sensor Iris Dataset (CSIR): For this work, we had access to only the train set of the CASIA-CSIR dataset [152], released by the Chinese Academy of Sciences Institute of Automation. This dataset consists of 7,964 iris images from 100 subjects (200 unique identities with left and right eye), which is divided into train and test sets using a 70-30 split on unique identities for training and testing deep learning based iris recognition methods, i.e., the training set contains 5,411 images and the test set contains 2,553 images.

D3: IITD-iris: This dataset [6] was released by the Indian Institute of Technology, Delhi, and was acquired in an indoor environment. It contains 1,120 iris images from 224 subjects captured using JIRIS, JPC1000 and digital CMOS cameras with a resolution of 320×240. This dataset is divided into train and test sets using a 70-30 split based on unique identities, i.e., images from 314 identities in the training set and images from 134 identities in the testing set.

Training Data for Proposed Method: The proposed method is trained using cropped iris images of size 256×256, where the style of each image is represented using the attribute vector $y$. Current datasets do not contain a balanced number of iris images across these attributes. Therefore, variations such as angle and position are added via image transformations on randomly selected images from the dataset.
In order to achieve this, iris coordinates are first obtained using the VeriEye iris matcher, the images are then translated to different angles and positions with respect to these centers, and cropped iris images of size 256×256 are extracted. This helps create a training dataset with balanced samples across the different attributes. Since the proposed method uses an image translation GAN, two images $x^{d_1}_{s_1}$ and $x^{d_2}_{s_2}$ are used as input during image synthesis to produce a new iris image $x^{d_3}_{s_2}$ whose identity $d_3$ is different from the training data and which possesses the style attribute $s_2$ (i.e., the attribute vector $y$) of image $x^{d_2}_{s_2}$.

4.5 Experimental Protocols

In this section, we discuss the different experiments used to study and analyze the performance of the proposed method. First, three sets of 20,000 iris images corresponding to 2,000 identities are generated. The three sets correspond to the three different training datasets, D1, D2 and D3. For some of the experiments below, a subset of the generated images was used in order to be commensurate with the corresponding real dataset.

4.5.1 Experiment-1: Quality of Generated Images

ISO/IEC 29794-6 Standard Quality Metrics: The quality of the generated images is compared with that of real images using the ISO/IEC 29794-6 Standard Quality Metrics [3]. We also evaluated the quality of images generated by other techniques, viz., WGAN [8], RaSGAN [156] and CITGAN [158], and compared them with the images generated using iWarpGAN. The ISO metric evaluates the quality of an iris image using factors such as usable iris area, iris-sclera contrast, sharpness, iris-pupil contrast, pupil circularity, etc., to generate an overall quality score. The quality score ranges from [0-100], with 0 representing poor quality and 100 representing the highest quality. Images that cannot be processed by this method (either due to extremely poor quality or an error during segmentation) are given a score of 255. As shown in Figure 4.4, the quality scores of iris images generated by iWarpGAN and CITGAN are comparable with those of real irides. On the other hand, WGAN and RaSGAN have many images with a score of 255 due to poor image quality. Also, when comparing the images in the three datasets, it can be seen that the CASIA-CSIR dataset has more images with a score of 255 than the IITD-iris and CASIA-Iris-Thousand datasets.

Figure 4.4: Histograms showing the quality scores of real iris images from three different datasets and the synthetically generated iris images. The quality scores were generated using the ISO/IEC 29794-6 Standard Quality Metrics [3] in the score range of [0-100]; the higher the score, the better the quality. Iris images that failed to be processed by this method are given a score of 255. (a) CASIA-Iris-Thousand dataset v/s synthetically generated images from different GANs. (b) CASIA-CSIR dataset v/s synthetically generated images from different GANs. (c) IITD-iris dataset v/s synthetically generated images from different GANs.

VeriEye Rejection Rate: To further emphasize the superiority of the proposed method in generating good quality iris images, we compare the rate at which the generated images are rejected by a commercial iris matcher known as VeriEye. We compare the rejection rate for images generated by iWarpGAN with that of the real images, as well as those generated by WGAN, RaSGAN and CITGAN:

(a) IITD-Iris-Dataset: This dataset contains a total of 1,120 iris images, out of which 0.18% are rejected by VeriEye.
For comparison, we generated 1,120 iris images each using iWarpGAN, WGAN, RaSGAN and CITGAN. For the generated images, the rejection rate is as high as 9.73% and 4.55% for WGAN and RaSGAN, respectively. However, the rejection rate for CITGAN and iWarpGAN is 2.85% and 0.73%, respectively.

(b) CASIA-CS Iris Dataset: This dataset contains a total of 7,964 iris images, out of which 2.81% are rejected by VeriEye. For comparison, we generated 7,964 iris images each using iWarpGAN, WGAN, RaSGAN and CITGAN. For the generated images, the rejection rate is as high as 4.17% and 2.06% for WGAN and RaSGAN, respectively. However, the rejection rate for CITGAN and iWarpGAN is 2.71% and 2.74%, respectively.

(c) CASIA-Iris-Thousand Dataset: This dataset contains a total of 20,000 iris images, out of which 0.06% are rejected by VeriEye. For comparison, we generated 20,000 iris images each using iWarpGAN, WGAN, RaSGAN and CITGAN. For the generated images, the rejection rate is as high as 0.615% and 0.34% for WGAN and RaSGAN, respectively. However, the rejection rate for CITGAN and iWarpGAN is 0.24% and 0.18%, respectively.

4.5.2 Experiment-2: Uniqueness of Generated Images

This experiment analyzes the uniqueness of the synthetically generated images, i.e., we evaluate whether iWarpGAN is capable of generating unique identities with intra-class variations.

Experiment-2A: Experiment-2A focuses on studying the uniqueness of the synthetic iris datasets generated using different GAN methods with respect to the training samples. For this, we studied the genuine and impostor distributions of the real iris images used to train the GAN methods and compared them with the distributions of the synthetically generated iris images. We utilized the VeriEye matcher in this experiment to evaluate the similarity score between a pair of iris images. The score ranges from [0, 1557], where a higher score denotes a better match.

Experiment-2B: Experiment-2B focuses on studying the uniqueness and intra-class variations within the generated iris dataset. For this, we studied the genuine and impostor distributions of the generated iris images and compared them with the distributions of the real iris datasets. As mentioned earlier, this study is done for various unique generated identities in order to study both uniqueness and scalability. We utilized the VeriEye matcher in this experiment to evaluate the similarity score between a pair of iris images.

Analysis: As shown in Figures 4.5, 4.6 and 4.7, unlike the other GAN methods, the iris images generated by iWarpGAN do not share high similarity with the real iris images used in training. This shows that iWarpGAN is capable of generating irides with identities that are different from those in the training dataset. Further, the impostor distribution of the synthetically generated images overlaps with the impostor distribution of the real iris images, from which we can conclude that the generated identities are different from each other. Note that the low similarity scores of WGAN for the real v/s synthetic and synthetic v/s synthetic distributions are due to the poor quality of the iris images generated by WGAN.

4.5.3 Experiment-3: Utility of Synthetic Images

In this experiment, we analyze the performance of deep learning algorithms trained and tested for iris recognition using a triplet training method, and compare it with the performance obtained when these algorithms are trained using both real and synthetically generated iris images.
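As a point of reference, the triplet objective used to train these recognition backbones can be sketched as follows; this is a minimal PyTorch illustration, and the margin value and function names are assumptions rather than the exact training code.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of a triplet objective for iris recognition backbones (EfficientNet / ResNet-101).
def triplet_loss(anchor, positive, negative, margin=0.2):
    """anchor/positive share an identity; negative comes from a different identity."""
    d_ap = F.pairwise_distance(anchor, positive)   # same-identity embedding distance
    d_an = F.pairwise_distance(anchor, negative)   # different-identity embedding distance
    return F.relu(d_ap - d_an + margin).mean()     # pull genuine pairs closer than impostor pairs

# Synthetic images from iWarpGAN simply contribute additional identities and samples
# from which such (anchor, positive, negative) triplets can be mined.
```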
Experiment-3A: Baseline Analysis: This is a baseline experiment where EfficientNet [66] and ResNet-101 [101] are trained with the training sets of the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets using the triplet training method. The trained networks are tested for iris recognition on the test sets of the above-mentioned datasets.

Experiment-3B: Cross-Dataset Analysis: In this experiment, we analyze the benefit of synthetically generated iris datasets in improving the performance of deep learning based iris recognition methods. EfficientNet and ResNet-101 are trained using the training sets of the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets, as well as the synthetically generated iris dataset from iWarpGAN.

Analysis: As shown in Figures 4.8 and 4.9, the performance of the deep learning based iris recognition systems improves when trained with more data, i.e., when combining real and synthetically generated iris images from iWarpGAN. While there is a slight improvement in the performance of ResNet-101, a significant improvement is seen for EfficientNet.

The baseline under-performance of EfficientNet can be attributed to two main factors. First, the architectural differences between EfficientNet and ResNet are significant. EfficientNet's design emphasizes parameter efficiency through compound scaling, optimizing depth, width, and resolution together. While effective in many domains, this approach might not align well with the specific requirements of iris recognition, particularly when training data is limited. In contrast, ResNet's residual connections and straightforward structure make it robust across diverse tasks, including biometric recognition, even with smaller datasets. Second, the training requirements for EfficientNet differ from those of ResNet. EfficientNet's compound scaling may necessitate tailored learning rates, augmentation strategies, and regularization techniques to effectively capture the intricate patterns in iris images. Without such adjustments, its capacity to learn critical iris features may be limited. The inclusion of synthetic data appears to address these challenges by diversifying the training set, allowing EfficientNet to better adapt to the unique characteristics of iris recognition. This finding underscores the importance of designing training strategies specific to an architecture's strengths and weaknesses. Future research could explore architecture-specific synthetic data generation or custom training pipelines to further enhance model performance across a variety of biometric recognition systems.

4.6 Summary & Future Work

The results in the previous section demonstrate that, unlike existing GANs, the proposed method can generate high-quality iris images with identities that are distinct from those in the training dataset. Moreover, the generated identities are unique with respect to each other, exhibiting slight variations to enhance diversity. This capability is critical in biometric systems, where unique and diverse identities are required to train and evaluate robust models. Additionally, the usefulness of the generated dataset in improving the performance of deep learning-based iris recognition methods has been established by augmenting training data with numerous unique synthetic identities.
With the dataset used in this study, the proposed method is capable of generating up to 7,680,000 images, comprising 768,000 unique identities, with each identity having 10 samples. This large-scale generation capability highlights the practical utility of the proposed method for creating diverse datasets to train advanced iris recognition systems.

The proposed method is based on an image transformation framework, where the network requires an input and a reference image to transform the identity and style and produce an output image. While effective, this approach inherently limits the feature space explored by iWarpGAN to the diversity present in the training data and the provided references. Furthermore, the warping transformations applied in the GAN latent space are often non-linear and complex, aimed at disentangling features such as identity and style. These transformations typically do not have an explicit inverse, which limits the ability to map generated images back to their precise original latent representations. Additionally, GAN latent spaces are generally high-dimensional, and warping manipulations might involve projecting onto subspaces or adding perturbations that result in some loss of information, further constraining the potential for inversion or precise reconstruction.

For future work, we aim to extensively study the capacity of the proposed method to maximize the number of unique identities it can generate. Additionally, we will explore strategies to make the method more generalizable, ensuring that the new identities learned by iWarpGAN are not constrained by the training set. This will involve developing techniques to expand the latent feature space, better decouple the identity and style representations, and address the inherent challenges of invertibility in WarpedGANSpace, ultimately enhancing the robustness and scalability of the method.

Figure 4.5: This figure shows the uniqueness of iris images generated using iWarpGAN when the GANs are trained using the CASIA-Iris-Thousand dataset. The y-axis represents the similarity scores obtained using VeriEye. Here, R=Real, S=Synthetic, Gen=Genuine and Imp=Impostor.

Figure 4.6: This figure shows the uniqueness of iris images generated using iWarpGAN when the GANs are trained using the CASIA-CS iris dataset. The y-axis represents the similarity scores obtained using VeriEye. Here, R=Real, S=Synthetic, Gen=Genuine and Imp=Impostor.

Figure 4.7: This figure shows the uniqueness of iris images generated using iWarpGAN when the GANs are trained using the IITD iris dataset. The y-axis represents the similarity scores obtained using VeriEye. Here, R=Real, S=Synthetic, Gen=Genuine and Imp=Impostor.

Figure 4.8: This figure shows the performance of ResNet-101 in the cross-dataset evaluation scenario. (a) Trained using the train sets of the CASIA-CSIR & IIT-Delhi datasets and tested using the test set of CASIA-Iris-Thousand. (b) Trained using the CASIA-Iris-Thousand & IIT-Delhi datasets and tested using the test set of the CASIA-CSIR dataset. (c) Trained using the CASIA-Iris-Thousand & CASIA-CSIR datasets and tested using the test set of the IIT-Delhi iris dataset.

Figure 4.9: This figure shows the performance of EfficientNet in the cross-dataset evaluation scenario. (a) Trained using the train sets of the CASIA-CSIR & IIT-Delhi datasets and tested using the test set of CASIA-Iris-Thousand. (b) Trained using the CASIA-Iris-Thousand & IIT-Delhi datasets and tested using the test set of the CASIA-CSIR dataset.
(c) Trained using the CASIA-Iris-Thousand & CASIA-CSIR datasets and tested using the test set of the IIT-Delhi iris dataset.

CHAPTER 5
IT-DIFFGAN: IMAGE TRANSLATIVE DIFFUSION GAN TO GENERATE SYNTHETIC IRIS IMAGES

5.1 Introduction

As discussed in previous chapters, Generative Adversarial Networks (GANs) [56] have emerged as a powerful tool in the field of computer vision, particularly for the generation of synthetic biometric images, such as iris [82, 140, 156, 158], face [47, 61, 67, 108, 138], fingerprint [45, 137, 151, 171] and speech, that closely resemble real-world data. Traditionally, a GAN takes random noise as input and uses it to generate a realistic looking image. For example, Kohli et al. [82] proposed a GAN-based method, the iris Deep Convolutional Generative Adversarial Network (iDCGAN), for synthesizing cropped iris images from random noise. Although successful in generating high-quality cropped iris images of size 64×64, this approach exhibited unrealistic distortions and noise when tasked with generating higher resolution images. In a subsequent study by Yadav et al. [156], this challenge was addressed by employing the Relativistic Average Standard Generative Adversarial Network (RaSGAN), specifically designed to produce high-quality, high-resolution iris images from random noise input.

Apart from the traditional GAN-based approaches, where images are generated using random noise as input to the generator, researchers have also studied image translative GANs, which take an image as input and then generate a synthetic image. These types of GANs are most commonly used for image editing, style transfer, etc. For example, Richardson et al. [116] proposed an image translative StyleGAN, pixel2style2pixel (pSp), that takes a face image as input and aims to generate face images with edited style attributes such as hair color, facial expression, etc. Similarly, StarGAN v2 [26] and CIT-GAN [158] require paired training data for the translation of a source iris image into a synthetic image possessing the attributes of the target domain (or reference iris image). This requirement compels the generator to acquire mappings across diverse domains, rendering it adaptable to multiple domains. Dorta et al. [40] proposed WarpGAN, which enables partial edits without reliance on a specific source-target image pair. In this framework, the generator receives a source image along with a target attribute vector, subsequently learning a warp field to implement the desired alterations in the source image. Comparative studies have demonstrated that this approach yields more realistic semantic edits in input images compared to methods such as StarGAN v2 [26] and CycleGAN [170].

Despite their superior generative power, the utilization of GANs for generating synthetic biometric images is not without its challenges. The three most pressing challenges that we would like to study and overcome in this work are: (1) The imperative need to safeguard the privacy of individuals whose biometric information is present in the training datasets. Currently, the synthetic images generated by GANs are seen to have identity leakage from the training data [140, 159, 160], i.e., the identity traits in the generated iris images are similar to those in the training data. (2) Most of the GANs in the literature lack a mechanism to introduce intra-class variations in the generated images, i.e., to generate multiple synthetic biometric images per identity [160].
(3) When trained on small datasets, GANs are prone to mode collapse and often suffer from unstable training, leading to the generation of images with alien artifacts and distortions [147].

In [159], Yadav and Ross overcame some of these challenges by proposing iWarpGAN, which focuses on disentangling identity and style through two distinct transformation pathways: Identity Transformation and Style Transfer. In the Identity Transformation pathway, the objective is to alter the identity of the input iris image within the latent space, thereby generating identities distinct from those present in the training set. This is accomplished by training a radial basis function (RBF)-based warp function, $f^m$, within the latent space of a GAN. The gradient of this warp function provides non-linear paths along the $m^{th}$ family of paths for each latent code $z \in \mathbb{R}^d$. On the other hand, the Style Transfer pathway focuses on generating images with varied styles, extracted from a reference iris image. Consequently, by combining the reference style code with the transformed identity code, iWarpGAN produces iris images exhibiting both inter- and intra-class variations. While this method was able to generate realistic looking iris images that are different from the training data in terms of identity, the generated images were seen to have some artifacts and distortions in the non-iris regions. Also, the dependency on paired images to generate the desired iris image restricts its capability to generate numerous new identities.

In this research, we address these challenges by proposing IT-diffGAN, an image translative diffusion-GAN [147] with StyleGAN-3 [76] as the backbone architecture. The proposed method projects input images onto the latent space of the diffusion-GAN and identifies the features most pertinent to individual identity and style. This is achieved using identity and style metrics that calculate the distance between the original image and the images generated by controlled manipulation of features in the latent space. Once the features affecting the identity and style attributes are identified, the proposed method is then trained to generate new identities by manipulating these features in the latent space. This helps IT-diffGAN generate images with inter- and intra-class variations while ensuring that the generated identities do not resemble anyone in the training data. By utilizing an image translative diffusion-GAN, this method is able to generate more realistic images and overcomes issues like mode collapse and unstable training that are often faced by GANs. Thus, the contributions of this work can be summarized as follows:

• We propose an image translative diffusion-GAN (IT-diffGAN) with StyleGAN-3 as the backbone architecture to generate realistic images, overcoming typical GAN issues like mode collapse and unstable training.

• We identify the features in the latent space of IT-diffGAN that are most pertinent to individual identity and style by manipulating those features and calculating the resulting displacement in style and identity.

• We manipulate the features in the latent space that affect the identity and style of an image and utilize this knowledge to learn to generate iris images with both inter- and intra-class variations.

• We utilize the proposed IT-diffGAN to generate more realistic and diverse images, addressing the limitations of traditional GANs and improving the quality and stability of image generation.
• We also evaluate the utility of the generated images for training deep learning based iris recognition methods by providing data with more identities and intra-class variations.

5.2 Background

In this section, we discuss the background needed to understand the proposed method.

5.2.1 Generative Adversarial Networks (GANs)

GANs [56] provide a framework for generating synthetic images that closely resemble real images. A GAN consists of two networks – a generator and a discriminator – engaged in a competitive game to produce realistic data distributions. The generator aims to generate synthetic samples, while the discriminator aims to differentiate between real and synthetic samples. Through iterative training, the generator refines its ability to produce increasingly realistic samples, while the discriminator enhances its capability to discern between real and synthetic data.

5.2.1.1 Standard Generative Adversarial Networks (SGANs)

As mentioned above, a GAN comprises two essential components: a generator $G$ and a discriminator $D$, engaged in a competitive process. Traditionally, $G$ takes a random noise vector $z$ as input and generates a realistic looking synthetic image closely resembling real data samples. Conversely, $D$ strives to differentiate between synthetically generated data and genuine data. This dynamic interplay between the generator and discriminator is encapsulated in a min-max objective function [56]:

$$\min_G \max_D L(D, G) = \mathbb{E}_{I \sim \mathcal{P}}[\log(D(I))] + \mathbb{E}_{z \sim \mathcal{Q}}[\log(1 - D(G(z)))]$$ (5.1)

Here, the real data is represented by $I \sim \mathcal{P}$ and the input to the generator $G$ comes from a Gaussian distribution, $z \sim \mathcal{Q}$. $D(I)$ determines whether $I$ is real or synthetic.

Figure 5.1: In order to disentangle style and identity in the latent space, the first step in the proposed method is to study the latent space and analyze the channels that affect the identity and style properties of the input image. With this analysis, the proposed method aims to manipulate these properties to generate images with varying styles and identities that are different from the training data. (a) Illustration of identity and style disentanglement for an input source image $I$ by first mapping it to the latent space and then manipulating the channels, replacing them with channels from some target image $I_T$. A distance score is calculated to analyze the channels that are most pertinent to identity and style properties. (b) Analysis of channels affecting the identity in the synthetic iris images generated using IT-diffGAN trained on images from the CASIA-Iris-Thousand dataset. Images are generated using source images from the test set of CASIA-Iris-Thousand; to manipulate the channels and study their effects, certain channels of the source image in the latent space are replaced with those of reference images from the test set. A distance score is used to analyze the channels that affect identity properties.
Figure 5.2: The training process of the proposed IT-diffGAN consists of six parts: (1) Encoder, $E$, that aims to encode the input image $I_S$ into an interpretable latent space; (2) Warping Network, $W$, that manipulates the identity information in the latent space; (3) Generative Network, $G$, that takes the transformed encoding from $W$ as input to generate an image $I_G$ with an identity different from $I_S$; (4) Discriminator, $D$, that inputs either a real ($I_S$) or synthetic ($I_G$) image and predicts whether the image is real or synthetic; (5) Attribute predictor, $IrisAttr$, that is trained to predict the style attribute of a given image; and (6) Iris matcher, $IrisID$, which is based on DRFNet and trained for iris recognition using the triplet training method.

5.2.1.2 StyleGAN and its Latent Space

Standard GANs were proposed by Goodfellow et al. [56] and, since then, many different variations of GANs have been proposed by researchers to improve the performance of SGANs and generate high-resolution, realistic looking images that closely resemble the real data. One such GAN is StyleGAN [77], which improved the generative capability of a standard GAN by introducing style modulation, which involves injecting style information at multiple layers of the generator network. This style information controls the appearance of features at different scales, enabling the generation of images with diverse styles and characteristics. Additionally, StyleGAN incorporates a progressive growing strategy during training, where the network gradually increases in complexity, starting from low-resolution images and progressively adding details as training progresses. This approach helps stabilize the training process and ensures the generation of high-quality images.

Figure 5.3: Overview of the Warping Network ($W$). The method begins with a latent code $z$ from the intermediate latent codes $w \in \mathcal{W}$. This latent code is modified by a vector shift determined by the warping function $f^m$, which is executed within the warping network $W$. This shift relies on a specifically chosen support set $V^m$, associated weights $B^m$, and scale parameters $U^m$.

StyleGAN-2 [76] further builds upon the success of the original StyleGAN by introducing several improvements to further enhance image quality and diversity. These include a redesign of the adaptive instance normalization (AdaIN) based style modulation, which allows for more fine-grained control over style, as well as the addition of skip connections to facilitate information flow between different layers of the network. In [87], researchers showed that the intermediate latent space $\mathcal{W}$ of StyleGAN and its variants is highly disentangled. Karras et al. [77] observed that specific channels within $\mathcal{W}$ correspond to distinct subsets of facial attributes, categorizing them into three groups: coarse channels (0 to 3) encapsulate high-level characteristics like pose, face shape, hairstyle, and eyewear; intermediate channels (4 to 7) represent features such as eye, mouth, and nose structure; and fine channels (8 to 17) encode the color scheme and micro-structure details. Previous studies have demonstrated how this disentanglement of features can be utilized to manipulate individual facial attributes [87].
Figure 5.4: Examples of images generated using IT-diffGAN with identities different from the training data and with intra-class variations. A total of 20,000 irides corresponding to 2,000 identities were generated for each of the three training datasets. (a) Various examples of images generated using the proposed method with identities different from the training dataset. (b) Genuine (red) and impostor (green) scores calculated for the generated images, showing inter- and intra-class variations in the generated dataset.

5.2.2 Diffusion-GANs

Many researchers have been working towards improving the quality and realism of the images generated via GANs. However, GANs still struggle to generate images free from artifacts, distortions, and mode collapse, where the generator fails to capture the full diversity of the target distribution. Diffusion Generative Adversarial Networks (Diffusion-GANs) [147] offer a promising solution to these challenges. In Diffusion-GANs, the generation process is based on the concept of iteratively refining a noise-corrupted image to generate high-quality samples. This iterative process involves gradually diffusing noise through the image, effectively smoothing out the noise to reveal the underlying structure and details.

One key advantage of Diffusion-GANs over traditional GANs lies in their ability to produce high-quality images with fewer artifacts and distortions. By iteratively refining the image through diffusion, Diffusion-GANs can generate samples that exhibit improved realism, making them well-suited for tasks that require high-quality image synthesis, such as image editing and synthesis. Additionally, Diffusion-GANs provide greater stability during training compared to traditional GANs. The diffusion process offers a more controlled and stable training environment, leading to smoother convergence and better overall performance. This stability helps mitigate issues such as mode collapse, allowing the network to capture a wider range of image variations.

The process of constructing a diffusion based GAN can be described using three steps [147]:

(1) Injecting Noise via Diffusion: A Diffusion-GAN aims to produce realistic images $I_g$ using a generator $G$, which transforms a latent variable $z$ drawn from a simple prior distribution $p(z)$ into a high-dimensional data space, such as images. Here, the generator distribution is $p_g(I_g) = \int p(I_g \mid z)\, p(z)\, dz$, and $I$ refers to a real image. To enhance the robustness and diversity of the generator, instance noise is introduced into the generated samples $I_g$ by employing a diffusion process, which adds Gaussian noise at each step. The noisy samples (say $I_n$) obtained at different steps of the diffusion, modeled by the mixture distribution $q(I_n \mid I, t)$, come from Gaussian distributions whose mean is directly proportional to $I$ and whose variance varies according to the influence of noise at step $t$. The same mixture distribution and diffusion process is applied to both real and generated samples to help $G$ learn the underlying structure and details of real images.

(2) Adversarial Training: In order to accommodate the diffusion process in generative adversarial training, the original min-max objective [56] is updated as follows:

$$\max_D L(D) = \mathbb{E}_{I \sim p(I),\, t \sim p_\pi,\, I_n \sim q(I_n \mid I, t)}[\log(D(I_n, t))] + \mathbb{E}_{z \sim p(z),\, t \sim p_\pi,\, I_g \sim q(I_g \mid G_\theta(z), t)}[\log(1 - D(I_g, t))]$$ (5.2)

Here, $p_\pi$ refers to a discrete distribution that assigns a weight to each diffusion step $t \in \{1, \ldots, T\}$.

(3) Discriminator for Adaptive Diffusion: With the introduction of diffusion and the timestep into the adversarial training, a diffusion based GAN requires a new optimization strategy for $D$ so that the GAN learns to generate realistic looking images.
This is achieved by making the discriminator learn the distinction between real and synthetic samples from the easiest examples (no noise) first, and then gradually increasing the noise-to-data ratio. This is done by utilizing a self-paced schedule for determining the number of diffusion steps ($T$), based on a discriminator over-fitting metric ($r_d$). This metric, derived from [75], evaluates the confidence of the discriminator relative to the data:

$$r_d = \mathbb{E}_{I_n, t \sim p(I_n, t)}\big[\operatorname{sign}\big(D_\phi(I_n, t) - 0.5\big)\big], \qquad T = T + \operatorname{sign}(r_d - d_{target}) \times C$$ (5.3)

The schedule adjusts $T$ based on the deviation of $r_d$ from the target value $d_{target}$, with a constant factor $C$ influencing the rate of change. $r_d$ is recalculated and $T$ updated every four minibatches [75].

5.3 Proposed Method

StyleGAN-3 has been shown to generate realistic images by utilizing random noise vectors as input [76]. In this method, the initial latent codes $z \in \mathcal{Z}$ undergo a transformation process facilitated by a mapper, which consists of a sequence of fully connected layers that produce intermediate latent codes $w \in \mathcal{W}$. This intermediate latent space, represented by $\mathcal{W}$, is structured as a two-dimensional array of 512-dimensional rows (16×512 in our configuration), with each row denoted as a channel. One of the useful features of StyleGAN-3 is the high degree of disentanglement present in its intermediate latent space $\mathcal{W}$. Karras et al. [77] examined this, revealing that specific layers within $\mathcal{W}$ correspond to distinct subsets of facial attributes. For example, they categorized these channels into three main groups for face images: coarse channels (0-3), middle channels (4-7), and fine channels (8-17). Coarse channels capture broad characteristics such as pose, facial structure, hairstyle, and the presence of eyewear. Middle channels encapsulate finer details like the structure of the eyes, mouth, and nose. Fine channels encode information related to color palettes and micro-structural intricacies. Various studies [76, 77, 87] have showcased how the disentangled features can be leveraged to manipulate individual facial features effectively. By carefully adjusting the latent codes, researchers have successfully altered specific features while preserving the overall identity of the face. This section delves deeper into this, exploring the possibilities and implications of such disentanglement in the context of manipulating identity and style in iris images. In order to achieve this, the proposed method can be divided into three steps, as described below.

5.3.1 Mapping the Input Image to the Latent Space

The proposed method, IT-diffGAN, is an image translative diffusion-GAN using StyleGAN-3 as the backbone architecture; it takes an iris image as input, which is mapped to a latent space using an image encoder. We utilized the encoder $E$ from pSp [116] as the backbone for our image encoder, which takes an image of size 256×256 as input and returns an encoding of size 1×512. This encoding is then passed through the mapper from StyleGAN-3 to output intermediate latent codes $w$ of size 16×512. With this, the adversarial loss is defined as,

$$\mathcal{L}_{Adv} = \mathbb{E}_{I \sim p(I),\, t \sim p_\pi,\, I_n \sim q(I_n \mid I, t)}[\log(D(I_n, t))] + \mathbb{E}_{I_s \sim p(I_s),\, t \sim p_\pi,\, I_g \sim q(I_g \mid G_\theta(E(I_s)), t)}[\log(1 - D(I_g, t))]$$ (5.4)

Here, $p_\pi$ refers to a discrete distribution that assigns a weight to each diffusion step $t \in \{1, \ldots, T\}$. Analyzing different channels using this method, we determined that Channel-1, Channel-8, Channel-9 and Channel-10 have the most effect on the identity (as shown in Figure 5.1).
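To make the channel analysis concrete, a minimal sketch of the channel-replacement probe (detailed in Section 5.3.2) is given below; encoder, generator and iris_id are placeholders for the trained $E$, $G$ and $IrisID$ networks, and the tensor shapes are assumptions rather than the exact implementation.

```python
import torch

# Minimal sketch of the channel-replacement probe used to locate identity-bearing rows of W.
@torch.no_grad()
def identity_displacement(encoder, generator, iris_id, img_src, img_tgt, channel):
    w_src = encoder(img_src)            # intermediate latent code, assumed shape (16, 512)
    w_tgt = encoder(img_tgt)
    w_mix = w_src.clone()
    w_mix[channel] = w_tgt[channel]     # replace a single channel (row) of W with the target's
    img_gen = generator(w_mix)
    # Identity displacement: squared distance between iris-matcher features (cf. Eq. 5.5).
    return torch.norm(iris_id(img_src) - iris_id(img_gen), p=2) ** 2

# Channels whose replacement produces the largest displacement are treated as identity
# channels; the remaining channels are treated as style channels.
```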
Figure 5.5: Histograms depicting the realism of iris images using FID, which compares the distribution of synthetically generated irides with real data. Since FID is a distance score, a lower FID score indicates that the synthetic iris images are more realistic. (a) Distribution of FID scores of generated iris images against real images from the CASIA-Iris-Thousand dataset when the GANs are trained using the CASIA-Iris-Thousand dataset. (b) Distribution of FID scores of generated iris images against real images from the CASIA-CSIR dataset when the GANs are trained using the CASIA-CSIR dataset. (c) Distribution of FID scores of generated iris images against real images from the IITD-iris dataset when the GANs are trained using the IITD-iris dataset.

5.3.2 Identity & Style Disentanglement

As mentioned earlier, each row of the intermediate latent space $\mathcal{W}$, of size 16×512, is denoted as a channel that can affect different properties during the generation of iris images. The proposed method aims to find the channels that affect the identity and style properties of the iris images it generates. This is achieved by studying the intermediate latent code $w_s$ of a source iris image $I_s$ and how the identity and style are affected when certain channels are replaced by those of a target code $w_t$. The updated code is then used by the image translative diffusion StyleGAN-3 to generate a synthetic iris image $I_g$. Further, we utilize a trained iris recognition method ($IrisID$) and a trained iris attribute predictor ($IrisAttr$) to evaluate the effect of channel replacement between $w_s$ and $w_t$ using,

$$\mathcal{L}_{ID\text{-}Recon} = \| Feat(I_s) - Feat(I_g) \|^2_2$$ (5.5)

$$\mathcal{L}_{Attr\text{-}Recon} = \| IrisAttr(I_s) - IrisAttr(I_g) \|^2_2$$ (5.6)

Here, $\mathcal{L}_{ID\text{-}Recon}$ and $\mathcal{L}_{Attr\text{-}Recon}$ help evaluate the displacement in identity and style when certain channels in $w_s$ are replaced by channels in $w_t$. Also, $Feat(\cdot)$ refers to the features extracted by the trained iris matcher $IrisID$, and $IrisAttr(\cdot)$ returns an attribute vector $y'$ of size 12. The initial 5 bits represent a one-hot encoding of the angle, while the subsequent 5 bits encode the position shift. The remaining 2 bits indicate contraction and dilation. In this context, the angle and position determine the orientation and the displacement of the iris center within the image. The angles considered for this study include 0°, 10°, 12°, 15°, and 18°, while the position shifts are [0,0], [5,5], [10,10], [-10,10], and [-10,-10]. For instance, for an image with a 12° angle, a position shift of [0,0], and contraction, the attribute vector $y$ would be [0,0,1,0,0,1,0,0,0,0,1,0].

5.3.3 Manipulating Style & Identity in the Latent Space

In this work, we aim to manipulate the style and identity of a given input image in the latent space to generate images with both inter- and intra-class variations.

5.3.3.1 Style Transfer

For the manipulation of style properties, as mentioned earlier, we exchange the style specific channels of the source identity $S$ with the corresponding channels of a target style. In order to enforce that the generated image has style attributes similar to the target image, the proposed method minimizes the attribute loss between the target image and the generated image,

$$\mathcal{L}_{Attr\text{-}Recon} = \| IrisAttr(I_t) - IrisAttr(I_g) \|^2_2$$ (5.7)

Here, $I_t$ refers to the target image and $I_g$ refers to the synthetically generated image.
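A minimal sketch of the style-transfer step just described is given below; STYLE_CHANNELS is an assumed, illustrative set of indices (not the channels identified in the text), and iris_attr, encoder and generator stand in for the trained $IrisAttr$, $E$ and $G$ networks.

```python
import torch

# Minimal sketch of the style-transfer step of Sec. 5.3.3.1: style-bearing channels of the
# source code are replaced by the target's, and Eq. (5.7) ties the result to the target style.
STYLE_CHANNELS = [2, 3, 4]   # illustrative indices only

def style_transfer_loss(encoder, generator, iris_attr, img_src, img_tgt):
    w = encoder(img_src).clone()
    w[STYLE_CHANNELS] = encoder(img_tgt)[STYLE_CHANNELS]   # inject the target's style channels
    img_gen = generator(w)
    # Attribute loss (Eq. 5.7): generated image should match the target's style attributes.
    return torch.norm(iris_attr(img_tgt) - iris_attr(img_gen), p=2) ** 2
```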
Figure 5.6: Analysis of the VeriEye rejection rate for different generative methods when trained using the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets.

5.3.3.2 Identity Transformation

In this work, we aim to manipulate the identity of the input image such that the generated image does not match any identity in the training set. This is achieved by shifting the source identity $S$ in a particular direction (away from the source) in the latent space to generate a new identity $S'$ using a Warping Network $W$. This network is applied to the latent channels most pertinent to identity in the given image (as shown in Figure ??). The Warping Network $W$ aims to learn $M$ functions $(f^1, \ldots, f^M)$ to warp the identity specific channels. The gradients of these functions help determine the direction at each latent code $z \in \mathbb{R}^d$, enabling the identity shift from the source identity $S$ to the new identity $S'$; i.e., $\mathbb{R}^d$ is transformed by $f^m : \mathbb{R}^d \rightarrow \mathbb{R}$, which is parameterized as a weighted sum of RBFs given by,

$$f(z) = \sum_{i=1}^{K} b_i \exp\big(-u_i \| z - v_i \|^2\big)$$ (5.8)

where $v_i$, $b_i$ and $u_i$ represent the center, weight, and scale of the $i^{th}$ RBF. For a given $z$, the direction of $\nabla f$ can be utilized to define a curve by shifting $z$ to $\bar{z}$ using,

$$\delta w = \epsilon \, \frac{\nabla f(z)}{\| \nabla f(z) \|}$$ (5.9)

Here, $\epsilon$ is the shift magnitude determining the transition from $z$ to $\bar{z}$. It is important that the shift magnitude is not too large, as that may lead to unrealistic image generation, while a shift that is too small may not affect the identity transformation. Therefore, similar to iWarpGAN, a reconstructor is utilized to estimate $\epsilon$ and the support set that is used to transform $z$ to $\bar{z}$:

$$\min_{V, B, U, R} \; \mathbb{E}_{w, \epsilon}\big[\mathcal{L}_{W\text{-}Reg}(\epsilon, \bar{\epsilon})\big]$$ (5.10)

Here, $(V^m, B^m, U^m)$ denote the centers, weights and scale parameters of the RBFs, with $m = 1, \ldots, M$. Further, we maximize the identity loss to emphasize the uniqueness of identity in the generated images:

$$\mathcal{L}_{ID\text{-}Recon} = \| Feat(I_s) - Feat(I_g) \|^2_2$$ (5.11)

$Feat(\cdot)$ refers to the features extracted by the trained iris matcher $IrisID$, as discussed earlier.

5.3.4 Datasets Utilized

In this study, we employed the three publicly accessible iris datasets utilized in [160] to evaluate and analyze the performance of the proposed method:

• D1: CASIA-Iris-Thousand: This dataset [5], collected by the Chinese Academy of Sciences Institute of Automation, serves as a benchmark for assessing the distinctiveness of iris features and developing iris recognition methods. It comprises 20,000 iris images from 1,000 subjects (2,000 unique identities, each with left and right eye), captured using an iris scanner with a resolution of 640×480. The dataset is partitioned into training and testing subsets using a 70-30 split based on unique identities, with 1,400 identities in the training set and 600 in the testing set.

• D2: CASIA Cross Sensor Iris Dataset (CSIR): Here, we utilized the training set of the CASIA-CSIR dataset [152], provided by the Chinese Academy of Sciences Institute of Automation, which comprises 7,964 iris images from 100 subjects (200 unique identities, each with left and right eye), divided into training and testing sets with a 70-30 split based on unique identities. The training set consists of 5,411 images, while the test set contains 2,553 images, aimed at training and evaluating deep learning-based iris recognition methods.
• D3: IITD-iris: The IITD-iris dataset [6], released by the Indian Institute of Technology, Delhi, was captured in an indoor environment. It contains 1,120 iris images from 224 subjects, acquired using JIRIS, JPC1000, and digital CMOS cameras with a resolution of 320×240. Similar to the previous datasets, this dataset is split into training and testing subsets using a 70-30 split based on unique identities, with 314 identities in the training set and 134 in the testing set.

The iris images in these datasets are cropped and resized to 256×256 pixels, with the style of each image represented by an attribute vector $y$. Since the existing datasets lack a balanced distribution of iris images across these attributes, we introduced variations such as angle and position via image transformations applied to randomly selected images. This process involved obtaining iris coordinates using the VeriEye iris matcher, translating images to different angles and positions relative to these centers, and then extracting cropped iris images of size 256×256. This approach facilitated the creation of a training dataset with balanced samples across the various attributes.

Figure 5.7: Histograms illustrating the quality of both real and synthetic irides, evaluated using the ISO/IEC 29794-6 Standard Quality Metrics. This metric scores the images on a scale from 0 to 100, with higher values indicating better quality. Images that could not be assessed by this standard were given a score of 255. (a) Distribution of ISO/IEC 29794-6 Standard Quality scores of iris images generated using StarGAN-v2 against real images when trained using D1, D2 and D3. (b) Distribution of ISO/IEC 29794-6 Standard Quality scores of iris images generated using StyleGAN-3 against real images when trained using D1, D2 and D3. (c) Distribution of ISO/IEC 29794-6 Standard Quality scores of iris images generated using diff-StyleGAN-3 against real images when trained using D1, D2 and D3. (d) Distribution of ISO/IEC 29794-6 Standard Quality scores of iris images generated using iWarpGAN against real images when trained using D1, D2 and D3. (e) Distribution of ISO/IEC 29794-6 Standard Quality scores of iris images generated using IT-diffGAN against real images when trained using D1, D2 and D3.

5.4 Experiments & Analysis

In this section, we discuss various experiments conducted to assess and evaluate the efficacy of the proposed approach. First, we generated three distinct sets of iris images, each comprising 20,000 images representing 2,000 unique identities. These sets correspond to the three different training datasets, D1, D2, and D3. In certain experiments outlined below, a subset of the generated images was employed to ensure comparability with the respective real dataset.

5.4.1 Experiment 1: Test of Realism

To assess the realism of the iris images generated using the proposed method, various generative methods – StarGAN-v2, StyleGAN-3, iWarpGAN, Diffusion-GAN and IT-diffGAN – are individually trained using real iris data from the CASIA-Iris-Thousand, CASIA Cross Sensor Iris, and IITD-iris datasets. The generated images are evaluated for realism and quality using two different methods: (1) the Fréchet Inception Distance (FID) score [119], and (2) the ISO/IEC 29794-6 Standard Quality Metrics [3].

Fréchet Inception Distance (FID) Score: The FID score is a metric that can be used to evaluate the quality of synthetic images by comparing the distribution of synthetically generated iris images to that of real images [119]. Since it is a distance score, the goal is to achieve a lower FID score, as this indicates a greater similarity between the synthetic and real datasets. In this research, we evaluate the realism of iris images generated using the proposed method and compare it with other existing methods for generating iris images.
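As a reference for how this metric is computed, a minimal sketch from Inception feature statistics is given below; it assumes the features have already been extracted and is illustrative rather than the exact evaluation code used here.

```python
import numpy as np
from scipy.linalg import sqrtm

# Minimal FID sketch: feat_real and feat_fake are Inception-v3 feature arrays (N x 2048).
def fid(feat_real, feat_fake):
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_f = np.cov(feat_fake, rowvar=False)
    cov_mean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(cov_mean):      # numerical noise can introduce tiny imaginary parts
        cov_mean = cov_mean.real
    # FID = ||mu_r - mu_f||^2 + Tr(cov_r + cov_f - 2 * (cov_r cov_f)^(1/2))
    return float(((mu_r - mu_f) ** 2).sum() + np.trace(cov_r + cov_f - 2 * cov_mean))
```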
For this, we generated 20,000 images (2,000 unique identities) using each of the methods mentioned earlier and calculated their FID scores against the real datasets used to train those networks. From this study, we found that the diffusion based GANs [147] produce the most realistic images, with the StyleGAN-3 based diffusion-GAN achieving average FID scores of 8.01, 9.34 and 8.64 when the networks are trained using the D1, D2 and D3 datasets, respectively. The proposed method, which is an image translative diffusion-GAN, obtained average FID scores of 11.03, 12.08 and 12.02 for D1, D2 and D3, respectively. A detailed analysis and comparison is shown in Figure 5.5.

ISO/IEC 29794-6 Standard Quality Metrics: The ISO standard uses a variety of criteria to assess the quality of an iris image. These criteria include the usable iris area, pupil shape, the contrast between the iris and sclera, sharpness, and the contrast between the iris and pupil, among others, to derive a comprehensive quality score. This score ranges from 0 to 100, with 0 representing the lowest quality and 100 the highest. Images that cannot be evaluated using this ISO metric, usually due to poor quality or segmentation errors, are given a score of 255. Similar to the experimental protocol described in [159], we compare the ISO scores obtained for real iris images and for synthetic iris images generated using StarGAN-v2, StyleGAN-3, iWarpGAN, Diffusion-GAN and IT-diffGAN. For a fair comparison, the number of generated images used in this experiment is equivalent to the number of irides in the real datasets D1, D2 and D3. As shown in Figure 5.7, the quality of images generated by our proposed method IT-diffGAN and by iWarpGAN is comparable to that of the real iris datasets. On the other hand, many images from StarGAN-v2 and StyleGAN-3 received a score of 255, and most of their other images received lower quality scores than the real images.

VeriEye Rejection Rate: We further emphasize the realism of the synthetic iris images generated by the proposed method by analyzing the number of images that are rejected by the commercial iris segmenter and matcher VeriEye. This software uses pre-defined parameters, similar to the ISO/IEC 29794-6 Standard Quality Metrics, to evaluate whether a given iris image meets the standard of an acceptable iris image, and it rejects images that do not meet those standards. As shown in Figure 5.6, the CASIA-Iris-Thousand dataset, containing 20,000 real iris images, exhibited a very low rejection rate of 0.06%; among the synthetic iris images obtained from networks trained using the train set of this dataset, StarGAN-v2 has the highest rejection rate of 0.27%. However, images generated using Diff-StyleGAN-3 and IT-diffGAN demonstrated better performance, with rejection rates of 0.11% and 0.13%, respectively. Similar behavior was observed for the IITD-iris dataset, containing 1,120 iris images, where the lowest rejection rate is obtained by Diff-StyleGAN-3 at 0.71%, followed by StyleGAN-3 and IT-diffGAN at 0.73% and 0.89%, respectively. Interestingly, it was observed that the real images in the CASIA-CSIR dataset have a higher rejection rate (2.81%) than the other real iris datasets, and also than the synthetic images from some of the generative methods, i.e., Diff-StyleGAN-3 has a rejection rate of 2.11%, followed by IT-diffGAN at 2.44%.

5.4.2 Experiment 2: Test of Uniqueness

This experiment investigates the ability of IT-diffGAN to generate identities different from the training data, with intra-class variations, in the synthetically generated iris images.
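The uniqueness protocol used in the two sub-experiments below can be summarized by the following sketch; similarity() is a hypothetical stand-in for the commercial VeriEye matcher, and the pairing of images with identity labels is an assumption made for illustration.

```python
from itertools import combinations

# Minimal sketch of the uniqueness protocol of Experiments 2(i) and 2(ii).
# real_set / synth_set: lists of (image, identity_label) pairs; similarity() is hypothetical.
def score_distributions(real_set, synth_set, similarity):
    # Experiment 2(i): similarity of every synthetic image to every real training image.
    real_vs_synth = [similarity(r, s) for r, _ in real_set for s, _ in synth_set]
    # Experiment 2(ii): genuine and impostor scores within the synthetic set.
    synth_genuine = [similarity(a, b)
                     for (a, ia), (b, ib) in combinations(synth_set, 2) if ia == ib]
    synth_impostor = [similarity(a, b)
                      for (a, ia), (b, ib) in combinations(synth_set, 2) if ia != ib]
    return real_vs_synth, synth_genuine, synth_impostor
```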
Uniqueness Test w.r.t. Training Data (Experiment-2(i)): This experiment evaluates the uniqueness of the identities in the generated images with respect to the training data used to train the proposed IT-diffGAN. This is done by computing the similarity score between the generated images and the training data using the commercial matcher VeriEye. We compare the uniqueness of the generated dataset with that of the images generated using iWarpGAN, StyleGAN-3 and StarGAN-v2.

Uniqueness Test within the Generated Dataset (Experiment-2(ii)): This experiment evaluates how unique the generated identities are from each other by computing the similarity between generated images using the VeriEye matcher. This helps evaluate the scalability of the proposed method in generating unique identities. For this experiment, we test for uniqueness among the synthetic images generated using the proposed method, iWarpGAN, StyleGAN-3 and StarGAN-v2.

Analysis: The experiments conducted in this work closely follow the protocols outlined in [159] to assess both inter-class and intra-class variations within the generated iris dataset. Additionally, the evaluation focuses on the distinctiveness of the generated iris identities when compared to the real iris images in the training dataset. As shown in Figure 5.11, the iris images created by IT-diffGAN and iWarpGAN demonstrate less similarity to the real training irides than those generated by other GAN methods [159]. However, the genuine score distribution of the dataset generated by the proposed method exhibits intra-class variations closer to the genuine score distribution of real images than iWarpGAN. This suggests that the proposed method effectively generates iris patterns with both inter- and intra-class variations. Furthermore, an analysis of the impostor distribution between the synthetic and genuine iris images reveals minimal overlap, reinforcing the uniqueness of the generated identities. Thus, it can be concluded that IT-diffGAN successfully creates fully synthetic iris images with identities distinct from those in the training set, while GAN based methods like StyleGAN-3 are limited to producing partially synthetic irides.

5.4.3 Experiment 3: Utility of Generated Iris Images

This experiment evaluates the performance of deep learning algorithms for iris recognition by training and testing them using a triplet training method. The study compares the outcomes when these algorithms are trained solely on real images versus a combination of real and synthetic iris images.

Baseline Experiment-3(i): In the baseline experiment, EfficientNet [66] and ResNet-101 [101] are trained following the triplet training method with hard triplet mining. The training and testing of these methods is done in a cross-dataset scenario, i.e., when trained using real irides from the CASIA-CSIR and IITD-iris datasets, testing is done on CASIA-Iris-Thousand.

Improvement Experiment-3(ii): In this experiment, EfficientNet [66] and ResNet-101 [101] are again trained following the triplet training method with hard triplet mining. However, the training and testing of these methods is done in a cross-dataset scenario augmented with the synthetically generated dataset, i.e., when trained using real and synthetic irides from the CASIA-CSIR and IITD-iris datasets, testing is done on real images from CASIA-Iris-Thousand.

Analysis: The results, as depicted in Figures 5.8, 5.9, and 5.10, show that the integration of synthetic data during training significantly enhances the performance of deep learning-based iris recognition systems.
This effect was particularly evident when both real and synthetic images were used, as it allowed the models to capture a broader range of intra-class and inter-class variations. For example, EfficientNet and ResNet-101, which initially demonstrated only moderate recognition capabilities when trained solely on real data, exhibited substantial accuracy improvements when synthetic images were included. The synthetic data, generated by iWarpGAN and IT-diffGAN, introduced additional variability in iris patterns that helped the models learn more robust and generalizable features.

The ROC analysis reveals that IT-diffGAN consistently improves the True Detection Rate (TDR) at various False Match Rate (FMR) thresholds. For instance, as shown in Figure 5.8, the ResNet-101 model achieves a TDR of 97.72% at FMR=1% when trained with the real dataset augmented using IT-diffGAN, outperforming both the baseline (87.87%) and iWarpGAN (94.13%). This trend continues with EfficientNet, which achieves a TDR of 96.64% at FMR=1%, again surpassing both the baseline and iWarpGAN. Similar improvements in performance are observed on the CASIA-CSIR and CASIA-Iris-Thousand datasets. The ROC curves clearly indicate an increase in the TDR for the cases where the dataset was augmented with the generated synthetic iris images. These observations highlight the advantage of using generative models like iWarpGAN and IT-diffGAN, which are capable of producing synthetic irides that not only preserve the variability necessary for training but also represent identities distinct from the original training set. This ability to generate new identities mitigates the risk of overfitting and improves the model's performance on unseen data, as demonstrated in our experiments.

As mentioned in the earlier chapter, the baseline under-performance of EfficientNet can be attributed to two main factors. First, the architectural differences between EfficientNet and ResNet are significant. EfficientNet's design emphasizes parameter efficiency through compound scaling, optimizing depth, width, and resolution together. While effective in many domains, this approach might not align well with the specific requirements of iris recognition, particularly when training data is limited. In contrast, ResNet's residual connections and straightforward structure make it robust across diverse tasks, including biometric recognition, even with smaller datasets. Second, the training requirements for EfficientNet differ from those of ResNet. EfficientNet's compound scaling may necessitate tailored learning rates, augmentation strategies, and regularization techniques to effectively capture the intricate patterns in iris images. Without such adjustments, its capacity to learn critical iris features may be limited. The inclusion of synthetic data appears to address these challenges by diversifying the training set, allowing EfficientNet to better adapt to the unique characteristics of iris recognition. This finding underscores the importance of designing training strategies specific to an architecture's strengths and weaknesses. Future research could explore architecture-specific synthetic data generation or custom training pipelines to further enhance model performance across a variety of biometric recognition systems.

5.5 Conclusion

The thorough analysis in the experimental section indicates that, unlike existing GANs, the proposed method excels in generating realistic iris images with identities distinct from those in the training dataset.
Additionally, the generated identities exhibit uniqueness among themselves, featuring some degree of variation. This study also highlights the practical benefits of the generated dataset, which enhances the performance of deep learning-based iris recognition systems by providing additional synthetic training data with numerous identities different from those in the training data. With the dataset used in this study, the proposed method demonstrates the capability to generate up to 11,139,840 images, encompassing 1,113,984 unique identities, with each identity having 10 samples. This large-scale generation capacity underscores the method’s potential for creating diverse and expansive datasets, which are essential for advancing the robustness of iris recognition systems.

Our method relies on image transformation, requiring an input image to modify the identity and an additional reference image to alter the style. This dependency may inherently constrain the number of distinct identities that can be generated by the proposed method. In future research, we aim to thoroughly investigate the method’s capacity for generating diverse identities and explore strategies to enhance the generalizability of the proposed framework. By addressing these limitations, we aim to ensure that the new identities learned by IT-diffGAN are not constrained by the original training set, thus expanding the feature space and improving its scalability and versatility.

(a) Performance of ResNet-101 when trained using the CASIA-Iris-Thousand and CASIA-CSIR datasets (additionally, synthetic iris images for the improvement experiment) and tested on the IITD-iris dataset. (b) Performance of EfficientNet-B0 when trained using the CASIA-Iris-Thousand and CASIA-CSIR datasets (additionally, synthetic iris images for the improvement experiment) and tested on the IITD-iris dataset.
Figure 5.8: This figure illustrates the performance of EfficientNet-B0 and ResNet-101 in a cross-dataset evaluation scenario.

(a) Performance of ResNet-101 when trained using the CASIA-Iris-Thousand and IITD-iris datasets (additionally, synthetic iris images for the improvement experiment) and tested on the CASIA-CSIR dataset. (b) Performance of EfficientNet-B0 when trained using the CASIA-Iris-Thousand and IITD-iris datasets (additionally, synthetic iris images for the improvement experiment) and tested on the CASIA-CSIR dataset.
Figure 5.9: This figure illustrates the performance of EfficientNet-B0 and ResNet-101 in a cross-dataset evaluation scenario.

(a) Performance of ResNet-101 when trained using the CASIA-CSIR and IITD-iris datasets (additionally, synthetic iris images for the improvement experiment) and tested on the CASIA-Iris-Thousand dataset. (b) Performance of EfficientNet-B0 when trained using the CASIA-CSIR and IITD-iris datasets (additionally, synthetic iris images for the improvement experiment) and tested on the CASIA-Iris-Thousand dataset.
Figure 5.10: This figure illustrates the performance of EfficientNet-B0 and ResNet-101 in a cross-dataset evaluation scenario.

(a) This figure illustrates the uniqueness of iris images produced by IT-diffGAN, where the GANs were trained on the IIT-Delhi iris dataset. (b) This figure illustrates the uniqueness of iris images produced by IT-diffGAN, where the GANs were trained on the CASIA-CSIR dataset.
Figure 5.11: This figure illustrates the uniqueness of iris images produced by IT-diffGAN. The y-axis displays similarity scores obtained from VeriEye. In the figure, R stands for Real, S for Synthetic, Gen for Genuine, and Imp for Impostor.
As can be seen, the similarity between images generated by IT-diffGAN is the lowest among the compared GANs, followed by iWarpGAN.

CHAPTER 6
MULTI-DOMAIN IMAGE TRANSLATIVE DIFFUSION STYLEGAN WITH APPLICATION IN IRIS PRESENTATION ATTACK DETECTION

So far, we have focused on generating partially and fully synthetic iris images in the NIR spectrum. In this chapter, we explore the capabilities and weaknesses of current methods in generating partially synthetic ocular PA images. We also propose a novel method, known as MID-StyleGAN, to generate ocular PA images (with both inter- and intra-class variations).

6.1 Introduction

Iris-based biometric systems are known for their reliability and contactless recognition [71]. However, as these systems become more widespread, they are increasingly targeted by presentation attacks (PAs), where attackers attempt to deceive the system using artifacts such as printed images, textured cosmetic contact lenses, artificial eyes, etc. to either impersonate a real individual or obfuscate their own identity [34]. Detecting such attacks is important for a secure iris recognition system, but is hampered by the limited availability of ocular datasets. This lack of data makes it difficult to adequately train models to recognize the subtle differences between bonafide images and PAs, particularly when considering the wide range of variations within and across different PAs (such as printed eyes and cosmetic contact lenses). One solution to overcome this challenge is to augment the training data with realistic synthetic iris images. These synthetic datasets can help in developing and evaluating PA detection algorithms, ensuring that they are robust against a wide range of attacks [82, 156, 158].

The generation of realistic synthetic biometric data has been explored in various studies. The more recent methods employ generative adversarial networks (GANs) to produce synthetic images [45, 47, 61, 140, 151, 156]. GANs typically take a random noise vector as input and generate a realistic image from it. For instance, Kohli et al. [82] proposed iDCGAN for synthesizing cropped iris images from random noise. While they showed good results for cropped iris images of size 64×64, this method struggles to generate higher-resolution images and fails at generating ocular images. Yadav et al. [156] utilized the Relativistic Average Standard Generative Adversarial Network (RaSGAN) to generate high-resolution iris images from random noise input. Another category of GANs, focused primarily on tasks such as image editing and domain-specific style transfer, consists of image-translative GANs that take an image as input and generate a synthetic image according to the conditions specified for image translation. For example, Richardson et al. [116] proposed pSp, an image-translative StyleGAN that takes a face image as input and generates a synthetic image with altered style attributes such as hair color and expression while keeping intact the characteristics that define the identity of the face in the given input image. In another work, Yadav et al. [158] proposed CIT-GAN, which utilizes paired training data to translate a source iris image into a synthetic image that incorporates the attributes from a target domain defined using a reference iris image. This process allows the generator to map across different domains, making it versatile for multiple applications.
While these methods offer significant improvements over traditional methods for synthetic iris image generation [32, 126, 173], the quality of the generated images degrades for ocular images, where GANs sometimes focus too much on the non-iris parts of the image (such as eyelashes) while failing to capture the intricate details of the iris. In this chapter, we address the problem of generating realistic, high-resolution ocular images while overcoming the shortcomings of GANs (mode collapse, unstable training, etc.).

Ocular images provide richer context and additional information compared to cropped iris images: they include not only the iris but also surrounding regions such as the sclera, eyelashes, and eyelids. These elements play a key role in many biometric applications such as PA detection (PAD), where adversarial artifacts might appear beyond the iris itself. Additionally, generating ocular images facilitates the development of more robust machine learning models that can handle diverse real-world scenarios. To generate such images with rich contextual information, we propose a novel approach, known as Multi-domain Image Translative Diffusion StyleGAN (MID-StyleGAN), to generate realistic, high-resolution synthetic ocular PA datasets. This method combines the strengths of StyleGAN [75, 76] and diffusion models [147, 153] for high-fidelity ocular image synthesis while utilizing a multi-domain discriminator and an image encoder for smooth transitions and variations across multiple PA domains, i.e., the discriminator is responsible for distinguishing between real and synthetic images, as well as classifying the domain of the image (e.g., determining whether the image belongs to the bonafide, printed eyes, or cosmetic contact lens domain). Also, the ocular image encoder utilizes feedback from the discriminator to learn domain-specific knowledge. This helps the network to better learn image translation from the source to the target domain. In Section 6.4, we will show that the images generated using the proposed method are not only more realistic than those from other GAN methods, but also capture the inter- and intra-class variations within the domains. We will also show how the generated dataset can be utilized to augment ocular PA training datasets for enhancing the performance of PA detection (PAD) methods. Hence, the contributions of this chapter can be summarized as follows:

• We propose a Multi-domain Image Translative Diffusion StyleGAN (MID-StyleGAN) to generate realistic, high-resolution synthetic ocular PA datasets to augment the training data for PA detection.

• The proposed method (a) utilizes diffusion models in combination with GANs to generate high-resolution, realistic synthetic images, (b) employs a multi-domain discriminator that is scalable to multiple domains, and (c) promotes domain transfer using conditional adversarial training and a domain transfer loss.

• We compare and analyze the realism of ocular images generated by our proposed method with other methods in the literature.

• We evaluate the utility of the generated ocular PA dataset for enhancing the performance of a DNN-based PA detector.
Figure 6.1: Illustration of the proposed method, which has three modules: (1) the Encoder, E, which takes an image and its domain label as input and outputs the encoded image, (2) the Generator, G, which takes the encoded image as input along with the target domain label to which the input image has to be translated, and (3) the Discriminator, D, which takes an image and label as input and outputs the probability of the image belonging to each domain as well as whether the image is real or synthetic.

Figure 6.2: Samples of ocular images generated using the proposed method, MID-StyleGAN. The proposed method is capable of generating not only multiple domains in the PA datasets but also the intra-class variations present in different types of PAs.

6.2 Background

6.2.1 Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) [56] are powerful generative models that typically take a random noise vector as input to output a realistic-looking synthetic image. A GAN comprises two core components: (1) a Generative Network, referred to as the Generator (G), and (2) a Discriminative Network, known as the Discriminator (D). These two networks are trained in a competitive setting, where the Generator’s goal is to create images that are realistic enough to deceive the Discriminator, while the Discriminator’s task is to differentiate between authentic images and those generated by G. The objective function for GANs is designed to capture this adversarial process and is typically formulated as a min-max game between the Generator and Discriminator. Mathematically, it can be expressed as [56],

\[
\min_{G} \max_{D} \; \mathcal{L}(D, G) = \mathbb{E}_{x \sim \mathcal{P}}[\log(D(x))] + \mathbb{E}_{z \sim \mathcal{Z}}[\log(1 - D(G(z)))]. \tag{6.1}
\]

Here, x ∼ P refers to the real source distribution and z ∼ Z refers to the random noise distribution. Also, D(x) determines whether x is real or synthetic, while G(z) outputs the generated image.

6.2.2 Diffusion based GANs

Researchers have made significant progress in improving the quality and realism of images generated by GANs. However, challenges such as artifacts, distortions, and mode collapse (where the generator fails to capture the full diversity of the target distribution) still persist. Diffusion Generative Adversarial Networks (Diffusion-GANs) [147] offer a promising approach to address these issues. In Diffusion-GANs, the image generation process involves iteratively refining a noise-corrupted image to produce high-quality samples. This refinement process gradually diffuses noise through the image, effectively smoothing it out to reveal the underlying structure and details,

\[
x_t = x_0 + \beta_t \epsilon \tag{6.2}
\]

where x_t is the image at time-step t, x_0 is the original image, β_t is the diffusion coefficient, and ε is the noise added at each step. Combining the strengths of GANs and diffusion models, Diffusion-GANs leverage the diffusion process to guide the generator in producing realistic samples, while the adversarial component ensures that the generated data is indistinguishable from real data. To integrate the diffusion process into generative adversarial training, the original min-max objective [56] is modified as follows [147]:

\[
\mathcal{L}(D, G) = \mathbb{E}_{x \sim p(x),\, t \sim p_\pi,\, y \sim q(y|x,t)}[\log(D(y, t))] + \mathbb{E}_{z \sim p(z),\, t \sim p_\pi,\, y_g \sim q(y|G_\theta(z),t)}[\log(1 - D(y_g, t))] \tag{6.3}
\]

Here, p(x) refers to the real data distribution and p_π refers to a discrete distribution that assigns weights to each diffusion step t ∈ {1, ..., T}. Also, y and y_g refer, respectively, to the noisy counterpart of the real image x and the noisy counterpart of the generated image.
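As a concrete illustration of Eq. (6.2), the following is a minimal sketch of the noise-injection step applied to a batch of images before they are passed to the discriminator. The linear schedule assumed for β_t and the tensor shapes used here are illustrative, not the exact settings of [147].

```python
import torch

def diffuse(x0, t, T, beta_max=0.5):
    # x_t = x_0 + beta_t * eps, with an assumed linear schedule beta_t = beta_max * t / T.
    beta_t = beta_max * float(t) / float(T)
    eps = torch.randn_like(x0)          # Gaussian noise added at this step
    return x0 + beta_t * eps

# Example: diffuse a batch of (real or generated) images at a sampled step.
x0 = torch.rand(8, 3, 256, 256)         # batch of images in [0, 1]
T = 10                                  # current maximum number of diffusion steps
t = int(torch.randint(1, T + 1, (1,)))  # uniform here; p_pi may weight the steps differently
x_t = diffuse(x0, t, T)
```

Both the real image x and the generator output are diffused in this way, and the discriminator always sees the noisy versions y and y_g together with the step index t.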
With the introduction of diffusion and time steps in adversarial training, Diffusion-GANs require a new optimization strategy for the discriminator (D) to effectively distinguish between real and synthetic images. This is achieved by having the discriminator learn from the simplest examples (with no noise) while gradually increasing the noise-to-data ratio. A self-paced schedule is used to determine the number of diffusion steps (T), based on a discriminator over-fitting metric (r_d). This metric, derived from [75], evaluates the discriminator’s confidence relative to the data:

\[
r_d = \mathbb{E}_{x_n, t \sim p(x_n, t)}[\mathrm{sign}(D(x_n, t) - 0.5)], \qquad T = T + \mathrm{sign}(r_d - d_{\text{target}}) \times C \tag{6.4}
\]

The schedule adjusts T based on the deviation of r_d from a target value, with a constant factor (C) influencing the rate of change. r_d is recalculated and T is updated after every four mini-batches [75].

6.3 Proposed Method

The proposed Multi-domain Image Translative Diffusion StyleGAN (MID-StyleGAN) model is designed to generate synthetic ocular images that accurately reflect the diversity found in real-world datasets (as shown in Figure 6.1). This method utilizes StyleGAN-3 as the backbone architecture while incorporating diffusion to generate high-resolution, realistic ocular images. The discriminator D in the proposed method is responsible for distinguishing between real and synthetic images, as well as classifying the domain of the image (e.g., determining whether the image is a bonafide or one of many PA types). Unlike traditional discriminators that output a binary decision, our discriminator uses a multi-class output to support domain-specific classification. Also, the image encoder E learns to encode an ocular image while utilizing feedback from the discriminator to learn domain-specific knowledge. This helps the network to learn a smooth image translation from the source to the target domain. To achieve this, the adversarial loss in Eqn. 6.3 has to be modified as,

\[
\mathcal{L}(D, G) = \mathbb{E}_{x \sim p(x),\, t \sim p_\pi,\, y \sim q(y|x,t)}[\log(D(y, t))] + \mathbb{E}_{t \sim p_\pi,\, y_g \sim q(y|G_\theta(E(x,s),c),t)}[\log(1 - D(y_g, t))] \tag{6.5}
\]

Here, c refers to the target domain and E(·) refers to the image encoder E that takes an image x and its domain label s as input. y and y_g refer to the real noisy and generated noisy images, respectively, at step t. The sub-network in D for domain classification helps promote domain transfer using the following loss functions,

\[
\mathcal{L}_{\text{domain}}^{\text{real}} = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D_{\text{domain}}(s|y)] \tag{6.6}
\]
\[
\mathcal{L}_{\text{domain}}^{\text{synthetic}} = \mathbb{E}_{x_g \sim q(x_g)}[\log D_{\text{domain}}(c|y_g)] \tag{6.7}
\]
\[
\mathcal{L}_{\text{domain}} = \mathcal{L}_{\text{domain}}^{\text{real}} + \mathcal{L}_{\text{domain}}^{\text{synthetic}} \tag{6.8}
\]

Here, D_domain represents the domain classifier component of the discriminator. In L_domain^real, the goal is to ensure that the discriminator assigns a high probability to the correct source domain s for a real image. Similarly, L_domain^synthetic encourages samples generated by G to be classified as belonging to the target domain c. In order to ensure that the encoder learns to translate the input image x to a latent code z that represents the iris and PA distributions, we define the content preservation losses as,

\[
\mathcal{L}_{\text{recon}} = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[ \| G(E(x, s), c) - x \|_2^2 \right] \tag{6.9}
\]
\[
\mathcal{L}_{\text{LPIPS}} = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[ \sum_{l} \| \phi_l(G(E(x, s), c)) - \phi_l(x) \|_2^2 \right] \tag{6.10}
\]

Here, φ_l(·) represents the features extracted at layer l of a pre-trained network (https://github.com/TreB1eN/InsightFace_Pytorch).
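The following is a minimal sketch of how the domain-classification terms in Eqs. (6.6)-(6.8) can be computed, assuming the discriminator exposes a domain-classification head alongside its real/synthetic output. The two-output interface and the function names are assumptions; how the two terms are split between the discriminator and generator updates follows the training schedule and is not prescribed here.

```python
import torch
import torch.nn.functional as F

def domain_classification_loss(D, y_real, s_real, y_gen, c_target, t):
    """Domain terms of Eqs. (6.6)-(6.8). D is assumed to return a pair
    (adv_logit, domain_logits); only the domain head is used here."""
    _, dom_logits_real = D(y_real, t)   # diffused real images with source-domain labels s_real
    _, dom_logits_gen = D(y_gen, t)     # diffused generated images with target-domain labels c_target

    # Cross-entropy is the negative of the log-probability terms in Eqs. (6.6)-(6.7),
    # so minimizing it pushes the corresponding domain probabilities up.
    loss_real = F.cross_entropy(dom_logits_real, s_real)
    loss_synthetic = F.cross_entropy(dom_logits_gen, c_target)
    return loss_real + loss_synthetic   # L_domain, Eq. (6.8)
```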
To ensure that the network does not drastically alter an image that is already in the target domain, and to encourage learning of intra-class variations, we define the following loss:

\[
\mathcal{L}_{\text{inr}} = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[ \| G(E(x, c), c) - x \|_2^2 \right] \tag{6.11}
\]

Further, as described in [75], to ensure diversity and encourage consistent image quality across domains, we employ style-mixing and path-length regularization techniques. The objective is to prevent the generator from becoming too reliant on a single latent vector for generating images. The regularization terms are defined as:

\[
\mathcal{L}_{\text{mix}} = \sum_{i=1}^{L} \| G(E(x_1, s_1), c) - G(E(x_2, s_2), c) \|_2^2 \tag{6.12}
\]
\[
\mathcal{L}_{\text{pl}} = \mathbb{E}_{w}\left[ \| \nabla_w G(w) - \alpha \|_2^2 \right] \tag{6.13}
\]

Here, E(x_1, s_1) and E(x_2, s_2) are embeddings obtained from the image encoder E for two different inputs, and L is the number of layers in the generator.

6.4 Experiments & Analysis

6.4.1 Datasets & PA Detection Methods Used

In this research, we utilized four different PA datasets, viz., D1: Berc-iris-fake [91], D2: CASIA-iris-fake [136], D3: LivDet-2017 [162] and D4: the test set of LivDet-2020 [36] (LivDet-2020 does not have a training set) for training and testing different iris presentation attack detection (PAD) algorithms. These ocular PA datasets contain bonafide images and images from different PA classes such as cosmetic contact lenses, printed eyes and artificial eyes. Each dataset is divided into train and test sets using a 70-30 split on each domain (bonafide, printed eyes and cosmetic contact lens).

The proposed generative network, MID-StyleGAN, used in this research is trained using the train set of the LivDet-2017 dataset, which contains bonafide, printed eyes and cosmetic contact lens images (three domains). Using the trained network, we generate 10,000 synthetic ocular images per domain. We evaluate these images for realism and utility in the sub-sections below.

6.4.2 Realism Assessment

With the rapid development of DeepFake technology, researchers have been exploring various approaches to assess the quality of synthetically generated data. Salimans et al. [120] introduced the use of a pre-trained Inception-V3 model to compute the inception score by comparing the marginal and conditional label distributions of the synthetic data. A higher inception score indicates better quality of the generated data. However, this method does not account for the real data distribution in its calculations. To address this, Heusel et al. [63] proposed the Fréchet Inception Distance (FID) score, which compares the statistics of the real data with those of the synthetically generated data:

\[
FID = \| \mu_r - \mu_s \|^2 + \mathrm{Tr}\left( \Sigma_r + \Sigma_s - 2\sqrt{\Sigma_r \Sigma_s} \right). \tag{6.14}
\]

In this equation, μ_s, μ_r, Σ_s, and Σ_r represent the statistics of the synthetic (s) and real (r) distributions, respectively. Since FID measures the distance between these two distributions, a lower FID score indicates better quality of the generated data. As described earlier, for this experiment we train MID-StyleGAN with the train set of the LivDet-2017 dataset and generate 10,000 images for each domain (bonafide, printed eyes and cosmetic contact lens) using test-set images from D1, D2, D3 and D4 as source images. For the generated images, a realism score is calculated against the distribution of real (source) images using FID. For the comparative study, we utilize CIT-GAN [158], StyleGAN-3 [76] and diffusion-based StyleGAN-3 (diff-StyleGAN-3) [147].
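A small sketch of Eq. (6.14) is given below, assuming that Inception-V3 activations have already been extracted for the real and synthetic image sets; the function and variable names are illustrative.

```python
import numpy as np
from scipy import linalg

def fid_score(act_real, act_syn):
    """Frechet Inception Distance (Eq. 6.14) from pre-extracted Inception activations,
    where rows are samples and columns are feature dimensions."""
    mu_r, mu_s = act_real.mean(axis=0), act_syn.mean(axis=0)
    sigma_r = np.cov(act_real, rowvar=False)
    sigma_s = np.cov(act_syn, rowvar=False)

    covmean, _ = linalg.sqrtm(sigma_r @ sigma_s, disp=False)   # matrix square root
    if np.iscomplexobj(covmean):
        covmean = covmean.real   # discard tiny imaginary parts caused by numerical noise

    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(sigma_r + sigma_s - 2.0 * covmean))
```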
Analysis: The analysis of FID scores reveals that MID-StyleGAN performs best, producing the highest-quality images with FID scores averaging 19.71, the lowest among the compared methods. In contrast, both StyleGAN-3 (average FID of 139.22) and CIT-GAN (average FID of 257.41) exhibit inconsistent performance, reflecting significant variability across domains. Specifically, the printed-eyes domain introduces higher FID scores, leading to poorer overall performance. The presence of multiple peaks suggests that these models struggle to maintain consistent quality across different types of synthetic data, especially for printed eyes (see Figure 6.3).

6.4.3 Utility of Generated Dataset

In this section, we describe the experimental setups used to evaluate the usefulness of the synthetic ocular PA dataset generated using MID-StyleGAN by evaluating the performance of different deep learning-based iris PA detection methods, viz., [130], VGG-19 [134], ResNet-101 [62], MobileNet-v2 [121], AlexNet [85] and D-NetPAD [129]. The experiments in this section are done in a cross-dataset scenario, i.e., if the PA detectors are trained on the train set of D1, then they are tested on the test sets from D3 and D4.

As mentioned earlier, D1: Berc-iris-fake [91] has a total of 2,778 bonafide and 1,820 PA images, divided using a 70-30 split on each domain (bonafide, printed eyes and cosmetic contact lens), i.e., the train set has 1,944 bonafide and 1,274 PA images while the test set has 834 bonafide and 546 PA images. For D3: LivDet-2017 [162], the train-test partition is already provided, with 6,563 bonafide and 9,137 PA images in the train set and 5,511 bonafide and 9,356 PA images in the test set. We have only the test set for the D4: LivDet-2020 [36] dataset, which has 5,330 bonafide and 6,007 PA images (excluding the post-mortem iris images in this dataset, which are not the focus of our study). Note that since the proposed method is an image-translative generative method, we set aside D2: CASIA-iris-fake [136] to be used for synthetic image generation with domain transfer. This ensures that the generated images have no overlap with any images in the test sets. This dataset has a total of 6,000 bonafide images and 1,780 PA images.

6.4.3.1 Baseline Experiment-0

In this experiment, we set the baseline for various PA detectors for ocular PA detection, i.e., training and testing of the detectors are done on real images from iris PA datasets. As mentioned above, the training and testing is done in the cross-dataset setup, which means that if training is done on the train set of D1, the testing is done on the test sets from D3 and D4. Similarly, when training is done on the train set of D3, the test sets have images from D1 and D4.

6.4.3.2 Utility Experiment-1

In this experiment, we aim to evaluate the usefulness of the synthetic dataset in training more reliable and secure PA detection techniques. For this, unlike the baseline experiment, the PA detection methods are trained using both real and synthetically generated PA datasets. The synthetic images for the various domains (bonafide, printed eyes and cosmetic contact lens) are generated using MID-StyleGAN trained on images from D2: CASIA-iris-fake. 7,000 images per domain are generated, which are then utilized to enhance the detectors’ performance by augmenting the training set with more images that have domain-level variations.
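All results below are reported as the True Detection Rate (TDR) at a fixed False Detection Rate (FDR). A minimal sketch of this metric is given here, assuming the detector outputs a score that is higher for PAs; the thresholding convention is an assumption for illustration.

```python
import numpy as np

def tdr_at_fdr(bonafide_scores, pa_scores, fdr=0.01):
    """TDR at a fixed FDR: the threshold is set so that at most `fdr` of the
    bonafide samples are falsely flagged as PAs, and TDR is the fraction of
    PA samples whose scores clear that threshold."""
    threshold = np.quantile(np.asarray(bonafide_scores), 1.0 - fdr)
    return float(np.mean(np.asarray(pa_scores) >= threshold))

# Example with illustrative random scores.
rng = np.random.default_rng(0)
bona = rng.normal(0.2, 0.1, 1000)   # PA scores of bonafide test images
pa = rng.normal(0.8, 0.1, 1000)     # PA scores of attack test images
print(tdr_at_fdr(bona, pa, fdr=0.01))
```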
Analysis: After evaluating the performance of various iris PA detectors on different datasets, we analyzed how the number of samples across domains and the variations in the training set affect the performance of the PA detectors. This is evident when comparing the baseline performance of the detectors trained with the comparatively smaller D1: Berc-iris-fake dataset (as shown in Table 6.3) versus when trained using the LivDet-2017 dataset (as shown in Table 6.1). As seen from the tables, the VGG-19 detector trained using D1 obtained a TDR of 27.40% at 1% FDR when tested on the D4: LivDet-2020 dataset, while it obtained a TDR of 73.08% at 1% FDR on D4 when trained using D3: LivDet-2017 (which has a larger number of samples and more variations). Similar behavior was noticed for the other PA detectors.

Further evidence of the effect of the number of samples per domain and the variations in the training set on the performance of PA detectors is obtained by comparing their performance when the training set is augmented with synthetic ocular samples, which introduce more samples per domain with intra-class variations. Comparing the performance of the PA detectors in Table 6.3 with Table 6.4 and Table 6.1 with Table 6.2, it can be clearly seen that the performance of the detectors improves after augmenting the training set with synthetic samples. For example, in Tables 6.1 and 6.2, the performance of D-NetPAD [129] when tested on D1 improves from 93.41% TDR to 98.72% TDR at 1% FDR. Similar behavior was seen for the other detectors as well.

Table 6.1: True Detection Rate (TDR) at different False Detection Rates (FDRs) for Baseline Experiment-0 when PA detectors are trained on D3: LivDet-2017 and tested using test sets from D1: Berc-iris-fake and D4: LivDet-2020.

Dataset  Metric          VGG-19 [134]  AlexNet [85]  ResNet-101 [62]  MobileNet-v2 [121]  D-NetPAD [129]
D1       TDR @ 1% FDR    92.12         93.77         99.08            97.44               93.41
D1       TDR @ 2% FDR    97.99         95.97         99.45            97.99               95.60
D1       TDR @ 5% FDR    99.08         97.99         99.63            98.90               99.08
D4       TDR @ 1% FDR    70.63         48.96         75.73            76.51               70.15
D4       TDR @ 2% FDR    74.86         54.47         80.66            79.96               75.16
D4       TDR @ 5% FDR    80.86         61.69         88.30            84.40               82.05

Table 6.2: True Detection Rate (TDR) at different False Detection Rates (FDRs) for Utility Experiment-1 when PA detectors are trained on D3: LivDet-2017 + Synthetic images and tested using test sets from D1: Berc-iris-fake and D4: LivDet-2020.

Dataset  Metric          VGG-19 [134]  AlexNet [85]  ResNet-101 [62]  MobileNet-v2 [121]  D-NetPAD [129]
D1       TDR @ 1% FDR    94.51         97.99         99.27            98.90               98.72
D1       TDR @ 2% FDR    97.99         97.99         99.45            99.08               98.72
D1       TDR @ 5% FDR    100           98.90         99.82            99.63               99.72
D4       TDR @ 1% FDR    73.08         54.60         83.20            79.39               70.88
D4       TDR @ 2% FDR    77.36         60.91         88.66            83.39               76.21
D4       TDR @ 5% FDR    84.07         68.89         92.73            86.97               83.05

Table 6.3: True Detection Rate (TDR) at different False Detection Rates (FDRs) for Baseline Experiment-0 when PA detectors are trained on D1: Berc-Iris-Fake and tested using test sets from D3: LivDet-2017 and D4: LivDet-2020.

Dataset  Metric          VGG-19 [134]  AlexNet [85]  ResNet-101 [62]  MobileNet-v2 [121]  D-NetPAD [129]
D3       TDR @ 1% FDR    38.04         24.07         44.60            55.70               51.33
D3       TDR @ 2% FDR    41.78         28.24         47.82            58.69               53.60
D3       TDR @ 5% FDR    49.59         35.44         53.78            65.28               58.12
D4       TDR @ 1% FDR    27.40         17.86         34.26            20.79               21.03
D4       TDR @ 2% FDR    38.72         24.90         45.60            24.72               25.36
D4       TDR @ 5% FDR    55.88         38.22         61.54            29.58               32.89

(a) This histogram shows the FID scores of generated images using four generative methods: the proposed MID-StyleGAN, diff-StyleGAN-3, StyleGAN-3, and CIT-GAN.
MID-StyleGAN achieves the lowest FID scores, indicating better image quality, while CIT-GAN and StyleGAN-3 show higher FID scores and inconsistencies, especially across domains like printed eyes. (b) This histogram breaks down MID-StyleGAN’s FID scores for the three domains: Bonafide, Printed eyes, and Cosmetic Contact Lens. The Bonafide set has the lowest average FID, showing higher quality, while Printed eyes introduces higher variability and poorer performance.
Figure 6.3: Comparison of FID scores across multiple generative methods with respect to the proposed method. The first plot shows performance across all methods, while the second focuses on the realism of images generated using MID-StyleGAN across different domains.

Table 6.4: True Detection Rate (TDR) at different False Detection Rates (FDRs) for Utility Experiment-1 when PA detectors are trained on D1: Berc-Iris-Fake + Synthetic images and tested using test sets from D3: LivDet-2017 and D4: LivDet-2020.

Dataset  Metric          VGG-19 [134]  AlexNet [85]  ResNet-101 [62]  MobileNet-v2 [121]  D-NetPAD [129]
D3       TDR @ 1% FDR    43.76         28.99         51.85            57.11               54.36
D3       TDR @ 2% FDR    49.19         34.67         56.02            60.71               57.87
D3       TDR @ 5% FDR    58.52         43.73         65.29            68.57               65.12
D4       TDR @ 1% FDR    14.99         18.91         42.07            24.75               52.99
D4       TDR @ 2% FDR    43.67         26.24         53.14            33.19               64.37
D4       TDR @ 5% FDR    60.51         39.74         68.35            50.36               75.73

6.4.4 Ablation Study

To rigorously evaluate the effectiveness of the proposed MID-StyleGAN, we conducted two distinct types of ablation studies. These studies provide a comprehensive understanding of the model’s contributions and limitations in the context of ocular presentation attack detection (PAD).

First, we systematically removed key components of the MID-StyleGAN architecture to assess their individual contributions to overall performance. By isolating elements such as the adaptive loss function, multi-domain architecture, and diffusion-based components, we measured how each feature impacts the quality and diversity of synthetic ocular images. The results of this analysis demonstrated that the inclusion of these components significantly enhances the model’s ability to generate realistic and domain-consistent images, which are crucial for improving PAD system performance.

Second, we performed a capacitive study to explore the iterative potential of synthetic data generation using MID-StyleGAN. In this approach, the proposed method was trained using real data to produce the first set of synthetic samples. These synthetic samples were then used to train the GAN again, generating a second set of synthetic data. The process was repeated iteratively to produce a third set of synthetic data. For each generated set, we computed the Fréchet Inception Distance (FID) score to measure its similarity to real data, allowing us to track how closely the synthetic samples approximated real data over successive iterations. Furthermore, each set of synthetic data was utilized to train PAD methods, and their performance was analyzed. The results revealed the extent to which iterative training cycles influence the quality of synthetic data and its effectiveness in improving PAD performance.

6.4.4.1 Studying Components of Proposed Method

Effect of Style Mixing Regularization: Style mixing regularization plays a vital role in encouraging diversity in the generated images by mixing styles from different layers. To assess its impact, we conducted experiments with and without this regularization. When style mixing was removed, we observed a noticeable drop in image quality and diversity.
The generated images tended to lack variability across different domains, which hindered their utility for cross-domain analysis in presentation attack detection. The average FID score increased (worsened) by approximately 9.79%, indicating degraded image quality.

Impact of Path Length Regularization: Path length regularization ensures smoother transitions in the latent space and improves the consistency of generated images. We performed experiments by disabling this regularization. The results showed that, without path length regularization, the generator produced less consistent outputs, with occasional abrupt changes in image features. The average FID score worsened by approximately 8.12%, and visual inspection of the generated images revealed artifacts that negatively impacted their realism. This regularization was particularly critical for maintaining smooth transitions between different ocular domains.

Role of Domain-Specific Discriminator: The multi-domain discriminator in MID-StyleGAN was specifically designed to handle domain transfer by discriminating images based on their target domain. We conducted an experiment by replacing the multi-domain discriminator with a standard single-domain discriminator. Without the multi-domain capability, the model struggled to enforce domain-specific characteristics in the generated images. The generated PA samples lacked clear domain-specific features, and domain confusion was evident. The average FID score worsened by 20.70%, suggesting a significant decrease in the quality of the generated domain-transferred images.

Effect of Content Preservation Loss: We further analyzed the effect of the content preservation loss (reconstruction loss) by removing it from the objective function. In this setting, the model generated images that diverged significantly from the input samples, with important features being lost during the domain transfer process. This loss function is crucial for ensuring that key ocular features are retained, even when the domain is altered. Without this component, the model’s capacity for realistic and recognizable presentation attack generation was severely compromised.

Each component of the proposed MID-StyleGAN contributes significantly to the overall success of the model. Style mixing and path length regularization enhance the diversity and smoothness of the generated images, while the multi-domain discriminator is critical for domain-specific image generation. Content preservation ensures that key ocular features are retained during domain transfer, while the absence of an explicit identity-preservation constraint is acceptable for the intended task of presentation attack detection, where the focus is on detecting attacks rather than on biometric recognition. The proposed architecture achieves an optimal balance of these components, resulting in high-quality domain-transferred images with favorably low FID scores across the various ablation settings.

6.4.4.2 Capacitive Study

For this study, the proposed method was trained using real data to produce the first set of synthetic samples (Synthetic-1) for multiple domains (bonafide, printed eyes and cosmetic contact lens). These synthetic samples were then used to train MID-StyleGAN again, generating a second set of synthetic data (Synthetic-2). The process was repeated iteratively to produce a third set of synthetic data (Synthetic-3).
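The iterative procedure can be summarized by the sketch below; `train_fn`, `generate_fn` and `fid_fn` are caller-supplied callables standing in for the actual MID-StyleGAN training, sampling and FID evaluation pipeline, and are hypothetical rather than part of any released code.

```python
def capacitive_study(real_data, train_fn, generate_fn, fid_fn, generations=3):
    """Retrain the generator on its own output for a few generations and
    track the FID of each synthetic set against the real data."""
    fids = []
    training_data = real_data
    for _ in range(generations):
        model = train_fn(training_data)            # trained on real data, then on Synthetic-k
        synthetic = generate_fn(model)             # e.g., 10,000 images per domain
        fids.append(fid_fn(real_data, synthetic))  # FID is always measured against real data
        training_data = synthetic                  # Synthetic-k trains the next generation
    return fids
```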
For each generated set, we computed the Fréchet Inception Distance (FID) score to measure its similarity to real data, allowing us to track how closely the synthetic samples approximated real data over successive iterations. Furthermore, each set of synthetic data was utilized to train PAD methods, and their performance was analyzed. For all the experiments in this section, we utilized the same experimental setup as described in the experiments section.

Analysis on Synthetic-1: This is the set of synthetic data that was generated when MID-StyleGAN was trained using real samples, and it has been utilized for the realism and utility analysis in the experimental section. As mentioned before, this set of synthetic data (Synthetic-1) obtained an average FID score of 19.71 over 30,000 images, with 10,000 images per domain, indicating the realism of the generated samples with respect to real data. Also, comparing the performance of the PA detectors in Table 6.3 with Table 6.4 and Table 6.1 with Table 6.2, it can be clearly seen that the performance of the detectors improves after augmenting the training set with synthetic samples. For example, in Tables 6.1 and 6.2, the performance of D-NetPAD [129] when tested on D1 improves from 93.41% TDR to 98.72% TDR at 1% FDR. Similar behavior was seen for the other detectors as well.

Analysis on Synthetic-2: This is the set of synthetic data that was generated when MID-StyleGAN was trained using Synthetic-1. The number of samples utilized for training is similar to the number of samples from real data, for a fair comparison. We observed an average FID score of 20.36 for the generated data (Synthetic-2), where, as before, the score was obtained on 30,000 images with 10,000 images from each domain. Some effect was observed on the performance of the PAD methods when trained using this dataset (Real + Synthetic-2). For example, the TDRs of VGG-19 and D-NetPAD at 1% FDR on the D4 test set improved by 0.23% and 0.18%, respectively, in comparison to when the detectors were trained using Real + Synthetic-1.

Analysis on Synthetic-3: This is the set of synthetic data that was generated when MID-StyleGAN was trained using Synthetic-2. The number of samples utilized for training is similar to the number of samples from real data, for a fair comparison. We observed an average FID score of 31.64 for the generated data (Synthetic-3), where, as before, the score was obtained on 30,000 images with 10,000 images from each domain. Some effect was observed on the performance of the PAD methods when trained using this dataset (Real + Synthetic-3). For example, the TDRs of VGG-19 and D-NetPAD at 1% FDR on the D4 test set improved by 2.11% and 3.42%, respectively, in comparison to when the detectors were trained using Real + Synthetic-1. The results revealed the extent to which iterative training cycles influence the quality of the synthetic data and also the performance of the PAD methods.

6.5 Conclusion and Future Work

The proposed approach for multi-domain image translation within the context of iris presentation attack detection effectively ensures that the generated ocular images are from the specified target domain. By leveraging the domain classification loss, the model is trained to produce images that not only exhibit realistic features but also align with the desired domain, facilitating more accurate and robust PA detection. At present, our approach does not specifically aim to generate entirely new identities.
This decision is based on the nature of the presentation attack detection task, where the primary concern is distinguishing between bonafide and attack images rather than deducing identities. Consequently, the model may replicate certain identity features from the training data, which is acceptable within the context of this specific application. However, we recognize the importance of privacy considerations in synthetic image generation. Moving forward, our goal is to refine this approach to be more privacy-conscious by ensuring that the generated images do not replicate identity characteristics from the training data. This would involve introducing additional constraints or adversarial techniques to separate identity features from domain-specific attributes, ensuring that the synthetic images are effective for PA detection while also protecting the privacy of individuals whose data is used in training. This enhancement aligns with broader ethical considerations in AI development and contributes to the responsible deployment of generative models, particularly in sensitive fields like biometric security.

(a) Performance of D-NetPAD in baseline Experiment-0, when it is trained using real images from the LivDet-2017 train set, compared with utility Experiment-1, when it is trained using real and synthetically generated images. The testing is done on the test set of Berc-iris-fake. (b) Performance of MobileNet-v2 in baseline Experiment-0, when it is trained using real images from the LivDet-2017 train set, compared with utility Experiment-1, when it is trained using real and synthetically generated images. The testing is done on the test set of LivDet-2020.
Figure 6.4: Performance of iris PA detectors when trained using only real images and when trained using real+synthetic images, showcasing the usefulness of the generated ocular PA dataset.

CHAPTER 7
SUMMARY AND FUTURE WORK

In the field of biometrics, the development of techniques for generating synthetic data has been pivotal across various modalities. This has motivated researchers to generate synthetic biometric data that exhibits the characteristics of real-world data. However, the existing methods for generating synthetic irides are confronted with limitations concerning quality, realism, and the capacity to capture both inter- and intra-class variations. In this research, we introduced novel approaches aimed at mitigating these issues, accompanied by a rigorous exploration of their practical utility via extensive experimentation and analysis:

Generating Partially-Synthetic Iris Images: As mentioned previously, the primary objective of partially-synthetic irides is to introduce controlled variations into genuine iris data, thereby enriching the diversity and robustness of the dataset. This proves particularly advantageous in scenarios where real data is scarce, imbalanced, or lacks specific variations. Consider, for instance, iris PA detection, where detection methods are tasked with identifying various PAs (e.g., printed eyes, cosmetic contact lenses) using a limited number of PA samples. In such a scenario, the training and testing of PA detection methods can be severely hampered due to the scarcity of PA data. Furthermore, as technology advances, novel and sophisticated PAs emerge (e.g., high-quality textured contact lenses, replay attacks with high-definition screens), rendering existing detection methods unable to generalize to new attacks.
We proposed three different approaches to address these challenges:

• Leveraging the Relativistic Average Standard Generative Adversarial Network (RaSGAN): Our study utilized RaSGAN to generate high-quality synthetic iris images, with a specific emphasis on their potential application in PA detection. Unlike conventional GANs, the incorporation of a "relativistic" discriminator and generator in RaSGAN bolsters the network’s generative power. The synthetic iris images generated through this method exhibit high resemblance to real iris images, capturing their intricate details and characteristics. We further investigated the utility of these synthetic images for training various PAD methods. Our experiments demonstrated that a PAD method trained on these synthetic images exhibits improved performance. Our approach holds promise in bolstering the security and reliability of iris recognition systems. We also proposed a one-class PA detector using the "relativistic" discriminator, known as RD-PAD, which aims to learn a decision boundary around bonafide samples and reject everything else as a PA, thus enabling it to generalize well to unseen PAs.

• Cyclic Image Translation Generative Adversarial Network (CIT-GAN): Here, we introduced a novel approach, known as CIT-GAN, designed for synthetic iris generation with style transfer to multiple target domains. This method incorporates a Styling Network, which learns the distinctive style characteristics of each domain represented in the training dataset. By leveraging the Styling Network, the generator is guided to translate images from a source domain to a reference domain, resulting in the generation of synthetic images imbued with the style characteristics of the reference or target domain. This approach is particularly pertinent in the context of iris PA detection, where we employ CIT-GAN to generate synthetic PA samples for under-represented classes in the training dataset. Through rigorous evaluation using various iris PAD methods, we demonstrate the effectiveness of these synthetically generated PA samples for training PAD models. Additionally, we gauge the realism of the synthetic images using the Fréchet Inception Distance (FID) score, which quantifies the similarity between the distributions of real and synthetic images. Our results emphasize the realism and utility of the synthetic images produced by the proposed method compared to other competing approaches.

• Multi-domain Image Translative Diffusion StyleGAN with Application in Iris Presentation Attack Detection (MID-StyleGAN): An iris biometric system can be vulnerable to presentation attacks (PAs), where artifacts like artificial eyes, printed eye images, or cosmetic contact lenses are used to deceive the system. To mitigate these threats, various presentation attack detection (PAD) methods have been proposed. However, the development and evaluation of iris PAD techniques face a significant challenge due to the lack of sufficient datasets, primarily because of the inherent difficulties in creating and capturing realistic PAs. To address this issue, we presented the Multi-domain Image Translative Diffusion StyleGAN (MID-StyleGAN), a novel framework designed to generate synthetic ocular images that effectively capture domain-specific information from iris PA datasets.
MID-StyleGAN leverages the strengths of diffusion models and generative adversarial networks (GANs) to create realistic and diverse synthetic data. It utilizes a multi-domain architecture that enables seamless translation between bonafide ocular images and various PA domains, while maintaining the biometric identity features. The framework incorporates an adaptive loss function specifically tailored for ocular data to ensure domain consistency. Experimental results demonstrate that MID-StyleGAN surpasses existing methods in generating high-quality synthetic ocular images, significantly enhancing PAD system performance.

While these methods can effectively generate partially synthetic iris images, they lack the capability of generating iris images with identities that are different from the training data, i.e., the identities generated by these methods resemble the identities present in the training data.

Generating Fully-Synthetic Iris Images: Fully-synthetic biometric data refers to entirely artificial biometric samples that do not correspond to any real individuals in the population. These synthetic samples are created using different methods to simulate the statistical properties and characteristics of real biometric data. To overcome the shortcomings of previous methods and generate fully-synthetic iris images, we introduced two novel frameworks:

• Disentangling Identity and Style to Generate Synthetic Iris Images using iWarpGAN: This framework consists of two transformation pathways: the Identity transformation pathway and the Style transformation pathway. The Identity transformation pathway is devised to modify the identity of the input iris image within the latent space, enabling the generation of iris images with identities distinct from those in the training set. On the other hand, the Style transformation pathway concentrates on generating iris images with varying styles. Style attributes are extracted from a reference iris image and a given attribute vector representing the attributes of that reference image. Together, these two transformation pathways in iWarpGAN facilitate the generation of iris images that exhibit diverse identities and styles and provide a comprehensive exploration of the latent space.

• Image Translative Diffusion GAN (IT-diffGAN): IT-diffGAN addresses the challenges posed by traditional GANs in generating synthetic biometric data. It leverages the stability and realism provided by diffusion models, using StyleGAN-3 as the backbone architecture. The method begins by projecting input images onto the latent space of the diffusion-GAN, where it identifies the features most pertinent to individual identity and style. A specialized identity and style metric is employed to calculate the distance between the original and generated images, enabling the model to learn which latent features affect identity and style. Once these features are identified, IT-diffGAN is trained to generate new identities by manipulating them in the latent space. This allows the generation of images with both inter- and intra-class variations while ensuring that the synthetic identities do not resemble anyone in the training data. By utilizing the diffusion-GAN framework, IT-diffGAN produces more realistic images and addresses common issues associated with traditional GANs, such as mode collapse and unstable training.
While the methods proposed in this thesis represent significant advancements in generating synthetic iris images, there remain several promising directions for future exploration. These directions could help further improve the realism, diversity, and practical application of synthetic biometric data, particularly in addressing current limitations in iris image generation. For example:

• Scaling Synthetic Data Generation: A critical objective for future research is to expand the scope and scale of synthetic datasets. Generative models should be capable of producing large-scale datasets that accurately reflect the diversity and complexity of real-world biometric data. This includes generating not only a higher number of images but also creating distinct identities that do not resemble those in the training data. Additionally, the scope of synthetic presentation attacks (PAs) needs to be broadened to include emerging and sophisticated attack methods such as high-resolution replay attacks, and other advanced techniques. Simulating combinations of multiple PA types would further improve the training of robust PAD systems to handle complex and unforeseen scenarios.

• Incorporating Multiple Controllable Attributes: Future advancements should explore generating iris images with controllable attributes such as gender, age, and other demographic or physical traits. For example: (1) Gender-Specific Features: generating datasets with distinct male and female iris characteristics can help improve recognition accuracy across different populations; (2) Age Progression and Variation: simulating age-related changes in iris patterns or generating images across various age groups can support studies on aging in biometrics and enhance the robustness of recognition systems; and (3) Customizable Features: attributes like iris color, environmental lighting, and even gaze direction could be integrated into generative models. This level of control would enable the creation of datasets tailored to specific application requirements, such as forensic analysis or testing under varied conditions.

• Text-to-Iris Generation: One emerging area of research is the use of text-to-image generation models for generating synthetic iris images. This involves employing natural language descriptions to guide the generation of biometric data. Current generative models like diffusion models and generative adversarial networks (GANs) have primarily relied on image-based input data, but the integration of text prompts allows for even greater control over the generated content. For instance, specifying text descriptions such as "dark brown iris with light radial streaks" or "large iris with distinct crypts" could enable the generation of highly specific iris images, potentially with desired features for certain applications, such as forensic analysis or iris recognition in low-light environments. Exploring text-to-iris generation could lead to more flexible and customizable generation techniques, allowing researchers to produce synthetic datasets tailored to specific needs, while also simplifying the dataset creation process.

• Leveraging Large Language Models (LLMs) for Iris Generation: Large Language Models (LLMs) such as GPT-4 have demonstrated impressive generative capabilities across a range of tasks, from text to code generation. Future work could explore the use of LLMs in the context of iris image generation, particularly for improving the realism and diversity of synthetic data.
LLMs could be integrated with existing image generation models to enhance the capability to generate new and diverse iris images. One possible direction is using LLMs to guide the latent space manipulation in GANs or diffusion models, improving the model’s ability to synthesize realistic and unique identities by interpreting detailed, semantic input. The incorporation of LLMs into the synthetic iris generation process could also aid in developing models that are more adept at creating specific variations required in biometric systems, like diverse lighting conditions or novel presentation attack scenarios.

• Multi-Spectrum Iris Generation (NIR and Visible): Another key area for future research is multi-spectrum iris image generation. Most iris recognition systems rely on near-infrared (NIR) images. However, there is growing interest in visible spectrum iris recognition for mobile and low-light environments, where NIR imagery may not always be feasible. Developing models capable of generating synthetic iris images across both the NIR and visible spectra would significantly expand the utility of synthetic iris datasets. For example, future work could focus on training generative models to learn the correlations between iris patterns captured in NIR and visible light, enabling the translation of a synthetic NIR iris image into a realistic visible spectrum image, and vice versa. This capability could be crucial for advancing biometric systems in mobile devices, where visible light conditions are more common, and for cross-spectrum iris recognition.

• Cross-Domain and Multi-Modal Iris Image Generation: As biometric systems increasingly incorporate multiple modalities (e.g., face and iris recognition), there is potential to explore multi-modal synthetic data generation. Future work could investigate how to leverage existing models to create synthetic data that captures both cross-domain and cross-modal biometric traits. For instance, generating images that simultaneously capture iris, face, and periocular features could provide comprehensive datasets for multi-biometric recognition systems. This direction could also enable better data augmentation techniques for systems that use multiple biometric traits for identity verification.

• Generative Models for Emerging Presentation Attack (PA) Techniques: While significant progress has been made in generating synthetic data to improve presentation attack detection (PAD) systems, as new PA techniques emerge, such as high-resolution display attacks, textured contact lenses, or replay attacks, future work will need to continue evolving generative models to account for these sophisticated attacks. Models like MID-StyleGAN could be further refined to simulate these more advanced PA scenarios, ensuring that PAD systems are robust to future threats. Additionally, exploring the integration of adversarial training and domain adaptation into synthetic data generation could help PAD systems better generalize across different PA types and environments.

• Enhancing Realism in Fully-Synthetic Identities: While IT-diffGAN and iWarpGAN make strides in generating fully synthetic iris images with distinct identities, future research can explore ways to enhance the realism of these fully-synthetic identities. This may involve improving the latent space representation to better capture the fine-grained details of iris patterns and incorporating feedback loops where the generated identities are compared with real-world data for further refinement.
Additionally, increasing the resolution and dynamic range of synthetic iris images could improve the quality of data used for training biometric systems, leading to more accurate recognition performance.

• Ethical Considerations and Privacy Preservation in Synthetic Data: As the field of synthetic biometric data continues to expand, future work must also consider the ethical implications and the importance of privacy preservation in generating fully synthetic biometric identities. Ensuring that synthetic data generation techniques do not inadvertently reproduce real-world identities, or mimic certain population characteristics disproportionately, is essential. Future research could involve developing privacy-preserving generative techniques that use differential privacy or privacy-aware learning frameworks to ensure that synthetic datasets remain ethically sound and do not compromise individual privacy.

The future of synthetic iris image generation holds immense promise, offering not only technological advancements but also solutions to some of the most pressing challenges in biometric systems. By exploring novel methods such as text-to-iris generation, leveraging large language models (LLMs), and developing multi-spectrum capabilities, the field is poised to overcome existing limitations in data diversity, realism, and security. At the same time, maintaining a balance between innovation and ethical responsibility is crucial. As synthetic data becomes more widespread in biometric research and applications, it is essential to prioritize privacy and fairness. The continued development of privacy-preserving techniques and frameworks will be critical in ensuring that the benefits of synthetic biometric data do not come at the expense of individual privacy or ethical integrity. Ultimately, the methods and directions proposed in this work pave the way for more robust, secure, and reliable biometric systems. By advancing the state of the art in synthetic iris generation, future research can contribute to the broader adoption of biometric technologies across various industries, ensuring their long-term viability and impact in a rapidly evolving digital world.

BIBLIOGRAPHY

[1] The Iris Challenge Evaluation (ICE) 2005 conducted by the National Institute of Standards and Technology (NIST). https://www.nist.gov/programs-projects/iris-challenge-evaluation-ice, 2005.

[2] The Iris Challenge Evaluation (ICE) 2006 conducted by the National Institute of Standards and Technology (NIST). https://www.nist.gov/programs-projects/iris-challenge-evaluation-ice, 2006.

[3] ISO/IEC 29794-6. Information technology – Biometric sample quality – Part 6: Iris image data. Standard, International Organization for Standardization, Geneva, CH. https://www.iso.org/standard/54066.html, 2014.

[4] CASIA Fingerprint Image Database Version 5.0. http://biometrics.idealtest.org/dbDetailForUser.do?id=4, 2017.

[5] CASIA Iris Image Database Version 4.0. http://biometrics.idealtest.org/dbDetailForUser.do?id=4, 2017.

[6] IIT Delhi Database. http://www4.comp.polyu.edu.hk/~csajaykr/IITD/Database_Iris.htm, 2017.

[7] Amjad Almahairi, Sai Rajeswar, Alessandro Sordoni, Philip Bachman, and Aaron Courville. Augmented CycleGAN: Learning many-to-many mappings from unpaired data. arXiv preprint arXiv:1802.10151, 2018.

[8] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning (ICML), pages 214–223, 2017.
[9] Pankaj Bamoriya, Gourav Siddhad, Harkeerat Kaur, Pritee Khanna, and Aparajita Ojha. Dsb-gan: Generation of deep learning based synthetic biometric data. Displays, 74:102267, 2022. [10] Michael Banf and Volker Blanz. Example-based rendering of eye movements. In Computer Graphics Forum, volume 28, pages 659–666. Wiley Online Library, 2009. [11] Dor Bank, Noam Koenigstein, and Raja Giryes. Autoencoders. arXiv preprint arXiv:2003.05991, 2020. [12] Shane Barratt and Rishi Sharma. A note on the inception score. International Conference on Machine Learning Workshops (ICMLW), 2018. [13] [14] Jacob Benesty, M Mohan Sondhi, Yiteng Huang, et al. Springer handbook of speech processing, volume 1. Springer, 2008. Jordan J Bird, Diego R Faria, Cristiano Premebida, Anikó Ekárt, and Pedro PS Ayrosa. Overcoming data scarcity in speaker identification: Dataset augmentation with synthetic mfccs via character-level rnn. In 2020 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), pages 146–151. IEEE, 2020. 149 [15] Ruud M Bolle, Jonathan H Connell, Sharath Pankanti, Nalini K Ratha, and Andrew W Senior. Guide to biometrics. Springer Science & Business Media, 2013. [16] Fadi Boutros, Marco Huber, Patrick Siebke, Tim Rieber, and Naser Damer. Sface: Privacy- friendly and accurate face recognition using synthetic data. In 2022 IEEE International Joint Conference on Biometrics (IJCB), pages 1–11. IEEE, 2022. [17] Kevin W Bowyer and Mark J Burge. Handbook of iris recognition. Springer, 2016. [18] Raffaele Cappelli, Dario Maio, Davide Maltoni, et al. SFinGe: Synthetic Fingerprint Generator. 2004. [19] Luís Cardoso, André Barbosa, Frutuoso Silva, António MG Pinheiro, and Hugo Proença. Iris biometrics: Synthesis of degraded ocular images. IEEE Transactions on information forensics and security, 8(7):1115–1125, 2013. [20] Huiwen Chang, Jingwan Lu, Fisher Yu, and Adam Finkelstein. Paired cycleGAN: Asymmet- ric style transfer for applying and removing makeup. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 40–48, 2018. [21] Cunjian Chen and Arun Ross. A multi-task convolutional neural network for joint iris In IEEE Winter Applications of Computer detection and presentation attack detection. Vision Workshops (WACVW), pages 44–51, 2018. [22] Rui Chen, Xirong Lin, and Tianhuai Ding. Liveness detection for iris recognition using multispectral images. Pattern Recognition Letters, 33(12):1513–1519, 2012. [23] Manuela Chessa, Guido Maiello, Alessia Borsari, and Peter J Bex. The perceptual quality of the oculus rift for immersive virtual reality. Human–computer interaction, 34(1):51–82, 2019. [24] Wonwoong Cho, Sungha Choi, David Keetae Park, Inkyu Shin, and Jaegul Choo. Image-to- image translation via group-wise deep whitening-and-coloring transformation. In Proceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10639– 10647, 2019. [25] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. StarGAN: Unified generative adversarial networks for multi-domain image-to-image trans- lation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8789–8797, 2018. [26] Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8188–8197, 2020. [27] Timothy F Cootes, Christopher J Taylor, David H Cooper, and Jim Graham. 
Active shape models-their training and application. Computer vision and image understanding, 61(1):38– 59, 1995. 150 [28] National Research Council, Whither Biometrics Committee, et al. Biometric recognition: Challenges and opportunities. 2010. [29] Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, and Anil A Bharath. Generative adversarial networks: An overview. IEEE signal processing magazine, 35(1):53–65, 2018. [30] S Crihalmeanu, Arun Ross, Stephanie Schuckers, and L Hornak. A protocol for multibiomet- ric data acquisition, storage and dissemination. Technical report, Technical Report, WVU, Lane Department of Computer Science and Electrical . . . , 2007. [31] Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10850–10869, 2023. [32] Jiali Cui, Yunhong Wang, JunZhou Huang, Tieniu Tan, and Zhenan Sun. An iris image synthesis method based on pca and super-resolution. In Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., volume 4, pages 471–474 Vol.4, 2004. [33] Adam Czajka. Database of iris printouts and its application: Development of liveness detection method for iris recognition. In 2013 18th International Conference on Methods & Models in Automation & Robotics (MMAR), pages 28–33. IEEE, 2013. [34] Adam Czajka and Kevin W Bowyer. Presentation attack detection for iris recognition: An assessment of the state-of-the-art. ACM Computing Surveys (CSUR), 51(4):1–35, 2018. [35] Fida K Dankar and Mahmoud Ibrahim. Fake it till you make it: Guidelines for effective synthetic data generation. Applied Sciences, 11(5):2158, 2021. [36] Priyanka Das, Joseph McGrath, Zhaoyuan Fang, Aidan Boyd, Ganghee Jang, Amir Moham- madi, Sandip Purnapatra, David Yambay, Sébastien Marcel, Mateusz Trokielewicz, et al. Iris liveness detection competition (LivDet-Iris)–the 2020 edition. In IEEE International Joint Conference on Biometrics (IJCB), 2020. [37] John Daugman. New methods in iris recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 37(5):1167–1175, 2007. [38] Parth Rajesh Desai, Pooja Nikhil Desai, Komal Deepak Ajmera, and Khushbu Mehta. A review paper on oculus rift-a virtual reality headset. arXiv preprint arXiv:1408.1173, 2014. [39] Yaohui Ding and Arun Ross. An ensemble of one-class SVMs for fingerprint spoof detection In IEEE International Workshop on Information across different fabrication materials. Forensics and Security (WIFS), pages 1–6, 2016. [40] Garoe Dorta, Sara Vicente, Neill DF Campbell, and Ivor JA Simpson. The GAN that warped: Semantic attribute editing with unpaired data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5356–5365, 2020. 151 [41] James S Doyle and Kevin W Bowyer. Robust detection of textured contact lenses in iris recognition using BSIF. IEEE Access, 3:1672–1683, 2015. [42] Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Mar- arXiv preprint tin Arjovsky, and Aaron Courville. Adversarially learned inference. arXiv:1606.00704, 2016. [43] Melissa Edwards, Agnes Gozdzik, Kendra Ross, Jon Miles, and Esteban J Parra. Quantitative measures of iris color using high resolution photographs. American journal of physical anthropology, 147(1):141–149, 2012. [44] [45] Joshua J Engelsma and Anil K Jain. Generalizing Fingerprint Spoof Detector: Learning a One-Class Classifier. 
arXiv preprint arXiv:1901.03918, 2019. Joshua James Engelsma, Steven Grosz, and Anil K Jain. Printsgan: Synthetic fingerprint generator. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):6111– 6124, 2022. [46] Michael D. Fairhurst and Charles I. Watson. Synthetic biometrics: The path ahead. In Handbook of Biometric Anti-Spoofing, pages 525–551. 2019. [47] Meiling Fang, Marco Huber, and Naser Damer. Synthaspoof: Developing face presentation attack detection based on privacy-friendly synthetic data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1061–1070, 2023. [48] Marcos Faundez-Zanuy. Signature recognition state-of-the-art. IEEE aerospace and elec- tronic systems magazine, 20(7):28–32, 2005. [49] Zhen-Hua Feng, Guosheng Hu, Josef Kittler, William Christmas, and Xiao-Jun Wu. Cas- caded collaborative regression for robust facial landmark detection trained using a mixture of synthetic and real images with dynamic weighting. IEEE Transactions on Image Processing, 24(11):3425–3440, 2015. [50] [51] Javier Galbally, Arun Ross, Marta Gomez-Barrero, Julian Fierrez, and Javier Ortega-Garcia. Iris image reconstruction from binary templates: An efficient probabilistic approach based on genetic algorithms. Computer Vision and Image Understanding, 117:1512–1525, 10 2013. Javier Galbally, Arun Ross, Marta Gomez-Barrero, Julian Fierrez, and Javier Ortega-Garcia. Iris image reconstruction from binary templates: An efficient probabilistic approach based on genetic algorithms. Computer Vision and Image Understanding, 117(10):1512–1525, 2013. [52] Yaroslav Ganin, Daniil Kononenko, Diana Sungatullina, and Victor Lempitsky. Deepwarp: Photorealistic image resynthesis for gaze manipulation. In European Conference on Com- puter Vision, pages 311–326. Springer, 2016. [53] Leon Gatys, Alexander S Ecker, and Matthias Bethge. Texture synthesis using convolutional In Advances in Neural Information Processing Systems (NIPS), pages neural networks. 262–270, 2015. 152 [54] Jiahao Geng, Tianjia Shao, Youyi Zheng, Yanlin Weng, and Kun Zhou. Warp-guided GANs for single-photo facial animation. ACM Transactions on Graphics (ToG), 37(6):1–12, 2018. [55] Edd Gent. A cryptocurrency for the masses or a universal id?: Worldcoin aims to scan all the world’s eyeballs. IEEE Spectrum, 2023. [56] [57] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Nets. pages 2672–2680, 2014. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672–2680, 2014. [58] Steven A Grosz and Anil K Jain. Spoofgan: Synthetic fingerprint spoof images. IEEE Transactions on Information Forensics and Security, 18:730–743, 2022. [59] Mehak Gupta, Vishal Singh, Akshay Agarwal, Mayank Vatsa, and Richa Singh. Generalized In 25th IEEE iris presentation attack detection algorithm under cross-database settings. International Conference on Pattern Recognition (ICPR), pages 5318–5325, 2021. [60] Priyanshu Gupta, Shipra Behera, Mayank Vatsa, and Richa Singh. On iris spoofing using print attack. In IEEE International Conference on Pattern Recognition (ICPR), pages 1681– 1686, 2014. [61] Jian Han, Sezer Karaoglu, Hoang-An Le, and Theo Gevers. performance with 3d-rendered synthetic data. arXiv preprint arXiv:1812.07363, 2018. 
Improving face detection [62] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. [63] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Günter Klam- bauer, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a Nash equilibrium. arXiv preprint arXiv:1706.08500, 2017. [64] Steven Hoffman, Renu Sharma, and Arun Ross. Convolutional neural networks for iris presentation attack detection: Toward cross-dataset and cross-sensor generalization. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1620–1628, 2018. [65] Steven Hoffman, Renu Sharma, and Arun Ross. Iris + Ocular: Generalized Iris Presentation Attack Detection Using Multiple Convolutional Neural Networks. In IAPR International Conference on Biometrics (ICB), 2019. [66] Cheng-Shun Hsiao, Chih-Peng Fan, and Yin-Tsung Hwang. Design and analysis of deep- learning based iris recognition technologies by combination of u-net and efficientnet. In 9th International Conference on Information and Education Technology (ICIET), pages 433–437. IEEE, 2021. 153 [67] Huaibo Huang, Ran He, Zhenan Sun, Tieniu Tan, et al. Introvae: Introspective variational autoencoders for photographic image synthesis. Advances in neural information processing systems, 31, 2018. [68] Muhammad Zahid Iqbal and Abraham G Campbell. Adopting smart glasses responsibly: potential benefits, ethical, and privacy concerns with ray-ban stories. AI and Ethics, 3(1):325– 327, 2023. [69] Anil K Jain, Patrick Flynn, and Arun A Ross. Handbook of biometrics. Springer Science & Business Media, 2007. [70] Anil K Jain and Stan Z Li. Handbook of face recognition, volume 1. Springer, 2011. [71] Anil K Jain, Karthik Nandakumar, and Arun Ross. 50 years of biometric research: Accom- plishments, challenges, and opportunities. Pattern recognition letters, 79:80–105, 2016. [72] Alexia Jolicoeur-Martineau. The relativistic discriminator: a key element missing from standard GAN. arXiv preprint arXiv:1807.00734, 2018. [73] Indu Joshi, Marcel Grimmer, Christian Rathgeb, Christoph Busch, Francois Bremond, and Antitza Dantcheva. Synthetic data in human analysis: A survey. arXiv preprint arXiv:2208.09191, 2022. [74] Masashi Kanematsu, Hironobu Takano, and Kiyomi Nakamura. Highly reliable liveness detection method for iris recognition. In SICE Annual Conference 2007, pages 361–364. IEEE, 2007. [75] Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. Advances in neural information processing systems, 33:12104–12114, 2020. [76] Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Alias-free generative adversarial networks. Advances in neural information processing systems, 34:852–863, 2021. [77] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4401–4410, 2019. [78] Patrik Joslin Kenfack, Daniil Dmitrievich Arapov, Rasheed Hussain, SM Ahsan Kazmi, and Adil Khan. On the fairness of generative adversarial networks (gans). In 2021 International Conference" Nonlinearity, Information and Robotics"(NIR), pages 1–7. IEEE, 2021. 
[79] Jihyun Kim, Changjae Oh, Hoseok Do, Soohyun Kim, and Kwanghoon Sohn. Diffusion- In Proceedings of the driven GAN Inversion for Multi-Modal Face Image Generation. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10403–10412, 2024. 154 [80] Naman Kohli, Daksha Yadav, Mayank Vatsa, and Richa Singh. Revisiting iris recognition with color cosmetic contact lenses. In IEEE International Conference on Biometrics (ICB), pages 1–7, 2013. [81] Naman Kohli, Daksha Yadav, Mayank Vatsa, Richa Singh, and Afzel Noore. Detecting med- ley of iris spoofing attacks using DESIST. In IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), pages 1–6, 2016. [82] Naman Kohli, Daksha Yadav, Mayank Vatsa, Richa Singh, and Afzel Noore. Synthetic iris presentation attack using idcgan. In 2017 IEEE International Joint Conference on Biometrics (IJCB), pages 674–680. IEEE, 2017. [83] Adam Kortylewski, Bernhard Egger, Andreas Schneider, Thomas Gerig, Andreas Morel- Forster, and Thomas Vetter. Analyzing and reducing the damage of dataset bias to face recognition with synthetic data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019. [84] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep In Advances in Neural Information Processing Systems, convolutional neural networks. pages 1097–1105, 2012. [85] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012. [86] Ajay Kumar and Arun Passi. Comparison and combination of iris matchers for reliable personal authentication. Pattern recognition, 43(3):1016–1026, 2010. [87] Minh-Ha Le and Niklas Carlsson. Styleid: Identity disentanglement for anonymizing faces. arXiv preprint arXiv:2212.13791, 2022. [88] Eui Chul Lee, You Jin Ko, and Kang Ryoung Park. Fake iris detection method using purkinje images based on gaze position. Optical Engineering, 47(6):1 – 16, 2008. [89] Lik-Hang Lee and Pan Hui. Interaction methods for smart glasses: A survey. IEEE access, 6:28712–28732, 2018. [90] Sung Joo Lee, Kang Ryoung Park, and Jaihie Kim. Robust fake iris detection based on vari- ation of the reflectance ratio between the iris and the sclera. In IEEE Biometrics Symposium: Special Session on Research at the Biometric Consortium Conference, pages 1–6, 2006. [91] Sung Joo Lee, Kang Ryoung Park, Youn Joo Lee, Kwanghyuk Bae, and Jai Hie Kim. Multifeature-based fake iris detection method. Optical Engineering, 46(12):127204, 2007. [92] Chenyang Li, Zhili Zhang, Peipei Li, and Zhaofeng He. I3FDM: IRIS Inpainting Via Inverse In ICASSP 2024-2024 IEEE International Conference on Fusion of Diffusion Models. Acoustics, Speech and Signal Processing (ICASSP), pages 1636–1640. IEEE, 2024. [93] Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015. 155 [94] Sarvesh Makthal and Arun Ross. Synthesis of iris images using markov random fields. In 2005 13th European Signal Processing Conference, pages 1–4, 2005. [95] Davide Maltoni, Dario Maio, Anil K Jain, and Salil Prabhakar. Synthetic fingerprint gener- ation. Handbook of fingerprint recognition, pages 271–302, 2009. [96] Davide Maltoni, Dario Maio, Anil K Jain, Salil Prabhakar, et al. Handbook of fingerprint recognition, volume 2. Springer, 2009. 
[97] Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smol- ley. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 2794–2802, 2017. [98] David Menotti, Giovani Chiachia, Allan da Silva Pinto, William Robson Schwartz, Hélio Pedrini, Alexandre Xavier Falcao, and Anderson Rocha. Deep representations for iris, face, In IEEE Transactions on Information Forensics and and fingerprint spoofing detection. Security, 10:864–879, 2015. [99] Denis Migdal and Christophe Rosenberger. Statistical modeling of keystroke dynamics samples for the generation of synthetic datasets. Future Generation Computer Systems, 100:907–920, 2019. [100] Shervin Minaee and Amirali Abdolrashidi. Iris-gan: Learning to generate realistic iris images using convolutional gan. arXiv preprint arXiv:1812.04822, 2018. [101] Shervin Minaee and Amirali Abdolrashidi. Deepiris: Iris recognition using a deep learning approach. arXiv preprint arXiv:1907.09380, 2019. [102] Yisroel Mirsky and Wenke Lee. The creation and detection of deepfakes: A survey. ACM Computing Surveys (CSUR), 54(1):1–41, 2021. [103] Mohammad Nabati, Hojjat Navidan, Reza Shahbazian, Seyed Ali Ghorashi, and David Windridge. Using synthetic data to enhance the accuracy of fingerprint-based localization: A deep learning approach. IEEE Sensors Letters, 4(4):1–4, 2020. [104] Kien Nguyen, Clinton Fookes, Raghavender Jillela, Sridha Sridharan, and Arun Ross. Long range iris recognition: A survey. Pattern Recognition, 72:123–143, 2017. [105] Ishan Nigam, Mayank Vatsa, and Richa Singh. Ocular biometrics: A survey of modalities and fusion approaches. Information Fusion, 26:1–35, 2015. [106] Olegs Nikisins, Amir Mohammadi, André Anjos, and Sébastien Marcel. On effectiveness of anomaly detection approaches against unseen presentation attacks in face anti-spoofing. In IAPR International Conference on Biometrics (ICB), 2018. [107] Behnaz Nojavanasghari, Charles E Hughes, Tadas Baltrušaitis, and Louis-Philippe Morency. In 2017 Hand2face: Automatic synthesis and recognition of hand over face occlusions. Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), pages 209–215. IEEE, 2017. 156 [108] Margarita Osadchy, Yan Wang, Orr Dunkelman, Stuart Gibson, Julio Hernandez-Castro, and Christopher Solomon. Genface: Improving cyber security using realistic synthetic face generation. In Cyber Security Cryptography and Machine Learning: First International Conference, CSCML 2017, Beer-Sheva, Israel, June 29-30, 2017, Proceedings 1, pages 19–33. Springer, 2017. [109] Melih Öz, TANER DANIŞMAN, Melih Günay, Esra Şanal, Özgür Duman, and JOSEPH LEDET. The use of synthetic data to facilitate eye segmentation using deeplabv3+. Annals of Emerging Technologies in Computing, 5(3), 2021. [110] Unsang Park, Yiying Tong, and Anil K Jain. Age-invariant face recognition. IEEE transac- tions on pattern analysis and machine intelligence, 32(5):947–954, 2010. [111] Alex Perala. Princeton identity tech powers galaxy s8 iris scanning. https://mobileidworld. com/princeton-identity-galaxy-s8-iris-003312, 2017. [112] H. Proenca, S. Filipe, R. Santos, J. Oliveira, and L.A. Alexandre. The UBIRIS.v2: A database of visible wavelength images captured on-the-move and at-a-distance. IEEE Trans. PAMI, 32(8):1529–1535, August 2010. [113] H. Proença and L.A. Alexandre. UBIRIS: A noisy iris image database. 
In 13th International Conference on Image Analysis and Processing - ICIAP 2005, volume LNCS 3617, pages 970–977, Cagliari, Italy, September 2005. Springer. [114] Haibo Qiu, Baosheng Yu, Dihong Gong, Zhifeng Li, Wei Liu, and Dacheng Tao. Syn- face: Face recognition with synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10880–10890, 2021. [115] R. Raghavendra and C. Busch. Presentation attack detection algorithm for face and iris In European Signal Processing Conference (EUSIPCO), pages 1387–1391, biometrics. 2014. [116] Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. Encoding in style: a StyleGAN encoder for image-to-image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2287–2296, 2021. [117] Arun Ross, Sudipta Banerjee, Cunjian Chen, Anurag Chowdhury, Vahid Mirjalili, Renu Sharma, Thomas Swearingen, and Shivangi Yadav. Some Research Problems in Biometrics: The Future Beckons. In IAPR International Conference on Biometrics (ICB), 2019. [118] Arun A Ross, Karthik Nandakumar, and Anil K Jain. Handbook of multibiometrics, volume 6. Springer Science & Business Media, 2006. [119] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and In Advances in Neural Information Improved techniques for training GANs. Xi Chen. Processing Systems (NIPS), pages 2234–2242, 2016. 157 [120] Tim Salimans, Han Zhang, Alec Radford, and Dimitris Metaxas. Improving GANs using optimal transport. arXiv preprint arXiv:1803.05573, 2018. [121] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4510–4520, 2018. [122] Gil Santos, Emanuel Grancho, Marco V Bernardo, and Paulo T Fiadeiro. Fusing iris and periocular information for cross-sensor recognition. Pattern Recognition Letters, 57:52–59, 2015. [123] Thomas Schlegl, Philipp Seeböck, Sebastian M Waldstein, Ursula Schmidt-Erfurth, and Georg Langs. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging (IPMI), pages 146–157, 2017. [124] Robin M Schmidt. Recurrent neural networks (rnns): A gentle introduction and overview. arXiv preprint arXiv:1912.05911, 2019. [125] Ana Sequeira, Lulu Chen, Peter Wild, James Ferryman, Fernando Alonso-Fernandez, Ki- ran B Raja, Ramachandra Raghavendra, Christoph Busch, and Joseph Bigun. Cross-eyed- cross-spectral iris/periocular recognition database and competition. In IEEE International Conference of the Biometrics Special Interest Group (BIOSIG), pages 1–5, 2016. [126] Samir Shah and Arun Ross. Generating synthetic irises by feature agglomeration. In 2006 international conference on image processing, pages 317–320. IEEE, 2006. [127] Anjali Sharma, Shalini Verma, Mayank Vatsa, and Richa Singh. On cross spectral periocular recognition. In IEEE International Conference on Image Processing (ICIP), pages 5007– 5011, 2014. [128] Renu Sharma and Arun Ross. D-netpad: An explainable and interpretable iris presentation attack detector. In 2020 IEEE international joint conference on biometrics (IJCB), pages 1–10. IEEE, 2020. [129] Renu Sharma and Arun Ross. D-NetPAD: An Explainable and Interpretable Iris Presentation Attack Detector. 
In IEEE International Joint Conference on Biometrics (IJCB), 2020. [130] Renu Sharma and Arun Ross. Viability of optical coherence tomography for iris presentation attack detection. In 25th IEEE International Conference on Pattern Recognition (ICPR), pages 6165–6172, 2021. [131] Joseph Shelton, Kaushik Roy, Brian O’Connor, and Gerry V Dozier. Mitigating iris-based replay attacks. International Journal of Machine Learning and Computing (JMLC), 4(3):204, 2014. [132] Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Joshua Susskind, Wenda Wang, and Russell Webb. Learning from simulated and unsupervised images through adversarial training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2107–2116, 2017. 158 [133] David P Sidlauskas and Samir Tamer. Hand geometry recognition. Handbook of Biometrics, pages 91–107, 2008. [134] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. [135] Michał Stypułkowski, Konstantinos Vougioukas, Sen He, Maciej Zięba, Stavros Petridis, and Maja Pantic. Diffused heads: Diffusion models beat GANs on talking-face generation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5091–5100, 2024. [136] Z. Sun, H. Zhang, T. Tan, and J. Wang. Iris image classification based on hierarchical visual codebook. In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(6):1120–1133, 2014. [137] Dwi Joko Suroso, Panarat Cherntanomwong, and Pitikhate Sooraksa. Synthesis of a small fingerprint database through a deep generative model for indoor localisation. Elektronika ir Elektrotechnika, 29(1):69–75, 2023. [138] Fariborz Taherkhani, Aashish Rai, Quankai Gao, Shaunak Srivastava, Xuanbai Chen, Fer- nando de la Torre, Steven Song, Aayush Prakash, and Daeil Kim. Controllable 3d generative In Proceedings of the adversarial face model via disentangling shape and appearance. IEEE/CVF Winter Conference on Applications of Computer Vision, pages 826–836, 2023. [139] Pin Shen Teh, Andrew Beng Jin Teoh, and Shigang Yue. A survey of keystroke dynamics biometrics. The Scientific World Journal, 2013, 2013. [140] Patrick Tinsley, Adam Czajka, and Patrick J. Flynn. Haven’t I Seen You Before? Assessing Identity Leakage in Synthetic Irises. In IEEE International Joint Conference on Biometrics (IJCB), pages 1–9, 2022. [141] Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf. Wasserstein auto-encoders. arXiv preprint arXiv:1711.01558, 2017. [142] Christos Tzelepis, Georgios Tzimiropoulos, and Ioannis Patras. WarpedGANSpace: Finding non-linear RBF paths in GAN latent space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6393–6402, 2021. [143] Aaron Van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. In Advances in Neural Information Conditional image generation with CNN decoders. Processing Systems (NIPS), pages 4790–4798, 2016. [144] Shreyas Venugopalan and Marios Savvides. How to generate spoofed irises from an iris code template. In IEEE Transactions on Information Forensics and Security (TIFS), 6(2):385–395, 2011. [145] Paul Voigt and Axel Von dem Bussche. The EU General Data Protection Regulation (GDPR). A Practical Guide, 1st Ed., Cham: Springer International Publishing, 10(3152676):10– 5555, 2017. 159 [146] Zhiqiang Wan, Yazhou Zhang, and Haibo He. 
Variational autoencoder based synthetic data generation for imbalanced learning. In 2017 IEEE symposium series on computational intelligence (SSCI), pages 1–7. IEEE, 2017. [147] Chen Wang, Zhaofeng He, Caiyong Wang, and Qing Tian. Generating intra-and inter-class iris images by identity contrast. In 2022 IEEE International Joint Conference on Biometrics (IJCB), pages 1–7. IEEE, 2022. [148] Lakin Wecker, Faramarz Samavati, and Marina Gavrilova. Iris synthesis: a reverse subdivi- sion application. In Proceedings of the 3rd International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, pages 121–125. ACM, 2005. [149] Zhuoshi Wei, Tieniu Tan, and Zhenan Sun. Synthesis of large realistic iris databases using patch-based sampling. In 2008 19th International Conference on Pattern Recognition, pages 1–4, 2008. [150] Erroll Wood, Tadas Baltrušaitis, Louis-Philippe Morency, Peter Robinson, and Andreas Bulling. A 3d morphable model of the eye region. Optimization, 1:0, 2016. [151] André Brasil Vieira Wyzykowski and Anil K Jain. Synthetic latent fingerprint generator. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 971–980, 2023. [152] Lihu Xiao, Zhenan Sun, Ran He, and Tieniu Tan. Coupled feature selection for cross- In IEEE Sixth International Conference on Biometrics: Theory, sensor iris recognition. Applications and Systems (BTAS), pages 1–6, 2013. [153] Zhisheng Xiao, Karsten Kreis, and Arash Vahdat. Tackling the generative learning trilemma with denoising diffusion gans. arXiv preprint arXiv:2112.07804, 2021. [154] Minrui Xu, Dusit Niyato, Junlong Chen, Hongliang Zhang, Jiawen Kang, Zehui Xiong, Shiwen Mao, and Zhu Han. Generative ai-empowered simulation for autonomous driving in vehicular mixed reality metaverses. arXiv preprint arXiv:2302.08418, 2023. [155] D. Yadav, N. Kohli, J. S. Doyle, R. Singh, M. Vatsa, and K. W. Bowyer. Unraveling the effect of textured contact lenses on iris recognition. In IEEE Transactions on Information Forensics and Security (TIFS), 9:851–862, 2014. [156] Shivangi Yadav, Cunjian Chen, and Arun Ross. Synthesizing iris images using rasgan with application in presentation attack detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019. [157] Shivangi Yadav, Cunjian Chen, and Arun Ross. Relativistic Discriminator: A One-Class Classifier for Generalized Iris Presentation Attack Detection. In IEEE Winter Conference on Applications of Computer Vision, pages 2635–2644, 2020. [158] Shivangi Yadav and Arun Ross. Cit-gan: Cyclic image translation generative adversarial net- work with application in iris presentation attack detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2412–2421, 2021. 160 [159] Shivangi Yadav and Arun Ross. iwarpgan: Disentangling identity and style to generate synthetic iris images. arXiv preprint arXiv:2305.12596, 2023. [160] Shivangi Yadav and Arun Ross. Synthesizing iris images using generative adversarial networks: Survey and comparative analysis. arXiv preprint arXiv:2404.17105, 2024. [161] D. Yambay, B. Walczak, S. Schuckers, and A. Czajka. LivDet-Iris 2015 - iris liveness detection competition 2015. In IEEE International Conference on Identity, Security and Behavior Analysis (ISBA), pages 1–6, 2017. 
[162] David Yambay, Benedict Becker, Naman Kohli, Daksha Yadav, Adam Czajka, Kevin W Bowyer, Stephanie Schuckers, Richa Singh, Mayank Vatsa, Afzel Noore, et al. LivDet Iris 2017 - Iris Liveness Detection Competition. In IEEE International Joint Conference on Biometrics (IJCB), pages 733–741, 2017.
[163] Dingdong Yang, Seunghoon Hong, Yunseok Jang, Tianchen Zhao, and Honglak Lee. Diversity-sensitive conditional generative adversarial networks. arXiv preprint arXiv:1901.09024, 2019.
[164] Svetlana N Yanushkevich, Adrian Stoica, Vlad P Shmerko, and Denis V Popel. Biometric inverse problems. CRC Press, 2018.
[165] Raymond Yeh, Ziwei Liu, Dan B Goldman, and Aseem Agarwala. Semantic facial expression editing using autoencoded flow. arXiv preprint arXiv:1611.09961, 2016.
[166] Bassel Zeno, Ilya Kalinovskiy, and Yuri Matveev. IP-GAN: Learning identity and pose disentanglement in generative adversarial networks. In International Conference on Artificial Neural Networks, pages 535–547. Springer, 2019.
[167] Man Zhang, Qi Zhang, Zhenan Sun, Shujuan Zhou, and Nasir Uddin Ahmed. The BTAS competition on mobile iris recognition. In IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), pages 1–7, 2016.
[168] Mei Zhang, Jinglan Wu, Huifeng Lin, Peng Yuan, and Yanan Song. The application of one-class classifier based on CNN in image defect detection. Procedia Computer Science, 114:341–348, 2017.
[169] Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, and Alexei A Efros. View synthesis by appearance flow. In European Conference on Computer Vision, pages 286–301. Springer, 2016.
[170] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223–2232, 2017.
[171] Tianlei Zhu, Junqi Chen, Renzhe Zhu, and Gaurav Gupta. StyleGAN3: Generative networks for improving the equivariance of translation and rotation. arXiv preprint arXiv:2307.03898, 2023.
[172] Hang Zou, Hui Zhang, Xingguang Li, Jing Liu, and Zhaofeng He. Generation textured contact lenses iris images based on 4DCycle-GAN. In 2018 24th International Conference on Pattern Recognition (ICPR), pages 3561–3566. IEEE, 2018.
[173] Jinyu Zuo, Natalia A Schmid, and Xiaohan Chen. On generation and analysis of synthetic iris images. IEEE Transactions on Information Forensics and Security, 2(1):77–90, 2007.
APPENDIX
SYNTHETIC IRIS IMAGES USING GENERATIVE ADVERSARIAL NETWORKS: SURVEY AND COMPARATIVE ANALYSIS
This work is nearing completion and is being prepared for submission to ACM Computing Surveys.
Introduction
Iris-based biometric recognition systems have gained significant attention in recent years, with applications in various domains [55, 105, 111]. However, as this field advances, it brings forth a range of research challenges that require exploration and innovation. Notably, a prominent challenge in biometrics research is the unavailability of datasets with sufficient size, quality, and diverse intra-class variations. Despite significant strides in biometric technology, the datasets commonly employed for training and assessing these systems frequently fall short in terms of sample quantity and coverage of the full spectrum of intra-class variability. This deficiency hinders the development and dependable evaluation of iris recognition systems.
Another critical challenge is to ensure the privacy of individuals who submit their biometric data. Biometric information, being inherently personal and unique, raises concerns about unauthorized access, data breaches, and potential identity theft. Protecting the privacy of individuals while utilizing their biometric data is of utmost importance to foster trust and encourage widespread adoption of biometric systems [145]. Researchers have been actively working on finding solutions to overcome these challenges. One such solution is to explore the potential of synthetically generated iris datasets.
Synthetic iris generation refers to the creation of artificial iris images that replicate real-world characteristics. It encompasses the generation of synthetic samples that emulate the traits and patterns observed in authentic iris images [35]. These synthetic samples are designed to possess statistical properties and variations similar to genuine iris data, providing a valuable resource for research, development, and evaluation in the field of biometrics. Synthetically generated iris images can therefore be utilized to produce more data with both inter- and intra-class variations. This helps overcome the limitations and challenges associated with conventional iris datasets. Traditional iris datasets often suffer from restricted sample sizes, lack diversity, and raise concerns regarding privacy and data sharing. In contrast, synthetically generated irides offer a controlled and scalable solution by generating artificial iris data that can mimic the complexity and diversity of real-world biometric traits [73].
By generating synthetic irides, researchers and developers can gain access to larger and more diverse datasets that facilitate comprehensive testing and optimization of iris-based biometric algorithms, enhancing their performance, reliability, security and generalization capabilities. Additionally, synthetically generated irides address privacy concerns associated with the utilization of genuine biometric data. The artificial nature of the generated data ensures that it is not directly linked to any specific individual, mitigating the risks of unauthorized access or misuse of personal information. Consequently, synthetic biometric datasets can be shared and distributed for research and evaluation purposes without compromising individuals' privacy rights. Furthermore, synthetically generated irides play a crucial role in enhancing the training and testing of deep convolutional neural network (CNN) models. Deep CNNs have demonstrated remarkable performance in various biometric tasks but rely heavily on large labeled datasets for effective training. Synthetically generated irides can aid in augmenting the availability of labeled training data by creating synthetic samples with known ground-truth annotations. This facilitates the creation of more extensive and diverse training sets, resulting in improved CNN model training and higher accuracy in iris recognition systems. Moreover, synthetically generated irides contribute to the development of robust presentation attack detection (PAD) algorithms. Presentation attacks (PAs), where impostors attempt to deceive biometric systems using fabricated or altered irides, pose significant security risks. Various presentation attack scenarios can be simulated by generating synthetic irides with diverse variations and attack types (for example, cosmetic contact lenses, printed eyes, artificial eyes, etc.).
Such synthetically generated datasets enable the training and evaluation of PA detection algorithms, enhancing their effectiveness and enabling the development of resilient countermeasures against evolving PA techniques. Thus, depending on their usage, the applications of synthetically generated irides can be described as follows [45, 58, 73, 145, 158]:
• Algorithm Development and Testing: Synthetic iris datasets serve as a valuable resource for algorithm development and testing in the field of biometrics. By generating synthetic iris samples, researchers can assess and evaluate the performance of novel algorithms, compare different techniques, and benchmark their efficacy against a standardized iris dataset. Synthetic data allows for controlled experimentation, enabling researchers to precisely manipulate specific biometric traits, variations, and noise levels to simulate real-world scenarios.
• System Evaluation and Benchmarking: Synthetic iris datasets play a vital role in evaluating and benchmarking the performance of iris recognition systems. They provide a standardized dataset that enables fair comparisons between different systems and algorithms. By using synthetic data, researchers and developers can assess system accuracy, robustness, and vulnerability to various attacks or spoofing attempts. This evaluation process aids in identifying system weaknesses, improving overall system performance, and guiding the development of countermeasures.
• Training Data Augmentation: Synthetic iris datasets can be utilized to augment training datasets, enhancing the performance of iris recognition systems. By generating additional synthetic samples, researchers can increase the size and diversity of the training set, which helps to improve the generalization capabilities of the algorithms. This approach reduces overfitting, enhances the system's ability to handle intra-class variations, and improves overall recognition accuracy.
• Privacy-Preserving Studies: Synthetic iris datasets are invaluable for privacy-preserving studies and research involving sensitive biometric information. They allow researchers to conduct studies, simulations, and experiments without the need for real individuals' personal biometric data. Synthetic iris datasets provide a privacy-friendly alternative that ensures data protection while enabling advancements in biometric research and system development.
With these various applications and advantages, synthetic iris datasets offer flexibility and convenience in the development and assessment of iris-based recognition systems and attack detection methods. In this context, it is important to understand the different methods employed to generate synthetic iris samples and their respective approaches. Therefore, in this comprehensive review we explore the current state-of-the-art synthetic iris image generation methods and take a closer look at the strengths and weaknesses of these methods in terms of quality, identity uniqueness and utility. By doing so, we aim for this survey to be a helpful source of accurate technical information for those interested in learning about the progress and difficulties in synthesizing good-quality iris images. Thus, this review makes the following specific contributions:
• Study the current state-of-the-art methods to generate synthetic irides and explain their pros and cons.
• An assessment of synthetic iris images generated by current state-of-the-art methods in terms of quality, uniqueness and utility.
• Analyze current deep learning based iris recognition systems and how synthetically generated iris datasets can enhance their performance.
• Analyze current deep learning based presentation attack detection algorithms and how synthetically generated iris datasets can enhance their performance.
• Discuss future directions to overcome the challenges of current methods and generate enhanced synthetic iris datasets.
Iris Recognition
In this section, we briefly discuss iris recognition to establish the technical context for this review. For a detailed exploration of iris recognition, we recommend referring to [17, 37, 104]. The foundational technology behind modern iris recognition systems can be traced back to the work of John Daugman [37], who is credited with the development of the core algorithms that make such systems possible. Daugman's work leverages the distinct patterns found in the human iris to create a method for secure and accurate human recognition. There have been significant improvements on his initial work for various security, identification and privacy applications. Iris recognition algorithms can be divided into four sub-problems:
• Iris Segmentation: Most iris images contain not only the iris but also regions such as the pupil, sclera and eyelashes. So, the first step towards iris recognition is to segment the iris from the captured image to remove this extraneous information. Most of the initial segmentation approaches, including Daugman's, involve identifying the pupil and iris boundaries. In traditional approaches, occlusions are minimized by edge detection and curve fitting over the eyelids.
• Normalization: Post segmentation, the variations in the segmented irides (caused by distance from the sensor, viewing angle or pupil size) are minimized via normalization, where the annular iris is unwrapped to a fixed resolution by transforming it from the Cartesian coordinate system to a polar coordinate system.
• Feature Encoding: After normalization, iris features are extracted and encoded so that they can be used in matching. The most common techniques for iris feature extraction involve Gabor filtering, BSIF, and similar texture operators that help capture the unique textural properties of the iris.
• Matching: Once features are encoded, various matching algorithms can be used for iris recognition. Daugman used Gabor phase-quadrant features to encode the iris and the Hamming distance to match iris samples; these have been improved over time to account for variations in image quality, occlusion and noise. (A simple illustrative sketch of normalization and matching is given after this list.)
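To make the normalization and matching steps concrete, the following is a minimal illustrative sketch (not taken from any of the cited implementations) of Daugman-style rubber-sheet normalization and fractional Hamming-distance matching. The circle parameters, iris codes and occlusion masks are assumed to come from upstream segmentation and encoding stages.

```python
import numpy as np

def rubber_sheet_normalize(eye_img, pupil_xyr, iris_xyr, out_h=64, out_w=512):
    """Daugman-style rubber-sheet model (minimal sketch): unwrap the annular
    iris region from Cartesian coordinates onto a fixed-size polar grid."""
    xp, yp, rp = pupil_xyr              # pupil centre (x, y) and radius
    xi, yi, ri = iris_xyr               # limbic (iris) centre (x, y) and radius
    thetas = np.linspace(0, 2 * np.pi, out_w, endpoint=False)
    radii = np.linspace(0, 1, out_h)
    norm = np.zeros((out_h, out_w), dtype=eye_img.dtype)
    for j, t in enumerate(thetas):
        # boundary points at this angle on the pupil and iris circles
        x_in, y_in = xp + rp * np.cos(t), yp + rp * np.sin(t)
        x_out, y_out = xi + ri * np.cos(t), yi + ri * np.sin(t)
        for i, r in enumerate(radii):
            # linear interpolation between the two boundaries
            x = int(round((1 - r) * x_in + r * x_out))
            y = int(round((1 - r) * y_in + r * y_out))
            if 0 <= y < eye_img.shape[0] and 0 <= x < eye_img.shape[1]:
                norm[i, j] = eye_img[y, x]
    return norm

def hamming_distance(code_a, code_b, mask_a, mask_b):
    """Fractional Hamming distance between two binary iris codes, counting
    only bits that are valid (unoccluded) in both masks; smaller is a better match."""
    valid = mask_a & mask_b
    if valid.sum() == 0:
        return 1.0
    return np.count_nonzero((code_a ^ code_b) & valid) / valid.sum()
```

In a full pipeline, the normalized strip would additionally be filtered (e.g., with Gabor filters) and binarized to produce the code and mask arrays consumed by the matcher above.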
With recent developments in the field of deep learning, deep networks have found application in all four stages of iris recognition. Iris segmentation and feature extraction have particularly benefited from deep learning-based approaches, since these handle noise and complexities in iris datasets better than traditional approaches. For example, Chen and Ross [21] proposed a novel multi-task learning framework using a convolutional neural network (CNN) designed to carry out iris localization alongside presentation attack detection (PAD) effectively. This multi-task PAD (MT-PAD) approach determines the iris boundaries and assesses the likelihood of a presentation attack from the input image at the same time. Through rigorous testing, this method demonstrated state-of-the-art performance on various iris datasets. In [104], Nguyen et al. studied how well pre-trained convolutional neural networks (CNNs) perform in the domain of iris recognition. The study reveals that features derived from off-the-shelf CNNs can efficiently capture the complex characteristics of irides. These features are adept at isolating distinguishing visual attributes, which leads to encouraging outcomes in iris recognition performance.
While the progress in the field of deep learning has helped improve the reliability of iris recognition systems by improving their performance under various conditions, the lack of iris data with sufficient inter- and intra-class variations limits the training and testing of these systems. Therefore, we need to explore generative methods that produce fully synthetic iris images, which can help obtain a large iris dataset with enough inter- and intra-class variations to train and test a robust iris recognition system.
Iris Presentation Attack Detection
Iris presentation attack detection (PAD) is an essential aspect of iris recognition systems. As the reliance on iris recognition systems grows, so does the sophistication of attacks designed to exploit them. Here, we briefly examine the nature of iris presentation attacks, the methodologies developed to detect them, and the challenges faced in enhancing iris PAD systems. Iris presentation attacks (PAs), also known as spoofs, refer to physical artifacts that aim to either impersonate someone or obfuscate one's identity in order to fool the recognition system. There are several types of presentation attacks on iris recognition systems:
• Print Attack: One of the simplest forms of iris PA involves the attacker presenting a high-quality photograph of a valid subject's iris to the biometric system. Basic systems might be misled by the photograph's visual fidelity unless they are designed to detect the absence of depth or natural eye movements.
• Artificial Eyes: Attackers may employ high-grade artificial (prosthetic or doll) eyes that replicate the iris's texture and three-dimensionality. These artificial eyes seek to deceive scanners that are not sophisticated enough to discern a genuine subject from an attacker based on liveness indicators such as the pupil's response to light stimuli.
• Cosmetic Contact Lens: A more nuanced approach involves cosmetic contact lenses that have been artificially created with iris patterns that can either conceal the attacker's true iris or mimic someone else's identity. This type of attack attempts to bypass systems that match iris patterns by introducing false textural elements.
• Replay Attack: Playing back a video recording of a genuine iris to the scanner constitutes another PA. Advanced iris recognition systems counter this by looking for evidence of liveness, like blinking or involuntary pupil contractions.
Many researchers have proposed different methods to effectively detect different types of PAs. The works in [60, 80] utilize textural descriptors such as GIST, LBP and HOG to detect printed eyes and cosmetic contact lenses. Similarly, Raghavendra and Busch [115] utilize cepstral features with binary statistical image features (BSIF) to distinguish between bonafide irides and print attacks. Another way to detect print attacks is a liveness test, since liveness is lacking in printed eyes [33, 74]. Liveness tests can also be helpful in detecting attacks such as artificial eyes. Eye-gaze tracking [88] and multi-spectral imaging [22] have shown good results in detecting printed eyes and artificial eyes. Deep-network-based PA detection methods have also been proposed; for example, Hoffman et al. [64] proposed a deep network that utilizes patch information along with a segmentation mask to learn features that can distinguish bonafide irides from iris PAs.
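Whether based on hand-crafted texture descriptors or deep networks, most of these detectors are ultimately trained as binary classifiers that separate bonafide irides from PAs. The following is a minimal, illustrative PyTorch sketch of such a binary PAD classifier; the network, hyperparameters and the `pad_loader` data loader are hypothetical placeholders rather than any of the cited architectures.

```python
import torch
import torch.nn as nn

# Illustrative binary PAD classifier: a small CNN trained to separate
# bonafide iris images (label 0) from presentation attacks (label 1).
# `pad_loader` is a hypothetical DataLoader yielding (image, label) batches.

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1))                       # single logit: PA score

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_one_epoch(pad_loader):
    model.train()
    for images, labels in pad_loader:       # images: (B,1,H,W), labels: (B,)
        logits = model(images).squeeze(1)   # (B,) PA logits
        loss = criterion(logits, labels.float())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

This binary formulation is precisely what makes balanced bonafide/PA training data so important, which motivates the partially synthetic generation methods discussed later in this appendix.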
While these iris PA detection methods perform well on various datasets, attackers are continuously finding new ways to bypass them, leading to an arms race between security experts and attackers. As a result, PAD methods need to be constantly updated (re-trained or fine-tuned) and tested against the latest forms of attack. This calls for PA detection methods that can generalize well to new (or unseen) PAs without the hassle of re-training or fine-tuning. Here, "seen PAs" are those to which the PAD methods have been exposed during the training phase. In contrast, "unseen PAs" are not included in the training phase, posing a concerning challenge for accurate PA detection. Recent developments in PAD methods have focused on enhancing the ability of systems to generalize, distinguishing bonafide irides from PAs even when encountering previously unseen PAs. Gupta et al. [59] proposed a deep network called MVANet, which uses multiple convolutional layers for generalized PA detection. This network not only improves PA detection accuracy, but also addresses the high computational costs typically associated with training deep neural networks by using a simplified base model structure. Evaluations across different databases indicate MVANet's proficiency in generalizing to detect new and unseen PAs. In [128], Sharma and Ross proposed D-NetPAD, a PAD method based on DenseNet designed to generalize over seen and unseen PAs. It has demonstrated a strong ability to generalize across diverse PAs, sensors, and data collections. Their rigorous testing confirms D-NetPAD's robustness in detecting generalized PAs.
Most PAD methods formulate PA detection as a binary-class problem, which demands the availability of a large collection of both bonafide and PA samples to train classifiers. However, obtaining a large number of PA samples can be much more difficult than obtaining bonafide iris samples. Further, classifiers are usually trained and tested on similar PAs, but PAs encountered in operational systems can be diverse in nature and may not be available during the training stage. Therefore, we need to explore generative methods that produce partially synthetic iris images (as identity is not the focus in PA detection) and can help build balanced iris PA datasets. This will help researchers better train and test their detection methods.
Generating Synthetic Irides
As mentioned earlier, synthetic iris images offer several advantages, including scalability, diversity, and control over the generated data. Some of the methods to generate such images are listed below, categorized on the basis of the underlying technique:
• Texture Synthesis: This technique has been widely used for generating synthetic iris images. These methods analyze the statistical properties of real iris images and generate new images based on those statistics. Shah and Ross [126] proposed an approach for generating digital renditions of iris images using a two-step technique. In the first stage, they utilized a Markov Random Field model to generate a background texture that accurately represents the global appearance of the iris. In the subsequent stage, various iris features, including radial and concentric furrows, collarette, and crypts, are generated and seamlessly embedded within the texture field.
In another example, Makthal and Ross [94] introduced a novel approach for synthetic iris generation using Markov Random Field (MRF) modeling. The proposed method offers a deterministic synthesis procedure, which eliminates the need for sampling a probability distribution and simplifies computational complexity. Additionally, the study highlights the distinctiveness of iris textures compared to other non-stochastic textural patterns. Through clustering experiments, it is demonstrated that the synthetic irides generated using this technique exhibit content similarity to real iris images. In a different approach, Wei et al. [149] proposed a framework for synthesizing large and realistic iris datasets by utilizing iris patches as fundamental elements to capture the visual primitives of iris texture. Through patch-based sampling, an iris prototype is created, serving as the foundation for generating a set of pseudo irides with intra-class variations. Qualitative and quantitative analyses demonstrate that the synthetic datasets generated by this framework are well suited for evaluating iris recognition systems.
• Morphable Models: Morphable models have been utilized for generating synthetic iris images by capturing the shape and appearance variations in a statistical model. These models represent the shape and texture of irides using a low-dimensional parameter space. By manipulating the parameters, synthetic iris images with different characteristics, such as size, shape, and texture, can be generated. Most of the research in this category focuses on generating synthetic iris images for gaze estimation and rendering eye movements. Wood et al. [150] proposed a 3D morphable model of the eye region with gaze estimation and gaze re-targeting from a single reference image. Similarly, [10] focuses on achieving photo-realistic rendering of eye movements in 3D facial animation. The model is built upon 3D scans of a face captured from various gaze directions, enabling the capture of realistic motion of the eyeball, eyelid deformation, and the surrounding skin. To represent these deformations, a 3D morphable model is employed.
• Image Warping: Image warping techniques involve applying geometric transformations to real iris images to generate synthetic images. These transformations can include rotations, translations, scaling, and deformations. Image warping allows for the generation of synthetic iris images with variations in pose, gaze direction, and occlusions. In [19], Cardoso et al. aimed to generate synthetic degraded iris images for evaluation purposes. The method utilizes various degradation factors such as blur, noise, occlusion, and contrast changes to simulate realistic and challenging iris image conditions. The degradation factors are carefully controlled to achieve a realistic representation of degraded iris images commonly encountered in real-world scenarios. In [32], a novel iris image synthesis method combining principal component analysis (PCA) and super-resolution techniques is proposed. The study begins by introducing an iris recognition algorithm based on PCA, followed by the presentation of the iris image synthesis method. The proposed synthesis method involves the construction of coarse iris images using predetermined coefficients. Subsequently, super-resolution techniques are applied to enhance the quality of the synthesized iris images. By manipulating the coefficients, it becomes possible to generate a wide range of iris images belonging to specific classes.
• Generative Adversarial Networks (GANs): GANs have gained significant attention for generating realistic and diverse synthetic iris images. In a GAN framework, a generator network learns to generate synthetic iris images, while a discriminator network distinguishes between real and synthetic images. The two networks are trained in an adversarial manner, resulting in improved image quality over time (a minimal sketch of this adversarial training loop is given after this list). GANs can generate iris images with realistic features, including iris texture, color, and overall appearance. Minaee and Abdolrashidi [100] proposed a framework that utilizes a generative adversarial network (GAN) to generate synthetic iris images sampled from a learned prior distribution. The framework is applied to two widely used iris datasets, and the generated images demonstrate a high level of realism, closely resembling the distribution of images within the original datasets. Similarly, Kohli et al. [82] proposed iDCGAN (iris Deep Convolutional Generative Adversarial Network), a novel framework that leverages deep convolutional generative adversarial networks and iris quality metrics to generate synthetic iris images that closely resemble real iris images. Bamoriya et al. [9] proposed a novel approach, called Deep Synthetic Biometric GAN (DSB-GAN), for generating realistic synthetic biometrics that can serve as large training datasets for deep learning networks, enhancing their robustness against adversarial attacks. Currently, GAN-based methods for generating synthetic biometrics have proven far superior at capturing the intricate details of various biometric cues. Therefore, in the remainder of this paper, we focus mainly on these methods and on the iris images they generate for our study and analysis.
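To illustrate the adversarial objective shared by the GAN-based methods above, the following is a minimal, self-contained DCGAN-style sketch in PyTorch. The network sizes, hyperparameters and the random placeholder batch are illustrative assumptions rather than the configuration of any specific method surveyed here.

```python
import torch
import torch.nn as nn

# Minimal DCGAN-style sketch for 64x64 grayscale (e.g., NIR) iris crops.
# Dataset loading is omitted; `real_batch` below is a stand-in tensor.

latent_dim = 100

G = nn.Sequential(                      # latent vector z -> 64x64 image
    nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),
    nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.BatchNorm2d(16), nn.ReLU(True),
    nn.ConvTranspose2d(16, 1, 4, 2, 1), nn.Tanh())

D = nn.Sequential(                      # image -> real/fake logit
    nn.Conv2d(1, 16, 4, 2, 1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(16, 32, 4, 2, 1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(128, 1, 4, 1, 0), nn.Flatten())

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

for step in range(1000):                              # toy training loop
    real_batch = torch.rand(32, 1, 64, 64) * 2 - 1    # placeholder for real iris crops
    z = torch.randn(32, latent_dim, 1, 1)
    fake_batch = G(z)

    # Discriminator update: real -> 1, fake -> 0
    d_loss = bce(D(real_batch), torch.ones(32, 1)) + \
             bce(D(fake_batch.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: try to fool the discriminator (fake -> 1)
    g_loss = bce(D(fake_batch), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In practice, the placeholder batch would be replaced by mini-batches of real iris images, and methods such as RaSGAN, CIT-GAN and StarGAN v2 build on this basic objective with relativistic losses, style networks or domain labels.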
proposed a novel approach, DSB-GAN, which is built upon a combination of a convolutional autoencoder (CAE) and a DCGAN; the evaluation of DSB-GAN is conducted on three biometric modalities: fingerprint, iris, and palmprint. One of the notable advantages of DSB-GAN is its efficiency, owing to a low number of trainable parameters compared to existing state-of-the-art methods. Yadav et al. [156, 157] leverage RaSGAN to generate high-quality partially synthetic iris images in the NIR spectrum and evaluate the effectiveness and usefulness of these images as both bonafide samples and presentation attacks. They also proposed a novel one-class presentation attack detection method, known as RD-PAD, for unseen presentation attack detection, addressing the challenge of generalizability in PAD algorithms. Zou et al. [172] proposed 4DCycle-GAN, which is designed to enlarge databases of iris PA images by generating synthetic iris images with cosmetic contact lenses. Building upon the Cycle-GAN framework, the 4DCycle-GAN algorithm adds two discriminators to the existing model to increase the diversity of the generated images. These additional discriminators are engineered to favor the images from the generators rather than those from real-life captures. This approach reduces the bias towards generating repetitive contact lens textures, which typically make up a significant portion of the training data. In [158], Yadav and Ross proposed to generate bonafide images as well as different types of presentation attacks in the NIR spectrum using a novel image-translative GAN, known as CIT-GAN. The proposed architecture translates the style of one domain to another using a styling network to generate realistic, high-resolution iris images.

Fully Synthetic Irides

Fully-synthetic biometric data refer to entirely artificial biometric samples that do not correspond to any real individuals in the population. These synthetic samples are created using mathematical models, statistical distributions, or generative algorithms to simulate the statistical properties and characteristics of real biometric data. Some of the texture-based methods focus on generating new iris identities (fully synthetic) that are unique from the training samples. These methods aim to generate new iris images with both inter- and intra-class variations, which helps mitigate the issue of small training sets by increasing dataset size and thereby improves the development and testing of recognition systems. Also, by generating fully synthetic identities that do not resemble anyone in the real world, we can address the privacy concerns attached to using a real person's biometric data. In [147], Wang et al. proposed a novel algorithm for generating diverse iris images, enhancing both the variety and the number of images available for analysis. The technique employs contrastive learning to separate features tied to identity (such as iris texture and eye orientation) from those that change with conditions (such as pupil size and iris exposure). This separation allows for precise identity representation in synthetic images. The algorithm uniquely processes iris topology and texture through a dual-channel input system, enabling the generation of varied iris images that retain specific texture details.
Yadav and Ross [159] proposed iWarpGAN, which aims to disentangle identity and stylistic elements within iris images. It achieves this through two distinct pathways: one that transforms identity features from real irides to create new iris identities, and another that captures style from a reference image to infuse it into the output. By merging these modified identity and style elements, iWarpGAN can produce iris images with a wide range of inter- and intra-class variations. Limited work has been done in this category to generate irides whose identity does not match any identity in the training data; this is an important and emerging topic that needs more attention.

Experiments & Results

In this section, we discuss the different experiments evaluating the capability of different GAN methods to generate fully and partially synthetic iris images. The GAN methods studied in this research are: RaSGAN [72, 156], CIT-GAN [157], StarGAN-v2, Stylegan-3 [171] and iWarpGAN [159].

Datasets Used

In this research, we explore the usefulness and utility of generated iris images for iris recognition and PA detection. The datasets used in this research are listed as follows:

Iris Datasets

In this research, we conducted our experiments and analysis using three iris datasets:

• CASIA-Iris-Thousand [5]: Developed by the Chinese Academy of Sciences Institute of Automation, the CASIA-Iris-Thousand dataset is a popular resource for studying iris patterns and for advancing iris recognition technologies. This dataset comprises 20,000 iris images from 1,000 participants, accounting for 2,000 distinct identities when images of both the left and right eyes are considered. The images are captured at a resolution of 640x480 pixels. The dataset has been partitioned into training and testing subsets, with a distribution of 70% for training (1,400 identities) and 30% for testing (600 identities).

• CASIA Cross Sensor Iris Dataset (CSIR) [152]: The training portion of the CASIA-CSIR dataset, provided by the Chinese Academy of Sciences Institute of Automation, was employed in our study. It includes a total of 7,964 iris images from 100 individuals, representing 200 unique identities when both eyes are considered. Similar to the first dataset, a 70-30 split based on unique identities was used to divide the images into training (5,411 images) and testing (2,553 images) sets, intended for the training and evaluation of deep learning models for iris recognition.

• IITD-iris [6]: Originating from the Indian Institute of Technology, Delhi, the IITD-iris dataset was collected in an indoor setting and consists of 1,120 iris images from 224 subjects. The images were captured using JIRIS JPC1000 and digital CMOS cameras, each with a resolution of 320x240 pixels. In line with the previous datasets, this one also uses a 70-30 split based on unique identities for its training (314 identities) and testing (134 identities) sets; a minimal sketch of such an identity-disjoint split is shown after this list.
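All three datasets are partitioned on unique identities rather than on individual images, so that no identity appears in both the training and testing subsets. The following is a minimal sketch of such an identity-disjoint 70-30 split; the function name and the (image path, identity label) input format are illustrative assumptions, not the actual partitioning script used in this work.

import random
from collections import defaultdict

def identity_disjoint_split(samples, train_fraction=0.70, seed=0):
    """Split (image_path, identity_label) pairs so that the training and
    testing subsets share no identities (70-30 split by default)."""
    by_identity = defaultdict(list)
    for path, identity in samples:
        by_identity[identity].append(path)

    identities = sorted(by_identity)
    random.Random(seed).shuffle(identities)

    n_train = int(round(train_fraction * len(identities)))
    train_ids = set(identities[:n_train])

    train = [(p, i) for i in identities if i in train_ids for p in by_identity[i]]
    test = [(p, i) for i in identities if i not in train_ids for p in by_identity[i]]
    return train, test

# Example: train, test = identity_disjoint_split(samples, train_fraction=0.70)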
For data preparation, we processed and resized all iris images to 256x256 pixels, centering on the iris region as determined by the iris and pupil coordinates obtained from VeriEye (www.neurotechnology.com/verieye.html).

Iris PA Datasets

Our exploration of synthetic images for iris PA detection leverages five distinct iris PA datasets: Casia-iris-fake [136], Berc-iris-fake [91], NDCLD15 [41], LivDet2017 [161], and MSU-IrisPA-01 [156], each comprising authentic iris images alongside various categories of PAs such as cosmetic contact lenses, printed iris images, artificial eyes, and display-based attacks. As mentioned earlier, we processed and resized the images to 256x256 pixels, centering on the iris region as determined by the iris and pupil coordinates from VeriEye. Images that VeriEye failed to process correctly were excluded from the study, since our focus is primarily on image synthesis. The resulting processed dataset contains 24,409 genuine iris images, 6,824 images with cosmetic contact lenses, 680 artificial eyes, and 13,293 printed iris images. The train and test division of this dataset is explained later in this section.

Quality of Generated Images

In order to evaluate the realism and quality of the generated iris images, the different GAN methods - RaSGAN, CITGAN, StarGAN-v2, Stylegan-3 and iWarpGAN - are trained using real irides from the CASIA-Iris-Thousand, CASIA Cross Sensor Iris and IITD-iris datasets, separately. Using the trained networks, we generate three sets of 20,000 synthetic bonafide images (one set per training dataset) for each of the GANs mentioned above. We then evaluate the realism of the generated images and the quality of the iris using three different methods: (1) the Fréchet Inception Distance [119], (2) the VeriEye rejection rate, and (3) the ISO/IEC 29794-6 Standard Quality Metrics [3].

Fréchet Inception Distance Score

The Fréchet Inception Distance (FID) score is a metric used to assess the quality of synthetically generated images by comparing their distribution to that of real images. The objective is to minimize this score, as a lower FID score indicates greater resemblance between the synthetic and real datasets. FID scores can span a broad range, with extremely high scores in the 400-600 range indicating significant deviation from the real data distribution and, consequently, poor synthetic image quality [119]. In our analysis of the synthetically generated iris images produced by the different GANs used in this study, we obtained average FID scores of 24.33 and 31.82 for RaSGAN and StarGAN-v2, and scores of 26.90, 15.72 and 17.62 for CIT-GAN, Stylegan-3 and iWarpGAN, respectively. As mentioned earlier, the lower the FID score, the more realistic the generated images are with respect to real images. Therefore, we can conclude that Stylegan-3 and iWarpGAN generate the most realistic iris images. The distribution of these FID scores is shown in Figure A.1.

Figure A.1: Histograms showing the realism scores (i.e., FID scores) of real iris images from the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets and of the synthetically generated iris images; the lower the FID score, the more realistic the generated iris images.
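For reference, the FID between real and generated images reduces to the Fréchet distance between two Gaussians fitted to their Inception-v3 features: ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2)). The sketch below computes this quantity from pre-extracted feature matrices; the feature-extraction step is assumed to have been performed separately, and the code is an illustrative implementation rather than the exact evaluation script used in this study.

import numpy as np
from scipy import linalg

def fid_score(real_feats, gen_feats):
    """Frechet Inception Distance between two (N, D) arrays of Inception-v3
    features (e.g., D = 2048 pooling features) for real and generated images."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)

    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard negligible imaginary parts from numerical error

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))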
VeriEye Rejection Rate

For this experiment, we followed the protocol described in [159] to evaluate the effectiveness of the various synthetic iris image generation methods through their acceptance by VeriEye, a commercial iris matcher. Here, we analyze the rate at which VeriEye rejects synthetic images produced by the different GAN methods. In the first set of comparisons, using the IITD-iris dataset, which comprises 1,120 real iris images, only 0.18% of the real images were rejected by VeriEye. In contrast, 1,120 synthetic images produced by RaSGAN and StarGAN-v2 had rejection rates of 4.55% and 4.64%, respectively. However, images generated by CITGAN, iWarpGAN and Stylegan-3 showed significantly lower rejection rates of 2.85%, 0.73% and 1.07%, respectively. The CASIA-CSIR dataset contains 7,964 real iris images, with a rejection rate of 2.81%. For 7,964 synthetic images, those generated by RaSGAN and StarGAN-v2 were rejected at rates of 2.06% and 2.65%, respectively, while CITGAN, iWarpGAN and Stylegan-3 produced images with rejection rates of 2.71%, 2.74% and 2.52%, respectively, all comparable to the rejection rate of the real images. Lastly, the CASIA-Iris-Thousand dataset, which includes 20,000 real iris images, saw a very low rejection rate of 0.06%. Among the synthetic images, those from RaSGAN and StarGAN-v2 had the highest rejection rates at 0.34% and 0.27%, respectively, while synthetic images from CITGAN, iWarpGAN and Stylegan-3 showed lower rejection rates of 0.24%, 0.18% and 0.16%, respectively.

ISO/IEC 29794-6 Standard Quality Metrics

As described in [159], the fidelity of the synthetically produced iris images is also assessed using the ISO/IEC 29794-6 Standard Quality Metrics [3]. This assessment was applied to the images generated by the different GANs utilized in this study. The ISO standard employs a set of criteria to evaluate the quality of an iris image, including the usable iris area, the iris-sclera contrast, image sharpness, the iris-pupil contrast, pupil shape, and more, culminating in a comprehensive quality score. This score is on a scale from 0 to 100, where 0 indicates the lowest image quality and 100 the highest. Images that fail to be evaluated by this ISO metric, typically due to substandard quality or errors in segmentation, are assigned a score of 255. As shown in Figure A.2, the quality scores for the 20,000 synthetic iris images obtained using iWarpGAN, CITGAN and Stylegan-3 are on par with those of real iris images. Conversely, a noticeable number of images generated by RaSGAN were assigned the score of 255, reflecting their inferior quality. Additionally, a comparison across the three datasets showed that the CASIA-CSIR dataset contained a higher proportion of images with the score of 255, in contrast to the IITD-iris and CASIA-Iris-Thousand datasets.

Figure A.2: Histograms depicting the quality of real irides alongside the quality of synthetic irides generated by the various GANs, for the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets. These evaluations follow the ISO/IEC 29794-6 Standard Quality Metrics, with the quality scale set between 0 and 100, where a higher score denotes superior quality; iris images that could not be assessed by this standard were assigned a score of 255.
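Because images that cannot be assessed are assigned a score of 255, the quality histograms in Figure A.2 are easiest to compare after separating such failure cases from the valid 0-100 scores. The short sketch below illustrates this bookkeeping; it assumes per-image overall quality scores have already been produced by an ISO/IEC 29794-6 implementation and is not itself part of the standard.

import numpy as np

FAILURE_SCORE = 255  # assigned when the ISO/IEC 29794-6 assessment fails

def summarize_quality(scores):
    """Summarize overall ISO quality scores (0-100, or 255 on failure)."""
    scores = np.asarray(scores)
    failed = scores == FAILURE_SCORE
    valid = scores[~failed]
    return {
        "num_images": int(scores.size),
        "failure_rate": float(failed.mean()),
        "mean_quality": float(valid.mean()) if valid.size else float("nan"),
        "median_quality": float(np.median(valid)) if valid.size else float("nan"),
    }

# Example: compare real irides against one GAN's synthetic irides.
# summarize_quality(real_scores); summarize_quality(synthetic_scores)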
Uniqueness of Synthetically Generated Irides

This experiment examines the uniqueness of the iris images generated synthetically using the different GANs, specifically assessing the ability of these methods to create distinct identities that exhibit some intra-class variation. For this, RaSGAN, CITGAN, StarGAN-v2, Stylegan-3 and iWarpGAN are trained using the real irides in the training sets of the CASIA-Iris-Thousand, CASIA Cross Sensor Iris and IITD-iris datasets, separately.

Unique-Experiment-1: This experiment focuses on the uniqueness of the synthetically generated iris datasets, produced using the various GAN techniques, with respect to the training examples. To achieve this, the analysis compares the impostor and genuine distributions of the real irides that were part of the training set of each GAN technique with those of the synthetically generated irides. The VeriEye matcher is utilized to obtain the similarity score between two iris images, with scores spanning from 0 to 1557, where a higher score indicates a better match.

Unique-Experiment-2: This experiment investigates the uniqueness and intra-class variability present within the synthetically generated iris dataset. This involves an analysis of both the genuine and impostor distributions within the generated dataset and their comparison with the distributions from the real iris datasets. As previously noted, this investigation is conducted across a range of uniquely generated identities to assess their uniqueness. The VeriEye matcher is again used to assess the similarity score between pairs of iris images.

Analysis: The above experiments replicate the experimental protocols defined in [159] to evaluate the inter- and intra-class variations in the generated iris datasets; they also analyze the distinctness of the generated samples from the real irides in the training datasets. As depicted in Figures A.3, A.4 and A.5, the iris images produced by iWarpGAN do not exhibit a high degree of resemblance to the real irides from the training set, unlike the outputs of the other GAN methods. This indicates iWarpGAN's ability to generate iris patterns with identities that diverge from those found in the training set. Moreover, by examining the overlap between the impostor distribution of the synthetically generated iris images and that of the real iris images, it becomes evident that the generated identities are distinct from one another. Hence, we conclude that iWarpGAN has the capability to generate fully synthetic iris images with identities that are distinct from the irides in the training set, while the other GANs can only generate partially synthetic irides.

Figure A.3: Uniqueness of iris images generated using iWarpGAN when the GANs are trained using the CASIA-Iris-Thousand dataset. The y-axis represents the similarity scores obtained using VeriEye. Here, R=Real, S=Synthetic, Gen=Genuine and Imp=Impostor.

Figure A.4: Uniqueness of iris images generated using iWarpGAN when the GANs are trained using the CASIA-CSIR dataset. The y-axis represents the similarity scores obtained using VeriEye. Here, R=Real, S=Synthetic, Gen=Genuine and Imp=Impostor.

Figure A.5: Uniqueness of iris images generated using iWarpGAN when the GANs are trained using the IITD-iris dataset. The y-axis represents the similarity scores obtained using VeriEye. Here, R=Real, S=Synthetic, Gen=Genuine and Imp=Impostor.
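Both uniqueness experiments reduce to comparing genuine (same identity) and impostor (different identity) score distributions. A compact sketch of how such distributions can be assembled from pairwise comparisons is given below; the match() callable is a stand-in for the VeriEye matcher (scores in [0, 1557]) and is an assumption, not an actual VeriEye API.

from itertools import combinations

def genuine_impostor_scores(labeled_images, match):
    """Build genuine (same identity) and impostor (different identity) score lists.

    labeled_images: list of (image, identity_label) pairs.
    match: any callable returning a similarity score for two images; here it is a
    stand-in for the VeriEye matcher, whose scores range from 0 to 1557."""
    genuine, impostor = [], []
    for (img_a, id_a), (img_b, id_b) in combinations(labeled_images, 2):
        score = match(img_a, img_b)
        (genuine if id_a == id_b else impostor).append(score)
    return genuine, impostor

# Unique-Experiment-1 compares the synthetic-versus-training-set scores against the
# real genuine/impostor distributions; Unique-Experiment-2 applies the same analysis
# within the synthetic set to measure its inter- and intra-class variation.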
Table A.1: PAD-Experiment-0: True Detection Rate (TDR, in %) at three different False Detection Rates (FDR) for different iris PAD methods in the baseline experiment (PAD-Experiment-0), where the PA detectors are trained using real bonafide and PA samples. The PA samples used in this experiment are imbalanced across the different PA classes [158].

                     BSIF+SVM [41]   Fine-Tuned VGG-16 [53]   Fine-Tuned AlexNet [84]   D-NetPAD [128]
TDR (@ 0.1% FDR)          3.32               85.25                    86.10                 87.94
TDR (@ 0.2% FDR)          6.15               83.86                    87.29                 88.91
TDR (@ 1.0% FDR)         28.11               89.07                    90.51                 92.54

Table A.2: PAD-Experiment-1: True Detection Rate (TDR, in %) at 1% False Detection Rate (FDR) for different iris PAD methods when trained using real bonafide irides, real PAs and synthetic PAs generated using different GAN methods.

                     BSIF+SVM [41]   Fine-Tuned VGG-16 [53]   Fine-Tuned AlexNet [84]   D-NetPAD [128]
RaSGAN                   28.31               81.11                    86.74                 87.97
CIT-GAN                  29.43               85.81                    88.37                 88.86
StarGAN-v2               29.73               83.47                    88.71                 88.45
Stylegan-3               34.05               88.14                    90.95                 90.75
iWarpGAN                 32.09               86.58                    89.18                 90.28

Table A.3: PAD-Experiment-2: True Detection Rate (TDR, in %) at 1% False Detection Rate (FDR) for different iris PAD methods when they are trained using real bonafide irides alongside a balanced collection of PA samples drawn from both real and synthetic irides.

                     BSIF+SVM [41]   Fine-Tuned VGG-16 [53]   Fine-Tuned AlexNet [84]   D-NetPAD [128]
RaSGAN                   50.52               88.62                    88.15                 94.90
CIT-GAN                  51.11               91.60                    92.70                 97.89
StarGAN-v2               50.69               92.39                    88.38                 95.59
Stylegan-3               56.14               95.02                    93.38                 98.39
iWarpGAN                 55.25               93.69                    92.99                 98.20

Utility of Synthetically Generated Irides

The experiments in this section evaluate the usefulness of the synthetically generated irides for presentation attack detection as well as iris recognition.

Presentation Attack Detection

As discussed earlier, the lack of a sufficient number and variety of PA samples can affect the generalizability of PA detection methods, especially PAD methods based on deep networks that need a large number of training samples for good performance. Therefore, we outline the experimental frameworks established to assess the utility of the synthetically generated iris PAs produced by the different GAN methods. To generate the synthetic data (for both bonafide samples and PAs), the GANs are trained using 14,970 bonafide iris images, 6,016 printed eyes, 4,014 cosmetic contact lenses and 276 artificial eyes from the iris PA datasets mentioned earlier. Note that CIT-GAN, StarGAN-v2, Stylegan-3 and iWarpGAN can perform style transfer from one domain to another, i.e., image generation with these GANs extends to multiple domains/styles. RaSGAN, however, supports only single-domain image generation; therefore, multiple RaSGAN networks are trained in order to generate bonafide images as well as the different types of PAs. We analyzed the efficacy of several iris presentation attack detection (PAD) techniques, including VGG-16 [53], BSIF [41], D-NetPAD [128] and AlexNet [84], within these experimental configurations. It is noteworthy that D-NetPAD has been recognized as one of the top-performing PAD algorithms, particularly for its performance in the iris liveness detection challenge (LivDet-20 edition).

PAD-Experiment-0: In this baseline experiment, we report the performance of the different iris PAD methods on the iris PA datasets mentioned in the Datasets section. The PAD methods are trained with 14,970 bonafide iris images and 10,306 PA instances, which include 4,014 cosmetic contact lenses, 276 artificial eyes and 6,016 printed eyes. For testing, the dataset contains 9,439 bonafide iris images alongside 9,896 PA instances, which break down into 2,720 cosmetic contact lenses, 404 artificial eyes and 6,772 printed eyes.

PAD-Experiment-1: Here, we aim to evaluate the realism of the synthetically generated PAs against real PAs by training the iris PA detection methods with 14,970 bonafide samples and synthetically generated PAs: 6,016 printed eyes, 276 artificial eyes and 4,014 cosmetic contact lenses. For testing, the set contains 9,439 bonafide iris images alongside 9,896 PA instances, consisting of 2,720 cosmetic contact lenses, 404 artificial eyes and 6,772 printed eyes.
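The PAD results in Tables A.1-A.3 are reported as the True Detection Rate (TDR) at a fixed False Detection Rate (FDR). The following is a minimal sketch of how one such operating point can be computed from PA-detection scores; it assumes higher scores indicate a PA and is an illustrative calculation, not the exact evaluation script used here.

import numpy as np

def tdr_at_fdr(bonafide_scores, pa_scores, target_fdr=0.01):
    """TDR at a fixed FDR for a PA detector whose scores are higher for PAs.

    FDR: fraction of bonafide samples incorrectly flagged as PAs.
    TDR: fraction of PA samples correctly flagged as PAs."""
    bonafide_scores = np.asarray(bonafide_scores)
    # Choose the threshold so that only target_fdr of bonafide scores exceed it.
    threshold = np.quantile(bonafide_scores, 1.0 - target_fdr)
    tdr = float((np.asarray(pa_scores) > threshold).mean())
    return tdr, float(threshold)

# Example: tdr, thr = tdr_at_fdr(bonafide_scores, pa_scores, target_fdr=0.01)  # TDR @ 1% FDR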
PAD-Experiment-2: Here, we aim to evaluate the usefulness of the generated PA samples for balanced training, where under-represented PA classes are augmented with synthetically generated PAs. In this experiment, the PAD methods are trained using 14,970 bonafide irides alongside a balanced collection of 15,000 PA samples. This collection comprises 276 real artificial eyes, 4,014 real cosmetic contact lenses and 5,000 real printed eyes, supplemented with 4,724 synthetic artificial eyes and 986 synthetic cosmetic contact lenses. As in the previous experiment, testing is done on 9,439 bonafide iris images alongside 9,896 PA instances, consisting of 2,720 cosmetic contact lenses, 404 artificial eyes and 6,772 printed eyes.

Analysis: The variation in the number of samples across the different PA categories influences the effectiveness of the PAD techniques. This impact is evident when the outcomes of PAD-Experiment-0 are compared with those of PAD-Experiments 1 and 2. In PAD-Experiment-2, the PAD methods are trained with 14,970 bonafide samples and an equalized set of PA samples (namely, 5,000 from each PA category), including both real and synthesized PAs. Based on the data in Table A.1 and Table A.3, there is a discernible enhancement in the performance of each PAD approach when trained with balanced samples from each class. Moreover, the comparison of synthetic and real PA samples was conducted through PAD-Experiment-1, in which the real PA samples in the training set were substituted with synthetic PAs. When comparing the performance metrics in Table A.1 and Table A.2, only a marginal discrepancy in PAD efficacy is noticeable.

Iris Recognition

As mentioned earlier, the lack of a sufficient number of unique identities with intra-class variations in a dataset can affect the training and testing of many iris recognition methods, especially recognition methods based on deep networks that need a large number of training samples for good performance. Therefore, we outline the experimental frameworks established to assess the utility of the synthetically generated irides (with both inter- and intra-class variations) for iris recognition. As seen from the previous experiments, among the GANs studied in this research, only iWarpGAN has the capability of generating irides whose identities are distinct from the training data. We therefore train iWarpGAN with the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets, separately, to generate synthetic irides with inter- and intra-class variations. The generated dataset is utilized in this experiment to evaluate its usefulness for improved iris recognition.

Recog-Experiment-0: In this baseline experiment, EfficientNet [66], ResNet-101 [101] and DenseNet-201 are trained using a triplet training approach. Training and testing follow a cross-dataset protocol, i.e., when the models are trained using real irides from CASIA-Iris-Thousand and CASIA-CSIR, testing is done on the IITD-iris dataset.

Recog-Experiment-1: This experiment assesses the impact of the synthetic iris dataset on the performance of deep learning based iris recognition methods. In this context, EfficientNet, ResNet-101 and DenseNet-201 are trained not only with the real irides from the CASIA-Iris-Thousand, CASIA-CSIR and IITD-iris datasets but also with a synthetically generated iris dataset derived from iWarpGAN.
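Both recognition experiments train the backbones with a triplet objective, in which an anchor and a positive image share an identity while a negative image does not. The sketch below shows one such training step using PyTorch's built-in triplet margin loss; the backbone, embedding size, margin and learning rate are placeholder assumptions standing in for the EfficientNet / ResNet-101 / DenseNet-201 configurations described above.

import torch
import torch.nn as nn
import torchvision.models as models

# Placeholder backbone: EfficientNet, ResNet-101 or DenseNet-201 could be substituted;
# ResNet-18 is used here only to keep the sketch small.
backbone = models.resnet18(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, 256)   # 256-D embedding (assumed size)

triplet_loss = nn.TripletMarginLoss(margin=0.2)          # margin value is an assumption
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def training_step(anchor, positive, negative):
    """One triplet step: anchor and positive share an identity, negative does not."""
    backbone.train()
    optimizer.zero_grad()
    loss = triplet_loss(backbone(anchor), backbone(positive), backbone(negative))
    loss.backward()
    optimizer.step()
    return loss.item()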
Analysis: Figures A.6 and A.7 illustrate that the efficacy of the deep learning based iris recognition models improves with the incorporation of a larger training set. This enhancement is particularly evident when the models are trained using a combination of both real iris images and those synthetically produced by iWarpGAN. Although the baseline performance of ResNet-101 and EfficientNet is somewhat modest, both exhibit a substantial improvement in performance; similar behavior is seen for DenseNet-201.

Figure A.6: Performance of DenseNet-201, EfficientNet and ResNet-101 in the cross-dataset evaluation scenario, i.e., trained using the CASIA-Iris-Thousand and CASIA-CSIR datasets and tested on the IITD-iris dataset. An improvement in performance is seen when the size of the training set is increased using synthetic irides.

Figure A.7: Performance of DenseNet-201, EfficientNet and ResNet-101 in the cross-dataset evaluation scenario, i.e., trained using the CASIA-Iris-Thousand and IITD-iris datasets and tested on the CASIA-CSIR dataset. An improvement in performance is seen when the size of the training set is increased using synthetic irides.

Summary & Future Work

In this section, we summarize the studies presented in this research and discuss the future scope of synthetic iris generation.

Current Techniques & their Limitations

In this research, we studied different GAN methods for generating synthetic irides, both bonafide and different types of presentation attacks. The generated irides were evaluated for their realism, quality, uniqueness with respect to the training dataset, and utility. Using these experiments as our criteria for comparison, we conclude that: (1) GAN methods such as RaSGAN, StarGAN-v2 and CITGAN can generate partially synthetic datasets, but fail to generate enough samples that are unique from the training dataset, i.e., the generated dataset has high similarity with the training dataset and also within itself. Similar behavior was seen for Stylegan-3; however, the images generated by Stylegan-3 are highly realistic and very close to the original dataset in terms of quality (as seen in Figure A.2). (2) On the other hand, iWarpGAN showed the capability of generating fully synthetic irides with both inter- and intra-class variations, which can help augment limited iris datasets for training and testing iris recognition methods. Since this method is scalable to multiple domains (using an attribute vector), it can also be utilized to generate partially synthetic irides from various domains, i.e., bonafide samples and various PAs, which can be used to enhance the performance of different PAD methods (as shown in the PAD experiments above). While this method provides solutions for both fully and partially synthetic iris generation, iWarpGAN relies on image transformation, whereby the network requires both an input image and a style reference image to modify the identity and style and produce an output image. Such a process could constrain the range of features that iWarpGAN is able to explore. Also, some similarity is still observed between the training samples and the generated irides.

Future Work & Scope

Numerous researchers are dedicating their efforts to the creation of synthetic face images encompassing varied attributes, styles, identities, spectra, and more. However, very little work has been done in the field of synthetic iris generation. This opens up a wide array of opportunities that warrant in-depth investigation and exploration.
Some of the possible directions for future work in this field are listed as follows:

• Generalizable Solution for Fully Synthetic Iris Images: The first area of exploration involves developing a generalizable solution for creating fully synthetic iris images. This involves not just replicating the physical appearance of an iris but also ensuring that the synthetic images can adapt or respond to different lighting conditions and camera specifications, just as a real iris would. Such a solution would have significant implications for enhancing the realism and applicability of synthetic irides in various fields, including iris recognition and presentation attack detection.

• Generating Fully Synthetic Ocular Images: Another intriguing direction is the generation of complete ocular images, which include not only the iris but also other parts of the eye. So far, research in this field has mainly focused on generating cropped iris images, and in some cases the image quality deteriorates as more information is introduced into the image [157]. Therefore, this area needs attention from researchers in order to study the other distinguishing features of the eye apart from the iris. Creating realistic ocular images that accurately represent the myriad variations in human eyes could also aid in the development of more robust facial recognition technologies by providing a method to generate faces that capture the intricate details of a real iris, which is missing from most face generation methods.

• Synthetic Iris Videos to Mimic Liveness of Real Irides: The creation of synthetic iris videos that can mimic the liveness of real irides is a particularly challenging yet rewarding prospect. Such advancements would be beneficial in developing more robust PA detection methods. By simulating the natural movements and minute dynamic changes of the iris, these videos could provide an authentic and effective tool for training and improving liveness detection algorithms in iris recognition systems. As mentioned earlier, this could also aid in developing robust facial recognition technologies.

• Multi-spectrum Iris Image Generation: The generation of multi-spectrum iris images presents another frontier. The human iris exhibits different characteristics under various light spectra, a feature that is often leveraged in biometric systems. Developing synthetic iris images that can accurately reflect these multi-spectral properties would not only enhance the realism of these images but also expand their utility in biometric recognition systems.
Such multi-spectrum images could serve as a valuable resource for researchers and developers, offering a versatile tool for testing and improving multi-spectral iris recognition technologies. The potential applications of successfully generated synthetic iris images are vast and varied. In security and biometric recognition systems, these images can help improve the accuracy and robustness of systems by providing a diverse range of data for training and testing. In the medical field, synthetic iris images could be used for training purposes, enabling medical professionals to recognize and diagnose eye-related diseases more effectively. Furthermore, in the realm of entertainment and virtual reality, realistic synthetic iris images could enhance the visual experience by providing more lifelike and expressive characters. The ability to generate eyes that accurately mimic human emotions could revolutionize the way we interact with virtual environments and characters. In conclusion, while the generation of realistic and unique synthetic iris images is still in the development stage, it presents an opportunity for research and exploration.