DIGITAL IMAGE FORENSICS IN THE CONTEXT OF BIOMETRICS

By

Sudipta Banerjee

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computer Science – Doctor of Philosophy

2020

ABSTRACT

DIGITAL IMAGE FORENSICS IN THE CONTEXT OF BIOMETRICS

By

Sudipta Banerjee

Digital image forensics entails the deduction of the origin, history and authenticity of a digital image. While a number of powerful techniques have been developed for this purpose, much of the focus has been on images depicting natural scenes and generic objects. In this thesis, we direct our focus to biometric images, viz., iris, ocular and face images. Firstly, we assess the viability of using existing sensor identification schemes, developed for visible spectrum images, on near-infrared (NIR) iris and ocular images. These schemes are based on estimating the multiplicative sensor noise that is embedded in an input image. Further, we conduct a study analyzing the impact of photometric modifications on the robustness of these schemes. Secondly, we develop a method for sensor de-identification, where the sensor noise in an image is suppressed but its biometric utility is retained. This enhances privacy by unlinking an image from its camera sensor and, subsequently, from the owner of the camera. Thirdly, we develop methods for constructing an image phylogeny tree from a set of near-duplicate images. An image phylogeny tree captures the relationship between subtly modified images by computing a directed acyclic graph that depicts the sequence in which the images were modified. Our primary contribution in this regard is the use of complex basis functions to model an arbitrary transformation between a pair of images, and the design of a likelihood-ratio-based framework for determining the original and the modified image in the pair. We are currently integrating a graph-based deep learning approach with sensor-specific information to refine and improve the performance of the proposed image phylogeny algorithm.

Dedicated to Maa and Baba

ACKNOWLEDGMENTS

I wish to express my sincerest gratitude to my doctoral advisor and mentor, Professor Arun Ross, for providing me with outstanding resources to build my research appetite and for grooming me as a graduate student. His expertise, exceptional presentation skills, and quest for excellence have motivated me to pursue quality in all my tasks. He has taught me the importance of seeking answers to three questions in research: what, why and how. It is necessary, but not sufficient, to know what works; why a specific technique works the way it does, and how it works, are equally important questions. He has always encouraged me to strive for a holistic experience as a doctoral candidate by giving me opportunities to attend conferences, workshops and summer schools, and to volunteer in outreach programs. For that, I am grateful to him. I am also thankful to Professor Anil Kumar Jain, whom I am fortunate to have on my doctoral committee. He has been a great source of inspiration and encouragement to me. I take this opportunity to thank Professor Selin Aviyente and Professor Yiying Tong for serving on my doctoral committee and providing me with their expert advice on my research. I appreciate the support given by Professor Sandeep Kulkarni, the Associate Chair of Graduate Studies, and Professors Xiaoming Liu and Vishnu Boddeti.
I am also thankful to Professor Ananda S. Chowdhury, my Master's advisor, who encouraged me to pursue doctoral studies at Michigan State University.

I feel blessed to have very supportive research lab mates and colleagues who have been extremely patient with me and supported me through this journey of five and a half years. They helped me, critiqued me, and prepared me to defend my research. They celebrated with me when I received an award, and consoled me when my papers were rejected. They have been critical in my journey, which is now inching towards its end: (in the order I encountered them) Thomas Swearingen, Steven Hoffman, Dr. Denton (Denny) Bobeldyk, Dr. Yaohui (Eric) Ding, Raghunandan Pasula, Aaron Gonzalez, Melissa Dale and Jessie, Anurag Chowdhury, Dr. Vahid Mirjalili, Renu Sharma, Achsah Ledala, Shivangi Yadav, Dr. Darshika Jauhari and Austin Cozzo. I also want to thank Kelly Climer, Brenda Hodge, and Erin Dunlop for their support and help.

Personally, none of this would have been possible without God's grace, and I could feel the blessings and unconditional love and support showered upon me through my parents. My family has been a pillar of support right from the beginning, when I decided to come abroad for the doctoral program. They have continued to motivate me whenever I doubted myself. My brother and sister-in-law expressed how proud they are of me, something that I will always cherish. Friends, what would I do without them? I had my doubts about making new friends in the States; I always thought I would stop making good friends after school and college. Pratiti, Sangita, and Monalisa have been great friends, and they will be my friends for life. I am grateful to have added more people to that list, my extended support network. They have been by my side whenever I needed help, and I hope to continue our friendship even after we part ways from Michigan State University. A very special thanks to Inci, Nilay and Priyanka, and Vidhya and Sai: you guys mean a lot to me, and thank you for having me as your friend!

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS

CHAPTER 1  INTRODUCTION
  1.1 Biometrics
  1.2 Digital Image Forensics
  1.3 Image Forensics in the Context of Biometrics
    1.3.1 Sensor-based forensics
      1.3.1.1 Sensor Identification Methods
      1.3.1.2 Sensor De-identification Methods
      1.3.1.3 Joint Biometric-sensor Representation Methods
    1.3.2 Content-based forensics
      1.3.2.1 Image Phylogeny (Tree and Forest) Construction Methods
  1.4 Thesis Contributions
  1.5 Thesis Organization

CHAPTER 2  SENSOR IDENTIFICATION
  2.1 Introduction
  2.2 Study of existing sensor identification schemes on near-infrared ocular images
    2.2.1 Experiments and results for the first study
    2.2.2 Datasets used in the first study
    2.2.3 Experimental methodology used in the first study
    2.2.4 Results and discussion from the first study
    2.2.5 Summary of the first study
  2.3 Analyzing the effect of photometric transformations on sensor identification schemes for ocular images
    2.3.1 Photometric Transformation
    2.3.2 Experiments and results for the second study
    2.3.3 Datasets used in the second study
    2.3.4 Experimental methodology and results for the second study
    2.3.5 Analysis and explanatory model for the second study
    2.3.6 Summary of the second study
  2.4 Summary

CHAPTER 3  SENSOR DE-IDENTIFICATION
  3.1 Introduction
  3.2 Sensor de-identification for iris sensors
    3.2.1 Perturbing the PRNU Pattern for iris sensors
    3.2.2 Proposed method
      3.2.2.1 Problem formulation
      3.2.2.2 Deriving perturbations and PRNU Spoofing
    3.2.3 Experiments and Results
      3.2.3.1 Datasets
      3.2.3.2 Sensor identification before PRNU spoofing
      3.2.3.3 Sensor identification after PRNU spoofing
      3.2.3.4 Retaining biometric matching utility
    3.2.4 Summary of the first strategy of sensor de-identification
  3.3 Smartphone camera de-identification
    3.3.1 Proposed Method for smartphone camera de-identification
    3.3.2 Experiments and Results for smartphone camera de-identification
      3.3.2.1 Dataset
      3.3.2.2 Experimental Methodology for smartphone camera de-identification
      3.3.2.3 Results for smartphone camera de-identification
    3.3.3 Summary of the second strategy for smartphone camera de-identification
  3.4 Summary

CHAPTER 4  JOINT BIOMETRIC-SENSOR REPRESENTATION
  4.1 Introduction
  4.2 Proposed Method
  4.3 Experiments
    4.3.1 Datasets
    4.3.2 Evaluation Protocol
    4.3.3 Experimental Settings
  4.4 Results and Analysis
    4.4.1 Selection of the metric and dimensionality of embedding
    4.4.2 Performance of each of the three training modes
    4.4.3 Results of the joint identification experiment
    4.4.4 Results of the joint verification experiment
    4.4.5 Analysis of the performance of the proposed method on the MICHE-I dataset
  4.5 Summary

CHAPTER 5  IMAGE PHYLOGENY TREE FOR NEAR-DUPLICATE BIOMETRIC IMAGES
  5.1 Introduction
  5.2 Proposed Approach
    5.2.1 Parametric Transformations
      5.2.1.1 Global Linear (GL) Model
      5.2.1.2 Local Linear (LL) Model
      5.2.1.3 Global Quadratic (GQ) Model
    5.2.2 IPT-DAG Construction
    5.2.3 Performance Evaluation
  5.3 Experiments
    5.3.1 Datasets and Experimental Methodology
    5.3.2 Results and Discussion
  5.4 Summary

CHAPTER 6  A PROBABILISTIC FRAMEWORK FOR IMAGE PHYLOGENY USING BASIS FUNCTIONS
  6.1 Introduction
  6.2 Proposed Method
    6.2.1 Parameter Estimation of Basis Functions
      6.2.1.1 Orthogonal Polynomial Basis Functions
      6.2.1.2 Wavelet Basis Functions
      6.2.1.3 Radial Basis Functions
    6.2.2 Asymmetric Measure Computation and IPT Construction
      6.2.2.1 Likelihood ratio for computing the asymmetric similarity measure
      6.2.2.2 IPT Construction
  6.3 Experiments
    6.3.1 Datasets
    6.3.2 Experimental Methodology
      6.3.2.1 Experiment 1: Efficacy of basis functions
      6.3.2.2 Experiment 2: IPT Reconstruction
      6.3.2.3 Experiment 3: Cross-modality testing on multiple configurations
      6.3.2.4 Experiment 4: Robustness to unseen photometric transformations
      6.3.2.5 Experiment 5: Ability to handle geometric transformations
      6.3.2.6 Experiment 6: Ability to handle near-duplicates available online
      6.3.2.7 Experiment 7: Ability to handle deep learning-based transformations and image augmentation schemes
  6.4 Results and Analysis
    6.4.1 Results of Experiment 1
    6.4.2 Results of Experiment 2
    6.4.3 Results of Experiment 3
    6.4.4 Results of Experiment 4
    6.4.5 Results of Experiment 5
    6.4.6 Results of Experiment 6
    6.4.7 Results of Experiment 7
    6.4.8 Further Analysis
  6.5 Explanatory Model
  6.6 Summary

CHAPTER 7  GRAPH-BASED APPROACH FOR IMAGE PHYLOGENY FOREST
  7.1 Introduction
  7.2 Proposed Method
    7.2.1 Locally-scaled spectral clustering
    7.2.2 GCN-based Node Embedding
    7.2.3 PRNU-based Link Prediction
  7.3 Implementation
  7.4 Datasets and Experiments
    7.4.1 Experiment 1: Evaluation of locally-scaled spectral clustering
    7.4.2 Experiment 2: Evaluation of the proposed IPT reconstruction algorithm using GCN-based node embedding and PRNU-based link prediction
    7.4.3 Experiment 3: Evaluation of the proposed IPF reconstruction using locally-scaled spectral clustering, GCN-based node embedding and PRNU-based link prediction
    7.4.4 Baseline
  7.5 Results and Analysis
    7.5.1 Results for Experiment 1
    7.5.2 Results for Experiment 2
    7.5.3 Results for Experiment 3
  7.6 Summary

CHAPTER 8  CONCLUSION AND FUTURE WORK
  8.1 Research Contributions
  8.2 Future Work

BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Dataset and sensor specifications.
Table 2.2: Sensor identification accuracies before and after applying the Box-Cox transformation.

Table 2.3: Rank-1 Confusion Matrix for the Basic SPN / MLE SPN based PRNU extraction scheme.

Table 2.4: Rank-1 Confusion Matrix for the Enhanced SPN / Phase SPN based PRNU extraction scheme.

Table 2.5: Rank-1 Sensor Identification Accuracies (%). The value enclosed in parentheses indicates the difference in accuracy when compared to that obtained using the original images. Note that in all cases, the reference pattern for each sensor is computed using the unmodified original images.

Table 2.6: Jensen-Shannon divergence values computed between the wavelet-denoised versions of the original and the photometrically transformed images.

Table 3.1: Specifications of the datasets used in this work.

Table 3.2: Confusion matrix for sensor identification involving unperturbed but resized images. The test noise residuals of images from 5 sensors are compared against reference patterns from 11 sensors. The last column indicates sensor identification accuracy.

Table 3.3: Results of PRNU spoofing where the target sensors (along the second column) are spoofed by perturbing the images from 5 source sensors, namely, Aop, JPC, IC, Cog and LG40 (along the first column). The test noise residual after the perturbation process is compared against the reference patterns of 11 sensors (see Table 3.1). The last 3 columns indicate the proportion of the perturbed images successfully classified as belonging to the target sensor, denoted as the Spoof Success Rate (SSR). The highest values of the SSR are bolded.

Table 3.4: Dataset specifications. The top block corresponds to the MICHE-I dataset [117] and the bottom block corresponds to the OULU-NPU face dataset [35]. In the MICHE-I dataset, we denote the brand Apple as 'Device 1' and the brand Samsung as 'Device 2'. Two different smartphones belonging to the same brand and model, e.g., Apple iPhone5, are distinguished as 'UNIT I' and 'UNIT II'.

Table 3.5: Performance of the proposed algorithm for PRNU Anonymization in terms of sensor identification accuracy (%). Results are evaluated using 3 PRNU estimation schemes. 'Original' corresponds to sensor identification using images prior to perturbation. 'After' corresponds to sensor identification using images after perturbation, and 'Change' indicates the difference between the 'Original' and 'After' sensor identification accuracies. A high positive value in the 'Change' field indicates successful PRNU Anonymization.

Table 3.6: Performance of the proposed algorithm for PRNU Spoofing in terms of the spoof success rate (SSR) (%). Results are evaluated using three PRNU estimation schemes. A high value of SSR indicates successful spoofing.

Table 4.1: Dataset specifications used in this work. We used three datasets corresponding to 3 biometric modalities, viz., iris, periocular and face. Here, we perform joint biometric and sensor recognition, so the total number of classes is computed as the product of the number of subjects and the number of sensors. (*The MICHE-I dataset has a total of 75 subjects, out of which the first 48 subjects were imaged using iPhone 5S UNIT I and the remaining 27 subjects were imaged using iPhone 5S UNIT II, as observed in [71]. Here, 'UNIT' refers to two different units of the same brand and model iPhone 5S, and they should therefore be treated as two different smartphones. In this case, the total is 375 since only a subset of the total 75 subjects were imaged using either of the two units of the iPhone 5S smartphone at a time.)

Table 4.2: Description of the training modes and the loss functions used in this work.

Table 4.3: Results in the joint identification scenario. Results are reported in terms of Rank 1 identification accuracies (%). A correct joint identification implies that both sensor and subject resulted in a match. A mismatch of either subject or sensor or both will result in an incorrect joint identification.

Table 4.4: Results in the joint verification scenario. Results are reported in terms of true match rate (TMR) at false match rates (FMRs) of 1% and 5%.

Table 5.1: Photometric transformations and selected range of parameters used for the first set of experiments.

Table 5.2: Photometric transformations and selected range of parameters used for the second set of experiments.

Table 5.3: Experiment 1. Performance of the 3 parametric models in representing each of the 5 photometric transformations.

Table 5.4: Experiment 2. IPT-DAG reconstruction accuracy for different tree configurations using the magnitude of predicted parameters as the asymmetric dissimilarity measure.

Table 5.5: Experiment 3. IPT-DAG reconstruction accuracy for the multiple transformation scenario depicted in Figure 5.5.

Table 6.1: Description of the datasets used in this work.

Table 6.2: Photometric transformations and the range of the corresponding parameters used in the training and testing experiments. The transformed images are scaled to [0, 255]. Note that experiments were also conducted using other complex photometric transformations besides the ones listed here.

Table 6.3: Experiment 5: Geometric transformations and their parameter ranges used in this work.

Table 6.4: Experiment 2: Root identification and IPT reconstruction accuracies for face images (Partial Set).

Table 6.5: Experiment 2: Root identification and IPT reconstruction accuracies for face images (Full Set).

Table 6.6: Experiment 3A: Root identification and IPT reconstruction accuracies for iris images in the cross-modality setting.

Table 6.7: Experiment 3B: Root identification and IPT reconstruction accuracies for fingerprint images (Config-I) in the cross-modality setting.

Table 6.8: Experiment 3B: Root identification and IPT reconstruction accuracies for fingerprint images (Config-II) in the cross-modality setting.

Table 6.9: Experiment 3: Baseline performance of basis functions in terms of root identification and IPT reconstruction accuracies in the intra-modality setting.

Table 6.10: Experiment 4: Root identification and IPT reconstruction accuracies for unseen photometric transformations.

Table 6.11: Experiment 5: Root identification and IPT reconstruction accuracies for geometric transformations. The top two rows indicate the baseline algorithms. The baselines yield only one root node as output, so results are reported only at Rank 1 and the remaining ranks are indicated as Not Applicable (NA). In this experiment, the testing (TE) is always done on geometrically modified images (indicated by TE-GM), but the training (TR) can be done using either photometrically modified images (indicated by TR-PM) or geometrically modified images (TR-GM). Results indicate that training on geometrically modified images yields the best performance when tested on geometric transformations.

Table 6.12: Experiment 7: Root identification and IPT reconstruction accuracies for deep learning-based transformations. In this case, the near-duplicates are generated using an autoencoder.

Table 6.13: Experiment 7: Root identification and IPT reconstruction accuracies for deep learning-based transformations. In this case, the near-duplicates are generated using image augmentation schemes for training deep neural networks.

Table 6.14: Approximate von Neumann entropy for analysis of spurious edges and missing edges in reconstructed IPTs. The mean and the standard deviation of the differences between the entropy of the ground truth and the reconstructions are reported. Low values indicate accurate reconstructions and a smaller number of spurious as well as missing edges.

Table 7.1: Photometric and geometric transformations and the range of the corresponding parameters used in Experiments 2 and 3. The transformed images are scaled to [0, 255]. Note that these transformations are used only in the training stage. For the test stage, any arbitrary transformation can be used.

Table 7.2: Experiment 2: Performance of node embedding and link prediction modules in terms of root identification and IPT reconstruction accuracies for both photometric and geometric transformations. The results are reported for two scenarios. The values to the left of the forward slash indicate Scenario 1 (trained on face images and tested on face images) and the values to the right indicate Scenario 2 (trained on face images but tested on images depicting natural scenes).

Table 7.3: Experiment 2: Evaluation of the performance of node embedding and link prediction modules in the context of unseen transformations, unseen modalities and configurations, and an unseen number of nodes.

Table 7.4: Experiment 3: Number of clusters (mean and standard deviation) produced during IPF construction by conventional spectral clustering and locally-scaled spectral clustering (proposed). A lower value (mean ≈ 1, standard deviation ≈ 0) is desirable. The proposed method yields better results (bolded).
Table 7.5: Experiment 3: Evaluation of GCN-based node embedding and PRNU-based link prediction for each IPT configuration used in the IPF in terms of root identification and reconstruction accuracies. Results indicate that the proposed method (bolded) significantly outperforms state-of-the-art baselines in all the cases.

LIST OF FIGURES

Figure 1.1: The overarching objective in this thesis is the integration of content-specific and sensor-specific characteristics present in a biometric image to develop image forensic strategies in the context of biometrics.

Figure 1.2: Examples of Near-Infrared (NIR) iris images captured using (a) Aoptix Insight, (b) LG iCAM4000 and (c) IrisCam V2 sensors.

Figure 1.3: General framework for sensor identification from an iris image.

Figure 1.4: Photo Response Non-Uniformity (PRNU) enhancement models used for suppression of scene details in the images. Here, the x-axis represents the pre-enhanced noise residual values in the wavelet domain and the y-axis represents the post-enhanced noise residual values.

Figure 1.5: Examples of variations of the same image uploaded on multiple websites with subtle modifications making them appear almost identical.

Figure 1.6: Examples of photometrically related near-duplicate face images. (a) A set of 20 related images and (b) their corresponding Image Phylogeny Tree (IPT). In our work, (a) is the input and (b) is the output.

Figure 1.7: Near-duplicates appearing on defaced websites. Images curated by Zone-H, not meant for public distribution.

Figure 2.1: Average pixel-intensity histograms of four sensors. The pixel intensities vary across different sensors, indicating diverse image characteristics.

Figure 2.2: Reference patterns for the CASIAv3 Interval dataset estimated using different PRNU estimation schemes. Visual inspection reveals a noise-like pattern extracted from the training images that is devoid of image content.

Figure 2.3: Noise residual from an image captured using the Aoptix Insight sensor. (a) Before enhancement. (b) After enhancement. The application of the enhancement model significantly subdues the scene content in the image.

Figure 2.4: Comparison of overall accuracy of different PRNU extraction schemes using CMC and ROC curves.

Figure 2.5: Digital watermark present in the reference pattern of the AD100 sensor. (The logarithmic transformation has been used here for better visualization.)

Figure 2.6: Examples of Near-Infrared (NIR) ocular images exhibiting (a) defocus blur, (b) uneven illumination and (c) motion blur (due to eyelid movement).

Figure 2.7: Examples of iris sensors considered in this work. (a) IrisKing IKEMB100, (b) LG 4000, (c) IrisGuard-IG-AD100, (d) Panasonic-BM-ET100US Authenticam, (e) JIRIS JPC1000, (f) CASIA IrisCam-V2, (g) Aoptix Insight, (h) OKI IrisPass-h, (i) LG 2200, (j) Cogent and (k) Everfocus Monochrome CCD.

Figure 2.8: An example of an NIR iris image subjected to seven illumination normalization schemes. (a) Original, (b) CLAHE, (c) Gamma correction, (d) Homomorphic filtering, (e) MSR, (f) SQI, (g) DCT normalization and (h) DoG.

Figure 2.9: Cumulative Matching Characteristic (CMC) curves depicting the effect of different illumination normalization processes on PRNU estimation techniques. (a) Original, (b) CLAHE, (c) Gamma correction, (d) Homomorphic filtering, (e) MSR, (f) SQI, (g) DCT normalization and (h) DoG.

Figure 2.10: ROC curves depicting sensor identification performance of photometrically transformed images. (a) Basic SPN, (b) Phase SPN, (c) MLE SPN and (d) Enhanced SPN.

Figure 3.1: Illustration of the objective of the proposed method, i.e., to perturb an ocular (iris) image such that its PRNU pattern is modified to spoof that of another sensor, while not adversely impacting its biometric utility.

Figure 3.2: The proposed algorithm for deriving perturbations for the input image using the candidate image. (a) Steps involved in modifying the original image from the source sensor using a candidate image from the target sensor (see Algorithm 1), and (b) role of the candidate image in the perturbation engine (see Algorithm 2).

Figure 3.3: Illustration of PRNU spoofing using images belonging to the source sensor JPC and the candidate images belonging to the target sensor Aoptix.

Figure 3.4: Example of PRNU spoofed images originating from the JPC 1000 sensor (first column), illustrated for Baseline 1 (second column), Baseline 2 (third column) and the proposed method (last column). Here, the target sensor is Aoptix.

Figure 3.5: Intermediate images generated when an image from the Aoptix sensor is perturbed using a candidate image from Cogent. For the sake of brevity, NCC values corresponding to the reference patterns of the first 5 sensors in Table 3.1 are mentioned in the figure. The arrows indicate the increase in the NCC values corresponding to the target sensor.

Figure 3.6: Receiver Operating Characteristic (ROC) curves of matching performance obtained using the VeriEye iris matcher software. The terms 'Original', 'Perturbed' and 'Original vs. Perturbed' indicate the three different matching scenarios (see Section 3.2.3.4). 'Original' indicates matching only unperturbed images; 'Perturbed' indicates matching only perturbed images; 'Original vs. Perturbed' indicates the cross-matching case where unperturbed images are matched against perturbed images. Note that the curves obtained from perturbed images match very closely with the curves corresponding to the unperturbed images, illustrating the preservation of iris recognition for each sensor depicted in each column. The results are compared with the Baseline 1 and 2 algorithms discussed in Section 3.2.3.3.

Figure 3.7: Impact of an increase in the number of iterations on iris recognition performance for the pair of LG 4000 (source) and Aoptix (target) sensors.

Figure 3.8: The objective of our work. The original biometric image is modified such that the sensor classifier associates it with a different sensor, while the biometric matcher successfully matches the original image with the modified image.

Figure 3.9: Illustration of PRNU Anonymization. The DCT coefficients are arranged such that the top-left portion has the low frequency components while the bottom-right portion encapsulates the high frequency information. The PRNU anonymized image is the result of suppression of high frequency components (see Algorithm 3; the parameter is set to 0.9 here).

Figure 3.10: Illustration of PRNU Spoofing. The high frequency components in the original image are suppressed first, the residue being the low frequency components. The high frequency components of the target sensor are further computed from the candidate images, and added to the low frequency components of the original image, resulting in the PRNU spoofed image (see Algorithm 4; the parameter is set to 0.7 here).

Figure 3.11: Example images from the MICHE-I and the OULU-NPU datasets acquired using (a) Apple iPhone 5 Rear, (b) Samsung Galaxy S4 Front, (c) Samsung Galaxy S6 Edge Front, (d) HTC Desire EYE Front, (e) MEIZU X5 Front, (f) ASUS Zenfone Selfie Front, (g) Sony XPERIA C5 Ultra Dual Front and (h) OPPO N3 Front sensors.

Figure 3.12: ROC curves for matching PRNU Anonymized images. Each row corresponds to a different device identifier: (a) Device 1 UNIT I, (b) Device 1 UNIT II and (c) Device 2 UNIT I.

Figure 3.13: ROC curves for matching PRNU Spoofed images. Here, the source sensor is Device 1 UNIT I. In this case, the target sensors are: (a) Device 1 UNIT II (top row) and (b) Device 2 UNIT I (bottom row).

Figure 3.14: ROC curves for matching PRNU Spoofed images. Here, the source sensor is Device 1 UNIT II. In this case, the target sensors are: (a) Device 1 UNIT I (top row) and (b) Device 2 UNIT I (bottom row).

Figure 3.15: ROC curves for matching PRNU Spoofed images. Here, the source sensor is Device 2 UNIT I. In this case, the target sensors are: (a) Device 1 UNIT I (top row) and (b) Device 1 UNIT II (bottom row).

Figure 4.1: Difference between (a) methods that use separate modules for computing biometric and sensor representations, and (b) the proposed method that uses an embedding network to generate a joint biometric-sensor representation.

Figure 4.2: Outline of the proposed method used for computing the joint biometric and sensor representation. Input: a single image, a pair of images, or a 3-tuple of images to the embedding network. Output: joint biometric-sensor representation. The embedding network is trained in three mutually exclusive modes, viz., classical mode (top row), siamese mode (middle row) and triplet mode (bottom row). The switching circuit selects only one training mode at a time.

Figure 4.3: Variation in the joint verification performance as a function of the dimensionality of the joint representation. The experiment is conducted on the validation set using 50 images from the MICHE-I dataset and four dimensionality values, viz., {4, 8, 16, 32}. The 8-dimensional embedding resulted in the highest joint verification accuracy, and is therefore selected in this work.

Figure 4.4: 2-D projection of the embeddings using t-SNE used for sensor identification in the OULU-NPU dataset. Each sensor class is sufficiently discriminated from the rest of the sensors.

Figure 4.5: Cumulative Matching Characteristic (CMC) curves for the proposed method in the joint identification scenario for the following datasets used in this work: (a) CASIA-Iris V2, (b) MICHE-I and (c) OULU-NPU. Refer to Table 4.2 for the different training networks and loss functions indicated in the legend in an identical order.

Figure 4.6: Receiver Operating Characteristic (ROC) curves for the proposed method in the joint verification scenario for the following datasets used in this work: (a) CASIA-Iris V2, (b) MICHE-I and (c) OULU-NPU. Refer to Table 4.2 for the different training networks and loss functions indicated in the legend in an identical order.

Figure 4.7: Example images from the challenging MICHE-I dataset. (a) Occlusion, (b) Downward gaze and specular reflection, (c) Prominent background in the outdoor setting and (d) A single image containing both eyes but labeled as a right eye image (061_GT2_OU_F_RI_01_3, where RI indicates right eye).

Figure 4.8: Cumulative Matching Characteristic (CMC) curves for the proposed method in the joint identification scenario for the MICHE-I dataset evaluated separately on the two lateralities, i.e., on the (a) Left periocular images and (b) Right periocular images. Results indicate that the proposed method performs better on the left periocular images compared to the right periocular images.

Figure 5.1: Example of photometric transformations applied to an NIR iris image. (a) Original image, (b) brightness adjusted image, and (c) contrast adjusted image.

Figure 5.2: General framework for parameter estimation and IPT reconstruction from a set of near-duplicate and related iris images.

Figure 5.3: Illustration of global model optimization vs. local model optimization. (a) Global model optimizes with respect to the entire image, (b) local model optimizes with respect to each of the tessellated patches in the image, and (c) local optimization for a pair of tessellated images.

Figure 5.4: Examples of image phylogeny tree configurations considered in this work. (a) Breadth = 1, Depth = 3, (b) Breadth = 3, Depth = 1 and (c) Breadth = 2, Depth = 2.

Figure 5.5: IPT based on multiple photometric transformations. (a) IPT of Breadth = 3 and Depth = 2. (b) Example of an iris image undergoing multiple transformations in a single tree (different colored lines denote the different transformations indicated in the left figure).

Figure 5.6: Example of an IPT of breadth 3 and depth 1 undergoing Gaussian smoothing, resulting in an incorrect IPT-DAG reconstruction. (a) Original IPT-DAG (σ denotes the standard deviation governing the smoothing operation) and (b) incorrect IPT-DAG reconstruction.

Figure 6.1: The outline of the proposed method. The proposed method first models the photometric transformations between every image pair and then computes the asymmetric measure. Given a set of near-duplicate images as input (on the left), the two objectives are: (i) to determine the candidate set of root nodes, and (ii) to construct the IPT when the root image is known. The dashed arrows indicate ancestral links and the bold arrows indicate immediate links between parent and child nodes.

Figure 6.2: IPT configurations used in Experiments 2 and 3 for the face, iris and fingerprint modalities. Note that the same configuration was tested across two modalities (Face and Iris), while two different configurations were tested for the same modality (Finger). The bold arrows indicate immediate links and the dashed arrows indicate ancestral links.

Figure 6.3: IPT configurations used in Experiment 4. The bold arrows indicate immediate links and the dashed arrows indicate ancestral links.

Figure 6.4: Experiment 5: An example IPT generated using geometrically modified near-duplicate images. The bold arrows indicate immediate links and the dashed arrows indicate ancestral links.

Figure 6.5: Experiment 7: (Left) IPT test configuration used for evaluation of the basis functions by employing autoencoder-generated near-duplicates. (Right) IPT test configuration used for evaluation of the basis functions by employing open source image augmentation packages. The bold arrows indicate immediate links and the dashed arrows indicate ancestral links.

Figure 6.6: Experiment 7: (Left) Near-duplicates generated using the BeautyGlow generative network. (Right) IPT constructed using Chebyshev polynomials for the near-duplicates on the left. The bold arrows indicate immediate links and the dashed arrows indicate ancestral links.

Figure 6.7: Experiment 1: 3D projected parameters using t-SNE corresponding to each photometric transformation (column) modeled using each basis function (row). Each color represents a single image. A total of 5 images were modeled. Gaussian and Bump RBFs model the majority of the transformations reasonably well, as indicated by the last two rows. The Brightness transformation was the easiest to model, as the parameters of the basis functions follow a continuous path.

Figure 6.8: Experiment 1: The photometric error between the actual output and the output modeled using the basis functions is denoted as the residual photometric error (PE). The mean of the residual PE is demonstrated for 2,000 image pairs modeled in both forward and reverse directions using the five basis functions. Gabor resulted in the highest residual PE, and the RBFs yield the lowest residual PE, demonstrating their efficacy in reliably modeling the transformations.

Figure 6.9: Experiment 1: 2D projected parameters using t-SNE in forward and reverse directions, corresponding to all 4 transformations modeled using each basis function: (a) Legendre, (b) Chebyshev, (c) Gabor, (d) Gaussian RBF and (e) Bump RBF. Legendre and Chebyshev polynomials can better discriminate between forward and reverse directions, as indicated by the relatively well-separated parameter distributions compared to the remaining basis functions.

Figure 6.10: Experiment 5: Example of geometric transformation (rotation) modeling using basis functions. (a) Original image (on the left) and the transformed image (on the right). (b) Modeled image pair using Legendre polynomials (modeled original image is on the left and modeled transformed image is on the right). (c) Modeled image pair using Gaussian RBF (modeled original image is on the left and modeled transformed image is on the right).

Figure 6.11: Experiment 5: ROC curves for recognition of the original images with the photometrically and geometrically modified images using a COTS face matcher. The recognition performance is higher for geometrically altered images compared to photometrically modified images, indicating a high degree of similarity with the original images.

Figure 6.12: Experiment 6: Examples of near-duplicates available online and their corresponding IPTs constructed using the proposed method. The first row corresponds to (a) 4 near-duplicates retrieved using the query Bob Marley, (b) IPT constructed using Gabor trained on the photometric distribution (the top 3 candidate root nodes are 2, 3, 1) and (c) IPT constructed using Gaussian RBF trained on the geometric distribution (the top 3 candidate root nodes are 3, 2, 1). The second row corresponds to (d) 7 near-duplicates retrieved using the query Britney Spears and (e) IPT constructed using Chebyshev trained on the photometric distribution (the top 3 candidate root nodes are 2, 4, 5). The bold arrows indicate immediate links and the dashed arrows indicate ancestral links.

Figure 6.13: Example images from the CelebA dataset containing prominent background details in the face images.

Figure 6.14: Illustration of steganographic images generated using the S-UNIWARD algorithm (on the left), and the differences in the coefficients in the DCT domain between the cover image and the stego image at each depth level (on the right).

Figure 6.15: Toy example demonstrating the effect of the insertion of a spurious edge on the von Neumann entropy. (a) Ground truth IPT, (b) correctly reconstructed IPT with a spurious edge and (c) incorrectly reconstructed IPT with a spurious edge. Note, the spurious edge is indicated by a dashed line.

Figure 7.1: Outline of the objective in this work. Given a set of near-duplicate face images belonging to the same subject (near-duplicates can be generated using either photometric or geometric transformations or both), our objective is two-fold. Firstly, we would like to filter out the images that do not belong to the same evolutionary structure. We achieve this by using a locally-scaled spectral clustering step. The clusters indicated by ellipsoids vary in diameter, indicating the importance of local scaling. Secondly, for each cluster, an image phylogeny tree (IPT) is constructed. The ensemble of IPTs results in the desired output corresponding to an Image Phylogeny Forest (IPF).

Figure 7.2: Illustration of the proposed spectral clustering, which uses locally-scaled kernels (bottom) instead of a single kernel with a global bandwidth (top). The number of images in each cluster, i.e., the density of each cluster, is not known a priori. The global bandwidth incorrectly merges two clusters. On the other hand, the local scales are computed assuming that clustering is inherently a geometric problem, resulting in three correct clusters in this example.

Figure 7.3: Illustration of the 'node embedding' module (Section 7.2.2). The module accepts a pair of inputs: the pixel intensity values of each image in the IPT and an adjacency matrix indicating relationships between the images in the IPT. The output of this module is a vector of depth labels corresponding to each IPT configuration fed as input.

Figure 7.4: Illustration of the 'link prediction' module (Section 7.2.3). The module accepts a pair of inputs: the depth labels from the 'node labeling' module (see Figure 7.3) and the sensor noise pattern (PRNU) features computed from each image of the set fed as input. The output of this module is the image phylogeny tree (IPT) containing edges directed from parent nodes to child nodes. Note that the ancestral links are present in the reconstructed IPT.

Figure 7.5: Illustration of the utility of PRNU in image phylogeny. The graphic illustrates the variation in PRNU patterns in response to photometric transformations. These variations are better visualized using the binary maps computed from each PRNU pattern (threshold = 0) and their corresponding power spectral density (PSD) plots. Note, the PSD plots of the images do not bear any apparent variation, but the PSD plots of the PRNU patterns reveal discernible differences. We intend to leverage this property of PRNU for the task of image phylogeny in conjunction with a GCN.

Figure 7.6: IPT configurations (structures) used in Experiment 2. For ease of visualization, only the immediate links are depicted. However, the ancestral links are also included for evaluation.

Figure 7.7: IPT configuration of iris and natural scene images used for evaluation in Experiment 2. The configuration used in the iris near-duplicates is different from the ones used in training (see Figure 7.6). The immediate links are depicted using bold blue arrows, while the ancestral links are depicted using dashed orange arrows.

Figure 7.8: Illustration of the image phylogeny forest structures used in Experiment 3. Each IPF comprises three IPTs, where each IPT may have 5 nodes (IPT 1 and IPT 4), 10 nodes (IPT 2 and IPT 5) or 15 nodes (IPT 3 and IPT 6). The selected test configurations differ from the configurations used in training the GCN and indicate variations both in density and configurations of the IPF test cases. The immediate links are indicated for ease of visualization, but ancestral links are also included for evaluation.

Figure 7.9: Experiment 1: Locally-scaled spectral clustering performance for near-duplicates downloaded from the Internet. The numbers indicate the cluster identifier to which an image has been assigned. On the left, six clusters (IPTs) have been identified. On the right, two clusters (IPTs) have been identified. The results are for visual inspection only as no ground truth is associated with them.

Figure 7.10: Experiment 1: Locally-scaled spectral clustering performance for near-duplicates generated using deep learning-based transformations [84]. The numbers indicate the cluster identifier to which an image has been assigned. The proposed method can successfully discern between minute changes in the attributes and assigns the modified images to distinct clusters (IPTs) in a majority of cases.
Figure 7.11: Experiment 3: Variation of clustering accuracies as a function of the number of nodes for the conventional spectral clustering (blue) and the locally-scaled spectral clustering (proposed) methods. The proposed method (orange bars) consistently results in higher means and lower standard deviations in clustering accuracies across 5, 10 and 15 nodes over the conventional spectral clustering algorithm.

Figure 7.12: Experiment 3: Variation in root identification and IPT reconstruction accuracies as a function of the number of nodes.

LIST OF ALGORITHMS

Algorithm 1: Selection of the candidate image
Algorithm 2: Spoofing the PRNU pattern
Algorithm 3: PRNU anonymization
Algorithm 4: PRNU spoofing
Algorithm 5: Asymmetric dissimilarity measure computation
Algorithm 6: IPT-DAG construction
Algorithm 7: Locally-scaled spectral clustering
Algorithm 8: PRNU-based link prediction

CHAPTER 1

INTRODUCTION

1.1 Biometrics

Biometrics refers to the science of recognizing individuals based on their physical or behavioral attributes [92]. Examples of these attributes include face, fingerprint, iris, voice, gait, hand geometry, keystroke dynamics and signature [90]. Such attributes are typically intrinsic to a particular individual; therefore, biometrics can be used for identification and verification purposes. Identification pertains to recognizing an individual from a set of several individuals. Verification, on the other hand, involves confirming the identity claimed by an individual. Biometrics has found its way into our daily lives, be it in the form of access control, such as TouchID or FaceID on iPhones, or speaker recognition on Alexa. The scope of applications of biometrics has extended beyond the original utility of authentication, and is slowly gaining momentum towards ancillary tasks, such as gender prediction from ocular images [34], demographic prediction from face images [82], etc. This demonstrates the capability of biometrics to exert a strong influence in different disciplines such as medicine, online marketing, social media, and much more.

A biometric system has two modes of operation: an enrollment stage and a verification stage [90]. During the enrollment stage, a gallery is constructed by acquiring biometric data from a large number of individuals. The biometric template computed from the biometric data serves as a personal identifier for each enrollee in the gallery. During the verification stage, a user presents their biometric sample for authentication. The probe, or query, sample can then be used for either identification or verification. The entire pipeline can be succinctly described using the following steps:

• Acquisition: The sensor is responsible for the acquisition of data from an individual. Images of faces, fingerprints and irides are acquired using special sensors developed for the respective modality.
Iris sensors typically operate in the near-infrared (NIR) spectrum, whereas face sensors operate in the visible (VIS) spectrum. The application scenario also dictates the type of sensor used. For example, covert operations may use thermal cameras operating in a night-time environment. Speaker recognition requires microphones to acquire the audio data. Occasionally, the sensor module may also apply some form of on-board processing, such as red-eye correction, or convert the data into a format suitable for efficient storage.

• Extraction: The feature extraction module processes the data acquired by the sensor and distills a compact representation that encapsulates the most relevant components of the original data. Minutiae are the feature descriptors for a fingerprint image, whereas a binary IrisCode is the distilled feature representation for an iris image.

• Comparison: The matcher is the final component; it accepts two features extracted from two sets of data and compares them to compute a match score. The match score can be in the form of a similarity score or a distance score. This score measures the degree of disparity between the two feature representations, which, in turn, can be used to infer whether the two representations belong to the same identity or not (a small illustrative sketch of such a score computation follows this list).
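To make the comparison step concrete, the following minimal sketch computes the fractional Hamming distance between two binary IrisCodes, counting disagreements only over bits that are unoccluded in both codes; this is the standard distance score used by IrisCode-style matchers. The 2048-bit code length and the 100-bit flip in the toy usage are hypothetical values chosen purely for illustration.

    import numpy as np

    def iris_match_score(code_a, code_b, mask_a, mask_b):
        """Fractional Hamming distance between two binary iris codes.

        Only bits that are valid (unoccluded) in both codes are compared.
        A score near 0 suggests the same iris; roughly 0.5 is expected
        for codes from unrelated irides.
        """
        valid = mask_a & mask_b
        disagreements = np.logical_xor(code_a, code_b) & valid
        return disagreements.sum() / valid.sum()

    # Toy usage with hypothetical 2048-bit codes: flip 100 bits of a copy.
    rng = np.random.default_rng(0)
    code_a = rng.integers(0, 2, 2048).astype(bool)
    code_b = code_a.copy()
    code_b[:100] ^= True
    mask = np.ones(2048, dtype=bool)
    print(iris_match_score(code_a, code_b, mask, mask))  # ~0.049

A threshold on this distance score then yields the accept/reject decision in the verification scenario, or a ranking of gallery identities in the identification scenario.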
This implies that the other two color channel values at a particular pixel have to be interpolated using the color values present in neighboring pixels. This estimation is referred to as demosaicking. The interpolation may inadvertently introduce correlations which are typically of a periodic nature, due to the fact that the CFA is arranged in a periodic fashion. Image tampering, such as the splicing of two images to produce a composite image, destroys these correlations or introduces inconsistencies in them. These inconsistencies, or the lack of correlations, can therefore be leveraged to identify doctored images. Another phenomenon intrinsic to the camera’s optical imaging system, known as chromatic aberration, can also be harnessed to expose image forgery. As the name suggests, this is an error arising due to the failure of the lens to focus light of all wavelengths at a single point, resulting in a splitting of the polychromatic light beam into its constituent wavelengths. Chromatic aberration leads to color imperfections that appear consistently across the image; tampering with a particular region of an image alters the magnitude and direction of the chromatic distortion, which can subsequently be used to indicate tampering. Finally, the recently proposed sensor pattern noise, arising due to anomalies in the fabrication process, has been successfully used for image forensic purposes. This noise component persists in each image acquired using a camera over time and serves as a unique ‘camera fingerprint’. Details about camera sensor pattern noise based image forensics will be discussed further in the following section.

• Pixel-Based Analysis: Insertion, deletion or cloning of objects in an image are usually accompanied by re-sampling operations to make the forged image look realistic. Otherwise, the image may look irregular or abnormal. This involves geometric registration and possibly modification of the pixel intensity values for the foreground to blend well into the background of the composite. Determining the geometry and color of the cloned region typically involves keypoint-based matching, where keypoints indicate salient regions in an image. The Scale-Invariant Feature Transform (SIFT) can be used to extract the keypoints, which are then matched using the Euclidean norm. The largest set of matching keypoints corresponds to a possible cloned region in that image.

• Statistical Modeling-Based Analysis: Projection of data onto a linear subspace can be used for discriminating between different classes of images, say between original images and computer rendered images. Images generated using computer graphics are typically rendered under ideal assumptions pertaining to the geometry and optical models, which is not the case for real photographic images acquired using cameras. The deviations in the underlying statistical distributions between the synthesized images and real images will result in disparate projections onto the subspace. These distinct projections can then be used to classify the synthetic images.

• Geometric Modeling-Based Analysis: The imaging model provides the perspective projection of a point in a 3-D world coordinate system to a 2-D homogeneous coordinate system. Image-splicing operations can introduce geometric distortions that can be estimated in terms of transformation matrix coefficients. The differences in the estimated parameters (matrix coefficients) can be used for detecting the presence of image tampering.
Other factors, such as geometric inconsistencies arising due to fake reflections or shadows in altered images, can be used as cues to indicate the presence of image tampering.

• Physics-Based Analysis: The direction of the light source can also be used as an additional cue in the detection of image tampering. The 2-D lighting model used in images typically assumes a Lambertian reflecting surface. Inconsistent lighting directions can suggest, but do not conclusively indicate, the presence of tampering. The properties that hold for a Lambertian surface can be used for analyzing the physical discrepancies in an image to indicate manipulations.

1.3 Image Forensics in the Context of Biometrics

The field of digital image forensics uses scientific principles to establish the origin and authenticity of digital images [66]. The proliferation of digital images in a number of applications, ranging from social media [40] to law enforcement [120, 138], has further accentuated the need to develop effective image forensic tools for a myriad of purposes. As described in Section 1.1, a biometric system uses a sensor module for acquiring the biometric data, usually in the form of an image. A biometric image, therefore, contains traces of both sensor-specific and biometric-specific information. As discussed in Section 1.2, camera-based analysis and pixel-based analysis are two image forensic schemes. The camera-based analysis of a biometric image can be used for biometric sensor identification - this comes under sensor-based forensics. The pixel-level analysis of a biometric image can be used for image modification detection - this comes under content-based analysis. We encompass both perspectives of sensor-based and content-based analyses to gain a better understanding of image forensics in the context of biometric images (see Figure 1.1) in our work.

Figure 1.1: The overarching objective in this thesis is the integration of content-specific and sensor-specific characteristics present in a biometric image to develop image forensic strategies in the context of biometrics.

Figure 1.2: Examples of Near-Infrared (NIR) iris images captured using (a) Aoptix Insight, (b) LG iCAM4000 and (c) IrisCam V2 sensors.

1.3.1 Sensor-based forensics

Although we will explore sensor forensics for different types of biometric modalities, we begin our research in the context of the iris modality. This is due to the fact that limited work has been done on ocular images [154]. Most iris recognition systems acquire a near-infrared (NIR) image of the iris region (Figure 1.2).1 Recent work based on digital image forensics has established the possibility of deducing sensor information from the iris image alone [18,19,52,71,154]. Determining sensor information (e.g., the brand name of the sensor) from the iris image has several benefits, especially in the context of digital image forensics.

1 It must be noted that a typical iris sensor captures the ocular region extending beyond the iris. The term iris image has been used interchangeably with the term ocular image captured by the sensor.

Validating Metadata: Iris datasets usually have some metadata associated with them, such as the date and time when the images were taken or modified, details of the camera used for image acquisition, the data collection protocol employed, etc. However, this metadata may be inadvertently or deliberately corrupted. Image forensics can be used to verify if the images in a dataset did indeed originate from the claimed source.
Sensor-specific Processing: Some sensors may perform on-board processing of the acquired iris image. Knowledge of the sensor may then be used to recover the pre-processing history of an iris image. In some cases, sensor information can be used to photometrically or geometrically adjust an iris image. This can also be used to facilitate sensor interoperability.

Device Validation: In biometric applications such as banking, it may be necessary to authenticate both the subject and the device itself. In such cases, deducing device information from the biometric image will be useful [71].

Forensic Applications: Linking multiple iris images to a particular sensor source may be required in forensic applications in order to validate the source of the iris images and to secure the chain of custody.

Tampered Images: Sensor identification schemes can be used to detect iris images that have been tampered with. For example, if the pixel artifacts observed in an image are not consistent with the sensor from which it was claimed to be obtained, there is a possibility that it has been modified after acquisition.

The literature on image forensics, which includes deducing sensor information from digital images, is rapidly developing. Early work focused on extracting information about dead pixels [74] and color filter array interpolation artifacts [27] of a sensor from its images. Recent work has shown that Sensor Pattern Noise (SPN) extracted from images can be used to recognize the acquisition device [112, 113]. The two primary sources of noise in the image acquisition stage are shot noise and pattern noise [41]. Shot noise, also known as photonic noise, is a random component. Pattern noise, on the other hand, is deterministic in nature and remains in every image that has been captured by a given sensor and can, therefore, be used to identify the sensor model. The two primary components of pattern noise are Fixed Pattern Noise (FPN) and Photo Response Non-Uniformity (PRNU) [41]. FPN is generated by dark currents. It is an additive component and can be suppressed by flat fielding, where a dark frame is subtracted from every image taken by the camera. On the other hand, PRNU is the more dominant multiplicative term, arising as a consequence of minor defects in the sensor manufacturing process. PRNU arises due to variation in the sensitivity of individual pixels to the same light intensity. Extensive work has been done in this regard to reliably estimate PRNU [41,106,108,109]. But these methods have been developed specifically for sensors operating in the visible (VIS) spectrum. Iris sensors, on the other hand, primarily operate in the NIR spectrum. This poses a new challenge to traditional image forensics. For example, NIR focal plane arrays have a larger pixel size compared to the CCD arrays employed by VIS cameras [118]. Furthermore, in some cases, the materials used for manufacturing NIR detectors can be different from those used in VIS cameras [158]. These factors can impact the PRNU estimation process. Therefore, it is necessary to determine if PRNU estimation schemes developed for VIS cameras can be applied to NIR sensors. There has been limited work on sensor identification in the context of NIR iris images by Uhl and Höller [154], Kalka et al. [98], and Debiasi and Uhl [53].

1.3.1.1 Sensor Identification Methods

The general framework for sensor identification from an input iris image is summarized in Figure 1.3.
The crux of the framework lies in estimating the sensor reference pattern from a set of training images emerging from the sensor. This reference pattern is then correlated with the noise residual pattern extracted from an input iris image in order to compute a correlation score. Given multiple reference patterns corresponding to different sensors, the test iris image is assigned to that sensor class whose reference pattern results in the highest correlation score.

If an imaging sensor is illuminated with a uniform light intensity I, the sensor registers the signal as I_out ≈ (1 + K) · I [41]. The multiplicative term K present in the output signal I_out is the PRNU corresponding to the sensor used to acquire the image. The term K is intrinsic to each sensor, so it is also referred to as the sensor reference pattern. Next, we describe methods used to estimate the sensor reference pattern and perform PRNU-based sensor identification.

Figure 1.3: General framework for sensor identification from an iris image.

Basic SPN: Lukas et al. [112] proposed extraction of the PRNU by denoising the original image using an 8-tap Daubechies Quadrature Mirror filter. The camera reference pattern K̂, which is a matrix of the same dimensions as the sensor, is constructed as the average of N noise residuals, W_i, i = 1, ..., N, corresponding to the N training images. The noise residual corresponding to the i-th training image I_i is calculated as W_i = I_i − F(I_i), and the reference pattern is computed as K̂ = (1/N) Σ_{i=1}^{N} W_i. Here, F represents the wavelet-based denoising function as used in [112].

MLE SPN: Chen et al. [41] used Maximum Likelihood Estimation (MLE) to obtain a more accurate estimate of the PRNU. The authors also employed a zero-mean operation and Wiener filtering to reduce the interpolation artifacts. The interpolation artifacts stem from the Bayer pattern and should, therefore, not impact NIR images because infrared sensors do not use color filter arrays (CFA). The MLE camera reference pattern is computed as K̂ = (Σ_{i=1}^{N} W_i I_i) / (Σ_{i=1}^{N} I_i²). The MLE noise residual for a given test image Y is computed as W_Y = Y K̂.

Enhanced SPN: Li [106] proposed an enhancement scheme to attenuate the scene details, which can contaminate the noise residual obtained using Basic SPN. The author proposed five enhancement models to subdue the scene influences by modulating the magnitude of the noise components in the wavelet domain (Figure 1.4). Only the test noise residuals are enhanced, because a single noise residual is more likely to be influenced by the scene compared to the reference pattern, which is constructed by averaging multiple images. Since iris images exhibit a rich texture, the enhanced SPN scheme can suppress iris-specific structures from the noise residuals.

Figure 1.4: Photo Response Non-Uniformity (PRNU) enhancement models used for the suppression of scene details in the images: (a) Model I, (b) Model II, (c) Model III, (d) Model IV and (e) Model V. Here, the x-axis represents the pre-enhanced noise residual values in the wavelet domain and the y-axis represents the post-enhanced noise residual values.

Phase SPN: The SPN can be modeled as white Gaussian noise, and so Kang et al. [99] hypothesized that whitening in the frequency domain can further attenuate the scene details. The noise residual W is first whitened in the frequency domain, followed by extraction of the phase component using the discrete Fourier transform (DFT), i.e., W_F = DFT(W) and W_phase = W_F / |W_F|.
Here, |W_F| denotes the Fourier magnitude of W_F. The spatial component is recovered using the inverse DFT (IDFT). The camera reference pattern is finally constructed by averaging the real part of the recovered spatial noise residuals as K̂ = Re( (1/N) Σ_{i=1}^{N} IDFT(W_{phase,i}) ).

So far, we have discussed different sensor identification schemes. How can we gauge the robustness of these methods? Can we deliberately confound the sensor identification methods? We explore these questions next.

1.3.1.2 Sensor De-identification Methods

The counter-forensics literature describes techniques that can be used to suppress or perturb the PRNU pattern embedded in an image. This is often referred to as source anonymization [64], i.e., obscuring the ‘fingerprint’ of the source sensor in an image so as to anonymize the origin of the image. Source anonymization can be used as a privacy preservation scheme that is particularly relevant when sensor-specific details can be used to associate a sensor with its owner. Assuming that each device is typically associated with a single user, device identification can be indirectly used to reveal the identity of the person possessing that specific device [125]. There have been primarily two approaches to perturbing the PRNU pattern for this purpose: (i) compression and filtering based schemes, which typically use strong filtering operations, such as flat-field subtraction [157] or Wiener filtering [28], that can degrade the PRNU pattern, leading to incorrect source attribution; and (ii) geometric perturbation based schemes, such as ‘seam carving’ [28, 63], that distort the alignment between the sensor reference pattern and the test noise residual, thereby impeding the process of correlating the reference pattern with the test noise residual.

In contrast to source anonymization, PRNU spoofing not only suppresses the fingerprint of the source sensor, but also inserts the fingerprint of a target sensor. An adversary may tamper with the digital evidence to maliciously exculpate a guilty person or, worse, incriminate an innocent person. In the recent literature, PRNU spoofing has been performed by two methods: (i) PRNU injection and (ii) PRNU substitution. The first method adds the weighted reference pattern of a pre-selected target sensor to the input image I [78]. The modified image becomes I′ = [I + α K̂_T]. Here, K̂_T is the reference pattern of the target sensor T and α is a scalar parameter. The second method subtracts the PRNU pattern of the source sensor from an image and then adds the PRNU pattern of a target sensor [107]. The modified image is represented as I′ = I − β K̂_S + α K̂_T. Here, I belongs to the source sensor S, whose reference pattern is K̂_S, while α and β are scalar terms. In [154], the authors examine the viability of PRNU spoofing via injection in the context of iris sensors operating in the NIR spectrum [77]. In their work, they computed the forged image as I′ = [F(I) + α K̂_T], where F(·) is the wavelet-based denoising filter and α is a scalar parameter. The authors further performed the triangle test to detect the spoof attack, but did not analyze the impact of the PRNU spoofing on iris recognition performance. In the current literature, adversarial networks have been used for perturbing images with great success [122]. However, a significant bottleneck of deep learning-based techniques is the need for a large amount of training data to drive the perturbation process.
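To make the PRNU-injection style of spoofing concrete, the following is a minimal sketch in Python. It assumes floating-point images in [0, 1], uses a Gaussian filter as a simple stand-in for the wavelet-based denoiser F(·), and picks an arbitrary weight α; the function names and the synthetic data are illustrative assumptions, not the implementation used in this thesis.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(img, sigma=1.0):
    """Noise residual W = I - F(I); a Gaussian filter stands in for the
    wavelet-based denoiser of the Basic SPN scheme."""
    return img - gaussian_filter(img, sigma)

def reference_pattern(images, sigma=1.0):
    """Average the residuals of training images to estimate a sensor
    reference pattern (Basic SPN style)."""
    return np.mean([noise_residual(im, sigma) for im in images], axis=0)

def inject_prnu(img, k_target, alpha=0.1):
    """PRNU injection: I' = I + alpha * K_target, clipped to the valid range."""
    return np.clip(img + alpha * k_target, 0.0, 1.0)

def ncc(a, b):
    """Normalized cross-correlation used to attribute a residual to a sensor."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Illustrative usage with synthetic data (real experiments use NIR iris images).
rng = np.random.default_rng(0)
train_src = [rng.random((64, 64)) for _ in range(10)]   # source-sensor images
train_tgt = [rng.random((64, 64)) for _ in range(10)]   # target-sensor images
k_src, k_tgt = reference_pattern(train_src), reference_pattern(train_tgt)

probe = train_src[0]
spoofed = inject_prnu(probe, k_tgt)
print(ncc(noise_residual(spoofed), k_src), ncc(noise_residual(spoofed), k_tgt))
```

In such a setup, the attack is considered successful when the residual of the spoofed image correlates more strongly with the target reference pattern than with the source pattern.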
1.3.1.3 Joint Biometric-sensor Representation Methods

Biometric recognition systems comprise a feature extraction module that elicits a salient feature representation from the acquired biometric data, and a comparator module that compares two sets of feature representations to compute a match score [91]. On the other hand, sensor recognition systems extract sensor pattern noise [112] from a set of training images obtained from different sensors to generate sensor reference patterns. To deduce the sensor identity of an unknown test image, its sensor pattern noise is first extracted and then correlated with the reference patterns. The test image is assigned to the device whose reference pattern yields the highest correlation value.

In [72], the authors used partial face images acquired using smartphones and employed a weighted-sum fusion rule at the score level to combine sensor and biometric recognition. Later, they extended their work to include feature-level fusion in [71] and concluded that score-level fusion performed comparatively better. In [73], the authors performed HOG-based face recognition and combined it with Photo Response Non-Uniformity based sensor recognition at the score level. In [15], the authors combined fingerprint recognition with device recognition by performing feature-level fusion of minutiae-cylinder-codes with SRAM start-up values. Fusion at the score or feature level is often dependent on the specific biometric modality and the device sensor used. A specific fusion rule producing the best results on a particular biometric and sensor modality (e.g., iris and near-infrared sensors) may not yield optimal results on a different modality (e.g., face and RGB sensors), and therefore needs to be tuned separately for each pair of biometric and sensor modalities. Furthermore, feature-level fusion retains the individual biometric and sensor-specific components, which can be recovered from the fused representation using appropriate measures. Recovering the biometric template in this manner may compromise the privacy of the biometric data. In contrast, the proposed joint representation non-trivially unifies the biometric and sensor-specific features. As a result, typical countermeasures will be ineffective in disentangling the biometric component from the joint representation. This will implicitly preserve the privacy of the biometric component.

To summarize, we described methods to best estimate the sensor-specific traces present in an image and use them to perform sensor identification. We also discussed PRNU suppression, which has applications in the context of privacy preservation. Furthermore, we described the importance of designing an approach to create a joint biometric-sensor representation that can be used to perform biometric and device recognition simultaneously. This has applications in the context of smartphone-based authentication. Next, we focus on the content-specific analysis of biometric images that can discriminate between original and modified biometric images.

1.3.2 Content-based forensics

In many applications, the face image of an individual may be subjected to photometric transformations such as brightness adjustment, histogram equalization, gamma correction, etc. These photometric transformations may be applied in a sequential fashion, resulting in an array of near-duplicate face images (see Figure 1.5). While some of these transformations can be used to improve face recognition [151], others may be maliciously used for image ‘tampering’ [13].
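To make the notion of a sequentially generated near-duplicate chain concrete, the following is a minimal sketch. The particular transformations, parameter values and the synthetic "image" are illustrative assumptions; they are not the operations used to construct the datasets in this thesis.

```python
import numpy as np

def adjust_brightness(img, delta):          # additive brightness shift
    return np.clip(img + delta, 0.0, 1.0)

def gamma_correct(img, gamma):              # power-law (gamma) adjustment
    return np.clip(img, 0.0, 1.0) ** gamma

def hist_equalize(img, bins=256):           # global histogram equalization
    hist, edges = np.histogram(img.ravel(), bins=bins, range=(0.0, 1.0))
    cdf = hist.cumsum() / hist.sum()
    return np.interp(img.ravel(), edges[:-1], cdf).reshape(img.shape)

# Apply transformations sequentially: each output becomes the parent of the
# next node, which is exactly the parent/child structure an IPT must recover.
rng = np.random.default_rng(1)
root = rng.random((128, 128))               # stand-in for a face image
chain = [root]
for transform in (lambda x: adjust_brightness(x, 0.05),
                  lambda x: gamma_correct(x, 1.2),
                  hist_equalize):
    chain.append(transform(chain[-1]))
print(len(chain), "near-duplicate nodes generated")
```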
The availability of inexpensive photo editing tools has resulted in the posting of a large number of near-duplicates on the internet. Identification of the original image from a set of such near-duplicates is important in the context of digital image forensics [65]. Furthermore, inferring the order of evolution of a set of near-duplicate images is a challenging but interesting problem [139]. The order of evolution can be represented as an Image Phylogeny Tree (IPT), which indicates the relationship between the root node (original image) and the child nodes (transformed images) via directed links, as illustrated in Figure 1.6. The task of image phylogeny is highly challenging because both the scope of transformations and the widespread distribution of edited content on online platforms keep evolving. Deducing the IPT from the set of near-duplicates in an automated fashion has the following advantages:

Figure 1.5: Examples of variations of the same image uploaded on multiple websites with subtle modifications making them appear almost identical.

1. Indication of image tampering: An image can be tampered with for a number of reasons. It can be used to airbrush celebrity faces on magazine covers,2 or to depict fake situations to garner political attention. In either case, the tampered images convey false information. An IPT has directed links and, therefore, can be used to trace an image back to its origin, i.e., the root node that denotes the original image.

2. Preserving chain of custody: Face images can be produced as culpable biometric evidence in legal proceedings [149]. The admissibility of such evidence is contingent on its integrity, i.e., the evidence should not have been tampered with. The chain of custody [45] of the digital evidence can be established via (i) hardware identification (the device used to acquire the biometric sample) [24], and (ii) analysis at the image level. IPT construction involves content-based analysis and can be leveraged to determine the authenticity of a pair of biometric samples, i.e., the original versus the tampered one.

2 https://www.cbsnews.com/news/uk-curb-airbrushed-images-keep-bodies-real/

Figure 1.6: Examples of photometrically related near-duplicate face images. (a) A set of 20 related images and (b) their corresponding Image Phylogeny Tree (IPT). In our work, (a) is the input and (b) is the output.

We have primarily motivated image phylogeny from the perspective of chain-of-custody preservation of biometric images presented as culpable evidence [45, 46]. However, our method is not restricted to biometric images only, and can be suitably applied to generic images. Therefore, we can use our method to handle near-duplicates that appear on defaced websites. Website defacement typically involves a hacker substituting the original content of a website with their own images and/or text. The perpetrators often tend to re-use these images, with some alterations, in the case of mass defacement attacks (see Figure 1.7). Some images bear a logo or an emblem that indicates the affiliation of the hacker to a specific attack group. In such cases, image phylogeny can analyze the evolution of the reused near-duplicates to potentially link them back to the hacker. This will help in assessing cyberattack patterns and boost cybersecurity.

Figure 1.7: Near-duplicates appearing on defaced websites. Images curated by Zone-H, not meant for public distribution.
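Since the directed links of an IPT are what allow an image to be traced back to its origin, a short sketch of how the root is read off a reconstructed IPT may be useful. The adjacency matrix below is a hypothetical four-node example, not one of the IPTs evaluated in this thesis.

```python
import numpy as np

def find_roots(adjacency):
    """Given the directed adjacency matrix of an IPT (adjacency[i, j] = 1 when
    node i is a parent of node j), a root is a node with no incoming edge."""
    incoming = adjacency.sum(axis=0)          # column sums = number of parents
    return np.where(incoming == 0)[0]

ipt = np.array([[0, 1, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 0],
                [0, 0, 0, 0]])                # node 0 -> {1, 2}, node 1 -> 3
print(find_roots(ipt))                        # [0]: node 0 is the original image
```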
1.3.2.1 Image Phylogeny (Tree and Forest) Construction Methods

The task of near-duplicate detection and retrieval (NDDR) is well studied in the literature [68,101], and is closely related to the task of image phylogeny. Near-duplicates are defined as semantically similar images that may have slight modifications or may contain portions from multiple source images. In our case, we consider near-duplicates to be images belonging to the same individual (this also includes images with slightly different pose and expression in the case of face images) that may have undergone some photometric or geometric transformation, making the images ‘appear’ almost, but not exactly, identical. We have also quantified the notion of near-duplicity using the Structural Similarity Index Measure (SSIM), which computes the similarity between an image pair in terms of luminance, contrast and structural details [163]. The near-duplicates used in our work are reported to have values greater than 75% in terms of SSIM. In cases where the query image is a composite of multiple donor images, provenance analysis [124] has been used: a provenance filtering step first identifies the relevant donor images for a single composite image, and a provenance graph construction step then determines the order of modifications. In such cases, undirected phylogeny trees [31] will suffice, since deducing the links is more critical than determining the direction of the links. On the other hand, image phylogeny trees (IPTs) deal with the task of determining directed edges, i.e., explicitly determining the parent and child nodes such that the edge is directed from the parent node (original image) toward the child node (transformed image). Typically, an IPT is considered a minimum spanning tree (MST); we instead interpret the IPT as a directed acyclic graph (DAG) by including ancestral edges, which indicate higher-level predecessors in addition to immediate parents. Conventionally, an IPT considers a single root node [30,57,58,61,121,133]. Other works have considered the presence of multiple root nodes, resulting in Image Phylogeny Forests [62,128,129]. A majority of the literature focuses on simple geometric transformations (cropping, scaling, rotation), pixel intensity-based transformations (brightness and contrast adjustment, gamma transformation) and compression operations.

Conventionally, IPT construction involves two steps [59,61]: (i) computing an asymmetric (dis)similarity measure between every pair of images in the set; and (ii) using a tree-spanning algorithm to infer the existence of links between image pairs and identify the parent (original) and child (transformed) nodes based on the asymmetric measure. The first step computes an asymmetric measure that models the relationship between each pair of near-duplicate images in the set; it usually involves geometric registration, followed by color channel normalization and compression matching [61]. Other works focus on using a wavelet-based denoising technique [119], or a combination of gradient estimation and mutual information [47], to derive an improved asymmetric measure. Typically, a majority of these methods [61, 119, 124] perform pairwise modeling to compute the asymmetric measure. The objective is to minimize some distance or error function between the pair of images.
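As a concrete illustration of the pairwise step, the sketch below fits a simple global photometric model in each direction and uses the residual fitting error as the asymmetric measure. The affine brightness/contrast model and the least-squares fit are simplifying assumptions; the methods cited above use considerably richer photometric and geometric models.

```python
import numpy as np

def asymmetric_dissimilarity(src, dst):
    """Fit a global photometric model dst ~ a*src + b by least squares and
    return the residual error. Fitting src->dst and dst->src generally yields
    different errors, which is what makes the measure asymmetric."""
    x, y = src.ravel(), dst.ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((a * x + b - y) ** 2))

def dissimilarity_matrix(images):
    """Pairwise asymmetric measures; entry (i, j) models image i -> image j."""
    n = len(images)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                D[i, j] = asymmetric_dissimilarity(images[i], images[j])
    return D

# A tree-spanning step (e.g., a minimum arborescence over D) would then infer
# the directed parent->child links of the IPT from this matrix.
```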
However, while performing the pairwise modeling, these methods do not consider the global relationships that the pair of images under consideration may share with the remaining images in the set. This is particularly important for accurate IPT construction in the second step, which utilizes the asymmetric measure to span the IPT. Recently, research has steered in the direction of exploring the ‘global’ relationships by utilizing all the images in the set simultaneously instead of conducting a pairwise analysis. In [38], the authors use a denoising autoencoder to improve the dissimilarity matrix and obtain an accurate IPT. The authors in [32] used a deep neural network to rank the order in which the near-duplicates have been generated by learning transformation-specific embeddings.

Typically, an IPT consists of edges or links which are directed from the original image, or parent node, towards the transformed image, or child node. However, consider a situation where multiple images of the same individual in the same scene are available. Such situations are relatively common. For example, images may have been acquired using different cameras, or each image may capture a different facial expression. In this case, transforming each image repeatedly, but independently of the others, will result in multiple IPTs. Each original image will then serve as an individual root, and each root will span a distinct IPT. A collection of such IPTs constitutes an Image Phylogeny Forest (IPF). In the context of image phylogeny forests, there are basically two types of approaches: (i) consider IPF construction as an extension of the IPT construction process, where each node is initially considered an individual IPT and the IPTs are successively merged until a terminating criterion is met, the final output being an IPF with multiple IPTs [50,51,62]; and (ii) consider IPF construction as a two-step process, where the first step clusters the images, for example using spectral clustering (each cluster represents an IPT), and the second step constructs the IPT corresponding to each cluster [128]. We focus on the second type of IPF construction in this work.

We would like to point out two observations from the overview of the related literature: (i) In the context of IPT construction, existing work tackles the problem by performing either a pairwise node analysis or a global analysis involving all the nodes. Alternatively, a graph-based approach can explore both pairwise relationships (first-order proximity) and relationships with respect to neighboring nodes (higher-order proximity). (ii) Limited work has been done in the context of IPF construction, and the state-of-the-art method uses spectral clustering, which fails on multi-scale examples [29]. In real-world applications, we have no prior knowledge about the number of IPTs or their scales (number of nodes).

In the following section, we list the thesis contributions from the twin perspectives of sensor-based and content-based forensic analysis in the context of biometric images.

1.4 Thesis Contributions

In this thesis, we propose a comprehensive analysis of image forensic schemes in the context of biometric images. To accomplish this objective, we approach it from two different perspectives: firstly, from the context of sensor-based forensics, and secondly, from the context of content-based analysis.
Furthermore, we propose work that coalesces our sensor-forensic and content-forensic analyses using a graph-based approach to determine the sequence of evolution of digitally modified biometric images.

1. To understand the sensor-based forensic aspects of biometric images, we studied the feasibility of existing sensor identification schemes in the context of near-infrared ocular images. We focused on the Photo Response Non-Uniformity based sensor identification scheme and observed that it can be used to reliably identify iris sensors, with the exception of images that are highly saturated or over-exposed. We demonstrated the impact of photometric transformations (illumination normalization schemes), and observed that some transformations, such as Difference-of-Gaussians (DoG) filtering, can degrade the reliability of PRNU-based sensor identification.

2. To further test the robustness of the PRNU-based sensor identification method, we developed two sensor de-identification schemes. They are designed to suppress the sensor-specific traces present in a biometric image while retaining its matching utility. The first method uses an iterative image perturbation routine to perform sensor spoofing. Sensor spoofing involves deliberately confounding the sensor classifier to assign an image to a specific target sensor, different from the original sensor used to acquire it. The second method uses the discrete cosine transform to suppress sensor-specific traces and can be used to confound three PRNU-based sensor classifiers to perform both sensor spoofing and sensor anonymization, where the image is assigned to any random sensor, not necessarily a specific target sensor. Both these methods successfully perform sensor de-identification on images acquired using near-infrared iris and RGB smartphone sensors.

3. To explore the prospect of developing a joint biometric-sensor representation, we developed a method that combines both biometric-specific and sensor-specific traces from a single biometric image. The joint representation can be utilized to simultaneously perform biometric and sensor recognition, as required in smartphone authentication using biometric signatures. We used a deep learning embedding network to capture both biometric and sensor details present in a single biometric image in a one-shot fashion. The joint representation outperformed commercial biometric matchers and PRNU-based sensor recognition in the task of joint identification across three biometric modalities involving multi-spectral sensors. The joint representation was also able to achieve superior performance in the task of joint verification.

4. To address the issue of image phylogeny for near-duplicate biometric images, we examined content-based forensic approaches. To accomplish this objective, we developed two methods. The first method involved a deterministic approach of modeling the photometric transformations between a pair of near-duplicates to discriminate between the forward and reverse directions of a transformation. By repeating this process for all pairs of near-duplicates in the set, we were able to construct the image phylogeny tree for ocular images subjected to a small set of transformations. The second method involved a probabilistic framework that integrates basis functions with a likelihood ratio computed on the parameters estimated from modeling photometric and geometric transformations. We evaluated the method on arbitrary transformations and achieved reliable performance.

5.
To alleviate the pairwise modeling required in traditional image phylogeny construction techniques, we used a deep learning-based approach that combines a graph convolutional network with sensor pattern noise for constructing the image phylogeny tree. The graph-based approach provided global analysis by incorporating first-order proximity and second-order proximity (neighborhood information). The sensor pattern noise (PRNU) provided local analysis. By leveraging both degrees of analysis, we constructed image phylogeny trees with higher accuracies compared to state-of-the-art baselines, demonstrating promising results on face images as well as images containing natural scenes. We further extended our approach by constructing image phylogeny forests that consist of multiple image phylogeny trees. We proposed a locally-scaled spectral clustering scheme to identify the number of IPTs and then used the graph convolutional network along with PRNU to construct the forest.

1.5 Thesis Organization

The remaining document is organized as follows:

Chapter 2 introduces sensor forensics in the context of biometrics. We begin with the study of existing sensor identification schemes in the context of ocular images. We further study the impact of photometric transformations on the performance of the sensor identification schemes. This chapter covers the first contribution.

Chapter 3 entails sensor de-identification schemes in the context of privacy preservation. We explore different biometric modalities such as iris and periocular images and analyze how the sensor de-identification schemes impact the biometric recognition and sensor recognition performances. This chapter covers the second contribution.

Chapter 4 describes the joint biometric-sensor representation developed for smartphone sensors. We use a deep learning-based approach to simultaneously learn biometric and sensor-specific traces in a one-shot approach from a single biometric image. This chapter covers the third contribution.

Chapter 5 introduces the content-specific analysis for the task of image phylogeny for digitally modified biometric images. We evaluate the method on iris images. This chapter covers the first (deterministic) method listed in the fourth contribution.

Chapter 6 delves deeper into the task of image phylogeny by combining basis functions with a likelihood ratio-based method for constructing image phylogeny trees. We evaluate the proposed method on a suite of image editing operations, such as Photoshop and deep learning-based manipulations, to test the efficacy of our approach. This chapter covers the second (probabilistic) method listed in the fourth contribution.

Chapter 7 describes the graph-based approach for constructing image phylogeny trees by unifying a graph convolutional network with sensor-specific traces. This work combines both the sensor and content details present in images for the task of image phylogeny. We further extend it to image phylogeny forests with promising results. This chapter covers the fifth contribution.

Chapter 8 concludes the thesis and provides some steps towards future work.

CHAPTER 2

SENSOR IDENTIFICATION

Portions of this chapter appeared in the following publications:

S. Banerjee and A. Ross, “From Image to Sensor: Comparative Evaluation of Multiple PRNU Estimation Schemes for Identifying Sensors from NIR Iris Images," 5th International Workshop on Biometrics and Forensics, (Coventry, UK), April 2017.

S.
Banerjee and A. Ross, “Impact of Photometric Transformations on PRNU Estimation Schemes: A Case Study Using Near Infrared Ocular Images," 6th IAPR/IEEE International Workshop on Biometrics and Forensics, (Sassari, Italy), June 2018.

2.1 Introduction

In this chapter we present two studies. The first study explores sensor identification schemes in the context of biometric sensors, particularly iris sensors. The second study analyzes the impact of photometric transformations on iris sensor identification schemes.

2.2 Study of existing sensor identification schemes on near-infrared ocular images

We discussed in the previous chapter that there are a number of sensor identification schemes designed for conventional RGB sensors. But we need to analyze whether such schemes can extend to biometric sensors, such as near-infrared iris sensors. This work differs from the existing literature [98, 154] in the following ways: (a) a larger number of sensors is considered (12 sensors); (b) multiple PRNU estimation methods are compared (4 methods); (c) the effect of a photometric transformation is investigated; and (d) dataset-specific artifacts are discovered.

Table 2.1: Dataset and sensor specifications.

Name of Dataset                 | Name of Sensor            | Abbreviation | Image Resolution
BioCOP2009 Set I                | Aoptix Insight            | Aop          | 640x480
CASIAv3 Interval                | Proprietary - not known   | Int          | 320x280
CASIAv3 Lamp                    | OKI IrisPass-h            | OKI          | 640x480
CASIAv4 Thousand                | IrisKing IKEMB100         | IK           | 640x480
CASIAv2 Device2                 | IrisCam V2                | IC           | 640x480
IITD                            | JIRIS JPC1000             | JPC          | 320x240
BioCOP2009 Set II               | LG iCAM 4000              | LG i40       | 640x480
BioCOP2009 Set III              | Crossmatch I SCAN2        | ISCAN        | 480x480
ND_Cosmetic_Contact_Lens_2013   | IrisGuard AD100           | AD           | 640x480
ND CrossSensor Iris 2013 Set I  | LG2200                    | LG22         | 640x480
ND CrossSensor Iris 2013 Set II | LG4000                    | LG40         | 640x480
WVU Off-Axis                    | EverFocus Monochrome CCD  | Mon          | 640x480

2.2.1 Experiments and results for the first study

Below we provide a description of the datasets and experimental protocol used in this work. Then we summarize the results obtained.

2.2.2 Datasets used in the first study

In this work, we use 12 iris datasets that contain images corresponding to 12 different sensors. The details pertaining to the sensors and the datasets are summarized in Table 2.1. The number of images used for reference pattern generation (i.e., the training set) is maintained at 55 per sensor, while for testing, the number of images varied from 100 to 1940. Subjects in the training and test sets were mutually exclusive. Only 55 images were used for reference pattern generation because of the limited number of subjects available in some datasets. For example, the CASIAv2 Device2 dataset contains only 60 subjects; therefore, one iris image from each of 55 subjects was assigned to the training set, and images from the remaining 5 subjects were assigned to the test set.

2.2.3 Experimental methodology used in the first study

In the first set of experiments, the four PRNU estimation methods were applied to all 12 datasets (details of the implementation are mentioned later in the section). We observed that the performance of all the PRNU estimation methods was poor on BioCOP2009 Set III (Crossmatch) compared to the other datasets. We investigated the histogram of pixel intensities of images in the individual datasets to discern whether sensor recognition was being unduly impacted by inherent image characteristics.
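A minimal sketch of such a histogram inspection is given below; it assumes the images of a dataset are available as 8-bit numpy arrays and is meant only to illustrate the kind of per-dataset statistic being examined, not the exact analysis script used here.

```python
import numpy as np

def average_histogram(images, bins=256):
    """Average normalized pixel-intensity histogram over a dataset of 8-bit
    images; a compressed or bimodal shape hints at on-board contrast processing."""
    hists = [np.histogram(im.ravel(), bins=bins, range=(0, 256))[0] for im in images]
    hists = np.stack(hists).astype(float)
    hists /= hists.sum(axis=1, keepdims=True)   # normalize each histogram
    return hists.mean(axis=0)
```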
We observed that the histogram of Crossmatch images exhibited a bimodal distribution with a compressed range of pixel values, which is typically associated with contrast adjustment (see Figure 2.1). This prompted us to use the box-cox transformation, which is a family of power transformations, to force the images to conform to an approximate normal distribution. The equation governing the box-cox transformation is as follows:

T_λ(x) = (x^λ − 1) / λ, if λ ≠ 0;
T_λ(x) = log(x), if λ = 0.

The value of λ is estimated by maximizing the log-likelihood function, such that the distribution of the transformed output closely approximates the normal distribution. In our experiments, λ was predominantly found to be in the interval [-1, 5]. This transformation also aids in studying the effect of photometric processing on PRNU estimation. The individual accuracies of the sensors, as well as the overall accuracies obtained with and without the box-cox transformation, are presented in Table 2.2 (for the sake of brevity we have reported the results for Basic SPN alone). The third column in Table 2.2 reports the results obtained after applying the box-cox transformation to the images in the datasets. The last column presents the results after applying the box-cox transformation only to images captured using the Crossmatch sensor. The latter was done for investigative purposes only, and not for evaluation purposes.

Table 2.2: Sensor identification accuracies before and after applying the box-cox transformation.

Name of Sensor (Abbreviation) | Before box-cox (Basic SPN) | After box-cox (Basic SPN) | After box-cox on Crossmatch images only (Basic SPN)
Aop              | 100%   | 91.06% | 100%
Int              | 75.29% | 68.24% | 74.71%
OKI              | 100%   | 91.22% | 100%
IK               | 99.38% | 99.64% | 99.38%
IC               | 100%   | 100%   | 100%
JPC              | 100%   | 99.11% | 100%
LG i40           | 60.83% | 80.08% | 60.92%
ISCAN            | 23.16% | 49.80% | 53.07%
AD               | 76.17% | 94.13% | 76.41%
LG22             | 97.59% | 70.19% | 97.59%
LG40             | 88.64% | 70.08% | 88.26%
Mon              | 97.78% | 82.50% | 97.78%
Overall Accuracy | 87.26% | 88.19% | 88.91%

Figure 2.1: Average pixel-intensity histograms of four sensors: (a) Aop, (b) LG i40, (c) ISCAN and (d) LG40. The pixel intensities vary across the different sensors, indicating diverse image characteristics.

The second set of experiments omitted two datasets - the BioCOP2009 Set III and ND CrossSensor Iris 2013 Set II (the reason for removing the two datasets will be discussed later). In our implementation of Enhanced SPN, the Basic SPN is first extracted and normalized using the L2-norm as described in [52], followed by the application of the enhancement model to the wavelet coefficients. Enhancement Model III (see Figure 1.4(c)) was used in our work as it was observed to be more robust to variations in α [106]. The enhancement equations are as follows:

n_e(i, j) = 1 − e^(−n(i, j)), if 0 ≤ n(i, j) ≤ α;
n_e(i, j) = (1 − e^(−α)) e^(α − n(i, j)), if n(i, j) > α;
n_e(i, j) = −1 + e^(n(i, j)), if −α ≤ n(i, j) < 0;
n_e(i, j) = (−1 + e^(−α)) e^(n(i, j) + α), if n(i, j) < −α.

Here, n(i, j) and n_e(i, j) indicate the original and enhanced noise residual values, respectively (i and j are the indices of the noise residual in the wavelet domain). α = 6 in our work. While we did not use L2 normalization in the first set of experiments, we employed the same enhancement model and the same α value in both sets of experiments. The reference pattern is identical to the one used for the Basic SPN scheme. For Phase SPN, we did not perform whitening in the frequency domain. This is because all the images used in our experiments are iris images, and spurious frequency responses arising due to variations in scene details are not a critical issue.
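The piecewise enhancement model given above translates directly into numpy. The sketch below is a minimal transcription (with α = 6, as stated) applied to a wavelet-domain noise residual; it is not the exact implementation used in our experiments.

```python
import numpy as np

def enhance_model_iii(n, alpha=6.0):
    """Piecewise enhancement of a wavelet-domain noise residual n, following the
    equations above; large-magnitude coefficients, which are more likely to
    carry scene detail, are attenuated."""
    n = np.asarray(n, dtype=float)
    out = np.empty_like(n)
    pos_small = (n >= 0) & (n <= alpha)
    pos_large = n > alpha
    neg_small = (n < 0) & (n >= -alpha)
    neg_large = n < -alpha
    out[pos_small] = 1.0 - np.exp(-n[pos_small])
    out[pos_large] = (1.0 - np.exp(-alpha)) * np.exp(alpha - n[pos_large])
    out[neg_small] = -1.0 + np.exp(n[neg_small])
    out[neg_large] = (-1.0 + np.exp(-alpha)) * np.exp(n[neg_large] + alpha)
    return out
```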
In both experiments, the SPN was extracted without subjecting the images (training and test) to any geometric transformation (cropping, resizing, etc.). However, we resized the test noise residuals to the size of the reference pattern for normalized cross-correlation (NCC) computation.

2.2.4 Results and discussion from the first study

Analysis of results on the first set of experiments: The results reported in Table 2.2 show two notable characteristics. Firstly, the accuracy on BioCOP2009 Set III (Crossmatch I SCAN2) improves by 26% after applying the box-cox transformation, while the accuracy decreases on most of the other datasets. Secondly, BioCOP2009 Set II (LG iCAM4000) witnessed an improvement in performance, while the ND CrossSensor 2013 Sets I and II (LG 2200 and LG 4000) registered a decrease in performance. Therefore, the box-cox transformation improves the performance on the LG iCAM 4000 sensor, but at the expense of the other two sensors (LG 2200 and LG 4000). Further investigation indicates that most of the confusion occurred between images from the LG iCAM 4000 and LG 4000 cameras. The failure of Basic SPN, which is known to be effective in distinguishing between sensors from the same vendor and model, cannot be easily explained in this case. We believe that even though PRNU-derived methods try to approximate the SPN via wavelet coefficients, the image histogram also influences the sensor noise in some fashion. The improvement in accuracy of the Crossmatch sensor after the box-cox transformation leads us to believe that the images in BioCOP2009 Set III have perhaps undergone some form of illumination normalization. Moreover, the histograms of images from the LG iCAM 4000 and LG 4000 reveal that the latter has more saturated pixels, making it prone to failure. Due to an incomplete understanding of the extent to which the SPN is impacted by NIR image characteristics, and the unavailability of pre-processing information, we decided to conduct our second set of experiments without the two ambiguous datasets (BioCOP2009 Set III and ND CrossSensor Iris 2013 Set II).

Analysis of results on the second set of experiments: The reference patterns generated using the different PRNU estimation schemes are illustrated in Figure 2.2. The effect of enhancing the noise residual is visualized in Figure 2.3. The structure of the eye is evident in the noise residual extracted using Basic SPN; it is subdued, but not completely removed, by Enhanced SPN.

Figure 2.2: Reference patterns for the CASIAv3 Interval dataset estimated using (a) Basic SPN, (b) MLE SPN and (c) Phase SPN. Visual inspection reveals noise-like patterns extracted from the training images that are devoid of image content.

Figure 2.3: Noise residual from an image captured using the Aoptix Insight sensor: (a) before enhancement and (b) after enhancement. The application of the enhancement model subdues the scene content in the image significantly.

The rank-1 confusion matrices obtained for each of the PRNU estimation schemes are reported in Tables 2.3 and 2.4. In Table 2.3, the values to the left of the slash denote the results obtained using Basic SPN and the values to the right indicate the results obtained using MLE SPN; in Table 2.4, the values before the slash correspond to Enhanced SPN and the values after the slash correspond to Phase SPN.

Table 2.3: Rank-1 Confusion Matrix for the Basic SPN / MLE SPN based PRNU extraction schemes.
[10 × 10 confusion matrix of actual versus predicted sensor classes (Aop, Int, OKI, IK, IC, JPC, LG i40, AD, LG22, Mon); each cell reports the number of test images as a Basic SPN / MLE SPN pair.]

Table 2.4: Rank-1 Confusion Matrix for the Enhanced SPN / Phase SPN based PRNU extraction schemes.

[10 × 10 confusion matrix of actual versus predicted sensor classes (Aop, Int, OKI, IK, IC, JPC, LG i40, AD, LG22, Mon); each cell reports the number of test images as an Enhanced SPN / Phase SPN pair.]

Figure 2.4 illustrates the overall performance of each of the PRNU extraction schemes using Cumulative Match Characteristic (CMC) and Receiver Operating Characteristic (ROC) curves.

Figure 2.4: Comparison of the overall accuracy of the different PRNU extraction schemes using (a) CMC and (b) ROC curves.

The following observations can be made:

• Enhanced SPN results in the best performance with a rank-1 accuracy of 97.17%.

• MLE SPN performs the worst (with a rank-1 accuracy of 83.03%). We believe that the zero-mean operation, in conjunction with the Wiener filtering used in MLE SPN, over-smoothens the images to the extent that the wavelet coefficients fail to effectively capture the SPN. Iris sensors operating in the NIR spectrum do not use CFAs, and therefore the use of Wiener filtering and zero-mean processing is not necessary in our opinion. Basic SPN, Enhanced SPN and Phase SPN do not apply these pre-processing operations and exhibit higher accuracies, thereby validating our supposition.

• Phase SPN performs moderately well, and is on par with Basic SPN at Rank-3 (where the correlation score with the correct reference pattern occurs among the top 3 values).

• The poor performance of all the SPN methods on the CASIAv3 Interval dataset may be due to the use of more than one sensor for assembling this dataset.

• The misclassification of images from the LG iCAM 4000 (see Tables 2.3 and 2.4) could be due to pixel padding in these images, which might have negatively impacted PRNU estimation.

• Application of the box-cox transformation showed a significant increase in performance on the AD100 sensor, which had lower performance when using the Basic, Phase and MLE SPN schemes. Upon closer investigation, this may be due to the digital watermark embedded in the images, as shown in Figure 2.5. Both the box-cox transformation and Enhanced SPN suppress this artifact, thereby leading to improved accuracy.
Figure 2.5: Digital watermark present in the reference pattern of the AD100 sensor. (A logarithmic transformation has been used here for better visualization.)

2.2.5 Summary of the first study

The first study investigated the use of PRNU estimation schemes for determining sensor information from NIR iris images. Experiments involving 12 sensors and 4 PRNU estimation schemes showed that device identification can be successfully performed on a majority of the sensors. Modification of the image histograms using the box-cox transformation resulted in improved accuracy on some of the sensors but negatively impacted the accuracy of other sensors. Experimental results revealed that Basic SPN and Enhanced SPN performed favorably across most of the sensors and outperformed MLE SPN and Phase SPN by a significant margin. Enhanced SPN performed better than Basic SPN. We also ascertained that photometric transformations play a role in PRNU-based sensor recognition performance, which will be thoroughly analyzed in the second study described next.

2.3 Analyzing the effect of photometric transformations on sensor identification schemes for ocular images

In the first study, we demonstrated the feasibility of PRNU-based sensor identification schemes in the context of iris sensors. While sensor forensic schemes have been extensively studied in the context of color images produced by classical digital cameras based on CMOS or CCD technology [41, 70, 76, 112], their applicability to near-infrared (NIR) sensors was only recently established, particularly in the context of iris recognition systems [18, 52, 54, 98, 100].

Figure 2.6: Examples of Near-Infrared (NIR) ocular images exhibiting (a) defocus blur, (b) uneven illumination and (c) motion blur (due to eyelid movement).1

A typical
Since the photometric normalization schemes considered in this work are known to positively impact biometric recognition, it behooves us to determine the nature of their impact on device identification. In this work, we evaluate the effect of photometric transformation on multiple PRNU- based sensor identification techniques, and use Jensen-Shannon based divergence measure to explain the rationale behind the variation in sensor identification performance. In the current work, we advance our understanding of PRNU-based sensor identification schemes by considering multiple photometric transformations and analyzing the effect of such transformations on sensor identification accuracy, in the context of NIR ocular images. Further, we develop an explanatory model to determine a causal relationship between photometric transforma- tions and their impact on the performance of PRNU algorithms. The principal contributions of this work are as follows: a) investigating the effect of seven illumi- nation normalization schemes (the terms illumination normalization, photometric transformation and image enhancement have been used interchangeably in the paper) on sensor identification performance; b) conducting experiments using 11 sensors and 4 PRNU estimation schemes; and c) using the Jensen-Shannon divergence measure to explain the impact of photometric transformations 33 on the wavelet denoised pixel intensity distribution (discussed later) and, subsequently, on sensor identification. 2.3.1 Photometric Transformation Variations in ambient lighting conditions, coupled with unconstrained image acquisition, result in challenging ocular images as depicted in Figure 2.6. Occlusions due to eyelid movement, motion blur, de-focus blur, poor resolution and varying degrees of illumination can significantly impact iris segmentation and iris recognition processes [95]. A large number of illumination normalization schemes have been demonstrated to improve iris and periocular recognition performance [95, 97, 127, 147]. The relevance of the seven ocular image enhancement schemes considered in our work is discussed next. Homomorphic Filtering: Homomorphic filtering is most commonly used for removing non- uniform illumination in images by applying a high-pass filter in the frequency domain to images subjected to a logarithmic transformation [79]. Issues arising due to uneven illumination, as depicted in Figure 2.6(b), can be addressed by applying a high pass Butterworth filter after the logarithm transformed image is converted to the frequency domain using Fourier transform. Singh et al. [147] used homomorphic filtering to improve the performance of iris recognition on the NHCI database. Gamma correction: Gamma adjustment is typically used to increase the contrast of images acquired in low illumination conditions [150]. This photometric transformation produces the output image as a power, denoted by a parameter , of the input image pixel values. Jillela et al. [95] em- ployed gamma correction for improving the contrast of images in the FOCS database for periocular recognition. The range of  studied in our work is [0.1, 2.1]. Contrast Limited Adaptive Histogram Equalization (CLAHE): Histogram equalization has been shown to aid periocular recognition [94]. CLAHE tessellates the image into patches and performs adaptive histogram equalization on each of these patches by clipping the pixel intensity 2Original image was acquired using CASIA-IrisCam V2 sensor [6]. 
values exceeding the user-defined contrast limit [164]. Finally, it aggregates the patches using bilinear interpolation. In our experiments, the size of a patch is 8 × 8 and the contrast limit is set to 0.01.

Figure 2.7: Examples of iris sensors considered in this work. (a) IrisKing IKEMB100, (b) LG 4000, (c) IrisGuard-IG-AD100, (d) Panasonic-BM-ET100US Authenticam, (e) JIRIS JPC1000, (f) CASIA IrisCam-V2, (g) Aoptix Insight, (h) OKI IrisPass-h, (i) LG 2200, (j) Cogent and (k) Everfocus Monochrome CCD.

Discrete Cosine Transform (DCT): Illumination invariance can be achieved in the logarithmic DCT domain by discarding the low frequency DCT coefficients, which capture the illumination of the image [43]. This process operates like a high-pass filter. Juefei-Xu and Savvides applied this illumination normalization for robust periocular recognition on NIST's FRGC version 2 database [97].

Difference of Gaussians (DoG): The DoG filter closely approximates the Laplacian of Gaussian (LoG) filter in a computationally efficient manner [89]. DoG uses Gaussian filters having different scales or filter sizes. The difference between the two filtered outputs, corresponding to Gaussian filtering of the image using two different scales, is computed. This difference, which is the final output, is devoid of the illumination variations present in the original image. DoG filtering has been used to compensate for illumination variations in the context of periocular recognition on the FRGC version 2 database [97]. The two filter scales used in our work are σ1 = 1 and σ2 = 2, where σ denotes the standard deviation.

Multi-Scale Retinex (MSR): Multi-Scale Retinex (MSR) [96] uses smoothing kernels of different sizes, and combines the outputs of Single Scale Retinex (SSR) to remove the halo-like artifacts produced in images transformed using a kernel of a single scale. The retinex algorithm was applied to the UBIRIS v2, FRGC and CASIAv4-Distance datasets to improve the quality of ocular images [152]. MSR proved to be the best illumination normalization scheme in [97]. Three scales (standard deviations), viz., σ = [7, 15, 21], were used in our work to retain the fine details present in the scene as well as maintain the visual aesthetics of the image.

Single-Scale Self Quotient Image (SQI): SQI is closely related to MSR. It is based on the Lambertian model and the concept of the quotient image [160]. The illumination invariant representation can be obtained as the quotient of the original image and the smoothed version of the original image. The halo-like artifacts produced in MSR are typically due to the use of an isotropic Gaussian smoothing kernel. This problem is resolved in SQI using a weighted anisotropic Gaussian smoothing kernel [160]. SQI was used for illumination normalization on the UBIPosePr dataset for unconstrained periocular recognition [131].

Figure 2.8: An example of an NIR iris image subjected to seven illumination normalization schemes. (a) Original, (b) CLAHE, (c) Gamma correction, (d) Homomorphic filtering, (e) MSR, (f) SQI, (g) DCT normalization and (h) DoG.2

2 Original image was acquired using the CASIA-IrisCam V2 sensor [6].

Figure 2.8 illustrates the effect of the aforementioned photometric normalization schemes on a sample ocular NIR image. As evident from Figure 2.8(e), MSR is not able to remove the halo artifacts completely, and these anomalies persist in Figure 2.8(g), where the DCT based normalization scheme is used.
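To make the preceding descriptions concrete, the sketch below applies three of the enhancement schemes (gamma correction, CLAHE and DoG filtering) to a grayscale ocular image using OpenCV and NumPy. This is only an illustrative approximation of the processing used in this study: the file name is hypothetical, the gamma value shown is one point from the [0.1, 2.1] range, and OpenCV's CLAHE clip limit is not expressed on the same scale as the 0.01 contrast limit quoted above.

```python
import cv2
import numpy as np

def gamma_correct(img, gamma=0.6):
    """Power-law (gamma) adjustment of an 8-bit grayscale image."""
    normalized = img.astype(np.float64) / 255.0
    return np.uint8(np.clip(normalized ** gamma, 0, 1) * 255)

def clahe_enhance(img, clip_limit=2.0, tiles=(8, 8)):
    """Contrast Limited Adaptive Histogram Equalization on 8x8 tiles.
    Note: OpenCV's clipLimit is not on the same scale as a 0.01 contrast limit."""
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tiles)
    return clahe.apply(img)

def dog_filter(img, sigma1=1.0, sigma2=2.0):
    """Difference-of-Gaussians: subtract two Gaussian-smoothed copies of the image."""
    g1 = cv2.GaussianBlur(img.astype(np.float64), (0, 0), sigma1)
    g2 = cv2.GaussianBlur(img.astype(np.float64), (0, 0), sigma2)
    return cv2.normalize(g1 - g2, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

if __name__ == "__main__":
    ocular = cv2.imread("nir_ocular.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
    outputs = {
        "gamma": gamma_correct(ocular, gamma=0.6),
        "clahe": clahe_enhance(ocular),
        "dog": dog_filter(ocular),
    }
    for name, out in outputs.items():
        cv2.imwrite(f"enhanced_{name}.png", out)
```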
2.3.2 Experiments and results for the second study

In this section, we review the datasets used in our work, followed by a discussion of the results observed in the second study.

2.3.3 Datasets used in the second study

We use 11 iris datasets corresponding to 11 different sensors. The details concerning the sensors and the datasets are outlined in Table 3.1. The sensor reference pattern is generated using 55 training images per sensor and the number of test images varied from 528 to 940 per sensor. The subjects in the training and test sets were mutually exclusive.

Table 2.5: Rank-1 Sensor Identification Accuracies (%). The value enclosed in parentheses indicates the difference in accuracy when compared to that obtained using the original images. Note that in all cases, the reference pattern for each sensor is computed using the unmodified original images.

Photometric Transformation | Basic SPN | Enhanced SPN | Phase SPN | MLE SPN
Original          | 96.43         | 98.87         | 94.89         | 97.10
Homomorphic       | 92.38 (-4.05) | 93.38 (-5.49) | 93.37 (-1.52) | 97.79 (+0.69)
CLAHE             | 95.75 (-0.68) | 97.78 (-1.09) | 94.51 (-0.38) | 96.43 (-0.67)
Gamma             | 96.53 (+0.10) | 98.03 (-0.84) | 95.41 (+0.52) | 97.60 (+0.50)
DCT normalization | 95.54 (-0.89) | 97.01 (-1.86) | 96.20 (+1.31) | 97.35 (+0.25)
DoG               | 92.81 (-3.62) | 92.77 (-6.10) | 90.28 (-4.61) | 90.42 (-6.68)
MSR               | 96.31 (-0.12) | 96.18 (-2.69) | 98.16 (+3.27) | 98.20 (+1.10)
SQI               | 95.04 (-1.39) | 96.82 (-2.05) | 94.47 (-0.42) | 94.00 (-3.10)
Average           | 94.90 (-1.53) | 95.99 (-2.88) | 94.63 (-0.26) | 95.97 (-1.13)

Table 2.6: Jensen-Shannon divergence values computed between the wavelet-denoised versions of the original and the photometrically transformed images.

Transformation | Aop | OKI | IC | IK | Pan | Cog | JPC | AD | LG22 | LG40 | Mon | Mean ± Std. Dev.
Ori-CLAHE | 0.5543 | 0.1018 | 0.2092 | 0.1053 | 0.4027 | 0.0982 | 0.3165 | 0.4403 | 0.4465 | 0.6798 | 0.0912 | 0.3133 ± 0.2071
Ori-DCT   | 0.5325 | 0.0724 | 0.1330 | 0.0824 | 0.3522 | 0.1033 | 0.5179 | 0.3402 | 0.3314 | 0.3838 | 0.0571 | 0.2642 ± 0.1801
Ori-DoG   | 0.4791 | 0.0970 | 0.1370 | 0.0754 | 0.3134 | 0.0760 | 0.3074 | 0.2516 | 0.3263 | 0.3230 | 0.0397 | 0.2205 ± 0.1423
Ori-Gamma | 0.4559 | 0.0836 | 0.0960 | 0.0499 | 0.2282 | 0.1143 | 0.2310 | 0.2399 | 0.3062 | 0.2561 | 0.1036 | 0.1968 ± 0.1211
Ori-Homo  | 0.6184 | 0.0947 | 0.0769 | 0.1252 | 0.3979 | 0.3366 | 0.3584 | 0.3894 | 0.4949 | 0.5962 | 0.0612 | 0.3227 ± 0.2057
Ori-MSR   | 0.5602 | 0.1231 | 0.1657 | 0.2409 | 0.4060 | 0.0910 | 0.7459 | 0.3682 | 0.5440 | 0.4311 | 0.0678 | 0.3404 ± 0.2217
Ori-SQI   | 0.7320 | 0.1191 | 0.2915 | 0.1880 | 0.5478 | 0.1361 | 0.3522 | 0.6849 | 0.5867 | 0.8173 | 0.1304 | 0.4169 ± 0.2644

2.3.4 Experimental methodology and results for the second study

Experiments were conducted on 9,626 images (605 for training and 9,021 for testing) and the results are reported in terms of Rank 1 identification accuracy. Rank 1 accuracy corresponds to the proportion of test images assigned to the correct sensor class, i.e., those images that yield the highest NCC when compared against the reference pattern of the sensor they actually originated from. Note that in all experiments, the sensor reference pattern was always computed using the original training images and not the photometrically modified images. Table 2.5 reports the sensor identification accuracies for each PRNU method. Inferences drawn from this table are presented below:

• Observation#1: The application of photometric transformations marginally decreases the sensor identification performance of the 4 PRNU estimation schemes considered in this work. Note that the photometric schemes considered herein are applicable in the context of iris and periocular recognition.
• Observation#2: Enhanced SPN emerges as the most robust to illumination normalization methods among the 4 PRNU estimation schemes, closely followed by MLE SPN. The robustness is assessed by computing the average of the rank-1 sensor identification accuracies corresponding to the 7 photometric transformations. Enhanced SPN resulted in the highest average identification accuracy of 95.99%, followed by MLE SPN with an average of 95.97%. Basic SPN yielded 94.90% and Phase SPN resulted in 94.63%.

• Observation#3: Photometric transformations were observed to improve the sensor identification accuracy of the Phase SPN method. Multi-Scale Retinex improved the accuracy by 3.27%, DCT normalization boosted the accuracy by 1.31% and Gamma correction marginally improved it by 0.52%.

• Observation#4: DoG filtering resulted in degradation of sensor identification accuracy by 3.62% for Basic SPN, by 6.10% for Enhanced SPN, by 4.61% for Phase SPN and by 6.68% for MLE SPN. It was closely followed by SQI, which degraded the sensor identification accuracy by 1.39% for Basic SPN, by 2.05% for Enhanced SPN, by 0.42% for Phase SPN and by 3.10% for MLE SPN. Based on the results of this work, it is evident that some illumination normalization schemes which help in improving iris recognition performance can negatively impact the performance of sensor identification algorithms.

• Observation#5: Gamma transformation and MSR have marginal influence on all the PRNU estimation schemes, as seen by the difference-in-performance values enclosed in parentheses.

The results are further visualized from two perspectives. First, CMC curves are presented in Figure 2.9, which depict the effect of each photometric normalization scheme on the PRNU estimation techniques. Second, ROC curves are presented in Figure 2.10, indicating the degree of robustness of each PRNU estimation algorithm when subjected to different illumination normalization methods. These two sets of curves reinforce the observations made above.

Next, we address the following question: Is there an explanatory model which can describe the performance of the PRNU estimation schemes in the presence of photometrically transformed images? The next section utilizes a statistical measure to explain the variations in the performance of the sensor identification algorithms when applied to photometrically modified images.

2.3.5 Analysis and explanatory model for the second study

Figure 2.9: Cumulative Matching Characteristics (CMC) curves depicting the effect of different illumination normalization processes on PRNU estimation techniques. (a) Original, (b) CLAHE, (c) Gamma correction, (d) Homomorphic filtering, (e) MSR, (f) SQI, (g) DCT normalization and (h) DoG.

The results in the previous section indicate that PRNU estimation schemes are able to recover sensor information reliably for some commonly used illumination normalization schemes applied to ocular images, barring DoG filtering and SQI transformation. In this section, we study the probability distribution of pixel intensities, i.e., the normalized histograms of the original image and the photometrically transformed images after being subjected to the wavelet based denoising filter, to provide a principled analysis of the performance of PRNU-based sensor identification algorithms.
We hypothesize that the degree of disparity between the histograms of the denoised original images and the denoised transformed images will provide insight into the general performance of PRNU estimation algorithms on photometrically modified images.

Figure 2.10: ROC curves depicting sensor identification performance of photometrically transformed images. (a) Basic SPN, (b) Phase SPN, (c) MLE SPN and (d) Enhanced SPN.

The four sensor identification schemes used in this work are not applied to the raw images directly; rather, the images (both original and transformed) are first subjected to wavelet based denoising, followed by PRNU estimation in the wavelet domain. Thus, it is necessary to consider the denoised images, instead of the raw images, to develop a suitable explanatory model. To this end, we employed the Jensen-Shannon (JS) divergence to compute the dissimilarity between the denoised image histograms corresponding to the original image and the photometrically modified image. JS divergence is a symmetric and smoothed version of the Kullback-Leibler divergence and yields a finite value [146]. Given two probability distributions, P and Q, the JS divergence is computed as follows:

JS(P || Q) = H( (P + Q)/2 ) − { (1/2) H(P) + (1/2) H(Q) }.

Here, H indicates the Shannon entropy measure corresponding to a random variable, say X, and is computed as H(X) = − Σ_i p_i log2(p_i), where p_i = Pr[X = x_i]. The JS measure is bounded between 0 and 1 (0 corresponds to identical distributions and 1 indicates high dissimilarity). Thus, the JS divergence compares the entropy of the averaged distribution with the average of the individual entropies: the higher the resulting value, the more dissimilar the two distributions.

First, we generated the probability distributions (i.e., the normalized histograms) of the denoised original and denoised transformed images.3 Next, we compute the JS divergence between the two probability distributions. Finally, the JS values corresponding to all the images are averaged to compute a single JS measure value for a given sensor (a minimal sketch of this computation is given below). Table 2.6 reports the JS divergence pertaining to different transformations for each of the 11 sensors.
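The per-image computation described above can be sketched as follows. The wavelet denoising filter from scikit-image is used here as a stand-in for the denoising filter employed in this work, and 256-bin normalized histograms are assumed; both are illustrative choices rather than the exact configuration used in the experiments.

```python
import numpy as np
from skimage.restoration import denoise_wavelet

def normalized_histogram(img, bins=256):
    """Normalized histogram (empirical pixel-intensity distribution) of an image in [0, 1]."""
    hist, _ = np.histogram(np.clip(img, 0.0, 1.0), bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability bins contribute nothing."""
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

def js_divergence(p, q):
    """JS(P||Q) = H((P+Q)/2) - [H(P) + H(Q)]/2, bounded in [0, 1] with log base 2."""
    m = 0.5 * (p + q)
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))

def js_between_denoised(original, transformed):
    """JS divergence between wavelet-denoised versions of an original image and a
    photometrically transformed image, both given as float arrays scaled to [0, 1]."""
    d_orig = denoise_wavelet(original)
    d_tran = denoise_wavelet(transformed)
    return js_divergence(normalized_histogram(d_orig), normalized_histogram(d_tran))
```

Averaging this value over all test images of a given sensor yields one entry of Table 2.6.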
The average and standard deviation of the JS values corresponding to the 11 sensors are computed for each of the photometric transformations. The highest divergence value corresponding to a particular transformation is bolded, while the lowest divergence value is italicized. Some important observations from Table 2.6 are summarized below.

• Observation#1: Gamma transformation resulted in the least JS divergence value. It indicates that the normalized histograms of the denoised original and Gamma transformed images are highly similar. So it is not surprising that Gamma transformation resulted in only a marginal degradation in sensor identification accuracy, as evident from the fourth row in Table 2.5.

• Observation#2: 7 out of 11 sensors reported maximum divergence values for the SQI transformation, which resulted in the second worst degradation in Rank 1 accuracy (note the eighth row in Table 2.5), trailing just behind DoG.

• Observation#3: The overall results indicate that pixel intensity distributions have an important role to play with regard to PRNU.

3 Note that PRNU estimation schemes do not require the input scene to be geometrically aligned, since PRNU is a function of the pixel location in the original image.

In summary, Enhanced SPN and MLE SPN are robust to most of the illumination normalization schemes used by periocular or iris matchers. Both these methods use ℓ2-normalization of noise residuals to account for the variations arising due to constructional differences of sensors [100], which possibly facilitates a more accurate PRNU estimation. Gamma correction and MSR can be used for ocular image enhancement without impairing the performance of the sensor identification module. SQI and DoG filtering, on the other hand, degrade the performance of sensor identification algorithms.

2.3.6 Summary of the second study

This work investigated the impact of photometric transformations on PRNU estimation schemes and employed an explanatory model to understand their performance in the presence of photometrically modified images. Iris recognition systems typically use illumination normalization to enhance ocular images. In this work, photometric transformations which are known to positively impact ocular recognition have been considered for experimental analysis. Experiments involving 7 ocular enhancement schemes and 4 PRNU estimation schemes indicate that Enhanced SPN and MLE SPN are robust to a majority of the illumination normalization schemes considered in this work, and that DoG filtering and SQI can be detrimental for sensor identification (see Section 2.3.4). The explanatory model indicates that those photometric transformations causing significant deviation of the distribution pertaining to the denoised photometrically modified image from the pixel intensity distribution of the denoised original image can negatively impact sensor identification performance. The relative dissimilarity between the distributions pertaining to the denoised original and photometrically transformed images was quantified using the Jensen-Shannon divergence, which explained the performance of sensor identification algorithms in the presence of photometric transformations.

2.4 Summary

In this chapter, we focused on the Photo Response Non-Uniformity (PRNU) based sensor identification method. We studied the feasibility of using PRNU for biometric sensor identification, particularly in the context of near-infrared iris sensors.
We observed that the performance of the sensor identification scheme is impacted by the underlying distribution of the pixel intensities. Enhanced PRNU achieved 97.17% Rank 1 iris sensor identification accuracy when evaluated on ten iris sensors. We further studied the influence of photometric transformations, which can alter the pixel intensity distributions of the images, on such sensor forensic schemes. We observed that some transformations, such as Difference-of-Gaussians filtering, degraded sensor identification accuracies.

CHAPTER 3 SENSOR DE-IDENTIFICATION

Portions of this chapter appeared in the following publications: S. Banerjee, V. Mirjalili and A. Ross, "Spoofing PRNU Patterns of Iris Sensors while Preserving Iris Recognition," 5th IEEE International Conference on Identity, Security and Behavior Analysis, (Hyderabad, India), January 2019. S. Banerjee and A. Ross, "Smartphone Camera De-identification while Preserving Biometric Utility," 10th IEEE International Conference on Biometrics: Theory, Applications and Systems, (Tampa, USA), September 2019.

3.1 Introduction

In the previous chapter, we analyzed whether sensor identification schemes, particularly PRNU-based schemes, can be used to aid in sensor recognition for biometric images. In this chapter, we would like to test the robustness of these schemes and, more importantly, examine whether they can be deliberately confounded. To this end, we have proposed two strategies in this chapter. The first strategy involves altering the images for sensor de-identification in iris sensors. The second strategy involves sensor de-identification for smartphone sensors in the context of partial face images.

3.2 Sensor de-identification for iris sensors

Given the forensic value of PRNU in determining the origin of an image (i.e., the sensor or device that produced it), we explore if it is possible to alter an image such that its source, as assessed by a PRNU estimation scheme, is confounded. We impose two constraints:

1. The modified image must spoof the PRNU pattern of a pre-specified target sensor.

2. The biometric utility of the modified image must be retained, viz., the modified ocular image must match successfully with the original image.

Figure 3.1: Illustration of the objective of the proposed method, i.e., to perturb an ocular (iris) image such that its PRNU pattern is modified to spoof that of another sensor, while not adversely impacting its biometric utility.

Figure 3.2: The proposed algorithm for deriving perturbations for the input image using the candidate image. (a) Steps involved in modifying the original image from the source sensor using a candidate image from the target sensor (see Algorithm 1), and (b) role of the candidate image in the perturbation engine (see Algorithm 2).

This kind of attack can be considered a 'targeted attack', since the sensor whose PRNU pattern has to be spoofed is pre-specified. In the literature, it is also referred to as a fingerprint-copy attack [78, 154], because the objective is to copy the sensor pattern or 'fingerprint' corresponding to the target sensor to an image acquired using a different source sensor. The proposed work has two distinct benefits. Firstly, it allows us to assess the feasibility of PRNU spoofing from a counter-forensic perspective. The widespread use of forensic techniques for examining the validity and origin of digital media [75, 105] necessitates the study of attacks that can potentially undermine the performance of such forensic methods.
For example, an adversary may maliciously attempt to link an image to a different camera in an effort to mislead law enforcement investigators [78]. Secondly, establishing the viability of such spoof attacks would promote the development of more robust PRNU estimation schemes [115]. In addition, effective methods to detect such attacks can be developed if the process of spoofing is better understood. Figure 3.1 summarizes the objective of this work.

The remainder of the chapter is organized as follows. We present two works: the first work involves sensor de-identification for iris sensors using an iterative image perturbation routine, and the second work involves sensor de-identification in the context of smartphones.

3.2.1 Perturbing the PRNU Pattern for iris sensors

In the first work, our objective is to perform PRNU spoofing in a principled manner that works for any arbitrary pair of source and target iris sensors. In addition, we wish to retain the biometric utility of the PRNU-spoofed image. The task of spoofing can potentially be accomplished through different techniques, an example being the use of adversarial networks that have been successfully utilized for perturbing images in the current literature [122]. However, a significant bottleneck of deep-learning based techniques is the need for a large amount of training data for driving the perturbation process. We will demonstrate the success of the proposed PRNU spoofing scheme using a small number of images (<1000).

3.2.2 Proposed method

In this section, we formally describe the objective and the method used to address this objective.

3.2.2.1 Problem formulation

Let I denote an NIR iris image of width w and height h, and S = {S1, S2, ..., Sn} denote a set of n sensors. Let C(I, Sk) be the function that computes the normalized cross-correlation (NCC) between the noise residual of I and the PRNU reference pattern of sensor Sk. Then, the sensor label for the input iris image I can be determined using arg max_k {C(I, Sk)}. Furthermore, let B be a biometric matcher, where B(I1, I2) determines the match score between two iris samples I1 and I2.

Table 3.1: Specifications of the datasets used in this work.

Dataset | Sensor Name (Abbreviation) | No. of Subjects | No. of Images Used (Training set+Testing set)
BioCOP 2009 | Aoptix Insight (Aop) | 100 | 995 (55+940)
Set I IITD [1] | Jiristech JPC 1000 (JPC) | 100 | 995 (55+940)
CASIAv2 Device2 [6] | CASIA-IrisCamV2 (IC) | 50 | 995 (55+940)
IIITD Multi-spectral Periocular (NIR subset) [2] | Cogent (Cog) | 62 | 615 (55+560)
ND CrossSensor Iris 2013 Set II [3] | LG 4000 (LG40) | 99 | 588 (55+533)
MMU2 [4] | Panasonic BM-ET 100US Authenticam (Pan) | 6 | 55 (55+0)
ND Cosmetic Contact Lens 2013 [3] | IrisGuard IG AD100 (AD) | 4 | 55 (55+0)
WVU Off-Axis | EverFocus Monochrome CCD (Ever) [144] | 7 | 55 (55+0)
CASIAv2 Device 1 [6] | OKI IrisPass-h (OKI) | 3 | 55 (55+0)
CASIAv4-Iris Thousand subset [7] | IrisKing IKEMB100 (IK) | 3 | 55 (55+0)
ND CrossSensor Iris 2013 Set I [3] | LG 2200 (LG22) | 5 | 55 (55+0)

Given an input iris image I acquired using sensor S_s, a candidate image I_c from the target sensor S_t, and an iris matcher B, our goal is to devise a perturbation engine E that can modify the input image as I_p = E(I, I_c) such that C(I_p, S_s) < C(I_p, S_t), thereby predicting S_t as the sensor label of the perturbed image I_p, while the iris matcher, B, will successfully match I_p with I. As a result, the target sensor will be spoofed, while the biometric utility of the image will be retained.
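The sensor attribution rule C(·, ·) followed by the arg max operation can be sketched as follows. The noise residual of the test image and the per-sensor reference patterns are assumed to have been precomputed by one of the PRNU estimation schemes discussed in Chapter 2; the function and variable names below are illustrative.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equal-sized 2-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def attribute_sensor(noise_residual, reference_patterns):
    """Return the label of the sensor whose PRNU reference pattern correlates most
    strongly with the image's noise residual, i.e., arg max_k C(I, S_k)."""
    scores = {label: ncc(noise_residual, ref) for label, ref in reference_patterns.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores
```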
The retention of biometric utility implies that the match score between a pair of perturbed images, B(I_p1, I_p2), as well as that of a perturbed sample with an original sample, B(I_p1, I2) and B(I1, I_p2), are expected to be similar to the match score between the original samples, B(I1, I2). The steps used to achieve this task are described next.

3.2.2.2 Deriving perturbations and PRNU Spoofing

Given a single image I from the source sensor S_s, a gallery of images G = {G_1, ..., G_M} from the target sensor S_t, and a set of N random patch locations L = {l_1, ..., l_N}, we first select a candidate image, G_j, j ∈ [1, · · · , M], from the gallery to perturb the input image. The candidate image is selected from the gallery such that it is maximally correlated with the input image I. To accomplish this goal, we select 10 patches in the input image, each of size 10 × 10 (i.e., N = 10, p_h = 10, p_w = 10 in Algorithm 1). Now, we compute the average pixel intensity in each of these patches and create an N-dimensional vector v_I. Next, for each of the M gallery images, we create v_j, where j = [1, · · · , M], by computing the average pixel intensity in the same 10 patch locations selected previously in the input image. Finally, we compute the correlation between the vectors v_I and v_j, and select the candidate image with the maximum correlation value. The steps for selecting the candidate image are described in Algorithm 1, and a simplified sketch of this selection step is provided below.

Algorithm 1: Selection of the candidate image
1: Input: An image I from sensor S_s, a gallery of images G = {G_1, ..., G_M} from the target sensor S_t
2: Output: A candidate image, G_q, selected from the gallery
3: Set static parameters N = 10 (number of random patches) and p_w = 10, p_h = 10 (patch width and height)
4: Generate a set of N random patch locations L = {l_1, · · · , l_N}, where each patch is of size p_h × p_w
5: Compute the average pixel intensity in each patch l ∈ L of the input image I to obtain a vector v_I (of size N)
6: Repeat the previous step for each of the gallery images to obtain a set of vectors v_j, where j = 1, · · · , M. The value of M (the target gallery size) depends on the number of test images indicated in the fourth column in Table 3.1
7: Compute the correlation between v_I and the v_j corresponding to each gallery image to obtain a set of M correlation scores
8: return The candidate image G_q ∈ G that has the highest correlation, i.e., q = argmax_{j ∈ [1,··· ,M]} {corr(v_I, v_j)}

After obtaining the candidate image I_c from the gallery of the target sensor S_t, the perturbations for image I are then derived with the help of I_c as described in Algorithm 2. The perturbation routine employs the following parameters: (i) η (the learning rate), (ii) κ (the termination criterion), and (iii) T (the maximum number of iterations). Initially, the output perturbed image I_p(0) is identical to the input image I. Next, we select a random patch location from I_p(0), and create a mask matrix, Z, of the same size as I_p(0), such that the elements in Z are set to 1 for the row and column indices corresponding to the selected patch location. Then, the image I_p(0) is perturbed iteratively using pixels from the same patch location in I_c. In each iteration, the pixels inside the selected patch are updated along two directions. The candidate image guides the direction of perturbation [123]. In the first case the perturbation is along a positive direction, which generates I_p+; the other direction corresponds to a negative perturbation, which produces I_p− (see Algorithm 2). Figure 3.2(b) illustrates the role of the candidate image in the perturbation routine.
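A simplified sketch of the candidate selection step (Algorithm 1) is given below. It assumes that the gallery images have already been resized to the same resolution as the input image (as done in Section 3.2.3.2) and uses the Pearson correlation between patch-intensity vectors; the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def random_patches(shape, n_patches=10, ph=10, pw=10):
    """Sample N random top-left patch coordinates that fit inside an image of the given shape."""
    h, w = shape
    rows = rng.integers(0, h - ph, size=n_patches)
    cols = rng.integers(0, w - pw, size=n_patches)
    return list(zip(rows, cols))

def patch_intensity_vector(img, locations, ph=10, pw=10):
    """Average pixel intensity of each patch, yielding an N-dimensional descriptor."""
    return np.array([img[r:r + ph, c:c + pw].mean() for r, c in locations])

def select_candidate(input_img, gallery):
    """Pick the gallery image whose patch-intensity vector is most correlated
    with that of the input image."""
    locs = random_patches(input_img.shape)
    v_in = patch_intensity_vector(input_img, locs)
    correlations = [np.corrcoef(v_in, patch_intensity_vector(g, locs))[0, 1] for g in gallery]
    return gallery[int(np.argmax(correlations))]
```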
Next, the noise residuals extracted from (I_p+, I_p−) are correlated with the reference pattern of the target sensor. The perturbed image yielding the maximum correlation value is then selected as the seed image for the next iteration, I_p(t+1). This process is repeated until the relative difference between the NCC values of the perturbed image I_p(t) with respect to the target sensor S_t and the original sensor S_s exceeds 10%, i.e., κ = 0.1, or the maximum number of iterations is reached. The parameters employed in the perturbation routine are selected intuitively; for example, the learning rate is set to a small value η = 0.01 because our objective is to perturb the image while preserving its biometric utility.

Algorithm 2: Spoofing the PRNU pattern
1: Input: An image I of size h × w from sensor S_s, a candidate image I_c from sensor S_t, and a function C(X, S) that returns the NCC value when image X is correlated with the PRNU reference pattern of sensor S (S ∈ {S_s, S_t})
2: Output: Perturbed image I_p
3: Initialize: Set static parameters η = 0.01 (learning rate), κ = 0.1 (threshold), T = 3000 (maximum number of iterations), p_h = 10, p_w = 10 (patch size), t = 0 and I_p(0) = I
4: while C(I_p(t), S_t) − C(I_p(t), S_s) ≤ κ do
5:   – Choose a random patch location (x, y), with 0 ≤ x < h − p_h and 0 ≤ y < w − p_w
6:   – Construct the mask matrix Z such that Z[a, b] = 1 for the row and column indices covered by the selected patch, and Z[a, b] = 0 elsewhere
7:   – Create a perturbed image in the positive direction: I_p+ = I_p(t) + η Z ⊙ (I_c − I_p(t))
8:   – Create a perturbed image in the negative direction: I_p− = I_p(t) − η Z ⊙ (I_c − I_p(t))
9:   – Compute the NCC values of I_p+ and I_p− for the target sensor S_t, i.e., C(I_p+, S_t) and C(I_p−, S_t)
10:  – Set I_p(t+1) = I_p+ if C(I_p+, S_t) > C(I_p−, S_t); otherwise set I_p(t+1) = I_p−
11:  – t = t + 1
12:  – If t > T, break the loop
13: end while
14: return The final perturbed image, I_p(t). Here, ⊙ indicates the element-wise product.

At the end of the routine, the perturbed image will have to be incorrectly attributed to S_t by the sensor classifier. The steps of the PRNU spoofing algorithm are illustrated in Figure 3.2(a).

Figure 3.3: Illustration of PRNU spoofing using images belonging to the source sensor JPC and the candidate images belonging to the target sensor Aoptix.

The sequence of modified images undergoing the perturbation routine is illustrated for two example iris images in Figure 3.3. In summary, the proposed method is able to perturb input images in order to spoof a PRNU-based sensor classifier, while the perturbations do not affect iris matching. As a result, after perturbations are applied to iris images, their biometric utility is still preserved.

3.2.3 Experiments and Results

In this section, we describe the datasets and sensors employed in this work, followed by the experiments conducted on the datasets. Results are reported and analyzed in the context of PRNU spoofing and iris recognition.

3.2.3.1 Datasets

Experiments are conducted using 11 different sensors from 11 iris datasets. The PRNU spoofing process typically involves a single source sensor and a single target sensor from the set of 11 sensors. The sensor details and image specifications of the 11 sensors are described in Table 3.1. Thus, there can be a total of 110 (= 11 × 10) ordered source-target combinations for PRNU spoofing. However, for the sake of brevity, we performed 20 different PRNU spoofing experiments involving 5 sensors: {Aop, JPC, IC, Cog, LG40}.
From the set of 5 sensors listed above, each sensor serves as the source sensor while the remaining 4 sensors serve as target sensors one at a time, thus resulting in 20 different PRNU spoofing experiments.

3.2.3.2 Sensor identification before PRNU spoofing

Due to variations in image size of the source and target sensors, all images were resized to a fixed spatial resolution of 160 × 120 to facilitate PRNU spoofing. We then evaluated the sensor identification accuracy based on these resized images prior to PRNU spoofing. This is to determine if resizing impacts sensor identification accuracy. The sensor identification involves deriving sensor reference patterns using 55 training images, as used in [18], from each of the 11 sensors, followed by extraction of test noise residuals from images belonging to the 5 sensors, and finally correlating them. The subjects in the training set and the test set are disjoint. The sensor identification accuracy and the corresponding confusion matrix are presented in Table 3.2. The results indicate a very high sensor identification accuracy using the MLE PRNU scheme on the resized images. So we use the resized images in the experiments below.

Table 3.2: Confusion matrix for sensor identification involving unperturbed but resized images. The test noise residuals of images from 5 sensors are compared against reference patterns from 11 sensors. The last column indicates sensor identification accuracy.

Actual \ Predicted | Aop | JPC | IC | Cog | LG40 | Pan | AD | Ever | OKI | IK | LG22 | Accuracy (%)
Aop  | 900 | 2   | 1   | 1   | 9   | 4 | 3 | 3 | 9 | 7 | 1 | 95.74
JPC  | 2   | 919 | 4   | 2   | 5   | 0 | 0 | 4 | 1 | 2 | 1 | 97.77
IC   | 0   | 0   | 940 | 0   | 0   | 0 | 0 | 0 | 0 | 0 | 0 | 100
Cog  | 2   | 2   | 1   | 546 | 2   | 0 | 2 | 0 | 0 | 5 | 0 | 97.51
LG40 | 0   | 0   | 0   | 0   | 529 | 0 | 0 | 3 | 1 | 0 | 0 | 99.25

3.2.3.3 Sensor identification after PRNU spoofing

The PRNU spoofing process involves perturbing the original image from a source sensor using a candidate image belonging to the target sensor, whose PRNU needs to be spoofed. The impact of the perturbations on spoofing the PRNU pattern has been reported in terms of the Spoof Success Rate (SSR), which computes the proportion of test images from the source sensor classified as belonging to the target sensor after perturbing using Algorithm 2. The results of spoofing are presented in Table 3.3. We implemented the Baseline 1 and Baseline 2 algorithms. Baseline 2 is implemented following normalization of the source and target reference patterns with respect to the maximum intensity of the PRNU present in the two reference patterns. The normalization is required to account for the variation in the PRNU strength associated with different sensors. Ideally, the two scalar terms which serve as parameters in the baseline algorithm need to be optimized through grid-search for a specific pair of source (S_s) and target (S_t) sensors. However, we set the scalars to a static value of 1 for two reasons: (i) for ease of computation and (ii) to provide fair comparison with the proposed algorithm, which also uses fixed values of parameters for all pairs of sensors. The baseline algorithms are state-of-the-art to the best of our knowledge and are, therefore, used for comparative evaluation. Examples of perturbed outputs of images spoofed using Baseline 1, Baseline 2, and the proposed algorithm are presented in Figure 3.4. Results in Table 3.3 indicate that 15 out of 20 times the proposed algorithm outperforms the Baseline 1 technique, and it performs considerably better than the Baseline 2 method 16 out of 20 times.
The average SSR of the proposed algorithm outperforms the baseline algorithms by a significant margin. We believe that the parameters  and  need to be tuned accurately for each pair of source and target sensors to ensure the success of the baseline algorithms. On the other hand, the proposed algorithm is successful for static parameter values: the size of patches (Ò ×  ), the threshold , the learning rate , and the number of patches () (see Section 3.2.2.2). The PRNU is successfully spoofed by the proposed method in most of the cases barring the case where the target sensor is Aoptix and the source sensor is LG 4000 (≈62% SSR). Inspection of the images acquired using LG 4000 sensor reveals the presence of image padding, which may negatively impact the PRNU spoofing process. Figure 3.5 shows an input image undergoing iterative perturbations. The original (unperturbed) image belongs to the Aoptix sensor and is perturbed using a candidate image from the target sensor, Cogent. The subsequent shift of the NCC values from being the highest for the source sensor 53 Table 3.3: Results of PRNU spoofing where the target sensors (along the second column) are spoofed by perturbing the images from 5 source sensors, namely, Aop, JPC, IC, Cog and LG40 (along the first column). The test noise residual after the perturbation process is compared against the reference patterns of 11 sensors (see Table 3.1). The last 3 columns indicate the proportion of the perturbed images successfully classified as belonging to the target sensor and is denoted as the Spoof Success Rate (SSR). The highest values of the SSR are bolded. Original Sensor Target Sensor Sensor classes compared against perturbed PRNU Aop JPC IC Cog LG40 JPC IC Cog LG40 Aop IC Cog LG40 Aop JPC Cog LG40 Aop JPC IC LG40 Aop JPC IC Cog Aop 4 21 7 66 905 2 3 1 910 0 0 0 552 1 2 1 330 0 0 0 JPC IC 894 3 891 0 2 3 4 4 3 18 712 209 4 94 61 3 30 0 797 143 243 0 46 0 0 0 546 0 545 0 0 0 3 0 491 0 393 0 0 0 Average SSR (%) Cog LG40 Pan AD Ever OKI 3 0 890 4 3 2 817 1 0 0 697 0 0 0 2 0 0 0 0 479 8 6 7 836 4 4 5 861 0 0 0 894 2 1 2 550 198 38 136 50 9 6 13 7 2 2 2 1 0 0 0 0 0 0 1 0 1 1 1 1 2 5 2 4 2 3 5 3 0 0 0 0 0 2 0 1 0 2 3 2 2 2 5 3 0 2 5 5 0 0 0 0 0 0 0 0 0 0 0 0 2 1 2 0 0 1 0 0 0 0 0 0 2 2 2 2 0 0 0 0 IK LG22 12 5 5 8 2 2 1 2 0 0 0 0 4 8 5 6 1 1 0 1 1 3 4 4 1 1 4 2 0 0 0 0 0 0 1 0 0 0 0 0 SSR (%) for proposed method SSR (%) for Baseline 1 SSR (%) for Baseline 2 95.11 94.79 94.68 88.94 96.28 75.74 86.91 91.60 96.81 84.79 74.15 95.11 98.57 97.50 97.32 98.21 61.91 92.12 73.73 89.87 89.21 92.55 92.77 79.89 79.15 49.15 99.79 35.53 8.09 48.72 100 46.70 1.91 100 100 100 82.32 9.94 9.38 11.44 4.69 57.60 67.98 13.51 0.21 10.00 1.91 100 0.21 9.26 0 53.09 0 0.11 38.57 100 100 35.00 1.31 24.20 99.44 0.19 32.75 (Aoptix) to being the highest for the target sensor (Cogent), indicates the success of the proposed method. The average number of iterations required for successful PRNU spoofing varied between 200 to 2200. Another experiment is conducted to study the impact of increasing the number of iterations on the proposed PRNU spoofing process. This experiment is conducted for the specific case where the source sensor is LG 4000 and the target to be spoofed is the Aoptix sensor. The reason for selecting this pair is due to the poor SSR reported for this specific set of sensors (see the fifth block in Table 3.3). We speculate that with an increase in the number of iterations, the PRNU spoofing process will succeed and improve the SSR as a result. 
In this regard, in the new experimental set-up, the maximum number of iterations was set to 6000 (twice the earlier terminating criterion). As a result, the SSR increased considerably from 61.91% to 79.73%, i.e., a ≈18% increase was observed. 425 out of 533 test images belonging to the LG 4000 sensor were successfully classified as originating from the Aoptix sensor when the number of iterations was increased.

Figure 3.4: Example of PRNU spoofed images originating from the JPC 1000 sensor (first column), illustrated for Baseline 1 (second column), Baseline 2 (third column) and the proposed method (last column). Here, the target sensor is Aoptix.

Figure 3.5: Intermediate images generated when an image from the Aoptix (source) sensor is perturbed using a candidate image from Cogent (target). For the sake of brevity, NCC values corresponding to the reference patterns of the first 5 sensors in Table 3.1 are mentioned in the figure. The arrows indicate the increase in the NCC values corresponding to the target sensor.

3.2.3.4 Retaining biometric matching utility

The impact of the perturbations on iris recognition performance is evaluated next using the VeriEye iris matcher [11]. We designed three experiments for analyzing biometric matching performance.

Figure 3.6: Receiver Operating Characteristics (ROC) curves of matching performance obtained using the VeriEye iris matcher software. The terms 'Original', 'Perturbed' and 'Original vs. Perturbed' indicate the three different matching scenarios (see Section 3.2.3.4). 'Original' indicates matching only unperturbed images; 'Perturbed' indicates matching only perturbed images; 'Original vs. Perturbed' indicates the cross-matching case where unperturbed images are matched against perturbed images. Note that the curves obtained from perturbed images match very closely with the curves corresponding to the unperturbed images, illustrating preservation of iris recognition for each sensor depicted in each column. The results are compared with the Baseline 1 and 2 algorithms discussed in Section 3.2.3.3.

First, the match scores between all pairs of iris samples before perturbation were computed. In the second experiment, we computed the match scores between all pairs of perturbed samples. In the third experiment, we computed match scores between all iris samples before perturbation and all samples after perturbation. This is referred to as the cross-matching scenario. In the third set of experiments, the genuine scores are computed by employing 2 sample images (from the same subject): one sample belonging to the set of unperturbed images and the other sample from the set of perturbed images. The impostor scores are generated by pairing samples belonging to different subjects: one image is taken from the set of unperturbed images, while the other is taken from the set of perturbed images.

Figure 3.6 shows the ROC curves obtained from these three experiments. The ROC curves confirm that the perturbed images do not negatively impact the matching utility. In the case of all the sensors, the ROC curves of the perturbed images are within a 1% deviation from the ROC curve of the original samples before perturbation, except for the IrisCam (IC) sensor. Further, we note that the matching performance of original samples from the Cogent (Cog) sensor is degraded to begin with. We believe the reason for this degraded performance is the low quality of the original images.
Yet, perturbations have not further deteriorated the matching performance, as evidenced by the before- and after-perturbation ROC curves that are very similar to each other. In addition, the iris recognition performance after PRNU spoofing using the baseline algorithms is analyzed. The results indicate that the proposed method is comparable to the baseline algorithms in terms of iris recognition performance. Furthermore, we conducted a fourth experiment, where we analyzed the matching performance of those LG4000 iris images that were perturbed to spoof the Aoptix sensor after increasing the number of iterations. The result confirms that increasing the number of iterations to improve the SSR does not degrade matching performance, as is evident in Figure 3.7.

In summary, the following salient observations in the context of both PRNU spoofing and iris recognition preservation can be made.

• The PRNU pattern of a sensor can be successfully spoofed by directly modifying an input image, without invoking the sensor reference pattern of the target sensor. Experiments are conducted using 11 iris sensors, and the PRNU spoofing process is demonstrated using 5 sensors and compared with existing approaches. Results show that the proposed spoofing method outperforms Baseline 1 by 31.6% and Baseline 2 by 56.4% in terms of average spoof success rate.

• The proposed spoofing algorithm uses identical parameters, such as the size of patches and learning rate, for all pairs of source and target sensors. This obviates the need to fine-tune the method for different pairs of sensors.

• The iris recognition performance of the images perturbed using the proposed algorithm is retained within 1% of the original. This suggests the success of the proposed spoofing method in retaining the biometric utility of the modified images.

Figure 3.7: Impact of increase in the number of iterations on iris recognition performance for the pair of LG 4000 (source) and Aoptix (target) sensors.

3.2.4 Summary of the first strategy of sensor de-identification

In the first work, we design a method for PRNU spoofing that preserves biometric recognition in the context of NIR iris images. In the proposed strategy, a test image belonging to a particular sensor is modified iteratively using patches from a candidate image belonging to a target sensor, whose PRNU is to be spoofed. We examine the impact of these perturbations on PRNU spoofing as well as iris recognition performance. Experiments are conducted in this regard using 11 sensors and compared with two existing PRNU spoofing algorithms. Results show that the proposed method can successfully spoof the PRNU pattern of a target sensor and does not significantly impact the iris recognition performance in a majority of the cases.

3.3 Smartphone camera de-identification

Since smartphone devices are intricately linked to their owners, sensor identification using images from smartphone cameras can inevitably lead to person identification. This poses privacy concerns to the general populace [10] and, especially, to photojournalists [125]. Sensor de-identification can mitigate such concerns by removing sensor specific traces from the image. A number of sensor de-identification algorithms, particularly in the context of PRNU suppression, have been developed in the literature [64, 157]. PRNU suppression can be done by either PRNU anonymization or PRNU spoofing.
Figure 3.8: The objective of our work. The original biometric image is modified such that the sensor classifier associates it with a different sensor, while the biometric matcher successfully matches the original image with the modified image.

The objective of this work is to develop a rather simple method to perform sensor de-identification, while preserving the biometric recognition utility of the images. The key idea is illustrated in Figure 3.8. The merits of the proposed method are as follows.

1. Designing a sensor de-identification algorithm that can perform both PRNU anonymization and PRNU spoofing in a non-iterative fashion. This addresses the computational overhead incurred by the algorithms in [17, 63].

2. The proposed de-identification algorithm is applicable to different PRNU estimation schemes and works irrespective of the source and target sensors. This eliminates the need for parameter optimization and computation of the reference patterns corresponding to each pair of source and target sensors as required in [78, 107].

3. The proposed algorithm causes minimal degradation to the biometric content of the images, thus retaining their biometric utility.

Figure 3.9: Illustration of PRNU Anonymization. The DCT coefficients are arranged such that the top-left portion has the low frequency components while the bottom-right portion encapsulates the high frequency information. The PRNU anonymized image is the result of suppression of high frequency components (see Algorithm 3, here β = 0.9).

Algorithm 3: PRNU anonymization
1: Input: An image I of size h × w and parameter β
2: Output: PRNU anonymized image I′
3: Apply the 2-dimensional DCT to I: F = DCT(I)
4: Compute m = min(h, w) and k = round(β × m)
5: Extract the high frequency components as follows:
6: F_high = tril(F, k), where the tril(·, ·) operator extracts the lower triangular portion of the DCT coefficients along the anti-diagonal direction, regulated by k
7: Extract the low frequency components as follows: F_low = F − F_high
8: Apply the inverse DCT to obtain the modified image: I′ = DCT⁻¹(F_low)
9: return The modified image I′

3.3.1 Proposed Method for smartphone camera de-identification

The Discrete Cosine Transform (DCT) has been successfully used for lossy image compression [159] or for improving source camera identification [33]. The coefficients located in the top-left portion capture the low frequency components while the bottom-right coefficients encode the high frequency components. Our goal is to modify the images to perturb the PRNU pattern, resulting in sensor de-identification. PRNU is a noise-like component which is dominated by the high frequency components present in an image. Thus, we propose to transform the image into the DCT domain and modulate the DCT coefficients such that the high frequency components are suppressed, while retaining the low frequency components. By suppressing the high frequency components, we mask the sensor pattern present in the image. On the other hand, we retain the low frequency components which primarily contain the scene details in the image. The scene details are pivotal for biometric recognition. Thus, we ensure the preservation of the biometric utility of the image. We then apply the inverse DCT, and the output is the modified image.
Figure 3.10: Illustration of PRNU Spoofing. The high frequency components in the original image are suppressed first, the residue being the low frequency components. The high frequency components of the target sensor are further computed from the candidate images, and added to the low frequency components of the original image, resulting in the PRNU spoofed image (see Algorithm 4, here β = 0.7).

Algorithm 4: PRNU spoofing
1: Input: An image I of size h × w belonging to source sensor S_s, a set of N candidate images belonging to the target sensor S_t, where each image of size a × b is denoted as T_j (j = [1, · · · , N]), and parameter β
2: Output: PRNU spoofed image I′
3: Set j = 1
4: Apply the 2-dimensional DCT to I: F = DCT(I)
5: Extract the low frequency and high frequency components, F_low and F_high, as described in Algorithm 3, and set H_high = 0
6: Compute k = round(β × min(a, b))
7: while j ≤ N do
8:   – Apply the 2-dimensional DCT to T_j: F_j = DCT(T_j)
9:   – Extract the high frequency components as follows: F_j,high = tril(F_j, k)
10:  – Apply the inverse DCT to the high frequency content: H_j = DCT⁻¹(F_j,high)
11:  – Accumulate the images to generate H_high += H_j and increment j += 1
12: end while
13: Divide by the number of images: H_high = H_high / N
14: Resize H_high to h × w using bicubic interpolation
15: Apply the inverse DCT to obtain the modified image: I′ = DCT⁻¹(H_high + F_low)
16: return The modified image I′

To achieve sensor de-identification we perform both (i) PRNU Anonymization and (ii) PRNU Spoofing.

PRNU Anonymization: Given an image I, we first subject it to the DCT to yield F. We intend to suppress the high frequency information without impairing the low frequency details. To achieve this goal, we define a parameter k that serves as a regulator for high frequency suppression. k is computed as the product of the minimum of the height and width of the image (h, w) and a user-defined parameter β, rounded off to the nearest integer. All DCT coefficients present in the interval [r = k : h, c = k : w] are set to zero. Thus, k represents the threshold for the suppression of the DCT coefficients, and that threshold is a function of the image dimensions. We discard the high frequency components and then apply the inverse DCT, which results in the PRNU anonymized image I′. The steps are described in Algorithm 3. The process of PRNU anonymization is illustrated using an example image in Figure 3.9.

Table 3.4: Dataset specifications. The top block corresponds to the MICHE-I dataset [117] and the bottom block corresponds to the OULU-NPU face dataset [35]. In the MICHE-I dataset, we denote the brand Apple as 'Device 1' and the brand Samsung as 'Device 2'. Two different smartphones belonging to the same brand and model, e.g., Apple iPhone 5, are distinguished as 'UNIT I' and 'UNIT II'.

Smartphone Brand and Model | Device Identifier | Sensor | Image Size | Images/Subjects (Training Set) | Images/Subjects (Test Set)
Apple iPhone 5 | Device 1 UNIT I | Front (F) | 960×1280 | 55/7 | 344/41
Apple iPhone 5 | Device 1 UNIT I | Rear (R) | 1536×2048 | 55/7 | 355/41
Apple iPhone 5 | Device 1 UNIT II | Front (F) | 960×1280 | 55/6 | 164/20
Apple iPhone 5 | Device 1 UNIT II | Rear (R) | 2448×3264 | 55/6 | 170/20
Samsung Galaxy S4 | Device 2 UNIT I | Front (F) | 1080×1920 | 55/5 | 577/69
Samsung Galaxy S4 | Device 2 UNIT I | Rear (R) | 2322×4128 | 55/5 | 600/70
Samsung Galaxy S6 Edge | — | Front (F) | 1080×1920 | 55/6 | 0/0
HTC Desire EYE | — | Front (F) | 1080×1920 | 55/6 | 0/0
MEIZU X5 | — | Front (F) | 1080×1920 | 55/6 | 0/0
ASUS Zenfone Selfie | — | Front (F) | 1080×1920 | 55/6 | 0/0
Sony XPERIA C5 Ultra Dual | — | Front (F) | 1080×1920 | 55/6 | 0/0
Oppo N3 | — | Front (F) | 1080×1920 | 55/6 | 0/0
TOTAL | | | | 660/72 | 2,210/261

PRNU Spoofing: We want the sensor classifier to assign an image belonging to source sensor S_s to a specific target sensor S_t. To accomplish this task, we perform the following steps.
(i) First, we compute the parameter k (see Algorithm 4). Next, we transform the original image I from the source sensor to the DCT domain and then extract its low frequency components (as done in Algorithm 3). (ii) A set of N candidate images, T_j, j = [1, · · · , N], belonging to the target sensor is selected, and each of them is subjected to the DCT, resulting in F_j (see Section 3.3.2.2). Next, we extract the high frequency coefficients from each F_j, apply the inverse DCT, and then compute their average to yield H_high. This averaged output represents the sensor traces of the target sensor. (iii) Finally, we insert the averaged high frequency coefficients into F_low to generate I′, which will now be classified as belonging to the target sensor, resulting in PRNU spoofing. The implementation details for PRNU spoofing are described in Algorithm 4. The process of PRNU spoofing is illustrated using an example image in Figure 3.10. A simplified code sketch of this DCT-domain manipulation is provided below, following the dataset description.

3.3.2 Experiments and Results for smartphone camera de-identification

3.3.2.1 Dataset

We used the Mobile Iris Challenge Evaluation (MICHE-I) dataset [117] and the OULU-NPU face dataset [35, 103] for performing the experiments in this work. The MICHE-I dataset comprises over 3,000 eye images from three devices: Apple iPhone 5, Samsung Galaxy S4 and Samsung Galaxy Tab 2 [71]. However, in our work, we employed the periocular images from two smartphones only: the Apple iPhone 5 and the Samsung Galaxy S4. The authors in [71] discovered that two separate units of the Apple iPhone 5 were used for data collection. We refer to them as Unit I and Unit II, respectively. Further, the images in the dataset were acquired using the front and rear camera sensors, separately. Thus, the MICHE-I dataset used in this work consists of data from 6 sensors. The OULU-NPU face dataset comprises 4,950 face videos recorded using the front cameras of six mobile devices: Samsung Galaxy S6 Edge, HTC Desire EYE, MEIZU X5, ASUS Zenfone Selfie, Sony XPERIA C5 Ultra Dual and OPPO N3. The videos were recorded in three sessions with different illumination and background scenes. We only use the bonafide face videos/images in the OULU-NPU dataset corresponding to 6 sensors. See Figure 3.11.

Figure 3.11: Example images from the MICHE-I and the OULU-NPU datasets acquired using (a) Apple iPhone 5 Rear, (b) Samsung Galaxy S4 Front, (c) Samsung Galaxy S6 Edge Front, (d) HTC Desire EYE Front, (e) MEIZU X5 Front, (f) ASUS Zenfone Selfie Front, (g) Sony XPERIA C5 Ultra Dual Front and (h) OPPO N3 Front sensors.

The specifications of the dataset are described in Table 3.4. We split each dataset into a training set and a test set. We followed a subject-disjoint protocol for creating the training and test sets. The images in the training set are used for generating the reference pattern for each sensor, as indicated in the fifth column of Table 3.4. Our training set consists of 55 images [19] from each camera sensor in the MICHE-I dataset. The OULU-NPU database contains videos, and so we selected 55 frames (20 frames from the first session, 20 frames from the second, and 15 frames from the third) from 6 subjects, for each of the 6 sensors. The test set comprises images belonging to the MICHE-I dataset only (see the last column in Table 3.4). Thus, our dataset consists of 2,870 images corresponding to 333 subjects acquired using 12 camera sensors. Next, we describe the experiments conducted in this work.
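The DCT-domain manipulation underlying Algorithms 3 and 4 can be sketched as follows. This is a simplified illustration rather than the exact implementation: the candidate images are assumed to be pre-resized to the source resolution (Algorithm 4 instead resizes the averaged high-frequency component using bicubic interpolation), and the anti-diagonal frequency split is approximated with an index mask.

```python
import numpy as np
from scipy.fft import dctn, idctn

def _high_freq_mask(shape, beta):
    """Boolean mask selecting DCT coefficients beyond a cut-off k = round(beta * min(h, w));
    this approximates the anti-diagonal split between low- and high-frequency coefficients."""
    h, w = shape
    k = int(round(beta * min(h, w)))
    rows, cols = np.indices((h, w))
    return (rows + cols) >= k

def anonymize_prnu(img, beta=0.9):
    """Sketch of Algorithm 3: zero out the high-frequency DCT coefficients and invert."""
    coeffs = dctn(img.astype(np.float64), norm="ortho")
    coeffs[_high_freq_mask(img.shape, beta)] = 0.0
    return idctn(coeffs, norm="ortho")

def spoof_prnu(src_img, target_imgs, beta=0.7):
    """Sketch of Algorithm 4: keep the source's low frequencies and add the averaged
    high-frequency content of candidate images from the target sensor."""
    src_coeffs = dctn(src_img.astype(np.float64), norm="ortho")
    src_coeffs[_high_freq_mask(src_img.shape, beta)] = 0.0   # low-frequency part of the source
    high_sum = np.zeros_like(src_img, dtype=np.float64)
    for t in target_imgs:                                    # candidates assumed pre-resized
        t_coeffs = dctn(t.astype(np.float64), norm="ortho")
        t_coeffs[~_high_freq_mask(t.shape, beta)] = 0.0      # keep only high frequencies
        high_sum += idctn(t_coeffs, norm="ortho")
    high_avg = high_sum / len(target_imgs)
    return idctn(src_coeffs, norm="ortho") + high_avg
```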
3.3.2.2 Experimental Methodology for smartphone camera de-identification

For the sensor de-identification experiments, we first computed the sensor reference patterns from the training set for each of the 12 sensors (see Table 3.4) using the three PRNU estimation schemes, viz., Enhanced PRNU, MLE PRNU and Phase PRNU. Next, we used a small number of images (10) as the validation set to compute the parameter β ∈ [0, 1] to be used for PRNU anonymization and PRNU spoofing, separately. We estimated β = 0.9 for PRNU anonymization and β = 0.7 for PRNU spoofing. The test experiments were conducted on images belonging to the MICHE-I dataset only. However, the evaluation process involved all the 12 sensor reference patterns. The experiments evaluated three PRNU estimation schemes: Enhanced PRNU,1 MLE PRNU and Phase PRNU. We used normalized cross-correlation for sensor identification. For the PRNU spoofing experiments, the source and target sensors were from the MICHE-I dataset and were either both front or both rear sensors. Thus, there were 2 × (3 × 2) = 12 PRNU spoofing experiments. Due to the significant difference in resolutions between the front and rear sensors of smartphones, we did not perform front-to-rear or rear-to-front spoofing. We selected N, i.e., the number of candidate images belonging to the target sensor (see Algorithm 4), to be the number of test images for that sensor (see the last column in Table 3.4).

1 We employed Enhancement Model III and we set the user defined threshold to 6 [19, 106].

Table 3.5: Performance of the proposed algorithm for PRNU Anonymization in terms of sensor identification accuracy (%). Results are evaluated using 3 PRNU estimation schemes. 'Original' corresponds to sensor identification using images prior to perturbation. 'After' corresponds to sensor identification using images after perturbation and 'Change' indicates the difference between the 'Original' and 'After' sensor identification accuracies. A high positive value in the 'Change' field indicates successful PRNU Anonymization.

Device Identifier / Sensor | Enhanced PRNU (Original / After / Change) | MLE PRNU (Original / After / Change) | Phase PRNU (Original / After / Change)
Device 1 UNIT I, Front  | 99.71 / 18.31 / 81.40 | 99.71 / 17.73 / 81.98 | 99.71 / 22.67 / 77.04
Device 1 UNIT I, Rear   | 99.51 / 16.06 / 83.45 | 97.32 / 16.06 / 81.26 | 98.05 / 21.69 / 76.36
Device 1 UNIT II, Front | 96.34 / 21.34 / 75.00 | 96.34 / 25.61 / 70.73 | 93.90 / 26.83 / 67.07
Device 1 UNIT II, Rear  | 94.71 / 11.76 / 82.95 | 88.24 / 14.12 / 74.12 | 87.65 / 11.76 / 75.89
Device 2 UNIT I, Front  | 100 / 3.81 / 96.19    | 100 / 5.72 / 94.28    | 100 / 13.69 / 86.31
Device 2 UNIT I, Rear   | 100 / 4.50 / 95.50    | 100 / 3.17 / 96.83    | 100 / 6.50 / 93.50
AVERAGE (Change)        | 85.75                 | 83.20                 | 79.36

For the biometric matching experiments, we considered a periocular matcher, as many of the images used in this work are partial face images. We employed the ResNet-101 [83] architecture pre-trained on the ImageNet [56] dataset for performing periocular matching. We utilized the features from layer 170, which were shown to perform the best for periocular matching in [85]. We applied Contrast Limited Adaptive Histogram Equalization (CLAHE) to the images before feeding them to the convolutional neural network. We used the cosine similarity for computing the match score between the probe and gallery images.
We performed three sets of matching experiments, viz., (i) original: both probe and gallery comprise unmodified images, (ii) after: both probe and gallery comprise modified images, and (iii) cross: the gallery images are the original samples while the probe images are the modified images; the genuine scores are computed by utilizing two sample images belonging to the same subject, i.e., the original image and the modified image, and the impostor scores are computed by taking pairs of samples belonging to different subjects. Furthermore, we conducted experiments separately for the two acquisition settings in this database: Indoor and Outdoor.

Figure 3.12: ROC curves for matching PRNU Anonymized images (columns: Front-Indoor, Front-Outdoor, Rear-Indoor, Rear-Outdoor). Each row corresponds to a different device identifier: (a) Device 1 UNIT I, (b) Device 1 UNIT II and (c) Device 2 UNIT I.

Figure 3.13: ROC curves for matching PRNU Spoofed images (columns: Front-Indoor, Front-Outdoor, Rear-Indoor, Rear-Outdoor). Here, the source sensor is Device 1 UNIT I. In this case, the target sensors are: (a) Device 1 UNIT II (top row) and (b) Device 2 UNIT I (bottom row).

Table 3.6: Performance of the proposed algorithm for PRNU Spoofing in terms of the spoof success rate (SSR) (%). Results are evaluated using three PRNU estimation schemes. A high value of SSR indicates successful spoofing.

Source Sensor             Target Sensor               Enhanced PRNU   MLE PRNU   Phase PRNU
Device 1 UNIT I FRONT     Device 1 UNIT II FRONT         100            100        100
Device 1 UNIT I FRONT     Device 2 UNIT I FRONT           96.80         100        100
Device 1 UNIT II FRONT    Device 1 UNIT I FRONT          100            100        100
Device 1 UNIT II FRONT    Device 2 UNIT I FRONT           97.56         100        100
Device 2 UNIT I FRONT     Device 1 UNIT I FRONT          100            100        100
Device 2 UNIT I FRONT     Device 1 UNIT II FRONT         100            100        100
Device 1 UNIT I REAR      Device 1 UNIT II REAR          100            100         94.08
Device 1 UNIT I REAR      Device 2 UNIT I REAR            99.44          96.90      97.46
Device 1 UNIT II REAR     Device 1 UNIT I REAR           100            100        100
Device 1 UNIT II REAR     Device 2 UNIT I REAR           100             98.82     100
Device 2 UNIT I REAR      Device 1 UNIT I REAR           100            100        100
Device 2 UNIT I REAR      Device 1 UNIT II REAR          100            100         99.83
AVERAGE                                                   99.48          99.64      99.28

3.3.2.3 Results for smartphone camera de-identification

For the sensor de-identification experiments, we used sensor identification accuracy as the evaluation metric for PRNU anonymization and the spoof success rate (SSR) as the evaluation metric for the PRNU spoofing algorithm. For PRNU anonymization, we first compute the sensor identification accuracy of the original images. Before perturbation, the images are assigned to the correct sensor with high accuracies by all three PRNU estimation schemes (see the ‘Original’ columns in Table 3.5). Next, when the sensor classifier accepts the modified images as input, the results indicate a significant degradation in the sensor identification accuracy for a majority of the cases (see the ‘After’ columns in Table 3.5). The differences in the sensor identification accuracies before and after perturbation are reported in the ‘Change’ columns in Table 3.5. An average difference (change) of 82.77% in the sensor identification accuracies between pre- and post-perturbed images is observed across the three PRNU estimation schemes evaluated in this work (Enhanced PRNU: 85.75%, MLE PRNU: 83.20% and Phase PRNU: 79.36%). The results indicate successful PRNU anonymization, thereby ensuring sensor de-identification.
The second set of results, pertaining to PRNU spoofing, reports the SSR for the perturbed images. SSR computes the proportion of perturbed images that are assigned to the target sensor. The results in Table 3.6 indicate successful spoofing with respect to all the PRNU estimation schemes considered in this work. An average SSR of 99.48% is observed when evaluated using Enhanced PRNU, 99.64% when evaluated using MLE PRNU, and 99.28% when evaluated using Phase PRNU across all 12 PRNU spoofing experiments. The proposed spoofing approach is least successful at confounding the Phase PRNU estimation scheme, particularly when the source sensor is the rear sensor of Device 1 UNIT I and the target sensor is the rear sensor of Device 1 UNIT II. Upon analysis, we observed that the original images belonging to the Device 1 UNIT II rear sensor resulted in the lowest sensor identification accuracy for all three PRNU estimation schemes (see Table 3.5). We speculate that the images may contain some artifacts that interfere with reliable PRNU estimation as well as with the spoofing process. Therefore, we performed another experiment where we increased the value of the perturbation parameter from 0.7 to 0.9 for that particular spoofing experiment, and we observed that the SSR increased to 100% for all 3 PRNU estimation schemes. However, visual analysis reveals that the spoofed images resulting from the two parameter values have perceptible differences (a value of 0.9 results in a more blurred image than a value of 0.7).

Finally, we studied the performance of our PRNU spoofing algorithm when a smaller number of candidate images is employed (50%, 10% and 1% of the test set). Surprisingly, even when only 1% of the test set is used as candidate images, i.e., only 4 candidate images, we observed an average SSR of 99.6% across the three PRNU estimation schemes. However, the spoofed images are significantly degraded as they contain some spurious scene details from the candidate images (possibly, the averaging operation in Step 17 of Algorithm 4 suppresses scene details more aggressively when a larger number of candidate images is used).

Next, we report the results for the periocular biometric recognition experiments. The periocular matching experiments indicate the preservation of the biometric utility of the images for both PRNU anonymized and PRNU spoofed images. The ROC curves corresponding to the ‘Original’ and ‘After’ matching experiments are within 1% of each other. Figure 3.12 presents the ROC curves for images subjected to PRNU anonymization. Note that the Samsung Galaxy S4 results in overall lower periocular matching performance even for the original images. The cross-matching experiments result in a perfect match (100%) for a majority of the cases, barring the Samsung Galaxy S4 sensor.

Figure 3.14: ROC curves for matching PRNU Spoofed images (columns: Front-Indoor, Front-Outdoor, Rear-Indoor, Rear-Outdoor). Here, the source sensor is Device 1 UNIT II. In this case, the target sensors are: (a) Device 1 UNIT I (top row) and (b) Device 2 UNIT I (bottom row).

Figure 3.15: ROC curves for matching PRNU Spoofed images (columns: Front-Indoor, Front-Outdoor, Rear-Indoor, Rear-Outdoor). Here, the source sensor is Device 2 UNIT I. In this case, the target sensors are: (a) Device 1 UNIT I (top row) and (b) Device 1 UNIT II (bottom row).

The suppression of the high frequency components may also result in the removal of edges and other details, which can impact the matching performance.
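For completeness, the SSR reported in Table 3.6 reduces to a simple proportion; a trivial sketch is given below (sensor assignment itself uses the NCC-based classifier sketched earlier).

```python
# The spoof success rate (SSR) is the proportion of perturbed images that the
# sensor classifier assigns to the target sensor.
def spoof_success_rate(predicted_sensors, target_sensor):
    hits = sum(1 for p in predicted_sensors if p == target_sensor)
    return 100.0 * hits / len(predicted_sensors)
```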
For the PRNU spoofing experiments, we have presented the ROC matching curves for each smartphone device or unit used in this work in Figures 3.13, 3.14 and 3.15. The matching experiments show that the perturbation scheme used for PRNU spoofing does not degrade the biometric recognition performance.

3.3.3 Summary of the second strategy for smartphone camera de-identification

In this work, we design an algorithm that perturbs a face image acquired using a smartphone camera such that (a) sensor-specific details pertaining to the smartphone camera are suppressed (sensor anonymization); (b) the sensor noise pattern of a different device is incorporated (sensor spoofing); and (c) biometric matching using the perturbed image is not affected (biometric utility). We achieve this by applying the Discrete Cosine Transform to images and modulating the DCT coefficients to attain either PRNU anonymization or PRNU spoofing. In contrast to existing methods, which involve computation of sensor reference patterns and exhaustive parameter optimization [78, 107, 154], the proposed method is simple and achieves highly promising results. In our experiments, we considered face (partial and full) images acquired using the front and rear cameras of different smartphones, resulting in data from a total of 12 camera sensors. Our proposed method results in successful camera de-identification without compromising the biometric matching performance. An average reduction of ≈82.8% in sensor identification accuracy is reported in the case of PRNU anonymization, and an average spoof success rate of ≈99.5% is observed for PRNU spoofing across the three PRNU estimation schemes evaluated in this work.

3.4 Summary

In this chapter, we proposed two different strategies for PRNU-based sensor de-identification. The motivation behind this counter-forensic measure is two-fold: firstly, to analyze the robustness of the PRNU-based sensor identification scheme, and secondly, to serve as a privacy-preserving method, particularly for smartphone sensors. The developed strategies can achieve sensor de-identification without compromising the biometric utility of the images. The first strategy involved an iterative patch-based image perturbation routine to spoof iris sensors; it outperformed state-of-the-art sensor de-identification schemes by up to 56.40%. The second strategy involved a one-shot approach that modulated discrete cosine transform coefficients to accomplish both PRNU anonymization and PRNU spoofing in the context of smartphone sensors. The proposed approach could successfully confound multiple PRNU classifiers, resulting in an average reduction of 82.78% in terms of PRNU anonymization and an average SSR of 99.48% in terms of PRNU spoofing, while retaining biometric utility within 1%.

CHAPTER 4
JOINT BIOMETRIC-SENSOR REPRESENTATION

Portions of this chapter appeared in the following publication: S. Banerjee and A. Ross, “One Shot Representational Learning for Joint Biometric and Device Authentication,” 25th International Conference on Pattern Recognition, (Milan, Italy), January 2021.

4.1 Introduction

Biometric data such as face, fingerprint or iris images reveal information about the identity of the individual as well as the identity of the device used to acquire the data [25, 135]. In some applications, such as smartphone banking, it is necessary to authenticate both the user and the device in order to enhance security [15, 71].
This can be done by invoking two separate modules: one for biometric recognition and the other for device or sensor recognition.¹ In such cases, the system has to store two distinct templates: a biometric template denoting the identity of the user and a sensor template denoting the identity of the device. In this paper, we approach this problem by designing a joint template that can be used to authenticate both the user and the device simultaneously.

¹The terms “device” and “sensor” are used interchangeably in this paper. Thus, determining the identity of a smartphone camera (i.e., sensor) is akin to determining the identity of the smartphone (i.e., device) itself.

Our objective is as follows: given a biometric image, we would like to simultaneously recognize the individual and the acquisition device. In the process of accomplishing this objective, we address the following questions:

1. Why do we need to combine biometric and device recognition?
Smartphones are increasingly using biometrics for access control and monetary transactions. Examples include fingerprint and face recognition on iPhones and iris recognition on the Samsung Galaxy S9. Device verification² can provide assurance that the biometric sample is being acquired by an authorized device. A combined biometric and device recognition system can therefore guarantee that the right person is accessing the remote service (e.g., banking) using an authorized device.

²Typically used in a two factor authentication (2FA) protocol that combines any two of the three factors: ‘something you are’ (biometrics), ‘something you have’ (a code on the authorized device) and ‘something you know’ (a password) for additional security.

Figure 4.1: Difference between (a) methods that use separate modules for computing biometric and sensor representations, and (b) the proposed method that uses an embedding network to generate a joint biometric-sensor representation.

2. Can existing device verification techniques be used in the smartphone application scenario?
Device identification can be performed using the MAC (media access control) address, a unique networking address assigned to each device. However, in the case of smartphones that have multiple network interfaces, such as Wi-Fi, 4G, Bluetooth, etc., there can be multiple MAC addresses, which may be broadcast, making them vulnerable. Alternatively, SRAM cells can be used to deduce physically unclonable cues for device identification [15]; this is a hardware-based solution and requires access to the physical device. In a mobile banking scenario, where the verification is conducted remotely, the customer provides a biometric sample in the form of an image, and some device information, but not necessarily the physical device itself. In this scenario, hardware-based solutions will be ineffective.

3. Why do we need a joint representation?
Existing literature uses separate modules to tease out the biometric-specific and sensor-specific details from an image and perform feature-level or score-level fusion [15, 71]. However, these approaches suffer from the following limitations: (i) the overall performance is limited by the weakest recognition module, and (ii) the process may not generalize well across different biometric modalities and multi-spectral sensors.
Therefore, a joint representation that combines both the biometric and sensor-specific features present in a biometric image can offer the following advantages: (i) the joint representation is not constrained by the performance of an individual recognition module, and the same method can be employed across different biometric modalities, and (ii) the joint representation integrates the biometric and sensor representations into a compact template such that the individual templates cannot be easily de-coupled; this implicitly imparts privacy to the biometric component.

4.2 Proposed Method

An image contains both low frequency and high frequency components. For example, in a face image, the low frequency components capture the illumination details while the high frequency components capture the structural details of the face that are useful for biometric recognition. Recently, sensor recognition has been successfully accomplished using Photo Response Non-Uniformity (PRNU) [112] for different types of sensors, such as DSLR sensors [42], smartphone sensors [22], and also near-infrared iris sensors [154]. PRNU is a form of sensor pattern noise in an image that manifests due to anomalies during the fabrication process and is, therefore, unique to each sensor. Typically, PRNU resides in the high frequencies, which can be useful for sensor recognition [112]. Since the high frequencies dominate in both the biometric and sensor representations, we hypothesize that there is a joint representation that, if effectively extracted, can be utilized for both tasks of biometric and sensor recognition. Our objective is to learn this joint representation, which lies at the intersection of the sensor and biometric spaces. Mathematically, it can be represented as J(X) = B(X) ∩ S(X), where X is an input biometric image, B(·) is the biometric representation extracted from X, S(·) is the sensor representation computed from the same input X, and J(·) is the joint representation. Existing methods process X using two independent routines to extract the two representations, and can optionally perform fusion, either at the feature level or at the score level, to make a decision. However, we propose to leverage the two representations to derive a joint representation (see Figure 4.1). The joint space can be best approximated using an embedding network that can convert images to compact representations [161]. The embedding network E takes two inputs, X and the dimensionality (d) of the embedding to be generated, such that J(X) = E(X, d) ≈ B(X) ∩ S(X). The second argument, d, allows us to regulate the dimensionality of the joint representation, which will be much smaller than the original dimensionality of the image, as well as the combined dimensionality of the two representations computed separately, i.e., if X ∈ Rⁿ, B(X) ∈ Rᵇ and S(X) ∈ Rˢ, then the joint representation J(X) ∈ Rᵈ, where d ≪ n and d < (b + s). In this work, we used a deep convolutional neural network that serves the role of the embedding network (see Figure 4.2). The embedding network consists of two 2-D convolutional layers and three linear layers. We used max-pooling for down-sampling the feature maps and a parametric rectified linear unit (PReLU) as the activation function. The embedding network accepts an image, resized to 48 × 48, as the input and produces an 8-dimensional output, which is the joint representation. The choice of the dimensionality of the representation, along with the experimental setup, is described later (see Section 4.3.3).
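A minimal PyTorch sketch of an embedding network with this structure is shown below; the channel widths, kernel sizes and hidden layer sizes are illustrative assumptions, since only the number of layers, the activation, the input size and the output dimensionality are fixed by the description above.

```python
# Sketch of the embedding network: two 2-D convolutional layers, three linear
# layers, PReLU activations, max-pooling for down-sampling, a 48x48 input and
# an 8-dimensional joint embedding. Layer widths/kernels are assumptions.
import torch
import torch.nn as nn

class JointEmbeddingNet(nn.Module):
    def __init__(self, emb_dim=8, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5), nn.PReLU(),
            nn.MaxPool2d(2),                    # 48x48 -> 44x44 -> 22x22
            nn.Conv2d(32, 64, kernel_size=5), nn.PReLU(),
            nn.MaxPool2d(2),                    # 22x22 -> 18x18 -> 9x9
        )
        self.embed = nn.Sequential(
            nn.Linear(64 * 9 * 9, 256), nn.PReLU(),
            nn.Linear(256, 64), nn.PReLU(),
            nn.Linear(64, emb_dim),             # 8-D joint representation
        )

    def forward(self, x):                       # x: (batch, channels, 48, 48)
        f = self.features(x)
        return self.embed(f.flatten(1))
```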
The main contributions of this work are as follows:

1. We propose a method to learn a joint biometric and sensor representation using a one-shot approach that can be used in joint identification and joint verification scenarios. A correct joint identification/verification occurs only if both the subject identity and the device identity yield correct matches.

Figure 4.2: Outline of the proposed method used for computing the joint biometric and sensor representation. Input: a single image, or a pair of images, or 3-tuple images to the embedding network. Output: joint biometric-sensor representation. The embedding network is trained in three mutually exclusive modes, viz., classical mode (top row), siamese mode (middle row) and triplet mode (bottom row). The switching circuit selects only one training mode at a time.

2. We employ an embedding network that can learn the joint representation irrespective of the biometric modality and the sensor used for acquisition. In this context, we evaluate the proposed method using three different biometric modalities (face, iris and periocular), and different types of sensors (iris sensors operating in the near-infrared spectrum and smartphone camera sensors operating in the visible spectrum).

3. We perform extensive experiments using different training paradigms and loss functions, and compare the proposed method with existing state-of-the-art algorithms for biometric and sensor recognition.

Table 4.1: Dataset specifications used in this work. We used three datasets corresponding to 3 biometric modalities, viz., iris, periocular and face. Here, we perform joint biometric and sensor recognition, so the total number of classes is computed as the product of the number of subjects and the number of sensors. (*The MICHE-I dataset has a total of 75 subjects, out of which the first 48 subjects were imaged using iPhone 5S UNIT I and the remaining 27 subjects were imaged using iPhone 5S UNIT II, as observed in [71]. Here, ‘UNIT’ refers to two different units of the same brand and model iPhone 5S and, therefore, they should be treated as two different smartphones. In this case, the number of classes is 375 since only a subset of the total 75 subjects were imaged using either of the two units of the iPhone 5S smartphone at a time.)

Modality     Dataset         Name of sensors                                                           (#Subjects, #Sensors, #Classes)   #Images (Train / Test)
Iris         CASIA-Iris V2   CASIA IrisCAM-V2, OKI IrisPass-h                                          (60, 2, 120)                      1,680 / 720
Periocular   MICHE-I         Apple iPhone 5S (Front and Rear) UNITS I and II, Samsung Galaxy S4        (75, 5 (7), 375*)                 2,278 / 863
                             (Front and Rear), Samsung Galaxy Tab GT2 (Front)
Face         OULU-NPU        Samsung Galaxy S6 Edge, HTC Desire EYE, MEIZU X5, ASUS Zenfone Selfie,    (55, 6, 330)                      5,940 / 2,970
                             Sony XPERIA C5 Ultra Dual, OPPO N3
TOTAL                                                                                                  (190, 13, 825)                    14,451

4.3 Experiments

4.3.1 Datasets

In this work, we focused on three different biometric modalities, viz., iris, periocular and face. To this end, we used three different datasets: (i) the CASIA-Iris Image Database Version 2 [6], which contains near-infrared iris images acquired using two sensors, (ii) the Mobile Iris Challenge Evaluation (MICHE-I) dataset [117], which contains partial face images acquired using two smartphones (front and rear sensors separately) and the front camera of a tablet, and (iii) the OULU-NPU dataset [36], which contains face images acquired using the front sensors of six smartphones. We used only bonafide images from the OULU-NPU dataset. Table 4.1 describes the datasets used in this work. Note that the smartphone datasets (MICHE-I and OULU-NPU) contain images acquired in the visible spectrum.
4.3.2 Evaluation Protocol

Before we describe the experiments, we present the protocol that is used to evaluate the proposed approach. We evaluate the method in two scenarios, viz., (i) joint identification and (ii) joint verification. The terms joint identification and joint verification are different from the terms used conventionally in the biometric literature. In the case of joint identification, a correct identification occurs only when both the sensor and subject labels of the test sample match the ground truth labels. To perform evaluation in the joint identification scenario, we select one embedding from each class (a class combines both the sensor and subject labels) to form the gallery, and the remaining embeddings are used as probes. We use two metrics to compute the distance or similarity between the probe and gallery embeddings and select the top three matches: (i) the standardized Euclidean distance (the pairwise Euclidean distance divided by the standard deviation) and (ii) the cosine similarity. We plot the cumulative match characteristic (CMC) curves corresponding to the top three ranks. In the case of joint verification, two joint representations yield a match only if both embeddings belong to the same sensor and the same subject; otherwise a mismatch occurs. An incorrect match can occur in three cases: (i) if the two joint representations belong to the same subject, but different sensors, (ii) if the two joint representations belong to the same sensor, but different subjects, and (iii) if the two joint representations belong to different subjects and different sensors. To perform evaluation in the joint verification scenario, we compute the distance or similarity between all the test embeddings and present receiver operating characteristic (ROC) curves to indicate the joint verification performance. We also report the true match rate (TMR) values at 1% and 5% false match rates (FMR).

4.3.3 Experimental Settings

In this work, we designed the experimental settings using three different modes of training. Let eᵢ denote the output of the embedding network for input Xᵢ, i.e., eᵢ = E(Xᵢ, d). In the first mode, referred to as the classical mode, the embedding eᵢ is fed to a classification network which minimizes the cross-entropy loss computed between the ground truth label and the predicted label. The classification network in our case is a shallow network which applies a PReLU activation on the embedding, followed by a fully-connected layer, and then applies softmax to compute a probability value. We assigned the ground truth label yᵢ for the i-th image as the tensor product (⊗) of its subject identifier and its sensor identifier; the cardinality of the resulting label set L is therefore |L| = |S × D|, where S and D denote the sets of subject and sensor identifiers, respectively. In the second mode, referred to as the siamese mode, a siamese network [37] is used which feeds a pair of images to the embedding network. The embedding network then computes a pair of embeddings (eᵢ, eⱼ), and the siamese network is trained by minimizing the contrastive loss [44] computed between the pair of embeddings. We used single margin (SMCL) and double margin (DMCL) contrastive losses. Finally, in the third mode, referred to as the triplet mode, a triplet network [86] is trained using embeddings generated from an anchor (Xₐ), a positive (Xₚ) and a negative (Xₙ) sample by minimizing the triplet loss [145].
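A minimal sketch of the joint (subject, sensor) label construction used in the classical mode is shown below; the integer encoding is an implementation choice, not prescribed by the thesis.

```python
# Every (subject, sensor) combination is its own class, so the label set has
# (number of subjects) x (number of sensors) elements.
def joint_label(subject_idx, sensor_idx, num_sensors):
    """Map a (subject, sensor) pair to a single class index."""
    return subject_idx * num_sensors + sensor_idx
```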
We performed offline triplet mining as well as online triplet mining [143] with different triplet selection strategies (random negative, semi-hard negative and hardest negative triplet selection). The triplet loss considers only one negative example at a time. Alternatively, the multi-class N-pair loss function [148] considers multiple negative instances from several classes. In this work, we consider a positive example to be one which belongs to the same class as the anchor (same subject and same sensor), whereas there can be three types of negative examples, viz., same subject but different sensor, same sensor but different subject, and different sensor with different subject. Figure 4.2 illustrates the proposed method. Given these three types of negatives, the number of negative classes in this work is significantly high, so we used the multi-class N-pair loss with two mining techniques: (i) all positive pairs and (ii) hard negative pairs. Table 4.2 summarizes the different loss functions used in the three training modes in this work. Note that each input to the embedding network, as shown in Figure 4.2, is mutually exclusive, i.e., the embedding network can operate independently in any of the three training modes.

Table 4.2: Description of the training modes and the loss functions used in this work.

Training mode   Loss function
Classical       Cross entropy
Siamese         Single margin contrastive loss (SMCL); Double margin contrastive loss (DMCL)
Triplet         Offline triplet mining; Online triplet mining (Random negative, Semi-hard negative, Hardest negative); Multi-class N-pair (All positive pair, Hard negative pair)

We modified the design of an existing embedding network for implementing the different training paradigms [9]. We used a learning rate of 1 × 10⁻⁴, a batch size of 4, the Adam optimizer, and a step decay to reduce the learning rate by a factor of 0.1 every 8 epochs. The proposed network is shallow, so we trained it for only 50 epochs. The margin values in the single margin contrastive loss and the triplet losses are set to 1, while in the double margin contrastive loss both margins are set to 0.5.

For each dataset, we used a training set and a test set (see Table 4.1). The number of classes is computed as the product of the number of sensors and the number of subjects in that dataset. For example, the CASIA-Iris V2 dataset has 60 subjects and 2 sensors, so the total number of classes is 60 × 2 = 120. Each class has 20 images; therefore, the total number of images (samples) is 2,400 (20 × 120). The training and test partitions follow a 70:30 split. So, for a single class, out of 20 samples, 14 samples are randomly selected as the training set and the remaining 6 samples form the test set. A similar protocol is followed for the remaining datasets.

Figure 4.3: Variation in the joint verification performance as a function of the dimensionality of the joint representation. The experiment is conducted on the validation set using 50 images from the MICHE-I dataset and four dimensionality values, viz., {4, 8, 16, 32}. The 8-dimensional embedding resulted in the highest joint verification accuracy and is therefore selected in this work.

Next, in the training phase, the embedding network accepts an image (resized to 48 × 48) as input. Different image resolutions were tried, {28 × 28, 48 × 48, 96 × 96}, but 48 × 48 provided the optimal trade-off between accuracy and training time.
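For concreteness, a minimal sketch of the single-margin contrastive loss used in the siamese mode is shown below, with the margin of 1 noted above; the binary label convention (1 for a genuine pair, i.e., same subject and same sensor) is an assumption of the sketch.

```python
# Single-margin contrastive loss (SMCL): pull genuine pairs together and push
# impostor pairs apart up to `margin`.
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, label, margin=1.0):
    """label: float tensor with 1 for genuine pairs and 0 for impostor pairs."""
    d = F.pairwise_distance(emb_a, emb_b)
    loss_pos = label * d.pow(2)
    loss_neg = (1 - label) * torch.clamp(margin - d, min=0).pow(2)
    return 0.5 * (loss_pos + loss_neg).mean()
```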
The embeddings are trained in the (i) classical, (ii) siamese and (iii) triplet modes. Then, in the testing phase, we computed the embeddings from the test set and evaluated them in the joint identification and joint verification scenarios.

Although deep learning-based sensor identification methods exist in the literature [14, 69, 116], we used Enhanced PRNU [106] (with enhancement Model III) as the sensor identification baseline for all three modalities due to its low computational burden and its effectiveness on multi-spectral images [19]. Enhanced PRNU requires the creation of sensor reference patterns, which serve as the gallery, and test (probe) noise residuals, which are correlated with the reference patterns. We used the training images to compute the sensor reference patterns and the test images for correlation. A test image is assigned to the sensor class resulting in the highest correlation value. See [22] for more details. Test noise residuals computed from JPEG images can be matched successfully against sensor reference patterns computed from RAW images [155], thereby justifying the use of PRNU as a state-of-the-art sensor identification baseline. We used a COTS matcher as the biometric recognition baseline for the iris and face modalities. For the periocular modality, we used a pretrained ResNet-101 architecture [85] and used the features from layer 170 as the biometric representation for the test samples. This particular architecture is used because it has demonstrated good performance in biometric verification on the MICHE-I dataset [22]. The gallery comprises the training images and the probes are the test images. Since PRNU can only be used for the task of sensor identification, we implemented both baselines only in the identification scenario.

We further conducted an experiment using a validation set comprising 50 images from the MICHE-I dataset (excluded from the test set) to analyze the effect of the dimensionality of the embedding on the verification performance. To this end, we used four dimensionality values, {4, 8, 16, 32}, and then selected the value that results in the highest performance for the remaining experiments.

Figure 4.4: 2-D projection of the embeddings using t-SNE for sensor identification in the OULU-NPU dataset. Each sensor class is sufficiently discriminated from the rest of the sensors.

4.4 Results and Analysis

4.4.1 Selection of the metric and dimensionality of embedding

In terms of the choice of the distance/similarity metric, we observed that the standardized Euclidean distance metric resulted in better performance compared to the cosine similarity metric. This can be attributed to the standardization process, which takes into account the intra-class and inter-class variations in the embeddings. In terms of the choice of the dimensionality of the embedding, we observed that 8 is the optimal value, since it resulted in the best performance (64% on the MICHE-I validation set), as indicated in Figure 4.3. Therefore, we used the 8-dimensional embedding and the standardized Euclidean distance metric for all the experiments. Furthermore, we present the t-SNE [156] visualization of the performance of the embedding network in terms of sensor identification for the OULU-NPU dataset in Figure 4.4. The well-separated clusters corresponding to the six sensors demonstrate the capability of the embedding network used in this work.
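A small sketch of the two probe–gallery scoring schemes compared here is given below, assuming the embeddings are stacked as rows of NumPy arrays; SciPy's 'seuclidean' metric implements the standardized Euclidean distance directly.

```python
# Standardized Euclidean distance (per-dimension variance estimated from the
# stacked embeddings by SciPy's default) vs. cosine similarity.
import numpy as np
from scipy.spatial.distance import cdist

def score_embeddings(probe, gallery):
    """probe, gallery: arrays of shape (num_probe, d) and (num_gallery, d)."""
    std_euclidean = cdist(probe, gallery, metric='seuclidean')   # smaller = better
    cosine_sim = 1.0 - cdist(probe, gallery, metric='cosine')    # larger = better
    return std_euclidean, cosine_sim
```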
4.4.2 Performance of each of the three training modes

In terms of training algorithms, the overall results in both the joint identification and joint verification scenarios indicate that the embedding network trained in the siamese mode outperformed the remaining training paradigms (see Figures 4.5 and 4.6). The superior performance of the siamese network can be attributed to the use of the contrastive loss. Of the two contrastive losses, the single margin contrastive loss outperformed the double margin contrastive loss. The contrastive loss considers a pair of embeddings at a time, and tries either to minimize the distance between them if they belong to the same class, or to increase the distance between them by some margin if they belong to different classes. On the other hand, the triplet loss tries to simultaneously minimize the distance between the anchor and the positive sample and maximize the distance between the anchor and the negative sample. In this work, the number of negative classes is very high (in a 330-class dataset, 1 class is positive and the remaining 329 classes are negative). This makes the task of the triplet loss much more complex compared to the contrastive loss. Given the huge variation in the possible combinations of negative triplets (see Figure 4.2), we suspect that the triplet loss struggled to determine an accurate decision boundary between the positive and negative classes, resulting in an overall reduction in performance. We investigated different types of triplet mining strategies, and observed that online triplet mining outperformed offline triplet mining and the multi-class N-pair loss in a majority of the cases.

Table 4.3: Results in the joint identification scenario. Results are reported in terms of Rank 1 identification accuracies (%). A correct joint identification implies that both the sensor and the subject resulted in a match. A mismatch of either the subject or the sensor, or both, results in an incorrect joint identification.

Dataset         Method for baseline   Sensor identification (%)   Biometric identification (%)   Joint identification (%)   Proposed method (%)
CASIA-Iris V2   PRNU / COTS           100.00                      56.52                          56.52                      89.67
MICHE-I         PRNU / ResNet-101     99.86                       18.05                          18.05                      47.53
OULU-NPU        PRNU / COTS           98.48                       84.24                          83.13                      99.81

4.4.3 Results of the joint identification experiment

In terms of the performance in the joint identification scenario, Table 4.3 compares the results with the baseline performance for all the datasets. We report the baselines for sensor identification (PRNU) and biometric identification (COTS or ResNet-101), followed by joint identification, separately.

Table 4.4: Results in the joint verification scenario. Results are reported in terms of the true match rate (TMR) at false match rates (FMRs) of 1% and 5%.

Dataset         TMR@FMR=1%   TMR@FMR=5%
CASIA-Iris V2   90.00        98.00
MICHE-I         62.00        90.00
OULU-NPU        100.00       100.00

We reiterate that joint identification counts a correct match only if both the sensor and subject labels are correct, to allow a fair comparison with the proposed method. Results indicate that the proposed method outperformed the baseline (joint identification) by 26.41%, averaged across all three datasets and computed at Rank 1. The poor performance on the MICHE-I dataset can be attributed to two factors: firstly, the large number of classes (375) compared to the rest of the datasets (see Table 4.1), and secondly, the diverse acquisition settings (indoor vs.
outdoor), resulting in degraded biometric recognition and, subsequently, overall poor performance. Surprisingly, the proposed method can still outperform the baseline by ∼30%. We further analyze this performance in Section 4.4.5. The CMC curves indicate the superior performance of the siamese network in contrast to the classical and triplet networks.

Figure 4.5: Cumulative Matching Characteristics (CMC) curves for the proposed method in the joint identification scenario for the following datasets used in this work: (a) CASIA-Iris V2, (b) MICHE-I and (c) OULU-NPU. Refer to Table 4.2 for the different training networks and loss functions indicated in the legend in an identical order.

4.4.4 Results of the joint verification experiment

In terms of the performance in the joint verification scenario, Table 4.4 reports the results. The proposed method achieved an average joint TMR of 84% at 1% FMR, and an average TMR of 96% at 5% FMR, indicating the strong representative capability of the joint representation. The ROC curves in Figure 4.6 indicate that the joint representation learnt using the siamese network trained with the single margin contrastive loss (see the curve marked Siamese-SMCL-Emb[Joint]) outperformed the remaining joint representations. We would like to point out that in [71], the authors achieved 23% (by using feature-level fusion) and 86% (by using score-level fusion) at 5% FMR on the MICHE-I dataset (the authors excluded the Samsung Galaxy Tab 2 subset of the MICHE-I dataset, which we included in our evaluations). Although their objectives were different from those of the proposed work (they adopted a fusion rule for integrating their proposed biometric and sensor recognition performances), we would like to indicate that the task of joint recognition is difficult. In spite of that, the proposed method performed reasonably well.

4.4.5 Analysis of the performance of the proposed method on the MICHE-I dataset

In both the joint identification and joint verification experiments, we observed that the performance of the proposed method evaluated on the MICHE-I dataset was relatively worse compared
We posit that the poor performance can be attributed to two reasons: (i) the image characteristics, and (ii) the variation in the performance across different lateralities, i.e., left vs. right periocular images. MICHE-I dataset was assembled as a part of an iris challenge evaluation and contains images acquired in unconstrained settings (indoor and outdoor settings) having occlusions (specular reflection and downward gaze). See some challenging images from the MICHE-I dataset images in Figure 4.7. In contrast, CASIA and OULU datasets contain images acquired in controlled settings. We presented the CMC curves corresponding to joint identification results for two lateralities separately in Figure 4.8. Results indicate that the proposed method performed better on left periocular images compared to right periocular images. This variation in the performance across 86 10-410-2100102False Match Rate (%)020406080100True Match Rate (%)Embedding[Joint]Siamese-SMCL-Emb[Joint]Siamese-DMCL-Emb[Joint]Triplet-Offline-Emb[Joint]OnlinePair-AllPositive-Emb[Joint]OnlinePair-HardNeg-Emb[Joint]OnlineTL-Random-Emb[Joint]OnlineTL-HardNeg-Emb[Joint]OnlineTL-SemiHard-Emb[Joint]10-410-2100102False Match Rate (%)020406080100True Match Rate (%)Embedding[Joint]Siamese-SMCL-Emb[Joint]Siamese-DMCL-Emb[Joint]Triplet-Offline-Emb[Joint]OnlinePair-AllPositive-Emb[Joint]OnlinePair-HardNeg-Emb[Joint]OnlineTL-Random-Emb[Joint]OnlineTL-HardNeg-Emb[Joint]OnlineTL-SemiHard-Emb[Joint]10-410-2100102False Match Rate (%)020406080100True Match Rate (%)Embedding[Joint]Siamese-SMCL-Emb[Joint]Siamese-DMCL-Emb[Joint]Triplet-Offline-Emb[Joint]OnlinePair-AllPositive-Emb[Joint]OnlinePair-HardNeg-Emb[Joint]OnlineTL-Random-Emb[Joint]OnlineTL-HardNeg-Emb[Joint]OnlineTL-SemiHard-Emb[Joint] (a) (b) (c) (d) Figure 4.7: Example images from the challenging MICHE-I dataset. (a) Occlusion, (b) Downward gaze and specular reflection, (c) Prominent background in the outdoor setting and (d) A single image containing both eyes but labeled as right eye image (061_GT2_OU_F_RI_01_3 where, RI indicates right eye). the two lateralities resulted in the overall poor performance on the entire MICHE dataset. MICHE dataset has an imbalanced distribution of lateralities, only 30% of the total number of subjects contain left periocular images. We hypothesize that the imbalanced distribution coupled with some mislabeled test case (see Figure 4.7(d)) may have further compounded the challenges, resulting in an overall poor performance. The main findings from the experiments are as follows: 1. The joint biometric and sensor representation performed well in both joint identification scenario, with an average identification accuracy of ∼80 computed at Rank 1, and an average joint verification accuracy of 96% at a false match rate of 5%, averaged across the three biometric modalities. 2. The representation is robust across three modalities (iris, face and periocular), and different sensors (near-infrared iris sensors and visible smartphone sensors). 3. The joint embedding outperformed baselines that used state-of-the-art commercial biometric matchers and sensor identification schemes across three datasets corresponding to three biometric modalities and multi-spectral sensors. 87 (a) (b) Figure 4.8: Cumulative Matching Characteristics (CMC) curves for the proposed method in the joint identification scenario for the MICHE-I dataset evaluated separately on the two lateralities, i.e., on the (a) Left periocular images and on the (b) Right periocular images. 
Results indicate that the proposed method performs better on the left periocular images compared to the right periocular images.

4.5 Summary

In this chapter, we proposed a one-shot method to simultaneously authenticate the user and the device from a single image, say a face or an iris image. To accomplish this task, we developed a method to learn a joint representation that can be used for combined biometric and sensor (device) recognition. The joint representation can be used in remote application scenarios (e.g., remote banking) that employ multiple factor authentication. Additionally, the joint representation is implicitly privacy-preserving, as the biometric and sensor representations cannot be trivially separated. We evaluated the proposed approach on multiple datasets belonging to three different biometric modalities (iris, face and periocular) in both the joint identification and joint verification scenarios. We observed the best results on face images, with a joint identification accuracy of 99.81% at Rank 1 and a joint verification accuracy, i.e., True Match Rate, of 100% at 1% False Match Rate using the proposed method.

CHAPTER 5
IMAGE PHYLOGENY TREE FOR NEAR-DUPLICATE BIOMETRIC IMAGES

Portions of this chapter appeared in the following publication: S. Banerjee and A. Ross, “Computing an Image Phylogeny Tree from Photometrically Modified Iris Images,” 3rd International Joint Conference on Biometrics, (Denver, USA), October 2017.

5.1 Introduction

In the previous chapters, we focused on the sensor-based forensic analysis of biometric images. To this end, we presented methods that can be used for identifying biometric sensors, followed by sensor de-identification. From this chapter onward, we delve into the content-based forensic analysis of biometric images. We focus on the particular problem of image phylogeny in the context of biometric images. We will primarily explore the face and iris modalities, but we begin with the iris modality in this chapter.

The performance of a biometric recognition system, say an iris recognition algorithm, naturally depends on the quality of the iris image. A photometrically modified iris image may adversely affect the iris recognition performance [136]. An iris image may be subjected to a sequence of photometric transformations such as brightening, contrast adjustment and gamma correction, resulting in a family of transformed images, all of which are directly or indirectly related to the original image. Examples of such photometric transformations are presented in Figure 5.1.

In this work, we explore the feasibility of determining the relationship between a set of photometrically modified NIR iris images using image forensic principles. In some applications, it may be necessary to automatically deduce the relationship between such transformed images in order to determine the structure of image evolution [60] and to represent it in the form of an Image Phylogeny Tree (IPT).
An IPT is a tree-like structure depicting the hierarchical relationship between a family of transformed images. Each image is represented as a node, and an edge exists between a pair of nodes if one image is derived from the other. The source image is then termed the parent node and the transformed image is referred to as the child node. The relationship between the child and parent nodes can be modeled using a parametric function.

Figure 5.1: Example of photometric transformations applied to an NIR iris image. (a) Original image, (b) brightness adjusted image, and (c) contrast adjusted image.

In this work, our objective is as follows. For a given set of photometrically transformed near-duplicate iris images, (a) determine the transformation parameters between pairs of images in the set and (b) represent the relationship between the related images in the form of an IPT. We assume that no prior knowledge about the transformations applied to the images is available. The first objective is achieved by exploring the use of three models for modeling the transformation between pairs of images: (a) a global linear model; (b) a local linear model; and (c) a global quadratic model. The second objective is achieved by using the estimated parameters of the transformation between pairs of images, along with a variant of the Oriented Kruskal algorithm [61], to compute a Directed Acyclic Graph (DAG) that is used to represent the IPT (IPT-DAG). Further, we conduct experiments to validate the relevance of the three models in representing some popular photometric transformations.

The principal contributions of our work are as follows: (a) employing three parametric models, namely, the global linear, global quadratic and local linear models, to obtain the best fit to some popular photometric transformations such as brightness adjustment, Gaussian smoothing, contrast limited adaptive histogram equalization (CLAHE), median filtering and gamma correction; (b) estimating the parameters of the three models, and further using the estimated parameters to compute an asymmetric dissimilarity measure for constructing the IPT-DAG; and (c) evaluating the performance of the proposed method in terms of IPT reconstruction accuracies for different tree configurations.

5.2 Proposed Approach

Consider a family of photometrically transformed images denoted by the set {I₁, · · · , Iₙ}. Our objective is to construct an IPT as shown in Figure 5.2. This entails computing an asymmetric dissimilarity measure between every image pair (Iᵢ, Iⱼ), where i, j = 1, · · · , n. The dissimilarity measure is computed from the parameters θ of a transformation model, φ(I | θ), that relates Iᵢ and Iⱼ. The pipeline of the proposed approach, as presented in Figure 5.2, can be broadly divided into two steps. The first step constitutes the parameter estimation process for the parameterized transformation models. The second step is the construction of the IPT-DAG using the results from the first step.

Figure 5.2: General framework for parameter estimation and IPT reconstruction from a set of near-duplicate and related iris images.

5.2.1 Parametric Transformations

In order to estimate the transformation between a pair of images, three different parametric models are considered: (a) the global linear (GL) model, (b) the local linear (LL) model, and (c) the global quadratic (GQ) model. The global models assume the application of the same set of transformation parameters to every pixel.
On the other hand, the local model assumes that the image is tessellated into several non-overlapping patches, and each patch is subjected to a different set of transformation parameters (see Figure 5.3). Different parametric models are considered in this work because some commonly used image enhancement and denoising operations are global (e.g., brightening) while others are applied in patches (e.g., CLAHE). Next, we discuss the three different models, the parameters associated with each of them, and their respective parameter estimation processes.

Figure 5.3: Illustration of global model optimization vs. local model optimization. (a) The global model optimizes with respect to the entire image, (b) the local model optimizes with respect to each of the tessellated patches in the image, and (c) local optimization for a pair of tessellated images.

5.2.1.1 Global Linear (GL) Model

The GL transformation model is denoted in Eqn. (5.2.1), where a represents the multiplicative coefficient and b represents the bias or offset term:

φ(I | a, b) = aI + b.   (5.2.1)

The parameters for the global linear model are θ_GL = [a, b]. We describe the inverse compositional (IC) update rule [16, 26, 111] for estimating the parameters of the GL photometric model. For an image pair (Iᵢ, Iⱼ), where Iᵢ is the source image and Iⱼ is the target image, we consider that Iⱼ has been subjected to some photometric transformation operation, φ(Iⱼ | θ). We want to estimate the parameters of the transformation model, θ, such that it results in the minimum photometric error (PE) between φ(Iⱼ | θ) and Iᵢ. The parameter estimation involves minimizing the PE between the image pair using gradient descent. The PE serves as the vector-valued objective function, formulated as

θ* = arg min_θ ‖Iᵢ − φ(Iⱼ | θ)‖²₂.   (5.2.2)

The objective function is minimized in an iterative procedure with respect to the incremental parameters Δθ, resulting in the optimal parameters θ*. Eqn. (5.2.2) requires the minimization to be performed in the photometric space of the target image. The solution to Eqn. (5.2.2) involves computation of the gradient, which needs to be performed in every iteration, making the procedure computationally intensive. An elegant solution to this problem can be obtained by minimizing the objective function in the photometric space of the source image. To achieve this, the IC update rule [26] is employed, which can be written as

φ(· | θ) ← φ̂⁻¹(φ(· | θ) | Δθ),   (5.2.3)

where φ̂(· | Δθ) is an incremental transformation with respect to the parameters Δθ. The primary objective is to express the updated transformation in terms of the composition of the current transformation, φ(· | θ), and the inverse of the incremental transformation, φ̂⁻¹(· | Δθ). After applying the IC update rule to Eqn. (5.2.2), the least squares problem can be written as

min_Δθ ‖φ̂(Iᵢ | Δθ) − φ(Iⱼ | θ)‖²₂.

Let Wⱼ = φ(Iⱼ | θ). Thus, the equation becomes

min_Δθ ‖φ̂(Iᵢ | Δθ) − Wⱼ‖²₂.   (5.2.4)

The solution of the above expression requires the use of a first order Taylor expansion, which introduces the Jacobian of the photometrically transformed source image, φ̂(Iᵢ | Δθ). Applying the Taylor expansion to Eqn. (5.2.4) results in

min_Δθ ‖Iᵢ + J_I · Δθ − Wⱼ‖²₂ = min_Δθ ‖J_I · Δθ − E‖²₂.

Here, J_I represents the Jacobian matrix, and the difference vector E, also known as the error image, indicates Wⱼ − Iᵢ. For the GL model, the Jacobian is computed as

J_I = [∂φ(I | a, b)/∂a, ∂φ(I | a, b)/∂b] = [I_v, 1].
Here, 1 is a column vector of all ones of the same length as I_v, which is the vectorized form of the image I, such that I_v = vect(I). Thus, the Jacobian depends only on the source image pixels. The incremental parameter vector is solved using the following equation:

Δθ = (J_Iᵀ J_I)⁻¹ J_Iᵀ E.   (5.2.5)

The optimal parameters θ* ← [a*, b*] are computed using the inverse update rule as follows:

a* ← a / (1 + Δa);   b* ← (b − Δb) / (1 + Δa).   (5.2.6)

5.2.1.2 Local Linear (LL) Model

Some photometric transformations (e.g., median filtering) are applied to images in patches, thus producing non-uniform changes throughout the image. A global model fails to capture such local variations. An effective solution is to apply a different transformation to each local region in an image. The process can be simplified by tessellating the image into several non-overlapping patches and applying the GL-based parameter estimation process iteratively on each of these patches. Intuitively, the LL model seeks local optimization, as opposed to the global optimization guaranteed by the GL model. For estimating the transformation parameters between a pair of images, we assume that the images are in geometric correspondence with each other. For the LL model, consider the image I to be tessellated into K equal-sized patches {c¹, · · · , c^K}, where c¹ denotes the first patch of image I. Thus, the transformation for each patch can be represented as

φ(c^k | a_k, b_k) = a_k c^k + b_k.   (5.2.7)

For the k-th patch, the transformation parameters are θ_k = [a_k, b_k]. The parameter estimation process for each pair of patches is identical to the GL model based approach. Upon estimation of the parameters for each patch, the optimal transformation parameters for the entire image are computed as their average. The aggregation is necessary due to the patch-based approach adopted in the LL model. Each patch is essentially a matrix of pixel intensity values, and some of these patches are ill-conditioned, i.e., they are close to being singular matrices, thus making parameter estimation for these patches unreliable. However, averaging the estimated parameters reduces the effect of the ill-conditioned patches and estimates the optimal parameters fairly accurately. The solution can be expressed as

θ* = [ (1/K) Σ_{k=1}^{K} a_k ,  (1/K) Σ_{k=1}^{K} b_k ].

5.2.1.3 Global Quadratic (GQ) Model

Image filtering operations are widely used to enhance iris images, which aids in iris segmentation and subsequent iris normalization. However, such operations cannot be approximated using a simple linear model. A classical example is Gaussian smoothing, which is used extensively for the purpose of image denoising. A quadratic model may better capture the non-linearities inherent to such transformations compared to simple linear models. Thus, the third model considered in this work is the global quadratic model, denoted as

φ(I | a, b, c) = aI² + bI + c,   (5.2.8)

where a, b and c represent the scalar coefficients of the transformation. The parameters for the quadratic model are θ_GQ = [a, b, c]. The least squares estimation (LSE) technique can be used for computing the coefficients of the quadratic model. Eqn. (5.2.8) can be rewritten as

φ(I | a, b, c) = [a  b  c] · [I²  I  1]ᵀ.   (5.2.9)

Eqn. (5.2.9) can be simplified by considering the following notations: t = vect(φ(I | a, b, c)), θ = [a, b, c]ᵀ and X(I) = [I²_v  I_v  1]. Substituting these notations in Eqn. (5.2.9) results in

t = X(I) · θ.   (5.2.10)

Since the output, t, can be expressed as a weighted linear combination of the input, X(I), it is linear in terms of the parameters θ.
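A minimal NumPy sketch of this ordinary least squares fit is given below (its closed-form solution is stated in Eqn. (5.2.11) immediately after); the source/target roles mirror the GL model, i.e., the target image's intensities are fit to reproduce the source image.

```python
# Sketch of the global quadratic (GQ) fit: recover theta = [a, b, c] of
# a*I^2 + b*I + c by ordinary least squares on the vectorized image pair.
import numpy as np

def estimate_gq_params(I_src, I_tgt):
    """Fit I_src ≈ a*I_tgt**2 + b*I_tgt + c and return theta = [a, b, c]."""
    x = I_tgt.astype(np.float64).ravel()
    t = I_src.astype(np.float64).ravel()
    X = np.column_stack([x**2, x, np.ones_like(x)])   # design matrix X(I) = [I^2, I, 1]
    theta, *_ = np.linalg.lstsq(X, t, rcond=None)     # closed-form least squares solution
    return theta
```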
Thus, it can be solved using linear (ordinary) least squares and has a closed-form solution. The solution for the parameters (i.e., the coefficients of the quadratic model) can be expressed as:

θ = ( X(I)ᵀ X(I) )⁻¹ X(I)ᵀ t.   (5.2.11)

In summary, for all three models, the transformation parameters are estimated in both directions: from the first image to the second image (Iᵢ → Iⱼ) and, also, from the second image to the first image (Iⱼ → Iᵢ). The magnitudes of the parameters are asymmetric in the two directions. The magnitude of the estimated parameters, computed as the 2-norm of the vector θ, is then used to compute the n × n dissimilarity matrix, D = [d(i, j)], i, j = 1, · · · , n, which quantifies the dissimilarity between every pair of images in the input set. The process of parameter estimation (for the GL model) and dissimilarity measure computation is summarized in Algorithm 5.

Algorithm 5: Asymmetric dissimilarity measure computation
1: Input: An image pair, (Iᵢ, Iⱼ)
2: Output: Dissimilarity matrix, D
3: Normalize the source image (Iᵢ) and the target image (Iⱼ) by dividing both images by the maximum pixel intensity value
4: initialization:
5:   θ ← [1, 0], PEold ← norm(Iᵢ), iter ← 0, maxiter ← 100
6: pre-computation:
7:   J ← [I_v, 1]
8:   H ← Jᵀ * J
9: loop:
10: while the change in photometric error exceeds 1 × 10⁻⁸ and iter ≤ maxiter do
11:   Wⱼ ← θ(1) .* Iⱼ + θ(2)
12:   E ← Wⱼ − Iᵢ
13:   Δθ ← H⁻¹ * Jᵀ * E;  θ ← [ θ(1)/(1 + Δθ(1)), (θ(2) − Δθ(2))/(1 + Δθ(1)) ]
14:   PEold ← norm(E)
15:   iter ← iter + 1
16: end while
17: return D(i, j) ← norm(θ)
18: PEfinal ← norm(E)
(* indicates matrix multiplication; .* indicates element-wise multiplication)

5.2.2 IPT-DAG Construction

The magnitudes of the estimated transformation parameters between every pair of images serve as the elements of the dissimilarity matrix required by the IPT-DAG construction algorithm. The IPT construction algorithm described in [61] assumes that each node has a single parent and constructs a minimal spanning tree (MST) from the dissimilarity matrix. The Oriented Kruskal algorithm constructs the MST by first sorting all the elements of the dissimilarity matrix, and then creating an edge between a pair of nodes, say u and v, directed from u → v, such that u is the parent node and v is the child node. The edge is created only if the two nodes do not already belong to the same tree, and if node v has not been assigned a parent. However, a dissimilarity matrix which is unable to successfully discriminate between the source (node u) and the target (node v) image may misclassify u as the child node of v. There is no corrective procedure to amend the reconstruction, since the local relationships are not examined by the algorithm. Consequently, this negatively impacts the IPT reconstruction accuracy. The authors in [57] used an optimum branching algorithm to remedy the above situation, which assumes an initial root node and iteratively tries different root nodes to arrive at the optimal solution; but this leads to higher algorithmic complexity. We propose relaxing the MST construction by considering the IPT as a directed acyclic graph (DAG). Our objective is to have a single attempt at IPT reconstruction, where reconstruction involves no prior knowledge about the correct root, while at the same time being able to evaluate the reconstruction accuracy using a single criterion.
5.2.2 IPT-DAG Construction

The magnitudes of the estimated transformation parameters between every pair of images serve as the elements of the dissimilarity matrix required by the IPT-DAG construction algorithm. The IPT construction algorithm described in [61] assumes that each node has a single parent and constructs a minimal spanning tree (MST) from the dissimilarity matrix. The Oriented Kruskal algorithm constructs the MST by first sorting all the elements of the dissimilarity matrix, and then creating an edge between nodes, say, u and v, directed from u → v, such that u is the parent node and v is the child node. The edge is created only if the two nodes do not belong to the same tree, and if node v has not been assigned a parent. However, a dissimilarity matrix which is unable to successfully discriminate between the source (node u) and the target (node v) image may misclassify u as the child node of v. There is no corrective procedure to amend the reconstruction, since local relationships are not examined by the algorithm. Consequently, this will negatively impact the IPT reconstruction accuracy. The authors in [57] used an optimum branching algorithm to remedy the above situation, which assumes an initial root node and iteratively tries different root nodes to arrive at the optimal solution; but this leads to higher algorithmic complexity.

We propose relaxing the MST construction by considering the IPT as a directed acyclic graph (DAG). Our objective is to have a single attempt at IPT reconstruction, where reconstruction involves no prior knowledge about the correct root, and at the same time to be able to evaluate the reconstruction accuracy using a single criterion. As such, converting both the original tree and the reconstructed tree into their respective DAG forms may also aid in better understanding the relationships between the nodes, i.e., the images within a set.

The IPT-DAG construction algorithm begins by sorting the elements of the dissimilarity matrix with respect to one node at a time. This allows searching for local relationships, in contrast to the global relationships used by the Oriented Kruskal algorithm. In every iteration, a row of the dissimilarity matrix is selected (the row index corresponding to a single node, say, v_1) and a set of potential candidates is determined from the remaining nodes {v_2, ..., v_n}. The potential nodes are the vertices which will possibly share an edge with the current node under consideration. A node (e.g., v_j, j = 2, ..., n) is considered to be a potential candidate if the magnitude of the estimated transformation parameter between the pair (v_1, v_j) is less than 5 times the minimum value of all the elements belonging to the current row (v_1) of the dissimilarity matrix under consideration. Once the potential candidates are selected, the direction of the edge is decided by comparing the parameter magnitudes in the forward and reverse directions. A lower magnitude results in an edge in the corresponding direction. The output of the algorithm is a data structure with two columns: the first column, named Child, contains the child nodes and the second column, named Parent, comprises the corresponding parent nodes. Algorithm 6 describes the steps in the IPT-DAG construction process. In our approach, we reconstruct only a single tree for a given set of images.

Algorithm 6: IPT-DAG construction
1: Input: Dissimilarity matrix D of size n × n
2: Output: IPT-DAG containing n nodes
3: for each row i in D do
4:   sort D(i, j) in ascending order, j ∈ {1, ..., n} − {i}
5:   d_min ← min(D(i, j)), j ∈ {1, ..., n} − {i}
6:   ▷ Determine the potential nodes sharing an edge with node i
7:   for each node j ∈ {1, ..., n} − {i} do
8:     if D(i, j) < 5 × d_min then
9:       mark j as a potential node
10:    end if
11:  end for
12:  ▷ Determine the direction of the edge
13:  for each potential node j do
14:    if D(i, j) < D(j, i) then
15:      Child ← j; Parent ← i
16:    else
17:      Child ← i; Parent ← j
18:    end if
19:  end for
20: end for
21: return IPT-DAG ← [Child, Parent]

5.2.3 Performance Evaluation

The reconstructed IPT-DAG is compared against the original IPT, which is also converted to a DAG by inserting links between each node and its ancestors (if not already present). The accuracy is computed as follows,

\mathrm{Accuracy} = \frac{|\,\mathrm{Original\ edges} \cap \mathrm{Reconstructed\ edges}\,|}{|\,\mathrm{Original\ edges}\,|},    (5.2.12)

where each edge e = {Parent → Child}. The IPT-DAG structure indicates the root node, the leaf nodes, ancestral relationships and edges. The node appearing most frequently in the Parent column is interpreted as the root node and is connected to all the remaining nodes in the reconstructed IPT-DAG. Similarly, the leaf nodes appear only in the Child column of the data structure obtained as the output of the IPT-DAG construction algorithm (discussed in Section 5.2.2), and nodes appearing in both columns represent the intermediate nodes.

5.3 Experiments

In this section, the datasets and the experimental protocols used in this work are described. Results are reported in terms of (a) the photometric error for the parameter estimation algorithms of the three parametric transformation models, and (b) the IPT reconstruction accuracy.

5.3.1 Datasets and Experimental Methodology

The three sets of experiments conducted in this work are described next.
In the first set of experiments, 300 iris images from the CASIAv2 Device2 dataset [6] were subjected to 5 popular photometric transformations with varying parameters. The photometric transformations and the range of the parameters associated with each transformation, are described in Table 5.1. For each pair of original and transformed images, the GL, GQ and LL models are used to estimate the parameters of the transformation. For the LL model, each image is tessellated into non-overlapping patches of size 16×16. A total of 300 transformed pairs are obtained. The goal of this experiment is to demonstrate that some popular photometric transformations can be 100 reasonably approximated by one or more of these 3 parametric models. The final photometric error (formulated in Step 18 of Algorithm 5) obtained after convergence of the estimation algorithm is computed for each of the three models, and the model yielding the lowest error is declared as the best fit for that specific photometric transformation. Table 5.1: Photometric transformations and selected range of parameters used for the first set of experiments. Photometric Transformation Parameters Brightness adjustment Median CLAHE Gaussian smoothing Gamma correction [a,b] [m,n] contrast limit, size of window  ∈ [0, 0.09],  ×  ∈ [5, 8] × [5, 8] standard deviation gamma Range  ∈ [0, 10],  ∈ [−30, 30]  ∈ [3, 20],  ∈ [3, 20]  ∈ [2, 8]  ∈ [0.1, 2] In the second set of experiments, 1200 images from the CASIAv2 Device2 dataset and 1992 images from the CASIAv4 Thousand [7] dataset were assembled together to form our experimental dataset resulting in a total of 3192 images. Each image of the dataset is subjected to a sequence of photometric transformations resulting in IPT of different configurations. The parameters for the second experimental protocol are presented in Table 5.2. The different tree configurations analyzed in this paper are presented in Figure 5.4. A total of 3192 IPTs were constructed for each of the three configurations. Parameter estimation is conducted independently on each set of transformed images. (a) (b) (c) Figure 5.4: Examples of image phylogeny tree configurations considered in this work. (a) Breadth= 1, Depth= 3, (b) Breadth= 3, Depth= 1 and (c) Breadth= 2, Depth= 2 . In the third set of experiments, the 3192 images from the second experiment are subjected to all 5 transformations resulting in a single complex IPT of breadth 3 and depth 2. The resulting IPT 101 and an example iris image undergoing multiple transformations are exhibited in Figure 5.5. For this experiment, 3192 trees were constructed and evaluated in terms of reconstruction accuracy. In practice, we cannot guarantee that all the images belonging to a particular tree will be subjected to the same transformation. Some images may arise due to application of global operations, while others may be a consequence of local operations. Accurate reconstruction of such IPTs is of practical importance. The primary objective of the third set of experiments is to evaluate which of the three parametric functions (GL, GQ and LL models) will be well suited for modeling an IPT generated using sequences of multiple photometric transformations resulting in a complex configuration. (a) (b) Figure 5.5: IPT based on multiple photometric transformations. (a) IPT of Breadth= 3 and Depth= 2. (b) Example of an iris image undergoing multiple transformations in a single tree (different colored lines denote the different transformations indicated in the left figure). 
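For reference, the transformed near-duplicates used in these experiments can be approximated with standard image-processing routines. The sketch below uses OpenCV as a convenient stand-in (the exact implementation and parameter handling in this work may differ); parameter ranges loosely follow Table 5.1, images are assumed to be 8-bit grayscale, and the input file name is hypothetical.

```python
import numpy as np
import cv2

def brightness(img, a, b):
    # T(I) = a*I + b, clipped to the valid 8-bit intensity range
    return np.clip(a * img.astype(float) + b, 0, 255).astype(np.uint8)

def random_transform(img, rng):
    """Apply one randomly parameterized photometric transformation,
    mirroring the operations listed in Table 5.1."""
    choice = rng.integers(5)
    if choice == 0:
        return brightness(img, rng.uniform(0, 10), rng.uniform(-30, 30))
    if choice == 1:
        # square window as a simplification; cv2.medianBlur needs an odd kernel size
        return cv2.medianBlur(img, int(rng.choice([3, 5, 7])))
    if choice == 2:
        # OpenCV's clipLimit is on a different scale than the normalized limit in Table 5.1
        clahe = cv2.createCLAHE(clipLimit=rng.uniform(1.0, 4.0), tileGridSize=(8, 8))
        return clahe.apply(img)
    if choice == 3:
        return cv2.GaussianBlur(img, (0, 0), rng.uniform(2, 8))
    gamma = rng.uniform(0.1, 2)
    return np.clip(255 * (img / 255.0) ** gamma, 0, 255).astype(np.uint8)

# example: a breadth-1, depth-3 chain as in Figure 5.4(a)
rng = np.random.default_rng(0)
root = cv2.imread("iris.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
chain = [root]
for _ in range(3):
    chain.append(random_transform(chain[-1], rng))
```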
Table 5.2: Photometric transformations and selected range of parameters used for the second set of experiments. Photometric Transformations Parameters Brightness adjustment Median CLAHE [a, b] [size of window] contrast limit, Distribution Gaussian smoothing Gamma correction standard deviation gamma Range a ∈[0,10] , b ∈ [-30, 30] {[3,3], [5,5], [7,7]} contrast limit → {0.01, 0.03}, Distribution → {‘Uniform’, ‘Rayleigh’} stddev [2, 8] gamma [0.1, 2] 102 Table 5.3: Experiment 1. Performance of the 3 parametric models in representing each of the 5 photometric transformations. Models GL GQ LL GL GQ LL GL GQ LL GL GQ LL GL GQ LL Best Fit Rate (%) (Forward transformation) 0 100 0 0 100 0 0 10.67 89.33 0 87.67 12.33 1.33 91.33 7.33 Best Fit Rate (%) (Reverse transformation) 0 99.67 0.33 100 0 0 0 13.33 86.67 100 0 0 1.33 73.67 25 Mean PE (Forward transformation) 5.1289×10−12 1.9129×10−14 1.4292 0.0306 0.0244 1.5227×106 0.2232 0.1417 0.0864 0.0302 0.0214 0.0501 0.0087 0.0015 0.1724 Mean PE (Reverse transformation) 1.854×10−12 2.0308×10−14 1.2018 0.0259 0.0303 3.2743×107 0.1344 0.0844 0.0461 0.0229 0.0306 0.1301 0.0103 0.0032 0.1738 Brightness adjustment Median CLAHE Gaussian smoothing Gamma correction 5.3.2 Results and Discussion Analysis of results in terms of parameter estimation and reconstruction accuracy is discussed in this section. For the first set of experiments, the results are presented in Table 5.3. Here, for each photometric transformation, we report the percentage of times each of the 3 models gave the best fit. The results indicate that the GQ model best fits the Brightness and Gamma correction operations for both forward and reverse transformations. However, it should be noted that for the Brightness adjustment operation, GL model performs almost at par with the GQ model as evident from the average photometric error value reported in the last two columns. For a given image pair, we do not assume prior information about which is the original image and which one is the transformed image. As such, a model may fit well the transformation in both directions. Thus, the performance of a model (and, therefore, its relevance) can be claimed to be acceptable only if it results in low PE in both forward and reverse directions. As anticipated, the LL model is able to better characterize the CLAHE operation which is a local transformation. For the second set of experiments, the IPT-DAG reconstruction accuracy is reported in Table 5.4. 103 The results reflect how well a model discriminates between the original and the transformed image. For example, the LL model which best fits the CLAHE transformation (see Table 5.3), results in the highest reconstruction accuracy for CLAHE as indicated in the third row of Table 5.4. The reconstruction accuracy for Brightness adjustment is identical for the GL and GQ models as the magnitude of the estimated parameters were similar (for GQ model the coefficient of the quadratic term was ≈ 10−14). The poor reconstruction accuracy for the global Gamma correction can be attributed to the failure of all the three models in representing the transformation. The contrast adjustment uses the gamma value in the range [0.1, 2] to decide the shape of the curve governing the relationship between the input and the output pixel values; thus, gamma adjustment is a non-linear mapping which cannot be aptly represented using a simple quadratic model. Another example of failure of the proposed IPT-DAG reconstruction is demonstrated for Gaussian smoothing in Figure 5.6. 
Such a case arises when the value of the standard deviation used for the smoothing operation is small (σ = 2.55), and the dissimilarity measure cannot successfully discriminate between the source and the transformed image, resulting in poor IPT reconstruction. In Figure 5.6, the second image was misclassified as the source image.

Figure 5.6: Example of an IPT of breadth 3 and depth 1 undergoing Gaussian smoothing, resulting in an incorrect IPT-DAG reconstruction. (a) Original IPT-DAG (σ denotes the standard deviation governing the smoothing operation) and (b) incorrect IPT-DAG reconstruction.

Table 5.4: Experiment 2. IPT-DAG reconstruction accuracy (%) for different tree configurations using the magnitude of the predicted parameters as the asymmetric dissimilarity measure (B1D3: breadth 1, depth 3; B2D2: breadth 2, depth 2; B3D1: breadth 3, depth 1; see Figure 5.4).

Photometric Transformation   Model   B1D3    B2D2    B3D1
Brightness adjustment        GL      88.26   86.76   91.16
                             GQ      88.26   86.76   91.16
                             LL      88.18   86.53   88.03
Median                       GL      0       0       0
                             GQ      92.02   91.44   97.70
                             LL      1.33    0.93    0.43
CLAHE                        GL      47.94   62      89.47
                             GQ      47.11   36.17   15.36
                             LL      98.31   99.16   99.52
Gaussian smoothing           GL      0       0       0
                             GQ      52.35   58.45   67.07
                             LL      0.98    1.92    3.25
Gamma correction             GL      14.68   17.94   22.45
                             GQ      38.95   40.58   39.58
                             LL      25.86   26.49   28.84

Table 5.5: Experiment 3. IPT-DAG reconstruction accuracy for the multiple transformation scenario depicted in Figure 5.5.

Model   IPT-DAG Reconstruction Accuracy (%)
GL      47.93
GQ      71.30
LL      46.67

5.4 Summary

In this chapter, we introduced the content-based analysis of biometric images. We constructed an image phylogeny tree (IPT) that captures the relationship between a set of photometrically modified iris images. The IPT contains the original image as the root and a set of directed immediate and ancestral links depicting the order in which the images have been modified. The proposed approach used three parametric functions, namely, global linear, global quadratic and local linear, for modeling photometric transformations, out of which the global quadratic model outperformed the linear models. However, the quadratic model struggled to model highly non-linear transformations (gamma correction and Gaussian smoothing). This work gave us insight into moving towards a probabilistic framework that can sufficiently discriminate between original and transformed images for accurate IPT reconstruction.

CHAPTER 6

A PROBABILISTIC FRAMEWORK FOR IMAGE PHYLOGENY USING BASIS FUNCTIONS

Portions of this chapter appeared in the following publications: S. Banerjee and A. Ross, “Face Phylogeny Tree: Deducing Relationships Between Near-Duplicate Face Images Using Legendre Polynomials and Radial Basis Functions,” 10th IEEE International Conference on Biometrics: Theory, Applications and Systems, (Tampa, USA), September 2019. S. Banerjee and A. Ross, “Face Phylogeny Tree Using Basis Functions,” IEEE Transactions on Biometrics, Behavior, and Identity Science (T-BIOM), 2020.

6.1 Introduction

In the previous chapter, we presented a deterministic approach for modeling a set of photometrically modified near-duplicate iris images. We further observed that applying photometric transformations to face images portrays realistic scenarios. Therefore, in this chapter, we tackle the challenging problem of image phylogeny in the context of different biometric modalities (face, fingerprint and iris images) subjected to different types of photometric transformations using a probabilistic framework.
We pose the problem of asymmetric measure computation as a “likelihood ratio” problem and employ a topological sorting technique to construct the image phylogeny tree. We further evaluate the proposed approach on a host of photometric, geometric, and deep learning-based transformations. The contributions of the proposed method are as follows:

1. Considering three different families of basis functions for modeling photometric and geometric transformations: (i) the orthogonal polynomial family (Legendre and Chebyshev), (ii) the wavelet family (Gabor), and (iii) the radial basis family (Gaussian and Bump).

2. Performing cross-modality testing, i.e., learning the parameters of the basis functions using face images and testing them on near-infrared iris images and optical sensor fingerprint images.

3. Testing on multiple IPT configurations to evaluate the robustness of the proposed method. Also, robustness to unseen photometric and geometric transformations (i.e., transformations not used during the training phase), accomplished using deep learning-based schemes as well as open-source and commercial software, is assessed. Furthermore, we have performed a qualitative assessment of the IPTs reconstructed using the proposed method on near-duplicates downloaded from the internet.

4. Visualizing the results using t-distributed stochastic neighbor embedding (t-SNE) to better understand the ability of the basis functions in modeling the transformations and discriminating between forward and reverse transformation directions.

5. Employing von Neumann directed graph entropy to better understand and evaluate the reconstructed IPTs.

6.2 Proposed Method

A photometrically related image pair (I_i, I_j) can be generated by applying a single transformation or a sequence of transformations to one image, resulting in the other image. However, to construct the IPT we need to differentiate between the original image and the transformed image. Say, if I_i is the original image and I_j is the transformed image, then the IPT should have a directed link I_i → I_j. Applying this same principle to a set of near-duplicate, photometrically related images, we need two sets: the first set denoting the parent nodes and the second set denoting the child nodes. These two sets are then used to construct the IPT (Parent → Child). So the first step is to identify the sets of parent and child nodes from an array of near-duplicates.

We proceed to identify the parent and the child nodes for each pair of images by first modeling the transformation that relates the two images. We use parameterized basis functions to model the transformations in both directions (I_i → I_j and I_i ← I_j). But modeling the transformation does not indicate which is the parent node and which is the child node. To accomplish this, we require an asymmetric measure to distinguish between the forward and reverse directions. We pose the asymmetric measure computation as a likelihood ratio problem [20]. To compute the likelihood ratio, we adopt a supervised framework with a training phase and a testing phase. In the training phase, we model numerous transformations for a large number of near-duplicate image pairs in both directions (in this phase we know the original and the transformed images a priori). This results in two sets of parameter distributions, one for the forward transformation and the other for the reverse transformation. In the testing phase, for a given near-duplicate pair, we first model the transformations in both directions. Next, we use the estimated parameters to determine how likely they are to originate from the forward parameter distribution as opposed to the reverse parameter distribution. This step leads to the computation of the asymmetric similarity measure. We repeat this step for all image pairs in the near-duplicate set. Upon pairwise modeling of all the near-duplicate images in the set, we perform thresholding to identify related image pairs. The similarity measure is then utilized to identify which image from a related pair is the parent, thus making the other image its child. The sets of parent and child nodes are ultimately used to generate the IPT.
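The training and testing flow described above can be summarized in a minimal sketch. It uses scipy's Gaussian kernel density estimator as a stand-in for the Parzen-window estimator referenced later in this chapter, assumes the parameter vectors have already been estimated in both directions, and keeps the dimensionality low (as with the polynomial basis functions); all function names are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

def fit_direction_densities(forward_params, reverse_params):
    """Fit densities over parameter vectors estimated on training pairs in the
    forward (original -> transformed) and reverse directions.
    Each input is an array of shape (n_pairs, d)."""
    p_fwd = gaussian_kde(forward_params.T)   # gaussian_kde expects shape (d, n)
    p_rev = gaussian_kde(reverse_params.T)
    return p_fwd, p_rev

def asymmetric_similarity(theta_ij, theta_ji, p_fwd, p_rev, eps=1e-12):
    """Likelihood ratios for a test pair (I_i, I_j): theta_ij models I_i -> I_j,
    theta_ji models I_j -> I_i. A larger LR_ij than LR_ji suggests I_i is the parent."""
    lr_ij = p_fwd(theta_ij)[0] / (p_rev(theta_ij)[0] + eps)
    lr_ji = p_fwd(theta_ji)[0] / (p_rev(theta_ji)[0] + eps)
    return lr_ij, lr_ji
```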
In this work, we seek to model an arbitrary transformation using a set of parametric functions that we refer to as basis functions. Such an approach is needed since the space of photometric transformations is very vast; further, each of these transformations has a large number of parameter values. For example, a simple brightness adjustment can be accomplished using a large number of brightness values. The use of a fixed set of parametric functions to approximate a transformation reduces the otherwise complex task of modeling the photometric transformation. Thus, the task of approximating the transformations involves learning the parameters of the basis functions, subject to a criterion. In our case, the criterion, or the objective function, is the minimization of the photometric error between a near-duplicate image pair. This is formulated as follows:

\mathrm{PE}(I_i, I_j) = \min_{\theta} \, \| I_j(x) - \mathcal{T}[I_i(x) \mid \theta] \|_2^2.    (6.2.1)

Here, \mathcal{T}[\cdot \mid \theta] denotes the photometric transformation. We model the transformation using the basis functions as I_j(x) \approx \mathcal{T}[I_i(x) \mid \theta] \approx \sum_{k=1}^{K} \theta_k B_k[I_i(x)], where the transformation is applied to each pixel x. Here, θ = [θ_1, ..., θ_K] is the parameter vector to be estimated and K is the number of basis functions. In this work, we have five different types of basis functions, so the value of K depends upon the choice of the basis function. Next, we describe the process of modeling the transformations using the basis functions and the parameter estimation routines.

6.2.1 Parameter Estimation of Basis Functions

6.2.1.1 Orthogonal Polynomial Basis Functions

1. Legendre polynomials are a class of orthogonal polynomials defined on the interval [-1, 1]. The Legendre polynomial of degree n computed at x is denoted as P_n(x) and is written as follows:

P_n(x) = 2^n \sum_{k=0}^{n} x^k \binom{n}{k} \binom{\frac{n+k-1}{2}}{n}.    (6.2.2)

Legendre polynomials have been successfully used for image template matching [130], and image reconstruction and compression [104]. Note that Eqn. (6.2.2) simplifies to a linear function for n = 1 and a quadratic polynomial for n = 2.

2. Chebyshev polynomials are a special case of Jacobi polynomials defined on the interval [-1, 1]. There are two kinds of Chebyshev polynomials; here we are interested in Chebyshev polynomials of the first kind, which have been extensively used for approximating complex functions such as graph convolutions [55]. The explicit representation is presented below:

T_n(x) = \sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n}{2k} x^n (1 - x^{-2})^k.    (6.2.3)

In the notation \sum_{k=1}^{K} \theta_k B_k[I_i(x)], B_k[\cdot] equals P_k(\cdot) if the Legendre polynomial is used and T_k(\cdot) if the Chebyshev polynomial is employed, and K = n + 1. Next, we solve the objective function in Eqn. (6.2.1) using the inverse compositional estimation (ICE) algorithm [16, 26]. The IC update rule expresses the updated transformation as a composition of the current transformation and the inverse of the incremental transformation.
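Before turning to the details of the IC update rule, the modeling step itself can be illustrated with a minimal sketch that fits a Legendre basis by ordinary least squares instead of the ICE scheme adopted in this chapter. It assumes 8-bit grayscale images, uses numpy's Legendre utilities, and a degree of 5 yields a 6-dimensional parameter vector, consistent with the dimensionality used here; the function name is illustrative.

```python
import numpy as np
from numpy.polynomial import legendre as L

def fit_legendre_transform(src, tgt, degree=5):
    """Approximate tgt(x) ~ sum_k theta_k * P_k(src(x)) by ordinary least squares.
    Pixel intensities are rescaled to [-1, 1], the natural domain of the
    Legendre polynomials. Returns the (degree+1)-dimensional parameter vector."""
    s = 2.0 * src.ravel().astype(float) / 255.0 - 1.0   # map [0, 255] -> [-1, 1]
    t = 2.0 * tgt.ravel().astype(float) / 255.0 - 1.0
    # design matrix: column k holds P_k evaluated at every source pixel
    B = L.legvander(s, degree)                          # shape (n_pixels, degree + 1)
    theta, *_ = np.linalg.lstsq(B, t, rcond=None)
    return theta
```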
See [26] for a detailed derivation of the IC update rule. The parameter  is computed as  ←  . The IC update rule is an iterative optimization 1 + Ɗ (cid:107) (cid:106)  2 =0 (cid:18)  (cid:19) 2 110 algorithm and updates the new  using the previous value, , and the incremental Ɗ. The incremental parameter vector is computed as, Ɗ = (    +  )−1  . (6.2.4) Here, the term  is known as the Jacobian of the source image (), and the term (   ) is known as the approximate Hessian matrix. We applied 2 regularization to the Hessian matrix. Here,  denotes the regularization parameter,   denotes the identity matrix, and  denotes the error image computed between the source image () and the modeled target image (T [ | ]). In this work,  is a 6-dimensional vector for both Legendre and Chebyshev polynomials. 6.2.1.2 Wavelet Basis Functions Gabor wavelets are used for extracting texture information from images [49] and has been selected as one of the basis functions for modeling the transformations. We employed a bank of Gabor filters parameterized with the wavelength and orientation. A set of four discrete wavelengths {2, 3, 4, 5} and four orientations {0◦, 45◦, 90◦, 135◦} are selected. Each wavelength corresponds to a single filter scale that treats the image at a different resolution. Thus, we have a bank of sixteen Gabor filters. We filtered the image with the Gabor bank and we obtained 16 filtered responses. However, in our case we combined the orientation responses for each wavelength, thus reducing the total number of responses from 16 to 4. Finally, we use ICE to estimate the 4-dimensional parameter vector  ( = 4). 6.2.1.3 Radial Basis Functions The polynomial and wavelet basis functions model the transformations at pixel level. In pixel-level modeling, the photometric error between each pixel of the original and transformed image pair is minimized using a weighted linear combination of basis functions. However, local filtering operations such as median filtering are applied in a patch-wise manner. Therefore, we used the family of radial basis functions that possesses nice smoothing properties to model transformations 111 at the patch level. In patch-level modeling, the photometric error between two patches, one patch belonging to the original image and the second patch belonging to the transformed image, is minimized using a weighted linear combination of basis functions. Therefore, patch-level modeling considers all the pixels within a patch for minimizing the photometric error. This resolves the spatial dependencies observed in patch-based photometric transformations.   ,[ 1. Gaussian radial kernel is the first type of smoothing functions considered in the work. Gaussian RBF computed at  is denoted as () and is written as () = exp (cid:107) − (cid:107)2. Consider an image pair (,  ) each of which is tessellated into  non-overlapping blocks  ( )|] ≈ of size 16×16. For the block , where  = [1, · · · , ], let   ( )]. Here,  denotes the pixel intensity value within the Ò block and  indicates the average of the pixel intensity values within that block. For Gaussian RBF,  = , i.e.or a 16×16 block, the local least squares estimation yields . Simplifying using  ( )],∀. The local least the matrix notation yields  squares method is used to estimate the coefficient vector  for each block. The final  is a 256-dimensional vector obtained by computing the average of all s. See [20] for detailed derivation.  ], where B[  ( ) = T [  ] = [  ≈   B[       2. 
Bump RBF is a smooth compact function which can be interpreted as a Gaussian function scaled to lie on a disc of unity radius. It is not analytic unlike Gaussian RBFs, but can be used as generalized functions which are essential in converting discontinuous functions to for  ∈ [−1, 1]. Here,  is smooth functions [153]. In this case, () = exp mean-centered. Using least squares estimation, we obtain  (256-dimensional). 1 − 2 (cid:19) (cid:18) − 1 6.2.2 Asymmetric Measure Computation and IPT Construction The asymmetric measure can be in the form of pairwise similarity or dissimilarity, but in this work, we adopt a similarity-based asymmetric measure. The similarity measure computed between a pair of images determines whether an image pair is photometrically related or not, i.e.hether a link 112 should exist between a pair of nodes (images) in the IPT; secondly, determine the direction of the link by identifying the parent node and the child node. The parameters estimated for modeling the transformations are utilized to compute this similarity measure as described below. 6.2.2.1 Likelihood ratio for computing the asymmetric similarity measure Given a pair of images, (,  ), we first estimate the parameter vectors   and   in both directions ( →   and   → ). The parameter vectors are necessary but not sufficient for constructing the IPT. We compute the likelihood ratio from the estimated parameters to yield a similarity score which can discriminate between the forward and reverse directions, and can thus be used to construct the IPT. To compute the likelihood ratio, we need the probability distribution of the parameter vectors —   () and () corresponding to forward and reverse directions for a large number of training images. The probability distributions are generated in a supervised fashion, where we assume that we know for an image pair from the training set ( , ),  is the original image and  is the transformed image. Then the forward transformation refers to ( → ), and the reverse transformation refers to ( → ). The set of   vectors computed for a large number of image pairs are used to estimate   (). Similarly, the set of  parameter vectors are used to determine (). We utilized Parzen window based non-parametric density estimation scheme [140] to obtain   () and ().   ( ) ( ) . Similarly, '  = Upon obtaining the forward and the reverse parameter distributions, we now compute the   (  ) (  ) . Our intuition is that likelihood ratios as follows: '  = we will observe a higher value of '  compared to ' , if  is the original image and   is the transformed image. In this case,   belongs to the forward distribution and should result in a higher value of ' . Conversely,    belongs to reverse distribution, resulting in a lower value of ' . The likelihood ratios are further used to populate the similarity matrix. 113 Figure 6.1: The outline of the proposed method. The proposed method first models the photometric transformations between every image pair and then computes the asymmetric measure. Given a set of near-duplicate images as input (on the left) the two objectives are: (i) to determine the candidate set of root nodes, and (ii) to construct the IPT when the root image is known. The dashed arrows indicate ancestral links and the bold arrows indicate immediate links between parent and child nodes. IPT Construction 6.2.2.2 The similarity matrix  of size  ×  is populated by the likelihood ratio values as follows:   = ' ; ,  = [1, · · · , ] and  ࣔ . 
The diagonal elements of the similarity matrix are ignored as we do not consider self-loops in an IPT. The similarity matrix is then employed for (i) identifying the candidate root nodes and (ii) constructing the IPT. The steps are described below. 1. Indicator matrix representation: This step helps in pruning the outliers which can be falsely identified as root nodes. The indicator matrix is constructed by thresholding the similarity matrix against a suitable threshold, which results in a binary matrix. The details of the threshold selection are described in [20]. The indicator matrix serves as an adjacency matrix of a coarse directed acyclic graph that is further refined for constructing the IPT. The reason it is referred to as coarse is that it may contain some spurious edges. 2. Candidate root nodes identification: In this work, we consider each IPT to have a single root node. The authors in [57, 62] considered each node as a potential root, one at a time, and then computed a cost function for each IPT constructed using the potential root. The 114 IPT resulting in the least cost function value was selected, and its root node was used for the final evaluation. This involves (3) (can be optimized to (2)) complexity as reported in [57]. In contrast, the method proposed here computes a set of three candidate root nodes, which corresponds to the top 3 choices for roots out of  nodes. This requires finding the nodes having the highest number of 1’s in the indicator matrix (we consider ancestral edges as correct edges). The entire process requires summing each row of the indicator matrix followed by sorting and this results in ( log ) computational complexity. 3. IPT generation: We construct the IPT as described in [20] using a depth-first search-based tree spanning technique. The choice of depth-first search (DFS) over breadth-first search (BFS) is motivated by the fact that DFS has a linear memory requirement with respect to the nodes and results in a faster search (()) and is therefore used for topological sorting. The total computational complexity for the IPT construction using the proposed method is ( log ) + () ≈ ( log ). The outline of the proposed method for constructing the IPT is illustrated in Figure 6.1. Table 6.1: Description of the datasets used in this work. Modality Name of the Dataset Face LFW CASIA-IrisV2 Device2 CASIA-IrisV4 Thousand Fingerprint FVC 2000 Iris DB3 Dataset Identifier Partial Set Full set — No. of subjects 391 468 37 — Config I Config II 525 110 90 No. of images 12,290 27,270 7,260 5,005 8,800 7,200 6.3 Experiments In this section, we describe the datasets employed, the experiments conducted and finally report the results. 115 Table 6.2: Photometric transformations and the range of the corresponding parameters used in the training and testing experiments. The transformed images are scaled to [0, 255]. Note that experiments were also conducted using other complex photometric transformations besides the ones listed here. Photometric Transformations Level of Operation Parameters Global Brightness adjustment Median filtering Local Global Gaussian smoothing Gamma transformation Global Range a ∈ [0.9,1.5], b ∈ [-30,30] [a,b] size of window [m,n] m ∈ [2,6], n ∈ [2,6] stddev ∈[1,3] standard deviation gamma ∈ [0.5,1.5] gamma 6.3.1 Datasets We have used four datasets belonging to three different modalities to conduct experiments. For the face modality, we used images from the Labeled Faces in the Wild (LFW) dataset [88]. 
For the iris modality, we used near-infrared iris images from the CASIA-IrisV2 Device2 subset [6] and CASIA- IrisV4 Thousand subset [7]. For the fingerprint modality, we used images from the FVC2000 DB3 dataset [114]. The description of the datasets is provided in Table 6.1. We have selected four photometric transformations, viz., Brightness adjustment, Median filtering filtering, Gaussian smoothing, and Gamma transformation as used in [20] to test the proposed IPT construction algorithm. The parameter range for each of the transformations is described in Table 6.2. 6.3.2 Experimental Methodology We have performed seven experiments which are described below. 6.3.2.1 Experiment 1: Efficacy of basis functions In this experiment, we evaluate the ability of the basis functions to (i) model the photometric transformations and (ii) discriminate between the forward and reverse directions. To accomplish the first task, we perform two evaluation methods. The first evaluation involves deterministically selected parameters, whereas the second evaluation involves randomly selected transformation parameters. For the first evaluation, we select a face image,  and subject it to a single transformation, e.g., gamma adjustment, parameterized with a specific  value, resulting in (cid:48). We repeat this 116 (a) (b) (c) (d) Figure 6.2: IPT configurations used in Experiments 2 and 3 for the face, iris and fingerprint modalities. Note that the same configuration was tested across two modalities (Face and Iris) while, two different configurations were tested for the same modality (Finger). The bold arrows indicate immediate links and the dashed arrows indicate ancestral links. process 200 times, each time we use an incrementally modified  ( =  + Ɗ), thus, resulting in 200 near-duplicate image pairs. Furthermore, we repeat this process for 5 images corresponding to 5 different subjects. Therefore, we have a total of 1,000 photometrically related image pairs for a single transformation. We conduct this process for each of the four transformations indicated in Table 6.2. Next, we use the basis functions to model the transformation only in the forward direction in this experiment. Then, we use -SNE [156] to reduce the dimensionality of the estimated vectors and project them onto 3-dimensions. This experiment is conducted to assess the ability of the basis functions in modeling the transformations and we visually interpret 117 (a) (b) (c) (d) Figure 6.3: IPT configurations used in Experiment 4. The bold arrows indicate immediate links and the dashed arrows indicate ancestral links. the results. The second protocol involves modeling 2,000 image pairs in the forward direction using the five basis functions, where each of the 500 images were subjected to each of the four photometric transformations using randomly selected parameters. We further computed the residual photometric error (PE), for each image pair which is the difference in the pixel intensity between the actual target image and the output modeled using the basis functions. We then computed the mean over all 2,000 image pairs. The mean of the residual PE evaluates the ability of the basis functions to accurately model the transformations. To evaluate the second task, i.e.iscriminating between the forward and reverse directions, we selected 400 image pairs (100 image pairs corresponding to each of the four transformations) from the 2,000 pairs, that were generated using randomly selected parameters. 
Then, we modeled the transformations in both forward and reverse directions, and estimated the parameters. We further used -SNE to obtain a 2-dimensional embedding of the estimated parameters, and visualized the projections to analyze the performance of the basis functions. 118 6.3.2.2 Experiment 2: IPT Reconstruction In this experiment, we evaluated the proposed approach in terms of (i) Root identification and (ii) IPT Reconstruction accuracy metrics as used in [20]. We followed the same experimental protocol in [20], and assessed the performance of the basis functions on both a partial set and a full-set of face images from the LFW dataset.1 We used the IPT configuration presented in Figure 6.2(a) for this experiment. 6.3.2.3 Experiment 3: Cross-modality testing on multiple configurations We tested the proposed approach on iris images and fingerprint images. This experiment is intended to demonstrate the generalizability of the proposed method across modalities. 1. Iris Images —We applied a random sequence of photometric transformations on near- infrared iris images. We evaluated the root identification and IPT reconstruction accuracies for 726 IPTs using the same procedure as described in Section 6.2.2. We used the same IPT configuration as the one used for face images (see Figure 6.2(b)). Note, the parameter probability distributions in the forward and reverse directions are computed using a training set comprising of face images; the test images are iris images. The test iris images are acquired in the near-infrared spectrum, in contrast to the training face images that are acquired in the visible spectrum. As a result, this experiment can also be treated as an assessment of the basis functions for cross-spectral modeling. 2. Fingerprint Images —Two different IPT configurations are tested as depicted in Fig- ures 6.2(c) and 6.2(d). This experiment tests the generalizability of the proposed approach as a function of the breadth and depth of the IPT. We refer to Figure 6.2(c) as Config I and Figure 6.2(d) as Config II. The IPT configuration used for testing the face and iris images is more balanced (similar distribution of nodes on the left and the right sides of the root) 1http://iprobe.cse.msu.edu/dataset_detail.php?id=1&?title=Near-Duplicate_Face_Images_ (NDFI) 119 compared to the Config I structure which has more depth than breadth, whereas Config II has the same breadth at successive depths. We also performed an intra-modality experiment which serves as the baseline experiment to compare against the performance of the cross-modality experiment. The training and testing partitions for the intra-modality experiments are as follows: • Iris images — We used 5,005 images from the CASIA-IrisV4 Thousand subset belonging to 525 subjects to learn the parameter distributions in the forward and the reverse directions. We then tested it on 726 IPTs (same configuration as in Figure 6.2(b)) constructed from the CASIA-IrisV2 Device2 subset. This experiment can also be considered as a cross-dataset experiment, due to the use of two different datasets in the training and testing phases. • Fingerprint images — We used the same dataset (intra-dataset) in training and testing but we strictly followed a subject disjoint protocol. This, however, resulted in a lesser number of training images. 560 images from 70 subjects were used for creating parameter distributions and then tested on 3,200 images from 40 subjects. We used the same configurations as depicted in Figures 6.2(c) and 6.2(d). 
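The two metrics used throughout Experiments 2 and 3, root identification accuracy at rank k and IPT reconstruction accuracy, can be summarized with the following sketch. It assumes each IPT is represented by its set of directed (parent, child) edges (including ancestral links) and that candidate roots are returned in ranked order; the helper names are illustrative.

```python
def ipt_reconstruction_accuracy(true_edges, pred_edges):
    """Fraction of ground-truth directed edges (parent, child), including
    ancestral links, that are recovered in the reconstructed IPT."""
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    return len(true_edges & pred_edges) / len(true_edges)

def root_identification_accuracy(true_roots, ranked_candidates, rank=3):
    """Fraction of test IPTs whose true root appears among the top-`rank`
    candidate roots returned by the method."""
    hits = sum(1 for r, cands in zip(true_roots, ranked_candidates)
               if r in cands[:rank])
    return hits / len(true_roots)
```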
6.3.2.4 Experiment 4: Robustness to unseen photometric transformations We have considered a closed set of 4 transformations in the training stage. However, a gamut of image and video editing tools such as Photoshop, GIMP and Snapchat filters exist which can be used for image manipulation, particularly for face images. In this context, we constructed a small test set of images transformed using Photoshop operations (Hue and Saturation adjustment, Curve transformation, Color balance, and Blur filters) to create 35 IPTs corresponding to 5 subjects. The IPT configurations are selected such that they cover diverse breadth and depth values possible for an IPT with 5 nodes. See Figure 6.3. The trained parameter distributions did not encounter instances of Photoshopped images; hence, this experiment will demonstrate the robustness of the basis functions in handling unseen transformations. 120 6.3.2.5 Experiment 5: Ability to handle geometric transformations We designed this experiment to assess the ability of basis functions in modeling geometric trans- formations. We have selected some well-known geometric transformations such as sampling using linear interpolation and affine transformations that include translation, scaling and rotation. We have selected these particular transformations as they have also been utilized in [61] for creating near-duplicates. The details about the geometric transformations and their respective parameter ranges are described in Table 6.3. We randomly selected 500 images belonging to 97 subjects from the Labeled Faces in the wild (LFW) dataset [88]. We then applied four geometric modifications (see Table 6.3) on each of these images in a random sequence with random parameter values to create 500 image phylogeny trees (IPTs). Each IPT contains 10 images so we have a total of 5,000 images. An example IPT consisting of geometrically modified images is presented in Figure 6.4. Note that the IPT configuration is the same as the one used to evaluate photometrically modified images. We have conducted the experiment using the following two protocols, and evaluated the performance using root identification accuracy at Ranks 1, 2 and 3 and IPT reconstruction accuracy. 1. The first protocol involves training on the photometrically modified images, while testing on geometrically modified images. In this protocol, the training set did not include any geometrically modified images, so it assesses the robustness of the basis functions on different classes of trans- formations (i.e., photometric versus geometric). We have not modified the asymmetric measure computation method or the tree-spanning method used in constructing the IPT. 2. The second protocol involves training and testing on geometrically modified images. To accomplish this task, we created a new training set of 5,865 pairs of original and geometrically transformed images using the LFW dataset. The objective is to evaluate the performance of the basis functions when trained and tested on geometric modifications, unlike in the first protocol. We have compared the performance of the proposed method with a baseline algorithm described in [61]. The baseline algorithm uses Speeded-Up Robust Features (SURF) and RANSAC algorithm for the task of geometric registration, followed by color channel normalization and used the residual photometric error as the asymmetric measure. The Oriented Kruskal algorithm is used for spanning 121 Table 6.3: Experiment 5: Geometric transformations and their parameter ranges used in this work. 
Geometric transformations Parameters [90%, 110%] Re-sampling [-5◦, 5◦] [5, 20] [90%, 110%] Rotation Translation Scaling Generic Affine Figure 6.4: Experiment 5: An example IPT generated using geometrically modified near-duplicate images. The bold arrows indicate immediate links and the dashed arrows indicate ancestral links. the IPT which is a minimal spanning tree in their case. We implemented the baseline algorithm in two ways. (a) Firstly, we used SURF and M-SAC (M-estimator sample and consensus scheme which is an improved variant of RANSAC) for geometric registration. We did not perform color channel normalization since we used gray-scale images. We then rescaled pixel intensities in the original and modified images to [0, 255] prior to evaluation. We used the Oriented Kruskal algorithm for constructing the IPT. (b) Secondly, we used the best performing basis function to compute the likelihood ratio to be used as the asymmetric measure and then employed the Oriented Kruskal for spanning the IPT. It is important to note that, unlike the baseline method, the proposed method does not require any separate geometric registration for modeling the geometric transformations. 122 6.3.2.6 Experiment 6: Ability to handle near-duplicates available online Near-duplicate images of celebrities and political figures are widely circulated on the internet. The actual sequence of generation of such near-duplicates may be unknown, but these images represent pragmatic scenarios where the ground truth may not be always available. In this experiment, we analyze how the proposed asymmetric measure and IPT construction methods can handle such images. To this end, we followed the suggestion presented in [68] and used Google image search to download 40 near-duplicates retrieved from the following 5 queries: Angelina Jolie, Kate Winslet, Superman, Britney Spears and Bob Marley. We used training parameter distributions learnt from both geometrically and photometrically modified images for each of the five basis functions used in this work. We then used the proposed asymmetric measure computation method to identify top 3 candidate root nodes. For each of the candidate root node we then reconstructed an IPT. Due to unavailability of ground truth, we could not evaluate the accuracy of the reconstructed IPTs, but we present qualitative assessment of the reconstructed IPTs. 6.3.2.7 Experiment 7: Ability to handle deep learning-based transformations and image augmentation schemes Several deep learning-based transformations and image augmentation packages are available that can be used for applying sophisticated transformations to images in an automated fashion gen- erating a large number of near-duplicates. In this experiment, we used a deep learning-based autoencoder [80] and open source image augmentation packages [5] used for training deep neural networks to evaluate the proposed method. We have conducted the experiment using two protocols. 1. The first protocol involves a deep convolutional autoencoder [80]. The autoencoder was trained on ∼ 19, 000 images from the CelebA dataset [8] to generate 80 near-duplicate images belonging to 16 subjects. The resultant IPT configuration is depicted in Figure 6.5(a). 
The convolutional autoencoder comprises of an encoder block that consists of five convolutional layers, followed by ReLU after each convolutional layer, and the decoder block comprises of traditional 123 convolutional layers and nearest-neighbor based upsampling.2 We did not use de-convolution or transposed convolution layers, as they can lead to checkerboard artifacts. The intuition behind using an autoencoder for generating near-duplicate images is to leverage upon its ability to perform high fidelity reconstruction of original input images. This fits the definition of ‘near-duplicates’ in our image phylogeny task, and has therefore been used in this experiment. We apply the original image as an input to the autoencoder to generate the first set of near-duplicates at depth=1. This first set of reconstructed images are again fed as input to the same autoencoder to generate near-duplicates at depth=2, and so on until we generated near-duplicates for depth=5. 2. The second protocol involves Augmentor [5], a data augmentation tool used in training deep neural networks. We used this tool, which is an open source package in Python, that applies random distortions such as zoom, cropping, rotation, re-sampling and elastic deformations to an image. See Figure 6.5(b). Some of these image transformations or the diverse parameter ranges (training involved rotation values in the interval of [−5◦, 5◦], whereas, testing using Augmentor involved rotation values in the interval of [−10◦, 10◦]) are not encountered during the training stage. We randomly selected 100 images belonging to 100 subjects of the CelebA dataset. We applied the Augmentor on each of these 100 images to create 100 IPTs. Each IPT contains 10 images. So we tested on a total of 1,000 near-duplicate images. In addition, we also used some images synthesized using a deep learning-based generative network known as BeautyGlow [39]. The generative network performs a style transfer on the makeup of the individual in face images, resulting in near-duplicates as shown in Figure 6.6(a). Images are generated by sequentially increasing the magnification value of the makeup, highlighting the intensity of the makeup. We used 13 IPTs (each IPT contains 7 images), resulting in a total of 91 images. 2https://sebastianraschka.com/deep-learning-resources.html 124 (a) (b) Figure 6.5: Experiment 7: (Left) IPT test configuration used for evaluation of the basis functions by employing autoencoder generated near-duplicates. (Right) IPT test configuration used for evaluation of the basis functions by employing open source image augmentation packages. The bold arrows indicate immediate links and the dashed arrows indicate ancestral links. (a) (b) Figure 6.6: Experiment 7: (Left) Near-duplicates generated using BeautyGlow generative network. (Right) IPT constructed using Chebsyshev polynomials for the near-duplicates on the left. The bold arrows indicate immediate links and the dashed arrows indicate ancestral links. 125 6.4 Results and Analysis In this section, we first report the results observed for the experiments described in the previous section, and we further present our insights into the findings. 6.4.1 Results of Experiment 1 The 3D projected vectors obtained using -SNE are illustrated in Figure 6.7 for each of the transfor- mations modeled using the five basis functions. Each column denotes a photometric transformation, and each row denotes a basis function. 
As evident from the projections, the basis functions can model the majority of the transformations fairly well. The parameters governing each transfor- mation are incrementally modified and, hence, their projections should ideally span a continuous trajectory. Also, we expect to observe this behavior irrespective of the transformations used or the identity of the subject. We indeed observe such a behavior for most of the cases, except for Gamma adjustment, where the polynomials and wavelet functions flounder. Note that median filtering requires integer parameter values (height and width of window). Therefore, in the t-SNE results (last column in Figure 6.7) we observe small clusters, depicting accurate modeling of dis- crete parameterized transformations. Out of all the basis functions, the radial basis functions seem to model the transformations the best. Figure 6.8 further substantiates that the RBFs are best at modeling transformations while Gabor wavelets perform relatively poorly. The RBFs result in the lowest mean residual photometric error, suggesting a more accurate modeling of the photometric transformations. In terms of discriminability between the forward and reverse directions, the projections are almost indistinguishable in the two directions for the wavelet functions, but they are relatively better for polynomial functions and the RBFs, as evidenced in Figure 6.9. The polynomials have fairly well-separated projections, indicating their ability to discriminate between the forward and reverse directions. We anticipate that this ability will be reflected in the IPT reconstruction experiments. 126 Brightness transformation Gamma adjustment Gaussian smoothing Median filtering (a) Legendre (b) Chebyshev (c) Gabor (d) Gaussian RBF (e) Bump RBF Figure 6.7: Experiment 1: 3D projected parameters using -SNE corresponding to each photometric transformation (column) modeled using each basis function (row). Each color represents a single image. A total of 5 images were modeled. Gaussian and Bump RBFs model majority of the transformations reasonably well as indicated by the last two rows. The Brightness transformation was easiest to model as the parameters of the basis functions follow a continuous path. 127 Figure 6.8: Experiment 1: The photometric error between the actual output and the the output modeled using the basis functions is denoted as residual photometric error (PE). The mean of the residual PE is demonstrated for 2,000 image pairs modeled in both forward and reverse directions using the five basis functions. Gabor resulted in the highest residual PE, and the RBFs yield the lowest residual PE demonstrating their efficacy in reliably modeling the transformations. 6.4.2 Results of Experiment 2 The results of root identification and IPT reconstruction are presented in Tables 6.4 and 6.5. Results indicate that polynomials (Legendre and Chebyshev) perform the best in a majority of cases among the set of five basis functions selected in this work. The results are consistent with the observations reported in Figure 6.9, which indicates sufficient discriminability offered by the polynomials. For the partial set, Legendre polynomials perform best both in terms of root identification (89.91%) and IPT reconstruction (70.61%) accuracies, closely followed by Chebyshev polynomials. 
For the full set, Gaussian RBF performs the best in terms of root identification accuracy (80.85%) while Chebsyhev polynomials perform the best in terms of IPT reconstruction accuracy (66.54%, a small improvement of ≈1.5% is observed compared to the results in [20]). 128 (a) (b) (c) (d) (e) Figure 6.9: Experiment 1: 2D projected parameters using -SNE in forward and reverse directions, corresponding to all 4 transformations modeled using each basis function: (a) Legendre, (b) Chebyshev, (c) Gabor, (d) Gaussian RBF and (e) Bump RBF. Legendre and Chebyshev polynomials can better discriminate between forward and reverse directions as indicated by the relatively well- separated parameter distributions compared to the remaining basis functions. 6.4.3 Results of Experiment 3 The results for the cross-modality experiments are presented in Table 6.6 for iris images, and in Tables 6.7 and 6.8 for fingerprint images. The purpose of this set of experiments is to assess how the proposed method performs for (i) the same IPT configuration but used across two different modalities, and (ii) the same modality but tested on different IPT configurations. Note that the training modality is different than the test modality in both cases. The results in Table 6.6 indicate that Chebyshev polynomials obtain 94.90% root identification accuracy at Rank 3 and an IPT reconstruction accuracy of 67.90% for iris images. It is closely followed by Legendre polynomials. However, other basis functions perform poorly, specifically, the Gabor wavelets. Gabor wavelets are good texture descriptors, i.e.hey can extract the high-frequency features reliably from an image. In the case of photometric transformations, the pixel intensity gradient, which contributes toward high-frequency features are not significantly affected and, thus, the wavelet fails to correctly model the transformation between an image pair. 129 Image phylogeny on near-duplicate fingerprint images is extremely hard, as evident from the results in Tables 6.7 and 6.8. Visual inspection reveals that the set of near-duplicate fingerprint images appear to be black blobs on a white background, with no discernible differences between the set. Therefore, the root identification and IPT reconstruction accuracies are worse compared to the face and iris modalities: the best root identification performance is 65.28% and IPT reconstruction accuracy is 70.59%. This experiment also shows that the performance varies across configura- tions. Specifically, symmetric configurations (Config-II) can be more difficult to reconstruct than asymmetric configurations (Config-I). Next, we compare the results of the cross-modality experiments with the baseline, which is the intra-modality experiment described in Section 6.3.2.3. The results of the baseline experiments are reported in Table 6.9. The results indicate that cross-modality experiments are commensurate with the intra-modality performance. For example, for iris images, the intra-modality experiment obtains the best root identification accuracy of 94.63% at Rank 3, while the cross-modality experiment obtains the best root identification accuracy of 94.90% at Rank 3. Furthermore, the intra-modality experiment obtains the highest IPT reconstruction accuracy of 68.62%, while the cross-modality experiment obtains the highest IPT reconstruction accuracy of 67.90% in the case of iris images. 6.4.4 Results of Experiment 4 The training set comprised of images modified using 4 rudimentary photometric transformations. 
In the real world, a plethora of image editing applications exists, thereby making image phylogeny for face images a typically difficult problem. We hypothesize that by creating a training dataset through random parameters on simple transformations, the unseen transformations can be reliably modeled. Results reported in Table 6.10 indicate that unseen transformations were modeled fairly well. Legendre polynomials performed the best in terms of root identification accuracy with 76.47% averaged across the four IPT configurations (see Figure 6.3). Chebyshev polynomials performed the best in terms of IPT reconstruction accuracy with 76.25% averaged across the four IPT configurations (a small improvement of ≈1.67% is observed compared to the results in [20]). 130 Table 6.4: Experiment 2: Root identification and IPT reconstruction accuracies for face images (Partial Set). Basis Function Root identification (%) Legendre Chebyshev Gabor Gaussian RBF Bump RBF Rank 1/2/3 65.90/82.18/89.91 53.62/74.69/85.52 27.66/41.74/56.06 65.25/81.04/87.79 63.79/80.07/86.41 IPT Reconstruction (%) 70.61 70.53 55.54 66.15 66.52 Table 6.5: Experiment 2: Root identification and IPT reconstruction accuracies for face images (Full set). Basis Function Root identification (%) Legendre Chebyshev Gabor Gaussian RBF Bump RBF Rank 1/2/3 50.45/66.74/75.68 45.18/65.13/76.86 29.48/44.77/58.01 56.44/71.87/80.85 55.34/70.85/80.09 IPT Reconstruction (%) 65.05 66.54 55.46 63.84 64.27 Table 6.6: Experiment 3A: Root identification and IPT reconstruction accuracies for iris images in the cross-modality setting. Basis Function Root identification (%) Legendre Chebyshev Gabor Gaussian RBF Bump RBF Rank 1/2/3 56.75/76.58/87.88 72.59/88.29/94.90 5.79/12.40/19.70 40.08/63.64/76.45 39.67/60.88/76.03 IPT Reconstruction (%) 67.53 67.90 51.23 66.74 66 Table 6.7: Experiment 3B: Root identification and IPT reconstruction accuracies for fingerprint images (Config -I) in the cross-modality setting. Basis Function Root identification (%) Legendre Chebyshev Gabor Gaussian RBF Bump RBF Rank 1/2/3 29.66/44.32/56.82 31.93/46.14/57.39 22.50/37.27/51.36 30.80/50.34/61.14 31.59/51.70/62.50 IPT Reconstruction (%) 68.99 70.59 68.08 68.98 68.51 131 Table 6.8: Experiment 3B: Root identification and IPT reconstruction accuracies for fingerprint images (Config -II) in the cross-modality setting. Basis Function Root identification (%) Legendre Chebyshev Gabor Gaussian RBF Bump RBF Rank 1/2/3 34.58/51.11/59.31 35/55.28/65.28 31.39/48.75/63.33 14.58/23.75/36.81 17.50/28.33/40.42 IPT Reconstruction (%) 65.82 65.93 60.76 59.29 59.96 Table 6.9: Experiment 3: Baseline performance of basis functions in terms of root identification and IPT reconstruction accuracies in the intra-modality setting. Modality & Configuration IRIS FINGERPRINT Config-I FINGERPRINT Config-II Root identification (%) Basis Rank 1/2/3 Function 64.46/83.75/91.46 Legendre 76.31/89.53/94.63 Chebyshev Gabor 8.54/18.60/27.55 Gaussian RBF 29.89/49.31/66.25 25.21/38.15/48.76 Bump RBF 34.69/48.13/57.81 Legendre 38.75/58.44/67.55 Chebyshev Gabor 5.94/14.06/23.75 Gaussian RBF 28.13/48.13/59.69 39.06/56.56/66.56 Bump RBF Legendre 35.63/50.31/63.12 42.50/56.56/69.69 Chebyshev Gabor 5.31/9.69/17.81 Gaussian RBF 26.56/42.19/54.37 Bump RBF 31.56/48.44/60.31 IPT Reconstruction (%) 68.62 66.62 54.33 60.77 58.06 71.92 71.46 64.37 66.96 68.35 66.22 65.52 55.23 59.91 63.44 Table 6.10: Experiment 4: Root identification and IPT reconstruction accuracies for unseen photometric transformations. 
Basis Functions Legendre Chebyshev Gabor Gaussian RBF Bump RBF IPT 1 Root identification Rank 3 (%) 66.67 44.44 44.44 66.67 66.67 IPT Reconst- ruction (%) 82.22 84.44 82.22 84.44 84.44 IPT 2 Root identification Rank 3 (%) 90 60 80 40 60 IPT Reconst- ruction (%) 71.67 75 71.67 68.33 68.33 IPT Configurations IPT 3 Root identification Rank 3 (%) 77.78 44.44 77.78 44.44 33.33 IPT Reconst- ruction (%) 44.44 45.56 48.89 43.33 44.44 IPT 4 Root identification Rank 3 (%) 71.43 71.43 71.43 71.43 71.43 IPT Reconst- ruction (%) 100 100 100 100 100 132 (a) (b) (c) (a) Original image (on the left) and the transformed image (on the right). Figure 6.10: Experiment 5: Example of geometric transformation (rotation) modeling using basis functions. (b) Modeled image pair using Legendre polynomials (modeled original image is on the left and modeled transformed image is on the right). (c) Modeled image pair using Gaussian RBF (modeled original image is on the left and modeled transformed image is on the right). 6.4.5 Results of Experiment 5 We present examples of geometric transformation modeling using the basis functions in Figure 6.10. We observe that Gaussian RBFs outperform Legendre polynomials at modeling the geometric transformations (see Figure 6.10) due to two reasons - (i) the radial basis functions can potentially span infinite range of values as opposed to the polynomials which can span values within a finite interval, and (ii) the RBFs did patch-level modeling compared to the pixel-level modeling done by the polynomials. The results in Table 6.11 indicate that the basis functions perform significantly better when trained on geometrically modified images (second protocol) compared to when trained on photometrically modified images (first protocol). As anticipated, if the class of transformations are the same in both training and testing set, the results are better, but surprisingly, even with photometrically modified training images, the basis functions are able to reliably handle geometric transformations. The basis functions outperform the baseline (see first row in Table 6.11) by ∼ 50% in terms of root identification accuracy and ∼ 56% in terms of IPT reconstruction accuracy. In the case of substituting the asymmetric measure in the baseline with the proposed asymmetric measure (we used Gaussian RBF), while retaining the tree spanning algorithm (see second row in Table 6.11), an improvement of ∼ 32% in terms of root identification accuracy and an improvement of ∼ 52% in terms of IPT reconstruction accuracy is observed. We also observe that the IPT reconstruction accuracy is lower for geometrically modified images compared to photometrically modified images. We tried to further analyze this difference in performance. Visual inspection revealed that the geometrically modified images appeared ‘more 133 Figure 6.11: Experiment 5: ROC curves for recognition of the original images with the photometri- cally and geometrically modified images using a COTS face matcher. The recognition performance is higher for geometrically altered images compared to photometrically modified images indicating high degree of similarity with the original images. similar’ to the original image compared to the photometrically altered images. This can be attributed to the restrictive parameter range used in geometric transformations. A restrictive parameter range ensures that the images are indeed near-duplicates. 
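For instance, a geometrically modified near-duplicate can be generated by drawing the rotation, scaling and translation parameters from deliberately narrow ranges, as in the sketch below; the bounds and the pad-and-crop strategy are illustrative assumptions rather than the exact settings used in our experiments.

import numpy as np
from scipy import ndimage

def geometric_near_duplicate(image, rng=None):
    """Apply a small rotation, scaling and translation to a 2D grayscale image."""
    rng = np.random.default_rng() if rng is None else rng
    angle = rng.uniform(-5.0, 5.0)             # restricted rotation (degrees)
    scale = rng.uniform(0.95, 1.05)            # restricted scaling factor
    tx, ty = rng.uniform(-5, 5, size=2)        # restricted translation (pixels)

    out = ndimage.rotate(image, angle, reshape=False, mode='nearest')
    out = ndimage.zoom(out, scale, mode='nearest')
    h, w = image.shape                         # pad/crop back to the original size
    ph, pw = max(0, h - out.shape[0]), max(0, w - out.shape[1])
    out = np.pad(out, ((0, ph), (0, pw)), mode='edge')[:h, :w]
    out = ndimage.shift(out, (ty, tx), mode='nearest')
    return np.clip(out, 0, 255)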
A wide variation in the parameter values may result in highly dissimilar images, thereby, destroying the notion of near-duplicates. To quantify the degree of similarity between the original images and the modified images, we performed face recognition using a commercial face matcher. In the face recognition experiment, the original images served as the gallery and the geometrically modified images served as the probe samples. We repeat this same process for photometrically modified images belonging to Set I of Experiment 2. ROC curves presented in Figure 6.11 indicate a true match rate of 96.27% at a false match rate of 0.01% for the geometrically altered images, and a true match rate of 91.87% at a false match rate of 0.01% for the photometrically altered images. Note, the probe and gallery sizes for the photometrically modified images are four times more than that for the geometrically modified images. Nonetheless, the basis functions can better handle the identification of the original image (root node) and reconstruction of the IPT compared to the existing method. 134 Table 6.11: Experiment 5: Root identification and IPT reconstruction accuracies for geometric transformations. The top two rows indicate the baseline algorithms. The baselines yield only one root node as output so results are reported only at Rank 1 and the remaining ranks are indicated Not Applicable (NA). In this experiment, the testing (TE) is always done on geometrically modi- fied images (indicated by TE-GM) but the training (TR) can be done using either photometrically modified images (indicated by TR-PM) or geometrically modified images (TR-GM). Results indi- cate training on geometrically modified images yield best performance when tested on geometric transformations. Method Protocol Root identification accuracy at Ranks 1/2/3 (%) IPT reconstruction accuracy (%) Legendre Chebyshev Baseline (SURF + MSAC) + Oriented Kruskal Gaussian RBF + Oriented Kruskal TR-PM, TE-GM TR-GM, TE-GM TR-PM, TE-GM TR-GM, TE-GM TR-PM, TE-GM Gabor TR-GM, TE-GM Gaussian RBF TR-PM, TE-GM TR-GM, TE-GM TR-PM, TE-GM TR-GM, TE-GM Bump RBF 8.00 / NA / NA 3.62 7.24 27.20 / NA / NA 52.78 14.20 / 24.40 / 37.00 54.30 23.20 / 39.60 / 52.80 51.40 7.00 / 13.40 / 19.60 23.20 / 35.60 / 49.20 59.81 57.75 25.80 / 41.20 / 52.60 25.60 / 40.00 / 54.60 58.49 25.40 / 42.40 / 57.00 55.67 58.60 / 75.80 / 86.00 51.40 49.82 7.80 / 17.60 / 31.00 46.60 / 65.80 / 77.00 56.32 6.4.6 Results of Experiment 6 Examples of the near-duplicates retrieved from the internet and their corresponding IPT recon- structions are presented in Figure 6.12. Qualitative analysis indicates that the reconstructed IPTs can depict the relationship between the near-duplicates reasonably well. For example, the images 3, 4 and 1 for the Bob Marley images (see Figure 6.12(c)) should ideally follow the sequence as indicated by the IPT. Similarly, image 2 appears to be a cropped version of image 3 which is correctly constructed by the proposed method. 6.4.7 Results of Experiment 7 We used parameter distributions learnt from both photometrically modified images and geometri- cally modified images for IPT reconstruction. The results are reported in Tables 6.12 6.13 and in 135 (a) Bob Marley near-duplicates (b) IPT constructed using Ga- bor (c) IPT constructed us- ing Gaussian RBF (d) Britney Spears near-duplicates (e) IPT constructed using Chebyshev Figure 6.12: Experiment 6: Examples of near-duplicates available online and their corresponding IPTs constructed using the proposed method. 
The first row corresponds to (a) 4 near-duplicates retrieved using the query Bob Marley, (b) IPT constructed using Gabor trained on photometric distribution (the top 3 candidate root nodes are 2,3,1) and (c) IPT constructed using Gaussian RBF trained on geometric distribution (the top 3 candidate root nodes are 3,2,1). The second row corresponds to (d) 7 near-duplicates retrieved using the query Britney Spears and (e) IPT constructed using Chebyshev trained on photometric distribution (the top 3 candidate root nodes are 2,4,5). The bold arrows indicate immediate links and the dashed arrows indicate ancestral links. 136 terms of root identification and IPT reconstruction accuracies. (a) h (b) (c) Figure 6.13: Example images from the CelebA dataset containing prominent background details in the face images. 1. For the near-duplicates generated using autoencoder, the Gabor model performs the best in terms of root identification accuracy (Rank 1 accuracy of 81.25%) and the Chebyshev polynomials perform the best in terms of IPT reconstruction accuracy (64.37%). Gabor outperforms the remaining basis functions due to the characteristics of the CelebA dataset that includes face images along with with background details (see Figure 6.13). Gabor as a texture descriptor is able to accurately model the change in the texture of the background regions in the outputs reconstructed using the autoencoder. 2. For the near-duplicates generated using image augmentation package, maximum root iden- tification accuracy of 50% at Rank 3 is obtained by Bump RBF while Chebyshev obtains an IPT reconstruction accuracy of 75.20%. The image augmentation packages include elastic distortions that are complex non-linear deformations, that the basis functions did not model accurately. 3. For the near-duplicates generated using the BeautyGlow network, we only performed qualitative evaluation due to the small size of the test set. We observe that the IPT constructed in Figure 6.6(b) indicates a deep tree with gradual increase in the intensity of the make-up which is anticipated. We observed that the basis functions perform better when trained on photometrically modified images compared to when they are trained on geometrically modified images, in the case of both autoencoder and Augmentor. This is perhaps due to the fact that the autoencoder does not introduce geometric modifications; the modifications are restricted to structural and textural details. On the other hand, the image augmentation library produces random geometric modifications such as 137 Table 6.12: Experiment 7: Root identification and IPT reconstruction accuracies for deep learning- based transformations. In this case, the near duplicates are generated using autoencoder . For near-duplicates generated using autoencoder Basis functions Performance when trained on photometric transformations Root identification IPT accuracy (%) reconstruction @ Ranks 1/2/3 accuracy (%) 25.00 / 50.00 / 68.75 50.63 25.00 / 56.25 / 62.50 64.37 81.25 / 100 / 100 55.00 55.00 57.50 Legendre Chebyshev Gabor Gaussian RBF 87.50 / 93.75 / 93.75 Bump RBF 56.25 / 68.75 / 87.50 Performance when trained on geometric transformations Root identification IPT accuracy (%) reconstruction @ Ranks 1/2/3 accuracy (%) 62.50 25.00 / 50.00 / 62.50 37.50 / 43.75 / 68.75 46.88 31.25 / 43.75 / 56.25 49.38 50.62 43.75 / 56.25 / 68.75 25.00 / 37.50 / 43.75 55.00 Table 6.13: Experiment 7: Root identification and IPT reconstruction accuracies for deep learning- based transformations. 
In this case, the near duplicates are generated using image augmentation schemes for training deep neural networks. For near-duplicates generated using image augmentation schemes Basis functions Performance when trained on photometric transformations Root identification accuracy (%) @ Ranks 1/2/3 16.00 / 27.00 / 40.00 Legendre 13.00 / 24.00 / 39.00 Chebyshev Gabor 12.00 / 25.00 / 36.00 Gaussian RBF 17.00 / 30.00 / 43.00 17.00 / 32.00 / 50.00 Bump RBF IPT reconstruction accuracy (%) 74.00 75.20 71.93 65.20 66.20 Performance when trained on geometric transformations IPT Root identification reconstruction accuracy (%) accuracy (%) @ Ranks 1/2/3 71.27 16.00 / 30.00 / 42.00 72.00 18.00 / 29.00 / 43.00 70.27 18.00 / 30.00 / 40.00 19.00 / 30.11 / 41.00 67.67 12.00 / 26.00 / 44.00 67.40 rotation and re-sampling, but uses more sophisticated techniques to remove some of the artifacts associated with such operations (e.g., removes the black padding near the borders of a rotated image). In such cases, we speculate that the basis functions view the geometrically altered image as a photometrically modified image. This explains the better performance of the basis functions when trained on photometrically modified images compared to geometrically modified images. 6.4.8 Further Analysis In addition to the seven experiments discussed above, we performed another four experiments for further analysis: (i) to analyze the performance of the proposed algorithm on steganography, (ii) to evaluate the performance of the proposed method on handling arbitrary number of nodes (images), (iii) to analyze the impact of missing nodes on the performance of the method, and (iv) 138 to examine the impact of demographic influences on the method. To achieve the first task, we used near-duplicates generated using steganographic algorithm. Steganography refers to embedding a hidden message within the ‘cover image’ to generate a ‘stego image’. Both cover and stego images have visually indiscernible differences, and the hidden message can be deciphered using dedicated steganalysis techniques. The amount of bits per pixel that can be distorted in the cover image to embed the secret message is referred to as the payload. Since, the cover images and stego images can be considered near-duplicates, we have demonstrated the performance of our image phylogeny tree (IPT) construction algorithm on a well known stegonagraphic algorithm called S-UNIWARD [87]. This refers to a universal distortion function for steganography in an arbitrary domain using directional filter banks. We used the S-UNIWARD tool3 to create near-duplicate stego images using different payload values {0.1, · · · , 0.9}. We then applied five different basis functions for identifying the root node (original image) and IPT reconstruction. For the remaining three tasks, we evaluated only using Chebyshev polynomial as it has consistently shown good performance across other experiments. To accomplish the second task, we constructed 100 IPTs, each IPT comprising 10, 20, 30, 40 and 50 nodes resulting in a total of 15,000 near-duplicates. To accomplish the third task, we randomly removed 20%, 40%, 60% and 80% of the nodes from 100 IPTs, each IPT comprising 20 nodes. 
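The node-removal protocol used for the third task can be sketched as follows; the IPT is represented simply as a set of directed (parent, child) edges, and keeping the root node is an assumption of this sketch rather than a stated requirement of the experiment.

import random

def remove_nodes(nodes, edges, root, fraction, seed=0):
    """Randomly drop a fraction of the nodes of an IPT; keep only edges whose endpoints survive."""
    rng = random.Random(seed)
    candidates = [n for n in nodes if n != root]
    n_drop = min(int(round(fraction * len(nodes))), len(candidates))
    dropped = set(rng.sample(candidates, n_drop))
    kept_nodes = [n for n in nodes if n not in dropped]
    kept_edges = {(p, c) for (p, c) in edges if p not in dropped and c not in dropped}
    return kept_nodes, kept_edges

# e.g., dropping 40% of a 20-node IPT whose edges are stored in ipt_edges:
# kept_nodes, kept_edges = remove_nodes(list(range(20)), ipt_edges, root=0, fraction=0.4)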
Finally, to accomplish the fourth task, we selected 178 subjects from the UNCW MORPH (Academic) dataset4 belonging to five different ethnic groups: 41 subjects belonging to Asian demographic group, 33 subjects belonging to Black demographic group, 38 subjects belonging to Hispanic demographic group, 29 subjects belonging to Asian demographic group and 37 subjects belonging to White demographic group, and created 178 IPTs comprising 10 nodes resulting in a total of 1,780 near-duplicates. For the stego images demonstrated in Figure 6.14, the best performing basis function was the Bump radial basis function which identified the correct root node at Rank 2 and achieved 55.56% IPT reconstruction accuracy. Note that in this example, SSIM varied between 97.54% 3http://dde.binghamton.edu/download/stego_algorithms/ 4https://ebill.uncw.edu/C20231_ustores/web/product_detail.jsp?PRODUCTID=8 139 Figure 6.14: Illustration of steganographic images generated using the S-UNIWARD algorithm (on the left), and the differences in the coefficients in the DCT domain between the cover image and the stego image at each depth level (on the right). 140 and 99.81%. In spite of the extremely strong similarity between the ten images, our method was able to identify the correct root node with 66.67% accuracy, and recovered more than half of the edges, including ancestral and immediate links. For the task involving arbitrary number of nodes, we observed that the method can handle upto 20 nodes (75% root identification accuracy and 70% IPT reconstruction accuracy), but the structures become deeper after 30 nodes, and also the images start losing biometric utility due to severe modifications resulting in overall degradation in root identification and IPT reconstruction accuracy. For the task involving missing nodes, we observed the best performing results to be 85% in terms of root identification and 60% in terms of IPT reconstruction accuracy. Finally, as far as demographic influences are concerned, we observed that the root identification is slightly biased towards the ‘White’ group, possibly due to the training set involving LFW dataset which has a majority of White subjects. However, the IPT reconstruction accuracy was observed to be consistent across different demographic groups. Next, we highlight the main take-away points from all the experiments. 1. Radial basis functions performed best at modeling the baseline photometric transformations (Brightness adjustment, Gamma transformation, Gaussian smoothing and Median filtering) and also geometric transformations. They resulted in the lowest residual error when used for modeling the transformations. 2. Orthogonal polynomials performed best at reliably discriminating between the forward and reverse directions. They resulted in the highest root identification and IPT reconstruction accuracies in a majority of the cases involving photometric and geometric modifications. 3. The proposed approach of utilizing “likelihood ratio" generalizes well across multiple IPT configurations and different biometric modalities. 4. The proposed approach is capable of handling different classes of transformations such as photometric and geometric. In addition, they are robust to unseen transformations using deep learning tools and image editing software. 141 (a) (b) (c) Figure 6.15: Toy example demonstrating the effect of insertion of spurious edge on the von Neumann entropy. 
(a) Groundtruth IPT (b) Correctly reconstructed IPT with spurious edge (c) Incorrectly reconstructed IPT with spurious edge. Note, the spurious edge is indicated by dashed line. Table 6.14: Approximate von Neumann entropy for analysis of spurious edges and missing edges in reconstructed IPTs. The mean and the standard deviation of the differences between the entropy of the ground truth and the reconstructions are reported. Low values indicate accurate reconstructions and smaller number of spurious as well as missing edges. Basis Function Entropy (Mean and standard deviation) -0.0009 ± 0.0045 Legendre6 -0.0000 ± 0.0044 Chebyshev -0.0036 ± 0.0039 Gabor -0.0035 ± 0.0028 Gaussian RBF -0.0034 ± 0.0030 Bump 6.5 Explanatory Model Finally, in this section, we present a graph entropy-based explanatory model which analyzes the failure cases of the IPTs reconstructed using the proposed method. A failed IPT reconstruction can involve (i) missing edges, and (ii) spurious edges. Such failure cases can be quantified using approximate von Neumann entropy computed for directed graphs. Graph entropy is computed in terms of its in-degree and out-degree [162]. Consider a directed graph (, ) with a set of nodes denoted by  and the set of edges denoted by . The adjacency matrix of such a graph is defined as,  ∈  ∈ ;   = . The von 142 1 0  = if (, ) ∈ , otherwise. The in-degree and out-degree of node  are presented as   = Neumann entropy for the directed graph  can be written as:  = 1 − 1 || − 1 2||2 (cid:20)  (,)∈    2   +  (,)∈2 (cid:21) . 1     (6.5.1) Here, 1 = {(, )|(, ) ∈  and (, ) ࢡ }; 2 = {(, )|(, ) ∈  and (, ) ∈ }, such that,  = 1 ∪ 2 and 1 ∩ 2 = ∅. The maximum value of entropy for a directed graph is equal to 1− 1|| for a star graph where all the nodes in the graph have either incoming links or outgoing links but not both. The minimum entropy is obtained for a cyclic graph, implying all nodes are fully 2|| = 1− 1.5|| . In our connected, and the value of minimum entropy is 1− 1|| − 1 work, we want to move toward maximum entropy since minimum entropy results in the worst case scenario of reconstruction where the result is a cyclic graph. Thus, Eqn. (6.5.1) can be simplified as follows. 2||2 || = 1− 1|| − 1  = 1 − 1 || − 1 2||2 (cid:20)  (,)∈1 (cid:21)    2   . (6.5.2) The above equation is applicable in our work since 2 = ∅, otherwise, we will end up having a cycle in the reconstructed IPT. We hypothesize that spurious edges decreases the entropy of the IPT, which is further reduced if the IPT misses correct edges. This can be illustrated using a toy example presented in Figure 6.15. The reconstruction yields a 100% IPT reconstruction accuracy for the first reconstructed IPT (Figure 6.15(b)), which has no missing edge but one spurious edge, and 50% for the second reconstructed IPT (Figure 6.15(c)), which has one missing edge and one spurious edge. The von Neumann entropy as computed from Eqn. (6.5.2) for the ground truth configuration (Figure 6.15(a)) is, () =1− 1 =0.67. 3 Note that the in-degree of node 1 is 0. This value also corresponds to the maximum entropy of a directed graph with 3 nodes. The entropy for the first reconstructed IPT (Figure 6.15(b)) is, − 1 ( 1) =1−1 =0.64. 
The entropy 3 18  1 2 2   1 3 2  − 1 =1−1 3 18 =1− 1 3 − 1 18 − 1 18 0+0+ (cid:20) (cid:20) (cid:20) (cid:21) (cid:21) (cid:20) (cid:21) (cid:21) +  1 2 2  1 +  1 3 2  1 +  2 3 2  2 0 1 1 for the second reconstructed IPT (Figure 6.15(c)) is, ( 2) =1−1 3 1 (cid:20) 2 ×(1)2 − 1 18  1 2 2  1 +  2 3 2  2 (cid:21) = = 0.61. Thus, () > ( 1) > ( 2). This demonstrates that (cid:20) (cid:21) 1− 1 3 − 1 18 0+ 1 1 ×(1)2 143 inaccurate IPTs (missing edges and spurious edges) can reduce the entropy from the ground truth entropy, and this property can be leveraged for evaluating the goodness of the reconstructed IPTs. Reconstructed IPTs with missing edges and spurious edges will tend to have lower entropy. We compute this entropy-based measure to analyze the accuracy of the IPTs reconstructed using the proposed method for the face images (full set with 2,727 IPTs). The maximum and minimum entropy for a graph containing 10 nodes is 0.85 and 0.90, respectively. The von Neumann entropy as computed from Eqn. (6.5.2) for the ground truth IPT (see Figure 6.2(a)) is 0.89. We then compute the entropy corresponding to the reconstructed IPTs. Finally, we compute the difference between the entropy of the ground truth configuration and the reconstructed configurations. We report the mean and the standard deviation of the differences in the entropy in Table 6.14. The results indicate that the differences between the ground truth entropy and the entropy of the reconstructed IPTs are the smallest when the polynomials are utilized as basis functions. This further corroborates our findings that the polynomials result in the best IPT reconstruction accuracies (see Tables 6.4 - 6.9). 6.6 Summary In this chapter, we presented a method to model pairwise photometric and geometric transfor- mations between a set of near-duplicate biometric images using five basis functions. The modeling of the transformations is used in conjunction with the likelihood ratio based asymmetric mea- sure to identify the original image and deduce how the images are related to each other. We performed comprehensive experiments on a large dataset comprising 27,270 images belonging to face, iris and fingerprint modalities. We further conducted evaluations on unseen modalities and unseen transformations. The proposed method was robust against unseen modalities as well as unseen transformations accomplished using image editing software, steganography and deep learning-based methods. The proposed method capably handled arbitrary number of nodes (im- ages), missing nodes and near-duplicates downloaded from the Internet reasonably well. We also analyzed the performance of the basis functions using visualization aid (-SNE) and observed that some of the photometric transformations are easier to model than others. Finally, we utilized 144 von Neumann directed graph entropy to evaluate the reconstructed IPTs. The proposed algorithm outperformed existing state-of-the art methods by upto 37% in terms of root identification accuracy and by upto 47% in terms of IPT reconstruction accuracy. 145 CHAPTER 7 GRAPH-BASED APPROACH FOR IMAGE PHYLOGENY FOREST 7.1 Introduction In this chapter, we propose a novel image phylogeny construction technique that employs sensor pattern noise and graph convolution technique to (re)construct an image phylogeny forest. The idea is to develop an unified framework that integrates sensor-specific details and image-specific details for the task of image phylogeny. 
An image phylogeny forest (IPF) comprises multiple phylogeny trees, and each tree consists of a distinct root node and can have arbitrary structure. This removes the earlier constraint of having a single root node, and that all images should be related. The task of IPF construction can be considered as an extension of image phylogeny tree (IPT) construction but with the reject class option. It implies that all the images may not typically belong to a single IPT. Instead, there can be cases when some images may be outliers (anomalies) and will remain as singletons, or may belong to a different IPT. To accomplish this objective of identifying which images belong to which IPT, we present a novel locally-scaled spectral clustering technique to identify the number of clusters (IPTs). Following the clustering process, we utilize sensor pattern noise (PRNU) with graph convolutional network to construct each IPT. By repeating this process, for all the clusters (IPTs) concerned, we achieve IPF construction. Also, this is different from our previous work, because the basis functions perform a pairwise analysis, while the current work performs both global and local analysis. It applies spectral graph theory to all the images simultaneously, performing a global analysis; followed by sensor pattern noise features to perform link prediction between pairs of nodes, thereby performing a local analysis. In this chapter, the objective is as follows: Given a set of near-duplicate face images, our objective is to construct an IPF. See Figure 7.1. We propose a novel IPF construction method with two components: (i) An improved spectral clustering algorithm which uses locally-scaled kernels to address the multi-scale issue for clustering the images correctly. (ii) A graph-based approach 146 in conjunction with sensor noise pattern features extracted from the near-duplicate set in order to examine both local (pairwise) and global (all images simultaneously) interactions to accurately construct the IPT corresponding to each cluster. The contributions of this work are as follows: 1. We develop a locally-scaled spectral clustering technique using features extracted from near- duplicate images to identify and distinguish between images belonging to disparate IPTs. This step also helps in determining the number of IPTs in an IPF. 2. We develop a technique for IPT construction by combining a graph convolutional network (GCN) with sensor noise pattern features to harness the capacity of both local and global analyses. The GCN serves as a node embedding module to determine global relationships between near-duplicate images by identifying the hierarchical position of an image in the IPT. We use sensor noise pattern features to inspect local relationships for link prediction between the original image and the transformed image. By doing so, we have designed a method that leverages both degrees of analysis for an accurate determination of the relationships between the near-duplicates. Finding relationships between near-duplicates is not a trivial task as the transformations used to generate a child node from a parent node may not have a closed form representation or may have a vast range of parameter space making the modeling of the transformations extremely difficult, and most importantly there may not be a unique mapping from the parent image to the child image. Therefore, we need an approach that can perform a holistic analysis to reliably determine how the images are related to each other. 3. 
We evaluate the proposed clustering technique on simulated examples (near-duplicates gen- erated using deep learning transformations) and real world examples (images downloaded from the Internet) and compare its performance with conventional spectral clustering. 4. We perform a rigorous analysis of the proposed IPT construction method (node embedding and link prediction) by evaluating on photometrically and geometrically modified images and across unseen transformations, unseen IPT configurations and across different biometric modalities. 147 Figure 7.1: Outline of the objective in this work. Given a set of near-duplicate face images belong- ing to the same subject (near-duplicates can be generated using either photometric or geometric transformations or both), our objective is two-fold. Firstly, we would like to filter out the images that do not belong to the same evolutionary structure. We achieve this by using a locally-scaled spectral clustering step. The clusters indicated by ellipsoids vary in diameter indicating the importance of local scaling. Secondly, for each cluster, an image phylogeny tree (IPT) is constructed. The ensemble of IPTs result in the desired output corresponding to an Image Phylogeny Forest (IPF). 5. Finally, we evaluate their joint performances in the context of IPF construction for face images acquired using different cameras and with different expressions and compare with state-of-the art baseline methods. 7.2 Proposed Method The proposed IPF construction method has been broadly outlined in Figure 7.1. The proposed method has two phases or steps - grouping and phylogeny. Grouping involves clustering the input near-duplicates into disjoint groups. Intuitively, this implies that images which should not belong to the same IPT must be assigned to separate groups or clusters. Several clustering algorithms [93] exist such as k-means [110], density based clustering [141], spectral clustering [126], etc. but they suffer from some limitations. For example, k-means requires input value of number of clusters , and is sensitive to noise and outliers. On the other hand, density-based clustering methods although effective for arbitrary shaped clusters, is sensitive to the parameter selection. Spectral clustering conceptualizes data points as nodes in the graph and interprets the distances between data points as 148 walks on the graph lying on a non-linear manifold. Determining the clusters requires computing the distances between data points, and in turn, a non-linear dimensionality reduction by using the eigen vectors of the graph Laplacian. The steps of spectral clustering can be summarized as follows: 1. Construct a  ×  symmetric similarity matrix  from the input data points  ∈ R× and its corresponding weighted adjacency matrix . 2. Compute the graph Laplacian matrix  from . 3. Perform eigen decomposition on  and select  eigen vectors corresponding to  smallest eigen values  = [1, 2, · · · , ], where  ∈ R× and each  is a -dimensional column vector. 4. Apply k-means on the rows of , (, :) to obtain cluster assignment for each ,  = 1, · · · , . (cid:18) (cid:107) −  (cid:107)2 (cid:19) 2 Spectral clustering suffers from a major limitation that it is incapable of handling multi-scale cases as discussed in [29]. The notion of scale arises in the similarity matrix  construction used in the first step of spectral clustering. It uses a kernel, Ò (preferably a smooth kernel such as an exponential decay function) to compute the affinity between a pair of points ( ,  ) as (, ) = Ò . 
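For reference, the four steps above can be realized compactly as in the sketch below, which uses a Gaussian kernel with a single global bandwidth sigma (the quantity discussed next). This is a generic sketch of conventional spectral clustering, not the proposed locally-scaled variant.

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma):
    """X: n x d data matrix, k: number of clusters, sigma: global kernel bandwidth."""
    D2 = cdist(X, X, 'sqeuclidean')                         # pairwise squared distances
    S = np.exp(-D2 / (sigma ** 2))                          # global-bandwidth Gaussian kernel
    d = S.sum(axis=1)
    L = np.eye(len(X)) - S / np.sqrt(np.outer(d, d))        # normalized graph Laplacian
    evals, evecs = np.linalg.eigh(L)                        # eigenvalues in ascending order
    U = evecs[:, :k]                                        # k eigenvectors with smallest eigenvalues
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)   # cluster the rows of U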
The parameter  refers to the bandwidth variable, and is usually computed by training on a large number of data points. A correct computation of  is pivotal for accurate clustering but is governed by the implicit assumption that all the clusters have similar scales, i.e., approximate uniform distribution of data points across all the clusters. This will result in inaccurate clustering when the number of data points within a cluster varies. As shown in Figure 7.2(a), a single global bandwidth results in two clusters because the bandwidth considers the data points ‘x’ and ‘o’ to be grouped together. An effective strategy to deal with the multi-scale issue is to consider locally-scaled kernels which vary with the data points and is therefore invariant to sampling density [29]. Then the similarity matrix can be formulated as (, ) = Ò . ( )(  ) The variables ( ) and (  ) are known as local bandwidths as they are locally tuned to the data point under consideration. As a result the data points in multi-scale clusters are grouped correctly (cid:18) (cid:107) −  (cid:107)2 (cid:19) 149 (a) (b) Figure 7.2: Illustration of the proposed spectral clustering which uses locally-scaled kernels (bot- tom) instead of a single kernel with a global bandwidth  (top). The number of images in each cluster, i.e., the density of each cluster is not known apriori. The global bandwidth incorrectly merges two clusters. On the other hand, the local scales (1, 2 and 3) are computed assuming that clustering is inherently a geometric problem resulting in three correct clusters in this example. as depicted in Figure 7.2(b). This can be rationalized by leveraging a geometric interpretation of spectral clustering. Spectral clustering can be treated as a technique “for representing function spaces on a manifold" [29]. 7.2.1 Locally-scaled spectral clustering The multi-scale limitation posed by conventional spectral clustering concerns our work as we do not assume the scale or the size of the cluster apriori, i.e., the number of nodes (images) in each IPT. It may be that some IPT have 3 nodes, whereas another IPT may have 15 nodes. Such wide variations will impair the performance of the spectral clustering (also shown empirically later), and will lead to an overall poor performance in IPF construction. Taking the geometric interpretation of spectral clustering into account, we propose locally-scaled spectral clustering, where we apply the local- scaling on the feature space (  ( )) instead of directly applying it on the data space (). One of the advantages of applying it on the feature space is that it will allow aggregation of multiple features that can produce an accurate similarity matrix, and subsequently accurate clustering results. We selected three features in this work: (i) pixel intensity, (ii) sensor noise pattern features, particularly Enhanced Photo Response Non-Uniformity (PRNU) [106], and (iii) face descriptors. The choice for selecting pixel intensity and PRNU features stem from the assumptions that the near-duplicates may have been taken in different settings (indoor or outdoor) or using different cameras. Therefore, 150 pixel intensity can help discern between images depending on illumination variations, whereas, PRNU which contains sensor-specific details can help disambiguate between images captured using different cameras. Lastly, as we are using face images which may vary in expression or pose, a face descriptor may help distinguish between near-duplicates belonging to different evolutionary sequences. 
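As a simplified illustration of local scaling over these three features, the sketch below combines one cosine-distance matrix per feature type into a single affinity and uses each point's distance to its k-th nearest neighbour as its local bandwidth (the local-scaling heuristic of Zelnik-Manor and Perona). The bandwidths actually used in this work are derived from univariate kernel density estimates of the features, as described next, so this sketch is an approximation of the idea rather than the exact formulation.

import numpy as np
from scipy.spatial.distance import cdist

def local_scales(D, k=7):
    """D: n x n distance matrix; returns each point's distance to its k-th nearest neighbour."""
    return np.sort(D, axis=1)[:, min(k, D.shape[1] - 1)]

def locally_scaled_affinity(pixel_feats, prnu_feats, face_feats, k=7):
    """Each argument is an n x d feature matrix for the same n images."""
    n = len(pixel_feats)
    S = np.ones((n, n))
    for F in (pixel_feats, prnu_feats, face_feats):
        D = cdist(F, F, 'cosine')                           # 1 - cosine similarity
        sigma = local_scales(D, k)
        # product of per-feature kernels = exponential of the summed locally-scaled distances
        S *= np.exp(-(D ** 2) / (np.outer(sigma, sigma) + 1e-12))
    return S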
We obtained face descriptors using a neural network architecture such as VGGFace [132] in this work. We can always substitute face descriptors with generic image descriptors for images depicting natural scenes. The intuition is to harness the capability of complimentary features to augment the construction of an accurate weighted symmetric similarity matrix . We use a smooth exponential decay function as our kernel (Ò(·)) for computing the similarity matrix as suggested in [29]. The resultant similarity matrix now becomes (cid:18)  (, )2 ˆ() ˆ( ) +  (, )2 ˆ() ˆ( ) +  (, )2 ˆ() ˆ( ) (cid:19) (, ) = exp− (cid:19)  (cid:18) (·)−() , where (·) is the PRNU extracted from the image;  (, ) = 1 −    2  2  , where, (·) is the face descriptor extracted from the image;  (, ) = Here,  (, ) = 1−   2  2  , where, (·) 1 −    2  2  is the pixel intensity features extracted from the image. The terms in the denominator are locally- scaled bandwidths computed using univariate kernel density estimate of the respective features. ˆ(·) corresponds to kernel density estimate (KDE) computed from face descriptors, ˆ(·) = 1 , where  is the number of data points, Ò is the bandwidth equal to the interval Ò between all the points in  and  is a normal kernel. Similarly, ˆ(·) denotes KDE computed from the PRNU features, and ˆ(·) denotes KDE computed from the pixel intensity features. Next, we compute binarized similarity matrix  using median of each row as threshold, such that (, ) = 1 if (, ) > ((, :)), and (, ) = 0, otherwise. Then we follow the steps pertaining to conventional spectral clustering, such as computation of the degree matrix, Laplacian matrix, and eigen decomposition. The degree matrix is computed as (, ) = (, ). The  =1  Ò −1 2 . Eigen value decomposition of the normalized Laplacian is computed as  =  Laplacian matrix results in eigen vectors () and eigen values ('). Next, we apply a threshold  (selected using a validation set) to obtain a subset of eigen values, say  smallest eigen values such −1 2   =1 151 that, {' ⊂ '|' < }. The corresponding eigen vectors become  = [1, 2, · · · , ]. Finally, k-means is applied to the rows of  to get cluster assignments, C ∈ R×1 for each image, such that C = k-means( , ), where,  is the number of clusters. The details of the method are outlined in Algorithm 7. Algorithm 7: Locally-scaled spectral clustering 1: Input: Set of  near-duplicate images  2: Output: Cluster assignments C 3: Compute the face descriptors  from images (we used VGGFace) 4: Compute the PRNU  from images (we used Enhanced PRNU) 5: Compute the vectorized representation of the pixel intensities  from images 6: Compute kernel density estimate for face descriptors, PRNU features and pixel features, re- 7: while ,  ≤  do 8: Compute distance values between each pair of spectively: ˆ(·), ˆ(·), ˆ(·) (,  ) face de- respectively (we used cosine distance) images for features, scriptors, PRNU features and pixel ;  (, ) = 1 −     (, ) = 1 −   (cid:18) (, )2 2  2 2  2   Compute symmetric similarity matrix : ˆ() ˆ( ) +  (, )2 (, ) = exp− ˆ () ˆ ( ) + (, )2 ˆ() ˆ( ) 9: ;  (, ) = 1 −    2  2  . 
(cid:19) 10: end while 11: Compute binarized similarity matrix  using median of each row as threshold 12: Compute diagonal degree matrix  13: Compute normalized Laplacian  14: Perform eigen value decomposition to obtain eigen vectors () and eigen values (') from 15: Select thereshold  to select  smallest eigen vectors ':  = [1, 2, · · · , ] 16: Apply k-means clustering on the rows of  to get cluster assignments C for each image return Laplacian C Upon obtaining the clusters using the proposed locally-scaled spectral clustering, we then proceed to the second phase of our IPF pipeline, i.e., individual IPT construction. We have discussed different IPT construction strategies earlier but they all focus on either pairwise analysis of the images or global analysis of the entire set. In contrast, we propose a novel method that couples graph-based convolutional network (GCN) to perform a macroscopic analysis of the entire set and sensor noise pattern features to explore microscopic analysis of the images. In order to develop a global analysis of the near-duplicates, we must first define what global or macroscopic relationships 152 Figure 7.3: Illustration of the ‘node embedding’ module (Section 7.2.2). The module  (, ) accepts a pair of inputs, : pixel intensity values of each image in the IPT and : an adjacency matrix indicating relationships between the images in the IPT. The output of this module is a vector of depth labels corresponding to each IPT configuration fed as input. mean in the context of an IPT. An IPT represents a hierarchical structure, where each image is located at a depth . The depth of an image in an IPT signifies the number of ancestors ( − 1) for that image, with respect to the root node, which is located at depth=1. For example, an image at depth=2 has only 1 ancestor, i.e., the root node, while an image at depth=3 has 2 ancestors, and so on. The determination of the depths of the images in an IPT will provide a holistic understanding of how the images are globally related with respect to the remaining images. Therefore, the first step in our proposed approach is determining the depth labels for a set of near-duplicate images that are related to each other. We use a ‘node embedding’ module to accomplish the task of deducing the depth labels. Once the depth labels are computed for each image, we will determine how nodes (images) at successive depths are related to each other. For example, the root node will have only outgoing links, sibling nodes will not have any links between them, and leaf nodes will have only incoming links. We use a ‘link prediction’ module that utilizes sensor pattern noise (PRNU) to accomplish the task of identifying existence of links between nodes located at depth  and depths >  by performing the microscopic analysis. The details of both modules are described next. 153 7.2.2 GCN-based Node Embedding The task of this module is to accept a set of related near-duplicate images  ∈ R ×× and output a vector of depth labels  ∈ R . Here,  refers to the number of near-duplicate images belonging to a set (number of images belonging to an IPT or cluster which is less than , the total number of images we began with prior to clustering), and each image is of size  × . Let  (·, ·) represent the node embedding module which requires a pair of inputs  and . Therefore,  =  ( , ). Here,  refers to the pixel intensity values of images ,  ∈ R ×2.  represents the adjacency matrix of size  ×  that represents some relationship between the images. 
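Although the specific graph networks we evaluated are described below, the overall shape of the node embedding module f(X, A) can be conveyed by a minimal numpy sketch of two stacked first-order graph convolutions producing per-node depth scores. The weights would be learned with a cross-entropy loss and are random placeholders here, as are the hidden size and the number of depth classes.

import numpy as np

def normalize_adjacency(A):
    """Renormalization trick: A_hat = D^(-1/2) (A + I) D^(-1/2)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * np.outer(d_inv_sqrt, d_inv_sqrt)

def gcn_depth_scores(X, A, n_hidden=16, n_depths=4, seed=0):
    """X: N x (m*n) pixel features, A: N x N adjacency; returns N x n_depths depth logits."""
    rng = np.random.default_rng(seed)
    W0 = rng.normal(scale=0.01, size=(X.shape[1], n_hidden))
    W1 = rng.normal(scale=0.01, size=(n_hidden, n_depths))
    A_hat = normalize_adjacency(A)
    H = np.maximum(A_hat @ X @ W0, 0.0)      # first graph convolution + ReLU
    return A_hat @ H @ W1                    # argmax over each row gives the depth label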
The adjacency matrix can be computed using a nearest-neighbor based method utilizing features extracted from the images. We will discuss about the construction of the adjacency matrix in Section 7.3. Note that the adjacency matrix does not necessarily encode the IPT structure. See Figure 7.3. Node embedding associates a node (an image) with a corresponding label (depth in the IPT). Graph-based convolutional networks employ spectral convolutions in the Fourier domain to model pairwise as well as higher order correlations in the data. They have been successfully used for semi-supervised node classification [102] and representation learning [81]. In this work, we studied three graph based neural networks, viz., (i) Graph Convolutional Network (GCN) [55], (ii) Graph Convolutional Network with Linear Approximation (GCN-Linear) [102] and (iii) Hypergraph Neu- ral Network (HGNN) [67] to accomplish the task of node embedding. The underlying principle of a graph convolutional network is to perform spectral convolutions, in contrast to spatial convolutions, used in traditional convolutional neural networks. The spectral convolution filter is approximated using a truncated Chebyshev polynomial expansion of order . The approximation reduces the computation complexity, as well as provides spatial localization, absent in the case of spectral convolution filter. The reader is referred to [55] for a detailed derivation. Setting the order of the polynomial  = 1, results in the variant known as GCN-Linear [102]. The linearization allows stacking of multiple convolutional layers to build deeper models. Finally, we use a hypergraph neu- ral network [67] that allows hyperedges. The hyperedges combine multiple features extracted from the images simultaneously to determine high level correlations. This is in contrast to traditional 154 Figure 7.4: Illustration of the ‘link prediction’ module (Section 7.2.3). The module accepts a pair of inputs (, ), : depth labels from the ‘node labeling’ module (see Figure 7.3), and : sensor noise pattern (PRNU) features computed from each image of the set fed as input. The output of this module is the image phylogeny tree (IPT) containing edges directed from parent nodes to child nodes. Note that the ancestral links are present in the reconstructed IPT. graph based networks that typically use a single feature at a time. HGNN is developed on top of GCN-Linear ( = 1) and uses multiple convolutional layers to represent multiple features for the task of node embedding. 7.2.3 PRNU-based Link Prediction The task of this module is to accept the vector of depth labels  ∈ R  for a set of  near- duplicate images and output a data structure containing a set of vertices , such that || = , and directed edges . We refer this directed structure as the image phylogeny tree: (, ). Let (·, ·) represent the link prediction module that requires a pair of inputs  and . Therefore,  = (, ). Here,  refers to the depth labels computed using the node embedding step.  represents the sensor noise pattern features computed from the images , such that,  ∈ R ×2. The depth labels computed by the node embedding step do not indicate how nodes at a particular depths are linked with nodes ate lower depths. Multiple nodes can have an identical depth label (see Figure 7.4), say depth label = , but only one of them will be the parent of a node located at depth label  + 1. 
Additionally, there might some missing depth labels, in cases of incorrect outputs produced by the node embedding module (for example, multiple nodes with depth label=1). Therefore, the link prediction performs two tasks: (i) depth label correction, and (ii) pairwise link 155 inference. Algorithm 8: PRNU-based link prediction 1: Input: Set of  near-duplicate images , set of depth labels  provided by GCN, where || =  2: Output: Root  and  3: Extract the sensor noise pattern (PRNU) for each image () where  = 1, · · · ,  4: Identify whether multiple nodes have depth label = 1, if || > 1 go to Step 3, else go to Step 4 5: Perform depth label correction 6: Infer pairwise links 7: Identify the root node as the node with corrected depth label = 1,  8: Construct the phylogeny tree: (, )   represents the set of nodes and  represents the set of directed edges return  and  In order to achieve the first task, i.e., correcting missing depth labels, we first check whether multiple nodes have depth label=1, this is important because according to the definition of an IPT, it has only one root node (i.e., only one node will have depth label=1). Multiple root nodes exist if || > 1, where,  = {() == 1}, where  = 1, · · · , . We employed sensor noise pattern features present in images for depth label correction. Photo Response Non-Uniformity (PRNU) is a type of sensor pattern noise which manifests in the image as a result of the non-uniform response of the pixels to the same light intensity [112], and can be used for sensor identification. Photometric and geometric transformations can induce changes in the PRNU pattern present in an image [19, 112]. See Figure 7.5 to visualize how PRNU patterns change in presence of transformations. We used the power spectral density (PSD) plots to demonstrate that, although the images may not depict significant variation, but their PRNU patterns exhibit changes and can therefore be utilized for link prediction, as it can help successfully discriminate between parent and child nodes. So, we hypothesize that PRNU can be used to correctly deduce the node which should have depth label=1. To accomplish that, we first compute the Euclidean distance between the PRNU features of each of the candidate root nodes and the remaining nodes,  = (cid:107)((()) − ())(cid:107)2 2, where 1 ≤  ≤ ,  ࣔ ,  corresponds to PRNU for each image. Next, we retain that node which results in the highest distance as the correct root node, () = 1, where,  = (). The rationale behind retaining the node with highest distance 156 as the node with depth label=1, is because as the depth increases, it implies that the root node (original image) has undergone multiple sequences of transformations, which will subsequently lead to higher variation in PRNU patterns of the transformed images. As a consequence, the root node will intuitively have the highest distance in terms of PRNU features between itself and the remaining nodes in the set. The nodes that were misclassified as depth label=1 were then re-assigned to depth label=2. After the depth label correction procedure, we proceed to the second task to infer the links and construct IPT. In order to achieve the second task, i.e., determine the existence of links between nodes located at depth labels  and > , we again use the PRNU features. We will use the same notation () to indicate PRNU of each image  = {1, · · · , }. 
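Before turning to the pairwise link inference, the depth-label correction just described can be sketched as below; aggregating the PRNU distances from a candidate root to all remaining nodes by summation is an assumption of this sketch.

import numpy as np

def correct_root(prnu, depths):
    """prnu: N x d vectorized PRNU features; depths: length-N array of GCN depth labels."""
    depths = np.asarray(depths).copy()
    candidates = np.flatnonzero(depths == 1)
    if len(candidates) <= 1:
        return depths                                  # at most one candidate root: nothing to fix
    totals = []
    for r in candidates:
        others = np.delete(np.arange(len(prnu)), r)
        totals.append(np.sum(np.linalg.norm(prnu[others] - prnu[r], axis=1) ** 2))
    root = candidates[int(np.argmax(totals))]          # farthest (in PRNU space) from the rest
    depths[candidates] = 2                             # demote the misclassified candidates ...
    depths[root] = 1                                   # ... and keep the selected root at depth 1
    return depths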
Next we consider nodes located between successive depths, say, for example, nodes  and  that are assigned the same depth label, , by the node embedding module, and another node  that is assigned depth label  + 1. We have to identify the correct parent ( or ) of the node . For that, we compute the squared 2− norm between their PRNU features:  = (cid:107)(() − ())(cid:107)2 2. Finally, we select that node as the parent of  which results in the least distance, i.e.,  →  ( is the parent) if  < ; otherwise,  →  ( is the parent). The rationale behind this decision is that unrelated nodes or ancestors are more likely to result in higher Euclidean distance with respect to their PRNU feature vector, whereas, the immediate parent will result in lower distance. We repeat this step for nodes located at successive depths until we have reached the leaf node(s). Finally, we deduce the root node as the node with corrected depth label = 1,  = arg(() == 1), 1 ≤  ≤ , where,  represents the set of corrected depth labels. The steps of PRNU-based link prediction are summarized in Algorithm 8. 2 and  = (cid:107)(() − ())(cid:107)2 7.3 Implementation The implementation of the GCN and GCN-Linear is based on [102].1 The parameters used are as follows: learning rate = 0.01, number of epochs = 100, number of units in hidden layer = 16, 1https://github.com/tkipf/gcn 157 Figure 7.5: Illustration of utility of PRNU in image phylogeny. The graphic illustrates the variation in PRNU patterns in repsonse to photometric transformations. These variations are better visualized using the binary maps computed from each PRNU pattern (threshold=0) and their corresponding power spectral density (PSD) plots. Note, the PSD plots of the images do not bear any apparent variation, but the PSD plots of PRNU patterns reveal discernible differences. We intend to leverage this property of PRNU for the task of image phylogeny in conjunction with GCN. dropout = 0.5, weight decay = 5 × 10−4, early stopping = 10 (iterations) and degree of Chebyshev polynomial,  = {3, · · · , 9} (for GCN only). The cross-entropy loss function is employed. Both GCN and GCN-Linear require a feature matrix and an adjacency matrix. In our case, either the pixel intensity values or the PRNU features extracted from the images are used as the feature matrix of size  × 2, where,  is the number of images (in each set) and each image from the set is of size  × . For extracting the PRNU features, we used the method described in [106]. The method accepts a block diagonal matrix as the adjacency matrix input; each block represents a single  ×  adjacency matrix corresponding to one set of near-duplicates. During training, we used the actual IPT configuration to construct the adjacency matrix. An asymmetric matrix can be used as adjacency matrix provided correct normalization is used for computing the graph Laplacian −1 (−1  instead of  2 , where  is the degree matrix and  is the adjacency matrix). Note that this degree matrix and adjacency matrix (for each near-duplicate set) are different from the ones used in locally-scaled spectral clustering (multiple near-duplicate sets). During testing, we do not assume any prior IPT configuration or image transformation. For a set of test images, −1 2  158 Table 7.1: Photometric and geometric transformations and the range of the corresponding param- eters used in Experiments 2 and 3. The transformed images are scaled to [0, 255]. Note that these transformations are being used only in the training stage. 
For the test stage, any arbitrary transformation can be used. Level of Operation Parameters Transformations Brightness adjustment Global Median filtering Local Gaussian smoothing Global Gamma transformation Global Global Translation Scaling Global Global Rotation [a,b] size of window [m,n] m ∈ [2,6], n ∈ [2,6] standard deviation gamma [, ] Percentage theta Range a ∈ [0.9,1.5], b ∈ [-30,30]  ∈ [1,3]  ∈ [0.5,1.5]  ∈ [5,20],  ∈ [5,20] [90%, 110%]  ∈ [−50, 50] first we compute their respective PRNU features, next we compute the squared 2−norm between all PRNU feature pairs, and assign a link between the node pair if the distance is less than some threshold (mean is selected as threshold). For HGNN,2 the parameters used are as follows: learning rate = 0.001, number of epochs = 600, number of units in hidden layer = 128, dropout = 0.5, weight decay = 5 × 10−4, multi-step learning rate scheduler parameters: gamma = 0.9, decay step = 200, decay rate = 0.7, milestones = 100 and the cross-entropy loss function is employed. We have used pixel features and PRNU features separately but we observed best results when both features are used together. The hypergraph adjacency matrix, ∈ R ×2, is constructed by concatenating horizontally the adjacency matrices corresponding to the pixel features and PRNU features, when each of them is used in turn. Each adjacency matrix is constructed using the −nearest neighbor ( = 5) method applied to the pixel and PRNU features, respectively. 7.4 Datasets and Experiments We performed three sets of experiments: 1. Evaluation of the proposed locally-scaled spectral clustering in the context of deep learning- based transformations and images downloaded from the Internet. 2https://github.com/iMoonLab/HGNN 159 2. Evaluation of the proposed node embedding and link prediction module in the context of IPT reconstruction for photometrically and geometrically transformed images; in the context of unseen transformations; and in the context of unseen modalities and configurations. 3. Evaluation of the proposed clustering and IPT reconstruction method as a unified module for IPF reconstruction. 7.4.1 Experiment 1: Evaluation of locally-scaled spectral clustering In this experiment, we used Google search query such as Angelina Jolie and Superman to download near duplicates from the Internet. These images are acquired in the wild with wide variations in poses and no sensor information. We also used 22 images belonging to four subjects from [84], where deep learning-based manipulations are applied on images to alter attributes such as adding hair bangs or adding glasses. Fine modifications can be applied such as adding left bangs or right bangs or making the shades more darker. These subtle modification result in near-duplicates. We have applied locally-scaled spectral clustering on these images to discern how many IPTs are present in an IPF. 7.4.2 Experiment 2: Evaluation of the proposed IPT reconstruction algorithm using GCN- based node embedding and PRNU-based link prediction In this experiment, we evaluate the performance of the proposed approach in terms of (i) root iden- tification accuracy, which computes the proportion of correctly identified root nodes, and (ii) IPT |Original_edges ∩ Reconstructed_edges| reconstruction accuracy, which is computed as follows: for face images that are subjected to photometric and geometric transformations. We used face images from the Labeled Faces in the Wild (LFW) dataset [88]. 
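For concreteness, the two evaluation metrics defined above can be written out as follows, with edge sets containing directed (parent, child) pairs, including ancestral links; the root identification accuracy is shown in its Rank-1 form.

def ipt_reconstruction_accuracy(original_edges, reconstructed_edges):
    """|Original_edges intersect Reconstructed_edges| / |Original_edges|."""
    original, reconstructed = set(original_edges), set(reconstructed_edges)
    return len(original & reconstructed) / len(original)

def root_identification_accuracy(true_roots, predicted_roots):
    """Proportion of IPTs whose root node is correctly identified (Rank 1)."""
    hits = sum(1 for t, p in zip(true_roots, predicted_roots) if t == p)
    return hits / len(true_roots)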
All the images are cropped to a fixed size of 96 × 96 using a commercial face SDK. We used seven transformations—four photo- metric: Brightness adjustment, Gamma transformation, Median filtering and Gaussian smoothing, and three geometric: Rotation, Scaling and Translation. The parameters used in generating the near-duplicates are described in Table 7.1. 7,500 images from 123 subjects resulting in 1,500 IPTs |Original_edges| 160 form the training set. 3,000 images from 46 subjects resulting in 600 IPTs form the validation set. We compared the performance of three node embedding techniques used in this work viz., GCN, Figure 7.6: IPT configurations (structures) used in Experiment 2. For ease of visualization, only the immediate links are depicted. However, the ancestral links are also included for evaluation. GCN-Linear and HGNN, and selected the network yielding the highest depth classification accu- racy. The node labels (depth values) from the best performing model in the first stage are further fed to the link prediction module to infer the links for the six different IPT configurations depicted in Figure 7.6. There can be more than (−2) IPTs with  nodes (Cayley’s formula). We selected these six configurations for training as they cover maximum breadth and depth values possible for an IPT with five nodes. We conducted experiments in two scenarios. In Scenario 1, we trained and tested on face images from the LFW dataset. In this scenario, the test set comprises 900 IPTs involving 4,500 face images corresponding to 75 subjects disjoint from the training and validation sets, and are evaluated separately for photometric and geometric transformations. In Scenario 2, we trained GCN on face images but tested on images from the Uncompressed Color Image Database (UCID) [142] depicting natural scene and generic objects. We used 50 images as used in [61] and applied photometric and geometric transformations to simulate 50 (number of original images)×5 (number of images in each IPT)×6 (number of IPT configurations)×2 (photometric and geometric transformations) = 3,000 test images. See Figure 7.7(b). For IPT reconstruction, only the GCN-based node embedding module requires training. Therefore, we wanted to analyze the robustness of the GCN module in handling different training and testing datasets. Next, we performed an experiment to evaluate the performance of the proposed method in 161 the context of unseen transformations. We used Photoshop to manually edit face images from the LFW dataset resulting in a set of 175 near-duplicates. We generated the near-duplicates corresponding to 35 IPTs having 4 different configurations, and each configuration has 5 nodes. We used the same protocol as followed in [23]. We used the Curve, Hue/Saturation, Channel Mixer, Brightness, Vibrance adjustment options and blur filters for generating the test set consisting of Photoshopped images. We performed another experiment where we wanted to test the generalizability of the proposed method in terms of unseen modalities and unseen configurations. For this evaluation, we tested on 6,000 near-infrared iris images from the CASIA Iris V2 Device 2 dataset [6] corresponding to 30 subjects, resulting in 1,200 IPTs, where each IPT contains 5 images (see Figure 7.7(a)). Finally, we conducted an experiment to evaluate whether the GCN-based node embedding module can handle unseen number of nodes. 
Finally, we conducted an experiment to evaluate whether the GCN-based node embedding module can handle an unseen number of nodes. The GCN is trained using IPT configurations comprising 5 nodes, but we tested it on the publicly available Near-duplicate Face Images (NDFI)-Set I dataset [21] comprising 1,229 IPTs, where each IPT consists of 10 nodes. Therefore, the test set for this experiment comprises 12,290 images.

Figure 7.7: IPT configurations of the iris and natural scene images used for evaluation in Experiment 2. The configuration used in the iris near-duplicates is different from the ones used in training (see Figure 7.6). The immediate links are depicted using bold blue arrows, while the ancestral links are depicted using dashed orange arrows.

7.4.3 Experiment 3: Evaluation of the proposed IPF reconstruction using locally-scaled spectral clustering and GCN-based node embedding and PRNU-based link prediction

Figure 7.8: Illustration of the image phylogeny forest structures used in Experiment 3. Each IPF comprises three IPTs, where each IPT may have 5 nodes (IPT 1 and IPT 4), 10 nodes (IPT 2 and IPT 5) or 15 nodes (IPT 3 and IPT 6). The selected test configurations differ from the configurations used in training the GCN and indicate variations both in density and configuration of the IPF test cases. The immediate links are indicated for ease of visualization, but ancestral links are also included for evaluation.

For the IPF reconstruction, we used face images from the WVU Multimodal Release I dataset [12]. We used only those subjects whose images have been acquired using two different sensors: Sony EVI-D30 and Sony EVI-D39. Therefore, we used images from 49 subjects. We randomly selected 3 sample images from each of the 49 subjects. Next, we subjected the images to seven transformations (four photometric and three geometric, as described in Table 7.1). We used the six configurations depicted in Figure 7.8 to generate 294 near-duplicate sets (49 subjects × 3 IPT configurations in each IPF × 2 sensors), such that each IPF contains three IPTs in total (the first IPT with 5 nodes, the second IPT with 10 nodes and the third IPT with 15 nodes), resulting in a total of 2,940 images (98 IPFs × 30 images in each IPF). Examples of some near-duplicates used in IPF reconstruction are presented in Figure 7.1. The test set contains variations in sensors, subjects (11 females and 38 males) and acquisition settings (indoor and outdoor). We used three IPFs to compute the parameter for locally-scaled spectral clustering (set to 0.7; see Algorithm 7). We report the results first in terms of the performance of the locally-scaled spectral clustering algorithm (mean number of clusters and clustering accuracy), and then in terms of root identification and IPF reconstruction accuracies.

7.4.4 Baseline

We compare the proposed locally-scaled clustering with the conventional spectral clustering algorithm. We compare the proposed GCN- and PRNU-based IPT reconstruction with the Gaussian RBF and Chebyshev polynomial basis function approach [23], and the Oriented Kruskal algorithm [61, 62]. The code for the autoencoder-based method [38] and the transformation-aware embedding-based method [32] is not open source and, therefore, these methods could not be used as baselines for comparison.
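For reference, the locally-scaled affinity at the heart of the proposed clustering step can be sketched as follows. This is a generic, self-tuning construction in the spirit of the method, not a reproduction of Algorithm 7: each point receives its own bandwidth, taken here from its k-th nearest neighbour distance, and the number of IPTs is assumed to be supplied (in practice it is estimated by the method, e.g., from the spectrum of the affinity).

    import numpy as np
    from sklearn.cluster import SpectralClustering
    from sklearn.metrics import pairwise_distances

    def locally_scaled_affinity(features, k=7):
        """A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), where sigma_i is the
        distance from point i to its k-th nearest neighbour (k < number of points)."""
        d = pairwise_distances(features)
        sigma = np.sort(d, axis=1)[:, k]          # per-point local scale
        A = np.exp(-(d ** 2) / (np.outer(sigma, sigma) + 1e-12))
        np.fill_diagonal(A, 0.0)
        return A

    def cluster_into_ipts(features, num_ipts, k=7):
        """Group near-duplicates into candidate IPTs using the locally-scaled affinity."""
        A = locally_scaled_affinity(features, k)
        model = SpectralClustering(n_clusters=num_ipts, affinity="precomputed",
                                   assign_labels="kmeans", random_state=0)
        return model.fit_predict(A)

The key design point, mirrored in the experiments below, is that a per-point bandwidth lets clusters of very different sizes (5, 10 or 15 near-duplicates) coexist, whereas a single global bandwidth does not.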
7.5 Results and Analysis

7.5.1 Results for Experiment 1

The proposed locally-scaled spectral clustering is evaluated on near-duplicates downloaded from the Internet and on images generated using a deep learning scheme. Figure 7.9(a) indicates that the images are clustered into six distinct IPTs. Although there is no ground truth associated with these images for empirical evaluation, we can perform a qualitative analysis through visual inspection. Some images appear to have been modified by inserting a digital watermark (such as those labeled '3' and '4') and are correctly assigned to separate clusters. Three images have been assigned to IPT '5'; visual inspection reveals that two out of the three images are correctly assigned, while the third image should have been assigned to a separate cluster. The results in Figure 7.9(b) appear to be correct, with two clusters.

For the near-duplicates generated using the deep learning-based method, each image is generated by modifying attributes such as adding hair bangs or adding glasses. Further modifications include making the shades darker, changing the direction of the hair bangs, or making them dense or sparse. The modifications are performed mostly on the original image, so disjoint IPTs are anticipated, as also indicated by the locally-scaled clustering outputs in a majority of cases.

Figure 7.9: Experiment 1: Locally-scaled spectral clustering performance for near-duplicates downloaded from the Internet. The numbers indicate the cluster identifier to which an image has been assigned. On the left, six clusters (IPTs) have been identified. On the right, two clusters (IPTs) have been identified. The results are for visual inspection only, as no ground truth is associated with them.

Figure 7.10: Experiment 1: Locally-scaled spectral clustering performance for near-duplicates generated using deep learning-based transformations [84]. The numbers indicate the cluster identifier to which an image has been assigned. The proposed method can successfully discern between minute changes in the attributes and assigns the modified images to distinct clusters (IPTs) in a majority of cases.

7.5.2 Results for Experiment 2

We first compared the performance of the three node embedding techniques and observed that GCN (with a Chebyshev polynomial of degree 3) outperforms GCN-Linear and HGNN by a considerable margin (≈ 20%), and is, therefore, selected as the best node embedding technique. We subsequently used the depth labels provided by the GCN for IPT reconstruction in the link prediction module. The reason GCN performs better than the remaining two methods could be attributed to its use of a higher-degree polynomial compared to GCN-Linear and HGNN (both use a Chebyshev polynomial of degree 1). The spectral convolution filter is approximated using a truncated Chebyshev polynomial expansion. We hypothesize that higher-order polynomials can therefore perform more effective spectral convolutions, resulting in more accurate node embeddings. In contrast, the linearization used in GCN-Linear and HGNN allows deeper architectures and the combination of multiple features, but weakens the ability to model global relationships. Thus, GCN outperforms both GCN-Linear and HGNN, and is used for evaluating the remaining experiments.
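To make the preceding point concrete, a degree-K Chebyshev spectral graph convolution can be sketched as below. This is a generic PyTorch illustration of the filter family being discussed (the class and variable names are ours), assuming the rescaled graph Laplacian L_scaled = 2L/lambda_max - I has already been computed. With K = 1 it reduces to the linearized filters used by GCN-Linear and HGNN, while K = 3 corresponds to the best-performing GCN above.

    import torch
    import torch.nn as nn

    class ChebConv(nn.Module):
        """Spectral filter approximated by a truncated Chebyshev expansion:
        y = sum_{k=0..K} Theta_k T_k(L_s) x, with T_0 = I, T_1 = L_s,
        and T_k = 2 L_s T_{k-1} - T_{k-2}."""

        def __init__(self, in_dim, out_dim, K=3):
            super().__init__()
            assert K >= 1
            self.K = K
            self.theta = nn.ModuleList([nn.Linear(in_dim, out_dim, bias=False)
                                        for _ in range(K + 1)])

        def forward(self, x, L_scaled):
            # x: (num_nodes, in_dim); L_scaled: (num_nodes, num_nodes) rescaled Laplacian.
            Tx_prev, Tx_cur = x, L_scaled @ x
            out = self.theta[0](Tx_prev) + self.theta[1](Tx_cur)
            for k in range(2, self.K + 1):
                Tx_next = 2 * (L_scaled @ Tx_cur) - Tx_prev
                out = out + self.theta[k](Tx_next)
                Tx_prev, Tx_cur = Tx_cur, Tx_next
            return out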
Next, we evaluated the performance of the GCN-based node embedding and PRNU-based link prediction modules in terms of root identification and IPT reconstruction accuracies in the two scenarios. In Scenario 1, where both training and testing are conducted on face images, the proposed method achieves a root identification accuracy of 85.11% in the context of photometrically modified images and 73.22% in the context of geometrically modified images, averaged across the six configurations. In terms of IPT reconstruction accuracy, it achieves 90.64% for photometrically modified images and 87.31% for geometrically modified images, again averaged across the six configurations. In Scenario 2, where training is performed using face images but testing is conducted on images containing natural scenes, the proposed method achieves root identification accuracies of 74.67% (photometric) and 75.33% (geometric), and IPT reconstruction accuracies of 87.60% (photometric) and 86.72% (geometric), averaged across the six configurations. See Table 7.2.

Table 7.2: Experiment 2: Performance of the node embedding and link prediction modules in terms of root identification and IPT reconstruction accuracies for both photometric and geometric transformations. The results are reported for two scenarios. The values to the left of the forward slash indicate Scenario 1 (trained on face images and tested on face images) and the values to the right indicate Scenario 2 (trained on face images but tested on images depicting natural scenes).

Photometric transformations
IPT configuration    Root identification accuracy (%)    IPT reconstruction accuracy (%)
IPT A                90.0 / 88.0                         94.0 / 91.33
IPT B                69.33 / 48.0                        78.80 / 74.80
IPT C                86.0 / 76.0                         96.0 / 89.67
IPT D                98.67 / 98.0                        97.17 / 100.0
IPT E                94.0 / 92.0                         97.73 / 93.60
IPT F                72.67 / 46.0                        80.15 / 76.22
Average              85.11 / 74.67                       90.64 / 87.60

Geometric transformations
IPT configuration    Root identification accuracy (%)    IPT reconstruction accuracy (%)
IPT A                86.67 / 90.0                        91.67 / 87.33
IPT B                46.67 / 58.0                        74.47 / 76.0
IPT C                74.67 / 72.0                        93.78 / 88.0
IPT D                95.33 / 96.0                        95.33 / 100
IPT E                93.33 / 86.0                        96.27 / 94.0
IPT F                42.67 / 50.0                        72.37 / 74.22
Average              73.22 / 75.33                       87.31 / 86.72

The results indicate that some configurations, particularly the second and sixth (IPT B and IPT F), are very difficult to reconstruct for both photometric and geometric transformations, indicating the difficulty in reconstructing deeper and unbalanced trees. The results also indicate that the proposed GCN-based node embedding and PRNU-based link prediction modules are adept at handling not only different classes of transformations (photometric and geometric), but also different types of images (biometric and generic images).

In the context of unseen transformations, the proposed method achieves a root identification accuracy of 82.86% and an IPT reconstruction accuracy of 92.95%, averaged across the four IPT configurations (see Table 7.3). In the context of unseen modalities and configurations, the proposed method achieves a root identification accuracy of 85.92% and an IPT reconstruction accuracy of 90.85% in the case of iris images. In the context of an unseen number of nodes, the proposed method achieves a root identification accuracy of 90.97% and an IPT reconstruction accuracy of 61.02%. The best performing method in [17] reported a root identification accuracy of 89.91% at Rank 3 and an IPT reconstruction accuracy of 70.61%, assuming that the root node is known a priori. In contrast, we report a single root identification accuracy and make no assumption that the correct root node is available for IPT reconstruction. Overall, the results indicate that the proposed method is capable of handling unseen transformations, modalities, configurations and number of nodes reliably well.

Table 7.3: Experiment 2: Evaluation of the performance of the node embedding and link prediction modules in the context of unseen transformations, unseen modalities and configurations, and an unseen number of nodes.

Experimental setting                      Root identification accuracy (%)    IPT reconstruction accuracy (%)
Unseen transformations                    82.86                               92.95
Unseen modalities and configurations      85.92                               90.85
Unseen number of nodes                    90.97                               61.02
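As an aside, the PRNU-based link prediction step evaluated above can be illustrated with the following toy sketch. It is a deliberately simplified, assumption-laden illustration (the helper names are ours): given the depth labels predicted by the GCN, each node is attached to the candidate parent at the previous depth whose PRNU feature is closest in squared L2 distance; the actual module in this work may use a different pairing rule.

    import numpy as np

    def prnu_feature(image, denoiser):
        """Noise-residual feature: the image minus its denoised version, flattened.
        'denoiser' is any callable (e.g., a wavelet denoiser); the PRNU estimator
        used in this work may differ."""
        img = image.astype(np.float32)
        return (img - denoiser(img)).ravel()

    def predict_links(depths, prnu_features):
        """Toy rule: attach each node to the candidate parent at the previous depth
        whose PRNU feature is closest in squared L2 distance."""
        edges = []
        for child, d in enumerate(depths):
            parents = [i for i, dp in enumerate(depths) if dp == d - 1]
            if not parents:                    # the root node has no parent
                continue
            dists = [np.sum((prnu_features[child] - prnu_features[p]) ** 2)
                     for p in parents]
            edges.append((parents[int(np.argmin(dists))], child))
        return edges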
7.5.3 Results for Experiment 3

In terms of IPF reconstruction, we evaluate both proposed modules: the locally-scaled spectral clustering for identifying the number of IPTs, followed by the node embedding and link prediction module for IPF reconstruction (i.e., constructing each IPT within the IPF). Table 7.4 reports the number of clusters produced by conventional spectral clustering and by the proposed locally-scaled spectral clustering. Figure 7.11 indicates the clustering accuracies (i.e., the proportion of images correctly assigned to their respective clusters) for both the conventional and the proposed spectral clustering methods. Each IPF contains three clusters (IPTs), and each IPT comprises either 5, 10 or 15 nodes. The results indicate that conventional clustering produces many more clusters than desired, resulting in the erroneous assignment of images. On the other hand, the proposed method performs well irrespective of the number of nodes. The global bandwidth used in conventional spectral clustering has been computed using the standard deviation of the respective features (face descriptors, PRNU and pixel intensities). As is evident from these findings, a single global bandwidth is not suited to correctly identifying the number of IPTs, which substantiates the importance of locally-scaled spectral clustering.

Table 7.4: Experiment 3: Number of clusters (mean and standard deviation) produced during IPF construction by conventional spectral clustering and locally-scaled spectral clustering (proposed). A lower value (mean ≈ 1, standard deviation ≈ 0) is desirable. The proposed method yields better results (bolded).

                    Number of clusters produced during IPF construction (mean ± std. deviation)
Number of nodes     Spectral clustering     Locally-scaled spectral clustering (Proposed)
5                   1.70 ± 0.54             1.52 ± 0.32
10                  2.31 ± 1.04             2.35 ± 0.80
15                  1.98 ± 0.21             1.84 ± 0.53
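The clustering accuracy reported in Figure 7.11 is the proportion of images assigned to the correct cluster. Since predicted cluster identifiers are arbitrary, one way to score this is to first match predicted clusters to ground-truth IPTs; the sketch below uses the Hungarian algorithm for that matching (the matching step shown here is one reasonable instantiation, since the protocol above specifies only the proportion of correctly assigned images).

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def clustering_accuracy(true_ipt_ids, predicted_cluster_ids):
        """Proportion of images assigned to the correct cluster, after optimally
        matching predicted cluster ids to ground-truth IPT ids."""
        t = np.asarray(true_ipt_ids)
        p = np.asarray(predicted_cluster_ids)
        cost = np.zeros((t.max() + 1, p.max() + 1))
        for ti, pi in zip(t, p):
            cost[ti, pi] -= 1        # negative counts: minimizing cost maximizes matches
        rows, cols = linear_sum_assignment(cost)
        return -cost[rows, cols].sum() / len(t)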
In terms of root identification and IPF reconstruction accuracies, we report the results separately for each IPT configuration. We reconstruct each IPT from the clustered outputs of the locally-scaled clustering algorithm using Oriented Kruskal, Gaussian RBF (Gaussian RBF outperformed the Chebyshev polynomials, so we report the results pertaining to Gaussian RBF only, for the sake of brevity) and the proposed method. In all cases, the proposed GCN-based node embedding and PRNU-based link prediction modules outperform the two baselines employing Oriented Kruskal and basis functions (see Table 7.5) by a significant margin.

Table 7.5: Experiment 3: Evaluation of GCN-based node embedding and PRNU-based link prediction for each IPT configuration used in the IPF in terms of root identification and reconstruction accuracies. Results indicate that the proposed method (bolded) significantly outperforms the state-of-the-art baselines in all cases.

Root identification accuracy (%)
IPT configuration    Oriented Kruskal    Basis functions (Gaussian RBF)    GCN+PRNU
IPT 1 (5 nodes)      32.61               23.91                             78.26
IPT 2 (10 nodes)     7.24                12.32                             57.97
IPT 3 (15 nodes)     13.04               7.24                              47.10
IPT 4 (5 nodes)      27.53               42.03                             79.71
IPT 5 (10 nodes)     21.74               35.50                             55.79
IPT 6 (15 nodes)     11.59               17.39                             31.88
Average              18.96               23.06                             46.97

IPT reconstruction accuracy (%)
IPT configuration    Oriented Kruskal    Basis functions (Gaussian RBF)    GCN+PRNU
IPT 1 (5 nodes)      21.56               34.17                             66.78
IPT 2 (10 nodes)     15.38               28.09                             71.48
IPT 3 (15 nodes)     14.96               24.01                             61.55
IPT 4 (5 nodes)      19.02               40.37                             60.76
IPT 5 (10 nodes)     16.59               31.09                             62.03
IPT 6 (15 nodes)     15.43               31.11                             58.86
Average              17.16               31.47                             59.41

We also report the average root identification and IPT reconstruction accuracies, which measure the overall IPF reconstruction performance. The results indicate that the proposed method outperforms Oriented Kruskal by 28.01% and Gaussian RBF by 23.91% in terms of root identification accuracy; the proposed method outperforms Oriented Kruskal by 42.25% and Gaussian RBF by 27.94% in terms of IPT reconstruction accuracy. Error plots indicating the mean and standard deviation of the root identification and reconstruction accuracies as a function of the variation in the number of nodes (images) are illustrated in Figure 7.12. The results indicate that the proposed method outperforms Gaussian RBF, which in turn outperforms the Oriented Kruskal method. The results also indicate that the GCN, which has been trained on five images only, can work fairly well on an arbitrary number of images, such as ten or fifteen.

Figure 7.11: Experiment 3: Variation of clustering accuracies as a function of the number of nodes for the conventional spectral clustering (blue) and the locally-scaled spectral clustering (proposed) methods. The proposed method (orange bars) consistently results in higher means and lower standard deviations in clustering accuracies across 5, 10 and 15 nodes compared to the conventional spectral clustering algorithm.

Figure 7.12: Experiment 3: Variation in root identification and IPT reconstruction accuracies as a function of the number of nodes.

The main findings from the experiments are as follows:

1. Locally-scaled spectral clustering can suitably address the multi-scale issue encountered by conventional spectral clustering. This is particularly relevant in IPF reconstruction, where the number of nodes (near-duplicate images) belonging to each IPT is not known a priori (see Table 7.4). Experiments indicate that the proposed method performs robustly when the number of nodes in an IPT is varied by a factor of two and three within an IPF, and adapts well to near-duplicates downloaded from the Internet or generated using deep learning tools.

2. Graph convolutional network-based node embedding and PRNU-based link prediction can significantly outperform existing methods in constructing image phylogeny trees. Experiments indicate that the proposed method performs very well for both photometrically and geometrically modified face images as well as natural scene images (see Table 7.2). The proposed method also generalizes well across unseen transformations, unseen modalities, unseen configurations and an unseen number of nodes. See Table 7.3.
3. Locally-scaled spectral clustering used in conjunction with the GCN and PRNU can successfully outperform existing methods in the context of image phylogeny forest reconstruction, and offers a substantial improvement (up to ≈ 28% in terms of root identification accuracy and up to ≈ 42% in terms of IPF reconstruction accuracy; see Table 7.5).

7.6 Summary

In this chapter, we explored the use of a graph convolutional network for the purpose of image phylogeny. First, we deduced the depth label at which an image is likely to be present in the image phylogeny tree. Second, we used sensor pattern noise (PRNU) features extracted from the images to predict links between parent and child nodes and thereby construct the IPT. By leveraging a graph-based approach together with PRNU-based analysis, we judiciously combined global and local analyses in constructing IPTs, achieving higher reconstruction accuracies than existing methods, which mostly focus on pairwise analysis. Furthermore, the method generalized well across unseen transformations, unseen configurations and across modalities. We successfully applied this method to construct image phylogeny forests (comprising IPTs with an arbitrary number of nodes and structures) by using a locally-scaled spectral clustering technique. The resultant approach outperformed state-of-the-art methods by up to ≈ 28% in terms of root identification accuracy and up to ≈ 42% in terms of IPF reconstruction accuracy.

CHAPTER 8

CONCLUSION AND FUTURE WORK

8.1 Research Contributions

Research in digital image forensics has existed for several decades, but the same cannot be said for biometric images. The widespread use of biometrics in a myriad of applications, combined with the availability of inexpensive image editing tools, has therefore made it important to examine the forensics of digitally modified biometric images. Such analysis will undoubtedly prove useful in commercial and judicial services.

In this thesis, we explore digitally edited biometric images from two perspectives: (i) sensor-based analysis and (ii) content-based analysis. Sensor-based analysis encompasses the sensor (camera) details of an image, which can provide useful cues about the processing history of an image. Sensor details present in an image can be used for copyright protection and for deducing which sensor has been used to acquire an image. This is particularly relevant in biometrics, as the sensors used in acquiring biometric images vary across modalities. For example, fingerprint images are acquired using capacitive or optical sensors, iris images are acquired using near-infrared sensors, and face images are acquired using sensors operating in the visible spectrum. Images acquired using different sensors will contain unique traces, which can be extracted for sensor identification routines.

On the other hand, content-based analysis covers modifications applied to the content present in the image. These changes can be very subtle, such as photometric and geometric transformations, which may be visually imperceptible and result in near-duplicates. In the case of biometric images, such transformations may interfere with their matching utility. For example, face images from surveillance videos can be contrast-adjusted to increase the matching performance of a commercial matcher, which will violate the chain of custody of digital evidence. Also, the image characteristics vary across different modalities.
For example, face images contain structural details while fingerprint images contain textural details. Thus, the content-based analysis of near-duplicates can help deduce the original image as well as determine the trail of modifications. Therefore, we pursued both types of analysis, and leveraged them in a unified framework for improved forensic analysis of biometric images.

In the context of biometric sensor identification, we analyzed the feasibility of Photo Response Non-Uniformity (PRNU) in correctly deducing the near-infrared sensor used to acquire ocular images. PRNU has been used for sensor identification of generic images, but limited work has been done on applying PRNU to biometric images. We further examined the impact of photometric transformations on the reliability of PRNU-based iris sensor identification. Variations in illumination conditions can induce photometric transformations in biometric images. Our findings substantiated that certain illumination normalization schemes can adversely impact iris sensor recognition.

To further test the robustness of PRNU-based sensor identification schemes, we also developed methods to deliberately confound sensor recognition. Another objective of such sensor de-identification is privacy preservation: sensor recognition can implicitly link an image to the individual possessing the device. By removing the sensor-specific traces while maintaining the image content, one can unlink the device and the user. In the case of biometric images, we retained their biometric utility while impeding their sensor identifiability. In the first method, we applied perturbations to patches of ocular images and repeated this process until the image was misclassified as belonging to a different sensor than its original acquisition device. This iterative process was able to confound a specific PRNU-based classifier. In the second method, we developed a sensor de-identification technique that used the Discrete Cosine Transform to suppress sensor traces in a single pass, and that generalized to multiple PRNU classifiers.

Furthermore, we developed a method that used deep learning to jointly learn the biometric and sensor details present in an image using a one-shot approach. The joint representation can be used to perform biometric and sensor recognition simultaneously. Joint biometric-device recognition can be used for authentication on current smartphones that use the biometric signature of the owner for access control, such as Face ID on Apple's iPhones. The joint representation couples the biometric and sensor signatures non-trivially and, therefore, implicitly imparts privacy to the biometric template.

Next, we probed the content-based analysis of images, particularly the task of image phylogeny for near-duplicate biometric images. Photometric and geometric transformations can be digitally induced, repeatedly, resulting in numerous near-duplicates that are difficult to discern from the original image by manual inspection. However, identifying the original image and understanding the evolution of the near-duplicates is important from the perspective of media forensics. The task of image phylogeny is highly challenging, as the scope of transformations and the widespread distribution of edited content on online platforms keep evolving. To the best of our knowledge, we initiated the research involving image phylogeny for biometric images. We developed two methods to combat this difficult problem.
In the first method, we used a deterministic approach to model the transformations between a pair of near-duplicate images and used the estimated parameters to construct the image phylogeny tree (IPT). This method did not consider any prior information while inferring directed relationships between the set of near-duplicates. Therefore, in the second method, we leveraged a probabilistic approach, where we used the likelihood ratio of the estimated parameters, computed from a training set, to deduce the original image and the hierarchical structure depicting the IPT. We further evaluated the proposed method using different families of basis functions, and observed that the Chebyshev polynomials and Gaussian radial basis functions were the best candidates for inferring the IPT.

Both the deterministic and probabilistic approaches for IPT (re)construction involved pairwise modeling, which disregards global information. Therefore, in our final work, we utilized a graph-based approach, specifically a graph convolutional network, to inspect the global relationships between a set of near-duplicates by examining all the images simultaneously. In this work, we also utilized sensor pattern noise (PRNU) features to distinguish between original and transformed images, which assisted in correctly identifying the parent and the child node in the IPT. The motivation for using PRNU was driven by existing works indicating that PRNU varies in response to geometric transformations, and by our own examination of the impact of photometric transformations on PRNU. The proposed method combined sensor-based and content-based analyses to tackle the task of image phylogeny for near-duplicate face images. It robustly handled unseen transformations, different biometric modalities and an arbitrary number of images within the IPT. It demonstrated promising results when tested on image phylogeny forests comprising multiple IPTs with different structures. The proposed method outperformed existing state-of-the-art methods by a significant margin.

As far as computational costs are concerned, for a 10-node image phylogeny tree: (i) the basis functions-based framework needs 12.98 secs. (12.96 secs. for dissimilarity matrix computation; 0.02 secs. for root node identification and IPT reconstruction), while (ii) the GCN and sensor pattern noise-based method needs 4.45 secs. (4.37 secs. for depth labels; 0.08 secs. for link prediction). All evaluations were performed using an Intel Core i7-7700 CPU @ 3.6 GHz.

As far as limitations are concerned, our analysis reveals that the performance of the method is limited when the variation in the images across different depth labels does not possess a continuous gradient. For example, if the images located at higher depth labels are very similar to the images located at lower depth labels (indicative of a cycle likely being present), then the modeling of transformations using basis functions, as well as the inference of the depth labels by the graph convolutional network, tends to be erroneous. This error propagates if the original structure is depth-heavy, and is worst for trees with no sibling nodes (no breadth). In our test cases, we randomly simulated the transformations without enforcing any gradient constraint on successive transformations. As a result, we observed poor performance for trees with deep configurations.
However, in cases where there is a gradual increase in the transformations, our method was able to correctly identify the root node and fully recover the phylogeny structure, as indicated in the example (see Figure 6.6(b)) involving near-duplicates generated by gradually increasing the intensity of the makeup.

8.2 Future Work

Image manipulation has been augmented, and has achieved a new level of "realism", courtesy of deep learning. DeepFakes have emerged and have piqued the interest of academicians, researchers, engineers, and the government. Although they made their first appearance as a feat achieved by deep learning networks, the prospect of their abuse for misinformation and disinformation quickly raised concerns. Distinguishing between 'real' and 'fake' images has therefore become of paramount importance. We conclude this thesis by touching on some future directions that extend the current work.

1. In this thesis, we focused primarily on the task of image phylogeny. The same principle can be applied to the task of video phylogeny. Only one work has been done in the context of video phylogeny, in which the temporal sequence of the video frames is depicted using the phylogenetic structure. The idea is to screen near-duplicate frames and use a reduced subset of frames, followed by the application of existing image phylogeny tree construction routines to deduce the hierarchical structure. The task is highly challenging due to simultaneous variations in scene and time, but it will be extremely useful in reconstructing timelines for crime scenes. This will be particularly relevant in public incidents, such as riots or bombings, where several people might capture the scene using their smartphone cameras.

2. The task of image phylogeny requires discriminating between original and transformed images. The integration of a graph-based approach and sensor pattern noise can be used as a stepping stone towards identifying DeepFakes. Current DeepFake detection methods use DeepFake images during training and can, therefore, accurately detect DeepFakes generated using a particular algorithm. However, the generalizability of the current methods to new DeepFake generators seems to be lacking. In contrast, the graph-based approach used in image phylogeny models the underlying manifold of the data, and can be used to disentangle the subspaces populated by the real and the fake images. Also, synthetic images will typically not carry sensor traces, so the use of sensor pattern noise may additionally assist in discriminating between the two subspaces.

3. The use of graph convolutional networks and sensor pattern noise can be leveraged to detect morphed images or even biometric presentation attacks. One of the advantages of the graph-based approach was its generalizability to unseen transformations and unseen biometric modalities. Morphing typically involves fusing two images, and can be applied to face images and iris images. Morphing of biometric images aggravates the concern of falsely matching the morphed image to two separate identities. Presentation attacks, on the other hand, circumvent the biometric recognition system by presenting an altered biometric sample, which can be in the form of replay attacks, print attacks, plastic eyes or face masks. Distinguishing between bonafide face images and presentation attacks or morphed images is a pressing problem that may be addressed using the graph convolutional network.
4. Cyberattacks such as website defacements can use disturbing graphics to incite unrest among the audience. The hacker may reuse these images, with subtle modifications, as an insignia of their propaganda. Tracking these near-duplicates through the proposed approach can deliver useful insight into the modus operandi of the defacer. This will help track perpetrators and can assist the authorities in boosting cybersecurity.

5. Authentication using biometric signatures will become ubiquitous in the futuristic smart city. Smart home environments already use face recognition to monitor guests arriving at the entrance, and smart mobility uses ECG-based signatures to address the drowsy-driver problem. In all these cases, the seamless authentication of the device and the user is essential. The joint representation that we developed for performing smartphone sensor and biometric recognition simultaneously can be extended to work across different platforms. Although our work focused primarily on the performance of the joint representation, future work can delve into the computational efficiency, memory requirements and template protection strategies for the combined biometric-device representation.

We will also consider multiple sources for a single image as part of our future work. This task falls under the scope of 'provenance analysis' [124], which first identifies the relevant donor images for a single composite image using a provenance filtering step, and then follows it with a provenance graph construction step for determining the order of modifications. We will also consider locating editing boundaries in images which have been locally tampered with. This task falls under the scope of 'manipulated image detection'.

BIBLIOGRAPHY

[1] http://www4.comp.polyu.edu.hk/~csajaykr/IITD/Database_Iris.htm.

[2] http://www.iab-rubric.org/resources/impdatabase.html.

[3] https://sites.google.com/a/nd.edu/public-cvrl/data-sets.

[4] http://www.cs.princeton.edu/~andyz/irisrecognition.

[5] Augmentor: Image augmentation library in Python for machine learning. https://github.com/mdbloice/Augmentor. [Online accessed: 3rd January, 2020].

[6] CASIA Iris Database Version 2. http://biometrics.idealtest.org/dbDetailForUser.do?id=2. [Online accessed: 12th April 2019].

[7] CASIA Iris Database Version 4. http://biometrics.idealtest.org/dbDetailForUser.do?id=4. [Online accessed: 30th August 2019].

[8] CelebA dataset. http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html. [Online accessed: 3rd January, 2020].

[9] Embedding network tutorial. https://github.com/adambielski/siamese-triplet/blob/master/Experiments_MNIST.ipynb. [Online accessed: 12th January, 2020].

[10] Smartphone pictures poses privacy risks. https://www.youtube.com/watch?v=N2vARzvWxwY&feature=youtu.be. [Online accessed: 18th April 2019].

[11] VeriEye iris matcher. https://www.fulcrumbiometrics.com/Iris-Matcher-License-p/100424.htm. [Online: accessed 13-December-2018].

[12] WVU multimodal dataset release 1. https://biic.wvu.edu/data-sets/multimodal-dataset. [Online accessed: 15th March, 2020].

[13] A. Agarwal, A. Sehwag, R. Singh, and M. Vatsa. Deceiving face presentation attack detection via image transforms. In IEEE International Conference on Multimedia Big Data, August 2019.

[14] Akshay Agarwal, Rohit Keshari, Manya Wadhwa, Mansi Vijh, Chandani Parmar, Richa Singh, and Mayank Vatsa. Iris sensor identification in multi-camera environment. Information Fusion, 45:333–345, 2019.

[15] R. Arjona, M. A. Prada-Delgado, I. Baturone, and A. Ross.
Securing minutia cylinder codes for fingerprints through physically unclonable functions: An exploratory study. In Proc. of 11th IAPR International Conference on Biometrics (ICB), Gold Coast, Australia, June 2018. [16] S. Baker and I. Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221–255, 2004. 180 [17] S. Banerjee, V. Mirjalili, and A. Ross. Spoofing PRNU patterns of iris sensors while In 5th IEEE International Conference on Identity, Security, preserving iris recognition. Behavior and Analysis (ISBA), January 2019. [18] S. Banerjee and A. Ross. From image to sensor: Comparative evaluation of multiple PRNU In Fifth International estimation schemes for identifying sensors from NIR iris images. Workshop on Biometrics and Forensics, 2017. [19] S. Banerjee and A. Ross. Impact of photometric transformations on PRNU estimation schemes: A case study using near infrared ocular images. In International Workshop on Biometrics and Forensics (IWBF), pages 1–8, June 2018. [20] S. Banerjee and A. Ross. Face phylogeny tree: Deducing relationships between near- duplicate face images using legendre polynomials and radial basis functions. In 10th IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), Tampa, Florida, September 2019. [21] S. Banerjee and A. Ross. Face phylogeny tree: Deducing relationships between near- duplicate face images using legendre polynomials and radial basis functions. In 10th IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), Septem- ber 2019. [22] S. Banerjee and A. Ross. Smartphone camera de-identification while preserving biometric utility. In Proc. of 10th IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), Tampa, USA, September 2019. [23] S. Banerjee and A. Ross. Face phylogeny tree using basis functions. In IEEE Transactions on Biometrics, Behavior and Identity Science, 2020. [24] N. Bartlow, N. Kalka, B. Cukic, and A. Ross. Identifying sensors from fingerprint images. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 78–84, June 2009. [25] N. Bartlow, N. Kalka, B. Cukic, and A. Ross. Identifying sensors from fingerprint images. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 78–84, 2009. [26] A. Bartoli. Groupwise geometric and photometric direct image registration. IEEE Transac- tions on Pattern Analysis and Machine Intelligence, 30(12):2098–2108, Dec 2008. [27] S. Bayram, H. Sencar, N. Memon, and I. Avcibas. Source camera identification based on CFA interpolation. In IEEE International Conference on Image Processing 2005, volume 3, pages III–69–72, Sept 2005. [28] S. Bayram, H. T. Sencar, and N. D. Memon. Seam-carving based anonymization against image and video source attribution. In IEEE 15th International Workshop on Multimedia Signal Processing (MMSP), pages 272–277, Sept 2013. [29] T. Berry and T. Sauer. Spectral clustering from geometric viewpoint. Technical report, 2015. 181 [30] P. Bestagini, M. Tagliasacchi, and S. Tubaro. Image phylogeny tree reconstruction based In IEEE International Conference on Acoustics, Speech and Signal on region selection. Processing (ICASSP), pages 2059–2063, March 2016. [31] A. Bharati, D. Moreira, A. Pinto, J. Brogan, K. Bowyer, P. Flynn, W. Scheirer, and A. Rocha. U-phylogeny: Undirected provenance graph construction in the wild. 
In IEEE International Conference on Image Processing (ICIP), 05 2017. [32] Aparna Bharati, Daniel Moreira, Patrick J. Flynn, Anderson Rocha, Kevin W. Bowyer, and Walter J. Scheirer. Learning transformation-aware embeddings for image forensics. ArXiv, abs/2001.04547, 2020. [33] G. Bhupendra and M. Tiwari. Improving source camera identification performance using DCT based image frequency components dependent sensor pattern noise extraction method. Digital Investigation, 03 2018. [34] D. Bobeldyk and A. Ross. Analyzing covariate influence on gender and race prediction from near-infrared ocular images. pages 7905–7919, 2019. [35] Z. Boulkenafet, J. Komulainen, Lei. Li, X. Feng, and A. Hadid. OULU-NPU: A mobile face presentation attack database with real-world variations. IEEE International Conference on Automatic Face and Gesture Recognition, 2017. [36] Z. Boulkenafet, J. Komulainen, Lei. Li, X. Feng, and A. Hadid. OULU-NPU: A mobile face presentation attack database with real-world variations. IEEE International Conference on Automatic Face and Gesture Recognition, 2017. Jane Bromley, Isabelle Guyon, Yann LeCun, Eduard Säckinger, and Roopak Shah. Signature In Proc. of 6th International verification using a "siamese" time delay neural network. Conference on Neural Information Processing Systems, pages 737–744, 1993. [37] [38] R. Castelletto, S. Milani, and P. Bestagini. Phylogenetic minimum spanning tree reconstruc- tion using autoencoders. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2817–2821, 2020. [39] Hung-Jen Chen, Ka-Ming Hui, Szu-Yu Wang, Li-Wu Tsao, Hong-Han Shuai, and Wen- Huang Cheng. BeautyGlow: On-demand makeup transfer framework with reversible gen- In The IEEE Conference on Computer Vision and Pattern Recognition erative network. (CVPR), June 2019. [40] L. Chen, L. Xu, X. Yuan, and N. Shashidhar. Digital forensics in social networks and the cloud: Process, approaches, methods, tools, and challenges. In International Conference on Computing, Networking and Communications (ICNC), pages 1132–1136, Feb 2015. [41] M. Chen, J. Fridrich, M. Goljan, and J. Lukas. Determining image origin and integrity using sensor noise. IEEE Transactions on Information Forensics and Security, 3(1):74–90, March 2008. 182 [42] M. Chen, J. Fridrich, M. Goljan, and J. Lukas. Determining image origin and integrity using sensor noise. IEEE Transactions on Information Forensics and Security, 3(1):74–90, March 2008. [43] W. Chen, M. J. Er, and S. Wu. Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 36(2):458–466, April 2006. [44] Sumit Chopra, Raia Hadsell, and Yann Lecun. Learning a similarity metric discriminatively, IEEE Conference on Computer Vision and Pattern with application to face verification. Recognition (CVPR), 1:539– 546, 07 2005. J. Ćosić and M. Bača. (Im)proving chain of custody and digital evidence integrity with time stamp. In The 33rd International Convention MIPRO, pages 1226–1230, May 2010. Jasmin Ćosić and Zoran Ćosić. Chain of custody and life cycle of digital evidence. In Journal of Computer Technology and Applications, volume 3, pages 126–129, February 2012. [45] [46] [48] [47] F. Costa, A. Oliveira, P. Ferrara, Z. Dias, S. Goldenstein, and A. Rocha. New dissimilarity measures for image phylogeny reconstruction. Pattern Analysis and Applications, 20, 03 2017. J. Daugman. 
How iris recognition works. IEEE Transactions on Circuits and Systems for Video Technology, 14(1):21–30, Jan 2004. J. G. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. In Journal of Optical Society of America A, volume 2, pages 1160–1169, July 1985. [49] [50] F. de O. Costa, M. A. Oikawa, Z. Dias, S. Goldenstein, and A. R. de Rocha. Image phylogeny forests reconstruction. IEEE Transactions on Information Forensics and Security, 9(10):1533–1546, Oct 2014. [51] A. A. de Oliveira, P. Ferrara, A. De Rosa, A. Piva, M. Barni, S. Goldenstein, Z. Dias, and A. Rocha. Multiple parenting phylogeny relationships in digital images. IEEE Transactions on Information Forensics and Security, 11(2):328–343, 2016. [52] L. Debiasi and A. Uhl. Blind biometric source sensor recognition using advanced PRNU fingerprints. In 23rd European Signal Processing Conference (EUSIPCO), pages 779–783, Aug 2015. [53] L. Debiasi and A. Uhl. Techniques for a forensic analysis of the CASIA-IRIS V4 database. In 3rd International Workshop on Biometrics and Forensics (IWBF), pages 1–6, March 2015. [54] L. Debiasi and A. Uhl. Comparison of PRNU enhancement techniques to generate PRNU In 4th International Workshop on fingerprints for biometric source sensor attribution. Biometrics and Forensics (IWBF), pages 1–6, March 2016. 183 [55] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pages 3844–3852, 2016. J. Deng, W. Dong, R. Socher, L. Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierar- chical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, June 2009. [56] [57] Z. Dias, S. Goldenstein, and A. Rocha. Exploring heuristic and optimum branching algo- rithms for image phylogeny. Journal of Visual Communication and Image Representation, 24(7):1124 – 1134, 2013. [58] Z. Dias, S. Goldenstein, and A. Rocha. Large-scale image phylogeny: Tracing image ancestral relationships. IEEE MultiMedia, 20(3):58–70, July 2013. [59] Z. Dias, A. Rocha, and S. Goldenstein. First steps toward image phylogeny. International Workshop on Information Forensics and Security, pages 1–6, Dec 2010. In IEEE [60] Z. Dias, A. Rocha, and S. Goldenstein. First steps toward image phylogeny. International Workshop on Information Forensics and Security, pages 1–6, Dec 2010. In IEEE [61] Z. Dias, A. Rocha, and S. Goldenstein. Image phylogeny by minimal spanning trees. IEEE Transactions on Information Forensics and Security, 7(2):774–788, April 2012. [62] Zanoni Dias, Siome Goldenstein, and Anderson Rocha. Toward image phylogeny forests: Automatically recovering semantically similar image relationships. Forensic Science Inter- national, 231(1–3):178 – 189, 2013. [63] A. E. Dirik and A. Karaküçük. Forensic use of photo response non-uniformity of imaging sensors and a counter method. Opt. Express, 22(1):470–482, Jan 2014. [64] A. E. Dirik, H. T. Sencar, and N. Memon. Analysis of seam-carving-based anonymization IEEE Transactions on of images against PRNU noise pattern-based source attribution. Information Forensics and Security, 9(12):2277–2290, Dec 2014. [68] [65] H. Farid. Digital image forensics. Scientific American, 298(6):66–71, 2008. [66] H. Farid. Photo Forensics. The MIT Press, 2016. 
[67] Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. Hypergraph neural networks. Thirty-Third AAAI Conference on Artificial Intelligence, 2019. J. J. Foo, J. Zobel, and R. Sinha. Clustering near-duplicate images in large collections. In Proceedings of the International Workshop on Workshop on Multimedia Information Retrieval, page 21–30, New York, NY, USA, 2007. Association for Computing Machinery. [69] David Freire-Obregón, Fabio Narducci, Silvio Barra, and Modesto Castrillón-Santana. Deep learning for source camera identification on mobile devices. Pattern Recognition Letters, 126:86 – 91, 2019. Robustness, Security and Regulation Aspects in Current Biometric Systems. 184 [70] J. Fridrich. Digital image forensics. IEEE Signal Processing Magazine, 26(2):26–37, March 2009. [71] C. Galdi, M. Nappi, and J. L. Dugelay. Multimodal authentication on smartphones: Com- bining iris and sensor recognition for a double check of user identity. Pattern Recognition Letters, 3:34–40, 2015. [72] Chiara Galdi, Michele Nappi, and Jean-Luc Dugelay. Combining hardwaremetry and biome- try for human authentication via smartphones. In Vittorio Murino and Enrico Puppo, editors, Image Analysis and Processing (ICIAP), pages 406–416. Springer International Publishing, 2015. [73] Chiara Galdi, Michele Nappi, and Jean-Luc Dugelay. Secure user authentication on smart- phones via sensor and face recognition on short video clips. In Man Ho Allen Au, Arcangelo Castiglione, Kim-Kwang Raymond Choo, Francesco Palmieri, and Kuan-Ching Li, editors, Green, Pervasive, and Cloud Computing, pages 15–22. Springer International Publishing, 2017. [74] Z. J. Geradts, J. Bijhold, M. Kieft, K. Kurosawa, and K. Kuroki. Methods for identification of images acquired with digital cameras. Proc. SPIE 4232, Enabling Technologies for Law Enforcement and Security, 2001. [75] T. Gloe, M. Kirchner, A. Winkler, and R. Böhme. Can we trust digital image forensics? In Proceedings of the ACM International Multimedia Conference and Exhibition, pages 78–86. 01 2007. [76] M. Goljan. Digital camera identification from images - Estimating false acceptance proba- bility. Proc. 7th International Workshop on Digital Watermarking, Nov 2008. [77] M. Goljan, J. Fridrich, and M. Chen. Sensor noise camera identification: countering counter- forensics. In Proceedings of the SPIE, Media Forensics and Security II, volume 7541, 2010. [78] M. Goljan, J. Fridrich, and M. Chen. Defending against fingerprint-copy attack in sensor- IEEE Transactions on Information Forensics and Security, based camera identification. 6(1):227–236, March 2011. [79] R. C. Gonzalez and R. E. Woods. Digital Image Processing (3rd Edition). Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2006. [80] X. Guo, X. Liu, E. Zhu, and J. Yin. Deep clustering with convolutional autoencoders. In D. Liu, S. Xie, Y. Li, D. Zhao, and El-Sayed M. El-Alfy, editors, Neural Information Processing, pages 373–382, Cham, 2017. Springer International Publishing. [81] William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, (NIPS), pages 1025–1035, 2017. [82] H. Han and A. K. Jain. Age, gender and race estimation from unconstrained face images. 2014. 185 [83] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, June 2016. [84] Z. He, W. Zuo, M. 
Kan, S. Shan, and X. Chen. Attgan: Facial attribute editing by only changing what you want. IEEE Transactions on Image Processing, 28(11):5464–5478, 2019. [85] K. Hernandez-Diaz, F. Alonso-Fernandez, and J. Bigun. Periocular recognition using CNN features off-the-shelf. In Proceedings of the 17th International Conference of the Biometrics Special Interest Group (BIOSIG), pages 1–5, September 2018. [86] E. Hoffar and N. Ailon. Deep metric learning using triplet network. Workshop on Similarity-Based Pattern Recognition, pages 84–92. Springer, 2015. In International [87] V. Holub, J. Fridrich, and Tomás Denemark. Universal distortion function for steganography in an arbitrary domain. EURASIP Journal on Information Security, 2014:1–13, 2014. [88] G. B. Huang, M. R., T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007. [89] B. Jähne and H. Haußecker, editors. Computer Vision and Applications: A Guide for Students and Practitioners. Academic Press, Inc., Orlando, FL, USA, 2000. [90] A. K. Jain, A. Ross, and K. Nandakumar. Introduction to biometrics. Springer. [91] A. K. Jain, A. Ross, and K. Nandakumar. Introduction to biometrics. Springer, 2011. [92] A. K. Jain, A. Ross, and S. Prabhakar. An introduction to biometric recognition. IEEE Transactions on Circuits and Systems for Video Technology, 14(1):4–20, Jan 2004. [93] Anil K. Jain and Richard C. Dubes. Algorithms for Clustering Data. Prentice-Hall, Inc., USA, 1988. [94] R. Jillela and A. Ross. Ocular and periocular biometrics. In Wiley Encyclopedia of Electrical and Electronics Engineering. John Wiley & Sons, Inc., 2016. [95] R. Jillela, A. Ross, V. N. Boddeti, B. V. K. Vijaya Kumar, X. Hu, R. Plemmons, and P. Pauca. Iris segmentation for challenging periocular images. In K. W. Bowyer and Mark J. Burge, editors, Handbook of Iris Recognition, pages 281–308. Springer, London, 2016. [96] D. J. Jobson, Z. Rahman, and G. A. Woodell. A multiscale retinex for bridging the gap between color images and the human observation of scenes. Trans. Img. Proc., 6(7):965– 976, July 1997. [97] F. Juefei-Xu and M. Savvides. Subspace-based discrete transform encoded local binary patterns representations for robust periocular matching on NIST’s Face Recognition Grand Challenge. IEEE Transactions on Image Processing, 23(8):3490–3505, Aug 2014. 186 [98] N. Kalka, N. Bartlow, B. Cukic, and A. Ross. A preliminary study on identifying sensors from iris images. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 50–56, June 2015. [99] X. Kang, Y. Li, Z. Qu, and J. Huang. Enhancing source camera identification performance IEEE Transactions on Information with a camera reference phase sensor pattern noise. Forensics and Security, 7(2):393–402, April 2012. [100] C. Kauba, L. Debiasi, and A. Uhl. Identifying the Origin of Iris Images Based on Fusion of Local Image Descriptors and PRNU Based Techniques. In 3rd International Joint Conference on Biometrics (IJCB), October 2017. [101] Yan Ke, Rahul Sukthankar, and Larry Huston. An efficient parts-based near-duplicate and sub-image retrieval system. In Proceedings of the 12th Annual ACM International Conference on Multimedia, pages 869–876, 2004. [102] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, (ICLR),, 2017. [103] J. 
Komulainen, Z. Boulkenafet, and Z. Akhtar. Review of face presentation attack detection competitions. In Sébastien Marcel, Mark S. Nixon, Julian Fierrez, and Nicholas Evans, editors, Handbook of Biometric Anti-Spoofing: Presentation Attack Detection, pages 291– 317. Springer International Publishing, 2019. [104] Guoqi L. and Changyun W. Legendre polynomials in signal reconstruction and compression. In 5th IEEE Conference on Industrial Electronics and Applications, pages 1636–1640, June 2010. [105] B. Levy. Review of “Digital image forensics: There is more to a picture than meets the eye" by Husrev Taha Sencar and Nasir Memon (Editors). volume 4, page 17. 2013. [106] C. T. Li. Source camera identification using enhanced sensor pattern noise. IEEE Transac- tions on Information Forensics and Security, 5(2):280–287, June 2010. [107] C. T. Li, C. Y. Chang, and Y. Li. On the repudiability of device identification and image integrity verification using sensor pattern noise. pages 19–25, 2010. [108] X. Lin and C. T. Li. Enhancing sensor pattern noise via filtering distortion removal. IEEE Signal Processing Letters, 23(3):381–385, March 2016. [109] X. Lin and C. T. Li. Preprocessing reference sensor pattern noise via spectrum equalization. IEEE Transactions on Information Forensics and Security, 11(1):126–140, Jan 2016. [110] S. P. Lloyd. Least squares quantization in pcm. Technical Report RR-5497, Bell Lab, 1957. [111] B. D. Lucas and T. Kanade. An iterative image registration technique with an application In Proceedings of the 7th International Joint Conference on Artificial to stereo vision. Intelligence - Vol.2, IJCAI, pages 674–679. Morgan Kaufmann Publishers Inc., 1981. 187 [112] J. Lukas, J. Fridrich, and M. Goljan. Digital camera identification from sensor pattern noise. IEEE Transactions on Information Forensics and Security, 1(2):205–214, June 2006. [113] J. Lukas, J. Fridrich, and M. Goljan. Digital camera identification from sensor pattern noise. IEEE Transactions on Information Forensics and Security, 1(2):205–214, June 2006. [114] D. Maio, D. Maltoni, R. Cappelli, J. L. Wayman, and A. K. Jain. FVC2000: fingerprint verification competition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):402–412, March 2002. [115] Francesco Marra, Giovanni Poggi, Carlo Sansone, and Luisa Verdoliva. A deep learning approach for iris sensor model identification. Pattern Recognition Letters, 2017. [116] Francesco Marra, Giovanni Poggi, Carlo Sansone, and Luisa Verdoliva. A deep learning approach for iris sensor model identification. Pattern Recognition Letters, 113:46 – 53, 2018. [117] M. D. Marsico, M. Nappi, F. Narducci, and H. Proença. Insights into the results of MICHE I - mobile iris challenge evaluation. Pattern Recognition, 74:286 – 304, 2018. [118] J. R. Matey. Iris device. In S. Z. Li and A. K. Jain, editors, Encyclopedia of Biometrics, pages 774–778. Springer, 2009. [119] A. Melloni, P. Bestagini, S. Milani, M. Tagliasacchi, A. Rocha, and S. Tubaro. Image In Fifth European Workshop on Visual phylogeny through dissimilarity metrics fusion. Information Processing (EUVIP), pages 1–6, Dec 2014. [120] R. Mercuri. Courtroom considerations in digital image forensics. In H. T. Sencar and N. Memon, editors, Digital Image Forensics: There is More to a Picture than Meets the Eye, pages 313–325. Springer New York, New York, NY, 2013. [121] S. Milani, M. Fontana, P. Bestagini, and S. Tubaro. Phylogenetic analysis of near-duplicate images using processing age metrics. 
In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2054–2058, March 2016. [122] V. Mirjalili, S. Raschka, and A. Ross. Gender privacy: An ensemble of semi adversarial networks for confounding arbitrary gender classifiers. In 9th IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), pages 1–10, Oct 2018. [123] V. Mirjalili and A. Ross. Soft biometric privacy: Retaining biometric utility of face images while perturbing gender. In Proc. of IEEE International Joint Conference on Biometrics (IJCB), pages 564–573, Oct 2017. [124] D. Moreira, A. Bharati, J. Brogan, A. Pinto, M. Parowski, K. W. Bowyer, P. J. Flynn, IEEE Transactions on A. Rocha, and W.. Scheirer. Image provenance analysis at scale. Image Processing (T-IP), 27(12), 2018. [125] S. Nagaraja, P. Schaffer, and D. Aouada. Who clicks there!: Anonymising the photographer in a camera saturated society. In Proceedings of the 10th Annual ACM Workshop on Privacy in the Electronic Society, WPES ’11, pages 13–22, New York, NY, USA, 2011. 188 [126] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, page 849–856, 2001. [127] L. Nie, A. Kumar, and S. Zhan. Periocular recognition using unsupervised convolutional rbm feature learning. In 22nd International Conference on Pattern Recognition, pages 399–404, Aug 2014. [128] M. A. Oikawa, Z. Dias, A. de Rezende Rocha, and S. Goldenstein. Manifold learning and spectral clustering for image phylogeny forests. IEEE Transactions on Information Forensics and Security, 11(1):5–18, Jan 2016. [129] A. Oliveira, P. Ferrara, A. De Rosa, A. Piva, M. Barni, S. Goldenstein, Z. Dias, and A. Rocha. Multiple parenting identification in image phylogeny. In IEEE International Conference on Image Processing (ICIP), pages 5347–5351, Oct 2014. [130] S. Omachi and M. Omachi. Fast template matching with polynomials. IEEE Transactions on Image Processing, 16(8):2139–2149, Aug 2007. [131] C. N. Padole and H. Proenca. Compenstaing for pose and illumination in unconstrained periocular biometrics. International Journal of Biometrics, 5(3):336–359, 2013. [132] Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition. British Machine Vision Conference, 2015. In [133] N. Le Philippe, W. Puech, and C. Fiorio. Phylogeny of jpeg images by ancestor estimation using missing markers on image pairs. In Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), pages 1–6, Dec 2016. [134] A. C. Popescu and H. Farid. Exposing digital forgeries in color filter array interpolated images. IEEE Transactions on Signal Processing, 53(10):3948–3959, Oct 2005. [135] Salil Prabhakar, Alexander Ivanisov, and A.K. Jain. Biometric recognition: Sensor charac- teristics and image quality. Instrumentation & Measurement Magazine, IEEE, 14:10 – 16, 07 2011. [136] G. W. Quinn, J. Matey, E. Tabassi, and P. Grother. collection. In NIST Intragency Report 8013, 2014. IREX V: Guidance for iris image [137] J. Redi, W. Taktak, and J.L. Dugelay. Digital image forensics: A booklet for beginners. Multimedia Tools and Applications, 51(1):133–162, 2011. [138] J. A. Redi, W. Taktak, and J. L. Dugelay. Digital image forensics: A booklet for beginners. Multimedia Tools Appl., 51(1):133–162, January 2011. [139] A. Ross, S. Banerjee, C. Chen, A. Chowdhury, V. Mirjalili, R. Sharma, T. 
[139] A. Ross, S. Banerjee, C. Chen, A. Chowdhury, V. Mirjalili, R. Sharma, T. Swearingen, and S. Yadav. Some research problems in biometrics: The future beckons. In 12th IAPR International Conference on Biometrics (ICB), Crete, Greece, June 2019.
[140] C. Sammut. Density estimation. In C. Sammut and G. I. Webb, editors, Encyclopedia of Machine Learning and Data Mining, pages 348–349. Springer US, Boston, MA, 2017.
[141] M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD), pages 226–231, 1996.
[142] Gerald Schaefer and Michal Stich. UCID: An uncompressed color image database. In Minerva M. Yeung, Rainer W. Lienhart, and Chung-Sheng Li, editors, Storage and Retrieval Methods and Applications for Multimedia 2004, volume 5307, pages 472–480. International Society for Optics and Photonics, SPIE, 2003.
[143] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 815–823, June 2015.
[144] S. A. C. Schuckers, N. A. Schmid, A. Abhyankar, V. Dorairaj, C. K. Boyce, and L. A. Hornak. On techniques for angle compensation in nonideal iris recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 37(5):1176–1190, October 2007.
[145] Matthew Schultz and Thorsten Joachims. Learning a distance metric from relative comparisons. In S. Thrun, L. K. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16, pages 41–48. MIT Press, 2003.
[146] E. Shutova, S. Teufel, and A. Korhonen. Statistical metaphor processing. Computational Linguistics, 39(2):301–353, June 2013.
[147] R. Singh, M. Vatsa, and A. Noore. Improving verification accuracy by synthesis of locally enhanced biometric images and deformable model. Signal Processing, 87(11):2746–2764, 2007.
[148] Kihyuk Sohn. Improved deep metric learning with multi-class N-pair loss objective. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 1857–1865. Curran Associates, Inc., 2016.
[149] N. A. Spaun. Forensic biometrics from images and video at the Federal Bureau of Investigation. In IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS), pages 1–3, September 2007.
[150] G. Stockman and L. G. Shapiro. Computer Vision. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2001.
[151] V. Štruc and N. Pavesic. Photometric normalization techniques for illumination invariance. In Advances in Face Image Analysis: Techniques and Technologies, pages 279–300. IGI Global, 2011.
[152] C. W. Tan and A. Kumar. Towards online iris and periocular recognition under relaxed imaging constraints. IEEE Transactions on Image Processing, 22(10):3751–3765, October 2013.
[153] L. W. Tu. Bump functions and partitions of unity. In An Introduction to Manifolds, pages 127–134. Springer New York, New York, NY, 2008.
[154] A. Uhl and Y. Höller. Iris-sensor authentication using camera PRNU fingerprints. In 5th IAPR International Conference on Biometrics (ICB), pages 230–237, March 2012.
[155] D. Valsesia, G. Coluccia, T. Bianchi, and E. Magli. User authentication via PRNU-based physical unclonable functions. IEEE Transactions on Information Forensics and Security, 12(8):1941–1956, 2017.
[156] L. van der Maaten and G. E. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, November 2008.
[157] L. J. G. Villalba, A. L. S. Orozco, J. R. Corripio, and J. H. Castro. A PRNU-based counter-forensic method to manipulate smartphone image source identification techniques. Future Generation Computer Systems, 76:418–427, 2017.
[158] M. Vollmer and K. P. Möllmann. Infrared Thermal Imaging: Fundamentals, Research and Applications. Wiley, New York, 2010.
[159] G. K. Wallace. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics, 38(1):xviii–xxxiv, February 1992.
[160] Haitao Wang, S. Z. Li, and Yangsheng Wang. Face recognition under varying lighting conditions using self quotient image. In Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pages 819–824, May 2004.
[161] Liwei Wang, Yin Li, and Svetlana Lazebnik. Learning deep structure-preserving image-text embeddings. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5005–5013, 2016.
[162] C. Ye, R. C. Wilson, C. H. Comin, L. da F. Costa, and E. R. Hancock. Entropy and heterogeneity measures for directed graphs. In E. Hancock and M. Pelillo, editors, Similarity-Based Pattern Recognition, pages 219–234, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
[163] Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, April 2004.
[164] K. Zuiderveld. Contrast limited adaptive histogram equalization. In Paul S. Heckbert, editor, Graphics Gems IV, pages 474–485. Academic Press Professional, Inc., San Diego, CA, USA, 1994.