TOWARDS ROBUST AND SECURE FACE RECOGNITION: DEFENSE AGAINST PHYSICAL AND DIGITAL ATTACKS

By

Debayan Deb

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computer Science - Doctor of Philosophy

2021

ABSTRACT

TOWARDS ROBUST AND SECURE FACE RECOGNITION: DEFENSE AGAINST PHYSICAL AND DIGITAL ATTACKS

By

Debayan Deb

The accuracy, usability, and touchless acquisition of state-of-the-art automated face recognition (AFR) systems have led to their ubiquitous adoption in a plethora of domains, including mobile phone unlock, access control systems, and payment services. Despite impressive recognition performance, prevailing AFR systems remain vulnerable to the growing threat of face attacks, which can be launched in both physical and digital domains. Face attacks can be broadly classified into three attack categories: (i) Spoof attacks: artifacts in the physical domain (e.g., 3D masks, eyeglasses, replaying videos), (ii) Adversarial attacks: imperceptible noises added to probes for evading AFR systems, and (iii) Digital manipulation attacks: entirely or partially modified photo-realistic faces using generative models. Each of these categories is composed of different attack types. For example, each spoof medium, e.g., 3D mask and makeup, constitutes one attack type. Likewise, in adversarial and digital manipulation attacks, each attack model, designed with unique objectives and losses, may be considered as one attack type. Thus, the attack categories and types form a 2-layer tree structure encompassing the diverse attacks. Such a tree will inevitably grow in the future. Given the growing dissemination of "fake news" and "deepfakes", the research community and social media platforms alike are pushing towards generalizable defense against continuously evolving and sophisticated face attacks. In this dissertation, we propose a set of defense methods that achieve state-of-the-art performance in detecting attack types within individual attack categories, both physical (e.g., face spoofs) and digital (e.g., adversarial faces and digital manipulation), and then introduce a method for simultaneously safeguarding against each attack.

First, in an effort to impart generalizability and interpretability to face spoof detection systems, we propose a new face anti-spoofing framework designed to detect unknown spoof types, namely Self-Supervised Regional Fully Convolutional Network (SSR-FCN), that is trained to learn local discriminative cues from a face image in a self-supervised manner. The proposed framework improves generalizability while maintaining the computational efficiency of holistic face anti-spoofing approaches (< 4 ms on a Nvidia GTX 1080 Ti GPU). The proposed method is also interpretable since it localizes which parts of the face are labeled as spoofs. Experimental results show that SSR-FCN can achieve True Detection Rate (TDR) = 65% @ 2.0% False Detection Rate (FDR) when evaluated on a dataset comprising 13 different spoof types under unknown attacks, while achieving competitive performance on standard benchmark face anti-spoofing datasets (Oulu-NPU, CASIA-MFSD, and Replay-Attack).

Next, we address the problem of defending against adversarial attacks. We propose AdvFaces, an automated adversarial face synthesis method that learns to generate minimal perturbations in the salient facial regions. Once AdvFaces is trained, it can automatically evade state-of-the-art face matchers with attack success rates as high as 97.22% and 24.30% at 0.1% FAR for obfuscation and impersonation attacks, respectively. We then propose a new self-supervised adversarial defense framework, namely FaceGuard, that can automatically detect, localize, and purify a wide variety of adversarial faces without utilizing pre-computed adversarial training samples.
FaceGuard automatically synthesizes diverse adversarial faces, enabling a classifier to learn to distinguish them from bona fide faces. Concurrently, a purifier attempts to remove the adversarial perturbations in the image space. FaceGuard can achieve 99.81%, 98.73%, and 99.35% detection accuracies on LFW, CelebA, and FFHQ, respectively, on six unseen adversarial attack types.

Finally, we take the first steps towards safeguarding AFR systems against face attacks in both physical and digital domains. We propose a new unified face attack detection framework, namely UniFAD, which automatically clusters similar attacks and employs a multi-task learning framework to learn salient features to distinguish between bona fides and coherent attack types. The proposed UniFAD can detect face attacks from 25 attack types across all 3 attack categories with TDR = 94.73% @ 0.2% FDR on a large fake face dataset, namely GrandFake. Further, UniFAD can identify whether attacks are adversarial, digitally manipulated, or contain spoof artifacts, with 97.37% accuracy.

Copyright by
DEBAYAN DEB
2021

To My Loving Family

ACKNOWLEDGMENTS

Arguably, this section has been the hardest one to write in this entire dissertation due to the sheer number of individuals who were instrumental towards the completion of my PhD research. Foremost, I would like to express my deepest gratitude towards my advisor, Dr. Anil K. Jain, for his unwavering support and encouragement to strive for excellence. Despite my being a raw student at first, Dr. Jain took a chance on me by bringing me into his research lab and patiently teaching me the rigors of research. His (i) ability to systematically disseminate problems, (ii) attention to detail, (iii) work ethic, (iv) skill of presenting ideas in a visually convincing manner, and (v) promptness in responding to emails are some of the many virtues that I will always strive to instill in me. Apart from being a great scientist, he is also extraordinarily humble and kind at heart. Thank you, Dr. Jain, also for your friendship and guidance on matters outside of research. Thank you, Dr. Ross, for teaching CSE 491: Selected Topics in Biometrics and admitting me into your research group prior to my PhD. Without your initial support, I would not have been writing this dissertation.

I would also like to thank my PhD committee, Dr. Arun Ross, Dr. Xiaoming Liu, and Dr. Mi Zhang, for providing valuable comments and suggestions on this dissertation. A special thanks to Dr. Liu and Dr. Ross for your patience, inspirational ideas, and suggestions during our collaborative research. Thank you Brenda Hodge, Steven Smith, Amy King, and Erin Dunlop for your administrative assistance and your patience with my extremely tardy travel reimbursement forms. A special thanks to Christopher Perry for being a great support and help in setting up my GPU machine, without which the research conducted in this dissertation could not have been possible.

My deepest gratitude to Brendan Klare from Rank One Computing, Jianbang Zhang from Lenovo, and Biplob Debnath from NEC, for providing me with internship opportunities at their respective companies. Through the internships, I got an opportunity to view research from a business perspective. I would also like to thank Ford, Lenovo, IARPA, and Facebook AI Research for their support in research conducted during my PhD program.
I am grateful for all my fellow PRIPies, especially my contemporaries (Josh, Yichun, and Sixue). Thank you, Josh, for your close friendship and the many beer-driven conversations we had across the world over the years. Thank you, Inci, for all of our conversations in the lab, your encouragements throughout, and willingness to collaborate with me on future research. A special thanks to Aishvary, who has been like a brother to me for the past decade. Thank you Vishesh, Divyansh, Yichun, and Tarang for helping me gain confidence during my lows. Thank you also to Steve, Kai, Cori, Lacey, Charles, and Sunpreet. Special thanks to non-PRIPies including but not limited to Yaojie Liu, Amin Jourabloo, Joel Stehouwer, Sudipta Banerjee, Rahul Dey, Siddharth Shukla, Steven Hoffman, Adam Terwilliger, Xi Yin, and Thomas Swearingen. Sincere apologies to all for my tardiness.

I owe a great deal to my parents and sister for being a constant source of support and guidance while I journeyed through the ups and downs as a graduate student. Without their moral support, none of this would have been possible.

There are of course many others who have had a great impact on me along the way and I sincerely thank all of you.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS

Chapter 1    Introduction
  1.1 Background
  1.2 Automated Face Recognition (AFR)
    1.2.1 AFR Pipeline
      1.2.1.1 Face Detection
      1.2.1.2 Face Alignment
      1.2.1.3 Feature Extraction
      1.2.1.4 Similarity Measurement
  1.3 Evolution of Face Recognition
    1.3.1 Face Representations
      1.3.1.1 Holistic Face Representations
      1.3.1.2 Local Face Representations
      1.3.1.3 Learned Face Representations
  1.4 Benchmarking AFR Systems
    1.4.1 Evaluation Metrics
    1.4.2 Face Datasets
    1.4.3 Constrained Face Recognition
    1.4.4 Unconstrained Face Recognition
  1.5 Vulnerabilities of AFR Systems
    1.5.1 Physical Face Spoofs
    1.5.2 Digital Adversarial Faces
    1.5.3 Digital Face Manipulation
  1.6 Dissertation Contributions

Chapter 2    Defending Against Face Spoofs
  2.1 Introduction
  2.2 Background
  2.3 Motivation
    2.3.1 Face Presentation Attack Detection is a Local Task
    2.3.2 Global vs. Local Supervision
  2.4 Proposed Approach
    2.4.1 Network Architecture
    2.4.2 Network Efficiency
    2.4.3 Stage I: Training FCN Globally
    2.4.4 Stage II: Training FCN on Self-Supervised Regions
    2.4.5 Testing
  2.5 Experimental Setup
    2.5.1 Datasets
      2.5.1.1 Spoof-in-the-Wild with Multiple Attacks (SiW-M) [1]
      2.5.1.2 Oulu-NPU [2]
      2.5.1.3 CASIA-FASD [3] & Replay-Attack [4]
    2.5.2 Data Preprocessing
    2.5.3 Implementation Details
    2.5.4 Evaluation Metrics
  2.6 Experimental Results
    2.6.1 Evaluation of Global Descriptor vs. Local Representation
    2.6.2 Region Extraction Strategies
    2.6.3 Evaluation of Network Capacity
    2.6.4 Generalization across Unknown Attacks
    2.6.5 SiW-M: Detecting Known Attacks
    2.6.6 Evaluation on Oulu-NPU Dataset
    2.6.7 Cross-Dataset Generalization
    2.6.8 Failure Cases
    2.6.9 Computational Requirement
    2.6.10 Visualizing Presentation Attack Regions
  2.7 Discussion
  2.8 Summary

Chapter 3    Synthesizing and Defending Against Adversarial Faces
  3.1 Introduction
  3.2 Related Work
    3.2.1 Generative Adversarial Networks (GANs)
    3.2.2 Adversarial Attacks on Image Classification
    3.2.3 Adversarial Attacks on Face Recognition
    3.2.4 Defenses Against Adversarial Attacks
  3.3 Synthesizing Adversarial Faces
    3.3.1 Proposed Methodology
    3.3.2 Experimental Settings
    3.3.3 Comparison with State-of-the-Art
    3.3.4 Ablation Study
    3.3.5 What is AdvFaces Learning?
    3.3.6 Transferability of AdvFaces
    3.3.7 Effect of Perturbation Amount
    3.3.8 Human Perceptual Study
    3.3.9 Addendum
      3.3.9.1 Implementation Details
      3.3.9.2 Effect on Cosine Similarity
      3.3.9.3 Structural Similarity
      3.3.9.4 Baseline Implementation Details
  3.4 Defending Against Adversarial Faces
    3.4.1 Limitations of State-of-the-Art Defenses
    3.4.2 Proposed Methodology
      3.4.2.1 Adversarial Generator
      3.4.2.2 Adversarial Detector
      3.4.2.3 Adversarial Purifier
      3.4.2.4 Training Framework
    3.4.3 Experimental Settings
    3.4.4 Comparison with State-of-the-Art Defenses
    3.4.5 Analysis of Our Approach
    3.4.6 Addendum
      3.4.6.1 Implementation Details
      3.4.6.2 Preprocessing
      3.4.6.3 Network Architectures
      3.4.6.4 Training Details
      3.4.6.5 Baselines
      3.4.6.6 Additional Datasets
      3.4.6.7 Overfitting in Prevailing Detectors
      3.4.6.8 Qualitative Results
      3.4.6.9 Additional Results on Purification
  3.5 Summary

Chapter 4    Unified Detection of Digital and Physical Face Attacks
  4.1 Introduction
  4.2 Related Work
  4.3 Dissecting Prevailing Defense Systems
    4.3.1 Datasets
    4.3.2 Drawback of Joint CNN
    4.3.3 Unifying Multiple Joint CNNs
  4.4 Proposed Method
    4.4.1 Problem Definition
    4.4.2 Automatic Construction of Auxiliary Tasks
    4.4.3 Multi-Task Learning with Constructed Tasks
    4.4.4 Parameter Sharing
    4.4.5 Training and Testing
  4.5 Experimental Results
    4.5.1 Experimental Settings
    4.5.2 Comparison with Individual SOTA Detectors
    4.5.3 Comparison with Fused SOTA Detectors
    4.5.4 Attack Classification
    4.5.5 Analysis of UniFAD
  4.6 Addendum
    4.6.1 GrandFake Dataset
    4.6.2 Implementation Details
    4.6.3 Digital Attack Implementation
    4.6.4 Baseline Implementation
    4.6.5 Seen Attacks
    4.6.6 Generalizability to Unseen Attacks
    4.6.7 Attack Category Classification
  4.7 Summary

Chapter 5    Summary
  5.1 Contributions
  5.2 Suggestions for Future Work

Chapter 6    PhD Overview
  6.1 Publications
  6.2 Videos & Demos
  6.3 Media Coverage

BIBLIOGRAPHY
LIST OF TABLES

Table 1.1    Verification performance (%) under two different face detectors on LFW, CFP-FP, and AgeDB-30 [5]. Figure 1.7 shows a few examples of each of the three datasets.
Table 1.2    Verification performance (%) of the FaceNet [6] AFR system under different face alignment techniques [7].
Table 1.3    Verification performance (%) on LFW [8] for different face feature extractors.
Table 1.4    Benchmarking AFR performance throughout the years in NIST evaluations on frontal and constrained faces.
Table 2.1    A summary of publicly available face presentation attack detection datasets.
Table 2.2    Architecture details of the proposed FCN backbone.
Table 2.3    Generalization error on learning global (CNN) vs. local (FCN) representations of SiW-M [1].
Table 2.4    Generalization performance of different region extraction strategies on the SiW-M dataset. Here, each column represents an unknown presentation attack instrument while the method is trained on the remaining 12 presentation attack instruments.
Table 2.5    Generalization error of FCNs with respect to the number of trainable parameters.
Table 2.6    Results on SiW-M: Unknown Attacks. Here, each column represents an unknown presentation attack instrument while the method is trained on the remaining 12 presentation attack instruments.
Table 2.7    Results on SiW-M: Known presentation attack instruments.
Table 2.8    Error Rates (%) of the proposed SSR-FCN and competing face presentation attack detectors under the four standard protocols of Oulu-NPU [2].
Table 2.9    Cross-Dataset HTER (%) of the proposed SSR-FCN and competing face presentation attack detectors.
Table 3.1    Related work in adversarial defenses used as baselines in our study. Unlike the majority of prior work, FaceGuard is self-supervised, where no pre-computed adversarial examples are required for training.
Table 3.2    Attack success rates and structural similarities between probe and gallery images for obfuscation and impersonation attacks. Attack rates for obfuscation comprise 484,514 comparisons, and the mean and standard deviation across 10 folds for impersonation are reported. The mean and standard deviation of the structural similarities between adversarial and probe images, along with the time taken to generate a single adversarial image (on a Quadro M6000 GPU), are also reported.
Table 3.3    For each method, the average and standard deviation (%) of the number of times workers chose the synthesized image to be closest to the probe.
Table 3.4    Face recognition performance of ArcFace [9] under adversarial attack and average structural similarities (SSIM) between probe and adversarial images for obfuscation attacks on 485K genuine pairs in LFW [8].
Table 3.5    Detection accuracy of SOTA adversarial face detectors in classifying six adversarial attacks synthesized for the LFW dataset [8]. Detection threshold is set as 0.5 for all methods. All baseline methods require training on pre-computed adversarial attacks on CASIA-WebFace [10]. On the other hand, the proposed FaceGuard is self-guided and generates adversarial attacks on the fly. Hence, it can be regarded as a black-box defense system.
Table 3.6    AFR performance (TAR (%) @ 0.1% FAR) of ArcFace under no defense and when ArcFace is trained via SOTA robustness techniques [11-13] or SOTA purifiers [14, 15]. FaceGuard correctly passes the majority of real faces to ArcFace and also purifies adversarial attacks.
Table 3.7    Ablating training schemes of the generator G and detector D. All models are trained on CASIA-WebFace [10]. (Col. 3) We compute the detection accuracy in classifying real faces in LFW [8] and the most challenging adversarial attack in Tab. 3.4, AdvFaces [16]. (Col. 4) The avg. and std. dev. of detection accuracy across all 6 adversarial attacks.
Table 3.8    Average and standard deviation of detection accuracies of SOTA adversarial face detectors in classifying six adversarial attacks synthesized for the LFW [8], CelebA [17], and FFHQ [18] datasets. Detection threshold is set as 0.5 for all methods. All baseline methods require training on pre-computed adversarial attacks on CASIA-WebFace [10]. On the other hand, the proposed FaceGuard is self-guided and generates adversarial attacks on the fly. Hence, it can be regarded as a black-box defense system.
Table 3.9    Detection accuracy of SOTA adversarial face detectors in classifying six adversarial attacks synthesized for the LFW dataset [8] under various known and unseen attack scenarios. Detection threshold is set as 0.5 for all methods.
Table 4.1    Face attack datasets with no. of bona fide images, no. of attack images, and no. of attack types. Here, I denotes images and V refers to videos.
Table 4.2    Detection accuracy (TDR (%) @ 0.2% FDR) on the GrandFake dataset. Results on fusing FaceGuard [19], FFD [20], and SSR-FCN [21] are also reported. We report the time taken to detect a single image (on a Nvidia 2080 Ti GPU).
Table 4.3    Ablation study over components of UniFAD. Branching via "B-Semantic", "B-Random", and "B-kMeans" refers to partitioning attack types by their semantic categories, randomly, and via k-Means, respectively. "Shared Semantic" includes shared layers prior to branching.
Table 4.4    Composition and statistics for the proposed GrandFake dataset. We also include the evaluation protocol for the seen attack scenario.
Table 4.5    Detection performance (TDR (%) @ 0.2% FDR and Accuracy (%)) on the GrandFake dataset under the seen attack scenario.
Table 4.6    Generalization performance (TDR (%) @ 0.2% FDR and Accuracy (%)) on the GrandFake dataset under the unseen attack setting. Each fold comprises 8 unseen attacks from all 4 branches.

LIST OF FIGURES

Figure 1.1    (a) Identification error rates of six state-of-the-art automated face recognition vendors on a 12 million mugshot dataset, namely FRVT-2018 [22]. (b) Six mugshots representative of the FRVT-2018 dataset.
Figure 1.2    Sources of intra-subject variability: (a) pose, (b) illumination, and (c) expression. Each row shows intra-subject variations for the same individual in (a-c; source: PIE Dataset [23]), (d) Amitabh Bacchan (source: Google Images), and (e) Tom Hiddleston (source: Google Images).
Figure 1.3    Identification error rates of six state-of-the-art automated face recognition vendors when (a) mugshots, (b) webcam images, and (c) profile faces are compared against a 1.6 million mugshot dataset (a subset of the FRVT-2018 [22]).
Figure 1.4    Identification error rates of six state-of-the-art automated face recognition vendors on a 3 million mugshot dataset under aging [22]. Face recognition errors increase as the time gap between a probe image and the enrolled face image in the gallery increases.
Figure 1.5    Source of inter-subject similarities: (a) kinship relations (here, twins), (b) different people with no kinship relations who happen to exhibit very similar facial characteristics (known as "doppelgangers"), and (c) Richard Jones (right) spent 17 years in prison for a crime committed by his doppelganger, Ricky Amos (left) (Source: https://cnn.it/2Gb1F4A).
Figure 1.6    A typical Automated Face Recognition (AFR) system consists of (i) face detection, (ii) face alignment (to mitigate geometric distortions), (iii) feature extraction, and (iv) comparison of face representations (feature vectors).
Figure 1.7    Example faces in (a) LFW [8], (b) CFP [24], and (c) AgeDB [25] datasets.
Figure 1.8    A state-of-the-art face detector, RetinaFace, can detect around 900 faces (detection threshold at 0.5) out of the 1,151 people reported to be present in the "World's Largest Selfie" [5]. The yellow rectangle denotes the bounding box around a face and green dots represent the detected landmarks.
Figure 1.9    Example images that supposedly contain faces but cannot be detected by a state-of-the-art face detector, RetinaFace [5]. Note that these images are of very low resolution.
Figure 1.10    Illustration of various 2D face alignment techniques: (i) simply cropping the face region, (ii) similarity transform (scale and rotation), (iii) affine transformation (rotation, scaling, and shear mapping), and (iv) projective transformation (perspective deformation) [7].
Figure 1.11    Example face images from (a) FERET [26], (b) FRGC [26], (c) LFW [8], (d) IJB-A [27], (e) MS-Celeb-1M [28], and (f) TinyFace [29]. Datasets (a) and (b) contain face images under relatively controlled acquisition conditions. Datasets (c-f) contain more unconstrained face images (e.g., collected from the Internet).
Figure 1.12    Face attacks against AFR systems are continuously evolving in both digital and physical spaces. Given the diversity of the face attacks, prevailing methods fall short in detecting attacks across all three categories (i.e., adversarial, digital manipulation, and spoofs).
Figure 1.13    Example presentation attacks: Simple attacks include (b) a printed photograph, or (c) replaying the victim's video. More advanced presentation attacks can also be leveraged, such as (d-h) 3D masks, (i-k) make-up attacks, or (l-n) partial attacks [30]. A bona fide face is shown in (a) for comparison. Here, the presentation attacks in (b-c, k-n) belong to the same person in (a).
Figure 1.14    Example gallery and probe face images and corresponding synthesized adversarial examples. (a) Two celebrities' real face photos enrolled in the gallery and (b) the same subjects' probe images; (c) Adversarial examples generated from (b) by our proposed synthesis method, AdvFaces; (d-e) Results from two adversarial example generation methods. Cosine similarity scores (in [-1, 1]) obtained by comparing (b-e) to the enrolled image in the gallery via ArcFace [9] are shown below the images. A score above 0.28 (threshold @ 0.1% False Accept Rate) indicates that two face images belong to the same subject. Here, a successful obfuscation attack would mean that humans can identify the adversarial probes and enrolled faces as belonging to the same identity but an automated face recognition system considers them to be from different subjects.
Figure 1.15    Examples of digitally manipulated faces. (a) Real images/frames from FFHQ, CelebA and FaceForensics++ datasets; (b) Paired face identity swap images from FaceForensics++ dataset; (c) Paired face expression swap images from FaceForensics++ dataset; (d) Attribute manipulated examples by FaceAPP and StarGAN; (e) Entirely synthesized faces by PGGAN and StyleGAN. Collage sourced from [20].
Figure 2.1    Example presentation attack instruments: Simple attacks include (b) a printed photograph, or (c) replaying the victim's video. More advanced presentation attacks can also be leveraged, such as (d-h) 3D masks, (i-k) make-up attacks, or (l-n) partial attacks [30]. A bona fide face is shown in (a) for comparison. Here, the presentation attacks in (b-c, k-n) belong to the same person in (a).
Figure 2.2    An overview of the proposed Self-Supervised Regional Fully Convolutional Network (SSR-FCN). We train in two stages: (1) Stage 1 learns global discriminative cues via training on the entire face image. The score map obtained from stage 1 is hard-gated to obtain presentation attack regions in the face image. We randomly crop arbitrary-size patches from the presentation attack regions and fine-tune our network in stage 2 to learn local discriminative cues. During testing, we input the entire face image to obtain the final score. The score map can also be used to visualize the presentation attack regions in the input image.
Figure 2.3    Illustration of drawbacks of prior approaches. Top: example of a bona fide face; Bottom: example of a paper glasses presentation attack. In this case, the presentation attack artifact is only present in the eye region of the face. (a) A network trained with global supervision overfits to the bona fide class since both images are mostly bona fide (the presentation attack instrument covers only a part of the face). (b) Pixel-level supervision assumes the entire image is either bona fide or presentation attack and constructs label maps accordingly. This is not a valid assumption in mask, makeup, and partial presentation attack instruments. Instead, (c) the proposed framework trains on extracted regions from face images. These regions can be based on domain knowledge, such as eye, nose, and mouth regions, or randomly cropped. The proposed SSR-FCN utilizes self-supervised region selection.
Figure 2.4    Three presentation attack images and their corresponding binary masks extracted from predicted score maps. Black regions correspond to predicted bona fide regions, whereas white regions indicate presentation attack.
Figure 2.5    Illustration of various region extraction strategies from training images. (a) and (b) are fixed regions extracted via domain knowledge (manually defined or landmark-based). (c) random regions extracted via the proposed self-supervision scheme. Each color denotes a separate region.
Figure 2.6    (a) An example obfuscation presentation attack attempt where our network correctly predicts the input to be a presentation attack. (b, e) Score map output by our network trained via Self-Supervised Regions. (c, f) Score map output by FCN trained on entire face images. (d) An example obfuscation presentation attack attempt where our network incorrectly predicts the input to be bona fide. Attack scores are given below the score maps. Decision threshold is 0.5.
Figure 2.7    Network convergence over a number of training iterations when a model trains on (a) randomly cropped patches (blue line), and (b) self-supervised regions extracted via the pre-trained model from Stage I (orange line). Randomly cropping patches may result in noisy samples, where bona fide samples from presentation attack samples may be used for training. Some example randomly cropped patches with high training loss are shown above the lines. Instead, we find that the proposed self-supervision aids in network convergence.
Figure 2.8    Example cases where the proposed framework, SSR-FCN, fails to correctly classify bona fide and presentation attacks. (a) Bona fides misclassified as presentation attacks, likely due to bright lighting and occlusions in face regions. (b) Presentation attacks misclassified as bona fide due to the subtle nature of make-up attacks and transparent masks. Corresponding attack scores (in [0, 1]) are provided below each image. A larger value of attack score indicates a higher likelihood that the input image is a presentation attack. Decision threshold is 0.5.
Figure 2.9    Visualizing presentation attack regions via the proposed SSR-FCN. Red regions indicate higher likelihood of being a presentation attack region. Corresponding attack scores (in [0, 1]) are provided below each image. A larger value of attack score indicates a higher likelihood that the input image is a presentation attack. Decision threshold is 0.5.
Figure 2.10    A partial presentation attack artifact may be present in a small portion of the input 256 x 256 face image, such as (a) paper eyeglasses. However, since the proposed SSR-FCN dynamically aggregates decisions across multiple receptive fields in the image, a majority of the pixels in the score map comprise high scores (indicating the presence of a presentation attack). We visualize the average score map across all paper eyeglass attacks in (b).
Figure 3.1    Example gallery and probe face images and corresponding synthesized adversarial examples. (a) Two celebrities' real face photos enrolled in the gallery and (b) the same subjects' probe images; (c) Adversarial examples generated from (b) by our proposed synthesis method, AdvFaces; (d-e) Results from two adversarial example generation methods. Cosine similarity scores (in [-1, 1]) obtained by comparing (b-e) to the enrolled image in the gallery via ArcFace [9] are shown below the images. A score above 0.28 (threshold @ 0.1% False Accept Rate) indicates that two face images belong to the same subject. Here, a successful obfuscation attack would mean that humans can identify the adversarial probes and enrolled faces as belonging to the same identity but an automated face recognition system considers them to be from different subjects.
Figure 3.2    Three types of face presentation attacks: (a) printed photograph, (b) replaying the targeted person's video on a smartphone, and (c) a silicone mask of the target's face. Face presentation attacks require a physical artifact. Adversarial attacks (d), on the other hand, are digital attacks that can compromise either a probe image or the gallery itself. To a human observer, face presentation attacks (a-c) are more conspicuous than adversarial faces (d).
Figure 3.3    Eight points of attacks in an automated face recognition system [31]. An adversarial image can be injected into the AFR system at points 2 and 6 (solid arrows).
Figure 3.4    Once trained, AdvFaces automatically generates an adversarial face image. During an obfuscation attack, (a) the adversarial face appears to be a benign example of Cristiano Ronaldo's face; however, it fails to match his enrolled image. AdvFaces can also combine Cristiano's probe and Brad Pitt's probe to synthesize an adversarial image that looks like Cristiano but matches Brad's gallery image (b).
Figure 3.5    Given a probe face image, AdvFaces automatically generates an adversarial mask that is then added to the probe to obtain an adversarial face image.
Figure 3.6    Adversarial face synthesis results on the LFW dataset in (a) obfuscation and (b) impersonation attack settings (cosine similarity scores obtained from ArcFace [9] with threshold @ 0.1% FAR = 0.28). The proposed method synthesizes adversarial faces that are seemingly inconspicuous and maintain high perceptual quality. Additional examples are available in the Addendum (Sec. 3.3.9).
Figure 3.7    Variants of AdvFaces trained without the discriminator, perturbation loss, and identity loss, respectively. Every component of AdvFaces is necessary.
Figure 3.8    State-of-the-art face matchers can be evaded by slightly perturbing salient facial regions, such as eyebrows, eyeballs, and nose (cosine similarity obtained via ArcFace [9]).
Figure 3.9    Correlation between face features extracted via FaceNet and ArcFace from 1,456 images belonging to 10 subjects.
Figure 3.10    2D t-SNE visualization of face representations extracted via FaceNet and ArcFace from 1,456 images belonging to 10 subjects.
Figure 3.11    Trade-off between attack success rate and structural similarity for impersonation attacks. We choose a perturbation hinge of 8.0.
Figure 3.12    Example failure cases where human observers voted an adversarial image synthesized by (c) GFLM [32] and (f) PGD [33] to be closer to the probe face (a), (d) compared to AdvFaces (b), (e).
Figure 3.13    Shift in cosine similarity scores for ArcFace [9] before and after adversarial attacks generated via AdvFaces.
Figure 3.15    Left: Real face images in the LFW dataset. Right: Adversarial images synthesized via AdvFaces under the obfuscation setting.
Figure 3.16    Leonardo DiCaprio's real face photo (a) enrolled in the gallery and (b) his probe image; (c) Adversarial probe synthesized by a state-of-the-art (SOTA) adversarial face generator, AdvFaces [16]; (d) The proposed adversarial defense framework, namely FaceGuard, takes (c) as input, detects adversarial images, localizes perturbed regions, and outputs a purified face devoid of adversarial perturbations. A SOTA face recognition system, ArcFace, fails to match Leonardo's adversarial face (c) to (a); however, the purified face can successfully match to (a). Cosine similarity scores (in [-1, 1]) obtained via ArcFace [9] are shown below the images. A score above 0.36 (threshold @ 0.1% False Accept Rate) indicates that two faces are of the same subject.
Figure 3.17    (Top Row) Adversarial faces synthesized via 6 adversarial attacks used in our study. (Bottom Row) Corresponding adversarial perturbations (gray indicates no change from the input). Notice the diversity in the perturbations. ArcFace scores between the adversarial image and the unaltered gallery image (not shown here) are given below each image. A score above 0.36 indicates that two faces are of the same subject. Zoom in for details.
Figure 3.18    FaceGuard employs a detector (D) to compute an adversarial score. A score below the detection threshold passes the input to the AFR system, while a high value invokes a purifier and sends the purified face to the AFR system.
Figure 3.19    (a) Adversarial training degrades AFR performance of the FaceNet matcher [6] on real faces in the LFW dataset compared to standard training. (b) A binary classifier trained to distinguish between real faces and FGSM [34] attacks fails to detect an unseen attack type, namely PGD [35].
Figure 3.20    Overview of training the proposed FaceGuard in a self-supervised manner. An adversarial generator, G, continuously learns to synthesize challenging and diverse perturbations that evade a face matcher. At the same time, a detector, D, learns to distinguish between the synthesized adversarial faces and real face images. Perturbations residing in the synthesized adversarial faces are removed via a purifier, Pur.
Figure 3.21    Examples where the proposed FaceGuard fails to correctly detect (a) real faces and (b) adversarial faces. Detection scores (in [0, 1]) are given below each image, where 0 indicates a real face and 1 indicates an adversarial face.
Figure 3.22    Adversarial faces synthesized by FaceGuard during training. Note the diversity in perturbations (a) within and (b) across iterations.
Figure 3.23    FaceGuard successfully purifies the adversarial image (red regions indicate adversarial perturbations localized by our purification mask). ArcFace [9] scores (in [-1, 1]) and SSIM (in [0, 1]) between the purified adversarial probe and the input probe are given below each image.
Figure 3.24    (a) FaceGuard's purification is correlated with its adversarial synthesis process. (b) Trade-off between detection and purification with respect to perturbation magnitudes. With minimal perturbation, detection is challenging while purification maintains AFR performance. Excessive perturbations lead to easier detection with greater challenge in purification.
Figure 3.25    Training loss across iterations when an adversarial detection network is trained via pre-computed adversarial faces (blue), the proposed adversarial generator but without the diversity loss (orange), and with the proposed diversity loss (green). The diversity loss prevents the network from overfitting to adversarial perturbations encountered during training.
Figure 3.26    Examples of generated adversarial images along with corresponding perturbation masks obtained via FaceGuard's generator G for three randomly sampled z. Cosine similarity scores via ArcFace [9] (in [-1, 1]) and SSIM (in [0, 1]) between the synthesized adversarial image and the input probe are given below each image. A score above 0.36 (threshold @ 0.1% False Accept Rate) indicates that two faces are of the same subject.
Figure 3.27    Examples of purified images via MagNet [14], Defense-GAN [15], and the proposed FaceGuard for six adversarial attacks. Cosine similarity scores via ArcFace [9] (in [-1, 1]) are given below each image. A score above 0.36 (threshold @ 0.1% False Accept Rate) indicates that two faces are of the same subject.
Figure 3.28    Examples of synthesized adversarial images via the proposed adversarial generator and corresponding purified images. Cosine similarity between perturbation and purification masks is given below each row, along with ArcFace scores between the synthesized adversarial image and the real probe. A score above 0.36 (threshold @ 0.1% False Accept Rate) indicates that two faces are of the same subject. Even with lower correlation between perturbation and purification masks (rows 3-5), the purified images can still be identified as the correct identity. Notice that the purifier primarily alters the eye color and nose, and subdues adversarial perturbations in foreheads. Zoom in for details.
Figure 3.29    ArcFace scores (in [-1, 1]) / Detection scores (in [0, 1]) when the perturbation amount is varied (0.25, 0.50, 0.75, 1.00, 1.25). Detection scores above 0.5 are predicted as adversarial images, while ArcFace scores above 0.36 (threshold @ 0.1% False Accept Rate) indicate that two faces are of the same subject. FaceGuard is trained with a perturbation amount of 1.00. The detection scores improve as the perturbation amount increases, whereas the majority of purified images are detected as real. Even when purified images fail to be classified as real by the detector, the purified faces maintain high AFR performance.
Figure 3.30    2D t-SNE visualization of face representations extracted via ArcFace from 1,456 (a) real, (b) AdvFaces [16], and (c) purified images belonging to 10 subjects in LFW [8]. Example AdvFaces [16] pertaining to a subject move farther from their identity cluster while the proposed purifier draws them back.
Figure 4.1    Face attacks against AFR systems are continuously evolving in both digital and physical spaces. Given the diversity of the face attacks, prevailing methods fall short in detecting attacks across all three categories (i.e., adversarial, digital manipulation, and spoofs). This work is among the first to define the task of face attack detection on the 25 attack types across 3 categories shown here.
Figure 4.2    (a) Detection performance (TDR @ 0.2% FDR) in detecting each attack type by the proposed UniFAD (purple) and the difference in TDR from the best fusion scheme, LightGBM [36] (pink). (b) Cosine similarity between mean features for 25 attack types extracted by JointCNN. (c) Examples of attack types from 4 different clusters via k-means clustering on JointCNN features. Attack types in purple, blue, and red denote spoofs, adversarial, and digital manipulation attacks, respectively.
Figure 4.3    An overview of training UniFAD in two stages. Stage 1 automatically clusters coherent attack types into T groups. Stage 2 consists of an MTL framework where early layers learn generic attack features while T branches learn to distinguish bona fides from coherent attacks.
Figure 4.4    Confusion matrix representing the classification accuracy of UniFAD in identifying the 25 attack types. The majority of misclassifications occur within the attack category. Darker values indicate higher accuracy. Overall, UniFAD achieves 75.81% and 97.37% classification accuracy in identifying attack types and categories, respectively. Purple, blue, and red denote spoofs, adversarial, and digital manipulation attacks, respectively.
Figure 4.5    Detection performance with respect to varying ratio of shared layers (left) and number of branches (right). Our proposed architecture uses 50% shared layers with 4 branches.
Figure 4.6    Detection performance on attack types within and outside a branch's partition. Performance drops on attacks outside the partition as they may not have any correlation with within-partition attack types.
Figure 4.7    Example cases where UniFAD fails to detect face attacks. Final detection scores along with scores from each of the four branches (in [0, 1]) are given below each image. Scores closer to 0 indicate bona fide. Branches responsible for the respective cluster are highlighted in bold.
Figure 4.8    Training and testing splits for the generalizability study.
Figure 4.9    Confusion matrix representing the classification accuracy of UniFAD in identifying the 3 attack categories, namely adversarial faces, digital face manipulation, and spoofs. The majority of confusion occurs within digital attacks (adversarial and digital manipulation attacks).

LIST OF ALGORITHMS

Algorithm 1    Training AdvFaces. All experiments in this work use a learning rate of 0.0001, beta1 = 0.5, beta2 = 0.9, lambda_i = 10.0, lambda_p = 1.0, and batch size m = 32. We set the perturbation hinge to 3.0 (obfuscation) and 8.0 (impersonation).
Algorithm 2    Training FaceGuard. All experiments in this work use a learning rate of 0.0001, beta1 = 0.5, beta2 = 0.9, lambda_obf = lambda_fr = 10.0, lambda_pt = lambda_perc = lambda_div = 1.0, a perturbation hinge of 3.0, and batch size m = 16. For brevity, lg refers to the log operation.

Chapter 1

Introduction
Automated face recognition (AFR) systems - the software that maps, analyzes, and then confirms the identity of a face in a photograph or video - are among the most powerful surveillance tools ever made. While people interact with AFR systems as a way of unlocking their phones or sorting their photos, face recognition technologies are currently deployed in many important applications. The ubiquity of AFR systems is evident in both commercial and governmental applications. For instance, face recognition plays a vital role in identity card de-duplication to prevent a person from obtaining multiple ID cards, such as driver's licenses and passports, under different names (https://bit.ly/2SO0yuL). AFR systems are also utilized by the United States Department of Homeland Security (DHS) and the United States Department of Defense (DoD), along with the Federal Bureau of Investigation (FBI) and Immigration and Customs Enforcement (ICE), to determine friend or foe at security checkpoints, and to assist law enforcement officers in the field to capture face images with mobile devices, submit them to face recognition systems on central servers, and quickly identify people who refuse to give their name, provide false information, or are injured and unresponsive (https://bit.ly/3jSO4xH). Face recognition systems are additionally employed for surveillance and access control to secure locations. Commercial applications of automatic face recognition are also now abundant, including automatic "tag" suggestions on Facebook, organization of personal photo collections, and mobile phone unlock.

Figure 1.1 (a) Identification error rates of six state-of-the-art automated face recognition vendors on a 12 million mugshot dataset, namely FRVT-2018 [22]. (b) Six mugshots representative of the FRVT-2018 dataset.

The current capabilities of an AFR system depend heavily on the image acquisition conditions (i.e., subject cooperation, environmental conditions, etc.). ID document verification, for instance, entails frontal-to-frontal face matching of controlled images, where faces are required to have a neutral expression, a uniform background, and controlled lighting. In these scenarios, state-of-the-art commercial off-the-shelf (COTS) face recognition systems are highly accurate and have proven to be extremely useful. Forty-three of the 50 states in the United States are utilizing AFR systems for detecting fraudulent ID documents; if the FBI needs to put a name to a face, the bureau can scan through 411 million face photos spread across state and federal databases (https://bit.ly/36edG4b). In 2020, a large-scale face recognition evaluation, conducted by the National Institute of Standards and Technology (NIST), demonstrated that the error rates of the top performing COTS AFR systems were very low for identifying mugshot face images at rank-1 against a gallery comprising 12 million mugshots (see Figure 1.1).

Compared to other biometric traits such as fingerprint and iris, face offers a number of advantages: (i) We, as humans, intrinsically identify others via face. Therefore, face does not reveal any extraneous sensitive information that people themselves do not already reveal to the public on a daily basis. Consequently, face recognition tends to be a more publicly acceptable biometric trait (compared to, say, fingerprints, which are commonly associated with criminal accusations). (ii) Unlike fingerprint and iris, no specialized hardware sensors are required; digital cameras are readily available (i.e., in smartphones) and are relatively inexpensive. (iii) Faces can be acquired unobtrusively at a distance, and in a covert manner, if required. (iv) Available large-scale legacy face image datasets (such as passport and driver's license) ease benchmarking face recognition performance. (v) In addition to identity, faces also reveal demographic attributes such as gender, race, and age. (vi) With the recent COVID-19 pandemic, the "touchless" nature of face image acquisition is attractive in both government and commercial applications.

With new emerging applications of AFR systems, the advantages of utilizing the face biometric for identification are apparent. In today's society, smartphone and surveillance cameras are ubiquitous. As American schools, malls, and offices tighten security on their premises, the number of surveillance cameras in the United States is said to grow to 85 million by 2021, compared to 70 million in 2020. China alone is expected to have 560 million networked CCTV cameras by the end of 2021 (https://bit.ly/3nIQt0d). The total number of surveillance cameras around the world is said to climb above 1 billion by the end of 2021 (https://on.wsj.com/3345RvW). With recent tragic police incidents such as the deaths of George Floyd in Minneapolis, Freddie Gray in Baltimore, and Breonna Taylor in Louisville, many police departments are now requiring patrol officers to wear body cameras. In this era of constant documentation of personal lives on social media websites, such as Facebook and Instagram, personal collections of face photos are also booming, with an estimated 25,000 selfies taken by an average person (18 to 34 year olds) during their lifetime (https://bit.ly/346F0P4). Due to this increasingly vast collection of available imagery, in addition to countless routine crimes (e.g., robbery, kidnapping, assault), AFR systems are an invaluable tool for identification of persons of interest. For example, Walter Yovany-Gomez, a member of the MS-13 street gang, evaded authorities for years before the FBI put him on its Ten Most Wanted Fugitives list in 2011. With the aid of an AFR system, investigators traced Mr. Gomez in 2017 via photos on Facebook containing his face (https://n.pr/3j72xWl). Without utilizing an automated face recognition system, manually sifting through the 250 billion photos uploaded on Facebook to date (https://bit.ly/30bGTZY) is an impossible task, and a wanted criminal such as Mr. Gomez would be roaming around freely today.

Figure 1.2 Sources of intra-subject variability: (a) pose, (b) illumination, and (c) expression. Each row shows intra-subject variations for the same individual in (a-c; source: PIE Dataset [23]), (d) Amitabh Bacchan (source: Google Images), and (e) Tom Hiddleston (source: Google Images).

Even with the successes enjoyed by AFR systems in the aforementioned scenarios, face recognition technology is still limited by the unconstrained nature of the imagery available. Accuracies of prevailing COTS AFR systems are highly sensitive to the image acquisition conditions. In unconstrained scenarios, face image acquisition is not well-controlled and subjects may be non-cooperative (or unaware). Confounding factors plaguing prevailing AFR systems include:

Pose: Face poses can vary via in-plane (roll) or out-of-plane rotation (yaw and/or pitch). In-plane rotations can be corrected via straightforward 2D transformations. However, out-of-plane rotations cause faces to become "self-occluded", leading to missing information and poor face recognition performance. See Figure 1.2a for some example faces with extreme pose variation.

Illumination: Even for faces acquired in natural environmental settings, lighting conditions can vary drastically (such as indoors vs. outdoors) and illumination is also affected by daily changes (such as time of day and/or weather conditions). Due to the three-dimensional nature of the face, illumination across the face varies rapidly due to the shadows cast at certain angles. Illumination variation is exacerbated when the 3D face is acquired and translated into a 2D grayscale or color image. Depending on the strength of the illumination, some face features may be exaggerated or may completely vanish. Figure 1.2b shows a few examples of faces under illumination variation.

Expression: In an unconstrained setting, face images may be acquired without the knowledge of the subject. Therefore, face images may be captured mid-conversation and/or while viewing something surprising, upsetting, infuriating, etc. Such deviations from a neutral (or relaxed) face expression lead to a degradation in face recognition performance. As a result, the State Department guidelines for passport photo acquisition permit smiles but frown on "toothy" smiles (which apparently qualify as unusual or unnatural expressions) (https://cbsn.ws/3i7cdiD). According to the guidelines, "The subject's expression should be neutral (non-smiling) with both eyes open, and mouth closed. A smile with a closed jaw is allowed but is not preferred." This is because smiling distorts other facial features, for example, one's eyes. Figure 1.2c shows a few examples of faces with expression changes. Extreme expression variations remain an ongoing challenge for AFR systems.

Occlusion: It is quite common for unconstrained faces to contain facial accessories such as eyeglasses and sunglasses. Occluding eye regions can negatively impact face recognition performance since these facial regions are highly discriminative. Facial occlusions are problematic not only because of missing information, but also due to the extraneous and spurious information introduced. For instance, even if a person consistently wears eyeglasses, specular reflections that change based on the light source can lead to high intra-subject face variation. Figure 1.2d shows a few examples of occluded faces.

Aging: Given two images of the same individual acquired multiple years apart, a robust AFR system should still be able to identify the two images as pertaining to the same individual. However, we find in Figure 1.4 that even the top performing COTS AFR systems have difficulty identifying the individual as he or she ages [37-39]. Unlike other factors, face aging is intrinsic and cannot be controlled by the subject or the acquisition environment. That is, face aging can be present in both controlled (constrained) and uncontrolled (unconstrained) scenarios.

Resolution: The quality of face images severely affects face recognition performance. Figure 1.3 shows the increase in error rates when lower quality webcam face photos are matched to a mugshot gallery. In practice, unconstrained face images are of poor image quality (such as those captured from surveillance cameras). Face recognition performance on poor quality images is far from desirable and remains an ongoing challenge for the biometric community (see Figure 1.2e).

Inter-subject similarities can also lead to face recognition errors. For instance, it is challenging (even for humans) to distinguish between people with kinship relations (particularly twins, see Figure 1.5a), and also people that are not related but exhibit strong face similarities (Figures 1.5b, 1.5c).

Figure 1.3 Identification error rates of six state-of-the-art automated face recognition vendors when (a) mugshots, (b) webcam images, and (c) profile faces are compared against a 1.6 million mugshot dataset (a subset of the FRVT-2018 [22]).

Figure 1.4 Identification error rates of six state-of-the-art automated face recognition vendors on a 3 million mugshot dataset under aging [22]. Face recognition errors increase as the time gap between a probe image and the enrolled face image in the gallery increases.
Asmentionedearlier,duetotherecentCOVID-19pandemic,authoritiesandcitizensalike areenjoyingtheficontactlessflandcovertnatureoffacerecognition.However,thecovertnesscan alsoleadtoitsdownfallwiththeadventoffaceattackslaunchedinbothphysical(suchas3Dface masks)anddigitaldomains(adversarialanddigitallymanipulatedfaces).Withoutthepresenceofa 7 (a) (b) (c) Figure1.5Sourceofinter-subjectsimilarities:(a)kinshiprelations(here,twins),(b)different peoplewithnokinshiprelationswhohappentoexhibitverysimilarfacialcharacters(knownas fidoppelg ¨ angersfl),and(c)RichardJones(right)spent17yearsinprisonforacrimecommittedby hisdoppelg ¨ anger,RickyAmos(left)(Source:https://cnn.it/2Gb1F4A). humanoperatorensuringthelegitimacyoffaceimageacquisition,maliciousindividuals(attackers) areincreasinglychallengingthesecurityofAFRpipelinesforgovernmentaccesscontrol, andgains. ThevulnerabilitiesofAFRsystemstowardsfaceattacksandmethodstodefend againstthemwillbediscussedlaterinthischapter. 1.1Background AtypicalAFRsystemmayoperateinvariousmodesdependingonthedeployedapplication.In mostscenarios,faceimagesalongwiththeiridentitylabelsare enrolled asatemplateina dataset,referredtoasa gallery .Later,theAFRsystemtakesasinputanimage,knownasa probe or query ,andmatchesitagainstoneormanyfacespresentinthegallery. Face or authentication referstoaone-to-onematchingscenariowherethetaskof theAFRsystemistoverifywhetheraprobefaceimagebelongstotheindividualthathe/sheclaims tobe( e . g .,apassportandpassenger'sfacephoto,accesscontrol,andsmartphoneunlock).Onthe otherhand,face or search involvesone-to-manycomparisonsinordertoretrieve (fromthegallery)theidentityoftheprobefaceimage(whoseidentityispreviouslyunknown). Inthescenario,thenumberofidentitiestoretrieveisusuallymanuallyto top- k ,where k dependsontheapplicationathand.Thisisreferredtoasa closed-set scenario,whereweassumethattheidentityoftheprobeispresentinthegalleryandwehope thatthecorrectidentitywillbepresentinthetop- k retrievedindividuals.Authoritiescanthen 8 Figure1.6AtypicalAutomatedFaceRecognition(AFR)systemtypicallyconsistsof(i)face detection,(ii)facealignment(tomitigategeometricdistortions),(iii)featureextraction,and(iv) comparisonoffacerepresentations(featurevectors). manuallyreviewthetop- k candidateretrievalstoidentify,say,asuspectedcriminalinaforensics dataset.However,real-worldscenariosentailan open-set scenario,wherewecannot assumethattheprobe'sidentitywillindeedbepresentinthegallery.Foropen-setapplications,we retrievethetopcandidatematchesonlyiftheyexceedathreshold(insteadofmanually specifyingaed k ).Thisisextremelyuseful(forexample,inwatchlistsurveillance)sinceitmay beimpracticalforauthoritiestomanuallyreviewretrievedcandidatesforeveryprobe. 1.2AutomatedFaceRecognition(AFR) Regardlessoftheoperatingscenario(vortheprimarytaskofallAFR systemsistocomputeasimilaritymeasurementbetweenanytwofaceimages.ArobustAFRsys- temshouldideallyoutputahighsimilaritymeasurebetweenafaceimagepairofthesameindivid- ual( genuinepair )andalowsimilaritymeasurebetweenfaceimagepairsofdifferentindividuals ( impostorpairs ).Thisprocessinvolvesmultiplecomponentswhichallcontributeto thefacesimilaritymeasurementandconsequently,theresultingfacerecognitionaccuracy. 
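To make the verification and identification protocols of Sections 1.1 and 1.2 concrete, the sketch below shows, in simplified form, how both reduce to comparing feature vectors with a similarity measure and a decision threshold. The `extract_embedding` step is assumed to exist elsewhere (e.g., a deep face matcher); the threshold value and the gallery dictionary are illustrative assumptions, not values taken from this dissertation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face feature vectors (higher = more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(probe_emb: np.ndarray, claimed_emb: np.ndarray, threshold: float = 0.36) -> bool:
    """1:1 verification: accept the identity claim if similarity exceeds the threshold."""
    return cosine_similarity(probe_emb, claimed_emb) >= threshold

def identify(probe_emb: np.ndarray, gallery: dict, k: int = 5, threshold: float = None):
    """1:N identification over a gallery of {identity: embedding}.

    Closed-set: return the top-k candidates ranked by similarity.
    Open-set: additionally discard candidates below the threshold, so a probe
    that is absent from the gallery can legitimately return an empty list.
    """
    scores = [(name, cosine_similarity(probe_emb, emb)) for name, emb in gallery.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    if threshold is not None:          # open-set operation (e.g., watchlist surveillance)
        scores = [(name, s) for name, s in scores if s >= threshold]
    return scores[:k]
```

In the closed-set case the top-k list is returned as-is for manual review, whereas the open-set variant may return nothing when no gallery score clears the threshold, mirroring the watchlist scenario described above.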
1.2.1AFRPipeline AtypicalAFRsystemsiscomposedofthefollowingsequentialmodules(seeFigure1.6):(i) facedetection,(ii)facealignment,(iii)facefeatureextractionandfacerepresentation,and(iv) similaritymeasurement.Eachcomponentisnecessaryandcontributestotheoverallperformance 9 Table1.1Vcationperformance(%)undertwodifferentfacedetectorsonLFW,CFP-FP,and AgeDB-30[5].Figure1.7showsafewexamplesofeachofthethreedatasets. MethodsLFW[8]CFP-FP[24]AgeDB-30[25] MTCNN+ArcFace[9]99.8398.3798.15 RetinaFace+ArcFace[5] 99.8699.4998.60 (a)LFW[8] (b)CFP[24] (c)AgeDB[25] Figure1.7Examplefacesin(a)LFW[8],(b)CFP[24],andAgeDB[25]datasets. oftheAFRsystem.Thus,attentionisseenintheliteraturetowardsimprovingeach individualcomponent. 1.2.1.1FaceDetection Sinceimagesmayconsistofavarietyofdifferentobjects(includingface),priortofacerecognition, itisimperativethatfacesarespatiallylocatedintheinputimage.Facedetectionistheprocessof automatically(a)determiningwhetheraface(ormultiplefaces)ispresentinaninputimage,and then(b)outputtingthelocations(2Dcoordinates)ofalldetectedfaces.Thisisatrivialtaskfor humans,whereas,extractingfaceregionsfromarbitraryimagesischallengingformachines.This canlikelybeattributedtoahighintra-classvariabilityintheappearanceoffaces( e . g .,skincolor, backgroundnoise,facepose,illumination,etc.).Failuretodetectfaces(missingfacesintheimage, orerroneouslydetectingnon-faceregionsasfaces)isextremelyproblematic,sincesubsequent AFRcomponentswillnotaddanyvalueiftheyarenotpresentedwithanactualfaceregion. TheseminalworkofViolaandJones[40]iscreditedtobeingthereal-timeandaccurate facedetector.Sinceitspublicationin2004,alargenumberofstudiesinliteraturehavefocused onimprovingfacedetectionaccuracy.Currently,MTCNN[41]andRetinaFace[5]leadtheway infacedetectionaccuracy(seeFigure1.8)andarecommonlyemployedinfacerecognitionlitera- 10 Figure1.8Astate-of-the-artfacedetector,RetinaFace,candetectaround900faces(detection thresholdat 0 : 5 )outof1,151peoplereportedtobepresentinthefiWorld'sLargest[5].The yellowrectangledenotestheboundingboxaroundafaceandgreendotsrepresentthedetected landmarks. (a) (b) (c) (d) (e) Figure1.9Exampleimagesthatsupposedlycontainfacesbutcannotbedetectedbyastate-of-the- artfacedetector,RetinaFace[5].Notethattheseimagesareofverylowresolution. 11 ture[6,9,42].Tounderstandtheofanaccuratefacedetectionsystemonfacerecognition performance,inTable1.1wereportthefacevaccuracyonthreebenchmarkdatasets, LFW[8],CFP-FP[24],AgeDB-30[25].Twofacedetectorsareemployed:(i)MTCNN[41]and (ii)RetinaFace[5].Wecanobserveanimprovementinfacerecognitionperformancewhenabet- terfacedetector,RetinaFace,isemployed.However,evenwiththestate-of-the-artfacedetectors, errorscanstillbeobservedduetoextremeposevariations,occlusions,andlowresolution(see Figure1.9). 1.2.1.2FaceAlignment Thisstepisprimarilyconcernedwithreducingintra-classvariabilitybyeliminatinggeometric distortionspresentinthefaceregions.Inparticular,geometricandphotographicvariationsmay bemitigatedbytransformingthedetectedfaceregionstoacanonicalview(frontfaceview).Face alignmentaimsatdeterminingcorrespondencesbetweenfaceimagesbasedon points( e . g .,eyes,nose,mouth,chin,jaw,etc.).Themoststraightforwardalignmenttechniqueisa simple2Drigidaftransformationbasedontwoeyelocationstoaccountforfacesizeandin- planeheadrotation[6,43].Moresophisticatedalignmentmethodsinvolveemploying3Dmodeling techniquestofifrontalizefltheface,whichalsoaccountsforout-of-planeheadrotations.However, such3Dfacealignmenttechniquesaregenerallytime-consumingandincreasestheoverheadtime associatedwithanAFRsystem. 
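As a minimal illustration of the face detection step (Section 1.2.1.1), the sketch below uses OpenCV's Haar-cascade implementation of the Viola-Jones detector credited above; a modern pipeline would typically substitute MTCNN or RetinaFace, whose interfaces are not shown here. The parameter values are illustrative assumptions, not settings used in this dissertation.

```python
import cv2

# Viola-Jones detector shipped with OpenCV (Haar cascade trained on frontal faces).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(image_path: str):
    """Return a list of (x, y, w, h) bounding boxes for detected faces."""
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors trade off recall against false detections.
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1,
                                      minNeighbors=5, minSize=(40, 40))
    return list(boxes)
```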
Face alignment requires a set of reference points (i.e., landmarks) which are points defining a base face position (i.e., position of eyes, nose, and mouth for an average person). A common approach for acquiring reference points is to compute the average landmark points extracted from a large face dataset. After obtaining reference points, for transforming a probe face in the 2D space, we have four possible types of transformations (see Figure 1.10):

Figure 1.10: Illustration of various 2D face alignment techniques: (i) simply cropping the face region, (ii) similarity transform (scale and rotation), (iii) affine transformation (rotation, scaling, and shear mapping), and (iv) projective transformation (perspective deformation) [7].

Euclidean Transformation: a rigid transformation preserving distances between each pair of points. The Euclidean transformation allows rotating and translating the face (3 Degrees of Freedom).

Similarity Transformation: also allows for scaling (making faces bigger/smaller) and therefore does not preserve distances between points (4 Degrees of Freedom).

Affine Transformation: includes point, straight line, and plane preservation. In total, the affine transformation allows translation, rotation, scaling, and changing aspect ratio and shear mapping (6 Degrees of Freedom).

Projective Transformation: the most advanced form of 2D transformation. Every parameter in the transformation matrix is independent of each other (8 Degrees of Freedom).

From Figure 1.10, we find that 2D transformations with higher degrees of freedom have lower projection error rates (MSE distance to the reference points) but yield unnatural-looking faces. An interesting question arises: "What is more beneficial to AFR systems, low projection error or natural looking faces?" To answer this, while keeping the entire AFR pipeline consistent and only changing the face alignment method, the verification accuracy on the LFW dataset [8] is reported in Table 1.2. We find that more natural looking faces (similarity transform) are better than allowing for more degrees of freedom (affine and projective). Consequently, the most commonly employed face alignment method in the literature involves (a) detecting 5 landmark points (2 eyes, nose, and 2 mouth corners) and subsequently, (b) applying a similarity transformation [42, 44]; a minimal sketch of this recipe is given after Table 1.3.

Table 1.2: Verification performance (%) of the FaceNet [6] AFR system under different face alignment techniques [7]. The Euclidean transform is excluded as it assumes reference points are scale-invariant.

Methods               | LFW Protocol [8] | BLUFR Protocol @ FAR = 0.1%
Simple Crop + FaceNet | 97.9             | 85.59
Similarity + FaceNet  | 98.1             | 88.42
Affine + FaceNet      | 97.9             | 86.47
Projective + FaceNet  | 97.7             | 85.63

Table 1.3: Verification performance (%) on LFW [8] for different face feature extractors.

Methods             | Year | Face Representation | LFW [8]
Eigenfaces [45]     | 1991 | Holistic            | 60.02
Fisherfaces [46]    | 1997 | Holistic            | 87.47
HighDim-LBP [47]    | 2013 | Local               | 95.17
Joint Bayesian [48] | 2012 | Local               | 96.33
DeepFace [49]       | 2014 | Learned             | 97.35
DeepID [50]         | 2014 | Learned             | 97.45
VGGFace [51]        | 2015 | Learned             | 98.95
FaceNet [6]         | 2015 | Learned             | 99.63
SphereFace [44]     | 2017 | Learned             | 99.42
CosFace [42]        | 2018 | Learned             | 99.73
ArcFace [9]         | 2019 | Learned             | 99.83
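To make the preferred 5-landmark plus similarity-transform recipe concrete, the following scikit-image sketch estimates and applies such a transform. It is an illustrative sketch rather than the pipeline used in this dissertation; the reference landmark coordinates shown follow the widely used ArcFace-style 112 x 112 template and should be treated as assumed values (in general, they are the average landmark locations computed from a large face dataset, as described above).

```python
import numpy as np
from skimage import transform as trans

# Reference (destination) landmarks for a 112x112 aligned crop, ordered as: left eye,
# right eye, nose tip, left mouth corner, right mouth corner (illustrative values).
REF_5PTS = np.array([[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
                     [41.5493, 92.3655], [70.7299, 92.2041]], dtype=np.float32)

def align_face(image, detected_5pts, out_size=(112, 112)):
    """Estimate a 2D similarity transform (rotation + uniform scale + translation, 4 DoF)
    mapping the detected landmarks onto the reference template, then warp the face."""
    tform = trans.SimilarityTransform()
    tform.estimate(np.asarray(detected_5pts, dtype=np.float32), REF_5PTS)
    # skimage's warp expects the inverse mapping (output -> input coordinates).
    return trans.warp(image, tform.inverse, output_shape=out_size, preserve_range=True)
```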
1.2.1.3 Feature Extraction

After obtaining a cropped and aligned face image, the next module in an AFR system involves extracting a set of numerical values (known as a feature vector or a representation) that best describes the input face. The simplest set of features are the raw pixel values of the input face. However, such features may include spurious and irrelevant information (such as the colors present in non-face regions). Instead, matching results can be enhanced if we devise a method for extracting both high-level features (distances between facial components and their relative locations and ratios) and low-level features (wrinkles and face marks and scars). However, a feature vector must be carefully constructed such that we do not have unnecessarily large (and likely redundant) features which may negatively impact recognition rates. Table 1.3 provides a summary of a few studies primarily focused on improving face feature extraction (some of which will be discussed in Section 1.3.1).

1.2.1.4 Similarity Measurement

The final task of an AFR system is to compute a measure of similarity between face representations. The most obvious choice is the Euclidean distance between feature vectors; however, other distance metrics such as cosine, Manhattan, histogram intersection, log-likelihood statistics, chi-square statistics, etc., may also improve recognition performance. In practice, cosine similarity between feature vectors is the most widely adopted similarity metric in the literature [6, 9, 42, 44].

1.3 Evolution of Face Recognition

Humans have been drawn to the idea of identifying criminals based on facial characteristics since as long as the 19th century. Based on anthropometric measurements, Alphonse Bertillon devised a method of identifying and tracking criminals in 1879 [52]. The U.S. adopted the Bertillon system in 1887, which was later replaced by fingerprinting in the early 20th century. However, the concept of utilizing face photos of criminals (known as mugshots) for identification is still used worldwide.

The earliest pioneers of automating face recognition were Woodrow Bledsoe, Helen Wolf, and Charles Bisson (see https://bit.ly/30exJf5). In 1964 and 1965, Bledsoe, along with Wolf and Bisson, began work using computers to recognize the human face. Since their work was funded by an unnamed intelligence agency, much of their work was never published. However, we do know that their initial work involved the manual marking of various "landmarks" on the face such as eye centers, mouth corners, etc., which were then mathematically rotated by a computer to compensate for pose variation (face alignment). Afterwards, the distances between landmarks were automatically computed and compared between images to determine the identity (similarity measurement). In other words, following the AFR pipeline outlined earlier, Bledsoe and his peers utilized face landmarks and their distances and ratios as the face representation. This work is deemed to be crucial since it was the first to show face as a valuable biometric trait for person identification.

Since Bledsoe's efforts on devising an automated face recognition system, researchers have devoted the past 50+ years to improving each stage of the AFR pipeline.

1.3.1 Face Representations

The evolution of AFR systems can be roughly categorized into three feature representation approaches: (i) holistic, (ii) local, and (iii) learned representations (see Table 1.3).

1.3.1.1 Holistic Face Representations

The first fully automated face recognition systems utilized holistic face representations, where all pixels in the input face image are used to derive the face representation. These studies rely heavily on accurate face alignment (typically using eye locations), which is challenging when non-frontal faces are encountered. Eigenfaces [45] was among the first AFR systems to utilize holistic face features.
A low-dimensional "face-space" is computed for a training dataset comprised of N unlabeled face images using Principal Component Analysis (PCA). The "face-space" is a set of M eigenvectors (eigenfaces) of the training data.

In face verification, a decision is made at a threshold such that (i) similarity(x, y) > τ when x and y are two faces belonging to the same person (genuine pair), and (ii) similarity(x, y) ≤ τ when x and y are two faces belonging to two different people (impostor pair). Here, the threshold τ is determined via cross-validation. Verification accuracy is commonly utilized in the LFW protocol [8], where the numbers of genuine and impostor pairs are balanced.

In real-world scenarios, AFR systems operate at a pre-determined match threshold (τ). Typically, we report the True Accept Rate at a pre-determined False Accept Rate, say, 0.1%. The τ is determined via a Receiver Operating Characteristic (ROC) curve. Formally,

TAR(τ) = (Number of genuine pairs with similarity score > τ) / (Total number of genuine pairs)

FAR(τ) = (Number of impostor pairs with similarity score > τ) / (Total number of impostor pairs)

In the case of face identification, the AFR system is presented with a gallery of face images (known identities). Then, given a probe image, the AFR system needs to determine the identity of the probe from the gallery. In closed-set identification, we assume the identity of the probe is present in the gallery. Therefore, we simply need to determine the rank at which the true mate is retrieved (say at K) out of a given gallery size. Then the closed-set accuracy can be computed via

RetrievalRate(K) = (Number of probes successfully retrieved within K retrievals) / (Total number of probes)

RetrievalRate(K) is commonly referred to as Rank-K accuracy. For example, Rank-1 accuracy refers to the accuracy with which an AFR system successfully retrieves the correct identity as the first entry in a list of potential matches ranked in descending order via similarity scores with the probe image.

In the open-set identification scenario, the system first needs to determine whether the probe is in the gallery prior to retrieval. In this case, the commonly employed evaluation metric is True Positive Identification Rate (TPIR) at False Positive Identification Rate (FPIR):

TPIR(τ) = (Number of successfully retrieved mated probes at Rank-1) / (Total number of mated probes)

FPIR(τ) = (Number of falsely retrieved non-mates at Rank-1) / (Total number of non-mated probes)

Here, a successfully retrieved mated probe means that the true mate is returned as the top-1 result and the similarity score between the probe and true mate is greater than τ.

1.4.2 Face Datasets

All efforts in designing a state-of-the-art face recognition system would have been in vain without advancements in acquiring more and more challenging face datasets for benchmarking AFR systems. Indeed, due to privacy concerns, many studies evaluate their proposed methods on face datasets acquired in-house. However, face recognition performance today would have been far from desirable had it not been for the public release of large-scale face datasets.

FERET [26] and FRGC [26] are among the first datasets utilized for benchmarking face recognition performance (see Figure 1.11). These datasets greatly contributed to advancements in AFR systems. However, these datasets were acquired under controlled conditions and were mainly geared towards studying specific challenges associated with face recognition (e.g., pose, illumination, and expression). These datasets were extremely valuable for evaluating AFR performance under images exhibiting such challenges, but were not very representative of face images encountered in the real world.

Huang et al. released the Labeled Faces in the Wild (LFW) dataset, which was acquired by scraping the Internet for celebrity face photos. The LFW dataset includes 13,233 face images of 5,749 different people. Face images were automatically detected via the Viola-Jones face detector [40]. Huang et al. also proposed the LFW protocol for benchmarking AFR accuracy: 10-fold cross-validation on face verification, where each fold consists of 300 genuine pairs and 300 impostor pairs.
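The verification and identification metrics just defined can be computed directly from similarity scores. The sketch below is an illustrative NumPy implementation of the definitions above (not code from the dissertation); the threshold-selection rule via a quantile of impostor scores is an approximation that matches the definitions for large score sets.

```python
import numpy as np

def tar_at_far(genuine_scores, impostor_scores, target_far=0.001):
    """TAR @ FAR: choose tau as the (1 - target_far) quantile of the impostor scores,
    then apply the TAR(tau) and FAR(tau) definitions given above."""
    genuine_scores = np.asarray(genuine_scores)
    impostor_scores = np.asarray(impostor_scores)
    tau = np.quantile(impostor_scores, 1.0 - target_far)
    tar = float(np.mean(genuine_scores > tau))
    far = float(np.mean(impostor_scores > tau))
    return tar, far, float(tau)

def rank_k_accuracy(scores, probe_ids, gallery_ids, k=1):
    """Closed-set RetrievalRate(K): fraction of probes whose true mate appears among the
    top-K gallery entries. `scores` has shape (num_probes, gallery_size)."""
    gallery_ids = np.asarray(gallery_ids)
    order = np.argsort(-np.asarray(scores), axis=1)[:, :k]
    hits = [probe_ids[i] in set(gallery_ids[order[i]]) for i in range(len(probe_ids))]
    return float(np.mean(hits))
```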
1.4.3 Constrained Face Recognition

The National Institute of Standards and Technology (NIST) began benchmarking the accuracy of face recognition systems in 1993 with the FERET program [26]. Since then, commercial face recognition systems have been evaluated in multiple Face Recognition Vendor Tests (FRVTs). Table 1.4 documents state-of-the-art constrained face recognition performance from 1993 till date. With the exception of webcam and profile face images (see Figure 1.3), most of the evaluations in the Ongoing FRVT are performed on a constrained face dataset acquired by NIST. These benchmarks are invaluable for comparing state-of-the-art AFR algorithms. NIST has access to large operational databases and conducts extensive testing of multiple algorithms on protocols that mimic operational scenarios (http://www.nist.gov/itl/iad/ig/face.cfm).

Table 1.4: Benchmarking AFR performance throughout the years in NIST evaluations on frontal and constrained faces.

Study      | Year    | Gallery Size | Rank-1 Accuracy (%) | TAR (%) @ 0.1% FAR
FERET [26] | 1993-94 | 316          | 78                  | 21
FERET [26] | 1996-97 | 831          | 95                  | 46
FRVT [54]  | 2002    | 37,437       | 73                  | 80
FRGC [26]  | 2005    | 16,028       | N/A                 | 99
FRVT [54]  | 2006    | N/A          | N/A                 | 99
MBE [55]   | 2010    | 1.6M         | 92                  | 99
FRVT [56]  | 2014    | 1.6M         | 96                  | N/A
FRVT [22]  | Ongoing | 12M          | 99.98               | 99.99

1.4.4 Unconstrained Face Recognition

Prevailing state-of-the-art methods for unconstrained face recognition have been benchmarked by the LFW [8] database protocol since its release in 2007. Recent deep learning based AFR systems achieve accuracies above 99% via the LFW protocol (e.g., FaceNet [6], CosFace [42], SphereFace [44], ArcFace [9]). As mentioned earlier, a major reason these data-driven methods are so successful is the availability of large-scale training datasets. Here, we enumerate a few of the training datasets that are commonly employed for training state-of-the-art face recognition systems:

CASIA-WebFace [10] consists of 0.5M face images of 10,575 celebrities acquired "in-the-wild" by scraping the Internet.

MS-Celeb-1M [28] contains 8M face images of 85K subjects. Similar to CASIA-WebFace, MS-Celeb also contains celebrity face photographs collected from the Internet.

VGGFace2 [57] consists of 3.3M celebrity face photos from 9K identities with 362 images per subject on average.

Example face images from some of these datasets are shown in Figure 1.11.

Figure 1.11: Example face images from (a) FERET [26], (b) FRGC [26], (c) LFW [8], (d) IJB-A [27], (e) MS-Celeb-1M [28], and (f) TinyFace [29]. Datasets (a) and (b) contain face images under relatively controlled acquisition conditions. Datasets (c-f) contain more unconstrained face images (e.g., collected from the Internet).

Figure 1.12: Face attacks against AFR systems are continuously evolving in both digital and physical spaces. Given the diversity of the face attacks, prevailing methods fall short in detecting attacks across all three categories (i.e., adversarial, digital manipulation, and spoofs).

1.5 Vulnerabilities of AFR Systems

As mentioned earlier, the accuracy, usability, and touchless acquisition of state-of-the-art AFR systems have led to their ubiquitous adoption in a plethora of domains, including mobile phone unlock, access control systems, and payment services. Despite this impressive recognition performance, current AFR systems remain vulnerable to the growing threat of face attacks in both physical and digital domains.
Forinstance,anattackercanhidehisidentitybywearinga3Dmask[58],orintruderscan assumeavictim'sidentitybydigitallyswappingtheirfacewiththevictim'sfaceimage[20].With unrestrictedaccesstotherapidproliferationoffaceimagesonsocialmediaplatforms,launching attacksagainstAFRsystemshasbecomeevenmoreaccessible.Giventhegrowingdissemination 23 offifakenewsflandfideepfakesfl[59],theresearchcommunityandsocialmediaplatformsalike arepushingtowards generalizable defenseagainstcontinuouslyevolvingandsophisticatedface attacks. Inliterature,faceattackscanbebroadlyintothreeattackcategories:(i)Spoofat- tacks:artifactsinthe physical domain( e . g .,3Dmasks,eyeglasses,replayingvideos)[1],(ii) Adversarialattacks:imperceptiblenoisesaddedtoprobesforevadingAFRsystems[60],and(iii) Digitalmanipulationattacks:entirelyorpartiallyphoto-realisticfacesusinggenerative models[20].Withineachofthesecategories,therearedifferentattacktypes.Forexample,each spoofmedium, e . g .,3Dmaskandmakeup,constitutesoneattacktype,andthereare 13 common typesofspoofattacks[1].Likewise,inadversarialanddigitalmanipulationattacks,eachattack model,designedbyuniqueobjectivesandlosses,maybeconsideredasoneattacktype.Thus, theattackcategoriesandtypesforma 2 -layertreestructureencompassingthediverseattacks(see Figure1.12).Suchatreewillinevitablygrowinthefuture. InordertosafeguardAFRsystemsagainsttheseattacks,numerousfaceattackdetectionap- proacheshavebeenproposed[20,21,61Œ63].Despiteimpressivedetectionrates,prevailingre- searcheffortsfocusonafewattacktypeswithin one ofthethreeattackcategories.Sincetheexact typeoffaceattackmaynotbeknown apriori ,ageneralizabledetectorthatcandefendanAFR systemagainstanyofthethreeattackcategoriesisofutmostimportance. 1.5.1PhysicalFaceSpoofs Facepresentationattacks 12 arefiphysicalfakefacesflwhichcanbeconstructedwithavarietyof differentinstruments(presentationattackinstruments), e . g .,3Dprintedmasks,printedpaper,or digitaldevices(videoreplayattacksfromamobilephone)withagoalofenablinganattackerto impersonateavictim'sidentity,oralternatively,obfuscatetheirownidentity(seeFigure1.13). Withtherapidproliferationoffaceimages/videosontheInternet(especiallyonsocialmediaweb- 12 ISOstandardIEC30107-1:2016(E)presentationattacksas fipresentationtothebiometricdatacapture subsystemwiththegoalofinterferingwiththeoperationofthebiometricsystemfl [64] 24 sites,suchasFacebook,Twitter,orLinkedIn),replayingvideoscontainingthevictim'sfaceor presentingaprintedphotographofthevictimtotheAFRsystemisatrivialtask[65].Evenif afacepresentationattackdetectionsystemcouldtriviallydetectprintedphotographsandreplay videoattacks( e . g .,withdepthsensors),attackerscanstillattempttolaunchmoresophisticated attackssuchas3Dmasks[66],make-up,orevenvirtualreality[67]. (a) (b)Print (c)Replay (d)Half (e)Silicone (f)Trans. (g)Paper (h)Mann. (i)Obf. (j)Imp. (k)Cosm. (l)Funny (m)PaperG (n)PaperC Figure1.13Examplepresentationattacks:Simpleattacksinclude(b)printedphotograph,or(c) replayingthevictim'svideo.Moreadvancedpresentationattackscanalsobeleveragedsuchas (d-h)3Dmasks,(i-k)make-upattacks,or(l-n)partialattacks[30].Afaceisshownin(a) forcomparison.Here,thepresentationattacksin(b-c,k-n)belongtothesamepersonin(a). 
Theneedforpreventingfaceattacksisbecomingincreasinglyurgentduetotheuser'spri- vacyconcernsassociatedwithspoofedsystems.Failuretodetectfaceattackscanbeamajor securitythreatduetothewidespreadadoptionofautomatedfacerecognitionsystemsforborder control[68].Indeed,withtheadventofApple'siPhoneXandSamsung'sGalaxyS8,allofusare carryingautomatedfacerecognitionsystemsinourpocketsembeddedinoursmartphones.Face recognitiononourphonesfacilitates(i)unlockingthedevice,(ii)conductingtransactions, and(iii)accesstoprivilegedcontentstoredonthedevice. 1.5.2DigitalAdversarialFaces FromTable1.3wesawthatprevailingAFRsystemsarebasedonautomaticfeatureextraction methodspoweredbyCNNs.However,CNNmodelshavebeenshowntobevulnerableto adver- 25 sarialperturbations 13 [69Œ72].Szegedy etal. showedthedangersof adversarialexamples in theimagedomain,whereperturbingthepixelsintheinputimagecancauseCNNs tomisclassifytheimageevenwhentheamountofperturbationisimperceptibletothehuman eye[69].Despiteimpressiverecognitionperformance,prevailingAFRsystemsarestillvulnerable tothegrowingthreatofadversarialexamples(seeFigure1.14). (a)EnrolledFace 0.72 0.78 (b)InputProbe 0.22 0.12 (c)AdvFaces 0.26 0.25 (d)PGD[33] 0.14 0.25 (e)FGSM[70] Figure1.14Examplegalleryandprobefaceimagesandcorrespondingsynthesizedadversarial examples.(a)Twocelebrities'realfacephotoenrolledinthegalleryand(b)thesamesubject's probeimage;(c)Adversarialexamplesgeneratedfrom(b)byourproposedsynthesismethod, AdvFaces;(d-e)Resultsfromtwoadversarialexamplegenerationmethods.Cosinesimilarity scores( 2 [ 1 ; 1] )obtainedbycomparing(b-e)totheenrolledimageinthegalleryviaArcFace[9] areshownbelowtheimages.Ascoreabove 0.28 (threshold@ 0 : 1% FalseAcceptRate)indicates thattwofaceimagesbelongtothesamesubject.Here,asuccessfulobfuscationattackwouldmean thathumanscanidentifytheadversarialprobesandenrolledfacesasbelongingtothesameidentity butanautomatedfacerecognitionsystemconsidersthemtobefromdifferentsubjects. ToattackanAFRsystem,ahackercanmaliciouslyperturbhisfaceimageinamannerthat cancauseAFRsystemstomatchittoatargetvictim( impersonationattack )oranyidentityother thanthehacker( obfuscationattack ).Yettothehumanobserver,thisadversarialfaceimageshould appearasalegitimatefacephotooftheattacker(seeFigure3.2d).Thisisdifferentfromface pre- 13 Adversarialperturbationsrefertoalteringaninputimageinstancewithsmall,humanimperceptiblechangesina mannerthatcanevadeCNNmodels. 26 Figure1.15Examplesofdigitallymanipulatedfaces.(a)Realimages/framesfromFFHQ,CelebA andFaceForensics++datasets;(b)PairedfaceidentityswapimagesfromFaceForensics++dataset; (c)PairedfaceexpressionswapimagesfromFaceForensics++dataset;(d)Attributesmanipulated examplesbyFaceAPPandStarGAN;(e)EntiresynthesizedfacesbyPGGANandStyleGAN. Collagesourcedfrom[20]. sentationattacks ,wherethehackerassumestheidentityofatargetbypresentingaphysicalfake facetotheAFRsystem(seeFigure3.2).However,inthecaseofpresentationattacks,thehacker needstoactivelyparticipatebywearingamaskorreplayingaphotograph/videoofthegenuine individualwhichmaybeconspicuousinscenarioswherehumanoperatorsareinvolved(suchas airports).Ontheotherhand,adversarialfaces,donotrequireactiveparticipationofthesubject duringauthentication(comparisonbetweenadversarialprobeandgalleryimages). 
Consider,forexample,theUnitedStatesCustomsandBorderProtection(CBP),thelargest federallawenforcementagencyintheUnitedStates[73],which(i)processesentrytothecountry 27 forovera million travellers everyday [74]and(ii)employsautomatedfacerecognitionforverifying travelers'identities[75].ToevadebeingasanindividualinaCBPwatchlist,aterrorist canmaliciouslyenrollanadversarialimageinthegallerysuchthatuponenteringtheborder,his legitimatefaceimagewillbematchedtoaknownandbenignindividualortoafakeidentity previouslyenrolledinthegallery. 1.5.3DigitalFaceManipulation Digitalmanipulationattacks,madefeasiblebyVariationalAutoEncoders(VAEs)andGenerative AdversarialNetworks(GANs),cangenerateentirelyorpartiallyphotorealisticfaceim- ages[20](seeFigure1.15).Digitalmanipulationattacktypescanbebroadlyintothe following: IdentitySwapping: Thesemethodsdigitallyreplacethefaceofonepersonwiththefaceof anotherperson.Forinstance, FaceSwap [76]insertsfamousactorsintomovieclipsinwhichthey neverappeared. DeepFakes alsoperformsfaceswappingviadeeplearningalgorithms. ExpressionSwapping: Expressionsinafaceimagecanbedigitallyandreplaced withanother[77].Thesemethodsswapexpressionsinreal-timewithonlyRGBcameras. AttributeManipulation Utilizingstate-of-the-artGANs,studiessuchasStarGAN[78]andST- GAN[79]focusonattributemanipulationbyalteringsingleormultipleattributesinafaceimage, e . g .,gender,age,skincolor,hair,andglasses. EntireFaceSynthesis: Poweredbylarge-scalehigh-resolutionfacedatasetsandtheprevelance ofGANs,anattackercaneasilysynthesizeentirefaceimagesofunknownidentities,whoserealism issuchthatevenhumanshavedifassessingifitisgenuineormanipulated[80]. 28 1.6DissertationContributions Automatedfacerecognitionhasbeenstudiedextensivelyformorethanfourdecades. improvementshavebeenmadeprogressivelyineachcomponent(facedetection,alignment,feature extraction,matching)inordertobuildahighlydiscriminativeandrobustAFRsystem,atleastfor constrainedandsemi-constrainedfacerecognition.However,asthetechnologybecomesmore widelyadopted,safeguardingAFRsystemsagainstcontinuouslyevolvingfaceattacksinboth physicalanddigitaldomainsisoftheutmostimportance.Sincetheexacttypeoffaceattackmay notbeknown apriori ,theneedforageneralizabledetectorthatcandefendanAFRsystemagainst anyofthethreeattackcategories(spoofs,adversarial,anddigitalmanipulation)isevident. Thisdissertationfocusesondesigningstate-of-the-artdefensemethodstosafeguardAFR systemsagainstindividualattackcategories.Lastly,weproposeanewframeworktodefendAFR systemsagainstbothphysicalanddigitalattacks. Themaincontributionsofthisdissertationareasfollows: 1. Ageneralizable,interpretable,andaccuratefacemethodfordetectingphysical fakefaces(suchas3Dmasks)inordertomitigatephysicalspoofattackonstate-of-the-art AFRsystems. 2. Anautomaticadversarialfacesynthesismethodforgeneratingdigitalfakefacesthatcanim- personateavictimorobfuscateone'sidentitybyevadingstate-of-the-artAFRsystems.With apowerfuladversarialfacesynthesizer,wecanfurtherinvestigaterobustnessofprevailing facerecognitionsystemstodigitaladversarialfaces. 3. Anewself-supervisedframework,namely FaceGuard ,fordefendingagainstadversarialface images.FaceGuardcombinesofadversarialtraining,detection,andinto adefensemechanismtrainedinanend-to-endmanner. 4. Anovelfaceattackdetectionframework,namely UniFAD ,thatautomaticallyclusters similarattacksandemploysamulti-tasklearningframeworktojointlydetectdigitaland 29 physicalattacks.ProposedUniFADallowsforfurtheroftheattackcategories, i . e .,whetherattacksareadversarial,digitallymanipulated,orcontainspoofartifacts. 
Chapter 2

Defending Against Face Spoofs

Face presentation attacks (physically crafted spoofs) challenge the robustness of face recognition systems. To safeguard AFR systems against spoofs, numerous presentation attack detection (PAD) methods have been proposed. In this chapter, we address two major issues with prevailing PAD approaches, namely, face PAD generalization and interpretability. Our main focus is to improve presentation attack detection performance across a wide variety of unknown attacks, while also maintaining high detection accuracy on spoofs encountered during the training of our proposed PAD solution.

2.1 Introduction

Despite impressive face recognition performance, current AFR systems remain vulnerable to the growing threat of presentation attacks. (ISO standard IEC 30107-1:2016(E) defines presentation attacks as "presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system" [64]. Note that these presentation attacks are different from digital manipulation of face images, such as DeepFakes [81] and adversarial faces [16].) Face spoofs are "fake faces" which can be constructed with a variety of different instruments (presentation attack instruments), e.g., 3D printed masks, printed paper, or digital devices (video replay attacks from a mobile phone), with the goal of enabling an attacker to impersonate a victim's identity, or alternatively, obfuscate their own identity (see Figure 2.1). With the rapid proliferation of face images/videos on the Internet (especially on social media websites, such as Facebook, Twitter, or LinkedIn), replaying videos containing the victim's face or presenting a printed photograph of the victim to the AFR system is a trivial task [65]. Even if a face presentation attack detection system could trivially detect printed photographs and replay video attacks (e.g., with depth sensors), attackers can still attempt to launch more sophisticated attacks such as 3D masks [66], make-up, or even virtual reality [67].

Figure 2.1: Example presentation attack instruments: Simple attacks include (b) printed photograph, or (c) replaying the victim's video. More advanced presentation attacks can also be leveraged such as (d-h) 3D masks, (i-k) make-up attacks, or (l-n) partial attacks [30]. A bona fide face is shown in (a) for comparison. Here, the presentation attacks in (b-c, k-n) belong to the same person in (a).

The need for preventing face attacks is becoming increasingly urgent due to the user's privacy concerns associated with spoofed systems. Failure to detect face attacks can be a major security threat due to the widespread adoption of automated face recognition systems for border control [68]. In 2011, a young individual from Hong Kong boarded a flight to Canada disguised as an old man with a hat by wearing a silicone face mask and successfully fooled the border control authorities [82].

Also consider that, with the advent of Apple's iPhone X and Samsung's Galaxy S8, all of us are carrying automated face recognition systems in our pockets, embedded in our smartphones. Face recognition on our phones facilitates (i) unlocking the device, (ii) conducting financial transactions, and (iii) access to privileged content stored on the device.

Figure 2.2: An overview of the proposed Self-Supervised Regional Fully Convolutional Network (SSR-FCN). We train in two stages: (1) Stage 1 learns global discriminative cues via training on the entire face image. The score map obtained from Stage 1 is hard-gated to obtain presentation attack regions in the face image. We randomly crop arbitrary-size patches from the presentation attack regions and fine-tune our network in Stage 2 to learn local discriminative cues. During testing, we input the entire face image to obtain the final score. The score map can also be used to visualize the presentation attack regions in the input image.
Failure to detect face presentation attacks on smartphones could compromise confidential information such as emails, banking records, social media content, and personal photos [83].

With numerous approaches proposed to detect face attacks, current face presentation attack detection methods have the following shortcomings:

Table 2.1: A summary of publicly available face presentation attack detection datasets.

Dataset            | Year | # Subj. | # Vids. | # Presentation Attack Instruments (Replay / Print / 3D Mask / Makeup / Partial)
Replay-Attack [4]  | 2012 | 50      | 1,200   | 2 / 1 / 0 / 0 / 0
CASIA-FASD [3]     | 2012 | 50      | 600     | 1 / 1 / 0 / 0 / 0
3DMAD [84]         | 2013 | 17      | 255     | 0 / 0 / 1 / 0 / 0
MSU-MFSD [85]      | 2015 | 35      | 440     | 2 / 1 / 0 / 0 / 0
Replay-Mobile [86] | 2016 | 40      | 1,030   | 1 / 1 / 0 / 0 / 0
MARs [87]          | 2016 | 35      | 1,009   | 0 / 0 / 2 / 0 / 0
Oulu-NPU [2]       | 2017 | 55      | 4,950   | 2 / 2 / 0 / 0 / 0
SiW [30]           | 2018 | 165     | 4,620   | 4 / 2 / 0 / 0 / 0
SiW-M [1]          | 2019 | 493     | 1,630   | 1 / 1 / 5 / 3 / 3

Generalizability: Since the exact presentation attack instrument may not be known beforehand, how to generalize well to "unknown" attacks is of utmost importance. (Unseen attacks are presentation attack instruments that are known to the developers, whereby algorithms can be tailored to detect them, but their data is never used for training. Unknown attacks are presentation attack instruments that are not known to the developers and are not seen during training.) A majority of the prevailing state-of-the-art face presentation attack detection techniques focus only on detecting 2D printed paper and video replay attacks, and are vulnerable to presentation attacks crafted from materials not seen during training of the detector. In fact, studies show a two-fold increase in error when presentation attack detection approaches encounter unknown presentation attack instruments [30]. In addition, current face presentation attack detection approaches rely on densely connected neural networks with a large number of learnable parameters (exceeding 2.7M), where the lack of generalization across unknown presentation attack instruments is even more pronounced.

Lack of Interpretability: Given a face image, face presentation attack detection approaches typically output a holistic face "attack score" which depicts the likelihood that the input image is bona fide or a presentation attack. Without an ability to visualize which regions of the face contribute to the overall decision made by the network, the global attack score alone may not be sufficient for a human operator to interpret the network's decision.

In an effort to impart generalizability and interpretability to face presentation attack detection systems, we propose a face presentation attack detection framework specifically designed to detect unknown presentation attack instruments, namely, the Self-Supervised Regional Fully Convolutional Network (SSR-FCN). A Fully Convolutional Network (FCN) is first trained to learn global discriminative cues and automatically identify presentation attack regions in face images. The network is then fine-tuned to learn local representations via regional supervision. Once trained, the deployed model can automatically locate regions where an attack occurs in the input image and provide a final attack score.

The contributions of this chapter are as follows:

We show that features learned from local face regions have better generalization ability than those learned from the entire face image alone.

We provide extensive experiments to show that the proposed approach, SSR-FCN, outperforms other local region extraction strategies and state-of-the-art face presentation attack detection methods on one of the largest publicly available datasets, namely SiW-M, comprised of 13 different presentation attack instruments. The proposed method reduces the Equal Error Rate (EER) by (i) 14% relative to state-of-the-art [88] under the unknown attack setting, and (ii) 40% on known presentation attack instruments. In addition, SSR-FCN achieves competitive performance on standard benchmarks on the Oulu-NPU [2] dataset and outperforms prevailing methods on cross-dataset generalization (CASIA-FASD [3] and Replay-Attack [4]).
The proposed SSR-FCN is also shown to be more interpretable since it can directly predict the parts of the face that are considered as presentation attacks.

2.2 Background

In order to mitigate the threats associated with presentation attacks, numerous face presentation attack detection techniques, based on both software and hardware solutions, have been proposed.

Figure 2.3: Illustration of drawbacks of prior approaches. Top: example of a bona fide face; Bottom: example of a paper glasses presentation attack. In this case, the presentation attack artifact is only present in the eye-region of the face. (a) Classifiers trained with global supervision overfit to the bona fide class since both images are mostly bona fide (the presentation attack instrument covers only a part of the face). (b) Pixel-level supervision assumes the entire image is either bona fide or presentation attack and constructs label maps accordingly. This is not a valid assumption in mask, makeup, and partial presentation attack instruments. Instead, (c) the proposed framework trains on extracted regions from face images. These regions can be based on domain knowledge, such as eye, nose, mouth regions, or randomly cropped. The proposed SSR-FCN utilizes self-supervised region-selection.

Early software-based solutions utilized liveness cues, such as eye blinking, lip movement, and head motion, to detect print attacks [89-92]. However, these approaches fail when they encounter unknown attacks such as printed attacks with cut eye regions (see Figure 2.1n). In addition, these methods require active cooperation of the user in providing specific types of images, making them tedious to use.

Since then, researchers have moved on to passive face presentation attack detection approaches that rely on texture analysis for distinguishing bona fide and presentation attacks, rather than motion or liveness cues. The majority of face presentation attack detection methods only focus on detecting print and replay attacks, which can be detected using features such as color and texture [93-98]. Many prior studies employ handcrafted features such as the 2D Fourier Spectrum [85, 99], Local Binary Patterns (LBP) [94, 100-102], Histogram of Oriented Gradients (HOG) [93, 103], Difference-of-Gaussians (DoG) [104], Scale-Invariant Feature Transform (SIFT) [96], and Speeded-Up Robust Features (SURF) [105]. Some techniques utilize presentation attack detection beyond the RGB color spectrum, such as incorporating the luminance and chrominance channels [102]. Instead of a predetermined color spectrum, Li et al. [95] automatically learn a new color scheme that can best distinguish bona fide and presentation attacks. Another line of work extracts image quality features to detect presentation attacks [85, 97, 98]. Due to the assumption that presentation attack instruments are one of replay or print attacks, these methods severely suffer from generalization to unknown presentation attack instruments.

Hardware-based solutions in the literature have incorporated 3D depth information [106-108], multi-spectral and infrared sensors [109], and even physiological sensors such as vein flow information [110]. Presentation attack detection can be further enhanced by incorporating background audio signals [111]. However, with the inclusion of additional sensors along with a standard camera, the deployment costs can be exorbitant (e.g., thermal sensors for iPhones cost over USD 400; see https://amzn.to/2zJ6YW4).

State-of-the-art face presentation attack detection systems utilize Convolutional Neural Networks (CNNs) so that the feature set (representation) is learned that best differentiates bona fides from presentation attacks. Yang et al. were among the first to propose CNNs for face presentation attack detection, and they showed about a 70% decrease in Half Total Error Rate (HTER) compared to baselines comprised of handcrafted features [112]. Further improvement in performance was achieved by directly modifying the network architecture [113-116]. Deep learning approaches also perform well for mask attack detection [66].
Incorporating auxiliary information (e.g., eye blinking) in deep networks can further improve the face presentation attack detection performance [30, 91].

Limited studies on generalizable face presentation attack detection focus on one-class classification approaches. These methods only model the distribution of bona fide face features using one-class classifiers such as one-class SVM [117] or one-class GMM [118], or employ a distance metric loss [119]. However, these approaches have several drawbacks: (i) by only modeling the bona fide face feature distributions, the methods tend to overfit to the bona fide class, and (ii) both one-class SVM and one-class GMM have been shown to perform poorly on public benchmark datasets (CASIA-FASD [3], Replay-Mobile [86], and MSU-MFSD [85]). A tree-based approach utilizing deep networks was proposed for generalizable face presentation attack detection [1]. In order to prevent face presentation attack detection methods from overfitting to the specific subject, environment, and presentation attack instrument, transfer learning has also been studied [120-122]. However, these methods share similar network architectures that are densely connected with thirteen convolutional layers exceeding 2.7M learnable parameters [30, 88, 120-124]. Due to this, a majority of the aforementioned presentation attack detection methods also suffer from poor generalization performance.

Table 2.1 outlines the publicly available face presentation attack detection datasets.

2.3 Motivation

The proposed approach is motivated by the following observations:

2.3.1 Face Presentation Attack Detection is a Local Task

It is now generally accepted that for print and replay attacks, "face presentation attack detection is usually a local task in which discriminative clues are ubiquitous and repetitive" [125]. However, in the case of masks, makeups, and partial attacks, the ubiquity and repetitiveness of presentation attack cues may not hold true. For instance, in Figure 2.3(a-c), the presentation attack artifact (the paper glasses) is only present in the eye region of the face. Unlike face recognition, face presentation attack detection does not require the entire face image in order to predict whether the image is a presentation attack or bona fide. In fact, our experimental results and their analysis will confirm that training on the entire face image alone can adversely affect the convergence and generalization of networks.

2.3.2 Global vs. Local Supervision

Prior work can be partitioned into two groups: (i) global supervision, where the input to the network is the entire face image and the CNN outputs a score indicating whether the image is bona fide or presentation attack [30, 88, 112-116, 123, 126], and (ii) pixel-level supervision, where multiple classification losses are aggregated over each pixel in the final feature map [124, 127]. These studies assume that all pixels in the face image are either bona fide or presentation attack (see Figure 2.3(b)). This assumption holds true for presentation attack instruments such as replay and print attacks (which are the only presentation attack instruments considered by these studies), but not for mask, makeup, and partial attacks. Therefore, pixel-level supervision can not only suffer from poor generalization across a diverse range of presentation attack instruments, but the convergence of the network is also severely affected due to noisy labels.

Table 2.2: Architecture details of the proposed FCN backbone. Conv and GAP refer to convolutional and global average pooling operations.

Layer | # of Activations  | # of Parameters
Input | H x W x 3         | 0
Conv  | H/2 x W/2 x 64    | 3 x 3 x 3 x 64 + 64
Conv  | H/4 x W/4 x 128   | 3 x 3 x 64 x 128 + 128
Conv  | H/8 x W/8 x 256   | 3 x 3 x 128 x 256 + 256
Conv  | H/16 x W/16 x 512 | 3 x 3 x 256 x 512 + 512
Conv  | H/16 x W/16 x 1   | 3 x 3 x 512 x 1 + 1
GAP   | 1                 | 0
Total |                   | 1.5M
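As a concrete reading of Table 2.2, the following PyTorch sketch builds a backbone with the same layer progression. It is our illustrative interpretation, not the dissertation's implementation (which, per Section 2.5.3, was written in TensorFlow); the stride and padding choices are assumptions, and while Table 2.2 lists 3 x 3 parameters for the last convolution, we use the 1 x 1 convolution described in Section 2.4.

```python
import torch
import torch.nn as nn

class FCNBackbone(nn.Module):
    """Illustrative reading of Table 2.2: four stride-2 conv blocks (3->64->128->256->512)
    with BatchNorm + ReLU, a final conv producing a 1-channel score map of size H/16 x W/16,
    and global average pooling (GAP) of the score map down to a single logit per image."""
    def __init__(self):
        super().__init__()
        blocks, chans = [], [3, 64, 128, 256, 512]
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            blocks += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                       nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
        self.features = nn.Sequential(*blocks)
        self.score = nn.Conv2d(512, 1, kernel_size=1)  # per-receptive-field decisions

    def forward(self, x):
        score_map = self.score(self.features(x))      # (N, 1, H/16, W/16)
        logit = score_map.mean(dim=(2, 3)).squeeze(1)  # GAP over the score map
        return logit, score_map

# A 256x256 face yields a 16x16 score map and one scalar logit per image; the total
# parameter count is roughly 1.5M, consistent with Table 2.2.
logit, score_map = FCNBackbone()(torch.randn(2, 3, 256, 256))
print(logit.shape, score_map.shape)  # torch.Size([2]), torch.Size([2, 1, 16, 16])
```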
In summary, based on the 13 different presentation attack instruments shown in Figure 2.1, for which we have the data, we gain the following insights: (i) face presentation attack detection is inherently a local task, and (ii) learning local representations can improve face presentation attack detection performance [124, 127]. Motivated by (i), we hypothesize that utilizing a Fully Convolutional Network (FCN) may be more appropriate for the face presentation attack detection task compared to a traditional CNN. The second insight suggests that FCNs can be intrinsically regularized to learn local cues by enforcing the network to look at local spatial regions of the face. In order to ensure that these regions mostly comprise presentation attack patterns, we propose a self-supervised region extractor.

2.4 Proposed Approach

In this section, we describe the proposed Self-Supervised Regional Fully Convolutional Network (SSR-FCN) for generalized face presentation attack detection. As shown in Figure 2.2, we train the network in two stages: (a) Stage I learns global discriminative cues and predicts score maps, and (b) Stage II extracts arbitrary-size regions from presentation attack areas and fine-tunes the network via regional supervision.

2.4.1 Network Architecture

In typical image classification tasks, networks are designed such that information present in the input image can be used for learning global discriminative features in the form of a feature vector, without utilizing the spatial arrangement in the input. To this end, a fully-connected (FC) layer is generally introduced at the end of the last convolutional layer. This fully-connected layer outputs a D-dimensional feature that aggregates decisions at various spatial regions to obtain a global description of the input image. However, this is not ideal for partial presentation attacks since the presentation attack artifact is not present in all spatial regions. Given the plethora of available presentation attack instruments, it is better to learn local representations and make decisions on local spatial inputs rather than global descriptors. Therefore, we employ a Fully Convolutional Network (FCN) by replacing the FC layer in a traditional CNN with a 1 x 1 convolutional layer followed by a global average pooling layer. The FCN leads to three major advantages over traditional CNNs:

Arbitrary-sized inputs: By replacing the fully-connected layer with a global average pooling layer, the entire FCN can accept input images of any size. This property can be exploited to learn discriminative features at local spatial regions, regardless of the input size, rather than overfitting to a global representation of the entire face image.

Interpretability: Since the proposed FCN is trained to provide decisions at a local level, the score map output by the network can be used to identify the presentation attack regions in the face.
Efficiency: Via the FCN, an entire face image can be inferred only once, where local decisions are dynamically aggregated via the 1 x 1 convolution operator. Existing methods utilize a traditional CNN which has a larger number of trainable parameters due to the fully connected layer at the end of the network. This necessitates a large training dataset, which is limited in the face presentation attack detection literature (see Table 2.1). The FCN is more parameter-efficient and can be trained in an effective manner (to avoid overfitting).

Figure 2.4: Three presentation attack images and their corresponding binary masks extracted from predicted score maps. Black regions correspond to predicted bona fide regions, whereas white regions indicate presentation attack.

2.4.2 Network Configuration

A majority of prior work on CNN-based face presentation attack detection employs architectures that are densely connected with thirteen convolutional layers [30, 88, 123, 124, 128]. Even with the placement of skip connections, the number of learnable parameters exceeds 2.7M. As we see in Table 2.1, only a limited amount of training data is generally available in face presentation attack detection datasets. (The lack of large-scale publicly available face presentation attack detection datasets is due to the time and effort required, along with privacy concerns associated in acquiring such datasets.) Limited data coupled with the large number of trainable parameters causes current approaches to overfit, leading to poor generalization performance under unknown attack scenarios. Instead, we employ a shallower neural network comprising of only five convolutional layers with approximately 1.5M learnable parameters (see Table 2.2).

2.4.3 Stage I: Training FCN Globally

We first train the FCN with global face images in order to learn global discriminative cues and identify presentation attack regions in the face image. Given an image x in R^(H x W x C), we detect a face and crop the face region via 5 landmarks (two eyes, nose, and two mouth keypoints) in order to remove background information not pertinent to the task of face presentation attack detection. Here, H, W, and C refer to the height, width, and number of channels (3 in the case of RGB) of the input image. The face regions are then aligned and resized to a fixed size (e.g., 256 x 256) in order to maintain consistent spatial information across all training data.

The proposed FCN consists of four downsampling convolutional blocks, each coupled with batch normalization and ReLU activation. The feature map from the fourth convolutional layer passes through a 1 x 1 convolutional layer. The output of the 1 x 1 convolutional layer represents a score map S in R^(H_S x W_S x 1), where each pixel in S represents a bona fide vs. presentation attack decision corresponding to its receptive field in the image. The height (H_S) and width (W_S) of the score map are determined by the input image size and the number of downsampling layers. For a 256 x 256 x 3 input image, our proposed architecture outputs a 16 x 16 x 1 score map.

The score map is then reduced to a single scalar value by global average pooling. That is, the score (s) for an input image is obtained from the (H_S x W_S x 1) score map (S) by

s = (1 / (H_S * W_S)) * sum_{i=1}^{H_S} sum_{j=1}^{W_S} S_{i,j}    (2.4.1)

Using a sigmoid activation on the output (s), we obtain a scalar p(c|x) in [0, 1] predicting the likelihood that the input image is a presentation attack, where c = 0 indicates bona fide and c = 1 indicates presentation attack. We train the network by minimizing the Binary Cross Entropy (BCE) loss,

L = -[y log(p(c|x)) + (1 - y) log(1 - p(c|x))]    (2.4.2)

where y is the ground truth label of the input image.

2.4.4 Stage II: Training FCN on Self-Supervised Regions

In order to supervise training at a local level, we propose a regional supervision strategy. We train the network to learn local cues by only showing certain regions of the face where presentation attack patterns exist. In order to ensure that presentation attack artifacts/patterns indeed exist within the selected regions, the pre-trained FCN from Stage I (Section 2.4.3) can automatically guide the region selection process in presentation attack images. For bona fide faces, we can randomly crop a region from any part of the image.
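The guided cropping just described can be sketched as follows. This NumPy snippet is an assumption-laden illustration rather than the authors' code: the hard-gating rule it applies is formalized in Eq. (2.4.3) immediately below, and the mapping from score-map cells to image coordinates and the box-size sampling are our own choices.

```python
import numpy as np

def self_supervised_crop(image, score_map, tau=0.5, min_size=64, max_size=256, rng=None):
    """Min-max normalize the Stage I score map, hard-gate it at tau, and crop a random
    rectangle whose center falls inside a detected presentation-attack region."""
    rng = rng or np.random.default_rng()
    S = np.asarray(score_map, dtype=np.float32)
    S_norm = (S - S.min()) / (S.max() - S.min() + 1e-8)   # soft-gate into [0, 1]
    mask = S_norm >= tau                                  # binary mask M
    H, W = image.shape[:2]
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        # Bona fide face (or no cell above tau): crop anywhere in the image.
        cy, cx = int(rng.integers(0, H)), int(rng.integers(0, W))
    else:
        # Pick a mask cell and map it from score-map coordinates to image coordinates.
        i = int(rng.integers(0, len(ys)))
        cy, cx = int(ys[i] * H / S.shape[0]), int(xs[i] * W / S.shape[1])
    h = int(rng.integers(min_size, max_size + 1))
    w = int(rng.integers(min_size, max_size + 1))
    y0 = int(np.clip(cy - h // 2, 0, max(H - h, 0)))
    x0 = int(np.clip(cx - w // 2, 0, max(W - w, 0)))
    return image[y0:y0 + h, x0:x0 + w]
```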
Due to the absence of a fully connected layer, notice that the FCN naturally encodes decisions at each pixel in the feature map S. In other words, higher intensity pixels within S indicate a larger likelihood of a presentation attack pattern residing within the corresponding receptive field in the image. Therefore, discriminative regions (presentation attack areas) are automatically highlighted in the score map by training on entire face images (see Figure 2.2).

We can then craft a binary mask M indicating the attack regions in the input presentation attack images. First, we soft-gate the score map by min-max normalization such that we can obtain a score map S' in [0, 1]. Let S'(i, j) represent the activation in the (i, j)-th spatial location in the scaled score map S'. The binary mask M is designed by hard-gating,

M(i, j) = 1 if S'(i, j) >= tau, and 0 otherwise    (2.4.3)

where tau is a threshold that controls the size of the hard-gated region (tau = 0.5 in our case). A larger tau leads to smaller regions and a smaller tau can lead to spurious presentation attack regions. Examples of binary masks are shown in Figure 2.4. From the binary mask, we can then randomly extract a rectangular bounding box such that the center of the rectangle lies within the detected presentation attack regions. In this manner, we can crop rectangular regions of arbitrary sizes from the input image such that each region contains presentation attack artifacts according to our pre-trained global FCN. We constrain the width and height of the bounding boxes to be between MIN_region and MAX_region. In this manner, we fine-tune our network to learn local discriminative cues.

2.4.5 Testing

Since FCNs can accept arbitrary input sizes, and given that the proposed FCN has encountered entire faces in Stage I, we input the global face into the trained network and obtain the score map. The score map is then average pooled to extract the final output, which is then normalized by a sigmoid function in order to obtain a final attack score within [0, 1]. That is, the final score is obtained by 1 / (1 + exp(-s)). In addition to the final score, the score map (S) can also be utilized for visualizing the presentation attack regions in the face by constructing a heatmap (see Figure 2.2).

2.5 Experimental Setup

2.5.1 Datasets

The following four datasets are utilized in our study (Table 2.1):

2.5.1.1 Spoof-in-the-Wild with Multiple Attacks (SiW-M) [1]

A dataset, collected in 2019, comprising 13 different presentation attack instruments, acquired specifically for evaluating generalization performance on unknown presentation attack instruments. Compared with other publicly available datasets (Table 2.1), SiW-M is diverse in presentation attack instruments, environmental conditions, and face poses. We evaluate SSR-FCN under both unknown and known settings, and perform ablation studies on this dataset.

2.5.1.2 Oulu-NPU [2]

A dataset comprised of 4,950 high-resolution video clips of 55 subjects. Oulu-NPU defines four protocols, each designed for evaluating generalization against variations in capturing conditions, attack devices, capturing devices, and their combinations. We use this dataset for comparing our approach with the prevailing state-of-the-art face presentation attack detection methods on the four protocols.

2.5.1.3 CASIA-FASD [3] & Replay-Attack [4]

Both datasets, collected in 2012, are frequently employed in the face presentation attack detection literature for testing cross-dataset generalization performance. These two datasets provide a comprehensive collection of attacks, including warped photo attacks, cut photo attacks, and video replay attacks. Low-quality, normal-quality, and high-quality videos are recorded under different lighting conditions.

All images shown in this paper are from SiW-M testing sets.
2.5.2 Data Preprocessing

For all datasets, we extract all frames in a video. The frames are then passed through the MTCNN face detector [41] to detect 5 facial landmarks (two eyes, nose, and two mouth corners). A similarity transformation is used to align the face images based on the five landmarks. After the transformation, the images are cropped to 256 x 256. All face images shown in the paper are cropped and aligned. Before passing into the network, we normalize the images by requiring each pixel to be within [-1, 1], by subtracting 127.5 and dividing by 127.5.

2.5.3 Implementation Details

SSR-FCN is implemented in TensorFlow, and trained with a constant learning rate of 1e-3 with a mini-batch size of 128. The objective function, L, is minimized using the Adam optimizer [129]. It takes 20 epochs to converge. Following [30], we randomly initialize all the weights of the convolutional layers using a normal distribution with 0 mean and 0.02 standard deviation. We restrict the self-supervised regions to be at least 1/4 of the entire image, that is, MIN_region = 64, and at most MAX_region = 256, which is the size of the global face image. Data augmentation during training involves random horizontal flipping with a probability of 0.5. We train and test our proposed method on a single Nvidia GTX 1080 Ti GPU. For evaluation, we compute the attack scores for all frames in a video and temporally average them to obtain the final score.

Table 2.3: Generalization error on learning global (CNN) vs. local (FCN) representations of SiW-M [1]. The Overall column reports Mean ± Std. across the three unknown attack instruments.

Method        | Metric (%) | Replay | Obfuscation | Paper Glasses | Overall
CNN           | ACER       | 13.3   | 47.1        | 32.2          | 30.8 ± 17.0
CNN           | EER        | 12.8   | 44.6        | 23.6          | 27.0 ± 13.2
FCN (Stage I) | ACER       | 11.2   | 52.2        | 12.1          | 25.1 ± 23.4
FCN (Stage I) | EER        | 11.2   | 37.6        | 12.4          | 20.4 ± 12.1

2.5.4 Evaluation Metrics

For all the experiments, we report the standard ISO/IEC 30107 [64] metrics:

1. Attack Presentation Classification Error Rate (APCER): the worst error rate among all the presentation attack instruments;
2. Bona Fide Presentation Classification Error Rate (BPCER): the error rate of bona fide presentations misclassified as presentation attacks;
3. Average Classification Error Rate (ACER): the mean of APCER and BPCER.

In addition, we also report the Equal Error Rate (EER) and the True Detection Rate (TDR) at 2.0% False Detection Rate (FDR) for our evaluation. (Due to the small number of samples, thresholds at lower False Detection Rates, such as 0.2% as recommended under the IARPA ODIN program, cannot be computed.) For a fair comparison with prior work, we report the Half Total Error Rate (HTER) for cross-dataset evaluation. Except for EER and HTER, we employ a decision threshold of 0.5.

Figure 2.5: Illustration of various region extraction strategies from training images: (a) fixed facial regions and (b) landmark-based regions are extracted via domain knowledge (manually defined or landmark-based); (c) random regions extracted via the proposed self-supervision scheme. Each color denotes a separate region.

Table 2.4: Generalization performance of different region extraction strategies on the SiW-M dataset. Here, each column represents an unknown presentation attack instrument while the method is trained on the remaining 12 presentation attack instruments. Columns (number of test videos in parentheses): Replay (99), Print (118); Mask attacks: Half (72), Silicone (27), Transparent (88), Paper (17), Mannequin (40); Makeup attacks: Obfuscation (23), Impersonation (61), Cosmetic (50); Partial attacks: Funny Eye (160), Paper Glasses (127), Paper Cut (86); followed by the Mean and Std. across instruments.
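The video-level scoring and the detection metrics listed above can be computed from per-frame scores as in the sketch below. This is an illustrative NumPy implementation, not the evaluation code used in the dissertation; in particular, it pools all attacks together, whereas the ISO APCER reported here is the worst error over individual presentation attack instruments, so scores would be grouped per instrument in practice.

```python
import numpy as np

def video_score(frame_scores):
    """Final video-level score: temporal average of per-frame attack scores (Sec. 2.5.3)."""
    return float(np.mean(frame_scores))

def apcer_bpcer_acer(bona_fide_scores, attack_scores, threshold=0.5):
    """Error rates at a fixed decision threshold (0.5 in this chapter), where higher
    scores indicate 'attack'."""
    apcer = float(np.mean(np.asarray(attack_scores) < threshold))      # attacks missed
    bpcer = float(np.mean(np.asarray(bona_fide_scores) >= threshold))  # bona fides rejected
    return apcer, bpcer, (apcer + bpcer) / 2.0

def tdr_at_fdr(bona_fide_scores, attack_scores, target_fdr=0.02):
    """TDR @ FDR: set the threshold so that at most target_fdr of bona fide videos are
    falsely flagged, then report the fraction of attack videos correctly detected."""
    t = np.quantile(bona_fide_scores, 1.0 - target_fdr)
    return float(np.mean(np.asarray(attack_scores) >= t))
```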
Global ACER 11.2 15.5 12.8 21.5 35.4 6.1 10.7 52.2 50.0 20.5 26.2 12.1 9.6 22.6 15.3 EER 11.2 14.0 12.8 23.1 26.6 2.9 11.0 37.6 10.4 17.0 24.2 12.4 10.1 16.8 9.3 Eye-Region ACER 13.2 13.7 7.5 17.4 22.5 5.79 6.2 19.5 8.3 11.7 32.8 15.3 7.3 13.2 8.5 EER 12.4 11.4 7.3 15.2 21.5 2.9 6.5 20.2 7.8 11.2 27.2 14.7 7.5 12.3 6.2 Nose-Region ACER 17.4 10.5 8.2 13.8 30.3 5.3 8.4 37.4 5.1 18.0 35.5 31.4 7.1 17.6 12.0 EER 14.6 9.8 9.2 12.7 22.0 5.2 8.4 23.6 4.4 14.6 24.9 27.7 7.6 14.2 7.9 Mouth-Region ACER 20.5 20.7 22.9 26.3 30.6 15.6 17.1 44.2 18.1 24.0 38.0 47.2 8.5 25.7 11.4 EER 19.9 21.3 22.6 25.1 30.0 10.1 10.7 40.9 16.1 24.0 35.5 40.4 8.1 23.4 10.9 Global+ ACER 10.9 10.5 7.5 17.7 28.7 5.1 7.0 38.0 5.1 13.6 29.4 15.2 6.2 15 10.7 Eye+Nose EER 10.2 10.0 7.7 15.8 21.3 1.8 6.7 21.0 3.0 12.3 22.5 12.3 6.5 11.6 6.8 Landmark ACER 10.7 9.2 18.4 25.1 26.4 6.2 6.9 53.8 8.1 15.4 35.8 40.8 7.6 20.3 15.2 Region EER 8.0 10.1 12.2 23.1 18.8 8.9 4.1 40.1 9.9 15.6 17.7 25.6 4.9 15.3 10 Global+ ACER 12.0 11.2 7.3 23.7 26.4 6.3 5.9 26.7 6.7 10.7 27.8 25.7 6.4 15.1 9.2 Landmark EER 11.5 10.1 7.2 19.0 4.9 6.6 4.6 25.6 6.7 10.9 23.5 18.5 4.7 11.8 7.4 Random-Crop ACER 9.2 6.7 7.3 19.9 30.9 9.1 6.9 44.0 6.5 13.8 31.8 28.6 5.9 17.0 12.8 EER 8.9 7.8 10.3 17.9 21.3 3.7 6.5 32.7 5.4 13.7 18.7 19.4 7.1 13.3 8.3 Global+ ACER 12.3 10.7 6.5 18.2 22.9 6.2 6.1 18.6 4.9 11.6 32.7 16.1 7.2 13.4 8.1 Random-Crop EER 10.9 9.2 6.9 16.6 21.3 2.9 5.2 18.8 3.7 11.5 19.0 14.9 6.2 11.3 6.3 SSR-FCN ACER 7.4 19.5 3.2 7.7 33.3 5.2 3.3 22.5 5.9 11.7 21.7 14.1 6.4 12.4 9.2 EER 6.8 11.2 2.8 6.3 28.5 0.4 3.3 17.8 3.9 11.7 21.6 13.5 3.6 10.1 8.4 2.6ExperimentalResults 2.6.1EvaluationofGlobalDescriptorvs.LocalRepresentation Inordertoanalyzetheimpactoflearninglocalembeddingsasopposedtolearningaglobalem- bedding,weconductanablationstudyonthreepresentationattackinstrumentsintheSiW-M dataset[1],namely,Replay(Figure2.1c),Obfuscation(Figure2.1i),andPaperGlasses(Fig- ure2.1m). Theexperimentisconductedundertheunknownattackscenario(leave-one-instrument- outprotocol). Inthisexperiment,a traditionalCNN learningaglobalimagedescriptorisconstructedby replacingthe 1 1 convolutionallayerwithafullyconnectedlayer.WecomparetheCNNtothe proposedbackbone FCN inTable2.2whichlearnslocalrepresentations.Forafaircomparison betweenCNNandFCN,weutilizethesamemeta-parametersandemployglobalsupervisiononly (StageI). 48 InTable2.3,wethatoverallFCNsaremoregeneralizabletounknownpresentationat- tackinstrumentscomparedtoglobalembeddings.Forpresentationattackinstrumentswherethe presentationattackaffectstheentireface,suchasreplayattacks,thedifferencesbetweengeneral- izationperformanceofCNNandFCNarenegligible.Here,presentationattackdecisionsatlocal spatialregionsdonothaveanyadvantageoverasinglepresentationattackdecision overtheentireimage.RecallthatCNNsemployafullyconnectedlayerwhichstripsawayall spatialinformation.Thisexplainswhylocaldecisionscanimprovegeneralizability ofFCNoverCNNwhenpresentationattackinstrumentsarelocalinnature( e . g .,make-upattacks andpartialattacks).Duetosubtletyofobfuscationattacksandlocalizednatureofpaperglasses, FCNcanexhibitarelativereductioninEERby 16% and 47% ,respectively,relativetoCNN. 2.6.2RegionExtractionStrategies Weconsidered6differentregionextractionstrategies,namely, Eye-Region , Nose-Region , Mouth- Region , Landmark-Region , Random-Crop ,and Self-SupervisedRegions(Proposed) (seeFig- ure2.5).Wealsoincluderesultsfora Global modelwhichreferstotrainingtheFCNwiththe entirefaceimageonly(StageI). 
Sinceallfaceimagesarealignedandcropped,spatialinformationisconsistentacrossallim- agesindatasets.Therefore,wecanautomaticallyextractfacialregionsthatinclueeye,nose,and mouthregions(Figure2.5a).WetraintheproposedFCNseparatelyoneachofthethreeregionsto obtainthreemodels:eye-region,nose-region,andmouth-region. Wealsoinvestigateextractingregionsbyfacelandmarks.Forthis,weutilizeastate- of-the-artlandmarkextractor,namelyDLIB[130],toobtain68landmarkpoints.Weexclude17 landmarksaroundthejawlineandasubsetof51landmarkpointsaroundeyebrows (10landmarks),eyes(12landmarks),nose(9landmarks),andmouth(20landmarks).Atotalof 51regions(withaedsize 32 32 )centeredaroundeachlandmarkareextractedandusedtrain asingleFCNonall51regions. Ourareasfollows:(i)almostallmethodswithregionalsupervisionhaveloweroverall 49 errorratesascomparedtotrainingwiththeentireface.ExceptiontothisiswhenFCNistrained onlyonmouthregions.Thisislikelybecauseamajorityofpresentationattackinstrumentsmay notcontainpresentationattackpatternsacrossmouthregions(Figure2.1).(ii)whenbothglobal anddomain-knowledgestrategies,eyesandnose)arefused,thegeneralizationperfor- manceimprovescomparedtotheglobalmodelalone.Notethatwedonotfusethemouthregion sincetheperformanceispoorformouthregions.Similarly,wethatregionscroppedaround landmarkswhenfusedwiththeglobalercanachievebettergeneralizationperformance. (iii)samplingrandomlocalregionsofthefacealsoresultsinhigherrorratesacrossthediverseset ofpresentationattackinstruments.(iv)comparedtoallregionextractionstrategies,theproposed self-supervisedregionextractionstrategy(StageI ! StageII)achievesthelowestgeneralization errorratesacrossallpresentationattackinstrumentswitha 40% and 45% relativereductioninEER andACERcomparedtotheGlobalmodel(StageI).ThissupportsourhypothesisthatbothStage IandStageIIarerequiredforenhancedgeneralizationperformanceacrossunknownpresentation attackinstruments.Ascore-levelfusionoftheglobalFCNwithself-supervisedregionsdoesnot showanyreductioninerrorrates.ThisisbecausewealreadytrainedtheproposedFCN onglobalfacesinStageI. The Random-Crop strategycanbeviewedasavariantoftrainingtheproposedStageIIwithout anyregion-selectionguidancefromStageI.Inordertofurtherinvestigatetheofthepro- posedself-supervisionmethod,wetraintwomodelstodistinguishbetweensamplesand PaperGlasses presentationattacksamples:(i) RandomCrop modelistrainedonpatchesrandomly sampledfromtheinputimagesuchthatthesizeofeachregionisbetween 64 64 and 256 256 ,and (ii) Self-SupervisedRegions modelistrainedonpatchessampledfromweaklylabeledregionsby thepre-trainedmodelfromStageI.Forafaircomparison,wedonotemploythepre-trainedmodel fromStageItotrainthe Self-SupervisedRegions model.InFigure2.7,weplotthetraininglossfor bothmodelsovermultipletrainingiterations.Wecanobservethattheproposedself-supervisedre- gionmethodaidsinnetworkconvergence.Thisisbecauserandomcroppingcanresultintraining withregionsfrompresentationattacksamples(seeFigure2.7),whereas,theproposed 50 (a)Obfuscation (b)AttackScore:1.0 (c)AttackScore:0.7 (d)Obfuscation (e)AttackScore:0.0 (f)AttackScore:0.0 Figure2.6(a)Anexampleobfuscationpresentationattackattemptwhereournetworkcorrectly predictstheinputtobeapresentationattack.(b,e)Scoremapoutputbyournetworktrainedvia Self-SupervisedRegions.(c,f)ScoremapoutputbyFCNtrainedonentirefaceimages.(d)An exampleobfuscationpresentationattackattemptwhereournetwork incorrectly predictstheinput tobeaAttackscoresaregivenbelowthescoremaps.Decisionthresholdis 0 : 5 . 
self-supervisionmethodensuresthatregionssampledfrompresentationattacksindeedcontains presentationattackartifacts. InFigure2.6,weanalyzetheeffectoftrainingtheFCNlocallyvs.globallyontheprediction results.Intherow,wherebothmodelscorrectlypredicttheinputtobeapresentationattack, weseethatFCNtrainedviarandomregionscancorrectlyidentifypresentationattackregionssuch asfakeeyebrows.Incontrast,theglobalFCNcanbarelylocatethefakeeyebrows.Sincerandom regionsincreasesthevariabilityinthetrainingsetalongwithadvantageoflearninglocalfeatures thanglobalFCN,wethatproposedself-supervisedregionalsupervisionperformsbest. 51 Figure2.7Networkconvergenceoveranumberoftrainingiterationswhenamodeltrainson(a) randomlycroppedpatches(blueline),and(b)self-supervisedregionsextractedviapre-trained modelfromStageI(orangeline).Randomlycroppingpatchesmayresultinnoisysampleswhere samplesfrompresentationattacksamplesmaybeusedfortraining.Someexampleran- domlycroppedpatcheswithhightraininglossareshownabovethelines.Instead,wethatthe proposedself-supervisionaidsinnetworkconvergence. 2.6.3EvaluationofNetworkCapacity InTable2.5,weshowthegeneralizationperformanceofourmodelwhenwevarythecapacity ofthenetwork.WeconsiderthreedifferentvariantsoftheproposedFCN:(a)3-layerFCN( 76 K parameters),(b)5-layerFCN( 1 : 5 M parameters;proposed),and(c)6-layerFCN( 3 M parame- ters).Thisexperimentisevaluatedonthreeunknownpresentationattackinstruments,namely, Re- play , Obfuscation ,and PaperGlasses .Wechosethesepresentationattackinstrumentsdueto theirvastlydiversenature.Replayattacksconsistofglobalpresentationattackpatterns,whereas 52 Table2.5GeneralizationerrorofFCNswithrespecttothenumberoftrainableparameters. Method Metric(%) Replay Obfuscation PaperGlasses Mean Std. 3-layers( 76 K ) ACER 13.9 57.6 12.3 27.9 25.7 EER 14.0 44.1 7.5 21.9 19.5 5-layers( 1 : 5 M ;proposed) ACER 7.4 22.5 14.1 14.7 7.6 EER 6.8 17.8 13.5 12.7 4.5 6-layers( 3 M ) ACER 11.2 32.9 19.8 21.3 11.0 EER 7.8 25.19 19.7 17.6 7.2 Table2.6ResultsonSiW-M:UnknownAttacks.Here,eachcolumnrepresentsanunknownpre- sentationattackinstrumentwhilethemethodistrainedontheremaining12presentationattack instruments. Method Metric Replay Print MaskAttacks MakeupAttacks PartialAttacks Mean Std. Replay Print Half Silicone Trans. Paper Mann. Obf. Imp. Cosm. FunnyEye Glasses PaperCut 99vids. 118vids. 72vids. 27vids. 88vids. 17vids. 40vids. 23vids. 61vids. 50vids. 160vids. 127vids. 86vids. SVM+LBP[2] ACER 20.6 18.4 31.3 21.4 45.5 11.6 13.8 59.3 23.9 16.7 35.9 39.2 11.7 26.9 14.5 EER 20.8 18.6 36.3 21.4 37.2 7.5 14.1 51.2 19.8 16.1 34.4 33.0 7.9 24.5 12.9 Auxiliary[30] ACER 16.8 6.9 19.3 14.9 52.1 8.0 12.8 55.8 13.7 11.7 49.0 40.5 5.3 23.6 18.5 EER 14.0 4.3 11.6 12.4 24.6 7.8 10.0 72.3 10.1 9.4 21.4 18.6 4.0 17.0 17.7 DTN[1] ACER 9.8 6.0 15.0 18.7 36.0 4.5 7.7 48.1 11.4 14.2 19.3 19.8 8.5 16.8 11.1 EER 10.0 2.1 14.4 18.6 26.5 5.7 9.6 50.2 10.1 13.2 19.8 20.5 8.8 16.1 12.2 CDC[88] ACER 10.8 7.3 9.1 10.3 18.8 3.5 5.6 42.1 0.8 14.0 24.0 17.6 1.9 12.7 11.2 EER 9.2 5.6 4.2 11.1 19.3 5.9 5.0 43.5 0.0 14.0 23.3 14.3 0.0 11.9 11.8 Proposed ACER 7.4 19.5 3.2 7.7 33.3 5.2 3.3 22.5 5.9 11.7 21.7 14.1 6.4 12.4 9.2 EER 6.8 11.2 2.8 6.3 28.5 0.4 3.3 17.8 3.9 11.7 21.6 13.5 3.6 10.1 8.4 TDR* 72.0 51.0 96.0 55.9 39.0 100.0 95.0 31.0 90.0 44.0 33.0 42.9 94.7 65.0 25.9 *TDRevaluatedat 2 : 0% FDR Table2.7ResultsonSiW-M:Knownpresentationattackinstruments. Method Metric(%) MaskAttacks MakeupAttacks PartialAttacks Mean Std. Replay Print Half Silicone Trans. Paper Mann. Obf. Imp. Cosm. 
FunnyEye Glasses PaperCut Auxiliary[30] ACER 5.1 5.0 5.0 10.2 5.0 9.8 6.3 19.6 5.0 26.5 5.5 5.2 5.0 8.7 6.8 EER 4.7 0.0 1.6 10.5 4.6 10.0 6.4 12.7 0.0 19.6 7.2 7.5 0.0 6.5 5.8 Proposed ACER 3.5 3.1 1.9 5.7 2.1 1.9 4.2 7.2 2.5 22.5 1.9 2.2 1.9 4.7 5.6 EER 3.5 3.1 0.1 9.9 1.4 0.0 4.3 6.4 2.0 15.4 0.5 1.6 1.7 3.9 4.4 TDR* 55.5 92.3 69.5 100.0 90.4 100.0 85.1 92.5 78.7 99.1 95.6 95.7 76.0 87.0 13.0 *TDRevaluatedat 2 : 0% FDR obfuscationattacksareextremelysubtlecosmeticchanges.PaperGlassesareconstrainedonly toeyes.Whilealargenumberoftrainableparametersleadtopoorgeneralizationduetoover- 53 tothepresentationattackinstrumentsseenduringtraining,whereas,toofewparameters limitslearningdiscriminativefeatures.Basedonthisobservationandexperimentalresults,we utilizethe5-layerFCN(seeTable2.2)withapproximately1.5Mparameters.Amajorityofprior studiesemploy13denselyconnectedconvolutionallayerswithtrainableparametersexceeding 2 : 7 M [30,88,124,127]. 2.6.4GeneralizationacrossUnknownAttacks Theprimaryobjectiveofthisworkistoenhancegeneralizationperformanceacrossamultitude ofunknownpresentationattackinstrumentsinordertoeffectivelygaugetheexpectederrorrates inreal-worldscenarios.TheevaluationprotocolinSiW-Mfollowsaleave-one-spoof-outtesting protocolwherethetrainingsplitcontains12differentpresentationattackinstrumentsandthe13 th presentationattackinstrumentisheldoutfortesting.Amongthevideos, 80% arekept inthetrainingsetandtheremaining 20% isusedfortestingNotethatthereareno overlappingsubjectsbetweenthetrainingandtestingsets.Alsonotethatnodatasamplefromthe testingpresentationattackinstrumentisusedforvalidationsinceweevaluateourapproachunder unknownattacks.WereportACERandEERacrossthe13splits.InadditiontoACERandEER, wealsoreporttheTDRat2.0%FDR. InTable2.6,wecompare SSR-FCN withpriorwork.Wethatourproposedmethod achievesimprovementincomparisontothepublishedresults[88](relativereduction of14%ontheaverageEERand3%ontheaverageACER).Notethatthestandarddeviationacross all13presentationattackinstrumentsisalsoreducedcomparedtopriorapproaches,eventhough someofthem[30,88]utilizeauxiliarydatasuchasdepthandtemporalinformation. ,wereducetheEERsofreplay,halfmask,transparentmask,siliconemask,paper mask,mannequinhead,obfuscation,impersonation,andpaperglassesrelativelyby27%,33%, 43%,93%,34%,59%,and6%,respectively.Amongallthe13presentationattackinstruments, detectingobfuscationattacksisthemostchallenging.Thisisduetothefactthatthemakeup appliedintheseattacksareverysubtleandmajorityofthefacesarePriorworkswere 54 notsuccessfulindetectingtheseattacksandpredictmostoftheobfuscationattacksas Bylearningdiscriminativefeatureslocally,ourproposednetworkimprovesthestate-of-the-art obfuscationattackdetectionperformanceby59%intermsofEERand46%intermsofACER. 2.6.5SiW-M:DetectingKnownAttacks Hereallthe13presentationattackinstrumentsinSiW-Mareusedfortrainingaswellastesting. WerandomlysplittheSiW-Mdatasetintoa60%-40%training/testingsplitandreporttheresults inTable2.7.Incomparisontoastate-of-the-artfacepresentationattackdetectionmethod[30], ourmethodoutperformsforalmostalloftheindividualpresentationattackinstrumentsaswell astheoverallperformanceacrosspresentationattackinstruments. Auxiliary[30] utilizesdepth andtemporalinformationforpresentationattackdetectionwhichaddscomplexityto thenetwork.Wethatcharacterizinglocalspatialregionsasattackin factleadstobettergeneralizationonunknownattacks(seeTable2.6)andspecializationonknown attacks. 
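The protocol above reports ACER and EER across the 13 leave-one-out splits, as well as TDR at a fixed 2.0% FDR. A minimal NumPy sketch of these detection metrics is shown below; it assumes higher scores indicate a presentation attack, and the exact thresholding conventions of the SiW-M protocol may differ slightly.

```python
import numpy as np

def apcer_bpcer(bonafide_scores, attack_scores, threshold):
    """Higher score = more likely a presentation attack."""
    bonafide_scores, attack_scores = np.asarray(bonafide_scores), np.asarray(attack_scores)
    apcer = float(np.mean(attack_scores < threshold))    # attacks accepted as bona fide
    bpcer = float(np.mean(bonafide_scores >= threshold)) # bona fides rejected as attacks
    return apcer, bpcer

def acer(bonafide_scores, attack_scores, threshold=0.5):
    apcer, bpcer = apcer_bpcer(bonafide_scores, attack_scores, threshold)
    return (apcer + bpcer) / 2.0

def eer(bonafide_scores, attack_scores):
    thresholds = np.unique(np.concatenate([bonafide_scores, attack_scores]))
    errors = [apcer_bpcer(bonafide_scores, attack_scores, t) for t in thresholds]
    gaps = [abs(a - b) for a, b in errors]
    a, b = errors[int(np.argmin(gaps))]
    return (a + b) / 2.0                                  # error rate where APCER ~= BPCER

def tdr_at_fdr(bonafide_scores, attack_scores, fdr=0.02):
    # Threshold at which `fdr` of the bona fide samples are falsely detected as attacks.
    t = np.quantile(np.asarray(bonafide_scores), 1.0 - fdr)
    return float(np.mean(np.asarray(attack_scores) >= t))
```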
2.6.6EvaluationonOulu-NPUDataset WefollowthefourstandardprotocolsintheOULU-NPUdataset[2]whichcoverthecross- background,cross-presentation-attack-instrument(cross-PAI),cross-capture-device,andcross- conditionsevaluations: ProtocolI: unseensubjects,illumination,andbackgrounds; ProtocolII: unseensubjectsandattackdevices; ProtocolIII: unseensubjectsandcameras; ProtocolIV: unseensubjects,illumination,backgrounds,attackdevices,andcameras. Wecomparetheproposed SSR-FCN withthebestperformingmethod,namelyGRADI- ENT[131],inIJCBMobileFaceCompetition[131]foreachprotocol.We 55 Table2.8ErrorRates(%)oftheproposed SSR-FCN andandcompetingfacepresentationattack detectorsunderthefourstandardprotocolsofOulu-NPU[2]. Protocol Method APCER BPCER ACER I GRADIENT[131] 1.3 12.5 6.9 Auxiliary[30] 1.6 1.6 1.6 DeepPixBiS[124] 0.8 0.0 0.4 TSCNN-ResNet[132] 5.1 6.7 5.9 SSR-FCN (Proposed) 1.5 7.7 4.6 II GRADIENT[131] 3.1 1.9 2.5 Auxiliary[30] 2.7 2.7 2.7 DeepPixBiS[124] 11.4 0.6 6.0 TSCNN-ResNet[132] 7.6 2.2 4.9 SSR-FCN (Proposed) 3.1 3.7 3.4 III GRADIENT[131] 2.1 3.9 5 : 0 5 : 3 3 : 8 2 : 4 Auxiliary[30] 2 : 7 1 : 3 3 : 1 1 : 7 2 : 9 1 : 5 DeepPixBiS[124] 11 : 7 19 : 6 10 : 6 14 : 1 11 : 1 9 : 4 TSCNN-ResNet[132] 3 : 9 2 : 8 7 : 3 1 : 1 5 : 6 1 : 6 SSR-FCN (Proposed) 2.9 2.1 2.7 3.2 2.8 2.2 IV GRADIENT[131] 5.0 4.5 15 : 0 7 : 1 10 : 0 5 : 0 Auxiliary[30] 9 : 3 5 : 6 10 : 4 6 : 0 9.5 6.0 DeepPixBiS[124] 36 : 7 29 : 7 13 : 3 16 : 8 25 : 0 12 : 7 TSCNN-ResNet[132] 11 : 3 3 : 9 9.7 4.8 9 : 8 4 : 2 SSR-FCN (Proposed) 8 : 3 6 : 8 13 : 3 8 : 7 10 : 8 5 : 1 alsoincludesomenewerbaselinemethods,includingAuxiliary[30],DeepPixBiS[124],and TSCNN[132].Wecompareourproposedmethodwith10baselinesintotalforeachprotocol. Additionalbaselinescanbefoundinsupplementarymaterial. InTable2.8, SSR-FCN achievesACERsof4.6%,3.4%,2.8%,and10.8%inthefourprotocols, respectively.Amongthebaselines, SSR-FCN evenoutperformsprevailingstate-of-the-artmethods 56 Table2.9Cross-DatasetHTER(%)oftheproposed SSR-FCN andcompetingfacepresentation attackdetectors. Method CASIA ! Replay Replay ! CASIA CNN[112] 48.5 45.5 ColorTexture[102] 47.0 49.6 FaceSpoofBuster[133] 43.3 53.0 Auxiliary[30] 27.6 28.4 De-Noising[125] 28.5 41.1 Damer&Dimitrov[134] 28.4 38.1 STASN[135] 31.5 30.9 SAPLC[127] 27.3 37.5 SSR-FCN (Proposed) 19.9 41.9 fiCASIA ! ReplayfldenotestrainingonCASIAandtestingonReplay-Attack inprotocolIIIwhichcorrespondstogeneralizationperformanceforunseensubjectsandcameras. Theresultsarecomparabletobaselinemethodsintheotherthreeprotocols.SinceOulu-NPU comprisesofonlyprintandreplayattacks,amajorityofthebaselinemethodsincorporateauxiliary informationsuchasdepthandmotion.Indeed,incorporatingauxiliaryinformationcouldimprove theresultsattheriskofovandoverheadcostandtime. 2.6.7Cross-DatasetGeneralization Inordertoevaluatethegeneralizationperformanceof SSR-FCN whentrainedononedatasetand testedonanother,followingpriorstudies,weperformacross-datasetexperimentbetweenCASIA- FASD[3]andReplay-Attack[4]. 
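For the cross-dataset experiment just described, results (Table 2.9) are reported as HTER. The sketch below shows one common way to compute it, assuming the operating threshold is fixed at the EER point of the source dataset and then applied unchanged to the target dataset; this threshold-selection rule is an assumption, not a detail stated in the text.

```python
import numpy as np

def eer_threshold(bonafide_scores, attack_scores):
    bonafide_scores, attack_scores = np.asarray(bonafide_scores), np.asarray(attack_scores)
    thresholds = np.unique(np.concatenate([bonafide_scores, attack_scores]))
    gaps = [abs(np.mean(attack_scores < t) - np.mean(bonafide_scores >= t)) for t in thresholds]
    return float(thresholds[int(np.argmin(gaps))])

def cross_dataset_hter(src_bona, src_attack, tgt_bona, tgt_attack):
    t = eer_threshold(src_bona, src_attack)             # e.g., fixed on CASIA-FASD
    far = np.mean(np.asarray(tgt_attack) < t)           # attacks accepted on the target dataset
    frr = np.mean(np.asarray(tgt_bona) >= t)            # bona fides rejected on the target dataset
    return 100.0 * float(far + frr) / 2.0               # HTER (%), e.g., CASIA -> Replay-Attack
```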
In Table 2.9, we find that, compared to 6 prevailing state-of-the-art methods, the proposed SSR-FCN achieves the lowest error (a 27% improvement in HTER) when trained on CASIA-FASD [3] and evaluated on Replay-Attack [4]. On the other hand, SSR-FCN achieves worse performance when trained on Replay-Attack and tested on CASIA-FASD. This can likely be attributed to the higher resolution images in CASIA-FASD compared to Replay-Attack. This demonstrates that SSR-FCN trained with higher-resolution data can generalize better on poorer quality testing images, but the reverse may not hold true. We intend on addressing this limitation in future work. Additional baselines can be found in the supplementary material. Figure 2.8 Example cases where the proposed framework, SSR-FCN, fails to correctly classify bona fides and presentation attacks. (a) Bona fides misclassified as presentation attacks, likely due to bright lighting and occlusions in face regions. (b) Presentation attacks misclassified as bona fides due to the subtle nature of make-up attacks and transparent masks. Corresponding attack scores (∈ [0, 1]) are provided below each image. A larger value of the attack score indicates a higher likelihood that the input image is a presentation attack. Decision threshold is 0.5. 2.6.8 Failure Cases Even though experiment results show enhanced generalization performance, our model still fails to correctly classify certain input images. In Figure 2.8, we show a few such examples. Figure 2.8a shows incorrect prediction of bona fides as presentation attacks in the presence of inhomogeneous illumination. This is because the model predicts bona fides as being one of replay and print attacks, which exhibit bright lighting patterns due to the recapturing media such as smartphones and laptops. Since we fine-tune our network via regional supervision in Stage II, artifacts that obstruct parts of the faces can also adversely affect our model. Figure 2.9 Visualizing presentation attack regions via the proposed SSR-FCN. Red regions indicate a higher likelihood of being a presentation attack region. Corresponding attack scores (∈ [0, 1]) are provided below each image. A larger value of the attack score indicates a higher likelihood that the input image is a presentation attack. Decision threshold is 0.5. Figure 2.8b shows incorrect classification of presentation attacks as bona fides. This is particularly true when presentation attack artifacts are very subtle, such as cosmetic and obfuscation make-up attacks. Transparent masks can also be problematic when the mask itself is barely visible. 2.6.9 Computational Requirement Since face presentation attack detection modules are employed as a pre-processing step for automated face recognition systems, it is crucial that the presentation attack prediction time should be as low as possible. The proposed approach comprises of 1.5M trainable parameters compared to a traditional CNN [30] with 3M learnable parameters. The proposed SSR-FCN takes under 2 hours to train both Stage I and Stage II, and 4 milliseconds to predict presentation attacks in a single (256 × 256) face image on a Nvidia GTX 1080 Ti GPU. In other words, SSR-FCN can process frames at 250 Frames Per Second (FPS) and the size of the model is only 11.8 MB. Therefore, SSR-FCN is well suited for deployment where real-time decisions are required. (a) Paper Eyeglasses (b) Average Eyeglasses Score Map Figure 2.10 A partial presentation attack artifact may be present in a small portion of the input 256 × 256 face image, such as (a) paper eyeglasses. However, since the proposed SSR-FCN dynamically aggregates decisions across multiple receptive fields in the image, a majority of the pixels in the score map comprise of high scores (indicating the presence of a presentation attack). We visualize the average score map across all paper eyeglass attacks in (b).
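To reproduce timing numbers like those in Sec. 2.6.9 (about 4 ms per 256 × 256 face, i.e., roughly 250 FPS), a simple benchmarking loop suffices. The `predict_fn` interface below is a placeholder for whatever inference call a deployed detector exposes; it is not the released SSR-FCN API.

```python
import time
import numpy as np

def benchmark(predict_fn, input_shape=(1, 256, 256, 3), n_warmup=10, n_runs=200):
    dummy = np.random.rand(*input_shape).astype(np.float32)
    for _ in range(n_warmup):                 # warm-up excludes one-time graph/initialization cost
        predict_fn(dummy)
    start = time.perf_counter()
    for _ in range(n_runs):
        predict_fn(dummy)
    per_image = (time.perf_counter() - start) / n_runs
    print(f"latency: {per_image * 1e3:.2f} ms/image, throughput: {1.0 / per_image:.0f} FPS")
```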
2.6.10 Visualizing Presentation Attack Regions SSR-FCN can automatically locate the individual presentation attack regions in an input face image. In Figure 2.9, we show heatmaps from the score maps extracted for a randomly chosen image from all presentation attack instruments. Red regions indicate a higher likelihood of presentation attack. For a bona fide input image, the predicted presentation attack regions are sparse with low likelihoods. In the case of replay and print attacks, the predicted presentation attack regions are located throughout the entire face image. This is because these presentation attacks contain global-level noise. For mask attacks, including half-mask, silicone mask, transparent mask, paper mask, and mannequin, the presentation attack patterns are confined near the eye and nose regions. Make-up attacks are harder to detect since they are very subtle in nature. The proposed SSR-FCN detects obfuscation and cosmetic attack attempts by learning local discriminative cues around the eyebrow regions. In contrast, impersonation make-up patterns exist throughout the entire face. We also find that SSR-FCN can precisely locate the presentation attack artifacts, such as funny eyeglasses, paper glasses, and paper cut, in partial attacks. 2.7 Discussion We show that the proposed SSR-FCN achieves superior generalization performance on the SiW-M dataset [1] compared to the prevailing state-of-the-art methods that tend to overfit on the seen presentation attack instruments. Our method also achieves comparable performance to the state-of-the-art on the Oulu-NPU dataset [2] and outperforms all baselines for cross-dataset generalization performance (CASIA-FASD [3] → Replay-Attack [4]). In contrast to a number of prior studies [30, 88, 102, 123], the proposed approach does not utilize auxiliary cues for presentation attack detection, such as motion and depth information. While incorporating such cues may enhance performance on print and replay attack datasets such as Oulu-NPU, CASIA-MFSD, and Replay-Attack, it comes at the risk of potentially overfitting to these two attack types and adds compute cost. A major benefit of SSR-FCN lies in its usability. A simple pre-processing step includes face detection and alignment. The cropped face is then passed to the network. With a single forward-pass through the FCN, we obtain both the score map and the final attack score. Our proposed SSR-FCN outputs a classification decision at each pixel of the intermediate feature maps. Due to the downsampling layers found after each convolutional operation (see Table 2.2), the decisions are automatically aggregated. Therefore, the final feature map (referred to as the score map) is of much smaller resolution (16 × 16 pixels) compared to the original image. Therefore, in the case of partial attacks such as paper eyeglasses, even though a small portion of the face image comprises of a presentation attack artifact, our score map is an aggregated decision across multiple sliding windows (receptive fields) of the original image. In Figure 2.10, we compute the average score map for all the paper eyeglasses presentation attacks. We find that, even though paper eyeglasses comprise of a small portion of the original image, a majority of the pixels in the average score map comprise of high scores (indicating the presence of a presentation attack). In this paper, we obtain the final score via average pooling the score map. As future work, we intend on exploring other fusion mechanisms such as a weighted average via an attention mask. Even though the proposed method is well-suited for generalizable face presentation attack detection, SSR-FCN is still limited by the amount and quality of available training data. For instance, when trained on a low-resolution dataset, namely Replay-Attack [4], cross-dataset generalization performance suffers.
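As described in the discussion above, the final attack score is obtained by average pooling the 16 × 16 score map, and a weighted (attention-based) average is mentioned only as future work. A small sketch of both variants; the attention-weighted branch is hypothetical, not part of the reported method:

```python
import numpy as np

def attack_score(score_map, threshold=0.5, attention=None):
    """score_map: HxW array (e.g., 16x16) of per-receptive-field attack likelihoods in [0, 1]."""
    score_map = np.asarray(score_map, dtype=np.float32)
    if attention is None:
        score = float(score_map.mean())                           # average pooling (used in the paper)
    else:
        w = np.asarray(attention, dtype=np.float32)
        score = float((score_map * w).sum() / (w.sum() + 1e-8))   # hypothetical attention-weighted variant
    return score, bool(score >= threshold)                        # True => presentation attack
```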
2.8 Summary Face presentation attack detection systems are crucial for the secure operation of an automated face recognition system. With the introduction of sophisticated presentation attacks, such as high-resolution and tight-fitting silicone 3D face masks, presentation attack detectors need to be robust and generalizable. We proposed a face presentation attack detection framework, namely SSR-FCN, that achieved state-of-the-art generalization performance against 13 different presentation attack instruments. SSR-FCN reduced the average error rate of competitive algorithms by 14% on one of the largest and most diverse face presentation attack detection datasets, SiW-M, comprised of 13 presentation attack instruments. It also generalizes well when training and testing datasets are from different sources. In addition, the proposed method is shown to be more interpretable compared to prior studies since it can directly predict the parts of the faces that are considered as presentation attacks. In the future, we intend on exploring whether incorporating domain knowledge in SSR-FCN can further improve generalization performance. Chapter 3 Synthesizing and Defending Against Adversarial Faces In the previous chapter, we proposed a solution to defend AFR systems against physically crafted face spoofs. However, in this chapter, we will show that prevailing AFR systems are also vulnerable to the growing threat of adversarial examples which are digitally crafted. Our main focus is to design an automatic adversarial face synthesis method that can evade 5 state-of-the-art AFR systems. With a powerful adversarial face synthesizer, we evaluate the robustness of prevailing AFR systems against such digital adversarial attacks. The latter part of the chapter presents a state-of-the-art solution to safeguard AFR systems against any adversarial face. 3.1 Introduction (a) Enrolled Face, 0.72, 0.78, (b) Input Probe, 0.22, 0.12, (c) AdvFaces, 0.26, 0.25, (d) PGD [33], 0.14, 0.25, (e) FGSM [70] Figure 3.1 Example gallery and probe face images and corresponding synthesized adversarial examples. (a) Two celebrities' real face photos enrolled in the gallery and (b) the same subjects' probe images; (c) Adversarial examples generated from (b) by our proposed synthesis method, AdvFaces; (d-e) Results from two adversarial example generation methods. Cosine similarity scores (∈ [-1, 1]) obtained by comparing (b-e) to the enrolled image in the gallery via ArcFace [9] are shown below the images. A score above 0.28 (threshold @ 0.1% False Accept Rate) indicates that two face images belong to the same subject. Here, a successful obfuscation attack would mean that humans can identify the adversarial probes and enrolled faces as belonging to the same identity but an automated face recognition system considers them to be from different subjects. From mobile phone unlock to boarding a flight at airports, the ubiquity of automated face recognition systems (AFR) is evident. With deep learning models, AFR systems are able to achieve accuracies as high as 99% True Accept Rate (TAR) at 0.1% False Accept Rate (FAR) [22]. The model behind this success is a Convolutional Neural Network (CNN) [6, 9, 44] and the availability of large face datasets to train the model. However, CNN models have been shown to be vulnerable to adversarial perturbations 1 [69-72]. Szegedy et al. showed the dangers of adversarial examples in the image classification domain, where perturbing the pixels in the input image can cause CNNs to misclassify the image even when the amount of perturbation is imperceptible to 1 Adversarial perturbations refer to altering an input image instance with small, human imperceptible changes in a manner that can evade CNN models.
64 (a)Printattack (b)Replayattack (c)Maskattack (d)AdversarialFacessynthesizedviaproposedAdvFaces Figure3.2Threetypesoffacepresentationattacks:(a)printedphotograph,(b)replayingthetar- getedperson'svideoonasmartphone,and(c)asiliconemaskofthetarget'sface.Facepresentation attacksrequireaphysicalartifact.Adversarialattacks(d),ontheotherhand,aredigitalattacksthat cancompromiseeitheraprobeimageorthegalleryitself.Toahumanobserver,facepresentation attacks(a-c)aremoreconspicuousthanadversarialfaces(d). thehumaneye[69].Despiteimpressiverecognitionperformance,prevailingAFRsystemsarestill vulnerabletothegrowingthreatofadversarialexamples(seeFigure3.1). ToattackanAFRsystem,ahackercanmaliciouslyperturbhisfaceimageinamannerthat cancauseAFRsystemstomatchittoatargetvictim( impersonationattack )oranyidentityother thanthehacker( obfuscationattack ).Yettothehumanobserver,thisadversarialfaceimageshould appearasalegitimatefacephotooftheattacker(seeFigure3.2d).Thisisdifferentfromface pre- sentationattacks ,wherethehackerassumestheidentityofatargetbypresentingafakeface (alsoknownasspoofface)toafacerecognitionsystem(seeFigure3.2).However,inthecase ofpresentationattacks,thehackerneedstoactivelyparticipatebywearingamaskorreplayinga photograph/videoofthegenuineindividualwhichmaybeconspicuousinscenarioswherehuman operatorsareinvolved(suchasairports).Asdiscussedbelow,adversarialfaces,donotrequireac- tiveparticipationofthesubjectduringauthentication(comparisonbetweenadversarialprobeand 65 Figure3.3Eightpointsofattacksinanautomatedfacerecognitionsystem[31].Anadversarial imagecanbeinjectedintheAFRsystematpoints2and6(solidarrows). galleryimages). Considerforexample,theUnitedStatesCustomsandBorderProtection(CBP),thelargest federallawenforcementagencyintheUnitedStates[73],which(i)processesentrytothecountry forovera million travellers everyday [74]and(ii)employsautomatedfacerecognitionforverifying travelers'identities[75].InordertoevadebeingasanindividualinaCBPwatchlist, aterroristcanmaliciouslyenrollanadversarialimageinthegallerysuchthatuponenteringthe border,hislegitimatefaceimagewillbematchedtoaknownandbenignindividualortoafake identitypreviouslyenrolledinthegallery.Anindividualcanalsogenerateadversarialexamples tododgehisownidentityinordertoguardpersonalprivacy.Ratha etal. [31]eight pointsinabiometricsystemwhereanattackcanbelaunchedagainstabiometric(includingface) recognitionsystem,includingAFR(seeFigure3.3).Anadversarialfaceimagecanbeinsertedin theAFRsystematpoint2,wherecompromisedfaceembeddingswillbeobtainedbythefeature extractorthatcouldbeusedforimpersonationorobfuscationattacks.Theentiregallerycanalso becompromisedifthehackerenrollsanadversarialimageatpoint6,wherenoneoftheprobes willmatchtothecorrectidentity'sgallery. 66 Threebroadcategoriesofadversarialattackshavebeen 1. White-box attack:AmajorityofthepriorworkassumesfullknowledgeoftheCNNmodel andtheniterativelyaddsimperceptibleperturbationstotheprobeimageviavariousopti- mizationschemes[32,33,70,136Œ141].Thisisunrealisticinreal-worldscenarios,sincethe attackermaynotbeabletoaccessthemodels. 2. Black-box attack:Generally,black-boxattacksarelaunchedbyqueryingtheoutputsofthe deployedAFRsystem[142],[143].Butitmaytakealargenumberofqueriestoobtaina reasonableadversarialimage[142].Further,mostCommercial-Off-The-Shelf(COTS)face matcherspermitonlyafewqueriesatatimetopreventsuchattacks. 3. Semi-whitebox attack:Here,awhite-boxmodelisutilized onlyduringtraining andthenad- versarialexamplesaresynthesizedduringinferencewithoutanyknowledgeofthedeployed AFRmodel. 
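The obfuscation and impersonation goals described earlier in this section can be made concrete with a few lines of code. The sketch below assumes some face matcher exposes an `embed` function returning a fixed-length feature vector (e.g., from ArcFace), and reuses the 0.28 cosine-similarity threshold at 0.1% FAR quoted in Figure 3.1; both the function name and the threshold are illustrative placeholders.

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, dtype=np.float32), np.asarray(b, dtype=np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def obfuscation_succeeds(embed, adv_probe, own_gallery, threshold=0.28):
    # Success: the matcher no longer links the adversarial probe to the attacker's own gallery image.
    return cosine(embed(adv_probe), embed(own_gallery)) < threshold

def impersonation_succeeds(embed, adv_probe, target_gallery, threshold=0.28):
    # Success: the matcher links the adversarial probe to the target victim's gallery image.
    return cosine(embed(adv_probe), embed(target_gallery)) >= threshold
```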
ThischapterproposesaGAN-basedadversarialfacesynthesismethod,namely AdvFaces , thatlearnstogeneratevisuallyrealisticadversarialfaceimagesthatarebystate-of- the-artAFRsystems.ThelatterpartofthechapterutilizestheconceptslearnedfromAdvFacesin ordertodefendAFRsytemsagainstanyadversarialattacktype. 3.2RelatedWork 3.2.1GenerativeAdversarialNetworks(GANs) GenerativeAdversarialNetworks[144]havebeenshowntobesuccessfulinawidevarietyof imagesynthesisapplications[145,146]suchasstyletransfer[147Œ149],image-to-imagetrans- lation[150,151],andrepresentationlearning[145,152,153].Ourobjectiveistosynthesizeface imagesthatarenotonlyvisuallyrealisticbutarealsoabletoevadeAFRsystems. 67 3.2.2AdversarialAttacksonImage Majorityofthepublishedpapershavefocusedonwhite-boxattacks,wherethehackerhasfull accesstothemodelthatisbeingattacked[33,69,70,136,137].Otherworksfocusedonoptimizing adversarialperturbationbyminimizinganobjectivefunctionfortargetedattackswhilesatisfying certainconstraints[136].However,thesewhite-boxapproachesarenotfeasibleinthefacerecog- nitiondomain,astheattackerisunlikelytohaveaccesstothedeployedAFRsystem.Wepropose afeed-forwardnetworkthatcanautomaticallygenerateanadversarialimagewithasingleforward passwithouttheneedforanyknowledgeofAFRsystemduringinference. Indeed,feed-forwardnetworkshavebeenusedforsynthesizingadversarialattacks.Baluja andFischerproposedadeepautoencoderthatlearnstotransformaninputimagetoanadversarial image[154].StudiesonsynthesizingadversarialinstancesviaGANsarelimitedinliterature[155Œ 157].Thesemethodsrequiresoftmaxprobabilitiesinordertoevadeanimage.Instead, weproposeanidentitylossfunctionbettersuitedforgeneratingadversarialfacesusingtheface embeddingsobtainedfromafacematcher. 3.2.3AdversarialAttacksonFaceRecognition Inliterature,studiesongeneratingadversarialexamplesinthefacerecognitiondomainarerela- tivelylimited.Bose etal. craftadversarialexamplesbysolvingconstrainedoptimizationsuchthat afacedetectorcannotdetectaface[158].In[159,160],perturbationsareconstrainedtotheeye- glassregionofthefaceandadversarialimageisgeneratedbygradient-basedmethods.However, thesemethodsrelyonwhite-boxmanipulationsoffacerecognitionmodels,whichisimpractical inreal-worldscenarios.Dong etal. proposedanevolutionaryoptimizationmethodforgenerating adversarialfacesinblack-boxsettings[142].However,theyrequireatleast1,000queriestothe targetAFRsystembeforearealisticadversarialfacecanbesynthesized.Song etal. employed aconditionalvariationautoencoderGANforcraftingadversarialfaceimagesinasemi-whitebox setting[161].However,theyonlyfocusedonimpersonationattacksandrequireatleast5images ofthetargetsubjectfortrainingandinference.Incontrast,wetrainaGANthatcanperformboth 68 StudyMethodDatasetAttacksSelf-Sup. Robustness Adv.Training[11](2017)Trainwithadv.ImageNet[162]FGSM[34] RobGAN[12](2019)Trainwithgeneratedadv.CIFAR10[163],Ima- geNet[162] PGD[35] Feat.Denoising[164](2019)Customnetworkarch.ImageNet[162]PGD[35] L2L[13](2019)Trainwithgeneratedadv.MNIST[165],CIFAR10[163]FGSM[34],PGD[35], C&W[136] X Detection Gong etal. [166](2017)BinaryCNNMNIST[165],CIFAR10[163]FGSM[34] UAP-D[167](2018)PCA+SVMMEDS[168],MultiPIE[169], PaSC[170] UAP[171] SmartBox[172](2018)AdaptiveNoiseYaleFace[173]DeepFool[141],EAD[174], FGSM[34] ODIN[175](2018)Out-of-distributionDetectionCIFAR10[163],Ima- geNet[162] OODsamples Goswami etal. [176](2019)SVMonAFRFiltersMEDS[168],PaSC[170], MBGC[177] Black-box,EAD[174] Steganalysis[178](2019)SteganlysisImageNet[162]FGSM[34],DeepFool[141], C&W[136] Massoli etal. 
[179](2020)MLP/LSTMonAFRFiltersVGGFace2[180]BIM[181],FGSM[34], C&W[136] Agarwal etal. [182](2020)ImageTransformationImageNet[162],MBGC[177]FGSM[34],PGD[35],Deep- Fool[141] MagNet[14](2017)AEMNIST[165],CIFAR10[163]FGSM[34],DeepFool[141], C&W[136] DefenseGAN[15](2018)GANMNIST[165],CIFAR10[163]FGSM[34],C&W[136] Feat.Distillation[183](2019)JPEG-compressionMNIST[165],CIFAR10[163]FGSM[34],DeepFool[141], C&W[136] NRP[184](2020)AEImageNet[162]FGSM[34] X A-VAE[185](2020)VariationalAELFW[8]FGSM[34],PGD[35], C&W[136] FaceGuard (thisstudy)Adv.Generator+DetectorLFW[8],Celeb-A[17], FGSM[34],PGD[35],Deep- Fool[141], X +FFHQ[18] AdvFaces[16],GFLM[32],Semantic[186] Table3.1Relatedworkinadversarialdefensesusedasbaselinesinourstudy.Unlikemajority ofpriorwork, FaceGuard isself-supervisedwherenopre-computedadversarialexamplesarere- quiredfortraining. obfuscationandimpersonationattacksandrequiresasinglefaceimageofthetargetsubject. 3.2.4DefensesAgainstAdversarialAttacks Inliterature,acommondefensestrategy,namely robustness istore-trainthewewishto defendwithadversarialexamples[11,13,34,35].However,adversarialtraininghasbeenshownto degradeaccuracyonreal(non-adversarial)images[187,188]. InordertopreventdegradationinAFRperformance,alargenumberofadversarialdefense mechanismsaredeployedasapre-processingstep,namely detection ,whichinvolvestraininga binarytodistinguishbetweenrealandadversarialexamples[166,167,172,179,189Œ 69 199].Theattacksconsideredinthesestudies[200Œ203]wereinitiallyproposedintheobject recognitiondomainandtheyoftenfailtodetecttheattacksinafeature-extractionnetworksetting, asinfacerecognition.Therefore,prevailingdetectorsagainstadversarialfacesaredemonstrated tobeeffectiveonlyinahighlyconstrainedsettingwherethenumberofsubjectsislimitedand edduringtrainingandtesting[167,172,179]. Anotherpre-processingstrategy,namely ,involvesautomaticallyremovingadver- sarialperturbationsintheinputimagepriortopassingthemtoafacematcher[14,15,184,204]. However,withoutadedicatedadversarialdetector,thesedefensesmayendupfipurifyingflareal faceimage,resultinginhighfalserejectrates. InTab.3.1,wesummarizeafewstudiesonadversarialdefensesthatareusedasbaselinesin ourwork. 3.3SynthesizingAdversarialFaces Semi-whiteboxsettingsaremoreappropriateforcraftingadversarialfaces;oncethenetworklearns togeneratetheperturbedinstancesbasedonasinglefacerecognitionsystem,attackscanbetrans- ferredtoanyblack-boxAFRsystems.However,pastapproaches,basedonGenerativeAdversarial Networks(GANs)[155Œ157],wereproposedintheimagedomainandrelyonsoft- maxprobabilities[155Œ157,161,205].Therefore,thenumberofobjectclassesareassumedto beknownduringtrainingandtesting.Facerecognitionsystemsdonotutilizethesoftmaxlayer for(asthenumberofidentitiesarenoted)insteadfeaturesfromthelastfullycon- nectedlayerareusedforcomparingfaceimages.Approachesforcraftingadversarialfacesinclude addingmakeup,eyeglasses,hat,orocclusionstofaces[159Œ161,205,206]. Weemphasizethefollowingrequirementsoftheadversarialfacegenerator: Adversarialfaceimagesshouldbeperceptuallyrealisticsuchthatahumanobservercan identifytheimageasalegitimatefaceimage. 
Thefacesneedtobeperturbedinamannersuchthattheycannotbeasthehacker 70 (a)ObfuscationAttack (b)ImpersonationAttack Figure3.4Oncetrained,AdvFacesautomaticallygeneratesanadversarialfaceimage.Duringan obfuscationattack,(a)theadversarialfaceappearstobeabenignexampleofCristianoRonaldo's face,however,itfailstomatchhisenrolledimage.AdvFacescanalsocombineCristiano'sprobe andBradPitt'sprobetosynthesizeanadversarialimagethatlookslikeCristianobutmatches Brad'sgalleryimage(b). ( obfuscationattack )orautomaticallymatchedtoatargetsubject( impersonationattack )by anAFRsystem. Theamountofperturbationshouldbecontrollablebythehackersothathecanexaminethe successofthelearningmodelasafunctionofamountofperturbation. Theadversarialexamplesshouldbe transferable and model-agnostic ( i . e .treatthetargetAFR modelasablack-box).Inotherwords,thegeneratedadversarialexamplesshouldhavehigh attacksuccessrateonotherblack-boxAFRsystemsaswell. Weproposeanautomatedadversarialfacesynthesismethod,named AdvFaces ,whichgenerates anadversarialimageforaprobefaceimageandalltheaboverequirements(seeFig.3.4). 71 Thecontributionsofthepaperareasfollows: 1. GAN-basedAdvFacesthatlearnstogeneratevisuallyrealisticadversarialfaceimagesthat arebystate-of-the-artAFRsystems. 2. AdversarialfacesgeneratedviaAdvFacesaremodel-agnosticandtransferable,andachieve highsuccessrateon5state-of-the-artautomatedfacerecognitionsystems. 3. Perceptualstudieswherehumanobserverssuggestthattheadversarialexamplesappearsim- ilartotheprobe. 4. Visualizingthefacialregions,wherepixelsareperturbedandanalyzingthetransferability ofAdvFaces. 5. Anopen-source 2 automatedadversarialfacegeneratorpermittinguserstocontroltheamount ofperturbation. 3.3.1ProposedMethodology Ourgoalistosynthesizeafaceimagethatvisuallyappearstopertaintothetargetface,yetauto- maticfacerecognitionsystemseitherincorrectlymatchesthesynthesizedimagetoanotherperson ordoesnotmatchtotarget'sgalleryimages.AdvFacescomprisesofagenerator G ,adiscriminator D ,andfacematcher(seeFigure3.5). Generator Theproposedgeneratortakesaninputfaceimage, x 2X ,andoutputsanimage, G ( x ) .Thegeneratorisconditionedontheinputimage x ;fordifferentinputfaces,wewillget differentsynthesizedimages. Sinceourgoalistoobtainanadversarialimagethatismetricallysimilartotheprobeinthe imagespace, x ,itisnotdesirabletoperturballthepixelsintheprobeimage.Forthisreason, wetreattheoutputfromthegeneratorasanadditivemaskandtheadversarialfaceisas x + G ( x ) .Ifthemagnitudeofthepixelsin G ( x ) isminimal,thentheadversarialimagecomprises mostlyoftheprobe x .Here,wedenote G ( x ) asanfiadversarialmaskfl.Inordertoboundthe 2 https://github.com/ronny3050/AdvFaces 72 Figure3.5Givenaprobefaceimage,AdvFacesautomaticallygeneratesanadversarialmaskthat isthenaddedtotheprobetoobtainanadversarialfaceimage. magnitudeoftheadversarialmask,weintroducea perturbationloss duringtrainingbyminimizing the L 2 norm 3 : L perturbation = E x [max( kG ( x ) k 2 )] (3.3.1) where 2 [0 ; 1 ) isahyperparameterthatcontrolstheminimumamountofperturbationallowed. 
Inordertoachieveourgoalofimpersonatingatargetsubject'sfaceorobfuscatingone'sown identity,weneedafacematcher, F ,tosupervisethetrainingofAdvFaces.Forobfuscationattack, ateachtrainingiteration,AdvFacestriestominimizethecosinesimilaritybetweenfaceembed- dingsoftheinputprobe x andthegeneratedimage x + G ( x ) viaan identity lossfunction: L identity = E x [ F ( x ; x + G ( x ))] (3.3.2) Foranimpersonationattack,AdvFacesmaximizesthecosinesimilaritybetweenthefaceembed- dingsofarandomlychosentarget'sprobe, y ,andthegeneratedadversarialface x + G ( x ) via: L identity = E x [1 F ( y ; x + G ( x ))] (3.3.3) 3 Forbrevity,wedenote E x E x 2X . 73 ObfuscationAttack AdvFacesGFLM[32]PGD[33]FGSM[70] AttackSuccessRate(%)@0.1%FAR FaceNet[6]99.6723.3499.70 99.96 SphereFace[44]97.2229.49 99.34 98.71 ArcFace[9] 64.53 03.4333.2535.30 COTS-A 82.98 08.8918.7432.48 COTS-B 60.71 05.0501.4918.75 StructuralSimilarity 0.95 0.01 0.82 0.120.29 0.060.25 0.06 ComputationTime(s) 0.01 3.2211.740.03 ImpersonationAttack AdvFacesA 3 GN[161]PGD[33]FGSM[70] AttackSuccessRate(%)@0.1%FAR FaceNet[6]20.85 0.4005.99 0.19 76.79 0.26 13.04 0.12 SphereFace[44] 20.19 0.27 07.94 0.1909.03 0.3902.34 0.03 ArcFace[9] 24.30 0.44 17.14 0.2919.50 1.9508.34 0.21 COTS-A 20.75 0.35 15.01 0.3001.76 0.1001.40 0.08 COTS-B 19.85 0.28 10.23 0.5012.49 0.2404.67 0.16 StructuralSimilarity 0.92 0.02 0.69 0.040.77 0.040.48 0.75 ComputationTime(s) 0.01 0.0411.740.03 White-boxmatcher(usedfortraining) Black-boxmatcher(neverusedintraining) Table3.2Attacksuccessratesandstructuralsimilaritiesbetweenprobeandgalleryimagesfor obfuscationandimpersonationattacks.Attackratesforobfuscationcomprisesof484,514com- parisonsandthemeanandstandarddeviationacross10-foldsforimpersonationreported.The meanandstandarddeviationofthestructuralsimilaritiesbetweenadversarialandprobeimages alongwiththetimetakentogenerateasingleadversarialimage(onaQuadroM6000GPU)also reported. Theperturbationandidentitylossfunctionsenforcethenetworktolearnthesalientfacial regionsthatcanbeperturbedminimallyinordertoevadeautomaticfacerecognitionsystems. Discriminator AkintopreviousworksonGANs[144,150],weintroduceadiscriminatorin ordertoencourageperceptualrealismofthegeneratedimages.Weuseafully-convolutionnetwork asapatch-baseddiscriminator[150].Here,thediscriminator, D ,aimstodistinguishbetweena probe, x ,andageneratedadversarialfaceimage x + G ( x ) viaaGANloss: L GAN = E x [log D ( x )]+ E x [log(1 D ( x + G ( x )))] (3.3.4) 74 Finally,AdvFacesistrainedinanend-to-endfashionwiththefollowingobjectives: min D L D = GAN (3.3.5) min G L G = L GAN + i L identity + p L perturbation (3.3.6) where i and p arehyper-parameterscontrollingtherelativeimportanceofidentityandperturba- tionlosses,respectively.Notethat L GAN and L perturbation encouragethegeneratedimagestobe visuallysimilartotheoriginalfaceimages,while L identity optimizesforahighattacksuccessrate. Aftertraining,thegenerator G cangenerateanadversarialfaceimageforanyinputimageandcan betestedonanyblack-boxfacerecognitionsystem. 3.3.2ExperimentalSettings EvaluationMetrics WequantifytheeffectivenessoftheadversarialattacksgeneratedbyAdv- Facesandotherstate-of-the-artbaselinesvia(i) attacksuccessrate and(ii) structuralsimilarity (SSIM) . 
Theattacksuccessratefor obfuscationattack iscomputedas, AttackSuccessRate = (No.ofComparisons <˝ ) TotalNo.ofComparisons (3.3.7) whereeachcomparisonconsistsofasubject'sadversarialprobeandanenrollmentimage.Here, ˝ isapre-determinedthresholdcomputedat,say,0.1%FAR 4 .Attacksuccessratefor impersonation attack isas, AttackSuccessRate = (No.ofComparisons ˝ ) TotalNo.ofComparisons (3.3.8) Here,acomparisoncomprisesofanadversarialimagesynthesizedwithatarget'sprobeand matchedtothetarget'senrolledimage.Weevaluatethesuccessratefortheimpersonationset- tingvia10-foldcross-validationwhereeachfoldconsistsofarandomlychosentarget. Similartopriorstudies[161],inordertomeasurethesimilaritybetweentheadversarialex- 4 Foreachfacematcher,wepre-computethethresholdat 0 : 1% FARonallpossibleimagepairsinLFW.For e . g ., threshold@ 0 : 1% FARforArcFaceis 0 : 28 . 75 ampleandtheinputface,wecomputethestructuralsimilarityindex(SSIM)betweentheimages. SSIMisanormalizedmetricbetween 1 (completelydifferentimagepairs)to 1 (identicalimage pairs). Datasets WetrainAdvFacesonCASIA-WebFace[10]andthentestonLFW[8] 5 . CASIA-WebFace [10]iscomprisedof494,414faceimagesbelongingto10,575different subjects.Weremoved84subjectsthatarealsopresentinLFWandthetestingimagesinthis chapter. LFW [8]contains13,233web-collectedimagesof5,749differentsubjects.Inorderto computetheattacksuccessrate,weonlyconsidersubjectswithatleasttwofaceimages. Afterthis9,614faceimagesof1,680subjectsareavailableforevaluation. Allthetestingimagesinthischapterhavenoidentityoverlapwiththetrainingset,CASIA- WebFace[10]. ExperimentalSettings WeuseADAMoptimizersinTwwith 1 =0 : 5 and 2 =0 : 9 for theentirenetwork.Eachmini-batchconsistsof 32 faceimages.WetrainAdvFacesfor200,000 stepswithaedlearningrateof 0 : 0001 .Sinceourgoalistogenerateadversarialfaceswithhigh successrate,theidentitylossisofutmostimportance.Weempiricallyset i =10 : 0 and p =1 : 0 . Wetraintwoseparatemodelsandset =3 : 0 and =8 : 0 forobfuscationandimpersonation attacks,respectively.ArchitecturedetailsareprovidedintheAddendum(Sec.3.3.9). FaceRecognitionSystems Forallourexperiments,weemploy5state-of-the-artfacematchers 6 . Threeofthemarepubliclyavailable,namely,FaceNet[6],SphereFace[44],andArcFace[9]. Wealsoreportourresultsontwocommercial-off-the-shelf(COTS)facematchers,COTS-Aand COTS-B 7 .WeuseFaceNet[6]asthewhite-boxfacerecognitionmodel, F ,duringtraining. All 5 TrainingonCASIA-WebFaceandevaluatingonLFWisacommonapproachinfacerecognitionliterature[9,44] 6 Alltheopen-sourceandCOTSmatchersachieve99%accuracyonLFWunderLFWprotocol. 7 BothCOTS-AandCOTS-ButilizeCNNsforfacerecognition.COTS-BisoneofthetopperformersintheNIST OngoingFaceRecognitionVendorTest(FRVT)[207]. 76 GalleryProbeAdvFacesGFLM[32]PGD[33]FGSM[70] 0.680.140.260.270.04 0.380.080.120.210.02 (a)ObfuscationAttack Target'sGalleryTarget'sProbeProbeAdvFacesA 3 GN[161]FGSM[70] 0.780.100.300.290.36 0.800.150.340.330.42 (b)ImpersonationAttack Figure3.6AdversarialfacesynthesisresultsonLFWdatasetin(a)obfuscationand(b)imperson- ationattacksettings(cosinesimilarityscoresobtainedfromArcFace[9]withthreshold@ 0 : 1% FAR =0 : 28 ).Theproposedmethodsynthesizesadversarialfacesthatareseeminglyinconspic- uousandmaintainhighperceptualquality.AdditionalexamplesareavailableintheAddendum (Sec.3.3.9). thetestingimagesinthischapteraregeneratedfromthesamemodel(trainedonlywithFaceNet) andtestedondifferentmatchers. 
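The success-rate definitions in Eqs. (3.3.7)-(3.3.8) reduce to simple threshold counts. A short sketch follows; the 0.1% FAR threshold τ is assumed to be pre-computed on the impostor scores of the evaluation set (e.g., roughly 0.28 for ArcFace on LFW, as stated above).

```python
import numpy as np

def threshold_at_far(impostor_scores, far=0.001):
    # Threshold at which `far` of impostor (different-subject) pairs are falsely accepted.
    return float(np.quantile(np.asarray(impostor_scores), 1.0 - far))

def obfuscation_success_rate(adv_vs_own_gallery_scores, tau):
    scores = np.asarray(adv_vs_own_gallery_scores)   # adversarial probe vs. same subject's enrollment
    return 100.0 * float(np.mean(scores < tau))      # Eq. (3.3.7)

def impersonation_success_rate(adv_vs_target_gallery_scores, tau):
    scores = np.asarray(adv_vs_target_gallery_scores)  # adversarial probe vs. target's enrollment
    return 100.0 * float(np.mean(scores >= tau))       # Eq. (3.3.8)
```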
3.3.3ComparisonwithState-of-the-Art Wecompareouradversarialfacesynthesismethodwithstate-of-the-artmethodsthathavespecif- icallybeenimplementedorproposedforfaces,includingGFLM[32],PGD[33],FGSM[70], 77 andA 3 GN[161] 8 .InTable3.2,wethatcomparedtothestate-of-the-art,AdvFacesgener- atesadversarialfacesthataresimilartotheprobe3.6.Moreover,theadversarialimagesattaina highobfuscationattacksuccessrateon4state-of-the-artblack-boxAFRsystemsinbothobfusca- tionandimpersonationsettings.AdvFaceslearnstoperturbthesalientregionsoftheface,unlike PGD[33]andFGSM[70],whichaltereverypixelintheimage.GFLM[32],ontheotherhand,ge- ometricallywarpsthefaceimagesandthereby,resultsinlowstructuralsimilarity.Inaddition,the state-of-the-artmatchersarerobusttosuchgeometricdeformationwhichexplainsthelowsuccess rateofGFLMonfacematchers.A 3 GN,anotherGAN-basedmethod,however,failstoachievea reasonablesuccessrateinanimpersonationsetting. 3.3.4AblationStudy Inordertoanalyzetheimportanceofeachmoduleinoursystem,inFigure3.7,wetrainthreevari- antsofAdvFacesforcomparisonbyremovingthediscriminator( D ),perturbationloss L perturbation , andidentityloss L identity ,respectively.Thediscriminatorhelpstoensurethevisualqualityofthe synthesizedfacesaremaintained.Withthegeneratoralone,undesirableartifactsareintroduced. Withouttheproposedperturbationloss,perturbationsintheadversarialmaskareunboundedand therefore,leadstoalackinperceptualquality.Theidentitylossisimperativeinensuringanad- versarialimageisobtained.Withouttheidentityloss,thesynthesizedimagecannotevadestate-of- the-artfacematchers.WethateverycomponentofAdvFacesisnecessaryinordertoobtain anadversarialfacethatisnotonlyperceptuallyrealisticbutcanalsoevadestate-of-the-artface matchers. 3.3.5WhatisAdvFacesLearning? Via L perturbation ,duringtraining,AdvFaceslearnstoperturbonlythesalientfacialregionsthat canevadethefacematcher, F (FaceNet[6]inourcase).InFigure3.8,AdvFacessynthesizesthe adversarialmaskscorrespondingtotheprobes.Wethenthresholdthemasktoextractpixelswith 8 Wetrainthebaselinesusingtheirofimplementations(detailedintheAddendum(Sec.3.3.9)). 78 Inputw/o D w/o L prt w/o L idt withall Figure3.7VariantsofAdvFacestrainedwithoutthediscriminator,perturbationloss,andidentity loss,respectively.EverycomponentofAdvFacesisnecessary. perturbationmagnitudesexceeding 0 : 40 .Itcanbeinferredthattheeyebrows,eyeballs,andnose containhighlydiscriminativeinformationthatanAFRsystemutilizestoidentifyanindividual. Therefore,perturbingthesesalientregionsareenoughtoevadestate-of-the-artfacerecognition systems. 3.3.6TransferabilityofAdvFaces InTable3.2,wethatattackssynthesizedbyAdvFaceswhentrainedonawhite-boxmatcher (FaceNet),cansuccessfullyevade5otherfacematchersthatarenotutilizedduringtrainingin bothobfuscationandimpersonationsettings.Inordertoinvestigatethetransferabilityproperty ofAdvFaces,weextractfaceembeddingsofrealimagesandtheircorrespondingadversarialim- ages,undertheobfuscationsetting,viathewhite-boxmatcher(FaceNet)andablack-boxmatcher (ArcFace).Intotal,weextractfeaturevectorsfrom1,456faceimagesof10subjectsintheLFW dataset[8].InFigure3.9,weplotthecorrelationheatmapbetweenfacefeaturesofrealimages, theircorrespondingadversarialmasksandadversarialimages.First,weobservethatfaceem- beddingsofrealimagesextractedbyFaceNetandArcFacearecorrelatedinasimilarfashion. 79 ProbeAdv.MaskVisualizationAdv.Image 0.12 0.26 Figure3.8State-of-the-artfacematcherscanbeevadedbyslightlyperturbingsalientfacialregions, suchaseyebrows,eyeballs,andnose(cosinesimilarityobtainedviaArcFace[9]). 
Figure3.9CorrelationbetweenfacefeaturesextractedviaFaceNetandArcFacefrom1,456images belongingto10subjects. Thisindicatesthatbothmatchersextractfeatureswithrelatedpairwisecorrelations.Consequently, perturbingsalientfeaturesforFaceNetcanleadtohighattacksuccessratesforArcFaceaswell. Thesimilarityamongthecorrelationdistributionsofbothmatcherscanalsobeobservedwhen 80 Figure3.102Dt-SNEvisualizationoffacerepresentationsextractedviaFaceNetandArcFace from1,456imagesbelongingto10subjects. adversarialmasksandadversarialimagesareinputtothematchers.Thatis,receptivefor automaticfacerecognitionsystemsattendtosimilarregionsintheface.Tofurtherillustratethe distributionsoftheembeddingsofrealandsynthesizedimages,weplotthe2Dt-SNEvisualization ofthefaceembeddingsforthe10subjectsinFigure3.10.Theidentityclusteringscanbeclearly observedfrombothrealandadversarialimages.Inparticular,theadversarialcounterpartofeach subjectformsanewclusterthatdrawsclosertotheadversarialclusteringsofothersubjects.This showsthatAdvFacesperturbsonlysalientpixelsrelatedtofaceidentitywhilemaintainingase- manticmeaninginthefeaturespace,resultinginasimilarmanifoldofsynthesizedfacestothatof realfaces. 3.3.7EffectofPerturbationAmount Theperturbationloss, L perturbation isboundedbyahyper-parameter, , i . e .,the L 2 normofthe adversarialmaskmustbeatleast .Withoutthisconstraint,theadversarialmaskbecomesablank imagewithnochangestotheprobe.With ,wecanobserveatrade-offbetweentheattacksuccess 81 Figure3.11Trade-offbetweenattacksuccessrateandstructuralsimilarityforimpersonationat- tacks.Wechoose =8 : 0 . rateandthestructuralsimilaritybetweentheprobeandsynthesizedadversarialface(Fig.3.11).A higher leadstolessperturbationrestriction,resultinginahigherattacksuccessrateatthecostof alowerstructuralsimilarity.Foranimpersonationattack,thisimpliesthattheadversarialimage maycontainfacialfeaturesfromboththehackerandthetarget.Inourexperiments,wechose =8 : 0 and =3 : 0 forimpersonationandobfuscationattacks,respectively. 3.3.8HumanPerceptualStudy For500realfaceimages(probes),wegenerate500correspondingadversarialexamplesviaAd- vFaces,GFLM[32],A 3 GN[161],PGD[33],andFGSM[70].Wethenperformedauserstudy onAmazonMechanicalTurk(AMT).Aworkerisshownaprobealongwiththe5adversarial faces.Theworkerthenhasunlimitedtimetodecidewhichadversarialface,amongthe5possible choices,isthemostsimilartotheprobe. 82 (a)Probe (b)AdvFaces (c)GFLM[32] (d)Probe (e)AdvFaces (f)PGD[33] Figure3.12Examplefailurecaseswherehumanobserversvotedanadversarialimagesynthesized by(c)GFLM[32]and(f)PGD[33]tobeclosertotheprobeface(a),(d)comparedtoAdvFaces (b),(e). Table3.3Foreachmethod,theaverageandstandarddeviation( % )ofthenumberoftimesworkers chosethesynthesizedimagetobeclosesttotheprobe. Method HitRate(%) AdvFaces 62.06 5.06 GFLM[32] 23 : 52 3 : 82 A 3 GN[161] 00 : 60 0 : 84 PGD[33] 12 : 80 3 : 88 FGSM[70] 00 : 38 0 : 73 Intotal,wecomputeresultsfrom100differentworkers.FromTables3.2and3.3,wethat AdvFacesgeneratesadversarialfacesthatarenotonlyeffectiveinevadingfacematchers,butare alsovisuallysimilartotheprobesandoutperformsthestate-of-the-artadversarialfacesynthesis methods.Indeed,someadversarialfaceimagessynthesizedbythebaselinesarevotedtobecloser totheprobe(seeFigure3.12),however,comparedtoAdvFaces,thesemethodshavealowsuccess rate(seeTable3.2). 83 3.3.9Addendum 3.3.9.1ImplementationDetails AdvFaces isimplementedusingTwr1.12.0.AsingleNVIDIAQuadroM6000GPUis usedfortrainingandtesting. 
DataPreprocessing AllfaceimagesarepassedthroughMTCNNfacedetector[41]todetect velandmarks(twoeyes,nose,andtwomouthcorners).Viasimilaritytransformation,theface imagesarealigned.Aftertransformation,theimagesareresizedto 160 160 .Priortotraining andtesting,eachpixelintheRGBimageisnormalizedbysubtracting127.5anddividingby128. Architecture Let c7s1-k bea 7 7 convolutionallayerwith k andstride 1 . dk denotes a 4 4 convolutionallayerwith k andstride 2 . Rk denotesaresidualblockthatcontainstwo 3 3 convolutionallayers. uk denotesa 2 upsamplinglayerfollowedbya 5 5 convolutional layerwith k andstride 1 .WeapplyInstanceNormalizationandBatchNormalizationtothe generatoranddiscriminator,respectively.WeuseLeakyReLUwithslope0.2inthediscriminator andReLUactivationinthegenerator.Thearchitecturesofthetwomodulesareasfollows: Generator: c7s1-64,d128,d256,R256,R256,R256,u128,u64,c7s1-3 Discriminator: d32,d64,d128,d256,d512 A 1 1 convolutionallayerwith 3 andstride 1 isattachedtothelastconvolutionallayerof thediscriminatorforthepatch-basedGANloss L GAN . Weapplythe tanh activationfunctiononthelastconvolutionlayerofthegeneratortoensure thatthegeneratedimage 2 [ 1 ; 1] .Inthechapter,wedenotedtheoutputofthetanhlayerasan fiadversarialmaskfl, G ( x ) 2 [ 1 ; 1] and x 2 [ 1 ; 1] .Theadversarialimageiscomputedas x adv =2 clamp G ( x )+ x +1 2 1 0 1 .Thisensures G ( x ) caneitheraddorsubtractpixelsfrom x when G ( x ) 6 =0 .When G ( x ) ! 0 ,then x adv ! x . 84 Theoverallalgorithmdescribingthetrainingprocedureof AdvFaces canbefoundinAlgo- rithm1. Algorithm1 Training AdvFaces .Allexperimentsinthisworkuse =0 : 0001 , 1 =0 : 5 , 2 =0 : 9 , i =10 : 0 , p =1 : 0 , m =32 . Weset =3 : 0 (obfuscation), =8 : 0 (impersonation). 1: Input 2: X TrainingDataset 3: F Cosinesimilaritybetweenanimagepairobtainedbyfacematcher 4: G Generatorwithweights G 5: D Discriminatorwithweights D 6: m Batchsize 7: Learningrate 8: for numberoftrainingiterations do 9: Sampleabatchofprobes f x ( i ) g m i =1 ˘X 10: if impersonationattack then 11: Sampleabatchoftargetimages y ( i ) ˘X 12: ( i ) = G (( x ( i ) ;y ( i ) ) 13: elseif obfuscationattack then 14: ( i ) = G ( x ( i ) ) 15: endif 16: x ( i ) adv = x ( i ) + ( i ) 17: L perturbation = 1 m P m i =1 max jj ( i ) jj 2 18: if impersonationattack then 19: L identity = 1 m h P m i =1 F x ( i ) ;x ( i ) adv 20: elseif obfuscationattack then 21: L identity = 1 m h P m i =1 1 F y ( i ) ;x ( i ) adv 22: endif 23: L G GAN = 1 m h P m i =1 log 1 D ( x ( i ) adv ) 24: L D = 1 m P m i =1 h log D ( x ( i ) ) + log 1 D ( x ( i ) adv ) 25: L G = L G GAN + i L identity + p L perturbation 26: G = Adam ( O G L G ; G ;; 1 ; 2 ) 27: D = Adam ( O D L D ; D ;; 1 ; 2 ) 28: endfor 3.3.9.2EffectonCosineSimilarity InFigure3.13weseetheeffectoncosinesimilarityscoreswhenadversarialfaceimagessynthe- sizedbyAdvFacesisintroducedtoablack-boxfacematcher,ArcFace[9].Amajority( 64 : 53% ) 85 Figure3.13ShiftincosinesimilarityscoresforArcFace[9]beforeandafteradversarialattacks generatedviaAdvFaces. ofthescoresfallbelowthethresholdat 0 : 1% FARcausingtheAFRsystemtofalselyrejectunder obfuscationattack.Intheimpersonationattacksetting,thesystemfalselyaccepts 24 : 30% ofthe imagepairs. 
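Two implementation details quoted in Sec. 3.3.9.1, the pixel normalization after MTCNN alignment and the clamped composition of the adversarial image from the tanh-valued mask, can be summarized as follows. NumPy is used here for illustration; the original code is TensorFlow.

```python
import numpy as np

def normalize(aligned_face_uint8):
    # After MTCNN alignment and resizing to 160x160: (pixel - 127.5) / 128, giving values in ~[-1, 1].
    return (np.asarray(aligned_face_uint8, dtype=np.float32) - 127.5) / 128.0

def compose_adversarial(x, adv_mask):
    # x and the tanh-output mask G(x) are both in [-1, 1];
    # x_adv = 2 * clamp((G(x) + x + 1) / 2, 0, 1) - 1, so G(x) -> 0 implies x_adv -> x.
    return 2.0 * np.clip((adv_mask + x + 1.0) / 2.0, 0.0, 1.0) - 1.0
```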
3.3.9.3StructuralSimilarity Imagecomparisontechniques,suchMeanSquaredError(MSE)orPeakSignal-to-NoiseRatio (PSNR),estimatetheabsoluteerrors,disregardingthe perceptual differences;ontheotherhand, SSIMisaperception-basedmodelthatconsidersimagedifferencesasperceivedchangeinstruc- turalinformation,whilealsoincorporatingimportantperceptualphenomena,includingbothlu- minancemaskingandcontrastmaskingterms.Forinstance,considertheimagepaircomprising oftwoimagesofMingXi.Wecannoticethatperceptually,theimagepairsaresimilar,butthis perceptualsimilarityisnotappropriatelyinMSEandPSNR.Since,SSIMisanormal- izedsimilaritymetric,itisbettersuitedforourapplicationwhereafaceimagepairissubjectively judgedbyhumanoperators. 86 (a)Probe (b)Adversarial SSIM:00.96MSE:40.82PSNR:32.02 Figure3.15Left:RealfaceimagesintheLFWdataset.Right:Adversarialimagessynthesizedvia AdvFacesunderobfuscationsetting. 3.3.9.4BaselineImplementationDetails Allthestate-of-the-artbaselinesinthechapterareimplementationsproposedforevad- ingfacerecognitionsystems. 87 FGSM[70] WeusetheCleverhansimplementation 9 ofFGSMonFaceNet.Thisimplementation supportsbothobfuscationandimpersonationattacks.Theonlywaschanging = 0 : 01 to =0 : 08 inordertocreatemoreeffectiveattacks. PGD[33] WeuseavariantofPGDproposedforfacerecognitionsystems 10 .Orig- inally,thisimplementationisproposedforimpersonationattacks,however,forobfuscationwe randomlychooseatargetotherthangenuinesubject.Wedonotmakeanytothe parameters. GFLM[32] Codeforthislandmark-basedattacksynthesismethodispubliclyavailable 11 .This methodreliesonsoftmaxprobalitiesimplyingthatthetrainingandtestingidentitiesareed. Originally,theistrainedonCASIA-WebFace.However,forafairerevaluation,we trainedafaceonLFWandthenrantheattack. A 3 GN[161] Tothebestofourknowledge,thereisnopubliclyavailableimplementationof A 3 GN.Wemadethefollowingtoachieveaneffectivebaseline: TheauthorsoriginallyusedArcFace[9]asthetargetmodel.Sinceallotherbaselinesemploy FaceNetasthetargetmodel,wealsousedFaceNetfortrainingA 3 GN. Originally,acycle-consistencylosswasproposedforcontentpreservation.However,we werenotabletoreproducethisandtherefore,optedforthesame L 1 normloss,butwithout thesecondgenerator.Thisgreatlyhelpsinthevisualqualityofthegeneratedadversarial image.Thatis,wemodiEquation3[161],from L rec = E x;z [ jj x G 2 ( G 1 ( x;z )) jj 1 ] to L rec = E x;z [ jj x G 1 ( x;z ) jj 1 ] 9 https://githubw/cleverhans/tree/master/examples/facenet adversarial faces 10 https://github.com/ppwwyyxx/Adversarial-Face-Attack 11 https://github.com/alldbi/FLM 88 3.4DefendingAgainstAdversarialFaces Theaccuracy,usability,andtouchlessacquisitionofstate-of-the-art(SOTA)AFRsystemshave ledtotheirubiquitousadoptioninaplethoraofdomains.However,thishasalsoinadvertently sparkedacommunityofattackersthatdedicatetheirtimeandefforttomanipulatefaceseither physically[208,209]ordigitally[210],inordertoevadeAFRsystems[82].AFRsystemshave beenshowntobevulnerabletoadversarialattacksresultingfromperturbinganinputprobe[16, 32,142,186].Evenwhentheamountofperturbationisimperceptibletothehumaneye,such adversarialattackscandegradethefacerecognitionperformanceofSOTAAFRsystems[16]. Withthegrowingdisseminationoffifakenewsflandfideepfakesfl[81],researchgroupsandsocial mediaplatformsalikearepushingtowardsgeneralizabledefenseagainstcontinuouslyevolving adversarialattacks. 
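Sec. 3.3.9.3 argues that SSIM tracks perceived similarity better than MSE or PSNR. A sketch of computing all three for a probe/adversarial pair is given below, assuming scikit-image is available; the exact SSIM settings behind the reported numbers are not specified here, so this is only a generic reference implementation.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_similarity(probe_rgb, adversarial_rgb):
    """probe_rgb, adversarial_rgb: uint8 RGB arrays of identical shape."""
    p = np.asarray(probe_rgb, dtype=np.float64)
    a = np.asarray(adversarial_rgb, dtype=np.float64)
    mse = float(np.mean((p - a) ** 2))
    psnr = float(peak_signal_noise_ratio(p.astype(np.uint8), a.astype(np.uint8), data_range=255))
    ssim = float(structural_similarity(rgb2gray(p / 255.0), rgb2gray(a / 255.0), data_range=1.0))
    return {"MSE": mse, "PSNR": psnr, "SSIM": ssim}
```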
Figure3.16LeonardoDiCaprio'srealfacephoto(a)enrolledinthegalleryand(b)hisprobeim- age 12 ;(c)Adversarialprobesynthesizedbyastate-of-the-art(SOTA)adversarialfacegenerator, AdvFaces[16];(d)Proposedadversarialdefenseframework,namely FaceGuard takes(c)asin- put,detectsadversarialimages,localizesperturbedregions,andoutputsafifacedevoid ofadversarialperturbations.ASOTAfacerecognitionsystem,ArcFace,failstomatchLeonardo's adversarialface(c)to(a),however,thefacecansuccessfullymatchto(a).Cosinesimilar- ityscores( 2 [ 1 ; 1] )obtainedviaArcFace[9]areshownbelowtheimages.Ascoreabove 0.36 (threshold@ 0 : 1% FalseAcceptRate)indicatesthattwofacesareofthesamesubject. Aconsiderableamountofresearchhasfocusedonsynthesizingadversarialattacks[16,32,34, 35,141,186].Obfuscationattempts(facesareperturbedsuchthattheycannotbeas theattacker)aremoreeffective[16],computationallyeftosynthesize[34,35],andwidely 11 https://bit.ly/2IkfSxk 89 0.21 (a)[16] 0.27 (b)[34] 0.28 (b)[35] 0.32 (c)[186] 0.34 (d)[32] 0.35 (e)[141] Figure3.17 (TopRow) Adversarialfacessynthesizedvia 6 adversarialattacksusedinourstudy. (BottomRow) Correspondingadversarialperturbations(grayindicatesnochangefromtheinput). Noticethediversityintheperturbations.ArcFacescoresbetweenadversarialimageandtheunal- teredgalleryimage(notshownhere)aregivenbeloweachimage.Ascoreabove 0.36 indicates thattwofacesareofthesamesubject.Zoominfordetails. adopted[211]comparedtoimpersonationattacks(perturbedfacescanautomaticallymatchtoa targetsubject).Similartopriordefenseefforts[172,179],thissectionofthechapterfocuseson defendingagainstobfuscationattacks(seeFig.3.16).Givenaninputprobeimage, x ,anadversarial generatorhastworequirementsundertheobfuscationscenario:(1)synthesizeanadversarialface image, x adv = x + ,suchthatSOTAAFRsystemsfailtomatch x adv and x ,and(2)limitthe magnitudeofperturbation jj jj p suchthat x adv appearsverysimilarto x tohumans. Anumberofapproacheshavebeenproposedtodefendagainstadversarialattacks.Theirmajor shortcomingis generalizability tounseenadversarialattacks.Adversarialfaceperturbationsmay vary(seeFig.3.17).Forinstance,gradient-basedattacks,suchasFGSM[35]and PGD[35],perturbeverypixelinthefaceimage,whereas,AdvFaces[16]andSemanticAdv[186] perturbonlythesalientfacialregions, e.g. ,eyes,nose,andmouth.Ontheotherhand,GFLM[32] performsgeometricwarpingtotheface.Sincetheexacttypeofadversarialperturbationmaynotbe knownapriori,adefensesystemtrainedonasubsetofadversarialattacktypesmayhavedegraded performanceonotherunseenattacks. Tothebestofourknowledge,wetakethesteptowardsacompletedefenseagainstad- versarialfacesbyintegratinganadversarialfacegenerator,adetector,andaintoa 90 Figure3.18 FaceGuard employsadetector( D )tocomputeanadversarialscore.Scoresbelow detectionthreshold( ˝ )passestheinputtoAFR,andhighvalueinvokesaandsendsthe facetotheAFRsystem. framework,namely FaceGuard (seeFig.3.18).Robustnesstounseenadversarialattacksisim- partedviaastochasticgeneratorthatoutputsdiverseperturbationsevadinganAFRsystem,while adetectorcontinuouslylearnstodistinguishthemfromrealfaces.Concurrently,aremoves theadversarialperturbationsfromthesynthesizedimage. Thisworkmakesthefollowingcontributions: Anewself-supervisedframework,namely FaceGuard ,fordefendingagainstadversarialface images. FaceGuard combinesofadversarialtraining,detection,andinto adefensemechanismtrainedinanend-to-endmanner. 
Withtheproposeddiversityloss,ageneratorisregularizedtoproducestochasticandchal- lengingadversarialfaces.Weshowthatthediversityinoutputperturbationsissuffor improving FaceGuard 'srobustnesstounseenattackscomparedtoutilizingpre-computed trainingsamplesfromknownattacks. Synthesizedadversarialfacesaidthedetectortolearnatightdecisionboundaryaroundreal faces. FaceGuard 'sdetectorachievesSOTAdetectionaccuraciesof 99 : 81% , 98 : 73% ,and 99 : 35% on 6 unseenattacksonLFW[8],Celeb-A[17],andFFHQ[18]. Asthegeneratortrains,aconcurrentlyremovesperturbationsfromthesynthesized adversarialfaces.Withtheproposedloss,thedetectoralsoguidesstraining toensureimagesaredevoidofadversarialperturbations.At0.1%FalseAccept 91 (a)AdversarialTraining[11] (b)Detection[166] Figure3.19(a)AdversarialtrainingdegradesAFRperformanceofFaceNetmatcher[6]onreal facesinLFWdatasetcomparedtostandardtraining.(b)Abinarytrainedtodistinguish betweenrealfacesandFGSM[34]attacksfailstodetectunseenattacktype,namelyPGD[35]. Rate, FaceGuard 'senhancestheTrueAcceptRateofArcFace[9]from 34 : 27% undernodefenseto 77 : 46% . 3.4.1LimitationsofState-of-the-ArtDefenses Robustness. Adversarialtrainingisregardedasoneofthemosteffectivedefensemethod[12, 34,35]onsmalldatasetsincludingMNISTandCIFAR10.Whetherthistechniquecanscaleto largedatasetsandavarietyofdifferentattacktypes(perturbationsets)hasnotyetbeenshown. Adversarialtrainingisformulatedas[34,35]: min E ( x;y ) ˘P data max 2 ` ( f ( x + ) ;y ) ; (3.4.1) where ( x;y ) ˘P data isthe(image,label)jointdistributionofdata, f ( x ) isthenetworkparameter- izedby ,and ` ( f ( x ) ;y ) isthelossfunction(usuallycross-entropy).Sincethegroundtruthdata distribution, P data ,isnotknowninpractice,itislaterreplacedbytheempiricaldistribution.Here, thenetwork, f ismaderobustbytrainingwithanadversarialnoise( )thatmaximallyincreases theloss.Inotherwords,adversarialtraininginvolvestrainingwiththe strongest adversarialattack. 92 Figure3.20Overviewoftrainingtheproposed FaceGuard inaself-supervisedmanner.An adver- sarialgenerator , G ,continuouslylearnstosynthesizechallenginganddiverseperturbationsthat evadeafacematcher.Atthesametime,a detector , D ,learnstodistinguishbetweenthesynthe- sizedadversarialfacesandrealfaceimages.Perturbationsresidinginthesynthesizedadversarial facesareremovedviaa , P ur . Thegeneralizationofadversarialtraininghasbeeninquestion[12,13,187,188,212].It wasshownthatadversarialtrainingcanreduceaccuracyonrealexam- ples[187,188].Inthecontextoffacerecognition,weillustratethisbytrainingtwofacematchers onCASIA-WebFace:(i)FaceNet[6]trainedviathestandardtrainingprocess,and(ii)FaceNet[6] byadversarialtraining(FGSM 13 ).Wethencomputefacerecognitionperformanceacrosstraining iterationsonaseparatetestingdataset,LFW[8].Fig.3.19ashowsthatadversarialtrainingdrops theaccuracyfrom 99 : 13% ! 98 : 27% .Wegainthefollowinginsight:adversarialtrainingmay degradeAFRperformanceonrealfaces. Detection. Detection-basedapproachesemployapre-processingsteptofidetectflwhetheraninput faceisrealoradversarial[166,167,179,191].Acommonapproachistoutilizeabinary, D ,thatmapsafaceimage, x 2 R H W C to f 0 ; 1 g ,where 0 indicatesarealand 1 anadversarial face.WetrainabinarytodistinguishbetweenrealandFGSMattacksamplesinCASIA- WebFace[10].InFig.3.19b,weevaluateitsdetectionaccuracyonFGSMandPGDsamplesin LFW[8].Wethatprevailingdetection-baseddefenseschemesmayovtothe adversarialattacksutilizedfortraining. 13 Withmaxperturbationhyperparameteras =8 = 256 . 
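The min-max objective in Eq. (3.4.1) is typically approximated by generating the inner-maximization attack on the fly and training only on the attacked batch. The sketch below shows a generic PGD inner loop in PyTorch for a classification loss; the FaceNet experiment above used single-step FGSM with ε = 8/256, so treat this purely as an illustration of the structure, not the exact setup.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 256, alpha=2 / 256, steps=10):
    """Inner maximization: find an in-ball perturbation that maximally increases the loss."""
    x_adv = x.detach() + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)   # project back into the eps-ball around x
        x_adv = torch.clamp(x_adv, 0.0, 1.0).detach()
    return x_adv

# Outer minimization (adversarial training): update the network on the strongest attack found.
# for x, y in loader:
#     optimizer.zero_grad()
#     F.cross_entropy(model(pgd_attack(model, x, y)), y).backward()
#     optimizer.step()
```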
93 3.4.2ProposedMethodology OurdefenseaimstoachieverobustnesswithoutAFRperformanceonrealfaceimages. Wepositthatanadversarialdefensetrainedalongsideanadversarialgeneratorina self-supervised mannermayimproverobustnesstounseenattacks.Themainintuitionsbehindourdefensemech- anismareasfollows: SinceadversarialtrainingmaydegradeAFRperformance,weopttoobtainarobustadversarial detector and todetectandpurifyadversarialattacks. Giventhatprevailingdetection-basedmethodstendtoovtoknownadversarialperturbations (seeAddendum(Sec.3.4.6)),adetectorandtrainedon diverse synthesizedadversarial perturbationsmaybemorerobusttounseenattacks. Sufdiversityinsynthesizedperturbationscanguidethedetectortolearnatighterbound- aryaroundrealfaces.Inthiscase,thedetectoritselfcanserveasapowerfulsupervisionforthe . Lastly,pixelsinvolvedintheprocessmayservetoindicateadversarialregionsin theinputface. 3.4.2.1AdversarialGenerator Thegeneralizabilityofanadversarialdetectorandreliesonthequalityofthesynthesized adversarialfaceimagesoutputby FaceGuard 'sadversarialgenerator.Weproposeanadversarial generatorthatcontinuouslylearnstosynthesizechallenginganddiverseadversarialfaceimages. Thegenerator,denotedas G ,takesaninputrealfaceimage, x 2 R H W C ,andoutputsanad- versarialperturbation G ( x ; z ) ,where z ˘N (0 ; I ) isarandomlatentvector.Inspiredbyprevailing adversarialattackgenerators[16,34,35,136,141],wetreattheoutputperturbation G ( x ; z ) asan additive perturbationmask .Theadversarialfaceimage, x adv ,isgivenby x adv = x + G ( x ; z ) . Inanefforttoimpartgeneralizabilitytothedetectorand,weemphasizethefollowing requirementsof G : Adversarial: Perturbatation, G ( x ; z ) ,needstobeadversarialsuchthatanAFRsystemcan- 94 notidentifytheadversarialfaceimage x adv asthesamepersonastheinputprobe x . VisuallyRealistic: Perturbation G ( x ; z ) shouldalsobeminimalsuchthat x adv appearsasa legitimatefaceimageofthesubjectintheinputprobe x . Stochastic: Foraninput x ,werequirediverseadversarialperturbations, G ( x ; z ) ,fordiffer- entlatents z . Forsatisfyingalloftheaboverequirements,weproposemultiplelossfunctionstotrainthe generator. ObfuscationLoss Toensure G ( x ; z ) isindeed adversarial ,weincorporateawhite-boxAFRsys- tem, F ,tosupervisethegenerator.Givenaninputface, x ,thegeneratoraimstooutputanadver- sarialface, x adv = x + G ( x ; z ) suchthatthefacerepresentations, F ( x ) and F ( x adv ) ,donotmatch. Inotherwords,thegoalistominimizethecosinesimilaritybetweenthetwofacerepresentations 14 : L obf = E x F ( x ) F ( x adv ) jjF ( x ) jjjjF ( x adv ) jj : (3.4.2) PerturbationLoss Withtheidentitylossalone,thegeneratormayoutputperturbationswithlarge magnitudeswhichwill(a)betrivialforthedetectortorejectand(b)violatethevisualrealism requirementof x adv .Therefore,werestricttheperturbationstobewithin [ ] viaahingeloss: L pt = E x [max( jjG ( x ; z ) jj 2 )] : (3.4.3) DiversityLoss Theabovetwolossesjointlyensurethatateachstep,ourgeneratorlearnstooutput challengingadversarialattacks.However,theseattacksaredeterministic;foraninputimage,we willobtainthesameadversarialimage.Thismayagainleadtoaninferiordetectorthatovtoa fewdeterministicperturbationsseenduringtraining.Motivatedbystudiesofpreventingmodecol- lapseinGANs[213],weproposemaximizingadiversitylosstopromotestochasticperturbations pertrainingiteration, i : L div = 1 N ite N ite X i =1 G ( x ; z 1 ) ( i ) G ( x ; z 2 ) ( i ) 1 jj z 1 z 2 jj 1 ; (3.4.4) where N ite isthenumberoftrainingiterations, G ( x ; z ) ( i ) istheperturbationoutputatiteration i , 14 Forbrevity,wedenote E x E x 2P data . 
95 and ( z 1 ; z 2 ) aretwoi.i.d.samplesfrom z ˘N (0 ; I ) .Thediversitylossensuresthatfortworandom latentvectors, z 1 and z 2 ,wewillobtaintwodifferentperturbations G ( x ; z 1 ) ( i ) and G ( x ; z 2 ) ( i ) . GANLoss AkintopriorworkonGANs[144,150],weintroduceadiscriminatortoencourage perceptualrealismoftheadversarialimages.Thediscriminator, Dsc ,aimstodistinguishbetween probes, x ,andsynthesizedfaces x adv viaaGANloss: L GAN = E x [log Dsc ( x )]+ E x [log(1 Dsc ( x adv ))] : (3.4.5) 3.4.2.2AdversarialDetector Similartoprevailingadversarialdetectors,theproposeddetectoralsolearnsadecisionbound- arybetweenrealandadversarialimages[166,167,179,191].Akeydifference,however,isthat insteadofutilizingpre-computedadversarialimagesfromknownattacks( e . g .FGSMandPGD) fortraining,theproposeddetectorlearnstodistinguishbetweenrealimagesandthe synthesized setofdiverseadversarialattacksoutputbytheproposedadversarialgeneratorinaself-supervised manner.Thisleadstothefollowingadvantage: ourproposedframeworkdoesnotrequirealarge collectionofpre-computedadversarialfaceimagesfortraining . WeutilizeabinaryCNNfordistinguishingbetweenrealinputprobes, x ,andsynthesized adversarialsamples, x adv .ThedetectoristrainedwiththeBinaryCross-Entropyloss: L BCE = E x [ log D ( x )]+ E x [ log (1 D ( x adv ))] : (3.4.6) 3.4.2.3Adversarial Theobjectiveoftheadversarialistorecovertherealfaceimage x givenanadversarialface x adv .Weaimtoautomaticallyremovetheadversarialperturbationsbytraininganeuralnetwork P ur ,referredasanadversarial. Theadversarialprocesscanbeviewedasaninvertedprocedureofadversarial imagesynthesis.Contrarytotheobfuscationlossintheadversarialgenerator,werequirethat theimage, x pur ,successfullymatchestothesubjectintheinputprobe x .Notethatthis 96 Attacks TAR(%)@ 0 : 1% FAR ( # ) SSIM ( " ) FGSM[34] 26 : 230 : 83 0 : 24 PGD[35] 04 : 910 : 89 0 : 12 DeepFool[141] 36 : 180 : 91 0 : 09 AdvFaces[16] 00 : 170 : 89 0 : 02 GFLM[32] 68 : 030 : 55 0 : 14 SemanticAdv[186] 70 : 050 : 71 0 : 21 NoAttack 99 : 821 : 00 0 : 00 Table3.4FacerecognitionperformanceofArcFace[9]underadversarialattackandaveragestruc- turalsimilarities(SSIM)betweenprobeandadversarialimagesforobfuscationattackson 485 K genuinepairsinLFW[8]. canbeachievedviaa featurerecoveryloss ,whichistheoppositetotheobfuscationloss, i.e. , L fr = obf . Notethatanadversarialfaceimage, x adv = x + ,ismetricallyclosetotherealimage, x ,inthe inputspace.Ifwecanestimate ,thenwecanretrievetherealfaceimage.Here,theperturbations canbepredictedbyaneuralnetwork, P ur .Inotherwords,retrievingtheimage, x pur involves:(1)subtractingtheperturbationsfromtheadversarialimage, x pur = x adv P ur ( x adv ) and(2)ensuringthatthe mask , P ur ( x adv ) ,issmallsothatwedonotalterthecontent ofthefaceimagebyalargemagnitude.Therefore,weproposeahybridperceptuallossthat(1) ensures x pur isascloseaspossibletotherealimage, x viaa ` 1 reconstructionlossand(2)aloss thatminimizestheamountofalteration, P ur ( x adv ) : L perc = E x jj x pur x jj 1 + jjP ur ( x adv ) jj 2 : (3.4.7) Finally,wealsoincorporateourdetectortoguidethetrainingofour.Notethat,dueto thediversityinsynthesizedadversarialfaces,theproposeddetectorlearnsatightdecisionbound- aryaroundrealfaces.Thiscanserveasastrongself-supervisorysignaltotheforensuring thattheimagesbelongtotherealfacedistribution.Therefore,wealsoincorporatethe detectorasadiscriminatorfortheviatheproposedloss: L bf = E x [ log D ( x pur )] : (3.4.8) 97 DetectionAccuracy(%)YearFGS[34]PGD[35]DpFl.[141]AdvF.[16]GFLM[32]Sem.[186]Mean Std. General Gong etal. 
[166] 201798 : 9497 : 9195 : 8792 : 69 99 : 9299 : 92 97 : 54 02 : 82 ODIN[175] 201883 : 1284 : 3971 : 7450 : 0187 : 2585 : 6877 : 03 14 : 34 Steganalysis[178] 201988 : 7689 : 3475 : 9754 : 3058 : 9978 : 6274 : 33 14 : 77 Face UAP-D[167] 201861 : 3274 : 3356 : 7851 : 1165 : 3376 : 7864 : 28 09 : 97 SmartBox[172] 201858 : 7962 : 5351 : 3254 : 8750 : 9762 : 1456 : 77 05 : 16 Goswami etal. [176] 201984 : 5691 : 3289 : 7576 : 5152 : 9781 : 1279 : 37 14 : 04 Massoli etal. [179](MLP) 202063 : 5876 : 2881 : 7888 : 3851 : 9752 : 9869 : 16 15 : 29 Massoli etal. [179](LSTM) 202071 : 5376 : 4388 : 3275 : 4353 : 7655 : 2270 : 11 13 : 35 Agarwal etal. [182] 202094 : 4495 : 3891 : 1974 : 3251 : 6887 : 0387 : 03 16 : 86 ProposedFaceGuard 2021 99 : 8599 : 8599 : 8599 : 84 99 : 6199 : 85 99 : 81 00 : 10 Table3.5DetectionaccuracyofSOTAadversarialfacedetectorsinclassifyingsixadversarial attackssynthesizedfortheLFWdataset[8].Detectionthresholdissetas 0 : 5 forallmethods.All baselinemethodsrequiretrainingonpre-computedadversarialattacksonCASIA-WebFace[10]. Ontheotherhand,theproposed FaceGuard isself-guidedandgeneratesadversarialattacksonthe .Hence,itcanberegardedasa black-box defensesystem. 3.4.2.4TrainingFramework Wetraintheentire FaceGuard frameworkinFig.3.20inanend-to-endmannerwiththefollowing objectives: min G L G = L GAN + obf L obf + pt L pt div L div ; min D L D = L BCE ; min P ur L P ur = fr L fr + perc L perc + bf L bf : Ateachtrainingiteration,thegeneratorattemptstofoolthediscriminatorbysynthesizingvisually realisticadversarialfaceswhilethediscriminatorlearnstodistinguishbetweenrealandsynthe- sizedimages.Ontheotherhand,inthesameiteration,anexternalcriticnetwork,namelydetector D ,learnsadecisionboundarybetweenrealandsynthesizedadversarialsamples.Concurrently,the P ur learnstoinverttheadversarialsynthesisprocess.Notethatthereisakeydifference betweenthediscriminatorandthedetector:thegeneratorisdesignedto fool thedis- criminatorbutnotnecessarilythedetector.Wewillshowinourexperimentsthatthiscrucialstep preventsthedetectorfrompredicting D ( x )=0 : 5 forall x (seeTab.3.7). 98 3.4.3ExperimentalSettings Datasets. Wetrain FaceGuard onrealfaceimagesinCASIA-WebFace[10]datasetandthen evaluateonrealandadversarialfacessynthesizedforLFW[8],Celeb-A[17]andFFHQ[18] datasets.CASIA-WebFace[10]comprisesof 494 ; 414 faceimagesfrom 10 ; 575 15 differentsub- jects.LFW[8]contains 13 ; 233 faceimagesof 5 ; 749 subjects.Sinceweevaluatedefensesunder obfuscationattacks,weconsidersubjectswithatleasttwofaceimages 16 .Afterthis 9 ; 164 faceimagesof 1 ; 680 subjectsinLFWareavailableforevaluation.Forbrevity,experimentson CelebAandFFHQareprovidedinAddendum(Sec.3.4.6). Implementation. Theadversarialgeneratorandemployaconvolutionalencoder-decoder. Thelatentvariable z ,a 128 -dimensionalfeaturevector,isfedasinputtothegeneratorthrough spatialpaddingandconcatenation.Theadversarialdetector,a 4 -layerbinaryCNN,istrained jointlywiththegeneratorand.Empirically,weset obf = fr =10 : 0 , pt = perc =1 : 0 , div =1 : 0 , bf =1 : 0 and =3 : 0 .Trainingandnetworkarchitecturedetailsareprovidedin Addendum(Sec.3.4.6). FaceRecognitionSystems. Inthisstudy,weusetwoAFRsystems:FaceNet[6]andArcFace[9]. Recallthattheproposeddefenseutilizesafacematcher, F ,forguidingthetrainingprocessofthe generator.However,thedeployedAFRsystemmaynotbeknowntothedefensesystemapriori. 
Therefore,unlikeprevailingdefensemechanisms[167,172,179],weevaluatetheeffectiveness oftheproposeddefenseonanAFRsystem different from F .Wehighlighttheeffectivenessof ourproposeddefense: FaceGuardistrainedonFaceNet,whiletheadversarialattacktestsetis designedtoevadeArcFace. Obfuscationattemptsperturbrealprobesintoadversarialones.Ideally, deployedAFRsystems(say,ArcFace),shouldbeabletomatchagenuinepaircomprisedofan adversarialprobeandarealenrolledfaceofthesamesubject.Therefore,regardlessofrealor adversarialprobe,weassumethatgenuinepairsshould always matchasgroundtruth.Tab.3.4 providesAFRperformanceofArcFaceunder 6 SOTAadversarialattacksfor 484 ; 514 genuine 15 Weremoved 84 subjectsinCASIA-WebFacethatoverlapwithLFW. 16 Obfuscationattemptsonlyaffectgenuinepairs(twofaceimagespertainingtothesamesubject). 99 pairsinLFW.Itappearsthatsomeattacks, e . g .,AdvFaces[16],areeffectiveinbothlowTARand highSSIM,whilesomearelesscapableinbothmetrics. 3.4.4ComparisonwithState-of-the-ArtDefenses Inthissection,wecomparetheproposed FaceGuard toprevailingdefenses.Weevaluateallmeth- odsviapubliclyavailablerepositoriesprovidedbytheauthors(seeAddendum(Sec.3.4.6).).All baselinesaretrainedonCASIA-WebFace[10]. SOTADetectors. Ourbaselinesinclude 9 SOTAdetectorsproposedbothforgeneralob- jects[166,175,178]andadversarialfaces[167,172,176,179,182].Thedetectorsaretrainedonreal andadversarialfacesimagessynthesizedviasixadversarialgeneratorsforCASIA-WebFace[10]. Unlikeallthebaselines, FaceGuard 'sdetectordoesnotutilizeanypre-computedadversarialat- tackfortraining.Wecomputetheaccuracyforallmethodsonadatasetcomprising of 9 ; 164 realimagesand 9 ; 164 adversarialfaceimagesperattacktypeinLFW. InTab.3.5,wethatcomparedtothebaselines, FaceGuard achievesthehighestdetec- tionaccuracy.Evenwhenthe 6 adversarialattacktypesareencounteredintraining,abinary CNN[166],stillfallsshortcomparedto FaceGuard .Thisislikelybecause FaceGuard istrained onadiversesetofadversarialfacesfromtheproposedgenerator.WhilethebinaryCNNhasa smalldropcomparedtoFaceGuardintheseenattacks( 99 : 81% ! 97 : 54% ),itdrops onunseenadversarialattacksintesting. Comparedtohand-craftedfeatures,suchasPCA+SVMinUAP-D[167]andentropydetection inSmartBox[172], FaceGuard achievessuperiordetectionresults.SomebaselinesutilizeAFR featuresforidentifyingadversarialinputs[176,179].WethatintermediateAFRfeatures primarilyrepresenttheidentityoftheinputfaceanddonotappeartocontainhighlydiscriminative informationfordetectingadversarialfaces. Despitetherobustness, FaceGuard 28 outof 9 ; 164 realimagesinLFW[8]and falselypredicts 46 outof 54 ; 984 adversarialfacesasreal.Fromthelatter, 44 arewarpedfacesvia GFLM[32]andtheremainingtwoaresynthesizedviaAdvFaces[16].Wethat FaceGuard 100 0 : 77 0 : 67 0 : 96 0 : 99 (a)Realfacesfalselydetectedasadversarial Real AdvFaces( 0 : 42 ) Real AdvFaces( 0 : 28 ) (b)Adversarialfacesfalselydetectedasreal Figure3.21Exampleswheretheproposed FaceGuard failstocorrectlydetect(a)realfacesand (b)adversarialfaces.Detectionscores 2 [0 ; 1] aregivenbeloweachimage,where 0 indicatesreal and 1 indicatesadversarialface. tendstomisclassifyrealfacesunderextremeposesandadversarialfacesthatareoccluded( e . g ., hats)(seeFig.3.21). 
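The accuracies in Tab. 3.5 follow the protocol described above: each method scores 9,164 real and 9,164 adversarial LFW images per attack type and applies a fixed 0.5 threshold. A minimal sketch of that bookkeeping is shown below; the function name and the score arrays are illustrative placeholders rather than outputs of any particular detector.

```python
import numpy as np

def detection_accuracy(scores_real, scores_attack, thresh=0.5):
    """Accuracy at a fixed threshold: real faces should score below thresh,
    adversarial faces at or above it (0 = real, 1 = adversarial)."""
    correct = np.sum(np.asarray(scores_real) < thresh) + \
              np.sum(np.asarray(scores_attack) >= thresh)
    total = len(scores_real) + len(scores_attack)
    return 100.0 * correct / total

# Protocol from the text: 9,164 real LFW images and 9,164 adversarial images
# per attack type. The scores below are random stand-ins for detector outputs.
rng = np.random.default_rng(0)
scores_real = rng.uniform(0.0, 0.4, 9164)
scores_fgsm = rng.uniform(0.6, 1.0, 9164)
print(f"FGSM detection accuracy: {detection_accuracy(scores_real, scores_fgsm):.2f}%")
```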
ComparisonwithAdversarialTraining& Wealsocomparewithprevailingdefenses designingrobustfacematchers[11Œ13]and[14,15,184].Weconductav experimentbyconsideringallpossiblegenuinepairs(twofacesbelongingtothesamesubject) inLFW[8].Foroneprobeinagenuinepair,wecraftsixdifferentadversarialprobes(oneper attacktype).Intotal,thereare 484 ; 514 realpairsand ˘ 3 M adversarialpairs.Foraedmatch threshold 17 ,wecomputetheTrueAcceptRate(TAR)ofsuccessfullymatchingtwoimagesina realoradversarialpairinTab.3.6.Inotherwords,TARishereastheratioofgenuinepairs abovethematchthreshold. ArcFacewithoutanyadversarialdefensesystemachieves 34 : 27% TARat 0 : 1% FARunder 17 Wecomputethethresholdat0.1%FARonallpossibleimagepairsinLFW, e . g .,threshold@0.1%FARfor ArcFaceissetat 0 : 36 . 101 DefensesYearStrategyRealAttacks 485Kpairs 3 M pairs No-Defense - 99 : 8234 : 27 Adv.Training[11]2017Robustness 96 : 4211 : 23 Rob-GAN[12]2019Robustness 91 : 3513 : 89 Feat.Denoising[164]2019Robustness 87 : 6117 : 97 L2L[13]2019Robustness 96 : 8916 : 76 MagNet[14]2017 94 : 4738 : 32 DefenseGAN[15]2018 96 : 7839 : 21 Feat.Distillation[183]2019 94 : 6441 : 77 NRP[184]2020 97 : 5461 : 44 A-VAE[185]2020 93 : 7151 : 99 ProposedFaceGuard 2021 99 : 8177 : 46 Table3.6AFRperformance(TAR(%)@0.1%FAR)ofArcFaceundernodefenseandwhen ArcFaceistrainedviaSOTArobustnesstechniques[11Œ13]orSOTAs[14,15]. FaceGuard correctlypassesmajorityofrealfacestoArcFaceandalsoadversarialattacks. attack.Adversarialtraining[11Œ13]inhibitsthefeaturespaceofArcFace,resultinginworseper- formanceonbothrealandadversarialpairs.Ontheotherhand,methods[14,15,184] canbetterretainfacefeaturesinrealpairsbuttheirperformanceunderattackisstillundesirable. Instead,theproposed FaceGuard defensesystemdetectswhetheraninputfaceimageis realoradversarial.Ifinputfacesareadversarial,theyarefurtherFromTab.3.6,we thatourdefensesystemoutperformsSOTAbaselinesinprotectingArcFace[9]against attacks., FaceGuard 'senhancesArcFace'saverageTARat 0 : 1% FARunderall sixattacks(seeTab.3.4)from 34 : 27% ! 77 : 46% .Inaddition, FaceGuard alsomaintainssimilar facerecognitionperformanceonrealfaces(TARonrealpairsdropfrom 99 : 82% ! 99 : 81% ). Therefore,ourproposeddefensesystemensuresthatbenignuserswillnotbeincorrectlyrejected whilemaliciousattemptstoevadetheAFRsystemwillbecurbed. 3.4.5AnalysisofOurApproach QualityoftheAdversarialGenerator. InTab.3.7,weseethatwithouttheproposedadversarial generator(fiWithout G fl), i . e .,adetectortrainedonthesixknownattacktypes,suffersfromhigh 102 Model AdvFaces[16] Mean Std. Gen. G Without G 91 : 72 97 : 12 04 : 54 Without L div 95 : 42 98 : 23 01 : 33 With G and L div 99 : 84 99 : 81 00 : 10 Det. D D asDiscriminator 50 : 00 75 : 25 21 : 19 D viaPre-Computed G 52 : 01 69 : 37 19 : 91 D asOnlineDetector 99 : 84 99 : 81 00 : 10 Table3.7Ablatingtrainingschemesofthegenerator G anddetector D .Allmodelsaretrainedon CASIA-WebFace[10]. (Col.3) Wecomputethedetectionaccuracyinclassifyingrealfacesin LFW[8]andthemostchallengingadversarialattackinTab.3.4,AdvFaces[16]. (Col.4) Theavg. andstd.dev.ofdetectionaccuracyacrossall6adversarialattacks. standarddeviation.Instead,trainingadetectorwithadeterministic G (fiWithout L div fl),leadsto bettergeneralizationacrossattacktypes,sincethedetectorstillencountersvariationsinsynthe- sizedimagesasthegeneratorlearnstobettergenerateadversarialfaces.However,suchadetector isstillpronetoovingtoafewdeterministicperturbationsoutputby G .Finally, FaceGuard withthediversitylossintroducesdiverseperturbationswithinandacrosstrainingiterations(see Fig.3.22). 
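The "Without L_div" ablation above concerns exactly the quantity maximized in Eq. (3.4.4); a small sketch of how that per-sample diversity term can be monitored during training is given below. The perturbation masks and latent vectors are random placeholders, not outputs of the actual generator.

```python
import numpy as np

def perturbation_diversity(pert_a, pert_b, z_a, z_b):
    """Diversity term of Eq. (3.4.4): L1 distance between two perturbation
    masks G(x, z1) and G(x, z2), normalized by the L1 distance between
    their latent codes."""
    num = np.abs(pert_a - pert_b).sum()
    den = np.abs(z_a - z_b).sum() + 1e-12  # avoid division by zero
    return num / den

# Placeholders for two perturbation masks of one probe (160 x 160 x 3) and
# the 128-dimensional latents that produced them.
z1, z2 = np.random.randn(128), np.random.randn(128)
pert1 = np.random.uniform(-0.1, 0.1, (160, 160, 3))
pert2 = np.random.uniform(-0.1, 0.1, (160, 160, 3))
print("diversity:", perturbation_diversity(pert1, pert2, z1, z2))
```

A larger value indicates that different latents yield visibly different perturbations within the same iteration, which is the behavior the diversity loss rewards.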
QualityoftheAdversarialDetector. Thediscriminator'staskissimilartothedetector;deter- minewhetheraninputimageisrealorfake/adversarial.Thekeydifferenceisthatthegenerator isenforcedtofoolthediscriminator,butnotthedetector.Ifwereplacethediscriminatorwithan adversarialdetector,thegeneratorcontinuouslyattemptstofoolthedetectorbysynthesizingim- agesthatareascloseaspossibletotherealimagedistribution.Bydesign,suchadetectorshould convergeto Disc ( x )=0 : 5 forall x (realoradversarial).Asweexpect,inTab.3.7,wecannot relyonpredictionsmadebysuchadetector(fi D asDiscriminatorfl).Wetryanothervariant:we trainthegenerator G andthentrainadetectortodistinguishbetweenrealandpre-computed attacksvia G (fi D viaPre-Computed G fl).Asweexpect,theproposedmethodologyoftrainingthe detectorinanonlinefashionbyutilizingthesynthesizedadversarialsamplesoutputby G atany giveniterationleadstoarobustdetector(fi D asOnlineDetectorfl).Thiscanlikely beattributedtothefactthatadetectortrainedon-lineencountersamuchlargervariationasthe generatortrainsalongside.fi D viaPre-Computed G flisexposedonlytowithin-iterationvariations 103 InputProbe( x ) G ( x ; z 1 ) G ( x ; z 2 ) G ( x ; z 3 ) (a)Adversarialfacesviarandomlatentswithinthesameiteration. Iteration: 5 K Iteration: 20 K Iteration: 60 K Iteration: 100 K (b)Adversarialfacesatdifferenttrainingiterations. Figure3.22Adversarialfacessynthesizedby FaceGuard duringtraining.Notethediversityin perturbations(a)withinand(b)acrossiterations. (fromrandomlatentsampling),however,` D asOnlineDetectorflencountersvariations both within andacrosstrainingiterations(seeFig.3.22). QualityoftheAdversarial. Recallthatweenforcedthepuriimagetobeclosetothe 104 ProbeAdvFaces[16]Localization ArcFace/SSIM: 0 : 30 = 0 : 890 : 62 = 0 : 91 Figure3.23 FaceGuard successfullytheadversarialimage(redregionsindicateadversarial perturbationslocalizedbyourmask).ArcFace[9]scores 2 [ 1 ; 1] andSSIM 2 [0 ; 1] betweenanadvprobeandinputprobearegivenbeloweachimage. (a) (b) Figure3.24(a) FaceGuard 'siscorrelatedwithitsadversarialsynthesisprocess.(b) Trade-offbetweendetectionandwithrespecttoperturbationmagnitudes.Withmin- imalperturbation,detectionischallengingwhilemaintainsAFRperformance.Excessive perturbationsleadtoeasierdetectionwithgreaterchallengein realfaceviaareconstructionloss.Thus,theandperturbationmasksshouldbesimilar. InFig.3.24a,weshowsthatthetwomasksareindeedcorrelatedbyplottingtheCosinesimilarity distribution( 2 [ 1 ; 1] )between G ( x ; z ) and P ur ( x + G ( x ; z )) forall 9 ; 164 imagesinLFW. Therefore,pixelsin x adv involvedintheprocessshouldcorrespondtothosethat causetheimagetobeadversarialintheplace.Fig.3.23highlightsthatperturbedregions canbeautomaticallylocalizedviaconstructingaheatmapoutof P ur ( x adv ) .InFig.3.24b,we investigatethechangeinAFRperformance(TAR(%)@ 0 : 1 %FAR)ofArcFaceunderattack (synthesizedadversarialfacesvia G ( x ; z ) )whentheamountofperturbationisvaried.Wethat 105 (a)minimalperturbationishardertodetectbuttheincursminimaldamagetotheAFR, while,(b)excessiveperturbationsareeasiertodetectbutincreasesthechallengein 3.4.6Addendum 3.4.6.1ImplementationDetails AllthemodelsinthechapterareimplementedusingTwr1.12.AsingleNVIDIAGeForce GTX2080TiGPUisusedfortraining FaceGuard onCASIA-Webface[10]andevaluatedon LFW[8],CelebA[17],andFFHQ[18]. 
3.4.6.2Preprocessing AllfaceimagesarepassedthroughMTCNNfacedetector[41]todetect 5 faciallandmarks (twoeyes,noseandtwomouthcorners).Then,similaritytransformationisusedtonormalize thefaceimagesbasedonthevelandmarks.Aftertransformation,theimagesareresizedto 160 160 .Beforepassinginto FaceGuard ,eachpixelintheRGBimageisnormalized 2 [ 1 ; 1] bysubtracting 128 anddividingby 128 . Allthetestingimagesarefromtheidentitiesinthetest dataset. 3.4.6.3NetworkArchitectures Thegenerator, G takesasinputanrealRGBfaceimage, x 2 R 160 160 3 anda 128 -dimensional randomlatentvector, z ˘N (0 ; I ) andoutputsasynthesizedadversarialface x adv 2 R 160 160 3 . Let c7s1-k bea 7 7 convolutionallayerwith k andstride 1 . dk denotesa 4 4 con- volutionallayerwith k andstride 2 . Rk denotesaresidualblockthatcontainstwo 3 3 convolutionallayers. uk denotesa 2 upsamplinglayerfollowedbya 5 5 convolutionallayer with k andstride 1 .WeapplyInstanceNormalizationandBatchNormalizationtothegen- eratoranddiscriminator,respectively.WeuseLeakyReLUwithslope 0 : 2 inthediscriminatorand ReLUactivationinthegenerator.Thearchitecturesofthetwomodulesareasfollows: 106 Generator: c7s1-64,d128,d256,R256,R256,R256,u128,u64,c7s1-3 , Discriminator: d32,d64,d128,d256,d512 . A 1 1 convolutionallayerwith 3 andstride 1 isattachedtothelastconvolutionallayerof thediscriminatorforthepatch-basedGANloss L GAN . The, P ur ,consistsofthesamenetworkarchitectureasthegenerator: c7s1-64,d128,d256,R256,R256,R256,u128,u64,c7s1-3 . Weapplythe tanh activationfunctiononthelastconvolutionlayerofthegeneratorandthe toensurethatthegeneratedimagesare 2 [ 1 ; 1] .Inthechapter,wedenotedthe outputofthetanhlayerofthegeneratorasanfiperturbationmaskfl, G ( x ; z ) 2 [ 1 ; 1] and x 2 [ 1 ; 1] .Similarly,theoutputofthetanhlayeroftheerisreferredtoan tionmaskfl, P ur ( x adv ) 2 [ 1 ; 1] and x adv 2 [ 1 ; 1] .Theadversarialimageiscomputedas x adv =2 clamp G ( x ; z )+ x +1 2 1 0 1 .Thisensures G ( x ; z ) caneitheraddorsubtractpixels from x when G ( x ; z ) 6 =0 .When G ( x ; z ) ! 0 ,then x adv ! x .Similarly,theimage iscomputedas x pur =2 clamp x adv +1 2 P ur ( x adv ) 1 0 1 . Theexternalcriticnetwork,detector D ,comprisesofa 4 -layerbinaryCNN: Detector: d32,d64,d128,d256,fc64,fc1 , where fcN referstoafully-connectedlayerwith N neuronoutputs. 3.4.6.4TrainingDetails Thegenerator,detector,andaretrainedinanend-to-endmannerviaADAMoptimizer withhyperparameters 1 =0 : 5 , 2 =0 : 9 ,learningrateof 1 e 4 ,andbatchsize 16 .Algorithm2 outlinesthetrainingalgorithm. NetworkConvergence. InFig.3.25,weplotthetraininglossacrossiterationswhenanadversar- 107 Algorithm2 Training FaceGuard .Allexperimentsinthisworkuse =0 : 0001 , 1 =0 : 5 , 2 =0 : 9 , obf = fr =10 : 0 , pt = perc = div =1 : 0 , =3 : 0 , m =16 .Forbrevity, lg refersto logoperation. 
1: Input 2: X TrainingDataset 3: F CosinesimilaritybyAFR 4: G Generatorwithweights G 5: Dc Discriminatorwithweights Dc 6: D Detectorwithweights D 7: P ur withweights P ur 8: m Batchsize 9: Learningrate 10: for numberoftrainingiterations do 11: Sampleabatchofprobes f x ( i ) g m i =1 ˘X 12: Sampleabatchofrandomlatents f z ( i ) g m i =1 ˘N (0 ;I ) 13: ( i ) G = G (( x ( i ) ;z ( i ) ) 14: x ( i ) adv = x ( i ) + ( i ) G 15: ( i ) P ur = G (( x ( i ) ;z ( i ) ) 16: x ( i ) pur = x ( i ) adv ( i ) P ur 17: 18: L G pt = 1 m P m i =1 max jj ( i ) jj 2 19: L G obf = 1 m h P m i =1 F x ( i ) ;x ( i ) adv 20: L G div = 1 m P m i =1 jj G ( x ; z 1 ) ( i ) ( x ; z 2 ) ( i ) jj 1 jj z 1 z 2 jj 1 21: L G GAN = 1 m h P m i =1 lg 1 Dc ( x ( i ) adv ) 22: L D = 1 m P m i =1 h lg D ( x ( i ) )+ lg 1 D ( x ( i ) adv ) 23: L Dc = 1 m P m i =1 h lg Dc ( x ( i ) ) + lg 1 Dc ( x ( i ) adv ) 24: L P ur perc = 1 m P m i =1 h jj x pur x jj 1 + jjP ur ( x ( i ) adv ) jj 1 i 25: L P ur fr = 1 m P m i =1 F x ( i ) ;x pur 26: L P ur bf = 1 m [ P m i =1 lg (1 D ( x pur ))] 27: L G = L G GAN + obf L obf + pt L pt + div L div 28: L P ur = fr L fr + perc L perc + bf L bf 29: G = Adam ( O G L G ; G ;; 1 ; 2 ) 30: D = Adam ( O Dc L Dc ;Dc ;; 1 ; 2 ) 31: D = Adam ( O D L D ; D ;; 1 ; 2 ) 32: P ur = Adam ( O P ur L P ur ; P ur ;; 1 ; 2 ) 33: endfor 108 Figure3.25Traininglossacrossiterationswhenanadversarialdetectionnetworkistrainedvia pre-computedadversarialfaces(blue),theproposedadv.generatorbutwithoutthediversity(or- ange),andwiththeproposeddiversityloss(green).Thediversitylosspreventsthenetworkfrom ovtoadversarialperturbationsencounteredduringtraining. ialdetectoristrainedviapre-computedadversarialfaces.Inthiscase,thetraininglossconvergesto alowvalueandremainsconsistentacrosstheremainingepochs.Suchadetectormayovtothe edsetofadversarialperturbationsencounteredintraining.Insteadofutilizingthepre-computed adversarialattacks,utilizinganadversarialgeneratorintraining(without L div ),introduceschal- lengingtrainingsamples. FaceGuard withthediversitylossintroducesdiverseperturbationswithinatrainingiteration (seeFig.3.22).InFig.3.25,wealsoobservethatthetrainingloss(epochs 8 40 )untilconvergence(epochs 40 50 ).Thisindicatesthatthroughoutthetraining(within andacrosstrainingiterations),theproposedgeneratorsynthesizesstronganddiverserangeof adversarialfacesthatcontinuouslyregularizesthetrainingoftheadversarialdetector. 109 DetectionAccuracy(%)YearLFW[34]CelebA[17]FFHQ[18] General Gong etal. [166] 201797 : 54 02 : 8294 : 38 04 : 4896 : 89 02 : 07 ODIN[175] 201877 : 03 14 : 3468 : 95 19 : 6474 : 63 08 : 16 Steganalysis[178] 201974 : 33 14 : 7772 : 53 11 : 3071 : 09 09 : 86 Face UAP-D[167] 201864 : 28 09 : 9763 : 19 16 : 4968 : 65 08 : 73 SmartBox[172] 201856 : 77 05 : 1654 : 85 09 : 3357 : 19 09 : 55 Goswami etal. [176] 201979 : 37 14 : 0474 : 70 13 : 8880 : 03 09 : 24 Massoli etal. [179](MLP) 202069 : 16 15 : 2961 : 78 11 : 3466 : 26 10 : 06 Massoli etal. [179](LSTM) 202070 : 11 13 : 3563 : 67 16 : 2169 : 58 07 : 91 Agarwal etal. 
[182] 202087 : 03 16 : 8685 : 81 15 : 6486 : 70 11 : 04 ProposedFaceGuard 2021 99 : 81 00 : 1098 : 73 00 : 9299 : 35 00 : 09 Table3.8AverageandstandarddeviationofdetectionaccuraciesofSOTAadversarialfacedetec- torsinclassifyingsixadversarialattackssynthesizedfortheLFW[8],CelebA[17],andFFHQ[18] datasets.Detectionthresholdissetas 0 : 5 forallmethods.Allbaselinemethodsrequiretrainingon pre-computedadversarialattacksonCASIA-WebFace[10].Ontheotherhand,theproposed Face- Guard isself-guidedandgeneratesadversarialattacksonthe.Hence,itcanberegardedas a black-box defensesystem. 3.4.6.5Baselines Weevaluatealldefensemethodsviapubliclyavailablerepositoriesprovidedbytheauthors.Only madeistoreplacetheirtrainingdatasetswithCASIA-WebFace[10].Weprovidethe publiclinkstotheauthorcodesbelow: Gong etal. [166]:https://github.com/gongzhitaao/adversarial- UAP-D[167]/SmartBox etal. [172]:https://github.com/akhil15126/SmartBox Massoli etal. [179]:https://github.com/fvmassoli/trj-based-adversarials-detection AdversarialTraining[11]:https://github.com/locuslab/fast adversarial Rob-GAN[12]:https://github.com/xuanqing94/RobGAN L2L[13]:https://github.com/YunseokJANG/l2l-da MagNet[14]:https://github.com/Trevillie/MagNet DefenseGAN[15]:https://github.com/kabkabm/defensegan NRP[184]:https://github.com/Muzammal-Naseer/NRP Attacksarealsosynthesizedviapubliclyavailableauthorcodes: 110 Known Unseen FGSM[34]PGD[35]DeepFool[141] AdvFaces[16]GFLM[32]SemanticAdv[186] Gong etal. [166] 94 : 5192 : 2194 : 12 68 : 6350 : 0050 : 21 UAP-D[167] 63 : 6569 : 3356 : 38 60 : 8150 : 1250 : 28 SmartBox[172] 58 : 7962 : 5351 : 32 54 : 8750 : 9762 : 14 Massoli etal. [179](MLP) 78 : 3582 : 5291 : 21 55 : 5750 : 0050 : 00 Massoli etal. [179](LSTM) 74 : 6186 : 4394 : 73 62 : 4350 : 0050 : 00 (a) Known Unseen AdvFaces[16]GFLM[32]SemanticAdv[186] FGSM[34]PGD[35]DeepFool[141] Gong etal. [166] 81 : 3996 : 7298 : 97 84 : 4657 : 0072 : 32 UAP-D[167] 68 : 7854 : 3177 : 46 51 : 6450 : 3252 : 01 SmartBox[172] 54 : 8750 : 9762 : 14 58 : 7962 : 5351 : 32 Massoli etal. [179](MLP) 77 : 6486 : 5494 : 78 55 : 2051 : 3252 : 90 Massoli etal. [179](LSTM) 81 : 4292 : 6296 : 76 52 : 7465 : 4354 : 84 (b) Known FGSM[34]PGD[35]DeepFool[141]AdvFaces[16]GFLM[32]SemanticAdv[186] Gong etal. [166] 98 : 9497 : 9195 : 8792 : 69 99 : 9299 : 92 UAP-D[167] 61 : 3274 : 3356 : 7851 : 1165 : 3376 : 78 SmartBox[172] 58 : 7962 : 5351 : 3254 : 8750 : 9762 : 14 Massoli etal. [179](MLP) 63 : 5876 : 2881 : 7888 : 3851 : 9752 : 98 Massoli etal. [179](LSTM) 71 : 5376 : 4388 : 3275 : 4353 : 7655 : 22 Unseen ProposedFaceGuard 99 : 8599 : 8599 : 8599 : 84 99 : 6199 : 85 (c) Table3.9DetectionaccuracyofSOTAadversarialfacedetectorsinclassifyingsixadversarial attackssynthesizedfortheLFWdataset[8]undervariousknownandunseenattackscenarios. Detectionthresholdissetas 0 : 5 forallmethods. FGSM/PGD/DeepFool:https://githubw/cleverhans AdvFaces:https://github.com/ronny3050/AdvFaces GFLM:https://github.com/alldbi/FLM SemanticAdv:https://github.com/AI-secure/SemanticAdv 3.4.6.6AdditionalDatasets InTab.3.8,wereportaverageandstandarddeviationofdetectionratesoftheproposed Face- Guard andotherbaselinesonthe 6 adversarialattackssynthesizedonLFW[8],CelebA[17], andFFHQ[18](followingthesameprotocolasTab.3.5).ForCelebA,wesynthesizeatotalof 19 ; 962 6=119 ; 772 adversarialsamplesfor 19 ; 962 realsamplesintheCelebAtestingsplit[17]. Wealsosynthesize 4 ; 974 6=29 ; 844 adversarialsamplesfor 4 ; 974 realfacesinFFHQtesting 111 split[18].Wethattheproposed FaceGuard outperformsallbaselinesinallthreefacedatasets. 
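Tab. 3.8 reports, for each dataset, the mean and standard deviation of detection accuracy over the six attack types. A minimal aggregation sketch is shown below, seeded with FaceGuard's per-attack LFW accuracies from Tab. 3.5; the helper name is ours, not part of the released code.

```python
import numpy as np

def summarize_detector(per_attack_accuracy):
    """Mean and standard deviation of detection accuracy (%) over attack types."""
    values = np.array(list(per_attack_accuracy.values()), dtype=np.float64)
    return values.mean(), values.std()

# Per-attack accuracies (%) for FaceGuard on LFW, taken from Tab. 3.5.
accuracies = {"FGSM": 99.85, "PGD": 99.85, "DeepFool": 99.85,
              "AdvFaces": 99.84, "GFLM": 99.61, "SemanticAdv": 99.85}
mean_acc, std_acc = summarize_detector(accuracies)
print(f"{mean_acc:.2f} +/- {std_acc:.2f}")
```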
3.4.6.7OvinPrevailingDetectors InTab.3.9,weprovidethedetectionratesofprevailingSOTAdetectorsindetectingsixadversarial attacksinLFW[8]whentheyaretrainedondifferentattacksubsets.Wehighglighttheov tingissuewhen(a)SOTAdetectorsaretrainedongradient-basedadversarialattacks(FGSM[34], PGD[35],andDeepFool[141])andtestedongradient-basedandlearning-basedattacks(Ad- vFaces[16],GFLM[32],andSemanticAdv[186]),and(b)vice-versa.Tab.3.9(c)reportsthe detectionperformanceofSOTAdetectorswhenallsixattacksareavailablefortraining. WethatdetectionaccuracyofSOTAdetectorsdropswhentestedonasubset ofattacksnotencounteredduringtheirtraining.Instead,theproposed FaceGuard maintainsrobust detectionaccuracywithouteventrainingonthepre-computedsamplesfromanyknownattacks. 3.4.6.8QualitativeResults GeneratorResults. Fig.3.26showsexamplesofsynthesizedadversarialfacesviatheproposed adversarialgenerator G .Notethatthegeneratortakestheinputprob x andarandomlatent z . Weshowsynthesizedperturbationmasksandcorrespondingadversarialfacesforthreerandomly sampledlatents.WeobservethatthesynthesizedadversarialimagesevadesArcFace[9]while maintaininghighstructuralsimilaritybetweenadversarialandinputprobe. Results. Weshowexamplesofimagesvia FaceGuard andbaselinesincluding MagNet[14]andDefenseGAN[15]inFig.3.27.Weobservethat,comparedtobaselines, imagessynthesizedvia FaceGuard arevisuallyrealisticwithminimalchangescomparedtothe groundtruthrealprobe.Inaddition,comparedtothetwobaselines, FaceGuard 'sprotects ArcFace[9]matcherfrombeingevadedbythesixadversarialattacks. 112 Figure3.26Examplesofgeneratedadversarialimagesalongwithcorrespondingperturbation masksobtainedvia FaceGuard 'sgenerator G forthreerandomlysampled z .Cosinesimilarity scoresviaArcFace[9] 2 [ 1 ; 1] andSSIM 2 [0 ; 1] betweensynthesizedadversarialandinput probearegivenbeloweachimage.Ascoreabove 0.36 (threshold@ 0 : 1% FalseAcceptRate) indicatesthattwofacesareofthesamesubject. 3.4.6.9AdditionalResultson PerturbationandMasks. Inthemaintext,wefoundthattheperturbationandpu- masksarecorrelatedwithanaverageCosinesimilarityof 0 : 52 .Weshowvepairsof perturbationandmasksrankedbytheCosinesimilaritybetweenthem(highesttolow- est).Weobservethatmaskisbettercorrelatedwhenperturbationsaremorelocal. Slightlyperturbingentirefacesposestobechallengingfortheproposed. EffectofPerturbationAmount. Wealsostudiedtheeffectofperturbationamountondetection 113 andresultsinthemaintext.Weobservedatrade-offbetweendetectionand tionwithrespecttoperturbationmagnitudes.Withminimalperturbation,detectionischallenging whilemaintainsAFRperformance.Excessiveperturbationsleadtoeasierdetectionwith greaterchallengeinInFig.3.29,showexamplesofsynthesizedadversarialfacesfor differentperturbationamountsandtheircorrespondingimages.Wethatdetection scoresimprovewithlargerperturbation.Alignedwithourearlierduetotheproposed loss, L bf ,facesarecontinuouslydetectedasrealbythedetectorwhichexplains whythemaintainsAFRperformancewithincreasingperturbationamount. EffectofonArcFaceEmbeddings. 
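The SSIM and ArcFace cosine scores reported beneath each image (e.g., in Fig. 3.26) can be computed as sketched below. The sketch assumes scikit-image is available; the images and embeddings here are random placeholders rather than outputs of the matcher or generator.

```python
import numpy as np
from skimage.metrics import structural_similarity

def cosine_score(feat_a, feat_b):
    """ArcFace-style cosine similarity in [-1, 1] between two embeddings."""
    return float(feat_a @ feat_b /
                 (np.linalg.norm(feat_a) * np.linalg.norm(feat_b) + 1e-12))

# Probe and adversarial image as grayscale float arrays in [0, 1].
probe = np.random.rand(160, 160)
adv = np.clip(probe + 0.05 * np.random.randn(160, 160), 0.0, 1.0)
ssim = structural_similarity(probe, adv, data_range=1.0)

# Embeddings would come from the face matcher; random vectors stand in here.
feat_probe, feat_adv = np.random.randn(512), np.random.randn(512)
print(f"SSIM: {ssim:.2f}  cosine: {cosine_score(feat_probe, feat_adv):.2f}")
```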
Inordertoinvestigatetheeffectof tiononamatcher'sfeaturespace,weextractfaceembeddingsofrealimages,theircorresponding adversarialimagesviathechallengingAdvFaces[16]attack,andimages,viatheSOTA ArcFacematcher.Intotal,weextractfeaturevectorsfrom 1 ; 456 faceimagesof 10 subjectsinthe LFWdataset[8].InFig.3.10,weplotthe 2 Dt-SNEvisualizationofthefaceembeddingsforthe 10 subjects.Theidentityclusteringscanbeclearlyobservedfromreal,adversarial,and images.Inparticular,weobservethatsomeadversarialfacespertainingtoasubjectmovesfarther fromitsidentityclusterwhiletheproposeddrawsthemback.Fig.3.30illustratesthat theproposedindeedenhancesfacerecognitionperformanceofArcFaceunderattackfrom 34 : 27% TAR@ 0 : 1% FARundernodefenseto 77 : 46% TAR@ 0 : 1% FAR. 3.5Summary Thischapterproposesanewmethodofadversarialfacesynthesis,namely AdvFaces ,that automaticallygeneratesadversarialfaceimageswithimperceptibleperturbationsevadingstate- of-the-artfacematchers.WiththehelpofaGAN,andtheproposedperturbationandidentity losses,AdvFaceslearnsthesetofpixellocationsrequiredbyfacematchersforand onlyperturbsthosesalientfacialregions(suchaseyebrowsandnose).Oncetrained,AdvFaces generateshighqualityandperceptuallyrealisticadversarialexamplesthatarebenigntothehuman 114 eyebutcanevadestate-of-the-artblack-boxfacematchers,whileoutperformingotherstate-of-the- artadversarialfacemethods. WiththeintroductionofsophisticatedadversarialattacksonAFRsystems,suchasgeomet- ricwarpingandGAN-synthesizedadversarialattacks,adversarialdefenseneedstoberobustand generalizable.Withoututilizinganypre-computedtrainingsamplesfromknownadversarialat- tacks,theproposed FaceGuard achievedstate-of-the-artdetectionperformanceagainst 6 different adversarialattacks. FaceGuard 'salsoenhancedArcFace'srecognitionperformanceunder adversarialattacks. 115 Figure3.27ExamplesofimagesviaMagNet[14],DefenseGan[15],andproposed Face- Guard forsixadversarialattacks.CosinesimilarityscoresviaArcFace[9] 2 [ 1 ; 1] are givenbeloweachimage.Ascoreabove 0.36 (threshold@ 0 : 1% FalseAcceptRate)indicatesthat twofacesareofthesamesubject. 116 Figure3.28Examplesofsynthesizedadversarialimagesviatheproposedadversarialgenerator andcorrespondingimages.Cosinesimilaritybetweenperturbationandmasks givenbeloweachrowalongwithArcFacescoresbetweensynthesizedadvimage andrealprobe.Ascoreabove 0.36 (threshold@ 0 : 1% FalseAcceptRate)indicatesthattwofaces areofthesamesubject.Evenwithlowercorrelationbetweenperturbationandmasks (rows3-5),theimagescanstillbeasthecorrectidentity.Noticethatthe primarilyalterstheeyecolor,nose,andsubduesadversarialperturbationsinforeheads.Zoomin fordetails. 117 Figure3.29ArcFace 2 [ 1 ; 1] /Detectionscores 2 [0 ; 1] whenperturbationamountisvaried ( = f 0 : 25 ; 0 : 50 ; 0 : 75 ; 1 : 00 ; 1 : 25 g ).Detectionscoresabove 0 : 5 arepredictedasadversarialimages whileArcFacescoresabove 0.36 (threshold@ 0 : 1% FalseAcceptRate)indicatethattwofaces areofthesamesubject. FaceGuard istrainedon =1 : 00 .Thedetectionscoresimproveas perturbationamountincreases,whereas,majorityofimagesaredetectedasreal.Even whenimagesfailtobeclassasrealbythedetector,cationmaintainhighAFR performance. 118 Figure3.30 2 Dt-SNEvisualizationoffacerepresentationsextractedviaArcFacefrom 1 ; 456 (a) real,(b)AdvFaces[16],and(c)imagesbelongingto 10 subjectsinLFW[8].Example AdvFaces[16]pertainingtoasubjectmovesfartherfromitsidentityclusterwhiletheproposed drawsthemback. 
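The 2D t-SNE visualization in Fig. 3.30 can be reproduced with scikit-learn; the embedding matrix and subject labels below are random placeholders standing in for the 1,456 ArcFace features of the 10 LFW subjects.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder for the 1,456 x 512 ArcFace embeddings (real, adversarial, and
# purified) and their subject labels; in practice these come from the matcher.
embeddings = np.random.randn(1456, 512).astype(np.float32)
subject_ids = np.random.randint(0, 10, size=1456)

tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
points_2d = tsne.fit_transform(embeddings)  # (1456, 2) coordinates

# points_2d can then be scatter-plotted with one color per subject_id to
# obtain a visualization analogous to Fig. 3.30.
print(points_2d.shape)
```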
119 Chapter4 DetectionofDigitalandPhysical FaceAttacks Inthepreviouschapters,weproposedindividualsolutionstoenhancetherobustnessofAFRsys- temagainstphysicalanddigitalattacks.However,threebroadcategoriesoffaceattackshavebeen inliterature,namelyspoofs,digitalmanipulation,andadversarialfaces.Sincetheexact typeoffaceattackmaynotbeknown apriori ,ageneralizabledetectorthatcandefendanAFR systemagainstanyofthethreeattackcategoriesisofutmostimportance.Inthischapter,ourmain focusistodesignauniversalfaceattackdetectionframeworkthatcanreliablydetectattacksfrom allthreecategories. 4.1Introduction TheforemostchallengefacingAFRsystemsistheirvulnerabilityto faceattacks .Forinstance, anattackercanhidehisidentitybywearinga3Dmask[58],orintruderscanassumeavictim's identitybydigitallyswappingtheirfacewiththevictim'sfaceimage[20].Withunrestrictedaccess totherapidproliferationoffaceimagesonsocialmediaplatforms,launchingattacksagainstAFR systemshasbecomeevenmoreaccessible.Giventhegrowingdisseminationoffifakenewsfland fideepfakesfl[59],theresearchcommunityandsocialmediaplatformsalikearepushingtowards 120 Figure4.1FaceattacksagainstAFRsystemsarecontinuouslyevolvinginbothdigitalandphysical spaces.Giventhediversityofthefaceattacks,prevailingmethodsfallshortindetectingattacks acrossallthreecategories( i . e .,adversarial,digitalmanipulation,andspoofs).Thisworkisamong thetothetaskoffaceattackdetectiononthe 25 attacktypesacross 3 categoriesshown here. generalizable defenseagainstcontinuouslyevolvingandsophisticatedfaceattacks. Inliterature,faceattackscanbebroadlyintothreeattackcategories:(i)Spoofat- tacks:artifactsinthe physical domain( e . g .,3Dmasks,eyeglasses,replayingvideos)[1],(ii) Adversarialattacks:imperceptiblenoisesaddedtoprobesforevadingAFRsystems[60],and(iii) Digitalmanipulationattacks:entirelyorpartiallyphoto-realisticfacesusinggenerative models[20].Withineachofthesecategories,therearedifferentattacktypes.Forexample,each spoofmedium, e . g .,3Dmaskandmakeup,constitutesoneattacktype,andthereare 13 common typesofspoofattacks[1].Likewise,inadversarialanddigitalmanipulationattacks,eachattack model,designedbyuniqueobjectivesandlosses,maybeconsideredasoneattacktype.Thus, theattackcategoriesandtypesforma 2 -layertreestructureencompassingthediverseattacks(see Fig.4.1).Suchatreewillinevitablygrowinthefuture. InordertosafeguardAFRsystemsagainsttheseattacks,numerousfaceattackdetectionap- proacheshavebeenproposed[20,21,61Œ63].Despiteimpressivedetectionrates,prevailingre- searcheffortsfocusonafewattacktypeswithin one ofthethreeattackcategories.Sincetheexact typeoffaceattackmaynotbeknown apriori ,ageneralizabledetectorthatcandefendanAFR 121 systemagainstanyofthethreeattackcategoriesisofutmostimportance. Duetothevastdiversityinattackcharacteristics,fromglossy2Dprintedphotographstoim- perceptibleperturbationsinadversarialfaces,wethatlearningasingle networkisinad- equate.Evenwhenprevailingstate-of-the-art(SOTA)detectorsaretrainedonall 25 attacktypes, theyfailtogeneralizewellduringtesting.Viaensembletraining,wecomprehensivelyevaluatethe detectionperformanceonfusingdecisionsfromthreeSOTAdetectorsthatindividuallyexcelat theirrespectiveattackcategories.However,duetothediversityinattackcharacteristics,decisions madebyeachdetectormaynotbecomplementaryandresultinpoordetectionperformanceacross all 3 categories. 
Thisresearchisamongthetofocusondetecting all 25 attacktypes knowninliterature ( 6 adversarial, 6 digitalmanipulation,and 13 spoofattacks).Ourapproachconsistsof(i)auto- maticallyclusteringattackswithsimilarcharacteristicsintodistinctgroups,and(ii)amulti-task learningframeworktolearnsalientfeaturestodistinguishbetweenbonaandcoherentattack types,whileearlysharinglayerslearnajointrepresentationtodistinguishbonafromany genericattack. Thisworkmakesthefollowingcontributions: Amongthetothetaskoffaceattackdetectionon 25 attacktypesacross 3 attack categories:adversarialfaces,digitalfacemanipulation,andspoofs. Anovel uni f ace a ttack d etectionframework,namely UniFAD ,thatautomaticallyclus- terssimilarattacksandemploysamulti-tasklearningframeworktodetectdigitalandphys- icalattacks. Proposed UniFAD achievesSOTAdetectionperformance,TDR= 94 : 73% @ 0 : 2% FDRon alargefakefacedataset,namely GrandFake .Tothebestofourknowledge, GrandFake is thelargestfaceattackdatasetstudiedinliteratureintermsofthenumberofdiverseattack types. Proposed UniFAD allowsforfurtheroftheattackcategories, i . e .,whether attacksareadversarial,digitallymanipulated,orcontainsphysicalartifacts,witha 122 StudyYear#BonaFides#Attacks#Types Adversarial UAP-D[167]2018 9 ; 95929 ; 8771 Goswami etal. [176]2019 16 ; 68550 ; 0553 Agarwal etal. [182]2020 24 ; 04272 ; 1263 Massoli etal. [179]2020 169 ; 3961 M 6 FaceGuard[19]2020 507 ; 6473 M 6 DigitalManip. Zhou etal. [214]2018 2 ; 0102 ; 0102 Yang etal. [215]2018 241( I ) = 49( V )252( I ) = 49( V )1 DeepFake[216]2018 620( V )1 FaceForensics++[216]2019 1 ; 000( V )3 ; 000( V )3 FakeSpotter[217]2019 6 ; 0005 ; 0002 DFFD[20]2020 58 ; 703240 ; 3367 Phys.Spoofs Replay-Attack[4]2012 200( V )1 ; 000( V )3 MSUMFSD[85]2015 160( V )280( V )3 OuluNPU[2]2017 990( V )3 ; 960( V )4 SiW[218]2018 1 ; 320( V )3 ; 158( V )6 SiW-M[1]2019 660( V )960( V )13 GrandFake(ours) 2021 341 ; 738447 ; 67425 Table4.1Faceattackdatasetswithno.ofbonaimages,no.ofattackimages,andno.ofattack types.Here, I denotesimagesand V referstovideos. accuracyof 97 : 37% . 4.2RelatedWork IndividualAttackDetection. Earlyworkonfaceattackdetectionprimarilyfocusedononeor twoattacktypesintheirrespectivecategories.Studiesonadversarialfacedetection[166,176] primarilyinvolveddetectinggradient-basedattacks,suchasFGSM[34],PGD[219],andDeep- Fool[141].DeepFakeswereamongthestudieddigitalattackmanipulation[214Œ216],how- ever,generalizabilityoftheproposedmethodstoalargernumberofdigitalmanipulationattack typesisunsatisfactory[220].Majorityoffaceanti-smethodsfocusonprintandreplay attacks[2,66,85,89,91,91,92,94,112,123,135,218,221,222]. 123 Overtheyears,acleartrendintheincreaseofattacktypesineachcategorycanbeobservedin Tab.4.1.Sinceacommunityofattackersdedicatetheireffortstocraftnewattacks,itisimperative tocomprehensivelyevaluateexistingsolutionsagainstalargenumberofattacktypes. JointAttackDetection. Recentstudieshaveusedmultipleattacktypesinordertodefendagainst faceattacks.For e . g .,FaceGuard[19]proposedageneralizabledefenseagainst 6 adversarialattack types.TheDiverseFakeFaceDataset(DFFD)[20]includes 7 digitalmanipulationattacktypes. Inthespoofattackcategory,recentstudiesfocusondetecting 13 spooftypes. 
Majorityoftheworkstacklingmultipleattacktypesposethedetectionasabinaryclass cationproblemwithasinglenetworklearningajointfeaturespace.Forsimplicity,wereferto suchanetworkarchitectureas JointCNN .Forinstance,itiscommoninadversarialfacedetection totrainaJointCNNwithbonafacesandadversarialattackssynthesizedbyagen- erativenetwork[12,13,19,184].Ontheotherhand,majorityoftheproposeddefensesagainst digitalmanipulation,apre-trainedJointCNN( e . g .,Xception[223])onbonafaces andallavailabledigitalmanipulationattacks[20,77,217].Duetotheavailabilityofawidevariety ofphysicalspoofartifactsinfacedatasets( e . g .,eyeglasses,printandreplayinstru- ments,masks, etc )alongwithevidentcuesfordetectingthem,studiesonanti-spoofsaremore sophisticated.TheassociatedJointCNNemployseitherauxiliarycues,suchasdepthmapand heartpulsesignals(rPPG)[88,127,218],oraficompactnessfllosstopreventov[61,224]. RecentlyStehouwer etal. [225]attempttolearnaspoofdetectorfromimageryofgenericob- jectsandapplyittofaceWhilejointlydetectingmultipleattacktypesispromising, detectingattacktypes across differentcategoriesisoftheutmostimportance.Anearlyattempt proposedadefenseagainst4attacktypes( 3 spoofsand 1 digitalmanipulation)[224].Tothebest ofourknowledge,wearethetoattemptdetecting 25 attacktypesacross 3 categories. Multi-taskLearning. Inmulti-tasklearning(MTL),atask, T i isusuallyaccompaniedbyatrain- ingdataset, D tr consistingof N t trainingsamples, i . e ., D tr = f x tr i ;y tr i g N tr i =1 ,where x tr i 2 R isthe i th trainingsamplein T i and y t i isitslabel.MostMTLmethodsrelyontasks[226Œ228]. Crawshaw etal. [229]summarizevariousworksonMLTwithCNNs.Inthiswork,wepropose 124 Figure4.2(a)Detectionperformance(TDR@ 0 : 2% FDR)indetectingeachattacktypebythe proposed UniFAD (purple)andthedifferenceinTDRfromthebestfusionscheme,LightGBM[36] (pink).(b)Cosinesimilaritybetweenmeanfeaturesfor 25 attacktypesextractedby JointCNN .(c) Examplesofattacktypesfrom 4 differentclustersvia k -meansclusteringonJointCNNfeatures. Attacktypesinpurple,blue,andreddenotespoofs,adversarial,anddigitalmanipulationattacks, respectively. aMTLframeworkinanextremesituationwhereonlyasingletaskisavailable(bonavs. 25 attacktypes)andutilize k -meansclusteringtoconstructnewauxiliarytasksfrom D tr .Arecent studyalsoutilized k -meansforconstructingnewtasks,however,theirapproachutilizesameta- learningframeworkwherethegroupsthemselvescanalterthroughouttraining[230].Weshow thatthisisproblematicforfaceattackssinceattacksthatsharesimilarcharacteristicsshouldbe trainedjointly.Instead,weproposeanewattackdetectionframeworkthatutilizes k - meanstopartitionthe 25 attackstypes,andthenlearnssharedandrepresentations todistinguishthemfrombona 4.3DissectingPrevailingDefenseSystems 4.3.1Datasets Inordertodetect 25 attacktypes( 6 adversarial, 6 digitalmanipulation,and 13 spoofs),wepropose the GrandFake dataset,anamalgamationofmultiplefaceattackdatasetsfromeachcategory.We 125 provideadditionaldetailsof GrandFake inSec.4.5.1. 4.3.2DrawbackofJointCNN Considerthediversityintheavailableattacks:fromimperceptibleadversarialperturbationsto digitalmanipulationattacks,bothofwhichareentirelydifferentfromphysicalprintattacks(hard surface,glossy,2D).Evenwithinthespoofcategory,characteristicsofmaskattacksarequite differentfromreplayattacks.Inaddition,discriminativecuesforsomeattacktypesmaybeob- servedinhigh-frequencydomain( e . g .,defocusedblurriness,chromaticmoment),whileothers exhibitlow-frequencycues( e . 
g .,colordiversityandspecularForthesereasons,learn- ingacommonfeaturespacetodiscriminateallattacktypesfrombonaischallenginganda JointCNNmayfailtogeneralizewellevenonattacktypesseenduringtraining. WedemonstratethisbytrainingaJointCNNonthe 25 attacktypesin GrandFake dataset. Wethencomputean attacksimilaritymatrix betweenthe 25 types(seeFig.4.2(b)).Themeanfea- tureforeachattacktypeiscomputedonavalidationsetcomposedof 1 ; 000 imagesperattack. Wethencomputethepairwisecosinesimilaritybetweenmeanfeaturesfromallattackpairs.From Fig.4.2,wenotethatphysicalattackshavelittlecorrelationwithadversarialattacksandtherefore, learningthemjointlywithinacommonfeaturespacemaydegradedetectionperformance. AlthoughprevailingJointCNN-baseddefenseachievenearperfectdetectionwhentrainedand evaluatedontherespectiveattacktypesinisolation,weobserveadegradedperfor- mancewhentrainedandtestedonall 3 attackcategoriestogether(seeTab.4.2).Inotherwords, evenwhenaprevailingSOTAdefensesystemistrainedonall 3 categories,itmayleadtodegraded performanceontesting. 4.3.3UnifyingMultipleJointCNNs Anotherpossibleapproachistoconsiderensembletechniques;insteadofusingasingleJointCNN detector,wecanfusedecisionsfrommultipleindividualdetectorsthatarealreadyexpertsindistin- guishingbetweenbonaandattacksfromtheirrespectiveattackcategory.GiventhreeSOTA 126 detectors,oneperattackcategory,weperformacomprehensiveevaluationonparallelandsequen- tialscore-levelfusionschemes. Inourexperiments,wethat,indeed,fusingscore-leveldecisionsfromsingle-categoryde- tectorsoutperformsasingleSOTAdefensesystemtrainedonallattacktypes.Notethateffortsin utilizingprevailingdefensesystemsrelyontheassumptionthatattackcategoriesareindependent ofeachother.However,Fig.4.2showsthatsomedigitalmanipulationattacks,suchasSTGAN andStyleGAN,are morecloselyrelated tosomeoftheadversarialattacks( e . g .,AdvFaces,GFLM, andSemantic)thanotherdigitalmanipulationtypes.Thisislikelybecauseallvemethodsuti- lizeaGANtosynthesizetheirattacksandmaysharesimilarattackcharacteristics.Therefore, aSOTAadversarialdetectorandaSOTAdigitalmanipulationdetectormayindividuallyexcelat theirrespectivecategories,butmaynotprovidecomplementarydecisionswhenfused.Insteadof trainingdetectorsongroupswithmanuallyassignedsemantics( e . g .,adversarial,digitalmanipu- lation,spoofs),itisbettertotrainJointCNNsoncoherentattacks.Inaddition,utilizingdecisions frompre-trainedJointCNNsmaytendtoovtotheattackcategoriesusedfortraining. 4.4ProposedMethod Weproposeanewmulti-tasklearningframeworkforAttackDetection,namely UniFAD , bytraininganend-to-endnetworkforimprovedphysicalanddigitalfaceattackdetection.In particular,a k -meansaugmentationmoduleisutilizedtoautomaticallyconstructauxiliarytasks toenhancesingletasklearning(suchasaJointCNN).Then,ajointmodelisdecomposedinto afeatureextractor(sharedlayers) F thatissharedacrossalltasks,andbranches foreachauxiliarytask.Fig.4.3illustratestheauxiliarytaskcreationandthetrainingprocess of UniFAD . 127 Figure4.3Anoverviewoftraining UniFAD intwostages.Stage 1 automaticallyclusterscoherent attacktypesinto T groups.Stage 2 consistsofaMTLframeworkwhereearlylayerslearngeneric attackfeatureswhile T brancheslearntodistinguishbonafromcoherentattacks. 
4.4.1 Problem Definition

Let the "main task" be defined as the overall objective of a unified attack detector: given an input image, x, assign a score close to 0 if x is bona fide or close to 1 if x is any of the available face attack types. We are also given a labeled training set, D_tr. Prevailing defenses follow a single task learning approach where the main task is adopted to be the ultimate training objective. In order to avoid the shortcomings of a JointCNN and of the unification of multiple JointCNNs, we use D_tr to automatically construct multiple auxiliary tasks {T_t}_{t=1}^{T}, where T_i is the i-th cluster of coherent attack types. If the auxiliary tasks are appropriately constructed, jointly learning these tasks along with the main task should improve attack detection compared to a single task learning approach.

4.4.2 Automatic Construction of Auxiliary Tasks

One way to construct auxiliary tasks is to train a separate binary JointCNN on each attack type. Such fine-grained partitioning massively increases the computational burden (e.g., training and testing 25 JointCNNs). Other simple partitioning methods, such as random partitioning, are likely to cluster uncorrelated attacks. On the other hand, clustering in the pixel-space is also unappealing due to the poor correlation between distances in the pixel-space, and clustering in the high-dimensional space is challenging [231]. Therefore, we require a reasonable alternative to manual inspection of the attack similarity matrix in Fig. 4.2 to partition the attack types into appropriate clusters.

Fortunately, we already have a JointCNN trained via a single task learning framework that can extract salient representations. Thus, we can map the data {x} into the JointCNN's embedding space Z, producing {z}. We can then utilize a traditional clustering algorithm, k-means, which takes a set of feature vectors as input and clusters them into k distinct groups based on a geometric constraint. Specifically, for each attack type, we compute the mean feature. We then utilize k-means clustering to partition the L features into T (T ≤ L) sets, P = {P_1, P_2, ..., P_T}, such that the within-cluster sum of squares (WCSS) is minimized,
$$\underset{\mathcal{P}}{\arg\min}\ \sum_{i=1}^{T}\sum_{z\in\mathcal{P}_i}\|z - \mu_i\|^2,$$ (4.4.1)
where z represents a mean feature for an attack type and μ_i is the mean of the features in P_i. Fig. 4.2(c) shows an example of clustering the 25 attack types of GrandFake.

4.4.3 Multi-Task Learning with Constructed Tasks

With a multi-task learning framework, we learn coherent attack types jointly, while uncorrelated attacks are learned in their own feature spaces. We construct T "branches" where each branch is a neural network trained on a binary classification problem (i.e., an auxiliary task). The learning objective of each branch, B_t, is to minimize
$$\mathcal{L}_{aux_t} = \mathbb{E}_x\left[\log \mathcal{B}_t(x_{bf})\right] + \mathbb{E}_x\left[\log\left(1 - \mathcal{B}_t\left(x_{fake}^{\mathcal{P}_t}\right)\right)\right],$$ (4.4.2)
where x_bf denotes bona fide images and x_fake^{P_t} denotes face attacks corresponding to the attack types in the partition P_t.

4.4.4 Parameter Sharing

Early Sharing. We adopt a hard parameter sharing module which learns a common feature representation for distinguishing between bona fides and attacks prior to the auxiliary task learning branches. Baxter [232] demonstrated that the shared parameters have a lower risk of overfitting than the task-specific parameters. Therefore, adopting early convolutional layers as a pre-processing step prior
Theearlysharedlayersandthedecisionlayerarelearnedviaabinarycross-entropyloss, L shared = E x [ log FC ( x bf )]+ E x [ log (1 FC ( x fake ))] ; (4.4.3) betweenbonaandallavailableattacktypes. 4.4.5TrainingandTesting Theentirenetworkistrainedinanend-to-endmannerbyminimizingthefollowingcompositeloss, L UniFAD = L shared + T X t =1 L aux t : (4.4.4) The L shared lossisbackpropagatedthroughout UniFAD ,while L aux t isonlyresponsibleforupdat- ingtheweightsofthebranch, B t ,andthelayer.Fortheforwardandbackward passesof L shared ,anequalnumberofbonaandattacksamplesareusedfortraining.Onthe otherhand,fortrainingeachbranch, B t ,wesampletheequalnumberofbonaandequal numberofattackimagesfromtheattackpartition P t . AttackDetection. Duringtesting,animagepassesthroughthesharedlayersandtheneach branchof UniFAD outputsadecisionwhethertheimageisbona(valuescloseto 0 )oranattack (closeto 1 ).Thedecisionlayeroutputsthedecisionscore.Unlessstatedotherwise,we usethedecisionscorestoreportperformance. Attack Onceanattackisdetected, UniFAD canautomaticallyclassifytheattack 130 typeandcategory.Forall L attacktypesinthetrainingset,weextractintermediate 128 -dim featurevectorsfrom T branches.Thefeaturesarethenconcatenatedandthemeanfeatureacross all L attacktypesiscomputed,suchthat,wehave L featurevectorsofsize T 128 .Foradetected attack,Cosinesimilarityiscomputedbetweenthetestingsample'sfeaturevectorandthemean trainingfeaturesfor L types.Thepredictedattacktypeistheonewiththehighestsimilarityscore. 4.5ExperimentalResults 4.5.1ExperimentalSettings Dataset. GrandFake consistsof 25 faceattacksfrom 3 attackcategories.Bothbonaandfake facesareofvaryingqualityduetodifferentcaptureconditions. BonaFideFaces. WeutilizefacesfromCASIA-WebFace[10],LFW[8],CelebA[17],SiW- M[1],andFFHQ[18]datasetssincethefacesthereincoverabroadvariationinrace,age,gender, pose,illumination,expression,resolution,andacquisitionconditions. AdversarialFaces. WecraftadversarialfacesfromCASIA-WebFace[10]andLFW[8]via 6 SOTAadversarialattacks:FGSM[34],PGD[219],DeepFool[141],AdvFaces[16],GFLM[32], andSemanticAdv[186].TheseattackswerechosenfortheirsuccessinevadingSOTAAFRsys- temssuchasArcFace[9]. DigitalManipulation. Therearefourbroadtypesofdigitalfacemanipulation:identityswap, expressionswap,attributemanipulation,andentirelysynthesizedfaces[20].Weuseallclipsfrom FaceForensics++[77],includingidentityswapbyFaceSwapandDeepFake 1 ,andexpressionswap byFace2Face[76].WeutilizetwoSOTAmodels,StarGAN[78]andSTGAN[79],togenerate attributemanipulatedfacesinCeleba[17]andFFHQ[18].WeusethepretrainedStyleGAN2 model 2 tosynthesize 100 Kfakefaces. PhysicalSpoofs. 
Weutilizethepubliclyavailable SiW-M dataset[1],comprising 13 spoof 1 https://github.com/deepfakes/faceswap 2 https://github.com/NVlabs/stylegan2 131 TDR(%)@0.2%FDRYearProposedForAdv.Dig.Man.Phys.OverallTime(ms) w/oRe-train FaceGuard[19] 2020 Adversarial 99 : 9122 : 2800 : 5829 : 6401 : 41 FFD[20] 2020 DigitalManipulation 09 : 4994 : 5701 : 2534 : 5511 : 57 SSRFCN[21] 2020 Spoofs 00 : 2500 : 7693 : 1922 : 7102 : 22 MixNet[233] 2020 Spoofs 00 : 3609 : 8378 : 2121 : 1212 : 47 Baselines FaceGuard[19] 2020 Adversarial 99 : 8641 : 5604 : 3556 : 6901 : 41 FFD[20] 2020 DigitalManipulation 76 : 0691 : 3287 : 4368 : 2511 : 57 SSRFCN[21] 2020 Spoofs 08 : 2327 : 6789 : 1943 : 2602 : 22 One-class[61] 2020 Spoofs 04 : 8145 : 9679 : 3239 : 4007 : 92 MixNet- UniFAD 2021 All 82 : 3391 : 5994 : 6090 : 0712 : 47 FusionSchemes Cascade[40] 88 : 3981 : 9869 : 1977 : 4605 : 16 Min-score 03 : 6511 : 0800 : 4307 : 2216 : 14 Median-score 10 : 8742 : 3347 : 1939 : 4816 : 12 Mean-score 14 : 5347 : 1861 : 3238 : 2316 : 12 Max-score 85 : 3261 : 9356 : 8773 : 8916 : 13 Sum-score 74 : 9358 : 0150 : 3469 : 2116 : 11 LightGBM[36] 76 : 2581 : 2888 : 5285 : 9717 : 92 ProposedUniFAD 2021 All 92 : 5697 : 2198 : 7694 : 7302 : 59 Table4.2Detectionaccuracy(TDR(%)@ 0 : 2% FDR)on GrandFake dataset.Resultsonfusing FaceGuard[19],FFD[20],andSSRFCN[21]arealsoreported.Wereporttimetakentodetecta singleimage(onaNvidia2080TiGPU). types.Comparedwithotherspoofdatasets(Tab.4.1),SiW-Misdiverseinspoofattacktypes, environmentalconditions,andfaceposes. Protocol. Asiscommonpracticeinfacerecognitionliterature,bonaandattacksfrom CASIA-WebFace[10]areusedfortraining,whilebonaandattacksforLFW[8]arese- questeredfortesting.Thebonaandattacksfromotherdatasetsaresplitinto 70% training, 5% validation,and 25% testing. Implementation. UniFAD isimplementedinTw,andtrainedwithaconstantlearning rateof (1 e 3 ) withamini-batchsizeof 180 .Theobjectivefunction, L UniFAD ,isminimized usingAdamoptimizerfor 100 K iterations.Dataaugmentationduringtraininginvolvesrandom horizontalwithaprobabilityof 0 : 5 . Metrics. Studiesondifferentattackcategoriesprovidetheirownmetrics.Followingtherec- ommendationfromIARPAODINprogram,wereporttheTDR@ 0 : 2% FDR 3 andtheoverall detectionaccuracy(inAddendum(Sec.4.6). 3 https://www.iarpa.gov/index.php/research-programs/odin 132 4.5.2ComparisonwithIndividualSOTADetectors Inthissection,wecomparetheproposed UniFAD toprevailingfaceattackdetectorsviapublicly availablerepositoriesprovidedbytheauthors(seeAddendum(Sec.4.6). WithoutRe-training. InTab.4.2,wereporttheperformanceof 4 pre-trainedSOTAde- tectors.Thesebaselineswerechosensincetheyreportthebestdetectionperformanceindatasets withlargestnumbersofattacktypesintheirrespectivecategories(seeTab.4.1).Wealsoreport theperformanceofageneralizablespoofdetectorwithsub-networks,namely MixNet .MixNet semanticallyassignsspoofsintogroups(print,replay,andmasks)fortrainingeachsub-network withoutanysharedrepresentation.Wethatpre-trainedmethodsindeedexcelintheir attackcategories,however,generalizationperformanceacrossall 3 categoriesdeterioratescatas- trophically. WithRe-training. Afterre-trainingthe 4 SOTAdetectorsonall 25 attacktypes,wethat theygeneralizebetteracrosscategories.FaceGuard[19],FFD[20],SSRFCN[21],andOne- Class[61],allemployaJointCNNfordetectingattacks.Unsurprisingly,thesedefensesperform wellonsomeattackcategories,whilefailingonothers. 
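The TDR @ 0.2% FDR metric used throughout this chapter can be computed directly from detector scores; the sketch below chooses the threshold from bona fide scores and reports the fraction of attacks detected. The score arrays are placeholders, and the helper name tdr_at_fdr is ours, not from the released code.

```python
import numpy as np

def tdr_at_fdr(bonafide_scores, attack_scores, fdr=0.002):
    """True Detection Rate at a fixed False Detection Rate.

    Scores follow the convention used above: higher means more likely an
    attack. The threshold is chosen so that at most `fdr` of the bona fide
    images are falsely flagged as attacks."""
    bonafide_scores = np.sort(np.asarray(bonafide_scores))
    idx = int(np.ceil((1.0 - fdr) * len(bonafide_scores))) - 1
    threshold = bonafide_scores[min(max(idx, 0), len(bonafide_scores) - 1)]
    return 100.0 * np.mean(np.asarray(attack_scores) > threshold)

bona = np.random.uniform(0.0, 0.5, 100000)    # placeholder bona fide scores
attack = np.random.uniform(0.3, 1.0, 100000)  # placeholder attack scores
print(f"TDR @ 0.2% FDR: {tdr_at_fdr(bona, attack):.2f}%")
```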
Forafaircomparison,wealsomodifyMixNet,namely MixNet-UniFAD suchthatclustersare assignedvia k -meanswith 4 branches.Incontrastto MixNet-UniFAD , UniFAD (i)employsearly sharedlayersforgenericattackcues,and(ii)eachbranchlearnstodistinguishbetweenbona andattacktypes.MixNet,ontheotherhand,assignsabonalabel( 0 )toattacktypes outsidearespectivebranch'spartition.Thisnegativelyimpactsnetworkconvergence.Overall,we that UniFAD outperforms MixNet-UniFAD withTDR 90 : 07% ! 94 : 73% @ 0 : 2% FDR. 4.5.3ComparisonwithFusedSOTADetectors WealsocomprehensivelyevaluatedetectionperformanceonfusingSOTAdetectors.Weutilize threebestperformingdetectorsfromeachattackcategory,namelyFaceGuard[19],FFD[20],and SSRFCN[21].InspiredbytheViola-Jonesobjectdetection[40],weadoptasequentialensemble 133 Figure4.4Confusionmatrixrepresentingtheaccuracyof UniFAD inidentifyingthe 25 attacktypes.Majorityofoccurwithintheattackcategory.Darkervalues indicatehigheraccuracy.Overall, UniFAD achieves 75 : 81% and 97 : 37% accuracyin identifyingattacktypesandcategories,respectively.Purple,blue,andreddenotespoofs,adver- sarial,anddigitalmanipulationattacks,respectively. technique,namelyCascade[40],whereaninputprobeispassedthrougheachdetectorsequen- tially.Wealsoevaluate 5 parallelscorefusionrules(min,mean,median,max,andsum)and aSOTAensembletechnique,namelyLightGBM[36].MoredetailsareprovidedinAddendum (Sec.4.6.Indeed,weobserveanoverheadindetectionspeedcomparedtotheindividualdetec- torsinisolation,however,cascade,max-scorefusionandLightGBM[36]canenhancetheoverall detectionperformancecomparedtotheindividualdetectorsatthecostofslowerinferencespeed. Sincetheindividualdetectorsstilltrainwithincoherentattacktypes,wethatproposed UniFAD outperformsalltheconsideredfusionschemes. InFig.4.2(a),weshowtheperformancedegradationofLightGBM[36],thebestfusingbase- line,w.r.t. UniFAD .Weobservethatamong 4 clusters,thelast 2 havetheoveralllargestdegrada- tion.Interestingly,these 2 clustersaretheonlyonesincludingattacktypesacrossdifferentattack categories,learnedviaour k -meanclustering.Inotherwords,thecross-categoryattackstypes withinabrancheachother,leadingtothelargestperformancegainover[36].Thisfurther demonstratesthenecessityandimportanceofadetectionschemeŠthemoreattacktypes 134 thedetectorsees,themorelikelyitwouldnourishamongeachotherandbeabletogeneralize. 4.5.4Attack WeclassifytheexactattacktypeandcategoriesusingthemethoddescribedinSec.4.4.5.In Fig.4.4,wethat UniFAD canpredicttheattacktypewith 75 : 81% accuracy. Whilepredictingtheexacttypemaybechallenging,wehighlightthatmajorityofthemisclassi- occurswithinattack'scategory.Withouthumanintervention,once UniFAD isdeployed inAFRpipelines,itcanpredictwhetheraninputimageisadversarial,digitallymanipulated,or containsspoofartifactswith 97 : 37% accuracy. 4.5.5AnalysisofUniFAD Architecture. Weablateandanalyzeourarchitecture. RatioofSharedLayers. Ourbackbonenetworkconsistsofa 4 -layerCNN.InFig.4.5a,we reportthedetectionperformancewhenweincorporate 0 , 1 ( 25% ), 2 ( 50% ), 3 ( 75% ),and 4 ( 100% ) layersforearlysharing.Weobserveatrade-offbetweendetectionperformanceandthenumberof earlylayers:toomanyreducestheeffectsoflearningfeaturesviabranching,whereas, lessnumberofsharedlayersinhibitsthenetworkfromlearninggenericfeaturesthatdistinguish anyattackfrombonaWethatanevensplitresultsinsuperiordetectionperformance. NumberofBranches. 
In Fig. 4.2(a), we show the performance degradation of LightGBM [36], the best fusion baseline, w.r.t. UniFAD. We observe that, among the 4 clusters, the last 2 have the largest overall degradation. Interestingly, these 2 clusters are the only ones including attack types across different attack categories, learned via our k-means clustering. In other words, the cross-category attack types within a branch benefit each other, leading to the largest performance gain over [36]. This further demonstrates the necessity and importance of a unified detection scheme: the more attack types the detector sees, the more likely they are to nourish one another and the better the detector generalizes.

4.5.4 Attack Classification

We classify the exact attack type and category using the method described in Sec. 4.4.5. In Fig. 4.4, we find that UniFAD can predict the attack type with 75.81% classification accuracy. While predicting the exact type may be challenging, we highlight that the majority of the misclassifications occur within the attack's own category. Without human intervention, once UniFAD is deployed in AFR pipelines, it can predict whether an input image is adversarial, digitally manipulated, or contains spoof artifacts with 97.37% accuracy.

Figure 4.4 Confusion matrix representing the classification accuracy of UniFAD in identifying the 25 attack types. The majority of misclassifications occur within the attack category. Darker values indicate higher accuracy. Overall, UniFAD achieves 75.81% and 97.37% classification accuracy in identifying attack types and categories, respectively. Purple, blue, and red denote spoofs, adversarial, and digital manipulation attacks, respectively.

4.5.5 Analysis of UniFAD

Architecture. We ablate and analyze our architecture.

Ratio of Shared Layers. Our backbone network consists of a 4-layer CNN. In Fig. 4.5a, we report the detection performance when we incorporate 0, 1 (25%), 2 (50%), 3 (75%), and 4 (100%) layers for early sharing. We observe a trade-off between detection performance and the number of early layers: too many shared layers diminish the effect of learning branch-specific features, whereas too few inhibit the network from learning generic features that distinguish any attack from bona fides. We find that an even split results in superior detection performance.

Number of Branches. In Fig. 4.5b, we vary the number of branches (auxiliary tasks constructed via k-means) and report the detection performance. Indeed, increasing the number of branches via additional clusters enhances detection performance. However, the performance saturates after 4 branches. UniFAD with 4 branches achieves TDR = 94.73% @ 0.2% FDR, whereas 5 and 6 branches achieve TDRs of 94.33% and 94.62% @ 0.2% FDR, respectively. For T = 5, we notice that StyleGAN is isolated from Cluster 3 (see Fig. 4.2(c)) into a separate cluster. Learning to discriminate StyleGAN separately may offer no significant advantage over learning it jointly with FGSM and PGD. Thus, we choose T = 4 due to lower network complexity and higher inference efficiency.
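The branches above correspond to auxiliary tasks obtained by k-means clustering of the 25 attack types. The sketch below illustrates one way such a partition can be computed; it is a simplified illustration (not our training code) and assumes attack_feats maps each attack type to a representative feature vector, e.g., a mean embedding produced by the early shared layers.

import numpy as np
from sklearn.cluster import KMeans

def cluster_attack_types(attack_feats, n_branches=4, seed=0):
    """Group attack types into n_branches coherent auxiliary tasks via k-means."""
    names = sorted(attack_feats)
    feats = np.stack([attack_feats[name] for name in names])
    labels = KMeans(n_clusters=n_branches, random_state=seed).fit_predict(feats)
    partition = {b: [] for b in range(n_branches)}
    for name, b in zip(names, labels):
        partition[b].append(name)
    return partition  # attack types assigned to each branch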
Figure 4.5 Detection performance with respect to varying ratio of shared layers (left) and number of branches (right). Our proposed architecture uses 50% shared layers with 4 branches.

Model             Shared Layers  Branching  k-Means  Overall TDR (%) @ 0.2% FDR
JointCNN          X                                  63.89
B_Semantic                       X                   86.17
B_Random                         X                   53.95 ± 8.02
B_kMeans                         X          X        89.67
Shared Semantic   X              X                   92.44
Proposed          X              X          X        94.73

Table 4.3 Ablation study over components of UniFAD. Branching via "B_Semantic", "B_Random", and "B_kMeans" refers to partitioning attack types by their semantic categories, randomly, and via k-means, respectively. "Shared Semantic" includes shared layers prior to branching.

Branch Generalizability. In Fig. 4.6, scores from the 4 branches are used to compute the detection performance on attack types within the respective partitions and on those outside a branch's partition (see Fig. 4.2(b)). Since attack types outside a branch's partition are purportedly incoherent, we see a drop in performance, validating the drawback of JointCNN. We find that the lowest performing branch, Branch 4, also exhibits the best generalization performance across the other attack types. This is likely because learning to distinguish bona fides from the imperceptible perturbations of FGSM and PGD and the minute synthetic noises of StyleGAN yields a tighter decision boundary, which may contribute to better generalization across digital attacks. Branch 1, in contrast, does not directly aid in detecting digital attacks.

Figure 4.6 Detection performance on attack types within and outside a branch's partition. Performance drops on attacks outside the partition as they may not have any correlation with within-partition attack types.

Ablation Study. In Tab. 4.3, we conduct a component-wise ablation study over UniFAD. We study different partitioning techniques to group the 25 attack types. We employ semantic partitioning, B_Semantic, where attack types are clustered into the 3 categories. Another technique is to split the 25 attack types into 4 clusters randomly, B_Random; we report the mean and standard deviation across 3 trials of random splitting. We also report the performance of clustering via k-means. We find that both B_Semantic and B_kMeans outperform JointCNN. Thus, learning separate feature spaces via MTL for disjoint attack types can improve overall detection compared to a JointCNN. We also find that incorporating early shared layers into B_Semantic, namely B_SharedSemantic, can further improve detection from 86.17% to 92.44% TDR @ 0.2% FDR. However, as we observed in Fig. 4.2, even within semantic categories, some attack types may be incoherent. By automatic construction of auxiliary tasks with k-means clustering and a shared representation (Proposed), we can further enhance the detection performance to TDR = 94.73% @ 0.2% FDR.

Failure Cases. Fig. 4.7 shows a few failure cases. The majority of the failure cases for digital attacks are due to imperceptible perturbations. In contrast, failure to detect spoofs can likely be attributed to the subtle nature of transparent masks, blurring, and illumination changes.

Figure 4.7 Example cases where UniFAD fails to detect face attacks. Final detection scores along with scores from each of the four branches (in [0, 1]) are given below each image. Scores closer to 0 indicate bona fides. Branches responsible for the respective cluster are highlighted in bold.

4.6 Addendum

Here, we provide additional details on the proposed GrandFake dataset, UniFAD, and the baselines.

4.6.1 GrandFake Dataset

The GrandFake dataset is composed of several widely adopted face datasets for face recognition, face attribute manipulation, and face synthesis. Details on the source datasets along with the training and testing splits are provided in Tab. 4.4. We ensure that there is no identity overlap between any of the training and testing splits as follows:

1. We removed 84 subjects in CASIA-WebFace that overlap with LFW. In addition, CASIA-WebFace is only used for training, while LFW is only used for testing.

2. SiW-M comprises high-resolution photos of non-celebrity subjects without any identity overlap with other datasets. Training and testing splits are composed of videos pertaining to different identities.

3. Identities in the CelebA training set are different from those in testing.

4. FFHQ comprises high-resolution photos of non-celebrity subjects on Flickr. FFHQ is utilized solely for bona fides in order to add diversity to the quality of face images.

4.6.2 Implementation Details

All the models in this chapter are implemented using TensorFlow r1.12. A single NVIDIA GeForce GTX 2080 Ti GPU is used for training UniFAD on the GrandFake dataset.

Preprocessing. All face images are passed through the MTCNN face detector [41] to detect 5 facial landmarks (two eyes, nose, and two mouth corners). Then, a similarity transformation is used to normalize the face images based on the five landmarks. After transformation, the images are resized to 160 × 160. Before passing into UniFAD and the baselines, each pixel in the RGB image is normalized to [-1, 1] by subtracting 128 and dividing by 128. All the testing images in this chapter are from the identities in the test dataset.

Network Architecture. The backbone network, JointCNN, comprises a 4-layer binary CNN: d32, d64, d128, d256, fc128, fc1, where dk denotes a 4 × 4 convolutional layer with k filters and stride 2, and fcN refers to a fully-connected layer with N neuron outputs. The proposed UniFAD with 4 branches is composed of: Early Layers: d32, d64; Branch 1: d128, d256, fc128, fc1; Branch 2: d128, d256, fc128, fc1; Branch 3: d128, d256, fc128, fc1; Branch 4: d128, d256, fc128, fc1.
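For concreteness, the layer notation above can be realized as in the following sketch. This is an illustrative Keras re-implementation rather than our original TensorFlow r1.12 code; the activation functions and the sigmoid output are assumptions, and only the layer sizes and strides follow the description above.

import tensorflow as tf
from tensorflow.keras import layers

def d(k):
    # dk block: 4 x 4 convolution with k filters and stride 2 (activation assumed).
    return layers.Conv2D(k, kernel_size=4, strides=2, padding="same", activation="relu")

def build_unifad(n_branches=4, input_shape=(160, 160, 3)):
    inp = layers.Input(shape=input_shape)
    # Early shared layers: d32, d64.
    x = d(32)(inp)
    x = d(64)(x)
    outputs = []
    for _ in range(n_branches):
        # Each branch: d128, d256, fc128, fc1 (per-branch attack score).
        b = d(128)(x)
        b = d(256)(b)
        b = layers.Flatten()(b)
        b = layers.Dense(128, activation="relu")(b)
        outputs.append(layers.Dense(1, activation="sigmoid")(b))
    return tf.keras.Model(inputs=inp, outputs=outputs)

# model = build_unifad()  # four per-branch scores for a 160 x 160 x 3 face image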
                          #Total Samples  #Training  #Validation  #Testing
Bona Fides
  CASIA [10]              10,575          10,075     500          0
  CelebA [17]             193,506         134,954    500          58,052
  LFW [8]                 9,164           0          0            9,164
  FFHQ [18]               20,999          14,200     500          6,299
  SiW-M [1]               107,494         74,746     500          32,248
  Total                   341,738         233,975    2,000        105,763
Attacks
 Adversarial
  FGSM                    19,739          10,495     80           9,164
  PGD                     19,739          10,495     80           9,164
  DeepFool                19,739          10,495     80           9,164
  AdvFaces                19,739          10,495     80           9,164
  GFLM                    17,946          8,702      80           9,164
  Semantic                19,739          10,495     80           9,164
 Digital Manipulation
  DeepFake                18,165          10,393     80           7,692
  Face2Face               18,204          10,385     80           7,739
  FaceSwap                14,492          8,299      80           6,113
  STGAN                   29,983          9,903      80           20,000
  StarGAN                 45,473          10,406     80           34,987
  StyleGAN                76,604          10,919     80           65,605
 Spoofs
  Cosmetic                2,638           1,766      80           792
  Impersonation           9,184           6,348      80           2,756
  Obfuscation             3,611           2,447      80           1,084
  HalfMask                10,486          7,260      80           3,146
  Mannequin               5,287           3,620      80           1,587
  PaperMask               2,550           1,705      80           765
  Silicone                5,038           3,446      80           1,512
  Transparent             11,451          7,935      80           3,436
  Print                   10,530          7,290      80           3,160
  PaperCut                13,178          9,144      80           3,954
  FunnyEye                23,470          16,349     80           7,041
  PaperGlass              18,563          12,914     80           5,569
  Replay                  12,126          8,408      80           3,638
  Total                   447,674         210,114    2,000        235,560

Table 4.4 Composition and statistics for the proposed GrandFake dataset. We also include the evaluation protocol for the seen attack scenario.

4.6.3 Digital Attack Implementation

Adversarial attacks are synthesized via publicly available author codes:

FGSM/PGD/DeepFool: https://github.com/tensorflow/cleverhans
AdvFaces: https://github.com/ronny3050/AdvFaces
GFLM: https://github.com/alldbi/FLM
SemanticAdv: https://github.com/AI-secure/SemanticAdv

Digital manipulation attacks are also generated via publicly available author codes:

DeepFake/Face2Face/FaceSwap: https://github.com/ondyari/FaceForensics/tree/original
STGAN: https://github.com/csmliu/STGAN
StarGAN: https://github.com/yunjey/stargan
StyleGAN-v2: https://github.com/NVlabs/stylegan2

4.6.4 Baseline Implementation

Individual Detectors. We evaluate all individual defense methods, except MixNet [233], via publicly available repositories provided by the authors. We provide the public links to the author codes below:

FaceGuard [19]: https://github.com/ronny3050/FaceGuard
FFD [20]: https://github.com/JStehouwer/FFD_CVPR2020
SSR-FCN [21]: https://github.com/ronny3050/SSRFCN
One-Class [61]: https://github.com/anjith2006/bob.paper.oneclass_mccnn_2019

Fusion of JointCNNs. In our work, we employ 5 parallel score-level fusion rules. For a testing input image, we extract three scores (in [0, 1]) from the three SOTA individual detectors (FaceGuard [19], FFD [20], and SSR-FCN [21]). The decision score is computed via the fusion rule operator, namely min, mean, median, max, or sum. LightGBM [36] is a tree-ensemble learning method where a Gradient Boosted Decision Tree is trained on the three scores (from the individual SOTA detectors) to output the final decision. We use Microsoft's LightGBM implementation: https://github.com/microsoft/LightGBM. We train the LightGBM model on the training set of GrandFake.
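A minimal sketch of this LightGBM score fusion is shown below. It is illustrative only: the hyperparameters are assumptions, and train_scores / train_labels stand for the (N, 3) detector-score matrix and the bona fide (0) / attack (1) labels from the GrandFake training set.

import numpy as np
import lightgbm as lgb

def train_score_fusion(train_scores, train_labels):
    """Train a gradient-boosted tree on the three detector scores."""
    model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)  # assumed settings
    model.fit(np.asarray(train_scores), np.asarray(train_labels))
    return model

# fused = model.predict_proba(test_scores)[:, 1]  # fused decision score in [0, 1]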
Method              Year  Proposed For          Metric  Adv.   Dig. Man.  Phys.  Overall
w/o Re-train
  FaceGuard [19]    2020  Adversarial           TDR     99.91  22.28      00.58  29.64
                                                Acc.    99.78  67.12      51.02  71.03
  FFD [20]          2020  Digital Manipulation  TDR     09.49  94.57      01.25  34.55
                                                Acc.    56.42  97.87      53.42  75.29
  SSR-FCN [21]      2020  Spoofs                TDR     00.25  00.76      93.19  22.71
                                                Acc.    50.01  50.93      96.12  69.11
  MixNet [233]      2020  Spoofs                TDR     00.36  09.83      78.21  21.12
                                                Acc.    50.43  55.98      85.47  61.26
Baselines
  FaceGuard [19]    2020  Adversarial           TDR     99.86  41.56      04.35  56.69
                                                Acc.    99.71  71.23      54.06  81.88
  FFD [20]          2020  Digital Manipulation  TDR     76.06  91.32      87.43  68.25
                                                Acc.    87.15  93.40      91.37  89.06
  SSR-FCN [21]      2020  Spoofs                TDR     08.23  27.67      89.19  43.26
                                                Acc.    54.02  69.18      90.91  83.41
  One-class [61]    2020  Spoofs                TDR     04.81  45.96      79.32  39.40
                                                Acc.    53.99  64.08      86.65  80.74
  MixNet-UniFAD     2021  All                   TDR     82.33  91.59      94.60  90.07
                                                Acc.    89.32  94.50      96.18  93.19
Fusion Schemes
  Cascade [40]      -     -                     TDR     88.39  81.98      69.19  77.46
                                                Acc.    91.33  89.17      79.92  85.16
  Min-score         -     -                     TDR     03.65  11.08      00.43  07.22
                                                Acc.    51.61  66.76      50.88  55.62
  Median-score      -     -                     TDR     10.87  42.33      47.19  39.48
                                                Acc.    55.12  59.58      57.44  59.22
  Mean-score        -     -                     TDR     14.53  47.18      61.32  38.23
                                                Acc.    55.69  54.19      73.87  55.92
  Max-score         -     -                     TDR     85.32  61.93      56.87  73.89
                                                Acc.    89.26  68.11      60.08  69.43
  Sum-score         -     -                     TDR     74.93  58.01      50.34  69.21
                                                Acc.    83.85  67.48      64.72  73.10
  LightGBM [36]     -     -                     TDR     76.25  81.28      88.52  85.97
                                                Acc.    84.19  89.46      94.56  90.56
Proposed
  UniFAD            2021  All                   TDR     92.56  97.21      98.76  94.73
                                                Acc.    95.18  98.32      98.96  96.89

Table 4.5 Detection performance (TDR (%) @ 0.2% FDR and Accuracy (%)) on the GrandFake dataset under the seen attack scenario.

4.6.5 Seen Attacks

Tab. 4.5 reports the detection performance (TDR (%) @ 0.2% FDR and accuracy (%)) of UniFAD and the baselines on the GrandFake dataset under the seen attack scenario. Training and testing splits are provided in Tab. 4.4. Overall, UniFAD outperforms all fusion schemes and baselines.

Figure 4.8 Training and testing splits for the generalizability study.

Method            Metric (%)  Fold 1  Fold 2  Fold 3  Mean   Std.
FaceGuard [19]    TDR         41.38   54.19   36.82   44.13  9.01
                  Acc.        58.42   64.19   55.74   59.45  4.32
FFD [20]          TDR         53.19   62.45   52.94   56.20  5.42
                  Acc.        66.15   69.33   67.86   67.78  1.59
SSR-FCN [21]      TDR         49.10   64.92   61.18   58.84  8.26
                  Acc.        60.07   72.77   69.83   66.57  6.64
MixNet-UniFAD     TDR         67.19   73.18   72.74   71.04  3.33
                  Acc.        75.64   79.40   78.73   77.93  2.00
LightGBM          TDR         51.65   65.73   67.91   61.76  8.83
                  Acc.        69.34   73.66   75.80   72.93  3.29
Proposed UniFAD   TDR         76.18   83.19   82.67   80.68  3.91
                  Acc.        85.35   89.62   85.88   86.95  2.32

Table 4.6 Generalization performance (TDR (%) @ 0.2% FDR and Accuracy (%)) on the GrandFake dataset under the unseen attack setting. Each fold comprises 8 unseen attacks from all 4 branches.

4.6.6 Generalizability to Unseen Attacks

Under this setting, we evaluate the generalization performance on 3 folds (see Fig. 4.8). The folds are computed as follows: we hold out 1/3 of the total attack types in a branch for testing and the remaining are used for training. For example, branch 1, consisting of 13 attack types, is randomly split such that we test on 4 unseen attack types, while the remaining 9 attack types are used for training. We perform 3 folds of such random splitting. In total, each fold consists of 17 seen and 8 unseen attacks. For LightGBM, we utilize scores from FaceGuard [19], FFD [20], and SSR-FCN [21], which are all trained only on the known attack types. We report the detection performance and the average and standard deviation across all folds in Tab. 4.6.

We find that branching-based methods, such as MixNet-UniFAD and the proposed UniFAD, outperform JointCNN-based methods such as FaceGuard [19], FFD [20], SSR-FCN [21], and LightGBM (a fusion of the three). The superiority of the proposed UniFAD under unseen attacks is evident. By incorporating branches with coherent attacks, removing some attack types within a branch does not drastically affect the generalization performance.

In addition to superior generalization performance on unseen attack types, the proposed UniFAD also reduces the gap between seen and unseen attacks. Overall, the proposed UniFAD achieves 94.73% and 80.68% TDR @ 0.2% FDR under the seen and unseen attack scenarios, respectively. That is, we observe a relative reduction in TDR of 15% under unseen attacks, compared to the second best method, namely MixNet-UniFAD, which has a relative reduction in TDR of 22% under unseen attacks.
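The fold construction described above can be sketched as follows. This is a simplified illustration (not the exact protocol code) and assumes branch_partition maps each of the 4 branches to its list of attack types, as produced by the k-means step.

import random

def make_fold(branch_partition, seed=0):
    """Hold out roughly one third of the attack types in every branch for testing."""
    rng = random.Random(seed)
    seen, unseen = [], []
    for types in branch_partition.values():
        types = list(types)
        rng.shuffle(types)
        n_unseen = max(1, len(types) // 3)  # at least one unseen type per branch (assumption)
        unseen.extend(types[:n_unseen])
        seen.extend(types[n_unseen:])
    return seen, unseen  # e.g., 17 seen and 8 unseen attack types per fold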
4.6.7 Attack Category Classification

In Fig. 4.9, we find that UniFAD can predict the attack category with 97.37% classification accuracy. We emphasize that the majority of the misclassifications occur within the digital attack space, that is, misclassifying adversarial attacks as digital manipulation attacks and vice-versa. Among spoofs, 1.7% of them are misclassified as digital manipulation attacks. The majority of these are makeup attacks, which have some correlation with digital manipulation attacks such as DeepFake, Face2Face, and FaceSwap (see Fig. 2 in the main paper). We posit that this is likely because cosmetic and impersonation attacks apply makeup to the eyebrows and cheeks, which may appear similar to ID-swapping methods such as DeepFake and Face2Face that also majorly alter the eyebrows and cheeks.

Figure 4.9 Confusion matrix representing the classification accuracy of UniFAD in identifying the 3 attack categories, namely adversarial faces, digital face manipulation, and spoofs. The majority of the confusion occurs within digital attacks (adversarial and digital manipulation attacks).

4.7 Summary

With new and sophisticated attacks being crafted against AFR systems in both digital and physical spaces, detectors need to be robust across all 3 categories. Prevailing methods indeed excel at detecting attacks in their respective categories; however, they fall short in generalizing across categories. While ensemble techniques can enhance the overall performance, they still fail to meet the desired accuracy levels. Poor generalization can be predominantly attributed to learning incoherent attacks jointly. With a new multi-task learning framework along with k-means augmentation, the proposed UniFAD achieved SOTA detection performance (TDR = 94.73% @ 0.2% FDR) on 25 face attacks across 3 categories. UniFAD can further identify attack categories with 97.37% accuracy.

Chapter 5

Summary

This dissertation has addressed some important challenges that plague prevailing automated face recognition systems today. Our primary contribution lies in enhancing the robustness and security of any commodity face recognition system against malicious attacks such as presentation attacks (physically crafted spoofs), adversarial faces (imperceptible noises added to the input probe), and digital face manipulation attacks (identity and expression swapping, attribute manipulation, and entire face synthesis). All proposed solutions achieve state-of-the-art detection performance while maintaining high computational efficiency (< 4 ms on an Nvidia GTX 2080 Ti GPU). We also make an effort to impart generalizability to unknown attack types that may be launched against an AFR pipeline in the future. In addition, all proposed detectors are interpretable such that AFR systems can be operated safely in covert scenarios. When a face attack is detected, authorities can be alerted with more than just a holistic "attack score"; for instance, our proposed methods can highlight regions of the face image that contribute to the overall decision made by the detector. Lastly, since the exact attack types may not be known beforehand, our work is among the first to attempt detection across three attack categories in both physical and digital domains. We also presented a method to automatically classify the exact attack category whenever an attack is detected.

5.1 Contributions

Chapter 2 focused on safeguarding AFR systems from physical face spoofs. The contributions include:

• We show that features learned from local face regions have better generalization performance in detecting presentation attacks than those learned from the entire face image.

• We provide extensive experiments to show that the proposed presentation attack detection approach outperforms other local region extraction strategies and state-of-the-art face presentation attack detection methods on one of the largest publicly available datasets, namely SiW-M, comprised of 13 different presentation attack instruments. The proposed method reduces the Equal Error Rate (EER) by (i) 14% relative to the state-of-the-art [88] under the unknown attack setting, and (ii) 40% on known presentation attack instruments. In addition, SSR-FCN achieves competitive performance on standard benchmarks on the Oulu-NPU [2] dataset and outperforms prevailing methods on cross-dataset generalization (CASIA-FASD [3] and Replay-Attack [4]).

• The proposed presentation attack detection method is also shown to be more interpretable since it can directly predict the parts of the face that are considered as presentation attacks.
In Chapter 3, we designed an automated adversarial synthesis method. The contributions of the synthesis section are as follows:

• A GAN-based method, AdvFaces, that learns to generate visually realistic adversarial face images that are misclassified by state-of-the-art AFR systems. Adversarial faces generated via AdvFaces are model-agnostic and transferable, and achieve high success rates on 5 state-of-the-art automated face recognition systems.

• Perceptual studies where human observers suggest that the adversarial examples appear more similar to the probe compared to previous methods.

• Visualizing the facial regions where pixels are perturbed and analyzing the transferability of AdvFaces.

• An open-source automated adversarial face generator permitting users to control the amount of perturbation.

The latter part of the chapter focused on utilizing ideas present in the aforementioned adversarial synthesis method to detect any adversarial attack type. This work makes the following contributions:

• A new self-supervised framework, namely FaceGuard, for defending against adversarial face images. FaceGuard combines adversarial training, detection, and purification into a defense mechanism trained in an end-to-end manner.

• With the proposed diversity loss, a generator is regularized to produce stochastic and challenging adversarial faces. We show that the diversity in output perturbations is sufficient for improving FaceGuard's robustness to unseen attacks compared to utilizing pre-computed training samples from known attacks.

• Synthesized adversarial faces aid the detector in learning a tight decision boundary around real faces. FaceGuard's detector achieves SOTA detection accuracies of 99.81%, 98.73%, and 99.35% on 6 unseen attacks on LFW [8], CelebA [17], and FFHQ [18].

• As the generator trains, a purifier concurrently removes perturbations from the synthesized adversarial faces. With the proposed loss, the detector also guides the purifier's training to ensure purified images are devoid of adversarial perturbations. At 0.1% False Accept Rate, FaceGuard's purifier enhances the True Accept Rate of ArcFace [9] from 34.27% under no defense to 77.46%.

Lastly, the contributions of our study on unified detection of both physical and digital attacks in Chapter 4 are as follows:

• Among the first to define the task of face attack detection on 25 attack types across 3 attack categories: adversarial faces, digital face manipulation, and spoofs. We comprehensively analyze the shortcomings of prevailing detectors. We also show that sequential and parallel ensemble learning can enhance detection compared to using a single SOTA detector.

• A novel unified face attack detection framework, namely UniFAD, that automatically clusters similar attacks and employs a multi-task learning framework to detect digital and physical attacks.

• The proposed UniFAD achieves SOTA detection performance, TDR = 94.73% @ 0.2% FDR, on a large fake face dataset, namely GrandFake. To the best of our knowledge, GrandFake is the largest face attack dataset studied in the literature in terms of the number of diverse attack types.

• The proposed UniFAD allows for further classification of the attack categories, i.e., whether attacks are adversarial, digitally manipulated, or contain physical artifacts, with a classification accuracy of 97.37%.
5.2 Suggestions for Future Work

The algorithms and models designed in this thesis are not limited to AFR systems. Attacks crafted in both physical and digital domains can also be launched against other biometric systems (such as automated fingerprint and iris recognition systems). The research presented in this dissertation can be extended in the following directions:

• Adversarial Attacks on Other Biometrics: The proposed adversarial face synthesis method in Chapter 3, namely AdvFaces, can be utilized to launch adversarial attacks on other biometric systems such as automated fingerprint or iris recognition systems. For instance, by replacing the auxiliary AFR system with an automated fingerprint recognition system (such as DeepPrint [234]), we can synthesize adversarial fingerprints.

• Robustness via Synthesis: Instead of utilizing a dedicated adversarial detector, we can utilize FaceGuard (Chapter 3) in an adversarial training mechanism. FaceGuard's generator can synthesize adversarial faces on-the-fly, while a face recognition system is trained to correctly classify the synthesized adversarial faces. In this manner, we should be able to obtain a face recognition system that is robust to adversarial faces.

• Universal Attack Detection via Synthesis: Following the ideas presented in Chapter 3, we can also learn to synthesize physical spoofs along with adversarial attacks, such that a detector is concurrently trained to detect both synthesized spoofs and adversarial attacks. Once trained, the detector should be able to reliably reject face attacks from both physical and digital domains.

Chapter 6

PhD Overview

6.1 Publications

A list of all publications during the course of my PhD program (in reverse chronological order):

1. D. Deb, X. Liu and A. K. Jain, "Unified Detection of Digital and Physical Face Attacks", arXiv:2104.02156, 2021.

2. D. Deb, X. Liu and A. K. Jain, "FaceGuard: A Self-Supervised Defense Against Adversarial Face Images", arXiv:2011.14218, 2021.

3. D. Deb, D. Aggarwal and A. K. Jain, "Child Face Age-Progression via Deep Feature Aging", IEEE ICPR, 2021.

4. J. J. Engelsma, D. Deb, K. Cao, A. Bhatnagar, P. S. Sudhish and A. K. Jain, "Infant-ID: Fingerprints for Global Good", in IEEE PAMI, 2021.

5. D. Deb and A. K. Jain, "Look Locally Infer Globally: A Generalizable Face Anti-Spoofing Approach", in IEEE TIFS, 2020.

6. H. Xu, Y. Ma, H. Liu, D. Deb, H. Liu, J. Tang and A. K. Jain, "Adversarial Attacks and Defenses in Images, Graphs and Text: A Review", in IJAC, DOI 10.1007/s11633-019-1211-x, 2020.

7. D. Deb, J. Zhang and A. K. Jain, "AdvFaces: Adversarial Face Synthesis", in IEEE IJCB, 2020.

8. D. Deb, A. Ross, A. K. Jain, K. Prakah-Asante and K. Venkatesh Prasad, "Actions Speak Louder Than (Pass)words: Passive Authentication of Smartphone Users via Deep Temporal Features", in IEEE ICB, 2019.

9. J. J. Engelsma, D. Deb, A. K. Jain, P. S. Sudhish and A. Bhatnagar, "Infant-Prints: Fingerprints for Reducing Infant Mortality", in IEEE CVPRW-CV4GC, 2019.

10. Y. Shi, D. Deb and A. K. Jain, "WarpGAN: Automatic Caricature Generation", in IEEE CVPR, 2019.

11. D. Deb, N. Nain and A. K. Jain, "Longitudinal Study of Child Face Recognition", in IEEE ICB, 2018.

12. D. Deb, S. Wiper, S. Gong, Y. Shi, C. Tymoszek, A. Fletcher and A. K. Jain, "Face Recognition: Primates in the Wild", in IEEE BTAS, 2018.

13. E. Tabassi, T. Chugh, D. Deb and A. K. Jain, "Altered Fingerprints: Detection and Localization", in IEEE BTAS, 2018.

14. D. Deb, T. Chugh, J. Engelsma, K. Cao, N. Nain, J. Kendall and A. K. Jain, "Matching Fingerphotos to Slap Fingerprint Images", arXiv:1804.08122, 2018.

15. D. Deb, L. Best-Rowden and A. K. Jain, "Face Recognition Performance Under Aging", in IEEE CVPRW, 2017.

6.2 Videos & Demos

The following videos demonstrate research solutions presented in the above publications in the real world:

1. Face Anti-Spoofing: https://youtu.be/VzD1GSJ5omQ

2. Adversarial Face Synthesis: https://youtu.be/uZBKymweNvI

3. Automatic Caricature Synthesis: https://youtu.be/zJL-eivtVnk
4. PrimID: Face Recognition for Endangered Primates: https://youtu.be/mbiIhEjKfhA

6.3 Media Coverage

1. https://msutoday.msu.edu/news/2021/using-thumbprints-vaccination-records-to-save-lives (MSU Today)

2. https://www.sciencedaily.com/releases/2018/05/180524112345.htm (ScienceDaily)

3. https://msutoday.msu.edu/news/2018/msu-technology-and-app-could-help-endangered-primates-slow-illegal-traf (MSU Today)

4. https://www.springwise.com/app-endangered-primates-using-facial-recognition (Springwise)

5. https://www.conservationjobs.co.uk/articles/new-technology-assists-the-protection-of-primates (Conservation Jobs)

6. https://www.asmag.com/rankings/m/content.aspx?id=25402 (ASMag)

7. https://olhardigital.com.br/en/2018/05/28/noticias/novo-sistema-de-reconhecimento-facial-pode-ajudar-a-salvar-primatas-da-extincao/ (Olhar Digital)

8. https://freetheapes.org/tag/facial-recognition/ (PEGAS)

9. https://msutoday.msu.edu/news/2019/compact-low-cost-reader-could-help-reduce-infant-mortality-around-the-world (MSU Today)

BIBLIOGRAPHY

[1] Y. Liu, J. Stehouwer, A. Jourabloo, and X. Liu, "Deep tree learning for zero-shot face anti-spoofing," in CVPR, 2019.

[2] Z. Boulkenafet, J. Komulainen, L. Li, X. Feng, and A. Hadid, "OULU-NPU: A mobile face presentation attack database with real-world variations," in IEEE FG, pp. 612–618, 2017.

[3] Z. Zhang, J. Yan, S. Liu, Z. Lei, D. Yi, and S. Z. Li, "A face antispoofing database with diverse attacks," in IEEE ICB, pp. 26–31, 2012.

[4] I. Chingovska, A. Anjos, and S. Marcel, "On the Effectiveness of Local Binary Patterns in Face Anti-spoofing," in IEEE BIOSIG, 2012.

[5] J. Deng, J. Guo, E. Ververas, I. Kotsia, and S. Zafeiriou, "RetinaFace: Single-shot multi-level face localisation in the wild," in IEEE CVPR, pp. 5203–5212, 2020.

[6] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in IEEE CVPR, pp. 815–823, 2015.

[7] BLCV, "Demystifying Face Recognition IV: Face-Alignment." https://bit.ly/3iUUBqz, 2017.

[8] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," Tech. Rep. 07-49, University of Massachusetts, Amherst, October 2007.

[9] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, "ArcFace: Additive angular margin loss for deep face recognition," in IEEE CVPR, pp. 4690–4699, 2019.

[10] D. Yi, Z. Lei, S. Liao, and S. Z. Li, "Learning face representation from scratch," arXiv preprint arXiv:1411.7923, 2014.

[11] A. Kurakin, I. Goodfellow, and S. Bengio, "Adversarial machine learning at scale," ICLR, 2017.

[12] X. Liu and C.-J. Hsieh, "Rob-GAN: Generator, discriminator, and adversarial attacker," in CVPR, 2019.

[13] Y. Jang, T. Zhao, S. Hong, and H. Lee, "Adversarial defense via learning to generate diverse attacks," in ICCV, 2019.

[14] D. Meng and H. Chen, "MagNet: A two-pronged defense against adversarial examples," in ACM CCS, pp. 135–147, 2017.

[15] P. Samangouei, M. Kabkab, and R. Chellappa, "Defense-GAN: Protecting classifiers against adversarial attacks using generative models," ICLR, 2018.

[16] D. Deb, J. Zhang, and A. K. Jain, "AdvFaces: Adversarial face synthesis," arXiv preprint arXiv:1908.05008, 2019.

[17] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in ICCV, December 2015.

[18] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," in CVPR, 2019.

[19] D. Deb, X. Liu, and A. K. Jain, "FaceGuard: A self-supervised defense against adversarial face images," arXiv preprint arXiv:2011.14218, 2020.

[20] H. Dang, F. Liu, J. Stehouwer, X. Liu, and A. K. Jain, "On the detection of digital face manipulation," in CVPR, 2020.

[21] D. Deb and A. K. Jain, "Look locally infer globally: A generalizable face anti-spoofing approach," IEEE TIFS, vol. 16, pp. 1143–1157, 2020.
[22] P.Grother,M.Ngan,andK.Hanaoka,fiOngoingfacerecognitionvendortest(frvt),fl NIST InteragencyReport ,2018. [23] S.Baker,T.Sim,andM.Bsat,fiThecmupose,illumination,andexpressiondatabase,fl IEEE TIFS ,vol.25,no.12,pp.1615Œ1618,2003. [24] S.Sengupta,J.-C.Chen,C.Castillo,V.M.Patel,R.Chellappa,andD.W.Jacobs,fiFrontal tofacevinthewild,flin IEEEWACV ,pp.1Œ9,2016. [25] S.Moschoglou,A.Papaioannou,C.Sagonas,J.Deng,I.Kotsia,andS.Zafeiriou,fiAgedb: themanuallycollected,in-the-wildagedatabase,flin IEEECVPR ,pp.51Œ59,2017. [26] P.J.Phillips,H.Moon,S.A.Rizvi,andP.J.Rauss,fiTheferetevaluationmethodologyfor face-recognitionalgorithms,fl IEEEPAMI ,vol.22,no.10,pp.1090Œ1104,2000. [27] B.F.Klare,B.Klein,E.Taborsky,A.Blanton,J.Cheney,K.Allen,P.Grother,A.Mah,and A.K.Jain,fiPushingthefrontiersofunconstrainedfacedetectionandrecognition:IARPA JanusBenchmarkA,flin IEEECVPR ,pp.1931Œ1939,2015. [28] H.Goldstein, Multilevelstatisticalmodels ,vol.922.JohnWiley&Sons,2011. [29] Z.Cheng,X.Zhu,andS.Gong,fiLow-resolutionfacerecognition,flin ACCV ,pp.605Œ621, 2018. [30] Y.Liu,A.Jourabloo,andX.Liu,fiLearningdeepmodelsforfaceBinaryor auxiliarysupervision,flin IEEECVPR ,June2018. [31] N.K.Ratha,J.H.Connell,andR.M.Bolle,fiEnhancingsecurityandprivacyinbiometrics- basedauthenticationsystems,fl IBMSystemsJournal ,vol.40,no.3,pp.614Œ634,2001. [32] A.Dabouei,S.Soleymani,J.Dawson,andN.Nasrabadi,fiFastgeometrically-perturbed adversarialfaces,flin IEEEWACV ,pp.1979Œ1988,2019. 157 [33] A.Madry,A.Makelov,L.Schmidt,D.Tsipras,andA.Vladu,fiTowardsdeeplearning modelsresistanttoadversarialattacks,fl arXivpreprintarXiv:1706.06083 ,2017. [34] I.J.Goodfellow,J.Shlens,andC.Szegedy,fiExplainingandharnessingadversarialexam- ples,fl arXivpreprintarXiv:1412.6572 ,2014. [35] A.Madry,A.Makelov,L.Schmidt,D.Tsipras,andA.Vladu,fiTowardsdeeplearning modelsresistanttoadversarialattacks,fl arXivpreprintarXiv:1706.06083 ,2017. [36] G.Ke,Q.Meng,T.Finley,T.Wang,W.Chen,W.Ma,Q.Ye,andT.-Y.Liu,fiLightgbm:A highlyefgradientboostingdecisiontree,fl NeurIPS ,2017. [37] L.Best-RowdenandA.K.Jain,fiLongitudinalstudyofautomaticfacerecognition,fl IEEE transactionsonpatternanalysisandmachineintelligence ,vol.40,no.1,pp.148Œ162,2017. [38] D.Deb,N.Nain,andA.K.Jain,fiLongitudinalstudyofchildfacerecognition,flin IEEE ICB ,pp.225Œ232,2018. [39] D.Deb,L.Best-Rowden,andA.K.Jain,fiFacerecognitionperformanceunderaging,flin IEEECVPRWorkshop ,pp.46Œ54,2017. [40] P.ViolaandM.Jones,fiRapidobjectdetectionusingaboostedcascadeofsimplefeatures,fl in IEEECVPR ,vol.1,pp.IŒI,2001. [41] K.Zhang,Z.Zhang,Z.Li,andY.Qiao,fiJointfacedetectionandalignmentusingmultitask cascadedconvolutionalnetworks,fl IEEESPL ,vol.23,no.10,pp.1499Œ1503,2016. [42] H.Wang,Y.Wang,Z.Zhou,X.Ji,D.Gong,J.Zhou,Z.Li,andW.Liu,fiCosface:Large margincosinelossfordeepfacerecognition,flin IEEECVPR ,2018. [43] D.Wang,C.Otto,andA.K.Jain,fiFacesearchatscale,fl IEEEPAMI ,vol.39,no.6, pp.1122Œ1136,2016. [44] W.Liu,Y.Wen,Z.Yu,M.Li,B.Raj,andL.Song,fiSphereface:Deephypersphereembed- dingforfacerecognition,flin IEEECVPR ,pp.212Œ220,2017. [45] M.A.TurkandA.P.Pentland,fiFacerecognitionusingeigenfaces,flin IEEECVPR , pp.586Œ587,1991. [46] K.Simonyan,O.M.Parkhi,A.Vedaldi,andA.Zisserman,fiFishervectorfacesinthewild.,fl in BMVC ,vol.2,p.4,2013. [47] D.Chen,X.Cao,F.Wen,andJ.Sun,fiBlessingofdimensionality:High-dimensionalfeature anditsefcompressionforfacevflin IEEECVPR ,pp.3025Œ3032,2013. [48] D.Chen,X.Cao,L.Wang,F.Wen,andJ.Sun,fiBayesianfacerevisited:Ajointformula- tion,flin ECCV ,pp.566Œ579,2012. 
[49] Y.Taigman,M.Yang,M.Ranzato,andL.Wolf,fiDeepface:Closingthegaptohuman-level performanceinfacevflin IEEECVPR ,pp.1701Œ1708,2014. 158 [50] Y.Sun,X.Wang,andX.Tang,fiDeeplearningfacerepresentationfrompredicting10,000 classes,flin IEEECVPR ,pp.1891Œ1898,2014. [51] O.M.Parkhi,A.Vedaldi,andA.Zisserman,fiDeepfacerecognition,flin BMVC ,pp.41.1Œ 41.12,2015. [52] H.T.F.Rhodes, AlphonseBertillon,fatherofdetection .Greenwood,1968. [53] Y.Wen,K.Zhang,Z.Li,andY.Qiao,fiAdiscriminativefeaturelearningapproachfordeep facerecognition,flin ECCV ,pp.499Œ515,Springer,2016. [54] P.J.Phillips,P.J.Grother,R.J.Micheals,D.M.Blackburn,E.Tabassi,andM.Bone,fiFace recognitionvendortest2002:Evaluationreport,fltech.rep.,NIST,2003. [55] P.Flanagan,fiMultiplebiometricevaluation(mbe),fl NIST ,2010. [56] P.GrotherandM.Ngan,fiFacerecognitionvendortest(frvt):Performanceofface cationalgorithms,fl NISTInteragencyreport ,vol.8009,no.5,p.14,2014. [57] Q.Cao,L.Shen,W.Xie,O.M.Parkhi,andA.Zisserman,fiVggface2:Adatasetforrecog- nisingfacesacrossposeandage,flin IEEEFG ,pp.67Œ74,IEEE,2018. [58] S.Jia,G.Guo,andZ.Xu,fiAsurveyon3dmaskpresentationattackdetectionandcounter- measures,fl PatternRecognition ,vol.98,p.107032,2020. [59] S.Agarwal,H.Farid,Y.Gu,M.He,K.Nagano,andH.Li,fiProtectingworldleadersagainst deepfakes.,flin CVPRWorkshops ,2019. [60] X.Yang,D.Yang,Y.Dong,W.Yu,H.Su,andJ.Zhu,fiDelvingintotheadversarialrobust- nessonfacerecognition,fl arXivpreprintarXiv:2007.04118 ,2020. [61] A.GeorgeandS.Marcel,fiLearningoneclassrepresentationsforfacepresentationattack detectionusingmulti-channelconvolutionalneuralnetworks,fl IEEETIFS ,vol.16,pp.361Œ 375,2020. [62] Y.Liu,J.Stehouwer,andX.Liu,fiOndisentanglingspooftraceforgenericfaceanti- flin ECCV ,Springer,2020. [63] H.Feng,Z.Hong,H.Yue,Y.Chen,K.Wang,J.Han,J.Liu,andE.Ding,fiLearning generalizedspoofcuesforfacefl arXivpreprintarXiv:2005.03922 ,2020. [64] InternationalStandardsOrganization,fiISO/IEC30107-1:2016,InformationTechnology BiometricPresentationAttackDetectionPart1:Framework.flhttps://www.iso.org/standard/ 53227.html,2016. [65] S.Marcel,M.S.Nixon,andS.Z.Li, HandbookofBiometric ,vol.1.Springer, 2014. [66] I.Manjani,S.Tariyal,M.Vatsa,R.Singh,andA.Majumdar,fiDetectingsiliconemask- basedpresentationattackviadeepdictionarylearning,fl IEEETIFS ,vol.12,no.7,pp.1713Œ 1723,2017. 159 [67] Y.Xu,T.Price,J.-M.Frahm,andF.Monrose,fiVirtualu:Defeatingfacelivenessdetection bybuildingvirtualmodelsfromyourpublicphotos,flin USENIX ,pp.497Œ512,2016. [68] fiFastPass-aharmonized,modularreferencesystemforallEuropeanautomatedborder- crossingpoints.flFastPass-EU,https://www.fastpass-project.eu. [69] C.Szegedy,W.Zaremba,I.Sutskever,J.Bruna,D.Erhan,I.Goodfellow,andR.Fergus, fiIntriguingpropertiesofneuralnetworks,fl arXivpreprintarXiv:1312.6199 ,2013. [70] I.J.Goodfellow,J.Shlens,andC.Szegedy,fiExplainingandharnessingadversarialexam- ples,fl arXivpreprintarXiv:1412.6572 ,2014. [71] S.-M.Moosavi-Dezfooli,A.Fawzi,O.Fawzi,andP.Frossard,fiUniversaladversarialper- turbations,flin IEEECVPR ,pp.1765Œ1773,2017. [72] Y.Dong,F.Liao,T.Pang,H.Su,J.Zhu,X.Hu,andJ.Li,fiBoostingadversarialattacks withmomentum,flin IEEECVPR ,pp.9185Œ9193,2018. [73] Wikipedia,fiU.S.CustomsandBorderProtection.flhttps://en.wikipedia.org/wiki/U.S. Customs and Border Protection,2019. [74] U.S.CustomsandBorderProtection,fiOnaTypicalDayinFiscalYear2018.flhttps://www. cbp.gov/newsroom/stats/typical-day-fy2018,2018. [75] fiBiometrics.flU.S.CustomsandBorderProtection,https://www.cbp.gov/travel/biometrics. 
[76] J.Thies,M.Zollhofer,M.Stamminger,C.Theobalt,andM.Nießner,fiFace2face:Real-time facecaptureandreenactmentofrgbvideos,flin CVPR ,2016. [77] A.Rossler,D.Cozzolino,L.Verdoliva,C.Riess,J.Thies,andM.Nießner,fiFaceforen- sics++:Learningtodetectmanipulatedfacialimages,flin ICCV ,pp.1Œ11,2019. [78] Y.Choi,M.Choi,M.Kim,J.-W.Ha,S.Kim,andJ.Choo,fiStargan:generative adversarialnetworksformulti-domainimage-to-imagetranslation,flin CVPR ,2018. [79] M.Liu,Y.Ding,M.Xia,X.Liu,E.Ding,W.Zuo,andS.Wen,fiStgan:Aselective transfernetworkforarbitraryimageattributeediting,flin CVPR ,2019. [80] T.Karras,S.Laine,M.Aittala,J.Hellsten,J.Lehtinen,andT.Aila,fiAnalyzingandimprov- ingtheimagequalityofstylegan,flin CVPR ,2020. [81] J.Thies,M.Zollhofer,M.Stamminger,C.Theobalt,andM.Nießner,fiFace2face:Real-time facecaptureandreenactmentofrgbvideos,flin IEEECVPR ,pp.2387Œ2395,2016. [82] DailyMail,fiPolicearrestpassengerwhoboardedplaneinHongKongasanoldmanin capandarrivedinCanadaayoungAsianrefugee.flhttp://dailym.ai/2UBEcxO,2011. [83] TheVerge,fiThis$150maskbeatFaceIDontheiPhoneX.flhttps://bit.ly/300bRoC,2017. [84] N.ErdogmusandS.Marcel,in2DFaceRecognitionwith3DMasksandAnti- withKinect,flin IEEEBTAS ,2013. 160 [85] D.Wen,H.Han,andA.K.Jain,fiFacespoofdetectionwithimagedistortionanalysis,fl IEEETIFS ,vol.10,no.4,pp.746Œ761,2015. [86] A.Costa-Pazo,S.Bhattacharjee,E.Vazquez-Fernandez,andS.Marcel,fiThereplay-mobile facepresentation-attackdatabase,flin IEEEBIOSIG ,Sept.2016. [87] S.Liu,B.Yang,P.C.Yuen,andG.Zhao,fiA3DMaskFaceDatabasewith RealWorldVariations,flin IEEECVPR ,2016. [88] Z.Yu,C.Zhao,Z.Wang,Y.Qin,Z.Su,X.Li,F.Zhou,andG.Zhao,fiSearchingcentral differenceconvolutionalnetworksforfacefl arXivpreprintarXiv:2003.04092 , 2020. [89] K.Kollreider,H.Fronthaler,M.I.Faraj,andJ.Bigun,fiReal-timefacedetectionandmotion analysiswithapplicationinfilivenessflassessment,fl IEEETIFS ,vol.2,no.3,pp.548Œ558, 2007. [90] G.Pan,L.Sun,Z.Wu,andS.Lao,fiEyeblink-basedinfacerecognitionfrom agenericwebcamera,flin IEEECVPR ,pp.1Œ8,2007. [91] K.Patel,H.Han,andA.K.Jain,fiCross-databasefacewithrobustfeature representation,flin CCBR ,pp.611Œ619,2016. [92] R.Shao,X.Lan,andP.C.Yuen,fiDeepconvolutionaldynamictexturelearningwithadap- tivechannel-discriminabilityfor3dmaskfaceflin IEEEIJCB ,pp.748Œ755, 2017. [93] J.Komulainen,H.Abdenour,andP.Matti,fiContextbasedfaceflin IEEE BTAS ,2013. [94] T.deFreitasPereira,A.Anjos,J.M.DeMartino,andS.Marcel,fiLBP-TOPbasedcounter- measureagainstfaceattacks,flin ACCV ,pp.121Œ132,2012. [95] L.Li,Z.Xia,A.Hadid,X.Jiang,F.Roli,andX.Feng,fiFacepresentationattackdetection inlearnedcolor-likedspace,fl arXivpreprintarXiv:1810.13170 ,2018. [96] K.Patel,H.Han,andA.K.Jain,fiSecurefaceunlock:Spoofdetectiononsmartphones,fl IEEETIFS ,vol.11,no.10,pp.2268Œ2283,2016. [97] J.Galbally,S.Marcel,andJ.Fierrez,fiImagequalityassessmentforfakebiometricde- tection:Applicationtoiris,andfacerecognition,fl IEEETIP ,vol.23,no.2, pp.710Œ724,2014. [98] H.Li,S.Wang,andA.C.Kot,fiFacedetectionwithimagequalityregression,flin IEEEIPTA ,pp.1Œ6,2016. [99] J.Li,Y.Wang,T.Tan,andA.K.Jain,fiLivefacedetectionbasedontheanalysisoffourier spectra,flin BiometricTechnologyforHuman ,vol.5404,pp.296Œ303,SPIE, 2004. 161 [100] T.Pereira,A.Anjos,J.M.DeMartino,andS.Marcel,fiCanfacecountermea- suresworkinarealworldscenario?,flin IEEEICB ,2013. [101] J.M ¨ a ¨ att ¨ a,A.Hadid,andM.Pietik ¨ ainen,fiFacedetectionfromsingleimagesusing micro-textureanalysis,flin IEEEIJCB ,pp.1Œ7,2011. 
[102] Z.Boulkenafet,J.Komulainen,andA.Hadid,fiFacedetectionusingcolourtexture analysis,fl IEEETIFS ,vol.11,no.8,pp.1818Œ1830,2016. [103] J.Yang,Z.Lei,S.Liao,andS.Z.Li,fiFacelivenessdetectionwithcomponentdependent descriptor,flin IEEEICB ,pp.1Œ6,2013. [104] X.Tan,Y.Li,J.Liu,andL.Jiang,fiFacelivenessdetectionfromasingleimagewithsparse lowrankbilineardiscriminativemodel,flin ECCV ,pp.504Œ517,2010. [105] Z.Boulkenafet,J.Komulainen,andA.Hadid,fiFaceusingspeeded-uprobust featuresandvectorencoding,fl IEEESignalProcessingLetters ,vol.24,no.2,pp.141Œ 145,2017. [106] T.Wang,J.Yang,Z.Lei,S.Liao,andS.Z.Li,fiFacelivenessdetectionusing3dstructure recoveredfromasinglecamera,flin IEEEICB ,2013. [107] Y.Wang,F.Nian,T.Li,Z.Meng,andK.Wang,fiRobustfacewithdepth information,fl JournalofVisualCommunicationandImageRepresentation ,vol.49,092017. [108] S.Zhang,X.Wang,A.Liu,C.Zhao,J.Wan,S.Escalera,H.Shi,Z.Wang,andS.Z. Li,fiCASIA-SURF:ADatasetandBenchmarkforLarge-scaleMulti-modalFaceAnti- fl arXivpreprintarXiv:1812.00408 ,2018. [109] V.Conotter,E.Bodnari,G.Boato,andH.Farid,fiPhysiologically-baseddetectionofcom- putergeneratedfacesinvideo,fl IEEEICIP ,pp.248Œ252,012015. [110] Z.Zhang,D.Yi,Z.Lei,andS.Li,fiFacelivenessdetectionbylearningmultispectralre- distributions,flin IEEEFG ,pp.436Œ441,2011. [111] G.Chetty,fiBiometriclivenesscheckingusingmultimodalfuzzyfusion,flin IEEEWCCI , pp.1Œ8,072010. [112] J.Yang,Z.Lei,andS.Z.Li,fiLearnConvolutionalNeuralNetworkforFacefl arXivpreprintarXiv:1408.5601 ,2014. [113] N.N.Lakshminarayana,N.Narayan,N.Napp,S.Setlur,andV.Govindaraju,fiAdiscrimi- nativespatio-temporalmappingoffaceforlivenessdetection,flin IEEEISBA ,pp.1Œ7,2017. [114] X.TuandY.Fang,fiUltra-deepneuralnetworkforfaceflin NIPS ,pp.686Œ 695,2017. [115] O.Lucena,A.Junior,V.Moia,R.Souza,E.Valle,andR.Lotufo,fiTransferlearningusing convolutionalneuralnetworksforfaceflin ICIAR ,pp.27Œ34,2017. 162 [116] L.Li,X.Feng,Z.Boulkenafet,Z.Xia,M.Li,andA.Hadid,fiAnoriginalface approachusingpartialconvolutionalneuralnetwork,flin IEEEIPTA ,pp.1Œ6,2016. [117] S.R.Arashloo,J.Kittler,andW.Christmas,fiAnanomalydetectionapproachtofacespoof- ingdetection:Anewformulationandevaluationprotocol,fl IEEEAccess ,vol.5,pp.13868Œ 13882,2017. [118] O.Nikisins,A.Mohammadi,A.Anjos,andS.Marcel,fiOneffectivenessofanomalyde- tectionapproachesagainstunseenpresentationattacksinfaceflin IEEEICB , pp.75Œ81,2018. [119] D.P ´ erez-Cabo,D.Jim ´ enez-Cabello,A.Costa-Pazo,andR.J.L ´ opez-Sastre,fiDeepanomaly detectionforgeneralizedfaceflin IEEECVPR ,pp.0Œ0,2019. [120] H.Li,W.Li,H.Cao,S.Wang,F.Huang,andA.C.Kot,fiUnsuperviseddomainadaptation forfacefl IEEETransactionsonInformationForensicsandSecurity ,vol.13, no.7,pp.1794Œ1809,2018. [121] S.R.Arashloo,J.Kittler,andW.Christmas,fiFacedetectionbasedonmultiplede- scriptorfusionusingmultiscaledynamicbinarizedstatisticalimagefeatures,fl IEEETrans- actionsonInformationForensicsandSecurity ,vol.10,no.11,pp.2396Œ2407,2015. [122] J.Yang,Z.Lei,D.Yi,andS.Z.Li,fiPefacewithsubjectdo- mainadaptation,fl IEEETransactionsonInformationForensicsandSecurity ,vol.10,no.4, pp.797Œ809,2015. [123] Y.Atoum,Y.Liu,A.Jourabloo,andX.Liu,fiFaceusingpatchanddepth- basedcnns,flin 2017IEEEIJCB ,pp.319Œ328,2017. [124] A.GeorgeandS.Marcel,fiDeeppixel-wisebinarysupervisionforfacepresentationattack detection,fl arXivpreprintarXiv:1907.04047 ,2019. [125] A.Jourabloo,Y.Liu,andX.Liu,fiFacevianoisemodeling,flin ECCV ,pp.290Œ306,2018. [126] C.NagpalandS.R.Dubey,fiAperformanceevaluationofconvolutionalneuralnetworksfor faceantiflin IEEEIJCNN ,pp.1Œ8,2019. 
[127] W.Sun,Y.Song,C.Chen,J.Huang,andA.C.Kot,fiFacedetectionbasedonlocal ternarylabelsupervisioninfullyconvolutionalnetworks,fl IEEETIFS ,vol.15,pp.3181Œ 3196,2020. [128] W.Sun,Y.Song,H.Zhao,andZ.Jin,fiAfacedetectionmethodbasedondomain adaptationandlosslesssizeadaptation,fl IEEEAccess ,vol.8,pp.66553Œ66563,2020. [129] D.P.KingmaandJ.Ba,fiAdam:Amethodforstochasticoptimization,fl arXivpreprint arXiv:1412.6980 ,2014. [130] D.E.King,fiDlib-ml:Amachinelearningtoolkit,fl JMLR ,vol.10,no.Jul,pp.1755Œ1758, 2009. 163 [131] Z.Boulkenafet,J.Komulainen,Z.Akhtar,A.Benlamoudi,D.Samai,S.E.Bekhouche, A.F.Dornaika,A.Taleb-Ahmed,L.Qin, etal. ,fiAcompetitionongeneralized software-basedfacepresentationattackdetectioninmobilescenarios,flin IEEEIJCB , pp.688Œ696,2017. [132] H.Chen,G.Hu,Z.Lei,Y.Chen,N.M.Robertson,andS.Z.Li,fiAttention-basedtwo- streamconvolutionalnetworksforfacedetection,fl IEEETIFS ,vol.15,pp.578Œ 593,2019. [133] R.Bresan,A.Pinto,A.Rocha,C.Beluzo,andT.Carvalho,fiFacespoofbuster:apresenta- tionattackdetectorbasedonintrinsicimagepropertiesanddeeplearning,fl arXivpreprint arXiv:1902.02845 ,2019. [134] N.Damer,K.Dimitrov,R.Wilson,E.Hancock,andW.Smith,fiPracticalviewonface presentationattackdetection.,flin BMVC ,2016. [135] X.Yang,W.Luo,L.Bao,Y.Gao,D.Gong,S.Zheng,Z.Li,andW.Liu,fiFace Modelmatters,sodoesdata,flin IEEECVPR ,pp.3507Œ3516,2019. [136] N.CarliniandD.Wagner,fiTowardsevaluatingtherobustnessofneuralnetworks,flin IEEE SP ,pp.39Œ57,2017. [137] C.Xiao,J.-Y.Zhu,B.Li,W.He,M.Liu,andD.Song,fiSpatiallytransformedadversarial examples,fl arXivpreprintarXiv:1801.02612 ,2018. [138] K.Eykholt,I.Evtimov,E.Fernandes,B.Li,A.Rahmati,C.Xiao,A.Prakash,T.Kohno, andD.Song,fiRobustphysical-worldattacksondeeplearningmodels,fl arXivpreprint arXiv:1707.08945 ,2017. [139] N.Papernot,P.McDaniel,S.Jha,M.Fredrikson,Z.B.Celik,andA.Swami,fiThelimita- tionsofdeeplearninginadversarialsettings,flin IEEEEuroS&P ,pp.372Œ387,2016. [140] A.Kurakin,I.Goodfellow,andS.Bengio,fiAdversarialmachinelearningatscale,fl arXiv preprintarXiv:1611.01236 ,2016. [141] S.-M.Moosavi-Dezfooli,A.Fawzi,andP.Frossard,fiDeepfool:asimpleandaccurate methodtofooldeepneuralnetworks,flin IEEECVPR ,pp.2574Œ2582,2016. [142] Y.Dong,H.Su,B.Wu,Z.Li,W.Liu,T.Zhang,andJ.Zhu,fiEfdecision-based black-boxadversarialattacksonfacerecognition,flin IEEECVPR ,pp.7714Œ7722,2019. [143] Y.Liu,X.Chen,C.Liu,andD.Song,fiDelvingintotransferableadversarialexamplesand black-boxattacks,fl arXivpreprintarXiv:1611.02770 ,2016. [144] I.Goodfellow,J.Pouget-Abadie,M.Mirza,B.Xu,D.Warde-Farley,S.Ozair,A.Courville, andY.Bengio,fiGenerativeadversarialnets,flin NIPS ,pp.2672Œ2680,2014. [145] A.Radford,L.Metz,andS.Chintala,fiUnsupervisedrepresentationlearningwithdeep convolutionalgenerativeadversarialnetworks,fl arXivpreprintarXiv:1511.06434 ,2015. 164 [146] E.L.Denton,S.Chintala,andR.Fergus,fiDeepgenerativeimagemodelsusingalaplacian pyramidofadversarialnetworks,flin NIPS ,pp.1486Œ1494,2015. [147] D.Ulyanov,V.Lebedev,A.Vedaldi,andV.S.Lempitsky,fiTexturenetworks:Feed-forward synthesisoftexturesandstylizedimages.,flin ICML ,vol.1,p.4,2016. [148] J.Johnson,A.Alahi,andL.Fei-Fei,fiPerceptuallossesforreal-timestyletransferand super-resolution,flin ECCV ,pp.694Œ711,2016. [149] L.A.Gatys,A.S.Ecker,andM.Bethge,fiImagestyletransferusingconvolutionalneural networks,flin IEEECVPR ,pp.2414Œ2423,2016. [150] P.Isola,J.-Y.Zhu,T.Zhou,andA.A.Efros,fiImage-to-imagetranslationwithconditional adversarialnetworks,flin IEEECVPR ,pp.1125Œ1134,2017. 
[151] J.-Y.Zhu,T.Park,P.Isola,andA.A.Efros,fiUnpairedimage-to-imagetranslationusing cycle-consistentadversarialnetworks,flin IEEEICCV ,pp.2223Œ2232,2017. [152] T.Salimans,I.Goodfellow,W.Zaremba,V.Cheung,A.Radford,andX.Chen,fiImproved techniquesfortraininggans,flin NIPS ,pp.2234Œ2242,2016. [153] M.F.Mathieu,J.J.Zhao,J.Zhao,A.Ramesh,P.Sprechmann,andY.LeCun,fiDisen- tanglingfactorsofvariationindeeprepresentationusingadversarialtraining,flin NIPS , pp.5040Œ5048,2016. [154] S.BalujaandI.Fischer,fiAdversarialtransformationnetworks:Learningtogenerateadver- sarialexamples,fl arXivpreprintarXiv:1703.09387 ,2017. [155] C.Xiao,B.Li,J.-Y.Zhu,W.He,M.Liu,andD.Song,fiGeneratingadversarialexamples withadversarialnetworks,fl arXivpreprintarXiv:1801.02610 ,2018. [156] X.Wang,K.He,C.Guo,K.Q.Weinberger,andJ.E.Hopcroft,fiAT-GAN:AGenerative AttackModelforAdversarialTransferringonGenerativeAdversarialNets,fl arXivpreprint arXiv:1904.07793 ,2019. [157] Y.Song,R.Shu,N.Kushman,andS.Ermon,fiConstructingunrestrictedadversarialexam- pleswithgenerativemodels,flin NIPS ,pp.8312Œ8323,2018. [158] A.J.BoseandP.Aarabi,fiAdversarialattacksonfacedetectorsusingneuralnetbased constrainedoptimization,flin IEEEMMSP ,pp.1Œ6,2018. [159] M.Sharif,S.Bhagavatula,L.Bauer,andM.K.Reiter,fiAccessorizetoacrime:Realand stealthyattacksonstate-of-the-artfacerecognition,flin ACMSIGSAC ,pp.1528Œ1540,2016. [160] M.Sharif,S.Bhagavatula,L.Bauer,andM.K.Reiter,fiAgeneralframeworkforadversarial exampleswithobjectives,fl ACMTOPS ,vol.22,no.3,p.16,2019. [161] Q.Song,Y.Wu,andL.Yang,fiAttacksonstate-of-the-artfacerecognitionusingattentional adversarialattackgenerativenetwork,fl arXivpreprintarXiv:1811.12026 ,2018. 165 [162] J.Deng,W.Dong,R.Socher,L.-J.Li,K.Li,andL.Fei-Fei,fiImagenet:Alarge-scale hierarchicalimagedatabase,flin CVPR ,IEEE,2009. [163] A.Krizhevsky,G.Hinton, etal. ,fiLearningmultiplelayersoffeaturesfromtinyimages,fl Citeseer ,2009. [164] C.Xie,Y.Wu,L.v.d.Maaten,A.L.Yuille,andK.He,fiFeaturedenoisingforimproving adversarialrobustness,flin CVPR ,2019. [165] Y.LeCun,fiThemnistdatabaseofhandwrittendigits,fl TechReport ,1998. [166] Z.Gong,W.Wang,andW.-S.Ku,fiAdversarialandcleandataarenottwins,fl arXivpreprint arXiv:1704.04960 ,2017. [167] A.Agarwal,R.Singh,M.Vatsa,andN.Ratha,fiAreimage-agnosticuniversaladversarial perturbationsforfacerecognitiondiftodetect?,flin BTAS ,2018. [168] A.P.Founds,N.Orlans,W.Genevieve,andC.I.Watson,fiNISTspecialdatabase32- multipleencounterdatasetII(MEDS-II).,flin NISTIntragencyReport ,2011. [169] R.Gross,I.Matthews,J.Cohn,T.Kanade,andS.Baker,fiMulti-PIE.,flin FG ,2010. [170] J.R.Beveridge,P.J.Phillips,D.S.Bolme,B.A.Draper,G.H.Givens,Y.M.Lui,M.N. Teli,H.Zhang,W.T.Scruggs,K.W.Bowyer,P.J.Flynn,andS.Cheng,fiThechallengeof facerecognitionfromdigitalpoint-and-shootcameras.,flin BTAS ,2013. [171] Moosavi-Dezfooli,Seyed-Mohsen,A.Fawzi,O.Fawzi,andP.Frossard,fiFromfewto many:Illuminationconemodelsforfacerecognitionundervariablelightingandpose.,fl in CVPR ,pp.1765Œ1773,2017. [172] A.Goel,A.Singh,A.Agarwal,M.Vatsa,andR.Singh,fiSmartbox:Benchmarkingadver- sarialdetectionandmitigationalgorithmsforfacerecognition.,flin BTAS ,pp.1Œ7,2018. [173] A.S.Georghiades,P.N.Belhumeur,andD.J.Kriegman,fiFromfewtomany:Illumination conemodelsforfacerecognitionundervariablelightingandpose.,flin PAMI ,pp.643Œ660, 2001. [174] P.-Y.Chen,Y.Sharma,H.Zhang,J.Yi,andC.-J.Hsieh,fiEAD:Elastic-netattackstodeep neuralnetworksviaadversarialexamples.,fl AAAI ,2018. 
[175] S.Liang,Y.Li,andR.Srikant,fiEnhancingthereliabilityofout-of-distributionimagede- tectioninneuralnetworks,fl ICLR ,2018. [176] G.Goswami,A.Agarwal,N.Ratha,R.Singh,andM.Vatsa,fiDetectingandmitigating adversarialperturbationsforrobustfacerecognition,fl ICCV ,vol.127,no.6-7,pp.719Œ742, 2019. [177] P.J.Phillips,P.J.Flynn,J.R.Beveridge,W.T.Scruggs,A.J.O'toole,D.Bolme,K.W. Bowyer,B.A.Draper,G.H.Givens,Y.M.Lui, etal. ,fiOverviewofthemultiplebiometrics grandchallenge,flin ICB ,2010. 166 [178] J.Liu,W.Zhang,Y.Zhang,D.Hou,Y.Liu,H.Zha,andN.Yu,fiDetectionbaseddefense againstadversarialexamplesfromthesteganalysispointofview,flin CVPR ,2019. [179] F.V.Massoli,F.Carrara,G.Amato,andF.Falchi,fiDetectionoffacerecognitionadversarial attacks,fl CVIU ,p.103103,2020. [180] Q.Cao,L.Shen,W.Xie,O.M.Parkhi,andA.Zisserman,fiVggface2:Adatasetforrecog- nisingfacesacrossposeandage,flin FG ,2018. [181] A.Kurakin,I.Goodfellow,andS.Bengio,fiAdversarialexamplesinthephysicalworld.,fl arXivpreprintarXiv:1607.02533 ,2016. [182] A.Agarwal,R.Singh,M.Vatsa,andN.K.Ratha,fiImagetransformationbaseddefense againstadversarialperturbationondeeplearningmodels,fl IEEETransactionsonDepend- ableandSecureComputing ,2020. [183] Z.Liu,Q.Liu,T.Liu,N.Xu,X.Lin,Y.Wang,andW.Wen,fiFeaturedistillation:Dnn- orientedjpegcompressionagainstadversarialexamples,flin CVPR ,2019. [184] M.Naseer,S.Khan,M.Hayat,F.S.Khan,andF.Porikli,fiAself-supervisedapproachfor adversarialrobustness,flin CVPR ,2020. [185] J.Zhou,C.Liang,andJ.Chen,fiManifoldprojectionforadversarialdefenseonfacerecog- nition,flin EuropeanConferenceonComputerVision ,pp.288Œ305,Springer,2020. [186] H.Qiu,C.Xiao,L.Yang,X.Yan,H.Lee,andB.Li,fiSemanticadv:Generatingadversarial examplesviaattribute-conditionalimageediting,fl arXivpreprintarXiv:1906.07927 ,2019. [187] D.Su,H.Zhang,H.Chen,J.Yi,P.-Y.Chen,andY.Gao,fiIsrobustnessthecostof accuracy?Œacomprehensivestudyontherobustnessof18deepimagemod- els.,flin ECCV ,2018. [188] D.Tsipras,S.Santurkar,L.Engstrom,A.Turner,andA.Madry,fiRobustnessmaybeat oddswithaccuracy.,fl ICLR ,2017. [189] G.S.Dhillon,K.Azizzadenesheli,Z.C.Lipton,J.Bernstein,J.KA.Khanna,and A.Anandkumar,fiStochasticactivationpruningforrobustadversarialdefense,flin ICLR , 2018. [190] R.Feinman,R.R.Curtin,S.Shintre,andA.B.Gardner,fiDetectingadversarialsamples fromartifacts,fl arXivpreprintarXiv:1703.00410 ,2017. [191] K.Grosse,P.Manoharan,N.Papernot,M.Backes,andP.McDaniel,fiOnthe(statistical) detectionofadversarialexamples,fl arXivpreprintarXiv:1702.06280 ,2017. [192] X.LiandF.Li,fiAdversarialexamplesdetectionindeepnetworkswithconvolutional statistics,flin ICCV ,pp.5764Œ5772,2017. [193] D.HendrycksandK.Gimpel,fiEarlymethodsfordetectingadversarialimages,fl arXiv preprintarXiv:1608.00530 ,2016. 167 [194] C.Guo,M.Rana,M.Cisse,andL.VanDerMaaten,fiCounteringadversarialimagesusing inputtransformations,fl arXivpreprintarXiv:1711.00117 ,2017. [195] H.Kannan,A.Kurakin,andI.Goodfellow,fiAdversariallogitpairing,fl arXivpreprint arXiv:1803.06373 ,2018. [196] J.H.Metzen,T.Genewein,V.Fischer,andB.Bischoff,fiOndetectingadversarialperturba- tions,fl ICLR ,2017. [197] T.Na,J.H.Ko,andS.Mukhopadhyay,fiCascadeadversarialmachinelearningregularized withaembedding,fl ICLR ,2017. [198] C.Xie,J.Wang,Z.Zhang,Z.Ren,andA.Yuille,fiMitigatingadversarialeffectsthrough randomization,fl ICLR ,2017. [199] V.Zantedeschi,M.-I.Nicolae,andA.Rawat,fiEfdefensesagainstadversarialat- tacks,flin ACMWorkshoponIntelligenceandSecurity ,pp.39Œ49,2017. 
[200] N.CarliniandD.Wagner,fiAdversarialexamplesarenoteasilydetected:Bypassingten detectionmethods,flin ACMWorkshoponIntelligenceandSecurity ,pp.3Œ14, 2017. [201] A.Athalye,N.Carlini,andD.Wagner,fiObfuscatedgradientsgiveafalsesenseofsecurity: Circumventingdefensestoadversarialexamples,fl ICML ,2018. [202] N.CarliniandD.Wagner,fiMagnetandfiefdefensesagainstadversarialattacksflare notrobusttoadversarialexamples,fl arXivpreprintarXiv:1711.08478 ,2017. [203] M.Mosbach,M.Andriushchenko,T.Trost,M.Hein,andD.Klakow,fiLogitpairingmeth- odscanfoolgradient-basedattacks,fl arXivpreprintarXiv:1810.12042 ,2018. [204] Y.Song,T.Kim,S.Nowozin,S.Ermon,andN.Kushman,fiPixeldefend:Leveraginggen- erativemodelstounderstandanddefendagainstadversarialexamples,fl ICLR ,2017. [205] A.Rozsa,M.G ¨ unther,andT.E.Boult,fiLotsaboutattackingdeepfeatures,flin 2017IEEE InternationalJointConferenceonBiometrics(IJCB) ,pp.168Œ176,IEEE,2017. [206] G.Goswami,N.Ratha,A.Agarwal,R.Singh,andM.Vatsa,fiUnravellingrobustnessof deeplearningbasedfacerecognitionagainstadversarialattacks,flin AAAI ,2018. [207] P.J.Grother,M.Ngan,andK.Hanaoka,fiOngoingFaceRecognitionVendorTest(FRVT), Part2:fl NISTInteragencyReport ,2018. [208] Y.Liu,A.Jourabloo,andX.Liu,fiLearningdeepmodelsforfaceBinaryor auxiliarysupervision,flin CVPR ,2018. [209] Y.Liu,J.Stehouwer,andX.Liu,fiOndisentanglingspooftracesforgenericfaceanti- flin ECCV ,2020. 168 [210] H.Dang,F.Liu,J.Stehouwer,X.Liu,andA.Jain,fiOnthedetectionofdigitalfacemanip- ulation,flin CVPR ,2020. [211] S.Shan,E.Wenger,J.Zhang,H.Li,H.Zheng,andB.Y.Zhao,fiFawkes:protectingprivacy againstunauthorizeddeeplearningmodels,flin USENIX ,pp.1589Œ1604,2020. [212] A.Raghunathan,S.M.Xie,F.Yang,J.C.Duchi,andP.Liang,fiAdversarialtrainingcan hurtgeneralization,fl arXivpreprintarXiv:1906.06032 ,2019. [213] D.Yang,S.Hong,Y.Jang,T.Zhao,andH.Lee,fiDiversity-sensitiveconditionalgenerative adversarialnetworks,fl ICLR ,2019. [214] P.Zhou,X.Han,V.I.Morariu,andL.S.Davis,fiTwo-streamneuralnetworksfortampered facedetection,flin CVPRW ,IEEE,2017. [215] X.Yang,Y.Li,andS.Lyu,fiExposingdeepfakesusinginconsistentheadposes,flin ICASSP , IEEE,2019. [216] P.KorshunovandS.Marcel,fiDeepfakes:anewthreattofacerecognition?assessmentand detection,fl arXivpreprintarXiv:1812.08685 ,2018. [217] R.Wang,F.Juefei-Xu,L.Ma,X.Xie,Y.Huang,J.Wang,andY.Liu,fiFakespot- ter:Asimpleyetrobustbaselineforspottingai-synthesizedfakefaces,fl arXivpreprint arXiv:1909.06122 ,2019. [218] Y.Liu,A.Jourabloo,andX.Liu,fiLearningdeepmodelsforfaceBinaryor auxiliarysupervision,flin CVPR ,June2018. [219] A.Madry,A.Makelov,L.Schmidt,D.Tsipras,andA.Vladu,fiTowardsdeeplearning modelsresistanttoadversarialattacks,fl arXivpreprintarXiv:1706.06083 ,2017. [220] L.Li,J.Bao,T.Zhang,H.Yang,D.Chen,F.Wen,andB.Guo,fiFacex-rayformoregeneral faceforgerydetection,flin CVPR ,2020. [221] Y.A.U.Rehman,L.M.Po,andM.Liu,fiDeeplearningforfaceanend-to- endapproach,flin IEEESPA ,pp.195Œ200,2017. [222] A.Jourabloo,Y.Liu,andX.Liu,fiFacevianoisemodeling,flin ECCV ,2018. [223] F.Chollet,fiXception:Deeplearningwithdepthwiseseparableconvolutions,flin CVPR , 2017. [224] S.Mehta,A.Uberoi,A.Agarwal,M.Vatsa,andR.Singh,fiCraftingapanopticfacepresen- tationattackdetector,flin ICB ,IEEE,2019. [225] J.Stehouwer,A.Jourabloo,Y.Liu,andX.Liu,fiNoisemodeling,synthesisand tionforgenericobjectflin CVPR ,2020. 169 [226] M.-T.Luong,Q.V.Le,I.Sutskever,O.Vinyals,andL.Kaiser,fiMulti-tasksequenceto sequencelearning,fl arXivpreprintarXiv:1511.06114 ,2015. 
[227] E.MeyersonandR.Miikkulainen,fiPseudo-taskaugmentation:Fromdeepmultitasklearn- ingtointratasksharingŠandback,flin ICML ,pp.3511Œ3520,2018. [228] X.YinandX.Liu,fiMulti-taskconvolutionalneuralnetworkforpose-invariantfacerecog- nition,fl IEEET-IP ,vol.27,no.2,pp.964Œ975,2017. [229] M.Crawshaw,fiMulti-tasklearningwithdeepneuralnetworks:Asurvey,fl arXivpreprint arXiv:2009.09796 ,2020. [230] T.Gui,L.Qing,Q.Zhang,J.Ye,H.Yan,Z.Fei,andX.Huang,fiConstructingmultipletasks foraugmentation:Improvingneuralimagewithk-meansfeatures,flin AAAI , 2020. [231] K.Hsu,S.Levine,andC.Finn,fiUnsupervisedlearningviameta-learning,fl ICLR ,2018. [232] J.Baxter,fiAbayesian/informationtheoreticmodeloflearningtolearnviamultipletask sampling,fl Machinelearning ,vol.28,no.1,pp.7Œ39,1997. [233] N.Sanghvi,S.K.Singh,A.Agarwal,M.Vatsa,andR.Singh,fiMixnetforgeneralizedface presentationattackdetection,fl arXivpreprintarXiv:2010.13246 ,2020. [234] J.J.Engelsma,K.Cao,andA.K.Jain,fiLearningaed-lengthrepresentation,fl IEEEPAMI ,2019. 170