TOWARDS ROBUST AND SECURE FACE RECOGNITION: DEFENSE AGAINST PHYSICAL AND DIGITAL ATTACKS

By

Debayan Deb

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computer Science - Doctor of Philosophy

2021

ABSTRACT

TOWARDS ROBUST AND SECURE FACE RECOGNITION: DEFENSE AGAINST PHYSICAL AND DIGITAL ATTACKS

By

Debayan Deb

The accuracy, usability, and touchless acquisition of state-of-the-art automated face recognition (AFR) systems have led to their ubiquitous adoption in a plethora of domains, including mobile phone unlock, access control systems, and payment services. Despite impressive recognition performance, prevailing AFR systems remain vulnerable to the growing threat of face attacks, which can be launched in both physical and digital domains. Face attacks can be broadly classified into three attack categories: (i) Spoof attacks: artifacts in the physical domain (e.g., 3D masks, eyeglasses, replaying videos), (ii) Adversarial attacks: imperceptible noises added to probes for evading AFR systems, and (iii) Digital manipulation attacks: entirely or partially modified photo-realistic faces using generative models. Each of these categories is composed of different attack types. For example, each spoof medium, e.g., 3D mask and makeup, constitutes one attack type. Likewise, in adversarial and digital manipulation attacks, each attack model, designed with unique objectives and losses, may be considered as one attack type. Thus, the attack categories and types form a 2-layer tree structure encompassing the diverse attacks. Such a tree will inevitably grow in the future. Given the growing dissemination of "fake news" and "deepfakes", the research community and social media platforms alike are pushing towards generalizable defense against continuously evolving and sophisticated face attacks. In this dissertation, we propose a set of defense methods that achieve state-of-the-art performance in detecting attack types within individual attack categories, both physical (e.g., face spoofs) and digital (e.g., adversarial faces and digital manipulation), and then introduce a method for simultaneously safeguarding against each attack.

First, in an effort to impart generalizability and interpretability to face spoof detection systems, we propose a new face anti-spoofing framework designed to detect unknown spoof types, namely Self-Supervised Regional Fully Convolutional Network (SSR-FCN), that is trained to learn local discriminative cues from a face image in a self-supervised manner. The proposed framework improves generalizability while maintaining the computational efficiency of holistic face anti-spoofing approaches (< 4 ms on a Nvidia GTX 1080 Ti GPU). The proposed method is also interpretable since it localizes which parts of the face are labeled as spoofs. Experimental results show that SSR-FCN can achieve True Detection Rate (TDR) = 65% @ 2.0% False Detection Rate (FDR) when evaluated on a dataset comprising 13 different spoof types under unknown attacks, while achieving competitive performance on standard benchmark face anti-spoofing datasets (Oulu-NPU, CASIA-MFSD, and Replay-Attack).

Next, we address the problem of defending against adversarial attacks. We propose AdvFaces, an automated adversarial face synthesis method that learns to generate minimal perturbations in the salient facial regions. Once AdvFaces is trained, it can automatically evade state-of-the-art face matchers with attack success rates as high as 97.22% and 24.30% at 0.1% FAR for obfuscation and impersonation attacks, respectively. We then propose a new self-supervised adversarial defense framework, namely FaceGuard, that can automatically detect, localize, and purify a wide variety of adversarial faces without utilizing pre-computed adversarial training samples.
FaceGuard automatically synthesizes diverse adversarial faces, enabling a classifier to learn to distinguish them from bona fide faces. Concurrently, a purifier attempts to remove the adversarial perturbations in the image space. FaceGuard can achieve 99.81%, 98.73%, and 99.35% detection accuracies on LFW, CelebA, and FFHQ, respectively, on six unseen adversarial attack types.

Finally, we take the first steps towards safeguarding AFR systems against face attacks in both physical and digital domains. We propose a new unified face attack detection framework, namely UniFAD, which automatically clusters similar attacks and employs a multi-task learning framework to learn salient features to distinguish between bona fides and coherent attack types. The proposed UniFAD can detect face attacks from 25 attack types across all 3 attack categories with TDR = 94.73% @ 0.2% FDR on a large fake face dataset, namely GrandFake. Further, UniFAD can identify whether attacks are adversarial, digitally manipulated, or contain spoof artifacts, with 97.37% accuracy.

Copyright by
DEBAYAN DEB
2021

To My Loving Family

ACKNOWLEDGMENTS

Arguably, this section has been the hardest one to write in this entire dissertation due to the sheer number of individuals who were instrumental towards the completion of my PhD research. Foremost, I would like to express my deepest gratitude towards my advisor, Dr. Anil K. Jain, for his unwavering support and encouragement to strive for excellence. Despite my being a raw student at first, Dr. Jain took a chance on me by bringing me into his research lab and patiently teaching me the rigors of research. His (i) ability to systematically disseminate problems, (ii) attention to detail, (iii) work ethic, (iv) skill of presenting ideas in a visually convincing manner, and (v) promptness in responding to emails are some of the many virtues that I will always strive to instill in me. Apart from being a great scientist, he is also extraordinarily humble and kind at heart. Thank you, Dr. Jain, also for your friendship and guidance on matters outside of research. Thank you, Dr. Ross, for teaching CSE 491: Selected Topics in Biometrics and admitting me into your research group prior to my PhD. Without your initial support, I would not have been writing this dissertation.

I would also like to thank my PhD committee, Dr. Arun Ross, Dr. Xiaoming Liu, and Dr. Mi Zhang, for providing valuable comments and suggestions on this dissertation. A special thanks to Dr. Liu and Dr. Ross for your patience, inspirational ideas, and suggestions during our collaborative research. Thank you Brenda Hodge, Steven Smith, Amy King, and Erin Dunlop for your administrative assistance and your patience with my extremely tardy travel reimbursement forms. A special thanks to Christopher Perry for being a great support and help in setting up my GPU machine, without which the research conducted in this dissertation could not have been possible.

My deepest gratitude to Brendan Klare from Rank One Computing, Jianbang Zhang from Lenovo, and Biplob Debnath from NEC, for providing me with internship opportunities at their respective companies. Through the internships, I got an opportunity to view research from a business perspective. I would also like to thank Ford, Lenovo, IARPA, and Facebook AI Research for their support in research conducted during my PhD program.
I am grateful for all my fellow PRIPies, especially my contemporaries (Josh, Yichun, and Sixue). Thank you, Josh, for your close friendship and the many beer-driven conversations we had across the world over the years. Thank you, Inci, for all of our conversations in the lab, your encouragements throughout, and willingness to collaborate with me on future research. A special thanks to Aishvary, who has been like a brother to me for the past decade. Thank you Vishesh, Divyansh, Yichun, and Tarang for helping me gain confidence during my lows. Thank you also to Steve, Kai, Cori, Lacey, Charles, and Sunpreet. Special thanks to non-PRIPies including but not limited to Yaojie Liu, Amin Jourabloo, Joel Stehouwer, Sudipta Banerjee, Rahul Dey, Siddharth Shukla, Steven Hoffman, Adam Terwilliger, Xi Yin, and Thomas Swearingen. Sincere apologies to all for my tardiness.

I owe a great deal to my parents and sister for being a constant source of support and guidance while I journeyed through the ups and downs as a graduate student. Without their moral support, none of this would have been possible.

There are of course many others who have had a great impact on me along the way and I sincerely thank all of you.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS

Chapter 1    Introduction
  1.1 Background
  1.2 Automated Face Recognition (AFR)
    1.2.1 AFR Pipeline
      1.2.1.1 Face Detection
      1.2.1.2 Face Alignment
      1.2.1.3 Feature Extraction
      1.2.1.4 Similarity Measurement
  1.3 Evolution of Face Recognition
    1.3.1 Face Representations
      1.3.1.1 Holistic Face Representations
      1.3.1.2 Local Face Representations
      1.3.1.3 Learned Face Representations
  1.4 Benchmarking AFR Systems
    1.4.1 Evaluation Metrics
    1.4.2 Face Datasets
    1.4.3 Constrained Face Recognition
    1.4.4 Unconstrained Face Recognition
  1.5 Vulnerabilities of AFR Systems
    1.5.1 Physical Face Spoofs
    1.5.2 Digital Adversarial Faces
    1.5.3 Digital Face Manipulation
  1.6 Dissertation Contributions

Chapter 2    Defending Against Face Spoofs
  2.1 Introduction
  2.2 Background
  2.3 Motivation
    2.3.1 Face Presentation Attack Detection is a Local Task
    2.3.2 Global vs. Local Supervision
  2.4 Proposed Approach
    2.4.1 Network Architecture
    2.4.2 Network Efficiency
    2.4.3 Stage I: Training FCN Globally
    2.4.4 Stage II: Training FCN on Self-Supervised Regions
    2.4.5 Testing
  2.5 Experimental Setup
    2.5.1 Datasets
      2.5.1.1 Spoof-in-the-Wild with Multiple Attacks (SiW-M) [1]
      2.5.1.2 Oulu-NPU [2]
      2.5.1.3 CASIA-FASD [3] & Replay-Attack [4]
    2.5.2 Data Preprocessing
    2.5.3 Implementation Details
    2.5.4 Evaluation Metrics
  2.6 Experimental Results
    2.6.1 Evaluation of Global Descriptor vs. Local Representation
    2.6.2 Region Extraction Strategies
    2.6.3 Evaluation of Network Capacity
    2.6.4 Generalization across Unknown Attacks
    2.6.5 SiW-M: Detecting Known Attacks
    2.6.6 Evaluation on Oulu-NPU Dataset
    2.6.7 Cross-Dataset Generalization
    2.6.8 Failure Cases
    2.6.9 Computational Requirement
    2.6.10 Visualizing Presentation Attack Regions
  2.7 Discussion
  2.8 Summary

Chapter 3    Synthesizing and Defending Against Adversarial Faces
  3.1 Introduction
  3.2 Related Work
    3.2.1 Generative Adversarial Networks (GANs)
    3.2.2 Adversarial Attacks on Image Classification
    3.2.3 Adversarial Attacks on Face Recognition
    3.2.4 Defenses Against Adversarial Attacks
  3.3 Synthesizing Adversarial Faces
    3.3.1 Proposed Methodology
    3.3.2 Experimental Settings
    3.3.3 Comparison with State-of-the-Art
    3.3.4 Ablation Study
    3.3.5 What is AdvFaces Learning?
    3.3.6 Transferability of AdvFaces
    3.3.7 Effect of Perturbation Amount
    3.3.8 Human Perceptual Study
    3.3.9 Addendum
      3.3.9.1 Implementation Details
      3.3.9.2 Effect on Cosine Similarity
      3.3.9.3 Structural Similarity
      3.3.9.4 Baseline Implementation Details
  3.4 Defending Against Adversarial Faces
    3.4.1 Limitations of State-of-the-Art Defenses
    3.4.2 Proposed Methodology
      3.4.2.1 Adversarial Generator
      3.4.2.2 Adversarial Detector
      3.4.2.3 Adversarial Purifier
      3.4.2.4 Training Framework
    3.4.3 Experimental Settings
    3.4.4 Comparison with State-of-the-Art Defenses
    3.4.5 Analysis of Our Approach
    3.4.6 Addendum
      3.4.6.1 Implementation Details
      3.4.6.2 Preprocessing
      3.4.6.3 Network Architectures
      3.4.6.4 Training Details
      3.4.6.5 Baselines
      3.4.6.6 Additional Datasets
      3.4.6.7 Overfitting in Prevailing Detectors
      3.4.6.8 Qualitative Results
      3.4.6.9 Additional Results on Purification
  3.5 Summary

Chapter 4    Unified Detection of Digital and Physical Face Attacks
  4.1 Introduction
  4.2 Related Work
  4.3 Dissecting Prevailing Defense Systems
    4.3.1 Datasets
    4.3.2 Drawback of Joint CNN
    4.3.3 Unifying Multiple Joint CNNs
  4.4 Proposed Method
    4.4.1 Problem Definition
    4.4.2 Automatic Construction of Auxiliary Tasks
    4.4.3 Multi-Task Learning with Constructed Tasks
    4.4.4 Parameter Sharing
    4.4.5 Training and Testing
  4.5 Experimental Results
    4.5.1 Experimental Settings
    4.5.2 Comparison with Individual SOTA Detectors
    4.5.3 Comparison with Fused SOTA Detectors
    4.5.4 Attack Classification
    4.5.5 Analysis of UniFAD
  4.6 Addendum
    4.6.1 GrandFake Dataset
    4.6.2 Implementation Details
    4.6.3 Digital Attack Implementation
    4.6.4 Baseline Implementation
    4.6.5 Seen Attacks
    4.6.6 Generalizability to Unseen Attacks
    4.6.7 Attack Category Classification
  4.7 Summary

Chapter 5    Summary
  5.1 Contributions
  5.2 Suggestions for Future Work

Chapter 6    PhD Overview
  6.1 Publications
  6.2 Videos & Demos
  6.3 Media Coverage

BIBLIOGRAPHY
LIST OF TABLES

Table 1.1    Verification performance (%) under two different face detectors on LFW, CFP-FP, and AgeDB-30 [5]. Figure 1.7 shows a few examples of each of the three datasets.
Table 1.2    Verification performance (%) of the FaceNet [6] AFR system under different face alignment techniques [7].
Table 1.3    Verification performance (%) on LFW [8] for different face feature extractors.
Table 1.4    Benchmarking AFR performance throughout the years in NIST evaluations on frontal and constrained faces.
Table 2.1    A summary of publicly available face presentation attack detection datasets.
Table 2.2    Architecture details of the proposed FCN backbone.
Table 2.3    Generalization error on learning global (CNN) vs. local (FCN) representations of SiW-M [1].
Table 2.4    Generalization performance of different region extraction strategies on the SiW-M dataset. Here, each column represents an unknown presentation attack instrument while the method is trained on the remaining 12 presentation attack instruments.
Table 2.5    Generalization error of FCNs with respect to the number of trainable parameters.
Table 2.6    Results on SiW-M: Unknown Attacks. Here, each column represents an unknown presentation attack instrument while the method is trained on the remaining 12 presentation attack instruments.
Table 2.7    Results on SiW-M: Known presentation attack instruments.
Table 2.8    Error Rates (%) of the proposed SSR-FCN and competing face presentation attack detectors under the four standard protocols of Oulu-NPU [2].
Table 2.9    Cross-Dataset HTER (%) of the proposed SSR-FCN and competing face presentation attack detectors.
Table 3.1    Related work in adversarial defenses used as baselines in our study. Unlike the majority of prior work, FaceGuard is self-supervised, where no pre-computed adversarial examples are required for training.
Table 3.2    Attack success rates and structural similarities between probe and gallery images for obfuscation and impersonation attacks. Attack rates for obfuscation comprise 484,514 comparisons, and the mean and standard deviation across 10 folds for impersonation are reported. The mean and standard deviation of the structural similarities between adversarial and probe images, along with the time taken to generate a single adversarial image (on a Quadro M6000 GPU), are also reported.
Table 3.3    For each method, the average and standard deviation (%) of the number of times workers chose the synthesized image to be closest to the probe.
Table 3.4    Face recognition performance of ArcFace [9] under adversarial attack and average structural similarities (SSIM) between probe and adversarial images for obfuscation attacks on 485K genuine pairs in LFW [8].
Table 3.5    Detection accuracy of SOTA adversarial face detectors in classifying six adversarial attacks synthesized for the LFW dataset [8]. Detection threshold is set as 0.5 for all methods. All baseline methods require training on pre-computed adversarial attacks on CASIA-WebFace [10]. On the other hand, the proposed FaceGuard is self-guided and generates adversarial attacks on the fly. Hence, it can be regarded as a black-box defense system.
Table 3.6    AFR performance (TAR (%) @ 0.1% FAR) of ArcFace under no defense and when ArcFace is trained via SOTA robustness techniques [11-13] or SOTA purifiers [14, 15]. FaceGuard correctly passes the majority of real faces to ArcFace and also purifies adversarial attacks.
Table 3.7    Ablating training schemes of the generator G and detector D. All models are trained on CASIA-WebFace [10]. (Col. 3) We compute the detection accuracy in classifying real faces in LFW [8] and the most challenging adversarial attack in Tab. 3.4, AdvFaces [16]. (Col. 4) The avg. and std. dev. of detection accuracy across all 6 adversarial attacks.
Table 3.8    Average and standard deviation of detection accuracies of SOTA adversarial face detectors in classifying six adversarial attacks synthesized for the LFW [8], CelebA [17], and FFHQ [18] datasets. Detection threshold is set as 0.5 for all methods. All baseline methods require training on pre-computed adversarial attacks on CASIA-WebFace [10]. On the other hand, the proposed FaceGuard is self-guided and generates adversarial attacks on the fly. Hence, it can be regarded as a black-box defense system.
Table 3.9    Detection accuracy of SOTA adversarial face detectors in classifying six adversarial attacks synthesized for the LFW dataset [8] under various known and unseen attack scenarios. Detection threshold is set as 0.5 for all methods.
Table 4.1    Face attack datasets with no. of bona fide images, no. of attack images, and no. of attack types. Here, I denotes images and V refers to videos.
Table 4.2    Detection accuracy (TDR (%) @ 0.2% FDR) on the GrandFake dataset. Results on fusing FaceGuard [19], FFD [20], and SSR-FCN [21] are also reported. We report the time taken to detect a single image (on a Nvidia 2080 Ti GPU).
Table 4.3    Ablation study over components of UniFAD. Branching via "B-Semantic", "B-Random", and "B-kMeans" refers to partitioning attack types by their semantic categories, randomly, and via k-Means, respectively. "Shared Semantic" includes shared layers prior to branching.
Table 4.4    Composition and statistics for the proposed GrandFake dataset. We also include the evaluation protocol for the seen attack scenario.
Table 4.5    Detection performance (TDR (%) @ 0.2% FDR and Accuracy (%)) on the GrandFake dataset under the seen attack scenario.
Table 4.6    Generalization performance (TDR (%) @ 0.2% FDR and Accuracy (%)) on the GrandFake dataset under the unseen attack setting. Each fold comprises 8 unseen attacks from all 4 branches.

LIST OF FIGURES

Figure 1.1    (a) Identification error rates of six state-of-the-art automated face recognition vendors on a 12 million mugshot dataset, namely FRVT-2018 [22]. (b) Six mugshots representative of the FRVT-2018 dataset.
Figure 1.2    Sources of intra-subject variability: (a) pose, (b) illumination, and (c) expression. Each row shows intra-subject variations for the same individual in (a-c; source: PIE Dataset [23]), (d) Amitabh Bacchan (source: Google Images), and (e) Tom Hiddleston (source: Google Images).
Figure 1.3    Identification error rates of six state-of-the-art automated face recognition vendors when (a) mugshots, (b) webcam images, and (c) profile faces are compared against a 1.6 million mugshot dataset (a subset of the FRVT-2018 [22]).
Figure 1.4    Identification error rates of six state-of-the-art automated face recognition vendors on a 3 million mugshot dataset under aging [22]. Face recognition errors increase as the time gap between a probe image and the enrolled face image in the gallery increases.
Figure 1.5    Source of inter-subject similarities: (a) kinship relations (here, twins), (b) different people with no kinship relations who happen to exhibit very similar facial characteristics (known as "doppelgangers"), and (c) Richard Jones (right) spent 17 years in prison for a crime committed by his doppelganger, Ricky Amos (left) (Source: https://cnn.it/2Gb1F4A).
Figure 1.6    A typical Automated Face Recognition (AFR) system consists of (i) face detection, (ii) face alignment (to mitigate geometric distortions), (iii) feature extraction, and (iv) comparison of face representations (feature vectors).
Figure 1.7    Example faces in (a) LFW [8], (b) CFP [24], and (c) AgeDB [25] datasets.
Figure 1.8    A state-of-the-art face detector, RetinaFace, can detect around 900 faces (detection threshold at 0.5) out of the 1,151 people reported to be present in the "World's Largest Selfie" [5]. The yellow rectangle denotes the bounding box around a face and green dots represent the detected landmarks.
Figure 1.9    Example images that supposedly contain faces but cannot be detected by a state-of-the-art face detector, RetinaFace [5]. Note that these images are of very low resolution.
Figure 1.10    Illustration of various 2D face alignment techniques: (i) simply cropping the face region, (ii) similarity transform (scale and rotation), (iii) affine transformation (rotation, scaling, and shear mapping), and (iv) projective transformation (perspective deformation) [7].
Figure 1.11    Example face images from (a) FERET [26], (b) FRGC [26], (c) LFW [8], (d) IJB-A [27], (e) MS-Celeb-1M [28], and (f) TinyFace [29]. Datasets (a) and (b) contain face images under relatively controlled acquisition conditions. Datasets (c-f) contain more unconstrained face images (e.g., collected from the Internet).
Figure 1.12    Face attacks against AFR systems are continuously evolving in both digital and physical spaces. Given the diversity of the face attacks, prevailing methods fall short in detecting attacks across all three categories (i.e., adversarial, digital manipulation, and spoofs).
Figure 1.13    Example presentation attacks: Simple attacks include (b) a printed photograph, or (c) replaying the victim's video. More advanced presentation attacks can also be leveraged, such as (d-h) 3D masks, (i-k) make-up attacks, or (l-n) partial attacks [30]. A bona fide face is shown in (a) for comparison. Here, the presentation attacks in (b-c, k-n) belong to the same person in (a).
Figure 1.14    Example gallery and probe face images and corresponding synthesized adversarial examples. (a) Two celebrities' real face photos enrolled in the gallery and (b) the same subjects' probe images; (c) Adversarial examples generated from (b) by our proposed synthesis method, AdvFaces; (d-e) Results from two adversarial example generation methods. Cosine similarity scores (in [-1, 1]) obtained by comparing (b-e) to the enrolled image in the gallery via ArcFace [9] are shown below the images. A score above 0.28 (threshold @ 0.1% False Accept Rate) indicates that two face images belong to the same subject. Here, a successful obfuscation attack would mean that humans can identify the adversarial probes and enrolled faces as belonging to the same identity but an automated face recognition system considers them to be from different subjects.
Figure 1.15    Examples of digitally manipulated faces. (a) Real images/frames from FFHQ, CelebA and FaceForensics++ datasets; (b) Paired face identity swap images from FaceForensics++ dataset; (c) Paired face expression swap images from FaceForensics++ dataset; (d) Attribute manipulated examples by FaceAPP and StarGAN; (e) Entirely synthesized faces by PGGAN and StyleGAN. Collage sourced from [20].
Figure 2.1    Example presentation attack instruments: Simple attacks include (b) a printed photograph, or (c) replaying the victim's video. More advanced presentation attacks can also be leveraged, such as (d-h) 3D masks, (i-k) make-up attacks, or (l-n) partial attacks [30]. A bona fide face is shown in (a) for comparison. Here, the presentation attacks in (b-c, k-n) belong to the same person in (a).
Figure 2.2    An overview of the proposed Self-Supervised Regional Fully Convolutional Network (SSR-FCN). We train in two stages: (1) Stage 1 learns global discriminative cues via training on the entire face image. The score map obtained from stage 1 is hard-gated to obtain presentation attack regions in the face image. We randomly crop arbitrary-size patches from the presentation attack regions and fine-tune our network in stage 2 to learn local discriminative cues. During testing, we input the entire face image to obtain the final score. The score map can also be used to visualize the presentation attack regions in the input image.
Figure 2.3    Illustration of drawbacks of prior approaches. Top: example of a bona fide face; Bottom: example of a paper glasses presentation attack. In this case, the presentation attack artifact is only present in the eye region of the face. (a) A network trained with global supervision overfits to the bona fide class since both images are mostly bona fide (the presentation attack instrument covers only a part of the face). (b) Pixel-level supervision assumes the entire image is either bona fide or presentation attack and constructs label maps accordingly. This is not a valid assumption in mask, makeup, and partial presentation attack instruments. Instead, (c) the proposed framework trains on extracted regions from face images. These regions can be based on domain knowledge, such as eye, nose, and mouth regions, or randomly cropped. The proposed SSR-FCN utilizes self-supervised region selection.
Figure 2.4    Three presentation attack images and their corresponding binary masks extracted from predicted score maps. Black regions correspond to predicted bona fide regions, whereas white regions indicate presentation attack.
Figure 2.5    Illustration of various region extraction strategies from training images. (a) and (b) are fixed regions extracted via domain knowledge (manually defined or landmark-based). (c) random regions extracted via the proposed self-supervision scheme. Each color denotes a separate region.
Figure 2.6    (a) An example obfuscation presentation attack attempt where our network correctly predicts the input to be a presentation attack. (b, e) Score map output by our network trained via Self-Supervised Regions. (c, f) Score map output by FCN trained on entire face images. (d) An example obfuscation presentation attack attempt where our network incorrectly predicts the input to be bona fide. Attack scores are given below the score maps. Decision threshold is 0.5.
Figure 2.7    Network convergence over a number of training iterations when a model trains on (a) randomly cropped patches (blue line), and (b) self-supervised regions extracted via the pre-trained model from Stage I (orange line). Randomly cropping patches may result in noisy samples, where bona fide samples from presentation attack samples may be used for training. Some example randomly cropped patches with high training loss are shown above the lines. Instead, we find that the proposed self-supervision aids in network convergence.
Figure 2.8    Example cases where the proposed framework, SSR-FCN, fails to correctly classify bona fide and presentation attacks. (a) Bona fides misclassified as presentation attacks, likely due to bright lighting and occlusions in face regions. (b) Presentation attacks misclassified as bona fide due to the subtle nature of make-up attacks and transparent masks. Corresponding attack scores (in [0, 1]) are provided below each image. A larger value of attack score indicates a higher likelihood that the input image is a presentation attack. Decision threshold is 0.5.
Figure 2.9    Visualizing presentation attack regions via the proposed SSR-FCN. Red regions indicate higher likelihood of being a presentation attack region. Corresponding attack scores (in [0, 1]) are provided below each image. A larger value of attack score indicates a higher likelihood that the input image is a presentation attack. Decision threshold is 0.5.
Figure 2.10    A partial presentation attack artifact may be present in a small portion of the input 256 x 256 face image, such as (a) paper eyeglasses. However, since the proposed SSR-FCN dynamically aggregates decisions across multiple receptive fields in the image, a majority of the pixels in the score map comprise high scores (indicating the presence of a presentation attack). We visualize the average score map across all paper eyeglass attacks in (b).
Figure 3.1    Example gallery and probe face images and corresponding synthesized adversarial examples. (a) Two celebrities' real face photos enrolled in the gallery and (b) the same subjects' probe images; (c) Adversarial examples generated from (b) by our proposed synthesis method, AdvFaces; (d-e) Results from two adversarial example generation methods. Cosine similarity scores (in [-1, 1]) obtained by comparing (b-e) to the enrolled image in the gallery via ArcFace [9] are shown below the images. A score above 0.28 (threshold @ 0.1% False Accept Rate) indicates that two face images belong to the same subject. Here, a successful obfuscation attack would mean that humans can identify the adversarial probes and enrolled faces as belonging to the same identity but an automated face recognition system considers them to be from different subjects.
Figure 3.2    Three types of face presentation attacks: (a) printed photograph, (b) replaying the targeted person's video on a smartphone, and (c) a silicone mask of the target's face. Face presentation attacks require a physical artifact. Adversarial attacks (d), on the other hand, are digital attacks that can compromise either a probe image or the gallery itself. To a human observer, face presentation attacks (a-c) are more conspicuous than adversarial faces (d).
Figure 3.3    Eight points of attacks in an automated face recognition system [31]. An adversarial image can be injected into the AFR system at points 2 and 6 (solid arrows).
Figure 3.4    Once trained, AdvFaces automatically generates an adversarial face image. During an obfuscation attack, (a) the adversarial face appears to be a benign example of Cristiano Ronaldo's face; however, it fails to match his enrolled image. AdvFaces can also combine Cristiano's probe and Brad Pitt's probe to synthesize an adversarial image that looks like Cristiano but matches Brad's gallery image (b).
Figure 3.5    Given a probe face image, AdvFaces automatically generates an adversarial mask that is then added to the probe to obtain an adversarial face image.
Figure 3.6    Adversarial face synthesis results on the LFW dataset in (a) obfuscation and (b) impersonation attack settings (cosine similarity scores obtained from ArcFace [9] with threshold @ 0.1% FAR = 0.28). The proposed method synthesizes adversarial faces that are seemingly inconspicuous and maintain high perceptual quality. Additional examples are available in the Addendum (Sec. 3.3.9).
Figure 3.7    Variants of AdvFaces trained without the discriminator, perturbation loss, and identity loss, respectively. Every component of AdvFaces is necessary.
Figure 3.8    State-of-the-art face matchers can be evaded by slightly perturbing salient facial regions, such as eyebrows, eyeballs, and nose (cosine similarity obtained via ArcFace [9]).
Figure 3.9    Correlation between face features extracted via FaceNet and ArcFace from 1,456 images belonging to 10 subjects.
Figure 3.10    2D t-SNE visualization of face representations extracted via FaceNet and ArcFace from 1,456 images belonging to 10 subjects.
Figure 3.11    Trade-off between attack success rate and structural similarity for impersonation attacks. We choose a perturbation hinge of 8.0.
Figure 3.12    Example failure cases where human observers voted an adversarial image synthesized by (c) GFLM [32] and (f) PGD [33] to be closer to the probe face (a), (d) compared to AdvFaces (b), (e).
Figure 3.13    Shift in cosine similarity scores for ArcFace [9] before and after adversarial attacks generated via AdvFaces.
Figure 3.15    Left: Real face images in the LFW dataset. Right: Adversarial images synthesized via AdvFaces under the obfuscation setting.
Figure 3.16    Leonardo DiCaprio's real face photo (a) enrolled in the gallery and (b) his probe image; (c) Adversarial probe synthesized by a state-of-the-art (SOTA) adversarial face generator, AdvFaces [16]; (d) The proposed adversarial defense framework, namely FaceGuard, takes (c) as input, detects adversarial images, localizes perturbed regions, and outputs a purified face devoid of adversarial perturbations. A SOTA face recognition system, ArcFace, fails to match Leonardo's adversarial face (c) to (a); however, the purified face can successfully match to (a). Cosine similarity scores (in [-1, 1]) obtained via ArcFace [9] are shown below the images. A score above 0.36 (threshold @ 0.1% False Accept Rate) indicates that two faces are of the same subject.
Figure 3.17    (Top Row) Adversarial faces synthesized via 6 adversarial attacks used in our study. (Bottom Row) Corresponding adversarial perturbations (gray indicates no change from the input). Notice the diversity in the perturbations. ArcFace scores between the adversarial image and the unaltered gallery image (not shown here) are given below each image. A score above 0.36 indicates that two faces are of the same subject. Zoom in for details.
Figure 3.18    FaceGuard employs a detector (D) to compute an adversarial score. A score below the detection threshold passes the input to the AFR system, while a high value invokes a purifier and sends the purified face to the AFR system.
Figure 3.19    (a) Adversarial training degrades AFR performance of the FaceNet matcher [6] on real faces in the LFW dataset compared to standard training. (b) A binary classifier trained to distinguish between real faces and FGSM [34] attacks fails to detect an unseen attack type, namely PGD [35].
Figure 3.20    Overview of training the proposed FaceGuard in a self-supervised manner. An adversarial generator, G, continuously learns to synthesize challenging and diverse perturbations that evade a face matcher. At the same time, a detector, D, learns to distinguish between the synthesized adversarial faces and real face images. Perturbations residing in the synthesized adversarial faces are removed via a purifier, Pur.
Figure 3.21    Examples where the proposed FaceGuard fails to correctly detect (a) real faces and (b) adversarial faces. Detection scores (in [0, 1]) are given below each image, where 0 indicates a real face and 1 indicates an adversarial face.
Figure 3.22    Adversarial faces synthesized by FaceGuard during training. Note the diversity in perturbations (a) within and (b) across iterations.
Figure 3.23    FaceGuard successfully purifies the adversarial image (red regions indicate adversarial perturbations localized by our purification mask). ArcFace [9] scores (in [-1, 1]) and SSIM (in [0, 1]) between the purified adversarial probe and the input probe are given below each image.
Figure 3.24    (a) FaceGuard's purification is correlated with its adversarial synthesis process. (b) Trade-off between detection and purification with respect to perturbation magnitudes. With minimal perturbation, detection is challenging while purification maintains AFR performance. Excessive perturbations lead to easier detection with greater challenge in purification.
Figure 3.25    Training loss across iterations when an adversarial detection network is trained via pre-computed adversarial faces (blue), the proposed adversarial generator but without the diversity loss (orange), and with the proposed diversity loss (green). The diversity loss prevents the network from overfitting to adversarial perturbations encountered during training.
Figure 3.26    Examples of generated adversarial images along with corresponding perturbation masks obtained via FaceGuard's generator G for three randomly sampled z. Cosine similarity scores via ArcFace [9] (in [-1, 1]) and SSIM (in [0, 1]) between the synthesized adversarial image and the input probe are given below each image. A score above 0.36 (threshold @ 0.1% False Accept Rate) indicates that two faces are of the same subject.
Figure 3.27    Examples of purified images via MagNet [14], Defense-GAN [15], and the proposed FaceGuard for six adversarial attacks. Cosine similarity scores via ArcFace [9] (in [-1, 1]) are given below each image. A score above 0.36 (threshold @ 0.1% False Accept Rate) indicates that two faces are of the same subject.
Figure 3.28    Examples of synthesized adversarial images via the proposed adversarial generator and corresponding purified images. Cosine similarity between perturbation and purification masks is given below each row, along with ArcFace scores between the synthesized adversarial image and the real probe. A score above 0.36 (threshold @ 0.1% False Accept Rate) indicates that two faces are of the same subject. Even with lower correlation between perturbation and purification masks (rows 3-5), the purified images can still be identified as the correct identity. Notice that the purifier primarily alters the eye color and nose, and subdues adversarial perturbations in foreheads. Zoom in for details.
Figure 3.29    ArcFace scores (in [-1, 1]) / Detection scores (in [0, 1]) when the perturbation amount is varied (0.25, 0.50, 0.75, 1.00, 1.25). Detection scores above 0.5 are predicted as adversarial images, while ArcFace scores above 0.36 (threshold @ 0.1% False Accept Rate) indicate that two faces are of the same subject. FaceGuard is trained with a perturbation amount of 1.00. The detection scores improve as the perturbation amount increases, whereas the majority of purified images are detected as real. Even when purified images fail to be classified as real by the detector, the purified faces maintain high AFR performance.
Figure 3.30    2D t-SNE visualization of face representations extracted via ArcFace from 1,456 (a) real, (b) AdvFaces [16], and (c) purified images belonging to 10 subjects in LFW [8]. Example AdvFaces [16] pertaining to a subject move farther from their identity cluster while the proposed purifier draws them back.
Figure 4.1    Face attacks against AFR systems are continuously evolving in both digital and physical spaces. Given the diversity of the face attacks, prevailing methods fall short in detecting attacks across all three categories (i.e., adversarial, digital manipulation, and spoofs). This work is among the first to define the task of face attack detection on the 25 attack types across 3 categories shown here.
Figure 4.2    (a) Detection performance (TDR @ 0.2% FDR) in detecting each attack type by the proposed UniFAD (purple) and the difference in TDR from the best fusion scheme, LightGBM [36] (pink). (b) Cosine similarity between mean features for 25 attack types extracted by JointCNN. (c) Examples of attack types from 4 different clusters via k-means clustering on JointCNN features. Attack types in purple, blue, and red denote spoofs, adversarial, and digital manipulation attacks, respectively.
Figure 4.3    An overview of training UniFAD in two stages. Stage 1 automatically clusters coherent attack types into T groups. Stage 2 consists of an MTL framework where early layers learn generic attack features while T branches learn to distinguish bona fides from coherent attacks.
Figure 4.4    Confusion matrix representing the classification accuracy of UniFAD in identifying the 25 attack types. The majority of misclassifications occur within the attack category. Darker values indicate higher accuracy. Overall, UniFAD achieves 75.81% and 97.37% classification accuracy in identifying attack types and categories, respectively. Purple, blue, and red denote spoofs, adversarial, and digital manipulation attacks, respectively.
Figure 4.5    Detection performance with respect to varying ratio of shared layers (left) and number of branches (right). Our proposed architecture uses 50% shared layers with 4 branches.
Figure 4.6    Detection performance on attack types within and outside a branch's partition. Performance drops on attacks outside the partition as they may not have any correlation with within-partition attack types.
Figure 4.7    Example cases where UniFAD fails to detect face attacks. Final detection scores along with scores from each of the four branches (in [0, 1]) are given below each image. Scores closer to 0 indicate bona fide. Branches responsible for the respective cluster are highlighted in bold.
Figure 4.8    Training and testing splits for the generalizability study.
Figure 4.9    Confusion matrix representing the classification accuracy of UniFAD in identifying the 3 attack categories, namely adversarial faces, digital face manipulation, and spoofs. The majority of confusion occurs within digital attacks (adversarial and digital manipulation attacks).

LIST OF ALGORITHMS

Algorithm 1    Training AdvFaces. All experiments in this work use a learning rate of 0.0001, beta1 = 0.5, beta2 = 0.9, lambda_i = 10.0, lambda_p = 1.0, and batch size m = 32. We set the perturbation hinge to 3.0 (obfuscation) and 8.0 (impersonation).
Algorithm 2    Training FaceGuard. All experiments in this work use a learning rate of 0.0001, beta1 = 0.5, beta2 = 0.9, lambda_obf = lambda_fr = 10.0, lambda_pt = lambda_perc = lambda_div = 1.0, a perturbation hinge of 3.0, and batch size m = 16. For brevity, lg refers to the log operation.

Chapter 1

Introduction
Automated face recognition (AFR) systems - the software that maps, analyzes, and then confirms the identity of a face in a photograph or video - are among the most powerful surveillance tools ever made. While people interact with AFR systems as a way of unlocking their phones or sorting their photos, face recognition technologies are currently deployed in many important applications. The ubiquity of AFR systems is evident in both commercial and governmental applications. For instance, face recognition plays a vital role in identity card de-duplication to prevent a person from obtaining multiple ID cards, such as driver's licenses and passports, under different names (https://bit.ly/2SO0yuL). AFR systems are also utilized by the United States Department of Homeland Security (DHS) and the United States Department of Defense (DoD), along with the Federal Bureau of Investigation (FBI) and Immigration and Customs Enforcement (ICE), to determine friend or foe at security checkpoints, and to assist law enforcement officers in the field to capture face images with mobile devices, submit them to face recognition systems on central servers, and quickly identify people who refuse to give their name, provide false information, or are injured and unresponsive (https://bit.ly/3jSO4xH). Face recognition systems are additionally employed for surveillance and access control to secure locations. Commercial applications of automatic face recognition are also now abundant, including automatic "tag" suggestions on Facebook, organization of personal photo collections, and mobile phone unlock.

Figure 1.1 (a) Identification error rates of six state-of-the-art automated face recognition vendors on a 12 million mugshot dataset, namely FRVT-2018 [22]. (b) Six mugshots representative of the FRVT-2018 dataset.

The current capabilities of an AFR system depend heavily on the image acquisition conditions (i.e., subject cooperation, environmental conditions, etc.). ID document verification, for instance, entails frontal-to-frontal face matching of controlled images, where faces are required to have a neutral expression, a uniform background, and controlled lighting. In these scenarios, state-of-the-art commercial off-the-shelf (COTS) face recognition systems are highly accurate and have proven to be extremely useful. Forty-three of the 50 states in the United States are utilizing AFR systems for detecting fraudulent ID documents; if the FBI needs to put a name to a face, the bureau can scan through 411 million face photos spread across state and federal databases (https://bit.ly/36edG4b). In 2020, a large-scale face recognition evaluation, conducted by the National Institute of Standards and Technology (NIST), demonstrated that the error rates of the top performing COTS AFR systems were very low for identifying mugshot face images at rank-1 against a gallery comprising 12 million mugshots (see Figure 1.1).

Compared to other biometric traits such as fingerprint and iris, face offers a number of advantages: (i) We, as humans, intrinsically identify others via face. Therefore, face does not reveal any extraneous sensitive information that people themselves do not already reveal to the public on a daily basis. Consequently, face recognition tends to be a more publicly acceptable biometric trait (compared to, say, fingerprints, which are commonly associated with criminal accusations). (ii) Unlike fingerprint and iris, no specialized hardware sensors are required; digital cameras are readily available (i.e., in smartphones) and are relatively inexpensive. (iii) Faces can be acquired unobtrusively at a distance, and in a covert manner, if required. (iv) Available large-scale legacy face image datasets (such as passport and driver's license) ease benchmarking face recognition performance. (v) In addition to identity, faces also reveal demographic attributes such as gender, race, and age. (vi) With the recent COVID-19 pandemic, the "touchless" nature of face image acquisition is attractive in both government and commercial applications.

With new emerging applications of AFR systems, the advantages of utilizing the face biometric for identification are apparent. In today's society, smartphone and surveillance cameras are ubiquitous. As American schools, malls, and offices tighten security on their premises, the number of surveillance cameras in the United States is said to grow to 85 million by 2021, compared to 70 million in 2020. China alone is expected to have 560 million networked CCTV cameras by the end of 2021 (https://bit.ly/3nIQt0d). The total number of surveillance cameras around the world is said to climb above 1 billion by the end of 2021 (https://on.wsj.com/3345RvW). With recent tragic police incidents such as the deaths of George Floyd in Minneapolis, Freddie Gray in Baltimore, and Breonna Taylor in Louisville, many police departments are now requiring patrol officers to wear body cameras. In this era of constant documentation of personal lives on social media websites, such as Facebook and Instagram, personal collections of face photos are also booming, with an estimated 25,000 selfies taken by an average person (18 to 34 year olds) during their lifetime (https://bit.ly/346F0P4). Due to this increasingly vast collection of available imagery, in addition to countless routine crimes (e.g., robbery, kidnapping, assault), AFR systems are an invaluable tool for identification of persons of interest. For example, Walter Yovany-Gomez, a member of the MS-13 street gang, evaded authorities for years before the FBI put him on its Ten Most Wanted Fugitives list in 2011. With the aid of an AFR system, investigators traced Mr. Gomez in 2017 via photos on Facebook containing his face (https://n.pr/3j72xWl). Without utilizing an automated face recognition system, manually sifting through the 250 billion photos uploaded on Facebook to date (https://bit.ly/30bGTZY) is an impossible task, and a wanted criminal such as Mr. Gomez would be roaming around freely today.

Figure 1.2 Sources of intra-subject variability: (a) pose, (b) illumination, and (c) expression. Each row shows intra-subject variations for the same individual in (a-c; source: PIE Dataset [23]), (d) Amitabh Bacchan (source: Google Images), and (e) Tom Hiddleston (source: Google Images).

Even with the successes enjoyed by AFR systems in the aforementioned scenarios, face recognition technology is still limited by the unconstrained nature of the imagery available. Accuracies of prevailing COTS AFR systems are highly sensitive to the image acquisition conditions. In unconstrained scenarios, face image acquisition is not well-controlled and subjects may be non-cooperative (or unaware). Confounding factors plaguing prevailing AFR systems include:

Pose: Face poses can vary via in-plane (roll) or out-of-plane rotation (yaw and/or pitch). In-plane rotations can be corrected via straightforward 2D transformations. However, out-of-plane rotations cause faces to become "self-occluded", leading to missing information and poor face recognition performance. See Figure 1.2a for some example faces with extreme pose variation.

Illumination: Even for faces acquired in natural environmental settings, lighting conditions can vary drastically (such as indoors vs. outdoors) and illumination is also affected by daily changes (such as time of day and/or weather conditions). Due to the three-dimensional nature of the face, illumination across the face varies rapidly due to the shadows cast at certain angles. Illumination variation is exacerbated when the 3D face is acquired and translated into a 2D grayscale or color image. Depending on the strength of the illumination, some face features may be exaggerated or may completely vanish. Figure 1.2b shows a few examples of faces under illumination variation.

Expression: In an unconstrained setting, face images may be acquired without the knowledge of the subject. Therefore, face images may be captured mid-conversation and/or while viewing something surprising, upsetting, infuriating, etc. Such deviations from a neutral (or relaxed) face expression lead to a degradation in face recognition performance. As a result, the State Department guidelines for passport photo acquisition permit smiles but frown on "toothy" smiles (which apparently qualify as unusual or unnatural expressions) (https://cbsn.ws/3i7cdiD). According to the guidelines, "The subject's expression should be neutral (non-smiling) with both eyes open, and mouth closed. A smile with a closed jaw is allowed but is not preferred." This is because smiling distorts other facial features, for example, one's eyes. Figure 1.2c shows a few examples of faces with expression changes. Extreme expression variations remain an ongoing challenge for AFR systems.

Occlusion: It is quite common for unconstrained faces to contain facial accessories such as eyeglasses and sunglasses. Occluding eye regions can negatively impact face recognition performance since these facial regions are highly discriminative. Facial occlusions are problematic not only because of missing information, but also due to the extraneous and spurious information introduced. For instance, even if a person consistently wears eyeglasses, specular reflections that change based on the light source can lead to high intra-subject face variation. Figure 1.2d shows a few examples of occluded faces.

Aging: Given two images of the same individual acquired multiple years apart, a robust AFR system should still be able to identify the two images as pertaining to the same individual. However, we find in Figure 1.4 that even the top performing COTS AFR systems have difficulty identifying the individual as he or she ages [37-39]. Unlike other factors, face aging is intrinsic and cannot be controlled by the subject or the acquisition environment. That is, face aging can be present in both controlled (constrained) and uncontrolled (unconstrained) scenarios.

Resolution: The quality of face images severely affects face recognition performance. Figure 1.3 shows the increase in error rates when lower quality webcam face photos are matched to a mugshot gallery. In practice, unconstrained face images are of poor image quality (such as those captured from surveillance cameras). Face recognition performance on poor quality images is far from desirable and remains an ongoing challenge for the biometric community (see Figure 1.2e).

Inter-subject similarities can also lead to face recognition errors. For instance, it is challenging (even for humans) to distinguish between people with kinship relations (particularly twins, see Figure 1.5a), and also people that are not related but exhibit strong face similarities (Figures 1.5b, 1.5c).

Figure 1.3 Identification error rates of six state-of-the-art automated face recognition vendors when (a) mugshots, (b) webcam images, and (c) profile faces are compared against a 1.6 million mugshot dataset (a subset of the FRVT-2018 [22]).

Figure 1.4 Identification error rates of six state-of-the-art automated face recognition vendors on a 3 million mugshot dataset under aging [22]. Face recognition errors increase as the time gap between a probe image and the enrolled face image in the gallery increases.
Asmentionedearlier,duetotherecentCOVID-19pandemic,authoritiesandcitizensalike areenjoyingtheficontactlessflandcovertnatureoffacerecognition.However,thecovertnesscan alsoleadtoitsdownfallwiththeadventoffaceattackslaunchedinbothphysical(suchas3Dface masks)anddigitaldomains(adversarialanddigitallymanipulatedfaces).Withoutthepresenceofa 7 (a) (b) (c) Figure1.5Sourceofinter-subjectsimilarities:(a)kinshiprelations(here,twins),(b)different peoplewithnokinshiprelationswhohappentoexhibitverysimilarfacialcharacters(knownas fidoppelg ¨ angersfl),and(c)RichardJones(right)spent17yearsinprisonforacrimecommittedby hisdoppelg ¨ anger,RickyAmos(left)(Source:https://cnn.it/2Gb1F4A). humanoperatorensuringthelegitimacyoffaceimageacquisition,maliciousindividuals(attackers) areincreasinglychallengingthesecurityofAFRpipelinesforgovernmentaccesscontrol, andgains. ThevulnerabilitiesofAFRsystemstowardsfaceattacksandmethodstodefend againstthemwillbediscussedlaterinthischapter. 1.1Background AtypicalAFRsystemmayoperateinvariousmodesdependingonthedeployedapplication.In mostscenarios,faceimagesalongwiththeiridentitylabelsare enrolled asatemplateina dataset,referredtoasa gallery .Later,theAFRsystemtakesasinputanimage,knownasa probe or query ,andmatchesitagainstoneormanyfacespresentinthegallery. Face or authentication referstoaone-to-onematchingscenariowherethetaskof theAFRsystemistoverifywhetheraprobefaceimagebelongstotheindividualthathe/sheclaims tobe( e . g .,apassportandpassenger'sfacephoto,accesscontrol,andsmartphoneunlock).Onthe otherhand,face or search involvesone-to-manycomparisonsinordertoretrieve (fromthegallery)theidentityoftheprobefaceimage(whoseidentityispreviouslyunknown). Inthescenario,thenumberofidentitiestoretrieveisusuallymanuallyto top- k ,where k dependsontheapplicationathand.Thisisreferredtoasa closed-set scenario,whereweassumethattheidentityoftheprobeispresentinthegalleryandwehope thatthecorrectidentitywillbepresentinthetop- k retrievedindividuals.Authoritiescanthen 8 Figure1.6AtypicalAutomatedFaceRecognition(AFR)systemtypicallyconsistsof(i)face detection,(ii)facealignment(tomitigategeometricdistortions),(iii)featureextraction,and(iv) comparisonoffacerepresentations(featurevectors). manuallyreviewthetop- k candidateretrievalstoidentify,say,asuspectedcriminalinaforensics dataset.However,real-worldscenariosentailan open-set scenario,wherewecannot assumethattheprobe'sidentitywillindeedbepresentinthegallery.Foropen-setapplications,we retrievethetopcandidatematchesonlyiftheyexceedathreshold(insteadofmanually specifyingaed k ).Thisisextremelyuseful(forexample,inwatchlistsurveillance)sinceitmay beimpracticalforauthoritiestomanuallyreviewretrievedcandidatesforeveryprobe. 1.2AutomatedFaceRecognition(AFR) Regardlessoftheoperatingscenario(vortheprimarytaskofallAFR systemsistocomputeasimilaritymeasurementbetweenanytwofaceimages.ArobustAFRsys- temshouldideallyoutputahighsimilaritymeasurebetweenafaceimagepairofthesameindivid- ual( genuinepair )andalowsimilaritymeasurebetweenfaceimagepairsofdifferentindividuals ( impostorpairs ).Thisprocessinvolvesmultiplecomponentswhichallcontributeto thefacesimilaritymeasurementandconsequently,theresultingfacerecognitionaccuracy. 
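To make the verification and identification protocols of Sections 1.1 and 1.2 concrete, the sketch below shows, in simplified form, how both reduce to comparing feature vectors with a similarity measure and a decision threshold. The `extract_embedding` step is assumed to exist elsewhere (e.g., a deep face matcher); the threshold value and the gallery dictionary are illustrative assumptions, not values taken from this dissertation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face feature vectors (higher = more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(probe_emb: np.ndarray, claimed_emb: np.ndarray, threshold: float = 0.36) -> bool:
    """1:1 verification: accept the identity claim if similarity exceeds the threshold."""
    return cosine_similarity(probe_emb, claimed_emb) >= threshold

def identify(probe_emb: np.ndarray, gallery: dict, k: int = 5, threshold: float = None):
    """1:N identification over a gallery of {identity: embedding}.

    Closed-set: return the top-k candidates ranked by similarity.
    Open-set: additionally discard candidates below the threshold, so a probe
    that is absent from the gallery can legitimately return an empty list.
    """
    scores = [(name, cosine_similarity(probe_emb, emb)) for name, emb in gallery.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    if threshold is not None:          # open-set operation (e.g., watchlist surveillance)
        scores = [(name, s) for name, s in scores if s >= threshold]
    return scores[:k]
```

In the closed-set case the top-k list is returned as-is for manual review, whereas the open-set variant may return nothing when no gallery score clears the threshold, mirroring the watchlist scenario described above.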
1.2.1AFRPipeline AtypicalAFRsystemsiscomposedofthefollowingsequentialmodules(seeFigure1.6):(i) facedetection,(ii)facealignment,(iii)facefeatureextractionandfacerepresentation,and(iv) similaritymeasurement.Eachcomponentisnecessaryandcontributestotheoverallperformance 9 Table1.1Vcationperformance(%)undertwodifferentfacedetectorsonLFW,CFP-FP,and AgeDB-30[5].Figure1.7showsafewexamplesofeachofthethreedatasets. MethodsLFW[8]CFP-FP[24]AgeDB-30[25] MTCNN+ArcFace[9]99.8398.3798.15 RetinaFace+ArcFace[5] 99.8699.4998.60 (a)LFW[8] (b)CFP[24] (c)AgeDB[25] Figure1.7Examplefacesin(a)LFW[8],(b)CFP[24],andAgeDB[25]datasets. oftheAFRsystem.Thus,attentionisseenintheliteraturetowardsimprovingeach individualcomponent. 1.2.1.1FaceDetection Sinceimagesmayconsistofavarietyofdifferentobjects(includingface),priortofacerecognition, itisimperativethatfacesarespatiallylocatedintheinputimage.Facedetectionistheprocessof automatically(a)determiningwhetheraface(ormultiplefaces)ispresentinaninputimage,and then(b)outputtingthelocations(2Dcoordinates)ofalldetectedfaces.Thisisatrivialtaskfor humans,whereas,extractingfaceregionsfromarbitraryimagesischallengingformachines.This canlikelybeattributedtoahighintra-classvariabilityintheappearanceoffaces( e . g .,skincolor, backgroundnoise,facepose,illumination,etc.).Failuretodetectfaces(missingfacesintheimage, orerroneouslydetectingnon-faceregionsasfaces)isextremelyproblematic,sincesubsequent AFRcomponentswillnotaddanyvalueiftheyarenotpresentedwithanactualfaceregion. TheseminalworkofViolaandJones[40]iscreditedtobeingthereal-timeandaccurate facedetector.Sinceitspublicationin2004,alargenumberofstudiesinliteraturehavefocused onimprovingfacedetectionaccuracy.Currently,MTCNN[41]andRetinaFace[5]leadtheway infacedetectionaccuracy(seeFigure1.8)andarecommonlyemployedinfacerecognitionlitera- 10 Figure1.8Astate-of-the-artfacedetector,RetinaFace,candetectaround900faces(detection thresholdat 0 : 5 )outof1,151peoplereportedtobepresentinthefiWorld'sLargest[5].The yellowrectangledenotestheboundingboxaroundafaceandgreendotsrepresentthedetected landmarks. (a) (b) (c) (d) (e) Figure1.9Exampleimagesthatsupposedlycontainfacesbutcannotbedetectedbyastate-of-the- artfacedetector,RetinaFace[5].Notethattheseimagesareofverylowresolution. 11 ture[6,9,42].Tounderstandtheofanaccuratefacedetectionsystemonfacerecognition performance,inTable1.1wereportthefacevaccuracyonthreebenchmarkdatasets, LFW[8],CFP-FP[24],AgeDB-30[25].Twofacedetectorsareemployed:(i)MTCNN[41]and (ii)RetinaFace[5].Wecanobserveanimprovementinfacerecognitionperformancewhenabet- terfacedetector,RetinaFace,isemployed.However,evenwiththestate-of-the-artfacedetectors, errorscanstillbeobservedduetoextremeposevariations,occlusions,andlowresolution(see Figure1.9). 1.2.1.2FaceAlignment Thisstepisprimarilyconcernedwithreducingintra-classvariabilitybyeliminatinggeometric distortionspresentinthefaceregions.Inparticular,geometricandphotographicvariationsmay bemitigatedbytransformingthedetectedfaceregionstoacanonicalview(frontfaceview).Face alignmentaimsatdeterminingcorrespondencesbetweenfaceimagesbasedon points( e . g .,eyes,nose,mouth,chin,jaw,etc.).Themoststraightforwardalignmenttechniqueisa simple2Drigidaftransformationbasedontwoeyelocationstoaccountforfacesizeandin- planeheadrotation[6,43].Moresophisticatedalignmentmethodsinvolveemploying3Dmodeling techniquestofifrontalizefltheface,whichalsoaccountsforout-of-planeheadrotations.However, such3Dfacealignmenttechniquesaregenerallytime-consumingandincreasestheoverheadtime associatedwithanAFRsystem. 
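As a minimal illustration of the face detection step (Section 1.2.1.1), the sketch below uses OpenCV's Haar-cascade implementation of the Viola-Jones detector credited above; a modern pipeline would typically substitute MTCNN or RetinaFace, whose interfaces are not shown here. The parameter values are illustrative assumptions, not settings used in this dissertation.

```python
import cv2

# Viola-Jones detector shipped with OpenCV (Haar cascade trained on frontal faces).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(image_path: str):
    """Return a list of (x, y, w, h) bounding boxes for detected faces."""
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors trade off recall against false detections.
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1,
                                      minNeighbors=5, minSize=(40, 40))
    return list(boxes)
```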
Face alignment requires a set of reference points (i.e., landmarks) which are points defining a base face position (i.e., position of eyes, nose, and mouth for an average person). A common approach for acquiring reference points is to compute the average landmark points extracted from a large face dataset. After obtaining reference points, for transforming a probe face in the 2D space, we have four possible types of transformations (see Figure 1.10):

Figure 1.10: Illustration of various 2D face alignment techniques: (i) simply cropping the face region, (ii) similarity transform (scale and rotation), (iii) affine transformation (rotation, scaling, and shear mapping), and (iv) projective transformation (perspective deformation) [7].

Euclidean Transformation: a rigid transformation preserving distances between each pair of points. The Euclidean transformation allows rotating and translating the face (3 Degrees of Freedom).

Similarity Transformation: also allows for scaling (making faces bigger/smaller) and therefore does not preserve distances between points (4 Degrees of Freedom).

Affine Transformation: includes point, straight line, and plane preservation. In total, the affine transformation allows translation, rotation, scaling, and changing aspect ratio and shear mapping (6 Degrees of Freedom).

Projective Transformation: the most advanced form of 2D transformation. Every parameter in the transformation matrix is independent of each other (8 Degrees of Freedom).

From Figure 1.10, we find that 2D transformations with higher degrees of freedom have lower projection error rates (MSE distance to the reference points) but yield unnatural-looking faces. An interesting question arises: "What is more beneficial to AFR systems, low projection error or natural looking faces?" To answer this, while keeping the entire AFR pipeline consistent and only changing the face alignment method, the verification accuracy on the LFW dataset [8] is reported in Table 1.2. We find that more natural looking faces (similarity transform) are better than allowing for more degrees of freedom (affine and projective). Consequently, the most commonly employed face alignment method in the literature involves (a) detecting 5 landmark points (2 eyes, nose, and 2 mouth corners) and subsequently, (b) applying a similarity transformation [42, 44]; a minimal sketch of this recipe is given after Table 1.3.

Table 1.2: Verification performance (%) of the FaceNet [6] AFR system under different face alignment techniques [7]. The Euclidean transform is excluded as it assumes reference points are scale-invariant.

Methods               | LFW Protocol [8] | BLUFR Protocol @ FAR = 0.1%
Simple Crop + FaceNet | 97.9             | 85.59
Similarity + FaceNet  | 98.1             | 88.42
Affine + FaceNet      | 97.9             | 86.47
Projective + FaceNet  | 97.7             | 85.63

Table 1.3: Verification performance (%) on LFW [8] for different face feature extractors.

Methods             | Year | Face Representation | LFW [8]
Eigenfaces [45]     | 1991 | Holistic            | 60.02
Fisherfaces [46]    | 1997 | Holistic            | 87.47
HighDim-LBP [47]    | 2013 | Local               | 95.17
Joint Bayesian [48] | 2012 | Local               | 96.33
DeepFace [49]       | 2014 | Learned             | 97.35
DeepID [50]         | 2014 | Learned             | 97.45
VGGFace [51]        | 2015 | Learned             | 98.95
FaceNet [6]         | 2015 | Learned             | 99.63
SphereFace [44]     | 2017 | Learned             | 99.42
CosFace [42]        | 2018 | Learned             | 99.73
ArcFace [9]         | 2019 | Learned             | 99.83
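To make the preferred 5-landmark plus similarity-transform recipe concrete, the following scikit-image sketch estimates and applies such a transform. It is an illustrative sketch rather than the pipeline used in this dissertation; the reference landmark coordinates shown follow the widely used ArcFace-style 112 x 112 template and should be treated as assumed values (in general, they are the average landmark locations computed from a large face dataset, as described above).

```python
import numpy as np
from skimage import transform as trans

# Reference (destination) landmarks for a 112x112 aligned crop, ordered as: left eye,
# right eye, nose tip, left mouth corner, right mouth corner (illustrative values).
REF_5PTS = np.array([[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
                     [41.5493, 92.3655], [70.7299, 92.2041]], dtype=np.float32)

def align_face(image, detected_5pts, out_size=(112, 112)):
    """Estimate a 2D similarity transform (rotation + uniform scale + translation, 4 DoF)
    mapping the detected landmarks onto the reference template, then warp the face."""
    tform = trans.SimilarityTransform()
    tform.estimate(np.asarray(detected_5pts, dtype=np.float32), REF_5PTS)
    # skimage's warp expects the inverse mapping (output -> input coordinates).
    return trans.warp(image, tform.inverse, output_shape=out_size, preserve_range=True)
```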
1.2.1.3 Feature Extraction

After obtaining a cropped and aligned face image, the next module in an AFR system involves extracting a set of numerical values (known as a feature vector or a representation) that best describes the input face. The simplest set of features are the raw pixel values of the input face. However, such features may include spurious and irrelevant information (such as the colors present in non-face regions). Instead, matching results can be enhanced if we devise a method for extracting both high-level features (distances between facial components and their relative locations and ratios) and low-level features (wrinkles and face marks and scars). However, a feature vector must be carefully constructed such that we do not have unnecessarily large (and likely redundant) features which may negatively impact recognition rates. Table 1.3 provides a summary of a few studies primarily focused on improving face feature extraction (some of which will be discussed in Section 1.3.1).

1.2.1.4 Similarity Measurement

The final task of an AFR system is to compute a measure of similarity between face representations. The most obvious choice is the Euclidean distance between feature vectors; however, other distance metrics such as cosine, Manhattan, histogram intersection, log-likelihood statistics, chi-square statistics, etc., may also improve recognition performance. In practice, cosine similarity between feature vectors is the most widely adopted similarity metric in the literature [6, 9, 42, 44].

1.3 Evolution of Face Recognition

Humans have been drawn to the idea of identifying criminals based on facial characteristics since as long as the 19th century. Based on anthropometric measurements, Alphonse Bertillon devised a method of identifying and tracking criminals in 1879 [52]. The U.S. adopted the Bertillon system in 1887, which was later replaced by fingerprinting in the early 20th century. However, the concept of utilizing face photos of criminals (known as mugshots) for identification is still used worldwide.

The earliest pioneers of automating face recognition were Woodrow Bledsoe, Helen Wolf, and Charles Bisson (see https://bit.ly/30exJf5). In 1964 and 1965, Bledsoe, along with Wolf and Bisson, began work using computers to recognize the human face. Since their work was funded by an unnamed intelligence agency, much of their work was never published. However, we do know that their initial work involved the manual marking of various "landmarks" on the face such as eye centers, mouth corners, etc., which were then mathematically rotated by a computer to compensate for pose variation (face alignment). Afterwards, the distances between landmarks were automatically computed and compared between images to determine the identity (similarity measurement). In other words, following the AFR pipeline outlined earlier, Bledsoe and his peers utilized face landmarks and their distances and ratios as the face representation. This work is deemed to be crucial since it was the first to show face as a valuable biometric trait for person identification.

Since Bledsoe's efforts on devising an automated face recognition system, researchers have devoted the past 50+ years to improving each stage of the AFR pipeline.

1.3.1 Face Representations

The evolution of AFR systems can be roughly categorized into three feature representation approaches: (i) holistic, (ii) local, and (iii) learned representations (see Table 1.3).

1.3.1.1 Holistic Face Representations

The first fully automated face recognition systems utilized holistic face representations, where all pixels in the input face image are used to derive the face representation. These studies rely heavily on accurate face alignment (typically using eye locations), which is challenging when non-frontal faces are encountered. Eigenfaces [45] was among the first AFR systems to utilize holistic face features.
A low-dimensional "face-space" is computed for a training dataset comprised of N unlabeled face images using Principal Component Analysis (PCA). The "face-space" is a set of M eigenvectors (eigenfaces) of the training data.

In face verification, a decision is made at a threshold such that (i) similarity(x, y) > τ when x and y are two faces belonging to the same person (genuine pair), and (ii) similarity(x, y) ≤ τ when x and y are two faces belonging to two different people (impostor pair). Here, the threshold τ is determined via cross-validation. Verification accuracy is commonly utilized in the LFW protocol [8], where the numbers of genuine and impostor pairs are balanced.

In real-world scenarios, AFR systems operate at a pre-determined match threshold (τ). Typically, we report the True Accept Rate at a pre-determined False Accept Rate, say, 0.1%. The τ is determined via a Receiver Operating Characteristic (ROC) curve. Formally,

TAR(τ) = (Number of genuine pairs with similarity score > τ) / (Total number of genuine pairs)

FAR(τ) = (Number of impostor pairs with similarity score > τ) / (Total number of impostor pairs)

In the case of face identification, the AFR system is presented with a gallery of face images (known identities). Then, given a probe image, the AFR system needs to determine the identity of the probe from the gallery. In closed-set identification, we assume the identity of the probe is present in the gallery. Therefore, we simply need to determine the rank at which the true mate is retrieved (say at K) out of a given gallery size. Then the closed-set accuracy can be computed via

RetrievalRate(K) = (Number of probes successfully retrieved within K retrievals) / (Total number of probes)

RetrievalRate(K) is commonly referred to as Rank-K accuracy. For example, Rank-1 accuracy refers to the accuracy with which an AFR system successfully retrieves the correct identity as the first entry in a list of potential matches ranked in descending order via similarity scores with the probe image.

In the open-set identification scenario, the system first needs to determine whether the probe is in the gallery prior to retrieval. In this case, the commonly employed evaluation metric is True Positive Identification Rate (TPIR) at False Positive Identification Rate (FPIR):

TPIR(τ) = (Number of successfully retrieved mated probes at Rank-1) / (Total number of mated probes)

FPIR(τ) = (Number of falsely retrieved non-mates at Rank-1) / (Total number of non-mated probes)

Here, a successfully retrieved mated probe means that the true mate is returned as the top-1 result and the similarity score between the probe and true mate is greater than τ.

1.4.2 Face Datasets

All efforts in designing a state-of-the-art face recognition system would have been in vain without advancements in acquiring more and more challenging face datasets for benchmarking AFR systems. Indeed, due to privacy concerns, many studies evaluate their proposed methods on face datasets acquired in-house. However, face recognition performance today would have been far from desirable had it not been for the public release of large-scale face datasets.

FERET [26] and FRGC [26] are among the first datasets utilized for benchmarking face recognition performance (see Figure 1.11). These datasets greatly contributed to advancements in AFR systems. However, these datasets were acquired under controlled conditions and were mainly geared towards studying specific challenges associated with face recognition (e.g., pose, illumination, and expression). These datasets were extremely valuable for evaluating AFR performance under images exhibiting such challenges, but were not very representative of face images encountered in the real world.

Huang et al. released the Labeled Faces in the Wild (LFW) dataset, which was acquired by scraping the Internet for celebrity face photos. The LFW dataset includes 13,233 face images of 5,749 different people. Face images were automatically detected via the Viola-Jones face detector [40]. Huang et al. also proposed the LFW protocol for benchmarking AFR accuracy: 10-fold cross-validation on face verification, where each fold consists of 300 genuine pairs and 300 impostor pairs.
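The verification and identification metrics just defined can be computed directly from similarity scores. The sketch below is an illustrative NumPy implementation of the definitions above (not code from the dissertation); the threshold-selection rule via a quantile of impostor scores is an approximation that matches the definitions for large score sets.

```python
import numpy as np

def tar_at_far(genuine_scores, impostor_scores, target_far=0.001):
    """TAR @ FAR: choose tau as the (1 - target_far) quantile of the impostor scores,
    then apply the TAR(tau) and FAR(tau) definitions given above."""
    genuine_scores = np.asarray(genuine_scores)
    impostor_scores = np.asarray(impostor_scores)
    tau = np.quantile(impostor_scores, 1.0 - target_far)
    tar = float(np.mean(genuine_scores > tau))
    far = float(np.mean(impostor_scores > tau))
    return tar, far, float(tau)

def rank_k_accuracy(scores, probe_ids, gallery_ids, k=1):
    """Closed-set RetrievalRate(K): fraction of probes whose true mate appears among the
    top-K gallery entries. `scores` has shape (num_probes, gallery_size)."""
    gallery_ids = np.asarray(gallery_ids)
    order = np.argsort(-np.asarray(scores), axis=1)[:, :k]
    hits = [probe_ids[i] in set(gallery_ids[order[i]]) for i in range(len(probe_ids))]
    return float(np.mean(hits))
```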
1.4.3 Constrained Face Recognition

The National Institute of Standards and Technology (NIST) began benchmarking the accuracy of face recognition systems in 1993 with the FERET program [26]. Since then, commercial face recognition systems have been evaluated in multiple Face Recognition Vendor Tests (FRVTs). Table 1.4 documents state-of-the-art constrained face recognition performance from 1993 till date. With the exception of webcam and profile face images (see Figure 1.3), most of the evaluations in the Ongoing FRVT are performed on a constrained face dataset acquired by NIST. These benchmarks are invaluable for comparing state-of-the-art AFR algorithms. NIST has access to large operational databases and conducts extensive testing of multiple algorithms on protocols that mimic operational scenarios (http://www.nist.gov/itl/iad/ig/face.cfm).

Table 1.4: Benchmarking AFR performance throughout the years in NIST evaluations on frontal and constrained faces.

Study      | Year    | Gallery Size | Rank-1 Accuracy (%) | TAR (%) @ 0.1% FAR
FERET [26] | 1993-94 | 316          | 78                  | 21
FERET [26] | 1996-97 | 831          | 95                  | 46
FRVT [54]  | 2002    | 37,437       | 73                  | 80
FRGC [26]  | 2005    | 16,028       | N/A                 | 99
FRVT [54]  | 2006    | N/A          | N/A                 | 99
MBE [55]   | 2010    | 1.6M         | 92                  | 99
FRVT [56]  | 2014    | 1.6M         | 96                  | N/A
FRVT [22]  | Ongoing | 12M          | 99.98               | 99.99

1.4.4 Unconstrained Face Recognition

Prevailing state-of-the-art methods for unconstrained face recognition have been benchmarked by the LFW [8] database protocol since its release in 2007. Recent deep learning based AFR systems achieve accuracies above 99% via the LFW protocol (e.g., FaceNet [6], CosFace [42], SphereFace [44], ArcFace [9]). As mentioned earlier, a major reason these data-driven methods are so successful is the availability of large-scale training datasets. Here, we enumerate a few of the training datasets that are commonly employed for training state-of-the-art face recognition systems:

CASIA-WebFace [10] consists of 0.5M face images of 10,575 celebrities acquired "in-the-wild" by scraping the Internet.

MS-Celeb-1M [28] contains 8M face images of 85K subjects. Similar to CASIA-WebFace, MS-Celeb also contains celebrity face photographs collected from the Internet.

VGGFace2 [57] consists of 3.3M celebrity face photos from 9K identities with 362 images per subject on average.

Example face images from some of these datasets are shown in Figure 1.11.

Figure 1.11: Example face images from (a) FERET [26], (b) FRGC [26], (c) LFW [8], (d) IJB-A [27], (e) MS-Celeb-1M [28], and (f) TinyFace [29]. Datasets (a) and (b) contain face images under relatively controlled acquisition conditions. Datasets (c-f) contain more unconstrained face images (e.g., collected from the Internet).

Figure 1.12: Face attacks against AFR systems are continuously evolving in both digital and physical spaces. Given the diversity of the face attacks, prevailing methods fall short in detecting attacks across all three categories (i.e., adversarial, digital manipulation, and spoofs).

1.5 Vulnerabilities of AFR Systems

As mentioned earlier, the accuracy, usability, and touchless acquisition of state-of-the-art AFR systems have led to their ubiquitous adoption in a plethora of domains, including mobile phone unlock, access control systems, and payment services. Despite this impressive recognition performance, current AFR systems remain vulnerable to the growing threat of face attacks in both physical and digital domains.
Forinstance,anattackercanhidehisidentitybywearinga3Dmask[58],orintruderscan assumeavictim'sidentitybydigitallyswappingtheirfacewiththevictim'sfaceimage[20].With unrestrictedaccesstotherapidproliferationoffaceimagesonsocialmediaplatforms,launching attacksagainstAFRsystemshasbecomeevenmoreaccessible.Giventhegrowingdissemination 23 offifakenewsflandfideepfakesfl[59],theresearchcommunityandsocialmediaplatformsalike arepushingtowards generalizable defenseagainstcontinuouslyevolvingandsophisticatedface attacks. Inliterature,faceattackscanbebroadlyintothreeattackcategories:(i)Spoofat- tacks:artifactsinthe physical domain( e . g .,3Dmasks,eyeglasses,replayingvideos)[1],(ii) Adversarialattacks:imperceptiblenoisesaddedtoprobesforevadingAFRsystems[60],and(iii) Digitalmanipulationattacks:entirelyorpartiallyphoto-realisticfacesusinggenerative models[20].Withineachofthesecategories,therearedifferentattacktypes.Forexample,each spoofmedium, e . g .,3Dmaskandmakeup,constitutesoneattacktype,andthereare 13 common typesofspoofattacks[1].Likewise,inadversarialanddigitalmanipulationattacks,eachattack model,designedbyuniqueobjectivesandlosses,maybeconsideredasoneattacktype.Thus, theattackcategoriesandtypesforma 2 -layertreestructureencompassingthediverseattacks(see Figure1.12).Suchatreewillinevitablygrowinthefuture. InordertosafeguardAFRsystemsagainsttheseattacks,numerousfaceattackdetectionap- proacheshavebeenproposed[20,21,61Œ63].Despiteimpressivedetectionrates,prevailingre- searcheffortsfocusonafewattacktypeswithin one ofthethreeattackcategories.Sincetheexact typeoffaceattackmaynotbeknown apriori ,ageneralizabledetectorthatcandefendanAFR systemagainstanyofthethreeattackcategoriesisofutmostimportance. 1.5.1PhysicalFaceSpoofs Facepresentationattacks 12 arefiphysicalfakefacesflwhichcanbeconstructedwithavarietyof differentinstruments(presentationattackinstruments), e . g .,3Dprintedmasks,printedpaper,or digitaldevices(videoreplayattacksfromamobilephone)withagoalofenablinganattackerto impersonateavictim'sidentity,oralternatively,obfuscatetheirownidentity(seeFigure1.13). Withtherapidproliferationoffaceimages/videosontheInternet(especiallyonsocialmediaweb- 12 ISOstandardIEC30107-1:2016(E)presentationattacksas fipresentationtothebiometricdatacapture subsystemwiththegoalofinterferingwiththeoperationofthebiometricsystemfl [64] 24 sites,suchasFacebook,Twitter,orLinkedIn),replayingvideoscontainingthevictim'sfaceor presentingaprintedphotographofthevictimtotheAFRsystemisatrivialtask[65].Evenif afacepresentationattackdetectionsystemcouldtriviallydetectprintedphotographsandreplay videoattacks( e . g .,withdepthsensors),attackerscanstillattempttolaunchmoresophisticated attackssuchas3Dmasks[66],make-up,orevenvirtualreality[67]. (a) (b)Print (c)Replay (d)Half (e)Silicone (f)Trans. (g)Paper (h)Mann. (i)Obf. (j)Imp. (k)Cosm. (l)Funny (m)PaperG (n)PaperC Figure1.13Examplepresentationattacks:Simpleattacksinclude(b)printedphotograph,or(c) replayingthevictim'svideo.Moreadvancedpresentationattackscanalsobeleveragedsuchas (d-h)3Dmasks,(i-k)make-upattacks,or(l-n)partialattacks[30].Afaceisshownin(a) forcomparison.Here,thepresentationattacksin(b-c,k-n)belongtothesamepersonin(a). 
Theneedforpreventingfaceattacksisbecomingincreasinglyurgentduetotheuser'spri- vacyconcernsassociatedwithspoofedsystems.Failuretodetectfaceattackscanbeamajor securitythreatduetothewidespreadadoptionofautomatedfacerecognitionsystemsforborder control[68].Indeed,withtheadventofApple'siPhoneXandSamsung'sGalaxyS8,allofusare carryingautomatedfacerecognitionsystemsinourpocketsembeddedinoursmartphones.Face recognitiononourphonesfacilitates(i)unlockingthedevice,(ii)conductingtransactions, and(iii)accesstoprivilegedcontentstoredonthedevice. 1.5.2DigitalAdversarialFaces FromTable1.3wesawthatprevailingAFRsystemsarebasedonautomaticfeatureextraction methodspoweredbyCNNs.However,CNNmodelshavebeenshowntobevulnerableto adver- 25 sarialperturbations 13 [69Œ72].Szegedy etal. showedthedangersof adversarialexamples in theimagedomain,whereperturbingthepixelsintheinputimagecancauseCNNs tomisclassifytheimageevenwhentheamountofperturbationisimperceptibletothehuman eye[69].Despiteimpressiverecognitionperformance,prevailingAFRsystemsarestillvulnerable tothegrowingthreatofadversarialexamples(seeFigure1.14). (a)EnrolledFace 0.72 0.78 (b)InputProbe 0.22 0.12 (c)AdvFaces 0.26 0.25 (d)PGD[33] 0.14 0.25 (e)FGSM[70] Figure1.14Examplegalleryandprobefaceimagesandcorrespondingsynthesizedadversarial examples.(a)Twocelebrities'realfacephotoenrolledinthegalleryand(b)thesamesubject's probeimage;(c)Adversarialexamplesgeneratedfrom(b)byourproposedsynthesismethod, AdvFaces;(d-e)Resultsfromtwoadversarialexamplegenerationmethods.Cosinesimilarity scores( 2 [ 1 ; 1] )obtainedbycomparing(b-e)totheenrolledimageinthegalleryviaArcFace[9] areshownbelowtheimages.Ascoreabove 0.28 (threshold@ 0 : 1% FalseAcceptRate)indicates thattwofaceimagesbelongtothesamesubject.Here,asuccessfulobfuscationattackwouldmean thathumanscanidentifytheadversarialprobesandenrolledfacesasbelongingtothesameidentity butanautomatedfacerecognitionsystemconsidersthemtobefromdifferentsubjects. ToattackanAFRsystem,ahackercanmaliciouslyperturbhisfaceimageinamannerthat cancauseAFRsystemstomatchittoatargetvictim( impersonationattack )oranyidentityother thanthehacker( obfuscationattack ).Yettothehumanobserver,thisadversarialfaceimageshould appearasalegitimatefacephotooftheattacker(seeFigure3.2d).Thisisdifferentfromface pre- 13 Adversarialperturbationsrefertoalteringaninputimageinstancewithsmall,humanimperceptiblechangesina mannerthatcanevadeCNNmodels. 26 Figure1.15Examplesofdigitallymanipulatedfaces.(a)Realimages/framesfromFFHQ,CelebA andFaceForensics++datasets;(b)PairedfaceidentityswapimagesfromFaceForensics++dataset; (c)PairedfaceexpressionswapimagesfromFaceForensics++dataset;(d)Attributesmanipulated examplesbyFaceAPPandStarGAN;(e)EntiresynthesizedfacesbyPGGANandStyleGAN. Collagesourcedfrom[20]. sentationattacks ,wherethehackerassumestheidentityofatargetbypresentingaphysicalfake facetotheAFRsystem(seeFigure3.2).However,inthecaseofpresentationattacks,thehacker needstoactivelyparticipatebywearingamaskorreplayingaphotograph/videoofthegenuine individualwhichmaybeconspicuousinscenarioswherehumanoperatorsareinvolved(suchas airports).Ontheotherhand,adversarialfaces,donotrequireactiveparticipationofthesubject duringauthentication(comparisonbetweenadversarialprobeandgalleryimages). 
Consider,forexample,theUnitedStatesCustomsandBorderProtection(CBP),thelargest federallawenforcementagencyintheUnitedStates[73],which(i)processesentrytothecountry 27 forovera million travellers everyday [74]and(ii)employsautomatedfacerecognitionforverifying travelers'identities[75].ToevadebeingasanindividualinaCBPwatchlist,aterrorist canmaliciouslyenrollanadversarialimageinthegallerysuchthatuponenteringtheborder,his legitimatefaceimagewillbematchedtoaknownandbenignindividualortoafakeidentity previouslyenrolledinthegallery. 1.5.3DigitalFaceManipulation Digitalmanipulationattacks,madefeasiblebyVariationalAutoEncoders(VAEs)andGenerative AdversarialNetworks(GANs),cangenerateentirelyorpartiallyphotorealisticfaceim- ages[20](seeFigure1.15).Digitalmanipulationattacktypescanbebroadlyintothe following: IdentitySwapping: Thesemethodsdigitallyreplacethefaceofonepersonwiththefaceof anotherperson.Forinstance, FaceSwap [76]insertsfamousactorsintomovieclipsinwhichthey neverappeared. DeepFakes alsoperformsfaceswappingviadeeplearningalgorithms. ExpressionSwapping: Expressionsinafaceimagecanbedigitallyandreplaced withanother[77].Thesemethodsswapexpressionsinreal-timewithonlyRGBcameras. AttributeManipulation Utilizingstate-of-the-artGANs,studiessuchasStarGAN[78]andST- GAN[79]focusonattributemanipulationbyalteringsingleormultipleattributesinafaceimage, e . g .,gender,age,skincolor,hair,andglasses. EntireFaceSynthesis: Poweredbylarge-scalehigh-resolutionfacedatasetsandtheprevelance ofGANs,anattackercaneasilysynthesizeentirefaceimagesofunknownidentities,whoserealism issuchthatevenhumanshavedifassessingifitisgenuineormanipulated[80]. 28 1.6DissertationContributions Automatedfacerecognitionhasbeenstudiedextensivelyformorethanfourdecades. improvementshavebeenmadeprogressivelyineachcomponent(facedetection,alignment,feature extraction,matching)inordertobuildahighlydiscriminativeandrobustAFRsystem,atleastfor constrainedandsemi-constrainedfacerecognition.However,asthetechnologybecomesmore widelyadopted,safeguardingAFRsystemsagainstcontinuouslyevolvingfaceattacksinboth physicalanddigitaldomainsisoftheutmostimportance.Sincetheexacttypeoffaceattackmay notbeknown apriori ,theneedforageneralizabledetectorthatcandefendanAFRsystemagainst anyofthethreeattackcategories(spoofs,adversarial,anddigitalmanipulation)isevident. Thisdissertationfocusesondesigningstate-of-the-artdefensemethodstosafeguardAFR systemsagainstindividualattackcategories.Lastly,weproposeanewframeworktodefendAFR systemsagainstbothphysicalanddigitalattacks. Themaincontributionsofthisdissertationareasfollows: 1. Ageneralizable,interpretable,andaccuratefacemethodfordetectingphysical fakefaces(suchas3Dmasks)inordertomitigatephysicalspoofattackonstate-of-the-art AFRsystems. 2. Anautomaticadversarialfacesynthesismethodforgeneratingdigitalfakefacesthatcanim- personateavictimorobfuscateone'sidentitybyevadingstate-of-the-artAFRsystems.With apowerfuladversarialfacesynthesizer,wecanfurtherinvestigaterobustnessofprevailing facerecognitionsystemstodigitaladversarialfaces. 3. Anewself-supervisedframework,namely FaceGuard ,fordefendingagainstadversarialface images.FaceGuardcombinesofadversarialtraining,detection,andinto adefensemechanismtrainedinanend-to-endmanner. 4. Anovelfaceattackdetectionframework,namely UniFAD ,thatautomaticallyclusters similarattacksandemploysamulti-tasklearningframeworktojointlydetectdigitaland 29 physicalattacks.ProposedUniFADallowsforfurtheroftheattackcategories, i . e .,whetherattacksareadversarial,digitallymanipulated,orcontainspoofartifacts. 
Chapter 2

Defending Against Face Spoofs

Face presentation attacks (physically crafted spoofs) challenge the robustness of face recognition systems. To safeguard AFR systems against spoofs, numerous presentation attack detection (PAD) methods have been proposed. In this chapter, we address two major issues with prevailing PAD approaches, namely, face PAD generalization and interpretability. Our main focus is to improve presentation attack detection performance across a wide variety of unknown attacks, while also maintaining high detection accuracy on spoofs encountered during the training of our proposed PAD solution.

2.1 Introduction

Despite impressive face recognition performance, current AFR systems remain vulnerable to the growing threat of presentation attacks. (ISO standard IEC 30107-1:2016(E) defines presentation attacks as "presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system" [64]. Note that these presentation attacks are different from digital manipulation of face images, such as DeepFakes [81] and adversarial faces [16].) Face spoofs are "fake faces" which can be constructed with a variety of different instruments (presentation attack instruments), e.g., 3D printed masks, printed paper, or digital devices (video replay attacks from a mobile phone), with the goal of enabling an attacker to impersonate a victim's identity, or alternatively, obfuscate their own identity (see Figure 2.1). With the rapid proliferation of face images/videos on the Internet (especially on social media websites, such as Facebook, Twitter, or LinkedIn), replaying videos containing the victim's face or presenting a printed photograph of the victim to the AFR system is a trivial task [65]. Even if a face presentation attack detection system could trivially detect printed photographs and replay video attacks (e.g., with depth sensors), attackers can still attempt to launch more sophisticated attacks such as 3D masks [66], make-up, or even virtual reality [67].

Figure 2.1: Example presentation attack instruments: Simple attacks include (b) printed photograph, or (c) replaying the victim's video. More advanced presentation attacks can also be leveraged such as (d-h) 3D masks, (i-k) make-up attacks, or (l-n) partial attacks [30]. A bona fide face is shown in (a) for comparison. Here, the presentation attacks in (b-c, k-n) belong to the same person in (a).

The need for preventing face attacks is becoming increasingly urgent due to the user's privacy concerns associated with spoofed systems. Failure to detect face attacks can be a major security threat due to the widespread adoption of automated face recognition systems for border control [68]. In 2011, a young individual from Hong Kong boarded a flight to Canada disguised as an old man with a hat by wearing a silicone face mask and successfully fooled the border control authorities [82].

Also consider that, with the advent of Apple's iPhone X and Samsung's Galaxy S8, all of us are carrying automated face recognition systems in our pockets, embedded in our smartphones. Face recognition on our phones facilitates (i) unlocking the device, (ii) conducting financial transactions, and (iii) access to privileged content stored on the device.

Figure 2.2: An overview of the proposed Self-Supervised Regional Fully Convolutional Network (SSR-FCN). We train in two stages: (1) Stage 1 learns global discriminative cues via training on the entire face image. The score map obtained from Stage 1 is hard-gated to obtain presentation attack regions in the face image. We randomly crop arbitrary-size patches from the presentation attack regions and fine-tune our network in Stage 2 to learn local discriminative cues. During testing, we input the entire face image to obtain the final score. The score map can also be used to visualize the presentation attack regions in the input image.
Failure to detect face presentation attacks on smartphones could compromise confidential information such as emails, banking records, social media content, and personal photos [83].

With numerous approaches proposed to detect face attacks, current face presentation attack detection methods have the following shortcomings:

Table 2.1: A summary of publicly available face presentation attack detection datasets.

Dataset            | Year | # Subj. | # Vids. | # Presentation Attack Instruments (Replay / Print / 3D Mask / Makeup / Partial)
Replay-Attack [4]  | 2012 | 50      | 1,200   | 2 / 1 / 0 / 0 / 0
CASIA-FASD [3]     | 2012 | 50      | 600     | 1 / 1 / 0 / 0 / 0
3DMAD [84]         | 2013 | 17      | 255     | 0 / 0 / 1 / 0 / 0
MSU-MFSD [85]      | 2015 | 35      | 440     | 2 / 1 / 0 / 0 / 0
Replay-Mobile [86] | 2016 | 40      | 1,030   | 1 / 1 / 0 / 0 / 0
MARs [87]          | 2016 | 35      | 1,009   | 0 / 0 / 2 / 0 / 0
Oulu-NPU [2]       | 2017 | 55      | 4,950   | 2 / 2 / 0 / 0 / 0
SiW [30]           | 2018 | 165     | 4,620   | 4 / 2 / 0 / 0 / 0
SiW-M [1]          | 2019 | 493     | 1,630   | 1 / 1 / 5 / 3 / 3

Generalizability: Since the exact presentation attack instrument may not be known beforehand, how to generalize well to "unknown" attacks is of utmost importance. (Unseen attacks are presentation attack instruments that are known to the developers, whereby algorithms can be tailored to detect them, but their data is never used for training. Unknown attacks are presentation attack instruments that are not known to the developers and are not seen during training.) A majority of the prevailing state-of-the-art face presentation attack detection techniques focus only on detecting 2D printed paper and video replay attacks, and are vulnerable to presentation attacks crafted from materials not seen during training of the detector. In fact, studies show a two-fold increase in error when presentation attack detection approaches encounter unknown presentation attack instruments [30]. In addition, current face presentation attack detection approaches rely on densely connected neural networks with a large number of learnable parameters (exceeding 2.7M), where the lack of generalization across unknown presentation attack instruments is even more pronounced.

Lack of Interpretability: Given a face image, face presentation attack detection approaches typically output a holistic face "attack score" which depicts the likelihood that the input image is bona fide or a presentation attack. Without an ability to visualize which regions of the face contribute to the overall decision made by the network, the global attack score alone may not be sufficient for a human operator to interpret the network's decision.

In an effort to impart generalizability and interpretability to face presentation attack detection systems, we propose a face presentation attack detection framework specifically designed to detect unknown presentation attack instruments, namely, the Self-Supervised Regional Fully Convolutional Network (SSR-FCN). A Fully Convolutional Network (FCN) is first trained to learn global discriminative cues and automatically identify presentation attack regions in face images. The network is then fine-tuned to learn local representations via regional supervision. Once trained, the deployed model can automatically locate regions where an attack occurs in the input image and provide a final attack score.

The contributions of this chapter are as follows:

We show that features learned from local face regions have better generalization ability than those learned from the entire face image alone.

We provide extensive experiments to show that the proposed approach, SSR-FCN, outperforms other local region extraction strategies and state-of-the-art face presentation attack detection methods on one of the largest publicly available datasets, namely SiW-M, comprised of 13 different presentation attack instruments. The proposed method reduces the Equal Error Rate (EER) by (i) 14% relative to state-of-the-art [88] under the unknown attack setting, and (ii) 40% on known presentation attack instruments. In addition, SSR-FCN achieves competitive performance on standard benchmarks on the Oulu-NPU [2] dataset and outperforms prevailing methods on cross-dataset generalization (CASIA-FASD [3] and Replay-Attack [4]).
The proposed SSR-FCN is also shown to be more interpretable since it can directly predict the parts of the face that are considered as presentation attacks.

2.2 Background

In order to mitigate the threats associated with presentation attacks, numerous face presentation attack detection techniques, based on both software and hardware solutions, have been proposed.

Figure 2.3: Illustration of drawbacks of prior approaches. Top: example of a bona fide face; Bottom: example of a paper glasses presentation attack. In this case, the presentation attack artifact is only present in the eye-region of the face. (a) Classifiers trained with global supervision overfit to the bona fide class since both images are mostly bona fide (the presentation attack instrument covers only a part of the face). (b) Pixel-level supervision assumes the entire image is either bona fide or presentation attack and constructs label maps accordingly. This is not a valid assumption in mask, makeup, and partial presentation attack instruments. Instead, (c) the proposed framework trains on extracted regions from face images. These regions can be based on domain knowledge, such as eye, nose, mouth regions, or randomly cropped. The proposed SSR-FCN utilizes self-supervised region-selection.

Early software-based solutions utilized liveness cues, such as eye blinking, lip movement, and head motion, to detect print attacks [89-92]. However, these approaches fail when they encounter unknown attacks such as printed attacks with cut eye regions (see Figure 2.1n). In addition, these methods require active cooperation of the user in providing specific types of images, making them tedious to use.

Since then, researchers have moved on to passive face presentation attack detection approaches that rely on texture analysis for distinguishing bona fide and presentation attacks, rather than motion or liveness cues. The majority of face presentation attack detection methods only focus on detecting print and replay attacks, which can be detected using features such as color and texture [93-98]. Many prior studies employ handcrafted features such as the 2D Fourier Spectrum [85, 99], Local Binary Patterns (LBP) [94, 100-102], Histogram of Oriented Gradients (HOG) [93, 103], Difference-of-Gaussians (DoG) [104], Scale-Invariant Feature Transform (SIFT) [96], and Speeded-Up Robust Features (SURF) [105]. Some techniques utilize presentation attack detection beyond the RGB color spectrum, such as incorporating the luminance and chrominance channels [102]. Instead of a predetermined color spectrum, Li et al. [95] automatically learn a new color scheme that can best distinguish bona fide and presentation attacks. Another line of work extracts image quality features to detect presentation attacks [85, 97, 98]. Due to the assumption that presentation attack instruments are one of replay or print attacks, these methods severely suffer from generalization to unknown presentation attack instruments.

Hardware-based solutions in the literature have incorporated 3D depth information [106-108], multi-spectral and infrared sensors [109], and even physiological sensors such as vein flow information [110]. Presentation attack detection can be further enhanced by incorporating background audio signals [111]. However, with the inclusion of additional sensors along with a standard camera, the deployment costs can be exorbitant (e.g., thermal sensors for iPhones cost over USD 400; see https://amzn.to/2zJ6YW4).

State-of-the-art face presentation attack detection systems utilize Convolutional Neural Networks (CNNs) so that the feature set (representation) is learned that best differentiates bona fides from presentation attacks. Yang et al. were among the first to propose CNNs for face presentation attack detection, and they showed about a 70% decrease in Half Total Error Rate (HTER) compared to baselines comprised of handcrafted features [112]. Further improvement in performance was achieved by directly modifying the network architecture [113-116]. Deep learning approaches also perform well for mask attack detection [66].
Incorporating auxiliary information (e.g., eye blinking) in deep networks can further improve the face presentation attack detection performance [30, 91].

Limited studies on generalizable face presentation attack detection focus on one-class classification approaches. These methods only model the distribution of bona fide face features using one-class classifiers such as one-class SVM [117] or one-class GMM [118], or employ a distance metric loss [119]. However, these approaches have several drawbacks: (i) by only modeling the bona fide face feature distributions, the methods tend to overfit to the bona fide class, and (ii) both one-class SVM and one-class GMM have been shown to perform poorly on public benchmark datasets (CASIA-FASD [3], Replay-Mobile [86], and MSU-MFSD [85]). A tree-based approach utilizing deep networks was proposed for generalizable face presentation attack detection [1]. In order to prevent face presentation attack detection methods from overfitting to the specific subject, environment, and presentation attack instrument, transfer learning has also been studied [120-122]. However, these methods share similar network architectures that are densely connected with thirteen convolutional layers exceeding 2.7M learnable parameters [30, 88, 120-124]. Due to this, a majority of the aforementioned presentation attack detection methods also suffer from poor generalization performance.

Table 2.1 outlines the publicly available face presentation attack detection datasets.

2.3 Motivation

The proposed approach is motivated by the following observations:

2.3.1 Face Presentation Attack Detection is a Local Task

It is now generally accepted that for print and replay attacks, "face presentation attack detection is usually a local task in which discriminative clues are ubiquitous and repetitive" [125]. However, in the case of masks, makeups, and partial attacks, the ubiquity and repetitiveness of presentation attack cues may not hold true. For instance, in Figure 2.3(a-c), the presentation attack artifact (the paper glasses) is only present in the eye region of the face. Unlike face recognition, face presentation attack detection does not require the entire face image in order to predict whether the image is a presentation attack or bona fide. In fact, our experimental results and their analysis will confirm that training on the entire face image alone can adversely affect the convergence and generalization of networks.

2.3.2 Global vs. Local Supervision

Prior work can be partitioned into two groups: (i) global supervision, where the input to the network is the entire face image and the CNN outputs a score indicating whether the image is bona fide or presentation attack [30, 88, 112-116, 123, 126], and (ii) pixel-level supervision, where multiple classification losses are aggregated over each pixel in the final feature map [124, 127]. These studies assume that all pixels in the face image are either bona fide or presentation attack (see Figure 2.3(b)). This assumption holds true for presentation attack instruments such as replay and print attacks (which are the only presentation attack instruments considered by these studies), but not for mask, makeup, and partial attacks. Therefore, pixel-level supervision can not only suffer from poor generalization across a diverse range of presentation attack instruments, but the convergence of the network is also severely affected due to noisy labels.

Table 2.2: Architecture details of the proposed FCN backbone. Conv and GAP refer to convolutional and global average pooling operations.

Layer | # of Activations  | # of Parameters
Input | H x W x 3         | 0
Conv  | H/2 x W/2 x 64    | 3 x 3 x 3 x 64 + 64
Conv  | H/4 x W/4 x 128   | 3 x 3 x 64 x 128 + 128
Conv  | H/8 x W/8 x 256   | 3 x 3 x 128 x 256 + 256
Conv  | H/16 x W/16 x 512 | 3 x 3 x 256 x 512 + 512
Conv  | H/16 x W/16 x 1   | 3 x 3 x 512 x 1 + 1
GAP   | 1                 | 0
Total |                   | 1.5M
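As a concrete reading of Table 2.2, the following PyTorch sketch builds a backbone with the same layer progression. It is our illustrative interpretation, not the dissertation's implementation (which, per Section 2.5.3, was written in TensorFlow); the stride and padding choices are assumptions, and while Table 2.2 lists 3 x 3 parameters for the last convolution, we use the 1 x 1 convolution described in Section 2.4.

```python
import torch
import torch.nn as nn

class FCNBackbone(nn.Module):
    """Illustrative reading of Table 2.2: four stride-2 conv blocks (3->64->128->256->512)
    with BatchNorm + ReLU, a final conv producing a 1-channel score map of size H/16 x W/16,
    and global average pooling (GAP) of the score map down to a single logit per image."""
    def __init__(self):
        super().__init__()
        blocks, chans = [], [3, 64, 128, 256, 512]
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            blocks += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                       nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
        self.features = nn.Sequential(*blocks)
        self.score = nn.Conv2d(512, 1, kernel_size=1)  # per-receptive-field decisions

    def forward(self, x):
        score_map = self.score(self.features(x))      # (N, 1, H/16, W/16)
        logit = score_map.mean(dim=(2, 3)).squeeze(1)  # GAP over the score map
        return logit, score_map

# A 256x256 face yields a 16x16 score map and one scalar logit per image; the total
# parameter count is roughly 1.5M, consistent with Table 2.2.
logit, score_map = FCNBackbone()(torch.randn(2, 3, 256, 256))
print(logit.shape, score_map.shape)  # torch.Size([2]), torch.Size([2, 1, 16, 16])
```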
In summary, based on the 13 different presentation attack instruments shown in Figure 2.1, for which we have the data, we gain the following insights: (i) face presentation attack detection is inherently a local task, and (ii) learning local representations can improve face presentation attack detection performance [124, 127]. Motivated by (i), we hypothesize that utilizing a Fully Convolutional Network (FCN) may be more appropriate for the face presentation attack detection task compared to a traditional CNN. The second insight suggests that FCNs can be intrinsically regularized to learn local cues by enforcing the network to look at local spatial regions of the face. In order to ensure that these regions mostly comprise presentation attack patterns, we propose a self-supervised region extractor.

2.4 Proposed Approach

In this section, we describe the proposed Self-Supervised Regional Fully Convolutional Network (SSR-FCN) for generalized face presentation attack detection. As shown in Figure 2.2, we train the network in two stages: (a) Stage I learns global discriminative cues and predicts score maps, and (b) Stage II extracts arbitrary-size regions from presentation attack areas and fine-tunes the network via regional supervision.

2.4.1 Network Architecture

In typical image classification tasks, networks are designed such that information present in the input image can be used for learning global discriminative features in the form of a feature vector, without utilizing the spatial arrangement in the input. To this end, a fully-connected (FC) layer is generally introduced at the end of the last convolutional layer. This fully-connected layer outputs a D-dimensional feature that aggregates decisions at various spatial regions to obtain a global description of the input image. However, this is not ideal for partial presentation attacks since the presentation attack artifact is not present in all spatial regions. Given the plethora of available presentation attack instruments, it is better to learn local representations and make decisions on local spatial inputs rather than global descriptors. Therefore, we employ a Fully Convolutional Network (FCN) by replacing the FC layer in a traditional CNN with a 1 x 1 convolutional layer followed by a global average pooling layer. The FCN leads to three major advantages over traditional CNNs:

Arbitrary-sized inputs: By replacing the fully-connected layer with a global average pooling layer, the entire FCN can accept input images of any size. This property can be exploited to learn discriminative features at local spatial regions, regardless of the input size, rather than overfitting to a global representation of the entire face image.

Interpretability: Since the proposed FCN is trained to provide decisions at a local level, the score map output by the network can be used to identify the presentation attack regions in the face.
Efficiency: Via the FCN, an entire face image can be inferred only once, where local decisions are dynamically aggregated via the 1 x 1 convolution operator. Existing methods utilize a traditional CNN which has a larger number of trainable parameters due to the fully connected layer at the end of the network. This necessitates a large training dataset, which is limited in the face presentation attack detection literature (see Table 2.1). The FCN is more parameter-efficient and can be trained in an effective manner (to avoid overfitting).

Figure 2.4: Three presentation attack images and their corresponding binary masks extracted from predicted score maps. Black regions correspond to predicted bona fide regions, whereas white regions indicate presentation attack.

2.4.2 Network Configuration

A majority of prior work on CNN-based face presentation attack detection employs architectures that are densely connected with thirteen convolutional layers [30, 88, 123, 124, 128]. Even with the placement of skip connections, the number of learnable parameters exceeds 2.7M. As we see in Table 2.1, only a limited amount of training data is generally available in face presentation attack detection datasets. (The lack of large-scale publicly available face presentation attack detection datasets is due to the time and effort required, along with privacy concerns associated in acquiring such datasets.) Limited data coupled with the large number of trainable parameters causes current approaches to overfit, leading to poor generalization performance under unknown attack scenarios. Instead, we employ a shallower neural network comprising of only five convolutional layers with approximately 1.5M learnable parameters (see Table 2.2).

2.4.3 Stage I: Training FCN Globally

We first train the FCN with global face images in order to learn global discriminative cues and identify presentation attack regions in the face image. Given an image x in R^(H x W x C), we detect a face and crop the face region via 5 landmarks (two eyes, nose, and two mouth keypoints) in order to remove background information not pertinent to the task of face presentation attack detection. Here, H, W, and C refer to the height, width, and number of channels (3 in the case of RGB) of the input image. The face regions are then aligned and resized to a fixed size (e.g., 256 x 256) in order to maintain consistent spatial information across all training data.

The proposed FCN consists of four downsampling convolutional blocks, each coupled with batch normalization and ReLU activation. The feature map from the fourth convolutional layer passes through a 1 x 1 convolutional layer. The output of the 1 x 1 convolutional layer represents a score map S in R^(H_S x W_S x 1), where each pixel in S represents a bona fide vs. presentation attack decision corresponding to its receptive field in the image. The height (H_S) and width (W_S) of the score map are determined by the input image size and the number of downsampling layers. For a 256 x 256 x 3 input image, our proposed architecture outputs a 16 x 16 x 1 score map.

The score map is then reduced to a single scalar value by global average pooling. That is, the score (s) for an input image is obtained from the (H_S x W_S x 1) score map (S) by

s = (1 / (H_S * W_S)) * sum_{i=1}^{H_S} sum_{j=1}^{W_S} S_{i,j}    (2.4.1)

Using a sigmoid activation on the output (s), we obtain a scalar p(c|x) in [0, 1] predicting the likelihood that the input image is a presentation attack, where c = 0 indicates bona fide and c = 1 indicates presentation attack. We train the network by minimizing the Binary Cross Entropy (BCE) loss,

L = -[y log(p(c|x)) + (1 - y) log(1 - p(c|x))]    (2.4.2)

where y is the ground truth label of the input image.

2.4.4 Stage II: Training FCN on Self-Supervised Regions

In order to supervise training at a local level, we propose a regional supervision strategy. We train the network to learn local cues by only showing certain regions of the face where presentation attack patterns exist. In order to ensure that presentation attack artifacts/patterns indeed exist within the selected regions, the pre-trained FCN from Stage I (Section 2.4.3) can automatically guide the region selection process in presentation attack images. For bona fide faces, we can randomly crop a region from any part of the image.
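The guided cropping just described can be sketched as follows. This NumPy snippet is an assumption-laden illustration rather than the authors' code: the hard-gating rule it applies is formalized in Eq. (2.4.3) immediately below, and the mapping from score-map cells to image coordinates and the box-size sampling are our own choices.

```python
import numpy as np

def self_supervised_crop(image, score_map, tau=0.5, min_size=64, max_size=256, rng=None):
    """Min-max normalize the Stage I score map, hard-gate it at tau, and crop a random
    rectangle whose center falls inside a detected presentation-attack region."""
    rng = rng or np.random.default_rng()
    S = np.asarray(score_map, dtype=np.float32)
    S_norm = (S - S.min()) / (S.max() - S.min() + 1e-8)   # soft-gate into [0, 1]
    mask = S_norm >= tau                                  # binary mask M
    H, W = image.shape[:2]
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        # Bona fide face (or no cell above tau): crop anywhere in the image.
        cy, cx = int(rng.integers(0, H)), int(rng.integers(0, W))
    else:
        # Pick a mask cell and map it from score-map coordinates to image coordinates.
        i = int(rng.integers(0, len(ys)))
        cy, cx = int(ys[i] * H / S.shape[0]), int(xs[i] * W / S.shape[1])
    h = int(rng.integers(min_size, max_size + 1))
    w = int(rng.integers(min_size, max_size + 1))
    y0 = int(np.clip(cy - h // 2, 0, max(H - h, 0)))
    x0 = int(np.clip(cx - w // 2, 0, max(W - w, 0)))
    return image[y0:y0 + h, x0:x0 + w]
```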
Due to the absence of a fully connected layer, notice that the FCN naturally encodes decisions at each pixel in the feature map S. In other words, higher intensity pixels within S indicate a larger likelihood of a presentation attack pattern residing within the corresponding receptive field in the image. Therefore, discriminative regions (presentation attack areas) are automatically highlighted in the score map by training on entire face images (see Figure 2.2).

We can then craft a binary mask M indicating the attack regions in the input presentation attack images. First, we soft-gate the score map by min-max normalization such that we can obtain a score map S' in [0, 1]. Let S'(i, j) represent the activation in the (i, j)-th spatial location in the scaled score map S'. The binary mask M is designed by hard-gating,

M(i, j) = 1 if S'(i, j) >= tau, and 0 otherwise    (2.4.3)

where tau is a threshold that controls the size of the hard-gated region (tau = 0.5 in our case). A larger tau leads to smaller regions and a smaller tau can lead to spurious presentation attack regions. Examples of binary masks are shown in Figure 2.4. From the binary mask, we can then randomly extract a rectangular bounding box such that the center of the rectangle lies within the detected presentation attack regions. In this manner, we can crop rectangular regions of arbitrary sizes from the input image such that each region contains presentation attack artifacts according to our pre-trained global FCN. We constrain the width and height of the bounding boxes to be between MIN_region and MAX_region. In this manner, we fine-tune our network to learn local discriminative cues.

2.4.5 Testing

Since FCNs can accept arbitrary input sizes, and given that the proposed FCN has encountered entire faces in Stage I, we input the global face into the trained network and obtain the score map. The score map is then average pooled to extract the final output, which is then normalized by a sigmoid function in order to obtain a final attack score within [0, 1]. That is, the final score is obtained by 1 / (1 + exp(-s)). In addition to the final score, the score map (S) can also be utilized for visualizing the presentation attack regions in the face by constructing a heatmap (see Figure 2.2).

2.5 Experimental Setup

2.5.1 Datasets

The following four datasets are utilized in our study (Table 2.1):

2.5.1.1 Spoof-in-the-Wild with Multiple Attacks (SiW-M) [1]

A dataset, collected in 2019, comprising 13 different presentation attack instruments, acquired specifically for evaluating generalization performance on unknown presentation attack instruments. Compared with other publicly available datasets (Table 2.1), SiW-M is diverse in presentation attack instruments, environmental conditions, and face poses. We evaluate SSR-FCN under both unknown and known settings, and perform ablation studies on this dataset.

2.5.1.2 Oulu-NPU [2]

A dataset comprised of 4,950 high-resolution video clips of 55 subjects. Oulu-NPU defines four protocols, each designed for evaluating generalization against variations in capturing conditions, attack devices, capturing devices, and their combinations. We use this dataset for comparing our approach with the prevailing state-of-the-art face presentation attack detection methods on the four protocols.

2.5.1.3 CASIA-FASD [3] & Replay-Attack [4]

Both datasets, collected in 2012, are frequently employed in the face presentation attack detection literature for testing cross-dataset generalization performance. These two datasets provide a comprehensive collection of attacks, including warped photo attacks, cut photo attacks, and video replay attacks. Low-quality, normal-quality, and high-quality videos are recorded under different lighting conditions.

All images shown in this paper are from SiW-M testing sets.
2.5.2 Data Preprocessing

For all datasets, we extract all frames in a video. The frames are then passed through the MTCNN face detector [41] to detect 5 facial landmarks (two eyes, nose, and two mouth corners). A similarity transformation is used to align the face images based on the five landmarks. After the transformation, the images are cropped to 256 x 256. All face images shown in the paper are cropped and aligned. Before passing into the network, we normalize the images by requiring each pixel to be within [-1, 1], by subtracting 127.5 and dividing by 127.5.

2.5.3 Implementation Details

SSR-FCN is implemented in TensorFlow, and trained with a constant learning rate of 1e-3 with a mini-batch size of 128. The objective function, L, is minimized using the Adam optimizer [129]. It takes 20 epochs to converge. Following [30], we randomly initialize all the weights of the convolutional layers using a normal distribution with 0 mean and 0.02 standard deviation. We restrict the self-supervised regions to be at least 1/4 of the entire image, that is, MIN_region = 64, and at most MAX_region = 256, which is the size of the global face image. Data augmentation during training involves random horizontal flipping with a probability of 0.5. We train and test our proposed method on a single Nvidia GTX 1080 Ti GPU. For evaluation, we compute the attack scores for all frames in a video and temporally average them to obtain the final score.

Table 2.3: Generalization error on learning global (CNN) vs. local (FCN) representations of SiW-M [1]. The Overall column reports Mean ± Std. across the three unknown attack instruments.

Method        | Metric (%) | Replay | Obfuscation | Paper Glasses | Overall
CNN           | ACER       | 13.3   | 47.1        | 32.2          | 30.8 ± 17.0
CNN           | EER        | 12.8   | 44.6        | 23.6          | 27.0 ± 13.2
FCN (Stage I) | ACER       | 11.2   | 52.2        | 12.1          | 25.1 ± 23.4
FCN (Stage I) | EER        | 11.2   | 37.6        | 12.4          | 20.4 ± 12.1

2.5.4 Evaluation Metrics

For all the experiments, we report the standard ISO/IEC 30107 [64] metrics:

1. Attack Presentation Classification Error Rate (APCER): the worst error rate among all the presentation attack instruments;
2. Bona Fide Presentation Classification Error Rate (BPCER): the error rate of bona fide presentations misclassified as presentation attacks;
3. Average Classification Error Rate (ACER): the mean of APCER and BPCER.

In addition, we also report the Equal Error Rate (EER) and the True Detection Rate (TDR) at 2.0% False Detection Rate (FDR) for our evaluation. (Due to the small number of samples, thresholds at lower False Detection Rates, such as 0.2% as recommended under the IARPA ODIN program, cannot be computed.) For a fair comparison with prior work, we report the Half Total Error Rate (HTER) for cross-dataset evaluation. Except for EER and HTER, we employ a decision threshold of 0.5.

Figure 2.5: Illustration of various region extraction strategies from training images: (a) fixed facial regions and (b) landmark-based regions are extracted via domain knowledge (manually defined or landmark-based); (c) random regions extracted via the proposed self-supervision scheme. Each color denotes a separate region.

Table 2.4: Generalization performance of different region extraction strategies on the SiW-M dataset. Here, each column represents an unknown presentation attack instrument while the method is trained on the remaining 12 presentation attack instruments. Columns (number of test videos in parentheses): Replay (99), Print (118); Mask attacks: Half (72), Silicone (27), Transparent (88), Paper (17), Mannequin (40); Makeup attacks: Obfuscation (23), Impersonation (61), Cosmetic (50); Partial attacks: Funny Eye (160), Paper Glasses (127), Paper Cut (86); followed by the Mean and Std. across instruments.
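The video-level scoring and the detection metrics listed above can be computed from per-frame scores as in the sketch below. This is an illustrative NumPy implementation, not the evaluation code used in the dissertation; in particular, it pools all attacks together, whereas the ISO APCER reported here is the worst error over individual presentation attack instruments, so scores would be grouped per instrument in practice.

```python
import numpy as np

def video_score(frame_scores):
    """Final video-level score: temporal average of per-frame attack scores (Sec. 2.5.3)."""
    return float(np.mean(frame_scores))

def apcer_bpcer_acer(bona_fide_scores, attack_scores, threshold=0.5):
    """Error rates at a fixed decision threshold (0.5 in this chapter), where higher
    scores indicate 'attack'."""
    apcer = float(np.mean(np.asarray(attack_scores) < threshold))      # attacks missed
    bpcer = float(np.mean(np.asarray(bona_fide_scores) >= threshold))  # bona fides rejected
    return apcer, bpcer, (apcer + bpcer) / 2.0

def tdr_at_fdr(bona_fide_scores, attack_scores, target_fdr=0.02):
    """TDR @ FDR: set the threshold so that at most target_fdr of bona fide videos are
    falsely flagged, then report the fraction of attack videos correctly detected."""
    t = np.quantile(bona_fide_scores, 1.0 - target_fdr)
    return float(np.mean(np.asarray(attack_scores) >= t))
```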
Global ACER 11.2 15.5 12.8 21.5 35.4 6.1 10.7 52.2 50.0 20.5 26.2 12.1 9.6 22.6 15.3 EER 11.2 14.0 12.8 23.1 26.6 2.9 11.0 37.6 10.4 17.0 24.2 12.4 10.1 16.8 9.3 Eye-Region ACER 13.2 13.7 7.5 17.4 22.5 5.79 6.2 19.5 8.3 11.7 32.8 15.3 7.3 13.2 8.5 EER 12.4 11.4 7.3 15.2 21.5 2.9 6.5 20.2 7.8 11.2 27.2 14.7 7.5 12.3 6.2 Nose-Region ACER 17.4 10.5 8.2 13.8 30.3 5.3 8.4 37.4 5.1 18.0 35.5 31.4 7.1 17.6 12.0 EER 14.6 9.8 9.2 12.7 22.0 5.2 8.4 23.6 4.4 14.6 24.9 27.7 7.6 14.2 7.9 Mouth-Region ACER 20.5 20.7 22.9 26.3 30.6 15.6 17.1 44.2 18.1 24.0 38.0 47.2 8.5 25.7 11.4 EER 19.9 21.3 22.6 25.1 30.0 10.1 10.7 40.9 16.1 24.0 35.5 40.4 8.1 23.4 10.9 Global+ ACER 10.9 10.5 7.5 17.7 28.7 5.1 7.0 38.0 5.1 13.6 29.4 15.2 6.2 15 10.7 Eye+Nose EER 10.2 10.0 7.7 15.8 21.3 1.8 6.7 21.0 3.0 12.3 22.5 12.3 6.5 11.6 6.8 Landmark ACER 10.7 9.2 18.4 25.1 26.4 6.2 6.9 53.8 8.1 15.4 35.8 40.8 7.6 20.3 15.2 Region EER 8.0 10.1 12.2 23.1 18.8 8.9 4.1 40.1 9.9 15.6 17.7 25.6 4.9 15.3 10 Global+ ACER 12.0 11.2 7.3 23.7 26.4 6.3 5.9 26.7 6.7 10.7 27.8 25.7 6.4 15.1 9.2 Landmark EER 11.5 10.1 7.2 19.0 4.9 6.6 4.6 25.6 6.7 10.9 23.5 18.5 4.7 11.8 7.4 Random-Crop ACER 9.2 6.7 7.3 19.9 30.9 9.1 6.9 44.0 6.5 13.8 31.8 28.6 5.9 17.0 12.8 EER 8.9 7.8 10.3 17.9 21.3 3.7 6.5 32.7 5.4 13.7 18.7 19.4 7.1 13.3 8.3 Global+ ACER 12.3 10.7 6.5 18.2 22.9 6.2 6.1 18.6 4.9 11.6 32.7 16.1 7.2 13.4 8.1 Random-Crop EER 10.9 9.2 6.9 16.6 21.3 2.9 5.2 18.8 3.7 11.5 19.0 14.9 6.2 11.3 6.3 SSR-FCN ACER 7.4 19.5 3.2 7.7 33.3 5.2 3.3 22.5 5.9 11.7 21.7 14.1 6.4 12.4 9.2 EER 6.8 11.2 2.8 6.3 28.5 0.4 3.3 17.8 3.9 11.7 21.6 13.5 3.6 10.1 8.4 2.6ExperimentalResults 2.6.1EvaluationofGlobalDescriptorvs.LocalRepresentation Inordertoanalyzetheimpactoflearninglocalembeddingsasopposedtolearningaglobalem- bedding,weconductanablationstudyonthreepresentationattackinstrumentsintheSiW-M dataset[1],namely,Replay(Figure2.1c),Obfuscation(Figure2.1i),andPaperGlasses(Fig- ure2.1m). Theexperimentisconductedundertheunknownattackscenario(leave-one-instrument- outprotocol). Inthisexperiment,a traditionalCNN learningaglobalimagedescriptorisconstructedby replacingthe 1 1 convolutionallayerwithafullyconnectedlayer.WecomparetheCNNtothe proposedbackbone FCN inTable2.2whichlearnslocalrepresentations.Forafaircomparison betweenCNNandFCN,weutilizethesamemeta-parametersandemployglobalsupervisiononly (StageI). 48 InTable2.3,wethatoverallFCNsaremoregeneralizabletounknownpresentationat- tackinstrumentscomparedtoglobalembeddings.Forpresentationattackinstrumentswherethe presentationattackaffectstheentireface,suchasreplayattacks,thedifferencesbetweengeneral- izationperformanceofCNNandFCNarenegligible.Here,presentationattackdecisionsatlocal spatialregionsdonothaveanyadvantageoverasinglepresentationattackdecision overtheentireimage.RecallthatCNNsemployafullyconnectedlayerwhichstripsawayall spatialinformation.Thisexplainswhylocaldecisionscanimprovegeneralizability ofFCNoverCNNwhenpresentationattackinstrumentsarelocalinnature( e . g .,make-upattacks andpartialattacks).Duetosubtletyofobfuscationattacksandlocalizednatureofpaperglasses, FCNcanexhibitarelativereductioninEERby 16% and 47% ,respectively,relativetoCNN. 2.6.2RegionExtractionStrategies Weconsidered6differentregionextractionstrategies,namely, Eye-Region , Nose-Region , Mouth- Region , Landmark-Region , Random-Crop ,and Self-SupervisedRegions(Proposed) (seeFig- ure2.5).Wealsoincluderesultsfora Global modelwhichreferstotrainingtheFCNwiththe entirefaceimageonly(StageI). 
Sinceallfaceimagesarealignedandcropped,spatialinformationisconsistentacrossallim- agesindatasets.Therefore,wecanautomaticallyextractfacialregionsthatinclueeye,nose,and mouthregions(Figure2.5a).WetraintheproposedFCNseparatelyoneachofthethreeregionsto obtainthreemodels:eye-region,nose-region,andmouth-region. Wealsoinvestigateextractingregionsbyfacelandmarks.Forthis,weutilizeastate- of-the-artlandmarkextractor,namelyDLIB[130],toobtain68landmarkpoints.Weexclude17 landmarksaroundthejawlineandasubsetof51landmarkpointsaroundeyebrows (10landmarks),eyes(12landmarks),nose(9landmarks),andmouth(20landmarks).Atotalof 51regions(withaedsize 32 32 )centeredaroundeachlandmarkareextractedandusedtrain asingleFCNonall51regions. Ourareasfollows:(i)almostallmethodswithregionalsupervisionhaveloweroverall 49 errorratesascomparedtotrainingwiththeentireface.ExceptiontothisiswhenFCNistrained onlyonmouthregions.Thisislikelybecauseamajorityofpresentationattackinstrumentsmay notcontainpresentationattackpatternsacrossmouthregions(Figure2.1).(ii)whenbothglobal anddomain-knowledgestrategies,eyesandnose)arefused,thegeneralizationperfor- manceimprovescomparedtotheglobalmodelalone.Notethatwedonotfusethemouthregion sincetheperformanceispoorformouthregions.Similarly,wethatregionscroppedaround landmarkswhenfusedwiththeglobalercanachievebettergeneralizationperformance. (iii)samplingrandomlocalregionsofthefacealsoresultsinhigherrorratesacrossthediverseset ofpresentationattackinstruments.(iv)comparedtoallregionextractionstrategies,theproposed self-supervisedregionextractionstrategy(StageI ! StageII)achievesthelowestgeneralization errorratesacrossallpresentationattackinstrumentswitha 40% and 45% relativereductioninEER andACERcomparedtotheGlobalmodel(StageI).ThissupportsourhypothesisthatbothStage IandStageIIarerequiredforenhancedgeneralizationperformanceacrossunknownpresentation attackinstruments.Ascore-levelfusionoftheglobalFCNwithself-supervisedregionsdoesnot showanyreductioninerrorrates.ThisisbecausewealreadytrainedtheproposedFCN onglobalfacesinStageI. The Random-Crop strategycanbeviewedasavariantoftrainingtheproposedStageIIwithout anyregion-selectionguidancefromStageI.Inordertofurtherinvestigatetheofthepro- posedself-supervisionmethod,wetraintwomodelstodistinguishbetweensamplesand PaperGlasses presentationattacksamples:(i) RandomCrop modelistrainedonpatchesrandomly sampledfromtheinputimagesuchthatthesizeofeachregionisbetween 64 64 and 256 256 ,and (ii) Self-SupervisedRegions modelistrainedonpatchessampledfromweaklylabeledregionsby thepre-trainedmodelfromStageI.Forafaircomparison,wedonotemploythepre-trainedmodel fromStageItotrainthe Self-SupervisedRegions model.InFigure2.7,weplotthetraininglossfor bothmodelsovermultipletrainingiterations.Wecanobservethattheproposedself-supervisedre- gionmethodaidsinnetworkconvergence.Thisisbecauserandomcroppingcanresultintraining withregionsfrompresentationattacksamples(seeFigure2.7),whereas,theproposed 50 (a)Obfuscation (b)AttackScore:1.0 (c)AttackScore:0.7 (d)Obfuscation (e)AttackScore:0.0 (f)AttackScore:0.0 Figure2.6(a)Anexampleobfuscationpresentationattackattemptwhereournetworkcorrectly predictstheinputtobeapresentationattack.(b,e)Scoremapoutputbyournetworktrainedvia Self-SupervisedRegions.(c,f)ScoremapoutputbyFCNtrainedonentirefaceimages.(d)An exampleobfuscationpresentationattackattemptwhereournetwork incorrectly predictstheinput tobeaAttackscoresaregivenbelowthescoremaps.Decisionthresholdis 0 : 5 . 
self-supervisionmethodensuresthatregionssampledfrompresentationattacksindeedcontains presentationattackartifacts. InFigure2.6,weanalyzetheeffectoftrainingtheFCNlocallyvs.globallyontheprediction results.Intherow,wherebothmodelscorrectlypredicttheinputtobeapresentationattack, weseethatFCNtrainedviarandomregionscancorrectlyidentifypresentationattackregionssuch asfakeeyebrows.Incontrast,theglobalFCNcanbarelylocatethefakeeyebrows.Sincerandom regionsincreasesthevariabilityinthetrainingsetalongwithadvantageoflearninglocalfeatures thanglobalFCN,wethatproposedself-supervisedregionalsupervisionperformsbest. 51 Figure2.7Networkconvergenceoveranumberoftrainingiterationswhenamodeltrainson(a) randomlycroppedpatches(blueline),and(b)self-supervisedregionsextractedviapre-trained modelfromStageI(orangeline).Randomlycroppingpatchesmayresultinnoisysampleswhere samplesfrompresentationattacksamplesmaybeusedfortraining.Someexampleran- domlycroppedpatcheswithhightraininglossareshownabovethelines.Instead,wethatthe proposedself-supervisionaidsinnetworkconvergence. 2.6.3EvaluationofNetworkCapacity InTable2.5,weshowthegeneralizationperformanceofourmodelwhenwevarythecapacity ofthenetwork.WeconsiderthreedifferentvariantsoftheproposedFCN:(a)3-layerFCN( 76 K parameters),(b)5-layerFCN( 1 : 5 M parameters;proposed),and(c)6-layerFCN( 3 M parame- ters).Thisexperimentisevaluatedonthreeunknownpresentationattackinstruments,namely, Re- play , Obfuscation ,and PaperGlasses .Wechosethesepresentationattackinstrumentsdueto theirvastlydiversenature.Replayattacksconsistofglobalpresentationattackpatterns,whereas 52 Table2.5GeneralizationerrorofFCNswithrespecttothenumberoftrainableparameters. Method Metric(%) Replay Obfuscation PaperGlasses Mean Std. 3-layers( 76 K ) ACER 13.9 57.6 12.3 27.9 25.7 EER 14.0 44.1 7.5 21.9 19.5 5-layers( 1 : 5 M ;proposed) ACER 7.4 22.5 14.1 14.7 7.6 EER 6.8 17.8 13.5 12.7 4.5 6-layers( 3 M ) ACER 11.2 32.9 19.8 21.3 11.0 EER 7.8 25.19 19.7 17.6 7.2 Table2.6ResultsonSiW-M:UnknownAttacks.Here,eachcolumnrepresentsanunknownpre- sentationattackinstrumentwhilethemethodistrainedontheremaining12presentationattack instruments. Method Metric Replay Print MaskAttacks MakeupAttacks PartialAttacks Mean Std. Replay Print Half Silicone Trans. Paper Mann. Obf. Imp. Cosm. FunnyEye Glasses PaperCut 99vids. 118vids. 72vids. 27vids. 88vids. 17vids. 40vids. 23vids. 61vids. 50vids. 160vids. 127vids. 86vids. SVM+LBP[2] ACER 20.6 18.4 31.3 21.4 45.5 11.6 13.8 59.3 23.9 16.7 35.9 39.2 11.7 26.9 14.5 EER 20.8 18.6 36.3 21.4 37.2 7.5 14.1 51.2 19.8 16.1 34.4 33.0 7.9 24.5 12.9 Auxiliary[30] ACER 16.8 6.9 19.3 14.9 52.1 8.0 12.8 55.8 13.7 11.7 49.0 40.5 5.3 23.6 18.5 EER 14.0 4.3 11.6 12.4 24.6 7.8 10.0 72.3 10.1 9.4 21.4 18.6 4.0 17.0 17.7 DTN[1] ACER 9.8 6.0 15.0 18.7 36.0 4.5 7.7 48.1 11.4 14.2 19.3 19.8 8.5 16.8 11.1 EER 10.0 2.1 14.4 18.6 26.5 5.7 9.6 50.2 10.1 13.2 19.8 20.5 8.8 16.1 12.2 CDC[88] ACER 10.8 7.3 9.1 10.3 18.8 3.5 5.6 42.1 0.8 14.0 24.0 17.6 1.9 12.7 11.2 EER 9.2 5.6 4.2 11.1 19.3 5.9 5.0 43.5 0.0 14.0 23.3 14.3 0.0 11.9 11.8 Proposed ACER 7.4 19.5 3.2 7.7 33.3 5.2 3.3 22.5 5.9 11.7 21.7 14.1 6.4 12.4 9.2 EER 6.8 11.2 2.8 6.3 28.5 0.4 3.3 17.8 3.9 11.7 21.6 13.5 3.6 10.1 8.4 TDR* 72.0 51.0 96.0 55.9 39.0 100.0 95.0 31.0 90.0 44.0 33.0 42.9 94.7 65.0 25.9 *TDRevaluatedat 2 : 0% FDR Table2.7ResultsonSiW-M:Knownpresentationattackinstruments. Method Metric(%) MaskAttacks MakeupAttacks PartialAttacks Mean Std. Replay Print Half Silicone Trans. Paper Mann. Obf. Imp. Cosm. 
FunnyEye Glasses PaperCut Auxiliary[30] ACER 5.1 5.0 5.0 10.2 5.0 9.8 6.3 19.6 5.0 26.5 5.5 5.2 5.0 8.7 6.8 EER 4.7 0.0 1.6 10.5 4.6 10.0 6.4 12.7 0.0 19.6 7.2 7.5 0.0 6.5 5.8 Proposed ACER 3.5 3.1 1.9 5.7 2.1 1.9 4.2 7.2 2.5 22.5 1.9 2.2 1.9 4.7 5.6 EER 3.5 3.1 0.1 9.9 1.4 0.0 4.3 6.4 2.0 15.4 0.5 1.6 1.7 3.9 4.4 TDR* 55.5 92.3 69.5 100.0 90.4 100.0 85.1 92.5 78.7 99.1 95.6 95.7 76.0 87.0 13.0 *TDRevaluatedat 2 : 0% FDR obfuscationattacksareextremelysubtlecosmeticchanges.PaperGlassesareconstrainedonly toeyes.Whilealargenumberoftrainableparametersleadtopoorgeneralizationduetoover- 53 tothepresentationattackinstrumentsseenduringtraining,whereas,toofewparameters limitslearningdiscriminativefeatures.Basedonthisobservationandexperimentalresults,we utilizethe5-layerFCN(seeTable2.2)withapproximately1.5Mparameters.Amajorityofprior studiesemploy13denselyconnectedconvolutionallayerswithtrainableparametersexceeding 2 : 7 M [30,88,124,127]. 2.6.4GeneralizationacrossUnknownAttacks Theprimaryobjectiveofthisworkistoenhancegeneralizationperformanceacrossamultitude ofunknownpresentationattackinstrumentsinordertoeffectivelygaugetheexpectederrorrates inreal-worldscenarios.TheevaluationprotocolinSiW-Mfollowsaleave-one-spoof-outtesting protocolwherethetrainingsplitcontains12differentpresentationattackinstrumentsandthe13 th presentationattackinstrumentisheldoutfortesting.Amongthevideos, 80% arekept inthetrainingsetandtheremaining 20% isusedfortestingNotethatthereareno overlappingsubjectsbetweenthetrainingandtestingsets.Alsonotethatnodatasamplefromthe testingpresentationattackinstrumentisusedforvalidationsinceweevaluateourapproachunder unknownattacks.WereportACERandEERacrossthe13splits.InadditiontoACERandEER, wealsoreporttheTDRat2.0%FDR. InTable2.6,wecompare SSR-FCN withpriorwork.Wethatourproposedmethod achievesimprovementincomparisontothepublishedresults[88](relativereduction of14%ontheaverageEERand3%ontheaverageACER).Notethatthestandarddeviationacross all13presentationattackinstrumentsisalsoreducedcomparedtopriorapproaches,eventhough someofthem[30,88]utilizeauxiliarydatasuchasdepthandtemporalinformation. ,wereducetheEERsofreplay,halfmask,transparentmask,siliconemask,paper mask,mannequinhead,obfuscation,impersonation,andpaperglassesrelativelyby27%,33%, 43%,93%,34%,59%,and6%,respectively.Amongallthe13presentationattackinstruments, detectingobfuscationattacksisthemostchallenging.Thisisduetothefactthatthemakeup appliedintheseattacksareverysubtleandmajorityofthefacesarePriorworkswere 54 notsuccessfulindetectingtheseattacksandpredictmostoftheobfuscationattacksas Bylearningdiscriminativefeatureslocally,ourproposednetworkimprovesthestate-of-the-art obfuscationattackdetectionperformanceby59%intermsofEERand46%intermsofACER. 2.6.5SiW-M:DetectingKnownAttacks Hereallthe13presentationattackinstrumentsinSiW-Mareusedfortrainingaswellastesting. WerandomlysplittheSiW-Mdatasetintoa60%-40%training/testingsplitandreporttheresults inTable2.7.Incomparisontoastate-of-the-artfacepresentationattackdetectionmethod[30], ourmethodoutperformsforalmostalloftheindividualpresentationattackinstrumentsaswell astheoverallperformanceacrosspresentationattackinstruments. Auxiliary[30] utilizesdepth andtemporalinformationforpresentationattackdetectionwhichaddscomplexityto thenetwork.Wethatcharacterizinglocalspatialregionsasattackin factleadstobettergeneralizationonunknownattacks(seeTable2.6)andspecializationonknown attacks. 
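The protocol above reports ACER and EER across the 13 leave-one-out splits, as well as TDR at a fixed 2.0% FDR. A minimal NumPy sketch of these detection metrics is shown below; it assumes higher scores indicate a presentation attack, and the exact thresholding conventions of the SiW-M protocol may differ slightly.

```python
import numpy as np

def apcer_bpcer(bonafide_scores, attack_scores, threshold):
    """Higher score = more likely a presentation attack."""
    bonafide_scores, attack_scores = np.asarray(bonafide_scores), np.asarray(attack_scores)
    apcer = float(np.mean(attack_scores < threshold))    # attacks accepted as bona fide
    bpcer = float(np.mean(bonafide_scores >= threshold)) # bona fides rejected as attacks
    return apcer, bpcer

def acer(bonafide_scores, attack_scores, threshold=0.5):
    apcer, bpcer = apcer_bpcer(bonafide_scores, attack_scores, threshold)
    return (apcer + bpcer) / 2.0

def eer(bonafide_scores, attack_scores):
    thresholds = np.unique(np.concatenate([bonafide_scores, attack_scores]))
    errors = [apcer_bpcer(bonafide_scores, attack_scores, t) for t in thresholds]
    gaps = [abs(a - b) for a, b in errors]
    a, b = errors[int(np.argmin(gaps))]
    return (a + b) / 2.0                                  # error rate where APCER ~= BPCER

def tdr_at_fdr(bonafide_scores, attack_scores, fdr=0.02):
    # Threshold at which `fdr` of the bona fide samples are falsely detected as attacks.
    t = np.quantile(np.asarray(bonafide_scores), 1.0 - fdr)
    return float(np.mean(np.asarray(attack_scores) >= t))
```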
2.6.6EvaluationonOulu-NPUDataset WefollowthefourstandardprotocolsintheOULU-NPUdataset[2]whichcoverthecross- background,cross-presentation-attack-instrument(cross-PAI),cross-capture-device,andcross- conditionsevaluations: ProtocolI: unseensubjects,illumination,andbackgrounds; ProtocolII: unseensubjectsandattackdevices; ProtocolIII: unseensubjectsandcameras; ProtocolIV: unseensubjects,illumination,backgrounds,attackdevices,andcameras. Wecomparetheproposed SSR-FCN withthebestperformingmethod,namelyGRADI- ENT[131],inIJCBMobileFaceCompetition[131]foreachprotocol.We 55 Table2.8ErrorRates(%)oftheproposed SSR-FCN andandcompetingfacepresentationattack detectorsunderthefourstandardprotocolsofOulu-NPU[2]. Protocol Method APCER BPCER ACER I GRADIENT[131] 1.3 12.5 6.9 Auxiliary[30] 1.6 1.6 1.6 DeepPixBiS[124] 0.8 0.0 0.4 TSCNN-ResNet[132] 5.1 6.7 5.9 SSR-FCN (Proposed) 1.5 7.7 4.6 II GRADIENT[131] 3.1 1.9 2.5 Auxiliary[30] 2.7 2.7 2.7 DeepPixBiS[124] 11.4 0.6 6.0 TSCNN-ResNet[132] 7.6 2.2 4.9 SSR-FCN (Proposed) 3.1 3.7 3.4 III GRADIENT[131] 2.1 3.9 5 : 0 5 : 3 3 : 8 2 : 4 Auxiliary[30] 2 : 7 1 : 3 3 : 1 1 : 7 2 : 9 1 : 5 DeepPixBiS[124] 11 : 7 19 : 6 10 : 6 14 : 1 11 : 1 9 : 4 TSCNN-ResNet[132] 3 : 9 2 : 8 7 : 3 1 : 1 5 : 6 1 : 6 SSR-FCN (Proposed) 2.9 2.1 2.7 3.2 2.8 2.2 IV GRADIENT[131] 5.0 4.5 15 : 0 7 : 1 10 : 0 5 : 0 Auxiliary[30] 9 : 3 5 : 6 10 : 4 6 : 0 9.5 6.0 DeepPixBiS[124] 36 : 7 29 : 7 13 : 3 16 : 8 25 : 0 12 : 7 TSCNN-ResNet[132] 11 : 3 3 : 9 9.7 4.8 9 : 8 4 : 2 SSR-FCN (Proposed) 8 : 3 6 : 8 13 : 3 8 : 7 10 : 8 5 : 1 alsoincludesomenewerbaselinemethods,includingAuxiliary[30],DeepPixBiS[124],and TSCNN[132].Wecompareourproposedmethodwith10baselinesintotalforeachprotocol. Additionalbaselinescanbefoundinsupplementarymaterial. InTable2.8, SSR-FCN achievesACERsof4.6%,3.4%,2.8%,and10.8%inthefourprotocols, respectively.Amongthebaselines, SSR-FCN evenoutperformsprevailingstate-of-the-artmethods 56 Table2.9Cross-DatasetHTER(%)oftheproposed SSR-FCN andcompetingfacepresentation attackdetectors. Method CASIA ! Replay Replay ! CASIA CNN[112] 48.5 45.5 ColorTexture[102] 47.0 49.6 FaceSpoofBuster[133] 43.3 53.0 Auxiliary[30] 27.6 28.4 De-Noising[125] 28.5 41.1 Damer&Dimitrov[134] 28.4 38.1 STASN[135] 31.5 30.9 SAPLC[127] 27.3 37.5 SSR-FCN (Proposed) 19.9 41.9 fiCASIA ! ReplayfldenotestrainingonCASIAandtestingonReplay-Attack inprotocolIIIwhichcorrespondstogeneralizationperformanceforunseensubjectsandcameras. Theresultsarecomparabletobaselinemethodsintheotherthreeprotocols.SinceOulu-NPU comprisesofonlyprintandreplayattacks,amajorityofthebaselinemethodsincorporateauxiliary informationsuchasdepthandmotion.Indeed,incorporatingauxiliaryinformationcouldimprove theresultsattheriskofovandoverheadcostandtime. 2.6.7Cross-DatasetGeneralization Inordertoevaluatethegeneralizationperformanceof SSR-FCN whentrainedononedatasetand testedonanother,followingpriorstudies,weperformacross-datasetexperimentbetweenCASIA- FASD[3]andReplay-Attack[4]. 
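For the cross-dataset experiment just described, results (Table 2.9) are reported as HTER. The sketch below shows one common way to compute it, assuming the operating threshold is fixed at the EER point of the source dataset and then applied unchanged to the target dataset; this threshold-selection rule is an assumption, not a detail stated in the text.

```python
import numpy as np

def eer_threshold(bonafide_scores, attack_scores):
    bonafide_scores, attack_scores = np.asarray(bonafide_scores), np.asarray(attack_scores)
    thresholds = np.unique(np.concatenate([bonafide_scores, attack_scores]))
    gaps = [abs(np.mean(attack_scores < t) - np.mean(bonafide_scores >= t)) for t in thresholds]
    return float(thresholds[int(np.argmin(gaps))])

def cross_dataset_hter(src_bona, src_attack, tgt_bona, tgt_attack):
    t = eer_threshold(src_bona, src_attack)             # e.g., fixed on CASIA-FASD
    far = np.mean(np.asarray(tgt_attack) < t)           # attacks accepted on the target dataset
    frr = np.mean(np.asarray(tgt_bona) >= t)            # bona fides rejected on the target dataset
    return 100.0 * float(far + frr) / 2.0               # HTER (%), e.g., CASIA -> Replay-Attack
```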
In Table 2.9, we find that, compared to 6 prevailing state-of-the-art methods, the proposed SSR-FCN achieves the lowest error (a 27% improvement in HTER) when trained on CASIA-FASD [3] and evaluated on Replay-Attack [4]. On the other hand, SSR-FCN achieves worse performance when trained on Replay-Attack and tested on CASIA-FASD. This can likely be attributed to the higher resolution images in CASIA-FASD compared to Replay-Attack. This demonstrates that SSR-FCN trained with higher-resolution data can generalize better on poorer quality testing images, but the reverse may not hold true. We intend on addressing this limitation in future work. Additional baselines can be found in the supplementary material. Figure 2.8 Example cases where the proposed framework, SSR-FCN, fails to correctly classify bona fides and presentation attacks. (a) Bona fides misclassified as presentation attacks, likely due to bright lighting and occlusions in face regions. (b) Presentation attacks misclassified as bona fides due to the subtle nature of make-up attacks and transparent masks. Corresponding attack scores (∈ [0, 1]) are provided below each image. A larger value of the attack score indicates a higher likelihood that the input image is a presentation attack. Decision threshold is 0.5. 2.6.8 Failure Cases Even though experiment results show enhanced generalization performance, our model still fails to correctly classify certain input images. In Figure 2.8, we show a few such examples. Figure 2.8a shows incorrect prediction of bona fides as presentation attacks in the presence of inhomogeneous illumination. This is because the model predicts bona fides as being one of replay and print attacks, which exhibit bright lighting patterns due to the recapturing media such as smartphones and laptops. Since we fine-tune our network via regional supervision in Stage II, artifacts that obstruct parts of the faces can also adversely affect our model. Figure 2.9 Visualizing presentation attack regions via the proposed SSR-FCN. Red regions indicate a higher likelihood of being a presentation attack region. Corresponding attack scores (∈ [0, 1]) are provided below each image. A larger value of the attack score indicates a higher likelihood that the input image is a presentation attack. Decision threshold is 0.5. Figure 2.8b shows incorrect classification of presentation attacks as bona fides. This is particularly true when presentation attack artifacts are very subtle, such as cosmetic and obfuscation make-up attacks. Transparent masks can also be problematic when the mask itself is barely visible. 2.6.9 Computational Requirement Since face presentation attack detection modules are employed as a pre-processing step for automated face recognition systems, it is crucial that the presentation attack prediction time should be as low as possible. The proposed approach comprises of 1.5M trainable parameters compared to a traditional CNN [30] with 3M learnable parameters. The proposed SSR-FCN takes under 2 hours to train both Stage I and Stage II, and 4 milliseconds to predict presentation attacks in a single (256 × 256) face image on a Nvidia GTX 1080 Ti GPU. In other words, SSR-FCN can process frames at 250 Frames Per Second (FPS) and the size of the model is only 11.8 MB. Therefore, SSR-FCN is well suited for deployment where real-time decisions are required. (a) Paper Eyeglasses (b) Average Eyeglasses Score Map Figure 2.10 A partial presentation attack artifact may be present in a small portion of the input 256 × 256 face image, such as (a) paper eyeglasses. However, since the proposed SSR-FCN dynamically aggregates decisions across multiple receptive fields in the image, a majority of the pixels in the score map comprise of high scores (indicating the presence of a presentation attack). We visualize the average score map across all paper eyeglass attacks in (b).
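To reproduce timing numbers like those in Sec. 2.6.9 (about 4 ms per 256 × 256 face, i.e., roughly 250 FPS), a simple benchmarking loop suffices. The `predict_fn` interface below is a placeholder for whatever inference call a deployed detector exposes; it is not the released SSR-FCN API.

```python
import time
import numpy as np

def benchmark(predict_fn, input_shape=(1, 256, 256, 3), n_warmup=10, n_runs=200):
    dummy = np.random.rand(*input_shape).astype(np.float32)
    for _ in range(n_warmup):                 # warm-up excludes one-time graph/initialization cost
        predict_fn(dummy)
    start = time.perf_counter()
    for _ in range(n_runs):
        predict_fn(dummy)
    per_image = (time.perf_counter() - start) / n_runs
    print(f"latency: {per_image * 1e3:.2f} ms/image, throughput: {1.0 / per_image:.0f} FPS")
```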
2.6.10 Visualizing Presentation Attack Regions SSR-FCN can automatically locate the individual presentation attack regions in an input face image. In Figure 2.9, we show heatmaps from the score maps extracted for a randomly chosen image from all presentation attack instruments. Red regions indicate a higher likelihood of presentation attack. For a bona fide input image, the predicted presentation attack regions are sparse with low likelihoods. In the case of replay and print attacks, the predicted presentation attack regions are located throughout the entire face image. This is because these presentation attacks contain global-level noise. For mask attacks, including half-mask, silicone mask, transparent mask, paper mask, and mannequin, the presentation attack patterns are confined near the eye and nose regions. Make-up attacks are harder to detect since they are very subtle in nature. The proposed SSR-FCN detects obfuscation and cosmetic attack attempts by learning local discriminative cues around the eyebrow regions. In contrast, impersonation make-up patterns exist throughout the entire face. We also find that SSR-FCN can precisely locate the presentation attack artifacts, such as funny eyeglasses, paper glasses, and paper cut, in partial attacks. 2.7 Discussion We show that the proposed SSR-FCN achieves superior generalization performance on the SiW-M dataset [1] compared to the prevailing state-of-the-art methods that tend to overfit on the seen presentation attack instruments. Our method also achieves comparable performance to the state-of-the-art on the Oulu-NPU dataset [2] and outperforms all baselines for cross-dataset generalization performance (CASIA-FASD [3] → Replay-Attack [4]). In contrast to a number of prior studies [30, 88, 102, 123], the proposed approach does not utilize auxiliary cues for presentation attack detection, such as motion and depth information. While incorporating such cues may enhance performance on print and replay attack datasets such as Oulu-NPU, CASIA-MFSD, and Replay-Attack, it comes at the risk of potentially overfitting to these two attack types and adds compute cost. A major benefit of SSR-FCN lies in its usability. A simple pre-processing step includes face detection and alignment. The cropped face is then passed to the network. With a single forward-pass through the FCN, we obtain both the score map and the final attack score. Our proposed SSR-FCN outputs a classification decision at each pixel of the intermediate feature maps. Due to the downsampling layers found after each convolutional operation (see Table 2.2), the decisions are automatically aggregated. Therefore, the final feature map (referred to as the score map) is of much smaller resolution (16 × 16 pixels) compared to the original image. Therefore, in the case of partial attacks such as paper eyeglasses, even though a small portion of the face image comprises of a presentation attack artifact, our score map is an aggregated decision across multiple sliding windows (receptive fields) of the original image. In Figure 2.10, we compute the average score map for all the paper eyeglasses presentation attacks. We find that, even though paper eyeglasses comprise of a small portion of the original image, a majority of the pixels in the average score map comprise of high scores (indicating the presence of a presentation attack). In this paper, we obtain the final score via average pooling the score map. As future work, we intend on exploring other fusion mechanisms such as a weighted average via an attention mask. Even though the proposed method is well-suited for generalizable face presentation attack detection, SSR-FCN is still limited by the amount and quality of available training data. For instance, when trained on a low-resolution dataset, namely Replay-Attack [4], cross-dataset generalization performance suffers.
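As described in the discussion above, the final attack score is obtained by average pooling the 16 × 16 score map, and a weighted (attention-based) average is mentioned only as future work. A small sketch of both variants; the attention-weighted branch is hypothetical, not part of the reported method:

```python
import numpy as np

def attack_score(score_map, threshold=0.5, attention=None):
    """score_map: HxW array (e.g., 16x16) of per-receptive-field attack likelihoods in [0, 1]."""
    score_map = np.asarray(score_map, dtype=np.float32)
    if attention is None:
        score = float(score_map.mean())                           # average pooling (used in the paper)
    else:
        w = np.asarray(attention, dtype=np.float32)
        score = float((score_map * w).sum() / (w.sum() + 1e-8))   # hypothetical attention-weighted variant
    return score, bool(score >= threshold)                        # True => presentation attack
```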
2.8 Summary Face presentation attack detection systems are crucial for the secure operation of an automated face recognition system. With the introduction of sophisticated presentation attacks, such as high-resolution and tight-fitting silicone 3D face masks, presentation attack detectors need to be robust and generalizable. We proposed a face presentation attack detection framework, namely SSR-FCN, that achieved state-of-the-art generalization performance against 13 different presentation attack instruments. SSR-FCN reduced the average error rate of competitive algorithms by 14% on one of the largest and most diverse face presentation attack detection datasets, SiW-M, comprised of 13 presentation attack instruments. It also generalizes well when training and testing datasets are from different sources. In addition, the proposed method is shown to be more interpretable compared to prior studies since it can directly predict the parts of the faces that are considered as presentation attacks. In the future, we intend on exploring whether incorporating domain knowledge in SSR-FCN can further improve generalization performance. Chapter 3 Synthesizing and Defending Against Adversarial Faces In the previous chapter, we proposed a solution to defend AFR systems against physically crafted face spoofs. However, in this chapter, we will show that prevailing AFR systems are also vulnerable to the growing threat of adversarial examples which are digitally crafted. Our main focus is to design an automatic adversarial face synthesis method that can evade 5 state-of-the-art AFR systems. With a powerful adversarial face synthesizer, we evaluate the robustness of prevailing AFR systems against such digital adversarial attacks. The latter part of the chapter presents a state-of-the-art solution to safeguard AFR systems against any adversarial face. 3.1 Introduction (a) Enrolled Face, 0.72, 0.78, (b) Input Probe, 0.22, 0.12, (c) AdvFaces, 0.26, 0.25, (d) PGD [33], 0.14, 0.25, (e) FGSM [70] Figure 3.1 Example gallery and probe face images and corresponding synthesized adversarial examples. (a) Two celebrities' real face photos enrolled in the gallery and (b) the same subjects' probe images; (c) Adversarial examples generated from (b) by our proposed synthesis method, AdvFaces; (d-e) Results from two adversarial example generation methods. Cosine similarity scores (∈ [-1, 1]) obtained by comparing (b-e) to the enrolled image in the gallery via ArcFace [9] are shown below the images. A score above 0.28 (threshold @ 0.1% False Accept Rate) indicates that two face images belong to the same subject. Here, a successful obfuscation attack would mean that humans can identify the adversarial probes and enrolled faces as belonging to the same identity but an automated face recognition system considers them to be from different subjects. From mobile phone unlock to boarding a flight at airports, the ubiquity of automated face recognition systems (AFR) is evident. With deep learning models, AFR systems are able to achieve accuracies as high as 99% True Accept Rate (TAR) at 0.1% False Accept Rate (FAR) [22]. The model behind this success is a Convolutional Neural Network (CNN) [6, 9, 44] and the availability of large face datasets to train the model. However, CNN models have been shown to be vulnerable to adversarial perturbations 1 [69-72]. Szegedy et al. showed the dangers of adversarial examples in the image classification domain, where perturbing the pixels in the input image can cause CNNs to misclassify the image even when the amount of perturbation is imperceptible to 1 Adversarial perturbations refer to altering an input image instance with small, human imperceptible changes in a manner that can evade CNN models.
64 (a)Printattack (b)Replayattack (c)Maskattack (d)AdversarialFacessynthesizedviaproposedAdvFaces Figure3.2Threetypesoffacepresentationattacks:(a)printedphotograph,(b)replayingthetar- getedperson'svideoonasmartphone,and(c)asiliconemaskofthetarget'sface.Facepresentation attacksrequireaphysicalartifact.Adversarialattacks(d),ontheotherhand,aredigitalattacksthat cancompromiseeitheraprobeimageorthegalleryitself.Toahumanobserver,facepresentation attacks(a-c)aremoreconspicuousthanadversarialfaces(d). thehumaneye[69].Despiteimpressiverecognitionperformance,prevailingAFRsystemsarestill vulnerabletothegrowingthreatofadversarialexamples(seeFigure3.1). ToattackanAFRsystem,ahackercanmaliciouslyperturbhisfaceimageinamannerthat cancauseAFRsystemstomatchittoatargetvictim( impersonationattack )oranyidentityother thanthehacker( obfuscationattack ).Yettothehumanobserver,thisadversarialfaceimageshould appearasalegitimatefacephotooftheattacker(seeFigure3.2d).Thisisdifferentfromface pre- sentationattacks ,wherethehackerassumestheidentityofatargetbypresentingafakeface (alsoknownasspoofface)toafacerecognitionsystem(seeFigure3.2).However,inthecase ofpresentationattacks,thehackerneedstoactivelyparticipatebywearingamaskorreplayinga photograph/videoofthegenuineindividualwhichmaybeconspicuousinscenarioswherehuman operatorsareinvolved(suchasairports).Asdiscussedbelow,adversarialfaces,donotrequireac- tiveparticipationofthesubjectduringauthentication(comparisonbetweenadversarialprobeand 65 Figure3.3Eightpointsofattacksinanautomatedfacerecognitionsystem[31].Anadversarial imagecanbeinjectedintheAFRsystematpoints2and6(solidarrows). galleryimages). Considerforexample,theUnitedStatesCustomsandBorderProtection(CBP),thelargest federallawenforcementagencyintheUnitedStates[73],which(i)processesentrytothecountry forovera million travellers everyday [74]and(ii)employsautomatedfacerecognitionforverifying travelers'identities[75].InordertoevadebeingasanindividualinaCBPwatchlist, aterroristcanmaliciouslyenrollanadversarialimageinthegallerysuchthatuponenteringthe border,hislegitimatefaceimagewillbematchedtoaknownandbenignindividualortoafake identitypreviouslyenrolledinthegallery.Anindividualcanalsogenerateadversarialexamples tododgehisownidentityinordertoguardpersonalprivacy.Ratha etal. [31]eight pointsinabiometricsystemwhereanattackcanbelaunchedagainstabiometric(includingface) recognitionsystem,includingAFR(seeFigure3.3).Anadversarialfaceimagecanbeinsertedin theAFRsystematpoint2,wherecompromisedfaceembeddingswillbeobtainedbythefeature extractorthatcouldbeusedforimpersonationorobfuscationattacks.Theentiregallerycanalso becompromisedifthehackerenrollsanadversarialimageatpoint6,wherenoneoftheprobes willmatchtothecorrectidentity'sgallery. 66 Threebroadcategoriesofadversarialattackshavebeen 1. White-box attack:AmajorityofthepriorworkassumesfullknowledgeoftheCNNmodel andtheniterativelyaddsimperceptibleperturbationstotheprobeimageviavariousopti- mizationschemes[32,33,70,136Œ141].Thisisunrealisticinreal-worldscenarios,sincethe attackermaynotbeabletoaccessthemodels. 2. Black-box attack:Generally,black-boxattacksarelaunchedbyqueryingtheoutputsofthe deployedAFRsystem[142],[143].Butitmaytakealargenumberofqueriestoobtaina reasonableadversarialimage[142].Further,mostCommercial-Off-The-Shelf(COTS)face matcherspermitonlyafewqueriesatatimetopreventsuchattacks. 3. Semi-whitebox attack:Here,awhite-boxmodelisutilized onlyduringtraining andthenad- versarialexamplesaresynthesizedduringinferencewithoutanyknowledgeofthedeployed AFRmodel. 
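The obfuscation and impersonation goals described earlier in this section can be made concrete with a few lines of code. The sketch below assumes some face matcher exposes an `embed` function returning a fixed-length feature vector (e.g., from ArcFace), and reuses the 0.28 cosine-similarity threshold at 0.1% FAR quoted in Figure 3.1; both the function name and the threshold are illustrative placeholders.

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, dtype=np.float32), np.asarray(b, dtype=np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def obfuscation_succeeds(embed, adv_probe, own_gallery, threshold=0.28):
    # Success: the matcher no longer links the adversarial probe to the attacker's own gallery image.
    return cosine(embed(adv_probe), embed(own_gallery)) < threshold

def impersonation_succeeds(embed, adv_probe, target_gallery, threshold=0.28):
    # Success: the matcher links the adversarial probe to the target victim's gallery image.
    return cosine(embed(adv_probe), embed(target_gallery)) >= threshold
```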
ThischapterproposesaGAN-basedadversarialfacesynthesismethod,namely AdvFaces , thatlearnstogeneratevisuallyrealisticadversarialfaceimagesthatarebystate-of- the-artAFRsystems.ThelatterpartofthechapterutilizestheconceptslearnedfromAdvFacesin ordertodefendAFRsytemsagainstanyadversarialattacktype. 3.2RelatedWork 3.2.1GenerativeAdversarialNetworks(GANs) GenerativeAdversarialNetworks[144]havebeenshowntobesuccessfulinawidevarietyof imagesynthesisapplications[145,146]suchasstyletransfer[147Œ149],image-to-imagetrans- lation[150,151],andrepresentationlearning[145,152,153].Ourobjectiveistosynthesizeface imagesthatarenotonlyvisuallyrealisticbutarealsoabletoevadeAFRsystems. 67 3.2.2AdversarialAttacksonImage Majorityofthepublishedpapershavefocusedonwhite-boxattacks,wherethehackerhasfull accesstothemodelthatisbeingattacked[33,69,70,136,137].Otherworksfocusedonoptimizing adversarialperturbationbyminimizinganobjectivefunctionfortargetedattackswhilesatisfying certainconstraints[136].However,thesewhite-boxapproachesarenotfeasibleinthefacerecog- nitiondomain,astheattackerisunlikelytohaveaccesstothedeployedAFRsystem.Wepropose afeed-forwardnetworkthatcanautomaticallygenerateanadversarialimagewithasingleforward passwithouttheneedforanyknowledgeofAFRsystemduringinference. Indeed,feed-forwardnetworkshavebeenusedforsynthesizingadversarialattacks.Baluja andFischerproposedadeepautoencoderthatlearnstotransformaninputimagetoanadversarial image[154].StudiesonsynthesizingadversarialinstancesviaGANsarelimitedinliterature[155Œ 157].Thesemethodsrequiresoftmaxprobabilitiesinordertoevadeanimage.Instead, weproposeanidentitylossfunctionbettersuitedforgeneratingadversarialfacesusingtheface embeddingsobtainedfromafacematcher. 3.2.3AdversarialAttacksonFaceRecognition Inliterature,studiesongeneratingadversarialexamplesinthefacerecognitiondomainarerela- tivelylimited.Bose etal. craftadversarialexamplesbysolvingconstrainedoptimizationsuchthat afacedetectorcannotdetectaface[158].In[159,160],perturbationsareconstrainedtotheeye- glassregionofthefaceandadversarialimageisgeneratedbygradient-basedmethods.However, thesemethodsrelyonwhite-boxmanipulationsoffacerecognitionmodels,whichisimpractical inreal-worldscenarios.Dong etal. proposedanevolutionaryoptimizationmethodforgenerating adversarialfacesinblack-boxsettings[142].However,theyrequireatleast1,000queriestothe targetAFRsystembeforearealisticadversarialfacecanbesynthesized.Song etal. employed aconditionalvariationautoencoderGANforcraftingadversarialfaceimagesinasemi-whitebox setting[161].However,theyonlyfocusedonimpersonationattacksandrequireatleast5images ofthetargetsubjectfortrainingandinference.Incontrast,wetrainaGANthatcanperformboth 68 StudyMethodDatasetAttacksSelf-Sup. Robustness Adv.Training[11](2017)Trainwithadv.ImageNet[162]FGSM[34] RobGAN[12](2019)Trainwithgeneratedadv.CIFAR10[163],Ima- geNet[162] PGD[35] Feat.Denoising[164](2019)Customnetworkarch.ImageNet[162]PGD[35] L2L[13](2019)Trainwithgeneratedadv.MNIST[165],CIFAR10[163]FGSM[34],PGD[35], C&W[136] X Detection Gong etal. [166](2017)BinaryCNNMNIST[165],CIFAR10[163]FGSM[34] UAP-D[167](2018)PCA+SVMMEDS[168],MultiPIE[169], PaSC[170] UAP[171] SmartBox[172](2018)AdaptiveNoiseYaleFace[173]DeepFool[141],EAD[174], FGSM[34] ODIN[175](2018)Out-of-distributionDetectionCIFAR10[163],Ima- geNet[162] OODsamples Goswami etal. [176](2019)SVMonAFRFiltersMEDS[168],PaSC[170], MBGC[177] Black-box,EAD[174] Steganalysis[178](2019)SteganlysisImageNet[162]FGSM[34],DeepFool[141], C&W[136] Massoli etal. 
[179](2020)MLP/LSTMonAFRFiltersVGGFace2[180]BIM[181],FGSM[34], C&W[136] Agarwal etal. [182](2020)ImageTransformationImageNet[162],MBGC[177]FGSM[34],PGD[35],Deep- Fool[141] MagNet[14](2017)AEMNIST[165],CIFAR10[163]FGSM[34],DeepFool[141], C&W[136] DefenseGAN[15](2018)GANMNIST[165],CIFAR10[163]FGSM[34],C&W[136] Feat.Distillation[183](2019)JPEG-compressionMNIST[165],CIFAR10[163]FGSM[34],DeepFool[141], C&W[136] NRP[184](2020)AEImageNet[162]FGSM[34] X A-VAE[185](2020)VariationalAELFW[8]FGSM[34],PGD[35], C&W[136] FaceGuard (thisstudy)Adv.Generator+DetectorLFW[8],Celeb-A[17], FGSM[34],PGD[35],Deep- Fool[141], X +FFHQ[18] AdvFaces[16],GFLM[32],Semantic[186] Table3.1Relatedworkinadversarialdefensesusedasbaselinesinourstudy.Unlikemajority ofpriorwork, FaceGuard isself-supervisedwherenopre-computedadversarialexamplesarere- quiredfortraining. obfuscationandimpersonationattacksandrequiresasinglefaceimageofthetargetsubject. 3.2.4DefensesAgainstAdversarialAttacks Inliterature,acommondefensestrategy,namely robustness istore-trainthewewishto defendwithadversarialexamples[11,13,34,35].However,adversarialtraininghasbeenshownto degradeaccuracyonreal(non-adversarial)images[187,188]. InordertopreventdegradationinAFRperformance,alargenumberofadversarialdefense mechanismsaredeployedasapre-processingstep,namely detection ,whichinvolvestraininga binarytodistinguishbetweenrealandadversarialexamples[166,167,172,179,189Œ 69 199].Theattacksconsideredinthesestudies[200Œ203]wereinitiallyproposedintheobject recognitiondomainandtheyoftenfailtodetecttheattacksinafeature-extractionnetworksetting, asinfacerecognition.Therefore,prevailingdetectorsagainstadversarialfacesaredemonstrated tobeeffectiveonlyinahighlyconstrainedsettingwherethenumberofsubjectsislimitedand edduringtrainingandtesting[167,172,179]. Anotherpre-processingstrategy,namely ,involvesautomaticallyremovingadver- sarialperturbationsintheinputimagepriortopassingthemtoafacematcher[14,15,184,204]. However,withoutadedicatedadversarialdetector,thesedefensesmayendupfipurifyingflareal faceimage,resultinginhighfalserejectrates. InTab.3.1,wesummarizeafewstudiesonadversarialdefensesthatareusedasbaselinesin ourwork. 3.3SynthesizingAdversarialFaces Semi-whiteboxsettingsaremoreappropriateforcraftingadversarialfaces;oncethenetworklearns togeneratetheperturbedinstancesbasedonasinglefacerecognitionsystem,attackscanbetrans- ferredtoanyblack-boxAFRsystems.However,pastapproaches,basedonGenerativeAdversarial Networks(GANs)[155Œ157],wereproposedintheimagedomainandrelyonsoft- maxprobabilities[155Œ157,161,205].Therefore,thenumberofobjectclassesareassumedto beknownduringtrainingandtesting.Facerecognitionsystemsdonotutilizethesoftmaxlayer for(asthenumberofidentitiesarenoted)insteadfeaturesfromthelastfullycon- nectedlayerareusedforcomparingfaceimages.Approachesforcraftingadversarialfacesinclude addingmakeup,eyeglasses,hat,orocclusionstofaces[159Œ161,205,206]. Weemphasizethefollowingrequirementsoftheadversarialfacegenerator: Adversarialfaceimagesshouldbeperceptuallyrealisticsuchthatahumanobservercan identifytheimageasalegitimatefaceimage. 
Thefacesneedtobeperturbedinamannersuchthattheycannotbeasthehacker 70 (a)ObfuscationAttack (b)ImpersonationAttack Figure3.4Oncetrained,AdvFacesautomaticallygeneratesanadversarialfaceimage.Duringan obfuscationattack,(a)theadversarialfaceappearstobeabenignexampleofCristianoRonaldo's face,however,itfailstomatchhisenrolledimage.AdvFacescanalsocombineCristiano'sprobe andBradPitt'sprobetosynthesizeanadversarialimagethatlookslikeCristianobutmatches Brad'sgalleryimage(b). ( obfuscationattack )orautomaticallymatchedtoatargetsubject( impersonationattack )by anAFRsystem. Theamountofperturbationshouldbecontrollablebythehackersothathecanexaminethe successofthelearningmodelasafunctionofamountofperturbation. Theadversarialexamplesshouldbe transferable and model-agnostic ( i . e .treatthetargetAFR modelasablack-box).Inotherwords,thegeneratedadversarialexamplesshouldhavehigh attacksuccessrateonotherblack-boxAFRsystemsaswell. Weproposeanautomatedadversarialfacesynthesismethod,named AdvFaces ,whichgenerates anadversarialimageforaprobefaceimageandalltheaboverequirements(seeFig.3.4). 71 Thecontributionsofthepaperareasfollows: 1. GAN-basedAdvFacesthatlearnstogeneratevisuallyrealisticadversarialfaceimagesthat arebystate-of-the-artAFRsystems. 2. AdversarialfacesgeneratedviaAdvFacesaremodel-agnosticandtransferable,andachieve highsuccessrateon5state-of-the-artautomatedfacerecognitionsystems. 3. Perceptualstudieswherehumanobserverssuggestthattheadversarialexamplesappearsim- ilartotheprobe. 4. Visualizingthefacialregions,wherepixelsareperturbedandanalyzingthetransferability ofAdvFaces. 5. Anopen-source 2 automatedadversarialfacegeneratorpermittinguserstocontroltheamount ofperturbation. 3.3.1ProposedMethodology Ourgoalistosynthesizeafaceimagethatvisuallyappearstopertaintothetargetface,yetauto- maticfacerecognitionsystemseitherincorrectlymatchesthesynthesizedimagetoanotherperson ordoesnotmatchtotarget'sgalleryimages.AdvFacescomprisesofagenerator G ,adiscriminator D ,andfacematcher(seeFigure3.5). Generator Theproposedgeneratortakesaninputfaceimage, x 2X ,andoutputsanimage, G ( x ) .Thegeneratorisconditionedontheinputimage x ;fordifferentinputfaces,wewillget differentsynthesizedimages. Sinceourgoalistoobtainanadversarialimagethatismetricallysimilartotheprobeinthe imagespace, x ,itisnotdesirabletoperturballthepixelsintheprobeimage.Forthisreason, wetreattheoutputfromthegeneratorasanadditivemaskandtheadversarialfaceisas x + G ( x ) .Ifthemagnitudeofthepixelsin G ( x ) isminimal,thentheadversarialimagecomprises mostlyoftheprobe x .Here,wedenote G ( x ) asanfiadversarialmaskfl.Inordertoboundthe 2 https://github.com/ronny3050/AdvFaces 72 Figure3.5Givenaprobefaceimage,AdvFacesautomaticallygeneratesanadversarialmaskthat isthenaddedtotheprobetoobtainanadversarialfaceimage. magnitudeoftheadversarialmask,weintroducea perturbationloss duringtrainingbyminimizing the L 2 norm 3 : L perturbation = E x [max( kG ( x ) k 2 )] (3.3.1) where 2 [0 ; 1 ) isahyperparameterthatcontrolstheminimumamountofperturbationallowed. 
Inordertoachieveourgoalofimpersonatingatargetsubject'sfaceorobfuscatingone'sown identity,weneedafacematcher, F ,tosupervisethetrainingofAdvFaces.Forobfuscationattack, ateachtrainingiteration,AdvFacestriestominimizethecosinesimilaritybetweenfaceembed- dingsoftheinputprobe x andthegeneratedimage x + G ( x ) viaan identity lossfunction: L identity = E x [ F ( x ; x + G ( x ))] (3.3.2) Foranimpersonationattack,AdvFacesmaximizesthecosinesimilaritybetweenthefaceembed- dingsofarandomlychosentarget'sprobe, y ,andthegeneratedadversarialface x + G ( x ) via: L identity = E x [1 F ( y ; x + G ( x ))] (3.3.3) 3 Forbrevity,wedenote E x E x 2X . 73 ObfuscationAttack AdvFacesGFLM[32]PGD[33]FGSM[70] AttackSuccessRate(%)@0.1%FAR FaceNet[6]99.6723.3499.70 99.96 SphereFace[44]97.2229.49 99.34 98.71 ArcFace[9] 64.53 03.4333.2535.30 COTS-A 82.98 08.8918.7432.48 COTS-B 60.71 05.0501.4918.75 StructuralSimilarity 0.95 0.01 0.82 0.120.29 0.060.25 0.06 ComputationTime(s) 0.01 3.2211.740.03 ImpersonationAttack AdvFacesA 3 GN[161]PGD[33]FGSM[70] AttackSuccessRate(%)@0.1%FAR FaceNet[6]20.85 0.4005.99 0.19 76.79 0.26 13.04 0.12 SphereFace[44] 20.19 0.27 07.94 0.1909.03 0.3902.34 0.03 ArcFace[9] 24.30 0.44 17.14 0.2919.50 1.9508.34 0.21 COTS-A 20.75 0.35 15.01 0.3001.76 0.1001.40 0.08 COTS-B 19.85 0.28 10.23 0.5012.49 0.2404.67 0.16 StructuralSimilarity 0.92 0.02 0.69 0.040.77 0.040.48 0.75 ComputationTime(s) 0.01 0.0411.740.03 White-boxmatcher(usedfortraining) Black-boxmatcher(neverusedintraining) Table3.2Attacksuccessratesandstructuralsimilaritiesbetweenprobeandgalleryimagesfor obfuscationandimpersonationattacks.Attackratesforobfuscationcomprisesof484,514com- parisonsandthemeanandstandarddeviationacross10-foldsforimpersonationreported.The meanandstandarddeviationofthestructuralsimilaritiesbetweenadversarialandprobeimages alongwiththetimetakentogenerateasingleadversarialimage(onaQuadroM6000GPU)also reported. Theperturbationandidentitylossfunctionsenforcethenetworktolearnthesalientfacial regionsthatcanbeperturbedminimallyinordertoevadeautomaticfacerecognitionsystems. Discriminator AkintopreviousworksonGANs[144,150],weintroduceadiscriminatorin ordertoencourageperceptualrealismofthegeneratedimages.Weuseafully-convolutionnetwork asapatch-baseddiscriminator[150].Here,thediscriminator, D ,aimstodistinguishbetweena probe, x ,andageneratedadversarialfaceimage x + G ( x ) viaaGANloss: L GAN = E x [log D ( x )]+ E x [log(1 D ( x + G ( x )))] (3.3.4) 74 Finally,AdvFacesistrainedinanend-to-endfashionwiththefollowingobjectives: min D L D = GAN (3.3.5) min G L G = L GAN + i L identity + p L perturbation (3.3.6) where i and p arehyper-parameterscontrollingtherelativeimportanceofidentityandperturba- tionlosses,respectively.Notethat L GAN and L perturbation encouragethegeneratedimagestobe visuallysimilartotheoriginalfaceimages,while L identity optimizesforahighattacksuccessrate. Aftertraining,thegenerator G cangenerateanadversarialfaceimageforanyinputimageandcan betestedonanyblack-boxfacerecognitionsystem. 3.3.2ExperimentalSettings EvaluationMetrics WequantifytheeffectivenessoftheadversarialattacksgeneratedbyAdv- Facesandotherstate-of-the-artbaselinesvia(i) attacksuccessrate and(ii) structuralsimilarity (SSIM) . 
Theattacksuccessratefor obfuscationattack iscomputedas, AttackSuccessRate = (No.ofComparisons <˝ ) TotalNo.ofComparisons (3.3.7) whereeachcomparisonconsistsofasubject'sadversarialprobeandanenrollmentimage.Here, ˝ isapre-determinedthresholdcomputedat,say,0.1%FAR 4 .Attacksuccessratefor impersonation attack isas, AttackSuccessRate = (No.ofComparisons ˝ ) TotalNo.ofComparisons (3.3.8) Here,acomparisoncomprisesofanadversarialimagesynthesizedwithatarget'sprobeand matchedtothetarget'senrolledimage.Weevaluatethesuccessratefortheimpersonationset- tingvia10-foldcross-validationwhereeachfoldconsistsofarandomlychosentarget. Similartopriorstudies[161],inordertomeasurethesimilaritybetweentheadversarialex- 4 Foreachfacematcher,wepre-computethethresholdat 0 : 1% FARonallpossibleimagepairsinLFW.For e . g ., threshold@ 0 : 1% FARforArcFaceis 0 : 28 . 75 ampleandtheinputface,wecomputethestructuralsimilarityindex(SSIM)betweentheimages. SSIMisanormalizedmetricbetween 1 (completelydifferentimagepairs)to 1 (identicalimage pairs). Datasets WetrainAdvFacesonCASIA-WebFace[10]andthentestonLFW[8] 5 . CASIA-WebFace [10]iscomprisedof494,414faceimagesbelongingto10,575different subjects.Weremoved84subjectsthatarealsopresentinLFWandthetestingimagesinthis chapter. LFW [8]contains13,233web-collectedimagesof5,749differentsubjects.Inorderto computetheattacksuccessrate,weonlyconsidersubjectswithatleasttwofaceimages. Afterthis9,614faceimagesof1,680subjectsareavailableforevaluation. Allthetestingimagesinthischapterhavenoidentityoverlapwiththetrainingset,CASIA- WebFace[10]. ExperimentalSettings WeuseADAMoptimizersinTwwith 1 =0 : 5 and 2 =0 : 9 for theentirenetwork.Eachmini-batchconsistsof 32 faceimages.WetrainAdvFacesfor200,000 stepswithaedlearningrateof 0 : 0001 .Sinceourgoalistogenerateadversarialfaceswithhigh successrate,theidentitylossisofutmostimportance.Weempiricallyset i =10 : 0 and p =1 : 0 . Wetraintwoseparatemodelsandset =3 : 0 and =8 : 0 forobfuscationandimpersonation attacks,respectively.ArchitecturedetailsareprovidedintheAddendum(Sec.3.3.9). FaceRecognitionSystems Forallourexperiments,weemploy5state-of-the-artfacematchers 6 . Threeofthemarepubliclyavailable,namely,FaceNet[6],SphereFace[44],andArcFace[9]. Wealsoreportourresultsontwocommercial-off-the-shelf(COTS)facematchers,COTS-Aand COTS-B 7 .WeuseFaceNet[6]asthewhite-boxfacerecognitionmodel, F ,duringtraining. All 5 TrainingonCASIA-WebFaceandevaluatingonLFWisacommonapproachinfacerecognitionliterature[9,44] 6 Alltheopen-sourceandCOTSmatchersachieve99%accuracyonLFWunderLFWprotocol. 7 BothCOTS-AandCOTS-ButilizeCNNsforfacerecognition.COTS-BisoneofthetopperformersintheNIST OngoingFaceRecognitionVendorTest(FRVT)[207]. 76 GalleryProbeAdvFacesGFLM[32]PGD[33]FGSM[70] 0.680.140.260.270.04 0.380.080.120.210.02 (a)ObfuscationAttack Target'sGalleryTarget'sProbeProbeAdvFacesA 3 GN[161]FGSM[70] 0.780.100.300.290.36 0.800.150.340.330.42 (b)ImpersonationAttack Figure3.6AdversarialfacesynthesisresultsonLFWdatasetin(a)obfuscationand(b)imperson- ationattacksettings(cosinesimilarityscoresobtainedfromArcFace[9]withthreshold@ 0 : 1% FAR =0 : 28 ).Theproposedmethodsynthesizesadversarialfacesthatareseeminglyinconspic- uousandmaintainhighperceptualquality.AdditionalexamplesareavailableintheAddendum (Sec.3.3.9). thetestingimagesinthischapteraregeneratedfromthesamemodel(trainedonlywithFaceNet) andtestedondifferentmatchers. 
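The success-rate definitions in Eqs. (3.3.7)-(3.3.8) reduce to simple threshold counts. A short sketch follows; the 0.1% FAR threshold τ is assumed to be pre-computed on the impostor scores of the evaluation set (e.g., roughly 0.28 for ArcFace on LFW, as stated above).

```python
import numpy as np

def threshold_at_far(impostor_scores, far=0.001):
    # Threshold at which `far` of impostor (different-subject) pairs are falsely accepted.
    return float(np.quantile(np.asarray(impostor_scores), 1.0 - far))

def obfuscation_success_rate(adv_vs_own_gallery_scores, tau):
    scores = np.asarray(adv_vs_own_gallery_scores)   # adversarial probe vs. same subject's enrollment
    return 100.0 * float(np.mean(scores < tau))      # Eq. (3.3.7)

def impersonation_success_rate(adv_vs_target_gallery_scores, tau):
    scores = np.asarray(adv_vs_target_gallery_scores)  # adversarial probe vs. target's enrollment
    return 100.0 * float(np.mean(scores >= tau))       # Eq. (3.3.8)
```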
3.3.3ComparisonwithState-of-the-Art Wecompareouradversarialfacesynthesismethodwithstate-of-the-artmethodsthathavespecif- icallybeenimplementedorproposedforfaces,includingGFLM[32],PGD[33],FGSM[70], 77 andA 3 GN[161] 8 .InTable3.2,wethatcomparedtothestate-of-the-art,AdvFacesgener- atesadversarialfacesthataresimilartotheprobe3.6.Moreover,theadversarialimagesattaina highobfuscationattacksuccessrateon4state-of-the-artblack-boxAFRsystemsinbothobfusca- tionandimpersonationsettings.AdvFaceslearnstoperturbthesalientregionsoftheface,unlike PGD[33]andFGSM[70],whichaltereverypixelintheimage.GFLM[32],ontheotherhand,ge- ometricallywarpsthefaceimagesandthereby,resultsinlowstructuralsimilarity.Inaddition,the state-of-the-artmatchersarerobusttosuchgeometricdeformationwhichexplainsthelowsuccess rateofGFLMonfacematchers.A 3 GN,anotherGAN-basedmethod,however,failstoachievea reasonablesuccessrateinanimpersonationsetting. 3.3.4AblationStudy Inordertoanalyzetheimportanceofeachmoduleinoursystem,inFigure3.7,wetrainthreevari- antsofAdvFacesforcomparisonbyremovingthediscriminator( D ),perturbationloss L perturbation , andidentityloss L identity ,respectively.Thediscriminatorhelpstoensurethevisualqualityofthe synthesizedfacesaremaintained.Withthegeneratoralone,undesirableartifactsareintroduced. Withouttheproposedperturbationloss,perturbationsintheadversarialmaskareunboundedand therefore,leadstoalackinperceptualquality.Theidentitylossisimperativeinensuringanad- versarialimageisobtained.Withouttheidentityloss,thesynthesizedimagecannotevadestate-of- the-artfacematchers.WethateverycomponentofAdvFacesisnecessaryinordertoobtain anadversarialfacethatisnotonlyperceptuallyrealisticbutcanalsoevadestate-of-the-artface matchers. 3.3.5WhatisAdvFacesLearning? Via L perturbation ,duringtraining,AdvFaceslearnstoperturbonlythesalientfacialregionsthat canevadethefacematcher, F (FaceNet[6]inourcase).InFigure3.8,AdvFacessynthesizesthe adversarialmaskscorrespondingtotheprobes.Wethenthresholdthemasktoextractpixelswith 8 Wetrainthebaselinesusingtheirofimplementations(detailedintheAddendum(Sec.3.3.9)). 78 Inputw/o D w/o L prt w/o L idt withall Figure3.7VariantsofAdvFacestrainedwithoutthediscriminator,perturbationloss,andidentity loss,respectively.EverycomponentofAdvFacesisnecessary. perturbationmagnitudesexceeding 0 : 40 .Itcanbeinferredthattheeyebrows,eyeballs,andnose containhighlydiscriminativeinformationthatanAFRsystemutilizestoidentifyanindividual. Therefore,perturbingthesesalientregionsareenoughtoevadestate-of-the-artfacerecognition systems. 3.3.6TransferabilityofAdvFaces InTable3.2,wethatattackssynthesizedbyAdvFaceswhentrainedonawhite-boxmatcher (FaceNet),cansuccessfullyevade5otherfacematchersthatarenotutilizedduringtrainingin bothobfuscationandimpersonationsettings.Inordertoinvestigatethetransferabilityproperty ofAdvFaces,weextractfaceembeddingsofrealimagesandtheircorrespondingadversarialim- ages,undertheobfuscationsetting,viathewhite-boxmatcher(FaceNet)andablack-boxmatcher (ArcFace).Intotal,weextractfeaturevectorsfrom1,456faceimagesof10subjectsintheLFW dataset[8].InFigure3.9,weplotthecorrelationheatmapbetweenfacefeaturesofrealimages, theircorrespondingadversarialmasksandadversarialimages.First,weobservethatfaceem- beddingsofrealimagesextractedbyFaceNetandArcFacearecorrelatedinasimilarfashion. 79 ProbeAdv.MaskVisualizationAdv.Image 0.12 0.26 Figure3.8State-of-the-artfacematcherscanbeevadedbyslightlyperturbingsalientfacialregions, suchaseyebrows,eyeballs,andnose(cosinesimilarityobtainedviaArcFace[9]). 
Figure3.9CorrelationbetweenfacefeaturesextractedviaFaceNetandArcFacefrom1,456images belongingto10subjects. Thisindicatesthatbothmatchersextractfeatureswithrelatedpairwisecorrelations.Consequently, perturbingsalientfeaturesforFaceNetcanleadtohighattacksuccessratesforArcFaceaswell. Thesimilarityamongthecorrelationdistributionsofbothmatcherscanalsobeobservedwhen 80 Figure3.102Dt-SNEvisualizationoffacerepresentationsextractedviaFaceNetandArcFace from1,456imagesbelongingto10subjects. adversarialmasksandadversarialimagesareinputtothematchers.Thatis,receptivefor automaticfacerecognitionsystemsattendtosimilarregionsintheface.Tofurtherillustratethe distributionsoftheembeddingsofrealandsynthesizedimages,weplotthe2Dt-SNEvisualization ofthefaceembeddingsforthe10subjectsinFigure3.10.Theidentityclusteringscanbeclearly observedfrombothrealandadversarialimages.Inparticular,theadversarialcounterpartofeach subjectformsanewclusterthatdrawsclosertotheadversarialclusteringsofothersubjects.This showsthatAdvFacesperturbsonlysalientpixelsrelatedtofaceidentitywhilemaintainingase- manticmeaninginthefeaturespace,resultinginasimilarmanifoldofsynthesizedfacestothatof realfaces. 3.3.7EffectofPerturbationAmount Theperturbationloss, L perturbation isboundedbyahyper-parameter, , i . e .,the L 2 normofthe adversarialmaskmustbeatleast .Withoutthisconstraint,theadversarialmaskbecomesablank imagewithnochangestotheprobe.With ,wecanobserveatrade-offbetweentheattacksuccess 81 Figure3.11Trade-offbetweenattacksuccessrateandstructuralsimilarityforimpersonationat- tacks.Wechoose =8 : 0 . rateandthestructuralsimilaritybetweentheprobeandsynthesizedadversarialface(Fig.3.11).A higher leadstolessperturbationrestriction,resultinginahigherattacksuccessrateatthecostof alowerstructuralsimilarity.Foranimpersonationattack,thisimpliesthattheadversarialimage maycontainfacialfeaturesfromboththehackerandthetarget.Inourexperiments,wechose =8 : 0 and =3 : 0 forimpersonationandobfuscationattacks,respectively. 3.3.8HumanPerceptualStudy For500realfaceimages(probes),wegenerate500correspondingadversarialexamplesviaAd- vFaces,GFLM[32],A 3 GN[161],PGD[33],andFGSM[70].Wethenperformedauserstudy onAmazonMechanicalTurk(AMT).Aworkerisshownaprobealongwiththe5adversarial faces.Theworkerthenhasunlimitedtimetodecidewhichadversarialface,amongthe5possible choices,isthemostsimilartotheprobe. 82 (a)Probe (b)AdvFaces (c)GFLM[32] (d)Probe (e)AdvFaces (f)PGD[33] Figure3.12Examplefailurecaseswherehumanobserversvotedanadversarialimagesynthesized by(c)GFLM[32]and(f)PGD[33]tobeclosertotheprobeface(a),(d)comparedtoAdvFaces (b),(e). Table3.3Foreachmethod,theaverageandstandarddeviation( % )ofthenumberoftimesworkers chosethesynthesizedimagetobeclosesttotheprobe. Method HitRate(%) AdvFaces 62.06 5.06 GFLM[32] 23 : 52 3 : 82 A 3 GN[161] 00 : 60 0 : 84 PGD[33] 12 : 80 3 : 88 FGSM[70] 00 : 38 0 : 73 Intotal,wecomputeresultsfrom100differentworkers.FromTables3.2and3.3,wethat AdvFacesgeneratesadversarialfacesthatarenotonlyeffectiveinevadingfacematchers,butare alsovisuallysimilartotheprobesandoutperformsthestate-of-the-artadversarialfacesynthesis methods.Indeed,someadversarialfaceimagessynthesizedbythebaselinesarevotedtobecloser totheprobe(seeFigure3.12),however,comparedtoAdvFaces,thesemethodshavealowsuccess rate(seeTable3.2). 83 3.3.9Addendum 3.3.9.1ImplementationDetails AdvFaces isimplementedusingTwr1.12.0.AsingleNVIDIAQuadroM6000GPUis usedfortrainingandtesting. 
DataPreprocessing AllfaceimagesarepassedthroughMTCNNfacedetector[41]todetect velandmarks(twoeyes,nose,andtwomouthcorners).Viasimilaritytransformation,theface imagesarealigned.Aftertransformation,theimagesareresizedto 160 160 .Priortotraining andtesting,eachpixelintheRGBimageisnormalizedbysubtracting127.5anddividingby128. Architecture Let c7s1-k bea 7 7 convolutionallayerwith k andstride 1 . dk denotes a 4 4 convolutionallayerwith k andstride 2 . Rk denotesaresidualblockthatcontainstwo 3 3 convolutionallayers. uk denotesa 2 upsamplinglayerfollowedbya 5 5 convolutional layerwith k andstride 1 .WeapplyInstanceNormalizationandBatchNormalizationtothe generatoranddiscriminator,respectively.WeuseLeakyReLUwithslope0.2inthediscriminator andReLUactivationinthegenerator.Thearchitecturesofthetwomodulesareasfollows: Generator: c7s1-64,d128,d256,R256,R256,R256,u128,u64,c7s1-3 Discriminator: d32,d64,d128,d256,d512 A 1 1 convolutionallayerwith 3 andstride 1 isattachedtothelastconvolutionallayerof thediscriminatorforthepatch-basedGANloss L GAN . Weapplythe tanh activationfunctiononthelastconvolutionlayerofthegeneratortoensure thatthegeneratedimage 2 [ 1 ; 1] .Inthechapter,wedenotedtheoutputofthetanhlayerasan fiadversarialmaskfl, G ( x ) 2 [ 1 ; 1] and x 2 [ 1 ; 1] .Theadversarialimageiscomputedas x adv =2 clamp G ( x )+ x +1 2 1 0 1 .Thisensures G ( x ) caneitheraddorsubtractpixelsfrom x when G ( x ) 6 =0 .When G ( x ) ! 0 ,then x adv ! x . 84 Theoverallalgorithmdescribingthetrainingprocedureof AdvFaces canbefoundinAlgo- rithm1. Algorithm1 Training AdvFaces .Allexperimentsinthisworkuse =0 : 0001 , 1 =0 : 5 , 2 =0 : 9 , i =10 : 0 , p =1 : 0 , m =32 . Weset =3 : 0 (obfuscation), =8 : 0 (impersonation). 1: Input 2: X TrainingDataset 3: F Cosinesimilaritybetweenanimagepairobtainedbyfacematcher 4: G Generatorwithweights G 5: D Discriminatorwithweights D 6: m Batchsize 7: Learningrate 8: for numberoftrainingiterations do 9: Sampleabatchofprobes f x ( i ) g m i =1 ˘X 10: if impersonationattack then 11: Sampleabatchoftargetimages y ( i ) ˘X 12: ( i ) = G (( x ( i ) ;y ( i ) ) 13: elseif obfuscationattack then 14: ( i ) = G ( x ( i ) ) 15: endif 16: x ( i ) adv = x ( i ) + ( i ) 17: L perturbation = 1 m P m i =1 max jj ( i ) jj 2 18: if impersonationattack then 19: L identity = 1 m h P m i =1 F x ( i ) ;x ( i ) adv 20: elseif obfuscationattack then 21: L identity = 1 m h P m i =1 1 F y ( i ) ;x ( i ) adv 22: endif 23: L G GAN = 1 m h P m i =1 log 1 D ( x ( i ) adv ) 24: L D = 1 m P m i =1 h log D ( x ( i ) ) + log 1 D ( x ( i ) adv ) 25: L G = L G GAN + i L identity + p L perturbation 26: G = Adam ( O G L G ; G ;; 1 ; 2 ) 27: D = Adam ( O D L D ; D ;; 1 ; 2 ) 28: endfor 3.3.9.2EffectonCosineSimilarity InFigure3.13weseetheeffectoncosinesimilarityscoreswhenadversarialfaceimagessynthe- sizedbyAdvFacesisintroducedtoablack-boxfacematcher,ArcFace[9].Amajority( 64 : 53% ) 85 Figure3.13ShiftincosinesimilarityscoresforArcFace[9]beforeandafteradversarialattacks generatedviaAdvFaces. ofthescoresfallbelowthethresholdat 0 : 1% FARcausingtheAFRsystemtofalselyrejectunder obfuscationattack.Intheimpersonationattacksetting,thesystemfalselyaccepts 24 : 30% ofthe imagepairs. 
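Two implementation details quoted in Sec. 3.3.9.1, the pixel normalization after MTCNN alignment and the clamped composition of the adversarial image from the tanh-valued mask, can be summarized as follows. NumPy is used here for illustration; the original code is TensorFlow.

```python
import numpy as np

def normalize(aligned_face_uint8):
    # After MTCNN alignment and resizing to 160x160: (pixel - 127.5) / 128, giving values in ~[-1, 1].
    return (np.asarray(aligned_face_uint8, dtype=np.float32) - 127.5) / 128.0

def compose_adversarial(x, adv_mask):
    # x and the tanh-output mask G(x) are both in [-1, 1];
    # x_adv = 2 * clamp((G(x) + x + 1) / 2, 0, 1) - 1, so G(x) -> 0 implies x_adv -> x.
    return 2.0 * np.clip((adv_mask + x + 1.0) / 2.0, 0.0, 1.0) - 1.0
```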
3.3.9.3StructuralSimilarity Imagecomparisontechniques,suchMeanSquaredError(MSE)orPeakSignal-to-NoiseRatio (PSNR),estimatetheabsoluteerrors,disregardingthe perceptual differences;ontheotherhand, SSIMisaperception-basedmodelthatconsidersimagedifferencesasperceivedchangeinstruc- turalinformation,whilealsoincorporatingimportantperceptualphenomena,includingbothlu- minancemaskingandcontrastmaskingterms.Forinstance,considertheimagepaircomprising oftwoimagesofMingXi.Wecannoticethatperceptually,theimagepairsaresimilar,butthis perceptualsimilarityisnotappropriatelyinMSEandPSNR.Since,SSIMisanormal- izedsimilaritymetric,itisbettersuitedforourapplicationwhereafaceimagepairissubjectively judgedbyhumanoperators. 86 (a)Probe (b)Adversarial SSIM:00.96MSE:40.82PSNR:32.02 Figure3.15Left:RealfaceimagesintheLFWdataset.Right:Adversarialimagessynthesizedvia AdvFacesunderobfuscationsetting. 3.3.9.4BaselineImplementationDetails Allthestate-of-the-artbaselinesinthechapterareimplementationsproposedforevad- ingfacerecognitionsystems. 87 FGSM[70] WeusetheCleverhansimplementation 9 ofFGSMonFaceNet.Thisimplementation supportsbothobfuscationandimpersonationattacks.Theonlywaschanging = 0 : 01 to =0 : 08 inordertocreatemoreeffectiveattacks. PGD[33] WeuseavariantofPGDproposedforfacerecognitionsystems 10 .Orig- inally,thisimplementationisproposedforimpersonationattacks,however,forobfuscationwe randomlychooseatargetotherthangenuinesubject.Wedonotmakeanytothe parameters. GFLM[32] Codeforthislandmark-basedattacksynthesismethodispubliclyavailable 11 .This methodreliesonsoftmaxprobalitiesimplyingthatthetrainingandtestingidentitiesareed. Originally,theistrainedonCASIA-WebFace.However,forafairerevaluation,we trainedafaceonLFWandthenrantheattack. A 3 GN[161] Tothebestofourknowledge,thereisnopubliclyavailableimplementationof A 3 GN.Wemadethefollowingtoachieveaneffectivebaseline: TheauthorsoriginallyusedArcFace[9]asthetargetmodel.Sinceallotherbaselinesemploy FaceNetasthetargetmodel,wealsousedFaceNetfortrainingA 3 GN. Originally,acycle-consistencylosswasproposedforcontentpreservation.However,we werenotabletoreproducethisandtherefore,optedforthesame L 1 normloss,butwithout thesecondgenerator.Thisgreatlyhelpsinthevisualqualityofthegeneratedadversarial image.Thatis,wemodiEquation3[161],from L rec = E x;z [ jj x G 2 ( G 1 ( x;z )) jj 1 ] to L rec = E x;z [ jj x G 1 ( x;z ) jj 1 ] 9 https://githubw/cleverhans/tree/master/examples/facenet adversarial faces 10 https://github.com/ppwwyyxx/Adversarial-Face-Attack 11 https://github.com/alldbi/FLM 88 3.4DefendingAgainstAdversarialFaces Theaccuracy,usability,andtouchlessacquisitionofstate-of-the-art(SOTA)AFRsystemshave ledtotheirubiquitousadoptioninaplethoraofdomains.However,thishasalsoinadvertently sparkedacommunityofattackersthatdedicatetheirtimeandefforttomanipulatefaceseither physically[208,209]ordigitally[210],inordertoevadeAFRsystems[82].AFRsystemshave beenshowntobevulnerabletoadversarialattacksresultingfromperturbinganinputprobe[16, 32,142,186].Evenwhentheamountofperturbationisimperceptibletothehumaneye,such adversarialattackscandegradethefacerecognitionperformanceofSOTAAFRsystems[16]. Withthegrowingdisseminationoffifakenewsflandfideepfakesfl[81],researchgroupsandsocial mediaplatformsalikearepushingtowardsgeneralizabledefenseagainstcontinuouslyevolving adversarialattacks. 
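Sec. 3.3.9.3 argues that SSIM tracks perceived similarity better than MSE or PSNR. A sketch of computing all three for a probe/adversarial pair is given below, assuming scikit-image is available; the exact SSIM settings behind the reported numbers are not specified here, so this is only a generic reference implementation.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_similarity(probe_rgb, adversarial_rgb):
    """probe_rgb, adversarial_rgb: uint8 RGB arrays of identical shape."""
    p = np.asarray(probe_rgb, dtype=np.float64)
    a = np.asarray(adversarial_rgb, dtype=np.float64)
    mse = float(np.mean((p - a) ** 2))
    psnr = float(peak_signal_noise_ratio(p.astype(np.uint8), a.astype(np.uint8), data_range=255))
    ssim = float(structural_similarity(rgb2gray(p / 255.0), rgb2gray(a / 255.0), data_range=1.0))
    return {"MSE": mse, "PSNR": psnr, "SSIM": ssim}
```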
Figure3.16LeonardoDiCaprio'srealfacephoto(a)enrolledinthegalleryand(b)hisprobeim- age 12 ;(c)Adversarialprobesynthesizedbyastate-of-the-art(SOTA)adversarialfacegenerator, AdvFaces[16];(d)Proposedadversarialdefenseframework,namely FaceGuard takes(c)asin- put,detectsadversarialimages,localizesperturbedregions,andoutputsafifacedevoid ofadversarialperturbations.ASOTAfacerecognitionsystem,ArcFace,failstomatchLeonardo's adversarialface(c)to(a),however,thefacecansuccessfullymatchto(a).Cosinesimilar- ityscores( 2 [ 1 ; 1] )obtainedviaArcFace[9]areshownbelowtheimages.Ascoreabove 0.36 (threshold@ 0 : 1% FalseAcceptRate)indicatesthattwofacesareofthesamesubject. Aconsiderableamountofresearchhasfocusedonsynthesizingadversarialattacks[16,32,34, 35,141,186].Obfuscationattempts(facesareperturbedsuchthattheycannotbeas theattacker)aremoreeffective[16],computationallyeftosynthesize[34,35],andwidely 11 https://bit.ly/2IkfSxk 89 0.21 (a)[16] 0.27 (b)[34] 0.28 (b)[35] 0.32 (c)[186] 0.34 (d)[32] 0.35 (e)[141] Figure3.17 (TopRow) Adversarialfacessynthesizedvia 6 adversarialattacksusedinourstudy. (BottomRow) Correspondingadversarialperturbations(grayindicatesnochangefromtheinput). Noticethediversityintheperturbations.ArcFacescoresbetweenadversarialimageandtheunal- teredgalleryimage(notshownhere)aregivenbeloweachimage.Ascoreabove 0.36 indicates thattwofacesareofthesamesubject.Zoominfordetails. adopted[211]comparedtoimpersonationattacks(perturbedfacescanautomaticallymatchtoa targetsubject).Similartopriordefenseefforts[172,179],thissectionofthechapterfocuseson defendingagainstobfuscationattacks(seeFig.3.16).Givenaninputprobeimage, x ,anadversarial generatorhastworequirementsundertheobfuscationscenario:(1)synthesizeanadversarialface image, x adv = x + ,suchthatSOTAAFRsystemsfailtomatch x adv and x ,and(2)limitthe magnitudeofperturbation jj jj p suchthat x adv appearsverysimilarto x tohumans. Anumberofapproacheshavebeenproposedtodefendagainstadversarialattacks.Theirmajor shortcomingis generalizability tounseenadversarialattacks.Adversarialfaceperturbationsmay vary(seeFig.3.17).Forinstance,gradient-basedattacks,suchasFGSM[35]and PGD[35],perturbeverypixelinthefaceimage,whereas,AdvFaces[16]andSemanticAdv[186] perturbonlythesalientfacialregions, e.g. ,eyes,nose,andmouth.Ontheotherhand,GFLM[32] performsgeometricwarpingtotheface.Sincetheexacttypeofadversarialperturbationmaynotbe knownapriori,adefensesystemtrainedonasubsetofadversarialattacktypesmayhavedegraded performanceonotherunseenattacks. Tothebestofourknowledge,wetakethesteptowardsacompletedefenseagainstad- versarialfacesbyintegratinganadversarialfacegenerator,adetector,andaintoa 90 Figure3.18 FaceGuard employsadetector( D )tocomputeanadversarialscore.Scoresbelow detectionthreshold( ˝ )passestheinputtoAFR,andhighvalueinvokesaandsendsthe facetotheAFRsystem. framework,namely FaceGuard (seeFig.3.18).Robustnesstounseenadversarialattacksisim- partedviaastochasticgeneratorthatoutputsdiverseperturbationsevadinganAFRsystem,while adetectorcontinuouslylearnstodistinguishthemfromrealfaces.Concurrently,aremoves theadversarialperturbationsfromthesynthesizedimage. Thisworkmakesthefollowingcontributions: Anewself-supervisedframework,namely FaceGuard ,fordefendingagainstadversarialface images. FaceGuard combinesofadversarialtraining,detection,andinto adefensemechanismtrainedinanend-to-endmanner. 
Withtheproposeddiversityloss,ageneratorisregularizedtoproducestochasticandchal- lengingadversarialfaces.Weshowthatthediversityinoutputperturbationsissuffor improving FaceGuard 'srobustnesstounseenattackscomparedtoutilizingpre-computed trainingsamplesfromknownattacks. Synthesizedadversarialfacesaidthedetectortolearnatightdecisionboundaryaroundreal faces. FaceGuard 'sdetectorachievesSOTAdetectionaccuraciesof 99 : 81% , 98 : 73% ,and 99 : 35% on 6 unseenattacksonLFW[8],Celeb-A[17],andFFHQ[18]. Asthegeneratortrains,aconcurrentlyremovesperturbationsfromthesynthesized adversarialfaces.Withtheproposedloss,thedetectoralsoguidesstraining toensureimagesaredevoidofadversarialperturbations.At0.1%FalseAccept 91 (a)AdversarialTraining[11] (b)Detection[166] Figure3.19(a)AdversarialtrainingdegradesAFRperformanceofFaceNetmatcher[6]onreal facesinLFWdatasetcomparedtostandardtraining.(b)Abinarytrainedtodistinguish betweenrealfacesandFGSM[34]attacksfailstodetectunseenattacktype,namelyPGD[35]. Rate, FaceGuard 'senhancestheTrueAcceptRateofArcFace[9]from 34 : 27% undernodefenseto 77 : 46% . 3.4.1LimitationsofState-of-the-ArtDefenses Robustness. Adversarialtrainingisregardedasoneofthemosteffectivedefensemethod[12, 34,35]onsmalldatasetsincludingMNISTandCIFAR10.Whetherthistechniquecanscaleto largedatasetsandavarietyofdifferentattacktypes(perturbationsets)hasnotyetbeenshown. Adversarialtrainingisformulatedas[34,35]: min E ( x;y ) ˘P data max 2 ` ( f ( x + ) ;y ) ; (3.4.1) where ( x;y ) ˘P data isthe(image,label)jointdistributionofdata, f ( x ) isthenetworkparameter- izedby ,and ` ( f ( x ) ;y ) isthelossfunction(usuallycross-entropy).Sincethegroundtruthdata distribution, P data ,isnotknowninpractice,itislaterreplacedbytheempiricaldistribution.Here, thenetwork, f ismaderobustbytrainingwithanadversarialnoise( )thatmaximallyincreases theloss.Inotherwords,adversarialtraininginvolvestrainingwiththe strongest adversarialattack. 92 Figure3.20Overviewoftrainingtheproposed FaceGuard inaself-supervisedmanner.An adver- sarialgenerator , G ,continuouslylearnstosynthesizechallenginganddiverseperturbationsthat evadeafacematcher.Atthesametime,a detector , D ,learnstodistinguishbetweenthesynthe- sizedadversarialfacesandrealfaceimages.Perturbationsresidinginthesynthesizedadversarial facesareremovedviaa , P ur . Thegeneralizationofadversarialtraininghasbeeninquestion[12,13,187,188,212].It wasshownthatadversarialtrainingcanreduceaccuracyonrealexam- ples[187,188].Inthecontextoffacerecognition,weillustratethisbytrainingtwofacematchers onCASIA-WebFace:(i)FaceNet[6]trainedviathestandardtrainingprocess,and(ii)FaceNet[6] byadversarialtraining(FGSM 13 ).Wethencomputefacerecognitionperformanceacrosstraining iterationsonaseparatetestingdataset,LFW[8].Fig.3.19ashowsthatadversarialtrainingdrops theaccuracyfrom 99 : 13% ! 98 : 27% .Wegainthefollowinginsight:adversarialtrainingmay degradeAFRperformanceonrealfaces. Detection. Detection-basedapproachesemployapre-processingsteptofidetectflwhetheraninput faceisrealoradversarial[166,167,179,191].Acommonapproachistoutilizeabinary, D ,thatmapsafaceimage, x 2 R H W C to f 0 ; 1 g ,where 0 indicatesarealand 1 anadversarial face.WetrainabinarytodistinguishbetweenrealandFGSMattacksamplesinCASIA- WebFace[10].InFig.3.19b,weevaluateitsdetectionaccuracyonFGSMandPGDsamplesin LFW[8].Wethatprevailingdetection-baseddefenseschemesmayovtothe adversarialattacksutilizedfortraining. 13 Withmaxperturbationhyperparameteras =8 = 256 . 
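The min-max objective in Eq. (3.4.1) is typically approximated by generating the inner-maximization attack on the fly and training only on the attacked batch. The sketch below shows a generic PGD inner loop in PyTorch for a classification loss; the FaceNet experiment above used single-step FGSM with ε = 8/256, so treat this purely as an illustration of the structure, not the exact setup.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 256, alpha=2 / 256, steps=10):
    """Inner maximization: find an in-ball perturbation that maximally increases the loss."""
    x_adv = x.detach() + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)   # project back into the eps-ball around x
        x_adv = torch.clamp(x_adv, 0.0, 1.0).detach()
    return x_adv

# Outer minimization (adversarial training): update the network on the strongest attack found.
# for x, y in loader:
#     optimizer.zero_grad()
#     F.cross_entropy(model(pgd_attack(model, x, y)), y).backward()
#     optimizer.step()
```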
93 3.4.2ProposedMethodology OurdefenseaimstoachieverobustnesswithoutAFRperformanceonrealfaceimages. Wepositthatanadversarialdefensetrainedalongsideanadversarialgeneratorina self-supervised mannermayimproverobustnesstounseenattacks.Themainintuitionsbehindourdefensemech- anismareasfollows: SinceadversarialtrainingmaydegradeAFRperformance,weopttoobtainarobustadversarial detector and todetectandpurifyadversarialattacks. Giventhatprevailingdetection-basedmethodstendtoovtoknownadversarialperturbations (seeAddendum(Sec.3.4.6)),adetectorandtrainedon diverse synthesizedadversarial perturbationsmaybemorerobusttounseenattacks. Sufdiversityinsynthesizedperturbationscanguidethedetectortolearnatighterbound- aryaroundrealfaces.Inthiscase,thedetectoritselfcanserveasapowerfulsupervisionforthe . Lastly,pixelsinvolvedintheprocessmayservetoindicateadversarialregionsin theinputface. 3.4.2.1AdversarialGenerator Thegeneralizabilityofanadversarialdetectorandreliesonthequalityofthesynthesized adversarialfaceimagesoutputby FaceGuard 'sadversarialgenerator.Weproposeanadversarial generatorthatcontinuouslylearnstosynthesizechallenginganddiverseadversarialfaceimages. Thegenerator,denotedas G ,takesaninputrealfaceimage, x 2 R H W C ,andoutputsanad- versarialperturbation G ( x ; z ) ,where z ˘N (0 ; I ) isarandomlatentvector.Inspiredbyprevailing adversarialattackgenerators[16,34,35,136,141],wetreattheoutputperturbation G ( x ; z ) asan additive perturbationmask .Theadversarialfaceimage, x adv ,isgivenby x adv = x + G ( x ; z ) . Inanefforttoimpartgeneralizabilitytothedetectorand,weemphasizethefollowing requirementsof G : Adversarial: Perturbatation, G ( x ; z ) ,needstobeadversarialsuchthatanAFRsystemcan- 94 notidentifytheadversarialfaceimage x adv asthesamepersonastheinputprobe x . VisuallyRealistic: Perturbation G ( x ; z ) shouldalsobeminimalsuchthat x adv appearsasa legitimatefaceimageofthesubjectintheinputprobe x . Stochastic: Foraninput x ,werequirediverseadversarialperturbations, G ( x ; z ) ,fordiffer- entlatents z . Forsatisfyingalloftheaboverequirements,weproposemultiplelossfunctionstotrainthe generator. ObfuscationLoss Toensure G ( x ; z ) isindeed adversarial ,weincorporateawhite-boxAFRsys- tem, F ,tosupervisethegenerator.Givenaninputface, x ,thegeneratoraimstooutputanadver- sarialface, x adv = x + G ( x ; z ) suchthatthefacerepresentations, F ( x ) and F ( x adv ) ,donotmatch. Inotherwords,thegoalistominimizethecosinesimilaritybetweenthetwofacerepresentations 14 : L obf = E x F ( x ) F ( x adv ) jjF ( x ) jjjjF ( x adv ) jj : (3.4.2) PerturbationLoss Withtheidentitylossalone,thegeneratormayoutputperturbationswithlarge magnitudeswhichwill(a)betrivialforthedetectortorejectand(b)violatethevisualrealism requirementof x adv .Therefore,werestricttheperturbationstobewithin [ ] viaahingeloss: L pt = E x [max( jjG ( x ; z ) jj 2 )] : (3.4.3) DiversityLoss Theabovetwolossesjointlyensurethatateachstep,ourgeneratorlearnstooutput challengingadversarialattacks.However,theseattacksaredeterministic;foraninputimage,we willobtainthesameadversarialimage.Thismayagainleadtoaninferiordetectorthatovtoa fewdeterministicperturbationsseenduringtraining.Motivatedbystudiesofpreventingmodecol- lapseinGANs[213],weproposemaximizingadiversitylosstopromotestochasticperturbations pertrainingiteration, i : L div = 1 N ite N ite X i =1 G ( x ; z 1 ) ( i ) G ( x ; z 2 ) ( i ) 1 jj z 1 z 2 jj 1 ; (3.4.4) where N ite isthenumberoftrainingiterations, G ( x ; z ) ( i ) istheperturbationoutputatiteration i , 14 Forbrevity,wedenote E x E x 2P data . 
95 and ( z 1 ; z 2 ) aretwoi.i.d.samplesfrom z ˘N (0 ; I ) .Thediversitylossensuresthatfortworandom latentvectors, z 1 and z 2 ,wewillobtaintwodifferentperturbations G ( x ; z 1 ) ( i ) and G ( x ; z 2 ) ( i ) . GANLoss AkintopriorworkonGANs[144,150],weintroduceadiscriminatortoencourage perceptualrealismoftheadversarialimages.Thediscriminator, Dsc ,aimstodistinguishbetween probes, x ,andsynthesizedfaces x adv viaaGANloss: L GAN = E x [log Dsc ( x )]+ E x [log(1 Dsc ( x adv ))] : (3.4.5) 3.4.2.2AdversarialDetector Similartoprevailingadversarialdetectors,theproposeddetectoralsolearnsadecisionbound- arybetweenrealandadversarialimages[166,167,179,191].Akeydifference,however,isthat insteadofutilizingpre-computedadversarialimagesfromknownattacks( e . g .FGSMandPGD) fortraining,theproposeddetectorlearnstodistinguishbetweenrealimagesandthe synthesized setofdiverseadversarialattacksoutputbytheproposedadversarialgeneratorinaself-supervised manner.Thisleadstothefollowingadvantage: ourproposedframeworkdoesnotrequirealarge collectionofpre-computedadversarialfaceimagesfortraining . WeutilizeabinaryCNNfordistinguishingbetweenrealinputprobes, x ,andsynthesized adversarialsamples, x adv .ThedetectoristrainedwiththeBinaryCross-Entropyloss: L BCE = E x [ log D ( x )]+ E x [ log (1 D ( x adv ))] : (3.4.6) 3.4.2.3Adversarial Theobjectiveoftheadversarialistorecovertherealfaceimage x givenanadversarialface x adv .Weaimtoautomaticallyremovetheadversarialperturbationsbytraininganeuralnetwork P ur ,referredasanadversarial. Theadversarialprocesscanbeviewedasaninvertedprocedureofadversarial imagesynthesis.Contrarytotheobfuscationlossintheadversarialgenerator,werequirethat theimage, x pur ,successfullymatchestothesubjectintheinputprobe x .Notethatthis 96 Attacks TAR(%)@ 0 : 1% FAR ( # ) SSIM ( " ) FGSM[34] 26 : 230 : 83 0 : 24 PGD[35] 04 : 910 : 89 0 : 12 DeepFool[141] 36 : 180 : 91 0 : 09 AdvFaces[16] 00 : 170 : 89 0 : 02 GFLM[32] 68 : 030 : 55 0 : 14 SemanticAdv[186] 70 : 050 : 71 0 : 21 NoAttack 99 : 821 : 00 0 : 00 Table3.4FacerecognitionperformanceofArcFace[9]underadversarialattackandaveragestruc- turalsimilarities(SSIM)betweenprobeandadversarialimagesforobfuscationattackson 485 K genuinepairsinLFW[8]. canbeachievedviaa featurerecoveryloss ,whichistheoppositetotheobfuscationloss, i.e. , L fr = obf . Notethatanadversarialfaceimage, x adv = x + ,ismetricallyclosetotherealimage, x ,inthe inputspace.Ifwecanestimate ,thenwecanretrievetherealfaceimage.Here,theperturbations canbepredictedbyaneuralnetwork, P ur .Inotherwords,retrievingtheimage, x pur involves:(1)subtractingtheperturbationsfromtheadversarialimage, x pur = x adv P ur ( x adv ) and(2)ensuringthatthe mask , P ur ( x adv ) ,issmallsothatwedonotalterthecontent ofthefaceimagebyalargemagnitude.Therefore,weproposeahybridperceptuallossthat(1) ensures x pur isascloseaspossibletotherealimage, x viaa ` 1 reconstructionlossand(2)aloss thatminimizestheamountofalteration, P ur ( x adv ) : L perc = E x jj x pur x jj 1 + jjP ur ( x adv ) jj 2 : (3.4.7) Finally,wealsoincorporateourdetectortoguidethetrainingofour.Notethat,dueto thediversityinsynthesizedadversarialfaces,theproposeddetectorlearnsatightdecisionbound- aryaroundrealfaces.Thiscanserveasastrongself-supervisorysignaltotheforensuring thattheimagesbelongtotherealfacedistribution.Therefore,wealsoincorporatethe detectorasadiscriminatorfortheviatheproposedloss: L bf = E x [ log D ( x pur )] : (3.4.8) 97 DetectionAccuracy(%)YearFGS[34]PGD[35]DpFl.[141]AdvF.[16]GFLM[32]Sem.[186]Mean Std. General Gong etal. 
[166] 201798 : 9497 : 9195 : 8792 : 69 99 : 9299 : 92 97 : 54 02 : 82 ODIN[175] 201883 : 1284 : 3971 : 7450 : 0187 : 2585 : 6877 : 03 14 : 34 Steganalysis[178] 201988 : 7689 : 3475 : 9754 : 3058 : 9978 : 6274 : 33 14 : 77 Face UAP-D[167] 201861 : 3274 : 3356 : 7851 : 1165 : 3376 : 7864 : 28 09 : 97 SmartBox[172] 201858 : 7962 : 5351 : 3254 : 8750 : 9762 : 1456 : 77 05 : 16 Goswami etal. [176] 201984 : 5691 : 3289 : 7576 : 5152 : 9781 : 1279 : 37 14 : 04 Massoli etal. [179](MLP) 202063 : 5876 : 2881 : 7888 : 3851 : 9752 : 9869 : 16 15 : 29 Massoli etal. [179](LSTM) 202071 : 5376 : 4388 : 3275 : 4353 : 7655 : 2270 : 11 13 : 35 Agarwal etal. [182] 202094 : 4495 : 3891 : 1974 : 3251 : 6887 : 0387 : 03 16 : 86 ProposedFaceGuard 2021 99 : 8599 : 8599 : 8599 : 84 99 : 6199 : 85 99 : 81 00 : 10 Table3.5DetectionaccuracyofSOTAadversarialfacedetectorsinclassifyingsixadversarial attackssynthesizedfortheLFWdataset[8].Detectionthresholdissetas 0 : 5 forallmethods.All baselinemethodsrequiretrainingonpre-computedadversarialattacksonCASIA-WebFace[10]. Ontheotherhand,theproposed FaceGuard isself-guidedandgeneratesadversarialattacksonthe .Hence,itcanberegardedasa black-box defensesystem. 3.4.2.4TrainingFramework Wetraintheentire FaceGuard frameworkinFig.3.20inanend-to-endmannerwiththefollowing objectives: min G L G = L GAN + obf L obf + pt L pt div L div ; min D L D = L BCE ; min P ur L P ur = fr L fr + perc L perc + bf L bf : Ateachtrainingiteration,thegeneratorattemptstofoolthediscriminatorbysynthesizingvisually realisticadversarialfaceswhilethediscriminatorlearnstodistinguishbetweenrealandsynthe- sizedimages.Ontheotherhand,inthesameiteration,anexternalcriticnetwork,namelydetector D ,learnsadecisionboundarybetweenrealandsynthesizedadversarialsamples.Concurrently,the P ur learnstoinverttheadversarialsynthesisprocess.Notethatthereisakeydifference betweenthediscriminatorandthedetector:thegeneratorisdesignedto fool thedis- criminatorbutnotnecessarilythedetector.Wewillshowinourexperimentsthatthiscrucialstep preventsthedetectorfrompredicting D ( x )=0 : 5 forall x (seeTab.3.7). 98 3.4.3ExperimentalSettings Datasets. Wetrain FaceGuard onrealfaceimagesinCASIA-WebFace[10]datasetandthen evaluateonrealandadversarialfacessynthesizedforLFW[8],Celeb-A[17]andFFHQ[18] datasets.CASIA-WebFace[10]comprisesof 494 ; 414 faceimagesfrom 10 ; 575 15 differentsub- jects.LFW[8]contains 13 ; 233 faceimagesof 5 ; 749 subjects.Sinceweevaluatedefensesunder obfuscationattacks,weconsidersubjectswithatleasttwofaceimages 16 .Afterthis 9 ; 164 faceimagesof 1 ; 680 subjectsinLFWareavailableforevaluation.Forbrevity,experimentson CelebAandFFHQareprovidedinAddendum(Sec.3.4.6). Implementation. Theadversarialgeneratorandemployaconvolutionalencoder-decoder. Thelatentvariable z ,a 128 -dimensionalfeaturevector,isfedasinputtothegeneratorthrough spatialpaddingandconcatenation.Theadversarialdetector,a 4 -layerbinaryCNN,istrained jointlywiththegeneratorand.Empirically,weset obf = fr =10 : 0 , pt = perc =1 : 0 , div =1 : 0 , bf =1 : 0 and =3 : 0 .Trainingandnetworkarchitecturedetailsareprovidedin Addendum(Sec.3.4.6). FaceRecognitionSystems. Inthisstudy,weusetwoAFRsystems:FaceNet[6]andArcFace[9]. Recallthattheproposeddefenseutilizesafacematcher, F ,forguidingthetrainingprocessofthe generator.However,thedeployedAFRsystemmaynotbeknowntothedefensesystemapriori. 
Therefore,unlikeprevailingdefensemechanisms[167,172,179],weevaluatetheeffectiveness oftheproposeddefenseonanAFRsystem different from F .Wehighlighttheeffectivenessof ourproposeddefense: FaceGuardistrainedonFaceNet,whiletheadversarialattacktestsetis designedtoevadeArcFace. Obfuscationattemptsperturbrealprobesintoadversarialones.Ideally, deployedAFRsystems(say,ArcFace),shouldbeabletomatchagenuinepaircomprisedofan adversarialprobeandarealenrolledfaceofthesamesubject.Therefore,regardlessofrealor adversarialprobe,weassumethatgenuinepairsshould always matchasgroundtruth.Tab.3.4 providesAFRperformanceofArcFaceunder 6 SOTAadversarialattacksfor 484 ; 514 genuine 15 Weremoved 84 subjectsinCASIA-WebFacethatoverlapwithLFW. 16 Obfuscationattemptsonlyaffectgenuinepairs(twofaceimagespertainingtothesamesubject). 99 pairsinLFW.Itappearsthatsomeattacks, e . g .,AdvFaces[16],areeffectiveinbothlowTARand highSSIM,whilesomearelesscapableinbothmetrics. 3.4.4ComparisonwithState-of-the-ArtDefenses Inthissection,wecomparetheproposed FaceGuard toprevailingdefenses.Weevaluateallmeth- odsviapubliclyavailablerepositoriesprovidedbytheauthors(seeAddendum(Sec.3.4.6).).All baselinesaretrainedonCASIA-WebFace[10]. SOTADetectors. Ourbaselinesinclude 9 SOTAdetectorsproposedbothforgeneralob- jects[166,175,178]andadversarialfaces[167,172,176,179,182].Thedetectorsaretrainedonreal andadversarialfacesimagessynthesizedviasixadversarialgeneratorsforCASIA-WebFace[10]. Unlikeallthebaselines, FaceGuard 'sdetectordoesnotutilizeanypre-computedadversarialat- tackfortraining.Wecomputetheaccuracyforallmethodsonadatasetcomprising of 9 ; 164 realimagesand 9 ; 164 adversarialfaceimagesperattacktypeinLFW. InTab.3.5,wethatcomparedtothebaselines, FaceGuard achievesthehighestdetec- tionaccuracy.Evenwhenthe 6 adversarialattacktypesareencounteredintraining,abinary CNN[166],stillfallsshortcomparedto FaceGuard .Thisislikelybecause FaceGuard istrained onadiversesetofadversarialfacesfromtheproposedgenerator.WhilethebinaryCNNhasa smalldropcomparedtoFaceGuardintheseenattacks( 99 : 81% ! 97 : 54% ),itdrops onunseenadversarialattacksintesting. Comparedtohand-craftedfeatures,suchasPCA+SVMinUAP-D[167]andentropydetection inSmartBox[172], FaceGuard achievessuperiordetectionresults.SomebaselinesutilizeAFR featuresforidentifyingadversarialinputs[176,179].WethatintermediateAFRfeatures primarilyrepresenttheidentityoftheinputfaceanddonotappeartocontainhighlydiscriminative informationfordetectingadversarialfaces. Despitetherobustness, FaceGuard 28 outof 9 ; 164 realimagesinLFW[8]and falselypredicts 46 outof 54 ; 984 adversarialfacesasreal.Fromthelatter, 44 arewarpedfacesvia GFLM[32]andtheremainingtwoaresynthesizedviaAdvFaces[16].Wethat FaceGuard 100 0 : 77 0 : 67 0 : 96 0 : 99 (a)Realfacesfalselydetectedasadversarial Real AdvFaces( 0 : 42 ) Real AdvFaces( 0 : 28 ) (b)Adversarialfacesfalselydetectedasreal Figure3.21Exampleswheretheproposed FaceGuard failstocorrectlydetect(a)realfacesand (b)adversarialfaces.Detectionscores 2 [0 ; 1] aregivenbeloweachimage,where 0 indicatesreal and 1 indicatesadversarialface. tendstomisclassifyrealfacesunderextremeposesandadversarialfacesthatareoccluded( e . g ., hats)(seeFig.3.21). 
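The accuracies in Tab. 3.5 follow the protocol described above: each method scores 9,164 real and 9,164 adversarial LFW images per attack type and applies a fixed 0.5 threshold. A minimal sketch of that bookkeeping is shown below; the function name and the score arrays are illustrative placeholders rather than outputs of any particular detector.

```python
import numpy as np

def detection_accuracy(scores_real, scores_attack, thresh=0.5):
    """Accuracy at a fixed threshold: real faces should score below thresh,
    adversarial faces at or above it (0 = real, 1 = adversarial)."""
    correct = np.sum(np.asarray(scores_real) < thresh) + \
              np.sum(np.asarray(scores_attack) >= thresh)
    total = len(scores_real) + len(scores_attack)
    return 100.0 * correct / total

# Protocol from the text: 9,164 real LFW images and 9,164 adversarial images
# per attack type. The scores below are random stand-ins for detector outputs.
rng = np.random.default_rng(0)
scores_real = rng.uniform(0.0, 0.4, 9164)
scores_fgsm = rng.uniform(0.6, 1.0, 9164)
print(f"FGSM detection accuracy: {detection_accuracy(scores_real, scores_fgsm):.2f}%")
```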
ComparisonwithAdversarialTraining& Wealsocomparewithprevailingdefenses designingrobustfacematchers[11Œ13]and[14,15,184].Weconductav experimentbyconsideringallpossiblegenuinepairs(twofacesbelongingtothesamesubject) inLFW[8].Foroneprobeinagenuinepair,wecraftsixdifferentadversarialprobes(oneper attacktype).Intotal,thereare 484 ; 514 realpairsand ˘ 3 M adversarialpairs.Foraedmatch threshold 17 ,wecomputetheTrueAcceptRate(TAR)ofsuccessfullymatchingtwoimagesina realoradversarialpairinTab.3.6.Inotherwords,TARishereastheratioofgenuinepairs abovethematchthreshold. ArcFacewithoutanyadversarialdefensesystemachieves 34 : 27% TARat 0 : 1% FARunder 17 Wecomputethethresholdat0.1%FARonallpossibleimagepairsinLFW, e . g .,threshold@0.1%FARfor ArcFaceissetat 0 : 36 . 101 DefensesYearStrategyRealAttacks 485Kpairs 3 M pairs No-Defense - 99 : 8234 : 27 Adv.Training[11]2017Robustness 96 : 4211 : 23 Rob-GAN[12]2019Robustness 91 : 3513 : 89 Feat.Denoising[164]2019Robustness 87 : 6117 : 97 L2L[13]2019Robustness 96 : 8916 : 76 MagNet[14]2017 94 : 4738 : 32 DefenseGAN[15]2018 96 : 7839 : 21 Feat.Distillation[183]2019 94 : 6441 : 77 NRP[184]2020 97 : 5461 : 44 A-VAE[185]2020 93 : 7151 : 99 ProposedFaceGuard 2021 99 : 8177 : 46 Table3.6AFRperformance(TAR(%)@0.1%FAR)ofArcFaceundernodefenseandwhen ArcFaceistrainedviaSOTArobustnesstechniques[11Œ13]orSOTAs[14,15]. FaceGuard correctlypassesmajorityofrealfacestoArcFaceandalsoadversarialattacks. attack.Adversarialtraining[11Œ13]inhibitsthefeaturespaceofArcFace,resultinginworseper- formanceonbothrealandadversarialpairs.Ontheotherhand,methods[14,15,184] canbetterretainfacefeaturesinrealpairsbuttheirperformanceunderattackisstillundesirable. Instead,theproposed FaceGuard defensesystemdetectswhetheraninputfaceimageis realoradversarial.Ifinputfacesareadversarial,theyarefurtherFromTab.3.6,we thatourdefensesystemoutperformsSOTAbaselinesinprotectingArcFace[9]against attacks., FaceGuard 'senhancesArcFace'saverageTARat 0 : 1% FARunderall sixattacks(seeTab.3.4)from 34 : 27% ! 77 : 46% .Inaddition, FaceGuard alsomaintainssimilar facerecognitionperformanceonrealfaces(TARonrealpairsdropfrom 99 : 82% ! 99 : 81% ). Therefore,ourproposeddefensesystemensuresthatbenignuserswillnotbeincorrectlyrejected whilemaliciousattemptstoevadetheAFRsystemwillbecurbed. 3.4.5AnalysisofOurApproach QualityoftheAdversarialGenerator. InTab.3.7,weseethatwithouttheproposedadversarial generator(fiWithout G fl), i . e .,adetectortrainedonthesixknownattacktypes,suffersfromhigh 102 Model AdvFaces[16] Mean Std. Gen. G Without G 91 : 72 97 : 12 04 : 54 Without L div 95 : 42 98 : 23 01 : 33 With G and L div 99 : 84 99 : 81 00 : 10 Det. D D asDiscriminator 50 : 00 75 : 25 21 : 19 D viaPre-Computed G 52 : 01 69 : 37 19 : 91 D asOnlineDetector 99 : 84 99 : 81 00 : 10 Table3.7Ablatingtrainingschemesofthegenerator G anddetector D .Allmodelsaretrainedon CASIA-WebFace[10]. (Col.3) Wecomputethedetectionaccuracyinclassifyingrealfacesin LFW[8]andthemostchallengingadversarialattackinTab.3.4,AdvFaces[16]. (Col.4) Theavg. andstd.dev.ofdetectionaccuracyacrossall6adversarialattacks. standarddeviation.Instead,trainingadetectorwithadeterministic G (fiWithout L div fl),leadsto bettergeneralizationacrossattacktypes,sincethedetectorstillencountersvariationsinsynthe- sizedimagesasthegeneratorlearnstobettergenerateadversarialfaces.However,suchadetector isstillpronetoovingtoafewdeterministicperturbationsoutputby G .Finally, FaceGuard withthediversitylossintroducesdiverseperturbationswithinandacrosstrainingiterations(see Fig.3.22). 
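The "Without L_div" ablation above concerns exactly the quantity maximized in Eq. (3.4.4); a small sketch of how that per-sample diversity term can be monitored during training is given below. The perturbation masks and latent vectors are random placeholders, not outputs of the actual generator.

```python
import numpy as np

def perturbation_diversity(pert_a, pert_b, z_a, z_b):
    """Diversity term of Eq. (3.4.4): L1 distance between two perturbation
    masks G(x, z1) and G(x, z2), normalized by the L1 distance between
    their latent codes."""
    num = np.abs(pert_a - pert_b).sum()
    den = np.abs(z_a - z_b).sum() + 1e-12  # avoid division by zero
    return num / den

# Placeholders for two perturbation masks of one probe (160 x 160 x 3) and
# the 128-dimensional latents that produced them.
z1, z2 = np.random.randn(128), np.random.randn(128)
pert1 = np.random.uniform(-0.1, 0.1, (160, 160, 3))
pert2 = np.random.uniform(-0.1, 0.1, (160, 160, 3))
print("diversity:", perturbation_diversity(pert1, pert2, z1, z2))
```

A larger value indicates that different latents yield visibly different perturbations within the same iteration, which is the behavior the diversity loss rewards.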
QualityoftheAdversarialDetector. Thediscriminator'staskissimilartothedetector;deter- minewhetheraninputimageisrealorfake/adversarial.Thekeydifferenceisthatthegenerator isenforcedtofoolthediscriminator,butnotthedetector.Ifwereplacethediscriminatorwithan adversarialdetector,thegeneratorcontinuouslyattemptstofoolthedetectorbysynthesizingim- agesthatareascloseaspossibletotherealimagedistribution.Bydesign,suchadetectorshould convergeto Disc ( x )=0 : 5 forall x (realoradversarial).Asweexpect,inTab.3.7,wecannot relyonpredictionsmadebysuchadetector(fi D asDiscriminatorfl).Wetryanothervariant:we trainthegenerator G andthentrainadetectortodistinguishbetweenrealandpre-computed attacksvia G (fi D viaPre-Computed G fl).Asweexpect,theproposedmethodologyoftrainingthe detectorinanonlinefashionbyutilizingthesynthesizedadversarialsamplesoutputby G atany giveniterationleadstoarobustdetector(fi D asOnlineDetectorfl).Thiscanlikely beattributedtothefactthatadetectortrainedon-lineencountersamuchlargervariationasthe generatortrainsalongside.fi D viaPre-Computed G flisexposedonlytowithin-iterationvariations 103 InputProbe( x ) G ( x ; z 1 ) G ( x ; z 2 ) G ( x ; z 3 ) (a)Adversarialfacesviarandomlatentswithinthesameiteration. Iteration: 5 K Iteration: 20 K Iteration: 60 K Iteration: 100 K (b)Adversarialfacesatdifferenttrainingiterations. Figure3.22Adversarialfacessynthesizedby FaceGuard duringtraining.Notethediversityin perturbations(a)withinand(b)acrossiterations. (fromrandomlatentsampling),however,` D asOnlineDetectorflencountersvariations both within andacrosstrainingiterations(seeFig.3.22). QualityoftheAdversarial. Recallthatweenforcedthepuriimagetobeclosetothe 104 ProbeAdvFaces[16]Localization ArcFace/SSIM: 0 : 30 = 0 : 890 : 62 = 0 : 91 Figure3.23 FaceGuard successfullytheadversarialimage(redregionsindicateadversarial perturbationslocalizedbyourmask).ArcFace[9]scores 2 [ 1 ; 1] andSSIM 2 [0 ; 1] betweenanadvprobeandinputprobearegivenbeloweachimage. (a) (b) Figure3.24(a) FaceGuard 'siscorrelatedwithitsadversarialsynthesisprocess.(b) Trade-offbetweendetectionandwithrespecttoperturbationmagnitudes.Withmin- imalperturbation,detectionischallengingwhilemaintainsAFRperformance.Excessive perturbationsleadtoeasierdetectionwithgreaterchallengein realfaceviaareconstructionloss.Thus,theandperturbationmasksshouldbesimilar. InFig.3.24a,weshowsthatthetwomasksareindeedcorrelatedbyplottingtheCosinesimilarity distribution( 2 [ 1 ; 1] )between G ( x ; z ) and P ur ( x + G ( x ; z )) forall 9 ; 164 imagesinLFW. Therefore,pixelsin x adv involvedintheprocessshouldcorrespondtothosethat causetheimagetobeadversarialintheplace.Fig.3.23highlightsthatperturbedregions canbeautomaticallylocalizedviaconstructingaheatmapoutof P ur ( x adv ) .InFig.3.24b,we investigatethechangeinAFRperformance(TAR(%)@ 0 : 1 %FAR)ofArcFaceunderattack (synthesizedadversarialfacesvia G ( x ; z ) )whentheamountofperturbationisvaried.Wethat 105 (a)minimalperturbationishardertodetectbuttheincursminimaldamagetotheAFR, while,(b)excessiveperturbationsareeasiertodetectbutincreasesthechallengein 3.4.6Addendum 3.4.6.1ImplementationDetails AllthemodelsinthechapterareimplementedusingTwr1.12.AsingleNVIDIAGeForce GTX2080TiGPUisusedfortraining FaceGuard onCASIA-Webface[10]andevaluatedon LFW[8],CelebA[17],andFFHQ[18]. 
3.4.6.2Preprocessing AllfaceimagesarepassedthroughMTCNNfacedetector[41]todetect 5 faciallandmarks (twoeyes,noseandtwomouthcorners).Then,similaritytransformationisusedtonormalize thefaceimagesbasedonthevelandmarks.Aftertransformation,theimagesareresizedto 160 160 .Beforepassinginto FaceGuard ,eachpixelintheRGBimageisnormalized 2 [ 1 ; 1] bysubtracting 128 anddividingby 128 . Allthetestingimagesarefromtheidentitiesinthetest dataset. 3.4.6.3NetworkArchitectures Thegenerator, G takesasinputanrealRGBfaceimage, x 2 R 160 160 3 anda 128 -dimensional randomlatentvector, z ˘N (0 ; I ) andoutputsasynthesizedadversarialface x adv 2 R 160 160 3 . Let c7s1-k bea 7 7 convolutionallayerwith k andstride 1 . dk denotesa 4 4 con- volutionallayerwith k andstride 2 . Rk denotesaresidualblockthatcontainstwo 3 3 convolutionallayers. uk denotesa 2 upsamplinglayerfollowedbya 5 5 convolutionallayer with k andstride 1 .WeapplyInstanceNormalizationandBatchNormalizationtothegen- eratoranddiscriminator,respectively.WeuseLeakyReLUwithslope 0 : 2 inthediscriminatorand ReLUactivationinthegenerator.Thearchitecturesofthetwomodulesareasfollows: 106 Generator: c7s1-64,d128,d256,R256,R256,R256,u128,u64,c7s1-3 , Discriminator: d32,d64,d128,d256,d512 . A 1 1 convolutionallayerwith 3 andstride 1 isattachedtothelastconvolutionallayerof thediscriminatorforthepatch-basedGANloss L GAN . The, P ur ,consistsofthesamenetworkarchitectureasthegenerator: c7s1-64,d128,d256,R256,R256,R256,u128,u64,c7s1-3 . Weapplythe tanh activationfunctiononthelastconvolutionlayerofthegeneratorandthe toensurethatthegeneratedimagesare 2 [ 1 ; 1] .Inthechapter,wedenotedthe outputofthetanhlayerofthegeneratorasanfiperturbationmaskfl, G ( x ; z ) 2 [ 1 ; 1] and x 2 [ 1 ; 1] .Similarly,theoutputofthetanhlayeroftheerisreferredtoan tionmaskfl, P ur ( x adv ) 2 [ 1 ; 1] and x adv 2 [ 1 ; 1] .Theadversarialimageiscomputedas x adv =2 clamp G ( x ; z )+ x +1 2 1 0 1 .Thisensures G ( x ; z ) caneitheraddorsubtractpixels from x when G ( x ; z ) 6 =0 .When G ( x ; z ) ! 0 ,then x adv ! x .Similarly,theimage iscomputedas x pur =2 clamp x adv +1 2 P ur ( x adv ) 1 0 1 . Theexternalcriticnetwork,detector D ,comprisesofa 4 -layerbinaryCNN: Detector: d32,d64,d128,d256,fc64,fc1 , where fcN referstoafully-connectedlayerwith N neuronoutputs. 3.4.6.4TrainingDetails Thegenerator,detector,andaretrainedinanend-to-endmannerviaADAMoptimizer withhyperparameters 1 =0 : 5 , 2 =0 : 9 ,learningrateof 1 e 4 ,andbatchsize 16 .Algorithm2 outlinesthetrainingalgorithm. NetworkConvergence. InFig.3.25,weplotthetraininglossacrossiterationswhenanadversar- 107 Algorithm2 Training FaceGuard .Allexperimentsinthisworkuse =0 : 0001 , 1 =0 : 5 , 2 =0 : 9 , obf = fr =10 : 0 , pt = perc = div =1 : 0 , =3 : 0 , m =16 .Forbrevity, lg refersto logoperation. 
1: Input 2: X TrainingDataset 3: F CosinesimilaritybyAFR 4: G Generatorwithweights G 5: Dc Discriminatorwithweights Dc 6: D Detectorwithweights D 7: P ur withweights P ur 8: m Batchsize 9: Learningrate 10: for numberoftrainingiterations do 11: Sampleabatchofprobes f x ( i ) g m i =1 ˘X 12: Sampleabatchofrandomlatents f z ( i ) g m i =1 ˘N (0 ;I ) 13: ( i ) G = G (( x ( i ) ;z ( i ) ) 14: x ( i ) adv = x ( i ) + ( i ) G 15: ( i ) P ur = G (( x ( i ) ;z ( i ) ) 16: x ( i ) pur = x ( i ) adv ( i ) P ur 17: 18: L G pt = 1 m P m i =1 max jj ( i ) jj 2 19: L G obf = 1 m h P m i =1 F x ( i ) ;x ( i ) adv 20: L G div = 1 m P m i =1 jj G ( x ; z 1 ) ( i ) ( x ; z 2 ) ( i ) jj 1 jj z 1 z 2 jj 1 21: L G GAN = 1 m h P m i =1 lg 1 Dc ( x ( i ) adv ) 22: L D = 1 m P m i =1 h lg D ( x ( i ) )+ lg 1 D ( x ( i ) adv ) 23: L Dc = 1 m P m i =1 h lg Dc ( x ( i ) ) + lg 1 Dc ( x ( i ) adv ) 24: L P ur perc = 1 m P m i =1 h jj x pur x jj 1 + jjP ur ( x ( i ) adv ) jj 1 i 25: L P ur fr = 1 m P m i =1 F x ( i ) ;x pur 26: L P ur bf = 1 m [ P m i =1 lg (1 D ( x pur ))] 27: L G = L G GAN + obf L obf + pt L pt + div L div 28: L P ur = fr L fr + perc L perc + bf L bf 29: G = Adam ( O G L G ; G ;; 1 ; 2 ) 30: D = Adam ( O Dc L Dc ;Dc ;; 1 ; 2 ) 31: D = Adam ( O D L D ; D ;; 1 ; 2 ) 32: P ur = Adam ( O P ur L P ur ; P ur ;; 1 ; 2 ) 33: endfor 108 Figure3.25Traininglossacrossiterationswhenanadversarialdetectionnetworkistrainedvia pre-computedadversarialfaces(blue),theproposedadv.generatorbutwithoutthediversity(or- ange),andwiththeproposeddiversityloss(green).Thediversitylosspreventsthenetworkfrom ovtoadversarialperturbationsencounteredduringtraining. ialdetectoristrainedviapre-computedadversarialfaces.Inthiscase,thetraininglossconvergesto alowvalueandremainsconsistentacrosstheremainingepochs.Suchadetectormayovtothe edsetofadversarialperturbationsencounteredintraining.Insteadofutilizingthepre-computed adversarialattacks,utilizinganadversarialgeneratorintraining(without L div ),introduceschal- lengingtrainingsamples. FaceGuard withthediversitylossintroducesdiverseperturbationswithinatrainingiteration (seeFig.3.22).InFig.3.25,wealsoobservethatthetrainingloss(epochs 8 40 )untilconvergence(epochs 40 50 ).Thisindicatesthatthroughoutthetraining(within andacrosstrainingiterations),theproposedgeneratorsynthesizesstronganddiverserangeof adversarialfacesthatcontinuouslyregularizesthetrainingoftheadversarialdetector. 109 DetectionAccuracy(%)YearLFW[34]CelebA[17]FFHQ[18] General Gong etal. [166] 201797 : 54 02 : 8294 : 38 04 : 4896 : 89 02 : 07 ODIN[175] 201877 : 03 14 : 3468 : 95 19 : 6474 : 63 08 : 16 Steganalysis[178] 201974 : 33 14 : 7772 : 53 11 : 3071 : 09 09 : 86 Face UAP-D[167] 201864 : 28 09 : 9763 : 19 16 : 4968 : 65 08 : 73 SmartBox[172] 201856 : 77 05 : 1654 : 85 09 : 3357 : 19 09 : 55 Goswami etal. [176] 201979 : 37 14 : 0474 : 70 13 : 8880 : 03 09 : 24 Massoli etal. [179](MLP) 202069 : 16 15 : 2961 : 78 11 : 3466 : 26 10 : 06 Massoli etal. [179](LSTM) 202070 : 11 13 : 3563 : 67 16 : 2169 : 58 07 : 91 Agarwal etal. 
[182] 202087 : 03 16 : 8685 : 81 15 : 6486 : 70 11 : 04 ProposedFaceGuard 2021 99 : 81 00 : 1098 : 73 00 : 9299 : 35 00 : 09 Table3.8AverageandstandarddeviationofdetectionaccuraciesofSOTAadversarialfacedetec- torsinclassifyingsixadversarialattackssynthesizedfortheLFW[8],CelebA[17],andFFHQ[18] datasets.Detectionthresholdissetas 0 : 5 forallmethods.Allbaselinemethodsrequiretrainingon pre-computedadversarialattacksonCASIA-WebFace[10].Ontheotherhand,theproposed Face- Guard isself-guidedandgeneratesadversarialattacksonthe.Hence,itcanberegardedas a black-box defensesystem. 3.4.6.5Baselines Weevaluatealldefensemethodsviapubliclyavailablerepositoriesprovidedbytheauthors.Only madeistoreplacetheirtrainingdatasetswithCASIA-WebFace[10].Weprovidethe publiclinkstotheauthorcodesbelow: Gong etal. [166]:https://github.com/gongzhitaao/adversarial- UAP-D[167]/SmartBox etal. [172]:https://github.com/akhil15126/SmartBox Massoli etal. [179]:https://github.com/fvmassoli/trj-based-adversarials-detection AdversarialTraining[11]:https://github.com/locuslab/fast adversarial Rob-GAN[12]:https://github.com/xuanqing94/RobGAN L2L[13]:https://github.com/YunseokJANG/l2l-da MagNet[14]:https://github.com/Trevillie/MagNet DefenseGAN[15]:https://github.com/kabkabm/defensegan NRP[184]:https://github.com/Muzammal-Naseer/NRP Attacksarealsosynthesizedviapubliclyavailableauthorcodes: 110 Known Unseen FGSM[34]PGD[35]DeepFool[141] AdvFaces[16]GFLM[32]SemanticAdv[186] Gong etal. [166] 94 : 5192 : 2194 : 12 68 : 6350 : 0050 : 21 UAP-D[167] 63 : 6569 : 3356 : 38 60 : 8150 : 1250 : 28 SmartBox[172] 58 : 7962 : 5351 : 32 54 : 8750 : 9762 : 14 Massoli etal. [179](MLP) 78 : 3582 : 5291 : 21 55 : 5750 : 0050 : 00 Massoli etal. [179](LSTM) 74 : 6186 : 4394 : 73 62 : 4350 : 0050 : 00 (a) Known Unseen AdvFaces[16]GFLM[32]SemanticAdv[186] FGSM[34]PGD[35]DeepFool[141] Gong etal. [166] 81 : 3996 : 7298 : 97 84 : 4657 : 0072 : 32 UAP-D[167] 68 : 7854 : 3177 : 46 51 : 6450 : 3252 : 01 SmartBox[172] 54 : 8750 : 9762 : 14 58 : 7962 : 5351 : 32 Massoli etal. [179](MLP) 77 : 6486 : 5494 : 78 55 : 2051 : 3252 : 90 Massoli etal. [179](LSTM) 81 : 4292 : 6296 : 76 52 : 7465 : 4354 : 84 (b) Known FGSM[34]PGD[35]DeepFool[141]AdvFaces[16]GFLM[32]SemanticAdv[186] Gong etal. [166] 98 : 9497 : 9195 : 8792 : 69 99 : 9299 : 92 UAP-D[167] 61 : 3274 : 3356 : 7851 : 1165 : 3376 : 78 SmartBox[172] 58 : 7962 : 5351 : 3254 : 8750 : 9762 : 14 Massoli etal. [179](MLP) 63 : 5876 : 2881 : 7888 : 3851 : 9752 : 98 Massoli etal. [179](LSTM) 71 : 5376 : 4388 : 3275 : 4353 : 7655 : 22 Unseen ProposedFaceGuard 99 : 8599 : 8599 : 8599 : 84 99 : 6199 : 85 (c) Table3.9DetectionaccuracyofSOTAadversarialfacedetectorsinclassifyingsixadversarial attackssynthesizedfortheLFWdataset[8]undervariousknownandunseenattackscenarios. Detectionthresholdissetas 0 : 5 forallmethods. FGSM/PGD/DeepFool:https://githubw/cleverhans AdvFaces:https://github.com/ronny3050/AdvFaces GFLM:https://github.com/alldbi/FLM SemanticAdv:https://github.com/AI-secure/SemanticAdv 3.4.6.6AdditionalDatasets InTab.3.8,wereportaverageandstandarddeviationofdetectionratesoftheproposed Face- Guard andotherbaselinesonthe 6 adversarialattackssynthesizedonLFW[8],CelebA[17], andFFHQ[18](followingthesameprotocolasTab.3.5).ForCelebA,wesynthesizeatotalof 19 ; 962 6=119 ; 772 adversarialsamplesfor 19 ; 962 realsamplesintheCelebAtestingsplit[17]. Wealsosynthesize 4 ; 974 6=29 ; 844 adversarialsamplesfor 4 ; 974 realfacesinFFHQtesting 111 split[18].Wethattheproposed FaceGuard outperformsallbaselinesinallthreefacedatasets. 
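Tab. 3.8 reports, for each dataset, the mean and standard deviation of detection accuracy over the six attack types. A minimal aggregation sketch is shown below, seeded with FaceGuard's per-attack LFW accuracies from Tab. 3.5; the helper name is ours, not part of the released code.

```python
import numpy as np

def summarize_detector(per_attack_accuracy):
    """Mean and standard deviation of detection accuracy (%) over attack types."""
    values = np.array(list(per_attack_accuracy.values()), dtype=np.float64)
    return values.mean(), values.std()

# Per-attack accuracies (%) for FaceGuard on LFW, taken from Tab. 3.5.
accuracies = {"FGSM": 99.85, "PGD": 99.85, "DeepFool": 99.85,
              "AdvFaces": 99.84, "GFLM": 99.61, "SemanticAdv": 99.85}
mean_acc, std_acc = summarize_detector(accuracies)
print(f"{mean_acc:.2f} +/- {std_acc:.2f}")
```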
3.4.6.7OvinPrevailingDetectors InTab.3.9,weprovidethedetectionratesofprevailingSOTAdetectorsindetectingsixadversarial attacksinLFW[8]whentheyaretrainedondifferentattacksubsets.Wehighglighttheov tingissuewhen(a)SOTAdetectorsaretrainedongradient-basedadversarialattacks(FGSM[34], PGD[35],andDeepFool[141])andtestedongradient-basedandlearning-basedattacks(Ad- vFaces[16],GFLM[32],andSemanticAdv[186]),and(b)vice-versa.Tab.3.9(c)reportsthe detectionperformanceofSOTAdetectorswhenallsixattacksareavailablefortraining. WethatdetectionaccuracyofSOTAdetectorsdropswhentestedonasubset ofattacksnotencounteredduringtheirtraining.Instead,theproposed FaceGuard maintainsrobust detectionaccuracywithouteventrainingonthepre-computedsamplesfromanyknownattacks. 3.4.6.8QualitativeResults GeneratorResults. Fig.3.26showsexamplesofsynthesizedadversarialfacesviatheproposed adversarialgenerator G .Notethatthegeneratortakestheinputprob x andarandomlatent z . Weshowsynthesizedperturbationmasksandcorrespondingadversarialfacesforthreerandomly sampledlatents.WeobservethatthesynthesizedadversarialimagesevadesArcFace[9]while maintaininghighstructuralsimilaritybetweenadversarialandinputprobe. Results. Weshowexamplesofimagesvia FaceGuard andbaselinesincluding MagNet[14]andDefenseGAN[15]inFig.3.27.Weobservethat,comparedtobaselines, imagessynthesizedvia FaceGuard arevisuallyrealisticwithminimalchangescomparedtothe groundtruthrealprobe.Inaddition,comparedtothetwobaselines, FaceGuard 'sprotects ArcFace[9]matcherfrombeingevadedbythesixadversarialattacks. 112 Figure3.26Examplesofgeneratedadversarialimagesalongwithcorrespondingperturbation masksobtainedvia FaceGuard 'sgenerator G forthreerandomlysampled z .Cosinesimilarity scoresviaArcFace[9] 2 [ 1 ; 1] andSSIM 2 [0 ; 1] betweensynthesizedadversarialandinput probearegivenbeloweachimage.Ascoreabove 0.36 (threshold@ 0 : 1% FalseAcceptRate) indicatesthattwofacesareofthesamesubject. 3.4.6.9AdditionalResultson PerturbationandMasks. Inthemaintext,wefoundthattheperturbationandpu- masksarecorrelatedwithanaverageCosinesimilarityof 0 : 52 .Weshowvepairsof perturbationandmasksrankedbytheCosinesimilaritybetweenthem(highesttolow- est).Weobservethatmaskisbettercorrelatedwhenperturbationsaremorelocal. Slightlyperturbingentirefacesposestobechallengingfortheproposed. EffectofPerturbationAmount. Wealsostudiedtheeffectofperturbationamountondetection 113 andresultsinthemaintext.Weobservedatrade-offbetweendetectionand tionwithrespecttoperturbationmagnitudes.Withminimalperturbation,detectionischallenging whilemaintainsAFRperformance.Excessiveperturbationsleadtoeasierdetectionwith greaterchallengeinInFig.3.29,showexamplesofsynthesizedadversarialfacesfor differentperturbationamountsandtheircorrespondingimages.Wethatdetection scoresimprovewithlargerperturbation.Alignedwithourearlierduetotheproposed loss, L bf ,facesarecontinuouslydetectedasrealbythedetectorwhichexplains whythemaintainsAFRperformancewithincreasingperturbationamount. EffectofonArcFaceEmbeddings. 
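The SSIM and ArcFace cosine scores reported beneath each image (e.g., in Fig. 3.26) can be computed as sketched below. The sketch assumes scikit-image is available; the images and embeddings here are random placeholders rather than outputs of the matcher or generator.

```python
import numpy as np
from skimage.metrics import structural_similarity

def cosine_score(feat_a, feat_b):
    """ArcFace-style cosine similarity in [-1, 1] between two embeddings."""
    return float(feat_a @ feat_b /
                 (np.linalg.norm(feat_a) * np.linalg.norm(feat_b) + 1e-12))

# Probe and adversarial image as grayscale float arrays in [0, 1].
probe = np.random.rand(160, 160)
adv = np.clip(probe + 0.05 * np.random.randn(160, 160), 0.0, 1.0)
ssim = structural_similarity(probe, adv, data_range=1.0)

# Embeddings would come from the face matcher; random vectors stand in here.
feat_probe, feat_adv = np.random.randn(512), np.random.randn(512)
print(f"SSIM: {ssim:.2f}  cosine: {cosine_score(feat_probe, feat_adv):.2f}")
```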
Inordertoinvestigatetheeffectof tiononamatcher'sfeaturespace,weextractfaceembeddingsofrealimages,theircorresponding adversarialimagesviathechallengingAdvFaces[16]attack,andimages,viatheSOTA ArcFacematcher.Intotal,weextractfeaturevectorsfrom 1 ; 456 faceimagesof 10 subjectsinthe LFWdataset[8].InFig.3.10,weplotthe 2 Dt-SNEvisualizationofthefaceembeddingsforthe 10 subjects.Theidentityclusteringscanbeclearlyobservedfromreal,adversarial,and images.Inparticular,weobservethatsomeadversarialfacespertainingtoasubjectmovesfarther fromitsidentityclusterwhiletheproposeddrawsthemback.Fig.3.30illustratesthat theproposedindeedenhancesfacerecognitionperformanceofArcFaceunderattackfrom 34 : 27% TAR@ 0 : 1% FARundernodefenseto 77 : 46% TAR@ 0 : 1% FAR. 3.5Summary Thischapterproposesanewmethodofadversarialfacesynthesis,namely AdvFaces ,that automaticallygeneratesadversarialfaceimageswithimperceptibleperturbationsevadingstate- of-the-artfacematchers.WiththehelpofaGAN,andtheproposedperturbationandidentity losses,AdvFaceslearnsthesetofpixellocationsrequiredbyfacematchersforand onlyperturbsthosesalientfacialregions(suchaseyebrowsandnose).Oncetrained,AdvFaces generateshighqualityandperceptuallyrealisticadversarialexamplesthatarebenigntothehuman 114 eyebutcanevadestate-of-the-artblack-boxfacematchers,whileoutperformingotherstate-of-the- artadversarialfacemethods. WiththeintroductionofsophisticatedadversarialattacksonAFRsystems,suchasgeomet- ricwarpingandGAN-synthesizedadversarialattacks,adversarialdefenseneedstoberobustand generalizable.Withoututilizinganypre-computedtrainingsamplesfromknownadversarialat- tacks,theproposed FaceGuard achievedstate-of-the-artdetectionperformanceagainst 6 different adversarialattacks. FaceGuard 'salsoenhancedArcFace'srecognitionperformanceunder adversarialattacks. 115 Figure3.27ExamplesofimagesviaMagNet[14],DefenseGan[15],andproposed Face- Guard forsixadversarialattacks.CosinesimilarityscoresviaArcFace[9] 2 [ 1 ; 1] are givenbeloweachimage.Ascoreabove 0.36 (threshold@ 0 : 1% FalseAcceptRate)indicatesthat twofacesareofthesamesubject. 116 Figure3.28Examplesofsynthesizedadversarialimagesviatheproposedadversarialgenerator andcorrespondingimages.Cosinesimilaritybetweenperturbationandmasks givenbeloweachrowalongwithArcFacescoresbetweensynthesizedadvimage andrealprobe.Ascoreabove 0.36 (threshold@ 0 : 1% FalseAcceptRate)indicatesthattwofaces areofthesamesubject.Evenwithlowercorrelationbetweenperturbationandmasks (rows3-5),theimagescanstillbeasthecorrectidentity.Noticethatthe primarilyalterstheeyecolor,nose,andsubduesadversarialperturbationsinforeheads.Zoomin fordetails. 117 Figure3.29ArcFace 2 [ 1 ; 1] /Detectionscores 2 [0 ; 1] whenperturbationamountisvaried ( = f 0 : 25 ; 0 : 50 ; 0 : 75 ; 1 : 00 ; 1 : 25 g ).Detectionscoresabove 0 : 5 arepredictedasadversarialimages whileArcFacescoresabove 0.36 (threshold@ 0 : 1% FalseAcceptRate)indicatethattwofaces areofthesamesubject. FaceGuard istrainedon =1 : 00 .Thedetectionscoresimproveas perturbationamountincreases,whereas,majorityofimagesaredetectedasreal.Even whenimagesfailtobeclassasrealbythedetector,cationmaintainhighAFR performance. 118 Figure3.30 2 Dt-SNEvisualizationoffacerepresentationsextractedviaArcFacefrom 1 ; 456 (a) real,(b)AdvFaces[16],and(c)imagesbelongingto 10 subjectsinLFW[8].Example AdvFaces[16]pertainingtoasubjectmovesfartherfromitsidentityclusterwhiletheproposed drawsthemback. 
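The 2D t-SNE visualization in Fig. 3.30 can be reproduced with scikit-learn; the embedding matrix and subject labels below are random placeholders standing in for the 1,456 ArcFace features of the 10 LFW subjects.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder for the 1,456 x 512 ArcFace embeddings (real, adversarial, and
# purified) and their subject labels; in practice these come from the matcher.
embeddings = np.random.randn(1456, 512).astype(np.float32)
subject_ids = np.random.randint(0, 10, size=1456)

tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
points_2d = tsne.fit_transform(embeddings)  # (1456, 2) coordinates

# points_2d can then be scatter-plotted with one color per subject_id to
# obtain a visualization analogous to Fig. 3.30.
print(points_2d.shape)
```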
119 Chapter4 DetectionofDigitalandPhysical FaceAttacks Inthepreviouschapters,weproposedindividualsolutionstoenhancetherobustnessofAFRsys- temagainstphysicalanddigitalattacks.However,threebroadcategoriesoffaceattackshavebeen inliterature,namelyspoofs,digitalmanipulation,andadversarialfaces.Sincetheexact typeoffaceattackmaynotbeknown apriori ,ageneralizabledetectorthatcandefendanAFR systemagainstanyofthethreeattackcategoriesisofutmostimportance.Inthischapter,ourmain focusistodesignauniversalfaceattackdetectionframeworkthatcanreliablydetectattacksfrom allthreecategories. 4.1Introduction TheforemostchallengefacingAFRsystemsistheirvulnerabilityto faceattacks .Forinstance, anattackercanhidehisidentitybywearinga3Dmask[58],orintruderscanassumeavictim's identitybydigitallyswappingtheirfacewiththevictim'sfaceimage[20].Withunrestrictedaccess totherapidproliferationoffaceimagesonsocialmediaplatforms,launchingattacksagainstAFR systemshasbecomeevenmoreaccessible.Giventhegrowingdisseminationoffifakenewsfland fideepfakesfl[59],theresearchcommunityandsocialmediaplatformsalikearepushingtowards 120 Figure4.1FaceattacksagainstAFRsystemsarecontinuouslyevolvinginbothdigitalandphysical spaces.Giventhediversityofthefaceattacks,prevailingmethodsfallshortindetectingattacks acrossallthreecategories( i . e .,adversarial,digitalmanipulation,andspoofs).Thisworkisamong thetothetaskoffaceattackdetectiononthe 25 attacktypesacross 3 categoriesshown here. generalizable defenseagainstcontinuouslyevolvingandsophisticatedfaceattacks. Inliterature,faceattackscanbebroadlyintothreeattackcategories:(i)Spoofat- tacks:artifactsinthe physical domain( e . g .,3Dmasks,eyeglasses,replayingvideos)[1],(ii) Adversarialattacks:imperceptiblenoisesaddedtoprobesforevadingAFRsystems[60],and(iii) Digitalmanipulationattacks:entirelyorpartiallyphoto-realisticfacesusinggenerative models[20].Withineachofthesecategories,therearedifferentattacktypes.Forexample,each spoofmedium, e . g .,3Dmaskandmakeup,constitutesoneattacktype,andthereare 13 common typesofspoofattacks[1].Likewise,inadversarialanddigitalmanipulationattacks,eachattack model,designedbyuniqueobjectivesandlosses,maybeconsideredasoneattacktype.Thus, theattackcategoriesandtypesforma 2 -layertreestructureencompassingthediverseattacks(see Fig.4.1).Suchatreewillinevitablygrowinthefuture. InordertosafeguardAFRsystemsagainsttheseattacks,numerousfaceattackdetectionap- proacheshavebeenproposed[20,21,61Œ63].Despiteimpressivedetectionrates,prevailingre- searcheffortsfocusonafewattacktypeswithin one ofthethreeattackcategories.Sincetheexact typeoffaceattackmaynotbeknown apriori ,ageneralizabledetectorthatcandefendanAFR 121 systemagainstanyofthethreeattackcategoriesisofutmostimportance. Duetothevastdiversityinattackcharacteristics,fromglossy2Dprintedphotographstoim- perceptibleperturbationsinadversarialfaces,wethatlearningasingle networkisinad- equate.Evenwhenprevailingstate-of-the-art(SOTA)detectorsaretrainedonall 25 attacktypes, theyfailtogeneralizewellduringtesting.Viaensembletraining,wecomprehensivelyevaluatethe detectionperformanceonfusingdecisionsfromthreeSOTAdetectorsthatindividuallyexcelat theirrespectiveattackcategories.However,duetothediversityinattackcharacteristics,decisions madebyeachdetectormaynotbecomplementaryandresultinpoordetectionperformanceacross all 3 categories. 
Thisresearchisamongthetofocusondetecting all 25 attacktypes knowninliterature ( 6 adversarial, 6 digitalmanipulation,and 13 spoofattacks).Ourapproachconsistsof(i)auto- maticallyclusteringattackswithsimilarcharacteristicsintodistinctgroups,and(ii)amulti-task learningframeworktolearnsalientfeaturestodistinguishbetweenbonaandcoherentattack types,whileearlysharinglayerslearnajointrepresentationtodistinguishbonafromany genericattack. Thisworkmakesthefollowingcontributions: Amongthetothetaskoffaceattackdetectionon 25 attacktypesacross 3 attack categories:adversarialfaces,digitalfacemanipulation,andspoofs. Anovel uni f ace a ttack d etectionframework,namely UniFAD ,thatautomaticallyclus- terssimilarattacksandemploysamulti-tasklearningframeworktodetectdigitalandphys- icalattacks. Proposed UniFAD achievesSOTAdetectionperformance,TDR= 94 : 73% @ 0 : 2% FDRon alargefakefacedataset,namely GrandFake .Tothebestofourknowledge, GrandFake is thelargestfaceattackdatasetstudiedinliteratureintermsofthenumberofdiverseattack types. Proposed UniFAD allowsforfurtheroftheattackcategories, i . e .,whether attacksareadversarial,digitallymanipulated,orcontainsphysicalartifacts,witha 122 StudyYear#BonaFides#Attacks#Types Adversarial UAP-D[167]2018 9 ; 95929 ; 8771 Goswami etal. [176]2019 16 ; 68550 ; 0553 Agarwal etal. [182]2020 24 ; 04272 ; 1263 Massoli etal. [179]2020 169 ; 3961 M 6 FaceGuard[19]2020 507 ; 6473 M 6 DigitalManip. Zhou etal. [214]2018 2 ; 0102 ; 0102 Yang etal. [215]2018 241( I ) = 49( V )252( I ) = 49( V )1 DeepFake[216]2018 620( V )1 FaceForensics++[216]2019 1 ; 000( V )3 ; 000( V )3 FakeSpotter[217]2019 6 ; 0005 ; 0002 DFFD[20]2020 58 ; 703240 ; 3367 Phys.Spoofs Replay-Attack[4]2012 200( V )1 ; 000( V )3 MSUMFSD[85]2015 160( V )280( V )3 OuluNPU[2]2017 990( V )3 ; 960( V )4 SiW[218]2018 1 ; 320( V )3 ; 158( V )6 SiW-M[1]2019 660( V )960( V )13 GrandFake(ours) 2021 341 ; 738447 ; 67425 Table4.1Faceattackdatasetswithno.ofbonaimages,no.ofattackimages,andno.ofattack types.Here, I denotesimagesand V referstovideos. accuracyof 97 : 37% . 4.2RelatedWork IndividualAttackDetection. Earlyworkonfaceattackdetectionprimarilyfocusedononeor twoattacktypesintheirrespectivecategories.Studiesonadversarialfacedetection[166,176] primarilyinvolveddetectinggradient-basedattacks,suchasFGSM[34],PGD[219],andDeep- Fool[141].DeepFakeswereamongthestudieddigitalattackmanipulation[214Œ216],how- ever,generalizabilityoftheproposedmethodstoalargernumberofdigitalmanipulationattack typesisunsatisfactory[220].Majorityoffaceanti-smethodsfocusonprintandreplay attacks[2,66,85,89,91,91,92,94,112,123,135,218,221,222]. 123 Overtheyears,acleartrendintheincreaseofattacktypesineachcategorycanbeobservedin Tab.4.1.Sinceacommunityofattackersdedicatetheireffortstocraftnewattacks,itisimperative tocomprehensivelyevaluateexistingsolutionsagainstalargenumberofattacktypes. JointAttackDetection. Recentstudieshaveusedmultipleattacktypesinordertodefendagainst faceattacks.For e . g .,FaceGuard[19]proposedageneralizabledefenseagainst 6 adversarialattack types.TheDiverseFakeFaceDataset(DFFD)[20]includes 7 digitalmanipulationattacktypes. Inthespoofattackcategory,recentstudiesfocusondetecting 13 spooftypes. 
Majorityoftheworkstacklingmultipleattacktypesposethedetectionasabinaryclass cationproblemwithasinglenetworklearningajointfeaturespace.Forsimplicity,wereferto suchanetworkarchitectureas JointCNN .Forinstance,itiscommoninadversarialfacedetection totrainaJointCNNwithbonafacesandadversarialattackssynthesizedbyagen- erativenetwork[12,13,19,184].Ontheotherhand,majorityoftheproposeddefensesagainst digitalmanipulation,apre-trainedJointCNN( e . g .,Xception[223])onbonafaces andallavailabledigitalmanipulationattacks[20,77,217].Duetotheavailabilityofawidevariety ofphysicalspoofartifactsinfacedatasets( e . g .,eyeglasses,printandreplayinstru- ments,masks, etc )alongwithevidentcuesfordetectingthem,studiesonanti-spoofsaremore sophisticated.TheassociatedJointCNNemployseitherauxiliarycues,suchasdepthmapand heartpulsesignals(rPPG)[88,127,218],oraficompactnessfllosstopreventov[61,224]. RecentlyStehouwer etal. [225]attempttolearnaspoofdetectorfromimageryofgenericob- jectsandapplyittofaceWhilejointlydetectingmultipleattacktypesispromising, detectingattacktypes across differentcategoriesisoftheutmostimportance.Anearlyattempt proposedadefenseagainst4attacktypes( 3 spoofsand 1 digitalmanipulation)[224].Tothebest ofourknowledge,wearethetoattemptdetecting 25 attacktypesacross 3 categories. Multi-taskLearning. Inmulti-tasklearning(MTL),atask, T i isusuallyaccompaniedbyatrain- ingdataset, D tr consistingof N t trainingsamples, i . e ., D tr = f x tr i ;y tr i g N tr i =1 ,where x tr i 2 R isthe i th trainingsamplein T i and y t i isitslabel.MostMTLmethodsrelyontasks[226Œ228]. Crawshaw etal. [229]summarizevariousworksonMLTwithCNNs.Inthiswork,wepropose 124 Figure4.2(a)Detectionperformance(TDR@ 0 : 2% FDR)indetectingeachattacktypebythe proposed UniFAD (purple)andthedifferenceinTDRfromthebestfusionscheme,LightGBM[36] (pink).(b)Cosinesimilaritybetweenmeanfeaturesfor 25 attacktypesextractedby JointCNN .(c) Examplesofattacktypesfrom 4 differentclustersvia k -meansclusteringonJointCNNfeatures. Attacktypesinpurple,blue,andreddenotespoofs,adversarial,anddigitalmanipulationattacks, respectively. aMTLframeworkinanextremesituationwhereonlyasingletaskisavailable(bonavs. 25 attacktypes)andutilize k -meansclusteringtoconstructnewauxiliarytasksfrom D tr .Arecent studyalsoutilized k -meansforconstructingnewtasks,however,theirapproachutilizesameta- learningframeworkwherethegroupsthemselvescanalterthroughouttraining[230].Weshow thatthisisproblematicforfaceattackssinceattacksthatsharesimilarcharacteristicsshouldbe trainedjointly.Instead,weproposeanewattackdetectionframeworkthatutilizes k - meanstopartitionthe 25 attackstypes,andthenlearnssharedandrepresentations todistinguishthemfrombona 4.3DissectingPrevailingDefenseSystems 4.3.1Datasets Inordertodetect 25 attacktypes( 6 adversarial, 6 digitalmanipulation,and 13 spoofs),wepropose the GrandFake dataset,anamalgamationofmultiplefaceattackdatasetsfromeachcategory.We 125 provideadditionaldetailsof GrandFake inSec.4.5.1. 4.3.2DrawbackofJointCNN Considerthediversityintheavailableattacks:fromimperceptibleadversarialperturbationsto digitalmanipulationattacks,bothofwhichareentirelydifferentfromphysicalprintattacks(hard surface,glossy,2D).Evenwithinthespoofcategory,characteristicsofmaskattacksarequite differentfromreplayattacks.Inaddition,discriminativecuesforsomeattacktypesmaybeob- servedinhigh-frequencydomain( e . g .,defocusedblurriness,chromaticmoment),whileothers exhibitlow-frequencycues( e . 
g .,colordiversityandspecularForthesereasons,learn- ingacommonfeaturespacetodiscriminateallattacktypesfrombonaischallenginganda JointCNNmayfailtogeneralizewellevenonattacktypesseenduringtraining. WedemonstratethisbytrainingaJointCNNonthe 25 attacktypesin GrandFake dataset. Wethencomputean attacksimilaritymatrix betweenthe 25 types(seeFig.4.2(b)).Themeanfea- tureforeachattacktypeiscomputedonavalidationsetcomposedof 1 ; 000 imagesperattack. Wethencomputethepairwisecosinesimilaritybetweenmeanfeaturesfromallattackpairs.From Fig.4.2,wenotethatphysicalattackshavelittlecorrelationwithadversarialattacksandtherefore, learningthemjointlywithinacommonfeaturespacemaydegradedetectionperformance. AlthoughprevailingJointCNN-baseddefenseachievenearperfectdetectionwhentrainedand evaluatedontherespectiveattacktypesinisolation,weobserveadegradedperfor- mancewhentrainedandtestedonall 3 attackcategoriestogether(seeTab.4.2).Inotherwords, evenwhenaprevailingSOTAdefensesystemistrainedonall 3 categories,itmayleadtodegraded performanceontesting. 4.3.3UnifyingMultipleJointCNNs Anotherpossibleapproachistoconsiderensembletechniques;insteadofusingasingleJointCNN detector,wecanfusedecisionsfrommultipleindividualdetectorsthatarealreadyexpertsindistin- guishingbetweenbonaandattacksfromtheirrespectiveattackcategory.GiventhreeSOTA 126 detectors,oneperattackcategory,weperformacomprehensiveevaluationonparallelandsequen- tialscore-levelfusionschemes. Inourexperiments,wethat,indeed,fusingscore-leveldecisionsfromsingle-categoryde- tectorsoutperformsasingleSOTAdefensesystemtrainedonallattacktypes.Notethateffortsin utilizingprevailingdefensesystemsrelyontheassumptionthatattackcategoriesareindependent ofeachother.However,Fig.4.2showsthatsomedigitalmanipulationattacks,suchasSTGAN andStyleGAN,are morecloselyrelated tosomeoftheadversarialattacks( e . g .,AdvFaces,GFLM, andSemantic)thanotherdigitalmanipulationtypes.Thisislikelybecauseallvemethodsuti- lizeaGANtosynthesizetheirattacksandmaysharesimilarattackcharacteristics.Therefore, aSOTAadversarialdetectorandaSOTAdigitalmanipulationdetectormayindividuallyexcelat theirrespectivecategories,butmaynotprovidecomplementarydecisionswhenfused.Insteadof trainingdetectorsongroupswithmanuallyassignedsemantics( e . g .,adversarial,digitalmanipu- lation,spoofs),itisbettertotrainJointCNNsoncoherentattacks.Inaddition,utilizingdecisions frompre-trainedJointCNNsmaytendtoovtotheattackcategoriesusedfortraining. 4.4ProposedMethod Weproposeanewmulti-tasklearningframeworkforAttackDetection,namely UniFAD , bytraininganend-to-endnetworkforimprovedphysicalanddigitalfaceattackdetection.In particular,a k -meansaugmentationmoduleisutilizedtoautomaticallyconstructauxiliarytasks toenhancesingletasklearning(suchasaJointCNN).Then,ajointmodelisdecomposedinto afeatureextractor(sharedlayers) F thatissharedacrossalltasks,andbranches foreachauxiliarytask.Fig.4.3illustratestheauxiliarytaskcreationandthetrainingprocess of UniFAD . 127 Figure4.3Anoverviewoftraining UniFAD intwostages.Stage 1 automaticallyclusterscoherent attacktypesinto T groups.Stage 2 consistsofaMTLframeworkwhereearlylayerslearngeneric attackfeatureswhile T brancheslearntodistinguishbonafromcoherentattacks. 
4.4.1 Problem Definition

Let the "main task" be defined as the overall objective of a unified attack detector: given an input image, x, assign a score close to 0 if x is bona fide or close to 1 if x is any of the available face attack types. We are also given a labeled training set, D_tr. Prevailing defenses follow a single task learning approach where the main task is adopted to be the ultimate training objective. In order to avoid the shortcomings of a JointCNN and of the unification of multiple JointCNNs, we use D_tr to automatically construct multiple auxiliary tasks {T_t}_{t=1}^{T}, where T_i is the i-th cluster of coherent attack types. If the auxiliary tasks are appropriately constructed, jointly learning these tasks along with the main task should improve attack detection compared to a single task learning approach.

4.4.2 Automatic Construction of Auxiliary Tasks

One way to construct auxiliary tasks is to train a separate binary JointCNN on each attack type. Such fine-grained partitioning massively increases the computational burden (e.g., training and testing 25 JointCNNs). Other simple partitioning methods, such as random partitioning, are likely to cluster uncorrelated attacks. On the other hand, clustering in the pixel-space is also unappealing due to the poor correlation between distances in the pixel-space, and clustering in the high-dimensional space is challenging [231]. Therefore, we require a reasonable alternative to manual inspection of the attack similarity matrix in Fig. 4.2 to partition the attack types into appropriate clusters.

Fortunately, we already have a JointCNN trained via a single task learning framework that can extract salient representations. Thus, we can map the data {x} into the JointCNN's embedding space Z, producing {z}. We can then utilize a traditional clustering algorithm, k-means, which takes a set of feature vectors as input and clusters them into k distinct groups based on a geometric constraint. Specifically, for each attack type, we compute the mean feature. We then utilize k-means clustering to partition the L features into T (T ≤ L) sets, P = {P_1, P_2, ..., P_T}, such that the within-cluster sum of squares (WCSS) is minimized,
$$\underset{\mathcal{P}}{\arg\min}\ \sum_{i=1}^{T}\sum_{z\in\mathcal{P}_i}\|z - \mu_i\|^2,$$ (4.4.1)
where z represents a mean feature for an attack type and μ_i is the mean of the features in P_i. Fig. 4.2(c) shows an example of clustering the 25 attack types of GrandFake.

4.4.3 Multi-Task Learning with Constructed Tasks

With a multi-task learning framework, we learn coherent attack types jointly, while uncorrelated attacks are learned in their own feature spaces. We construct T "branches" where each branch is a neural network trained on a binary classification problem (i.e., an auxiliary task). The learning objective of each branch, B_t, is to minimize
$$\mathcal{L}_{aux_t} = \mathbb{E}_x\left[\log \mathcal{B}_t(x_{bf})\right] + \mathbb{E}_x\left[\log\left(1 - \mathcal{B}_t\left(x_{fake}^{\mathcal{P}_t}\right)\right)\right],$$ (4.4.2)
where x_bf denotes bona fide images and x_fake^{P_t} denotes face attacks corresponding to the attack types in the partition P_t.

4.4.4 Parameter Sharing

Early Sharing. We adopt a hard parameter sharing module which learns a common feature representation for distinguishing between bona fides and attacks prior to the auxiliary task learning branches. Baxter [232] demonstrated that the shared parameters have a lower risk of overfitting than the task-specific parameters. Therefore, adopting early convolutional layers as a pre-processing step prior
Theearlysharedlayersandthedecisionlayerarelearnedviaabinarycross-entropyloss, L shared = E x [ log FC ( x bf )]+ E x [ log (1 FC ( x fake ))] ; (4.4.3) betweenbonaandallavailableattacktypes. 4.4.5TrainingandTesting Theentirenetworkistrainedinanend-to-endmannerbyminimizingthefollowingcompositeloss, L UniFAD = L shared + T X t =1 L aux t : (4.4.4) The L shared lossisbackpropagatedthroughout UniFAD ,while L aux t isonlyresponsibleforupdat- ingtheweightsofthebranch, B t ,andthelayer.Fortheforwardandbackward passesof L shared ,anequalnumberofbonaandattacksamplesareusedfortraining.Onthe otherhand,fortrainingeachbranch, B t ,wesampletheequalnumberofbonaandequal numberofattackimagesfromtheattackpartition P t . AttackDetection. Duringtesting,animagepassesthroughthesharedlayersandtheneach branchof UniFAD outputsadecisionwhethertheimageisbona(valuescloseto 0 )oranattack (closeto 1 ).Thedecisionlayeroutputsthedecisionscore.Unlessstatedotherwise,we usethedecisionscorestoreportperformance. Attack Onceanattackisdetected, UniFAD canautomaticallyclassifytheattack 130 typeandcategory.Forall L attacktypesinthetrainingset,weextractintermediate 128 -dim featurevectorsfrom T branches.Thefeaturesarethenconcatenatedandthemeanfeatureacross all L attacktypesiscomputed,suchthat,wehave L featurevectorsofsize T 128 .Foradetected attack,Cosinesimilarityiscomputedbetweenthetestingsample'sfeaturevectorandthemean trainingfeaturesfor L types.Thepredictedattacktypeistheonewiththehighestsimilarityscore. 4.5ExperimentalResults 4.5.1ExperimentalSettings Dataset. GrandFake consistsof 25 faceattacksfrom 3 attackcategories.Bothbonaandfake facesareofvaryingqualityduetodifferentcaptureconditions. BonaFideFaces. WeutilizefacesfromCASIA-WebFace[10],LFW[8],CelebA[17],SiW- M[1],andFFHQ[18]datasetssincethefacesthereincoverabroadvariationinrace,age,gender, pose,illumination,expression,resolution,andacquisitionconditions. AdversarialFaces. WecraftadversarialfacesfromCASIA-WebFace[10]andLFW[8]via 6 SOTAadversarialattacks:FGSM[34],PGD[219],DeepFool[141],AdvFaces[16],GFLM[32], andSemanticAdv[186].TheseattackswerechosenfortheirsuccessinevadingSOTAAFRsys- temssuchasArcFace[9]. DigitalManipulation. Therearefourbroadtypesofdigitalfacemanipulation:identityswap, expressionswap,attributemanipulation,andentirelysynthesizedfaces[20].Weuseallclipsfrom FaceForensics++[77],includingidentityswapbyFaceSwapandDeepFake 1 ,andexpressionswap byFace2Face[76].WeutilizetwoSOTAmodels,StarGAN[78]andSTGAN[79],togenerate attributemanipulatedfacesinCeleba[17]andFFHQ[18].WeusethepretrainedStyleGAN2 model 2 tosynthesize 100 Kfakefaces. PhysicalSpoofs. 
Weutilizethepubliclyavailable SiW-M dataset[1],comprising 13 spoof 1 https://github.com/deepfakes/faceswap 2 https://github.com/NVlabs/stylegan2 131 TDR(%)@0.2%FDRYearProposedForAdv.Dig.Man.Phys.OverallTime(ms) w/oRe-train FaceGuard[19] 2020 Adversarial 99 : 9122 : 2800 : 5829 : 6401 : 41 FFD[20] 2020 DigitalManipulation 09 : 4994 : 5701 : 2534 : 5511 : 57 SSRFCN[21] 2020 Spoofs 00 : 2500 : 7693 : 1922 : 7102 : 22 MixNet[233] 2020 Spoofs 00 : 3609 : 8378 : 2121 : 1212 : 47 Baselines FaceGuard[19] 2020 Adversarial 99 : 8641 : 5604 : 3556 : 6901 : 41 FFD[20] 2020 DigitalManipulation 76 : 0691 : 3287 : 4368 : 2511 : 57 SSRFCN[21] 2020 Spoofs 08 : 2327 : 6789 : 1943 : 2602 : 22 One-class[61] 2020 Spoofs 04 : 8145 : 9679 : 3239 : 4007 : 92 MixNet- UniFAD 2021 All 82 : 3391 : 5994 : 6090 : 0712 : 47 FusionSchemes Cascade[40] 88 : 3981 : 9869 : 1977 : 4605 : 16 Min-score 03 : 6511 : 0800 : 4307 : 2216 : 14 Median-score 10 : 8742 : 3347 : 1939 : 4816 : 12 Mean-score 14 : 5347 : 1861 : 3238 : 2316 : 12 Max-score 85 : 3261 : 9356 : 8773 : 8916 : 13 Sum-score 74 : 9358 : 0150 : 3469 : 2116 : 11 LightGBM[36] 76 : 2581 : 2888 : 5285 : 9717 : 92 ProposedUniFAD 2021 All 92 : 5697 : 2198 : 7694 : 7302 : 59 Table4.2Detectionaccuracy(TDR(%)@ 0 : 2% FDR)on GrandFake dataset.Resultsonfusing FaceGuard[19],FFD[20],andSSRFCN[21]arealsoreported.Wereporttimetakentodetecta singleimage(onaNvidia2080TiGPU). types.Comparedwithotherspoofdatasets(Tab.4.1),SiW-Misdiverseinspoofattacktypes, environmentalconditions,andfaceposes. Protocol. Asiscommonpracticeinfacerecognitionliterature,bonaandattacksfrom CASIA-WebFace[10]areusedfortraining,whilebonaandattacksforLFW[8]arese- questeredfortesting.Thebonaandattacksfromotherdatasetsaresplitinto 70% training, 5% validation,and 25% testing. Implementation. UniFAD isimplementedinTw,andtrainedwithaconstantlearning rateof (1 e 3 ) withamini-batchsizeof 180 .Theobjectivefunction, L UniFAD ,isminimized usingAdamoptimizerfor 100 K iterations.Dataaugmentationduringtraininginvolvesrandom horizontalwithaprobabilityof 0 : 5 . Metrics. Studiesondifferentattackcategoriesprovidetheirownmetrics.Followingtherec- ommendationfromIARPAODINprogram,wereporttheTDR@ 0 : 2% FDR 3 andtheoverall detectionaccuracy(inAddendum(Sec.4.6). 3 https://www.iarpa.gov/index.php/research-programs/odin 132 4.5.2ComparisonwithIndividualSOTADetectors Inthissection,wecomparetheproposed UniFAD toprevailingfaceattackdetectorsviapublicly availablerepositoriesprovidedbytheauthors(seeAddendum(Sec.4.6). WithoutRe-training. InTab.4.2,wereporttheperformanceof 4 pre-trainedSOTAde- tectors.Thesebaselineswerechosensincetheyreportthebestdetectionperformanceindatasets withlargestnumbersofattacktypesintheirrespectivecategories(seeTab.4.1).Wealsoreport theperformanceofageneralizablespoofdetectorwithsub-networks,namely MixNet .MixNet semanticallyassignsspoofsintogroups(print,replay,andmasks)fortrainingeachsub-network withoutanysharedrepresentation.Wethatpre-trainedmethodsindeedexcelintheir attackcategories,however,generalizationperformanceacrossall 3 categoriesdeterioratescatas- trophically. WithRe-training. Afterre-trainingthe 4 SOTAdetectorsonall 25 attacktypes,wethat theygeneralizebetteracrosscategories.FaceGuard[19],FFD[20],SSRFCN[21],andOne- Class[61],allemployaJointCNNfordetectingattacks.Unsurprisingly,thesedefensesperform wellonsomeattackcategories,whilefailingonothers. 
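The TDR @ 0.2% FDR metric used throughout this chapter can be computed directly from detector scores; the sketch below chooses the threshold from bona fide scores and reports the fraction of attacks detected. The score arrays are placeholders, and the helper name tdr_at_fdr is ours, not from the released code.

```python
import numpy as np

def tdr_at_fdr(bonafide_scores, attack_scores, fdr=0.002):
    """True Detection Rate at a fixed False Detection Rate.

    Scores follow the convention used above: higher means more likely an
    attack. The threshold is chosen so that at most `fdr` of the bona fide
    images are falsely flagged as attacks."""
    bonafide_scores = np.sort(np.asarray(bonafide_scores))
    idx = int(np.ceil((1.0 - fdr) * len(bonafide_scores))) - 1
    threshold = bonafide_scores[min(max(idx, 0), len(bonafide_scores) - 1)]
    return 100.0 * np.mean(np.asarray(attack_scores) > threshold)

bona = np.random.uniform(0.0, 0.5, 100000)    # placeholder bona fide scores
attack = np.random.uniform(0.3, 1.0, 100000)  # placeholder attack scores
print(f"TDR @ 0.2% FDR: {tdr_at_fdr(bona, attack):.2f}%")
```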
Forafaircomparison,wealsomodifyMixNet,namely MixNet-UniFAD suchthatclustersare assignedvia k -meanswith 4 branches.Incontrastto MixNet-UniFAD , UniFAD (i)employsearly sharedlayersforgenericattackcues,and(ii)eachbranchlearnstodistinguishbetweenbona andattacktypes.MixNet,ontheotherhand,assignsabonalabel( 0 )toattacktypes outsidearespectivebranch'spartition.Thisnegativelyimpactsnetworkconvergence.Overall,we that UniFAD outperforms MixNet-UniFAD withTDR 90 : 07% ! 94 : 73% @ 0 : 2% FDR. 4.5.3ComparisonwithFusedSOTADetectors WealsocomprehensivelyevaluatedetectionperformanceonfusingSOTAdetectors.Weutilize threebestperformingdetectorsfromeachattackcategory,namelyFaceGuard[19],FFD[20],and SSRFCN[21].InspiredbytheViola-Jonesobjectdetection[40],weadoptasequentialensemble 133 Figure4.4Confusionmatrixrepresentingtheaccuracyof UniFAD inidentifyingthe 25 attacktypes.Majorityofoccurwithintheattackcategory.Darkervalues indicatehigheraccuracy.Overall, UniFAD achieves 75 : 81% and 97 : 37% accuracyin identifyingattacktypesandcategories,respectively.Purple,blue,andreddenotespoofs,adver- sarial,anddigitalmanipulationattacks,respectively. technique,namelyCascade[40],whereaninputprobeispassedthrougheachdetectorsequen- tially.Wealsoevaluate 5 parallelscorefusionrules(min,mean,median,max,andsum)and aSOTAensembletechnique,namelyLightGBM[36].MoredetailsareprovidedinAddendum (Sec.4.6.Indeed,weobserveanoverheadindetectionspeedcomparedtotheindividualdetec- torsinisolation,however,cascade,max-scorefusionandLightGBM[36]canenhancetheoverall detectionperformancecomparedtotheindividualdetectorsatthecostofslowerinferencespeed. Sincetheindividualdetectorsstilltrainwithincoherentattacktypes,wethatproposed UniFAD outperformsalltheconsideredfusionschemes. InFig.4.2(a),weshowtheperformancedegradationofLightGBM[36],thebestfusingbase- line,w.r.t. UniFAD .Weobservethatamong 4 clusters,thelast 2 havetheoveralllargestdegrada- tion.Interestingly,these 2 clustersaretheonlyonesincludingattacktypesacrossdifferentattack categories,learnedviaour k -meanclustering.Inotherwords,thecross-categoryattackstypes withinabrancheachother,leadingtothelargestperformancegainover[36].Thisfurther demonstratesthenecessityandimportanceofadetectionschemeŠthemoreattacktypes 134 thedetectorsees,themorelikelyitwouldnourishamongeachotherandbeabletogeneralize. 4.5.4Attack WeclassifytheexactattacktypeandcategoriesusingthemethoddescribedinSec.4.4.5.In Fig.4.4,wethat UniFAD canpredicttheattacktypewith 75 : 81% accuracy. Whilepredictingtheexacttypemaybechallenging,wehighlightthatmajorityofthemisclassi- occurswithinattack'scategory.Withouthumanintervention,once UniFAD isdeployed inAFRpipelines,itcanpredictwhetheraninputimageisadversarial,digitallymanipulated,or containsspoofartifactswith 97 : 37% accuracy. 4.5.5AnalysisofUniFAD Architecture. Weablateandanalyzeourarchitecture. RatioofSharedLayers. Ourbackbonenetworkconsistsofa 4 -layerCNN.InFig.4.5a,we reportthedetectionperformancewhenweincorporate 0 , 1 ( 25% ), 2 ( 50% ), 3 ( 75% ),and 4 ( 100% ) layersforearlysharing.Weobserveatrade-offbetweendetectionperformanceandthenumberof earlylayers:toomanyreducestheeffectsoflearningfeaturesviabranching,whereas, lessnumberofsharedlayersinhibitsthenetworkfromlearninggenericfeaturesthatdistinguish anyattackfrombonaWethatanevensplitresultsinsuperiordetectionperformance. NumberofBranches. 
In Fig. 4.2(a), we show the performance degradation of LightGBM [36], the best fusion baseline, w.r.t. UniFAD. We observe that, among the 4 clusters, the last 2 have the largest overall degradation. Interestingly, these 2 clusters are the only ones including attack types across different attack categories, learned via our k-means clustering. In other words, the cross-category attack types within a branch benefit each other, leading to the largest performance gain over [36]. This further demonstrates the necessity and importance of a unified detection scheme: the more attack types the detector sees, the more likely they are to nourish one another and the better the detector generalizes.

4.5.4 Attack Classification

We classify the exact attack type and category using the method described in Sec. 4.4.5. In Fig. 4.4, we find that UniFAD can predict the attack type with 75.81% classification accuracy. While predicting the exact type may be challenging, we highlight that the majority of the misclassifications occur within the attack's own category. Without human intervention, once UniFAD is deployed in AFR pipelines, it can predict whether an input image is adversarial, digitally manipulated, or contains spoof artifacts with 97.37% accuracy.

Figure 4.4 Confusion matrix representing the classification accuracy of UniFAD in identifying the 25 attack types. The majority of misclassifications occur within the attack category. Darker values indicate higher accuracy. Overall, UniFAD achieves 75.81% and 97.37% classification accuracy in identifying attack types and categories, respectively. Purple, blue, and red denote spoofs, adversarial, and digital manipulation attacks, respectively.

4.5.5 Analysis of UniFAD

Architecture. We ablate and analyze our architecture.

Ratio of Shared Layers. Our backbone network consists of a 4-layer CNN. In Fig. 4.5a, we report the detection performance when we incorporate 0, 1 (25%), 2 (50%), 3 (75%), and 4 (100%) layers for early sharing. We observe a trade-off between detection performance and the number of early layers: too many shared layers diminish the effect of learning branch-specific features, whereas too few inhibit the network from learning generic features that distinguish any attack from bona fides. We find that an even split results in superior detection performance.

Number of Branches. In Fig. 4.5b, we vary the number of branches (auxiliary tasks constructed via k-means) and report the detection performance. Indeed, increasing the number of branches via additional clusters enhances detection performance. However, the performance saturates after 4 branches. UniFAD with 4 branches achieves TDR = 94.73% @ 0.2% FDR, whereas 5 and 6 branches achieve TDRs of 94.33% and 94.62% @ 0.2% FDR, respectively. For T = 5, we notice that StyleGAN is isolated from Cluster 3 (see Fig. 4.2(c)) into a separate cluster. Learning to discriminate StyleGAN separately may offer no significant advantage over learning it jointly with FGSM and PGD. Thus, we choose T = 4 due to lower network complexity and higher inference efficiency.
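The branches above correspond to auxiliary tasks obtained by k-means clustering of the 25 attack types. The sketch below illustrates one way such a partition can be computed; it is a simplified illustration (not our training code) and assumes attack_feats maps each attack type to a representative feature vector, e.g., a mean embedding produced by the early shared layers.

import numpy as np
from sklearn.cluster import KMeans

def cluster_attack_types(attack_feats, n_branches=4, seed=0):
    """Group attack types into n_branches coherent auxiliary tasks via k-means."""
    names = sorted(attack_feats)
    feats = np.stack([attack_feats[name] for name in names])
    labels = KMeans(n_clusters=n_branches, random_state=seed).fit_predict(feats)
    partition = {b: [] for b in range(n_branches)}
    for name, b in zip(names, labels):
        partition[b].append(name)
    return partition  # attack types assigned to each branch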
Figure 4.5 Detection performance with respect to varying ratio of shared layers (left) and number of branches (right). Our proposed architecture uses 50% shared layers with 4 branches.

Model             Shared Layers  Branching  k-Means  Overall TDR (%) @ 0.2% FDR
JointCNN          X                                  63.89
B_Semantic                       X                   86.17
B_Random                         X                   53.95 ± 8.02
B_kMeans                         X          X        89.67
Shared Semantic   X              X                   92.44
Proposed          X              X          X        94.73

Table 4.3 Ablation study over components of UniFAD. Branching via "B_Semantic", "B_Random", and "B_kMeans" refers to partitioning attack types by their semantic categories, randomly, and via k-means, respectively. "Shared Semantic" includes shared layers prior to branching.

Branch Generalizability. In Fig. 4.6, scores from the 4 branches are used to compute the detection performance on attack types within the respective partitions and on those outside a branch's partition (see Fig. 4.2(b)). Since attack types outside a branch's partition are purportedly incoherent, we see a drop in performance, validating the drawback of JointCNN. We find that the lowest performing branch, Branch 4, also exhibits the best generalization performance across the other attack types. This is likely because learning to distinguish bona fides from the imperceptible perturbations of FGSM and PGD and the minute synthetic noises of StyleGAN yields a tighter decision boundary, which may contribute to better generalization across digital attacks. Branch 1, in contrast, does not directly aid in detecting digital attacks.

Figure 4.6 Detection performance on attack types within and outside a branch's partition. Performance drops on attacks outside the partition as they may not have any correlation with within-partition attack types.

Ablation Study. In Tab. 4.3, we conduct a component-wise ablation study over UniFAD. We study different partitioning techniques to group the 25 attack types. We employ semantic partitioning, B_Semantic, where attack types are clustered into the 3 categories. Another technique is to split the 25 attack types into 4 clusters randomly, B_Random; we report the mean and standard deviation across 3 trials of random splitting. We also report the performance of clustering via k-means. We find that both B_Semantic and B_kMeans outperform JointCNN. Thus, learning separate feature spaces via MTL for disjoint attack types can improve overall detection compared to a JointCNN. We also find that incorporating early shared layers into B_Semantic, namely B_SharedSemantic, can further improve detection from 86.17% to 92.44% TDR @ 0.2% FDR. However, as we observed in Fig. 4.2, even within semantic categories, some attack types may be incoherent. By automatic construction of auxiliary tasks with k-means clustering and a shared representation (Proposed), we can further enhance the detection performance to TDR = 94.73% @ 0.2% FDR.

Failure Cases. Fig. 4.7 shows a few failure cases. The majority of the failure cases for digital attacks are due to imperceptible perturbations. In contrast, failure to detect spoofs can likely be attributed to the subtle nature of transparent masks, blurring, and illumination changes.

Figure 4.7 Example cases where UniFAD fails to detect face attacks. Final detection scores along with scores from each of the four branches (in [0, 1]) are given below each image. Scores closer to 0 indicate bona fides. Branches responsible for the respective cluster are highlighted in bold.

4.6 Addendum

Here, we provide additional details on the proposed GrandFake dataset, UniFAD, and the baselines.

4.6.1 GrandFake Dataset

The GrandFake dataset is composed of several widely adopted face datasets for face recognition, face attribute manipulation, and face synthesis. Details on the source datasets along with the training and testing splits are provided in Tab. 4.4. We ensure that there is no identity overlap between any of the training and testing splits as follows:

1. We removed 84 subjects in CASIA-WebFace that overlap with LFW. In addition, CASIA-WebFace is only used for training, while LFW is only used for testing.

2. SiW-M comprises high-resolution photos of non-celebrity subjects without any identity overlap with other datasets. Training and testing splits are composed of videos pertaining to different identities.

3. Identities in the CelebA training set are different from those in testing.

4. FFHQ comprises high-resolution photos of non-celebrity subjects on Flickr. FFHQ is utilized solely for bona fides in order to add diversity to the quality of face images.

4.6.2 Implementation Details

All the models in this chapter are implemented using TensorFlow r1.12. A single NVIDIA GeForce GTX 2080 Ti GPU is used for training UniFAD on the GrandFake dataset.

Preprocessing. All face images are passed through the MTCNN face detector [41] to detect 5 facial landmarks (two eyes, nose, and two mouth corners). Then, a similarity transformation is used to normalize the face images based on the five landmarks. After transformation, the images are resized to 160 × 160. Before passing into UniFAD and the baselines, each pixel in the RGB image is normalized to [-1, 1] by subtracting 128 and dividing by 128. All the testing images in this chapter are from the identities in the test dataset.

Network Architecture. The backbone network, JointCNN, comprises a 4-layer binary CNN: d32, d64, d128, d256, fc128, fc1, where dk denotes a 4 × 4 convolutional layer with k filters and stride 2, and fcN refers to a fully-connected layer with N neuron outputs. The proposed UniFAD with 4 branches is composed of: Early Layers: d32, d64; Branch 1: d128, d256, fc128, fc1; Branch 2: d128, d256, fc128, fc1; Branch 3: d128, d256, fc128, fc1; Branch 4: d128, d256, fc128, fc1.
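For concreteness, the layer notation above can be realized as in the following sketch. This is an illustrative Keras re-implementation rather than our original TensorFlow r1.12 code; the activation functions and the sigmoid output are assumptions, and only the layer sizes and strides follow the description above.

import tensorflow as tf
from tensorflow.keras import layers

def d(k):
    # dk block: 4 x 4 convolution with k filters and stride 2 (activation assumed).
    return layers.Conv2D(k, kernel_size=4, strides=2, padding="same", activation="relu")

def build_unifad(n_branches=4, input_shape=(160, 160, 3)):
    inp = layers.Input(shape=input_shape)
    # Early shared layers: d32, d64.
    x = d(32)(inp)
    x = d(64)(x)
    outputs = []
    for _ in range(n_branches):
        # Each branch: d128, d256, fc128, fc1 (per-branch attack score).
        b = d(128)(x)
        b = d(256)(b)
        b = layers.Flatten()(b)
        b = layers.Dense(128, activation="relu")(b)
        outputs.append(layers.Dense(1, activation="sigmoid")(b))
    return tf.keras.Model(inputs=inp, outputs=outputs)

# model = build_unifad()  # four per-branch scores for a 160 x 160 x 3 face image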
                          #Total Samples  #Training  #Validation  #Testing
Bona Fides
  CASIA [10]              10,575          10,075     500          0
  CelebA [17]             193,506         134,954    500          58,052
  LFW [8]                 9,164           0          0            9,164
  FFHQ [18]               20,999          14,200     500          6,299
  SiW-M [1]               107,494         74,746     500          32,248
  Total                   341,738         233,975    2,000        105,763
Attacks
 Adversarial
  FGSM                    19,739          10,495     80           9,164
  PGD                     19,739          10,495     80           9,164
  DeepFool                19,739          10,495     80           9,164
  AdvFaces                19,739          10,495     80           9,164
  GFLM                    17,946          8,702      80           9,164
  Semantic                19,739          10,495     80           9,164
 Digital Manipulation
  DeepFake                18,165          10,393     80           7,692
  Face2Face               18,204          10,385     80           7,739
  FaceSwap                14,492          8,299      80           6,113
  STGAN                   29,983          9,903      80           20,000
  StarGAN                 45,473          10,406     80           34,987
  StyleGAN                76,604          10,919     80           65,605
 Spoofs
  Cosmetic                2,638           1,766      80           792
  Impersonation           9,184           6,348      80           2,756
  Obfuscation             3,611           2,447      80           1,084
  HalfMask                10,486          7,260      80           3,146
  Mannequin               5,287           3,620      80           1,587
  PaperMask               2,550           1,705      80           765
  Silicone                5,038           3,446      80           1,512
  Transparent             11,451          7,935      80           3,436
  Print                   10,530          7,290      80           3,160
  PaperCut                13,178          9,144      80           3,954
  FunnyEye                23,470          16,349     80           7,041
  PaperGlass              18,563          12,914     80           5,569
  Replay                  12,126          8,408      80           3,638
  Total                   447,674         210,114    2,000        235,560

Table 4.4 Composition and statistics for the proposed GrandFake dataset. We also include the evaluation protocol for the seen attack scenario.

4.6.3 Digital Attack Implementation

Adversarial attacks are synthesized via publicly available author codes:

FGSM/PGD/DeepFool: https://github.com/tensorflow/cleverhans
AdvFaces: https://github.com/ronny3050/AdvFaces
GFLM: https://github.com/alldbi/FLM
SemanticAdv: https://github.com/AI-secure/SemanticAdv

Digital manipulation attacks are also generated via publicly available author codes:

DeepFake/Face2Face/FaceSwap: https://github.com/ondyari/FaceForensics/tree/original
STGAN: https://github.com/csmliu/STGAN
StarGAN: https://github.com/yunjey/stargan
StyleGAN-v2: https://github.com/NVlabs/stylegan2

4.6.4 Baseline Implementation

Individual Detectors. We evaluate all individual defense methods, except MixNet [233], via publicly available repositories provided by the authors. We provide the public links to the author codes below:

FaceGuard [19]: https://github.com/ronny3050/FaceGuard
FFD [20]: https://github.com/JStehouwer/FFD_CVPR2020
SSR-FCN [21]: https://github.com/ronny3050/SSRFCN
One-Class [61]: https://github.com/anjith2006/bob.paper.oneclass_mccnn_2019

Fusion of JointCNNs. In our work, we employ 5 parallel score-level fusion rules. For a testing input image, we extract three scores (in [0, 1]) from the three SOTA individual detectors (FaceGuard [19], FFD [20], and SSR-FCN [21]). The decision score is computed via the fusion rule operator, namely min, mean, median, max, or sum. LightGBM [36] is a tree-ensemble learning method where a Gradient Boosted Decision Tree is trained on the three scores (from the individual SOTA detectors) to output the final decision. We use Microsoft's LightGBM implementation: https://github.com/microsoft/LightGBM. We train the LightGBM model on the training set of GrandFake.
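A minimal sketch of this LightGBM score fusion is shown below. It is illustrative only: the hyperparameters are assumptions, and train_scores / train_labels stand for the (N, 3) detector-score matrix and the bona fide (0) / attack (1) labels from the GrandFake training set.

import numpy as np
import lightgbm as lgb

def train_score_fusion(train_scores, train_labels):
    """Train a gradient-boosted tree on the three detector scores."""
    model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)  # assumed settings
    model.fit(np.asarray(train_scores), np.asarray(train_labels))
    return model

# fused = model.predict_proba(test_scores)[:, 1]  # fused decision score in [0, 1]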
Method              Year  Proposed For          Metric  Adv.   Dig. Man.  Phys.  Overall
w/o Re-train
  FaceGuard [19]    2020  Adversarial           TDR     99.91  22.28      00.58  29.64
                                                Acc.    99.78  67.12      51.02  71.03
  FFD [20]          2020  Digital Manipulation  TDR     09.49  94.57      01.25  34.55
                                                Acc.    56.42  97.87      53.42  75.29
  SSR-FCN [21]      2020  Spoofs                TDR     00.25  00.76      93.19  22.71
                                                Acc.    50.01  50.93      96.12  69.11
  MixNet [233]      2020  Spoofs                TDR     00.36  09.83      78.21  21.12
                                                Acc.    50.43  55.98      85.47  61.26
Baselines
  FaceGuard [19]    2020  Adversarial           TDR     99.86  41.56      04.35  56.69
                                                Acc.    99.71  71.23      54.06  81.88
  FFD [20]          2020  Digital Manipulation  TDR     76.06  91.32      87.43  68.25
                                                Acc.    87.15  93.40      91.37  89.06
  SSR-FCN [21]      2020  Spoofs                TDR     08.23  27.67      89.19  43.26
                                                Acc.    54.02  69.18      90.91  83.41
  One-class [61]    2020  Spoofs                TDR     04.81  45.96      79.32  39.40
                                                Acc.    53.99  64.08      86.65  80.74
  MixNet-UniFAD     2021  All                   TDR     82.33  91.59      94.60  90.07
                                                Acc.    89.32  94.50      96.18  93.19
Fusion Schemes
  Cascade [40]      -     -                     TDR     88.39  81.98      69.19  77.46
                                                Acc.    91.33  89.17      79.92  85.16
  Min-score         -     -                     TDR     03.65  11.08      00.43  07.22
                                                Acc.    51.61  66.76      50.88  55.62
  Median-score      -     -                     TDR     10.87  42.33      47.19  39.48
                                                Acc.    55.12  59.58      57.44  59.22
  Mean-score        -     -                     TDR     14.53  47.18      61.32  38.23
                                                Acc.    55.69  54.19      73.87  55.92
  Max-score         -     -                     TDR     85.32  61.93      56.87  73.89
                                                Acc.    89.26  68.11      60.08  69.43
  Sum-score         -     -                     TDR     74.93  58.01      50.34  69.21
                                                Acc.    83.85  67.48      64.72  73.10
  LightGBM [36]     -     -                     TDR     76.25  81.28      88.52  85.97
                                                Acc.    84.19  89.46      94.56  90.56
Proposed
  UniFAD            2021  All                   TDR     92.56  97.21      98.76  94.73
                                                Acc.    95.18  98.32      98.96  96.89

Table 4.5 Detection performance (TDR (%) @ 0.2% FDR and Accuracy (%)) on the GrandFake dataset under the seen attack scenario.

4.6.5 Seen Attacks

Tab. 4.5 reports the detection performance (TDR (%) @ 0.2% FDR and accuracy (%)) of UniFAD and the baselines on the GrandFake dataset under the seen attack scenario. Training and testing splits are provided in Tab. 4.4. Overall, UniFAD outperforms all fusion schemes and baselines.

Figure 4.8 Training and testing splits for the generalizability study.

Method            Metric (%)  Fold 1  Fold 2  Fold 3  Mean   Std.
FaceGuard [19]    TDR         41.38   54.19   36.82   44.13  9.01
                  Acc.        58.42   64.19   55.74   59.45  4.32
FFD [20]          TDR         53.19   62.45   52.94   56.20  5.42
                  Acc.        66.15   69.33   67.86   67.78  1.59
SSR-FCN [21]      TDR         49.10   64.92   61.18   58.84  8.26
                  Acc.        60.07   72.77   69.83   66.57  6.64
MixNet-UniFAD     TDR         67.19   73.18   72.74   71.04  3.33
                  Acc.        75.64   79.40   78.73   77.93  2.00
LightGBM          TDR         51.65   65.73   67.91   61.76  8.83
                  Acc.        69.34   73.66   75.80   72.93  3.29
Proposed UniFAD   TDR         76.18   83.19   82.67   80.68  3.91
                  Acc.        85.35   89.62   85.88   86.95  2.32

Table 4.6 Generalization performance (TDR (%) @ 0.2% FDR and Accuracy (%)) on the GrandFake dataset under the unseen attack setting. Each fold comprises 8 unseen attacks from all 4 branches.

4.6.6 Generalizability to Unseen Attacks

Under this setting, we evaluate the generalization performance on 3 folds (see Fig. 4.8). The folds are computed as follows: we hold out 1/3 of the total attack types in a branch for testing and the remaining are used for training. For example, branch 1, consisting of 13 attack types, is randomly split such that we test on 4 unseen attack types, while the remaining 9 attack types are used for training. We perform 3 folds of such random splitting. In total, each fold consists of 17 seen and 8 unseen attacks. For LightGBM, we utilize scores from FaceGuard [19], FFD [20], and SSR-FCN [21], which are all trained only on the known attack types. We report the detection performance and the average and standard deviation across all folds in Tab. 4.6.

We find that branching-based methods, such as MixNet-UniFAD and the proposed UniFAD, outperform JointCNN-based methods such as FaceGuard [19], FFD [20], SSR-FCN [21], and LightGBM (a fusion of the three). The superiority of the proposed UniFAD under unseen attacks is evident. By incorporating branches with coherent attacks, removing some attack types within a branch does not drastically affect the generalization performance.

In addition to superior generalization performance on unseen attack types, the proposed UniFAD also reduces the gap between seen and unseen attacks. Overall, the proposed UniFAD achieves 94.73% and 80.68% TDR @ 0.2% FDR under the seen and unseen attack scenarios, respectively. That is, we observe a relative reduction in TDR of 15% under unseen attacks, compared to the second best method, namely MixNet-UniFAD, which has a relative reduction in TDR of 22% under unseen attacks.
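The fold construction described above can be sketched as follows. This is a simplified illustration (not the exact protocol code) and assumes branch_partition maps each of the 4 branches to its list of attack types, as produced by the k-means step.

import random

def make_fold(branch_partition, seed=0):
    """Hold out roughly one third of the attack types in every branch for testing."""
    rng = random.Random(seed)
    seen, unseen = [], []
    for types in branch_partition.values():
        types = list(types)
        rng.shuffle(types)
        n_unseen = max(1, len(types) // 3)  # at least one unseen type per branch (assumption)
        unseen.extend(types[:n_unseen])
        seen.extend(types[n_unseen:])
    return seen, unseen  # e.g., 17 seen and 8 unseen attack types per fold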
4.6.7 Attack Category Classification

In Fig. 4.9, we find that UniFAD can predict the attack category with 97.37% classification accuracy. We emphasize that the majority of the misclassifications occur within the digital attack space, that is, misclassifying adversarial attacks as digital manipulation attacks and vice-versa. Among spoofs, 1.7% of them are misclassified as digital manipulation attacks. The majority of these are makeup attacks, which have some correlation with digital manipulation attacks such as DeepFake, Face2Face, and FaceSwap (see Fig. 2 in the main paper). We posit that this is likely because cosmetic and impersonation attacks apply makeup to the eyebrows and cheeks, which may appear similar to ID-swapping methods such as DeepFake and Face2Face that also majorly alter the eyebrows and cheeks.

Figure 4.9 Confusion matrix representing the classification accuracy of UniFAD in identifying the 3 attack categories, namely adversarial faces, digital face manipulation, and spoofs. The majority of the confusion occurs within digital attacks (adversarial and digital manipulation attacks).

4.7 Summary

With new and sophisticated attacks being crafted against AFR systems in both digital and physical spaces, detectors need to be robust across all 3 categories. Prevailing methods indeed excel at detecting attacks in their respective categories; however, they fall short in generalizing across categories. While ensemble techniques can enhance the overall performance, they still fail to meet the desired accuracy levels. Poor generalization can be predominantly attributed to learning incoherent attacks jointly. With a new multi-task learning framework along with k-means augmentation, the proposed UniFAD achieved SOTA detection performance (TDR = 94.73% @ 0.2% FDR) on 25 face attacks across 3 categories. UniFAD can further identify attack categories with 97.37% accuracy.

Chapter 5

Summary

This dissertation has addressed some important challenges that plague prevailing automated face recognition systems today. Our primary contribution lies in enhancing the robustness and security of any commodity face recognition system against malicious attacks such as presentation attacks (physically crafted spoofs), adversarial faces (imperceptible noises added to the input probe), and digital face manipulation attacks (identity and expression swapping, attribute manipulation, and entire face synthesis). All proposed solutions achieve state-of-the-art detection performance while maintaining high computational efficiency (< 4 ms on an Nvidia GTX 2080 Ti GPU). We also make an effort to impart generalizability to unknown attack types that may be launched against an AFR pipeline in the future. In addition, all proposed detectors are interpretable such that AFR systems can be operated safely in covert scenarios. When a face attack is detected, authorities can be alerted with more than just a holistic "attack score"; for instance, our proposed methods can highlight regions of the face image that contribute to the overall decision made by the detector. Lastly, since the exact attack types may not be known beforehand, our work is among the first to attempt detection across three attack categories in both physical and digital domains. We also presented a method to automatically classify the exact attack category whenever an attack is detected.

5.1 Contributions

Chapter 2 focused on safeguarding AFR systems from physical face spoofs. The contributions include:

• We show that features learned from local face regions have better generalization performance in detecting presentation attacks than those learned from the entire face image.

• We provide extensive experiments to show that the proposed presentation attack detection approach outperforms other local region extraction strategies and state-of-the-art face presentation attack detection methods on one of the largest publicly available datasets, namely SiW-M, comprised of 13 different presentation attack instruments. The proposed method reduces the Equal Error Rate (EER) by (i) 14% relative to the state-of-the-art [88] under the unknown attack setting, and (ii) 40% on known presentation attack instruments. In addition, SSR-FCN achieves competitive performance on standard benchmarks on the Oulu-NPU [2] dataset and outperforms prevailing methods on cross-dataset generalization (CASIA-FASD [3] and Replay-Attack [4]).

• The proposed presentation attack detection method is also shown to be more interpretable since it can directly predict the parts of the face that are considered as presentation attacks.
In Chapter 3, we designed an automated adversarial synthesis method. The contributions of the synthesis section are as follows:

• A GAN-based method, AdvFaces, that learns to generate visually realistic adversarial face images that are misclassified by state-of-the-art AFR systems. Adversarial faces generated via AdvFaces are model-agnostic and transferable, and achieve high success rates on 5 state-of-the-art automated face recognition systems.

• Perceptual studies where human observers suggest that the adversarial examples appear more similar to the probe compared to previous methods.

• Visualizing the facial regions where pixels are perturbed and analyzing the transferability of AdvFaces.

• An open-source automated adversarial face generator permitting users to control the amount of perturbation.

The latter part of the chapter focused on utilizing ideas present in the aforementioned adversarial synthesis method to detect any adversarial attack type. This work makes the following contributions:

• A new self-supervised framework, namely FaceGuard, for defending against adversarial face images. FaceGuard combines adversarial training, detection, and purification into a defense mechanism trained in an end-to-end manner.

• With the proposed diversity loss, a generator is regularized to produce stochastic and challenging adversarial faces. We show that the diversity in output perturbations is sufficient for improving FaceGuard's robustness to unseen attacks compared to utilizing pre-computed training samples from known attacks.

• Synthesized adversarial faces aid the detector in learning a tight decision boundary around real faces. FaceGuard's detector achieves SOTA detection accuracies of 99.81%, 98.73%, and 99.35% on 6 unseen attacks on LFW [8], CelebA [17], and FFHQ [18].

• As the generator trains, a purifier concurrently removes perturbations from the synthesized adversarial faces. With the proposed loss, the detector also guides the purifier's training to ensure purified images are devoid of adversarial perturbations. At 0.1% False Accept Rate, FaceGuard's purifier enhances the True Accept Rate of ArcFace [9] from 34.27% under no defense to 77.46%.

Lastly, the contributions of our study on unified detection of both physical and digital attacks in Chapter 4 are as follows:

• Among the first to define the task of face attack detection on 25 attack types across 3 attack categories: adversarial faces, digital face manipulation, and spoofs. We comprehensively analyze the shortcomings of prevailing detectors. We also show that sequential and parallel ensemble learning can enhance detection compared to using a single SOTA detector.

• A novel unified face attack detection framework, namely UniFAD, that automatically clusters similar attacks and employs a multi-task learning framework to detect digital and physical attacks.

• The proposed UniFAD achieves SOTA detection performance, TDR = 94.73% @ 0.2% FDR, on a large fake face dataset, namely GrandFake. To the best of our knowledge, GrandFake is the largest face attack dataset studied in the literature in terms of the number of diverse attack types.

• The proposed UniFAD allows for further classification of the attack categories, i.e., whether attacks are adversarial, digitally manipulated, or contain physical artifacts, with a classification accuracy of 97.37%.
5.2 Suggestions for Future Work

The algorithms and models designed in this thesis are not limited to AFR systems. Attacks crafted in both physical and digital domains can also be launched against other biometric systems (such as automated fingerprint and iris recognition systems). The research presented in this dissertation can be extended in the following directions:

• Adversarial Attacks on Other Biometrics: The proposed adversarial face synthesis method in Chapter 3, namely AdvFaces, can be utilized to launch adversarial attacks on other biometric systems such as automated fingerprint or iris recognition systems. For instance, by replacing the auxiliary AFR system with an automated fingerprint recognition system (such as DeepPrint [234]), we can synthesize adversarial fingerprints.

• Robustness via Synthesis: Instead of utilizing a dedicated adversarial detector, we can utilize FaceGuard (Chapter 3) in an adversarial training mechanism. FaceGuard's generator can synthesize adversarial faces on-the-fly, while a face recognition system is trained to correctly classify the synthesized adversarial faces. In this manner, we should be able to obtain a face recognition system that is robust to adversarial faces.

• Universal Attack Detection via Synthesis: Following the ideas presented in Chapter 3, we can also learn to synthesize physical spoofs along with adversarial attacks, such that a detector is concurrently trained to detect both synthesized spoofs and adversarial attacks. Once trained, the detector should be able to reliably reject face attacks from both physical and digital domains.

Chapter 6

PhD Overview

6.1 Publications

A list of all publications during the course of my PhD program (in reverse chronological order):

1. D. Deb, X. Liu and A. K. Jain, "Unified Detection of Digital and Physical Face Attacks", arXiv:2104.02156, 2021.

2. D. Deb, X. Liu and A. K. Jain, "FaceGuard: A Self-Supervised Defense Against Adversarial Face Images", arXiv:2011.14218, 2021.

3. D. Deb, D. Aggarwal and A. K. Jain, "Child Face Age-Progression via Deep Feature Aging", IEEE ICPR, 2021.

4. J. J. Engelsma, D. Deb, K. Cao, A. Bhatnagar, P. S. Sudhish and A. K. Jain, "Infant-ID: Fingerprints for Global Good", in IEEE PAMI, 2021.

5. D. Deb and A. K. Jain, "Look Locally Infer Globally: A Generalizable Face Anti-Spoofing Approach", in IEEE TIFS, 2020.

6. H. Xu, Y. Ma, H. Liu, D. Deb, H. Liu, J. Tang and A. K. Jain, "Adversarial Attacks and Defenses in Images, Graphs and Text: A Review", in IJAC, DOI 10.1007/s11633-019-1211-x, 2020.

7. D. Deb, J. Zhang and A. K. Jain, "AdvFaces: Adversarial Face Synthesis", in IEEE IJCB, 2020.

8. D. Deb, A. Ross, A. K. Jain, K. Prakah-Asante and K. Venkatesh Prasad, "Actions Speak Louder Than (Pass)words: Passive Authentication of Smartphone Users via Deep Temporal Features", in IEEE ICB, 2019.

9. J. J. Engelsma, D. Deb, A. K. Jain, P. S. Sudhish and A. Bhatnagar, "Infant-Prints: Fingerprints for Reducing Infant Mortality", in IEEE CVPRW-CV4GC, 2019.

10. Y. Shi, D. Deb and A. K. Jain, "WarpGAN: Automatic Caricature Generation", in IEEE CVPR, 2019.

11. D. Deb, N. Nain and A. K. Jain, "Longitudinal Study of Child Face Recognition", in IEEE ICB, 2018.

12. D. Deb, S. Wiper, S. Gong, Y. Shi, C. Tymoszek, A. Fletcher and A. K. Jain, "Face Recognition: Primates in the Wild", in IEEE BTAS, 2018.

13. E. Tabassi, T. Chugh, D. Deb and A. K. Jain, "Altered Fingerprints: Detection and Localization", in IEEE BTAS, 2018.

14. D. Deb, T. Chugh, J. Engelsma, K. Cao, N. Nain, J. Kendall and A. K. Jain, "Matching Fingerphotos to Slap Fingerprint Images", arXiv:1804.08122, 2018.

15. D. Deb, L. Best-Rowden and A. K. Jain, "Face Recognition Performance Under Aging", in IEEE CVPRW, 2017.

6.2 Videos & Demos

The following videos demonstrate research solutions presented in the above publications in the real world:

1. Face Anti-Spoofing: https://youtu.be/VzD1GSJ5omQ

2. Adversarial Face Synthesis: https://youtu.be/uZBKymweNvI

3. Automatic Caricature Synthesis: https://youtu.be/zJL-eivtVnk
4. PrimID: Face Recognition for Endangered Primates: https://youtu.be/mbiIhEjKfhA

6.3 Media Coverage

1. https://msutoday.msu.edu/news/2021/using-thumbprints-vaccination-records-to-save-lives (MSU Today)

2. https://www.sciencedaily.com/releases/2018/05/180524112345.htm (ScienceDaily)

3. https://msutoday.msu.edu/news/2018/msu-technology-and-app-could-help-endangered-primates-slow-illegal-traf (MSU Today)

4. https://www.springwise.com/app-endangered-primates-using-facial-recognition (Springwise)

5. https://www.conservationjobs.co.uk/articles/new-technology-assists-the-protection-of-primates (Conservation Jobs)

6. https://www.asmag.com/rankings/m/content.aspx?id=25402 (ASMag)

7. https://olhardigital.com.br/en/2018/05/28/noticias/novo-sistema-de-reconhecimento-facial-pode-ajudar-a-salvar-primatas-da-extincao/ (Olhar Digital)

8. https://freetheapes.org/tag/facial-recognition/ (PEGAS)

9. https://msutoday.msu.edu/news/2019/compact-low-cost-reader-could-help-reduce-infant-mortality-around-the-world (MSU Today)

BIBLIOGRAPHY

[1] Y. Liu, J. Stehouwer, A. Jourabloo, and X. Liu, "Deep tree learning for zero-shot face anti-spoofing," in CVPR, 2019.

[2] Z. Boulkenafet, J. Komulainen, L. Li, X. Feng, and A. Hadid, "OULU-NPU: A mobile face presentation attack database with real-world variations," in IEEE FG, pp. 612–618, 2017.

[3] Z. Zhang, J. Yan, S. Liu, Z. Lei, D. Yi, and S. Z. Li, "A face antispoofing database with diverse attacks," in IEEE ICB, pp. 26–31, 2012.

[4] I. Chingovska, A. Anjos, and S. Marcel, "On the Effectiveness of Local Binary Patterns in Face Anti-spoofing," in IEEE BIOSIG, 2012.

[5] J. Deng, J. Guo, E. Ververas, I. Kotsia, and S. Zafeiriou, "RetinaFace: Single-shot multi-level face localisation in the wild," in IEEE CVPR, pp. 5203–5212, 2020.

[6] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in IEEE CVPR, pp. 815–823, 2015.

[7] BLCV, "Demystifying Face Recognition IV: Face-Alignment." https://bit.ly/3iUUBqz, 2017.

[8] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," Tech. Rep. 07-49, University of Massachusetts, Amherst, October 2007.

[9] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, "ArcFace: Additive angular margin loss for deep face recognition," in IEEE CVPR, pp. 4690–4699, 2019.

[10] D. Yi, Z. Lei, S. Liao, and S. Z. Li, "Learning face representation from scratch," arXiv preprint arXiv:1411.7923, 2014.

[11] A. Kurakin, I. Goodfellow, and S. Bengio, "Adversarial machine learning at scale," ICLR, 2017.

[12] X. Liu and C.-J. Hsieh, "Rob-GAN: Generator, discriminator, and adversarial attacker," in CVPR, 2019.

[13] Y. Jang, T. Zhao, S. Hong, and H. Lee, "Adversarial defense via learning to generate diverse attacks," in ICCV, 2019.

[14] D. Meng and H. Chen, "MagNet: A two-pronged defense against adversarial examples," in ACM CCS, pp. 135–147, 2017.

[15] P. Samangouei, M. Kabkab, and R. Chellappa, "Defense-GAN: Protecting classifiers against adversarial attacks using generative models," ICLR, 2018.

[16] D. Deb, J. Zhang, and A. K. Jain, "AdvFaces: Adversarial face synthesis," arXiv preprint arXiv:1908.05008, 2019.

[17] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in ICCV, December 2015.

[18] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," in CVPR, 2019.

[19] D. Deb, X. Liu, and A. K. Jain, "FaceGuard: A self-supervised defense against adversarial face images," arXiv preprint arXiv:2011.14218, 2020.

[20] H. Dang, F. Liu, J. Stehouwer, X. Liu, and A. K. Jain, "On the detection of digital face manipulation," in CVPR, 2020.

[21] D. Deb and A. K. Jain, "Look locally infer globally: A generalizable face anti-spoofing approach," IEEE TIFS, vol. 16, pp. 1143–1157, 2020.
[22] P.Grother,M.Ngan,andK.Hanaoka,fiOngoingfacerecognitionvendortest(frvt),fl NIST InteragencyReport ,2018. [23] S.Baker,T.Sim,andM.Bsat,fiThecmupose,illumination,andexpressiondatabase,fl IEEE TIFS ,vol.25,no.12,pp.1615Œ1618,2003. [24] S.Sengupta,J.-C.Chen,C.Castillo,V.M.Patel,R.Chellappa,andD.W.Jacobs,fiFrontal tofacevinthewild,flin IEEEWACV ,pp.1Œ9,2016. [25] S.Moschoglou,A.Papaioannou,C.Sagonas,J.Deng,I.Kotsia,andS.Zafeiriou,fiAgedb: themanuallycollected,in-the-wildagedatabase,flin IEEECVPR ,pp.51Œ59,2017. [26] P.J.Phillips,H.Moon,S.A.Rizvi,andP.J.Rauss,fiTheferetevaluationmethodologyfor face-recognitionalgorithms,fl IEEEPAMI ,vol.22,no.10,pp.1090Œ1104,2000. [27] B.F.Klare,B.Klein,E.Taborsky,A.Blanton,J.Cheney,K.Allen,P.Grother,A.Mah,and A.K.Jain,fiPushingthefrontiersofunconstrainedfacedetectionandrecognition:IARPA JanusBenchmarkA,flin IEEECVPR ,pp.1931Œ1939,2015. [28] H.Goldstein, Multilevelstatisticalmodels ,vol.922.JohnWiley&Sons,2011. [29] Z.Cheng,X.Zhu,andS.Gong,fiLow-resolutionfacerecognition,flin ACCV ,pp.605Œ621, 2018. [30] Y.Liu,A.Jourabloo,andX.Liu,fiLearningdeepmodelsforfaceBinaryor auxiliarysupervision,flin IEEECVPR ,June2018. [31] N.K.Ratha,J.H.Connell,andR.M.Bolle,fiEnhancingsecurityandprivacyinbiometrics- basedauthenticationsystems,fl IBMSystemsJournal ,vol.40,no.3,pp.614Œ634,2001. [32] A.Dabouei,S.Soleymani,J.Dawson,andN.Nasrabadi,fiFastgeometrically-perturbed adversarialfaces,flin IEEEWACV ,pp.1979Œ1988,2019. 157 [33] A.Madry,A.Makelov,L.Schmidt,D.Tsipras,andA.Vladu,fiTowardsdeeplearning modelsresistanttoadversarialattacks,fl arXivpreprintarXiv:1706.06083 ,2017. [34] I.J.Goodfellow,J.Shlens,andC.Szegedy,fiExplainingandharnessingadversarialexam- ples,fl arXivpreprintarXiv:1412.6572 ,2014. [35] A.Madry,A.Makelov,L.Schmidt,D.Tsipras,andA.Vladu,fiTowardsdeeplearning modelsresistanttoadversarialattacks,fl arXivpreprintarXiv:1706.06083 ,2017. [36] G.Ke,Q.Meng,T.Finley,T.Wang,W.Chen,W.Ma,Q.Ye,andT.-Y.Liu,fiLightgbm:A highlyefgradientboostingdecisiontree,fl NeurIPS ,2017. [37] L.Best-RowdenandA.K.Jain,fiLongitudinalstudyofautomaticfacerecognition,fl IEEE transactionsonpatternanalysisandmachineintelligence ,vol.40,no.1,pp.148Œ162,2017. [38] D.Deb,N.Nain,andA.K.Jain,fiLongitudinalstudyofchildfacerecognition,flin IEEE ICB ,pp.225Œ232,2018. [39] D.Deb,L.Best-Rowden,andA.K.Jain,fiFacerecognitionperformanceunderaging,flin IEEECVPRWorkshop ,pp.46Œ54,2017. [40] P.ViolaandM.Jones,fiRapidobjectdetectionusingaboostedcascadeofsimplefeatures,fl in IEEECVPR ,vol.1,pp.IŒI,2001. [41] K.Zhang,Z.Zhang,Z.Li,andY.Qiao,fiJointfacedetectionandalignmentusingmultitask cascadedconvolutionalnetworks,fl IEEESPL ,vol.23,no.10,pp.1499Œ1503,2016. [42] H.Wang,Y.Wang,Z.Zhou,X.Ji,D.Gong,J.Zhou,Z.Li,andW.Liu,fiCosface:Large margincosinelossfordeepfacerecognition,flin IEEECVPR ,2018. [43] D.Wang,C.Otto,andA.K.Jain,fiFacesearchatscale,fl IEEEPAMI ,vol.39,no.6, pp.1122Œ1136,2016. [44] W.Liu,Y.Wen,Z.Yu,M.Li,B.Raj,andL.Song,fiSphereface:Deephypersphereembed- dingforfacerecognition,flin IEEECVPR ,pp.212Œ220,2017. [45] M.A.TurkandA.P.Pentland,fiFacerecognitionusingeigenfaces,flin IEEECVPR , pp.586Œ587,1991. [46] K.Simonyan,O.M.Parkhi,A.Vedaldi,andA.Zisserman,fiFishervectorfacesinthewild.,fl in BMVC ,vol.2,p.4,2013. [47] D.Chen,X.Cao,F.Wen,andJ.Sun,fiBlessingofdimensionality:High-dimensionalfeature anditsefcompressionforfacevflin IEEECVPR ,pp.3025Œ3032,2013. [48] D.Chen,X.Cao,L.Wang,F.Wen,andJ.Sun,fiBayesianfacerevisited:Ajointformula- tion,flin ECCV ,pp.566Œ579,2012. 
[49] Y.Taigman,M.Yang,M.Ranzato,andL.Wolf,fiDeepface:Closingthegaptohuman-level performanceinfacevflin IEEECVPR ,pp.1701Œ1708,2014. 158 [50] Y.Sun,X.Wang,andX.Tang,fiDeeplearningfacerepresentationfrompredicting10,000 classes,flin IEEECVPR ,pp.1891Œ1898,2014. [51] O.M.Parkhi,A.Vedaldi,andA.Zisserman,fiDeepfacerecognition,flin BMVC ,pp.41.1Œ 41.12,2015. [52] H.T.F.Rhodes, AlphonseBertillon,fatherofdetection .Greenwood,1968. [53] Y.Wen,K.Zhang,Z.Li,andY.Qiao,fiAdiscriminativefeaturelearningapproachfordeep facerecognition,flin ECCV ,pp.499Œ515,Springer,2016. [54] P.J.Phillips,P.J.Grother,R.J.Micheals,D.M.Blackburn,E.Tabassi,andM.Bone,fiFace recognitionvendortest2002:Evaluationreport,fltech.rep.,NIST,2003. [55] P.Flanagan,fiMultiplebiometricevaluation(mbe),fl NIST ,2010. [56] P.GrotherandM.Ngan,fiFacerecognitionvendortest(frvt):Performanceofface cationalgorithms,fl NISTInteragencyreport ,vol.8009,no.5,p.14,2014. [57] Q.Cao,L.Shen,W.Xie,O.M.Parkhi,andA.Zisserman,fiVggface2:Adatasetforrecog- nisingfacesacrossposeandage,flin IEEEFG ,pp.67Œ74,IEEE,2018. [58] S.Jia,G.Guo,andZ.Xu,fiAsurveyon3dmaskpresentationattackdetectionandcounter- measures,fl PatternRecognition ,vol.98,p.107032,2020. [59] S.Agarwal,H.Farid,Y.Gu,M.He,K.Nagano,andH.Li,fiProtectingworldleadersagainst deepfakes.,flin CVPRWorkshops ,2019. [60] X.Yang,D.Yang,Y.Dong,W.Yu,H.Su,andJ.Zhu,fiDelvingintotheadversarialrobust- nessonfacerecognition,fl arXivpreprintarXiv:2007.04118 ,2020. [61] A.GeorgeandS.Marcel,fiLearningoneclassrepresentationsforfacepresentationattack detectionusingmulti-channelconvolutionalneuralnetworks,fl IEEETIFS ,vol.16,pp.361Œ 375,2020. [62] Y.Liu,J.Stehouwer,andX.Liu,fiOndisentanglingspooftraceforgenericfaceanti- flin ECCV ,Springer,2020. [63] H.Feng,Z.Hong,H.Yue,Y.Chen,K.Wang,J.Han,J.Liu,andE.Ding,fiLearning generalizedspoofcuesforfacefl arXivpreprintarXiv:2005.03922 ,2020. [64] InternationalStandardsOrganization,fiISO/IEC30107-1:2016,InformationTechnology BiometricPresentationAttackDetectionPart1:Framework.flhttps://www.iso.org/standard/ 53227.html,2016. [65] S.Marcel,M.S.Nixon,andS.Z.Li, HandbookofBiometric ,vol.1.Springer, 2014. [66] I.Manjani,S.Tariyal,M.Vatsa,R.Singh,andA.Majumdar,fiDetectingsiliconemask- basedpresentationattackviadeepdictionarylearning,fl IEEETIFS ,vol.12,no.7,pp.1713Œ 1723,2017. 159 [67] Y.Xu,T.Price,J.-M.Frahm,andF.Monrose,fiVirtualu:Defeatingfacelivenessdetection bybuildingvirtualmodelsfromyourpublicphotos,flin USENIX ,pp.497Œ512,2016. [68] fiFastPass-aharmonized,modularreferencesystemforallEuropeanautomatedborder- crossingpoints.flFastPass-EU,https://www.fastpass-project.eu. [69] C.Szegedy,W.Zaremba,I.Sutskever,J.Bruna,D.Erhan,I.Goodfellow,andR.Fergus, fiIntriguingpropertiesofneuralnetworks,fl arXivpreprintarXiv:1312.6199 ,2013. [70] I.J.Goodfellow,J.Shlens,andC.Szegedy,fiExplainingandharnessingadversarialexam- ples,fl arXivpreprintarXiv:1412.6572 ,2014. [71] S.-M.Moosavi-Dezfooli,A.Fawzi,O.Fawzi,andP.Frossard,fiUniversaladversarialper- turbations,flin IEEECVPR ,pp.1765Œ1773,2017. [72] Y.Dong,F.Liao,T.Pang,H.Su,J.Zhu,X.Hu,andJ.Li,fiBoostingadversarialattacks withmomentum,flin IEEECVPR ,pp.9185Œ9193,2018. [73] Wikipedia,fiU.S.CustomsandBorderProtection.flhttps://en.wikipedia.org/wiki/U.S. Customs and Border Protection,2019. [74] U.S.CustomsandBorderProtection,fiOnaTypicalDayinFiscalYear2018.flhttps://www. cbp.gov/newsroom/stats/typical-day-fy2018,2018. [75] fiBiometrics.flU.S.CustomsandBorderProtection,https://www.cbp.gov/travel/biometrics. 
[76] J.Thies,M.Zollhofer,M.Stamminger,C.Theobalt,andM.Nießner,fiFace2face:Real-time facecaptureandreenactmentofrgbvideos,flin CVPR ,2016. [77] A.Rossler,D.Cozzolino,L.Verdoliva,C.Riess,J.Thies,andM.Nießner,fiFaceforen- sics++:Learningtodetectmanipulatedfacialimages,flin ICCV ,pp.1Œ11,2019. [78] Y.Choi,M.Choi,M.Kim,J.-W.Ha,S.Kim,andJ.Choo,fiStargan:generative adversarialnetworksformulti-domainimage-to-imagetranslation,flin CVPR ,2018. [79] M.Liu,Y.Ding,M.Xia,X.Liu,E.Ding,W.Zuo,andS.Wen,fiStgan:Aselective transfernetworkforarbitraryimageattributeediting,flin CVPR ,2019. [80] T.Karras,S.Laine,M.Aittala,J.Hellsten,J.Lehtinen,andT.Aila,fiAnalyzingandimprov- ingtheimagequalityofstylegan,flin CVPR ,2020. [81] J.Thies,M.Zollhofer,M.Stamminger,C.Theobalt,andM.Nießner,fiFace2face:Real-time facecaptureandreenactmentofrgbvideos,flin IEEECVPR ,pp.2387Œ2395,2016. [82] DailyMail,fiPolicearrestpassengerwhoboardedplaneinHongKongasanoldmanin capandarrivedinCanadaayoungAsianrefugee.flhttp://dailym.ai/2UBEcxO,2011. [83] TheVerge,fiThis$150maskbeatFaceIDontheiPhoneX.flhttps://bit.ly/300bRoC,2017. [84] N.ErdogmusandS.Marcel,in2DFaceRecognitionwith3DMasksandAnti- withKinect,flin IEEEBTAS ,2013. 160 [85] D.Wen,H.Han,andA.K.Jain,fiFacespoofdetectionwithimagedistortionanalysis,fl IEEETIFS ,vol.10,no.4,pp.746Œ761,2015. [86] A.Costa-Pazo,S.Bhattacharjee,E.Vazquez-Fernandez,andS.Marcel,fiThereplay-mobile facepresentation-attackdatabase,flin IEEEBIOSIG ,Sept.2016. [87] S.Liu,B.Yang,P.C.Yuen,andG.Zhao,fiA3DMaskFaceDatabasewith RealWorldVariations,flin IEEECVPR ,2016. [88] Z.Yu,C.Zhao,Z.Wang,Y.Qin,Z.Su,X.Li,F.Zhou,andG.Zhao,fiSearchingcentral differenceconvolutionalnetworksforfacefl arXivpreprintarXiv:2003.04092 , 2020. [89] K.Kollreider,H.Fronthaler,M.I.Faraj,andJ.Bigun,fiReal-timefacedetectionandmotion analysiswithapplicationinfilivenessflassessment,fl IEEETIFS ,vol.2,no.3,pp.548Œ558, 2007. [90] G.Pan,L.Sun,Z.Wu,andS.Lao,fiEyeblink-basedinfacerecognitionfrom agenericwebcamera,flin IEEECVPR ,pp.1Œ8,2007. [91] K.Patel,H.Han,andA.K.Jain,fiCross-databasefacewithrobustfeature representation,flin CCBR ,pp.611Œ619,2016. [92] R.Shao,X.Lan,andP.C.Yuen,fiDeepconvolutionaldynamictexturelearningwithadap- tivechannel-discriminabilityfor3dmaskfaceflin IEEEIJCB ,pp.748Œ755, 2017. [93] J.Komulainen,H.Abdenour,andP.Matti,fiContextbasedfaceflin IEEE BTAS ,2013. [94] T.deFreitasPereira,A.Anjos,J.M.DeMartino,andS.Marcel,fiLBP-TOPbasedcounter- measureagainstfaceattacks,flin ACCV ,pp.121Œ132,2012. [95] L.Li,Z.Xia,A.Hadid,X.Jiang,F.Roli,andX.Feng,fiFacepresentationattackdetection inlearnedcolor-likedspace,fl arXivpreprintarXiv:1810.13170 ,2018. [96] K.Patel,H.Han,andA.K.Jain,fiSecurefaceunlock:Spoofdetectiononsmartphones,fl IEEETIFS ,vol.11,no.10,pp.2268Œ2283,2016. [97] J.Galbally,S.Marcel,andJ.Fierrez,fiImagequalityassessmentforfakebiometricde- tection:Applicationtoiris,andfacerecognition,fl IEEETIP ,vol.23,no.2, pp.710Œ724,2014. [98] H.Li,S.Wang,andA.C.Kot,fiFacedetectionwithimagequalityregression,flin IEEEIPTA ,pp.1Œ6,2016. [99] J.Li,Y.Wang,T.Tan,andA.K.Jain,fiLivefacedetectionbasedontheanalysisoffourier spectra,flin BiometricTechnologyforHuman ,vol.5404,pp.296Œ303,SPIE, 2004. 161 [100] T.Pereira,A.Anjos,J.M.DeMartino,andS.Marcel,fiCanfacecountermea- suresworkinarealworldscenario?,flin IEEEICB ,2013. [101] J.M ¨ a ¨ att ¨ a,A.Hadid,andM.Pietik ¨ ainen,fiFacedetectionfromsingleimagesusing micro-textureanalysis,flin IEEEIJCB ,pp.1Œ7,2011. 
[102] Z.Boulkenafet,J.Komulainen,andA.Hadid,fiFacedetectionusingcolourtexture analysis,fl IEEETIFS ,vol.11,no.8,pp.1818Œ1830,2016. [103] J.Yang,Z.Lei,S.Liao,andS.Z.Li,fiFacelivenessdetectionwithcomponentdependent descriptor,flin IEEEICB ,pp.1Œ6,2013. [104] X.Tan,Y.Li,J.Liu,andL.Jiang,fiFacelivenessdetectionfromasingleimagewithsparse lowrankbilineardiscriminativemodel,flin ECCV ,pp.504Œ517,2010. [105] Z.Boulkenafet,J.Komulainen,andA.Hadid,fiFaceusingspeeded-uprobust featuresandvectorencoding,fl IEEESignalProcessingLetters ,vol.24,no.2,pp.141Œ 145,2017. [106] T.Wang,J.Yang,Z.Lei,S.Liao,andS.Z.Li,fiFacelivenessdetectionusing3dstructure recoveredfromasinglecamera,flin IEEEICB ,2013. [107] Y.Wang,F.Nian,T.Li,Z.Meng,andK.Wang,fiRobustfacewithdepth information,fl JournalofVisualCommunicationandImageRepresentation ,vol.49,092017. [108] S.Zhang,X.Wang,A.Liu,C.Zhao,J.Wan,S.Escalera,H.Shi,Z.Wang,andS.Z. Li,fiCASIA-SURF:ADatasetandBenchmarkforLarge-scaleMulti-modalFaceAnti- fl arXivpreprintarXiv:1812.00408 ,2018. [109] V.Conotter,E.Bodnari,G.Boato,andH.Farid,fiPhysiologically-baseddetectionofcom- putergeneratedfacesinvideo,fl IEEEICIP ,pp.248Œ252,012015. [110] Z.Zhang,D.Yi,Z.Lei,andS.Li,fiFacelivenessdetectionbylearningmultispectralre- distributions,flin IEEEFG ,pp.436Œ441,2011. [111] G.Chetty,fiBiometriclivenesscheckingusingmultimodalfuzzyfusion,flin IEEEWCCI , pp.1Œ8,072010. [112] J.Yang,Z.Lei,andS.Z.Li,fiLearnConvolutionalNeuralNetworkforFacefl arXivpreprintarXiv:1408.5601 ,2014. [113] N.N.Lakshminarayana,N.Narayan,N.Napp,S.Setlur,andV.Govindaraju,fiAdiscrimi- nativespatio-temporalmappingoffaceforlivenessdetection,flin IEEEISBA ,pp.1Œ7,2017. [114] X.TuandY.Fang,fiUltra-deepneuralnetworkforfaceflin NIPS ,pp.686Œ 695,2017. [115] O.Lucena,A.Junior,V.Moia,R.Souza,E.Valle,andR.Lotufo,fiTransferlearningusing convolutionalneuralnetworksforfaceflin ICIAR ,pp.27Œ34,2017. 162 [116] L.Li,X.Feng,Z.Boulkenafet,Z.Xia,M.Li,andA.Hadid,fiAnoriginalface approachusingpartialconvolutionalneuralnetwork,flin IEEEIPTA ,pp.1Œ6,2016. [117] S.R.Arashloo,J.Kittler,andW.Christmas,fiAnanomalydetectionapproachtofacespoof- ingdetection:Anewformulationandevaluationprotocol,fl IEEEAccess ,vol.5,pp.13868Œ 13882,2017. [118] O.Nikisins,A.Mohammadi,A.Anjos,andS.Marcel,fiOneffectivenessofanomalyde- tectionapproachesagainstunseenpresentationattacksinfaceflin IEEEICB , pp.75Œ81,2018. [119] D.P ´ erez-Cabo,D.Jim ´ enez-Cabello,A.Costa-Pazo,andR.J.L ´ opez-Sastre,fiDeepanomaly detectionforgeneralizedfaceflin IEEECVPR ,pp.0Œ0,2019. [120] H.Li,W.Li,H.Cao,S.Wang,F.Huang,andA.C.Kot,fiUnsuperviseddomainadaptation forfacefl IEEETransactionsonInformationForensicsandSecurity ,vol.13, no.7,pp.1794Œ1809,2018. [121] S.R.Arashloo,J.Kittler,andW.Christmas,fiFacedetectionbasedonmultiplede- scriptorfusionusingmultiscaledynamicbinarizedstatisticalimagefeatures,fl IEEETrans- actionsonInformationForensicsandSecurity ,vol.10,no.11,pp.2396Œ2407,2015. [122] J.Yang,Z.Lei,D.Yi,andS.Z.Li,fiPefacewithsubjectdo- mainadaptation,fl IEEETransactionsonInformationForensicsandSecurity ,vol.10,no.4, pp.797Œ809,2015. [123] Y.Atoum,Y.Liu,A.Jourabloo,andX.Liu,fiFaceusingpatchanddepth- basedcnns,flin 2017IEEEIJCB ,pp.319Œ328,2017. [124] A.GeorgeandS.Marcel,fiDeeppixel-wisebinarysupervisionforfacepresentationattack detection,fl arXivpreprintarXiv:1907.04047 ,2019. [125] A.Jourabloo,Y.Liu,andX.Liu,fiFacevianoisemodeling,flin ECCV ,pp.290Œ306,2018. [126] C.NagpalandS.R.Dubey,fiAperformanceevaluationofconvolutionalneuralnetworksfor faceantiflin IEEEIJCNN ,pp.1Œ8,2019. 
[127] W.Sun,Y.Song,C.Chen,J.Huang,andA.C.Kot,fiFacedetectionbasedonlocal ternarylabelsupervisioninfullyconvolutionalnetworks,fl IEEETIFS ,vol.15,pp.3181Œ 3196,2020. [128] W.Sun,Y.Song,H.Zhao,andZ.Jin,fiAfacedetectionmethodbasedondomain adaptationandlosslesssizeadaptation,fl IEEEAccess ,vol.8,pp.66553Œ66563,2020. [129] D.P.KingmaandJ.Ba,fiAdam:Amethodforstochasticoptimization,fl arXivpreprint arXiv:1412.6980 ,2014. [130] D.E.King,fiDlib-ml:Amachinelearningtoolkit,fl JMLR ,vol.10,no.Jul,pp.1755Œ1758, 2009. 163 [131] Z.Boulkenafet,J.Komulainen,Z.Akhtar,A.Benlamoudi,D.Samai,S.E.Bekhouche, A.F.Dornaika,A.Taleb-Ahmed,L.Qin, etal. ,fiAcompetitionongeneralized software-basedfacepresentationattackdetectioninmobilescenarios,flin IEEEIJCB , pp.688Œ696,2017. [132] H.Chen,G.Hu,Z.Lei,Y.Chen,N.M.Robertson,andS.Z.Li,fiAttention-basedtwo- streamconvolutionalnetworksforfacedetection,fl IEEETIFS ,vol.15,pp.578Œ 593,2019. [133] R.Bresan,A.Pinto,A.Rocha,C.Beluzo,andT.Carvalho,fiFacespoofbuster:apresenta- tionattackdetectorbasedonintrinsicimagepropertiesanddeeplearning,fl arXivpreprint arXiv:1902.02845 ,2019. [134] N.Damer,K.Dimitrov,R.Wilson,E.Hancock,andW.Smith,fiPracticalviewonface presentationattackdetection.,flin BMVC ,2016. [135] X.Yang,W.Luo,L.Bao,Y.Gao,D.Gong,S.Zheng,Z.Li,andW.Liu,fiFace Modelmatters,sodoesdata,flin IEEECVPR ,pp.3507Œ3516,2019. [136] N.CarliniandD.Wagner,fiTowardsevaluatingtherobustnessofneuralnetworks,flin IEEE SP ,pp.39Œ57,2017. [137] C.Xiao,J.-Y.Zhu,B.Li,W.He,M.Liu,andD.Song,fiSpatiallytransformedadversarial examples,fl arXivpreprintarXiv:1801.02612 ,2018. [138] K.Eykholt,I.Evtimov,E.Fernandes,B.Li,A.Rahmati,C.Xiao,A.Prakash,T.Kohno, andD.Song,fiRobustphysical-worldattacksondeeplearningmodels,fl arXivpreprint arXiv:1707.08945 ,2017. [139] N.Papernot,P.McDaniel,S.Jha,M.Fredrikson,Z.B.Celik,andA.Swami,fiThelimita- tionsofdeeplearninginadversarialsettings,flin IEEEEuroS&P ,pp.372Œ387,2016. [140] A.Kurakin,I.Goodfellow,andS.Bengio,fiAdversarialmachinelearningatscale,fl arXiv preprintarXiv:1611.01236 ,2016. [141] S.-M.Moosavi-Dezfooli,A.Fawzi,andP.Frossard,fiDeepfool:asimpleandaccurate methodtofooldeepneuralnetworks,flin IEEECVPR ,pp.2574Œ2582,2016. [142] Y.Dong,H.Su,B.Wu,Z.Li,W.Liu,T.Zhang,andJ.Zhu,fiEfdecision-based black-boxadversarialattacksonfacerecognition,flin IEEECVPR ,pp.7714Œ7722,2019. [143] Y.Liu,X.Chen,C.Liu,andD.Song,fiDelvingintotransferableadversarialexamplesand black-boxattacks,fl arXivpreprintarXiv:1611.02770 ,2016. [144] I.Goodfellow,J.Pouget-Abadie,M.Mirza,B.Xu,D.Warde-Farley,S.Ozair,A.Courville, andY.Bengio,fiGenerativeadversarialnets,flin NIPS ,pp.2672Œ2680,2014. [145] A.Radford,L.Metz,andS.Chintala,fiUnsupervisedrepresentationlearningwithdeep convolutionalgenerativeadversarialnetworks,fl arXivpreprintarXiv:1511.06434 ,2015. 164 [146] E.L.Denton,S.Chintala,andR.Fergus,fiDeepgenerativeimagemodelsusingalaplacian pyramidofadversarialnetworks,flin NIPS ,pp.1486Œ1494,2015. [147] D.Ulyanov,V.Lebedev,A.Vedaldi,andV.S.Lempitsky,fiTexturenetworks:Feed-forward synthesisoftexturesandstylizedimages.,flin ICML ,vol.1,p.4,2016. [148] J.Johnson,A.Alahi,andL.Fei-Fei,fiPerceptuallossesforreal-timestyletransferand super-resolution,flin ECCV ,pp.694Œ711,2016. [149] L.A.Gatys,A.S.Ecker,andM.Bethge,fiImagestyletransferusingconvolutionalneural networks,flin IEEECVPR ,pp.2414Œ2423,2016. [150] P.Isola,J.-Y.Zhu,T.Zhou,andA.A.Efros,fiImage-to-imagetranslationwithconditional adversarialnetworks,flin IEEECVPR ,pp.1125Œ1134,2017. 
[151] J.-Y.Zhu,T.Park,P.Isola,andA.A.Efros,fiUnpairedimage-to-imagetranslationusing cycle-consistentadversarialnetworks,flin IEEEICCV ,pp.2223Œ2232,2017. [152] T.Salimans,I.Goodfellow,W.Zaremba,V.Cheung,A.Radford,andX.Chen,fiImproved techniquesfortraininggans,flin NIPS ,pp.2234Œ2242,2016. [153] M.F.Mathieu,J.J.Zhao,J.Zhao,A.Ramesh,P.Sprechmann,andY.LeCun,fiDisen- tanglingfactorsofvariationindeeprepresentationusingadversarialtraining,flin NIPS , pp.5040Œ5048,2016. [154] S.BalujaandI.Fischer,fiAdversarialtransformationnetworks:Learningtogenerateadver- sarialexamples,fl arXivpreprintarXiv:1703.09387 ,2017. [155] C.Xiao,B.Li,J.-Y.Zhu,W.He,M.Liu,andD.Song,fiGeneratingadversarialexamples withadversarialnetworks,fl arXivpreprintarXiv:1801.02610 ,2018. [156] X.Wang,K.He,C.Guo,K.Q.Weinberger,andJ.E.Hopcroft,fiAT-GAN:AGenerative AttackModelforAdversarialTransferringonGenerativeAdversarialNets,fl arXivpreprint arXiv:1904.07793 ,2019. [157] Y.Song,R.Shu,N.Kushman,andS.Ermon,fiConstructingunrestrictedadversarialexam- pleswithgenerativemodels,flin NIPS ,pp.8312Œ8323,2018. [158] A.J.BoseandP.Aarabi,fiAdversarialattacksonfacedetectorsusingneuralnetbased constrainedoptimization,flin IEEEMMSP ,pp.1Œ6,2018. [159] M.Sharif,S.Bhagavatula,L.Bauer,andM.K.Reiter,fiAccessorizetoacrime:Realand stealthyattacksonstate-of-the-artfacerecognition,flin ACMSIGSAC ,pp.1528Œ1540,2016. [160] M.Sharif,S.Bhagavatula,L.Bauer,andM.K.Reiter,fiAgeneralframeworkforadversarial exampleswithobjectives,fl ACMTOPS ,vol.22,no.3,p.16,2019. [161] Q.Song,Y.Wu,andL.Yang,fiAttacksonstate-of-the-artfacerecognitionusingattentional adversarialattackgenerativenetwork,fl arXivpreprintarXiv:1811.12026 ,2018. 165 [162] J.Deng,W.Dong,R.Socher,L.-J.Li,K.Li,andL.Fei-Fei,fiImagenet:Alarge-scale hierarchicalimagedatabase,flin CVPR ,IEEE,2009. [163] A.Krizhevsky,G.Hinton, etal. ,fiLearningmultiplelayersoffeaturesfromtinyimages,fl Citeseer ,2009. [164] C.Xie,Y.Wu,L.v.d.Maaten,A.L.Yuille,andK.He,fiFeaturedenoisingforimproving adversarialrobustness,flin CVPR ,2019. [165] Y.LeCun,fiThemnistdatabaseofhandwrittendigits,fl TechReport ,1998. [166] Z.Gong,W.Wang,andW.-S.Ku,fiAdversarialandcleandataarenottwins,fl arXivpreprint arXiv:1704.04960 ,2017. [167] A.Agarwal,R.Singh,M.Vatsa,andN.Ratha,fiAreimage-agnosticuniversaladversarial perturbationsforfacerecognitiondiftodetect?,flin BTAS ,2018. [168] A.P.Founds,N.Orlans,W.Genevieve,andC.I.Watson,fiNISTspecialdatabase32- multipleencounterdatasetII(MEDS-II).,flin NISTIntragencyReport ,2011. [169] R.Gross,I.Matthews,J.Cohn,T.Kanade,andS.Baker,fiMulti-PIE.,flin FG ,2010. [170] J.R.Beveridge,P.J.Phillips,D.S.Bolme,B.A.Draper,G.H.Givens,Y.M.Lui,M.N. Teli,H.Zhang,W.T.Scruggs,K.W.Bowyer,P.J.Flynn,andS.Cheng,fiThechallengeof facerecognitionfromdigitalpoint-and-shootcameras.,flin BTAS ,2013. [171] Moosavi-Dezfooli,Seyed-Mohsen,A.Fawzi,O.Fawzi,andP.Frossard,fiFromfewto many:Illuminationconemodelsforfacerecognitionundervariablelightingandpose.,fl in CVPR ,pp.1765Œ1773,2017. [172] A.Goel,A.Singh,A.Agarwal,M.Vatsa,andR.Singh,fiSmartbox:Benchmarkingadver- sarialdetectionandmitigationalgorithmsforfacerecognition.,flin BTAS ,pp.1Œ7,2018. [173] A.S.Georghiades,P.N.Belhumeur,andD.J.Kriegman,fiFromfewtomany:Illumination conemodelsforfacerecognitionundervariablelightingandpose.,flin PAMI ,pp.643Œ660, 2001. [174] P.-Y.Chen,Y.Sharma,H.Zhang,J.Yi,andC.-J.Hsieh,fiEAD:Elastic-netattackstodeep neuralnetworksviaadversarialexamples.,fl AAAI ,2018. 
[175] S.Liang,Y.Li,andR.Srikant,fiEnhancingthereliabilityofout-of-distributionimagede- tectioninneuralnetworks,fl ICLR ,2018. [176] G.Goswami,A.Agarwal,N.Ratha,R.Singh,andM.Vatsa,fiDetectingandmitigating adversarialperturbationsforrobustfacerecognition,fl ICCV ,vol.127,no.6-7,pp.719Œ742, 2019. [177] P.J.Phillips,P.J.Flynn,J.R.Beveridge,W.T.Scruggs,A.J.O'toole,D.Bolme,K.W. Bowyer,B.A.Draper,G.H.Givens,Y.M.Lui, etal. ,fiOverviewofthemultiplebiometrics grandchallenge,flin ICB ,2010. 166 [178] J.Liu,W.Zhang,Y.Zhang,D.Hou,Y.Liu,H.Zha,andN.Yu,fiDetectionbaseddefense againstadversarialexamplesfromthesteganalysispointofview,flin CVPR ,2019. [179] F.V.Massoli,F.Carrara,G.Amato,andF.Falchi,fiDetectionoffacerecognitionadversarial attacks,fl CVIU ,p.103103,2020. [180] Q.Cao,L.Shen,W.Xie,O.M.Parkhi,andA.Zisserman,fiVggface2:Adatasetforrecog- nisingfacesacrossposeandage,flin FG ,2018. [181] A.Kurakin,I.Goodfellow,andS.Bengio,fiAdversarialexamplesinthephysicalworld.,fl arXivpreprintarXiv:1607.02533 ,2016. [182] A.Agarwal,R.Singh,M.Vatsa,andN.K.Ratha,fiImagetransformationbaseddefense againstadversarialperturbationondeeplearningmodels,fl IEEETransactionsonDepend- ableandSecureComputing ,2020. [183] Z.Liu,Q.Liu,T.Liu,N.Xu,X.Lin,Y.Wang,andW.Wen,fiFeaturedistillation:Dnn- orientedjpegcompressionagainstadversarialexamples,flin CVPR ,2019. [184] M.Naseer,S.Khan,M.Hayat,F.S.Khan,andF.Porikli,fiAself-supervisedapproachfor adversarialrobustness,flin CVPR ,2020. [185] J.Zhou,C.Liang,andJ.Chen,fiManifoldprojectionforadversarialdefenseonfacerecog- nition,flin EuropeanConferenceonComputerVision ,pp.288Œ305,Springer,2020. [186] H.Qiu,C.Xiao,L.Yang,X.Yan,H.Lee,andB.Li,fiSemanticadv:Generatingadversarial examplesviaattribute-conditionalimageediting,fl arXivpreprintarXiv:1906.07927 ,2019. [187] D.Su,H.Zhang,H.Chen,J.Yi,P.-Y.Chen,andY.Gao,fiIsrobustnessthecostof accuracy?Œacomprehensivestudyontherobustnessof18deepimagemod- els.,flin ECCV ,2018. [188] D.Tsipras,S.Santurkar,L.Engstrom,A.Turner,andA.Madry,fiRobustnessmaybeat oddswithaccuracy.,fl ICLR ,2017. [189] G.S.Dhillon,K.Azizzadenesheli,Z.C.Lipton,J.Bernstein,J.KA.Khanna,and A.Anandkumar,fiStochasticactivationpruningforrobustadversarialdefense,flin ICLR , 2018. [190] R.Feinman,R.R.Curtin,S.Shintre,andA.B.Gardner,fiDetectingadversarialsamples fromartifacts,fl arXivpreprintarXiv:1703.00410 ,2017. [191] K.Grosse,P.Manoharan,N.Papernot,M.Backes,andP.McDaniel,fiOnthe(statistical) detectionofadversarialexamples,fl arXivpreprintarXiv:1702.06280 ,2017. [192] X.LiandF.Li,fiAdversarialexamplesdetectionindeepnetworkswithconvolutional statistics,flin ICCV ,pp.5764Œ5772,2017. [193] D.HendrycksandK.Gimpel,fiEarlymethodsfordetectingadversarialimages,fl arXiv preprintarXiv:1608.00530 ,2016. 167 [194] C.Guo,M.Rana,M.Cisse,andL.VanDerMaaten,fiCounteringadversarialimagesusing inputtransformations,fl arXivpreprintarXiv:1711.00117 ,2017. [195] H.Kannan,A.Kurakin,andI.Goodfellow,fiAdversariallogitpairing,fl arXivpreprint arXiv:1803.06373 ,2018. [196] J.H.Metzen,T.Genewein,V.Fischer,andB.Bischoff,fiOndetectingadversarialperturba- tions,fl ICLR ,2017. [197] T.Na,J.H.Ko,andS.Mukhopadhyay,fiCascadeadversarialmachinelearningregularized withaembedding,fl ICLR ,2017. [198] C.Xie,J.Wang,Z.Zhang,Z.Ren,andA.Yuille,fiMitigatingadversarialeffectsthrough randomization,fl ICLR ,2017. [199] V.Zantedeschi,M.-I.Nicolae,andA.Rawat,fiEfdefensesagainstadversarialat- tacks,flin ACMWorkshoponIntelligenceandSecurity ,pp.39Œ49,2017. 
[200] N.CarliniandD.Wagner,fiAdversarialexamplesarenoteasilydetected:Bypassingten detectionmethods,flin ACMWorkshoponIntelligenceandSecurity ,pp.3Œ14, 2017. [201] A.Athalye,N.Carlini,andD.Wagner,fiObfuscatedgradientsgiveafalsesenseofsecurity: Circumventingdefensestoadversarialexamples,fl ICML ,2018. [202] N.CarliniandD.Wagner,fiMagnetandfiefdefensesagainstadversarialattacksflare notrobusttoadversarialexamples,fl arXivpreprintarXiv:1711.08478 ,2017. [203] M.Mosbach,M.Andriushchenko,T.Trost,M.Hein,andD.Klakow,fiLogitpairingmeth- odscanfoolgradient-basedattacks,fl arXivpreprintarXiv:1810.12042 ,2018. [204] Y.Song,T.Kim,S.Nowozin,S.Ermon,andN.Kushman,fiPixeldefend:Leveraginggen- erativemodelstounderstandanddefendagainstadversarialexamples,fl ICLR ,2017. [205] A.Rozsa,M.G ¨ unther,andT.E.Boult,fiLotsaboutattackingdeepfeatures,flin 2017IEEE InternationalJointConferenceonBiometrics(IJCB) ,pp.168Œ176,IEEE,2017. [206] G.Goswami,N.Ratha,A.Agarwal,R.Singh,andM.Vatsa,fiUnravellingrobustnessof deeplearningbasedfacerecognitionagainstadversarialattacks,flin AAAI ,2018. [207] P.J.Grother,M.Ngan,andK.Hanaoka,fiOngoingFaceRecognitionVendorTest(FRVT), Part2:fl NISTInteragencyReport ,2018. [208] Y.Liu,A.Jourabloo,andX.Liu,fiLearningdeepmodelsforfaceBinaryor auxiliarysupervision,flin CVPR ,2018. [209] Y.Liu,J.Stehouwer,andX.Liu,fiOndisentanglingspooftracesforgenericfaceanti- flin ECCV ,2020. 168 [210] H.Dang,F.Liu,J.Stehouwer,X.Liu,andA.Jain,fiOnthedetectionofdigitalfacemanip- ulation,flin CVPR ,2020. [211] S.Shan,E.Wenger,J.Zhang,H.Li,H.Zheng,andB.Y.Zhao,fiFawkes:protectingprivacy againstunauthorizeddeeplearningmodels,flin USENIX ,pp.1589Œ1604,2020. [212] A.Raghunathan,S.M.Xie,F.Yang,J.C.Duchi,andP.Liang,fiAdversarialtrainingcan hurtgeneralization,fl arXivpreprintarXiv:1906.06032 ,2019. [213] D.Yang,S.Hong,Y.Jang,T.Zhao,andH.Lee,fiDiversity-sensitiveconditionalgenerative adversarialnetworks,fl ICLR ,2019. [214] P.Zhou,X.Han,V.I.Morariu,andL.S.Davis,fiTwo-streamneuralnetworksfortampered facedetection,flin CVPRW ,IEEE,2017. [215] X.Yang,Y.Li,andS.Lyu,fiExposingdeepfakesusinginconsistentheadposes,flin ICASSP , IEEE,2019. [216] P.KorshunovandS.Marcel,fiDeepfakes:anewthreattofacerecognition?assessmentand detection,fl arXivpreprintarXiv:1812.08685 ,2018. [217] R.Wang,F.Juefei-Xu,L.Ma,X.Xie,Y.Huang,J.Wang,andY.Liu,fiFakespot- ter:Asimpleyetrobustbaselineforspottingai-synthesizedfakefaces,fl arXivpreprint arXiv:1909.06122 ,2019. [218] Y.Liu,A.Jourabloo,andX.Liu,fiLearningdeepmodelsforfaceBinaryor auxiliarysupervision,flin CVPR ,June2018. [219] A.Madry,A.Makelov,L.Schmidt,D.Tsipras,andA.Vladu,fiTowardsdeeplearning modelsresistanttoadversarialattacks,fl arXivpreprintarXiv:1706.06083 ,2017. [220] L.Li,J.Bao,T.Zhang,H.Yang,D.Chen,F.Wen,andB.Guo,fiFacex-rayformoregeneral faceforgerydetection,flin CVPR ,2020. [221] Y.A.U.Rehman,L.M.Po,andM.Liu,fiDeeplearningforfaceanend-to- endapproach,flin IEEESPA ,pp.195Œ200,2017. [222] A.Jourabloo,Y.Liu,andX.Liu,fiFacevianoisemodeling,flin ECCV ,2018. [223] F.Chollet,fiXception:Deeplearningwithdepthwiseseparableconvolutions,flin CVPR , 2017. [224] S.Mehta,A.Uberoi,A.Agarwal,M.Vatsa,andR.Singh,fiCraftingapanopticfacepresen- tationattackdetector,flin ICB ,IEEE,2019. [225] J.Stehouwer,A.Jourabloo,Y.Liu,andX.Liu,fiNoisemodeling,synthesisand tionforgenericobjectflin CVPR ,2020. 169 [226] M.-T.Luong,Q.V.Le,I.Sutskever,O.Vinyals,andL.Kaiser,fiMulti-tasksequenceto sequencelearning,fl arXivpreprintarXiv:1511.06114 ,2015. 
[227] E.MeyersonandR.Miikkulainen,fiPseudo-taskaugmentation:Fromdeepmultitasklearn- ingtointratasksharingŠandback,flin ICML ,pp.3511Œ3520,2018. [228] X.YinandX.Liu,fiMulti-taskconvolutionalneuralnetworkforpose-invariantfacerecog- nition,fl IEEET-IP ,vol.27,no.2,pp.964Œ975,2017. [229] M.Crawshaw,fiMulti-tasklearningwithdeepneuralnetworks:Asurvey,fl arXivpreprint arXiv:2009.09796 ,2020. [230] T.Gui,L.Qing,Q.Zhang,J.Ye,H.Yan,Z.Fei,andX.Huang,fiConstructingmultipletasks foraugmentation:Improvingneuralimagewithk-meansfeatures,flin AAAI , 2020. [231] K.Hsu,S.Levine,andC.Finn,fiUnsupervisedlearningviameta-learning,fl ICLR ,2018. [232] J.Baxter,fiAbayesian/informationtheoreticmodeloflearningtolearnviamultipletask sampling,fl Machinelearning ,vol.28,no.1,pp.7Œ39,1997. [233] N.Sanghvi,S.K.Singh,A.Agarwal,M.Vatsa,andR.Singh,fiMixnetforgeneralizedface presentationattackdetection,fl arXivpreprintarXiv:2010.13246 ,2020. [234] J.J.Engelsma,K.Cao,andA.K.Jain,fiLearningaed-lengthrepresentation,fl IEEEPAMI ,2019. 170