FACE RECOGNITION: REPRESENTATION, INTRINSIC DIMENSIONALITY, CAPACITY, AND DEMOGRAPHIC BIAS

By

Sixue Gong

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computer Science - Doctor of Philosophy

2021

ABSTRACT

Face recognition is a widely adopted technology with numerous applications, such as mobile phone unlock, mobile payment, surveillance, social media, and law enforcement. There has been tremendous progress in enhancing the accuracy of face recognition systems over the past few decades, much of which can be attributed to deep learning. Despite this progress, several fundamental problems in face recognition remain unsolved. These problems include finding a salient representation, estimating intrinsic dimensionality, representation capacity, and demographic bias. With growing applications of face recognition, the need for an accurate, robust, compact, and fair representation is evident.

In this thesis, we first develop algorithms to obtain practical estimates of the intrinsic dimensionality of face representations, and propose a new dimensionality reduction method to project feature vectors from the ambient space to the intrinsic space. Based on the study of intrinsic dimensionality, we then estimate the capacity of a face representation, casting the face capacity estimation problem in the information-theoretic framework of the capacity of a Gaussian noise channel. Numerical experiments on unconstrained faces (IJB-C) provide a capacity upper bound of 2.7 × 10^4 for the FaceNet representation and 8.4 × 10^4 for the SphereFace representation at 1% FAR.
In the second part of the thesis, we address the demographic bias problem in face recognition systems, where errors are lower on certain cohorts belonging to specific demographic groups. We propose two de-biasing frameworks that extract feature representations to improve fairness in face recognition. Experiments on benchmark face datasets (RFW, LFW, IJB-A, and IJB-C) show that our approaches are able to mitigate face recognition bias on various demographic groups (biasness drops from 6.83 to 5.07) as well as maintain competitive performance (i.e., 99.75% on LFW, and 93.70% TAR @ 0.1% FAR on IJB-C). Lastly, we explore the global distribution of deep face representations, derived from correlations between within-class and cross-class image samples, to enhance the discriminativeness of the face representation of each identity in the embedding space. Our new approach to face representation achieves state-of-the-art performance on both verification and identification tasks on benchmark datasets (99.78% on LFW, 93.40% on CPLFW, 98.41% on CFP-FP, 96.2% TAR @ 0.01% FAR and 95.3% Rank-1 accuracy on IJB-C). Since the primary techniques we employ in this dissertation are not specific to faces, we believe our research can be extended to other problems in computer vision, for example, general image classification and representation learning.

Copyright by
SIXUE GONG
2021

Dedicated to my parents, Qiaojie Gong and Yan Huang

ACKNOWLEDGMENTS

The years at MSU have been an unforgettable experience that has been engraved on my heart, full of joys and sorrows, partings and meetings. It is a comfort to remember and to thank my teachers, colleagues, friends, and family whose influence contributed to this thesis.
First and foremost, I would like to thank my advisor, Anil K. Jain, for his extraordinary support, patience, guidance, and for funding my entire research on face recognition. His suggestions and our discussions contributed immensely to this dissertation. Every summer semester, Dr. Jain encouraged and helped me to find internships, which enriched my working experience with industry and also inspired this dissertation research. Moreover, he allowed me the freedom to explore directions in my own research, and to manage my work-life balance by spending time on my hobbies. I have come to appreciate the wisdom of his way, which has guided me effectively and safely through the process.

I owe many inspirational ideas to Vishnu Naresh Boddeti and Xiaoming Liu, who have been like co-advisors to me. It has been a privilege to work with Dr. Boddeti and Dr. Liu, which strengthened my research ability in the fields of computer vision and deep learning. This thesis would not have been possible without Dr. Boddeti's and Dr. Liu's guidance and contributions. I deeply appreciate their enthusiasm for novel approaches and their genuinely positive attitude towards science and my research progress.

I am very thankful to Professor Yuan Zhang, from whom I learned a tremendous amount by taking her undergraduate courses on information theory and data compression in my senior year. Working with Professor Zhang also fostered my interest in research. She later introduced me to the Key Intelligent Information Processing Laboratory of the Chinese Academy of Sciences, where I had the opportunity to work with Professors Hu Han and Shiguang Shan. I highly value the many useful ideas, comments, and practical advice from Professor Han and Professor Shan. Had I not worked with Professor Han and Professor Shan, it is unlikely I would have ended up in the PRIP lab working with Dr. Jain.

I thank all of the members of the PRIP lab for participating in my research and providing valuable feedback on my work at all times. I am very grateful for their friendship and support.

I thank all my friends in East Lansing. They are like my extended family here - the members of the PRIP lab (Debayan Deb, Joshua Engelsma, Yichun Shi, Kai Cao, Tarang Chugh, Inci M.
Baytas, Vishesh Mistry, Divyansh Aggarwal, and Steven Grosz), Dipti Kamath, Manni Liu, Lan Wang, Rahul Dey, and Xiaoxue Wang - for parties, dinners, movies, games, and loving friendship over the years.

I thank my entire family for their love and support, especially my parents, Qiaojie Gong and Yan Huang, my aunt, Baolan Gong, and my cousin, Ruozhu Chen.

Finally, I would also like to thank those who have brought suffering to me. I have learned important lessons in life and built resilience by embracing adversity as a great teacher. It offered me valuable chances to gain treasured insights and wisdom, and to prepare myself for a better opportunity of success next time.

TABLE OF CONTENTS

LIST OF TABLES . . . xi
LIST OF FIGURES . . . xiv
LIST OF ALGORITHMS . . . xx
KEY TO ABBREVIATIONS . . . xxi

Chapter 1  Introduction: The Importance of Face Representation . . . 1
  1.1  Automated Face Recognition . . . 3
    1.1.1  Input Data Source . . . 3
    1.1.2  Evaluation . . . 5
    1.1.3  Face Recognition Pipeline . . . 7
  1.2  Face Representation Extractors . . . 8
  1.3  Manifold of Face Representation . . . 11
    1.3.1  Intrinsic Dimensionality . . . 11
    1.3.2  Capacity . . . 13
  1.4  Bias in Face Recognition . . . 15
  1.5  Face Representation via Graph Neural Network . . . 18
  1.6  Thesis Contributions . . . 20
  1.7  Thesis Structure . . . 22

Chapter 2  The Intrinsic Dimensionality of Face Representation . . . 23
  2.1  Intrinsic Dimensionality . . . 24
  2.2  Dimensionality Reduction . . . 25
  2.3  Our Approach . . . 25
    2.3.1  Estimating Intrinsic Dimensionality . . . 26
    2.3.2  Estimating Intrinsic Space . . . 30
  2.4  Experiments . . . 31
    2.4.1  Intrinsic Dimensionality Estimation . . . 31
    2.4.2  Intrinsic Space Mapping . . . 36
  2.5  Conclusion . . . 43

Chapter 3  The Capacity of Face Representation . . . 44
  3.1  Related Work . . . 46
  3.2  Capacity of Face Representations . . . 48
    3.2.1  Face Representation Model . . . 48
    3.2.2  Estimating Uncertainties in Representations . . . 50
    3.2.3  Manifold Approximation . . . 54
    3.2.4  Decision Theory and Model Capacity . . . 55
  3.3  Numerical Experiments . . . 58
    3.3.1  Two-Dimensional Toy-Example . . . 59
    3.3.2  Datasets and Face Representation Model . . . 60
    3.3.3  Face Recognition Performance . . . 61
    3.3.4  Face Representation Capacity . . . 63
    3.3.5  Ablation Studies . . . 66
  3.4  Conclusion . . . 68

Chapter 4  The Bias in Face Recognition . . . 70
  4.1  Fairness Learning and De-biasing Algorithms . . . 71
  4.2  Problem Definition . . . 72
  4.3  Jointly De-biasing Face Recognition and Demographic Attribute Estimation . . . 73
    4.3.1  Adversarial Learning and Disentangled Representation . . . 75
    4.3.2  Methodology . . . 75
      4.3.2.1  Algorithm Design . . . 75
      4.3.2.2  Network Architecture . . . 77
      4.3.2.3  Adversarial Training and Disentanglement . . . 78
    4.3.3  Experiments . . . 80
      4.3.3.1  Datasets and Pre-processing . . . 80
      4.3.3.2  Implementation Details . . . 81
      4.3.3.3  De-biasing Face Verification . . . 81
      4.3.3.4  De-biasing Demographic Attribute Estimation . . . 83
      4.3.3.5  Analysis of Disentanglement . . . 85
      4.3.3.6  Face Verification on Public Testing Datasets . . . 87
      4.3.3.7  Distributions of Scores . . . 88
  4.4  Mitigating Face Recognition Bias via Group Adaptive Classifier . . . 89
    4.4.1  Adaptive Neural Networks . . . 91
    4.4.2  Methodology . . . 93
      4.4.2.1  Overview . . . 93
      4.4.2.2  Adaptive Layer . . . 93
      4.4.2.3  Automation Module . . . 96
      4.4.2.4  De-biasing Objective Function . . . 96
    4.4.3  Experiments . . . 97
      4.4.3.1  Results on RFW Protocol . . . 98
      4.4.3.2  Results on Gender and Race Groups . . . 104
      4.4.3.3  Analysis on Intrinsic Bias and Data Bias . . . 106
      4.4.3.4  Results on Standard Benchmark Datasets . . . 107
      4.4.3.5  Visualization and Analysis on Bias of FR . . . 108
      4.4.3.6  Network Complexity and FLOPs . . . 109
  4.5  Demographic Estimation . . . 109
  4.6  Conclusion . . . 111

Chapter 5  Adversarial Face Representation Learning via Graph Classification . . . 113
  5.1  Adversarial Learning and Graph Classification with GNN . . . 113
  5.2  Our Approach . . . 115
    5.2.1  Overall Framework . . . 115
    5.2.2  Graph Construction . . . 117
    5.2.3  Discriminator and Adversarial Learning . . . 118
    5.2.4  Network Training . . . 120
  5.3  Experiments . . . 122
    5.3.1  Datasets and Implementation Details . . . 122
    5.3.2  Ablation Study . . . 123
    5.3.3  Comparisons with SOTA Methods . . . 125
    5.3.4  Analysis on Feature Distribution . . . 128
  5.4  Concluding Remarks . . . 130

Chapter 6  Summary and Future Work . . . 131

APPENDIX . . . 137

BIBLIOGRAPHY . . . 139

LIST OF TABLES

Table 2.1  Intrinsic Dimensionality: Graph Distance [1] . . . 35
Table 2.2  Intrinsic Dimensionality: KNN [2] . . . 35
Table 2.3  Intrinsic Dimensionality: IDEA [3] . . . 36
Table 2.4  LFW Face Verification for SphereFace Embedding . . . 38
Table 2.5  DeepMDS Training Methods (TAR @ 0.1% FAR) . . . 40
Table 3.1  Capacity of Two-Dimensional Toy Example at 1% FAR . . . 59
Table 3.2  Face Recognition Results for FaceNet, SphereFace and State-of-the-Art (the state-of-the-art face representation models are not available in the public domain) . . . 63
Table 3.3  Capacity of Face Representation Model at 1% FAR . . . 64
Table 3.4  IJB-C Capacity at 1% FAR Across Intra-Class Uncertainty . . . 66
Table 3.5  IJB-C Capacity at 1% FAR Across Manifold Support . . . 68
Table 4.1  Statistics of training and testing datasets used in the paper . . . 80
Table 4.2  Biasness of Face Recognition and Demographic Attribute Estimation . . . 84
Table 4.3  Demographic Classification Accuracy (%) by face features . . . 87
Table 4.4  Face Verification Accuracy (%) on RFW dataset . . . 87
Table 4.5  Verification Performance on LFW, IJB-A, and IJB-C . . . 88
Table 4.6
Performance comparison with SOTA on the RFW protocol [4]. The results marked by (*) are directly copied from [5] . . . 99
Table 4.7  Ablation of adaptive strategies on the RFW protocol [4] . . . 102
Table 4.8  Ablation of CNN depths and demographics on RFW protocol [4] . . . 102
Table 4.9  Ablations on λ on RFW protocol (%) . . . 103
Table 4.10  Verification Accuracy (%) of 5-fold cross-validation on 8 groups of RFW [4] . . . 103
Table 4.11  Ablations on the automation module on RFW protocol (%) . . . 104
Table 4.12  Statistics of dataset folds in the cross-validation experiment . . . 104
Table 4.13  Verification (%) on gender groups of IJB-C (TAR @ 0.1% FAR) . . . 105
Table 4.14  Verification accuracy (%) on the RFW protocol [4] with varying race/ethnicity distribution in the training set . . . 106
Table 4.15  Verification performance on LFW, IJB-A, and IJB-C. [Key: Best, Second, Third Best] . . . 107
Table 4.16  Distribution of ratios between minimum inter-class distance and maximum intra-class distance of face features in 4 race groups of RFW. GAC exhibits higher ratios, and more similar distributions to the reference . . . 108
Table 4.17  Network complexity and inference time . . . 109
Table 4.18  Gender distribution of the datasets for gender estimation . . . 110
Table 4.19  Race distribution of the datasets for race estimation . . . 111
Table 4.20  Age distribution of the datasets for age estimation . . . 111
Table 5.1  Verification performance (%) of different vertex feature matrices of oracle graphs. A bigger r tolerates more intra-class variations, while a small r, f_i^c, or f_i^p strives for minimal intra-class variation. A balance between the learning capability and ideal representations performs best (r = 0.7) . . . 123
Table 5.2  Verification performance (%) of different adjacency matrices of generated graphs . . . 125
Table 5.3  Verification performance (%) of different λ and μ . . . 125
Table 5.4  Verification accuracy (%) of our model and SOTA methods on LFW, CPLFW, and CFP-FP. The results marked by (*) are re-implemented by ArcFace [6]. All other baseline results are reported by their respective papers. [Keys: Red: Best, Blue: Second best] . . . 126
Table 5.5  Comparisons of verification performance with SOTA methods on IJB-A, IJB-B, and IJB-C. The evaluation is measured by TAR (%), True Acceptance Rate, at a certain FAR, False Acceptance Rate. For IJB-A, FAR = 0.1%; for IJB-B and IJB-C, FAR = 0.01%. The decimal precision of TAR varies among those reported by SOTA methods. Results reported in this table are unified to one decimal place (0.1). All baseline results are reported by their respective papers. [Keys: Red: Best, Blue: Second best] . . . 127
Table 5.6  Comparisons of face identification performance (%) on the IJB-C dataset (close-set) . . . 128

LIST OF FIGURES

Figure 1.1  Example images from different datasets. MS-Celeb-1M, CASIA, and LFW contain only still images. IJB-A, IJB-B, and IJB-C include both still images and video frames. Three subjects are selected from each dataset, and each row contains images belonging to one subject . . . 5
Figure 1.2  A typical pipeline of feature-based FR frameworks comprises face detection, alignment, normalization, feature extraction, and feature matching. While each of these components affects the performance of FR systems, in this thesis, we focus on the representation function of feature extraction that maps a high-dimensional normalized face image to a d-dimensional vector representation . . . 8
Figure 1.3  The error tradeoff characteristics in the form of false non-match rates (FNMR = 1 - TAR) vs. false match rates (FMR = FAR) for one commercial FR
algorithm verifying mugshot images, reported by the NIST (National Institute of Standards and Technology) Face Recognition Vendor Test [7]. The FMR estimates are computed on impostor pairs of face images with the same gender and same race. Each symbol (circle, triangle, square) corresponds to a fixed threshold; their vertical and horizontal displacements reveal, respectively, differences in FNMR and FMR between demographic groups [7] . . . 15
Figure 1.4  False positive differentials in verification algorithms provided to NIST. The dots give the false match rates for same-sex and same-race impostor comparisons. The threshold is set for each algorithm to give FMR = 0.0001 on white males (the purple dots in the right-hand panel). The algorithms are sorted in order of worst-case FMR [7] . . . 16
Figure 2.1  Intrinsic Dimension: Our approach is based on two observations. (a) Graph-induced geodesic distance between images is able to capture the topology of the image representation manifold more reliably. As an illustration, we show the graph edges for the surface of a unitary hypersphere and a face manifold of ID two, embedded within a 3-dim space. (b) The distribution of the geodesic distances (for distances r_max/2 ≤ r ≤ r_max, where r_max is the distance at the mode) has been empirically observed [1] to be similar across different topological structures with the same intrinsic dimensionality. The plot shows the distance distribution for a face representation, a unitary hypersphere, and a Gaussian distribution of ID two embedded within a 3-dim space. [8] . . . 27
Figure 2.2  DeepMDS Mapping: A DNN-based non-linear mapping is learned to transform the ambient space to a plausible intrinsic space. The network is optimized to preserve distances between pairs of points in the ambient and intrinsic space . . . 30
Figure 2.3  Intrinsic Dimensionality: (a) Geodesic distance distribution, and (b) global minimum of RMSE . . . 32
Figure 2.4  Distribution of geodesic distances for different representation models and datasets . . . 33
Figure 2.5  log(p̂_M(r)/p̂_M(r_max)) vs. log(r/r_max) plots as we vary the number of neighbors k for the SphereFace representation model on different datasets . . . 34
Figure 2.6  Intrinsic Dimensionality of Swiss Roll . . . 36
Figure 2.7  Swiss Roll: (a) the original 2,000 points from the swiss roll manifold, (b) the 2-dim intrinsic space estimated by Isomap, and (c) the 2-dim intrinsic space estimated by our proposed method DeepMDS. In both cases, the blue and black points, and correspondingly the green and red points, are close together in both the intrinsic and ambient space . . . 37
Figure 2.8  DeepMDS: Face Verification on IJB-C [9] (TAR @ 0.1% FAR in legend) for the (a) FaceNet-128, (b) FaceNet-512, and (c) SphereFace embeddings . . . 38
Figure 2.9  DeepMDS: Face Verification on the LFW (BLUFR) dataset for the (a) FaceNet-128, (b) FaceNet-512, and (c) SphereFace embeddings . . . 39
Figure 2.10  PCA: Face Verification on the IJB-C and LFW (BLUFR) datasets for the (a) FaceNet-128, (b) FaceNet-512, and (c) SphereFace embeddings . . . 40
Figure 2.11  Isomap: Face Verification on the IJB-C and LFW (BLUFR) datasets for the (a) FaceNet-128, (b) FaceNet-512, and (c) SphereFace embeddings . . . 41
Figure 2.12  ROC curves on the LFW and IJB-C datasets for the InceptionResNetV1 [10] model trained with different embedding dimensionalities on the CASIA-WebFace [11] dataset . . . 42
Figure 2.13  Denoising Autoencoder: Face Verification on the IJB-C and LFW (BLUFR) datasets for the (a) FaceNet-128, (b) FaceNet-512, and (c) SphereFace embeddings . . . 42
Figure 3.1  An illustration of the geometrical structure of our capacity estimation problem: a low-dimensional manifold M ⊂ R^m embedded in a high-dimensional space P ⊂ R^p.
On this manifold, all the faces lie inside the population hyper-ellipsoid, and the embeddings of images belonging to each identity or class are clustered into their own class-specific hyper-ellipsoids. The capacity of this manifold is the number of identities (class-specific hyper-ellipsoids) that can be packed into the population hyper-ellipsoid within an error tolerance or amount of overlap . . . 45
Figure 3.2  Overview of Face Representation Capacity Estimation: We cast the capacity estimation process in the framework of the sphere packing problem on a low-dimensional manifold. To generalize the sphere packing problem, we replace spheres by hyper-ellipsoids, one per class (subject). Our approach involves three steps: (i) unfolding and mapping the manifold embedded in high-dimensional space onto a low-dimensional space; (ii) a Teacher-Student model to obtain explicit estimates of the uncertainty (noise) in the embedding due to the data as well as the parameters of the representation; and (iii) the uncertainty estimates are leveraged to approximate the density manifold via multivariate normal distributions (to keep the problem and its analysis tractable), which in turn facilitates an empirical estimate of the capacity of the teacher face representation as a ratio of hyper-ellipsoidal volumes . . . 49
Figure 3.3  Manifold Unfolding: A DNN-based non-linear mapping is learned to unfold and project the population manifold into a lower-dimensional space. The network is optimized to preserve the geodesic distances between pairs of points in the high- and low-dimensional space . . . 50
Figure 3.4  Decision Theory and Capacity: We illustrate the relation between capacity and the discriminant function corresponding to a nearest neighbor classifier. Left: Depiction of the notion of decision boundary and probability of false accept between two identical one-dimensional Gaussian distributions. Shannon's definition of capacity corresponds to the decision boundary being one standard deviation away from the mean. Right: Depiction of the decision boundary induced by the discriminant function of a nearest neighbor classifier. Unlike in the definition of Shannon's capacity, the size of the ellipsoidal decision boundary is determined by the maximum acceptable false accept rate. The probability of false acceptance can be computed through the cumulative distribution function of a χ²(r², d) distribution . . . 55
Figure 3.5  Sample Representation Space: Illustration of a two-dimensional space where the underlying population and class-specific representations (we show four classes) are 2-D Gaussian distributions (solid ellipsoids). Samples from the classes (colored markers) are utilized to obtain estimates of this underlying population and the class-specific distributions (solid lines). As a comparison, the support of the samples in the form of a convex hull is also shown (dashed lines) . . . 59
Figure 3.6  Face recognition performance of the original and student models on different datasets. We report the face verification performance of both the FaceNet and SphereFace face representations: (a) LFW evaluated through the BLUFR protocol, (b) IJB-A, (c) IJB-B, and (d) IJB-C evaluated through their respective matching protocols . . . 63
Figure 3.7  Capacity estimates across different datasets for the (a) FaceNet [12] and (b) SphereFace [13] representations as a function of different false accept rates. Under the limit, the capacity tends to zero as the FAR tends to zero. Similarly, the capacity tends to 1 as the FAR tends to 1.0. (c) Logarithmic values of capacity on different datasets versus the corresponding TAR @ 0.1% FAR . . . 65
Figure 3.8  Example images of classes that correspond to different sizes of the class-specific hyper-ellipsoids, based on the SphereFace representation, for the different datasets considered. Top Row: Images of the class with the largest class-specific hyper-ellipsoid for each database. Notice that in the case of a database with predominantly frontal faces (LFW), large variations in facial appearance lead to the greatest uncertainty in the class representation. On more challenging datasets (IJB-B, IJB-C), the face representation exhibits the most uncertainty due to pose variations.
Bottom Row: Images of the class with the smallest class-specific hyper-ellipsoid for each database. As expected, across all the datasets, frontal face images with minimal change in appearance result in the least amount of uncertainty in the class representation . . . 67
Figure 4.1  Methods to learn different tasks simultaneously. Solid lines are typical feature flow in a CNN, while dashed lines are adversarial losses . . . 74
Figure 4.2  Overview of the proposed De-biasing face (DebFace) network. DebFace is composed of three major blocks, i.e., a shared feature encoding block, a feature disentangling block, and a feature aggregation block. The solid arrows represent the forward inference, and the dashed arrows stand for adversarial training. During inference, either DebFace-ID (i.e., f_ID) or DemoID can be used for face matching given the desired trade-off between biasness and accuracy . . . 76
Figure 4.3  Face Verification AUC (%) on each demographic cohort. The cohorts are chosen based on the three attributes, i.e., gender, age, and race. To fit the results into a 2D plot, we show the performance of male and female separately. Due to the limited number of face images in some cohorts, their results are gray cells . . . 82
Figure 4.4  The overall performance of face verification AUC (%) on gender, age, and race . . . 83
Figure 4.5  Classification accuracy (%) of demographic attribute estimation on faces of different cohorts, by DebFace and the baselines. For simplicity, we use DebFace-G, DebFace-A, and DebFace-R to represent the gender, age, and race classifiers of DebFace . . . 84
Figure 4.6  The distribution of face identity representations of BaseFace and DebFace. Both collections of feature vectors are extracted from images of the same dataset. Different colors and shapes represent different demographic attributes. Zoom in for details . . . 85
Figure 4.7  Reconstructed images using face and demographic representations. The first row is the original face images. From the second row to the bottom, the face images are reconstructed from 2) BaseFace; 3) DebFace-ID; 4) DebFace-G; 5) DebFace-R; 6) DebFace-A. Zoom in for details . . . 85
Figure 4.8  The percentage of falsely accepted cross-race or cross-age pairs at 1% FAR . . . 87
Figure 4.9  BaseFace and DebFace distributions of the similarity scores of the impostor pairs across homogeneous versus heterogeneous gender, age, and race categories . . . 89
Figure 4.10  (a) Our proposed group adaptive classifier (GAC) automatically chooses between non-adaptive and adaptive layers in a multi-layer network, where the latter uses demographic-group-specific kernels and attention. (b) Compared to the baseline with the 50-layer ArcFace backbone, GAC improves face verification accuracy in most groups of the RFW dataset [4], especially under-represented groups, leading to mitigated FR bias. GAC reduces biasness from 1.11 to 0.60 . . . 90
Figure 4.11  A comparison of approaches in adaptive CNNs . . . 92
Figure 4.12  Overview of the proposed GAC for mitigating FR bias. GAC contains two major modules: the adaptive layer and the automation module. The adaptive layer consists of adaptive kernels and attention maps. The automation module is employed to decide whether a layer should be adaptive or not . . . 94
Figure 4.13  ROC of (a) baseline and (b) GAC evaluated on all pairs of RFW . . . 100
Figure 4.14  8 false positive and false negative pairs on RFW given by the baseline but successfully verified by GAC . . . 101
Figure 4.15  (a) For each of the three τ in automatic adaptation, we show the average similarities of pair-wise demographic kernel masks, i.e., θ, at 1-48 layers (y-axis), and 1-15 training steps (x-axis). The number of adaptive layers in the three cases, i.e., Σ₁⁴⁸ 1(θ > τ) at the 15th step, are 12, 8, and 2, respectively. (b) With two race groups (White, Black in PCSO [14]) and two models (baseline, GAC), for each of the four combinations, we compute the pair-wise correlation of face representations using any two of 1K subjects in the same race, and plot the histogram of correlations. GAC reduces the difference/bias of the two distributions . . . 102
Figure 4.16  The first row shows the average faces of different groups in RFW. The next two rows show gradient-weighted class activation heatmaps [15] at the 43rd convolutional layer of the GAC and baseline. The higher diversity of heatmaps in GAC shows the variability of parameters in GAC across groups . . . 104
Figure 4.17  Demographic Attribute Classification Accuracy on each group. The red dashed line refers to the average accuracy on all images in the testing set . . . 110
Figure 5.1  (a) In GANs, during training an image generator gradually produces higher-quality faces so that a CNN-based discriminator cannot distinguish fake from real faces. (b) Analogously, given input faces, our embedding network for face recognition learns to extract discriminative features and connect features as a graph, with the goal that a graph neural network (GNN)-based discriminator cannot distinguish generated graphs from oracle graphs - the graphs of ideal face representations. During inference, our embedding network can extract more discriminative features that form an oracle-like graph, just like a GAN's generator synthesizes photo-realistic faces.
. . . 114
Figure 5.2  Overview of the proposed adversarial face representation learning via graph classification. Solid arrows present the forward pass, and dashed arrows denote backward propagation. The training alternates between E(·) and D(·). For the shared inference (solid blue arrows), a set of face images is first taken by the embedding network E(·) to extract feature representations. These feature vectors are then converted into a graph structure by the graph constructor G(·), in which an oracle graph and a generated graph are constructed. During the training of D(·) (yellow arrows), the two types of graphs are received by the graph discriminator D(·), which is required to make predictions on the category of the graphs. D(·) is then updated based on the gradient sent back from the loss function L_D. In the course of training E(·) (red arrows), only generated graphs are delivered to D(·), and E(·) receives feedback from L_E, whose goal is to drive D(·) to make errors on generated graphs . . . 116
Figure 5.3  Construction of generated graph and oracle graph. In this example, the input image set comprises 9 images of 3 subjects, 3 images per subject. The image of each subject is surrounded by a circle with a unique color, indicating its identity. Each image is projected to a point in the 2D Euclidean feature space. The following graphs are constructed: (a) a generated graph, where each vertex v_i is represented by its feature vector, with a directed edge from v_i to v_j if v_j is one of the top-2 nearest neighbors of v_i; (b) an oracle graph created by center points, where each vertex is represented by the mean vector of its identity, with a bidirectional edge connecting two vertices of the same subject; (c) a radius constraint is used to allow tolerable intra-subject variations, where vertices move towards the center directions (denoted by dashed arrows) to meet the radius requirement. For the vertex within the radius, the leftmost one in this example, it stays the same. (d) an oracle graph controlled by r, where the distance of each vertex to its center is reduced by the ratio of r, with a bidirectional edge connecting two vertices of the same subject . . . 119
Figure 5.4  Two examples of generated graphs being updated by adversarial learning at 4 instances during the training process . . . 129
Figure 5.5  t-SNE visualization of the face representations in a 2D space. Each identity is represented by a unique color. The initial face representations extracted from CurricularFace, and the updated representations learned via adversarial graph classification, are shown in (a) and (b), respectively
 . . . 130

LIST OF ALGORITHMS

Algorithm 1  Face Representation Capacity Estimation . . . 58

KEY TO ABBREVIATIONS

Acronyms/Abbreviations

FR    Face Recognition
SOTA  State-of-the-art
CCTV  Closed Circuit Television
ROC   Receiver Operating Characteristic
TAR   True Acceptance Rate
FAR   False Acceptance Rate
CMC   Cumulative Match Characteristic
DIR   Detection & Identification Rate
kNN   k nearest neighbors
PCA   Principal Component Analysis
LDA   Linear Discriminant Analysis
LBP   Local Binary Patterns
LQP   Local Quantized Patterns
HOG   Histogram of Oriented Gradients
SIFT  Scale Invariant Feature Transform
DNN   Deep Neural Network
CNN   Convolutional Neural Network
IND   Intrinsic Dimensionality
GAC   Group Adaptive Classifier
MDS   Multidimensional Scaling
RMSE  Root Mean Squared Error
GAN   Generative Adversarial Network
GNN   Graph Neural Network

Chapter 1

Introduction: The Importance of Face Representation

In the biometrics and computer vision communities, face recognition (FR) has emerged as one of the major research fields, focusing on the design of algorithms that can automatically authenticate people's identities based on their digital face images. With the rapid proliferation of face images on social media websites, such as Facebook, Twitter, and Instagram, researchers in the FR community have access to abundant images and videos of human faces, which has rapidly accelerated the development of FR systems and extended their applications. For example, FR systems are widely adopted for security-related applications (e.g., access control, surveillance systems), forensic applications (criminal identity verification), and entertainment applications on desktops and mobile devices, for example, mobile apps for face photo editing. As a convenient authentication tool, FR requires minimal interaction with users and can even operate under uncontrolled environments and at a distance [16]. Compared to other biometric traits (i.e., iris, fingerprints, voice, etc.), in addition to identity, a human face image contains several pieces of useful information, including demographics (gender, age, race/ethnicity), facial expression, and emotion cues. Such rich information, on the other
hand, can undermine the reliability of FR systems. This is because some facial characteristics, such as facial expressions, can deform crucial face features and lead to large intra-person facial variations that are difficult to compensate for. One of the key steps in any FR system is to impart robustness towards challenges caused by variations in face images by extracting salient facial features. Instead of a raw face image, a vector of representative features, also referred to as a face representation, is used to distinguish the identity. A good representation method is capable of reducing intra-person variations while maintaining or even enhancing inter-person differences.

When using deep learning models [6, 13, 17-19] to extract face representations, current state-of-the-art (SOTA) FR systems claim to surpass human capability in certain scenarios [20]. Despite this tremendous progress, some fundamental questions in face representation learning remain:

- How compact can the representation be without any loss in recognition performance?
- Given a face representation, how many identities can it resolve? In other words, what is the capacity of a given face representation?
- Does an FR system generate representations that are equally discriminable for faces in different demographic groups?
- Is it possible to enhance the saliency of the face representation for every subject in a target population by means of the distance distribution in the embedding space?

First, a scientific basis for estimating the compactness and the capacity of a given face representation will not only benefit the evaluation and comparison of different representation methods, but will also benefit the development of compact face representations (small template size) with high search efficiency, and establish an upper bound on the scalability of an automatic FR system. Second, FR systems are known to exhibit biased performance against certain demographic groups [7, 14, 21].
Given the importance of automated FR-driven decisions, deploying biased FR systems, especially for law enforcement, is potentially unethical [22]. Some state and local governments in the United States have curtailed the use of face recognition for these reasons, with cities including San Francisco and Boston enacting their own bans. It is necessary to develop fair and unbiased FR systems to avoid their negative societal impact. While reducing the bias among demographic groups is important, these groups are pre-defined based on one's demographic attributes. It is also crucial to improve the representation for each individual regardless of his or her gender, age, and race/ethnicity. The main goal of this thesis is to develop practical tools to reason about the compactness and capacity of a given face representation, and to design algorithms for fairer and more discriminative face representations of each identity in every demographic group. To begin with, we first give a brief background on automated face recognition. Then we introduce the motivation, goals, and problem domains of each work on automated face recognition addressed in this dissertation in Sec. 1.3 to Sec. 1.5, separately. We also summarize the contributions and the structure of this thesis at the end of the chapter.

1.1 Automated Face Recognition

1.1.1 Input Data Source

As a visual pattern recognition task, FR can be performed on a variety of input data sources, such as 1) a single 2D image, 2) a set of 2D images (video frames), and 3) 3D face images. A single 2D image is often used as the input of face verification systems. In some scenarios, for example, in a surveillance environment, a clip of video captured by CCTV systems is taken as the input of FR systems. Although multiple frames in a video clip provide rich information at different time stamps, unconstrained FR in video-surveillance settings is still a challenging problem, because video frames tend to be of poor quality with motion blur and unfavorable viewing angles. 3D sensors, including depth sensors, may provide extra information and help improve the accuracy of FR, but they are expensive and have relatively long acquisition times. This thesis focuses on the more commonly deployed scenario where a single 2D image is the input data.
A key aspect of deep learning based FR algorithms is the training data used to learn face representations. Data collection and annotation are extremely important for supervised face representation learning. Widely adopted, publicly available face datasets used in this thesis for training and testing face representation models are listed below. Examples of face images in these datasets are shown in Fig. 1.1.

CASIA WebFace [11]: A collection of labeled images downloaded from the web (based on names of famous personalities as keywords), popular for training deep neural networks. It consists of 494,414 images across 10,575 subjects, with an average of about 500 face images per subject. This dataset is primarily used for training face representation models.

MS-Celeb-1M [23]: The first released version of this dataset contained around 10 million images of 100K celebrities. About 100 images were retrieved for each identity by the Bing search engine using the celebrity's name. With no filtering of the retrieved images, the quality of the dataset is greatly muddied by label noise, duplicated images, and non-face images, making it hard to use directly for representation learning [24]. For this reason, there have been several cleaned versions of the dataset ([6, 13, 25, 26]). In this thesis, the version of [6] is used as the dataset for training FR models, which contains 5,822,653 images of 85,742 subjects.

LFW [27]: 13,233 face images of 5,749 subjects, downloaded from the web. These images exhibit limited variations in pose, illumination, and expression, since only faces that could be detected by the Viola-Jones face detector [28] were included in the dataset. One limitation of this dataset is that only 1,680 subjects among the total of 5,749 subjects have more than one face image.
IJB-A [29]: IARPA Janus Benchmark-A (IJB-A) contains 500 subjects with a total of 25,813 images (5,399 still images and 20,414 video frames), an average of 51 images per subject. Compared to the LFW and CASIA datasets, the IJB-A dataset is more challenging due to the presence of: i) large pose variations, making it difficult to detect all the faces using a commodity face detector, ii) a mix of images and videos, and iii) wider geographical variation of subjects. The face locations are provided with the IJB-A dataset (and are used in our experiments in this thesis when needed).

IJB-B [30]: The IARPA Janus Benchmark-B (IJB-B) dataset is a superset of the IJB-A dataset, consisting of 1,845 subjects with a total of 76,824 images (21,798 still images and 55,026 video frames from 7,011 videos), an average of 41 images per subject. Images in this dataset are labeled with ground truth bounding boxes and other covariate meta-data such as occlusions, facial hair, and skin tone. A key motivation for the IJB-B dataset is to make the face dataset less constrained compared to the IJB-A dataset and to have a more uniform geographical distribution of subjects across the globe.

IJB-C [9]: The IARPA Janus Benchmark-C (IJB-C) dataset consists of 3,531 subjects with a total of 31,334 (21,294 face and 10,040 non-face) still images and 11,779 videos (117,542 frames), an average of 39 images per subject. This dataset emphasizes faces with full pose variations, occlusions, and diversity of subject occupation and geographical origin. Images in this dataset are labeled with ground truth bounding boxes and other covariates such as occlusions, facial hair, and skin tone.

Figure 1.1: Example images from different datasets: (a) MS-Celeb-1M [23], (b) CASIA WebFace [11], (c) LFW [27], (d) IJB-A [29], (e) IJB-B [30], (f) IJB-C [9]. MS-Celeb-1M, CASIA, and LFW contain only still images. IJB-A, IJB-B, and IJB-C include both still images and video frames. Three subjects are selected from each dataset, and each row contains images belonging to one subject.

1.1.2 Evaluation

There are many applications where FR techniques are successfully used to perform a specific task. Among those tasks, two primary tasks are considered to evaluate an FR system:

• Verification (authentication): one-to-one match.
• Identification (recognition): one-to-many match.

Verification. The task of verification generally aims at identity authentication with user interaction, to verify whether a given face matches the identity that is claimed. To evaluate a verification system, face images are first divided into two groups: 1) a genuine group, where people gain access using their own identity; 2) an impostor group, where people gain access using false identities. A face image is compared with other face images from the genuine group and the impostor group, respectively, which corresponds to genuine pairs (a pair of face images from the same identity) and impostor pairs (a pair of face images from different identities). A Receiver Operating Characteristic (ROC) curve is utilized to evaluate the FR performance on verification tasks, where the percentage of genuine access is reported as the True Acceptance Rate (TAR) and the percentage of impostors falsely gaining access is reported as the False Acceptance Rate (FAR) for a given match threshold $\gamma$. Let $x^g_1$ and $x^g_2$ denote a genuine pair of face images, and $x^i_1$ and $x^i_2$ denote an impostor pair of face images. Then TAR and FAR can be formulated as [31]:

$$\mathrm{TAR}(\gamma) = \frac{|\{E \mid s(x^g_1, x^g_2) \geq \gamma\}|}{|E|}, \quad \mathrm{FAR}(\gamma) = \frac{|\{M \mid s(x^i_1, x^i_2) \geq \gamma\}|}{|M|}, \qquad (1.1)$$

where $E$ is the set of genuine pairs, $M$ is the set of impostor pairs, and $s(\cdot, \cdot)$ is a given similarity function.

Identification.
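As a concrete illustration of Eq. (1.1), the sketch below computes TAR and FAR from lists of genuine and impostor similarity scores. The function names and the simple rule for picking a threshold at a target FAR are ours, added for illustration; they are not part of the thesis.

```python
import numpy as np

def tar_far(genuine_scores, impostor_scores, threshold):
    """Compute TAR and FAR at one similarity threshold, per Eq. (1.1).

    genuine_scores:  similarities s(x1, x2) for same-identity pairs (set E)
    impostor_scores: similarities for different-identity pairs (set M)
    """
    genuine_scores = np.asarray(genuine_scores)
    impostor_scores = np.asarray(impostor_scores)
    tar = np.mean(genuine_scores >= threshold)   # fraction of genuine pairs accepted
    far = np.mean(impostor_scores >= threshold)  # fraction of impostors accepted
    return tar, far

def threshold_at_far(impostor_scores, target_far):
    """Threshold at the k-th highest impostor score so that FAR <= target_far
    (ties among impostor scores may admit slightly more)."""
    scores = np.sort(np.asarray(impostor_scores))[::-1]
    k = int(np.floor(target_far * len(scores)))
    return scores[k - 1] if k > 0 else scores[0] + 1e-12
```

Sweeping the threshold and plotting TAR against FAR traces out the ROC curve described above.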
The task of identification is mostly aimed at identity search without user interaction, for example, in surveillance systems. Similar to the evaluation of verification systems, the identification test is conducted by dividing face images into two groups: 1) probe images whose identities are unknown, denoted as $P$; 2) gallery images that belong to people with known identities, denoted as $G$. In general, each subject has one face image in the gallery. Based on the relationship between the probe and gallery identities, the evaluation is split into two different settings: 1) closed-set identification, where all probe identities are assumed to be among the gallery identities; 2) open-set identification, where probe identities are not necessarily in the gallery. For closed-set identification, a Cumulative Match Characteristic (CMC) curve is used to report the correct identification rate for each cumulative rank (the number of candidates returned). By ordering the similarity scores between a probe image $x_p \in P$ and images in the gallery $x_g \in G$, the rank of $x_p$ is computed as the number of identities in the gallery whose similarity scores are higher than or equal to that of the correct identity $x_{g^*}$:

$$\mathrm{rank}(x_p) = |\{G \mid s(x_g, x_p) \geq s(x_{g^*}, x_p),\ x_g \in G\}| \qquad (1.2)$$

For a given rank $r$, the correct identification rate in a CMC curve is $\frac{|\{P \mid \mathrm{rank}(x_p) \leq r\}|}{|P|}$. In the case of open-set identification, images in the probe set $P$ can be classified into known subjects $S$ (the subjects that appear in the gallery) and unknown subjects $U$ (the subjects that are not in the gallery), i.e., $P = S \cup U$. A Detection and Identification Rate (DIR) curve is plotted to show the correct identification rates with respect to the false acceptance rates (FAR). Here, the definition of FAR is different from that in an ROC curve. For a given threshold $\theta$, a false acceptance occurs when the similarity of an unknown probe $x_p \in U$ to one of the gallery subjects is higher than $\theta$. The FAR computes the average probability as [32]:

$$\mathrm{FAR}(\theta) = \frac{|\{P \mid \max_{G} s(x_g, x_p) \geq \theta,\ x_p \in U\}|}{|U|}. \qquad (1.3)$$

The DIR for a given rank $r$ is calculated on the known probe set $S$ as [32]:

$$\mathrm{DIR}(\theta, r) = \frac{|\{P \mid \mathrm{rank}(x_p) \leq r \wedge s(x_{g^*}, x_p) \geq \theta,\ x_p \in S\}|}{|S|}. \qquad (1.4)$$

1.1.3 Face Recognition Pipeline

A general FR system consists of four steps, i.e., face detection, alignment and normalization, feature extraction, and feature matching. Fig. 1.2 shows a typical pipeline of feature-based FR systems. First, the face area is localized in an input image, and the face region is cropped from the original image. Next, the cropped face image is aligned (typical alignments include translation, rotation, and scaling) based on the detected facial landmarks (key points on the face, including the eyebrows, eyes, nose, mouth, and jaw silhouette). To reduce the effects of illumination variations, the aligned face images need to be normalized before being used for feature extraction. Then, in the representation stage, we extract a compact set of discriminating geometrical and/or photometrical features of the face, so that each face image is represented and stored as a $d$-dimensional feature vector. Finally, using a distance metric, the similarity between two feature vectors is measured, also known as the similarity score, to verify whether two face images belong to the same person (face verification), or to identify the identity of the face image by assigning it to the label of the nearest gallery image (rank-1 face identification).

Figure 1.2: A typical pipeline of feature-based FR frameworks comprises face detection, alignment, normalization, feature extraction, and feature matching. While each of these components affects the performance of FR systems, in this thesis, we focus on the representation function of feature extraction that maps a high-dimensional normalized face image to a d-dimensional vector representation.

Since our central focus is on face representations, we refer to the steps before feature extraction as data pre-processing. In this thesis, we mainly employ two facial landmark detection algorithms, MTCNN [33] and RetinaFace [34], to detect and align all faces in our training and testing datasets. In particular, each face is cropped from the detected face region and resized to the same size using a similarity transformation based on the detected five facial landmarks, i.e., the left eye center, right eye center, nose tip, left mouth corner, and right mouth corner.
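The closed-set quantities of Eq. (1.2) and the CMC curve can be sketched in a few lines; this is an illustrative implementation with our own function names, not code from the thesis.

```python
import numpy as np

def closed_set_rank(similarities, true_index):
    """Rank of the correct gallery identity for one probe, per Eq. (1.2).

    similarities: similarity score of the probe against every gallery template
    true_index:   position of the probe's true identity in the gallery
    Rank 1 means the correct identity scored at least as high as all others.
    """
    s_true = similarities[true_index]
    return int(np.sum(np.asarray(similarities) >= s_true))

def cmc(all_similarities, true_indices, max_rank):
    """Cumulative Match Characteristic: fraction of probes with rank <= r."""
    ranks = np.array([closed_set_rank(s, t)
                      for s, t in zip(all_similarities, true_indices)])
    return [float(np.mean(ranks <= r)) for r in range(1, max_rank + 1)]
```

The open-set DIR of Eq. (1.4) adds one further condition to the rank test: the correct-match score must also exceed the threshold $\theta$.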
1.2 Face Representation Extractors

The subject of FR is as old as the field of computer vision [35]. Unsurprisingly, face recognition has received tremendous attention in the computer vision and biometrics communities over the past three decades. Here, we present a few notable approaches to learning face representations (hand-crafted and learning-based methods).

Hand-crafted Representation. The first and most well-known global feature extraction method is Eigenfaces [36], in which principal component analysis (PCA) is employed on normalized vectors of training images to find the principal eigenvectors, corresponding to the largest, say k, eigenvalues. The obtained eigenvectors are then used as a seed set to represent other face images (not in the training set) via a linear projection operation. To utilize the information when each subject has more than one image in the training set and the images are labeled by subject ID, Fisherfaces [37] uses Fisher's Linear Discriminant Analysis (LDA) to minimize intra-class variations (among images of the same person) while maximizing inter-class differences (among images belonging to different people). Besides global features, a variety of face representation methods were proposed to encode local facial components such as the eyes, nose, and mouth. For example, the complex coefficients of Gabor filters [38] (also known as Gabor wavelets) are used to encode both facial shape and local appearance features. In the work of [39], face images are represented by utilizing the bunch graph to collect information from Gabor wavelet convolution values at each facial landmark location. In contrast to Gabor filters, intensity-based local elementary descriptors have also been used to represent face images due to their efficient computation and insensitivity to partial occlusion and pose changes; these include Local Binary Patterns (LBP) [40], Local Phase Quantization (LPQ) [41], Histogram of Oriented Gradients (HOG) [42], and Scale Invariant Feature Transform (SIFT) [43].

Learning-based Representation.
Given the success of deep neural networks (DNNs) in the ImageNet competition [44], DNN-based representation learning has contributed to massive strides in FR capabilities [9, 45]. The defining characteristic of such methods is the use of a convolutional neural network (CNN) based feature extractor, a learnable embedding function comprised of several sequential linear and non-linear operators [44]. Among the first attempts at learning face representations using deep learning, Taigman et al. [17] presented DeepFace, which uses a deep CNN trained to classify face identities, demonstrating remarkable performance on the LFW dataset. As an improved version of DeepID [46], DeepID2 [18] employs both verification and identification tasks as supervision signals to learn robust, discriminating representations. Subsequent approaches have explored different loss functions to improve the discriminability of the embedding vector. Researchers from Google [12] use a massive dataset of about 200 million face images of 8 million identities to train a CNN directly for face verification (called FaceNet). They optimize a triplet loss function, which is based on triplets of images comprising a pair of similar and a pair of dissimilar faces. The loss function is formulated as:

$$\mathcal{L}_{triplet} = \sum_{i=1}^{M} \left[ \alpha + d(\mathbf{r}^a_i, \mathbf{r}^+_i) - d(\mathbf{r}^a_i, \mathbf{r}^-_i) \right]_+, \qquad (1.5)$$

where $M$ is the number of triplets; $\mathbf{r}^a$, $\mathbf{r}^+$, and $\mathbf{r}^-$ are the representations of an anchor face image, a positive (same identity as the anchor) face image, and a negative (different identity from the anchor) face image, respectively; $[x]_+ = \max\{0, x\}$; $\alpha$ is a margin parameter; and $d(\cdot)$ is the squared Euclidean distance.
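Eq. (1.5) can be evaluated directly on batches of embeddings; the sketch below is a minimal NumPy version for illustration (the thesis and FaceNet train it inside a deep network, with triplet mining that is omitted here).

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss of Eq. (1.5) with squared Euclidean distance.

    anchor, positive, negative: (M, d) arrays of embeddings for M triplets.
    The hinge [x]_+ keeps only triplets that violate the margin alpha.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # d(r^a, r^+)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # d(r^a, r^-)
    return float(np.sum(np.maximum(0.0, alpha + d_pos - d_neg)))
```

Note that a triplet contributes zero loss once the negative is farther than the positive by at least the margin, which is why hard-triplet mining matters in practice.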
Inspired by DNN-based image classification, a major part of the effort has been made to develop new loss functions on top of the softmax layer, based on the cross-entropy loss:

$$\mathcal{L}_{cross\text{-}entropy}(\mathbf{r}, y; \mathbf{W}, \mathbf{b}) = -\sum_{k=1}^{C} \mathbb{I}(k = y) \log \frac{e^{\mathbf{W}_y^T \mathbf{r} + b_y}}{\sum_{j=1}^{C} e^{\mathbf{W}_j^T \mathbf{r} + b_j}}, \qquad (1.6)$$

where $\mathbf{r}$ and $y$ are the representation and the identity label of an input face image, $\mathbf{W}$ and $\mathbf{b}$ are the parameters of the softmax layer, and $C$ is the number of classes (unique identities) in the training set. Wen et al. [47] propose a center loss that is combined with Eq. (1.6) to reduce the intra-class variations. In the L2-constrained softmax [48], a feature vector $\mathbf{r}$ is first normalized by its $\ell_2$ norm to lie on a hyper-sphere and then scaled by a constant factor. SphereFace [13] introduces the angular softmax (A-softmax), in which the original softmax is modified to directly optimize the angles between $\mathbf{W}_y$ and $\mathbf{r}$, resulting in angularly distributed features. Other softmax modifications enforce extra intra-class concentration and inter-class variance on face features by adding a margin penalty to the decision boundary [6, 19, 49]. In CosFace [19], both the representation $\mathbf{r}$ and the weight vectors $\mathbf{W}$ are $\ell_2$-normalized to compute their cosine similarity, based on which a cosine margin term is introduced to further broaden the decision boundary in an angular space. ArcFace [6] adds an additive angular margin to the angle between the representation $\mathbf{r}$ and its target weight vector $\mathbf{W}_y$ via the arc-cosine function. All these face representation methods share the same objective: increasing inter-class distances and reducing intra-class variations.

1.3 Manifold of Face Representation

A face representation is obtained by an embedding function that transforms the raw pixel representation of the image to a point in a high-dimensional feature space. Learning or estimating such a mapping is motivated by two goals: (a) compactness of the representation, and (b) effectiveness of the mapping for FR. Given the methods introduced in Sec. 1.2, the latter topic has received substantial attention. Yet, there has been little focus on the dimensionality of the representation itself. Another topic related to representation compactness is the capacity of the face representation.
Given a face representation, how many identities can it resolve? In this work, we develop algorithms to estimate the intrinsic dimensionality and capacity of face representations, and design a new dimensionality reduction method to obtain compact representations.

1.3.1 Intrinsic Dimensionality

The dimensionality of face representations extracted from deep networks has ranged from hundreds to thousands of dimensions. For instance, current SOTA face representations have 128, 512, 1,024, and 4,096 dimensions for FaceNet [12], ResNet [50], SphereFace [13], and VGG [51], respectively. The choice of dimensionality is often determined by practical considerations, such as ease of learning the embedding function [52], constraints on system memory, etc., instead of the effective dimensionality needed for the image representation. This naturally raises the following fundamental but related questions: How compact can the representation be without any loss in recognition performance? In other words, what is the intrinsic dimensionality of the representation? Subsequently, how can one obtain such a compact representation?

The intrinsic dimensionality (IND) of a representation refers to the minimum number of parameters (or degrees of freedom) necessary to capture the information present in the representation [53]. Equivalently, it refers to the dimensionality of the $m$-dimensional manifold $\mathcal{M}$ embedded within the $d$-dimensional ambient (representation) space $\mathcal{P}$, where $m \leq d$. This notion of intrinsic dimensionality is notably different from common linear dimensionality estimates obtained through, e.g., PCA. This linear dimension corresponds to the best linear subspace necessary to retain a desired fraction of the variations in the data. In principle, the linear dimensionality can be as large as the ambient dimension if the variation factors are highly entangled with each other.
The ability to estimate the intrinsic dimensionality of a given face representation is useful in a number of ways. At a fundamental level, the IND determines the true capacity and complexity of variations in the data captured by the representation, through the embedding function. In fact, the IND can be used to gauge the information content in the representation, due to its linear relation with Shannon entropy [54, 55]. Also, it provides an estimate of the amount of redundancy built into the representation, which relates to its generalization capability. On a practical level, knowledge of the IND is crucial for devising optimal unsupervised strategies to obtain face features that are minimally redundant, while retaining the full ability to recognize faces of different identities. Recognition in the intrinsic space can provide significant savings, both in memory requirements for the templates as well as in processing time, across downstream tasks like large-scale face matching in the encrypted domain [56]. Lastly, the gap between the ambient and intrinsic dimensionalities of a representation can serve as a useful indicator to drive the development of algorithms that can directly learn highly compact embeddings.

Estimating the IND of a given face representation is, however, a challenging task. Such estimates crucially depend on the density variations in the representation, which are themselves difficult to estimate, as images often lie on a topologically complex curved manifold [57]. More importantly, given an estimate of the IND, how do we verify that it truly represents the dimensionality of the complex high-dimensional representation space? An indirect validation of the IND is possible through a mapping that transforms the ambient representation space to the intrinsic representation space while preserving its discriminative ability. However, there is no certainty that such a mapping can be found efficiently. In practice, finding such mappings can be considerably harder than estimating the IND itself.
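To make the notion of IND estimation concrete, the sketch below implements the classical Levina-Bickel maximum-likelihood k-NN estimator. This is a standard estimator shown only as a stand-in: the thesis's own estimator is topological and based on geodesic distances on the manifold (Ch. 2), which this simple local estimator does not capture.

```python
import numpy as np

def mle_intrinsic_dim(X, k=10):
    """Levina-Bickel maximum-likelihood intrinsic dimension estimate.

    X: (n, d) points sampled from the representation space.
    For each point, the estimate is (k-1) / sum_j log(T_k / T_j), where
    T_j is the distance to the j-th nearest neighbor; the final estimate
    averages over all points.
    """
    # pairwise Euclidean distances (O(n^2) memory; fine for small n)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # exclude each point from its own neighbors
    knn = np.sort(D, axis=1)[:, :k]      # distances to the k nearest neighbors
    logs = np.log(knn[:, -1:] / knn[:, :-1])
    m_hat = (k - 1) / logs.sum(axis=1)
    return float(m_hat.mean())
```

On points sampled from a curve embedded in 3-D, this estimator returns a value near 1, far below the ambient dimension of 3, mirroring the gap between ambient and intrinsic dimensionality discussed above.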
We overcome both of these challenges by (1) adopting a topological dimensionality estimation technique based on the geodesic distance between points on the manifold, and (2) relying on the ability of DNNs to approximate the complex mapping function from the ambient space to the intrinsic space (as we will see in Ch. 2). The latter enables validation of the IND estimates through face matching experiments on the corresponding low-dimensional intrinsic representation of the feature vectors.

1.3.2 Capacity

Consider the following scenario: we would like to deploy an FR system with representation $M$ in a target application that requires a maximum FAR of $q\%$. As subjects are continuously added to the gallery, an intuitively known and empirically observed phenomenon occurs: the FR accuracy starts decreasing. This is primarily due to the fact that, with more subjects and diverse viewpoints, the representations of the classes will no longer be disjoint. In other words, the FR system based on representation $M$ can no longer completely resolve all of the users within the $q\%$ FAR. We define the maximal number of users at which the face representation reaches this limit as the capacity¹ of the representation. Our main contribution in this work is to determine the capacity in an objective manner without the need for empirical evaluation.

The ability to determine this capacity affords the following benefits: (i) statistical estimates of the upper bound on the number of identities the face representation can resolve, which would allow for informed deployment of FR systems based on the expected scale of operation; (ii) an estimate of the maximal gallery size for the face representation without having to exhaustively evaluate the face representation at each scale. Consequently, capacity offers an alternative dataset²-agnostic metric for comparing different face representations.

An attractive solution for estimating the capacity of face representations is to leverage the notion of packing bounds³: the maximal number of shapes that can be fit, without overlapping, within the support of the representation space. A loose bound on this packing problem can be obtained as a ratio of the volume of the support space and the volume of the shape. In the context of face representations, the representation support can be modeled as a low-dimensional population manifold $\mathcal{M} \subset \mathbb{R}^m$ embedded within a high-dimensional representation space $\mathcal{P} \subset \mathbb{R}^p$, while each class⁴ can be modeled as its own manifold $\mathcal{M}_c \subseteq \mathcal{M}$. Under this setting, a bound on the capacity of the representation can be obtained as a ratio of the volumes of the population and class-specific manifolds. However, adopting this approach to obtain empirical estimates of the capacity presents the following challenges:

1. Estimating the support of the population manifold $\mathcal{M}$ and the class-specific manifolds $\mathcal{M}_c$, especially for a high-dimensional embedding such as a face representation (typically several hundred dimensions), is an open problem.

2. Estimating the density of the manifolds while accounting for the different sources of noise is a challenging task. In the context of face representations, all the components of a typical face representation pipeline (see Fig. 1.2) are potential sources of noise.

3. Obtaining reliable estimates of the volume of arbitrarily shaped high-dimensional manifolds (for the capacity bound) is another open problem.

We propose a framework that addresses the aforementioned challenges to obtain reliable estimates of the capacity of any face representation. Our solution relies on: (1) modeling the face representation as a low-dimensional Euclidean manifold embedded within a high-dimensional space, (2) projecting and unfolding the manifold to a low-dimensional space, (3) approximating the population manifold by a multivariate Gaussian distribution (equivalently, a hyper-ellipsoidal support) in the unfolded low-dimensional space, (4) approximating the class-specific manifolds by multivariate Gaussian distributions and estimating their support as a function of the specified FAR, and (5) estimating the capacity as the ratio of the volumes of the population and class-specific hyper-ellipsoids. This work is introduced below in Ch. 3.

¹ This is different from the notion of capacity of a space of functions as measured by its Vapnik-Chervonenkis dimension of linear classifiers.
² A class of datasets as opposed to a specific dataset.
³ A generalization of the well-studied sphere-packing problem.
⁴ In the case of FR, each class is an identity (subject), and the number of classes corresponds to the number of identities.
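The volume-ratio idea in step (5) can be illustrated with Gaussian approximations. The toy sketch below computes the ratio of the volumes of two hyper-ellipsoids (unit-Mahalanobis supports of Gaussians); it ignores the FAR-dependent scaling of the class-specific support and the unfolding steps that the full framework in Ch. 3 requires, and all names are ours.

```python
import numpy as np
from math import lgamma, pi, log

def log_ellipsoid_volume(cov):
    """Log-volume of the unit-Mahalanobis ellipsoid of a Gaussian:
    log V = (d/2) log(pi) - log Gamma(d/2 + 1) + (1/2) logdet(cov)."""
    d = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    return 0.5 * d * log(pi) - lgamma(d / 2 + 1) + 0.5 * logdet

def capacity_estimate(pop_cov, class_cov):
    """Toy capacity bound: ratio of population to class ellipsoid volumes.
    Working in log space avoids overflow in high dimensions."""
    return float(np.exp(log_ellipsoid_volume(pop_cov) - log_ellipsoid_volume(class_cov)))
```

For example, in 2-D, a population covariance four times the class covariance gives a volume ratio of 4: four such classes could be packed (loosely) into the population support.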
Figure 1.3: The error tradeoff characteristics in the form of false non-match rates (FNMR = 1 - TAR) vs. false match rates (FMR = FAR) for one commercial FR algorithm verifying mugshot images, reported by the NIST (National Institute of Standards and Technology) Face Recognition Vendor Test [7]. The FMR estimates are computed on impostor pairs of face images with the same gender and same race. Each symbol (circle, triangle, square) corresponds to a fixed threshold; their vertical and horizontal displacements reveal, respectively, differences in FNMR and FMR between demographic groups [7].

1.4 Bias in Face Recognition

FR systems are known to exhibit discriminatory behaviors against certain demographic groups [7, 14, 21]. Fig. 1.3 shows one commercial FR algorithm that has lower performance on certain demographic groups than others in the 2019 NIST Face Recognition Vendor Test (FRVT) [7]. In fact, all 106 FR algorithms that participated in the NIST FRVT exhibit different levels of biased performance on the gender, race, and age groups of a mugshot dataset (see Fig. 1.4). For all the algorithms listed in Fig. 1.4, we see that the algorithms achieving better performance present less sex/race bias. For example, the difference in FMR between the highest and lowest sex/race groups is less than 0.1% for the best model, shown in the last row of Fig. 1.4. Even so, demographic bias still exists in current FR algorithms. At a time when FR systems are being deployed in the real world for societal benefit, this type of bias⁵ is not acceptable. Note that here, we define FR bias as the uneven recognition performance with respect to demographic groups. It is desirable to design unbiased FR algorithms that maintain fairness in FR performance when deploying this technology for law enforcement and other applications.

Figure 1.4: False positive differentials in verification algorithms provided to NIST. The dots give the false match rates for same-sex and same-race impostor comparisons. The threshold is set for each algorithm to give FMR = 0.0001 on white males (the purple dots in the right-hand panel). The algorithms are sorted in order of worst-case FMR [7].

A natural question arises: Why does the bias problem exist in FR systems?
First, SOTA FR algorithms [6, 13, 19] rely on CNNs trained on large-scale face datasets. The public training datasets for FR, e.g., CASIA WebFace [11], VGGFace2 [59], and MS-Celeb-1M [23], are collected by scraping face images off the web, with inevitable demographic bias [5]. Previous studies have shown that models trained with imbalanced datasets (an unequal number of training samples from different demographic groups) lead to biased discrimination [60, 61]. Similarly, bias in face datasets is transmitted to the FR models through network learning. For example, to minimize the overall loss, a network tends to learn a better representation for faces in the majority group, whose faces dominate the training set, resulting in unequal discriminabilities. The imbalanced distribution of demographics in face data is, nevertheless, not the only trigger of FR bias. Prior work has shown that even when using a demographically balanced dataset [5] or training separate classifiers for each group [14], the performance on some groups is still inferior to that on the others. This reveals bias in the embedding functions. Since the goal of face representation is to map the input face image to a target feature vector with high discriminative power, bias in the mapping function will result in feature vectors with lower discriminability for certain demographic groups. Furthermore, by studying non-trainable FR algorithms, [14] introduced a new notion of inherent bias, i.e., certain groups are inherently more susceptible to errors in the face matching process.

To tackle the dataset-induced bias, data re-sampling methods have been exploited to balance the data distribution by under-sampling the classes with more samples [62] or over-sampling the classes with fewer samples [63, 64]. Despite its simplicity, valuable information may be removed by under-sampling, and over-sampling may introduce noisy samples. Naively training on a balanced dataset can still lead to bias [5]. Another common option for imbalanced data training is cost-sensitive learning, which assigns weights to different classes based on (i) their frequency or (ii) the effective number of samples [65, 66]. Recent imbalanced learning methods focus on novel objective functions for class-skewed datasets. For instance, Dong et al. [67] propose a Class Rectification Loss to incrementally optimize on hard samples of the classes with under-represented attributes. Alternatively, researchers strengthen the decision boundary to impede perturbation from other classes by enforcing margins between hard clusters via adaptive clustering [68], or between rare classes via Bayesian uncertainty estimates [69]. To adapt the aforementioned methods to racial bias mitigation, Wang et al. [5] modify the large-margin based loss functions via reinforcement learning. However, [5] requires two auxiliary networks, an offline sampling network and a deep Q-learning network, to generate an adaptive margin policy for training the FR network, which hinders learning efficiency.

We propose two different approaches to fair representation learning for FR systems. The first approach, called DebFace, utilizes adversarial learning to disentangle a face representation into four components, i.e., gender, age, race/ethnicity, and identity, with each of the four components independent of the others. We de-bias a face representation under the assumption that if no discriminating demographic information is captured by the face representation, it will be unbiased with respect to demographic attributes. More details are given in Ch. 4. The second approach, called GAC (Group Adaptive Classifier), addresses the bias issue in a different way. GAC optimizes face representation learning on every demographic group in a single network via adaptive convolution kernels and channel-wise attention maps, which increases the network's capacity to represent multiple face patterns from different demographic groups.

⁵ This is different from the notion of inductive bias in machine learning, defined as "any basis for choosing one generalization [hypothesis] over another, other than strict consistency with the observed training instances" [58].
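As a concrete instance of cost-sensitive option (ii) above, the sketch below computes per-class weights from the "effective number of samples" formula of the class-balanced loss literature; this is a standard technique shown for illustration, not one of the de-biasing methods proposed in this thesis.

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Per-class loss weights from the effective number of samples,
    E_n = (1 - beta^n) / (1 - beta).

    counts: number of training samples per class.
    Classes with fewer samples receive larger weights, since additional
    samples from a large class are assumed to add diminishing information.
    """
    counts = np.asarray(counts, dtype=float)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(counts)  # normalize to mean 1
```

The resulting weights multiply each class's term in the training loss, so under-represented classes contribute more per sample.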
1.5 Face Representation via Graph Neural Network

As introduced in Sec. 1.2, face representation learning is the process of establishing an embedding function by which a face image is transformed into a high-dimensional feature vector. Among the current SOTA DNN-based approaches, different network architectures and loss functions have been explored to improve FR performance. In early studies [17, 46], deep features are learned via a face classification objective. However, later studies [18, 48] found that a simple classification loss is insufficient to capture discriminative face features, and thus attempted to design appropriate loss functions to enhance the discriminative power of the representations. One direction of research is to directly learn an embedding via metric learning, e.g., contrastive loss [18] and triplet loss [12]. The other line of methods adapts the traditional softmax cross-entropy loss to decrease the intra-class variations [47], or to lengthen the inter-class distances by adding an extra margin between classes in either cosine [19] or angular space [6, 13].

While FR performance has improved, both types of methods have limitations. For the softmax approach and its variants, the dimensionality of the output softmax vector increases linearly with the number of identities in the training set, which may lead to a computational bottleneck. Distance metric learning approaches can avoid this issue by acquiring a feature space where distance corresponds to class similarity. However, a carefully designed scheme is required to select the pair or triplet samples from the tremendous number of combinations in large-scale datasets. Furthermore, both classification-based and distance-based objectives operate merely on each individual sample, or at most a triplet of samples, in the physical distance metric, but ignore the general distribution derived from correlations between within-class and cross-class samples.
To address the aforementioned shortcomings of the FR loss functions, we propose a representation learning method that utilizes graph classification via adversarial training. In contrast to using a metric as the constraint in the feature space, our idea is to create an ideal feature space, referred to as the oracle space, where the cluster of feature points in each class is clearly separated from the other classes. A deep neural network (DNN) is then trained to generate face features that follow the data distribution in the oracle space. In this case, the face representation model can be regarded as the generative model in a Generative Adversarial Network (GAN), which has emerged as a useful framework for learning arbitrary distributions from observed data samples. By means of adversarial learning, a GAN offers a pleasing option for generative tasks in which the generator is trained to derive the data distribution from observed samples. However, the relationships and inter-dependencies between feature points are not as simple as the data structure of fixed-size grid images. For this reason, we can no longer use the conventional discriminator in a GAN as the form of adversarial supervision.

Instead, we consider the data structure in the feature space as a directed graph, where each vertex corresponds to an image sample and the edges between vertices represent their dependencies. In the oracle space, feature points are connected if they belong to the same subject, while in the actual feature space, nodes are linked to their k nearest neighbors. The discrimination task here is to distinguish between graphs from the oracle space and the generated space. To this end, we employ a graph classifier, trained with a Graph Neural Network (GNN), as the discriminator that guides the representation model to output features that follow the oracle distribution. The proposed framework is thus capable of learning a generic feature space with enhanced discriminative power for face images, based on a pre-designed feature distribution defined on a graph structure. Furthermore, the construction of oracle graphs provides us with an opportunity to take control of the learning complexity, so as to flexibly adjust the bias-variance tradeoff from the perspective of the training objective.
1.6 Thesis Contributions

The research carried out in addressing the above problems has resulted in a number of contributions to face recognition. The key contributions of this dissertation are briefly described below:

- It is the first attempt to estimate the intrinsic dimensionality of DNN-based image representations. Numerical experiments yield ID estimates of 12 and 16 for the FaceNet [12] and SphereFace [13] face representations, respectively, and 19 for the ResNet-34 [70] image representation. The estimates are significantly lower than their respective ambient dimensionalities: 128-dim for FaceNet and 512-dim for the others. We also propose an unsupervised DNN-based dimensionality reduction method under the framework of multidimensional scaling, called DeepMDS. The DeepMDS mapping is significantly better than other dimensionality reduction approaches in terms of its discriminative capability.

- It is the first practical attempt at estimating the capacity of DNN-based face representations. We consider two such representations, namely FaceNet [12] and SphereFace [13], consisting of 128-dimensional and 512-dimensional feature vectors, respectively. We propose a noise model for facial embeddings that explicitly accounts for two sources of uncertainty: uncertainty due to data and uncertainty in the parameters of the representation function. We can estimate capacity as a function of the desired operating point, in terms of the maximum desired probability of false acceptance error, by establishing a relationship between the support of the class-specific manifolds and the discriminant function of a nearest neighbor classifier.
- We provide a thorough analysis of deep-learning-based face recognition performance on three different demographics: (i) gender, (ii) age, and (iii) race. We propose two face recognition frameworks that mitigate demographic bias: (i) DebFace and (ii) GAC. DebFace generates disentangled representations for both identity and demographic attribute recognition while jointly removing discriminative information from the other counterparts. The results indicate that both the identity representation and the demographic attribute estimation via DebFace show lower bias on different demographic cohorts. GAC reduces demographic bias and increases the robustness of representations for faces in every demographic group by adopting adaptive convolutions and attention techniques. GAC is able to automatically determine the layers at which to employ dynamic kernels and attention maps, leading to SOTA performance on a demographic-balanced dataset and three benchmark datasets.

- We propose a new framework for face representation learning, which models the data distribution in an oracle space via adversarial learning, to transform the raw pixels of an image into a highly discriminative feature vector for face recognition. Graphs are constructed in the feature space to describe the data distribution of feature points with respect to their identities and similarities. We also provide a thorough analysis of the impact of predefined graphs on the discriminability of the learned face representations. Our graph-based approach surpasses the baseline model and achieves state-of-the-art performance on six benchmark datasets (LFW [27], CPLFW [71], CFP-FP [72], IJB-A [29], IJB-B [30], IJB-C [9]).

1.7 Thesis Structure

Ch. 2 of this thesis focuses on the compactness of a face representation, and proposes a new algorithm to reduce the dimensionality of the representation with little degradation in performance.
With the dimensionality reduction tool developed in Ch. 2, Ch. 3 estimates the capacity of a face representation on a more compact feature space, attempting to overcome the curse of dimensionality that may lead to over-estimated capacity values. In Ch. 4, the issue of demographic bias in FR systems is addressed. Besides an empirical analysis of the unequal verification performance on different demographic groups, we introduce new strategies to mitigate such bias for fairer face representation learning. A new framework for face representation learning is presented in Ch. 5, which utilizes adversarial learning to acquire the oracle feature distribution in the form of a kNN graph. The last chapter discusses the conclusions of this dissertation and presents directions for future work. The experimental results of the work in this thesis were previously presented in [8, …].

Chapter 2
The Intrinsic Dimensionality of Face Representation

This chapter addresses the following questions pertaining to the intrinsic dimensionality of any given face representation: (i) estimate its intrinsic dimensionality, (ii) develop a deep neural network based non-linear mapping, dubbed DeepMDS, that transforms the ambient representation to the minimal intrinsic space, and (iii) validate the veracity of the mapping through face matching in the intrinsic space. Experiments on benchmark image datasets (LFW [27] and IJB-C [9]) reveal that the intrinsic dimensionality of deep neural network representations is significantly lower than the dimensionality of the ambient features. For instance, SphereFace's [13] 512-dim face representation has an intrinsic dimensionality of 16 on the IJB-C dataset. Further, the DeepMDS mapping is able to obtain a representation of significantly lower dimensionality while maintaining discriminative ability to a large extent: 59.75% TAR @ 0.1% FAR in 16 dims vs 71.26% TAR in 512 dims on IJB-C.

The key contributions and findings of this chapter are:
- The first attempt to estimate the intrinsic dimensionality of DNN-based face representations.
- An unsupervised DNN-based dimensionality reduction method under the framework of multidimensional scaling, called DeepMDS.
- Numerical experiments yield IND estimates of 12 and 16 for the FaceNet [12] and SphereFace [13] face representations, respectively. The estimates are significantly lower than their respective ambient dimensionalities: 128-dim for FaceNet and 512-dim for SphereFace.
- The DeepMDS mapping is significantly better than other dimensionality reduction approaches (e.g., PCA and Isomap [76]) in terms of its discriminative capability.

2.1 Intrinsic Dimensionality

Existing approaches for estimating intrinsic dimensionality can be broadly classified into two groups: projection methods and geometric methods. The projection methods [77-79] determine the dimensionality by applying principal component analysis to local subregions of the data and estimating the number of dominant eigenvalues. These approaches have classically been used in the context of modeling facial appearance under different illumination conditions [80] and object recognition with varying pose [81]. While they serve as an efficient heuristic, they do not provide reliable estimates of intrinsic dimension. Geometric methods [2, 82-86], on the other hand, model the intrinsic topological geometry of the data and are based on the assumption that the volume of an m-dimensional set scales with its size ε as ε^m, and hence the number of neighbors closer than ε behaves the same way.

Our approach in this chapter is based on the topological notion of correlation dimension [82, 83], the most popular type of fractal dimension. The correlation dimension implicitly uses the nearest-neighbor distance, typically based on the Euclidean distance. However, Granata et al. [1] observe that leveraging the manifold structure of the data, in the form of geodesic distances induced by a neighborhood graph of the data, provides more realistic estimates of the IND. Building upon this observation, we base our IND estimates on the geodesic distance between points. We believe that estimating the intrinsic dimensionality serves as the first step towards understanding the bound on the minimal dimensionality required for representing faces, and will aid in the development of novel algorithms that can achieve this limit.
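The scaling assumption behind the correlation dimension, C(r) ∝ r^m for small r, can be checked numerically. The sketch below is illustrative only: a 2-D patch embedded isometrically in R^10, using Euclidean rather than geodesic distances, which suffices for a flat manifold:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# a 2-D square embedded isometrically into a 10-dim ambient space
z = rng.uniform(size=(2000, 2))
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # orthonormal columns
x = z @ basis.T

r = pdist(x)  # all pairwise Euclidean distances
# correlation integral C(r) at small scales; slope of log C vs log r
radii = np.quantile(r, [0.001, 0.002, 0.005, 0.01, 0.02])
logC = np.log([np.mean(r <= t) for t in radii])
slope = np.polyfit(np.log(radii), logC, 1)[0]
print(round(slope, 2))  # close to the true intrinsic dimension of 2
```

The slope recovers the intrinsic dimension 2 despite the ambient dimension being 10, which is the same phenomenon the chapter exploits for face representations.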
2.2 Dimensionality Reduction

There is a tremendous body of work on the topic of estimating low-dimensional approximations of data manifolds lying in high-dimensional space. These include linear approaches such as Principal Component Analysis [87], Multidimensional Scaling (MDS) [88] and Laplacian Eigenmaps [89], and their corresponding non-linear spectral extensions: Locally Linear Embedding [90], Isomap [76] and Diffusion Maps [91]. Another class of dimensionality reduction algorithms leverages the ability of deep neural networks to learn complex non-linear mappings of data, including deep autoencoders [92], denoising autoencoders [93, 94], and learning invariant mappings either with the contrastive loss [95] or with the triplet loss [12]. While autoencoders can learn a compact representation of data, such a representation is not explicitly designed to retain discriminative ability. Both the contrastive loss and the triplet loss have a number of limitations: (1) they require similarity and dissimilarity labels and cannot be trained in an unsupervised setting, (2) they require an additional hyper-parameter, the maximum margin of separation, which is difficult to pre-determine, especially for an arbitrary representation, and (3) they do not maintain the manifold structure in the low-dimensional space. In this work, we too leverage DNNs to approximate the non-linear mapping from the ambient to the intrinsic space. However, we consider an unsupervised setting (i.e., no similarity or dissimilarity labels) and cast the learning problem within the framework of MDS, i.e., preserving the ambient graph-induced geodesic distance between points in the intrinsic space.

2.3 Our Approach

Our goal in this work is to compress a given face representation space. We achieve this in two stages¹ (¹Traditional single-stage dimensionality reduction methods use visual aids to arrive at the final IND and intrinsic space, e.g., plotting the projection error against the IND values and looking for a knee in the curve): (1) estimate the intrinsic dimensionality of the ambient face representation, and (2) learn the DeepMDS model to map the ambient representation space P ⊆ R^d to the intrinsic representation space M ⊆ R^m (m ≪ d). The IND estimates are based on the approach presented by [1], which relies on
two key ideas: (1) using graph-induced geodesic distances to estimate the correlation dimension of the face representation topology, and (2) the similarity of the distributions of geodesic distances across different topological structures with the same intrinsic dimensionality. The DeepMDS model is optimized to preserve the inter-point geodesic distances between the feature vectors in the ambient and intrinsic spaces, and is trained in a stage-wise manner that progressively reduces the dimensionality of the representation. Basing the projection method on DNNs, instead of spectral approaches like Isomap, addresses the scalability and out-of-sample-extension problems suffered by spectral methods. Specifically, DeepMDS is trained in a stochastic fashion, which allows it to scale. Furthermore, once trained, DeepMDS provides a mapping function in the form of a feed-forward network that maps an ambient feature vector to its corresponding intrinsic feature vector. Such a map can easily be applied to new test data.

2.3.1 Estimating Intrinsic Dimensionality

We define the notion of intrinsic dimension through the classical concept of the topological dimension of the support of a distribution. This is a generalization of the concept of the dimension of a linear space² to a non-linear manifold. Methods for estimating the topological dimension are all based on the assumption that the number of neighbors of a given point on an m-dimensional manifold embedded within a d-dimensional space scales with its size ε as ε^m. In other words, the density of points within an ε-ball (ε → 0) in the ambient space is independent of the ambient dimension d and varies only according to the intrinsic dimensionality m. Given a collection of points X = {x_1, …, x_n}, where x_i ∈ R^d, the cumulative distribution of the pairwise distances C(r) between the n points can be estimated as

    C(r) = \frac{2}{n(n-1)} \sum_{i<j} H(r - \|x_i - x_j\|) = \int_0^r p(r')\, dr'     (2.1)

² The linear dimension is the minimum number of independent vectors necessary to represent any given point in the space as a linear combination.
Figure 2.1 Intrinsic Dimension: Our approach is based on two observations. (a) Graph Induced Geodesic Distance: the graph-induced geodesic distance between images captures the topology of the image representation manifold more reliably; as an illustration, we show the graph edges for the surface of a unitary hypersphere and a face manifold of ID two, embedded within a 3-dim space. (b) Topological Similarity: the distribution of the geodesic distances (for distances r_max − 2σ ≤ r ≤ r_max, where r_max is the distance at the mode) has been empirically observed [1] to be similar across different topological structures with the same intrinsic dimensionality; the plot shows the distance distribution for a face representation, a unitary hypersphere, and a Gaussian distribution of ID two embedded within a 3-dim space. [8]

Here H(·) is the Heaviside function and p(r) is the probability distribution of the pairwise distances. In this work, we choose the correlation dimension [82], a particular type of topological dimension, to represent the intrinsic dimension of the face representation. It is defined as

    m = \lim_{r \to 0} \frac{\ln C(r)}{\ln r} \implies C(r) \propto r^m \ \text{as}\ r \to 0     (2.2)

Therefore, the intrinsic dimension crucially depends on the accuracy with which the probability distribution can be estimated at very small length-scales (distances), i.e., r → 0. Significant effort has been devoted to estimating the intrinsic dimension through line fitting in the ln C(r) vs ln r space around the region where r → 0, i.e.,

    m = \lim_{(r_2 - r_1) \to 0} \frac{\ln C(r_2) - \ln C(r_1)}{\ln r_2 - \ln r_1} = \lim_{r \to 0} \frac{d \ln C(r)}{d \ln r} = \lim_{r \to 0} \frac{p(r)\, r}{C(r)} = \lim_{r \to 0} m(r)     (2.3)

The main drawback of this approach is the need for reliable estimates of p(r) at very small length-scales, which is precisely where the estimates are most unreliable when data is limited, especially in very high-dimensional spaces. Granata et al. [1] present an elegant solution to this problem through three observations: (i) estimates of m(r) can be stable even as r → 0 if the distance between points is computed as the graph-induced shortest path between points instead of the Euclidean distance, as is commonly the case; (ii) the probability distribution p
(r) at intermediate length-scales around the mode of p(r), i.e., r_max − 2σ ≤ r ≤ r_max, can be conveniently used to obtain reliable estimates of the IND; and (iii) the distributions p(r) of different topological geometries are similar to each other as long as the intrinsic dimensionality is the same, or in other words, the distribution p(r) depends only on the intrinsic dimensionality and not on the geometric support of the manifolds.

Fig. 2.1 provides an illustration of these observations. Consider two different manifolds, faces and the surface of an (m+1)-dimensional unitary hypersphere (henceforth referred to as the m-hypersphere S^m), with intrinsic dimensionality m = 2 but embedded within a d-dim Euclidean space. Beyond the nearest neighbor, the distance r between any pair of points in the manifold is computed as the shortest path between the points as induced by the graph connecting all the points in the representation. Fig. 2.1b shows the distribution of log(p(r)/p(r_max)) vs log(r/r_max) in the range r_max − 2σ ≤ r ≤ r_max, where σ is the standard deviation of p(r) and r_max = argmax_r p(r) corresponds to the radius of the mode of p(r). Interestingly, different topological geometries, namely a face representation of IND two, a 2-hypersphere and a 2-dim Gaussian, all embedded within a d-dim Euclidean space, have almost identical distributions. More generally, the distribution of log(p(r)/p(r_max)) vs log(r/r_max) in the range r_max − 2σ ≤ r ≤ r_max is empirically observed to depend only on the intrinsic dimensionality, rather than on the geometric support of the manifold.

The intrinsic dimensionality of the representation manifold can thus be estimated by comparing the empirical distribution of the pairwise distances p̂_M(r) on the manifold to that of a known distribution, such as the m-hypersphere or the Gaussian distribution, in the range r_max − 2σ ≤ r ≤ r_max. We first show the derivation for estimating the intrinsic dimensionality m that minimizes the Root Mean Squared Error (RMSE) with respect to an m-hypersphere. The distribution of the geodesic distance p_{S^m}(r) of the m-hypersphere can be analytically expressed as p
_{S^m}(r) = c sin^{m−1}(r), where c is a constant and m is the IND. Given p̂_M(r), we minimize the RMSE between the distributions as

    \min_{c,\, m} \int_{r_{max} - 2\sigma}^{r_{max}} \left\| \log \hat{p}_M(r) - \log(c) - (m-1) \log\big(\sin[r]\big) \right\|^2

which upon simplification yields

    \min_{m} \int_{r_{max} - 2\sigma}^{r_{max}} \left[ \log \frac{\hat{p}_M(r)}{\hat{p}_M(r_{max})} - (m-1) \log \sin\!\left( \frac{\pi r}{2 r_{max}} \right) \right]^2

The above optimization problem can be solved via a least-squares fit after estimating the standard deviation σ of p(r). First, we estimate σ for the m-hypersphere by approximating the distribution p̂_M(r) by a univariate Gaussian around the mode of p_M(r). So, given samples S = {r_1, …, r_T} from the distribution p(r), the variance around the mode can be estimated as σ² = (1/T) Σ_{t=1}^{T} (r_t − r_max)², where r_max is the radius at the mode of p̂_M(r). Then, we estimate the distribution of log(p̂_M(r)/p̂_M(r_max)) vs log sin(πr/(2 r_max)) and solve the following least-squares fit problem:

    \min_{m} \sum_{r_{max} - 2\sigma \le r_i \le r_{max}} \left[ y_i - (m-1)\, x_i \right]^2

where y_i = log(p̂_M(r_i)/p̂_M(r_max)) and x_i = log sin(π r_i / (2 r_max)). Such a procedure could, in principle, result in a fractional estimate of the dimension. If one only requires integer solutions, the optimal value of m can be estimated by rounding off the least-squares fit solution.

In the case of comparison to a Gaussian distribution, the intrinsic dimensionality can also be estimated by comparing to the geodesic distance distribution for points sampled from a Gaussian distribution:

    \min_{d} \int_{r_{max} - 2\sigma}^{r_{max}} \left[ \log \frac{p(r)}{p(r_{max})} + (d-1) \frac{r^2}{4\sigma_c^2} \right]^2     (2.4)

The solution of this optimization problem can be found following the same procedure described above for the m-hypersphere.

Figure 2.2 DeepMDS Mapping: A DNN-based non-linear mapping is learned to transform the ambient space to a plausible intrinsic space. The network is optimized to preserve distances between pairs of points in the ambient and intrinsic spaces.
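The hypersphere least-squares fit described above can be exercised on a case where the answer is known: points uniform on the 2-sphere, whose geodesic distances follow p(r) ∝ sin(r). The sketch is illustrative only; it uses exact spherical geodesics rather than a kNN graph:

```python
import numpy as np

rng = np.random.default_rng(1)
# points drawn uniformly on the 2-sphere S^2; their geodesic pairwise
# distances follow p(r) ∝ sin(r), so the fit should give m - 1 = 1
v = rng.normal(size=(600, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
r = np.arccos(np.clip(v @ v.T, -1.0, 1.0))[np.triu_indices(600, k=1)]

# histogram estimate of p(r); mode r_max and spread sigma around it
hist, edges = np.histogram(r, bins=60, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
r_max = mid[np.argmax(hist)]
sigma = np.sqrt(np.mean((r - r_max) ** 2))

# least-squares fit of y_i = (m - 1) x_i over r_max - 2*sigma <= r <= r_max
sel = (mid >= r_max - 2 * sigma) & (mid <= r_max) & (hist > 0)
y = np.log(hist[sel] / hist.max())
x = np.log(np.sin(np.pi * mid[sel] / (2 * r_max)))
m = 1 + np.sum(x * y) / np.sum(x * x)
print(round(m, 1))  # should come out close to the true IND of 2
```

Replacing the analytic geodesics with graph-induced shortest paths, as the chapter does, is what makes the same fit applicable to face representations whose manifold is unknown.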
2.3.2 Estimating Intrinsic Space

The intrinsic dimensionality estimates obtained in the previous subsection allude to the existence of a mapping that can transform the ambient representation to the intrinsic space, but do not provide any means of finding said mapping. The mapping itself could potentially be very complex, and our goal of estimating it is practically challenging.

We base our solution for estimating a mapping from the ambient to the intrinsic space on multidimensional scaling (MDS) [88], a classical mapping technique that attempts to preserve the distances (similarities) between points after embedding them in a low-dimensional space. Given data points X = {x_1, …, x_n} in the ambient space and Y = {y_1, …, y_n} the corresponding points in the intrinsic low-dimensional space, the MDS problem is formulated as

    \min \sum_{ij} \left[ d_A(x_i, x_j) - d_I(y_i, y_j) \right]^2     (2.5)

where d_A(·) and d_I(·) are distance (similarity) metrics in the ambient and intrinsic spaces, respectively. Different choices of the metric lead to different dimensionality reduction algorithms. For instance, classical metric MDS is based on the Euclidean distance between the points, while using the geodesic distance induced by a neighborhood graph leads to Isomap [76]. Similarly, many different distance metrics have been proposed corresponding to non-linear mappings between the ambient space and the intrinsic space. A majority of these approaches are based on spectral decompositions and suffer from several drawbacks: (i) computational complexity scales as O(n³) for n data points, (ii) ambiguity in the choice of the correct non-linear function, and (iii) collapsed embeddings on more complex data [95].

To overcome these limitations, we employ a DNN to approximate the non-linear mapping that transforms the ambient representation x to the intrinsic space y via a parametric function y = f(x; θ) with parameters θ. We learn the parameters of the mapping within the MDS framework:

    \min_{\theta} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ d_A(x_i, x_j) - d_I\big(f(x_i; \theta),\, f(x_j; \theta)\big) \right]^2 + \lambda \|\theta\|_2^2

where the second term is a regularizer with hyperparameter λ. Fig. 2.2 shows an illustration of the DNN-based mapping.
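The MDS objective in Eq. (2.5) can be minimized directly to see it at work. The toy sketch below performs gradient descent on the embedded points themselves, rather than on DNN parameters as DeepMDS does, squeezing a 5-dim point set into 2 dims:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(40, 5))                          # ambient points
D = np.linalg.norm(x[:, None] - x[None, :], axis=-1)  # target distances d_A

def stress(y):
    d = np.linalg.norm(y[:, None] - y[None, :], axis=-1)
    return np.sum((D - d) ** 2) / 2                   # each pair counted once

y = 0.1 * rng.normal(size=(40, 2))                    # 2-dim embedding, random init
s0 = stress(y)
for _ in range(500):                                  # gradient descent on y
    diff = y[:, None] - y[None, :]
    d = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(d, 1.0)                          # dodge 0/0 on the diagonal
    coef = (D - d) / d
    np.fill_diagonal(coef, 0.0)
    grad = -2.0 * np.sum(coef[:, :, None] * diff, axis=1)
    y -= 0.002 * grad
print(stress(y) < s0)  # True: the MDS stress decreases
```

DeepMDS replaces the free coordinates y with the output of a parametric network f(x; θ), which is what yields an out-of-sample mapping for unseen faces.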
In practice, directly learning the mapping from the ambient to the intrinsic space is very challenging, especially for disentangling a complex manifold under high levels of compression. We adopt a curriculum learning [96] approach to overcome this challenge and progressively reduce the dimensionality of the mapping in multiple stages. We start with easier sub-tasks and progressively increase the difficulty of the tasks. For example, a direct mapping R^512 → R^15 is instead decomposed into multiple mapping functions R^512 → R^256 → R^128 → R^64 → R^32 → R^15. We formulate the learning problem for L mapping functions y^l = f_l(x; θ_l) as:

    \min_{\theta_1, \ldots, \theta_L} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{l=1}^{L} \alpha_l \left[ d_A(x_i, x_j) - d_I\big(y_i^l, y_j^l\big) \right]^2 + \lambda \sum_{l=1}^{L} \|\theta_l\|_2^2

where θ_l are the parameters of the l-th mapping. Appropriately scheduling the α_l weights enables us to set this up as a curriculum learning problem.

2.4 Experiments

2.4.1 Intrinsic Dimensionality Estimation

In this section, we first estimate the intrinsic dimensionality of multiple face representations over multiple datasets of varying complexity. Then, we evaluate the efficacy of the proposed DeepMDS model in finding the mapping from the ambient to the intrinsic space while maintaining its discriminative ability.

Figure 2.3 Intrinsic Dimensionality: (a) geodesic distance distribution p(r), and (b) least-squares fitting showing the global minimum of the RMSE.

Datasets. We consider two different face datasets for face verification, LFW [27] and IJB-C [9]. Recall that DeepMDS is an unsupervised method, so the category information associated with the faces is neither used for intrinsic dimensionality estimation nor for learning the mapping from the ambient to the intrinsic space.

Representation Models. For the face verification task, we consider multiple publicly available SOTA face embedding models, namely the 128-dim FaceNet [12] representation and the 512-dim SphereFace [13] representation. In addition, we also evaluate a 512-dim variant of FaceNet³ that outperforms the 128-dim version. All of these representations are learned from the CASIA-WebFace [11] dataset, consisting of 494,414 images across 10,575 subjects.

³ https://github.com/davidsandberg/facenet

Baseline Methods.
Intrinsic Dimensionality: We select two different algorithms for estimating the intrinsic dimensionality of a given representation: a classical k-nearest-neighbor based estimator [2] and the "Intrinsic Dimensionality Estimation Algorithm" (IDEA) [3].

Dimensionality Reduction: We compare DeepMDS against three dimensionality reduction algorithms: principal component analysis (PCA) for linear dimensionality reduction, Isomap [76], and denoising autoencoders [94] (DAE).

Figure 2.4 Distribution of geodesic distances for different representation models and datasets: (a) FaceNet-128, (b) FaceNet-512, (c) SphereFace.

Implementation Details: The IND estimates for all the methods we evaluate depend on the number of neighbors k. For the baselines, k is used to compute the parameters of the probability density. For our method, k parameterizes the construction of the neighborhood graph. For the latter, the choice of k is constrained by three factors: (1) k should be small enough to avoid shortcuts between points that are close to each other in the Euclidean space but are potentially far away on the corresponding intrinsic manifold due to highly complicated local curvature; (2) on the other hand, k should also be large enough to result in a connected graph, i.e., no isolated data samples; and (3) k should best match the geodesic distance distribution of a hypersphere of the same IND, i.e., the k that minimizes the RMSE. Fig. 2.3a shows the distance distributions for SphereFace with k = 15, a 16-hypersphere and a 16-dim Gaussian. The close similarity of the pairwise distance distributions of these manifolds in the graph-induced geodesic distance space suggests that the IND of SphereFace (512-dim ambient space) is 16. Fig. 2.3b shows the optimal RMSE for SphereFace at different values of m. The distribution of geodesic distances p(r) for each of the datasets and representation models is shown in Fig. 2.4. Fig. 2.5 shows the plot of log(p̂_M(r)/p̂_M(r_max)) vs log(r/r_max), as we vary the number of neighbors k, for the SphereFace representation model on the LFW and IJB-C datasets.
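The connectivity constraint on k (factor (2) above) can be checked directly, and the same graph then yields the geodesic distances the estimator consumes. A sketch, with scipy's shortest-path routine standing in for whatever implementation the authors used:

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, shortest_path

def knn_geodesics(x, k):
    """Symmetrized kNN graph -> (number of components, geodesic distances)."""
    d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    w = np.zeros(d.shape)
    for i, nbrs in enumerate(np.argsort(d, axis=1)[:, :k]):
        w[i, nbrs] = d[i, nbrs]          # keep edge weights = distances
    w = np.maximum(w, w.T)               # make the graph undirected
    n_comp, _ = connected_components(w > 0, directed=False)
    return n_comp, shortest_path(w, method="D", directed=False)

# two tight, well-separated clusters: a small k leaves the graph in
# pieces, while a large enough k bridges them
rng = np.random.default_rng(3)
x = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
print(knn_geodesics(x, 9)[0])   # 2 components: clusters not yet bridged
print(knn_geodesics(x, 10)[0])  # 1 component: the graph is connected
```

Once the graph is connected, every entry of the returned geodesic matrix is finite, and the geodesic between the clusters reflects the path through the bridging edge rather than the straight-line distance.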
For all the approaches, we select the k nearest neighbors using cosine similarity for SphereFace, and arc-length, d(x_1, x_2) = cos⁻¹( x_1ᵀ x_2 / (‖x_1‖ ‖x_2‖) ), for FaceNet features, as the latter are normalized to reside on the surface of a unitary hypersphere. Finally, for simplicity, we round the IND estimates to the nearest integer for all the methods.

Figure 2.5 log(p̂_M(r)/p̂_M(r_max)) vs log(r/r_max) plots as we vary the number of neighbors k for the SphereFace representation model on (a) the LFW dataset and (b) the IJB-C dataset.

Table 2.1 Intrinsic Dimensionality: Graph Distance [1]

Representation | Dataset | k=4 | k=7 | k=9 | k=15
FaceNet-128 | LFW | 10* | 13 | 11 | 18
FaceNet-128 | IJB-C | 10 | 10 | 10 | 11*
FaceNet-512 | LFW | 10* | 11 | 11 | 17
FaceNet-512 | IJB-C | 11 | 11 | 12 | 12*
SphereFace | LFW | 10* | 11 | 13 | 9
SphereFace | IJB-C | 14 | 14 | 16 | 16*

Table 2.2 Intrinsic Dimensionality: KNN [2]

Representation | Dataset | k=4 | k=7 | k=9 | k=15
FaceNet-128 | LFW | 10 | 10 | 11 | 11
FaceNet-128 | IJB-C | 10 | 10 | 9 | 9
FaceNet-512 | LFW | 8 | 8 | 8 | 9
FaceNet-512 | IJB-C | 10 | 10 | 9 | 9
SphereFace | LFW | 6 | 7 | 7 | 8
SphereFace | IJB-C | 6 | 6 | 5 | 5

Experimental Results: Tab. 2.1 reports the IND estimates from the graph method for different values of k (* denotes the final IND estimate that satisfies all constraints on k) and for different representation models across different datasets. Tab. 2.2 and Tab. 2.3 report the IND estimates from the k-nearest-neighbor approach [2] and IDEA [3], respectively, for the different representation models across the datasets that we consider. These approaches are known to underestimate the intrinsic dimensionality [79]. We make a number of observations from our results: (1) Surprisingly, the IND estimates across all the datasets, feature representations and IND methods are significantly lower than the dimensionality of the ambient space, between 10 and 20, suggesting that face representations could, in principle, be almost 10× to 50× more compact. (2) Both the k-NN based estimator [2] and the IDEA estimator [3] are less sensitive to the number of nearest neighbors in comparison to the graph-distance based method [1], but are known to underestimate the IND for sets with high intrinsic dimensionality [79]. As reported in the tables,
Table 2.3 Intrinsic Dimensionality: IDEA [3]

Representation | Dataset | k=4 | k=7 | k=9 | k=15
FaceNet-128 | LFW | 14 | 13 | 13 | 12
FaceNet-128 | IJB-C | 14 | 11 | 10 | 9
FaceNet-512 | LFW | 12 | 10 | 10 | 10
FaceNet-512 | IJB-C | 14 | 11 | 10 | 9
SphereFace | LFW | 10 | 9 | 9 | 9
SphereFace | IJB-C | 8 | 7 | 6 | 5

the IND estimates of the two baseline methods are lower than the estimates of the graph-distance based approach that we use.

Figure 2.6 Intrinsic Dimensionality of the Swiss Roll: (a) histogram of the swiss roll, (b) log(p̂_M(r)/p̂_M(r_max)) vs log(r/r_max), and (c) dimensionality of the swiss roll.

Swiss Roll. We also consider the swiss roll dataset as a means of providing visual validation of the estimated intrinsic space on a known dataset. First, we estimate the intrinsic dimensionality of the swiss roll dataset, and then we learn a low-dimensional mapping from the ambient 3-dim space to the intrinsic space. We sample 2,000 points from the swiss roll dataset and use these points for the experiments. For this dataset, the intrinsic dimensionality estimate is 2 (see Fig. 2.6), which is indeed the ground-truth intrinsic dimensionality of the swiss roll.

2.4.2 Intrinsic Space Mapping

Given the estimates of the dimensionality of the intrinsic space, we learn the mapping from the ambient space to a plausible intrinsic space with the goal of retaining the discriminative ability of the representation. The true intrinsic representation (IND and space) is unknown, and it is therefore not feasible to validate it directly. However, verifying its discriminative power can serve to indirectly validate both the IND estimate and the learned intrinsic space.

Figure 2.7 Swiss Roll: (a) the original 2,000 points from the swiss roll manifold, (b) the 2-dim intrinsic space estimated by Isomap, and (c) the 2-dim intrinsic space estimated by our proposed method, DeepMDS. In both cases, the blue and black points, and correspondingly the green and red points, are close together in both the intrinsic and ambient spaces.
Implementation Details: We first extract face features with the representation models, i.e., FaceNet-128, FaceNet-512 and SphereFace. The architecture of the proposed DeepMDS model is based on residual units with skip connections [70]. We train the mapping from the ambient to the intrinsic space in multiple stages, with each stage comprising two residual units. Once the individual stages are trained, all L projection models are jointly fine-tuned to maintain the pairwise distances in the intrinsic space. We adopt a similar network structure (residual units) and training strategy (stage-wise training and fine-tuning) for the stacked denoising autoencoder baseline. From an optimization perspective, training the autoencoder is more computationally efficient than the DeepMDS model, O(n) vs O(n²).

The parameters of the network are learned using the Adam [97] optimizer with a learning rate of 3×10⁻⁴ and the regularization parameter λ = 3×10⁻⁴. We observed that using the cosine-annealing scheduler [98] was critical to learning an effective mapping.

Experimental Results: We evaluate the efficacy of the learned projections, namely PCA, Isomap and DeepMDS, in the learned intrinsic space and compare with their respective performance in the ambient space. Face representations are evaluated in terms of verification (TAR@FAR) performance. Given the IND estimate, designing an appropriate scheme for mapping to the intrinsic manifold is much more challenging than the IND estimation itself. To show how the dimensionality of the intrinsic space influences the performance of face representations, we evaluate and compare their performance at multiple intermediate spaces.

Table 2.4 LFW face verification for the SphereFace embedding (TAR @ 0.1% FAR); the 512-dim ambient space achieves 96.74%.

Dimension | PCA | Isomap | DAE | DeepMDS
256 | 96.75% | 92.88% | 77.80% | 96.73%
128 | 96.80% | 93.18% | 32.95% | 96.44%
64 | 91.71% | 95.00% | 32.04% | 96.50%
32 | 66.38% | 95.31% | 11.71% | 96.31%
16 | 32.67% | 89.47% | 27.53% | 95.95%
10 (ID) | 16.04% | 77.31% | 6.73% | 92.33%

Figure 2.8 DeepMDS: face verification on IJB-C [9] (TAR @ 0.1% FAR in legend) for the (a) FaceNet-128, (b) FaceNet-512 and (c) SphereFace embeddings.
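The training schedule can be made concrete. The snippet below uses the standard cosine-annealing form and a halving rule inferred from the stage chain quoted in Sec. 2.3.2 (R^512 → R^256 → … → R^15); neither is necessarily the authors' exact code:

```python
import math

def cosine_annealing(t, T, lr_max=3e-4, lr_min=0.0):
    """Learning rate at step t of T, decayed along a half cosine."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / T))

def stage_dims(ambient, intrinsic):
    """Progressive stage targets: halve the dimension while that still
    leaves headroom above the IND, then jump directly to the IND."""
    dims, d = [], ambient
    while d // 2 >= 2 * intrinsic:
        d //= 2
        dims.append(d)
    return dims + [intrinsic]

print(stage_dims(512, 15))       # [256, 128, 64, 32, 15]
print(cosine_annealing(0, 100))  # 0.0003, the base rate used for DeepMDS
```

Each stage in the returned chain would get its own projection model, trained in order and then jointly fine-tuned, mirroring the stage-wise recipe described above.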
Face verification is performed on the IJB-C dataset following its verification protocol, and on the LFW dataset following the BLUFR [99] protocol. Fig. 2.8 shows the ROC curves for the IJB-C dataset using face representations projected to multiple intermediate spaces and to the intrinsic space by DeepMDS. The face verification ROC curves of DeepMDS on the LFW dataset for the FaceNet-128, FaceNet-512 and SphereFace representation models are shown in Fig. 2.9. Tab. 2.4 reports the verification rate at an FAR of 0.1% on the LFW dataset. Fig. 2.10 shows the face verification ROC curves of PCA on the IJB-C and LFW (BLUFR) datasets for all three representation models. Similarly, Fig. 2.11 and Fig. 2.13 show the face verification ROC curves of the Isomap and denoising autoencoder baselines, respectively.

Figure 2.9 DeepMDS: face verification on the LFW (BLUFR) dataset for the (a) FaceNet-128, (b) FaceNet-512 and (c) SphereFace embeddings.

We make the following observations from these results: (1) For all the verification experiments, the performance of the DeepMDS features down to 32 dimensions is comparable to the original 128-dim and 512-dim features. The 10-dim space of DeepMDS on LFW, which consists largely of frontal face images with minimal pose variations and facial occlusions, achieves a TAR of 92.33% at 0.1% FAR, a loss of about 4.5% compared to the ambient space. The 12-dim space of DeepMDS on IJB-C, with full pose variations, occlusions and diversity of subjects, achieves a TAR of 62.25% at 0.1% FAR, compared to 69.32% in the ambient space. (2) The proposed DeepMDS model is able to learn a low-dimensional space down to the IND with a performance penalty of 5%-10% at compression factors of 30× to 40× for 512-dim representations, underscoring the fact that learning a mapping from the ambient to the intrinsic space is more challenging than estimating the IND itself. (3) In the task of face verification, we observe that the DeepMDS model retains significantly more discriminative ability than the baseline approaches, even at high levels of compression.
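The TAR@FAR numbers quoted throughout can be computed from lists of genuine (same-identity) and impostor (different-identity) similarity scores. A minimal sketch, with my own thresholding convention and synthetic scores:

```python
import numpy as np

def tar_at_far(genuine, impostor, far=1e-3):
    """Pick the score threshold that yields the desired FAR on impostor
    pairs, then report the fraction of genuine pairs accepted (TAR)."""
    thr = np.quantile(impostor, 1.0 - far)     # accept scores >= thr
    return float(np.mean(np.asarray(genuine) >= thr))

rng = np.random.default_rng(4)
genuine = rng.normal(2.0, 1.0, 10000)   # similarity scores, same identity
impostor = rng.normal(0.0, 1.0, 10000)  # similarity scores, different identity
print(round(tar_at_far(genuine, impostor, far=0.001), 3))
```

Sweeping the FAR and plotting TAR against it is exactly how the ROC curves in Figs. 2.8-2.13 are produced.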
While Isomap is more competitive than the other baselines, it suffers from some drawbacks: (i) due to its iterative nature, it does not provide an explicit mapping function for new (unseen) data samples, while the autoencoder and DeepMDS models can map such samples; therefore, Isomap cannot be utilized to evaluate verification accuracy on a validation/test set; and (ii) the computational complexity of Isomap is O(n³), so it does not scale well to large datasets (IJB-C) and needs approximations, such as the Nyström approximation [57], for tractability.

Figure 2.10 PCA: face verification on the IJB-C and LFW (BLUFR) datasets for the FaceNet-128, FaceNet-512 and SphereFace embeddings.

Ablation Study: Here we demonstrate the efficacy of the stage-wise learning process for training the DeepMDS model. All models have the same capacity. We consider four variants: (1) Direct: mapping from the ambient to the intrinsic space; (2) Direct+IS: direct mapping from the ambient to the intrinsic space with intermediate supervision at each stage, i.e., optimizing aggregated intermediate losses; (3) Stagewise: stage-wise learning of the mapping; and (4) Stagewise+Fine-Tune: the projection model trained stage-wise and then fine-tuned. Tab. 2.5 compares the results of these variants on the LFW dataset (BLUFR protocol). Our results suggest that stage-wise learning of the non-linear projection models is more effective at progressively disentangling the ambient representation. A similar trend was observed on the larger dataset, IJB-C. In fact, stage-wise training with fine-tuning was critical in learning an effective projection, both for DeepMDS as well as for the DAE.

Table 2.5 DeepMDS training methods (TAR @ 0.1% FAR)

Method | Direct | Direct+IS | Stagewise+Finetune | Stagewise
TAR | 80.25 | 86.15 | 90.42 | 92.33
Direct Training: Our finding in this work that many current DNN representations can be significantly compressed naturally begs the question: can we directly learn embedding functions that yield compact and discriminative embeddings in the first place? Taigman et al. [52] studied this problem in the context of learning face embeddings, and noted that a compact feature space creates a bottleneck in the information flow to the classification layer and hence increases the difficulty of optimizing the network when training from scratch. Given the significant developments in network architectures and optimization tools since then, we attempt to learn highly compact embeddings directly from raw data, using current best practices, while circumventing the chicken-and-egg problem of not knowing the target intrinsic dimensionality before learning the embedding function. We train⁵ InceptionResNet-V1 [10] on CASIA-WebFace [11] for embeddings of different sizes. Fig. 2.12 shows the ROC curves on the LFW and IJB-C datasets. The models suffer significant loss in performance as we decrease the dimensionality of the embeddings. In comparison, the proposed DeepMDS-based dimensionality reduction retains its discriminative ability even at high levels of compression. These results call for the development of algorithms that can directly learn compact and effective image representations.

⁵ We build off of the publicly available implementation at https://github.com/davidsandberg/facenet

Figure 2.11 Isomap: face verification on the IJB-C and LFW (BLUFR) datasets for the FaceNet-128, FaceNet-512 and SphereFace embeddings.

Figure 2.12 ROC curves on the LFW and IJB-C datasets for the InceptionResNet-V1 [10] model trained with different embedding dimensionalities on the CASIA-WebFace [11] dataset.

Figure 2.13 Denoising Autoencoder: face verification on the IJB-C and LFW (BLUFR) datasets for the FaceNet-128, FaceNet-512 and SphereFace embeddings.
2.5 Conclusion

The work in this chapter addressed two questions: given a DNN-based face representation, what is the minimum number of degrees of freedom in the representation, i.e., its intrinsic dimension, and can we find a mapping between the ambient and intrinsic spaces that maintains the discriminative capability of the representation? Contributions of the work include (i) a graph-induced geodesic-distance-based approach to estimate the intrinsic dimension, and (ii) DeepMDS, a non-linear projection that transforms the ambient space to the intrinsic space. Experiments on multiple DNN-based face representations yielded intrinsic dimensionality (IND) estimates of 9 to 16, significantly lower than the ambient dimension (by a factor of roughly 10 to 40). The DeepMDS model was able to learn a projection from the ambient to the intrinsic space while largely preserving its discriminative ability on the LFW and IJB-C datasets. Our findings in this chapter suggest that face representations could be significantly more compact, and call for the development of algorithms that can directly learn more compact face representations.

Chapter 3

The Capacity of Face Representation

In the previous chapter we estimated the IND of face representations and proposed a dimensionality reduction method to map high-dimensional feature vectors into a low-dimensional space. We have yet to present an application of the intrinsic representation space. In this chapter we study a specific problem of face representation by applying the dimensionality reduction algorithm based on the notion of IND. The problem is defined as: given a face representation, how many identities can it resolve? In other words, what is the capacity of the face representation? We cast the face capacity problem in terms of packing bounds on a low-dimensional manifold embedded within a deep representation space. By explicitly accounting for the manifold structure of the representation as well as two different sources of representational noise, epistemic (model) uncertainty and aleatoric (data) variability, our approach is able to estimate the capacity of a given face representation. To demonstrate the efficacy of our approach, we estimate the capacity of two deep neural network based face representations, namely the 128-dimensional FaceNet and the 512-dimensional SphereFace.
At the core, we leverage advances in deep neural networks (DNNs) for multiple aspects of our solution, relying on their ability to approximate complex non-linear mappings. First, we utilize DNNs to approximate the non-linear function for projecting and unfolding the high-dimensional face representation into a low-dimensional representation, while preserving the local geometric structure of the manifold. Second, we utilize DNNs to aid in approximating the density and support of the low-dimensional manifold in the form of multivariate Gaussian distributions as a function of the desired FAR.

Figure 3.1 An illustration of the geometrical structure of our capacity estimation problem: a low-dimensional manifold M ⊂ R^m embedded in a high-dimensional space P ⊂ R^p. On this manifold, all the faces lie inside the population hyper-ellipsoid, and the embeddings of images belonging to each identity (class) are clustered into their own class-specific hyper-ellipsoids. The capacity of this manifold is the number of identities (class-specific hyper-ellipsoids) that can be packed into the population hyper-ellipsoid within an error tolerance, i.e., an allowed amount of overlap.

The key technical contributions of this work are:

1. Explicitly accounting for and modeling the manifold structure of the face representation in the capacity estimation process. This is achieved through a DNN-based non-linear projection and unfolding of the representation into an equivalent low-dimensional Euclidean space while preserving the local geometric structure of the manifold.

2. A noise model for facial embeddings that explicitly accounts for two sources of uncertainty: uncertainty due to the data and uncertainty in the parameters of the representation function.

3. Establishing a relationship between the support of the class-specific manifolds and the discriminant function of a nearest neighbor classifier. Consequently, we can estimate capacity as a function of the desired operating point, in terms of the maximum acceptable probability of false acceptance.

4. The first practical attempt at estimating the capacity of DNN-based face representations. We consider two such representations, FaceNet [12] and SphereFace [13], consisting of 128-dimensional and 512-dimensional feature vectors, respectively.

Numerical experiments suggest that our proposed model can provide reasonable estimates of capacity. For FaceNet and SphereFace, the upper bounds on the capacity are 3.5×10^5 and 1.1×10^4, respectively, on LFW [27], and 2.2×10^3 and 3.6×10^3, respectively, on IJB-C [9] at a FAR of 0.1%. This implies that, on average, the FaceNet representation should achieve a true accept rate (TAR) of 100% at a FAR of 0.1% for 2.2×10^3 subject identities on the more challenging IJB-C database and for 3.5×10^5 identities on the relatively less challenging LFW database (see Figs. 1.1c and 1.1f for examples of faces in LFW and IJB-C). As such, the capacity estimates represent an upper bound on the maximal scalability of a given face representation. Empirically, however, the FaceNet representation only achieves a TAR of 43% at a FAR of 0.1% on the IJB-C dataset with 3,531 subjects, and a TAR of 94% at a FAR of 0.1% on the LFW dataset with 5,749 subjects.

3.1 Related Work

The focus of a majority of the face recognition literature has been on the accuracy of face recognition on benchmark datasets. In contrast, our goal in this work is to characterize the maximal discriminative capacity of a given face representation at a specified error tolerance.
A number of approaches have been proposed to analyze various performance metrics of biometric recognition systems, primarily using information-theoretic concepts. Schmid et al. [100, 101] derive analytical bounds on the probability of error and the capacity of biometric systems through a large-deviation analysis of the distribution of similarity scores. Bhatnagar et al. [102] formulated performance indices for biometric authentication; they obtained the capacity of a biometric system following Shannon's channel capacity formulation, along with a rate-distortion framework to estimate the FAR. Similarly, Wang et al. [103] proposed an approach to model and predict the performance of a face recognition system based on an analysis of similarity scores. The common theme across this body of work is that the performance bounds are analyzed purely from the similarity scores obtained during matching. In contrast, our work directly analyzes the geometry of the representation space of face recognition systems.

To alleviate the curse of dimensionality, we unfold the face manifold into a lower-dimensional space using the solution introduced in Ch. 2, where we first estimate the intrinsic dimensionality of the representation and then use the proposed DeepMDS to learn a non-linear mapping that largely preserves the discriminative performance of the representation. The goal of estimating capacity in this chapter, however, differs from that of preserving discriminative performance in the previous chapter: while the latter does not necessitate preserving the local geometric structure of the manifold, the former critically depends on the ability of the dimensionality reduction technique to preserve it.

In the context of estimating distributions, Gaussian Processes [104] are a popular and powerful tool for modeling distributions over functions, offering attractive properties such as uncertainty estimates over function values, robustness to over-fitting, and principled ways of tuning hyper-parameters. A number of approaches have been proposed for modeling uncertainties in deep neural networks [105-107].
Along similar lines, Kendall et al. [108] studied the benefits of explicitly modeling epistemic^1 (model) and aleatoric^2 (data) uncertainties [109] in Bayesian deep neural networks for semantic segmentation and depth estimation tasks. Drawing inspiration from this work, we account for these two sources of uncertainty in the process of mapping a normalized facial image into a low-dimensional face representation.

Capacity estimates to determine the uniqueness of other biometric modalities, namely fingerprint and iris, have been reported. Pankanti et al. [110] derived an expression for estimating the probability of a false correspondence between minutiae-based representations from two arbitrary fingerprints belonging to two different fingers. Zhu et al. [111] later developed a more realistic model of fingerprint individuality through a finite mixture model representing the distribution of minutiae in fingerprint images, including minutiae clustering tendencies and dependencies across different regions of the fingerprint image domain. Daugman [112] proposed an information-theoretic approach to compute the capacity of IrisCode: he first developed a generative model of IrisCode based on Hidden Markov Models and then estimated its capacity by calculating the entropy of this generative model. Adler et al. [113] proposed an information-theoretic approach to estimate the average information contained within a face representation such as Eigenfaces [36].

1 Uncertainty due to lack of information about a process.
2 Uncertainty stemming from the inherent randomness of a process.

To the best of our knowledge, no such capacity estimation models have been proposed in the literature for face representations. Moreover, the distinct nature of the representations for fingerprint^3, iris^4, and face^5 traits does not allow capacity estimation approaches to carry over from one biometric modality to another. Therefore, we believe that a new model is necessary to establish the capacity of face representations.

3 An unordered collection of minutiae points.
4 A binary representation, called the iris code.
5 A fixed-length vector of real values.

3.2 Capacity of Face Representations

We first describe the setting of the problem and then our solution. A pictorial outline of the approach is shown in Fig. 3.2.
3.2.1 Face Representation Model

A face representation model M is a parametric embedding function that maps a face image s of identity c to a vector space x ∈ R^p, i.e., x = f_M(s; θ_P), where θ_P is the set of parameters of the embedding function. For example, in the case of a linear embedding function such as Principal Component Analysis (PCA), the parameter set θ_P would be the eigenvectors; in the case of a deep neural network based non-linear embedding function, θ_P comprises the parameters of the network.

Figure 3.2 Overview of Face Representation Capacity Estimation: We cast the capacity estimation process in the framework of the sphere packing problem on a low-dimensional manifold. To generalize the sphere packing problem, we replace spheres by hyper-ellipsoids, one per class (subject). Our approach involves three steps: (i) unfolding and mapping the manifold embedded in the high-dimensional space onto a low-dimensional space; (ii) a teacher-student model to obtain explicit estimates of the uncertainty (noise) in the embedding due to the data as well as the parameters of the representation; and (iii) leveraging the uncertainty estimates to approximate the density of the manifold via multivariate normal distributions (to keep the problem and its analysis tractable), which in turn facilitates an empirical estimate of the capacity of the teacher face representation as a ratio of hyper-ellipsoidal volumes.

The face embedding process can be approximately cast within the framework of a Gaussian noise channel as follows. The face representation y of an image s from the teacher is modeled as an observation of a true underlying embedding x that is corrupted by noise z. The nature of the relationship between these entities is determined by the assumptions of a Gaussian channel, namely, (i) additivity of the noise, i.e., y = x + z; (ii) independence of the true embedding and the additive noise, i.e., x ⊥ z; and (iii) all entities y, x, and z follow Gaussian distributions, i.e., p_x ~ N(μ_g, Σ_g), p_z ~ N(0, Σ_z), and p_y ~ N(μ_y, Σ_y). Statistical estimates of these parameterized distributions will enable us to compute the capacity of the teacher face representation model, as described in Section 3.2.3.

For a given black-box face representation, in practice, the embeddings could lie on an arbitrary and unknown low-dimensional manifold. Approximating this manifold by a normal distribution potentially over-estimates the support of the embedding in R^p, especially when p is high, resulting in an over-estimate of the capacity of the representation. To this end, we model the space occupied by the learned face representation as a low-dimensional population manifold M ⊂ R^m embedded within a high-dimensional space P ⊂ R^p. Under this model, the features of a given identity c lie on a manifold M_c ⊂ M. Directly estimating the support and volume of these manifolds is very challenging, especially since the manifold could be a highly entangled surface in R^p. Therefore, we first learn a mapping that projects and unfolds the population manifold onto a low-dimensional space whose density, support, and volume can be estimated more reliably. We base our solution for projecting and unfolding the manifold on DeepMDS, proposed in Ch. 2.

Figure 3.3 Manifold Unfolding: A DNN-based non-linear mapping is learned to unfold and project the population manifold into a lower-dimensional space. The network is optimized to preserve the geodesic distances between pairs of points in the high- and low-dimensional spaces.

Specifically, we employ DeepMDS to approximate the non-linear mapping that transforms the population manifold in the high-dimensional space, x ∈ R^p, to the unfolded manifold in the low-dimensional space, y ∈ R^m, by a parametric function y = f_P(x; θ_M) with parameters θ_M. We learn the parameters of the mapping within the DeepMDS framework by minimizing the following objective:

    min_{θ_M}  Σ_{ij} ( d_P(x_i, x_j) − d_L(f(x_i; θ_M), f(x_j; θ_M)) )^2 + λ ‖θ_M‖_2^2        (3.1)

where the second term is a regularizer with hyper-parameter λ. Since our primary goal is to estimate the capacity of the representation, we map the manifold into the low-dimensional space while preserving the local geometry of the manifold in the form of pairwise distances. To achieve this goal, we choose d_P(x_i, x_j) = 1 − (x_i^T x_j)/(‖x_i‖_2 ‖x_j‖_2), the cosine distance between the features in the high-dimensional space, and d_L(y_i, y_j) = ‖y_i − y_j‖_2, the Euclidean distance in the low-dimensional space. Fig. 3.3 shows an illustration of the DNN-based mapping.

3.2.2 Estimating Uncertainties in Representations

The projection model learned in the previous section can be used to obtain the population manifold by propagating multiple images from many identities through it. However, this process only provides point estimates (samples) from the manifold and does not account for the uncertainty in the manifold. Accurately estimating the capacity of the face representation necessitates modeling the uncertainty in the representation stemming from the different sources of noise in the process of extracting feature representations from a given facial image.

A probabilistic model for the space of noisy embeddings y generated by a black-box facial representation model (teacher^6) M_t with parameters θ = {θ_P, θ_M} can be formulated as follows:

    p(y | X, Y) = ∫ p(y | s, X, Y) p(s | X, Y) ds = ∫∫ p(y | s, θ) p(θ | X, Y) p(s | X, Y) dθ ds        (3.2)

where Y = {y_1, …, y_N} and X = {s_1, …, s_N} are the training samples used to estimate the model parameters θ, p(y | s, θ) is the aleatoric (data) uncertainty given a set of parameters, p(θ | X, Y) is the epistemic (model) uncertainty in the parameters given the training samples, and p(s | X, Y) ~ N(μ_g, Σ_g) is the Gaussian approximation (see Section 3.2.3 for justification) of the underlying manifold of noiseless embeddings. Furthermore, we assume that the true mapping between the image s and the noiseless embedding μ is a deterministic but unknown function, i.e., μ = f(s, θ).
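The DeepMDS objective of Eq. 3.1 can be sketched in a few lines of numpy. This is a minimal, hedged illustration: `W` is an arbitrary linear map standing in for the actual multi-layer network f(·; θ_M), and the batch size and dimensions are hypothetical.

```python
import numpy as np

def cosine_dist(A):
    """Pairwise cosine distance d_P(x_i, x_j) = 1 - cos(x_i, x_j)."""
    U = A / np.linalg.norm(A, axis=1, keepdims=True)
    return 1.0 - U @ U.T

def euclid_dist(B):
    """Pairwise Euclidean distance d_L(y_i, y_j) = ||y_i - y_j||_2."""
    sq = ((B[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.sqrt(sq)

def deepmds_loss(X, W, lam=3e-4):
    """Eq. 3.1: squared mismatch between ambient cosine distances and
    projected Euclidean distances, plus an L2 regularizer on the parameters."""
    Y = X @ W                                  # stand-in for f(x; theta_M)
    D_P, D_L = cosine_dist(X), euclid_dist(Y)
    return ((D_P - D_L) ** 2).sum() + lam * (W ** 2).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 128))                  # a batch of ambient embeddings
W = rng.normal(size=(128, 10)) * 0.01          # hypothetical 128 -> 10 map
loss = deepmds_loss(X, W)
```

In training, this scalar would be minimized over the network parameters by gradient descent; the mixed metric choice (cosine in the ambient space, Euclidean in the projected space) mirrors how the two spaces are compared at verification time.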
The black-box nature of the teacher model, however, only provides D = {s_i, y_i}_{i=1}^N, pairs of facial images s_i and their corresponding noisy embeddings y_i, i.e., a single sample from the distribution p(y | X, Y). Therefore, we learn a student model M_s with parameters w to mimic the teacher model. Specifically, the student model approximates the data-dependent aleatoric uncertainty p(y_i | s_i, w) ~ N(μ_i, Σ_i), where μ_i is the data-dependent mean of the noiseless embedding and Σ_i is the data-dependent uncertainty around the mean. This student is an approximation of the unknown underlying probabilistic teacher, by which an input image s generates noisy embeddings y of the ideal noiseless embedding μ for a given set of parameters w, i.e., p(y_i | s_i, w) ≈ p(y_i | μ_i, θ). Finally, we employ a variational distribution to approximate the epistemic uncertainty of the teacher, i.e., p(w | X, Y) ≈ p(θ | X, Y).

Learning: Given pairs of facial images and their corresponding embeddings from the teacher model, we learn a student model to mimic the outputs of the teacher for the same inputs, in accordance with the probabilistic model described above. We use parameterized functions μ_i = f(s_i; w_μ) and Σ_i = f(s_i; w_Σ) to characterize the aleatoric uncertainty p(y_i | s_i, w), where w = {w_μ, w_Σ}. We choose deep neural networks, specifically convolutional neural networks, as the functions f(·; w_μ) and f(·; w_Σ). For the epistemic uncertainty, while many deep learning based variational inference approaches [105, 115, 116] have been proposed, we use the simple interpretation of dropout as our variational approximation [105]. Practically, this interpretation characterizes the uncertainty in the deep neural network weights w through a Bernoulli sampling of the weights.

6 We adopt the terminology of teacher-student models from the model compression community [114].
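At its core, the aleatoric term the student optimizes is a Gaussian negative log-likelihood with a predicted per-dimension log-variance (the diagonal-covariance, log-variance form derived later in this section). A minimal sketch, with illustrative values rather than real network outputs:

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Per-sample negative log-likelihood of y under N(mu, diag(exp(log_var))),
    dropping the constant (d/2) log(2*pi) term:
        0.5 * sum_j log_var_j + 0.5 * sum_j (y_j - mu_j)^2 * exp(-log_var_j)
    Predicting the log-variance keeps the variance positive by construction."""
    return 0.5 * np.sum(log_var) + 0.5 * np.sum((y - mu) ** 2 * np.exp(-log_var))

y = np.array([1.0, -0.5, 0.2])                 # observed (noisy) embedding
mu = np.zeros(3)                               # predicted mean
nll_unit = gaussian_nll(y, mu, np.zeros(3))    # unit variance everywhere
# With unit variance the loss reduces to half the squared error:
# 0.5 * (1.0**2 + 0.5**2 + 0.2**2) = 0.645
```

The first term penalizes the network for claiming large uncertainty, while the second discounts the squared error wherever the predicted variance is large; the balance between them is what lets the student learn a per-input, per-dimension noise estimate.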
We learn the parameters of our probabilistic model φ = {w_μ, w_Σ, μ_g, Σ_g} through maximum-likelihood estimation, i.e., by minimizing the negative log-likelihood of the observations Y = {y_1, …, y_N}. This translates to optimizing a combination of loss functions:

    min_φ  L_s + λ L_g + γ L_{r_s} + δ L_{r_g}        (3.3)

where λ, γ, and δ are the weights of the different loss functions, L_{r_s} = (1/2N) Σ_{i=1}^N tr(Σ_i) and L_{r_g} = (1/2) tr(Σ_g) are the regularization terms, and L_s is the loss function of the student that captures the log-likelihood of a given noisy representation y_i under the distribution N(μ_i, Σ_i):

    L_s = (1/2) Σ_{i=1}^N ln|Σ_i| + (1/2) Σ_{i=1}^N Trace( Σ_i^{-1} (y_i − μ_i)(y_i − μ_i)^T )

L_g is the log-likelihood of the population manifold of the embedding under its approximation by a multivariate normal distribution N(μ_g, Σ_g):

    L_g = (N/2) ln|Σ_g| + (1/2) Trace( Σ_g^{-1} Σ_{i=1}^N (y_i − μ_g)(y_i − μ_g)^T )

For computational tractability, we make a simplifying assumption on the covariance matrix by parameterizing it as a diagonal matrix, i.e., the off-diagonal elements are set to zero. This parameterization corresponds to an independence assumption on the uncertainty along each dimension of the embedding. The sparse parameterization of the covariance matrix yields two computational benefits in the learning process. First, it suffices for the student to predict only the diagonal elements of the covariance matrix. Second, positive semi-definiteness constraints on a diagonal matrix can be enforced simply by forcing all the diagonal elements to be non-negative. To enforce non-negativity of each diagonal variance value, we predict the log-variance, l^j = log σ_j^2. This allows us to re-parameterize the student likelihood in terms of l_i:

    L_s = (1/2) Σ_{i=1}^N Σ_{j=1}^d l_i^j + (1/2) Σ_{i=1}^N Σ_{j=1}^d (y_i^j − μ_i^j)^2 exp(−l_i^j)        (3.4)

Similarly, we re-parameterize the likelihood of the noiseless embedding as a function of l_g, the log-variance along each dimension. The regularization terms are also re-parameterized as L_{r_s} = (1/2N) Σ_{i=1}^N Σ_{j=1}^d exp(l_i^j) and L_{r_g} = (1/2) Σ_{j=1}^d exp(l_g^j). We empirically estimate μ_g as μ_g = (1/N) Σ_{i=1}^N y_i and the other parameters φ = {w_μ, w_Σ, Σ_g} through stochastic gradient descent [117]. The gradients of the parameters are computed by backpropagating [118] the gradients of the outputs through the network.

Inference: The learned student model can now be used to infer the uncertainty in the embeddings of the original teacher model. For a given facial image s, the aleatoric uncertainty can be predicted by a feed-forward pass of the image through the network, i.e., μ = f(s; w_μ) and Σ = f(s; w_Σ). The epistemic uncertainty can be approximately estimated through Monte-Carlo integration over different samples of the model parameters w; in practice, the parameter sampling is performed by applying dropout at inference time. In summary, the total uncertainty in the embedding of each facial image s is estimated by performing Monte-Carlo integration over a total of T evaluations:

    μ̂_i = (1/T) Σ_{t=1}^T μ_i^t        (3.5)

    Σ̂_i = (1/T) Σ_{t=1}^T (μ_i^t − μ̂_i)(μ_i^t − μ̂_i)^T + (1/T) Σ_{t=1}^T Σ_i^t        (3.6)

where μ_i^t and Σ_i^t are the predicted mean and aleatoric uncertainty for each feed-forward evaluation of the network.

3.2.3 Manifold Approximation

The student model described in Section 3.2.2 allows us to extract uncertainty estimates for each individual image. Given these estimates, the next step is to estimate the density and support of the population and class-specific low-dimensional manifolds.
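Eqs. 3.5 and 3.6 combine the spread of the T sampled means (the epistemic part) with the average predicted covariance (the aleatoric part). A sketch with diagonal per-pass covariances and synthetic samples in place of real dropout passes:

```python
import numpy as np

def mc_moments(mus, sigmas):
    """mus: (T, d) sampled means; sigmas: (T, d) per-pass diagonal variances.
    Returns (mu_hat, Sigma_hat) per Eqs. 3.5-3.6: the Monte-Carlo mean, and
    the covariance of the sampled means plus the mean aleatoric covariance."""
    mu_hat = mus.mean(axis=0)                                # Eq. 3.5
    diff = mus - mu_hat
    epistemic = (diff[:, :, None] * diff[:, None, :]).mean(axis=0)
    aleatoric = np.array([np.diag(s) for s in sigmas]).mean(axis=0)
    return mu_hat, epistemic + aleatoric                     # Eq. 3.6

T, d = 1000, 4
rng = np.random.default_rng(0)
mus = rng.normal(size=(T, d)) * 0.1 + 1.0    # jittered means (dropout passes)
sigmas = np.full((T, d), 0.5)                # constant predicted variances
mu_hat, Sigma_hat = mc_moments(mus, sigmas)
```

If the sampled means were identical across passes, the epistemic term would vanish and the total covariance would reduce to the mean aleatoric covariance, matching the decomposition in Eq. 3.6.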
Multiple existing techniques can be employed for this purpose under different modeling assumptions, ranging from non-parametric models, such as kernel density estimators and convex hulls, to parametric models, such as multivariate Gaussian distributions and escribed hyper-spheres. The non-parametric and parametric models span the trade-off between the accuracy of the manifold's shape estimate and the computational complexity of fitting the shape and calculating the volume of the manifold. While non-parametric models provide more accurate estimates of the density and support of the manifold, parametric models potentially provide more robust and computationally tractable estimates of its density and volume. For instance, estimating the convex hull of samples in a high-dimensional space, and its volume, is both computationally prohibitive and less robust to outliers.

To overcome these challenges, we approximate the densities of the population and class-specific manifolds in the low-dimensional space via multivariate normal distributions. The choice of the normal approximation is motivated by multiple factors: (a) probabilistically, it leads to a robust and computationally efficient estimate of the density of the manifold; (b) geometrically, it leads to a hyper-ellipsoidal approximation of the manifold, which in turn allows for efficient and exact estimates of the support and volume of the manifold as a function of the desired false accept rate (see Section 3.2.4); and (c) the low-dimensional manifold obtained through projection and unfolding of the high-dimensional representation is implicitly designed, through Eq. 3.1, to cluster the facial images belonging to the same identity, and therefore a normal distribution is a realistic (see Section 3.3.1) approximation of the manifold.

Empirically, we estimate the parameters of these distributions as follows. The mean of the population embedding is computed as μ_{y_c} = (1/C) Σ_{c=1}^C μ̂_c, where μ̂_c = (1/N_c) Σ_{i=1}^{N_c} μ̂_i^c.

Figure 3.4 Decision Theory and Capacity: We illustrate the relation between capacity and the discriminant function corresponding to a nearest neighbor classifier. Left: Depiction of the notion of a decision boundary and the probability of false accept between two identical one-dimensional Gaussian distributions. Shannon's definition of capacity corresponds to the decision boundary being one standard deviation away from the mean. Right: Depiction of the decision boundary induced by the discriminant function of a nearest neighbor classifier. Unlike in Shannon's definition of capacity, the size of the ellipsoidal decision boundary is determined by the maximum acceptable false accept rate. The probability of false acceptance can be computed through the cumulative distribution function of a χ²(r², d) distribution.

The covariance of the population embedding is estimated as

    Σ̃ = argmax_{Σ̂_c} |Σ̂_c|,    Σ_{y_c} + Σ_{z_c} = Σ̃ + (1/C) Σ_{c=1}^C (μ̂_c − μ_{y_c})(μ̂_c − μ_{y_c})^T        (3.7)

where Σ̂_c = (1/N_c) Σ_{i=1}^{N_c} Σ̂_i^c, i.e., Σ̃ is the class covariance of maximal volume. Along the same lines, the class-specific covariance Σ_{z_c} of a class c is estimated as

    Σ_{z_c} = (1/(N_c T)) Σ_{i=1}^{N_c} Σ_{t=1}^T [ (μ_i^t − μ̂_i)(μ_i^t − μ̂_i)^T + Σ_i^t ]        (3.8)

3.2.4 Decision Theory and Model Capacity

Thus far, we have developed the tools necessary to characterize the face representation manifold and estimate its density. In this section we determine the support and volume of the population and class-specific manifolds as a function of the specified false accept rate (FAR).
Our representation space is composed of two components: the population manifold of all the classes, approximated by a multivariate Gaussian distribution, and the embedding noise of each class, also approximated by a multivariate Gaussian distribution. Under these assumptions, the decision boundaries between the classes that minimize the classification error rate are determined by discriminant functions [119]. As illustrated in Fig. 3.4, for a two-class problem the discriminant function is a hyper-plane in R^d, with the optimal hyper-plane being equidistant from both classes. Moreover, the separation between the classes determines the operating point and hence the FAR. In the multi-class setting, the optimal discriminant function is the surface encompassed by all the pairwise hyper-planes, which asymptotically reduces to a high-dimensional hyper-ellipsoid. The support of this enclosing hyper-ellipsoid can be determined by the desired operating point in terms of the maximal acceptable probability of false acceptance.

Under the multi-class setting, the capacity estimation problem is equivalent to the geometrical problem of ellipsoid packing: estimating the maximum number of small hyper-ellipsoids that can be packed into a larger hyper-ellipsoid. In the context of face representations, the small hyper-ellipsoids correspond to the class-specific enclosing hyper-ellipsoids described above, while the large hyper-ellipsoid corresponds to the space spanned by the population of all classes. The volume V of a hyper-ellipsoid corresponding to a Mahalanobis distance r² = (x − μ)^T Σ^{-1} (x − μ) with covariance matrix Σ is V = V_d |Σ|^{1/2} r^d, where V_d is the volume of the d-dimensional unit hyper-sphere. An upper bound on the capacity of the face representation can then be computed simply as the ratio of the volumes of the population and class-specific hyper-ellipsoids:

    C ≈ V_{Σ_{y_c}+Σ_{z_c}} / V_{Σ_{z_c}} = ( V_d |Σ_{y_c}+Σ_{z_c}|^{1/2} r_{y_c}^d ) / ( V_d |Σ_{z_c}|^{1/2} r_{z_c}^d ) = |Σ̄_{y_c,z_c}|^{1/2} / |Σ̄_{z_c}|^{1/2}        (3.9)

where V_{Σ_{y_c}+Σ_{z_c}} is the volume of the population hyper-ellipsoid and V_{Σ_{z_c}} is the volume of the class-specific hyper-ellipsoid. The size of the population hyper-ellipsoid r_{y_c} is chosen such that a desired fraction of all the classes lies within it, and r_{z_c} determines the size of the class-specific hyper-ellipsoid. Σ̄_{y_c,z_c} and Σ̄_{z_c} are the effective sizes of the enclosing population and class-specific hyper-ellipsoids, respectively: for each hyper-ellipsoid, the effective radius along the i-th principal direction is √λ̄_i = r √λ_i, where √λ_i is the radius of the original hyper-ellipsoid along the same principal direction.

This geometrical interpretation of capacity reduces to the Shannon capacity [120] when r_{y_c} and r_{z_c} are chosen to be equal, i.e., r_{y_c} = r_{z_c}. In this instance, the choice of r_{y_c} for the population hyper-ellipsoid implicitly determines the boundary of separation between the classes and hence the operating false accept rate (FAR) of the embedding. For instance, when computing the Shannon capacity of the face representation, choosing r_{y_c} such that 95% of the classes are enclosed within the population hyper-ellipsoid would implicitly correspond to operating at a FAR of 5%. However, practical face recognition systems need to operate at lower false accept rates, dictated by the desired level of security.

The geometrical interpretation of capacity described in Eq. 3.9 directly enables us to compute the representation capacity as a function of the desired operating point, as determined by its corresponding false accept rate. The size of the population hyper-ellipsoid r_{y_c} is determined by the desired fraction of classes to enclose, or alternatively by other geometric constructs such as the minimum-volume enclosing hyper-ellipsoid or the maximum-volume inscribed hyper-ellipsoid of a finite set of classes, both of which correspond to a particular fraction of the population distribution. Similarly, the desired false accept rate q determines the size of the class-specific hyper-ellipsoid r_{z_c}.
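Given the two covariances and radii, the volume ratio of Eq. 3.9 is a one-line computation; the shared V_d factor cancels. A sketch with illustrative values (the covariances and radii below are hypothetical, not estimates from a real representation):

```python
import numpy as np

def capacity(S_pop, S_class, r_pop, r_class):
    """Upper bound of Eq. 3.9: ratio of hyper-ellipsoid volumes
    V = V_d |S|^{1/2} r^d.  S_pop plays the role of the population
    covariance (Sigma_y + Sigma_z), S_class of the class covariance Sigma_z."""
    d = S_pop.shape[0]
    vol_ratio = np.sqrt(np.linalg.det(S_pop) / np.linalg.det(S_class))
    return vol_ratio * (r_pop / r_class) ** d

# Sanity check: an isotropic population whose standard deviation is 10x the
# class standard deviation in each of d = 12 directions packs, at equal
# radii, 10^12 classes.
d = 12
C = capacity(100.0 * np.eye(d), np.eye(d), r_pop=1.0, r_class=1.0)
```

The exponential dependence on d in the (r_pop/r_class)^d factor is what makes the estimate so sensitive to the chosen operating point, which the next section makes precise via the χ² distribution.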
Let Ω = {x | r² ≥ (x − μ)^T Σ^{-1} (x − μ)} be the enclosing hyper-ellipsoid. Without loss of generality, assuming that the class-specific hyper-ellipsoid is centered at the origin, the false accept rate q can be computed as

    q = 1 − ∫_{x∈Ω} (2π)^{-d/2} |Σ|^{-1/2} exp(−x^T Σ^{-1} x / 2) dx        (3.10)

Re-parameterizing the integral as y = Σ^{-1/2} x, we have Ω = {y | r² ≥ y^T y} and

    q = 1 − ∫_{y∈Ω} (2π)^{-d/2} exp(−y^T y / 2) dy        (3.11)

where y_1, …, y_d are independent standard normal random variables. The Mahalanobis distance r² is distributed according to the χ²(r², d) distribution with d degrees of freedom, and 1 − q is the cumulative distribution function of χ²(r², d). Therefore, given the desired FAR q, the corresponding Mahalanobis distance r_{z_c} can be obtained from the inverse CDF of the χ²(r_z², d) distribution. Along the same lines, the size of the population hyper-ellipsoid r_{y_c} can be estimated from the inverse CDF of the χ²(r_y², d) distribution given the desired fraction of classes to encompass. These estimates of r_{z_c} and r_{y_c} can be used in Eq. 3.9 to estimate capacity as a function of the desired FAR. Algorithm 1 provides a high-level outline of our complete capacity estimation procedure.

Algorithm 1 Face Representation Capacity Estimation
Input: Representation f_M(·; θ_P), a face dataset, and the desired FAR.
Output: Capacity estimate at the specified FAR.
Step 1: Learn the parametric mapping f_P(·; θ_M): x → y (Eq. 3.1)
Step 2: Learn a student model M_s to mimic and provide uncertainty estimates of the teacher M_t = (M, P) (Eq. 3.3)
Step 3: Estimate the density and support Σ_{y_c} of the population manifold (Eq. 3.7)
Step 4: Estimate the density and support Σ_{z_c} of the class-specific manifolds (Eq. 3.8)
Step 5: Obtain r_{y_c} and r_{z_c} for the desired population fraction and FAR, respectively (Eq. 3.11)
Step 6: Obtain the capacity estimate for the desired population fraction and FAR using r_{y_c} and r_{z_c} (Eq. 3.9)

3.3 Numerical Experiments

In this section we (a) illustrate the capacity estimation process on a two-dimensional toy example, (b) estimate the capacity of deep neural network based face representation models, specifically FaceNet and SphereFace, on multiple datasets of increasing complexity, and (c) study the effect of different design choices in the proposed capacity estimation approach.

Table 3.1 Capacity of Two-Dimensional Toy Example at 1% FAR (estimate / ground truth)

    Support     | Population covariance           | Population area | Class (max area) covariance    | Class area    | Capacity
    Ellipse     | [10.84, 0.56; 0.56, 11.57] /    | 35.15 / 34.62   | [4.96, 0.47; 0.47, 6.54] /     | 17.84 / 15.25 | 1.97 / 2.27
                | [10.34, 0.71; 0.71, 11.79]      |                 | [4.18, 0.97; 0.97, 5.86]       |               |
    Convex Hull |                                 | 403.91          |                                | 102.65        | 3.93 / 2.27

3.3.1 Two-Dimensional Toy Example

Figure 3.5 Sample Representation Space: Illustration of a two-dimensional space where the underlying population and class-specific representations (we show four classes) are 2-D Gaussian distributions (solid ellipsoids). Samples from the classes (colored ×) are used to obtain estimates of the underlying population and class-specific distributions (solid lines). As a comparison, the support of the samples in the form of a convex hull is also shown (dashed lines).

We consider an illustrative example to demonstrate the capacity estimation process given a constellation of classes in a two-dimensional representation space. We model the distribution of the population space of classes (class centers, to be specific) as a multivariate normal distribution, while the feature space of each class is modeled as a two-dimensional normal distribution. From this model, we sample 100 different classes from the underlying population distribution, and for each of these classes we sample features from the ground-truth multivariate normal distribution for that class. From these samples, we estimate the covariance matrix of the population space distribution and that of the individual classes.
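For the two-dimensional case, the χ² inverse CDF needed in Eq. 3.11 has a closed form: with d = 2 degrees of freedom, the CDF is 1 − exp(−r²/2), so r = √(−2 ln q). (For general d one would use a numerical χ² inverse CDF, e.g. scipy.stats.chi2.ppf.) A sketch tying this to the capacity ratio, using the estimated covariances reported in Table 3.1:

```python
import math
import numpy as np

def radius_2d(q):
    """Mahalanobis radius enclosing probability 1-q of a 2-D Gaussian:
    with d = 2 the chi-square CDF is 1 - exp(-r^2/2), inverted in closed form."""
    return math.sqrt(-2.0 * math.log(q))

r = radius_2d(0.01)          # 1% FAR  ->  r^2 = -2 ln(0.01) ~ 9.21

# With equal population and class radii (the Shannon-style setting), Eq. 3.9
# reduces to the determinant ratio. Using the estimated covariances from
# Table 3.1:
S_pop = np.array([[10.84, 0.56], [0.56, 11.57]])
S_cls = np.array([[4.96, 0.47], [0.47, 6.54]])
cap = math.sqrt(np.linalg.det(S_pop) / np.linalg.det(S_cls))
# cap ~ 1.97, consistent with the "Ellipse" capacity estimate in Table 3.1
```

The same two lines, with radii obtained from a χ² inverse CDF at the desired FAR and population fraction, give the general operating-point-dependent estimate.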
Fig. 3.5 shows the representation space, including the population space and four different classes, corresponding to the classes with the minimum, mean, median, and maximum area among the 100 sampled population classes. As a comparison, we also obtain the support of the population and of the classes through the convex hull of the samples, even though this presents a number of practical challenges: (1) estimating the convex hull in high dimensions is computationally challenging, (2) the convex hull over-estimates the support due to outliers, and (3) it cannot easily be adapted to obtain the support as a function of the desired FAR.

The capacity of the representation is then estimated as the ratio of the support area of the population to that of the class with the median area. Tab. 3.1 shows the capacity estimates so obtained for this simplified representation space. The results suggest that the ellipsoidal approximation of the representation provides more accurate estimates of capacity than the convex hull. Modeling the support of the representation through convex hulls is severely affected by outliers, resulting in an over-estimate of the underlying support and area of the representation, and thus of its capacity.

3.3.2 Datasets and Face Representation Models

Datasets. We utilize multiple large-scale face datasets, both for learning the teacher and student models and for estimating the capacity of the teacher. The CASIA-WebFace dataset [11] is used for training both the teacher and student models. The capacity of the teacher is estimated on LFW [27], IJB-A [29], IJB-B [30], and IJB-C [9].

Face Representation Models. We estimate the capacity of two different face representation models: (i) FaceNet, introduced by Schroff et al. [12], and (ii) SphereFace, introduced by Liu et al. [13]. These models are illustrative of state-of-the-art representations for face recognition.
The manifold projection and unfolding function is modeled as a multi-layer deep neural network with multiple residual [50] modules consisting of fully-connected layers. For a given image, the low-dimensional representation is therefore obtained by propagating the image through the original face representation model and then through the manifold projection model. We refer to the combined model, i.e., the original representation plus the projection model, as the teacher model. Since the student model is purposed to mimic the teacher model, we base the student network architecture on the teacher's^7 architecture, with a few notable exceptions. First, we introduce dropout before every convolutional layer of the network, including all the convolutional layers of the inception [121] and residual [50] modules, and before every linear layer of the manifold projection and unfolding modules. Second, the last layer of the network is modified to generate two outputs, μ and Σ, instead of the output of the teacher, i.e., a sample y of the noisy embedding.

3.3.3 Face Recognition Performance

Below we provide implementation details for learning the manifold projection and the student networks. Subsequently, we demonstrate the ability of the student model to maintain the discriminative performance of the original models.

Implementation Details: We use pre-trained models for both FaceNet^8 and SphereFace^9 as our original face representation models. Before we extract features from these models, the face images are pre-processed and normalized to a canonical face image. The faces are detected and normalized using the joint face detection and alignment system introduced by Zhang et al. [33]. Given the facial landmarks, the faces are normalized to a canonical image of size 182×182, from which RGB patches of size 160×160 are extracted as the input to the networks.
Given the features extracted from the original representation, we train the manifold projection and unfolding networks on the CASIA-WebFace dataset. The model is trained to minimize the multi-dimensional scaling loss function described in Eq. 3.1 on randomly selected pairs of feature vectors x_i and x_j from the dataset. Training is performed using the Adam [97] optimizer with a learning rate of 3e-4 and the regularization parameter λ = 3 × 10^−4. We use a batch size of 256 image pairs and train the model for about 100 epochs.

[7] In the scenario where the teacher is a black-box model, the design of the student network architecture needs more careful consideration, but it also affords more flexibility. See Fig. 3.2 for an illustration of this process.
[8] https://github.com/davidsandberg/facenet
[9] https://github.com/wy1iu/sphereface

The student is trained to minimize the loss function defined in Eq. 3.3, where the hyper-parameters are chosen through cross-validation. Training is performed through stochastic gradient descent with Nesterov momentum of 0.9 and weight decay of 0.0005. We use a batch size of 64 and a learning rate of 0.01 that is dropped by a factor of 2 every 20 epochs. We observed that it is sufficient to train the student model for about 100 epochs for convergence. The student model includes dropout with a probability of 0.05 after each convolutional layer and with a probability of 0.2 after each fully-connected layer in the manifold projection layers. At inference, each image is passed through the student network 1,000 times as a way of performing Monte-Carlo integration through the space of network parameters {w_μ, w_Σ}. These sampled outputs are used to empirically estimate the mean and covariance of the image embedding.

Experiments: We evaluate and compare the performance of the original and student models on the four test datasets, namely, LFW, IJB-A, IJB-B, and IJB-C. To evaluate the student model, we estimate the face representation through Monte-Carlo integration. We pass each image through the student model 1,000 times to extract {μ_i, Σ_i}_{i=1}^{1000} and compute μ = (1/1000) Σ_{i=1}^{1000} μ_i as the representation.
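The Monte-Carlo step above can be sketched as follows. This is a toy stand-in for the student (a random two-layer map with hand-rolled inverted dropout, not the thesis's network): dropout stays active at inference, and repeated stochastic passes yield an empirical mean and covariance of the embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny "student": one hidden layer with dropout. Repeated
# stochastic forward passes play the role of sampling network weights.
W1 = rng.normal(size=(128, 64)) / np.sqrt(128)
W2 = rng.normal(size=(64, 16)) / np.sqrt(64)

def student_forward(x, p=0.05):
    h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
    mask = rng.random(h.shape) >= p          # dropout stays on at inference
    h = h * mask / (1.0 - p)                 # inverted-dropout scaling
    return h @ W2

def mc_embedding(x, n_samples=1000):
    """Empirical mean and covariance of the embedding over dropout samples."""
    s = np.stack([student_forward(x) for _ in range(n_samples)])
    mu = s.mean(axis=0)
    cov = np.cov(s, rowvar=False)
    return mu, cov

x = rng.normal(size=128)
mu, cov = mc_embedding(x)
print(mu.shape, cov.shape)                   # (16,) (16, 16)
```

The resulting (μ, Σ) pair is exactly the quantity the capacity estimation in Sec. 3.3.4 consumes.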
Following standard practice, we match a pair of representations through a nearest-neighbor classifier, i.e., by computing the Euclidean distance d_ij = ||y_i − y_j||_2^2 between the low-dimensional projected feature vectors y_i and y_j.

We evaluate the face representation models on the LFW dataset using the BLUFR protocol [99], and follow the prescribed template-based matching protocol, where each template is composed of possibly multiple images of the class, for the IJB-A, IJB-B, and IJB-C datasets. Following the protocol in [122], we define the match score between templates as the average of the match scores between all pairs of images in the two templates.

Fig. 3.6 and Tab. 3.2 report the performance of the original and student models, both FaceNet and SphereFace, on each of these datasets at different operating points. This comparison accounts for both the ability of the projection model to maintain the performance of the original high-dimensional representation and the ability of the student to mimic the teacher while providing uncertainty estimates. We make the following observations: (1) The performance of the DNN-based representation

Table 3.2 Face recognition results (TAR, %) for FaceNet, SphereFace, and the state-of-the-art (the state-of-the-art face representation models are not available in the public domain).

Dataset      | Original: FaceNet  | Student: FaceNet   | Original: SphereFace | Student: SphereFace | State-of-the-Art
             | 0.1% FAR | 1% FAR  | 0.1% FAR | 1% FAR  | 0.1% FAR | 1% FAR    | 0.1% FAR | 1% FAR   | 0.1% FAR | 1% FAR
LFW (BLUFR)  | 93.90    | 98.51   | 92.83    | 98.28   | 96.74    | 99.11     | 95.49    | 98.79    | 98.88 [123] | N/A
IJB-A        | 45.92    | 70.26   | 43.84    | 71.72   | 65.06    | 85.97     | 64.13    | 85.25    | 94.8 | 97.1 [124]
IJB-B        | 48.31    | 74.47   | 45.56    | 74.10   | 67.58    | 80.81     | 64.02    | 80.63    | 93.7 | 97.5 [125]
IJB-C        | 42.57    | 78.53   | 40.74    | 76.75   | 71.26    | 91.67     | 64.02    | 88.33    | 94.7 | 98.3 [125]

Figure 3.6 Face recognition performance of the original and student models on different datasets. We report the face verification performance of both the FaceNet and SphereFace face representations: (a) LFW, evaluated through the BLUFR protocol; (b) IJB-A, (c) IJB-B, and (d) IJB-C, evaluated through their respective matching protocols.
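The template-to-template score defined above (average over all cross pairs) can be sketched in a few lines. The per-pair score here is taken as the negative squared Euclidean distance, matching the nearest-neighbor matcher in the text; the function names are illustrative, not from the thesis.

```python
import numpy as np

def template_score(T_a, T_b):
    """Average match score over all cross pairs of two templates.

    T_a, T_b: (n, d) and (m, d) arrays of per-image embeddings.
    Per-pair score = negative squared Euclidean distance.
    """
    # pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (np.sum(T_a**2, 1)[:, None] + np.sum(T_b**2, 1)[None, :]
          - 2.0 * T_a @ T_b.T)
    return float(np.mean(-sq))

a = np.array([[1.0, 0.0], [0.0, 1.0]])   # template with two images
b = np.array([[1.0, 0.0]])               # template with one image
print(template_score(a, b))              # mean of -(0) and -(2) = -1.0
```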
on LFW, consisting largely of frontal face images with minimal pose variations and facial occlusions, is comparable to the state-of-the-art. However, its performance on IJB-A, IJB-B, and IJB-C, datasets with large pose variations, is lower than that of state-of-the-art approaches. This is due to the template generation strategy that we employ and the fact that, unlike these methods, we do not fine-tune the DNN model on the IJB-A, IJB-B, and IJB-C training sets. We reiterate that our goal in this work is to estimate the capacity of a generic face representation, as opposed to achieving the best verification performance on each individual dataset. (2) Our results indicate that the student models are able to mimic the teacher models very well, as demonstrated by the similarity of their receiver operating curves.

3.3.4 Face Representation Capacity

Having demonstrated the ability of the student model to be an effective proxy for the original representation manifold, we indirectly estimate the capacity of the original model by estimating the capacity of the student model.

Implementation Details: We estimate the capacity of the face representations by evaluating Eq. 3.9. For each of the datasets, we empirically determine the shape and size of the population hyper-ellipsoid Σ_y and the class-specific hyper-ellipsoids Σ_z. These quantities are computed through the predictions obtained by sampling the weights (w_μ, w_Σ) of the model via dropout. We obtain 1,000 such predictions for a given image by feeding the image through the student network 1,000 different times with dropout. For robustness against outliers, we only consider classes with at least two images per class for LFW and five images per class for all the other datasets in the capacity estimates.

Table 3.3 Capacity of the face representation models at 1% FAR.

Dataset | FaceNet    | SphereFace
LFW     | 4.3 × 10^6 | 2.6 × 10^5
IJB-A   | 6.3 × 10^4 | 3.2 × 10^6
IJB-B   | 6.4 × 10^4 | 2.4 × 10^5
IJB-C   | 2.7 × 10^4 | 8.4 × 10^4
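The capacity computation reduces to a ratio of hyper-ellipsoid volumes. The thesis's exact Eq. 3.9 is not reproduced here, so the following is a generic volume-ratio sketch under the stated Gaussian model: vol ∝ r^d · sqrt(det Σ), so the ratio is (r_y/r_z)^d · sqrt(det Σ_y / det Σ_z), computed in log-space for stability in high dimensions.

```python
import numpy as np

def capacity_estimate(cov_pop, cov_class, r_pop=1.0, r_class=1.0):
    """Volume ratio of two hyper-ellipsoids given covariances and radii.

    Generic sketch of a volume-ratio capacity (not the thesis's exact
    Eq. 3.9): vol ∝ r^d * sqrt(det(cov)), so the ratio is
    (r_pop / r_class)^d * sqrt(det(cov_pop) / det(cov_class)).
    """
    d = cov_pop.shape[0]
    sign_p, logdet_p = np.linalg.slogdet(cov_pop)
    sign_c, logdet_c = np.linalg.slogdet(cov_class)
    assert sign_p > 0 and sign_c > 0, "covariances must be positive definite"
    log_cap = d * np.log(r_pop / r_class) + 0.5 * (logdet_p - logdet_c)
    return np.exp(log_cap)

# population spread 10x the class spread per axis in d=4 -> 10^4 identities
cov_pop = np.eye(4) * 100.0     # std 10 per dimension
cov_cls = np.eye(4)             # std 1 per dimension
print(capacity_estimate(cov_pop, cov_cls))   # 10000.0
```

At 1% FAR the radii coincide (r_y = r_z, see below), so the estimate is driven entirely by the determinant ratio.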
Capacity Estimates: Tab. 3.3 reports the capacity of the DNN-based face representations estimated on different datasets at 1% FAR (i.e., when r_{Σ_y} = r_{Σ_z}). We make the following observations from our numerical results: The upper bound on the capacity estimate of the FaceNet and SphereFace models in constrained scenarios (LFW) is of the order of ≈10^6; in unconstrained environments (IJB-A, IJB-B, and IJB-C) it is of the order of ≈10^5 under the general model of a hyper-ellipsoid with the class corresponding to maximum noise. Therefore, theoretically, the representation should be able to resolve 10^6 and 10^5 subjects with a true acceptance rate (TAR) of 100% at a FAR of 1% under the constrained and unconstrained operational settings, respectively. While this capacity estimate is on the order of the population of a large city, in practice the performance of the representation is lower than the theoretical performance: about 95% across only 10,000 subjects in the constrained scenario, and only 50% across 3,531 subjects in the unconstrained scenario. These results suggest that our capacity estimates are an upper bound on the actual performance of face recognition systems in practice, especially under unconstrained scenarios. The relative order of the capacity estimates, however, mimics the relative order of the verification accuracy on these datasets, as shown in Fig. 3.7c.

Figure 3.7 Capacity estimates across different datasets for the (a) FaceNet [12] and (b) SphereFace [13] representations as a function of different false accept rates. In the limit, the capacity tends to zero as the FAR tends to zero; similarly, the capacity tends to ∞ as the FAR tends to 1.0. (c) Logarithmic values of capacity on different datasets versus the corresponding TAR @ 0.1% FAR.
We extend the capacity estimates presented above to establish capacity as a function of different operating points, as defined by different false accept rates. We define r_{Σ_y} and r_{Σ_z} corresponding to the desired operating points and evaluate Eq. 3.9. In all our experiments, we choose r_{Σ_y} to encompass 99% of the classes within the population hyper-ellipsoid. Different FARs define different decision boundary contours that, in turn, define the size of the class-specific hyper-ellipsoid. Figures 3.7a and 3.7b show how the capacity of the representation changes as a function of the FAR for different datasets. We note that at the operating point of FAR = 0.1%, the maximum capacity of the face representation is ≈10^5 in the constrained case and ≈10^3 in the unconstrained case. However, at stricter operating points (FAR of 0.001% or 0.0001%), which are more meaningful at larger scales of operation [126], the capacity of the FaceNet representation is significantly lower (63 and 6, respectively, for IJB-C) than the typical desired scale of operation of face recognition systems. These results suggest significant room for improvement in face representations.

Table 3.4 IJB-C capacity at 1% FAR across intra-class uncertainty.

Model      | Min         | Mean        | Median      | Max
FaceNet    | 1.6 × 10^14 | 6 × 10^8    | 5.0 × 10^8  | 2.7 × 10^4
SphereFace | 4.9 × 10^16 | 1.1 × 10^11 | 9.8 × 10^10 | 8.4 × 10^4

3.3.5 Ablation Studies

DNN and PCA: We seek to compare the capacity of the classical PCA-based EigenFaces [36] representation of image pixels and the DNN-based representation. These are illustrative of the two extremes of the various face representations proposed in the literature, with FaceNet and SphereFace providing close to state-of-the-art recognition performance. The FaceNet and SphereFace representations are based on non-linear, multi-layered deep convolutional network architectures.
EigenFaces, in contrast, is a linear model for representing faces. The capacity of EigenFaces is ≈10^0, which is significantly lower than the capacity of the DNN-based representations. EigenFaces, by virtue of being based on linear projections of the raw pixel values, is unable to scale beyond a handful of identities, while the DNN representations are able to resolve significantly more identities. The relative difference in capacity is also reflected in the vast difference in verification performance between the two representations.

Data Bias: Our capacity estimates are critically dependent on the representational support of the canonical class. In other words, the capacity expression in Eq. 3.9 depends on Σ_z, which is representative of the demographics and intra-class variability of the subjects in the population of interest. However, the hyper-ellipsoids corresponding to various classes could potentially be of different sizes. For instance, in Fig. 3.1 each class-specific manifold is of a different size, orientation, and shape. Precisely defining or identifying a canonical subject, from among all possible identities, is in itself a challenging task and beyond the scope of this work. In Tab. 3.4 we report the capacity for different choices of classes (subjects) from the IJB-C dataset, i.e., classes with the minimum, mean, median, and maximum hyper-ellipsoid volume, thereby ranging from classes with very low intra-class variability to classes with very high intra-class variability. Datasets whose class distribution is similar to the distribution of the data that was used to train the face representation are expected to exhibit low intra-class uncertainty, while datasets with classes that are out of the training distribution can potentially have high intra-class uncertainty and, consequently, lower capacity. Fig. 3.8 shows examples of the images corresponding to the lowest and highest intra-class variability in each dataset.
Empirically, we observed that classes with the smallest hyper-ellipsoid are typically classes with very few images and very little variation in facial appearance. Similarly, classes with high intra-class uncertainty are typically classes with a very large number of images spanning a wide range of variations in pose, expression, illumination conditions, etc., variations that one can expect under any real-world deployment of face recognition systems. Coupled with the fact that the capacity of the face representation is estimated from a very small sample of the population (fewer than 11,000 subjects), we argue that the class with large intra-class uncertainty within the datasets considered in this chapter is a reasonable proxy for a canonical subject in unconstrained real-world deployments of face recognition systems.

Figure 3.8 Example images of classes that correspond to different sizes of the class-specific hyper-ellipsoids, based on the SphereFace representation, for the different datasets considered: (a) LFW, (b) IJB-B, (c) IJB-C. Top row: Images of the class with the largest class-specific hyper-ellipsoid for each database. Notice that in the case of a database with predominantly frontal faces (LFW), large variations in facial appearance lead to the greatest uncertainty in the class representation. On more challenging datasets (IJB-B, IJB-C), the face representation exhibits the most uncertainty due to pose variations. Bottom row: Images of the class with the smallest class-specific hyper-ellipsoid for each database. As expected, across all the datasets, frontal face images with minimal change in appearance result in the least amount of uncertainty in the class representation.

Table 3.5 IJB-C capacity at 1% FAR across manifold support.

Model      | Hypersphere | Hyper-Ellipsoid (Axis-Aligned) | Hyper-Ellipsoid
FaceNet    | 1.5 × 10^3  | 9.2 × 10^2                     | 2.7 × 10^4
SphereFace | 6.7 × 10^3  | 7.2 × 10^3                     | 8.4 × 10^4

Gaussian Distribution Parameterization: For the sake of efficiency, we made the same modeling assumption for both the global shape of the embedding and the embedding shape of each class. The capacity estimates obtained thus far are based on modeling the manifolds as unconstrained hyper-ellipsoids.
We now obtain capacity estimates under different modeling assumptions on the shape of these entities. For instance, the shapes could also be modeled as hyper-spheres, corresponding to a diagonal covariance matrix with the same variance in each dimension. We generalize the hyper-sphere model to an axis-aligned hyper-ellipsoid, corresponding to a diagonal covariance matrix with possibly different variances along each dimension. Tab. 3.5 shows the capacity estimates on the IJB-C dataset at 1% FAR. We observe that the capacity estimates of the anisotropic Gaussian (hyper-ellipsoid) are two orders of magnitude higher than the capacity estimates of the reduced approximations, the hyper-sphere (isotropic Gaussian) and the axis-aligned hyper-ellipsoid. At the same time, the isotropic and axis-aligned hyper-ellipsoid approximations result in very similar capacity estimates.

3.4 Conclusion

Face recognition is based on two underlying premises: persistence (invariance of the face representation over time) and capacity (the number of distinct identities a face representation can resolve). While face longitudinal studies [127] have addressed the persistence property, very little attention has been devoted to the capacity problem that is addressed here. The face representation process was modeled as a low-dimensional manifold embedded in a high-dimensional space. We estimated the capacity of a face representation as a ratio of the volumes of the population and class-specific manifolds as a function of the desired false acceptance rate. Empirically, we estimated the capacity of two deep neural network based face representations: FaceNet and SphereFace. Numerical results yielded a capacity of 10^5 at a FAR of 1%. At the lower FAR of 0.001%, the capacity dropped off significantly, to only 70 under unconstrained scenarios, impairing the scalability of the face representation. There does exist a large gap between the theoretical and empirical verification performance of the representations, indicating that there is significant scope for improvement in the discriminative capabilities of current state-of-the-art face representations.
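The three shape models can be compared directly on the same samples. The sketch below (illustrative only, not the thesis's estimator) fits an isotropic, an axis-aligned, and a full covariance to one strongly correlated class and compares the resulting ellipsoid volumes (∝ sqrt(det Σ)); only the full model captures the thin, tilted ellipse, so the diagonal approximations inflate the class support.

```python
import numpy as np

def fitted_covariances(samples):
    """Fit the three Gaussian shape models discussed above to one class.

    Returns (isotropic, axis_aligned, full) covariance matrices: a scaled
    identity, a diagonal matrix, and the unconstrained sample covariance.
    """
    full = np.cov(samples, rowvar=False)
    axis_aligned = np.diag(np.diag(full))            # per-axis variances only
    isotropic = np.eye(full.shape[0]) * np.diag(full).mean()
    return isotropic, axis_aligned, full

rng = np.random.default_rng(1)
# strongly correlated 2-D class: a thin, tilted ellipse
z = rng.multivariate_normal([0, 0], [[1.0, 0.95], [0.95, 1.0]], size=5000)
iso, diag_cov, full = fitted_covariances(z)
vols = [np.sqrt(np.linalg.det(c)) for c in (iso, diag_cov, full)]
# the full-covariance ellipsoid is much smaller than either diagonal
# approximation, so it yields a larger population-to-class volume ratio
print(vols[2] < vols[1] and vols[2] < vols[0])
```

Since class volume sits in the denominator of the capacity ratio, the smaller full-covariance class support is consistent with Tab. 3.5's higher capacity for the unconstrained hyper-ellipsoid.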
As face recognition technology makes rapid strides in performance and witnesses wider adoption, quantifying the capacity of a given face representation is an important problem, both from an analytical and a practical perspective. However, due to the challenging nature of finding a closed-form expression for the capacity, we make simplifying assumptions on the distributions of the population and of specific classes in the representation space. Our experimental results demonstrate that even this simplified model is able to provide reasonable capacity estimates of a DNN-based face representation. Relaxing the assumptions of the approach presented here is an exciting direction for future work, leading to more realistic capacity estimates.

Chapter 4
The Bias in Face Recognition

In this chapter we assess the demographic bias in FR algorithms and develop new methods to mitigate the demographic impact on FR performance. We experiment with different de-biasing approaches and network architectures using deep learning. By quantitatively assessing the models' demographic bias on various datasets, we see how much bias is mitigated in our attempt at improving the fairness of face representations extracted from CNNs.

More specifically, in this chapter we propose two different methods to learn a fair face representation, in which faces of every group can be equally well represented. In the first method, we present a de-biasing adversarial network (DebFace) that learns to extract disentangled feature representations for both unbiased face recognition and demographics estimation. The proposed network consists of one identity classifier and three demographic classifiers (for gender, age, and race) that are trained to distinguish identity and demographic attributes, respectively. Adversarial learning is adopted to minimize correlation among the feature factors so as to abate bias influence from other factors. We also design a scheme to combine demographics with identity features to strengthen the robustness of the face representation across different demographic groups.
The second method, the group adaptive classifier (GAC), learns to mitigate bias by using adaptive convolution kernels and attention mechanisms on faces based on their demographic attributes. The adaptive module comprises kernel masks and channel-wise attention maps for each demographic group so as to activate different facial regions for identification, leading to more discriminative features pertinent to each demographic. We also introduce an automated adaptation strategy which determines whether to apply adaptation to a certain layer by iteratively computing the dissimilarity among demographic-adaptive parameters, thereby increasing the efficiency of the adaptation learning.

The experimental results on benchmark face datasets (e.g., RFW [4], LFW, IJB-A, and IJB-C) show that our approach is able to reduce bias in face recognition across various demographic groups while maintaining competitive performance.

4.1 Fairness Learning and De-biasing Algorithms

We start by reviewing recent advances in fairness learning and de-biasing algorithms. Previous efforts on fairness techniques aim to prevent machine learning models from utilizing statistical bias in training data, including adversarial training [128-131], subgroup constraint optimization [132-134], data pre-processing (e.g.
, weighted sampling [135] and data transformation [136]), and algorithmic post-processing [137, 138]. For example, in the pre-DNN era, Zhang et al. [139] proposed a cost-sensitive learning framework to reduce the misclassification rate of face identification. To correct the skew of the separating hyperplanes of SVMs on imbalanced data, Liu et al. [140] proposed a margin-based adaptive fuzzy SVM that obtains a lower generalization error bound. In the DNN era, face recognition models are trained on large-scale face datasets with highly imbalanced class distributions [141, 142]. To uncover deep learning bias, Alexander et al. [143] developed an algorithm to mitigate the hidden biases within training data. RangeLoss [142] learns a robust face representation that makes the most use of every training sample. To mitigate the impact of insufficient class samples, center-based feature transfer learning [141] and large-margin feature augmentation [144] were proposed to augment the features of under-represented identities and equalize the class distribution. Despite their effectiveness, these studies ignore the influence of demographic imbalance in the dataset, which may lead to demographic bias. The studies in [4, 145] address demographic bias in FR by leveraging unlabeled faces to improve the performance on groups with fewer samples. Wang et al. [5] propose skewness-aware reinforcement learning to mitigate racial bias in FR. Unlike prior work, we design the GAC framework to customize the classifier for each demographic group, which, if successful, leads to mitigated bias. This framework is presented in Sec. 4.4.

Another promising approach learns a fair representation that preserves all discerning information about the data attributes or task-related attributes but eliminates the prejudicial effects of sensitive factors [22, 146-149]. Locatello et al. [150] show that feature disentanglement is consistently correlated with increased fairness of general-purpose representations, based on an analysis of 12,600 SOTA models. Accordingly, we propose our second de-biasing framework, DebFace, which disentangles face representations to de-bias both FR and demographic attribute estimation. Sec. 4.3 discusses DebFace in more detail.
4.2 Problem Definition

We now give a specific definition of the problem addressed in this chapter. The ultimate goal of unbiased face recognition is that, for a given face recognition system, there is no statistically significant difference in performance among different categories of face images. Despite the research on pose-invariant face recognition that aims for equal performance on all poses [151, 152], we believe it is inappropriate to define variations like pose, illumination, or resolution as the categories. These are instantaneous image-related variations with intrinsic bias; e.g., large-pose or low-resolution faces are inherently harder to recognize than frontal-view, high-resolution faces. Rather, we define subject-related properties, such as demographic attributes, as the categories. A face recognition system is biased if it performs worse on certain demographic cohorts. For practical applications, it is important to consider what demographic biases may exist, and whether these are intrinsic biases across demographic cohorts or algorithmic biases derived from the algorithm itself. This motivates us to analyze the demographic influence on face recognition performance and strive to reduce algorithmic bias in face recognition systems. One may achieve this by training on a dataset containing uniform samples over the cohort space. However, the demographic distribution of a dataset is often imbalanced, under-representing demographic minorities while over-representing majorities. Naively re-sampling a balanced training dataset may still induce bias, since the diversity of latent variables differs across cohorts and the instances cannot be treated fairly during training. To mitigate demographic bias, we propose two face de-biasing frameworks that reduce demographic bias in face identity features while maintaining the overall verification performance.
4.3 Jointly De-biasing Face Recognition and Demographic Attribute Estimation

In this section, we introduce our other framework to address the influence of demographic bias on face recognition. With the technique of adversarial learning, we attack this issue from a different perspective. Specifically, we assume that if the face representation does not carry discriminative information about demographic attributes, it will be unbiased in terms of demographics. Given this assumption, one common way to remove demographic information from face representations is to perform feature disentanglement via adversarial learning (Fig. 4.1b). That is, a classifier of demographic attributes can be used to encourage the identity representation to not carry demographic information. However, one issue with this common approach is that the demographic classifier itself could be biased (e.g., the race classifier could be biased with respect to gender), and hence it will act differently while disentangling faces of different cohorts. This is clearly undesirable, as it leads to a demographically biased identity representation.

To resolve this chicken-and-egg problem, we propose to jointly learn unbiased representations for both the identity and the demographic attributes. Specifically, starting from a multi-task learning framework that learns disentangled feature representations of gender, age, race, and identity, respectively, we request the classifier of each task to act as adversarial supervision for the other tasks (e.g., the dashed arrows in Fig. 4.1c). These four classifiers help each other achieve better feature disentanglement, resulting in unbiased feature representations for both the identity and the demographic attributes. As shown in Fig. 4.1, our framework is in sharp contrast to both multi-task learning and adversarial learning.

Figure 4.1 Methods to learn different tasks simultaneously: (a) multi-task learning, (b) adversarial learning, (c) DebFace. Solid lines are the typical feature flow in a CNN, while dashed lines are adversarial losses.
Moreover, since the features are disentangled into demographic and identity components, our face representations also contribute to privacy-preserving applications. It is worth noting that such identity representations contain little demographic information, which could undermine recognition competence, since demographic features are part of identity-related facial appearance. To retain recognition accuracy on demographically biased face datasets, we propose another network that combines the demographic features with the demographic-free identity features to generate a new identity representation for face recognition.

The key contributions and findings of this work are:
- A thorough analysis of deep learning based face recognition performance on three different demographics: (i) gender, (ii) age, and (iii) race.
- A de-biasing face recognition framework, called DebFace, that generates disentangled representations for both identity and demographics recognition while jointly removing discriminative information from the other counterparts.
- The identity representation from DebFace (DebFace-ID) shows lower bias on different demographic cohorts and also achieves SOTA face verification results on demographic-unbiased face recognition.
- The demographic attribute estimations via DebFace are less biased across other demographic cohorts.
- Combining ID with demographics results in more discriminative features for face recognition on biased datasets.

4.3.1 Adversarial Learning and Disentangled Representation

We first review previous work related to adversarial learning and representation disentanglement. Adversarial learning [153] has been well explored in many computer vision applications. For example, Generative Adversarial Networks (GANs) [154] employ adversarial learning to train a generator by competing with a discriminator that distinguishes real images from synthetic ones.
Adversarial learning has also been applied to domain adaptation [155-158]. A problem of current interest is learning interpretable representations with semantic meaning [159]. Many studies have learned factors of variation in the data by supervised learning [152, 160-163] or semi-supervised/unsupervised learning [164-167], referred to as disentangled representation. For supervised disentangled feature learning, adversarial networks are utilized to extract features that only contain discriminative information about a target task. For face recognition, Liu et al. [161] propose a disentangled representation by training an adversarial autoencoder to extract features that can capture identity discrimination and its complementary knowledge. In contrast, our proposed DebFace differs from prior work in that each branch of a multi-task network acts as both a generator and as a discriminator for the other branches (Fig. 4.1c).

4.3.2 Methodology

4.3.2.1 Algorithm Design

The proposed network takes advantage of the relationship between demographics and face identities. On one hand, demographic characteristics are highly correlated with face features. On the other hand, demographic attributes are heterogeneous in terms of data type and semantics [168]. A male person, for example, is not necessarily of a certain age or of a certain race. Accordingly, we present a framework that jointly generates demographic features and identity features from a single face image by considering both the aforementioned attribute correlation and attribute heterogeneity in a DNN.

Figure 4.2 Overview of the proposed de-biasing face (DebFace) network. DebFace is composed of three major blocks, i.e., a shared feature encoding block, a feature disentangling block, and a feature aggregation block. The solid arrows represent the forward inference, and the dashed arrows stand for adversarial training. During inference, either DebFace-ID (i.e., f_ID) or DemoID can be used for face matching, given the desired trade-off between biasness and accuracy.
While our main goal is to mitigate demographic bias in the face representation, we observe that demographic estimations are biased as well (see Fig. 4.5). How can we remove the bias of face recognition when the demographic estimations themselves are biased? Cook et al. [169] investigated this effect and found that the performance of face recognition is affected by multiple demographic covariates. We propose a de-biasing network, DebFace, that disentangles the representation into gender (DebFace-G), age (DebFace-A), race (DebFace-R), and identity (DebFace-ID) components, to decrease the bias of both face recognition and demographic estimation. Using adversarial learning, the proposed method is capable of jointly learning multiple discriminative representations while ensuring that each classifier cannot distinguish among classes through non-corresponding representations.

Though less biased, DebFace-ID loses demographic cues that are useful for identification. In particular, race and gender are two critical components that constitute face patterns. Hence, we desire to incorporate race and gender with DebFace-ID to obtain a more integrated face representation. We employ a light-weight fully-connected network to aggregate the representations into a face representation (DemoID) with the same dimensionality as DebFace-ID.

4.3.2.2 Network Architecture

Fig. 4.2 gives an overview of the proposed DebFace network. It consists of four components: the shared image-to-feature encoder E_Img; the four attribute classifiers (including gender C_G, age C_A, race C_R, and identity C_ID); the distribution classifier C_Distr; and the feature aggregation network E_Feat.
We assume access to N labeled training samples {(x^(i), y_g^(i), y_a^(i), y_r^(i), y_id^(i))}_{i=1}^N. Our approach takes an image x^(i) as the input of E_Img. The encoder projects x^(i) to its feature representation E_Img(x^(i)). The feature representation is then decoupled into four D-dimensional feature vectors: gender f_g^(i), age f_a^(i), race f_r^(i), and identity f_ID^(i), respectively. Next, each attribute classifier operates on the corresponding feature vector to correctly classify the target attribute by optimizing the parameters of both E_Img and the respective classifier C. For a demographic attribute with K categories, the learning objective L_C_Demo(x, y_Demo; E_Img, C_Demo) is the standard cross-entropy loss function. For the n-identity classification, we adopt AM-Softmax [170] as the objective function L_C_ID(x, y_id; E_Img, C_ID). To de-bias all of the feature representations, an adversarial loss L_Adv(x, y_Demo, y_id; E_Img, C_Demo, C_ID) is applied to the above four classifiers such that each of them will not be able to predict the correct labels when operating on irrelevant feature vectors. Specifically, given a classifier, the remaining three attribute feature vectors are imposed on it and attempt to mislead the classifier by only optimizing the parameters of E_Img. To further improve the disentanglement, we also reduce the mutual information among the attribute features by introducing a distribution classifier C_Distr. C_Distr is trained to identify whether an input representation is sampled from the joint distribution p(f_g, f_a, f_r, f_ID) or from the product of the marginal distributions p(f_g)p(f_a)p(f_r)p
(f_ID) via a binary cross-entropy loss L_C_Distr(x, y_Distr; E_Img, C_Distr), where y_Distr is the distribution label. Similar to the adversarial loss, a factorization objective function L_Fact(x, y_Distr; E_Img, C_Distr) is utilized to restrain C_Distr from distinguishing the real distribution, and thus minimizes the mutual information of the four attribute representations. Both the adversarial loss and the factorization loss are detailed in Sec. 4.3.2.3. Altogether, DebFace endeavors to minimize the joint loss:

L(x, y_Demo, y_id, y_Distr; E_Img, C_Demo, C_ID, C_Distr)
    = L_C_Demo(x, y_Demo; E_Img, C_Demo)
    + L_C_ID(x, y_id; E_Img, C_ID)
    + L_C_Distr(x, y_Distr; E_Img, C_Distr)
    + λ L_Adv(x, y_Demo, y_id; E_Img, C_Demo, C_ID)
    + ν L_Fact(x, y_Distr; E_Img, C_Distr),    (4.1)

where λ and ν are hyper-parameters determining how much the representation is decomposed and decorrelated in each training iteration.

The discriminative demographic features in DebFace-ID are weakened by removing demographic information. Fortunately, our de-biasing network preserves all pertinent demographic features in a disentangled way. Basically, we train another multilayer perceptron (MLP), E_Feat, to aggregate DebFace-ID and the demographic embeddings into a unified face representation, DemoID. Since age generally does not pertain to a person's identity, we only consider gender and race as the identity-informative attributes. The aggregated embedding, f_DemoID = E_Feat(f_ID, f_g, f_r), is supervised by an identity-based triplet loss:

L_E_Feat = (1/M) Σ_{i=1}^{M} [ ||f_DemoID_a^(i) − f_DemoID_p^(i)||_2^2 − ||f_DemoID_a^(i) − f_DemoID_n^(i)||_2^2 + α ]_+ ,    (4.2)

where {f_DemoID_a^(i), f_DemoID_p^(i), f_DemoID_n^(i)} is the i-th triplet consisting of an anchor, a positive, and a negative DemoID representation, M is the number of hard triplets in a mini-batch, [x]_+ = max(0, x), and α is the margin.
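The triplet loss of Eq. 4.2 can be sketched directly (a minimal numpy version; function and variable names are illustrative, not from the thesis):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.5):
    """Identity-based triplet loss as in Eq. 4.2 (hinge on squared distances).

    anchor/positive/negative: (M, d) arrays of DemoID embeddings for the
    M hard triplets in a mini-batch; alpha is the margin.
    """
    d_ap = np.sum((anchor - positive) ** 2, axis=1)   # ||f_a - f_p||_2^2
    d_an = np.sum((anchor - negative) ** 2, axis=1)   # ||f_a - f_n||_2^2
    return float(np.mean(np.maximum(d_ap - d_an + alpha, 0.0)))

a = np.array([[0.0, 0.0]])
p = np.array([[1.0, 0.0]])   # d_ap = 1
n = np.array([[2.0, 0.0]])   # d_an = 4
print(triplet_loss(a, p, n, alpha=0.5))   # max(1 - 4 + 0.5, 0) = 0.0
print(triplet_loss(a, p, n, alpha=4.0))   # max(1 - 4 + 4.0, 0) = 1.0
```

Triplets that already satisfy the margin (first call) contribute zero loss; only violating triplets (second call) produce a gradient.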
4.3.2.3 Adversarial Training and Disentanglement

As discussed in Sec. 4.3.2.2, the adversarial loss aims to minimize the task-independent information semantically, while the factorization loss strives to dwindle the interfering information statistically. We employ both losses to disentangle the representation extracted by E_Img. We introduce the adversarial loss as a means to learn a representation that is invariant in terms of certain attributes, such that a classifier trained on it cannot correctly classify those attributes using that representation. We take one of the attributes, e.g., gender, as an example to illustrate the adversarial objective. First of all, for a demographic representation f_Demo, we learn a gender classifier on f_Demo by optimizing the classification loss L_C_G(x, y_Demo; E_Img, C_G). Secondly, for the same gender classifier, we intend to maximize the chaos of the predicted distribution [171]. It is well known that a uniform distribution has the highest entropy and presents the most randomness. Hence, we train the classifier to predict a probability distribution as close as possible to a uniform distribution over the category space by minimizing the cross-entropy:

L_G_Adv(x, y_Demo, y_id; E_Img, C_G) = − Σ_{k=1}^{K_G} (1/K_G) ( log [ e^{C_G(f_Demo)_k} / Σ_{j=1}^{K_G} e^{C_G(f_Demo)_j} ] + log [ e^{C_G(f_ID)_k} / Σ_{j=1}^{K_G} e^{C_G(f_ID)_j} ] ),    (4.3)

where K_G is the number of gender categories,^1 and the ground-truth label is no longer a one-hot vector but a K_G-dimensional vector with all elements equal to 1/K_G. The above loss function corresponds to the dashed lines in Fig. 4.2. It strives for gender-invariance by finding a representation that makes the gender classifier C_G perform poorly. We minimize the adversarial loss by only updating the parameters in E_Img.

We further decorrelate the representations by reducing the mutual information across attributes.
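The uniform-target cross-entropy of Eq. 4.3 can be sketched for a single classifier head (a minimal numpy illustration, not the thesis's training code). Its minimum is log K, attained when the softmax output is exactly uniform; any confident prediction is penalized more.

```python
import numpy as np

def adversarial_uniform_loss(logits):
    """Cross-entropy between the classifier's softmax and a uniform target.

    Minimizing this (w.r.t. the encoder only) pushes the classifier's
    prediction on non-corresponding features toward 1/K per class, as in
    Eq. 4.3. logits: (K,) classifier outputs for one feature vector.
    """
    k = logits.shape[0]
    log_softmax = logits - np.log(np.sum(np.exp(logits)))
    return float(-np.sum(log_softmax) / k)

# a perfectly "confused" classifier attains the minimum, log(K)
uniform_logits = np.zeros(2)
print(np.isclose(adversarial_uniform_loss(uniform_logits), np.log(2)))  # True

# confident predictions are penalized more heavily
confident_logits = np.array([5.0, -5.0])
print(adversarial_uniform_loss(confident_logits) > np.log(2))           # True
```

In DebFace this term is summed over the non-corresponding feature vectors fed to each classifier, and only the encoder's parameters are updated by it.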
By definition, the mutual information is the relative entropy (KL divergence) between the joint distribution and the product distribution. To reduce the correlation, we add a distribution classifier C_Distr that is trained to perform a simple binary classification using L_{C_Distr}(x, y_Distr; E_Img, C_Distr) on samples f_Distr drawn from both the joint distribution and the product distribution. Similar to adversarial learning, we factorize the representations by tricking the classifier via the same samples, so that its predictions are close to random guesses:

$$\mathcal{L}_{Fact}(x, y_{Distr}; E_{Img}, C_{Distr}) = -\sum_{i=1}^{2}\frac{1}{2}\log\frac{e^{C_{Distr}(\mathbf{f}_{Distr})_i}}{\sum_{j=1}^{2} e^{C_{Distr}(\mathbf{f}_{Distr})_j}}. \quad (4.4)$$

In each mini-batch, we consider E_Img(x) as samples of the joint distribution p(f_g, f_a, f_r, f_ID). We randomly shuffle the feature vectors of each attribute across the mini-batch and re-concatenate them into 4 × 512-dim vectors, which are approximated as samples of the product distribution p(f_g)p(f_a)p(f_r)p(f_ID). During factorization, we only update E_Img to minimize the mutual information between the decomposed features.

¹In our case, K = 2, i.e., male and female.

Table 4.1 Statistics of training and testing datasets used in the paper.

Dataset | # of Images | # of Subjects | Gender | Age | Race | ID
CACD [172] | 163,446 | 2,000 | No | Yes | No | Yes
IMDB [173] | 460,723 | 20,284 | Yes | Yes | No | Yes
UTKFace [174] | 24,106 | - | Yes | Yes | Yes | No
AgeDB [175] | 16,488 | 567 | Yes | Yes | No | Yes
AFAD [176] | 165,515 | - | Yes | Yes | Yes^a | No
AAF [177] | 13,322 | 13,322 | Yes | Yes | No | Yes
FG-NET | 1,002 | 82 | No | Yes | No | Yes
RFW [4] | 665,807 | - | No | No | Yes | Partial
BUPT-Balancedface [5] | 1,251,430 | 28,000 | No | No | Yes | Yes
IMFDB-CVIT [178] | 34,512 | 100 | Yes | Age Groups | Yes* | Yes
Asian-DeepGlint [179] | 2,830,146 | 93,979 | No | No | Yes^a | Yes
MS-Celeb-1M [23] | 5,822,653 | 85,742 | No | No | No | Yes
PCSO [180] | 1,447,607 | 5,749 | Yes | Yes | Yes | Yes
LFW [27] | 13,233 | 5,749 | No | No | No | Yes
IJB-A [29] | 25,813 | 500 | Yes | Yes | Skin Tone | Yes
IJB-C [9] | 31,334 | 3,531 | Yes | Yes | Skin Tone | Yes

^a East Asian only. *Indian only. The last four columns indicate whether the dataset contains the corresponding label.
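The shuffling construction of the product-distribution samples can be sketched as follows (a NumPy illustration with made-up names; DebFace applies this per mini-batch):

```python
import numpy as np

def product_distribution_samples(f_g, f_a, f_r, f_id, rng=None):
    """Approximate samples of p(f_g)p(f_a)p(f_r)p(f_ID) by independently
    shuffling each attribute block across the mini-batch and re-concatenating.
    Each input is a (batch, d) array; the output is (batch, 4*d)."""
    rng = np.random.default_rng(rng)
    blocks = []
    for f in (f_g, f_a, f_r, f_id):
        idx = rng.permutation(f.shape[0])  # independent shuffle per attribute
        blocks.append(f[idx])
    return np.concatenate(blocks, axis=1)
```

Because each attribute block is permuted independently, the attribute sub-vectors of a re-concatenated sample come from different images, which breaks any dependence among them while preserving each marginal.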
4.3.3 Experiments

4.3.3.1 Datasets and Pre-processing

We utilize 15 total face datasets in this work for learning the demographic estimation models, the baseline face recognition model, and the DebFace model, as well as for their evaluation. To be specific, CACD [172], IMDB [173], UTKFace [174], AgeDB [175], AFAD [176], AAF [177], FG-NET [181], RFW [4], IMFDB-CVIT [178], Asian-DeepGlint [179], and PCSO [180] are the datasets for training and testing the demographic estimation models; MS-Celeb-1M [23], LFW [27], IJB-A [29], and IJB-C [9] are for learning and evaluating the face verification models. Tab. 4.1 reports the statistics of the training and testing datasets involved in all the experiments of both GAC and DebFace, including the total number of face images, the total number of subjects (identities), and whether the dataset contains annotations of gender, age, race, or identity (ID). All faces are detected by MTCNN [33]. Each face image is cropped and resized to 112 × 112 pixels using a similarity transformation based on the detected landmarks.

4.3.3.2 Implementation Details

DebFace is trained on a cleaned version of MS-Celeb-1M [6], using the ArcFace architecture [6] with 50 layers for the encoder E_Img. Since there are no demographic labels in MS-Celeb-1M, we first train three demographic attribute estimation models, for gender, age, and race, respectively. For age estimation, the model is trained on the combination of the CACD, IMDB, UTKFace, AgeDB, AFAD, and AAF datasets. The gender estimation model is trained on the same datasets except CACD, which contains no gender labels. We combine AFAD, RFW, IMFDB-CVIT, and PCSO for race estimation training. All three models use ResNet [50], with 34 layers for age and 18 layers for gender and race. We discuss the evaluation results of the demographic attribute estimation models in Sec. 4.5.

We predict the demographic labels of MS-Celeb-1M with the well-trained demographic models. Our DebFace is then trained on the re-labeled MS-Celeb-1M using SGD with a momentum of 0.9, a weight decay of 0.01, and a batch size of 256. The learning rate starts from 0.1 and drops to 0.0001 following the schedule at 8, 13, and 15 epochs. The dimensionality of the embedding layer of E_Img is 4 × 512, i.e.,
each attribute representation (gender, age, race, ID) is a 512-dim vector. We keep the hyper-parameter setting of AM-Softmax as in [6]: s = 64 and m = 0.5. The feature aggregation network E_Feat comprises two linear residual units with PReLU and BatchNorm in between. E_Feat is trained on MS-Celeb-1M by SGD with a learning rate of 0.01. The triplet loss margin α is 1.0. The disentangled features of gender, race, and identity are concatenated into a 3 × 512-dim vector, which is the input to E_Feat. The network is then trained to output a 512-dim representation for face recognition on biased datasets.

4.3.3.3 De-biasing Face Verification

Baseline: We compare DebFace-ID with a regular face representation model that has the same architecture as the shared feature encoder of DebFace. Referred to as BaseFace, this baseline model is also trained on MS-Celeb-1M, with a representation dimension of 512.

Figure 4.3 (a) BaseFace; (b) DebFace-ID. Face verification AUC (%) on each demographic cohort. The cohorts are chosen based on the three attributes, i.e., gender, age, and race. To fit the results into a 2D plot, we show the performance of males and females separately. Due to the limited number of face images in some cohorts, their results are gray cells.

To show the efficacy of DebFace-ID on bias mitigation, we evaluate the verification performance of DebFace-ID and BaseFace on faces from each demographic cohort. There are 48 total cohorts given the combination of demographic attributes, including 2 gender groups (male, female), 4 race groups³ (Black, White, East Asian, Indian), and 6 age groups (0-12, 13-18, 19-34, 35-44, 45-54, 55-100). We combine CACD, AgeDB, CVIT, and a subset of Asian-DeepGlint as the testing set. Overlapping identities among these datasets are removed. IMDB is excluded from the testing set due to its massive number of wrong ID labels. For datasets without certain demographic labels, we simply use the corresponding models to predict the labels. We report the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC). We define the degree of bias, termed biasness, as the standard deviation of performance across cohorts.
Fig. 4.3 shows the face verification results of BaseFace and DebFace-ID on each cohort. That is, for a particular face representation (e.g., DebFace-ID), we report its AUC on each cohort by putting the number in the corresponding cell. From these heatmaps, we observe that both DebFace-ID and BaseFace present bias in face verification: the performance on some cohorts is significantly worse, especially the cohorts of Indian females and elderly people. Compared to BaseFace, DebFace-ID exhibits less bias, and the differences in AUC are smaller, so its heatmap exhibits smoother edges. Fig. 4.4 shows the performance of face verification on 12 demographic cohorts. Both DebFace-ID and BaseFace present similar relative accuracies across cohorts. For example, both algorithms perform worse on the younger age cohorts than on adults, and the performance on the Indian cohort is significantly lower than on the other races. DebFace-ID decreases the bias by gaining discriminative face features for cohorts with fewer images, in spite of a reduction in performance on cohorts with more samples.

³To clarify, we consider two race groups, Black and White, and two ethnicity groups, East Asian and Indian. The word race denotes both race and ethnicity here.

Figure 4.4 (a) Gender; (b) Age; (c) Race. The overall face verification performance, AUC (%), on gender, age, and race.

4.3.3.4 De-biasing Demographic Attribute Estimation

Baseline: We further explore the bias of demographic attribute estimation and compare the demographic attribute classifiers of DebFace with baseline estimation models. We train three demographic estimation models, namely, gender estimation (BaseGender), age estimation (BaseAge), and race estimation (BaseRace), on the same training set as DebFace. For fairness, all three models have the same architecture as the shared layers of DebFace.

We combine the four datasets mentioned in Sec. 4.3.3.3 with IMDB as the global testing set.
As all demographic estimations are treated as classification problems, classification accuracy is used as the performance metric. As shown in Fig. 4.5, all demographic attribute estimations present significant bias. For gender estimation, both algorithms perform worse on the White and Black cohorts than on East Asian and Indian. In addition, the performance on young children is significantly worse than on adults. In general, the race estimation models perform better on the male cohort than on the female cohort. Compared to gender, race estimation shows higher bias in terms of age. Both the baseline methods and DebFace perform worse on cohorts aged between 13 and 44 than on other age groups.

Figure 4.5 (a) BaseGender; (b) DebFace-G; (c) BaseAge; (d) DebFace-A; (e) BaseRace; (f) DebFace-R. Classification accuracy (%) of demographic attribute estimation on faces of different cohorts, by DebFace and the baselines. For simplicity, we use DebFace-G, DebFace-A, and DebFace-R to denote the gender, age, and race classifiers of DebFace.

Table 4.2 Biasness of face recognition and demographic attribute estimation.

Method | Face Verification: All | Gender | Age | Race | Demographic Estimation: Gender | Age | Race
Baseline | 6.83 | 0.50 | 3.13 | 5.49 | 12.38 | 10.83 | 14.58
DebFace | 5.07 | 0.15 | 1.83 | 3.70 | 10.22 | 7.61 | 10.00

Similar to race estimation, age estimation also achieves better performance on males than on females. Moreover, the White cohort shows a dominant advantage over the other races in age estimation. In spite of the existing bias in demographic attribute estimation, the proposed DebFace is still able to mitigate the bias derived from algorithms. Compared to Figs. 4.5a, 4.5c, and 4.5e, the cells in Figs. 4.5b, 4.5d, and 4.5f present more uniform colors. We summarize the biasness of DebFace and the baseline models for both face recognition and demographic attribute estimation in Tab. 4.2. In general, we observe that DebFace substantially reduces the biasness for both tasks, and the reduction is larger for the task with larger biasness.
Figure 4.6 (a, c, e) BaseFace; (b, d, f) DebFace-ID. The distribution of face identity representations of BaseFace and DebFace. Both collections of feature vectors are extracted from images of the same dataset. Different colors and shapes represent different demographic attributes. Zoom in for details.

Figure 4.7 Reconstructed images using face and demographic representations. The first row shows the original face images. From the second row to the bottom, the face images are reconstructed from 2) BaseFace; 3) DebFace-ID; 4) DebFace-G; 5) DebFace-R; 6) DebFace-A. Zoom in for details.

4.3.3.5 Analysis of Disentanglement

We notice that DebFace still suffers from unequal performance across demographic groups. This is because there are latent variables besides the demographics, such as image quality or capture conditions, that could lead to biased performance. Such variables are difficult to control in pre-collected large face datasets. Within the framework of DebFace, the residual bias is also related to the degree of feature disentanglement: a full disentanglement is supposed to completely remove the demographic factors of bias. To illustrate the feature disentanglement of DebFace, we measure the demographic discriminative ability of the face representations by using these features to estimate gender, age, and race. Specifically, we first extract identity features of images from the testing set in Sec. 4.3.3.1 and split them into training and testing sets. Given the demographic labels, the face features are fed into a two-layer fully-connected network that learns to classify one of the demographic attributes. Tab. 4.3 reports the demographic classification accuracy on the testing set.
For all three demographic estimations, DebFace-ID yields much lower accuracies than BaseFace, indicating the decline of demographic information in DebFace-ID. We also plot the distribution of identity representations in the feature spaces of BaseFace and DebFace-ID. From the testing set in Sec. 4.3.3.3, we randomly select 50 subjects in each demographic group and one image per subject. BaseFace and DebFace-ID features are extracted from the selected image set and are then projected from 512-dim to 2-dim by t-SNE. Fig. 4.6 shows their t-SNE feature distributions. We observe that BaseFace presents clear demographic clusters, while the demographic clusters of DebFace-ID, as a result of disentanglement, mostly overlap with each other.

To visualize the disentangled feature representations of DebFace, we train decoders that reconstruct face images from the representations. Four face decoders are trained separately, one for each disentangled component, i.e., gender, age, race, and ID. In addition, we train another decoder to reconstruct faces from BaseFace for comparison. As shown in Fig. 4.7, both BaseFace and DebFace-ID maintain the identity features of the original faces, while DebFace-ID presents fewer demographic characteristics. No race or age, but only gender features can be observed on faces reconstructed from DebFace-G. Meanwhile, we can still recognize the race and age attributes on faces generated from DebFace-R and DebFace-A.

Table 4.3 Demographic classification accuracy (%) by face features.

Method | Gender | Race | Age
BaseFace | 95.27 | 89.82 | 78.14
DebFace-ID | 73.36 | 61.79 | 49.91

Table 4.4 Face verification accuracy (%) on the RFW dataset.

Figure 4.8 (a) BaseFace: Race; (b) DebFace-ID: Race; (c) BaseFace: Age; (d) DebFace-ID: Age. The percentage of falsely accepted cross-race or cross-age pairs at 1% FAR.
4.3.3.6 Face Verification on Public Testing Datasets

We report the performance of three different settings, using 1) BaseFace, the same baseline as in Sec. 4.3.3.3; 2) DebFace-ID; and 3) the fused representation DemoID. Table 4.5 reports face verification results on three public benchmarks: LFW, IJB-A, and IJB-C. On LFW, DemoID outperforms BaseFace while maintaining similar accuracy compared to SOTA algorithms. On IJB-A/C, DemoID outperforms all prior works except PFE [184]. Although DebFace-ID shows lower discrimination, its TAR at lower FARs on IJB-C is higher than that of BaseFace. To evaluate DebFace on a racially balanced testing dataset, RFW [4], and to compare with the work of [5], we train a DebFace model on the BUPT-Balancedface [5] dataset. The new model is trained to reduce racial bias by disentangling ID and race. Tab. 4.4 reports the verification results on RFW. While DebFace-ID gives a slightly lower face verification accuracy, it improves the biasness over [5].

We observe that DebFace-ID is less discriminative than BaseFace or DemoID, since demographics are essential components of face features. To understand the deterioration of DebFace, we analyze the effect of demographic heterogeneity on face verification by showing the tendency for a pair from two different demographic groups to produce a false accept error. For any two demographic cohorts, we count the falsely accepted pairs that are from different groups at 1% FAR. Fig. 4.8 shows the percentage of such falsely accepted demographic-heterogeneous pairs.

Table 4.5 Verification performance on LFW, IJB-A, and IJB-C.

Method | LFW (%) || Method | IJB-A (%) TAR @ 0.1% FAR | IJB-C TAR (%) @ 0.001% FAR | @ 0.01% FAR | @ 0.1% FAR
DeepFace+ [17] | 97.35 || Yin et al. [182] | 73.9 ± 4.2 | - | - | 69.3
CosFace [19] | 99.73 || Cao et al. [59] | 90.4 ± 1.4 | 74.7 | 84.0 | 91.0
ArcFace [6] | 99.83 || Multicolumn [183] | 92.0 ± 1.3 | 77.1 | 86.2 | 92.7
PFE [184] | 99.82 || PFE [184] | 95.3 ± 0.9 | 89.6 | 93.3 | 95.5
BaseFace | 99.38 || BaseFace | 90.2 ± 1.1 | 80.2 | 88.0 | 92.9
DebFace-ID | 98.97 || DebFace-ID | 87.6 ± 0.9 | 82.0 | 88.1 | 89.5
DemoID | 99.50 || DemoID | 92.2 ± 0.8 | 83.2 | 89.4 | 92.9
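The cross-demographic false-accept analysis behind Fig. 4.8 can be sketched as follows (NumPy, illustrative names; the exact thresholding convention is our assumption):

```python
import numpy as np

def cross_group_false_accept_rate(scores, same_id, groups, far=0.01):
    """Among impostor pairs falsely accepted at the given FAR, return the
    fraction whose two faces come from different demographic groups.
    `scores` are pair similarities, `same_id` marks genuine pairs, and
    `groups` is a (pairs, 2) array with the demographic label of each face."""
    impostor = ~same_id
    # choose the threshold so that `far` of impostor pairs score above it
    thr = np.quantile(scores[impostor], 1.0 - far)
    accepted = impostor & (scores > thr)
    if accepted.sum() == 0:
        return 0.0
    cross = groups[accepted, 0] != groups[accepted, 1]
    return float(cross.mean())
```

Running this per pair of cohorts (rather than globally) would give the per-cell percentages plotted in Fig. 4.8.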
Compared to BaseFace, DebFace exhibits more cross-demographic pairs that are falsely accepted, resulting in a performance decline on demographically biased datasets. Due to the reduction of demographic information, DebFace-ID is more susceptible to errors between demographic groups. In the sense of de-biasing, it is preferable to decouple demographic information from the identity features. However, if we prefer to maintain the overall performance across all demographics, we can still aggregate all the relevant information. It is an application-dependent trade-off between accuracy and de-biasing. DebFace balances this trade-off by generating both de-biased identity and de-biased demographic representations, which may be aggregated into DemoID if bias is less of a concern.

4.3.3.7 Distributions of Scores

We follow the work of [21], which investigates the effect of demographic homogeneity and heterogeneity on face recognition. We first randomly select images from the CACD, AgeDB, CVIT, and Asian-DeepGlint datasets, and extract the corresponding feature vectors using the BaseFace and DebFace models, respectively. Given their demographic attributes, we put those images into separate groups depending on whether their gender, age, and race are the same or different.

Figure 4.9 (a) BaseFace; (b) DebFace. Distributions of the similarity scores of the imposter pairs across homogeneous versus heterogeneous gender, age, and race categories, for BaseFace and DebFace.

For each group, a
fixed false alarm rate (the percentage of face pairs from different subjects that are falsely verified as being from the same subject) is set to 1%. Among the falsely verified pairs, we plot the top 10th percentile scores of the negative face pairs (pairs of face images from different subjects) given their demographic attributes. As shown in Fig. 4.9a and Fig. 4.9b, we observe that the similarities of DebFace are higher than those of BaseFace. One possible reason is that the demographic information is disentangled from the identity features of DebFace, increasing the overall pair-wise similarities between faces of different identities. In terms of de-biasing, DebFace also exhibits smaller differences in the score distributions with respect to the homogeneity and heterogeneity of demographics.

4.4 Mitigating Face Recognition Bias via Group Adaptive Classifier

In spite of the effectiveness of DebFace in mitigating demographic bias, it degrades the overall recognition performance as well. This motivates us to find another solution to this problem, such that the biasness can be reduced without impairing the average recognition performance.

Figure 4.10 (a) Our proposed group adaptive classifier (GAC) automatically chooses between a non-adaptive and an adaptive layer in a multi-layer network, where the latter uses demographic-group-specific kernels and attention. (b) Bias (1.11 → 0.60): compared to the baseline with the 50-layer ArcFace backbone, GAC improves face verification accuracy in most groups of the RFW dataset [4], especially under-represented groups, leading to mitigated FR bias. GAC reduces the biasness from 1.11 to 0.60.

In this section,
we introduce our second approach to mitigating face recognition bias, the group adaptive classifier (GAC). The main idea of GAC is to optimize the face representation learning for every demographic group in a single network, despite demographically imbalanced training data. Conceptually, we may categorize face features into two types of patterns: a general pattern that is shared by all faces, and a differential pattern that is relevant to demographic attributes. When the differential pattern of one specific demographic group dominates the training data, the network learns to predict identities mainly based on that pattern, since it is more convenient for minimizing the loss than using the other patterns; this biases the network towards faces of that specific group. One mitigation is to give the network more capacity to broaden its scope over the multiple face patterns of the different demographic groups. An unbiased FR model shall rely not only on the unique patterns for recognizing the different groups, but also on the general patterns of all faces for improved generalizability. Accordingly, as shown in Fig. 4.10, we propose GAC to explicitly learn these different feature patterns. GAC includes two modules: the adaptive layer and the automation module. The adaptive layer in GAC comprises adaptive convolution kernels and channel-wise attention maps, where each kernel and attention map tackles faces in one demographic group. We also introduce a new objective function to GAC, which diminishes the variation of the average intra-class distance between demographic groups.

Prior work on dynamic CNNs introduces adaptive convolutions either in every layer [185-187] or in manually specified layers [188-190]. In contrast, this work proposes an automation module to choose which layers to apply the adaptations to. As we observed, not all convolutional layers require adaptive kernels for bias mitigation (see Fig. 4.15a). At any layer of GAC, only kernels expressing high dissimilarity are considered demographic-adaptive kernels. For those with low dissimilarity, their average kernel is shared by all input images at that layer. Thus, the proposed network progressively learns to select the optimal structure for demographic-adaptive learning, so that both non-adaptive layers with shared kernels and adaptive layers are jointly learned in a unified network.
Contributions of this work are summarized as: 1) a new face recognition algorithm that reduces demographic bias and increases the robustness of representations for faces in every demographic group by adopting adaptive convolution and attention techniques; 2) a new adaptation mechanism that automatically determines the layers in which to employ dynamic kernels and attention maps; 3) the proposed method achieves SOTA performance on a demographically balanced dataset and three benchmarks.

4.4.1 Adaptive Neural Networks

Since the main technique applied by GAC is adaptive neural networks, we first review recent work related to adaptive learning. Three types of CNN-based adaptive learning techniques are related to our work: adaptive architectures, adaptive kernels, and attention mechanisms. Adaptive architectures design new performance-based neural functions or structures, e.g., neuron-selection hidden layers [191] and automatic CNN expansion for FR [192]. As CNNs advance many AI fields, prior works propose dynamic kernels to realize content-adaptive convolutions. Li et al. [193] propose a shape-driven kernel for facial trait recognition where each landmark-centered patch has a unique kernel. A convolution fusion for graph neural networks is introduced by [194], where a set of varying-size filters is used per layer. The works of [195] and [196] use a kernel selection scheme to automatically adjust the receptive field size based on the inputs. To better suit the input data, [197] splits the training data into clusters and learns an exclusive kernel per cluster.

Figure 4.11 (a) Adaptive Kernel; (b) Attention Map; (c) GAC. A comparison of approaches in adaptive CNNs.

Li et al. [198] introduce an adaptive CNN for object detection that transfers pre-trained CNNs to a target domain by selecting
useful kernels per layer. Alternatively, one may feed input images or features into a kernel function to dynamically generate convolution kernels [199-202]. Despite its effectiveness, such individual adaptation may not be suitable given the diversity of faces in demographic groups. Our work is most related to the side-information adaptive convolution [185], where in each layer a sub-network takes auxiliary information as input to generate the filter weights. We mainly differ in that GAC automatically learns where to use adaptive kernels in a multi-layer CNN (see Figs. 4.11a and 4.11c), and is thus more efficient and more capable of being applied to a deeper CNN.

As the human perception process naturally selects the most pertinent piece of information, attention mechanisms have been designed for a variety of tasks, e.g., detection [203], recognition [204], image captioning [205], tracking [206], pose estimation [190], and segmentation [188]. Typically, attention weights are estimated by feeding images or feature maps into a shared network, composed of convolutional and pooling layers [204, 207-209] or a multi-layer perceptron (MLP) [210-213]. Apart from feature-based attention, Hou et al. [189] propose a correlation-guided cross-attention map for few-shot classification, where the correlation between the class feature and the query feature generates the attention weights. The work of [186] introduces a cross-channel communication block to encourage information exchange across channels at the convolutional layer. To accelerate the channel interaction, Wang et al. [187] propose a 1D convolution across channels for attention prediction. Different from prior work, our attention maps are constructed from demographic information (see Figs. 4.11b and 4.11c), which improves the robustness of face representations in every demographic group.

4.4.2 Methodology

4.4.2.1 Overview

Our goal is to train an FR network that is impartial to individuals in different demographic groups. Unlike image-related variations, e.g.,
large-pose or low-resolution faces, which are harder to recognize, demographic attributes are subject-related properties with no apparent impact on the recognizability of identity, at least from a layman's perspective. Thus, an unbiased FR system should be able to obtain equally salient features for faces across demographic groups. However, due to imbalanced demographic distributions and inherent facial differences between groups, it was shown that certain groups achieve higher performance even with hand-crafted features [14]. Thus, it is impractical to extract features from different demographic groups that exhibit equal discriminability. Despite such disparity, an FR algorithm can still be designed to mitigate the difference in performance.

To this end, we propose a CNN-based group adaptive classifier that utilizes dynamic kernels and attention maps to boost FR performance in all the demographic groups considered here. Specifically, GAC has two main modules, an adaptive layer and an automation module. In an adaptive layer, face images or feature maps are convolved with a unique kernel for each demographic group, and multiplied with adaptive attention maps to obtain demographic-differential features for faces in a certain group. The automation module determines in which layers of the network the adaptive kernels and attention maps should be applied. As shown in Fig. 4.12, given an aligned face and its identity label y_ID, a pre-trained demographic classifier first estimates its demographic attribute y_Demo. With y_Demo, the image is then fed into a recognition network with multiple demographic adaptive layers to estimate its identity. In the following, we present these two modules.

Figure 4.12 Overview of the proposed GAC for mitigating FR bias. GAC contains two major modules: the adaptive layer and the automation module. The adaptive layer consists of adaptive kernels and attention maps. The automation module is employed to decide whether a layer should be adaptive or not.

4.4.2.2 Adaptive Layer

Adaptive Convolution. For a standard convolution in a CNN, an image or feature map from the previous layer X ∈ R^{c×h_X×w_X} is convolved with a single kernel matrix K ∈ R^{k×c×h×w}, where c is the number of input channels, k the number of filters, h_X and w_X the input size, and h and w the
filter size. Such an operation shares the kernel with every input going through the layer, and is thus agnostic to demographic content, resulting in limited capacity to represent minority groups. To mitigate the bias in convolution, we introduce a trainable matrix of kernel masks M ∈ R^{n×c×h×w}, where n is the number of demographic groups. In the forward pass, the demographic label y_Demo and the kernel mask matrix M are fed into the adaptive convolutional layer to generate demographic-adaptive filters. Let K_i ∈ R^{c×h×w} denote the i-th channel of the shared filter. The i-th channel of the adaptive filter for group y_Demo is:

$$\mathbf{K}_i^{y_{Demo}} = \mathbf{K}_i \odot \mathbf{M}^{y_{Demo}}, \quad (4.5)$$

where M^{y_Demo} ∈ R^{c×h×w} is the kernel mask of group y_Demo, and ⊙ denotes element-wise multiplication. The i-th channel of the output feature map is then Z_i = f(X * K_i^{y_Demo}), where * denotes convolution and f(·) is the activation. Unlike a conventional convolution, samples in every demographic group have a unique kernel K^{y_Demo}.

Adaptive Attention. Each channel filter in a CNN plays an important role in every dimension of the final representation, and can be viewed as a semantic pattern detector [205]. In the adaptive convolution, however, the values of a kernel mask are broadcast along the channel dimension, meaning that the weight selection is spatially varied but channel-wise joint. Hence, we introduce a channel-wise attention mechanism to enhance the face features that are demographic-adaptive. First, a trainable matrix of channel attention maps A ∈ R^{n×k} is initialized in every adaptive attention layer. Given y_Demo and the current feature map Z ∈ R^{k×h_Z×w_Z}, where h_Z and w_Z are the height and width of Z, the i-th channel of the new feature map is calculated by:

$$\mathbf{Z}_i^{y_{Demo}} = \mathrm{Sigmoid}(A_{y_{Demo},i}) \cdot \mathbf{Z}_i, \quad (4.6)$$

where A_{y_Demo,i} is the entry in the y_Demo-th row and i-th column of A, for demographic group y_Demo. In contrast to the adaptive convolution, the elements of each demographic attention map A_{y_Demo} diverge in a channel-wise manner, while the single attention weight A_{y_Demo,i} is spatially shared by the entire matrix Z_i ∈ R^{h_Z×w_Z}. The two adaptive matrices, M and A, are jointly tuned with all the other parameters, supervised by the classification loss.
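Eqs. (4.5) and (4.6) amount to a per-group mask on the shared kernel and a per-group, per-channel gate on the feature map; a minimal NumPy sketch (shapes follow the text, function names are ours):

```python
import numpy as np

def adapt_kernel(shared_kernel, kernel_masks, y_demo):
    """Demographic-adaptive filter of Eq. (4.5): element-wise product of the
    shared kernel (k, c, h, w) with the mask (c, h, w) of group `y_demo`.
    The masks are trainable in GAC; here they are plain arrays."""
    return shared_kernel * kernel_masks[y_demo]  # broadcast over the k filters

def adapt_attention(feature_map, attention_maps, y_demo):
    """Channel-wise adaptive attention of Eq. (4.6): each of the k channels of
    the feature map (k, h, w) is scaled by a sigmoid-squashed per-group,
    per-channel weight from the (n, k) attention matrix."""
    weights = 1.0 / (1.0 + np.exp(-attention_maps[y_demo]))  # (k,)
    return feature_map * weights[:, None, None]
```

Note how the mask varies spatially but is shared across channels, while the attention weight varies per channel but is shared spatially, matching the complementary roles described above.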
Unlike dynamic CNNs [185], where additional networks are engaged to produce input-variant kernels or attention maps, our adaptiveness is yielded by a simple thresholding function directly indexed by the demographic group, with no auxiliary networks. Although the kernel network in [185] can generate continuous kernels without enlarging the parameter space, further encoding is required if the side inputs to the kernel network are discrete variables. Our approach, in contrast, divides the kernels into clusters so that the branch parameter learning can stick to a specific group without interference from individual uncertainties, making it suitable for discrete domain adaptation. Further, the adaptive kernel masks in GAC are more efficient in terms of the number of additional parameters. Compared to a non-adaptive layer, the number of additional parameters of GAC is n × c × h × w, while that of [185] is s × k × c × h × w if the kernel network is a one-layer MLP, where s is the dimension of the input side information. Thus, for one adaptive layer, [185] has s × k / n times more parameters than ours, which can be substantial given the typically large value of k, the number of filters.

4.4.2.3 Automation Module

Though faces in different demographic groups are adaptively processed by various kernels and attention maps, it is inefficient to use such adaptations in every layer of a deep CNN. To relieve the burden of unnecessary parameters and to avoid empirical trimming, we adopt a similarity fusion process to automatically determine the adaptive layers. Since the same fusion scheme can be applied to both types of adaptation, we take the adaptive convolution as an example to illustrate this automatic scheme.
First, a matrix composed of n kernel masks is initialized in every convolutional layer. As training continues, each kernel mask is updated independently to reduce the classification loss for its demographic group. Second, we reshape the kernel masks into 1D vectors V = [v_1, v_2, ..., v_n], where v_i ∈ R^l, l = c × h × w, is the kernel mask of the i-th demographic group. Next, we compute the Cosine similarity between every two kernel vectors,

$$\theta_{ij} = \frac{\mathbf{v}_i}{\|\mathbf{v}_i\|} \cdot \frac{\mathbf{v}_j}{\|\mathbf{v}_j\|}, \quad 1 \le i, j \le n,$$

and the average of all pair-wise similarities, $\bar{\theta} = \frac{2}{n(n-1)}\sum_{i<j}\theta_{ij}$. If the dissimilarity 1 − θ̄ is lower than a pre-defined threshold τ, the kernel parameters in this layer reveal a demographic-agnostic property; hence, we merge the n kernels into a single kernel by averaging along the group dimension. By this definition, a lower τ implies more adaptive layers: given the array of per-layer values {1 − θ̄_i} for the t convolutional layers, sorted from smallest to largest, the layers whose 1 − θ̄_i values are larger than τ remain adaptive, so when τ decreases, more layers are adaptive. In the subsequent training, the merged kernel can still be updated separately for each demographic group, as the kernel may become demographic-adaptive again in later epochs. We monitor the similarity trend of the adaptive kernels in each layer until θ̄ is stable.
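The per-layer similarity-fusion check of the automation module can be sketched as follows (NumPy; writing the dissimilarity as 1 − θ̄ is our assumption, consistent with the text):

```python
import numpy as np

def should_stay_adaptive(kernel_masks, tau=0.2):
    """Automation-module check for one layer: flatten the n kernel masks into
    vectors, average the pairwise cosine similarity theta_bar, and keep the
    layer adaptive only if the dissimilarity (1 - theta_bar) exceeds tau;
    otherwise the n kernels would be merged into their average."""
    V = kernel_masks.reshape(kernel_masks.shape[0], -1)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    sims = V @ V.T
    n = V.shape[0]
    theta_bar = (sims.sum() - n) / (n * (n - 1))  # mean over the i != j pairs
    return (1.0 - theta_bar) > tau
```

Identical masks give θ̄ = 1 (merge), while mutually orthogonal masks give θ̄ = 0 (stay adaptive), which is the intuition behind the threshold.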
4.4.2.4 De-biasing Objective Function

Apart from the objective function for face identity classification, we also adopt a regression loss function to narrow the gap in intra-class distance between the demographic groups. Let g(·) denote the inference function of GAC, and I_ijg the i-th image of subject j in group g. The feature representation of image I_ijg is then r_ijg = g(I_ijg, w), where w denotes the GAC parameters. Assuming the feature distribution of each subject is a Gaussian with identity covariance matrix (a hyper-sphere), we utilize the average Euclidean distance to the subject center as the intra-class distance of each subject. In particular, we first compute the center point of each identity-sphere:

$$\boldsymbol{\mu}_{jg} = \frac{1}{N}\sum_{i=1}^{N} g(I_{ijg}, \mathbf{w}), \quad (4.7)$$

where N is the total number of face images of subject j. The average intra-class distance of subject j is:

$$\mathrm{Dist}_{jg} = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{r}_{ijg} - \boldsymbol{\mu}_{jg})^{T} (\mathbf{r}_{ijg} - \boldsymbol{\mu}_{jg}). \quad (4.8)$$

We then compute the intra-class distance over all subjects in group g as Dist_g = (1/Q) Σ_{j=1}^{Q} Dist_jg, where Q is the total number of subjects in group g. This allows us to lower the difference in intra-class distance by:

$$\mathcal{L}_{bias} = \frac{\lambda}{Qn}\sum_{g=1}^{n}\sum_{j=1}^{Q}\left|\mathrm{Dist}_{jg} - \frac{1}{n}\sum_{g'=1}^{n}\mathrm{Dist}_{g'}\right|, \quad (4.9)$$

where λ is the coefficient of the de-biasing objective.

4.4.3 Experiments

Datasets: Our bias study uses the RFW dataset [4] for testing and the BUPT-Balancedface dataset [5] for training. RFW consists of faces in four race/ethnic groups: White, Black, East Asian, and South Asian⁴. Each group contains ~10K images of 3K individuals for face verification. BUPT-Balancedface contains 1.3M images of 28K celebrities and is approximately race-balanced, with 7K identities per race. Other than race, we also study gender bias. We combine IMDB [173], UTKFace [174], AgeDB [175], AAF [177], and AFAD [176] to train a gender classifier, which estimates the gender of the faces in RFW and BUPT-Balancedface. The statistics of these datasets are reported in Tab. 4.1. All face images are cropped and resized to 112 × 112 pixels via landmarks detected by RetinaFace [34].
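Eqs. (4.7)-(4.9) can be sketched end to end as follows (NumPy; the absolute-value penalty on the deviation in Eq. (4.9) is our reading of the reconstructed formula, and all names are illustrative):

```python
import numpy as np

def debias_loss(features, subjects, groups, lam=0.1):
    """Intra-class-distance de-biasing loss of Eqs. (4.7)-(4.9): penalize each
    subject's average squared distance to its class center (Eq. 4.8, with the
    center from Eq. 4.7) for deviating from the mean intra-class distance over
    groups. `features` is (num_imgs, d); `subjects`/`groups` are labels."""
    subj_dist, subj_group = [], []
    for s in np.unique(subjects):
        idx = subjects == s
        center = features[idx].mean(axis=0)                          # Eq. (4.7)
        subj_dist.append(np.mean(np.sum((features[idx] - center) ** 2, axis=1)))  # Eq. (4.8)
        subj_group.append(groups[idx][0])
    subj_dist, subj_group = np.array(subj_dist), np.array(subj_group)
    group_means = np.array([subj_dist[subj_group == g].mean()
                            for g in np.unique(subj_group)])
    target = group_means.mean()                  # (1/n) sum_g Dist_g
    return lam * np.mean(np.abs(subj_dist - target))                 # Eq. (4.9)
```

The loss is zero exactly when every subject's intra-class spread equals the cross-group average, which is the equalization the objective is after.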
⁴RFW [4] uses Caucasian, African, Asian, and Indian to name its demographic groups. We adopt these groups and rename them to White, Black, East Asian, and South Asian for a clearer race/ethnicity definition.

Implementation Details: We train a baseline network and GAC on BUPT-Balancedface, using the 50-layer ArcFace architecture [6]. The classification loss is the additive Cosine margin of CosFace [19], with scale and margin s = 64 and m = 0.5. Training is optimized by SGD with a batch size of 256. The learning rate starts from 0.1 and drops to 0.0001 following the schedule at 8, 13, and 15 epochs for the baseline, and 5, 17, and 19 epochs for GAC. We set λ = 0.1 for the intra-distance de-biasing. τ = 0.2 is chosen for the automatic adaptation in GAC. Our FR models are trained to extract a 512-dim representation. Our demographic classifier uses an 18-layer ResNet [50]. Comparing GAC and the baseline, the average feature extraction speed per image on an Nvidia 1080Ti GPU is 1.4 ms and 1.1 ms, and the number of model parameters is 44.0M and 43.6M, respectively.

Performance Metrics: Common group fairness criteria such as the demographic parity distance [150] are improper for evaluating the fairness of learnt representations, since they are designed to measure independence properties of random variables; in FR, however, the sensitive demographic characteristics are tied to identities, making these two variables correlated. The NIST report uses the false negatives and false positives of each demographic group to measure fairness [7]. Instead of plotting false negatives vs. false positives, we adopt a compact quantitative metric, i.e., the standard deviation (STD) of the performance in different demographic groups, previously introduced in [5, 74] and called biasness. As bias is considered a systematic error of the estimated values compared to the actual values, here we assume the average performance to be the actual value. For each demographic group, its biasness is the error between the average and the performance on that demographic group. The overall biasness is the expectation of all group errors, which is the STD of performance across groups. We also report the average accuracy (Avg) to show the overall FR performance.
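The biasness metric as defined here is just the (population) standard deviation of per-group accuracy around the mean; as a sketch:

```python
import numpy as np

def biasness(group_accuracies):
    """Return (average accuracy, biasness), where biasness is the population
    standard deviation of the per-group performance around its mean."""
    accs = np.asarray(group_accuracies, dtype=float)
    return float(accs.mean()), float(accs.std())  # np.std defaults to ddof=0
```

For instance, the GAC row of Tab. 4.6 (96.20, 94.77, 94.87, 94.98) yields Avg ≈ 95.21 and STD ≈ 0.58, matching the table; note the sample STD (ddof = 1) would give ≈ 0.67 instead, so the population form is the one used.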
4.4.3.1 Results on RFW Protocol

We follow the RFW face verification protocol with 6K pairs per race/ethnicity. The models are trained on BUPT-Balancedface with ground-truth race and identity labels.

Compare with SOTA. We compare GAC with four SOTA algorithms on the RFW protocol, namely ACNN [185], RL-RBN [5], PFE [184], and DebFace [74]. Since the approach in ACNN [185] is related to GAC, we re-implement it and apply it to the bias mitigation problem. First, we train a race classifier with the cross-entropy loss on BUPT-Balancedface. The softmax output of our race classifier is then fed to a filter manifold network (FMN) to generate adaptive filter weights; here, FMN is a two-layer MLP with a ReLU in between. Similar to GAC, race probabilities are treated as auxiliary information for face representation learning. We also compare with the SOTA approach PFE [184] by training it on BUPT-Balancedface. As shown in Tab. 4.6, GAC is superior to SOTA w.r.t. average performance and feature fairness. Compared to the kernel masks in GAC, the FMN in ACNN [185] contains more trainable parameters, and applying it to each convolutional layer is prone to overfitting. In fact, the layers that are adaptive in GAC ($\tau = 0.2$) are set to be the FMN-based convolutions in ACNN. As the race data is a four-element input in our case, using extra kernel networks adds complexity to the FR network, which degrades verification performance. Even though PFE performs best on standard benchmarks (Tab. 4.15), it still exhibits high biasness.

Table 4.6 Performance comparison with SOTA on the RFW protocol [4]. The results marked by (*) are directly copied from [5].

Method         White  Black  East Asian  South Asian  Avg(↑)  STD(↓)
RL-RBN [5]     96.27  95.00  94.82       94.68        95.19   0.63
ACNN [185]     96.12  94.00  93.67       94.55        94.58   0.94
PFE [184]      96.38  95.17  94.27       94.60        95.11   0.93
ArcFace [6]    96.18  94.67  93.72       93.98        94.64   0.96
CosFace [19]   95.12  93.93  92.98       92.93        93.74   0.89
DebFace [74]   95.95  93.67  94.33       94.78        94.68   0.83
GAC            96.20  94.77  94.87       94.98        95.21   0.58
Our GAC outperforms PFE on RFW in both biasness and average performance. Compared to DebFace [74], in which demographic attributes are disentangled from the identity representation, GAC achieves higher verification performance by optimizing the classification for each demographic group, with lower biasness as well.

To further demonstrate the superiority of GAC over the baseline model in terms of bias, we plot Receiver Operating Characteristic (ROC) curves showing the True Acceptance Rate (TAR) at various values of the False Acceptance Rate (FAR). Fig. 4.13 (ROC of (a) the baseline and (b) GAC evaluated on all pairs of RFW) shows the ROC performance of GAC and the baseline model on RFW. The curves of the demographic groups generated by GAC exhibit smaller gaps in TAR at every FAR, which demonstrates the de-biasing capability of GAC. Fig. 4.14 shows pairs of false positives (two faces falsely verified as the same identity) and false negatives in the RFW dataset.

Ablation on Adaptive Strategies. To investigate the efficacy of our network design, we conduct three ablation studies: adaptive mechanisms, number of convolutional layers, and demographic information. For adaptive mechanisms, since deep feature maps contain both spatial and channel-wise information, we study the relationship among adaptive kernels, spatial and channel-wise attention, and their impact on bias mitigation. We also study the impact of $\tau$ in our automation module. Apart from the baseline and GAC, we ablate eight variants: (1) GAC-Channel: channel-wise attention for race-differential features; (2) GAC-Kernel: adaptive convolution with race-specific kernels; (3) GAC-Spatial: only spatial attention added to the baseline; (4) GAC-CS: both channel-wise and spatial attention; (5) GAC-CSK: adaptive convolution combined with spatial and channel-wise attention; (6, 7, 8) GAC-($\tau$): $\tau$ set to 0, 0.1, and 0.2, respectively.
From Tab. 4.7 we make several observations: (1) The baseline model is the most biased across race groups. (2) Spatial attention mitigates race bias at the cost of verification accuracy, and is less effective at learning fair features than the other adaptive techniques. This is probably because spatial contents, especially local layout information, reside only at earlier CNN layers, where the spatial dimensions are gradually reduced by later convolutions and poolings; semantic details such as demographic attributes are therefore hardly encoded spatially. (3) Compared to GAC, combining adaptive kernels with both spatial and channel-wise attention increases the number of parameters, lowering the performance. (4) As $\tau$ determines the number of adaptive layers in GAC, it has a great impact on performance: a small $\tau$ may introduce redundant adaptive layers, while the adaptation layers may lack capacity if $\tau$ is too large.

(Figure 4.14: 8 false positive and false negative pairs on RFW given by the baseline but successfully verified by GAC.)

Table 4.7 Ablation of adaptive strategies on the RFW protocol [4].

Method        White  Black  East Asian  South Asian  Avg(↑)  STD(↓)
Baseline      96.18  93.98  93.72       94.67        94.64   1.11
GAC-Channel   95.95  93.67  94.33       94.78        94.68   0.83
GAC-Kernel    96.23  94.40  94.27       94.80        94.93   0.78
GAC-Spatial   95.97  93.20  93.67       93.93        94.19   1.06
GAC-CS        96.22  93.95  94.32       95.12        94.65   0.87
GAC-CSK       96.18  93.58  94.28       94.83        94.72   0.95
GAC-(τ=0)     96.18  93.97  93.88       94.77        94.70   0.92
GAC-(τ=0.1)   96.25  94.25  94.83       94.72        95.01   0.75
GAC-(τ=0.2)   96.20  94.77  94.87       94.98        95.21   0.58

Ablation on Depths and Demographic Labels. Both the adaptive layers and the de-biasing loss in GAC can be applied to a CNN of any depth. In this ablation, we train both the baseline and GAC ($\lambda = 0.1$, $\tau = 0.2$) in the ArcFace architecture with three different numbers of layers: 34, 50, and 100. As the training of GAC relies on demographic information, errors and bias in the demographic labels may affect the bias reduction of GAC. Thus, we also ablate with different demographic information: (1) ground-truth: the race/ethnicity labels provided by RFW; (2) estimated: the labels predicted by a pre-trained race estimation model; (3) random: a demographic label randomly assigned to each face.

Table 4.8 Ablation of CNN depths and demographics on the RFW protocol [4].

Method            White  Black  East Asian  South Asian  Avg(↑)  STD(↓)
Number of Layers
ArcFace-34        96.13  93.15  92.85       93.03        93.78   1.36
GAC-ArcFace-34    96.02  94.12  94.10       94.22        94.62   0.81
ArcFace-50        96.18  93.98  93.72       94.67        94.64   1.11
GAC-ArcFace-50    96.20  94.77  94.87       94.98        95.21   0.58
ArcFace-100       96.23  93.83  94.27       94.80        94.78   0.91
GAC-ArcFace-100   96.43  94.53  94.90       95.03        95.22   0.72
Race/Ethnicity Labels
Ground-truth      96.20  94.77  94.87       94.98        95.21   0.58
Estimated         96.27  94.40  94.32       94.77        94.94   0.79
Random            95.95  93.10  94.18       94.82        94.50   1.03

(Figure 4.15: (a) For each of three values of $\tau$ in automatic adaptation, we show the average similarities of pair-wise demographic kernel masks, i.e., $\theta$, at layers 1-48 (y-axis) over training steps 1-15 (x-axis). The number of adaptive layers in the three cases, i.e., $\sum_{1}^{48} \mathbb{1}(\theta > \tau)$ at the 15th step, is 12, 8, and 2, respectively. (b) With two race groups (White and Black in PCSO [14]) and two models (baseline and GAC), for each of the four combinations we compute the pair-wise correlation of face representations between any two of 1K subjects of the same race and plot the histogram of correlations. GAC reduces the difference/bias between the two distributions.)

As shown in Tab. 4.8, compared to the baselines, GAC successfully reduces the STD at every number of layers. The model with the fewest layers presents the most bias, and the bias reduction by GAC is largest there as well. Noise and bias in the demographic labels do, however, impair the performance of GAC: with estimated demographics, the biasness is higher than that of the model with ground-truth supervision, and the model trained with random demographics has the highest biasness. Even so, using estimated attributes during testing still improves fairness in face recognition compared to the baseline, indicating the efficacy of GAC even in the absence of ground-truth labels.

Ablation on $\lambda$. We use $\lambda$ to control the weight of the de-biasing loss. Tab. 4.9 reports the results of GAC trained with different values of $\lambda$; when $\lambda = 0$, the de-biasing loss is removed from training. The results indicate that a larger $\lambda$ leads to lower biasness at the cost of overall accuracy.

Table 4.9 Ablation on $\lambda$ on the RFW protocol (%).

λ     White  Black  East Asian  South Asian  Avg(↑)  STD(↓)
0     96.23  94.65  94.93       95.12        95.23   0.60
0.1   96.20  94.77  94.87       94.98        95.21   0.58
0.5   94.89  94.00  93.67       94.55        94.28   0.47

Table 4.10 Verification accuracy (%) of 5-fold cross-validation on 8 groups of RFW [4].

Method      Gender  White       Black       East Asian  South Asian  Avg(↑)      STD(↓)
Baseline    Male    97.49±0.08  96.94±0.26  97.29±0.09  97.03±0.13   96.96±0.03  0.69±0.04
            Female  97.19±0.10  97.93±0.11  95.71±0.11  96.01±0.08
AL+Manual   Male    98.57±0.10  98.05±0.17  98.50±0.12  98.36±0.02   98.09±0.05  0.66±0.07
            Female  98.12±0.18  98.97±0.13  96.83±0.19  97.33±0.13
GAC         Male    98.75±0.04  98.18±0.20  98.55±0.07  98.31±0.12   98.19±0.06  0.56±0.05
            Female  98.26±0.16  98.80±0.15  97.09±0.12  97.56±0.10

Ablation on Automation Module. We also ablate GAC with two variants to show the efficiency of its automation module: i) Ada-All, in which all convolutional layers are adaptive, and ii) Ada-8, in which the same 8 layers as GAC are set to be adaptive from the beginning of the training process, with no automation module (our best GAC model has 8 adaptive layers). As shown in Tab. 4.11, with the automation module, GAC achieves higher average accuracy and lower biasness than the other two models.

Table 4.11 Ablations on the automation module on the RFW protocol (%).

Method   White  Black  East Asian  South Asian  Avg(↑)  STD(↓)
Ada-All  93.22  90.95  91.32       92.12        91.90   0.87
Ada-8    96.25  94.40  94.35       95.12        95.03   0.77
GAC      96.20  94.77  94.87       94.98        95.21   0.58

(Figure 4.16: The first row shows the average faces of different groups in RFW (East Asian, Black, White, and South Asian, female and male). The next two rows show gradient-weighted class activation heatmaps [15] at the 43rd convolutional layer of GAC and the baseline. The higher diversity of heatmaps in GAC shows the variability of its parameters across groups.)

4.4.3.2 Results on Gender and Race Groups

Table 4.12 Statistics of dataset folds in the cross-validation experiment.

Fold  White              Black              East Asian         South Asian
      Subjects  Images   Subjects  Images   Subjects  Images   Subjects  Images
1     1,991     68,159   1,999     67,880   1,898     67,104   1,996     57,628
2     1,991     67,499   1,999     65,736   1,898     66,258   1,996     57,159
3     1,991     66,091   1,999     65,670   1,898     67,696   1,996     56,247
4     1,991     66,333   1,999     67,757   1,898     65,341   1,996     57,665
5     1,994     68,597   1,999     67,747   1,898     68,763   2,000     56,703

We now extend the demographic attributes to both gender and race. First, we train two classifiers that predict the gender and race/ethnicity of a face image. The classification accuracy of gender and

Table 4.13 Verification (%) on gender groups of IJB-C (TAR@0.1% FAR).
Model     Male   Female  Avg(↑)  STD(↓)
Baseline  89.72  79.57   84.64   5.08
GAC       88.25  83.74   86.00   2.26

race/ethnicity is 85% and 81%⁵, respectively. These fixed classifiers are then affiliated with GAC to provide demographic information for learning adaptive kernels and attention maps. We merge BUPT-Balancedface and RFW, and split the subjects into 5 sets for each of the 8 demographic groups. In 5-fold cross-validation, each time a model is trained on 4 sets and tested on the remaining set. Tab. 4.12 reports the statistics of each data fold for the cross-validation experiment on the BUPT-Balancedface and RFW datasets.

Here we demonstrate the efficacy of the automation module of GAC. We compare to a manually designed scheme (AL+Manual) that adds adaptive kernels and attention maps to a subset of layers. Specifically, the first block in every residual unit is chosen to be an adaptive convolution layer, and channel-wise attention is applied to the feature map output by the last block in each residual unit. As we use 4 residual units and each block has 2 convolutional layers, the manual scheme involves 8 adaptive convolutional layers and 4 groups of channel-wise attention maps. As shown in Tab. 4.10, automatic adaptation is more effective at enhancing the discriminability and fairness of face representations. Figure 4.15a shows how the dissimilarity of kernel masks in the convolutional layers changes over training under three thresholds $\tau$; a lower $\tau$ results in more adaptive layers. The layers determined to be adaptive vary both across layers (vertically) and over training time (horizontally), which shows the importance of our automatic mechanism.

Since IJB-C also provides gender labels, we evaluate our GAC-gender model (see Sec. 4.2) on IJB-C as well. Specifically, we compute the verification TAR at 0.1% FAR on pairs of female faces and pairs of male faces, respectively. Tab. 4.13 reports the TAR@0.1% FAR on

⁵ This seemingly low accuracy is mainly due to the large dataset we assembled for training and testing the gender/race classifiers. Our demographic classifier has been shown to perform comparably to SOTA on common benchmarks.
While demographic estimation errors impact the training, testing, and evaluation of bias mitigation algorithms, the evaluation is of the most concern, as demographic label errors may greatly affect the biasness calculation. Future development may therefore include either manually cleaning the labels or designing a biasness metric robust to label errors.

Table 4.14 Verification accuracy (%) on the RFW protocol [4] with varying race/ethnicity distribution in the training set.

Training Ratio  White  Black  East Asian  South Asian  Avg(↑)  STD(↓)
7:7:7:7         96.20  94.77  94.87       94.98        95.21   0.58
5:7:7:7         96.53  94.67  94.55       95.40        95.29   0.79
3.5:7:7:7       96.48  94.52  94.45       95.32        95.19   0.82
1:7:7:7         95.45  94.28  94.47       95.13        94.83   0.48
0:7:7:7         92.63  92.27  92.32       93.37        92.65   0.44

gender groups of IJB-C. The biasness of GAC remains lower than that of the baseline on the gender groups of IJB-C.

4.4.3.3 Analysis on Intrinsic Bias and Data Bias

For all the algorithms listed in Tab. 4.6, the performance is higher on the White group than on the other three groups, even though all the models are trained on a demographically balanced dataset, BUPT-Balancedface [5]. In this section, we further investigate the intrinsic bias of face recognition between demographic groups and the impact of data bias in the training set. Are non-White faces inherently more difficult for existing algorithms to recognize? Or are the face images in BUPT-Balancedface (the training set) and RFW [4] (the testing set) biased towards the White group?
To this end, we train our GAC network using training sets with different race/ethnicity distributions and evaluate them on RFW. In total, we conduct four experiments, in which we gradually reduce the total number of subjects in the White group of the BUPT-Balancedface dataset. To construct a new training set, subjects from the non-White groups in BUPT-Balancedface remain the same, while a subset of subjects is randomly picked from the White group. As a result, the ratios between the non-White groups stay the same, and the ratios of White, Black, East Asian, and South Asian are {5:7:7:7}, {3.5:7:7:7}, {1:7:7:7}, and {0:7:7:7} in the four experiments, respectively. In the last setting, White subjects are completely removed from the training set.

Tab. 4.14 reports the face verification accuracy on RFW of the models trained with different race/ethnicity distributions. For comparison, we also include our results on the balanced dataset (ratio {7:7:7:7}), where all images in BUPT-Balancedface are used for training.

Table 4.15 Verification performance on LFW, IJB-A, and IJB-C. [Key: Best, Second, Third Best]

Method           LFW(%)   Method             IJB-A(%)     IJB-C@FAR(%)
                                             0.1% FAR     0.001%  0.01%  0.1%
DeepFace+ [17]   97.35    Yin et al. [182]   73.9±4.2     -       -      69.3
CosFace [19]     99.73    Cao et al. [59]    90.4±1.4     74.7    84.0   91.0
ArcFace [6]      99.83    Multicolumn [183]  92.0±1.3     77.1    86.2   92.7
PFE [184]        99.82    PFE [184]          95.3±0.9     89.6    93.3   95.5
Baseline         99.75    Baseline           90.2±1.1     80.2    88.0   92.9
GAC              99.78    GAC                91.3±1.2     83.5    89.2   93.7

From the results, we make several observations: (1) The White group still outperforms the non-White groups in all of the first three experiments. Even without any White subjects in the training set, the accuracy on the White testing set is still higher than that on the testing images of the Black and East Asian groups. This suggests that White faces are either intrinsically easier to verify or that the face images in the White group of RFW are less challenging. (2) As the total number of White subjects declines, the average performance declines as well. In fact, for all of these groups, the performance
suffers from the decrease in the number of White faces. This indicates that face images in the White group help boost face recognition performance for both White and non-White faces; in other words, faces from the White group benefit the learning of global patterns for face recognition in general. (3) Counter to our intuition, the biasness is lower with fewer White faces, even though the data bias is actually increased by making the training set unbalanced.

4.4.3.4 Results on Standard Benchmark Datasets

While our GAC mitigates bias, we also hope it performs well on standard benchmarks. Therefore, we evaluate GAC on standard benchmarks without considering demographic impacts, including LFW [27], IJB-A [29], and IJB-C [9]. These datasets exhibit imbalanced demographic distributions. For a fair comparison with SOTA, instead of using ground-truth demographics, we train GAC on MS-Celeb-1M [23] with the demographic attributes estimated by the classifier pre-trained in Sec. 4.4.3.2. As shown in Tab. 4.15, GAC outperforms the baseline and performs comparably to SOTA.

Table 4.16 Distribution of ratios between the minimum inter-class distance and the maximum intra-class distance of face features in the 4 race groups of RFW. GAC exhibits higher ratios, and distributions more similar to the reference.

Race         Mean              StaD              Relative Entropy
             Baseline  GAC     Baseline  GAC     Baseline  GAC
White        1.15      1.17    0.30      0.31    0.0       0.0
Black        1.07      1.10    0.27      0.28    0.61      0.43
East Asian   1.08      1.10    0.31      0.32    0.65      0.58
South Asian  1.15      1.18    0.31      0.32    0.19      0.13

4.4.3.5 Visualization and Analysis on Bias of FR

Visualization. To understand the adaptive kernels in GAC, we visualize the feature maps at an adaptive layer for faces of various demographics via a PyTorch visualization tool [214]. We visualize the important face regions pertaining to the FR decision using gradient-weighted class activation mapping (Grad-CAM) [15]. Grad-CAM uses the gradients flowing back from the final layer corresponding to an input identity to guide the target feature map to highlight the regions important for identity prediction.
Figure 4.16 shows that, compared to the baseline, the salient regions of GAC demonstrate more diversity for faces from different groups. This illustrates the variability of network parameters in GAC across groups.

Bias via Local Geometry. In addition to STD, we explain the bias phenomenon via the local geometry of a given face representation in each demographic group. We assume that the statistics of the neighbors of a given point (representation) reflect certain properties of its manifold (local geometry). Thus, we examine the pair-wise correlation of face representations. To minimize variation caused by other variables, we use constrained frontal faces from a mugshot dataset, PCSO [14]. We randomly select 1K White and 1K Black subjects from PCSO and compute their pair-wise correlations within each race. In Fig. 4.15b, Base-White representations show lower inter-class correlation than Base-Black, i.e., faces in the White group are over-represented by the baseline relative to the Black group. In contrast, GAC-White and GAC-Black show more similar correlation histograms.

As PCSO has few Asian subjects, we use RFW for another examination of the local geometry in the 4 groups. That is, after normalizing the representations, we compute the pair-wise Euclidean distance and measure the ratio between the minimum distance of inter-subject pairs and the maximum distance of intra-subject pairs. We compute the mean and standard deviation (StaD) of the ratio distributions in the 4 groups, for both models. We also gauge the relative entropy to measure the deviation of the distributions from each other, choosing the White group as the reference distribution for simplicity. As shown in Tab. 4.16, while GAC yields only a minor improvement over the baseline in the mean, it gives smaller relative entropy for the other 3 groups, indicating that the ratio distributions of the other races under GAC are more similar, i.e., less biased, with respect to the reference distribution. These results demonstrate the capability of GAC to increase the fairness of face representations.

Table 4.17 Network complexity and inference time.

Model     Input Resolution  #Parameters (M)  MACs (G)  Inference (ms)
Baseline  112×112           43.58            5.96      1.1
GAC       112×112           44.00            9.82      1.4
4.4.3.6 Network Complexity and FLOPs

Tab. 4.17 summarizes the network complexity of GAC and the baseline in terms of the number of parameters, MACs, and inference time. While the number of parameters grows with the number of demographic categories, this does not necessarily increase the inference time, which matters more for real-time applications.

4.5 Demographic Estimation

We train three demographic estimation models to annotate the age, gender, and race information of the face images in BUPT-Balancedface and MS-Celeb-1M for training GAC and DebFace. For all three models, we randomly sample an equal number of images from each class and set the batch size to 300. Training finishes at the 35Kth iteration. All hyper-parameters are chosen by testing on a separate validation set. Below we give the details of model learning and estimation performance for each demographic.

(Figure 4.17: Demographic attribute classification accuracy on each group of (a) age, (b) gender, and (c) race. The red dashed line refers to the average accuracy over all images in the testing set.)

Gender: We combine the IMDB, UTKFace, AgeDB, AFAD, and AAF datasets for learning the gender estimation model. As with age, 90% of the images in the combined dataset are used for training, and the remaining 10% for validation. Table 4.18 reports the total number of female and male face images in the training and testing sets; more of the images are of male faces in both. Figure 4.17b shows the gender estimation performance on the validation set. The performance on male images is slightly better than that on female images.

Table 4.18 Gender distribution of the datasets for gender estimation.

Dataset   # of Images
          Male     Female
Training  321,590  229,000
Testing   15,715   10,835
Race: We combine the AFAD, RFW, IMFDB-CVIT, and PCSO datasets for training the race estimation model, with UTKFace used as the validation set. Table 4.19 reports the total number of images in each race category of the training and testing sets. As with age and gender, race estimation performance is highly correlated with the race distribution in the training set: most of the images belong to the White group, while the Indian group has the fewest images, so the performance on White faces is much higher than on Indian faces.

Table 4.19 Race distribution of the datasets for race estimation.

Dataset   # of Images
          White    Black    East Asian  Indian
Training  468,139  150,585  162,075     78,260
Testing   9,469    4,115    3,336       3,748

Age: We combine the CACD, IMDB, UTKFace, AgeDB, AFAD, and AAF datasets for learning the age estimation model. 90% of the images in the combined dataset are used for training, and the remaining 10% for validation. Table 4.20 reports the total number of images in each age group of the training and testing sets. Figure 4.17a shows the age estimation performance on the validation set. The majority of the images come from the 19-34 age group, so age estimation performs best on this group; performance on the young-children and middle-to-old age groups is significantly worse than on the majority group.

Table 4.20 Age distribution of the datasets for age estimation.

Dataset   # of Images in the Age Group
          0-12   13-18   19-34    35-44    45-54   55-100
Training  9,539  29,135  353,901  171,328  93,506  59,599
Testing   1,085  2,681   13,848   8,414    5,479   4,690

It is clear that all the demographic models present biased performance with respect to different cohorts. These demographic models are used to label BUPT-Balancedface and MS-Celeb-1M for training GAC and DebFace. Thus, in addition to the bias from the dataset itself, we also add label bias to it. Since DebFace employs supervised feature disentanglement, we only strive to reduce the data bias, not the label bias.
4.6 Conclusion

This chapter tackles the issue of demographic bias in FR by learning fair face representations. We present two de-biasing FR networks, GAC and DebFace, to mitigate demographic bias in FR. In particular, GAC improves the robustness of representations for every demographic group considered here. Both adaptive convolution kernels and channel-wise attention maps are introduced in GAC, and we further add an automatic adaptation module to determine whether to use adaptations in a given layer. Our findings suggest that faces can be better represented by using layers adaptive to different demographic groups, leading to more balanced performance gains for all groups. Unlike GAC, DebFace mitigates the mutual bias across identity and demographic attribute recognition by adversarially learning a disentangled representation for gender, race, and age estimation, and face recognition, simultaneously. We empirically demonstrate that DebFace can reduce bias not only in face recognition but in demographic attribute estimation as well.

Chapter 5

Adversarial Face Representation Learning via Graph Classification

Face representation learning is one of the key steps in FR for overcoming the challenges caused by variations in face images. In this chapter, we propose a representation learning method that utilizes graph classification via adversarial training. Each face image can be viewed as a node in a graph, and the edges between nodes stand for the connectivity of face samples. A graph classifier is trained to distinguish graphs generated from extracted feature vectors from those defined by a practical assumption. Meanwhile, the face representation model attempts to fool the graph classifier so that it gradually acquires the feature distribution of the ideal, oracle graph. In this way, feature points of the same identity come closer together while maintaining a reasonable distance from points of other identities. Experiments on benchmark face datasets (LFW, CPLFW, CFP-FP, IJB-A, IJB-B, IJB-C) show that our framework achieves state-of-the-art performance for both verification and identification tasks.
5.1 Adversarial Learning and Graph Classification with GNNs

Adversarial learning [153] has proven to be a useful approach to learning a data distribution and has been applied to many computer vision applications (refer to Sec. 4.3.1 for more details on recent advances in adversarial learning).

(Figure 5.1: (a) In GANs, an image generator gradually produces higher-quality faces during training so that a CNN-based discriminator cannot distinguish fake from real faces. (b) Analogously, given input faces, our embedding network for face recognition learns to extract discriminative features and connect them into a graph, with the goal that a graph neural network (GNN)-based discriminator cannot distinguish generated graphs from oracle graphs, i.e., the graphs of ideal face representations. During inference, our embedding network extracts more discriminative features that form oracle-like graphs, just as a GAN's generator synthesizes photo-realistic faces.)

The task of graph classification is to predict the category a graph belongs to. Unlike node label prediction, a full graph structure is considered a single input component, and the corresponding output is either a single representation vector or a class label. Two main techniques are involved in recent DNN-based graph networks: spatial computation and spectral operation. In spectral approaches [215, 216], the graph convolutions are based on the convolution theorem from signal processing, where point-wise multiplications are performed in the Fourier domain of the graph. In contrast, spatial methods [217-219] operate convolutions directly on the graph structure. Before the labeling procedure, the algorithm in [218] first applies a normalization to the neighborhood graphs created by determining the sequence of nodes; this normalization step moves graphs with similar structural roles into the same neighborhood and benefits the final classification. Antoine et al. [220] modify the conventional 1D graph convolution to a vanilla 2D CNN architecture for 2D graph classification.
5.2 Our Approach

5.2.1 Overall Framework

The proposed adversarial training framework for face representation is composed of a face embedding network $E(\cdot)$, a feature graph constructor $C(\cdot)$, and a GNN-based graph discriminator $D(\cdot)$. First, given a set of $N$ labeled training images $\{(\mathbf{x}_i, y_i)\}_{i}^{N}$, the embedding network takes the $i^{th}$ image $\mathbf{x}_i$ as input and transforms it into an $m$-dimensional feature vector: $\mathbf{f}_i = E(\mathbf{x}_i)$, where $\mathbf{f}_i \in \mathbb{R}^m$. The pair of the feature vector and its corresponding label $(\mathbf{f}_i, y_i)$ is then sent to the graph constructor $C(\cdot)$ to be added as a new node in the graph. When $C(\cdot)$ has collected a specified number of nodes, it builds graphs based on the labels of the nodes and their similarities. For graphs in the oracle space, two vertices $(v_i, v_j)$ are connected by a bidirectional edge if they are from the same subject; for graphs in the generated space, a vertex $v_i$ is linked to another vertex $v_j$ if $v_i$ is one of the $k$ nearest neighbors of $v_j$. More details of graph construction are discussed in Sec. 5.2.2.

(Figure 5.2: Overview of the proposed adversarial face representation learning via graph classification. Solid arrows denote the forward pass, and dashed arrows denote backward propagation. Training alternates between $E(\cdot)$ and $D(\cdot)$. For the shared inference (solid blue arrows), a set of face images is first taken by the embedding network $E(\cdot)$ to extract feature representations. These feature vectors are then converted into graph structures by the graph constructor $C(\cdot)$, which builds an oracle graph and a generated graph. During the training of $D(\cdot)$ (yellow arrows), the two types of graphs are received by the graph discriminator $D(\cdot)$, which must predict the category of each graph; $D(\cdot)$ is then updated from the gradients of the loss function $\mathcal{L}_D$. During the training of $E(\cdot)$ (red arrows), only generated graphs are delivered to $D(\cdot)$, and $E(\cdot)$ receives feedback from $\mathcal{L}_a$, whose goal is to drive $D(\cdot)$ to make errors on generated graphs.)
Next, the two types of graphs produced by $C(\cdot)$ are fed into the GNN discriminator $D(\cdot)$ as training samples. The parameters of $D(\cdot)$ are updated with the objective of correctly classifying the graph category. In the meantime, the adversarial goal of $E(\cdot)$ is to mislead $D(\cdot)$ by making the feature distributions in the two spaces from which these graphs are created indistinguishable.

As shown in Fig. 5.2, with the cooperation of $C(\cdot)$, $E(\cdot)$ endeavors to extract feature representations by competing against $D(\cdot)$, which uses a graph classification network [217]. Such a graph structure not only allows for a distance metric between nodes but also attends to both local and global distributions of feature representations, and hence benefits the discriminative power of the face embeddings.
OracleGraph Foredgesinanoraclegraph,onevertexisconnectedtoalltheotherverticesofthe samesubject.Theresultingadjacencymatrixissymmetric: A > 89 = 8 > > >< > > > : 1 ŒH 8 = H 9 0 Œ otherwise .Forverticesin anoraclegraph,weneedtorede˝netherepresentationvectorsbasedonboththeexistingfeature pairs f ( f 8 ŒH 8 ) g # 8 andtheultimatelearningobjective.Forexample,inthemostidealsituationeach faceidentitywouldberepresentedbyasinglevector,whichmeansallimagesamplesofthesame subjectwouldbemappedtothesamefeaturevector,regardlessoftheintra-subjectvariations.The correspondingnodeinformationmatrixis: P > = f f 2 H 8 g # 8 2R # < Œ (5.1) where f 2 9 = 1 ) 9 P H 8 = 9 f 8 ,and ) 9 denotesthetotalnumberoffaceimagesofthe 9 C subject(See Fig.5.3b).However,whensuchnoderepresentationsareusedasthetrainingtarget,itisfarbeyond therealisticdatadistributionoffaceimages.Asaresult,itmayeitherbehardtotrainorleadto trivialsolutions. Tomakeitmorepractical,weintroducearatiohyper-parameter, A ,toadaptivelyadjustthe trainingdi˚cultyintermsofthenoderepresentation.Speci˝cally, A isafractionnumberbetween 0 and 1 .Itcontrolsthemaximumdistanceofanyfeaturevectortoitscentroid.Forthosevectors thatviolatethisconstraint,theyareforcedtomovetheircoordinatestowardsthedirectionofthe 117 centroidsinane˙orttoreachthemaximumdistance.Nowwecaneasilymanipulatethetraining targetbychangingthevalueof A .The˝nalmatrixofnodeinformationintheoraclegraphis: P > 8 = 8 > > > >< > > > > : f 8 Œˇ8BC ( f 8 Œ f 2 H 8 ) A ˇ8BC <8= f 2 H 8 + A ( f 8 f 2 H 8 ) ˇ8BC <8= ˇ8BC ( f 8 Œ f 2 H 8 ) , otherwise (5.2) where ˇ8BC <8= = min f ˇ8BC ( f 2 H 8 Œ f 2 9 ) g = 9 =1 Œ9 6 = H 8 , = isthetotalnumberofsubjects,and ˇ8BC ( ) isa distancemetricfunction.Fig.5.3cand5.3dshowa 2 Dexampleofthisprocess. 
Generated Graph. A vertex in a generated graph is simply represented by its corresponding feature vector $\mathbf{f}_i$ extracted from $E(\cdot)$, and the node information of the entire graph is denoted by a feature matrix $\mathbf{P}^f = \{\mathbf{f}_i\}_{i}^{N} \in \mathbb{R}^{N \times m}$, with each row corresponding to a vertex. For edges in a generated graph, we assign the adjacency values based on the similarity between two vertices: a vertex $v_i$ is associated with another vertex $v_j$ if $v_i$ is one of $v_j$'s top $k$ nearest neighbors, denoted by $Neib_k(\mathbf{f}_j)$, based on the distance of their feature vectors in Euclidean space. Unlike $\mathbf{A}^o$, the final adjacency matrix of a generated graph need not be symmetric:

$$\mathbf{A}^f_{ij} = \begin{cases} 1, & \mathbf{f}_i \in Neib_k(\mathbf{f}_j) \\ 0, & \text{otherwise.} \end{cases}$$

In the end, a pair of graphs $g_o(\mathcal{V}_o, \mathcal{E}_o)$ and $g_f(\mathcal{V}_f, \mathcal{E}_f)$, with binary labels $y^g_i = 1$ and $y^g_j = 0$, is yielded by the graph constructor.

5.2.3 Discriminator and Adversarial Learning

For the graph discriminator $D(\cdot)$, we consider a whole graph, with all its vertices and edges, as a single instance, and the goal is to predict the category it belongs to. Given the variety of applications of graph classification, much attention has gone into developing useful and practical architectures for graph classification and representation learning. Here, we employ the fast approximate convolutions on graphs proposed by [215]. In particular, the discriminator network consists of multiple graph convolutional layers connected by non-linear activation functions:

$$\mathbf{H}^{(l+1)} = \sigma\!\left( \tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \mathbf{H}^{(l)} \mathbf{W}^{(l)} \right), \qquad (5.3)$$

(Figure 5.3: Construction of the generated graph and oracle graph. In this example, the input image set comprises 9 images of 3 subjects, 3 images per subject. The images of each subject are surrounded by a circle with a unique color, indicating identity, and each image is projected to a point in the 2D Euclidean feature space. The following graphs are constructed: (a) a generated graph, where each vertex $v_i$ is represented by its feature vector, with a directed edge from $v_i$ to $v_j$ if $v_j$ is one of the top 2 nearest neighbors of $v_i$; (b) an oracle graph created by center points, where each vertex is
represented by the mean vector of its identity, with a bidirectional edge connecting two vertices of the same subject; (c) a radius constraint is used to allow tolerable intra-subject variations, where vertices move toward their centers (denoted by dashed arrows) to meet the radius requirement; the vertex already within the radius, the leftmost one in this example, stays the same; (d) an oracle graph controlled by r, where the distance of each vertex to its center is reduced by the ratio r, with a bidirectional edge connecting two vertices of the same subject.

where Ã = A + I_N is the sum of the adjacency matrix and the identity matrix I_N, D̃_{ii} = Σ_j Ã_{ij}, and W^{(l)} is the convolutional kernel matrix of the l-th layer. σ(·) denotes the activation function, and H^{(l)} is the input feature map for layer l; the input to the initial layer is the node information matrix P of the graph. Finally, the classification network ends with a fully-connected layer followed by a sigmoid operation. As a binary classification task, the discriminator is trained by minimizing the standard binary cross-entropy loss function:

L_D = −(1/N_g) Σ_{i=1}^{N_g} [ y^g_i log(z_i) + (1 − y^g_i) log(1 − z_i) ],   (5.4)

where N_g is the total number of graphs, and z_i = D(g_i) is the probability predicted by the sigmoid.
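For reference, one propagation step of Eq. (5.3) can be sketched in a few lines of NumPy. This is a sketch only: we use dense matrices for clarity and Tanh as the activation σ, matching the DGCNN setup described in the implementation details.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step of Eq. (5.3):
    H^{l+1} = sigma(D~^{-1/2} A~ D~^{-1/2} H^l W^l), with A~ = A + I."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                 # D~_ii = sum_j A~_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.tanh(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)
```

Stacking several such layers, each with its own W^{(l)}, gives the graph-convolutional backbone of the discriminator.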
As a distribution detector, D(·) is also in charge of guiding E(·) to imitate features sampled from the target distribution. The parameters of E(·) are adversarially trained by making D(g^f) yield wrong predictions when g^f is the input, such that the sigmoid output for g^f is as high as that for g^o. This adversarial loss is formulated as:

L_a = −(1/N_g) Σ_{i=1}^{N_g} log D(g^f_i),   (5.5)

where g^f_i = C(E({x}_i)). The gradients derived from L_a do not propagate back to D(·), but only to E(·) to update its parameters. By optimizing L_a, we assume that if E(·) successfully acquires the oracle distribution, D(·) should consistently give high probability estimates for the oracle-graph category, no matter what kind of graph is actually taken as input. As a result, the face embeddings output by E(·) are likely to present global and local node dependencies and similarities, as well as recognition ability, similar to the target features.

5.2.4 Network Training

As both oracle graphs and generated graphs are constructed from features extracted from images, our framework requires a pre-trained face representation model to initialize the node information matrix for both graphs. Apart from the above adversarial loss, a conventional loss for face recognition, L_f, is also included to constrain the feature distribution within a reasonable physical metric where vector similarities apply, and to prevent trivial solutions. Further, as stated in Sec. 5.2.2, the maximum distance of an arbitrary point to its class centroid is based on the minimum distance between centroids, so the distribution of centroids is also a key factor in creating discriminative node embeddings for oracle graphs. Thus, we introduce another objective to constrain the distance between centroids:

L_c = (2 / (n(n−1))) Σ_{i≠j} [ Dist_c − Dist(f̂^c_i, f̂^c_j) + m ]_+,   (5.6)

where f̂^c is the batch-estimated class centroid that is smoothly updated during training, and Dist_c is the average distance over all pairwise centroids in the entire training set,
[x]_+ = max(0, x), and m is the margin parameter. In total, the face embedding network is trained by minimizing the combined loss function:

L_E = L_f + λ L_a + μ L_c,   (5.7)

where λ and μ adjust the contributions of the adversarial loss and the centroid distance loss to E(·).

Training strategy. The proposed framework can be trained end-to-end or with a two-step strategy. If trained as one piece, the training procedure is similar to that of a GAN, where E(·) and D(·) are updated alternately. On the other hand, to obtain more stable information about the global data structure in the feature space, e.g., the global class centroid vectors f^c and the average centroid distance Dist_c, we adopt a stage-wise learning process to train the two networks.

Specifically, the discriminator D(·) is first trained on features pre-extracted by E(·). Meanwhile, all the parameters related to the global distribution are also pre-computed, including f^c and Dist_c. When D(·) achieves decent performance, we stop its training and fix its parameters. The training then enters the second stage, where the pairwise centroid distances and the vertex features in the generated graph are updated with E(·). A training loop is complete when the second stage ends; we can start another loop if necessary. Similar to the triplet loss, we select hard samples during training: generated graphs whose vertices or edges differ from the oracle graphs are considered valid training samples. In addition, the three adjustable hyper-parameters, the maximum distance ratio r and the loss weights λ and μ, can be updated across training loops. For instance, we can gradually raise the difficulty of the target graph manipulation as the number of loops increases.

5.3 Experiments

In this section, we first conduct ablation experiments to study various design choices of our algorithm, then compare our performance with state-of-the-art methods on public benchmark datasets. Finally, we analyze the distribution of our face representations via graph visualization.
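Before turning to the experiments, the three objectives of Eqs. (5.4), (5.5), and (5.6) can be sketched as plain NumPy functions. This is purely illustrative, with our own function names; in practice L_a is computed with the discriminator's parameters frozen, and the pair averaging follows the 2/(n(n−1)) convention of Eq. (5.6).

```python
import numpy as np

def loss_D(z, y, eps=1e-12):
    """Eq. (5.4): binary cross-entropy over graph predictions z = D(g_i),
    with label y = 1 for oracle graphs and y = 0 for generated graphs."""
    return -np.mean(y * np.log(z + eps) + (1 - y) * np.log(1 - z + eps))

def loss_a(z_generated, eps=1e-12):
    """Eq. (5.5): push D's sigmoid output on generated graphs toward 1;
    its gradients update only the embedding network E."""
    return -np.mean(np.log(z_generated + eps))

def loss_c(centroids, dist_c, margin):
    """Eq. (5.6): hinge on all ordered centroid pairs, scaled by 2/(n(n-1))."""
    n = len(centroids)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                d = np.linalg.norm(centroids[i] - centroids[j])
                total += max(0.0, dist_c - d + margin)
    return 2.0 * total / (n * (n - 1))
```

The combined objective of Eq. (5.7) is then simply loss_f + λ·loss_a + μ·loss_c for whichever recognition loss L_f is chosen.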
5.3.1 Datasets and Implementation Details

Our training dataset is MS-Celeb-1M (MS1M) [23] cleaned by ArcFace [6], referred to as MS1MV2, containing about 5.8M images of 85K subjects. We evaluated our method on six public benchmark datasets for face recognition: LFW [27], CPLFW [71], CFP-FP [72], IJB-A [29], IJB-B [30], and IJB-C [9]. The face area is first cropped from each image based on five facial landmarks detected by RetinaFace [34], and then resized to 112×112 pixels.

The architecture of E(·) is the 100-layer ResNet used in [6]. For the graph discriminator D(·), we adopt the DGCNN architecture proposed in [217], consisting of four graph convolution layers [215] with Tanh activation, a SortPooling layer, MaxPooling and 1D convolution layers, a fully-connected layer with ReLU activation, and a softmax classification layer at the end. For each graph, 5 subjects with 5 images per subject are randomly selected to form the set of nodes. During the training of D(·), the graph batch size is set to 90, of which half are oracle and half are generated graphs. The parameters of D(·) are updated using Adam with a learning rate of 1×10^-3. For training E(·), each batch contains 275 images from 55 subjects, also 5 images per subject. With the same graph size, 22 graphs are constructed by C(·) in every training step.

Table 5.1: Verification performance (%) of different vertex feature matrices of oracle graphs. A bigger r tolerates more intra-class variation, while a small r, f^c_i, or f^p_i strives for minimal intra-class variation. A balance between learning capability and ideal representations performs best (r = 0.7).

P^o | CFP-FP | IJB-A TAR@0.1% FAR
r = 1.0 | 97.90 | 94.42 ± 1.49
r = 0.9 | 98.27 | 96.58 ± 0.45
r = 0.7 | 98.34 | 97.31 ± 0.38
r = 0.5 | 96.68 | 94.04 ± 1.87
r = 0.3 | 90.04 | 78.13 ± 3.98
{f^c_i}_{i=1}^N | 92.71 | 81.30 ± 3.61
{f^p_i}_{i=1}^N | 90.05 | 77.42 ± 4.19
E(·) is optimized by SGD with a momentum of 0.9 and a weight decay of 5e-4. The learning rate starts from 0.05 and drops at epochs 8, 15, and 20. The margin m in the loss function L_c is set to 0.3. We utilize the loss function introduced in CurricularFace [221] as L_f, and their ResNet100 model trained on MS1MV2 as the pre-trained model to initialize the vertex features in the graphs. The entire training process takes two loops, with the three hyper-parameters set to r = {0.9, 0.7}, λ = {1.0, 0.5}, and μ = {0.5, 0.1} in the two loops, respectively.

5.3.2 Ablation Study

Vertex Feature Matrix of Oracle Graphs. The design of the vertex feature matrix P^o directly influences the feature distribution in the oracle space, and also determines the complexity and feasibility of learning the embedding network. Here, we explore different ways to define oracle representations and analyze their impact on the learned feature space. Seven variants are considered. (1-5): Move each feature vector toward its class center under the control of different ratios r, with r = 1.0, 0.9, 0.7, 0.5, and 0.3, respectively. (6): Each identity is represented by a single feature vector f^c_i, the mean vector of all samples of subject i; this is equivalent to r = 0. (7): Each identity is represented by a prototype feature vector f^p_i, the corresponding column vector in the weight matrix of the last classification layer. All seven ablation models are trained for one loop, with λ = 1.0 and μ = 0.5.

Tab. 5.1 reports the face verification results on CFP-FP and IJB-A using different vertex feature matrices for oracle graphs. Our results suggest that there is a limit on the intra-class distance of each identity in oracle graphs. If the predefined intra-class distance is smaller than this limit, it is beyond the network's learning capacity and recognition performance degrades significantly. For example, when we appoint the center representation {f^c_i}_{i=1}^N, or the prototype representation {f^p
_i}_{i=1}^N, as the vertex feature matrix for the oracle graph, the average intra-class distance is zero, since every identity collapses to a single 512-dimensional feature vector. The verification accuracy nevertheless drops by 8.29% on CFP-FP compared to the best model in this ablation, and the True Acceptance Rate (TAR) falls from 97.31 ± 0.38% to 77.42 ± 4.19% on IJB-A. Similar performance is also observed when r is set to 0.3.

These results are even worse than those of the initial pre-trained network. This indicates that the graph discriminator delivers a jumble of information that can misguide the embedding network when an unreasonably ideal distribution is assumed for the oracle representations. On the other hand, performance is relatively insensitive to r when the minimum intra-class variation is within a reasonable range. By definition, a bigger r tolerates more intra-class variation but leaves less room for improving the representation. Thus, r should be small enough to achieve higher discriminability of the representation and, at the same time, big enough to prevent capacity overflow. In our experiments, r = 0.7 appears to be a good trade-off and consistently performs best on both CFP-FP and IJB-A.

Adjacency Matrix of Generated Graphs. In Sec. 5.2.2, we mention that the adjacency matrix of a generated graph is created from nearest-neighbor information. The goal is for E(·) to learn how to map the oracle dependencies between vertices into the Euclidean space. To show its efficacy, we ablate by replacing it with the adjacency matrix of the oracle graph, which is established based on identity labels. Both the ablation model and the proposed model are trained for one loop, with r = 0.7, λ = 1.0, and μ = 0.5.

Table 5.2: Verification performance (%) of different adjacency matrices of generated graphs.

A^f | CFP-FP | IJB-A TAR@0.1% FAR
Nearest Neighbors | 98.34 | 97.31 ± 0.38
Identity Labels | 97.16 | 94.44 ± 1.09

Table 5.3: Verification performance (%) of different λ and μ.
λ, μ | CFP-FP | IJB-A TAR@0.1% FAR
1.0, 0.5 | 98.34 | 97.31 ± 0.38
1.0, 0.1 | 98.27 | 96.20 ± 0.40
0.5, 0.5 | 98.14 | 95.82 ± 0.47

Tab. 5.2 compares the results of training with different definitions of the adjacency matrix for generated graphs. Clearly, it is important to let the adjacency matrices depend on the feature representations: this helps gradients flow through the embedding network, and the adjacency matrices update along with the embeddings. Otherwise, if defined via identity labels, constant adjacency matrices are used throughout the training iterations.

Contribution of Adversarial Loss and Centroid Loss. Here we show the effects of the adversarial loss and the centroid loss by training the network with different λ and μ. The remaining settings are the same for all models trained in this ablation: one loop, with r = 0.7. Tab. 5.3 reports the results for different hyper-parameters λ and μ to show the contributions of both objective functions. We keep one of them unchanged and decrease the value of the other to see how performance is affected when less contribution is made by L_a or L_c. As shown in Tab. 5.3, smaller λ and μ both lead to worse performance on CFP-FP and IJB-A, while increasing either of them boosts performance. This indicates that the proposed adversarial learning with the centroid distance loss makes a non-negligible contribution to the discriminability of face representations.

5.3.3 Comparisons with SOTA Methods

We choose six benchmark datasets widely used for face recognition to thoroughly evaluate our approach and compare it with other SOTA methods. Among the six datasets, three of them

Table 5.4: Verification accuracy (%) of our model and SOTA methods on LFW, CPLFW, and CFP-FP.
The results marked by (*) are re-implemented by ArcFace [6]. All other baseline results are reported by their respective papers. [Keys: Red: Best, Blue: Second best]

Method | Training Data | LFW | CPLFW | CFP-FP
DeepID2+ [222] | 0.3M | 99.47 | - | -
CenterLoss [47] | 0.7M | 99.28 | - | -
SphereFace [13] | CASIA [11] (0.5M) | 99.42 | - | -
DeepFace [17] | 4M | 97.35 | - | -
FaceNet [12] | 200M | 99.63 | - | -
TPE [223] | CASIA | - | - | 89.17
DRGAN [224] | 1M | - | - | 93.41
Yin et al. [11] | CASIA | - | - | 94.39
UVGAN [225] | MS1M (10M) | 99.60 | - | 94.05
CosFace [19] | CASIA | 99.51 | - | 95.44
PFE [184] | MS1M (4.4M) | 99.82 | - | 93.34
ArcFace [6] | MS1MV2 (5.8M) | 99.83 | - | 98.37
CurricularFace [221] | MS1MV2 | 99.80 | 93.13 | 98.37
DDL [226] | VGGFace2 (3.3M) | 99.68 | 93.43 | 98.53
Ours | MS1MV2 | 99.78 | 93.40 | 98.41

are tested under the instance-based image-to-image verification protocol, verifying whether a pair of face images belongs to the same person; the verification protocol of the other three datasets is based on image templates. A template is a collection of face images sampled from the same identity. The template-based verification task requires deciding whether two templates are from the same person or not.

Instance-based Face Verification. Tab. 5.4 reports the verification accuracy on the three instance-based benchmark datasets: LFW, CPLFW, and CFP-FP. LFW is a dataset collected before the era of deep face representations. Its 6,000 pairs of face images are considered semi-constrained, with limited intra-class variation and relatively high image quality. SOTA performance is already saturated on LFW: almost all DNN-based methods listed in Tab. 5.4 achieve over 99.00% accuracy. Even so, LFW is still used as a standard validation benchmark given its prevalence in FR research communities and its efficient pairwise assessment. Our model obtains an accuracy (99.78%) similar to other methods, though slightly below the two models also trained on MS1MV2 (ArcFace [6] and CurricularFace [221]).

The other two datasets, CPLFW and CFP-FP, are created to address the challenge of large facial

Table 5.5: Comparison of verification performance with SOTA methods on IJB-A, IJB-B, and IJB-C.
The evaluation is measured by TAR (%), True Acceptance Rate, at a certain FAR, False Acceptance Rate. For IJB-A, FAR = 0.1%; for IJB-B and IJB-C, FAR = 0.01%. The decimal precision of TAR varies among the results reported by SOTA methods; results in this table are unified to one decimal place (0.1). All baseline results are reported by their respective papers. [Keys: Red: Best, Blue: Second best]

Method | Training Data | IJB-A | IJB-B | IJB-C
DRGAN [224] | 1M | 53.9 ± 4.3 | - | -
Yin et al. [11] | CASIA | 73.9 ± 4.2 | - | -
NAN [227] | 3M | 88.1 ± 1.1 | - | -
QAN [228] | 5M | 89.3 ± 3.9 | - | -
TPE [223] | CASIA | 90.0 ± 1.0 | - | -
VGGFace2 [59] | VGGFace2 | 90.4 ± 2.0 | 80.0 | 84.0
Multicolumn [183] | VGGFace2 | 92.0 ± 1.3 | 83.1 | 88.7
DCN [125] | VGGFace2 | - | 84.9 | 88.5
PFE [184] | MS1M (4.4M) | 95.3 ± 0.9 | - | 93.3
AdaCos [229] | 2.8M | - | - | 92.4
P2SGrad [230] | 2.8M | - | - | 92.3
ArcFace [6] | MS1MV2 | - | 94.2 | 95.6
CurricularFace [221] | MS1MV2 | - | 94.8 | 96.1
DDL [226] | VGGFace2 | - | 90.7 | 93.1
Ours | MS1MV2 | 97.3 ± 0.4 | 94.6 | 96.2

pose variations. Verification is conducted between a frontal face and a profile face, or between two faces with different yaw angles. Despite being more challenging than LFW, images in CFP-FP are of high resolution. CPLFW contains the same images as LFW, but with re-designated face pairs exhibiting pose differences. Thus, the average SOTA performance on these two datasets is over 90.00% accuracy. Without a policy specifically tailored for large pose variations, our approach still achieves top performance, surpassing all SOTA methods except DDL [226], which is trained on VGGFace2 [59], a dataset with large pose variations that exhibits less domain gap with the two test sets than MS1MV2 does.

Template-based Face Verification. Tab. 5.5 reports the TAR performance on the three template-based benchmark datasets: IJB-A, IJB-B, and IJB-C. To evaluate FR models in more challenging scenes, NIST released a series of datasets that contain a mix of high/low quality images and low quality video frames, presenting large variations in pose, illumination, occlusion, resolution, etc.
IJB-A is among the first of these to be published and has the smallest number of subjects and images. Our model outperforms the other methods on IJB-A, using no scheme specific to template-based face verification.

Table 5.6: Comparison of face identification performance (%) on the IJB-C dataset (closed set).

Method | Rank-1 | Rank-5
VGGFace2 [59] | 91.4 | 95.1
CurricularFace [221] | 94.4 | 96.1
Ours | 95.3 | 96.7

It should be noted that since the evaluation on IJB-A follows a ten-fold cross-validation protocol, fine-tuning can be done on the split folds before evaluation; no fine-tuning has been conducted for our model.

Both IJB-B and IJB-C are extended versions of IJB-A with more subjects and images. The n-fold cross-validation protocol is not provided for these two extensions, and only evaluation is allowed. In Tab. 5.5, we see that our approach performs comparably to the SOTA methods on IJB-B and IJB-C, which demonstrates that our graph-based adversarial framework indeed benefits the discriminability and generalizability of face representations.

Apart from verification tasks, we also report the face identification results on IJB-C in Tab. 5.6. The closed-set identification protocol of IJB-C contains two sets of face templates with no overlapping images, referred to as probe templates and gallery templates, respectively. In particular, the set of gallery templates includes all identities in the probe templates, one template per identity. Each probe template is compared with all gallery templates to search for the nearest matches. The final result is presented as Rank-k accuracy, the rate of correct matches among the top k nearest neighbors. We report Rank-1 and Rank-5 accuracies in Tab. 5.6. Compared with CurricularFace and the algorithm in [59], our model boosts both Rank-1 and Rank-5 accuracy, which shows that the proposed method is also effective in improving face representations for identification tasks.
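The Rank-k metric used in Tab. 5.6 can be computed from a probe-gallery similarity matrix as follows. This is a minimal sketch with hypothetical inputs; in practice the similarities would be cosine similarities between aggregated template features.

```python
import numpy as np

def rank_k_accuracy(S, gallery_ids, probe_ids, k):
    """Fraction of probes whose true identity appears among the top-k most
    similar gallery templates. S: (num_probes, num_gallery) similarities."""
    # gallery indices sorted by decreasing similarity, truncated to top-k
    order = np.argsort(-S, axis=1)[:, :k]
    hits = [probe_ids[i] in gallery_ids[order[i]] for i in range(len(probe_ids))]
    return float(np.mean(hits))
```

By construction, Rank-k accuracy is non-decreasing in k, which is why Rank-5 numbers in Tab. 5.6 are at least as high as Rank-1 numbers.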
5.3.4 Analysis of Feature Distribution

We further investigate the effects of our graph-based adversarial learning mechanism on face representations via visualization, and discuss the enhancement of feature discriminativeness from the graph perspective.

Figure 5.4: Two examples of generated graphs being updated by adversarial learning at 4 instances during the training process.

Since the vertex features of our graphs are initialized by a well-trained model, they lay a good foundation for further improvement. In fact, many feature points already lie within the radius constraint and are well connected to their same-identity neighbors. These good points need no attention during training. On the contrary, such points may confuse the graph discriminator D(·), as their features and structures are identical to those of oracle graphs, which leads to bad results. Hence, our adversarial training focuses only on the hard samples that present differently from oracle graphs, i.e., those either violating the radius requirement or not connected to their same-identity neighbors. We ask the network to learn a good mapping for these samples to boost the general performance. For example, Fig. 5.4 shows the training progress of two sets of initially hard samples under the proposed adversarial learning.
Fig. 5.5 shows the t-SNE visualization of the face embeddings extracted by CurricularFace and by the proposed method. First, we randomly select 15 hard-sample subjects with all of their face images available in MS1MV2, our training dataset. Next, a 512-dim feature vector is extracted from each image by our model and by the initial CurricularFace model. We then use t-SNE to project each 512-dim vector into a 2D space in which the geodesic distances between points in the high-dimensional space are maximally maintained. The color of each 2D point indicates the subject's identity.

(a) CurricularFace [221] (b) Ours
Figure 5.5: t-SNE visualization of the face representations in a 2D space. Each identity is represented by a unique color. The initial face representations extracted by CurricularFace and the updated representations learned via adversarial graph classification are shown in (a) and (b), respectively.

From Fig. 5.5, we see that these sample images are hard for CurricularFace to recognize, because many of their nearest neighbors are points of other identities. In comparison, by learning from oracle graphs, the proposed method learns a better feature space in which the face embeddings of the same identity form more compact clusters, leading to higher discriminability.

5.4 Concluding Remarks

Deep network based face recognition has witnessed rapid development in recent years, owing especially to the innovation of a series of loss functions designed to enhance the discriminativeness of the embedding space. However, most loss functions examine each individual sample, a sample pair, or at most a triplet of samples in a physical distance metric, ignoring the general distribution derived from within-class and cross-class correlations between samples. In contrast, motivated by the design of generative adversarial networks, our proposed approach oversees the distribution of feature vectors, represented by a graph, and pushes the generated graph to resemble the oracle graph by updating the embedding network.
We believe this work is a meaningful and exciting exploration along the direction of loss function design in the face recognition community. Experimental results demonstrate the promise of our proposed approach. Since our main idea is not face-specific, one direction for future work is to extend this study to ImageNet-like general classification problems.

Chapter 6

Summary and Future Work

This dissertation addresses four face recognition problems that are essential to both fundamental analysis and practical applications. The solutions proposed in this dissertation employ a variety of deep learning techniques to advance research within the FR community.

Intrinsic Dimensionality. One of the main goals is to estimate the limit of representation compactness with no loss in recognition performance. This limit is denoted the intrinsic dimensionality (IND) of a face representation. Given the intrinsic dimensionality, we also aim at finding a projection method to obtain a representation near its limit of compactness.

The density variation in a representation is the principal consideration when estimating a reliable IND of a given face representation. However, it is difficult to obtain an accurate estimate of such a probability distribution, since face images often lie on a topologically complex curved manifold, and estimating its distribution requires data points at very small length scales (distances), which are hard to access when data is limited, especially in the high-dimensional spaces where deep face representations are usually embedded. Another challenging task is to verify whether a given IND estimate truly represents the dimensionality of the complex high-dimensional representation space.
Our contribution to this problem is to overcome the above challenges by offering a topological dimensionality estimation technique for high-dimensional face representations, and by proposing a mapping approach that enables validation of the IND estimates through image matching experiments on the corresponding low-dimensional intrinsic representation of the feature vectors. In this study, we define the notion of intrinsic dimension through the classical concept of the topological dimension of the support of a distribution. We adopt an elegant solution to the curse of dimensionality by utilizing the geodesic distance between points, computed as the graph-induced shortest path, instead of the Euclidean distance. The difficulty of estimating the data distribution is overcome based on the observation that different topological geometries are similar to each other as long as their intrinsic dimensionality is the same; in other words, the distribution depends only on the intrinsic dimensionality and not on the geometric support of the manifold. Thus, the intrinsic dimensionality of the face manifold can be estimated by comparing the empirical distribution of pairwise distances on the manifold to that of a known distribution, such as the m-hypersphere. To validate the estimated IND, we propose a method, DeepMDS, that relies on the ability of DNNs to approximate the complex mapping function from the ambient space to the intrinsic space. The DeepMDS model is optimized to preserve the interpoint geodesic distances between the feature vectors in the ambient and intrinsic spaces, and is trained in a stage-wise manner that progressively reduces the dimensionality of the representation. Our new dimensionality reduction method addresses the scalability and out-of-sample-extension problems suffered by traditional spectral methods like Isomap. A well-trained DeepMDS model is not limited to mapping observed data; it can easily be applied to new test data, since it provides a mapping function in the form of a feed-forward network that maps an ambient feature vector to its corresponding intrinsic feature vector. DeepMDS mapping is shown to be significantly better than other dimensionality reduction approaches in terms of its discriminative capability.
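The graph-induced geodesic distance underlying this estimator can be sketched as follows. This is an Isomap-style sketch with our own function names; real face embeddings would use far larger neighborhoods and a sparse shortest-path solver rather than a dense Floyd-Warshall pass.

```python
import numpy as np

def geodesic_distances(X, k):
    """Approximate manifold geodesics by shortest paths on a symmetrized
    k-nearest-neighbor graph, instead of raw Euclidean distances."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:   # k nearest neighbors of point i
            G[i, j] = G[j, i] = D[i, j]
    for m in range(n):                        # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G
```

For points on a curved manifold, such as a circle embedded in the plane, the geodesic between distant points follows the manifold and exceeds the straight-line chord, which is exactly the property the IND estimator exploits.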
Capacity. Given a face representation, how many identities can it resolve? Addressing this question is the primary goal of this work. We refer to this quantity as the capacity of a given face representation, defined as the maximal number of users at which the face representation reaches its limit. By this definition, the capacity is determined in an objective manner, without the need for empirical evaluation.

Our solution relies on the notion of capacity that has been well studied in the information theory community in the context of wireless communication. The setting, commonly referred to as the Gaussian channel, consists of a source signal that is additively corrupted by Gaussian noise to generate observations. The capacity of this Gaussian channel is defined as the number of distinct source signals in the signal representation. Despite the rich theoretical understanding of the capacity of a Gaussian channel, there has been limited practical application of this theory to estimating the capacity of learned embeddings like face representations. For example, estimating the distribution of the source and the noise for a high-dimensional embedding, such as a face representation, is an open problem. A variety of noise sources must be taken into account to reliably infer probability distributions in high-dimensional spaces. It is also challenging to obtain reliable estimates of the volume of arbitrarily shaped high-dimensional manifolds (for the capacity bound).
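One fact used repeatedly when working with hyper-ellipsoid volumes: the ratio of the volumes of two hyper-ellipsoids of the same dimension reduces to a determinant ratio, since each volume scales with the product of the semi-axes. A toy NumPy sketch, with our own function name, of this volume-ratio computation:

```python
import numpy as np

def ellipsoid_volume_ratio(cov_pop, cov_class):
    """Vol(E_pop) / Vol(E_class) for hyper-ellipsoids x^T Sigma^{-1} x <= 1.
    The dimension-dependent sphere-volume constant cancels, leaving
    sqrt(det(Sigma_pop) / det(Sigma_class)); log-determinants avoid
    under/overflow in high dimensions."""
    _, logdet_p = np.linalg.slogdet(cov_pop)
    _, logdet_c = np.linalg.slogdet(cov_class)
    return float(np.exp(0.5 * (logdet_p - logdet_c)))
```

This is only the geometric core; the actual capacity estimate described next additionally shapes the identity-specific support as a function of the operating FAR.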
We address the aforementioned challenges to obtain reliable estimates of the capacity of any face representation by leveraging DeepMDS, the dimensionality reduction method proposed in the previous study. With the assistance of DeepMDS, we first model the face representation as a low-dimensional Euclidean manifold embedded within a high-dimensional space, and then project and unfold the manifold into a low-dimensional space. In our solution, two kinds of manifolds need to be approximated: (1) a population manifold, approximated by a multivariate Gaussian distribution (equivalently, hyper-ellipsoidal support) in the unfolded low-dimensional space; and (2) identity-specific manifolds, approximated by corresponding multivariate Gaussian distributions whose supports are estimated as a function of the specified FAR. The final capacity value is estimated as the ratio of the volumes of the population and identity-specific hyper-ellipsoids. We estimate the distributions of both kinds of manifolds from an observed face representation by leveraging recent advances in DNNs. In particular, given an embedding function (teacher network) that maps normalized high-dimensional face images to a low-dimensional vector, we train a DNN (student network) to model two sources of uncertainty that contribute to the noise in the embeddings: (i) uncertainty in the data, and (ii) uncertainty in the embedding function. These uncertainty estimates are then used to determine the volumes of the manifolds, and they also directly enable us to compute the representation capacity as a function of the desired operating point, as determined by its corresponding FAR. Experimental results suggest that our capacity estimates are an upper bound on the actual performance of face recognition systems in practice, especially under unconstrained scenarios. The relative order of the capacity estimates mimics the relative order of the verification accuracy on the benchmark datasets.

Bias.
Demographic bias in face recognition systems can potentially cause ethical issues when deployed. A thorough analysis of the discriminatory behavior of FR systems against certain demographic groups, and a solution to mitigate such bias, are the central aims of this study. We define FR bias as uneven recognition performance with respect to demographic groups. The ultimate goal of unbiased face recognition is that, for a given face recognition system, there should be no statistically significant difference in performance among different demographic groups of face images.

Bias can derive from various sources, such as the degree of balance among demographic samples, and variations in capture conditions and image noise across demographic groups. Therefore, naively training on a dataset containing uniform samples over the group space may still lead to bias. In fact, the demographic distribution of a dataset is often imbalanced, with underrepresented and overrepresented groups. Similarly, simply re-sampling an imbalanced training dataset may not solve the problem either, given that the diversity of latent variables differs across groups and the instances cannot be treated fairly during training. For these reasons, bias mitigation requires special attention to both data sampling and algorithm design.

This thesis focuses on developing de-biasing algorithms for face recognition. Our contribution to this problem is to provide two approaches to addressing the issue of demographic bias.
The first framework, DebFace, diminishes the influence of bias on both face recognition and demographic attribute estimation. During our investigation of demographic bias in face recognition, we also observed biased performance in demographic attribute estimation, which may cause additional bias since it acts differently when applied to de-bias face representations in different groups. To this end, we propose to jointly learn unbiased representations for both identity and demographic attributes. This solution is based on the assumption that if the face representation carries no discriminative information about demographic attributes, it will be unbiased in terms of demographics; the same hypothesis applies to demographic attribute estimation as well. Starting from a multi-task learning framework that learns disentangled feature representations of gender, age, race, and identity, respectively, we let the classifier of each task act as adversarial supervision for the other tasks. These four classifiers help each other achieve better feature disentanglement, resulting in unbiased feature representations for both the identity and demographic attributes.

Although DebFace shows a noticeable effect in mitigating demographic bias, we observe that recognition performance declines as well. Thus, in our second solution, GAC, we focus mainly on racial bias and strive to enhance the discriminativeness of face representations in every race/ethnicity group. The key idea of GAC is to give the network more capacity to broaden its scope over multiple face patterns from different groups, since an unbiased FR model should rely on both unique patterns for recognizing different groups and general patterns of all faces for improved generalizability. GAC explicitly learns these different feature patterns by leveraging two modules: an adaptive layer and an automation module. The adaptive layer comprises adaptive convolution kernels and channel-wise attention maps, where each kernel and map handles faces in one demographic group.
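A group-conditional step of the kind the adaptive layer uses can be sketched as follows. This is purely illustrative: the shapes, names, and the 1x1-convolution stand-in are our own simplifications of the adaptive kernels and attention maps described in the text.

```python
import numpy as np

def adaptive_layer(x, group, kernels, attn):
    """Apply the kernel and channel-attention map assigned to one demographic
    group. x: (C, H, W) feature map; kernels[g]: (C_out, C) 1x1 convolution;
    attn[g]: (C_out,) channel-wise attention weights."""
    C, H, W = x.shape
    y = kernels[group] @ x.reshape(C, H * W)   # group-specific 1x1 convolution
    y = y * attn[group][:, None]               # group-specific channel attention
    return y.reshape(-1, H, W)
```

The automation module described next decides, per layer, whether such a group-adaptive step is applied or a shared kernel is used instead.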
We also introduce a new objective function to GAC, which diminishes the variation of average intra-class distance across demographic groups. To apply these adaptive modules dynamically, we also propose an automation scheme that chooses the layers to which the adaptations are applied. As a result, our experiments demonstrate the efficacy of GAC in bias mitigation and SOTA performance preservation.

Representation Learning. The previous study focused on the fairness of pre-defined (demographic) groups. Yet, it is equally important to address individual fairness and performance in face recognition. In this study, we aim to develop a face representation method that takes individual performance into consideration, so that the face images of each identity are treated fairly while the overall performance is improved.

By investigating the average intra-subject and inter-subject distance of each identity, we start by constructing a kNN graph to describe the general face distribution in the embedding space. As a further endeavor, we propose a representation learning method that utilizes graph classification via adversarial training. In contrast to using an equational metric as the constraint in the feature space, our idea involves the creation of an ideal feature space that ensures individual performance, referred to as the oracle space, where the cluster of feature points of each class is clearly separated from the other classes. A deep neural network (DNN) is then trained to generate face features that follow the data distribution in the oracle space. Since the relationships and inter-dependencies between feature points are not as simple as the data structure of fixed-size grid images, we can no longer use a conventional CNN discriminator as the form of our adversarial supervision. In particular, the data structure in the feature space can be represented by a directed graph, where each vertex corresponds to an image sample and the edges between vertices represent their dependencies. For the oracle space, feature points are connected if they belong to the same subject; in the actual feature space, nodes are linked to their k nearest neighbors. The discrimination task here is to distinguish between
graphs from the oracle space and graphs from the generated space. Hence, we employ a graph classifier, trained with a graph neural network (GNN), as the discriminator that guides the representation model to output features following the oracle distribution. We demonstrate that our framework is capable of learning a generic feature space with enhanced discriminative power for face images, based on a pre-designed feature distribution defined on a graph structure.

Future Work. While this dissertation has explored fundamental problems in face recognition and has developed useful tools and algorithms that provide excellent performance, there is always room for additional improvement. Most importantly, the research contributions in this dissertation are not limited to face recognition. A number of other areas in computer vision, for example, general image classification and representation learning, can benefit greatly from the research conducted in this dissertation.

APPENDIX

PUBLICATIONS

[1] S. Gong, V. N. Boddeti, and A. K. Jain, "On the intrinsic dimensionality of image representations," in CVPR, 2019.
[2] S. Gong, X. Liu, and A. K. Jain, "Jointly de-biasing face recognition and demographic attribute estimation," in ECCV, 2020.
[3] D. Deb, S. Wiper, S. Gong, Y. Shi, C. Tymoszek, A. Fletcher, and A. K. Jain, "Face recognition: Primates in the wild," in BTAS, IEEE, 2018.
[4] S. Gong, Y. Shi, N. D. Kalka, and A. K. Jain, "Video face recognition: Component-wise feature aggregation network (C-FAN)," in ICB, IEEE, 2019.
[5] S. Gong, Y. Shi, and A. Jain, "Low quality video face recognition: Multi-mode aggregation recurrent network (MARN)," in ICCVW, 2019.
[6] S. Gong, X. Liu, and A. K. Jain, "Mitigating face recognition bias via group adaptive classifier," in CVPR, 2021.
[7] S. Gong, V. N. Boddeti, and A. K. Jain, "On the capacity of face representation," arXiv preprint arXiv:1709.10433, 2017.

BIBLIOGRAPHY

[1] D. Granata and V. Carnevale, "Accurate estimation of the intrinsic dimension using graph distances: Unraveling the geometric complexity of datasets," Scientific Reports, vol. 6, p. 31377, 2016.