FACE RECOGNITION: REPRESENTATION, INTRINSIC DIMENSIONALITY, CAPACITY, AND DEMOGRAPHIC BIAS

By

Sixue Gong

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computer Science - Doctor of Philosophy

2021

ABSTRACT

Face recognition is a widely adopted technology with numerous applications, such as mobile phone unlock, mobile payment, surveillance, social media, and law enforcement. There has been tremendous progress in enhancing the accuracy of face recognition systems over the past few decades, much of which can be attributed to deep learning. Despite this progress, several fundamental problems in face recognition remain unsolved. These problems include finding a salient representation, estimating intrinsic dimensionality, representation capacity, and demographic bias. With growing applications of face recognition, the need for an accurate, robust, compact, and fair representation is evident.

In this thesis, we first develop algorithms to obtain practical estimates of the intrinsic dimensionality of face representations, and propose a new dimensionality reduction method to project feature vectors from the ambient space to the intrinsic space. Based on the study of intrinsic dimensionality, we then estimate the capacity of a face representation, casting the face capacity estimation problem in the information-theoretic framework of the capacity of a Gaussian noise channel. Numerical experiments on unconstrained faces (IJB-C) provide a capacity upper bound of 2.7 × 10^4 for the FaceNet representation and 8.4 × 10^4 for the SphereFace representation at 1% FAR.
In the second part of the thesis, we address the demographic bias problem in face recognition systems, where errors are lower on certain cohorts belonging to specific demographic groups. We propose two de-biasing frameworks that extract feature representations to improve fairness in face recognition. Experiments on benchmark face datasets (RFW, LFW, IJB-A, and IJB-C) show that our approaches are able to mitigate face recognition bias on various demographic groups (biasness drops from 6.83 to 5.07) as well as maintain competitive performance (i.e., 99.75% on LFW, and 93.70% TAR @ 0.1% FAR on IJB-C). Lastly, we explore the global distribution of deep face representations, derived from correlations between within-class and cross-class image samples, to enhance the discriminativeness of the face representation of each identity in the embedding space. Our new approach to face representation achieves state-of-the-art performance on both verification and identification tasks on benchmark datasets (99.78% on LFW, 93.40% on CPLFW, 98.41% on CFP-FP, 96.2% TAR @ 0.01% FAR and 95.3% Rank-1 accuracy on IJB-C). Since the primary techniques we employ in this dissertation are not specific to faces, we believe our research can be extended to other problems in computer vision, for example, general image classification and representation learning.

Copyright by
SIXUE GONG
2021

Dedicated to my parents, Qiaojie Gong and Yan Huang

ACKNOWLEDGMENTS

The years at MSU have been an unforgettable experience that has been engraved on my heart, full of joys and sorrows, partings and meetings. It is a comfort to remember and to thank my teachers, colleagues, friends, and family whose influence contributed to this thesis.
First and foremost, I would like to thank my advisor, Anil K. Jain, for his extraordinary support, patience, guidance, and for funding my entire research on face recognition. His suggestions and our discussions contributed immensely to this dissertation. Every summer semester, Dr. Jain encouraged and helped me to find internships, which enriched my working experience with industry and also inspired this dissertation research. Moreover, he allowed me the freedom to explore directions in my own research, and to manage my work-life balance by spending time on my hobbies. I have come to appreciate the wisdom of his way, which has guided me effectively and safely through the process.

I owe many inspirational ideas to Vishnu Naresh Boddeti and Xiaoming Liu, who have been like co-advisors to me. It has been a privilege to work with Dr. Boddeti and Dr. Liu, which strengthened my research ability in the fields of computer vision and deep learning. This thesis would not have been possible without Dr. Boddeti's and Dr. Liu's guidance and contributions. I deeply appreciate their enthusiasm for novel approaches and their genuinely positive attitude towards science and my research progress.

I am very thankful to Professor Yuan Zhang, from whom I learned a tremendous amount by taking her undergraduate courses on information theory and data compression in my senior year. Working with Professor Zhang also fostered my interest in research. She later introduced me to the Key Intelligent Information Processing Laboratory of the Chinese Academy of Sciences, where I had the opportunity to work with Professors Hu Han and Shiguang Shan. I highly value the many useful ideas, comments, and practical advice from Professor Han and Professor Shan. Had I not worked with Professor Han and Professor Shan, it is unlikely I would have ended up in the PRIP lab working with Dr. Jain.

I thank all of the members of the PRIP lab for participating in my research and providing valuable feedback on my work at all times. I am very grateful for their friendship and support.

I thank all my friends in East Lansing. They are like my extended family here - the members of the PRIP lab (Debayan Deb, Joshua Engelsma, Yichun Shi, Kai Cao, Tarang Chugh, Inci M.
Baytas, Vishesh Mistry, Divyansh Aggarwal, and Steven Grosz), Dipti Kamath, Manni Liu, Lan Wang, Rahul Dey, and Xiaoxue Wang - for parties, dinners, movies, games, and loving friendship over the years.

I thank my entire family for their love and support, especially my parents, Qiaojie Gong and Yan Huang, my aunt, Baolan Gong, and my cousin, Ruozhu Chen.

Finally, I would also like to thank those who have brought suffering to me. I have learned important lessons in life and built resilience by embracing adversity as a great teacher. It offered me valuable chances to gain treasured insights and wisdom, and to prepare myself for a better opportunity of success next time.

TABLE OF CONTENTS

LIST OF TABLES . . . xi
LIST OF FIGURES . . . xiv
LIST OF ALGORITHMS . . . xx
KEY TO ABBREVIATIONS . . . xxi

Chapter 1  Introduction: The Importance of Face Representation . . . 1
  1.1  Automated Face Recognition . . . 3
    1.1.1  Input Data Source . . . 3
    1.1.2  Evaluation . . . 5
    1.1.3  Face Recognition Pipeline . . . 7
  1.2  Face Representation Extractors . . . 8
  1.3  Manifold of Face Representation . . . 11
    1.3.1  Intrinsic Dimensionality . . . 11
    1.3.2  Capacity . . . 13
  1.4  Bias in Face Recognition . . . 15
  1.5  Face Representation via Graph Neural Network . . . 18
  1.6  Thesis Contributions . . . 20
  1.7  Thesis Structure . . . 22

Chapter 2  The Intrinsic Dimensionality of Face Representation . . . 23
  2.1  Intrinsic Dimensionality . . . 24
  2.2  Dimensionality Reduction . . . 25
  2.3  Our Approach . . . 25
    2.3.1  Estimating Intrinsic Dimensionality . . . 26
    2.3.2  Estimating Intrinsic Space . . . 30
  2.4  Experiments . . . 31
    2.4.1  Intrinsic Dimensionality Estimation . . . 31
    2.4.2  Intrinsic Space Mapping . . . 36
  2.5  Conclusion . . . 43

Chapter 3  The Capacity of Face Representation . . . 44
  3.1  Related Work . . . 46
  3.2  Capacity of Face Representations . . . 48
    3.2.1  Face Representation Model . . . 48
    3.2.2  Estimating Uncertainties in Representations . . . 50
    3.2.3  Manifold Approximation . . . 54
    3.2.4  Decision Theory and Model Capacity . . . 55
  3.3  Numerical Experiments . . . 58
    3.3.1  Two-Dimensional Toy-Example . . . 59
    3.3.2  Datasets and Face Representation Model . . . 60
    3.3.3  Face Recognition Performance . . . 61
    3.3.4  Face Representation Capacity . . . 63
    3.3.5  Ablation Studies . . . 66
  3.4  Conclusion . . . 68

Chapter 4  The Bias in Face Recognition . . . 70
  4.1  Fairness Learning and De-biasing Algorithms . . . 71
  4.2  Problem Definition . . . 72
  4.3  Jointly De-biasing Face Recognition and Demographic Attribute Estimation . . . 73
    4.3.1  Adversarial Learning and Disentangled Representation . . . 75
    4.3.2  Methodology . . . 75
      4.3.2.1  Algorithm Design . . . 75
      4.3.2.2  Network Architecture . . . 77
      4.3.2.3  Adversarial Training and Disentanglement . . . 78
    4.3.3  Experiments . . . 80
      4.3.3.1  Datasets and Pre-processing . . . 80
      4.3.3.2  Implementation Details . . . 81
      4.3.3.3  De-biasing Face Verification . . . 81
      4.3.3.4  De-biasing Demographic Attribute Estimation . . . 83
      4.3.3.5  Analysis of Disentanglement . . . 85
      4.3.3.6  Face Verification on Public Testing Datasets . . . 87
      4.3.3.7  Distributions of Scores . . . 88
  4.4  Mitigating Face Recognition Bias via Group Adaptive Classifier . . . 89
    4.4.1  Adaptive Neural Networks . . . 91
    4.4.2  Methodology . . . 93
      4.4.2.1  Overview . . . 93
      4.4.2.2  Adaptive Layer . . . 93
      4.4.2.3  Automation Module . . . 96
      4.4.2.4  De-biasing Objective Function . . . 96
    4.4.3  Experiments . . . 97
      4.4.3.1  Results on RFW Protocol . . . 98
      4.4.3.2  Results on Gender and Race Groups . . . 104
      4.4.3.3  Analysis on Intrinsic Bias and Data Bias . . . 106
      4.4.3.4  Results on Standard Benchmark Datasets . . . 107
      4.4.3.5  Visualization and Analysis on Bias of FR . . . 108
      4.4.3.6  Network Complexity and FLOPs . . . 109
  4.5  Demographic Estimation . . . 109
  4.6  Conclusion . . . 111

Chapter 5  Adversarial Face Representation Learning via Graph Classification . . . 113
  5.1  Adversarial Learning and Graph Classification with GNN . . . 113
  5.2  Our Approach . . . 115
    5.2.1  Overall Framework . . . 115
    5.2.2  Graph Construction . . . 117
    5.2.3  Discriminator and Adversarial Learning . . . 118
    5.2.4  Network Training . . . 120
  5.3  Experiments . . . 122
    5.3.1  Datasets and Implementation Details . . . 122
    5.3.2  Ablation Study . . . 123
    5.3.3  Comparisons with SOTA Methods . . . 125
    5.3.4  Analysis on Feature Distribution . . . 128
  5.4  Concluding Remarks . . . 130

Chapter 6  Summary and Future Work . . . 131

APPENDIX . . . 137

BIBLIOGRAPHY . . . 139

LIST OF TABLES

Table 2.1  Intrinsic Dimensionality: Graph Distance [1] . . . 35
Table 2.2  Intrinsic Dimensionality: KNN [2] . . . 35
Table 2.3  Intrinsic Dimensionality: IDEA [3] . . . 36
Table 2.4  LFW Face Verification for SphereFace Embedding . . . 38
Table 2.5  DeepMDS Training Methods (TAR @ 0.1% FAR) . . . 40
Table 3.1  Capacity of Two-Dimensional Toy Example at 1% FAR . . . 59
Table 3.2  Face Recognition Results for FaceNet, SphereFace and State-of-the-Art (the state-of-the-art face representation models are not available in the public domain) . . . 63
Table 3.3  Capacity of Face Representation Model at 1% FAR . . . 64
Table 3.4  IJB-C Capacity at 1% FAR Across Intra-Class Uncertainty . . . 66
Table 3.5  IJB-C Capacity at 1% FAR Across Manifold Support . . . 68
Table 4.1  Statistics of training and testing datasets used in the paper . . . 80
Table 4.2  Biasness of Face Recognition and Demographic Attribute Estimation . . . 84
Table 4.3  Demographic Classification Accuracy (%) by face features . . . 87
Table 4.4  Face Verification Accuracy (%) on RFW dataset . . . 87
Table 4.5  Verification Performance on LFW, IJB-A, and IJB-C . . . 88
Table 4.6
Performance comparison with SOTA on the RFW protocol [4]. The results marked by (*) are directly copied from [5] . . . 99
Table 4.7  Ablation of adaptive strategies on the RFW protocol [4] . . . 102
Table 4.8  Ablation of CNN depths and demographics on RFW protocol [4] . . . 102
Table 4.9  Ablations on λ on RFW protocol (%) . . . 103
Table 4.10  Verification Accuracy (%) of 5-fold cross-validation on 8 groups of RFW [4] . . . 103
Table 4.11  Ablations on the automation module on RFW protocol (%) . . . 104
Table 4.12  Statistics of dataset folds in the cross-validation experiment . . . 104
Table 4.13  Verification (%) on gender groups of IJB-C (TAR @ 0.1% FAR) . . . 105
Table 4.14  Verification accuracy (%) on the RFW protocol [4] with varying race/ethnicity distribution in the training set . . . 106
Table 4.15  Verification performance on LFW, IJB-A, and IJB-C. [Key: Best, Second, Third Best] . . . 107
Table 4.16  Distribution of ratios between minimum inter-class distance and maximum intra-class distance of face features in 4 race groups of RFW. GAC exhibits higher ratios, and more similar distributions to the reference . . . 108
Table 4.17  Network complexity and inference time . . . 109
Table 4.18  Gender distribution of the datasets for gender estimation . . . 110
Table 4.19  Race distribution of the datasets for race estimation . . . 111
Table 4.20  Age distribution of the datasets for age estimation . . . 111
Table 5.1  Verification performance (%) of different vertex feature matrices of oracle graphs. A bigger r tolerates more intra-class variations, while a small r, f_i^c, or f_i^p strives for minimal intra-class variation. A balance between the learning capability and ideal representations performs best (r = 0.7) . . . 123
Table 5.2  Verification performance (%) of different adjacency matrices of generated graphs . . . 125
Table 5.3  Verification performance (%) of different λ and μ . . . 125
Table 5.4  Verification accuracy (%) of our model and SOTA methods on LFW, CPLFW, and CFP-FP. The results marked by (*) are re-implemented by ArcFace [6]. All other baseline results are reported by their respective papers. [Keys: Red: Best, Blue: Second best] . . . 126
Table 5.5  Comparisons of verification performance with SOTA methods on IJB-A, IJB-B, and IJB-C. The evaluation is measured by TAR (%), True Acceptance Rate, at a certain FAR, False Acceptance Rate. For IJB-A, FAR = 0.1%; for IJB-B and IJB-C, FAR = 0.01%. The decimal precision of TAR varies among those reported by SOTA methods. Results reported in this table are unified to one decimal place (0.1). All baseline results are reported by their respective papers. [Keys: Red: Best, Blue: Second best] . . . 127
Table 5.6  Comparisons of face identification performance (%) on the IJB-C dataset (close-set) . . . 128

LIST OF FIGURES

Figure 1.1  Example images from different datasets. MS-Celeb-1M, CASIA, and LFW contain only still images. IJB-A, IJB-B, and IJB-C include both still images and video frames. Three subjects are selected from each dataset, and each row contains images belonging to one subject . . . 5
Figure 1.2  A typical pipeline of feature-based FR frameworks comprises face detection, alignment, normalization, feature extraction, and feature matching. While each of these components affects the performance of FR systems, in this thesis, we focus on the representation function of feature extraction that maps a high-dimensional normalized face image to a d-dimensional vector representation . . . 8
Figure 1.3  The error tradeoff characteristics in the form of false non-match rates (FNMR = 1 - TAR) vs. false match rates (FMR = FAR) for one commercial FR
algorithm verifying mugshot images, reported by the NIST (National Institute of Standards and Technology) Face Recognition Vendor Test [7]. The FMR estimates are computed on impostor pairs of face images with the same gender and same race. Each symbol (circle, triangle, square) corresponds to a fixed threshold; their vertical and horizontal displacements reveal, respectively, differences in FNMR and FMR between demographic groups [7] . . . 15
Figure 1.4  False positive differentials in verification algorithms provided to NIST. The dots give the false match rates for same-sex and same-race impostor comparisons. The threshold is set for each algorithm to give FMR = 0.0001 on white males (the purple dots in the right-hand panel). The algorithms are sorted in order of worst-case FMR [7] . . . 16
Figure 2.1  Intrinsic Dimension: Our approach is based on two observations. (a) Graph-induced geodesic distance between images is able to capture the topology of the image representation manifold more reliably. As an illustration, we show the graph edges for the surface of a unitary hypersphere and a face manifold of ID two, embedded within a 3-dim space. (b) The distribution of the geodesic distances (for distances r_max/2 ≤ r ≤ r_max, where r_max is the distance at the mode) has been empirically observed [1] to be similar across different topological structures with the same intrinsic dimensionality. The plot shows the distance distribution for a face representation, a unitary hypersphere, and a Gaussian distribution of ID two embedded within a 3-dim space. [8] . . . 27
Figure 2.2  DeepMDS Mapping: A DNN-based non-linear mapping is learned to transform the ambient space to a plausible intrinsic space. The network is optimized to preserve distances between pairs of points in the ambient and intrinsic space . . . 30
Figure 2.3  Intrinsic Dimensionality: (a) Geodesic distance distribution, and (b) global minimum of RMSE . . . 32
Figure 2.4  Distribution of geodesic distances for different representation models and datasets . . . 33
Figure 2.5  log(p̂_M(r)/p̂_M(r_max)) vs. log(r/r_max) plots as we vary the number of neighbors k for the SphereFace representation model on different datasets . . . 34
Figure 2.6  Intrinsic Dimensionality of Swiss Roll . . . 36
Figure 2.7  Swiss Roll: (a) the original 2,000 points from the swiss roll manifold, (b) the 2-dim intrinsic space estimated by Isomap, and (c) the 2-dim intrinsic space estimated by our proposed method DeepMDS. In both cases, the blue and black points, and correspondingly the green and red points, are close together in both the intrinsic and ambient space . . . 37
Figure 2.8  DeepMDS: Face Verification on IJB-C [9] (TAR @ 0.1% FAR in legend) for the (a) FaceNet-128, (b) FaceNet-512, and (c) SphereFace embeddings . . . 38
Figure 2.9  DeepMDS: Face Verification on the LFW (BLUFR) dataset for the (a) FaceNet-128, (b) FaceNet-512, and (c) SphereFace embeddings . . . 39
Figure 2.10  PCA: Face Verification on the IJB-C and LFW (BLUFR) datasets for the (a) FaceNet-128, (b) FaceNet-512, and (c) SphereFace embeddings . . . 40
Figure 2.11  Isomap: Face Verification on the IJB-C and LFW (BLUFR) datasets for the (a) FaceNet-128, (b) FaceNet-512, and (c) SphereFace embeddings . . . 41
Figure 2.12  ROC curves on the LFW and IJB-C datasets for the InceptionResNetV1 [10] model trained with different embedding dimensionalities on the CASIA-WebFace [11] dataset . . . 42
Figure 2.13  Denoising Autoencoder: Face Verification on the IJB-C and LFW (BLUFR) datasets for the (a) FaceNet-128, (b) FaceNet-512, and (c) SphereFace embeddings . . . 42
Figure 3.1  An illustration of the geometrical structure of our capacity estimation problem: a low-dimensional manifold M ⊂ R^m embedded in a high-dimensional space P ⊂ R^p.
On this manifold, all the faces lie inside the population hyper-ellipsoid, and the embeddings of images belonging to each identity or class are clustered into their own class-specific hyper-ellipsoids. The capacity of this manifold is the number of identities (class-specific hyper-ellipsoids) that can be packed into the population hyper-ellipsoid within an error tolerance or amount of overlap . . . 45
Figure 3.2  Overview of Face Representation Capacity Estimation: We cast the capacity estimation process in the framework of the sphere packing problem on a low-dimensional manifold. To generalize the sphere packing problem, we replace spheres by hyper-ellipsoids, one per class (subject). Our approach involves three steps: (i) unfolding and mapping the manifold embedded in high-dimensional space onto a low-dimensional space; (ii) a Teacher-Student model to obtain explicit estimates of the uncertainty (noise) in the embedding due to the data as well as the parameters of the representation; and (iii) the uncertainty estimates are leveraged to approximate the density manifold via multivariate normal distributions (to keep the problem and its analysis tractable), which in turn facilitates an empirical estimate of the capacity of the teacher face representation as a ratio of hyper-ellipsoidal volumes . . . 49
Figure 3.3  Manifold Unfolding: A DNN-based non-linear mapping is learned to unfold and project the population manifold into a lower-dimensional space. The network is optimized to preserve the geodesic distances between pairs of points in the high- and low-dimensional space . . . 50
Figure 3.4  Decision Theory and Capacity: We illustrate the relation between capacity and the discriminant function corresponding to a nearest neighbor classifier. Left: Depiction of the notion of decision boundary and probability of false accept between two identical one-dimensional Gaussian distributions. Shannon's definition of capacity corresponds to the decision boundary being one standard deviation away from the mean. Right: Depiction of the decision boundary induced by the discriminant function of a nearest neighbor classifier. Unlike in the definition of Shannon's capacity, the size of the ellipsoidal decision boundary is determined by the maximum acceptable false accept rate. The probability of false acceptance can be computed through the cumulative distribution function of a χ²(r², d) distribution . . . 55
Figure 3.5  Sample Representation Space: Illustration of a two-dimensional space where the underlying population and class-specific representations (we show four classes) are 2-D Gaussian distributions (solid ellipsoids). Samples from the classes (colored markers) are utilized to obtain estimates of this underlying population and the class-specific distributions (solid lines). As a comparison, the support of the samples in the form of a convex hull is also shown (dashed lines) . . . 59
Figure 3.6  Face recognition performance of the original and student models on different datasets. We report the face verification performance of both the FaceNet and SphereFace face representations: (a) LFW evaluated through the BLUFR protocol, (b) IJB-A, (c) IJB-B, and (d) IJB-C evaluated through their respective matching protocols . . . 63
Figure 3.7  Capacity estimates across different datasets for the (a) FaceNet [12] and (b) SphereFace [13] representations as a function of different false accept rates. Under the limit, the capacity tends to zero as the FAR tends to zero. Similarly, the capacity tends to 1 as the FAR tends to 1.0. (c) Logarithmic values of capacity on different datasets versus the corresponding TAR @ 0.1% FAR . . . 65
Figure 3.8  Example images of classes that correspond to different sizes of the class-specific hyper-ellipsoids, based on the SphereFace representation, for the different datasets considered. Top Row: Images of the class with the largest class-specific hyper-ellipsoid for each database. Notice that in the case of a database with predominantly frontal faces (LFW), large variations in facial appearance lead to the greatest uncertainty in the class representation. On more challenging datasets (IJB-B, IJB-C), the face representation exhibits the most uncertainty due to pose variations.
Bottom Row: Images of the class with the smallest class-specific hyper-ellipsoid for each database. As expected, across all the datasets, frontal face images with minimal change in appearance result in the least amount of uncertainty in the class representation . . . 67
Figure 4.1  Methods to learn different tasks simultaneously. Solid lines are typical feature flow in a CNN, while dashed lines are adversarial losses . . . 74
Figure 4.2  Overview of the proposed De-biasing face (DebFace) network. DebFace is composed of three major blocks, i.e., a shared feature encoding block, a feature disentangling block, and a feature aggregation block. The solid arrows represent the forward inference, and the dashed arrows stand for adversarial training. During inference, either DebFace-ID (i.e., f_ID) or DemoID can be used for face matching given the desired trade-off between biasness and accuracy . . . 76
Figure 4.3  Face Verification AUC (%) on each demographic cohort. The cohorts are chosen based on the three attributes, i.e., gender, age, and race. To fit the results into a 2D plot, we show the performance of male and female separately. Due to the limited number of face images in some cohorts, their results are gray cells . . . 82
Figure 4.4  The overall performance of face verification AUC (%) on gender, age, and race . . . 83
Figure 4.5  Classification accuracy (%) of demographic attribute estimation on faces of different cohorts, by DebFace and the baselines. For simplicity, we use DebFace-G, DebFace-A, and DebFace-R to represent the gender, age, and race classifiers of DebFace . . . 84
Figure 4.6  The distribution of face identity representations of BaseFace and DebFace. Both collections of feature vectors are extracted from images of the same dataset. Different colors and shapes represent different demographic attributes. Zoom in for details . . . 85
Figure 4.7  Reconstructed images using face and demographic representations. The first row is the original face images. From the second row to the bottom, the face images are reconstructed from 2) BaseFace; 3) DebFace-ID; 4) DebFace-G; 5) DebFace-R; 6) DebFace-A. Zoom in for details . . . 85
Figure 4.8  The percentage of falsely accepted cross-race or cross-age pairs at 1% FAR . . . 87
Figure 4.9  BaseFace and DebFace distributions of the similarity scores of the impostor pairs across homogeneous versus heterogeneous gender, age, and race categories . . . 89
Figure 4.10  (a) Our proposed group adaptive classifier (GAC) automatically chooses between non-adaptive and adaptive layers in a multi-layer network, where the latter uses demographic-group-specific kernels and attention. (b) Compared to the baseline with the 50-layer ArcFace backbone, GAC improves face verification accuracy in most groups of the RFW dataset [4], especially under-represented groups, leading to mitigated FR bias. GAC reduces biasness from 1.11 to 0.60 . . . 90
Figure 4.11  A comparison of approaches in adaptive CNNs . . . 92
Figure 4.12  Overview of the proposed GAC for mitigating FR bias. GAC contains two major modules: the adaptive layer and the automation module. The adaptive layer consists of adaptive kernels and attention maps. The automation module is employed to decide whether a layer should be adaptive or not . . . 94
Figure 4.13  ROC of (a) baseline and (b) GAC evaluated on all pairs of RFW . . . 100
Figure 4.14  8 false positive and false negative pairs on RFW given by the baseline but successfully verified by GAC . . . 101
Figure 4.15  (a) For each of the three τ in automatic adaptation, we show the average similarities of pair-wise demographic kernel masks, i.e., θ, at 1-48 layers (y-axis), and 1-15 training steps (x-axis). The number of adaptive layers in the three cases, i.e., Σ₁⁴⁸ 1(θ > τ) at the 15th step, are 12, 8, and 2, respectively. (b) With two race groups (White, Black in PCSO [14]) and two models (baseline, GAC), for each of the four combinations, we compute the pair-wise correlation of face representations using any two of 1K subjects in the same race, and plot the histogram of correlations. GAC reduces the difference/bias of the two distributions . . . 102
Figure 4.16  The first row shows the average faces of different groups in RFW. The next two rows show gradient-weighted class activation heatmaps [15] at the 43rd convolutional layer of the GAC and baseline. The higher diversity of heatmaps in GAC shows the variability of parameters in GAC across groups . . . 104
Figure 4.17  Demographic Attribute Classification Accuracy on each group. The red dashed line refers to the average accuracy on all images in the testing set . . . 110
Figure 5.1  (a) In GANs, during training an image generator gradually produces higher-quality faces so that a CNN-based discriminator cannot distinguish fake from real faces. (b) Analogously, given input faces, our embedding network for face recognition learns to extract discriminative features and connect features as a graph, with the goal that a graph neural network (GNN)-based discriminator cannot distinguish generated graphs from oracle graphs - the graphs of ideal face representations. During inference, our embedding network can extract more discriminative features that form an oracle-like graph, just like a GAN's generator synthesizes photo-realistic faces.
. . . 114
Figure 5.2  Overview of the proposed adversarial face representation learning via graph classification. Solid arrows present the forward pass, and dashed arrows denote backward propagation. The training alternates between E(·) and D(·). For the shared inference (solid blue arrows), a set of face images is first taken by the embedding network E(·) to extract feature representations. These feature vectors are then converted into a graph structure by the graph constructor G(·), in which an oracle graph and a generated graph are constructed. During the training of D(·) (yellow arrows), the two types of graphs are received by the graph discriminator D(·), which is required to make predictions on the category of the graphs. D(·) is then updated based on the gradient sent back from the loss function L_D. In the course of training E(·) (red arrows), only generated graphs are delivered to D(·), and E(·) receives feedback from L_E, whose goal is to drive D(·) to make errors on generated graphs . . . 116
Figure 5.3  Construction of generated graph and oracle graph. In this example, the input image set comprises 9 images of 3 subjects, 3 images per subject. The image of each subject is surrounded by a circle with a unique color, indicating its identity. Each image is projected to a point in the 2D Euclidean feature space. The following graphs are constructed: (a) a generated graph, where each vertex v_i is represented by its feature vector, with a directed edge from v_i to v_j if v_j is one of the top-2 nearest neighbors of v_i; (b) an oracle graph created by center points, where each vertex is represented by the mean vector of its identity, with a bidirectional edge connecting two vertices of the same subject; (c) a radius constraint is used to allow tolerable intra-subject variations, where vertices move towards the center directions (denoted by dashed arrows) to meet the radius requirement. For the vertex within the radius, the leftmost one in this example, it stays the same. (d) an oracle graph controlled by r, where the distance of each vertex to its center is reduced by the ratio of r, with a bidirectional edge connecting two vertices of the same subject . . . 119
Figure 5.4  Two examples of generated graphs being updated by adversarial learning at 4 instances during the training process . . . 129
Figure 5.5  t-SNE visualization of the face representations in a 2D space. Each identity is represented by a unique color. The initial face representations extracted from CurricularFace, and the updated representations learned via adversarial graph classification, are shown in (a) and (b), respectively
 . . . 130

LIST OF ALGORITHMS

Algorithm 1  Face Representation Capacity Estimation . . . 58

KEY TO ABBREVIATIONS

Acronyms/Abbreviations

FR    Face Recognition
SOTA  State-of-the-art
CCTV  Closed Circuit Television
ROC   Receiver Operating Characteristic
TAR   True Acceptance Rate
FAR   False Acceptance Rate
CMC   Cumulative Match Characteristic
DIR   Detection & Identification Rate
kNN   k nearest neighbors
PCA   Principal Component Analysis
LDA   Linear Discriminant Analysis
LBP   Local Binary Patterns
LQP   Local Quantized Patterns
HOG   Histogram of Oriented Gradients
SIFT  Scale Invariant Feature Transform
DNN   Deep Neural Network
CNN   Convolutional Neural Network
IND   Intrinsic Dimensionality
GAC   Group Adaptive Classifier
MDS   Multidimensional Scaling
RMSE  Root Mean Squared Error
GAN   Generative Adversarial Network
GNN   Graph Neural Network

Chapter 1

Introduction: The Importance of Face Representation

In the biometrics and computer vision communities, face recognition (FR) has emerged as one of the major research fields, focusing on the design of algorithms that can automatically authenticate people's identities based on their digital face images. With the rapid proliferation of face images on social media websites, such as Facebook, Twitter, and Instagram, researchers in the FR community have access to abundant images and videos of human faces, which has rapidly accelerated the development of FR systems and extended their applications. For example, FR systems are widely adopted for security-related applications (e.g., access control, surveillance systems), forensic applications (criminal identity verification), and entertainment applications on desktops and mobile devices, for example, mobile apps for face photo editing. As a convenient authentication tool, FR requires minimal interaction with users and can even operate under uncontrolled environments and at a distance [16]. Compared to other biometric traits (i.e., iris, fingerprints, voice, etc.), in addition to identity, a human face image contains several pieces of useful information, including demographics (gender, age, race/ethnicity), facial expression, and emotion cues. Such rich information, on the other
hand, can undermine the reliability of FR systems. This is because some facial characteristics, such as facial expressions, can deform crucial face features and lead to large intra-person facial variations that are difficult to compensate for. One of the key steps in any FR system is to impart robustness towards challenges caused by variations in face images by extracting salient facial features. Instead of a raw face image, a vector of representative features, also referred to as a face representation, is used to distinguish the identity. A good representation method is capable of reducing intra-person variations while maintaining or even enhancing inter-person differences.

When using deep learning models [6, 13, 17-19] to extract face representations, current state-of-the-art (SOTA) FR systems claim to surpass human capability in certain scenarios [20]. Despite this tremendous progress, some fundamental questions in face representation learning remain:

- How compact can the representation be without any loss in recognition performance?
- Given a face representation, how many identities can it resolve? In other words, what is the capacity of a given face representation?
- Does an FR system generate representations that are equally discriminable for faces in different demographic groups?
- Is it possible to enhance the saliency of the face representation for every subject in a target population by means of the distance distribution in the embedding space?

First, a scientific basis for estimating the compactness and the capacity of a given face representation will not only benefit the evaluation and comparison of different representation methods, but will also benefit the development of compact face representations (small template size) with high search efficiency, and establish an upper bound on the scalability of an automatic FR system. Second, FR systems are known to exhibit biased performance against certain demographic groups [7, 14, 21].
Given the importance of automated FR-driven decisions, deploying biased FR systems, especially for law enforcement, is potentially unethical [22]. Some state and local governments in the United States have curtailed the use of face recognition for these reasons, with cities including San Francisco and Boston enacting their own bans. It is necessary to develop fair and unbiased FR systems to avoid their negative societal impact. While reducing the bias among demographic groups is important, these groups are pre-defined based on one's demographic attributes. It is also crucial to improve the representation for each individual regardless of his or her gender, age, and race/ethnicity. The main goal of this thesis is to develop practical tools to reason about the compactness and capacity of a given face representation, and to design algorithms for fairer and more discriminative face representations of each identity in every demographic group. To begin with, we first give a brief background on automated face recognition. Then we introduce the motivation, goals, and problem domains of each work on automated face recognition addressed in this dissertation in Sec. 1.3 to Sec. 1.5, separately. We also summarize the contributions and the structure of this thesis at the end of the chapter.

1.1 Automated Face Recognition

1.1.1 Input Data Source

As a visual pattern recognition task, FR can be performed on a variety of input data sources, such as 1) a single 2D image, 2) a set of 2D images (video frames), and 3) 3D face images. A single 2D image is often used as the input of face verification systems. In some scenarios, for example, in a surveillance environment, a clip of video captured by CCTV systems is taken as the input of FR systems. Although multiple frames in a video clip provide rich information at different time stamps, unconstrained FR in video-surveillance settings is still a challenging problem, because video frames tend to be of poor quality with motion blur and unfavorable viewing angles. 3D sensors, including depth sensors, may provide extra information and help improve the accuracy of FR, but they are expensive and have relatively long acquisition times. This thesis focuses on the more commonly deployed scenario where a single 2D image is the input data.
A key aspect of deep learning based FR algorithms is the training data used to learn face representations. Data collection and annotation are extremely important for supervised face representation learning. Widely adopted, publicly available face datasets used in this thesis for training and testing face representation models are listed below. Examples of face images in these datasets are shown in Fig. 1.1.

CASIA WebFace [11]: A collection of labeled images downloaded from the web (based on names of famous personalities as keywords), popular for training deep neural networks. It consists of 494,414 images across 10,575 subjects, with an average of about 500 face images per subject. This dataset is primarily used for training face representation models.

MS-Celeb-1M [23]: The first released version of this dataset contained around 10 million images of 100K celebrities. About 100 images were retrieved for each identity by the Bing search engine using the celebrity's name. With no filtering of the retrieved images, the quality of the dataset is greatly muddied by label noise, duplicated images, and non-face images, making it hard to use directly for representation learning [24]. For this reason, there have been several cleaned versions of the dataset ([6, 13, 25, 26]). In this thesis, the version of [6] is used as the dataset for training FR models, which contains 5,822,653 images of 85,742 subjects.

LFW [27]: 13,233 face images of 5,749 subjects, downloaded from the web. These images exhibit limited variations in pose, illumination, and expression, since only faces that could be detected by the Viola-Jones face detector [28] were included in the dataset. One limitation of this dataset is that only 1,680 subjects among the total of 5,749 subjects have more than one face image.
IJB-A [29]: IARPA Janus Benchmark-A (IJB-A) contains 500 subjects with a total of 25,813 images (5,399 still images and 20,414 video frames), an average of 51 images per subject. Compared to the LFW and CASIA datasets, the IJB-A dataset is more challenging due to the presence of: i) large pose variations, making it difficult to detect all the faces using a commodity face detector, ii) a mix of images and videos, and iii) wider geographical variation of subjects. The face locations are provided with the IJB-A dataset (and are used in our experiments in this thesis when needed).

IJB-B [30]: The IARPA Janus Benchmark-B (IJB-B) dataset is a superset of the IJB-A dataset, consisting of 1,845 subjects with a total of 76,824 images (21,798 still images and 55,026 video frames from 7,011 videos), an average of 41 images per subject. Images in this dataset are labeled with ground truth bounding boxes and other covariate meta-data such as occlusions, facial hair, and skin tone. A key motivation for the IJB-B dataset is to make the face dataset less constrained compared to the IJB-A dataset and to have a more uniform geographical distribution of subjects across the globe.

IJB-C [9]: The IARPA Janus Benchmark-C (IJB-C) dataset consists of 3,531 subjects with a total of 31,334 (21,294 face and 10,040 non-face) still images and 11,779 videos (117,542 frames), an average of 39 images per subject. This dataset emphasizes faces with full pose variations, occlusions, and diversity of subject occupation and geographical origin. Images in this dataset are labeled with ground truth bounding boxes and other covariates such as occlusions, facial hair, and skin tone.

Figure 1.1: Example images from different datasets: (a) MS-Celeb-1M [23], (b) CASIA WebFace [11], (c) LFW [27], (d) IJB-A [29], (e) IJB-B [30], (f) IJB-C [9]. MS-Celeb-1M, CASIA, and LFW contain only still images. IJB-A, IJB-B, and IJB-C include both still images and video frames. Three subjects are selected from each dataset, and each row contains images belonging to one subject.

1.1.2 Evaluation

There are many applications where FR techniques are successfully used to perform a specific task. Among those tasks, two primary tasks are considered to evaluate an FR system:

• Verification (authentication): one-to-one match.
• Identification (recognition): one-to-many match.

Verification. The task of verification generally aims at identity authentication with user interaction, to verify whether a given face matches the identity that is claimed. To evaluate a verification system, face images are first divided into two groups: 1) a genuine group, where people gain access using their own identity; 2) an impostor group, where people gain access using false identities. A face image is compared with other face images from the genuine group and the impostor group, respectively, which corresponds to genuine pairs (a pair of face images from the same identity) and impostor pairs (a pair of face images from different identities). A Receiver Operating Characteristic (ROC) curve is utilized to evaluate the FR performance on verification tasks, where the percentage of genuine access is reported as the True Acceptance Rate (TAR) and the percentage of impostors falsely gaining access is reported as the False Acceptance Rate (FAR) for a given match threshold $\gamma$. Let $x^g_1$ and $x^g_2$ denote a genuine pair of face images, and $x^i_1$ and $x^i_2$ denote an impostor pair of face images. Then TAR and FAR can be formulated as [31]:

$$\mathrm{TAR}(\gamma) = \frac{|\{E \mid s(x^g_1, x^g_2) \geq \gamma\}|}{|E|}, \quad \mathrm{FAR}(\gamma) = \frac{|\{M \mid s(x^i_1, x^i_2) \geq \gamma\}|}{|M|}, \qquad (1.1)$$

where $E$ is the set of genuine pairs, $M$ is the set of impostor pairs, and $s(\cdot, \cdot)$ is a given similarity function.

Identification.
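As a concrete illustration of Eq. (1.1), the sketch below computes TAR and FAR from lists of genuine and impostor similarity scores. The function names and the simple rule for picking a threshold at a target FAR are ours, added for illustration; they are not part of the thesis.

```python
import numpy as np

def tar_far(genuine_scores, impostor_scores, threshold):
    """Compute TAR and FAR at one similarity threshold, per Eq. (1.1).

    genuine_scores:  similarities s(x1, x2) for same-identity pairs (set E)
    impostor_scores: similarities for different-identity pairs (set M)
    """
    genuine_scores = np.asarray(genuine_scores)
    impostor_scores = np.asarray(impostor_scores)
    tar = np.mean(genuine_scores >= threshold)   # fraction of genuine pairs accepted
    far = np.mean(impostor_scores >= threshold)  # fraction of impostors accepted
    return tar, far

def threshold_at_far(impostor_scores, target_far):
    """Threshold at the k-th highest impostor score so that FAR <= target_far
    (ties among impostor scores may admit slightly more)."""
    scores = np.sort(np.asarray(impostor_scores))[::-1]
    k = int(np.floor(target_far * len(scores)))
    return scores[k - 1] if k > 0 else scores[0] + 1e-12
```

Sweeping the threshold and plotting TAR against FAR traces out the ROC curve described above.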
The task of identification is mostly aimed at identity search without user interaction, for example, in surveillance systems. Similar to the evaluation of verification systems, the identification test is conducted by dividing face images into two groups: 1) probe images whose identities are unknown, denoted as $P$; 2) gallery images that belong to people with known identities, denoted as $G$. In general, each subject has one face image in the gallery. Based on the relationship between the probe and gallery identities, the evaluation is split into two different settings: 1) closed-set identification, where all probe identities are assumed to be among the gallery identities; 2) open-set identification, where probe identities are not necessarily in the gallery. For closed-set identification, a Cumulative Match Characteristic (CMC) curve is used to report the correct identification rate for each cumulative rank (the number of candidates returned). By ordering the similarity scores between a probe image $x_p \in P$ and images in the gallery $x_g \in G$, the rank of $x_p$ is computed as the number of identities in the gallery whose similarity scores are higher than or equal to that of the correct identity $x_{g^*}$:

$$\mathrm{rank}(x_p) = |\{G \mid s(x_g, x_p) \geq s(x_{g^*}, x_p),\ x_g \in G\}| \qquad (1.2)$$

For a given rank $r$, the correct identification rate in a CMC curve is $\frac{|\{P \mid \mathrm{rank}(x_p) \leq r\}|}{|P|}$. In the case of open-set identification, images in the probe set $P$ can be classified into known subjects $S$ (the subjects that appear in the gallery) and unknown subjects $U$ (the subjects that are not in the gallery), i.e., $P = S \cup U$. A Detection and Identification Rate (DIR) curve is plotted to show the correct identification rates with respect to the false acceptance rates (FAR). Here, the definition of FAR is different from that in an ROC curve. For a given threshold $\theta$, a false acceptance occurs when the similarity of an unknown probe $x_p \in U$ to one of the gallery subjects is higher than $\theta$. The FAR computes the average probability as [32]:

$$\mathrm{FAR}(\theta) = \frac{|\{P \mid \max_{G} s(x_g, x_p) \geq \theta,\ x_p \in U\}|}{|U|}. \qquad (1.3)$$

The DIR for a given rank $r$ is calculated on the known probe set $S$ as [32]:

$$\mathrm{DIR}(\theta, r) = \frac{|\{P \mid \mathrm{rank}(x_p) \leq r \wedge s(x_{g^*}, x_p) \geq \theta,\ x_p \in S\}|}{|S|}. \qquad (1.4)$$

1.1.3 Face Recognition Pipeline

A general FR system consists of four steps, i.e., face detection, alignment and normalization, feature extraction, and feature matching. Fig. 1.2 shows a typical pipeline of feature-based FR systems. First, the face area is localized in an input image, and the face region is cropped from the original image. Next, the cropped face image is aligned (typical alignments include translation, rotation, and scaling) based on the detected facial landmarks (key points on the face, including the eyebrows, eyes, nose, mouth, and jaw silhouette). To reduce the effects of illumination variations, the aligned face images need to be normalized before being used for feature extraction. Then, in the representation stage, we extract a compact set of discriminating geometrical and/or photometrical features of the face, so that each face image is represented and stored as a $d$-dimensional feature vector. Finally, using a distance metric, the similarity between two feature vectors is measured, also known as the similarity score, to verify whether two face images belong to the same person (face verification), or to identify the identity of the face image by assigning it to the label of the nearest gallery image (rank-1 face identification).

Figure 1.2: A typical pipeline of feature-based FR frameworks comprises face detection, alignment, normalization, feature extraction, and feature matching. While each of these components affects the performance of FR systems, in this thesis, we focus on the representation function of feature extraction that maps a high-dimensional normalized face image to a d-dimensional vector representation.

Since our central focus is on face representations, we refer to the steps before feature extraction as data pre-processing. In this thesis, we mainly employ two facial landmark detection algorithms, MTCNN [33] and RetinaFace [34], to detect and align all faces in our training and testing datasets. In particular, each face is cropped from the detected face region and resized to the same size using a similarity transformation based on the detected five facial landmarks, i.e., the left eye center, right eye center, nose tip, left mouth corner, and right mouth corner.
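The closed-set quantities of Eq. (1.2) and the CMC curve can be sketched in a few lines; this is an illustrative implementation with our own function names, not code from the thesis.

```python
import numpy as np

def closed_set_rank(similarities, true_index):
    """Rank of the correct gallery identity for one probe, per Eq. (1.2).

    similarities: similarity score of the probe against every gallery template
    true_index:   position of the probe's true identity in the gallery
    Rank 1 means the correct identity scored at least as high as all others.
    """
    s_true = similarities[true_index]
    return int(np.sum(np.asarray(similarities) >= s_true))

def cmc(all_similarities, true_indices, max_rank):
    """Cumulative Match Characteristic: fraction of probes with rank <= r."""
    ranks = np.array([closed_set_rank(s, t)
                      for s, t in zip(all_similarities, true_indices)])
    return [float(np.mean(ranks <= r)) for r in range(1, max_rank + 1)]
```

The open-set DIR of Eq. (1.4) adds one further condition to the rank test: the correct-match score must also exceed the threshold $\theta$.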
1.2 Face Representation Extractors

The subject of FR is as old as the field of computer vision [35]. Unsurprisingly, face recognition has received tremendous attention in the computer vision and biometrics communities over the past three decades. Here, we present a few notable approaches to learning face representations (hand-crafted and learning-based methods).

Hand-crafted Representation. The first and most well-known global feature extraction method is Eigenfaces [36], in which principal component analysis (PCA) is employed on normalized vectors of training images to find the principal eigenvectors, corresponding to the largest, say k, eigenvalues. The obtained eigenvectors are then used as a seed set to represent other face images (not in the training set) via a linear projection operation. To utilize the information when each subject has more than one image in the training set and the images are labeled by subject ID, Fisherfaces [37] uses Fisher's Linear Discriminant Analysis (LDA) to minimize intra-class variations (among images of the same person) while maximizing inter-class differences (among images belonging to different people). Besides global features, a variety of face representation methods were proposed to encode local facial components such as the eyes, nose, and mouth. For example, the complex coefficients of Gabor filters [38] (also known as Gabor wavelets) are used to encode both facial shape and local appearance features. In the work of [39], face images are represented by utilizing the bunch graph to collect information from Gabor wavelet convolution values at each facial landmark location. In contrast to Gabor filters, intensity-based local elementary descriptors have also been used to represent face images due to their efficient computation and insensitivity to partial occlusion and pose changes; these include Local Binary Patterns (LBP) [40], Local Phase Quantization (LPQ) [41], Histogram of Oriented Gradients (HOG) [42], and Scale Invariant Feature Transform (SIFT) [43].

Learning-based Representation.
Given the success of deep neural networks (DNNs) in the ImageNet competition [44], DNN-based representation learning has contributed to massive strides in FR capabilities [9, 45]. The defining characteristic of such methods is the use of a convolutional neural network (CNN) based feature extractor, a learnable embedding function comprised of several sequential linear and non-linear operators [44]. Among the first attempts at learning face representations using deep learning, Taigman et al. [17] presented DeepFace, which uses a deep CNN trained to classify face identities, demonstrating remarkable performance on the LFW dataset. As an improved version of DeepID [46], DeepID2 [18] employs both verification and identification tasks as supervision signals to learn robust, discriminating representations. Subsequent approaches have explored different loss functions to improve the discriminability of the embedding vector. Researchers from Google [12] use a massive dataset of about 200 million face images of 8 million identities to train a CNN directly for face verification (called FaceNet). They optimize a triplet loss function, which is based on triplets of images comprising a pair of similar and a pair of dissimilar faces. The loss function is formulated as:

$$\mathcal{L}_{triplet} = \sum_{i=1}^{M} \left[ \alpha + d(\mathbf{r}^a_i, \mathbf{r}^+_i) - d(\mathbf{r}^a_i, \mathbf{r}^-_i) \right]_+, \qquad (1.5)$$

where $M$ is the number of triplets; $\mathbf{r}^a$, $\mathbf{r}^+$, and $\mathbf{r}^-$ are the representations of an anchor face image, a positive (same identity as the anchor) face image, and a negative (different identity from the anchor) face image, respectively; $[x]_+ = \max\{0, x\}$; $\alpha$ is a margin parameter; and $d(\cdot)$ is the squared Euclidean distance.
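Eq. (1.5) can be evaluated directly on batches of embeddings; the sketch below is a minimal NumPy version for illustration (the thesis and FaceNet train it inside a deep network, with triplet mining that is omitted here).

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss of Eq. (1.5) with squared Euclidean distance.

    anchor, positive, negative: (M, d) arrays of embeddings for M triplets.
    The hinge [x]_+ keeps only triplets that violate the margin alpha.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # d(r^a, r^+)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # d(r^a, r^-)
    return float(np.sum(np.maximum(0.0, alpha + d_pos - d_neg)))
```

Note that a triplet contributes zero loss once the negative is farther than the positive by at least the margin, which is why hard-triplet mining matters in practice.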
Inspired by DNN-based image classification, a major part of the effort has been made to develop new loss functions on top of the softmax layer, based on the cross-entropy loss:

$$\mathcal{L}_{cross\text{-}entropy}(\mathbf{r}, y; \mathbf{W}, \mathbf{b}) = -\sum_{k=1}^{C} \mathbb{I}(k = y) \log \frac{e^{\mathbf{W}_y^T \mathbf{r} + b_y}}{\sum_{j=1}^{C} e^{\mathbf{W}_j^T \mathbf{r} + b_j}}, \qquad (1.6)$$

where $\mathbf{r}$ and $y$ are the representation and the identity label of an input face image, $\mathbf{W}$ and $\mathbf{b}$ are the parameters of the softmax layer, and $C$ is the number of classes (unique identities) in the training set. Wen et al. [47] propose a center loss that is combined with Eq. (1.6) to reduce the intra-class variations. In the L2-constrained softmax [48], a feature vector $\mathbf{r}$ is first normalized by its $\ell_2$ norm to lie on a hyper-sphere and then scaled by a constant factor. SphereFace [13] introduces the angular softmax (A-softmax), in which the original softmax is modified to directly optimize the angles between $\mathbf{W}_y$ and $\mathbf{r}$, resulting in angularly distributed features. Other softmax modifications enforce extra intra-class concentration and inter-class variance on face features by adding a margin penalty to the decision boundary [6, 19, 49]. In CosFace [19], both the representation $\mathbf{r}$ and the weight vectors $\mathbf{W}$ are $\ell_2$-normalized to compute their cosine similarity, based on which a cosine margin term is introduced to further broaden the decision boundary in an angular space. ArcFace [6] adds an additive angular margin to the angle between the representation $\mathbf{r}$ and its target weight vector $\mathbf{W}_y$ via the arc-cosine function. All these face representation methods share the same objective: increasing inter-class distances and reducing intra-class variations.

1.3 Manifold of Face Representation

A face representation is obtained by an embedding function that transforms the raw pixel representation of the image to a point in a high-dimensional feature space. Learning or estimating such a mapping is motivated by two goals: (a) compactness of the representation, and (b) effectiveness of the mapping for FR. Given the methods introduced in Sec. 1.2, the latter topic has received substantial attention. Yet, there has been little focus on the dimensionality of the representation itself. Another topic related to representation compactness is the capacity of the face representation.
Given a face representation, how many identities can it resolve? In this work, we develop algorithms to estimate the intrinsic dimensionality and capacity of face representations, and design a new dimensionality reduction method to obtain compact representations.

1.3.1 Intrinsic Dimensionality

The dimensionality of face representations extracted from deep networks has ranged from hundreds to thousands of dimensions. For instance, current SOTA face representations have 128, 512, 1,024, and 4,096 dimensions for FaceNet [12], ResNet [50], SphereFace [13], and VGG [51], respectively. The choice of dimensionality is often determined by practical considerations, such as ease of learning the embedding function [52], constraints on system memory, etc., instead of the effective dimensionality needed for the image representation. This naturally raises the following fundamental but related questions: How compact can the representation be without any loss in recognition performance? In other words, what is the intrinsic dimensionality of the representation? Subsequently, how can one obtain such a compact representation?

The intrinsic dimensionality (IND) of a representation refers to the minimum number of parameters (or degrees of freedom) necessary to capture the information present in the representation [53]. Equivalently, it refers to the dimensionality of the $m$-dimensional manifold $\mathcal{M}$ embedded within the $d$-dimensional ambient (representation) space $\mathcal{P}$, where $m \leq d$. This notion of intrinsic dimensionality is notably different from common linear dimensionality estimates obtained through, e.g., PCA. This linear dimension corresponds to the best linear subspace necessary to retain a desired fraction of the variations in the data. In principle, the linear dimensionality can be as large as the ambient dimension if the variation factors are highly entangled with each other.
The ability to estimate the intrinsic dimensionality of a given face representation is useful in a number of ways. At a fundamental level, the IND determines the true capacity and complexity of variations in the data captured by the representation, through the embedding function. In fact, the IND can be used to gauge the information content in the representation, due to its linear relation with Shannon entropy [54, 55]. Also, it provides an estimate of the amount of redundancy built into the representation, which relates to its generalization capability. On a practical level, knowledge of the IND is crucial for devising optimal unsupervised strategies to obtain face features that are minimally redundant, while retaining the full ability to recognize faces of different identities. Recognition in the intrinsic space can provide significant savings, both in memory requirements for the templates as well as in processing time, across downstream tasks like large-scale face matching in the encrypted domain [56]. Lastly, the gap between the ambient and intrinsic dimensionalities of a representation can serve as a useful indicator to drive the development of algorithms that can directly learn highly compact embeddings.

Estimating the IND of a given face representation is, however, a challenging task. Such estimates crucially depend on the density variations in the representation, which are themselves difficult to estimate, as images often lie on a topologically complex curved manifold [57]. More importantly, given an estimate of the IND, how do we verify that it truly represents the dimensionality of the complex high-dimensional representation space? An indirect validation of the IND is possible through a mapping that transforms the ambient representation space to the intrinsic representation space while preserving its discriminative ability. However, there is no certainty that such a mapping can be found efficiently. In practice, finding such mappings can be considerably harder than estimating the IND itself.
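To make the notion of IND estimation concrete, the sketch below implements the classical Levina-Bickel maximum-likelihood k-NN estimator. This is a standard estimator shown only as a stand-in: the thesis's own estimator is topological and based on geodesic distances on the manifold (Ch. 2), which this simple local estimator does not capture.

```python
import numpy as np

def mle_intrinsic_dim(X, k=10):
    """Levina-Bickel maximum-likelihood intrinsic dimension estimate.

    X: (n, d) points sampled from the representation space.
    For each point, the estimate is (k-1) / sum_j log(T_k / T_j), where
    T_j is the distance to the j-th nearest neighbor; the final estimate
    averages over all points.
    """
    # pairwise Euclidean distances (O(n^2) memory; fine for small n)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # exclude each point from its own neighbors
    knn = np.sort(D, axis=1)[:, :k]      # distances to the k nearest neighbors
    logs = np.log(knn[:, -1:] / knn[:, :-1])
    m_hat = (k - 1) / logs.sum(axis=1)
    return float(m_hat.mean())
```

On points sampled from a curve embedded in 3-D, this estimator returns a value near 1, far below the ambient dimension of 3, mirroring the gap between ambient and intrinsic dimensionality discussed above.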
We overcome both of these challenges by (1) adopting a topological dimensionality estimation technique based on the geodesic distance between points on the manifold, and (2) relying on the ability of DNNs to approximate the complex mapping function from the ambient space to the intrinsic space (as we will see in Ch. 2). The latter enables validation of the IND estimates through face matching experiments on the corresponding low-dimensional intrinsic representation of the feature vectors.

1.3.2 Capacity

Consider the following scenario: we would like to deploy an FR system with representation $M$ in a target application that requires a maximum FAR of $q\%$. As subjects are continuously added to the gallery, an intuitively known and empirically observed phenomenon occurs: the FR accuracy starts decreasing. This is primarily due to the fact that, with more subjects and diverse viewpoints, the representations of the classes will no longer be disjoint. In other words, the FR system based on representation $M$ can no longer completely resolve all of the users within the $q\%$ FAR. We define the maximal number of users at which the face representation reaches this limit as the capacity¹ of the representation. Our main contribution in this work is to determine the capacity in an objective manner without the need for empirical evaluation.

The ability to determine this capacity affords the following benefits: (i) statistical estimates of the upper bound on the number of identities the face representation can resolve, which would allow for informed deployment of FR systems based on the expected scale of operation; (ii) an estimate of the maximal gallery size for the face representation without having to exhaustively evaluate the face representation at each scale. Consequently, capacity offers an alternative dataset²-agnostic metric for comparing different face representations.

An attractive solution for estimating the capacity of face representations is to leverage the notion of packing bounds³: the maximal number of shapes that can be fit, without overlapping, within the support of the representation space. A loose bound on this packing problem can be obtained as a ratio of the volume of the support space and the volume of the shape. In the context of face representations, the representation support can be modeled as a low-dimensional population manifold $\mathcal{M} \subset \mathbb{R}^m$ embedded within a high-dimensional representation space $\mathcal{P} \subset \mathbb{R}^p$, while each class⁴ can be modeled as its own manifold $\mathcal{M}_c \subseteq \mathcal{M}$. Under this setting, a bound on the capacity of the representation can be obtained as a ratio of the volumes of the population and class-specific manifolds. However, adopting this approach to obtain empirical estimates of the capacity presents the following challenges:

1. Estimating the support of the population manifold $\mathcal{M}$ and the class-specific manifolds $\mathcal{M}_c$, especially for a high-dimensional embedding such as a face representation (typically several hundred dimensions), is an open problem.

2. Estimating the density of the manifolds while accounting for the different sources of noise is a challenging task. In the context of face representations, all the components of a typical face representation pipeline (see Fig. 1.2) are potential sources of noise.

3. Obtaining reliable estimates of the volume of arbitrarily shaped high-dimensional manifolds (for the capacity bound) is another open problem.

We propose a framework that addresses the aforementioned challenges to obtain reliable estimates of the capacity of any face representation. Our solution relies on: (1) modeling the face representation as a low-dimensional Euclidean manifold embedded within a high-dimensional space, (2) projecting and unfolding the manifold to a low-dimensional space, (3) approximating the population manifold by a multivariate Gaussian distribution (equivalently, a hyper-ellipsoidal support) in the unfolded low-dimensional space, (4) approximating the class-specific manifolds by multivariate Gaussian distributions and estimating their support as a function of the specified FAR, and (5) estimating the capacity as the ratio of the volumes of the population and class-specific hyper-ellipsoids. This work is introduced below in Ch. 3.

¹ This is different from the notion of capacity of a space of functions as measured by its Vapnik-Chervonenkis dimension of linear classifiers.
² A class of datasets as opposed to a specific dataset.
³ A generalization of the well-studied sphere-packing problem.
⁴ In the case of FR, each class is an identity (subject), and the number of classes corresponds to the number of identities.
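The volume-ratio idea in step (5) can be illustrated with Gaussian approximations. The toy sketch below computes the ratio of the volumes of two hyper-ellipsoids (unit-Mahalanobis supports of Gaussians); it ignores the FAR-dependent scaling of the class-specific support and the unfolding steps that the full framework in Ch. 3 requires, and all names are ours.

```python
import numpy as np
from math import lgamma, pi, log

def log_ellipsoid_volume(cov):
    """Log-volume of the unit-Mahalanobis ellipsoid of a Gaussian:
    log V = (d/2) log(pi) - log Gamma(d/2 + 1) + (1/2) logdet(cov)."""
    d = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    return 0.5 * d * log(pi) - lgamma(d / 2 + 1) + 0.5 * logdet

def capacity_estimate(pop_cov, class_cov):
    """Toy capacity bound: ratio of population to class ellipsoid volumes.
    Working in log space avoids overflow in high dimensions."""
    return float(np.exp(log_ellipsoid_volume(pop_cov) - log_ellipsoid_volume(class_cov)))
```

For example, in 2-D, a population covariance four times the class covariance gives a volume ratio of 4: four such classes could be packed (loosely) into the population support.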
Figure 1.3: The error tradeoff characteristics in the form of false non-match rates (FNMR = 1 - TAR) vs. false match rates (FMR = FAR) for one commercial FR algorithm verifying mugshot images, reported by the NIST (National Institute of Standards and Technology) Face Recognition Vendor Test [7]. The FMR estimates are computed on impostor pairs of face images with the same gender and same race. Each symbol (circle, triangle, square) corresponds to a fixed threshold; their vertical and horizontal displacements reveal, respectively, differences in FNMR and FMR between demographic groups [7].

1.4 Bias in Face Recognition

FR systems are known to exhibit discriminatory behaviors against certain demographic groups [7, 14, 21]. Fig. 1.3 shows one commercial FR algorithm that has lower performance on certain demographic groups than others in the 2019 NIST Face Recognition Vendor Test (FRVT) [7]. In fact, all 106 FR algorithms that participated in the NIST FRVT exhibit different levels of biased performance on the gender, race, and age groups of a mugshot dataset (see Fig. 1.4). For all the algorithms listed in Fig. 1.4, we see that the algorithms achieving better performance present less sex/race bias. For example, the difference in FMR between the highest and lowest sex/race groups is less than 0.1% for the best model, shown in the last row of Fig. 1.4. Even so, demographic bias still exists in current FR algorithms. At a time when FR systems are being deployed in the real world for societal benefit, this type of bias⁵ is not acceptable. Note that here, we define FR bias as the uneven recognition performance with respect to demographic groups. It is desirable to design unbiased FR algorithms that maintain fairness in FR performance when deploying this technology for law enforcement and other applications.

Figure 1.4: False positive differentials in verification algorithms provided to NIST. The dots give the false match rates for same-sex and same-race impostor comparisons. The threshold is set for each algorithm to give FMR = 0.0001 on white males (the purple dots in the right-hand panel). The algorithms are sorted in order of worst-case FMR [7].

A natural question arises: Why does the bias problem exist in FR systems?
First, SOTA FR algorithms [6, 13, 19] rely on CNNs trained on large-scale face datasets. The public training datasets for FR, e.g., CASIA WebFace [11], VGGFace2 [59], and MS-Celeb-1M [23], are collected by scraping face images off the web, with inevitable demographic bias [5]. Previous studies have shown that models trained with imbalanced datasets (an unequal number of training samples from different demographic groups) lead to biased discrimination [60, 61]. Similarly, bias in face datasets is transmitted to the FR models through network learning. For example, to minimize the overall loss, a network tends to learn a better representation for faces in the majority group, whose faces dominate the training set, resulting in unequal discriminabilities. The imbalanced distribution of demographics in face data is, nevertheless, not the only trigger of FR bias. Prior work has shown that even when using a demographically balanced dataset [5] or training separate classifiers for each group [14], the performance on some groups is still inferior to that on the others. This reveals bias in the embedding functions. Since the goal of face representation is to map the input face image to a target feature vector with high discriminative power, bias in the mapping function will result in feature vectors with lower discriminability for certain demographic groups. Furthermore, by studying non-trainable FR algorithms, [14] introduced a new notion of inherent bias, i.e., certain groups are inherently more susceptible to errors in the face matching process.

To tackle the dataset-induced bias, data re-sampling methods have been exploited to balance the data distribution by under-sampling the classes with more samples [62] or over-sampling the classes with fewer samples [63, 64]. Despite its simplicity, valuable information may be removed by under-sampling, and over-sampling may introduce noisy samples. Naively training on a balanced dataset can still lead to bias [5]. Another common option for imbalanced data training is cost-sensitive learning, which assigns weights to different classes based on (i) their frequency or (ii) the effective number of samples [65, 66]. Recent imbalanced learning methods focus on novel objective functions for class-skewed datasets. For instance, Dong et al. [67] propose a Class Rectification Loss to incrementally optimize on hard samples of the classes with under-represented attributes. Alternatively, researchers strengthen the decision boundary to impede perturbation from other classes by enforcing margins between hard clusters via adaptive clustering [68], or between rare classes via Bayesian uncertainty estimates [69]. To adapt the aforementioned methods to racial bias mitigation, Wang et al. [5] modify the large-margin based loss functions via reinforcement learning. However, [5] requires two auxiliary networks, an offline sampling network and a deep Q-learning network, to generate an adaptive margin policy for training the FR network, which hinders learning efficiency.

We propose two different approaches to fair representation learning for FR systems. The first approach, called DebFace, utilizes adversarial learning to disentangle a face representation into four components, i.e., gender, age, race/ethnicity, and identity, with each of the four components independent of the others. We de-bias a face representation under the assumption that if no discriminating demographic information is captured by the face representation, it will be unbiased with respect to demographic attributes. More details are given in Ch. 4. The second approach, called GAC (Group Adaptive Classifier), addresses the bias issue in a different way. GAC optimizes face representation learning on every demographic group in a single network via adaptive convolution kernels and channel-wise attention maps, which increases the network's capacity to represent multiple face patterns from different demographic groups.

⁵ This is different from the notion of inductive bias in machine learning, defined as "any basis for choosing one generalization [hypothesis] over another, other than strict consistency with the observed training instances" [58].
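As a concrete instance of cost-sensitive option (ii) above, the sketch below computes per-class weights from the "effective number of samples" formula of the class-balanced loss literature; this is a standard technique shown for illustration, not one of the de-biasing methods proposed in this thesis.

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Per-class loss weights from the effective number of samples,
    E_n = (1 - beta^n) / (1 - beta).

    counts: number of training samples per class.
    Classes with fewer samples receive larger weights, since additional
    samples from a large class are assumed to add diminishing information.
    """
    counts = np.asarray(counts, dtype=float)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(counts)  # normalize to mean 1
```

The resulting weights multiply each class's term in the training loss, so under-represented classes contribute more per sample.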
1.5 Face Representation via Graph Neural Network

As introduced in Sec. 1.2, face representation learning is the process of establishing an embedding function by which a face image is transformed into a high-dimensional feature vector. Among the current SOTA DNN-based approaches, different network architectures and loss functions have been explored to improve FR performance. In early studies [17, 46], deep features are learned via a face classification objective. However, later studies [18, 48] found that a simple classification loss is insufficient to capture discriminative face features, and thus attempted to design appropriate loss functions to enhance the discriminative power of the representations. One direction of research is to directly learn an embedding via metric learning, e.g., contrastive loss [18] and triplet loss [12]. The other line of methods adapts the traditional softmax cross-entropy loss to decrease the intra-class variations [47], or to lengthen the inter-class distances by adding an extra margin between classes in either cosine [19] or angular space [6, 13].

While FR performance has improved, both types of methods have limitations. For the softmax approach and its variants, the dimensionality of the output softmax vector increases linearly with the number of identities in the training set, which may lead to a computational bottleneck. Distance metric learning approaches can avoid this issue by acquiring a feature space where distance corresponds to class similarity. However, a carefully designed scheme is required to select the pair or triplet samples from the tremendous number of combinations in large-scale datasets. Furthermore, both classification-based and distance-based objectives operate merely on each individual sample, or at most a triplet of samples, in the physical distance metric, but ignore the general distribution derived from correlations between within-class and cross-class samples.
To address the aforementioned shortcomings of the FR loss functions, we propose a representation learning method that utilizes graph classification via adversarial training. In contrast to using a metric as the constraint in the feature space, our idea is to create an ideal feature space, referred to as the oracle space, where the cluster of feature points in each class is clearly separated from the other classes. A deep neural network (DNN) is then trained to generate face features that follow the data distribution in the oracle space. In this case, the face representation model can be regarded as the generative model in a Generative Adversarial Network (GAN), which has emerged as a useful framework for learning arbitrary distributions from observed data samples. By means of adversarial learning, a GAN offers a pleasing option for generative tasks in which the generator is trained to derive the data distribution from observed samples. However, the relationships and inter-dependencies between feature points are not as simple as the data structure of fixed-size grid images. For this reason, we can no longer use the conventional discriminator in a GAN as the form of adversarial supervision.

Instead, we consider the data structure in the feature space as a directed graph, where each vertex corresponds to an image sample and the edges between vertices represent their dependencies. In the oracle space, feature points are connected if they belong to the same subject, while in the actual feature space, nodes are linked to their k nearest neighbors. The discrimination task here is to distinguish between graphs from the oracle space and the generated space. To this end, we employ a graph classifier, trained with a Graph Neural Network (GNN), as the discriminator that guides the representation model to output features that follow the oracle distribution. The proposed framework is thus capable of learning a generic feature space with enhanced discriminative power for face images, based on a pre-designed feature distribution defined on a graph structure. Furthermore, the construction of oracle graphs provides us with an opportunity to take control of the learning complexity, so as to flexibly adjust the bias-variance tradeoff from the perspective of the training objective.
1.6 Thesis Contributions

The research carried out in addressing the above problems has resulted in a number of contributions to face recognition. The key contributions of this dissertation are briefly described below:

- It is the first attempt to estimate the intrinsic dimensionality of DNN-based image representations. Numerical experiments yield ID estimates of 12 and 16 for the FaceNet [12] and SphereFace [13] face representations, respectively, and 19 for the ResNet-34 [70] image representation. The estimates are significantly lower than their respective ambient dimensionalities: 128-dim for FaceNet and 512-dim for the others. We also propose an unsupervised DNN-based dimensionality reduction method under the framework of multidimensional scaling, called DeepMDS. The DeepMDS mapping is significantly better than other dimensionality reduction approaches in terms of its discriminative capability.

- It is the first practical attempt at estimating the capacity of DNN-based face representations. We consider two such representations, namely FaceNet [12] and SphereFace [13], consisting of 128-dimensional and 512-dimensional feature vectors, respectively. We propose a noise model for facial embeddings that explicitly accounts for two sources of uncertainty: uncertainty due to data and uncertainty in the parameters of the representation function. We can estimate capacity as a function of the desired operating point, in terms of the maximum desired probability of false acceptance error, by establishing a relationship between the support of the class-specific manifolds and the discriminant function of a nearest neighbor classifier.
- We provide a thorough analysis of deep-learning-based face recognition performance on three different demographics: (i) gender, (ii) age, and (iii) race. We propose two face recognition frameworks that mitigate demographic bias: (i) DebFace and (ii) GAC. DebFace generates disentangled representations for both identity and demographic attribute recognition while jointly removing discriminative information from the other counterparts. The results indicate that both the identity representation and the demographic attribute estimation via DebFace show lower bias on different demographic cohorts. GAC reduces demographic bias and increases the robustness of representations for faces in every demographic group by adopting adaptive convolutions and attention techniques. GAC is able to automatically determine the layers at which to employ dynamic kernels and attention maps, leading to SOTA performance on a demographic-balanced dataset and three benchmark datasets.

- We propose a new framework for face representation learning, which models the data distribution in an oracle space via adversarial learning, to transform the raw pixels of an image into a highly discriminative feature vector for face recognition. Graphs are constructed in the feature space to describe the data distribution of feature points with respect to their identities and similarities. We also provide a thorough analysis of the impact of predefined graphs on the discriminability of the learned face representations. Our graph-based approach surpasses the baseline model and achieves state-of-the-art performance on six benchmark datasets (LFW [27], CPLFW [71], CFP-FP [72], IJB-A [29], IJB-B [30], IJB-C [9]).

1.7 Thesis Structure

Ch. 2 of this thesis focuses on the compactness of a face representation, and proposes a new algorithm to reduce the dimensionality of the representation with little degradation in performance.
With the dimensionality reduction tool developed in Ch. 2, Ch. 3 estimates the capacity of a face representation on a more compact feature space, attempting to overcome the curse of dimensionality that may lead to over-estimated capacity values. In Ch. 4, the issue of demographic bias in FR systems is addressed. Besides an empirical analysis of the unequal verification performance on different demographic groups, we introduce new strategies to mitigate such bias for fairer face representation learning. A new framework for face representation learning is presented in Ch. 5, which utilizes adversarial learning to acquire the oracle feature distribution in the form of a kNN graph. The last chapter discusses the conclusions of this dissertation and presents directions for future work. The experimental results of the work in this thesis were previously presented in [8, …].

Chapter 2
The Intrinsic Dimensionality of Face Representation

This chapter addresses the following questions pertaining to the intrinsic dimensionality of any given face representation: (i) estimate its intrinsic dimensionality, (ii) develop a deep neural network based non-linear mapping, dubbed DeepMDS, that transforms the ambient representation to the minimal intrinsic space, and (iii) validate the veracity of the mapping through face matching in the intrinsic space. Experiments on benchmark image datasets (LFW [27] and IJB-C [9]) reveal that the intrinsic dimensionality of deep neural network representations is significantly lower than the dimensionality of the ambient features. For instance, SphereFace's [13] 512-dim face representation has an intrinsic dimensionality of 16 on the IJB-C dataset. Further, the DeepMDS mapping is able to obtain a representation of significantly lower dimensionality while maintaining discriminative ability to a large extent: 59.75% TAR @ 0.1% FAR in 16 dims vs 71.26% TAR in 512 dims on IJB-C.

The key contributions and findings of this chapter are:
- The first attempt to estimate the intrinsic dimensionality of DNN-based face representations.
- An unsupervised DNN-based dimensionality reduction method under the framework of multidimensional scaling, called DeepMDS.
- Numerical experiments yield IND estimates of 12 and 16 for the FaceNet [12] and SphereFace [13] face representations, respectively. The estimates are significantly lower than their respective ambient dimensionalities: 128-dim for FaceNet and 512-dim for SphereFace.
- The DeepMDS mapping is significantly better than other dimensionality reduction approaches (e.g., PCA and Isomap [76]) in terms of its discriminative capability.

2.1 Intrinsic Dimensionality

Existing approaches for estimating intrinsic dimensionality can be broadly classified into two groups: projection methods and geometric methods. The projection methods [77-79] determine the dimensionality by applying principal component analysis to local subregions of the data and estimating the number of dominant eigenvalues. These approaches have classically been used in the context of modeling facial appearance under different illumination conditions [80] and object recognition with varying pose [81]. While they serve as an efficient heuristic, they do not provide reliable estimates of intrinsic dimension. Geometric methods [2, 82-86], on the other hand, model the intrinsic topological geometry of the data and are based on the assumption that the volume of an m-dimensional set scales with its size ε as ε^m, and hence the number of neighbors closer than ε behaves the same way.

Our approach in this chapter is based on the topological notion of correlation dimension [82, 83], the most popular type of fractal dimension. The correlation dimension implicitly uses the nearest-neighbor distance, typically based on the Euclidean distance. However, Granata et al. [1] observe that leveraging the manifold structure of the data, in the form of geodesic distances induced by a neighborhood graph of the data, provides more realistic estimates of the IND. Building upon this observation, we base our IND estimates on the geodesic distance between points. We believe that estimating the intrinsic dimensionality serves as the first step towards understanding the bound on the minimal dimensionality required for representing faces, and will aid in the development of novel algorithms that can achieve this limit.
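The scaling assumption behind the correlation dimension, C(r) ∝ r^m for small r, can be checked numerically. The sketch below is illustrative only: a 2-D patch embedded isometrically in R^10, using Euclidean rather than geodesic distances, which suffices for a flat manifold:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# a 2-D square embedded isometrically into a 10-dim ambient space
z = rng.uniform(size=(2000, 2))
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # orthonormal columns
x = z @ basis.T

r = pdist(x)  # all pairwise Euclidean distances
# correlation integral C(r) at small scales; slope of log C vs log r
radii = np.quantile(r, [0.001, 0.002, 0.005, 0.01, 0.02])
logC = np.log([np.mean(r <= t) for t in radii])
slope = np.polyfit(np.log(radii), logC, 1)[0]
print(round(slope, 2))  # close to the true intrinsic dimension of 2
```

The slope recovers the intrinsic dimension 2 despite the ambient dimension being 10, which is the same phenomenon the chapter exploits for face representations.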
2.2 Dimensionality Reduction

There is a tremendous body of work on the topic of estimating low-dimensional approximations of data manifolds lying in high-dimensional space. These include linear approaches such as Principal Component Analysis [87], Multidimensional Scaling (MDS) [88] and Laplacian Eigenmaps [89], and their corresponding non-linear spectral extensions: Locally Linear Embedding [90], Isomap [76] and Diffusion Maps [91]. Another class of dimensionality reduction algorithms leverages the ability of deep neural networks to learn complex non-linear mappings of data, including deep autoencoders [92], denoising autoencoders [93, 94], and learning invariant mappings either with the contrastive loss [95] or with the triplet loss [12]. While autoencoders can learn a compact representation of data, such a representation is not explicitly designed to retain discriminative ability. Both the contrastive loss and the triplet loss have a number of limitations: (1) they require similarity and dissimilarity labels and cannot be trained in an unsupervised setting, (2) they require an additional hyper-parameter, the maximum margin of separation, which is difficult to pre-determine, especially for an arbitrary representation, and (3) they do not maintain the manifold structure in the low-dimensional space. In this work, we too leverage DNNs to approximate the non-linear mapping from the ambient to the intrinsic space. However, we consider an unsupervised setting (i.e., no similarity or dissimilarity labels) and cast the learning problem within the framework of MDS, i.e., preserving the ambient graph-induced geodesic distance between points in the intrinsic space.

2.3 Our Approach

Our goal in this work is to compress a given face representation space. We achieve this in two stages¹ (¹Traditional single-stage dimensionality reduction methods use visual aids to arrive at the final IND and intrinsic space, e.g., plotting the projection error against the IND values and looking for a knee in the curve): (1) estimate the intrinsic dimensionality of the ambient face representation, and (2) learn the DeepMDS model to map the ambient representation space P ⊆ R^d to the intrinsic representation space M ⊆ R^m (m ≪ d). The IND estimates are based on the approach presented by [1], which relies on
two key ideas: (1) using graph-induced geodesic distances to estimate the correlation dimension of the face representation topology, and (2) the similarity of the distributions of geodesic distances across different topological structures with the same intrinsic dimensionality. The DeepMDS model is optimized to preserve the inter-point geodesic distances between the feature vectors in the ambient and intrinsic spaces, and is trained in a stage-wise manner that progressively reduces the dimensionality of the representation. Basing the projection method on DNNs, instead of spectral approaches like Isomap, addresses the scalability and out-of-sample-extension problems suffered by spectral methods. Specifically, DeepMDS is trained in a stochastic fashion, which allows it to scale. Furthermore, once trained, DeepMDS provides a mapping function in the form of a feed-forward network that maps an ambient feature vector to its corresponding intrinsic feature vector. Such a map can easily be applied to new test data.

2.3.1 Estimating Intrinsic Dimensionality

We define the notion of intrinsic dimension through the classical concept of the topological dimension of the support of a distribution. This is a generalization of the concept of the dimension of a linear space² to a non-linear manifold. Methods for estimating the topological dimension are all based on the assumption that the number of neighbors of a given point on an m-dimensional manifold embedded within a d-dimensional space scales with its size ε as ε^m. In other words, the density of points within an ε-ball (ε → 0) in the ambient space is independent of the ambient dimension d and varies only according to the intrinsic dimensionality m. Given a collection of points X = {x_1, …, x_n}, where x_i ∈ R^d, the cumulative distribution of the pairwise distances C(r) between the n points can be estimated as

    C(r) = \frac{2}{n(n-1)} \sum_{i<j} H(r - \|x_i - x_j\|) = \int_0^r p(r')\, dr'     (2.1)

² The linear dimension is the minimum number of independent vectors necessary to represent any given point in the space as a linear combination.
Figure 2.1 Intrinsic Dimension: Our approach is based on two observations. (a) Graph Induced Geodesic Distance: the graph-induced geodesic distance between images captures the topology of the image representation manifold more reliably; as an illustration, we show the graph edges for the surface of a unitary hypersphere and a face manifold of ID two, embedded within a 3-dim space. (b) Topological Similarity: the distribution of the geodesic distances (for distances r_max − 2σ ≤ r ≤ r_max, where r_max is the distance at the mode) has been empirically observed [1] to be similar across different topological structures with the same intrinsic dimensionality; the plot shows the distance distribution for a face representation, a unitary hypersphere, and a Gaussian distribution of ID two embedded within a 3-dim space. [8]

Here H(·) is the Heaviside function and p(r) is the probability distribution of the pairwise distances. In this work, we choose the correlation dimension [82], a particular type of topological dimension, to represent the intrinsic dimension of the face representation. It is defined as

    m = \lim_{r \to 0} \frac{\ln C(r)}{\ln r} \implies C(r) \propto r^m \ \text{as}\ r \to 0     (2.2)

Therefore, the intrinsic dimension crucially depends on the accuracy with which the probability distribution can be estimated at very small length-scales (distances), i.e., r → 0. Significant effort has been devoted to estimating the intrinsic dimension through line fitting in the ln C(r) vs ln r space around the region where r → 0, i.e.,

    m = \lim_{(r_2 - r_1) \to 0} \frac{\ln C(r_2) - \ln C(r_1)}{\ln r_2 - \ln r_1} = \lim_{r \to 0} \frac{d \ln C(r)}{d \ln r} = \lim_{r \to 0} \frac{p(r)\, r}{C(r)} = \lim_{r \to 0} m(r)     (2.3)

The main drawback of this approach is the need for reliable estimates of p(r) at very small length-scales, which is precisely where the estimates are most unreliable when data is limited, especially in very high-dimensional spaces. Granata et al. [1] present an elegant solution to this problem through three observations: (i) estimates of m(r) can be stable even as r → 0 if the distance between points is computed as the graph-induced shortest path between points instead of the Euclidean distance, as is commonly the case; (ii) the probability distribution p
(r) at intermediate length-scales around the mode of p(r), i.e., r_max − 2σ ≤ r ≤ r_max, can be conveniently used to obtain reliable estimates of the IND; and (iii) the distributions p(r) of different topological geometries are similar to each other as long as the intrinsic dimensionality is the same, or in other words, the distribution p(r) depends only on the intrinsic dimensionality and not on the geometric support of the manifolds.

Fig. 2.1 provides an illustration of these observations. Consider two different manifolds, faces and the surface of an (m+1)-dimensional unitary hypersphere (henceforth referred to as the m-hypersphere S^m), with intrinsic dimensionality m = 2 but embedded within a d-dim Euclidean space. Beyond the nearest neighbor, the distance r between any pair of points in the manifold is computed as the shortest path between the points as induced by the graph connecting all the points in the representation. Fig. 2.1b shows the distribution of log(p(r)/p(r_max)) vs log(r/r_max) in the range r_max − 2σ ≤ r ≤ r_max, where σ is the standard deviation of p(r) and r_max = argmax_r p(r) corresponds to the radius of the mode of p(r). Interestingly, different topological geometries, namely a face representation of IND two, a 2-hypersphere and a 2-dim Gaussian, all embedded within a d-dim Euclidean space, have almost identical distributions. More generally, the distribution of log(p(r)/p(r_max)) vs log(r/r_max) in the range r_max − 2σ ≤ r ≤ r_max is empirically observed to depend only on the intrinsic dimensionality, rather than on the geometric support of the manifold.

The intrinsic dimensionality of the representation manifold can thus be estimated by comparing the empirical distribution of the pairwise distances p̂_M(r) on the manifold to that of a known distribution, such as the m-hypersphere or the Gaussian distribution, in the range r_max − 2σ ≤ r ≤ r_max. We first show the derivation for estimating the intrinsic dimensionality m that minimizes the Root Mean Squared Error (RMSE) with respect to an m-hypersphere. The distribution of the geodesic distance p_{S^m}(r) of the m-hypersphere can be analytically expressed as p
_{S^m}(r) = c sin^{m−1}(r), where c is a constant and m is the IND. Given p̂_M(r), we minimize the RMSE between the distributions as

    \min_{c,\, m} \int_{r_{max} - 2\sigma}^{r_{max}} \left\| \log \hat{p}_M(r) - \log(c) - (m-1) \log\big(\sin[r]\big) \right\|^2

which upon simplification yields

    \min_{m} \int_{r_{max} - 2\sigma}^{r_{max}} \left[ \log \frac{\hat{p}_M(r)}{\hat{p}_M(r_{max})} - (m-1) \log \sin\!\left( \frac{\pi r}{2 r_{max}} \right) \right]^2

The above optimization problem can be solved via a least-squares fit after estimating the standard deviation σ of p(r). First, we estimate σ for the m-hypersphere by approximating the distribution p̂_M(r) by a univariate Gaussian around the mode of p_M(r). So, given samples S = {r_1, …, r_T} from the distribution p(r), the variance around the mode can be estimated as σ² = (1/T) Σ_{t=1}^{T} (r_t − r_max)², where r_max is the radius at the mode of p̂_M(r). Then, we estimate the distribution of log(p̂_M(r)/p̂_M(r_max)) vs log sin(πr/(2 r_max)) and solve the following least-squares fit problem:

    \min_{m} \sum_{r_{max} - 2\sigma \le r_i \le r_{max}} \left[ y_i - (m-1)\, x_i \right]^2

where y_i = log(p̂_M(r_i)/p̂_M(r_max)) and x_i = log sin(π r_i / (2 r_max)). Such a procedure could, in principle, result in a fractional estimate of the dimension. If one only requires integer solutions, the optimal value of m can be estimated by rounding off the least-squares fit solution.

In the case of comparison to a Gaussian distribution, the intrinsic dimensionality can also be estimated by comparing to the geodesic distance distribution for points sampled from a Gaussian distribution:

    \min_{d} \int_{r_{max} - 2\sigma}^{r_{max}} \left[ \log \frac{p(r)}{p(r_{max})} + (d-1) \frac{r^2}{4\sigma_c^2} \right]^2     (2.4)

The solution of this optimization problem can be found following the same procedure described above for the m-hypersphere.

Figure 2.2 DeepMDS Mapping: A DNN-based non-linear mapping is learned to transform the ambient space to a plausible intrinsic space. The network is optimized to preserve distances between pairs of points in the ambient and intrinsic spaces.
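The hypersphere least-squares fit described above can be exercised on a case where the answer is known: points uniform on the 2-sphere, whose geodesic distances follow p(r) ∝ sin(r). The sketch is illustrative only; it uses exact spherical geodesics rather than a kNN graph:

```python
import numpy as np

rng = np.random.default_rng(1)
# points drawn uniformly on the 2-sphere S^2; their geodesic pairwise
# distances follow p(r) ∝ sin(r), so the fit should give m - 1 = 1
v = rng.normal(size=(600, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
r = np.arccos(np.clip(v @ v.T, -1.0, 1.0))[np.triu_indices(600, k=1)]

# histogram estimate of p(r); mode r_max and spread sigma around it
hist, edges = np.histogram(r, bins=60, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
r_max = mid[np.argmax(hist)]
sigma = np.sqrt(np.mean((r - r_max) ** 2))

# least-squares fit of y_i = (m - 1) x_i over r_max - 2*sigma <= r <= r_max
sel = (mid >= r_max - 2 * sigma) & (mid <= r_max) & (hist > 0)
y = np.log(hist[sel] / hist.max())
x = np.log(np.sin(np.pi * mid[sel] / (2 * r_max)))
m = 1 + np.sum(x * y) / np.sum(x * x)
print(round(m, 1))  # should come out close to the true IND of 2
```

Replacing the analytic geodesics with graph-induced shortest paths, as the chapter does, is what makes the same fit applicable to face representations whose manifold is unknown.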
2.3.2 Estimating Intrinsic Space

The intrinsic dimensionality estimates obtained in the previous subsection allude to the existence of a mapping that can transform the ambient representation to the intrinsic space, but do not provide any means of finding said mapping. The mapping itself could potentially be very complex, and our goal of estimating it is practically challenging.

We base our solution for estimating a mapping from the ambient to the intrinsic space on multidimensional scaling (MDS) [88], a classical mapping technique that attempts to preserve the distances (similarities) between points after embedding them in a low-dimensional space. Given data points X = {x_1, …, x_n} in the ambient space and Y = {y_1, …, y_n} the corresponding points in the intrinsic low-dimensional space, the MDS problem is formulated as

    \min \sum_{ij} \left[ d_A(x_i, x_j) - d_I(y_i, y_j) \right]^2     (2.5)

where d_A(·) and d_I(·) are distance (similarity) metrics in the ambient and intrinsic spaces, respectively. Different choices of the metric lead to different dimensionality reduction algorithms. For instance, classical metric MDS is based on the Euclidean distance between the points, while using the geodesic distance induced by a neighborhood graph leads to Isomap [76]. Similarly, many different distance metrics have been proposed corresponding to non-linear mappings between the ambient space and the intrinsic space. A majority of these approaches are based on spectral decompositions and suffer from several drawbacks: (i) computational complexity scales as O(n³) for n data points, (ii) ambiguity in the choice of the correct non-linear function, and (iii) collapsed embeddings on more complex data [95].

To overcome these limitations, we employ a DNN to approximate the non-linear mapping that transforms the ambient representation x to the intrinsic space y via a parametric function y = f(x; θ) with parameters θ. We learn the parameters of the mapping within the MDS framework:

    \min_{\theta} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ d_A(x_i, x_j) - d_I\big(f(x_i; \theta),\, f(x_j; \theta)\big) \right]^2 + \lambda \|\theta\|_2^2

where the second term is a regularizer with hyperparameter λ. Fig. 2.2 shows an illustration of the DNN-based mapping.
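The MDS objective in Eq. (2.5) can be minimized directly to see it at work. The toy sketch below performs gradient descent on the embedded points themselves, rather than on DNN parameters as DeepMDS does, squeezing a 5-dim point set into 2 dims:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(40, 5))                          # ambient points
D = np.linalg.norm(x[:, None] - x[None, :], axis=-1)  # target distances d_A

def stress(y):
    d = np.linalg.norm(y[:, None] - y[None, :], axis=-1)
    return np.sum((D - d) ** 2) / 2                   # each pair counted once

y = 0.1 * rng.normal(size=(40, 2))                    # 2-dim embedding, random init
s0 = stress(y)
for _ in range(500):                                  # gradient descent on y
    diff = y[:, None] - y[None, :]
    d = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(d, 1.0)                          # dodge 0/0 on the diagonal
    coef = (D - d) / d
    np.fill_diagonal(coef, 0.0)
    grad = -2.0 * np.sum(coef[:, :, None] * diff, axis=1)
    y -= 0.002 * grad
print(stress(y) < s0)  # True: the MDS stress decreases
```

DeepMDS replaces the free coordinates y with the output of a parametric network f(x; θ), which is what yields an out-of-sample mapping for unseen faces.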
In practice, directly learning the mapping from the ambient to the intrinsic space is very challenging, especially for disentangling a complex manifold under high levels of compression. We adopt a curriculum learning [96] approach to overcome this challenge and progressively reduce the dimensionality of the mapping in multiple stages. We start with easier sub-tasks and progressively increase the difficulty of the tasks. For example, a direct mapping R^512 → R^15 is instead decomposed into multiple mapping functions R^512 → R^256 → R^128 → R^64 → R^32 → R^15. We formulate the learning problem for L mapping functions y^l = f_l(x; θ_l) as:

    \min_{\theta_1, \ldots, \theta_L} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{l=1}^{L} \alpha_l \left[ d_A(x_i, x_j) - d_I\big(y_i^l, y_j^l\big) \right]^2 + \lambda \sum_{l=1}^{L} \|\theta_l\|_2^2

where θ_l are the parameters of the l-th mapping. Appropriately scheduling the α_l weights enables us to set this up as a curriculum learning problem.

2.4 Experiments

2.4.1 Intrinsic Dimensionality Estimation

In this section, we first estimate the intrinsic dimensionality of multiple face representations over multiple datasets of varying complexity. Then, we evaluate the efficacy of the proposed DeepMDS model in finding the mapping from the ambient to the intrinsic space while maintaining its discriminative ability.

Figure 2.3 Intrinsic Dimensionality: (a) geodesic distance distribution p(r), and (b) least-squares fitting showing the global minimum of the RMSE.

Datasets. We consider two different face datasets for face verification, LFW [27] and IJB-C [9]. Recall that DeepMDS is an unsupervised method, so the category information associated with the faces is neither used for intrinsic dimensionality estimation nor for learning the mapping from the ambient to the intrinsic space.

Representation Models. For the face verification task, we consider multiple publicly available SOTA face embedding models, namely the 128-dim FaceNet [12] representation and the 512-dim SphereFace [13] representation. In addition, we also evaluate a 512-dim variant of FaceNet³ that outperforms the 128-dim version. All of these representations are learned from the CASIA-WebFace [11] dataset, consisting of 494,414 images across 10,575 subjects.

³ https://github.com/davidsandberg/facenet

Baseline Methods.
Intrinsic Dimensionality: We select two different algorithms for estimating the intrinsic dimensionality of a given representation: a classical k-nearest-neighbor based estimator [2] and the "Intrinsic Dimensionality Estimation Algorithm" (IDEA) [3].

Dimensionality Reduction: We compare DeepMDS against three dimensionality reduction algorithms: principal component analysis (PCA) for linear dimensionality reduction, Isomap [76], and denoising autoencoders [94] (DAE).

Figure 2.4 Distribution of geodesic distances for different representation models and datasets: (a) FaceNet-128, (b) FaceNet-512, (c) SphereFace.

Implementation Details: The IND estimates for all the methods we evaluate depend on the number of neighbors k. For the baselines, k is used to compute the parameters of the probability density. For our method, k parameterizes the construction of the neighborhood graph. For the latter, the choice of k is constrained by three factors: (1) k should be small enough to avoid shortcuts between points that are close to each other in the Euclidean space but are potentially far away on the corresponding intrinsic manifold due to highly complicated local curvature; (2) on the other hand, k should also be large enough to result in a connected graph, i.e., no isolated data samples; and (3) k should best match the geodesic distance distribution of a hypersphere of the same IND, i.e., the k that minimizes the RMSE. Fig. 2.3a shows the distance distributions for SphereFace with k = 15, a 16-hypersphere and a 16-dim Gaussian. The close similarity of the pairwise distance distributions of these manifolds in the graph-induced geodesic distance space suggests that the IND of SphereFace (512-dim ambient space) is 16. Fig. 2.3b shows the optimal RMSE for SphereFace at different values of m. The distribution of geodesic distances p(r) for each of the datasets and representation models is shown in Fig. 2.4. Fig. 2.5 shows the plot of log(p̂_M(r)/p̂_M(r_max)) vs log(r/r_max), as we vary the number of neighbors k, for the SphereFace representation model on the LFW and IJB-C datasets.
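The connectivity constraint on k (factor (2) above) can be checked directly, and the same graph then yields the geodesic distances the estimator consumes. A sketch, with scipy's shortest-path routine standing in for whatever implementation the authors used:

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, shortest_path

def knn_geodesics(x, k):
    """Symmetrized kNN graph -> (number of components, geodesic distances)."""
    d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    w = np.zeros(d.shape)
    for i, nbrs in enumerate(np.argsort(d, axis=1)[:, :k]):
        w[i, nbrs] = d[i, nbrs]          # keep edge weights = distances
    w = np.maximum(w, w.T)               # make the graph undirected
    n_comp, _ = connected_components(w > 0, directed=False)
    return n_comp, shortest_path(w, method="D", directed=False)

# two tight, well-separated clusters: a small k leaves the graph in
# pieces, while a large enough k bridges them
rng = np.random.default_rng(3)
x = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
print(knn_geodesics(x, 9)[0])   # 2 components: clusters not yet bridged
print(knn_geodesics(x, 10)[0])  # 1 component: the graph is connected
```

Once the graph is connected, every entry of the returned geodesic matrix is finite, and the geodesic between the clusters reflects the path through the bridging edge rather than the straight-line distance.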
For all the approaches, we select the k nearest neighbors using cosine similarity for SphereFace, and arc-length, d(x_1, x_2) = cos⁻¹( x_1ᵀ x_2 / (‖x_1‖ ‖x_2‖) ), for FaceNet features, as the latter are normalized to reside on the surface of a unitary hypersphere. Finally, for simplicity, we round the IND estimates to the nearest integer for all the methods.

Figure 2.5 log(p̂_M(r)/p̂_M(r_max)) vs log(r/r_max) plots as we vary the number of neighbors k for the SphereFace representation model on (a) the LFW dataset and (b) the IJB-C dataset.

Table 2.1 Intrinsic Dimensionality: Graph Distance [1]

Representation | Dataset | k=4 | k=7 | k=9 | k=15
FaceNet-128 | LFW | 10* | 13 | 11 | 18
FaceNet-128 | IJB-C | 10 | 10 | 10 | 11*
FaceNet-512 | LFW | 10* | 11 | 11 | 17
FaceNet-512 | IJB-C | 11 | 11 | 12 | 12*
SphereFace | LFW | 10* | 11 | 13 | 9
SphereFace | IJB-C | 14 | 14 | 16 | 16*

Table 2.2 Intrinsic Dimensionality: KNN [2]

Representation | Dataset | k=4 | k=7 | k=9 | k=15
FaceNet-128 | LFW | 10 | 10 | 11 | 11
FaceNet-128 | IJB-C | 10 | 10 | 9 | 9
FaceNet-512 | LFW | 8 | 8 | 8 | 9
FaceNet-512 | IJB-C | 10 | 10 | 9 | 9
SphereFace | LFW | 6 | 7 | 7 | 8
SphereFace | IJB-C | 6 | 6 | 5 | 5

Experimental Results: Tab. 2.1 reports the IND estimates from the graph method for different values of k (* denotes the final IND estimate that satisfies all constraints on k) and for different representation models across different datasets. Tab. 2.2 and Tab. 2.3 report the IND estimates from the k-nearest-neighbor approach [2] and IDEA [3], respectively, for the different representation models across the datasets that we consider. These approaches are known to underestimate the intrinsic dimensionality [79]. We make a number of observations from our results: (1) Surprisingly, the IND estimates across all the datasets, feature representations and IND methods are significantly lower than the dimensionality of the ambient space, between 10 and 20, suggesting that face representations could, in principle, be almost 10× to 50× more compact. (2) Both the k-NN based estimator [2] and the IDEA estimator [3] are less sensitive to the number of nearest neighbors in comparison to the graph-distance based method [1], but are known to underestimate the IND for sets with high intrinsic dimensionality [79]. As reported in the tables,
Table 2.3 Intrinsic Dimensionality: IDEA [3]

Representation | Dataset | k=4 | k=7 | k=9 | k=15
FaceNet-128 | LFW | 14 | 13 | 13 | 12
FaceNet-128 | IJB-C | 14 | 11 | 10 | 9
FaceNet-512 | LFW | 12 | 10 | 10 | 10
FaceNet-512 | IJB-C | 14 | 11 | 10 | 9
SphereFace | LFW | 10 | 9 | 9 | 9
SphereFace | IJB-C | 8 | 7 | 6 | 5

the IND estimates of the two baseline methods are lower than the estimates of the graph-distance based approach that we use.

Figure 2.6 Intrinsic Dimensionality of the Swiss Roll: (a) histogram of the swiss roll, (b) log(p̂_M(r)/p̂_M(r_max)) vs log(r/r_max), and (c) dimensionality of the swiss roll.

Swiss Roll. We also consider the swiss roll dataset as a means of providing visual validation of the estimated intrinsic space on a known dataset. First, we estimate the intrinsic dimensionality of the swiss roll dataset, and then we learn a low-dimensional mapping from the ambient 3-dim space to the intrinsic space. We sample 2,000 points from the swiss roll dataset and use these points for the experiments. For this dataset, the intrinsic dimensionality estimate is 2 (see Fig. 2.6), which is indeed the ground-truth intrinsic dimensionality of the swiss roll.

2.4.2 Intrinsic Space Mapping

Given the estimates of the dimensionality of the intrinsic space, we learn the mapping from the ambient space to a plausible intrinsic space with the goal of retaining the discriminative ability of the representation. The true intrinsic representation (IND and space) is unknown, and it is therefore not feasible to validate it directly. However, verifying its discriminative power can serve to indirectly validate both the IND estimate and the learned intrinsic space.

Figure 2.7 Swiss Roll: (a) the original 2,000 points from the swiss roll manifold, (b) the 2-dim intrinsic space estimated by Isomap, and (c) the 2-dim intrinsic space estimated by our proposed method, DeepMDS. In both cases, the blue and black points, and correspondingly the green and red points, are close together in both the intrinsic and ambient spaces.
Implementation Details: We first extract face features with the representation models, i.e., FaceNet-128, FaceNet-512 and SphereFace. The architecture of the proposed DeepMDS model is based on residual units with skip connections [70]. We train the mapping from the ambient to the intrinsic space in multiple stages, with each stage comprising two residual units. Once the individual stages are trained, all L projection models are jointly fine-tuned to maintain the pairwise distances in the intrinsic space. We adopt a similar network structure (residual units) and training strategy (stage-wise training and fine-tuning) for the stacked denoising autoencoder baseline. From an optimization perspective, training the autoencoder is more computationally efficient than the DeepMDS model, O(n) vs O(n²).

The parameters of the network are learned using the Adam [97] optimizer with a learning rate of 3×10⁻⁴ and the regularization parameter λ = 3×10⁻⁴. We observed that using the cosine-annealing scheduler [98] was critical to learning an effective mapping.

Experimental Results: We evaluate the efficacy of the learned projections, namely PCA, Isomap and DeepMDS, in the learned intrinsic space and compare with their respective performance in the ambient space. Face representations are evaluated in terms of verification (TAR@FAR) performance. Given the IND estimate, designing an appropriate scheme for mapping to the intrinsic manifold is much more challenging than the IND estimation itself. To show how the dimensionality of the intrinsic space influences the performance of face representations, we evaluate and compare their performance at multiple intermediate spaces.

Table 2.4 LFW face verification for the SphereFace embedding (TAR @ 0.1% FAR); the 512-dim ambient space achieves 96.74%.

Dimension | PCA | Isomap | DAE | DeepMDS
256 | 96.75% | 92.88% | 77.80% | 96.73%
128 | 96.80% | 93.18% | 32.95% | 96.44%
64 | 91.71% | 95.00% | 32.04% | 96.50%
32 | 66.38% | 95.31% | 11.71% | 96.31%
16 | 32.67% | 89.47% | 27.53% | 95.95%
10 (ID) | 16.04% | 77.31% | 6.73% | 92.33%

Figure 2.8 DeepMDS: face verification on IJB-C [9] (TAR @ 0.1% FAR in legend) for the (a) FaceNet-128, (b) FaceNet-512 and (c) SphereFace embeddings.
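The training schedule can be made concrete. The snippet below uses the standard cosine-annealing form and a halving rule inferred from the stage chain quoted in Sec. 2.3.2 (R^512 → R^256 → … → R^15); neither is necessarily the authors' exact code:

```python
import math

def cosine_annealing(t, T, lr_max=3e-4, lr_min=0.0):
    """Learning rate at step t of T, decayed along a half cosine."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / T))

def stage_dims(ambient, intrinsic):
    """Progressive stage targets: halve the dimension while that still
    leaves headroom above the IND, then jump directly to the IND."""
    dims, d = [], ambient
    while d // 2 >= 2 * intrinsic:
        d //= 2
        dims.append(d)
    return dims + [intrinsic]

print(stage_dims(512, 15))       # [256, 128, 64, 32, 15]
print(cosine_annealing(0, 100))  # 0.0003, the base rate used for DeepMDS
```

Each stage in the returned chain would get its own projection model, trained in order and then jointly fine-tuned, mirroring the stage-wise recipe described above.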
Face verification is performed on the IJB-C dataset following its verification protocol, and on the LFW dataset following the BLUFR [99] protocol. Fig. 2.8 shows the ROC curves for the IJB-C dataset using face representations projected to multiple intermediate spaces and to the intrinsic space by DeepMDS. The face verification ROC curves of DeepMDS on the LFW dataset for the FaceNet-128, FaceNet-512 and SphereFace representation models are shown in Fig. 2.9. Tab. 2.4 reports the verification rate at an FAR of 0.1% on the LFW dataset. Fig. 2.10 shows the face verification ROC curves of PCA on the IJB-C and LFW (BLUFR) datasets for all three representation models. Similarly, Fig. 2.11 and Fig. 2.13 show the face verification ROC curves of the Isomap and denoising autoencoder baselines, respectively.

Figure 2.9 DeepMDS: face verification on the LFW (BLUFR) dataset for the (a) FaceNet-128, (b) FaceNet-512 and (c) SphereFace embeddings.

We make the following observations from these results: (1) For all the verification experiments, the performance of the DeepMDS features down to 32 dimensions is comparable to the original 128-dim and 512-dim features. The 10-dim space of DeepMDS on LFW, which consists largely of frontal face images with minimal pose variations and facial occlusions, achieves a TAR of 92.33% at 0.1% FAR, a loss of about 4.5% compared to the ambient space. The 12-dim space of DeepMDS on IJB-C, with full pose variations, occlusions and diversity of subjects, achieves a TAR of 62.25% at 0.1% FAR, compared to 69.32% in the ambient space. (2) The proposed DeepMDS model is able to learn a low-dimensional space down to the IND with a performance penalty of 5%-10% at compression factors of 30× to 40× for 512-dim representations, underscoring the fact that learning a mapping from the ambient to the intrinsic space is more challenging than estimating the IND itself. (3) In the task of face verification, we observe that the DeepMDS model retains significantly more discriminative ability than the baseline approaches, even at high levels of compression.
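The TAR@FAR numbers quoted throughout can be computed from lists of genuine (same-identity) and impostor (different-identity) similarity scores. A minimal sketch, with my own thresholding convention and synthetic scores:

```python
import numpy as np

def tar_at_far(genuine, impostor, far=1e-3):
    """Pick the score threshold that yields the desired FAR on impostor
    pairs, then report the fraction of genuine pairs accepted (TAR)."""
    thr = np.quantile(impostor, 1.0 - far)     # accept scores >= thr
    return float(np.mean(np.asarray(genuine) >= thr))

rng = np.random.default_rng(4)
genuine = rng.normal(2.0, 1.0, 10000)   # similarity scores, same identity
impostor = rng.normal(0.0, 1.0, 10000)  # similarity scores, different identity
print(round(tar_at_far(genuine, impostor, far=0.001), 3))
```

Sweeping the FAR and plotting TAR against it is exactly how the ROC curves in Figs. 2.8-2.13 are produced.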
While Isomap is more competitive than the other baselines, it suffers from some drawbacks: (i) due to its iterative nature, it does not provide an explicit mapping function for new (unseen) data samples, while the autoencoder and DeepMDS models can map such samples; therefore, Isomap cannot be utilized to evaluate verification accuracy on a validation/test set; and (ii) the computational complexity of Isomap is O(n³), so it does not scale well to large datasets (IJB-C) and needs approximations, such as the Nyström approximation [57], for tractability.

Figure 2.10 PCA: face verification on the IJB-C and LFW (BLUFR) datasets for the FaceNet-128, FaceNet-512 and SphereFace embeddings.

Ablation Study: Here we demonstrate the efficacy of the stage-wise learning process for training the DeepMDS model. All models have the same capacity. We consider four variants: (1) Direct: mapping from the ambient to the intrinsic space; (2) Direct+IS: direct mapping from the ambient to the intrinsic space with intermediate supervision at each stage, i.e., optimizing aggregated intermediate losses; (3) Stagewise: stage-wise learning of the mapping; and (4) Stagewise+Fine-Tune: the projection model trained stage-wise and then fine-tuned. Tab. 2.5 compares the results of these variants on the LFW dataset (BLUFR protocol). Our results suggest that stage-wise learning of the non-linear projection models is more effective at progressively disentangling the ambient representation. A similar trend was observed on the larger dataset, IJB-C. In fact, stage-wise training with fine-tuning was critical in learning an effective projection, both for DeepMDS as well as for the DAE.

Table 2.5 DeepMDS training methods (TAR @ 0.1% FAR)

Method | Direct | Direct+IS | Stagewise+Finetune | Stagewise
TAR | 80.25 | 86.15 | 90.42 | 92.33
Direct Training: Our finding in this work that many current DNN representations can be significantly compressed naturally begs the question: can we directly learn embedding functions that yield compact and discriminative embeddings in the first place? Taigman et al. [52] studied this problem in the context of learning face embeddings, and noted that a compact feature space creates a bottleneck in the information flow to the classification layer and hence increases the difficulty of optimizing the network when training from scratch. Given the significant developments in network architectures and optimization tools since then, we attempt to learn highly compact embeddings directly from raw data, using current best practices, while circumventing the chicken-and-egg problem of not knowing the target intrinsic dimensionality before learning the embedding function. We train⁵ InceptionResNet-V1 [10] on CASIA-WebFace [11] for embeddings of different sizes. Fig. 2.12 shows the ROC curves on the LFW and IJB-C datasets. The models suffer significant loss in performance as we decrease the dimensionality of the embeddings. In comparison, the proposed DeepMDS-based dimensionality reduction retains its discriminative ability even at high levels of compression. These results call for the development of algorithms that can directly learn compact and effective image representations.

⁵ We build off of the publicly available implementation at https://github.com/davidsandberg/facenet

Figure 2.11 Isomap: face verification on the IJB-C and LFW (BLUFR) datasets for the FaceNet-128, FaceNet-512 and SphereFace embeddings.

Figure 2.12 ROC curves on the LFW and IJB-C datasets for the InceptionResNet-V1 [10] model trained with different embedding dimensionalities on the CASIA-WebFace [11] dataset.

Figure 2.13 Denoising Autoencoder: face verification on the IJB-C and LFW (BLUFR) datasets for the FaceNet-128, FaceNet-512 and SphereFace embeddings.
2.5 Conclusion

The work in this chapter addressed two questions: given a DNN-based face representation, what is the minimum number of degrees of freedom in the representation, i.e., its intrinsic dimension, and can we find a mapping between the ambient and intrinsic spaces that maintains the discriminative capability of the representation? Contributions of the work include (i) a graph-induced geodesic-distance-based approach to estimate the intrinsic dimension, and (ii) DeepMDS, a non-linear projection that transforms the ambient space to the intrinsic space. Experiments on multiple DNN-based face representations yielded intrinsic dimensionality (IND) estimates of 9 to 16, significantly lower than the ambient dimension (by a factor of roughly 10 to 40). The DeepMDS model was able to learn a projection from the ambient to the intrinsic space while largely preserving its discriminative ability on the LFW and IJB-C datasets. Our findings in this chapter suggest that face representations could be significantly more compact, and call for the development of algorithms that can directly learn more compact face representations.

Chapter 3

The Capacity of Face Representation

In the previous chapter we estimated the IND of face representations and proposed a dimensionality reduction method to map high-dimensional feature vectors into a low-dimensional space. We have yet to present an application of the intrinsic representation space. In this chapter we study a specific problem of face representation by applying the dimensionality reduction algorithm based on the notion of IND. The problem is defined as: given a face representation, how many identities can it resolve? In other words, what is the capacity of the face representation? We cast the face capacity problem in terms of packing bounds on a low-dimensional manifold embedded within a deep representation space. By explicitly accounting for the manifold structure of the representation as well as two different sources of representational noise, epistemic (model) uncertainty and aleatoric (data) variability, our approach is able to estimate the capacity of a given face representation. To demonstrate the efficacy of our approach, we estimate the capacity of two deep neural network based face representations, namely the 128-dimensional FaceNet and the 512-dimensional SphereFace.
At the core, we leverage advances in deep neural networks (DNNs) for multiple aspects of our solution, relying on their ability to approximate complex non-linear mappings. First, we utilize DNNs to approximate the non-linear function for projecting and unfolding the high-dimensional face representation into a low-dimensional representation, while preserving the local geometric structure of the manifold. Second, we utilize DNNs to aid in approximating the density and support of the low-dimensional manifold in the form of multivariate Gaussian distributions as a function of the desired FAR.

Figure 3.1 An illustration of the geometrical structure of our capacity estimation problem: a low-dimensional manifold M ⊂ R^m embedded in a high-dimensional space P ⊂ R^p. On this manifold, all the faces lie inside the population hyper-ellipsoid, and the embeddings of images belonging to each identity (class) are clustered into their own class-specific hyper-ellipsoids. The capacity of this manifold is the number of identities (class-specific hyper-ellipsoids) that can be packed into the population hyper-ellipsoid within an error tolerance, i.e., an allowed amount of overlap.

The key technical contributions of this work are:

1. Explicitly accounting for and modeling the manifold structure of the face representation in the capacity estimation process. This is achieved through a DNN-based non-linear projection and unfolding of the representation into an equivalent low-dimensional Euclidean space while preserving the local geometric structure of the manifold.

2. A noise model for facial embeddings that explicitly accounts for two sources of uncertainty: uncertainty due to the data and uncertainty in the parameters of the representation function.

3. Establishing a relationship between the support of the class-specific manifolds and the discriminant function of a nearest neighbor classifier. Consequently, we can estimate capacity as a function of the desired operating point, in terms of the maximum acceptable probability of false acceptance.

4. The first practical attempt at estimating the capacity of DNN-based face representations. We consider two such representations, FaceNet [12] and SphereFace [13], consisting of 128-dimensional and 512-dimensional feature vectors, respectively.

Numerical experiments suggest that our proposed model can provide reasonable estimates of capacity. For FaceNet and SphereFace, the upper bounds on the capacity are 3.5×10^5 and 1.1×10^4, respectively, on LFW [27], and 2.2×10^3 and 3.6×10^3, respectively, on IJB-C [9] at a FAR of 0.1%. This implies that, on average, the FaceNet representation should achieve a true accept rate (TAR) of 100% at a FAR of 0.1% for 2.2×10^3 subject identities on the more challenging IJB-C database and for 3.5×10^5 identities on the relatively less challenging LFW database (see Figs. 1.1c and 1.1f for examples of faces in LFW and IJB-C). As such, the capacity estimates represent an upper bound on the maximal scalability of a given face representation. Empirically, however, the FaceNet representation only achieves a TAR of 43% at a FAR of 0.1% on the IJB-C dataset with 3,531 subjects, and a TAR of 94% at a FAR of 0.1% on the LFW dataset with 5,749 subjects.

3.1 Related Work

The focus of a majority of the face recognition literature has been on the accuracy of face recognition on benchmark datasets. In contrast, our goal in this work is to characterize the maximal discriminative capacity of a given face representation at a specified error tolerance.
A number of approaches have been proposed to analyze various performance metrics of biometric recognition systems, primarily using information-theoretic concepts. Schmid et al. [100, 101] derive analytical bounds on the probability of error and the capacity of biometric systems through a large-deviation analysis of the distribution of similarity scores. Bhatnagar et al. [102] formulated performance indices for biometric authentication; they obtained the capacity of a biometric system following Shannon's channel capacity formulation, along with a rate-distortion framework to estimate the FAR. Similarly, Wang et al. [103] proposed an approach to model and predict the performance of a face recognition system based on an analysis of similarity scores. The common theme across this body of work is that the performance bounds are analyzed purely from the similarity scores obtained during matching. In contrast, our work directly analyzes the geometry of the representation space of face recognition systems.

To alleviate the curse of dimensionality, we unfold the face manifold into a lower-dimensional space using the solution introduced in Ch. 2, where we first estimate the intrinsic dimensionality of the representation and then use the proposed DeepMDS to learn a non-linear mapping that largely preserves the discriminative performance of the representation. The goal of estimating capacity in this chapter, however, differs from that of preserving discriminative performance in the previous chapter: while the latter does not necessitate preserving the local geometric structure of the manifold, the former critically depends on the ability of the dimensionality reduction technique to preserve it.

In the context of estimating distributions, Gaussian Processes [104] are a popular and powerful tool for modeling distributions over functions, offering attractive properties such as uncertainty estimates over function values, robustness to over-fitting, and principled ways of tuning hyper-parameters. A number of approaches have been proposed for modeling uncertainties in deep neural networks [105-107].
Along similar lines, Kendall et al. [108] studied the benefits of explicitly modeling epistemic^1 (model) and aleatoric^2 (data) uncertainties [109] in Bayesian deep neural networks for semantic segmentation and depth estimation tasks. Drawing inspiration from this work, we account for these two sources of uncertainty in the process of mapping a normalized facial image into a low-dimensional face representation.

Capacity estimates to determine the uniqueness of other biometric modalities, namely fingerprint and iris, have been reported. Pankanti et al. [110] derived an expression for estimating the probability of a false correspondence between minutiae-based representations from two arbitrary fingerprints belonging to two different fingers. Zhu et al. [111] later developed a more realistic model of fingerprint individuality through a finite mixture model representing the distribution of minutiae in fingerprint images, including minutiae clustering tendencies and dependencies across different regions of the fingerprint image domain. Daugman [112] proposed an information-theoretic approach to compute the capacity of IrisCode: he first developed a generative model of IrisCode based on Hidden Markov Models and then estimated its capacity by calculating the entropy of this generative model. Adler et al. [113] proposed an information-theoretic approach to estimate the average information contained within a face representation such as Eigenfaces [36].

1 Uncertainty due to lack of information about a process.
2 Uncertainty stemming from the inherent randomness of a process.

To the best of our knowledge, no such capacity estimation models have been proposed in the literature for face representations. Moreover, the distinct nature of the representations for fingerprint^3, iris^4, and face^5 traits does not allow capacity estimation approaches to carry over from one biometric modality to another. Therefore, we believe that a new model is necessary to establish the capacity of face representations.

3 An unordered collection of minutiae points.
4 A binary representation, called the iris code.
5 A fixed-length vector of real values.

3.2 Capacity of Face Representations

We first describe the setting of the problem and then our solution. A pictorial outline of the approach is shown in Fig. 3.2.
3.2.1 Face Representation Model

A face representation model M is a parametric embedding function that maps a face image s of identity c to a vector space x ∈ R^p, i.e., x = f_M(s; θ_P), where θ_P is the set of parameters of the embedding function. For example, in the case of a linear embedding function such as Principal Component Analysis (PCA), the parameter set θ_P would be the eigenvectors; in the case of a deep neural network based non-linear embedding function, θ_P comprises the parameters of the network.

Figure 3.2 Overview of Face Representation Capacity Estimation: We cast the capacity estimation process in the framework of the sphere packing problem on a low-dimensional manifold. To generalize the sphere packing problem, we replace spheres by hyper-ellipsoids, one per class (subject). Our approach involves three steps: (i) unfolding and mapping the manifold embedded in the high-dimensional space onto a low-dimensional space; (ii) a teacher-student model to obtain explicit estimates of the uncertainty (noise) in the embedding due to the data as well as the parameters of the representation; and (iii) leveraging the uncertainty estimates to approximate the density of the manifold via multivariate normal distributions (to keep the problem and its analysis tractable), which in turn facilitates an empirical estimate of the capacity of the teacher face representation as a ratio of hyper-ellipsoidal volumes.

The face embedding process can be approximately cast within the framework of a Gaussian noise channel as follows. The face representation y of an image s from the teacher is modeled as an observation of a true underlying embedding x that is corrupted by noise z. The nature of the relationship between these entities is determined by the assumptions of a Gaussian channel, namely, (i) additivity of the noise, i.e., y = x + z; (ii) independence of the true embedding and the additive noise, i.e., x ⊥ z; and (iii) all entities y, x, and z follow Gaussian distributions, i.e., p_x ~ N(μ_g, Σ_g), p_z ~ N(0, Σ_z), and p_y ~ N(μ_y, Σ_y). Statistical estimates of these parameterized distributions will enable us to compute the capacity of the teacher face representation model, as described in Section 3.2.3.

For a given black-box face representation, in practice, the embeddings could lie on an arbitrary and unknown low-dimensional manifold. Approximating this manifold by a normal distribution potentially over-estimates the support of the embedding in R^p, especially when p is high, resulting in an over-estimate of the capacity of the representation. To this end, we model the space occupied by the learned face representation as a low-dimensional population manifold M ⊂ R^m embedded within a high-dimensional space P ⊂ R^p. Under this model, the features of a given identity c lie on a manifold M_c ⊂ M. Directly estimating the support and volume of these manifolds is very challenging, especially since the manifold could be a highly entangled surface in R^p. Therefore, we first learn a mapping that projects and unfolds the population manifold onto a low-dimensional space whose density, support, and volume can be estimated more reliably. We base our solution for projecting and unfolding the manifold on DeepMDS, proposed in Ch. 2.

Figure 3.3 Manifold Unfolding: A DNN-based non-linear mapping is learned to unfold and project the population manifold into a lower-dimensional space. The network is optimized to preserve the geodesic distances between pairs of points in the high- and low-dimensional spaces.

Specifically, we employ DeepMDS to approximate the non-linear mapping that transforms the population manifold in the high-dimensional space, x ∈ R^p, to the unfolded manifold in the low-dimensional space, y ∈ R^m, by a parametric function y = f_P(x; θ_M) with parameters θ_M. We learn the parameters of the mapping within the DeepMDS framework by minimizing the following objective:

    min_{θ_M}  Σ_{ij} ( d_P(x_i, x_j) − d_L(f(x_i; θ_M), f(x_j; θ_M)) )^2 + λ ‖θ_M‖_2^2        (3.1)

where the second term is a regularizer with hyper-parameter λ. Since our primary goal is to estimate the capacity of the representation, we map the manifold into the low-dimensional space while preserving the local geometry of the manifold in the form of pairwise distances. To achieve this goal, we choose d_P(x_i, x_j) = 1 − (x_i^T x_j)/(‖x_i‖_2 ‖x_j‖_2), the cosine distance between the features in the high-dimensional space, and d_L(y_i, y_j) = ‖y_i − y_j‖_2, the Euclidean distance in the low-dimensional space. Fig. 3.3 shows an illustration of the DNN-based mapping.

3.2.2 Estimating Uncertainties in Representations

The projection model learned in the previous section can be used to obtain the population manifold by propagating multiple images from many identities through it. However, this process only provides point estimates (samples) from the manifold and does not account for the uncertainty in the manifold. Accurately estimating the capacity of the face representation necessitates modeling the uncertainty in the representation stemming from the different sources of noise in the process of extracting feature representations from a given facial image.

A probabilistic model for the space of noisy embeddings y generated by a black-box facial representation model (teacher^6) M_t with parameters θ = {θ_P, θ_M} can be formulated as follows:

    p(y | X, Y) = ∫ p(y | s, X, Y) p(s | X, Y) ds = ∫∫ p(y | s, θ) p(θ | X, Y) p(s | X, Y) dθ ds        (3.2)

where Y = {y_1, …, y_N} and X = {s_1, …, s_N} are the training samples used to estimate the model parameters θ, p(y | s, θ) is the aleatoric (data) uncertainty given a set of parameters, p(θ | X, Y) is the epistemic (model) uncertainty in the parameters given the training samples, and p(s | X, Y) ~ N(μ_g, Σ_g) is the Gaussian approximation (see Section 3.2.3 for justification) of the underlying manifold of noiseless embeddings. Furthermore, we assume that the true mapping between the image s and the noiseless embedding μ is a deterministic but unknown function, i.e., μ = f(s, θ).
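The DeepMDS objective of Eq. 3.1 can be sketched in a few lines of numpy. This is a minimal, hedged illustration: `W` is an arbitrary linear map standing in for the actual multi-layer network f(·; θ_M), and the batch size and dimensions are hypothetical.

```python
import numpy as np

def cosine_dist(A):
    """Pairwise cosine distance d_P(x_i, x_j) = 1 - cos(x_i, x_j)."""
    U = A / np.linalg.norm(A, axis=1, keepdims=True)
    return 1.0 - U @ U.T

def euclid_dist(B):
    """Pairwise Euclidean distance d_L(y_i, y_j) = ||y_i - y_j||_2."""
    sq = ((B[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.sqrt(sq)

def deepmds_loss(X, W, lam=3e-4):
    """Eq. 3.1: squared mismatch between ambient cosine distances and
    projected Euclidean distances, plus an L2 regularizer on the parameters."""
    Y = X @ W                                  # stand-in for f(x; theta_M)
    D_P, D_L = cosine_dist(X), euclid_dist(Y)
    return ((D_P - D_L) ** 2).sum() + lam * (W ** 2).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 128))                  # a batch of ambient embeddings
W = rng.normal(size=(128, 10)) * 0.01          # hypothetical 128 -> 10 map
loss = deepmds_loss(X, W)
```

In training, this scalar would be minimized over the network parameters by gradient descent; the mixed metric choice (cosine in the ambient space, Euclidean in the projected space) mirrors how the two spaces are compared at verification time.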
The black-box nature of the teacher model, however, only provides D = {s_i, y_i}_{i=1}^N, pairs of facial images s_i and their corresponding noisy embeddings y_i, i.e., a single sample from the distribution p(y | X, Y). Therefore, we learn a student model M_s with parameters w to mimic the teacher model. Specifically, the student model approximates the data-dependent aleatoric uncertainty p(y_i | s_i, w) ~ N(μ_i, Σ_i), where μ_i is the data-dependent mean of the noiseless embedding and Σ_i is the data-dependent uncertainty around the mean. This student is an approximation of the unknown underlying probabilistic teacher, by which an input image s generates noisy embeddings y of the ideal noiseless embedding μ for a given set of parameters w, i.e., p(y_i | s_i, w) ≈ p(y_i | μ_i, θ). Finally, we employ a variational distribution to approximate the epistemic uncertainty of the teacher, i.e., p(w | X, Y) ≈ p(θ | X, Y).

Learning: Given pairs of facial images and their corresponding embeddings from the teacher model, we learn a student model to mimic the outputs of the teacher for the same inputs, in accordance with the probabilistic model described above. We use parameterized functions μ_i = f(s_i; w_μ) and Σ_i = f(s_i; w_Σ) to characterize the aleatoric uncertainty p(y_i | s_i, w), where w = {w_μ, w_Σ}. We choose deep neural networks, specifically convolutional neural networks, as the functions f(·; w_μ) and f(·; w_Σ). For the epistemic uncertainty, while many deep learning based variational inference approaches [105, 115, 116] have been proposed, we use the simple interpretation of dropout as our variational approximation [105]. Practically, this interpretation characterizes the uncertainty in the deep neural network weights w through a Bernoulli sampling of the weights.

6 We adopt the terminology of teacher-student models from the model compression community [114].
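At its core, the aleatoric term the student optimizes is a Gaussian negative log-likelihood with a predicted per-dimension log-variance (the diagonal-covariance, log-variance form derived later in this section). A minimal sketch, with illustrative values rather than real network outputs:

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Per-sample negative log-likelihood of y under N(mu, diag(exp(log_var))),
    dropping the constant (d/2) log(2*pi) term:
        0.5 * sum_j log_var_j + 0.5 * sum_j (y_j - mu_j)^2 * exp(-log_var_j)
    Predicting the log-variance keeps the variance positive by construction."""
    return 0.5 * np.sum(log_var) + 0.5 * np.sum((y - mu) ** 2 * np.exp(-log_var))

y = np.array([1.0, -0.5, 0.2])                 # observed (noisy) embedding
mu = np.zeros(3)                               # predicted mean
nll_unit = gaussian_nll(y, mu, np.zeros(3))    # unit variance everywhere
# With unit variance the loss reduces to half the squared error:
# 0.5 * (1.0**2 + 0.5**2 + 0.2**2) = 0.645
```

The first term penalizes the network for claiming large uncertainty, while the second discounts the squared error wherever the predicted variance is large; the balance between them is what lets the student learn a per-input, per-dimension noise estimate.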
We learn the parameters of our probabilistic model φ = {w_μ, w_Σ, μ_g, Σ_g} through maximum-likelihood estimation, i.e., by minimizing the negative log-likelihood of the observations Y = {y_1, …, y_N}. This translates to optimizing a combination of loss functions:

    min_φ  L_s + λ L_g + γ L_{r_s} + δ L_{r_g}        (3.3)

where λ, γ, and δ are the weights of the different loss functions, L_{r_s} = (1/2N) Σ_{i=1}^N tr(Σ_i) and L_{r_g} = (1/2) tr(Σ_g) are the regularization terms, and L_s is the loss function of the student that captures the log-likelihood of a given noisy representation y_i under the distribution N(μ_i, Σ_i):

    L_s = (1/2) Σ_{i=1}^N ln|Σ_i| + (1/2) Σ_{i=1}^N Trace( Σ_i^{-1} (y_i − μ_i)(y_i − μ_i)^T )

L_g is the log-likelihood of the population manifold of the embedding under its approximation by a multivariate normal distribution N(μ_g, Σ_g):

    L_g = (N/2) ln|Σ_g| + (1/2) Trace( Σ_g^{-1} Σ_{i=1}^N (y_i − μ_g)(y_i − μ_g)^T )

For computational tractability, we make a simplifying assumption on the covariance matrix by parameterizing it as a diagonal matrix, i.e., the off-diagonal elements are set to zero. This parameterization corresponds to an independence assumption on the uncertainty along each dimension of the embedding. The sparse parameterization of the covariance matrix yields two computational benefits in the learning process. First, it suffices for the student to predict only the diagonal elements of the covariance matrix. Second, positive semi-definiteness constraints on a diagonal matrix can be enforced simply by forcing all the diagonal elements to be non-negative. To enforce non-negativity of each diagonal variance value, we predict the log-variance, l^j = log σ_j^2. This allows us to re-parameterize the student likelihood in terms of l_i:

    L_s = (1/2) Σ_{i=1}^N Σ_{j=1}^d l_i^j + (1/2) Σ_{i=1}^N Σ_{j=1}^d (y_i^j − μ_i^j)^2 exp(−l_i^j)        (3.4)

Similarly, we re-parameterize the likelihood of the noiseless embedding as a function of l_g, the log-variance along each dimension. The regularization terms are also re-parameterized as L_{r_s} = (1/2N) Σ_{i=1}^N Σ_{j=1}^d exp(l_i^j) and L_{r_g} = (1/2) Σ_{j=1}^d exp(l_g^j). We empirically estimate μ_g as μ_g = (1/N) Σ_{i=1}^N y_i and the other parameters φ = {w_μ, w_Σ, Σ_g} through stochastic gradient descent [117]. The gradients of the parameters are computed by backpropagating [118] the gradients of the outputs through the network.

Inference: The learned student model can now be used to infer the uncertainty in the embeddings of the original teacher model. For a given facial image s, the aleatoric uncertainty can be predicted by a feed-forward pass of the image through the network, i.e., μ = f(s; w_μ) and Σ = f(s; w_Σ). The epistemic uncertainty can be approximately estimated through Monte-Carlo integration over different samples of the model parameters w; in practice, the parameter sampling is performed by applying dropout at inference time. In summary, the total uncertainty in the embedding of each facial image s is estimated by performing Monte-Carlo integration over a total of T evaluations:

    μ̂_i = (1/T) Σ_{t=1}^T μ_i^t        (3.5)

    Σ̂_i = (1/T) Σ_{t=1}^T (μ_i^t − μ̂_i)(μ_i^t − μ̂_i)^T + (1/T) Σ_{t=1}^T Σ_i^t        (3.6)

where μ_i^t and Σ_i^t are the predicted mean and aleatoric uncertainty for each feed-forward evaluation of the network.

3.2.3 Manifold Approximation

The student model described in Section 3.2.2 allows us to extract uncertainty estimates for each individual image. Given these estimates, the next step is to estimate the density and support of the population and class-specific low-dimensional manifolds.
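Eqs. 3.5 and 3.6 combine the spread of the T sampled means (the epistemic part) with the average predicted covariance (the aleatoric part). A sketch with diagonal per-pass covariances and synthetic samples in place of real dropout passes:

```python
import numpy as np

def mc_moments(mus, sigmas):
    """mus: (T, d) sampled means; sigmas: (T, d) per-pass diagonal variances.
    Returns (mu_hat, Sigma_hat) per Eqs. 3.5-3.6: the Monte-Carlo mean, and
    the covariance of the sampled means plus the mean aleatoric covariance."""
    mu_hat = mus.mean(axis=0)                                # Eq. 3.5
    diff = mus - mu_hat
    epistemic = (diff[:, :, None] * diff[:, None, :]).mean(axis=0)
    aleatoric = np.array([np.diag(s) for s in sigmas]).mean(axis=0)
    return mu_hat, epistemic + aleatoric                     # Eq. 3.6

T, d = 1000, 4
rng = np.random.default_rng(0)
mus = rng.normal(size=(T, d)) * 0.1 + 1.0    # jittered means (dropout passes)
sigmas = np.full((T, d), 0.5)                # constant predicted variances
mu_hat, Sigma_hat = mc_moments(mus, sigmas)
```

If the sampled means were identical across passes, the epistemic term would vanish and the total covariance would reduce to the mean aleatoric covariance, matching the decomposition in Eq. 3.6.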
Multiple existing techniques can be employed for this purpose under different modeling assumptions, ranging from non-parametric models, such as kernel density estimators and convex hulls, to parametric models, such as multivariate Gaussian distributions and escribed hyper-spheres. The non-parametric and parametric models span the trade-off between the accuracy of the manifold's shape estimate and the computational complexity of fitting the shape and calculating the volume of the manifold. While non-parametric models provide more accurate estimates of the density and support of the manifold, parametric models potentially provide more robust and computationally tractable estimates of its density and volume. For instance, estimating the convex hull of samples in a high-dimensional space, and its volume, is both computationally prohibitive and less robust to outliers.

To overcome these challenges, we approximate the densities of the population and class-specific manifolds in the low-dimensional space via multivariate normal distributions. The choice of the normal approximation is motivated by multiple factors: (a) probabilistically, it leads to a robust and computationally efficient estimate of the density of the manifold; (b) geometrically, it leads to a hyper-ellipsoidal approximation of the manifold, which in turn allows for efficient and exact estimates of the support and volume of the manifold as a function of the desired false accept rate (see Section 3.2.4); and (c) the low-dimensional manifold obtained through projection and unfolding of the high-dimensional representation is implicitly designed, through Eq. 3.1, to cluster the facial images belonging to the same identity, and therefore a normal distribution is a realistic (see Section 3.3.1) approximation of the manifold.

Empirically, we estimate the parameters of these distributions as follows. The mean of the population embedding is computed as μ_{y_c} = (1/C) Σ_{c=1}^C μ̂_c, where μ̂_c = (1/N_c) Σ_{i=1}^{N_c} μ̂_i^c.

Figure 3.4 Decision Theory and Capacity: We illustrate the relation between capacity and the discriminant function corresponding to a nearest neighbor classifier. Left: Depiction of the notion of a decision boundary and the probability of false accept between two identical one-dimensional Gaussian distributions. Shannon's definition of capacity corresponds to the decision boundary being one standard deviation away from the mean. Right: Depiction of the decision boundary induced by the discriminant function of a nearest neighbor classifier. Unlike in Shannon's definition of capacity, the size of the ellipsoidal decision boundary is determined by the maximum acceptable false accept rate. The probability of false acceptance can be computed through the cumulative distribution function of a χ²(r², d) distribution.

The covariance of the population embedding is estimated as

    Σ̃ = argmax_{Σ̂_c} |Σ̂_c|,    Σ_{y_c} + Σ_{z_c} = Σ̃ + (1/C) Σ_{c=1}^C (μ̂_c − μ_{y_c})(μ̂_c − μ_{y_c})^T        (3.7)

where Σ̂_c = (1/N_c) Σ_{i=1}^{N_c} Σ̂_i^c, i.e., Σ̃ is the class covariance of maximal volume. Along the same lines, the class-specific covariance Σ_{z_c} of a class c is estimated as

    Σ_{z_c} = (1/(N_c T)) Σ_{i=1}^{N_c} Σ_{t=1}^T [ (μ_i^t − μ̂_i)(μ_i^t − μ̂_i)^T + Σ_i^t ]        (3.8)

3.2.4 Decision Theory and Model Capacity

Thus far, we have developed the tools necessary to characterize the face representation manifold and estimate its density. In this section we determine the support and volume of the population and class-specific manifolds as a function of the specified false accept rate (FAR).
Our representation space is composed of two components: the population manifold of all the classes, approximated by a multivariate Gaussian distribution, and the embedding noise of each class, also approximated by a multivariate Gaussian distribution. Under these assumptions, the decision boundaries between the classes that minimize the classification error rate are determined by discriminant functions [119]. As illustrated in Fig. 3.4, for a two-class problem the discriminant function is a hyper-plane in R^d, with the optimal hyper-plane being equidistant from both classes. Moreover, the separation between the classes determines the operating point and hence the FAR. In the multi-class setting, the optimal discriminant function is the surface encompassed by all the pairwise hyper-planes, which asymptotically reduces to a high-dimensional hyper-ellipsoid. The support of this enclosing hyper-ellipsoid can be determined by the desired operating point in terms of the maximal acceptable probability of false acceptance.

Under the multi-class setting, the capacity estimation problem is equivalent to the geometrical problem of ellipsoid packing: estimating the maximum number of small hyper-ellipsoids that can be packed into a larger hyper-ellipsoid. In the context of face representations, the small hyper-ellipsoids correspond to the class-specific enclosing hyper-ellipsoids described above, while the large hyper-ellipsoid corresponds to the space spanned by the population of all classes. The volume V of a hyper-ellipsoid corresponding to a Mahalanobis distance r² = (x − μ)^T Σ^{-1} (x − μ) with covariance matrix Σ is V = V_d |Σ|^{1/2} r^d, where V_d is the volume of the d-dimensional unit hyper-sphere. An upper bound on the capacity of the face representation can then be computed simply as the ratio of the volumes of the population and class-specific hyper-ellipsoids:

    C ≈ V_{Σ_{y_c}+Σ_{z_c}} / V_{Σ_{z_c}} = ( V_d |Σ_{y_c}+Σ_{z_c}|^{1/2} r_{y_c}^d ) / ( V_d |Σ_{z_c}|^{1/2} r_{z_c}^d ) = |Σ̄_{y_c,z_c}|^{1/2} / |Σ̄_{z_c}|^{1/2}        (3.9)

where V_{Σ_{y_c}+Σ_{z_c}} is the volume of the population hyper-ellipsoid and V_{Σ_{z_c}} is the volume of the class-specific hyper-ellipsoid. The size of the population hyper-ellipsoid r_{y_c} is chosen such that a desired fraction of all the classes lies within it, and r_{z_c} determines the size of the class-specific hyper-ellipsoid. Σ̄_{y_c,z_c} and Σ̄_{z_c} are the effective sizes of the enclosing population and class-specific hyper-ellipsoids, respectively: for each hyper-ellipsoid, the effective radius along the i-th principal direction is √λ̄_i = r √λ_i, where √λ_i is the radius of the original hyper-ellipsoid along the same principal direction.

This geometrical interpretation of capacity reduces to the Shannon capacity [120] when r_{y_c} and r_{z_c} are chosen to be equal, i.e., r_{y_c} = r_{z_c}. In this instance, the choice of r_{y_c} for the population hyper-ellipsoid implicitly determines the boundary of separation between the classes and hence the operating false accept rate (FAR) of the embedding. For instance, when computing the Shannon capacity of the face representation, choosing r_{y_c} such that 95% of the classes are enclosed within the population hyper-ellipsoid would implicitly correspond to operating at a FAR of 5%. However, practical face recognition systems need to operate at lower false accept rates, dictated by the desired level of security.

The geometrical interpretation of capacity described in Eq. 3.9 directly enables us to compute the representation capacity as a function of the desired operating point, as determined by its corresponding false accept rate. The size of the population hyper-ellipsoid r_{y_c} is determined by the desired fraction of classes to enclose, or alternatively by other geometric constructs such as the minimum-volume enclosing hyper-ellipsoid or the maximum-volume inscribed hyper-ellipsoid of a finite set of classes, both of which correspond to a particular fraction of the population distribution. Similarly, the desired false accept rate q determines the size of the class-specific hyper-ellipsoid r_{z_c}.
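Given the two covariances and radii, the volume ratio of Eq. 3.9 is a one-line computation; the shared V_d factor cancels. A sketch with illustrative values (the covariances and radii below are hypothetical, not estimates from a real representation):

```python
import numpy as np

def capacity(S_pop, S_class, r_pop, r_class):
    """Upper bound of Eq. 3.9: ratio of hyper-ellipsoid volumes
    V = V_d |S|^{1/2} r^d.  S_pop plays the role of the population
    covariance (Sigma_y + Sigma_z), S_class of the class covariance Sigma_z."""
    d = S_pop.shape[0]
    vol_ratio = np.sqrt(np.linalg.det(S_pop) / np.linalg.det(S_class))
    return vol_ratio * (r_pop / r_class) ** d

# Sanity check: an isotropic population whose standard deviation is 10x the
# class standard deviation in each of d = 12 directions packs, at equal
# radii, 10^12 classes.
d = 12
C = capacity(100.0 * np.eye(d), np.eye(d), r_pop=1.0, r_class=1.0)
```

The exponential dependence on d in the (r_pop/r_class)^d factor is what makes the estimate so sensitive to the chosen operating point, which the next section makes precise via the χ² distribution.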
Let Ω = {x | r² ≥ (x − μ)^T Σ^{-1} (x − μ)} be the enclosing hyper-ellipsoid. Without loss of generality, assuming that the class-specific hyper-ellipsoid is centered at the origin, the false accept rate q can be computed as

    q = 1 − ∫_{x∈Ω} (2π)^{-d/2} |Σ|^{-1/2} exp(−x^T Σ^{-1} x / 2) dx        (3.10)

Re-parameterizing the integral as y = Σ^{-1/2} x, we have Ω = {y | r² ≥ y^T y} and

    q = 1 − ∫_{y∈Ω} (2π)^{-d/2} exp(−y^T y / 2) dy        (3.11)

where y_1, …, y_d are independent standard normal random variables. The Mahalanobis distance r² is distributed according to the χ²(r², d) distribution with d degrees of freedom, and 1 − q is the cumulative distribution function of χ²(r², d). Therefore, given the desired FAR q, the corresponding Mahalanobis distance r_{z_c} can be obtained from the inverse CDF of the χ²(r_z², d) distribution. Along the same lines, the size of the population hyper-ellipsoid r_{y_c} can be estimated from the inverse CDF of the χ²(r_y², d) distribution given the desired fraction of classes to encompass. These estimates of r_{z_c} and r_{y_c} can be used in Eq. 3.9 to estimate capacity as a function of the desired FAR. Algorithm 1 provides a high-level outline of our complete capacity estimation procedure.

Algorithm 1 Face Representation Capacity Estimation
Input: Representation f_M(·; θ_P), a face dataset, and the desired FAR.
Output: Capacity estimate at the specified FAR.
Step 1: Learn the parametric mapping f_P(·; θ_M): x → y (Eq. 3.1)
Step 2: Learn a student model M_s to mimic and provide uncertainty estimates of the teacher M_t = (M, P) (Eq. 3.3)
Step 3: Estimate the density and support Σ_{y_c} of the population manifold (Eq. 3.7)
Step 4: Estimate the density and support Σ_{z_c} of the class-specific manifolds (Eq. 3.8)
Step 5: Obtain r_{y_c} and r_{z_c} for the desired population fraction and FAR, respectively (Eq. 3.11)
Step 6: Obtain the capacity estimate for the desired population fraction and FAR using r_{y_c} and r_{z_c} (Eq. 3.9)

3.3 Numerical Experiments

In this section we (a) illustrate the capacity estimation process on a two-dimensional toy example, (b) estimate the capacity of deep neural network based face representation models, specifically FaceNet and SphereFace, on multiple datasets of increasing complexity, and (c) study the effect of different design choices in the proposed capacity estimation approach.

Table 3.1 Capacity of Two-Dimensional Toy Example at 1% FAR (estimate / ground truth)

    Support     | Population covariance           | Population area | Class (max area) covariance    | Class area    | Capacity
    Ellipse     | [10.84, 0.56; 0.56, 11.57] /    | 35.15 / 34.62   | [4.96, 0.47; 0.47, 6.54] /     | 17.84 / 15.25 | 1.97 / 2.27
                | [10.34, 0.71; 0.71, 11.79]      |                 | [4.18, 0.97; 0.97, 5.86]       |               |
    Convex Hull |                                 | 403.91          |                                | 102.65        | 3.93 / 2.27

3.3.1 Two-Dimensional Toy Example

Figure 3.5 Sample Representation Space: Illustration of a two-dimensional space where the underlying population and class-specific representations (we show four classes) are 2-D Gaussian distributions (solid ellipsoids). Samples from the classes (colored ×) are used to obtain estimates of the underlying population and class-specific distributions (solid lines). As a comparison, the support of the samples in the form of a convex hull is also shown (dashed lines).

We consider an illustrative example to demonstrate the capacity estimation process given a constellation of classes in a two-dimensional representation space. We model the distribution of the population space of classes (class centers, to be specific) as a multivariate normal distribution, while the feature space of each class is modeled as a two-dimensional normal distribution. From this model, we sample 100 different classes from the underlying population distribution, and for each of these classes we sample features from the ground-truth multivariate normal distribution for that class. From these samples, we estimate the covariance matrix of the population space distribution and that of the individual classes.
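For the two-dimensional case, the χ² inverse CDF needed in Eq. 3.11 has a closed form: with d = 2 degrees of freedom, the CDF is 1 − exp(−r²/2), so r = √(−2 ln q). (For general d one would use a numerical χ² inverse CDF, e.g. scipy.stats.chi2.ppf.) A sketch tying this to the capacity ratio, using the estimated covariances reported in Table 3.1:

```python
import math
import numpy as np

def radius_2d(q):
    """Mahalanobis radius enclosing probability 1-q of a 2-D Gaussian:
    with d = 2 the chi-square CDF is 1 - exp(-r^2/2), inverted in closed form."""
    return math.sqrt(-2.0 * math.log(q))

r = radius_2d(0.01)          # 1% FAR  ->  r^2 = -2 ln(0.01) ~ 9.21

# With equal population and class radii (the Shannon-style setting), Eq. 3.9
# reduces to the determinant ratio. Using the estimated covariances from
# Table 3.1:
S_pop = np.array([[10.84, 0.56], [0.56, 11.57]])
S_cls = np.array([[4.96, 0.47], [0.47, 6.54]])
cap = math.sqrt(np.linalg.det(S_pop) / np.linalg.det(S_cls))
# cap ~ 1.97, consistent with the "Ellipse" capacity estimate in Table 3.1
```

The same two lines, with radii obtained from a χ² inverse CDF at the desired FAR and population fraction, give the general operating-point-dependent estimate.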
Fig. 3.5 shows the representation space, including the population space and four different classes, corresponding to the classes with the minimum, mean, median, and maximum area among the 100 sampled population classes. As a comparison, we also obtain the support of the population and of the classes through the convex hull of the samples, even though this presents a number of practical challenges: (1) estimating the convex hull in high dimensions is computationally challenging, (2) the convex hull over-estimates the support due to outliers, and (3) it cannot easily be adapted to obtain the support as a function of the desired FAR.

The capacity of the representation is then estimated as the ratio of the support area of the population to that of the class with the median area. Tab. 3.1 shows the capacity estimates so obtained for this simplified representation space. The results suggest that the ellipsoidal approximation of the representation provides more accurate estimates of capacity than the convex hull. Modeling the support of the representation through convex hulls is severely affected by outliers, resulting in an over-estimate of the underlying support and area of the representation, and thus of its capacity.

3.3.2 Datasets and Face Representation Models

Datasets. We utilize multiple large-scale face datasets, both for learning the teacher and student models and for estimating the capacity of the teacher. The CASIA-WebFace dataset [11] is used for training both the teacher and student models. The capacity of the teacher is estimated on LFW [27], IJB-A [29], IJB-B [30], and IJB-C [9].

Face Representation Models. We estimate the capacity of two different face representation models: (i) FaceNet, introduced by Schroff et al. [12], and (ii) SphereFace, introduced by Liu et al. [13]. These models are illustrative of state-of-the-art representations for face recognition.
The manifold projection and unfolding function is modeled as a multi-layer deep neural network with multiple residual [50] modules consisting of fully-connected layers. For a given image, the low-dimensional representation is therefore obtained by propagating the image through the original face representation model and then through the manifold projection model. We refer to the combined model, i.e., the original representation plus the projection model, as the teacher model. Since the student model is purposed to mimic the teacher model, we base the student network architecture on the teacher's^7 architecture, with a few notable exceptions. First, we introduce dropout before every convolutional layer of the network, including all the convolutional layers of the inception [121] and residual [50] modules, and before every linear layer of the manifold projection and unfolding modules. Second, the last layer of the network is modified to generate two outputs, μ and Σ, instead of the output of the teacher, i.e., a sample y of the noisy embedding.

3.3.3 Face Recognition Performance

Below we provide implementation details for learning the manifold projection and the student networks. Subsequently, we demonstrate the ability of the student model to maintain the discriminative performance of the original models.

Implementation Details: We use pre-trained models for both FaceNet^8 and SphereFace^9 as our original face representation models. Before we extract features from these models, the face images are pre-processed and normalized to a canonical face image. The faces are detected and normalized using the joint face detection and alignment system introduced by Zhang et al. [33]. Given the facial landmarks, the faces are normalized to a canonical image of size 182×182, from which RGB patches of size 160×160 are extracted as the input to the networks.
Given the features extracted from the original representation, we train the manifold projection and unfolding networks on the CASIA-WebFace dataset. The model is trained to minimize the multi-dimensional scaling loss function described in Eq. 3.1 on randomly selected pairs of feature vectors x_i and x_j from the dataset. Training is performed using the Adam [97] optimizer with a learning rate of 3e-4 and the regularization parameter λ = 3 × 10^−4. We use a batch size of 256 image pairs and train the model for about 100 epochs.

[7] In the scenario where the teacher is a black-box model, the design of the student network architecture needs more careful consideration, but it also affords more flexibility. See Fig. 3.2 for an illustration of this process.
[8] https://github.com/davidsandberg/facenet
[9] https://github.com/wy1iu/sphereface

The student is trained to minimize the loss function defined in Eq. 3.3, where the hyper-parameters are chosen through cross-validation. Training is performed through stochastic gradient descent with Nesterov momentum of 0.9 and weight decay of 0.0005. We use a batch size of 64 and a learning rate of 0.01 that is dropped by a factor of 2 every 20 epochs. We observed that it is sufficient to train the student model for about 100 epochs for convergence. The student model includes dropout with a probability of 0.05 after each convolutional layer and with a probability of 0.2 after each fully-connected layer in the manifold projection layers. At inference, each image is passed through the student network 1,000 times as a way of performing Monte-Carlo integration through the space of network parameters {w_μ, w_Σ}. These sampled outputs are used to empirically estimate the mean and covariance of the image embedding.

Experiments: We evaluate and compare the performance of the original and student models on the four test datasets, namely, LFW, IJB-A, IJB-B, and IJB-C. To evaluate the student model, we estimate the face representation through Monte-Carlo integration. We pass each image through the student model 1,000 times to extract {μ_i, Σ_i}_{i=1}^{1000} and compute μ = (1/1000) Σ_{i=1}^{1000} μ_i as the representation.
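The Monte-Carlo step above can be sketched as follows. This is a toy stand-in for the student (a random two-layer map with hand-rolled inverted dropout, not the thesis's network): dropout stays active at inference, and repeated stochastic passes yield an empirical mean and covariance of the embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny "student": one hidden layer with dropout. Repeated
# stochastic forward passes play the role of sampling network weights.
W1 = rng.normal(size=(128, 64)) / np.sqrt(128)
W2 = rng.normal(size=(64, 16)) / np.sqrt(64)

def student_forward(x, p=0.05):
    h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
    mask = rng.random(h.shape) >= p          # dropout stays on at inference
    h = h * mask / (1.0 - p)                 # inverted-dropout scaling
    return h @ W2

def mc_embedding(x, n_samples=1000):
    """Empirical mean and covariance of the embedding over dropout samples."""
    s = np.stack([student_forward(x) for _ in range(n_samples)])
    mu = s.mean(axis=0)
    cov = np.cov(s, rowvar=False)
    return mu, cov

x = rng.normal(size=128)
mu, cov = mc_embedding(x)
print(mu.shape, cov.shape)                   # (16,) (16, 16)
```

The resulting (μ, Σ) pair is exactly the quantity the capacity estimation in Sec. 3.3.4 consumes.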
Following standard practice, we match a pair of representations through a nearest-neighbor classifier, i.e., by computing the Euclidean distance d_ij = ||y_i − y_j||_2^2 between the low-dimensional projected feature vectors y_i and y_j.

We evaluate the face representation models on the LFW dataset using the BLUFR protocol [99], and follow the prescribed template-based matching protocol, where each template is composed of possibly multiple images of the class, for the IJB-A, IJB-B, and IJB-C datasets. Following the protocol in [122], we define the match score between templates as the average of the match scores between all pairs of images in the two templates.

Fig. 3.6 and Tab. 3.2 report the performance of the original and student models, both FaceNet and SphereFace, on each of these datasets at different operating points. This comparison accounts for both the ability of the projection model to maintain the performance of the original high-dimensional representation and the ability of the student to mimic the teacher while providing uncertainty estimates. We make the following observations: (1) The performance of the DNN-based representation

Table 3.2 Face recognition results (TAR, %) for FaceNet, SphereFace, and the state-of-the-art (the state-of-the-art face representation models are not available in the public domain).

Dataset      | Original: FaceNet  | Student: FaceNet   | Original: SphereFace | Student: SphereFace | State-of-the-Art
             | 0.1% FAR | 1% FAR  | 0.1% FAR | 1% FAR  | 0.1% FAR | 1% FAR    | 0.1% FAR | 1% FAR   | 0.1% FAR | 1% FAR
LFW (BLUFR)  | 93.90    | 98.51   | 92.83    | 98.28   | 96.74    | 99.11     | 95.49    | 98.79    | 98.88 [123] | N/A
IJB-A        | 45.92    | 70.26   | 43.84    | 71.72   | 65.06    | 85.97     | 64.13    | 85.25    | 94.8 | 97.1 [124]
IJB-B        | 48.31    | 74.47   | 45.56    | 74.10   | 67.58    | 80.81     | 64.02    | 80.63    | 93.7 | 97.5 [125]
IJB-C        | 42.57    | 78.53   | 40.74    | 76.75   | 71.26    | 91.67     | 64.02    | 88.33    | 94.7 | 98.3 [125]

Figure 3.6 Face recognition performance of the original and student models on different datasets. We report the face verification performance of both the FaceNet and SphereFace face representations: (a) LFW, evaluated through the BLUFR protocol; (b) IJB-A, (c) IJB-B, and (d) IJB-C, evaluated through their respective matching protocols.
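The template-to-template score defined above (average over all cross pairs) can be sketched in a few lines. The per-pair score here is taken as the negative squared Euclidean distance, matching the nearest-neighbor matcher in the text; the function names are illustrative, not from the thesis.

```python
import numpy as np

def template_score(T_a, T_b):
    """Average match score over all cross pairs of two templates.

    T_a, T_b: (n, d) and (m, d) arrays of per-image embeddings.
    Per-pair score = negative squared Euclidean distance.
    """
    # pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (np.sum(T_a**2, 1)[:, None] + np.sum(T_b**2, 1)[None, :]
          - 2.0 * T_a @ T_b.T)
    return float(np.mean(-sq))

a = np.array([[1.0, 0.0], [0.0, 1.0]])   # template with two images
b = np.array([[1.0, 0.0]])               # template with one image
print(template_score(a, b))              # mean of -(0) and -(2) = -1.0
```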
on LFW, consisting largely of frontal face images with minimal pose variations and facial occlusions, is comparable to the state-of-the-art. However, its performance on IJB-A, IJB-B, and IJB-C, datasets with large pose variations, is lower than that of state-of-the-art approaches. This is due to the template generation strategy that we employ and the fact that, unlike these methods, we do not fine-tune the DNN model on the IJB-A, IJB-B, and IJB-C training sets. We reiterate that our goal in this work is to estimate the capacity of a generic face representation, as opposed to achieving the best verification performance on each individual dataset. (2) Our results indicate that the student models are able to mimic the teacher models very well, as demonstrated by the similarity of their receiver operating curves.

3.3.4 Face Representation Capacity

Having demonstrated the ability of the student model to be an effective proxy for the original representation manifold, we indirectly estimate the capacity of the original model by estimating the capacity of the student model.

Implementation Details: We estimate the capacity of the face representations by evaluating Eq. 3.9. For each of the datasets, we empirically determine the shape and size of the population hyper-ellipsoid Σ_y and the class-specific hyper-ellipsoids Σ_z. These quantities are computed through the predictions obtained by sampling the weights (w_μ, w_Σ) of the model via dropout. We obtain 1,000 such predictions for a given image by feeding the image through the student network 1,000 different times with dropout. For robustness against outliers, we only consider classes with at least two images per class for LFW and five images per class for all the other datasets in the capacity estimates.

Table 3.3 Capacity of the face representation models at 1% FAR.

Dataset | FaceNet    | SphereFace
LFW     | 4.3 × 10^6 | 2.6 × 10^5
IJB-A   | 6.3 × 10^4 | 3.2 × 10^6
IJB-B   | 6.4 × 10^4 | 2.4 × 10^5
IJB-C   | 2.7 × 10^4 | 8.4 × 10^4
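The capacity computation reduces to a ratio of hyper-ellipsoid volumes. The thesis's exact Eq. 3.9 is not reproduced here, so the following is a generic volume-ratio sketch under the stated Gaussian model: vol ∝ r^d · sqrt(det Σ), so the ratio is (r_y/r_z)^d · sqrt(det Σ_y / det Σ_z), computed in log-space for stability in high dimensions.

```python
import numpy as np

def capacity_estimate(cov_pop, cov_class, r_pop=1.0, r_class=1.0):
    """Volume ratio of two hyper-ellipsoids given covariances and radii.

    Generic sketch of a volume-ratio capacity (not the thesis's exact
    Eq. 3.9): vol ∝ r^d * sqrt(det(cov)), so the ratio is
    (r_pop / r_class)^d * sqrt(det(cov_pop) / det(cov_class)).
    """
    d = cov_pop.shape[0]
    sign_p, logdet_p = np.linalg.slogdet(cov_pop)
    sign_c, logdet_c = np.linalg.slogdet(cov_class)
    assert sign_p > 0 and sign_c > 0, "covariances must be positive definite"
    log_cap = d * np.log(r_pop / r_class) + 0.5 * (logdet_p - logdet_c)
    return np.exp(log_cap)

# population spread 10x the class spread per axis in d=4 -> 10^4 identities
cov_pop = np.eye(4) * 100.0     # std 10 per dimension
cov_cls = np.eye(4)             # std 1 per dimension
print(capacity_estimate(cov_pop, cov_cls))   # 10000.0
```

At 1% FAR the radii coincide (r_y = r_z, see below), so the estimate is driven entirely by the determinant ratio.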
Capacity Estimates: Tab. 3.3 reports the capacity of the DNN-based face representations estimated on different datasets at 1% FAR (i.e., when r_{Σ_y} = r_{Σ_z}). We make the following observations from our numerical results: The upper bound on the capacity estimate of the FaceNet and SphereFace models in constrained scenarios (LFW) is of the order of ≈10^6; in unconstrained environments (IJB-A, IJB-B, and IJB-C) it is of the order of ≈10^5 under the general model of a hyper-ellipsoid with the class corresponding to maximum noise. Therefore, theoretically, the representation should be able to resolve 10^6 and 10^5 subjects with a true acceptance rate (TAR) of 100% at a FAR of 1% under the constrained and unconstrained operational settings, respectively. While this capacity estimate is on the order of the population of a large city, in practice the performance of the representation is lower than the theoretical performance: about 95% across only 10,000 subjects in the constrained scenario, and only 50% across 3,531 subjects in the unconstrained scenario. These results suggest that our capacity estimates are an upper bound on the actual performance of face recognition systems in practice, especially under unconstrained scenarios. The relative order of the capacity estimates, however, mimics the relative order of the verification accuracy on these datasets, as shown in Fig. 3.7c.

Figure 3.7 Capacity estimates across different datasets for the (a) FaceNet [12] and (b) SphereFace [13] representations as a function of different false accept rates. In the limit, the capacity tends to zero as the FAR tends to zero; similarly, the capacity tends to ∞ as the FAR tends to 1.0. (c) Logarithmic values of capacity on different datasets versus the corresponding TAR @ 0.1% FAR.
We extend the capacity estimates presented above to establish capacity as a function of different operating points, as defined by different false accept rates. We define r_{Σ_y} and r_{Σ_z} corresponding to the desired operating points and evaluate Eq. 3.9. In all our experiments, we choose r_{Σ_y} to encompass 99% of the classes within the population hyper-ellipsoid. Different FARs define different decision boundary contours that, in turn, define the size of the class-specific hyper-ellipsoid. Figures 3.7a and 3.7b show how the capacity of the representation changes as a function of the FAR for different datasets. We note that at the operating point of FAR = 0.1%, the maximum capacity of the face representation is ≈10^5 in the constrained case and ≈10^3 in the unconstrained case. However, at stricter operating points (FAR of 0.001% or 0.0001%), which are more meaningful at larger scales of operation [126], the capacity of the FaceNet representation is significantly lower (63 and 6, respectively, for IJB-C) than the typical desired scale of operation of face recognition systems. These results suggest significant room for improvement in face representations.

Table 3.4 IJB-C capacity at 1% FAR across intra-class uncertainty.

Model      | Min         | Mean        | Median      | Max
FaceNet    | 1.6 × 10^14 | 6 × 10^8    | 5.0 × 10^8  | 2.7 × 10^4
SphereFace | 4.9 × 10^16 | 1.1 × 10^11 | 9.8 × 10^10 | 8.4 × 10^4

3.3.5 Ablation Studies

DNN and PCA: We seek to compare the capacity of the classical PCA-based EigenFaces [36] representation of image pixels and the DNN-based representation. These are illustrative of the two extremes of the various face representations proposed in the literature, with FaceNet and SphereFace providing close to state-of-the-art recognition performance. The FaceNet and SphereFace representations are based on non-linear, multi-layered deep convolutional network architectures.
EigenFaces, in contrast, is a linear model for representing faces. The capacity of EigenFaces is ≈10^0, which is significantly lower than the capacity of the DNN-based representations. EigenFaces, by virtue of being based on linear projections of the raw pixel values, is unable to scale beyond a handful of identities, while the DNN representations are able to resolve significantly more identities. The relative difference in capacity is also reflected in the vast difference in verification performance between the two representations.

Data Bias: Our capacity estimates are critically dependent on the representational support of the canonical class. In other words, the capacity expression in Eq. 3.9 depends on Σ_z, which is representative of the demographics and intra-class variability of the subjects in the population of interest. However, the hyper-ellipsoids corresponding to various classes could potentially be of different sizes. For instance, in Fig. 3.1 each class-specific manifold is of a different size, orientation, and shape. Precisely defining or identifying a canonical subject, from among all possible identities, is in itself a challenging task and beyond the scope of this work. In Tab. 3.4 we report the capacity for different choices of classes (subjects) from the IJB-C dataset, i.e., classes with the minimum, mean, median, and maximum hyper-ellipsoid volume, thereby ranging from classes with very low intra-class variability to classes with very high intra-class variability. Datasets whose class distribution is similar to the distribution of the data that was used to train the face representation are expected to exhibit low intra-class uncertainty, while datasets with classes that are out of the training distribution can potentially have high intra-class uncertainty and, consequently, lower capacity. Fig. 3.8 shows examples of the images corresponding to the lowest and highest intra-class variability in each dataset.
Empirically, we observed that classes with the smallest hyper-ellipsoid are typically classes with very few images and very little variation in facial appearance. Similarly, classes with high intra-class uncertainty are typically classes with a very large number of images spanning a wide range of variations in pose, expression, illumination conditions, etc., variations that one can expect under any real-world deployment of face recognition systems. Coupled with the fact that the capacity of the face representation is estimated from a very small sample of the population (fewer than 11,000 subjects), we argue that the class with large intra-class uncertainty within the datasets considered in this chapter is a reasonable proxy for a canonical subject in unconstrained real-world deployments of face recognition systems.

Figure 3.8 Example images of classes that correspond to different sizes of the class-specific hyper-ellipsoids, based on the SphereFace representation, for the different datasets considered: (a) LFW, (b) IJB-B, (c) IJB-C. Top row: Images of the class with the largest class-specific hyper-ellipsoid for each database. Notice that in the case of a database with predominantly frontal faces (LFW), large variations in facial appearance lead to the greatest uncertainty in the class representation. On more challenging datasets (IJB-B, IJB-C), the face representation exhibits the most uncertainty due to pose variations. Bottom row: Images of the class with the smallest class-specific hyper-ellipsoid for each database. As expected, across all the datasets, frontal face images with minimal change in appearance result in the least amount of uncertainty in the class representation.

Table 3.5 IJB-C capacity at 1% FAR across manifold support.

Model      | Hypersphere | Hyper-Ellipsoid (Axis-Aligned) | Hyper-Ellipsoid
FaceNet    | 1.5 × 10^3  | 9.2 × 10^2                     | 2.7 × 10^4
SphereFace | 6.7 × 10^3  | 7.2 × 10^3                     | 8.4 × 10^4

Gaussian Distribution Parameterization: For the sake of efficiency, we made the same modeling assumption for both the global shape of the embedding and the embedding shape of each class. The capacity estimates obtained thus far are based on modeling the manifolds as unconstrained hyper-ellipsoids.
We now obtain capacity estimates under different modeling assumptions on the shape of these entities. For instance, the shapes could also be modeled as hyper-spheres, corresponding to a diagonal covariance matrix with the same variance in each dimension. We generalize the hyper-sphere model to an axis-aligned hyper-ellipsoid, corresponding to a diagonal covariance matrix with possibly different variances along each dimension. Tab. 3.5 shows the capacity estimates on the IJB-C dataset at 1% FAR. We observe that the capacity estimates of the anisotropic Gaussian (hyper-ellipsoid) are two orders of magnitude higher than the capacity estimates of the reduced approximations, the hyper-sphere (isotropic Gaussian) and the axis-aligned hyper-ellipsoid. At the same time, the isotropic and axis-aligned hyper-ellipsoid approximations result in very similar capacity estimates.

3.4 Conclusion

Face recognition is based on two underlying premises: persistence (invariance of the face representation over time) and capacity (the number of distinct identities a face representation can resolve). While face longitudinal studies [127] have addressed the persistence property, very little attention has been devoted to the capacity problem that is addressed here. The face representation process was modeled as a low-dimensional manifold embedded in a high-dimensional space. We estimated the capacity of a face representation as a ratio of the volumes of the population and class-specific manifolds as a function of the desired false acceptance rate. Empirically, we estimated the capacity of two deep neural network based face representations: FaceNet and SphereFace. Numerical results yielded a capacity of 10^5 at a FAR of 1%. At the lower FAR of 0.001%, the capacity dropped off significantly, to only 70 under unconstrained scenarios, impairing the scalability of the face representation. There does exist a large gap between the theoretical and empirical verification performance of the representations, indicating that there is significant scope for improvement in the discriminative capabilities of current state-of-the-art face representations.
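The three shape models can be compared directly on the same samples. The sketch below (illustrative only, not the thesis's estimator) fits an isotropic, an axis-aligned, and a full covariance to one strongly correlated class and compares the resulting ellipsoid volumes (∝ sqrt(det Σ)); only the full model captures the thin, tilted ellipse, so the diagonal approximations inflate the class support.

```python
import numpy as np

def fitted_covariances(samples):
    """Fit the three Gaussian shape models discussed above to one class.

    Returns (isotropic, axis_aligned, full) covariance matrices: a scaled
    identity, a diagonal matrix, and the unconstrained sample covariance.
    """
    full = np.cov(samples, rowvar=False)
    axis_aligned = np.diag(np.diag(full))            # per-axis variances only
    isotropic = np.eye(full.shape[0]) * np.diag(full).mean()
    return isotropic, axis_aligned, full

rng = np.random.default_rng(1)
# strongly correlated 2-D class: a thin, tilted ellipse
z = rng.multivariate_normal([0, 0], [[1.0, 0.95], [0.95, 1.0]], size=5000)
iso, diag_cov, full = fitted_covariances(z)
vols = [np.sqrt(np.linalg.det(c)) for c in (iso, diag_cov, full)]
# the full-covariance ellipsoid is much smaller than either diagonal
# approximation, so it yields a larger population-to-class volume ratio
print(vols[2] < vols[1] and vols[2] < vols[0])
```

Since class volume sits in the denominator of the capacity ratio, the smaller full-covariance class support is consistent with Tab. 3.5's higher capacity for the unconstrained hyper-ellipsoid.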
As face recognition technology makes rapid strides in performance and witnesses wider adoption, quantifying the capacity of a given face representation is an important problem, both from an analytical and a practical perspective. However, due to the challenging nature of finding a closed-form expression for the capacity, we make simplifying assumptions on the distributions of the population and of specific classes in the representation space. Our experimental results demonstrate that even this simplified model is able to provide reasonable capacity estimates of a DNN-based face representation. Relaxing the assumptions of the approach presented here is an exciting direction for future work, leading to more realistic capacity estimates.

Chapter 4
The Bias in Face Recognition

In this chapter we assess the demographic bias in FR algorithms and develop new methods to mitigate the demographic impact on FR performance. We experiment with different de-biasing approaches and network architectures using deep learning. By quantitatively assessing the models' demographic bias on various datasets, we see how much bias is mitigated in our attempt at improving the fairness of face representations extracted from CNNs.

More specifically, in this chapter we propose two different methods to learn a fair face representation, in which faces of every group can be equally well represented. In the first method, we present a de-biasing adversarial network (DebFace) that learns to extract disentangled feature representations for both unbiased face recognition and demographics estimation. The proposed network consists of one identity classifier and three demographic classifiers (for gender, age, and race) that are trained to distinguish identity and demographic attributes, respectively. Adversarial learning is adopted to minimize correlation among the feature factors so as to abate bias influence from other factors. We also design a scheme to combine demographics with identity features to strengthen the robustness of the face representation across different demographic groups.
The second method, the group adaptive classifier (GAC), learns to mitigate bias by using adaptive convolution kernels and attention mechanisms on faces based on their demographic attributes. The adaptive module comprises kernel masks and channel-wise attention maps for each demographic group so as to activate different facial regions for identification, leading to more discriminative features pertinent to each demographic. We also introduce an automated adaptation strategy which determines whether to apply adaptation to a certain layer by iteratively computing the dissimilarity among demographic-adaptive parameters, thereby increasing the efficiency of the adaptation learning.

The experimental results on benchmark face datasets (e.g., RFW [4], LFW, IJB-A, and IJB-C) show that our approach is able to reduce bias in face recognition across various demographic groups while maintaining competitive performance.

4.1 Fairness Learning and De-biasing Algorithms

We start by reviewing recent advances in fairness learning and de-biasing algorithms. Previous efforts on fairness techniques aim to prevent machine learning models from utilizing statistical bias in training data, including adversarial training [128-131], subgroup constraint optimization [132-134], data pre-processing (e.g.
, weighted sampling [135] and data transformation [136]), and algorithmic post-processing [137, 138]. For example, in the pre-DNN era, Zhang et al. [139] proposed a cost-sensitive learning framework to reduce the misclassification rate of face identification. To correct the skew of the separating hyperplanes of SVMs on imbalanced data, Liu et al. [140] proposed a margin-based adaptive fuzzy SVM that obtains a lower generalization error bound. In the DNN era, face recognition models are trained on large-scale face datasets with highly imbalanced class distributions [141, 142]. To uncover deep learning bias, Alexander et al. [143] developed an algorithm to mitigate the hidden biases within training data. RangeLoss [142] learns a robust face representation that makes the most use of every training sample. To mitigate the impact of insufficient class samples, center-based feature transfer learning [141] and large-margin feature augmentation [144] were proposed to augment the features of under-represented identities and equalize the class distribution. Despite their effectiveness, these studies ignore the influence of demographic imbalance in the dataset, which may lead to demographic bias. The studies in [4, 145] address demographic bias in FR by leveraging unlabeled faces to improve the performance on groups with fewer samples. Wang et al. [5] propose skewness-aware reinforcement learning to mitigate racial bias in FR. Unlike prior work, we design the GAC framework to customize the classifier for each demographic group, which, if successful, leads to mitigated bias. This framework is presented in Sec. 4.4.

Another promising approach learns a fair representation that preserves all discerning information about the data attributes or task-related attributes but eliminates the prejudicial effects of sensitive factors [22, 146-149]. Locatello et al. [150] show that feature disentanglement is consistently correlated with increased fairness of general-purpose representations, based on an analysis of 12,600 SOTA models. Accordingly, we propose our second de-biasing framework, DebFace, which disentangles face representations to de-bias both FR and demographic attribute estimation. Sec. 4.3 discusses DebFace in more detail.
4.2 Problem Definition

We now give a specific definition of the problem addressed in this chapter. The ultimate goal of unbiased face recognition is that, for a given face recognition system, there is no statistically significant difference in performance among different categories of face images. Despite the research on pose-invariant face recognition that aims for equal performance on all poses [151, 152], we believe it is inappropriate to define variations like pose, illumination, or resolution as the categories. These are instantaneous image-related variations with intrinsic bias; e.g., large-pose or low-resolution faces are inherently harder to recognize than frontal-view, high-resolution faces. Rather, we define subject-related properties, such as demographic attributes, as the categories. A face recognition system is biased if it performs worse on certain demographic cohorts. For practical applications, it is important to consider what demographic biases may exist, and whether these are intrinsic biases across demographic cohorts or algorithmic biases derived from the algorithm itself. This motivates us to analyze the demographic influence on face recognition performance and strive to reduce algorithmic bias in face recognition systems. One may achieve this by training on a dataset containing uniform samples over the cohort space. However, the demographic distribution of a dataset is often imbalanced, under-representing demographic minorities while over-representing majorities. Naively re-sampling a balanced training dataset may still induce bias, since the diversity of latent variables differs across cohorts and the instances cannot be treated fairly during training. To mitigate demographic bias, we propose two face de-biasing frameworks that reduce demographic bias in face identity features while maintaining the overall verification performance.
4.3 Jointly De-biasing Face Recognition and Demographic Attribute Estimation

In this section, we introduce our other framework to address the influence of demographic bias on face recognition. With the technique of adversarial learning, we attack this issue from a different perspective. Specifically, we assume that if the face representation does not carry discriminative information about demographic attributes, it will be unbiased in terms of demographics. Given this assumption, one common way to remove demographic information from face representations is to perform feature disentanglement via adversarial learning (Fig. 4.1b). That is, a classifier of demographic attributes can be used to encourage the identity representation to not carry demographic information. However, one issue with this common approach is that the demographic classifier itself could be biased (e.g., the race classifier could be biased with respect to gender), and hence it will act differently while disentangling faces of different cohorts. This is clearly undesirable, as it leads to a demographically biased identity representation.

To resolve this chicken-and-egg problem, we propose to jointly learn unbiased representations for both the identity and the demographic attributes. Specifically, starting from a multi-task learning framework that learns disentangled feature representations of gender, age, race, and identity, respectively, we request the classifier of each task to act as adversarial supervision for the other tasks (e.g., the dashed arrows in Fig. 4.1c). These four classifiers help each other achieve better feature disentanglement, resulting in unbiased feature representations for both the identity and the demographic attributes. As shown in Fig. 4.1, our framework is in sharp contrast to both multi-task learning and adversarial learning.

Figure 4.1 Methods to learn different tasks simultaneously: (a) multi-task learning, (b) adversarial learning, (c) DebFace. Solid lines are the typical feature flow in a CNN, while dashed lines are adversarial losses.
Moreover, since the features are disentangled into demographic and identity components, our face representations also contribute to privacy-preserving applications. It is worth noting that such identity representations contain little demographic information, which could undermine recognition competence, since demographic features are part of identity-related facial appearance. To retain recognition accuracy on demographically biased face datasets, we propose another network that combines the demographic features with the demographic-free identity features to generate a new identity representation for face recognition.

The key contributions and findings of this work are:
- A thorough analysis of deep learning based face recognition performance on three different demographics: (i) gender, (ii) age, and (iii) race.
- A de-biasing face recognition framework, called DebFace, that generates disentangled representations for both identity and demographics recognition while jointly removing discriminative information from the other counterparts.
- The identity representation from DebFace (DebFace-ID) shows lower bias on different demographic cohorts and also achieves SOTA face verification results on demographic-unbiased face recognition.
- The demographic attribute estimations via DebFace are less biased across other demographic cohorts.
- Combining ID with demographics results in more discriminative features for face recognition on biased datasets.

4.3.1 Adversarial Learning and Disentangled Representation

We first review previous work related to adversarial learning and representation disentanglement. Adversarial learning [153] has been well explored in many computer vision applications. For example, Generative Adversarial Networks (GANs) [154] employ adversarial learning to train a generator by competing with a discriminator that distinguishes real images from synthetic ones.
Adversarial learning has also been applied to domain adaptation [155-158]. A problem of current interest is learning interpretable representations with semantic meaning [159]. Many studies have learned factors of variation in the data by supervised learning [152, 160-163] or semi-supervised/unsupervised learning [164-167], referred to as disentangled representation. For supervised disentangled feature learning, adversarial networks are utilized to extract features that only contain discriminative information about a target task. For face recognition, Liu et al. [161] propose a disentangled representation by training an adversarial autoencoder to extract features that can capture identity discrimination and its complementary knowledge. In contrast, our proposed DebFace differs from prior work in that each branch of a multi-task network acts as both a generator and as a discriminator for the other branches (Fig. 4.1c).

4.3.2 Methodology

4.3.2.1 Algorithm Design

The proposed network takes advantage of the relationship between demographics and face identities. On one hand, demographic characteristics are highly correlated with face features. On the other hand, demographic attributes are heterogeneous in terms of data type and semantics [168]. A male person, for example, is not necessarily of a certain age or of a certain race. Accordingly, we present a framework that jointly generates demographic features and identity features from a single face image by considering both the aforementioned attribute correlation and attribute heterogeneity in a DNN.

Figure 4.2 Overview of the proposed de-biasing face (DebFace) network. DebFace is composed of three major blocks, i.e., a shared feature encoding block, a feature disentangling block, and a feature aggregation block. The solid arrows represent the forward inference, and the dashed arrows stand for adversarial training. During inference, either DebFace-ID (i.e., f_ID) or DemoID can be used for face matching, given the desired trade-off between biasness and accuracy.
While our main goal is to mitigate demographic bias in the face representation, we observe that demographic estimations are biased as well (see Fig. 4.5). How can we remove the bias of face recognition when the demographic estimations themselves are biased? Cook et al. [169] investigated this effect and found that the performance of face recognition is affected by multiple demographic covariates. We propose a de-biasing network, DebFace, that disentangles the representation into gender (DebFace-G), age (DebFace-A), race (DebFace-R), and identity (DebFace-ID) components, to decrease the bias of both face recognition and demographic estimation. Using adversarial learning, the proposed method is capable of jointly learning multiple discriminative representations while ensuring that each classifier cannot distinguish among classes through non-corresponding representations.

Though less biased, DebFace-ID loses demographic cues that are useful for identification. In particular, race and gender are two critical components that constitute face patterns. Hence, we desire to incorporate race and gender with DebFace-ID to obtain a more integrated face representation. We employ a light-weight fully-connected network to aggregate the representations into a face representation (DemoID) with the same dimensionality as DebFace-ID.

4.3.2.2 Network Architecture

Fig. 4.2 gives an overview of the proposed DebFace network. It consists of four components: the shared image-to-feature encoder E_Img; the four attribute classifiers (including gender C_G, age C_A, race C_R, and identity C_ID); the distribution classifier C_Distr; and the feature aggregation network E_Feat.
We assume access to N labeled training samples {(x^(i), y_g^(i), y_a^(i), y_r^(i), y_id^(i))}_{i=1}^N. Our approach takes an image x^(i) as the input of E_Img. The encoder projects x^(i) to its feature representation E_Img(x^(i)). The feature representation is then decoupled into four D-dimensional feature vectors: gender f_g^(i), age f_a^(i), race f_r^(i), and identity f_ID^(i), respectively. Next, each attribute classifier operates on the corresponding feature vector to correctly classify the target attribute by optimizing the parameters of both E_Img and the respective classifier C. For a demographic attribute with K categories, the learning objective L_C_Demo(x, y_Demo; E_Img, C_Demo) is the standard cross-entropy loss function. For the n-identity classification, we adopt AM-Softmax [170] as the objective function L_C_ID(x, y_id; E_Img, C_ID). To de-bias all of the feature representations, an adversarial loss L_Adv(x, y_Demo, y_id; E_Img, C_Demo, C_ID) is applied to the above four classifiers such that each of them will not be able to predict the correct labels when operating on irrelevant feature vectors. Specifically, given a classifier, the remaining three attribute feature vectors are imposed on it and attempt to mislead the classifier by only optimizing the parameters of E_Img. To further improve the disentanglement, we also reduce the mutual information among the attribute features by introducing a distribution classifier C_Distr. C_Distr is trained to identify whether an input representation is sampled from the joint distribution p(f_g, f_a, f_r, f_ID) or from the product of the marginal distributions p(f_g)p(f_a)p(f_r)p
(f_ID) via a binary cross-entropy loss L_C_Distr(x, y_Distr; E_Img, C_Distr), where y_Distr is the distribution label. Similar to the adversarial loss, a factorization objective function L_Fact(x, y_Distr; E_Img, C_Distr) is utilized to restrain C_Distr from distinguishing the real distribution, and thus minimizes the mutual information of the four attribute representations. Both the adversarial loss and the factorization loss are detailed in Sec. 4.3.2.3. Altogether, DebFace endeavors to minimize the joint loss:

L(x, y_Demo, y_id, y_Distr; E_Img, C_Demo, C_ID, C_Distr)
    = L_C_Demo(x, y_Demo; E_Img, C_Demo)
    + L_C_ID(x, y_id; E_Img, C_ID)
    + L_C_Distr(x, y_Distr; E_Img, C_Distr)
    + λ L_Adv(x, y_Demo, y_id; E_Img, C_Demo, C_ID)
    + ν L_Fact(x, y_Distr; E_Img, C_Distr),    (4.1)

where λ and ν are hyper-parameters determining how much the representation is decomposed and decorrelated in each training iteration.

The discriminative demographic features in DebFace-ID are weakened by removing demographic information. Fortunately, our de-biasing network preserves all pertinent demographic features in a disentangled way. Basically, we train another multilayer perceptron (MLP), E_Feat, to aggregate DebFace-ID and the demographic embeddings into a unified face representation, DemoID. Since age generally does not pertain to a person's identity, we only consider gender and race as the identity-informative attributes. The aggregated embedding, f_DemoID = E_Feat(f_ID, f_g, f_r), is supervised by an identity-based triplet loss:

L_E_Feat = (1/M) Σ_{i=1}^{M} [ ||f_DemoID_a^(i) − f_DemoID_p^(i)||_2^2 − ||f_DemoID_a^(i) − f_DemoID_n^(i)||_2^2 + α ]_+ ,    (4.2)

where {f_DemoID_a^(i), f_DemoID_p^(i), f_DemoID_n^(i)} is the i-th triplet consisting of an anchor, a positive, and a negative DemoID representation, M is the number of hard triplets in a mini-batch, [x]_+ = max(0, x), and α is the margin.
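The triplet loss of Eq. 4.2 can be sketched directly (a minimal numpy version; function and variable names are illustrative, not from the thesis):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.5):
    """Identity-based triplet loss as in Eq. 4.2 (hinge on squared distances).

    anchor/positive/negative: (M, d) arrays of DemoID embeddings for the
    M hard triplets in a mini-batch; alpha is the margin.
    """
    d_ap = np.sum((anchor - positive) ** 2, axis=1)   # ||f_a - f_p||_2^2
    d_an = np.sum((anchor - negative) ** 2, axis=1)   # ||f_a - f_n||_2^2
    return float(np.mean(np.maximum(d_ap - d_an + alpha, 0.0)))

a = np.array([[0.0, 0.0]])
p = np.array([[1.0, 0.0]])   # d_ap = 1
n = np.array([[2.0, 0.0]])   # d_an = 4
print(triplet_loss(a, p, n, alpha=0.5))   # max(1 - 4 + 0.5, 0) = 0.0
print(triplet_loss(a, p, n, alpha=4.0))   # max(1 - 4 + 4.0, 0) = 1.0
```

Triplets that already satisfy the margin (first call) contribute zero loss; only violating triplets (second call) produce a gradient.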
4.3.2.3 Adversarial Training and Disentanglement

As discussed in Sec. 4.3.2.2, the adversarial loss aims to minimize the task-independent information semantically, while the factorization loss strives to dwindle the interfering information statistically. We employ both losses to disentangle the representation extracted by E_Img. We introduce the adversarial loss as a means to learn a representation that is invariant in terms of certain attributes, such that a classifier trained on it cannot correctly classify those attributes using that representation. We take one of the attributes, e.g., gender, as an example to illustrate the adversarial objective. First of all, for a demographic representation f_Demo, we learn a gender classifier on f_Demo by optimizing the classification loss L_C_G(x, y_Demo; E_Img, C_G). Secondly, for the same gender classifier, we intend to maximize the chaos of the predicted distribution [171]. It is well known that a uniform distribution has the highest entropy and presents the most randomness. Hence, we train the classifier to predict a probability distribution as close as possible to a uniform distribution over the category space by minimizing the cross-entropy:

L_G_Adv(x, y_Demo, y_id; E_Img, C_G) = − Σ_{k=1}^{K_G} (1/K_G) ( log [ e^{C_G(f_Demo)_k} / Σ_{j=1}^{K_G} e^{C_G(f_Demo)_j} ] + log [ e^{C_G(f_ID)_k} / Σ_{j=1}^{K_G} e^{C_G(f_ID)_j} ] ),    (4.3)

where K_G is the number of gender categories,^1 and the ground-truth label is no longer a one-hot vector but a K_G-dimensional vector with all elements equal to 1/K_G. The above loss function corresponds to the dashed lines in Fig. 4.2. It strives for gender-invariance by finding a representation that makes the gender classifier C_G perform poorly. We minimize the adversarial loss by only updating the parameters in E_Img.

We further decorrelate the representations by reducing the mutual information across attributes.
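The uniform-target cross-entropy of Eq. 4.3 can be sketched for a single classifier head (a minimal numpy illustration, not the thesis's training code). Its minimum is log K, attained when the softmax output is exactly uniform; any confident prediction is penalized more.

```python
import numpy as np

def adversarial_uniform_loss(logits):
    """Cross-entropy between the classifier's softmax and a uniform target.

    Minimizing this (w.r.t. the encoder only) pushes the classifier's
    prediction on non-corresponding features toward 1/K per class, as in
    Eq. 4.3. logits: (K,) classifier outputs for one feature vector.
    """
    k = logits.shape[0]
    log_softmax = logits - np.log(np.sum(np.exp(logits)))
    return float(-np.sum(log_softmax) / k)

# a perfectly "confused" classifier attains the minimum, log(K)
uniform_logits = np.zeros(2)
print(np.isclose(adversarial_uniform_loss(uniform_logits), np.log(2)))  # True

# confident predictions are penalized more heavily
confident_logits = np.array([5.0, -5.0])
print(adversarial_uniform_loss(confident_logits) > np.log(2))           # True
```

In DebFace this term is summed over the non-corresponding feature vectors fed to each classifier, and only the encoder's parameters are updated by it.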
By definition, the mutual information is the relative entropy (KL divergence) between the joint distribution and the product distribution. To reduce the correlation, we add a distribution classifier C_Distr that is trained to perform a simple binary classification using L_{C_Distr}(x, y_Distr; E_Img, C_Distr) on samples f_Distr drawn from both the joint distribution and the product distribution. Similar to adversarial learning, we factorize the representations by tricking the classifier via the same samples, so that its predictions are close to random guesses:

$$\mathcal{L}_{Fact}(x, y_{Distr}; E_{Img}, C_{Distr}) = -\sum_{i=1}^{2}\frac{1}{2}\log\frac{e^{C_{Distr}(\mathbf{f}_{Distr})_i}}{\sum_{j=1}^{2} e^{C_{Distr}(\mathbf{f}_{Distr})_j}}. \quad (4.4)$$

In each mini-batch, we consider E_Img(x) as samples of the joint distribution p(f_g, f_a, f_r, f_ID). We randomly shuffle the feature vectors of each attribute across the mini-batch and re-concatenate them into 4 × 512-dim vectors, which are approximated as samples of the product distribution p(f_g)p(f_a)p(f_r)p(f_ID). During factorization, we only update E_Img to minimize the mutual information between the decomposed features.

¹In our case, K = 2, i.e., male and female.

Table 4.1 Statistics of training and testing datasets used in the paper.

Dataset | # of Images | # of Subjects | Gender | Age | Race | ID
CACD [172] | 163,446 | 2,000 | No | Yes | No | Yes
IMDB [173] | 460,723 | 20,284 | Yes | Yes | No | Yes
UTKFace [174] | 24,106 | - | Yes | Yes | Yes | No
AgeDB [175] | 16,488 | 567 | Yes | Yes | No | Yes
AFAD [176] | 165,515 | - | Yes | Yes | Yes^a | No
AAF [177] | 13,322 | 13,322 | Yes | Yes | No | Yes
FG-NET | 1,002 | 82 | No | Yes | No | Yes
RFW [4] | 665,807 | - | No | No | Yes | Partial
BUPT-Balancedface [5] | 1,251,430 | 28,000 | No | No | Yes | Yes
IMFDB-CVIT [178] | 34,512 | 100 | Yes | Age Groups | Yes* | Yes
Asian-DeepGlint [179] | 2,830,146 | 93,979 | No | No | Yes^a | Yes
MS-Celeb-1M [23] | 5,822,653 | 85,742 | No | No | No | Yes
PCSO [180] | 1,447,607 | 5,749 | Yes | Yes | Yes | Yes
LFW [27] | 13,233 | 5,749 | No | No | No | Yes
IJB-A [29] | 25,813 | 500 | Yes | Yes | Skin Tone | Yes
IJB-C [9] | 31,334 | 3,531 | Yes | Yes | Skin Tone | Yes

^a East Asian only. *Indian only. The last four columns indicate whether the dataset contains the corresponding label.
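The shuffling construction of the product-distribution samples can be sketched as follows (a NumPy illustration with made-up names; DebFace applies this per mini-batch):

```python
import numpy as np

def product_distribution_samples(f_g, f_a, f_r, f_id, rng=None):
    """Approximate samples of p(f_g)p(f_a)p(f_r)p(f_ID) by independently
    shuffling each attribute block across the mini-batch and re-concatenating.
    Each input is a (batch, d) array; the output is (batch, 4*d)."""
    rng = np.random.default_rng(rng)
    blocks = []
    for f in (f_g, f_a, f_r, f_id):
        idx = rng.permutation(f.shape[0])  # independent shuffle per attribute
        blocks.append(f[idx])
    return np.concatenate(blocks, axis=1)
```

Because each attribute block is permuted independently, the attribute sub-vectors of a re-concatenated sample come from different images, which breaks any dependence among them while preserving each marginal.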
4.3.3 Experiments

4.3.3.1 Datasets and Pre-processing

We utilize 15 total face datasets in this work for learning the demographic estimation models, the baseline face recognition model, and the DebFace model, as well as for their evaluation. To be specific, CACD [172], IMDB [173], UTKFace [174], AgeDB [175], AFAD [176], AAF [177], FG-NET [181], RFW [4], IMFDB-CVIT [178], Asian-DeepGlint [179], and PCSO [180] are the datasets for training and testing the demographic estimation models; MS-Celeb-1M [23], LFW [27], IJB-A [29], and IJB-C [9] are for learning and evaluating the face verification models. Tab. 4.1 reports the statistics of the training and testing datasets involved in all the experiments of both GAC and DebFace, including the total number of face images, the total number of subjects (identities), and whether the dataset contains annotations of gender, age, race, or identity (ID). All faces are detected by MTCNN [33]. Each face image is cropped and resized to 112 × 112 pixels using a similarity transformation based on the detected landmarks.

4.3.3.2 Implementation Details

DebFace is trained on a cleaned version of MS-Celeb-1M [6], using the ArcFace architecture [6] with 50 layers for the encoder E_Img. Since there are no demographic labels in MS-Celeb-1M, we first train three demographic attribute estimation models, for gender, age, and race, respectively. For age estimation, the model is trained on the combination of the CACD, IMDB, UTKFace, AgeDB, AFAD, and AAF datasets. The gender estimation model is trained on the same datasets except CACD, which contains no gender labels. We combine AFAD, RFW, IMFDB-CVIT, and PCSO for race estimation training. All three models use ResNet [50], with 34 layers for age and 18 layers for gender and race. We discuss the evaluation results of the demographic attribute estimation models in Sec. 4.5.

We predict the demographic labels of MS-Celeb-1M with the well-trained demographic models. Our DebFace is then trained on the re-labeled MS-Celeb-1M using SGD with a momentum of 0.9, a weight decay of 0.01, and a batch size of 256. The learning rate starts from 0.1 and drops to 0.0001 following the schedule at 8, 13, and 15 epochs. The dimensionality of the embedding layer of E_Img is 4 × 512, i.e.,
each attribute representation (gender, age, race, ID) is a 512-dim vector. We keep the hyper-parameter setting of AM-Softmax as in [6]: s = 64 and m = 0.5. The feature aggregation network E_Feat comprises two linear residual units with PReLU and BatchNorm in between. E_Feat is trained on MS-Celeb-1M by SGD with a learning rate of 0.01. The triplet loss margin α is 1.0. The disentangled features of gender, race, and identity are concatenated into a 3 × 512-dim vector, which is the input to E_Feat. The network is then trained to output a 512-dim representation for face recognition on biased datasets.

4.3.3.3 De-biasing Face Verification

Baseline: We compare DebFace-ID with a regular face representation model that has the same architecture as the shared feature encoder of DebFace. Referred to as BaseFace, this baseline model is also trained on MS-Celeb-1M, with a representation dimension of 512.

Figure 4.3 (a) BaseFace; (b) DebFace-ID. Face verification AUC (%) on each demographic cohort. The cohorts are chosen based on the three attributes, i.e., gender, age, and race. To fit the results into a 2D plot, we show the performance of males and females separately. Due to the limited number of face images in some cohorts, their results are gray cells.

To show the efficacy of DebFace-ID on bias mitigation, we evaluate the verification performance of DebFace-ID and BaseFace on faces from each demographic cohort. There are 48 total cohorts given the combination of demographic attributes, including 2 gender groups (male, female), 4 race groups³ (Black, White, East Asian, Indian), and 6 age groups (0-12, 13-18, 19-34, 35-44, 45-54, 55-100). We combine CACD, AgeDB, CVIT, and a subset of Asian-DeepGlint as the testing set. Overlapping identities among these datasets are removed. IMDB is excluded from the testing set due to its massive number of wrong ID labels. For datasets without certain demographic labels, we simply use the corresponding models to predict the labels. We report the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC). We define the degree of bias, termed biasness, as the standard deviation of performance across cohorts.
Fig. 4.3 shows the face verification results of BaseFace and DebFace-ID on each cohort. That is, for a particular face representation (e.g., DebFace-ID), we report its AUC on each cohort by putting the number in the corresponding cell. From these heatmaps, we observe that both DebFace-ID and BaseFace present bias in face verification: the performance on some cohorts is significantly worse, especially the cohorts of Indian females and elderly people. Compared to BaseFace, DebFace-ID exhibits less bias, and the differences in AUC are smaller, so its heatmap exhibits smoother edges. Fig. 4.4 shows the performance of face verification on 12 demographic cohorts. Both DebFace-ID and BaseFace present similar relative accuracies across cohorts. For example, both algorithms perform worse on the younger age cohorts than on adults, and the performance on the Indian cohort is significantly lower than on the other races. DebFace-ID decreases the bias by gaining discriminative face features for cohorts with fewer images, in spite of a reduction in performance on cohorts with more samples.

³To clarify, we consider two race groups, Black and White, and two ethnicity groups, East Asian and Indian. The word race denotes both race and ethnicity here.

Figure 4.4 (a) Gender; (b) Age; (c) Race. The overall face verification performance, AUC (%), on gender, age, and race.

4.3.3.4 De-biasing Demographic Attribute Estimation

Baseline: We further explore the bias of demographic attribute estimation and compare the demographic attribute classifiers of DebFace with baseline estimation models. We train three demographic estimation models, namely, gender estimation (BaseGender), age estimation (BaseAge), and race estimation (BaseRace), on the same training set as DebFace. For fairness, all three models have the same architecture as the shared layers of DebFace.

We combine the four datasets mentioned in Sec. 4.3.3.3 with IMDB as the global testing set.
As all demographic estimations are treated as classification problems, classification accuracy is used as the performance metric. As shown in Fig. 4.5, all demographic attribute estimations present significant bias. For gender estimation, both algorithms perform worse on the White and Black cohorts than on East Asian and Indian. In addition, the performance on young children is significantly worse than on adults. In general, the race estimation models perform better on the male cohort than on the female cohort. Compared to gender, race estimation shows higher bias in terms of age. Both the baseline methods and DebFace perform worse on cohorts aged between 13 and 44 than on other age groups.

Figure 4.5 (a) BaseGender; (b) DebFace-G; (c) BaseAge; (d) DebFace-A; (e) BaseRace; (f) DebFace-R. Classification accuracy (%) of demographic attribute estimation on faces of different cohorts, by DebFace and the baselines. For simplicity, we use DebFace-G, DebFace-A, and DebFace-R to denote the gender, age, and race classifiers of DebFace.

Table 4.2 Biasness of face recognition and demographic attribute estimation.

Method | Face Verification: All | Gender | Age | Race | Demographic Estimation: Gender | Age | Race
Baseline | 6.83 | 0.50 | 3.13 | 5.49 | 12.38 | 10.83 | 14.58
DebFace | 5.07 | 0.15 | 1.83 | 3.70 | 10.22 | 7.61 | 10.00

Similar to race estimation, age estimation also achieves better performance on males than on females. Moreover, the White cohort shows a dominant advantage over the other races in age estimation. In spite of the existing bias in demographic attribute estimation, the proposed DebFace is still able to mitigate the bias derived from algorithms. Compared to Figs. 4.5a, 4.5c, and 4.5e, the cells in Figs. 4.5b, 4.5d, and 4.5f present more uniform colors. We summarize the biasness of DebFace and the baseline models for both face recognition and demographic attribute estimation in Tab. 4.2. In general, we observe that DebFace substantially reduces the biasness for both tasks, and the reduction is larger for the task with larger biasness.
Figure 4.6 (a, c, e) BaseFace; (b, d, f) DebFace-ID. The distribution of face identity representations of BaseFace and DebFace. Both collections of feature vectors are extracted from images of the same dataset. Different colors and shapes represent different demographic attributes. Zoom in for details.

Figure 4.7 Reconstructed images using face and demographic representations. The first row shows the original face images. From the second row to the bottom, the face images are reconstructed from 2) BaseFace; 3) DebFace-ID; 4) DebFace-G; 5) DebFace-R; 6) DebFace-A. Zoom in for details.

4.3.3.5 Analysis of Disentanglement

We notice that DebFace still suffers from unequal performance across demographic groups. This is because there are latent variables besides the demographics, such as image quality or capture conditions, that could lead to biased performance. Such variables are difficult to control in pre-collected large face datasets. Within the framework of DebFace, the residual bias is also related to the degree of feature disentanglement: a full disentanglement is supposed to completely remove the demographic factors of bias. To illustrate the feature disentanglement of DebFace, we measure the demographic discriminative ability of the face representations by using these features to estimate gender, age, and race. Specifically, we first extract identity features of images from the testing set in Sec. 4.3.3.1 and split them into training and testing sets. Given the demographic labels, the face features are fed into a two-layer fully-connected network that learns to classify one of the demographic attributes. Tab. 4.3 reports the demographic classification accuracy on the testing set.
For all three demographic estimations, DebFace-ID yields much lower accuracies than BaseFace, indicating the decline of demographic information in DebFace-ID. We also plot the distribution of identity representations in the feature spaces of BaseFace and DebFace-ID. From the testing set in Sec. 4.3.3.3, we randomly select 50 subjects in each demographic group and one image per subject. BaseFace and DebFace-ID features are extracted from the selected image set and are then projected from 512-dim to 2-dim by t-SNE. Fig. 4.6 shows their t-SNE feature distributions. We observe that BaseFace presents clear demographic clusters, while the demographic clusters of DebFace-ID, as a result of disentanglement, mostly overlap with each other.

To visualize the disentangled feature representations of DebFace, we train decoders that reconstruct face images from the representations. Four face decoders are trained separately, one for each disentangled component, i.e., gender, age, race, and ID. In addition, we train another decoder to reconstruct faces from BaseFace for comparison. As shown in Fig. 4.7, both BaseFace and DebFace-ID maintain the identity features of the original faces, while DebFace-ID presents fewer demographic characteristics. No race or age, but only gender features can be observed on faces reconstructed from DebFace-G. Meanwhile, we can still recognize the race and age attributes on faces generated from DebFace-R and DebFace-A.

Table 4.3 Demographic classification accuracy (%) by face features.

Method | Gender | Race | Age
BaseFace | 95.27 | 89.82 | 78.14
DebFace-ID | 73.36 | 61.79 | 49.91

Table 4.4 Face verification accuracy (%) on the RFW dataset.

Figure 4.8 (a) BaseFace: Race; (b) DebFace-ID: Race; (c) BaseFace: Age; (d) DebFace-ID: Age. The percentage of falsely accepted cross-race or cross-age pairs at 1% FAR.
4.3.3.6 Face Verification on Public Testing Datasets

We report the performance of three different settings, using 1) BaseFace, the same baseline as in Sec. 4.3.3.3; 2) DebFace-ID; and 3) the fused representation DemoID. Table 4.5 reports face verification results on three public benchmarks: LFW, IJB-A, and IJB-C. On LFW, DemoID outperforms BaseFace while maintaining similar accuracy compared to SOTA algorithms. On IJB-A/C, DemoID outperforms all prior works except PFE [184]. Although DebFace-ID shows lower discrimination, its TAR at lower FARs on IJB-C is higher than that of BaseFace. To evaluate DebFace on a racially balanced testing dataset, RFW [4], and to compare with the work of [5], we train a DebFace model on the BUPT-Balancedface [5] dataset. The new model is trained to reduce racial bias by disentangling ID and race. Tab. 4.4 reports the verification results on RFW. While DebFace-ID gives a slightly lower face verification accuracy, it improves the biasness over [5].

We observe that DebFace-ID is less discriminative than BaseFace or DemoID, since demographics are essential components of face features. To understand the deterioration of DebFace, we analyze the effect of demographic heterogeneity on face verification by showing the tendency for a pair from two different demographic groups to produce a false accept error. For any two demographic cohorts, we count the falsely accepted pairs that are from different groups at 1% FAR. Fig. 4.8 shows the percentage of such falsely accepted demographic-heterogeneous pairs.

Table 4.5 Verification performance on LFW, IJB-A, and IJB-C.

Method | LFW (%) || Method | IJB-A (%) TAR @ 0.1% FAR | IJB-C TAR (%) @ 0.001% FAR | @ 0.01% FAR | @ 0.1% FAR
DeepFace+ [17] | 97.35 || Yin et al. [182] | 73.9 ± 4.2 | - | - | 69.3
CosFace [19] | 99.73 || Cao et al. [59] | 90.4 ± 1.4 | 74.7 | 84.0 | 91.0
ArcFace [6] | 99.83 || Multicolumn [183] | 92.0 ± 1.3 | 77.1 | 86.2 | 92.7
PFE [184] | 99.82 || PFE [184] | 95.3 ± 0.9 | 89.6 | 93.3 | 95.5
BaseFace | 99.38 || BaseFace | 90.2 ± 1.1 | 80.2 | 88.0 | 92.9
DebFace-ID | 98.97 || DebFace-ID | 87.6 ± 0.9 | 82.0 | 88.1 | 89.5
DemoID | 99.50 || DemoID | 92.2 ± 0.8 | 83.2 | 89.4 | 92.9
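The cross-demographic false-accept analysis behind Fig. 4.8 can be sketched as follows (NumPy, illustrative names; the exact thresholding convention is our assumption):

```python
import numpy as np

def cross_group_false_accept_rate(scores, same_id, groups, far=0.01):
    """Among impostor pairs falsely accepted at the given FAR, return the
    fraction whose two faces come from different demographic groups.
    `scores` are pair similarities, `same_id` marks genuine pairs, and
    `groups` is a (pairs, 2) array with the demographic label of each face."""
    impostor = ~same_id
    # choose the threshold so that `far` of impostor pairs score above it
    thr = np.quantile(scores[impostor], 1.0 - far)
    accepted = impostor & (scores > thr)
    if accepted.sum() == 0:
        return 0.0
    cross = groups[accepted, 0] != groups[accepted, 1]
    return float(cross.mean())
```

Running this per pair of cohorts (rather than globally) would give the per-cell percentages plotted in Fig. 4.8.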
Compared to BaseFace, DebFace exhibits more cross-demographic pairs that are falsely accepted, resulting in a performance decline on demographically biased datasets. Due to the reduction of demographic information, DebFace-ID is more susceptible to errors between demographic groups. In the sense of de-biasing, it is preferable to decouple demographic information from the identity features. However, if we prefer to maintain the overall performance across all demographics, we can still aggregate all the relevant information. It is an application-dependent trade-off between accuracy and de-biasing. DebFace balances this trade-off by generating both de-biased identity and de-biased demographic representations, which may be aggregated into DemoID if bias is less of a concern.

4.3.3.7 Distributions of Scores

We follow the work of [21], which investigates the effect of demographic homogeneity and heterogeneity on face recognition. We first randomly select images from the CACD, AgeDB, CVIT, and Asian-DeepGlint datasets, and extract the corresponding feature vectors using the BaseFace and DebFace models, respectively. Given their demographic attributes, we put those images into separate groups depending on whether their gender, age, and race are the same or different.

Figure 4.9 (a) BaseFace; (b) DebFace. Distributions of the similarity scores of the imposter pairs across homogeneous versus heterogeneous gender, age, and race categories, for BaseFace and DebFace.

For each group, a
fixed false alarm rate (the percentage of face pairs from different subjects that are falsely verified as being from the same subject) is set to 1%. Among the falsely verified pairs, we plot the top 10th percentile scores of the negative face pairs (pairs of face images from different subjects) given their demographic attributes. As shown in Fig. 4.9a and Fig. 4.9b, we observe that the similarities of DebFace are higher than those of BaseFace. One possible reason is that the demographic information is disentangled from the identity features of DebFace, increasing the overall pair-wise similarities between faces of different identities. In terms of de-biasing, DebFace also exhibits smaller differences in the score distributions with respect to the homogeneity and heterogeneity of demographics.

4.4 Mitigating Face Recognition Bias via Group Adaptive Classifier

In spite of the effectiveness of DebFace in mitigating demographic bias, it degrades the overall recognition performance as well. This motivates us to find another solution to this problem, such that the biasness can be reduced without impairing the average recognition performance.

Figure 4.10 (a) Our proposed group adaptive classifier (GAC) automatically chooses between a non-adaptive and an adaptive layer in a multi-layer network, where the latter uses demographic-group-specific kernels and attention. (b) Bias (1.11 → 0.60): compared to the baseline with the 50-layer ArcFace backbone, GAC improves face verification accuracy in most groups of the RFW dataset [4], especially under-represented groups, leading to mitigated FR bias. GAC reduces the biasness from 1.11 to 0.60.

In this section,
we introduce our second approach to mitigating face recognition bias, the group adaptive classifier (GAC). The main idea of GAC is to optimize the face representation learning for every demographic group in a single network, despite demographically imbalanced training data. Conceptually, we may categorize face features into two types of patterns: a general pattern that is shared by all faces, and a differential pattern that is relevant to demographic attributes. When the differential pattern of one specific demographic group dominates the training data, the network learns to predict identities mainly based on that pattern, since it is more convenient for minimizing the loss than using the other patterns; this biases the network towards faces of that specific group. One mitigation is to give the network more capacity to broaden its scope over the multiple face patterns of the different demographic groups. An unbiased FR model shall rely not only on the unique patterns for recognizing the different groups, but also on the general patterns of all faces for improved generalizability. Accordingly, as shown in Fig. 4.10, we propose GAC to explicitly learn these different feature patterns. GAC includes two modules: the adaptive layer and the automation module. The adaptive layer in GAC comprises adaptive convolution kernels and channel-wise attention maps, where each kernel and attention map tackles faces in one demographic group. We also introduce a new objective function to GAC, which diminishes the variation of the average intra-class distance between demographic groups.

Prior work on dynamic CNNs introduces adaptive convolutions either in every layer [185-187] or in manually specified layers [188-190]. In contrast, this work proposes an automation module to choose which layers to apply the adaptations to. As we observed, not all convolutional layers require adaptive kernels for bias mitigation (see Fig. 4.15a). At any layer of GAC, only kernels expressing high dissimilarity are considered demographic-adaptive kernels. For those with low dissimilarity, their average kernel is shared by all input images at that layer. Thus, the proposed network progressively learns to select the optimal structure for demographic-adaptive learning, so that both non-adaptive layers with shared kernels and adaptive layers are jointly learned in a unified network.
Contributions of this work are summarized as: 1) a new face recognition algorithm that reduces demographic bias and increases the robustness of representations for faces in every demographic group by adopting adaptive convolution and attention techniques; 2) a new adaptation mechanism that automatically determines the layers in which to employ dynamic kernels and attention maps; 3) the proposed method achieves SOTA performance on a demographically balanced dataset and three benchmarks.

4.4.1 Adaptive Neural Networks

Since the main technique applied by GAC is adaptive neural networks, we first review recent work related to adaptive learning. Three types of CNN-based adaptive learning techniques are related to our work: adaptive architectures, adaptive kernels, and attention mechanisms. Adaptive architectures design new performance-based neural functions or structures, e.g., neuron-selection hidden layers [191] and automatic CNN expansion for FR [192]. As CNNs advance many AI fields, prior works propose dynamic kernels to realize content-adaptive convolutions. Li et al. [193] propose a shape-driven kernel for facial trait recognition where each landmark-centered patch has a unique kernel. A convolution fusion for graph neural networks is introduced by [194], where a set of varying-size filters is used per layer. The works of [195] and [196] use a kernel selection scheme to automatically adjust the receptive field size based on the inputs. To better suit the input data, [197] splits the training data into clusters and learns an exclusive kernel per cluster.

Figure 4.11 (a) Adaptive Kernel; (b) Attention Map; (c) GAC. A comparison of approaches in adaptive CNNs.

Li et al. [198] introduce an adaptive CNN for object detection that transfers pre-trained CNNs to a target domain by selecting
useful kernels per layer. Alternatively, one may feed input images or features into a kernel function to dynamically generate convolution kernels [199-202]. Despite its effectiveness, such individual adaptation may not be suitable given the diversity of faces in demographic groups. Our work is most related to the side-information adaptive convolution [185], where in each layer a sub-network takes auxiliary information as input to generate the filter weights. We mainly differ in that GAC automatically learns where to use adaptive kernels in a multi-layer CNN (see Figs. 4.11a and 4.11c), and is thus more efficient and more capable of being applied to a deeper CNN.

As the human perception process naturally selects the most pertinent piece of information, attention mechanisms have been designed for a variety of tasks, e.g., detection [203], recognition [204], image captioning [205], tracking [206], pose estimation [190], and segmentation [188]. Typically, attention weights are estimated by feeding images or feature maps into a shared network, composed of convolutional and pooling layers [204, 207-209] or a multi-layer perceptron (MLP) [210-213]. Apart from feature-based attention, Hou et al. [189] propose a correlation-guided cross-attention map for few-shot classification, where the correlation between the class feature and the query feature generates the attention weights. The work of [186] introduces a cross-channel communication block to encourage information exchange across channels at the convolutional layer. To accelerate the channel interaction, Wang et al. [187] propose a 1D convolution across channels for attention prediction. Different from prior work, our attention maps are constructed from demographic information (see Figs. 4.11b and 4.11c), which improves the robustness of face representations in every demographic group.

4.4.2 Methodology

4.4.2.1 Overview

Our goal is to train an FR network that is impartial to individuals in different demographic groups. Unlike image-related variations, e.g.,
large-pose or low-resolution faces, which are harder to recognize, demographic attributes are subject-related properties with no apparent impact on the recognizability of identity, at least from a layman's perspective. Thus, an unbiased FR system should be able to obtain equally salient features for faces across demographic groups. However, due to imbalanced demographic distributions and inherent facial differences between groups, it was shown that certain groups achieve higher performance even with hand-crafted features [14]. Thus, it is impractical to extract features from different demographic groups that exhibit equal discriminability. Despite such disparity, an FR algorithm can still be designed to mitigate the difference in performance.

To this end, we propose a CNN-based group adaptive classifier that utilizes dynamic kernels and attention maps to boost FR performance in all the demographic groups considered here. Specifically, GAC has two main modules, an adaptive layer and an automation module. In an adaptive layer, face images or feature maps are convolved with a unique kernel for each demographic group, and multiplied with adaptive attention maps to obtain demographic-differential features for faces in a certain group. The automation module determines in which layers of the network the adaptive kernels and attention maps should be applied. As shown in Fig. 4.12, given an aligned face and its identity label y_ID, a pre-trained demographic classifier first estimates its demographic attribute y_Demo. With y_Demo, the image is then fed into a recognition network with multiple demographic adaptive layers to estimate its identity. In the following, we present these two modules.

Figure 4.12 Overview of the proposed GAC for mitigating FR bias. GAC contains two major modules: the adaptive layer and the automation module. The adaptive layer consists of adaptive kernels and attention maps. The automation module is employed to decide whether a layer should be adaptive or not.

4.4.2.2 Adaptive Layer

Adaptive Convolution. For a standard convolution in a CNN, an image or feature map from the previous layer X ∈ R^{c×h_X×w_X} is convolved with a single kernel matrix K ∈ R^{k×c×h×w}, where c is the number of input channels, k the number of filters, h_X and w_X the input size, and h and w the
filter size. Such an operation shares the kernel with every input going through the layer, and is thus agnostic to demographic content, resulting in limited capacity to represent minority groups. To mitigate the bias in convolution, we introduce a trainable matrix of kernel masks M ∈ R^{n×c×h×w}, where n is the number of demographic groups. In the forward pass, the demographic label y_Demo and the kernel mask matrix M are fed into the adaptive convolutional layer to generate demographic-adaptive filters. Let K_i ∈ R^{c×h×w} denote the i-th channel of the shared filter. The i-th channel of the adaptive filter for group y_Demo is:

$$\mathbf{K}_i^{y_{Demo}} = \mathbf{K}_i \odot \mathbf{M}^{y_{Demo}}, \quad (4.5)$$

where M^{y_Demo} ∈ R^{c×h×w} is the kernel mask of group y_Demo, and ⊙ denotes element-wise multiplication. The i-th channel of the output feature map is then Z_i = f(X * K_i^{y_Demo}), where * denotes convolution and f(·) is the activation. Unlike a conventional convolution, samples in every demographic group have a unique kernel K^{y_Demo}.

Adaptive Attention. Each channel filter in a CNN plays an important role in every dimension of the final representation, and can be viewed as a semantic pattern detector [205]. In the adaptive convolution, however, the values of a kernel mask are broadcast along the channel dimension, meaning that the weight selection is spatially varied but channel-wise joint. Hence, we introduce a channel-wise attention mechanism to enhance the face features that are demographic-adaptive. First, a trainable matrix of channel attention maps A ∈ R^{n×k} is initialized in every adaptive attention layer. Given y_Demo and the current feature map Z ∈ R^{k×h_Z×w_Z}, where h_Z and w_Z are the height and width of Z, the i-th channel of the new feature map is calculated by:

$$\mathbf{Z}_i^{y_{Demo}} = \mathrm{Sigmoid}(A_{y_{Demo},i}) \cdot \mathbf{Z}_i, \quad (4.6)$$

where A_{y_Demo,i} is the entry in the y_Demo-th row and i-th column of A, for demographic group y_Demo. In contrast to the adaptive convolution, the elements of each demographic attention map A_{y_Demo} diverge in a channel-wise manner, while the single attention weight A_{y_Demo,i} is spatially shared by the entire matrix Z_i ∈ R^{h_Z×w_Z}. The two adaptive matrices, M and A, are jointly tuned with all the other parameters, supervised by the classification loss.
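Eqs. (4.5) and (4.6) amount to a per-group mask on the shared kernel and a per-group, per-channel gate on the feature map; a minimal NumPy sketch (shapes follow the text, function names are ours):

```python
import numpy as np

def adapt_kernel(shared_kernel, kernel_masks, y_demo):
    """Demographic-adaptive filter of Eq. (4.5): element-wise product of the
    shared kernel (k, c, h, w) with the mask (c, h, w) of group `y_demo`.
    The masks are trainable in GAC; here they are plain arrays."""
    return shared_kernel * kernel_masks[y_demo]  # broadcast over the k filters

def adapt_attention(feature_map, attention_maps, y_demo):
    """Channel-wise adaptive attention of Eq. (4.6): each of the k channels of
    the feature map (k, h, w) is scaled by a sigmoid-squashed per-group,
    per-channel weight from the (n, k) attention matrix."""
    weights = 1.0 / (1.0 + np.exp(-attention_maps[y_demo]))  # (k,)
    return feature_map * weights[:, None, None]
```

Note how the mask varies spatially but is shared across channels, while the attention weight varies per channel but is shared spatially, matching the complementary roles described above.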
Unlike dynamic CNNs [185], where additional networks are engaged to produce input-variant kernels or attention maps, our adaptiveness is yielded by a simple thresholding function directly indexed by the demographic group, with no auxiliary networks. Although the kernel network in [185] can generate continuous kernels without enlarging the parameter space, further encoding is required if the side inputs to the kernel network are discrete variables. Our approach, in contrast, divides the kernels into clusters so that the branch parameter learning can stick to a specific group without interference from individual uncertainties, making it suitable for discrete domain adaptation. Further, the adaptive kernel masks in GAC are more efficient in terms of the number of additional parameters. Compared to a non-adaptive layer, the number of additional parameters of GAC is n × c × h × w, while that of [185] is s × k × c × h × w if the kernel network is a one-layer MLP, where s is the dimension of the input side information. Thus, for one adaptive layer, [185] has s × k / n times more parameters than ours, which can be substantial given the typically large value of k, the number of filters.

4.4.2.3 Automation Module

Though faces in different demographic groups are adaptively processed by various kernels and attention maps, it is inefficient to use such adaptations in every layer of a deep CNN. To relieve the burden of unnecessary parameters and to avoid empirical trimming, we adopt a similarity fusion process to automatically determine the adaptive layers. Since the same fusion scheme can be applied to both types of adaptation, we take the adaptive convolution as an example to illustrate this automatic scheme.
First, a matrix composed of n kernel masks is initialized in every convolutional layer. As training continues, each kernel mask is updated independently to reduce the classification loss for its demographic group. Second, we reshape the kernel masks into 1D vectors V = [v_1, v_2, ..., v_n], where v_i ∈ R^l, l = c × h × w, is the kernel mask of the i-th demographic group. Next, we compute the Cosine similarity between every two kernel vectors,

$$\theta_{ij} = \frac{\mathbf{v}_i}{\|\mathbf{v}_i\|} \cdot \frac{\mathbf{v}_j}{\|\mathbf{v}_j\|}, \quad 1 \le i, j \le n,$$

and the average of all pair-wise similarities, $\bar{\theta} = \frac{2}{n(n-1)}\sum_{i<j}\theta_{ij}$. If the dissimilarity 1 − θ̄ is lower than a pre-defined threshold τ, the kernel parameters in this layer reveal a demographic-agnostic property; hence, we merge the n kernels into a single kernel by averaging along the group dimension. By this definition, a lower τ implies more adaptive layers: given the array of per-layer values {1 − θ̄_i} for the t convolutional layers, sorted from smallest to largest, the layers whose 1 − θ̄_i values are larger than τ remain adaptive, so when τ decreases, more layers are adaptive. In the subsequent training, the merged kernel can still be updated separately for each demographic group, as the kernel may become demographic-adaptive again in later epochs. We monitor the similarity trend of the adaptive kernels in each layer until θ̄ is stable.
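The per-layer similarity-fusion check of the automation module can be sketched as follows (NumPy; writing the dissimilarity as 1 − θ̄ is our assumption, consistent with the text):

```python
import numpy as np

def should_stay_adaptive(kernel_masks, tau=0.2):
    """Automation-module check for one layer: flatten the n kernel masks into
    vectors, average the pairwise cosine similarity theta_bar, and keep the
    layer adaptive only if the dissimilarity (1 - theta_bar) exceeds tau;
    otherwise the n kernels would be merged into their average."""
    V = kernel_masks.reshape(kernel_masks.shape[0], -1)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    sims = V @ V.T
    n = V.shape[0]
    theta_bar = (sims.sum() - n) / (n * (n - 1))  # mean over the i != j pairs
    return (1.0 - theta_bar) > tau
```

Identical masks give θ̄ = 1 (merge), while mutually orthogonal masks give θ̄ = 0 (stay adaptive), which is the intuition behind the threshold.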
4.4.2.4 De-biasing Objective Function

Apart from the objective function for face identity classification, we also adopt a regression loss function to narrow the gap in intra-class distance between the demographic groups. Let g(·) denote the inference function of GAC, and I_ijg the i-th image of subject j in group g. The feature representation of image I_ijg is then r_ijg = g(I_ijg, w), where w denotes the GAC parameters. Assuming the feature distribution of each subject is a Gaussian with identity covariance matrix (a hyper-sphere), we utilize the average Euclidean distance to the subject center as the intra-class distance of each subject. In particular, we first compute the center point of each identity-sphere:

$$\boldsymbol{\mu}_{jg} = \frac{1}{N}\sum_{i=1}^{N} g(I_{ijg}, \mathbf{w}), \quad (4.7)$$

where N is the total number of face images of subject j. The average intra-class distance of subject j is:

$$\mathrm{Dist}_{jg} = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{r}_{ijg} - \boldsymbol{\mu}_{jg})^{T} (\mathbf{r}_{ijg} - \boldsymbol{\mu}_{jg}). \quad (4.8)$$

We then compute the intra-class distance over all subjects in group g as Dist_g = (1/Q) Σ_{j=1}^{Q} Dist_jg, where Q is the total number of subjects in group g. This allows us to lower the difference in intra-class distance by:

$$\mathcal{L}_{bias} = \frac{\lambda}{Qn}\sum_{g=1}^{n}\sum_{j=1}^{Q}\left|\mathrm{Dist}_{jg} - \frac{1}{n}\sum_{g'=1}^{n}\mathrm{Dist}_{g'}\right|, \quad (4.9)$$

where λ is the coefficient of the de-biasing objective.

4.4.3 Experiments

Datasets: Our bias study uses the RFW dataset [4] for testing and the BUPT-Balancedface dataset [5] for training. RFW consists of faces in four race/ethnic groups: White, Black, East Asian, and South Asian⁴. Each group contains ~10K images of 3K individuals for face verification. BUPT-Balancedface contains 1.3M images of 28K celebrities and is approximately race-balanced, with 7K identities per race. Other than race, we also study gender bias. We combine IMDB [173], UTKFace [174], AgeDB [175], AAF [177], and AFAD [176] to train a gender classifier, which estimates the gender of the faces in RFW and BUPT-Balancedface. The statistics of these datasets are reported in Tab. 4.1. All face images are cropped and resized to 112 × 112 pixels via landmarks detected by RetinaFace [34].
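Eqs. (4.7)-(4.9) can be sketched end to end as follows (NumPy; the absolute-value penalty on the deviation in Eq. (4.9) is our reading of the reconstructed formula, and all names are illustrative):

```python
import numpy as np

def debias_loss(features, subjects, groups, lam=0.1):
    """Intra-class-distance de-biasing loss of Eqs. (4.7)-(4.9): penalize each
    subject's average squared distance to its class center (Eq. 4.8, with the
    center from Eq. 4.7) for deviating from the mean intra-class distance over
    groups. `features` is (num_imgs, d); `subjects`/`groups` are labels."""
    subj_dist, subj_group = [], []
    for s in np.unique(subjects):
        idx = subjects == s
        center = features[idx].mean(axis=0)                          # Eq. (4.7)
        subj_dist.append(np.mean(np.sum((features[idx] - center) ** 2, axis=1)))  # Eq. (4.8)
        subj_group.append(groups[idx][0])
    subj_dist, subj_group = np.array(subj_dist), np.array(subj_group)
    group_means = np.array([subj_dist[subj_group == g].mean()
                            for g in np.unique(subj_group)])
    target = group_means.mean()                  # (1/n) sum_g Dist_g
    return lam * np.mean(np.abs(subj_dist - target))                 # Eq. (4.9)
```

The loss is zero exactly when every subject's intra-class spread equals the cross-group average, which is the equalization the objective is after.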
⁴RFW [4] uses Caucasian, African, Asian, and Indian to name its demographic groups. We adopt these groups and rename them to White, Black, East Asian, and South Asian for a clearer race/ethnicity definition.

Implementation Details: We train a baseline network and GAC on BUPT-Balancedface, using the 50-layer ArcFace architecture [6]. The classification loss is the additive Cosine margin of CosFace [19], with scale and margin s = 64 and m = 0.5. Training is optimized by SGD with a batch size of 256. The learning rate starts from 0.1 and drops to 0.0001 following the schedule at 8, 13, and 15 epochs for the baseline, and 5, 17, and 19 epochs for GAC. We set λ = 0.1 for the intra-distance de-biasing. τ = 0.2 is chosen for the automatic adaptation in GAC. Our FR models are trained to extract a 512-dim representation. Our demographic classifier uses an 18-layer ResNet [50]. Comparing GAC and the baseline, the average feature extraction speed per image on an Nvidia 1080Ti GPU is 1.4 ms and 1.1 ms, and the number of model parameters is 44.0M and 43.6M, respectively.

Performance Metrics: Common group fairness criteria such as the demographic parity distance [150] are improper for evaluating the fairness of learnt representations, since they are designed to measure independence properties of random variables; in FR, however, the sensitive demographic characteristics are tied to identities, making these two variables correlated. The NIST report uses the false negatives and false positives of each demographic group to measure fairness [7]. Instead of plotting false negatives vs. false positives, we adopt a compact quantitative metric, i.e., the standard deviation (STD) of the performance in different demographic groups, previously introduced in [5, 74] and called biasness. As bias is considered a systematic error of the estimated values compared to the actual values, here we assume the average performance to be the actual value. For each demographic group, its biasness is the error between the average and the performance on that demographic group. The overall biasness is the expectation of all group errors, which is the STD of performance across groups. We also report the average accuracy (Avg) to show the overall FR performance.
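The biasness metric as defined here is just the (population) standard deviation of per-group accuracy around the mean; as a sketch:

```python
import numpy as np

def biasness(group_accuracies):
    """Return (average accuracy, biasness), where biasness is the population
    standard deviation of the per-group performance around its mean."""
    accs = np.asarray(group_accuracies, dtype=float)
    return float(accs.mean()), float(accs.std())  # np.std defaults to ddof=0
```

For instance, the GAC row of Tab. 4.6 (96.20, 94.77, 94.87, 94.98) yields Avg ≈ 95.21 and STD ≈ 0.58, matching the table; note the sample STD (ddof = 1) would give ≈ 0.67 instead, so the population form is the one used.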
4.4.3.1 Results on RFW Protocol

We follow the RFW face verification protocol with 6K pairs per race/ethnicity. The models are trained on BUPT-Balancedface with ground-truth race and identity labels.

Compare with SOTA. We compare GAC with four SOTA algorithms on the RFW protocol, namely ACNN [185], RL-RBN [5], PFE [184], and DebFace [74]. Since the approach in ACNN [185] is related to GAC, we re-implement it and apply it to the bias mitigation problem. First, we train a race classifier with the cross-entropy loss on BUPT-Balancedface. The softmax output of our race classifier is then fed to a filter manifold network (FMN) to generate adaptive filter weights; here, FMN is a two-layer MLP with a ReLU in between. Similar to GAC, race probabilities are treated as auxiliary information for face representation learning. We also compare with the SOTA approach PFE [184] by training it on BUPT-Balancedface. As shown in Tab. 4.6, GAC is superior to SOTA w.r.t. average performance and feature fairness. Compared to the kernel masks in GAC, the FMN in ACNN [185] contains more trainable parameters, and applying it to each convolutional layer is prone to overfitting. In fact, the layers that are adaptive in GAC ($\tau = 0.2$) are set to be the FMN-based convolutions in ACNN. As the race data is a four-element input in our case, using extra kernel networks adds complexity to the FR network, which degrades verification performance. Even though PFE performs best on standard benchmarks (Tab. 4.15), it still exhibits high biasness.

Table 4.6 Performance comparison with SOTA on the RFW protocol [4]. The results marked by (*) are directly copied from [5].

Method         White  Black  East Asian  South Asian  Avg(↑)  STD(↓)
RL-RBN [5]     96.27  95.00  94.82       94.68        95.19   0.63
ACNN [185]     96.12  94.00  93.67       94.55        94.58   0.94
PFE [184]      96.38  95.17  94.27       94.60        95.11   0.93
ArcFace [6]    96.18  94.67  93.72       93.98        94.64   0.96
CosFace [19]   95.12  93.93  92.98       92.93        93.74   0.89
DebFace [74]   95.95  93.67  94.33       94.78        94.68   0.83
GAC            96.20  94.77  94.87       94.98        95.21   0.58
Our GAC outperforms PFE on RFW in both biasness and average performance. Compared to DebFace [74], in which demographic attributes are disentangled from the identity representation, GAC achieves higher verification performance by optimizing the classification for each demographic group, with lower biasness as well.

To further demonstrate the superiority of GAC over the baseline model in terms of bias, we plot Receiver Operating Characteristic (ROC) curves showing the True Acceptance Rate (TAR) at various values of the False Acceptance Rate (FAR). Fig. 4.13 (ROC of (a) the baseline and (b) GAC evaluated on all pairs of RFW) shows the ROC performance of GAC and the baseline model on RFW. The curves of the demographic groups generated by GAC exhibit smaller gaps in TAR at every FAR, which demonstrates the de-biasing capability of GAC. Fig. 4.14 shows pairs of false positives (two faces falsely verified as the same identity) and false negatives in the RFW dataset.

Ablation on Adaptive Strategies. To investigate the efficacy of our network design, we conduct three ablation studies: adaptive mechanisms, number of convolutional layers, and demographic information. For adaptive mechanisms, since deep feature maps contain both spatial and channel-wise information, we study the relationship among adaptive kernels, spatial and channel-wise attention, and their impact on bias mitigation. We also study the impact of $\tau$ in our automation module. Apart from the baseline and GAC, we ablate eight variants: (1) GAC-Channel: channel-wise attention for race-differential features; (2) GAC-Kernel: adaptive convolution with race-specific kernels; (3) GAC-Spatial: only spatial attention added to the baseline; (4) GAC-CS: both channel-wise and spatial attention; (5) GAC-CSK: adaptive convolution combined with spatial and channel-wise attention; (6, 7, 8) GAC-($\tau$): $\tau$ set to 0, 0.1, and 0.2, respectively.
From Tab. 4.7 we make several observations: (1) The baseline model is the most biased across race groups. (2) Spatial attention mitigates race bias at the cost of verification accuracy, and is less effective at learning fair features than the other adaptive techniques. This is probably because spatial contents, especially local layout information, reside only at earlier CNN layers, where the spatial dimensions are gradually reduced by later convolutions and poolings; semantic details such as demographic attributes are therefore hardly encoded spatially. (3) Compared to GAC, combining adaptive kernels with both spatial and channel-wise attention increases the number of parameters, lowering the performance. (4) As $\tau$ determines the number of adaptive layers in GAC, it has a great impact on performance: a small $\tau$ may introduce redundant adaptive layers, while the adaptation layers may lack capacity if $\tau$ is too large.

(Figure 4.14: 8 false positive and false negative pairs on RFW given by the baseline but successfully verified by GAC.)

Table 4.7 Ablation of adaptive strategies on the RFW protocol [4].

Method        White  Black  East Asian  South Asian  Avg(↑)  STD(↓)
Baseline      96.18  93.98  93.72       94.67        94.64   1.11
GAC-Channel   95.95  93.67  94.33       94.78        94.68   0.83
GAC-Kernel    96.23  94.40  94.27       94.80        94.93   0.78
GAC-Spatial   95.97  93.20  93.67       93.93        94.19   1.06
GAC-CS        96.22  93.95  94.32       95.12        94.65   0.87
GAC-CSK       96.18  93.58  94.28       94.83        94.72   0.95
GAC-(τ=0)     96.18  93.97  93.88       94.77        94.70   0.92
GAC-(τ=0.1)   96.25  94.25  94.83       94.72        95.01   0.75
GAC-(τ=0.2)   96.20  94.77  94.87       94.98        95.21   0.58

Ablation on Depths and Demographic Labels. Both the adaptive layers and the de-biasing loss in GAC can be applied to a CNN of any depth. In this ablation, we train both the baseline and GAC ($\lambda = 0.1$, $\tau = 0.2$) in the ArcFace architecture with three different numbers of layers: 34, 50, and 100. As the training of GAC relies on demographic information, errors and bias in the demographic labels may affect the bias reduction of GAC. Thus, we also ablate with different demographic information: (1) ground-truth: the race/ethnicity labels provided by RFW; (2) estimated: the labels predicted by a pre-trained race estimation model; (3) random: a demographic label randomly assigned to each face.

Table 4.8 Ablation of CNN depths and demographics on the RFW protocol [4].

Method            White  Black  East Asian  South Asian  Avg(↑)  STD(↓)
Number of Layers
ArcFace-34        96.13  93.15  92.85       93.03        93.78   1.36
GAC-ArcFace-34    96.02  94.12  94.10       94.22        94.62   0.81
ArcFace-50        96.18  93.98  93.72       94.67        94.64   1.11
GAC-ArcFace-50    96.20  94.77  94.87       94.98        95.21   0.58
ArcFace-100       96.23  93.83  94.27       94.80        94.78   0.91
GAC-ArcFace-100   96.43  94.53  94.90       95.03        95.22   0.72
Race/Ethnicity Labels
Ground-truth      96.20  94.77  94.87       94.98        95.21   0.58
Estimated         96.27  94.40  94.32       94.77        94.94   0.79
Random            95.95  93.10  94.18       94.82        94.50   1.03

(Figure 4.15: (a) For each of three values of $\tau$ in automatic adaptation, we show the average similarities of pair-wise demographic kernel masks, i.e., $\theta$, at layers 1-48 (y-axis) over training steps 1-15 (x-axis). The number of adaptive layers in the three cases, i.e., $\sum_{1}^{48} \mathbb{1}(\theta > \tau)$ at the 15th step, is 12, 8, and 2, respectively. (b) With two race groups (White and Black in PCSO [14]) and two models (baseline and GAC), for each of the four combinations we compute the pair-wise correlation of face representations between any two of 1K subjects of the same race and plot the histogram of correlations. GAC reduces the difference/bias between the two distributions.)

As shown in Tab. 4.8, compared to the baselines, GAC successfully reduces the STD at every number of layers. The model with the fewest layers presents the most bias, and the bias reduction by GAC is largest there as well. Noise and bias in the demographic labels do, however, impair the performance of GAC: with estimated demographics, the biasness is higher than that of the model with ground-truth supervision, and the model trained with random demographics has the highest biasness. Even so, using estimated attributes during testing still improves fairness in face recognition compared to the baseline, indicating the efficacy of GAC even in the absence of ground-truth labels.

Ablation on $\lambda$. We use $\lambda$ to control the weight of the de-biasing loss. Tab. 4.9 reports the results of GAC trained with different values of $\lambda$; when $\lambda = 0$, the de-biasing loss is removed from training. The results indicate that a larger $\lambda$ leads to lower biasness at the cost of overall accuracy.

Table 4.9 Ablation on $\lambda$ on the RFW protocol (%).

λ     White  Black  East Asian  South Asian  Avg(↑)  STD(↓)
0     96.23  94.65  94.93       95.12        95.23   0.60
0.1   96.20  94.77  94.87       94.98        95.21   0.58
0.5   94.89  94.00  93.67       94.55        94.28   0.47

Table 4.10 Verification accuracy (%) of 5-fold cross-validation on 8 groups of RFW [4].

Method      Gender  White       Black       East Asian  South Asian  Avg(↑)      STD(↓)
Baseline    Male    97.49±0.08  96.94±0.26  97.29±0.09  97.03±0.13   96.96±0.03  0.69±0.04
            Female  97.19±0.10  97.93±0.11  95.71±0.11  96.01±0.08
AL+Manual   Male    98.57±0.10  98.05±0.17  98.50±0.12  98.36±0.02   98.09±0.05  0.66±0.07
            Female  98.12±0.18  98.97±0.13  96.83±0.19  97.33±0.13
GAC         Male    98.75±0.04  98.18±0.20  98.55±0.07  98.31±0.12   98.19±0.06  0.56±0.05
            Female  98.26±0.16  98.80±0.15  97.09±0.12  97.56±0.10

Ablation on Automation Module. We also ablate GAC with two variants to show the efficiency of its automation module: i) Ada-All, in which all convolutional layers are adaptive, and ii) Ada-8, in which the same 8 layers as GAC are set to be adaptive from the beginning of the training process, with no automation module (our best GAC model has 8 adaptive layers). As shown in Tab. 4.11, with the automation module, GAC achieves higher average accuracy and lower biasness than the other two models.

Table 4.11 Ablations on the automation module on the RFW protocol (%).

Method   White  Black  East Asian  South Asian  Avg(↑)  STD(↓)
Ada-All  93.22  90.95  91.32       92.12        91.90   0.87
Ada-8    96.25  94.40  94.35       95.12        95.03   0.77
GAC      96.20  94.77  94.87       94.98        95.21   0.58

(Figure 4.16: The first row shows the average faces of different groups in RFW (East Asian, Black, White, and South Asian, female and male). The next two rows show gradient-weighted class activation heatmaps [15] at the 43rd convolutional layer of GAC and the baseline. The higher diversity of heatmaps in GAC shows the variability of its parameters across groups.)

4.4.3.2 Results on Gender and Race Groups

Table 4.12 Statistics of dataset folds in the cross-validation experiment.

Fold  White              Black              East Asian         South Asian
      Subjects  Images   Subjects  Images   Subjects  Images   Subjects  Images
1     1,991     68,159   1,999     67,880   1,898     67,104   1,996     57,628
2     1,991     67,499   1,999     65,736   1,898     66,258   1,996     57,159
3     1,991     66,091   1,999     65,670   1,898     67,696   1,996     56,247
4     1,991     66,333   1,999     67,757   1,898     65,341   1,996     57,665
5     1,994     68,597   1,999     67,747   1,898     68,763   2,000     56,703

We now extend the demographic attributes to both gender and race. First, we train two classifiers that predict the gender and race/ethnicity of a face image. The classification accuracy of gender and

Table 4.13 Verification (%) on gender groups of IJB-C (TAR@0.1% FAR).
Model     Male   Female  Avg(↑)  STD(↓)
Baseline  89.72  79.57   84.64   5.08
GAC       88.25  83.74   86.00   2.26

race/ethnicity is 85% and 81%⁵, respectively. These fixed classifiers are then affiliated with GAC to provide demographic information for learning adaptive kernels and attention maps. We merge BUPT-Balancedface and RFW, and split the subjects into 5 sets for each of the 8 demographic groups. In 5-fold cross-validation, each time a model is trained on 4 sets and tested on the remaining set. Tab. 4.12 reports the statistics of each data fold for the cross-validation experiment on the BUPT-Balancedface and RFW datasets.

Here we demonstrate the efficacy of the automation module of GAC. We compare to a manually designed scheme (AL+Manual) that adds adaptive kernels and attention maps to a subset of layers. Specifically, the first block in every residual unit is chosen to be an adaptive convolution layer, and channel-wise attention is applied to the feature map output by the last block in each residual unit. As we use 4 residual units and each block has 2 convolutional layers, the manual scheme involves 8 adaptive convolutional layers and 4 groups of channel-wise attention maps. As shown in Tab. 4.10, automatic adaptation is more effective at enhancing the discriminability and fairness of face representations. Figure 4.15a shows how the dissimilarity of kernel masks in the convolutional layers changes over training under three thresholds $\tau$; a lower $\tau$ results in more adaptive layers. The layers determined to be adaptive vary both across layers (vertically) and over training time (horizontally), which shows the importance of our automatic mechanism.

Since IJB-C also provides gender labels, we evaluate our GAC-gender model (see Sec. 4.2) on IJB-C as well. Specifically, we compute the verification TAR at 0.1% FAR on pairs of female faces and pairs of male faces, respectively. Tab. 4.13 reports the TAR@0.1% FAR on

⁵ This seemingly low accuracy is mainly due to the large dataset we assembled for training and testing the gender/race classifiers. Our demographic classifier has been shown to perform comparably to SOTA on common benchmarks.
While demographic estimation errors impact the training, testing, and evaluation of bias mitigation algorithms, the evaluation is of the most concern, as demographic label errors may greatly affect the biasness calculation. Future development may therefore include either manually cleaning the labels or designing a biasness metric robust to label errors.

Table 4.14 Verification accuracy (%) on the RFW protocol [4] with varying race/ethnicity distribution in the training set.

Training Ratio  White  Black  East Asian  South Asian  Avg(↑)  STD(↓)
7:7:7:7         96.20  94.77  94.87       94.98        95.21   0.58
5:7:7:7         96.53  94.67  94.55       95.40        95.29   0.79
3.5:7:7:7       96.48  94.52  94.45       95.32        95.19   0.82
1:7:7:7         95.45  94.28  94.47       95.13        94.83   0.48
0:7:7:7         92.63  92.27  92.32       93.37        92.65   0.44

gender groups of IJB-C. The biasness of GAC remains lower than that of the baseline on the gender groups of IJB-C.

4.4.3.3 Analysis on Intrinsic Bias and Data Bias

For all the algorithms listed in Tab. 4.6, the performance is higher on the White group than on the other three groups, even though all the models are trained on a demographically balanced dataset, BUPT-Balancedface [5]. In this section, we further investigate the intrinsic bias of face recognition between demographic groups and the impact of data bias in the training set. Are non-White faces inherently more difficult for existing algorithms to recognize? Or are the face images in BUPT-Balancedface (the training set) and RFW [4] (the testing set) biased towards the White group?
To this end, we train our GAC network using training sets with different race/ethnicity distributions and evaluate them on RFW. In total, we conduct four experiments, in which we gradually reduce the total number of subjects in the White group of the BUPT-Balancedface dataset. To construct a new training set, subjects from the non-White groups in BUPT-Balancedface remain the same, while a subset of subjects is randomly picked from the White group. As a result, the ratios between the non-White groups stay the same, and the ratios of White, Black, East Asian, and South Asian are {5:7:7:7}, {3.5:7:7:7}, {1:7:7:7}, and {0:7:7:7} in the four experiments, respectively. In the last setting, White subjects are completely removed from the training set.

Tab. 4.14 reports the face verification accuracy on RFW of the models trained with different race/ethnicity distributions. For comparison, we also include our results on the balanced dataset (ratio {7:7:7:7}), where all images in BUPT-Balancedface are used for training.

Table 4.15 Verification performance on LFW, IJB-A, and IJB-C. [Key: Best, Second, Third Best]

Method           LFW(%)   Method             IJB-A(%)     IJB-C@FAR(%)
                                             0.1% FAR     0.001%  0.01%  0.1%
DeepFace+ [17]   97.35    Yin et al. [182]   73.9±4.2     -       -      69.3
CosFace [19]     99.73    Cao et al. [59]    90.4±1.4     74.7    84.0   91.0
ArcFace [6]      99.83    Multicolumn [183]  92.0±1.3     77.1    86.2   92.7
PFE [184]        99.82    PFE [184]          95.3±0.9     89.6    93.3   95.5
Baseline         99.75    Baseline           90.2±1.1     80.2    88.0   92.9
GAC              99.78    GAC                91.3±1.2     83.5    89.2   93.7

From the results, we make several observations: (1) The White group still outperforms the non-White groups in all of the first three experiments. Even without any White subjects in the training set, the accuracy on the White testing set is still higher than that on the testing images of the Black and East Asian groups. This suggests that White faces are either intrinsically easier to verify or that the face images in the White group of RFW are less challenging. (2) As the total number of White subjects declines, the average performance declines as well. In fact, for all of these groups, the performance
suffers from the decrease in the number of White faces. This indicates that face images in the White group help boost face recognition performance for both White and non-White faces; in other words, faces from the White group benefit the learning of global patterns for face recognition in general. (3) Counter to our intuition, the biasness is lower with fewer White faces, even though the data bias is actually increased by making the training set unbalanced.

4.4.3.4 Results on Standard Benchmark Datasets

While our GAC mitigates bias, we also hope it performs well on standard benchmarks. Therefore, we evaluate GAC on standard benchmarks without considering demographic impacts, including LFW [27], IJB-A [29], and IJB-C [9]. These datasets exhibit imbalanced demographic distributions. For a fair comparison with SOTA, instead of using ground-truth demographics, we train GAC on MS-Celeb-1M [23] with the demographic attributes estimated by the classifier pre-trained in Sec. 4.4.3.2. As shown in Tab. 4.15, GAC outperforms the baseline and performs comparably to SOTA.

Table 4.16 Distribution of ratios between the minimum inter-class distance and the maximum intra-class distance of face features in the 4 race groups of RFW. GAC exhibits higher ratios, and distributions more similar to the reference.

Race         Mean              StaD              Relative Entropy
             Baseline  GAC     Baseline  GAC     Baseline  GAC
White        1.15      1.17    0.30      0.31    0.0       0.0
Black        1.07      1.10    0.27      0.28    0.61      0.43
East Asian   1.08      1.10    0.31      0.32    0.65      0.58
South Asian  1.15      1.18    0.31      0.32    0.19      0.13

4.4.3.5 Visualization and Analysis on Bias of FR

Visualization. To understand the adaptive kernels in GAC, we visualize the feature maps at an adaptive layer for faces of various demographics via a PyTorch visualization tool [214]. We visualize the important face regions pertaining to the FR decision using gradient-weighted class activation mapping (Grad-CAM) [15]. Grad-CAM uses the gradients flowing back from the final layer corresponding to an input identity to guide the target feature map to highlight the regions important for identity prediction.
Figure 4.16 shows that, compared to the baseline, the salient regions of GAC demonstrate more diversity for faces from different groups. This illustrates the variability of network parameters in GAC across groups.

Bias via Local Geometry. In addition to STD, we explain the bias phenomenon via the local geometry of a given face representation in each demographic group. We assume that the statistics of the neighbors of a given point (representation) reflect certain properties of its manifold (local geometry). Thus, we examine the pair-wise correlation of face representations. To minimize variation caused by other variables, we use constrained frontal faces from a mugshot dataset, PCSO [14]. We randomly select 1K White and 1K Black subjects from PCSO and compute their pair-wise correlations within each race. In Fig. 4.15b, Base-White representations show lower inter-class correlation than Base-Black, i.e., faces in the White group are over-represented by the baseline relative to the Black group. In contrast, GAC-White and GAC-Black show more similar correlation histograms.

As PCSO has few Asian subjects, we use RFW for another examination of the local geometry in the 4 groups. That is, after normalizing the representations, we compute the pair-wise Euclidean distance and measure the ratio between the minimum distance of inter-subject pairs and the maximum distance of intra-subject pairs. We compute the mean and standard deviation (StaD) of the ratio distributions in the 4 groups, for both models. We also gauge the relative entropy to measure the deviation of the distributions from each other, choosing the White group as the reference distribution for simplicity. As shown in Tab. 4.16, while GAC yields only a minor improvement over the baseline in the mean, it gives smaller relative entropy for the other 3 groups, indicating that the ratio distributions of the other races under GAC are more similar, i.e., less biased, with respect to the reference distribution. These results demonstrate the capability of GAC to increase the fairness of face representations.

Table 4.17 Network complexity and inference time.

Model     Input Resolution  #Parameters (M)  MACs (G)  Inference (ms)
Baseline  112×112           43.58            5.96      1.1
GAC       112×112           44.00            9.82      1.4
4.4.3.6 Network Complexity and FLOPs

Tab. 4.17 summarizes the network complexity of GAC and the baseline in terms of the number of parameters, MACs, and inference time. While the number of parameters grows with the number of demographic categories, this does not necessarily increase the inference time, which matters more for real-time applications.

4.5 Demographic Estimation

We train three demographic estimation models to annotate the age, gender, and race information of the face images in BUPT-Balancedface and MS-Celeb-1M for training GAC and DebFace. For all three models, we randomly sample an equal number of images from each class and set the batch size to 300. Training finishes at the 35Kth iteration. All hyper-parameters are chosen by testing on a separate validation set. Below we give the details of model learning and estimation performance for each demographic.

(Figure 4.17: Demographic attribute classification accuracy on each group of (a) age, (b) gender, and (c) race. The red dashed line refers to the average accuracy over all images in the testing set.)

Gender: We combine the IMDB, UTKFace, AgeDB, AFAD, and AAF datasets for learning the gender estimation model. As with age, 90% of the images in the combined dataset are used for training, and the remaining 10% for validation. Table 4.18 reports the total number of female and male face images in the training and testing sets; more of the images are of male faces in both. Figure 4.17b shows the gender estimation performance on the validation set. The performance on male images is slightly better than that on female images.

Table 4.18 Gender distribution of the datasets for gender estimation.

Dataset   # of Images
          Male     Female
Training  321,590  229,000
Testing   15,715   10,835
Race: We combine the AFAD, RFW, IMFDB-CVIT, and PCSO datasets for training the race estimation model, with UTKFace used as the validation set. Table 4.19 reports the total number of images in each race category of the training and testing sets. As with age and gender, race estimation performance is highly correlated with the race distribution in the training set: most of the images belong to the White group, while the Indian group has the fewest images, so the performance on White faces is much higher than on Indian faces.

Table 4.19 Race distribution of the datasets for race estimation.

Dataset   # of Images
          White    Black    East Asian  Indian
Training  468,139  150,585  162,075     78,260
Testing   9,469    4,115    3,336       3,748

Age: We combine the CACD, IMDB, UTKFace, AgeDB, AFAD, and AAF datasets for learning the age estimation model. 90% of the images in the combined dataset are used for training, and the remaining 10% for validation. Table 4.20 reports the total number of images in each age group of the training and testing sets. Figure 4.17a shows the age estimation performance on the validation set. The majority of the images come from the 19-34 age group, so age estimation performs best on this group; performance on the young-children and middle-to-old age groups is significantly worse than on the majority group.

Table 4.20 Age distribution of the datasets for age estimation.

Dataset   # of Images in the Age Group
          0-12   13-18   19-34    35-44    45-54   55-100
Training  9,539  29,135  353,901  171,328  93,506  59,599
Testing   1,085  2,681   13,848   8,414    5,479   4,690

It is clear that all the demographic models present biased performance with respect to different cohorts. These demographic models are used to label BUPT-Balancedface and MS-Celeb-1M for training GAC and DebFace. Thus, in addition to the bias from the dataset itself, we also add label bias to it. Since DebFace employs supervised feature disentanglement, we only strive to reduce the data bias, not the label bias.
4.6 Conclusion

This chapter tackles the issue of demographic bias in FR by learning fair face representations. We present two de-biasing FR networks, GAC and DebFace, to mitigate demographic bias in FR. In particular, GAC improves the robustness of representations for every demographic group considered here. Both adaptive convolution kernels and channel-wise attention maps are introduced in GAC, and we further add an automatic adaptation module to determine whether to use adaptations in a given layer. Our findings suggest that faces can be better represented by using layers adaptive to different demographic groups, leading to more balanced performance gains for all groups. Unlike GAC, DebFace mitigates the mutual bias across identity and demographic attribute recognition by adversarially learning a disentangled representation for gender, race, and age estimation, and face recognition, simultaneously. We empirically demonstrate that DebFace can reduce bias not only in face recognition but in demographic attribute estimation as well.

Chapter 5

Adversarial Face Representation Learning via Graph Classification

Face representation learning is one of the key steps in FR for overcoming the challenges caused by variations in face images. In this chapter, we propose a representation learning method that utilizes graph classification via adversarial training. Each face image can be viewed as a node in a graph, and the edges between nodes stand for the connectivity of face samples. A graph classifier is trained to distinguish graphs generated from extracted feature vectors from those defined by a practical assumption. Meanwhile, the face representation model attempts to fool the graph classifier so that it gradually acquires the feature distribution of the ideal, oracle graph. In this way, feature points of the same identity come closer together while maintaining a reasonable distance from points of other identities. Experiments on benchmark face datasets (LFW, CPLFW, CFP-FP, IJB-A, IJB-B, IJB-C) show that our framework achieves state-of-the-art performance for both verification and identification tasks.
5.1 Adversarial Learning and Graph Classification with GNNs

Adversarial learning [153] has proven to be a useful approach to learning a data distribution and has been applied to many computer vision applications (refer to Sec. 4.3.1 for more details on recent advances in adversarial learning).

(Figure 5.1: (a) In GANs, an image generator gradually produces higher-quality faces during training so that a CNN-based discriminator cannot distinguish fake from real faces. (b) Analogously, given input faces, our embedding network for face recognition learns to extract discriminative features and connect them into a graph, with the goal that a graph neural network (GNN)-based discriminator cannot distinguish generated graphs from oracle graphs, i.e., the graphs of ideal face representations. During inference, our embedding network extracts more discriminative features that form oracle-like graphs, just as a GAN's generator synthesizes photo-realistic faces.)

The task of graph classification is to predict the category a graph belongs to. Unlike node label prediction, a full graph structure is considered a single input component, and the corresponding output is either a single representation vector or a class label. Two main techniques are involved in recent DNN-based graph networks: spatial computation and spectral operation. In spectral approaches [215, 216], the graph convolutions are based on the convolution theorem from signal processing, where point-wise multiplications are performed in the Fourier domain of the graph. In contrast, spatial methods [217-219] operate convolutions directly on the graph structure. Before the labeling procedure, the algorithm in [218] first applies a normalization to the neighborhood graphs created by determining the sequence of nodes; this normalization step moves graphs with similar structural roles into the same neighborhood and benefits the final classification. Antoine et al. [220] modify the conventional 1D graph convolution to a vanilla 2D CNN architecture for 2D graph classification.
5.2 Our Approach

5.2.1 Overall Framework

The proposed adversarial training framework for face representation is composed of a face embedding network $E(\cdot)$, a feature graph constructor $C(\cdot)$, and a GNN-based graph discriminator $D(\cdot)$. First, given a set of $N$ labeled training images $\{(\mathbf{x}_i, y_i)\}_{i}^{N}$, the embedding network takes the $i^{th}$ image $\mathbf{x}_i$ as input and transforms it into an $m$-dimensional feature vector: $\mathbf{f}_i = E(\mathbf{x}_i)$, where $\mathbf{f}_i \in \mathbb{R}^m$. The pair of the feature vector and its corresponding label $(\mathbf{f}_i, y_i)$ is then sent to the graph constructor $C(\cdot)$ to be added as a new node in the graph. When $C(\cdot)$ has collected a specified number of nodes, it builds graphs based on the labels of the nodes and their similarities. For graphs in the oracle space, two vertices $(v_i, v_j)$ are connected by a bidirectional edge if they are from the same subject; for graphs in the generated space, a vertex $v_i$ is linked to another vertex $v_j$ if $v_i$ is one of the $k$ nearest neighbors of $v_j$. More details of graph construction are discussed in Sec. 5.2.2.

(Figure 5.2: Overview of the proposed adversarial face representation learning via graph classification. Solid arrows denote the forward pass, and dashed arrows denote backward propagation. Training alternates between $E(\cdot)$ and $D(\cdot)$. For the shared inference (solid blue arrows), a set of face images is first taken by the embedding network $E(\cdot)$ to extract feature representations. These feature vectors are then converted into graph structures by the graph constructor $C(\cdot)$, which builds an oracle graph and a generated graph. During the training of $D(\cdot)$ (yellow arrows), the two types of graphs are received by the graph discriminator $D(\cdot)$, which must predict the category of each graph; $D(\cdot)$ is then updated from the gradients of the loss function $\mathcal{L}_D$. During the training of $E(\cdot)$ (red arrows), only generated graphs are delivered to $D(\cdot)$, and $E(\cdot)$ receives feedback from $\mathcal{L}_a$, whose goal is to drive $D(\cdot)$ to make errors on generated graphs.)
Next, the two types of graphs produced by $C(\cdot)$ are fed into the GNN discriminator $D(\cdot)$ as training samples. The parameters of $D(\cdot)$ are updated with the objective of correctly classifying the graph category. In the meantime, the adversarial goal of $E(\cdot)$ is to mislead $D(\cdot)$ by making the feature distributions in the two spaces from which these graphs are created indistinguishable.

As shown in Fig. 5.2, with the cooperation of $C(\cdot)$, $E(\cdot)$ endeavors to extract feature representations by competing against $D(\cdot)$, which uses a graph classification network [217]. Such a graph structure not only allows for a distance metric between nodes but also attends to both local and global distributions of feature representations, and hence benefits the discriminative power of the face embeddings.
OracleGraph Foredgesinanoraclegraph,onevertexisconnectedtoalltheotherverticesofthe samesubject.Theresultingadjacencymatrixissymmetric: A > 89 = 8 > > >< > > > : 1 ŒH 8 = H 9 0 Œ otherwise .Forverticesin anoraclegraph,weneedtorede˝netherepresentationvectorsbasedonboththeexistingfeature pairs f ( f 8 ŒH 8 ) g # 8 andtheultimatelearningobjective.Forexample,inthemostidealsituationeach faceidentitywouldberepresentedbyasinglevector,whichmeansallimagesamplesofthesame subjectwouldbemappedtothesamefeaturevector,regardlessoftheintra-subjectvariations.The correspondingnodeinformationmatrixis: P > = f f 2 H 8 g # 8 2R # < Œ (5.1) where f 2 9 = 1 ) 9 P H 8 = 9 f 8 ,and ) 9 denotesthetotalnumberoffaceimagesofthe 9 C subject(See Fig.5.3b).However,whensuchnoderepresentationsareusedasthetrainingtarget,itisfarbeyond therealisticdatadistributionoffaceimages.Asaresult,itmayeitherbehardtotrainorleadto trivialsolutions. Tomakeitmorepractical,weintroducearatiohyper-parameter, A ,toadaptivelyadjustthe trainingdi˚cultyintermsofthenoderepresentation.Speci˝cally, A isafractionnumberbetween 0 and 1 .Itcontrolsthemaximumdistanceofanyfeaturevectortoitscentroid.Forthosevectors thatviolatethisconstraint,theyareforcedtomovetheircoordinatestowardsthedirectionofthe 117 centroidsinane˙orttoreachthemaximumdistance.Nowwecaneasilymanipulatethetraining targetbychangingthevalueof A .The˝nalmatrixofnodeinformationintheoraclegraphis: P > 8 = 8 > > > >< > > > > : f 8 Œˇ8BC ( f 8 Œ f 2 H 8 ) A ˇ8BC <8= f 2 H 8 + A ( f 8 f 2 H 8 ) ˇ8BC <8= ˇ8BC ( f 8 Œ f 2 H 8 ) , otherwise (5.2) where ˇ8BC <8= = min f ˇ8BC ( f 2 H 8 Œ f 2 9 ) g = 9 =1 Œ9 6 = H 8 , = isthetotalnumberofsubjects,and ˇ8BC ( ) isa distancemetricfunction.Fig.5.3cand5.3dshowa 2 Dexampleofthisprocess. 
Generated Graph. A vertex in a generated graph is simply represented by its corresponding feature vector $\mathbf{f}_i$ extracted from $E(\cdot)$, and the node information of the entire graph is denoted by a feature matrix $\mathbf{P}^f = \{\mathbf{f}_i\}_{i}^{N} \in \mathbb{R}^{N \times m}$, with each row corresponding to a vertex. For edges in a generated graph, we assign the adjacency values based on the similarity between two vertices: a vertex $v_i$ is associated with another vertex $v_j$ if $v_i$ is one of $v_j$'s top $k$ nearest neighbors, denoted by $Neib_k(\mathbf{f}_j)$, based on the distance of their feature vectors in Euclidean space. Unlike $\mathbf{A}^o$, the final adjacency matrix of a generated graph need not be symmetric:

$$\mathbf{A}^f_{ij} = \begin{cases} 1, & \mathbf{f}_i \in Neib_k(\mathbf{f}_j) \\ 0, & \text{otherwise.} \end{cases}$$

In the end, a pair of graphs $g_o(\mathcal{V}_o, \mathcal{E}_o)$ and $g_f(\mathcal{V}_f, \mathcal{E}_f)$, with binary labels $y^g_i = 1$ and $y^g_j = 0$, is yielded by the graph constructor.

5.2.3 Discriminator and Adversarial Learning

For the graph discriminator $D(\cdot)$, we consider a whole graph, with all its vertices and edges, as a single instance, and the goal is to predict the category it belongs to. Given the variety of applications of graph classification, much attention has gone into developing useful and practical architectures for graph classification and representation learning. Here, we employ the fast approximate convolutions on graphs proposed by [215]. In particular, the discriminator network consists of multiple graph convolutional layers connected by non-linear activation functions:

$$\mathbf{H}^{(l+1)} = \sigma\!\left( \tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \mathbf{H}^{(l)} \mathbf{W}^{(l)} \right), \qquad (5.3)$$

(Figure 5.3: Construction of the generated graph and oracle graph. In this example, the input image set comprises 9 images of 3 subjects, 3 images per subject. The images of each subject are surrounded by a circle with a unique color, indicating identity, and each image is projected to a point in the 2D Euclidean feature space. The following graphs are constructed: (a) a generated graph, where each vertex $v_i$ is represented by its feature vector, with a directed edge from $v_i$ to $v_j$ if $v_j$ is one of the top 2 nearest neighbors of $v_i$; (b) an oracle graph created by center points, where each vertex is
represented by the mean vector of its identity, with a bidirectional edge connecting two vertices of the same subject; (c) a radius constraint is used to allow tolerable intra-subject variations, where vertices move toward their centers (denoted by dashed arrows) to meet the radius requirement; the vertex already within the radius, the leftmost one in this example, stays the same; (d) an oracle graph controlled by r, where the distance of each vertex to its center is reduced by the ratio r, with a bidirectional edge connecting two vertices of the same subject.

where Ã = A + I_N is the sum of the adjacency matrix and the identity matrix I_N, D̃_{ii} = Σ_j Ã_{ij}, and W^{(l)} is the convolutional kernel matrix of the l-th layer. σ(·) denotes the activation function, and H^{(l)} is the input feature map for layer l; the input to the initial layer is the node information matrix P of the graph. Finally, the classification network ends with a fully-connected layer followed by a sigmoid operation. As a binary classification task, the discriminator is trained by minimizing the standard binary cross-entropy loss function:

L_D = −(1/N_g) Σ_{i=1}^{N_g} [ y^g_i log(z_i) + (1 − y^g_i) log(1 − z_i) ],   (5.4)

where N_g is the total number of graphs, and z_i = D(g_i) is the probability predicted by the sigmoid.
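For reference, one propagation step of Eq. (5.3) can be sketched in a few lines of NumPy. This is a sketch only: we use dense matrices for clarity and Tanh as the activation σ, matching the DGCNN setup described in the implementation details.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step of Eq. (5.3):
    H^{l+1} = sigma(D~^{-1/2} A~ D~^{-1/2} H^l W^l), with A~ = A + I."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                 # D~_ii = sum_j A~_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.tanh(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)
```

Stacking several such layers, each with its own W^{(l)}, gives the graph-convolutional backbone of the discriminator.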
As a distribution detector, D(·) is also in charge of guiding E(·) to imitate features sampled from the target distribution. The parameters of E(·) are adversarially trained by making D(g^f) yield wrong predictions when g^f is the input, such that the sigmoid output for g^f is as high as that for g^o. This adversarial loss is formulated as:

L_a = −(1/N_g) Σ_{i=1}^{N_g} log D(g^f_i),   (5.5)

where g^f_i = C(E({x}_i)). The gradients derived from L_a do not propagate back to D(·), but only to E(·) to update its parameters. By optimizing L_a, we assume that if E(·) successfully acquires the oracle distribution, D(·) should consistently give high probability estimates for the oracle-graph category, no matter what kind of graph is actually taken as input. As a result, the face embeddings output by E(·) are likely to present global and local node dependencies and similarities, as well as recognition ability, similar to the target features.

5.2.4 Network Training

As both oracle graphs and generated graphs are constructed from features extracted from images, our framework requires a pre-trained face representation model to initialize the node information matrix for both graphs. Apart from the above adversarial loss, a conventional loss for face recognition, L_f, is also included to constrain the feature distribution within a reasonable physical metric where vector similarities apply, and to prevent trivial solutions. Further, as stated in Sec. 5.2.2, the maximum distance of an arbitrary point to its class centroid is based on the minimum distance between centroids, so the distribution of centroids is also a key factor in creating discriminative node embeddings for oracle graphs. Thus, we introduce another objective to constrain the distance between centroids:

L_c = (2 / (n(n−1))) Σ_{i≠j} [ Dist_c − Dist(f̂^c_i, f̂^c_j) + m ]_+,   (5.6)

where f̂^c is the batch-estimated class centroid that is smoothly updated during training, and Dist_c is the average distance over all pairwise centroids in the entire training set,
[x]_+ = max(0, x), and m is the margin parameter. In total, the face embedding network is trained by minimizing the combined loss function:

L_E = L_f + λ L_a + μ L_c,   (5.7)

where λ and μ adjust the contributions of the adversarial loss and the centroid distance loss to E(·).

Training strategy. The proposed framework can be trained end-to-end or with a two-step strategy. If trained as one piece, the training procedure is similar to that of a GAN, where E(·) and D(·) are updated alternately. On the other hand, to obtain more stable information about the global data structure in the feature space, e.g., the global class centroid vectors f^c and the average centroid distance Dist_c, we adopt a stage-wise learning process to train the two networks.

Specifically, the discriminator D(·) is first trained on features pre-extracted by E(·). Meanwhile, all the parameters related to the global distribution are also pre-computed, including f^c and Dist_c. When D(·) achieves decent performance, we stop its training and fix its parameters. The training then enters the second stage, where the pairwise centroid distances and the vertex features in the generated graph are updated with E(·). A training loop is complete when the second stage ends; we can start another loop if necessary. Similar to the triplet loss, we select hard samples during training: generated graphs whose vertices or edges differ from the oracle graphs are considered valid training samples. In addition, the three adjustable hyper-parameters, the maximum distance ratio r and the loss weights λ and μ, can be updated across training loops. For instance, we can gradually raise the difficulty of the target graph manipulation as the number of loops increases.

5.3 Experiments

In this section, we first conduct ablation experiments to study various design choices of our algorithm, then compare our performance with state-of-the-art methods on public benchmark datasets. Finally, we analyze the distribution of our face representations via graph visualization.
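Before turning to the experiments, the three objectives of Eqs. (5.4), (5.5), and (5.6) can be sketched as plain NumPy functions. This is purely illustrative, with our own function names; in practice L_a is computed with the discriminator's parameters frozen, and the pair averaging follows the 2/(n(n−1)) convention of Eq. (5.6).

```python
import numpy as np

def loss_D(z, y, eps=1e-12):
    """Eq. (5.4): binary cross-entropy over graph predictions z = D(g_i),
    with label y = 1 for oracle graphs and y = 0 for generated graphs."""
    return -np.mean(y * np.log(z + eps) + (1 - y) * np.log(1 - z + eps))

def loss_a(z_generated, eps=1e-12):
    """Eq. (5.5): push D's sigmoid output on generated graphs toward 1;
    its gradients update only the embedding network E."""
    return -np.mean(np.log(z_generated + eps))

def loss_c(centroids, dist_c, margin):
    """Eq. (5.6): hinge on all ordered centroid pairs, scaled by 2/(n(n-1))."""
    n = len(centroids)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                d = np.linalg.norm(centroids[i] - centroids[j])
                total += max(0.0, dist_c - d + margin)
    return 2.0 * total / (n * (n - 1))
```

The combined objective of Eq. (5.7) is then simply loss_f + λ·loss_a + μ·loss_c for whichever recognition loss L_f is chosen.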
5.3.1 Datasets and Implementation Details

Our training dataset is MS-Celeb-1M (MS1M) [23] cleaned by ArcFace [6], referred to as MS1MV2, containing about 5.8M images of 85K subjects. We evaluated our method on six public benchmark datasets for face recognition: LFW [27], CPLFW [71], CFP-FP [72], IJB-A [29], IJB-B [30], and IJB-C [9]. The face area is first cropped from each image based on five facial landmarks detected by RetinaFace [34], and then resized to 112×112 pixels.

The architecture of E(·) is the 100-layer ResNet used in [6]. For the graph discriminator D(·), we adopt the DGCNN architecture proposed in [217], consisting of four graph convolution layers [215] with Tanh activation, a SortPooling layer, MaxPooling and 1D convolution layers, a fully-connected layer with ReLU activation, and a softmax classification layer at the end. For each graph, 5 subjects with 5 images per subject are randomly selected to form the set of nodes. During the training of D(·), the graph batch size is set to 90, of which half are oracle and half are generated graphs. The parameters of D(·) are updated using Adam with a learning rate of 1×10^-3. For training E(·), each batch contains 275 images from 55 subjects, also 5 images per subject. With the same graph size, 22 graphs are constructed by C(·) in every training step.

Table 5.1: Verification performance (%) of different vertex feature matrices of oracle graphs. A bigger r tolerates more intra-class variation, while a small r, f^c_i, or f^p_i strives for minimal intra-class variation. A balance between learning capability and ideal representations performs best (r = 0.7).

P^o | CFP-FP | IJB-A TAR@0.1% FAR
r = 1.0 | 97.90 | 94.42 ± 1.49
r = 0.9 | 98.27 | 96.58 ± 0.45
r = 0.7 | 98.34 | 97.31 ± 0.38
r = 0.5 | 96.68 | 94.04 ± 1.87
r = 0.3 | 90.04 | 78.13 ± 3.98
{f^c_i}_{i=1}^N | 92.71 | 81.30 ± 3.61
{f^p_i}_{i=1}^N | 90.05 | 77.42 ± 4.19
E(·) is optimized by SGD with a momentum of 0.9 and a weight decay of 5e-4. The learning rate starts from 0.05 and drops at epochs 8, 15, and 20. The margin m in the loss function L_c is set to 0.3. We utilize the loss function introduced in CurricularFace [221] as L_f, and their ResNet100 model trained on MS1MV2 as the pre-trained model to initialize the vertex features in the graphs. The entire training process takes two loops, with the three hyper-parameters set to r = {0.9, 0.7}, λ = {1.0, 0.5}, and μ = {0.5, 0.1} in the two loops, respectively.

5.3.2 Ablation Study

Vertex Feature Matrix of Oracle Graphs. The design of the vertex feature matrix P^o directly influences the feature distribution in the oracle space, and also determines the complexity and feasibility of learning the embedding network. Here, we explore different ways to define oracle representations and analyze their impact on the learned feature space. Seven variants are considered. (1-5): Move each feature vector toward its class center under the control of different ratios r, with r = 1.0, 0.9, 0.7, 0.5, and 0.3, respectively. (6): Each identity is represented by a single feature vector f^c_i, the mean vector of all samples of subject i; this is equivalent to r = 0. (7): Each identity is represented by a prototype feature vector f^p_i, the corresponding column vector in the weight matrix of the last classification layer. All seven ablation models are trained for one loop, with λ = 1.0 and μ = 0.5.

Tab. 5.1 reports the face verification results on CFP-FP and IJB-A using different vertex feature matrices for oracle graphs. Our results suggest that there is a limit on the intra-class distance of each identity in oracle graphs. If the predefined intra-class distance is smaller than this limit, it is beyond the network's learning capacity and recognition performance degrades significantly. For example, when we appoint the center representation {f^c_i}_{i=1}^N, or the prototype representation {f^p
_i}_{i=1}^N, as the vertex feature matrix for the oracle graph, the average intra-class distance is zero, since every identity collapses to a single 512-dimensional feature vector. The verification accuracy nevertheless drops by 8.29% on CFP-FP compared to the best model in this ablation, and the True Acceptance Rate (TAR) falls from 97.31 ± 0.38% to 77.42 ± 4.19% on IJB-A. Similar performance is also observed when r is set to 0.3.

These results are even worse than those of the initial pre-trained network. This indicates that the graph discriminator delivers a jumble of information that can misguide the embedding network when an unreasonably ideal distribution is assumed for the oracle representations. On the other hand, performance is relatively insensitive to r when the minimum intra-class variation is within a reasonable range. By definition, a bigger r tolerates more intra-class variation but leaves less room for improving the representation. Thus, r should be small enough to achieve higher discriminability of the representation and, at the same time, big enough to prevent capacity overflow. In our experiments, r = 0.7 appears to be a good trade-off and consistently performs best on both CFP-FP and IJB-A.

Adjacency Matrix of Generated Graphs. In Sec. 5.2.2, we mention that the adjacency matrix of a generated graph is created from nearest-neighbor information. The goal is for E(·) to learn how to map the oracle dependencies between vertices into the Euclidean space. To show its efficacy, we ablate by replacing it with the adjacency matrix of the oracle graph, which is established based on identity labels. Both the ablation model and the proposed model are trained for one loop, with r = 0.7, λ = 1.0, and μ = 0.5.

Table 5.2: Verification performance (%) of different adjacency matrices of generated graphs.

A^f | CFP-FP | IJB-A TAR@0.1% FAR
Nearest Neighbors | 98.34 | 97.31 ± 0.38
Identity Labels | 97.16 | 94.44 ± 1.09

Table 5.3: Verification performance (%) of different λ and μ.
λ, μ | CFP-FP | IJB-A TAR@0.1% FAR
1.0, 0.5 | 98.34 | 97.31 ± 0.38
1.0, 0.1 | 98.27 | 96.20 ± 0.40
0.5, 0.5 | 98.14 | 95.82 ± 0.47

Tab. 5.2 compares the results of training with different definitions of the adjacency matrix for generated graphs. Clearly, it is important to let the adjacency matrices depend on the feature representations: this helps gradients flow through the embedding network, and the adjacency matrices update along with the embeddings. Otherwise, if defined via identity labels, constant adjacency matrices are used throughout the training iterations.

Contribution of Adversarial Loss and Centroid Loss. Here we show the effects of the adversarial loss and the centroid loss by training the network with different λ and μ. The remaining settings are the same for all models trained in this ablation: one loop, with r = 0.7. Tab. 5.3 reports the results for different hyper-parameters λ and μ to show the contributions of both objective functions. We keep one of them unchanged and decrease the value of the other to see how performance is affected when less contribution is made by L_a or L_c. As shown in Tab. 5.3, smaller λ and μ both lead to worse performance on CFP-FP and IJB-A, while increasing either of them boosts performance. This indicates that the proposed adversarial learning with the centroid distance loss makes a non-negligible contribution to the discriminability of face representations.

5.3.3 Comparisons with SOTA Methods

We choose six benchmark datasets widely used for face recognition to thoroughly evaluate our approach and compare it with other SOTA methods. Among the six datasets, three of them

Table 5.4: Verification accuracy (%) of our model and SOTA methods on LFW, CPLFW, and CFP-FP.
The results marked by (*) are re-implemented by ArcFace [6]. All other baseline results are reported by their respective papers. [Keys: Red: Best, Blue: Second best]

Method | Training Data | LFW | CPLFW | CFP-FP
DeepID2+ [222] | 0.3M | 99.47 | - | -
CenterLoss [47] | 0.7M | 99.28 | - | -
SphereFace [13] | CASIA [11] (0.5M) | 99.42 | - | -
DeepFace [17] | 4M | 97.35 | - | -
FaceNet [12] | 200M | 99.63 | - | -
TPE [223] | CASIA | - | - | 89.17
DRGAN [224] | 1M | - | - | 93.41
Yin et al. [11] | CASIA | - | - | 94.39
UVGAN [225] | MS1M (10M) | 99.60 | - | 94.05
CosFace [19] | CASIA | 99.51 | - | 95.44
PFE [184] | MS1M (4.4M) | 99.82 | - | 93.34
ArcFace [6] | MS1MV2 (5.8M) | 99.83 | - | 98.37
CurricularFace [221] | MS1MV2 | 99.80 | 93.13 | 98.37
DDL [226] | VGGFace2 (3.3M) | 99.68 | 93.43 | 98.53
Ours | MS1MV2 | 99.78 | 93.40 | 98.41

are tested under the instance-based image-to-image verification protocol, verifying whether a pair of face images belongs to the same person; the verification protocol of the other three datasets is based on image templates. A template is a collection of face images sampled from the same identity. The template-based verification task requires deciding whether two templates are from the same person or not.

Instance-based Face Verification. Tab. 5.4 reports the verification accuracy on the three instance-based benchmark datasets: LFW, CPLFW, and CFP-FP. LFW is a dataset collected before the era of deep face representations. Its 6,000 pairs of face images are considered semi-constrained, with limited intra-class variation and relatively high image quality. SOTA performance is already saturated on LFW: almost all DNN-based methods listed in Tab. 5.4 achieve over 99.00% accuracy. Even so, LFW is still used as a standard validation benchmark given its prevalence in FR research communities and its efficient pairwise assessment. Our model obtains an accuracy (99.78%) similar to other methods, though slightly below the two models also trained on MS1MV2 (ArcFace [6] and CurricularFace [221]).

The other two datasets, CPLFW and CFP-FP, are created to address the challenge of large facial

Table 5.5: Comparison of verification performance with SOTA methods on IJB-A, IJB-B, and IJB-C.
The evaluation is measured by TAR (%), True Acceptance Rate, at a certain FAR, False Acceptance Rate. For IJB-A, FAR = 0.1%; for IJB-B and IJB-C, FAR = 0.01%. The decimal precision of TAR varies among the results reported by SOTA methods; results in this table are unified to one decimal place (0.1). All baseline results are reported by their respective papers. [Keys: Red: Best, Blue: Second best]

Method | Training Data | IJB-A | IJB-B | IJB-C
DRGAN [224] | 1M | 53.9 ± 4.3 | - | -
Yin et al. [11] | CASIA | 73.9 ± 4.2 | - | -
NAN [227] | 3M | 88.1 ± 1.1 | - | -
QAN [228] | 5M | 89.3 ± 3.9 | - | -
TPE [223] | CASIA | 90.0 ± 1.0 | - | -
VGGFace2 [59] | VGGFace2 | 90.4 ± 2.0 | 80.0 | 84.0
Multicolumn [183] | VGGFace2 | 92.0 ± 1.3 | 83.1 | 88.7
DCN [125] | VGGFace2 | - | 84.9 | 88.5
PFE [184] | MS1M (4.4M) | 95.3 ± 0.9 | - | 93.3
AdaCos [229] | 2.8M | - | - | 92.4
P2SGrad [230] | 2.8M | - | - | 92.3
ArcFace [6] | MS1MV2 | - | 94.2 | 95.6
CurricularFace [221] | MS1MV2 | - | 94.8 | 96.1
DDL [226] | VGGFace2 | - | 90.7 | 93.1
Ours | MS1MV2 | 97.3 ± 0.4 | 94.6 | 96.2

pose variations. Verification is conducted between a frontal face and a profile face, or between two faces with different yaw angles. Despite being more challenging than LFW, images in CFP-FP are of high resolution. CPLFW contains the same images as LFW, but with re-designated face pairs exhibiting pose differences. Thus, the average SOTA performance on these two datasets is over 90.00% accuracy. Without a policy specifically tailored for large pose variations, our approach still achieves top performance, surpassing all SOTA methods except DDL [226], which is trained on VGGFace2 [59], a dataset with large pose variations that exhibits less domain gap with the two test sets than MS1MV2 does.

Template-based Face Verification. Tab. 5.5 reports the TAR performance on the three template-based benchmark datasets: IJB-A, IJB-B, and IJB-C. To evaluate FR models in more challenging scenes, NIST released a series of datasets that contain a mix of high/low quality images and low quality video frames, presenting large variations in pose, illumination, occlusion, resolution, etc.
IJB-A is among the first of these to be published and has the smallest number of subjects and images. Our model outperforms the other methods on IJB-A, using no scheme specific to template-based face verification.

Table 5.6: Comparison of face identification performance (%) on the IJB-C dataset (closed set).

Method | Rank-1 | Rank-5
VGGFace2 [59] | 91.4 | 95.1
CurricularFace [221] | 94.4 | 96.1
Ours | 95.3 | 96.7

It should be noted that since the evaluation on IJB-A follows a ten-fold cross-validation protocol, fine-tuning can be done on the split folds before evaluation; no fine-tuning has been conducted for our model.

Both IJB-B and IJB-C are extended versions of IJB-A with more subjects and images. The n-fold cross-validation protocol is not provided for these two extensions, and only evaluation is allowed. In Tab. 5.5, we see that our approach performs comparably to the SOTA methods on IJB-B and IJB-C, which demonstrates that our graph-based adversarial framework indeed benefits the discriminability and generalizability of face representations.

Apart from verification tasks, we also report the face identification results on IJB-C in Tab. 5.6. The closed-set identification protocol of IJB-C contains two sets of face templates with no overlapping images, referred to as probe templates and gallery templates, respectively. In particular, the set of gallery templates includes all identities in the probe templates, one template per identity. Each probe template is compared with all gallery templates to search for the nearest matches. The final result is presented as Rank-k accuracy, the rate of correct matches among the top k nearest neighbors. We report Rank-1 and Rank-5 accuracies in Tab. 5.6. Compared with CurricularFace and the algorithm in [59], our model boosts both Rank-1 and Rank-5 accuracy, which shows that the proposed method is also effective in improving face representations for identification tasks.
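The Rank-k metric used in Tab. 5.6 can be computed from a probe-gallery similarity matrix as follows. This is a minimal sketch with hypothetical inputs; in practice the similarities would be cosine similarities between aggregated template features.

```python
import numpy as np

def rank_k_accuracy(S, gallery_ids, probe_ids, k):
    """Fraction of probes whose true identity appears among the top-k most
    similar gallery templates. S: (num_probes, num_gallery) similarities."""
    # gallery indices sorted by decreasing similarity, truncated to top-k
    order = np.argsort(-S, axis=1)[:, :k]
    hits = [probe_ids[i] in gallery_ids[order[i]] for i in range(len(probe_ids))]
    return float(np.mean(hits))
```

By construction, Rank-k accuracy is non-decreasing in k, which is why Rank-5 numbers in Tab. 5.6 are at least as high as Rank-1 numbers.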
5.3.4 Analysis of Feature Distribution

We further investigate the effects of our graph-based adversarial learning mechanism on face representations via visualization, and discuss the enhancement of feature discriminativeness from the graph perspective.

Figure 5.4: Two examples of generated graphs being updated by adversarial learning at 4 instances during the training process.

Since the vertex features of our graphs are initialized by a well-trained model, they lay a good foundation for further improvement. In fact, many feature points already lie within the radius constraint and are well connected to their same-identity neighbors. These good points need no attention during training. On the contrary, such points may confuse the graph discriminator D(·), as their features and structures are identical to those of oracle graphs, which leads to bad results. Hence, our adversarial training focuses only on the hard samples that present differently from oracle graphs, i.e., those either violating the radius requirement or not connected to their same-identity neighbors. We ask the network to learn a good mapping for these samples to boost the general performance. For example, Fig. 5.4 shows the training progress of two sets of initially hard samples under the proposed adversarial learning.
Fig. 5.5 shows the t-SNE visualization of the face embeddings extracted by CurricularFace and by the proposed method. First, we randomly select 15 hard-sample subjects with all of their face images available in MS1MV2, our training dataset. Next, a 512-dim feature vector is extracted from each image by our model and by the initial CurricularFace model. We then use t-SNE to project each 512-dim vector into a 2D space in which the geodesic distances between points in the high-dimensional space are maximally maintained. The color of each 2D point indicates the subject's identity.

(a) CurricularFace [221] (b) Ours
Figure 5.5: t-SNE visualization of the face representations in a 2D space. Each identity is represented by a unique color. The initial face representations extracted by CurricularFace and the updated representations learned via adversarial graph classification are shown in (a) and (b), respectively.

From Fig. 5.5, we see that these sample images are hard for CurricularFace to recognize, because many of their nearest neighbors are points of other identities. In comparison, by learning from oracle graphs, the proposed method learns a better feature space in which the face embeddings of the same identity form more compact clusters, leading to higher discriminability.

5.4 Concluding Remarks

Deep network based face recognition has witnessed rapid development in recent years, owing especially to the innovation of a series of loss functions designed to enhance the discriminativeness of the embedding space. However, most loss functions examine each individual sample, a sample pair, or at most a triplet of samples in a physical distance metric, ignoring the general distribution derived from within-class and cross-class correlations between samples. In contrast, motivated by the design of generative adversarial networks, our proposed approach oversees the distribution of feature vectors, represented by a graph, and pushes the generated graph to resemble the oracle graph by updating the embedding network.
We believe this work is a meaningful and exciting exploration along the direction of loss function design in the face recognition community. Experimental results demonstrate the promise of our proposed approach. Since our main idea is not face-specific, one direction for future work is to extend this study to ImageNet-like general classification problems.

Chapter 6

Summary and Future Work

This dissertation addresses four face recognition problems that are essential to both fundamental analysis and practical applications. The solutions proposed in this dissertation employ a variety of deep learning techniques to advance research within the FR community.

Intrinsic Dimensionality. One of the main goals is to estimate the limit of representation compactness with no loss in recognition performance. This limit is denoted the intrinsic dimensionality (IND) of a face representation. Given the intrinsic dimensionality, we also aim at finding a projection method to obtain a representation near its limit of compactness.

The density variation in a representation is the principal consideration when estimating a reliable IND of a given face representation. However, it is difficult to obtain an accurate estimate of such a probability distribution, since face images often lie on a topologically complex curved manifold, and estimating its distribution requires data points at very small length scales (distances), which are hard to access when data is limited, especially in the high-dimensional spaces where deep face representations are usually embedded. Another challenging task is to verify whether a given IND estimate truly represents the dimensionality of the complex high-dimensional representation space.
Our contribution to this problem is to overcome the above challenges by offering a topological dimensionality estimation technique for high-dimensional face representations, and by proposing a mapping approach that enables validation of the IND estimates through image matching experiments on the corresponding low-dimensional intrinsic representation of the feature vectors. In this study, we define the notion of intrinsic dimension through the classical concept of the topological dimension of the support of a distribution. We adopt an elegant solution to the curse of dimensionality by utilizing the geodesic distance between points, computed as the graph-induced shortest path, instead of the Euclidean distance. The difficulty of estimating the data distribution is overcome based on the observation that different topological geometries are similar to each other as long as their intrinsic dimensionality is the same; in other words, the distribution depends only on the intrinsic dimensionality and not on the geometric support of the manifold. Thus, the intrinsic dimensionality of the face manifold can be estimated by comparing the empirical distribution of pairwise distances on the manifold to that of a known distribution, such as the m-hypersphere. To validate the estimated IND, we propose a method, DeepMDS, that relies on the ability of DNNs to approximate the complex mapping function from the ambient space to the intrinsic space. The DeepMDS model is optimized to preserve the interpoint geodesic distances between the feature vectors in the ambient and intrinsic spaces, and is trained in a stage-wise manner that progressively reduces the dimensionality of the representation. Our new dimensionality reduction method addresses the scalability and out-of-sample-extension problems suffered by traditional spectral methods like Isomap. A well-trained DeepMDS model is not limited to mapping observed data; it can easily be applied to new test data, since it provides a mapping function in the form of a feed-forward network that maps an ambient feature vector to its corresponding intrinsic feature vector. DeepMDS mapping is shown to be significantly better than other dimensionality reduction approaches in terms of its discriminative capability.
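The graph-induced geodesic distance underlying this estimator can be sketched as follows. This is an Isomap-style sketch with our own function names; real face embeddings would use far larger neighborhoods and a sparse shortest-path solver rather than a dense Floyd-Warshall pass.

```python
import numpy as np

def geodesic_distances(X, k):
    """Approximate manifold geodesics by shortest paths on a symmetrized
    k-nearest-neighbor graph, instead of raw Euclidean distances."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:   # k nearest neighbors of point i
            G[i, j] = G[j, i] = D[i, j]
    for m in range(n):                        # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G
```

For points on a curved manifold, such as a circle embedded in the plane, the geodesic between distant points follows the manifold and exceeds the straight-line chord, which is exactly the property the IND estimator exploits.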
Capacity. Given a face representation, how many identities can it resolve? Addressing this question is the primary goal of this work. We refer to this quantity as the capacity of a given face representation, defined as the maximal number of users at which the face representation reaches its limit. By this definition, the capacity is determined in an objective manner, without the need for empirical evaluation.

Our solution relies on the notion of capacity that has been well studied in the information theory community in the context of wireless communication. The setting, commonly referred to as the Gaussian channel, consists of a source signal that is additively corrupted by Gaussian noise to generate observations. The capacity of this Gaussian channel is defined as the number of distinct source signals in the signal representation. Despite the rich theoretical understanding of the capacity of a Gaussian channel, there has been limited practical application of this theory to estimating the capacity of learned embeddings like face representations. For example, estimating the distribution of the source and the noise for a high-dimensional embedding, such as a face representation, is an open problem. A variety of noise sources must be taken into account to reliably infer probability distributions in high-dimensional spaces. It is also challenging to obtain reliable estimates of the volume of arbitrarily shaped high-dimensional manifolds (for the capacity bound).
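One fact used repeatedly when working with hyper-ellipsoid volumes: the ratio of the volumes of two hyper-ellipsoids of the same dimension reduces to a determinant ratio, since each volume scales with the product of the semi-axes. A toy NumPy sketch, with our own function name, of this volume-ratio computation:

```python
import numpy as np

def ellipsoid_volume_ratio(cov_pop, cov_class):
    """Vol(E_pop) / Vol(E_class) for hyper-ellipsoids x^T Sigma^{-1} x <= 1.
    The dimension-dependent sphere-volume constant cancels, leaving
    sqrt(det(Sigma_pop) / det(Sigma_class)); log-determinants avoid
    under/overflow in high dimensions."""
    _, logdet_p = np.linalg.slogdet(cov_pop)
    _, logdet_c = np.linalg.slogdet(cov_class)
    return float(np.exp(0.5 * (logdet_p - logdet_c)))
```

This is only the geometric core; the actual capacity estimate described next additionally shapes the identity-specific support as a function of the operating FAR.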
We address the aforementioned challenges to obtain reliable estimates of the capacity of any face representation by leveraging DeepMDS, the dimensionality reduction method proposed in the previous study. With the assistance of DeepMDS, we first model the face representation as a low-dimensional Euclidean manifold embedded within a high-dimensional space, and then project and unfold the manifold into a low-dimensional space. In our solution, two kinds of manifolds need to be approximated: (1) a population manifold, approximated by a multivariate Gaussian distribution (equivalently, hyper-ellipsoidal support) in the unfolded low-dimensional space; and (2) identity-specific manifolds, approximated by corresponding multivariate Gaussian distributions whose supports are estimated as a function of the specified FAR. The final capacity value is estimated as the ratio of the volumes of the population and identity-specific hyper-ellipsoids. We estimate the distributions of both kinds of manifolds from an observed face representation by leveraging recent advances in DNNs. In particular, given an embedding function (teacher network) that maps normalized high-dimensional face images to a low-dimensional vector, we train a DNN (student network) to model two sources of uncertainty that contribute to the noise in the embeddings: (i) uncertainty in the data, and (ii) uncertainty in the embedding function. These uncertainty estimates are then used to determine the volumes of the manifolds, and they also directly enable us to compute the representation capacity as a function of the desired operating point, as determined by its corresponding FAR. Experimental results suggest that our capacity estimates are an upper bound on the actual performance of face recognition systems in practice, especially under unconstrained scenarios. The relative order of the capacity estimates mimics the relative order of the verification accuracy on the benchmark datasets.

Bias.
Demographic bias in face recognition systems can potentially cause ethical issues when deployed. A thorough analysis of the discriminatory behavior of FR systems against certain demographic groups, and a solution to mitigate such bias, are the central aims of this study. We define FR bias as uneven recognition performance with respect to demographic groups. The ultimate goal of unbiased face recognition is that, for a given face recognition system, there should be no statistically significant difference in performance among different demographic groups of face images.

Bias can derive from various sources, such as the degree of balance among demographic samples, and variations in capture conditions and image noise across demographic groups. Therefore, naively training on a dataset containing uniform samples over the group space may still lead to bias. In fact, the demographic distribution of a dataset is often imbalanced, with underrepresented and overrepresented groups. Similarly, simply re-sampling an imbalanced training dataset may not solve the problem either, given that the diversity of latent variables differs across groups and the instances cannot be treated fairly during training. For these reasons, bias mitigation requires special attention to both data sampling and algorithm design.

This thesis focuses on developing de-biasing algorithms for face recognition. Our contribution to this problem is to provide two approaches to addressing the issue of demographic bias.
The first framework, DebFace, diminishes the influence of bias on both face recognition and demographic attribute estimation. During our investigation of demographic bias in face recognition, we also observed biased performance in demographic attribute estimation, which may cause additional bias since it acts differently when applied to de-bias face representations in different groups. To this end, we propose to jointly learn unbiased representations for both identity and demographic attributes. This solution is based on the assumption that if the face representation carries no discriminative information about demographic attributes, it will be unbiased in terms of demographics; the same hypothesis applies to demographic attribute estimation as well. Starting from a multi-task learning framework that learns disentangled feature representations of gender, age, race, and identity, respectively, we let the classifier of each task act as adversarial supervision for the other tasks. These four classifiers help each other achieve better feature disentanglement, resulting in unbiased feature representations for both the identity and demographic attributes.

Although DebFace shows a noticeable effect in mitigating demographic bias, we observe that recognition performance declines as well. Thus, in our second solution, GAC, we focus mainly on racial bias and strive to enhance the discriminativeness of face representations in every race/ethnicity group. The key idea of GAC is to give the network more capacity to broaden its scope over multiple face patterns from different groups, since an unbiased FR model should rely on both unique patterns for recognizing different groups and general patterns of all faces for improved generalizability. GAC explicitly learns these different feature patterns by leveraging two modules: an adaptive layer and an automation module. The adaptive layer comprises adaptive convolution kernels and channel-wise attention maps, where each kernel and map handles faces in one demographic group.
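A group-conditional step of the kind the adaptive layer uses can be sketched as follows. This is purely illustrative: the shapes, names, and the 1x1-convolution stand-in are our own simplifications of the adaptive kernels and attention maps described in the text.

```python
import numpy as np

def adaptive_layer(x, group, kernels, attn):
    """Apply the kernel and channel-attention map assigned to one demographic
    group. x: (C, H, W) feature map; kernels[g]: (C_out, C) 1x1 convolution;
    attn[g]: (C_out,) channel-wise attention weights."""
    C, H, W = x.shape
    y = kernels[group] @ x.reshape(C, H * W)   # group-specific 1x1 convolution
    y = y * attn[group][:, None]               # group-specific channel attention
    return y.reshape(-1, H, W)
```

The automation module described next decides, per layer, whether such a group-adaptive step is applied or a shared kernel is used instead.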
We also introduce a new objective function to GAC, which diminishes the variation of average intra-class distance across demographic groups. To apply these adaptive modules dynamically, we also propose an automation scheme that chooses the layers to which the adaptations are applied. As a result, our experiments demonstrate the efficacy of GAC in bias mitigation and SOTA performance preservation.

Representation Learning. The previous study focused on the fairness of pre-defined (demographic) groups. Yet, it is equally important to address individual fairness and performance in face recognition. In this study, we aim to develop a face representation method that takes individual performance into consideration, so that the face images of each identity are treated fairly while the overall performance is improved.

By investigating the average intra-subject and inter-subject distance of each identity, we start by constructing a kNN graph to describe the general face distribution in the embedding space. As a further endeavor, we propose a representation learning method that utilizes graph classification via adversarial training. In contrast to using an equational metric as the constraint in the feature space, our idea involves the creation of an ideal feature space that ensures individual performance, referred to as the oracle space, where the cluster of feature points of each class is clearly separated from the other classes. A deep neural network (DNN) is then trained to generate face features that follow the data distribution in the oracle space. Since the relationships and inter-dependencies between feature points are not as simple as the data structure of fixed-size grid images, we can no longer use a conventional CNN discriminator as the form of our adversarial supervision. In particular, the data structure in the feature space can be represented by a directed graph, where each vertex corresponds to an image sample and the edges between vertices represent their dependencies. For the oracle space, feature points are connected if they belong to the same subject; in the actual feature space, nodes are linked to their k nearest neighbors. The discrimination task here is to distinguish between
graphs from the oracle space and graphs from the generated space. Hence, we employ a graph classifier, trained with a graph neural network (GNN), as the discriminator that guides the representation model to output features following the oracle distribution. We demonstrate that our framework is capable of learning a generic feature space with enhanced discriminative power for face images, based on a pre-designed feature distribution defined on a graph structure.

Future Work. While this dissertation has explored fundamental problems in face recognition and has developed useful tools and algorithms that provide excellent performance, there is always room for additional improvement. Most importantly, the research contributions in this dissertation are not limited to face recognition. A number of other areas in computer vision, for example, general image classification and representation learning, can benefit greatly from the research conducted in this dissertation.

APPENDIX

PUBLICATIONS

[1] S. Gong, V. N. Boddeti, and A. K. Jain, "On the intrinsic dimensionality of image representations," in CVPR, 2019.
[2] S. Gong, X. Liu, and A. K. Jain, "Jointly de-biasing face recognition and demographic attribute estimation," in ECCV, 2020.
[3] D. Deb, S. Wiper, S. Gong, Y. Shi, C. Tymoszek, A. Fletcher, and A. K. Jain, "Face recognition: Primates in the wild," in BTAS, IEEE, 2018.
[4] S. Gong, Y. Shi, N. D. Kalka, and A. K. Jain, "Video face recognition: Component-wise feature aggregation network (C-FAN)," in ICB, IEEE, 2019.
[5] S. Gong, Y. Shi, and A. Jain, "Low quality video face recognition: Multi-mode aggregation recurrent network (MARN)," in ICCVW, 2019.
[6] S. Gong, X. Liu, and A. K. Jain, "Mitigating face recognition bias via group adaptive classifier," in CVPR, 2021.
[7] S. Gong, V. N. Boddeti, and A. K. Jain, "On the capacity of face representation," arXiv preprint arXiv:1709.10433, 2017.

BIBLIOGRAPHY

[1] D. Granata and V. Carnevale, "Accurate estimation of the intrinsic dimension using graph distances: Unraveling the geometric complexity of datasets," Scientific Reports, vol. 6, p. 31377, 2016.