FaceSearchandClusteringatScaleByCharlesOttoADissertationSubmittedtoMichiganStateUniversityinpartialentoftherequirementsforthedegreeofComputerScience{DoctorofPhilosophy2016AbstractFaceSearchandClusteringatScaleByCharlesOttoTherehasbeenagreatdealofprogressontheproblemofunconstrainedfacerecognitioninrecentyears,particularlywiththeemergenceofdeeplearningmethodsforgeneratingfacerepresentations.Developingrobustfacerepresentationsisnotonlyofgreatpracticalinterestforclassicbiometricproblemslikefacevbutalsoforemerginglargescalefacesearchproblems.Inbothsocialmedia,andlaw-enforcementapplications,anextremelylargevolumeoffacedataisbecomingcommonplace.Oneemergingproblemrelatedtothishighvolumeofdataistodevisemethodstosearchforpersonsofinterestamongthebillionsofsharedphotosonthesewebsites.Anotherchallengeistogrouporclusterlargecollectionsofunlabeledfaceimagesbyidentity,forexampleasapreludetomanualexaminationofacollectionofimagesinlaw-enforcementapplications.Regardingthefacesearchproblem,weproposeafacesearchsystemwhichcombinesafastsearchprocedure,coupledwithastate-of-the-artcommercialtheshelf(COTS)matcher,inacascadedframework.Givenaprobeface,wethelargegalleryofphotostothetop-kmostsimilarfacesusingfeatureslearnedbyaconvolutionalneuralnetwork.Thekretrievedcandidatesarere-rankedbycombiningsimilaritiesbasedondeepfeaturesandthoseoutputbytheCOTSmatcher.Weevaluatetheproposedfacesearchsystemonagallerycontaining80millionweb-downloadedfaceimages.ExperimentalresultsdemonstratethatwhilethedeepfeaturesperformworsethantheCOTSmatcheronamugshotdataset(93:7%vs.98:6%TAR@FARof0:01%),fusingthedeepfeatureswiththeCOTSmatcherimprovestheoverallperformance(99:5%TAR@FARof0:01%).Thisshowsthatthelearneddeepfeaturesprovidecomplementaryinformationoverrepresentationsusedinstate-of-the-artfacematchers.Ontheunconstrainedfaceimagebenchmarks,theperformanceofthelearneddeepfeaturesiscompetitivewithreportedaccuracies.LFWdatabase:98:20%accuracyunderthestandardprotocoland88:03%TAR@FARof0:1%undertheBLUFRprotocol;IJB-Abenchmark:51:0%TAR@FARof0:1%(vrank1retrievalof82:2%(closed-setsearch),61:5%FNIR@FPIRof1%(open-setsearch).Theproposedfacesearchsystemanexcellentbetweenaccuracyandscalabilityongallerieswithmillionsofimages.Additionally,inafacesearchexperimentinvolvingphotosoftheTsarnaevbrothers,convictedoftheBostonMarathonbombing,theproposedcascadefacesearchsystemcouldtheyoungerbrother's(DzhokharTsarnaev)photoatrank1in1secondona5Mgalleryandatrank8in7secondsonan80Mgallery,usingaIntelXeonprocessorclockedat3.1GHz.Intermsofclustering,insocialmedia,lawenforcement,andotherapplicationsthenum-berofunlabeledfacescanbeoftheorderofhundredsofmillion,whilethenumberofidentities(clusters)canrangefromafewthousandtomillions.Toaddressthechallengesofrun-timecomplexityandclusterquality,wepresentanapproximateRank-Orderclusteringalgorithmthatperformsbetterthanpopularclusteringalgorithms(k-MeansandSpectral).Ourexperimentsincludeclusteringupto123millionfaceimagesintoover10millionclusters.Clusteringresultsareanalyzedintermsofexternal(knownfacelabels)andinternal(un-knownfacelabels)clusterqualitymeasures,andrun-time.OurclusteringalgorithmachievesanF-measureof0:87ontheLFWbenchmark(13Kfacesof5;749individuals),whichdropsto0:27onthelargestdatasetconsidered(13KfacesinLFW+123Mdistractorimages).Additionally,weshowthatframesintheYouTubebenchmarkcanbeclusteredwithanF-measureof0:71.Aninternalper-clusterqualitymeasureisdevelopedtorankindividualclustersformanualexplorationofhighqualityclustersthatarecompactandisolated.Finally,wefurtherdevelopourfacerepresentation,leveragingmoreadvancednetworkarchitectures,andanorderofmagnitudemoretrainingdata{attainingavrateof92:22%at0:1%FARontheBLUFRvprotocol,usingasinglemodel.AcknowledgmentsIwouldliketothankmyadviserAnilK.Jain,myfamily,andmyfriends(youknowwhoyouare).IwouldadditionallyliketothankNobliscorporationfortheirassistanceinprovidingsomedatausedforthisthesis.Andso5yearshavepassed,andwehavereachedtheendofthisroad,suchasitis.It'sappropriatetoendthingsasFallpassesintoWinter.WecanalwayscountonMichigantoprovideabrightlightandacoldwind,beforethegrayskiessetin.Winterisatimefortheendofsomething,ortheanticipationofsomething{sowiththatinmindjustwait,afterallthat'sallthatyoucando.ivTableofContentsLISTOFTABLES....................................viiiLISTOFFIGURES...................................xChapter1Introduction................................11.1FaceRecognitionBackground..........................61.1.1FaceRecognitionProblems........................71.1.1.1Veri...........................81.1.1.2Search..............................91.1.1.3ClusteringbyIdentity.....................131.1.2FaceRecognitionPipeline........................141.1.2.1FaceDetection.........................151.1.2.2Pre-Processing.........................161.1.2.3FaceAlignment.........................161.1.2.4FeatureExtraction.......................181.1.2.5Comparison...........................191.1.3FaceRecognitionTaskProgression...................191.2SearchBackground................................221.3ClusteringBackground..............................231.4Contributions...................................261.5Organization...................................27Chapter2FaceRepresentation...........................292.1Introduction....................................292.2LayerTypes....................................312.3NetworkDescription...............................322.4FaceDatabases..................................362.4.1123MillionWebFaces..........................382.5FaceRecognitionEvaluation...........................392.5.1MugshotEvaluation...........................402.5.2LFWEvaluation.............................412.5.2.1StandardProtocol.......................412.5.2.2BLUFRProtocol........................422.5.3IJB-AEvaluation.............................462.6Conclusions....................................51Chapter3FaceSearchatScale...........................533.1Introduction....................................533.2RelatedWork...................................563.3FaceSearchFramework..............................583.3.1FaceFiltering...............................59v3.3.2Re-Ranking................................613.3.2.1ImpactofCandidateSetSize(k)...............623.3.2.2FusionMethod.........................633.4Large-scaleFaceSearch..............................643.4.1SearchDatabase.............................653.4.2DatabaseSegmentation..........................663.4.3PerformanceMeasures..........................693.4.4Closed-setFaceSearch..........................703.4.5Open-setFaceSearch...........................733.4.6Scalability.................................733.5BostonMarathonBombingCaseStudy.....................753.6Conclusions....................................78Chapter4Large-ScaleFaceClustering......................804.1Introduction....................................804.2Background....................................834.2.1FaceClustering..............................834.2.2GeneralImageClustering........................864.2.3ApproximateNearestNeighborMethods................864.2.3.1k-nnGraphConstruction...................874.2.3.2Randomizedk-dTree.....................874.2.4ClusteringEvaluation...........................884.3ProposedFaceClusteringApproach.......................894.3.1Rank-OrderClustering..........................904.3.2ProposedApproximateRank-OrderClustering.............924.3.3Per-ClusterQualityEvaluation.....................954.4Experiments....................................974.4.1ClusteringAlgorithmEvaluation....................974.4.2ApproximationPerformance.......................1014.4.3Large-ScaleFaceClustering.......................1054.4.4Deduplication...............................1074.4.5VideoFrameClustering.........................1084.4.6Per-ClusterQuality:InternalMeasures.................1114.4.6.1RankingEvaluation......................1124.5Conclusions....................................115Chapter5NetworkImprovements.........................1195.1Introduction....................................1195.2VGG-FaceDatabase...............................1205.3DeepResidualNetworks.............................1245.4ExperimentalResults...............................1275.5Conclusions....................................135Chapter6SummaryandFutureWork......................136viBIBLIOGRAPHY....................................138viiListofTablesTable1.1Facevperformanceonnotableconstrainedfacebenchmarks.20Table2.1Thedetailedconvolutionalnetworkarchitectureusedinthisthesis,whichiscloselyrelatedtotheoneoutlinedin[104].Themajorare:RGBimageinputtothedatalayer,improvedinputfacealignment,andanaddeddataaugmentationlayer........................35Table2.2PerformanceoffacevonasubsetofthePCSOdatabase(89;905imagesof13;666subjects).Thereareabout340Kgenuinepairsandover4billionimposterpairs............................40Table2.3PerformanceofvariousfacerecognitionmethodsonthestandardLFWvprotocol................................41Table2.4PerformanceofvariousfacerecognitionmethodsonLFWusingtheBLUFRprotocolreportedasTrueAcceptRate(TAR)andDetectionandIdenRate(DIR)standarddeviationacross10folds........45Table2.5RecognitionaccuraciesundertheIJB-Aprotocol.ResultsforGOTSandOpenBRaretakenfrom[48].Resultsreportedaretheaveragestandarddeviationoverthe10foldsspintheIJB-Aprotocol...........49Table3.1Asummaryoffacesearchsystemsreportedintheliterature......56Table3.2Large-scalewebfacesearchdatabaseoverview..............65Table3.3Theaveragesearchtime(seconds)perprobefaceandthecorrespondingsearchperformance(mAP)............................75Table3.4RankretrievalresultsofthetwoBostonbombersuspectsbasedon5Mand80Mfacegallery.Thesixprobeimagesaredesignatedas1a,1b,1c,2a,2b,and2c.Thesixmatedimagesinthegalleryaredesignatedas1x,1y,1z,2x,2y,and2z.ThecorrespondingimagesareshowninFig.3.9.......77Table4.1Asummaryofrelatedstudiesonfaceclustering.............83Table4.2ClusteringresultsonthecompleteLFWdatabase.TimesaregivenasHH:MM:SS,measuredusing20coresofanIntelXeonCPUclockedat2.5GHz.Theproposedalgorithm(lastrow)hasthehighestclusteringaccuracy(F-measure)andtheshortestrun-time......................98viiiTable4.3ClusteringResultsontheLFWdatabase,andLFWwithadditional1millionweb-downloadedfaceimagesusingapproximaterank-orderclusteringwithvaryingnearestneighborcomputationmethods.Timesmeasuredusing20coresofanIntelXeonCPUclockedat2.5GHz...............103Table4.4ClusteringresultsusingApproximateRank-OrderclusteringontheLFWdatabasewithincreasingamountsofunlabeleddata,andtsearchsizestrategiesfortheapproximatenearestneighborcalculations.Timesmea-suredusing5coresforLFW+5Mdatabaseexperiments,andasinglecoreusedforthesmallerexperiments,onanIntelXeonCPUclockedat2.5GHz.104Table4.5Large-scaleclusteringresultsusingtheproposedApproximateRank-Orderclustering,withrandomizedk-dtreenearestneighborapproximation.TimesmeasuredusingthespnumberofcoresofIntelXeonCPUsclockedat2.5GHz.#Clustersistheresultingnumberofclusters,excludingsingle-imageclusters................................106Table4.6YTFclusteringresultsusingtheproposedApproximateRank-Orderclustering,withrandomizedk-dtreenearestneighborapproximation,forclus-teringallframesinthedatabase,andforrandomlysampling3framespervideo.TimeisgivenasHH:MM:SS,measuredusing20coresofanIntelXeonCPUclockedat2.5GHz..............................109Table4.7Correlationofinternalclusterqualitymeasureswithprecision,fortheLFWdatabase.AveragePrec.@100istheunweightedaverageofper-clusterpairwiseprecisionforthetop-100scoringclustersforeachmetric,FractionPure@100isthefractionofthetop-100clusterswhichcontainonlyasingleidentity.......................................112Table5.1Numberofimagesandsubjectsavailableforuseintrainingsets,undervarious..............................125Table5.2Detaileddeepresidualnetworkarchitectures,following[31].......127Table5.3BLUFRvperformancethatweachievedusingtnet-workarchitecturesandtrainingsets.Resultsreportedas(mean-standarddeviation)across10folds.............................128ixListofFiguresFigure1.1Progressioninfacerecognitiondatabasey.a)laboratorycol-lectedimagesfromtheFERETdatabase,b)mugshotimagesfromthePinellasCounty's(PCSO)database,c)imagesfromtheLFWdatabase(unconstrained,butallfacesareautomaticallydetectable),d)imagesfromtheIJB-Adatabase(imagesaremanuallylocalized,notallfacescanbede-tectedautomaticallywithcurrentmethods).Ina),conditionsarerelativelycontrolled,withoutmuchtimebetweenacquisitions,inb),pose,expression,andilluminationarerelativelycontrolled,butvariationinsubjecthairstyle,andgeneralchangeinappearanceduetoagingoccur.Inc),somepose,oc-clusion,andunevenilluminationoccur,andind)extremevariationsinposecanoccur.....................................2Figure1.2ImagesoftheBostonMarathonBombingsuspectsa),andb),studiedin[49]and[4].Thetoprowshowsimagesreleasedofthesuspects,acquiredattheevent,whilethebottomrowshowssomematchingimagesofthesuspectsfromsocialmediaetc...............................6Figure1.3VproblemGivenanenrolledidentity,theproblemistodecideifagivenqueryfacematchesthatidentity.Therearetwounder-lyingcases,whentheinputfacedoesmatch(genuine),andwhentheinputfacedoesnotmatch(impostor).Sincethedecisionisabinaryaccept/reject,therearefourpossibleoutcomes.Correctdecisionsaremadewhenagenuinequeryimageiscorrectlymatched(trueaccept),andwhenandimpostorfaceiscorrectlyrejected(truereject),whileerrorsaremadewhenanimpostorfaceisreject(falseaccept),orwhenagenuinefaceisrejected(falsereject).8Figure1.4Idenproblemoverview.Givenagalleryofenrolledidentities,undertheclosed-setscenario(a),wherethequeryimageisguaranteedtomatchofthegalleryidentities,evaluationistypicallydoneintermsofrankingtheidentitiesinthegallery,andreportingtherankatwhichacorrectmatchismade(arankonematchina)above).Undertheopen-setscenario,thequeryimagemaynotbeinthegalleryatall,asinexampleb)........10Figure1.5Retrievalproblemoverview.Givenagalleryofimages,notnecessarilygroupedbyidentity,allimagesinthegalleryareranked,suchthatthelistofthektop-rankingitemsshouldcontainasmanyrelevant(sameidentity)imagesaspossibleforeachquery........................12Figure1.6Clusteringbyidentityoverview.Givenasetofunorganized,unknownidentityimages(a),automaticallygroupthemintoasetofsingle-identityclusters(b).Threeclustersbyidentityareshownin(b)fortheimagesin(a).13xFigure1.7Diagramofagenericfacerecognitionpipeline..............14Figure1.8ExamplefacedetectionsfromtheOpenCVimplementationoftheViola-Jonesfacedetector.............................15Figure1.9ExampleimagesafterapplyingtheTanandTriggspre-processingmethod,correspondingoriginalimagesshownin1.8..............16Figure1.10Exampleimagesunder2Dalignmentbasedondetectedeyelocations.17Figure1.11ExampleofgenuinematchpairswithlargeagegapsfromthePCSOdatabase.Thetwoimagesofsubjecta)exhibita14-yearagegap,whilethetwoimagesofsubjectb)exhibita13-yearagegap.Neitherpaircanbesuccessfullymatchedbyaleadingcommercialmatcherat0:1%FAR.....21Figure2.1Proposeddeepconvolutionalneuralnetwork(ConvNet)........33Figure2.2Afaceimagealignmentexample.Theoriginalimageisshownin(a);(b)showsthe68landmarkpointsdetectedbythemethodin[44],and(c)isthealignedfaceimage,wherethebluecirclewasusedtocenterthefaceimagealongthex-axis,andtheredcirclesdenotethetwopointsusedforfacecropping.......................................34Figure2.3Examplesoffaceimagesfromthesixfacedatabasesusedinsubsequentexperiments.....................................36Figure2.4ExamplesoffalsepositivesamplesdetectedbyDLIBfacedetectorinthe123MWebfacedatabase.(a)non-faceimages,(b)non-humanfaceimages.39Figure2.5Examplecorrectvdecisionsat0:1%FARonfold1oftheBLUFRprotocol,usingtheproposedDeepModel.(a)4trueacceptpairs,(b)4truerejectpairs...............................43Figure2.6Exampleincorrectvdecisionsat0:1%FARonfold1oftheBLUFRprotocol,proposedDeepModel.(a)4falseacceptpairs,(b)4falserejectpairs.....................................44Figure2.7ExamplesofwebimagesintheIJB-Adatabasewithoverlayedland-marks(toprow),andthecorrespondingalignedfaceimages(bottomrow);(a)exampleofawell-alignedimageobtainedusingautomaticallydetectedland-marksbyDLIB[44];(b),(c),and(d)examplesofpoorly-alignedimageswith3,2,and0ground-truthlandmarksprovidedinIJB-A,respectively.DLIBfailstooutputlandmarksfor(b)-(d).Theimagesinthetoprowhavebeencroppedaroundtherelevantfaceregionsfromtheoriginalimages.......47xiFigure2.8ExamplesoffacesearchinthefoldoftheIJB-Aclosed-setsearchprotocol,using\templates."Thecolumncontainstheprobetemplates,andthefollowing5columnscontainthecorrespondingtop-4rankedgallerytemplates,whereredtexthighlightsthecorrectmatedgallerytemplateandthenumberoffacesinthecorrespondingtemplateisdenotedwith#.Thereare112gallerytemplatesintotal;onlyasubset(atmostfour)oftheimagesforeachtemplateareshown............................50Figure2.9Distributionofwell-alignedtemplatesandpoorly-alignedtemplatesin1:NsearchprotocolofIJB-A,averagedover10folds.CorrectMatch@Rank-1meansthatthematedgallerytemplateiscorrectlyretrievedatrank1.Landmarksinwell-alignedimagescanbeautomaticallydetectedbyDLIB[44].Poorly-alignedimagesmainlyconsistofviewsoffaces.Wealigntheseimagesusingthethreeground-truthlandmarkswhenavailable,orelsebycroppingtheentirefaceregion..........................51Figure3.1Outlineofthelarge-scalefacesearchproblem..............54Figure3.2Illustrationoftheproposedlarge-scalefacesearchsystem.......59Figure3.3(a)Impactofcandidatesetsize(k)asafunctionofthegallerysize(N)onthesearchperformanceasmeasuredintermsofMeanAveragePrecision(mAP).RedpointsmarktheoptimalvalueofkfortvaluesofN.(b)Comparisonoffusionstrategiesbasedona1Mgalleryand3Kprobes.....63Figure3.4Distributionsofcosinesimilaritiesofthegenuinepairs,within-databaseimpostorpairsandbetween-databaseimpostorpairsforthecombinationsofLFW+Web-Face(left)andIJB-A+Web-Face(right)..............67Figure3.5egallerysizefortheaugmentedLFW+Web-Facedatabase.egallerycomputedforaFRRbycountingthenumberofcross-databaseimpostorsoverthreshold,andestimatingthesizeofdatabasefollow-ingtheobservedwithin-databaseimpostordistributionrequiredtoproduceasimilarnumberoferrors..............................68Figure3.6Exampletop-10searchresultsforLFWimages.Probeimagesareontheleft,thetop-10retrievalresultsareshownin2columnsontheright.(a)arelativelygoodquality(frontal)imageofDonaldRumsfeld,mostofthetop-10areofthecorrectidentity.(b)animageofDonaldRumsfeldexhibitingpartialocclusion,andposevariation.Onlyonecorrectidentityphotographisinthetop-10,howeveracaricatureofDonaldRumsfeldisalsoreturned.(c)AprobeimageofCondoleezzaRice,alltop-10resultsareofthecorrectidentity.(d)AprobeimageofGeorgeH.W.Bush,6ofthetop-10retrievalresultsareofthecorrectidentity.........................71xiiFigure3.7Closed-setfacesearchperformance(mAP)vs.gallerysizeN(log-scale),onLFWandIJB-Adatabases.TheperformanceofCOTSmatcheron80Mgalleryisnotshown,sinceenrollingthecomplete80MgallerywiththeCOTSmatcherwouldhavetakenaprohibitiveamountoftime(over80days).72Figure3.8Open-setfacesearchperformance(mAP)vs.falseacceptrate(FAR)onLFWandIJB-Adatabases,usingthe80Mfacegallery.TheperformanceofCOTSmatcherisnotshownduetocomputationalissues.FARisshownonlyupto10%sinceoperationalsystemsarenotlikelytooperatebeyondthisvalue........................................74Figure3.9ProbeandgalleryimagesofDzhokharTsarnaevandTamerlanTsar-naev,responsiblefortheApril15,2013Bostonmarathonbombing.Theseimageswerepreviouslystudiedin[49]and[4].Faceimages1aand1barethetwoprobeimagesusedforSuspect1(TamerlanTsarnaev),and1cishissketchimagedrawnbyaforensicsketchartistbasedon1aand1b.Faceim-ages2a,2band2carethethreeprobeimagesusedforSuspect2(DzhokharTsarnaev).Thegalleryimagesofthetwosuspectsbecameavailableonmediawebsitesfollowingtheidenofthetwosuspectsbasedoninvestigativeleads.Faceimages1x,1yand1zarethethreegalleryimagesforSuspect1andimages2x,2yand2zarethethreegalleryimagesforSuspect2......76Figure3.10Top10searchresultsforthetwoBostonmarathonbombersonthe80Mfacegallery.Thethreeprobefaces(1cisasketch)areoftheolderbrother(TamerlanTsarnaev)andthelastthreeprobefacesareoftheyoungerbrother(DzhokharTsarnaev).Foreachprobeface,theretrievedgalleryimagewithgreenborderisthecorrectlyretrievedimage.Imageswiththeredborderare\near-duplicate"imagespresentinthegallery.Notethatwewerenotawareoftheexistenceofthesenear-duplicateimagesinthe80Mgallerybeforethesearch........................................79Figure4.1Givenanunlabeledsetoffaceimagesacquirede.g.fromsocialmediaorinthecourseofaforensicinvestigation,wepropose\clusteringbyidentity"asastepinexploringandunderstandingthedatabase.Faceimagesherebelongtotwoindividuals:GeorgeW.Bush(W)andGeorgeH.W.Bush(G).81Figure4.2Diagramofapossibleclusteringusedtoillustrateevalu-ationmetrics.Sixsamplesarepartitionedinto2clusters;A1,A2,andA3arelabeledwiththesameidentity,sampleB1islabeledwithatidentity,andsamplesU1andU2areunlabeled......................89xiiiFigure4.3ApproximateRank-Orderclustering.Givenasetofunlabeledfaceimages(a),nearestneighborlistsarecomputedforeachimage(b);nearestneighborlistsarethenusedtocomputedistancesbetweenfaces(c).(b)showsthenearestneighborlistsofonlyefacesin(a).dm(1;2)(Eq.4.5)istheasymmetricdistancebetweenfaces1and2whereasDm(1;2)(Eq.4.6)isthesymmetricdistancebetweenfaces1and2....................93Figure4.4Two-dimensionalt-SNE[90]embeddingof320-dimensionaldeepfea-turesfortheLFWdatabase,includingonlyclustersfromtheproposedclus-teringwithtwoormoreimages.Linesaredrawnbetweenallsame-clusterfaces.........................................96Figure4.5NumberofclustersandcorrespondingF-measurefortthresholdvaluesusedwiththeproposedapproximaterank-orderclusteringmethodontheentireLFWdatabase.............................99Figure4.6Examplesof\pure"(singleindividual)clusters(a,b),and\impure"(multipleindividuals)clusters(c,d)generatedbytheproposedApproximateRank-OrderclusteringontheentireLFWdatabase.Facesnotbelongingtothemajorityidentityineachclusterareoutlinedinred.............100Figure4.7NumbersoftimeseachfaceintheLFWdatabaseappearedinanyotherface'stop-200nearestneighborlistfora)theexactnearestneighbors,andb)thenearestneighborscomputedviarandomizedk-dtreeapproximation...102Figure4.8Exampleimagepairsat5std.dev.belowthemeangenuinescoreonLFW.(a)agenuineLFWmatch,(b),animpostorLFWmatch,(c)anearduplicateinthebackgrounddatabase,and(d)poorqualityimagesinthebackgrounddatabase.............................107Figure4.9ExampleimagesfromclustersgeneratedfromtheYTFdatabase.a)showstwoclusters,eachcontainingframesfromonevideoofthesamesub-ject,b)showsaclustercontainingframesfromtwovideosofthesamesubject,wherethebackgroundforthevideoisapparentlyidentical,c)shows28identi-tieswhichwereincorrectlygroupedintoasinglecluster;manyoftheseimagesarepoorlylit...................................110Figure4.10Pairwiseprecisionvs.theproposedinternalqualitymeasure,forallclustersgeneratedbytheproposedApproximateRank-Orderclusteringal-gorithmonthefullLFWdatabase.Pointsinblueareclustersofsize3orlarger,pointsinredareofsize2.Thehighlightedsetofpointsontheleftedgeoftheareallofsize2,withzeropairwiseprecision.Sincewecan'treliablydistinguishbetweengoodandbad2-itemclusters,wediscardthemfromconsideration................................113xivFigure4.11Thebluelinedisplaystheaveragepairwiseprecisionforlistsofclustersorderedbytheproposedinternalclusterqualitymeasure,terminatedateachpossiblerank,whiletheredlinedisplaystheaveragepairwiseprecisionforalistorderedbythetrueprecisionofeachcluster(andrepresentsthebestpossibleperformanceonthistask).Thehorizontalblacklineindicatestheunweightedaverageprecisionofallclustersconsidered.ClustersaregeneratedbytheproposedApproximateRank-OrderclusteringalgorithmfromthefullLFWdatabase...................................114Figure4.12Averagepairwiseprecisionatrank,orderedbytheproposedinternalclusterqualitymeasureforaugmenteddatabases.ClustersaregeneratedbytheproposedApproximateRank-Orderclusteringalgorithm..........117Figure4.13Top-5rankedclustersfortheLFW,LFW+10M,LFW+123M,anddeduplicateddatabases.FortheLFW+10M,andLFW+123Mdatabases,boththeabsolutetop-5rankingclustersintermsoftheproposedqualitymeasure,andthetop-5rankingclustersoutofthoseclusterscontainingatleastsomeLFWimagesareshown.UnlabeledimagesgroupedinwiththeLFWimagesareoutlinedinred..........................118Figure5.1ExampleimagesfromtheVGG-Facedatabase..............120Figure5.2AnexampleimagelistedtwiceintheVGG-Facedatabase,withtwotfaceboundingboxes(onecorrectfacedetection,andonefalseposi-tive),bothlabeledasthesamesubject(WayneRooney)...........121Figure5.3Twotypesofunlabeledduplicatedetectedinthewedown-loaded.(a)placeholderimages,servedbysomeimagehostsintheplaceofbrokenlinks,and(b)exactduplicatecopiesofanimageuploadedtotwotURLs,labeledastwotsubjectsintheVGG-Facelists..122Figure5.4Threeimages,labeledas\JustinKirk."Theleftmostimageistheactorinquestion,themiddleimagecontainssomeincorrectlylabeledchildren,andtherightmostimageisnalcharacter\JamesKirk,"playedbyWilliamShatner.......................................122Figure5.5Thetoptwohighestranking(a)images,andlowest-ranking(b)imagesforonesubject(JuliaStiles),intheVGG-Facedatabase,accordingtotheapproximatequalityrankingprovidedwiththedatabase............123Figure5.6ResidualNetworkblockstructure.(a)originalversion[31],(b)fullpre-activationversion[32].............................126Figure5.7DeduplicatedVGGdatabaseimages/subjectdistribution........129xvFigure5.8BLUFRROCCurves...........................130Figure5.9Examplecorrectvdecisionsat0:1%FARonfold1oftheBLUFRprotocolmadebyour50-LayerDeepresidualNetwork,thatweremadeincorrectlybyourpreviousDeepModel.(a)4genuinepairsacceptedunderthenewmodel,rejectedundertheoldmodel,(b)4impostorpairsrejectedunderthenewmodelthatwereacceptedbytheoldmodel......132Figure5.10Exampleincorrectvdecisionsat0:1%FARonfold1oftheBLUFRprotocolmadebyour50-LayerDeepresidualNetwork,thatweremadecorrectlybyourpreviousDeepModel.(a)4genuinepairsrejectedunderthenewmodel,thatwererejectedundertheoldmodel,(b)4impostorpairsacceptedunderthenewmodel,thatwererejectedundertheoldmodel.133xviChapter1IntroductionTherehasbeenagreatdealofinterestinautomaticfacerecognitionasaresearchtopic,fordecades.TurkandPentland'sseminalEigenfacespaper[89]waspublished25yearsago.Althoughtprogresshasbeenmadeinautomaticfacerecognition,andthevolumeofresearchpublishedhasbeenconsiderable,manychallengesremain.Overtheyears,increasinglyultfacerecognitionproblemshavebeentackled,incorporatinglargerdegreesofvarianceinpose,lighting,andexpression(Figure1.1showssomeexamplesofprogressivelymoredfacerecognitiondatabases).Thesevariations(commonlyreferredtoasPIE:PoseIlluminationandExpression)causeahighdegreeofvariabilityinimagesofagivenindividual(referredtoasintra-personvariability).Soalthoughwewouldliketobeabletosaythateveryfaceimagetakenofagivenpersonisinfactofthatperson,dependingonthedegreeofvariationalloweditcanstillbeatask.Inadditiontothegeneraltrendofconsideringmorechallengingdata,thedegreeofautomationintherecognitionprogresshasalsogenerallyincreasedovertime.Inmanyearlyworks,facelocationsweremanuallymarked,orevenaligned(e.g.in[89]).However,givenadvancesinautomaticfacedetection,andfaciallandmarkdetectionfullyautomaticmethodshavebecomecommonplace.Themainfocusofearlyresearchonfacerecognitionwasdatacollectedfromcooperativesubjects,withimageacquisitionconditionsthatlimitedintra-personvariability(e.g.aco-1Figure1.1Progressioninfacerecognitiondatabasety.a)laboratorycollectedimagesfromtheFERETdatabase,b)mugshotimagesfromthePinellasCounty's(PCSO)database,c)imagesfromtheLFWdatabase(unconstrained,butallfacesareau-tomaticallydetectable),d)imagesfromtheIJB-Adatabase(imagesaremanuallylocalized,notallfacescanbedetectedautomaticallywithcurrentmethods).Ina),conditionsarerelativelycontrolled,withoutmuchtimebetweenacquisitions,inb),pose,expression,andilluminationarerelativelycontrolled,butvariationinsubjecthairstyle,andgeneralchangeinappearanceduetoagingoccur.Inc),somepose,occlusion,andunevenilluminationoccur,andind)extremevariationsinposecanoccur.operativesubjectcanbeinstructedtolookdirectlyatthecamera,limitingposevariation)|commonlyreferredtoas\constrained"facerecognitionscenarios.Earlyathandlingconstrainedfacerecognitionproblemsoftenusedlaboratorycollecteddata(e.g.theFERETdatabase[71],someexamplesofwhichareshowninFigure1.1);however,therearepracticalapplicationsinwhichconstrainedimagerynaturallyoccurs.Forexample,whenmugshotimagesaretaken,thesubjectcanbeinstructedtolookdirectlyatthecamera(althoughsubjectswillnotalwaysbeperfectlycompliant),andlightingconditionscanalsobecon-trolled.SomeexamplemugshotimagesfromthePinellasCounty's(PCSO)2database,illustratingthatsomedegreeofappearancevariationisneverthelesspresentforsubjectseveninpracticalconstrainedscenarios.Inparticular,appearancevariationduetosubjectagingwilloccurifonepersonisarrestedseveraltimesoveraperiodofyears.Thisposesanadditionaldimensionofyforfacerecognition,comparedtolaboratorycollecteddata(whichtypicallycoversarelativelysmallperiodoftime).TheNationalIn-stituteofStandardsandTechnology(NIST)'sFaceRecognitionVendorTest(FRVT)[27]isaseriesofevaluationsofcommercialandacademicfacerecognitionsystems,onpracticalconstraineddatabases(includinglaw-enforcementandvisaimagery),andservesasperhapsthebestmethodforevaluatingcurrentfacerecognitionperformanceontheseproblems,andcommercialproductshaveattainedahighdegreeofsuccessintheseevaluations.Recentacademicworkonfacerecognitionhasmostlyfocusedonhandlingso-called\un-constrained"scenarios,asbythewellknownLabeledFacesintheWild(LFW)database[35].ThisdatabaseconsistsoffaceimagesofcelebritiesandpublicwhichwereharvestedfromtheInternet,andtocontainonlyimageswithfacesdetectablebytheViola-Jonesfacedetector[92].Overthepastnineyears,recognitionperformanceonthestandardLFWevaluationprotocolhasincreaseddrasticallyparticularlyfollowingtheapplicationofdeeplearningmethodstofacerecognition[84].Atthispointthatoverallclas-accuracies(considering1:1matchingtobeabinaryproblem)ofover99%arecommonplace1;however,thedegreeofrecognitionaccuracyobservedinpracticalapplicationswilldependonthecharacteristicsofthedataobserved.Thedegreeofvaria-tionpresentinLFWislimited,duetoboththerestrictiontoonlycontainautomaticallydetectablefaces,andthetypeofsubjectsselected.LFWisprimarilycomposedofpublicandmanyimagesinthedatabaseconsistofpressoreventphotography,whichmayhavetcharacteristicsthane.g.typicalpersonalphotocollections.Withthoselimitationsinmind,andgiventhatahighdegreeofsuccessonLFWhasbeenachieved,oneobviouslineofresearchistoconsidermorefacerecognitionproblems.1http://vis-www.cs.umass.edu/lfw/results.html#UnrestrictedLb3ThisresultedinanewfacerecognitioninitiativebyIARPA,calledJanus2,withthegoalofdevelopingmoreltfacerecognitiondatabasestotheneedsoftheintelligencecommunity.TheIJB-Adatabase[48],collectedundertheJanusprogram,containsfaceswhicharenotevendetectablebystandardfacedetectors,andahigherdegreeofdemographicvariabilitythanLFW.However,evenconsideringjustLFW-styledata,handlinglarger-scalefacerecognitionproblemsisanotherpotentialavenueofresearch.ThestandardLFWevaluationprotocolislimitedinthatitonlyconsidersv(1:1comparison)performance,andeventhatatonlyasmallscale(theevaluationprotocolconsistingof10-foldcrossvalidation,withjust300positiveand300negativecomparisonsperfold).So,evenhighaccuracyonthestandardLFWprotocoldoesnotguaranteethatthesamelevelofperformancewillremainin1:NcomparisonproblemswithlargeN,oreveninvtionproblemsinvolvingalargernumberofidentities.Large-scaleproblemspresentobviouschallengesintermsofmanagingscalabilityinrun-timeandmemoryuse,butalsointermsofaccuracy.Consideringthesearchscenario(whereweattempttoreturnarelevantsetofimagesfromsomedatabaseinresponsetoasinglequery),alargerdatabaseprovides1)moreopportunitiestomatchesforthequeryfaceimage(\wolves"inDoddington'sbiometriczoo[21]),and2)inaretrievalproblem,theremaybemoregenuinematchestobefoundforeachqueryinalargerdatabase.Forthesereasons,maintainingasimilarlevelofaccuracyinsearchproblemswithlargerandlargergalleriescanbeexpectedtorequireanincreasinglevelof1:1comparisonperformance.Inadditiontolargescalesearchproblems,wecanalsoconsiderttasks,forexample,clusteringunlabeledfaceimagesbyidentityatalargescale.Ifwecanautomaticallydecidewhetherornottwofacesarethesamealmostperfectly,thenautomaticallygroupingthemintoidentitiesshouldalsobefeasible.Whilevariousresearchdirectionsarepossible,certainsocietaltrendsgivelarge-scalefacerecognitionproblemsahighdegreeofrelevance.Inparticular,smartphoneadoptionhas2https://www.iarpa.gov/index.php/research-programs/janus4reachedhighlevels{accordingtoPewResearch,USsmartphoneadoptionreached64%in20153.Oneimplicationofthistrendisthatratherthanhavingtoexplicitlychoosetocarryacamera,almosttwothirdsofpeoplehaveaprettygoodcameraontheirpersoneveryday,intheformoftheirphone.Simultaneously,socialmediausagehasbecomequiteprevalent,withmillionsofuserspostingregularlytoFacebook,andTwitteraccounts.Onecommonactionissharingpersonalphotographs,forexampleFacebookreportedupthatusersuploaded350millionphotographsaday4in2013.ely,smartphonesprovidethemeansforadrasticincreaseinthevolumeofcasualphotography,andsocialmediaprovidesthemotivationformillionsofpeopletotakeandsharehundredsofmillionsofphotographs.Thesetrendshavetimplicationsfortherelevanceoffacerecognitionresearchtosociety.Therearesomeverypracticalpotentialapplicationsoffacerecognitioninthelawenforcementorforensicsdomains.Theprevalenceofsmartphonecameras(alongwiththecontinuedadoptionofsurveillancecamerasforsecuritypurposes)meansthatitisincreas-inglylikelythatwhencrimesoccurtherewillbeeithersurveillanceimagery,orincidentalpersonalphotographsoftheincident.InthecaseofhighcrimesliketheBostonMarathonbombing5,theremayevenbeahugevolumeofimageryavailable,tothepointwheremanuallyinvestigatingitisadauntingtask.Figure1.2showsinthetoprow,im-agesofthetwoBostonMarathonBombingsuspectsfromtheevent,andinthebottomrowarchivalimagesofthesuspectsthatwerepostedonsocialmedia(imagesoriginallycollectedin[49]).Thetoprowimagespresentsevereforfacerecognition,duetolowimageresolution,occlusion(e.g.fromsunglasses),andposevariation.Asidefromlaw-enforcement,socialmediaalsomotivateslargescalefacerecognitionap-plications,intheformofconvenienceapplicationsforusers.Forexample,whenauseruploadsaphotograph,possibleidentitiesofpeopleinthepicturecanbeautomaticallysug-gested(whichmaybealarge-scalesearchapplication).Additionally,giventhehugevolume3http://www.pewinternet.org/2015/04/01/us-smartphone-use-in-2015/4http://www.businessinsider.com/facebook-350-million-photos-each-day-2013-95http://wapo.st/1WiH6TX5Figure1.2ImagesoftheBostonMarathonBombingsuspectsa),andb),studiedin[49]and[4].Thetoprowshowsimagesreleasedofthesuspects,acquiredattheevent,whilethebottomrowshowssomematchingimagesofthesuspectsfromsocialmediaetc.ofimageryuploadedtotheseservices,clusteringthedetectedfacesmaygivepossibleinsights(suchassuggestingpossiblefriendstousersetc.).Themainfocusofthisthesiswillbelargescalefacerecognitionapplications,splarge-scalefacesearch,andlarge-scalefaceclustering.However,performanceonthesetasks(intermsofaccuracy)isverystronglybytherobustnessandsaliencyoftheun-derlyingfacerepresentation.Toattainalevelofperformancettoachievereasonableaccuracyondatabasesontheorderof100millionfaceimages,wefollowtherecenttrendofemployingdeepconvolutionalneuralnetworkstolearnafacerepresentation.Improvingthediscriminativeabilityoftheunderlyingfacerepresentationinthepresenceoflargeintra-personvariationsisakeystepinimprovingoverallperformanceonlarge-scalesearchandclusteringtasks.1.1FaceRecognitionBackgroundTopresentpriorworkonfacerecognitioninbrief,wewilldiscusscommonfacerecog-nitionproblems(andmethodsfortheirperformanceevaluation),thengiveanoutlineofa6generalautomaticfacerecognitionpipeline(includingexamplesofnotablemethodsthatfallineachstepofthepipeline),andgiveanoverviewofhistoricalprogressinhandlingmoreandmorefacerecognitionscenarios.1.1.1FaceRecognitionProblemsTheclassicoperationalmodesconsideredundertheoverallsubjectoffacerecognitioncanbebrokendowninto1:1matching(vproblems,whichdecidewhetherornotagivenfaceimagebelongstoaclaimedidentity),and1:Nmatchingproblems(sometimesreferredtoas\search"problems,whereagivenfaceimageismatchedagainstanentirepre-enrolledgallery).Giventhesecategories,theclassicexampleofasearchproblemistheidentask,decidingwhichofasetofpre-enrolledidentitiesagivenqueryimagecorrespondsto(whichistypicallycarriedoutbyexhaustivelymatchingthegivenqueryagainsttheentiregallery).Idenioncanfurtherbebrokendowninto\closed-set"and\open-set"tasks.Closed-setidenassumesthatthequeryimagepresentedtothesystemmustcorrespondtooneoftheidentitiesenrolledinthegallery,whileinopen-setidenitisalsoconsideredpossibleforthequeryimagetobeofanunknownperson.Open-setidenisthereforeamorepracticalproblem,becauseitistypicallyinfeasibletoensurethateveryfacethatcouldpossiblybepresentedtoafacerecognitionsystemisenrolledbeforehand.Thegeneralimageretrievalproblemofreturningallrelevantimagesforagivenquery(withrelevancedasfaceimageswithmatchingidentities)isalsoofinterestine.g.image-searchapplications,whereausermaywishtoseeallavailableimagesofthepersoninagivenquery,andthiscanalsobeconsideredinclosed-set,andopen-setmodes.Generallyspeaking,whethertheendgoalisvsearch,orevenclustering(orga-nizinganunlabeleddatabasebyidentity)applications,thestepistoderiveadiscrim-inativerepresentation,typicallyintheformofa\featurevector,"alengthnumericalrepresentationdescribingagivenface.Theproblemofderivinganappropriaterepresenta-7tionisquitechallenginginthecaseoffacerecognition,sincetimagesofthesamepersoncanappeardrasticallyt,dependingonavarietyoffactorsrelatedtoimageacquisitionaswellasthesubject'sinteractionwiththecamera.Forexample,andfrontalimagesofthesameperson'sfaceappeardrasticallytwhenconsideredasanimage,eveniftheyareultimatelyjusttviewsofthesameperson'sface.So,anappropriatefacerepresentationshouldinsomewaybeonewiththepropertythatextractedrepresentationsofthesameperson'sfaceimagesarealwayssimilar(undersomeappropriatemetric),butalsotherepresentationsextractedfromtpeople'sfacesshouldbeentagainregardlessofanychangesimagingconditions.Inotherwords,twoimagesoftpeopleshouldresultinquitetextractedrepresentations(exhibithighinter-subjectvariation),whileaandfrontalimageofthesamepersonshouldexhibitsmallintra-subjectvariation.1.1.1.1VFigure1.3VproblemGivenanenrolledidentity,theproblemistodecideifagivenqueryfacematchesthatidentity.Therearetwounderlyingcases,whentheinputfacedoesmatch(genuine),andwhentheinputfacedoesnotmatch(impostor).Sincethedecisionisabinaryaccept/reject,therearefourpossibleoutcomes.Correctdecisionsaremadewhenagenuinequeryimageiscorrectlymatched(trueaccept),andwhenandimpostorfaceiscorrectlyrejected(truereject),whileerrorsaremadewhenanimpostorfaceisreject(falseaccept),orwhenagenuinefaceisrejected(falsereject).VisfundamentallyabinaryonproblemThegoalistodecidewhether8theinputimage(sometimesreferredtoasthe\probe"image)matchesitsclaimedidentity(thisiscommonine.g.authenticationapplications,suchasunlockingaphone),ordoesnot.Figure1.3outlinesthevproblem.Theproblemistoclassifythegivenimageaseithermatchingtheclaimedidentity(a\genuine"match),ornotmatching(an\impostor"match).Assuch,theactionwhichisultimatelytakenistomakea\yes"or\no"decisiononwhetherornotthegivenimageisoftheclaimedidentity(genuine).Therearetwopossibleerrortypes,aimagewhichisinrealitygenuinecanberejectedbythesystem,oranimagewhichisinrealityanimpostorcanbeacceptedbythesystem.Theseerrorsarereferredtoas\falserejects"and\falseaccepts."Theperformanceofagivenvsystemwillalwaysabetweenthesetwoerrors.Thetypicalwayinwhichavsystemactuallyfunctions,istoextractfeaturevectorsforboththeprobeimage,andanimagematchingitsclaimedidentity,thencomputeasimilaritymeasurebetweenthesetworepresentations(usingadissimilaritymeasureisfunctionallyequivalent),andmakethedecisiontoacceptorrejectthematchbycomparingthecomputedsimilaritymeasuretoaknownthreshold.Thethresholdusedistypicallyderivedbycomputingasetofknowngenuinematchscores,andasetofknownimpostormatchscores.Basedonthesescores,theFalseAcceptRate(FAR)andFalseRejectRate(FRR)forthedatabaseataspthresholdcanbecomputed;thethresholdcorrespondingtothedesiredoperatingpointisselected,ortheoverallrangeofpossibleoperatingpointscanbeshownbyplottingFARandFRRagainsteach-otherdirectlyinaReceiverOperatingCharacteristic(ROC)curve.1.1.1.2SearchAspreviouslydiscussed,iden(outlinedinFigure1.4)istheproblemofdecidingwhichidentityinadatabaseofknownidentitiesagivenqueryfaceimagecorrespondsto(intheopen-setcase,thequerymaynotmatchanypreviouslyenrolledidentity).Inonesense,theidenproblemisentwinedwiththevproblem.Ifwehadaperfectmethodforperformingvcation,idenagainstanN-itemgallerycouldbecarried9Figure1.4Idenproblemoverview.Givenagalleryofenrolledidentities,undertheclosed-setscenario(a),wherethequeryimageisguaranteedtomatchofthegalleryidentities,evaluationistypicallydoneintermsofrankingtheidentitiesinthegallery,andreportingtherankatwhichacorrectmatchismade(arankonematchina)above).Undertheopen-setscenario,thequeryimagemaynotbeinthegalleryatall,asinexampleb).10outbysimplymakingNvdecisions,andreportingtheonematchasthecorrectidentityfortheprobeimage(orreportingthatnomatchexists,intheopen-setcase).Inpractice,attainingperfectvaccuracyisinfeasible(duemainlytotheunderlyingyinhandlingthelargeintra-subjectvariationpresentinfaceimages).Typically,idenisperformedbycomparingthequeryimageagainstallback-groundimages,rankingthebackgroundimages,andreportingtheidentityofthesubjectasthehighest-scoring(rankone)backgroundimageinthedatabase.Intheclosed-setscenario,noglobalthresholdisimposedonscores,sotheremaybecaseswhereagivenprobeimagescoresrelativelylowagainstallgalleryimages(comparedtotheoveralldistributionofmatchscoresforthepopulation),butthegalleryimagecorrespondingtoitsgenuineidentitystillscoresbetterthantheimpostorimages.Typically,closed-setidenperformanceissummarizedbyaCumulativeMatchCharacteristic(CMC)curve,whichplotsrankonthex-axisvs.thefractionofqueryimagesforwhichatleastonecorrectmatchiswithinthesprankonthey-axis.Ratherthanconsideringjusttheperformanceofthesinglebest-guesspossibleforthefacerecognitionsystem,rankshigherthanonerepresentthechancethatthesystemcanatleastreturnacorrectmatchwithinthetopseveralresults.Handlingthemorepracticalopen-setidentaskrequiresaccountingforthepossibilitythatthequeryimagemaynothaveamatchinthegalleryatall.Thismakesopen-setidenti-amoreproblem,andtypicallyrequiresthere-introductionofsomeformofglobalmatchscorethreshold,toallowthesystemtosaythatthegivenquerydoesn'tmatchanythingatall(ratherthanjustreportwhicheverimagesscorerelativelyhigh).Onecommonevaluationmethodistoplotafalse-acceptrateonthex-axis(representingthefractionofqueryimagewithnomatethatwerenotrejectedbythesystem)vs.a\detectionandidenrateonthey-axis,whereDIRisthefractionofqueryimageswithamateinthegallerywhichwerenotrejectedbythesystem,andforwhichthesystemalsoreturnedacorrectmatchwithinsomesprank(sowecouldplotaDIRcurveforrank-1idenperformance,orrank-10idencationperformance).11Figure1.5Retrievalproblemoverview.Givenagalleryofimages,notnecessarilygroupedbyidentity,allimagesinthegalleryareranked,suchthatthelistofthektop-rankingitemsshouldcontainasmanyrelevant(sameidentity)imagesaspossibleforeachquery.Inadditiontoidenwecanalsoconsidertherelatedimageretrievalproblemasanothercommonformofsearch(outlinedinFigure1.5),wheretheobjectiveisnottodecideaidentityforagivenqueryimage,butratherreturnasmanymatchingimagesaspossibleforagivenquery.Sotheproblemistoconsideraless-structuredgallery,onewhichiseitherintentionallynotorganizedintoidentities,oroneforwhichtheidentityinformationisunknown(inpractice).Theobviousapplicationforthisproblemisafaceimagesearchengine,whichcouldreturnallsimilarlooking(orevensameidentity)facesforagivenquery.Themostcommonevaluationmethodforthesekindofsearchproblemisintermsofprecision/recall.Precisionisthefractionofareturnedsetofresultswhichare\relevant"accordingtosomecriterion,whilerecallisthefractionoftotalrelevantresultswhicharereturned.Directevaluationofprecisionandrecallrequiresasizesetofreturnedresults(i.e.weassumethesystemreturnsanumberofresultsperquery).AmoregeneralmethodofevaluationisbasedonAveragePrecision(AP),whereforagivenqueryimage,werankallimagesinthegallery,andcomputetheaverageprecisionforeachpossiblelistlength(i.e.precisionforlistlength1,2,100etc.),soaverageprecisionsummarizessearchperformanceacrossavarietyoflistlengths.Foroveralldatabaseevaluation,anunweightedmeaniscomputedacrossqueryimagestogivetheso-calledmeanAveragePrecision(mAP)onthedatabase.mAPmayalsobeadaptedtotheopen-setscenariosimilarlytotheDIRcurve,byintroducingaglobalscorethreshold(torejectnon-matchingqueryimages),and12varyingthethresholdtoplotthefalseacceptratevs.themAPandacceptancerate.Figure1.6Clusteringbyidentityoverview.Givenasetofunorganized,unknownidentityimages(a),automaticallygroupthemintoasetofsingle-identityclusters(b).Threeclustersbyidentityareshownin(b)fortheimagesin(a).1.1.1.3ClusteringbyIdentityClusteringisthegeneralproblemoforganizinganunlabeleddatabaseintogroups(\clus-ters")whichareinsomewaymeaningful.Therearemanypossiblemethodsforclustering,andtclusteringsofaparticulardatabasemayhavet,butequallymeaningfulinterpretations.Forfacerecognition,oneobviouslymeaningfulgroupingistoorganizeanunlabeleddatabaseintodiscreteidentities(wecouldalsoconsidere.g.clusterscorrespond-ingtodemographicgroupsorposevariations,butthisworkprimarilyconsidersclusteringbyidentity).ThisproblemisoutlinedinFigure1.6andcanbeconsideredintermsofitsrelationshiptovIfwecouldperfectlydecidewhetherornoteachpairoffaceimagesinagivendatabasecorrespondedtothesameidentity,wecouldautomaticallygroupthemtogether.Ofcourse,thismeansconsideringO(n2)totalbinaryvdecisions,sothiskindofapproachismorethanidenorsearch.Variousmethodsfor13evaluationclusteringperformancearepossible;ifgroundtruthclusterlabelsareavailable,onecommonmethodistoconsiderthefullsetofpairwisecomparisons(aspreviouslydis-cussed),andcomputeprecisionandrecallmeasuresforperformanceintermsofthesepairwisecomparisons.So,pairwiseprecisionisthefractionofpairswithinaclustercorrespondingtothesameidentity,andpairwiserecallisthefractionofpairswithinanidentityplacedinthesamecluster.1.1.2FaceRecognitionPipelineFigure1.7Diagramofagenericfacerecognitionpipeline.Facerecognitionapplicationstypicallyshareacommonprocessofderivinganappropriaterepresentationfromaninputfaceimage.Figure1.7presentsagenericformulationofsuchaprocess:inthefacedetectionstage,aregioncontainingafaceislocalizedfromtheinputimage;inpreprocessing,variousoperationsmaybeappliedtoimprovefeatureinvariance(suchasilluminationnormalization),ormakesubsequentstageseasier;infacealignment,faceshapeiscorrectedinsomeway,typicallyregisteringthedetectedfacetosomestandardscale,andorientation;infeatureextraction,descriptivefeaturesareextractedfromthereg-isteredregionviasomemethod;andincomparison,extractedfeaturevectorsarecomparedagainsteachothertocarryoutsptaskssuchasvsearch,orclustering.141.1.2.1FaceDetectionFigure1.8ExamplefacedetectionsfromtheOpenCVimplementationoftheViola-Jonesfacedetector.Givenaninputimage,itisnecessarytolocalizetheexactregioncontainingtheface.Failuresatthisstage(missingfacesintheimage,ordetectingnon-faceregionsasfaces)arekey,sincesubsequentstageswillnotdoanythingofvalueiftheyarenotpresentedwithactualfaceregiontoworkwith.Computationalperformanceconsiderationsarealsokeyhere,sincetypicallyafacewillbelocalizedbyanexhaustivescanovertheinputimage,consideringmultiplescalesaswellasallpossiblespatiallocations.TheclassicexampleofafacedetectorisofcoursetheworkofViolaandJones[92],whichleveragedHaar-likefeatures,alongwithboostedregressionstumpstoachievehighfacedetectionaccuracy,whilealsoattainingveryfastdetectionspeedsduetothecombinationof(i)acascadestructureintheboostingscheme,and(ii)theuseofintegralimagestodrasticallyspeedupHaarfeaturecomputation.SomeexampleViola-JonesfacedetectionsareshowninFigure1.8.OnthestandardFDDBfacedetectionbenchmark[41],whichconsistsof5;171facesin2;845images,Zhangetal.[106]reportedatruedetectionrateofaround95%atafalsepositivecountof500(overthefullFDDBdatabase),usingajointconvolutionalneuralnetworkbasedfacedetection/alignmentapproach(andattainedadetectionrateof16FPSona2.6GHzCPU),comparedtoadetectionrateof53%forthestandardViola-Jonesmethod.15Figure1.9ExampleimagesafterapplyingtheTanandTriggspre-processingmethod,cor-respondingoriginalimagesshownin1.8.1.1.2.2Pre-ProcessingFollowingfacedetection,oneisleftwitharegionwhich(barringdetectionerrors)containsaface.Atthispointitisrelativelycommontoapplypre-processingtechniquesofsomesort,usuallyintendedtoovercomesourcesofvariationintroducedduringimagecapture,suchaslightingorfocuschanges.TheFADEINmethoddevelopedbyNishiyamaetal.[66]isapre-processingmethod,intendedtomitigatetheofblur(suchasblurduetopoorfocus,orlinearmotionblur).Independentofsubsequentstagesofprocessing,themethoddevelopedsimplyreducestheofblur,usingaschemetoidentifytheblurtyperelevanttoaparticularfaceimage,thendeconvolutiontoreversetheidenprocess.TanandTriggs[86]developedapre-processingschemeintendedtoreducevariationduetoillumination(suchaslocalshadowingetc.),consistingofgammacorrection,followedbyofGaussian(DoG)ng,andthencontrastequalization.ThemethodshowstimprovementoverusingunderlyingfeaturevectorsdirectlyonboththeFRGC-104andYale-Bdatabases(whichincorporatelightingvariation).SomeexampleprocessedimagesareshowninFigure1.91.1.2.3FaceAlignmentTherotationandtranslationofthefacein3Dspaceisoneofthemosttcausesofappearancevariation,andonethatisnoteasilymitigatedbysimpleschemes.16Figure1.10Exampleimagesunder2Dalignmentbasedondetectedeyelocations.Ingeneral,thegoalinfacerecognitionistoarriveatanerepresentationdescribingaface.Variouspossiblefeaturetypeshavebeenusedelyinfacerecognition;how-ever,regardlessofthespnatureoftherepresentationused,essentiallyeveryapproachperformssomeprocessingpriortofeatureextractiontoensurethatthefeaturesextractedfromvariousfacescorrespondtothesameunderlyingregionsoftheface.Thisprocesscanbesimpleasin,forexample,detectingthelocationsoftheeyes,andapplyingrotationandscalingtoplacethematpointsinthenormalizedimage,ensuringthatatleastthetwoeyesoneveryfacefeaturesarebeingextractedfromhavethesamelocationinthealignedimages.SomeexamplesareshowninFigure1.10.Thiseye-basedalignmentprocedureisrelativelycommon,andeasytoimplement(requiringonlyanemethodforlocalizingtheeyesgivenafaceregion),butisunabletoaddresstheof3Drotation.Ifagivenfaceexhibitsnoticeableyaw,aligningtheeyesvia2Drotationwillstillleavethatfacepoorlyalignedwithafrontalface.Moreelaborateformsofalignmentinclude\activeappearancemodels"(AAMs)[18],whichincludea2Dshape,anda2Dappearancemodel,andattempttothesemodelssuchthattheycanreproducethecurrentimage.Amoreelaborateap-proachalongtheselinesistheworkon3DmorphablemodelsbyBlanzandVetter[7].Here,ratherthana2Dmodel,thegoalistoa3Dmodelofthefacetothegivenimage,andthenpotentiallyusethatmodeltorenderafrontalview(therebycorrecting3Drota-tion).BlanzandVetter'sapproachhandles3Dmodelasanoptimizationproblem,basedonananalysisbysynthesisapproach(similarinspirittoAAMs),wherethecurrent17gooofthemodeltotheimageisevaluatedintermsofhowwellthemodelcanbeusedtoreproducetheimage.Somerecentapproachesto3Dmodelinghavee.g.[45,76]haveattempted3Dreconstructionsfromcollectionsofunconstrainedfaceimagery,attemptingtoalleviateonekeyyinunconstrainedfacerecognition.1.1.2.4FeatureExtractionAfteradetectedfacehasbeenpre-processed,andnormalizedtosomestandardspatialuration,a(typically)lengthfeaturevectorofsomemanneriscomputedtodescribetheface.Varioustypesoffeatureshavebeenusedovertheyears.EarlyworksuchasTurkandPentland[89](whorepresentedafaceasavectorofthetopNprincipalcomponentanalysis(PCA)coets),andBelhumeuretal.[3](whosimilarlyrepresentedafaceusingLDAcots,inasupervisedsubspacelearningapproach)focusedonlearningsubspacestodescribefacesdirectlyfromimagepixels.SubsequentworksuchasthatofAhonenetal.[1]demonstratedthevalueofcomputinglocalstatisticalfeatures(suchaslocalbinarypattern(LBP)histograms),ratherthanworkingdirectlyonpixelvalues,asamethodforattainingdescriptorswithgreaterinvariancetoe.g.lightingchanges.OtherformsoflocalfeaturesincludeGaborjets,whicharetheresultofapplyingasetofGabor[52],sometimescomparedthroughgraph-basedmethodsasinElasticBunchGraphMatching(EBGM)[99].TheworkofKumaretal.[51]builtontheideaofincorporatinglocalfeatures,byextractingfeaturesfromlocalpatchesontheface,thenusingthosefeaturestotrainasetofinterme-diatethatpredictedcertainvisualattributesoftheface(e.g.gender,facialhair,haircolor,etc.).Usingtheoutputoftheseper-attributeSVMsasafeaturevectorleadtoimprovedperformance,comparedtodirectlyusinglocalfeaturesasafacerepresentation.Inrecentyears,therehasbeenagreatdealofinterestinmethodsforlearninglocalfea-tures(ratherthanusinghandcrafteddescriptorssuchasLBP).DeepconvolutionalneuralnetworkapproachessuchasthatofTaigmanetal.[84]areanexampleofsuchalearningbasedapproach.TheinitialconvolutionallayersusedinTaigmanetal.'snetworkmaybe18consideredlow-levelfeatureextractors,andsubsequentlayersmaybeinterpretedastherepresentationtosomethingmorerobusttoextraneoussourcesofimagevariation.1.1.2.5ComparisonAfterafeaturevectorhasbeenextracted,itistypicallyusedin1to1comparisonswithotherfeaturevectors.Invionapplications,onemerelyneedstocompareaprobetoasingleclaimedidentity,butevensearchapplications(whichinvolve,inprinciple,a1toNcomparisonofasingleprobefeaturevectoragainstagalleryofsizeN)aretypicallyalsohandledasN1-to-1matches,individuallycomparingtheprobeagainsteveryidentityinthegallery.Sophisticatedindexingmethodsarerarelyused,duetotheltyofmanaginghighdimensionalfeaturevectors(ontheorderofhundredstothousandsofdimensions),andgeneralunwillingnesstoloserecognitionaccuracybyapplyingapproximatematchingmethods.Eveninthiscase,thereissomevarietyinthenatureof1-to-1matchescarriedout.Variousdistancemetricsorsimilarityfunctionsmaybeused,aswellasmoresophisticatedprobabilisticmatchingmodelssuchasthemethodofLietal.[54](whoformallydevelopvandrecognitiontasksasBayesianmodelselectionproblems)havebeenemployed,ashavedistancemetriclearningmethodssuchastheworkofGuillauminetal.[29](whoattempttolearnedistancemetrics,toimprovetheperformanceofaspcfeaturevectorintherecognitionscenario).1.1.3FaceRecognitionTaskProgressionAsnotedinFigure1.1,thegeneraltrendinfacerecognitionresearchhasbeentotacklemoreandmorescenariosovertime.Returningtotheoriginaleigenfacespaper,thedatabaseusedintheexperimentswasanacademicdatabase,andallfacesinthedatabaseweremanuallyalignedbymanuallymarkingeyelocations.Althoughconstrainedfacerecog-nitionisanearlierproblem(intermsofpose,illuminationandexpressiony)thathasbeenaddressedingreatdetail,therecognitionperformancehasnotreacheditslimit.There19arenumerouspracticalapplications(particularlygovernmentapplications)wherelargescaleconstrainedfacerecognitionproblemsareencountered.Forexample,thephotographstakenfordriverslicensesareconstrainedfaceimages,andoneobviousapplicationoffacerecogni-tiontechnologyisfordrivers'licensedatabasede-duplication(i.e.detectingandpreventingcaseswheredriver'slicensescorrespondingtomultipleidentitiesareissuedtothesameper-son).Additionally,themugshotimagestakenwhenpeoplearearrestedareanotherformofconstrainedfacephotography.Again,de-duplicationisanapplicationofobviousinterest(itisundesirableforarrestsofthesamepersontobeassociatedwithmultiplenames).Anotherclassicexampleisvisaimages.ToentertheUnitedStates,citizensofmanycountriesmustacquireavisa.Thesevisasareissuedtospindividuals,sovalidatingthattheyarecorrectisanareaofinterest,andcantaketheformofafacevproblem.Table1.1Facevperformanceonnotableconstrainedfacebenchmarks.DatabaseYearTAR@0.1%FARRank-1RetrievalRateGallerySizeFERET[71]199321%78%316FERET[71]199646%95%831FRVT[6]200280%73%37,437FRGCv2.0[70]200599%n/an/aFRVT[72]200699%n/an/aMBE[28]201099%92%1.6MFRVT[27]2013n/a96%1.6MIncreasedperformanceintheconstrainedscenarioisperhapsbestcharacterizedbytheprogressreportedovertgovernmentsponsoredfacerecognitionevaluations.TheFERETtestprocedureranfrom1993to1996(fundedbyDARPA),theNISTFaceRecog-nitionVendorTestswererunin2002,withthemostrecentevaluationbeingin2013.Table1.1showsprogressonthesedatabasesinthefacevontaskacrosstheyears.Bythemostrecentevaluation,performanceatthe0:1%FARoperatingpointalreadyappearssaturated,withseveralmethodsproducingresultsintheneighborhoodof99%TAR.Anotherpointisthatthenatureofparticipantsintheseevaluationshaschangedovertheyears.EarlyFERETparticipantsweremainlyacademicresearchgroups,whilethetopparticipantsin20themorerecentFRVTsarestrictlymajorcorporations(withrelativelyfewacademicgroupsparticipatingatall),thematurityofconstrainedfacerecognitionasatechnology.Inspiteoftheoveralldegreeofsuccessachievedinconstrainedscenarios,somechallengesremain.Forexample,evenifPIEfactorsarecontrolled,asubjectsappearancewillnatu-rallychangeovertime.TwoexamplesofpairsofimagesfromthePCSOdatabase,whichcannotbesuccessfullymatchedat0:1%FARbyaleadingcommercialmatcherareshowninFigure1.11.Figure1.11ExampleofgenuinematchpairswithlargeagegapsfromthePCSOdatabase.Thetwoimagesofsubjecta)exhibita14-yearagegap,whilethetwoimagesofsubjectb)exhibita13-yearagegap.Neitherpaircanbesuccessfullymatchedbyaleadingcommercialmatcherat0:1%FAR.Therecenttrendhasbeentoconsiderunconstrainedscenarios,i.e.databasesthatal-lowforgreatervariationinkeyfactorssuchasPose,Illumination,andExpression(PIE).Particularly,thetrendhasbeentowardsoperationalfaceacquisition(constructingexperi-mentaldatabasesbyharvestingimagesfromsocialmedaiornewssources,thenorganizingthemforexperimentaluse),asintheLFWdatabase[35]ratherthancollectingfaceimagesinalaboratory.TheLFWdatabasewasgeneratedbysearchingforimagesofcelebritiesandpublices(byname),automaticallytheresultsdowntoonescontainingfaces(accordingtoanhe-shelffacedetector),thenmanuallyverifyingtheresults.Inthiscase,themanualrequiredislimitedtojustverifyingimagesalreadyascontainingfaces.Bycontrast,datacollectionfortheJanusproject[48]requiredagreater21dealofmanualforfacelocalization,andmarkingmajorfaciallandmarks(facilitatedbytheuseofAmazonMechanicalTurk(AMT)).1.2SearchBackgroundInadditiontodiscussingfacerecognitionproblemsinparticular,backgroundongeneralimageretrievalisalsoofinterest,sincethesemethodsareagoodsourceofmotivationforfacerecognitionspsystems.Theimageretrievalproblemisofgreatinterestforwebtechnologiessuchassearch(e.g.asearchengineshouldelyrankresultssuchthatthebestmatchesareshownFacespsearchwillbediscussedindetailinchapter3.Themajormethodsforhandlingthisproblemcanbebrokendownintoapproximatenearestneighbor(ANN)methodsincludinghashingmethods,tree-basedsearchmethods,andcodingmethodswhichreducetheamountofdatathatmustbestored(bycompressingfeaturevectorsthroughsomemethod)andspeedupindividualcomparisons.Themainyinimprovingnearestneighborsearchspeed,forimagesisfeaturedimensionality.Fromanalgorithmsanddatastructuresperspective,wecanindex1-dimensionaldataperfectly(withnoapproximation)withtreebasedmethods(withO(logn)searchtimes),orhashingmethods(withexpectedconstanttimesearch).However,ex-tendingtheseapproachestohigherdimensionaldataremainsTheclassick-dtreeapproach[24]canworkwellforlow-dimensionalfeaturevectors(e.g.around10dimensions),butdoesn'tprovidetpracticalbforhigher-dimensionalfeaturespaces(typi-calfeaturevectorsforimageorfacerecognitionareataminimumontheorderofhundredsoffeatures).Generally,toattainimprovedsearchspeedonhighdimensionaldata,approx-imationmethodsareemployed.Treebasedapproachessuchasrandomizedk-dtrees[78]areonefamilyofapproximationmethods(inthecaseofrandomizedk-dtrees,multiplek-dtreesarebuilttoindexadatabase,eachontrandomsubsetsofthedata,andthetotalsearchsizeallowedperqueryisconstrained).Hashingbasedmethodsarealsoanarea22ofinterest.TheclassicLocalitySensitiveHashing(LSH)approach[22]developshashtableswhichaimtoplaceobjectswhicharesimilarinahighdimensionalspaceinthesamebucketsofthehashtable,withhighprobability,howeveragainthisisanapproximationmethod,andsomedegreeoferrorinthedesiredsearchresultsisexpected.Hashingandtreebasedmethodsarenearestneighborapproximationmethodsinthesensethattheyattempttoreducethetotalnumberofindividualcomparisonsbeingmade.Alternatively,thereareapproximationmethodswhichattempttospeedupthecostofanindividualcomparisonbetweentwoimages.Jegou'sproductquantizationmethod[42]canbeconsideredsuchacomparisonapproximationmethod.ProductQuantizationpartitionsafeaturevectorintosmallersub-vectors,computescodesforeachsub-vectorusingk-meansclustering,thenpre-computesandstoresdistancesbetweenparticularcodes.Thesepre-computedcodesanddistancesarethenusedasamethodto1)compressfeaturevectors(onlytheindicesoftheparticularcodesusedmustbestored,notthefullreal-valuedgalleryfeaturevectors),and2)speedupcomparisonviapre-computedsub-distances.Anadditionalhashingstepcanalsobeperformedbasedonthecodestoreducethetotalnumberofcomparisonsneeded.But,evenwithoutthisadditionalhashingstep,productquantizationcangiveatimprovementinsearchspeed.Alongsimilarlinestoproductquantization,binarizingfeaturevectorshasattractedinterestasameansofgallerycompression,andspeedimprovementforindividualcomparisons.Forexample,Gongetal.[25]binarizedafeaturevectorextractedbyadeepneuralnetwork,andusedthatbinaryfeaturevectortodevelopasetofhashtables,allowingforconstanttimesearchofthenearestclustercentersforpointsunderk-meansclustering.1.3ClusteringBackgroundClusteringinthegeneralsenseisoneoftheclassicproblemsinpatternrecognition,theproblemoforganizingunlabeleddatabasesintomeaningfulgroups.Clusteringisoftencon-23sideredaexploratoryapproachforinvestigatingdatabases,providingsomeformofautomaticorganizationpriortomanualanalysis.Potentialapplicationsforfaceclusteringincludeforensicapplications,suchasclusteringimagesfoundonaseizedharddrive,orpho-tographstakenaroundahcrime.Insuchapplications,theuseoffaceclusteringisasameansoforganizinglargedatabases,allowingformoretmanualinvestigation.Asidefromfaceimageclusteringinparticular(whichwillbediscussedindetailinchap-ter4),therehasbeenagreatdealofworkonthesubject,andsinceanexhaustivesurveyisinfeasiblehere,werefertoJain[40]foradetailedoverview.Inthissectionwewilldiscusstheclusteringproblemingeneral,andacoupleofpopularmethodsindetail.Ingeneral,theclusteringproblemcanbesummarizedas:givenanunlabeledsetofdatapoints,groupthosepointstogetherinameaningfulway.Thisisperhapstoogeneralaproblemstatement,sincefromascienperspectivewewanttostudytheenessoftclusteringmeth-ods,orclusteringresultsontdatabasesinsomequantitativeway.Ifthedatabasebeingclusteredisfullyunlabeled(noextrinsiclabelsondatapoints),thenthebestthatcanbedoneintermsofevaluationistoemploy\internal"measures,whichindicatethequal-ityofaparticularclusteringbasedonsomedatacohesionmeasure,raterthananextrinsic(category)labelcriterion.Thereisnoidealinternalmeasureintheunlabeledscenario,buttypicallytheymeasurecompactness(howclosetogetherthepointsinagivenclusterare)andseparability(howfaraparttclustersare)insomeway,withtheassumptionthatagoodclusteringsolutionshouldexhibitthesetwoproperties.Ingeneral,thereisnoguaranteethatasolutionwhichgivesgoodresultsintermsofsomespinternalmeasurewillbeofpracticalvalue.Measuresbasedondistancesinafeaturespace,orgraphrelationshipsareintheendonlycharacterizingsomepropertiesofthedatabasebasedonthesprepresentationused,andthereisnoguaranteethattherepresentationmeetsitsgoalofmakingpointseasytoorganizeinallcases.Ifthereissomeparticularsolutionofpracticalinterest(e.g.organizethedatabaseintoclusterscorrespondingtosomespclasses,suchasdiscreteidentitiesunderfacerecognition),24thenthequalityofthesolutioncanbeevaluatedintermsof\external"measures(assuminglabelsareavailableforthepointsbeingclustered).Externalmeasuresindicatehowwellagivenclusteringconformstotheidealsolution.Examplesofexternalmeasuresincludenormalizedmutualinformation(NMI),pairwiseprecision/recall,andoverallclasscationaccuracy(generatedbyassigninganidentitytoeachclusterbasedonmajorityvoting,andevaluatingaccuracyasinaproblem).Regardlessoftheevaluationmeasureused,variousclusteringmethodshavebeenpro-posed.Thek-meansclusteringalgorithmisworthdiscussing,since(i)itisperhapsthemostwell-knownclusteringalgorithm,(ii)hasonlyafewparametersfortuning,(iii)isoneofthemostt,and(iv)large-scaleclusteringmethodsareoftenapproximationsofk-meansclusteringwithimprovedscalability.Ink-means,theclusteringproblemisformulatedasminimizingthetotalsquaredistanceofpointsinthedatabase(insomefeaturespace)tothenearestofCclustercenters.Findingtheexactsolutiontothek-meansobjectiveisnotfeasible,soinpracticeanapproximatesolutionistypicallyreachedviaLloydsalgorithm[58],whichcanbeoutlinedasfollows,foranumberofclustersC:(i)initializeclustercen-ters(e.g.bythek-means++seedingprocedureofArthurandVassilvitskii[2]),(ii)assigneachpointinthedatabasetothenearestclustercenter,(iii)recomputeclustercentersasthemeanofallfeaturevectorsassignedtoeachcenter,and(iv)repeatsteps(ii)-(iii)un-tilconvergence.Therearetwoparametersneededink-meansclustering:thenumberofclusters,Candthedistancemetrictothesquarederror.ThecommonpracticeistouseEuclideandistancemetricandrepeatk-meansmultipletimeswithtvaluesofC.TheuserthendetermineswhichvalueofCgivesmeaningfulresults.Therearenumerousvariationsonk-means,suchasapproximationmethods(e.g.[25]),orkernelmethods(whichallownon-linearsolutions)[16].Anotherwidelyusedclusteringmethodisspectralclustering[93],whichapproachestheproblemfromagraphtheoryperspective.Inspectralclustering,thestepistoconstructanadjacencymatrixforthetargetfeaturevectors,describingthedatabaseasann-vertex25graph,wherenisthenumberofobjectsinthedatabase.Ifnoinherentgraphstructureisknown,asisthecaseforgeneralfaceclustering,theadjacencymatrixcanbeconstructedinseveralways.Oneoptionistoconstructafullyconnectedgraph,whereineachvalueintheadjacencymatrixisthesimilaritybetweenthecorrespondingfaces;otherwise,asparseadjacencymatrixmaybeconstructed,byeitherretainingalledgeswithasimilarityaboveathreshold,orretaininganumberofedgeswiththegreatestweights.AftertheadjacencymatrixisthenormalizedLaplacianiscomputed,followedbythetopCeigenvectorsofthenormalizedLaplacian,andthenanewmatrixisformedwhosecolumnsconsistofthecomputedeigenvalues.Consideringeachrowofthismatrixasanewsample(correspondingtothenoriginalsamples),k-meansclusteringiscarriedinthistransformedfeaturespace(ratherthanontheoriginalfeatures).Oneinterestingresultisthattheobjectivefunctionofspectralclusteringcanbeconsideredavariationofkernelk-means.1.4ContributionsTosummarize,theoverallfocusofthisthesisisondevelopingimprovedrepresentationsforunconstrainedfacerecognition,andapplyingtheserepresentationstoverylargescaleproblems(involvinguptoa120millionimagedatabase)infacesearch,andfaceclusteringbyidentity.Thecontributionsofthisthesisareasfollows.1.Antdeepconvolutionalnetworkforfacerecognition,trainedonalargepublicdomaindata(CASIA[104]),whichimprovesuponthebaselineresultsreportedin[104].2.Studiesoftheenessofthedeepnetworkbasedfacerepresentationonthreefacedatabasesofincreasingcomplexity:PCSOmugshotdatabase,LFWdatabase(onlycontainsfacesdetectablebytheOpenCVversion1.0Viola-Jonesfacedetector),andtheIJB-Adatabase(containsseveralfaceswhicharenotdetectablebytheViola-Jonesdetector).263.Applicationofsaidrepresentationtolarge-scalesearchandfaceclusteringproblems,handlingdatabasescontainingupto120millionfaceimages.4.UsingfaceimagesoftheTsarnaevbrothersinvolvedintheBostonMarathonbombingasqueries,weshowthatDzhokharTsarnaev'sphotocouldbeidenatrank8whensearchingagainstthe80Mgallery.5.Developmentofanupdatedapproximateclusteringalgorithmforlargescalefaceclus-teringimprovingonthemethodpresentedbyZhuetal.[108].Theproposedmethodusesanapproximatenearestneighbormethodforimprovedscalability,whichalsoat-tainsbetterclusteringaccuracy.6.Apreliminaryinvestigationoftheapplicabilityofthepresentedfaceclusteringmethodtovideo.7.Investigationintothededuplicationofaverylarge-scale(123Mimage)database.8.Developmentofintrinsic(nofacelabels)per-clusterqualitymeasures,tofacilitateuseofclusteringasapre-cursortomanualdatabaseinvestigation.9.Preliminaryinvestigationoftheimpactofincreaseddatabasesize,andimprovednet-workarchitecturesonunderlyingfacerepresentationperformance.1.5OrganizationTheremainderofthedocumentisstructuredasfollows.Chapter2providesadiscussionofadeepConvolutionalNeuralNetwork(CNN)basedfacerepresentation,withperformanceevaluationonstandarddatabasesinvandidenexperiments.Chapter3containsanin-depthstudyoflarge-scalefacesearchusingthefacerepresentationonabackgroundsetcontainingupto80millionimages.Chapter4containsastudyoflargescalefaceclusteringbyidentity,involvingdatabasesofupto120millionimages.Chapter275containsaninvestigationontheofalargertrainingset,andimprovedarchitecturesonfacerepresentationperformance.Finally,Chapter6summarizesthemainresultsofthisthesis,andindicatessomedirectionsforfuturework.28Chapter2FaceRepresentation2.1IntroductionThemostnotabletrendinavarietyofcomputervisiontasks(suchasgeneralobjectrecogni-tion)inrecentyearshasbeentheadoptionofso-called\deeplearning"methods.Althoughthecommonlyusedterm\deeplearning"isnew,thesemethodsrepresentthere-emergenceofneuralnetworkbasedmachinelearning.Inparticular,the\deep"indeeplearningreferstothetotalnumberoflayers(depth)usedinrecentnetworkarchitectures.Althoughneuralnetworkmethodswereinitiallyquitepopular,theylargelyfelloutoffavor,tomethodssuchasSupportVectorMachines(SVM)for[8].Inspiteofthis,neuralnetworkbasedresearchcontinuedinsomeareas.Notably,asearlyas1990LeCunetal.[53]reportedstrongresultsondigitusingaconvolutionalneuralnet-work(aconvolutionallayeriselyonewhichappliesalearneder-banktoitsentireinput,ratherthanlearningweightsforeachinputpixel).Inthelate2000s,deeparchi-tecturesstartedtogainpopularityandtypicallyfollowedtheoverallstrategyofusinganunsupervised\pre-training,"steponunlabeleddata,priortoatrainingsteponlabeleddata[33].Krizhevskyetal.'sworkonImagenetobject[50]resultedincantlyimprovedaccuracyoverthepreviousstate-of-the-art,andcanbeviewedasastarting29pointfortherecentsuccessofpurelysuperviseddeepconvolutionalneuralnetworksinobjectrecognition,andrelatedtasks.Fromanoptimizationperspective,deepnetworksaretypicallytrainedusingminibatchstochasticgradientdescent.Sincethefunctionbeingoptimizedisnon-convex,agradientdescentmethodcannotguaranteeanoptimalresult,sowewouldexpecttheoptimizationtohaveconvergenceproblems(e.g.togetstuckinlocalminimasomewhatfrequently).Inprac-tice,regardlessoftheparticularinitializationmethodused,manystrongresultsincomputervision(andfacerecognitionproblems)havealreadybeenreportedusingdeepnetworks.Re-cently,therehasbeensometheoreticalanalysissupportingthefeasibilityoftrainingdeepnetworksthroughgradientdescentinspiteofthelocalminimaproblem.Dauphinetal.[20]suggestthatsaddlepointsaremoreprevalentthanactuallocalminimaindeeparchitectures,andmayhavealargerimpactondeeplearningoptimizationncy.Alongsimilarlines,Choromanskaetal.[17]analyzetheprevalenceoflocalminimainnetworks(undersomesimplifyingassumptions),andthattheyarerelativelyinfrequent.StartingfromFacebook's\DeepFace"paper[84],theapplicationofdeeplearningmeth-odstofacerecognitionhasgainedattentionduetothehighdegreeofsuccessattainedonthestandardLFWbenchmark.TheoriginalDeepFacepaperusedatrainingsetofsome4:4millionimagesof4;030individuals,whichFacebookacquiredinternally.SubsequentpapersondeepnetworkbasedfacerecognitionachievedhighaccuracyresultsontheLFWbench-mark,evenwithdatabasesontheorderofhundredsofthousandsofimages[81,104],ratherthanmillions.Atthispoint,thetopseveralresultsontheLFWresultspageforthe\labeledoutsidetrainingdata"scenarioarealldeepneuralnetworkapproachesreportingaround99%accuracy1.Followingthesuccessattainedbyseveralgroups,wehavealsoadoptedadeeplearningapproachfordevelopingfacerepresentations,withapplicationtolarge-scalefacesearchandclustering.Intheremainderofthischapter,wewilldiscusssomelayertypescommonlyusedin1http://vis-www.cs.umass.edu/lfw/results.html#UnrestrictedLb30deeplearningarchitectures,discussdetailsoftherepresentationweuse,thengooverseveralfacedatabasesappropriateforevaluatingthesystem'sperformanceintfacerecogni-tionscenarios,andevaluatetherepresentation'sperformanceonprogressivelymorefacerecognitionscenarios(thePinellasCounty's(PCSO)mugshotdatabase,theLFWdatabase[35],andtheIJB-Adatabase[48]).2.2LayerTypesAneuralnetwork'sarchitectureisasaseriesoflayers,whicharearrangedinsequence(withoutputsfromonelayerfeedingintothenext,withnofeedbackduringthetestingphase).Thissectionaimstoprovideabriefoverviewofcommonlyusedlayertypes.Thecoreofanydeeparchitectureisaseriesoflayerswhichproduceoutputasacombinationoftheirinputs,andsomelearnedsetofweights.Keyexamplesincludefullyconnectedlayers,whichproducetheiroutputasafulllinearcombinationoftheirinputvalues(sotheyareanalogoustosubspacemethodslikePCA).Convolutionallayersproduceeachpixeloftheiroutputsviatheconvolutionofeveryinputregion,andalearned(soeachoutputvalueistheresultofasppatchoftheinput,unlikeinfully-connectedlayers).ItisoftencommentedthattheweightslearnedbyearlyconvolutionallayersresembleclassicimageprocessingmechanismslikeGaborLocallyconnectedlayersarerelatedtoconvolutionallayers,butproduceeachoutputasaconvolutionwiththeinput,andauniqueforthatlocation.Non-linearactivationfunctionsareanotherkeylayertypeinneuralnetworkarchitectures,sincethecompositionofaseriesoflayerswithlinearoutputisfunctionallystilllinear,soforadeeparchitecturetobemeaningful,itmustincorporatenon-linearities.Earlierworkonneuralnetworksoftenusednon-linearactivationfunctionsliketanh,orsigmoid,butonekeyaspectofthesuccessofrecentdeeparchitecturesisincorporationoftheRLinear(ReLU)activationfunction[64].Unliketanh,orsigmoid,theReLUfunctiondoes31notsaturateatahighvalue,sogradientscanstillbeusefullycomputedinthoseregions,elyleadingtoanincreaseintrainingspeed.Poolinglayersdownsampletheirinput,applyingsomesimpleoperation(e.g.maxormean)overaninputpatch,toproduceitsoutput.Max-poolingcanproviderobustnesstomisalignment(sinceeachoutputvalueisthemaxoftheinputregion,ifthemaxvalueisslightlymisaligned,thatwillbeerasedbythepoolinglayer),mean-poolingcanprovideadegreeofrobustnesstooutliervalues.Poolinglayersalsogenerallyservethepurposeofreducingthedimensionalityoftheirinputs;anoveralltrendinarchitecturesistostartwiththefullimageasinput,andalternatebetweenconvolutionalandpoolinglayerstoreducethedimensionalityoftheinputsoverthecourseofthenetwork,priortousingfully-connectedlayersandlosslayers.Losslayersmeasuretheperformanceofthenetworkinsomeway,andareusedasthebasisforback-propagation(gradientdescentisultimatelyperformedrelativetothelosslayer).Forthisreason,choosinganappropriateloss-layerseemscritical,althougharecentpaperfromTaigmanetal.[85]suggeststhatasimplesoftmaxlossremainseevenfortrulylargedatabases.Anumberofrentlosslayershavebeensuggestedforfaceapplications,e.g.tripletloss(whichtake3inputs,andpredictwhichofthetwothethirdshouldbemoresimilarto)whichwasusedinGoogle'sFaceNetnetwork[77],orthecontrastivelossfunctionusedin[104]whichattemptstoreducewithinclassscatter.2.3NetworkDescriptionTheparticulararchitectureofthenetworkproposedhere(outlinedinFig.2.1)isinspiredby[94,104].Therearefourmainbetweentheproposednetworkandtheonein[104]:i)inputtothenetworkiscolorimagesinsteadofgrayscaleimages;ii)useofarobustfacealignmentprocedure;iii)anadditionaldataargumentationstepthatrandomlycropsa100100regionfromthe110110inputcolorimage,followedbyahorizontal32Figure2.1Proposeddeepconvolutionalneuralnetwork(ConvNet).togenerateadditionalimagestotrainthenetwork;andiv)deletingthecontrastivecostlayerforcomputationalduringtraining.Forthenetwork'sconvolutionlayers,weadoptaverydeeparchitecture[79](10convo-lutionlayersintotal)andusewithasmallsize(33).Thesmallsizereducesthetotalnumberofparameterstobelearned,andthedeeparchitectureenhancesthenon-linearityofthenetwork[79].Thenetworkoutputisa320-dimensionalfeaturevector,thelowdimensionalityofthisfeaturevectorisakeyfactorinthesuccessofapplyingtherepre-sentationtolarge-scalesearchtasks(involvingdatabasesontheorderof100millionimages).TheinputlayeracceptstheRGBpixelvaluesofthealignedfaceimages.Facesarealignedasfollows:i.UsetheDLIB2implementationofKazemiandSullivan'sensembleofregressiontreesmethod[44]todetect68faciallandmarks(seeFig.2.2);ii.rotatethefaceintheimageplanetomakeituprightbasedontheeyepositions;iii.acentralpointontheface(thebluepointinFig.2.2)bytakingthemid-pointbetweentheleftmostand2http://blog.dlib.net/2014/08/real-time-face-pose-estimation.html33(a)(b)(c)Figure2.2Afaceimagealignmentexample.Theoriginalimageisshownin(a);(b)showsthe68landmarkpointsdetectedbythemethodin[44],and(c)isthenalalignedfaceimage,wherethebluecirclewasusedtocenterthefaceimagealongthex-axis,andtheredcirclesdenotethetwopointsusedforfacecropping.rightmostlandmarks;thecenterpointsoftheeyesandmouth(redpointsinFig.2.2)arefoundbyaveragingallthelandmarksintheeyeandmouthregions,respectively;iv.centerthefacesalongthex-axis,basedonthecentralpoint(bluepoint);v.keeptheaspectratioandthepositionalongthey-axisbyplacingtheeyecenterpointat45%fromthetopoftheimageandthemouthcenterpointat25%fromthebottomoftheimage,respectively;vi.resizethewidthandheightoftheimageto110110.Notethatthecomputedmidpointisnotconsistentacrosspose.Infacesexhibitingtyaw,thecomputedmidpointwillbentfromtheonecomputedinafrontalimage,sofaciallandmarksarenotalignedconsistentlyacrossyaw.Asshownintable2.1,followingtheinputlayer,thereare10convolutionallayers,4max-poolinglayers,and1average-poolinglayer.Everypairofconvolutionallayersisgroupedtogetherandconnectedsequentially.Thefourgroupsofconvolutionallayersarefollowedbyamax-poolinglayerwitha22andastrideof2,whilethelastgroupofconvolutionallayersisfollowedbyanaverage-poolinglayerwitha77Thedimensionalityofthefeaturerepresentationlayeristhesameasthenumberofinthelastconvolutionallayer.34Table2.1Thedetailedconvolutionalnetworkarchitectureusedinthisthesis,whichiscloselyrelatedtotheoneoutlinedin[104].Themajorare:RGBimageinputtothedatalayer,improvedinputfacealignment,andanaddeddataaugmentationlayer.NameTypeFilterSize/StrideOutputsizeDatadata{1101103DataAugmentationRandomcrop{1001003Conv11convolution33/110010032Conv12convolution33/110010064Pool1maxpooling22/2505064Conv21convolution33/1505064Conv22convolution33/15050128Pool2maxpooling22/22525128Conv31convolution33/1252596Conv32convolution33/12525192Pool3maxpooling22/21313192Conv41convolution33/11313128Conv42convolution33/11313256Pool4maxpooling22/277256Conv51convolution33/177160Conv52convolution33/177320Featureaveragepooling77/111320Dropoutdropout(keep=60%)11320FCfullyconnection10;575softmax10;575Asdiscussedin[104],theReLU[50]nodeproducesasparsevector,whichisundesirableforafacerepresentationlayer.Inournetwork,weuseReLUnodes[50]inalltheconvolutionallayers,exceptthelastone,whichiscombinedwithanaverage-poolinglayertogeneratea320-dimensionalfacerepresentation.Althoughmultiplefully-connectedlayersareusedin[50,94],inournetworkwedirectlyfeedthedeepfeaturesgeneratedbythefeaturelayertoaP-waysoftmax,wherePisthenumberofsubjectsinthetrainingset.Weregularizethefeaturerepresentationlayerus-ingdropout[80],keeping60%ofthefeaturecomponentsas-isandrandomlysettingtheremaining40%tozeroduringtraining.Weuseasoftmaxlossfunctionforournetwork,andtrainitusingthestandardback-35propagationmethod.Thenetworkisimplementedusingopensourcecuda-convnet23library;weightdecayissetto5104.Thelearningrateforstochasticgradientdescent(SGD)isinitializedto102,andgraduallyreducedto105.2.4FaceDatabases(a)PCSO(b)LFW[35](c)YTF[100](d)IJB-A[48](e)CASIA[104](f)Web-FaceFigure2.3Examplesoffaceimagesfromthesixfacedatabasesusedinsubsequentexperi-ments.Inthissection,wewilldiscussthedatabasesusedforevaluatingourrepresentation's3https://code.google.com/p/cuda-convnet2/36baselineperformanceontfacerecognitiontasks,aswellasotherdatabasesusedinevaluatingsearchandclusteringperformance.Totrainthenetwork,weusetheCASIA-WebFacedatabase[104],toevaluatebaselinematchingperformanceweusePCSOmugshotdata,theLFWdatabase[35],andtheIJB-Adatabase[48].Toaugmenttheyofoursearchandclusteringexperiments,weleveragealargeunlabeledsetofimagesdownloadedfromtheweb(referredtoas\Web-Face"),andweadditionallyperformsomeclusteringexperimentsonvideoframesfromtheYouTubeFaces(YTF)database.ExampleimagesfromeachdatabaseareshowninFig.2.3,andthedatabasesareoutlinedasfollows:PCSO:ThisdatabaseisasubsetofalargecollectionofmugshotimagesacquiredfromthePinellasCounty's(PCSO)database,whichcontains1;447;607imagesof403;619subjects.LFW[35]:TheLFWdatabaseisacollectionof13;233faceimagesof5;749indi-viduals,downloadedfromtheweb.Faceimagesinthisdatabasecontaintvariationsinpose,illumination,andexpression.AlltheimagesinthisdatabasecontainfacesthatcanbedetectedbytheViola-Jonesfacedetector[35,92].YTF[100]:SimilarinspirittoLFW,theYouTubeFaces(YTF)databaseconsistsofvideosofcelebritiesandpublicharvestedfromtheInternet.Thedatabasecontains1;595subjects(whichareasubsetofthesubjectsinLFW),in3;425videos,consistingofatotalof621;126individualframes.Labelsareprovidedforthesubjectofinterestforeveryframeofvideowhereafacecouldbedetected.Inourexperimentsweusethepre-croppedframes,toavoidconfusionbetweentheprimarysubjectineachvideo,andanyunlabeledindividualsthatmaybeinagivenframe.IJB-A[48]IARPAJanusBenchmark-A(IJB-A)contains500subjectswithatotalof25;813images(5;399stillimagesand20;414videoframes).ComparedtotheLFWdatabase,theIJB-Adatabaseismorechallenging.37CASIA[104]databaseprovidesalargecollectionoffaceimageslabeledwiththesubjects'identities,suitablefortrainingdeepnetworks.Itcontains494;414faceimagesof10;575subjects.Thereare22overlappingsubjectsbetweentheCASIAandIJB-Adatabases.Afterremovingtheimagesofthese22subjectsandallimageswherefacedetectionfailed,weareleftwith404;992imagesof10;553subjectsintheCASIAdatabase.2.4.1123MillionWebFacesToconductfacesearchatscale,weusedacrawlertoautomaticallydownloadmillionsofwebimages(Thelinkstothesewebimageswereprovidedbyatresearchcollaborator).Followingthat,weoutallthenon-face-detectablewebimageswiththeDLIBfacedetector4.Weinitiallycollected80millionfaceimagesinthismanner,performedlarge-scalefacesearchexperiments,thensubsequentlycollectedanadditional43Mimages,foratotalof123millionfaceimages,whichwereusedinlarge-scalefaceclusteringexperiments.Sincetheseimagesareunlabeled(sothatwedonotknowthetruenumberofsubjectspresentinthe123Mfaces),weusethemtoaugmentthegallerysizeinourlarge-scalesearchandclusteringexperiments.Wecallthisdatabase\Web-Face."WeevaluatetheDLIBfacedetectorontheunconstrainedfacedetectionbenchmarkFDDB[41],whichcontains5;171facesextractedfrom2;845Yahoonewsimages.UsingdefaultparametersprovidedbytheDLIBfacedetector,thenumberofdetectedfalsepositivefacesis140,andthecorrespondingtruepositiverate(TPR)isabout81:6%usingdiscretescoreevaluationmetric[41].Fixingthesamenumberoffalsepositivefaces,thetruepositiverate(TPR)ofthebestcommercialfacedetector5andacademicfacedetector[75]arearound91%and89%,respectively.Wefurthermanuallyexamined10;000randomlydrawnsamplesfromtheentire123milliondatabase.Atotalof50non-faceimagesand164non-humanfacesweredetected,4http://blog.dlib.net/2014/08/real-time-face-pose-estimation.html5http://idl.baidu.com/38whichindicatesthatapproximately2:1%ofthe123millionwebdownloadedfaceimagesmaybefalsepositives.InFigure2.4weshowsomeexamplesofnon-faceandnon-humanfaceimagesfromtheWeb-Facedatabase.(a)(b)Figure2.4ExamplesoffalsepositivesamplesdetectedbyDLIBfacedetectorinthe123MWebfacedatabase.(a)non-faceimages,(b)non-humanfaceimages.2.5FaceRecognitionEvaluationInthissection,werstevaluatetheproposeddeepConvNetonamugshotdatabase(PCSO),followedbyevaluationsontwopubliclyavailableunconstrainedfacerecognitionbenchmarks(LFW[35]andIJB-A[48])toestablishitsperformancerelativetostate-of-the-artresults.392.5.1MugshotEvaluationWeevaluatetheproposeddeepmodelonmugshotdataobtainedfromPCSO.SomeexamplemugshotsareshowninFig.2.3(a).Mugshotfacesarecapturedinconstrainedenvironments(e.g.,apolicestation)withnearfrontalviewsoftheface.Wecomparetheperformanceofourdeepfeatureswithastate-of-the-artCOTSfacematcher.TheCOTSmatcherusedherewasdesignedtoworkwithmugshot-styleimages,andisoneofthetopperformersinthe2014NISTFRVT[27].SincemugshotdataisqualitativelyerentfromtheCASIA[48]databasethatweinitiallyusedtotrainourdeepnetwork,weretrainedthenetworkwithasubsetof471;130mugshotimagesof29;674subjectstakenfromthefullPCSOmugshotdatabase.Forevaluation,wecompareddeepfeatureswiththeCOTSmatcheronatestsubsetofthePCSOdatabasecontaining100;00imagesof63;670subjects,whichdoesnotcontainanyofthesubjectsandimagesincludedinthetrainingset.Weconductthefacevexperimentusing10-foldcross-validation;thereareapproximately340Kgenuinecomparisonsand4billionimpostorcomparisons,onaverageineachfold.Table2.2PerformanceoffacevonasubsetofthePCSOdatabase(89;905imagesof13;666subjects).Thereareabout340Kgenuinepairsandover4billionimposterpairs.TAR@FAR=0.01%TAR@FAR=0.1%TAR@FAR=1%COTS0.9850.9930.997DeepFeatures0.9350.9770.993DF+COTS0.9920.9960.997ExperimentalresultsareshowninTable2.2.TheCOTSmatcheroutperformsthedeepfeatures,especiallyatlowFalseAcceptRates(FAR)6.Forexample,ataFARof0:01%,theTrueAcceptRates(TAR)7ofdeepfeaturesandtheCOTSmatcherare98:6%0:1%and93:7%0:1%,respectively.However,ascore-levelfusionbetweenthedeepfeatures6FalseAcceptRate(FAR)isasthefractionofimpostorpairsincorrectlyacceptedataparticularthreshold.7TrueAcceptRate(TAR)isasthefractionofgenuinepairscorrectlyacceptedataparticularthreshold40andCOTSscoresresultsinanimprovedoverallperformance:99:5%0:3%8.Thisresultsuggeststhefollowing:(i)deepfeaturesarenotsuperiortootherstate-of-the-artfacerecog-nitionapproachesforalltasks,especiallynearfrontalphotoswithneutralexpressionandcontrolledillumination,and(ii)Deepfeaturesandstate-of-the-artCOTSmatchersprovidecomplementaryinformation.2.5.2LFWEvaluationWhilemugshotdataisofinterestinseveralapplications,manyotherapplicationsrequirerecognizingmoreunconstrainedfaceimages.WenowevaluatetheproposeddeepmodelsonthebenchmarkLFW[35]unconstrainedfacedatabase,usingtwoprotocols:thestandardLFW[35]protocolandtheBLUFRprotocol[55].2.5.2.1StandardProtocolThestandardLFWevaluationprotocolavexperimentunder10-foldcross-validationwith300genuinecomparisonsand300impostorcomparisonsperfold,involvingatotalof7;701imagesof4;281subjects.InTable2.3,followingthestandardprotocol,wepresentthemeanvtionaccuracyoftheproposeddeepmodelsandthesameCOTSfacematcherevaluatedinsection2.5.1forthemugshotdatabase.MoreevaluationresultsareavailableontheLFWleaderboard.9Table2.3PerformanceofvariousfacerecognitionmethodsonthestandardLFWvprotocol.Method#NetsMeanaccuracys.d.COTSaN/A90.4%1.3%ProposedDeepModel196.2%0.9%ProposedDeepModel998.2%0.6%aThisversionoftheCOTSSDKisnolongerthemostrecentoneavailable.8Thetwo-tailedPvalueequals0.0001,whichindicatestheperformanceimprovementoffusionschemeisstatisticallyt.9http://vis-www.cs.umass.edu/lfw/results.html41WenoticethattheCOTSmatcherperformspoorlyrelativetothedeeplearningbasedalgorithms.Thisistobeexpectedsinceunlikedeepmodels,mostCOTSmatchershavebeentrainedtohandlefaceimagescapturedinconstrainedenvironments,e.g.mugshotordriverlicensephotos.Almostallthetop-rankingalgorithmsontheLFWleaderboardaredeeplearningbasedalgorithms.Thesuperiorperformanceofdeeplearningbasedalgorithmscanbeattributedto(a)largenumberoftrainingimagesoflargenumberofsubjects(>100K)[77],(b)dataaugmentationmethods,e.g.,useofmultipledeepmodels[56],and(c)supervisedlearningalgorithms,suchasJoint-Bayes[12],usedtolearnavonmodelforapairoffacesinthetrainingset.Togeneratemultipledeepmodels,wetrainedthreedeepConvNetsindependentlybasedontrainingdatathatwaspreprocessedusingthealignmentmethodinSection2.3.Inaddition,wecroppedsixtsub-regionsfromthealignedfaceimages(bycenteringthepositionsoftheleft-eye,right-eye,nose,mouth,left-brow,andright-brow)andtrainedsixadditionalnetworks.BycombiningtheseninedeepmodelstogetherandusingJoint-Bayes[12],themeanvaccuracyofourdeepmodelimprovesto98:20%from96:20%forasinglenetworkusingthecosinesimilarity.Despiterelyingonlyonpubliclyavailabletrainingdata,theperformanceofourdeepmodeliscompetitivewithstate-of-the-artonthestandardLFWprotocolasshownontheleaderboard.2.5.2.2BLUFRProtocolIthasbeenarguedthatthestandardLFWevaluationprotocolisnotappropriateformanyfacerecognitionapplications,whichrequirehighTrueAcceptRates(TAR)atlowFalseAcceptRates(e.g.FAR=0:1%).ThisisbecausetheLFWprotocola10-foldcross-validationexperiment,withonly300genuine,and300impostorscoreper-fold,sothereistsupportforcalculatingverylowFARoperatingpoints.Inthisexperiment,wefurtherevaluatetheproposeddeepmodelsusingtheBLUFR[55]protocol,whichboth10-foldcross-validationfacevandopen-setidentestsinvolvinglarger42(a)(b)Figure2.5Examplecorrectvdecisionsat0:1%FARonfold1oftheBLUFRprotocol,usingtheproposedDeepModel.(a)4trueacceptpairs,(b)4truerejectpairs.43(a)(b)Figure2.6Exampleincorrectvdecisionsat0:1%FARonfold1oftheBLUFRprotocol,proposedDeepModel.(a)4falseacceptpairs,(b)4falserejectpairs.44numberofgenuineandimpostorcomparisons.Forfacevineachtrial,thetestsetcontains9;708faceimagesof4;249sub-jects,onaverage.Asaresult,over47millionfacecomparisonscoresneedtobecomputedineachtrial.Foropen-setidenineachtrial,thegenuineprobesetcontains4;350faceimagesof1;000subjects,theimpostorprobesetcontains4;357imagesof3;249subjects,onaverage,andthegallerysetcontains1;000images.Followingtheprotocolin[55],wereporttheTrueAcceptRate(TAR)ataFalseAcceptRate(FAR)of0:1%forfacev10.Foropen-setidenwereportthedetectionandidenrate(DIR)atRank-1correspondingtoaFalseAcceptRate(FAR)of1%.SeeTable2.4forresults.Table2.4PerformanceofvariousfacerecognitionmethodsonLFWusingtheBLUFRpro-tocolreportedasTrueAcceptRate(TAR)andDetectionandIdenRate(DIR)standarddeviationacross10folds.Method#NetsTARDIR@FAR=1%@FAR=0.1%Rank=1Lietal.[104]180.3%28.9%COTSN/A60.0%1.5%37.9%1.5%ProposedDeepModel185.0%1.9%49.1%2.8%ProposedDeepModel989.8%1.8%55.9%3.3%WenoticethattheTARataFARof0:1%undertheBLUFRprotocolismuchlowerthantheaccuraciesreportedonthestandardLFWprotocol.Forexample,theperformanceoftheCOTSmatcherisonly58:56%undertheBLUFRprotocolcomparedto90:35%inthestandardLFWprotocol.ThisindicatesthatitismoretosaturatetheperformancemetricsfortheBLUFRprotocolthanthoseofthestandardLFWprotocol;however,aspreviouslydiscussedpracticalapplicationsrequiregoodperformanceatlowFARoperatingpoints.ThedeepmodelsstilloutperformtheCOTSmatcher.Usingcosinesimilarityandasingledeepmodel,ourmethodachievesbetterperformance(85:0%)thantheonein[104],whichindicatesthatourmotothenetworkdesign(usingRGBinput,random10TheoriginalBLUFRprotocolusesVRate(VR).WechangedittoTrueAcceptRate(TAR)forconsistencyinreportingourresults.45cropping,andimprovedfacealignment)helpboosttherecognitionperformance.Ourper-formanceisfurtherimprovedto89:8%whenwefuseninedeepmodels.Inthisexperiment,theJoint-Bayesapproach[12]didnotimproveaccuracy.Intheopen-setrecognitionresults,asingledeepmodelachievesasigntlyDIRat1$FAR(55:90%)thantheresultof28:90%reportedin[104],aswellastheCOTSmatcher(36:44%).2.5.3IJB-AEvaluationTheIJB-Adatabase[48]wasreleasedin2015inanattempttopushthefrontiersofun-constrainedfacerecognition.GiventhatrecognitionperformanceontheLFWdatabasehassaturatedunderthestandardprotocol,theIJB-Adatabasewasconstructedtocontainmorechallengingfaceimages.Thefundamentalindatacollectionstrategiesbe-tweenIJB-AandLFWisthatwhileLFWwasconstructedoutofimagesthatcouldbeautomaticallydetectedbytheViola-Jonesfacedetector,IJB-Awasconstructedwithman-uallylocalizedimages(leveragingAmazonMechanicalTurk).WhilesomeimageinIJB-AaresimilarinqualitytoLFWimages,othersexhibitextremedegreesofpose(forexample,faceimageswithonlyoneeyevisible)thatcannotbemanuallylocatedbystandardfrontalfacedetectors,someimagesfromIJB-AareshowninFigure2.7.Intermsofeval-uation,IJB-Aspbothvandiden(openandclosesets)protocols.Thebasicprotocolconsistsof10-foldcross-validationonsplitsofthedatabase,withadisjointtrainingsetforeachsplit.OneuniqueaspectoftheIJB-Aevaluationprotocolisthatit\templates,"consist-ingofoneormoreimages(stillimagesorvideoframes),andset-to-setcomparisons,ratherthanface-to-facecomparisons,asshowninFig.2.8.Inparticular,intheIJB-Aevalu-ationprotocolthenumberofimagespertemplaterangesfromasingleimagetoamaximumof202images.Boththesearchtask(1:Ncomparisons)andvtask(1:1comparison)areintermsofcomparisonsbetweentemplates(consistingofseveralfaceimages),ratherthansinglefaceimages.46ThevtionprotocolinIJB-Aconsistsof10setsofcomparisonsbetweentemplates(groupsofimages).Eachsetcontainsabout11;748pairsoftemplates(1;756genuineplus9;992impostorpairs),onaverage.Forthesearchprotocol,whichevaluatesbothclosed-setandopen-setperformance,10correspondinggalleryandprobesetsarewithboththegalleryandprobesetsconsistingoftemplates.Ineachsearchfold,thereareabout1;187genuineprobetemplates,576impostorprobetemplates,and112gallerytemplates,onaverage.Figure2.7ExamplesofwebimagesintheIJB-Adatabasewithoverlayedlandmarks(toprow),andthecorrespondingalignedfaceimages(bottomrow);(a)exampleofawell-alignedimageobtainedusingautomaticallydetectedlandmarksbyDLIB[44];(b),(c),and(d)examplesofpoorly-alignedimageswith3,2,and0ground-truthlandmarksprovidedinIJB-A,respectively.DLIBfailstooutputlandmarksfor(b)-(d).Theimagesinthetoprowhavebeencroppedaroundtherelevantfaceregionsfromtheoriginalimages.GivenanimageorvideoframefromtheIJB-Adatabase,weattempttoautomaticallydetect68faciallandmarkswithDLIB.Ifthelandmarksaresuccessfullydetected,wealignthedetectedfaceusingthealignmentmethodproposedinSection2.3.Wecalltheimageswithautomaticallydetectedlandmarkswell-alignedimages.Ifthelandmarkscannotbe47automaticallydetected,asisthecaseforfacesorwhenonlythebackoftheheadisshowing(Fig.2.7),wealignthefacebasedontheground-truthlandmarksprovidedwiththeIJB-Aprotocol.Thegroundtruthlandmarksconsistofthelefteye,righteye,andnosetip,butsincethesepointsarenotvisibleineveryimage,landmarkswhicharenotclearlyvisibleareomitted.Forexample,infacesexhibitingahighdegreeofyaw,onlyoneeyeistypicallyvisible,sotheothereyewillnotbeincludedinthegroundtruthlandmarks.Ifallthethreelandmarksareavailable,weestimatethemouthpositionandalignthefaceimagesusingthealignmentmethodinSection2.3;otherwise,wedirectlycropasquarefaceregionusingtheprovidedground-truthfaceregion.Wecallimagesforwhichautomaticlandmarkdetectionfailspoorly-alignedimages.Fig.2.7showssomefacealignmentexamplesintheIJB-Adatabase.TheIJB-Aprotocolallowstrainingforeachfold.SincetheCASIAdatabasewascol-lectedundersimilarconditionstoLFW(i.e.iscomprisedofautomaticallydetectedfaces),itcannotmatchthedegreeofposevariationpresentinIJB-A.Tocompensateforthisdif-ference,weretrainourdeepmodelusingtheIJB-Atrainingsetforeachfold.ThefacerepresentationsconsistsofaconcatenationofthedeepfeaturesfrometdeepmodelstrainedjustontheCASIAdatabase,andonere-trainedontheIJB-Atrainingsetforthecurrentfold.WethenusePrincipalComponentAnalysis(PCA)toreducethedi-mensionalityofthecombinedfacerepresentationto100,whichisthelowestvaluewithoutperformancereductionoverthetrainingset,reducingthedimensionalityinthiswasbothincreasestemplatecomparisonspeed,anddecreasestheamountofstoragerequiredforeachtemplatebyafactorofapproximately3:2.SincealltheIJB-Acomparisonsarebetweensetsoffaces,weneedtodetermineanappropriateset-to-setcomparisonmethod.Ourset-to-setcomparisonstrategyde-terminesifthereareoneormorewell-alignedimagesinatemplate.Ifso,weonlyusethewell-alignedimagesforthesetcomparison;wecallthecorrespondingtemplateswell-alignedtemplates.Otherwise,weusethepoorly-alignedimages,callingthecorrespondingtemplates48poorly-alignedtemplates.Thepairwiseface-to-facesimilarityscoresarecomputedusingthecosinesimilarity,andtheaveragescoreovertheselectedsubsetofimagesistheset-to-setsimilarityscore.Table2.5RecognitionaccuraciesundertheIJB-Aprotocol.ResultsforGOTSandOpenBRaretakenfrom[48].Resultsreportedaretheaveragestandarddeviationoverthe10foldsspintheIJB-Aprotocol.EvaluationCriterionGOTSOpenBRDCNNallProposed(V10:0%62:7%1:2%43:3%0:6%94:7%1:1%89:3%1:4%TARatFAR:1:0%40:6%1:4%23:6%0:9%78:7%4:3%72:9%3:5%0:1%19:8%0:8%10:4%1:4%N/A51:0%6:1%(Closed-SetSearch)Rank-559:5%2:0%37:5%0:8%94:3%1:7%93:1%1:4%CMC?:Rank-144:3%2:1%24:6%1:1%86:0%2:3%82:2%2:3%(Open-SetSearch)10%76:5%3:3%85:1%2:8%N/A39:2%2:7%FNIRatFPIRy:1%95:3%2:4%93:4%1:7%N/A61:5%4:6%?CumulativeMatchCharacteristic(CMC)computesthefractionofgenuinesamplesretrievedatorbelowasprank.yFalsePositiveIdentiRate(FPIR)isthefractionofimpostorprobeimagesacceptedatagiventhreshold,andFalseNegativeIdenRate(FNIR)isthefractionofgenuineprobeimagesrejectedatthesamethreshold.Keyresultsoftheproposedmethod,alongwiththebaselineresultsreportedin[48]andDCNN[13]areshowninTable2.5.Ourdeepnetworkbasedmethodperformstlybetterthanthetwobaselinesatallevaluatedoperatingpoints,andslightlyworsethanDCNN[13].DCNNusesasimilarnetworkstructureandthesametrainingdatabaseasourdeepmodel;however,itincorporatestherecentlyproposedparametriclinearunit(PReLu)[30],insteadofthelinearunit(ReLu)[50]usedinourdeepmodel.Thisindicatesthattheperformanceofourdeepmodelcouldalsobefurtherimprovedusingupdatednetworkarchitectures.Fig.2.8showsfacesearchresultsfortwoprobetemplates,onewhererank-1retrievalissuccessfulandtheotherwhererank-1retrievalisnotsuccessful.Atemplatecontainingasinglepoorly-alignedimageismuchhardertorecognizethanthetemplatescontainingoneormorewell-alignedimages.Fig.2.9showsthedistributionofwell-alignedimagesandpoorly-alignedimagesinprobetemplates.Comparedtothedistributionofpoorlyaligned49ProbeTemplateRetrievedtemplates,IJB-Aclosed-set1:NsearchprotocolRank-1Rank-2Rank-3Rank-4TemplateID:234#Images=2TemplateID:226#Images=34TemplateID:5754#Images=10TemplateID:234#Images=27TemplateID:234#Images=42TemplateID:232#Images=1TemplateID:5750#Images=15TemplateID:599#Images=49TemplateID:226#Images=34TemplateID:724#Images=47TemplateID:414#Images=1TemplateID:2176#Images=68TemplateID:3779#Images=4TemplateID:2572#Images=4TemplateID:410#Images=6Figure2.8ExamplesoffacesearchinthefoldoftheIJB-Aclosed-setsearchproto-col,using\templates."Thecolumncontainstheprobetemplates,andthefollowing5columnscontainthecorrespondingtop-4rankedgallerytemplates,whereredtexthighlightsthecorrectmatedgallerytemplateandthenumberoffacesinthecorrespondingtemplateisdenotedwith#.Thereare112gallerytemplatesintotal;onlyasubset(atmostfour)oftheimagesforeachtemplateareshown.50Figure2.9Distributionofwell-alignedtemplatesandpoorly-alignedtemplatesin1:NsearchprotocolofIJB-A,averagedover10folds.CorrectMatch@Rank-1meansthatthematedgallerytemplateiscorrectlyretrievedatrank1.Landmarksinwell-alignedimagescanbeautomaticallydetectedbyDLIB[44].Poorly-alignedimagesmainlyconsistofviewsoffaces.Wealigntheseimagesusingthethreeground-truthlandmarkswhenavailable,orelsebycroppingtheentirefaceregion.templatesintheoveralldatabase,wefailtorecognizeadisproportionatenumberoftemplatescontainingonlypoorly-alignedfaceimages.2.6ConclusionsWedemonstratetheperformanceofour320-dimensionaldeepfeaturesonthreefacerecogni-tiondatabases,ofincreasingy:thePCSOmugshotdatabase,theLFWunconstrainedfacedatabase,andtheIJB-Adatabase.Onthemugshotdata,deepfeatureperformance(TARof93:5%atFARof0:01%)isworsethanaCOTSmatcher(98:5%),butfusingourdeepfeatureswiththeCOTSmatcherdoesimprovetheoverallperformance(99:2%).OurperformanceonthestandardLFWprotocol(98:20%accuracy)iscomparabletostate-of-the-artaccuraciesreportedintheliterature.OntheBLUFRprotocolfortheLFWdatabaseweattainedaTARof88:03%atFARof0:1%,improvingovertheresultsreportedin[104](althoughthisresultwasinturnoutstrippedbyChengatal.[15](with93:05%VRat0:1%FAR).Ourdeepfeaturesoutperformthebenchmarksreportedin[48]ontheIJB-Adatabase,asfollows:TARof51:0%atFARof0:1%(vRank1retrievalof82:2%(closed-set51search);FNIRof61:5%atFPIRof1%(open-setsearch);however,subsequentresultsonIJB-Ahaveexceededourinitialworkherereportingupto82:2%TARat1%FARinveri-arank-onerecognitionrateof88%inclosed-setsearch,andaTPIRof36%at1%FPIRinopen-setsearch11.Giventheresultsweattainedonrelativelysmall-scaleproblems,itisinterestingtoconsiderhowwelltheywouldholduponlargerscalefacerecognitionprob-lems,particularlyifgoodidenperformanceontheBLUFRprotocolcantranslatetostrongsearchresultsformuchlargerdatabases.11https://www.nist.gov/programs-projects/face-challenges52Chapter3FaceSearchatScale3.1IntroductionSocialmediahasbecomepervasiveinoursociety.Onepopularaspectofsocialmediaisthesharingofpersonalphotographs.Facebook,ina2013whitepaper,revealedthatitsusershaveuploadedmorethan250billionphotos,andareuploading350millionnewphotoseachday1.Toenableautomatictaggingoftheseimages,accurateandrobustfacerecognitioncapabilitiesareneeded.Givenanuploadedphoto,FacebookandGoogle'stagsuggestionsystemsautomaticallydetectfacesandthensuggestpossiblenametagsbasedonthesimilaritybetweenfacialtemplatesgeneratedfromtheinputphotoandpreviouslytaggedphotographsintheirdatabases.Inthelawenforcementdomain,theFBIplanstoincludeover50millionphotographsinitsNextGenerationIden(NGI)database2,withthegoalofprovidinginvestigativeleadsbysearchingthegalleryforimagessimilartoasuspect'sphoto.Bothtagsuggestioninsocialnetworksandsearchingforasuspectincriminalinvestigationsareexamplesoffacesearchatscale(Fig.3.1).Weaddressthelarge-scalefacesearchprobleminthecontextofsocialmediaandotherwebapplicationswherefaceimagesaregenerallyunconstrainedintermsofpose,expression,andillumination[11,101].1http://phys.org/news/2016-01-facebook.html2goo.gl/UYlT8p53Figure3.1Outlineofthelarge-scalefacesearchproblem.Amajorfocusinfacerecognitionhasbeentoimproveunconstrainedfacerecognitionaccuracy,particularlyontheLabeledFacesintheWild(LFW)benchmark[35].But,theproblemofscaleinfacerecognitionhasnotbeenadequatelyaddressed3.ItisnowgenerallyagreedthatthesmallsizeoftheLFWdatabase(13;233imagesof5;749subjects),aswellstheIJB-Adatabase(25;813imagesof500subjects),andthelimitationsintheLFWprotocoldonotaddressthetwomajorchallengesinlarge-scalefacesearch:(i)lossinsearchaccuracy,and(ii)increaseincomputationalcomplexitywithincreasegallerysize.Thetypicalapproachtoscalability(usedine.g.content-basedimageretrieval[101])istorepresentobjectswithfeaturevectorsandemployanindexingorapproximatesearchschemeinthefeaturespace.Avastmajorityoffacerecognitionapproaches,irrespectiveoftherepresentationscheme,areultimatelybasedonlengthfeaturevectors,soemployingfeaturespacemethodsisfeasible.However,sometechniquesforimprovingfacerecognitionaccuracy,suchaspairwisecomparisonmodels(e.g.Joint-Bayes[12]),arenotcompatible3OurpreliminaryworkonthistopicappearedintheProc.IEEEInternationalConferenceonBiometrics(ICB),Phuket,June2015[96].Atechnicalreportdescribingthisworkappearedin[97]54withfeaturespaceapproaches.Additionally,mostCOTSfacerecognitionSDKspairwisecomparisonscoresbutdonotrevealtheunderlyingfeaturevectors,sotheyarealsoincompatiblewithfeature-spaceapproaches.Therefore,usingafeaturespacebasedapproximationmethodalonemaynotbeentforlarge-scalesearch.Toaddressthebetweensearchperformanceandsearchtimeatscale(80Mfaceimagesusedhere),weproposeacascadedfacesearchframework(Fig.3.2).Inessence,wedecomposethesearchproblemintotwosteps:(i)afaststep,whichusesanapproximationmethodtoreturnashortcandidatelist,and(ii)are-rankingstep,whichre-ranksthecandidatelistwithaslowerpairwisecomparisonoperation,resultinginamoreaccuratesearch.Thefaststeputilizestgedeepconvolutionalnetwork(ConvNet)discuessedinthepreviouschapter,whichisantimplementationofthearchitecturein[104],withproductquantization(PQ)[42]tospeedupsearch.Forthere-rankingstep,aCOTSfacematcher(oneofthetopperformersinthe2014NISTFRVT[27])isused.Themaincontributionsofthischapterareasfollows:Alarge-scalefacesearchsystem,leveragingthedeepnetworkrepresentationcombinedwithastate-of-the-artCOTSfacematcherinacascadedscheme.ThelargestfacesearchexperimentsconductedtodateontheLFW[35]andIJB-A[48]benchmarks,withan80Mimagegallery.Asearchspeedof6:6secondsforasingleprobeface,againstan80Mfaceimagegallery.UsingfaceimagesoftheTsarnaevbrothersinvolvedintheBostonMarathonbombingasqueries,weshowthatDzhokharTsarnaev'sphotocouldbeidenatrank8whensearchingagainstthe80Mgallery.55Table3.1AsummaryoffacesearchsystemsreportedintheliteratureAuthorsProbeGalleryDatabase#Images#Subjects#Images#SubjectsWuetal.[101]220N/A1M+N/ALFW[35]+aChenetal.[11]1201213;1135;749LFW[35]4;3004354;497200[51]Milleretal.[62]4;000801M+N/AFaceScrub[65]+Yietal.[103]1;195N/A201;196N/AFERET[74]+Yanetal.[102]16;028N/A116;028N/AFRGC[73]+Klareetal.[47]840840840840LFW[35]25;00025;00025;00025;000PCSO[47]Best-Rowdenetal.[46]10;0905;1533;143596LFW[35]Liaoetal.[55]8;7074;2491;0001;000LFW[35]ProposedSystem7;3705;50780M+N/ALFW[35]+14;8684;50080M+N/AIJB-A[48]+aDatabasesmarkedwith\+"augmentalabeleddatabasewithtsetsofunlabeledimagesharvestedfromtheinternet.Theunlabeledimagesusedaretacrosspaper.3.2RelatedWorkFacesearchhasbeenextensivelystudiedinmultimediaandcomputervisionliterature[39].Earlystudiesprimarilyfocusedonfacescapturedunderconstrainedconditions,e.g.theFERETdatabase[74].However,duetothegrowingneedforstrongfacerecognitionca-pabilityinthesocialmediacontext,ongoingresearchisfocusedonfacescapturedundermorechallengingconditionsintermsoflargevariationsinpose,expression,illuminationandaging,similartoimagesintheLFW[35]andIJB-A[48]databases.Thethreemainchallengesinlarge-scalefacesearchare:i)facerepresentation,ii)approx-imatek-NNsearch(neededforattaininghighsystemthroughput),andiii)galleryselectionandevaluationprotocol.Forfacerepresentation,featureslearnedfromdeepconvolutionalnetworks(deepfeatures)havebeenshowntosaturateperformanceonthestandardLFWevaluationprotocol4.ThebestaccuracytodateonLFWisreportedbyBaidu[56]whichis99:77%;itleverages70deeplearningmodelstrainedon1:2Mimagesof18Kindividuals.Acomparableresult4http://vis-www.cs.umass.edu/lfw/results.html56(99:63%)wasachievedbyGoogle[77]usingadeepmodelandatrainingsetwithover150Mimagesof8Msubjects.Ithasevenbeenreportedthatdeepfeaturesexceedthehumanfacerecognitionaccuracy(99:20%[51])ontheLFWdatabase(althoughthereportedrencemaynotbestatisticallyt).Inordertorecognizefacesinwebdownloadedimages,wealsoadoptadeepConvNetbasedfacerepresentationbyimprovingthearchitectureoutlinedin[104].Givenourgoalofusingdeepfeaturestoalargegallerytoasmallsetofcandidatefaceimages,weuseanapproximatek-NNsearchmethodtoimprovescalability.Therearethreemainapproachesforapproximatefacesearch:InvertedIndexing.Followingthetraditionalbag-of-wordsrepresentation,Wuetal.[101]designedacomponent-basedlocalfacerepresentationforinvertedindexing.Theysplitalignedfaceimagesintoasetofsmallblocksaroundthedetectedfaciallandmarksandthenquantizedeachblockintoavisualwordusinganidentity-basedquantizationscheme.Thecandidateimageswereretrievedfromtheinvertedindexofvisualwords.Chenetal.[11]improvedthesearchperformancein[101]byleveraginghumanattributes.Hashing.Yanetal.[102]proposedaspectralregressionalgorithmtoprojectfacialfeaturesintoadiscriminativespace;acascadedhashingscheme(similarityhashing)wasusedfortsearch.Wangetal.[95]proposedaweaklabelregularizedsparsecodingtoenhancefacialfeaturesandadoptedtheLocality-SensitiveHash(LSH)[22]toindexthegallery.ProductQuantization(PQ).Unlikeinvertedindexingandhashingwhichrequireindexvectorstobestoredinmainmemory,PQ[42]isacompactdiscreteencodingmethodthatcanbeusedeitherforexhaustivesearchorinvertedindexingsearch.Inthiswork,weadoptproductquantizationforfast57Intheliterature,facesearchsystemshavemainlybeenevaluatedundertheclosed-setprotocol(Table3.1),whichassumesthatthesubjectintheprobeimageispresentinthegallery.However,inmanylargescaleapplications(e.g.,surveillanceandwatchlistscenarios),open-setsearchprotocol,wheretheprobesubjectmaynotbepresentinthegallery,ismoreappropriate.Recognizingthis,severalnewprotocolsforunconstrainedfacerecognitionbasedontheLFWdatabasehavebeenproposed,includingtheopen-setidenprotocol[4]andtheBenchmarkofLarge-scaleUnconstrainedFaceRecognition(BLUFR)protocol[55].However,evenforthesetwoprotocols,thegallerysizesarefairlysmall(afewthousandimages),duetotheinherentsmallsizeoftheLFWdatabase.EventheIJB-Adatabase,whichincludesapproximatelytwiceasmanyimagesasLFW(whencountingvideoframes),isstilltoosmallforevaluatingtrulylarge-scalefacesearchperformance.Table3.1showsthatthelargestfacegalleryreportedintheliteraturetodateisabout1M,whichisnotevenclosetobeingarepresentativeofsocialmediaandforensicapplications.Totacklethesetwolimitations,weevaluatetheproposedcascadedfacesearchsystemwithan80Mfacegallery5underbothclosed-setandopen-setprotocols.3.3FaceSearchFrameworkGivenaprobeimage,afacesearchsystemaimstothetop-kmostsimilarfaceimagesinthegallery.Tohandlelargegalleriescontainingtensofmillionsofimages,weproposeacascadedfacesearchstructuresimilarto[103,107],designedtospeedupthesearchprocesswhileachievingacceptableaccuracy.Figure3.2outlinestheproposedfacesearcharchitectureconsistingofthreemainsteps:i)templategenerationmodulewhichextractsfeaturesfortheNgalleryfacesaswellasfortheprobeface(online),usingthepreviouslydiscusseddeepCNNbasedfacerepresentation;ii)facemodulewhichcomparestheproberepresentationagainst5Wegotthelinkstothesewebimagesfromatresearchcollaborator.Ourapproachwillalsoworkonalargergallerysize.58Figure3.2Illustrationoftheproposedlarge-scalefacesearchsystem.thegalleryrepresentationsusingproductquantizationtoretrievethetop-kmostsimilarcandidates(k˝N);and(iii)re-rankingmodulewhichfusessimilarityscoresofdeepfeatureswithscoresfromaCOTSfacematchertogenerateaneworderingofthekcandidates.Thesemodulesarediscussedindetailintheremainderofthissection.3.3.1FaceFilteringGivenaprobefaceIandatemplategenerationfunctionF,thetop-kmostsimilarfacesCk(I)inthegalleryGisformulatedasfollows:Ck(I)=Rankk(fS(F(I);F(Ji))jJi=1;2;:::;N2Gg)(3.1)whereNisthesizeofgalleryG,SisafunctionwhichmeasuresthesimilarityoftheprobefaceIandthegalleryimageJi,andRank()isafunctionthatthetop-klargestvaluesinanarray.ThecomputationalcomplexityofnaefacecomparisonfunctionsislinearwithrespecttothegallerysizeNandthefeaturedimensionalityd.Toaddresslarge-scale59search,approximatenearestneighbor(ANN)algorithms,whichimproveruntimewithoutatlossinaccuracy,havebecomepopular.HashingbasedalgorithmsusecompactbinaryrepresentationstoconductanexhaustivenearestneighborsearchinHammingspace.Althoughmultiplehashtables[22]cansig-tlyimproveperformanceandreducedistortion,theirperformancedegradesquicklywithincreasinggallerysizeinfacerecognitionapplications.Productquantization(PQ)[42],wherethefeaturetemplatespaceisdecomposedintoaCartesianproductoflowdimensionalsubspaces(eachsubspaceisquantizedseparately)hasbeenshowntoachieveexcellentsearchresults[42].Detailsofproductquantizationusedinourimplementationaredescribedbelow.Undertheassumptionthatthefeaturedimensionalityisamultipleofm,wheremisaninteger,anyfeaturevectorx2Rdcanbewrittenasaconcatenation(x1;x2;:::;xm)ofmsub-vectors,eachofdimensiond=m.Inthei-thsubspaceRd=m,givenasub-codebookCi=fcij=1;2;:::;zjcij2Rd=mg,wherezisthesizeofcodebook,thesub-vectorxicanbemappedtoacodewordcijinthecodebookCi,withjastheindexvalue.Theindexjcanthenberepresentedbyabinarycodewithlog2(z)bits.Inoursystem,eachcodebookisgeneratedusingthek-meansclusteringalgorithm.Givenallthemsub-codebooksfC1;C1;:::;Cmg,theproductquantizeroffeaturetemplatexisq(x)=(q1(x1);:::;qm(xm))whereqj(xj)2Cjisthenearestsub-centroidofsub-vectorxjinCj,forj=1;2;:::;m,andthequantizerq(x)requiresmlog2(z)bits.Givenanotherfeaturetemplatey,theasymmetricsquaredEuclideandistancebetweenxandyisapproximatedbyD(y;x)=kyq(x)k2=mXj=1kyjqj(xj)k2whereqj(xj)2Cj,andthedistanceskyjqj(xj)karepre-computedforeachsub-vectorofyj;j=1;2;:::;mandeachsub-centroidinCj;j=1;2;:::;m.Sincethedistancecompu-60tationrequiresO(m)lookupandaddoperations[42],approximatenearestneighborsearchwithproductquantizersisfast,andtlyreducesthememoryrequirementswithbinarycoding.Tofurtherreducethesearchtime,anon-exhaustivesearchschemewasproposedin[42]and[43]basedonaninvertedsystemandacoarsequantizer;thequeryimageisonlycomparedagainstaportionoftheimagegallery,basedonthecoarsequantizer.However,wefoundthatnon-exhaustivesearchsigntlyreducesfacesearchperformancewhenusedwiththeproposedfeaturevector.Twoimportantparametersinproductquantizationarethenumberofsub-vectorsmandthesizeofthesub-codebookz,whichtogetherdeterminethelengthofthequantizationcode:mlog2z.Typically,zissetto256.Totheoptimalm,weempiricallyevaluatesearchaccuracyandtimeperqueryforvariousvaluesofm,basedona1millionfacegalleryandover3;000queries.Wenoticedthattheperformancegapbetweenproductquantization(PQ)andbruteforcesearchbecomessmallwhenthelengthofthequantizationcodeislongerthan512bits(m=64).Consideringsearchtime,thePQ-basedapproximatesearchisanorderofmagnitudefasterthanthebruteforcesearch.Asabetweenandeness,wesetthenumberofsub-vectorsmto64;thequantizationcodeis64log2(256)=512bitslong.Althoughweuseproductquantizationtocomputefacesimilarityscores,wealsoneedtopickadistanceorsimilaritymetric.Weevaluatedcosinesimilarity6,L1distance,andL2distanceusinga5Mgallery.Thecosinesimilarityachievesthebestperformanceamongthosethreemetrics,althoughthenormalizedL2distancehasidenticalperformance.3.3.2Re-RankingGiventheshortcandidatelist,there-rankingmoduleaimstoimprovesearchaccuracybyusingadditionalfacematcherstore-rankthecandidatelist.Inparticular,givenaprobeface6https://en.wikipedia.org/wiki/Cosinesimilarity61Ianditstop-kmostnearestfacesreturnedfromthemodule(denotedwithCk(I)),thekcandidatefacesarere-rankedbyfusingthesimilarityscoresfromLtmatchers.There-rankingmoduleisformulatedas:Re-Rank(fFusion(Sj=1;:::;L(I;Ji))jJi=1;:::;k2Ck(I)g)(3.2)whereSjisthej-thmatcher,andRe-Rankfunctionsortstop-ksamplesinthedescendingorder.Tomakeoursystemsimpleyete;wesetL=2andgeneratethesimilarityscoreusingsum-rulefusion[46]betweencosinesimilarityscorescomputedfromthelearneddeepfeaturesandthesimilarityscoresgeneratedbytheCOTSfacematcher7.Toreducetheofscaleinsum-rulefusion,weadoptz-scorenormalization[37]overthetop-ksimilarityvaluesforeachfacematcher,respectively.ThemainbofcombiningthesimilaritiesderivedfromdeepfeaturesandscoresoutputbyaCOTSmatcheristoutilizethestrengthoftwotfacerepresentations.WenoticedthatthesetofimpostorfaceimagesthatareincorrectlyassignedhighsimilarityscoresbydeepfeaturesandtheCOTSmatcherdonotoverlap,suggestingtheirrepresen-tationsarecomplementary,whichisnecessaryforthesuccessoffusion[46].SinceCOTSmatchersarewidelydeployedinmanyrealworldapplications[27],theproposedcascadefusionschemecanbeeasilyintegratedinexistingapplicationstoimprovebothscalabilityandperformance.3.3.2.1ImpactofCandidateSetSize(k)Intheproposedcascadedfacesearchsystem,thesizeofcandidatelistkisakeyparameter.Ingeneral,weexpecttheoptimalvalueofktoberelatedtothegallerysizeN(alargergallerywouldrequirealargercandidatelisttomaintaingoodsearchperformance).WeevaluatetherelationshipbetweenkandNbycomputingthemeanaverageprecision(mAP)7WeenforcetheCOTSmatchertocomparetwofaceimagesdirectlywithoutusinganymetadata.62asthegallerysize(N)isincreasedfrom100Kto5Mandthesizeofcandidatelist(k)isincreasedfrom50to500K.(a)Impactofcandidatesetsizek(b)VariousfusionstrategiesFigure3.3(a)Impactofcandidatesetsize(k)asafunctionofthegallerysize(N)onthesearchperformanceasmeasuredintermsofMeanAveragePrecision(mAP).RedpointsmarktheoptimalvalueofkforrentvaluesofN.(b)Comparisonoffusionstrategiesbasedona1Mgalleryand3Kprobes.Fig.3.3(a)showsthatthesearchperformance,asexpected,decreaseswithincreasinggallerysize.Further,foraN,searchperformanceinitiallyincreases,thendropswhenkgetstoolarge.TheoptimalcandidatesetsizekscaleslinearlywiththegallerysizeN.BecausetheplotsinFig3.3(a)outforlargek,anearoptimalvalueofk(e.g.,k=0:01N)candrasticallyreducethecandidatelistwithaverysmalllossinaccuracy.3.3.2.2FusionMethodAnotherimportantissueisthechoiceoffusionschemetocombinesimilarityscoresfromdeepfeatures(DF)andCOTS.Weempiricallyevaluatedthefollowingfourfusionstrategies:DF+COTS:FusionofsimilarityscoresbasedondeepfeaturesandtheCOTSmatcher,withoutanyDF!COTS:Filterthegalleryusingdeepfeatures,thenre-rankthecandidatelistbasedonfusionofsimilarityscoresfromdeepfeaturesandtheCOTSmatcher.63DF!COTSonly:OnlyusethesimilarityscoresofCOTSmatchertorankthekcan-didatefacesoutputbydeepfeatures.DF!COTSrank:Filterthegalleryusingdeepfeatures,andrankkcandidatefacesusingsimilarityscoresofdeepfeaturesandtheCOTSmatcher,respectively,andthen,combinetworankedlistsusingrank-levelfusion.ThisisusefulwhentheCOTSmatcherdoesnotreportsimilarityscores.Tokeeptheevaluationtractable,tfusionmethodswereevaluatedusingasubsetofabout3Kprobes,anda1Mfacegallery.Theaverageprecisionvs.averagerecallcurvesofthesefourfusionstrategiesareshowninFig.3.3(b).Asabaseline,wealsoshowthesearchperformancesofusingDFandCOTSalone.Thefusionscheme(DF!COTS)consistentlyoutperformstheotherfusionmethods,aswellassimplyusingDFandCOTSalone.Notethatomittingthestepleadstopoorsearchresultscomparedtothecascadedapproach,whichisconsistentwithresultsintheprevioussection:whenkapproachesN,thesearchaccuracydecreases.3.4Large-scaleFaceSearchInthissection,weevaluateourfacesearchsystemusingan80Mgallery.ThetestdatabasesweuseconsistofLFWandIJB-Aimages.Weusetheimagestoconstructthematedportionofasearchdatabasewithanextendedgallery,ratherthanfollowingthestandardprotocolsforthosedatabases.Wereportsearchresultsunderbothopen-setandclosed-setprotocolswithincreasinggallerysize,upto80Mfaces.Weevaluatethefollowingthreefacesearchschemes:DeepFeatures(DF):Useourdeepfeatureswithproductquantization(PQ)todirectlyretrievethetop-kmostsimilarfacestotheprobe(nore-rankingstep).64COTS:Useastate-of-the-artCOTSfacematchertocomparetheprobeimagewitheachgalleryface,andoutputthetop-kmostsimilarfacestotheprobe(nolteringstep).DF!COTS:Firstthegalleryusingdeepfeatures.Next,re-rankthetop-kcandidatefacesbyfusingcosinesimilaritiescomputedfromdeepfeatureswiththeCOTSmatcher'ssimilarityscoresforthekcandidatefaces.Forclosed-setfacesearch,weassumethattheprobealwayshasatleastonecorrespondingfaceimageinthegallery.Foropen-setfacesearch,givenaprobewedecidewhetheracorrespondingimageispresentinthegallery.Ifitisdeterminedthattheprobe'sidentityisrepresentedinthegallery,thenwereturnthesearchresults.Foropen-setperformanceevaluation,theprobesetconsistsoftwogroups:i.genuineprobesetthathasmatedimagesinthegallery,andii.impostorprobesetthathasnomatedimagesinthegallery.3.4.1SearchDatabaseTable3.2Large-scalewebfacesearchdatabaseoverview.Source#Subjects#ImagesTrainingSetCASIA[104]10,553404,992LFWbasedprobeandmatesetsGenuineProbeSetLFW[35]1,5073,370MateSetLFW[35]1,5073,845IJB-AbasedprobeandmatesetsGenuineProbeSetIJB-A[48]50010,868MateSetIJB-A[48]50010,626ImpostorProbeSetLFW[35]4,0004,000BackgroundSetWeb-FacesN/A80,000,000Weconstructalarge-scalesearchdatabaseusingfourfacedatabases.Thedatabaseconsistsofeparts,asshowninTable3.2:1)trainingset,whichisusedtotrainourdeepnetwork;2)genuineprobeset,theprobesetwhichhascorrespondinggalleryimages;3)mate65set,thepartofthegallerycontainingthesamesubjectsasthegenuineprobeset;4)impostorprobeset,whichhasnooverlappingsubjectswiththegenuineprobeset;5)backgroundset,whichhasnoidentitylabelsandissimplyusedasbackgroundimagestoenlargethegallerysize.WeusetheLFWandIJB-Adatabasestoconstructthegenuineprobesetandthecorre-spondingmateset.FortheLFWdatabase,weremoveallthesubjectswhohaveonlyasingleimage,leaving1;507subjectswith2ormoreimages.Foreachofthesesubjects,wetakehalfoftheimagesforthegenuineprobesetandusetheremainingimagesforthematesetinthegallery.Werepeatthisprocess10timestogenerate10groupsofprobeandmatesets.Toconstructtheimpostorprobeset,wetake4;000subjectsfromLFW,eachhavingonlyoneimage.FortheIJB-Adatabase,asimilarprocessisadoptedtogenerate10groupsofprobeandmatesets.Tobuildalarge-scalebackgroundset,acrawlerwasusedtodownloadmillionsofwebimagesfromtheInternet,thenthemtoonlyincludethosewithdetectablefacesbyDLIB8.Bycombiningthematesetandbackgroundset,wecomposean80millionimagegallery.MoredetailsareshowninTable3.2.3.4.2DatabaseSegmentationInthesearchexperiments,weuseLFWorIJB-Afortheprobeandmatesets,and80Mwebfacesforthebackgroundset.Althoughallthethreedatabasesconsistofunconstrainedfaceimagesfromtheweb,theyarecollectedfromtsources.Inparticular,LFWandIJB-Aareimagesofpublicoftentakenfrompresscoverageoftheseindividuals(withIJB-Acontainingmorechallengingimages)andWeb-Faceisfromphotosonsocialmediawebsites.Assuch,thetcharacteristicsofthedatabasesmayleadtoasegmentationwhereimagesfromonedatabasemayeasilybedistinguishedfromothersbasedonimageacquisitionproperties,ratherthantheidentitiesofthefacesbeingcompared.Inotherwords,thebackgroundsetshouldhaveasimilardistributiontotheprobesetand8Thelinkstothesewebimageswereprovidedbyatresearchcollaborator.Wedownloadedtherawwebimages,andoutallnon-face-detectablewebimagesusingtheDLIBfacedetector.66themateset,otherwise,theuseofthebackgroundsetwillnotelydemonstratethesearchperformancethatwouldbeseenwithalargegalleryofimageswithmoreuniformproperties.Figure3.4Distributionsofcosinesimilaritiesofthegenuinepairs,within-databaseimpostorpairsandbetween-databaseimpostorpairsforthecombinationsofLFW+Web-Face(left)andIJB-A+Web-Face(right).ToexaminethebetweentheWeb-Faceandlabeleddatabases(LFWandIJB-A),werandomlysample10KimagesfromtheWeb-Facedatabase,andcombinethissubsetwiththeLFWandIJB-Adatabases,respectively.Wethenextractfeaturesusingtheproposeddeepmodel,andcomputeallpairwisecosinesimilarities.WeexaminethedistributionsofthesescoresinFig.3.4,whichplotsthegenuine,within-databaseimpostor,andbetween-databaseimpostorscoredistributionsforboththeLFW+Web-FaceandIJB-A+Web-Facedatabases.WeobservethatforbothLFWandIJB-Adatabasesthedistributionsofcosinesimilaritiesofthewithin-databaseandbetween-databaseimpostorpairshaveatoverlap,butthedistributionofcosinesimilaritiesofthebetween-databaseimpostorpairsisleft-shifted.Thisindicatesthatthee"backgroundgallerysizeissmallerthan80million,sincetypicalimpostorimagesfromthebackgrounddatabasescorerelativelylowerthanimpostorsfromthelabeleddatabases.Weanalyzethisintermsoftheverproblem,byestimatingwhatsizesamplefromthewithin-databaseimpostorwouldresultinthesametotalnumberoffalseacceptsseenfromthecross-databaseimpostorscoredistribution(using67Figure3.5egallerysizefortheaugmentedLFW+Web-Facedatabase.egallerycomputedforaFRRbycountingthenumberofcross-databaseimpostorsoverthreshold,andestimatingthesizeofdatabasefollowingtheobservedwithin-databaseim-postordistributionrequiredtoproduceasimilarnumberoferrors.68theempiricalscoredistributionsdirectly),resultsforvariousFalseRejectRatesareshowninFigure3.5.Forthe1%FalseRejectRateoperatingpoint,asampleofapproximately23millionimagesfollowingtheobservedwithin-LFWimpostorscoredistributionwouldgenerateasmanyfalseaccepterrorsasweregeneratedfromthefull80millionWeb-Facedatabase.ForalowerFRRof.01%,matchingthenumberoffalseacceptsgeneratedfromthecross-databaseimpostordistributionwouldrequireapproximately72millionimagesfollowingthewithin-LFWimpostorscoredistribution.3.4.3PerformanceMeasuresFacesearchaimstoallthematedfacesinthegallery,whichisbroaderthanthetra-ditionalbiometricproblems,e.g.authentication(1:1search)oriden(1:Nsearch).Hence,weevaluateourfacesearchsystemwiththewidelyusedretrievalevaluationmet-rics:precision,thefractionofthesearchsetconsistingofmatedfaceimages,andrecall,thefractionofallmatedfaceimagesforagivenprobefacethatwerereturnedinthesearchresults.Varioustrbetweenprecisionandrecallarepossible(forexample,highrecallcanbeachievedbyreturningalargeresultset,butalargeresultsetwillalsoleadtolowerprecision),sowesummarizetheoverallclosed-setfacesearchperformanceusingmeanAveragePrecision(mAP),whichisalsowidelyusedforsearchsystemevaluation[42].mAPisasfollows:givenasetofnprobefaceimagesQ=fx1q;x2q;:::;xnqgandagallerysetwithNimages,theaverageprecisionofxiqis:avgP(xiq)=NXj=1P(xiq;j)[R(xiq;j)R(xiq;j1)](3.3)whereP(xiq;j)isprecisionatthej-thpositionforxiqandR(xiq;j)isrecallatthej-th69positionforxiq(R(0)=0).ThemeanAveragePrecision(mAP)oftheentireprobesetis:mAP(Q)=mean(avgP(xiq));i=1;2;:::;nWhenthegallerysizeNistoolarge,for,wecomputetheaverageprecisionusingthetop-100Ksearchresults.Intheopen-setscenario,weevaluatesearchperformanceasabetweenmeanaverageprecision(mAP)andfalseacceptrate(FAR)(thefractionofimpostorprobeimageswhicharenotrejectedatagiventhreshold).Givenagenuineprobe,itsaverageprecisionissetto0ifitisrejectedatagiventhreshold,otherwise,itsaverageprecisioniscomputedusingEq.3.3.Foralargebackgroundgallerydatabaselikethe80millionWeb-Faceusedhere,itistoensurethattherearenosubjectswhichoverlapbetweenthequeryandbackgroundsets.Asaresult,inourevaluation,ifoneoftheunlabeledbackgroundimagesisactuallythesamepersonasthequeryimage,weconsideritan\incorrect"searchresult.Inotherwords,whilewecannotguaranteethatnoimagesinthebackgroundsethavethesameidentityasthequeryimage,anysuchimages,ifpresent,willbiasourresultsinthedirectionofloweraccuracy.3.4.4Closed-setFaceSearchWeexamineclosed-setfacesearchperformancewiththegallerysizeNrangingfrom100Kto80M.Enrollingthecomplete80MgalleryintheCOTSmatcherwouldtakeaprohibitiveamountoftime(over80days),duetolimitationsoftheSDKwehave,sothemaxi-mumgallerysetusedfortheCOTSmatcheris5M.FortheproposedfacesearchschemeDF!COTS,wechosethesizeofcandidatesetkusingtheheuristick=1=100Nwhenthegallerysizeissmallerthan5Mandk=1;000whenthegallerysetsizeis80M.Weuseakforthefull80Mgallerysinceusingalargerkwouldtakeaprohibitiveamountoftime,70(a)(b)(c)(d)Figure3.6Exampletop-10searchresultsforLFWimages.Probeimagesareontheleft,thetop-10retrievalresultsareshownin2columnsontheright.(a)arelativelygoodquality(frontal)imageofDonaldRumsfeld,mostofthetop-10areofthecorrectidentity.(b)animageofDonaldRumsfeldexhibitingpartialocclusion,andposevariation.Onlyonecorrectidentityphotographisinthetop-10,howeveracaricatureofDonaldRumsfeldisalsoreturned.(c)AprobeimageofCondoleezzaRice,alltop-10resultsareofthecorrectidentity.(d)AprobeimageofGeorgeH.W.Bush,6ofthetop-10retrievalresultsareofthecorrectidentity.71duetotheneedtoenrolltheimagesintheCOTSmatcher.ExperimentalresultsfortheLFWandIJB-Adatabasesunderclosed-setsearchareshowninFig.3.7.Someexamplesearchresults,usingthefull80Mgalleryareshownin3.6.Closed-setSearchEvaluationonLFWandIJB-AdatabasesFigure3.7Closed-setfacesearchperformance(mAP)vs.gallerysizeN(log-scale),onLFWandIJB-Adatabases.TheperformanceofCOTSmatcheron80Mgalleryisnotshown,sinceenrollingthecomplete80MgallerywiththeCOTSmatcherwouldhavetakenaprohibitiveamountoftime(over80days).ForbothLFWandIJB-Afaceimages,asexpected,therecognitionperformanceofallthreefacesearchschemesevaluatedheredecreaseswithincreasinggallerysetsize.Inparticular,forallthesearchschemes,mAPlinearlydecreaseswiththegallerysizeNonlogscale;theperformancegapbetweena100Kgalleryanda5Mgalleryisaboutthesameastheperformancegapbetweena5Mgalleryandan80Mgallery.Whiledeepfeaturesoutperform72theCOTSmatcheralone,theproposedcascadedfacesearchsystem(whichleveragesbothdeepfeaturesandtheCOTSmatcher)givesbettersearchaccuracythaneithermethodindividually.ResultsontheIJB-AdatabasearesimilartotheLFWresults,exceptforaloweroverallaccuracy.TheloweraccuracyonIJB-AdataistobeexpectedgiventhatIJB-Acontainsmorechallengingfaceimages.3.4.5Open-setFaceSearchOpen-setsearchisimportantforseveralpracticalapplications(e.g.,de-duplication),onecannotassumethatthegallerywillcontainimagesofallpotentialprobesubjects.Weevaluateopen-setsearchperformanceonthe80Mgallery,andplotthesearchperformance(mAP)atvaryingFARvaluesinFigs.3.8.ForboththeLFWandIJB-Adatabases,theopen-setfacesearchproblemismuchharderthanclosed-setfacesearch.AtaFARof1%,thesearchperformance(mAP)ofbothalgorithmsismuchlowerthantheclosed-setfacesearchat80MshowinFig.3.7,indicatingthatalargenumberofgenuineprobeimagesarerejectedatthethresholdneededtoattain1%FAR.3.4.6ScalabilityInadditiontothemAPperformancemeasure,wealsoreportthesearchtimesinTable3.3.WerunalltheexperimentsonaPCwithanIntel(R)Xeon(R)CPU(E5-2687W)clockedat3.10GHz.Forafaircomparison,allthecomparedalgorithmsuseonlyoneCPUcore.ThedeepfeaturesareextractedusingaTeslaK40graphicscard.Inourexperiments,templategenerationfortheentiregalleryisdoneandthegalleryisindexedusingproductquantizationbeforeprocessingtheprobeimages.Galleryimages(upto5M)areenrolledintotheCOTSmatcher.Theruntimeoftheproposedfacesearchsystemafterthegalleryisenrolledandindexedconsistsoftwoparts:i)enrollmenttimeincludingfacedetection,alignmentandfeatureextraction,andii)searchtimeconsisting73Open-setSearchEvaluationonLFWandIJB-ADatabasesFigure3.8Open-setfacesearchperformance(mAP)vs.falseacceptrate(FAR)onLFWandIJB-Adatabases,usingthe80Mfacegallery.TheperformanceofCOTSmatcherisnotshownduetocomputationalissues.FARisshownonlyupto10%sinceoperationalsystemsarenotlikelytooperatebeyondthisvalue.ofthetimetakentothetop-ksearchresultsgiventheprobetemplate.Sincewedidnotenrollall80MgalleryimagesusingtheCOTSmatcher,weestimatethequerytimeforthe80Mgallerybyassumingthatsearchtimeincreaseslinearlywiththegallerysize.Usingproductquantizationforfastmatchingbasedondeepfeatures,wecanretrievethetop-kcandidatefacesinabout0:9secondsfora5Mimagegalleryandinabout6:7secondsforan80Mgallery(includingfeatureextractiontime).Ontheotherhand,theCOTSmatchertakesabout30and480secondstocarryoutbrute-forcecomparisonoverthecompletegalleriesof5and80millionimages,respectively.Intheproposedcascadedfacesearchsystem,wemitigatetheimpactoftheslowexhaustivesearchrequiredbythe74Table3.3Theaveragesearchtime(seconds)perprobefaceandthecorrespondingsearchperformance(mAP).5MFaceGallery80MFaceGalleryCOTSDFDF!COTSCOTSDFDF!COTS@50K@1KEnrollment0.090.050.140.090.050.14Search300.841.15480.0?6.636.64TotalTime30.090.891.29480.1?6.686.88mAP0.360.520.62N/A0.340.40?Estimatedbyassumingthatsearchtimeincreaseslinearlywithgallerysize.COTSmatcherbyonlyusingitonashortcandidatelist.Theproposedcascadedschemetakesabout1secondforthe5Mgalleryandabout6:9secondsforthe80Mgallery,whichisonlyaminorincreaseoverthetimetakenusingdeepfeaturesalone(6:68seconds).Whilethesearchtimecouldbefurtherreducedbyusinganon-exhaustivesearchmethod,itwouldmostlikelyresultinatlossinsearchaccuracy.3.5BostonMarathonBombingCaseStudyInadditiontothelarge-scalefacesearchexperimentsreportedabove,wereportonacase-study:theidentityofBostonmarathonbombingsuspects9inan80Mfacegallery.KlontzandJain[49]madeanattempttoidentifythefaceimagesoftheBostonmarathonbombingsuspectsina1Mgalleryofmugshotimages.Videoframesofthetwosuspectswerematchedagainstabackgroundsetofmugshotsusingtwostate-of-the-artCOTSfacematchers.Fivelowresolutionimages(1a,1b,2a,2b,2c)ofthetwosuspects,releasedbytheFBI(shownintheleftsideofFig.3.9)wereusedasprobeimages,andsiximages(1x,1y,1z,2x,2y,2z)ofthesuspectsreleasedbythemedia(shownintherightsideofFig.3.9)wereusedasthematesinthegallery.Thesematedimageswereaugmentedwith1million9https://en.wikipedia.org/wiki/BostonMarathonbombing75Figure3.9ProbeandgalleryimagesofDzhokharTsarnaevandTamerlanTsarnaev,re-sponsiblefortheApril15,2013Bostonmarathonbombing.Theseimageswerepreviouslystudiedin[49]and[4].Faceimages1aand1barethetwoprobeimagesusedforSuspect1(TamerlanTsarnaev),and1cishissketchimagedrawnbyaforensicsketchartistbasedon1aand1b.Faceimages2a,2band2carethethreeprobeimagesusedforSuspect2(DzhokharTsarnaev).Thegalleryimagesofthetwosuspectsbecameavailableonmediawebsitesfollowingtheidenofthetwosuspectsbasedoninvestigativeleads.Faceimages1x,1yand1zarethethreegalleryimagesforSuspect1andimages2x,2yand2zarethethreegalleryimagesforSuspect2.mugshotimages.OneoftheCOTSmatcherswassuccessfulinthetruemate(2y)ofoneoftheprobeimage(2c)ofDzhokharTsarnaevatrank1.Neitherofthetwoprobeimagesfortheolderbrothercouldretrievethetruematesatareasonablerank.Toevaluatetheperformanceofourcascadedfacesearchsystem,weconstructasimilarsearchproblemundermorechallengingconditionsbyaddingthegalleryimagestoaback-groundsetofupto80millionwebfaces.Inaddition,wealsouseonesketchimage(1cinFig.3.9)oftheolderbrotherastheprobeimage.Wearguethattheunconstrainedwebfacesaremoreconsistentwiththequalityoftheimagesofthesuspectsusedinthegallerythanmugshotimagesandthereforecompriseamoremeaningfulgalleryset.Weevaluatethesearchresultsusinggallerysizesof5Mand80Mleveragingthesamebackgroundsetusedin76Table3.4RankretrievalresultsofthetwoBostonbombersuspectsbasedon5Mand80Mfacegallery.Thesixprobeimagesaredesignatedas1a,1b,1c,2a,2b,and2c.Thesixmatedimagesinthegalleryaredesignatedas1x,1y,1z,2x,2y,and2z.ThecorrespondingimagesareshowninFig.3.9COTS(5MGallery)DeepFeatures(5MGallery)1x1y1z1x1y1z1a2,041,004595,2651,750,309132,613232,2751,401,4741b3,816,8743,688,3682,756,6411,511,1231,153,0361,699,9511c126,217608,899535,81566,427199,0831,529,1692x2y2z2x2y2z2a67,76686,747301,868174,44039,4171058792b352,06248,335865,04371,79526,52584,0132c158,341625515,85193419,975ProposedCascadedFaceSearchSystem2cDF!COTS@1K719,9752cDF!COTS@10K1011,580DeepFeatures(80MGallery)1x1y1z1a2,566,9175,398,45431,960,0911b33,783,36027,439,52644,282,1731c753,6532,408,39229,383,9452x2y2z2a2,461,664875,1681,547,8952b1,417,768972,4111,367,6942c1092,952136,651ProposedCascadedFaceSearchSystem2cDF!COTS@1K462,952136,6512cDF!COTS@10K1608136,651ourpriorsearchexperiments.ThesearchresultsareshowninTable3.4.Consideringtheimagesofsubject1,althoughtheperformanceofdeepfeaturesisbetterthantheCOTSmatcher,boththedeepfeaturesandtheCOTSmatcherreturnthematchinggalleryimagesatexcessivelyhighranksforallthreeprobeimages.Wenoticedthatthesearchperformanceofthesketchimage(1c)ismuchbetterthanthesearchresultsofthetwoprobefacesextractedfromvideoframes(1aand1b).Still,evenforthesketchimage,thebestsearchresultisatruematchatrank66,427onthe5Mgallery.77Forthesecondsubject,resultsarerelativelybetter.Forthe5Mgallery,theCOTSmatcherfoundamate(2y)forprobe2catrank625,whilethedeepfeaturesreturnedgalleryimage2xforprobe2catrank9.Theproposedcascadedsearchsystemreturnedgalleryimage2yatrank1,bycombiningtheCOTSmatcheranddeepfeaturestore-rankthetop1Kortop10Kcandidatefaces,demonstratingthestrengthoftheproposedcascadeframework.Thesearchresultsforprobe2careslightlyworseonthe80Mimagegallery,whichistobeexpected.Usingdeepfeaturesalone,wenowgalleryimage2xatrank109andgalleryimage2yatrank2;952.However,usingthecascadedsearchsystem,weretrievegalleryimage2xatrank46byre-rankingthetop-1Kfaces,andretrievegalleryimage2yatrank8byre-rankingthetop-10Kfaces.So,evenwithan80Mimagegallery,wecansuccessfullyamatchforoneoftheprobeimages(2c)withinthetop-10retrievedfaces.Thefacesearchresultsforthe80MgalleriesareshowninFig.3.10.3.6ConclusionsWehaveproposedacascadedsearchsystemsuitableforlarge-scalefacesearchproblems.Wehavedevelopedadeeplearningbasedfacerepresentationtrainedonthepubliclyavail-ableCASIAdatabase[104].Thedeepfeaturesareusedinaproductquantizationbasedapproximatek-NNsearchtoobtainashortlistofcandidatefaces.Thisshortlistofcandidatefacesisthenre-rankedusingthesimilarityscoresprovidedbyastate-of-the-artCOTSfacematcher.InadditiontotheevaluationsontheLFWandtheIJB-Abenchmarks,weevaluatetheproposedsearchschemeonan80millionfacegallery,andshowthattheproposedschemeanattractivebetweenrecognitionaccuracyandruntime.WealsodemonstratesearchperformanceonanoperationalcasestudyinvolvingthevideoframesoftheTsarnaevbrothersimplicatedinthe2013Bostonmarathonbombing.Inthiscasestudy,theproposedsystemcanoneofthesuspects'imagesatrank1in1secondona5Mgalleryandatrank8in7secondsonan80Mgallery.Giventheseretrievalresults,itis78Figure3.10Top10searchresultsforthetwoBostonmarathonbombersonthe80Mfacegallery.Thethreeprobefaces(1cisasketch)areoftheolderbrother(TamerlanTsarnaev)andthelastthreeprobefacesareoftheyoungerbrother(DzhokharTsarnaev).Foreachprobeface,theretrievedgalleryimagewithgreenborderisthecorrectlyretrievedimage.Imageswiththeredborderare\near-duplicate"imagespresentinthegallery.Notethatwewerenotawareoftheexistenceofthesenear-duplicateimagesinthe80Mgallerybeforethesearch.interestingtoconsidertheperformanceofourfacerepresentationinunsupervisedscenarios,i.e.canenessinsupervisedproblemslikevandfacesearchholdinthelarge-scaleclusteringscenario.79Chapter4Large-ScaleFaceClustering4.1IntroductionAsdiscussedin[68],faceclusteringattemptstoaddressthefollowingproblem:Givenalargenumberofunlabeledfaceimages1,clusterthemintotheindividualidentitiespresentinthisdata.Thissituationisencounteredinanumberoftapplicationscenariosrangingfromsocialmediatolawenforcement,wherethenumberoffacesinthecollectioncanbeoftheorderofhundredsofmillion.Often,theidentitylabelsattachedtothefaceimagesareeithermissingorcontainnoise.Thenumberofclustersortheunknownnumberofidentitiespresentinthefacecollectioncanrangefromafewthousandtohundredsofmillions,leadingtoultiesintermsofbothrun-timeandclusteringquality.Ingeneral,nopriorinformationaboutthetruenumberofclustersinthefacecollectioncanbeassumedtobeavailable.Consideringsocialmedia,Facebookreportedthat350millionimagesareuploadedperdayonaverage2,andofthoseimages,alargenumbermayreasonablybeassumedtobeimagesofpeople.Insocialmediasomeidentityinformationmaybeprovidedviatagging,1Unlabeledfaceimagesareimagesforwhichthetrueidentityofthepersonisunknown,i.e.noidentitylabelispresent.2https://goo.gl/FmzROn80Figure4.1Givenanunlabeledsetoffaceimagesacquirede.g.fromsocialmediaorinthecourseofaforensicinvestigation,wepropose\clusteringbyidentity"asastepinexploringandunderstandingthedatabase.Faceimagesherebelongtotwoindividuals:GeorgeW.Bush(W)andGeorgeH.W.Bush(G).butingeneralthisisincompleteandmaybeinaccurate.Weconsidergroupingfaceimagesintodiscreteidentitiesasonepossibleapproachfororganizingthislargevolumeofdata.Inforensicinvestigations,triaginglarge-scalefacecollectionsisalsoanemergingprob-lem.FewexamplesaremorerelevantthantheBostonMarathonbombing[49],wheretensofthousandsofimagesandvideosneededtobeanalyzedduringatimesensitiveinvesti-gation[82].Othercommoncasesthatrequiretheinvestigationoflargemediacollectionsincludeidentifyingperpetratorsandvictimsinchildexploitationcases3,anunderstandingofwhichindividualsexistinacollectionofsocialmedia(suchasimageryfromgangandterroristnetworks),andorganizingmediacollectionsfromharddrives(personalcomputersorservers).Inbothsocialmedia,andforensicinvestigationsweexpecttheunknownnumberofindividualidentitiespresentinadatabasetobelarge,whichischallengingfromascalabilityperspectivesinceruntimestendtoberelatedtothenumberofclusters.Additionally,weexpectthenumberofimagesperindividualtobeunbalanced(somepeoplemayappearvery3http://www.nist.gov/itl/iad/ig/chexia-face.cfm81often,othersmuchlessfrequently),whichischallengingfore.g.clusteringalgorithmslikek-meanswhichtendtogeneratesimilarsizedclusters.Itcanalsobeassumedthatthequalityofimagesintermsofpose,illumination,occlusion,etc.beingconsideredisrelativelylow,sincesocialmediaimages,imagestakenatpubliceventsetc.arenotgenerallycapturedinthemostfavorableconditionsforfacerecognition.Followingrecentprogressinunconstrainedfacerecognition,weattempttomitigatetheyoftheunderlyingfaceclusteringproblembyusingastate-of-the-artconvolutionalneuralnetworkbasedfacerepresentation[97].Evenusingastrongfacerepresentation,accuracyisnotperfectonvtasks(particularlywhenconsideringdata).Zhuetal.[108]reportedsuccessinclusteringcollectionsofpersonalphotographsusingaRank-Orderclusteringmethodwhichdevelopsadistancemeasurebasedonsharednearest-neighborsoffaceimagesbeingcompared(sincedirectfeaturevector-to-featurevectordistancesmaybeinaccurategiventhecultyofthefacerecognitiontask).However,inadditiontotheproblemofpoorfacequality,largescalefaceclusteringtasks(ontheorderof100millionfaceimages)areinherentlyintermsofscalability(run-time).Wedevelopaversionoftherank-orderclusteringalgorithmofZhuetal.[108]leveraginganapproximatenearestneighbormethodforimprovedscalability,andsimplifyingtheactualclusteringproceduretoachieveimprovedscalabilityandclusteringaccuracy.Weevaluatelarge-scaleclusteringperformancebycombiningthewell-knownLabeledFacesintheWild(LFW)database[35]withupto123Munlabeledimages(downloadedfromtheweb),andclusteringtheaugmenteddatabase.Additionally,consideringthatevenareasonablyaccurateclusteringofatrulylargedatabasemaystillresultintoomanyclusterstobemanuallyinvestigated,weinvestigateper-cluster\internal"qualitymeasures(whichdonotrequireexternallabelsonfaceimages)toidentifyasubsetof\good"clusters(relativelycompactandisolated),formanualexploration.Inadditiontolarge-scaleclusteringonunconstrainedstillfaceimages,weperformpreliminaryinvestigationsofclusteringvideoframesleveragingtheYouTubeFaces(YTF)database[100],clusteringhundredsofthousands82ofvideoframes.Theperceivedcontributionsofthischapterinclude:(i)anupdatedclusteringalgorithm,improvingonthemethodpresentedbyZhuetal.[108]usinganapproximatenearestneighbormethodforimprovedscalability,whichalsoattainsbetterclusteringaccuracy,(ii)large-scalefaceclusteringexperimentsusingastate-of-the-artfacerepresentationlearnedforlargescalesupervisedfacerecognitionbasedondeepnetworks[97],(iii)apreliminaryinvestigationoftheapplicabilityofthepresentedfaceclusteringmethodtovideo,and(iv)ofaper-clusterqualitymeasuresuitableforprioritizingasubsetofclustersoutofmillionsofdetectedclusters.4.2BackgroundTable4.1Asummaryofrelatedstudiesonfaceclustering.PublicationFeaturesClusteringmethod#Faceimages#SubjectsHoetal.[34]Gradientandpixelintensityfea-turesSpectralclustering1,38666Zhaoetal.[105]2DHMM+contextualHierarchicalclustering1,5008Cuietal.[19]LBP,clothingcolor+textureSpectral4005Tianetal.[88]Image+contextualPartialclustering1,14734Zhuetal.[108]Learning-baseddescriptor[10]Rank-order1,32253VidalandFavaro[91]Jointsubspacelearningandclus-tering-y2,43238Ottoetal.[67]Component-basedfeatures,com-mercialfacematcherk-Means,spectral,rank-order1M195,494OursDeepfeatures[97]Approximaterank-order123MUnknownzyInthisworkaalgorithmisusedforrepresentationandclusteringzDuetothenatureofthedatabaseused(faceimagesblindlyharvestedfromtheInternet),wedonotknowthetruenumberofidentities,asisthecaseinpracticalscenarios.4.2.1FaceClusteringTheclusteringproblem,atoolforexploratorydataanalysis,hasbeenwellstudiedinpat-ternrecognition,statistics,andmachinelearningliterature(Jain[40]providesasurvey).Lessstudiedisthechallengingproblemofclusteringfaceimages,especiallywhenboththenumberofimagesandthenumberofclustersareverylarge.Animportantconsiderationinclustering(andclassifying)faceimagesisthatsincethereisnouniversallyagreedupon83facerepresentationordistancemetric,theclusteringresultsdependnotonlyonthechoiceofclusteringalgorithm(groupingcriteria),butalsoonthequalityoftheunderlyingfacerepresentationandmetric.Table4.1listspriorworkonfaceclustering,withthefacerep-resentationandclusteringalgorithmused,alongwiththelargestdatabasesizeemployedintermsoffaceimages,aswellasnumberofsubjects.Hoetal.[34]developedvariationsonspectralclusteringwhereintheymatrixiscomputedbasedon(i)assumingaLambertianobjectwithcamera/objectpositioning,andthencomputingtheprobabilitythattwofaceimagesareofthesameobject(sameconvexpolyhedralconeintheimagespace),or(ii)thelocalgradientsoftheimagesbeingcompared;evaluationisdoneontheYale-BandPIE-66databases.Zhaoetal.[105]clusteredpersonalphotographcollections.Theirapproachcombinesavarietyofcontextualinformationincludingtimebasedclustering,andtheprobabilityoffacesofcertainpeopletoappeartogetherinimages,withidentityestimatesobtainedviaa2D-HMM,andhierarchicalclusteringresultsbasedonbodydetection;adatabaseof1;500faceimagesof8individualsisusedforevaluation.Cuietal.[19]developedasemi-automatictoolforannotatingphotographs,whichem-ploysclusteringasaninitialmethodfororganizingphotographs.LBPfeaturesareextractedfromdetectedfaces,andcolorandtexturefeaturesareextractedfromdetectedbodies.Spectralclusteringisperformed,andtheclusteringresultscanthenbemanuallyadjustedbyahumanoperator.Evaluationisdoneonadatabaseconsistingof400photographsof5subjects.Tianetal.[88]furtherdevelopedthisapproach,incorporatingaprobabilisticclus-teringmodel,whichincorporatesa\junk"class,allowingthealgorithmtodiscardclustersthatdonothavetightlydistributedsamples.Zhuetal.[108]developedadissimilaritymeasurebasedontherankingsoftwofacesbeingcomparedineachface'snearestneighborlists(formedusingabasicdistancemetric),andperformhierarchicalclusteringbasedontheresultingrank-orderdistancefunction.Thefeaturerepresentationusedistheresultofunsupervisedlearning[10].Theclusteringmethod84isevaluatedonseveralsmalldatabases(thelargestofwhichcontainsonly1;322faceimages).Wangetal.[98]primarilydevelopanapproximatek-NNgraphconstructionmethod;inoneoftheirexperimentstheyapplythismethodtoconstructthenearestneighborlistsrequiredby[108],onadatabasecontainingLFWandanadditional500Kunlabeledfaceimages,andusetherank-orderdistancemeasuretoproduceanimprovedk-NNgraph(butdonotperformhardassignmentoffacesintoclusters).VidalandFavaro[91]developedajointsubspacelearningandclusteringapproach.Itderivesseveralsubspacesfromtheinputdatabasewhichbestcaptureclustersinthedata.TheyevaluatethemethodontheextendedYale-Bdatabase.Inrelatedapplications,Bhattaraietal.[5]developasemi-supervisedmethodfororga-nizingdatabasesforimprovedretrievalspeedviahierarchicalclustering.Tapaswietal.[87]addressorganizationofvideoframes,performingbothwithinvideoandcross-videocluster-ing,incorporatingconstraintsfromfacetrackingandcommonvideoeditingpatterns.Scetal.[77]givesomequalitativeresultsofclusteringpersonalphotosusingadeeplearningbasedfacerepresentation.Therehaverecentlybeensomeadditionalworkonlarger-scalefacerecognitionprob-lems[62],consideringdatabasesontheorderofamillionimages.Someexperimentalworkinfaceclusteringhasconsideredhundredsofthousandsofimages,whilesomegeneralobjectclusteringtaskshaveuseddatabasesontheorderofbillionsofimages[57].Incaseswherethetruenumberofclustersisknownapriori,thatnumberistypicallyordersofmagnitudelowerthanthenumberofimages.Ingeneral,theevaluationmethodsusedtodeterminehowwellclusteringalgorithmsperform(whentruelabelsareavailable)aresplit.Insomecasestheclusteringaccuracyisused[34],inothersprecision/recall[105],andinstillothersnormalizedmutualinformationisemployed[108].854.2.2GeneralImageClusteringForclusteringimagesingeneral,ratherthanfacesinparticular,Liuetal.[57](i)extractedHaarwaveletfeaturesfromimages,(ii)appliedadistributedalgorithmconsistingofanapproximatenearestneighborstep,(iii)generatedaninitialsetofclustersbyapplyingadistancethresholdtothenearestneighborlists,and(iv)appliedaalgorithmtogetasetofclusters.Clusteringwasperformedonapproximately1:5billionunlabeledimages,alongwithanevaluationon3;385labeledimages.Themaingoaloftheprocedurewastogroupimagesintosetsofnearduplicates,butthetotalnumberofsuchsetsinthe1:5billionimagedatabasewasunknown.Gongetal.[25]developaversionofk-meansclusteringwhichissuitableforhandlinglargedatabasesbyencodingtheirfeaturevectorstobinaryvectors,andthenusinganindexingschemetosupportconstanttimelookupofclustercentersfortheassignmentstepofk-means.Theyapplytheirbinaryk-meansalgorithmtoasubsetoftheImageNetdatabase,containing1:2milliongeneralobjectimagesin1;000classes.Fooetal.[23]considerarelatedproblem,thedetectionofnear-duplicateimagesinlargedatabases.Inthiscase,ratherthangroupingimagesofpeoplebyidentity,thegoalistoidentifynear-duplicateimages,whichmaybetheresultofvariousimageprocessingoperations,suchascropping,rotation,colorspaceconversion,etc.TheirimagerepresentationconsistsofapplyingavisualwordsapproachtolocalPCA-SIFTdescriptors,indexedwithaLocalitySensitiveHashing(LSH)scheme.Theclusteringmethodusedisaalgorithm.Evaluationwasperformedbygeneratingasyntheticsetofnearduplicateimages,andperformingclusteringinthepresenceofanaddednoiseset;thelargestdatabaseusedcontained300;000images.4.2.3ApproximateNearestNeighborMethodsAcommonprobleminsomeofthewell-knownclusteringmethodsisnearestneighborsetsforallNsamplesinadatabase.Naively,theruntimeisO(N2),whichisaproblem86forlargeN.Thiscanbeconsideredaninstanceofthek-NNgraphconstructionproblem,oralternativelyitcanbeconsideredasetofNapproximatenearestneighborsearches.Forbothofthesecases,approximationmethodsareavailableintheliterature.4.2.3.1k-nnGraphConstructionOneapproximationmethodforcomputingthefullk-NNgraphisgivenbyChenetal.[14].ThealgorithmisaprocedurebasedonrecursivesubdivisionofthefeaturespaceviaLanczosbisection.Weuseaparallelizedversionofthisalgorithm,presentedin[67],whichbranchesateachrecursivesubdivision,handlingbothhalvesinseparatethreads.Thisalgorithmachievesimprovedruntimeoverthebrute-forcemethodbyskippingsomesetsofcomparisons(theportionofcomparisonsateachsplitbetweensamplesinoppositepartitions,notincludedintheoverlapset),andassuchtheruntimeisafunctioninthedegreeofoverlapchosen.4.2.3.2Randomizedk-dTreeInadditiontok-NNgraphconstruction,wemayconsiderbuildingnearestneighborlistsfortheentiredatabaseasNdiscretenearestneighborsearchproblems,andimprovethetotalruntimebyemployinganapproximatenearestneighborsearchmethod.Amongvar-iousapproximatenearestneighboralgorithms,oneclassicfamilyispartitioningtree-basedapproaches.Theybuildontheclassick-dtreealgorithmwhichdevelopsanindexthatsub-dividesthefeaturespacebyselectingasubsetoffeaturestosplitthedataon,typicallybyconductingnon-exhaustivesearchesofsuchatree.Therandomizedk-dtreealgorithm[78]improvesncybybuildingmultiplerandomizedk-dtreeindices,thensearchesthoseindicesinparallel,stoppingthesearchafteraspnumberoftreenodeshavebeenvisited.874.2.4ClusteringEvaluationInevaluatingclusteringperformance,sinceweuseaedof\correct"clus-tering(clusteringbyidentity),wecanevaluateaccuracyintermsofclusterscorrespondingtoknownidentitylabels.Externalmeasuresforevaluatingclusteringqualityrelyonidentitylabels;wewillusepairwiseprecision/recallsinceitcanbecomputedtly.Runtimeisalsoanimportantevaluationmetric.Pairwiseprecisionisasthefractionofpairsofsampleswithinacluster(consid-eringallpossiblepairs)whichareofthesameclass(havethesameidentity),overthetotalnumberofsame-clusterpairswithinthedatabase.InFigure4.2,(A1,A2)isamatchingpair,and(A1,B1)and(A2,B1)aremismatchedpairs.Pairwiserecallisasthefractionofpairsofsampleswithinaclass(consideringallpossiblepairs)whichareplacedinthesamecluster,overthetotalnumberofsame-classpairsinthedatabase.InFigure4.2(A1,A2)isasameclasspairinthesamecluster,while(A1,A3)and(A2,A3)aresame-classpairsintclusters.Thesemeasurescapturetwotypesoferror,aclusteringwhichplacesallsamplesasindividualclusterswillhavehighprecision,butlowrecall,whileaclusteringwhichplacesallsamplesinthesameclusterwillhavehighrecall,butlowprecision.ThetwonumberscanbesummarizedusingF-measure,asF=2(PrecisionRecall)=(Precision+Recall),orplottedagainsteach-otherinaPrecision-Recallcurve.Weextendthesemeasurestohandlepartiallylabeleddata,asencounteredinlarge-scaleclusteringproblems,bysimplyomittingtheunlabeleddatafromevaluation,totheextentpossible.Inourexperiments,partiallylabeleddataoccurswhenwemixLFWfaceimages(withknownlabels)againstalargecollectionoffacesdownloadedfromthewebwithunknownlabels.Wemopairwiserecallbysimplynotcountingwhetherornotunlabeledidentitiesaregroupedtogether.Forprecision,weconsiderlabeled-unlabeledpairs(e.g.(A3,U1)and(A3,U2)inFigure4.2)mismatches,andomitunlabeled-unlabeledpairs(i.e.(U1,88Figure4.2Diagramofapossibleclusteringusedtoillustrateevaluationmet-rics.Sixsamplesarepartitionedinto2clusters;A1,A2,andA3arelabeledwiththesameidentity,sampleB1islabeledwithatidentity,andsamplesU1andU2areunlabeled.U2))fromthecalculation.So,ratherthanconsideringallpossiblepairsinagivencluster,weomitanyunlabeled-unlabeledpairsfromthetotal.IntherightclusterinFigure4.2,wewouldonlyusepairs(A3,U1)and(A3,U2)tocalculatethemoprecision.ThemoprecisioninFigure4.2isthen1=5(onlytheA1-A2pairiscorrect,theU1-U2pairisnotcounted),themorecallis1=3,onlyclassAhasmorethanonesample(andislabeled),andoftheclassApairs,onlyA1andA2areinthesamecluster.Inpractice,ifanunlabeleddatabaseiscombinedwithalabeleddatabase,itispossiblethatsomelabeledsubjectsarealsopresentintheunlabeledset;however,weconsiderallunlabeled-labeledpairserrorstoavoidmakingfavorableassumptionsaboutourperformance.4.3ProposedFaceClusteringApproachWhenperformingclustering,weleveragethedeepconvolutionalneuralnetworkfacerepre-sentationdiscussedpreviouslyinthisdocument.Weusethe320-dimensionalfeaturevectorsextractedbythatmethodasthebasisforapplyingtclusteringalgorithms.Alargenumberofclusteringmethodshavebeenproposedintheliteraturebasedonsquared-error,mixturemodels,nearestneighborandgraph-theoreticapproaches[40].Basedonevaluationoftapproachesforfaceclusteringin[67],weleverageanapproximateversionoftherank-orderclusteringalgorithmproposedbyZhuetal.[108].Wepresenttheoriginal89algorithmindetail,thenourmoedversion.4.3.1Rank-OrderClusteringTherank-orderclusteringalgorithmproposedbyZhuetal.[108],similartothemethodofGowdaandKrishna[26],isroughlyaformofagglomerativehierarchicalclustering,usinganearestneighborbaseddistancemeasure.Theoverallwofthealgorithmistoinitializeallsamplestobeseparateclusters,computedistancesbetweenpairsofclusters,mergethoseforwhichthecomputeddistancesarebelowathreshold,theniterativelyrecomputeanewsetofcluster-to-clusterdistances,andperformmergesbasedonthenewdistances.Thisrequiresacluster-to-clusterdistancemetric.Inthealgorithm,thedistancebetweentwoclustersisconsideredtobetheminimumdistancebetweenanytwosamplesintheclusters,thesamecluster-to-clusterdistancemeasureusedinsingle-linkhierarchicalclustering.ThedistancemeasureusedinRank-Orderclusteringisgivenby:d(a;b)=Oa(b)Xi=1Ob(fa(i));(4.1)wherefa(i)isthei-thfaceintheneighborlistofa,andOb(fa(i))givestherankoffacefa(i)infaceb'sneighborlist,wherethenearestneighborlistsaregeneratedaccordingtosomeunderlyingdistancemeasure(weuseEuclideandistance).Thisasymmetricdistancefunctionisthenusedtoasymmetricdistancebetweentwofaces,aandb,as:D(a;b)=d(a;b)+d(b;a)min(Oa(b);Ob(a)):(4.2)Anadditionalcluster-levelnormalizeddistancemeasureisusedtoelyconstrainmergestolocalneighborhoods.Inparticular,theminimumdistancebetweenanytwopointsinapairofclustersiscomputed,anddividedbytheaveragedistance,as:90DN(Ci;Cj)=1˚(Ci;Cj)d(Ci;Cj)(4.3)where˚(Ci;Cj)=1jCij+jCjjXa2Ci[Cj1KKXk=1d(a;fa(k))(4.4)thatis,DN(Ci;Cj),istheminimumdistancebetweenanytwopointsinclustersiandjdividedbytheaverageofeachpointinCiorCjtotheirKnearestneighbors.Athresholdof1isalwaysusedforthisfunction,withthelocalneighborhoodsizeKleftasafreeparameter,soelythetestiswhetherornottheminimumdistancebetweenanytwopointsintheclustersisbelowtheaveragedistanceofallpointsinbothclusterstotheirKnearestneighbors.Thesymmetricrankorderdistancefunctiongiveslowvaluesifthetwofacesareclosetoeach-other(facearankshighinfaceb'sneighborlist,andfacebrankshighinfacea'sneighborlist),andhaveneighborsincommon(highrankingneighborsoffacebalsorankhighlyinfacea'sneighborlist).Afterdistancesarecomputed,clusteringisperformedbyinitializingeveryfaceimagetoitsowncluster,thencomputingthesymmetricdistancesbetweeneachcluster,alongwiththecluster-levelnormalizeddistances,andmerginganyclusterswithdistancesbelowbothspthresholds.Then,nearestneighborlistsforanynewlyformedclustersaremerged,anddistancesbetweentheremainingclustersarecomputedagainiteratively,untilnofurtherclusterscanbemerged.Inthiscase,ratherthanspecifyingthedesirednumberofclustersC,adistancethresholdisspfortherank-orderdistancemeasure,alongwithaneighborhoodsizeforthecluster-levelnormalizeddistance,andtogethertheseparametersdeterminethenumberofclustersfoundbythisalgorithm.evaluesfortheneighborhoodsize,anddistancethresholdareempiricallydetermined.Weuseourownimplementationofthisalgorithm.Intermsofrun-time,computingthefullnearestneighborlistsforeachsampleincursanO(N2)cost.Additionally,theactualclusteringstepusedhereisiterative,withcostper91iterationproportionaltothecurrentnumberofclusterssquared(withnumberofclustersstartingatNanddecreasingacrossiterations),soboththenearestneighborcomputation,andtheclusteringstepitselfarecostlywithincreasingdatabasesize.4.3.2ProposedApproximateRank-OrderClusteringTheRank-Orderclusteringmethodhasanobviousscalabilityprobleminthatitrequirescomputingnearestneighborlistsforeverysampleinthedatabase,whichhasanO(N2)costifcomputeddirectly.Althoughvariousapproximationmethodsexistforcomputingnearestneighbors,theyaretypicallyonlyabletocomputeashortlistofthetopknearestneighborstly,ratherthanexhaustivelyrankingthedatabase.WeusetheFLANNlibraryimplementationoftherandomizedk-dtreealgorithm[63]tocomputeashortlistofnearestneighbors.ApplyingapproximationmethodsforfasternearestneighborcomputationthenrequiressomemooftheoriginalRank-Orderclusteringalgorithm.Inparticular,ratherthanconsideringallneighborsinthesummationequation(1),wesumuptoatmostthetopkneighbors(undertheassumptionthatclusterformationreliesonlocalneighborhoods).Additionally,ratherthanusingthecluster-levelnormalizeddistancefromtheoriginalal-gorithmofZhuetal.[108],weenforcelocalitybyonlycomputingapproximaterank-orderdistancesbetweenpairsofsamplesforwhichbotharewithintheothersample'stop-200nearestneighbors.Further,wenotethatifonlyashortlistofthetop-kneighborsisconsidered,thepresenceorabsenceofaparticularexampleontheshortlistmaybemoretthanthesam-ple'snumericalrank.Assuch,weconsideradistancemeasurebasedondirectlysummingthepresence/absenceofsharednearestneighbors,ratherthantheranks,resultinginthefollowingdistancefunction:dm(a;b)=min(Oa(b);k)Xi=1Ib(Ob(fa(i));k);(4.5)92whereIb(x;k)isanindicatorfunctionwithavalueof0iffacexisinfaceb'stopknearestneighbors,and1otherwise.Inpractice,wethatthismoleadstobetterclusteringaccuracycomparedtosummingtheranksdirectly,asintheoriginalformulation.ely,thisdistancefunctionimpliesthatthepresenceorabsenceofsharedneighborstowardsthetopofthenearestneighborlist(saywithinthetop-200ranks)isimportant,whilethenumericalvaluesoftheranksthemselvesarenot.Figure4.3ApproximateRank-Orderclustering.Givenasetofunlabeledfaceimages(a),nearestneighborlistsarecomputedforeachimage(b);nearestneighborlistsarethenusedtocomputedistancesbetweenfaces(c).(b)showsthenearestneighborlistsofonlyefacesin(a).dm(1;2)(Eq.4.5)istheasymmetricdistancebetweenfaces1and2whereasDm(1;2)(Eq.4.6)isthesymmetricdistancebetweenfaces1and2.Thenormalizationprocedureemployedintheoriginalalgorithm(onlysummingupto93therankoftheothersamplebeingcompared,anddividingbymin(Oa(b);Ob(a)))isstille,andcontributestomoreaccurateclusteringresultsevenwiththismototheoriginalalgorithm.Thecombinedmodistancemeasureisas:Dm(a;b)=dm(a;b)+dm(b;a)min(Oa(b);Ob(a)):(4.6)Additionally,toimprovetheruntimeoftheclusteringstepitself,asmentionedearlierwe1)onlycomputedistancesbetweensampleswhichappearineachother'snearestneighborlists,and2)onlyperformoneroundofmergesofindividualfacesintoclusters,ratherthanperformingmultiplemergeiterationsasintheoriginalalgorithm.ThismeansthatcomparedtotheoriginalalgorithmwhichhasaruntimeofC2perclusteringiteration,weonlyperformoneiterationofclustering,andadditionallyonlycheckformergesonasubsetofallpossiblepairs(sinceweconsiderthe200nearestneighborsforeachsample).Thisresultsinaruntimeoftheclusteringstep(assumingpre-computednearestneighbors)ofO(N).Theclusteringprocedureweemploy,asoutlinedin4.3isthen:1.Extractdeepfeaturesforeveryfaceinthedatabase2.Computeasetofthetop-knearestneighborsforeachfaceinthedatabase3.Computepairwisedistancesbetweeneachface,andthosefacesinitstop-knearestneighborlistforwhichthefaceisalsoontheneighbor'snearestneighborlist,followingequation4.64.TransitivelymergeallpairsoffaceswithdistancesbelowathresholdSelectingathresholdtodeterminethenumberofclusters,Cinagivendatabaseisoneoftheperennialissuesindataclustering.Inpracticalapplicationswecannotassumethatthetruenumberofclusterswillbeknownapriori,thereforeintheabsenceofarobustprocedurefordeterminingthetruenumberofclusters,wesimplyevaluateouralgorithmatseveralevaluesofCandreportthebestresultsattainedinourexperiments.944.3.3Per-ClusterQualityEvaluationOuroverallgoalistofacilitatetheinvestigationofverylargecollectionsofunlabeledfaceimages.Wehaveproposedclusteringfaceimagesbyidentityasaapproach,butforverylargedatabasesevenclusteringbyidentitymayleavetoomanyclustersformanualexploration.Weattempttoaddressthisissuebyusinginternalclustervaliditymeasurestoidentifyasubsetof\good"individualclusters,suitableformanualinvestigation.Inpracticalapplicationswherethedatabaseiscompletelyunlabeled,evaluatingcluster-ingaccordingtoexternallabelsisnotpossible.But,thereisabodyofworkintheliteratureontinternalclusterqualitymeasures[38]whichattempttocharacterizeclusterqual-itywithouttheuseoflabels.Thesemeasurescantypicallybeunderstoodasmeasuresofeithercompactness(howwelltheclustermembersaregroupedtogetherintermsofpairwisesimilarity),orisolation(howwelltclustersareseparatedfromeachotherintermsofinter-clustersimilarity).Additionally,wecanmakeadistinctionbetweenevaluatingtheoverallqualityofagivenclusteringofadatabase,andevaluatingthequalityofindividualclustersinaparticularclustering;wewilluseper-clusterqualitymeasuresasameansofrankingindividualclusters.Whendealingwithverylargedatabases,onefundamentalconcernisrun-time.Itisgen-erallyinfeasibletocomputedistancesbetweenallsamplesinthedatabase,andadditionallyinfeasibletocomputedistancesbetweenallclustersincaseswhereboththedatabaseandnumberofclusterspresentinthedatabasearelarge.Inthiscase,wearepre-computingak-nearestneighborgraph,soitisnaturaltoconsidergraph-basedqualitymeasures,andalleviatecomputationalconcernsbyusingthepre-computedgraph.Coverage[9]isasthefractionofintra-clusteredgespresentoutofthecompletesetofedgesinthegraph.Wemodifythisforuseasaper-clusterqualitymeasurebyjustconsideringnodesinthecurrentcluster,i.e.per-clustercoverageasthefractionofedgesout-boundfromnodesinthecurrentclusterwhichlinktoothernodesinthecluster.Modularizationquality[60]isasthebetweenaninter-clusterconnectivity95Figure4.4Two-dimensionalt-SNE[90]embeddingof320-dimensionaldeepfeaturesfortheLFWdatabase,includingonlyclustersfromtheproposedclusteringwithtwoormoreimages.Linesaredrawnbetweenallsame-clusterfaces.measure(thefractionofedgespresentbetweennodesinaclusteroutofpossibleedgesinacompletegraphofthosenodes),andanintra-clusterconnectivitymeasure(averagefractionofcross-clusteredgespresentoutofpossibleedgesbetweeneachpairofclustersinacompletegraph).Thesegraph-basedmeasuresareformulatedsolelyintermsofthepresenceorabsenceofedgesbetweencertainvertices.ButwealsomotivationtolookatsomesimpledistancebasedmeasuresbylookingatFigure4.4whichshowsa2-dimensionalt-SNE[90]embeddingoftheoriginal320-dimensionalfeaturespacefornon-singletonclustersgeneratedfromthefullLFWdatabase.Inthisvisualization,linesaredrawnbetweenallimagesplacedinthesameclusters.Onethingwhichisapparentisthatsomeclusterassignmenterrorscoveralargedistanceintheembedding,andmayindicatethattheseerrorsoccuroveralargedistancein96theoriginalfeaturespace.Wewillthereforealsoconsidersimplecompactnessandisolationmeasuresbasedonthedistancesbetweenedgespresentinthek-NNgraph,primarilytheaveragedistanceofsamplestoothersamplesinthesameclusterintheirnearestneighborlists(averageintra-clusterdistance),andtheaveragedistanceofsamplesinaclustertosamplesoutsidethatclusterintheirnearestneighborlists(averageinter-clusterdistance).4.4ExperimentsInthissection,wewillpresentouroverallevaluationoflarge-scalefaceclustering,inseveralsteps.First,wewillevaluatevariousclusteringalgorithmsonasmalldatabase(theentireLFWdatabase,comprisedof13;233faceimages),evaluatethenearestneighborapproxi-mationused,thencarryoutlarge-scalefaceclusteringexperiments(involvinguptoa123millionfacedatabase),andpresentsomepreliminaryworkonvideoclustering.4.4.1ClusteringAlgorithmEvaluationBeforeinvestigatingperformanceonlarge-scaledatabases,wewillattempttoclustertheentireLFWdatabasebyidentity.OneissueisthatthedistributionofimagespersubjectisquiteimbalancedinLFW(indeed,themajorityofsubjectshaveonlyasinglefaceimage,accountingforapproximatelyathirdofallimagesinthedatabase).AlthoughwecouldconceivablyconstructasubsetofLFWwithmore\balanced"clusters,inpracticefortheapplicationdomainsoflarge-scaleclustering(analyzingsocialmediaimagery,andforensicapplications),thereisnobasistoassumethatthenumberofimagespersubjectiswellbalanced.Intheabsenceofpriorknowledgeabouttheexpecteddistributionofimagespersubjectinpracticalapplications,weclustertheentireLFWdatabase.Asabaseline,wewillconsiderk-meansclustering,since(i)itisperhapsthemostwell-knownclusteringalgorithm,(ii)hasonlyafewparametersfortuning,(iii)isoneofthemostt,and(iv)large-scaleclusteringmethodsareoftenapproximationsofk-means97Table4.2ClusteringresultsonthecompleteLFWdatabase.TimesaregivenasHH:MM:SS,measuredusing20coresofanIntelXeonCPUclockedat2.5GHz.Theproposedalgorithm(lastrow)hasthehighestclusteringaccuracy(F-measure)andtheshortestrun-time.ClusteringAlgorithm#ClustersF-measureRun-Timek-Means1000.3600:00:16k-Means6,5080.0704:58:49Spectral(Euclidean)2000.2000:11:18Spectral(Approx.ROD)2000.4300:14:04Single-LinkHierarchical1,0000.00500:00:25Rank-Order7,0590.6500:00:33Approx.Rank-Order(proposed)6,5080.8700:00:08clusteringwithimprovedscalability.WeusetheMATLABr2015aimplementationofthek-meansalgorithm,withtheeuclideandistancemetric.Weadditionallyusespectralcluster-ing[93],whichapproachestheproblemfromagraphtheoryperspective,asabaseline.Weemployspectralclusteringontwotgraphstructures:weinduceagraphstructureintheadjacencymatrixbykeepingthetop200neighborsnon-zerousingEuclideandistance,andalsousetheegraphstructureusedbyourproposedalgorithm(usingbinarizedrank-orderdistance(ROD),onlyconsideringimageswhicharebothineachother'stop-200nearestneighborsascomputedbytherandomizedk-dtreealgorithm).WeuseaMAT-LABimplementationofspectralclustering4.Wealsoappliedsimplesingle-linkhierarchicalclusteringbasedonEuclideandistance(usingtheMATLABimplementationofhierarchicalclustering).Additionally,weuseourimplementationoftheoriginalrank-orderclusteringalgorithmasabaseline.Ink-means,spectralclustering,andhierarchicalclustering,thealgorithmisparameter-izedonanumberofclusters,whileforrank-orderclusteringthenumberofclustersfounddependsonthedistancethresholdparameter.Inpracticalapplicationswecannotassumethatthetruenumberofclusterswillbeknownapriori,thereforeintheabsenceofarobustprocedurefordeterminingthetruenumberofclusters(C),wesimplyevaluateall4http://www.mathworks.com/matlabcenhange/34412-fast-and-t-spectral-clustering/contenectralClustering.m98Figure4.5NumberofclustersandcorrespondingF-measurefortthresholdval-uesusedwiththeproposedapproximaterank-orderclusteringmethodontheentireLFWdatabase.algorithmsatseveralevaluesofC,andreportthebestresultsattainedinTable4.2.Fork-meansandspectralclustering,clusteringperformance(intermsofF-measure)withCclosetothetruenumberofidentitiesisquitepoor(thisisexpected,sincethesealgorithmsarenotabletohandlehighlyunbalanceddatawell).Forthisreason,theoptimalvalueofCintermsofF-measureisrelativelylow.Forrank-orderclustering,thedistancethresholdwhichleadstothebestoverallF-measureresultsinanumberofclusters(approximately7k)closetothetruenumberofidentities,andtheoverallF-measureistlyhigherthanthespectralandk-meansresults.Figure4.5showstheenumberofclusters99Figure4.6Examplesof\pure"(singleindividual)clusters(a,b),and\impure"(multipleindividuals)clusters(c,d)generatedbytheproposedApproximateRank-OrderclusteringontheentireLFWdatabase.Facesnotbelongingtothemajorityidentityineachclusterareoutlinedinred.100andcorrespondingF-measuresgeneratedbyusingtthresholdvaluesforapproximaterankorderclusteringonthefullLFWdatabase.WeattainedthebestoverallF-measureonthisexperimentwiththetworank-orderclusteringvariants.SpectralclusteringleveragingourgraphstructureperformedwellrelativetotheEuclideandistancebasedgraph,attainingsomewhathigherperformancethank-means.Inourexperiments,directlyappliedhierarchi-calclusteringformedlargeimpureclustersearlyonintheprocess,leadingtopoorresultsoverall.Intermsofruntime,perTable4.2,evenforjust13;233faceimagesinLFWspectralclusteringtakesnoticeablylargecomputetime,whiletheproposedrank-orderclusteringissubstantiallyfaster.SomeexampleclustersareshowninFigure4.6;4.6(a)and(b)showpureclusters,while4.6(c)and(d)showexampleimpureclustersintermsofsubjectidentity.Incluster4.6(c),3imagesoftindividuals,allwithsimilardemographics,weregroupedinwiththemajorityidentity(WalterMondale);whileincluster4.6(d),2imagesof1additionalindividualwithsimilardemographics,andfaceposeweregroupedwiththemajorityidentity(MichaelDouglas).4.4.2ApproximationPerformanceWeevaluatetheperformanceofourk-NNapproximationmethodintermsofclusteringaccuracy,andrun-time.Weconsidertwoapproximationmethodsforcomputingthefullk-NNgraph,andcomparetheirperformancetothebruteforceapproachofperformingallpairwisecomparisons.ResultsareshowninTable4.3forthesethreenearestneighborcalculationmethodsonthefullLFWdatabase,andtheLFWdatabaseaugmentedwithanadditional1millionunlabeledimagesfromtheWebfacesdatabase.Inpractice,theRandomizedk-dTreemethod[78]achievesthebestrun-timeofthethreemethodsandthebestclusteringaccuracyaswell.Thisisasurprisingresult,sinceanapproximationmethodwouldgenerallybeexpectedtogivelessaccurateresultsthantheprocessitisapproximating;however,sinceourobjective101Figure4.7NumbersoftimeseachfaceintheLFWdatabaseappearedinanyotherface'stop-200nearestneighborlistfora)theexactnearestneighbors,andb)thenearestneighborscomputedviarandomizedk-dtreeapproximation.102Table4.3ClusteringResultsontheLFWdatabase,andLFWwithadditional1millionweb-downloadedfaceimagesusingapproximaterank-orderclusteringwithvaryingnearestneighborcomputationmethods.Timesmeasuredusing20coresofanIntelXeonCPUclockedat2.5GHz.NearestNeighborAlgorithmDatabaseF-measureRun-TimeBrute-ForceLFW0.7200:00:12Chenetal.[14]LFW0.6900:26:36Randomizedk-dTree[78]LFW0.8700:00:08Brute-ForceLFW+1M0.4914:18:24Chenetal.LFW+1M0.4101:06:58Randomizedk-dTreeLFW+1M0.7900:07:20istoperformclusteringbasedonthenearestneighborlists,ratherthansimplytheexactknearestneighborsforeachitem,thiscounter-intuitiveresultcanbeexplainedasfollows.Figure4.7plotsthenumberoftimeseachfaceintheLFWdatabaseoccursinthetop200nearestneighborlistofeveryfaceinthedatabase.Fortheexactnearestneighbors,thereareanumberoffaceimageswhichoccurveryfrequentlyinthenearestneighborlists(uptooverhalfofallnearestneighborlists),whilefortheapproximatenearestneighborsthesefacesoccurlessfrequently.Theactionofthek-dTreealgorithmistopickahighestvariancedimensionateachnode,andsplitsthefeaturevectorsatthatdimension.Afterthetreeisbuilt,splittingontheselecteddimensionsateachnodepartitionsthespaceintohyperrectangles,andanapproximatesearchisaccomplishedbyrestrictingthetotalnumberofthesenodesthatwevisitinagivensearch.Asaresultofthis,therearecaseswherecloserneighborsexistinthefeaturespace,butduetothetruncatedsearchwenevervisitthenodecontainingthem,andthissamplinghasreducedthetotalnumberoftimesthemostfrequentlyoccurringnearestneighbors(underEuclideandistance)arefound.Fromtheperspectiveofclusteringbasedonthenearestneighborlists,thelistscomputedfromtherandomizedk-dtreeapproximationactuallyformmorediscriminativefeatures,sincecertainfacesarenotpresentinverylargefractionsofthenearestneighborlists,asisthecasewiththeexactnearestneighbors.103Table4.4ClusteringresultsusingApproximateRank-OrderclusteringontheLFWdatabasewithincreasingamountsofunlabeleddata,andtsearchsizestrategiesfortheapprox-imatenearestneighborcalculations.Timesmeasuredusing5coresforLFW+5Mdatabaseexperiments,andasinglecoreusedforthesmallerexperiments,onanIntelXeonCPUclockedat2.5GHz.DatabaseSearchSizeF-measureRun-TimeJustLFW2,0000.8700:00:19LFW+1M2,0000.7901:03:25LFW+5M10,000(linearincrease)0.6706:28:42LFW+5M4,000(logarithmicincrease)0.3302:51:13LFW+5M2,0000.1301:52:32Generally,therandomizedk-dtreealgorithmhasO(NlogN)expectedrun-timefortreeconstruction,andperformingNsearches.Inpractice,theFLANNimplementationofthealgorithmisparametrizedwiththenumberofrandomizedtreesconstructed,aswellasthetotalnumberofnodesavailabletovisitpersearch.Ifdparametersareused,thetotalruntimeisindeedO(NlogN);however,ifeitherthenumberofindicesbuiltorsearchsizeisincreasedwithlargerdatabasesize,theeruntimeofthealgorithmwillincrease.Inpractice,weconstruct4treesperindex(andhavefoundlittleimpactfromusingslightlyhigherorlowervalues),butthenumberofnodesvisitedpersearchmustbeselectedwithcare.Oneprimaryquestionistodetermineifanumberofnodevisitspersearchisfeasibleforlargerdatabases,orifthenumberofnodesvisitedpersearchshouldincreasewithdatabasesize.Table4.4presentsresultsforclusteringbasedontheLFWdatabase,theLFW+1Mdatabase,andtheLFW+5Mdatabase,usingdtstrategiesforselectingthenumberofnodesvisitedpersearchontheLFW+5Mdatabase.Inpractice,usingthesamenumberofnodesvisitedpersearchontheLFW+5MdatabaseaswasusedontheLFW+1Mdatabaseleadstoadrasticreductioninclusteringaccuracyonthelargerdatabase.Infact,evenalogarithmicincreaseinsearchsizeleadstotaccuracyloss,relativetoalinearincreaseinsearchsize.Inthefollowinglarge-scaleexperiments,wethereforeincreasethesearchsizelinearlywithdatabasesize.Inpractice,thismeanstherun-timeoftheapproximationalgorithmcannotbeconsideredtobeO(NlogN),sincewe104increasethecostofeachsearchlinearlywiththedatabasesizeN,givingafullO(N2)costforperformingNnearestneighborsearches.Byusingtherandomizedk-dtreealgorithmforapproximatenearestneighborcomputa-tion,withourupdatedclusteringalgorithmwegetimprovedruntimeintheclusteringstep,andalsobetterclusteringaccuracy(comparedtothebaselinealgorithm).AlthoughwestillhaveanO(N2)run-timeforthenearestneighborcomputationstep,thereisatreductioninrun-time,animprovementbyafactorof120fortheLFW+1Mimagedatabaseoverbrute-forcecomputation.4.4.3Large-ScaleFaceClusteringInthissection,wewillconsiderclusteringtrulylarge-scalefacedatabases,combiningupto123millionunlabeledfaceimageswiththe13;233imageLFWdatabase.Asdiscussed,wewillusetherandomizedk-dtreenearestneighborapproximationmethodtoreducethetotalcostofcomputingnearestneighborsforthesedatabases;however,whenconsideringverylargescaledatabases,additionalproblemsarise.Consideringthetotalsizeofthedatabase,123million320-dimensionalfeaturevectors,witheachdimensionrepresentedbyonetakesupapproximately157gigabytesofspace,withoutconsideringanysupportingdatastructures.Thisamountofdataistoonasinglemachine,consideringthatatreestructuremustalsobeloadedinmemory,andadditionallysincetheapproximationmethodweareusingincursafullO(N2)costintime,computingnearestneighborsforthisdatabasebecomesinfeasibleonasinglemachine.Fortunately,adistributedmemoryvariationoftherandomizedk-dtreealgorithmisavailableaspartoftheFLANNlibrary[63].Thestrategyemployedistosplitthedatabaseintodisjointsubsets,assignonesubsettoeverydiscretemachineused,andconstructseparatek-dtreeindicesforeachdisjointchunk.Duringnearestneighborcomputation,wendaseparatesetofnearestneighborcandidatesfromeachchunk,andmergetheresultstogetasetofnearestneighborsforthesearch.Inpractice,thissimplestrategyworkswell.In105Table4.5Large-scaleclusteringresultsusingtheproposedApproximateRank-Orderclus-tering,withrandomizedk-dtreenearestneighborapproximation.TimesmeasuredusingthespnumberofcoresofIntelXeonCPUsclockedat2.5GHz.#Clustersistheresultingnumberofclusters,excludingsingle-imageclusters.DatabaseF-measure#Clusters#CoresRun-TimeLFW0.871,463100:00:19LFW+1M0.7994,740101:03:25LFW+5M0.67445,880506:28:42LFW+10M0.56933,2781012:11:33LFW+30M0.422,800,2023030:44:58LFW+123M0.2710,619,853123289:04:53LFW+121M0.3210,281,612123245:12:54thefollowingexperimentstheinitialdatabaseispartitionedinto1millionimagechunks,andeachchunkisdistributedtoaseparatemachineforindexconstruction.Sinceourdatabasesareasmalllabeledsubset(LFW),inalargerunlabeledbackgroundset,werandomizetheorderoftheLFWimages,andassignaportionofthelabeledimagestoeachofthediscretechunksofdata,toavoidanybiasduetoconstructingoneofthesub-indiceswithe.g.theentireLFWdatabaseaspartofit.Resultsforprogressivelylargerdatabases(constructedbyaddinglargerandlargersetsofunlabeledbackgrounddatatoLFW)arepresentedinTable4.5.Duetoourstrategyofallocating1millionimagespercore,welinearlyincreasethenumberofcoreswithdatabasesize,resultinginanoverallO(N)increaseinruntimewhenmovingtolargerdatabases.Computingnearestneighborsforthelargestdatabaseconsidered(123millionimages)tookapproximately2weeksofreal-timeusing123nodesintheMSUHigh-PerformanceCom-putingCenter.Whiletheobservedruntimeincreaseisapproximatelylinear,theclusteringaccuracyprogressivelydecayswhenconsideringlargerandlargerdatabases.Thisisasex-pected,consideringalargerdatabasemeansalargerchanceofimpostorsforeachindividualimageasnearestneighbors.Evenso,onthe123millionimagedatabase,westillattain0:27F-measureonthelabeledsubset,whichisconsiderablybetterthanarandomresult(whichisclosetozero,since13;233imagescaneasilybegroupedinto123Mimages106Figure4.8Exampleimagepairsat5std.dev.belowthemeangenuinescoreonLFW.(a)agenuineLFWmatch,(b),animpostorLFWmatch,(c)anearduplicateinthebackgrounddatabase,and(d)poorqualityimagesinthebackgrounddatabase.withoutkeepinganyofthesameidentityfaceimagestogether).4.4.4DeduplicationThebackgroundsetusedintheseexperimentsconsistsof123millionunlabeledimagesharvestedfromtheInternet,andsomefractionoftheseimagesarenear-duplicatesofeach-other.Forexample,theremaybemultiplecopiesofthesameimage,whichwereuploadedatslightlytresolutions,orwithslightlytencodings,ordigitallymoWeattemptedtoidentifytheseimagesbyexaminingoutliermatchdistancesinthesetofnearestneighborswecomputedforthefull123millionimagedatabase.Tooutliers,weconsideredthedistributionofgenuinematchdistancesproducedbyourfacerepresentationontheLFWdatabase,computedthemeanandstandarddeviationofthatdistribution,andcomparisonsmorethan5standarddeviationsbelowthegenuinemeanasoutliers.OntheLFWdatabase,thisthreshold3genuinematches(allverysimilarlookingimagesofGeorgeW.Bush,oneofwhichisshowninFigure4.8-a),andanumberofimpostor107matches.ExaminingthesehighscoringimpostorcomparisonstypicallyindicatedcaseslikeFigure4.8-bwhereoneoftheinvolvedimageswashandledpoorlybyourfacerepresentation(e.g.extremeimagesarenottypicallypresentinourtrainingdata,andarenotrepresentedwell).Applyingthisthresholdtothefull123millionimagebackgroundsetatotalof33;359;232ofourcomputednearestneighborpairs,butthesematchesinvolvedatotalofjust3;405;294uniqueimageManuallyexaminingthesematchesrevealedsomenear-duplicateimages(asinFigure4.8-c),butalsorevealedanumberofcasessimilartoFigure4.8-d,whereoneorbothofthepairsofimagesareoflowvisualquality,butthepairsarenotnear-duplicates.Forsimplicity,weusedatransitivemergeonthematchpairsresultingin1;188;326groups,randomlyselectedoneimagepergrouptoretain,anddeletedtherest(removingatotalof2;216;968images).WeevaluatedLFWimagesremovedinthisprocessasrecallerrors,thatisweconsideredthemtostillbepartoftheirrelevantclasseswhencomputingrecall,butdidnotplacetheminanycluster(thisreducestheF-measureofourmethodonjusttheLFWimagesfrom0:87to0:86).Wethenperformedclusteringagainontheremaining121millionimages,andasshowninthelasttworowsofTable4.5,deletionofapproximately1:8%ofthebackgrounddatabaseinthismannerimprovedF-measurefrom0:27to0:32.4.4.5VideoFrameClusteringWealsoconsidertheproblemofclusteringvideoframes,usingtheYoutubeFaces(YTF)database.SimilartoourtreatmentofLFW,weclusterallfacesinYTF,andevaluatetheresultsintermsoftheirconsistencywitharrangingtheindividualframesbyidentity.TheresultsaresummarizedinTable4.6.TheoverallF-measureappearsreasonablyconsistentwithourLFWresults,at0:74for621;126totalframesofvideo(comparedto0:79F-measureforclusteringtheLFW+1Mdatabase);aloweraccuracyontheYTFdatabaseisexpectedbecauseofitsgenerallylowerimagequality.However,closeranalysisoftheresultsreveals108Table4.6YTFclusteringresultsusingtheproposedApproximateRank-Orderclustering,withrandomizedk-dtreenearestneighborapproximation,forclusteringallframesinthedatabase,andforrandomlysampling3framespervideo.TimeisgivenasHH:MM:SS,measuredusing20coresofanIntelXeonCPUclockedat2.5GHz.PerformanceMeasureAllFramesSampled3FramesF-Measure0.710.53Precision0.790.82Recall0.670.39Within-VideoRecall0.910.89Cross-VideoRecall0.560.20NearestNeighborComputationTime00:04:1000:00:07someconfoundingfactors.UnlikeLFWclusteringresults,whereprecisionandrecallarerelativelyclosefortheoptimalF-measurevalues,ourclusteringresultsonYTFhaveveryhighprecision,andrelativelylowerrecall.ely,wearegettingmoreclustersthanthenumberofidentities,buttheclustersarerelativelypure.Furtheranalyzingtherecall,wethatalthoughtheoverallvalueis0:589,thefractionofsame-videopairsgroupedtogetherismuchhigherthanthefractionofcross-videopairsgroupedtogether.Thisindicatesthatwearesuccessfullygroupingframesintovideos,buthavingrelativelylittlesuccessgroupingidentitiesacrossvideos.SomeexampleclustersareshowninFigure4.9.Inmostcases,clustersroughlycorrespondtosinglevideos,inafewcases,e.g.4.9(b),framesfromtvideosofthesameindividualarecorrectlygroupedtogether,andinasmallsubsetofclusters(e.g.4.9(c)),multipleidentitiesaregroupedinthesamecluster.Toinvestigatetheimpactofthehighnumberofsimilarframesineachvideoonouralgorithm,wealsoperformedclusteringaftersamplingarandomsetof3framespervideo,andagainperformedclustering.TheseresultsarealsoshowninTable4.6,andexhibitsimilaroveralltrendstothefullframeclusteringexperiment.Inparticular,precisionandrecallareagainunbalanced,butoverallresultsareworsesincewithin-videopairsformamuchsmallerpartoftheoverallevaluation,andcross-videorecallalsodecreases.Thisindicates109Figure4.9ExampleimagesfromclustersgeneratedfromtheYTFdatabase.a)showstwoclusters,eachcontainingframesfromonevideoofthesamesubject,b)showsaclustercontainingframesfromtwovideosofthesamesubject,wherethebackgroundforthevideoisapparentlyidentical,c)shows28identitieswhichwereincorrectlygroupedintoasinglecluster;manyoftheseimagesarepoorlylit.110thatthepresenceofmanynear-duplicateframesisnotprimarilyresponsibleforreducingtheperformanceofouralgorithmonthetask,ratheritseemsthatthemainyisinreliablymatchingframesacrossvideos.TheyinmatchingacrossvideoisalsointheperformanceofourfacerepresentationonthestandardYTFevaluationprotocol,whereourTrueAcceptRateat1%FalseAcceptRateis61%,comparedto92%onthestandardLFWprotocol.4.4.6Per-ClusterQuality:InternalMeasuresWeareinterestedinidentifyingagoodsubsetofclustersfromalargegroupofclusters,asameansofaidingmanualexplorationoflargedatabases.Toevaluatetheenessoftmeasuresexperimentally,weneedmethodsforevaluatingtheenessoftheinternalmeasures.Asaapproach,weconsiderthecorrelationbetweentheinternalmeasuresandanexternalmeasure,thepairwiseprecisioncomputedindividuallyforeachcluster.Correlationbetweenthevariousinternalmeasuresconsidered,andprecisionareshowinTable4.7.Inpractice,thegraphicalmeasuresdonotperformparticularlywell,whilethesimplermeasuresbasedonedgeweightsperformbetter.Inparticular,thebestcorrelationisobservedforthe\averageinter-clusteredgeweight",andthiscanbefurtherimprovedbysubtractingtheaverageinterandintraclusteredgeweights.Thisisreasonable,sinceweareelycombiningacompactnessmeasure(averageintraclusteredgeweight),andaseparabilitymeasure(averageinter-clusteredgeweight).Evenso,thebestcorrela-tionachievedisonly0:42.Thiscorrelationcanbeimprovedbyexcludingsize2clustersfromconsideration(soonlyexaminingclusterswith3ormoremembers),whichimprovescorrelationto0:46.Whileacorrelationof0:46isnotveryhigh,forourapplicationthereisnoparticularneedfortherelationshipbetweentheinternalandexternalmeasurestobelinear.Figure4.10plotstheexternalmeasure(Precision)vs.thebestperforminginternalmeasuref(avg.intra-clusteredgeweight)-(avg.inter-clusteredgeweight)g.Onenotablefeatureontheleftside111Table4.7Correlationofinternalclusterqualitymeasureswithprecision,fortheLFWdatabase.AveragePrec.@100istheunweightedaverageofper-clusterpairwiseprecisionforthetop-100scoringclustersforeachmetric,FractionPure@100isthefractionofthetop-100clusterswhichcontainonlyasingleidentity.InternalMeasureCorrelationAvg.Prec.@100Pure@100Inter-MQ0.1280.7480.70Intra-MQ0.1170.8370.72MQ(Combined)0.1200.8290.80Coverage0.1170.7480.70MaxIntra-clusterEdgeDistance0.0220.8600.86TotalIntra-ClusterEdgeDistance0.0300.8500.85Avg.Intra-ClusterEdgeDistance(1)0.0800.8530.85MinInter-ClusterEdgeDistance0.2050.8930.87TotalInter-ClusterEdgeDistance0.1250.8520.56Avg.Inter-ClusterEdgeDistance(2)0.3250.8800.85(1)-(2)0.4270.9080.89(1)-(2),clustersize30.4600.9790.95oftheplotisasetofclusterswithexactlyzeroprecision,thatstillscorerelativelyhighlyontheinternalmeasure.Closerexaminationrevealsthatallofthesehighscoringzero-precisionclustersareofsizetwo(sotheyconsistofrelativelyisolatedfacesoftpeoplethathappentoscorehighly),whichexplainstheimprovementincorrelation(from0:42to0:46)whenrestrictingconsiderationtosize3orlargerclusters.AnotherinterestingfeatureofFig.4.10isthatthepointsontheplotalmostformatriangle(withavarietyofprecisionvaluesforlow-scoringclusters,butmostlyjusthighprecisionvaluesforhigh-scoringclusters),soalthoughtherelationshipbetweentheexternalandinternalmeasuresisnotlinear,itisstillpossibletoselectasubsetofhighprecisionclustersbytakingahighthresholdontheinternalmeasure.4.4.6.1RankingEvaluationAsanalternativetoconsideringcorrelation,wecanusetheinternalmeasuretorankalltheclusters,andcomputetheunweightedaverageofper-clusterprecisionvaluesforthetopCclusters(rankedbytheinternalmeasure),inspiredbyanalysistypicallydoneinretrieval112Figure4.10Pairwiseprecisionvs.theproposedinternalqualitymeasure,forallclustersgeneratedbytheproposedApproximateRank-OrderclusteringalgorithmonthefullLFWdatabase.Pointsinblueareclustersofsize3orlarger,pointsinredareofsize2.Thehighlightedsetofpointsontheleftedgeoftheareallofsize2,withzeropairwiseprecision.Sincewecan'treliablydistinguishbetweengoodandbad2-itemclusters,wediscardthemfromconsideration.problems.Table4.7showstheaverageprecisionofthevariousinternalmeasuresconsideredforthetop-100clustersinthefullLFWdatabase,andadditionallyshowsthefractionofclustersinthetop-100foreachmeasurethatcontainasingleidentity(are\pure").ThebluelineinFigure4.11plotstheaverageprecisionofthetopCclusters,withCcutateachpossiblerankinthesortedlistofclusters,forthebestperforminginternalmeasure,whiletheredlineplotsthesameinformationwhentheclustersareorderedbytheiractualprecision(andthusthebestpossibleperformanceonthistask).Theinternalmeasureiseinselectinghighprecisionclusters(relativetotheaverageprecisionofallthe113Figure4.11Thebluelinedisplaystheaveragepairwiseprecisionforlistsofclustersorderedbytheproposedinternalclusterqualitymeasure,terminatedateachpossiblerank,whiletheredlinedisplaystheaveragepairwiseprecisionforalistorderedbythetrueprecisionofeachcluster(andrepresentsthebestpossibleperformanceonthistask).Thehorizontalblacklineindicatestheunweightedaverageprecisionofallclustersconsidered.ClustersaregeneratedbytheproposedApproximateRank-OrderclusteringalgorithmfromthefullLFWdatabase.clusters)ontheLFWdatabase.Infact,theseveralclustersrankedbytheinternalmeasurehaveapairwiseprecisionof1.Figure4.12extendsthisconcepttotheaugmenteddatabases(inthiscase,onlyclusterscontainingsomelabeleddatafromLFWareranked,andprecisioniscomputedomittingunlabeledclustersasinourpreviousevaluation).Initially,theinternalmeasureisstille;however,forverylargedatabases(LFW+30Mandabove),rankingclustersaccordingtotheinternalmeasure,asexpected,becomeslesse.SomeexampletoprankingclustersareshowninFigure4.13.Thetop-5clustersfortheLFWdatabaseareallsingleidentity,relativelysmallclusters,indicatingthatthequalitymeasureworksasexpected.Forlargerdatabases(LFW+10M,LFW+123M),weshowboththetop-5clustersrankedpurelyintermsofthequalitymeasure,(b)and(d),aswellasthetop-5resultscontaininganylabeleddata,(c)and(e)(sinceweusethelabeledsubsetinournumericalevaluations).Thetopclustersinabsoluterankingtypically114involvenear-duplicateimages(e.g.similarimagesuploadedintlocations,withminorduetocropping,resolution,orcolorcorrectionandoftencartoonfaces(whichweredetectedbyfacedetectors)inadditiontoactualphotographs.ThetopclustersinvolvingLFWimagesshowthatthereareinfactanumberofimagesoftheLFWsubjectsintheunlabeleddatabase{thisindicatesthatourperformanceevaluationisoverlyconservative,sinceweconsidergroupingLFWandunlabeleddatatogethertobeincorrect(duetolackoflabelinformation).AlthoughtheresultsfortheLFW+10Mdatabaseappearreasonable(inthesensethatmultipleimagesofthesameidentityarebeinggroupedtogether,thatarenotjustslightalterationsofthesameoriginalimage),fortheLFW+123Mdatabasewebegintoseealargenumberofnearduplicateimages,e.g.theclustersranked1through4onthelistofclusterswithLFWimageshaveasingleLFWimage,andmultiplenearduplicateimagesthathappenedtobeinthebackgroundset.Consideringtheresultsforthede-duplicated121Mimagedatabase,therearestillanumberofnear-duplicateimagesthatwerenotcaughtbythededuplicationprocess.Manualexaminationofhigherrankedclusters(uptorank150),revealedthatafterthe20orsoclusters,actualnearduplicateimagesbecamerare.ButevenwiththatallclustersinvolvingLFWdatauntilrank116wereactuallysinglesubjectclusters,consistingofsomeLFWimages,andimagesfromthebackgroundsetofthesamesubject.Deduplicationofunconstrainedfaceimagesisindeedachallengingproblem.4.5ConclusionsWehaveshownthefeasibilityofclusteringalargecollectionofunlabeledfaceimages(upto123M)intoanunspnumberofidentities(ontheorderofmillions).Thisproblemisofpracticalinterestasastepinorganizingalargecollectionofunlabeledfaceimagespriortohumanexaminationduetothehighvolumeoffaceimagesuploadedtosocialmedia,andpotentiallyencounteredinforensicinvestigations.Therearecomputationalchallenges115inprocessingdatabaseswithtensofmillionsoffaces(whichweaddressviaapproximationmethods,andparallelization).Evenifthecomputationalchallengesaremet,producingmeaningfulclustersondataofthisscaleisveryIntermsofclusteringaccuracy,weachieved0:27pairwiseF-measureonthelargestdatabaseconsidered(123Munlabeledfaces+13;233labeledimagesfromLFW),whichindicatesthatatleastsomeoftheclustersproducedbyouralgorithmcorrespondwelltotrueidentitiesinLFW.Toidentifythesehighqualityclusters,wedevelopedaninternalper-clusterqualitymeasure,thatdoesnotinvolveexternalidentitylabels,toranktheclustersbyqualityformanualexamination.Experimentalresultsshowedthatthismeasurewasextremelyeforsmallerdatabases,butforthelargerdatabasesconsidered(LFW+123Munlabeledfaces),performanceapparentlyfalls(althoughthispointiscomplicatedbythedegreeofoverlapbetweenthefullbackgrounddatabaseandLFW).Onetdirectionforimprovingtheclusteringresultsisthereforetoattempttoimprovetheunderlyingfacerepresentation.Therearetwomainavenuesforimprovement,improvingthetrainingsetusedintrainingtherepresentation,andleveragingimprovednetworkarchitectures.Theseavenueswillbeexploredinthefollowingchapter.116Figure4.12Averagepairwiseprecisionatrank,orderedbytheproposedinternalclusterqual-itymeasureforaugmenteddatabases.ClustersaregeneratedbytheproposedApproximateRank-Orderclusteringalgorithm.117Figure4.13Top-5rankedclustersfortheLFW,LFW+10M,LFW+123M,anddeduplicateddatabases.FortheLFW+10M,andLFW+123Mdatabases,boththeabsolutetop-5rankingclustersintermsoftheproposedqualitymeasure,andthetop-5rankingclustersoutofthoseclusterscontainingatleastsomeLFWimagesareshown.UnlabeledimagesgroupedinwiththeLFWimagesareoutlinedinred.118Chapter5NetworkImprovements5.1IntroductionThenetworkarchitectureusedintheprecedingchaptersonlarge-scalefacesearchandclus-teringprovidedareasonabledegreeofaccuracyonthosetasks,butitisnaturaltowonderifimprovingtheunderlyingrepresentation(asmeasuredbyperformanceonbasicfacerecog-nitiontasks)couldleadtobetterperformance,particularlygiventhatanumberofgroupshavereportedbetterresultsonthestandardLFWprotocol.ThebestreportedresultsonLFWtypicallyinvolvebothlargertrainingsetsthanCASIA-Webface(ontheorderofmil-lionsofimages),andmoresophisticatednetworkarchitectures[77],sobothincreasingthesizeofthetrainingset,andimprovingthenetworkarchitectureitselfarenaturalavenuesforimprovement.Intermsoflargertrainingsets,therecentlyreleasedVGG-Face[69]databaselistsap-proximately2:6millionfacedetections,of2;622subjects,andissubstantiallylargerthanthepreviouslyusedCASIA-Webfacedatabase.However,thedatabasewasreleasedasalistofsourceURLsfortheimages,ratherthantheimagesthemselves,soatfractionoftheoriginaldatabaseisnotrecoverableduetobrokenlinks.Inspiteofthis,wewerenev-erthelessabletodownloadapproximately2:2millionimages,over4timesthetotalnumber119Figure5.1ExampleimagesfromtheVGG-Facedatabase.ofimagesintheCASIA-Webfacedatabase.Wealsoattempttoimproveourfacerecognitionperformancebyutilizingdeepresidualnetworks,whichhaverecentlyreportedsubstantiallybetterperformancethanVGG-architecturenetworksontheImageNetobjectrecognitionbenchmark[31].5.2VGG-FaceDatabaseTheVGG-FaceDatabasewasreleasedin2016asasetof2:6millionURLswithcorrespondingfacedetectionlocations.Theconstructionofthisdatabaseisoutlinedin[69].Essentially,theInternetMovieDatabase(IMDB)wasusedtoconstructalistofcelebritynames,whichwascrossreferencedwiththeFreebaseknowledgegraphtobuildasetofidentities,and120Figure5.2AnexampleimagelistedtwiceintheVGG-Facedatabase,withtwotfaceboundingboxes(onecorrectfacedetection,andonefalsepositive),bothlabeledasthesamesubject(WayneRooney).theycollectonethousandimageseachof2;622knownidentitiesusingGoogleandBingimagesearches,leadingtoa2:6millionfacedetectiondatabase.TheURLsoftheseimageswerereleased,ratherthantheimagesthemselvesleavingareducednumberofimageswhichcanactuallybeacquiredatthepresenttime.Weacquire2.2millionimagesoftheoriginal2.6millionlisted.SomeexampleVGGimagesareshowninFigure5.1.Althoughthedatabasecontainsfacelocationsfor2:6millionfacelocations(detectedbyadeformablepartsmodelmethod[61]),theURLsusedarenotunique.TherearecaseswhereaparticularURLislistedmultipletimes,withtboundingboxes.Inpractice,theseimageswithmultiplefacedetectionslistedareoftenalllistedunderthesameidentitylabel,andtypicallyonlyoneoftheprovidedURLsisactuallyoftheclaimedidentity,asshowninFigure5.2.Giventhelackofreliablelabelsfortheseimages,weinvestigatedremovingthemoutrightfromthedatabase.Doingsoreducedthenominalnumberoffacedetectionsinthedatabasefrom2:6milliondownto2:2million,andcombiningthatwiththeimageswewere121Figure5.3Twotypesofunlabeledduplicatedetectedinthewedownloaded.(a)placeholderimages,servedbysomeimagehostsintheplaceofbrokenlinks,and(b)exactduplicatecopiesofanimageuploadedtotwotURLs,labeledastwoentsubjectsintheVGG-Facelists.unabletodownloadedgivesatotalof1:9millionimageswithuniquelistedURLs,thatwewereabletosuccessfullydownload.Figure5.4Threeimages,labeledas\JustinKirk."Theleftmostimageistheactorinques-tion,themiddleimagecontainssomeincorrectlylabeledchildren,andtherightmostimageischaracter\JamesKirk,"playedbyWilliamShatner.However,examiningtheremainingimagesleadtotheobservationthattherewerestillexactduplicateimagesintheremainingdatabase,eveniftheirsourceURLswereent.TwotypesofduplicateimagesetsareshowninFigure5.3.Insomecases,theseunlistedduplicateswereduetotheexactsameimagebeinghostedattwotURLs,andbothingestedinthedatabase(asinFigure5.3(b));however,inothercases(asshown122Figure5.5Thetoptwohighestranking(a)images,andlowest-ranking(b)imagesforonesubject(JuliaStiles),intheVGG-Facedatabase,accordingtotheapproximatequalityrankingprovidedwiththedatabase.inFigure5.3(a)),thereisafurtherproblemofcaseswherethelistedimageswerenolongeravailable,buttheimagehostservedaplaceholderimage(whichdoesnotcontainafaceatall),ratherthansimplyfailingtodownloadanything.FurtherrestrictingthedatabasetoimagesforwhichtherearenolistedURLduplicates,thatwewereactuallyabletodownload,andfurtherarenotexactduplicatesofanyotherimagesinthedatabaseleftuswithatotalof1.7millionimages.Inadditiontothetheissueofduplicateimagesinthedatabase,manualexaminationoftheimagesofarandomsubjectislikelytorevealanumberofmislabeledimages.OneexampleisshowninFigure5.4,wheresomeunrelatedindividualsarelabeledasaparticular123actor.Inonecase,thisislikelyduetonameconfusion,withanimageofcharacter\JameKirk,"beinglabeledasactualactor\JustinKirk."OnepotentialavenueforaddressingthismislabelingproblemistoconsidertherankinginformationthatwasprovidedwiththeVGG-Facedatabase.Aftertheycollectedtheimages,theVGGgroupprioritizedtheimagesforeachsubjectusinganAlexnet[50](retrainedontheVGG-Facesubjects),andrankedtheimagesofeachsubjectaccordingtothemagnitudeofthesoftmaxoutputforthatsubject.Figure5.5showsthehighestandlowestrankingpairsofimagesforonesubject,thehighestrankingpairscontainclearfrontalfaces,whilethelower-rankingpaircontainahigherdegreeofposevariation.InFigure5.4,thecorrectimageofJustinKirkisrank25,whilethetwomislabeledimagesareranks787and981,indicatingthatmislabeledimagesmayalsoranklowonthelistforaparticularsubject.Restrictingthetrainingsettoe.g.thetop500rankedimagesforeachsubjectcouldpotentiallyimproveperformance(ifitleadstotheexclusionofalargefractionofmislabeleddata),butcouldpotentiallyharmperformance(ife.g.itexcludescorrectlylabeledimages,exhibitingconditions,likelargeposevariation).Table5.1showsthenumberofimagesavailableforuseasatrainingset,undertherestrictionsmentionedabove.WealsoconsidercombiningtheVGG-FacedatabasewiththeCASIA-Webfacedatabase.Thereisasubstantialoverlapinsubjectsbetweenthesetwodatabases,ofthe2;622subjectsinVGG,1;871arealsosubjectsintheCASIA-Webfacedatabase,leadingtoatotalofjust11;326uniquesubjectsinthecombineddatabase.5.3DeepResidualNetworksHeetal.[31]useda\deepresidualnetwork"architecturetoachievestate-of-the-artresultsontheImageNetgeneralobjectrecognitiondatabase.Inparticular,theresultsattainedbythisarchitectureonobjectrecognitiontasksoutstrippedpreviouslyreportedVGGnetworkarchitectureresults[79].ThebasicobservationofthepaperisthatwhenaddingadditionallayerstoaconventionalCNN,therecomesapointwhenaddingfurtherlayersnotonlydoesn't124Table5.1Numberofimagesandsubjectsavailableforuseintrainingsets,undervariousDatabaseNumberofImagesNumberofsubjectsVGG-FullDatabase2:62M2;622VGG-NoDuplicateURLs2:3M2;622VGG-NoDuplicateURLs&DownloadableIm-ages1:9M2;622VGG-NoDuplicateFiles&DownloadableIm-ages1:7M2;622VGG-NoDuplicateFiles&DownloadableIm-ages&top-500per-Subject916;8232;622CASIA-Webface494;41410;575CASIA-Webface,withFacesDetected435;48710;575VGG-NoDuplicateFiles&CASIA-Webface,withFacesDetected2:1M11;326improvetestingaccuracy,butalsodoesnotimprovetrainingaccuracy.Soratherthanthetypicalover-problem(whereanoverlycomplexmodelthetrainingsetwell,butfailstogeneralize),theobservedissueisanproblem,wheretheoptimizationprocedurefailstothemorecomplexmodeltothetrainingsetintheplace.Duetotheincorporationofbatchnormalizationlayers[36],theauthorsarguethattheoptimizationproblemisnotnumericalyincomputinggradients,butratherisrelatedtotheltlyinlearninganidentitymappingwithaconventionalnetworkarchitecture.Givenaconvolutionallayer,andactivationlayer,itistheoreticallypossibletolearnasetofweightsthatpassestheinputthroughunchanged,butinpracticethisseemstoobtainduringoptimization.Thisisdirectlyrelevanttothegissuediscussedearlierinthatitshouldalwaysbepossibletostackmorelayerswithouthurtingtrainingsetaccuracy,aslongasthelayerscanconvergetoanidentitymapping.TheproposedsolutionistheresidualnetworkblockoutlinedinFigure5.6.Thebasicideaisthatratherthanlearningadirectmappingoftheinput,thenetworklearnssomethingwhichisaddedtotheinput(aresidual),ateachblock.Thisisaccomplishedthroughtheuseofa\skip"connection,wheretheinputtoresidualblockxispassedthroughaseries125Figure5.6ResidualNetworkblockstructure.(a)originalversion[31],(b)fullpre-activationversion[32].ofconvolutionallayersinonebranch,butalsopasseddirectlythroughwithoutalterationinanotherbranch,withbothbranchesbeingcombinedinasimpleadditionoperation,andpasstheresultthroughaReLUlayerasshowninFigure5.6(a).Thesameauthorsproposedafurthermocation,showninFigure5.6(b),wheretheblockisslightlyrestructuredtoremovethenalReLUlayeroutoftheskipconnection,reducingthedistortionoftheoutputbeforeenteringthenextresidualblock.TheauthorsshowconvincingimprovementsoverabasicVGG-stylearchitecture,byincorporatinglargernumbersofresidualblocks.Wedirectlyadapttheproposedarchitectureforfacerecognition(leveragingtheTorch7framework1andtheimplementationofresidual1http://torch.ch/126networksreleasedbyFacebookAIResearchfb.resnet.torch2).Wehaveinvestigatedthe18-layer,and50-layerarchitecturesoutlinedin[31];asummaryofthesearchitecturesisshowninTable5.2.Intermsofdataaugmentation,wescaleournormalizedimages(followingthealignmentprocedurediscussedinChapter2)to256by256,andrandomlycrop224by224regionsduringtraining,weadditionallyrandomlyhorizontallyimagesduringtraining,andusethescaleandaspect-ratioaugmentationsfrom[83].Table5.2Detaileddeepresidualnetworkarchitectures,following[31].LayerNamesOutputSize18-Layer50-LayerData224224conv11121127x7,64,stride2conv2x565633maxpool,stride211;649=;333;64˙233;6433;6411;256conv3x282811;1289=;433;128˙233;12833;12811;512conv4x141411;2569=;633;256˙233;25633;25611;1024conv5x7711;5129=;333;512˙233;51233;51211;2048FinalLayersaveragepool,fc,softmax5.4ExperimentalResultsAseriesoftrainingsets,andcorrespondingvevaluationresultsontheBLUFRprotocolareshowninTable5.3.Resultsfromourpreviousexperiments,trainingnetworkslargelyfollowing[104],areincludedatTable5.3-1,2.WealsoattemptedtotrainourpreviousarchitectureonthefullVGG-Facedatabase,butitfailedtoconverge.Asaproofofconcept,wetrainedan18-layerresidualnetworkontheCASIA-Webfacedatabase(Table5.3-2https://github.com/facebook/fb.resnet.torch127Table5.3BLUFRvperformancethatweachievedusingtnetworkarchi-tecturesandtrainingsets.Resultsreportedas(mean-standarddeviation)across10folds.VR(%)at0:1FAR(%)TrainingSetNetwork(1)CASIA-Webface\FromScratch"84:41(2)CASIA-Webface\FromScratch,"fusionof9models88:00(3)CASIA-Webface18-layerResNet82:06(4)VGG-CASIA50-layerResNet88:67(5)VGG-CASIA50-layerResNet,10-crop89:74(6)CASIA-Webface50-LayerPre-ResNet88:36(7)CASIA-Webface50-LayerPre-ResNet,10-crop89:64(8)VGG-Face50-LayerResnet81:40(9)VGG-Deduplicated50-LayerPre-ResNet86:98(10)VGG-Deduplicated50-LayerPre-ResNet,epoch1183:99(11)VGG-Deduplicated50-LayerPre-ResNet,weightedloss,epoch1182:31(12)VGG-Deduplicated,top-50050-LayerPre-Resnet85:28(13)VGG-Deduplicated+CASIA-Webface50-LayerPre-ResNet91:04(14)VGG-Deduplicated+CASIA-Webface50-LayerPre-ResNet,10-crop92:22(15)VGG-Deduplicated+CASIA-Webface101-LayerPre-ResNet91:18(16)VGG-Deduplicated+CASIA-Webface101-LayerPre-ResNet,10-crop92:10128Figure5.7DeduplicatedVGGdatabaseimages/subjectdistribution.3),andattainedresultsabitworsethanthepreviouslyusednetworkarchitecture.Thisisexpected,sinceintheoriginalResNetpaper,18-layernetworksshowednosubstantialimprovementovertheirVGG-stylecounterparts.Next,weattemptedtotraina50-layerresidualnetwork,usingthefullVGG-Facedatabase.Thiswasultimatelynotverye,astheoptimizationstalledandattainedworseperformanceonBLUFRthanweattainedwithasmallernetworktrainedontheCASIA-Webfacedatabase(Table5.3-8).Wethenre-trainedthis50layerarchitectureontheCASIA-Webfacedatabase,for37epochs,andattainednotablybetterperformancethanourpriorresultsontheBLUFRbenchmark(Table5.3-4).DuetothesubstantialamountoftimespentretrainingontheCASIA-Webfacedatabase,itisnaturaltowonderiftheinitialtrainingonVGGhadanytWethereforetraineda50-layernetworksolelyonCASIA-Webface(thistimeusingthefullypre-activatednetworkvariantdiscussedin[32]),andattainedslightlyworseresults(Table5.3-6).However,129Figure5.8BLUFRROCCurves.thebetweenthenetworks'performancewaslessthan1standarddeviation,soitistocreditthisasasitanditseemsthattrainingontheVGG-Facedatabaseinitiallyleadtonotimprovement.Subsequently,weattemptedtotrainanetworkonthededuplicatedVGG-Facedatabase(removingallimagesinvolvedinduplicateURLsets,andallimageswithexactdu-plicates).ThisimprovedperformanceovertrainingonthefullVGG-Facedatabase(Ta-ble5.3-9),buttheresultsstilllagbehindthenetworktrainedonCASIA-Webface.Wethenconsideredthedistributionofimagespersubjectinthededuplicateddatabase(showninFigure5.7),andnoticedthatitwastlymoreimbalancedthanthecompleteVGG130database.Weattemptedtoaddressthisclassimbalancebyweightingclassesinthelosslayerinverselyproportionaltotheirfrequencyinthedatabase.However,thisappearedtoleadtoworseperformanceafter11epochs(Table5.3-1vs.10),andwecuttrainingshort.Althoughanimbalancedpriorislikelytohurtperformance,wedonotusethecationoutputofthenetworkdirectly(rather,weusethevaluesofapoolinglayerdirectlyasafeaturevector),soitisreasonablethatattemptingtoaddresstheclassimbalanceissueleadtonoimprovement.WeadditionallytriedrestrictingtheVGG-Facedatabasetocontainonlynon-duplicateimages,andalsoonlyimagesrankedinthetop-500foreachsubject.ThisleadtoloweroveralltrainingandvalidationerrorthanwasseenwhentrainingontheVGG-dedupdatabase,butleadtoworseperformanceontheBLUFRbenchmark(Table5.3-12).Soitseemsthattheprioritizationremovedtoomuchvariancefromthetrainingset,andhurtgeneralization.TrainingonthecombinationoftheCASIA-WebfaceanddeduplicatedVGG-Facedatabasesdidimproveourvrateat0:1%FARto92:22%(Table5.3-14)fromourpreviousbestresultof89:74(Table5.3-5).Someexamplesofincorrectvdecisionsat0:1%FARmadebyourpreviousmodel,thatwerenotmadebyournewmodelareshowninFigure5.9,whilesomeerrorsmadebyournewmodel,thatwerenotmadebythepreviousmodelareshowninFigure5.10.Weadditionallytraineda101-layernetworkonthecombinedVGG-FaceandCASIA-Webfacedatabase;however,althoughthisreducedourobservederroronthevalidationset(consistingofarandomsampleofimagesofthetrainingsetsubjects),itdidnotleadtoimprovedperformanceonBLUFR,asshowninTable5.3-16.ThebestresultswehaveattainedonBLUFR92:22%VRat0:1%FARinvand62:05%DIRat1%FARinopen-setidenslightlylagsomenewlyreportedresultsontheprotocol.Chengatal.[15]developedsomeimprovementstothepopularJoint-Bayesianvmethod,andusingaGoogLeNet-styleinceptionarchitecturecombinedwithtraditionalJoint-Bayesattaineda92:19%VRat0:1FAR,andimprovedthisresultto93:05%usingtheirimprovedmethodforestimatingtheJoint-Bayesparameters.Lvetal.[59]proposedanoveldataaugmentationmethod(perturbingdetectedfaciallandmark131(a)(b)Figure5.9Examplecorrectvdecisionsat0:1%FARonfold1oftheBLUFRprotocolmadebyour50-LayerDeepresidualNetwork,thatweremadeincorrectlybyourpreviousDeepModel.(a)4genuinepairsacceptedunderthenewmodel,rejectedundertheoldmodel,(b)4impostorpairsrejectedunderthenewmodelthatwereacceptedbytheoldmodel.132(a)(b)Figure5.10Exampleincorrectvdecisionsat0:1%FARonfold1oftheBLUFRprotocolmadebyour50-LayerDeepresidualNetwork,thatweremadecorrectlybyourpreviousDeepModel.(a)4genuinepairsrejectedunderthenewmodel,thatwererejectedundertheoldmodel,(b)4impostorpairsacceptedunderthenewmodel,thatwererejectedundertheoldmodel.133locations,priortoimagealignment),againusinganInceptionarchitecture,andattaineda63:73%DIRat1%FARinopen-setidenusingthefusionof3models(withabestsingle-modelperformanceof57:90%).Theseresultsindicatethatourresultscouldpotentiallybefurtherimproved,throughtheincorporationofmetriclearningmethods,orfusingmultiplemodels.Figure5.8showsfullROCcurvesforsomekeyresultsontheBLUFRprotocol.The18-LayerResNettrainedonCASIA-Webfaceunderperformedourpreviousnetwork,butthe50-layerResNetleadtoatimprovement.Additionally,usingthestandardImageNet10-cropstrategy(takeaninputimage,cropfromthecenterasusual,additionallycropstartingfromthefourcornersoftheimage,anddothesameforthehorizontallyedimage,(Table5.3-5,7,14),leadstoaminorimprovementinvperformance(whichisperhapsnotwortha10-foldincreaseinfeatureextractiontime).OnthestandardLFWprotocol,weattain(1-EER)%errorof97:23,whichisquiteclosetothevalueof97:27reportedbytheVGGgroupfortrainingtheirnetworkonthefullVGG-Facedatabase.TheVGGgroupreportedbetterresultsbyemployingdatabase-sptraining,ontheLFWprotocol'strainingfoldswithatripletlossfunction(upto99:13%1-EER),butwearelessinterestedindatabasespOurresultsdivergefromthosereportedbytheVGGgroupinthatwesawlittletonoimprovementfromtrainingontheVGG-Facedatabase,andadditionallyfoundthatpruningduplicateimagesimprovedresults{whichiscontrarytotheresultreportedin[69],wheretrainingontheentiredatabaseleadtobetterperformancethanamanuallycleanedversion.Comparingtheseresults,therearesomedatabase(wewerenotabletodownloadapproximately400Kimagesfromtheoriginaldatabase,andusedatdatabasecleaningmethod),andrencesinnetworkarchitecture.ItispossiblethattheresidualnetworksweemployedaremoresensitivetolabelingnoisethantheVGGarchitecture,buttosaywithoutamoredetailedinvestigation.1345.5ConclusionsWehaveshownimprovedperformanceonthefacevtask,usingadeepresidualnetworkarchitecture,attainingvperformancesuperiortowhatwehadpreviouslyreachedusingafusionof9separatemodels.Aalrgerperformanceimprovementwaattainedduetothechangednetworkarchitecturethanbyaddingthe2MimageVGG-Facedatabasetothejust400KimageCASIA-Webfacedatabasewepreviouslyusedintraining,whichisasurprisingresult.ThiscouldbeduetotheprevalenceofincorrectlylabeleddataintheVGG-Facedatabase,althoughthiswouldbecontrarytotheresultin[69]thattrainingonthelargerversionofthedatabaseresultedinbetterperformance(eveninthepresenceofsomefractionofmislabeleddata).AnotherissueisthatwhiletheCASIA-WebfacetrainingsetcontainssubstantiallyfewerimagesthanVGG-Face,italsocontainssubstantiallymoresubjects(at10Kvs.2:6K),soitmaypotentiallybebetteratcapturinginter-subjectvariation(evenifithasfewersamplestoprovideintra-subjectvariationduringtraining)thantheVGG-Facedatabaseonitsown.135Chapter6SummaryandFutureWorkThisthesishaspresentedadeeplearningbasedfacerepresentation,evaluateditsperformanceonstandardfacerecognitionbenchmarks,andappliedittolarge-scalesearch,andclusteringproblems.Weadditionallyinvestigatedboththeimpactofincreasingthesizeofthetrainingset,andincorporatinganimprovednetworkarchitectureonfacerecognitionperformance.Weattainedamajorimprovementfromusingadeepresidualnetworkarchitecture,andaminorimprovementfromcombiningtheVGG-FacedatabasewiththeCASIA-Webfacedatabaseduringtraining.Intermsoffacerepresentationdevelopment,weevaluatedantdeepconvolutionalnetworkforfacerecognition,trainedonalargepublicdomaindata(CASIA[104])onthreefacedatabasesofincreasingcomplexity:PCSOmugshotdatabase,LFWdatabase(onlycontainsfacesdetectablebytheOpenCVversion1.0Viola-Jonesfacedetector),andtheIJB-Adatabase(containsseveralfaceswhicharenotdetectablebytheViola-Jonesdetector).Weadditionallydevelopedamoreadvancedrepresentationbasedonastate-of-the-artdeepresidualnetworkarchitecture,andshowedimprovedperformanceontheBLUFRbenchmark.Weappliedtheinitiallydeveloped320-dimensionalrepresentationtolarge-scalefacesearchandclusteringproblems,handlingdatabasescontainingupto120millionfaceimages,andfurtherperformedacasestudyusingfaceimagesoftheTsarnaevbrothersinvolvedin136the2013BostonMarathonbombingasqueriesagainstan80Mimagegallery,andshowedthatDzhokharTsarnaev'sphotocouldbeidenatrank8whensearchingagainstthe80Mgallery.Intermsofclustering,wedevelopedanupdatedapproximateclusteringalgo-rithmforlargescalefaceclusteringimprovingonthemethodpresentedbyZhuetal.[108],clusteringdatabasesupto123Mimagesinsize.Weevaluatedtheenessofimagede-duplicationontheclusteringprocess(andshowedsomeimprovementinclustering121Mimagesbycullingduplicates).Weadditionallyperformedapreliminarystudyoftheap-plicabilityoftheproposedfaceclusteringmethodtovideo.Finally,wedevelopedintrinsic(nofacelabels)per-clusterqualitymeasures,tofacilitateuseofclusteringasapre-cursortomanualdatabaseinvestigation.Oneobviousdirectionforfutureworkistoincorporatetheupdatedfacerepresentationdevelopedinchapter5intotheclusteringandretrievalproblems.DirectlyapplyingourpreviousclusteringmethodtothisrepresentationleadtodrasticallyworseperformancethanpreviouslyattainedonLFW,soitwillbenecessarytoevaluatethefunctionalbe-tweentheResNetbasedrepresentation,andourpreviouslyusedCNNbasedrepresentation.Applyingtheimprovedfacerepresentationtosearchapplicationsisofinterestaswell,par-ticularlystudyingtheinteractionbetweenthenewrepresentationandentapproximatesearchmethodsisofgreatpracticalinterest.137BIBLIOGRAPHY138Bibliography[1]TimoAhonen,AbdenourHadid,andMattiPietikainen.Facedescriptionwithlocalbinarypatterns:Applicationtofacerecognition.IEEETrans.onPAMI,28(12),2006.[2]DavidArthurandSergeiVassilvitskii.k-means++:Theadvantagesofcarefulseeding.InProc.oftheEighteenthAnnualACM-SIAMSymposiumonDiscreteAlgorithms,2007.[3]PeterN.Belhumeur,Jo~aoP.Hespanha,andDavidJKriegman.Eigenfacesvs.faces:Recognitionusingclasssplinearprojection.IEEETrans.onPAMI,19(7),1997.[4]L.Best-Rowden,H.Han,C.Otto,B.F.Klare,andA.K.Jain.Unconstrainedfacerecognition:Identifyingapersonofinterestfromamediacollection.IEEETrans.onTIFS,12,2014.[5]BinodBhattarai,GauravSharma,FredericJurie,andPatrickPerez.Somefacesaremoreequalthanothers:Hierarchicalorganizationforaccurateandtlarge-scaleidentity-basedfaceretrieval.InECCVWorkshops,2014.[6]DuaneM.Blackburn,MikeBone,andJonathonP.Phillips.Facerecognitionvendortest2000:evaluationreport.Technicalreport,DTICDocumenthttp://www.dtic.mil/dtic/tr/fulltext/u2/a415962.pdf,2001.[7]VolkerBlanzandThomasVetter.Facerecognitionbasedona3dmorphablemodel.IEEETrans.onPAMI,25(9),2003.[8]BernhardEBoser,IsabelleMGuyon,andVladimirNVapnik.AtrainingalgorithmforoptimalmarginInProc.WorkshoponComputationalLearningTheory,1992.[9]UlrikBrandes,MarcoGaertler,andDorotheaWagner.Experimentsongraphcluster-ingalgorithms.Algorithms-ESA2003,2003.[10]ZhiminCao,QiYin,XiaoouTang,andJianSun.Facerecognitionwithlearning-baseddescriptor.InProc.CVPR,2010.[11]B.Chen,Y.Chen,Y.Kuo,andW.Hsu.Scalablefaceimageretrievalusingattribute-enhancedsparsecodewords.IEEETrans.onPAMI,15,2012.[12]D.Chen,X.Cao,L.Wang,F.Wen,andJ.Sun.Bayesianfacerevisited:Ajointformulation.InProc.ECCV,2012.[13]J.Chen,V.Patel,andR.Chellappa.Unconstrainedfacevationusingdeepcnnfeatures.arXivpreprintarXiv:1508.01722,2015.139[14]JieChen,HawrenFang,andYousefSaad.Fastapproximatek-NNgraphconstructionforhighdimensionaldataviarecursivelanczosbisection.TheJournalofMachineLearningResearch,10,2009.[15]ChengCheng,JunliangXing,YoujiFeng,DelingLi,andXiang-DongZhou.Boot-strappingjointbayesianmodelforrobustfacevInProc.ICB,2016.[16]RadhaChitta,RongJin,TimothyCHavens,andAnilKJain.Approximatekernelk-means:Solutiontolargescalekernelclustering.InProc.InternationalConferenceonKnowledgeDiscoveryandDataMining,2011.[17]AnnaChoromanska,MikaelMichaelMathieu,GerardBenArous,andYannLeCun.Thelosssurfacesofmultilayernetworks.arXivpreprintarXiv:1412.0233,2014.[18]TimothyFCootes,GarethJEdwards,andChristopherJTaylor.Activeappearancemodels.IEEETrans.onPAMI,(6),2001.[19]J.Cui,FangWen,RongXiao,YuandongTian,andXiaoouTang.Easyalbum:aninteractivephotoannotationsystembasedonfaceclusteringandre-ranking.InProc.oftheACMSIGCHIConferenceonHumanfactorsinComputingSystems,2007.[20]YannNDauphin,RazvanPascanu,CaglarGulcehre,KyunghyunCho,SuryaGan-guli,andYoshuaBengio.Identifyingandattackingthesaddlepointprobleminhigh-dimensionalnon-convexoptimization.InAdvancesinNeuralInformationProcessingSystems,2014.[21]GeorgeDoddington,WalterLiggett,AlvinMartin,MarkPrzybocki,andDouglasReynolds.Sheep,goats,lambsandwolves:Astatisticalanalysisofspeakerperfor-manceintheNIST1998speakerrecognitionevaluation.Technicalreport,DTICDoc-umenthttp://www.dtic.mil/dtic/tr/fulltext/u2/a528610.pdf,1998.[22]W.Dong,Z.Wang,W.Josephson,M.Charikar,andK.Li.ModelingLSHforperfor-mancetuning.InCIKM,2008.[23]JunJieFoo,JustinZobel,andRanjanSinha.Clusteringnear-duplicateimagesinlargecollections.InProc.oftheInternationalWorkshoponMultimediaInformationRetrieval,2007.[24]JeromeH.Friedman,JonLouisBentley,andRaphaelAriFinkel.Analgorithmforbestmatchesinlogarithmicexpectedtime.ACMTrans.onMathematicalSoftware(TOMS),3(3),1977.[25]YunchaoGong,MarcinPawlowski,FeiYang,LouisBrandy,LubomirBoundev,andRobFergus.Webscalephotohashclusteringonasinglemachine.InProc.CVPR,2015.[26]KChidanandaGowdaandGKrishna.Agglomerativeclusteringusingtheconceptofmutualnearestneighbourhood.PatternRecognition,10(2),1978.140[27]P.GrotherandM.Ngan.Facerecognitionvendortest(FRVT):Performanceoffaceidenalgorithms.NISTInteragencyReport8009,2014.[28]PatrickJ.Grother,GeorgeW.Quinn,andJonathonP.Phillips.Reportontheeval-uationof2Dstill-imagefacerecognitionalgorithms.NISTInteragencyReport,7709,2010.[29]MatthieuGuillaumin,JakobVerbeek,andCordeliaSchmid.Isthatyou?metriclearningapproachesforfaceidenion.InProc.ICCV,2009.[30]K.He,X.Zhang,S.Ren,andJ.Sun.Delvingdeepintoers:Surpassinghuman-levelperformanceonImagenetarXivpreprintarXiv:1502.01852,2015.[31]KaimingHe,XiangyuZhang,ShaoqingRen,andJianSun.Deepresiduallearningforimagerecognition.InProc.CVPR,2016.[32]KaimingHe,XiangyuZhang,ShaoqingRen,andJianSun.Identitymappingsindeepresidualnetworks.arXivpreprintarXiv:1603.05027,2016.[33]GEHinton,SimonOsindero,andYee-WhyeTeh.Afastlearningalgorithmfordeepbeliefnets.Neuralcomputation,18(7),2006.[34]JeHo,Ming-HsuanYang,JongwooLim,Kuang-ChihLee,andDavidKriegman.Clusteringappearancesofobjectsundervaryingilluminationconditions.InProc.CVPR,2003.[35]GaryB.Huang,ManuRamesh,TamaraBerg,andErikLearned-Miller.Labeledfacesinthewild:Adatabaseforstudyingfacerecognitioninunconstrainedenvironments.TechnicalReport07-49,UniversityofMassachusetts,Amherst,October2007.[36]SergeyandChristianSzegedy.Batchnormalization:Acceleratingdeepnetworktrainingbyreducinginternalcovariateshift.InProc.ICML,2015.[37]A.Jain,K.Nandakumar,andA.Ross.Scorenormalizationinmultimodalbiometricsystems.PatternRecognition,38(12),2005.[38]A.K.JainandR.C.Dubes.AlgorithmsforClusteringData.PrenticeHall,1988.[39]A.K.JainandS.Z.Li(eds.).HandbookofFaceRecognition,2ndedition.Springer-Verlag,2011.[40]AnilKJain.Dataclustering:50yearsbeyondk-means.PatternRecognitionLetters,31(8),2010.[41]ViditJainandErikLearned-Miller.FDDB:Abenchmarkforfacedetectioninuncon-strainedsettings.Technicalreport,UniversityofMassachusetts,Amherst,2010.[42]H.Jegou,M.Douze,andC.Schmid.Productquantizationfornearestneighborsearch.IEEETrans.onPAMI,33,2011.141[43]Y.KalantidisandY.Avrithis.Locallyoptimizedproductquantizationforapproximatenearestneighborsearch.InProc.CVPR,2014.[44]VahdatKazemiandJosephineSullivan.Onemillisecondfacealignmentwithanen-sembleofregressiontrees.InProc.CVPR,2014.[45]IraKemelmacher-ShlizermanandStevenMSeitz.Facereconstructioninthewild.InProc.ICCV,2011.[46]J.Kittler,M.Hatef,R.Duin,andJ.Matas.OncombiningIEEETrans.onPAMI,20,1998.[47]B.Klare,A.Blanton,andB.Klein.tfaceretrievalusingsynecdoches.InProc.ICB,2014.[48]BrendanFKlare,BenKlein,EmmaTaborsky,AustinBlanton,JordanCheney,KristenAllen,PatrickGrother,AlanMah,MarkBurge,andAnilKJain.Pushingthefrontiersofunconstrainedfacedetectionandrecognition:IARPAJanusbenchmarkA.InProc.CVPR,2015.[49]JoshuaCKlontzandAnilKJain.Acasestudyofautomatedfacerecognition:TheBostonMarathonbombingssuspects.IEEEComputer,46(11),2013.[50]AlexKrizhevsky,IlyaSutskever,andEHinton.Imagenetwithdeepconvolutionalneuralnetworks.InAdvancesinNeuralInformationProcessingSystems,2012.[51]N.Kumar,A.C.Berg,andS.K.Nayar.AttributeandsimileforfacevInProc.ICCV,2009.[52]M.Lades,J.C.Vorbruggen,J.Buhmann,J.Lange,C.vonderMalsburg,R.P.Wurtz,andW.Konen.Distortioninvariantobjectrecognitioninthedynamiclinkarchitecture.IEEETrans.onComputers,42,1993.[53]LeCun,BBoser,JohnSDenker,D.Henderson,RichardEHoward,W.Hubbard,andLawrenceDJackel.Handwrittendigitrecognitionwithaback-propagationnetwork.InAdvancesinNeuralInformationProcessingSystems,1990.[54]PengLi,YunFu,UmarMohammed,JamesH.Elder,andSimonJ.D.Prince.Proba-bilisticmodelsforinferenceaboutidentity.IEEETrans.onPAMI,34(1),2012.[55]S.Liao,Z.Lei,D.Yi,andS.Z.Li.Abenchmarkstudyoflarge-scaleunconstrainedfacerecognition.InProc.IJCB,2014.[56]JinguoLiu,YafengDeng,andChangHuang.Targetingultimateaccuracy:Facerecognitionviadeepembedding.arXivpreprintarXiv:1506.07310,2015.[57]TingLiu,CharlesRosenberg,andHenryARowley.Clusteringbillionsofimageswithlargescalenearestneighborsearch.InProc.WACV,2007.142[58]StuartPLloyd.LeastsquaresquantizationinPCM.IEEETrans.onInformationTheory,28(2),1982.[59]Jiang-JingLv,ChengCheng,Guo-DongTian,Xiang-DongZhou,andXiZhou.Land-markperturbation-baseddataaugmentationforunconstrainedfacerecognition.SignalProcessing:ImageCommunication,2016.[60]SpirosMancoridis,BrianSMitchell,ChrisRorres,Yih-FarnChen,andEmdenRGansner.Usingautomaticclusteringtoproducehigh-levelsystemorganizationsofsourcecode.InIWPC,volume98,1998.[61]MarkusMathias,RodrigoBenenson,MarcoPedersoli,andLucVanGool.Facedetec-tionwithoutbellsandwhistles.InProc.ECCV,2014.[62]D.Miller,I.Kemelmacher-Shlizerman,andS.M.Seitz.Megaface:Amillionfacesforrecognitionatscale.arXivpreprintarXiv:1505.02108,2015.[63]MariusMujaandDavidG.Lowe.Scalablenearestneighboralgorithmsforhighdi-mensionaldata.IEEETrans.onPAMI,36,2014.[64]VinodNairandEHinton.linearunitsimproverestrictedboltzmannmachines.InProc.InternationalConferenceonMachineLearning,2010.[65]H.NgandS.Winkler.Adata-drivenapproachtocleaninglargefacedatasets.InICIP,2014.[66]MasashiNishiyama,AbdenourHadid,HidenoriTakeshima,JamieShotton,TatsuoKozakaya,andOsamuYamaguchi.Facialdeblurinferenceusingsubspaceanalysisforrecognitionofblurredfaces.IEEETrans.onPAMI,33(4),2011.[67]CharlesOtto,BrendanKlare,andAnilJain.Ancientapproachforclusteringfaceimages.InProc.ICB,2015.[68]CharlesOtto,DayongWang,andAnilK.Jain.Clusteringmillionsoffacesbyidentity.arXivpreprintarXiv:1604.00989,2016.[69]OmkarMParkhi,AndreaVedaldi,andAndrewZisserman.Deepfacerecognition.InBritishMachineVisionConference,2015.[70]JonathonP.Phillips,PatrickJ.Flynn,ToddScruggs,KevinW.Bowyer,andWilliamWorek.Preliminaryfacerecognitiongrandchallengeresults.InProc.FGR,2006.[71]JonathonP.Phillips,HyeonjoonMoon,SyedA.Rizvi,andPatrickJ.Rauss.Theferetevaluationmethodologyforface-recognitionalgorithms.IEEETrans.onPAMI,22(10),2000.[72]JonathonP.Phillips,ToddW.Scruggs,AliceJ.O'ToolePatrickJ.Flynn,KevinW.Bowyer,CathyL.Schott,andMatthewSharpe.FRVT2006andICE2006large-scaleexperimentalresults.IEEETrans.onPAMI,32(5),2010.143[73]P.Phillips,P.Flynn,T.Scruggs,K.Bowyer,J.Chang,K.J.Marques,J.Min,andW.Worek.Overviewofthefacerecognitiongrandchallenge.InProc.CVPR,2005.[74]P.Phillips,H.Moon,S.Rizvi,andP.Rauss.TheFERETevaluationmethodologyforface-recognitionalgorithms.IEEETrans.onPAMI,22,2000.[75]R.Ranjan,V.Patel,andR.Chellappa.Adeeppyramiddeformablepartmodelforfacedetection.InIEEEBiometrics:Theory,ApplicationsandSystems(BTAS),2015.[76]JosephRoth,YiyingTong,andXiaomingLiu.Adaptive3dfacereconstructionfromunconstrainedphotocollections.InProc.CVPR,2016.[77]FlorianScDmitryKalenichenko,andJamesPhilbin.Facenet:Aembed-dingforfacerecognitionandclustering.InProc.CVPR,2015.[78]ChanopSilpa-AnanandRichardHartley.Optimisedkd-treesforfastimagedescriptormatching.InProc.CVPR,2008.[79]KarenSimonyanandAndrewZisserman.Verydeepconvolutionalnetworksforlarge-scaleimagerecognition.arXivpreprintarXiv:1409.1556,2014.[80]NitishSrivastava,Hinton,AlexKrizhevsky,IlyaSutskever,andRuslanSalakhutdinov.Dropout:AsimplewaytopreventneuralnetworksfromovTheJournalofMachineLearningResearch,15(1),2014.[81]YiSun,XiaogangWang,andXiaoouTang.Deeplylearnedfacerepresentationsaresparse,selective,androbust.InProc.CVPR,2015.[82]B.ScottSwann.FBIvideoanalyticspriorityinitiative.In17thAnnualConference&ExhibitiononthePracticalApplicationofBiometrics,2014.[83]ChristianSzegedy,WeiLiu,YangqingJia,PierreSermanet,ScottReed,DragomirAnguelov,DumitruErhan,VincentVanhoucke,andAndrewRabinovich.Goingdeeperwithconvolutions.InProc.CVPR,2015.[84]YanivTaigman,MingYang,Marc'AurelioRanzato,andLiorWolf.Deepface:Closingthegaptohuman-levelperformanceinfaceveInProc.CVPR,2014.[85]YanivTaigman,MingYang,Marc'AurelioRanzato,andLiorWolf.Web-scaletrainingforfaceidenInProc.CVPR,2015.[86]XiaoyangTanandBillTriggs.Enhancedlocaltexturefeaturesetsforfacerecognitionunderlightingconditions.IEEETrans.onImageProcessing,19(6),2010.[87]MakarandTapaswi,OmkarMParkhi,EsaRahtu,EricSommerlade,RainerStiefelha-gen,andAndrewZisserman.Totalcluster:Apersonagnosticclusteringmethodforbroadcastvideos.InProc.oftheIndianConferenceonComputerVisionGraphicsandImageProcessing,2014.144[88]YuandongTian,WeiLiu,RongXiao,FangWen,andXiaoouTang.Afaceannotationframeworkwithpartialclusteringandinteractivelabeling.InProc.CVPR,2007.[89]MatthewTurkandAlexPentland.Facerecognitionusingeigenfaces.InProc.CVPR,1991.[90]LaurensVanderMaatenandGeoHinton.Visualizingdatausingt-sne.JournalofMachineLearningResearch,9(2579-2605),2008.[91]ReneVidalandPaoloFavaro.Lowranksubspaceclustering(lrsc).PatternRecognitionLetters,43,2014.[92]PaulViolaandMichaelJJones.Robustreal-timefacedetection.InternationalJournalofComputerVision,57(2),2004.[93]UlrikeVonLuxburg.Atutorialonspectralclustering.StatisticsandComputing,17(4),2007.[94]J.Wan,D.Wang,S.Hoi,P.Wu,J.Zhu,Y.Zhang,andJ.Li.Deeplearningforcontent-basedimageretrieval:Acomprehensivestudy.ACMMultimedia,2014.[95]D.Wang,S.Hoi,Y.He,J.Zhu,T.Mei,andJ.Luo.Retrieval-basedfaceannotationbyweaklabelregularizedlocalcoordinatecoding.IEEETrans.onPAMI,36,2014.[96]D.WangandA.K.Jain.Faceretriever:thegalleryviadeepneuralnet.InProc.ICB,2015.[97]DayongWang,CharlesOtto,andAnilKJain.Facesearchatscale.IEEETrans.onPAMI,2016.[98]JingWang,JingdongWang,GangZeng,ZhuowenTu,RuiGan,andShipengLi.Scalablek-NNgraphconstructionforvisualdescriptors.InProc.CVPR,2012.[99]LaurenzWiskott,J-MFellous,NKuiger,andChristophVonDerMalsburg.Facerecognitionbyelasticbunchgraphmatching.IEEETrans.onPAMI,19(7),1997.[100]LiorWolf,TalHassner,andItayMaoz.Facerecognitioninunconstrainedvideoswithmatchedbackgroundsimilarity.InProc.CVPR,2011.[101]Z.Wu,Q.Ke,J.Sun,andH.Y.Shum.Scalablefaceimageretrievalwithidentity-basedquantizationandmulti-referencere-ranking.InProc.CVPR,2010.[102]J.Yan,Z.Lei,D.Yi,andS.Li.Towardsincrementalandlargescalefacerecognition.InProc.ICB,2011.[103]D.Yi,Z.Lei,Y.Hu,andS.Z.Li.Fastmatchingby2linesofcodeforlargescalefacerecognitionsystems.arXivpreprintarXiv:1302.7180,2013.[104]DongYi,ZhenLei,ShengcaiLiao,andStanZLi.Learningfacerepresentationfromscratch.arXivpreprintarXiv:1411.7923,2014.145[105]M.Zha,Y.Teo,S.Liu,T.Chua,andR.Jain.Automaticpersonannotationoffamilyphotoalbum.InImageandVideoRetrieval.Springer,2006.[106]KaipengZhang,ZhanpengZhang,ZhifengLi,andYuQiao.Jointfacedetectionandalignmentusingmulti-taskcascadedconvolutionalnetworks.arXivpreprintarXiv:1604.02878,2016.[107]J.ZhouandT.Pavlidis.Discriminationofcharactersbyamulti-stagerecognitionprocess.PatternRecognition,27,1994.[108]C.Zhu,F.Wen,andJ.Sun.Arank-orderdistancebasedclusteringalgorithmforfacetagging.InProc.CVPR,2011.146