UNCONSTRAINED 3D FACE RECONSTRUCTION FROM PHOTO COLLECTIONS

By

Joseph Roth

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computer Science - Doctor of Philosophy

2016

ABSTRACT

This thesis presents a novel approach for 3D face reconstruction from unconstrained photo collections. An unconstrained photo collection is a set of face images captured under an unknown and diverse variation of poses, expressions, and illuminations. The output of the proposed algorithm is a true 3D face surface model represented as a watertight triangulated surface with albedo data, colloquially referred to as texture information. Reconstructing a 3D understanding of a face based on 2D input is a long-standing computer vision problem. Traditional photometric stereo-based reconstruction techniques work on aligned 2D images and produce a 2.5D depth map reconstruction. We extend face reconstruction to work with a true 3D model, allowing us to enjoy the benefit of using images from all poses, up to and including profiles. To use a 3D model, we propose a novel normal field-based Laplace editing technique which allows us to deform a triangulated mesh to match the observed surface normals. Unlike prior work that requires large photo collections, we formulate an approach to adapt to photo collections with few images of potentially poor quality. We achieve this through incorporating prior knowledge about face shape by fitting a 3D Morphable Model to form a personalized template before using a novel analysis-by-synthesis photometric stereo formulation to complete the face details. A structural similarity-based quality measure allows evaluation in the absence of ground truth 3D scans. Superior large-scale experimental results are reported on Internet, synthetic, and personal photo collections.

This thesis is dedicated to my beautiful wife Lynnette, whose encouragement along the way was paramount to making it thus far.

ACKNOWLEDGMENTS

"The fear of the Lord is the beginning of knowledge." - Solomon, Proverbs

Throughout my Ph.D. studies, I have come to understand more deeply God's blessing of man with the gift of knowledge and the ability to explore and understand the workings of His creation.

I have been privileged to have Dr. Xiaoming Liu as my advisor. His desire to see me succeed has pushed me to obtain far more than I could imagine alone. The late nights spent writing papers together, attention to the smallest details, and desire to push the bounds of knowledge have inspired my dedication to excellence. I am deeply indebted for his refinement of my writing and presentation skills. I am also grateful to Dr. Yiying Tong for his collaboration and for filling in the gaps of my computer graphics knowledge. I immensely appreciate his contribution and enjoy researching alongside him. I would also like to thank the remainder of my committee members, Dr. Arun Ross, Dr. Anil K. Jain, and Dr. Hayder Radha, for their valuable insights and contributions along the way.

I am grateful to my Computer Vision Lab members, Xi Yin, Amin Jourabloo, Morteza Safdarnejad, Yousef Atoum, Jamal Afridi, and Luan Tran, for the excellent working atmosphere. The willingness to answer any questions and the late nights working together have strengthened all of our work. I will also never forget the times of celebration and entertainment together that keep our sanity. A word of thanks to the larger biometrics community at Michigan State University in the PRIP Lab and i-PRoBe Lab for their insights and questions at our seminars. Specifically, a big thanks to Charles Otto and Lacey Best-Rowden for their assistance with my experiments.

A word of appreciation to Katherine Trinklein, Courtney Kosloski, Linda Moore, Cathy Davison, and Debbie Kruch for their administrative assistance. Special thanks to Katy Luchini Colbry for assisting through organizing professional education and for her mother-like care.

Finally, I would like to thank my family for their prayers and encouragement along the way. The largest thanks go to my beautiful wife Lynnette.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1 Introduction and Contributions
  1.1 Photo Collections
  1.2 Thesis Contributions
Chapter 2 Background and Related Work
  2.1 Scene Parameters
    2.1.1 Object model
    2.1.2 Lighting
    2.1.3 Camera Model
  2.2 Surface Reconstruction
      2.2.0.1 Multi-view Stereo
      2.2.0.2 Photometric Stereo
  2.3 Face Reconstruction Methodologies
    2.3.1 Constrained
      2.3.1.1 Range Scanner
      2.3.1.2 Multi-View Stereo
      2.3.1.3 Photometric Stereo
    2.3.2 Unconstrained
      2.3.2.1 3D Morphable Model
      2.3.2.2 Single Image
      2.3.2.3 Video-based Reconstruction
      2.3.2.4 Photo Collections
  2.4 Applications
        2.4.0.0.1 Medical
        2.4.0.0.2 Face recognition
        2.4.0.0.3 Commercial video editing
        2.4.0.0.4 Virtual communication
  2.5 Organization
Chapter 3 Unconstrained 3D Face Reconstruction
  3.1 Introduction
  3.2 Proposed Algorithm
    3.2.1 2D Landmark Alignment
    3.2.2 Landmark Driven 3D Warping
    3.2.3 Photometric Normals
      3.2.3.1 Initial Normal Estimation
      3.2.3.2 Albedo Estimation
      3.2.3.3 Local Normal Refinement
    3.2.4 Surface Reconstruction
  3.3 Experiments
    3.3.1 Data Preparation
        3.3.1.0.1 Photo collection pipeline
        3.3.1.0.2 Ground truth models
    3.3.2 Results
        3.3.2.0.1 Qualitative evaluation
        3.3.2.0.2 Quantitative evaluation
        3.3.2.0.3 Usage of profile views
        3.3.2.0.4 Additional Reconstructions
  3.4 Summary
Chapter 4 Adaptive 3D Face Reconstruction from Unconstrained Photo Collections
  4.1 Introduction
  4.2 Quality Measures
    4.2.1 Image Distance Square Error
    4.2.2 Mahalanobis Distance
    4.2.3 Mean Euclidean Distance
    4.2.4 Hausdorff Distance
    4.2.5 Surface Normal Distance
        4.2.5.0.1 Summary
  4.3 Algorithm
    4.3.1 Inputs and Preprocessing
      4.3.1.1 Photo collection
      4.3.1.2 Landmarks
        4.3.1.2.1 Landmark marching
    4.3.2 Step 1: Model Personalization
      4.3.2.1 3D Morphable Model
        4.3.2.1.1 Model projection
    4.3.3 Step 2: Photometric Normal Estimation
      4.3.3.1 Lighting Model
      4.3.3.2 Dependability
      4.3.3.3 Global Estimation
      4.3.3.4 Local Selection
    4.3.4 Step 3: Surface Reconstruction
    4.3.5 Adaptive Mesh Resolution
    4.3.6 SSIM Quality Measure
  4.4 Experimental Results
    4.4.1 Experimental Setup
        4.4.1.0.1 Data Collection
        4.4.1.0.2 Metrics
        4.4.1.0.3 Parameters
    4.4.2 Internet Results
      4.4.2.1 Qualitative Evaluation
      4.4.2.2 SSIM Quality Evaluation
      4.4.2.3 Adaptability
    4.4.3 Synthetic Results
    4.4.4 Personal Results
      4.4.4.1 Local Selection
      4.4.4.2 Adaptability
    4.4.5 Discussions
        4.4.5.0.1 Efficiency
        4.4.5.0.2 Coarse to Fine
  4.5 Summary
Chapter 5 Conclusions and Future Work
  5.1 Limitations
        5.1.0.0.1 Landmark reliance
        5.1.0.0.2 Expression variation
        5.1.0.0.3 Specular reflection
        5.1.0.0.4 Continuous surface
        5.1.0.0.5 Hair
  5.2 Future Work
    5.2.1 Texture Basis
    5.2.2 Multiple Reconstructed Shapes
    5.2.3 Face Recognition Application
APPENDIX
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Overview of Face Reconstruction Approaches
Table 2.2: Notations
Table 3.1: Distances of the reconstruction to the ground truth
Table 4.1: Synthetic Surface-to-Surface Error
Table 4.2: Local Selection Error
Table 4.3: SSIM Radius Error
Table 4.4: Personal Collection Adaptability

LIST OF FIGURES

Figure 1.1: Given an unconstrained photo collection of Tom Hanks containing images of unknown pose, expression, and lighting, face reconstruction seeks to create an accurate 3D model of his face. Two additional example reconstructions along with a single representative image are shown as well.
Figure 2.1: Comparison of object models, depth map (a) and triangulated mesh (b). The depth map has a fixed orientation since it is simply an image, but the triangulated mesh may be rendered at any orientation.
Figure 2.2: Unit vectors used in lighting models. $\mathbf{l}$ is the light source direction, $\mathbf{n}$ is the surface normal, $\mathbf{v}$ is the viewpoint direction, and $\mathbf{r}$ is the reflection of the light source.
Figure 2.3: Comparison of perspective and weak perspective projection. With perspective projection, the intrinsic camera parameters and distance to the object affect the location of the point on the image plane. With weak perspective projection, points along orthogonal lines to the image manifest in the same location regardless of depth.
Figure 2.4: Epipolar geometry, the basis for multi-view stereo reconstruction. If the left camera, $O_L$, observes a point $X_L$ and there exists another camera, $O_R$, with known relative orientation, the point must fall along the epipolar line in the right view. If the matching point is determined in the right view, $X_R$, then the position of that point is uniquely determined in 3D space, $X$.
Figure 2.5: Sample images used for photometric stereo. Multiple images from a fixed position are captured under a variety of lighting conditions. The surface normals can be determined due to the brighter appearance when facing the light source.
Figure 2.6: Visual representation of 0th, 1st, and 2nd order spherical harmonics. Red is positive and blue is negative with the intensity indicating the magnitude of the function.
Figure 2.7: Example reconstructions at different levels of detail. From left to right: pore, wrinkle, and smooth.
Figure 2.8: The Konica Minolta Vivid 9i is a non-contact range scanner capable of capturing multiple views of an object and creating a 3D reconstruction accurate to 0.03 mm.
Figure 3.1: Given an unconstrained photo collection of Tom Hanks containing images of unknown pose, expression, and lighting, face reconstruction seeks to create an accurate 3D model of his face. Two additional example reconstructions along with a single representative image are shown as well.
Figure 3.2: Overview of our 3D face reconstruction. Given a photo collection, a generic template face mesh, and 2D landmark alignment, we propose an iterative process to warp the mesh based on the estimated 3D landmarks and the photometric stereo-based normals.
Figure 3.3: Effects of landmark driven warping on face reconstruction. (a) Full reconstruction process from warped template for George Clooney, (b) reconstruction from initial template without warping, and (c) image of Clooney used in reconstruction.
Figure 3.4: The mean curvature normal indicates how a vertex deviates from the average location of its immediate neighbors, which can be evaluated as the Laplacian of the position. The mean curvature $H_i$ can be evaluated through $\mathbf{n}$.
Figure 3.5: Example of template deformation. (a) Initial generic template, (b) template warped for Justin Trudeau, and (c) template warped for Kiera Knightley.
Figure 3.6: Effects of boundary constraints on face reconstruction. (a) Full reconstruction of Clooney and (b) boundary constraint removed.
Figure 3.7: Effects of landmark constraint on face reconstruction. (a) Full reconstruction of Clooney and (b) landmark constraint removed.
Figure 3.8: The effects of the deformation-based surface reconstruction. The black arrows indicate the photometric normal estimates and the orange arrows show the actual surface normal. The black dots are the target landmark locations and the orange dots are the corresponding vertices in the mesh.
Figure 3.9: Visual comparison on Bing celebrities with images from [53] to the left of each of our viewpoints. Note how our method can incorporate the chin and more of the cheeks, as well as producing more realistic reconstructions, especially in the detailed eye region.
Figure 3.10: Results on subjects from the LFW dataset. The reconstructed 3D model, sample image from which we extract the texture, and a novel rendered viewpoint.
Figure 3.11: Distance from the ground truth to the face reconstructed via (a) 2.5D, (b) 2.5D improved, and (c) 3D reconstruction. Distance increases from green to red. Best viewed in color.
Figure 3.12: Comparison of (a) frontal only, (b) including side view for landmark warping, and (c) ground truth scan. The addition of the side view improves the nose and mouth region (see arrows) while also allowing for reconstruction further back on the cheeks.
Figure 3.13: Visualization of many successful examples.
Figure 3.14: Visualization of many successful examples.
Figure 3.15: Visualization of many successful examples.
Figure 3.16: Visualization of difficult examples.
Figure 4.1: The proposed system reconstructs a detailed 3D face model of the individual, adapting to the number and quality of photos provided.
Figure 4.2: Overview of face reconstruction. Given a photo collection, we apply landmark alignment and use a 3DMM to create a personalized template. Then a coarse-to-fine process alternates between normal estimation and surface reconstruction.
Figure 4.3: The landmark marching process. (a) internal (green) landmarks and external (red) paths; (b) estimated face and pose; (c) face with roll rotation removed; (d) landmarks without marching; and (e) landmarks after marching corresponding to 2D image alignment.
Figure 4.4: Effect on albedo estimation with (a) and without (b) dependability. Skin should have a consistent albedo, but without dependability the cheek shows ghosting effects from misalignment.
Figure 4.5: Raw image, synthetic image under estimated lighting conditions, and SSIM used for local selection. Brighter indicates higher SSIM.
Figure 4.6: The mean curvature normal indicates how a vertex deviates from the average location of its immediate neighbors, which can be evaluated as the Laplacian of the position. The mean curvature $H_j$ can be evaluated through $\mathbf{n}$.
Figure 4.7: Synthetic data with lighting (top), pose (middle), and expression (bottom) variation.
Figure 4.8: Qualitative evaluation of a diverse set of individuals from Internet photo collections.
Figure 4.9: Qualitative comparison on celebrities. The proposed approach incorporates more of the sides of the face and neck.
Figure 4.10: Sample rendering used for human perception experiment.
Figure 4.11: Human-based PageRank scores for SSIM.
Figure 4.12: Best (top) and worst (bottom) reconstructions as determined by human (a) and SSIM (b).
Figure 4.13: Histogram of reconstruction performance.
Figure 4.14: An Internet image collection that results in a complete failure reconstruction.
Figure 4.15: (a) George Clooney with different quality images. (b) Reconstruction without coarse-to-fine process. (c) Personal collection with different quality images.
Figure 5.1: Sample synthetic rendering of subject from photo collection using estimated albedo.
Figure 5.2: CMC curve comparing [...] of IJB-A and adding synthetic images rendered from the proposed reconstruction. The synthetic images make an [...] decrease in accuracy.
Figure 5.3: CMC curve comparing baseline IJB-A to a subset of IJB-A where the images removed had a low structural similarity score based on the reconstructions. The resulting subset shows an increase in identification accuracy.
Figure 4: Overview of visual typing behavior. A webcam captures a video of the hands while typing on a keyboard and uses computer vision algorithms to detect and segment the hands and uses the shape information over time to verify the current computer user.
Figure 5: Overview of acoustic typing behavior. A microphone captures the sound produced from key presses and extracts different informative features to determine the computer user.
Figure 6: A hair matcher takes two images and their corresponding segmented hair masks and determines if they belong to the same subject or different subjects.

Chapter 1
Introduction and Contributions

3D reconstruction from photographs is a longstanding problem with much interest in computer vision. The goal of 3D reconstruction is to infer depth information from 2D inputs such as photos or videos. Beginning with reconstruction of rigid desktop objects [73, 62, 28] in highly constrained environments [36, 17, 86], 3D reconstruction has advanced to outdoor environments [80, 41, 76] and even large-scale objects [55, 68, 1] from in-the-wild images. Even so, most reconstruction techniques operate on rigid objects, since the 3D structure will not change over time and common points in different images can directly determine the surface under perspective geometry. One object in particular, the face, is highly studied, since obtaining a user-specific 3D face surface model is useful for applications in 3D-assisted face recognition [14, 42, 59], 3D expression recognition [88], facial animations [21], avatar puppeteering [90], and more. For instance, accurate face models have been shown to improve face recognition by allowing the rendering of a frontal-view face image with neutral expression, thereby suppressing intra-person variability [101]. However, face reconstruction is especially challenging since the face is highly non-rigid and the structure changes with expressions.

In this thesis, I present an approach to solving the problem of unconstrained face reconstruction from photo collections. Unconstrained means there is no prior knowledge about the image capturing conditions. These conditions are often referred to as PIE for variations in Pose, Illumination, and Expression. Pose variation refers to the orientation and position of the person with respect to the camera. Portrait photographs are mainly taken from a frontal orientation, but unconstrained photos may be taken from any orientation, including the side, revealing or obstructing different parts of the face. Illumination refers to the lighting conditions in terms of the number, positions, orientations, or colors of the light sources. Poor illumination may cast shadows or wash out parts of the image by being too bright. Expression variations such as smiling, laughing, or frowning can dramatically change the structure and appearance of a face. Face reconstruction is defined as the process of creating a detailed 3D model of a person's face from 2D input.

Figure 1.1: Given an unconstrained photo collection of Tom Hanks containing images of unknown pose, expression, and lighting, face reconstruction seeks to create an accurate 3D model of his face. Two additional example reconstructions along with a single representative image are shown as well.

1.1 Photo Collections

A photo collection is a set of images of the same individual captured from potentially different cameras at unknown times. This is in contrast to a video, which is a set of images captured from the same camera in quick succession. Photo collections present particularly difficult challenges since no temporal information may be used and different camera lenses may distort images differently. There are a few recent works exploring the problem of reconstructing faces from photo collections. The seminal work by Kemelmacher-Shlizerman and Seitz [53] uses a photometric stereo approach to create a depth map
of the frontal view from the photo collection. To do this, they warp all images to a frontal view so they are in correspondence at the pixel level. Then they use Singular Value Decomposition (SVD) to jointly extract the lighting conditions and the surface normals. Photo collections with a variety of expressions exhibiting different surface normals will produce an averaged or smooth reconstruction. To compensate, they refine the surface normals by selecting, for each pixel, a subset of images that are locally consistent. The refined pixel grid of surface normals is then integrated to produce a 2.5D depth map of the subject. This work has been extended in two directions: in [94], the surface normals from frontal faces are used to improve the fitting of a 3D Morphable Model (3DMM), and in [51, 77], the technique is used to generate a 3DMM.

1.2 Thesis Contributions

In this thesis, the challenging problem of unconstrained face reconstruction from photo collections is addressed. There are two major problems with prior approaches. One, a restrictive 2.5D depth map is reconstructed instead of a true 3D model. Two, a large collection of images is required due to the SVD-based approach to photometric stereo. I make the following contributions in order to solve these outstanding problems.

- A novel Laplace mesh editing technique using surface normals as the input is proposed to allow reconstruction of the 3D face model. Prior approaches require the mean curvature normal field as input; however, we introduce an estimation of the mean curvature based on the normal field. This enables reconstruction when only the surface normals are known, as is the case in photometric stereo methods. While we only apply this method to face reconstruction, it is applicable to general mesh editing.

- A true 3D facial surface model is reconstructed from photo collections. State-of-the-art reconstruction from photo collections aligns all images to a frontal pose before reconstructing a depth map. By formulating the problem on a 3D surface instead of a 2D pixel grid, we can utilize faces from all poses in the reconstruction process. 3D models can capture more details and have broader applications than 2.5D reconstructions.

- Photometric stereo is solved in a
joint Lambertian image rendering formulation, with an adaptive template regularization that allows for graceful degradation to a small number of images. Current face reconstruction uses an SVD-based approach to solving photometric stereo, requiring large photo collections.

- A pose-based dependability measure and structural similarity-based local selection are proposed to use the best images in the collection for reconstructing different parts of the face. Not all images are useful for face reconstruction. When all images are used for every part of the face, a smooth reconstruction is obtained. Previous work assuming large collections aggressively eliminated ~90% of the images in the collection for each face detail. By designing a novel local selection procedure, we are able to use half of the images and only discard parts that would contribute negatively to the reconstruction. This less aggressive approach is necessary for small photo collections.

- A structural similarity-based quality measure is introduced to evaluate reconstruction performance in the absence of ground truth scans. Performance evaluation is crucial to comparing reconstruction techniques. However, many photo collections lack ground truth face scans, and people must rely on qualitative evaluation. We propose a quantitative evaluation criterion for use on Internet photo collections, where ground truth data cannot exist.

Chapter 2
Background and Related Work

Now that the problem has been introduced, I present some background information and related work necessary for fully understanding face reconstruction.

2.1 Scene Parameters

The inverse process of reconstruction is called rendering. Starting with a 3D model composed of a surface and its texture, and using a set of scene parameters describing the object positions, lighting, and camera projection, a 2D image may be rendered. Many assumptions are made due to either limited understanding about properties of different surfaces or, more often, for computational efficiency. I now present some of these assumptions for the different parts necessary to render an image.

2.1.1 Object model

An object model consists of both a representation of the surface shape and the texture. Object surfaces are continuous but are usually approximated with discrete structures. A limited model of an object is to
express the surface as a depth map. Depth maps are represented as an image consisting of a single channel containing the distance of the surface from a viewpoint. This simplifies some tasks by removing the need for camera projection and only storing the visible information. However, such a model is limited because the object only contains information from one view or side of the face, and has limited to no knowledge about the sides or chin of the face that are typically occluded. These depth maps are sometimes called 2.5D models since they only contain partial information in the third dimension. Depth maps are frequently used in conjunction with a regular image, where they are called RGB-D for the standard red, green, and blue channels with the addition of the depth channel.

A true 3D model is better represented as a triangulated mesh, which is able to capture details around the entire surface. A triangle mesh is composed of vertices, edges, and triangle faces. The vertices are positions in 3D space that may contain other information such as texture or a normal vector. An edge is a connection between two vertices. A triangle face is a closed set of three edges. Typically faces share each edge with a single other face in order to make a continuous surface, but there can be boundaries which form holes in the surface. Figure 2.1 shows the differences between a 2.5D and a 3D model. Notice how the 3D model is able to wrap around the sides of the face and fold upon itself at the nostrils. This particular mesh has a single hole and boundary around the back of the head, but is continuous across the face region. Under a triangulated mesh representation, the object contains information about all parts of the face and may be rotated and rendered under novel views.

Figure 2.1: Comparison of object models, depth map (a) and triangulated mesh (b). The depth map has a fixed orientation since it is simply an image, but the triangulated mesh may be rendered at any orientation.

Figure 2.2: Unit vectors used in lighting models. $\mathbf{l}$ is the light source direction, $\mathbf{n}$ is the surface normal, $\mathbf{v}$ is the viewpoint direction, and $\mathbf{r}$ is the reflection of the light source.

2.1.2 Lighting

Without lighting or illumination, all scenes would appear solid black. Photons of light emit from light sources and interact with surfaces in the scene until reaching either the human eye or the camera sensor, which forms the colors in an image
depending on the number of photons which reach the sensor. The interactions with surfaces are complicated and not fully understood. Some surfaces reflect light so that it simply bounces off; in others, like glass, light bends as it passes through; and others, like fog, can partly occlude light while allowing some to pass through. Human skin is a very challenging surface since it is partly translucent: some light reflects off the surface, while some of it goes through the surface, allowing us to see blood veins underneath the skin. Much work exists to create realistic skin rendering [16].

Since lighting is complicated, we often make simplifying assumptions. One common model for lighting is the Lambertian model, which models diffuse light and is expressed as

$$I_d = k_d \, \mathbf{l}^\top \mathbf{n}, \tag{2.1}$$

where $I_d$ is the diffuse intensity, $k_d$ is the intensity of the light source, $\mathbf{l}$ is the light source direction from the surface to the source, and $\mathbf{n}$ is the surface normal. This lighting assumption is not dependent on the point of view and handles matte surfaces well. However, polished surfaces demonstrate changing illumination dependent on the observer's point of view. Think of the bright spot on a bald person's head that moves when you change positions. These are modeled with the Phong illumination model, which includes a specular component,

$$I_s = k_s \, (\mathbf{r}^\top \mathbf{v})^a, \tag{2.2}$$

where $I_s$ is the specular illumination, $k_s$ is the specular constant, $a$ is the power of the cosine angle which determines how glossy the surface is, $\mathbf{v}$ is the view vector from the surface to the point of view, and $\mathbf{r}$ is the reflection of the light source, $\mathbf{r} = 2(\mathbf{l}^\top \mathbf{n})\mathbf{n} - \mathbf{l}$. Figure 2.2 shows a visualization of the different vectors used in the lighting models.

2.1.3 Camera Model

To understand an image, not only do we need to model how light interacts with the surface, but we also need to model how a camera captures the light in the scene. The camera model describes how to project real-world 3D points onto a 2D pixel location in the image. In camera terms, how does the photon of light traveling from the surface reach a particular element on the CCD sensor? The most commonly used model is the pinhole camera model, which assumes the sensing plane is located behind a barrier with a single point allowing light to pass. With this model, the projection is explained by a 3x4 projection matrix [33] $P$, up to a scale, that can be decomposed into an intrinsic and an extrinsic matrix:

$$P = \underbrace{\begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{K} \begin{bmatrix} \underbrace{\begin{matrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{matrix}}_{R} & \underbrace{\begin{matrix} t_x \\ t_y \\ t_z \end{matrix}}_{T} \end{bmatrix} \tag{2.3}$$

The intrinsic matrix $K$ describes the pinhole and image plane properties: focal length $(f_x, f_y)$, principal point $(c_x, c_y)$, and skew $s$. The extrinsic matrix describes the position of the camera in world coordinates: rotation $R$ and translation $T$. Note that typically further assumptions are made, namely $f_x = f_y$, $s = 0$, and $(c_x, c_y)$ is the image center, to leave only 8 parameters including the scale. As described, this model projection equation is also known as perspective projection.

The perspective projection model is a simplification of typical cameras that have a lens distorting the light from a wider range than a pinhole. In these cases, radial distortions need to be considered, particularly for high resolution photographs or wide-angle lenses. However, typically the distortion will be removed prior to using the images in any pipeline by resampling the image with known parameters.

A simpler camera model uses weak perspective projection. In a weak perspective projection model, all rays from the sensor plane are parallel, immediately removing skew, focal length, and the principal point from consideration. The weak perspective model is therefore only the first 2 rows of the extrinsic matrix and has 6 parameters including the scale. A comparison of the two described models is given in Figure 2.3. The weak perspective projection works when the object is relatively far away from the camera compared to the focal length and the size of the object. For faces, images taken at greater than arm's length demonstrate negligible perspective distortion.

Figure 2.3: Comparison of perspective and weak perspective projection. With perspective projection, the intrinsic camera parameters and distance to the object affect the location of the point on the image plane. With weak perspective projection, points along orthogonal lines to the image manifest in the same location regardless of depth.

2.2 Surface Reconstruction

Surface reconstruction is a challenging process that varies depending on the type of input (images, point cloud, normal field, etc.), the quality of the data (noise, outliers, etc.), the desired output (mesh, skeleton, volume, etc.), and the type of shape (man-made, organic, etc.). To the graphics community, surface reconstruction almost always refers to fitting a shape to a point cloud, potentially with additional side information like surface normals. The majority of reconstruction algorithms take a point cloud as their input, including methods with surface-smoothness priors (e.g., tangent planes [39], moving least squares [3], radial basis function [23], Poisson surface reconstruction [50]), visibility-based methods [25], data-driven methods [64], etc. More details on this particular topic are in the state-of-the-art report [12].

One of the most widely used methods is the Poisson surface reconstruction [50], due largely to its efficiency and reliability. This method estimates a volumetric normal field based on the point cloud, and constructs a 3D Poisson equation akin to the 2D Poisson equation resulting from photometric stereo. A surface is then obtained globally by solving the Poisson equation. This approach is robust to noise and allows a global solution, unlike many heuristic-based approaches. However, our proposed method will begin with a normal field and not a point cloud, which renders the 3D Poisson surface reconstruction not directly applicable. Thus, we resort to a template deformation approach, where we deform a template face mesh to match the observed surface normals while maintaining the global structure of the template 3D face. Our technique is based on the gradient domain methods called Poisson/Laplace mesh editing [78, 97], where the mean curvature normal fields are given, and the surface is deformed under additional landmark constraints. Our method differs from the existing variants of Laplace mesh editing in that we are only given the normal field and have to infer the mean curvature.

In computer vision, however, surface reconstruction usually refers to producing a 3D understanding of an object from 2D images. Some techniques will consider RGB-D input, which can be viewed as a point cloud with texture information. In the case where the RGB-D data is a video, the task of reconstruction is called dynamic fusion [61]. Starting with only 2D photographs is more challenging than starting with a point cloud. The goal in this scenario can better be described as estimating the most probable 3D shape that explains an image under a set of assumed materials, viewpoints, and lighting conditions. Generally, the problem is ill-posed without assumptions since there are multiple combinations of 3D objects, materials, viewpoints, and lighting that can produce identical images. However, under different sets of assumptions, highly detailed and accurate reconstructions are possible. Many different cues provide insight into the object shape, such as image focus, texture, shading, and stereo correspondence. These last two cues have demonstrated robustness for reconstruction and are generally the most common approaches.

2.2.0.1 Multi-view Stereo

Multi-view stereo takes a photo collection obtained under different viewpoints and tries to reconstruct a plausible 3D geometry that explains the images under a set of reasonable assumptions, foremost being object rigidity. An overview of the procedure is as follows. First, estimate the camera parameters, both the extrinsic (position and orientation) and the intrinsic (focal length and sensor scale). This is generally done using Structure-from-Motion [33] to compute the camera models from a photo collection of [...]. Typically, a pinhole camera model is used to fully describe how a 3D point on the object is mapped onto a 2D pixel location in the image. Second, identify common points between different images. With known camera models, any pair of points found in both cameras uniquely determines an object point in 3D space (Figure 2.4). Finally, the 3D locations of all common points form a point cloud which may then be solved using traditional graphics surface reconstruction methods.

Figure 2.4: Epipolar geometry, the basis for multi-view stereo reconstruction. If the left camera, $O_L$, observes a point $X_L$ and there exists another camera, $O_R$, with known relative orientation, the point must fall along the epipolar line in the right view. If the matching point is determined in the right view, $X_R$, then the position of that point is uniquely determined in 3D space, $X$.

2.2.0.2 Photometric Stereo

While multi-view stereo uses simultaneous images taken under different viewpoints, photometric stereo uses sequential images taken from the same viewpoint under different lighting conditions. When making reasonable assumptions about the material and light properties, the surface normals of the object may be determined. Intuitively, a part of an object oriented towards the light will appear brighter than a part oriented away from the light; therefore, we may infer the angle between the surface normal and the light source direction. Multiple diffe
rentlightingdirectionsarerequired14Figure2.5:Sampleimagesusedforphotometricstereo.Multipleimagesfromaedpositionarecapturedunderavarietyoflightingconditions.Thesurfacenormalscanbedeterminedduetothebrighterappearancewhenfacingthelightsource.toresolvetheambiguityaboutthetruesurfaceorientationandtoensureeverypartoftheobjectisorientedtowardsthelightinoneimage.Anormalisconstructedusingthepixelgridoftheimage,andtheshapeisdeterminedbyintegratingthenormalShape-from-Shadinginthismannerproducesadepthmaprepresentationoftheobjectsinceallimageswerecapturedfromthesameview.TheeasiestassumptionsareaweakperspectivecameraprojectionmodelwithaLambertianlightingmodel.Undertheseconditions,theseminalworkonphotometricstereo[91]demonstratestheabilitytoreconstructasurfacefrom3imageswithknownlightingconditions.Evencurrentmethodsstilluseknownlightingconditionsforcooperativesubjects[32].Lateritwasdiscoveredthatevenwithoutknowledgeofthelightsourcephotometricstereocantakeadvantageofthelowranknatureofthelightingassumption[35,98,56,7,8,92]..Sphericalharmonics,acompletesetoforthogonalfunctionsonthesurfaceofasphere(Figure2.6,wereproposedtohelpsolvephotometricstereo.Insteadofusingthebidirectionaldistribu-tionfunctiontomodellight,sphericalharmonicscouldbeused.ForasimpleLambertiansurface(Eq.2.1),the1stordersphericalharmonicsisadirectmatchwiththevalues1,nx,ny,andnz.ThePhongilluminationmodeldoesnotdirectlytranslate,but2ndordersphericalharmonicscanmodel15Figure2.6:Visualrepresentationof0th,1st,and2ndordersphericalharmonics.Redispositiveandblueisnegativewiththeintensityindicatingthemagnitudeofthefunction.99:2ofthelightingenergy[29].Viewingtheproblemwithsphericalharmonicsallowsforacreativesolutionthroughsingularvaluedecomposition(SVD).Takeamatrixwhereeachrowisanimageandeachcolumnisacorre-spondingpixelacrossallimages,thentherank-4SVDisarobustwayofsimultaneouslyestimatingthesurfacenormalsandlightingconditionwithaLambertianassumption.Arank-9decompositioncanhandlemorecomplicatedlightingsituations.AfterSVD,aninteg
rabilityconstraintorpriorknowledgeoftheobjectisrequiredtoresolveambiguityofthedecomposition.Suchapproachesrequireasufnumberofimagestoobtainanaccuratereconstruction,especiallyfornon-rigidobjectslikethefacewhereexpressionvariationcandisturbthelowrankassumption.Itisalsoimportanttonote,thatthesephotometricstereotechniquesreconstructfromacom-monviewpointandthereforeproducea2:5Dsurfaceinsteadofatrue3Dsurfacelikemulti-viewstereo.Thereareafewworkscombiningmulti-viewstereoandphotometricstereotoproducemoreaccurate3Dsurfaces[38,75]evenestimatingarbitrarynon-linearcameraresponsemaps.16Table2.1:OverviewofFaceReconstructionApproaches.InputApproachDetailConstr.PointcloudRangescanner0:03mmmaxSynchronizedimagesMulti-viewstereo[9]0:088mmmean,poreTime-multiplexedPhotometricstereo[32]wrinkleUnconstr.Singleimage3DMM[13]smoothCNN[48]smoothVideoOpticalwtracking[51]wrinkleRGB-DVideoDynamicfusion[61]wrinklePhotocollectionPhotometricstereo[53]wrinkleFigure2.7:Examplereconstructionsatdifferentlevelsofdetail.Fromlefttoright:pore,wrinkle,andsmooth.Notethatthesereconstructiontechniquesaregenericandworkforanyarbitraryobjectthatsat-thematerialandlightingassumptions.Inowmovetodescribingfacereconstructionmethodsthatmaytakepriorknowledgeaboutfacesinordertosimplifytheproblem.2.3FaceReconstructionMethodologiesAsmentionedbefore,thefacehasseenarecentgrowthofresearchforcreatingdetailed3Dmodels.Therearenumerousmethodsforfacereconstructionthatdependonthelevelofcontroloverthecapturingenvironmentandcanproducedifferentqualitiesofreconstructionssuitablefordifferent17tasks.Table2.1providesanoverviewofthemostcommonscenariosandapproaches.Figure2.7providesexamplesofthelevelsofdetaildifferentreconstructionapproachescanproduce.Intheremainderofthissection,wewillprovideabriefoverviewofthedifferentscenarios,theirmostcommonapproaches,andsomeoftheirapplications.2.3.1ConstrainedThemostaccuratescenarioforfacereconstructionisunderaconstrainedenvironment.Con-strainedmeansthereisacooperativeenvironmentforcapturingtheimagery.In
otherwords,thesubjectofinterestmaybeplacedinaknownposeandexpression,thelightingconditionsmaybefullyknown,orthecameraplacementandcalibrationmaybeknown.Inaconstrainedscenario,as-sumptionsmaybemadeaboutthereconstructionprocesswhichallowhighlydetailedandaccuratemodelstobereconstructed.2.3.1.1RangeScannerArangescannerisaspecialpieceofequipmentdesignedtocaptureapointclouddirectly.Thereareavarietyofapproachesincludingprojectingastructuredinfraredlightpatternontoasurfaceandviewingitsdeformation,orusingalasertoeithermeasuretheortriangulation.Regardlessoftheapproach,arangescannerdoestheprocessingwithinthehardwareandreturnseitherapointcloudordepthmapofthescene.Ifyouhavemoneytospend,the$75;000KonicaMinoltaVivid9icancapturedetailswith0:03mmaccuracyacrossabroadrangeofdepths.Thesystemissuitableformanufacturingprocessesandreverseengineeringparts.Ifthepartistoolarge,itevenincludessoftwaretoau-tomaticallystitchtogethermultiplescanstocreateasingleobject.Tocapturetheentiretyofanobject,itincludesarotatingplatform.Forfaceapplications,thesescansareconsideredasground18Figure2.8:TheKonicaMinoltaVivid9iisanon-contactrangescannercapableofcapturingmultipleviewsofanobjectandcreatinga3Dreconstructionaccurateto0:03mm.truthandareoftenusedforevaluationofreconstructionapproachesorcreationofprobabilisticfacemodels.Onedownsidetothislevelofaccuracyisthatthescantimetakes2secondsforasinglecapture.Foramorereasonablypriced$1;441,youmaypurchasetheIIIDScanPrimeSensedevice.PrimeSensepowerstheMicrosoftKinectentertainmentsystem.ThisparticularPrimeSensedevicecaptureswith0:5mmaccuracy,whichmaynotworkformanufacturing,butwillstillcapturewrinklesonhumanfaces.Whatitlacksinaccuracy,itmorethanmakesupinscantime,sincethisdevicecancapturein30framespersecond,makingittheapplicablefordynamicfusion.2.3.1.2Multi-ViewStereoHumanshavedepthperceptionbasedinpartbyhavingtwoeyesthatcantriangulateobjectsinspace.Multi-viewstereoimagingusesthissameprinciplebyplacingtwoormorecameraswithknownpositionsandintrinsiccameramatrices.Whenacommo
npointisinbothviews,the3Dpositionisuniquelydeterminedbasedontheintersectionoftheraysoriginatingateachcameraandpassingthroughthecommonpoint.Beeleretal.presentstate-of-the-artmethodsforreconstructingfacesbasedonaconsumersetupofcameras[9]orstereovideo[10].Theirwork19calibratesthecamerasbasedonacalibrationspherewithknowninordertodeterminethecameramatricesandrelativepositionsinspace.Oncecalibrated,framesarecapturedwithin0:1secondsofeachotherfromallcamerasandamulti-viewstereoapproachreconstructsfaceswithporeandwrinkleleveldetail.Workingonanimagepyramidfromlowtohighresolution,itsearchestheepipolarlinesbetweenpairsofcamerasbasedusingthenormalizedcrosscorrelationtoidentifycommonpointsbetweencameras.Thecommonpointsarefurtherbasedonphotometricandsurfaceconsistency.Fromthecommonpoints,a3Dpointcloudmaybeobtainedusingtheknowncameraparameters.Poissonsurfacereconstructionasurfacetothepointcloudwhichisthenfurthertoaddintheporeleveldetails.Thesereconstructionsareknowntobeextremelyaccurateandareoftenusedasgroundtruthsinplaceoflaserscanssincetheequipmentisrelativelycheap.Usinga3Dprintedmaskinsteadofaface[9]demonstratesanaverageerrorof0:088mmusingasetupof7DSLRcameras.Arealhumansubjectwillhavehighererrorsincethemaskhadconsistentalbedoandnospecularityorsub-surfacepresentinhumanskin.However,theycomeatacostofcomputationtimeasittakes20minutestoreconstructasingleimage,andthesubjectmustbestationaryinaspecializedroom.2.3.1.3PhotometricStereoTheprinciplesbehindgeneralphotometricstereoweredescribedindetailinSec.2.2.0.2.Oneofthechallengesofphotometricstereoisthetimedelaybetweenimagesnecessarytochangelighting.Forrigidobjects,thisisnotanissue,butfornon-rigidfaces,itisdesirabletosimultaneouslycap-tureallimages.Inordertocapturedifferentlightingconditionssimultaneously,[37,85]usethreedifferentcoloredlights(red,green,andblue)toilluminatethescenefromdifferentviewpoints.Sincethisistheminimumnumberofilluminationvariationstodeterminethesurfacenormal,20extracaremustbetakenforregionsofshadowsunlikeapproachesu
singfourlightsources[6].Anotherapproachdesignedforhumansis[32]wherenearinfraredlightisusedforilluminationsinceitisnotasobtrusiveasvisiblelightandwillnotdisturbthesubjectduringillumination.Forthebestresults,combiningmulti-viewstereoandphotometricstereocanachieveextremelydetailedresults.2.3.2UnconstrainedWhenthesubjectisnotcooperativeandnopriorknowledgeaboutthelightingorcamerasisknown,facereconstructionbecomesunconstrained.Underthesescenarios,priorknowledgeaboutfaceshapesbecomeshighlyimportanttocreateacompellingreconstruction.2.3.2.13DMorphableModelThisseminalworkonunconstrainedfacereconstructionworksonasingleimageandwasproposedbyBlanzandVetter[14].Theydesigneda3DMorphableModel(3DMM)whichextendedclassicactiveappearancemodelsfrom2Dfacealignmenttoa3Dmodelontoanimagebasedonthetextureinformation.The3DMMiscreatedbytakingthegroundtruthscansof200individualsandputtingthemintocorrespondence.Theassumptionisthatanarbitraryfacecanbeexpressedasalinearcombinationofthescannedfaces.Smod=200åi=1aiSi;Tmod=200åi=1biTi;200åi=1ai=200åi=1bi=1:(2.4)21ThefacespacecanthenbecompressedandexpressedasastatisticaldistributionbyusingPrincipalComponentAnalysis(PCA)[47],Smod=¯S+199åi=1aiSi;Tmod=¯T+199åi=1biTi;(2.5)wheretheprobabilityforcoef~abecomes,p(~a)˘exp[12199åi=1aisi2];(2.6)wheres2iistheeigenvalueoftheshapecovariance.Giventhemodelaspriorknowledgeaboutexpectedfaceshapeandtexture,thegoalonceagainistoinferpose,lighting,shape,andtextureparametersofthescenebasedonasingleimage.TheinitialformulationisananalysisbysynthesisapproachwheretheerroristhedistancebetweentherealimageandthemodelsrenderedimagealongwiththelikelihoodofobservingthecoefThisformulationisnon-convexandtendstotolocalminimums,sotheyhavemanualannotationofthelightingandinitialposeparameterstohelpthemodeltotheface.Toattempttothenon-convexenergyfunction,[71]extendedtheproceduretoincludeedges,specularhighlights,andtextureconstraints.Morphablemodelshavefurtherbeendecom-posedintoidentityandexpressionbasisinsteadofasinglecombinedshapebasis[84,22]
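The linear face space of Eqs. 2.5 and 2.6 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of [14]: the function names (`build_pca_model`, `synthesize`, `neg_log_prior`) are hypothetical, the "scans" are random vectors standing in for registered face scans, and only the shape model is shown (texture would be handled the same way).

```python
import numpy as np


def build_pca_model(scans):
    """Build a PCA shape model from m registered scans.

    scans: (m, d) array, one flattened scan (x1, y1, z1, ...) per row.
    Returns the mean shape (d,), a basis with orthonormal rows, and the
    eigenvalues sigma2 of the sample covariance along each basis row.
    """
    m = scans.shape[0]
    mean = scans.mean(axis=0)
    centered = scans - mean
    # Rows of vt are the principal directions of the centered data.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Centering removes one degree of freedom, so keep m - 1 components
    # (199 components for 200 scans, as in Eq. 2.5).
    basis = vt[: m - 1]
    sigma2 = s[: m - 1] ** 2 / (m - 1)
    return mean, basis, sigma2


def synthesize(mean, basis, alpha):
    """Eq. 2.5: mean shape plus a linear combination of basis shapes."""
    return mean + alpha @ basis


def neg_log_prior(alpha, sigma2):
    """Negative logarithm of Eq. 2.6, up to an additive constant."""
    return 0.5 * np.sum(alpha ** 2 / sigma2)


# Hypothetical data: 6 "scans" of a mesh with 4 vertices (12 coordinates).
rng = np.random.default_rng(0)
scans = rng.normal(size=(6, 12))
mean, basis, sigma2 = build_pca_model(scans)

# Project a training scan onto the basis and synthesize it back.
alpha = (scans[0] - mean) @ basis.T
recon = synthesize(mean, basis, alpha)
```

In an analysis by synthesis fit, a term like `neg_log_prior(alpha, sigma2)` would be added to the image error so that unlikely coefficient vectors are penalized.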
As time progressed, 2D landmark alignment, the process of finding key points such as the eyes, nose, and mouth in images, improved. Recent works fit morphable models based on the projection error of the 2D landmark alignment, which is very efficient and produces reasonable results [95, 2].

The 3DMM is ubiquitous in face reconstruction approaches and is a major component of a large percentage of different approaches. One major complaint with the 3DMM is the small number of training subjects. A recent paper [15] builds a new model from 10,000 people that, when released, should help approaches generalize better to a wide range of face shapes.

2.3.2.2 Single Image

Besides the 3DMM, there are a number of other techniques for performing reconstruction from a single image. [52] takes a single reference model, manually aligns it to the image, and morphs it based on a shape from shading approach. The solution is a 2.5D recovery of the frontal part of the face. A pair of recent works perform reconstruction using a CNN to fit a 3DMM instead of the model-based fitting [100, 48]. These works focus on the problem of dense face alignment, where every pixel on the face is assigned a location on a canonical face, but in the process they fit a cascade of regressors to solve for the projection parameters and 3DMM shape coefficients. In doing so, they actually produce a reconstruction for the image as well.

2.3.2.3 Video-based Reconstruction

There is a lot of interest in video-based reconstruction. We highlight a few of the key works. In [30], sparse 2D landmarks in the video are tracked and a 3DMM is fit to each frame; the model is then refined using optical flow and temporal consistency. This approach works reasonably well on in-the-wild videos, but struggles with specular reflection. In [81], instead of a 3DMM, a surface reconstructed from a photo collection, similar to our proposed method, is used as input. The pose for each frame is estimated and, similar to [30], optical flow refines the alignment of the model and deforms it to the expressions present, while temporal consistency is maintained both forwards and backwards in the video. In [22], a 3DMM is separated into identity and expression bases and then fit to a still image; finally, a Kinect sensor is used to fit the expression coefficients and transfer the expression of a user to puppeteer the 3DMM. In [46], a cascaded regressor is learned on a large set of face scans to perform 3D face alignment for a single image or a video. The resulting alignment has only 1024 vertices and does not capture wrinkle level details of the face. In [20], a low resolution mesh is tracked in a video similar to prior works, and the wrinkle details are estimated based on regressors from local patches. This allows visually appealing wrinkle level reconstructions in real time, but the details are not metrically correct for the observed face. In [44], an individual uses a cell phone to record their face in a neutral expression from all poses, and structure from motion fits a static face model from the video. Then the individual records themselves making expressions in order to build a blendshape model. This interactive approach allows them to create wrinkle level avatars which may be controlled by other videos. In [82], a smooth model is fit to two videos in real time and the expressions from one are transferred to the other for real time puppeteering.

One major application for video reconstruction is puppeteering. Avatar puppeteering is the process where a person controls the actions of a virtual person, called an avatar, through their own actions, usually as recorded by a video camera, but also through other means such as strain sensors pressed on a face. The popular movie Being John Malkovich is an example of a puppeteer. This application requires a detailed model of the avatar, which can either be completely computer generated by an artist, or reconstructed based on a real person. The model needs to be rigged in order to deform to match the desired expressions. To control the model from a video camera, face tracking fits the avatar model to most closely match the pose and expression of the puppeteer in the video at each frame while also incorporating temporal consistency. This face tracking shares some similarities with face reconstruction, especially when the avatar is a model of the puppeteer. These systems are most interested in creating a visually compelling avatar and in the efficiency of the algorithm to run in real time; the metric accuracy of the reconstruction is secondary to the visual cohesiveness.

2.3.2.4 Photo Collections

Photo collection methods were discussed in detail in Section 1.1. Photo collections are more challenging than videos since no temporal constraints are available. In theory, they possess more information than a single image, but a naive use of a photo collection can actually produce a smoothed reconstruction since the face is non-rigid and may change between images. Photo collections are highly relevant since most people have a personal collection of photos. For biometrics purposes, a recent face database [54] introduces collection-based matching procedures due to the increasingly common occurrence of capturing multiple images of a suspect in forensic applications. An accurate 3D model of the face can improve face recognition.

2.4 Applications

Just as there are many different methods for reconstruction, there are numerous applications. While the scope of this work is mainly academic in nature, in terms of how accurate a reconstruction is possible from a photo collection, I do want to mention some of the potential applications of face reconstruction.

2.4.0.0.1 Medical
Person-specific 3D face models are used in the medical field. They are used to visualize the patient's exact structure and design form-fitting pieces. For example, the University of Michigan aligns 3D scans over time in order to track topographical changes that may be caused by different diseases. Models are used in surgical correction, orthopedic correction, and correcting craniofacial anomalies like cleft palates. For medical applications, precise 3D structure is important, and range scanners or constrained stereo setups are required to produce sub-millimeter level accuracy.

2.4.0.0.2 Face recognition
Face recognition is known to be directly applicable from 3D to 3D. But recently, works demonstrate the effectiveness of using a 3D model to improve traditional 2D to 2D face recognition. Models are used to normalize pose, expression, and lighting in images in order to render an improved image that will have higher face recognition accuracy. It is unclear what level of detail is required in the model for optimal face recognition performance. Even using a hyper-ellipsoid as a face model improves face recognition [59]. Researchers have shown that using a generic 3D face can improve face recognition [34]. One initial objective of the 3DMM was to improve face recognition, and it has been shown to be effective for normalization as well [101]. However, there is need for a comparative evaluation determining the relative effectiveness of better person-specific models.

2.4.0.0.3 Commercial video editing
Person-specific face models may be used to modify videos. They may be used to post-process a video by adding virtual makeup and changing lighting, or they can be used to create a completely CGI scene for a movie like Avatar. For video editing, a blendshape model is desired, where the face may be deformed to match a different set of expressions. The blendshape may be a generic set of expressions obtained from a 3DMM, or a set of expression captures of the true actor. For high [...] commercial films, pore level accuracy and person-specific expressions are desired. Since the output will be high resolution and displayed on theater screens, consumers expect a high quality and accurate rendering.

2.4.0.0.4 Virtual communication
This can be viewed as a low-detail version of video editing. By reconstructing or tracking a face in real time, an [...] model may be rendered on a different screen to provide communication. Think video chat, but instead of sending a video, you send the parameters of the face reconstruction. This provides an immediate benefit of compression, but also allows for entertaining changes to the face. Snapchat provides filters to change the appearance of your own face, Face2Face [82] shows how to control a different person's face, or you could control a virtual avatar. For these applications, realism is more important than accuracy, as oftentimes the face will be made a caricature anyway.

2.5 Organization

The remainder of the thesis is outlined as follows. In Chapter 3, I present the extension of the SVD-based reconstruction into a true 3D surface. Using a novel normal field Laplace editing technique, the proposed approach deforms a triangulated mesh to match the observed surface normals. In Chapter 4 the work is extended to adapt to small photo collections. Using a 3DMM as initialization and an adaptive energy minimization formulation for photometric stereo allows reconstructions from even a few images. I also introduce the SSIM-based quality evaluation metric. In Chapter 5 I present a discussion on potential applications and suggestions for future development. Table 2.2 presents a list of the most common notations used throughout the thesis.

Table 2.2: Notations.

| Symbol       | Dim.   | Description               |
|--------------|--------|---------------------------|
| $I$          | matrix | image                     |
| $n$          | scalar | number of images          |
| $q$          | scalar | number of landmarks (68)  |
| $W$          | 2 × q  | 2D landmark matrix        |
| $X$          | 3 × p  | 3D shape model            |
| $p$          | scalar | number of mesh vertices   |
| $N$          | 4 × p  | surface normal matrix     |
| $L$          | 4 × n  | lighting matrix           |
| $D$          | n × p  | dependability matrix      |
| $L$          | p × p  | sparse Laplacian          |
| $s$          | scalar | scale                     |
| $R$          | 2 × 3  | projected rotation matrix |
| $t$          | 2 × 1  | translation vector        |
| $\vec{\rho}$ | 1 × p  | albedo                    |
| $F$          | n × p  | image correspondence      |
| $H_j$        | scalar | mean curvature            |

Chapter 3

Unconstrained 3D Face Reconstruction

3.1 Introduction

As shown in Fig. 1.1, given a collection of unconstrained face photos of one subject, we would like to reconstruct the 3D face surface model, despite the diverse variations in Pose, Illumination, and Expression (PIE). This is certainly a very challenging problem, as we do not have access to stereo imaging [83] or video [81, 30]. Kemelmacher-Shlizerman and Seitz developed an impressive photometric stereo-based method to produce high-quality face models from photo collections [53], where the recovery of a locally consistent shape was intelligently achieved by using a different subset of images. However, there are still two limitations in [53]. One is that mainly near-frontal images are selected to contribute to the reconstruction, while the consensus is that non-frontal, especially profile, images are highly useful for 3D reconstruction. The other is that, due to surface reconstruction on a 2D grid, a 2.5D height field, rather than a full 3D model, is produced.

Motivated by the state-of-the-art results of photometric stereo-based methods, as well as their amendable limitations, in this chapter I propose a novel approach to 3D face reconstruction where a number of crucial innovative components are designed and developed. Our approach is also motivated by the recent explosion of face alignment techniques [58, 93, 49, 70], where the precision of 2D landmark estimation has been substantially improved. Specifically, given a collection of unconstrained face images, we perform 2D landmark estimation [93] on each image. In order to prepare an enhanced 3D template for the photometric stereo, we deform a generic 3D face template such that the projections of its 3D landmarks are consistent with the estimated 2D landmarks on all images, and the surface normals are maintained. With the enhanced 3D template, 2D face images at all poses are back projected onto the 3D surface, where the collection of projections will form a data matrix spanning all vertices of the template. Since there are inevitably missing elements in the data matrix due to varying poses, matrix completion is employed, followed by the shape and lighting decomposition via SVD. With the estimated surface normals, we further deform the 3D shape such that the updated shape will have normals similar to the estimated ones, under the same landmark constraint and an additional boundary constraint.

Figure 3.1: Given an unconstrained photo collection of Tom Hanks containing images of unknown pose, expression, and lighting, face reconstruction seeks to create an accurate 3D model of his face. Two additional example reconstructions along with a single representative image are shown as well.

To illustrate the strength of our approach, we perform experiments on several large collections of celebrities, as well as one subject where the ground truth 3D model is collected. Both qualitative and quantitative experiments are conducted and compared with the state-of-the-art method. In summary, this chapter has made three contributions.

- A true 3D facial surface model is generated. During the iterative reconstruction, we perform the photometric stereo on the entire 3D surface, and the 3D coordinates of all vertices are updated toward the shape of an individual. As a benefit of a 3D surface model, our approach allows faces from all poses, including the profiles, to contribute to the reconstruction.
- Our surface reconstruction utilizes a combination of photometric stereo-based normals and landmark constraints, which leverages the power of emerging face alignment techniques. It also strikes a good balance of allowing facial details to deform according to the photometric stereo, while maintaining the consistency of the overall shape with 2D landmarks.
- In order to achieve the deformation of a template using normal estimates, we develop a novel Laplace mesh editing with surface normals as input, while prior mesh editing methods use the mean curvature normal as input.

3.2 Proposed Algorithm

Figure 3.2: Overview of our 3D face reconstruction. Given a photo collection, a generic template face mesh, and 2D landmark alignment, we propose an iterative process to warp the mesh based on the estimated 3D landmarks and the photometric stereo-based normals.

The proposed algorithm operates on a photo collection of $n$ images of an individual. No constraints are placed regarding poses or expressions for the images, but it is assumed that the collection contains a variety of, albeit unknown, lighting conditions. An initial generic face template mesh including labeled 3D landmark locations is also given. We assume weak perspective camera projection and Lambertian reflectance.

Figure 3.2 illustrates the major components and pipeline of our proposed algorithm. Throughout the algorithm description, we present multiple reconstructions where we omit the part being discussed to demonstrate the effects and needs of each different part of the reconstruction process.

3.2.1 2D Landmark Alignment

Proper 2D face alignment is vital in providing registration among images in the photo collection and registration with the 3D template, although the proposed approach is robust to a fair amount of error. We employ the state-of-the-art cascade of regressors approach [93] to automatically fit $q$ (= 68) landmarks onto each image. An example of the landmark fitting is given in Figure 3.2. Given an image $I(u, v)$, the landmark alignment returns a $2 \times q$ matrix $W_i$.

3.2.2 Landmark Driven 3D Warping

The initial template face is not nearly isometric to the individual face, e.g., the aspect ratio of the face may be different and, as such, it will not fit closely to the images even in the absence of expression. Therefore, it is highly desirable to warp the initial template toward the true 3D shape of the individual so that the subsequent photometric stereo can have a better initialization.

Figure 3.3 demonstrates the motivation for why landmark driven warping is important. Not only does the warping process give the correct aspect ratio, but it also enables a clearer reconstruction since better correspondence may be established throughout the collection.

Figure 3.3: Effects of landmark driven warping on face reconstruction. (a) Full reconstruction process from warped template for George Clooney, (b) reconstruction from initial template without warping, and (c) image of Clooney used in reconstruction.

Since the estimated 2D landmarks provide the correspondences of $q$ points between 3D and 2D as well as across images, they should be leveraged to guide the template warping. Based on this observation, we aim to warp the template in a way such that the projections of the warped 3D landmark locations can match well with the estimated 2D landmarks.

The technique we use is based on Laplacian surface editing [78] and adapted for the landmark constraints. Specifically, in order to maintain the shape of the original template face while reducing the matching error from the 3D landmarks to the 2D landmarks, we minimize the following energy function,

$$\int_{\Omega} \|\Delta x - \Delta x_0\|^2 + \lambda_l \sum_{i}^{n} \|R_i X C - W_i\|_F^2, \tag{3.1}$$

where the first term measures the deviation of the Laplace-Beltrami operator $\Delta$ (trace of Hessian) of the deformed mesh $x$ from that of the original mesh $x_0$, integrated over the entire surface $\Omega$; the second term measures the squared distance between the set of 3D landmarks annotated on the mesh, $XC$, weakly perspective projected through $R_i$, and the 2D landmark locations $W_i$ for image $i$; and $\lambda_l$ is the weight for landmark correspondence. $C$ is a $p \times q$ selection matrix where each column selects a single manually annotated landmark vertex from the mesh, i.e., it is a sparse matrix with a 1 in each column at the vertex index for the corresponding landmark. Note that the operator $\Delta$ measures the difference between a function's value at a vertex and the average value at the neighboring vertices, so the minimization of the first term helps maintain the geometric details.

To solve Eq 3.1, we discretize the surface patch $\Omega$ as a triangle mesh with $p$ vertices, with the vertex locations concatenated as a $3 \times p$-dimensional matrix $X$. Throughout the deformation process, we keep the connectivity of the vertices (i.e., which triplets form triangles) fixed and the same as the given template mesh. We deform the mesh only through modifications to the vertex locations. Eq 3.1 is thus turned into a quadratic function on $X$,

$$E_{warp}(X; R_i) = \|XL - X_0 L\|_F^2 + \lambda_l \sum_{i}^{n} \|R_i X C - W_i\|_F^2, \tag{3.2}$$

where $L$ is a discretization of $\Delta$. Using linear finite elements, it is turned into a symmetric matrix with entries $L_{ij} = \frac{1}{2}(\cot \alpha_{ij} + \cot \beta_{ij})$, where $\alpha_{ij}$ and $\beta_{ij}$ are the opposite angles of edge $ij$ in the two incident triangles (see Figure 3.4), known as the cotan formula [66]. In order to find the minimizer, we estimate an initial $R_i$ via the corresponding pairs of 2D and 3D landmarks. With the projection matrices $R_i$ fixed, the minimizer of the energy $E_{warp}$ can be obtained by solving a linear system.

However, the above procedure is not rotation invariant. As in [78, 97], we can resolve this issue by noting that

$$\Delta x = -H n,$$

which means that the Laplacian of the position is the mean curvature $H$ times the unit normal of the surface $n$. The rotation-invariant geometric details are captured by the Laplacian operator and the mean curvature scalar $H$.

Figure 3.4: The mean curvature normal indicates how a vertex deviates from the average location of its immediate neighbors, which can be evaluated as the Laplacian of the position. The mean curvature $H_i$ can be evaluated through $n$.

Thus, to keep the original geometric detail while allowing it to rotate, we compute the original mean curvature $H_0$ (the discretization of which corresponds to the integral of the mean curvature in a neighborhood around each vertex), update $n^k$ according to the direction of $X^k L$ for the shape $X^k$ at iteration $k$, and solve for

$$X^{k+1} = \arg\min_X \left( \|XL + N^k H_0\|_F^2 + \lambda_l \sum_{i}^{n} \|R_i^k X C - W_i\|_F^2 \right), \tag{3.3}$$

where $N$ is a $3 \times p$ matrix of all vertex surface normals and $H_0$ is a diagonal matrix with $H_0$ for each vertex. This leads to a linear system,

$$(L^2 + \lambda_l C C^\top) X = -N^k H_0 L + \lambda_l \sum_{i}^{n} (R_i^k)^\top W_i C^\top. \tag{3.4}$$

In practice, the procedure of iteratively estimating $R_i^k$ and $X^{k+1}$ converges quickly, in fewer than 10 iterations in our tests.

Figure 3.5: Example of template deformation. (a) Initial generic template, (b) template warped for Justin Trudeau, and (c) template warped for Keira Knightley.

Figure 3.5 shows the initial template as well as examples after warping has completed. Notice that we maintain the appearance of the initial template since we try to keep the same curvature. The only change is that the key points on the face have been moved to better align with the observed landmarks in the collection, which changes the aspect ratio, height of nose, etc.

3.2.3 Photometric Normals

Fitting the landmarks allows for a global deformation of the template mesh toward the shape of the individual, but the specific details of the individual are not present. To recover these details, we use photometric stereo with unknown lighting conditions, similar to the one described by Kemelmacher-Shlizerman and Seitz [53]. The approach in [53] estimates an initial lighting and shape based on the factorization of a 2D image set, and refines the estimate based on localized subsets of images that match closely to the estimate for a given pixel. One key difference is that in [53] the input to factorization is the frontal-projected 2D images of the 3D textures, rather than the collection of 3D texture maps themselves as in our algorithm. That is, our photometric stereo is performed on the entire 3D surface. We present the photometric normal estimation as follows.

We assume a Lambertian reflectance along with an ambient term for any point $x$ in an image,

$$I_x = \rho_x (k_a + k_d\, \ell \cdot n_x),$$

where $k_a$ is the ambient weight, $k_d$ is the diffuse weight, $\ell$ is the light source direction, $\rho_x$ is the point's albedo, and $n_x$ is the point's surface normal. We assemble the numbers into a row vector for the lighting, $l = [k_a, k_d \ell]$, and a column vector for the shape, $s_x = \rho_x [1, n_x]^\top$, so that $I_x = l s_x$.

3.2.3.1 Initial Normal Estimation

In this section, we assume point correspondence between the current mesh and each image in the collection. We create an $n \times p$ correspondence matrix $F$ by storing in $f_{ij}$ the intensity $I_x$ corresponding to the projected location $x$ of vertex $j$ in image $i$. This correspondence is established by projecting the warped shape template onto the images via the $R_i$ matrices from Section 3.2.2. For non-frontal images, there are vertices that are not visible due to the projection. When this occurs, we set $f_{ij} = 0$ and use matrix completion [57] to fill in the missing values to obtain $M$. Our experiments show that given an image set with diverse poses, missing data occurs at different areas of $F$ and is handled well by matrix completion.

If we assemble $l_i$ for image $i$ into an $n \times 4$ matrix $L$, and the shape vector $s_j$ for each vertex $j$ into a $4 \times p$ matrix $S$, we have $M = LS$. To obtain the lighting and normal estimation $L$ and $S$ from $M$, we use a typical photometric stereo technique, knowing that a Lambertian surface will be rank-4 ignoring self-shadows. We factorize $M$ via singular value decomposition (SVD) to obtain $M = U \Lambda V^\top$ and use the rank-4 approximation $M = \tilde{L}\tilde{S}$, where $\tilde{L} = U\sqrt{\Lambda}$ and $\tilde{S} = \sqrt{\Lambda} V^\top$. $\tilde{L}$ and $\tilde{S}$ are the same size as the desired lighting and shape matrices $L$ and $S$, but the factorization is not unique, as any invertible $4 \times 4$ matrix $A$ gives a valid factorization since $LS = (\tilde{L} A^{-1})(A \tilde{S})$. The ambiguity can be resolved up to a generalized bas-relief transform through integrability constraints, but [53] states that it may be unstable for images with expression variations. Thus, we follow the approach from [53], where we select images that are modeled well by the low rank approximation, i.e., $\|M - \tilde{L}\tilde{S}\|^2$ [...]

Algorithm 1:
    [...] > threshold do
        estimate projection $R_i$ for each image
        solve Eq 3.4
        while 3D face vector $X$ change > threshold do
            // estimate $n$ and $\rho$ (Sec. 3.2.3)
            re-estimate $R_i^k$
            establish correspondence $F$
            perform matrix completion on $F$ to obtain $M$
            estimate lighting, $L$, and shape, $S$, by SVD
            estimate albedo, $\rho$
            resolve ambiguity by estimating $A$
            local normal estimate via Eq 3.5
            // deform $X$ with $n$ (Sec. 3.2.4)
            update $n$ via Eq 3.11
            estimate mean curvature via Eq 4.18
            solve Eq 3.10

[...] where $I$ is the identity matrix. The procedure fixes the non-shadowed region, blends the shadowed region through inpainting with weight $w_d$, and smooths out the shadowed region with a weight $w_s$.

Finally, Algorithm 1 summarizes the overall procedure in our algorithm.

3.3 Experiments

In this section we present our experiments. We first describe the pipeline to prepare a photo collection for face reconstruction. We then demonstrate qualitative results compared with 2.5D reconstruction on the Labeled Faces in the Wild (LFW) database [43], and on celebrities downloaded from Bing image search. Finally, we compare quantitatively on a personal photo collection where we have the ground truth model captured via a range scanner.

Figure 3.9: Visual comparison on Bing celebrities with images from [53] to the left of each of our viewpoints. Note how our method can incorporate the chin and more of the cheeks, as well as producing more realistic reconstructions, especially in the detailed eye region. (Panel labels: [53], Ours, Real, Albedo $\rho$.)

3.3.1 Data Preparation

3.3.1.0.1 Photo collection pipeline
For the celebrities, we use the Bing API to access up to the first 1,000 image results by searching on their first and last names. We remove duplicate images from the retrieval results. The images are then imported into Picasa, which performs face detection and groups similar images. After manually naming a few groups, further images are suggested by Picasa and automatically added to the collection. In the end, about half of the images remain for each person since many search results are not photographs of the person of interest or are duplicates. A landmark detector estimates 68 landmarks in each image around the eyes, eyebrows, nose, mouth, and chin line. For the initial shape template, we use the space-time faces neutral face model [99], which we subdivide to create more vertices and thereby a higher resolution.

Figure 3.10: Results on subjects from the LFW dataset. The reconstructed 3D model, sample image from which we extract the texture, and a novel rendered viewpoint.

3.3.1.0.2 Ground truth models
We use a Minolta Vivid 910 range scanner to construct ground truth depth maps for a personal photo collection. The scanner produces a 2.5D depth scan, so we capture three scans: one frontal, and two at ∼45 degree yaw. We align the scans via Iterative Closest Point and merge them to produce a ground truth model.

Table 3.1: Distances of the reconstruction to the ground truth. Me
thods2:5D2:5Improved3DMean7:86%7:79%5:42%RMS9:71%9:04%6:89%3.3.2Results3.3.2.0.1QualitativeevaluationWeprocessthesamecelebritiesasusedin[53],GeorgeClooney(476photos),TomHanks(416),KevinSpacey(316),andBillClinton(460),aswellasthefourindividualswiththemostimagesinLFW,GeorgeBush(528),ColinPowell(236),TonyBlair(144),andDonaldRumsfeld(121).TheresolutionofLFWis250250andwescaleallBingfaceregionsto500pixelsheight.Figure3.9comparestheresultsbetweenourapproachandthefrom[53].Figure3.10showsourreconstructionsontheLFWdataset.Weseethatourreconstructionprovidesmoreaccuratedetailsinareaswithhighmeancurvatures,e.g.,theeyesandmouth,aswellasallowingforreconstructionofthechinandcheekswhenthesurfacenormalpointsawayfromthefrontalpose.Furthermore,thefacialfeaturesinourresultsarelesscaricature-likethan[53],butclosertothetruegeometry.3.3.2.0.2QuantitativeevaluationWealsoimplementthe2:5Dapproachbywarpingouresti-matedphotometricnormalstoafrontalviewandintegratingthedepth.Sincethe2:5Dapproachfrom[53]isnotmetricallycorrectastheymention,wealsoperformanimproved2:5Dapproachwhereweuseourlandmarkdriven3Dwarpingasapreprocessingsteptoresolvetheaspectambiguities.Tocomparetheapproachesnumerically,wecomputetheshortestdistancefromeachvertexinthegroundtruthtotheclosestpointonthereconstructedsurfaceface.Meshesarealignedbytheirinternallandmarksaccordingtotheabsoluteorientationproblem[40].Wereportthemeaneuclideandistanceandtherootmeansquare(RMS)ofthedistance,afternormalizedbytheeye-46(a)(b)(c)Figure3.11:Distancefromthegroundtruthtothefacereconstructedvia(a)2:5D,(b)2:5Dim-proved,and(c)3Dreconstruction.Distanceincreasesfromgreentored.Bestviewedincolor.to-eyedistance,inTable3.1.Figure3.11showsacoloringofthetemplatetovisualizewhereonthefaceiscloseforreconstruction.Thebase2:5Dapproach(a)hasincorrectdepthinformationatthenosesincetheshapeambiguityisrecoveredfromtheerinitialtemplate,theimproved2:5Dapproach(b)betterapproximatesthedepth,butthebridgeofthenoseprotrudestoofar,andour3Dreconstructionbestmatcheswiththeground
truth across all the details of the face.

3.3.2.0.3 Usage of profile views
One advantage of the landmark-based deformation approach combined with photometric stereo is the ability to use the profile images more effectively. With only photometric stereo, the extreme poses obscure parts of the face and also increase the possibility of distortion when rendering in a frontal view. However, the profile views provide rich 3D landmark depth information. We run an experiment on the personal photo collection where we use 70 nearly frontal images with <10 degree yaw, and then add 40 images with side view information of >45 degree yaw. Figure 3.12 shows the improved depth of reconstruction and accurate mouth details from using additional side view images. Note that we manually labeled the ground truth for these images due to the difficulty of our 2D face alignment implementation when points are occluded, but there are detectors that work well even in these situations [19].

Figure 3.12: Comparison of (a) frontal only, (b) including side view for landmark warping, and (c) ground truth scan. The addition of side view improves the nose and mouth region (see arrows) while also allowing for reconstruction further back on the cheeks.

3.3.2.0.4 Additional Reconstructions
To further demonstrate the performance and generalization of the proposed approach, we present a larger set of individuals processed through the Bing and Picasa pipeline. Note that for these individuals, the photo collections contain images mainly within 30 degrees yaw due to the lack of profile images returned through the Bing API. I mainly leave the figures to speak for themselves, but comment on a few small points. Most reconstructions are convincing, and the identity is easily recognizable. Prominent wrinkles are captured for some people, including Harry Reid and Robin Williams. Vin Diesel's common use of sunglasses appears in the reconstruction. Jim Carrey's extreme expression variations cause the reconstruction to fail. Denzel Washington has high amounts of specularities that are not modeled by our lighting assumption, possibly contributing to his poor reconstruction.

3.4 Summary

We presented a method for 3D face reconstruction from an unconstrained photo collection. The entire pipeline of iterative reconstruction is coherently conducted on the 3D triangulated surface, including texture mapping
, surface normal estimation and surface reconstruction. This enables consuming faces with all possible poses in the reconstruction process. Also, by leveraging the recently developed image alignment technique, we use a combination of 2D landmark driven constraint and the photometric stereo-based normal field for surface reconstruction. Both qualitative and quantitative experiments show that our method is able to produce high-quality 3D face models. Finally, there are multiple directions to build on this novel development, including incorporating automatically detected 2D landmarks in the profile views, validating our approach on a diverse set of populations, and extending to non-face objects.

Figure 3.13: Visualization of many successful examples (Harry Reid, Rick Snyder, Joseph Gordon-Levitt).

Figure 3.14: Visualization of many successful examples (Robin Williams, Leonardo DiCaprio, Tom Cruise).

Figure 3.15: Visualization of many successful examples (Will Smith, Jinping Xi).

Figure 3.16: Visualization of difficult examples (Denzel Washington, Vin Diesel, Jim Carrey).

Chapter 4
Adaptive 3D Face Reconstruction from Unconstrained Photo Collections

4.1 Introduction

In this chapter we continue to improve unconstrained face reconstruction by making it more adaptive to the number of images in the collection as well as the quality of the images. The reconstruction approach presented in Chapter 3 still has limitations. Frontal images were required for [53], and even though the approach of the previous chapter can use non-frontal images, we will demonstrate that its performance drops with large pose variation. We propose a pose-based dependability measure to explicitly handle non-frontal images. Another limitation is the need for a sufficiently large photo collection. Theoretically, only four images are necessary for a photometric stereo-based approach, but in practice prior approaches report results on collections of over one hundred images for two primary reasons. One, their singular value decomposition solution to photometric stereo is susceptible to noise with small collections. Two, prior approaches perform a local selection step where only ~10% of images are used for each part of the face. We propose an energy minimization solution with an adaptive template regularization to reconstruct small collections, even down to a single image. We also propose to use a larger region for local selection to allow use of ~50% of images.

To perform face reconstruction, given a collection of unconstrained face images, we align 2D landmarks [93] to all detected faces. We then create a personalized face model by fitting a 3D Morphable Model (3DMM) jointly to the collection such that the estimated 2D landmarks align with the projection of the associated 3D landmarks on the model. Dense correspondence is established across the collection by estimating the pose for each image and back-projecting the image onto the personalized template. A global estimation of the albedo, lighting, and surface normals is performed using a dependability weighting based on the pose of each image. The surface normal estimate is improved locally using a novel structural similarity feedback to identify a collection of images which are similar for each part of the face. Reconstruction of the face model deforms the mesh to match the estimated surface normals. A coarse-to-fine process is employed to first capture the generic shape and then fill in the details. We perform extensive experimental evaluations to show qualitatively and quantitatively the performance of the proposed face reconstruction method.

Figure 4.1: The proposed system reconstructs a detailed 3D face model of the individual, adapting to the number and quality of photos provided.

In summary, this chapter makes the following main contributions.
- A 3D Morphable Model is jointly fit to 2D landmarks for model personalization. Prior work used either a fixed template or landmark-based deformation that does not work well for small collections.
- A joint Lambertian image rendering formulation with an adaptive template regularization solves the photometric stereo problem, allowing for graceful degradation to a small number of images.
- A pose-based dependability measure is proposed to weight the influence of more confident face parts.
- Structural similarity, a measure correlated with human perception, drives the local selection of images to use for estimating each surface normal.
- Structural similarity is used as a quality measurement for performance evaluation in the absence of ground truth.

4.2 Quality Measures

One crucial task for reconstruction is measuring the quality of the reconstruction. The quality is useful both for comparing different reconstruction methodologies and for providing feedback to a reconstruction technique. For example, if we know that one image contributes poorly to the reconstruction quality, it may be better to remove the image or weight it less during the reconstruction process. There are many different measures used in the literature for face reconstruction.

4.2.1 Image Distance Square Error

The image distance is the square error between the original image (real) and a rendering of the reconstruction (syn),

$$q_{\mathrm{image}} = \sum_{u,v} \| I_{\mathrm{real}}(u,v) - I_{\mathrm{syn}}(u,v) \|^2. \tag{4.1}$$

The image distance is widely used in many reconstruction approaches because it includes all of the model parameters: object shape, orientation, camera model, lighting, and albedo. It is also straightforward to compute and easy to understand. However, there are some major drawbacks to the image distance quality measure. Since the rendering is projected into the original image, it is impossible to penalize parts of the face not occluded by the synthetic image. For example, if the shape of the face is estimated too narrow, there will not be a synthetic rendering across the entire cheek of the real image, and there will be a distance of 0 for these regions even though there is clearly reconstruction error. At the other extreme, $q_{\mathrm{image}}$ may also be high even for very high quality reconstructions. For example, if the shape and projection of the faces are perfect but the albedo is wrong, the error will be high. Or if the projection is shifted, there will also be high error, since the pixel correspondence between the real and synthetic image no longer has shared semantic meaning. These problems are due to using a single pixel correspondence to compute the error, as well as many small errors contributing more than a single large error.

Despite the problems with $q_{\mathrm{image}}$, it is very relevant as a quality measure, especially during the fitting procedure. It provides a complete measure of all parameters in the fitting, and its derivative is easy to compute for minimization.

4.2.2 Mahalanobis Distance

The Mahalanobis distance measures the relative distance of a face from a statistical distribution of faces. By taking a training set of faces and using principal component analysis (PCA), the multivariate Gaussian probability density function of face shapes is estimated. This distance is given as,

$$q_{\mathrm{mahal}} = \sum_{i} \left( \frac{\alpha^{\mathrm{id}}_i}{\sigma^{\mathrm{id}}_i} \right)^2, \tag{4.2}$$

for the shape alone, where $\alpha^{\mathrm{id}}_i$ is the coefficient of the shape projected into the PCA space and $\sigma^{\mathrm{id}}_i$ is the standard deviation from the PCA. The Mahalanobis distance is often used as a regularizer during fitting since it relates to the faceness of the shape, or how probable it is that the shape is a face. When it is used as a regularizer, the distances for the other parameters, such as albedo, projection, and lighting, are often added for a 3D morphable model. However, $q_{\mathrm{mahal}}$ is not good for comparing between reconstruction techniques, since it does not take into consideration the ground truth shape of the individual and only measures the distance to the distribution of a training set.

4.2.3 Mean Euclidean Distance

The mean Euclidean distance is a good measure when a ground truth shape is present. It measures the average distance of all points on one shape to another shape. This is usually discretized to only consider the vertices of the shapes. Given a template shape $Y$ and reconstructed shape $X$, when the shapes are given in vertex correspondence it is,

$$q_{\mathrm{eucl}} = \frac{1}{p} \sum_{i=1}^{p} \| x_i - y_i \|_2. \tag{4.3}$$

However, correspondence is typically not known between the two shapes, in which case the distance at each vertex is typically measured as the minimum distance to the other shape. Note that $q_{\mathrm{eucl}}$ has a few major drawbacks. One, it is highly sensitive to rigid transformations of the shapes. Any slight transformation of one shape (e.g., a global scale) causes a significant effect on $q_{\mathrm{eucl}}$. Therefore, it is important to first perform rigid alignment of the faces through iterative closest point alignment. Two, the distribution of point differences is likely to have a right tail, and a single value for $q_{\mathrm{eucl}}$ may not be informative of detail recovery.

4.2.4 Hausdorff Distance

The Hausdorff distance is similar to the $L_\infty$ distance and is defined as,

$$q_{\mathrm{haus}} = \max \left\{ \sup_{x \in X} \inf_{y \in Y} d(x,y),\ \sup_{y \in Y} \inf_{x \in X} d(x,y) \right\}, \tag{4.4}$$

where $\sup$ is the supremum and $\inf$ is the infimum. In other words, it measures the maximum distance between any point on surface $X$ and the closest point on $Y$, and vice versa. When dealing with surfaces extremely close to each other, $q_{\mathrm{haus}}$ is a good measure to differentiate between different reconstructions. For example, in computer graphics, it is used to measure the level of detail when downsampling objects far away in the scene. However, it is a poor quality metric for current face reconstruction methods, since they are far away from the ground truth and only one point out of place will increase the measure.

4.2.5 Surface Normal Distance

One problem with the Euclidean distance measure is that points on the surfaces may be close to each other but still be dissimilar, particularly when there is no direct correspondence between the surfaces and the closest point must be used. For example, a small wrinkle in one surface will have a very small Euclidean distance to the other surface, but is clearly a larger error. If we look at the surface normal difference between the wrinkle and the smooth surface, we will see a much larger difference. In this regard, a surface normal difference can measure the details of a reconstruction, at the expense of ignoring the true surface to surface distance. In [67] the normal metric is defined as,

$$q_{\mathrm{normal}} = \frac{1}{p} \sum_{i=1}^{p} \arccos(n_i \cdot n'_i), \tag{4.5}$$

where $n_i$ is the normal of the reconstructed face and $n'_i$ is the normal of the average face. Their definition is used as a regularizer for the face shape. It is observed that when the Mahalanobis distance is used as a regularizer, either (1) the weight is low and non-plausible faces are allowed, or (2) the weight is too large and smooth faces are produced even when a better result could be obtained for some images. Therefore, Mahalanobis has a tradeoff between quality and robustness. In [67], they demonstrate the effectiveness of using the surface normal measure to regularize the reconstruction.

One issue with $q_{\mathrm{normal}}$ is the variance of different parts of the face. For example, the nose and lips may have large differences in the surface normal while still maintaining a reasonable face, compared to the cheeks, which should remain smooth. To compensate, they introduce a weight for each vertex as,

$$\hat{w}_i = 1 - \frac{\bar{\phi}_i - \bar{\phi}_{\min}}{\bar{\phi}_{\max} - \bar{\phi}_{\min}}, \tag{4.6}$$

where $\bar{\phi}_i$ is the average normal deviation in the training set for their morphable model.

The normal distance could also be used to evaluate reconstruction without vertex correspondence by selecting the closest vertex on the ground truth surface to compare the normal difference.

4.2.5.0.1 Summary
There are clear advantages and disadvantages to the different quality measures, as they can distinguish between different types of failures in the reconstruction. $q_{\mathrm{eucl}}$ and $q_{\mathrm{haus}}$ can only be used with a ground truth face scan. $q_{\mathrm{normal}}$ and $q_{\mathrm{mahal}}$ are only appropriate for regularization because they are person independent. That leaves only $q_{\mathrm{image}}$ as a relevant measure for photo collections without a ground truth scan. However, $q_{\mathrm{image}}$ is very sensitive to numerous factors and may not be informative of shape.

Based on the limitations of current quality measures for evaluation of photo collections, we propose a new quality measure based on structural similarity (SSIM) that encompasses all parameters of the reconstruction like $q_{\mathrm{image}}$, but considers a larger area than a single pixel for comparison, so it is more robust to small changes in alignment.

4.3 Algorithm

We now present the details of the proposed approach, describing the motivational differences from prior works. Figure 4.2 provides an overview of the different steps of face reconstruction. The algorithm assumes the existence of a photo collection with automatically annotated landmarks and a 3DMM. Notations used throughout this chapter are provided in Table 2.2. The main algorithm is composed of three steps.

Step 1: Fit the 3DMM template to produce a coarse template mesh.
Step 2: Estimate the surface normals of the individual using a photometric stereo (PS)-based approach.
Step 3: Reconstruct a detailed surface matching the estimated normals.

Figure 4.2: Overview of face reconstruction. Given a photo collection, we apply landmark alignment and use a 3DMM to create a personalized template. Then a coarse-to-fine process alternates between normal estimation and surface reconstruction.

4.3.1 Inputs and Preprocessing

4.3.1.1 Photo collection

A photo collection is a set of $n$ images containing the face of an individual and may be obtained in a variety of ways, e.g., a Google image search for a celebrity or a personal photo collection. We assume that the only face in each image belongs to the person of interest. To normalize the images, we first automatically detect the face using the built-in face detection model from Bob [4], which was trained on various face datasets, such as CMU-PIE, that include profile view faces. The face detector is a cascade of Modified Census Transform (MCT) local binary patterns classifiers. We filter out faces with a quality score <25 to remove extremely poor quality faces or images without a face. Given the face bounding box from the detector, we scale the image to 110 pixels inter-eye distance and crop it to a total size of $450 \times 450$ to ensure the entire face region is present in the image.

A Lambertian lighting assumption uses a linear encoding of the intensity of the lighted object. However, humans can distinguish differences in low intensity better than in high intensity, so most cameras use a non-linear gamma encoding of images in order to provide a subjectively equal step in brightness for humans. For this work, we apply a single industry standard gamma correction to convert each image into the linear intensity scale.

Figure 4.3: The landmark marching process. (a) internal (green) landmarks and external (red) defined paths; (b) estimated face and pose; (c) face with roll rotation removed; (d) landmarks without marching; and (e) landmarks after marching corresponding to 2D image alignment.

4.3.1.2 Landmarks

Landmarks are the locations of common key features such as the eyes, nose, or mouth on a face. In recent years, the automatic detection of landmarks [26, 27, 48] has seen rapid improvement due to large labeled datasets such as LFPW [11] and 300-W [74]. To estimate 2D landmarks, we employ the state-of-the-art cascade of regressors approach [93] to automatically fit $q = 68$ landmarks, denoted as $W \in \mathbb{R}^{2 \times q}$, onto each image. Figure 4.3 shows the 68 landmarks used in this work. The landmarks can be separated into two groups. One, the internal landmarks on the eyebrows, eyes, nose, and mouth. These correspond to physical parts of the face and are consistent on all faces regardless of pose. Two, the external landmarks for the cheek/jaw along the silhouette of the face. These landmarks do not have a single correspondence to a point on the 3D face. As the face turns to non-frontal views, face alignment algorithms typically detect external landmarks on the facial silhouette. As a result, the external landmarks of two different poses correspond to different 3D model vertices.

4.3.1.2.1 Landmark marching
It is therefore desirable to estimate pose-specific vertices to maintain 3D-to-2D correspondences between the landmarks. In the literature, there have been a few proposed approaches [21, 69, 101]. In this work, we follow the landmark marching method proposed in [101]. Specifically, for the external landmarks a set of horizontal paths, each containing a set of vertex indices, are defined to match the contour of the face as it turns. Given a non-frontal face image along with an estimated pose, we rotate the 3D model using the estimated yaw and pitch while ignoring the roll, and determine the corresponding vertex along each predefined path based on the maximum (minimum) x-coordinate for the right (left) side of the face. A visualization of the process is presented in Fig. 4.3.

4.3.2 Step 1: Model Personalization

The face model plays a vital role in the reconstruction process. The current face model directly establishes correspondence between photos, provides an initialization for surface normal estimation, and provides regularization during surface reconstruction. Therefore, it is important to begin with a good personalized model of the face. We desire the model to match the overall metric structure of the individual to provide accurate correspondence when projected onto photos of different poses. However, the model need not contain facial details since those will be determined by the photometric normal estimation.

Prior work used either a single face mesh [53] or a Structure from Motion-based (SfM) deformation of a single face mesh [72]. These models have two main limitations. One, the model is a single ethnicity/gender and may not generalize its fit across a diverse set of subjects. Two, the SfM technique requires multiple images with sufficient pose variation and may not work for small collections. Therefore, we propose supplementing more prior information to help form a personalized template for a wide range of subjects with few images.

4.3.2.1 3D Morphable Model

In light of these limitations, we propose to use a 3DMM instead of a single template mesh. A 3DMM can approximate arbitrary face shapes and is one of the most successful models for describing the face. Represented as a statistical distribution of linear combinations of scanned face shapes, the 3DMM compactly represents wide variations due to identity and expression and is independent of lighting and pose. We use a 3DMM in the form,

$$X = \bar{X} + \sum_{k=1}^{199} X^{\mathrm{id}}_k \alpha^{\mathrm{id}}_k + \sum_{k=1}^{29} X^{\mathrm{exp}}_k \alpha^{\mathrm{exp}}_k, \tag{4.7}$$

where $X \in \mathbb{R}^{3 \times p}$ is the 3D face composed of the mean shape $\bar{X}$, a set of identity bases $X^{\mathrm{id}}$, and a set of expression bases $X^{\mathrm{exp}}$, with coefficients $\vec{\alpha}^{\mathrm{id}}$ and $\vec{\alpha}^{\mathrm{exp}}$. We use the 3DMM from [101], where the identity comes from the Basel Face Model [65] and the expression comes from FaceWarehouse [22]. The separation of the bases into expression and identity is based on the method from [24].

Fitting a 3DMM entails finding the model coefficients and projection parameters which best match a face in a given image. Typically, 3DMM fitting aims to minimize the difference between a rendered image and the observed photo [14], using manually annotated landmarks for pose initialization. As automatic face alignment has improved, Zhu et al. recently proposed an efficient fitting method based only on landmark projection errors [101]. To fit the 3DMM to a face image, they assume weak perspective projection $sRX + t$, where $s$ is the scale, $R$ is the first two rows of a rotation matrix, and $t$ is the translation on the image plane. Given the 2D alignment results $W$, the model parameters are estimated by minimizing the projection error of the 3DMM to the landmarks,

$$\arg\min_{s, R, t, \vec{\alpha}^{\mathrm{id}}, \vec{\alpha}^{\mathrm{exp}}} \| W - (sR[X]_{\mathrm{land}} + t) \|_F^2 + E_{\mathrm{reg}}, \tag{4.8}$$

where $[X]_{\mathrm{land}}$ selects the annotated landmarks from the entire model, $\|\cdot\|_F$ is the Frobenius norm, and $E_{\mathrm{reg}}$ is a regularizer (see Eq. 4.9) for the 3DMM coefficients. However, as discussed in Sec. 4.3.1.2, the pose must be known to march the external 3D landmarks along their paths to establish correspondence with $W$, but in the current formulation the pose is solved jointly with the 3DMM coefficients. We follow [101] and solve Eq. 4.8 in an alternating manner for the pose parameters and the 3DMM coefficients. Initializing with the mean face, $\vec{\alpha}^{\mathrm{id}} = \vec{\alpha}^{\mathrm{exp}} = \vec{0}$, we solve for the pose ($s$, $R$, and $t$) [18], then update the landmarks through marching, and solve for the shape ($\vec{\alpha}^{\mathrm{id}}$ and $\vec{\alpha}^{\mathrm{exp}}$). All steps are over-constrained linear least squares solutions. In this work we perform 4 total iterations since it converges quickly.

We extend this process to jointly fit $n$ faces of the same person by assuming a common set of identity coefficients $\vec{\alpha}^{\mathrm{id}}$ but a unique set of expression coefficients $\vec{\alpha}^{\mathrm{exp}}_i$ and pose parameters per image. The error function fully expressed is,

$$\arg\min_{s_i, R_i, t_i, \vec{\alpha}^{\mathrm{id}}, \vec{\alpha}^{\mathrm{exp}}_i} \sum_{i=1}^{n} \frac{1}{n} \left\| W_i - \left( s_i R_i \left[ \bar{X} + \sum_{k=1}^{199} X^{\mathrm{id}}_k \alpha^{\mathrm{id}}_k + \sum_{k=1}^{29} X^{\mathrm{exp}}_k \alpha^{\mathrm{exp}}_{ki} \right]_{\mathrm{land}_i} + t_i \right) \right\|_F^2 + \sum_{k=1}^{199} \left( \frac{\alpha^{\mathrm{id}}_k}{\sigma^{\mathrm{id}}_k} \right)^2 + \sum_{k=1}^{29} \left( \frac{1}{n} \sum_{i=1}^{n} \frac{\alpha^{\mathrm{exp}}_{ki}}{\sigma^{\mathrm{exp}}_k} \right)^2, \tag{4.9}$$

where $\sigma_k$ is the variance of the $k$th shape coefficient, typically used in Tikhonov regularization, and $[\cdot]_{\mathrm{land}_i}$ is used because different poses of face images have different selections of corresponding vertices. This function may be solved as before since it is linear with respect to each variable. Once the parameters are learned, we generate a personalized model $X^0$ using the identity coefficients and the mean of the expression coefficients.

4.3.2.1.1 Model projection
Correspondence between images in the collection is established based on the current template mesh $X^0$. Given $X^0$ and the projection parameters solved per image during model fitting, we sample the intensity of the projected location of vertex $j$ in image $i$ and place the intensity into a correspondence matrix $F \in \mathbb{R}^{n \times p}$. That is, $f_{ij} = I_i(u,v)$, where $I_i$ is the $i$th image and $\langle u, v \rangle^\top = s_i R_i x_j + t_i$ is the projected 2D image location of 3D vertex $j$.

At the conclusion of Step 1, we have a personalized model for the subject matching their overall shape, as well as projection parameters for each image. The model at this stage is a smooth reconstruction for two reasons. One, the 3DMM only captures low-frequency shape details. Two, the model fit is based on a limited set of sparse landmarks, so it requires a strong regularization, further creating a smooth result. Despite being smooth, the model allows for a set of dense correspondences to be established across the photo collection. These dense correspondences will be used to add in the details of the face.

4.3.3 Step 2: Photometric Normal Estimation

To add the wrinkle details to the personalized model, we use the dense correspondence along with a photometric stereo-based normal estimation. Intuitively, the differences in shading observed across the photo collection provide clues to the true surface normal, which may differ from the smooth version offered by the 3DMM estimate. Practically speaking, we will need to estimate the lighting conditions for each image and the surface albedo or reflectance of the face in order to estimate the surface normals.

4.3.3.1 Lighting Model

Computer graphics takes a modeled scene and renders a realistic synthetic image, whereas computer vision solves the inverse problem, i.e., inferring the model parameters from a real image. In either case, assumptions about how to model a scene must be made. The assumptions may be due to limited understanding of the real world environment, such as properties of surfaces, or they may be for computational efficiency or tractability. For example, we use a weak perspective camera projection model to tractably solve the pose and projection, and we use the 3DMM's prior knowledge of face shapes to personalize our initial shape model.

For lighting, we assume a Lambertian model, which allows accumulation of many far away light sources into a single vector, where the intensity at a projected point is defined by a linear combination of lighting parameters and the surface normal,

$$I(u,v) = \rho_j \left( k_a + k_d \left( l^x n^x_j + l^y n^y_j + l^z n^z_j \right) \right), \tag{4.10}$$

where $\rho_j$ is the surface albedo at vertex $j$, $(n^x_j, n^y_j, n^z_j)$ is the unit surface normal at vertex $j$, $k_a$ is the ambient coefficient, $k_d$ is the diffuse coefficient, and $(l^x, l^y, l^z)$ is the unit light source direction of the image. For simplicity, we combine the lighting coefficients and direction into a vector $l = \langle k_a, k_d l^x, k_d l^y, k_d l^z \rangle^\top$, and write $n_j = \langle 1, n^x_j, n^y_j, n^z_j \rangle^\top$ for the normal. Using the notation from the model projection, we see that $f_{ij} = I_i(u,v) = \rho_j l_i^\top n_j$.

This lighting model is also called the first-order spherical harmonics of the surface. Ref. [29] shows that theoretically 1st order spherical harmonics model a minimum of 87.5% of the lighting energy while a non-linear 2nd order will model 99.2%, but in practice they found that 1st and 2nd order model 94-98% and 99.5% respectively. Furthermore, [7] demonstrates that shape reconstruction accuracy using 1st order is 95-98% while 2nd order is 97-99%. So, while a more complex lighting assumption may potentially increase the accuracy by a single percentage point, it introduces non-linearity into the solution process. Therefore, we use the 1st order assumption in this work, but in the future, if we allow other nonlinearities in the model, a 2nd order assumption could be made.

Prior work jointly solved for the Lambertian formulation using singular value decomposition (SVD) by factoring $F$ into a light matrix $L^\top$ and a shape matrix $\tilde{N}$ which includes the albedo and surface normals [53, 72]. The SVD approach assumes the first four principal components of $F$ encode the lighting variation while suppressing differences in expression, facial appearance, and correspondence errors. These assumptions hold for large collections of nearly frontal images because SVD can accurately recover the ground truth in the presence of sparse amounts of error. However, we will show that small collections are susceptible to any correspondence errors from misalignment or expressions. Furthermore, for subjects with long hair that obscures the face and changes style within the collection, the hair will express as an albedo change and affect the principal components.

In light of the limitations of the SVD approach, we propose an energy minimization approach to jointly solve for albedo, lighting, and normals with,

$$\arg\min_{\rho_j, L, N} \sum_{j=1}^{p} \left( \sum_{i=1}^{n} \| f_{ij} - \rho_j l_i^\top n_j \|^2 + \lambda_n \| n_j - n^t_j \|^2 \right), \tag{4.11}$$

where $n^t_j$ is the current surface normal of the face mesh at vertex $j$. The template regularization helps keep the face close to the initialization. But, since the summation is not averaged and $\lambda_n$ is independent of collection size, the regularization has less overall weight as more photos are added to the collection, and the estimated normals may deviate further to match the observed photometric properties of the collection. In contrast, when the photo collection is small, the regularization term will play an important role in determining the estimated surface normal. Thus, this adaptive weighting handles a diverse range of photo collection sizes. However, the outliers which are mitigated by the SVD approach can have a larger impact with the square error minimization. Therefore, it is important to find a good method for determining which images to use for each part of the face.

Figure 4.4: Effect on albedo estimation with (a) and without (b) dependability. Skin should have a consistent albedo, but without dependability the cheek shows ghosting effects from misalignment.

4.3.3.2 Dependability

While we have claimed to put the photo collection into correspondence $F$, we certainly do not assume it to be perfect. We use a dependability measurement to weight the influence of different images for each vertex. What makes a part of the projected mesh on an image dependable? Clearly, the part must be visible for the given pose and not occluded by something in front of the face. Does the resolution of an image contribute to its dependability? If the face has a different expression, it may have different surface normals. Faces with inaccurate landmark alignment will be out of correspondence. Many different factors play a role in the dependability of a projected point within an image. We use $d_{ij} = \max(\cos(c_i^\top n_j), 0)$, where $c_i$ is a unit camera vector perpendicular to the image plane, as the measure of dependability to handle self-occlusion and sampling artifacts. Other problems such as expression and external occlusion we leave for local selection (Sec. 4.3.3.4).

What does this dependability measure accomplish? First, self-occluded parts of the face are given a weight of 0. Second, parts of the image more susceptible to pose estimation errors are given lower weights. As a vertex's normal approaches perpendicular to the camera, slight perturbations of the pose will cause a larger change in which $u, v$ are sampled in the image, whereas a vertex pointing towards the camera is more stable and should be more dependable. Fig. 4.4 shows the albedo estimation with and without dependability. We update Eq. 4.11 to,

$$\arg\min_{\rho_j, L, N} \sum_{j=1}^{p} \left( \sum_{i=1}^{n} \| d_{ij} ( f_{ij} - \rho_j l_i^\top n_j ) \|^2 + \lambda_n \| n_j - n^t_j \|^2 \right). \tag{4.12}$$

What is not modeled by this dependability choice? First, any external occlusion, such as sunglasses. Second, landmark alignment errors. Third, expression differences. We will address these issues with the localization step introduced in Sec. 4.3.3.4.

4.3.3.3 Global Estimation

Now that we have a good idea of how to approach the normal estimation, we discuss how to minimize the energy in Eq. 4.12. Note that Eq. 4.12 is not jointly convex, but when solved in an iterative approach, it has a closed form solution for each of $\vec{\rho}$, $L$, and $N$ independently. We begin by initializing $n_j$ to the template surface normal at vertex $j$ and $\rho_j$ to 1. We then alternate solving for the lighting coefficients, albedo, and the surface normals until convergence. Solving lighting is an over-constrained least squares problem with the solution,

$$l_i^\top = (f_i \circ d_i) / (\tilde{\rho} \circ N \circ d_i), \tag{4.13}$$

where $\circ$ is the Hadamard or entrywise product and $\tilde{\rho}$ is $\vec{\rho}$ repeated 4 times to become the same size as $N$. Similarly, albedo has a closed form solution,

$$\rho_j = (d_j^\top \circ L^\top n_j) / (d_j^\top \circ f_j). \tag{4.14}$$

Finally, the normals are solved via,

$$n_j = (B^\top B + \lambda_n I)^{-1} (B^\top (f_j \circ d_j) + \lambda_n n^t_j), \tag{4.15}$$

where $B = \tilde{\rho} \circ D \circ L$.

4.3.3.4 Local Selection

As mentioned in Sec. 4.3.3.2, the dependability measure only handles small landmark alignment error, but does not consider expression changes, occlusions, or other potential correspondence errors. To handle these other forms of error, we use a local selection process as proposed in [53] to refine the photometric estimates. The goal of local selection is to find a collection of images for each vertex that are in local agreement, and re-estimate the surface normal using only those images. This prevents smoothing across all expressions, and can filter the occlusions. The basic approach of local selection is to identify a subset of images $B_j$ for each vertex $j$ and then re-minimize the photometric equation for that vertex's normal:

$$\arg\min_{n_j} \sum_{i \in B_j} \| d_{ij} ( \rho_j l_i^\top n_j - f_{ij} ) \|^2 + \lambda_n \| n_j - n^t_j \|^2. \tag{4.16}$$

All of the prior work uses the same scheme of local selection [53, 72], which we term square error localization. The subset is chosen such that the square error of the observed value for the image matches the estimated value for the vertex, $B_j = \{\, i \mid \| \rho_j l_i^\top n_j - f_{ij} \|^2 < \epsilon \,\}$.

[...]

Figure 4.6: The mean curvature normal indicates how a vertex deviates from the average location of its immediate neighbors, which can be evaluated as the Laplacian of the position. The mean curvature $H_j$ can be evaluated through $n$.

4.3.4 Step 3: Surface Reconstruction

Given the localized surface normals $n_j$ that specify the details of the face, we desire to reconstruct a new face surface $X$ which matches the observed normals. The process is similar to Chp. 3, but we summarize it again here. We use a Laplacian-based surface editing technique motivated by [78]. The Laplace-Beltrami operator is the divergence of a gradient field. Using linear finite elements, it can be discretized into $L$, a symmetric matrix with entries $L_{jk} = \frac{1}{2}(\cot \alpha_{jk} + \cot \beta_{jk})$, where $\alpha_{jk}$ and $\beta_{jk}$ are the opposite angles of edge $jk$ in the two incident triangles (see Figure 4.6), known as the cotan formula [66]. Geometrically, $L$ measures the difference between a function's value at a vertex and the average value of the neighboring vertices. As in [78, 97], we note that $x_j L = -n_j H_j$, where $H_j$ is the integral of the mean curvature at vertex $j$. This means that we can use the estimated surface normals to update the positions of the mesh, assuming we can determine the mean curvature.

We estimate $H_j$ given a normal field using a discretization of $H = \nabla A \cdot n$, i.e., the mean curvature measures how fast the area changes when moving the surface along the normal direction. The variation of the area can be measured through the difference between $n_k$ and $n_j$ as follows,

$$H_j = \frac{1}{4 A_j} \sum_{k \in N(j)} (\cot \alpha_{jk} + \cot \beta_{jk})\, e_{jk} \cdot (n_k - n_j), \tag{4.18}$$

where $N(j)$ is the one-ring neighborhood of $j$, $A_j$ is the sum of triangle areas incident to $j$, and $e_{jk}$ is the edge from $j$ to $k$ (Figure 4.6). Note that the cotan weights are identical to those from the Laplace-Beltrami operator.

We put this together to perform surface reconstruction with an energy composed of three parts,

$$\arg\min_{X} E_n + \lambda_b E_b + \lambda_l E_l. \tag{4.19}$$

Here $E_n = \| XL + NH^k \|^2$ is the normal energy derived from the Laplacian discussion, where $H^k$ is a diagonal matrix of the vertex mean curvature integrals $H_j$ from the current face model. $E_b = \| X L_b - X^k L_b \|^2$ is the boundary energy, required since the mean curvature formula degenerates along the surface boundary into the geodesic curvature, which cannot be determined from the photometric normals. We therefore seek to maintain the same Laplacian along the boundary with $L_{b,jk} = 1/|e_{jk}|$, where $|e_{jk}|$ is the edge length connecting adjacent boundary vertices $j$ and $k$. Finally, $E_l = \sum_i \| s_i R_i [X]_{\mathrm{land}_i} + t_i - W_i \|_F^2$ uses the landmark projection error to provide a global constraint on the face, without which the integration of the normals can have numeric drift across the surface of the face. Unlike Chp. 3, we do not include a shadow region smoothing, since we use the template normal as a regularizer during normal estimation.

Algorithm 2: Adaptive 3D face reconstruction
Data: Photo collection
Result: 3D face mesh $X$
    // Step 1
1   estimate landmarks $W_i$ for each image
2   fit the 3DMM via Eq. 4.9 to generate template $X^0$
3   remesh to the coarse resolution
4   for resolution $\in$ {coarse, medium, fine} do
5       repeat
6           estimate projection $s_i, R_i, t_i$ for each image
7           establish correspondence $F$ via backprojection
            // Step 2
8           globally estimate $L$, $\vec{\rho}$, and $N$ via Eq. 4.12
9           local selection of images $B$ via Sec. 4.3.3.4
10          re-estimate surface normals $N$ via Eq. 4.16
            // Step 3
11          reconstruct surface $X^{k+1}$ via Eq. 4.19
12      until $\frac{1}{p}\|X^{k+1} - X^k\|_F^2$