IMAGE ANNOTATION AND TAG COMPLETION VIA KERNEL METRIC LEARNING AND NOISY MATRIX RECOVERY

By

Zheyun Feng

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Computer Science - Doctor of Philosophy

2016

ABSTRACT

In the last several years, with the ever-growing popularity of digital photography and social media, the number of images with user-provided tags has increased enormously. Due to the large amount and content versatility of these images, there is an urgent need to categorize, index, retrieve and browse these images via semantic tags (also called attributes or keywords). Following this trend, image annotation and tag completion from missing and noisy given tags over large-scale datasets have become extremely active topics in the interdisciplinary areas of machine learning and computer vision.

The overarching goal of this thesis is to reassess image annotation and tag completion algorithms so that they capture the essential relationships both between and within images and tags, even when the given tag information is incomplete or noisy, so as to achieve better performance in terms of both effectiveness and efficiency in image annotation and other tag-relevant tasks, including tag completion, tag ranking and tag refinement.

One of the key challenges in search-based image annotation models is to define an appropriate similarity measure (distance metric) between images, so as to assign unlabeled images tags that are shared among similar labeled training images. Many kernel metric learning (KML) algorithms have been developed to serve as such a nonlinear distance metric. However, most of them suffer from high computational cost since the learned kernel metric needs to be projected onto a positive semi-definite (PSD) cone. Besides, in image annotation tasks, existing KML algorithms require converting image annotation tags into binary constraints, which leads to a significant loss of semantic information and severely reduces annotation performance.
In this dissertation we propose a robust kernel metric learning (RKML) algorithm based on regression techniques that is able to directly utilize the image tags. RKML is computationally efficient since the PSD property is automatically ensured by the regression technique. Numeric constraints over tags are also applied to better exploit the tag information and hence improve annotation accuracy. Further, theoretical guarantees for RKML are provided, and its efficiency and effectiveness are also verified empirically by comparing it to state-of-the-art approaches in both distance metric learning and image annotation.

Since user-provided image tags are often incomplete and noisy, we also propose a tag completion algorithm by noisy matrix recovery (TCMR) to simultaneously enrich the missing tags and remove the noisy ones. TCMR assumes that the observed tags are independently sampled from unknown distributions that are represented by a tag matrix, and our goal is to recover that tag matrix based on the partially revealed tags, which could be noisy. We provide theoretical guarantees for TCMR with recovery error bounds. In addition, a graph Laplacian based component is introduced to enforce the recovered tags to be consistent with the visual contents of the images. Our empirical study with multiple benchmark datasets for image tagging shows that the proposed algorithm outperforms state-of-the-art approaches in terms of both effectiveness and efficiency when handling missing and noisy tags.

To my parents, Shiying and Xuexiang.

ACKNOWLEDGMENTS

First and foremost, I feel indebted to my advisor, Professor Rong Jin, for his guidance, encouragement, and inspiring supervision throughout the course of this research work. His patience, prudent attitude, extensive knowledge, and creative thinking have been a source of inspiration for me. He was available for advice or academic help whenever I needed it, and gently guided me toward deeper understanding, no matter how late or inconvenient the time was. When I was struggling with the decision to pause my career and leave Paris four years ago, I was not sure about it, but now I am happy and proud to say that I made a wise decision. It is extremely hard to express how grateful I am for his unwavering support over the last years and into the future.
I would like to take this opportunity to thank my thesis committee members, Professor Anil K. Jain, Professor Joyce Y. Chai, and Professor Selin Aviyente, who accommodated my timing constraints despite their full schedules, and provided me with precious feedback on the presentation of the results, in both written and oral form.

During my Ph.D. studies, I had the pleasure of collaborating with many researchers, from each and every one of whom I had things to learn, and the quality of my research was considerably enhanced by these interactions. I would like to thank Professor Anil K. Jain, who generously gave me so much general guidance on my research direction and paper polishing. I would like to thank Songhe Feng, Tianbao Yang and Fengjie Li for all the discussions we had and the fun moments we spent together doing research and planning for the future. I also spent two wonderful summers as an intern at Comcast Lab, DC with Dr. Jan Neumann. I learned a lot from his team and would like to express my gratitude for having me as a research intern. I also would like to thank Serhat Selçuk Bucak and Radha Chitta for some helpful discussions and suggestions.

Living in East Lansing without my good friends would not have been easy. I want to thank all my friends inside and outside the department. I wish I could name you all.

Last but definitely not least, I want to express my deepest gratitude to my beloved parents and dearest boyfriend. Their love and unwavering support have been crucial to my success, and a constant source of comfort and counsel. Special thanks to my parents for enduring my absence over the last four years.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS

Chapter 1  Introduction
1.1 Image Annotation
1.2 Image Tagging
1.3 Thesis Contributions
1.4 Thesis Overview
1.5 Notation
1.6 Bibliographic Notes for Previous Publications

Chapter 2  Background
2.1 Image Representation
  2.1.1 Color Feature
  2.1.2 Texture Feature
  2.1.3 Typical Features in Image Tagging
2.2 Image Annotation
  2.2.1 Generative Models
  2.2.2 Discriminative Models
  2.2.3 Search-based Models
  2.2.4 Neural Network based Models
2.3 Image Tagging
  2.3.1 Image Tagging with Topic Models
  2.3.2 Image Tag Completion
2.4 Image Annotation by Metric Learning
  2.4.1 Linear Distance Metric Learning
  2.4.2 Nonlinear Distance Metric Learning
  2.4.3 Online Metric Learning
  2.4.4 Local Metric Learning
  2.4.5 Other Metric Learning
2.5 Image Tagging by Matrix Completion
  2.5.1 Low Rank Matrix Recovery with Nuclear Norm Minimization
  2.5.2 Low Rank Recovery under Other Constraints and Sampling Distributions

Chapter 3  Image Annotation with Kernel Distance Metric Learning
3.1 Motivation and Setup
3.2 Related Work
3.3 Annotating Images by Regression based Kernel Metric Learning (RKML)
  3.3.1 Regression based Kernel Metric Learning
  3.3.2 Extension to Image Feature Dimension Reduction
3.4 Theoretical Guarantee of RKML
  3.4.1 Analysis of How the Low Rank Approximation Affects RKML
3.5 Proofs of Error Bounds
  3.5.1 Proof of Theorem 3.1
3.6 Implementation
  3.6.1 Computing Semantic Similarity s_{i,j}
  3.6.2 Efficiently Computing K_r by Random Projection
  3.6.3 Application of RKML to Image Annotation
3.7 Experiments
  3.7.1 Datasets and Experimental Setup
  3.7.2 Comparison with State-of-the-art Distance Metric Learning (DML) and Image Annotation Algorithms
    3.7.2.1 Comparison to Nonlinear DML Algorithms
    3.7.2.2 Comparison to Linear DML Algorithms
    3.7.2.3 Comparison with State-of-the-art Image Annotation Methods
    3.7.2.4 Comparison of Annotation Results on Exemplar Images
    3.7.2.5 Efficiency Evaluation
  3.7.3 Effects of Different Experimental and Parameter Setups
    3.7.3.1 Sensitivity to Parameters
    3.7.3.2 Advantages of Kernel Trick and Nyström Approximation
    3.7.3.3 Analysis on Binary Constraints and Their Various Generation Ways
    3.7.3.4 Comparison of the Design Choices of Semantic Similarity Measure
3.8 Summary

Chapter 4  Image Tag Matrix Completion by Noisy Matrix Recovery
4.1 Motivation and Setup
4.2 Related Work
4.3 Tag Completion by Noisy Matrix Recovery (TCMR)
  4.3.1 Noisy Matrix Recovery
  4.3.2 Incorporating Irrelevant Tags into Noisy Matrix Recovery
4.4 Theoretical Guarantee of TCMR
  4.4.1 Impact of the Low Rank Assumption on Recovery Error
4.5 Proofs of Error Bounds
  4.5.1 Proof of Theorem 4.1
  4.5.2 Proof of Theorem 4.2
  4.5.3 Proof of Theorem 4.3
  4.5.4 Proof of Theorem 4.6
4.6 Implementation
  4.6.1 Incorporating Visual Features
    4.6.1.1 Graph Laplacian Method
    4.6.1.2 Linear Reconstruction Approach
  4.6.2 Efficient Solution of the Proposed Algorithm
  4.6.3 Pseudo-code of TCMR
4.7 Experiments
  4.7.1 Datasets and Experimental Setup
  4.7.2 Comparison to State-of-the-art Tag Completion Methods
    4.7.2.1 Efficiency Evaluation
  4.7.3 Analysis of Algorithm Design
    4.7.3.1 Evaluation of Noisy Matrix Recovery without Visual Features
    4.7.3.2 Analysis of Scalability
    4.7.3.3 Evaluation on Various Types of Regularizers
    4.7.3.4 Evaluation on Various Loss Functions
    4.7.3.5 Efficient Extension of TCMR by Linear Reconstruction
  4.7.4 Effects of Missing and Noisy Tags
    4.7.4.1 Sensitivity to the Number of Observed Tags
    4.7.4.2 Sensitivity to Noise
  4.7.5 Effects on Other Tag-relevant Applications
4.8 Summary

Chapter 5  Summary and Conclusions
5.1 Contributions
  5.1.1 Image Annotation by Kernel Metric Learning
  5.1.2 Image Tag Completion by Noisy Matrix Recovery
5.2 Future Work
5.3 Conclusions

APPENDIX
BIBLIOGRAPHY
LIST OF TABLES

Table 3.1: Statistics for the datasets used in the experiments. The bottom two rows are given in the format mean/maximum.

Table 3.2: Examples of annotation results generated by 14 baselines and the proposed RKML. The annotated tags are ranked by estimated relevance score in descending order, and the correct ones are highlighted in blue bold font. Note that the ground truth annotations in the 2nd column do not always include all relevant tags (e.g., "people" for the 5th image), and sometimes contain polysemes (e.g., "palm" for the 4th and 5th images) and controversial tags (e.g., "front").

Table 3.3: Comparison of running time (s) for several different metric learning algorithms.

Table 3.4: Running time (s) for image annotation. SVM methods on Flickr 1M are not included due to their high computational costs.

Table 3.5: Comparison of various extensions of RKML in terms of AP@t on the IAPR TC12 dataset. RLML is the linear version of RKML, RKMLO is the original version without Nyström approximation, and RKMLH runs RKML using binary constraints.

Table 3.6: Comparison of the extensions of RKML in terms of AP@t on the ESP Game dataset.

Table 3.7: Comparison of the extensions of RKML in terms of AP@t on the Flickr 1M dataset. RKMLO is excluded since the dataset is too large to do the computation on the full kernel.

Table 3.8: Comparison of different methods of generating binary constraints, as applied in the baseline distance metric learning algorithm LMNN, for the top t annotated tags on the Flickr 1M dataset. Method 1 clusters the space of keywords, method 2 considers the class assignments as binary constraints, method 3 clusters the space of keywords using hierarchical clustering algorithms, method 4 clusters the space of keywords together with the visual features, and method 5 considers images sharing more than 4 keywords as similar and images sharing no keyword as dissimilar.

Table 3.9: Local weighting functions.

Table 3.10: Global weighting functions.

Table 3.11: Comparison of extensions of RKML with different design choices of
semantic similarity for the top t annotated tags on the IAPR TC12 dataset. The leftmost column lists the different weighting methods, where the name before "-" denotes the local weights shown in Table 3.9 and the name after "-" indicates the global weights shown in Table 3.10. "Cosine" represents the cosine similarity between the tag vectors of two images.

Table 4.1: Statistics for the refined datasets. Ω indicates the number of observed tags used when training the TCMR model, which holds throughout the experimental section unless otherwise specified.

Table 4.2: Running time (seconds) for tag completion baselines. All algorithms are run in Matlab on an AMD 4-core @2.7GHz machine with 64GB RAM.

Table 4.3: Comparison of tag completion performance between TCMR and its counterparts with different regularizers, evaluated by accuracy (%) and efficiency/running time (s).

Table 4.4: Comparison of tag completion accuracy (%) between TCMR and its counterparts with different loss functions. Standard deviation is omitted for simplicity. [1] to [5] represent the absolute, least square, hinge, logistic and maximum likelihood loss functions, respectively.

Table 4.5: Comparison of tag completion efficiency (running time in seconds) between TCMR and its counterparts with different loss functions.

Table 4.6: Performance comparison of TCMR and TCMR-lr, in terms of both accuracy (%) and running time (s).

Table 4.7: Performance comparison of TCMR and TCMR-lr when the observed tags are severely noisy.
AP@1 is used for evaluation.

Table 4.8: Examples of tag completion results generated by some baseline algorithms and the proposed TCMR. The observed tags in red italic font are noisy tags, and the others are randomly sampled from the ground truth tags. The completed tags are ranked by recovered score in descending order, and the correct ones are highlighted in blue bold font.

LIST OF FIGURES

Figure 1.1: Daily number of images uploaded to the Internet through selected apps [143].

Figure 1.2: An illustrative example comparing linear and nonlinear distance metric learning algorithms. (a), (b) and (c) show the original data distribution, the distribution adjusted by a learned linear distance metric, and the distribution adjusted by a learned kernel metric, respectively.

Figure 1.3: Exemplar illustration of tag completion and other image tagging tasks, including image annotation, tag recommendation and tag refinement. The upper box shows the initially given information (both visual and semantic), and the bottom box indicates the ultimate objective of all four tasks.

Figure 2.1: Annotating an image with ANN [220].

Figure 2.2: Illustration of how a topic model works [11], where each topic is highlighted by a specific color.

Figure 3.1: Illustration of how a kernel distance metric links images with appropriate tags. In the left box, images share the tags marked in the same color as the lines connecting them.

Figure 3.2: The proposed kernel metric learning scheme, i.e., RKML, for automatic image annotation.

Figure 3.3: Average precision for the top t annotated tags using nonlinear distance metrics.

Figure 3.4: Average recall for the top t annotated tags using nonlinear distance metrics.

Figure 3.5: Average F1 score for the top t annotated tags using nonlinear distance metrics.

Figure 3.6: Average precision for the top t annotated tags using linear distance metrics.
Figure 3.7: Average recall for the top t annotated tags using linear distance metrics.

Figure 3.8: Average F1 score for the top t annotated tags using linear distance metrics.

Figure 3.9: Annotation performance in terms of AP@t with different annotation models.

Figure 3.10: Average recall for the top t annotated tags using different annotation models.

Figure 3.11: Average F1 score for the top t annotated tags using different annotation models.

Figure 3.12: Average precision for the first tag predicted by RKML using different values of rank r. To make the overfitting effect clearer, we turn off the Nyström approximation for the IAPR TC12 and ESP Game datasets. The Flickr 1M dataset is not included due to its large size (n = 999,764). Overfitting only occurs when r approaches the total number of images, but it is infeasible to apply such a large r on the Flickr 1M dataset.

Figure 3.13: Average precision for the top t tags predicted by RKML using different values of m', the number of retained eigenvectors when estimating the semantic similarity.

Figure 3.14: Average precision for the top t tags predicted by RKML using different values of n_s, the number of sampled images used for the Nyström approximation. In (c), n_s could not be set too large due to the dataset size.

Figure 4.1: The scheme of the proposed noisy tag matrix recovery framework, i.e., TCMR, for image tag completion. The low rank matrix recovery component in the upper right box exploits the tag-tag correlation, and the graph Laplacian component in the bottom left takes into account the tag-content correlation.

Figure 4.2: Comparison of tag completion performance between TCMR and state-of-the-art baselines on the IAPR TC12 dataset.

Figure 4.3: Comparison of tag completion performance between TCMR and state-of-the-art baselines on other datasets, including MirFlickr, ESP Game and NUS-WIDE.

Figure 4.4: Comparison of different topic models and matrix completion algorithms without taking the visual features into account. The top row is evaluated by AP@N, the middle row by AR@N, and the bottom row by C@N.

Figure 4.5: Scalability analysis over the large-scale ImageNet dataset in terms of tagging precision. The metric AP@N is used for evaluation. The size of the evaluated subset N varies from 4K to 1M.

Figure 4.6: Scalability analysis over the large-scale ImageNet dataset in terms of implementation time (log10(seconds)). The size of the evaluated subset N varies from 4K to 1M.

Figure 4.7: Tag completion performance with a varied number of observed tags, evaluated by AP@3 (top row) and AP@5 (bottom row). IAPR TC12 is a clean and complete dataset while NUS-WIDE contains missing and noisy tags.

Figure 4.8: Comparison of tag completion performance (AP@3) using noisy observed tags.

Figure 4.9: Comparison between TCMR and baseline algorithms with a varied percentage of noisy tags on two other tag-relevant applications, tag ranking and tag refinement. The counterpart tag completion performance can be found in Figure 4.8(b).

LIST OF ALGORITHMS

Algorithm 1: Automatic Image Annotation with RKML
Algorithm 2: Image Tag Completion by Noisy Matrix Recovery

Chapter 1

Introduction

We are facing the problem of image explosion: massive numbers of images are provided through different sources including the Internet, camera networks, research laboratories, personal digital devices and many other photo management applications. The Internet has greatly promoted the ability to release and access all manner of multimedia information, especially images. For instance, KPCB analyst Mary Meeker's annual Internet Trends report states that internet-connected citizens share over 1.8 billion photos each day [143] through multi-platform services such as Snapchat¹, Instagram², Facebook³, WhatsApp⁴, etc., as shown in Figure 1.1. Such a proliferation of images poses an urgent challenge for large-
scale image categorization, indexing, retrieval and browsing.

In the image retrieval community, most methods can be categorized into two groups: content based image retrieval (CBIR) [176] and tag based image retrieval (TBIR) [135]. CBIR matches the query image and the gallery images based on their visual similarities, which can be computed from a group of visual features including color, texture, shape [75], LBP [152], HOG [37], SIFT [137], GIST [154], and the list goes on [204]. Despite elaborate system design and computational effort, the performance of CBIR is still limited by the notorious semantic gap between the low-level visual features that reflect the image contents and the high-level semantics behind images [67,201]. To considerably improve the performance of CBIR, substantial advances are still required in the technologies and designs involved, including feature extraction, feature selection, indexing, query rephrasing and completion [132].

¹ https://www.snapchat.com/
² https://www.instagram.com
³ https://www.facebook.com/
⁴ https://www.whatsapp.com/

Figure 1.1: Daily number of images uploaded to the Internet through selected apps [143].

To overcome the limitations of CBIR in terms of both retrieval effectiveness and efficiency, TBIR was proposed. Instead of visual features, which incur great computational cost, TBIR represents images with a set of tags (also called keywords, labels or attributes). The user issues a query as a sequence of semantic words, and the relevant images are then retrieved based on matches between the textual query and the image tags. Compared with CBIR, TBIR has two significant advantages. First, it allows users to better express their query needs with semantic words, which alleviates the semantic gap and improves retrieval accuracy. Second, TBIR formulates the image retrieval problem as a document retrieval problem, which allows the use of the inverted index technique [227] and greatly improves retrieval efficiency.

Besides tag based image retrieval, there are also many other tag dependent tasks that categorize [116], index and browse [168] images via semantic tags for similar reasons.
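The efficiency gain TBIR draws from the inverted index technique can be illustrated with a minimal sketch (illustrative only; the toy image ids and tags below are hypothetical, not drawn from any dataset used in this thesis):

```python
from collections import defaultdict

# A toy image collection: image id -> user-provided tags.
images = {
    "img1": {"beach", "palm", "sunset"},
    "img2": {"city", "sunset"},
    "img3": {"beach", "people"},
}

# Build the inverted index once: tag -> set of image ids (a posting list).
index = defaultdict(set)
for img_id, tags in images.items():
    for tag in tags:
        index[tag].add(img_id)

def tbir_query(query_tags):
    """Return images matching ALL query tags, touching only the posting
    lists of the queried tags rather than scanning every image."""
    postings = [index[t] for t in query_tags]
    return set.intersection(*postings) if postings else set()

print(sorted(tbir_query({"beach", "sunset"})))  # ['img1']
```

Because a query only walks the posting lists of its own tags, retrieval cost scales with the lengths of those lists rather than with the size of the whole collection, which is the essence of the efficiency advantage mentioned above.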
However, the good performance of these tasks relies substantially on an image set with sufficient high-quality tags.

Among the great number of available images, however, only a small portion are associated with appropriate tags. Generally images are annotated manually, either by professional annotators or simply by the photo takers and reviewers. Professionally annotated tags are elegant and reliable, but cost tremendous labeling effort and time. Typical such image datasets include the CCUB NA Birds 700 Dataset⁵ and the Microsoft COCO Dataset [125], which took groups of researchers years to collect, annotate and build. Apparently, this is prohibitively labor-costly given the proliferation of images [202]. Fortunately, in most cases the image tags are provided by the users who upload the images to social media (e.g., Flickr⁶) and by the reviewers of those images, or are directly crawled from the accompanying descriptions/titles. However, tags generated in this way are far from reliable, since they are usually general, ambiguous, biased, and sometimes even inappropriate, incomplete or redundant for many reasons [70,101]. All these factors can severely hamper the performance of TBIR and other tag-based tasks.

As a result, the need for reliable tags over large-scale image collections becomes pressing, which motivates the research community to develop effective and efficient automatic tagging systems [56]. Among them, image annotation and tag completion are the two branches that attract the most attention.

⁵ http://info.allaboutbirds.org/nabirds/
⁶ https://www.flickr.com/

The differences between these two branches come from the two types of supervised information. In the image annotation framework, a subset of images is associated with appropriate tags while the other images are not assigned any tags, and the goal is to predict tags for the unlabeled images. In the tag completion task, all images are associated with a certain number of tags. However, some tags are appropriate while others are not, and some tags that should have been observed are missing. Its final purpose is to update the whole tag matrix so that it better describes the images' visual contents.
So basically, the goal of image annotation and tag completion work is to learn from labeled examples in order to predict the labels of other examples or the scores of the other labels. That is, given a training set of supervised information (examples or labels), the aim is to learn a hypothesis that assigns each label a confidence score for a sample, where the label or the sample may never have been observed by the algorithm. Efficiently finding such an effective hypothesis based on the training set and the observed labels, one that minimizes some validation measure of performance, is the main focus of this learning.

This chapter is devoted to an overview of these two broad topics of tag assignment and amelioration, which aim to develop a general correspondence between or within the image visual contents and the semantic tags. Here we present the definitions in a fairly non-technical manner; the formal, detailed definitions will be given in Chapter 2.

1.1 Image Annotation

The objective of image annotation is to automatically annotate an image with appropriate tags that exclusively reflect its visual content. Image annotation has been a topic of ongoing research for more than a decade, and many techniques have been developed. Conventionally, image annotation is tackled as a machine learning problem with two major components: visual feature extraction and mapping those features to semantic tags [132]. Feature extraction obtains significant patterns from images, and the patterns are then mapped to keywords in the semantic space via machine learning algorithms, which capture the relationship between visual contents and semantic tags in one of three ways: (i) formalizing a statistical model between tags and visual features [22,49,119,132]; (ii) casting the problem into a set of binary classification problems [47,72]; and (iii) representing the tags as a matrix and treating the annotation problem as a matrix factorization [223] or matrix completion problem [126,201]. The key of these methods is to train a reliable model with sufficiently accurate tags by optimizing image compactness and separability in a global sense.
However, the semantic gap and the imperfect tags usually lead to a biased model and result in a suboptimal solution. That means the discriminatory power of input images might vary between different neighborhoods, and a global model hence cannot fit the visual-semantic relation well over the data manifold. Besides, many parametric models are not rich enough to effectively capture the complicated dependencies between image content and tags.

Recently, a local non-parametric model, the search-based approach, has proved to be quite effective, particularly for large image datasets with many keywords [67,93,139,194]. Its key idea is to annotate a test image I with the common tags shared by the subset of training images that are visually similar to I.

The crux of search-based annotation methods is to effectively measure the visual similarity between images. Distance metric learning (DML) [78,210,205] tackles this problem by learning a metric that pulls semantically similar images close and pushes semantically dissimilar images far apart. Many studies on DML are restricted to learning a linear Mahalanobis distance metric in a finite dimensional space, which is expected to be consistent with the associated tags.
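A minimal sketch may make the search-based pipeline concrete (the feature vectors, tags and the hand-fixed diagonal matrix M below are all hypothetical; in practice M would be produced by a DML algorithm so that the metric agrees with the tag similarity):

```python
from collections import Counter

# Synthetic training set: 2-D feature vectors and their tag lists.
X_train = [(0.0, 0.0), (0.1, 0.2), (3.0, 3.0), (3.1, 2.9)]
tags_train = [["beach", "sea"], ["beach", "sky"],
              ["city", "night"], ["city", "sky"]]

# A diagonal (hence PSD) Mahalanobis matrix M = diag(2.0, 0.5), so that
# d(x, y)^2 = (x - y)^T M (x - y). Here M is fixed by hand purely for
# illustration; a DML algorithm would learn it from the tags.
M_DIAG = (2.0, 0.5)

def mahalanobis_sq(x, y):
    return sum(m * (a - b) ** 2 for m, a, b in zip(M_DIAG, x, y))

def annotate(x, k=2, n_tags=2):
    """Tag x with the most common tags among its k nearest training
    images under the learned metric (search-based annotation)."""
    order = sorted(range(len(X_train)),
                   key=lambda i: mahalanobis_sq(x, X_train[i]))
    votes = Counter(t for i in order[:k] for t in tags_train[i])
    return [tag for tag, _ in votes.most_common(n_tags)]

print(annotate((0.05, 0.1)))  # top-voted tag is 'beach'
```

The sketch deliberately separates the two ingredients discussed above: the metric (here M) decides which training images count as neighbors, and simple tag voting over those neighbors produces the annotation.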
However, most distance metric learning algorithms assume all data are linearly separable [26], and they usually fail to capture the nonlinear relationships among images. To address this problem, several nonlinear DML algorithms have been proposed. Their key idea is to map data points from the original vector space to a high (or even infinite) dimensional space through a nonlinear mapping, which can be either explicitly constructed using boosting methods [73,74,172], or implicitly derived through kernel functions. The latter is referred to as Kernel Metric Learning (KML) [26,39,187], which has been widely used to settle image similarity problems in image classification [39,60,187], clustering [2,26,205], and retrieval [77,78]. Figure 1.2 illustrates an example comprised of two groups of data points annotated with different tags. Figure 1.2(b) shows the distribution of data points adjusted by a learned linear distance metric, which fails to separate the objects with different tags, while in Figure 1.2(c) the newly learned nonlinear (kernel) distance metric successfully separates the data with different tags.

(a) Original data distribution  (b) Linear distance metric  (c) Nonlinear distance metric
Figure 1.2: An illustrative example comparing linear and nonlinear distance metric learning algorithms. (a), (b) and (c) show the original data distribution, the distribution adjusted by a learned linear distance metric, and the distribution adjusted by a learned kernel metric, respectively.

1.2 Image Tagging

Recently many user-provided tags are automatically generated, and thus are incomplete or inaccurate in describing the visual content of images [201]. In particular, these tags are
crawled from the descriptions of the uploader and reviewers, and in most cases only a small number of tags are provided for each image, some of which are even irrelevant to the visual content. This disadvantage makes it difficult to fully utilize these accompanying tags and limits their applications in tag dependent tasks such as tag based image retrieval and tag recommendation [126,129,226]. To better benefit from the tags, there is an urgent demand for efficient algorithms that are able to improve the tagging quality for images at large scale; specifically, effective algorithms that can simultaneously recover the missing tags and remove or down-weight the noisy ones.

Generally, two groups of algorithms can fulfill this desire to re-weight the given tags: the fine-grained ones and the coarse ones. Both groups mainly model the correlations among the partially observed tags, then apply this model to the whole dictionary and re-weight all tags in the dictionary.

The fine-grained ones [40,44,116,115] target images with specific contents and carefully, manually annotated tags. The contents of such images are simple and concentrated, while the semantic knowledge is complete and clean, and sometimes even manually associated with the corresponding segments in the images. Typical datasets for these algorithms are the Animals with Attributes dataset [115] and the CUB-200-2011 dataset [189]. The fine-grained group includes transfer learning [115], label propagation [40], zero-shot recognition [116] and fine-grained categorization [44]. This group is able to learn new attributes by exploiting their graphical or probabilistic relationships via both an intermediate-level semantic representation and the low-level mapping between tags and their corresponding segments in the image repository.

However, natural images are usually assorted and involve a large scope of topics including scenes, people, animals, actions, objects and many other aspects of life and the environment.
Besides,becauseofthelargeamountandversatilityoftheimages,thetagsareusually user-providedinsteadofmanuallyannotated,andthuscontainmanymissingannotations anderrors.Andprobably,thetagdictionarymightbeverylong.Typicaldatasetscouldbe referredtoFlickr1Mdataset[201]andNUS-WIDE[33]dataset.InthiscasetheÞnegrained methodsarenolongercapabletocapturethetag-tagortag-imagedependenciessinceon theonehand,themanualsegmentationandtaglocalizationarelaborcosting;andonthe otherhandtheobservedtagsareusuallytooproblematictotrainareliablemodel.Inthe contrast,thecoarsemethodsareabletosimultaneouslyaddressthechallengesofmissing andnoisytags[33]forahugevarietyofimagesusingmachinelearningtechniques,which relymainlyonthesemanticinformationthatcoversawiderangeofknowledge,andaswell astheauxiliaryvisualcontentinformation. Werefertothisproblemas tagcompletion [33]todistinguishitfrompreviouscoarse imagetaggingwork.AlthoughtheÞnalobjectiveofthosetaggingworksistoassignan imagewithcompleteandexacttagswithreasonableconÞdencescores,theirinitialsetups vary,asillustratedinFigure1.3. Imageannotation [28,67,139]automaticallyassignsunla- 8Figure1.3:Exemplarillustrationoftagcompletionandotherimagetaggingworksinclud- ingimageannotation,tagrecommendationandtagreÞnement.Theupperboxshowsthe initiallygiveninformation(bothvisualandsemantic),andthebottomboxindicatesthe ultimateobjectiveofallfourtasks. beledimageswithappropriatekeywords.Asastate-of-the-artimageannotationapproach, searchbasedalgorithms[52,67]relyonthequalityoftagsassignedtotrainingimages[52]. Tagrecommendation suggestscandidatetagstoonlineannotatorsinordertoimprovethe e"ciencyandqualityofthetaggingprocess[108,158,206].ItusuallyidentiÞesmissing tagsbytopicmodels(e.g. LatentDirichletAllocation (LDA))[12,108,225],butdoesnot addressthenoisytagproblem,animportantissueinexploitinguser-providedtags. 
Tag refinement applies various techniques, including topic models, tag propagation, sparse training and partial supervision [28, 131, 206], to select a subset of the user-provided tags based on image features and tag correlations [224]. Although it is able to handle noisy tags, it cannot explicitly enrich the missing tags.

1.3 Thesis Contributions

In this section we shall elaborate on the main problems considered in this thesis and our key contributions to address these problems.

This dissertation mainly deals with the image annotation and tag completion problems, giving theoretical guarantees and providing empirical comparisons with state-of-the-art baseline algorithms. Generally, we attempt to delve into the image-image correlation, the image-tag mapping and the tag-tag interaction to capture their underlying relationships. In particular, the main contributions can be summarized as follows.

• A new, effective image distance metric and its theoretical foundation. The dissertation proposes a novel kernel-based distance metric learning algorithm (RKML) specifically for image annotation in Chapter 3. This algorithm achieves success by fully exploring the image-tag dependency, which is consequently used to better capture the nonlinear complexities among images. Several strategies are applied to guarantee the annotation accuracy, including the incorporation of soft semantic constraints, which better exploit the semantic information between tags, and the adoption of a rank-based regularization term, which effectively reduces the risk of overfitting to the training data. Besides, a theoretical guarantee for the proposed kernel distance metric learning is provided.

• Efficient kernel metric learning and related computation. The proposed RKML algorithm exploits the regression technique to avoid the projection onto the PSD cone, which is necessary in conventional distance metric learning and is computationally expensive. Besides, the Nyström approximation is applied in the kernel computation to speed up the implementation. These techniques, together with the rank-based regularization, greatly reduce the computational burden of the proposed RKML algorithm.
• Novel image tag completion work that effectively deals with missing and noisy tags. The dissertation also proposes a novel image tag matrix completion (TCMR) framework in Chapter 4 that effectively recovers the expected tags from incomplete and noisy given tags. This algorithm focuses on capturing the tag-tag correlation, and then uses it to update the tag confidence score matrix. Following the idea of topic models, TCMR assumes that the observed tags of any image are drawn independently from a mixture of a small number of multinomial distributions, which can be straightforwardly interpreted through low-rank matrix completion theory. Accordingly, the nuclear norm is applied to simultaneously capture the interactions among tags in two ways: between different tag keywords, and between the tag vectors associated with different images. A maximum likelihood component is also employed as the loss function, which connects probabilistic models and matrix completion theory. All these techniques together ensure the performance of tag matrix recovery from missing and noisy tags.

• Theoretical guarantee of image tag completion by noisy matrix recovery. The final objective function of the optimization problem is convex, which guarantees that a global optimal solution exists and can be found efficiently. Error bounds between the recovered matrix and the statistically optimal one are also provided theoretically.

1.4 Thesis Overview

The remainder of this dissertation is organized as follows. Chapter 2 lays out the foundation for the rest of the dissertation. In particular, we provide a survey of the background material, including image tagging tasks such as image annotation and image tag completion, distance metric learning (both linear and kernel), statistical models applied in image tagging work, as well as low-rank matrix recovery theory. It will become clear in this chapter that there exist deep connections between these topics.
The first part of the thesis focuses on the image annotation problem. In Chapter 3 we focus on the kernel distance metric learning problem, investigate how it affects the image annotation performance, and propose strategies to address the limitations of existing kernel distance metric learning algorithms and their real-world applications.

The second part of the thesis deals with the image tag completion problem. Chapter 4 discusses its relationship to statistical/topic models and matrix completion theory. The effectiveness of the proposed algorithm is justified both theoretically, by the recovery error bounds, and empirically, on a number of datasets under several setups.

Finally, Chapter 5 summarizes this work by concluding the main contributions, some potential extensions and future research directions. Besides, the appendix summarizes rather standard results on topics relevant to this work, and gives the error bounds that are used in the proofs of results in the thesis; it is mainly for reference. In order to facilitate independent reading of the various chapters, some of the definitions are repeated multiple times.

1.5 Notation

This section serves as a glossary for the main mathematical symbols used throughout the thesis. Vectors are shown by lowercase bold letters, such as $x \in \mathbb{R}^d$. Such a vector usually represents the visual feature or tag vector of an image. Matrices are indicated by uppercase letters such as $A$, and their pseudo-inverse is represented by $A^{\dagger}$. We use $[m]$ as shorthand for the set of integers $\{1, 2, \ldots, m\}$. Throughout the thesis we denote by $\|\cdot\|$, $\|\cdot\|_1$, $\|\cdot\|_F$ and $\|\cdot\|_2$ the $\ell_2$ (Euclidean) norm, the $\ell_1$-norm, the Frobenius norm and the spectral norm, respectively.

1.6 Bibliographic Notes for Previous Publications

Some of the results in this dissertation have appeared in prior publications. The material in Chapter 3 is based on work published at the International Conference on Computer Vision (ICCV) [52]; the content of Chapter 4 comes from [51], published at the European Conference on Computer Vision (ECCV), and from [50], published in the IEEE Transactions on Image Processing.
Chapter 2

Background

The goal of this chapter is to give a general and formal overview of the material related to the work that has been done in this thesis. In particular, we will discuss the key concepts and questions relevant to the problems of image annotation, image tag completion, kernel distance metric learning and noisy matrix recovery. The exposition given here is necessarily brief, and detailed discussion will be provided in the relevant chapters.

2.1 Image Representation

In the computer vision area, image representation plays an important and ineluctable role. Specifically, an appropriate feature representation significantly improves the performance of typical image-relevant tasks including image classification, image clustering, image understanding, video understanding, etc. Since an image consists of an unstructured array of pixels, the first step of image representation is to efficiently extract certain types of discriminative visual features from these pixels, whether color or grayscale [220]. Various feature extraction techniques will be reviewed in detail in the following subsections.

2.1.1 Color Feature

Color is one of the most basic and fundamental features for capturing image characteristics, and it is usually defined with respect to a particular color space, such as the RGB, HSV and Lαβ spaces [41, 65]. Within these spaces, color features can be extracted, including the color histogram [204], color moments [220] and the color coherence vector [220].

2.1.2 Texture Feature

Unlike color features, which measure the property of a single pixel, texture features explore the traits of a group of pixels. According to the domain from which they are extracted, texture features can be divided into two groups: spatial texture features and spectral texture features [220]. Spatial texture features are usually extracted by computing pixel statistics, searching for local pixel patterns, or fitting stochastic/generative models in the original image space. Typical spatial features include the texton histogram [140] and Markov random fields [97].
Generally, since spatial features are directly generated in the original image space, they can be straightforwardly extracted from irregularly shaped regions, but they usually suffer severely from noise, mutation and distortions of images [220].

Spectral texture features served as significant image analysis tools in the computer vision area in the early 2000s; they are usually extracted in a frequency domain transformed from the original image space. Common spectral texture features include the Fourier transform (FT) [122], the discrete cosine transform (DCT) [164], wavelets [85] and Gabor filters [128]. Among them, FT and DCT are efficient but sensitive to scale and rotation; wavelets are fast to compute but limited in orientations; and Gabor features are robust to scale and orientation but lose certain spectral information due to incomplete coverage of the spectrum [183].

2.1.3 Typical Features in Image Tagging

Here, several state-of-the-art image visual features are summarized and compared in detail; these are potentially useful for image-level tasks including image annotation and image tagging.

The SIFT feature [137] was initially proposed for object recognition. It first extracts SIFT descriptors from a set of reference images at different scales with Gaussian filters, and then uses the bag-of-words model to compute a histogram of the descriptors to form the final image feature. There are various versions of SIFT features, including sparse SIFT, dense SIFT and SURF. Sparse SIFT [218] builds the features at Hessian-affine interest points, and dense SIFT [120] extracts the descriptors within a flat window. SURF [7] is a sped-up version of SIFT, which handles the scale problem by convolution with box filters and the orientation problem with wavelet responses.

The Gist feature [155] was initially described as a low-dimensional representation of the scene, specifically for scene recognition, and it requires no image segmentation as in traditional approaches. It summarizes the gradient information, over both scales and orientations, by convolving the image with a bank of Gabor filters [128], which provides a rough description of the image characteristics.
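The Gist construction just described can be sketched in a few lines: build a small bank of Gabor kernels, filter the image, and pool the response magnitudes over a coarse spatial grid. This is a minimal single-scale illustration in plain numpy, not the reference implementation of [155]; the kernel size, wavelength and grid below are placeholder choices.

```python
import numpy as np

def gabor_kernel(size, theta, wavelength, sigma):
    """A single real-valued Gabor kernel at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength)
    return envelope * carrier

def gist_descriptor(image, n_orient=4, grid=4):
    """Filter with a small Gabor bank and average the response magnitude
    over a grid x grid spatial layout (a rough Gist-like feature)."""
    h, w = image.shape
    feats = []
    for k in range(n_orient):
        kern = gabor_kernel(15, np.pi * k / n_orient, wavelength=8.0, sigma=4.0)
        # FFT-based convolution at the image size
        resp = np.abs(np.fft.ifft2(np.fft.fft2(image) *
                                   np.fft.fft2(kern, s=image.shape)))
        for i in range(grid):
            for j in range(grid):
                cell = resp[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
                feats.append(cell.mean())
    return np.array(feats)          # length n_orient * grid * grid

img = np.random.rand(64, 64)
print(gist_descriptor(img).shape)   # (64,)
```

The pooled grid of filter responses, rather than any segmentation, is what gives Gist its "rough description" character.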
The HOG feature [38, 48] is reported to provide excellent performance for object and human detection. It first densely extracts histogram of oriented gradients (HOG) descriptors and stacks neighboring HOG descriptors together to increase the feature dimension and, with it, the descriptive power. The bag-of-words model is then used to compute the final HOG feature for an image.

The LBP feature [153], short for Local Binary Patterns, is a powerful texture feature based on the occurrence histogram of local binary patterns. Basically, LBP divides the image into blocks, for example 3×3, then thresholds the pixels in each block against the center pixel value and encodes the comparisons as a sequence of binary digits. The sequence is then converted to a decimal number, which is set as the value of the center pixel. The histogram of each block can then be computed, and the histograms are concatenated together to form a feature vector for the image. Essentially, LBP encodes local contrast and patterns, making it highly discriminative while remaining efficient to compute.

2.2 Image Annotation

Once sufficient visual features are extracted from an image, high-level semantics like annotations and tags can be learned from the given information. According to [67], traditional automatic image annotation methods can be categorized into three groups, while recently new deep neural network based models have also gained more and more attention in the annotation community.
2.2.1 Generative Models

This type of model usually trains global probabilistic models to explain the co-occurrence between image visual features and semantic labels, and then predicts new tags with the newly learned relationship. Many of them borrow techniques from natural language and text-based document processing. Duygulu et al. tried to translate image blobs into label keywords directly using a machine translation model [46], which inspired several relevance models. These early works, including the Cross-Media Relevance Model [92], the Continuous-Space Relevance Model [119] and the Multiple Bernoulli Relevance Model [49], assume that the blobs and tags are conditionally independent given a specific image. Besides, the algorithm in [22] is designed to model the joint distribution between tags and visual features with a mixture distribution, while [145] models the visual and semantic relationship via a Bayesian network.

Meanwhile, latent space models derived from natural language and text processing, including Latent Semantic Analysis [45] and Probabilistic Latent Semantic Analysis [76], as well as variants of Latent Dirichlet Allocation models [4, 148, 132, 108, 158], have been successfully applied to image annotation.

Beyond these large groups, the authors of [195] propose a semi-supervised formulation based on linear regression with a tag-biased regularization.

These methods usually have unsatisfactory performance since the probabilistic models are too global to capture the nonlinear relations between images.
2.2.2 Discriminative Models

Image annotation can also be viewed as a classification problem where each keyword is treated as an independent class. As a state-of-the-art classifier, the Support Vector Machine (SVM) has been shown to be highly effective when handling high-dimensional data like images. An SVM classifier is basically a binary classifier, so in order to adapt it to image annotation tasks, which require multiple classifiers, some SVM-based annotation models first train a separate SVM for each concept, with each classifier generating a probability, and later fuse all the SVM classifiers together to obtain a final confidence score for each tag [36, 47]. Further, a batch-mode re-tagging method is proposed in [27], where an SVM with augmented features learns a set of adapted classifiers to refine the existing noisy tags.

Besides SVMs, a group of other discriminative models have been successfully applied to image annotation. [139] assigns tags by a k-nearest-neighbor classifier combined with multiple distance metrics. [144] applies a structural model to attribute-based image classification, and transfers the user inputs as well as the attribute-class mapping results to predicted tags. [13] learns the class labels by exploiting the group lasso technique and minimizing ranking errors. For these methods, both the training and testing phases are commonly computationally expensive. However, [72] proposes a max-margin formulation that models the dense pairwise label correlations and reduces the complexity from exponential to linear. [157] also learns a multi-label classifier that explicitly and efficiently models the dependencies between submodular pairwise labels via graph cuts, and directly optimizes the F-score.
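The one-versus-rest strategy described above (one binary scorer per tag, fused into a confidence score per tag) can be sketched as follows. For brevity, a ridge-regression scorer stands in for the per-concept SVM; this is a minimal illustration, not the actual systems of [36, 47].

```python
import numpy as np

def train_one_vs_rest(X, Y, lam=1.0):
    """One linear scorer per tag. X is n x d features; Y is n x m with
    Y[i, k] = 1 if image i carries tag k. Ridge regression stands in
    for the per-concept SVM classifier."""
    n, d = X.shape
    # Solving all m tags at once: W is d x m, one column per tag
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    return W

def tag_scores(X, W):
    """Fuse the per-tag scorers: one confidence score per (image, tag)."""
    return X @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                    # 50 images, 10-d features
Y = (rng.random((50, 5)) < 0.3).astype(float)    # 5 tags, binary labels
W = train_one_vs_rest(X, Y)
print(tag_scores(X[:2], W).shape)                # (2, 5)
```

Each tag is treated as an independent class exactly as the text describes; the shared solve over all columns of Y is what makes the batch formulation cheap for a linear scorer, whereas m separate SVMs would each need their own training run.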
In [133], a multiview Hessian discriminative sparse coding method is presented, which exploits Hessian regularization to steer the solution so that it varies smoothly along geodesics in the manifold, and treats the label information as an additional feature view to incorporate discriminative power for image annotation. In [79], R. Hong et al. explore both positive and negative tag correlations and propose a method with discriminative feature mapping, which selects the effective features from a large and diverse set of low-level features for each concept under a multiple-instance learning setting.

Despite their considerable performance in learning image annotations, this group of algorithms shares the same shortcomings: they scale poorly to large datasets or large tag dictionaries, and they also perform unsatisfactorily when the training tags are incomplete or noisy.

2.2.3 Search-based Models

Since image annotation is a highly nonlinear problem, parametric models might not be sufficient to capture the complex distribution of the data, and recent works on image tagging have therefore mostly focused on nonparametric nearest-neighbor methods, which offer higher expressive power. Search-based approaches have gained much popularity in exploring tag relevance due to their feasibility on large-scale data. Recent studies on image annotation show that search-based approaches are more effective than both generative and discriminative models [67, 202]. Here, we briefly review the most popular search-based approaches developed for image annotation.
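The core mechanism shared by these search-based methods is propagating tags from visually similar training images to the query. A minimal distance-weighted voting sketch follows; it is a simplified illustration of the common idea, not any one of the published systems reviewed below.

```python
import numpy as np

def knn_tag_vote(x_query, X_train, Y_train, k=5):
    """Score each tag for a query image by distance-weighted voting among
    its k nearest training images. Y_train is n x m binary (image x tag)."""
    dist = np.linalg.norm(X_train - x_query, axis=1)   # distances to all training images
    nn = np.argsort(dist)[:k]                          # indices of the k nearest
    w = 1.0 / (dist[nn] + 1e-8)                        # closer neighbors vote more
    scores = (w[:, None] * Y_train[nn]).sum(axis=0) / w.sum()
    return scores                                      # one relevance score in [0, 1] per tag

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))                   # visual features
Y = (rng.random((100, 6)) < 0.2).astype(float)  # 6-tag binary label matrix
s = knn_tag_vote(X[0], X[1:], Y[1:], k=10)
print(s.shape)  # (6,)
```

The methods reviewed next differ mainly in how the neighborhood and the vote weights are defined: a learned similarity graph, kernel density estimates, or sparse reconstruction coefficients.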
TagProp [67] constructs a similarity graph over all images, and propagates the label information from the training images to the testing images via this graph. In [123], a majority voting scheme among the neighboring images is proposed. [130] obtains the tag relevance score using kernel density estimation, and then performs a random walk to boost the primary tag relevance score over a tag proximity graph constructed from the neighboring images. A sparse coding scheme is proposed in [59] to select semantically related images for tag propagation, and then local and global ranking aggregation is adopted to down-weight the noisy tags. Besides, a conditional random field model is adopted in both [93] and [203] to capture the spatial correlation between annotations of neighboring images, but [93] embeds kernelized logistic regression with multiple visual distance metric learning, while [203] optimizes the model by maximizing the margins of the hinge loss function.

This category of work usually concentrates on search techniques or visual-semantic consistency problems, where much attention has been paid to learning effective and efficient distance metrics.

2.2.4 Neural Network based Models

Neural network based image annotation models typically include the conventional artificial neural network (ANN) [55] and the recently developed deep convolutional neural network (CNN) [109]. An ANN consists of multiple layers of nodes called neurons, and nodes in different layers are connected by edges with corresponding weights. Each neuron works by feeding the outputs of the previous layer, weighted by its connecting edges, into an activation function to generate its output. Figure 2.1 shows how an ANN annotates an image with three tags. As an example, four 3-layer ANNs are used in [112] to annotate image regions.

Figure 2.1: Annotating an image with an ANN [220].
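The neuron computation just described (weighted inputs passed through an activation function, layer by layer) can be sketched as a tiny forward pass. The layer sizes and random weights below are placeholders for illustration, echoing the three-tag setup of Figure 2.1 rather than reproducing the network of [112].

```python
import numpy as np

def forward(x, layers):
    """One ANN forward pass: each layer feeds the previous layer's output,
    weighted by its connecting edges, through a sigmoid activation."""
    a = x
    for W, b in layers:
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))   # each neuron's activation
    return a

rng = np.random.default_rng(0)
# A 3-layer net: 16-d visual feature -> 8 hidden neurons -> 3 tag scores
layers = [(rng.normal(size=(8, 16)), np.zeros(8)),
          (rng.normal(size=(3, 8)), np.zeros(3))]
feature = rng.normal(size=16)
scores = forward(feature, layers)
print(scores.shape)   # (3,) -- one confidence value per tag
```

Training (i.e., fitting the weight matrices by backpropagation) is omitted; the sketch only shows how a trained ANN turns a visual feature into per-tag confidences.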
Very recently, deep convolutional neural networks (CNNs) have demonstrated promising results for image classification [109], and features based on CNNs have also shown the potential to significantly boost performance in image annotation and tagging [64, 180]. [64] proposes a DNN-based feature that combines convolutional architectures with approximate top-k ranking objectives, and substantially outperforms traditional visual features in multiple image annotation jobs. [34] builds many sparsely connected neural layers by training only the winner-take-all neurons, which yields large network depth and excellent performance on image annotation tasks. In [214], Yang et al. apply a deep neural network to establish the correlations between visual features and semantics, and address the imbalanced keyword distribution by incorporating keyword frequencies and log-entropy.

This group has some distinctive pros and cons. For massive input and output data, when we have no idea what the mapping function between the two is, a neural network can learn this function without our having to provide it explicitly. It also handles defective datasets with noise and missing variables well. Nevertheless, what emerges from a neural network's weights can be difficult to understand; it may work, but it is hard to explain the literal and physical meaning, and there are no theoretical guarantees. Moreover, its training often takes longer than certain other machine learning methods.

2.3 Image Tagging

Image tags were initially applied to improve the performance of content-based image retrieval [114], and image tagging works were then developed to generate tags by associating semantic words with unlabeled images [5]. Probabilistic and language models were widely used in early models that match semantics and images [4, 5, 119]. As with the image annotation problem, the image tagging problem can also be formulated as a multi-label classification problem where each image can be assigned to more than one class simultaneously [14]. Following this idea, many multiclass techniques, including SVM, CRF and other works such as [14, 15, 115], have been modified to adapt to the image tagging problem.
Similar to the literature on image annotation, most existing image tagging works explore only the relationship between the visual features and the tags, for instance, the direct mapping between the visual and tag spaces, or the probabilistic dependencies and graphical models between visual contents and tags [67, 126, 129, 201].

To achieve a better tagging performance, some works try to learn a better mapping by studying precise tag localization or an adaptive distance metric. [15] and [14] factorize the bag-of-words feature as a weighted sum of class histograms plus an error term to model the image content, and thus pose the multi-label classification problem as a rank minimization problem. [123] proposes to scalably and reliably learn the tag relevance vector of an image by accumulating the votes of the tags associated with its similar images (nearest neighbors). [52] and [202] apply distance metric learning methods to capture the dependency between visual and textual contents.

However, since additional tag information is observed in image tagging tasks compared with image annotation, some other works delve into the textual correlations among tags. [40] introduces a so-called Hierarchy and Exclusion graph to encode the rich semantic relations, including mutual exclusion, overlap and subsumption. [42] maps both images and text to a common semantic space using word embeddings, which improves the tagging performance by avoiding a direct cross-modal mapping that is usually impractical to construct.

Besides, other works mainly follow the ideas of topic models and matrix completion. They usually explore the mutual dependencies between tags and then solve an optimization problem derived from the image-tag relation [58, 151, 193, 215, 224].

In our study, we focus on the essential correlations among different tags, which can be effectively recovered from the incomplete and noisy observed tags by the noisy matrix recovery model. In order to provide a more comprehensive presentation of this model, we further review the image tag completion works as well as the closely related topic model based image tagging approaches as follows.
2.3.1 Image Tagging with Topic Models

The topic model was originally designed for document clustering [12, 11]; it discovers the abstract "topics" that occur in a collection of documents. Figure 2.2 shows a typical topic model pipeline. It is first assumed that there is a handful of "topics" in the collection of documents, as shown in the left column, and each topic can be modeled by a distribution over a set of words. The generation of a document can then be described as follows. First, a distribution over the topics (the histogram at right) is chosen; then, for each word, a topic assignment is chosen and the word is drawn from the corresponding topic [11].

Figure 2.2: An illustration of how a topic model works [11], where each topic is highlighted by a specific color.

In the last decade, topic models have been widely applied in image understanding and tag recovery applications [151, 225]. [206] applies a topic model to tag refinement by jointly modeling tag similarity and tag relevance. [108] uses LDA [12] to discover latent topics from resources with complete tag annotations, and the discovered topics are then used to recommend topics for new resources annotated with only a few tags. [158] presents a topic-regression multi-modal LDA for image annotation. However, all these methods focus on the simple co-occurrence of tags and fail to capture their underlying dependencies, and thus work poorly on imperfect tags. Recently, [151] encodes the textual tags as relations among the images, and then uses a topic model to learn the image content and modify their encoded relations. [87] extends traditional LDA to noisy tags by additionally introducing a general distribution, unrelated to the image content, which generates the noisy tags. The key limitations of these topic models are that (i) they have to solve a non-convex optimization problem, and (ii) they usually do not have any theoretical guarantee on the learned models.
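The generative process summarized above (draw a topic mixture for the document, then for each word draw a topic assignment and a word from that topic) can be sketched directly. The vocabulary and the two topic distributions below are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["sky", "sea", "dog", "cat", "grass", "cloud"]
# Two "topics", each a distribution over the vocabulary (toy values)
topics = np.array([[0.40, 0.30, 0.00, 0.00, 0.05, 0.25],   # a 'scenery' topic
                   [0.00, 0.00, 0.45, 0.45, 0.10, 0.00]])  # an 'animal' topic

def generate_document(n_words, alpha=(1.0, 1.0)):
    """LDA-style generation: draw the document's topic distribution theta,
    then per word draw a topic assignment z and a word from topics[z]."""
    theta = rng.dirichlet(alpha)                 # topic mixture for this document
    words = []
    for _ in range(n_words):
        z = rng.choice(len(topics), p=theta)     # topic assignment for this word
        words.append(str(rng.choice(vocab, p=topics[z])))
    return words

doc = generate_document(8)
print(len(doc))  # 8
```

Fitting a topic model inverts this process: given only the observed words (here, tags), infer the topics and the per-document mixtures. The non-convexity of that inference is precisely the limitation (i) noted above.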
2.3.2 Image Tag Completion

There are only a handful of studies fitting the category of tag completion with both incomplete and noisy tags. [226] proposes a data-driven framework for tag ranking that optimizes the correlation between visual cues and assigned tags. In [129], the noisy tags are first removed based on visual and semantic similarities, and then tags are obtained by expanding the observed tags with their synonyms and hypernyms using WordNet. [201] proposes to search for the optimal tag matrix that is consistent with both the observed tags and the visual similarity. [190] proposes to complete the missing tags by local linear learning, which constructs a unified objective function to calculate the tag scoring vector for each image within its neighborhood. In [208], the authors propose an image-tag re-weighting scheme to adjust the penalty of each tag and image based on both image similarities and tag associations, and thereby formulate a unified re-weighted empirical loss function to handle the defective setting with both incomplete and noisy tags. Despite their successful applications, none of these studies provides any theoretical guarantee for its approach.

Besides, matrix decomposition is adopted in the literature, including [15, 149, 223, 224], to handle both missing and noisy tags. [134] formulates tag completion as a non-negative data factorization problem. [126] exploits sparse learning techniques to reconstruct the tag matrix. The key limitation of these approaches is that they require a fully observed tag matrix with a small number of errors, making them inappropriate for the tag completion problem.
2.4 Image Annotation by Metric Learning

It is ubiquitous in research and engineering communities, including machine learning, computer vision, information retrieval and data mining, to seek appropriate measures of the distance or similarity between data points, which has driven the emergence of distance metric learning (DML) [9]. The Euclidean distance is the simplest and most commonly used distance metric, but despite its ease of use, it is hardly able to capture the irregularities and idiosyncrasies of complicated and versatile data. The study of DML can be traced back to 2002 [205], and it immediately became a hot topic that inspired much research. Yang et al. [210], Kulis [110] and Bellet et al. [9] provide comprehensive and detailed surveys on this topic, covering problem formulations, optimization and applications. Given the rich literature on this subject, we only discuss the metric learning studies closely related to image annotation, and refer the reader to [9, 67, 202, 210] for more detailed surveys where necessary.

The goal of distance metric learning is to take advantage of prior information, in the form of labels/tags or pairwise constraints, to create a projection of the data into another space such that relevant images have smaller distances and share more labels, while irrelevant images have larger distances and share fewer labels. According to the linearity of the projection, we roughly categorize DML into two groups: linear and nonlinear distance metric learning. Besides, there are also some extensions, including online learning and local metric learning. Certain parts of these groups may overlap.

2.4.1 Linear Distance Metric Learning

Most linear DML methods assume data points lie in a finite linear space, and focus on the Mahalanobis metric learning setting, written as
$$d_M(x, x') = \sqrt{(x - x')^{\top} M (x - x')},$$
where the metric $M$ should be symmetric and positive semi-definite (PSD). Most notable works for learning such a Mahalanobis distance fall into several groups [210].
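The PSD requirement on $M$ is what makes $d_M$ a valid (pseudo)metric: any PSD $M$ factors as $L^{\top} L$, so the Mahalanobis distance is just the Euclidean distance after the linear projection $x \mapsto Lx$. A minimal sketch of this equivalence:

```python
import numpy as np

def mahalanobis(x, xp, M):
    """d_M(x, x') = sqrt((x - x')^T M (x - x')) for symmetric PSD M."""
    diff = x - xp
    return np.sqrt(diff @ M @ diff)

rng = np.random.default_rng(0)
L = rng.normal(size=(3, 5))      # any linear projection (here rank 3)
M = L.T @ L                      # M = L^T L is automatically symmetric PSD
x, xp = rng.normal(size=5), rng.normal(size=5)

d1 = mahalanobis(x, xp, M)
d2 = np.linalg.norm(L @ x - L @ xp)   # Euclidean distance after projecting by L
print(np.isclose(d1, d2))  # True
```

This factorization view also explains why enforcing the PSD constraint is the computational bottleneck discussed later: a learned $M$ that drifts off the PSD cone no longer corresponds to any projection $L$ and can produce negative squared distances.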
The first group learns metrics with explicit class labels (which may also be referred to as tags, concepts or keywords). For instance, NCA [63] explicitly learns metrics through a k-nearest-neighbor classification; MCML [60] constructs a convex problem leading to a metric that collapses same-class samples to a single point and pushes samples in the other classes infinitely far away; LMNN [196] extends the k-NN based works by achieving maximal-margin nearest-neighbor classification; and LDML [68] models the image similarity using a posteriori class probabilities and obtains the distance metric by maximizing the log-likelihood.

The second group learns metrics from pairwise constraints and typically includes the following examples. NMC [205] proposes a convex formulation that maximizes the sum of distances between dissimilar points while keeping the sum of distances between similar examples small. RCA [3] learns a distance metric through a set of positive constraints (must-link), and later DCA [78] and ERCA [216] extend RCA by additionally introducing negative constraints (cannot-link) at the cost of a more expensive algorithm. LRML [77] provides a semi-supervised metric by integrating the unlabeled data information and a graph regularization. ITML [39] introduces a LogDet divergence regularization and minimizes the differential entropy under both positive and negative constraints. In ITML, a Bregman divergence defined as
$$D_{\mathrm{ld}}(M, M_0) = \mathrm{tr}(M M_0^{-1}) - \log\det(M M_0^{-1}) - d,$$
where $d$ is the dimension of the input space, is introduced to keep the learned metric as close as possible to the Euclidean metric ($I$). In ITML, the positive semi-definiteness of $M$ is guaranteed automatically and quite easily by minimizing $D_{\mathrm{ld}}(M, M_0)$, due to the fact that the LogDet divergence is finite if and only if $M$ is positive definite. Furthermore, [91] and SDML [160] follow this idea and propose more efficient Mahalanobis distance learning algorithms. Besides the LogDet divergence regularization, SDML [160] also employs an extra $L_1$ regularization on the off-diagonal elements of $M$ to speed up the computation in high-dimensional spaces while keeping it theoretically sound. Moreover, to handle noisy constraints, RML [82] minimizes the worst-case violation over all possible sets of correct constraints.
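The LogDet divergence used by ITML above is straightforward to evaluate, and a direct computation also illustrates the two properties the text relies on: it vanishes exactly when $M = M_0$, and it blows up when $M$ loses positive definiteness.

```python
import numpy as np

def logdet_divergence(M, M0):
    """D_ld(M, M0) = tr(M M0^-1) - log det(M M0^-1) - d.
    Finite only when M is positive definite (relative to M0)."""
    d = M.shape[0]
    A = M @ np.linalg.inv(M0)
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        return np.inf                 # det(M M0^-1) <= 0: M not positive definite
    return np.trace(A) - logdet - d

d = 4
I = np.eye(d)
rng = np.random.default_rng(0)
B = rng.normal(size=(d, d))
M = B @ B.T + 0.1 * I                 # a positive definite metric

print(logdet_divergence(I, I))        # 0.0 -- identical metrics
print(logdet_divergence(M, I) > 0)    # True -- strictly positive otherwise
```

Because every intermediate iterate with finite divergence is necessarily positive definite, ITML never needs an explicit projection onto the PSD cone, which is the efficiency argument made above.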
Exceptionally, despite the popularity of DML algorithms that handle class labels and constraints, only a few works are designed to handle other types of supervised information such as annotated tags. For image annotation tasks, [93, 200, 202] propose to learn metrics from implicit side information instead of class assignments or pairwise constraints. KCRF [93] embeds a kernelized logistic regression (KLR) with multiple visual distance learning into a unified conditional random field (CRF) framework. PRCA [200] first proposes a probabilistic metric learning from probabilistic side information based on a graphical model. UDML [202] unifies both inductive and transductive metric learning techniques to effectively exploit both visual and textual image contents. Besides, MLR [142] learns a metric for solving ranking and retrieval tasks, and its extension R-MLR [124] additionally deals with noisy features using a mixed $L_{2,1}$ norm to ignore most of the irrelevant features.

Furthermore, linear similarity metric learning is often an alternative to linear distance metric learning. The only difference is that a similarity measure does not necessarily have distance properties, especially the PSD and symmetry requirements; as a result, it is usually more flexible and scalable to large data. Typical linear similarity metric learning algorithms include SiLA [159], OASIS [25], SLLC [8] and RSL [30].

2.4.2 Nonlinear Distance Metric Learning

Due to the multimodal distributions of real-world data, a number of nonlinear distance metric learning approaches have recently been developed to tackle these nonlinear patterns. The main idea of nonlinear metric learning is to learn a linear metric in a reproduced nonlinear feature space. Depending on how the nonlinear mapping is constructed, the nonlinear DML family is usually classified into two categories: boosting based approaches [73, 74, 172] and kernel based approaches [39, 78, 187].
Typical boosting methods are as follows. BoostDist [73] combines boosting hypotheses over the product space with a weak learner based on partitioning the original feature space. BoostMetric [173] applies a set of positive semi-definite matrices with trace and rank equal to one as weak learners in a boosting based learning process. And GB-LMNN [98] applies gradient boosting to learn a nonlinear mapping directly in function space.

Initial kernel metric learning (KML) algorithms, such as KPCA [169], Kernel NMC [205], Kernel MCML [60], Kernel DCA [78], KLMCA [187], Kernel ITML [39] and KernelBoost [74], directly extend their linear or boosting based counterparts to kernel metric learning using the kernel trick. Although several approaches have been empirically shown to be kernelizable, in the general kernelized setting, kernelizing a specific metric learning method is not trivial. It involves a new problem formulation where the interface to the data is limited to inner products, and an $n \times n$ matrix must be learned. Besides, as the number of training examples $n$ increases, the problem becomes intractable. Together, these problems yield a solution that is extremely different from, and more difficult than, the linear case. To address this, a handful of general kernelization extensions [24, 219] have been developed based on KPCA [169]. A so-called KPCA trick, which introduces a kernel to project the data into a nonlinear space followed by a dimensionality reduction strategy, is adopted, and its soundness is justified theoretically through representer theorems [24]. It is also possible to obtain general kernelization through the equivalence between Mahalanobis distance learning and linear transformation kernel learning with spectral regularizers [90, 89]. In practical implementations, an appropriate kernel function can be selected through the multiple kernel framework proposed in [191].
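The KPCA trick mentioned above (embed the data nonlinearly via kernel PCA, then run any off-the-shelf linear metric learner on the embedded coordinates) can be sketched in plain numpy. The RBF kernel and component count below are arbitrary choices for illustration; this is a minimal sketch, not the formulation of [24, 219].

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Pairwise RBF kernel matrix of the rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kpca_embed(K, n_components):
    """Kernel PCA: center the kernel matrix, eigendecompose, and return the
    data's coordinates along the top principal directions in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc = H @ K @ H
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]  # keep the largest ones
    vals, vecs = vals[idx], vecs[:, idx]
    return vecs * np.sqrt(np.maximum(vals, 0.0)) # n x n_components embedding

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))
Z = kpca_embed(rbf_kernel(X), n_components=5)
print(Z.shape)  # (30, 5)
```

Any linear DML method from Section 2.4.1 can now be trained on `Z` directly, which is exactly how the trick sidesteps reformulating each algorithm in terms of inner products.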
Inparallel,someotherKMLworksstraightforwardlyproposenewkernelbasedmetric learningframeworks.[2]proposesanexplicitkerneltransformationtotackleaconstrained traceratiooptimizationproblem.Itexploitsbothpositiveandnegativeconstraintsandas wellasthetopologicalstructureofdata.ThesuggestedimplementationofthisKMLalgo- rithmisquitee "cientsinceitisnotnecessarytolearnallentriesinthe n$nmetricmatrix. NAML[26]formulatesatracemaximizationproblemtojointkernellearning,dimension reductionandclusteringtogetherandsolvesitinaEMframework.[209]proposesasupport vectormetriclearning(SVML)thatco-jointsaMahalanobisdistanceandtheSVMmodel withaRBFkernel,wherethePSDconstraintisautomaticallyguaranteedandthemetric 30canbemadelowrank. However,althoughliteraturehasshownthatkernelmetriclearningmaydramatically improvethequalityoflearneddistanceoverhighlynonlineardata,italsosu !ersfromthe computationalburdenandeasilycausedataoverÞtting,whichresultsinapoorgeneralization performance. 2.4.3OnlineMetricLearning Aspreviouslystated,amainchallengeinlinearornonlineardistancemetriclearningisto enforcethelearnedmetrictobepositivesemideÞnite(PSD),whichturnsouttobevery computationallyexpensiveintermsofbothtimeandspace,especiallywhendealingwith largescaleproblems.Onlinelearningiscontraryveryusefulinhandlingtheseproblemby gettingridofthebottleneckofPSDrequirementsandthusgainsgreatpopularity,though itoccasionallyperformsabitinferiortobatchalgorithms.Prominentonlineworkscanbe referredtoPOLA[171],LEGO[91],OASIS[25],RDML[95]andMDML[111].POLA[171] istheÞrstonlineMahalanobisdistancelearningapproach,whichprovidesaregretbound andisdonequitee "ciently.LEGO[91]learnsmetricsinanonlinesettingusingaLogDet regularization,andOASIS[25]isasimilaritymetriclearningwhichscaleslinearlywiththe datasizethroughonlinelearningofabilinearmodelusingamargincriterionandane "cient hingeloss.RDML[95]solvesaconvexquadraticprogramineachiterationstepinsteadof doingeigenvaluecomputationlikePOLA,anditperformscomparablytoLMNNandITML 
yet much faster. MDML [111] is based on composite mirror descent and can accommodate a large class of loss functions and regularizers for which efficient updates are derived. Besides, both MCML [60] and ITML [39] have online versions with excellent performance.

2.4.4 Local Metric Learning

The previous studies learn a global linear or nonlinear metric, which may be incapable of capturing the complexity of heterogeneous data. It may instead be beneficial to use local metrics that vary across the space, which have been shown to significantly outperform global methods at the expense of higher time and memory requirements. [211] presents a Local Distance Metric (LDM) that aims to optimize local compactness and local separability in a probabilistic framework. Multiple Metric LMNN (M²-LMNN) [197, 198] learns several Mahalanobis distances in different parts of the space, which are partitioned by clustering algorithms. GLML [113] leverages the power of generative models in the context of metric learning by locally and simultaneously minimizing the asymptotic probability of misclassification and the bias caused by finite sampling. In [192], PLML is proposed, which learns local metrics as linear combinations of basis metrics defined on anchor points over different regions of the instance space; it is quite robust to overfitting thanks to its global manifold regularization. Further, [86] extends PLML by regularizing the anchor metrics to be low rank, which allows a better optimization to achieve the optimal metric.

2.4.5 Other Metric Learning

Besides, there are also a few approaches that fall outside the scope of the previous categories.
For instance, multi-task metric learning is designed for the multi-task setting: given a set of related tasks, a metric is learned for each task in a coupled fashion in order to improve the performance on all tasks. Typical multi-task metric learning algorithms include mt-LMNN [156], MLCS [212], GPML [213] and TML [222]. As for sparse metric learning, typical examples include LPML [166] and SML [217], which favor sparsity through L1-norm and L(2,1)-norm regularization, respectively. However, LPML is not guaranteed to be low rank, while SML suffers from complexity issues in high dimensional problems. Besides, a unified and general framework for sparse metric learning is proposed in [83, 84].

2.5 Image Tagging by Matrix Completion

Literally, matrix completion means completing partially specified matrices into fully specified matrices satisfying certain prescribed properties. The matrix completion problem can be dated back to 1990, when Johnson claimed in [96] that, given a few assumptions about the nature of the matrix, the expected matrix can be reconstructed. These assumptions include the positive semidefinite property, the contraction property and a given rank assumption [96, 118]. A breakthrough occurred in 2009 when Candès and Recht [20] proved that a low-rank matrix can be reconstructed via convex optimization of the nuclear norm. By now, low rank matrix completion has become a recurring problem in many fields, for example, collaborative filtering [62] (notably, the Netflix challenge) and computer vision problems including structure-from-motion [186], multi-classification [1, 15] and global positioning [174], among many others. We refer to [19] for a discussion of more applications.
2.5.1 Low Rank Matrix Recovery with Nuclear Norm Minimization

Since finding the lowest rank matrix satisfying the equality constraints is NP-hard [31] and the matrix rank function is non-convex, a popular approach is to replace the rank with the nuclear norm, its tightest convex relaxation [19, 21]. The theoretical basis for such relaxation is provided in [21, 162]: under favorable conditions, minimization of the rank function can be achieved by the nuclear norm, which lays the foundation for later work on matrix completion. With the nuclear norm, it is possible to accurately recover a low rank matrix from a small fraction of its entries even if they are corrupted with noise [19, 20, 105].

In the noiseless setting, the matrix completion problem is considered as exact or near-exact recovery, where relevant works [21, 66, 99, 161, 174] investigate the minimum number of random observations required to exactly reconstruct a low rank matrix by constrained nuclear norm minimization. [21, 99, 174] prove that O(nr · poly(ln n)) observed samples are required to recover a rank-r n × n matrix in special cases. [66] develops more general methods and improves that result by introducing a degree of incoherence µ between the unknown matrix and the basis, finally showing that O(nr µ ln² n) randomly sampled entries are sufficient to recover any low-rank matrix with high probability. [161] simplifies the previous arguments and sharpens the results of [21, 99, 174] by providing a bound on that number which is optimal up to a small numerical constant and one logarithmic factor. These results thus provide theoretical guarantees for nuclear norm constrained minimization methods.

In a parallel line of work, noisy matrix completion, which is more common and in which some observed entries are corrupted with noise, has also been extensively studied [19, 53, 54, 100, 102, 103, 104, 105, 107, 150, 165]. The observed noisy matrix is usually modeled as A = L + S, where L is an unknown low rank matrix and S corresponds to the noisy corruptions. Compared to the noiseless setting, noise can severely harm the matrix completion results; as shown in [207], nuclear norm minimization can fail to recover the low rank matrix even if S contains only a single non-zero column.
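The nuclear norm surrogate discussed above is typically optimized by soft-thresholding singular values inside a proximal-gradient loop. The sketch below is a generic illustration of this idea on synthetic data, not any specific cited algorithm; the parameter values are arbitrary.

```python
import numpy as np

def svt(M, tau):
    """Proximal operator of the nuclear norm: soft-threshold singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete(A_obs, mask, tau=0.5, iters=300):
    """Proximal-gradient matrix completion:
    minimize 0.5 * ||P_Omega(X - A)||_F^2 + tau * ||X||_*
    Step size 1 is valid: the masked squared loss has Lipschitz constant 1."""
    X = np.zeros_like(A_obs)
    for _ in range(iters):
        grad = mask * (X - A_obs)   # gradient of the loss on observed entries
        X = svt(X - grad, tau)      # nuclear-norm proximal step
    return X

# Synthetic rank-2 ground truth with roughly 60% of entries observed
rng = np.random.default_rng(1)
L = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))
mask = (rng.random((30, 30)) < 0.6).astype(float)
X_hat = complete(mask * L, mask, tau=0.5, iters=300)
```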
When all entries of A are observed, the matrix completion problem becomes a matrix decomposition problem. [23] assumes S is sparse and proves that L and S can be perfectly recovered under additional sufficient identifiability conditions; milder conditions are further given in [81]. RPCA [18] studies the same model with an element-wise sparse S in which the corruption positions are sampled uniformly at random, while [207] considers a column-wise sparse S, where the uncorrupted columns are chosen uniformly at random and recovery is guaranteed as long as L is low rank.

When only partial entries of A are observed, the matrix completion problem is regarded as approximate matrix recovery. It is first systematically addressed in [18] in the noiseless framework with an element-wise sparse S, where the corruption positions are sampled uniformly at random. [29] improves on this by considering a column-wise sparse S, and it proves that the uncorrupted columns of L can be recovered and the corruption positions in S identified, as long as the following assumptions are satisfied: the uncorrupted columns are chosen uniformly at random, L is low rank, the number of corrupted columns is limited, and the number of observed uncorrupted entries is sufficient. Recently, both element-wise and column-wise corruptions have been simultaneously addressed in [105], where high probability recovery of L requires only an upper bound on the maximum of the absolute values of L and S, instead of the rank of L and the sparsity level of S as in previous studies.

In the noisy/approximate matrix recovery setting, most works pursue low rank matrix reconstruction by minimizing the nuclear norm under uniform sampling [19, 54, 100, 102, 107]. Keshavan et al. [100] improve on the results of [19] and achieve reconstruction guarantees that are order-optimal in a variety of circumstances. Foygel et al. [54] present reconstruction guarantees based on an analysis of the Rademacher complexity of the nuclear norm unit ball [179]. Koltchinskii et al. [107] propose a nuclear norm penalization with a fast convergence rate that is shown to be optimal up to logarithmic factors in a minimax sense and is equipped with a non-minimax lower bound. Later, the case of unknown noise variance is addressed in [102], where the authors propose a reconstruction estimator that achieves, up to a logarithmic factor, optimal rates of convergence under the Frobenius risk; this estimator yields matrix completion performance comparable to previous studies [19, 107, 150, 165] that assume a known noise deviation.

A common strategy to solve the resulting convex optimization problem is the iterative scheme, with typical algorithms including [16, 138, 185]. Besides, OptSpace, a low complexity algorithm based on a combination of spectral techniques and manifold optimization, is first introduced in [99] to handle the exact recovery problem, and its robustness to the noisy matrix setting is theoretically proved in [100]. Other efficient nuclear norm minimization solvers [57, 88, 94, 141] have also been intensively studied. However, most of them fail to solve large scale problems, making nuclear norm regularization less feasible in practice despite its strong theoretical guarantees. Fortunately, it has recently been shown that large scale matrix completion problems can be solved through a parallel stochastic gradient algorithm [163], or by an efficient nuclear norm solver via active subspace selection [80].

The nuclear norm has been applied as a regularizer to image classification [15, 61, 71], visual recovery [136, 149] and tag-relevant tasks including image tag refinement [224] and image tag completion [51], where the nuclear norm is used to enforce correlations between classifiers or tags. In the matrix completion/recovery scheme, most studies adopt smooth losses, including the common squared loss [54, 61, 184], the sparse ℓ1-norm loss [57, 224], the logistic loss [15, 61], the maximum margin estimator [136], and even general Lipschitz losses [53].
2.5.2 Low Rank Recovery under Other Constraints and Sampling Distributions

Most matrix completion works focus on uniform sampling and nuclear norm regularization, which might be unrealistic since in practice the observed entries are not guaranteed to follow a uniform scheme and their distribution is not known exactly. To overcome this limitation, some researchers search for better sampling distributions [104, 105], while others develop more suitable surrogates of the matrix rank [53, 54, 150, 17].

In [104, 105], uniform sampling is replaced by a general and unknown sampling distribution within the nuclear norm minimization framework. The condition needed in [104] is much milder in that it requires only an upper bound on the maximum absolute value of the entries of A, instead of bounds on both L and S as in [105].

It is shown in [178] that the standard nuclear norm might perform poorly, and a common alternative is the empirically weighted nuclear norm [53, 150, 178], which incorporates prior knowledge of the sampling distribution that can be computed from the locations of the observed entries. Besides, a direct rank-penalized estimator, obtained by hard thresholding of the singular values of A, is proposed in [103], where a general oracle inequality for the prediction error is established. In parallel, [165] introduces a Schatten-p norm based penalization and also establishes prediction error bounds for matrix completion. Recently, the so-called max norm [127] has been proposed as another convex surrogate of the matrix rank, defined as
\[
\|M\|_{\max} = \min_{M = UV^\top} \|U\|_{2,\infty}\,\|V\|_{2,\infty},
\]
where \(\|\cdot\|_{2,\infty}\) is the maximum ℓ2 row norm of a matrix. The max norm is first applied to matrix completion under the uniform sampling distribution in [54]. Later, a max norm constrained minimization method is proposed in [17] for noisy matrix completion under a general sampling model, which is shown to be minimax rate-optimal and yields a unified and robust approximate recovery guarantee.

Chapter 3
Image Annotation with Kernel Distance Metric Learning

A Regression based Kernel Metric Learning (RKML) algorithm is proposed within the image annotation framework in this chapter.
The remainder of the chapter is organized as follows. Section 3.1 motivates the problem and the main intuition behind the proposed algorithm, and sets up the notation. Section 3.3 is devoted to a detailed description of the proposed RKML algorithm and its extensions. The theoretical properties and guarantees, i.e., the bounds on the error between the computed kernel distance metric and its statistically optimal counterpart, are given in Section 3.4, with the omitted proofs deferred to Section 3.5. Section 3.6 presents the detailed implementation issues. Section 3.7 describes the extensive experimental setup, results and analysis. Section 3.8 summarizes the chapter, and Section 3.2 surveys the closely related works.

3.1 Motivation and Setup

Among the huge volume of image annotation algorithms, the search-based approach has proved to be quite effective, particularly for large image datasets with many keywords [67, 93, 139, 194]. The key idea is to annotate a test image I with the common tags shared by the subset of training images that are visually similar to I, which gives rise to an urgent need for an effective visual similarity measure between images. Due to the intricate complexities and nonlinear dependencies between image visual contents, we resort to Kernel Metric Learning (KML) [26, 39, 187] to tackle this problem, learning a kernel-based distance metric that pulls semantically similar images close and pushes semantically dissimilar images far apart. Figure 3.1 illustrates the empirical effect of applying an appropriate kernel distance metric to images associated with proper tags: as two images share more tags, their visual distance is shortened by the learned kernel distance metric.

Figure 3.1: Illustration of how a kernel distance metric acts on images with appropriate tags. In the left box, images share the tags marked in the same color as the lines connecting them.
Kernel metric learning has been widely used to address image similarity problems in image classification [39, 60, 187], clustering [2, 26, 205] and retrieval [77, 78]. Traditional kernel distance metric learning approaches [39, 60, 78, 187, 205] are usually extended from existing linear distance metric learning, and they typically find the optimal kernel metric by minimizing the distance between must-link images while simultaneously maximizing the distance between cannot-link images.

Despite the success of KML algorithms in those applications, they still suffer from two significant limitations. First, the high dimensionality of KML, denoted by d, usually leads to a high computational cost in solving the related optimization problems. In particular, to ensure that the learned metric is Positive Semidefinite (PSD), existing methods need to project the learned matrix onto a PSD cone, whose computational cost is O(d³) and thus relatively expensive. Although online learning algorithms [25, 39, 91] are able to avoid the PSD requirement, they need to train on a considerable amount of data and still take remarkable time to reach a reasonable performance. Secondly, the high dimensionality in the kernel metric learning process may lead to overfitting of the training data [95], which ultimately reduces the annotation performance. To address the overfitting problem, some studies try to find better kernels with boosting methods [74, 172], some directly reduce the dimensionality of the projected data [26, 187], and others add a regularizer [95]. However, none of them has solid theoretical support in dealing with the overfitting problem.
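The O(d³) PSD cone projection mentioned above is simply an eigendecomposition followed by clipping of the negative eigenvalues; the sketch below shows this standard operation (which RKML is designed to avoid), not any specific cited implementation.

```python
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone by clipping negative
    eigenvalues.  The eigendecomposition makes this an O(d^3) step."""
    M = (M + M.T) / 2.0                     # symmetrize first
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.maximum(vals, 0.0)) @ vecs.T

# A symmetric matrix with one negative eigenvalue
M = np.array([[2.0, 0.0],
              [0.0, -1.0]])
P = project_psd(M)
```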
Unlike most linear or kernel metric learning algorithms in similar setups, including image classification, clustering and retrieval, which deal with binary semantic constraints (must-link or cannot-link to a label), the proposed RKML algorithm is able to handle numeric semantic constraints, which better represent the complex semantic relationships between images and thus make better use of the supervised information. Besides, the proposed RKML algorithm avoids the time-consuming PSD cone projection step by exploiting a special property of regression, whereby the PSD property is automatically guaranteed. Additionally, the overfitting risk that is easily incurred by high dimensionality and commonly exists in kernel metric learning is alleviated in RKML by appropriately regularizing the rank of the learned kernel metric matrix, instead of an independent norm (Frobenius or absolute norm) of the learned metric matrix. This strategy also facilitates the implementation by connecting RKML with the Nyström approximation, and thus speeds up the computation with limited storage memory required in the computation phase. Finally, the proposed RKML is equipped with theoretical guarantees, namely bounds on the error between the learned metric and the statistically optimal one, which is original and constructive for kernel distance metric learning.

3.2 Related Work

Due to the rich literature in both areas of image annotation and distance metric learning, here we only survey the studies closely related to this work. For a more comprehensive and detailed background review, please refer to Chapter 2.
According to [67], automatic image annotation methods can be categorized into three groups: (i) generative models [22, 49], which are designed to model the joint distribution between tags and visual features; (ii) discriminative models [47, 144], which view image annotation as a classification problem where each keyword is treated as an independent class; and (iii) search-based approaches [139, 194]. Recent studies on image annotation show that search-based approaches are more effective than both generative and discriminative models. Here, we briefly review the most popular search-based approaches developed for image annotation. TagProp [67] constructs a similarity graph over all images and propagates the label information via the graph. In [123], a majority voting scheme among the neighboring images is proposed. A sparse coding scheme is proposed in [59] to facilitate label propagation. A Conditional Random Field model is adopted in [93] to capture the spatial correlation between annotations of neighboring images.

Many algorithms have been developed to learn a linear distance metric from pairwise constraints [210], and some of them are designed exclusively for image annotation [93, 200, 202]. Recently, a number of nonlinear DML approaches have been developed to handle nonlinear and multimodal patterns. They are usually classified into two categories, boosting-based approaches [73, 74, 172] and kernel-based approaches, depending on how the nonlinear mapping is constructed. Many KML algorithms, such as Kernel DCA [78], KLMCA [187] and Kernel ITML [39], directly extend their linear counterparts to KML using the kernel trick. To handle the high dimensionality challenge in KML, a common approach is to apply dimensionality reduction before learning the metric [26, 187]. Although these studies show that dimensionality reduction helps alleviate the overfitting risk in KML, no theoretical support is provided.
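The search-based approaches surveyed above share a simple neighbor-voting core: score each tag by how often it appears among a test image's nearest training images. A minimal sketch on toy data, with illustrative names and distances of our own:

```python
import numpy as np

def knn_tag_scores(dists, Y_train, k=5):
    """Score tags for each test image by majority vote over its k nearest
    training images.  dists: (n_test, n_train) distance matrix;
    Y_train: (n_train, m) binary tag matrix."""
    nn = np.argsort(dists, axis=1)[:, :k]   # indices of the k nearest neighbors
    return Y_train[nn].sum(axis=1)          # (n_test, m) vote counts per tag

# Toy example: 4 training images, 3 tags, 1 test image
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [1, 1, 0]])
d = np.array([[0.1, 0.2, 0.9, 0.8]])        # test image is closest to images 0, 1
scores = knn_tag_scores(d, Y, k=2)
```

The top-scoring tags are then returned as the annotation; TagProp and [123] refine exactly this step with learned or rank-based neighbor weights.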
3.3 Annotating Images by Regression based Kernel Metric Learning (RKML)

To begin, let X = (x_1, ..., x_n)ᵀ be a set of training instances, where x_i ∈ R^d is a d-dimensional instance. Let m be the number of classes, and Y = (y_1, ..., y_n)ᵀ be the class assignments of the training instances, where y_i ∈ {0, 1}^m with y_{i,j} = 1 if x_i is assigned to class j, and zero otherwise. In image annotation, each image can be assigned to multiple classes, and thus each vector y_i may contain multiple ones. Let κ(x, x'): R^d × R^d → R be a kernel function, and let H_κ be the corresponding Reproducing Kernel Hilbert Space.

Without a metric, the similarity between two instances x_a and x_b can be assessed by the kernel function as
\[
\langle \kappa(x_a, \cdot), \kappa(x_b, \cdot) \rangle_{\mathcal{H}_\kappa} = \kappa(x_a, x_b).
\]
Similarly to linear distance metric learning algorithms, we modify the similarity measure with a kernel distance metric as
\[
\kappa(x_a, x_b) = \langle \kappa(x_a, \cdot), T[\kappa(x_b, \cdot)] \rangle_{\mathcal{H}_\kappa},
\]
where T: H_κ → H_κ is a linear operator learned from the training examples. The objective of kernel metric learning is to learn a PSD linear operator T that is consistent with the image tag assignments of the training examples. Note that this differs from similarity learning [25] because in distance metric learning we require T to be PSD.

3.3.1 Regression based Kernel Metric Learning

The proposed RKML is a kernel metric learning algorithm based on the regression technique.
Let s_{i,j} ∈ R be the similarity measure between two images x_i and x_j based on their annotations y_i and y_j. We note that s_{i,j} is a real-valued measurement, which differs from conventional studies of distance metric learning that only consider a binary relationship between two instances. The discussion of s_{i,j} is deferred to Section 3.6.1. We adopt a regression model to learn a kernel distance metric consistent with the similarity measure s_{i,j} by solving the optimization problem:
\[
\hat{T} = \arg\min_{T \succeq 0} \sum_{i,j=1}^{n} \frac{1}{2} \Big( s_{i,j} - \langle \kappa(x_i, \cdot), T[\kappa(x_j, \cdot)] \rangle_{\mathcal{H}_\kappa} \Big)^2 .
\]
Following the representer theorem of kernel learning [170], it is sufficient to assume that T̂ operates only in the subspace spanned by κ(x_i, ·), i = 1, ..., n, leading to the following definition of T̂:
\[
\hat{T}[f](\cdot) = \sum_{i,j=1}^{n} \kappa(x_i, \cdot)\, A_{i,j}\, f(x_j), \tag{3.1}
\]
where A ∈ R^{n×n} is a PSD matrix. Using (3.1), we can turn the optimization problem for T̂ into an optimization problem for A:
\[
\min_{A \succeq 0} \; \mathcal{L}(A) = \frac{1}{2} \, \| S - K A K^\top \|_F^2 , \tag{3.2}
\]
where K = [κ(x_i, x_j)]_{n×n} is the kernel matrix, S = [s_{i,j}]_{n×n} collects the pairwise semantic similarities between all pairs of training images, and ‖·‖_F denotes the Frobenius norm of a matrix. It is straightforward to verify that
\[
A = K^{\dagger} S K^{\dagger}
\]
is an optimal solution to (3.2), where K^† stands for the pseudo-inverse of K. Note that when the semantic similarity matrix S is PSD, A is also PSD, so no additional projection is needed to enforce that the linear operator T̂ is PSD. To avoid overfitting, we replace K with K_r, the best rank-r approximation of K, and express A as
\[
A = K_r^{-1} S K_r^{-1} . \tag{3.3}
\]
Evidently, the rank r trades off bias against variance in estimating A: the larger the rank r, the lower the bias and the higher the variance. This will become clearer in the theoretical analysis in Section 3.4.

3.3.2 Extension to Image Feature Dimension Reduction

Using the learned linear operator T̂, the similarity between any two data instances x_a and x_b is given by
\[
\kappa(x_a, x_b) = \sum_{i,j=1}^{n} \kappa(x_a, x_i)\, \kappa(x_b, x_j)\, A_{i,j} = \varphi(x_a)^\top A\, \varphi(x_b) = \big( A^{1/2} \varphi(x_a) \big)^\top \big( A^{1/2} \varphi(x_b) \big),
\]
where φ(x): R^d → R^n is given by φ(x) = [κ(x, x_1), ..., κ(x, x_n)]ᵀ. Thus, the proposed RKML algorithm maps a vector of d dimensions into one with at most m dimensions, i.e., the length of the tag dictionary. Further justification of the dimension reduction details can be found in Section 3.6.1.
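The closed-form solution (3.3) is straightforward to realize numerically: truncate the eigendecomposition of K to rank r and sandwich S between the resulting pseudo-inverses. The sketch below uses illustrative toy data; the function name is ours.

```python
import numpy as np

def rkml_metric(K, S, r):
    """RKML closed form A = K_r^{-1} S K_r^{-1}, with K_r the best rank-r
    approximation of K.  A is PSD whenever S is PSD, so no PSD cone
    projection is needed."""
    vals, vecs = np.linalg.eigh((K + K.T) / 2.0)
    idx = np.argsort(vals)[::-1][:r]         # top-r eigenpairs of K
    V, lam = vecs[:, idx], vals[idx]
    Kr_inv = (V / lam) @ V.T                 # pseudo-inverse of the rank-r part
    return Kr_inv @ S @ Kr_inv

# Toy check with an RBF kernel and a PSD semantic similarity S = Y Y^T
rng = np.random.default_rng(2)
X = rng.normal(size=(10, 3))
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
Yl = rng.integers(0, 2, size=(10, 4)).astype(float)
S = Yl @ Yl.T
A = rkml_metric(K, S, r=5)
```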
3.4 Theoretical Guarantee of RKML

We will show that the linear operator learned by the proposed algorithm is stochastically consistent, i.e., the linear operator learned from finite samples provides a good approximation to the optimal one learned from an infinite number of samples. To simplify the analysis, we assume that the semantic similarity measure is s_{i,j} = y_iᵀ y_j.¹

[¹ We note that our analysis can easily be extended to the case where s_{i,j} = ŷ_iᵀ ŷ_j, with ŷ_i a deterministic transformation of y_i.]

Define the optimal linear operator T_* as the minimizer of the expected loss:
\[
\min_{T} \; \mathbb{E}_{(x_a, x_b, y_a, y_b)} \Big[ \big( y_a^\top y_b - \langle \kappa(x_a, \cdot), T[\kappa(x_b, \cdot)] \rangle_{\mathcal{H}_\kappa} \big)^2 \Big].
\]
Let T_*(r) be the best rank-r approximation of T_*, and let T̂ be the linear operator constructed from A given in (3.3). We will show that, under appropriate conditions, ‖T_* − T̂‖₂ is relatively small, where ‖·‖₂ denotes the spectral norm.

Let g_k(·) be the prediction function for the k-th class, i.e., y_{i,k} = g_k(x_i). We make the following assumption about g_k(·) in our analysis:

A1: g_k(·) ∈ H_κ, k = 1, ..., m.

Assumption A1 essentially assumes that it is possible to accurately learn the prediction function g_k(·) given a sufficiently large number of training examples. We also note that assumption A1 holds if g_k(·) is a smooth function and κ(·, ·) is a universal kernel [146]. The following theorem shows that under assumption A1, with high probability, the difference between T_* and T̂ is small provided n is sufficiently large. Here L and L_n denote the integral operator associated with κ and its empirical counterpart, respectively, and λ_1 ≥ λ_2 ≥ ··· are the eigenvalues of the kernel matrix K.

Theorem 3.1. Assume A1 holds, and κ(x, x) ≤ 1 for any x. Let r be a rank for which (λ_r − λ_{r+1})/n > ‖L − L_n‖₂, where ‖·‖₂ measures the spectral norm of a linear operator. Then, with probability 1 − δ, we have
\[
\max_{f \in \mathcal{H}_\kappa} \big\| (\hat{T} - T_*)[f] \big\|_{\mathcal{H}_\kappa} \le \varepsilon \, \big\| T_*[f] \big\|_{\mathcal{H}_\kappa},
\]
where ε is given by
\[
\varepsilon = \frac{2\|L - L_n\|_2}{(\lambda_r - \lambda_{r+1})/n - \|L - L_n\|_2}.
\]
The proof of Theorem 3.3 can be found in Section A.1.2.

Theorem 3.4. [175] Assume κ(x, x) ≤ 1. With probability 1 − δ, we have
\[
\|L - L_n\|_{HS} \le \frac{4\ln(1/\delta)}{\sqrt{n}}.
\]
Theorem 3.1 follows immediately from Theorems 3.4 and 3.3.
3.6 Implementation

Regarding implementation, we have two important issues to address: (1) how to appropriately measure the semantic similarity s_{i,j}, and (2) how to efficiently compute K_r, the best rank-r approximation of K, without computing the full kernel matrix K. The second issue is particularly important for applying the proposed algorithm to large datasets consisting of millions of annotated images. Below, we discuss these two issues separately.

3.6.1 Computing the Semantic Similarity s_{i,j}

The most straightforward approach is to measure the semantic similarity as s_{i,j} = y_iᵀ y_j. We improve upon this approach by incorporating the log-entropy weighting scheme [117], which has been used for document retrieval. It computes the weighted class assignment ỹ_{i,j} as
\[
\tilde{y}_{i,j} = \left( 1 + \sum_{k}^{n} \frac{p_{k,j} \log p_{k,j}}{\log n} \right) \cdot \log( y_{i,j} + 1 ), \tag{3.5}
\]
where p_{k,j} = y_{k,j} / Σ_i^n y_{i,j}. We apply Latent Semantic Analysis (LSA) [117] to further enhance the estimation of semantic similarity, which allows us to remove the noise and correlation in and between annotations. Let Ỹ = [ỹ_{i,j}]_{n×m} include the weighted class assignments for all the training images, and let Ŷ ∈ R^{n×m'} include the first m' singular vectors of Ỹ, with each of its rows L2-normalized to 1. This operation projects Ỹ onto a space of reduced dimensionality m', and this representation has been empirically shown to capture, to some degree, the semantic relationships across the annotation corpus [147]. We then compute the semantic similarity as S = Ŷ Ŷᵀ.

3.6.2 Efficiently Computing K_r by Random Projection

The proposed RKML algorithm requires computing the full kernel matrix K and its top r singular vectors. Since the cost of computing K is O(n²), it becomes expensive when the number of training instances n is large. We can improve the computational efficiency by exploiting the Nyström method [43] to approximate K_r. To this end, we randomly sample n_s instances from the training set and approximate K_r from the kernel similarities between all training images and the sampled subset, as summarized in Algorithm 1.

Given a test image x, let N_x = {x_{N_1}, ..., x_{N_k}} denote its k most similar training images, i.e., those satisfying S(x, x_{N_i}) ≥ S(x, x_j) for all i ∈ [1, k] and j ∈ [1, n] with x_j ≠ x_{N_i}. The relevance of keywords for x can then be estimated over N_x by either majority voting or weighted voting, i.e.,
\[
\hat{y} = \sum_{i=1}^{k} \langle \kappa(x, \cdot), \hat{T}[\kappa(x_{N_i}, \cdot)] \rangle \, y_{N_i} = k_x A K_N \tilde{Y}_N . \tag{3.7}
\]
The keywords with the t largest relevance scores are regarded as the annotation for the test image. Algorithm 1 summarizes the key steps of the image annotation algorithm using RKML.
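The log-entropy weighting (3.5) followed by the LSA projection can be sketched directly; the helper name and toy data below are ours, and the clipping constants are illustrative safeguards rather than part of the cited scheme.

```python
import numpy as np

def semantic_similarity(Y, m_prime):
    """Semantic similarity S = Yhat Yhat^T from binary tags Y (n x m):
    log-entropy weighting (Eq. 3.5), then projection onto the top m_prime
    left singular vectors with L2-normalized rows (LSA step)."""
    n = Y.shape[0]
    col = Y.sum(axis=0)
    p = Y / np.maximum(col, 1.0)                    # p_{k,j} = y_{k,j} / sum_i y_{i,j}
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    g = 1.0 + plogp.sum(axis=0) / np.log(n)         # global entropy weight per tag
    Yt = g[None, :] * np.log(Y + 1.0)               # weighted assignments (3.5)
    U, s, Vt = np.linalg.svd(Yt, full_matrices=False)
    Yh = U[:, :m_prime]                             # first m' singular vectors
    Yh = Yh / np.maximum(np.linalg.norm(Yh, axis=1, keepdims=True), 1e-12)
    return Yh @ Yh.T

rng = np.random.default_rng(3)
Y = (rng.random((12, 6)) < 0.4).astype(float)
Y[:, 0] = 1.0                                       # ensure every image has a tag
S = semantic_similarity(Y, m_prime=3)
```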
Algorithm 1: Automatic Image Annotation with RKML

Input:
• Training images X ∈ R^{n×d}, labels Y ∈ R^{n×m}
• Testing images X_q ∈ R^{n_q×d}
• Parameters: smoothing parameter ε and approximation rank r
1: Randomly sample r images x̂_1, ..., x̂_r from the training set
2: Compute the kernel matrices K_g = [κ(x_i, x̂_j)]_{n×r} and K̂ = [κ(x̂_i, x̂_j)]_{r×r}
3: Compute the singular value decomposition of K̂
4: Select the r largest singular values σ_i and their corresponding singular vectors u_i
5: Kernel metric: A = Σ_{i=1}^{r} u_i u_iᵀ / (σ_i² + ε)
6: Relevance score matrix: T_q = K_q A K_gᵀ Ỹ
7: Output: matrix of tag relevance scores T_q ∈ R^{n_q×m}

Figure 3.2 highlights the key components of the kernel metric learning based framework for image annotation. It first constructs a Reproducing Kernel Hilbert Space (RKHS) H based on either the whole training image set or a randomly sampled subset of it. It then maps the training images into H and learns a kernel distance metric A from the mapped images. Given a test image I, it first maps I into H and measures the similarities between I and all the training images in H using the learned distance metric A. Based on the occurrence of keywords in the subset of training images that share the largest similarities with I, it estimates a relevance score for each keyword and returns those with the largest scores as the predicted annotation tags.

Figure 3.2: The proposed kernel metric learning scheme, i.e., RKML, for automatic image annotation.

3.7 Experiments

3.7.1 Datasets and Experimental Setup

Three benchmark datasets for image annotation are used in our study; their statistics are summarized in Table 3.1. For both the ESP Game and IAPR TC12 datasets², a bag-of-words model based on densely sampled SIFT descriptors is used to represent the visual content.

[² The visual features and tags of both datasets were obtained from [67]: http://lear.inrialpes.fr/people/guillaumin/data.php.]
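The pipeline of Algorithm 1 can be sketched end-to-end on toy data. Note that steps 5–6 of the printed pseudocode are partially garbled in the source, so the regularized inverse and the final scoring product below follow our reading of them together with Eq. (3.7); treat the details as illustrative, not as the authors' exact implementation.

```python
import numpy as np

def rbf(X, Z, gamma=0.5):
    """RBF kernel between the rows of X and Z."""
    return np.exp(-gamma * ((X[:, None] - Z[None]) ** 2).sum(-1))

def rkml_annotate(X, Y, Xq, r=5, eps=1e-3, gamma=0.5):
    """Sketch of Algorithm 1: Nystrom-style approximation from r sampled
    landmark images, then tag relevance scores for the test images."""
    idx = np.random.default_rng(0).choice(len(X), size=r, replace=False)
    Kg = rbf(X, X[idx], gamma)           # n x r kernel to the landmarks (step 2)
    Kh = rbf(X[idx], X[idx], gamma)      # r x r landmark kernel (step 2)
    U, s, _ = np.linalg.svd(Kh)          # step 3-4
    A = (U / (s**2 + eps)) @ U.T         # regularized kernel metric (step 5)
    Kq = rbf(Xq, X[idx], gamma)          # test-to-landmark kernel
    return Kq @ A @ Kg.T @ Y             # relevance scores, n_q x m (step 6)

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 3))
Y = (rng.random((40, 6)) < 0.3).astype(float)
Xq = rng.normal(size=(5, 3))
T = rkml_annotate(X, Y, Xq, r=8)
```

For each test image, the t highest-scoring columns of T would then be returned as the predicted tags.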
The Flickr1M dataset [202] comprises more than one million images crawled from the Flickr website, annotated with more than 700,000 keywords. Since most keywords are associated with only a small number of images, we keep only the 1,000 most popular ones. We follow [200, 202] and represent each image with the following features: grid color moment, local binary pattern, Gabor wavelet texture, and edge direction histogram.

Table 3.1: Statistics of the datasets used in the experiments. The bottom two rows are given in the format mean/maximum.

                    ESP Game     IAPR TC12    Flickr1M
No. of images       20,768       19,627       999,764
Dimensionality      1,000        1,000        291
Vocabulary size     268          291          1,000
Tags per image      4.69/15      5.72/23      5.98/202
Images per tag      363/5,059    386/5,534    5,976/76,531

We randomly select 90% of the images from each dataset for training and use the remaining 10% for testing. Given a test image, we first identify the k most visually similar images from the training set using the learned distance metric, and then rank the tags by a majority vote over the k nearest neighbors, where k is chosen by cross-validation.

An RBF kernel is used in our study for all KML algorithms. In RKML we set n_s = 5,000 and m' = 0.38 m based on our experience, and determine the kernel width and the rank r by cross-validation. Parameters for the baselines are set to the default values suggested by the original authors. Besides, annotation based on the Euclidean distance, denoted by Euclid, is used as a reference in our comparison. Since most DMLs are developed against must-links and cannot-links, we apply the procedure described in [200] to generate the binary constraints by performing a probabilistic clustering of the images based on their tags. More details of this procedure can be found in [200].

We evaluate the annotation accuracy by the average precision of the top ranked image tags. Following [201, 202], we first compute the precision for each test image by comparing the top 10 annotated tags with the ground truth, and then take the average over the test set. Average recall and F1 score are reported in the supplementary document. Computational efficiency is measured by running time³. Both the mean and standard deviation of the evaluation metrics over 20 experimental trials are reported in this paper.
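The precision@t protocol just described (precision of the top 10 tags per image, averaged over the test set) can be written in a few lines; the helper name and toy scores below are ours.

```python
import numpy as np

def avg_precision_at_t(scores, Y_true, t=10):
    """Mean precision@t: for each test image, take the t highest-scoring
    tags and measure the fraction present in the ground truth, then
    average over the test set."""
    top = np.argsort(-scores, axis=1)[:, :t]          # indices of top-t tags
    hits = np.take_along_axis(Y_true, top, axis=1)    # 1 where a top tag is correct
    return hits.mean(axis=1).mean()

# Toy example: 2 test images, 4 tags, evaluated at t = 2
scores = np.array([[0.9, 0.1, 0.8, 0.2],
                   [0.2, 0.7, 0.1, 0.6]])
Y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0]])
p2 = avg_precision_at_t(scores, Y_true, t=2)  # image 1: 2/2, image 2: 1/2
```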
[³ All the codes are downloaded from the authors' websites and run in Matlab on a machine with an AMD 2-core @ 2.7 GHz CPU and 64 GB RAM.]

3.7.2 Comparison with State-of-the-art Distance Metric Learning (DML) and Image Annotation Algorithms

3.7.2.1 Comparison to nonlinear DML algorithms

We first compare the proposed RKML⁴ algorithm to six state-of-the-art kernel distance metric learning methods: (1) Kernel PCA (KPCA) [169], (2) Generalized discriminant analysis (GDA) [6], (3) Kernel discriminative component analysis (KDCA) [78], (4) Kernel local Fisher discriminant analysis (KLFDA) [182], (5) Kernel information theoretic metric learning (KITML) [39], and (6) Metric learning for kernel regression (MLKR) [199]. We also include three boosting DML algorithms, i.e., DistanceBoost (DBoost) [73], KernelBoost (KBoost) [74], and metric learning with boosting (BoostM) [172], for comparison.

[⁴ Unless otherwise noted, RKML stands for the proposed RKML algorithm with Nyström approximation. Its source code can be found at http://www.cse.msu.edu/~fengzhey/research/rkml.html.]

Figure 3.3: Average precision for the top t annotated tags using nonlinear distance metrics. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.
Figure 3.4: Average recall for the top t annotated tags using nonlinear distance metrics. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.
Figure 3.5: Average F1 score for the top t annotated tags using nonlinear distance metrics. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.

Figures 3.3, 3.4 and 3.5 show the average precision, average recall and average F1 score, respectively, of the top t annotated tags obtained by the nonlinear distance metric learning (DML) baselines and the proposed RKML. Surprisingly, we observe that most of the nonlinear DML algorithms are only able to yield performance similar to that based on the Euclidean distance and, more disturbingly, some of them even perform significantly worse than the Euclidean distance. On the other hand, the proposed algorithm performs significantly better than the Euclidean distance in almost all cases. An analysis of this phenomenon is provided in Section 3.7.3.3.
3.7.2.2 Comparison to linear DML algorithms

We compare our RKML to seven state-of-the-art linear distance metric learning algorithms, including Relevant component analysis (RCA) [3], Discriminative component analysis (DCA) [78], Large margin nearest neighbor classifier (LMNN) [196], Local Fisher discriminant analysis (LFDA) [182], Information theoretic metric learning (ITML) [39], Probabilistic RCA (pRCA) [200], and Logistic discriminant-based metric learning (LDML) [68].

Figure 3.6: Average precision for the top t annotated tags using linear distance metrics. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.
Figure 3.7: Average recall for the top t annotated tags using linear distance metrics. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.
Figure 3.8: Average F1 score for the top t annotated tags using linear distance metrics. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.

Figures 3.6, 3.7 and 3.8 show the average annotation precision, average recall and average F1 score, respectively, for the linear distance metric learning (DML) baselines. Similarly to KML, we observe that even the best linear DML algorithm is only slightly better than the Euclidean distance, while RKML significantly outperforms all linear DML baselines. Again, we believe that the failure of linear DML is likely due to the binary constraints generated from the image annotations, as explained in Section 3.7.3.3.

3.7.2.3 Comparison with State-of-the-art Image Annotation Methods

Additionally, we compare the RKML algorithm to several state-of-the-art image annotation models, including: (1) two versions of the TagProp method [67], using either rank-based weights (TP-R) or distance-based weights (TP-D); (2) Tag Relevance (tRel) [123], based on the idea of neighbor voting; and (3) 1-vs-1 SVM classification, using either linear (SVML) or RBF kernel (SVMK) classifiers⁵. We include Pop as a comparison reference, which simply ranks tags by their frequency of occurrence in the training set.

Figure 3.9: Annotation performance in terms of AP@t with different annotation models. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.
Figures 3.9, 3.10 and 3.11 compare the average precision, average recall and average F1 score obtained by the different image annotation models, respectively. It is not surprising to observe that most annotation methods significantly outperform Pop, while the proposed RKML method outperforms all the state-of-the-art image annotation methods on the IAPR TC12 and ESP Game datasets, and performs only slightly worse than TP-D on the Flickr1M dataset.

[⁵ SVM methods were unable to run on Flickr1M due to its large size and high computational cost, so the results of SVM methods are excluded.]

Figure 3.10: Average recall for the top t annotated tags using different annotation models. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.
Figure 3.11: Average F1 score for the top t annotated tags using different annotation models. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.

3.7.2.4 Comparison of Annotation Results on Exemplar Images

To compare the empirical annotation performance of the various linear and kernel distance metric learning algorithms as well as the image annotation approaches in a more direct way, Table 3.2 shows the annotations of exemplar images produced by the different DML and image annotation algorithms.
Table 3.2: Annotations of exemplar images by the different DML and image annotation algorithms. Columns: Ground truth, Euclid, DCA, LMNN, LDML, DBoost, BoostM, KBoost, KLDA, KPCA, KDCA, KLFDA, MLKR, TP-R, TP-D, RKML.
womantable front curtaintabletable bed cloudtabletable front doorman window room curtainfront man window window window window curtainwall curtainwood man man room curtainwallwoman window room bed tablewood wood tablerock window bedcoverwoman tablefront wood window bed bed bed woman wood curtaincurtainwood front wood bedside window house window tablewooddoor doorbuilding curtainlamplampdoor front housedoor curtainbed room bed front buildingskytree tree skyskyskyskyskytree skyskyskyskyskyskycloud cloudman road front cloudtree tree cloudman tree mountain front cloudtree tree front front carfront cloudseamountain buildingtree wallcyclist tree tree front cloudcloudhilltree cyclistman tree man hillsea mountain skyfront desert cloudtree man buildingmeadowman cyclingmountain road landscapetouristbeach beachwomanmangrey road carfront meadow monument road short skyman meadow front house man front mountain hillman parkmountain hillskymountain buildingcarmountain beachhouse front househousepeoplelandscapemountain man road mountain treecar skycloudpeople tree landscapecity meadow mountain road snow hillshophouse front Table3.2:Examplesofannotationresultsgeneratedby14baselinesandtheproposedRKML.Theannotatedtagsareranked basedontheestimatedrelevancescoreindescendingorder,andthecorrectonesarehighlightedinblueboldfont.Notethe groundtruthannotationsinthe2-ndcolumndonotalwaysincludeallrelevanttags( e.g.,ÒpeopleÓforthe5-thimage),and sometimescontainpolysemes( e.g.,ÒpalmÓforthe4-thand5-thimages)andcontroversialtags( e.g.,ÒfrontÓ). 3.7.2.5E "ciencyEvaluation TIMEDCALMNNITMLLDMLDBoostBoostM RKML IAPRTC12 1.5e41.4e44.2e44.2e51.7e41.1e6 4.6e2 ESPGame 2.3e41.7e45.8e45.5e54.3e41.2e6 1.3e3 Flickr1M 8.1e46.0e43.0e45.2e51.2e43.2e5 3.4e3 TIMEKPCAGDAKDCAKLFDAKITMLMLKR RKML IAPRTC12 2.8e44.8e42.2e48.8e45.3e42.2e3 4.6e2 ESPGame 3.3e45.4e43.7e43.2e56.8e43.5e4 1.3e3 Flickr1M 7.3e33.3e41.3e51.0e53.7e67.9e3 3.4e3 Table3.3:Comparisonofrunningtime(s)forseveraldi !erentmetriclearningalgorithms. 
TIME       TP-R    TP-D    tRel    SVML    SVMK     RKML
IAPRTC12   9.1e2   4.6e2   1.0e1   2.5e3   4.0e5    4.8e2
ESPGame    2.7e2   1.5e2   1.5e1   1.6e2   8.9e4    1.3e3
Flickr1M   1.6e5   9.9e4   5.7e3   -       -        3.4e3

Table 3.4: Running time (s) for image annotation. The SVM methods are not included on Flickr1M due to their high computational cost.

Table 3.3 summarizes the running time of different DML algorithms. We observe that RKML is significantly more efficient than any DML baseline. Table 3.4 compares the efficiency of the different baselines for annotation, where the running time includes the time for both learning a distance metric and predicting image tags. We observe that, compared to the other annotation methods, the proposed RKML algorithm is particularly efficient for large datasets (i.e., Flickr1M), making it suitable for large-scale image annotation.

3.7.3 Effects of Different Experimental and Parameter Setups

3.7.3.1 Sensitivity to Parameters

In this section, we analyze the sensitivity of RKML to its parameters, including the rank r; m', the number of retained eigenvectors used when estimating the semantic similarity; and n_s, the number of sampled images used for the Nyström approximation.

Figure 3.12: Average Precision for the first tag predicted by RKML using different values of the rank r, on (a) IAPRTC12 and (b) ESPGame. To make the overfitting effect clearer, we turn off the Nyström approximation for the IAPRTC12 and ESPGame datasets. The Flickr1M dataset is not included due to its large size (n = 999,764): overfitting only occurs when r approaches the total number of images, and it is infeasible to apply such a large r on Flickr1M.

We examine the role of the rank r in the proposed algorithm by evaluating the prediction accuracy with varied r on the IAPRTC12 and ESPGame datasets, for both training and testing images. To isolate the effect, we turn off the Nyström approximation used by RKML in this experiment. We observe in Figure 3.12 that, while the average accuracy on test images initially improves significantly with increasing rank r, it saturates after a certain rank. On the other hand, the prediction accuracy on the training data increases almost linearly with the rank, and approaches 1 for very large r, a clear indication of overfitting the training data.
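The rank-capped behavior described above can be illustrated with a generic sketch. The snippet below is not the RKML estimator itself; it is a minimal, hypothetical illustration of how truncating a kernel matrix to its top-r eigenpairs caps model capacity, so that training predictions interpolate the targets as r approaches n.

```python
import numpy as np

def rank_truncated_fit(K, Y, r):
    """Regression through a rank-r truncation of a kernel matrix K.

    Keeps only the top-r eigenpairs of the symmetric PSD matrix K when
    fitting targets Y. As r approaches n, the training predictions
    interpolate Y exactly, mirroring the overfitting effect in
    Figure 3.12. (Illustrative sketch only, not the RKML estimator.)
    """
    w, V = np.linalg.eigh(K)
    idx = np.argsort(w)[::-1][:r]               # indices of the top-r eigenpairs
    Vr, wr = V[:, idx], w[idx]
    coef = Vr @ np.diag(1.0 / wr) @ Vr.T @ Y    # pseudo-inverse restricted to the span
    return K @ coef                             # fitted training targets
```

With r = n and a full-rank kernel, the fit reproduces the training targets exactly; with small r, a substantial residual remains.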
We also examine the sensitivity of the other parameters used by the proposed RKML algorithm, namely m', the number of retained eigenvectors of Ỹ, and n_s, the number of sampled images used for the Nyström approximation. Figures 3.13 and 3.14 show the image annotation performance for varied m' and n_s, respectively. Overall, we find that our algorithm is insensitive to the values of these parameters over a wide range, which facilitates their selection in real-world applications.

Figure 3.13: Average Precision for the top t tags predicted by RKML using different values of m', the number of retained eigenvectors when estimating the semantic similarity, on (a) IAPRTC12, (b) ESPGame and (c) Flickr1M.

Figure 3.14: Average Precision for the top t tags predicted by RKML using different values of n_s, the number of sampled images used for the Nyström approximation, on (a) IAPRTC12, (b) ESPGame and (c) Flickr1M. In (c), n_s cannot be set too large due to the dataset size.

3.7.3.2 Advantages of the Kernel Trick and the Nyström Approximation

Since none of the baseline algorithms, linear or nonlinear DML alike, is able to significantly outperform the Euclidean distance, it remains unclear whether kernel DML has an advantage over linear DML. To examine this point, we implement the linear version of RKML, denoted by RLML. Tables 3.5, 3.6 and 3.7 compare the performance of RLML and RKML on the three datasets. RKML clearly and significantly outperforms its linear counterpart RLML, verifying the advantage of the kernel trick in distance metric learning.

AP@t (%)   t=1      t=2      t=3      t=4      t=5      t=6      t=8      t=10
RKML       55±1.2   48±0.9   44±0.6   41±0.8   37±0.6   35±0.5   31±0.5   28±0.4
RLML       52±1.3   46±1.2   42±1.0   38±0.8   35±0.7   33±0.6   29±0.5   26±0.4
RKMLO      57±0.9   51±0.6   46±0.7   43±0.6   39±0.6   37±0.5   32±0.5   29±0.4
RKMLH      49±1.1   44±0.9   39±0.9   36±0.7   33±0.7   31±0.7   27±0.6   24±0.5

Table 3.5: Comparison of various extensions of RKML in terms of AP@t on the IAPRTC12 dataset. RLML is the linear version of RKML, RKMLO is the original version without the Nyström approximation, and RKMLH runs RKML using binary constraints.

However, although the kernel trick considerably improves the image annotation accuracy,
AP@t (%)   t=1      t=2      t=3      t=4      t=5      t=6      t=8      t=10
RKML       40±1.1   35±0.5   32±0.4   29±0.5   27±0.4   25±0.4   22±0.4   20±0.4
RLML       36±0.8   31±0.7   28±0.7   26±0.7   24±0.5   22±0.4   20±0.4   18±0.4
RKMLO      44±0.8   39±0.6   35±0.5   32±0.4   29±0.4   27±0.4   24±0.3   21±0.3
RKMLH      34±1.0   30±0.5   28±0.5   26±0.4   24±0.4   22±0.3   20±0.3   18±0.3

Table 3.6: Comparison of the extensions of RKML in terms of AP@t on the ESPGame dataset.

AP@t (%)   t=1      t=2      t=3      t=4      t=5       t=6       t=8        t=10
RKML       24±0.1   21±0.2   18±0.1   17±0.2   15±0.2    14±0.1    13±0.2     12±0.1
RLML       13±0.3   12±0.2   11±0.2   11±0.1   10±0.06   10±0.05   9.0±0.05   8.0±0.08
RKMLH      20±0.2   18±0.1   16±0.2   15±0.2   14±0.2    13±0.1    11±0.1     10±0.1

Table 3.7: Comparison of the extensions of RKML in terms of AP@t on the Flickr1M dataset. RKMLO is excluded since the dataset is too large to perform the computation on the full kernel.

it also inevitably leads to a high, even prohibitive, computational cost. The Nyström approximation is therefore introduced to address this problem, trading off computational cost against annotation accuracy. To verify the effectiveness of the Nyström approximation, we run RKML with the approximation turned off, performing all computation directly on the full kernel; this method is denoted by RKMLO. Tables 3.5, 3.6 and 3.7 compare the annotation performance of RKML and RKMLO, where we observe that RKML performs only slightly worse than RKMLO. This indicates that RKML strikes a good compromise between effectiveness and computational cost, making a tolerable sacrifice in annotation effectiveness to escape the heavy computational burden.
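The trade-off above rests on the standard Nyström method: approximate the full n×n kernel from n_s sampled landmark columns. Below is a minimal sketch for an RBF kernel (function and parameter names are ours, not those of the implementation used in the experiments):

```python
import numpy as np

def nystrom_kernel(X, n_s, gamma=1.0, seed=0):
    """Standard Nystrom approximation of an RBF kernel matrix.

    Returns a feature map F such that F @ F.T approximates the full
    n x n kernel, using only n_s sampled landmark points.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=n_s, replace=False)   # sampled landmark images
    L = X[idx]

    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    C = rbf(X, L)            # n x n_s cross-kernel block
    Kss = rbf(L, L)          # n_s x n_s landmark kernel block
    # Pseudo-inverse square root of the landmark block
    w, V = np.linalg.eigh(Kss)
    w = np.maximum(w, 1e-12)
    Kss_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return C @ Kss_inv_sqrt  # n x n_s feature map
```

The cost drops from O(n²) kernel evaluations to O(n·n_s); with n_s = n the approximation recovers the full kernel exactly, which is why RKMLO serves as the accuracy ceiling in Tables 3.5-3.7.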
3.7.3.3 Analysis of Binary Constraints and Their Generation Methods

We observe in Section 3.7.2 that most baseline metric learning algorithms, whether linear or kernel-based, perform worse than the Euclidean distance. We attribute this failure mostly to the binary constraints. As described before, all baseline distance metric learning algorithms require converting image annotations into binary constraints for image annotation tasks, which does not make full use of the annotation information. To verify this point, we run RKML with the similarity measure s_{i,j} computed from the binary constraints generated for the baseline distance metric learning algorithms, and denote this method by RKMLH. We observe in Tables 3.5, 3.6 and 3.7 that RKMLH performs significantly worse than RKML, which directly uses the real-valued similarity measures, confirming the importance of real-valued similarities for distance metric learning in automatic image annotation.

AP@t (%)   t=1        t=4        t=7         t=10
Method 1   20.7±0.2   15.3±0.2   12.4±0.12   10.6±0.10
Method 2   20.6±0.3   15.2±0.2   12.4±0.11   10.6±0.09
Method 3   20.8±0.2   15.4±0.1   12.5±0.05   10.7±0.04
Method 4   19.7±0.2   14.6±0.1   11.9±0.06   10.2±0.06
Method 5   21.3±0.4   15.9±0.3   12.8±0.20   11.0±0.14

Table 3.8: Comparison of different methods of generating binary constraints, applied in the baseline distance metric learning algorithm LMNN, for the top t annotated tags on the Flickr1M dataset. Method 1 clusters the space of keywords; method 2 uses the class assignments as binary constraints; method 3 clusters the space of keywords using hierarchical clustering algorithms; method 4 clusters the space of keywords together with the visual features; and method 5 considers images sharing more than 4 keywords as similar and images sharing no keyword as dissimilar.
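Method 5 of Table 3.8 can be made concrete with a short sketch (names and the threshold parameter are hypothetical illustrations): images are paired as similar when they share more than a threshold number of keywords, and as dissimilar when they share none.

```python
def keyword_pairs(tag_sets, sim_thresh=4):
    """Generate (similar, dissimilar) image pairs from tag overlap.

    Mirrors "method 5" in Table 3.8: images sharing more than
    `sim_thresh` keywords are similar; images sharing no keyword are
    dissimilar. `tag_sets` is a list of keyword sets, one per image.
    """
    similar, dissimilar = [], []
    n = len(tag_sets)
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(tag_sets[i] & tag_sets[j])
            if shared > sim_thresh:
                similar.append((i, j))
            elif shared == 0:
                dissimilar.append((i, j))
    return similar, dissimilar
```

Note that pairs with an intermediate overlap are left unconstrained, which is why such pairwise constraints are usable in LMNN but not in group-based methods such as RCA or DCA.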
Since most DML algorithms were designed for binary constraints, we tried to improve the performance of the standard DML algorithms by experimenting with different methods of generating binary constraints, listed as follows: (1) clustering the space of keywords; (2) generating binary constraints from classification labels⁶; (3) clustering the space of keywords using hierarchical clustering algorithms; (4) clustering the space of keywords together with the visual features; and (5) generating binary constraints based on the number of common keywords, i.e., images sharing more than 4 keywords are considered similar and images sharing no keywords are considered dissimilar. Note that the last method is applicable to LMNN, but not to many other DML algorithms. For example, RCA [3] and DCA [78] divide the image set into groups, where images within a group are considered similar and images from different groups are considered dissimilar; method (5) is not able to generate such groups. We observe that these methods yield essentially the same performance as reported in our study, as shown in Table 3.8.

⁶ The Flickr1M dataset also includes class assignment labels, which are usually used for classification; ESPGame and IAPRTC12 do not have classification labels.

3.7.3.4 Comparison of the Design Choices of the Semantic Similarity Measure

To obtain the numeric constraints on the annotated tags, we explore other weighting schemes besides log-entropy, and, besides clustering with a topic model, we also experiment with other binary constraint generation methods.

Binary                l_{i,j} = 1 if tag j exists in image i, and 0 otherwise.
Term Frequency (TF)   l_{i,j} = tf_{i,j}, the occurrence count of tag j in image i.
Log                   l_{i,j} = log(tf_{i,j} + 1)

Table 3.9: Local weighting functions.

We examine the choice of semantic similarity by evaluating the prediction accuracy with varied definitions of ỹ_{i,j} in Equation (5). ỹ_{i,j} is the product of a local tag weight l_{i,j}, which describes the relative occurrence of tag j in image i, and a global weight g_j.

Binary    g_j = 1
Normal    g_j = 1 / √(Σ_{i=1}^n tf_{i,j}²)
Idf       g_j = log₂( n / (1 + df_j) )
Entropy   g_j = 1 + Σ_{i=1}^n (p_{i,j} log p_{i,j}) / log n,  where p_{i,j} = tf_{i,j} / Σ_{i=1}^n tf_{i,j}

Table 3.10: Global weighting functions.
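The Log local weight and the Entropy global weight combine as sketched below; this is a direct, unoptimized transcription of the formulas in Tables 3.9 and 3.10 (the function name and the zero-column guard are our additions):

```python
import numpy as np

def log_entropy_weights(tf):
    """Log-entropy weighting of a tag-frequency matrix (Tables 3.9/3.10).

    tf[i, j] is the occurrence count of tag j in image i. Returns
    y[i, j] = l[i, j] * g[j], combining the Log local weight with the
    Entropy global weight.
    """
    tf = np.asarray(tf, dtype=float)
    n = tf.shape[0]
    local = np.log(tf + 1.0)                      # l_ij = log(tf_ij + 1)
    col = tf.sum(axis=0)
    col[col == 0] = 1.0                           # guard against empty tag columns
    p = tf / col                                  # p_ij = tf_ij / sum_i tf_ij
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    g = 1.0 + plogp.sum(axis=0) / np.log(n)       # entropy global weight
    return local * g
```

The entropy weight down-weights tags spread uniformly across images (g approaches 0) and keeps full weight for tags concentrated in few images (g approaches 1).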
The global weight g_j describes the relative occurrence of tag j within the entire tag collection. The examined weighting functions [10] are defined in Tables 3.9 and 3.10.

AP@t (%)        t=1       t=4       t=7       t=10
Binary-Binary   56±1.01   41±0.57   33±0.49   28±0.45
Binary-Normal   53±1.28   39±0.62   32±0.54   28±0.44
Cosine          56±1.19   41±0.61   33±0.52   28±0.47
TF-IDF          55±1.12   41±0.57   33±0.50   28±0.44
Log-IDF         55±1.12   41±0.57   33±0.50   28±0.44
Log-Entropy     55±1.10   41±0.57   33±0.49   28±0.45

Table 3.11: Comparison of extensions of RKML with different design choices of semantic similarity, for the top t annotated tags on the IAPRTC12 dataset. The leftmost column lists the weighting methods, where the name before "-" denotes the local weight shown in Table 3.9 and the name after "-" denotes the global weight shown in Table 3.10. "Cosine" represents the cosine similarity between the tag vectors of two images.

Table 3.11 shows that the different semantic similarity measures, whether TF-IDF based weighting or the popular cosine similarity, deliver essentially similar performance. We hence adopt the Log-Entropy weighting scheme in our experiments.

3.8 Summary

In this chapter, we proposed RKML, a robust and efficient algorithm for kernel metric learning. The proposed method addresses (i) the high computational cost, by avoiding the projection onto the PSD cone; (ii) the limitation of binary constraints on tags, by adopting a real-valued similarity measure; and (iii) the overfitting problem, by appropriately regularizing the rank of the learned kernel metric. Experiments on large-scale image annotation demonstrate the effectiveness and efficiency of the proposed RKML algorithm in comparison with state-of-the-art approaches for distance metric learning and image annotation. In the future, we plan to improve the annotation performance by developing a more robust semantic similarity measure.
Chapter 4

Image Tag Matrix Completion by Noisy Matrix Recovery

In this chapter, we propose an Image Tag Completion by Noisy Matrix Recovery (TCMR) algorithm, which is able to simultaneously recover missing tags and remove or down-weight noisy tags that are irrelevant to the visual content of images. Although this algorithm is designed for image tag completion, it is not exclusive to that task: it also works well for related image tagging tasks, including image tag refinement and image tag re-ranking.

The rest of the chapter is arranged as follows. Section 4.1 motivates the problem, states the main intuition behind the proposed algorithm, and sets up the notation. Section 4.2 reviews the closely related work. Section 4.3 gives a detailed description of noisy matrix recovery and extends it to the proposed TCMR algorithm. The theoretical properties and guarantees, i.e., the bounds on the error between the recovered tag matrix and its statistically optimal counterpart, are given in Section 4.4, with the omitted proofs deferred to Section 4.5. Section 4.6 presents the implementation details and describes the proposed framework, which incorporates image content consistency into the matrix completion based topic model through a graph Laplacian. Section 4.7 describes the experimental setup, results and analysis. Section 4.8 concludes the chapter with future directions.

4.1 Motivation and Setup

Different semantic tags clearly have different significance in describing a topic determined by the image contents, and the tags associated with the same topic usually depend strongly on each other; this dependency can be exploited to improve annotation and tag completion performance.

The proposed TCMR algorithm addresses the incomplete and noisy tag problems by efficiently recovering the missing tags and removing or down-weighting the noisy tags simultaneously. The inspiration and underlying concept behind TCMR is the connection between the following two assumptions.
• Idea of the Language Model. Since the tags are generated from the user's description of an image, each tag vector can be viewed as a mixture of topics, and each topic follows a multinomial distribution over the vocabulary [12, 108, 206]. Note that the number of observed tags for each image is limited, while the number of parameters of the multinomial distribution to be estimated is significantly larger than the number of observed tags.

• Low Rank Matrix Recovery. The observed tags of any image can be assumed to be sampled from a mixture of a small number of multinomial distributions, which is equivalent to requiring the recovered tag matrix to be of low rank.

Connecting these two assumptions, the proposed TCMR algorithm enforces the recovered matrix to be low rank. Through an appropriate nuclear norm regularizer, it effectively captures the interactions among the different pieces of tag information, both among tag keywords (column-wise) and among the tag vectors of different images (row-wise), which turns out to be the key to filling in missing tags and down-weighting noisy ones [19, 136, 184].

Unlike in most matrix completion problems, where the observed matrix entries are sampled uniformly at random from a given matrix [19, 20], each tag entry in our setting is sampled from an unknown multinomial distribution, which makes the standard least squares loss and absolute loss inappropriate. Hence, a maximum likelihood estimator is used in this work to ensure that the learned tag probability matrix is consistent with the observed tags; this strategy also adds complexity to both the optimization and the analysis.

Although low rank matrix recovery is closely related to topic models, which have been applied to many image tag related problems [108, 206, 221], most existing topic models [11] require solving a non-convex optimization problem, which may fail to find the global optimum and remains a big challenge in matrix completion [20, 21]. To address this limitation, TCMR solves a convex optimization problem, which guarantees efficient convergence to an optimal solution.
Moreover, theoretical support is provided to show that, under favorable conditions, TCMR is guaranteed to recover most of the missing tags even when the user-provided tags are noisy; this guarantee is novel among the existing studies on tag completion [126, 134, 201, 226]. Additionally, TCMR improves the tagging performance by exploiting the dependencies between image features and tags via a graph Laplacian [224, 226], which reduces the impact of incomplete and noisy tags by assigning high weights to tags that are consistent with the visual contents of the images and low weights to those that are not, particularly in extreme cases. Furthermore, the empirical evaluation on tag re-ranking and tag refinement tasks demonstrates that TCMR is generally applicable and effective for other image tagging tasks.

Figure 4.1 highlights the key components of the proposed image tag completion algorithm by noisy matrix recovery. On the one hand, it refines the tag confidence scores by enforcing the tag matrix to be low rank. This strategy takes advantage of both the row-wise and column-wise interactions within the tag matrix, i.e., the dependencies among tag words and the correlations of tag information between images. On the other hand, a graph Laplacian is constructed based on the visual features of the images, where each node represents an image and each edge is weighted based on the distance between its connected images. The tag vector of an image is then modified based on the weighted majority voting results among its connected neighbors. These two components join together in a convex optimization framework and jointly refine the tag confidence score matrix.

Figure 4.1: The scheme of the proposed noisy tag matrix recovery framework, i.e., TCMR, for image tag completion. The low rank matrix recovery component in the upper right box exploits the tag-tag correlation, and the graph Laplacian component in the bottom left takes into account the tag-content correlation.
4.2 Related Work

Only a few studies fit the category of image tag completion with both incomplete and noisy tags. [226] proposes a data-driven framework for tag ranking that optimizes the correlation between visual cues and assigned tags. [129] removes noisy tags based on visual and semantic similarities, and expands the observed tags with their synonyms and hypernyms using WordNet. [201] proposes to search for the optimal tag matrix that is consistent with both the observed tags and the visual similarity. [134] formulates tag completion as a non-negative data factorization problem. [126] exploits sparse learning techniques to reconstruct the tag matrix. None of these studies provides a theoretical guarantee for its approach. Matrix decomposition is adopted in [15, 149, 224] to handle both missing and noisy tags; the key limitation of these approaches is that they require a fully observed matrix with a small number of errors, which makes them inappropriate for tag completion.

Low rank matrix recovery has been applied in many applications [19, 149], including visual recovery [136, 149], multi-label classification [15], and tag refinement [224]. Since the matrix rank function is non-convex, a popular approach is to replace it with the nuclear norm, the tightest convex relaxation of the matrix rank [19, 20, 224]. With nuclear norm regularization, it is possible to accurately recover a low rank matrix from a small fraction of its entries [20], even if they are corrupted with noise [19, 57]. Various algorithms [57, 94, 149, 224] have been developed to solve the related optimization problem. Instead of the ℓ₁-norm loss [57, 224], the squared loss [184], or the max-margin factorization model [136] used in most studies on matrix completion/recovery, a maximum likelihood estimator is used in our work to recover the underlying tag matrix.

4.3 Tag Completion by Noisy Matrix Recovery (TCMR)

In this section, we describe a noisy matrix recovery framework for tag completion. Before presenting our algorithm and analysis, we first introduce the notation used throughout this chapter. We use Q_{*,i} to denote the i-th column of a matrix Q, and ‖Q‖_F, ‖Q‖_tr and ‖Q‖ to denote the Frobenius norm, the nuclear (trace) norm and the spectral norm of Q, respectively.
We use ‖Q‖₁ to denote the ℓ₁ norm of a matrix Q, i.e., ‖Q‖₁ = Σ_{i,j} |Q_{i,j}|, and ‖v‖_∞ to denote the infinity norm of a vector v, i.e., ‖v‖_∞ = max_i |v_i|. We also use e_i ∈ {0,1}^n to denote the i-th canonical basis vector of R^n, and 1 ∈ R^m to denote the m-dimensional vector with all entries equal to 1.

To begin, let m be the number of unique tags, and let D = {d₁, ..., dₙ} be a collection of n tagged images, where d_i = (d_{i,1}, ..., d_{i,m}) is the tag vector of the i-th image, with d_{i,j} = 1 when tag j is assigned to the image and zero otherwise. For simplicity of analysis, we assume in this study that all images have the same number of assigned tags, denoted by m₀.¹ When different numbers of tags are observed, a simple weighting technique [150] can handle the variation in the number of tags.

Our development is based on the simple observation that the essential goal of a topic model is to approximate an observed tag probability matrix by a low rank matrix (or, more precisely, by the product of two low rank matrices). It is this observation that motivates us to connect topic (probabilistic) models with low rank matrix completion.

¹ This assumption is only for the convenience of the analysis, and does not affect the algorithm.

4.3.1 Noisy Matrix Recovery

Following the idea of language models [11, 12, 225], we assume that all the observed tags of each image are drawn independently from a fixed but unknown multinomial distribution. Let p_i = (p_{i,1}, ..., p_{i,m}) be the multinomial distribution used to generate the tags in d_i. We use P = (p₁, ..., pₙ)^⊤ ∈ R^{n×m} to represent the multinomial distributions of all the images, with one row per image. Our goal is to accurately recover the multinomial distributions P from the limited number of tags observed in D. In general, this is impossible, since the number of parameters to be estimated is significantly larger than the number of observed tags. To address this challenge, we follow the key assumption behind most topic models [184, 224], namely that the tags of any image are sampled from a mixture of a small number of multinomial distributions. A direct implication of this assumption is that the matrix P has to be of low rank, the foundation of the theory of low rank matrix recovery [20].
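The generative assumption can be illustrated with a toy simulation (all sizes and names here are arbitrary illustrations, not the datasets used in the experiments): mixing r "topic" multinomials yields a row-stochastic matrix P of rank at most r, and each observed tag vector counts m₀ draws from its row.

```python
import numpy as np

def sample_tag_matrix(n=100, m=50, r=5, m0=4, seed=0):
    """Simulate the generative model of Section 4.3.1 (toy illustration).

    Each image's tag distribution p_i is a mixture of r topic
    multinomials, so the row-stacked matrix P has rank at most r.
    Each observed tag vector d_i counts m0 i.i.d. draws from p_i.
    """
    rng = np.random.default_rng(seed)
    topics = rng.dirichlet(np.ones(m), size=r)     # r x m topic distributions
    weights = rng.dirichlet(np.ones(r), size=n)    # n x r mixture weights
    P = weights @ topics                           # n x m, rank <= r, rows sum to 1
    D = np.stack([rng.multinomial(m0, P[i]) for i in range(n)])
    return P, D
```

With n·m parameters in P but only n·m₀ observed draws, recovery is hopeless without the low rank structure, which is exactly the point of the assumption above.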
The proposed approach combines the idea of maximum likelihood estimation, a common approach in topic models, with the theory of low rank matrix recovery. It aims to recover the multinomial probability matrix P by solving the following optimization problem:

min_{Q∈Δ} L(Q) := −Σ_{i=1}^n Σ_{j=1}^m (d_{i,j}/m₀) log Q_{i,j} + λ rank(Q),   (4.1)

where the first term is denoted by E₁ and the second by E₂, λ is a regularization parameter, and the domain Δ is defined as

Δ = { Q ∈ (0,1)^{n×m} : Σ_{j=1}^m Q_{i,j} = 1, i ∈ [1,n] }.   (4.2)

We denote by Q̂ the optimal solution to (4.1). The term E₁ in (4.1) ensures that the learned probability matrix Q̂ is consistent with the observed tag matrix, and the term E₂ ensures that Q̂ is of low rank, which encodes the assumption that all tags of an image are sampled from a mixture of a small number of multinomial distributions.

We note that, unlike in standard matrix completion theory [20, 61], where the observed entries are sampled uniformly at random from a given matrix, in our model, as in other topic models, each observed tag is sampled from an unknown multinomial distribution. This difference makes the common least squares loss and absolute loss inappropriate.

However, the matrix rank function in (4.1) is non-convex and non-differentiable, which poses a problem for the optimization procedure. We therefore replace the rank function with the nuclear norm, the tightest convex envelope of the matrix rank function [19, 20, 21]. The nuclear norm of a matrix Q is defined as the sum of the singular values of Q. With nuclear norm regularization, it is possible to accurately recover a low rank matrix from a small fraction of its entries [20, 21, 161], even if they are corrupted with noise [19, 57, 100, 107], which exactly fits the setting of missing and noisy tags. Consequently, the optimization problem in (4.1) becomes

min_{Q∈Δ} L(Q) = −Σ_{i=1}^n Σ_{j=1}^m (d_{i,j}/m₀) log Q_{i,j} + λ‖Q‖_tr,   (4.3)

where the domain Δ is still defined as in (4.2).

In (4.3), the low rank structure of the recovered matrix Q̂ is induced by the nuclear norm. The nuclear norm regularizer drives the matrix completion to exploit the interactions between rows and columns in finding a global solution [14], in contrast to the Frobenius and ℓ₁ norm regularizers, which treat each entry of the matrix independently.
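A common way to handle the nuclear norm term in problems like (4.3) is proximal gradient descent, whose prox step is singular value thresholding. The sketch below is a simplified illustration, not the extended gradient method used later in Section 4.6: it replaces the simplex domain Δ of (4.2) with a crude clipping of the entries to [ε, 1−ε], and the step size and all other parameter values are arbitrary.

```python
import numpy as np

def svt(Z, tau):
    """Singular value thresholding: the prox operator of tau * ||.||_tr."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def recover_tags(D, m0, lam=0.1, step=0.1, iters=50, eps=1e-3):
    """Proximal-gradient sketch for the likelihood term of (4.3).

    D is the n x m binary observed-tag matrix; returns an estimate Q.
    Clipping to [eps, 1 - eps] crudely stands in for the domain (4.2).
    """
    Q = np.full(D.shape, 1.0 / D.shape[1])
    for _ in range(iters):
        grad = -(D / m0) / Q               # gradient of the negative log-likelihood
        Q = svt(Q - step * grad, step * lam)
        Q = np.clip(Q, eps, 1.0 - eps)
    return Q
```

Soft-thresholding the singular values is exactly why the nuclear norm acts as a convex surrogate for rank: each prox step zeroes out the small singular values, pushing the iterate toward a low rank solution.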
4.3.2 Incorporating Irrelevant Tags into Noisy Matrix Recovery

Since each initially unobserved tag is relevant to the associated image only with small probability, we also maximize the likelihood of its irrelevance. The loss functions in both (4.1) and (4.3) are updated accordingly, and the optimization problem becomes

min_{Q∈Δ} L(Q) = −Σ_{i=1}^n Σ_{j=1}^m [ (d_{i,j}/m₀) log Q_{i,j} + ((1 − d_{i,j})/(m − m₀)) log(1 − Q_{i,j}) ] + λ‖Q‖_tr,   (4.4)

where the domain Δ remains as in (4.2).

4.4 Theoretical Guarantee of TCMR

The following theorem bounds the difference between P, the underlying tag probability matrix, and Q̂, the tag matrix recovered by TCMR.

Theorem 4.1. Let r be the rank of the matrix P, and let N be the total number of observed tags. Let Q̂ be the optimal solution to (4.3). Assume N ≥ Ω(n log(n+m)), and denote by μ₋ and μ₊ the lower and upper bounds on the probabilities in P. Then, with high probability,

(1/n)‖Q̂ − P‖₁ ≤ O( r n κ² log(n+m) / N ),   (4.5)

where κ² := μ₊‖P1‖_∞/(nμ₋²) ≤ μ₊²/μ₋². A sketch of the proof is provided in Section 4.5.1.

It is clear that the recovery error is O(rn log(n+m)/N), implying that the tag matrix can be accurately recovered once N ≥ Ω(rn log(n+m)). This is consistent with standard results in matrix completion [107] and low rank matrix recovery [106]. The impact of the low rank assumption is analyzed in Section 4.4.1. However, instead of the squared loss used in standard matrix completion theory, our model adopts a maximum likelihood loss, which introduces additional challenges in analyzing its recovery properties.

4.4.1 Impact of the Low Rank Assumption on the Recovery Error

To see the impact of the low rank assumption, consider the maximum likelihood estimation of a single multinomial distribution. Since the tags of different images are sampled independently, we only need to consider one image at a time. Let p be the underlying multinomial distribution to be estimated, and let d be the image tag vector comprised of m₀ words sampled from p. We estimate p by the simple maximum likelihood estimator, i.e.,

min_{p ∈ [μ₋,μ₊]^m : p^⊤1 = 1}  −Σ_{i=1}^m d_i log p_i,   (4.6)

where m is the number of unique tags, n is the number of images, and μ₋ and μ₊ are the lower and upper bounds on the probabilities in the matrix P = (p₁, ..., pₙ)^⊤.

Theorem 4.2.
Define z as z = d/m₀ − p, and let q̂ be the optimal solution to (4.6). Then

‖p − q̂‖₁ ≤ (μ₊²/μ₋²) ‖z‖₂².

To bound ‖z‖₂, we need the following concentration inequality for vectors.

Theorem 4.3. With probability 1 − 2e^{−t},

‖z‖₂ ≤ √( (t + log m)/(μ₋ m₀) ) ‖p‖₂.

Following the concentration inequality of Theorem 4.3, we bound ‖z‖₂. Then, by combining Theorems 4.2 and 4.3, we have, with probability 1 − 2e^{−t},

‖p − q̂‖₁ ≤ (μ₊²‖p‖₂²/μ₋⁴) · (2(t + log m)/m₀).

By applying the above result to the matrix P and taking the union bound, we have, with probability 1 − e^{−t},

(1/n)‖P − Q̂‖₁ ≤ (μ₊²/μ₋⁴) max_{1≤i≤n} ‖p_i‖₂² · (2n(t + log m + log n)/N).   (4.7)

We now compare the bound in (4.7) to that in (4.5). It is easy to verify that ‖p_i‖₂²/μ₋² ≤ m for any p_i. Hence, the net effect of the bound in (4.5) is to replace m with r, which is exactly the impact of the low rank assumption.

4.5 Proofs of Error Bounds

In this section, we give the proofs of the main theorems stated in Section 4.4.

4.5.1 Proof of Theorem 4.1

Proof. Define the matrix M as

M := Σ_{i=1}^n e_i (d_i/m₀ − p_i)^⊤ = (1/m₀) Σ_{i=1}^n e_i d_i^⊤ − P,   (4.8)

where e_i ∈ {0,1}^n is the i-th canonical basis vector of R^n. Since the occurrence of each tag in d_i is sampled according to the underlying multinomial distribution p_i, it is easy to verify that E[M] = 0.

Before presenting our analysis, we need two supporting lemmas that are important to our analysis. The detailed proofs of these lemmas are provided in Section A.2.

Lemma 4.4. Let P ∈ Δ and Q ∈ Δ be two probability matrices. We have

Σ_{i=1}^n Σ_{j=1}^m |P_{i,j} − Q_{i,j}|²/Q_{i,j} ≥ Σ_{i=1}^n Σ_{j=1}^m |P_{i,j} − Q_{i,j}| = ‖P − Q‖₁.

Lemma 4.5 ([107]). Let Z₁, ..., Zₙ be independent random matrices of dimension m₁ × m₂ that satisfy E[Z_i] = 0 and ‖Z_i‖ ≤ U almost surely, for some constant U and all i = 1, ..., n. Define

σ_Z = max{ ‖(1/n) Σ_{i=1}^n E[Z_i Z_i^⊤]‖^{1/2}, ‖(1/n) Σ_{i=1}^n E[Z_i^⊤ Z_i]‖^{1/2} }.

Then, for all t > 0, with probability 1 − e^{−t}, we have

‖(1/n) Σ_{i=1}^n Z_i‖ ≤ 2 max{ σ_Z √((t + log(m₁+m₂))/n), U (t + log(m₁+m₂))/n }.

The following theorem is the key to our analysis. It shows that the estimation error ‖P − Q̂‖₁, measured in the ℓ₁ norm, is small when P can be well approximated by a low rank matrix.

Theorem 4.6.
Let Q̂ be the optimal solution to (4.3). If λ ≥ (1/μ₋)‖M‖, where M is defined in (4.8), then

‖Q̂ − P‖₁ ≤ min_{Q∈Δ} { ‖Q − P‖_F²/μ₋ + 16λ²μ₊ rank(Q) }.

To use Theorem 4.6 for bounding the difference between P and Q̂, we need to bound ‖M‖. The theorem below bounds ‖M‖ by means of Lemma 4.5.

Theorem 4.7. Define λ* as

λ* := 2μ₋ max{ … }.   (4.9)

Then, with probability 1 − e^{−t}, we have ‖M‖ ≤ λ*μ₋.

Combining Theorems 4.6 and 4.7, we obtain the following result for recovering the probability matrix P.

Corollary 4.8. Set λ = λ*. With probability at least 1 − e^{−t}, we have

‖Q̂ − P‖₁ ≤ min_{Q∈Δ} { ‖Q − P‖_F²/μ₋ + 16λ*²μ₊ rank(Q) }.

Furthermore, let P̂ be the best rank-r approximation of P. Then, with probability 1 − e^{−t},

‖Q̂ − P̂‖₁ ≤ ‖P − P̂‖_F²/μ₋ + 16λ*²μ₊ r.

We now come to the proof of Theorem 4.1. When the rank of P is r, Corollary 4.8 gives, with high probability,

‖Q̂ − P‖₁ ≤ 16λ*²μ₊ r.

If ‖P1‖_∞ ≥ 1 and m₀ ≥ O(log(m+n)), we have

λ* = O( (1/μ₋) √( ‖P1‖_∞ log(n+m)/m₀ ) ),

and therefore, with high probability,

(1/n)‖Q̂ − P‖₁ ≤ O( (r log(n+m)/m₀) · μ₊‖P1‖_∞/(nμ₋²) ) = O( (rn log(n+m)/N) · μ₊‖P1‖_∞/(nμ₋²) ),

where N is the number of observed tags. This immediately implies Theorem 4.1.

4.5.2 Proof of Theorem 4.2

Proof. Following the same analysis as that for Theorem 4.6, whose proof is provided in Section 4.5.4, we have

Σ_{i=1}^m (p_i − q̂_i)²/q̂_i ≤ Σ_{i=1}^m (z_i/q̂_i)(p_i − q̂_i).

Using the fact that q̂_i ∈ [μ₋, μ₊], we have

‖p − q̂‖₂² ≤ (μ₊/μ₋) ‖z‖₂ ‖p − q̂‖₂,

and therefore

‖p − q̂‖₂ ≤ (μ₊/μ₋) ‖z‖₂.

We finally complete the proof by using the fact that

Σ_{i=1}^m (p_i − q̂_i)²/q̂_i ≥ ‖p − q̂‖₁.

4.5.3 Proof of Theorem 4.3

Proof. We will use the Chernoff bound: let X₁, ..., X_{m₀} be independent draws from a Bernoulli distribution with P(X = 1) = μ. We have

P( (1/m₀) Σ_{i=1}^{m₀} X_i ≥ (1+ε)μ ) ≤ exp( −ε²μm₀/3 ),
P( (1/m₀) Σ_{i=1}^{m₀} X_i ≤ (1−ε)μ ) ≤ exp( −ε²μm₀/2 ).

Using the Chernoff bound, we have, with probability 1 − 2exp(−ε²μm₀/2),

|X̄ − μ|² ≤ ε²μ².

By taking the union bound, we have, with probability 1 − 2e^{−t},

‖z‖₂ ≤ √( (t + log m)/(μ₋ m₀) ) ‖p‖₂.

4.5.4 Proof of Theorem 4.6

Proof.
We consider any solution Q ∈ Δ. Since Q̂ is the optimal solution to Eq. (4.3), we have ⟨∇L(Q̂), Q̂ − Q⟩ ≤ 0, i.e.,

−(1/m₀) Σ_{i=1}^n Σ_{j=1}^m (d_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}) + λ⟨∂‖Q̂‖_tr, Q̂ − Q⟩ ≤ 0,

where ∂‖Q̂‖_tr is a subgradient of ‖Q̂‖_tr. Using the fact that ⟨∂‖Q̂‖_tr − ∂‖Q‖_tr, Q̂ − Q⟩ ≥ 0, we can replace ⟨∂‖Q̂‖_tr, Q̂ − Q⟩ with ⟨∂‖Q‖_tr, Q̂ − Q⟩, which results in the following inequality:

−(1/m₀) Σ_{i=1}^n Σ_{j=1}^m (d_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}) + λ⟨∂‖Q‖_tr, Q̂ − Q⟩ ≤ 0.

Define Z_{i,j} = (Q̂_{i,j} − Q_{i,j})/Q̂_{i,j}. We have

−(1/m₀) Σ_{i=1}^n Σ_{j=1}^m (d_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}) = −(1/m₀) Σ_{i=1}^n ⟨e_i d_i^⊤, Z⟩ = −⟨P, Z⟩ − ⟨M, Z⟩.

Thus the bound above is modified to

−Σ_{i=1}^n Σ_{j=1}^m (P_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}) + λ⟨∂‖Q‖_tr, Q̂ − Q⟩ ≤ Σ_{i=1}^n Σ_{j=1}^m (M_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}).

Since

−Σ_{j=1}^m (P_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}) = −Σ_{j=1}^m (1/Q̂_{i,j})(P_{i,j} − Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}),

we have

−Σ_{i=1}^n Σ_{j=1}^m (P_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}) = Σ_{i=1}^n Σ_{j=1}^m [ (Q̂_{i,j} − P_{i,j})²/(2Q̂_{i,j}) + (Q̂_{i,j} − Q_{i,j})²/(2Q̂_{i,j}) − (Q_{i,j} − P_{i,j})²/(2Q̂_{i,j}) ].

Define the matrix B ∈ R^{n×m} by B_{i,j} = M_{i,j}/Q̂_{i,j}. Using the fact that Q̂_{i,j} ∈ [μ₋, μ₊] and the result of Lemma 4.4, we have

(1/2)‖P − Q̂‖₁ + ‖Q̂ − Q‖_F²/(2μ₊) + λ⟨∂‖Q‖_tr, Q̂ − Q⟩ ≤ (‖M‖/μ₋)‖Q̂ − Q‖_tr + ‖P − Q‖_F²/(2μ₋).

We write the singular value decomposition of Q as

Q = Σ_{i=1}^r σ_i u_i v_i^⊤,   (4.10)

where r is the rank of Q, σ_i is the i-th singular value of Q, and (u_i, v_i) are the left and right singular vectors of Q. Let U_⊥ ∈ R^{n×(n−r)} and V_⊥ ∈ R^{m×(m−r)} be the orthogonal bases complementary to U and V, respectively. Define the linear operators P_Q and P_Q^⊥ as

P_Q(Z) = UU^⊤Z + ZVV^⊤ − UU^⊤ZVV^⊤,   P_Q^⊥(Z) = Z − P_Q(Z).

According to (4.10), the subgradient ∂‖Q‖_tr is given by the set

W = { UV^⊤ + U_⊥ W V_⊥^⊤ : W ∈ R^{(n−r)×(m−r)}, ‖W‖ ≤ 1 }.

Thus, by choosing an appropriate matrix W for the subgradient ∂‖Q‖_tr, we have

⟨∂‖Q‖_tr, Q̂ − Q⟩ ≥ −‖P_Q(Q̂ − Q)‖_tr + ‖P_Q^⊥(Q̂ − Q)‖_tr,

and therefore

(1/2)‖P − Q̂‖₁ + ‖Q̂ − Q‖_F²/(2μ₊) + λ‖P_Q^⊥(Q̂ − Q)‖_tr ≤ λ‖P_Q(Q̂ − Q)‖_tr + (‖M‖/μ₋)‖Q̂ − Q‖_tr + ‖P − Q‖_F²/(2μ₋).

Using the fact that λ ≥ (1/μ₋)‖M‖, we have

‖P − Q̂‖₁ + ‖Q̂ − Q‖_F²/μ₊ ≤ 4λ‖P_Q(Q̂ − Q)‖_tr + ‖P − Q‖_F²/μ₋.

We consider two cases. In the first case, assume ‖P − Q̂‖₁ ≤ (1/μ₋)‖P − Q‖_F², in which case the bound of the theorem trivially holds. In the second case, we have the opposite, ‖P − Q̂‖₁ > (1/μ₋)‖P − Q‖_F², which implies

‖Q̂ − Q‖_F²/μ₊ ≤ 4λ‖P_Q(Q̂ − Q)‖_tr,

and therefore

‖P_Q(Q̂ − Q)‖_tr ≤ 4λrμ₊.

We complete the proof by plugging in the above bound.
4.6 Implementation

In this section, we present two auxiliary techniques to improve the tag completion performance. We incorporate visual features to improve the tag accuracy, and use an extended gradient method to solve the optimization problem efficiently.

4.6.1 Incorporating Visual Features

A limitation of the noisy matrix recovery method in (4.4) is that it does not take advantage of the visual contents of the images, an important cue for accurate tag prediction. We therefore modify (4.4) to incorporate visual features. We introduce two common methods of integrating visual features: adding a graph Laplacian term to the objective function in (4.4), and linearly combining the recovered matrix P with the majority voting results among nearest neighbors.

4.6.1.1 Graph Laplacian Method

Let X = (x₁, ..., xₙ)^⊤ contain the visual features of all images, where the vector x_i ∈ R^d represents the visual content of the i-th image. Let W = [w_{i,j}]_{n×n} be the pairwise similarity matrix, where w_{i,j} is the visual similarity between images x_i and x_j, i.e., w_{i,j} =