IMAGE ANNOTATION AND TAG COMPLETION VIA KERNEL METRIC LEARNING AND NOISY MATRIX RECOVERY

By

Zheyun Feng

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Computer Science - Doctor of Philosophy

2016

ABSTRACT

In the last several years, with the ever-growing popularity of digital photography and social media, the number of images with user-provided tags has increased enormously. Due to the large amount and content versatility of these images, there is an urgent need to categorize, index, retrieve and browse these images via semantic tags (also called attributes or keywords). Following this trend, image annotation and tag completion from missing and noisy given tags over large-scale datasets have become extremely active topics in the interdisciplinary areas of machine learning and computer vision.

The overarching goal of this thesis is to reassess image annotation and tag completion algorithms so that they capture the essential relationships both between and within images and tags, even when the given tag information is incomplete or noisy, so as to achieve better performance in terms of both effectiveness and efficiency in image annotation and other tag-relevant tasks, including tag completion, tag ranking and tag refinement.

One of the key challenges in search-based image annotation models is to define an appropriate similarity measure (distance metric) between images, so as to assign unlabeled images tags that are shared among similar labeled training images. Many kernel metric learning (KML) algorithms have been developed to serve as such a nonlinear distance metric. However, most of them suffer from high computational cost since the learned kernel metric needs to be projected onto a positive semi-definite (PSD) cone. Besides, in image annotation tasks, existing KML algorithms require converting image annotation tags into binary constraints, which leads to a significant loss of semantic information and severely reduces annotation performance.
In this dissertation we propose a robust kernel metric learning (RKML) algorithm based on regression techniques that is able to directly utilize the image tags. RKML is computationally efficient since the PSD property is automatically ensured by the regression technique. Numeric constraints over tags are also applied to better exploit the tag information and hence improve annotation accuracy. Further, theoretical guarantees for RKML are provided, and its efficiency and effectiveness are also verified empirically by comparing it to state-of-the-art approaches in both distance metric learning and image annotation.

Since user-provided image tags are often incomplete and noisy, we also propose a tag completion algorithm by noisy matrix recovery (TCMR) to simultaneously enrich the missing tags and remove the noisy ones. TCMR assumes that the observed tags are independently sampled from unknown distributions that are represented by a tag matrix, and our goal is to recover that tag matrix based on the partially revealed tags, which could be noisy. We provide theoretical guarantees for TCMR with recovery error bounds. In addition, a graph Laplacian based component is introduced to enforce the recovered tags to be consistent with the visual contents of the images. Our empirical study with multiple benchmark datasets for image tagging shows that the proposed algorithm outperforms state-of-the-art approaches in terms of both effectiveness and efficiency when handling missing and noisy tags.

To my parents, Shiying and Xuexiang.

ACKNOWLEDGMENTS

First and foremost, I feel indebted to my advisor, Professor Rong Jin, for his guidance, encouragement, and inspiring supervision throughout the course of this research work. His patience, prudent attitude, extensive knowledge, and creative thinking have been a source of inspiration for me. He was available for advice or academic help whenever I needed it, and gently guided me toward deeper understanding, no matter how late or inconvenient the time was. When I was struggling with the decision to pause my career and leave Paris four years ago, I was not sure about it, but now I am happy and proud to say that I made a wise decision. It is extremely hard to express how grateful I am for his unwavering support over the last years and into the future.
I would like to take this opportunity to thank my thesis committee members, Professor Anil K. Jain, Professor Joyce Y. Chai, and Professor Selin Aviyente, who accommodated my timing constraints despite their full schedules, and provided me with precious feedback on the presentation of the results, in both written and oral form.

During my Ph.D. studies, I had the pleasure of collaborating with many researchers, from each and every one of whom I had things to learn, and the quality of my research was considerably enhanced by these interactions. I would like to thank Professor Anil K. Jain, who generously gave me so much general guidance on my research direction and paper polishing. I would like to thank Songhe Feng, Tianbao Yang and Fengjie Li for all the discussions we had and the fun moments we spent together doing research and planning for the future. I also spent two wonderful summers as an intern at Comcast Lab, DC with Dr. Jan Neumann. I learned a lot from his team and would like to express my gratitude for having me as a research intern. I also would like to thank Serhat Selçuk Bucak and Radha Chitta for some helpful discussions and suggestions.

Living in East Lansing without my good friends would not have been easy. I want to thank all my friends inside and outside the department. I wish I could name you all.

Last but definitely not least, I want to express my deepest gratitude to my beloved parents and dearest boyfriend. Their love and unwavering support have been crucial to my success, and a constant source of comfort and counsel. Special thanks to my parents for enduring my absence over the last four years.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS

Chapter 1  Introduction
1.1 Image Annotation
1.2 Image Tagging
1.3 Thesis Contributions
1.4 Thesis Overview
1.5 Notation
1.6 Bibliographic Notes for Previous Publications

Chapter 2  Background
2.1 Image Representation
  2.1.1 Color Feature
  2.1.2 Texture Feature
  2.1.3 Typical Features in Image Tagging
2.2 Image Annotation
  2.2.1 Generative Models
  2.2.2 Discriminative Models
  2.2.3 Search-based Models
  2.2.4 Neural Network based Models
2.3 Image Tagging
  2.3.1 Image Tagging with Topic Models
  2.3.2 Image Tag Completion
2.4 Image Annotation by Metric Learning
  2.4.1 Linear Distance Metric Learning
  2.4.2 Nonlinear Distance Metric Learning
  2.4.3 Online Metric Learning
  2.4.4 Local Metric Learning
  2.4.5 Other Metric Learning
2.5 Image Tagging by Matrix Completion
  2.5.1 Low Rank Matrix Recovery with Nuclear Norm Minimization
  2.5.2 Low Rank Recovery under Other Constraints and Sampling Distributions

Chapter 3  Image Annotation with Kernel Distance Metric Learning
3.1 Motivation and Setup
3.2 Related Work
3.3 Annotating Images by Regression based Kernel Metric Learning (RKML)
  3.3.1 Regression based Kernel Metric Learning
  3.3.2 Extension to Image Feature Dimension Reduction
3.4 Theoretical Guarantee of RKML
  3.4.1 Analysis of How the Low Rank Approximation Affects RKML
3.5 Proofs of Error Bounds
  3.5.1 Proof of Theorem 3.1
3.6 Implementation
  3.6.1 Computing Semantic Similarity s_{i,j}
  3.6.2 Efficiently Computing K_r by Random Projection
  3.6.3 Application of RKML to Image Annotation
3.7 Experiments
  3.7.1 Datasets and Experimental Setup
  3.7.2 Comparison with State-of-the-art Distance Metric Learning (DML) and Image Annotation Algorithms
    3.7.2.1 Comparison to Nonlinear DML Algorithms
    3.7.2.2 Comparison to Linear DML Algorithms
    3.7.2.3 Comparison with State-of-the-art Image Annotation Methods
    3.7.2.4 Comparison of Annotation Results on Exemplar Images
    3.7.2.5 Efficiency Evaluation
  3.7.3 Effects of Different Experimental and Parameter Setups
    3.7.3.1 Sensitivity to Parameters
    3.7.3.2 Advantages of Kernel Trick and Nyström Approximation
    3.7.3.3 Analysis on Binary Constraints and Their Various Generation Ways
    3.7.3.4 Comparison of the Design Choices of Semantic Similarity Measure
3.8 Summary

Chapter 4  Image Tag Matrix Completion by Noisy Matrix Recovery
4.1 Motivation and Setup
4.2 Related Work
4.3 Tag Completion by Noisy Matrix Recovery (TCMR)
  4.3.1 Noisy Matrix Recovery
  4.3.2 Incorporating Irrelevant Tags into Noisy Matrix Recovery
4.4 Theoretical Guarantee of TCMR
  4.4.1 Impact of the Low Rank Assumption on Recovery Error
4.5 Proofs of Error Bounds
  4.5.1 Proof of Theorem 4.1
  4.5.2 Proof of Theorem 4.2
  4.5.3 Proof of Theorem 4.3
  4.5.4 Proof of Theorem 4.6
4.6 Implementation
  4.6.1 Incorporating Visual Features
    4.6.1.1 Graph Laplacian Method
    4.6.1.2 Linear Reconstruction Approach
  4.6.2 Efficient Solution of the Proposed Algorithm
  4.6.3 Pseudo-code of TCMR
4.7 Experiments
  4.7.1 Datasets and Experimental Setup
  4.7.2 Comparison to State-of-the-art Tag Completion Methods
    4.7.2.1 Efficiency Evaluation
  4.7.3 Analysis of Algorithm Design
    4.7.3.1 Evaluation of Noisy Matrix Recovery without Visual Features
    4.7.3.2 Analysis of Scalability
    4.7.3.3 Evaluation on Various Types of Regularizers
    4.7.3.4 Evaluation on Various Loss Functions
    4.7.3.5 Efficient Extension of TCMR by Linear Reconstruction
  4.7.4 Effects of Missing and Noisy Tags
    4.7.4.1 Sensitivity to the Number of Observed Tags
    4.7.4.2 Sensitivity to Noise
  4.7.5 Effects on Other Tag-relevant Applications
4.8 Summary

Chapter 5  Summary and Conclusions
5.1 Contributions
  5.1.1 Image Annotation by Kernel Metric Learning
  5.1.2 Image Tag Completion by Noisy Matrix Recovery
5.2 Future Work
5.3 Conclusions

APPENDIX
BIBLIOGRAPHY
LIST OF TABLES

Table 3.1: Statistics for the datasets used in the experiments. The bottom two rows are given in the format mean/maximum.

Table 3.2: Examples of annotation results generated by 14 baselines and the proposed RKML. The annotated tags are ranked by estimated relevance score in descending order, and the correct ones are highlighted in blue bold font. Note that the ground truth annotations in the 2nd column do not always include all relevant tags (e.g., "people" for the 5th image), and sometimes contain polysemes (e.g., "palm" for the 4th and 5th images) and controversial tags (e.g., "front").

Table 3.3: Comparison of running time (s) for several different metric learning algorithms.

Table 3.4: Running time (s) for image annotation. SVM methods on Flickr 1M are not included due to their high computational costs.

Table 3.5: Comparison of various extensions of RKML in terms of AP@t on the IAPR TC12 dataset. RLML is the linear version of RKML, RKMLO is the original version without Nyström approximation, and RKMLH runs RKML using binary constraints.

Table 3.6: Comparison of the extensions of RKML in terms of AP@t on the ESP Game dataset.

Table 3.7: Comparison of the extensions of RKML in terms of AP@t on the Flickr 1M dataset. RKMLO is excluded since the dataset is too large to do the computation on the full kernel.

Table 3.8: Comparison of different methods of generating binary constraints, as applied in the baseline distance metric learning algorithm LMNN, for the top t annotated tags on the Flickr 1M dataset. Method 1 clusters the space of keywords, method 2 considers the class assignments as binary constraints, method 3 clusters the space of keywords using hierarchical clustering algorithms, method 4 clusters the space of keywords together with the visual features, and method 5 considers images sharing more than 4 keywords as similar and images sharing no keyword as dissimilar.

Table 3.9: Local weighting functions.

Table 3.10: Global weighting functions.

Table 3.11: Comparison of extensions of RKML with different design choices of
semantic similarity for the top t annotated tags on the IAPR TC12 dataset. The leftmost column lists the different weighting methods, where the name before "-" denotes the local weights shown in Table 3.9 and the name after "-" indicates the global weights shown in Table 3.10. "Cosine" represents the cosine similarity between the tag vectors of two images.

Table 4.1: Statistics for the refined datasets. Ω indicates the number of observed tags used when training the TCMR model, which holds throughout the experimental section unless otherwise specified.

Table 4.2: Running time (seconds) for tag completion baselines. All algorithms are run in Matlab on an AMD 4-core @2.7GHz machine with 64GB RAM.

Table 4.3: Comparison of tag completion performance between TCMR and its counterparts with different regularizers, evaluated by accuracy (%) and efficiency/running time (s).

Table 4.4: Comparison of tag completion accuracy (%) between TCMR and its counterparts with different loss functions. Standard deviation is omitted for simplicity. [1] to [5] represent the absolute, least square, hinge, logistic and maximum likelihood loss functions, respectively.

Table 4.5: Comparison of tag completion efficiency (running time in seconds) between TCMR and its counterparts with different loss functions.

Table 4.6: Performance comparison of TCMR and TCMR-lr, in terms of both accuracy (%) and running time (s).

Table 4.7: Performance comparison of TCMR and TCMR-lr when the observed tags are severely noisy.
AP@1 is used for evaluation.

Table 4.8: Examples of tag completion results generated by some baseline algorithms and the proposed TCMR. The observed tags in red italic font are noisy tags, and the others are randomly sampled from the ground truth tags. The completed tags are ranked by recovered score in descending order, and the correct ones are highlighted in blue bold font.

LIST OF FIGURES

Figure 1.1: Daily number of images uploaded to the Internet through selected apps [143].

Figure 1.2: An illustrative example comparing linear and nonlinear distance metric learning algorithms. (a), (b) and (c) show the original data distribution, the distribution adjusted by a learned linear distance metric, and the distribution adjusted by a learned kernel metric, respectively.

Figure 1.3: Exemplar illustration of tag completion and other image tagging tasks, including image annotation, tag recommendation and tag refinement. The upper box shows the initially given information (both visual and semantic), and the bottom box indicates the ultimate objective of all four tasks.

Figure 2.1: Annotating an image with ANN [220].

Figure 2.2: Illustration of how a topic model works [11], where each topic is highlighted by a specific color.

Figure 3.1: Illustration of how a kernel distance metric links images with appropriate tags. In the left box, images share the tags marked in the same color as the lines connecting them.

Figure 3.2: The proposed kernel metric learning scheme, i.e., RKML, for automatic image annotation.

Figure 3.3: Average precision for the top t annotated tags using nonlinear distance metrics.

Figure 3.4: Average recall for the top t annotated tags using nonlinear distance metrics.

Figure 3.5: Average F1 score for the top t annotated tags using nonlinear distance metrics.

Figure 3.6: Average precision for the top t annotated tags using linear distance metrics.
Figure 3.7: Average recall for the top t annotated tags using linear distance metrics.

Figure 3.8: Average F1 score for the top t annotated tags using linear distance metrics.

Figure 3.9: Annotation performance in terms of AP@t with different annotation models.

Figure 3.10: Average recall for the top t annotated tags using different annotation models.

Figure 3.11: Average F1 score for the top t annotated tags using different annotation models.

Figure 3.12: Average precision for the first tag predicted by RKML using different values of rank r. To make the overfitting effect clearer, we turn off the Nyström approximation for the IAPR TC12 and ESP Game datasets. The Flickr 1M dataset is not included due to its large size (n = 999,764). Overfitting only occurs when r approaches the total number of images, but it is infeasible to apply such a large r on the Flickr 1M dataset.

Figure 3.13: Average precision for the top t tags predicted by RKML using different values of m', the number of retained eigenvectors when estimating the semantic similarity.

Figure 3.14: Average precision for the top t tags predicted by RKML using different values of n_s, the number of sampled images used for the Nyström approximation. In (c), n_s could not be set too large due to the dataset size.

Figure 4.1: The scheme of the proposed noisy tag matrix recovery framework, i.e., TCMR, for image tag completion. The low rank matrix recovery component in the upper right box exploits the tag-tag correlation, and the graph Laplacian component in the bottom left takes into account the tag-content correlation.

Figure 4.2: Comparison of tag completion performance between TCMR and state-of-the-art baselines on the IAPR TC12 dataset.

Figure 4.3: Comparison of tag completion performance between TCMR and state-of-the-art baselines on other datasets, including MirFlickr, ESP Game and NUS-WIDE.

Figure 4.4: Comparison of different topic models and matrix completion algorithms without taking the visual features into account. The top row is evaluated by AP@N, the middle row by AR@N, and the bottom row by C@N.

Figure 4.5: Scalability analysis over the large-scale ImageNet dataset in terms of tagging precision. The metric AP@N is used for evaluation. The size of the evaluated subset N varies from 4K to 1M.

Figure 4.6: Scalability analysis over the large-scale ImageNet dataset in terms of implementation time (log10(seconds)). The size of the evaluated subset N varies from 4K to 1M.

Figure 4.7: Tag completion performance with a varied number of observed tags, evaluated by AP@3 (top row) and AP@5 (bottom row). IAPR TC12 is a clean and complete dataset while NUS-WIDE contains missing and noisy tags.

Figure 4.8: Comparison of tag completion performance (AP@3) using noisy observed tags.

Figure 4.9: Comparison between TCMR and baseline algorithms with a varied percentage of noisy tags on two other tag-relevant applications, tag ranking and tag refinement. The counterpart tag completion performance can be found in Figure 4.8(b).

LIST OF ALGORITHMS

Algorithm 1: Automatic Image Annotation with RKML
Algorithm 2: Image Tag Completion by Noisy Matrix Recovery

Chapter 1

Introduction

We are facing the problem of image explosion: massive numbers of images are provided through different sources including the Internet, camera networks, research laboratories, personal digital devices and many other photo management applications. The Internet has greatly promoted the ability to release and access all manner of multimedia information, especially images. For instance, KPCB analyst Mary Meeker's annual Internet Trends report states that internet-connected citizens share over 1.8 billion photos each day [143] through multi-platform services such as Snapchat¹, Instagram², Facebook³, WhatsApp⁴, etc., as shown in Figure 1.1. Such a proliferation of images poses an urgent challenge for large-
scale image categorization, indexing, retrieval and browsing.

In the image retrieval community, most methods can be categorized into two groups: content based image retrieval (CBIR) [176] and tag based image retrieval (TBIR) [135]. CBIR matches the query image and the gallery images based on their visual similarities, which can be computed from a group of visual features including color, texture, shape [75], LBP [152], HOG [37], SIFT [137], GIST [154], and the list goes on [204]. Despite elaborate system design and computational effort, the performance of CBIR is still limited by the notorious semantic gap between the low-level visual features that reflect the image contents and the high-level semantics behind images [67,201]. To considerably improve the performance of CBIR, substantial advances are still required in the technologies and designs involved, including feature extraction, feature selection, indexing, query rephrasing and completion [132].

¹ https://www.snapchat.com/
² https://www.instagram.com
³ https://www.facebook.com/
⁴ https://www.whatsapp.com/

Figure 1.1: Daily number of images uploaded to the Internet through selected apps [143].

To overcome the limitations of CBIR in terms of both retrieval effectiveness and efficiency, TBIR was proposed. Instead of visual features, which incur great computational cost, TBIR represents images with a set of tags (also called keywords, labels or attributes). The user issues a query as a sequence of semantic words, and the relevant images are then retrieved based on matches between the textual query and the image tags. Compared with CBIR, TBIR has two significant advantages. First, it allows users to better express their query needs with semantic words, which alleviates the semantic gap and improves retrieval accuracy. Second, TBIR formulates the image retrieval problem as a document retrieval problem, which allows the use of the inverted index technique [227] and greatly improves retrieval efficiency.

Besides tag based image retrieval, there are also many other tag dependent tasks that categorize [116], index and browse [168] images via semantic tags for similar reasons.
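The efficiency gain TBIR draws from the inverted index technique can be illustrated with a minimal sketch (illustrative only; the toy image ids and tags below are hypothetical, not drawn from any dataset used in this thesis):

```python
from collections import defaultdict

# A toy image collection: image id -> user-provided tags.
images = {
    "img1": {"beach", "palm", "sunset"},
    "img2": {"city", "sunset"},
    "img3": {"beach", "people"},
}

# Build the inverted index once: tag -> set of image ids (a posting list).
index = defaultdict(set)
for img_id, tags in images.items():
    for tag in tags:
        index[tag].add(img_id)

def tbir_query(query_tags):
    """Return images matching ALL query tags, touching only the posting
    lists of the queried tags rather than scanning every image."""
    postings = [index[t] for t in query_tags]
    return set.intersection(*postings) if postings else set()

print(sorted(tbir_query({"beach", "sunset"})))  # ['img1']
```

Because a query only walks the posting lists of its own tags, retrieval cost scales with the lengths of those lists rather than with the size of the whole collection, which is the essence of the efficiency advantage mentioned above.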
However, the good performance of these tasks relies substantially on an image set with sufficient high-quality tags.

Among the great number of available images, however, only a small portion are associated with appropriate tags. Generally images are annotated manually, either by professional annotators or simply by the photo takers and reviewers. Professionally annotated tags are elegant and reliable, but cost tremendous labeling effort and time. Typical such image datasets include the CCUB NA Birds 700 Dataset⁵ and the Microsoft COCO Dataset [125], which took groups of researchers years to collect, annotate and build. Apparently, this is prohibitively labor-costly given the proliferation of images [202]. Fortunately, in most cases the image tags are provided by the users who upload the images to social media (e.g., Flickr⁶) and by the reviewers of those images, or are directly crawled from the accompanying descriptions/titles. However, tags generated in this way are far from reliable, since they are usually general, ambiguous, biased, and sometimes even inappropriate, incomplete or redundant for many reasons [70,101]. All these factors can severely hamper the performance of TBIR and other tag-based tasks.

As a result, the need for reliable tags over large-scale image collections becomes pressing, which motivates the research community to develop effective and efficient automatic tagging systems [56]. Among them, image annotation and tag completion are the two branches that attract the most attention.

⁵ http://info.allaboutbirds.org/nabirds/
⁶ https://www.flickr.com/

The differences between these two branches come from the two types of supervised information. In the image annotation framework, a subset of images is associated with appropriate tags while the other images are not assigned any tags, and the goal is to predict tags for the unlabeled images. In the tag completion task, all images are associated with a certain number of tags. However, some tags are appropriate while others are not, and some tags that should have been observed are missing. Its final purpose is to update the whole tag matrix so that it better describes the images' visual contents.
So basically, the goal of image annotation and tag completion work is to learn from labeled examples in order to predict the labels of other examples or the scores of the other labels. That is, given a training set of supervised information (examples or labels), the aim is to learn a hypothesis that assigns each label a confidence score for a sample, where the label or the sample may never have been observed by the algorithm. Efficiently finding such an effective hypothesis based on the training set and the observed labels, one that minimizes some validation measure of performance, is the main focus of this learning.

This chapter is devoted to an overview of these two broad topics of tag assignment and amelioration, which aim to develop a general correspondence between or within the image visual contents and the semantic tags. Here we present the definitions in a fairly non-technical manner; the formal, detailed definitions will be given in Chapter 2.

1.1 Image Annotation

The objective of image annotation is to automatically annotate an image with appropriate tags that exclusively reflect its visual content. Image annotation has been a topic of ongoing research for more than a decade, and many techniques have been developed. Conventionally, image annotation is tackled as a machine learning problem with two major components: visual feature extraction and mapping those features to semantic tags [132]. Feature extraction obtains significant patterns from images, and the patterns are then mapped to keywords in the semantic space via machine learning algorithms, which capture the relationship between visual contents and semantic tags in one of three ways: (i) formalizing a statistical model between tags and visual features [22,49,119,132]; (ii) casting the problem into a set of binary classification problems [47,72]; and (iii) representing the tags as a matrix and treating the annotation problem as a matrix factorization [223] or matrix completion problem [126,201]. The key of these methods is to train a reliable model with sufficiently accurate tags by optimizing image compactness and separability in a global sense.
However, the semantic gap and the imperfect tags usually lead to a biased model and result in a suboptimal solution. That means the discriminatory power of input images might vary between different neighborhoods, and a global model hence cannot fit the visual-semantic relation well over the data manifold. Besides, many parametric models are not rich enough to effectively capture the complicated dependencies between image content and tags.

Recently, a local non-parametric model, the search-based approach, has proved to be quite effective, particularly for large image datasets with many keywords [67,93,139,194]. Its key idea is to annotate a test image I with the common tags shared by the subset of training images that are visually similar to I.

The crux of search-based annotation methods is to effectively measure the visual similarity between images. Distance metric learning (DML) [78,210,205] tackles this problem by learning a metric that pulls semantically similar images close and pushes semantically dissimilar images far apart. Many studies on DML are restricted to learning a linear Mahalanobis distance metric in a finite dimensional space, which is expected to be consistent with the associated tags.
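A minimal sketch may make the search-based pipeline concrete (the feature vectors, tags and the hand-fixed diagonal matrix M below are all hypothetical; in practice M would be produced by a DML algorithm so that the metric agrees with the tag similarity):

```python
from collections import Counter

# Synthetic training set: 2-D feature vectors and their tag lists.
X_train = [(0.0, 0.0), (0.1, 0.2), (3.0, 3.0), (3.1, 2.9)]
tags_train = [["beach", "sea"], ["beach", "sky"],
              ["city", "night"], ["city", "sky"]]

# A diagonal (hence PSD) Mahalanobis matrix M = diag(2.0, 0.5), so that
# d(x, y)^2 = (x - y)^T M (x - y). Here M is fixed by hand purely for
# illustration; a DML algorithm would learn it from the tags.
M_DIAG = (2.0, 0.5)

def mahalanobis_sq(x, y):
    return sum(m * (a - b) ** 2 for m, a, b in zip(M_DIAG, x, y))

def annotate(x, k=2, n_tags=2):
    """Tag x with the most common tags among its k nearest training
    images under the learned metric (search-based annotation)."""
    order = sorted(range(len(X_train)),
                   key=lambda i: mahalanobis_sq(x, X_train[i]))
    votes = Counter(t for i in order[:k] for t in tags_train[i])
    return [tag for tag, _ in votes.most_common(n_tags)]

print(annotate((0.05, 0.1)))  # top-voted tag is 'beach'
```

The sketch deliberately separates the two ingredients discussed above: the metric (here M) decides which training images count as neighbors, and simple tag voting over those neighbors produces the annotation.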
However, most distance metric learning algorithms assume all data are linearly separable [26], and they usually fail to capture the nonlinear relationships among images. To address this problem, several nonlinear DML algorithms have been proposed. Their key idea is to map data points from the original vector space to a high (or even infinite) dimensional space through a nonlinear mapping, which can be either explicitly constructed using boosting methods [73,74,172], or implicitly derived through kernel functions. The latter is referred to as Kernel Metric Learning (KML) [26,39,187], which has been widely used to settle image similarity problems in image classification [39,60,187], clustering [2,26,205], and retrieval [77,78]. Figure 1.2 illustrates an example comprised of two groups of data points annotated with different tags. Figure 1.2(b) shows the distribution of data points adjusted by a learned linear distance metric, which fails to separate the objects with different tags, while in Figure 1.2(c) the newly learned nonlinear (kernel) distance metric successfully separates the data with different tags.

(a) Original data distribution  (b) Linear distance metric  (c) Nonlinear distance metric
Figure 1.2: An illustrative example comparing linear and nonlinear distance metric learning algorithms. (a), (b) and (c) show the original data distribution, the distribution adjusted by a learned linear distance metric, and the distribution adjusted by a learned kernel metric, respectively.

1.2 Image Tagging

Recently many user-provided tags are automatically generated, and thus are incomplete or inaccurate in describing the visual content of images [201]. In particular, these tags are
crawled from the descriptions of the uploader and reviewers, and in most cases only a small number of tags are provided for each image, some of which are even irrelevant to the visual content. This disadvantage makes it difficult to fully utilize these accompanying tags and limits their applications in tag dependent tasks such as tag based image retrieval and tag recommendation [126,129,226]. To better benefit from the tags, there is an urgent demand for efficient algorithms that are able to improve the tagging quality for images at large scale; specifically, effective algorithms that can simultaneously recover the missing tags and remove or down-weight the noisy ones.

Generally, two groups of algorithms can fulfill this desire to re-weight the given tags: the fine-grained ones and the coarse ones. Both groups mainly model the correlations among the partially observed tags, then apply this model to the whole dictionary and re-weight all tags in the dictionary.

The fine-grained ones [40,44,116,115] target images with specific contents and carefully, manually annotated tags. The contents of such images are simple and concentrated, while the semantic knowledge is complete and clean, and sometimes even manually associated with the corresponding segments in the images. Typical datasets for these algorithms are the Animals with Attributes dataset [115] and the CUB-200-2011 dataset [189]. The fine-grained group includes transfer learning [115], label propagation [40], zero-shot recognition [116] and fine-grained categorization [44]. This group is able to learn new attributes by exploiting their graphical or probabilistic relationships via both an intermediate-level semantic representation and the low-level mapping between tags and their corresponding segments in the image repository.

However, natural images are usually assorted and involve a large scope of topics including scenes, people, animals, actions, objects and many other aspects of life and the environment.
Besides,becauseofthelargeamountandversatilityoftheimages,thetagsareusually user-providedinsteadofmanuallyannotated,andthuscontainmanymissingannotations anderrors.Andprobably,thetagdictionarymightbeverylong.Typicaldatasetscouldbe referredtoFlickr1Mdataset[201]andNUS-WIDE[33]dataset.InthiscasetheÞnegrained methodsarenolongercapabletocapturethetag-tagortag-imagedependenciessinceon theonehand,themanualsegmentationandtaglocalizationarelaborcosting;andonthe otherhandtheobservedtagsareusuallytooproblematictotrainareliablemodel.Inthe contrast,thecoarsemethodsareabletosimultaneouslyaddressthechallengesofmissing andnoisytags[33]forahugevarietyofimagesusingmachinelearningtechniques,which relymainlyonthesemanticinformationthatcoversawiderangeofknowledge,andaswell astheauxiliaryvisualcontentinformation. Werefertothisproblemas tagcompletion [33]todistinguishitfrompreviouscoarse imagetaggingwork.AlthoughtheÞnalobjectiveofthosetaggingworksistoassignan imagewithcompleteandexacttagswithreasonableconÞdencescores,theirinitialsetups vary,asillustratedinFigure1.3. Imageannotation [28,67,139]automaticallyassignsunla- 8Figure1.3:Exemplarillustrationoftagcompletionandotherimagetaggingworksinclud- ingimageannotation,tagrecommendationandtagreÞnement.Theupperboxshowsthe initiallygiveninformation(bothvisualandsemantic),andthebottomboxindicatesthe ultimateobjectiveofallfourtasks. beledimageswithappropriatekeywords.Asastate-of-the-artimageannotationapproach, searchbasedalgorithms[52,67]relyonthequalityoftagsassignedtotrainingimages[52]. Tagrecommendation suggestscandidatetagstoonlineannotatorsinordertoimprovethe e"ciencyandqualityofthetaggingprocess[108,158,206].ItusuallyidentiÞesmissing tagsbytopicmodels(e.g. LatentDirichletAllocation (LDA))[12,108,225],butdoesnot addressthenoisytagproblem,animportantissueinexploitinguser-providedtags. 
Tag refinement applies various techniques, including topic models, tag propagation, sparse training and partial supervision [28, 131, 206], to select a subset of the user-provided tags based on image features and tag correlations [224]. Although it is able to handle noisy tags, it cannot explicitly enrich the missing tags.

1.3 Thesis Contributions

In this section we shall elaborate on the main problems considered in this thesis and our key contributions to address these problems.

This dissertation mainly deals with the image annotation and tag completion problems, giving theoretical guarantees and providing empirical comparisons with state-of-the-art baseline algorithms. Generally, we attempt to delve into the image-image correlation, the image-tag mapping and the tag-tag interaction to capture their underlying relationships. In particular, the main contributions can be summarized as follows.

• A new, effective image distance metric and its theoretical foundation. The dissertation proposes a novel kernel-based distance metric learning algorithm (RKML) specifically for image annotation in Chapter 3. This algorithm achieves success by fully exploring the image-tag dependency, which is consequently used to better capture the nonlinear complexities among images. Several strategies are applied to guarantee the annotation accuracy, including the incorporation of soft semantic constraints, which better exploit the semantic information between tags, and the adoption of a rank-based regularization term, which effectively reduces the risk of overfitting to the training data. Besides, a theoretical guarantee for the proposed kernel distance metric learning is provided.

• Efficient kernel metric learning and related computation. The proposed RKML algorithm exploits the regression technique to avoid the projection onto the PSD cone, which is necessary in conventional distance metric learning and is computationally expensive. Besides, the Nyström approximation is applied in the kernel computation to speed up the implementation. These techniques, together with the rank-based regularization, greatly reduce the computational burden of the proposed RKML algorithm.
• Novel image tag completion work that effectively deals with missing and noisy tags. The dissertation also proposes a novel image tag matrix completion (TCMR) framework in Chapter 4 that effectively recovers the expected tags from incomplete and noisy given tags. This algorithm focuses on capturing the tag-tag correlation, and then uses it to update the tag confidence score matrix. Following the idea of topic models, TCMR assumes that the observed tags of any image are drawn independently from a mixture of a small number of multinomial distributions, which can be straightforwardly interpreted through low-rank matrix completion theory. Accordingly, the nuclear norm is applied to simultaneously capture the interactions among tags in two ways: between different tag keywords, and between the tag vectors associated with different images. A maximum likelihood component is also employed as the loss function, which connects probabilistic models and matrix completion theory. All these techniques together ensure the performance of tag matrix recovery from missing and noisy tags.

• Theoretical guarantee of image tag completion by noisy matrix recovery. The final objective function of the optimization problem is convex, which guarantees that a global optimal solution exists and can be found efficiently. Error bounds between the recovered matrix and the statistically optimal one are also provided theoretically.

1.4 Thesis Overview

The remainder of this dissertation is organized as follows. Chapter 2 lays out the foundation for the rest of the dissertation. In particular, we provide a survey of the background material, including image tagging tasks such as image annotation and image tag completion, distance metric learning (both linear and kernel), statistical models applied in image tagging work, as well as low-rank matrix recovery theory. It will become clear in this chapter that there exist deep connections between these topics.
The first part of the thesis focuses on the image annotation problem. In Chapter 3 we focus on the kernel distance metric learning problem, investigate how it affects the image annotation performance, and propose strategies to address the limitations of existing kernel distance metric learning algorithms and their real-world applications.

The second part of the thesis deals with the image tag completion problem. Chapter 4 discusses its relationship to statistical/topic models and matrix completion theory. The effectiveness of the proposed algorithm is justified both theoretically, by the recovery error bounds, and empirically, on a number of datasets under several setups.

Finally, Chapter 5 summarizes this work by concluding the main contributions, some potential extensions and future research directions. Besides, the appendix summarizes rather standard results on topics relevant to this work, and gives the error bounds that are used in the proofs of results in the thesis; it is mainly for reference. In order to facilitate independent reading of the various chapters, some of the definitions are repeated multiple times.

1.5 Notation

This section serves as a glossary for the main mathematical symbols used throughout the thesis. Vectors are shown by lowercase bold letters, such as $x \in \mathbb{R}^d$. Such a vector usually represents the visual feature or tag vector of an image. Matrices are indicated by uppercase letters such as $A$, and their pseudo-inverse is represented by $A^{\dagger}$. We use $[m]$ as shorthand for the set of integers $\{1, 2, \ldots, m\}$. Throughout the thesis we denote by $\|\cdot\|$, $\|\cdot\|_1$, $\|\cdot\|_F$ and $\|\cdot\|_2$ the $\ell_2$ (Euclidean) norm, the $\ell_1$-norm, the Frobenius norm and the spectral norm, respectively.

1.6 Bibliographic Notes for Previous Publications

Some of the results in this dissertation have appeared in prior publications. The material in Chapter 3 is based on work published at the International Conference on Computer Vision (ICCV) [52]; the content of Chapter 4 comes from [51], published at the European Conference on Computer Vision (ECCV), and from [50], published in the IEEE Transactions on Image Processing.
Chapter 2

Background

The goal of this chapter is to give a general and formal overview of the material related to the work that has been done in this thesis. In particular, we will discuss the key concepts and questions relevant to the problems of image annotation, image tag completion, kernel distance metric learning and noisy matrix recovery. The exposition given here is necessarily brief, and detailed discussion will be provided in the relevant chapters.

2.1 Image Representation

In the computer vision area, image representation plays an important and ineluctable role. Specifically, an appropriate feature representation significantly improves the performance of typical image-relevant tasks including image classification, image clustering, image understanding, video understanding, etc. Since an image consists of an unstructured array of pixels, the first step of image representation is to efficiently extract certain types of discriminative visual features from these pixels, whether color or grayscale [220]. Various feature extraction techniques will be reviewed in detail in the following subsections.

2.1.1 Color Feature

Color is one of the most basic and fundamental features for capturing image characteristics, and it is usually defined with respect to a particular color space, such as the RGB, HSV and Lαβ spaces [41, 65]. Within these spaces, color features can be extracted, including the color histogram [204], color moments [220] and the color coherence vector [220].

2.1.2 Texture Feature

Unlike color features, which measure the property of a single pixel, texture features explore the traits of a group of pixels. According to the domain from which they are extracted, texture features can be divided into two groups: spatial texture features and spectral texture features [220]. Spatial texture features are usually extracted by computing pixel statistics, searching for local pixel patterns, or fitting stochastic/generative models in the original image space. Typical spatial features include the texton histogram [140] and Markov random fields [97].
Generally, since spatial features are directly generated in the original image space, they can be straightforwardly extracted from irregularly shaped regions, but they usually suffer severely from noise, mutation and distortions of images [220].

Spectral texture features served as significant image analysis tools in the computer vision area in the early 2000s; they are usually extracted in a frequency domain transformed from the original image space. Common spectral texture features include the Fourier transform (FT) [122], the discrete cosine transform (DCT) [164], wavelets [85] and Gabor filters [128]. Among them, FT and DCT are efficient but sensitive to scale and rotation; wavelets are fast to compute but limited in orientations; and Gabor features are robust to scale and orientation but lose certain spectral information due to incomplete coverage of the spectrum [183].

2.1.3 Typical Features in Image Tagging

Here, several state-of-the-art image visual features are summarized and compared in detail; these are potentially useful for image-level tasks including image annotation and image tagging.

The SIFT feature [137] was initially proposed for object recognition. It first extracts SIFT descriptors from a set of reference images at different scales with Gaussian filters, and then uses the bag-of-words model to compute a histogram of the descriptors to form the final image feature. There are various versions of SIFT features, including sparse SIFT, dense SIFT and SURF. Sparse SIFT [218] builds the features at Hessian-affine interest points, and dense SIFT [120] extracts the descriptors within a flat window. SURF [7] is a sped-up version of SIFT, which handles the scale problem by convolution with box filters and the orientation problem with wavelet responses.

The Gist feature [155] was initially described as a low-dimensional representation of the scene, specifically for scene recognition, and it requires no image segmentation as in traditional approaches. It summarizes the gradient information, over both scales and orientations, by convolving the image with a bank of Gabor filters [128], which provides a rough description of the image characteristics.
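The Gist construction just described can be sketched in a few lines: build a small bank of Gabor kernels, filter the image, and pool the response magnitudes over a coarse spatial grid. This is a minimal single-scale illustration in plain numpy, not the reference implementation of [155]; the kernel size, wavelength and grid below are placeholder choices.

```python
import numpy as np

def gabor_kernel(size, theta, wavelength, sigma):
    """A single real-valued Gabor kernel at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength)
    return envelope * carrier

def gist_descriptor(image, n_orient=4, grid=4):
    """Filter with a small Gabor bank and average the response magnitude
    over a grid x grid spatial layout (a rough Gist-like feature)."""
    h, w = image.shape
    feats = []
    for k in range(n_orient):
        kern = gabor_kernel(15, np.pi * k / n_orient, wavelength=8.0, sigma=4.0)
        # FFT-based convolution at the image size
        resp = np.abs(np.fft.ifft2(np.fft.fft2(image) *
                                   np.fft.fft2(kern, s=image.shape)))
        for i in range(grid):
            for j in range(grid):
                cell = resp[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
                feats.append(cell.mean())
    return np.array(feats)          # length n_orient * grid * grid

img = np.random.rand(64, 64)
print(gist_descriptor(img).shape)   # (64,)
```

The pooled grid of filter responses, rather than any segmentation, is what gives Gist its "rough description" character.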
The HOG feature [38, 48] is reported to provide excellent performance for object and human detection. It first densely extracts histogram of oriented gradients (HOG) descriptors and stacks neighboring HOG descriptors together to increase the feature dimension and, with it, the descriptive power. The bag-of-words model is then used to compute the final HOG feature for an image.

The LBP feature [153], short for Local Binary Patterns, is a powerful texture feature based on the occurrence histogram of local binary patterns. Basically, LBP divides the image into blocks, for example 3×3, then thresholds the pixels in each block against the center pixel value and encodes the comparisons as a sequence of binary digits. The sequence is then converted to a decimal number, which is set as the value of the center pixel. The histogram of each block can then be computed, and the histograms are concatenated together to form a feature vector for the image. Essentially, LBP encodes local contrast and patterns, making it highly discriminative while remaining efficient to compute.

2.2 Image Annotation

Once sufficient visual features are extracted from an image, high-level semantics like annotations and tags can be learned from the given information. According to [67], traditional automatic image annotation methods can be categorized into three groups, while recently new deep neural network based models have also gained more and more attention in the annotation community.
2.2.1 Generative Models

This type of model usually trains global probabilistic models to explain the co-occurrence between image visual features and semantic labels, and then predicts new tags with the newly learned relationship. Many of them borrow techniques from natural language and text-based document processing. Duygulu et al. tried to translate image blobs into label keywords directly using a machine translation model [46], which inspired several relevance models. These early works, including the Cross-Media Relevance Model [92], the Continuous-Space Relevance Model [119] and the Multiple Bernoulli Relevance Model [49], assume that the blobs and tags are conditionally independent given a specific image. Besides, the algorithm in [22] is designed to model the joint distribution between tags and visual features with a mixture distribution, while [145] models the visual and semantic relationship via a Bayesian network.

Meanwhile, latent space models derived from natural language and text processing, including Latent Semantic Analysis [45] and Probabilistic Latent Semantic Analysis [76], as well as variants of Latent Dirichlet Allocation models [4, 148, 132, 108, 158], have been successfully applied to image annotation.

Beyond these large groups, the authors of [195] propose a semi-supervised formulation based on linear regression with a tag-biased regularization.

These methods usually have unsatisfactory performance since the probabilistic models are too global to capture the nonlinear relations between images.
2.2.2 Discriminative Models

Image annotation can also be viewed as a classification problem where each keyword is treated as an independent class. As a state-of-the-art classifier, the Support Vector Machine (SVM) has been shown to be highly effective when handling high-dimensional data like images. An SVM classifier is basically a binary classifier, so in order to adapt it to image annotation tasks, which require multiple classifiers, some SVM-based annotation models first train a separate SVM for each concept, with each classifier generating a probability, and later fuse all the SVM classifiers together to obtain a final confidence score for each tag [36, 47]. Further, a batch-mode re-tagging method is proposed in [27], where an SVM with augmented features learns a set of adapted classifiers to refine the existing noisy tags.

Besides SVMs, a group of other discriminative models have been successfully applied to image annotation. [139] assigns tags by a k-nearest-neighbor classifier combined with multiple distance metrics. [144] applies a structural model to attribute-based image classification, and transfers the user inputs as well as the attribute-class mapping results to predicted tags. [13] learns the class labels by exploiting the group lasso technique and minimizing ranking errors. For these methods, both the training and testing phases are commonly computationally expensive. However, [72] proposes a max-margin formulation that models the dense pairwise label correlations and reduces the complexity from exponential to linear. [157] also learns a multi-label classifier that explicitly and efficiently models the dependencies between submodular pairwise labels via graph cuts, and directly optimizes the F-score.
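The one-versus-rest strategy described above (one binary scorer per tag, fused into a confidence score per tag) can be sketched as follows. For brevity, a ridge-regression scorer stands in for the per-concept SVM; this is a minimal illustration, not the actual systems of [36, 47].

```python
import numpy as np

def train_one_vs_rest(X, Y, lam=1.0):
    """One linear scorer per tag. X is n x d features; Y is n x m with
    Y[i, k] = 1 if image i carries tag k. Ridge regression stands in
    for the per-concept SVM classifier."""
    n, d = X.shape
    # Solving all m tags at once: W is d x m, one column per tag
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    return W

def tag_scores(X, W):
    """Fuse the per-tag scorers: one confidence score per (image, tag)."""
    return X @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                    # 50 images, 10-d features
Y = (rng.random((50, 5)) < 0.3).astype(float)    # 5 tags, binary labels
W = train_one_vs_rest(X, Y)
print(tag_scores(X[:2], W).shape)                # (2, 5)
```

Each tag is treated as an independent class exactly as the text describes; the shared solve over all columns of Y is what makes the batch formulation cheap for a linear scorer, whereas m separate SVMs would each need their own training run.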
In [133], a multiview Hessian discriminative sparse coding method is presented, which exploits Hessian regularization to steer the solution so that it varies smoothly along geodesics in the manifold, and treats the label information as an additional feature view to incorporate discriminative power for image annotation. In [79], R. Hong et al. explore both positive and negative tag correlations and propose a method with discriminative feature mapping, which selects the effective features from a large and diverse set of low-level features for each concept under a multiple-instance learning setting.

Despite their considerable performance in learning image annotations, this group of algorithms shares the same shortcomings: they scale poorly to large datasets or large tag dictionaries, and they also perform unsatisfactorily when the training tags are incomplete or noisy.

2.2.3 Search-based Models

Since image annotation is a highly nonlinear problem, parametric models might not be sufficient to capture the complex distribution of the data, and recent works on image tagging have therefore mostly focused on nonparametric nearest-neighbor methods, which offer higher expressive power. Search-based approaches have gained much popularity in exploring tag relevance due to their feasibility on large-scale data. Recent studies on image annotation show that search-based approaches are more effective than both generative and discriminative models [67, 202]. Here, we briefly review the most popular search-based approaches developed for image annotation.
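The core mechanism shared by these search-based methods is propagating tags from visually similar training images to the query. A minimal distance-weighted voting sketch follows; it is a simplified illustration of the common idea, not any one of the published systems reviewed below.

```python
import numpy as np

def knn_tag_vote(x_query, X_train, Y_train, k=5):
    """Score each tag for a query image by distance-weighted voting among
    its k nearest training images. Y_train is n x m binary (image x tag)."""
    dist = np.linalg.norm(X_train - x_query, axis=1)   # distances to all training images
    nn = np.argsort(dist)[:k]                          # indices of the k nearest
    w = 1.0 / (dist[nn] + 1e-8)                        # closer neighbors vote more
    scores = (w[:, None] * Y_train[nn]).sum(axis=0) / w.sum()
    return scores                                      # one relevance score in [0, 1] per tag

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))                   # visual features
Y = (rng.random((100, 6)) < 0.2).astype(float)  # 6-tag binary label matrix
s = knn_tag_vote(X[0], X[1:], Y[1:], k=10)
print(s.shape)  # (6,)
```

The methods reviewed next differ mainly in how the neighborhood and the vote weights are defined: a learned similarity graph, kernel density estimates, or sparse reconstruction coefficients.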
TagProp [67] constructs a similarity graph over all images, and propagates the label information from the training images to the testing images via this graph. In [123], a majority voting scheme among the neighboring images is proposed. [130] obtains the tag relevance score using kernel density estimation, and then performs a random walk to boost the primary tag relevance score over a tag proximity graph constructed from the neighboring images. A sparse coding scheme is proposed in [59] to select semantically related images for tag propagation, and then local and global ranking aggregation is adopted to down-weight the noisy tags. Besides, a conditional random field model is adopted in both [93] and [203] to capture the spatial correlation between annotations of neighboring images, but [93] embeds kernelized logistic regression with multiple visual distance metric learning, while [203] optimizes the model by maximizing the margins of the hinge loss function.

This category of work usually concentrates on search techniques or visual-semantic consistency problems, where much attention has been paid to learning effective and efficient distance metrics.

2.2.4 Neural Network based Models

Neural network based image annotation models typically include the conventional artificial neural network (ANN) [55] and the recently developed deep convolutional neural network (CNN) [109]. An ANN consists of multiple layers of nodes called neurons, and nodes in different layers are connected by edges with corresponding weights. Each neuron works by feeding the outputs of the previous layer, weighted by its connecting edges, into an activation function to generate its output. Figure 2.1 shows how an ANN annotates an image with three tags. As an example, four 3-layer ANNs are used in [112] to annotate image regions.

Figure 2.1: Annotating an image with an ANN [220].
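The neuron computation just described (weighted inputs passed through an activation function, layer by layer) can be sketched as a tiny forward pass. The layer sizes and random weights below are placeholders for illustration, echoing the three-tag setup of Figure 2.1 rather than reproducing the network of [112].

```python
import numpy as np

def forward(x, layers):
    """One ANN forward pass: each layer feeds the previous layer's output,
    weighted by its connecting edges, through a sigmoid activation."""
    a = x
    for W, b in layers:
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))   # each neuron's activation
    return a

rng = np.random.default_rng(0)
# A 3-layer net: 16-d visual feature -> 8 hidden neurons -> 3 tag scores
layers = [(rng.normal(size=(8, 16)), np.zeros(8)),
          (rng.normal(size=(3, 8)), np.zeros(3))]
feature = rng.normal(size=16)
scores = forward(feature, layers)
print(scores.shape)   # (3,) -- one confidence value per tag
```

Training (i.e., fitting the weight matrices by backpropagation) is omitted; the sketch only shows how a trained ANN turns a visual feature into per-tag confidences.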
Very recently, deep convolutional neural networks (CNNs) have demonstrated promising results for image classification [109], and features based on CNNs have also shown the potential to significantly boost performance in image annotation and tagging [64, 180]. [64] proposes a DNN-based feature that combines convolutional architectures with approximate top-k ranking objectives, and substantially outperforms traditional visual features in multiple image annotation jobs. [34] builds many sparsely connected neural layers by training only the winner-take-all neurons, which yields large network depth and excellent performance on image annotation tasks. In [214], Yang et al. apply a deep neural network to establish the correlations between visual features and semantics, and address the imbalanced keyword distribution by incorporating keyword frequencies and log-entropy.

This group has some distinctive pros and cons. For massive input and output data, when we have no idea what the mapping function between the two is, a neural network can learn this function without our having to provide it explicitly. It also handles defective datasets with noise and missing variables well. Nevertheless, what emerges from a neural network's weights can be difficult to understand; it may work, but it is hard to explain the literal and physical meaning, and there are no theoretical guarantees. Moreover, its training often takes longer than certain other machine learning methods.

2.3 Image Tagging

Image tags were initially applied to improve the performance of content-based image retrieval [114], and image tagging works were then developed to generate tags by associating semantic words with unlabeled images [5]. Probabilistic and language models were widely used in early models that match semantics and images [4, 5, 119]. As with the image annotation problem, the image tagging problem can also be formulated as a multi-label classification problem where each image can be assigned to more than one class simultaneously [14]. Following this idea, many multiclass techniques, including SVM, CRF and other works such as [14, 15, 115], have been modified to adapt to the image tagging problem.
Similar to the literature on image annotation, most existing image tagging works explore only the relationship between the visual features and the tags, for instance, the direct mapping between the visual and tag spaces, or the probabilistic dependencies and graphical models between visual contents and tags [67, 126, 129, 201].

To achieve a better tagging performance, some works try to learn a better mapping by studying precise tag localization or an adaptive distance metric. [15] and [14] factorize the bag-of-words feature as a weighted sum of class histograms plus an error term to model the image content, and thus pose the multi-label classification problem as a rank minimization problem. [123] proposes to scalably and reliably learn the tag relevance vector of an image by accumulating the votes of the tags associated with its similar images (nearest neighbors). [52] and [202] apply distance metric learning methods to capture the dependency between visual and textual contents.

However, since additional tag information is observed in image tagging tasks compared with image annotation, some other works delve into the textual correlations among tags. [40] introduces a so-called Hierarchy and Exclusion graph to encode the rich semantic relations, including mutual exclusion, overlap and subsumption. [42] maps both images and text to a common semantic space using word embeddings, which improves the tagging performance by avoiding a direct cross-modal mapping that is usually impractical to construct.

Besides, other works mainly follow the ideas of topic models and matrix completion. They usually explore the mutual dependencies between tags and then solve an optimization problem derived from the image-tag relation [58, 151, 193, 215, 224].

In our study, we focus on the essential correlations among different tags, which can be effectively recovered from the incomplete and noisy observed tags by the noisy matrix recovery model. In order to provide a more comprehensive presentation of this model, we further review the image tag completion works as well as the closely related topic model based image tagging approaches as follows.
2.3.1 Image Tagging with Topic Models

The topic model was originally designed for document clustering [12, 11]; it discovers the abstract "topics" that occur in a collection of documents. Figure 2.2 shows a typical topic model pipeline. It is first assumed that there is a handful of "topics" in the collection of documents, as shown in the left column, and each topic can be modeled by a distribution over a set of words. The generation of a document can then be described as follows. First, a distribution over the topics (the histogram at right) is chosen; then, for each word, a topic assignment is chosen and the word is drawn from the corresponding topic [11].

Figure 2.2: An illustration of how a topic model works [11], where each topic is highlighted by a specific color.

In the last decade, topic models have been widely applied in image understanding and tag recovery applications [151, 225]. [206] applies a topic model to tag refinement by jointly modeling tag similarity and tag relevance. [108] uses LDA [12] to discover latent topics from resources with complete tag annotations, and the discovered topics are then used to recommend topics for new resources annotated with only a few tags. [158] presents a topic-regression multi-modal LDA for image annotation. However, all these methods focus on the simple co-occurrence of tags and fail to capture their underlying dependencies, and thus work poorly on imperfect tags. Recently, [151] encodes the textual tags as relations among the images, and then uses a topic model to learn the image content and modify their encoded relations. [87] extends traditional LDA to noisy tags by additionally introducing a general distribution, unrelated to the image content, which generates the noisy tags. The key limitations of these topic models are that (i) they have to solve a non-convex optimization problem, and (ii) they usually do not have any theoretical guarantee on the learned models.
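The generative process summarized above (draw a topic mixture for the document, then for each word draw a topic assignment and a word from that topic) can be sketched directly. The vocabulary and the two topic distributions below are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["sky", "sea", "dog", "cat", "grass", "cloud"]
# Two "topics", each a distribution over the vocabulary (toy values)
topics = np.array([[0.40, 0.30, 0.00, 0.00, 0.05, 0.25],   # a 'scenery' topic
                   [0.00, 0.00, 0.45, 0.45, 0.10, 0.00]])  # an 'animal' topic

def generate_document(n_words, alpha=(1.0, 1.0)):
    """LDA-style generation: draw the document's topic distribution theta,
    then per word draw a topic assignment z and a word from topics[z]."""
    theta = rng.dirichlet(alpha)                 # topic mixture for this document
    words = []
    for _ in range(n_words):
        z = rng.choice(len(topics), p=theta)     # topic assignment for this word
        words.append(str(rng.choice(vocab, p=topics[z])))
    return words

doc = generate_document(8)
print(len(doc))  # 8
```

Fitting a topic model inverts this process: given only the observed words (here, tags), infer the topics and the per-document mixtures. The non-convexity of that inference is precisely the limitation (i) noted above.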
2.3.2 Image Tag Completion

There are only a handful of studies fitting the category of tag completion with both incomplete and noisy tags. [226] proposes a data-driven framework for tag ranking that optimizes the correlation between visual cues and assigned tags. In [129], the noisy tags are first removed based on visual and semantic similarities, and then tags are obtained by expanding the observed tags with their synonyms and hypernyms using WordNet. [201] proposes to search for the optimal tag matrix that is consistent with both the observed tags and the visual similarity. [190] proposes to complete the missing tags by local linear learning, which constructs a unified objective function to calculate the tag scoring vector for each image within its neighborhood. In [208], the authors propose an image-tag re-weighting scheme to adjust the penalty of each tag and image based on both image similarities and tag associations, and thereby formulate a unified re-weighted empirical loss function to handle the defective setting with both incomplete and noisy tags. Despite their successful applications, none of these studies provides any theoretical guarantee for its approach.

Besides, matrix decomposition is adopted in the literature, including [15, 149, 223, 224], to handle both missing and noisy tags. [134] formulates tag completion as a non-negative data factorization problem. [126] exploits sparse learning techniques to reconstruct the tag matrix. The key limitation of these approaches is that they require a fully observed tag matrix with a small number of errors, making them inappropriate for the tag completion problem.
2.4 Image Annotation by Metric Learning

It is ubiquitous in research and engineering communities, including machine learning, computer vision, information retrieval and data mining, to seek appropriate measures of the distance or similarity between data points, which has driven the emergence of distance metric learning (DML) [9]. The Euclidean distance is the simplest and most commonly used distance metric, but despite its ease of use, it is hardly able to capture the irregularities and idiosyncrasies of complicated and versatile data. The study of DML can be traced back to 2002 [205], and it immediately became a hot topic that inspired much research. Yang et al. [210], Kulis [110] and Bellet et al. [9] provide comprehensive and detailed surveys on this topic, covering problem formulations, optimization and applications. Given the rich literature on this subject, we only discuss the metric learning studies closely related to image annotation, and refer the reader to [9, 67, 202, 210] for more detailed surveys where necessary.

The goal of distance metric learning is to take advantage of prior information, in the form of labels/tags or pairwise constraints, to create a projection of the data into another space such that relevant images have smaller distances and share more labels, while irrelevant images have larger distances and share fewer labels. According to the linearity of the projection, we roughly categorize DML into two groups: linear and nonlinear distance metric learning. Besides, there are also some extensions, including online learning and local metric learning. Certain parts of these groups may overlap.

2.4.1 Linear Distance Metric Learning

Most linear DML methods assume data points lie in a finite linear space, and focus on the Mahalanobis metric learning setting, written as
$$d_M(x, x') = \sqrt{(x - x')^{\top} M (x - x')},$$
where the metric $M$ should be symmetric and positive semi-definite (PSD). Most notable works for learning such a Mahalanobis distance fall into several groups [210].
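The PSD requirement on $M$ is what makes $d_M$ a valid (pseudo)metric: any PSD $M$ factors as $L^{\top} L$, so the Mahalanobis distance is just the Euclidean distance after the linear projection $x \mapsto Lx$. A minimal sketch of this equivalence:

```python
import numpy as np

def mahalanobis(x, xp, M):
    """d_M(x, x') = sqrt((x - x')^T M (x - x')) for symmetric PSD M."""
    diff = x - xp
    return np.sqrt(diff @ M @ diff)

rng = np.random.default_rng(0)
L = rng.normal(size=(3, 5))      # any linear projection (here rank 3)
M = L.T @ L                      # M = L^T L is automatically symmetric PSD
x, xp = rng.normal(size=5), rng.normal(size=5)

d1 = mahalanobis(x, xp, M)
d2 = np.linalg.norm(L @ x - L @ xp)   # Euclidean distance after projecting by L
print(np.isclose(d1, d2))  # True
```

This factorization view also explains why enforcing the PSD constraint is the computational bottleneck discussed later: a learned $M$ that drifts off the PSD cone no longer corresponds to any projection $L$ and can produce negative squared distances.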
The first group learns metrics with explicit class labels (which may also be referred to as tags, concepts or keywords). For instance, NCA [63] explicitly learns metrics through a k-nearest-neighbor classification; MCML [60] constructs a convex problem leading to a metric that collapses same-class samples to a single point and pushes samples in the other classes infinitely far away; LMNN [196] extends the k-NN based works by achieving maximal-margin nearest-neighbor classification; and LDML [68] models the image similarity using a posteriori class probabilities and obtains the distance metric by maximizing the log-likelihood.

The second group learns metrics from pairwise constraints and typically includes the following examples. NMC [205] proposes a convex formulation that maximizes the sum of distances between dissimilar points while keeping the sum of distances between similar examples small. RCA [3] learns a distance metric through a set of positive constraints (must-link), and later DCA [78] and ERCA [216] extend RCA by additionally introducing negative constraints (cannot-link) at the cost of a more expensive algorithm. LRML [77] provides a semi-supervised metric by integrating the unlabeled data information and a graph regularization. ITML [39] introduces a LogDet divergence regularization and minimizes the differential entropy under both positive and negative constraints. In ITML, a Bregman divergence defined as
$$D_{\mathrm{ld}}(M, M_0) = \mathrm{tr}(M M_0^{-1}) - \log\det(M M_0^{-1}) - d,$$
where $d$ is the dimension of the input space, is introduced to keep the learned metric as close as possible to the Euclidean metric ($I$). In ITML, the positive semi-definiteness of $M$ is guaranteed automatically and quite easily by minimizing $D_{\mathrm{ld}}(M, M_0)$, due to the fact that the LogDet divergence is finite if and only if $M$ is positive definite. Furthermore, [91] and SDML [160] follow this idea and propose more efficient Mahalanobis distance learning algorithms. Besides the LogDet divergence regularization, SDML [160] also employs an extra $L_1$ regularization on the off-diagonal elements of $M$ to speed up the computation in high-dimensional spaces while keeping it theoretically sound. Moreover, to handle noisy constraints, RML [82] minimizes the worst-case violation over all possible sets of correct constraints.
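The LogDet divergence used by ITML above is straightforward to evaluate, and a direct computation also illustrates the two properties the text relies on: it vanishes exactly when $M = M_0$, and it blows up when $M$ loses positive definiteness.

```python
import numpy as np

def logdet_divergence(M, M0):
    """D_ld(M, M0) = tr(M M0^-1) - log det(M M0^-1) - d.
    Finite only when M is positive definite (relative to M0)."""
    d = M.shape[0]
    A = M @ np.linalg.inv(M0)
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        return np.inf                 # det(M M0^-1) <= 0: M not positive definite
    return np.trace(A) - logdet - d

d = 4
I = np.eye(d)
rng = np.random.default_rng(0)
B = rng.normal(size=(d, d))
M = B @ B.T + 0.1 * I                 # a positive definite metric

print(logdet_divergence(I, I))        # 0.0 -- identical metrics
print(logdet_divergence(M, I) > 0)    # True -- strictly positive otherwise
```

Because every intermediate iterate with finite divergence is necessarily positive definite, ITML never needs an explicit projection onto the PSD cone, which is the efficiency argument made above.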
Exceptionally, despite the popularity of DML algorithms that handle class labels and constraints, only a few works are designed to handle other types of supervised information such as annotated tags. For image annotation tasks, [93, 200, 202] propose to learn metrics from implicit side information instead of class assignments or pairwise constraints. KCRF [93] embeds a kernelized logistic regression (KLR) with multiple visual distance learning into a unified conditional random field (CRF) framework. PRCA [200] first proposes a probabilistic metric learning from probabilistic side information based on a graphical model. UDML [202] unifies both inductive and transductive metric learning techniques to effectively exploit both visual and textual image contents. Besides, MLR [142] learns a metric for solving ranking and retrieval tasks, and its extension R-MLR [124] additionally deals with noisy features using a mixed $L_{2,1}$ norm to ignore most of the irrelevant features.

Furthermore, linear similarity metric learning is often an alternative to linear distance metric learning. The only difference is that a similarity measure does not necessarily have distance properties, especially the PSD and symmetry requirements; as a result, it is usually more flexible and scalable to large data. Typical linear similarity metric learning algorithms include SiLA [159], OASIS [25], SLLC [8] and RSL [30].

2.4.2 Nonlinear Distance Metric Learning

Due to the multimodal distributions of real-world data, a number of nonlinear distance metric learning approaches have recently been developed to tackle these nonlinear patterns. The main idea of nonlinear metric learning is to learn a linear metric in a reproduced nonlinear feature space. Depending on how the nonlinear mapping is constructed, the nonlinear DML family is usually classified into two categories: boosting based approaches [73, 74, 172] and kernel based approaches [39, 78, 187].
Typical boosting methods are as follows. BoostDist [73] combines boosting hypotheses over the product space with a weak learner based on partitioning the original feature space. BoostMetric [173] applies a set of positive semi-definite matrices with trace and rank equal to one as weak learners in a boosting based learning process. And GB-LMNN [98] applies gradient boosting to learn a nonlinear mapping directly in function space.

Initial kernel metric learning (KML) algorithms, such as KPCA [169], Kernel NMC [205], Kernel MCML [60], Kernel DCA [78], KLMCA [187], Kernel ITML [39] and KernelBoost [74], directly extend their linear or boosting based counterparts to kernel metric learning using the kernel trick. Although several approaches have been empirically shown to be kernelizable, in the general kernelized setting, kernelizing a specific metric learning method is not trivial. It involves a new problem formulation where the interface to the data is limited to inner products, and an $n \times n$ matrix must be learned. Besides, as the number of training examples $n$ increases, the problem becomes intractable. Together, these problems yield a solution that is extremely different from, and more difficult than, the linear case. To address this, a handful of general kernelization extensions [24, 219] have been developed based on KPCA [169]. A so-called KPCA trick, which introduces a kernel to project the data into a nonlinear space followed by a dimensionality reduction strategy, is adopted, and its soundness is justified theoretically through representer theorems [24]. It is also possible to obtain general kernelization through the equivalence between Mahalanobis distance learning and linear transformation kernel learning with spectral regularizers [90, 89]. In practical implementations, an appropriate kernel function can be selected through the multiple kernel framework proposed in [191].
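The KPCA trick mentioned above (embed the data nonlinearly via kernel PCA, then run any off-the-shelf linear metric learner on the embedded coordinates) can be sketched in plain numpy. The RBF kernel and component count below are arbitrary choices for illustration; this is a minimal sketch, not the formulation of [24, 219].

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Pairwise RBF kernel matrix of the rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kpca_embed(K, n_components):
    """Kernel PCA: center the kernel matrix, eigendecompose, and return the
    data's coordinates along the top principal directions in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc = H @ K @ H
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]  # keep the largest ones
    vals, vecs = vals[idx], vecs[:, idx]
    return vecs * np.sqrt(np.maximum(vals, 0.0)) # n x n_components embedding

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))
Z = kpca_embed(rbf_kernel(X), n_components=5)
print(Z.shape)  # (30, 5)
```

Any linear DML method from Section 2.4.1 can now be trained on `Z` directly, which is exactly how the trick sidesteps reformulating each algorithm in terms of inner products.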
Inparallel,someotherKMLworksstraightforwardlyproposenewkernelbasedmetric learningframeworks.[2]proposesanexplicitkerneltransformationtotackleaconstrained traceratiooptimizationproblem.Itexploitsbothpositiveandnegativeconstraintsandas wellasthetopologicalstructureofdata.ThesuggestedimplementationofthisKMLalgo- rithmisquitee "cientsinceitisnotnecessarytolearnallentriesinthe n$nmetricmatrix. NAML[26]formulatesatracemaximizationproblemtojointkernellearning,dimension reductionandclusteringtogetherandsolvesitinaEMframework.[209]proposesasupport vectormetriclearning(SVML)thatco-jointsaMahalanobisdistanceandtheSVMmodel withaRBFkernel,wherethePSDconstraintisautomaticallyguaranteedandthemetric 30canbemadelowrank. However,althoughliteraturehasshownthatkernelmetriclearningmaydramatically improvethequalityoflearneddistanceoverhighlynonlineardata,italsosu !ersfromthe computationalburdenandeasilycausedataoverÞtting,whichresultsinapoorgeneralization performance. 2.4.3OnlineMetricLearning Aspreviouslystated,amainchallengeinlinearornonlineardistancemetriclearningisto enforcethelearnedmetrictobepositivesemideÞnite(PSD),whichturnsouttobevery computationallyexpensiveintermsofbothtimeandspace,especiallywhendealingwith largescaleproblems.Onlinelearningiscontraryveryusefulinhandlingtheseproblemby gettingridofthebottleneckofPSDrequirementsandthusgainsgreatpopularity,though itoccasionallyperformsabitinferiortobatchalgorithms.Prominentonlineworkscanbe referredtoPOLA[171],LEGO[91],OASIS[25],RDML[95]andMDML[111].POLA[171] istheÞrstonlineMahalanobisdistancelearningapproach,whichprovidesaregretbound andisdonequitee "ciently.LEGO[91]learnsmetricsinanonlinesettingusingaLogDet regularization,andOASIS[25]isasimilaritymetriclearningwhichscaleslinearlywiththe datasizethroughonlinelearningofabilinearmodelusingamargincriterionandane "cient hingeloss.RDML[95]solvesaconvexquadraticprogramineachiterationstepinsteadof doingeigenvaluecomputationlikePOLA,anditperformscomparablytoLMNNandITML 
yet much faster. MDML [111] is based on composite mirror descent and can accommodate a large class of loss functions and regularizers for which efficient updates are derived. Besides, both MCML [60] and ITML [39] have online versions with excellent performance.

2.4.4 Local Metric Learning

The previous studies learn a global linear or nonlinear metric, which may be incapable of capturing the complexity of heterogeneous data. It may instead be beneficial to use local metrics that vary across the space, which have been shown to significantly outperform global methods at the expense of higher time and memory requirements. [211] presents a Local Distance Metric (LDM) that aims to optimize local compactness and local separability in a probabilistic framework. Multiple Metric LMNN (M²-LMNN) [197, 198] learns several Mahalanobis distances in different parts of the space, which are partitioned by clustering algorithms. GLML [113] leverages the power of generative models in the context of metric learning by locally and simultaneously minimizing the asymptotic probability of misclassification and the bias caused by finite sampling. In [192], PLML is proposed, which learns local metrics as linear combinations of basis metrics defined on anchor points over different regions of the instance space; it is quite robust to overfitting thanks to its global manifold regularization. Further, [86] extends PLML by regularizing the anchor metrics to be low rank, which allows a better optimization to achieve the optimal metric.

2.4.5 Other Metric Learning

Besides, there are also a few approaches that fall outside the scope of the previous categories.
For instance, multi-task metric learning is designed for the multi-task setting: given a set of related tasks, a metric is learned for each task in a coupled fashion in order to improve the performance on all tasks. Typical multi-task metric learning algorithms include mt-LMNN [156], MLCS [212], GPML [213] and TML [222]. As for sparse metric learning, typical examples include LPML [166] and SML [217], which favor sparsity through L1-norm and L(2,1)-norm regularization, respectively. However, LPML is not guaranteed to be low rank, while SML suffers from complexity issues in high dimensional problems. Besides, a unified and general framework for sparse metric learning is proposed in [83, 84].

2.5 Image Tagging by Matrix Completion

Literally, matrix completion means completing partially specified matrices into fully specified matrices satisfying certain prescribed properties. The matrix completion problem can be dated back to 1990, when Johnson claimed in [96] that, given a few assumptions about the nature of the matrix, the expected matrix can be reconstructed. These assumptions include the positive semidefinite property, the contraction property and a given rank assumption [96, 118]. A breakthrough occurred in 2009 when Candès and Recht [20] proved that a low-rank matrix can be reconstructed via convex optimization of the nuclear norm. By now, low rank matrix completion has become a recurring problem in many fields, for example, collaborative filtering [62] (notably, the Netflix challenge) and computer vision problems including structure-from-motion [186], multi-classification [1, 15] and global positioning [174], among many others. We refer to [19] for a discussion of more applications.
2.5.1 Low Rank Matrix Recovery with Nuclear Norm Minimization

Since finding the lowest rank matrix satisfying the equality constraints is NP-hard [31] and the matrix rank function is non-convex, a popular approach is to replace the rank with the nuclear norm, its tightest convex relaxation [19, 21]. The theoretical basis for such relaxation is provided in [21, 162]: under favorable conditions, minimization of the rank function can be achieved by the nuclear norm, which lays the foundation for later work on matrix completion. With the nuclear norm, it is possible to accurately recover a low rank matrix from a small fraction of its entries even if they are corrupted with noise [19, 20, 105].

In the noiseless setting, the matrix completion problem is considered as exact or near-exact recovery, where relevant works [21, 66, 99, 161, 174] investigate the minimum number of random observations required to exactly reconstruct a low rank matrix by constrained nuclear norm minimization. [21, 99, 174] prove that O(nr · poly(ln n)) observed samples are required to recover a rank-r n × n matrix in special cases. [66] develops more general methods and improves that result by introducing a degree of incoherence µ between the unknown matrix and the basis, finally showing that O(nr µ ln² n) randomly sampled entries are sufficient to recover any low-rank matrix with high probability. [161] simplifies the previous arguments and sharpens the results of [21, 99, 174] by providing a bound on that number which is optimal up to a small numerical constant and one logarithmic factor. These results thus provide theoretical guarantees for nuclear norm constrained minimization methods.

In a parallel line of work, noisy matrix completion, which is more common and in which some observed entries are corrupted with noise, has also been extensively studied [19, 53, 54, 100, 102, 103, 104, 105, 107, 150, 165]. The observed noisy matrix is usually modeled as A = L + S, where L is an unknown low rank matrix and S corresponds to the noisy corruptions. Compared to the noiseless setting, noise can severely harm the matrix completion results; as shown in [207], nuclear norm minimization can fail to recover the low rank matrix even if S contains only a single non-zero column.
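The nuclear norm surrogate discussed above is typically optimized by soft-thresholding singular values inside a proximal-gradient loop. The sketch below is a generic illustration of this idea on synthetic data, not any specific cited algorithm; the parameter values are arbitrary.

```python
import numpy as np

def svt(M, tau):
    """Proximal operator of the nuclear norm: soft-threshold singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete(A_obs, mask, tau=0.5, iters=300):
    """Proximal-gradient matrix completion:
    minimize 0.5 * ||P_Omega(X - A)||_F^2 + tau * ||X||_*
    Step size 1 is valid: the masked squared loss has Lipschitz constant 1."""
    X = np.zeros_like(A_obs)
    for _ in range(iters):
        grad = mask * (X - A_obs)   # gradient of the loss on observed entries
        X = svt(X - grad, tau)      # nuclear-norm proximal step
    return X

# Synthetic rank-2 ground truth with roughly 60% of entries observed
rng = np.random.default_rng(1)
L = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))
mask = (rng.random((30, 30)) < 0.6).astype(float)
X_hat = complete(mask * L, mask, tau=0.5, iters=300)
```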
When all entries of A are observed, the matrix completion problem becomes a matrix decomposition problem. [23] assumes S is sparse and proves that L and S can be perfectly recovered under additional sufficient identifiability conditions; milder conditions are further given in [81]. RPCA [18] studies the same model with an element-wise sparse S in which the corruption positions are sampled uniformly at random, while [207] considers a column-wise sparse S, where the uncorrupted columns are chosen uniformly at random and recovery is guaranteed as long as L is low rank.

When only partial entries of A are observed, the matrix completion problem is regarded as approximate matrix recovery. It is first systematically addressed in [18] in the noiseless framework with an element-wise sparse S, where the corruption positions are sampled uniformly at random. [29] improves on this by considering a column-wise sparse S, and it proves that the uncorrupted columns of L can be recovered and the corruption positions in S identified, as long as the following assumptions are satisfied: the uncorrupted columns are chosen uniformly at random, L is low rank, the number of corrupted columns is limited, and the number of observed uncorrupted entries is sufficient. Recently, both element-wise and column-wise corruptions have been simultaneously addressed in [105], where high probability recovery of L requires only an upper bound on the maximum of the absolute values of L and S, instead of the rank of L and the sparsity level of S as in previous studies.

In the noisy/approximate matrix recovery setting, most works pursue low rank matrix reconstruction by minimizing the nuclear norm under uniform sampling [19, 54, 100, 102, 107]. Keshavan et al. [100] improve on the results of [19] and achieve reconstruction guarantees that are order-optimal in a variety of circumstances. Foygel et al. [54] present reconstruction guarantees based on an analysis of the Rademacher complexity of the nuclear norm unit ball [179]. Koltchinskii et al. [107] propose a nuclear norm penalization with a fast convergence rate that is shown to be optimal up to logarithmic factors in a minimax sense and is equipped with a non-minimax lower bound. Later, the case of unknown noise variance is addressed in [102], where the authors propose a reconstruction estimator that achieves, up to a logarithmic factor, optimal rates of convergence under the Frobenius risk; this estimator yields matrix completion performance comparable to previous studies [19, 107, 150, 165] that assume a known noise deviation.

A common strategy to solve the resulting convex optimization problem is the iterative scheme, with typical algorithms including [16, 138, 185]. Besides, OptSpace, a low complexity algorithm based on a combination of spectral techniques and manifold optimization, is first introduced in [99] to handle the exact recovery problem, and its robustness to the noisy matrix setting is theoretically proved in [100]. Other efficient nuclear norm minimization solvers [57, 88, 94, 141] have also been intensively studied. However, most of them fail to solve large scale problems, making nuclear norm regularization less feasible in practice despite its strong theoretical guarantees. Fortunately, it has recently been shown that large scale matrix completion problems can be solved through a parallel stochastic gradient algorithm [163], or by an efficient nuclear norm solver via active subspace selection [80].

The nuclear norm has been applied as a regularizer to image classification [15, 61, 71], visual recovery [136, 149] and tag-relevant tasks including image tag refinement [224] and image tag completion [51], where the nuclear norm is used to enforce correlations between classifiers or tags. In the matrix completion/recovery scheme, most studies adopt smooth losses, including the common squared loss [54, 61, 184], the sparse ℓ1-norm loss [57, 224], the logistic loss [15, 61], the maximum margin estimator [136], and even general Lipschitz losses [53].
2.5.2 Low Rank Recovery under Other Constraints and Sampling Distributions

Most matrix completion works focus on uniform sampling and nuclear norm regularization, which might be unrealistic since in practice the observed entries are not guaranteed to follow a uniform scheme and their distribution is not known exactly. To overcome this limitation, some researchers search for better sampling distributions [104, 105], while others develop more suitable surrogates of the matrix rank [53, 54, 150, 17].

In [104, 105], uniform sampling is replaced by a general and unknown sampling distribution within the nuclear norm minimization framework. The condition needed in [104] is much milder in that it requires only an upper bound on the maximum absolute value of the entries of A, instead of bounds on both L and S as in [105].

It is shown in [178] that the standard nuclear norm might perform poorly, and a common alternative is the empirically weighted nuclear norm [53, 150, 178], which incorporates prior knowledge of the sampling distribution that can be computed from the locations of the observed entries. Besides, a direct rank-penalized estimator, obtained by hard thresholding of the singular values of A, is proposed in [103], where a general oracle inequality for the prediction error is established. In parallel, [165] introduces a Schatten-p norm based penalization and also establishes prediction error bounds for matrix completion. Recently, the so-called max norm [127] has been proposed as another convex surrogate of the matrix rank, defined as
\[
\|M\|_{\max} = \min_{M = UV^\top} \|U\|_{2,\infty}\,\|V\|_{2,\infty},
\]
where \(\|\cdot\|_{2,\infty}\) is the maximum ℓ2 row norm of a matrix. The max norm is first applied to matrix completion under the uniform sampling distribution in [54]. Later, a max norm constrained minimization method is proposed in [17] for noisy matrix completion under a general sampling model, which is shown to be minimax rate-optimal and yields a unified and robust approximate recovery guarantee.

Chapter 3
Image Annotation with Kernel Distance Metric Learning

A Regression based Kernel Metric Learning (RKML) algorithm is proposed within the image annotation framework in this chapter.
The remainder of the chapter is organized as follows. Section 3.1 motivates the problem and the main intuition behind the proposed algorithm, and sets up the notation. Section 3.3 is devoted to a detailed description of the proposed RKML algorithm and its extensions. The theoretical properties and guarantees, i.e., the bounds on the error between the computed kernel distance metric and its statistically optimal counterpart, are given in Section 3.4, with the omitted proofs deferred to Section 3.5. Section 3.6 presents the detailed implementation issues. Section 3.7 describes the extensive experimental setup, results and analysis. Section 3.8 summarizes the chapter, and Section 3.2 surveys the closely related works.

3.1 Motivation and Setup

Among the huge volume of image annotation algorithms, the search-based approach has proved to be quite effective, particularly for large image datasets with many keywords [67, 93, 139, 194]. The key idea is to annotate a test image I with the common tags shared by the subset of training images that are visually similar to I, which gives rise to an urgent need for an effective visual similarity measure between images. Due to the intricate complexities and nonlinear dependencies between image visual contents, we resort to Kernel Metric Learning (KML) [26, 39, 187] to tackle this problem, learning a kernel-based distance metric that pulls semantically similar images close and pushes semantically dissimilar images far apart. Figure 3.1 illustrates the empirical effect of applying an appropriate kernel distance metric to images associated with proper tags: as two images share more tags, their visual distance is shortened by the learned kernel distance metric.

Figure 3.1: Illustration of how a kernel distance metric acts on images with appropriate tags. In the left box, images share the tags marked in the same color as the lines connecting them.
Kernel metric learning has been widely used to address image similarity problems in image classification [39, 60, 187], clustering [2, 26, 205] and retrieval [77, 78]. Traditional kernel distance metric learning approaches [39, 60, 78, 187, 205] are usually extended from existing linear distance metric learning, and they typically find the optimal kernel metric by minimizing the distance between must-link images while simultaneously maximizing the distance between cannot-link images.

Despite the success of KML algorithms in those applications, they still suffer from two significant limitations. First, the high dimensionality of KML, denoted by d, usually leads to a high computational cost in solving the related optimization problems. In particular, to ensure that the learned metric is Positive Semidefinite (PSD), existing methods need to project the learned matrix onto a PSD cone, whose computational cost is O(d³) and thus relatively expensive. Although online learning algorithms [25, 39, 91] are able to avoid the PSD requirement, they need to train on a considerable amount of data and still take remarkable time to reach a reasonable performance. Secondly, the high dimensionality in the kernel metric learning process may lead to overfitting of the training data [95], which ultimately reduces the annotation performance. To address the overfitting problem, some studies try to find better kernels with boosting methods [74, 172], some directly reduce the dimensionality of the projected data [26, 187], and others add a regularizer [95]. However, none of them has solid theoretical support in dealing with the overfitting problem.
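The O(d³) PSD cone projection mentioned above is simply an eigendecomposition followed by clipping of the negative eigenvalues; the sketch below shows this standard operation (which RKML is designed to avoid), not any specific cited implementation.

```python
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone by clipping negative
    eigenvalues.  The eigendecomposition makes this an O(d^3) step."""
    M = (M + M.T) / 2.0                     # symmetrize first
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.maximum(vals, 0.0)) @ vecs.T

# A symmetric matrix with one negative eigenvalue
M = np.array([[2.0, 0.0],
              [0.0, -1.0]])
P = project_psd(M)
```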
Unlike most linear or kernel metric learning algorithms in similar setups, including image classification, clustering and retrieval, which deal with binary semantic constraints (must-link or cannot-link to a label), the proposed RKML algorithm is able to handle numeric semantic constraints, which better represent the complex semantic relationships between images and thus make better use of the supervised information. Besides, the proposed RKML algorithm avoids the time-consuming PSD cone projection step by exploiting a special property of regression, whereby the PSD property is automatically guaranteed. Additionally, the overfitting risk that is easily incurred by high dimensionality and commonly exists in kernel metric learning is alleviated in RKML by appropriately regularizing the rank of the learned kernel metric matrix, instead of an independent norm (Frobenius or absolute norm) of the learned metric matrix. This strategy also facilitates the implementation by connecting RKML with the Nyström approximation, and thus speeds up the computation with limited storage memory required in the computation phase. Finally, the proposed RKML is equipped with theoretical guarantees, namely bounds on the error between the learned metric and the statistically optimal one, which is original and constructive for kernel distance metric learning.

3.2 Related Work

Due to the rich literature in both areas of image annotation and distance metric learning, here we only survey the studies closely related to this work. For a more comprehensive and detailed background review, please refer to Chapter 2.
According to [67], automatic image annotation methods can be categorized into three groups: (i) generative models [22, 49], which are designed to model the joint distribution between tags and visual features; (ii) discriminative models [47, 144], which view image annotation as a classification problem where each keyword is treated as an independent class; and (iii) search-based approaches [139, 194]. Recent studies on image annotation show that search-based approaches are more effective than both generative and discriminative models. Here, we briefly review the most popular search-based approaches developed for image annotation. TagProp [67] constructs a similarity graph over all images and propagates the label information via the graph. In [123], a majority voting scheme among the neighboring images is proposed. A sparse coding scheme is proposed in [59] to facilitate label propagation. A Conditional Random Field model is adopted in [93] to capture the spatial correlation between annotations of neighboring images.

Many algorithms have been developed to learn a linear distance metric from pairwise constraints [210], and some of them are designed exclusively for image annotation [93, 200, 202]. Recently, a number of nonlinear DML approaches have been developed to handle nonlinear and multimodal patterns. They are usually classified into two categories, boosting-based approaches [73, 74, 172] and kernel-based approaches, depending on how the nonlinear mapping is constructed. Many KML algorithms, such as Kernel DCA [78], KLMCA [187] and Kernel ITML [39], directly extend their linear counterparts to KML using the kernel trick. To handle the high dimensionality challenge in KML, a common approach is to apply dimensionality reduction before learning the metric [26, 187]. Although these studies show that dimensionality reduction helps alleviate the overfitting risk in KML, no theoretical support is provided.
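The search-based approaches surveyed above share a simple neighbor-voting core: score each tag by how often it appears among a test image's nearest training images. A minimal sketch on toy data, with illustrative names and distances of our own:

```python
import numpy as np

def knn_tag_scores(dists, Y_train, k=5):
    """Score tags for each test image by majority vote over its k nearest
    training images.  dists: (n_test, n_train) distance matrix;
    Y_train: (n_train, m) binary tag matrix."""
    nn = np.argsort(dists, axis=1)[:, :k]   # indices of the k nearest neighbors
    return Y_train[nn].sum(axis=1)          # (n_test, m) vote counts per tag

# Toy example: 4 training images, 3 tags, 1 test image
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [1, 1, 0]])
d = np.array([[0.1, 0.2, 0.9, 0.8]])        # test image is closest to images 0, 1
scores = knn_tag_scores(d, Y, k=2)
```

The top-scoring tags are then returned as the annotation; TagProp and [123] refine exactly this step with learned or rank-based neighbor weights.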
3.3 Annotating Images by Regression based Kernel Metric Learning (RKML)

To begin, let X = (x_1, ..., x_n)ᵀ be a set of training instances, where x_i ∈ R^d is a d-dimensional instance. Let m be the number of classes, and Y = (y_1, ..., y_n)ᵀ be the class assignments of the training instances, where y_i ∈ {0, 1}^m with y_{i,j} = 1 if x_i is assigned to class j, and zero otherwise. In image annotation, each image can be assigned to multiple classes, and thus each vector y_i may contain multiple ones. Let κ(x, x'): R^d × R^d → R be a kernel function, and let H_κ be the corresponding Reproducing Kernel Hilbert Space.

Without a metric, the similarity between two instances x_a and x_b can be assessed by the kernel function as
\[
\langle \kappa(x_a, \cdot), \kappa(x_b, \cdot) \rangle_{\mathcal{H}_\kappa} = \kappa(x_a, x_b).
\]
Similarly to linear distance metric learning algorithms, we modify the similarity measure with a kernel distance metric as
\[
\kappa(x_a, x_b) = \langle \kappa(x_a, \cdot), T[\kappa(x_b, \cdot)] \rangle_{\mathcal{H}_\kappa},
\]
where T: H_κ → H_κ is a linear operator learned from the training examples. The objective of kernel metric learning is to learn a PSD linear operator T that is consistent with the image tag assignments of the training examples. Note that this differs from similarity learning [25] because in distance metric learning we require T to be PSD.

3.3.1 Regression based Kernel Metric Learning

The proposed RKML is a kernel metric learning algorithm based on the regression technique.
Let s_{i,j} ∈ R be the similarity measure between two images x_i and x_j based on their annotations y_i and y_j. We note that s_{i,j} is a real-valued measurement, which differs from conventional studies of distance metric learning that only consider a binary relationship between two instances. The discussion of s_{i,j} is deferred to Section 3.6.1. We adopt a regression model to learn a kernel distance metric consistent with the similarity measure s_{i,j} by solving the optimization problem:
\[
\hat{T} = \arg\min_{T \succeq 0} \sum_{i,j=1}^{n} \frac{1}{2} \Big( s_{i,j} - \langle \kappa(x_i, \cdot), T[\kappa(x_j, \cdot)] \rangle_{\mathcal{H}_\kappa} \Big)^2 .
\]
Following the representer theorem of kernel learning [170], it is sufficient to assume that T̂ operates only in the subspace spanned by κ(x_i, ·), i = 1, ..., n, leading to the following definition of T̂:
\[
\hat{T}[f](\cdot) = \sum_{i,j=1}^{n} \kappa(x_i, \cdot)\, A_{i,j}\, f(x_j), \tag{3.1}
\]
where A ∈ R^{n×n} is a PSD matrix. Using (3.1), we can turn the optimization problem for T̂ into an optimization problem for A:
\[
\min_{A \succeq 0} \; \mathcal{L}(A) = \frac{1}{2} \, \| S - K A K^\top \|_F^2 , \tag{3.2}
\]
where K = [κ(x_i, x_j)]_{n×n} is the kernel matrix, S = [s_{i,j}]_{n×n} collects the pairwise semantic similarities between all pairs of training images, and ‖·‖_F denotes the Frobenius norm of a matrix. It is straightforward to verify that
\[
A = K^{\dagger} S K^{\dagger}
\]
is an optimal solution to (3.2), where K^† stands for the pseudo-inverse of K. Note that when the semantic similarity matrix S is PSD, A is also PSD, so no additional projection is needed to enforce that the linear operator T̂ is PSD. To avoid overfitting, we replace K with K_r, the best rank-r approximation of K, and express A as
\[
A = K_r^{-1} S K_r^{-1} . \tag{3.3}
\]
Evidently, the rank r trades off bias against variance in estimating A: the larger the rank r, the lower the bias and the higher the variance. This will become clearer in the theoretical analysis in Section 3.4.

3.3.2 Extension to Image Feature Dimension Reduction

Using the learned linear operator T̂, the similarity between any two data instances x_a and x_b is given by
\[
\kappa(x_a, x_b) = \sum_{i,j=1}^{n} \kappa(x_a, x_i)\, \kappa(x_b, x_j)\, A_{i,j} = \varphi(x_a)^\top A\, \varphi(x_b) = \big( A^{1/2} \varphi(x_a) \big)^\top \big( A^{1/2} \varphi(x_b) \big),
\]
where φ(x): R^d → R^n is given by φ(x) = [κ(x, x_1), ..., κ(x, x_n)]ᵀ. Thus, the proposed RKML algorithm maps a vector of d dimensions into one with at most m dimensions, i.e., the length of the tag dictionary. Further justification of the dimension reduction details can be found in Section 3.6.1.
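The closed-form solution (3.3) is straightforward to realize numerically: truncate the eigendecomposition of K to rank r and sandwich S between the resulting pseudo-inverses. The sketch below uses illustrative toy data; the function name is ours.

```python
import numpy as np

def rkml_metric(K, S, r):
    """RKML closed form A = K_r^{-1} S K_r^{-1}, with K_r the best rank-r
    approximation of K.  A is PSD whenever S is PSD, so no PSD cone
    projection is needed."""
    vals, vecs = np.linalg.eigh((K + K.T) / 2.0)
    idx = np.argsort(vals)[::-1][:r]         # top-r eigenpairs of K
    V, lam = vecs[:, idx], vals[idx]
    Kr_inv = (V / lam) @ V.T                 # pseudo-inverse of the rank-r part
    return Kr_inv @ S @ Kr_inv

# Toy check with an RBF kernel and a PSD semantic similarity S = Y Y^T
rng = np.random.default_rng(2)
X = rng.normal(size=(10, 3))
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
Yl = rng.integers(0, 2, size=(10, 4)).astype(float)
S = Yl @ Yl.T
A = rkml_metric(K, S, r=5)
```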
3.4 Theoretical Guarantee of RKML

We will show that the linear operator learned by the proposed algorithm is stochastically consistent, i.e., the linear operator learned from finite samples provides a good approximation to the optimal one learned from an infinite number of samples. To simplify the analysis, we assume that the semantic similarity measure is s_{i,j} = y_iᵀ y_j.¹

[¹ We note that our analysis can easily be extended to the case where s_{i,j} = ŷ_iᵀ ŷ_j, with ŷ_i a deterministic transformation of y_i.]

Define the optimal linear operator T_* as the minimizer of the expected loss:
\[
\min_{T} \; \mathbb{E}_{(x_a, x_b, y_a, y_b)} \Big[ \big( y_a^\top y_b - \langle \kappa(x_a, \cdot), T[\kappa(x_b, \cdot)] \rangle_{\mathcal{H}_\kappa} \big)^2 \Big].
\]
Let T_*(r) be the best rank-r approximation of T_*, and let T̂ be the linear operator constructed from A given in (3.3). We will show that, under appropriate conditions, ‖T_* − T̂‖₂ is relatively small, where ‖·‖₂ denotes the spectral norm.

Let g_k(·) be the prediction function for the k-th class, i.e., y_{i,k} = g_k(x_i). We make the following assumption about g_k(·) in our analysis:

A1: g_k(·) ∈ H_κ, k = 1, ..., m.

Assumption A1 essentially assumes that it is possible to accurately learn the prediction function g_k(·) given a sufficiently large number of training examples. We also note that assumption A1 holds if g_k(·) is a smooth function and κ(·, ·) is a universal kernel [146]. The following theorem shows that under assumption A1, with high probability, the difference between T_* and T̂ is small provided n is sufficiently large. Here L and L_n denote the integral operator associated with κ and its empirical counterpart, respectively, and λ_1 ≥ λ_2 ≥ ··· are the eigenvalues of the kernel matrix K.

Theorem 3.1. Assume A1 holds, and κ(x, x) ≤ 1 for any x. Let r be a rank for which (λ_r − λ_{r+1})/n > ‖L − L_n‖₂, where ‖·‖₂ measures the spectral norm of a linear operator. Then, with probability 1 − δ, we have
\[
\max_{f \in \mathcal{H}_\kappa} \big\| (\hat{T} - T_*)[f] \big\|_{\mathcal{H}_\kappa} \le \varepsilon \, \big\| T_*[f] \big\|_{\mathcal{H}_\kappa},
\]
where ε is given by
\[
\varepsilon = \frac{2\|L - L_n\|_2}{(\lambda_r - \lambda_{r+1})/n - \|L - L_n\|_2}.
\]
The proof of Theorem 3.3 can be found in Section A.1.2.

Theorem 3.4. [175] Assume κ(x, x) ≤ 1. With probability 1 − δ, we have
\[
\|L - L_n\|_{HS} \le \frac{4\ln(1/\delta)}{\sqrt{n}}.
\]
Theorem 3.1 follows immediately from Theorems 3.4 and 3.3.
3.6 Implementation

Regarding implementation, we have two important issues to address: (1) how to appropriately measure the semantic similarity s_{i,j}, and (2) how to efficiently compute K_r, the best rank-r approximation of K, without computing the full kernel matrix K. The second issue is particularly important for applying the proposed algorithm to large datasets consisting of millions of annotated images. Below, we discuss these two issues separately.

3.6.1 Computing the Semantic Similarity s_{i,j}

The most straightforward approach is to measure the semantic similarity as s_{i,j} = y_iᵀ y_j. We improve upon this approach by incorporating the log-entropy weighting scheme [117], which has been used for document retrieval. It computes the weighted class assignment ỹ_{i,j} as
\[
\tilde{y}_{i,j} = \left( 1 + \sum_{k}^{n} \frac{p_{k,j} \log p_{k,j}}{\log n} \right) \cdot \log( y_{i,j} + 1 ), \tag{3.5}
\]
where p_{k,j} = y_{k,j} / Σ_i^n y_{i,j}. We apply Latent Semantic Analysis (LSA) [117] to further enhance the estimation of semantic similarity, which allows us to remove the noise and correlation in and between annotations. Let Ỹ = [ỹ_{i,j}]_{n×m} include the weighted class assignments for all the training images, and let Ŷ ∈ R^{n×m'} include the first m' singular vectors of Ỹ, with each of its rows L2-normalized to 1. This operation projects Ỹ onto a space of reduced dimensionality m', and this representation has been empirically shown to capture, to some degree, the semantic relationships across the annotation corpus [147]. We then compute the semantic similarity as S = Ŷ Ŷᵀ.

3.6.2 Efficiently Computing K_r by Random Projection

The proposed RKML algorithm requires computing the full kernel matrix K and its top r singular vectors. Since the cost of computing K is O(n²), it becomes expensive when the number of training instances n is large. We can improve the computational efficiency by exploiting the Nyström method [43] to approximate K_r. To this end, we randomly sample n_s instances from the training set and approximate K_r from the kernel similarities between all training images and the sampled subset, as summarized in Algorithm 1.

Given a test image x, let N_x = {x_{N_1}, ..., x_{N_k}} denote its k most similar training images, i.e., those satisfying S(x, x_{N_i}) ≥ S(x, x_j) for all i ∈ [1, k] and j ∈ [1, n] with x_j ≠ x_{N_i}. The relevance of keywords for x can then be estimated over N_x by either majority voting or weighted voting, i.e.,
\[
\hat{y} = \sum_{i=1}^{k} \langle \kappa(x, \cdot), \hat{T}[\kappa(x_{N_i}, \cdot)] \rangle \, y_{N_i} = k_x A K_N \tilde{Y}_N . \tag{3.7}
\]
The keywords with the t largest relevance scores are regarded as the annotation for the test image. Algorithm 1 summarizes the key steps of the image annotation algorithm using RKML.
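The log-entropy weighting (3.5) followed by the LSA projection can be sketched directly; the helper name and toy data below are ours, and the clipping constants are illustrative safeguards rather than part of the cited scheme.

```python
import numpy as np

def semantic_similarity(Y, m_prime):
    """Semantic similarity S = Yhat Yhat^T from binary tags Y (n x m):
    log-entropy weighting (Eq. 3.5), then projection onto the top m_prime
    left singular vectors with L2-normalized rows (LSA step)."""
    n = Y.shape[0]
    col = Y.sum(axis=0)
    p = Y / np.maximum(col, 1.0)                    # p_{k,j} = y_{k,j} / sum_i y_{i,j}
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    g = 1.0 + plogp.sum(axis=0) / np.log(n)         # global entropy weight per tag
    Yt = g[None, :] * np.log(Y + 1.0)               # weighted assignments (3.5)
    U, s, Vt = np.linalg.svd(Yt, full_matrices=False)
    Yh = U[:, :m_prime]                             # first m' singular vectors
    Yh = Yh / np.maximum(np.linalg.norm(Yh, axis=1, keepdims=True), 1e-12)
    return Yh @ Yh.T

rng = np.random.default_rng(3)
Y = (rng.random((12, 6)) < 0.4).astype(float)
Y[:, 0] = 1.0                                       # ensure every image has a tag
S = semantic_similarity(Y, m_prime=3)
```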
Algorithm 1: Automatic Image Annotation with RKML

Input:
• Training images X ∈ R^{n×d}, labels Y ∈ R^{n×m}
• Testing images X_q ∈ R^{n_q×d}
• Parameters: smoothing parameter ε and approximation rank r
1: Randomly sample r images x̂_1, ..., x̂_r from the training set
2: Compute the kernel matrices K_g = [κ(x_i, x̂_j)]_{n×r} and K̂ = [κ(x̂_i, x̂_j)]_{r×r}
3: Compute the singular value decomposition of K̂
4: Select the r largest singular values σ_i and their corresponding singular vectors u_i
5: Kernel metric: A = Σ_{i=1}^{r} u_i u_iᵀ / (σ_i² + ε)
6: Relevance score matrix: T_q = K_q A K_gᵀ Ỹ
7: Output: matrix of tag relevance scores T_q ∈ R^{n_q×m}

Figure 3.2 highlights the key components of the kernel metric learning based framework for image annotation. It first constructs a Reproducing Kernel Hilbert Space (RKHS) H based on either the whole training image set or a randomly sampled subset of it. It then maps the training images into H and learns a kernel distance metric A from the mapped images. Given a test image I, it first maps I into H and measures the similarities between I and all the training images in H using the learned distance metric A. Based on the occurrence of keywords in the subset of training images that share the largest similarities with I, it estimates a relevance score for each keyword and returns those with the largest scores as the predicted annotation tags.

Figure 3.2: The proposed kernel metric learning scheme, i.e., RKML, for automatic image annotation.

3.7 Experiments

3.7.1 Datasets and Experimental Setup

Three benchmark datasets for image annotation are used in our study; their statistics are summarized in Table 3.1. For both the ESP Game and IAPR TC12 datasets², a bag-of-words model based on densely sampled SIFT descriptors is used to represent the visual content.

[² The visual features and tags of both datasets were obtained from [67]: http://lear.inrialpes.fr/people/guillaumin/data.php.]
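The pipeline of Algorithm 1 can be sketched end-to-end on toy data. Note that steps 5–6 of the printed pseudocode are partially garbled in the source, so the regularized inverse and the final scoring product below follow our reading of them together with Eq. (3.7); treat the details as illustrative, not as the authors' exact implementation.

```python
import numpy as np

def rbf(X, Z, gamma=0.5):
    """RBF kernel between the rows of X and Z."""
    return np.exp(-gamma * ((X[:, None] - Z[None]) ** 2).sum(-1))

def rkml_annotate(X, Y, Xq, r=5, eps=1e-3, gamma=0.5):
    """Sketch of Algorithm 1: Nystrom-style approximation from r sampled
    landmark images, then tag relevance scores for the test images."""
    idx = np.random.default_rng(0).choice(len(X), size=r, replace=False)
    Kg = rbf(X, X[idx], gamma)           # n x r kernel to the landmarks (step 2)
    Kh = rbf(X[idx], X[idx], gamma)      # r x r landmark kernel (step 2)
    U, s, _ = np.linalg.svd(Kh)          # step 3-4
    A = (U / (s**2 + eps)) @ U.T         # regularized kernel metric (step 5)
    Kq = rbf(Xq, X[idx], gamma)          # test-to-landmark kernel
    return Kq @ A @ Kg.T @ Y             # relevance scores, n_q x m (step 6)

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 3))
Y = (rng.random((40, 6)) < 0.3).astype(float)
Xq = rng.normal(size=(5, 3))
T = rkml_annotate(X, Y, Xq, r=8)
```

For each test image, the t highest-scoring columns of T would then be returned as the predicted tags.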
The Flickr1M dataset [202] comprises more than one million images crawled from the Flickr website, annotated with more than 700,000 keywords. Since most keywords are associated with only a small number of images, we keep only the 1,000 most popular ones. We follow [200, 202] and represent each image with the following features: grid color moment, local binary pattern, Gabor wavelet texture, and edge direction histogram.

Table 3.1: Statistics of the datasets used in the experiments. The bottom two rows are given in the format mean/maximum.

                    ESP Game     IAPR TC12    Flickr1M
No. of images       20,768       19,627       999,764
Dimensionality      1,000        1,000        291
Vocabulary size     268          291          1,000
Tags per image      4.69/15      5.72/23      5.98/202
Images per tag      363/5,059    386/5,534    5,976/76,531

We randomly select 90% of the images from each dataset for training and use the remaining 10% for testing. Given a test image, we first identify the k most visually similar images from the training set using the learned distance metric, and then rank the tags by a majority vote over the k nearest neighbors, where k is chosen by cross-validation.

An RBF kernel is used in our study for all KML algorithms. In RKML we set n_s = 5,000 and m' = 0.38 m based on our experience, and determine the kernel width and the rank r by cross-validation. Parameters for the baselines are set to the default values suggested by the original authors. Besides, annotation based on the Euclidean distance, denoted by Euclid, is used as a reference in our comparison. Since most DMLs are developed against must-links and cannot-links, we apply the procedure described in [200] to generate the binary constraints by performing a probabilistic clustering of the images based on their tags. More details of this procedure can be found in [200].

We evaluate the annotation accuracy by the average precision of the top ranked image tags. Following [201, 202], we first compute the precision for each test image by comparing the top 10 annotated tags with the ground truth, and then take the average over the test set. Average recall and F1 score are reported in the supplementary document. Computational efficiency is measured by running time³. Both the mean and standard deviation of the evaluation metrics over 20 experimental trials are reported in this paper.
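The precision@t protocol just described (precision of the top 10 tags per image, averaged over the test set) can be written in a few lines; the helper name and toy scores below are ours.

```python
import numpy as np

def avg_precision_at_t(scores, Y_true, t=10):
    """Mean precision@t: for each test image, take the t highest-scoring
    tags and measure the fraction present in the ground truth, then
    average over the test set."""
    top = np.argsort(-scores, axis=1)[:, :t]          # indices of top-t tags
    hits = np.take_along_axis(Y_true, top, axis=1)    # 1 where a top tag is correct
    return hits.mean(axis=1).mean()

# Toy example: 2 test images, 4 tags, evaluated at t = 2
scores = np.array([[0.9, 0.1, 0.8, 0.2],
                   [0.2, 0.7, 0.1, 0.6]])
Y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0]])
p2 = avg_precision_at_t(scores, Y_true, t=2)  # image 1: 2/2, image 2: 1/2
```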
[³ All the codes are downloaded from the authors' websites and run in Matlab on a machine with an AMD 2-core @ 2.7 GHz CPU and 64 GB RAM.]

3.7.2 Comparison with State-of-the-art Distance Metric Learning (DML) and Image Annotation Algorithms

3.7.2.1 Comparison to nonlinear DML algorithms

We first compare the proposed RKML⁴ algorithm to six state-of-the-art kernel distance metric learning methods: (1) Kernel PCA (KPCA) [169], (2) Generalized discriminant analysis (GDA) [6], (3) Kernel discriminative component analysis (KDCA) [78], (4) Kernel local Fisher discriminant analysis (KLFDA) [182], (5) Kernel information theoretic metric learning (KITML) [39], and (6) Metric learning for kernel regression (MLKR) [199]. We also include three boosting DML algorithms, i.e., DistanceBoost (DBoost) [73], KernelBoost (KBoost) [74], and metric learning with boosting (BoostM) [172], for comparison.

[⁴ Unless otherwise noted, RKML stands for the proposed RKML algorithm with Nyström approximation. Its source code can be found at http://www.cse.msu.edu/~fengzhey/research/rkml.html.]

Figure 3.3: Average precision for the top t annotated tags using nonlinear distance metrics. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.
Figure 3.4: Average recall for the top t annotated tags using nonlinear distance metrics. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.
Figure 3.5: Average F1 score for the top t annotated tags using nonlinear distance metrics. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.

Figures 3.3, 3.4 and 3.5 show the average precision, average recall and average F1 score, respectively, of the top t annotated tags obtained by the nonlinear distance metric learning (DML) baselines and the proposed RKML. Surprisingly, we observe that most of the nonlinear DML algorithms are only able to yield performance similar to that based on the Euclidean distance and, more disturbingly, some of them even perform significantly worse than the Euclidean distance. On the other hand, the proposed algorithm performs significantly better than the Euclidean distance in almost all cases. An analysis of this phenomenon is provided in Section 3.7.3.3.
3.7.2.2 Comparison to linear DML algorithms

We compare our RKML to seven state-of-the-art linear distance metric learning algorithms, including Relevant component analysis (RCA) [3], Discriminative component analysis (DCA) [78], Large margin nearest neighbor classifier (LMNN) [196], Local Fisher discriminant analysis (LFDA) [182], Information theoretic metric learning (ITML) [39], Probabilistic RCA (pRCA) [200], and Logistic discriminant-based metric learning (LDML) [68].

Figure 3.6: Average precision for the top t annotated tags using linear distance metrics. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.
Figure 3.7: Average recall for the top t annotated tags using linear distance metrics. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.
Figure 3.8: Average F1 score for the top t annotated tags using linear distance metrics. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.

Figures 3.6, 3.7 and 3.8 show the average annotation precision, average recall and average F1 score, respectively, for the linear distance metric learning (DML) baselines. Similarly to KML, we observe that even the best linear DML algorithm is only slightly better than the Euclidean distance, while RKML significantly outperforms all linear DML baselines. Again, we believe that the failure of linear DML is likely due to the binary constraints generated from the image annotations, as explained in Section 3.7.3.3.

3.7.2.3 Comparison with State-of-the-art Image Annotation Methods

Additionally, we compare the RKML algorithm to several state-of-the-art image annotation models, including: (1) two versions of the TagProp method [67], using either rank-based weights (TP-R) or distance-based weights (TP-D); (2) Tag Relevance (tRel) [123], based on the idea of neighbor voting; and (3) 1-vs-1 SVM classification, using either linear (SVML) or RBF kernel (SVMK) classifiers⁵. We include Pop as a comparison reference, which simply ranks tags by their frequency of occurrence in the training set.

Figure 3.9: Annotation performance in terms of AP@t with different annotation models. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.
Figures 3.9, 3.10 and 3.11 compare the average precision, average recall and average F1 score obtained by the different image annotation models, respectively. It is not surprising to observe that most annotation methods significantly outperform Pop, while the proposed RKML method outperforms all the state-of-the-art image annotation methods on the IAPR TC12 and ESP Game datasets, and performs only slightly worse than TP-D on the Flickr1M dataset.

[⁵ SVM methods were unable to run on Flickr1M due to its large size and high computational cost, so the results of SVM methods are excluded.]

Figure 3.10: Average recall for the top t annotated tags using different annotation models. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.
Figure 3.11: Average F1 score for the top t annotated tags using different annotation models. Panels: (a) IAPR TC12, (b) ESP Game, (c) Flickr1M.

3.7.2.4 Comparison of Annotation Results on Exemplar Images

To compare the empirical annotation performance of the various linear and kernel distance metric learning algorithms as well as the image annotation approaches in a more direct way, Table 3.2 shows the annotations of exemplar images produced by the different DML and image annotation algorithms.
Table 3.2: Annotations of exemplar images by the different DML and image annotation algorithms. Columns: Ground truth, Euclid, DCA, LMNN, LDML, DBoost, BoostM, KBoost, KLDA, KPCA, KDCA, KLFDA, MLKR, TP-R, TP-D, RKML.
womantable front curtaintabletable bed cloudtabletable front doorman window room curtainfront man window window window window curtainwall curtainwood man man room curtainwallwoman window room bed tablewood wood tablerock window bedcoverwoman tablefront wood window bed bed bed woman wood curtaincurtainwood front wood bedside window house window tablewooddoor doorbuilding curtainlamplampdoor front housedoor curtainbed room bed front buildingskytree tree skyskyskyskyskytree skyskyskyskyskyskycloud cloudman road front cloudtree tree cloudman tree mountain front cloudtree tree front front carfront cloudseamountain buildingtree wallcyclist tree tree front cloudcloudhilltree cyclistman tree man hillsea mountain skyfront desert cloudtree man buildingmeadowman cyclingmountain road landscapetouristbeach beachwomanmangrey road carfront meadow monument road short skyman meadow front house man front mountain hillman parkmountain hillskymountain buildingcarmountain beachhouse front househousepeoplelandscapemountain man road mountain treecar skycloudpeople tree landscapecity meadow mountain road snow hillshophouse front Table3.2:Examplesofannotationresultsgeneratedby14baselinesandtheproposedRKML.Theannotatedtagsareranked basedontheestimatedrelevancescoreindescendingorder,andthecorrectonesarehighlightedinblueboldfont.Notethe groundtruthannotationsinthe2-ndcolumndonotalwaysincludeallrelevanttags( e.g.,ÒpeopleÓforthe5-thimage),and sometimescontainpolysemes( e.g.,ÒpalmÓforthe4-thand5-thimages)andcontroversialtags( e.g.,ÒfrontÓ). 3.7.2.5E "ciencyEvaluation TIMEDCALMNNITMLLDMLDBoostBoostM RKML IAPRTC12 1.5e41.4e44.2e44.2e51.7e41.1e6 4.6e2 ESPGame 2.3e41.7e45.8e45.5e54.3e41.2e6 1.3e3 Flickr1M 8.1e46.0e43.0e45.2e51.2e43.2e5 3.4e3 TIMEKPCAGDAKDCAKLFDAKITMLMLKR RKML IAPRTC12 2.8e44.8e42.2e48.8e45.3e42.2e3 4.6e2 ESPGame 3.3e45.4e43.7e43.2e56.8e43.5e4 1.3e3 Flickr1M 7.3e33.3e41.3e51.0e53.7e67.9e3 3.4e3 Table3.3:Comparisonofrunningtime(s)forseveraldi !erentmetriclearningalgorithms. 
TIME       TP-R    TP-D    tRel    SVML    SVMK     RKML
IAPRTC12   9.1e2   4.6e2   1.0e1   2.5e3   4.0e5    4.8e2
ESPGame    2.7e2   1.5e2   1.5e1   1.6e2   8.9e4    1.3e3
Flickr1M   1.6e5   9.9e4   5.7e3   -       -        3.4e3

Table 3.4: Running time (s) for image annotation. The SVM methods are not included on Flickr1M due to their high computational cost.

Table 3.3 summarizes the running time of different DML algorithms. We observe that RKML is significantly more efficient than any DML baseline. Table 3.4 compares the efficiency of the different baselines for annotation, where the running time includes the time for both learning a distance metric and predicting image tags. We observe that, compared to the other annotation methods, the proposed RKML algorithm is particularly efficient for large datasets (i.e., Flickr1M), making it suitable for large-scale image annotation.

3.7.3 Effects of Different Experimental and Parameter Setups

3.7.3.1 Sensitivity to Parameters

In this section, we analyze the sensitivity of RKML to its parameters, including the rank r; m', the number of retained eigenvectors used when estimating the semantic similarity; and n_s, the number of sampled images used for the Nyström approximation.

Figure 3.12: Average Precision for the first tag predicted by RKML using different values of the rank r, on (a) IAPRTC12 and (b) ESPGame. To make the overfitting effect clearer, we turn off the Nyström approximation for the IAPRTC12 and ESPGame datasets. The Flickr1M dataset is not included due to its large size (n = 999,764): overfitting only occurs when r approaches the total number of images, and it is infeasible to apply such a large r on Flickr1M.

We examine the role of the rank r in the proposed algorithm by evaluating the prediction accuracy with varied r on the IAPRTC12 and ESPGame datasets, for both training and testing images. To isolate the effect, we turn off the Nyström approximation used by RKML in this experiment. We observe in Figure 3.12 that, while the average accuracy on test images initially improves significantly with increasing rank r, it saturates after a certain rank. On the other hand, the prediction accuracy on the training data increases almost linearly with the rank, and approaches 1 for very large r, a clear indication of overfitting the training data.
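The rank-capped behavior described above can be illustrated with a generic sketch. The snippet below is not the RKML estimator itself; it is a minimal, hypothetical illustration of how truncating a kernel matrix to its top-r eigenpairs caps model capacity, so that training predictions interpolate the targets as r approaches n.

```python
import numpy as np

def rank_truncated_fit(K, Y, r):
    """Regression through a rank-r truncation of a kernel matrix K.

    Keeps only the top-r eigenpairs of the symmetric PSD matrix K when
    fitting targets Y. As r approaches n, the training predictions
    interpolate Y exactly, mirroring the overfitting effect in
    Figure 3.12. (Illustrative sketch only, not the RKML estimator.)
    """
    w, V = np.linalg.eigh(K)
    idx = np.argsort(w)[::-1][:r]               # indices of the top-r eigenpairs
    Vr, wr = V[:, idx], w[idx]
    coef = Vr @ np.diag(1.0 / wr) @ Vr.T @ Y    # pseudo-inverse restricted to the span
    return K @ coef                             # fitted training targets
```

With r = n and a full-rank kernel, the fit reproduces the training targets exactly; with small r, a substantial residual remains.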
We also examine the sensitivity of the other parameters used by the proposed RKML algorithm, namely m', the number of retained eigenvectors of Ỹ, and n_s, the number of sampled images used for the Nyström approximation. Figures 3.13 and 3.14 show the image annotation performance for varied m' and n_s, respectively. Overall, we find that our algorithm is insensitive to the values of these parameters over a wide range, which facilitates their selection in real-world applications.

Figure 3.13: Average Precision for the top t tags predicted by RKML using different values of m', the number of retained eigenvectors when estimating the semantic similarity, on (a) IAPRTC12, (b) ESPGame and (c) Flickr1M.

Figure 3.14: Average Precision for the top t tags predicted by RKML using different values of n_s, the number of sampled images used for the Nyström approximation, on (a) IAPRTC12, (b) ESPGame and (c) Flickr1M. In (c), n_s cannot be set too large due to the dataset size.

3.7.3.2 Advantages of the Kernel Trick and the Nyström Approximation

Since none of the baseline algorithms, linear or nonlinear DML alike, is able to significantly outperform the Euclidean distance, it remains unclear whether kernel DML has an advantage over linear DML. To examine this point, we implement the linear version of RKML, denoted by RLML. Tables 3.5, 3.6 and 3.7 compare the performance of RLML and RKML on the three datasets. RKML clearly and significantly outperforms its linear counterpart RLML, verifying the advantage of the kernel trick in distance metric learning.

AP@t (%)   t=1      t=2      t=3      t=4      t=5      t=6      t=8      t=10
RKML       55±1.2   48±0.9   44±0.6   41±0.8   37±0.6   35±0.5   31±0.5   28±0.4
RLML       52±1.3   46±1.2   42±1.0   38±0.8   35±0.7   33±0.6   29±0.5   26±0.4
RKMLO      57±0.9   51±0.6   46±0.7   43±0.6   39±0.6   37±0.5   32±0.5   29±0.4
RKMLH      49±1.1   44±0.9   39±0.9   36±0.7   33±0.7   31±0.7   27±0.6   24±0.5

Table 3.5: Comparison of various extensions of RKML in terms of AP@t on the IAPRTC12 dataset. RLML is the linear version of RKML, RKMLO is the original version without the Nyström approximation, and RKMLH runs RKML using binary constraints.

However, although the kernel trick considerably improves the image annotation accuracy,
AP@t (%)   t=1      t=2      t=3      t=4      t=5      t=6      t=8      t=10
RKML       40±1.1   35±0.5   32±0.4   29±0.5   27±0.4   25±0.4   22±0.4   20±0.4
RLML       36±0.8   31±0.7   28±0.7   26±0.7   24±0.5   22±0.4   20±0.4   18±0.4
RKMLO      44±0.8   39±0.6   35±0.5   32±0.4   29±0.4   27±0.4   24±0.3   21±0.3
RKMLH      34±1.0   30±0.5   28±0.5   26±0.4   24±0.4   22±0.3   20±0.3   18±0.3

Table 3.6: Comparison of the extensions of RKML in terms of AP@t on the ESPGame dataset.

AP@t (%)   t=1      t=2      t=3      t=4      t=5       t=6       t=8        t=10
RKML       24±0.1   21±0.2   18±0.1   17±0.2   15±0.2    14±0.1    13±0.2     12±0.1
RLML       13±0.3   12±0.2   11±0.2   11±0.1   10±0.06   10±0.05   9.0±0.05   8.0±0.08
RKMLH      20±0.2   18±0.1   16±0.2   15±0.2   14±0.2    13±0.1    11±0.1     10±0.1

Table 3.7: Comparison of the extensions of RKML in terms of AP@t on the Flickr1M dataset. RKMLO is excluded since the dataset is too large to perform the computation on the full kernel.

it also inevitably leads to a high, even prohibitive, computational cost. The Nyström approximation is therefore introduced to address this problem, trading off computational cost against annotation accuracy. To verify the effectiveness of the Nyström approximation, we run RKML with the approximation turned off, performing all computation directly on the full kernel; this method is denoted by RKMLO. Tables 3.5, 3.6 and 3.7 compare the annotation performance of RKML and RKMLO, where we observe that RKML performs only slightly worse than RKMLO. This indicates that RKML strikes a good compromise between effectiveness and computational cost, making a tolerable sacrifice in annotation effectiveness to escape the heavy computational burden.
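The trade-off above rests on the standard Nyström method: approximate the full n×n kernel from n_s sampled landmark columns. Below is a minimal sketch for an RBF kernel (function and parameter names are ours, not those of the implementation used in the experiments):

```python
import numpy as np

def nystrom_kernel(X, n_s, gamma=1.0, seed=0):
    """Standard Nystrom approximation of an RBF kernel matrix.

    Returns a feature map F such that F @ F.T approximates the full
    n x n kernel, using only n_s sampled landmark points.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=n_s, replace=False)   # sampled landmark images
    L = X[idx]

    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    C = rbf(X, L)            # n x n_s cross-kernel block
    Kss = rbf(L, L)          # n_s x n_s landmark kernel block
    # Pseudo-inverse square root of the landmark block
    w, V = np.linalg.eigh(Kss)
    w = np.maximum(w, 1e-12)
    Kss_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return C @ Kss_inv_sqrt  # n x n_s feature map
```

The cost drops from O(n²) kernel evaluations to O(n·n_s); with n_s = n the approximation recovers the full kernel exactly, which is why RKMLO serves as the accuracy ceiling in Tables 3.5-3.7.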
3.7.3.3 Analysis of Binary Constraints and Their Generation Methods

We observe in Section 3.7.2 that most baseline metric learning algorithms, whether linear or kernel-based, perform worse than the Euclidean distance. We attribute this failure mostly to the binary constraints. As described before, all baseline distance metric learning algorithms require converting image annotations into binary constraints for image annotation tasks, which does not make full use of the annotation information. To verify this point, we run RKML with the similarity measure s_{i,j} computed from the binary constraints generated for the baseline distance metric learning algorithms, and denote this method by RKMLH. We observe in Tables 3.5, 3.6 and 3.7 that RKMLH performs significantly worse than RKML, which directly uses the real-valued similarity measures, confirming the importance of real-valued similarities for distance metric learning in automatic image annotation.

AP@t (%)   t=1        t=4        t=7         t=10
Method 1   20.7±0.2   15.3±0.2   12.4±0.12   10.6±0.10
Method 2   20.6±0.3   15.2±0.2   12.4±0.11   10.6±0.09
Method 3   20.8±0.2   15.4±0.1   12.5±0.05   10.7±0.04
Method 4   19.7±0.2   14.6±0.1   11.9±0.06   10.2±0.06
Method 5   21.3±0.4   15.9±0.3   12.8±0.20   11.0±0.14

Table 3.8: Comparison of different methods of generating binary constraints, applied in the baseline distance metric learning algorithm LMNN, for the top t annotated tags on the Flickr1M dataset. Method 1 clusters the space of keywords; method 2 uses the class assignments as binary constraints; method 3 clusters the space of keywords using hierarchical clustering algorithms; method 4 clusters the space of keywords together with the visual features; and method 5 considers images sharing more than 4 keywords as similar and images sharing no keyword as dissimilar.
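Method 5 of Table 3.8 can be made concrete with a short sketch (names and the threshold parameter are hypothetical illustrations): images are paired as similar when they share more than a threshold number of keywords, and as dissimilar when they share none.

```python
def keyword_pairs(tag_sets, sim_thresh=4):
    """Generate (similar, dissimilar) image pairs from tag overlap.

    Mirrors "method 5" in Table 3.8: images sharing more than
    `sim_thresh` keywords are similar; images sharing no keyword are
    dissimilar. `tag_sets` is a list of keyword sets, one per image.
    """
    similar, dissimilar = [], []
    n = len(tag_sets)
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(tag_sets[i] & tag_sets[j])
            if shared > sim_thresh:
                similar.append((i, j))
            elif shared == 0:
                dissimilar.append((i, j))
    return similar, dissimilar
```

Note that pairs with an intermediate overlap are left unconstrained, which is why such pairwise constraints are usable in LMNN but not in group-based methods such as RCA or DCA.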
Since most DML algorithms were designed for binary constraints, we tried to improve the performance of the standard DML algorithms by experimenting with different methods of generating binary constraints, listed as follows: (1) clustering the space of keywords; (2) generating binary constraints from classification labels⁶; (3) clustering the space of keywords using hierarchical clustering algorithms; (4) clustering the space of keywords together with the visual features; and (5) generating binary constraints based on the number of common keywords, i.e., images sharing more than 4 keywords are considered similar and images sharing no keywords are considered dissimilar. Note that the last method is applicable to LMNN, but not to many other DML algorithms. For example, RCA [3] and DCA [78] divide the image set into groups, where images within a group are considered similar and images from different groups are considered dissimilar; method (5) is not able to generate such groups. We observe that these methods yield essentially the same performance as reported in our study, as shown in Table 3.8.

⁶ The Flickr1M dataset also includes class assignment labels, which are usually used for classification; ESPGame and IAPRTC12 do not have classification labels.

3.7.3.4 Comparison of the Design Choices of the Semantic Similarity Measure

To obtain the numeric constraints on the annotated tags, we explore other weighting schemes besides log-entropy, and, besides clustering with a topic model, we also experiment with other binary constraint generation methods.

Binary                l_{i,j} = 1 if tag j exists in image i, and 0 otherwise.
Term Frequency (TF)   l_{i,j} = tf_{i,j}, the occurrence count of tag j in image i.
Log                   l_{i,j} = log(tf_{i,j} + 1)

Table 3.9: Local weighting functions.

We examine the choice of semantic similarity by evaluating the prediction accuracy with varied definitions of ỹ_{i,j} in Equation (5). ỹ_{i,j} is the product of a local tag weight l_{i,j}, which describes the relative occurrence of tag j in image i, and a global weight g_j.

Binary    g_j = 1
Normal    g_j = 1 / √(Σ_{i=1}^n tf_{i,j}²)
Idf       g_j = log₂( n / (1 + df_j) )
Entropy   g_j = 1 + Σ_{i=1}^n (p_{i,j} log p_{i,j}) / log n,  where p_{i,j} = tf_{i,j} / Σ_{i=1}^n tf_{i,j}

Table 3.10: Global weighting functions.
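The Log local weight and the Entropy global weight combine as sketched below; this is a direct, unoptimized transcription of the formulas in Tables 3.9 and 3.10 (the function name and the zero-column guard are our additions):

```python
import numpy as np

def log_entropy_weights(tf):
    """Log-entropy weighting of a tag-frequency matrix (Tables 3.9/3.10).

    tf[i, j] is the occurrence count of tag j in image i. Returns
    y[i, j] = l[i, j] * g[j], combining the Log local weight with the
    Entropy global weight.
    """
    tf = np.asarray(tf, dtype=float)
    n = tf.shape[0]
    local = np.log(tf + 1.0)                      # l_ij = log(tf_ij + 1)
    col = tf.sum(axis=0)
    col[col == 0] = 1.0                           # guard against empty tag columns
    p = tf / col                                  # p_ij = tf_ij / sum_i tf_ij
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    g = 1.0 + plogp.sum(axis=0) / np.log(n)       # entropy global weight
    return local * g
```

The entropy weight down-weights tags spread uniformly across images (g approaches 0) and keeps full weight for tags concentrated in few images (g approaches 1).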
The global weight g_j describes the relative occurrence of tag j within the entire tag collection. The examined weighting functions [10] are defined in Tables 3.9 and 3.10.

AP@t (%)        t=1       t=4       t=7       t=10
Binary-Binary   56±1.01   41±0.57   33±0.49   28±0.45
Binary-Normal   53±1.28   39±0.62   32±0.54   28±0.44
Cosine          56±1.19   41±0.61   33±0.52   28±0.47
TF-IDF          55±1.12   41±0.57   33±0.50   28±0.44
Log-IDF         55±1.12   41±0.57   33±0.50   28±0.44
Log-Entropy     55±1.10   41±0.57   33±0.49   28±0.45

Table 3.11: Comparison of extensions of RKML with different design choices of semantic similarity, for the top t annotated tags on the IAPRTC12 dataset. The leftmost column lists the weighting methods, where the name before "-" denotes the local weight shown in Table 3.9 and the name after "-" denotes the global weight shown in Table 3.10. "Cosine" represents the cosine similarity between the tag vectors of two images.

Table 3.11 shows that the different semantic similarity measures, whether TF-IDF based weighting or the popular cosine similarity, deliver essentially similar performance. We hence adopt the Log-Entropy weighting scheme in our experiments.

3.8 Summary

In this chapter, we proposed RKML, a robust and efficient algorithm for kernel metric learning. The proposed method addresses (i) the high computational cost, by avoiding the projection onto the PSD cone; (ii) the limitation of binary constraints on tags, by adopting a real-valued similarity measure; and (iii) the overfitting problem, by appropriately regularizing the rank of the learned kernel metric. Experiments on large-scale image annotation demonstrate the effectiveness and efficiency of the proposed RKML algorithm in comparison with state-of-the-art approaches for distance metric learning and image annotation. In the future, we plan to improve the annotation performance by developing a more robust semantic similarity measure.
Chapter 4

Image Tag Matrix Completion by Noisy Matrix Recovery

In this chapter, we propose an Image Tag Completion by Noisy Matrix Recovery (TCMR) algorithm, which is able to simultaneously recover missing tags and remove or down-weight noisy tags that are irrelevant to the visual content of images. Although this algorithm is designed for image tag completion, it is not exclusive to that task: it also works well for related image tagging tasks, including image tag refinement and image tag re-ranking.

The rest of the chapter is arranged as follows. Section 4.1 motivates the problem, states the main intuition behind the proposed algorithm, and sets up the notation. Section 4.2 reviews the closely related work. Section 4.3 gives a detailed description of noisy matrix recovery and extends it to the proposed TCMR algorithm. The theoretical properties and guarantees, i.e., the bounds on the error between the recovered tag matrix and its statistically optimal counterpart, are given in Section 4.4, with the omitted proofs deferred to Section 4.5. Section 4.6 presents the implementation details and describes the proposed framework, which incorporates image content consistency into the matrix completion based topic model through a graph Laplacian. Section 4.7 describes the experimental setup, results and analysis. Section 4.8 concludes the chapter with future directions.

4.1 Motivation and Setup

Different semantic tags clearly have different significance in describing a topic determined by the image contents, and the tags associated with the same topic usually depend strongly on each other; this dependency can be exploited to improve annotation and tag completion performance.

The proposed TCMR algorithm addresses the incomplete and noisy tag problems by efficiently recovering the missing tags and removing or down-weighting the noisy tags simultaneously. The inspiration and underlying concept behind TCMR is the connection between the following two assumptions.
• Idea of the Language Model. Since the tags are generated from the user's description of an image, each tag vector can be viewed as a mixture of topics, and each topic follows a multinomial distribution over the vocabulary [12, 108, 206]. Note that the number of observed tags for each image is limited, while the number of parameters of the multinomial distribution to be estimated is significantly larger than the number of observed tags.

• Low Rank Matrix Recovery. The observed tags of any image can be assumed to be sampled from a mixture of a small number of multinomial distributions, which is equivalent to requiring the recovered tag matrix to be of low rank.

Connecting these two assumptions, the proposed TCMR algorithm enforces the recovered matrix to be low rank. Through an appropriate nuclear norm regularizer, it effectively captures the interactions among the different pieces of tag information, both among tag keywords (column-wise) and among the tag vectors of different images (row-wise), which turns out to be the key to filling in missing tags and down-weighting noisy ones [19, 136, 184].

Unlike in most matrix completion problems, where the observed matrix entries are sampled uniformly at random from a given matrix [19, 20], each tag entry in our setting is sampled from an unknown multinomial distribution, which makes the standard least squares loss and absolute loss inappropriate. Hence, a maximum likelihood estimator is used in this work to ensure that the learned tag probability matrix is consistent with the observed tags; this strategy also adds complexity to both the optimization and the analysis.

Although low rank matrix recovery is closely related to topic models, which have been applied to many image tag related problems [108, 206, 221], most existing topic models [11] require solving a non-convex optimization problem, which may fail to find the global optimum and remains a big challenge in matrix completion [20, 21]. To address this limitation, TCMR solves a convex optimization problem, which guarantees efficient convergence to an optimal solution.
Moreover, theoretical support is provided to show that, under favorable conditions, TCMR is guaranteed to recover most of the missing tags even when the user-provided tags are noisy; this guarantee is novel among the existing studies on tag completion [126, 134, 201, 226]. Additionally, TCMR improves the tagging performance by exploiting the dependencies between image features and tags via a graph Laplacian [224, 226], which reduces the impact of incomplete and noisy tags by assigning high weights to tags that are consistent with the visual contents of the images and low weights to those that are not, particularly in extreme cases. Furthermore, the empirical evaluation on tag re-ranking and tag refinement tasks demonstrates that TCMR is generally applicable and effective for other image tagging tasks.

Figure 4.1 highlights the key components of the proposed image tag completion algorithm by noisy matrix recovery. On the one hand, it refines the tag confidence scores by enforcing the tag matrix to be low rank. This strategy takes advantage of both the row-wise and column-wise interactions within the tag matrix, i.e., the dependencies among tag words and the correlations of tag information between images. On the other hand, a graph Laplacian is constructed based on the visual features of the images, where each node represents an image and each edge is weighted based on the distance between its connected images. The tag vector of an image is then modified based on the weighted majority voting results among its connected neighbors. These two components join together in a convex optimization framework and jointly refine the tag confidence score matrix.

Figure 4.1: The scheme of the proposed noisy tag matrix recovery framework, i.e., TCMR, for image tag completion. The low rank matrix recovery component in the upper right box exploits the tag-tag correlation, and the graph Laplacian component in the bottom left takes into account the tag-content correlation.
4.2 Related Work

Only a few studies fit the category of image tag completion with both incomplete and noisy tags. [226] proposes a data-driven framework for tag ranking that optimizes the correlation between visual cues and assigned tags. [129] removes noisy tags based on visual and semantic similarities, and expands the observed tags with their synonyms and hypernyms using WordNet. [201] proposes to search for the optimal tag matrix that is consistent with both the observed tags and the visual similarity. [134] formulates tag completion as a non-negative data factorization problem. [126] exploits sparse learning techniques to reconstruct the tag matrix. None of these studies provides a theoretical guarantee for its approach. Matrix decomposition is adopted in [15, 149, 224] to handle both missing and noisy tags; the key limitation of these approaches is that they require a fully observed matrix with a small number of errors, which makes them inappropriate for tag completion.

Low rank matrix recovery has been applied in many applications [19, 149], including visual recovery [136, 149], multi-label classification [15], and tag refinement [224]. Since the matrix rank function is non-convex, a popular approach is to replace it with the nuclear norm, the tightest convex relaxation of the matrix rank [19, 20, 224]. With nuclear norm regularization, it is possible to accurately recover a low rank matrix from a small fraction of its entries [20], even if they are corrupted with noise [19, 57]. Various algorithms [57, 94, 149, 224] have been developed to solve the related optimization problem. Instead of the ℓ₁-norm loss [57, 224], the squared loss [184], or the max-margin factorization model [136] used in most studies on matrix completion/recovery, a maximum likelihood estimator is used in our work to recover the underlying tag matrix.

4.3 Tag Completion by Noisy Matrix Recovery (TCMR)

In this section, we describe a noisy matrix recovery framework for tag completion. Before presenting our algorithm and analysis, we first introduce the notation used throughout this chapter. We use Q_{*,i} to denote the i-th column of a matrix Q, and ‖Q‖_F, ‖Q‖_tr and ‖Q‖ to denote the Frobenius norm, the nuclear (trace) norm and the spectral norm of Q, respectively.
We use ‖Q‖₁ to denote the ℓ₁ norm of a matrix Q, i.e., ‖Q‖₁ = Σ_{i,j} |Q_{i,j}|, and ‖v‖_∞ to denote the infinity norm of a vector v, i.e., ‖v‖_∞ = max_i |v_i|. We also use e_i ∈ {0,1}^n to denote the i-th canonical basis vector of R^n, and 1 ∈ R^m to denote the m-dimensional vector with all entries equal to 1.

To begin, let m be the number of unique tags, and let D = {d₁, ..., dₙ} be a collection of n tagged images, where d_i = (d_{i,1}, ..., d_{i,m}) is the tag vector of the i-th image, with d_{i,j} = 1 when tag j is assigned to the image and zero otherwise. For simplicity of analysis, we assume in this study that all images have the same number of assigned tags, denoted by m₀.¹ When different numbers of tags are observed, a simple weighting technique [150] can handle the variation in the number of tags.

Our development is based on the simple observation that the essential goal of a topic model is to approximate an observed tag probability matrix by a low rank matrix (or, more precisely, by the product of two low rank matrices). It is this observation that motivates us to connect topic (probabilistic) models with low rank matrix completion.

¹ This assumption is only for the convenience of the analysis, and does not affect the algorithm.

4.3.1 Noisy Matrix Recovery

Following the idea of language models [11, 12, 225], we assume that all the observed tags of each image are drawn independently from a fixed but unknown multinomial distribution. Let p_i = (p_{i,1}, ..., p_{i,m}) be the multinomial distribution used to generate the tags in d_i. We use P = (p₁, ..., pₙ)^⊤ ∈ R^{n×m} to represent the multinomial distributions of all the images, with one row per image. Our goal is to accurately recover the multinomial distributions P from the limited number of tags observed in D. In general, this is impossible, since the number of parameters to be estimated is significantly larger than the number of observed tags. To address this challenge, we follow the key assumption behind most topic models [184, 224], namely that the tags of any image are sampled from a mixture of a small number of multinomial distributions. A direct implication of this assumption is that the matrix P has to be of low rank, the foundation of the theory of low rank matrix recovery [20].
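The generative assumption can be illustrated with a toy simulation (all sizes and names here are arbitrary illustrations, not the datasets used in the experiments): mixing r "topic" multinomials yields a row-stochastic matrix P of rank at most r, and each observed tag vector counts m₀ draws from its row.

```python
import numpy as np

def sample_tag_matrix(n=100, m=50, r=5, m0=4, seed=0):
    """Simulate the generative model of Section 4.3.1 (toy illustration).

    Each image's tag distribution p_i is a mixture of r topic
    multinomials, so the row-stacked matrix P has rank at most r.
    Each observed tag vector d_i counts m0 i.i.d. draws from p_i.
    """
    rng = np.random.default_rng(seed)
    topics = rng.dirichlet(np.ones(m), size=r)     # r x m topic distributions
    weights = rng.dirichlet(np.ones(r), size=n)    # n x r mixture weights
    P = weights @ topics                           # n x m, rank <= r, rows sum to 1
    D = np.stack([rng.multinomial(m0, P[i]) for i in range(n)])
    return P, D
```

With n·m parameters in P but only n·m₀ observed draws, recovery is hopeless without the low rank structure, which is exactly the point of the assumption above.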
The proposed approach combines the idea of maximum likelihood estimation, a common approach in topic models, with the theory of low rank matrix recovery. It aims to recover the multinomial probability matrix P by solving the following optimization problem:

min_{Q∈Δ} L(Q) := −Σ_{i=1}^n Σ_{j=1}^m (d_{i,j}/m₀) log Q_{i,j} + λ rank(Q),   (4.1)

where the first term is denoted by E₁ and the second by E₂, λ is a regularization parameter, and the domain Δ is defined as

Δ = { Q ∈ (0,1)^{n×m} : Σ_{j=1}^m Q_{i,j} = 1, i ∈ [1,n] }.   (4.2)

We denote by Q̂ the optimal solution to (4.1). The term E₁ in (4.1) ensures that the learned probability matrix Q̂ is consistent with the observed tag matrix, and the term E₂ ensures that Q̂ is of low rank, which encodes the assumption that all tags of an image are sampled from a mixture of a small number of multinomial distributions.

We note that, unlike in standard matrix completion theory [20, 61], where the observed entries are sampled uniformly at random from a given matrix, in our model, as in other topic models, each observed tag is sampled from an unknown multinomial distribution. This difference makes the common least squares loss and absolute loss inappropriate.

However, the matrix rank function in (4.1) is non-convex and non-differentiable, which poses a problem for the optimization procedure. We therefore replace the rank function with the nuclear norm, the tightest convex envelope of the matrix rank function [19, 20, 21]. The nuclear norm of a matrix Q is defined as the sum of the singular values of Q. With nuclear norm regularization, it is possible to accurately recover a low rank matrix from a small fraction of its entries [20, 21, 161], even if they are corrupted with noise [19, 57, 100, 107], which exactly fits the setting of missing and noisy tags. Consequently, the optimization problem in (4.1) becomes

min_{Q∈Δ} L(Q) = −Σ_{i=1}^n Σ_{j=1}^m (d_{i,j}/m₀) log Q_{i,j} + λ‖Q‖_tr,   (4.3)

where the domain Δ is still defined as in (4.2).

In (4.3), the low rank structure of the recovered matrix Q̂ is induced by the nuclear norm. The nuclear norm regularizer drives the matrix completion to exploit the interactions between rows and columns in finding a global solution [14], in contrast to the Frobenius and ℓ₁ norm regularizers, which treat each entry of the matrix independently.
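A common way to handle the nuclear norm term in problems like (4.3) is proximal gradient descent, whose prox step is singular value thresholding. The sketch below is a simplified illustration, not the extended gradient method used later in Section 4.6: it replaces the simplex domain Δ of (4.2) with a crude clipping of the entries to [ε, 1−ε], and the step size and all other parameter values are arbitrary.

```python
import numpy as np

def svt(Z, tau):
    """Singular value thresholding: the prox operator of tau * ||.||_tr."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def recover_tags(D, m0, lam=0.1, step=0.1, iters=50, eps=1e-3):
    """Proximal-gradient sketch for the likelihood term of (4.3).

    D is the n x m binary observed-tag matrix; returns an estimate Q.
    Clipping to [eps, 1 - eps] crudely stands in for the domain (4.2).
    """
    Q = np.full(D.shape, 1.0 / D.shape[1])
    for _ in range(iters):
        grad = -(D / m0) / Q               # gradient of the negative log-likelihood
        Q = svt(Q - step * grad, step * lam)
        Q = np.clip(Q, eps, 1.0 - eps)
    return Q
```

Soft-thresholding the singular values is exactly why the nuclear norm acts as a convex surrogate for rank: each prox step zeroes out the small singular values, pushing the iterate toward a low rank solution.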
4.3.2 Incorporating Irrelevant Tags into Noisy Matrix Recovery

Since each initially unobserved tag is relevant to the associated image only with small probability, we also maximize the likelihood of its irrelevance. The loss functions in both (4.1) and (4.3) are updated accordingly, and the optimization problem becomes

min_{Q∈Δ} L(Q) = −Σ_{i=1}^n Σ_{j=1}^m [ (d_{i,j}/m₀) log Q_{i,j} + ((1 − d_{i,j})/(m − m₀)) log(1 − Q_{i,j}) ] + λ‖Q‖_tr,   (4.4)

where the domain Δ remains as in (4.2).

4.4 Theoretical Guarantee of TCMR

The following theorem bounds the difference between P, the underlying tag probability matrix, and Q̂, the tag matrix recovered by TCMR.

Theorem 4.1. Let r be the rank of the matrix P, and let N be the total number of observed tags. Let Q̂ be the optimal solution to (4.3). Assume N ≥ Ω(n log(n+m)), and denote by μ₋ and μ₊ the lower and upper bounds on the probabilities in P. Then, with high probability,

(1/n)‖Q̂ − P‖₁ ≤ O( r n κ² log(n+m) / N ),   (4.5)

where κ² := μ₊‖P1‖_∞/(nμ₋²) ≤ μ₊²/μ₋². A sketch of the proof is provided in Section 4.5.1.

It is clear that the recovery error is O(rn log(n+m)/N), implying that the tag matrix can be accurately recovered once N ≥ Ω(rn log(n+m)). This is consistent with standard results in matrix completion [107] and low rank matrix recovery [106]. The impact of the low rank assumption is analyzed in Section 4.4.1. However, instead of the squared loss used in standard matrix completion theory, our model adopts a maximum likelihood loss, which introduces additional challenges in analyzing its recovery properties.

4.4.1 Impact of the Low Rank Assumption on the Recovery Error

To see the impact of the low rank assumption, consider the maximum likelihood estimation of a single multinomial distribution. Since the tags of different images are sampled independently, we only need to consider one image at a time. Let p be the underlying multinomial distribution to be estimated, and let d be the image tag vector comprised of m₀ words sampled from p. We estimate p by the simple maximum likelihood estimator, i.e.,

min_{p ∈ [μ₋,μ₊]^m : p^⊤1 = 1}  −Σ_{i=1}^m d_i log p_i,   (4.6)

where m is the number of unique tags, n is the number of images, and μ₋ and μ₊ are the lower and upper bounds on the probabilities in the matrix P = (p₁, ..., pₙ)^⊤.

Theorem 4.2.
Define z as z = d/m₀ − p, and let q̂ be the optimal solution to (4.6). Then

‖p − q̂‖₁ ≤ (μ₊²/μ₋²) ‖z‖₂².

To bound ‖z‖₂, we need the following concentration inequality for vectors.

Theorem 4.3. With probability 1 − 2e^{−t},

‖z‖₂ ≤ √( (t + log m)/(μ₋ m₀) ) ‖p‖₂.

Following the concentration inequality of Theorem 4.3, we bound ‖z‖₂. Then, by combining Theorems 4.2 and 4.3, we have, with probability 1 − 2e^{−t},

‖p − q̂‖₁ ≤ (μ₊²‖p‖₂²/μ₋⁴) · (2(t + log m)/m₀).

By applying the above result to the matrix P and taking the union bound, we have, with probability 1 − e^{−t},

(1/n)‖P − Q̂‖₁ ≤ (μ₊²/μ₋⁴) max_{1≤i≤n} ‖p_i‖₂² · (2n(t + log m + log n)/N).   (4.7)

We now compare the bound in (4.7) to that in (4.5). It is easy to verify that ‖p_i‖₂²/μ₋² ≤ m for any p_i. Hence, the net effect of the bound in (4.5) is to replace m with r, which is exactly the impact of the low rank assumption.

4.5 Proofs of Error Bounds

In this section, we give the proofs of the main theorems stated in Section 4.4.

4.5.1 Proof of Theorem 4.1

Proof. Define the matrix M as

M := Σ_{i=1}^n e_i (d_i/m₀ − p_i)^⊤ = (1/m₀) Σ_{i=1}^n e_i d_i^⊤ − P,   (4.8)

where e_i ∈ {0,1}^n is the i-th canonical basis vector of R^n. Since the occurrence of each tag in d_i is sampled according to the underlying multinomial distribution p_i, it is easy to verify that E[M] = 0.

Before presenting our analysis, we need two supporting lemmas that are important to our analysis. The detailed proofs of these lemmas are provided in Section A.2.

Lemma 4.4. Let P ∈ Δ and Q ∈ Δ be two probability matrices. We have

Σ_{i=1}^n Σ_{j=1}^m |P_{i,j} − Q_{i,j}|²/Q_{i,j} ≥ Σ_{i=1}^n Σ_{j=1}^m |P_{i,j} − Q_{i,j}| = ‖P − Q‖₁.

Lemma 4.5 ([107]). Let Z₁, ..., Zₙ be independent random matrices of dimension m₁ × m₂ that satisfy E[Z_i] = 0 and ‖Z_i‖ ≤ U almost surely, for some constant U and all i = 1, ..., n. Define

σ_Z = max{ ‖(1/n) Σ_{i=1}^n E[Z_i Z_i^⊤]‖^{1/2}, ‖(1/n) Σ_{i=1}^n E[Z_i^⊤ Z_i]‖^{1/2} }.

Then, for all t > 0, with probability 1 − e^{−t}, we have

‖(1/n) Σ_{i=1}^n Z_i‖ ≤ 2 max{ σ_Z √((t + log(m₁+m₂))/n), U (t + log(m₁+m₂))/n }.

The following theorem is the key to our analysis. It shows that the estimation error ‖P − Q̂‖₁, measured in the ℓ₁ norm, is small when P can be well approximated by a low rank matrix.

Theorem 4.6.
Let Q̂ be the optimal solution to (4.3). If λ ≥ (1/μ₋)‖M‖, where M is defined in (4.8), then

‖Q̂ − P‖₁ ≤ min_{Q∈Δ} { ‖Q − P‖_F²/μ₋ + 16λ²μ₊ rank(Q) }.

To use Theorem 4.6 for bounding the difference between P and Q̂, we need to bound ‖M‖. The theorem below bounds ‖M‖ by means of Lemma 4.5.

Theorem 4.7. Define λ* as

λ* := 2μ₋ max{ … }.   (4.9)

Then, with probability 1 − e^{−t}, we have ‖M‖ ≤ λ*μ₋.

Combining Theorems 4.6 and 4.7, we obtain the following result for recovering the probability matrix P.

Corollary 4.8. Set λ = λ*. With probability at least 1 − e^{−t}, we have

‖Q̂ − P‖₁ ≤ min_{Q∈Δ} { ‖Q − P‖_F²/μ₋ + 16λ*²μ₊ rank(Q) }.

Furthermore, let P̂ be the best rank-r approximation of P. Then, with probability 1 − e^{−t},

‖Q̂ − P̂‖₁ ≤ ‖P − P̂‖_F²/μ₋ + 16λ*²μ₊ r.

We now come to the proof of Theorem 4.1. When the rank of P is r, Corollary 4.8 gives, with high probability,

‖Q̂ − P‖₁ ≤ 16λ*²μ₊ r.

If ‖P1‖_∞ ≥ 1 and m₀ ≥ O(log(m+n)), we have

λ* = O( (1/μ₋) √( ‖P1‖_∞ log(n+m)/m₀ ) ),

and therefore, with high probability,

(1/n)‖Q̂ − P‖₁ ≤ O( (r log(n+m)/m₀) · μ₊‖P1‖_∞/(nμ₋²) ) = O( (rn log(n+m)/N) · μ₊‖P1‖_∞/(nμ₋²) ),

where N is the number of observed tags. This immediately implies Theorem 4.1.

4.5.2 Proof of Theorem 4.2

Proof. Following the same analysis as that for Theorem 4.6, whose proof is provided in Section 4.5.4, we have

Σ_{i=1}^m (p_i − q̂_i)²/q̂_i ≤ Σ_{i=1}^m (z_i/q̂_i)(p_i − q̂_i).

Using the fact that q̂_i ∈ [μ₋, μ₊], we have

‖p − q̂‖₂² ≤ (μ₊/μ₋) ‖z‖₂ ‖p − q̂‖₂,

and therefore

‖p − q̂‖₂ ≤ (μ₊/μ₋) ‖z‖₂.

We finally complete the proof by using the fact that

Σ_{i=1}^m (p_i − q̂_i)²/q̂_i ≥ ‖p − q̂‖₁.

4.5.3 Proof of Theorem 4.3

Proof. We will use the Chernoff bound: let X₁, ..., X_{m₀} be independent draws from a Bernoulli distribution with P(X = 1) = μ. We have

P( (1/m₀) Σ_{i=1}^{m₀} X_i ≥ (1+ε)μ ) ≤ exp( −ε²μm₀/3 ),
P( (1/m₀) Σ_{i=1}^{m₀} X_i ≤ (1−ε)μ ) ≤ exp( −ε²μm₀/2 ).

Using the Chernoff bound, we have, with probability 1 − 2exp(−ε²μm₀/2),

|X̄ − μ|² ≤ ε²μ².

By taking the union bound, we have, with probability 1 − 2e^{−t},

‖z‖₂ ≤ √( (t + log m)/(μ₋ m₀) ) ‖p‖₂.

4.5.4 Proof of Theorem 4.6

Proof.
We consider any solution Q ∈ Δ. Since Q̂ is the optimal solution to Eq. (4.3), we have ⟨∇L(Q̂), Q̂ − Q⟩ ≤ 0, i.e.,

−(1/m₀) Σ_{i=1}^n Σ_{j=1}^m (d_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}) + λ⟨∂‖Q̂‖_tr, Q̂ − Q⟩ ≤ 0,

where ∂‖Q̂‖_tr is a subgradient of ‖Q̂‖_tr. Using the fact that ⟨∂‖Q̂‖_tr − ∂‖Q‖_tr, Q̂ − Q⟩ ≥ 0, we can replace ⟨∂‖Q̂‖_tr, Q̂ − Q⟩ with ⟨∂‖Q‖_tr, Q̂ − Q⟩, which results in the following inequality:

−(1/m₀) Σ_{i=1}^n Σ_{j=1}^m (d_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}) + λ⟨∂‖Q‖_tr, Q̂ − Q⟩ ≤ 0.

Define Z_{i,j} = (Q̂_{i,j} − Q_{i,j})/Q̂_{i,j}. We have

−(1/m₀) Σ_{i=1}^n Σ_{j=1}^m (d_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}) = −(1/m₀) Σ_{i=1}^n ⟨e_i d_i^⊤, Z⟩ = −⟨P, Z⟩ − ⟨M, Z⟩.

Thus the bound above is modified to

−Σ_{i=1}^n Σ_{j=1}^m (P_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}) + λ⟨∂‖Q‖_tr, Q̂ − Q⟩ ≤ Σ_{i=1}^n Σ_{j=1}^m (M_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}).

Since

−Σ_{j=1}^m (P_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}) = −Σ_{j=1}^m (1/Q̂_{i,j})(P_{i,j} − Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}),

we have

−Σ_{i=1}^n Σ_{j=1}^m (P_{i,j}/Q̂_{i,j})(Q̂_{i,j} − Q_{i,j}) = Σ_{i=1}^n Σ_{j=1}^m [ (Q̂_{i,j} − P_{i,j})²/(2Q̂_{i,j}) + (Q̂_{i,j} − Q_{i,j})²/(2Q̂_{i,j}) − (Q_{i,j} − P_{i,j})²/(2Q̂_{i,j}) ].

Define the matrix B ∈ R^{n×m} by B_{i,j} = M_{i,j}/Q̂_{i,j}. Using the fact that Q̂_{i,j} ∈ [μ₋, μ₊] and the result of Lemma 4.4, we have

(1/2)‖P − Q̂‖₁ + ‖Q̂ − Q‖_F²/(2μ₊) + λ⟨∂‖Q‖_tr, Q̂ − Q⟩ ≤ (‖M‖/μ₋)‖Q̂ − Q‖_tr + ‖P − Q‖_F²/(2μ₋).

We write the singular value decomposition of Q as

Q = Σ_{i=1}^r σ_i u_i v_i^⊤,   (4.10)

where r is the rank of Q, σ_i is the i-th singular value of Q, and (u_i, v_i) are the left and right singular vectors of Q. Let U_⊥ ∈ R^{n×(n−r)} and V_⊥ ∈ R^{m×(m−r)} be the orthogonal bases complementary to U and V, respectively. Define the linear operators P_Q and P_Q^⊥ as

P_Q(Z) = UU^⊤Z + ZVV^⊤ − UU^⊤ZVV^⊤,   P_Q^⊥(Z) = Z − P_Q(Z).

According to (4.10), the subgradient ∂‖Q‖_tr is given by the set

W = { UV^⊤ + U_⊥ W V_⊥^⊤ : W ∈ R^{(n−r)×(m−r)}, ‖W‖ ≤ 1 }.

Thus, by choosing an appropriate matrix W for the subgradient ∂‖Q‖_tr, we have

⟨∂‖Q‖_tr, Q̂ − Q⟩ ≥ −‖P_Q(Q̂ − Q)‖_tr + ‖P_Q^⊥(Q̂ − Q)‖_tr,

and therefore

(1/2)‖P − Q̂‖₁ + ‖Q̂ − Q‖_F²/(2μ₊) + λ‖P_Q^⊥(Q̂ − Q)‖_tr ≤ λ‖P_Q(Q̂ − Q)‖_tr + (‖M‖/μ₋)‖Q̂ − Q‖_tr + ‖P − Q‖_F²/(2μ₋).

Using the fact that λ ≥ (1/μ₋)‖M‖, we have

‖P − Q̂‖₁ + ‖Q̂ − Q‖_F²/μ₊ ≤ 4λ‖P_Q(Q̂ − Q)‖_tr + ‖P − Q‖_F²/μ₋.

We consider two cases. In the first case, assume ‖P − Q̂‖₁ ≤ (1/μ₋)‖P − Q‖_F², in which case the bound of the theorem trivially holds. In the second case, we have the opposite, ‖P − Q̂‖₁ > (1/μ₋)‖P − Q‖_F², which implies

‖Q̂ − Q‖_F²/μ₊ ≤ 4λ‖P_Q(Q̂ − Q)‖_tr,

and therefore

‖P_Q(Q̂ − Q)‖_tr ≤ 4λrμ₊.

We complete the proof by plugging in the above bound.
4.6 Implementation

In this section, we present two auxiliary techniques to improve the tag completion performance. We incorporate visual features to improve the tag accuracy, and use an extended gradient method to solve the optimization problem efficiently.

4.6.1 Incorporating Visual Features

A limitation of the noisy matrix recovery method in (4.4) is that it does not take advantage of the visual contents of the images, an important cue for accurate tag prediction. We therefore modify (4.4) to incorporate visual features. We introduce two common methods of integrating visual features: adding a graph Laplacian term to the objective function in (4.4), and linearly combining the recovered matrix P with the majority voting results among nearest neighbors.

4.6.1.1 Graph Laplacian Method

Let X = (x₁, ..., xₙ)^⊤ contain the visual features of all images, where the vector x_i ∈ R^d represents the visual content of the i-th image. Let W = [w_{i,j}]_{n×n} be the pairwise similarity matrix, where w_{i,j} is the visual similarity between images x_i and x_j, i.e., w_{i,j} =