EDGEIMPACTINGRAPHSANDSOCIALNETWORKMATRIXCOMPLETIONByDennisRossADISSERTATIONSubmittedtoMichiganStateUniversityinpartialmentoftherequirementsforthedegreeofComputerScience{DoctorofPhilosophy2016ABSTRACTEDGEIMPACTINGRAPHSANDSOCIALNETWORKMATRIXCOMPLETIONByDennisRossEverygraphGcanbeassociatedwithmanywell-knowninvariantpropertiesalongwiththeircorrespondingvalues.Here,aframeworkisproposedtomeasurethechangeinanyparticularinvariantuponadditionofanewedgeeintheresultinggraphGe.Ingraphs,theP-impactofanedgeeisthe`magnitude'ofthebetweenthevaluesoftheinvariantPingraphsGefromG.AnedgeissaidtobeofmaximumP-impactifitcanbeoptimallyaddedtoGinordertoachieveadesiredresultoneithertheinvariant'svalueorgraph'sstructure.Newedgesareaddedfromthegraphcomplementinsimplegraphsorbetweenanypairofverticesotherwise.Inthisworkseveralinvariantsareexploredincluding:numberofspanningtrees,sumofdistancesbetweenvertices,andvertexconnectivity.Abriefcommentaryonseveralotherfamousinvariantsisalsoprovided.AnumberofquestionsabouttheP-impactofanedgeonthestructureofgraphsarepresented.Alsoincludedisanattempttotlydetermineanoptimalsetofedgesbasedonourinvariant.Severalrestrictionsandconjecturestodeterminingthisoptimalsetarediscussed.Aprooftowardsoptimaledgeadditionfordistance-impactintreesisgiven.Anaturalapplicationtomeasuringtheimpactofedgeadditiontoagraphisthatoflinkpredictionproblems.Theseapplicationsareconsideredandantalgorithmforlinkpredictionevenwithcold-startverticesusingasubspacesharingmethodthatdecouplesmatrixcompletionandsideinformationtransductionispresented.Thismethodisextendedtopredictratingsinuser-itemrecommendersystemswherebothmaybecold-start.Mathematicalguaranteesandexperimentalresultsonrealworldpublicationandsocialnetworksareprovided.Vegenegyutra.iiiACKNOWLEDGMENTSBeforeanythingelsecanbesaid,ImustmymostheartfeltthankyoutomyadvisorDr.Abdol-HosseinEsfahanian.IhavewantedtoworkwithyoufromthedayIyouralgorithmicgraphtheorycourse.Icannotthankyouenoughfortakingachanceonmeasyouradvisee,andithastrulybeenaprivilegetoearnmydegreeunderyourmentorship.Ihaveachievedsomanyprofessionalgoalsthankstoyourunendingsupportandleadership.IwillmissthemanycoldMichiganafternoonsconsistingofdrawingtoomanygraphsonyourwhiteboard,butIlookforwardtocontinuingourcollaborationsandfriendshipinthefuture.IhavebeenlargelysupportedbyworkingwithUSAIDaspartoftheDSI.ThiswouldnothavebeenpossiblewithoutthetirelessworkofDr.PouyanNejadhashemi.Thankyoufortakingthestressofmaintainingfundingofmyshoulders-nottomentionexposingmetomanyinterestingbiosystemsandglobalaiddataproblems.Ifurtherwanttoextendthankstotherestofmycommittee:Drs.Pang-NingTanandGuoliangXing.Yourguidanceandinsightsmadetheprocessofcompletingadissertationmuchmoreachievable.Thisworkwouldnotbepossiblewithoutthewonderfulhelpofseveralcolleagues.ToDr.RonaldNussbaum,Igreatlyenjoyedthegraphtheoryworkwecompletedtogether.Also,amassivethankyoutobothDr.RanaForsatiandImanBarjastehforsharingyourinterestandexpertiseinmatrixcompletionproblems(andenduringlotsofsillyquestions).Finally,thankyoutoAshleyDepotteyformakingtheDSIlotsmorefunthanitotherwisewouldhavebeen.ivTomyparents,BethandGary,thankyouforencouragingintellectualcuriositythrough-outmyentirelife.Thatalongwithyourloveandsupportwereinstrumentalinthisverychallengingendeavor.Alongwithmysister,Valerie,IcouldalwayscountonmyfamilywhenIneededit.Tomyfriends,youallhavestoodbymebothpersonallyandprofessionallythroughoutmyjourneythroughgraduateschool.Iwouldnothavegottenthroughitwithoutyouall,norwouldmytimeatMichiganStatebeenasmuchfun.ThankyouBenHardin,SaraVrede-voogd,Dr.JoshVredevoogd,Dr.AndrewRatkiewicz,ErichOwens,Dr.LukeWilliams,LizWilson,Dr.CherylJaegerBalm,Dr.ThomasJaeger,andDr.TianHao.Tomywife-to-be,Dr.JenniferCornacchione,youhavethedistinctionofbeingjustoneplaceaboveFarkasinthissection.Youarealsolikelytheonlypersontoreadeverysinglewordinthisdocument,andyourcopy-editingwasincrediblyhelpful.Thankyouforbeingsuchawonderfulpartner,sharingsomanyexperienceswithme,andofcourseforgener-ouslygivingmeallofthebestchairsinNewLeaf.Let'sdosomegreatthingsnowthatIamdonetoo!IsupposeIcanalsothankGusandOliverhere-albeitbegrudgingly...Finally,Ihavetothankmymostloyalcompanion,Farkas.Nagyonom.Youweretheonlyonewithmeeveryday,foreveryhighandlow,throughbothmymastersanddoctoralstudies.Yourfuzzyfacealwayskeptmegoing.Szeretlekkicsem!vTABLEOFCONTENTSLISTOFTABLES...................................ixLISTOFFIGURES..................................xiiChapter1Introduction..............................11.1andTerminology..........................21.2MotivationandGoals..............................61.3RelatedWork..................................81.4Overview.....................................14Chapter2ProblemStatement..........................16Chapter3TheP-ImpactProcess........................193.1TheBasics....................................193.1.1TimeComplexity............................203.1.1.1RunningTimeofSPtree-Impact..............213.1.1.2RunningTimeofDistance-Impact.............223.2RandomP-ImpactProcess...........................223.2.1RandomDistance-ImpactAlgorithm.................233.3Improvements..................................233.4Conclusion....................................24Chapter4ConstructionfromEmptyGraphs.................254.1Observations...................................254.2DistanceInvariants...............................264.2.1Count-Impact..............................264.2.2Distance-Impact.............................284.3Conclusion....................................31Chapter5Trees...................................325.1NumberofSpanningTrees...........................325.2Distance.....................................335.2.1TreePartitions.............................335.2.2CounterexamplesinDistanceImpact.................345.2.3Non-LeafDistance-ImpactEdgesinTrees..............405.3Conclusion....................................60Chapter6P-ImpactinNetworkCompletion.................616.1Experimentation................................616.1.1NetworkCompletion..........................626.1.2EvaluationMetrics...........................626.1.3TheProcess...............................636.1.4PredictiveEdgesonErd}os-RenyiRandomGraphs..........63vi6.1.5PredictiveEdgesonRandomTrees..................656.1.6PredictiveEdgesonRandomPowerLawGraphs...........676.2PowerLawMotivation.............................696.3Conclusion....................................71Chapter7NetworkCompletioninGraphs..................737.1TheAssumptions................................747.2TheAlgorithm.................................757.2.1PreviousApproach...........................757.2.2Overview................................767.2.3AlgorithmDetails............................767.3ExperimentalEvaluation............................807.3.1Datasets.................................807.3.2EvaluationMetrics...........................837.3.3BaselineAlgorithms..........................837.3.3.1LinkPredictionBaselineAlgorithms............847.3.3.2NetworkCompletionBaselineAlgorithms.........857.3.4ExperimentsonSyntheticDatasets..................867.3.4.1ofNoiseinSideInformation.............867.3.4.2ofTrainingSize....................877.3.5EvaluationofLinkPrediction.....................887.3.5.1LinkPredictiononEpinions.................887.3.5.2LinkPredictiononWeibo..................907.3.6EvaluationofNetworkCompletion..................927.3.6.1NetworkCompletioninFacebook..............927.3.6.2NetworkCompletioninGoogle+..............927.4Conclusion....................................94Chapter8RecommenderSystems........................958.1TheAssumptions................................968.2TheAlgorithm.................................978.2.1PreviousApproach...........................978.2.2Overview................................988.2.3AlgorithmDetails............................988.3ExperimentalEvaluation............................1028.3.1Datasets.................................1038.3.2EvaluationMetrics...........................1048.3.3TheBaselineAlgorithms........................1058.3.4ofNoise.............................1088.3.5ExistingUsersandItems........................1088.3.6Cold-StartItems............................1138.3.7Cold-startUsers.............................1158.3.8Cold-StartUsersandItems......................1158.4Conclusion....................................116viiChapter9Conclusions...............................117APPENDIX.......................................119REFERENCES.....................................133viiiLISTOFTABLESTable5.1Analysisofthecounterexamplestotheconjecturethatthemaximumdistance-impactedgejoinscomponentsinTkwherekisthemax-imumindexedge.............................39Table5.2Analysisofthecounterexamplestotheconjecturethatthemaximumdistance-impactedgejoinscomponentsinTkwherekisthemax-imumdistIndexedge...........................39Table5.3Analysisofthecounterexamplestotheconjecturethatthemaximumdistance-impactedgejoinscomponentsinTcorisincidenttocwherecisthecentervertex.......................40Table5.4Thedistance-impactofallnon-isomorphicedgeadditionsinP4...42Table5.5Thedistance-impactofallnon-isomorphicedgeadditionsinP5...42Table5.6Thedistance-impactofallnon-isomorphicedgeadditionsinP6...49Table5.7Thedistance-impactofallnon-isomorphicedgeadditionsinTbad..52Table6.1ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-Renyiran-domgraphsoftheformGp25;0:5q...................65Table6.2Thebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforrandomtrees.67Table6.3Thebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforrandompowerlawgraphs................................69Table7.1StatisticsofWeibo,Epinions,Facebook,andGoogle+datasets...83ixTable7.2LinkpredictionresultsonEpinionsdatasetandtheoftrainingsize.....................................89Table7.3LinkpredictionresultsontheWeibodatasetandtheofvaryingtrainingsize...............................91Table7.4ComparisonoftalgorithmsontheGoogle+withtpercentagesofobservednodes.....................92Table8.1Statisticsoftherealworlddatasets..................105Table8.2ResultsonMovieLens100Kand1MandEpinionsforneighbor-basedmethodswithnocold-startusers/items................111Table8.3ResultsonMovieLens100Kand1MandEpinionsforlatentfactormethodswithnocold-startusers/items................112Table8.4Resultsonallofthecold-startscenariosforrealdatasets......114TableA.1ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-Renyiran-domgraphsoftheformGp25;0:1q...................129TableA.2ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-Renyiran-domgraphsoftheformGp25;0:2q...................129TableA.3ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-Renyiran-domgraphsoftheformGp25;0:3q...................130TableA.4ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-Renyiran-domgraphsoftheformGp25;0:4q...................130xTableA.5ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-Renyiran-domgraphsoftheformGp25;0:6q...................131TableA.6ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-Renyiran-domgraphsoftheformGp25;0:7q...................131TableA.7ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-Renyiran-domgraphsoftheformGp25;0:8q...................132TableA.8ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-Renyiran-domgraphsoftheformGp25;0:9q...................132xiLISTOFFIGURESFigure1.1Agraphwherethedashededgeshavetheirrespectivedistance-impactnoted...............................4Figure3.1Maximumdistance-impactdoesnotyieldanoptimalsolution...21Figure4.1ThePhasesoftheMaximumCount-ImpactProcess.........27Figure4.2ThePhasesoftheMaximumDistance-ImpactProcess.......29Figure5.1Counterexampletoconjecture5.2.0.1.................35Figure5.2Counterexampletoconjecture5.2.0.2.................35Figure5.3Counterexampletoconjecture5.2.0.4.................36Figure5.4CounterexampletoConjecture5.2.0.9................37Figure5.5CounterexampletoConjecture5.2.0.10................37Figure5.6CounterexampletoConjecture5.2.0.11................38Figure5.7P4withallnon-isomorphicpossibleedgeadditionsasdashededges42Figure5.8P5withallnon-isomorphicpossibleedgeadditionsasdashededges42Figure5.9P6withallnon-isomorphicpossibleedgeadditionsasdashededges43Figure5.10Cn(left)andDpCn2(right),eachoforderandsizen.......43Figure5.11TheregionsofDpCn2........................45xiiFigure5.12Tbadwithallnon-isomorphicpossibleedgeadditionsasdashededges52Figure5.13Proposeddistance-impactedgewhenselectedleavesareofdistance353Figure5.14Proposeddistance-impactedgewhenselectedleavesareofdistance454Figure5.15Proposeddistance-impactedgewhenselectedleavesareofdistanceatleast5................................59Figure6.1RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:5q..................64Figure6.2MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:5q......................64Figure6.3RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomtreesgraphs.....................66Figure6.4MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomtrees............................66Figure6.5RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandompowerlawgraphs..................68Figure6.6MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandompowerlawgraphs.....................68Figure6.7RMSEcausedbyadjacencymatrixelementdeletionforerandompowerlawgraphsoforder15.....................70Figure6.8RMSEcausedbyadjacencymatrixelementdeletion(1'sonly)forerandompowerlawgraphsoforder15..............70Figure7.1Algorithmfornetworkcompletionwithsideinformationviathepro-posedalgorithmfordecoupledcompletionandtransduction....77xiiiFigure7.2TherecoveryerroroftheproposedMC-DTalgorithmnoisevariancevalues..................................86Figure7.3Therecoveryerroroftalgorithmsonasyntheticdatasetfortsizesofpartiallyobservedsubmatrixwithmnodes....87Figure7.4TherecoveryoffouralgorithmsontheFacebookdatasetmeasuredinRMSEundertpercentagesofobservednodes.......93Figure7.5TherecoveryoffouralgorithmsontheFacebookdatasetmeasuredinMAEundertpercentagesofobservednodes.......93Figure8.1Algorithmforratingpredictionusingsideinformationviathepro-posedalgorithmfordecoupledcompletionandtransduction....99Figure8.2RMSE&MAEonthesyntheticdatasetfortnoisevariancesonsimilaritymatrices.........................109Figure8.3RMSE&MAEofMovieLens1Mfortnoisevariancesonsimilaritymatrices...........................110FigureA.1RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:1q..................121FigureA.2MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:1q......................121FigureA.3RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:2q..................122FigureA.4MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:2q......................122FigureA.5RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:3q..................123xivFigureA.6MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:3q......................123FigureA.7RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:4q..................124FigureA.8MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:4q......................124FigureA.9RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:6q..................125FigureA.10MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:6q......................125FigureA.11RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:7q..................126FigureA.12MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:7q......................126FigureA.13RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:8q..................127FigureA.14MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:8q......................127FigureA.15RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:9q..................128FigureA.16MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:9q......................128xvChapter1IntroductionGraphsprovideasimpleandpowerfulmathematicalconstructinwhichbothab-stractandrealworldmodelscanleveragetheirstructuretomakecompellingobservations.Thestructureofagraphisthenmeasuredbyexaminingitsinvariantproperties.Weareinterestedinmaximizingorminimizingcertaininvariants,spdistance-basedmea-sures,asagraphevolvesundertheadditionofedges.Furtherapplicationswilltakeagraphtobeasocialnetwork.Usingthesegraphsbothlinkpredictionandrecommendersystemproblemsmaybeexplored.Thisworkshowstheprogressionofapuregraphtheoreticalproblemintomatrixcompletiontechniquesinnetworksandrecommendersystems.Assuch,inthischaptertheimportantgraphtheoreticalandstructuresrequiredfortheintroductionoftheP-impactprocessareprovided.Withinthisframeworkwefurthergivethenecessarymotivationforcodifyingthisnewconceptwhileprovidinganoverviewofthescopeofthiswork.Additionally,andsymbologyformatrixcompletion,linkprediction,andrecommendersystemproblemswillbenecessarytoexaminetheevolutionoftheimpactproblem.11.1andTerminologyThegraphtheoretictermsandsymbolsofBasinModernGraphTheory,unlessotherwisenoted,areusedthroughoutthisdissertation.Tobegin,agraphisanunorderedpair,GpV;Eq,withthesetofedges,EorEG,composedofpairsofelementsinthesetofvertices,VorVG.Theelementsineachedgeex;y,wherex;yPV,arecalledtheendpointsoftheedge.Inaweightedgrapheachedgeisalsoassignedareal-valuedlabel.Inanunweightedgraphalledgesareassignedavalueofone.Theorderofthegraphisthecardinalityofthevertexsetandthesizeisthecardinalityoftheedgeset.Thesearedenotedn|V|andm|E|respectively.GxisunderstoodaspVYx;EqorpV;EYxqcontextuallywhenxisavertexsetoredgeset.AsubgraphHpV1;E1qofagraphGpV;EqisanorderedsetwhereV1—VandE1—E.Aninducedsubgraph,GrV1s,istherestrictionofGtoavertexsubsetV1wherealledgeswithbothendpointsinV1areincluded.GraphsGandHareisomorphicifthereexistsbijectivecorrespondencebetweentheirvertexsetsthatpreservesadjacencyunderthebijectivemap':GÑH.Agraphinvariantisapropertythatremainsunchangeduptoisomorphismbetweengraphs.Anedgeiscalledincidenttoavertexifitcontainsthatvertexasatleastoneofitsendpoints.Twoverticesarecalledadjacentiftheyshareanedge,andtwoedgesareadjacentiftheyshareatleastoneendpoint.Theneighborhoodofavertexisthesetofalladjacentvertices.Thenumberofedgesincidenttoavertexisthedegreeofthatvertex.TheminimumdegreeofG,pGq,isthesmallestdegreeinthegraph.Similarly,themaximumdegreeofG,pGq,isthelargestdegreepresentinthegraph.IfthesetEcontainsrepeatedelements,Giscalledamultigraphandtherepeatededgesarecalledmultipleedges.Aloopisanedgewherebothendpointsarethesame.Agraphiscalledsimpleifitcontainsnoloopsormultipleedges.Allgraphswillbeconsideredtobesimpleunlessexplicitlystated.2Apathisalinearsequenceofverticeswhereadjacentverticesinthesequenceareadjacentinthegraph.Theshortestpathisthepathbetweentwoverticesxandywiththelowesttotaledge-weightsumconnectingthem.Thelengthofthispathiscalledthedistancedenoteddpx;yqordGpx;yq.Thediameterofagraphisthelengthofthelongestshortestpath.Theindexofanedgeisthenumberofshortestpathscontainingit.Thedsitanceindex,ordistIndex,ofanedgeisthesumoftheshortestpathscontainingmultipliedbytheirrespectivelengths.Thetwoendpointsofthepathcreatedbythediameterarecalledperipheralvertices.Acycleisthesameasapathexcepttheunderlyingsequenceiscyclic.Thesumofthedistanceofallofthepairsofverticesistheall-pairsshortestpathdistanceortotaldistance.Agraphiscalledconnectedifthereexistsapathbetweenallpairsofverticesanddisconnectedotherwise.Acomponentisasetofverticesthatareconnectedandtherearenopathstoverticesoutsideofthisset.AnyvertexvwhoseremovalcreatesmorecomponentsinGviscalledacutvertex.Agraphofordern¥2andsizen2iscalledthecompletegraphKn.ThetrivialgraphK1isasinglevertexwithnoedges.ThecomplementofGpV;EqisagraphGpV;EpGqq.Atreeisaconnectedgraphcontainingexactlyn1edges.AspanningtreeisaconnectedsubgraphofGwithn1edgesandnvertices.Avertexisapendantifitisdegreeone,anditisaleafifitisdegreeoneinatree.AbipartitegraphisagraphwheretheverticescanbepartitionedintotwosetsUandVsuchthattheendpointsofalledgeslieinexactlyoneofthepartitions.Acompletebipartitegraph,Km;n,isabipartitegraphwhereeveryvertexinUisadjacenttoeveryvertexinV.Aspecialcaseofthecompletebipartitegraphisthestar,K1;n.1.1.TheP-impactofanedgeeistheencebetweenthevaluesoftheinvariantPingraphsGefromG.WithEEpGqforsimplegraphsandEEpKnqotherwise.3Whentheinvariantisdiscrete-valuedtheofisunderstoodcon-textually.ThesetEmayberestrictedtoanymultisetoftheedgesofKnandtheresultisthesetofpermissibleedges.IfmorethanoneedgeisaddedtothegraphtheP-impactismea-suredwiththeadditionoftheentiresetofedges.ThemethodofaddingedgestoagraphGusingtheirP-impactistheP-impact-process.ForsimplegraphstheP-impact-processterminateswhen,aftertheadditionofe,wehaveacompletegraph.Foramulitgraph,thisprocesscanbeorterminateafteraprescribednumberofedgesisreached.IftheinvariantPistakentobethesumoftheall-pairsdistanceinG,thentheP-impactiscalledthedistance-impactofeonG.WhenPisthenumberofspanningtreesofG,thisisdenotedassptree-impactofeonG.Thediscreteversionofthedistance-impactofanedgeconsidersthenumberofpairsofverticeswhosedistanceisreducedwiththeadditionofanedge.Thisproblemiscalledtheall-pairsshortestpathdistancecount-impactorcount-impact.AlloftheseprocessesandvaluescanalsosimilarlybeconsideredforedgedeletionoredgeevaluationinGeandpGeqerespectively.74Figure1.1Agraphwherethedashededgeshavetheirrespectivedistance-impactnoted.Movingintotheapplicationspace,thecommondescriptionsforlink-predictionandrecommendersystemproblemsaredescribed.Herethewordsgraphandnetworkwillbeusedinterchangeablywithapreferenceforgraphinthemathematicalspaceandnetworkintheapplicationspace.Ingeneral,thegoaloftheseapplicationsistoprovidealistofedgestoaddtoagraphorasetofitemstorecommendtoauser.Thenotationdescribingthelinearalgebrastructureisasfollows.Lowercaseletters,suchasu,areusedtodenotescalarsandtheboldedversions,suchasu,areinsteadvectors.HereRisusedasthesetofrealnon-negativenumbersandrnstodenoteasetonintegerst1;2;:::;nu.For4matricesuppercaseletters,suchasM,areused.ThetransposeofvectorsandamatricesaredenotedbymJandMJ,respectively.Thescalarproduct,ordotproduct,betweentwovectorsmandnisdenotedbymJn.Therankofamatrixisthedimensionofthevectorspacespannedbythecolumnsofthatmatrix.TheFrobeniusnormofamatrixMPRnmisdenotedby}M}Fwhere}M}Fb°ni1°mj1|Mij|2.ThespectralnormofMis}M}2givenbythesquarerootofthemaximumeigenvalueofMM.Thenuclearnorm,ortracenorm,ofamatrixisdenotedby}M}trace?MJM.FinallytheMoore-PenrosepseudoinverseofmatrixMisshownbypMq:.Fortheapplications,matrixcompletionproblemsareexploredviacollaborativeFornetworkcompletionthereisasetofnusers,Utu1;:::;unu.Asanextension,asetofmitems,Iti1;:::;imuisaddedforrecommendersystemapplications.Inthecasewithusersanditemseachuser,ui,may(ormaynot)providesomeformoffeedbackonsubsetoftheitemset.Examplesoffeedbackcanbevaried,butinthisworkitwillberestrictedtoanexplicitreal-valuedrating.Althoughthegraphnotationisconsistentwiththeimpactproblems,somefurtherwillbeusedtodescribethegraphical(network)structureoftheappliedprob-lems.GivenanorderngraphGpV;Eq,theverticesaredistinguishableastheyrepresentdistinctusers.Theadjacencymatrix,APt0;1unn,issuchthateveryentryei;jPt0;1;?u.Anentryof\0"isaknownmissingedge,\1"isaknownextantedge,and\?"isanedgethatisunknown.ThereisfurtheraninducedsubgraphofGcalledOpV;E1qwhereOPt0;1;?ummwith1¤m¤nandE1•E.Whenaroworcolumniscompletelyunobserved,thecorrespondingnodeiscon-sideredacold-startnode.TherearenoassumptionsastothedistributionofobservednorunobservedentriesinA.However,inO,theedgesaresampleduniformallyatrandomanddonotcontainanycold-startentries.Thisisenforcedevenwhensucharestrictiondictatesthatm!n.5Theobjectiveofthisworktooutthemissingentriesofamatrixusingtheobservedentriesandpossibleotherinformation.Theprocessoflinginthemissingentriesiscalledmatrixcompletion.Anyexternalinformationaboutthenodesinasocialnetworkgraphbeyondtheadjacencymatrixiscalledsideinformation.Sideinformationcanbewidelyvaried,dependingontheproblemdomain,andtakesonmanyforms(e.g.humandemographicinformation,URLclickpatterns,proteinfoldsinyeast).Theprocessofapplyingthesideinformationtocompletetheadjacencymatrix(orratingsmatrix)iscalledtransduction.1.2MotivationandGoalsGraphinvariantsdescribethenatureofgraphsandareatthecoreofgraphtheoret-icalknowledge.Further,itisusefultoexaminetheevolutionofgraphsovertimeviaedgeadditionanddeletion.ThesefactsmotivatedustocreateaprocessinwhichthechangesininvariantvaluesofgraphscanbetrackedorcontrolledviatheP-impactprocess.Byaddingedgestoanemptygraphandexaminingitsevolution,patternsemerged.Thisledtoattemptstogeneralizeandcategorizetheentireprocess.Accordingtoanex-tensiveliteraturesearch,thereisnopreviousworkresemblingtheP-impactorP-impactprocess.ThepolynomialnatureofthecomplexityofcomputationofmanyinvariantsisalsoleveragedastestingtheP-impactprocesscanbefullycomputationallycompletedandcomparedagainstknownbenchmarks.ThiscomputationaladvantageisimperativebecausethedeterminationoftheoptimalP-impactprocessisgenerallynottrivial{evenonbasicgraphclasses.Bycreatinganewandprocessthereareseveralclearobjectivestothiswork.Primarily,thegoalisunderstandasmuchofthefundamentalmathematicalnatureoftheP-impactprocessaspossible.Tothisend,thegoalistodeterminenecessaryandtconditionsfortdiscoveryofoptimalP-impactedgesinspecialgraphclasses6beforeconsideringthegeneralcase.Thisworksettlesmanycasesintreesandprovidesastartingpointforusinggraphicalinformationtodeterminesideinformationtobeusedinlinkpredictionandrecommendersystemproblems.Therearemanyresultsofthesetypesinsocialmedianetworks.Initially,distance-impactwasdeployedasawaytobridgedistinctgroupsofpeopleformorevemes-saging.Messagestendtostayclosetotheirsourcesinsocialnetworks[4].Policymakersoradvertiserscouldpaytoconnectseeminglydisparateusersbyahighimpactedgesothencouldreleasecoordinatedmessages.Thiselyreducesthedistancethatamessagewouldneedtotravelandcreatesanadditionalsourcefornewlyreduced-distanceuserstoreceivethemessage.Theseapplications,fromapuregraphtheoreticalconcepttoaninitialapplication-drivenapproach,ledtoanexplorationofusingsideinformationtopredictedgesinpartiallyobservednetworks.Thisexplorationpushedthisworkintotherealmofmatrixcompletionproblemsintherealmoflinkprediction,recommendersystems,and,,initem-basedtaxonomyproblems.Realworldnetworks,socialorotherwise,areoftenmodeledbygraphs.However,inrealworldnetworksitcanbeverytogatherthecompletegraphstructurethatmodelsmanyapplications(e.g.socialnetworks,biologicalnetworks,internetwebsiteinteractions).Suchissuesarisefromthenatureoftheseproblems.Informationmaybeprotectedasamatterprivacy,corporateknowledge,ormaynotbequeriedduetothesheersizeofthedata.Further,actorsinnetworksmaytrytodeliberatelyobscurethemselves,ormayhavefewmeasurableinteractions.Althoughnotalloftheseproblemsmaybealleviated,thereareseveralmathematicaltechniquesthathavebeensuccessfullydeployedtocompletethemissinginformationfromsuchpartiallyobservedgraphs.Thesetechniquesinitiallycollectapartialsampleofanetworkandtheninferthenetworksstructure.This,admittedlybroad,techniqueisreferredtoasnetworkcompletionorlessfrequentlyassurveysampling.7Toimproveonnetworkcompletionandrecommendersystemproblemsviamatrixcompletionsideinformationiscollectedandanalyzed.Theworkhereinisusedtosolvesuchproblemsintheirmosttcase,namelywhencold-startissuesoccur.Currently,thisistheleastunderstoodareaofmatrixcompletionapplicationstographcompletionandrecommendersystems.1.3RelatedWorkAlthoughtheconceptoftheedgeimpactprocessisnew,graphinvariantshavebeenwidelystudiedusingavarietyofmathematicaltechniques.Evenmoresimilarly,manyresultsinextremalgraphtheoryaminimumadditionofedgestoachieveadesiredinvariantvalue.Simplegraphclassesareexaminedtogaininsightandfollowextremalworkthatwasdoneusingconstructiveandprobabilistictools.TounderstandtheP-impactingeneralgraphs,weconsiderparticulargraphclasses.SimilarworkwasdonebyRossetal.inthedistance-preservingproblemforregulargraphs[77].WorkinrandomgraphsgenerationtomakegeneralobservationswasgivenbySzemeredi'sRegularityLemma[43].ThisallowsclaimsofrandomnesswhentheP-impactprocessisusedonlargegraphs.Erd}ospublishedmanyresultsonextremalgraphtheory[22]andspgraphscontainingcertainstructures[21].TheseextremalproblemstrytothesmallestnumberofedgesorverticesthatmustbeusedforcertainpropertiestobeTheP-impactprocessexaminessimilarphenomena,yetrestrictstotheadditionofedgestopreserveorachieveinvariantvalues.Strictlyworkingwiththenumberofedges,otherextremalresultsalsoexist[15].SomemajorinvariantsareunderstoodthroughsimilarmethodsastheP-impactpro-cess.Edge-connectivitycanhavearelationshipwithbothdistanceandspanningtreeap-plications.Theproblemitselfisquitematureandmanycomputationaltechniquesforcom-putingedge-connectivityexist[54].Algorithmsandimprovedboundsforedge-connectivity8werealsofoundbyEsfahanianandHakimi[23].Thisisinadditiontootherbounds,com-putationalresults[56,25]andextremalresultsonthenumberofedgesthatcanbeaddedtoenforcek-edge-connectivity[13].Althoughthisworkfocusesmoreontheroleedgesplayindistances,oftentheverticesofconcernarehighlycentraltothegraph.Previousworkhasexploredthediscoveryandpropertiesofcentralvertices.Nieminenexaminedandthecentralityofverticesingraphs[62].LaterworkbyBorgattiexaminedcentrality'sroleinsocialnetworks[10].Theseareexaminedinthepracticalapplicationsastheimpactofimportantactorsinsocialmediagraphsareexplored.Themaximumdistance-impactedgestrytoreduceeithertotaldistanceorthedis-tancesofthelargestnumberofshortest-pathsinagraph.Thisprocessissimilartootherideasthataimtocreatestructurallyimportantedgesbymeasuringbetweennessandcen-trality.Freemangaveaframeworkformeasuringbetweennesscentrality,andBrandesgaveafastalgorithmforcomputationalit[26,11].Thisalsoleveragessimilarworkexploringthecentersandcentroidsofgraphs[63].Theapplicationscontainedhereinusethemodelingpowerofgraphsforphysicalandsocialnetworks.TherehasbeenmuchworkinthesethatcanguidetheseSeveralauthorshavefoundmethodsofmeasuringimpactofmessagesinsocialmedia[85,37].Othershaveexploredthepredictivepowersofexamingthesocialnetworksforpredicting:useractivity[86],propagation[18],andborevenue[3].Bakshyetal.alsoputforthamethodforestimatingthepopularityofcontentbymeasuringdistancethatURLstraveledthroughanetwork[4].Graph(network)completionproblemsexistinanumberoftsettingswhereincompletegraphinformationispresent.Theliteratureisdeepwithmethodstosolvesuchinstanceswithapplicationsininformationretrieval,socialnetworkanalysis,andcom-putationalbiology[2,66].SomerelevantexamplesinvolvelargesocialnetworkssuchasFacebookandTwitter.Becausethesenetworkshaveusersthatnumberinthebillions,9toinferthefullnetworktopologydescribingtherelationshipsbetweenusers(edgesinthenetwork)isknowntobeinmanycases[66].Extantlearningalgorithmsfornetworkcompletionproblemscommonlymakestruc-turalassumptionsonthenatureoftheunderlyingnetwork.Fromhere,therearemanymethodstotlyreconstructtheactualnetwork.Theclassicalnetworkcompletionimplementationsmakeanassumptionthatrandomentriesoftheadjacencymatrixaremissing.However,innetworkcompletionisaccomplishedbyrandomlysubsamplingthepartiallyobservednetwork.Sp,in[41]itisassumedthattheunderlyingnetworkfollowstheKroneckergraphmodel.Anexpectationmaximization(EM)frameworkwasusedtoinfertheunobservedpiecesofthenetwork.Acomputationalmethodthatwasusedtomissingandspuriousinteractionswithincomplexnetworkisgivenin[32].Heretheseinteractionswerefoundusingstochasticblockmodelswhichthencapturedthestructuralfeaturesofthenetworkitself.Thesamemethodwasappliedontheinteractionsofproteinsinayeastnetwork.Further,asamplingmethodthatderivesintervalsfromsamplednetworksisgivenin[36].AsimilarlinkpredictionapplicationthatappliesmostsimilarlytotheP-impactprocessisgivenin[48].Thisworkpredictsfutureedgesthatwillbeaddedtoanetwork.However,becausenolinkscanbeobservedforunsamplednodes,thesestatisticalmodelsperformpoorlyinthesecases.Suchproblemsexistwhenextremesparsityispresent,andfareevenworsewhenpresentedwiththecold-startscenario.Maximummarginmatrixfactorization[83]wasamethoddevelopedforcollaborativeSeveralworksshowthat,theoretically,thismethodmayperfectlycompleteamatrix[90,58,72].Someextensionsofthisworkincludecollaborative[83]andalsoallowtheuseofsideinformation[1].Oneprinciplemethodexploredintheworkistheuseoftransductionofsideinfor-mationtocompletematricesandrecentworkapproachesmatrixcompletionusingasimilar10tack.Someworkconsidersmatrixcompletionwithtransduction[24,65]andevencanbeexpandedtothecasewheresideinformationisofdimensionasdescribedin[1].Tohandlesparsityproblems,severalstudieshavegivenmatrixfactorizationmodelsthattrytooptimizetheirresultsbytakingfewersamplesfromtheoriginalmatrices.Menonetal.gavealogisticregressionapproachthataddedalogisticregressionwithaprincipledeightingschemetoitsobjectivefunction[58].ABayesianapproachwastakenbyPorteousetal.whereinregressionisappliedtothesideinformationthemselves[72].OneadvantagetothisBayesianapproachisthatthemixturemodelfortheprioroftheusers/itemsprovidesatregularizationforeachofthelatentclasses.Parketal.examinedrecommendersassimpleregressionproblems[67].Heretheyusedacombinationofbothuseranditemmetadatatocreatethesideinformation.Arelaxationofthenetworkcompletionproblemcanbeseenaslinkpredictiononbipartitegraphswithweights.Theserecommendersytemsproblemshavealsobeenwidelystudiedwithmanysimilartechniquestothoseusedinlink-prediction.Therehasalsobeensomeworkdirectlyaddressingcold-startproblems.Content-based(CB)andcollaborative(CF)arewell-knownexam-plesofrecommendationapproaches.AsdemonstratedbyitsperformanceintheKDDCup[19]andcompetition[8],themostsuccessfulrecommendationtechniqueusediscollaborativeThistechniqueexploitstheusers'opinions(e.g.,movieratings)and/orpurchasing(e.g.,watching,reading)historyinordertoextractasetofinterestingitemsforeachuser.InfactorizationbasedCFmethods,bothusersanditemsaremappedintoalatentfeaturespacebasedonobservedratingsthatarelaterusedtomakepredictions.Somemethodsareappliedthattrytopredictitemratingsusingglobal,andeasytocalculate,itemcharacteristics.Suchalgorithmshavebeenbasedonthepopularityoftheitems[67]oreventhroughrandomselection[51].Byapproachingthecold-startusers/itemsinthisglobalmanner,anynuancebetweendistinctgroupsofusersislost.Withthislackofagreatamountofaccuracyintherecommendationsislost.11Toalleviatetheconcernsraisedbythelackofhistoricaldataforusers/items,meth-odsweredevelopedpresentuserswithasetofitemsthattheymustrate.Thesewarm-startmethodsmayalsoimportuserpreferencesfromsideinformation.Suchworkhasbeensucessful[51,89,17,87],butasthesemethodsexplicitlyforceanewusertoprovideratingsforkrepresentativeitems(oranewitembeingforcedonkusers)thereiscantdrawbacksinusingwarm-startmethods.Incommerce,theviewsspendlearningitemcharacteristicscouldhavebetterbeenspentonpresentingitemsausermaywanttohavepurchased.Forsocialnetworks,thesitemaybeuninterestingtoanewuserbecausetheyarenotpresentedimmediatelywithrelevantorentertaininginteractions.Toavoidthewarm-startpitfalls,therehasbeenalargeamountofinterestinusingsideinformationtocompletetheratingoradjacencymatrices[82].Thissideinformationcanevenprovidecontextforcold-startusers/itemswithoutrequiringdirectuserfeedback.Thesemethodsoffeaturecombinationcombinethefeaturesoftheusersanditemstoincreasethemodel'saccuracy.Thedeterminationofthesefeaturesisfoundusingavailableuserinformation(e.g.location,webhistory)anditeminformation(e.g.metadata,manufacturerspwebBecauseuser-user(oruser-item)featurespacescanbedescribedusingthefeaturesextractedfromthesideinformation,therehasbeenapushtodevelopmethodsthatexploittheoverlappingsubspacestherein.Thesemethodsareoftencalledmatrixco-factorizationandhavebeenusedtosucessfullyexploitrichsourcesofsideinformationtoincreasemodelaccuracy.In[52,31,35,34]ratingandsideinformationmatricesaredecomposedatthesametimetoexposesharedlatentfeatures.Theworkof[64]imputedmissingvaluesandusedtheseinthematrixfactorizationtoboosttheperformanceonthecold-startproblem.AkernelizedmatrixfactorizationwasgivenbyZhouetal.[90].Hereauxiliaryinformationisincorporatedintomatrixfactorizationtoassessthesimilarityofthelatentfeatures.Saveskietal.[79]developedamatrixfactorizationmethodthatcollectivelydecomposedratingandsideinformationmatriceswithinacommonspaceoflowdimension.12Bymappingthefeaturesfromthesideinformationintothelatentfeatures,anothergroupoffeaturecombinationmethodshavebeenfound.Elbadrawyetal.'sapproachismadetolearnafunctionthattransformsthefeaturevectorsofitemsintotheirlatentspace[20].Gantneretal.furthergaveamatrixfactorizationmodelthatmapsthefeaturesdirectlytolatentfeatures[28].Finally,Boltzmannmachines[33]andaspectmodels[80]alsoutilizesideinformationtobeusedexplicitlyforcold-startrecommendationproblems.Someothermodelscomputesimilaritybetweenusers/itemexplicitlyandusethesesimilaritymatricestoapplyfeaturecombinationmethods[49,84].Sp,Trevisioletal.studiedusers'consumptionofnewsarticlesandthenbuiltaspecialgraphdubbedBrowsGraph[84].BrowsGraphincludedbothstructuralandtemporalpropertiesthatbothservednewstoitsusersbutwasalsoabletoproviderelevantarticlestoevencold-startusers.Anothercommonapproachistoallowmanyentrecommenderalgorithmstoexecuteonthedataandcombinetheirresultsusingvariousmethods.Theseapproachesrequirebuildingandrunningseveraltmodelsleadingtonecessarilyhighercompu-tationaloverhead[33].However,somesuccesshasbeenfoundinthecombinationofthesemodelsduetoweightedoutputs[16],reapplyingarankfunction[12],applyingtrecommendersateachphase[9],andcreatingavotingframework[70].Oneofthemoreeapproachesforrecommendersystemsistotlyfactortheadjacencyorratingmatricesintomultiplicativeofk-rankmatrices.Thisapproachheavilythisdissertation.Consideringtheitemrecommendationcase,thegeneralideaistofactortheratingmatrixintouseranditemspmatricessuchthattheirproductapproximatesthecompleteratingmatrix.Thetwomajorapproachestothisendareoptimizationtechniques[76,50,53,46]andprobabilistic[61]innature.Asanotherformofsideinformation,overthelasteyearsseveraltaxonomicapproachestorecommendersystemshavebeenattempted.Thesemethodsattempttoextractimplicituserpreferencesbasedonthehierarchyoftheitemsthemselves.The13taxonomicmatrixfactorizationapproachestriedtograbtheseexplicittaxonomiesandusethemassideinformation.Astheevolved,sophisticatedmethodstolearnthetaxonomiesthemselvesfromtheitem-userinteractionsweredeveloped.Zhouetal.,usingtheirKernelizedProbabalisticMatrixFactorization(KPMF)algo-rithm,haveleveragedsideinformationwithintheitem-taxonomicframework[91].Intheirwork,theyaccuratelymakeitemrecommendationsusingtheusers'socialnetworkassideinformation.Droretal.createdamatrixfactorizationschemewhereinthecategoricalandtemporalinformationaboutmusicwasusedassideinformationformusicratingpredictions[42].Kanagaletal.developedataxonomy-awaredynamiclatentfeaturemodelthatnotonlyallowedsuchtaxomoniestobeusedinitemstorecommendtousersbutalsoprovidedaframeworkforsimilarlatentfeaturesbetweenparentandsiblingitemsinthetaxonomy[40].Zhangetal.automatedthetaxonomycreationprocessvialearningtechniquesthatcreatedtaxonomieswithnouserinput[88].Theiralgorithmevenperformsbetterthanlatentfactormodelsonhuman-createdtaxonomies.Finally,wenotethatalthoughvarioushybridmethodssuchasfactorizationma-chines[74],content-boostedcollaborative[57],probabilisticmodels[71],pairwisekernelmethods[7],andots-basedmethods[68]havebeendevelopedtoblendcollab-orativewithsideinformation,theyarespdesignedtoaddressthedatasparsityproblemandfailtocopewithcold-startusersoritemsproblem,whichisthemainfocusofthiswork.1.4OverviewInthisintroduction,weanewconceptcalledtheP-impactprocessinaddi-tiontoourmotivationforintroducingthisconcept.InChapter2wedescribeadetailedproblemstatementwhileraisingopenquestionsaboutP-impactedgesanddescribingim-plicationsinnetworkcompletion,linkprediction,andrecommendersystemproblems.InChapter3theP-impactprocessisexplored.InChapter4distanceandcountimpactare14comparedinemptygraphs.WhileinChapter5P-impactisappliedtotrees.Chapter6servesasabridgebetweenP-problemsandapplicationsinsocialnetworks.Chapter7examinesnetworkcompletioningraphs.Chapter8extendstorecommendersystemsandChapter9containstheconclusionsofthiswork.15Chapter2ProblemStatementThegoalofthisworkisoptimizeinvariantvaluesunderedgeadditionandtousesuchtechniquestoprovidepredictionsfornewedgeadditionsinrealworldapplications.Becauseofthis,thescopeofthisworkfocusesontwogeneralfacets.First,themathematicalbasisfortheP-impactproblemisdescribed.Second,theapplicationstolinkpredictionandrecommendersystemsaregiven.Forageneralinvariant,ourmeaningofP-impactoptimizationmaynotbeclear.Optimalityasameasureofgoodnesswhereeachaddededgewilltrytooptimallypreserveaninvariantvalue,structure,orconcept.AsinputagraphG,atargetsizek,areconsideredandagraphHofsizekwiththedesiredinvariantvalueiscreated.Giventhewideandvariednatureofgraphinvariants,thismaybeunderstoodtoholdaseachedgeisadded,oronlyaftertheentiresetofedgesisadded.TheremaynotbeanoptimalsetofedgesforagivenGandkunderaparticularinvariant;theprescribedvalueisapproachedascloselyaspossiblewiththeadditionofkedges.Tobeginthisexplorationthethreeinvariantsareconsidered.Thecountsthenumberofspanningtreesandtheothertwoareconcernedwithdistancevariants.Forthenumberofspanningtrees,thegoalistomaximizethetotalnumberofspanningtreesviaanumberofedgeadditions.Forthedistanceinvariants,agoalissettodetermineboth16thesetofedgesthatreducethesumofalldistancesandthehighestnumberofverticeswithreduceddistances,respectively.Todiscovertheoptimalsetofedgestoaddinthecaseofageneralgraph,eachoftheseinvariantsareconsideredundertheP-impactprocesswithGrestrictedtospgraphclasses.Anemptygraphisconsideredwithkn2.TheprocessthenaddsedgesthroughoutitsP-impactprocesstoeventuallyreachKninthesimplecase.Here,anattemptprobabilisticallyidentifywhatgraphsarecreatedthroughthisevolutionaryprocesswithourchoseninvariantsismade.Afurtheropenproblemistointroduceafunctiontorandomlyaddedgesofhighestandlowestimpactinanattempttocreategraphswithinterestingpropertiesinaprocesssimilartothatwhichcreatessmall-worldnetworks.ThehighestconsiderationismadeforwhenGasatreeonnnodeswithkn.Atreeischosenbecauseitisaminimallyconnectedgraphandourinvariantsrelyonconnectednessforreasonableresults.Inthiscasethespanningtreeimpactproblemistrivialandcanbedeterminedbyonlythediameterofthegraph.However,thecountanddistanceimpactedgesetsareprovablynon-trivialandtheirdeterminationistobeexploredwithsomeresultspresented.WhendealingwithtreesintheP-impactproblem,connectivityisheavilyleveraged.However,questionsriseastohowtohandleverticesthataredisconnectedfromthegraph.Onceconnectivitywasdroppedverticesbecomecold-start(isolated).Thisleadtoanexplorationofcold-startverticesandalsodeterminationofedgestobeaddedtoagraph.Fromtheselinkpredictionexercisesapracticalapplicationarosethatconsideredmatrixcompletiontechniquesforadjacencymatricesusinggraphinvariantsassideinformation.Fromheretheproblemsbecamemoreconcreteandwereexploredonavarietyofrealworlddatasets.Theinitialaimoftheworkonnetworkcompletion(linkprediction)applicationswastoexploitsideinformationfromthenodes.Thissideinformationisgenerallyavailableinsomeformfromthesocialnetworksathand.Aproblemthusariseswhereinthechoiceof17sideinformationandmethodsoftransductiondeeplythequalityofthelinkprediction.Variationsarealsoincludedwheretherearecold-startnodesandchoicesforsideinformation.Similartolinkpredictionproblems,thisworkalsoexploresthechallengesofpredict-ingratingsinrecommendersystems.Again,entmatrixfactorizationmethodsexist,butaregenerallyintoleranttocold-startproblems.Afurtherchallengeisexploredinrec-ommendersystemswherecold-startusersanditemsmaybepresent.Withsparsityandcold-startissuespresent,thisworkseekstoprovideaframeworkforincorporatingsidein-formationusingsharedsubspacesinthematrixcompletiontechniquestoprovideaccurateratingprediction.TheresultsofthislaythegroundworkforfutureexplorationofP-impactedges.Itprovidesmanyinsightsabouttheyofassessingdistance-impactingraphsandshowsthenon-trivialnatureofdistance-impactedgesintreeswithacollectionofcounterexamplestovariousconjectures.Themathematicalbasisculminateswithaproofthatthemaximumdistance-impactedgecannotbeincidenttotwoleavesinatree.Considerableprogresswasmadewhenconsideringtheapplication-basedworkinbothnetworkcompletionandrecommendersystems.Bothexperimentalandmathematicalresultsaregivenshowingadecoupledtransductionapproachtoincorporatingsideinformationinmatrixcompletionistandaccurate.Thealgorithmspresentedalsohandlecold-startproblemsandnon-randomlydistributedsparsedatasources.18Chapter3TheP-ImpactProcessTheP-impactprocessreferstoanalgorithmthataddsedgesfromthepermissiblesettoagraphwhileattemptingtomaintainsomeinvariantvalueuntilaprescribedgraphsizeisachieved.ThegeneralP-impactprocessalgorithmisdescribedhereaswellasarandomizedvariant.Althoughrunningtimesdependonthecomplexityofcomputingtheinvariantsthemselves,computationalimprovementsthroughmeaningfulreductionsinthepermissiblesetareexplored.Thesereductionsareintroducedhereandfurtherdetailedinlaterchapters.Generally,maximumorminimumP-impactedgesareconsidered,butanyimpactvalueofanedgemaybeused.Lastly,theunderlyinggraphsmaycontrolmuchoftheP-impactprocess,andthesearealsodiscussedinlaterchapters.3.1TheBasicsTheP-impactprocessismostsimplyimaginedasabruteforcealgorithm.Generally,theconcerniswithsingleedgeadditionthatatthetimesomeinvariantcondition.Here,onceanedgeisaddeditmaynotberemoved.Thealgorithmitselfissimpleinthatitaddseachedgefromthepermissiblesetoneatatime,computestheinvariantvalue,andremovestheedge.Fromthis,itthentakestheedgewiththedesiredinvariantvalue.19ThemaximumP-impactedgeisusuallytheonemostused,butthemaxfunctionmaybereplacedwithanyconditionasrequiredbytheinvariant.Algorithm1TheP-ImpactProcessprocedureFindImpactEdgeI=fgforePPdoIYpPpGeq;eqreturnmaxConditionpPpGeq;eqprocedureFindImpactfulSetwhile|EpGq|€kdoeFindImpactEdgepGqPPzeGGereturnEpGqThisisthecasethatwegenerallypursue.However,thereisalsointerestinan`optimal'setofedgestoadd.Noticethattheprocessgenerallypickstheedgethatisbestinanyparticulartime,butdoesnotconsideritsimpactonfutureedges.Thisgreedyapproachmaynotalwayschoosethebestedgesetforagiveninvariantasawhole.Arelaxationofthisadditionwilltrytotheoptimalsetofedgestoaddallatonce.Forthisformulationoftheproblemweattempttodetermineasinglesetofedgesfromthepermissiblesetthat,whenadded,maintainorachieveaninvariantvalue.Theseresultsmaybebetter,buttheyrelyonahigherburdenofcalculationthatwillbediscussednext.3.1.1TimeComplexityNoticethattheP-impactprocessmustcomputetheinvariantvalueaftertheadditionofeachedgeofthepermissibleset,andthenrepeatthisprocessforeveryedgeadditionuntilGissizek.Thiscostisbythenumberoftimesthatthebestknownalgorithmtocomputeeachinvariantiscalled.IfaninvariantPpGqcanbecomputedinfpnqtimethiscallmustbemadeforeachpossibleedgeinthepermissibleset,whichisOpn2q,andkntimestochoosealloftherequirededges.Thus,thealgorithmwilltakeOppknqn2fpnqq20timetocompute.Throughoutthisdissertation,thismodelwillbeusedunlessexplicitlystatedotherwise.Comparethistotherelaxedcase,anditisclearthattheinvariantcalculationsareedbutthereisaneedtoconsiderallsizeknsubsetsofthepermissibleset.AsthepermissiblesetcanbeuptoOpn2qinthesimplecasethisisOn2knfpnq.ThisconsidersallpossiblesubsetsofsizeknofP.Lookingforanoptimalsubsetofadsizewillgenerallybemoretime-consumingthanthepreviouscase,butisstillpolynomialinniffpnqis.InFigure3.1themaximumdistance-impactprocessonarandomgraphGwithn10,m14,andk17isoutperformedbyanoptimalsetofthreeedgestoadd.Theaddededgesareshowninreddashedlines,andtheorderthattheedgesareaddedaregiven.MaximumDistance-ImpactProcessOptimalThreeEdges123111TotalDistance:150TotalDistance:146Figure3.1Maximumdistance-impactdoesnotyieldanoptimalsolution3.1.1.1RunningTimeofSPtree-ImpactThesptree-impactofagraphGcanbecomputedbycalculatingthenumberofspanningtreesunderadditionofeachedgeinthepermissibleset,respectively.OnagivengraphthenumberofspanningtreesiscomputedinOpVEETqwhereTisthenumber21ofspanningtrees[27].ThisprocesscanberepeatedkntimestothedesiredgraphinOppknqVEETqq.Ifthebestoverallsetistaken,asintherelaxedcase,thereisaneedtocheckthenumberofspanningtreesundertheadditionofallpossiblesubsetsofsizekn.ThiswouldtakeOn2knpVEETqtime.3.1.1.2RunningTimeofDistance-ImpactToanedgeofparticularcountordistance-impact,eachpossibleedgeinthepermissiblesetmustbecheckedandtheall-pairsshortestpathsafteritsadditionmustbefound.UsingtheFloydWarshallAlgorithm,orseveralcallstoDijktra'sAlgorithm,wecancomputeallofthedistancesinagraphinOpn3q.Forthedistance-impactthesumofthetotaldistanceineachstepisfound,andforthecount-impactthesumofthetotalnumberofchangesisfound.Combiningthese,anyoneedgemaybedoneinOpn5q.Thisisthespecialcasewhenkn1.Ingeneral,thereisaneedforthisprocesstoberepeatedkntimesandthatgivesarunningtimeofOppknqn5q.Againfortherelaxedcase,allofthedistancesundertheadditionofallpossiblesubsetsofsizeknmustbechecked.ThiswouldtakeOn2knn3time.3.2RandomP-ImpactProcessWhathappensifinsteadofaddingthemaximumdistance-impactedgeateachstep,abiasedcoinisedandthenthedecisiontoaddthemaximumorminimum-impactedgeismade?Thisisthequestionthatmaybesolvedbytherandomdistance-impactprocess.223.2.1RandomDistance-ImpactAlgorithmThisprocessisquitestraightforward.First,twopropertiesarechosenalongwithaprobabilitythreshold.Second,usetheP-impactprocessasbefore,butchoosetheedgetoaddbasedonwhichconditionisrandomlyselected.Conditiononeisconsideredtobeamaximumimpactedgeandconditiontwotobeaminimumimpactedgeunderbothdistanceandsptreeinvariant.Here,considertwoconditionsandathreshold,0¤t¤1,fordecidingbetweenthem.Algorithm2TheRandomP-ImpactProcessprocedureFindImpactEdgeI=fgChooserandompPr0;1sforePPdoIYpPpGeq;eqifp¤tthenreturnmaxCondition1pPpGeq;eqelsereturnmaxCondition2pPpGeq;eqprocedureFindImpactfulSetwhile|EpGq|€kdoeFindImpactEdgepGqPPzeGGereturnEpGqNotethatthereisnoreasontorestricttoonlytwoconditions.Theminimumandmaximumand/ordistanceandcountP-impactareoftenpairedasourconditions,butarbitrarilymanyconditionscouldbeimposed.3.3ImprovementsThecostsofcalculatinginvariantsisconsideredtobeandthetimecomplexityrequiredtocomputeinvariantvaluesisgiventothebestalgorithmsavailableatthetime.Becauseofthis,anyimprovementstorunningtimemustbeachievedbyreducingthesize23ofthepermissiblesetmeaningfully.Ifsuchimprovementscanbeachieved,feweredgesmustbeexaminedand,therefore,fewercallstocalculatetheinvariantvaluesaremade.Bydeterminingwhichedgesinthepermissiblesetmaybesafelyremoved,thepracticalrunningtimecanbereducedeveninsomecaseswheretheworst-casetimecomplexityisunchanged.Mostconjecturesastopermissiblesetreductionsaregivenintrees,butothergraphclassescouldbeexplored.3.4ConclusionTheP-impactprocessisamethodfordeterminingwhichedgestoaddtoagraphuntiladesiredsizeisachieved.Thealgorithmforboththestandardandrandomprocessisdescribedherealongwithrunningtimeobservations.Providedaninvariantcanbefoundinpolynomialtime,thisprocesscanbecompletedinpolynomialtimeaswell.ThefocusontheremainingP-impactchapterswillrevolvearoundfastwaystoreducethepermissiblesettoimprovethecomputationalcomplexityoftheprocess.24Chapter4ConstructionfromEmptyGraphsInthischapteranemptygraphisconsideredandarandomimpactprocessisusedtoaddinnewedges.Theminimalstructurepresentintreescanbeusedtomakesev-eralobservations,however,inthischapterthereisnoinitialstructure.Unlessotherwisestated,edgesofthemaximumimpactforourP-impactprocessareconsideredateachstep.However,onevariantoftherandomP-impactprocesswillalternatebetweenminimumandmaximumimpactedges.Theevolutionofmaximumdistanceandcount-impactisalsodescribedwhentheinitialgraphisempty.4.1ObservationsManyinvariantsarenotnecessarilywellondisconnectedgraphs.Forone,therearenospanningtreesofadisconnectedgraphbecausethespanningtreeitselfmustbeconnectedandcontainallvertices.Conversely,thedistanceinvariantscanbeondisconnectedgraphs.Iftwovertices,xandy,havenopathconnectingthemthenitisthecasethatdpx;yq8.Forthepurposesofthiswork,withthesamevertices,considerdpx;yqC.HereCwillbeanarbitrarilylargeconstant.Withthisreal-valueddistance,elyalargepenaltyissetupforleavingagraphdisconnected.25Inthedistance-impactcasethisrestrictionwillforcethen1edgesaddedtocreateastar,butinthecount-impactversionanytreewillbeinitiallycreated.4.2DistanceInvariants4.2.1Count-ImpactThecount-impactprocessfromanemptygraphmovesthroughseveralfullyphases.Inthephaseitisimportanttorestatethatdisconnectedverticesinagraphareofarbitrarilylargedistanceapart.Thisrepresentsthedistancethattheyhavefromeachotherandforcesthen1count-impacttocreateatree.Thisisbecausewheneveratleastonevertexisdisconnectedfromtheothers,themaximumcount-impactedgemustconnectittotherestofthegraphbecausethedistancereductionisThisimpliesthatisolatedverticeswillbeconnectedtoeachotherorconnectedcomponents,butunlikethemaximumdistance-impactedge,whichwillbedescribedlater,theseconnectionsdonothavetooccuronthehighestdegreevertices.Inthephaseatrulyrandomtreeisconstructed.Oncethisrandomtreeisconstructed,maximumcount-impactedgesmostlytendtofallintotwogeneralarchetypes.Inphasetwo,newedgeswillgenerallybeconnectedtoperipheralverticesortoverticesofmaximumdegree.Similartotheresultsontrees,therearecounterexamples,butcomputa-tionaltestinghasshownthatapproximately82%oftheedgesinthesecondphasewillbeofoneoftheseclasses.Thispercentagewascomputedfromallgraphsuptoordereleven,butisnotaclaimforallgraphs.Further,notethatphasetwowillbeskippedwhentherandomtreecreatedinphaseoneproducesastar.Aconjecturehereisthatthelengthofthediameterintherandomtreewillcontrolhowmanyedgeswillbeaddedinphasetwo.Onceagraphisdiametertwo,phasethreebegins.Novertices,otherthanthenewlyconnectedones,willbfromtheadditionaledge.Thus,thecount-impactofanyedgeisagainone.AgraphicaldescriptionofthephasesisgiveninFigure4.1.26Initial:EmptyGPhase1:AnyTreePhase2:DiameterReductionPhase3:GraphCompletionOutput:KnFigure4.1ThePhasesoftheMaximumCount-ImpactProcessProposition4.2.1.Aftertheadditionofthen1maximumcount-impactedgestoanemptygraphofordernatreeiscreatedfromthecollectionofallpossibletreesonnvertices.Proof.ConsiderthesetofdisconnectedverticestobeDoforderd,0¤d¤n.ThesetofverticesthatmakeuptheconnectedcomponentareCoforderc,1¤c¤n.Becausetheunlabeledgraphisinitiallyempty,theedgeconnectstwoarbitraryvertices.Thereisnochoicetobemade,however,oncetheinitialedgeisadded,itmustbeshownthatallforthcomingdistance-impactedgeswillcreateatree.IfthereareonlytwoverticestheprocessiscompleteandastarK1;1iscreated.Count-impactisfairlysimplebecauseitonlymeasuresthenumberofverticesthathavereduced.Newedgescanbeaddedinoneofthreecases.27Case1:TheedgeisincidenttotwoverticesinC.Inthiscasethecount-impactcanbeatmostcbecausenoverticesinDareaddedtotheconnectedcomponent.Case2:TheedgeisincidenttotwoverticesinD.Thishasacount-impactofexactlytwobecausebothverticeswereinthedisconnectedsetsoonlyaK2componentiscreated.Case3:TheedgeevuisaddedwherevPDanduPC.Thecount-impactisc1becauseallverticesinwPChavereducedtheirdistancefromftodpw;uq1.Thisisareductionofthedistancesofc1vertices.Clearlycase3istheoptimalcasebecauseithasacount-impactofc1whilecasetwoisatmostcandcase1isexactlytwo.Further,theedgesfromDtoCcanbeaddedatrandombecausethecount-impactdoesnottakeintoaccountthetotaldistance.Thus,arandomtreeoforderniscreatedaftertheadditionofn1count-impactedges.4.2.2Distance-ImpactThemaximumdistance-impactprocessinitiallyappearsasthoughitmustmovethroughthesamephasesasthemaximumcount-impactprocess.However,tinphaseoneleadtotheeliminationofthediameterreductionphasefromthecount-impactvariant.Sp,thedistance-impactedgewillalwaysformastaraftertree-manyedgesareadded.Newedgeswillstilltrytominimizethetotaldistanceandmustbeaddedfromanunconnectedvertextothecenterofthelargestcomponent.Becauseofthis,thephaseremainsthesameandwilladdedgesatrandomtothegraphbecauseitsdiameteristwo.Thisimpliesthatallpossibleedgeshavedistance-impactofone.ThisprocessisdescribedinFigure4.2.28Initial:EmptyGPhase1:StarEmergencePhase2:GraphCompletionOutput:KnFigure4.2ThePhasesoftheMaximumDistance-ImpactProcessProposition4.2.2.Aftertheadditionofthen1maximumdistance-impactedgestoanemptygraphofordernthegraphK1;n1iscreated.Proof.ConsiderthesetofdisconnectedverticestobeDoforderd,0¤d¤n.ThesetofverticesthatmakeuptheconnectedcomponentareCoforderc,c˘1and1€c¤n.ThegraphinquestionisTpV;EqwhereVDYC.Disconnectedverticesareofarbitrarilylargedistanceapartdenotedf.ThetotaldistanceinCistheconstanttdpCq.Astheunlabeledgraphisinitiallyemptythestedgeconnectstwoarbitraryver-tices.Thereisnochoicetobemade,however,oncetheinitialedgeisadded,itmustbeshownthatallforthcomingdistance-impactedgeswillcreateatree.IfthereareonlytwoverticestheprocessiscompleteandastarK1;1iscreated.AftertheinitialK2iscreatedtherearethreechoicesforapossiblenewdistance-impactedgeadditions:29Case1:ThenewedgeisbetweentwoverticesinC.Thisisnotpossible.EitherthesetDisemptyinwhichcasetherehavealreadybeenn1additions,orthereisatleastonevertexleftinD.BecausetheverticesinDareofarbitrarydistancefromthoseofC,nomatterhowmuchthereductionofthedistancesinC,thesinglereductionofevu,withvPDanduPC,fromfto1makessuchaedgehaveahigherdistance-impact.Case2:TwoverticesinDareconnected.ThisreducesthedistanceofthesetDfromd2ftod21f1,atotaloff1.NotethattheotherdistancesintdpCqandthedistancesbetweenDandCisdcfandisunchanged.Case3:ThenewedgeconnectsavertexofDtooneofC.Thisreducesthetotaldistanceofthedisconnectedandconnectedsetsfromdcfbyatleastpd1qcf.Thisisadistancesavingsofcf.Case1willnotoccuranditisalsothecasethattheorderoftheconnectedcompo-nentisatleasttwo.Thisimpliesthatthereductionincase3ofcfisgreaterthanthatofcase2becauseitonlyhasareductionoff1asc¥2.Nowallthatremainstobeshownisthatthecase2edgeadditionsalwaysformastar.Clearly,theonlychoicesforthetwoedgesareK1;1andK1;2,respectively.Thesearethebasecasesforinduction.ConsidernowthatiedgesareaddedcreatinganoptimalK1;i1.Considertheadditionofanotheredge.BytheinductivehypothesisitmustbeconnectedtothestarK1;i1.Thisleavesnonon-isomorphicchoices.Case1:ThenewedgeisconnectedtoaleafofK1;i1.ThetotaldistanceinthestarK1;i1ispi1qpi2qpi1qpi1q2.TheaddeddistanceofthenewvertextothatofK1;i1iscomprisedofthedistancetotheincidentleaf,thedistancetothecenter,andthedistancetotheotheri2leaves.Thisgivesatotaldistanceofpi1q2123pi2qpi2qpi1q.Case2:ThenewedgeisconnectedtothecenterofK1;i1.TheaddeddistanceofthenewvertextothatofK1;i1iscomprisedofthedistancetothecenterandthedistancetotheotheri1leaves.Thisispi1q212pi1qi2.Comparingthecasesthetotaldistancesarei2¤pi2qpi1qifandonlyifi¥2whichis30coveredbythebasecases.Thisnewedgemustalwaysbeaddedtothecenterandastaristhuscreated.4.3ConclusionTheP-impactprocessonemptygraphsprovidessomeinsightontheevolutionofinvariantsasgraphsincreaseinsize.However,atthemomentmuchofthisworkisobser-vationalinnature.Notethatmaximizingdistance-impacttendstoreducethediameterwhereasthemaximumcount-impacttendstoincreasethemaximumdegree.Bothoftheseclaimshavebeensupportedbysomecomputationalresults,butremain`rulesofthumb'ratherthanprovenstatements.Furtherobservationscanbemadetosomegeneraltrendsforgraphs,buteachstageoftheprocessfordistanceandcount-impactedgeadditionfromemptygraphsisgiven.31Chapter5TreesManyinvariantsareoruninformativeindisconnectedgraphs;treespro-videconnectedgraphsofthesmallestsize.BecauseofthisminimalextremalstructuretreesprovideareasonableinitialgraphfortheP-impactprocess.Thischapteralsofocusesonthegraphinvariantsofthenumberofspanningtreesandthetotaldistancebecausetheyarecomputationallysimpletocomputeanddetermineinterestinggraphproperties.Itisshownthatthemaximumsptree-impactedgeistrivialwhilecontrastingthiswiththemorecomplextaskofdeterminingthemaximumcountanddistance-impactedge.Severalconjecturesandcounterexamplestotheapriorideterminationofthemaxi-mumdistance-impactedgeareprovidedandtheyprovidesomeintuitiontotheroleofthemaximumdistance-impactedgeinGe.Finally,aproofisgiventhatshowsthemaximumdistance-impactedgemustnotbeincidenttotwoleavesofatree.5.1NumberofSpanningTreesTreesareacyclicandassuchthereisonlyonespanningtreeofatreeT,namelyTitself.Further,theadditionofasingleedgetoatreecreatesonlyonecycle.Thismakesthetaskofanarbitrarysptree-impactedgequitesimplebecauseitreducestothedistancesbetweentheverticesinatree.32Lemma5.1.1.Thenumberofspanningtreesinanordernconnectedgraphwithnedgesisequaltothegirthofthegraph.Proof.Theremustbeexactlyonecycleinaconnectedgraphwithnedgesonnvertices.Removinganysingleedgeofthiscyclecreatesatree,andremovinganyotheredgesdis-connectsthegraph.ThuswecancreatespanningtreesonlybyremovingoneedgeatatimefromtheonlycycleinG.Thelengthofofthiscycleisthegirth.Asanimmediateresultofthislemma,toasptree-impactedgeofvalueiinTtheedgeexywheredTpx;yqi1mustbeadded.Thisresultistrivial,butitwillbeusedtohighlightthebetweensptree-impactanddistance-impact.5.2DistanceWheresptree-impactonlymustcreateacycleofanappropriatelength,thedistance-impactedgemustbalancethelengthofthecyclecreatedwiththedistributionofverticesalongthatcycle.Becausetressareacyclic,totaldistanceisagainsimpletocompute.However,whenanextraedgeisadded,notonlydoesthedistancebetweenthenewlyconnectedverticesdroptoonebuta`shortcut'maybecreatedformanyothervertices.Thisproblembecomesdbecauseknowingwhetheranyotherverticesusethis`shortcut'isanon-trivialproblem.5.2.1TreePartitionsInthesimplecaseanyedgeinGmaybeconsideredasanP-impactedgeinthepermissibleset.Thus,OpEqOpn2qedgesmustbeaddedindividuallyalongwiththeinvariantcalculations.Again,asthecomplexityforinvariantcomputationisthegoalwillbetotryandreducethesizeofthepermissiblesetinstead.Thiscanreducethepractical,ifnotasymptotic,complexityofingP-impactedges.Tothisend,graph33structuresthatwouldallowtheexaminationveryspverticesofatreeaprioriweretargeted.Severalobservationscanbemadeaboutwhatedgescannotbeinthepermissiblesetfor,say,calculatingthemaximumimpactedge.However,manyobservationswilltakeatleastacalltosearchtoidentifyimportantstructures.Suchacallwillnullifyanycomplexitysavingsinreducingthesizeofthepermissibleset.Instead,somestructureinthegraphthatisfasttocomputeandcanpartitionthesetofedgesinGintoedgestoconsiderandedgestoimmediatelydiscountshouldbefound.Intrees,severalcommongraphinvariantswereconsideredtosplitthegraph'sedges.Theseincludedlookingatdegree,centers,anddistance.Formostoftheseconsiderations,counterexampleswerefoundandaregiveninthenextsection.However,thesecounterex-ampleshelpedtonethebasisforselectingpartitionsontheedgeset,andledtoaproofregardingdistance-impactedgesamongtheleavesinSection5.2.3.5.2.2CounterexamplesinDistanceImpactTodeterminetheappropriatelocationforthedistance-impactedgeseveralconjec-tureswereputforward,butcounterexampleswerefoundandarepresentedbelow.Thesegiveinsightastowhatstructureisimportanttothedistance-impactofedges.IneachthesolidlinesrepresenttheedgesinGandthedashedlinesaretheedge(s)ofmaximumdistance-impactinGe.Anwasmadetoprovidetheextremallyinterestingcounterexampleswheretheorderisthesmallest.Conjecture5.2.0.1.Themaximumdistance-impactedgeisincidenttoapairofperipheralvertices.Counterexample.InthegraphinFigure5.1themaximumimpactedgesdonotconnectthemaximumdegreevertices.34Figure5.1Counterexampletoconjecture5.2.0.1Conjecture5.2.0.2.Themaximumdistance-impactedgeisincidenttotwoofthemaxi-mumdegreeverticesintrees.Counterexample.InthegraphinFigure5.2themaximumimpactedgesdonotconnectthemaximumdegreevertices.Figure5.2Counterexampletoconjecture5.2.0.2Conjecture5.2.0.3.Themaximumdistance-impactedgeisincidenttoapairofleaves.Counterexample.Figure5.2alsoprovidesacounterexample.Conjecture5.2.0.4.Themaximumdistance-impactedgeisincidenttoatleastoneofthemaximumdegreeverticesintrees.Counterexample.GiventhefollowinggraphGinFigure5.3themaximumimpactedgesdonotconnectthemaximumdegreevertices.35Figure5.3Counterexampletoconjecture5.2.0.4Conjecture5.2.0.5.Themaximumdistance-impactedgeisincidenttoatleastonvertexofmaximumdegreeoratleastoneoftheleaves.Counterexample.Figure5.3alsoprovidesacounterexample.Conjecture5.2.0.6.Themaximumdistance-impactedgeisincidenttothemaximumdegreeverticeswiththelongpathbetweenthemintrees.Counterexample.Figure5.3alsoprovidesacounterexample.Conjecture5.2.0.7.Themaximumdistance-impactedgeisincidenttothemaximumdegreeverticesofthefurthestdistancefromeachother.Counterexample.Figure5.3alsoprovidesacounterexample.Conjecture5.2.0.8.Themaximumimpactedgeinatree,T,iseitherincidenttothecenter,C,oritsendpointsareinseparatecomponentsinTC.Conjecture5.2.0.9.Themaximumdistance-impactedgejoinscomponentsinTk,wherekistheedgeofmaximumindex.Counterexample.InFigure5.4themaximumdistance-impactedgedoesnotjoincompo-nentsofTk.36kFigure5.4CounterexampletoConjecture5.2.0.9Conjecture5.2.0.10.Themaximumdistance-impactedgejoinscomponentsinTkwherekisthemaximumdistIndexedge.Counterexample.InFigure5.5themaximumdistance-impactedgedoesnotjoincompo-nentsofTk.kFigure5.5CounterexampletoConjecture5.2.0.10Conjecture5.2.0.11.Themaximumdistance-impactedgejoinsdisjointcomponentsinTCorisincidenttoCwhereCisthecentervertex.37Counterexample.InFigure5.6,themaximumdistance-impactedgeisnotincidenttothecenter,andpTCqwiththemaximumdistance-impactedgeisdisconnected.Thevertexlabeled`C'isthecenter.CFigure5.6CounterexampletoConjecture5.2.0.11Conjectures5.2.0.8-10yieldveryfewcounterexamples.Thefollowingtablesshowthecomputationalresultsforsmalltrees.38Order34567891011NumberofTrees1236112347106235Counterexample100000000AdjacenttoMaxIndex12249143363151PercentCounterexample100000000001213141516175511301315977411932048629003121631073216003919896122760000.0950.0120.0100.033Table5.1Analysisofthecounterexamplestotheconjecturethatthemaximumdistance-impactedgejoinscomponentsinTkwherekisthemaximumindexedge.Order34567891011NumberofTrees1236112347106235Counterexamples100000000AdjacenttoMaxdisIndex12248132755123PercentCounterexamples100000000001213141516175511301315977411932048629003131126758413343069724817512000.0950.0120.0160.023Table5.2Analysisofthecounterexamplestotheconjecturethatthemaximumdistance-impactedgejoinscomponentsinTkwherekisthemaximumdistIndexedge.39Order3456789101112NumberofTrees1236112347106235551Counterexamples1000000000AdjacenttoCenter011247122548112PercentCounterexamples10000000000013141516171819201301315977411932048629123867317955823065000000012164911112260862668583124011768600000001.2106Table5.3Analysisofthecounterexamplestotheconjecturethatthemaximumdistance-impactedgejoinscomponentsinTcorisincidenttocwherecisthecentervertex.Thesetablesarenotttoprovethenumberofcounterexamplestothecon-jecturesarevansishinglysmall,butthisappearstobethecase.Thisisespeciallyimportantbecausethecentercanbequicklydetermined.Thiscouldimplythatthepermissiblesetmustbeoftheformdescribedin5.2.0.10withonlyexceptionallyrarecasesfallingintotheclassofcounterexamples.5.2.3Non-LeafDistance-ImpactEdgesinTreesTheprevioussectionhighlightstheunderlyingslipperynatureofmaximumdistance-impactedges.Sp,theyshowhowpathlengthanddegreecanbemisleadingwhentryingtothemaximumdistance-impactedge.Inmanytreesthemaximumdistanceedgeisontheperipheralverticesorincidenttothecenter,butunfortunatelythisisnotthecaseingeneral.Theseexamplesseemtoindicatethatamoresophisticatedconjectureisneededthatwilltakeintoaccountpathlengthanddegree.40Inthissectionaproofispresentedtoreducethesearchspaceformaximumdistance-impactedgesintrees.Suchareductionisusefulinspeedingcomputationalevaluationsofdistance-impactgraphsandbeginstomovetowardsamethodforthemaximumdistance-impactedgeinthegeneralcase.Lemma5.2.1.Themaximumdistanceimpactedgeinstarsmustbeincidenttotwoleaves.Proof.Notethatallverticesareleavesexceptforthesinglevertexthatisconnectedtoallleaves.Thiscentralvertexisofmaximumdegree,thusanynewedgemustbebetweentwoleavesinthesimplecase.Becausethisgraphisdiametertwo,anyaddededgehasedgeimpactofexactlyone.Pathgraphswillbeconsideredtodevelopthecentralpropositionforprovingthatthemaximumdistance-impactedgemustnotbebetweentwoleavesinallbutvanishinglyrarecases.Considerthepathgraph,Pnandthefollowingclaim:Claim5.2.2.Themaximumdistance-impactedgeinPnfornPZisnotincidenttothetwoleaves.Thisclaimneedstassomesmallcasesdonothold.TheyareconsideredexhaustivelyandClaim5.2.2willbeintoLemma5.2.7.Thesesmallcasesencompass0¤n¤6.Whenn0,P0istheemptygraphandnoedgesmaybeadded.Whenn1orn2thegraphsarethecompletegraphsK1andK2,respectively.Again,thesegraphsdonotallowadditionaledgesinthesimplecase.Forn3thegraphPnS1;2andthusthemaximumdistance-impactedgesmustbeincidenttotwoleaves.Thenon-degeneratecaseisP4(Figure5.7).Therearetwonon-isomorphicedgesthatbothhaveequaldistance-impactof2asshowninTable5.4.Oneoftheseedgesisincidenttothetwoleavesandtheotherisnot.Thus,themaximumdistanceimpactedgecanbechosensoasnottobeincidenttotwoleaves.However,anedgebetweenthetwoleavesisalsopossible.4121Figure5.7P4withallnon-isomorphicpossibleedgeadditionsasdashededgesEdgeAdditionDistance-Impact1222Table5.4Thedistance-impactofallnon-isomorphicedgeadditionsinP4ThecaseofP5isatrueexception.Consideringthenon-isomorphicedgeadditionsinFigure5.8,thedistance-impactisgiveninTable5.5.Themaximumdistance-impactedgeisincidenttotheleaves,andeveryotherpossibleedgeadditionyieldslowerdistance-impactvalues.2134Figure5.8P5withallnon-isomorphicpossibleedgeadditionsasdashededgesEdgeAdditionDistance-Impact15243244Table5.5Thedistance-impactofallnon-isomorphicedgeadditionsinP542TheedgeadditionstoP6(Figure5.9)donotviolateClaim5.2.2asshownexhaus-tivelyinTable5.6.However,notethatthemaximumdistance-impactedgeisincidenttooneleaf.Forn¡6,itwillbeshownthatthemaximumdistance-impactedgeisnotincidenttoeitherleafinpathstoproveLemma5.2.7.432165Figure5.9P6withallnon-isomorphicpossibleedgeadditionsasdashededgesToproveamoversionofClaim5.2.2,thetotaldistanceinCnfornoddorevenisshowncombinatorially.AgeneralgraphofCnisshownontheleftofFigure5.10Thedistancevaluescomputedherewillbeusedinalaterproof.::::::::::::::::::::::::::::::::::::::::::Figure5.10Cn(left)andDpCn2(right),eachoforderandsizenLemma5.2.3.ThetotaldistanceinCn,wherenisodd,isnn12¸i12in3n4Proof.Duetosymmetry,onlyonevertexisconsidered,v.Themaximumshortest-pathdistancetoanyvertexinanoddcycleisn12.Thereareexactlytwoverticesofthisdistance43fromv.Further,thedistancestotheverticesonthesepathscanbefoundasthesummationof°n12i1i.Becausetherearetwosuchpaths,thevertexvhasdistance,fromeveryothervertex,of:n12¸i1in12¸i1in12¸i12in2n4Therearensuchverticessothetotaldistanceinanoddcycleis:nn12¸i12iLemma5.2.4.ThetotaldistanceinCn,whereniseven,isnn2¸i12in2˝˛n34Proof.Duetosymmetry,onlyonevertexisconsidered,v.Themaximumshortest-pathdistancetoanyvertexinanoddcycleisn2.Thereisexactlyonevertexofthisdistancefromv.Further,thedistancestotheverticesonthesepathscanbefoundasthesummationof°n2i1i.Becausetherearetwosuchpaths,thevertexvhasdistance,fromeveryothervertex,of:n12¸i1in12¸i1in12¸i12iHowever,thisdoublecountsonlythesinglevertexofdistancen2fromv.Thisiscorrectedbythesubtractionofn2distance.Thus,consideringthensuchverticesinCnthetotaldistanceinanevencycleis:nn2¸i12in2˝˛44Oneauxiliarygraphisneeded.GivenacycleCnwheren¡5,theendpointsofoneedgearemovedsuchthattwopendantsofdistance3fromeachotherarecreated.ThisformsthegraphontherightinFigure5.10dubbedthedoubledistance-three-pendantcycledenotedDpCn2.Thetotaldistanceforoddandevennaregiveninthefollowinglemmas:Lemma5.2.5.ThetotaldistanceinDpCn2,forn¡5odd,is:pn2qn32¸i12i4n12¸i12i1˝˛6andisequivalentto:n32n211n24Proof.Thisproofwillprocesscombinatoriallybyassessingthedistancebetweentwosub-graphsofDpCn2.ThetwosubgraphsofDpCn2arethecyclepartandthependants.Theyaredenotedascandp,respectively,inFigure5.11.Tothetotaldistanceinthisgraphittothetotal-distanceinthecyclepart,thetotaldistanceinthependantpart,andthetotaldistancebetweenthesesubgraphs.Notethat,althoughthepathbetweenthependantsliesonthecycle,itisnotcountedtwiceinthecalculationsofthetotaldistanceofthecycle.::::::::::::::::::ppccccccFigure5.11TheregionsofDpCn245Thethreetotaldistancecalculationsareasfollows:1.Pendant-PendantTotalDistance:Theoftheconstructiongivesthatthependantsareofdistance3fromeachother.Thus,thetotaldistanceforthissectionis336exactly.2.Cycle-CycleTotalDistance:Thecycleisoflengthn2withtwopendants.Be-causenisodd,soisn2.ThetotaldistanceforoddcycleswasgiveninLemma5.2.3.Substitutinginn2,thetotaldistanceis:pn2qn32¸i12in36n211n643.Pendant-CycleTotalDistance:Thepathsinthissetareallofthepathstotheverticesonthecycleplusthesingleedgeadditionofthependant.Withthisextraedgethecalculationofdistanceisalmostidenticaltothatofacyclewithanextracorrectionterm.Fromthependant,thehalf-waypointonthecycleisreachedafter1n32n12edges(reachthecycleandthentraverseit).Becausen2isodd(andthependantisoneedgefromathecycle)thereareexactlytwoverticesofthisdistance,however,thisformulationdoublecountsthepathfromthependanttothevertexofthecycle.Thissumisthus:n12¸i12i1n25Becausetotaldistanceisbeingcalculated,thisvaluemustbedoubledtocountallpendant-to-cycleandcycle-to-pendantpaths.Thus,foreachpendantthereisatotaldistanceof:2n12¸i12i1˝˛46Finally,becausetherearetwopendantsofthesameform,thisexpressionmustbedoubledagaintogive:4n12¸i12i1˝˛n24ThesumofallthreepartsgivesthetotaldistanceforDpCn2,nodd,as:pn2qn32¸i12i4n12¸i12i1˝˛6Lemma5.2.6.ThetotaldistanceinDpCn2,forn¡5even,is:pn2qn22¸i12in22˝˛4n2¸i12i1n2˝˛6andisequivalentto:n32n212n4Proof.Thisproofwillprocesscombinatoriallybyassessingthedistancebetweentwosub-graphsofDpCn2.ThetwosubgraphsofDpCn2arethecyclepartandthependants.Theyaredenotedascandp,respectively,inFigure5.11.Tothetotaldistanceinthisgraphittothetotal-distanceinthecyclepart,thetotaldistanceinthependantpart,andthetotaldistancebetweenthesesubgraphs.Notethat,althoughthepathbetweenthependantsliesonthecycle,itisnotcountedtwiceinthecalculationsofthetotaldistanceofthecycle.Thethreetotaldistancecalculationsareasfollows:471.Pendant-PendantTotalDistance:Theoftheconstructiongivesthatthependantsareofdistance3fromeachother.Thus,thetotaldistanceforthissectionis336exactly.2.Cycle-CycleTotalDistance:Thecycleisoflengthn2withtwopendants.Becauseniseven,soisn2.ThetotaldistanceforevencycleswasgiveninLemma5.2.4.Substitutinginn2,thetotaldistanceis:pn2qn22¸i12in22˝˛n36n212n843.Pendant-CycleTotalDistance:Thepathsinthissetareallofthepathstotheverticesonthecycleplusthesingleedgeadditionofthependant.Withthisextraedgecalculationofdistanceisalmostidenticaltothatofacyclewithanextracorrectionterm.Fromthependant,thehalf-waypointonthecycleisreachedafter1n22n2edges(reachthecycleandthentraverseit).Becausen2iseven(andthependantisoneedgefromathecycle)thereisexactlyonevertexofthisdistance.Thus,anextran2pathisdoublecounted.Thepathfromthependanttothevertexofthecycleisalsodoublecountedasthereisonlyonevertexofdistanceonefromthependant.Tocorrectthisovercounting,acorrectiontermof1n2issubtracted.Thedistancefromthependanttoallofthoseinthecycleisthus:n2¸i12i1n2Thisexpressionisquadrupled(doubledtoalsocountthecycle-pendantdistances,anddoubledagainforthesecondpendant'sdistances).Thisgivesatotaldistanceinthissubgraphas:484n2¸i12i1n2˝˛n25ThesumofallthreepartsgivesthetotaldistanceforDpCn2,neven,as:pn2qn22¸i12in22˝˛4n2¸i12i1n2˝˛6EdgeAdditionDistance-Impact182936445866Table5.6Thedistance-impactofallnon-isomorphicedgeadditionsinP6Proposition5.2.7.Themaximumdistance-impactedgeinPnforn¥6isnotincidenttobothleaves.Forn¡6,themaximum-distance-impactedgeisnotincidenttoeitherleaf.Proof.Thelemmaisprovenusingthepreviouscombinatorialresults,andwillbeconsideredinevenandoddcases.Considerthatn¡6.1.OddCase:Considerann-path,Pn.Themaximumdistance-impactedgeisnotbetweentwoleavesifthetotal-distanceinCnisgreaterthanthetotaldistanceinDpCn2.Considerthefollowingifandonlyifequivalencesforn¡6.Iftheequivalenceshold,thentheoddcaseisproven.OntheleftisthetotaldistanceinCn(odd)from49Lemma5.2.3andontherightisthetotaldistanceofDpCn2(odd)fromLemma5.2.5.nn12¸i12in3n4¡pn2qn32¸i12i4n12¸i12i1˝˛6n3n4¡n36n211n64n256n3n¡n36n211n64n24n3n¡n32n211n2n26n1¡0Theinequalityholdsforn¡32?25:828,thusthewholesetofequivalencesaretrueforthegivenboundsofn¡6.Thenegativecaseisnotconsideredbecausenisagraph'sorder.2.EvenCase:Considerann-pathPn.Themaximumdistance-impactedgeisnotbetweentwoleavesifthetotal-distanceinCnisgreaterthanthetotaldistanceinDpCn2.Considerthefollowingifandonlyifequivalencesforn¡6.Iftheequivalenceshold,thentheevencaseisproven.OntheleftisthetotaldistanceinCn(even)fromLemma5.2.4andontherightisthetotaldistanceofDpCn2(even)fromLemma5.2.6.Thenotationislong,sonotethatthetotaldistanceinanevencycleis:nn2¸i12in2˝˛n34andthetotaldistanceintheevendoubledistance-threependantcycleis:pn2qn22¸i12in22˝˛4n2¸i12i1n2˝˛650andisequivalentto:n32n212n4Combiningtheseintoaseriesofinequalities,thefollowingifandonlyifinequalitiesareevaluated:n34¡n32n212n4n3¡n32n212n0¡2n212nn26n¡0npn6q¡0Theinequalityholdsforn¡6,thusthewholesetofequivalencesaretrueforthegivenboundsofn¡6.Thenegativecaseisnotconsideredbecausenisagraph'sorder.Bothcasesarenowevaluated.Withtheevenandoddcases,itisshownthatthemaximumdistance-impactedgeisnotincidenttoeitherleafinPnwithn¡6.AlongwithFigure5.9andTable5.6,whenn6themaximumdistance-impactedgeisnotincidenttobothleaves,thepropositionisproven.Thereisoneadditionaltreetonotebeforeproceedingwiththetheorem.ThegraphinFigure5.12hasnoedgethathashigherdistance-impactthanoneincidenttotheleavesasshowninTable5.7.ThisgraphwillbedenotedTbadforreasonsthatwillbecomeapparentinthestatementofTheorem5.2.8.51213456Figure5.12Tbadwithallnon-isomorphicpossibleedgeadditionsasdashededgesEdgeAdditionDistance-Impact152434445262Table5.7Thedistance-impactofallnon-isomorphicedgeadditionsinTbadWiththedevelopmentofProposition5.2.7andthelemmasofthischapter,theprincipletheoremcanbeproven.Theorem5.2.8.Themaximumdistance-impactedgeisnotincidenttotwoleavesinalltreesexceptstars,Pn,andTbad(giveninFigure5.12).Proof.TheproofisgreatlywiththeinclusionofProposition5.2.7,namelyCnhaslargertotaldistancethanDpCn2.Consideratreethatisnotastar,T.Thistreemusthavetwoleaves,butitmayhavemanymore.Callanytwoleavesl1andl2wherel1;l2PtEpGqq|l1˘l2;degpl1qdegpl2q1u.Anedgenotincidenttotwoleavesthathashigherdistance-impactwillbedeterminedforeverypairofleaves.ThecaseswillbepartitionedbasedthevalueofdTpl1;l2q.ForeaseofnotationthesubscriptgraphonthedistancefunctionissuppressedwhenthegraphissimplyT.521.dpl1;l2q2:Becausethedistancebetweentheseleavesis2,thedistancebetweenthem,upontheadditionofei1i2,isreducedtoone.However,becausetheyarebothleaves,theirneighborhoodconsistsofthesamevertex,v.Thisimpliesthatnootherpaththroughthetreewillusethenewedge.Supposeitdid.Thenfortwovertices,xandy,tousethenewedgethepathPpx;yqmustbePpx;vqevl1el1l2el2vPpv;yq.ThisistriviallyshortenedasPpx;vqPpv;yq.Becausethetreeisnotastar,movetheaddededgetoanypairofnon-leafverticesthatarenotadjacent.Theirnewdistanceis1.Thisisareductionofatleast1asthoseverticeswerenotpreviouslyconnected.Intheworstcase,thetotaldistanceisthesame,buttheaddededgeisnotincidenttotwoleaves.2.dpl1;l2q3:Whenthedistancebetweentheselectedleavesis3,theyareconnectedbyacopyofP4.IfTisisomorphictoP4then,asinFigure5.7,thereisamaximumdistance-impactedgenotincidenttotwoleaves.Inthegeneraltree,becausetherearenoloops,anyverticesnotcontainedintheP4formsubgraphsthatonlyconnecttothecutverticesinsideofthepath.ConsiderthisstructureinFigure5.13wherethedashededgeisthepotentialnon-double-leaf-adjacentcandidateedge.Thetotalordersoftheverticesconnectedtoeachpathvertexare|A|and|B|.Note,theinducedsubgraphsGrAsandGrBsarenotnecessarilyconnected.l1vwl2ABFigure5.13Proposeddistance-impactedgewhenselectedleavesareofdistance3IfbothAandBareemptythegraphisP4.Inthiscase,thedashededgeisofthesamedistance-impactastheleaf-leafedgeandisonlyincidenttooneleaf.Consider53thenthatatleastoneofAandBisnon-empty.Withoutlossofgenerality,Acontainsatleastonevertex.Bymovingtheedgefroml1tothepathvertexv,thedistancefroml1tol2increasedby1.However,previouslythedistancefroml2toanyvertexiPAwas|Ppi;vqevl1el1l2||Ppi;vq2|,butthiswasreducedto|Ppi;vqevl2||Ppi;vq1|foreveryvertexinA.Thisisareductionofatleast1.ThechangeoftheedgedoesnotanydistancesonP4.NopathsfromAtoBcanusetheaddededgeasitwouldincreasethedistancebetweenthesubgraphsbyone.Therefore,thereisalwaysanedgeofhigherimpactthanoneincidenttotwoleavesofdistance3.3.dpl1;l2q4:Inthiscasetheunderlyingpathconnectedl1andl2isP5.FromFig-ure5.8thisgraphisknowntohaveauniquemaximumdistance-impactedgebetweenitsleaves.Considertheunderlyingpathastl1;u;v;w;l2u,andcandidateedgesiandj.Figure5.14detailsthegraphandproposededges.Thereareotheredgesthatcouldbeconsidered,butthesearetfortheproof.l1uvwl2ABC:::ijkFigure5.14Proposeddistance-impactedgewhenselectedleavesareofdistance4Notethatedgeiisincidenttotheleavesandshouldbeavoidedifpossible.UsingFigure5.14,thefollowingthreeequationsrepresentthetotaldistancesavingsofeachedgeadditionandarederivedfromcombinatorialmethods.Further,|A|a,|B|b,and|C|c.BecauseallverticesinthesetsA;B;andCconnecttotheP5pathviaacutvertex,anyreductioninpath(oflength)betweenthesetsandavertexonthe54reducedpathimprovesthetotaldistancebyforallpossiblepaths.Withoutlossofgenerality,considertheectedsettobeA.Thenthedistancedropis|A|fromthesettothevertex.Thisis2|A|intotaldistance.(a)Distance-Impactofi:Theleaveswereinitiallyofdistance4,thishasbeenreducedto1viatheedgeel1l2.Theedgeel1l2alsoreducesdpl1;wqanddpl2;uqfrom3to2.Finally,thepathPpl1;wqallowsaccesstoCbyapaththatis1shorter.Thisisareductionof2|A|.Bysymmetry,withPpl2;uqthereisalsoareductionin2|C|.Consideringallotherpathsinthegraph,nootheronescanusetheadditionaledge.Thedistance-impactofedgeiis2|A|2|C|10.(b)Distance-Impactofj:Theleaveswereinitiallyofdistance4,thishasbeenreducedto3viatheedgeeuwasavingsof1foreach.Similarly,thepathsPpu;l2qandPpl1;wqarealsoreducedby1eachforatotalof4.ThepathbetweensetsAandCarereducedto1asPpu;wqisnowpresent.Thisyields2|A||C|asallverticesinAhaveareducedpathtoCandviceversa.Further,thisreducesby2moreforthePpu;wqpath.Theverticesl1anduarenowoneclosertoCviaw.Thisisadistance-impactof2|C|2|C|4|C|.Bysymmetrythesamereductionhappensforl2andwtoA.Consideringallotherpathsinthegraph,nootheronescanusetheadditionaledge.Thetotaldistance-impactofjis2|A||C|4|A|4|C|8.(c)Distance-Impactofk:TheadditionofkistheonlycasewherepathstoBarereduced.Thedistancefroml1tol2isreducedby1saving2.Also,kev;l2sotheirdistanceisreducedby1savinganadditional2.Thedistancefroml2tovanduarereducedby1sothesetsAandBarealsocloser.Thisreducesthetotaldistanceby2|A|2|B|.Consideringallotherpathsinthegraph,nootheronescanusetheadditionaledge.Thedistance-impactofedgeiis2|A|2|B|6.55FromthesecombinationalargumentsthreefunctionsonthesizeofthepartitionsA,B,andCwerederived.Thesefunctionsare:fipa;b;cq2a2c10gjpa;b;cq2ac4a4c8hkpa;b;cq2a2b6Herethefunctionfipa;b;cqcomputesthedistance-impactoftheadditionofedgeitoatreewhereatleasttwoleavesareofdistance4(similarforgjandhk).Ittoallpa;b;cqsuchthatforthattripleoneofthefollowingistrue:gjpa;b;cq¥fipa;b;cqorhkpa;b;cq¥fipa;b;cqbecausethisimpliesthedistance-impactofedgesnotincidenttotwoleavesareofhigherimpact.TherearetwotripleswherethisdoesnotholdcorrespondingtoP5andTbad.Recallthata;b;andcaresimplythenumberofverticesinthesetsA;B;andC.ThenthecaseofP5occurswhenthetripleisp0;0;0q.Thisgives:gjp0;0;0q§fip0;0;0q20040408§2020108§10and56hkp0;0;0q§fip0;0;0q202062020106§10SimilarlyforTbad,considerthetriplep0;1;0q.Thereisonlyonegraphofthisform.Theinequalitiesagainfail.gjp0;1;0q§fip0;1;0q20040408§2020108§10andhkp0;1;0q§fip0;1;0q202162020108§10Thisthecounterexamplesandalsoallowsforproofthatthesearetheonlytwotreesthatforcetheadditionalofaleaf-leafedge.Toshowthattherearetheonlysuchcounterexamplesconsiderfipa;b;cq2a2c10andgjpa;b;cq2ac4a4c8.Becausetheseentriescorrespondto57thesizeofsetsintermsofvertices,allofa;b;cPZ0;.Thenconsiderthefollowingifandonlyifconditionals:fipa;b;cq¤gjpa;b;cq2a2c10¤2ac4a4c82a2c10¤4a4c82¤2a2cThefollowingholdswhenevera¡0orc¡0.Notethat2accouldbesafelydroppedas2ac¥0.Regardlessofthevalueofb,ifa¡0orc¡0addingedgejwillalwayshaveaatleastthedistance-impactofaddingi.ThissinglecounterexampleP5istheonlyonethatholdsinthiscase.Finally,considerfipa;b;cq2a2c10andhkpa;b;cq2a2b6.Thefollowingifandonlyifconditionalsgive:fipa;b;cq¤hkpa;b;cq2a2c10¤2a2b62c10¤2b62c4¤2bc2¤bComparingtheinequalityoffiandgj,itisknownthatifc¡0addingedgejist.Thisimpliesthattheonlycaseofconcerniswhen2¤b.Therearetwo58suchcaseswherea0;b¤2;c0.Theisaknowncounterexample,Tbad,fromp0;1;0q.Allthatmustbeconsideredisp0;2;0q.However,inthiscase,equalityholdsbecauseofthefollowing:fip0;2;0q2020101020226hkp0;2;0qThus,forthistriplethedistance-impactofiandkisequal,andkischosentoavoidaleaf-leafedge.Therefore,withthestatedexceptionsofP5andTbad,ifthedpl1;l2q4thereisanedgeofhigherdistance-impactthanei1i2.4.dpl1;l2q¥5:Inthiscasetheunderlyingpathconnectingl1andl2inTisP6.ThisallowstheuseofProposition5.2.7.Considertheunderlyingpathastl1;u;v;w;x;l2u.ItisknowfromProposition5.2.7thattheaddingtheedgeeuvyieldsalargerreductionoftotaldistancethanel1l2.Figure5.15detailsthegraphandproposededge.l1uvwxl2ABCD:::Figure5.15Proposeddistance-impactedgewhenselectedleavesareofdistanceatleast5ThedistancefromtheleavestothesubsetsA,B,C,andD,hasnotchanged.Thisimpliestherearenodistancechangesto,orfrom,theleavesandthesubsets.Further,if|A|or|D|¡1thenthetotaldistancereducesasthepreviouspathforiPAandjPDwasreducedfrom|Ppiuqeul1el1l2el2xPpx;jq|to|PpiuqeuxPpx;jq|.ThisisareductionoftwoforallverticesinAandB.Ifdpl1;l2q¡5,thenthere59arefurthertotaldistancereductionsamongtheothersubsets,butinanycasetheproposededgeisofgreaterdistance-impactthanaleaf-leafedgewheredpl1;l2q¥5.Thesecaseshaveshown,forallbutthedegeneratetrees(S1;n1;P5;andTbad),thatanyedgethatisadjacenttotwoleavesisnottheuniquemaximumdistance-impactedgewheneverdpl1;l2q¥2.Thiscoveredallpossiblecasesfortheleavesandthusthetheoremisproven.Randomtreesarecomposedofapproximately30%leaves.BecauseofthisfactandTheorem5.2.8,wheningthemaximizedistance-impactedgethereisnoneedtosearchthen{32-manyleaf-leafpossibleedges.Becauseleavescanbequicklyfoundintrees,thisresultyieldsanappreciablecomputationaltimedecrease-eveniftheasymptoticbehaviorisunchanged.5.3ConclusionManyaspectsoftreestructurewereconsideredfortheP-impactedges.Evenasdeterminingthespanning-impactofedgesistrivial,thedistance-impactisnot.Severalnegativeresultsalongwiththeirrespectivecounterexampleswerepresentedalongwithaproofaboutthelackofmaximumdistance-impactedgesincidenttoleavesintrees.Findingthedistance-impactofanedge,apriori,inatreeisstillanopenquestion,butsettlingthisquestionintreesisausefulstartingpointtoexplorethesamequestioningeneralgraphs.Itisclearthatsimpleconsiderationofdegreeordistanceseparatelyisnotttodeterminethemaximumdistance-impactedge.Fromtheobservationsandresultsherethemaximumdistance-impactedgemaybedecidedintreesbasedonanunknownfunctionthatchoosesanedgebasedonthedegreeanddistanceofitsendpoints.Observationseemstoimplythatdistanceislessimportantthandegree,butthisclaimhasnotbeenproven.60Chapter6P-ImpactinNetworkCompletionTheP-impactprocesshasbeenshowntouseedgeadditiontomaintainoroptimizeagiveninvariant.Usingthedistance-relatedinvariants,maximumP-impactedgesreducedtheglobaldistancemeasuresingraphs.Thesespedgesseemtohavesomesalientpropertiesregardingthegraphstructure.Toexplorethisnotion,asettingisconsideredwhereapartiallyunobservedgraphistaken.Thefeasibilityofidentifyingmissingedgesthatmostimprovetheaccuracyofstandardnetworkcompletionalgorithmswhenedgesareaddedtothegraphisexamined.ThischapterservesasabridgebetweentheP-impactprocessandthematrixcompletionworkofthelaterchapters,anditsresultsonsmallrandomgraphsshouldbeviewedasmotivationtoexplorematrixcompletionproblemsforgraph-modeledapplications.ThesematrixcompletionproblemswilltaketheformoflinkpredictionandnetworkcompletioninChapter7andrecommendersystemsinChapter8.6.1ExperimentationTheP-impactquestionwasdesignedtodeterminethemostrelevantedgestoaninvariantvalueinagraph.Insteadoftheoptimaledgesetstoaddtographs,theseexperimentsmeasuretheerrorthatoccurswhenedgesareobfuscatedandnetworkcompletionisperformed.Thegoalistodeterminewhetherthereisaninherenthierarchy61ofpredicativepoweramongtheedgesofvariousgraphclasses.Theclassesusedwere:Erd}os-Renyirandomgraphs,randomtrees,andrandompowerlaw(scale-free)graphs.6.1.1NetworkCompletionThenetworkcompletionprocesswasachievedbyusingmatrixfactorizationwithregularization.ConsideragraphGwithadjacencymatrixA.ThismethodtriestolearnthelatentfeaturesoftheadjacencymatrixofthegraphintheformoftwomatriceswheretheirproductapproximatesA.ThisprocessisdetailedmuchfurtherinChapter7,buttheresultswerefoundbysolvingthefollowingoptimization:minU;V}AUVT}}U}2F}W}2F6.1.2EvaluationMetricsTheperformanceofthetalgorithmsweremeasuredbythediscrep-ancybetweentheoriginalandcompletedmatrices.ThewidelyadoptedMeanAbsoluteError(MAE)andRootMeanSquaredError(RMSE)metrics[41]wereusedforevaluation.LetAandpAdenotethefullandestimatedadjacencymatrices,respectively.LetTbethesetofunobservedtestlinks.Then,MAE°pi;jqPT|AijpAij||T|;TheRMSEmetricisas:RMSEgffe°pi;jqPTAijpAij2|T|:626.1.3TheProcessFiveorder25randomgraphsofeachclassweregenerated.Fromeachofthese,nineothergraphswerecreated.Ineachoftheseninegraphs,between10-90%(stepsizeof10%)oftheentriesintheadjacencymatrixcorrespondingtopotentialedgeswereeliminated.Notethatfornon-directedgraphstheadjacencymatrixissymmetric,soinpracticeonlytheupper-triangleadjacencymatrixwasconsidered.Fromeachoftheseadjacencymatri-cestwoprocessesoccurred.First,theadjacencymatrixwascompletedas-isviamatrixfactorization.Second,oneentryatatimewasrevealedandreplacedwithitstruevalue(0or1).Atthispoint,theadjacencymatrixwascompleted(againviamatrixfactorization).Inthecase,thebasalerrorrateswerecalculated(i.e.errorwithoutrevealinganyedges).Inthesecondcase,eachtimeanentrywasrevealedtheerrorofcompletionwasreportedasininstance.Fortheeaseoflanguage,intheoriginalgraphbothabsentandpresentedgeswillbereferredtoas`edges'.Thebeing,absent`edges'willhaveatruevalueof0intheadjacencymatrixwhilepresentedgeswillhaveavalueof1.6.1.4PredictiveEdgesonErd}os-RenyiRandomGraphsErd}os-Renyirandomgraphsareoneofthemostcommonrandomgraphmodels.ThecharacterizationconsideredhereinwasGpn;pq.Thisprobabilisticmodelcreatesagraphofordernwhereeachpossibleedgeexistswithprobabilityp.Thesizeofsuchgraphsisexpectedtoben2p.Thevalueofnwasas25andpwasallowedtovaryfrom0.1to0.9withstepsizeof0.1.AllresultsinthissectionareforGp25;0:5q,whiletheresultsfortheothervaluesofparecontainedinAppendixA.TheresultsforRMSEarereportedinFigure6.1whiletheMAEvaluesareinFigure6.2.Theredbarineachoftherunsisthebasalcompletionerror.Thebasalcompletionerror,alongwiththepercentofaddededgesthatresultinpoorernetworkcompletionthanthebasalrate,aregiveninTable6.1.63Figure6.1RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:5qFigure6.2MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:5q64EdgesRemoved(%)BasalRMSEWorseEdges(%)BasalMAEWorseEdges(%)101.239539.661.096245.66201.2360441.096150.66301.241941.331.085037.33401.204823.661.047419.33501.354794.331.218895.33601.2280311.067025.66701.240035.331.081930.0801.367995.331.231795.66901.152760.98865.333Table6.1ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-RenyirandomgraphsoftheformGp25;0:5q6.1.5PredictiveEdgesonRandomTreesTreesareminimallyconnectedgraphsandtheyalsoformsparseadjacencymatrices.Becauseofthisminimalstructure,treeswerethefocusoftheproofsandexplorationinChapter5.Treesarefurtherexploredhereforthethepredictivepowerofrevealingtheiredges.TheresultsforRMSEarereportedinFigure6.3whiletheMAEvaluesareinFigure6.4.Theredbarineachoftherunsisthebasalcompletionerror.Thebasalcompletionerror,alongwiththepercentofaddededgesthatresultinpoorernetworkcompletionthanthebasalrate,aregiveninTable6.2.65Figure6.3RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomtreesgraphsFigure6.4MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomtrees66EdgesRemoved(%)BasalRMSEWorseEdges(%)BasalMAEWorseEdges(%)101.510526.661.451524.0201.45645.6661.40946.333301.508325.661.448823.0401.5159321.468233.33501.618681.661.558279.0601.524936.331.461932.33701.38730.671.32920.666801.42092.01.566269.0901.599970.661.544269.33Table6.2Thebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforrandomtrees6.1.6PredictiveEdgesonRandomPowerLawGraphsPowerlawgraphsareacommoncolloquialismforscale-freenetworkswherethedegreesequencefollowsapowerlawdistribution.Thisnamingconventionalignswiththelaterchapters.Manydegreesequencesinsocialmedianetworkfollowessentiallypowerlawdistributions[4].Becauseoftherealworldutilityofpowerlawgraphs,thepredictivepowerismeasuredhere.TheresultsforRMSEarereportedinFigure6.5whiletheMAEvaluesareinFigure6.6.Theredbarineachoftherunsisthebasalcompletionerror.Thebasalcompletionerror,alongwiththepercentofaddededgesthatresultinpoorernetworkcompletionthanthebasalrate,aregiveninTable6.3.67Figure6.5RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandompowerlawgraphsFigure6.6MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandompowerlawgraphs68EdgesRemoved(%)BasalRMSEWorseEdges(%)BasalMAEWorseEdges(%)101.4787431.393337.66201.39798.6661.30687.333301.4077131.326312.0401.528771.661.461376.0501.4853371.396732.33601.4819391.406739.33701.492050.661.411948.66801.701399.661.626799.66901.525663.661.447162.0Table6.3Thebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforrandompowerlawgraphs6.2PowerLawMotivationThisexperimentationwasmotivatedbyexaminingbehavioronpowerlaw(scale-free)graphs.Becauseoftheirrealworldutility,anothertestwasundertaken.Consideringerandompowerlawgraphsoforder15,eachoftheedgeswereremovedinturn,andthecompletionerrorwasmeasured.The105152edgeswerethensortedfromlow-to-highresultanterror.TheresultsarepresentedinFigure6.7.Intuitionseemstoimplythattheentriesof`1'inthesparseadjacencymatrixwouldcarrymorepredictivepower.However,inFigure6.8theresultingerrorforonlythe`1'entriesisgivenshowingthattheseentriescansometimesbeomittedwithoutincreasingthecompletionerror.69Figure6.7RMSEcausedbyadjacencymatrixelementdeletionforerandompowerlawgraphsoforder15Figure6.8RMSEcausedbyadjacencymatrixelementdeletion(1'sonly)forerandompowerlawgraphsoforder15706.3ConclusionAllofthetgraphclassesproducedsimilarresultsintheedgeremovalex-periments.Further,thescopeoftestingwaslimited.However,asthischapterismeanttoserveasmotivationforfurtherexploration,therearestillsomeusefulobservationstobemade.Sp,thepredictivepowerofedgesingraphsseemstovaryinstandardwaysthatmaybeproveninthefuture.Inallcasesofedgedeletion,approximately5%oftheedgesproducedtlybetterresultsafternetworkcompletionwasperformed.Thisdidnotvarymuchovereachofthegraphclasses,norwasthereatbasedonpercentofobscurededges.Theshouldmotivatefutureworktotargetthose5%ofedgesviagraphtheoreticmethods.Itfurthershowedthatmanyedgesfallbelowthebasalerrorrate.Thisalsoprovidesmotivationtodetermineasetofedgestoavoidaddingtoagraph.Forthesecondpowerlawgraphexamplesundersingleedgedeletion,thereisalsoausefulcautiongivenregardingthepredictivepowerofindividualedges.Althoughtheedgescorrespondingtothevalueof1intheadjacencymatrixseemtoholdmorepredictivepowerthanthe0entries,thereisawidevariationamongthesetof1's.Spally,theyfellintowhatappearstobethreeclassesofpredictivepower.Whatismostinterestinghereisthattheseclasseswereconsistentoverthetestingandhadverysharpboundariesbetweenthem.Thisfurtherimpliesthatingraphstheremaybeasetofhighlypredictiveedgesthat,bytheirdiscovery,networkcompletiontechniqueswouldperformwithmuchhigheraccuracy.Itisanopenproblemtodiscoverthedistributionofthepredictiveedgesingeneral.Regardingthebasalnetworkcompletionerrorrate,someinterestingobservationsaremade.Principally,thenumberofedgeswheretheiradditioncausesmoreerrorinthematrixfactorizationthanthebasalrateisnontrivial.Theselimitedexperimentsnotethatthebasalerrorthresholdneverislowerthantheerrorreductionofthebestedge71addition,butitvarieswildly.Inpracticaltermsthisimpliesthatsingleedgeadditioncan,inmanycases,actuallymakethenetworkcompletionresultsworsethanhadthatedgeremainedunobserved.Thismaybeduetotheedgesholdingunduepowerindetermininggraphstructuresinsparsergraphs,therebybiasingthematrixfactorization,howevermoreresearchtoexplorethismustbedone.PerhapsthisprocesscouldbeguidedviatheadditionofsideinformationinformtheP-impactedgesextra-adjacencydatasources,butthisisbeyondthescopeofthiswork.Thisendstheexplorationofedgeimpact,butbyexploringtherolethatindividualedgesplayingraphsandmatrixfactorizationtechniquesledtomorequestionsaboutnetworkcompletion.Chapters7and8extendcurrentmatrixfactorizationalgorithmsbyincorporatingsideinformation.Theyareinitiallyusedfornetworkcompletion,asbyChapters3through6,butarealsoextendedtoconsiderlinkpredictionandrecommendersystemproblems.72Chapter7NetworkCompletioninGraphsTheneedfortiveandtnetworkcompletionalgorithmsoccursinawidevarietyofsettingssuchaninformationretrieval,socialnetworkanalysis,andcomputationalbiology.Withthemassivegrowthofbigdataproblems,merecollectionofdataisgenerallysimplewhilecharacterizationofsaiddatapresentsmeaningfulchallenges.Theremaybenaturalstructureinthedata,butonlyasmallsamplingmaybefeasiblyobtained.Usingthissample,aquestionarises:canthissmallsampledeterminetherestofthenetwork'slinks?Becausecold-startissuesmaybepresent,itisimportanttoalsoconsidersideinfor-mationfortransductionofknowledgetothenetwork.Usingthissideinformationessentiallyidensimilarusers,andwillassistindeterminingthegraphstructuresthatshouldexistintheunobservednetworkedges.Usingsideinformationalsobthecalculationsasthenetworkinformationissimultaneouslyincorporated.Thus,ifeithersourcehasnoiseormissingentriestheothermaybeabletocompensate.Tocompletegraphsofinformationinnetworkedsystems,antalgorithmthattlydepartsfromtheexistingmethodsisproposed.Thisalgorithmusessideinformationandemployssharedsubspacelearning.Fromthismatrixcompletionwithtransductionisapplied.Thisalgorithmsfrompreviousattemptsasitdecouplesthecompletionandtransductionstages.Speinphaseone,allcold-startnodesare73initiallydiscardedandasub-networkiscompletedperfectlyundersomenon-restrictiveconditions.Inphasetwo,theavailablesideinformationisusedinthetransductionstagetocompletetheentirenetwork{includingtheunobservednodes.Althoughthismethodissimilartosubspacesharing,animportantdistinctiondoesexist.Namely,asubmatrixisperfectlycompletedbeforetransductionoccurs.Thisallowstheproposedalgorithmtoavoidunboundederrorpropagationthatcanoccurinsubspacesharingmethods.Thealgorithm,alongwiththeoreticalanalysisofitsrecoveryerrorfromBarjastehetal.[5],willbedescribed.Thereisalsoextensiveanalysisonrealworlddatasetsthatcomparedtheproposedalgorithmwithmanystate-of-the-artmethods.Thecontentsofthischapterwereadaptedfromtheauthor'scontributiontoajournalarticle1.7.1TheAssumptionsItisassumedthatthereisanundirectedunweightedgraphGpV;Eqwhere|V|naredistinguishablenodes.TheresultantgroundtruthadjacencymatrixisAPt0;1unn.Toapproximaterealworlduse,anassumptionismadethatthereisonlyapartialobservationofA,namelyOPt0;1;?umm;1¤m¤n.Notethatthissub-matrixispartiallyobservedandtherowsandcolumnsofindexnmtonrepresentcold-startusers.ThesetofusersinOcanbesaidtoinduceanedge-labeledsub-graphonGwhereedgesareexplicitlypresent,explicitlyabsent,orunknown.Keepingwithobservationsinrealnetworks,noassumptionismadeonthedistributionofthemissingentries.Manyclassicalmethodsrelyonsuchuniformrandomness.Thereis,however,anassumptionthattheentrieswithinOaresampleduniformallyatrandom.Ifthisisnotthecase,Omaybesubsampled,butwithabuseofnotationforsimplicitythesubsampledO1isstilldenotedO.Thereisafurtherassumptionthatthereissideinformationavailableabouteverynodeviaasocialnetworkgraph.Fromthis,featuresofthenodesareextractedandpairwise1\NetworkCompletionwithProvableGuaranteesbyLeveragingSideInformation".ImanBarjasteh,RanaForsati,DennisRoss,Abdol-HosseinEsfahanian,HayderRadha,FarzanMasrour.SpringerSocialNetworkAnalysisandMining(SNAM).2016.Submitted.74similaritybetwixeachoftheusersisfound.ThisinformationisstoredinthesimilaritymatrixSPRnn.Althoughithasnotyetbeenstated,thereisanassumptionthatthesimilaritymatrixandthenetworksstructurearecorrelated.ThisassumptionisfurtherastherowvectorsofAsharesomesubspacethatisspannedbytheleadingeigenvectorsofthesimilaritymatrixS.Ifsuchanassumptioncouldnotbemade,therewouldbenouseinincorporatingthesideinformationintothepredictions.Theextenttowhichthesesubspacesaresharedwillbeparameterizedlater.7.2TheAlgorithm7.2.1PreviousApproachAsimilarmethodtotheproposedalgorithmcaststheproblemasasharedsubspacelearningframework[79].Thismethodexploitsknowledgefromthesimilaritymatrixandtransfersthistomakestructuralpredictionsonthenetwork.ThisisdonebyjointmatrixfactorizationwhereacommonsubsetofbasisvectorsofAandSarelearned.Thesematricesarefactoredintothreesubspaces.Theissharedbetweentheadjacencyandsimilarity,whereastheothertwoaresptothematricesthemselves.Thisisformulatedasthefollowingoptimizationproblemwithastheregularizationparameterforthenormsofthesolutionmatrices:minU;V;W12}AUVJ}2F12}SUWJ}2F}U}2F}V}2F}W}2FThemainissuewiththesubspacesharingmethodisthatthecompletionofun-observedentriesfromtheadjacencymatrixaremadefromsampledobservedones,and75thetransductionofknowledgefromtheseentriestofullyunobservednodesiscarriedoutsimultaneously.Becauseofthis,completionandtransductionerrorsarepropagatedrepet-itivelyandinanuncontrolledwaythathinderstheenessofincorporatingsimilarityinformation.7.2.2OverviewDivergingfromtraditionalapproaches,theproposedalgorithmfullyrecoversthesubmatrixObeforeusingsimilarityinformationtocompleteadjacencymatrix.Thistech-niquedecouplesthematrixcompletionfromsideinformationtransduction.Morespecif-ically,thephasecompletesthepartiallyobservedsubmatrixOperfectlyduetotheassumptionsofthedistributionofknownentriesinO.PhasetwotransductsthelinksfromboththerecoveredsubmatrixOandthecompletesimilaritymatrixS.ThealgorithmisdescribedinFigure7.1.7.2.3AlgorithmDetailsThestepinthisalgorithmextractsrepresentativesubspacefromthesimilar-itymatrixS.Thishastheoftakinganorthogonalmatrix,UsPRns,fromthesimilaritymatrix.Herethecolumnspacesubsumesthecolumnspaceofadjacencyma-trix.Further,sischosensothatitislargerthanrankofadjacencymatrix.Toreducedimensionalityandincreasethesalienceofthesimilarityinformation,Ustakesonlythes-manylargesteigenvectorsofthesingularvaluedecomposition(spectralclustering)ofthesimilaritymatrix.Inthesecondstep,thepartiallyobservedsubmatrixOisfullyrecovered.Notehere,withthepresenceofcold-startusers,matrixcompletiontechniquesarenotapplicabletoA.However,becauseoftheassumptionsimposedonthedistributionofknownentriesinO,standardmatrixcompletiontechniquesapplyanditcanbefullyrecovered.Thiscompletionisdoneviaaconvexoptimizationalgorithm[14].pOdenotestheoptimalsolutiontothis761.Input:n:thenumberofnodesinnetwrokGpV;EqO:theadjacencymatrixofsubgraphwithmnodesS:thepartiallyobservedpairwisesimilaritymatrixs¥rankpAq:numberofeigenvectorsinsubspace[Extraction]2.ExtractUsfromSbyspectralclustering3.CompletethesubmatrixObysolvingtheconvexoptimizationproblemin(8.1)togetpO[Completion]4.SamplemrowsofUsPRnsuniformlyatrandomtocreatematrixpUsPRms5.SetppUJspUs:pUJspOpUspUJspUs:6.Output:pAUspUJs[Transduction]Figure7.1Algorithmfornetworkcompletionwithsideinformationviatheproposedal-gorithmfordecoupledcompletionandtransductionoptimizationproblem.IfisthesetofobservedlinksintheinducedsubmatrixO,thenthematrixmayberecoveredbysolving:minimize}X}s.t.XijOij;@pi;jqP:Thethird,andstepistransductionofknowledgefromthesideinformationandthecompletedOtoA.ThespinformationtobeusedistheextractedsubspaceUsandtheoptimalsubmatrixpO.Asboththesocialnetworkadjacencymatrix(ofrankr)andtherecoveredsubmatrixarelowrank,adecompositioncanbefoundasfollows,Ar¸i1aiaiJandpOr¸i1paipaJi77Toseethis,wecanconsideraiPt1;0unandpaiPt1;0umasthecorrespondingmembershipassignmentstotheithhiddencomponentsofthegraph.Wenotethatifthesimilarityma-trixissettobeequivalenttotheadjacencymatrix,thentheindicatorvectorsofconnectedcomponentsareexactlya1;a2;;ar.Toformalizethecorrelationofsimilarityinformationandtheadjacencymatrix,itisassumedthatbothmatricesshareacommonsubspace.Becauseofthis,eachvectoraiwithiPrrscanbeuniquelydecomposedintoparallelandorthogonalcomponentstothesharedsubspace.ThisisformalizedasaiakiaKiwhereakiisinthecolumnspanofUsandaKiisexactlyorthogonaltocolumnspanofUs.BecauseakibelongstocolumnspanofUs,itcanberewrittenasabasisofthesharedsubspaceandtheleft-singularcotssuchthat,forsomebiPRs:akiUsbi;i1;2;;r:Theadjacencymatrixcanberelatedtothesimilaritymatrixviaakiandthedecom-positionintoparallelandorthogonalcomponents.Ar¸i1aiaJir¸i1pakiaKiqpakiaKiqJr¸i1akiakJiakiaKJiaKiakJiaKiaKJir¸i1akiakJiaKiaKJiAAEr¸i1akiakJi78HereAisfullycapturedbythesimilarityinformation,butthematrixAEispureerrorassingularvectorsareorthogonaltothesubspacespannedbythesimilaritysubspace.Becauseofthis,AEdoesbfromthesideinformation.Theerrorcontributedbythismatrixintotherecoveryerrorofinferredadjacencymatrixisthusunavoidable,but,bydisregardingtheerrorterm,theadjacencymatrixcannowbewrittenas:Ar¸i1akiakiJAEakiakiJInanobservationoftheabovealgorithm,thekeytotherecoveryofthematrixAisintheestimationofthevectorsbi;i1;2;;r.GivensomereasonableconditionsonthenumberofsamplednodesavailableinG,thefullyrecoveredsubmatrixpOandextractedsubspaceUscanbeutilizedtoestimatethesebivectorsperfectly.ThelinearsystemakiUsbicannotbedirectlysolvedforthevaluesofbitoestimateai.ThisisbecauseUsandakiarenotaccessible.However,akicanbereplacedwithabasispUsandanaccessiblevectorpai.Togetherthesecanestimatethevaluesoftheakivectorsperfectly(withhighprobability).ThesubspacepUsisobtainedbytakingauniformlyrandompermutationofUswheretheselectedrowscorrespondtotherowsinO.Thenthecorrespondingordinaryleast-squaresregressionmaybesolvedforanestimationofpbi.pbiminbPRspaipUsb22pUJspUs:pUJspaiThelastequalityholdsbecausepUJspUs:pUJsistheoptimalsolutiontotheordinaryleast-squaresregressionproblemabove.Havingcomputedanestimateforeachbi,theadjacencymatrixcanbefullycomputedas:pAUsr¸i1pbipbJiUJs:797.3ExperimentalEvaluationTwobroadcategoriesofexperimentationaretakenwiththealgorithm.Usingwellknownsocialnetworkandsyntheticdatasets,thealgorithm'sperformanceagainststandardbaselineistestedforlinkpredictionproblems.Itisalsocomparedagainststate-of-theartalgorithmsfornetworkcompletion.Themainreasonforsplittingtheexperimentsintotwopartsistheuniquechallengesthatexistinnetworkcompletionduetothesparsenessintheinformationofthelinks,whichmakesitaproblem.7.3.1DatasetsFourrealworlddatasetswereusedfortheexperiments.Twowereusedforlinkpredictionandtheothertwowereusedfornetworkcompletion.ForlinkpredictionaproductratingsocialnetworksitecalledEpinionsandthepopularChinesemicrobloggingsiteTencentWeibowereused.Thenetworkcompletionexperimentswereperformedontwolargesocialnetworks:FacebookandGoogle+.1.Epinions:Epinionsisanexplicittrust/distrustsocialnetworkthatyieldsadirectednetwork.Becausethenumberofdistrustrelationsisverysmall,onlyexplicittrustrelationshipsbetweenusersareconsidered.Thestatisticsofthisdatasetaresumma-rizedinTable7.1.ThisdatasetisavailablethroughJiliangTangatArizonaStateUniversity2.Generally,theEpinionsdatasetisusedtopredictratingsgiventhetrustgraphofitsusersassideinformation.Hereitusedtopredicttheexistenceoflinksviaalinkpredictionprocessthattheproposedalgorithmfoundwhileusingtheratinginformationassideinformation.Toextractthefeatures,eachuserwasendowedwithacollectionoftheirratingsoftheitems.Theseratingscontainedscoresandtimestamps.Toextractmorefeatures,additionaliteminformation(e.g.names,categories,ratings)wasincludedwhentheuserswereinterestedinsuchitems.A2http://www.public.asu.edu/~jtang20/datasetcode/truststudy.htm80featureincludedanormalizedtripleforeachitemcategory.Thistriplegavethetotalnumberofreviewsinthatcategory,andthenumberofpositiveandnegativereviewspercategory(normalizedbythemaximumvalueofeachpresentinthedata).Again,pairwisesimilaritiesbetweenuserswerefoundviacosinesimilaritytogeneratethesimilaritymatrixofusers.2.TencentWeibo:TencentWeibo(Weibo)isaChinesemicrobloggingsitesimilartoTwitter.Thereisanetworkstructurewhereuserscan`follow'anothertocreatealink.Inadditiontothelargenetwork,thesideinformationisquiterich.Featuresfortheanonymizeduserswereextractedthatincludedpersonalinformation(age,gender,andinterests),socialinformation(numberofmessagessent,messagesreposted,usersfollowed),anditem(categoriesandkeywords).Itemsandusersaredistinguishedbytheirfeatures,butforallpracticalpurposesitemsmayactlikeusersinthesocialnetwork.Itemsaregenerallyunderstoodtobecorporateentitiesorcelebrities.Afollower-followeedirectedgraphisgiven,andtheproposedalgorithmisusedtodeterminetheexistenceofadirectededgethatrepresentswhetherornotagivenuserwillfollowanother.Onlyconsideredpositivelinks(whereauserfollowsanother)areused,andallnegativelinks(whereauserexplicitlychosesnottofollowanother)areignoredforprediction.ThedataisavailablefromtheKDDcup3.TheoriginalWeibosocialdirectedgraphhas1.4millionusersandover73millionlinks.Asmallsubsetofapproximately4,000userswithmanyobservedlinksamongthem,andareasonableamountofsideinformation,waschosen.Findsuchasetinthislargedataset,amodsearchthatpenalizedaddingnodesthatdidnotcontainlinksgoingbackintotheexplorednodeswasdeployed.Oncethisprocessterminated,other(possiblyisolated)nodesatrandomwereaddediftheyhadalargeamountofside-informationuntilthegraph'sorderthresholdwasmet.3http://www.kddcup2012.org/c/kddcup2012-track1/data81Theobservedaverageratioofdirectedlinkstonodesforrandomdirectedgraphsoforderapproximately4,000was0.46.WiththisBFSmethodthedirectedsubgraphof4,052verticesthatwasfoundhadatlyhigher-than-averageratioofdirectedlinkstonodesat0.98.Thestatisticsofdata-subsetsaresummarizedinTable7.1.Togeneratethesimilaritymatrixofusers,cosinesimilaritybetweenallpairsofuserswasused.3.Facebook:Facebookisoneofthemostpopularrealworldsocialnetworkdatasets.OnFacebook,peoplehavedirectedfriendshiprelationshipswitheachotherandeachuserhasacontaininginformationaboutthemselves.Foreachuser(anodeinthenetwork)personalfeatures(e.g.gender,jobtitle,age)areextractedfromtheirThenetworkofuserswasproducedbycombiningego-networksoftheFacebookdatasetavailableattheStanfordLargenetworkDatasetCollection4.ThestatisticsofdatasetsaresummarizedinTable7.1.Thepairwisesimilaritiesbe-tweenuserswerefoundviacosinesimilaritytogeneratethesimilaritymatrixofusers.4.Google+:Google+issimilartoFacebookintermsofrelationshipsbetweenusers.FeatureswerealsoextractedfromtheindividualuserinthesamewayasFacebook.Theego-networksofuserswerecombinedtogethertohaveanetworkofusersalongwiththeirfeatures.ThisdatasetisalsoavailableattheStanfordLargenetworkDatasetCollection5andthestatisticsareshowninTable7.1.Thepairwisesimilaritiesbetweenuserswerefoundviacosinesimilaritytogeneratethesimilaritymatrixofusers.4http://snap.stanford.edu/data5http://snap.stanford.edu/data82Dataset#ofNodes#ofLinks#ofNodeFeaturesWeibo4,0523,95716,786Epinions90,879365,42290,087Facebook4,089170,174175Google+250,46930,230,905690Table7.1StatisticsofWeibo,Epinions,Facebook,andGoogle+datasets7.3.2EvaluationMetricsTheperformanceofthetalgorithmswasmeasuredbythediscrepancybetweentheoriginalandcompletedmatrices.ThewidelyadoptedMeanAbsoluteError(MAE)andRootMeanSquaredError(RMSE)metrics[41]wereusedforevaluation.LetAandpAdenotethefullandestimatedadjacencymatrices,respectively.LetTbethesetofunobservedtestlinks.Then,MAE°pi;jqPT|AijpAij||T|;TheRMSEmetricisas:RMSEgffe°pi;jqPTAijpAij2|T|:7.3.3BaselineAlgorithmsToevaluatetheperformanceofourproposedalgorithm,avarietyofbaselineap-proacheswereconsidered.Thebaselinealgorithmsfallintotwocategories.Thesetincludedthosethatwereusedinthelinkpredictionexperiments,andthesecondsetin-cludedthosethatwereusedinthenetworkcompletionexperiments.837.3.3.1LinkPredictionBaselineAlgorithmsThebaselinealgorithmsarechosenfromawidevarietyoftypesoflinkpredictionalgorithmstohaveafaircomparisontotheproposedalgorithm.AllimplementationareavailableintheMyMediaLitePackage[29].1.BPRMF:MatrixfactorizationwithBayesianpersonalizedranking(BPR)fromim-plicitfeedbackproducesrankingsfrompairwiseThematrixfactoriza-tionmodelprovidesitempredictionoptimizedforBPR.2.BiPolarSlopeOne(BPSO):SlopeOneratingpredictionmethodsweightedbybipolarfrequency.3.MatrixFactorization(MF):Matrixfactorizationwherelearningisprovidedbystochasticgradientdescentfactoringtheobservedratingsintouseranditemfactormatrices.4.SlopeOne(SO)SlopeOneratingpredictionmethodwithfrequencyrating.5.User-ItemBaseline(UIB):Assignsanaverageratingvalue,andregularization,forbaselinepredictions.Usestheaverageratingvalue,plusaregularizeduseranditembiasforprediction.6.CoClustering(CC):Performssimultaneousclusteringonboththerowsandthecolumnsoftheratingmatrix.7.LatentFeaturelog-linearModel(LFM):Ratingpredictionmethodthatusesalog-linearmodelonthelatentfeaturesofthesystem.8.BiasedMatrixFactorization(BMF):Matrixfactorizationwithparametersforbiasesofusersanditems.Utilizeslearningprovidedbystochasticgradientdescent.849.SVDPlusPlus(SPP):Singularvaluedecompositionmatrixfactorizationmethodthatmakesuseofimplicitfeedback.Furtherconsideredwhatitemsanduserseachuserhasrated.10.SigmoidSVDPlusPlus(SSPP):Singularvaluedecompositionmatrixfactoriza-tionmethodthatmakesuserofimplicitfeedbackandutilizesasigmoidfunction.11.SigmoidItemAsymmetricFactorModel(SFM):Asymmetricfactormodelthatrepresentstheitemsbasedonhowtheusersratedthem.7.3.3.2NetworkCompletionBaselineAlgorithmsThebaselinealgorithmswerechosenfromthreenttypesofnetworkcomple-tionalgorithms.Onemethodonlyutilizetheobservedlinks,oneusesnetworkstructureaswellasnodefeatures,andthirdonelearnsthesharedsubspaces.1.MF:Considersonlythenetworkstructureandignoresthenodefeatures.MatrixCompletion(MF)algorithminthestandardinthisclass[83].2.MF-NF:Employingbothnodefeaturesandthestructureofnetworkfornetworkcompletion,amatrixfactorizationbasedalgorithmthatfactorizestheadjacencyma-trixandcombinesthelatentfeatureswithexplicitfeaturesofnodesandlinksusingabilinearregressionmodel[60].Theimplementationofthisalgorithmisprovidedbytheauthors6.3.MF-SS:Combiningthenetworkstructurewithnodefeaturesbysharingasubspace.MatrixFactorizationwithSubspaceSharing(MF-SS)[24]ismostliketheproposedalgorithm.4.MC-DT:ThisreferstotheproposedalgorithmformallycalledMatrixCompletionwithDecoupledTransduction(MC-DT).6http://cseweb.ucsd.edu/~akmenon/code/85Figure7.2TherecoveryerroroftheproposedMC-DTalgorithmnoisevariancevalues7.3.4ExperimentsonSyntheticDatasetsToexaminethealgorithmsonasyntheticdataset,anadjacencymatrixAwith2,048nodeswasgenerated.Tocreatethisnetwork,therankwasatr10yieldingtencomponents.Thenodeswereevenlydistributedamongthesecomponents.ApairwisesimilaritymatrixforthenodeswasgeneratedbyaddinganoisetermtotheadjacencymatrixSAN,whereeachentryofnoisematrixfollowsauniformdistributionNijUp0;0:5q.Togenerateanincompletenetwork,20%ofalllinkswererandomlyremoved.Thenetworkthenhad119,999links.7.3.4.1ofNoiseinSideInformationTobetterunderstandtheofsimilarityinformationontherecoveryerror,theofnoiseinsimilaritymatrixontheperformanceofMC-DTalgorithmwasinvesti-gated.Theaddednoiseonthesimilaritymatrixfollowedazero-meanGaussiandistribution86Figure7.3TherecoveryerroroftalgorithmsonasyntheticdatasetfortsizesofpartiallyobservedsubmatrixwithmnodesNp0;˙qwithntvaluesofvariance˙.Atthesametime,thesizeofobservedsubma-trixwastom400.Inparticular,thevariancerangedfrom˙0:1to˙1withstepsizeof0:1.AsFigure7.2shows,byincreasingthenoise,therecoveryerrorincreaseslinearly.ThisisconsistentwiththetheoreticalresultofForsatietal.[55].7.3.4.2ofTrainingSizeInvestigationonthesizeofobservedsubmatrixOasitappliestorecoveryerrorwasalsoundertaken.Thesizeoftheobservedsubmatrixwasvariedfrom200to1600withstep-sizeof200.ThereportedrecoveryerrorforthetalgorithmsisgiveninFigure7.3.Itcanbeobservedthatbyincreasingm,therecoveryerrordecreasesforallofthemethods.Thefewerunobservedelementswehave,thelowertherecoveryerroris.TheproposedMC-DTalgorithmperformsthebest,verifyingitsreliableperformance.ThesigntbetweentherecoveryerrorofMC-DTandthreeotheralgorithmsimpliestheenessofMC-DTinexploitingsimilarityinformation.BecauseMC-DT87ignoresthemissingentriesofthenetworkandcompletesthesubmatrixpurelyfromtheobservedpartofthenetwork,thecompletionerrorbecomeszeroandisnotpropagatedintransductionstage.Itisofinteresttonotethat,byincreasingthesizeofobservedsubmatrix(asmapproachesn),theofsimilarityinformationondecreasingrecoveryerrorforallmethodsbecamelesstial.Thisfollowsfromthefactthatexistenceoflinksinthenetworkprovidemuchricherinformationthanthepairwisesimilaritybetweenusers.7.3.5EvaluationofLinkPredictionTheresultsofapplyingtheproposedalgorithmalongwithtbaselinealgo-rithmsonthelinkpredictionproblemforEpinionsandWeiboarecontainedinthissection.7.3.5.1LinkPredictiononEpinionsToperformthecomparisonwiththebaselinealgorithms,rsttheuser-userbinarytrustmatrixwascreatedfromtheavailabledataset.Inthismatrix,a`1'indicatedatrustrelationbetweentwousersand`0'indicatedunobservedrelations.Recall,distrustwasnotconsidered.Ineachdataset,afractionthatvariedfromthesett40%;50%;60%;70%;80%uwaschosentobetraining,with10%asvalidation,andtheremainingpartwasthetestset.Thetestsethaditsentriessettozero.Thedatawerebrokendownrandomlyintoetsetspertestsize.TheaveragevaluesoftheexperimentsaregiveninTable7.2.Theproposedalgo-rithmoutperformedallotherbaselinealgorithmsinalltrainingsizes.Byincreasingthetrainingsizetheerroralsoreducedbutthereisnotatdropintheerror.Thisallowedaconclusiontobedrawnthatbyselectingsmallertrainingsizes,accuracycanbemaintained.Addedboffastercomputationsandovavoidancearealsoachieved.Thisfurtherassistsinsituationswherethereisnotenoughdata(orcomputa-tionalpower)availabletouselargetrainingsets.88RMSEMethodParametersTrainPercentage40%50%60%70%80%BPSOr=20,u=15,i=10,T=500.77080.76830.76830.76260.7610MFr=20,=0.015,=0.01,T=500.70630.71270.71910.71730.7269SO|0.78510.77800.77260.76500.7611UIB|0.70260.70220.70320.69930.7022CCCu=10,Ci=20,T=500.78490.77760.77120.76790.7625LFMr=10,=0.01,u;i=0.05,=0.01,T=500.72120.71100.71880.71500.7122BMFr=10,=0.01,u;i=0.05,T=500.70860.70840.70940.70610.7087SPPr=10,=0.05,=0.01,T=500.68970.69030.69260.68920.6929SSPPr=10,=0.05,=0.01,T=500.98790.98670.98510.98360.9856SFMr=10,=0.015,=0.33,=0.00,T=500.67380.67430.67700.67310.6777MC-DTr370.66230.64950.64280.64300.6551MAEMethodParametersTrainPercentage40%50%60%70%80%BPSOr=20,u=15,i=10,T=500.94040.93180.92450.91560.9078MFr=20,=0.015,=0.01,T=500.83630.84220.84740.84680.8543SO|0.95150.93700.92440.91290.9047UIB|0.82800.82780.82800.82590.8276CCCu=10,Ci=20,T=500.92850.91600.90710.90130.8932LFMr=10,=0.01,u;i=0.05,=0.01,T=500.84250.83400.84020.83770.8342BMFr=10,=0.01,u;i=0.05,T=500.83180.83170.83190.83020.8315SPPr=10,=0.05,=0.01,T=500.82110.82150.82270.82060.8227SSPPr=10,=0.05,=0.01,T=501.26171.26051.25991.25661.2596SFMr=10,=0.015,=0.33,=0.00,T=500.81650.81670.81810.81550.8182MC-DTr370.76250.75910.75620.75620.7573Table7.2LinkpredictionresultsonEpinionsdatasetandtheoftrainingsize.897.3.5.2LinkPredictiononWeiboTheTencentWeibodatasetisgivenasaseriesoflinkrecommendationfortheusersalongwithsideinformation.Forthelinkrecommendationsatripleispresentedthatgivesauser,arecommendationofauseroritem,andaBoolean.TheBooleanrepresentswhetherornotauseracceptstherecommendeduseroritem.Anacceptancecanbeviewedasadirectededge,fromtheusertotherecommendation,inthesocialnetworkgraph.Toperformthecomparisonwiththebaselinealgorithms,rsttheuser-userbinarytrustmatrixwascreatedfromtheavailabledataset.Inthismatrix,a`1'indicatedatrustrelationbetweentwousersand`0'indicatedunobservedrelations.Recall,distrustwasnotconsidered.Ineachdataset,afractionthatvariedfromthesett40%;50%;60%;70%;80%uwaschosentobetraining,with10%asvalidation,andtheremainingpartwasthetestset.Thetestsethaditsentriessettozero.Thedatawerebrokendownrandomlyintoetsetspertestsize.ThelinkpredictionexperimentsonWeibousedttrainingsizessimilartotheEpinionsexperimentation.Table7.3showstheresultsofRMSEandMAEmeasuresonWeibodataset.TheresultsshownintheTable7.3thattheproposedalgorithmisthebestperformingalgorithmamongthebaselinealgorithms.Thereisatgapbetweentheresultsofourproposedalgorithmandotherbaselinealgorithms.ThiswasnotthecaseinEpinionsdatasetresults.ThistrenceinaccuracywasattributedtohavingmoreavailablesideinformationavailableinWeibothanwhatwasavailableinEpinions.Again,afterhaving50%astrainingand10%asvalidation,thereisnotaantdropintheaccuracyoftheresults.Theseresultssupporttheofthedecoupledapproachtolinkprediction.90RMSEMethodParametersTrainPercentage40%50%60%70%80%BPSOr=20,u=15,i=10,T=500.50230.50270.50170.50410.5063MFr=20,=0.015,=0.01,T=500.49920.49990.49910.50000.5018SO|0.50170.50340.49880.49990.5072UIB|0.50070.50020.49960.49830.5000CCCu=10,Ci=20,T=500.50200.50040.50360.49500.5022LFMr=10,=0.01,u;i=0.05,=0.01,T=500.50080.50070.49970.49840.5000BMFr=10,=0.01,u;i=0.05,T=500.50040.50030.49980.49860.5000SPPr=10,=0.05,=0.01,T=500.49960.50020.49980.49870.5002SSPPr=10,=0.05,=0.01,T=500.49910.50150.49430.48850.4884SFMr=10,=0.015,=0.33,=0.00,T=500.50010.50000.50010.49960.5000MC-DTr370.34540.32850.32320.32370.3266MAEMethodParametersTrainPercentage40%50%60%70%80%BPSOr=20,u=15,i=10,T=500.55090.54880.54440.55090.5496MFr=20,=0.015,=0.01,T=500.50120.50190.50130.50250.5053SO|0.54890.54960.54140.54360.5488UIB|0.50670.50660.50530.50380.5053CCCu=10,Ci=20,T=500.56120.55740.55640.55110.5550LFMr=10,=0.01,u;i=0.05,=0.01,T=500.50930.51000.50830.50680.5082BMFr=10,=0.01,u;i=0.05,T=500.50210.50220.50170.50060.5019SPPr=10,=0.05,=0.01,T=500.50270.50360.50310.50200.5035SSPPr=10,=0.05,=0.01,T=500.69870.70030.69510.69080.6908SFMr=10,=0.015,=0.33,=0.00,T=500.50050.50020.50020.49970.5001MC-DTr370.46240.44570.44190.44210.4437Table7.3LinkpredictionresultsontheWeibodatasetandtheofvaryingtrainingsize917.3.6EvaluationofNetworkCompletionNowtheproposedalgorithmiscomparedtothestate-of-the-artalgorithmsfornet-workcompletion.ThefollowingresultswerefoundusingtheFacebookandGoogle+socialnetworkdatasets.Inthisscenario,varyingtrainingsizeswereagainutilized.Againthedatasetsweredividedintotraining,validation,andtestingasdescribedinthelastsection.Whenthetrainingsizeis20%,20%ofthenodesandthecorrespondinglinksfromtheeachsocialnetworkarerandomlyselectedtopredicttherestofnetwork.TheperformanceofMC-DTalongwiththebaselinenetworkcompletionalgorithmsonthesetwodatasetswasevaluated.7.3.6.1NetworkCompletioninFacebookInFigures7.4and7.5,theRMSEandMAEoftalgorithmsonFacebookdatasetareshown,respectively.Intheseplotstheimprovementofallmethodsgraduallydecreasedasmoreofthenetworkstructurewasobserved.7.3.6.2NetworkCompletioninGoogle+Table7.4showstheperformanceoftheMC-DTincomparisonwiththeotherbase-linealgorithmsonGoogle+'ssocialnetwork.Thealgorithm'sperformanceexhibitssimilarbehaviortotheonepresentedinpreviousexperimentwithFacebook.Sp,MC-DToutperformstheothernetworkcompletionalgorithms.MethodRMSE(30%)MAE(30%)RMSE(60%)MAE(60%)MF0.972010.891320.817490.76601MF-NF0.863840.820940.715050.68935MF-SS0.813680.788200.583690.68160MC-DT0.788060.501860.508510.30011Table7.4ComparisonoftalgorithmsontheGoogle+withtpercentagesofobservednodes92Figure7.4TherecoveryoffouralgorithmsontheFacebookdatasetmeasuredinRMSEundertpercentagesofobservednodesFigure7.5TherecoveryoffouralgorithmsontheFacebookdatasetmeasuredinMAEundertpercentagesofobservednodes937.4ConclusionAnctivealgorithminMC-DTwasdevelopedfornetworkcompletionandlinkpredictionwithauxiliarysimilarityinformationonthenodes.Itscomparisontothestate-of-the-artbaselinesonsyntheticandrealdatasetsrevealsthatthisalgorithmexhibitsim-provedperformanceintermsofrecoveringthefullnetwork.Thisimprovementisduetothedecoupledcompletionoftheobservedsubmatrixfromtransductionofknowledgewhileexploitingsimilarityinformation.MC-DTgreatlyoutperformedtheotherbaselinealgorithmsinoftheevaluationmet-rics.BoththeRMSEandMAEerrorsofthecompletednetworkaremorethantwotimessmallerforMC-DTthantheothers.TheseexperimentsdemonstratedtheenessofMC-DTinexploitingthesimilarityinformationtorecoverthefullnetwork.Althoughbothusematrixfactorizationtechniques,MC-DTdoesutilizenodefeatureswhilenaiveMFdoesnot.MC-DTachievesbetterperformance,asitcombinestheinformationfromthenodefeaturesandthenetworkstructure.Thetgapbetweentheperformanceofthesetwoalgorithmsdemonstratedtheimportanceofsimilarityinformationinnetworkcompletionproblems.Whennetworksareincomplete,theperformanceofMC-DTdegradesgracefully.Becauseitcanrelyonthenodefeatures,thealgorithmcancompensateforthelackoflinksintheoverallnetworkstructure.Finally,bycomparingtheresultsforsyntheticandrealdatasets,inthesyntheticdatasetthedecreaseoferror(byincreasingthenumberofobservednodes)istinitially,butthenslowlyshrinks.Thiscontraststorealworlddatasetswherethedecreaseisroughlylinear.94Chapter8RecommenderSystemsAscommercehasshiftedtotheinternet,e-commercewebsites(e.g.Amazon,Wal-mart.com),mediaconsumption(e.g.AmazonVideo),andnewsmedia(e.g.NY-times.com,CNN.com)haveexperiencedlargegrowthinbothusersandavailableservices.Withthisgrowth,therebecameademandforcuratedcontentfortheendusers.Thisgaverisetotrecommendersystemsthatwouldpresentuserswithitemsofpersonalrelevance.However,justasinthenetworkcompletioncase,cold-startproblemswiththeusersanditemshavenotbeenadequatelyaddressed.Thegoalofthealgorithmproposedinthischapteristogiveancientmatrixfactorizationapproachusingsimilarityinformationderivedfromsideinformationthatisaccurateandcanseamlesslyincorporatesparseorcold-startdatasets.ThisisanextensionoftheworkonnetworkcompletionpresentedinChapter7.Atwo-stagealgorithmthatdecouplesthecompletionandtransductionstagesduringthematrixcompletionispresentedhere.Inthephasecold-startitemsandusersareexcludedandaratingsubmatrixisfullycompleted.Inthenextphase,thissubmatrixandasimilaritymatrixextractedfromtheavailablesideinformationareusedtotransductinformationforthecold-startusersanditems.Unlikemostsubspacesharingapproaches,thereisnoerrorpropagationofcompletionduringtransduction.95Thetheoreticalresultsarefurtherenhancedwithcomprehensiveexperimentsonafewbenchmarkdatasetstodemonstratethemeritsandadvantagesofproposedframeworkindealingwithcold-startproblems.Theresultsdemonstratethesuperiorityoftheproposedframeworkoverseveralofstate-of-artcold-startrecommenderalgorithms.Thecontentsofthischapterwereadaptedfromtheauthor'scontributiontoapublishedjournalarticle1.8.1TheAssumptionsThereissomelow-rankratingsmatrixRPRnmwheretherearenusersprovidingreal-valuedratingsformitems.Forthepurposesofthisworktheratingsweretakentobevaluesfrom1to5inclusive;however,thisassumptioncouldbeeasilychangedbasedontheproblemdomain.ThereisapartiallyobservedsubmatrixM—RwhereMPt0;1;?upq;1¤p¤n;1¤q¤q.TherowsofRwithnovaluesareuserswhoarecold-start(i.e.userswhohavenotprovidedanyhistoricalratings).ThecolumnsofRwithnoentriesarecold-startitems(i.e.itemswherenofeedbackhaseverbeenprovided).Aswiththenetworkcompletioncase,noassumptionshavebeenmadeaboutthedistributionoftheratings.Thereisafurtherassumptionthatthereissideinformationavailableabouteveryuseranditemviaasocialnetworkgraphanditemspinformation.Fromthis,featuresoftheusersanditems,respectively,areextractedandpairwisesimilarityiscom-putedforeach.ThisinformationisstoredinthesimilaritymatrixAPRnnfortheusersandBPRmmfortheitems.Thereisalassumptionthattheratingmatrixandthesimilaritymatricesarecorrelated.Thus,thepairwisesharesomelatentinformation.Sp,therowvectorsofRshareanunderlyingsubspacewiththeleadingeigenvectorsofA,andthecolumnvectorsshareanunderlyingsubspacewiththeleadingeigenvectorsofB.Theamountof1\Cold-StartRecommendationwithProvableGuarantees:ADecoupledApproach".ImanBarjasteh,RanaForsati,DennisRoss,Abdol-HosseinEsfahanian,HayderRadha.IEEETransactionsonKnowledgeandDataEngineering(TKDE).2016.Accepted.96subspacesharingwillbeparameterized,butifnosuchsubspaceexistedtherewouldbenobtoapplyingthisapproach.8.2TheAlgorithm8.2.1PreviousApproachThepreviousmethods[79]forpredictingratingsviaasharedsubspaceapproachisverysimilartothatofthenetworkcompletionapproach.Again,theproblemiscastasasharedsubspacelearningframework.Thismethodexploitsknowledgefromthesimilaritymatrixandtransfersittomakestructuralpredictionsonthenetwork.ThisisachievedbyjointmatrixfactorizationtolearnacommonsubsetofbasisvectorsfortheratingmatrixRandthesimilaritymatricesAandBforusersanditemsasformulatedinthefollowingoptimizationproblemwithastheregularizationparameterforthenormsofthesolutionmatricesandcommonlatentspacerepresentationisgivenbyusingthesamematricesWandZ:minUPRnr;VPRmr;WPRnr;ZPRmr12}RUVJ}2F}U}2F}V}2F12}AUWJ}2F12}BZVJ}2F}W}2F}Z}2FAgain,thisapproach'sperformanceisimpairedbecausethecompletionoftheun-observedentriesinratingmatrixRandtransductionofknowledgefromtheseentriestocold-startusers/itemsviasimilaritymatricesiscarriedoutsimultaneously.Completionandtransductionerrorsarepropagatedrepetitivelyanduncontrollably.Theproblemwith97errorpropagationbecomesevenworseduetothenon-convexityofoptimizationproblems,whichwillbedescribedlater.8.2.2OverviewAsnotedinChapter7,theproposedalgorithmfullyrecoversthesubmatrixMbeforeusingsimilarityinformationtocompletetheadjacencymatrix.Thistechniquedecouplesthematrixcompletionfromsideinformationtransduction.Moresp,thephasecompletesthepartiallyobservedsubmatrixMperfectlyduetotheassumptionsofthedistributionofknownentriesinM.PhasetwotransductsthelinksfromboththerecoveredsubmatrixMandthecompletesimilaritymatricesAandB.ThealgorithmisdescribedinFigure8.1.8.2.3AlgorithmDetailsAnorthogonalmatrixUAPRnswasconstructedwhereitscolumnspacesubsumestherowspaceoftheratingmatrix.Similarly,UBPRmswasconstructedforthecolumnspace.BothUAandUBwerecreatedbytakingtheirleadingseigenvectors.Thevalueofsrelatestotheextenttowhichtheextractedsubspacessubsumetherowandcolumnspacesoftheratingmatrixandisthusdependentonthequalityofthesideinformation.Thisallowedtherankrmatrixtobedecomposedas:Rr¸i1uivJi:Colloquially,thisexpressedtheratingsmatrixasitsuseranditemcomponents.Further,thei-thuser'slatentfeaturesvectoruicanbedecomposedinauniquewayintotwopartsthatareparallelandorthogonaltothesharedsubspaceasuiukiuKi.Here,ukiisspannedbyUAwhileuKiisorthogonaltoUA.Forthej-thitem'slatentfeaturesvicanbewrittenasvkivKiwhere,vkiisspannedbyUBwhilevKiisorthogonaltoUB.981.Input:RPRnm;r:observedmatrixanditsrankAPRnn:theusers'similaritymatrixBPRmm:theitems'similaritymatrix2.Extractthemaximalrecoverableratingsub-matrixMPRpq3.Completethesub-matrixMtogetxM4.DecomposexMasxM°ri1puipvJi5.ExtractsubspacesUAandUBbyspectralclusteringfromsimilaritymatricesAandB,respectively6.ComputepaipUJApUA:pUJApui;i1;2;;r7.ComputepbipUJBpUB:pUJBpvi;i1;2;;r8.ComputepRUA°ri1paipbJiUJB9.Output:pRFigure8.1Algorithmforratingpredictionusingsideinformationviatheproposedalgo-rithmfordecoupledcompletionandtransductionHavingdecomposedthelatentfeaturesasaboveintotwoparallelandorthogonalcomponents,theratingmatrixcanberewrittenas:99Rr¸i1uivJir¸i1pukiuKiqpvkivKiqJr¸i1ukivkJiukivKJiuKivkJiuKivKJir¸i1ukivkJir¸i1ukivKJir¸i1uKivkJir¸i1uKivKJiRRLRRREr¸i1ukivkJiHereRisfullyspannedbythesubspacesUAandUB.TheothermatricesaresourcesoferrorasRL'srightsingularvectorsareorthogonaltothesubspacespannedbyUB,andRR'sleftsingularvectorsareorthogonaltothesubspacespannedbyUA.Finally,RE'sleftandrightsingularvectorsbothareorthogonaltotheirrespectivesimilaritysubspaces.Theerrorcontributedbythismatrixintotheestimationerrorofrecoveredratingmatrixisunavoidable,butisboundedbytheresultsofBarjastehetal.[6].Withthisinmind,theratingsmatrixisrewrittenas:RukivkJiRLRRREukivkJiTobeginthealgorithm,asub-matrixMPRpqisextracted.ToreconstructMperfectly,thenumberofobservedelementsmustbeatleastprppqqlog2p2pqq[73].Tomeettheseconditions,therowsaresortedbythenumberofratingstheycontain.Inthiswaythetoprowsaretheleastsparse.Then,thelargestpossiblesubmatrixthatmeetsthematrixcompletionconditionsistakenform.ThissubmatrixisthenfullyrecoveredasxM.Thisisaccomplishedviamatrixcompletiontechniquesof[83,61,81,14].This100leadstothefollowingconvexoptimization,withM—asthesetofobservedratings:xMargminXPRpq}X}s.t.XijMij;@pi;jqPM:(8.1)ThiscompletedsubmatrixandthesubspacesUAandUBcanthenbeusedtorecoverR°ri1uivJiviatransduction.Speci,theratinginformationintherecoveredmatrixxMistransducedtothecold-startusersanditems.BecauseukiandvkiarefullyspannedbythesubspacesUAandUB,fori1;2;:::;rtheycanberewrittenas:ukiUAaiandvkiUBbiHereaiPRsandbiPRsaretheorthogonalprojectionofthesingularvectorsontothecorrespondingsubspaces.BysubstitutingtheseequationsintothedecompositionofRthefollowingistrue:Rr¸i1UAaibJiUJBUAr¸i1aibJiUJBTorecoverthematrixR,theremustbeanestimateforthevectorsai;bi.Thesubspacesextractedfromthesimilaritymatricesandtherecoveredratingsub-matrixxMcanbeusedtomaketheseestimations.Tothisend,considerthedecompositionoftherecoveredmatrixasxM°ri1puipvJi.Theestimationofvectorsai;biandthematrixRnowdescribed.LetpUAPRpsbearandomsubmatrixofUAwherethesampledrowscorrespondtothesubsetofrowsinthematrixxM.Similarly,letpUBPRqsbecreatedbysamplingtherowsofUBcorrespondingtothecolumnsinxM.Anestimationofai;bi;iPrrsvectorscanbeobtainedbythe101orthogonalprojectionofleftandrightsingularvectorsofxMontothesampledsubspacespUAandpUBbysolvingtwooptimizationproblems,namely:paiargminaPRspuipUAa22pbiargminbPRspvipUBb22Thesolutionstotheseordinary-leastsquaresregressionproblemsareknown.Theyare:pUJApUA:pUJApuiandpUJBpUB:pUJBpvi,respectively.ThisallowsRtobeestimatedby:pRUAr¸i1paipbJiUJBUApUJApUA:pUJAr¸i1puipvJipUBpUJBpUB:UBTheestimatedratingmatrixpRisfoundbynowsettingpRpR.Thereisdimensionreductionviathesubmatrices,andthisleavesonlythespectralclusteringandorthogonalprojectionascomputationallyexpensive{evenwithalargeratingmatrix.8.3ExperimentalEvaluationSeveralexperimentswerecompletedonmultipledatasets.ThesecomparedDecRecoverasetofbaselinealgorithmstodemonstratethemeritsandadvantagesofDecRec.Thedatasetsarewell-knownandpubliclyavailable.Theseare:MovieLens(1Mand100K)2,Epinions3andNIPS4.2http://www.grouplens.org/node/733http://www.trustlet.org/wiki/Epinions\_dataset4http://www.cs.nyu.edu/~roweis/data.html102Severalfundamentalquestionswereaddressed.Principally,theproposedalgorithmiscomparedagainststate-of-theartmethodsforincorporatingsideinformationintorec-ommendationofexistingitemstoexistingusers.Asthisisawellresearchedarea,theinterestingresultsoccurduringthecold-startcases.Resultsarepresentedwhentherearecold-startitems,cold-startusers,andbothcold-startusersanditems.8.3.1DatasetsAlthoughthisalgorithmismadetoproviderecommendations,itisapplicabletoanyuser-itemsetting.Becauseofthis,experimentswereperformedonwell-knownmovieanditemratingdatasetsforrecommendation,andafurtherexperimentwasconductedonanauthor-publicationgraphtosolveanetworkcompletionproblem.Therecommendation(includingcold-start)experimentsusedMovieLensandEpinionswhiletheNIPSdatasetwasusedfornetworkcompletion.Thedescriptionsaregivenalongwithatableofthedatasets'statisticsinTable8.1.Thereisalsoasyntheticdatasetincludedinsomeoftheexperiments.1.MovieLens:TwoversionsoftheMovieLensdatasetswereused.Theywerethe100Kand1Mvariants.Theyconsistofratings(1-5)fromusersonmovies.Inadditiontoratingdata,thesedatasetsalsocontainfeaturesforbothusersandmovies.Foreachmoviethefeatures(e.gtitle,year,genre)aregiven.Furtherfeaturedatawasextractedfromthemovieinformationwebsiteimdb.com.Foreachuser,personalfeatures(e.g.gender,age,occupation,location)wereextracted.Thenforbothusersanditemswecomputedtheircosinesimilaritiestobeusedassideinformation.2.Epinions:Thisdatasetwasobtainedfromauser-orientedproductreviewwebsitethathasatrustnetworkofusers.Userscanspecifywhethertheytrustotherusersornotexplicitly.Thistrustnetworkallowsthecreationofa0/1trustconnectivityvectorforeachuserwithallotherusers.Fromthis,thecosinesimilarityoftrust103vectorswascomputedandbecamethesimilaritymatrixofusers.Theitemshavecategoricalsideinformationandtheuserscanprovidetheirratings(1-5).3.NIPS:Fornetworkcompletion,theco-authornetworkattheNIPSconference[78]wastaken.Fromthis,thepaper-authorandpaper-wordmatriceswereextracted.Thereisnosideinformationfortheauthors,butthepapers'contentsarepre-processedsuchthatallwordsareconvertedtolowercasesandstop-wordsarere-moved.Again,thecosinesimilarityiscomputedofthewordsbetweenpapers.4.Synthetic:ThesyntheticdatasetwasgeneratedbytwomatricesUPr0;1s4;000randVPr0;1s2;000r.ThenUandVwereusedtogeneratearatingmatrixR4;0002;000UVJ.InR,therewere4,000usersand2,000items.AsimilaritymatrixA4;0004;000UUJwasgeneratedforusersandalsoB2;0002;000VVJforitems.Finally,randomnoisewasaddedtotheallelementsofAandBwherethenoisefollowstheGaussiandistributionwithNp0;0:5q.8.3.2EvaluationMetricsAsisstandard,theMeanAbsoluteError(MAE)andtheRootMeanSquaredError(RMSE)metricsforpredictionaccuracy[38]wereagainused.IfTdenotesthesetofratingstobepredicted,i.e.,Ttpi;jqPrnsrms;Rijneedstobepredictedu,andpRdenotesthepredictionmatrixobtainedbyarecommendationalgorithm,thenMAEis:MAE°pi;jqPT|RijpRij||T|RMSEissimilarlyas:RMSEgffe°pi;jqPTRijpRij2|T|104StatisticsMovieLens100KMovieLens1MEpinioinsNIPSNumberofusers9436,0408,5772,073Numberofitems16823,7063,7691,740Numberofratings100,0001,000,209203,2753,990Rangeofratings1-51-51-50-1Table8.1StatisticsoftherealworlddatasetsEvensmallimprovementsinRMSEareconsideredvaluableinthecontextofrecommendersystems.Forexample,thestreamingvideoservicethatreliesheavilyonpredictingusers'ratingsoncontent,aprizeof$1,000,000totheresearcherstoachievea10%reductionofRMSE.Thereisoneothermetrictobeconsidered.Givenanitemi,letribetherelevancescoreoftheitemrankedatpositioni,whereri1iftheitemisrelevanttotheiandri0otherwise.TheNDCGmeasureisanormalizationoftheDiscountedCumulativeGain(DCG)measure.DCGisaweightedsumofthedegreeofrelevancyoftherankedusers.ThevalueofNDCGisbetweenr0;1sandatpositionkisas:NDCG@kZkk¸i12ri1logpi1qInallexperiments,thevalueofkissetasthenumberofrateditemsbyeachuser.8.3.3TheBaselineAlgorithmsToevaluatetheperformanceoftheproposedDecRecalgorithm,awidevarietyofbaselinealgorithmsweredeployed.Thebaselinealgorithmswerechosenfromfourtypesofcategories:state-of-the-artalgorithmsforratingpredictions,recommenderswithcold-startitemscapacity,recommenderswithcold-startuserscapacity,andalgorithmswiththe105capacitytoconsiderbothcold-startitemsandusers.TheMyMediaLiteimplementationswereusedthroughoutthetesting[29].1.RS(RandomStrategy)[51]:Asimplebaselinethatselectsatrandomasubsetofusersoritems.Therecommendationforcold-startusersanditemsisachallengingcase,whereRSisoneofthebaselinemethods.2.U-KNN(UserKNN)[45]:PredictstheratingsusingthesimilaritywiththeKnearestneighborswhereusershaveweights.3.I-KNN(ItemKNN)[45]:Isaweighteditem-basedKNNapproachforratepredic-tion.4.GA(GlobalAverage):Usestheaverageofratingsoverallratings.5.I-A(ItemAverage):Usestheaverageratingofanitemforitsprediction.6.U-A(UserAverage):Usestheaverageratingofauserforitsprediction.7.SO(SlopeOne)[47]:Pre-computestheaveragebetweentwoitemsthatareratedbyusers.SOisafrequencyweightedslopeoneratingprediction.8.BPSO(BiPolarSlopeOne)[47]:IsaBi-polarfrequencyweightedslopeoneratingprediction.9.SMF(SocialMatrixFactorization)[39]:Isamatrixfactorizationalgorithmthatincorporatesthesocialnetworkforprediction.10.CC(CoClustering)[30]:Isaweightedco-clusteringalgorithmthatinvolvessimulta-neousclusteringofusersanditems.11.LFLLM(LatentFeatureLogLinearModel)[59]:Isascalablelog-linearmodelthatexploitsthesideinformationfordyadicprediction.10612.U-I-B(UserItemBaseline)[45]:Isaratingpredictionmethodthatusestheaverageratingvaluealongwitharegularizeduseranditembias.13.FWMF(FactorWiseMatrixFactorization):Isamatrixfactorizationbasedmodelwithafactor-wiselearning.14.BMF(BiasedMatrixFactorization)[75]:Isamatrixfactorizationthatlearnsbystochasticgradientdescentwithexplicitbiasforusersanditems.15.SVDPP(SVD++)[44]:Isamatrixfactorizationthattakesintoaccountwhatusershaveratedanddirectlyusersanditems.16.SSVDPP(SigmoidSVD++)[44]:IsaversionofSVD++thatusesasigmoidfunc-tion.17.SU-AFM(SigmoidUserAsymmetricFactorModel)[69]:Isanasymmetricfactormodelthatrepresentstheitemsintermsofthoseusersthatratedthem.18.SI-AFM(SigmoidItemAsymmetricFactorModel)[69]:Isanasymmetricfactormodelthatrepresentsusersintermsoftheitemstheyrated.19.SCAFM(SigmoidCombinedAsymmetricFactorModel)[69]:Isanasymmetricfactormodelthatrepresentsitemsintermsoftheusersthatratedthem,andusersintermsoftheitemstheyrated.20.CBF(ContentBasedFiltering)[79]:Thisalgorithmbuildsaforeachuserbasedonthepropertiesoftheuser'spreferreditemsfromthepast.21.LCE(LocalCollectiveEmbeddings)[79]:Isamatrixfactorizationmethodthatexploitspropertiesofitemsandpastuserpreferenceswhileenforcingthemanifoldstructureexhibitedbythecollectiveembeddings.22.LCE-NL(LocalCollectiveEmbeddingsNoLaplacian)[79]:IsLCEwithoutlaplacianregularization.10723.ELCE(ExtendedLCE):IsanextendedversionofLCEmeanttohandlethechallengeofthepresenceofbothcold-startusersanditemssimultaneously.24.KMF(KernelizedMatrixFactorization)[90]:Isamatrixcompletionbasedalgo-rithm,whichincorporatesexternalsideinformationoftheusersoritemsintothematrixfactorizationprocess.25.DecRec:Theproposedalgorithmwithdecoupledcompletionandtransduction.8.3.4ofNoiseThesensitivitytonoiseontheproposedalgorithmisexplored.Withthesyntheticdataset,recallAandBasthetwosimilaritymatricesbetweenusersanditems,respectively.ThevarianceoftheGaussianwerevariedbyevery0.05for0:05to0:95.Figure8.2showstheincreaseinRMSEandMAEoftheresultsontestdataasthenoiseincreases.However,thedegradationoftheresultsisgracefulwithrespecttotheinceaseofnoise.Theresultsaresimilartothoseinarealworlddataset.ExaminingMovieLens1M,noisewasgeneratedthatalsofollowsaGaussiandistri-bution.Thisnoise,Np0;0:5q,wasaddedtothesimilaritymatrices.Thevariancewasvariedinthesamewayasinthesyntheticdataset.Figure8.3showstheRMSEandMAEofthepredictions.Similartothepreviousresults,byaddingnoisetothesimilaritymatrices,theerrorgrowsgracefully.8.3.5ExistingUsersandItemsThestandardcaseforrecommendersystemsisthatallusersanditemshaveahistoryofratingandbeingrated,respectively.Generally,usersonlyrateasmallnumberofitemsandtherestoftheratingsareunknown.Thestatedgoalofsuchrecommendersystemsistoprovideapredictionforalloftheratingsoftheunrateditems.Therearetwo108Figure8.2RMSE&MAEonthesyntheticdatasetfortnoisevariancesonsimi-laritymatricesbroadapproachestothisproblem:neighbor-basedandlatentfactormodels.Manybaselinealgorithmsfromeachtypewereused.DecRecwasdeployedontheMovieLens100K,MovieLens1MandEpinionsdatasets.GroupLensResearch5madeesetsavailable,whichare80%/20%splitsoftheMovieLens100Kintotrainingandtestdata.ForMovieLens1MandEpinions,thedataweresplitinto80%/20%train-testsetsrandomlyetimes(for5-foldcross-validation).Theaverageresultsarereported.Table8.2showsthisaverageRMSEandMAEresultingfromthe5-foldcross-validationforallbaselinealgorithms.DecRechadthesmallestRMSEandMAEvaluesforeachcolumn(indicatedbyboldtype-face).Theresultssuggestedthatamongneighbor-basedapproaches,I-KNNisthebest-performingalgorithmforMovieLens1Mand100KandU-KNNisthesecondbest-performingone.Thatisbecauseofthesimilaritybe-tweenmovies(genre,director,etc)andsimilaritybetweentasteofusersplayanimportantroleinpredictionaccuracy.I-KNNandU-I-Boutperformedotherneighbor-basedmethods5grouplens.org/109Figure8.3RMSE&MAEofMovieLens1MfortnoisevariancesonsimilaritymatricesonEpinionsinrespecttoRMSEmeasures,whileBPSOandI-KNNoutperformedothersinrespecttoMAEmeasures.SimilarityofitemsandhavingthesamepatternofratingsforsimilaritemsonEpinionshelpedI-KNN'sresultsperformbetterthanotherneighborbasedmethods.Table8.3furthershowsthecomparisonbetweenlatentfactormethodsinwhichDecRec(KMF)algorithmachievedthe(second)bestperformanceofRMSEandMAEforMovieLens100KandRMSEforMovieLens1M.ItalsoachievedthesecondbestperformanceofMAEforMovieLens1M.OnEpinions,DecRecalsooutperformedotherlatentfactormethods.Tables8.2and8.3showthatDecRecachievedthebestperformanceforalldatasetsamongallmethodsofbothcategories,latentfactorandneighborbasedmethods(exceptforMAEonMovieLens1M),theperformanceadvantageofDecRecoverallbaselinealgorithms.Hence,theproposeddecoupledmethodbyincorporatingsideinformationrevealstheneedforpreventingerrorpropagationalongwithusingsideinformationtoobtainmoreaccuratepredictions.110AlgorithmsHyperparametersMovieLens100KMovieLens1MEpinionsRMSEMAERMSEMAERMSEMAENeighbor-BasedMethodsGA|1.11900.93991.1160.93271.16920.8878I-A|1.02200.81590.97590.77901.06950.8140U-A|1.03900.83501.0340.82721.12760.8769U-KNNk=800.93550.73980.89520.70302.39992.2229I-KNNk=80,sh=10,u=25,v=100.92410.72700.87110.68301.02790.6993U-I-Bu=5,v=20.94190.74500.90810.71901.02900.8010SO|0.93970.74030.90200.71201.08650.7067BPSO|0.97440.74820.93900.71991.04490.6813CCCi=3,Cu=3,T=300.95590.75260.91180.71341.05730.7890Table8.2ResultsonMovieLens100Kand1MandEpinionsforneighbor-basedmethodswithnocold-startusers/items111AlgorithmsHyperparametersMovieLens100KMovieLens1MEpinionsRMSEMAERMSEMAERMSEMAELatentFactorModelsSMFp=10,u=0.015,v=0.015,b=0.011.01340.78841.22840.93151.12240.8436s=1,=0.01,b=1,T=30FWMFp=5,T=5,sh=1500.92120.72520.86010.67301.50901.0597BMFp=160,b=0.003,=0.07,T=1000.91040.71940.85400.67601.02400.7918u=0.08,v=0.1MFp=10,T=750.91330.72450.85700.67511.09080.8372g=0.05,=0.005KMF˙r=0.4,D=10,=0.003,=0.10.79470.68930.74920.65140.90150.7873LFLLMp=10,b=0.01,u=0.015,v=0.0150.95500.76170.90120.70821.28911.0386=0.01,T=30,b=1SI-AFMb=0.7,g=0.015,=0.001,p=100.95680.76281.0350.84881.15340.8816b=0.33,T=1SU-AFMb=0.7,g=0.0150.95690.76340.90620.71891.0690.8398b=0.33,T=1,=0.001,p=10SCAFM|0.94990.75590.91210.72391.06000.8312SVDPPb=0.07,g=10.90650.71350.85100.66801.05500.8220b=0.005,p=50,=0.01,T=50SSVDPPb=0.7,T=301.1850.91470.94020.73521.33280.9022p=10,g=0.015,=0.001,b=0.33RS|1.69601.38601.70701.39401.90961.5789DecRecr=100.70020.66280.71570.67210.71570.6796Table8.3ResultsonMovieLens100Kand1MandEpinionsforlatentfactormethodswithnocold-startusers/items1128.3.6Cold-StartItemsTosimulatecold-startitemproblems,theitemsweredividedintotwodisjointtrain-ingandtestsubsets.Here,80%oftheitemswereconsideredexistingitemsfortrainingandtheremaining20%werecold-startitemsfortesting.Thisgeneralframeworkcanalsobeusedonnetworkcompletionchallenges.TheNIPSdatasetwaschosentonotonlysimulatethecold-startitemscenario,butalsotoshowtheresultsofDecRecfornetworkcompletion.NIPShasrichsideinformationfortheitems(papers)andshowstherelationship(0or1)betweenpapersandauthors.BecausethevaluesinNIPSareeither0or1,predictingtheauthorsofnewpaperscanbealsoperceivedasalinkpredictionproblem.FourcompetitiverecommendationmethodswereconsideredonNIPSdataset.TheywereCBF,KMF,LCEandLCE-NL.Table8.4showstheaverageRMSEandMAEof5-foldcross-validationforthesealgorithmsforthecold-startitemscenario.TheparametersfromMyMediaLiteareprovidedforreproduciblility.TheseresultsindicatethatDecRecisthebestperformingalgorithmamongallbase-linealgorithmsinthecold-startitemscenariowithrespecttoRMSE,MAEandNDCG.HavingthehighestNDCGvalueamongallcompetitivealgorithmsshowsthatDecReccanpresentthetop-rankeditemstousersbetterthanotheralgorithms.DecRecalsoyieldsthelowestRMSEandMAEvalues.Fromthis,itmaybeconcludedthatDecRec'spredictionsoftheratingsforcold-startitemsaremoreaccuratethantheothermethods.DecReccanbettersuggestthetop-rankedcold-startitemstouserswithhigheraccuracy.Therunningtimesarealsoincludedinthisinstance.CBFmustonlycreateuserandisthusveryfast.KPMFandDecRecareacceptablyt,butLCEandLCE-NLaremuchslowerduetotheirconvergenceconditions.113DatasetsMethodHyperparametersMeasuresNDCG@kRMSEMAETime(s)NIPSCBF|0.38610.79430.88810.17597LCEk=500,g=0.5,"0:001,Tm=500,=0.050.42400.76920.8675709.869LCE-NLk=500,g=0.5,"=0.001,Tm=500,=00.41860.75320.85621823.48Cold-startitemKMF˙r=0.4,D=10,=0.003,=0.10.14150.88040.919619.5413DecRecr=10000.46260.51110.680523.8410EpinionsCBF|0.22010.66440.77414.4800LCEk=500,g=0.5,"=0.001,Tm=500,=0.050.23270.67130.77861067.11LCE-NLk=500,g=0.5,"=0.001,Tm=500,=00.23190.67120.778511969.9Cold-startuserKMF˙r=0.4,D=10,=0.003,=0.10.20840.85220.88821196.32DecRecr=10630.23430.66180.7716144.660MovieLens100KRS|0.10221.19810.97820.0131KMF˙r=0.4,D=10,=0.003,=0.10.24230.98230.873011.540Cold-startELCEk=500,g=0.5,"=0.001,Tm=500,=00.26810.89340.762616.4683user&itemDecRecr=1000.26410.86720.72304.8729MovieLens1MRS|0.06521.38200.93260.0682KMF˙r=0.4,D=10,=0.003,=0.10.18340.97300.844263.732Cold-startELCEk=500,g=0.5,"=0.001,Tm=500,=00.26620.88490.7684166.201user&itemDecRecr=1000.27830.85240.716210.578Table8.4Resultsonallofthecold-startscenariosforrealdatasets1148.3.7Cold-startUsersTosimulatecold-startuserproblems,theusersweredividedintotwodisjointtrain-ingandtestsubsets.Here,80%oftheuserswereconsideredexistingusersfortrainingandtheremaining20%werecold-startusersfortesting.ToshowtherelativeresultsofDecRec,itwascomparedwithseveralcompetitivealgorithms:CBF,LCE,LCE-NLandKPMF.BecauseEpinionshasanexplicitlygiventrustnetworkamongtheusers,ithasveryusefulsideinformation.Thissetisusedtoexperimentonthecold-startuserscenario.Table8.4showstheaveraged(5-foldcrossvalidation)performanceresultsofthementionedalgorithms.DecRecachievedthebestperformanceofNDCG,RMSEandMAEontheEpinionsdataset.TheseevaluationsofDecRecandthebaselinealgorithmsareclose,butthecon-sistencyinoutperformingothercompetitivemethodsonbothcold-startuseranditemscenariosthestabilityandperformanceadvantageofDecRecoverstate-of-the-artalgorithms.8.3.8Cold-StartUsersandItemsTosimulatehandlingbothcold-startusersanditems,20%oftheusersand20%oftheitemswhererandomlychosentobecold-startusersanditems,respectively.AllexperimentswereperformedonthetwoMovieLensvariantsbecausetheyhaverichsideinformationaboutboththeusersandtheitems.Boththeratingsforcold-startusersanditemswerethenpredicted.Thisisaverychallengingscenariobecauseofallofthecompletelyemptyrowsandcolumnsintheratingmatrix.Veryfewbaselinealgorithmsexiststhatcanevenattemptasolution.DecRecwascomparedonlywithonlyRS,KMF,ELCEbecauseofthisrestriction.115Table8.4showstheresultsofapplyingRS,KMF,ELCEandDecRecalgorithmsonMovieLens100Kand1M.Onbothdatasets,DecRecoutperformedotherbaselines.ThenearestcompetitorwasELCE{acollaborativefactorizationmethod.8.4ConclusionDecRecexplicitlyexploitsthesimilarityinformationaboutusersanditemstoalle-viatecold-startproblems.Inparticular,DecRecdecouplesthecompletionfromtheknowl-edgetransduction,thuspreventingsomeerrorpropagationasdescribed.ExperimentalresultsonrealdatasetsclearlyindicatedthatDecRecoutperformsmodernbenchmarkal-gorithmsinallofthepermutationsofthecold-startproblems,andevenperformswellwhennocold-startusers/itemsarepresent.Alongwiththedearthofalgorithmsthatcanbeappliedtocold-startproblems,thispositionsDecRecasoneofthebestrecommendationalgorithmsforcold-startscenarios.116Chapter9ConclusionsThisworkpresentedanewframeworktostudyinvariantevolutionwiththestandardandrandomP-impactprocess.Severalinvariantswereexploredandtheirimpactintreesandemptygraphswasshown.Theprocessofdeterminingmaximumdistance-impactedgeswasshowntobenon-trivialthroughaseriesofcounterexamplesandaproofthattheseedgesmustnotbeincidenttotwoleavesintrees.Furtherexperimentationseemstoindicatethatsomeedgesprovidemoreaccuracywhentheyareaddedbeforematrixfactorizationisapplied.Anealgorithmfornetworkcompletionwithauxiliarysimilarityinformationaboutnodeswasalsodeveloped.Itscomparisontothestate-of-the-artbaselinesonsyn-theticandrealdatasetsrevealsthatproposedmethodexhibitsimprovedperformanceintermsofrecoveringthefullnetwork.Thisadvantageisbroughtbytheprocessinwhichwedecoupledthecompletionofobservedsubmatrixfromtransductionofknowledgebyexploitingsimilarityinformation.Finally,analgorithmthatexplicitlyexploitsthesimilarityinformationaboutusersanditemstoalleviatecold-startproblemsforrecommendationwasgiven.Similartothenetworkcompletionalgorithm,itcompletessub-matrixoftheratingmatrixandtransductsknowledgefromrecoveredsub-matrixalongwiththesideinformationoftheusersanditems.117Thisalgorithmalsoperformedwellinexperimentationwithexistingusers/itemsandinallthreecold-startscenarios.118APPENDIX119APPENDIXThisappendixcontainstheresultsoftheadditionalexperimentationinChapter6.Fiveorder25randomgraphsofeachclassweregenerated.Fromeachofthese,nineothergraphswerecreated.Ineachoftheseninegraphs,between10-90%(stepsizeof10%)oftheentriesintheadjacencymatrixcorrespondingtopotentialedgeswereeliminated.Notethatfornon-directedgraphstheadjacencymatrixissymmetric,soinpracticeonlytheupper-triangleadjacencymatrixwasconsidered.Fromeachoftheseadjacencymatricestwoprocessesoccurred.First,theadjacencymatrixwascompletedas-isviamatrixfactor-ization.Second,oneentryatatimewasrevealedandreplacedwithitstruevalue(0or1).Atthispoint,theadjacencymatrixwascompleted(againviamatrixfactorization).Inthecase,thebasalerrorrateswerecalculated(i.e.errorwithoutrevealinganyedges).Inthesecondcase,eachtimeanentrywasrevealedtheerrorofcompletionwasreportedasininstance.Fortheeaseoflanguage,intheoriginalgraphbothabsentandpresentedgeswillbereferredtoas`edges'.Thebeing,absent`edges'willhaveatruevalueof0intheadjacencymatrixwhilepresentedgeswillhaveavalueof1.Thepredictivepowerofindividualedgesarefoundbycomputingthecompletionerrorthattheiradditionyieldsfortheresultantadjacencymatrix.Thebasalerrorrateisalsoreported.ForallrandomgraphsoftheformGp25;pqforall0:1¤p¤0:9;p˘0:5stepsize0.1refertoFiguresA.1-15.ThetablescomparingthebasalRMSEandMAEandthepercentofedgeswho'sadditiondegradesthematrixfactorizationresultsbelowthebasalratearegiveninTablesA.1-8forGp25;pqwith0:1¤p¤0:9;p˘0:5stepsize0.1,respectively.ForGp25;0:5q,refertoFigures6.1,6.2,and6.1inChapter5fortheRMSE,MAE,andbasalcomparisonresults,respectively.120FigureA.1RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:1qFigureA.2MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:1q121FigureA.3RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:2qFigureA.4MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:2q122FigureA.5RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:3qFigureA.6MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:3q123FigureA.7RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:4qFigureA.8MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:4q124FigureA.9RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:6qFigureA.10MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:6q125FigureA.11RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:7qFigureA.12MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:7q126FigureA.13RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:8qFigureA.14MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:8q127FigureA.15RMSEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:9qFigureA.16MAEonsingleedgeadditionforvariedpercentagesofremovededgesforrandomgraphspp0:9q128EdgesRemoved(%)BasalRMSEWorseEdges(%)BasalMAEWorseEdges(%)10%1.576378.331.514679.6620%1.7159100.01.6573100.030%1.561662.01.486854.040%1.527345.331.455541.6650%1.542754.331.484957.6660%1.444612.661.375711.3370%1.44119.6661.36828.66680%1.671198.331.617899.090%1.472714.331.406914.33TableA.1ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-RenyirandomgraphsoftheformGp25;0:1qEdgesRemoved(%)BasalRMSEWorseEdges(%)BasalMAEWorseEdges(%)10%1.410624.661.312722.6620%1.366810.331.270710.030%1.518681.01.431583.3340%1.456153.01.361050.6650%1.452546.661.359746.6660%1.461556.001.368755.3370%1.475860.01.367650.3380%1.423130.01.325226.6690%1.523180.01.428477.0TableA.2ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-RenyirandomgraphsoftheformGp25;0:2q129EdgesRemoved(%)BasalRMSEWorseEdges(%)BasalMAEWorseEdges(%)10%1.535091.661.444892.3320%1.546393.661.449092.3330%1.539688.01.432386.3340%1.507978.01.416078.3350%1.526386.331.427583.060%1.407428.331.301525.070%1.469261.01.383364.6680%1.610198.01.521897.3390%1.433339.331.340440.0TableA.3ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-RenyirandomgraphsoftheformGp25;0:3qEdgesRemoved(%)BasalRMSEWorseEdges(%)BasalMAEWorseEdges(%)10%1.224110.331.07788.66620%1.424997.331.298597.030%1.282535.01.138232.3340%1.302444.331.174653.3350%1.300439.331.162939.060%1.243614.661.108015.6670%1.274226.661.128325.080%1.397087.661.256884.6690%1.293038.331.155336.66TableA.4ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-RenyirandomgraphsoftheformGp25;0:4q130EdgesRemoved(%)BasalRMSEWorseEdges(%)BasalMAEWorseEdges(%)10%1.234176.661.077373.020%1.224767.331.078270.6630%1.271182.331.119681.3340%1.200143.01.029135.050%1.237872.331.068263.060%1.263675.01.111875.3370%1.187228.991.048240.080%1.185832.661.033334.3390%1.286688.01.154192.33TableA.5ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-RenyirandomgraphsoftheformGp25;0:6qEdgesRemoved(%)BasalRMSEWorseEdges(%)BasalMAEWorseEdges(%)10%1.071926.660.894923.3320%1.197393.331.031691.6630%1.139766.330.992276.040%0.97320.6660.78490.66650%1.077227.330.909228.0060%1.080628.330.910827.6670%1.02056.6660.83125.080%1.049612.330.871911.6690%1.152267.660.997371.0TableA.6ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-RenyirandomgraphsoftheformGp25;0:7q131EdgesRemoved(%)BasalRMSEWorseEdges(%)BasalMAEWorseEdges(%)10%0.944540.330.765932.6620%0.89148.00.741217.6630%0.982662.660.805455.0040%1.000966.330.835765.3350%0.83831.6660.65972.33360%0.923620.660.779134.070%1.024676.660.872878.3380%0.988458.660.820354.090%1.058388.330.915590.0TableA.7ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-RenyirandomgraphsoftheformGp25;0:8qEdgesRemoved(%)BasalRMSEWorseEdges(%)BasalMAEWorseEdges(%)10%0.870849.00.684031.020%0.853334.00.694832.3330%0.917676.330.762173.040%0.971694.00.822993.6650%0.886956.000.735853.6660%0.844927.00.701334.3370%0.891856.660.731652.3380%0.815410.00.653811.3390%0.863330.660.702128.99TableA.8ThebasalcompletionerrorandthepercentofedgesthatproducehighererrorforallpercentagesofedgesremovedforErd}os-RenyirandomgraphsoftheformGp25;0:9q132REFERENCES133REFERENCES[1]JacobAbernethy,FrancisBach,TheodorosEvgeniou,andJean-PhilippeVert.Anewapproachtocollaborativering:Operatorestimationwithspectralregularization.TheJournalofMachineLearningResearch,10:803{826,2009.[2]AAnnibaleandACCCoolen.Whatyouseeisnotwhatyouget:howsamplingmacroscopicfeaturesofbiologicalnetworks.InterfaceFocus,1(6):836{856,2011.[3]SitaramAsurandBernardoAHuberman.Predictingthefuturewithsocialmedia.InWebIntelligenceandIntelligentAgentTechnology(WI-IAT),2010IEEE/WIC/ACMInternationalConferenceon,volume1,pages492{499.IEEE,2010.[4]EytanBakshy,JakeMHofman,WinterAMason,andDuncanJWatts.Everyone'sanquantifyingontwitter.InProceedingsofthefourthACMinternationalconferenceonWebsearchanddatamining,pages65{74.ACM,2011.[5]ImanBarjasteh,RanaForsati,Abdol-HosseinEsfahanian,andHayderRadha.Cold-startitemanduserrecommendationwithdecoupledcompletionandtransduction.InRecSys,pages91{98,2015.[6]ImanBarjasteh,RanaForsati,DennisRoss,Abdol-HosseinEsfahanian,andHayderRadha.Cold-startrecommendationwithprovableguarantees:Adecoupledapproach.[7]JustinBasilicoandThomasHofmann.Unifyingcollaborativeandcontent-basedtering.InICML,page9.ACM,2004.[8]RobertMBellandYehudaKoren.Lessonsfromtheixprizechallenge.ACMSIGKDDExplorationsNewsletter,9(2):75{79,2007.[9]DanielBillsusandMichaelJPazzani.Usermodelingforadaptivenewsaccess.Usermodelinganduser-adaptedinteraction,10(2-3):147{180,2000.[10]StephenPBorgattiandMartinGEverett.Agraph-theoreticperspectiveoncentrality.Socialnetworks,28(4):466{484,2006.134[11]UlrikBrandes.Afasteralgorithmforbetweennesscentrality.TheJournalofMathe-maticalSociology,25(2):163{177,2001.[12]RobinBurke.Hybridrecommendersystems:Surveyandexperiments.Usermodelinganduser-adaptedinteraction,12(4):331{370,2002.[13]Guo-RayCaiandYu-GengSun.Theminimumaugmentationofanygraphtoak-edge-connectedgraph.Networks,19(1):151{172,1989.[14]Jian-FengCai,EmmanuelJCandes,andZuoweiShen.Asingularvaluethresholdingalgorithmformatrixcompletion.SIAMJournalonOptimization,20(4):1956{1982,2010.[15]PaulA.Catlin.Areductionmethodtospanningeuleriansubgraphs.JournalofGraphTheory,12(1):29{44,1988.[16]MarkClaypool,AnujaGokhale,TimMiranda,PavelMurnikov,DmitryNetes,andMatthewSartin.Combiningcontent-basedandcollaborativesinanonlinenews-paper.InProceedingsofACMSIGIRworkshoponrecommendersystems,volume60.Citeseer,1999.[17]GabriellaContardo,LudovicDenoyer,andThierryArtieres.Representationlearningforcold-startrecommendation.arXivpreprintarXiv:1412.7156,2014.[18]AronCulotta.Towardsdetectingepidemicsbyanalyzingtwittermessages.InProceedingsoftheworkshoponsocialmediaanalytics,pages115{122.ACM,2010.[19]GideonDror,NoamKoenigstein,YehudaKoren,andMarkusWeimer.Theyahoo!musicdatasetandkdd-cup'11.InKDDCup,pages8{18,2012.[20]AsmaaElbadrawyandGeorgeKarypis.Feature-basedsimilaritymodelsfortop-nrecommendationofnewitems.2014.[21]PaulErd}os.Onsomeextremalproblemsingraphtheory.IsraelJournalofMathe-matics,3(2):113{116,1965.135[22]PaulErd}os,asHajnal,andJohnWMoon.Aproblemingraphtheory.AmericanMathematicalMonthly,pages1107{1110,1964.[23]Abdol-HosseinEsfahanianandSLouisHakimi.Oncomputingaconditionaledge-connectivityofagraph.InformationProcessingLetters,27(4):195{199,1988.[24]YiFangandLuoSi.Matrixco-factorizationforrecommendationwithrichsideinfor-mationandimplicitfeedback.InProceedingsofthe2ndInternationalWorkshoponInformationHeterogeneityandFusioninRecommenderSystems,pages65{69.ACM,2011.[25]AnasFrank.Augmentinggraphstomeetedge-connectivityrequirements.SIAMJournalonDiscreteMathematics,5(1):25{53,1992.[26]LintonCFreeman.Asetofmeasuresofcentralitybasedonbetweenness.Sociometry,pages35{41,1977.[27]HaroldNGabowandEugeneWMyers.Findingallspanningtreesofdirectedandundirectedgraphs.SIAMJournalonComputing,7(3):280{287,1978.[28]ZenoGantner,LucasDrumond,ChristophFreudenthaler,SteRendle,andLarsSchmidt-Thieme.Learningattribute-to-featuremappingsforcold-startrecommenda-tions.InICDM.IEEE,2010.[29]ZenoGantner,nRendle,ChristophFreudenthaler,andLarsSchmidt-Thieme.MyMediaLite:Afreerecommendersystemlibrary.InACM,RecommenderSystems,2011.[30]ThomasGeorgeandSrujanaMerugu.Ascalablecollaborativeingframeworkbasedonco-clustering.InICDM,2005.[31]QuanquanGuandJieZhou.Learningthesharedsubspaceformulti-taskclusteringandtransductivetransfertion.InICDM'09,pages159{168.IEEE,2009.[32]RogeraandMartaSales-Pardo.Missingandspuriousinteractionsandthereconstructionofcomplexnetworks.ProceedingsoftheNationalAcademyofSciences,106(52):22073{22078,2009.136[33]AselaGunawardanaandChristopherMeek.Aapproachtobuildinghybridrecommendersystems.InProceedingsofthethirdACM,Recommendersystems,2009.[34]SunilKumarGupta,DinhPhung,BrettAdams,TruyenTran,andSvethaVenkatesh.Nonnegativesharedsubspacelearninganditsapplicationtosocialmediaretrieval.InACMSIGKDD,pages1169{1178.ACM,2010.[35]SunilKumarGupta,DinhPhung,BrettAdams,andSvethaVenkatesh.Regularizednonnegativesharedsubspacelearning.Dataminingandknowledgediscovery,26(1):57{97,2013.[36]SteveHannekeandEricPXing.Networkcompletionandsurveysampling.InAISTAT,pages209{215,2009.[37]ThorstenHennig-Thurau,CarolineWiertz,andFabianFeldhaus.Exploringthetwitteraninvestigationoftheimpactofmicrobloggingwordofmouthonconsumersearlyadoptionofnewproducts.AvailableatSSRN,2016548,2012.[38]JonathanLHerlocker,JosephAKonstan,LorenGTerveen,andJohnTRiedl.Evalu-atingcollaborativerecommendersystems.ACMTransactionsonInformationSystems(TOIS),22(1):5{53,2004.[39]MohsenJamaliandMartinEster.Amatrixfactorizationtechniquewithtrustprop-agationforrecommendationinsocialnetworks.InProceedingsofthefourthACMconferenceonRecommendersystems,pages135{142.ACM,2010.[40]BhargavKanagal,AmrAhmed,SandeepPandey,VanjaJosifovski,Yuan,andLluisGarcia-Pueyo.Superchargingrecommendersystemsusingtaxonomiesforlearn-inguserpurchasebehavior.ProceedingsoftheVLDBEndowment,5(10):956{967,2012.[41]MyunghwanKimandJureLeskovec.Thenetworkcompletionproblem:Inferringmissingnodesandedgesinnetworks.InSDM,pages47{58.SIAM,2011.[42]NoamKoenigstein,GideonDror,andYehudaKoren.Yahoo!musicrecommendations:modelingmusicratingswithtemporaldynamicsanditemtaxonomy.pages165{172.ACM,2011.137[43]JnosKomlsandMiklsSimonovits.Szemerdi'sregularitylemmaanditsapplicationsingraphtheory,1996.[44]YehudaKoren.Factorizationmeetstheneighborhood:amultifacetedcollaborativemodel.InProceedingsofthe14thACMSIGKDDinternationalconferenceonKnowledgediscoveryanddatamining,pages426{434.ACM,2008.[45]YehudaKoren.Factorintheneighbors:Scalableandaccuratecollaborative.ACMTransactionsonKnowledgeDiscoveryfromData(TKDD),4(1):1,2010.[46]YehudaKoren,RobertBell,andChrisVolinsky.Matrixfactorizationtechniquesforrecommendersystems.Computer,(8):30{37,2009.[47]DanielLemireandAnnaMaclachlan.Slopeonepredictorsforonlinerating-basedcollaborativeInSDM,volume5,pages1{5.SIAM,2005.[48]DavidLiben-NowellandJonKleinberg.Thelink-predictionproblemforsocialnet-works.JournaloftheAmericansocietyforinformationscienceandtechnology,58(7):1019{1031,2007.[49]GuangLing,MichaelRLyu,andIrwinKing.Ratingsmeetreviews,acombinedapproachtorecommend.InACMConferenceonRecommenderSystems,pages105{112.ACM,2014.[50]JuntaoLiu,CaihuaWu,andWenyuLiu.Bayesianprobabilisticmatrixfactorizationwithsocialrelationsanditemcontentsforrecommendation.DecisionSupportSystems,2013.[51]NathanNLiu,XiangruiMeng,ChaoLiu,andQiangYang.Wisdomofthebetterfew:coldstartrecommendationviarepresentativebasedratingelicitation.InACMConferenceonRecommenderSystems,pages37{44.ACM,2011.[52]MingshengLong,JianminWang,GuiguangDing,WeiCheng,XiangZhang,andWeiWang.Dualtransferlearning.InSDM,pages540{551.SIAM,2012.[53]HaoMa,HaixuanYang,MichaelRLyu,andIrwinKing.Sorec:socialrecommendationusingprobabilisticmatrixfactorization.InProceedingsofthe17thACMconferenceonInformationandknowledgemanagement,pages931{940.ACM,2008.138[54]WolfgangMader.Areductionmethodforedge-connectivityingraphs.AnnalsofDiscreteMathematics,3:145{164,1978.[55]FarzanMasrour,ImanBarjesteh,RanaForsati,Abdol-HosseinEsfahanian,andHay-derRadha.Networkcompletionwithnodesimilarity:Amatrixcompletionapproachwithprovableguarantees.pages302{307.ACM,2015.[56]DavidWMatula.Determiningedgeconnectivityin0(nm).InFoundationsofCom-puterScience,1987.,28thAnnualSymposiumon,pages249{251.IEEE,1987.[57]PremMelville,RaymondJMooney,andRamadassNagarajan.Content-boostedcol-laborativeforimprovedrecommendations.InAAAI/IAAI,pages187{192,2002.[58]AdityaKrishnaMenon,Krishna-PrasadChitrapura,SachinGarg,DeepakAgarwal,andNagarajKota.Responsepredictionusingcollaborativewithhierarchiesandside-information.InACMSIGKDD,pages141{149.ACM,2011.[59]AdityaKrishnaMenonandCharlesElkan.Alog-linearmodelwithlatentfeaturesfordyadicprediction.InICDM,pages364{373.IEEE,2010.[60]AdityaKrishnaMenonandCharlesElkan.Linkpredictionviamatrixfactorization.InMachineLearningandKnowledgeDiscoveryinDatabases,pages437{452.Springer,2011.[61]AndriyMnihandRuslanSalakhutdinov.Probabilisticmatrixfactorization.InAd-vancesinneuralinformationprocessingsystems,pages1257{1264,2007.[62]JuhaniNieminen.Onthecentralityinagraph.ScandinavianJournalofPsychology,15(1):332{336,1974.[63]JuhaniNieminen.Distancecenterandcentroidofamediangraph.JournaloftheFranklinInstitute,323(1):89{94,1987.[64]UrosOcepek,JozeRugelj,andZoranBosnic.Improvingmatrixfactorizationrecom-mendationsforexamplesincoldstart.ExpertSystemswithApplications,2015.139[65]WeikePan,EvanWeiXiang,NathanNanLiu,andQiangYang.Transferlearningincollaborativeforsparsityreduction.InAAAI,volume10,pages230{235,2010.[66]ManosPapagelis,GautamDas,andNickKoudas.Samplingonlinesocialnetworks.KnowledgeandDataEngineering,IEEETransactionson,25(3):662{676,2013.[67]Seung-TaekParkandWeiChu.Pairwisepreferenceregressionforcold-startrecom-mendation.InRecSys,pages21{28,2009.[68]Seung-TaekPark,DavidPennock,OmidMadani,NathanGood,andDennisDeCoste.Naeerbotsforrobustcold-startrecommendations.pages699{705,2006.[69]ArkadiuszPaterek.Improvingregularizedsingularvaluedecompositionforcollabo-rativetering.InProceedingsofKDDcupandworkshop,volume2007,pages5{8,2007.[70]MichaelJPazzani.Aframeworkforcollaborative,content-basedanddemographicAIntelligenceReview,13(5-6):393{408,1999.[71]AlexandrinPopescul,DavidMPennock,andSteveLawrence.Probabilisticmodelsforcollaborativeandcontent-basedrecommendationinsparse-dataenvironments.InUAI,pages437{444,2001.[72]IanPorteous,ArthurUAsuncion,andMaxWelling.Bayesianmatrixfactorizationwithsideinformationanddirichletprocessmixtures.InAAAI,2010.[73]BenjaminRecht.Asimplerapproachtomatrixcompletion.TheJournalofMachineLearningResearch,12:3413{3430,2011.[74]SRendle.Factorizationmachines.InICDM,pages995{1000.IEEE,2010.[75]SRendleandLarsSchmidt-Thieme.Online-updatingregularizedkernelmatrixfactorizationmodelsforlarge-scalerecommendersystems.InProceedingsofthe2008ACMconferenceonRecommendersystems,pages251{258.ACM,2008.140[76]JassonDMRennieandNathanSrebro.Fastmaximummarginmatrixfactorizationforcollaborativeprediction.InProceedingsofthe22ndinternationalconferenceonMachinelearning,pages713{719.ACM,2005.[77]DennisRoss,BruceESagan,RonaldNussbaum,andAbdol-HosseinEsfahanian.Onconstructingregulardistance-preservinggraphs.arXivpreprintarXiv:1405.1713,2014.[78]SRoweis.Nipsdataset(2002).http://www.cs.nyu.edu/~roweis.[79]MartinSaveskiandAminMantrach.Itemcold-startrecommendations:learninglocalcollectiveembeddings.InRecSys,pages89{96.ACM,2014.[80]AndrewISchein,AlexandrinPopescul,LyleHUngar,andDavidMPennock.Methodsandmetricsforcold-startrecommendations.InSIGIR,pages253{260.ACM,2002.[81]HanhuaiShanandArindamBanerjee.GeneralizedprobabilisticmatrixfactorizationsforcollaborativeInICDM,pages1025{1030.IEEE,2010.[82]YueShi,MarthaLarson,andAlanHanjalic.Collaborativebeyondtheuser-itemmatrix:Asurveyofthestateoftheartandfuturechallenges.ACMComputingSurveys(CSUR),47(1):3,2014.[83]NathanSrebro,JasonRennie,andTommiSJaakkola.Maximum-marginmatrixfac-torization.InAdvancesinneuralinformationprocessingsystems,pages1329{1336,2004.[84]MicheleTrevisiol,LucaMariaAiello,RossanoSchifanella,andAlejandroJaimes.Cold-startnewsrecommendationwithdomain-dependentbrowsegraph.InACMConferenceonRecommenderSystems,volume14,2014.[85]OmarWasow,AlexBaron,MarlonGerra,KatharineLauderdale,andHanZhang.1cantweetskillamovie?anempiricalevaluationofthebrunoAvailableatSSRN,2010.[86]WouterWeerkampandMaartenDeRijke.Activityprediction:Atwitter-basedex-ploration.InSIGIRWorkshoponTime-awareInformationAccess,2012.141[87]XiZhang,JianCheng,ShuangQiu,GuiboZhu,andHanqingLu.Dualds:Adualdiscriminativeratingelicitationframeworkforcoldstartrecommendation.Knowledge-BasedSystems,73:161{172,2015.[88]YuchenZhang,AmrAhmed,VanjaJosifovski,andAlexanderSmola.Taxonomydis-coveryforpersonalizedrecommendation.pages243{252.ACM,2014.[89]KeZhou,Shuang-HongYang,andHongyuanZha.Functionalmatrixfactorizationsforcold-startrecommendation.InACMSIGIR,pages315{324.ACM,2011.[90]TinghuiZhou,HanhuaiShan,ArindamBanerjee,andGuillermoSapiro.Kernelizedprobabilisticmatrixfactorization:Exploitinggraphsandsideinformation.InSDM,pages403{414,2012.[91]TinghuiZhou,HanhuaiShan,ArindamBanerjee,andGuillermoSapiro.Kernelizedprobabilisticmatrixfactorization:Exploitinggraphsandsideinformation.volume12,pages403{414.SIAM,2012.142