ESTIMATING COVARIANCE STRUCTURE IN HIGH DIMENSIONS

By

Ashwini Maurya

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Statistics - Doctor of Philosophy

2016

ABSTRACT

ESTIMATING COVARIANCE STRUCTURE IN HIGH DIMENSIONS

By Ashwini Maurya

Many scientific domains rely on extracting knowledge from high-dimensional data sets to provide insights into the complex mechanisms underlying these data. Statistical modeling has become ubiquitous in the analysis of high dimensional data: for exploring large-scale gene regulatory networks in the hope of developing better treatments for deadly diseases, in the search for a better understanding of cognitive systems, and for predicting stock market volatility in the hope of averting potential risk. Statistical analysis of such high-dimensional data sets yields better results only if the estimation procedure exploits the hidden structures underlying the data. This thesis develops flexible estimation procedures, with provable theoretical guarantees, for estimating the unknown covariance structure of the data generating process. Of particular interest are procedures that can be used on high dimensional data sets where the number of samples n is much smaller than the ambient dimension p. Because structure estimation matters for both the covariance matrix and its inverse, the methodology is developed for estimating each of them, in parametric as well as non-parametric frameworks.

Copyright by
ASHWINI MAURYA
2016

This thesis is dedicated to my family. For their endless love, support, and encouragement.

ACKNOWLEDGMENTS

I am really grateful to the many people who helped me earn a doctorate in Statistics. It would not have been possible to pursue a PhD in the United States were it not for my parents, who spent an enormous amount of time and effort educating me from the earliest stages of my life. They supported me in every step of my career and provided a safety net that let me freely pursue many different possibilities.

At Michigan State University, I was extremely fortunate to be advised by Professor Hira L. Koul, who taught me how to think about research problems; I have benefited a lot from his clarity of thought and creative intellect. He has always been a constant source of motivation and encouraged me to realize my potential. I am grateful to him for providing the best possible academic environment, one that enabled me to think independently and grow as a scientific researcher. I also thank his family for their amazing hospitality, which in many ways made me feel at home away from home while at Michigan State.

I have much to thank the other members of my thesis committee as well. Dr. Mark Iwen's course on compressive sensing and Big Data, and our many discussions, proved very helpful in my research work. I am thankful to Professor Yuehua Cui and Dr. Grace Hong for serving on my thesis committee and for taking time out of their busy schedules to teach me the importance of good research.

I am very grateful to Professor Tathagata Bandyopadhyay at the Indian Institute of Management Ahmedabad for his support and for encouraging me to pursue an advanced degree in the United States. I am also very fortunate to know Professor Arnab Laha at the Indian Institute of Management Ahmedabad and thank him for his support and encouragement.

At Michigan State, I have learned a lot from the teaching of Professor Tapabrata Maiti, and I thank his family for their unconditional support. I am thankful to Sue Watson, who has been an ever-present source of help, and to Kim Schmuecker and Andy Hufford for their help with many technical issues at Michigan State.

To all my friends: thank you for your understanding and encouragement in my many, many moments of crisis. Your friendship makes my life a wonderful experience. I can't list all the names here, but you are always on my mind.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER 1  INTRODUCTION
  1.1 Covariance Structure Estimation
    1.1.1 High Dimensional Covariance Matrix Estimation
  1.2 Inverse Covariance Matrix Estimation
  1.3 Thesis Overview
  1.4 Notation

Part I  Estimating Covariance Structure

CHAPTER 2  SAMPLE COVARIANCE MATRIX AND ITS LIMITATIONS
  2.1 Sample Covariance Matrix
  2.2 Why the Sample Covariance Matrix is NOT Suitable in High Dimensions
CHAPTER 3  LOSS FUNCTIONS FOR COVARIANCE MATRIX ESTIMATION
  3.1 Likelihood Based Methods
  3.2 Frobenius Norm Loss Based Methods
  3.3 Other Loss Function Based Methods
CHAPTER 4  LEARNING SPARSE STRUCTURE
  4.1 Two Broad Classes of Covariance Matrices
  4.2 Lasso Type Penalty
  4.3 Discussion
CHAPTER 5  ESTIMATING A WELL-CONDITIONED STRUCTURE
  5.1 Motivation
  5.2 Well Conditioned Estimation
  5.3 Variance of Eigenvalues Penalty
CHAPTER 6  LEARNING SIMULTANEOUS STRUCTURE WITH JOINT PENALTY
  6.1 Motivation
  6.2 JPEN Framework
  6.3 Theoretical Analysis of JPEN Estimators
    6.3.1 Results on Consistency
  6.4 Generalized JPEN Estimators and Optimal Estimation
    6.4.1 Weighted JPEN Estimator for the Covariance Matrix Estimation
CHAPTER 7  AN ALGORITHM AND ITS COMPUTATIONAL COMPLEXITY
  7.1 A Very Fast Exact Algorithm
    7.1.1 Derivation
    7.1.2 Choice of Regularization Parameters
    7.1.3 Choice of Weights
  7.2 Computational Complexity
CHAPTER 8  SIMULATIONS
  8.1 Preliminary
  8.2 Performance Comparison
  8.3 Recovery of Eigen-structure and Sparsity

Part II  Estimating Inverse Covariance Structure

CHAPTER 9  INVERSE COVARIANCE MATRIX AND ITS APPLICATIONS
  9.1 Motivation
  9.2 Related Work
  9.3 Joint Penalty for Precision Matrix Estimation
  9.4 Some Applications
    9.4.1 Linear Discriminant Analysis
    9.4.2 Gaussian Graphical Modeling
CHAPTER 10  A JOINT CONVEX PENALTY (JCP) ESTIMATION
  10.1 Motivation
  10.2 Joint Convex Penalty Estimation
    10.2.1 Problem Formulation
    10.2.2 Proposed Estimator
CHAPTER 11  PROXIMAL GRADIENT ALGORITHMS AND ITS CONVERGENCE ANALYSIS
  11.1 Introduction
  11.2 Proximal Gradient Method
  11.3 Basic Approximation Model
  11.4 Algorithm for Optimization
  11.5 Choosing the Regularization Parameter
  11.6 Convergence Analysis
  11.7 Simulation Study
    11.7.1 Performance Criteria
    11.7.2 StARS Method of Tuning Parameter Selection
    11.7.3 Simulation Results
      11.7.3.1 Toeplitz Type Precision Matrix
      11.7.3.2 Block Type Precision Matrix
      11.7.3.3 Hub Graph Type Precision Matrix
      11.7.3.4 Neighborhood Graph Type Precision Matrix
  11.8 Summary
  11.9 Discussion
CHAPTER 12  SIMULTANEOUS ESTIMATION OF SPARSE AND WELL-CONDITIONED PRECISION MATRIX
  12.1 Motivation
  12.2 Joint Penalty Estimation: A Two Step Approach
  12.3 Weighted JPEN Estimator for the Precision Matrix
  12.4 Theoretical Analysis of JPEN Estimators
CHAPTER 13  SIMULATIONS AND AN APPLICATION TO REAL DATA ANALYSIS
  13.1 Simulation Results: Settings
  13.2 Performance Comparison
  13.3 Colon Tumor Gene Expression Data Analysis
APPENDICES
  APPENDIX A  JPEN Covariance Matrix Estimation
  APPENDIX B  JCP for Precision Matrix Estimation
  APPENDIX C  JPEN for Precision Matrix Estimation
BIBLIOGRAPHY

LIST OF TABLES

Table 8.1   Covariance matrix estimation
Table 8.2   Covariance matrix estimation
Table 11.1  Average KL-loss with standard error over 20 replications
Table 11.2  Average relative error with standard error over 20 replications
Table 11.3  Average KL-loss with standard error over 20 replications
Table 11.4  Average relative error with standard error over 20 replications
Table 11.5  Average KL-loss with standard error over 20 replications
Table 11.6  Average relative error with standard error over 20 replications
Table 11.7  Average KL-loss with standard error over 20 replications
Table 11.8  Average relative error with standard error over 20 replications
Table 13.1  Precision matrix estimation
Table 13.2  Precision matrix estimation
Table 13.3  Averages and standard errors of classification errors (in %) over 100 replications

LIST OF FIGURES

Figure 2.1   Eigenvalues of sample and population covariance matrices
Figure 3.1   A concave function
Figure 4.1   Dense and sparse covariance and precision matrices
Figure 6.1   Comparison of eigenvalues of covariance matrix estimates
Figure 7.1   Timing comparison of JPEN, Glasso, and PDSCE
Figure 8.1   Covariance graph for different types of matrices
Figure 8.2   Heat-map of zeros identified in the covariance matrix out of 50 realizations; white means 50/50 zeros identified, black means 0/50
Figure 8.3   Eigenvalues plot for n = 100, p = 50 based on 50 realizations, for a neighborhood type covariance matrix
Figure 8.4   Eigenvalues plot for n = 100, p = 100 based on 50 realizations, for a Cov-I type matrix
Figure 9.1   Eigenvalues plot of the precision matrix
Figure 9.2   Illustration of conditional independence
Figure 13.1  Colon tumor gene expression data
Figure 13.2  Partial correlation network of colon tumor gene expression data

CHAPTER 1
INTRODUCTION

The increasing use of technology, together with developments in storage systems, has created vast amounts of high dimensional data across many scientific disciplines. Examples include large-scale omics data that enhance our knowledge of human biology, network data that explain how we interact and connect with each other, and finance data that provide an opportunity to beat the market. The statistical analysis of these data is challenging due to the curse of dimensionality. New statistical methods are needed to model the unknown structure underlying these data sets and thereby leverage our scientific understanding.
Statistical inference in high-dimensional data is possible only if an inference procedure is flexible enough to exploit the hidden structure underlying these data sets. This translates to designing an inference procedure that does well in modeling the structure underlying these data sets. Such an inference procedure often assumes that many high dimensional structures can be represented by a smaller number of parameters, which is the case in many scientific disciplines. Consequently, the concept of parsimony becomes crucial in high dimensions.

The effectiveness of statistical estimation in the high dimensional setting relies on the robustness of the procedure, its efficiency, and its scalability. The latter depends upon the availability of scalable algorithmic techniques and their ability to learn efficiently the structure underlying these data sets. The main goal of this thesis is to develop flexible and principled statistical methods for uncovering the hidden structure underlying high dimensional, complex data, with a focus on scientific discovery. In particular, the thesis addresses two main tasks: (i) estimation of covariance matrices and their inverses, and (ii) scalable algorithms for computing the covariance structure based on the proposed estimation procedures, in the high dimensional setting.

1.1 Covariance Structure Estimation

In many scientific disciplines, a study involves large complex systems whose output often depends on a large number of components (variables) and their interactions. As a motivating example, in systems biology the cellular networks often consist of a very large number of molecules that interact and exchange information among themselves. Many of the existing techniques depend upon descriptive analysis of macroscopic properties, which include the degree distribution, path lengths, and motif profiles of these molecular networks, or on data mining tools to identify clusters. Such an analysis provides limited insight into the complex mechanisms of the functional and structural organization of biological structures. A robust estimate of the covariance matrix can explain the biologically important interactions and enhance our scientific understanding of these complex phenomena.

Covariance structure estimation is of fundamental importance in multivariate data analysis. It is widely used in a number of applications, including: (i) Principal component analysis (PCA) [Johnstone and Lu [2004], Zou et al. [2006]], where the goal is to project the data onto the "best" k-dimensional subspace, and where best means that the projected data explain as much of the variation in the original data as possible without increasing k; (ii) Discriminant analysis [Mardia et al. [1979]], where the goal is to classify observations into different classes; here estimates of the covariance and inverse covariance matrices play an important role, as the classifier is often a function of these quantities; (iii) Regression analysis: if interest focuses on estimation of regression coefficients with correlated (or longitudinal) data, a sandwich estimator of the covariance matrix may be used to provide standard errors for the estimated coefficients that are robust in the sense that they remain consistent under mis-specification of the covariance structure; and (iv) Gaussian graphical modeling [Yuan and Lin [2007], Wainwright et al. [2006], Wainwright [2009], Yuan [2009], Meinshausen and Bühlmann [2006]]: the relationship structure among nodes can be inferred from the inverse covariance matrix (also called the precision matrix). A zero entry in the precision matrix implies conditional independence between the corresponding nodes, given the remaining nodes. In applications where the probability distribution of the data is multivariate Gaussian, the precision matrix is used to describe the underlying dependence structure among the variables.

1.1.1 High Dimensional Covariance Matrix Estimation

In many applications, statisticians often encounter data sets where the sample size n is much smaller than the ambient dimension p, where the latter can be very large, often in the thousands, millions, or even more. In such situations, the classical statistical estimators tend to have huge bias for estimating their population counterparts, and many of the existing asymptotic theories no longer remain valid.
In general, a p-dimensional covariance matrix requires estimation of p(p+1)/2 free parameters. For n < p, [...]

4.2 Lasso Type Penalty

[...] a > 2. The main idea behind a non-convex penalty is to reduce the bias when the parameter value has relatively large magnitude. For example, the SCAD penalty remains constant when the parameter is large, whereas the l1 penalty grows linearly with it. The main limitation of non-convex penalties is that the proposed algorithms use iterative procedures based on local convex approximations and are hence computationally intensive. It is also hard to say whether the proposed algorithms converge to the global minimum/maximum.

The proposed Joint PENalty (JPEN) method in this thesis uses the Frobenius norm loss function together with a joint penalty consisting of the l1 norm and the variance of the eigenvalues of the underlying covariance matrix. The choice of squared loss function allows sparse estimation of the covariance matrix (rather than the sparse precision matrix) and results in a very fast algorithm. We introduce the variance of eigenvalues penalty to ensure that the estimated covariance matrix is positive definite. For more details, see Chapter 6.

4.3 Discussion

The assumption of sparsity involves a trade-off between benefit and cost. In particular, in high dimensional data analysis, when entries of the covariance matrix are set to zero, the noise due to estimation error is generally reduced. On the other hand, errors of misspecification are introduced. Hence the decision to fit a sparse model involves a trade-off between overfitting and model specification. As noted by Dempster [1972], once a parametric model is adopted, the choice of the level of sparsity is often settled, especially when the optimal estimates can easily be computed. However, such optimality provides no guarantee against the cost of introducing unnecessary parameters.

CHAPTER 5
ESTIMATING A WELL-CONDITIONED STRUCTURE

5.1 Motivation

In high dimensional data applications where an inverse of the covariance matrix is needed, the sample covariance matrix cannot be used, since generally it is not invertible. By a well-conditioned covariance matrix we mean one whose condition number (the ratio of its maximum and minimum eigenvalues) is bounded above by a finite positive constant that is not too large. As pointed out by Ledoit and Wolf [2004], a well-conditioned estimator reduces the estimation error and is a desired property in high dimensional settings. In this chapter, we discuss some of the existing literature on well-conditioned estimation and introduce the variance of eigenvalues penalty as an effective method for improved eigen-structure estimation.

5.2 Well Conditioned Estimation

Well-conditioned covariance matrix estimation is a long-studied subject [Stein [1975, 1986], Ledoit and Wolf [2004, 2014], Sheena and Gupta [2003], Won et al. [2012]]. It has received considerable attention in high dimensional analysis due to the importance of such estimators in many high dimensional data applications. Among the earlier developments, Stein [1975] proposed his famous class of rotation invariant shrinkage estimators. The main idea is to keep the same eigenvectors as those of the sample covariance matrix but to shrink the eigenvalues towards their center, in order to reduce the eigenvalue dispersion. Let S := UDU' be the eigen-decomposition of the sample covariance matrix. Stein's estimator is given by

Sigma_hat = U D_new U',  where D_new = diag(d_new_1, d_new_2, ..., d_new_p),   (5.2.1)

with

d_new_i = n d_ii / ( n - p + 1 + 2 d_ii * sum_{j != i} 1/(d_ii - d_jj) ),

where (d_11, d_22, ..., d_pp) is the diagonal of D. This class of estimators was further studied by Haff [1980], Lin and Perlman [1985], Dey and Srinivasan [1985], and Ledoit and Wolf [2004, 2014]. Although the Stein estimator is considered to be the gold standard [Rajaratnam et al. [2014]], it has a number of limitations: (i) it is not necessarily positive definite, (ii) it assumes normality, and (iii) it is suitable only for low dimensional data analysis, when the sample size exceeds the dimension.

Among the earlier work on eigenvalue shrinkage estimation in the high dimensional setting, Ledoit and Wolf [2004] proposed an estimator that shrinks the sample covariance matrix towards the identity. Their estimator is given by

Sigma_hat = rho_1 S + rho_2 I,  where rho_1, rho_2 are estimated from the data.   (5.2.2)
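The linear shrinkage estimator (5.2.2) is simple to compute once rho_1 and rho_2 are chosen. The R sketch below illustrates only the form of the estimator; the convex-combination parameterization rho_1 = 1 - delta, rho_2 = delta * (average sample eigenvalue), with a user-supplied delta, is an illustrative assumption and not the data-driven choice of Ledoit and Wolf [2004].

```r
# Illustrative linear shrinkage of the sample covariance matrix towards the
# identity, in the spirit of (5.2.2).  The shrinkage intensity delta in [0, 1]
# is user supplied here; Ledoit and Wolf [2004] estimate rho_1, rho_2 from data.
linear_shrinkage <- function(X, delta = 0.2) {
  S  <- cov(X)                         # sample covariance matrix
  mu <- mean(diag(S))                  # average sample eigenvalue = tr(S)/p
  (1 - delta) * S + delta * mu * diag(nrow(S))
}

# Example: n = 50 observations in p = 100 dimensions; the shrunken estimate is
# positive definite even though the sample covariance matrix is singular.
X <- matrix(rnorm(50 * 100), nrow = 50)
Sig_hat <- linear_shrinkage(X, delta = 0.3)
min(eigen(Sig_hat, symmetric = TRUE, only.values = TRUE)$values) > 0
```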
For rho_2 large enough, the estimator given by (5.2.2) is well-conditioned, but it need not be sparse, as the sample covariance matrix is generally not sparse. In another interesting work, Won et al. [2012] consider maximum likelihood estimation with a condition number constraint. They solve the following optimization problem:

maximize L(S, Sigma)  subject to  sigma_max(Sigma)/sigma_min(Sigma) <= kappa_max,   (5.2.3)

where L(S, .) is the likelihood function of the multivariate Gaussian distribution given in (3.1.1) and kappa_max is some positive constant. The estimate from (5.2.3) is invertible if kappa_max is finite, and well conditioned if kappa_max is moderate. They consider a value of kappa_max < 10^3 to be moderate, but its value also depends upon the eigenvalue dispersion of the true population covariance matrix.

5.3 Variance of Eigenvalues Penalty

The estimators proposed by Stein [1975], Dey and Srinivasan [1985], Ledoit and Wolf [2004], and Won et al. [2012] are well-conditioned and have been used in a number of applications. However these estimators are not sparse, and in addition they have the following limitations:

- The rotation invariant estimators do not change the eigenvectors and hence remain inconsistent [Johnstone and Lu [2004]].
- These estimators tend to overestimate the number of non-zeros of the underlying true covariance matrix and its inverse.
- The estimator of Ledoit and Wolf [2004] results in linear shrinkage of the eigenvalues towards those of the identity matrix, which may not be an optimal criterion, as eigenvalues far from the center tend to be heavily biased compared to the eigenvalues near the center.

The variance of eigenvalues penalty has the advantage that it allows more shrinkage of the extreme eigenvalues than of the ones near the center, and therefore reduces the bias non-linearly; moreover, the quadratic term leads to a very fast and exact algorithm. See Chapter 6 for a more detailed analysis of the proposed method.

CHAPTER 6
LEARNING SIMULTANEOUS STRUCTURE WITH JOINT PENALTY

6.1 Motivation

From the discussion in Chapter 4, a sparse structure can be learned by l1 regularization. From Chapter 5, a well-conditioned structure can be achieved by suitably shrinking the sample eigenvalues towards their center. However, neither of these regularizations alone provides a simultaneous treatment of sparse and well-conditioned estimation. For example, consider the estimation of the covariance matrix by minimizing the l1-regularized Frobenius norm loss function:

Sigma_hat = argmin_{Sigma = Sigma', tr(Sigma) = tr(S)}  ||Sigma - S||_F^2 + lambda ||Sigma||_1,   (6.1.1)

where lambda is some positive constant. Note that by the penalty function lambda ||Sigma||_1 we penalize only the off-diagonal elements of Sigma. The constraint tr(Sigma) = tr(S) ensures that the total variation in the estimated covariance matrix is the same as that in the sample covariance matrix. The solution to (6.1.1) is the standard soft-thresholding estimator (see Chapter 7 for the derivation):

Sigma_hat_ii = s_ii,    Sigma_hat_ij = sign(s_ij) * max{ |s_ij| - lambda/2, 0 },  i != j.   (6.1.2)

It is clear from this expression that a sufficiently large value of lambda will result in a sparse covariance matrix estimate. However, the estimator Sigma_hat of (6.1.1) is not necessarily positive definite [for more details see Maurya [2016], Xue et al. [2012]]. Moreover, it is hard to say whether it overcomes the over-dispersion in the sample eigenvalues.
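For concreteness, the soft-thresholding estimator (6.1.2) can be computed entrywise in a few lines. A minimal R sketch (following (6.1.1), only the off-diagonal entries are penalized):

```r
# Soft-thresholding estimator (6.1.2): off-diagonal entries of S are shrunk
# towards zero by lambda/2; diagonal entries are left untouched.
soft_threshold_cov <- function(S, lambda) {
  Sig <- sign(S) * pmax(abs(S) - lambda / 2, 0)
  diag(Sig) <- diag(S)
  Sig
}
```

A sufficiently large lambda sets every off-diagonal entry to zero, but, as noted above, the result need not be positive definite.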
Figure 6.1 (comparison of eigenvalues of covariance matrix estimates) illustrates this phenomenon for a neighborhood type covariance matrix. Here we simulated random vectors from a multivariate normal distribution with sample size n = 50 and dimension p = 50. As is evident from Figure 6.1, the eigenvalues of the sample covariance matrix are over-dispersed: most of them are either too large or close to zero. The eigenvalues of the proposed Joint Penalty (JPEN) estimator are well aligned with those of the true covariance matrix. See Chapter 8 for a detailed discussion.

The soft-thresholding estimator (6.1.2) is sparse but fails to recover the eigen-structure of the true covariance matrix. To overcome the over-dispersion and achieve a well-conditioned estimator, it is natural to regularize the eigenvalues of the sample covariance matrix. Consider the eigenvalue-regularized estimator of the covariance matrix based on the squared loss penalty, defined as the solution to the following optimization problem:

Sigma_hat = argmin_{Sigma = Sigma', tr(Sigma) = tr(S)}  ||Sigma - S||_F^2 + gamma * sum_{i=1}^p { sigma_i(Sigma) - sigma_bar_Sigma }^2,   (6.1.3)

where gamma is some positive constant. The minimizer Sigma_hat of (6.1.3) is given by

Sigma_hat = (S + gamma * t * I) / (1 + gamma),   (6.1.4)

where I is the identity matrix and t = sum_{i=1}^p S_ii / p. To see the advantage of the eigenvalue shrinkage penalty, note that after some algebra, for any gamma > 0,

sigma_min(Sigma_hat) = sigma_min(S + gamma * t * I)/(1 + gamma) >= gamma * t / (1 + gamma) > 0.

This means that the variance of eigenvalues penalty improves S to a positive definite estimator Sigma_hat.
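The closed form (6.1.4) is easy to verify numerically: the estimate is a ridge-type shrinkage of S, and every sample eigenvalue is pulled towards the average diagonal element t. A short R sketch:

```r
# Eigenvalue-variance-penalty estimator (6.1.4):
#   Sigma_hat = (S + gamma * t * I) / (1 + gamma),  t = mean of the diagonal of S.
# Each sample eigenvalue sigma_i is mapped to (sigma_i + gamma * t)/(1 + gamma),
# i.e. shrunk towards t, so the estimate is positive definite whenever gamma > 0,
# even if S itself is singular.
eigen_shrink_cov <- function(S, gamma) {
  t_bar <- mean(diag(S))
  (S + gamma * t_bar * diag(nrow(S))) / (1 + gamma)
}
```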
However, the estimator (6.1.3) is well-conditioned but need not be sparse. Sparsity can be achieved by imposing an l1 penalty on the entries of the covariance matrix. In the next section we describe the joint penalty estimator and discuss its advantages as an improved covariance matrix estimator.

6.2 JPEN Framework

We consider the joint penalty estimator of the covariance matrix as the solution to the following optimization problem:

Sigma_hat = argmin_{Sigma = Sigma', tr(Sigma) = tr(S)}  ||Sigma - S||_F^2 + lambda ||Sigma||_1 + gamma * sum_{i=1}^p { sigma_i(Sigma) - sigma_bar_Sigma }^2,   (6.2.1)

where lambda and gamma are some positive constants. From here onwards we suppress the dependence of the estimator on (lambda, gamma) and write Sigma_hat for Sigma_hat_{lambda,gamma}.

Simulations have shown that, in general, the minimizer of (6.2.1) is not positive definite for all values of lambda > 0 and gamma > 0. Therefore we consider the optimization of (6.2.1) over a restricted set of (lambda, gamma) to ensure that the resulting estimator is sparse and well-conditioned simultaneously. In what follows, we first consider correlation matrix estimation, and later generalize the method to covariance matrix estimation.

The proposed JPEN covariance matrix estimator is obtained by optimizing the following objective function in R over a specific region of values of (lambda, gamma), which depends on the sample correlation matrix K and gamma. Here the condition tr(Sigma) = tr(S) reduces to tr(R) = p, and therefore t = 1:

R_hat_K = argmin_{R = R', tr(R) = p, (lambda, gamma) in S_hat_K1}  ||R - K||_F^2 + lambda ||R||_1 + gamma * sum_{i=1}^p { sigma_i(R) - sigma_bar_R }^2,   (6.2.2)

where

S_hat_K1 = { (lambda, gamma): lambda, gamma > 0, lambda ~ sqrt(log p / n), and for any epsilon > 0, sigma_min{ (K + gamma I) - (lambda/2) sign(K + gamma I) } > epsilon },

and sigma_bar_R is the mean of the eigenvalues of R. In particular, if K is a diagonal matrix, the constraint on (lambda, gamma) reduces to the explicit bound lambda < 2(1 + gamma). The minimization in (6.2.2) over R is for fixed (lambda, gamma) in S_hat_K1. Furthermore, Lemmas 6.3.1 and 6.3.2 below show, respectively, that the objective function in (6.2.2) is convex and that the estimator given in (6.2.3) is positive definite. The proposed estimator of the covariance matrix (based on the regularized correlation matrix estimator R_hat_K) is given by

Sigma_hat_K = (S+)^{1/2} R_hat_K (S+)^{1/2},   (6.2.3)

where S+ is the diagonal matrix of the diagonal elements of S.
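Before turning to the closed-form solution in Chapter 7, it may help to see the objective in (6.2.2) written out directly. The R sketch below evaluates it for a candidate correlation matrix R; as in the text, the l1 term penalizes only the off-diagonal entries and sigma_bar_R denotes the mean of the eigenvalues of R.

```r
# Value of the JPEN objective (6.2.2) at a candidate correlation matrix R:
#   ||R - K||_F^2 + lambda * sum_{i != j} |R_ij|
#                 + gamma * sum_i (sigma_i(R) - mean(sigma(R)))^2
jpen_objective <- function(R, K, lambda, gamma) {
  ev  <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  off <- abs(R); diag(off) <- 0            # penalize off-diagonal entries only
  sum((R - K)^2) + lambda * sum(off) + gamma * sum((ev - mean(ev))^2)
}
```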
6.3 Theoretical Analysis of JPEN Estimators

Although we make no assumption on the data generating process for estimation itself, to derive rates of convergence we assume that the underlying data generating process is sub-Gaussian. In this section we give rates of convergence of the proposed JPEN estimators in the high dimensional setting where the sample size and the dimension both diverge to infinity.

Definition. A random vector X is said to have a sub-Gaussian distribution if for each t >= 0 and y in R^p with ||y||_2 = 1, there exists 0 < tau < infinity such that

P{ |y'(X - E(X))| > t } <= exp( -t^2 / (2 tau) ).   (6.3.1)

The JPEN estimators exist for any (n, p) satisfying 2 <= n < [...]. We make the following assumptions about the true covariance matrix Sigma_0:

A1. With H = {(i, j): Sigma_0_ij != 0, i != j}, the cardinality |H| <= s for some positive integer s.
A2. There exists a finite positive real number k_bar > 0 such that 1/k_bar <= sigma_min(Sigma_0) <= sigma_max(Sigma_0) <= k_bar.

Assumption A2 guarantees that the true covariance matrix Sigma_0 is well-conditioned (i.e., all its eigenvalues are finite and positive). Assumption A1 is more of a definition; it says that the number of non-zero off-diagonal elements is bounded by some positive integer.

The following Lemmas 6.3.1 and 6.3.2 show, respectively, that the optimization problem in (6.2.2) is convex and yields a positive definite solution.

Lemma 6.3.1. The optimization problem in (6.2.2) is convex.

Lemma 6.3.2. The estimator given by (6.2.2) is positive definite for any 2 <= n < infinity and 1 <= p < infinity.

6.3.1 Results on Consistency

Theorem 6.3.1. Let (lambda, gamma) in S_hat_K1 and let Sigma_hat_K be as defined in (6.2.2)-(6.2.3). Under Assumptions A0, A1, A2,

||R_hat_K - R_0||_F = O_P( sqrt( s log p / n ) )  and  ||Sigma_hat_K - Sigma_0|| = O_P( sqrt( (s + 1) log p / n ) ),   (6.3.2)

where R_0 is the true correlation matrix.

Remark 6.3.1. The JPEN estimator Sigma_hat_K is minimax optimal under the operator norm. In Cai et al. [2015], the authors obtain the minimax rate of convergence in the operator norm of their covariance matrix estimator for the particular construction of the parameter space H_0(c_{n,p}) := { Sigma : max_{1<=i<=p} sum_{j=1}^p I{ sigma_ij != 0 } <= c_{n,p} }. They show that this rate in the operator norm is c_{n,p} sqrt(log p / n), which is the same as that of Sigma_hat_K for 1 <= c_{n,p} = sqrt(s).

Remark 6.3.2. Bickel and Levina [2008b] proved that, under the assumption sum_j |sigma_ij|^q <= c_0(p) for some 0 <= q < 1, the hard thresholding estimator of the sample covariance matrix with tuning parameter of order sqrt(log p / n) is consistent in operator norm at a rate no worse than O_P( c_0(p) (log p / n)^{(1-q)/2} ), where c_0(p) is the upper bound on the number of non-zero elements in each row. The truly sparse case corresponds to q = 0. The rate of convergence of Sigma_hat_K is the same as that of Bickel and Levina [2008b] except in the following cases. Case (i): the covariance matrix has all off-diagonal elements equal to zero except the last row, which has sqrt(p) non-zero elements. Then c_0(p) = sqrt(p) and sqrt(s) = sqrt(2 sqrt(p) - 1). The operator norm rate of convergence for the JPEN estimator is O_P( sqrt( sqrt(p) log p / n ) ), whereas the rate of Bickel and Levina's estimator is O_P( sqrt( p log p / n ) ). Case (ii): when the true covariance matrix is tri-diagonal, we have c_0(p) = 2 and s = 2p - 2; the JPEN estimator has operator norm rate sqrt( p log p / n ), whereas that of Bickel and Levina's estimator is sqrt( log p / n ). For the case sqrt(s) of the same order as c_0(p), the JPEN estimator has the same rate of convergence as Bickel and Levina's estimator.

Remark 6.3.3. The operator norm rate of convergence is much faster than the Frobenius norm rate. This is because the Frobenius norm convergence is in terms of all the eigenvalues of the covariance matrix, whereas the operator norm convergence is in terms of the largest eigenvalue.

Remark 6.3.4. The proposed estimator is applicable for estimating any non-negative definite covariance matrix.

Note that the estimator Sigma_hat_K is obtained by regularizing the sample correlation matrix in (6.2.2). In some applications it is desirable to regularize the sample covariance matrix directly. The JPEN estimator of the covariance matrix based on regularization of the sample covariance matrix is obtained by solving the following optimization problem:

Sigma_hat_S = argmin_{Sigma = Sigma', tr(Sigma) = tr(S), (lambda, gamma) in S_hat_S1}  ||Sigma - S||_F^2 + lambda ||Sigma||_1 + gamma * sum_{i=1}^p { sigma_i(Sigma) - sigma_bar_Sigma }^2,   (6.3.3)

where

S_hat_S1 = { (lambda, gamma): lambda, gamma > 0, lambda ~ sqrt(log p / n), and for any epsilon > 0, sigma_min{ (S + gamma t I) - (lambda/2) sign(S + gamma t I) } > epsilon }.

The minimization in (6.3.3) over Sigma is for fixed (lambda, gamma) in S_hat_S1. The estimator Sigma_hat_S is positive definite and well-conditioned. Theorem 6.3.2 gives the rate of convergence of the estimator Sigma_hat_S in Frobenius norm.

Theorem 6.3.2. Let (lambda, gamma) in S_hat_S1 and let Sigma_hat_S be as defined in (6.3.3). Under Assumptions A0, A1, A2,

||Sigma_hat_S - Sigma_0||_F = O_P( sqrt( (s + p) log p / n ) ).   (6.3.4)

As noted in Rothman [2012], the worst part of the convergence here comes from estimating the diagonal entries.

6.4 Generalized JPEN Estimators and Optimal Estimation

The estimators in (6.2.1) and (6.3.3) apply the same eigenvalue shrinkage weight to all the eigenvalues. However, one might want to penalize the eigenvalues with different weights, especially if some prior knowledge is available about the structure of the true eigenvalues. To encourage different levels of shrinkage towards the center, we provide more generic estimators and call them weighted JPEN estimators.

6.4.1 Weighted JPEN Estimator for the Covariance Matrix Estimation

A modification of the estimator R_hat_K is obtained by adding positive weights a_i to the terms (sigma_i(R) - sigma_bar_R)^2. This leads to a weighted eigenvalue variance penalty, with larger weights amounting to greater shrinkage towards the center and vice versa. Note that the choice of the weights allows one to use any prior structure of the eigenvalues (if known) in estimating the covariance matrix. The weighted JPEN correlation matrix estimator R_hat_A is given by

R_hat_A = argmin_{R = R', tr(R) = p, (lambda, gamma) in S_hat_{K,A1}}  ||R - K||_F^2 + lambda ||R||_1 + gamma * sum_{i=1}^p a_i { sigma_i(R) - sigma_bar_R }^2,   (6.4.1)

where S_hat_{K,A1} is a set of pairs (lambda, gamma) with lambda ~ sqrt(log p / n) and lambda bounded above by an explicit expression involving sigma_min(K), gamma, and the largest and smallest weights max_i(A_ii) and min_i(A_ii), and A = diag(A_11, A_22, ..., A_pp) with A_ii = a_i. The proposed covariance matrix estimator is Sigma_hat_{K,A} = (S+)^{1/2} R_hat_A (S+)^{1/2}. The optimization problem in (6.4.1) is convex and yields a positive definite estimator for each (lambda, gamma) in S_hat_{K,A1}. A simple exercise shows that the estimator Sigma_hat_{K,A} has the same rate of convergence as Sigma_hat_S. How to choose the weights a_i in (6.4.1) is discussed in the next chapter.

CHAPTER 7
AN ALGORITHM AND ITS COMPUTATIONAL COMPLEXITY

A problem is regarded as inherently difficult if its solution requires significant resources, whatever the algorithm used. Despite recent ambitious developments in solving convex optimization problems, efficient computation and scalability still remain two challenging problems in high dimensional data analysis. Existing methods that solve a convex optimization (here we mean minimization) problem can often be implemented very efficiently, in far less time than non-convex optimization. An extensive literature exists for convex optimization problems [Bertsekas [2010], Vandenberghe and Boyd [2004], Bach et al. [2011], Beck and Teboulle [2009]]. The main challenge in covariance matrix estimation in the Gaussian likelihood framework is that the negative log-likelihood is not a convex function of the covariance matrix, which makes it a very hard optimization problem. In such situations, one way to facilitate the optimization is to approximate the negative log-likelihood by a convex function and then solve this approximate problem efficiently using existing algorithms. However, the solution thus obtained may not be an optimal solution to the original problem.

The existing algorithms for computing the optimal covariance matrix based on the Frobenius loss function have computational complexity O(p^3), where the constant in O(p^3) can be really large, often more than the dimension of the matrix. The main reason behind such high computational complexity is that these methods require optimization over the cone of positive definite matrices for the estimator to be positive definite (for more on this topic, see Xue et al. [2012]). The JPEN framework provides an easy solution to the positive definiteness constraint that depends upon the choice of (lambda, gamma). The computational complexity of the JPEN estimator is O(p^2), and it is thus much faster than the other existing algorithms. The next section describes the derivation of the JPEN estimator introduced in Chapter 6.

7.1 A Very Fast Exact Algorithm

7.1.1 Derivation

The optimization problem (6.2.2) can be written as

R_hat_K = argmin_{R = R', (lambda, gamma) in S_hat_K1}  f(R),   (7.1.1)

where f(R) = ||R - K||_F^2 + lambda ||R||_1 + gamma * sum_{i=1}^p { sigma_i(R) - sigma_bar(R) }^2. Note that sum_{i=1}^p { sigma_i(R) - sigma_bar(R) }^2 = tr(R^2) - 2 tr(R) + p, where we have used the constraint tr(R) = p. Therefore,

f(R) = ||R - K||_F^2 + lambda ||R||_1 + gamma { tr(R^2) - 2 tr(R) + p }
     = (1 + gamma) tr(R^2) - 2 tr{ R (K + gamma I) } + tr(K'K) + lambda ||R||_1 + gamma p
     = (1 + gamma) || R - (K + gamma I)/(1 + gamma) ||_F^2 + lambda ||R||_1 + terms not depending on R.

The solution of this optimization problem is a soft-thresholding estimator, given by

R_hat_K = (1/(1 + gamma)) * sign(K) * pmax{ abs(K + gamma I) - lambda/2, 0 },   (7.1.2)

with (R_hat_K)_ii = (K_ii + gamma)/(1 + gamma), where pmax(A, b)_ij := max(A_ij, b) is the elementwise maximum function applied to each entry of the matrix A. Note that for each (lambda, gamma) in S_hat_K1, R_hat_K is positive definite.

7.1.2 Choice of Regularization Parameters

For a given value of gamma, we can find the value of lambda satisfying

sigma_min{ (K + gamma I) - (lambda/2) sign(K + gamma I) } > 0,   (7.1.3)

which can be simplified to

lambda < sigma_min(K + gamma I) / ( C_12 * sigma_max(sign(K)) ),  C_12 >= 0.5.

Then (lambda, gamma) in S_hat_K1 and the estimator R_hat_K is positive definite. Smaller values of C_12 yield a solution which is more sparse, but it may not be positive definite. The optimal values of (lambda, gamma) were obtained following the approach suggested in Bickel and Levina [2008b], by minimizing the 5-fold cross-validation error

(1/5) * sum_{i=1}^5 || Sigma_hat_v^i - Sigma_v^i ||_1,

where Sigma_hat_v^i is the JPEN estimate of the covariance matrix based on v = 4n/5 observations and Sigma_v^i is the sample covariance matrix based on the remaining n/5 observations.
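The closed form (7.1.2) makes the whole procedure a single pass of elementwise operations. The R sketch below implements it for a given (lambda, gamma), together with the back-transformation (6.2.3) from the regularized correlation matrix to the covariance scale; checking that (lambda, gamma) lies in S_hat_K1, and the cross-validated search over (lambda, gamma), are left to the user.

```r
# JPEN correlation matrix estimator, closed form (7.1.2):
#   R_hat = sign(K) * pmax(abs(K + gamma * I) - lambda/2, 0) / (1 + gamma),
#   with diagonal entries (K_ii + gamma) / (1 + gamma).
# The covariance estimate (6.2.3) is S_plus^{1/2} %*% R_hat %*% S_plus^{1/2},
# where S_plus is the diagonal matrix of the diagonal of S.
jpen_cov <- function(S, lambda, gamma) {
  K  <- cov2cor(S)                          # sample correlation matrix
  p  <- nrow(K)
  Kg <- K + gamma * diag(p)
  R  <- sign(K) * pmax(abs(Kg) - lambda / 2, 0) / (1 + gamma)
  diag(R) <- (diag(K) + gamma) / (1 + gamma)
  D  <- diag(sqrt(diag(S)))                 # (S+)^{1/2}
  D %*% R %*% D
}
```

In practice one would call jpen_cov() on a grid of (lambda, gamma) values and pick the pair minimizing the 5-fold cross-validation error described above.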
7.1.3 Choice of Weights

For the optimization problem in (6.4.1), we chose the weights according to the following criterion. Let E = (e_1, e_2, ..., e_p) be the diagonal elements of the sample covariance matrix S arranged from smallest to largest, and let k be the largest integer such that the k-th element of E is less than 1. Let b_i = e_i for i <= k and b_i = 1/e_i for k < [...]

CHAPTER 11
PROXIMAL GRADIENT ALGORITHMS AND ITS CONVERGENCE ANALYSIS

11.3 Basic Approximation Model

[...] For L > 0, consider the following quadratic approximation model of f at Theta_0:

Q_L(Theta, Theta_0) := f(Theta_0) + <Theta - Theta_0, grad f(Theta_0)> + (L/2) ||Theta - Theta_0||_2^2,   (11.3.1)

where <A, B> is the inner product of A and B, and L is a positive constant. The optimization problem built on (11.3.1) has two convex penalties. The proximal gradient method consists of sequential optimization of (11.3.1), taking one penalty at a time in either cyclic or random order. Rewriting the optimization problem (11.3.1) with a single penalty,

Prox_{(1/L) g_i}(Theta_0) = argmin_{Theta > 0} { Q_L(Theta, Theta_0) + g_i(Theta) }
                          = argmin_{Theta > 0} { (L/2) || Theta - { Theta_0 - (1/L) grad f(Theta_0) } ||_2^2 + g_i(Theta) }.   (11.3.2)

In general, L is unknown and is estimated as an upper bound on the Lipschitz parameter of grad f(Theta) [Bach et al. [2011]]. In other words, L satisfies

|| grad f(Theta) - grad f(Theta_0) ||_2 <= L || Theta - Theta_0 ||_2  for all Theta, Theta_0 in Dom(f).   (11.3.3)

A common method of generating a value of L is a line search. For the optimization problem (11.3.2) we sequentially generate new estimates and increase the value of L by a factor eta > 1 until the following condition is met:

f(Theta_k) <= f(Theta_{k-1}) + <Theta_k - Theta_{k-1}, grad f(Theta_{k-1})> + (L/2) || Theta_k - Theta_{k-1} ||_2^2,   (11.3.4)

where Theta_k is the solution at the k-th iteration. In Lemmas 11.3.1 and 11.3.2 below, we give the proximal operators for the l1 and trace norms.

Lemma 11.3.1. Let M in R^{m x n}. The proximal operator of ||.||_1 with constant rho is given by

Prox_rho(M) = argmin_C { rho ||C||_1 + (1/2) ||C - M||_2^2 },   (11.3.5)

where Prox_rho(M) = sign(M) * max{ abs(M) - rho, 0 }, rho > 0, and abs(M) is the entrywise absolute value of the matrix M.

Proof. The proof of the lemma is given in the appendix.

Lemma 11.3.2. Let M in R^{m x n} and let M = U Sigma V' be the singular value decomposition of M, where U in R^{m x r} and V in R^{n x r} have orthogonal columns, Sigma is the diagonal matrix of singular values of M, and r is the rank of M. Then the proximal operator of the trace norm ||.||_* with constant tau is given by

Prox_tau(M) = argmin_C { tau ||C||_* + (1/2) ||C - M||_2^2 },   (11.3.6)

where Prox_tau(M) = U Sigma_tau V' and Sigma_tau is diagonal with (Sigma_tau)_ii = max(0, Sigma_ii - tau); singular values not exceeding tau are thus set to zero.

Proof. The proof of the lemma is given in the appendix.

For the l1 and trace norms, the proximal operators are inexpensive to calculate. This results in efficient optimization of the objective function. The proximal operator for the l1 norm is the elementwise soft-thresholding operator. The proximal operator for the trace norm is obtained by shrinking the singular values of the precision matrix by the regularization parameter; a larger value of the regularization parameter results in more shrinkage of the eigenvalues.
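Both proximal operators have the simple closed forms given in Lemmas 11.3.1 and 11.3.2, so each application costs only an elementwise soft-thresholding or one SVD. A small R sketch of the two operators (rho and tau play the roles of the constants in (11.3.5) and (11.3.6)):

```r
# Proximal operator of the l1 norm (Lemma 11.3.1): entrywise soft thresholding.
prox_l1 <- function(M, rho) {
  sign(M) * pmax(abs(M) - rho, 0)
}

# Proximal operator of the trace norm (Lemma 11.3.2): soft-threshold the
# singular values of M and reassemble the matrix from its SVD.
prox_trace <- function(M, tau) {
  sv <- svd(M)
  d  <- pmax(sv$d - tau, 0)
  sv$u %*% diag(d) %*% t(sv$v)
}
```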
11.4 Algorithm for Optimization

Below we summarize the optimization algorithm for (11.3.2).

Initialize L_0 = 1, eta > 1, Theta_0 = diag(1/diag(S)).
Iterate:
  Step 1: Set L_bar = L_{k-1}.
  Step 2: While F(Theta_tilde) > Q_{L_bar}(Theta_tilde, Theta_{k-1}) + g_1(Theta_tilde) + g_2(Theta_tilde), where Theta_tilde = argmin_{Theta > 0} { Q_{L_bar}(Theta, Theta_{k-1}) + g_1(Theta) + g_2(Theta) }: set L_bar = eta * L_bar.
  Step 3: Set L_k = L_bar; set Z_k = Theta_{k-1} - (1/L_k) grad f(Theta_{k-1}); set Z_{k+1} = Prox_{(tau/L_k) g_2}(Z_k); set Theta_{k+1} = Prox_{(lambda/L_k) g_1}(Z_{k+1}).
Repeat until convergence.
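Putting the pieces together, one sweep of the algorithm above is a gradient step on the smooth part followed by the two proximal maps, with a backtracking search on L. The R sketch below is only illustrative and reuses prox_l1() and prox_trace() defined earlier: it assumes, for illustration, that the smooth part is the Gaussian negative log-likelihood term f(Theta) = tr(S Theta) - log det(Theta) (so grad f(Theta) = S - Theta^{-1}), it adds a positive-definiteness check inside the line search as a safeguard, and it fixes the stopping rule; none of these details are prescribed by the text above.

```r
# A possible implementation of the proximal gradient iteration of Section 11.4
# for the joint l1 + trace-norm penalty, reusing prox_l1() and prox_trace().
# Assumption (illustration only): f(Theta) = tr(S Theta) - log det(Theta).
jcp_prox_grad <- function(S, lambda, tau, eta = 2, maxit = 100, tol = 1e-6) {
  f <- function(Th) sum(S * Th) -
    as.numeric(determinant(Th, logarithm = TRUE)$modulus)
  Theta <- diag(1 / diag(S))                # initial value, as in Section 11.4
  L <- 1
  for (k in seq_len(maxit)) {
    grad <- S - solve(Theta)
    repeat {                                # backtracking search for L (11.3.4)
      Z   <- Theta - grad / L
      Z   <- prox_trace(Z, tau / L)         # trace-norm proximal step
      New <- prox_l1(Z, lambda / L)         # l1 proximal step
      ok  <- min(eigen(New, symmetric = TRUE, only.values = TRUE)$values) > 0
      if (ok && f(New) <= f(Theta) + sum(grad * (New - Theta)) +
          (L / 2) * sum((New - Theta)^2)) break
      L <- eta * L
    }
    if (sum(abs(New - Theta)) < tol) { Theta <- New; break }
    Theta <- New
  }
  Theta
}
```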
11.5 Choosing the Regularization Parameter

The choice of the regularization parameter is a challenging problem in high dimensional data analysis. Regularization has a clear benefit in producing a sparse solution as well as in reducing the false discovery rate. A larger value of lambda yields a sparser structure of the precision matrix estimate. Some of the methods for choosing the regularization parameter include K-fold cross validation (KCV), the stability approach to regularization selection (StARS) (Liu et al. [2010]), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). Experiments [Liu et al. [2010]] have shown that the AIC and BIC methods tend to give poor performance for smaller sample sizes. Also, K-fold cross validation tends to select smaller values of the regularization parameter and results in a higher false discovery rate. We follow the StARS approach for estimating the regularization parameters lambda and tau. Next we present the convergence analysis of the algorithm of Section 11.4.

11.6 Convergence Analysis

We use Proposition 3.1 from Bertsekas [2010].

Lemma 11.6.1. Let {Theta_n, L_n; n = 1, 2, ...} be the sequence generated by the algorithm of Section 11.4. Let c > 0 be a constant satisfying

max{ ||grad f(Theta_k)||, ||grad g_j(Theta_k)|| } <= c  and  max{ f(Theta_n) - f(Z_{n+j-1}), g_j(Theta_n) - g_j(Z_{n+j-1}) } <= c || Theta_n - Z_{n+j-1} ||,  j = 1, 2.

Then, for a cyclic-order optimization of the components g_1(.) and g_2(.), the following holds:

|| Theta_{n+1} - Theta* ||^2 <= || Theta_n - Theta* ||^2 - (2/L_n)( F(Theta_n) - F(Theta*) ) + 18 c^2 / L_n^2,   (11.6.1)

where Theta_{n+1} = Prox_{(tau/L_n) g_2}(Theta_n) and Theta* is a solution of (2.1) and (2.2).

Lemma 11.6.2. Let {Theta_n, L_n; n = 1, 2, ...} be the sequence generated by the algorithm of Section 11.4, and let c be the constant defined in Lemma 11.6.1. Then

F(Theta_n) - F(Theta*) <= (L_n/4)( || Theta_{n-1} - Theta* ||^2 - || Theta_n - Theta* ||^2 ) + 9 c^2 / (2 L_n).   (11.6.2)

Proof. The proof of Lemma 11.6.2 is given in the appendix.

We give below the convergence of the algorithm of Section 11.4.

Theorem 11.6.1. Let {Theta_k; k = 1, 2, ...} be the sequence generated by the algorithm of Section 11.4, and let c be the constant defined in Lemma 11.6.1. In addition, assume that there exists a constant M < infinity such that sum_{n=1}^infinity ||Theta[...], where L > 0 is the least upper Lipschitz constant of the gradient of f and eta > 1 is the constant defined in the algorithm of Section 11.4.

Proof. Note that L_n <= eta L for all n = 1, 2, .... Using Lemma 11.6.2 and summing F(Theta_n) - F(Theta*) over n = 1, 2, ..., we get the desired result.

Due to the non-smoothness of the trace norm, the optimal first-order black-box methods have a convergence rate of O(k^{-1/2}). The proximal gradient algorithm uses the special structure of the trace and l1 norms, which improves the convergence rate for the joint penalty to the order O(k^{-1}).

11.7 Simulation Study

To evaluate the proposed method, we perform a simulation study for various choices of the precision matrix. We consider different types of precision matrices, as outlined in Chapter 8; here it is the underlying true precision matrix, rather than the covariance matrix, that is sparse. For all these choices of inverse covariance matrices, we generate random samples from a multivariate normal distribution with varying n and p. We set n = 50, 100, 200 and p = 50, 100, 200. The performance of the proposed method is compared to the graphical lasso and SPICE estimates of the precision matrix. The joint convex penalty estimate of the precision matrix was computed using R software version 3.0.1, based on the algorithm of Section 11.4. The graphical lasso estimate of the precision matrix was computed using the R package glasso (http://statweb.stanford.edu/tibs/glasso/). The package has an option of not penalizing the diagonal elements, by setting penalize.diagonal = FALSE; this gives the SPICE estimate.

11.7.1 Performance Criteria

For each precision matrix estimate, we calculate the Kullback-Leibler (KL) loss and the average relative error (ARE), defined below:

KL Loss(Sigma, Sigma_hat) = -log det(Sigma_hat) + tr(Sigma^{-1} Sigma_hat) + log det(Sigma) - p,
ARE(Sigma, Sigma_hat) = | log f(S; Sigma_hat) - log f(S; Sigma) | / | log f(S; Sigma) |,

where Sigma_hat is the inverse of the estimated precision matrix, f(.;.) is the density of the multivariate Gaussian distribution, and S is the sample covariance matrix.
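Both performance criteria are direct to compute. An R sketch follows, with Sigma the true covariance matrix and Sigma_hat = solve(Omega_hat) the inverse of an estimated precision matrix; writing the Gaussian log-likelihood through the sample covariance matrix S and dropping its additive constants is an assumption of this sketch, not a detail given above.

```r
# KL loss of Section 11.7.1:
#   tr(Sigma^{-1} Sigma_hat) - log det(Sigma^{-1} Sigma_hat) - p.
kl_loss <- function(Sigma, Sigma_hat) {
  p <- nrow(Sigma)
  M <- solve(Sigma, Sigma_hat)          # Sigma^{-1} %*% Sigma_hat
  sum(diag(M)) - as.numeric(determinant(M, logarithm = TRUE)$modulus) - p
}

# Gaussian log-likelihood (up to additive constants), written through S.
loglik <- function(S, Sigma, n) {
  -n / 2 * (as.numeric(determinant(Sigma, logarithm = TRUE)$modulus) +
              sum(S * solve(Sigma)))
}

# Average relative error based on the log-likelihood.
are <- function(S, Sigma, Sigma_hat, n) {
  abs(loglik(S, Sigma_hat, n) - loglik(S, Sigma, n)) / abs(loglik(S, Sigma, n))
}
```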
The tuning parameters lambda and tau were estimated following the criteria of Liu et al. [2010], which we describe below.

11.7.2 StARS Method of Tuning Parameter Selection

Given a sample of size n, the method generates N subsamples of size b, where b < [...]

CHAPTER 12
SIMULTANEOUS ESTIMATION OF SPARSE AND WELL-CONDITIONED PRECISION MATRIX

12.2 Joint Penalty Estimation: A Two Step Approach

[...] where

S_hat_K2 = { (lambda, gamma): lambda, gamma > 0, lambda ~ sqrt(s log p / n), and for any epsilon > 0, sigma_min{ (R_hat_K^{-1} + gamma t_1 I) - (lambda/2) sign(R_hat_K^{-1} + gamma t_1 I) } > epsilon },

and t_1 is the average of the diagonal elements of R_hat_K^{-1}. The minimization in (12.2.1) over Z is for fixed (lambda, gamma) in S_hat_K2. The proposed JPEN estimator of the precision matrix (based on the regularized inverse correlation matrix estimator Z_hat_K) is given by

Omega_hat_K = (S+)^{-1/2} Z_hat_K (S+)^{-1/2},

where S+ is the diagonal matrix of the diagonal elements of S. Moreover, (12.2.1) is a convex optimization problem and Z_hat_K is positive definite.

Next we give another estimate of the precision matrix, based on the estimator Sigma_hat_S of (6.3.3). Consider the following optimization problem:

Omega_hat_S = argmin_{Omega = Omega', tr(Omega) = tr(Sigma_hat_S^{-1}), (lambda, gamma) in S_hat_S2}  ||Omega - Sigma_hat_S^{-1}||_F^2 + lambda ||Omega||_1 + gamma * sum_{i=1}^p { sigma_i(Omega) - sigma_bar_Omega }^2,   (12.2.2)

where

S_hat_S2 = { (lambda, gamma): lambda, gamma > 0, lambda ~ sqrt(s log p / n), and for any epsilon > 0, sigma_min{ (Sigma_hat_S^{-1} + gamma t_2 I) - (lambda/2) sign(Sigma_hat_S^{-1} + gamma t_2 I) } > epsilon },

and t_2 is the average of the diagonal elements of Sigma_hat_S^{-1}. The minimization in (12.2.2) over Omega is for fixed (lambda, gamma) in S_hat_S2. The estimator in (12.2.2) is positive definite and well-conditioned.

12.3 Weighted JPEN Estimator for the Precision Matrix

Similar to the weighted JPEN covariance matrix estimator Sigma_hat_{K,A}, a weighted JPEN estimator of the precision matrix is obtained by adding positive weights a_i to the terms (sigma_i(Z) - 1)^2 in (12.2.1). The weighted JPEN precision matrix estimator is Omega_hat_{K,A} := (S+)^{-1/2} Z_hat_A (S+)^{-1/2}, where

Z_hat_A = argmin_{Z = Z', tr(Z) = tr(R_hat_K^{-1}), (lambda, gamma) in S_hat_{K,A2}}  ||Z - R_hat_K^{-1}||_F^2 + lambda ||Z||_1 + gamma * sum_{i=1}^p a_i { sigma_i(Z) - 1 }^2,   (12.3.1)

with S_hat_{K,A2} a set of pairs (lambda, gamma) with lambda ~ sqrt(log p / n) and lambda bounded above by an explicit expression involving sigma_min(R_hat_K^{-1}), t_1, gamma, and the largest and smallest weights, and A = diag(A_11, A_22, ..., A_pp) with A_ii = a_i. The optimization problem in (12.3.1) is convex and yields a positive definite estimator for (lambda, gamma) in S_hat_{K,A2}.
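The two-step construction above can be written compactly: first form the JPEN correlation estimate R_hat_K of Chapter 7, then apply an analogous soft-thresholding/eigenvalue-shrinkage step to its inverse, and finally rescale by the sample standard deviations. The R sketch below is one plausible reading of this section: the closed form used for the second step mirrors (7.1.2), with R_hat_K^{-1} in place of K and the eigenvalue shrinkage centred at t_1, and is an assumption of this sketch (as is the absence of any check that the tuning pairs lie in the admissible sets).

```r
# Two-step JPEN precision matrix estimator (Section 12.2), sketched under the
# assumption that step two has the same closed form as (7.1.2), applied to the
# inverse of the step-one JPEN correlation estimate.
jpen_precision <- function(S, lambda1, gamma1, lambda2, gamma2) {
  ## Step 1: JPEN estimate of the correlation matrix, closed form (7.1.2).
  K <- cov2cor(S); p <- nrow(K)
  R <- sign(K) * pmax(abs(K + gamma1 * diag(p)) - lambda1 / 2, 0) / (1 + gamma1)
  diag(R) <- (diag(K) + gamma1) / (1 + gamma1)
  ## Step 2: regularize the inverse of R_hat the same way, centring the
  ## eigenvalue shrinkage at t1 = mean of the diagonal of R_hat^{-1}.
  Rinv <- solve(R)
  t1   <- mean(diag(Rinv))
  Z <- sign(Rinv) * pmax(abs(Rinv + gamma2 * t1 * diag(p)) - lambda2 / 2, 0) /
       (1 + gamma2)
  diag(Z) <- (diag(Rinv) + gamma2 * t1) / (1 + gamma2)
  ## Rescale to the precision scale: Omega_hat = (S+)^{-1/2} Z (S+)^{-1/2}.
  Dinv <- diag(1 / sqrt(diag(S)))
  Dinv %*% Z %*% Dinv
}
```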
12.4 Theoretical Analysis of JPEN Estimators

In this section, we derive the rates of convergence of the proposed JPEN estimators of the precision matrix in both the Frobenius and spectral norms. Let Omega_0 = Sigma_0^{-1} be the true precision matrix, and let X = (X_1, X_2, ..., X_n) be sub-Gaussian random vectors as defined in (6.3.1). We make the following additional assumptions about Omega_0.

B0. Same as Assumption A0 of Section 6.3.
B1. With H = {(i, j): Omega_0_ij != 0, i != j}, the cardinality |H| <= s for some positive integer s.
B2. There exist 0 < [...]

APPENDIX A
JPEN COVARIANCE MATRIX ESTIMATION

[...]

sign(R)_ij = +1 if R_ij > 0;  -1 if R_ij < 0;  in (-1, 1) if R_ij = 0.

Note that ||R||_1 has the same value irrespective of the sign of R; therefore the right-hand side of (A.0.2) is minimized if sign(R) = sign(K + gamma I) = sign(K). For every gamma > 0, using (A.0.3), sigma_min{ (K + gamma I) - (lambda/2) sign(K) } > epsilon gives a pair (lambda, gamma) in S_hat_K1, and such a choice of (lambda, gamma) guarantees that the estimator is positive definite.

Remark A.0.1. Intuitively, a larger gamma shrinks the eigenvalues towards their center, which is 1; a larger gamma therefore yields a positive definite estimator, whereas a larger lambda yields a sparser estimate. A suitable combination of (lambda, gamma) results in a sparse and well-conditioned estimator. In the particular case when K is a diagonal matrix, the condition reduces to lambda < 2(1 + gamma).

Proof [Theorem 6.3.1]. Let Q(R) = f(R) - f(R_0), where R_0 is the true correlation matrix and R is any other correlation matrix. Let R = U D U' be the eigenvalue decomposition of R, where D is the diagonal matrix of eigenvalues and U the matrix of eigenvectors, and let R_0 = U_0 D_0 U_0' be the eigenvalue decomposition of R_0. We have

Q(R) = ||R - K||_F^2 + lambda ||R||_1 + gamma tr(D^2 - 2D + pI) - ||R_0 - K||_F^2 - lambda ||R_0||_1 - gamma tr(D_0^2 - 2D_0 + pI).   (A.0.4)

Write Delta = R - R_0 and let Theta_n(M) := { Delta: Delta = Delta', ||Delta||_2 = M r_n, 0 < M < infinity }, where r_n = sqrt(s log p / n). The estimate R_hat minimizes Q(R), or equivalently Delta_hat = R_hat - R_0 minimizes G(Delta) := Q(R_0 + Delta); it therefore suffices to show that G(Delta) > 0 on the set Theta_n(M) for M sufficiently large.

We first bound the terms involving gamma in (A.0.4). By the Cauchy-Schwarz inequality,

tr(D^2 - 2D) - tr(D_0^2 - 2D_0) = tr(R^2 - R_0^2) - 2 tr(R - R_0) = 2 tr((R_0 - I) Delta) + tr(Delta' Delta) <= 2 sqrt(s) ||Delta||_F + ||Delta||_F^2.

To bound the term lambda( ||R||_1 - ||R_0||_1 ) = lambda( ||Delta + R_0||_1 - ||R_0||_1 ), let E be the index set of the non-zero off-diagonal entries of R_0 from Assumption A1. Then, using the triangle inequality,

||Delta + R_0||_1 - ||R_0||_1 = ||Delta_E + R_0||_1 + ||Delta_{E^c}||_1 - ||R_0||_1 >= ||Delta_{E^c}||_1 - ||Delta_E||_1.

Take lambda = (C_1/epsilon) sqrt(log p / n) and gamma = (C_1 - 1/epsilon) sqrt(log p / n) with (lambda, gamma) in S_hat_K1. Combining the bounds above, and using ||Delta_E||_1 <= sqrt(s) ||Delta||_F, we obtain

G(Delta) >= ||Delta||_F^2 { 1 + gamma - C_1 (1 + 1/epsilon) sqrt(log p / n) } - C_1 sqrt(s log p / n) ||Delta||_F { 1 + 1/epsilon + 2/epsilon }
         >= ||Delta||_F^2 [ 1 + gamma - C_1 (1 + 1/epsilon) sqrt(log p / n) - (C_1/M) { 1 + 1/epsilon + 2/epsilon } ] > 0,

for all sufficiently large n and M. This proves the first part of the theorem.

To prove the operator norm consistency, let W and W_hat denote the diagonal matrices of the true and sample standard deviations, so that Sigma_0 = W R_0 W and Sigma_hat_K = W_hat R_hat W_hat. By the sub-multiplicative property ||AB|| <= ||A|| ||B||,

||Sigma_hat_K - Sigma_0|| = ||W_hat R_hat W_hat - W R_0 W||
  <= ||W_hat - W|| ||R_hat - R_0|| ||W_hat - W|| + ||W_hat - W|| ( ||R_hat|| ||W|| + ||W_hat|| ||R_0|| ) + ||R_hat - R_0|| ||W_hat|| ||W||.

Since ||R_0|| = O(1) and ||R_hat - R_0||_F = O_P( sqrt(s log p / n) ), these together imply ||R_hat|| = O_P(1). Also,

||W_hat^2 - W^2|| = max_{||x||_2 = 1} sum_i |w_hat_i^2 - w_i^2| x_i^2 <= max_{1<=i<=p} |w_hat_i^2 - w_i^2| = O_P( sqrt(log p / n) ),

by Lemma 1 of Ravikumar et al. [2011]. Next we show that ||W_hat - W|| and ||W_hat^2 - W^2|| are of the same order (A of the same order as B meaning A = O_P(B) and B = O_P(A)). We have

||W_hat - W|| = max_{||x||_2 = 1} sum_i |w_hat_i - w_i| x_i^2 = max_{||x||_2 = 1} sum_i ( |w_hat_i^2 - w_i^2| / |w_hat_i + w_i| ) x_i^2 <= C_3 ||W_hat^2 - W^2||,

where we have used the fact that the true standard deviations are bounded away from zero, i.e., there exists epsilon_0 > 0 with w_i >= epsilon_0 for all i = 1, 2, ..., p. It follows that ||W_hat|| = O_P(1), and we obtain

||Sigma_hat_K - Sigma_0||^2 = O_P( s log p / n + log p / n ).

This completes the proof.

Proof [Theorem 6.3.2]. Let

f(Sigma) = ||Sigma - S||_F^2 + lambda ||Sigma||_1 + gamma * sum_{i=1}^p { sigma_i(Sigma) - sigma_bar_Sigma }^2.

Similar to the proof of Theorem 6.3.1, define Q_1(Sigma) = f(Sigma) - f(Sigma_0), where Sigma_0 is the true covariance matrix and Sigma is any other covariance matrix. Let Sigma = U D U' and Sigma_0 = U_0 D_0 U_0' be the eigenvalue decompositions of Sigma and Sigma_0. Then

Q_1(Sigma) = ||Sigma - S||_F^2 + lambda ||Sigma||_1 + gamma { tr(D^2) - (tr D)^2/p } - ||Sigma_0 - S||_F^2 - lambda ||Sigma_0||_1 - gamma { tr(D_0^2) - (tr D_0)^2/p }.   (A.0.5)

Write Delta = Sigma - Sigma_0 and let Theta_n(M) := { Delta: Delta = Delta', ||Delta||_2 = M r_n, 0 < M < infinity }, and let Delta+ be the matrix Delta with all off-diagonal elements set equal to zero. Next we bound the terms involving gamma. We have

tr(D^2) - (tr D)^2/p - tr(D_0^2) + (tr D_0)^2/p = tr(Sigma^2) - tr(Sigma_0^2) - { (tr Sigma)^2 - (tr Sigma_0)^2 }/p,

and

(i)  tr(Sigma^2) - tr(Sigma_0^2) = 2 tr(Sigma_0 Delta) + tr(Delta^2) <= tr(Delta^2) + C_1 sqrt(s) ||Delta||_F,
(ii) (tr Sigma)^2 - (tr Sigma_0)^2 = (tr(Sigma_0 + Delta))^2 - (tr Sigma_0)^2 = 2 tr(Sigma_0) tr(Delta) + (tr Delta)^2 <= p ||Delta||_F^2 + 2 p sqrt(p) k_bar ||Delta||_F.

Therefore the term involving gamma can be bounded by 2 gamma ||Delta||_F^2 + gamma ( C_1 sqrt(s) + 2 sqrt(p) k_bar ) ||Delta||_F. The term involving lambda is bounded as in the proof of Theorem 6.3.1. For lambda of order sqrt(log p / n), the rest of the proof follows very similarly to that of Theorem 6.3.1.

APPENDIX B
JCP FOR PRECISION MATRIX ESTIMATION

Proof [Lemma 11.3.1]. Let C* be a solution of (11.3.5), which exists because (11.3.5) is a convex optimization problem. Then the following sub-gradient optimality condition holds [Vandenberghe and Boyd [2004]]:

0 in C* - M + rho * d||C*||_1,   (B.0.1)

where (d||C*||_1)_ij = d|c*_ij| is given by

d|c*_ij| = +1 if c*_ij > 0;  -1 if c*_ij < 0;  in [-1, 1] if c*_ij = 0,   i = 1, ..., m; j = 1, ..., n.

Condition (B.0.1) is satisfied if and only if |m_ij| <= rho whenever c*_ij = 0, and therefore the optimal solution is given by c*_ij = sign(m_ij)( |m_ij| - rho )_+. This completes the proof.

Proof [Lemma 11.3.2]. Let C* be a solution to (11.3.6). Then the following sub-gradient optimality condition holds:

0 in C* - M + tau * d||C*||_*.   (B.0.2)

Let W = U Sigma_tau V'. We shall show that this choice of W satisfies the above optimality condition. The sub-differential d||W||_* of the trace norm is given by Bach [2008] as

d||W||_* = { U V' + H : H in R^{m x n}, ||H||_2 <= 1, U'H = 0 and HV = 0 }.

Therefore

W - M + tau * d||W||_*  contains  U Sigma_tau V' - U Sigma V' + tau (U V' + H).

Multiplying both sides by U U', and noting that U'U = I, we obtain

W - M + tau * d||W||_*  contains  U U' ( U Sigma_tau V' - U Sigma V' + tau (U V' + H) ) = U( Sigma_tau - Sigma + tau I ) V' + U U' H = 0.

Therefore W = U Sigma_tau V' is a solution to (11.3.6), and this completes the proof.

Proof [Lemma 11.6.2]. For such a choice of L_n we have

F(W_n) = f(W_n) + lambda ||W_n||_1 + tau ||W_n||_*
       <= Q_{L_n}(W_n, W_{n-1}) + lambda ||W_n||_1 + tau ||W_n||_*
       = f(W_{n-1}) + <W_n - W_{n-1}, grad f(W_{n-1})> + (L_n/2) ||W_n - W_{n-1}||^2 + lambda ||W_n||_1 + tau ||W_n||_*.

Also, by convexity of f and of the penalties,

F(W*) = f(W*) + lambda ||W*||_1 + tau ||W*||_* >= f(W_{n-1}) + <W* - W_{n-1}, grad f(W_{n-1})> + lambda ||W_n||_1 + tau ||W_n||_* + <W* - W_n, dg(W_n)>,

where g = lambda ||.||_1 + tau ||.||_*. Combining the two displays gives

F(W*) - F(W_n) >= -(L_n/2) ||W_n - W_{n-1}||^2 + <W* - W_n, grad f(W_{n-1}) + dg(W_n)>.   (B.0.3)

Note that W_n is a solution of grad f(W_{n-1}) + L_n (W_n - W_{n-1}) + dg(W_n) = 0 (using (11.3.1)). Substituting this into (B.0.3), and using the identity, valid for any three matrices A, B, C,

||B - A||^2 + 2<B - A, A - C> = ||B - C||^2 - ||A - C||^2,

we obtain

F(W_n) - F(W*) <= (L_n/2)( ||W_{n-1} - W*||^2 - ||W_n - W*||^2 ).

Using Lemma 11.6.1, this yields

F(W_n) - F(W*) <= (L_n/4)( ||W_{n-1} - W*||^2 - ||W_n - W*||^2 ) + 9 c^2 / (2 L_n).

This completes the proof.

APPENDIX C
JPEN FOR PRECISION MATRIX ESTIMATION

Proof [Theorem 12.4.1]. To bound the cross-product term involving Delta and R_hat_K^{-1}, we have

| tr( Delta (R_0^{-1} - R_hat_K^{-1}) ) | = | tr( Delta R_0^{-1} (R_hat_K - R_0) R_hat_K^{-1} ) |
  <= sigma_1(R_0^{-1}) | tr( Delta (R_hat_K - R_0) R_hat_K^{-1} ) |
  <= k_bar sigma_1(R_hat_K^{-1}) | tr( Delta (R_hat_K - R_0) ) |
  <= k_bar k_1 | tr( Delta (R_hat_K - R_0) ) |,

where sigma_min(R_hat_K) >= 1/k_1 > 0 is a positive lower bound on the eigenvalues of the JPEN estimate R_hat_K of the correlation matrix R_0; such a constant exists by Lemma 6.3.2. The rest of the proof closely follows that of Theorem 6.3.1.

Proof [Theorem 12.4.2]. We bound the term tr( Delta (Sigma_hat_S^{-1} - Sigma_0^{-1}) ) similarly to the corresponding term in the proof of Theorem 12.4.1. The rest of the proof closely follows that of Theorem 12.4.1.
BIBLIOGRAPHY

I. Johnstone and Y. Lu. Sparse principal components analysis. Unpublished manuscript, 2004.
H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of Computational and Graphical Statistics, 2006.
K. Mardia, J. Kent, and J. Bibby. Multivariate Analysis. Academic Press, New York, NY, 1979.
M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 2007.
M. Wainwright, P. Ravikumar, and J. Lafferty. High-dimensional graphical model selection using l1-regularized logistic regression. Proceedings of Advances in Neural Information Processing Systems, 2006.
M. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using l1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55, 2009.
M. Yuan. Sparse inverse covariance matrix estimation via linear programming. Journal of Machine Learning Research, 2009.
N. Meinshausen and P. Bühlmann. High dimensional graphs and variable selection with the lasso. Annals of Statistics, 2006.
D. Yatsenko, K. Josic, A. S. Ecker, E. Froudarakis, R. J. Cotton, and A. S. Tolias. Improved estimation and interpretation of correlations in neural circuits. PLoS Computational Biology, 11, 2015.
V. Marcenko and L. Pastur. Distributions of eigenvalues of some sets of random matrices. Math. USSR-Sbornik, 1967.
S. Geman. A limit theorem for the norm of random matrices. The Annals of Statistics, 8(2), 1980.
D. L. Donoho, M. Gavish, and I. M. Johnstone. Optimal shrinkage of eigenvalues in the spiked covariance model. http://arxiv.org/pdf/1311.0851.pdf, 2015.
P. Bickel and E. Levina. Regularized estimation of large covariance matrices. Annals of Statistics, 2008a.
P. Bickel and E. Levina. Covariance regularization by thresholding. The Annals of Statistics, 2008b.
J. Bien and R. Tibshirani. Sparse estimation of a covariance matrix. Biometrika, 2011.
A. Rothman. Positive definite estimators of large covariance matrices. Biometrika, 2012.
L. Xue, S. Ma, and H. Zou. Positive-definite l1-penalized estimation of large covariance matrices. Journal of the American Statistical Association, 2012.
J. Dahl, L. Vandenberghe, and V. Roychowdhury. Covariance selection for non-chordal graphs via chordal embedding. Optimization Methods and Software, 2008.
A. Dempster. Covariance selection. Biometrika, 1972.
T. Cai, C. Zhang, and H. Zhou. Optimal rates of convergence for covariance matrix estimation. The Annals of Statistics, 2010.
N. El Karoui. Operator norm consistent estimation of large dimensional sparse covariance matrices. The Annals of Statistics, 2008.
A. Rothman, P. J. Bickel, E. Levina, and J. Zhu. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2008.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 1996.
S. Chaudhuri, M. Drton, and T. S. Richardson. Estimation of a covariance matrix with zeros. Biometrika, 2007.
A. J. Butte, P. Tamayo, D. Slonim, T. R. Golub, and I. S. Kohane. Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proceedings of the National Academy of Sciences of the United States of America, 2000.
C. Lam and J. Fan. Sparsistency and rates of convergence in large covariance matrices estimation. Annals of Statistics, 2009.
O. Ledoit and M. Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 2004.
C. Stein. Estimation of a covariance matrix. Rietz Lecture, 39th Annual Meeting of the IMS, Atlanta, Georgia, 1975.
C. Stein. Lectures on the theory of estimation of many parameters. Journal of Mathematical Sciences, 1986.
O. Ledoit and M. Wolf. Optimal estimation of a large-dimensional covariance matrix under Stein's loss. http://papers.ssrn.com/, 2014.
Y. Sheena and A. Gupta. Estimation of the multivariate normal covariance matrix under some restrictions. Statistics and Decisions, 2003.
J. H. Won, J. Lim, S. J. Kim, and B. Rajaratnam. Condition-number regularized covariance estimation. Journal of the Royal Statistical Society B, 75, 2012.
L. R. Haff. Empirical Bayes estimation of the multivariate normal covariance matrix. Annals of Statistics, 1980.
S. Lin and M. D. Perlman. A Monte Carlo comparison of four estimators of a covariance matrix. Multivariate Analysis, 1985.
D. Dey and C. Srinivasan. Estimation of a covariance matrix under Stein's loss. Annals of Statistics, 1985.
B. Rajaratnam, D. Vincenzi, and B. Naul. A theoretical study of Stein's covariance estimator. Technical report, Department of Statistics, Stanford University, 2014.
A. Maurya. A sparse and well-conditioned estimation of covariance and inverse covariance matrices using a joint penalty. Journal of Machine Learning Research, 2016.
T. Cai, Z. Ren, and H. Zhou. Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation. Electronic Journal of Statistics, 2015.
D. P. Bertsekas. Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. Laboratory for Information and Decision Systems Report LIDS-P-2848, MIT, 2010.
L. Vandenberghe and S. Boyd. Convex Optimization. Cambridge University Press, 2004.
F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Convex optimization with sparsity-inducing norms. In Optimization for Machine Learning, MIT Press, 2011.
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal of Imaging Science, 2009.
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2008.
C. R. Rao. Generalized inverse of a matrix and its applications. Proc. Sixth Berkeley Symp. on Math. Statist. and Prob., Univ. of Calif. Press, 1972.
L. Vandenberghe, S. Boyd, and S.-P. Wu. Determinant maximization with linear matrix inequality constraints. SIAM Journal on Matrix Analysis and Applications, 1998.
O. Banerjee, E. L. Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 2008.
S. Zhou, P. Rutimann, M. Xu, and P. Bühlmann. High-dimensional covariance estimation based on Gaussian graphical models. Journal of Machine Learning Research, 2011.
M. Pourahmadi. Cholesky decompositions and estimation of a covariance matrix: orthogonality of variance-correlation parameters. Biometrika, 94, 2007.
M. Pourahmadi. Modeling covariance matrices: The GLM and regularization perspectives. Statistical Science, 2011.
T. Cai, W. Liu, and X. Luo. A constrained l1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 2011.
R. Tomioka and K. Aihara. Classifying matrices with a spectral regularization. Proc. 24th Int. Conf. Machine Learning, 2007.
F. Bach. Consistency of trace norm minimization. Journal of Machine Learning Research, 2008.
A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, Special Issue on Inductive Transfer Learning, 2008.
M. Fazel. Matrix rank minimization with applications. PhD thesis, Electrical Engineering Department, Stanford University, 2002.
Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 2005.
R. T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal of Control and Optimization, 14, 1976.
H. Liu, K. Roeder, and L. Wasserman. Stability approach to regularization selection (StARS) for high dimensional graphical models. In Proceedings of the Twenty-Third Annual Conference on Neural Information Processing Systems (NIPS), 2010.
A. Maurya. A joint convex penalty for inverse covariance matrix estimation. Computational Statistics and Data Analysis, 2014.
U. Alon, N. Barkai, D. Notterman, K. Gish, S. Ybarra, D. Mack, and A. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences USA, 1999.
L. Wang, J. Zhu, and H. Zou. Hybrid huberized support vector machines for microarray classification. Proceedings of the 24th International Conference on Machine Learning, 2007.
P. Bickel and E. Levina. Some theory for Fisher's linear discriminant function, "naive Bayes", and some alternatives when there are many more variables than observations. Bernoulli, 2004.
P. Ravikumar, M. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electronic Journal of Statistics, 2011.