ESTIMATING COVARIANCE STRUCTURE IN HIGH DIMENSIONS

By Ashwini Maurya

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics, Doctor of Philosophy, 2016

ABSTRACT

ESTIMATING COVARIANCE STRUCTURE IN HIGH DIMENSIONS

By Ashwini Maurya

Many scientific domains rely on extracting knowledge from high-dimensional data sets to provide insights into the complex mechanisms underlying these data. Statistical modeling has become ubiquitous in the analysis of high dimensional data: in exploring large-scale gene regulatory networks in the hope of developing better treatments for deadly diseases, in the search for a better understanding of cognitive systems, and in predicting stock market volatility in the hope of averting potential risk. Statistical analysis of these high-dimensional data sets yields better results only if the estimation procedure exploits the hidden structures underlying the data. This thesis develops flexible estimation procedures, with provable theoretical guarantees, for estimating the unknown covariance structures underlying the data generating process. Of particular interest are procedures that can be used on high dimensional data sets where the number of samples n is much smaller than the ambient dimension p. Because structure estimation is important for both objects, the methodology is developed for the estimation of both the covariance matrix and its inverse, in parametric as well as non-parametric frameworks.

Copyright by ASHWINI MAURYA 2016

This thesis is dedicated to my family. For their endless love, support, and encouragement.

ACKNOWLEDGMENTS

I am really grateful to the many people who helped me achieve a doctorate in Statistics. It would not have been possible to pursue a PhD in the United States if it was not for my parents, who spent an enormous amount of time and effort in educating me from the earliest stages of my life. They supported me in every step of my career and provided a safety net to freely pursue many different possibilities. At Michigan State University, I am extremely fortunate to be advised by Professor Hira L. Koul, who taught me how to think about research problems; I have benefited a lot from his clarity of thought and creative intellect. He has always been a constant source of motivation and encouraged me to realize my potential. I am grateful to him for providing the best possible academic environment, one that enabled me to think independently and grow as a scientific researcher. I also thank his family for the amazing hospitality, which in many ways made me at home away from home while at Michigan State.

I have much to thank the other members of my thesis committee as well. Dr. Mark Iwen's course on compressed sensing and Big Data, and many discussions with him, proved very helpful in my research work. I am thankful to Professor Yuehua Cui and Dr. Grace Hong for serving on my thesis committee and for taking time out of their busy schedules to teach me the importance of good research. I am very grateful to Professor Tathagata Bandyopadhyay at the Indian Institute of Management Ahmedabad for his support and for encouraging me to pursue an advanced degree in the United States. I am also very fortunate to know Professor Arnab Laha at the Indian Institute of Management Ahmedabad and thank him for his support and encouragement. At Michigan State, I have learned a lot from the teaching of Professor Tapabrata Maiti, and I thank his family for their unconditional support. I am thankful to Sue Watson, who has been an ever-present source of help, and to Kim Schmuecker and Andy Hufford for their help with many technical issues at Michigan State.

To all my friends, thank you for your understanding and encouragement in my many, many moments of crisis. Your friendship makes my life a wonderful experience. I can't list all the names here, but you are always on my mind.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1  INTRODUCTION
  1.1 Covariance Structure Estimation
    1.1.1 High Dimensional Covariance Matrix Estimation
  1.2 Inverse Covariance Matrix Estimation
  1.3 Thesis Overview
  1.4 Notation

Part I  Estimating Covariance Structure

CHAPTER 2  SAMPLE COVARIANCE MATRIX AND ITS LIMITATIONS
  2.1 Sample Covariance Matrix
  2.2 Why Sample Covariance Matrix is NOT Suitable in High Dimensions?
CHAPTER 3  LOSS FUNCTIONS FOR COVARIANCE MATRIX ESTIMATION
  3.1 Likelihood Based Methods
  3.2 Frobenius Norm Loss Based Methods
  3.3 Other Loss Function Based Methods
CHAPTER 4  LEARNING SPARSE STRUCTURE
  4.1 Two Broad Class of Covariance Matrices
  4.2 Lasso Type Penalty
  4.3 Discussion
CHAPTER 5  ESTIMATING A WELL-CONDITIONED STRUCTURE
  5.1 Motivation
  5.2 Well Conditioned Estimation
  5.3 Variance of Eigenvalues Penalty
CHAPTER 6  LEARNING SIMULTANEOUS STRUCTURE WITH JOINT PENALTY
  6.1 Motivation
  6.2 JPEN Framework
  6.3 Theoretical Analysis of JPEN Estimators
    6.3.1 Results on Consistency
  6.4 Generalized JPEN Estimators and Optimal Estimation
    6.4.1 Weighted JPEN Estimator for the Covariance Matrix Estimation
CHAPTER 7  AN ALGORITHM AND ITS COMPUTATIONAL COMPLEXITY
  7.1 A Very Fast Exact Algorithm
    7.1.1 Derivation
    7.1.2 Choice of Regularization Parameters
    7.1.3 Choice of Weights
  7.2 Computational Complexity
CHAPTER 8  SIMULATIONS
  8.1 Preliminary
  8.2 Performance Comparison
  8.3 Recovery of Eigen-structure and Sparsity

Part II  Estimating Inverse Covariance Structure

CHAPTER 9  INVERSE COVARIANCE MATRIX AND ITS APPLICATIONS
  9.1 Motivation
  9.2 Related Work
  9.3 Joint Penalty for Precision Matrix Estimation
  9.4 Some Applications
    9.4.1 Linear Discriminant Analysis
    9.4.2 Gaussian Graphical Modeling
CHAPTER 10  A JOINT CONVEX PENALTY (JCP) ESTIMATION
  10.1 Motivation
  10.2 Joint Convex Penalty Estimation
    10.2.1 Problem Formulation
    10.2.2 Proposed Estimator
CHAPTER 11  PROXIMAL GRADIENT ALGORITHMS AND THEIR CONVERGENCE ANALYSIS
  11.1 Introduction
  11.2 Proximal Gradient Method
  11.3 Basic Approximation Model
  11.4 Algorithm for Optimization
  11.5 Choosing the Regularization Parameter
  11.6 Convergence Analysis
  11.7 Simulation Study
    11.7.1 Performance Criteria
    11.7.2 StARS Method of Tuning Parameter Selection
    11.7.3 Simulation Results
      11.7.3.1 Toeplitz Type Precision Matrix
      11.7.3.2 Block Type Precision Matrix
      11.7.3.3 Hub Graph Type Precision Matrix
      11.7.3.4 Neighborhood Graph Type Precision Matrix
  11.8 Summary
  11.9 Discussion
CHAPTER 12  SIMULTANEOUS ESTIMATION OF SPARSE AND WELL-CONDITIONED PRECISION MATRIX
  12.1 Motivation
  12.2 Joint Penalty Estimation: A Two Step Approach
  12.3 Weighted JPEN Estimator for Precision Matrix
  12.4 Theoretical Analysis of JPEN Estimators
CHAPTER 13  SIMULATIONS AND AN APPLICATION TO REAL DATA ANALYSIS
  13.1 Simulation Results: Settings
  13.2 Performance Comparison
  13.3 Colon Tumor Gene Expression Data Analysis
APPENDICES
  APPENDIX A  JPEN Covariance Matrix Estimation
  APPENDIX B  JCP for Precision Matrix Estimation
  APPENDIX C  JPEN for Precision Matrix Estimation
BIBLIOGRAPHY

LIST OF TABLES

Table 8.1   Covariance matrix estimation
Table 8.2   Covariance matrix estimation
Table 11.1  Average KL-loss with standard error over 20 replications
Table 11.2  Average relative error with standard error over 20 replications
Table 11.3  Average KL-loss with standard error over 20 replications
Table 11.4  Average relative error with standard error over 20 replications
Table 11.5  Average KL-loss with standard error over 20 replications
Table 11.6  Average relative error with standard error over 20 replications
Table 11.7  Average KL-loss with standard error over 20 replications
Table 11.8  Average relative error with standard error over 20 replications
Table 13.1  Precision matrix estimation
Table 13.2  Precision matrix estimation
Table 13.3  Averages and standard errors of classification errors (in %) over 100 replications

LIST OF FIGURES

Figure 2.1   Eigenvalues of sample and population covariance matrices
Figure 3.1   A concave function
Figure 4.1   Dense and sparse covariance and precision matrices
Figure 6.1   Comparison of eigenvalues of covariance matrix estimates
Figure 7.1   Timing comparison of JPEN, Glasso, and PDSCE
Figure 8.1   Covariance graphs for different types of matrices
Figure 8.2   Heat-map of zeros identified in the covariance matrix out of 50 realizations. White color: zeros identified in 50/50 realizations; black color: zeros identified in 0/50 realizations
Figure 8.3   Eigenvalues plot for n = 100, p = 50 based on 50 realizations, for the neighborhood type covariance matrix
Figure 8.4   Eigenvalues plot for n = 100, p = 100 based on 50 realizations, for the Cov-I type matrix
Figure 9.1   Eigenvalues plot of the precision matrix
Figure 9.2   Illustration of conditional independence
Figure 13.1  Colon tumor gene expression data
Figure 13.2  Partial correlation network of colon tumor gene expression data

CHAPTER 1  INTRODUCTION

The increasing use of technology, together with developments in storage systems, has created vast amounts of high dimensional data across many scientific disciplines. Examples include large-scale omics data that enhance our knowledge of human biology, network data that explain how we interact and connect with each other, and finance data that provide an opportunity to beat the market. The statistical analysis of these data is challenging due to the curse of dimensionality. New statistical methods are needed to model the unknown structure underlying these data sets and to leverage our scientific understanding.

Statistical inference in high-dimensional data is possible only if the inference procedure is flexible enough to exploit the hidden structure
underlying these data sets. This translates to designing an inference procedure that does well in modeling the structure underlying these data sets. Such an inference procedure often assumes that many high dimensional structures can be represented with a smaller number of parameters, which is the case in many scientific disciplines. Consequently, the concept of parsimony becomes crucial in high dimensions.

The effectiveness of statistical estimation in the high dimensional setting relies on the robustness of the procedure, its efficiency, and its scalability. The latter depends upon the availability of scalable algorithmic techniques and their ability to efficiently learn the structure underlying these data sets.

The main goal of this thesis is to develop flexible and principled statistical methods for uncovering the hidden structure underlying high dimensional, complex data, with a focus on scientific discovery. In particular, the thesis addresses two main tasks: (i) estimation of covariance matrices and their inverses, and (ii) scalable algorithms for computing the covariance structure based on the proposed estimation procedures, in the high dimensional setting.

1.1 Covariance Structure Estimation

In many scientific disciplines, a study involves large complex systems whose output often depends on a large number of components (variables) and their interactions. As a motivating example, in systems biology the cellular networks often consist of a very large number of molecules that interact and exchange information among themselves. Many of the existing techniques depend upon descriptive analysis of macroscopic properties, which include the degree distribution, path lengths, and motif profiles of these molecular networks, or upon data mining tools to identify clusters. Such an analysis provides limited insight into the complex mechanisms of the functional and structural organization of biological systems. A robust estimate of the covariance matrix can explain the biologically important interactions and enhance our scientific understanding of these complex phenomena.

Covariance structure estimation is of fundamental importance in multivariate data analysis. It is widely used in a number of applications, including: (i) Principal component analysis (PCA) [Johnstone and Lu [2004], Zou et al. [2006]], where the goal is to project the data onto the "best" k-dimensional subspace, and where "best" means that the projected data explain as much of the variation in the original data as possible without increasing k; (ii) Discriminant analysis [Mardia et al. [1979]], where the goal is to classify observations into different classes; here estimates of the covariance and inverse covariance matrices play an important role, as the classifier is often a function of these entities; (iii) Regression analysis: if interest focuses on estimation of regression coefficients with correlated (or longitudinal) data, a sandwich estimator of the covariance matrix may be used to provide standard errors for the estimated coefficients that are robust in the sense that they remain consistent under mis-specification of the covariance structure; and (iv) Gaussian graphical modeling [Yuan and Lin [2007], Wainwright et al. [2006], Wainwright [2009], Yuan [2009], Meinshausen and Bühlmann [2006]], where the relationship structure among nodes can be inferred from the inverse covariance matrix (also called the precision matrix). A zero entry in the precision matrix implies conditional independence between the corresponding nodes, given the remaining nodes. In applications where the probability distribution of the data is multivariate Gaussian, the precision matrix is used to describe the underlying dependence structure among the variables.
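To make this last point concrete, here is a minimal numpy sketch (not from the thesis; the matrix entries are made up for illustration) that computes partial correlations from a hypothetical precision matrix. An exactly zero entry of the precision matrix gives a zero partial correlation between the corresponding pair of variables given the rest, which under Gaussianity is conditional independence.

```python
import numpy as np

# A hypothetical 3x3 precision matrix (inverse covariance); the (1, 3) entry is zero.
omega = np.array([[2.0, 0.6, 0.0],
                  [0.6, 2.0, 0.5],
                  [0.0, 0.5, 2.0]])

# Partial correlation between variables i and j given all others:
# rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
d = np.sqrt(np.diag(omega))
partial_corr = -omega / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

# The (0, 2) entry is 0: variables 1 and 3 are conditionally uncorrelated given variable 2.
print(partial_corr)
```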
1.1.1 High Dimensional Covariance Matrix Estimation

In many applications, statisticians often encounter data sets where the sample size n is much smaller than the ambient dimension p, where the latter can be very large, often in the thousands, millions, or even more. In such situations, the classical statistical estimators tend to have huge bias for estimating their population counterparts, and many of the existing asymptotic theories no longer remain valid. In general, a p-dimensional covariance matrix requires estimation of p(p+1)/2 free parameters. For n
The main idea behind the non-convex penalty is to reduce the bias when the parameter value has relatively large magnitude. For example, the SCAD penalty remains constant when the parameter is large, whereas the $\ell_1$ penalty grows linearly with it. The main limitation of non-convex penalties is that the proposed algorithms use iterative procedures based on local convex approximations and hence are computationally intensive. It is also hard to say whether such algorithms converge to the global minimum/maximum.

The proposed Joint PENalty (JPEN) method in this thesis uses the Frobenius norm loss function and a joint penalty consisting of the $\ell_1$ norm and the variance of the eigenvalues of the underlying covariance matrix. The choice of squared loss function allows sparse estimation of the covariance matrix (rather than the sparse precision matrix) and results in a very fast algorithm. We introduce the variance of eigenvalues penalty to ensure that the estimated covariance matrix is positive definite. For more details, see Chapter 6.

4.3 Discussion

The assumption of sparsity involves a trade-off between benefit and cost. In particular, in high dimensional data analysis, when entries of the covariance matrix are set to zero, the noise due to estimation error is generally reduced. On the other hand, errors of misspecification are introduced. Hence the decision to fit a sparse model involves a trade-off between overfitting and model specification. As noted by Dempster [1972], once the parametric model is adopted, the choice of the level of sparsity is often settled, especially when the optimal estimates can easily be computed. However, such optimality provides no guarantee against the cost of introducing unnecessary parameters.

CHAPTER 5  ESTIMATING A WELL-CONDITIONED STRUCTURE

5.1 Motivation

In high dimensional data applications where an inverse of the covariance matrix is used, the sample covariance matrix cannot be used, as it is generally not invertible. By a well-conditioned covariance matrix we mean that its condition number (the ratio of the maximum and minimum eigenvalues) is bounded above by a finite positive constant, and that this constant is not too large. As pointed out by Ledoit and Wolf [2004], a well-conditioned estimator reduces the estimation error and is a desired property in high dimensional settings. In this chapter, we discuss some of the existing literature on well-conditioned estimation and introduce the variance of eigenvalues penalty as an effective method for improved eigen-structure estimation.

5.2 Well Conditioned Estimation

The problem of well-conditioned covariance matrix estimation is a long studied subject [Stein [1975, 1986], Ledoit and Wolf [2004, 2014], Sheena and Gupta [2003], Won et al. [2012]]. It has received considerable attention in high dimensional analysis due to the importance of such estimators in many high dimensional data applications. Among the earlier developments, Stein [1975] proposed his famous class of rotation invariant shrinkage estimators. The main idea was to keep the same eigenvectors as those of the sample covariance matrix but shrink the eigenvalues towards their center, in order to reduce the eigenvalue dispersion. Let $S := U D U^T$ be the eigen-decomposition of the sample covariance matrix. Stein's estimator is given by:

$\hat{\Sigma} = U D_{\mathrm{new}} U^T$, where $D_{\mathrm{new}} = \mathrm{diag}(d^{\mathrm{new}}_{11}, d^{\mathrm{new}}_{22}, \ldots, d^{\mathrm{new}}_{pp})$,   (5.2.1)

with

$d^{\mathrm{new}}_{ii} = \dfrac{n\, d_{ii}}{\,n - p + 1 + 2 d_{ii} \sum_{j \neq i} \frac{1}{d_{ii} - d_{jj}}\,}$,

where $(d_{11}, d_{22}, \ldots, d_{pp})$ is the diagonal of $D$. This class of estimators was further studied by Haff [1980], Lin and Perlman [1985], Dey and Srinivasan [1985], and Ledoit and Wolf [2004, 2014]. Although the Stein estimator is considered a standard [Rajaratnam et al. [2014]], it has a number of limitations: (i) it is not necessarily positive definite, (ii) it assumes normality, and (iii) it is suitable only for low dimensional data analysis, when the sample size exceeds the dimension.

Among the earlier work on eigenvalue shrinkage estimation in the high dimensional setting, Ledoit and Wolf [2004] proposed an estimator that shrinks the sample covariance matrix towards the identity. Their estimator is given by:

$\rho_1 S + \rho_2 I$, where $\rho_1, \rho_2$ are estimated from the data.   (5.2.2)
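As a concrete illustration of the form in (5.2.2), the following numpy sketch applies linear shrinkage of a singular sample covariance matrix toward a scaled identity. It is only a schematic: the weight rho below is hand-picked, whereas Ledoit and Wolf [2004] estimate the optimal weights from the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 100                      # fewer samples than dimensions
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)         # sample covariance: rank-deficient when n < p

# Linear shrinkage toward mu * I, in the spirit of (5.2.2), with rho1 = 1 - rho and
# rho2 = rho * mu; rho is a hand-picked weight, not the data-driven choice.
mu = np.trace(S) / p
rho = 0.3
sigma_hat = (1 - rho) * S + rho * mu * np.eye(p)

print(np.linalg.eigvalsh(S).min())          # essentially 0: S is singular
print(np.linalg.eigvalsh(sigma_hat).min())  # bounded away from 0: well-conditioned
```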
For $\rho$ large enough, the estimator given by (5.2.2) is well-conditioned, but it need not be sparse, as the sample covariance matrix is generally not sparse. In another interesting work, Won et al. [2012] consider maximum likelihood estimation with a condition number constraint. They solve the following optimization problem:

Maximize $\mathcal{L}(S, \Sigma)$ subject to $\sigma_{\max}(\Sigma)/\sigma_{\min}(\Sigma) \le \kappa_{\max}$,   (5.2.3)

where $\mathcal{L}(S, \cdot)$ is the likelihood function of the multivariate Gaussian distribution given in (3.1.1), and $\kappa_{\max}$ is some positive constant. The estimate from (5.2.3) is invertible if $\kappa_{\max}$ is finite, and well conditioned if $\kappa_{\max}$ is moderate. They consider a value of $\kappa_{\max} < 10^3$ to be moderate, but its value also depends upon the eigenvalue dispersion of the true population covariance matrix.

5.3 Variance of Eigenvalues Penalty

The estimators proposed by Stein [1975], Dey and Srinivasan [1985], Ledoit and Wolf [2004], and Won et al. [2012] are well-conditioned and have been used in a number of applications. However, these estimators are not sparse; in addition they have the following limitations. The rotation invariant estimators do not change the eigenvectors and hence remain inconsistent [Johnstone and Lu [2004]]. These estimators tend to overestimate the number of non-zeros of the underlying true covariance matrix and its inverse. The estimator given in Ledoit and Wolf [2004] results in linear shrinkage of the eigenvalues towards those of the identity matrix, which may not be an optimal criterion, as eigenvalues far from the center tend to be heavily biased compared to the eigenvalues in the center. The variance of eigenvalues penalty has the advantage that it allows more shrinkage of the extreme eigenvalues than of the ones in the center, and therefore reduces the bias non-linearly; moreover, the quadratic term leads to a very fast and exact algorithm. See Chapter 6 for a more detailed analysis of the proposed method.

CHAPTER 6  LEARNING SIMULTANEOUS STRUCTURE WITH JOINT PENALTY

6.1 Motivation

From the discussion in Chapter 4, it is understood that a sparse structure can be learned via $\ell_1$ regularization. From Chapter 5, it is learned that a well-conditioned structure can be achieved by suitably shrinking the sample eigenvalues towards their center. However, neither of these regularizations alone provides a simultaneous treatment of sparse and well-conditioned estimation. For example, consider the estimation of the covariance matrix by minimizing the $\ell_1$-regularized Frobenius norm loss function:

$\hat{\Sigma} = \operatorname{argmin}_{\Sigma = \Sigma^T,\ \mathrm{tr}(\Sigma) = \mathrm{tr}(S)} \; \|\Sigma - S\|_2^2 + \gamma \|\Sigma\|_1$,   (6.1.1)

where $\gamma$ is some positive constant. Note that with the penalty function $\|\Sigma\|_1$ we only penalize the off-diagonal elements of $\Sigma$. The constraint $\mathrm{tr}(\Sigma) = \mathrm{tr}(S)$ ensures that the total variation in the estimated covariance matrix is the same as that in the sample covariance matrix. The solution to (6.1.1) is the standard soft-thresholding estimator, given by (see Chapter 7 for the derivation):

$\hat{\sigma}_{ii} = s_{ii}, \qquad \hat{\sigma}_{ij} = \mathrm{sign}(s_{ij}) \max\{|s_{ij}| - \tfrac{\gamma}{2},\, 0\}, \quad i \neq j$.   (6.1.2)

It is clear from this expression that a sufficiently large value of $\gamma$ will result in a sparse covariance matrix estimate. However, the estimator $\hat{\Sigma}$ of (6.1.1) is not necessarily positive definite [for more details see Maurya [2016], Xue et al. [2012]]. Moreover, it is hard to say whether it overcomes the over-dispersion in the sample eigenvalues. Figure 6.1 (Comparison of eigenvalues of covariance matrix estimates) illustrates this phenomenon for a neighborhood type covariance matrix. Here we simulated random vectors from a multivariate normal distribution with sample size n = 50 and dimension p = 50. As is evident from Figure 6.1, the eigenvalues of the sample covariance matrix are over-dispersed: most of them are either too large or close to zero. The eigenvalues of the proposed Joint Penalty (JPEN) estimator are well aligned with those of the true covariance matrix. See Chapter 8 for a detailed discussion. The soft-thresholding estimator (6.1.2) is sparse but fails to recover the eigenstructure of the true covariance matrix.
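The soft-thresholding solution (6.1.2) takes only a few lines to compute. The numpy sketch below (with a hand-picked $\gamma$, for illustration only) thresholds the off-diagonal entries of the sample covariance matrix while leaving the diagonal untouched; the last line checks the smallest eigenvalue, which is not guaranteed to be positive.

```python
import numpy as np

def soft_threshold_covariance(S, gamma):
    """Soft-threshold the off-diagonal entries of S as in (6.1.2)."""
    out = np.sign(S) * np.maximum(np.abs(S) - gamma / 2.0, 0.0)
    np.fill_diagonal(out, np.diag(S))   # diagonal entries are not penalized
    return out

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 50))
S = np.cov(X, rowvar=False)
sigma_hat = soft_threshold_covariance(S, gamma=0.4)

print(np.mean(sigma_hat == 0))              # fraction of entries set exactly to zero
print(np.linalg.eigvalsh(sigma_hat).min())  # not guaranteed to be positive
```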
To overcome the over-dispersion and achieve a well-conditioned estimator, it is natural to regularize the eigenvalues of the sample covariance matrix. Consider the eigenvalue-regularized estimator of the covariance matrix, based on a squared loss penalty, given as the solution to the following optimization problem:

$\hat{\Sigma} = \operatorname{argmin}_{\Sigma = \Sigma^T,\ \mathrm{tr}(\Sigma) = \mathrm{tr}(S)} \; \|\Sigma - S\|_2^2 + \lambda \sum_{i=1}^p \{\sigma_i(\Sigma) - \bar{\sigma}(\Sigma)\}^2$,   (6.1.3)

where $\lambda$ is some positive constant and $\bar{\sigma}(\Sigma)$ denotes the mean of the eigenvalues of $\Sigma$. The minimizer $\hat{\Sigma}$ of (6.1.3) is given by

$\hat{\Sigma} = (S + \lambda t I)/(1 + \lambda)$,   (6.1.4)

where $I$ is the identity matrix and $t = \sum_{i=1}^p S_{ii}/p$. To see the advantage of the eigenvalue shrinkage penalty, note that, after some algebra, for any $\lambda > 0$,

$\sigma_{\min}(\hat{\Sigma}) = \sigma_{\min}\big((S + \lambda t I)/(1+\lambda)\big) \ge \dfrac{\lambda t}{1+\lambda} > 0$.

This means that the variance of eigenvalues penalty improves $S$ to a positive definite estimator $\hat{\Sigma}$.
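The closed form (6.1.4) and the lower bound on its smallest eigenvalue are easy to check numerically. The following numpy sketch (a toy example with simulated data, not taken from the thesis) forms $\hat{\Sigma} = (S + \lambda t I)/(1+\lambda)$ for a singular sample covariance matrix and compares its smallest eigenvalue with the bound $\lambda t/(1+\lambda)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 80
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)              # singular since n < p

lam = 0.5
t = np.trace(S) / p                      # average sample variance
sigma_hat = (S + lam * t * np.eye(p)) / (1.0 + lam)   # estimator (6.1.4)

print(np.linalg.eigvalsh(S).min())          # essentially 0
print(np.linalg.eigvalsh(sigma_hat).min())  # bounded below by (approximately) lam*t/(1+lam)
print(lam * t / (1.0 + lam))                # the lower bound from the display above
```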
However, the estimator (6.1.3) is well-conditioned but need not be sparse. Sparsity can be achieved by imposing an $\ell_1$ penalty on the entries of the covariance matrix. In the next section we describe the joint penalty estimation and discuss its advantage as an improved covariance matrix estimator.

6.2 JPEN Framework

We consider the joint penalty estimator of the covariance matrix as the solution to the following optimization problem:

$\hat{\Sigma} = \operatorname{argmin}_{\Sigma = \Sigma^T,\ \mathrm{tr}(\Sigma) = \mathrm{tr}(S)} \; \|\Sigma - S\|_2^2 + \gamma \|\Sigma\|_1 + \lambda \sum_{i=1}^p \{\sigma_i(\Sigma) - \bar{\sigma}(\Sigma)\}^2$,   (6.2.1)

where $\gamma$ and $\lambda$ are some positive constants. From here onwards we suppress the dependence of $\hat{\Sigma}$ on $(\gamma, \lambda)$ and denote $\hat{\Sigma}_{\gamma,\lambda}$ by $\hat{\Sigma}$.

Simulations have shown that, in general, the minimizer of (6.2.1) is not positive definite for all values of $\gamma > 0$ and $\lambda > 0$. Therefore we consider the optimization of (6.2.1) over a restricted set of $(\gamma, \lambda)$ to ensure that the resulting estimator is sparse and well-conditioned simultaneously. In what follows, we first consider correlation matrix estimation, and later generalize the method to covariance matrix estimation.

The proposed JPEN covariance matrix estimator is obtained by optimizing the following objective function in $R$ over a specific region of values of $(\gamma, \lambda)$ which depends on the sample correlation matrix $K$ and on $\lambda$. Here the condition $\mathrm{tr}(\Sigma) = \mathrm{tr}(S)$ reduces to $\mathrm{tr}(R) = p$, and therefore $t = 1$:

$\hat{R}_K = \operatorname{argmin}_{R = R^T,\ \mathrm{tr}(R) = p,\ (\gamma,\lambda) \in \hat{S}_1^K} \; \|R - K\|_F^2 + \gamma \|R\|_1 + \lambda \sum_{i=1}^p \{\sigma_i(R) - \bar{\sigma}_R\}^2$,   (6.2.2)

where $\hat{S}_1^K = \big\{(\gamma, \lambda):\ \gamma, \lambda > 0,\ \gamma \asymp \sqrt{(\log p)/n},\ \sigma_{\min}\{(K + \lambda I) - \tfrac{\gamma}{2}\,\mathrm{sign}(K + \lambda I)\} > 0\big\}$, and $\bar{\sigma}_R$ is the mean of the eigenvalues of $R$. In particular, if $K$ is a diagonal matrix, the last condition in $\hat{S}_1^K$ reduces to an explicit upper bound on $\gamma$ in terms of $\lambda$. The minimization in (6.2.2) over $R$ is for fixed $(\gamma, \lambda) \in \hat{S}_1^K$. Furthermore, Lemmas 6.3.1 and 6.3.2 below show, respectively, that the objective function in (6.2.2) is convex and that the estimator given in (6.2.3) is positive definite. The proposed estimator of the covariance matrix (based on the regularized correlation matrix estimator $\hat{R}_K$) is given by:

$\hat{\Sigma}_K = (S^+)^{1/2}\, \hat{R}_K\, (S^+)^{1/2}$,   (6.2.3)

where $S^+$ is the diagonal matrix of the diagonal elements of $S$.

6.3 Theoretical Analysis of JPEN Estimators

Although we make no assumption on the data generating process for estimation itself, to derive rates of convergence we assume that the underlying data generating process is sub-Gaussian. In this section, we give rates of convergence of the proposed JPEN estimators in the high dimensional setting where the sample size and the dimension both diverge to infinity.

Definition: A random vector $X$ is said to have a sub-Gaussian distribution if for each $t \ge 0$ and $y \in \mathbb{R}^p$ with $\|y\|_2 = 1$, there exists $0 < \tau < \infty$ such that

$P\{|y^T(X - E(X))| > t\} \le e^{-t^2/2\tau}$.   (6.3.1)

The JPEN estimators exist for any $(n, p)$ satisfying $2 \le n < \infty$ and $1 \le p < \infty$. Assumption A2 states that there exists a constant $k > 0$ such that $1/k \le \sigma_{\min}(\Sigma_0) \le \sigma_{\max}(\Sigma_0) \le k$; it guarantees that the true covariance matrix $\Sigma_0$ is well-conditioned (i.e. all its eigenvalues are finite and positive). Assumption A1 is more of a definition, saying that the number of non-zero off-diagonal elements of $\Sigma_0$ is bounded by some positive integer $s$. The following Lemmas 6.3.1 and 6.3.2 show, respectively, that the optimization problem in (6.2.2) is convex and yields a positive definite solution.

Lemma 6.3.1. The optimization problem in (6.2.2) is convex.

Lemma 6.3.2. The estimator given by (6.2.2) is positive definite for any $2 \le n < \infty$ and $1 \le p < \infty$.

6.3.1 Results on Consistency

Theorem 6.3.1. Let $(\gamma, \lambda) \in \hat{S}_1^K$ and let $\hat{R}_K$ and $\hat{\Sigma}_K$ be as defined in (6.2.2) and (6.2.3). Under Assumptions A0, A1, A2,

$\|\hat{R}_K - R_0\|_F = O_P\Big(\sqrt{\tfrac{s \log p}{n}}\Big)$ and $\|\hat{\Sigma}_K - \Sigma_0\| = O_P\Big(\sqrt{\tfrac{(s+1) \log p}{n}}\Big)$,   (6.3.2)

where $R_0$ is the true correlation matrix.

Remark 6.3.1. The JPEN estimator $\hat{\Sigma}_K$ is minimax optimal under the operator norm. In Cai et al. [2015], the authors obtain the minimax rate of convergence in the operator norm of their covariance matrix estimator for the particular construction of the parameter space $H_0(c_{n,p}) := \{\Sigma : \max_{1 \le i \le p} \sum_{j=1}^p I\{\sigma_{ij} \neq 0\} \le c_{n,p}\}$. They show that this rate in the operator norm is $c_{n,p} \sqrt{\log p / n}$, which is the same as that of $\hat{\Sigma}_K$ for $1 \le c_{n,p} = \sqrt{s}$.

Remark 6.3.2. Bickel and Levina [2008b] proved that, under the assumption $\sum_{j=1}^p |\sigma_{ij}|^q \le c_0(p)$ for some $0 \le q < 1$, the hard thresholding estimator of the sample covariance matrix with tuning parameter $\lambda \asymp \sqrt{(\log p)/n}$ is consistent in operator norm at a rate no worse than $O_P\big(c_0(p)\, ((\log p)/n)^{(1-q)/2}\big)$, where $c_0(p)$ is the upper bound on the number of non-zero elements in each row. The truly sparse case corresponds to $q = 0$. The rate of convergence of $\hat{\Sigma}_K$ is the same as that of Bickel and Levina [2008b] except in the following cases. Case (i): the covariance matrix has all off-diagonal elements zero except the last row, which has $\sqrt{p}$ non-zero elements. Then $c_0(p) = \sqrt{p}$ and $\sqrt{s} = \sqrt{2\sqrt{p} - 1}$. The operator norm rate of convergence for the JPEN estimator is $O_P\big(\sqrt{\sqrt{p}\,(\log p)/n}\big)$, whereas the rate for Bickel and Levina's estimator is $O_P\big(\sqrt{p\,(\log p)/n}\big)$. Case (ii): when the true covariance matrix is tri-diagonal, we have $c_0(p) = 2$ and $s = 2p - 2$; the JPEN estimator has operator norm rate $\sqrt{p \log p / n}$, whereas that of Bickel and Levina's estimator is $\sqrt{\log p / n}$. For the case $\sqrt{s} \asymp c_0(p)$, the JPEN estimator has the same rate of convergence as Bickel and Levina's estimator.

Remark 6.3.3. The operator norm rate of convergence is much faster than the Frobenius norm rate. This is because the Frobenius norm convergence is in terms of all the eigenvalues of the covariance matrix, whereas the operator norm convergence is in terms of the largest eigenvalue only.

Remark 6.3.4. Our proposed estimator is applicable for estimating any non-negative definite covariance matrix.

Note that the estimator $\hat{\Sigma}_K$ is obtained by regularization of the sample correlation matrix in (6.2.2). In some applications it is desirable to regularize the sample covariance matrix directly. The JPEN estimator of the covariance matrix based on regularization of the sample covariance matrix is obtained by solving the following optimization problem:

$\hat{\Sigma}_S = \operatorname{argmin}_{\Sigma = \Sigma^T,\ \mathrm{tr}(\Sigma) = \mathrm{tr}(S),\ (\gamma,\lambda) \in \hat{S}_1^S} \; \|\Sigma - S\|_F^2 + \gamma \|\Sigma\|_1 + \lambda \sum_{i=1}^p \{\sigma_i(\Sigma) - \bar{\sigma}(\Sigma)\}^2$,   (6.3.3)

where $\hat{S}_1^S = \big\{(\gamma, \lambda):\ \gamma, \lambda > 0,\ \gamma \asymp \sqrt{(\log p)/n},\ \sigma_{\min}\{(S + \lambda t I) - \tfrac{\gamma}{2}\,\mathrm{sign}(S + \lambda t I)\} > 0\big\}$. The minimization in (6.3.3) over $\Sigma$ is for fixed $(\gamma, \lambda) \in \hat{S}_1^S$. The estimator $\hat{\Sigma}_S$ is positive definite and well-conditioned. Theorem 6.3.2 gives the rate of convergence of the estimator $\hat{\Sigma}_S$ in the Frobenius norm.

Theorem 6.3.2. Let $(\gamma, \lambda) \in \hat{S}_1^S$ and let $\hat{\Sigma}_S$ be as defined in (6.3.3). Under Assumptions A0, A1, A2,

$\|\hat{\Sigma}_S - \Sigma_0\|_F = O_P\Big(\sqrt{\tfrac{(s+p) \log p}{n}}\Big)$.   (6.3.4)

As noted in Rothman [2012], the worst part of the convergence rate here comes from estimating the diagonal entries.

6.4 Generalized JPEN Estimators and Optimal Estimation

The estimators in (6.2.1) and (6.3.3) encourage eigenvalue shrinkage with the same weight for all the eigenvalues. However, one might want to penalize the eigenvalues with different weights, especially if some prior knowledge about the structure of the true eigenvalues is available. To encourage different levels of shrinkage towards the center, we provide more general estimators, which we call weighted JPEN estimators.
6.4.1 Weighted JPEN Estimator for the Covariance Matrix Estimation

A modification of the estimator $\hat{R}_K$ is obtained by adding positive weights $a_i$ to the terms $\{\sigma_i(R) - \bar{\sigma}_R\}^2$. This leads to a weighted eigenvalue variance penalty, with larger weights amounting to greater shrinkage towards the center, and vice versa. Note that the choice of the weights allows one to use any prior structure of the eigenvalues (if known) in estimating the covariance matrix. The weighted JPEN correlation matrix estimator $\hat{R}_A$ is given by

$\hat{R}_A = \operatorname{argmin}_{R = R^T,\ \mathrm{tr}(R) = p,\ (\gamma,\lambda) \in \hat{S}_1^{K,A}} \; \|R - K\|_F^2 + \gamma \|R\|_1 + \lambda \sum_{i=1}^p a_i \{\sigma_i(R) - \bar{\sigma}_R\}^2$,   (6.4.1)

where $A = \mathrm{diag}(A_{11}, A_{22}, \ldots, A_{pp})$ with $A_{ii} = a_i$, and $\hat{S}_1^{K,A}$ is a restricted set of pairs $(\gamma, \lambda)$ analogous to $\hat{S}_1^K$, with $\gamma \asymp \sqrt{(\log p)/n}$ and an upper bound on $\lambda$ involving $\sigma_{\min}(K)$ and the smallest and largest weights $\min_i A_{ii}$ and $\max_i A_{ii}$. The proposed covariance matrix estimator is $\hat{\Sigma}_{K,A} = (S^+)^{1/2} \hat{R}_A (S^+)^{1/2}$. The optimization problem in (6.4.1) is convex and yields a positive definite estimator for each $(\gamma, \lambda) \in \hat{S}_1^{K,A}$. A simple exercise shows that the estimator $\hat{\Sigma}_{K,A}$ has the same rate of convergence as $\hat{\Sigma}_S$. How to choose the weights $a_i$ in (6.4.1) is discussed in the next chapter.

CHAPTER 7  AN ALGORITHM AND ITS COMPUTATIONAL COMPLEXITY

A problem is regarded as inherently difficult if its solution requires significant resources, whatever the algorithm used. Despite recent ambitious developments in solving convex optimization problems, efficient computation and scalability still remain two challenging problems in high dimensional data analysis. Existing methods that solve a convex optimization (here we mean minimization) problem can often be implemented very efficiently, in far less time than non-convex problems. An extensive literature exists for convex optimization [Bertsekas [2010], Vandenberghe and Boyd [2004], Bach et al. [2011], Beck and Teboulle [2009]]. The main challenge in covariance matrix estimation in the Gaussian likelihood framework is that the negative log-likelihood is not a convex function of the covariance matrix, which makes it a very hard optimization problem. In such situations, one way to facilitate the optimization is to approximate the negative log-likelihood by a convex function and then solve this approximate problem efficiently using existing algorithms. However, the solution thus obtained may not be an optimal solution to the original problem.

The existing algorithms for computing the optimal covariance matrix based on the Frobenius loss function have computational complexity of $O(p^3)$, where the constant in $O(p^3)$ can be really large, often more than the dimension of the matrix. The main reason behind such high computational complexity is that these methods require optimization over the cone of positive definite matrices in order for the estimator to be positive definite (for more on this topic, see Xue et al. [2012]). The JPEN framework provides an easy solution to the positive definiteness constraint that depends upon the choice of $(\gamma, \lambda)$. The computational complexity of the JPEN estimator is $O(p^2)$, and it is thus much faster than the other existing algorithms. The next section describes the derivation of the JPEN estimator introduced in Chapter 6.

7.1 A Very Fast Exact Algorithm

7.1.1 Derivation

The optimization problem (6.2.2) can be written as:

$\hat{R}_K = \operatorname{argmin}_{R = R^T,\ (\gamma,\lambda) \in \hat{S}_1^K} f(R)$,   (7.1.1)

where $f(R) = \|R - K\|_F^2 + \gamma \|R\|_1 + \lambda \sum_{i=1}^p \{\sigma_i(R) - \bar{\sigma}(R)\}^2$. Note that $\sum_{i=1}^p \{\sigma_i(R) - \bar{\sigma}(R)\}^2 = \mathrm{tr}(R^2) - 2\,\mathrm{tr}(R) + p$, where we have used the constraint $\mathrm{tr}(R) = p$. Therefore,

$f(R) = \|R - K\|_F^2 + \gamma \|R\|_1 + \lambda\{\mathrm{tr}(R^2) - 2\,\mathrm{tr}(R) + p\}$
$\phantom{f(R)} = (1+\lambda)\,\mathrm{tr}(R^2) - 2\,\mathrm{tr}\{R(K + \lambda I)\} + \mathrm{tr}(K^T K) + \gamma \|R\|_1 + \lambda p$
$\phantom{f(R)} = (1+\lambda)\,\big\|R - (K + \lambda I)/(1+\lambda)\big\|_F^2 + \gamma \|R\|_1 + c(K, \lambda)$,

where $c(K, \lambda)$ collects the terms that do not depend on $R$. The solution of this optimization problem is a soft-thresholding estimator, given by

$\hat{R}_K = \dfrac{1}{1+\lambda}\,\mathrm{sign}(K) \ast \mathrm{pmax}\big\{\mathrm{abs}(K + \lambda I) - \tfrac{\gamma}{2},\ 0\big\}$,   (7.1.2)

with $(\hat{R}_K)_{ii} = (K_{ii} + \lambda)/(1+\lambda)$, where $\ast$ denotes the elementwise product and $\mathrm{pmax}(A, b)_{ij} := \max(A_{ij}, b)$ is the elementwise maximum of each entry of the matrix $A$ and the scalar $b$. Note that for each $(\gamma, \lambda) \in \hat{S}_1^K$, $\hat{R}_K$ is positive definite.
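Combining the closed form (7.1.2) with the rescaling step (6.2.3) gives the complete JPEN procedure in a few lines. The following numpy sketch is a schematic implementation under the conventions above; the tuning values gam and lam are hand-picked for illustration rather than chosen from the restricted set $\hat{S}_1^K$ or by the cross-validation described in the next subsection, so positive definiteness of the output should be checked rather than assumed.

```python
import numpy as np

def jpen_correlation(K, gam, lam):
    """Closed-form JPEN correlation estimate, following (7.1.2)."""
    # Off-diagonals: soft-threshold K at gam/2, then scale by 1/(1+lam).
    R = np.sign(K) * np.maximum(np.abs(K) - gam / 2.0, 0.0) / (1.0 + lam)
    # Diagonal: (K_ii + lam)/(1 + lam), which equals 1 for a correlation matrix.
    np.fill_diagonal(R, (np.diag(K) + lam) / (1.0 + lam))
    return R

def jpen_covariance(S, gam, lam):
    """Rescale the regularized correlation matrix back to covariance scale, as in (6.2.3)."""
    d = np.sqrt(np.diag(S))
    K = S / np.outer(d, d)                 # sample correlation matrix
    return np.outer(d, d) * jpen_correlation(K, gam, lam)

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 200))
S = np.cov(X, rowvar=False)
sigma_hat = jpen_covariance(S, gam=0.3, lam=1.0)

print(np.linalg.eigvalsh(sigma_hat).min())  # positive when (gam, lam) lies in the restricted set
print(np.mean(sigma_hat == 0))              # proportion of exact zeros (sparsity)
```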
7.1.2 Choice of Regularization Parameters

For a given value of $\lambda$, we can find a value of $\gamma$ satisfying

$\sigma_{\min}\big\{(K + \lambda I) - \tfrac{\gamma}{2}\,\mathrm{sign}(K + \lambda I)\big\} > 0$,   (7.1.3)

which can be simplified to

$\gamma < \dfrac{\sigma_{\min}(K + \lambda I)}{C_{12}\, \sigma_{\max}(\mathrm{sign}(K))}, \qquad C_{12} \ge 0.5$.

Then $(\gamma, \lambda) \in \hat{S}_1^K$, and the estimator $\hat{R}_K$ is positive definite. Smaller values of $C_{12}$ yield a solution which is more sparse but may not be positive definite. The optimal values of $(\gamma, \lambda)$ were obtained following the approach suggested in Bickel and Levina [2008b], by minimizing the 5-fold cross-validation error

$\dfrac{1}{5} \sum_{i=1}^5 \|\hat{\Sigma}^v_i - \Sigma_{v_i}\|_1$,

where $\hat{\Sigma}^v_i$ is the JPEN estimate of the covariance matrix based on $v = 4n/5$ observations and $\Sigma_{v_i}$ is the sample covariance matrix based on the remaining $n/5$ observations.

7.1.3 Choice of Weights

For the optimization problem in (6.4.1), we chose the weights according to the following criteria. Let $E = (e_1, e_2, \ldots, e_p)$ be the diagonal elements of the sample covariance matrix $S$, arranged from smallest to largest. Let $k$ be the largest integer such that the $k$-th element of $E$ is less than 1. Let $b_i = e_i$ for $i \le k$, and $b_i = 1/e_i$ for $k$