HIGH-DIMENSIONAL VARIABLE SELECTION FOR SPATIAL REGRESSION AND COVARIANCE ESTIMATION

By

Siddhartha Nandy

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Statistics, Doctor of Philosophy

2016

ABSTRACT

Spatial regression is an important predictive tool in many scientific applications, and an additive model provides a flexible regression relationship between predictors and a response variable. Such a model has proved effective in regression-based prediction. In this article, we develop a regularized variable selection technique for building a spatial additive model. We find that the approaches developed for independent data do not work well for spatially dependent data. This motivates us to propose a spatially weighted ℓ2-error norm with a group LASSO type penalty to select additive components for spatial additive models. We establish the selection consistency of the proposed approach, where the penalty parameter depends on several factors, such as the order of approximation of the additive components and the characteristics of the spatial weight and of the spatial dependence. An extensive simulation study provides a vivid picture of the impact of dependent data structures and of the choice of a spatial weight on selection results, as well as of the asymptotic behavior of the estimates. We also investigate the impact of correlated predictor variables. As an illustrative example, the proposed approach is applied to lung cancer mortality data over the period 2000-2005, obtained from the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute, U.S.

Providing a best linear unbiased predictor (BLUP) is always a challenge for non-repetitive, irregularly spaced spatial data. The estimation process, as well as prediction, involves inverting an n x n covariance matrix, which computationally requires O(n^3) operations. Studies have shown that the covariance matrix of the observed process can be decomposed into two additive matrix components: measurement error and an underlying process which can be non-stationary. The non-stationary component is often assumed to be fixed but of low rank. This assumption allows us to write the underlying process as a linear combination of a fixed number of spatial random effects, known as fixed rank kriging (FRK). The benefit of a smaller rank has been used to improve the computation time to O(nr^2), where r is the rank of the low rank covariance matrix. In this work we generalize FRK by rewriting the underlying process as a linear combination of n random effects, although only a few among these are actually responsible for quantifying the covariance structure. Further, FRK assumes the covariance matrix of the random effects can be represented through its r x r Cholesky decomposition. The generalization leads us to an n x n Cholesky decomposition and a group-wise penalized likelihood where each row of the lower triangular matrix is penalized. More precisely, we present a two-step approach using a group LASSO type shrinkage estimation technique for estimating the rank of the covariance matrix and finally the matrix itself. We investigate our findings over a set of simulation studies and finally apply the method to rainfall data from Colorado, US.

Copyright by
SIDDHARTHA NANDY
2016

ACKNOWLEDGMENTS

I would like to take this opportunity to extend my appreciation to both my advisors, Prof. Chae-Young Lim and Prof. Tapabrata Maiti, for their guidance, encouragement, and for challenging me to thrive in whatever way I was successful. I would like to specifically thank Prof. Lim for her time in reading the draft and suggesting changes all throughout. I would also like to thank my other committee members, who gave me positive feedback and suggestions regarding future work. I would also like to thank my friends and family for the support they gave me during my times of need.

PREFACE

The work in this thesis can be summed up as a general problem of selecting the fixed and random effects components in a mixed-effects prediction model with Gaussian error, where the predicted variable is observed only on a grid of spatial locations. These grids could be irregular as well. The spatial location sites are in a two dimensional (2D) or three dimensional (3D) space in our applications. Although the final goal is to successfully detect both fixed and random effects components simultaneously, the first step toward solving this problem, achieving each separately, is quite challenging. It still requires specific attention compared to some existing methods, which can be viewed as special situations of our method.

This spatial data structure appears in abundance if one is interested in modeling climate parameters or in understanding brain connectivity using functional magnetic resonance imaging signals. Climate parameters are supposed to vary over both latitude and longitude, although often altitude is also considered; hence climate problems deal with either 2D or 3D space. On the other hand, brain activity varies over points on a 3D surface. All our efforts are directed at certain challenges in producing a feasible parsimonious model. An obvious step is to decompose the response variable into two components: a mean function, and a spatially dependent Gaussian error. The mean function has a general additive model structure, and the number of covariates grows as fast as the exponential of the number of sites in the study. The dependence structure of the Gaussian error can either be stationary, or non-stationary and anisotropic. The case of stationary covariance allows the number of parameters required for modeling to be controlled, and we use covariance functions viz. Exponential (Exp), Matérn (Mat), Inverse Multi-quadratic (InvMQ), and Gaussian (Gauss). The case of non-stationary and anisotropic covariance requires a different approach, as now each and every location requires multiple parameters, and this case can often be translated to a random effects model. So one of our chapters considers stationary covariance functions, and hence it deals with selecting the most appropriate set of fixed effects components and that only. The other chapter, with a non-stationary covariance, deals with selection in a mixed effects model, but we keep our focus of selection only on the random effects components. Once we are able to combine the two methods of selection, we will be simultaneously estimating a general additive mean function and a non-stationary covariance function of a Gaussian spatial process defined on a spatial surface.

Since the number of covariates in the general additive mean function grows exponentially with sample size, high-dimensional variable selection has its own challenges (Huang et al. (2010)). Additionally, we have a spatially dependent Gaussian error model, which requires special attention. This thesis documents our effort to extend Huang's work to our setup of a spatially dependent error model. We start by
considering a stationary covariance structure for the spatially dependent error model and trying to achieve a parsimonious mean model. A more elaborate picture is depicted in Chapter 1.

A relevant conclusion of our work at this point is that one can bypass the problem of estimating the stationary covariance function, if the sole interest is in selecting the true positive components, as long as one uses least squares weighted by the inverse of a certain stationary covariance matrix, covering both short and long range dependence. It also documents the fact that even the identity matrix, as a choice of the weight matrix for least squares, is an improvement over Huang's work in terms of under-selection of false positive components in the additive model. This is possible since the penalty parameter is still higher compared to Huang's choice.

Suppose the problem of interest is now not in reducing the number of covariates anymore; rather, it is in estimating the covariance function. A more general solution to the problem of estimating the covariance matrix is obviously one for non-stationary and anisotropic covariance. This generalization can be handled by representing the Gaussian error as a sum of two independent Gaussian processes, one a linear combination of a spatial random effects vector, and the other a measurement error with finite variance. The spatial random effects vector is assumed to have a non-stationary covariance dependence, but not of full rank. The factors involved in computing the linear combination are bi-variate spline functions.

The above decomposition of the overall Gaussian error into two independent Gaussian processes gives the overall model error a Gaussian process with a very special covariance structure. The covariance matrix of the overall model error is a full rank matrix; ideally the computation time of inverting it should be of the order of the cubic power of the dimension of the square matrix. But the above decomposition allows us to use the Sherman-Morrison-Woodbury (SMW) identity for matrix inverses. Along with the SMW identity, we exploit the fact that the random effects vector has a low rank covariance structure. In 2008, Cressie et al. exploited this property and showed that if the true value of the low rank non-stationary component is known, then one can invert the covariance matrix with a controlled computation time.

As it turns out, the spatial random effects component, which is responsible for the non-stationary component of the covariance matrix, can be represented as a linear combination of several spatial basis vectors. The weights corresponding to a particular spatial location depend on the distance between that location and some basis knot locations. The knots play the role of uniformity in many senses. Knowledge of the necessary and sufficient knot locations translates to the simpler version of Cressie's work in 2008. This technique has also been successfully implemented by Nychka et al. (2015) in their work on multi-resolution kriging. Our contribution in this direction is a data driven approach to estimate the number and positions of these knot locations. This problem can be rewritten as a selection problem for the spatial random effects components. A more detailed scrutiny convinced us to penalize the Gaussian likelihood by a group-wise LASSO penalty to overcome the curse of dimensionality. Although there has been quite extensive research on using the group LASSO or ℓ1/ℓ2 penalty for reducing the factors used to model the mean, this problem is quite unique and requires special attention for several reasons.

The rest of the thesis is organized as follows. Chapter 1 gives a mathematical introduction to the high dimensional models used in this work. It also throws light on why this problem of dimension reduction should be addressed separately. It breaks down the overall problem of reducing the dimension of both mean and covariance function into a few steps and finally discusses how to combine the pieces. Chapter 2 discusses the result on dimension reduction of the mean function while the covariance function is held fixed at a low dimension but unknown. It introduces the algorithm for estimating the rank of the non-stationary covariance matrix. It also provides some theoretical findings justifying the consistency of the parameters.

TABLE OF CONTENTS

LIST OF TABLES ................................... xi
LIST OF FIGURES .................................. xii
CHAPTER 1  INTRODUCTION .......................... 1
  1.1 Additive model building ..................... 1
  1.2 Estimating non-stationary covariance ........ 4
CHAPTER 2  ADDITIVE MODEL BUILDING FOR SPATIAL REGRESSION ... 8
  2.1 Method for selecting components in spatial additive models ... 12
  2.2 Main Theoretical Results .................... 15
    2.2.1 Selection of a penalty parameter and a spatial weight matrix ... 22
  2.3 Numerical investigation ..................... 24
    2.3.1 Simulation study ....................... 24
    2.3.2 Real data example ...................... 28
  2.4 Discussion ................................. 30
  2.5 Proofs of theorems ......................... 32
  2.6 Few theoretical extensions .................. 52
    2.6.1 Results for extension to different variability of each additive component ... 52
    2.6.2 Results for extension to a long range dependence ... 53
CHAPTER 3  ESTIMATING NON-STATIONARY SPATIAL COVARIANCE MATRIX USING MULTI-RESOLUTION KNOTS ... 57
  3.1 Methodology for estimating a non-stationary low rank covariance matrix ... 61
  3.2 Block Coordinate Descent algorithm with Proximal update ... 72
  3.3 Numerical investigation ..................... 75
    3.3.1 Simulation study ....................... 75
    3.3.2 Real data examples ..................... 76
  3.4 Discussion ................................. 76
BIBLIOGRAPHY ..................................... 80

LIST OF TABLES

Table 2.1  Average and standard deviation of the number of selected covariates using 400 data sets from the exponential covariance function, γ(h) = exp(−ρ|h|) with ρ = 0.5. The true number of nonzero components is 2. m x m unit square lattices are considered. ... 11

Table 2.2  Monte Carlo mean (standard dev.) of the selected number of nonzero covariates using 100 data sets under both the Independent and the Dependent setup using a spatially weighted group LASSO algorithm. ... 27

Table 2.3  Comparing the two methods (Independent approach and Dependent approach) for the real data example using the Group LASSO and Adaptive Group LASSO algorithms. Variable description by ID: 1: Population Mortality, 2: Poverty, 3: PM25, 4: Urban, 5: Nonwhite, 6: Never Married, 7: Agriculture, 8: Unemployment, 9: White Collar, 10: Higher than High school, 11: Age more than 65, 12: Age less than 18, 13: Crowding, 14: Foreign born, 15: Language isolation, 16: Median household income, 17: Same house no migration, 18: Move same County, 19: Move same State, 20: Move different State, 21: Normalized cost of living. ... 30

Table 2.4  Coefficient estimates (standard error) and corresponding p-values obtained from linear regression using R of the variables selected under the independent error assumption. ... 30

Table 3.1  Number of knots necessary for every resolution. ... 70

Table 3.2  Mean (standard deviation) of 200 Monte Carlo simulations for rank estimation of the nonstationary covariance matrix. ... 76

LIST OF FIGURES

Figure 3.1  Location sites in the study. ... 65
Figure 3.2  First resolution overlayed on location sites in the study. ... 65
Figure 3.3  Second resolutions overlayed on location sites in the study. ... 66
Figure 3.4  Resolutions overlayed on location sites in the study. ... 66
Figure 3.5  Third resolutions overlayed on location sites in the study. ... 67
Figure 3.6  Three resolutions overlayed on location sites in the study. ... 67
Figure 3.7  Three resolutions overlayed on location sites in the study. ... 68
Figure 3.8  Three resolutions overlayed on location sites in the study. ... 68
Figure 3.9  Quantile image plot of the estimated covariance matrix of the observed process. ... 77
Figure 3.10  Quantile image plot of the estimated covariance matrix of the random effects vector. ... 77

CHAPTER 1

INTRODUCTION

For many statistical problems arising from spatially observed data, it is natural to consider the underlying process Y = {Y(s), s ∈ S} to be a spatially varying process. It is also often assumed to have a mean function based on the available set of covariates. Let us denote the mean function by m(s). If we center the underlying process Y(s) by m(s), the residual ε(s) is often broadly assumed to have a Gaussian error structure or, in some more general notion, a Gaussian tail structure. We will discuss the dependence structure of this spatial process elaborately in this work. Therefore, the following model is the basis of this research:

    Y(s) = m(s) + ε(s).    (1.0.1)

1.1 Additive model building

Consider that the mean function m(s) is modeled using J covariates, denoted X(s) = (X_1(s), ..., X_J(s)), each of which is a spatially varying process too. In a simpler version, a linear relation is assumed between each of these X_j(s)'s and Y(s). This work generalizes the linear relation to the broader scenario of arbitrary unknown functions, which is referred to as a general additive mean function. This is a non-parametric version of the linearity assumption. The non-parametric nature of functional regression makes the mean prediction complicated to start with, which requires special attention, and we will follow some established standards from the literature for this complication. So the mean function m(s) can be represented via the following additive model,

    Y(s) = μ + Σ_{j=1}^{J} f_j(X_j(s)) + ε(s),    (1.1.1)

with μ being the overall mean. Now, to shed some light on the spatial dependence structure of ε(s), we differentiate the problem into two cases: either a stationary covariance structure, or a non-stationary and anisotropic covariance structure. Although the final goal is a joint estimation of the mean and covariance functions, this is not a straightforward extension of combining mean and covariance function estimation, so we will take one step at a time. The technique used for selecting additive components is shown to generalize its independent Gaussian error counterpart. We also consider a situation where the number of covariates, J, is increasing in n and can even grow exponentially with the number of subjects used to estimate the model. In a spatial study, the number of subjects is the number of location sites where we have observed the process Y(s). The dimension of the covariance matrix Σ increases with the number of locations considered under study. This is a serious issue, but the first goal is to identify those variables which are relevant for making a prediction of the response variable. We should keep in mind that here J > n, hence the standard variable selection techniques fall apart, and so we shall explore some recent advancements in penalized optimization. We will also point out the challenges in deciding the optimal penalty parameter specific to our problem.

The concept of penalized optimization has two major components. One is the statistical likelihood, which reflects the distribution of the error process or the nature of the error norm; we are interested in minimizing the likelihood subject to a constraint. The second component is the constraint, also called the penalty component. The penalty component is driven and motivated by the fact that the number of parameters in the model is larger than the number of spatial locations observed for the study. We want to penalize our original likelihood and shrink some of the components in our additive model to zero. Over the last decade, research on penalized optimization flourished due to the abundance of high dimensional problems in various fields of applied science.

The Gaussian error ε = (ε(s), s ∈ S) in the model in equation (1.1.1) has a spatial dependence. For future notational purposes, let us say ε ∼ N(0, Σ). From the perspective of estimating the mean or selecting variables in a prediction model, we can use the following penalized optimization function,

    Q(f_1, ..., f_J; λ_n) = (Y − Σ_{j=1}^{J} f_j(X_j))' Σ^{−1} (Y − Σ_{j=1}^{J} f_j(X_j)) + λ_n Σ_{j=1}^{J} p(f_j).    (1.1.2)

The known technique to control these general additive models is to approximate each of the functions f_j using a spline representation. The corresponding structure gives rise to a parametric formulation that allows us to use a very celebrated technique called the group Least Absolute Shrinkage and Selection Operator (LASSO). We leave the detailed discussion of penalized optimization for the rest of the thesis.

Joint estimation of the mean and the dependence structure, or even considering a spatially dependent error while avoiding estimation of the covariance function when using techniques like the group LASSO, complicates the problem across various spectrums. Equation (1.1.2) gives a flavor of penalized optimization of a weighted ℓ2 norm, where the weights are proportional to the inverse of the dependence structure of the Gaussian error process. The challenges of estimating Σ while detecting important additive components are both theoretical and algorithmic. So we will keep the joint estimation out of the picture for a while and deal with each problem separately.

There are two alternative ways to model this dependence structure. In one, we assume that the dependence between two location sites solely depends on the distance between the two sites; the covariance function in this case is a parametric function of the distance, and this technique of modeling the covariance function is referred to in the literature as stationary covariance. The alternative, a more general model, relaxes the assumption of dependence through distance between location sites by introducing the class of non-stationary and anisotropic covariance functions. The next chapter, Chapter 2, is on selecting the necessary non-zero components out of all the J covariates under study without going into the complications of estimating Σ. To start with, we restrict ourselves to the class of stationary covariance functions, i.e., Σ = [σ_t(s − s'); s, s' ∈ S], where s − s' is the measure of distance between s and s'. On the other hand, Chapter 3 considers a general non-stationary covariance function.

Although Σ is unknown, in the next chapter the covariance function referred to above as σ_t(·) belongs to one of a class of parametric covariance functions, viz. Exponential, Gaussian, Inverse Multi-quadratic, or Matérn. We successfully defend the idea that the selection of components among all the covariates in our additive model is robust to the choice of the precise form of the true covariance function. We also studied at length how the penalty parameter of our spatially dependent model should differ compared to its independent counterpart. Instead of using the true covariance matrix Σ in (1.1.2), we use a different spatial weight matrix W = [σ_w(s − s'); s, s' ∈ S], which a user can choose based on an exploratory analysis of the data. Henceforth we optimize,

    Q(f_1, ..., f_J; λ_n) = (Y − Σ_{j=1}^{J} f_j(X_j))' W^{−1} (Y − Σ_{j=1}^{J} f_j(X_j)) + λ_n Σ_{j=1}^{J} p(f_j).    (1.1.3)

1.2 Estimating non-stationary covariance

As mentioned, our next goal is to explore how to estimate the non-stationary covariance function. The complication is twofold. First, joint estimation of both mean and covariance for a Gaussian process works better if the overall likelihood is optimized, rather than minimizing just the prediction error as in (1.1.2). Second, the computation time of the likelihood, or of the gradient of the optimization function, for a non-stationary matrix is of cubic order in the dimension of the square matrix. To ease matters, let us reduce the burden of selection of additive components. Chapter 3 introduces the technique of using spatial random effects models and multi-resolution knots to model the non-stationary covariance matrix, overcoming both complications.

Spatial random effects modeling allows us to incorporate another independent additive Gaussian component π(s) along with the stationary component ε(s). For more simplicity, we consider ε ∼ N(0, σ²I), i.e., the stationary covariance function takes a non-zero value only when the distance is zero. The objective of using a spatial random effects model is to capture the covariance between the response variable Y at two different locations s and s' through,

    cov(Y(s), Y(s')) = R(s)'_{1×r} K_{r×r} R(s')_{r×1},          if s ≠ s',
    cov(Y(s), Y(s')) = R(s)'_{1×r} K_{r×r} R(s')_{r×1} + σ²,     if s = s'.    (1.2.1)

In the spatial literature, the parameter σ is called the nugget effect. Note that the above representation requires a valid quadratic form multiplication, where the length of the vector R(s)' and the dimension of the positive definite matrix K must coincide; this common dimension is denoted by r. The magnitude of r plays an important role in reducing the computation time of the inverse of the overall covariance matrix σ²I + RKR'. The choice of r has always been a challenge. To understand the nature of the parameter r, consider the following alternative representation of (1.0.1),

    Y(s) = m(s) + π(s) + ε(s) = m(s) + R(s)'η + ε(s),    (1.2.2)

where η ∼ N_r(0, K) is assumed and used in the last chapter. A magnified look at the representation π(s) = R(s)'η gives

    π(s) = Σ_{k=1}^{r} R_k(s) η_k,    (1.2.3)

where R(s) = (R_1(s), R_2(s), ..., R_r(s))' contains the weights for the r components of the spatial random effects η. The variability of the random effects, K, is a positive definite matrix, so we can use a Cholesky representation K = ΛΛ'. Before we go into the details of how we propose to estimate both parameters r and K, let us note that the nature of m(s) is relaxed from a general additive model to just a linear model. We even disregard the need for selecting the variables, assuming J < n.

[...] we use the weights

    w_{jn} = 1/‖β̂_{gL,j}‖₂   if ‖β̂_{gL,j}‖₂ > 0,
    w_{jn} = ∞               if ‖β̂_{gL,j}‖₂ = 0,    (2.1.7)

in an objective function which is called an adaptive group LASSO (AgL) objective function. That is, we define an AgL objective function, Q_{n2}(β; λ_{n2}) := Q_n(β; λ_{n2}) with w_n given in (2.1.7). The estimate from this updated objective function, β̂_AgL(λ_{n2}) = argmin Q_{n2}(β; λ_{n2}), as a function of λ_{n2}, is referred to as the adaptive group LASSO estimate.

We define 1/0 = ∞, so components not selected by the gL method are not included in the AgL method. Also, due to the nature of the weights in the penalty term, the AgL objective function puts a higher penalty on components with smaller ℓ2-norm and a lower penalty on components with larger ℓ2-norm of the gL estimates. Hence, the components with larger gL estimates have a higher chance of being selected in the final model. Finally, the AgL estimates for μ and f_j are given by μ̂ = Ȳ = (1/n) Σ_{s∈S} Y(s) and f̂_{AgL,j}(X_j(s)) = Σ_{l=1}^{m_n} β̂_{AgL,jl} B_l(X_j(s)) for all j = 1, ..., J, respectively.

Computationally efficient objective function: The objective function (2.1.6) involves Σ_W^{−1}, which may cause computational complications, particularly for large n. To avoid such a situation, we reformulate (2.1.6) using a Cholesky decomposition of Σ_W^{−1}. Let Σ_W^{−1} = LL', where L is a lower triangular matrix. Then, (2.1.6) can be rewritten as,

    Q_n(β; λ_n) = (Z_c − D_c β)'(Z_c − D_c β) + λ_n ‖β‖_{2,1,w_n},    (2.1.8)

where Z_c = L'Y_c and D_c = L'B_c with D_cj = L'B_cj. Then the objective function becomes one with no spatial weight matrix and a new response variable Z, so that we can adopt an available algorithm for a group LASSO method with independent errors. Note that we use a known spatial weight matrix, so that the Cholesky decomposition to obtain L is done only once.

2.2 Main Theoretical Results

In this section, we present results on asymptotic properties of the gL and AgL estimators introduced in Section 2.1. We start by introducing some notation. Let A_0 and A be the sets of zero and nonzero components, respectively. Without loss of generality, we consider A = {1, 2, ..., q} and A_0 = {q+1, ..., J}. Also, we assume that there exists Ã_0 that satisfies Σ_{j∈Ã_0} ‖β_j‖₂ ≤ η_1 for some η_1 ≥ 0, and let Ã = {1, 2, ..., J} \ Ã_0. The existence of Ã_0 is referred to as the generalized sparsity condition (GSC) [Zhang and Huang (2008)].

First, we consider the selection and estimation properties of the gL estimator. Let Ã_β be the index set of nonzero gL estimates of the β_j and Â_f be the index set of nonzero gL estimates of the f_j. The necessary assumptions for studying asymptotic properties are given in Assumption 2.2.

(H1) Among the J covariates, the number of nonzero components, q, is fixed, and there exists a constant k_f > 0 such that min_{1≤j≤q} ‖f_j‖₂ ≥ k_f.

(H2)
Thereexistsv>0suchthatmins6=s02Skss0k2>v,wherekk2foravectoristhe`2-normsothatkss0k2istheEuclideandistancebetweensands0.(H3)Therandomvector=f(s);s2Sg˘Gaussian(0;T),whereTisconstructedbyastationarycovariancefunctionT(h)whichsatis˝esRDnT(h)dh<1.DnˆRdisthesamplingregionthatcontainsthesamplinglocationsS.Withoutlossofgenerality,weassumethattheoriginofRdisintheinteriorofDnandDnisincreasingwithn.(H4)fj2FandEfj(Xj)=0forj=1;:::;J,whereF=nfjf(k)(s)f(k)(t)Cjstj;8s;t2[a;b]oforsomenonnegativeintegerkand2(0;1].Alsosupposethat˝=k+>1.(H5)ThecovariatevectorXhasaboundedcontinuousdensityfunctiongj(x)ofXjonabondeddomain[a;b]forj=1;;J.(H6)mn=O(n)with1=6=1=(2˝+1)<(1)1=3.(H7)Wisconstructedbyastationarycovariancefunction(h)thatsatis˝esthesameconditionasT(h)in(H3)andW)=1W)MforsomeM<1,whereistheconditionnumberofamatrix.Theassumption(H1)indicatesthatweneedstrongsignalsfornonzerocomponentstodistinguishthemfromthenoise.Theassumption(H2)impliesthatweconsiderincreasing-domainasymptotics[Stein(1999)]forourlargesampleproperties,whichisacommonsam-plingassumptionfortheasymptotictheoryofspatialstatistics.16Theassumption(H3)speci˝esdistributionalassumptionofspatialerroranditsspa-tialdependence.Commonlyavailablestationaryspatialcovariancemodelssatisfyintegrableassumption.Forexample,popularspatialcovariancefunctionssuchasExponential,Maérn,Gaussiancovariancefunctionsareallintegrable.Fortheexplicitexpressionofsuchcovari-ancefunctions,pleaseseethesupplementarymaterial.WeassumethatDncontainstheorigintohavespatiallaghandobservationlocationsonthesamespatialdomain.Thestationaryspatialcovariancefunction(h)givesamarginalvarianceath=0(i.e.s=s0)anddecreasestowardzeroaskhk!1.Theconditionon(h)intheassumption(H3)becomesmeaningfulbyassumingDncontainstheorigin.Forexample,theconditiondoesnotguaranteewhether(h)isintegrableifwedonotassumethatDncontainstheorigin.Theassumption(H4)considersfjisintheclassoffunctionsde˝nedon[a;b]suchthatthekthderivativesatis˝estheLipschitzconditionoforderandzeroexpectationconditionisneededtoavoi
danidenti˝abilityissue.Theassumption(H5)isneededtohavesplineapproximationforadditivecomponents.Theassumption(H6)isrelatedtothenumberofB-splinebasestoapproximateadditivecomponents.Theparameterinassumption(H6)controlsthesmoothnessofadditivecomponents,whereimposinganupperandlowerboundtoimpliesthatthosefunctionscanbeneithertoosmoothnortoowiggly.Iffunctionsaretoosmooth,thenitwouldbehardtodetectthosedistinctivelyfromtheoverallmean,whereas,ifthefunctionsaretoowiggly,thenitwouldbehardtodetectthosedistinctivelyfromtherandomnoise.Theassumption(H7)impliesthataspatialweightmatrix,W,isawell-conditionedmatrix.Notethatweconsiderastationaryspatialcovariancefunctiontoconstructaspatialweightmatrix.Undertheassumption(H2),onecanseethatthesmallesteigenvalueofthespatialweightmatrixconstructedbyseveralspatialcovariancefunctionsisboundedawayfromzero[see,e.g.Wendland(2005)].Ontheotherhand,thelargesteigenvalueofaspa-tialweightmatrixisrelatedtothenormofthematrix.BytheGer²gorin'stheorem[HornandJohnson(1985)],wecanshowthatˆmaxW)maxjPkjW;jkj=maxjPk(sjsk),17whereW;jkisthe(j;k)-thentryofW.Thisisboundedbya˝niteconstantwhichisinde-pendentofnforthespatialweightmatrixfromstationaryintegrablecovariancefunctions.Now,weintroducetheconsistencyresultforthegLestimator.Theorem1.SupposethatconditionsinAssumption2.2hold,andifn1>Cˆmax(L)qn1+mnlog(Jmn)forasu˚cientlylargeconstantC.Then,wehave(a)PJj=1k^gL;jjk22=Op 
ˆ2max(L)m3nlog(Jmn)n1+mnn1+1m2˝1n+4m2n2n1n2!,(b)Ifm2n2n1n2!0asn!1,allthenonzerocomponentsj;1jqareselectedwithprobability(w.p.)convergingto1.Thespatialdependenceofthedatacontributestothetheoreticallowerboundofthepenaltyparameterandtheconvergenceratebyanadditionalmntermcomparedtotheinde-pendentdatacase.Recallthatmnisthenumberofsplinebasisfunctionsforapproximatingthefj's.Thisadditionalmncomesfromtheboundoftheexpectedvalueforafunctionofspatiallydependenterror(see,Lemma??).ThespatialweightmatrixalsocontributestothetheoreticallowerboundofthepenaltyparameterandtheconvergencerateviathemaximumeigenvalueofL.RecallthatListheCholeskydecompositioncomponentof1W.Thisimpliesthatspatialdependenceofthedataandaspatialweightmatrixa˙ectthecon-vergencerateandtheselectionofcomponentsviathechoiceofthepenaltyparameter.Alltheseadditionalquantitiesmakethelowerboundofthepenaltyparameterforspatialad-ditivemodelslargercomparedtotheindependentdatacase.AlargerlowerboundreducesoverestimationevenincaseofW=Iduetotheadditionalmnfromspatialdependence.Thisisalsoobservedinthesimulationstudy.ToproveTheorem1,weneedtocontrolsplineapproximationoffjaswellasthespatialdependenceintheerrorterms.Theboundednessofthenumberofselectedcomponentsiscriticaltoprovethetheorem.TheseareinvestigatedaslemmasinAppendix2.5andusedin18theproofofthetheorem.Althoughtheapproachtoprovetheasymptoticpropertiesstatedaboveforspatialadditivemodelsissimilartotheoneforadditivemodelswithindependenterrors,detailsaredi˙erentduetospatialdependenceaswellasaspatialweightmatrixintheproposedmethod.ProofsofTheorem1aswellassubsequenttheoremsaregivenintheAppendix2.5.Nexttheoremprovidesconsistencyintermsoftheestimatednonlinearcomponents^fj.Theorem2.SupposethatconditionsinAssumption2.2holdandifn1>Cˆmax(L)qn1+mnlog(Jmn)forasu˚cientlylargeconstantC.Then,(a)k^fgL;jfjk22=Op 
ˆ2max(L)m2nlog(Jmn)n1+1n1+1m2˝n+4mn2n1n2!forj2~A[A,where~AistheindexsetofnonzerogLestimatesforj,(b)Ifmn2n1n2!0asn!1,allthenonzerocomponentsfj;1jqareselectedw.p.convergingto1.Notethatby(a)and(b)ofTheorem2withn1=O(ˆmax(L)qn1+mnlog(Jmn))andmn=O(n)with1=6=1=(2˝+1)<(1)1=3(assumption(H6)),wehave,(i)k^fgL;jfjk22=Op ˆ2max(L)n2log(Jmn)n1!;forj2~A[Aand,(ii)Ifˆ2max(L)log(J)n12!0asn!1then,withprobabilityconvergingto1,allthenonzerocomponentsfj;1jqareselected.Hence,wecaninferthat,thenumberofadditivecomponents,J,canbeaslargeupto,expoˆ2min(L)n(12)2:Inparticular,ifweneedsecondorderdi˙erentiabilityforthefunctionsfj,then˝=2implies=1=5andinthiscaseJcanbeaslargeasexpoˆ2min(L)n(3=5)3=5.KeepingL˝xed,wecanseethatJincreasesexponentiallyinn.19Next,westatetheadditionalassumptionsfortheasymptoticpropertiesoftheAgLestimator.(K1)Theinitialestimators,bgL;j,arern-consistent:rnmax1jJbgL;jj2=Op(1);asrn!1;andthereexistsaconstantkb>0suchthatP(minj2Ak^gL;jk2kbbn1)!1wherebn1=minj2Akjk2m1=2n,wherefortwopositivesequencesanandbn,anbnifthereexistsa1anda2suchthat00and20=0ifkxk2=0.De˝neJ0=jA[fj:k^AgL;jk2>0gj.NotethatJ0isboundedbya˝nitenumberw:p:convergingto1byTheorem1.Theorem3.SupposethatconditionsinAssumptions2.2and2.2aresatis˝ed.Then,(a)P(^AgL=0)!1,(b)Pqj=1k^AgL;jjk22=Op 
ˆ2max(L)m3nlog(J0mn)n1+mnn1+1m2˝1n+4m2n2n2n2!.ByTheorem3,wecanshowthattheproposedAgLestimatorofcanseparateouttruezerocomponentsforthespatialadditivemodel.SimilartoTheorem1,wehaveadditionalquantitiesontherighthandsideoftheexpressioninTheorem3(b),whichareduetothespatialdependenceinerrorsaswellasuseofaspatialweightmatrix.SinceJ0isboundedbya˝nitenumberw:p:convergingto1,theconvergencerateoftheAgLestimatorofisfasterthanthegLestimatorof.NexttheoremshowsthatestimatedcomponentsforfjsinthespatialadditivemodelusingtheAgLestimatorofcanidentifyzerocomponentsconsistently.Also,thetheoremprovidestheconvergenceratefortheestimatedcomponents.Theorem4.SupposethatconditionsinAssumptions2.2and2.2aresatis˝ed.Then,(a)P(k^fAgL;jk2>0;j2Aandk^fAgL;jk2=0;j=2A)!1,(b)Pqj=1k^fAgL;jfjk22=Op ˆ2max(L)m2nlog(J0mn)n1+1n1+1m2˝n+4mn2n2n2!.TheupperboundsoftheconvergenceratesinTheorems14showthattheconvergenceratesareslowerthanthosefortheindependentdatacase,whichisnotsurprisinggiventhatwearedealingwithdependentdata.Also,ourtheoreticalresultsshowweneedanimprovedlowerboundofthepenaltyparameterforspatialadditivemodels,whichiscriticalsincethepenaltyparameterissensitivetotheselectionresultsinpractice.Thisissupportedbythesimulationstudyinthenextsectionwherewecanclearlyseeworseperformancewhenweblindlyapplytheapproachdevelopedforindependentdatatoselectnon-zerocomponentsinaspatialadditivemodel.Weassumedthateachadditivecomponentsharesthesamesmoothnessby(H4)inAssumption2.2.Wecanextendourresultstoallowdi˙erentlevelsofsmoothnessforadditive21componentswithoutmuchdi˚culty.NecessarychangesinAssumption2.2are(H4)0fj2FjandEfj(Xj)=0forj=1;:::;J,whereFj=ˆfjf(kj)(s)f(kj)(t)Cjstjj;8s;t2[a;b]˙forsomenonnegativeintegerkjandj2(0;1].Alsosupposethat˝j=kj+j>1,and,(H6)0mnj=O(nj)with1=6j=1=(2˝j+1)<(1)1=3;8j=1;2;:::;J,wheremnjisthenumberofB-splinebases(orknots)toapproximatethejthadditivecomponent.Therevisedtheoremsandlemmasareprovidedinthesupplementarymaterial,wheremn=maxj=1;:::;Jmnj.Resultsaresimilarexceptafewchangesduetotheintroduction
of $m_{nj}$. In practice, we standardize the range and variability of all additive components and use the same number of knot points for each component, which we choose as $m_n$. Note that the largest number of knot points, $m_n$, can cover all smooth additive components. Since the order of $m_n$ also has to lie between $n^{1/6}$ and $n^{1/3}$ according to assumption (H6)$'$, we use this bound as a guideline for choosing $m_n$. Using the same number of knot points for each component in practice is also suggested by Hastie and Tibshirani (1990).

2.2.1 Selection of a penalty parameter and a spatial weight matrix

The selection result is sensitive to the choice of the penalty parameter (or regularization parameter). In addition to the penalty parameter, the proposed approach for a spatial additive model requires choosing a spatial weight matrix as well. The theoretical results only provide a lower bound on the penalty parameter, which involves information about the spatial weight matrix through $\rho_{\max}(L)$. Approaches for finding an optimal penalty parameter in penalized methods for independent data cannot be applied directly to our setting because of the spatial weight matrix. A complete theoretical investigation for finding an optimal value is interesting, but beyond the scope of this article. Moreover, a theoretically obtained optimal choice of the penalty parameter is not feasible in practice, since it is valid only asymptotically and often depends on unknown nuisance parameters of the true model [Fan and Tang (2013)]. Thus, we demonstrate below a practical way of selecting the penalty parameter, guided by the theoretical results derived in this article.

For simplicity, we assume the spatial weight matrix is constructed from a class of stationary spatial covariance functions controlled by a parameter $\rho$. We call $\rho$ the spatial weight parameter. Example classes of spatial covariance functions that satisfy assumption (H7) for constructing a spatial weight matrix are the Gaussian covariance function and the inverse multiquadratic function; their explicit expressions are given in the supplementary material. The selection problem then reduces to selecting a spatial weight parameter. To choose a penalty parameter jointly, we set the penalty parameter to its theoretical lower bound at each given value of $\rho$, so that one parameter (the spatial weight parameter) controls both the regularization level and the spatial weight. Finally, we adopt the generalized information criterion (GIC) [Nishii (1984); Fan and Tang (2013)] as the measure for choosing a spatial weight parameter. That is, we find the $\rho$ that minimizes
$$\mathrm{GIC}(\lambda_n(\rho)) = \log(\mathrm{RSS}) + \mathrm{df}_n \, \frac{\log\log(n)\,\log(J)}{n}.$$
In practice, we consider a sequence of $\rho$ values and choose the one that minimizes GIC. While experimenting with different information criteria and comparing with the existing cross-validation criterion suggested by Yuan and Lin (2006), we noticed that we cannot obtain an initial least squares estimator when $J > n$. Thus, we define the degrees of freedom, $\mathrm{df}_n$, as the total number of estimated nonzero coefficients, i.e., $\mathrm{df}_n = \hat{q}_n m_n$, where $\hat{q}_n$ is the size of the active set of selected variables. For the independent data case, Huang et al. (2010b) suggested the extended Bayesian information criterion (EBIC) to choose a penalty parameter, which requires choosing an additional parameter ($\gamma$ in their paper). From the simulation study, we found that a value smaller than the suggested value of this additional parameter works better in our setting. Given the sensitivity of EBIC to this additional parameter, we instead recommend using GIC, which has no additional parameter.

There are two places where a spatial covariance model is considered. One is for modeling the spatial dependence of the data, and the other is for constructing the spatial weight matrix in the objective function. Our theory shows that the method is valid for a class of underlying spatial distributions satisfying condition (H3) and for a class of spatial weight matrices satisfying condition (H7). However, some spatial covariance models that satisfy (H3) may not satisfy (H7). In this regard, our approach is more general, as it covers the case where the true spatial covariance matrix is used as the spatial weight matrix as long as both conditions (H3) and (H7) hold. Further, our objective is not to estimate the true covariance matrix, and our method does not require estimating the true covariance matrix to select additive components.
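The joint choice of the spatial weight parameter and the penalty parameter through GIC can be sketched as follows. This is an illustration only: the helper `fit_at_rho`, which refits the spatially weighted group LASSO with the penalty set to its theoretical lower bound at a given $\rho$ and returns the residual sum of squares together with the number of selected components, is a hypothetical stand-in for the actual fitting routine.

```python
import numpy as np

def gic(rss, n_selected, m_n, n, J):
    # GIC(lambda_n(rho)) = log(RSS) + df_n * log(log n) * log(J) / n,
    # with df_n = (number of selected components) * m_n
    df_n = n_selected * m_n
    return np.log(rss) + df_n * np.log(np.log(n)) * np.log(J) / n

def select_weight_parameter(rho_grid, fit_at_rho, m_n, n, J):
    # fit_at_rho(rho) -> (RSS, q_hat): refit at rho with the penalty at its
    # theoretical lower bound, return residual sum of squares and active-set size
    scores = [gic(rss, q_hat, m_n, n, J) for rss, q_hat in map(fit_at_rho, rho_grid)]
    return rho_grid[int(np.argmin(scores))]
```

A single grid search over $\rho$ then controls both the regularization level and the spatial weight, which is exactly the one-parameter reduction described above.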
2.3 Numerical investigation

2.3.1 Simulation study

In this section, we present a simulation study to illustrate our theoretical findings. We consider $S = \{(s_1, s_2) : s_1, s_2 = 1, \ldots, m\}$ with $m = 6, 12, 24$, which gives sample sizes $n = 36, 144, 576$, respectively, and we consider $J = 15, 25, 35$. We have $q = 4$ nonzero components,
$$f_1(x) = 5x, \qquad f_2(x) = 3(2x-1)^2, \qquad f_3(x) = \frac{4\sin(2\pi x)}{2 - \sin(2\pi x)},$$
$$f_4(x) = 0.6\sin(2\pi x) + 1.2\cos(2\pi x) + 1.8\sin^2(2\pi x) + 2.4\cos^3(2\pi x) + 3.0\sin^3(2\pi x),$$
and the remaining components are set to zero; that is, $f_j(x) \equiv 0$ for $j = 5, \ldots, J$. The covariates are $X_j = (W_j + tU)/(1+t)$ for $j = 1, \ldots, J$, where the $W_j$ and $U$ are i.i.d. Uniform$[0,1]$. Note that the correlation between $X_j$ and $X_k$ is $t^2/(1+t^2)$; therefore, the dependence among the covariates increases with $t$. In the simulation study, we consider $t = 1$ and $t = 3$. The nonzero components $f_1, \ldots, f_4$ given above are taken from Huang et al. (2010b) and were originally introduced by Lin and Zhang (2006).

We assume that the error process follows a stationary mean-zero Gaussian process with spatial covariance function $\gamma(h)$. To investigate the selection performance of the proposed method by the type of spatial dependence of the process, we consider three different covariance models: exponential, Matérn, and Gaussian covariance functions. The exponential and Gaussian covariance functions have two parameters, $\sigma^2$ and $\rho$, and the Matérn covariance function involves one more parameter, $\nu$. The expressions of these spatial covariance functions are given in the supplementary material. For simplicity, we set the variance $\sigma^2 = 1$. For the exponential covariance function, we consider $\rho = 0.5$ and $1$. For the Matérn covariance function, we consider $\nu = 3/2$ and $5/2$, and for each of these cases we choose $\rho$ to be $1.5$ and $2.5$. For the Gaussian covariance function, we consider $\rho = 1.5$ and $2.5$. The parameter $\rho$ controls how fast the covariance function decays as the distance $\|h\|$ increases, and thus the level of spatial dependence within the covariance model. Given the specified $f_j$'s and the spatial dependence structure of the error process described above, we generate 100 data sets for each case.

The three covariance functions are characterized by the mean square differentiability of the process. A Gaussian process with an exponential covariance function is continuous in the mean square sense, while a Gaussian process with a Gaussian covariance function is infinitely differentiable. As an intermediate case, a Gaussian process with a Matérn covariance function is $\lceil \nu \rceil - 1$ times differentiable in the mean square sense, where $\lceil x \rceil$ is the smallest integer larger than or equal to $x$ [Stein (1999)]. Mean square differentiability is related to the smoothness of the processes, that is, their local behavior; thus, by considering different types of covariance functions we can also investigate selection performance in view of the local properties of the processes.

Once the data are generated, the selection performance of the proposed method is examined under several choices of spatial weight matrix. In particular, we consider two classes: Gaussian and inverse multiquadratic functions. In addition, we also consider the identity matrix and the true covariance function of the underlying process as spatial weight matrices for comparison. When a spatial weight matrix is controlled by a spatial weight parameter, we apply the approach introduced in Section 2.2.1 to select both a spatial weight parameter and a penalty parameter. When we consider no spatial weight matrix, we do not have a spatial weight parameter to control; in this case, we consider a sequence of penalty parameter values around the theoretical lower bound of $\lambda_{n1}$ and choose the one that minimizes GIC.

We also implemented the method by Huang et al. (2010b), which was developed for independent data, for comparison; we refer to this approach as the `independent approach'. Following standard practice, we computed the average and standard deviation of True Positives (TP) and False Positives (FP). TP is the number of additive components that are correctly selected; FP is the number of additive components that are falsely selected. In our simulation setting, the desired values of TP and FP are $4\,(=q)$ and $0$, respectively.

Table 2.2 shows selected results for the case with $t = 1$ when generating $X_j$, for $m = 6, 24$ and a selected set of true correlation parameter values. Complete simulation results are given in the supplementary material. To assess the applicability of our method when the dependence between covariates increases, we include results corresponding to $t = 3$ in the supplementary material as well. The first row (where the spatial weight is `None (Indep)') in each covariance model block of Table 2.2 corresponds to the independent data approach. This clearly indicates overestimation of FP components. Looking at TP alone, one may think the result is good for the independent approach, but at the expense of a larger FP, the method is actually selecting many more components than the truth. The trend remains even when the sample size increases. This is expected, as discussed before. The following rows are results for various choices of spatial weight matrices. Our method successfully reduces overestimation. Even when the spatial weight matrix is an identity matrix under dependent error models, our method still reduces overestimation of selected components compared to the independent approach. The result persists across the sample sizes ($m$), the covariance models we considered, the choices of spatial weight matrices, and of course a large number of covariates ($J$).

                                           m = 6 (gLASSO)              m = 24 (gLASSO)
 J   Cov. model    Spatial weight     TP            FP             TP      FP
 15  Exp(0.5)      None (Indep)       3.74 (0.48)   3.63 (1.83)    4 (0)   0.55 (0.77)
                   I                  3.17 (0.82)   2.02 (1.34)    4 (0)   0 (0)
                   Gauss              3.01 (0.85)   1.76 (1.24)    4 (0)   0 (0)
                   InvMQ              2.94 (0.93)   1.58 (1.44)    4 (0)   0 (0)
                   True               1.66 (1.02)   0.29 (0.56)    4 (0)   0 (0)
     Mat 3/2(2.5)  None (Indep)       3.8 (0.45)    3.63 (1.86)    4 (0)   0.34 (0.57)
                   I                  3.26 (0.77)   1.87 (1.5)     4 (0)   0.05 (0.22)
                   Gauss              2.98 (0.85)   1.64 (1.34)    4 (0)   0.01 (0.1)
                   InvMQ              2.74 (0.86)   1.5 (1.18)     4 (0)   0 (0)
                   True               0.8 (0.68)    0.09 (0.32)    4 (0)   0 (0)
     Mat 5/2(2.5)  None (Indep)       3.82 (0.41)   3.59 (2.08)    4 (0)   0.58 (0.88)
                   I                  3.22 (0.76)   2.06 (1.55)    4 (0)   0.04 (0.2)
                   Gauss              2.91 (0.89)   1.72 (1.35)    4 (0)   0 (0)
                   InvMQ              2.97 (0.89)   1.67 (1.33)    4 (0)   0 (0)
                   True               0.37 (0.56)   0.04 (0.24)    4 (0)   0 (0)
     Gauss(1.5)    None (Indep)       3.76 (0.49)   3.97 (2.06)    4 (0)   0.59 (0.81)
                   I                  3.23 (0.75)   2.07 (1.24)    4 (0)   0.03 (0.17)
                   Gauss              3.02 (0.82)   1.86 (1.25)    4 (0)   0.01 (0.1)
                   InvMQ              2.76 (0.79)   1.58 (1.17)    4 (0)   0 (0)
                   True               3.3 (0.73)    2.32 (1.55)    4 (0)   0.12 (0.33)
 35  Exp(0.5)      None (Indep)       3.38 (0.74)   6.22 (2.39)    4 (0)   1.49 (1.55)
                   I                  2.62 (0.94)   3.15 (2.14)    4 (0)   0.04 (0.2)
                   Gauss              2.39 (0.98)   2.59 (1.84)    4 (0)   0.02 (0.14)
                   InvMQ              2.22 (1.04)   2.38 (1.79)    4 (0)   0 (0)
                   True               1.21 (0.95)   0.48 (0.78)    4 (0)   0 (0)
     Mat 3/2(2.5)  None (Indep)       3.48 (0.67)   5.79 (2.32)    4 (0)   1.25 (1.37)
                   I                  2.71 (0.9)    2.94 (1.75)    4 (0)   0.1 (0.39)
                   Gauss              2.57 (0.92)   2.59 (1.6)     4 (0)   0.01 (0.1)
                   InvMQ              2.25 (0.9)    2.24 (1.68)    4 (0)   0 (0)
                   True               0.52 (0.58)   0.12 (0.41)    4 (0)   0 (0)
     Mat 5/2(2.5)  None (Indep)       3.33 (0.74)   5.59 (2.19)    4 (0)   1.34 (1.51)
                   I                  2.72 (1)      3.06 (1.97)    4 (0)   0.1 (0.33)
                   Gauss              2.53 (1.04)   2.57 (1.81)    4 (0)   0.01 (0.1)
                   InvMQ              2.25 (0.97)   2.17 (1.68)    4 (0)   0 (0)
                   True               0.34 (0.57)   0.05 (0.26)    4 (0)   0 (0)
     Gauss(1.5)    None (Indep)       3.15 (0.88)   6.37 (1.95)    4 (0)   1.78 (1.54)
                   I                  2.44 (1.02)   3.35 (1.98)    4 (0)   0.08 (0.27)
                   Gauss              2.24 (0.95)   2.83 (1.61)    4 (0)   0.02 (0.14)
                   InvMQ              2.18 (0.9)    2.2 (1.41)     4 (0)   0 (0)
                   True               2.78 (0.87)   3.66 (1.74)    4 (0)   0.17 (0.45)

Table 2.2: Monte Carlo mean (standard deviation) of the number of selected nonzero covariates over 100 data sets, under both the independent and the dependent setup, using the spatially weighted group LASSO algorithm.

When the true covariance model is exponential or Matérn and we used the true covariance model as the spatial weight matrix, the proposed method did not perform well in terms of TP for small sample sizes. One possible reason is that the exponential and Matérn covariance functions in the spatial weight matrices produce larger maximum eigenvalues of $L$ for small samples. For example, when Mat $5/2(2.5)$ is used as the spatial weight, the corresponding maximum eigenvalue of $L$ is $20.32$ for $m = 6$, while the maximum eigenvalue of $L$ is $1.23$ for Gauss$(1.5)$ as the spatial weight. A larger maximum eigenvalue of $L$ makes the penalty parameter larger, so fewer components are selected. However, the performance improves as $m$ increases. For the small size ($m = 6$), inverse multiquadratic spatial weights tend to underestimate, so TP is lower compared to the other spatial weight matrix choices, but it improves as $m$ increases. The Gaussian spatial weight matrix maintains a similar level of TP while FP is reduced compared to the independent approach. Thus, we recommend using a Gaussian spatial weight matrix, in particular for small sample sizes.

Results for increased dependence among the covariates ($t = 1$ to $t = 3$) show an increase (overestimation) of FP; see the tables in Section 6 of the supplementary material. This is somewhat expected, since strong dependence between covariates may hinder the selection power of variable selection approaches and, in turn, result in selection of more components. However, our approach performs comparatively better than the independent approach.

2.3.2 Real data example

As an illustrative example, we consider lung cancer mortality data over the period 2000-2005, obtained from the Surveillance, Epidemiology, and End Results Program (SEER, www.seer.cancer.gov) of the National Cancer Institute, U.S. The SEER data can be accessed by submitting a signed SEER Research Data Agreement; the data are available at seer.cancer.gov/data/access.html. The SEER data include incidence or mortality of cancers and associated variables for U.S. counties. We consider the southern part of Michigan, which consists of 68 counties.

We applied Tukey's transformation (e.g., Cressie and Chan (1989)) to the age-adjusted lung cancer mortality rates used as the response variable. We include 20 covariates obtained from the SEER database, which originally come from the U.S. Census Bureau. We also added PM2.5 (particulate matter smaller than 2.5 micrometers), obtained from the U.S. EPA's National Emission Inventory (NEI) database (www.epa.gov/air/data/emisdist.html?st~MI~Michigan). Since the emission data on this website are available for the years 2001 and 2002, we use the average of the 2001-2002 emission data; the unit is tons per county. For our analysis, we scaled each predictor variable to $[0, 1]$. We consider a Gaussian covariance function for the spatial weight matrix and a sequence of $\rho$ values around the estimated $\rho$ obtained by fitting an empirical variogram. We then apply the selection approach introduced in Section 2.2.1.

Selected variables for both the group LASSO and adaptive group LASSO algorithms under independent and dependent errors are presented in Table 2.3. Our method selects variables in a stricter sense, dropping more variables. For this data example, our approach (adaptive group LASSO case) dropped two more variables among those selected by the independent approach. The selected components, `Poverty' and `Move different state', do not seem directly related to lung cancer mortality rates, but one may think of `Poverty' as a proxy for a more relevant covariate. For example, a study shows smoking is more prevalent in lower socio-economic groups of society [Haustein (2006)]; thus, one can think of `Poverty' as a proxy for tobacco use. Although the variable `Move different state' was kept, our approach at least dropped a few more irrelevant variables compared to the independent approach.

 Approach      Algorithm   Number of variables selected
 Independent   GLASSO      4
               AGLASSO     4
 Dependent     GLASSO      3
               AGLASSO     2

Table 2.3: Comparing the two methods (independent approach and dependent approach) for the real data example using the group LASSO and adaptive group LASSO algorithms. Variable description for ID: 1: Population Mortality, 2: Poverty, 3: PM25, 4: Urban, 5: Nonwhite, 6: Never Married, 7: Agriculture, 8: Unemployment, 9: White Collar, 10: Higher Highschool, 11: Age more than 65, 12: Age less than 18, 13: Crowding, 14: Foreign born, 15: Language isolation, 16: Median household income, 17: Same house no migration, 18: Move same County, 19: Move same State, 20: Move different State, 21: Normalized cost of living.

To explore further the covariates dropped by the proposed approach but selected by the independent approach, we fit a multiple linear regression model. Although we consider non-linear relationships in spatial additive models in this paper, simple linear regression can provide an initial assessment of the result. We present the output of the linear regression model with five covariates in Table 2.4.

 Variable                    Estimate    Std. Error   t value   p-value
 Intercept                    0.43958     0.07400      5.940    1.41e-07
 Poverty                      0.42467     0.12887      3.295    0.00163
 Unemployment                 0.01086     0.14638      0.074    0.94107
 Move same State             -0.01837     0.10624     -0.173    0.86331
 Move different State        -0.15474     0.10379     -1.491    0.14106
 Normalized cost of living    0.02715     0.07346      0.370    0.71298

Table 2.4: Coefficient estimates (standard errors) and corresponding p-values obtained from a linear regression in R of the variables selected under the independent error assumption.

This shows that our approach selected the two most significant variables based on p-values, while the independent approach selected some insignificant variables.
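As a quick numerical check of the covariate design used in the simulation study of Section 2.3.1 (a standalone sketch, not the code used for the reported results), the construction $X_j = (W_j + tU)/(1+t)$ with $W_j, U$ i.i.d. Uniform$[0,1]$ indeed produces pairwise correlation $t^2/(1+t^2)$ between covariates:

```python
import numpy as np

def make_covariates(n, J, t, rng):
    # W_j and U i.i.d. Uniform[0,1]; the shared U induces dependence among the X_j
    W = rng.uniform(size=(n, J))
    U = rng.uniform(size=(n, 1))
    return (W + t * U) / (1 + t)

rng = np.random.default_rng(0)
for t in (1.0, 3.0):
    X = make_covariates(200_000, 2, t, rng)
    emp = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
    target = t ** 2 / (1 + t ** 2)  # 0.5 for t = 1, 0.9 for t = 3
```

With $t = 1$ the empirical correlation concentrates near $0.5$, and with $t = 3$ near $0.9$, matching the moderate and strong covariate-dependence settings of the study.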
aredtothestraightuseofindependentdataapproach.InthegLobjectivefunction,weintroducedaspatialweightmatrixinthe`2errorterm,whichlookslikethegeneralizedleastsquare.Ifweusethecovariancematrixofthetrueprocess,itistheformofageneralizedleastsquares.Note,thentheproblembecomessimultaneousestimationofmeanandvariance,whichisbeyondthescopeofthispaperasourgoalismeanselection,notcovarianceestimation.Theconditionforthespatialcovariancefunctionintheassumption(H3)canbeex-tendedtoRDn(h)dh=O(n)forsome2[0;1).Here=0correspondstothecurrentassumption(H3).Byallowing0<<1,wecanincludespatialcovariancemodelsoflong-memoryprocesses,sothatthetheoremsunderthisassumptioncoverabroaderclassofspatialcovariancefunctions.Thetheoreticallowerboundofthepenaltyparameterandtheconvergenceratewillbemodi˝edasisintroduced.Weproviderevisedlemmas,theoremsandrelateddiscussioninthesupplementarymaterialforthisextendedsetting.Spatialcovariancemodelsthatcorrespondto=0stillhaveparametersthatcontrolspatialdependence.Ourcurrenttheoreticalresultsonlybringanadditionalmnintotheconvergencerateandthetheoreticallowerboundofthepenaltyparameterwhenweconsiderspatialdependence.ThisisduetotheboundthatweusedinprovingLemma3.Ifwecan˝ndatighterboundwhichdependsonthosecovarianceparameters,itwouldhavehelpedunderstandingtheroleofthosecovarianceparametersbetterinvariableselectionofspatialadditivemodels.312.5ProofsoftheoremsBeforeprovingthetheoremsstatedinSection2.2,weintroducesomenotationsandlemmas.Notethatforf2F,thereexistsfn2Snsuchthatkffnk2=O(m˝n),wherekk2forafunctionisde˝nedaskfk2=qRbaf2(x)dxandSnisthespacespannedbyB-splinebases[e.g.seeHuangetal.(2010b)].Then,de˝nethecenteredS0njby,S0nj=8<:fnj:fnj=mnXl=1bjlBcl(x);(bj1;;bjmn)2Rmn9=;;1jJ:(2.5.1)RecallthatBcl(x)=Bl(x)1nPs02SBl(Xj(s0)),whichdependonXj.Also,forthepurposeofemphasizingthefactthat,ˇ,anddependsonsamplesizenwewillusesu˚xnforeachofthesethreequantitiesinourproofs.RecallthatA0=fj:fj(x)0;1jJgandA=f1;2;:::;JgnA0sothatA0andAarethesetsofzeroandnonzerocomponents,respectively.Weintroduced
~A0astheindexsetthatsatis˝esPj2~A0kjk21forsome10andlet~A=f1;2;:::;Jgn~A0.Withoutlossofgenerality,wecanassumeA0~A0sothat~AAandq:=j~Ajq.ForanysubsetAˆf1;2;:::;Jg,de˝neA=(j;j2A)0andA=Dc0ADcA=n,whereDcA=L0BcAandBcA=(Bcj;j2A)isthesub-designmatrixformedbythecolumnsindexedinthesetA.NotethatA00.Also,denotetheminimumandmaximumeigenvaluesandtheconditionnumberofamatrixMbyˆmin(M),ˆmax(M)and(M),respectively.Lemma1(Lemma1inHuangetal.(2010b)).Supposethatf2FandEf(Xj)=0.Thenunder(H4)and(H5)inAssumption2.2,thereexistsanfn2S0njsuchthatkfnfk2=Opm˝n+rmnn:(2.5.2)Particularly,underthechoiceofmn=O n12˝+1!,wehavekfnfk2=Opm˝n=Opn˝2˝+1:(2.5.3)32Lemma2.SupposethatjAjisboundedbya˝xedconstantindependentofnandJ.Lethnm1n.Thenunder(H4)and(H5)inAssumption2.2,withprobabilityconvergingtoone,ˆmin1W)d1hnˆminA)ˆmaxA)ˆmax1W)d2hn(2.5.4)Additionallyunder(H7),(2.5.4)becomes,c1hnˆminA)ˆmaxA)c2hn(2.5.5)whered1,d2,c1andc2aresomepositiveconstants.Proof.OnecanfollowtheproofoftheLemma3inHuangetal.(2010b)butafterobservingthat,ˆmin1W) Bc0ABcAn!A=Dc0ADcAnˆmax1W) Bc0ABcAn!whichgives(2.5.4).By(H7),thewell-conditionedpropertyof1W,wehave(2.5.5).Lemma3.De˝neMnbeanon-negativede˝nitematrixofordernand,Tjl=mnn12a0jlMn81jJ;1lmn(2.5.6)whereajl=(Bcl(Xj(s));s2S)0andTn=max1jJ1lmnjTjlj.Then,underassumptions(H2)to(H5)inAssumption2.2,E(Tn)C1ˆmax(Mn)q(mnlog(Jmn))O(n);(2.5.7)forsomeC1>0.Proof.Since˘Gaussian(0;T),Tjl˘Gaussian(0;mnna0jlMnTM0najl).Thereforewecanusemaximalinequalitiesofsub-Gaussianrandomvariables[vanderVaartandWell-ner(1996),Lemmas2.2.1and2.2.2].Letkk˚betheOrlicznorm,de˝nedbykXk˚=inffk2(0;1)jE(˚(jXj=k))1g.Then,conditionalonfXj(s);s2S;1jJg,wehave33thefollowingmax1jJ;1lmnjTjljXj(s);s2S;1jJ˚2Krmnnlog(1+Jmn)max1jJ;1lmnkja0jlMnjXj(s);s2S;1jJk˚2Krmnnlog(Jmn)max1jJ;1lmnra0jlMnTM0najl;whereK>0isagenericconstantand˚p(x)=exp1.NowtakingexpectationwithrespecttofXj(s);s2S;1jJgonbothsidesoftheaboveinequality,kmax1jJ;1lmnjTjljk˚2Krmnnlog(Jmn)E max1jJ;1lmnra0jlMnTM0najl!=Krmnnlog(Jmn)E rmax1jJ;1lmna0jlMnTM0najl!Kˆmax(Mn)rmnnlog(Jmn)vuutE 
max1jJ;1lmna0jlTajl!:(2.5.8)SinceBcl(x)arenormalizedB-splines,wehaveE0@max1jJ;1lmnXs2SXs02SBcl(Xj(s))(ss0)Bcl(Xj(s0))1A4Xs2SXs02S(ss0)KXs2SZh2CDn(h)dhKnZh2CDn(h)dh:(2.5.9)forsomeK;C>0.From(2.5.8)and(2.5.9),kmax1jJ;1lmnjTjljk˚Kˆmax(Mn)qmnlog(Jmn)sZh2CDn(h)dhKˆmax(Mn)qmnlog(Jmn)O(n):Finally,(2.5.7)followsfromkXkL1CkXkL2kXk˚2,wherekXkLp=(E(jXjp))1=p.34Beforewedelveintotheproofofthetheorems,letusde˝neandsummarizesomeofindexsetswewillbeusing.RecallthatA0=fj:fj0;1jJg.Foranindexset~A1thatsatis˝es~A=fj:k^gL;jk2>0g~A1~A[~A;weconsiderthefollowingsets:kjk2(i.e.~A)kjk2(i.e.~A0)~A1~A3~A4~A2=~Ac1~A5~A6Wecandeducesomerelationsfromtheabovetable~A3=~A1\~A,~A4=~A1\~A0,~A5=~Ac1\~A,~A6=~A2\~A0,andhencewehave~A3[~A4=~A1,~A5[~A6=~A2,and~A3\~A4=~A5\~A6=˚.Also,letj~A1j=q1.Foranindexset^A1thatsatis˝es^Af=fj:k^fgL;jk2>0g^A1^Af[A;fj(i.e.A)fj(i.e.A0)^A1^A3^A4^A2=^Ac1^A5^A6Wehaveasimilarsetofrelationsfromtheabovewhichis^A3=^A1\A,^A4=^A1\A0,^A5=^Ac1\A,^A6=^A2\A0,andhencewehave^A3[^A4=^A,^A5[^A6=^A2,and^A3\^A4=^A5\^A6=˚.ToproveTheorem1,weneedboundednessofj~Aj,whichisgiveninthefollowinglemma.Lemma4.UndertheAssumption2.2withn1>Cˆmax(L)qn1+mnlog(Jmn)forasu˚cientlylargeconstantC,wehavej~AjM1jAjfora˝niteconstantM1>1withw.p.convergingto1.Proof.Alongwithconsideringtheapproximationerrorforsplineregression,wealsohavetotakecareofthedependencestructureofaGaussianrandomvectoraccordingto(H3).Toemphasizethedependenceonn,wedenotewriteninsteadofandsimilarnotationforothersaswell.Recallˇn=n+n,wheren=(n(s);s2S)0withn(s)=PJj=1(fj(Xj(s))fnj(Xj(s))).Notethatknk2=O(n1=2q1=2m˝n)=O(q1=2n1=(4˝+2))byLemma1since35mn=O(n1=(2˝+1)).De˝nen;J=2qKˆ2max(L)mnn1+log(Jmn)forsomeK>0andn1maxf0;n;Jg,where0=inff:M1()q+1q0gforsome˝niteq0>0andM1()>1,whichwillbespeci˝edlaterintheproof.Withoutlossofgenerality,wewillassumethein˝mumofanemptysettobe1.Thatis,iff:M1()q+1q0gisanemptyset,itimpliesthatn1=0=1andwhichinturnimpliesthatwedropallthecomponentsinouradditivemodel,i.e.j~Aj=0.Sopart(i)istrivialinthiscaseandhencefortherestoftheproofwewillassum
ef:M1()q+1q0gisanon-emptyset.First,de˝neanewvectorUksuchthatUk=Dkc0(ZcDc^gL).n1fork=1;;J.ByKarsuh-Kuhn-Tucker(KKT)conditionsoftheoptimizationproblemforQn(;n)withthesolution^gL,wehaveUk8>>>><>>>>:=^gL;kk^gL;kk2ifk^gL;kk2>0;1ifk^gL;kk2=0:(2.5.10)Then,thenormofUkiskUkk28>>><>>>:=1ifk^gL;kk2>0;1ifk^gL;kk2=0:(2.5.11)Nowweintroducethefollowingquantities.xr=maxjAj=rmaxkUkk2=1;k2A;BˆAˇ0nwAjB;andxr=maxjAj=rmaxkUkk2=1;k2A;BˆA0nwAjB;wherewAjB=WAjB=WAjB2withWAjB=DcA(DcA0DcA)1n1Q0BAQBAUA(IPA)Dc.ForBˆA,QBAisthematrixcorrespondingtotheselectionofvariablesinBfromA,i:e:QBAA=B.PA=DcA1ADcA0=n,UA=(Uk;k2A)0.Bythetriangleinequalityand36Cauchy-Schwarzinequality,forsomesetAwithjAj=r>0,wehaveˇ0nwAjB0nwAjB+knk20nwAjB+K2qn1=(4˝+2)0nwAjB+K1vuut(rmn_mn)ˆ2max(L)nmnlog(Jmn)ˆmax~A1)(2.5.12)wherethelastinequalityholdsforasu˚cientlylargenforsomeK1>0.Byintroducingthefollowingsets,r0=8<:(Dc;ˇn);xr2K1vuut(rmn_mn)ˆ2max(L)nmnlog(Jmn)ˆmax~A1);8rr09=;;andr0=8<:(Dc;n);xrK1vuut(rmn_mn)ˆ2max(L)nmnlog(Jmn)ˆmax~A1);8rr09=;;wecanshowPn(Dc;ˇn)2r0oPn(Dc;n)2r0oforanyr0>0sincewehavexrxr+knk2xr+K1vuut(rmn_mn)ˆ2max(L)nmnlog(Jmn)ˆmax~A1)(2.5.13)byrecallingthede˝nitionsofxrandxrand(2.5.12).Now,wewanttoshowthatPn(Dc;ˇn)2q1o!1impliesj~AjM1j~Aj=M1qforsome˝niteM1>1,whichcompletestheproofsinceqq.Beforeprovingthisclaim,we˝rstshowPn(Dc;n)2q1o!1,whichimpliesPn(Dc;ˇn)2q1o!1.Westartwiththefollowing:1Pn(Dc;n)2q1o(2.5.14)1Xr=0P0@xr>K1vuut(rmn_mn)ˆ2max(L)nmnlog(Jmn)ˆmax~A1)1A1Xr=0 Jr!P0@jw0AjBnj>K1vuut(rmn_mn)ˆ2max(L)nmnlog(Jmn)ˆmax~A1)1A;(2.5.15)37wherejAj=r.Sincew0AjBn˘Gaussian(0;w0AjBTwAjB),(2.5.15)becomes21Xr=0 Jr!exp0@0:5K21(rmn_mn)ˆ2max(L)nmnlog(Jmn)(w0AjBTwAjB)ˆmax~A1)1A21Xr=0 Jr!exp0@0:5K21(rmn_mn)ˆ2max(L)nmnlog(Jmn)ˆmaxT)ˆmax~A1)1A=21Xr=0 Jr!(Jmn)0:5K21(rmn_mn)ˆ2max(L)nmn.ˆmax(T)ˆmax~A1:(2.5.16)LetKn=0:5K21m2nˆ2max(L)n.ˆmaxT)ˆmax~A1.Then(2.5.16)becomes,=2(Jmn)Kn+21Xr=1 Jr!(Jmn)rKn2(Jmn)Kn+21Xr=11r! 
J(Jmn)Kn!r=2(Jmn)Kn+2exp J(Jmn)Kn!2:(2.5.17)De˝nekTk1=maxs2SPs02S˙s;s0andnotethatkTk1Rh2Dn(h)dh=O(n1).There-forebyusingthefact,1pnkTk1ˆmaxT)pnkTk1andˆmax1W)ˆmax(L)ˆmax(L0)=ˆ2max(L),wehaveKnc10:5K21m3nnpnkTk10:5c1K21pn61;andKn!1by(H6).Thisshows(2.5.17)goestozeroasn!1.ToshowPn(Dc;ˇn)2q1o!1impliesj~AjM1j~Aj=M1q,letV1j=12~A1Q0j1U~Ajn1pn,forj=1;3;4;andu=Dc~A11=2~A1V14=pn!2kDc~A11=2~A1V14=pn!2k2;where,forsimplicityinnotations,Qkj=Q~Ak~Ajisthematrixcorrespondingtotheselectionofvariablesin~Akfrom~Ajand!2=(IP~A1)Dc~A2~A2=(IP~A1)Dc.Wecanshow38that,V11=V14+V13andQ031Q31+Q041Q41=Imnj~A1jduetothefactthat~A3[~A4=~A1;~A3\~A4=˚andhence0~A3Q31+0~A4Q41=~A1.Sinceq1=j~A1j=j~A3j+j~A4jandj~A3jq,j~A4j(q1q).Then,wehavethefollowinglowerboundforL2-normofV14,kV14k222n1kQ041U~A4k22nˆmax~A1)=2n1kQ041Q41U~A1k22nˆmax~A1)=2n1mnj~A4jnˆmax~A1)B1(q1q)+q;(2.5.18)withB1=2n1mnqnˆmax~A1.From(2.5.18),wehavej~Ajj~A1j=q1(q1q)++qqkV14k22B1+q= kV14k22B1+1!q:(2.5.19)Thus,toshowj~AjM1qforsome˝niteM1>1,weneedtoshowkV14k22B1+1M1forsome˝niteM1>1.WestartwithanupperboundofkV14k22+k!2k22.SincekV14k22+k!2k22=V014V14+k!2k22=V014(V11V13)+k!2k22V014V11+kV14k2kV13k2+k!2k22;(2.5.20)we˝ndupperboundsforV014V11,kV14k2kV13k2andk!2k22,respectively.First,V014V11=2n1nU0~A4Q411~A1U~A1=2n1nU0~A4Q411~A1Dc0~A1ZcDc^gL.n1=n1nU0~A4Q411~A1Dc0~A1Dc~A1~A1+Dc~A2~A2+ˇnDc~A1^gL;~A1=n1U0~A4Q410B@(~A1^gL;~A1)+1~A1n(Dc0~A1Dc~A2)~A2+1~A1nDc0~A1ˇn1CA=n1U0~A4(~A4^gL;~A4)+n1U0~A4Q410B@1~A1n(Dc0~A1Dc~A2)~A2+1~A1nDc0~A1ˇn1CAn1Xk2~A4kkk2+n1U0~A4Q411~A1(Dc0~A1Dc~A2)~A2n+n1U0~A4Q411~A1Dc0~A1ˇnn;(2.5.21)39wherethelastinequalityisbasedonU0~A4~A4Pk2~A4U0kkPk2~A4kkk2andU0~A4^gL;~A4=Xk2~A4\~AU0k^gL;k0:ForkV14k2kV13k2,wehavekV14k2kV13k2kV14k2n1vuuutmnj~A3jnˆmin~A1)(2.5.22)fromthede˝nitionofV13.Fork!2k22,k!2k22=k(IP~A1)Dc~A2~A2k22=0~A2Dc0~A2(IP~A1)Dc~A2~A2=0~A2n~A2~A21nDc0~A2Dc~A11~A1Dc0~A1Dc~A2~A20~A2n1D~A2Dc0~A2ˇnDc0~A2Dc~A1~A1^gl;~A11nDc0~A2Dc~A11~A1Dc0~A1Dc~A2~A2=0~A2 
n1D~A2Dc0~A2ˇnn1nDc0~A2Dc~A11~A1U~A1+1nDc0~A2Dc~A11~A1Dc0~A1ˇn!=n10~A2D~A2!02ˇnn1nU0~A11~A1Dc0~A1Dc~A2~A2;wheretheinequalityisfromDc0~A2Dc~A1(~A1^gL;~A1)+n~A2~A2+Dc0~A2ˇnn1D~A2:(2.5.23)In(2.5.23),DAisa01vectorwhosekthentryisI(k^k;gLk2=0),whereI(A)istheindicatorfunctionforasetA.Theinequalitybetweenvectorsisde˝nedentry-wise.Notethat(2.5.23)holdsdueto(2.5.11).SinceV14??!2,wehavekDc~A11~A1Q041U~A4n1=n!2k22=kDc~A11=2~A1V14=pn!2k22=kV14k22+k!2k22sothat n1nU0~A4Q411~A1Dc0~A1!02!ˇn=(kV14k22+k!2k22)1=2(u0ˇn)40usingthede˝nitionofu.Then,thisimpliesk!2k22n10~A2D~A2n1nU0~A11~A1Dc0~A1Dc~A2~A2+(kV14k22+k!2k22)1=2(u0ˇn)n1nU0~A4Q411~A1Dc0~A1ˇn:(2.5.24)Combining(2.5.21)and(2.5.24),wehaveV014V11+k!2k22n1Xk2~A4kkk2n1U0~A3Q311~A1(Dc0~A1Dc~A2)~A2n+n10~A2D~A2+(kV14k22+k!2k22)1=2ju0ˇnj=n1Xk2~A4kkk2V0131=2~A1(Dc0~A1Dc~A2)~A2=pn+n10~A2D~A2+2(kV14k22+k!2k22)1=2=2ju0ˇnjn1Xk2~A4kkk2+kV13k2k1=2~A1(Dc0~A1Dc~A2)~A2k2=pn+n1Xk2~A2kkk2+(kV14k22+k!2k22)4+ju0ˇnj2;(2.5.25)wheretheinequalityisbytheCauchy-Schwarzinequality,triangleinequalityand2aba2+b2.Then,by(2.5.22)and(2.5.25),kV14k22+k!2k22V014V11+kV14k2kV13k2+k!2k22n1Xk2~A4kkk2+kV13k2k1=2~A1(Dc0~A1Dc~A2)~A2k2=pn+n1Xk2~A2kkk2+(kV14k22+k!2k22)4+ju0ˇnj2+kV14k2n1vuuutmnj~A3jnˆmin~A1):(2.5.26)Sincef(Dc;ˇn)2q1gimpliesju0ˇnj2(xq1)2(q1mn_mn)2n14nˆmax(~A1)=q1mn2n14nˆmax(~A1)=14q1qB1,wecanshowju0ˇnj214q1qB114(kV14k22+B1)byusing(2.5.18).AlongwithkV13k241n1vuutmnj~A3jnˆmin(~A1),(2.5.26)becomesn1Xk2~A4kkk2+n1vuuutmnj~A3jnˆmin~A1)k1=2~A1Dc0~A1Dc~A2~A2k2=pn+n1Xk2~A2kkk2+(kV14k22+k!2k22)4+(kV14k22+B1)4+kV14k2n1vuuutmnj~A3jnˆmin~A1)=n1Xk2~A5kkk2+n1vuuutmnj~A3jnˆmin~A1)kP1=2~A1Dc~A2~A2k2+n1Xk2~A4[~A6kkk2+kV14k222+k!2k224+B14+kV14k2n1vuuutmnj~A3jnˆmin~A1)n1Xk2~A5kkk2+n1vuuutmnj~A3jnˆmin~A1)kP1=2~A1Dc~A2~A2k2+n11+kV14k222+k!2k224+B14+kV14k2n1vuuutmnj~A3jnˆmin~A1);wheretheequalitycomesfrom~A2=~A5[~A6withP1=2~A1=1=2~A1Dc0~A1=pnandthelastin-equalityisduetoGSC.Thisresultgives12kV14k22+34k!2k22n1Xk2~A5kkk2+n1vuuutmnj~A3jnˆmin~A1)kP1=2~A1Dc~A2~A2k2+n11+B14+kV14k2n1vuuutmnj~A3j
nˆmin~A1)(2.5.27)fromwhichwehavekV14k22212kV14k22+34k!2k222n1Xk2~A5kkk2+2n1vuuutmnj~A3jnˆmin~A1)kP1=2~A1Dc~A2~A2k2+2n11+B12+2kV14k2n1vuuutmnj~A3jnˆmin~A1):42Notethatthelargestpossible~A1containsallthekjk2then~A5=˚sothat~A2=~A6,Pk2~A5kkk2=0andP1=2~A1Dc~A2~A2=P1=2~A1Dc~A6~A6.FinallyusingkP1=2~A1Dc~A6~A6k2maxAˆ~A0kPk2ADckkk2since~A6ˆ~A0,wehavethefollowinginequality.kV14k2222qB2+2n11+B12+2qB2kV14k2;(2.5.28)where2=maxAˆ~A0kPk2ADckkk2andB2=2n1mnqnˆmin~A1.Henceafterusingthefact2pB2kV14k2=2p2B2kV14k2=p22B2+kV14k22=2,weobtainanupperboundforkV14k22from(2.5.28),kV14k22B1+4n11+42qB2+4B2which,whencombinedwith(2.5.19),implies~AM1q;whereM1=M1(n1)=2+4r1+4r2pC12+4C12withr1=r1(n1)= c21nqmnn1!;r2=r2(n1)= c222nqmn2n1!1=2andC12=c2c1:NotethatM1(n1)isadecreasingfunctioninn1.If1=0whichisreferredtoasanarrow-sensesparsitycondition,thenr1=r2=0andhenceM1(n1)=2+4C12<1.Notethatsinceweareassuming0<1,weimplic-itlyassumethat(2+4C12)q+1q0holds.Ingeneral,aslongas1and2satisfythat1C1qc2mnn1nand22C2qc2 mn2n1n!forsome˝niteC1andC2,wewillhaver1C1andr2C2,whichgivesM1(n1)(2+4C1+4C2pC12+4C12)<1.Thus,wecompletetheproof.43PROOFOFTHEOREM1Part(a):Since^gListhegroupLASSOestimatebyminimizingQn1(;n1),forany,kZcDc^gLk22+n1k^gLk2;1;1kZcDck22+n1kk2;1;1:(2.5.29)LetA2=fj:kjk2>0ork^gL;jk2>0g,whereA2=(j;j2A2)and^gL;A2=(^gL;j;j2A2).RecallthatkA2k2;1;1=Pj2A2kjk2andk^gL;A2k2;1;1=Pj2A2k^gL;jk2.By(2.5.29),wehavekZcDcA2^gL;A2k22+n1k^gL;A2k2;1;1kZcDcA2A2k22+n1kA2k2;1;1:(2.5.30)Let#A2=ZcDcA2A2andA2=DcA2(^gL;A2A2).Usingthefactthatkabk22=kak222a0b+kbk22andZcDcA2^gL;A2=ZcDcA2A2DcA2(^gL;A2A2);wecanrewrite(2.5.30)suchthatkA2k222#0A2A2n1kA2k2;1;1n1k^gL;A2k2;1;1n1qjA2jk^gL;A2A2k2:(2.5.31)Subsequentstepswillbetoboundk^gL;A2A2k22.First,wehave2j#0A2A2j2k#A2k2kA2k2=2p2k#A2k2 
kA2k2p2!2k#A2k22+kA2k222;(2.5.32)wherethelastinequalityisbasedonthefactthat,2aba2+b2and#A2istheprojectionof#A2tothespanofDcA2.Nowbycombining(2.5.31)and(2.5.32),wehavekA2k224k#A2k22+2n1qjA2jk^gL;A2A2k2:(2.5.33)Ontheotherhand,wehavekA2k22=(^gL;A2A2)0Dc0A2DcA2(^gL;A2A2)nˆmin0@Dc0A2DcA2n1Ak^gL;A2A2k22=nˆminA2)k^gL;A2A2k22nc1hnk^gL;A2A2k22;(2.5.34)44wherethelastinequalityisfromLemma2.Combining(2.5.33)and(2.5.34),wehavenc1hnk^gL;A2A2k224k#A2k22+2n1qjA2jkA2^gL;A2k2=4k#A2k22+20@n1q2jA2jpnc1hn1A0@kA2^gL;A2k2snc1hn21A4k#A2k22+2n12jA2jnc1hn+kA2^gL;A2k22nc1hn2sothatk^gL;A2A2k228k#A2k22nc1hn+42n1jA2jn2c21h2n:(2.5.35)Let#(s)betheentryof#A2.Then,wecanrewrite#(s)=(s)+(Y)+f0(X(s))fnA2(X(s)),wheref0(X(s))=PJj=1fj(Xj(s))andfnA2(X(s))=Pj2A2fnj(Xj(s)).NotethatjYj2=Op(n1).LetA2betheprojectionofntothespanofDcA2,thatis,A2=(Dc0A2DcA2)1=2Dc0A2n.Then,wehavek#A2k22kA2k22+Op(n1+jA2jnm2˝n)(2.5.36)=k(Dc0A2DcA2)1=2Dc0A2nk22+Op(n1+jA2jnm2˝n)kDc0A2nk22nc1hn+Op(n1+jA2jnm2˝n)(ByLemma2)1nc1hnmaxA:jAA2jkDc0Ank22+Op(n1+jA2jnm2˝n)1nc1hnjA2jmnmax1jJ;1lmnDc0jln2+Op(n1+jA2jnm2˝n)=1c1hnjA2jmax1jJ;1lmnmnn1=2a0jln2+Op(n1+jA2jnm2˝n)=Op 
jA2jˆ2max(L)mnlog(Jmn)c1hn!+Op(n1+jA2jnm2˝n);(2.5.37)wherethelastequalityisbyByLemma3usingMn=L.Thepart(a)followsbycombining(2.5.35)and(2.5.37)sincejA2jisboundedbyLemma4.Part(b):Ifm2n2n1n2!0then,bytheconditionn1>Cˆmax(L)qn1+mnlog(Jmn),wehaveˆ2max(L)m3nlog(Jmn)n1!0,alsonotethat,1=ˆmax(I)=ˆmax1WW)ˆmax1W)45ˆmaxW)ˆ2max(L)ˆmaxW)thereforeˆ2max(L)ˆmin1Wandhence,(ˆ2max(L)m3nlog(Jmn)n1)=nˆ2max(L)m2nlog(Jmn)omnn1nˆmin1Wm2nlog(Jmn)omnn1nCm2nlog(Jmn)omnn1;(2.5.38)whereCisagenericconstant.SinceL.H.Sof(2.5.38)goesto0andm2nlog(Jmn)!1,mnn1!0.Similarly,(ˆ2max(L)m3nlog(Jmn)n1)(2.5.39)=(ˆ2max(L)m2˝+2nlog(Jmn)n1)1m2˝1n=8>><>>:ˆmin1Wn2˝+22˝1log(Jmn)n19>>=>>;1m2˝1nnˆmin1Wnlog(Jmn)o1m2˝1nfCnlog(Jmn)g1m2˝1n(2.5.40)SinceL.H.Sof(2.5.40)goesto0andnlog(Jmn)!1,1m2˝1n!0.Thus,wehavepart(b)ofTheorem1.PROOFOFTHEOREM2Thepart(a)isfromthefactthatcm1nk^gL;jjk2k^fgL;jfjk2cm1nk^gL;jjk2forsomec;c>0.Thepart(b)isfromthepart(a).PROOFOFTHEOREM3Part(a):Recallthat,bytheKKTconditions,anecessaryandsu˚cientconditionfor^AgLis8>>>><>>>>:Dcj0(ZcDc^AgL)=n2nj^AgL;j2k^AgL;jk2;whenk^AgL;jk2>0;kDcj0(ZcDc^AgL)k2n2nj=2;whenk^AgL;jk2=0:(2.5.41)LetA=A\fj:k^AgL;jk2>0g.De˝ne^A=(DcA0DcA)1(DcA0Zcn2vn);(2.5.42)46wherevn=(vnj;j2A)withvnj=nj^j=(2k^jk2).Then,wehaveDcj0(ZcDcA^A)=n2nj^j2k^jk2;forj2A.IfweassumekDcj0(ZcDcA^A)k2n2nj=2forallj=2A,then(2.5.41)holdsfor(^0A;00),sothat^AgL=(^0A;00)sinceDc^=DcA^A.Ifkjk2k^jk2n2nj=2;9j=2A)P(k^jjk2kjk2;9j2A)+P(kDcj0(ZDcA^A)k2>n2nj=2;9j=2A);(2.5.43)wherethelastinequalityisfromkjk2>0forj2A.First,weshowP(k^jjk2kjk2;9j2A)!0:(2.5.44)Toshow(2.5.44),itissu˚cienttoshowthatmaxj2Ak^jjk2!0inprobabilitysincekjk2>0forj2A.De˝neTnj=(Omn;:::;Omn;Imn;Omn;:::;Omn)beamnqmnmatrixwithImnisinthejthblock,whereOmnbemnmnmatrixofzerosandImnbeanmnmnidentitymatrix.From(2.5.42),^AA=n11A(Dc0An+Dc0Ann2vn).Thus,ifj2A,wehave^jj=n1Tnj1A(Dc0An+Dc0Ann2vn).Bytriangleinequality,k^jjk2kTnj1ADc0Ank2n+kTnj1ADc0Ank2n+n2kTnj1Avnk2n:(2.5.45)Weshoweachtermontherighthandsidein(2.5.45)goestozeroinprobability.Forthe47˝rstterm,maxj2An1kTnj1ADc0Ank
$$
2^{-1}n^{-1}\rho_{\max}(\Sigma_A)\,\|D_A^{c\prime}\eta_n\|_2
\le \sqrt{|A|}\,n^{-1/2}\rho_{\max}(\Sigma_A)\sqrt{\max_{\substack{j\in A\\ 1\le l\le m_n}} m_n n^{-1}\,|D_{jl}^{c\prime}\eta_n|^2}
= O_p\!\left(\sqrt{\frac{\rho^2_{\max}(L)\,m_n^3\log(|A|m_n)}{n}}\right)\to 0,
\tag{2.5.46}
$$
where the last equality holds by Lemma 3 and (2.5.46) holds by assumptions (H6) and (H7). For the second term,
$$
\max_{j\in A} n^{-1}\|T_{nj}\Sigma_A^{-1}D_A^{c\prime}\delta_n\|_2
\le n^{-1/2}\|\Sigma_A^{-1}\|_2\,\|n^{-1}D_A^{c\prime}D_A^c\|_2^{1/2}\,\|\delta_n\|_2
\le n^{-1/2}\rho_{\min}^{-1}(\Sigma_A)\rho_{\max}^{1/2}(\Sigma_A)\,O_p\big(n^{1/(4\tau+2)}\big)
= O_p\big(n^{1/(2\tau+1)-1/2}\big)\to 0,
\tag{2.5.47}
$$
where (2.5.47) holds by assumption (H6). For the third term, we first find an upper bound for $\|v_n\|_2^2$:
$$
\|v_n\|_2^2=\frac{1}{2}\sum_{j\in A}\kappa_{nj}^2
=\frac{1}{2}\sum_{j\in A}\|\hat{\beta}_{gL,j}\|_2^{-2}
=\frac{1}{2}\sum_{j\in A}\frac{\|\theta_j\|_2^2-\|\hat{\beta}_{gL,j}\|_2^2}{\|\hat{\beta}_{gL,j}\|_2^2\,\|\theta_j\|_2^2}
+\frac{1}{2}\sum_{j\in A}\|\theta_j\|_2^{-2}
\le C k_b^2 b_{n1}^{-4}\|\hat{\beta}_{gL,A}-\theta_A\|_2^2+q\,b_{n1}^{-2}
= O_p\big(k_b^2 b_{n1}^{-4}r_n^{-1}+q\,b_{n1}^{-2}\big)=O_p(k_n^2),
\tag{2.5.48}
$$
where $C$ is a generic constant. Then, we have
$$
\max_{j\in A} n^{-1}\lambda_{n2}\|T_{nj}\Sigma_A^{-1}v_n\|_2
\le n^{-1}\lambda_{n2}\rho_{\min}^{-1}(\Sigma_A)\|v_n\|_2
= O_p\big(n^{-1}\lambda_{n2}\rho_{\min}^{-1}(\Sigma_A)k_n\big)
= O_p\big(n^{-1}\lambda_{n2}(r_n^{-1/2}+m_n^{1/2})\big)
= O_p\!\left(\frac{\lambda_{n2}\,m_n^{1/2}}{n}\right)\to 0,
\tag{2.5.49}
$$
where (2.5.49) is implied by assumption (K2). Therefore, by combining (2.5.46), (2.5.47) and (2.5.49), we have (2.5.44).

Now, we show
$$
P\big(\|D_j^{c\prime}(Z^c-D_A^c\hat{\theta}_A)\|_2>\lambda_{n2}\kappa_{nj}/2,\ \exists\, j\notin A\big)\to 0.
\tag{2.5.50}
$$
As $\kappa_{nj}=\|\hat{\beta}_{gL,j}\|_2^{-1}=O_p(r_n)$ for $j\notin A$, instead of (2.5.50) it is sufficient to show
$$
P\big(\|D_j^{c\prime}(Z^c-D_A^c\hat{\theta}_A)\|_2>\lambda_{n2}r_n/2,\ \exists\, j\notin A\big)\to 0.
\tag{2.5.51}
$$
For $j\notin A$,
$$
D_j^{c\prime}(Z^c-D_A^c\hat{\theta}_A)
= D_j^{c\prime}\big(Z^c-D_A^c(D_A^{c\prime}D_A^c)^{-1}D_A^{c\prime}Z^c+\lambda_{n2}n^{-1}D_A^c\Sigma_A^{-1}v_n\big)
= D_j^{c\prime}H_nZ^c+\lambda_{n2}n^{-1}D_j^{c\prime}D_A^c\Sigma_A^{-1}v_n
$$
$$
= D_j^{c\prime}H_nD^c\theta+D_j^{c\prime}H_n\delta_n+D_j^{c\prime}H_n\eta_n+\lambda_{n2}n^{-1}D_j^{c\prime}D_A^c\Sigma_A^{-1}v_n
= D_j^{c\prime}D_{A^c}^c\theta_{A^c}+D_j^{c\prime}H_n\delta_n+D_j^{c\prime}H_n\eta_n+\lambda_{n2}n^{-1}D_j^{c\prime}D_A^c\Sigma_A^{-1}v_n
$$
$$
= D_j^{c\prime}D_{\tilde{A}\cap A^c}^c\theta_{\tilde{A}\cap A^c}+D_j^{c\prime}H_n\delta_n+D_j^{c\prime}H_n\eta_n+\lambda_{n2}n^{-1}D_j^{c\prime}D_A^c\Sigma_A^{-1}v_n,
\tag{2.5.52}
$$
where $H_n=I-P_A$, the first equality is by replacing $\hat{\theta}_A$ with the expression in (2.5.42), and the fourth equality holds because $H_n$ is the projection matrix onto the orthogonal complement of the column space of $D_A^c$. By (2.5.52), the left-hand side of (2.5.51) can be bounded above by
$$
P\big(\|D_j^{c\prime}(Z^c-D_A^c\hat{\theta}_A)\|_2>\lambda_{n2}r_n/2,\ \exists\, j\notin A\big)
\le P\big(\|D_j^{c\prime}D_{\tilde{A}\cap A^c}^c\theta_{\tilde{A}\cap A^c}\|_2>\lambda_{n2}r_n/8,\ \exists\, j\notin A\big)
$$
$$
+P\big(\|D_j^{c\prime}H_n\delta_n\|_2>\lambda_{n2}r_n/8,\ \exists\, j\notin A\big)
+P\big(\|D_j^{c\prime}H_n\eta_n\|_2>\lambda_{n2}r_n/8,\ \exists\, j\notin A\big)
+P\big(\|\lambda_{n2}n^{-1}D_j^{c\prime}D_A^c\Sigma_A^{-1}v_n\|_2>\lambda_{n2}r_n/8,\ \exists\, j\notin A\big),
\tag{2.5.53}
$$
so that we find upper bounds for the four terms in (2.5.53). For the first term, we have
$$
\max_{j\notin A}\|D_j^{c\prime}D_{\tilde{A}\cap A^c}^c\theta_{\tilde{A}\cap A^c}\|_2
\le n\max_{j\notin A}\|n^{-1/2}D_j^c\|_2\,\|n^{-1/2}D_{\tilde{A}\cap A^c}^c\|_2\,\|\theta_{\tilde{A}\cap A^c}\|_2
= O_p\big(n\,\rho_{\max}^{1/2}(\Sigma_{A^c})\rho_{\max}^{1/2}(\Sigma_{\tilde{A}})\,m_n^{-1/2}\big)
= O_p\big(n\,m_n^{-1/2}\big).
$$
Then, for some generic constant $C$,
$$
P\big(\|D_j^{c\prime}D_{\tilde{A}\cap A^c}^c\theta_{\tilde{A}\cap A^c}\|_2>\lambda_{n2}r_n/8,\ \exists\, j\notin A\big)
\le P\Big(\max_{j\notin A}\|D_j^{c\prime}D_{\tilde{A}\cap A^c}^c\theta_{\tilde{A}\cap A^c}\|_2>C\lambda_{n2}r_n/8\Big)
\le P\big(n\,m_n^{-1/2}>C\lambda_{n2}r_n/8\big)
=P\!\left(\frac{n\,m_n^{-1/2}}{\lambda_{n2}r_n}>C/8\right)\to 0,
\tag{2.5.54}
$$
where (2.5.54) holds by assumption (K2).

For the second term, let $s_n=J-|A|$. Since $\rho_{\max}(H_n)=\rho_{\max}(I-P_A)=1-\rho_{\min}(P_A)$ and $P_A$ is a non-negative definite matrix, $\rho_{\max}(H_n)\le 1$. By Lemma 3 with $M_n=LH_n$, and using the fact that $\rho_{\max}(LH_n)\le\rho_{\max}(L)\rho_{\max}(H_n)$, we have
$$
E\Big(\max_{j\notin A}n^{-1/2}\|D_j^{c\prime}H_n\eta_n\|_2\Big)
=E\left(\max_{j\notin A}n^{-1/2}\sqrt{\sum_{l=1}^{m_n}|D_{jl}^{c\prime}H_n\eta_n|^2}\right)
\le E\left(\max_{\substack{j\notin A\\ 1\le l\le m_n}}\sqrt{\frac{m_n}{n}}\,|a_{jl}^{\prime}LH_n\eta_n|\right)
=O\Big(\sqrt{\rho^2_{\max}(L)\,m_n\log(s_nm_n)}\Big).
\tag{2.5.55}
$$
Thus, by Markov's inequality,
$$
P\big(\|D_j^{c\prime}H_n\eta_n\|_2>\lambda_{n2}r_n/8,\ \exists\, j\notin A\big)
\le P\Big(\max_{j\notin A}n^{-1/2}\|D_j^{c\prime}H_n\eta_n\|_2>Cn^{-1/2}\lambda_{n2}r_n/8\Big)
\le O\!\left(\frac{\sqrt{\rho^2_{\max}(L)\,n\,m_n\log(s_nm_n)}}{C\lambda_{n2}r_n}\right)\to 0,
\tag{2.5.56}
$$
where $C$ is a generic constant and (2.5.56) holds by assumption (K2).

For the third term,
$$
\max_{j\notin A}\|D_j^{c\prime}H_n\delta_n\|_2
\le n^{1/2}\max_{j\notin A}\|n^{-1}D_j^{c\prime}D_j^c\|_2^{1/2}\,\|H_n\|_2\,\|\delta_n\|_2
=O\big(n\,\rho_{\max}^{1/2}(\Sigma_{A^c})\,m_n^{-\tau}\big)
=O\big(n\,m_n^{-\tau-1/2}\big).
$$
Therefore, for some generic constant $C$,
$$
P\big(\|D_j^{c\prime}H_n\delta_n\|_2>\lambda_{n2}r_n/8,\ \exists\, j\notin A\big)
\le P\Big(\max_{j\notin A}\|D_j^{c\prime}H_n\delta_n\|_2>C\lambda_{n2}r_n/8\Big)
\le P\big(n\,m_n^{-\tau-1/2}>C\lambda_{n2}r_n/8\big)
=P\!\left(\frac{n}{\lambda_{n2}r_n\,m_n^{(2\tau+1)/2}}>C/8\right)\to 0,
\tag{2.5.57}
$$
where (2.5.57) is implied by assumption (K2). Finally, using (2.5.48), we have
$$
\max_{j\notin A}\|\lambda_{n2}n^{-1}D_j^{c\prime}D_A^c\Sigma_A^{-1}v_n\|_2
\le \lambda_{n2}\max_{j\notin A}\|n^{-1/2}D_j^{c\prime}\|_2\,\|n^{-1/2}D_A^c\Sigma_A^{-1/2}\|_2\,\|\Sigma_A^{-1/2}\|_2\,\|v_n\|_2
=O_p\big(\lambda_{n2}\rho_{\max}^{1/2}(\Sigma_{A^c})\rho_{\min}^{-1/2}(\Sigma_A)k_n\big)
=O_p\big(\lambda_{n2}(m_n^{-1}r_n^{-1/2}+m_n^{-1/2})\big).
$$
Then, for some generic constant $C$,
$$
P\big(\|\lambda_{n2}n^{-1}D_j^{c\prime}D_A^c\Sigma_A^{-1}v_n\|_2>\lambda_{n2}r_n/8,\ \exists\, j\notin A\big)
\le P\big(\lambda_{n2}(m_n^{-1}r_n^{-1/2}+m_n^{-1/2})>C\lambda_{n2}r_n/8\big)
=P\!\left(\frac{m_n^{-1}r_n^{-1/2}+m_n^{-1/2}}{r_n}>C/8\right)\to 0,
\tag{2.5.58}
$$
where (2.5.58) holds since $r_n,m_n\to\infty$. Hence, by combining (2.5.54), (2.5.56), (2.5.57) and (2.5.58), (2.5.50) follows.

Part (b): Denote $\kappa=\max_{j\in A}1/\|\theta_j\|_2$. Let $\tilde{A}=A\cup\{j:\|\hat{\beta}^A_{gL,j}\|_2>0\}$. Note that $J_0=|\tilde{A}|$. Define $\vartheta_{\tilde{A}}=Z^c-D^c_{\tilde{A}}\theta_{\tilde{A}}$ and denote by $\vartheta^*_{\tilde{A}}$ and $\eta^*_{\tilde{A}}$ the projections of $\vartheta_{\tilde{A}}$ and $\eta_n$ onto the span of $D^c_{\tilde{A}}$. Then, in a similar way as in part (b) of Theorem 1, we can show
$$
\|\vartheta^*_{\tilde{A}}\|_2^2
\le\|\eta^*_{\tilde{A}}\|_2^2+O_p\big(n^{-1}+|\tilde{A}|\,n\,m_n^{-2\tau}\big)
=\|(D^{c\prime}_{\tilde{A}}D^c_{\tilde{A}})^{-1/2}D^{c\prime}_{\tilde{A}}\eta_n\|_2^2+O_p\big(n^{-1}+|\tilde{A}|\,n\,m_n^{-2\tau}\big)
=O_p\!\left(\frac{|\tilde{A}|\,n\,\rho^2_{\max}(L)\,m_n\log(J_0m_n)}{c_1h_n}+n^{-1}+|\tilde{A}|\,n\,m_n^{-2\tau}\right).
\tag{2.5.59}
$$
In a similar way as used to obtain (2.5.35), we can also show
$$
\|\hat{\theta}^A_{gL,A}-\theta_A\|_2^2
\le\frac{8\|\vartheta^*_{\tilde{A}}\|_2^2}{nc_1h_n}+\frac{4\kappa^2\lambda_{n2}^2|\tilde{A}|}{n^2c_1^2h_n^2};
\tag{2.5.60}
$$
thus, by (2.5.59) and (2.5.60), we obtain part (b) of Theorem 3.

PROOF OF THEOREM 4. The proof proceeds similarly to that of Theorem 2.

2.6 A few theoretical extensions

2.6.1 Results for the extension to different variability of each additive component

In this section, we provide revised lemmas and theorems when we allow different levels of smoothness for the additive components. We first extend the definition of $S^0_{nj}$ as follows:
$$
S^0_{nj}=\left\{f_{nj}:f_{nj}=\sum_{l=1}^{m_{nj}}b_{jl}B^c_l(x),\ (b_{j1},\ldots,b_{jm_{nj}})\in\mathbb{R}^{m_{nj}}\right\},\qquad 1\le j\le J.
\tag{2.6.1}
$$
With the new assumptions, Lemmas 1 and 3 and Theorems 2 and 4 change as follows.

Lemma 1′. Suppose that $f\in\mathcal{F}_j$ and $Ef(X_j)=0$. Then, under (H4)′ and (H5), there exists an $f_n\in S^0_{nj}$ such that
$$
\|f_n-f\|_2=O_p\!\left(m_{nj}^{-\tau_j}+\sqrt{\frac{m_{nj}}{n}}\right).
\tag{2.6.2}
$$
In particular, under the choice $m_{nj}=O\big(n^{1/(2\tau_j+1)}\big)$, we have
$$
\|f_n-f\|_2=O_p\big(m_{nj}^{-\tau_j}\big)=O_p\Big(n^{-\tau_j/(2\tau_j+1)}\Big).
\tag{2.6.3}
$$

Lemma 3′. Let $M_n$ be a non-negative definite matrix of order $n$ and
$$
T_{jl}=\sqrt{m_{nj}}\,n^{-1/2}a^{\prime}_{jl}M_n\eta_n,\qquad 1\le j\le J,\ 1\le l\le m_{nj},
\tag{2.6.4}
$$
where $a_{jl}=(B^c_l(X_j(s)),s\in\mathcal{S})^{\prime}$ and $T_n=\max_{1\le j\le J,\,1\le l\le m_{nj}}|T_{jl}|$. With the new Assumption 1,
$$
E(T_n)\le C_1\rho_{\max}(M_n)\sqrt{m_n\log(Jm_n)}
\tag{2.6.5}
$$
for some $C_1>0$, where $m_n=\max_{j=1,2,\ldots,J}m_{nj}$.

Theorem 2′. With the new Assumption 1 and $\lambda_{n1}>C\rho_{\max}(L)\sqrt{n\,m_n\log(Jm_n)}$ for a sufficiently large constant $C$:

(a) $\|\hat{f}_{gL,j}-f_j\|_2^2=O_p\!\left(\left(\dfrac{\rho^2_{\max}(L)m_n^3\log(Jm_n)}{n}+\dfrac{m_n}{n}+\dfrac{1}{m_n^{2\tau-1}}+\dfrac{\kappa^4m_n^2\lambda_{n1}^2}{n^2}\right)\Big/m_{nj}\right)$ for $j\in\tilde{A}\cup A$, where $\tilde{A}$ is the index set of the nonzero group LASSO estimates;

(b) if $m_n^2\lambda_{n1}^2/(m_{nj}\,n^2)\to 0$ as $n\to\infty$ for $1\le j\le q$, all the nonzero components $f_j$, $1\le j\le q$, are selected with probability converging to 1.

Theorem 4′. With the new Assumptions 1 and 2:

(a) $P\big(\|\hat{f}^A_{gL,j}\|_2>0,\ j\in A\ \text{and}\ \|\hat{f}^A_{gL,j}\|_2=0,\ j\notin A\big)\to 1$;

(b) $\|\hat{f}^A_{gL,j}-f_j\|_2^2=O_p\!\left(\left(\dfrac{\rho^2_{\max}(L)m_n^3\log(J_0m_n)}{n}+\dfrac{m_n}{n}+\dfrac{1}{m_n^{2\tau-1}}+\dfrac{\kappa^4m_n^2\lambda_{n2}^2}{n^2}\right)\Big/m_{nj}\right)$ for all $j\in A$.

2.6.2 Results for the extension to long-range dependence

We extend assumption (H3) to cover a broader class of spatial covariance functions; the assumptions (H6) and (K2) are adjusted accordingly.

(H3)* The random vector $\eta=\{\eta(s),s\in\mathcal{S}\}\sim\mathrm{Gaussian}(0,\Sigma_T)$, where $\Sigma_T=((\sigma_{s,s'}))_{s,s'\in\mathcal{S}}$ with $\sigma_{s,s'}=\sigma(s-s')$ and $\sigma(h)$ is a covariance function such that $\int_{D_n}\sigma(h)\,dh=O(n^{\gamma})$ for some $\gamma\in[0,1)$. Here $D_n\subset\mathbb{R}^d$ is the sampling region that contains the sampling locations $\mathcal{S}$. Without loss of generality, we assume that the origin of $\mathbb{R}^d$ is in the interior of $D_n$ and that $D_n$ increases with $n$.

(H6)* $m_n=O(n^{\iota})$ with $1/6\le\iota=1/(2\tau+1)<(1-\gamma)\wedge 1/3$.

(K2)* $\dfrac{\sqrt{\rho^2_{\max}(L)\,n^{1+\gamma}m_n\log(s_nm_n)}}{\lambda_{n2}r_n}+\dfrac{\lambda_{n2}^2\kappa^2m_n}{n^2r_n^2}+\dfrac{\lambda_{n2}m_n}{n}=o(1)$, where $s_n=J-|A|$.

The lemmas and theorems are then updated in the following way.

Lemma 3. Let $M_n$ be a non-negative definite matrix of order $n$ and
$$
T_{jl}=\sqrt{m_n}\,n^{-1/2}a^{\prime}_{jl}M_n\eta_n,\qquad 1\le j\le J,\ 1\le l\le m_n,
\tag{2.6.6}
$$
where $a_{jl}=(B^c_l(X_j(s)),s\in\mathcal{S})^{\prime}$ and $T_n=\max_{1\le j\le J,\,1\le l\le m_n}|T_{jl}|$. Then, under assumptions (H2), (H3)*, (H4) and (H5),
$$
E(T_n)\le C_1\rho_{\max}(M_n)\sqrt{m_n\log(Jm_n)}\;O(n^{\gamma/2})
\tag{2.6.7}
$$
for some $C_1>0$.

Lemma 4. Under Assumption 1 with the updated (H3)* and (H6)*, and with $\lambda_{n1}>C\rho_{\max}(L)\sqrt{n^{1+\gamma}m_n\log(Jm_n)}$ for a sufficiently large constant $C$, we have $|\tilde{A}|\le M_1|A|$ for a finite constant $M_1>1$ with probability converging to 1.

Theorem 1. Suppose that the conditions in Assumption 1 with the updated (H3)* and (H6)* hold and that $\lambda_{n1}>C\rho_{\max}(L)\sqrt{n^{1+\gamma}m_n\log(Jm_n)}$ for a sufficiently large constant $C$. Then:

(a) $\sum_{j=1}^{J}\|\hat{\beta}_{gL,j}-\theta_j\|_2^2=O_p\!\left(\dfrac{\rho^2_{\max}(L)m_n^3\log(Jm_n)}{n^{1-\gamma}}+\dfrac{m_n}{n^{1-\gamma}}+\dfrac{1}{m_n^{2\tau-1}}+\dfrac{\kappa^4m_n^2\lambda_{n1}^2}{n^2}\right)$;

(b) if $m_n^2\lambda_{n1}^2/n^2\to0$ as $n\to\infty$, all the nonzero components $\theta_j$, $1\le j\le q$, are selected with probability (w.p.) converging to 1.

Theorem 2. Suppose that the conditions in Assumption 1 with the updated (H3)* and (H6)* hold and that $\lambda_{n1}>C\rho_{\max}(L)\sqrt{n^{1+\gamma}m_n\log(Jm_n)}$ for a sufficiently large constant $C$. Then:

(a) $\|\hat{f}_{gL,j}-f_j\|_2^2=O_p\!\left(\dfrac{\rho^2_{\max}(L)m_n^2\log(Jm_n)}{n^{1-\gamma}}+\dfrac{1}{n^{1-\gamma}}+\dfrac{1}{m_n^{2\tau}}+\dfrac{\kappa^4m_n\lambda_{n1}^2}{n^2}\right)$ for $j\in\tilde{A}\cup A$, where $\tilde{A}$ is the index set of the nonzero group LASSO estimates;

(b) if $m_n\lambda_{n1}^2/n^2\to0$ as $n\to\infty$, all the nonzero components $f_j$, $1\le j\le q$, are selected w.p. converging to 1.

Theorem 3. Suppose that the conditions in Assumptions 1 and 2 with the updated (H3)*, (H6)* and (K2)* are satisfied. Then:

(a) $P(\hat{A}_{gL}=A)\to1$;

(b) $\sum_{j=1}^{q}\|\hat{\theta}^A_{gL,j}-\theta_j\|_2^2=O_p\!\left(\dfrac{\rho^2_{\max}(L)m_n^3\log(J_0m_n)}{n^{1-\gamma}}+\dfrac{m_n}{n^{1-\gamma}}+\dfrac{1}{m_n^{2\tau-1}}+\dfrac{\kappa^4m_n^2\lambda_{n2}^2}{n^2}\right)$.
Theorem 4. Suppose that the conditions in Assumptions 1 and 2 with the updated (H3)*, (H6)* and (K2)* are satisfied. Then:

(a) $P\big(\|\hat{f}^A_{gL,j}\|_2>0,\ j\in A\ \text{and}\ \|\hat{f}^A_{gL,j}\|_2=0,\ j\notin A\big)\to1$;

(b) $\sum_{j=1}^{q}\|\hat{f}^A_{gL,j}-f_j\|_2^2=O_p\!\left(\dfrac{\rho^2_{\max}(L)m_n^2\log(J_0m_n)}{n^{1-\gamma}}+\dfrac{1}{n^{1-\gamma}}+\dfrac{1}{m_n^{2\tau}}+\dfrac{\kappa^4m_n\lambda_{n2}^2}{n^2}\right)$.

The updated theorems show that the lower bound of the penalty parameter, as well as the convergence rate, is affected by $\gamma$. More specifically, the introduction of $\gamma$ increases the order of the lower bound of the penalty parameter, and the order of the convergence rate is decreased (a slower convergence rate) with $\gamma$. Note that $\gamma$ does not fully characterize a spatial dependence structure, but it gives some information on the level of spatial dependence: $0<\gamma<1$ implies long-range dependence. For any integrable stationary spatial covariance model, $\gamma=0$, and this is the case in most practical situations. If $0<\gamma<1$, one might consider estimating $\gamma$ for calculation of the lower bound of the penalty parameter. There is some literature on how to estimate long-range parameters for random fields [e.g., Anh and Lunney (1995), Boissy et al. (2005)], but these methods are limited since a specific class of random fields or a parametric model is assumed. Estimation of $\gamma$ is of independent interest, but we do not pursue it here since our focus is on variable selection.

CHAPTER 3

ESTIMATING A NON-STATIONARY SPATIAL COVARIANCE MATRIX USING MULTI-RESOLUTION KNOTS

For most statistical prediction problems, obtaining a BLUP is crucial, and generally modeling and estimating the mean does the trick; for the spatial BLUP, also known as kriging, estimation of the underlying process covariance is just as instrumental. The concept of kriging was first introduced by D. G. Krige, a South African mining engineer (Cressie, 1990), and Matheron coined the term in 1962 to honor Krige. Kriging is a very popular tool in earth climate modeling and the environmental sciences. It quantifies spatial variability through a covariance function, and solving the standard kriging equations is often numerically cumbersome, since it involves the inversion of an $n\times n$ covariance matrix. Because the computation cost increases with the cubic power of the dimension $n$, spatial BLUP becomes challenging for large $n$, which is quite common for real data observed on a global scale. Hence, there have been several efforts to achieve a computationally feasible estimate.
The foremost challenge in estimating a covariance in a spatial setup arises from the absence of repetition. This may seem absurd once we realize that the situation is a multivariate extension of computing a variance from a single observation. As odd as it may sound, the trick is to assume a specific sparsity structure for the covariance matrix under study. The covariance matrix is sparse when the covariance function has finite range, and due to sparsity the computational cost of inverting an $n\times n$ matrix is reduced considerably. Before we delve into the discussion of our contribution, we review a few other attempts in the literature to estimate large covariance matrices.

In 1997, Barry and Pace used a symmetric minimum degree algorithm for kriging with $n=916$. Rue and Tjelmeland (2002) approximated $\Sigma^{-1}$ by the sparse precision matrix of a Gaussian Markov random field wrapped on a torus. For larger $n$, the first challenge in applying kriging is the increase in the condition number of the covariance matrix, which plays a major role in building up the computation time and makes the kriging equations numerically unstable. On the other hand, to handle computational complexity, Kaufman et al. (2008) introduced the idea of covariance tapering, which sparsifies the covariance matrix elementwise to approximate the likelihood. Other efforts in tapering worth mentioning are Furrer et al. (2012), Stein (2013), etc. Covariance tapering gains immense computational stability, keeps the interpolating property, and enjoys asymptotic convergence of the taper estimator. But tapering is restricted to isotropic covariance structures, and the tapering radius needs to be determined. Another alternative method, FRK, was introduced by Cressie and Johannesson (2008). Unlike tapering, FRK is applicable to a more flexible class of non-stationary covariance matrices and also reduces the computational cost of kriging to $O(n)$.

For many non-stationary covariance models like ours, the observed process covariance matrix can be decomposed into two additive matrix components. The first is a measurement error modeled as white noise, while the second is an underlying process that can have a non-stationary covariance structure and is often assumed to be fixed but of low rank. The underlying process can be represented as a linear combination of $r_n$ random effects. In FRK, $r_n$ plays the role of the rank of the non-stationary component and is considered known and fixed over $n$. In this work we relax this assumption by allowing $r_n$ to change with $n$. Our goal in this paper is to develop a data-driven approach for finding the rank $r_n$. To do so, even though there is an unknown number $r_n$ of random effects representing the underlying process, we start with some number of random effects and, as we proceed, our algorithm directs us toward a data-driven value for $r_n$. Once we observe that the dispersion matrix of this random effect vector can be decomposed into its Cholesky factor, a closer look shows that dropping or selecting a particular random effect boils down to a zero or non-zero row in the corresponding Cholesky matrix. We therefore consider a penalized likelihood approach where we penalize the $\ell_2$ norm within each row of the Cholesky matrix and the $\ell_1$ norm between different rows of the Cholesky matrix.

The low rank non-stationary covariance matrix is decomposed using a basis component (not necessarily orthogonal) and a second component, the dispersion matrix of the random effects vector. The basis component depends primarily on the choice of the class of basis functions and the number of knot points. FRK recommends that the basis functions be multi-resolutional; more precisely, they used local bisquare functions. This use of locally multi-resolutional knots has also proved quite useful in the literature on kriging for large spatial data sets beyond FRK (Nychka (2015)). The choice of the number of knot points and their positions is always crucial; the number of knot points is directly related to $r_n$, the number of random effect components. The foremost challenge in applying our method is the choice of the effective number of knot points necessary to construct the basis functions under study.

Although our initial objective in this work is to provide a way to estimate the non-zero random effects and finally the covariance matrix, like any other statistical prediction problem we shall extend our findings to the presence of covariates. Peng and Wu (2013) proved that the condition number of the covariance matrix also increases with an increase in input variables. To handle numerical instability, Peng and Wu (2013) suggested the idea of regularized kriging, which is a simple modification of the method of estimation.
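The row-selection idea can be checked numerically: keeping only the first $r_n$ rows of a lower triangular factor non-zero produces exactly a rank-$r_n$ random-effect covariance, and deleting a zero row leaves the covariance of the retained effects unchanged. The dimensions below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
L_knots, r_n = 8, 3          # hypothetical: 8 candidate random effects, true rank 3

# Lower-triangular Cholesky-type factor with only the first r_n rows non-zero
Phi = np.zeros((L_knots, L_knots))
for j in range(r_n):
    Phi[j, : j + 1] = rng.standard_normal(j + 1)
# keep the leading diagonal strictly positive so the non-zero block has full rank
Phi[np.arange(r_n), np.arange(r_n)] = np.abs(Phi[np.arange(r_n), np.arange(r_n)]) + 0.5

K = Phi @ Phi.T              # random-effect covariance: zero rows <=> dropped effects
assert np.linalg.matrix_rank(K) == r_n

# Dropping a random effect = deleting its (zero) row: the covariance of the
# retained effects is untouched.
K_kept = Phi[:r_n] @ Phi[:r_n].T
assert np.allclose(K[:r_n, :r_n], K_kept)
```

This is why a group penalty on the rows of the Cholesky factor can serve simultaneously as a rank estimator and a covariance estimator.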
Unlike kriging, regularized kriging optimizes a regularized, or penalized, likelihood. At this stage we have not considered dimension reduction challenges while extending our findings to the presence of covariates but, for future studies, this can be a non-trivial and worthwhile extension.

A recent study on the limitations of low rank kriging (Stein (2014)) considers an approximation in which observations are split into contiguous blocks and independence is assumed across these blocks. It provides a much better approximation to the data likelihood than a low rank approximation requiring similar memory and calculations. It also shows that in some settings the Kullback-Leibler divergence for the low rank approximation is not reduced as much as it should be; on the contrary, the divergence is considerably reduced if there is a block structure. Keeping this in mind, and considering the fact that selection of knots works better under a multi-resolution setup, we choose the knots by superimposing resolutions.

Under some sensible assumptions, this work will motivate our readers toward the existence of a consistent covariance estimator of the spatial process using low rank modeling, whose estimation, to the best of our knowledge, has not been discussed before in the literature. We will discuss the practical implications of our assumptions later, but we point out that, without loss of generality, we consider the location knots for the bivariate spline matrix to be ordered in a specific way such that the true structure has the first $r_n$ rows non-zero and the remaining $n-r_n$ rows zero. We also discuss how our findings fit the situations covered by the limitations of low rank kriging (Stein (2014)); to avoid further mathematical details here, this part of the comparison is deferred to the discussion in Section 3.4.

All the approximations of the covariance function introduced so far share the motive of reducing computational cost. Most of these existing methods fail to capture both large scale (long-range) and small scale (short-range) dependence: tapering captures small scale dependence, while low rank techniques capture large scale dependence. A method combining these two components has been discussed (Sang and Huang 2012). However worthwhile this combination of low rank and tapering may look, this paper provides a more sound theoretical approach to support our algorithm and findings.

Although estimation of a low rank covariance matrix has its limitations, the method has not always been criticized; rather, it is well established in several situations by various authors. Most of the interesting work in this field can be classified into two broad classes: statistics and machine learning. Among many others in the field of statistics, we think Fan and Li (2012), Banerjee et al. (2012), and Tzeng and Huang (2015) are worth mentioning. On the other hand, the field of machine learning focuses on developing algorithms, where Frieze et al. (2004), Achlioptas and McSherry (2007), and Journee et al. (2010) are quite reasonable to browse through. Based on this literature, it is clearly worthwhile to contribute our time to a theoretical justification of the possibility of low rank covariance matrix estimation.

Even when we keep the rank fixed, for a very large data set (on the order of tens of thousands to hundreds of thousands), kriging can be quite impossible, and ad hoc local kriging neighborhoods are used (Cressie (1993)). Some recent developments include Nychka et al. (1996; 2002), Furrer et al. (2006), and many more. Among other alternative methods, some worth discussing are radial basis interpolation functions (Buhlmann (2004)), inverse distance weighting (Shepard (1968)), regression-based inverse distance weighting used by Joseph and Kang (2009), which is a fast interpolator, and the overcomplete basis surrogate method (Chen, Wang, and Wu (2010)). Surrogate basis representation is similar to lattice kriging (Nychka (2015)), where the basis functions are boundedly supported and overcomplete. But lattice kriging considers sparsity in the precision matrix through a bounded basis function matrix and a parametric neighborhood matrix, whereas we consider sparsity in the covariance matrix through low rank factorization and Cholesky decomposition of the low rank covariance matrix.

The rest of this paper is organized as follows. In Section 3.1, we explain the proposed approach for selecting and estimating the non-zero rows (the rank) and the corresponding low rank covariance matrix. In Section 3.2 we present the block coordinate descent algorithm for blockwise convex regularizing functions. Section 3.3 contains simulation results along with a real data example. We make some concluding remarks in Section 3.4.
3.1 Methodology for estimating a non-stationary low rank covariance matrix

To understand the methodology vividly we need to explain two ideas: arranging bivariate knots in a multi-resolution setup, and a group LASSO penalty to estimate parameters having an inherent group structure. The two ideas are separately useful in solving two different problems. The arrangement of bivariate knots in a multi-resolution setup plays an important role in the indexing of the spatial domain: unlike time series analysis, where the time domain is easily arranged in chronological fashion, in a spatial domain in $\mathbb{R}^k$ with $k\ge2$ it is quite impossible to arrange the location sites in any chronological pattern. The group LASSO penalization, in turn, addresses the high dimensionality of the parameter space, which depends on the number of location sites we consider in our study. Group LASSO penalization (Yuan et al. (2006)), or the concept of using an $\ell_1/\ell_2$-penalty (Bühlmann et al. (2011)), is well established in the context of selecting variables when it is believed that there exists an inherent group structure in the parameter space. But it has not been clear how such an approach can be applied to estimating the rank of a low-rank matrix and estimating the matrix itself. In this section, we propose an $\ell_1/\ell_2$-penalized approach for estimating the low rank non-stationary covariance matrix as an extension of FRK. The goal of FRK is to reduce the computation cost of inverting a matrix from cubic to linear in the sample size while allowing non-stationarity. To explain the difference between FRK and our method, we introduce the following mathematical notation; since the ideas are interlinked, I present both using the same notation, hence the following notational bookkeeping.

Consider a spatial process $Y=\{Y(s);s\in\mathcal{S}\}$, perturbed with measurement error $\varepsilon=\{\varepsilon(s);s\in\mathcal{S}\}$, and let $X=\{X(s);s\in\mathcal{S}\}$ be the process for the observables, where $\varepsilon$ is a Gaussian process with mean 0 and $\mathrm{var}(\varepsilon(s))=\sigma^2v(s)\in(0,\infty)$, $s\in\mathcal{S}$, for $\sigma^2>0$ and $v(\cdot)$ known. $\mathcal{S}$ is the spatial domain of interest. In general, the underlying process $Y$ has a mean structure, $Y(s)=Z(s)'\beta+\nu(s)$ for all $s\in\mathcal{S}$, where $\nu=\{\nu(s);s\in\mathcal{S}\}$ follows a Gaussian distribution with mean 0.

At resolution $k$, writing $d_{i(k)}(s)$ for the scaled distance from $s$ to the knot $u_{i(k)}$, the local bisquare basis function is defined as
$$
R\big(s;u_{i(k)}\big)=\begin{cases}\big\{1-d_{i(k)}(s)^2\big\}^2, & \text{if } d_{i(k)}(s)\le1,\\ 0, & \text{otherwise,}\end{cases}
$$
and the Wendland basis function is defined as
$$
\tilde{R}_{i(k)}(s)=R_d\big(s;u_{i(k)}\big)=\begin{cases}\big(1-d_{i(k)}(s)\big)^6\big\{35\,d_{i(k)}(s)^2+18\,d_{i(k)}(s)+3\big\}, & \text{if } d_{i(k)}(s)\le1,\\ 0, & \text{otherwise.}\end{cases}
$$
Note that both of these basis functions are bounded. Although by introducing multi-resolution knot arrangements we bring in more parameters and are cursed with dimensionality, we will exploit this boundedness to achieve sparsity. The objective here is to determine which $r_n$ among these $L$ knots are necessary to model the non-stationary covariance function. In a situation where the sample points are distributed uniformly over the sampling domain, we select the first $r_n$ basis rows and drop the rest to obtain $R$; even in practice, for irregularly spaced data, we may hope to be able to select $r_n$ out of all $L$ knots. As mentioned earlier, multi-resolution knots play an important role in defining the underlying indexing system even though the original location points are hard to order, but this comes with the curse of dimensionality. To present the curse of dimensionality we use Table 3.1, where the first column gives the resolution index and the other two columns correspond to two- and three-dimensional space, respectively. A range (a-b) means that if the study has $n$ sample points with $a\le n\le b$, then we need $b$ knots; the number of resolutions corresponds to the number in the first column.

Resolution index | Range of sample points, 2-dimensional space | Range of sample points, 3-dimensional space
1 | 1-4     | 1-8
2 | 5-20    | 9-72
3 | 21-84   | 73-584
4 | 85-340  | 585-4680
5 | 341-1364  | 4681-37448
6 | 1365-5460 | 37449-299592

Table 3.1: Number of knots necessary for every resolution.

So, for our covariance function $\sigma(s,s')$ we start with the model
$$
\sigma(s,s')=\tilde{R}(s)'\tilde{\Sigma}\tilde{R}(s'),\qquad\forall s,s'\in\mathcal{S}_n.
\tag{3.1.3}
$$
Similar to equation (3.1.2), one can easily see that equation (3.1.3) is a consequence of writing $\nu(s)=\tilde{R}(s)'\tilde{\varphi}$, where $\tilde{\varphi}$ is an $L$-dimensional vector with $\mathrm{var}(\tilde{\varphi})=\tilde{\Sigma}$. Hence this expression of the random effect $\nu(s)$ in (3.1) gives us
$$
X(s)=Z(s)'\beta+\tilde{R}(s)'\tilde{\varphi}+\varepsilon(s),\qquad\forall s\in\mathcal{S}.
\tag{3.1.4}
$$
For simplicity, let us first present our method for the case $Z=0$. Let us now explain the relation between the two versions of the random effects (or covariance) model and the method used to reduce this dimensionality cost. Ideally, $\Sigma$ is a sub-matrix of $\tilde{\Sigma}$, with $\tilde{R}\tilde{\Sigma}\tilde{R}'=R\Sigma R'$, such that
$$
\tilde{\Sigma}=\begin{pmatrix}\Sigma & O_{r_n\times(L-r_n)}\\ O_{(L-r_n)\times r_n} & O_{(L-r_n)\times(L-r_n)}\end{pmatrix}
=\tilde{\Phi}\tilde{\Phi}'
=\begin{pmatrix}\Phi\Phi' & O_{r_n\times(L-r_n)}\\ O_{(L-r_n)\times r_n} & O_{(L-r_n)\times(L-r_n)}\end{pmatrix},
\tag{3.1.5}
$$
where $\Phi\equiv\Phi_{r_n\times r_n}$ is the $r_n\times r_n$ matrix from the Cholesky decomposition of $\Sigma$, the unique lower triangular factor.
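The counts in Table 3.1 follow a simple geometric rule: in $d$ dimensions, resolution $k$ contributes $(2^d)^k$ knots ($4^k$ in 2-D, $8^k$ in 3-D), and the tabulated upper endpoints are the cumulative totals. A quick check:

```python
def knots_through_resolution(K, d):
    """Cumulative number of multi-resolution knots through resolution K
    in d dimensions: sum_{k=1}^{K} (2**d)**k."""
    return sum((2 ** d) ** k for k in range(1, K + 1))

# Reproduces the upper endpoints of the ranges in Table 3.1
assert [knots_through_resolution(K, 2) for K in range(1, 7)] == [4, 20, 84, 340, 1364, 5460]
assert [knots_through_resolution(K, 3) for K in range(1, 7)] == [8, 72, 584, 4680, 37448, 299592]
```

The geometric growth in $d$ and $K$ is exactly the curse of dimensionality referred to above: each added resolution multiplies the number of candidate knots, and hence rows of the Cholesky factor, by $2^d$.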
In practice, it may be necessary to reorder the columns of our basis matrix $\tilde{R}$ to achieve the above structure. This reordering can be taken care of by introducing a permutation matrix, as explained in the appendix. So, for the rest of the discussion, we consider $K=\tilde{R}\tilde{\Sigma}\tilde{R}'$. Since the rank is unknown, we propose to start with all $L$ rows non-zero. Our proposed method allows us to select the non-zero rows of $\tilde{\Phi}$, which eventually captures all the information required to retrieve $\Sigma$. We drop a row from $\tilde{\Phi}$ if and only if all the elements in that row are smaller than some preset value. The innovation of this work is to exploit the group structure in the rows of the lower triangular matrix: dropping or keeping a random effect corresponds exactly to a zero or non-zero row. To do this, we consider a groupwise penalization; the resulting shrinkage equations have a blockwise optimization structure.

Denote by $\tilde{\varphi}'_{(j)}=(\tilde{\varphi}_{j1},\tilde{\varphi}_{j2},\ldots,\tilde{\varphi}_{jj},0,0,\ldots,0)=(\tilde{\varphi}'_j,0,0,\ldots,0)$ the $j$th row of $\tilde{\Phi}$, where the number of zeros in the $j$th row is $L-j$. Define $\widetilde{\mathrm{vec}}_{\mathrm{Fullset}}=(\tilde{\varphi}'_1,\tilde{\varphi}'_2,\ldots,\tilde{\varphi}'_L)$ to be a row-wise vector representation of the lower triangular part of the matrix $\tilde{\Phi}$. For a weight vector $\psi=(\psi_1,\psi_2,\ldots,\psi_L)'$, we define a weighted $\ell_1/\ell_2$-norm $\|\widetilde{\mathrm{vec}}_{\mathrm{Fullset}}\|_{2,1,\psi}=\sum_{j=1}^{L}\psi_j\|\tilde{\varphi}_{(j)}\|_2$, where $\|\cdot\|_2$ is the $\ell_2$-norm of a vector. So, we propose the following weighted $\ell_1/\ell_2$-penalized log likelihood function,
$$
Q_n(\tilde{\Phi},\sigma^2;\tau_n,\psi)=X'\Sigma^{-1}X+\log\det\Sigma+\tau_n\|\widetilde{\mathrm{vec}}_{\mathrm{Fullset}}\|_{2,1,\psi},
\tag{3.1.6}
$$
where $\tau_n$ is the regularization parameter, $\psi_n=(\psi_{n1},\psi_{n2},\ldots,\psi_{nL})'$ is a suitable choice of weight vector in the penalty term, and $\Sigma=\sigma^2I+K$. We allow the penalty parameter $\tau_n$ and the weight vector $\psi_n$ to depend on the sample size $n$. Now, using the above covariance modeling, i.e., $K=\tilde{R}\tilde{\Phi}\tilde{\Phi}'\tilde{R}'$, (3.1.6) can be rewritten as
$$
Q_n(\tilde{\Phi},\sigma^2;\tau_n,\psi)=n\,\mathrm{Tr}\Big(\hat{\Sigma}_0\big(\sigma^2I+\tilde{R}\tilde{\Phi}\tilde{\Phi}'\tilde{R}'\big)^{-1}\Big)+\log\det\big(\sigma^2I+\tilde{R}\tilde{\Phi}\tilde{\Phi}'\tilde{R}'\big)+\tau_n\|\widetilde{\mathrm{vec}}_{\mathrm{Fullset}}\|_{2,1,\psi},
\tag{3.1.7}
$$
where $\hat{\Sigma}_0=XX'/n$ is the scaled empirical covariance matrix. One can observe that the length of the non-zero part of each row of $\tilde{\Phi}$ varies, since it is a lower triangular matrix, and hence ideally we should put a varying penalty quantity on each row of the matrix. One way to handle this problem is to rescale the $j$th column of $\tilde{R}$ by $1/\psi_{nj}$. So we define $\tilde{R}^{\star}$ with $j$th column equal to $1/\psi_{nj}$ times the $j$th column of $\tilde{R}$, and accordingly we define $\tilde{\Phi}^{\star}$ with $j$th row equal to $\psi_{nj}$ times the $j$th row of $\tilde{\Phi}$, which leads to $\tilde{R}\tilde{\Phi}=\tilde{R}^{\star}\tilde{\Phi}^{\star}$. This transformation helps us achieve invariance in $K=\tilde{R}\tilde{\Phi}\tilde{\Phi}'\tilde{R}'$ for the adaptive group LASSO. Therefore the optimization problem in (3.1.7) boils down to an unweighted $\ell_1/\ell_2$-penalized log likelihood function,
$$
Q_n(\tilde{\Phi},\sigma^2;\tau_n,\mathbf{1})=\mathrm{Tr}\Big(\hat{\Sigma}_0\big(\sigma^2I+\tilde{R}\tilde{\Phi}\tilde{\Phi}'\tilde{R}'\big)^{-1}\Big)+\log\det\big(\sigma^2I+\tilde{R}\tilde{\Phi}\tilde{\Phi}'\tilde{R}'\big)+\tau_n\|\widetilde{\mathrm{vec}}_{\mathrm{Fullset}}\|_{2,1,\mathbf{1}}.
\tag{3.1.8}
$$
We restrict our search to the space of lower triangular matrices with absolutely bounded elements and bounded $\sigma$. Let us denote our search space by $\mathcal{P}^0_N$ introduced in (??), where $N=\frac{n(n+1)}{2}+1$, the total number of parameters, is an increasing function of $n$. Observe that with this rescaling the magnitude of our spatial basis matrix $\tilde{R}$ changes with $n$, which could imply that the largest and smallest eigenvalues of $\tilde{R}$ are not fixed for varying sample size. As a choice for $\psi_{nj}$, we consider $\psi_{nj}=1/j$; i.e., the $j$th row $\tilde{\varphi}_{(j)}$ is scaled down by its number of non-zero components. Define $\big(\widehat{\tilde{\Phi}}_{gL}(\tau_n),\hat{\sigma}^2\big)=\arg\min_{\mathcal{P}^0_N}Q_n(\tilde{\Phi},\sigma^2;\tau_n,\mathbf{1})$. Based on $\widehat{\tilde{\Phi}}_{gL}$, $\hat{\sigma}^2$ and $\tilde{R}$, we can obtain $\hat{\Sigma}_{gL}=\hat{\sigma}^2I+\tilde{R}\widehat{\tilde{\Phi}}_{gL}\widehat{\tilde{\Phi}}'_{gL}\tilde{R}'$ and $\hat{r}_n=\mathrm{rank}\big(\widehat{\tilde{\Phi}}_{gL}\big)$.

3.2 Block Coordinate Descent algorithm with proximal update

In this section we present a cost-effective algorithm for the optimization problem posed in (3.1.8). We have a block-wise function, the blocks being the rows of a lower triangular matrix $\tilde{\Phi}$, along with a group LASSO type penalty, the groups corresponding to each block. There have been a few significant efforts toward building efficient algorithms to minimize a penalized likelihood. Although groupwise penalization is not a completely different ball game, it still requires some special attention, which exploits the group structure and considers penalizing the $\ell_2$ norm of each group. We will use a Block Coordinate Descent (BCD) method for a block multi-convex function under regularizing constraints,
$$
\min_{x\in\mathcal{X}}\Big\{F(x_1,x_2,\ldots,x_n)=f(x_1,x_2,\ldots,x_n)+\sum_{j=1}^{n}r_j(x_j)\Big\},
\tag{3.2.1}
$$
where $x$ is decomposed into $n$ blocks and $r_j(x_j)$ is the regularizing constraint for the $j$th block. On comparing (3.2.1) with (3.1.8), we see that in our case we have $n$ blocks, $\mathcal{X}$ is the collection of lower triangular matrices of the form $\tilde{\Phi}$, and $F(\tilde{\Phi})=Q_n(\tilde{\Phi};\tau_n)$, with
$$
f\big(\tilde{\varphi}_{(1)},\tilde{\varphi}_{(2)},\ldots,\tilde{\varphi}_{(n)}\big)=\mathrm{Tr}\Big(\hat{\Sigma}_0\big(\sigma^2I+\tilde{R}\tilde{\Phi}\tilde{\Phi}'\tilde{R}'\big)^{-1}\Big)+\log\det\big(\sigma^2I+\tilde{R}\tilde{\Phi}\tilde{\Phi}'\tilde{R}'\big),
\tag{3.2.2a}
$$
$$
r_j\big(\tilde{\varphi}_{(j)}\big)=\tau_n\frac{\|\tilde{\varphi}_{(j)}\|_2}{j}.
\tag{3.2.2b}
$$
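A naive evaluation of $f$ in (3.2.2a) would invert the $n\times n$ matrix $\sigma^2I+\tilde{R}\tilde{\Phi}\tilde{\Phi}'\tilde{R}'$ at every step. Because this matrix is diagonal plus low rank, both the trace term and the log determinant can instead be computed from an $L\times L$ system; this is the role the Sherman-Morrison-Woodbury identity and the matrix determinant lemma play. The matrices in this sketch are arbitrary stand-ins, not the estimator itself.

```python
import numpy as np

def neg_loglik_lowrank(S0, R, Phi, sigma2):
    """Evaluate Tr(S0 Sigma^{-1}) + log det(Sigma) for
    Sigma = sigma2*I + R Phi Phi' R' without forming the n x n inverse.
    Only a q x q system is factorized, q = number of retained columns."""
    n = S0.shape[0]
    U = R @ Phi                                  # n x q low-rank factor
    q = U.shape[1]
    M = np.eye(q) + (U.T @ U) / sigma2           # q x q capacitance matrix
    Minv_Ut = np.linalg.solve(M, U.T)
    # Woodbury: Sigma^{-1} = (I - U M^{-1} U' / sigma2) / sigma2
    tr_term = (np.trace(S0) - np.trace(Minv_Ut @ (S0 @ U)) / sigma2) / sigma2
    # Matrix determinant lemma: log det(Sigma) = n log(sigma2) + log det(M)
    logdet = n * np.log(sigma2) + np.linalg.slogdet(M)[1]
    return tr_term + logdet

rng = np.random.default_rng(2)
n, Lk = 60, 5
R = rng.standard_normal((n, Lk))
Phi = np.tril(rng.standard_normal((Lk, Lk)))
S0 = np.cov(rng.standard_normal((n, 100)))       # placeholder empirical covariance
Sigma = 0.5 * np.eye(n) + R @ Phi @ Phi.T @ R.T
direct = np.trace(S0 @ np.linalg.inv(Sigma)) + np.linalg.slogdet(Sigma)[1]
fast = neg_loglik_lowrank(S0, R, Phi, 0.5)
```

The direct and the low-rank evaluations agree to machine precision, while the cost drops from $O(n^3)$ to $O(nq^2)$ per likelihood evaluation.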
To ease the computation, we use the matrix determinant lemma and the Sherman-Morrison-Woodbury matrix identity. We follow the "prox-linear" algorithm (Xu & Yin (2013)), where the update for $\tilde{\varphi}_{(j)}$ in the $k$th step is denoted by $\tilde{\varphi}^k_{(j)}$ and is given by
$$
\tilde{\varphi}^k_{(j)}=\arg\min_{\tilde{\varphi}_{(j)}}\Big\{\big\langle\hat{g}^k_j,\ \tilde{\varphi}_{(j)}-\hat{\tilde{\varphi}}^{k-1}_{(j)}\big\rangle+\frac{L^{k-1}_j}{2}\big\|\tilde{\varphi}_{(j)}-\hat{\tilde{\varphi}}^{k-1}_{(j)}\big\|_2^2+r_j\big(\tilde{\varphi}_{(j)}\big)\Big\},\qquad\forall j\ \&\ k,
\tag{3.2.3}
$$
where the extrapolated point $\hat{\tilde{\varphi}}^{k-1}_{(j)}$ is given as $\hat{\tilde{\varphi}}^{k-1}_{(j)}=\tilde{\varphi}^{k-1}_{(j)}+\omega^{k-1}_j\big(\tilde{\varphi}^{k-1}_{(j)}-\tilde{\varphi}^{k-2}_{(j)}\big)$, with $\omega^{k-1}_j\ge0$ the extrapolation weight, $\hat{g}^k_j=\nabla f^k_j\big(\hat{\tilde{\varphi}}^{k-1}_{(j)}\big)$, and
$$
f^k_j\big(\tilde{\varphi}_{(j)}\big)\stackrel{\mathrm{def}}{=}f\big(\hat{\tilde{\varphi}}^k_{(1)},\hat{\tilde{\varphi}}^k_{(2)},\ldots,\hat{\tilde{\varphi}}^k_{(j-1)},\tilde{\varphi}_{(j)},\hat{\tilde{\varphi}}^{k-1}_{(j+1)},\ldots,\hat{\tilde{\varphi}}^{k-1}_{(s)}\big),\qquad\forall j\ \&\ k.
$$
The second term on the right-hand side, unlike in the standard block coordinate descent algorithm, is added to make sure that the $k$th update is not too far from the $(k-1)$th update in the $L_2$ sense. Before we can do that, we need to prove block multi-convexity (Lemma 1) and Lipschitz continuity (Lemma ??) of $f(\tilde{\varphi}_{(1)},\tilde{\varphi}_{(2)},\ldots,\tilde{\varphi}_{(n)})$ and $\nabla f^k_j(\tilde{\varphi}_{(j)})$, respectively. Generally, $L^{k-1}_j$ is some constant greater than zero and plays a role similar to the penalty parameter $\tau_n$ in $r_j(\tilde{\varphi}_{(j)})$: if the $k$th update is too far from the $(k-1)$th update in the $L_2$ sense, our objective is to penalize it more and control it. So, unlike the standard group LASSO problem, here we have to take care of two penalty parameters rather than just one. We thus have a more challenging problem to solve, but if scaled properly one can choose the constant $L^{k-1}_j$ as a scalar multiple of $\tau_n$. Let us introduce a new quantity $\mu=L^{k-1}_j/\tau_n$, which is used to explain the rest of our algorithm and is referred to as a dual parameter for our optimization method.

To update (3.2.3), we use the fact that if $r_j$ can be represented as an indicator function of a convex set $\mathcal{D}_j$, i.e., $r_j=\delta_{\mathcal{D}_j}(\tilde{\varphi}_{(j)})=I\big(\tilde{\varphi}_{(j)}\in\mathcal{D}_j\big)$, then $\tilde{\varphi}^k_{(j)}=P_{\mathcal{D}_j}\big(\hat{\tilde{\varphi}}^{k-1}_{(j)}-\hat{g}^k_j/L^{k-1}_j\big)$, where $P_{\mathcal{D}_j}$ is the projection onto the set $\mathcal{D}_j$. Since for a groupwise LASSO penalty $r_j(\tilde{\varphi}_{(j)})=\tau_n\|\tilde{\varphi}_{(j)}\|_2/j$, this is equivalent to saying that we need the pre-penalized update $\hat{\tilde{\varphi}}^{k-1}_{(j)}-\hat{g}^k_j/L^{k-1}_j$ first scaled down by its length $j$, and then shrunk toward the sphere of radius $\sqrt{\mu}$. For our groupwise LASSO penalty, we define our componentwise scaled projection function as
$$
P_{\mathcal{D}_j}(t)=\mathrm{sgn}(t)\max\big(\sqrt{\mu}\,|t|/j-\mu,\ 0\big).
$$
So the update rule (3.2.3) can be simplified, and the following can be used componentwise to obtain the $j$th row:
$$
\tilde{\varphi}^k_{(j)}=\mathrm{sgn}\Big(\hat{\tilde{\varphi}}^{k-1}_{(j)}-\hat{g}^k_j/L^{k-1}_j\Big)\Big(\sqrt{\mu}\,\big|\hat{\tilde{\varphi}}^{k-1}_{(j)}-\hat{g}^k_j/L^{k-1}_j\big|\big/j-\mu\Big)_{+},\qquad\forall j\ \&\ k,
\tag{3.2.4}
$$
where all the above functions defined on the vector $\hat{\tilde{\varphi}}^{k-1}_{(j)}-\hat{g}^k_j/L^{k-1}_j$ are applied componentwise.
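The update (3.2.4) acts as a group soft-thresholding of the gradient step: a row whose shrunken magnitude falls below the threshold is set to zero entirely, which is how whole random effects get dropped. The sketch below shows the standard (unit-weight) group soft-thresholding operator and one prox-linear row update built on it; the step size, threshold value, and omission of extrapolation are simplifying assumptions for illustration.

```python
import numpy as np

def prox_group_l2(u, thresh):
    """Group soft-thresholding: argmin_x 0.5*||x - u||^2 + thresh*||x||_2.
    Returns the zero vector when ||u||_2 <= thresh (the row is dropped)."""
    norm = np.linalg.norm(u)
    if norm <= thresh:
        return np.zeros_like(u)
    return (1.0 - thresh / norm) * u

def bcd_row_update(phi_prev, grad_j, step, tau_n, weight_j):
    """One prox-linear step on row j: gradient step, then group shrinkage
    (extrapolation omitted for brevity)."""
    u = phi_prev - grad_j / step
    return prox_group_l2(u, tau_n * weight_j / step)

row = np.array([0.3, -0.2, 0.05])
kept = bcd_row_update(row, np.zeros(3), 1.0, 0.1, 1.0)     # row survives, shrunk
dropped = bcd_row_update(row, np.zeros(3), 1.0, 1.0, 1.0)  # whole row set to zero
```

A surviving row keeps its direction and loses exactly the threshold from its norm; this all-or-nothing behavior on rows is what makes the penalty a rank selector.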
Define the corresponding lower triangular matrix as $\tilde{\Phi}^k=\text{row-bind}\big(\tilde{\varphi}^{k\prime}_{(1)},\tilde{\varphi}^{k\prime}_{(2)},\ldots,\tilde{\varphi}^{k\prime}_{(n)}\big)$. We now present the working algorithm for our optimization, together with a small modification for situations where a subsequent extrapolated update does not reduce the objective value.

Algorithm 1.
- Initialization: lower triangular matrices $\tilde{\Phi}^{-1}$ and $\tilde{\Phi}^{0}$ as the first two initial roots, with no zero rows; tolerances $\epsilon>0$ and $\mu>0$ prespecified.
- For $k=1,2,3,\ldots$:
  - for $j=1,2,3,\ldots,n$: compute $\tilde{\varphi}^k_{(j)}$ using (3.2.4); form the lower triangular matrix $\tilde{\Phi}^k$;
  - set $\tilde{\Phi}^{-1}\leftarrow\tilde{\Phi}^{0}$ and $\tilde{\Phi}^{0}\leftarrow\tilde{\Phi}^k$;
  - for $j=1,2,\ldots,n$: $\mathrm{temp}_j\leftarrow\big\|\tilde{\varphi}^k_{(j)}-\tilde{\varphi}^{k-1}_{(j)}\big\|_2$; stop if $\max_j\mathrm{temp}_j\le\epsilon$.

and $<2/(15\lambda+11)$ with $\lambda>0$. Although one might feel the necessity of estimating the nuisance parameter $\lambda$, let us point out that our results work for any value of $\lambda>0$; even if we choose $\lambda=n^{-c}\to0$ with $c<1/7.5$, our findings cover Case 3 and a subset of Case 2 in Stein (2014). In Stein (2014) it is pointed out that the KL divergence is not reduced sufficiently under a very special situation of a stationary periodic process on a line, which can be extended to a process on a surface, although that can be quite challenging even for a stationary periodic process. On the contrary, our finding provides theoretical justification of consistent estimation of $\Sigma_0=\sigma^2I+\tilde{S}\tilde{K}\tilde{S}'$ in a more general setup.

BIBLIOGRAPHY

[1] Achlioptas, D., & McSherry, F. (2007), "Fast computation of low-rank matrix approximations", Journal of the ACM, 54(2), 9.
[2] Anh, V. V. and Lunney, K. E. (1995), "Parameter estimation of random fields with long-range dependence", Mathematical and Computer Modelling, 21(9), 67-77.
[3] Arbenz, P., & Drmač, Z. (2002), "On positive semidefinite matrices with known null space", SIAM Journal on Matrix Analysis and Applications, 24(1), 132-149.
[4] Antoniadis, A. and Fan, J. (2001), "Regularization of wavelet approximations (with discussion)", Journal of the American Statistical Association, 96, 939-967.
[5] Bai, Z. D., Yin, Y. Q. (1993), "Limit of the smallest eigenvalue of a large dimensional sample covariance matrix", The Annals of Probability, 1275-1294.
[6] Boissy, Y., Bhattacharyya, B. B., Li, X. and Richardson, G. D. (2005), "Parameter estimates for fractional autoregressive spatial processes", The Annals of Statistics, 33, 2553-2567.
[7] Banerjee, A., Dunson, D. B., & Tokdar, S. T. (2012), "Efficient Gaussian process regression for large datasets", Biometrika, ass068.
[8] Bühlmann, P., & Van De Geer, S. (2011), "Statistics for high-dimensional data: methods, theory and applications", Springer Science & Business Media.
[9] Cressie, N. (1990), "The origins of kriging", Mathematical Geology, 22(3), 239-252.
[10] Cressie, N. (1993), "Statistics for spatial data", Wiley Series in Probability and Statistics, Wiley-Interscience, New York.
[11] Cressie, N., Johannesson, G. (2008), "Fixed rank kriging for very large spatial data sets", Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1), 209-226.
[12] Chu, T., Zhu, J. and Wang, H. (2011), "Penalized maximum likelihood estimation and variable selection in geostatistics", The Annals of Statistics, 39(5), 2607-2625.
[13] Cressie, N. and Chan, N. H. (1989), "Spatial modeling of regional variables", Journal of the American Statistical Association, 84(406), 393-401.
[14] Datta, A., Banerjee, S., Finley, A. O., & Gelfand, A. E. (2014), "Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets", arXiv preprint arXiv:1406.7343.
[15] Fan, Y. and Tang, C. Y. (2013), "Tuning parameter selection in high dimensional penalized likelihood", Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3), 531-552.
[16] Fan, Y., Li, R. (2012), "Variable selection in linear mixed effects models", Annals of Statistics, 40(4), 2043.
[17] Frieze, A., Kannan, R., & Vempala, S. (2004), "Fast Monte-Carlo algorithms for finding low-rank approximations", Journal of the ACM, 51(6), 1025-1041.
[18] Furrer, R., Genton, M. G., & Nychka, D. (2006), "Covariance tapering for interpolation of large spatial datasets", Journal of Computational and Graphical Statistics, 15(3), 502-523.
[19] Fu, R., Thurman, A. L., Chu, T., Steen-Adams, M. M. and Zhu, J. (2013), "Estimation and selection of autologistic regression models via penalized pseudolikelihood", Journal of Agricultural, Biological, and Environmental Statistics, 18, 429-449.
[20] Gupta, S. (2012), "A note on the asymptotic distribution of LASSO estimator for correlated data", Sankhya A, 74, 10-28.
[21] Hastie, T. J. and Tibshirani, R. J., "Generalized additive models", CRC Press.
[22] Haustein, K. O. (2006), "Smoking and poverty", European Journal of Preventive Cardiology, 13(3).
[23] Hoeting, J. A., Davis, R. A., Merton, A. A. and Thompson, S. E. (2006), "Model selection for geostatistical models", Ecological Applications, 16(1), 87-98.
[24] Horn, R. A. and Johnson, C. A. (1985), "Matrix Analysis", Cambridge University Press.
[25] Hsu, N-J., Hung, H-L. and Chang, Y-M. (2008), "Subset selection for vector autoregressive processes using LASSO", Computational Statistics and Data Analysis, 52, 3645-3657.
[26] Huang, H. C. and Chen, C. S. (2007), "Optimal geostatistical model selection", Journal of the American Statistical Association, 102(479), 1009-1024.
[27] Huang, H. C., Hsu, N. J., Theobald, D. M. and Breidt, F. J. (2010a), "Spatial Lasso with applications to GIS model selection", Journal of Computational and Graphical Statistics, 19(4), 963-983.
[28] Huang, J., Horowitz, J. L. and Wei, F. (2010b), "Variable selection in nonparametric additive models", The Annals of Statistics, 38, 2282-2313.
[29] Journee, M., Bach, F., Absil, P. A., & Sepulchre, R. (2010), "Low-rank optimization on the cone of positive semidefinite matrices", SIAM Journal on Optimization, 20(5), 2327-2351.
[30] Kneib, T., Hothorn, T. and Tutz, G. (2009), "Variable selection and model choice in geoadditive regression models", Biometrics, 65(2), 626-634.
[31] Lai, R. C. S., Huang, H. and Lee, T. C. M. (2012), "Fixed and random effects selection in nonparametric additive mixed models", Electronic Journal of Statistics, 6, 810-842.
[32] Lin, Y. and Zhang, H. (2006), "Component selection and smoothing in multivariate nonparametric regression", The Annals of Statistics, 34, 2272-2297.
[33] Marshall, A. W., & Olkin, I. (1979), "Inequalities: Theory of Majorization and its Applications", Academic Press, New York.
[34] Meier, L., Van De Geer, S. and Buhlmann, P. (2009), "High-dimensional additive modeling", The Annals of Statistics, 37, 3779-3821.
[35] Nardi, Y. and Rinaldo, A. (2011), "Autoregressive process modeling via the Lasso procedure", Journal of Multivariate Analysis, 102, 528-549.
[36] Nishii, R. (1984), "Asymptotic properties of criteria for selection of variables in multiple regression", The Annals of Statistics, 12, 758-765.
[37] Nychka, D., Ellner, S., Haaland, P. D., & O'Connell, M. A. (1996), "FUNFITS: data analysis and statistical tools for estimating functions", North Carolina State University, Raleigh.
[38] Nychka, D., Wikle, C., & Royle, J. A. (2002), "Multiresolution models for nonstationary spatial covariance functions", Statistical Modelling, 2(4), 315-331.
[39] Nychka, D., Bandyopadhyay, S., Hammerling, D., Lindgren, F., & Sain, S. (2015), "A multiresolution Gaussian process model for the analysis of large spatial datasets", Journal of Computational and Graphical Statistics, 24(2), 579-599.
[40] Peng, C., Wu, C. F. J. (2013), "On the choice of nugget in kriging modeling for deterministic computer experiments", Journal of Computational and Graphical Statistics, 23(1), 151-168.
[41] Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009), "Sparse additive models", Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5), 1009-1030.
[42] Reyes, P. E., Zhu, J. and Aukema, B. H. (2012), "Selection of spatial-temporal lattice models: assessing the impact of climate conditions on a mountain pine beetle outbreak", Journal of Agricultural, Biological, and Environmental Statistics, 17, 508-525.
[43] Schott, J. R. (2005), "Matrix analysis for statistics", Wiley.
[44] Simon, B. (1979), "Trace ideals and their applications", Cambridge University Press.
[45] Stein, M. L. (2013), "Statistical properties of covariance tapers", Journal of Computational and Graphical Statistics, 22(4), 866-885.
[46] Stein, M. L. (2014), "Limitations on low rank approximations for covariance matrices of spatial data", Spatial Statistics, 8, 1-19.
[47] Schumaker, L. (2007), "Spline functions: basic theory", Cambridge University Press.
[48] SEER (Census 2000), Surveillance, Epidemiology, and End Results Program, National Cancer Institute; www.seer.cancer.gov.
[49] Stein, M. L. (1999), "Interpolation of Spatial Data", Springer.
[50] Tibshirani, R. (1996), "Regression shrinkage and selection via the lasso", Journal of the Royal Statistical Society: Series B (Methodological), 267-288.
[51] Tzeng, S., Huang, H. C. (2015), "Multivariate spatial covariance estimation via low-rank regularization", Statistica Sinica, 26, 151-172.
[52] Van der Vaart, A. W. and Wellner, J. A. (1996), "Weak Convergence and Empirical Processes: With Applications to Statistics", Springer, New York.
[53] Wang, H. and Zhu, J. (2009), "Variable selection in spatial regression via penalized least squares", The Canadian Journal of Statistics, 37, 607-624.
[54] Wang, H., Li, G. and Tsai, C-L. (2007), "Regression coefficient and autoregressive order shrinkage and selection via the lasso", Journal of the Royal Statistical Society: Series B (Methodological), 69, 63-78.
[55] Wendland, H. (2005), "Scattered data approximation" (Vol. 17), Cambridge University Press.
[56] Xu, G., Xiang, Y., Wang, S. and Lin, Z. (2012), "Estimation and variable selection for infinite variance autoregressive models", Journal of Statistical Planning and Inference, 142, 2545-2553.
[57] Xu, Y., Yin, W. (2013), "A block coordinate descent method for multi-convex optimization with applications to nonnegative tensor factorization and completion", SIAM Journal on Imaging Sciences, 6(3), 1758-1789.
[58] Yuan, M. and Lin, Y. (2006), "Model selection and estimation in regression with grouped variables", Journal of the Royal Statistical Society: Series B (Methodological), 68, 49-67.
[59] Zhang, C. and Huang, J. (2008), "The sparsity and bias of the Lasso selection in high-dimensional linear regression", The Annals of Statistics, 36, 1567-1594.
[60] Zhao, Y. B. (2012), "An approximation theory of matrix rank minimization and its application to quadratic equations", Linear Algebra and its Applications, 437(1), 77-93.
[61] Zhu, J., Huang, H.-C., and Reyes, P. E. (2010), "On the selection of spatial linear models for lattice data", Journal of the Royal Statistical Society: Series B, 72, 389-402.