HIGH DIMENSIONAL LINEAR REGRESSION MODELS UNDER LONG MEMORY DEPENDENCE AND MEASUREMENT ERROR

By Abhishek Kaul

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
Statistics - Doctor of Philosophy

2015

ABSTRACT

HIGH DIMENSIONAL LINEAR REGRESSION MODELS UNDER LONG MEMORY DEPENDENCE AND MEASUREMENT ERROR

By Abhishek Kaul

This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates the problems of interest. A brief literature review is also provided in this chapter.

The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup ($p>n$), where $p$ can be increasing exponentially with $n$. Finally, we show the consistency and $n^{1/2-d}$-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where $p$ is fixed. Here $d$ is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study.

The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models, where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups; in the latter setup, the dimensionality can grow exponentially with the sample size. In the fixed dimensional setting we provide the oracle properties associated with the proposed estimators. In the high dimensional setting, we provide bounds for the statistical error associated with the estimation, that hold with asymptotic probability 1, thereby providing the $\ell_1$-consistency of the proposed estimator. We also establish the model selection consistency in terms of the correctly estimated zero components of the parameter vector. A simulation study that investigates the finite sample accuracy of the proposed estimator is also included in this chapter.

ACKNOWLEDGMENTS

I wish to express my sincere thanks to Professor Hira L. Koul for giving me the opportunity to come here and learn, and for guiding me ever since. I thank him for suggesting problems and inspiring and guiding me through them. I also wish to thank him and his family for making this country feel not too alien.

I would also like to thank Professor Ramamoorthi and Professor Maiti for stimulating discussions and for their help during my graduate study. Also, I thank Professor Ping-Shou Zhong and Professor Hao Wang for serving on my dissertation committee.

I thank my parents for their constant love and support which enabled me to complete this work.

Finally, I would like to thank my friends Akshita Chawla, Chanakya Kaul, Jiwoong Kim and Ranjit Bawa for making this journey joyous.

This research has been supported in part by the NSF-DMS Grant 1205271: PI. Hira L. Koul, for which I am grateful.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO SYMBOLS
Chapter 1  Introduction and Literature Review
  1.1  Lasso and long memory
  1.2  Measurement error and penalized quantile regression

Chapter 2  Lasso with Long Memory Regression Errors
  2.1  Results with Finite Sample
  2.2  Sign Consistency of Lasso under Long Memory
  2.3  Asymptotics when p is fixed
    2.3.1  Asymptotic distribution of $X^T\varepsilon$
    2.3.2  Asymptotic Properties of Lasso
    2.3.3  Adaptive Lasso
  2.4  Simulation Study
  2.5  Proofs for Chapter 2

Chapter 3  Weighted $\ell_1$-Penalized Corrected Quantile Regression for High Dimensional Measurement Error Models
  3.1  Model, Estimator and its Oracle Property
  3.2  Relationship between $\rho^*_L$ and $\rho$
  3.3  Results in High Dimensions
    3.3.1  A Sparsity Property
    3.3.2  Adaptive choice of the weight vector d
  3.4  Simulation Study
    3.4.1  Simulation setup
    3.4.2  Computational Issues
    3.4.3  Simulation setup and results
  3.5  Proofs for Chapter 3

BIBLIOGRAPHY

LIST OF TABLES

Table 2.1  Medians of RPE, REE, NZ & IZ with Gaussian design, long mem. errors
Table 2.2  Medians of RPE, REE, NZ & IZ with Gaussian design, i.i.d. errors
Table 3.1  Simulation results at p = 40 for Normal, Cauchy and Pareto model errors
Table 3.2  Simulation results at p = 300 for Normal, Cauchy model errors
Table 3.3  Simulation results at p = 500 under heteroscedasticity
Table 3.4  $W\ell_1$-CQ at p = 40 for Normal and Cauchy model errors
Table 3.5  $W\ell_1$-CQ at p = 300 for Normal and Cauchy model errors
Table 3.6  $\ell_1$-CQ at p = 300 with misspecified covariate error distribution

LIST OF FIGURES

Figure 2.1  Lag vs sample auto-correlation function with d = 0.15 and d = 0.25
Figure 2.2  Lag vs sample auto-correlation function with d = 0.35 and d = 0.45
Figure 3.1  $\frac1n\sum_{i=1}^n\rho^*_L(\beta)+\lambda_n\|\beta\|_1$ evaluated around $\beta_1$ at h = 0.01 (left) and 1.5 (right)
Figure 3.2  Colored contours of $\|\hat\beta\|_1$ on h vs. $\lambda_n$ for $\ell_1$-CQ
Figure 3.3  Colored contours of $\|\hat\beta\|_1$ on p vs. $\lambda$ (left) and h (right)

KEY TO SYMBOLS

In what follows, for any $z=(z_1,\ldots,z_p)^T\in\mathbb{R}^p$, $\|z\|_1=\sum_{j=1}^p|z_j|$ and $\|z\|_2^2=\sum_{j=1}^pz_j^2$. For any two sequences of positive numbers $\{a_n,b_n\}$, $a_n=O(b_n)$ denotes that for all large $n$, $a_n\le cb_n$, for some universal constant $c>0$ which does not depend on any underlying parameters or the sample size $n$. All limits are taken as $n\to\infty$. For any index set $S\subseteq\{1,\ldots,p\}$ and any vector $\delta\in\mathbb{R}^p$, denote $\delta_S=\{\delta_j;\ j\in S\}$. For any event $A$, denote by $I_A$ the indicator of the event $A$.

Chapter 1
Introduction and Literature Review

This dissertation shall analyse and develop estimation and variable selection techniques for linear regression models under two distinct setups. Chapter 2 considers the setup with the model errors being a long memory moving average process, and Chapter 3 considers the setup where the design variables are not observed directly but a noisy form of these variables is observable, which is commonly referred to as the measurement error or errors-in-variables setup.
We begin by describing the models under consideration. The $i$th component of the response vector $y^T=(y_1,\ldots,y_n)$ is assumed to be related to the $i$th row $x_i^T=(x_{i1},\cdots,x_{ip})$, $1\le i\le n$, of the $n\times p$ design matrix $X$ by the relation

$$y_i=x_i^T\beta+\varepsilon_i,\quad\text{for some }\beta\in\mathbb{R}^p,\ 1\le i\le n.\qquad(1.1)$$

Here, for any vector $a$, $a^T$ denotes its transpose. In Chapter 2, the vector $\varepsilon:=(\varepsilon_1,\ldots,\varepsilon_n)^T$ is assumed to be a long memory moving average process; additional assumptions on this vector shall be made precise in Chapter 2. The design variables $\{x_i,\ 1\le i\le n\}$ shall be assumed to be nonrandom. However, as shall become apparent later, this condition may be easily relaxed to allow for some common random designs such as sub-Gaussian or sub-Exponential with independent rows.

In the above discussion we assume that the design variables $\{x_i,\ 1\le i\le n\}$ are completely observed. A classical problem is that of estimation of the parameter vector $\beta$ when the design variables $x_i$'s are not directly observable. This problem shall serve as the content of Chapter 3. In place of the design variables $x_i$'s we observe the surrogates $w_i$'s obeying the model

$$w_i=x_i+u_i,\quad1\le i\le n.\qquad(1.2)$$

Here, $u_i^T=(u_{i1},\cdots,u_{ip})$ are assumed to be independent of $\{\varepsilon_i\}$ and independent and identically distributed (i.i.d.) $p$-dimensional random vectors. The exact assumptions are made precise in Chapter 3.

The parameter vector of interest throughout this document shall be the $p$-dimensional vector $\beta=(\beta_1,\ldots,\beta_p)$. We shall consider both the fixed $p$ setup as well as the high dimensional $p$ setup. In the latter case, the dimension $p$ shall be allowed to grow exponentially with the sample size $n$.

A critical assumption made on the parameter vector of interest $\beta$ is "sparsity", i.e. it is assumed that a large proportion of the columns of the design matrix do not contribute any linear effect to the response vector $y$. In other words, a large proportion of the components of the parameter vector $\beta$ are zero. The problem of interest is to consistently identify and estimate the non-zero components of $\beta$.

The past two decades have contributed extensively to solutions to this problem, chief among which have been the $\ell_1$-penalized methods, for their desirable finite sample and asymptotic properties and computational efficiency. The estimates for these methods are typically described as

$$\hat\beta(\lambda)=\operatorname{argmin}_{\beta\in\mathbb{R}^p}\big\{l_n(\beta)+\lambda\|d\circ\beta\|_1\big\},\quad\lambda>0,\qquad(1.3)$$

where $l_n(\beta)$ is an appropriately chosen loss function that can be computed using the observed variables. Here $d=(d_1,\ldots,d_p)^T$ is a vector of non-negative weights, and '$\circ$' denotes the Hadamard product, i.e., $\|d\circ\beta\|_1:=\sum_{j=1}^pd_j|\beta_j|$. Throughout this thesis, the design variables $x_i$'s may be triangular arrays depending on $n$, but we do not exhibit this dependence for the sake of the transparency of the exposition. Also, all limits are taken as $n\to\infty$, unless mentioned otherwise.
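For the squared error loss, estimators of the form (1.3) are available in standard software. The following minimal R sketch, which is illustrative and not part of the formal development, passes the weight vector $d$ through the `penalty.factor` argument of the glmnet package (glmnet internally rescales these factors, so $\lambda$ is defined up to that normalization); the dimensions and the value of $\lambda$ are arbitrary choices.

```r
# A minimal sketch of the weighted l1-penalized estimator (1.3) with
# squared error loss l_n(beta) = (1/n)||y - X beta||_2^2; the weight
# vector d enters through glmnet's penalty.factor argument.
library(glmnet)
set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))          # sparse truth
y <- X %*% beta + rnorm(n)
d <- rep(1, p)                               # d_j = 1 recovers the ordinary Lasso
fit <- glmnet(X, y, alpha = 1, penalty.factor = d, intercept = FALSE)
coef(fit, s = 0.1)                           # beta-hat(lambda) at lambda = 0.1
```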
1.1 Lasso and long memory

For the estimator described in (1.3), choosing the loss function $l_n(\beta)=\frac1n\sum_{i=1}^n(y_i-x_i^T\beta)^2$, i.e., the squared loss function, and setting $d_j=1$, $1\le j\le p$, we obtain the celebrated least absolute shrinkage and selection operator (Lasso) proposed by Tibshirani (1996). Its statistical properties are well studied when the regression errors are independent and identically distributed (i.i.d.) random variables (r.v.); see, e.g., Knight and Fu (2000), Meinshausen and Buhlmann (2006), Zhao and Yu (2006), Bickel, Ritov and Tsybakov (2009) and Buhlmann and van de Geer (2011). In particular, Knight and Fu (2000) provide the consistency and $\sqrt n$-consistency of Lasso estimates under the fixed $p$ setting. In the high dimensional setting, Bickel, Ritov and Tsybakov (2009) and Buhlmann and van de Geer (2011) provide error bounds for the statistical error associated with Lasso that hold with probability tending to 1. Furthermore, they provide a detailed discussion of the necessary conditions required to obtain the desired error bounds. Meinshausen and Buhlmann (2006) and Zhao and Yu (2006) made the important contribution of providing model selection consistency results and also detailed the conditions necessary to obtain them.

The above mentioned papers work under the common assumption that the model errors $\{\varepsilon_i,\ 1\le i\le n\}$ are i.i.d. realizations of a sub-Gaussian r.v. As stated in the introduction, the first problem investigated in this dissertation is to analyse the properties of Lasso when the model errors $\{\varepsilon_i,\ 1\le i\le n\}$ form a long memory moving average process. The literature in the area of $\ell_1$-penalized estimation with dependence considerations is scant. The first paper dealing with this issue is that of Alquier and Doukhan (2011). They provide finite sample error bounds under weak dependence structures on the model errors $\{\varepsilon_i,\ 1\le i\le n\}$. Another recent paper addressing dependence concerns is that of Yoon, Park and Lee (2013). Their paper provides asymptotic results in the $n>p$ setup, in linear regression models with stationary auto-regressive errors. In that paper the error process is assumed to be an AR(q) process, which is known to be a short memory process; see Giraitis, Koul, and Surgailis (2012) (GKS). This thesis investigates the behaviour of Lasso under a stronger dependence structure and less restrictive model assumptions in comparison to the above mentioned papers.

The adaptive Lasso proposed by Zou (2006) differs from Lasso in the way parameters are penalized: the weights $d_j$, $1\le j\le p$, are chosen carefully. To be more precise, for any $\gamma>0$, define the weight vector $d=1/|\hat\beta|^\gamma$, with $\hat\beta$ being any initial estimate of $\beta$ such that $n^{1/2-d}(\hat\beta-\beta)=O_p(1)$ componentwise. To avoid confusion, the reader should recall that $d$ denotes the memory parameter of the stationary error sequence whereas $d$ in the penalty term denotes the vector of weights. The adaptive Lasso estimates $\tilde\beta$ are given by

$$\tilde\beta=\operatorname{argmin}_{\beta\in\mathbb{R}^p}\Big\{\frac1n\|y-X\beta\|_2^2+\lambda_n\sum_{j=1}^pd_j|\beta_j|\Big\}.\qquad(1.4)$$

Let $A=\{j:\beta_j\ne0\}$, $A_n^*=\{j:\tilde\beta_j\ne0,\ 1\le j\le p\}$, and let $\beta_A$, $\tilde\beta_A$ be the corresponding vectors with only those components whose indices are in the set $A$.

As stated in Zou (2006), for the i.i.d. model error setup when $p$ is fixed, an estimator is said to have the oracle property if the following hold.

1. Asymptotically, the right model is identified, i.e. $\lim_{n\to\infty}P(A_n^*=A)=1$.
2. The estimator has an optimal estimation rate: $n^{1/2}(\tilde\beta_A-\beta_A)\to_DN(0,\Sigma^*)$, for some covariance matrix $\Sigma^*$.

Here $\to_D$ denotes convergence in distribution. The adaptive Lasso has an advantage over Lasso, since it possesses the above oracle property under mild assumptions. On the other hand, as proved by Zhao and Yu (2006), for Lasso to be sign consistent a necessary condition is the "strong irrepresentable condition", which is a much stronger assumption.
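A common recipe for computing (1.4) when $n>p$ is sketched below, assuming a least squares initial estimate and $\gamma=1$ in the weights; this is an illustration of the two-step construction rather than the exact procedure analysed later (Chapter 2 returns to the adaptive Lasso under long memory errors).

```r
# A minimal sketch of the adaptive Lasso (1.4): initial least squares
# estimate, weights d = 1/|beta-init| (gamma = 1 assumed), then a
# weighted Lasso fit with lambda chosen by cross-validation.
library(glmnet)
set.seed(2)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- c(3, -2, rep(0, p - 2))
y <- X %*% beta + rnorm(n)
b_init <- coef(lm(y ~ X - 1))                # root-n consistent initial estimate
w <- 1 / abs(b_init)                         # adaptive weights d_j
cvfit <- cv.glmnet(X, y, penalty.factor = w, intercept = FALSE)
coef(cvfit, s = "lambda.min")                # small true coefficients are kept,
                                             # noise coefficients heavily penalized
```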
In Chapter 2 we provide bounds on the statistical error associated with Lasso under long memory dependent model errors, that hold with probability tending to 1. Secondly, we obtain the sign consistency of Lasso under this setup with standard restrictions on the design matrix $X$. Lastly, we provide the consistency and $n^{1/2-d}$-consistency of the Lasso in the case where $p$ is fixed and is less than $n$. This proof is also extended to derive the oracle property for adaptive Lasso. For these results, the price that we pay to tackle the persistent correlation among the error sequence is that the rate of increase of the dimension $p$ in the high dimensional setting, and the rate of convergence in the fixed $p$ setting, are slowed down by a factor of $n^d$. Here $d$ is the memory parameter of the stationary error sequence.

1.2 Measurement error and penalized quantile regression

Chapter 3 of this dissertation proposes and analyses a quantile based estimation and variable selection technique for the measurement error model described in (1.1) and (1.2). In the classical fixed $p$ setting, it is well known that disregarding measurement error in covariates induces an attenuation bias in the estimates of the parameter vector $\beta$; i.e., the estimates are biased towards zero with a nondiminishing bias whose magnitude depends on the variance of the covariate noise. This problem for measurement error models when $p$ is fixed has been extensively worked on by several authors, including Fuller (1987) and Carroll, Ruppert, Stefanski and Crainiceanu (2006). These authors also provide numerous applications of such models, as well as bias corrected estimators for measurement error models in the context of mean regression, where the objective is to obtain estimates of $\beta$ corresponding to the conditional mean of the response variables, given the covariates.

The problem of estimation and variable selection in high dimensional measurement error models is of recent interest. Authors including Rosenbaum and Tsybakov (2010, 2011) and Loh and Wainwright (2012) study these models and propose penalized estimators which provide consistent estimation and variable selection in the presence of measurement error. In particular, Loh and Wainwright (2012) propose the $\ell_1$-penalised bias corrected least squares approach. They make the important contribution of providing error bounds for the associated statistical error in estimation that hold with probability tending to 1, and they do so with a non-convex loss function. However, both papers assume an underlying sub-Gaussian homoscedastic distribution of the model errors and work under the premise of mean regression.

Another important estimation technique in linear regression models is that of quantile regression, where one is interested in estimates of $\beta$ corresponding to a specified conditional quantile of the response $y$, given the covariates, as opposed to mean regression, where the problem of interest is to obtain estimates corresponding to the conditional mean of the response variables. Quantile regression is robust against outliers and is useful in the presence of heteroscedasticity; see, e.g., Buchinsky (1994). This estimation technique forgoes the need for sub-Gaussian distributional assumptions on the model errors $\{\varepsilon_i,\ 1\le i\le n\}$ and replaces them with much milder smoothness conditions on the density functions of these errors.

When covariates are completely observed, i.e. without measurement error, the problem of quantile regression with non sub-Gaussianity and heteroscedasticity in sparse high dimensional models has recently been studied by Fan, Fan and Barut (2014), Belloni and Chernozhukov (2011), and Wang, Wu and Li (2012). The convexity of the quantile loss function is crucial for the analysis of their inference procedures. More precisely, they propose and provide theoretical guarantees for the $\ell_1$-penalised estimator (1.3) with $l_n(\beta)=\frac1n\sum_{i=1}^n\rho(y_i,x_i,\beta)$, where $\rho(y_i,x_i,\beta)=\rho_\tau(y_i-x_i^T\beta)$ and $\rho_\tau(v)=v\{\tau-I(v\le0)\}$ is the quantile loss function. Here $\tau\in(0,1)$ is the quantile level of interest, i.e., $P(\varepsilon_i<0)=\tau$, $1\le i\le n$.
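For concreteness, the check function $\rho_\tau$ is one line of R; a quick sketch evaluating it at the median ($\tau=0.5$), where it equals half the absolute-error loss:

```r
# The quantile check loss rho_tau(v) = v * (tau - 1{v <= 0})
rho_tau <- function(v, tau) v * (tau - (v <= 0))
rho_tau(c(-1, 2), tau = 0.5)   # returns 0.5 and 1.0, i.e. |v|/2 at the median
```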
The problem of correcting for the bias induced by measurement error in quantile regression is a challenging one. In the case of fixed $p$, this problem has been addressed by Wang, Stefanski and Zhu (2012) (WSZ). They provide a corrected quantile loss function and show that the estimates obtained by minimizing this loss function are consistent and $\sqrt n$-consistent. However, since their loss function is un-penalized, it is unable to perform variable selection.

Chapter 3 of this dissertation proposes and analyzes the penalized version of the estimator proposed by WSZ. In the fixed $p$ setup we provide the oracle property of the proposed estimator. In the high dimensional setting we provide bounds on the statistical error of the proposed estimator that hold with probability tending to 1. In this setting, we also establish model selection consistency of this estimator in terms of identifying the correct zero components of the parameter vector.

Chapter 2
Lasso with Long Memory Regression Errors

In many problems of practical interest, regression models with long memory errors arise naturally, for instance in econometrics and finance; see, e.g., Beran (1994), Baillie (1996), the more recent monographs of Giraitis, Koul, and Surgailis (2012) (GKS) and Beran, Feng, Ghosh, Kulik (2013), and the numerous references therein. It is thus of interest to investigate the behavior of Lasso in regression models with long memory errors.

Recall the model (1.1), where $x_i=(x_{i1},\cdots,x_{ip})^T$, $i=1,\cdots,n$, are vectors of design variables and the $y_i$'s denote the responses. The errors $\varepsilon_i$ are assumed to be a long memory moving average with i.i.d. innovations, i.e.,

$$\varepsilon_i=\sum_{k=1}^{\infty}a_k\xi_{i-k}=\sum_{k=-\infty}^{i}a_{i-k}\xi_k,\qquad(2.1)$$

where $a_k=c_0k^{-1+d}$, $k\ge1$, $0<d<1/2$, $c_0>0$, and $a_k=0$ for $k\le0$. Also, $\xi_j$, $j\in\mathbb{Z}:=\{0,\pm1,\pm2,\cdots\}$, are i.i.d. r.v.'s with mean zero and variance $\sigma^2$. For notational convenience, we shall assume $c_0=1$ and $\sigma^2=1$, without loss of generality. Note that $\{\varepsilon_i,\ i\in\mathbb{Z}\}$ is a stationary process with autocovariance function

$$\gamma_\varepsilon(k)=\sum_{j=1}^{\infty}a_ja_{j+k}=k^{-1+2d}B(d,1-2d)(1+o(1)),\quad k\to\infty,\ 0<d<1/2,\qquad(2.2)$$

where $B(a,b):=\int_0^1u^{a-1}(1-u)^{b-1}du$, $a>0$, $b>0$, is the Beta function; see, e.g., Proposition 3.2.1(ii) in GKS.
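The process (2.1) is straightforward to simulate once the infinite sum is truncated; in the sketch below the truncation level M is an assumption of the illustration, and the resulting sample autocorrelation decays hyperbolically as described by (2.2).

```r
# A sketch of the error process (2.1): eps_i = sum_{k>=1} a_k xi_{i-k}
# with a_k = k^{-1+d} (c_0 = 1), the MA(Inf) sum truncated at level M.
set.seed(3)
n <- 1000; d <- 0.25; M <- 5000
a <- (1:M)^(d - 1)                        # a_k = k^{-1+d}, k = 1, ..., M
xi <- rnorm(n + M)                        # i.i.d. innovations, sigma = 1
# leading 0 in the filter starts the moving-average sum at lag 1
eps <- as.numeric(stats::filter(xi, c(0, a), sides = 1))[(M + 1):(M + n)]
acf(eps, lag.max = 50)                    # slow hyperbolic decay, cf. (2.2)
```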
This auto-correlation structure induces a long memory structure on the model errors $\varepsilon$, i.e., $\sum_{k=1}^\infty|\gamma_\varepsilon(k)|=\infty$. As mentioned in Chapter 1, the Lasso estimate of $\beta$ is defined as

$$\hat\beta(\lambda)=\operatorname{argmin}\Big\{\frac1n\|y-X\beta\|_2^2+\lambda\|\beta\|_1\Big\},\quad\lambda>0.\qquad(2.3)$$

This chapter is devoted to understanding the properties of Lasso under our long memory setup. The three main contributions of this chapter are as follows. First, we show that the probability bound for a preset $\lambda$ controlling the stochastic term $\max_{1\le j\le p}|X_j^T\varepsilon|$ can be obtained with a long memory moving average probability structure on $\{\varepsilon_i,\ 1\le i\le n\}$, under appropriate restrictions on the rate of increase of the design variables and with the proper choice of the regularizer $\lambda_n$. Here $X_j$ denotes the $j$th column of the design matrix $X$. This result can be used to obtain the desired error bound for the statistical error associated with Lasso in this setup. Secondly, we obtain sign consistency of Lasso under the long memory setup with standard restrictions on the design matrix $X$. These results are obtained in the high dimensional setup. Lastly, we provide consistency and $n^{1/2-d}$-consistency of Lasso in the case where $p$ is fixed, under certain assumptions on the design variables $X$. This proof is also extended to derive the oracle property for a modified version of Lasso known as the adaptive Lasso. The price that we pay to tackle the persistent correlation among the error sequence is that the rate of increase of the dimension $p$ in the high dimensional setting, and the rate of convergence in the $n>p$ setting, are slowed down by a factor of $n^d$.

All results in the high dimensional setting allow the design variables to grow with the restriction $\sum_{1\le i\le n}x_{ij}^2=O(n)$, and hence the results obtained can also easily be extended to the case of Gaussian random designs. Furthermore, all results proved in the high dimensional setup in this chapter allow $p$ to grow exponentially with $n$.

This chapter is organized as follows. Section 2.1 below investigates the finite sample properties of Lasso. Section 2.2 investigates the sign consistency of Lasso. Section 2.3 provides the asymptotic properties of Lasso and also the oracle property of adaptive Lasso in the fixed $p$ setup. Section 2.4 presents a simulation study to analyse the finite sample performance of Lasso in the current setup.

2.1 Results with Finite Sample

In this section we prove a finite sample oracle inequality for the Lasso solution when the design is nonrandom. This in turn will imply consistency as well. Based on the assumptions on the design variables it will soon be clear that the results can be easily extended to Gaussian random designs. Accordingly, in this subsection we assume the $x_i$'s are nonrandom. To proceed further, we shall need the following notation. Let

$$W_{nj}=n^{-(1/2+d)}\sum_{i=1}^nx_{ij}\varepsilon_i=n^{-(1/2+d)}\sum_{i=1}^n\sum_{k=-\infty}^{i}x_{ij}a_{i-k}\xi_k=\sum_{k=-\infty}^{n}c_{nk,j}\xi_k,\qquad(2.4)$$

where

$$c_{nk,j}:=n^{-(1/2+d)}\sum_{i=1}^nx_{ij}a_{i-k},\quad k\in\mathbb{Z},\ j=1,\cdots,p,\qquad(2.5)$$

$$c_{n,j}:=\sup_{-\infty<k\le n}|c_{nk,j}|,\quad c_n:=\max_{1\le j\le p}c_{n,j},\quad\sigma^2_{n,j}:=\mathrm{Var}(W_{nj}),\quad\sigma_n^2:=\max_{1\le j\le p}\sigma^2_{n,j}.\qquad(2.6)$$

Define the event

$$\mathcal{T}:=\Big\{\max_{1\le j\le p}2n^{-1}\Big|\sum_{i=1}^nx_{ij}\varepsilon_i\Big|\le\lambda_0^n\Big\},\qquad(2.7)$$

with $\lambda_0^n$ as in (2.9) below. The innovations are assumed to satisfy the Cramer condition: there exists a constant $D<\infty$ such that

$$E|\xi_0|^k\le D^{k-2}k!\,E\xi_0^2,\quad k=2,3,\ldots.\qquad(2.8)$$

Proposition 2.1.1 Assume the model (1.1), (2.1) holds with the innovation distribution satisfying (2.8). For some $t>0$, define

$$\lambda_0^n=\frac{B_n(t^2+4\log p)+\sqrt{B_n^2(t^2+4\log p)^2+16\sigma_n^2(t^2+4\log p)}}{2n^{1/2-d}},\qquad(2.9)$$

where $B_n:=c_nD$. Then, for all $1\le j\le p$ and for all $n\ge1$,

$$P\Big(2n^{-1}\Big|\sum_{i=1}^nx_{ij}\varepsilon_i\Big|>\lambda_0^n\Big)\le2\exp\{-(t^2+4\log p)/4\}.\qquad(2.10)$$

Consequently,

$$P(\mathcal{T})\ge1-2\exp(-t^2/4),\quad n\ge1.\qquad(2.11)$$

The proof of the above proposition will require several lemmas, hence is postponed to Section 2.5 of this chapter. The key to the proof is an application of the Bernstein inequality to truncated partial sums and then passing to the limit.
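Every quantity in (2.9) is computable from the design once the MA($\infty$) sums are truncated, so the proposition can be evaluated numerically. The sketch below is illustrative only: the truncation level M, the Cramer constant D, and t are assumed values, and $\gamma_\varepsilon$ in (2.2) is approximated by a finite sum.

```r
# A numerical sketch of lambda_0^n in (2.9); M truncates the MA(Inf)
# sums, D (Cramer constant) and t0 are assumed illustrative values.
set.seed(7)
n <- 200; p <- 10; d <- 0.25; D <- 2; t0 <- 2; M <- 2000
a <- (1:M)^(d - 1)                                     # a_k = k^{-1+d}
X <- matrix(rnorm(n * p), n, p)
# gamma_eps(k) ~ sum_j a_j a_{j+k}, truncated at M, cf. (2.2)
gam <- sapply(0:(n - 1), function(k) sum(a[1:(M - k)] * a[(1 + k):M]))
G <- matrix(gam[abs(outer(1:n, 1:n, "-")) + 1], n, n)  # Toeplitz covariance
sig2n <- max(diag(t(X) %*% G %*% X)) / n^(1 + 2 * d)   # sigma_n^2 of (2.6)
cn <- max(sapply(1:p, function(j) {                    # c_n of (2.6),
  max(abs(sapply((n - M):(n - 1), function(k) {        # sup over k of |c_{nk,j}|
    i <- max(1, k + 1):n
    sum(X[i, j] * a[i - k])
  })))
})) / n^(1 / 2 + d)
Bn <- cn * D
A <- t0^2 + 4 * log(p)
(lam0 <- (Bn * A + sqrt(Bn^2 * A^2 + 16 * sig2n * A)) / (2 * n^(1 / 2 - d)))
```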
We can now proceed to describe the oracle inequality for the Lasso solution. The corresponding results with i.i.d. errors are proved in Buhlmann and van de Geer (2011, chapter 6). In what follows, $S_0$ denotes the collection of indices of the nonzero elements of the true $\beta$ as in (1.1) and $s_0$ denotes the cardinality of $S_0$. Also, for any $\delta\in\mathbb{R}^p$, $\delta_{S_0}$ denotes the vector of those components of $\delta$ which have their indices in $S_0$. In order to obtain the following inequality we require the 'compatibility condition' on the design matrix $X$. This condition is as given in Buhlmann and van de Geer (2011), and is restated here for the convenience of the reader.

Definition 2.1.1 We say the Compatibility condition is met for the set $S_0$ if, for some $\phi_0>0$ and for all $\delta$ satisfying $\|\delta_{S_0^c}\|_1\le3\|\delta_{S_0}\|_1$,

$$\|\delta_{S_0}\|_1^2\le(\delta^T\hat\Sigma\delta)\,\frac{s_0}{\phi_0^2},\quad\text{with }\hat\Sigma=X^TX/n.$$

Theorem 2.1.1 Assume that the compatibility condition holds for $S_0$. For some $t>0$, let the regularization parameter be $\lambda_n\ge2\lambda_0^n$, where $\lambda_0^n$ is given in (2.9). Then with probability at least $1-2\exp(-t^2/4)$, we have

$$\|X(\hat\beta-\beta)\|_2^2/n+\lambda_n\|\hat\beta-\beta\|_1\le4\lambda_n^2s_0/\phi_0^2.\qquad(2.12)$$

The proof of Theorem 2.1.1 is the same as in Buhlmann and van de Geer (2011, chapter 6), with the value of $\lambda_0^n$ changed to the one given in (2.9). This result holds on the set $\mathcal{T}$, which has the required high probability by Proposition 2.1.1.

The only assumptions we have made so far are (i) Cramer's condition (2.8) on the innovation distribution and (ii) the Compatibility condition of Definition 2.1.1 on the design variables. It may be of interest to mention that Gaussianity of the error distribution has not been assumed. The price that we have paid for this generality is that $\lambda_0^n$ as defined in (2.9) is now itself data driven, i.e. $\lambda_0^n$ also depends on the design variables $x_i$'s. Thus, keeping in view Theorem 2.1.1, it is of interest to analyse the rate of convergence of $\lambda_0^n$. The following lemma and remark give additional conditions on the design variables, and on the rate of increase of the dimension $p$, under which $\lambda_0^n$ converges to 0.

Lemma 2.1.1 Let $X=(x_{ij})_{n\times p}$ be the design matrix and suppose the following condition holds:

$$\forall\,1\le j\le p,\quad n^{-1}\sum_{i=1}^nx_{ij}^2\le C,\quad\text{for some }C<\infty.\qquad(2.13)$$

Then, with $c_n$ and $\sigma_n^2$ as defined in (2.5), (2.6) respectively, we have $c_n=o(1)$ and $\sigma_n^2=O(1)$.

Since $B_n=c_nD$, with $D$ being a constant, the above lemma implies $B_n\to0$.

Remark 2.1.1 Now recall the definition of $\lambda_0^n$ from (2.9). Assume the design variables satisfy condition (2.13) and, further, that $\log p=o(n^{1/2-d})$. Then $\lambda_0^n\to0$.

The following proposition will yield the consistency of the Lasso solution.

Proposition 2.1.2 For some $t>0$, let $\lambda_n\ge2\lambda_0^n$, where $\lambda_0^n$ is defined in (2.9). Then on the set $\mathcal{T}$, with probability at least $1-2\exp(-t^2/4)$,

$$2\|X(\hat\beta-\beta)\|_2^2/n\le3\lambda_n\|\beta\|_1.\qquad(2.14)$$

As mentioned earlier, the proof of this result follows deterministic arguments on the set $\mathcal{T}$ (see Buhlmann and van de Geer (2011, chapter 6)). The probability of the set is given in Proposition 2.1.1.

Remark 2.1.2 (Consistency of Lasso) Assume the following hold:

$$(\mathrm i)\ \log p/n^{1/2-d}\to0;\quad(\mathrm{ii})\ \text{assumption (2.13) holds};\quad(\mathrm{iii})\ \|\beta\|_1=o(n^{1/2-d}/\log p).\qquad(2.15)$$

Then by Remark 2.1.1 we have $\lambda_0^n\to0$. Also, assumption (iii) ensures the right hand side of the inequality (2.14) converges to zero. Hence, Proposition 2.1.2 along with Lemma 2.1.1 yields the consistency of the Lasso solution.
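To see the orders involved, suppose (2.13) holds, so that $B_n\to0$ by Lemma 2.1.1 and the linear term in (2.9) is negligible. A direct simplification, included here only for orientation, then gives

```latex
\lambda_0^n \;\approx\; \frac{\sqrt{16\,\sigma_n^2\,(t^2+4\log p)}}{2\,n^{1/2-d}}
           \;=\; \frac{2\,\sigma_n\sqrt{t^2+4\log p}}{n^{1/2-d}}
           \;\asymp\; \frac{\sqrt{\log p}}{n^{1/2-d}},
\qquad\text{so that (2.12) yields}\qquad
\frac{\|X(\hat\beta-\beta)\|_2^2}{n}
   \;\lesssim\; \frac{\lambda_n^2\,s_0}{\phi_0^2}
   \;\asymp\; \frac{s_0\log p}{n^{1-2d}\,\phi_0^2}.
```

At $d=0$ this recovers the familiar i.i.d. prediction rate $s_0\log p/n$, while for $d>0$ the effective sample size is deflated to $n^{1-2d}$, matching the slowdown by the factor $n^d$ noted in Chapter 1.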
Remark 2.1.3 (Random Design) Two assumptions are made on the design variables in order to obtain the error bound in Theorem 2.1.1 and the convergence of $\lambda_0^n$ to zero in Remark 2.1.1: (i) the compatibility condition given in Definition 2.1.1 and (ii) condition (2.13), which restricts the rate of increase of the design variables. These conditions can be shown to hold in the case of Gaussian random designs with independent rows. Using Theorem 1 of Raskutti (2010), condition (i) can be shown to hold with high probability (increasing to 1 exponentially). If the maximum variance component of the design variables is bounded above by a constant, then (ii) can be shown to hold with high probability using bounds for chi-square distributions given in Johnstone (2001). Hence the above results remain valid with high probability when the design variables are Gaussian with independent rows and independent of the model errors $\varepsilon$.

2.2 Sign Consistency of Lasso under Long Memory

In this section we prove the sign consistency of Lasso for the model (1.1), (2.1). The results in this section are similar in spirit to Zhao and Yu (2006), and we shall follow the structure of their proofs. They worked in the i.i.d. setup whereas we will be working in the long memory setup. We begin with a definition and some notation.

Definition 2.2.1 Lasso is said to be strongly sign consistent if there exists $\lambda_n=f(n)$, that is, a function of $n$ independent of $y_n$ or $X_n$, such that

$$\lim_{n\to\infty}P\big(\hat\beta_n(\lambda_n)=_s\beta_n\big)=1.$$

Here the equality denotes equality in sign, i.e., $\hat\beta_n=_s\beta_n$ if and only if $\mathrm{sign}(\hat\beta_n)=\mathrm{sign}(\beta_n)$, where $\mathrm{sign}(\beta_j)$ assigns a value $+1$ to a positive entry, $-1$ to a negative entry and $0$ to a zero entry.

Assume $\beta_n=(\beta_{n1},\ldots,\beta_{nq},\beta_{n(q+1)},\ldots,\beta_{np})^T$, where $\beta_{nj}\ne0$ for $j=1,\ldots,q$ and $\beta_{nj}=0$ for $j=q+1,\ldots,p$. Let $\beta_n(1)=(\beta_{n1},\ldots,\beta_{nq})^T$ and $\beta_n(2)=(\beta_{n(q+1)},\ldots,\beta_{np})^T$. Denote by $X(1)$ the first $q$ columns of $X$, corresponding to the nonzero components of $\beta_n$, and by $X(2)$ the last $p-q$ columns of $X$, corresponding to the zero components of $\beta_n$. Let $C_n=n^{-1}X^TX$. Then, setting $C_n^{11}=n^{-1}X(1)^TX(1)$, $C_n^{22}=n^{-1}X(2)^TX(2)$ and $C_n^{12}=n^{-1}X(1)^TX(2)=(C_n^{21})^T$, $C_n$ can be expressed as

$$C_n=\begin{pmatrix}C_n^{11}&C_n^{12}\\C_n^{21}&C_n^{22}\end{pmatrix}.$$

In what follows, we do not exhibit the dependence of $\beta$, $\hat\beta$ on $n$, for transparency of the exposition. Assuming $C_n^{11}$ is invertible, the Strong Irrepresentable Condition as defined by Zhao and Yu is:

Strong Irrepresentable Condition: There exists a vector $\eta$ with constant, positive components such that

$$\big|C_n^{21}(C_n^{11})^{-1}\mathrm{sign}(\beta(1))\big|\le\mathbf{1}-\eta,\qquad(2.16)$$

where $\mathbf{1}$ is a $(p-q)\times1$ vector of ones and the inequality holds element-wise.

The following proposition will serve as a tool to derive the sign consistency in the present setup.

Proposition 2.2.1 Assume the strong irrepresentable condition holds with a vector $\eta$ with all components positive. Then $P(\hat\beta(\lambda_n)=_s\beta)\ge P(A_n\cap B_n)$, where, with $W:=n^{-(1/2+d)}X^T\varepsilon$ and $W(1)$, $W(2)$ denoting its first $q$ and last $p-q$ entries,

$$A_n=\Big\{\big|(C_n^{11})^{-1}W(1)\big|<n^{1/2-d}\Big(|\beta(1)|-\frac{\lambda_n}{2}\big|(C_n^{11})^{-1}\mathrm{sign}(\beta(1))\big|\Big)\Big\},$$
$$B_n=\Big\{\big|C_n^{21}(C_n^{11})^{-1}W(1)-W(2)\big|\le\frac{\lambda_n}{2}n^{1/2-d}\eta\Big\},$$

the inequalities holding component-wise.

To state the sign consistency result we shall need the following conditions: there exist constants $0\le c_1<c_2\le1/2-d$ and $M_1,M_2,M_3>0$, so that

$$\frac1nx_i^Tx_i\le M_1,\quad\forall\,i\in\{1,\ldots,n\},\qquad(2.20)$$
$$\alpha^TC_n^{11}\alpha\ge M_2,\quad\forall\,\|\alpha\|_2^2=1,\qquad(2.21)$$
$$q_n=O(n^{c_1}),\qquad(2.22)$$
$$n^{\frac12-d-\frac{c_2}{2}}\min_{1\le i\le q}|\beta_i|\ge M_3.\qquad(2.23)$$

Under the above assumptions we obtain the following sign consistency result for Lasso in the long memory case.

Theorem 2.2.1 Suppose the long memory regression model (1.1) and (2.1) holds, with the innovation distribution satisfying the Cramer condition (2.8), and that conditions (2.16), (2.20), (2.21), (2.22), (2.23) hold. If, for some $0\le c_3<c_2-c_1$, the dimension satisfies $p=O(\exp(n^{c_3}))$ and the regularizer is chosen so that $\lambda_nn^{1/2-d-c_3}\to\infty$ while $\lambda_nn^{1/2-d}=o(n^{(c_2-c_1)/2})$, then Lasso is strongly sign consistent, i.e., $P(\hat\beta(\lambda_n)=_s\beta)\to1$.

2.3 Asymptotics when p is fixed

2.3.1 Asymptotic distribution of $X^T\varepsilon$

When $n>p$ and $p$ is fixed, the asymptotic properties of Lasso rely critically on the asymptotic distribution of suitably normalized $X^T\varepsilon$. This distribution is straightforward to obtain in the case of i.i.d. errors. Here we present the asymptotic distribution of normalized $X^T\varepsilon$; it has essentially been obtained in chapter 4 of GKS, where the authors give CLTs for weighted sums of a long memory moving average process.
Define $T_n:=n^{1/2+d}W_n$, the non-normalized weighted sum corresponding to $W_n$ in (2.4). We use $T_n$ instead of $W_n$ to relate the following more closely to GKS. Note that $T_n=X^T\varepsilon$. Our goal is to establish the asymptotic distribution of suitably normalized $T_n$. This in turn is facilitated by Theorem 4.3.2 of GKS, pp. 70. We state a slightly modified version of this theorem, which can be proved easily by following the same arguments. In the following, denote $\Sigma_n=(\mathrm{Cov}(T_{nj},T_{nk}))_{j,k=1}^p$.

Theorem 2.3.1 Let $\{x_{ij}\}_{i=1}^n$, $j=1,\ldots,p$, be $p$ arrays of real weights and $\{\varepsilon_i\}$ be the stationary linear process as defined in (2.1). Assume the weights $\{x_{ij}\}_{i=1}^n$ satisfy the following condition:

$$\forall\,j=1,\ldots,p,\quad(\mathrm i)\ \max_{1\le i\le n}|x_{ij}|=o(n^{1/2+d})\quad\text{and}\quad(\mathrm{ii})\ \sum_{i=1}^nx_{ij}^2\le C_jn^{1+2d},\qquad(2.25)$$

and, for some matrix $\Sigma$,

$$n^{-(1+2d)}\Sigma_n\to\Sigma.\qquad(2.26)$$

Then

$$n^{-(1/2+d)}X^T\varepsilon=n^{-(1/2+d)}(T_{n1},\ldots,T_{np})^T\to_DN(0,\Sigma).$$

Corollary 2.3.1 Suppose the weights $\{x_{ij}\}$ satisfy (2.13). Then $n^{-(1+2d)}\Sigma_n=O(1)$ componentwise, for any $0<d<1/2$.

Remark 2.3.1 identifies the limit $\Sigma$ for designs of the form $x_{ij}=g_j(i/n)$, with continuous functions $g_j$ on $[0,1]$; its proof is given in Section 2.5.

2.3.2 Asymptotic Properties of Lasso

Assume now, in addition, that

$$C_n:=n^{-1}X^TX\to C,\quad C\text{ positive definite}.\qquad(2.28)$$

Arguments parallel to those of Knight and Fu (2000), combined with Theorem 2.3.1, then yield the consistency of the Lasso solution when $\lambda_n\to\lambda_0\ge0$ (Theorem 2.3.2) and, when $n^{1/2-d}\lambda_n\to\lambda_0\ge0$, the $n^{1/2-d}$-consistency of Lasso together with the limiting distribution of $n^{1/2-d}(\hat\beta_n-\beta)$ as the minimizer of a limiting process $V(u)$ (Theorem 2.3.3); the proofs are given in Section 2.5.

2.3.3 Adaptive Lasso

For any $\gamma>0$, define the weight vector $\hat w=1/|\hat\beta_n|^\gamma$, with $\hat\beta_n$ being any estimate of $\beta$ such that $n^{1/2-d}(\hat\beta_n-\beta)=O_p(1)$ componentwise. The adaptive Lasso estimates $\tilde\beta_n$ are given by

$$\tilde\beta_n=\operatorname{argmin}\Big\{\frac1n\|y-X\beta\|_2^2+\lambda_n\sum_{j=1}^p\hat w_j|\beta_j|\Big\}.\qquad(2.29)$$

Let $A=\{j:\beta_j\ne0\}$, $A_n^*=\{j:\tilde\beta_{nj}\ne0,\ 1\le j\le p\}$, and let $\beta_A$, $\tilde\beta_{nA}$ be the corresponding vectors with only those components whose indices are in the set $A$.

As stated in Zou (2006), an estimator is said to have the oracle property if the following hold:

1. Asymptotically, the right model is identified, i.e. $\lim_{n\to\infty}P(A_n^*=A)=1$.
2. The estimator has an optimal estimation rate: $n^{1/2-d}(\tilde\beta_{nA}-\beta_A)\to_DN(0,\Sigma^*)$, for some covariance matrix $\Sigma^*$.

The adaptive Lasso has an advantage over Lasso, since it possesses a desirable variable selection property under mild assumptions. On the other hand, as seen in Section 2.2, for Lasso to be sign consistent we require the strong irrepresentable condition, which is a much stronger assumption. The following theorem shows this property of the adaptive Lasso; in other words, the adaptive Lasso enjoys the oracle property in the long memory case. Let $\Sigma_A$ be the limiting covariance matrix in (2.26) with only those components whose indices are in the set $A\times A$.

Theorem 2.3.4 For the linear model (1.1), assume the design variables satisfy (2.25), (2.26) and (2.28). Let the regularizer $\lambda_n$ be such that $n^{1/2-d}\lambda_n\to0$ and $n^{(1/2-d)(1+\gamma)}\lambda_n\to\infty$. Then the adaptive Lasso must satisfy the following:

1. Variable selection consistency: $\lim_{n\to\infty}P(A_n^*=A)=1$.
2. Asymptotic normality: $n^{1/2-d}(\tilde\beta_{nA}-\beta_A)\to_D(C_{11})^{-1}N(0,\Sigma_A)$.

Remark 2.3.2 For the adaptive weights $\hat w=1/|\hat\beta|^\gamma$ we can choose $\hat\beta$ as the ordinary least squares estimate. It has already been shown in GKS that $n^{1/2-d}(\hat\beta_n-\beta)=O_p(1)$, which is the required condition that the weights must satisfy.

2.4 Simulation Study

In this section we numerically analyse the performance of Lasso under the long range dependent setup. We also compare its performance to that in the i.i.d. setup. All simulations were done in R; the estimation of Lasso was done using the package 'glmnet' developed by Friedman, Hastie, Tibshirani (2013). The regularizer $\lambda_n$ was chosen by five fold cross-validation.
Simulation setup: In this study, $\beta$ was chosen as a $1000\times1$ vector, with the first twenty five components chosen independently from a uniform distribution over the interval $(-2,5)$; all other components of $\beta$ were set to zero. The covariates $x_i$ are i.i.d. observations from a 1000 dimensional Gaussian distribution with each component having mean and variance one. We set the pairwise correlation to be $\mathrm{cor}(x_{ij},x_{ik})=0.5^{|j-k|}$. This design matrix has been used by Tibshirani (1996) and many authors since then. The model error vector $\varepsilon$ is generated using the definition (2.1) with $c_0=1$ and $d=0.15,\,0.25,\,0.35,\,0.45$, with the innovations being i.i.d. Gaussian r.v.'s as given in (2.1) with mean zero and standard deviation $\sigma=3.5$. The simulations were repeated 100 times, i.e. 100 data sets were generated under the above setup with the same parameter vector $\beta$.

Since we have chosen d, the corresponding variance of each component of the stationary error process can be computed as $\mathrm{Var}(\varepsilon_i)=\sigma^2\sum_{k=1}^\infty k^{-2+2d}$ for all $i$, which turns out to be 25.16, 31.98, 47.64 and 100.94 corresponding to d = 0.15, 0.25, 0.35, 0.45 respectively.

We begin by illustrating the significant correlation among the components of the regression error vector $\varepsilon$. Figures 2.1 and 2.2 present the sample auto-correlation functions of the error vector $\varepsilon$ of the model, for the first of the 100 simulated data sets.

[Figure 2.1: Lag vs sample auto-correlation function with d = 0.15 and d = 0.25.]

[Figure 2.2: Lag vs sample auto-correlation function with d = 0.35 and d = 0.45.]

Figures 2.1 and 2.2 exhibit the slow decay of the autocorrelation of the error sequence $\varepsilon$. This slow rate of decay is in coherence with long memory dependence, since $\sum_{k=1}^\infty|\gamma_\varepsilon(k)|=\infty$. Also, it is evident from these two figures that the strength of the dependence increases as d increases.

For comparison purposes, we shall also perform the same simulation study with the errors $\varepsilon=(\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n)$ being i.i.d. Gaussian observations with mean 0 and variance 25.16, 31.98, 47.64, 100.94, which correspond to the variances of the components of the stationary sequence $\varepsilon$ under the long memory setup for d = 0.15, 0.25, 0.35, 0.45. The reason to choose the same variance of $\varepsilon_i$ as in the long memory setup is to maintain the same signal to noise ratio.

Now we proceed to the estimation part. In our study we simulated 100 different realizations of the design matrix $X$ and the error vector $\varepsilon$, thus leading to 100 data sets with the same parameter vector $\beta$. For performance comparison we shall report the Relative Estimation Error (REE), i.e. $\|\hat\beta-\beta\|_2/\|\beta\|_2$, and the Relative Prediction Error (RPE) as in Zou (2006), i.e. the empirical estimate of $E\|\hat y-X^T\beta\|^2/\sigma_\varepsilon^2$. Also, we shall report the number of correctly estimated non-zero parameters (NZ) and the number of incorrectly estimated zero parameters (IZ). Recall that in the true model there are 25 non-zero and 975 zero parameters. Table 2.1 summarizes the simulation results under the long memory setup and Table 2.2 summarises the results under the i.i.d. setup.
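The following condensed R sketch mirrors the design of this study at a smaller scale (smaller p, one replication, and REE with the selection counts NZ/IZ only). The row-wise AR(1) construction reproduces $\mathrm{cor}(x_{ij},x_{ik})=0.5^{|j-k|}$ only approximately, and the truncation level M is an assumption of the sketch.

```r
# A condensed sketch of the Section 2.4 pipeline; five-fold CV via cv.glmnet.
library(glmnet)
set.seed(4)
n <- 300; p <- 200; d <- 0.25; M <- 3000
beta <- c(runif(25, -2, 5), rep(0, p - 25))
a <- (1:M)^(d - 1)
# rows with (approx.) cor(x_ij, x_ik) = 0.5^{|j-k|}, mean one, variance one
X <- t(apply(matrix(rnorm(n * p), n, p), 1, function(z)
  as.numeric(stats::filter(z * sqrt(0.75), 0.5, method = "recursive")))) + 1
eps <- as.numeric(stats::filter(rnorm(n + M, sd = 3.5), c(0, a),
                                sides = 1))[(M + 1):(M + n)]
y <- X %*% beta + eps
fit <- cv.glmnet(X, y, nfolds = 5)                      # five-fold CV for lambda
bhat <- as.numeric(coef(fit, s = "lambda.min"))[-1]     # drop the intercept
c(REE = sqrt(sum((bhat - beta)^2) / sum(beta^2)),
  NZ  = sum(bhat != 0 & beta != 0),                     # correctly picked nonzeros
  IZ  = sum(bhat != 0 & beta == 0))                     # zeros picked incorrectly
```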
Table 2.1: Medians of RPE, REE, NZ & IZ with Gaussian design, long mem. errors

          d = 0.15             d = 0.25             d = 0.35             d = 0.45
 (n)   REE   RPE  NZ  IZ    REE  RPE  NZ  IZ     REE  RPE  NZ  IZ     REE  RPE  NZ  IZ
 100   0.216 0.62 14  33    0.23 0.62 14  33.5   0.24 0.61 14  34.5   0.28 0.46 14  32
 200   0.13  0.47 15  30    0.13 0.44 14  32.5   0.14 0.41 14  35     0.18 0.38 14  35
 300   0.10  0.39 15  33    0.11 0.36 15  35     0.11 0.38 15  34     0.14 0.31 15  41
 400   0.09  0.33 16  39    0.09 0.31 16  40     0.10 0.32 15  36     0.12 0.31 15  41
 700   0.05  0.23 20  60    0.06 0.21 19  59.5   0.07 0.23 18  50     0.08 0.22 17  52

Table 2.2: Medians of RPE, REE, NZ & IZ with Gaussian design, i.i.d. errors

       Var(e_i) = 25.16      Var(e_i) = 31.98     Var(e_i) = 47.64     Var(e_i) = 100.94
 (n)   REE   RPE  NZ  IZ    REE  RPE  NZ  IZ     REE  RPE  NZ  IZ     REE  RPE  NZ  IZ
 200   0.14  0.42 14  29.5  0.15 0.38 14  29     0.18 0.33 14  29     0.25 0.29 14  30
 400   0.09  0.28 16  32    0.10 0.26 15  31     0.12 0.22 15  28     0.15 0.18 14  28

Interpretation:

- Lasso is a desirable estimation procedure in our long range dependent setup. It performs accurate estimation at all levels of dependence, from d = 0.15 to d = 0.45. It is evident from the simulation results that the estimation becomes increasingly accurate in terms of both REE and RPE as the sample size increases. At n = 400, the relative error in estimation of $\beta$ is around 10% at all levels of dependence. As the reader might observe, it was expected that at any sample size RPE should increase as d increases; however, this is not the case. The reason is that we use cross validation to choose $\lambda_n$ and not the theoretical value of $\lambda_n$ derived earlier.

- In terms of variable selection, Lasso is increasingly successful in choosing the nonzero parameters as the sample size increases. By n = 700, it identifies around 20 of the non-zero parameters at all levels of dependence. The parameters that Lasso is consistently unable to select are the ones that are too small in size: in our model we have four parameters with $|\beta_j|<0.65$, $j=3,7,15,19$, and it is these parameters that Lasso consistently fails to detect, up to the sample size n = 700. The point here is that this is a known drawback of Lasso connected with assumption (2.23), and it is not due to the long memory dependence structure on the errors. This is confirmed by the results for the i.i.d. case, which exhibit the same problem. The above simulation also brings out another familiar drawback: although Lasso manages to correctly estimate a large portion of the zero parameters (around 95% at n = 700), the number of incorrectly estimated zero parameters (IZ) does not decrease as the sample size increases. This again is not due to the long memory errors but is an inherent drawback of cross validation, and is again confirmed by the results in the i.i.d. case at the variance levels 47.64 and 100.94, where IZ does not decrease as n increases from 200 to 400.

- Comparing RPE in the long memory case and the i.i.d. case, as expected, we observe that the long memory case requires a larger number of observations to reach the same level of accuracy, keeping in mind that the variance of the components of $\varepsilon$ is similar for both the dependent and independent cases.

2.5 Proofs for Chapter 2

The proof of Lemma 2.1.1 will follow after two key lemmas. To proceed further we require the following notation. Let $r$ be a fixed positive integer and, for all $1\le j\le r$, let $h_j=(h_{1j},h_{2j},\ldots,h_{nj})^T$ be a vector of weights. Further, let $c_{nk,j}=\sum_{i=1}^nh_{ij}a_{i-k}$ and, for $1\le j\le r$,

$$W_{n,j}=h_j^T\varepsilon=\sum_{i=1}^nh_{ij}\varepsilon_i=\sum_{i=1}^n\sum_{k=-\infty}^ih_{ij}a_{i-k}\xi_k=\sum_{k=-\infty}^nc_{nk,j}\xi_k.\qquad(2.30)$$

Further define

$$c_{n,j}:=\sup_{-\infty<k\le n}|c_{nk,j}|,\quad c_n:=\max_{1\le j\le r}c_{n,j},\qquad(2.31)$$
$$\sigma^2_{n,j}:=\mathrm{Var}(W_{n,j}),\quad\sigma_n^2:=\max_{1\le j\le r}\sigma^2_{n,j}.\qquad(2.32)$$

Under the assumption $\sum_{i=1}^nh_{ij}^2\le M/n^{2d}$, $M<\infty$, we obtain, using (2.2), that for all $1\le j\le r$,

$$\sigma^2_{n,j}=c\sum_{s=-(n-1),\,s\ne0}^{n-1}p(s,j)|s|^{-1+2d}+o(1)=c\sum_{m=1}^n\sum_{l=1,\,l\ne m}^nh_{mj}h_{lj}|l-m|^{-1+2d}+o(1).\qquad(2.33)$$

Here $p(s,j):=\sum_{i=1}^nh_{ij}h_{(i+s)j}$, and $c=B(d,1-2d)$ as given by (2.2).
Note that if we replace $h_{ij}$ by $n^{-(1/2+d)}x_{ij}$ in the above definition of $W_{nj}$ then we obtain (2.4). This more general definition of $W_{nj}$ will be essential later in the proof of sign consistency.

Lemma 2.5.1 For any positive integer $r$ and for all $1\le j\le r$, let $h_j=(h_{1j},\ldots,h_{nj})^T$ be any vector of weights such that $\|h_j\|_2^2=\sum_{i=1}^nh_{ij}^2\le M/n^{2d}$, for some constant $M<\infty$. Let $\sigma_n^2$ be as defined in (2.32). Then $\sigma_n^2=O(1)$.

Proof. First, by the Cauchy-Schwarz inequality,

$$|p(s,j)|\le\sum_{i=1}^n|h_{ij}h_{(i+s)j}|\le\Big(\sum_{i=1}^nh_{ij}^2\Big)^{1/2}\Big(\sum_{i=1}^nh_{ij}^2\Big)^{1/2}\le M/n^{2d},$$

and hence

$$\mathrm{Var}(W_{n,j})=c\!\!\sum_{s=-(n-1),\,s\ne0}^{n-1}\!\!p(s,j)|s|^{-1+2d}+o(1)\le c\frac{M}{n^{2d}}\!\!\sum_{s=-(n-1),\,s\ne0}^{n-1}\!\!|s|^{-1+2d}+o(1)=c\frac{M}{n}\!\!\sum_{s=-(n-1),\,s\ne0}^{n-1}\!\!\Big|\frac sn\Big|^{-1+2d}+o(1)\to M'\int_{-1}^1|t|^{-1+2d}dt.\qquad(2.34)$$

Observe that the bound in (2.34) is free of $j$; hence the claim follows.

Lemma 2.5.2 For any positive integer $r$ and for all $1\le j\le r$, let $h_j=(h_{1j},\ldots,h_{nj})^T$ be any vector of weights such that $\|h_j\|_2^2=\sum_{i=1}^nh_{ij}^2\le M/n^{2d}$, for some constant $M<\infty$. Then, for $c_n$ as defined in (2.31), we have $c_n=o(1)$.

Proof. The idea of the proof is borrowed from GKS, as part of Proposition 4.3.1, pp. 66, where it is used in a different context. First observe that, since $\sum_{i=1}^nh_{ij}^2\le M/n^{2d}$ for all $1\le j\le r$,

$$K_n:=\frac1{\max_{1\le i\le n,\,1\le j\le r}|h_{ij}|}\ge\frac{n^d}{\sqrt M}\to\infty.$$

Consider

$$|c_{nk,j}|\le\sum_{i=1}^n|h_{ij}a_{i-k}|\le\sum_{i=1}^n|h_{ij}a_{i-k}|I(|i-k|\ge K_n)+\sum_{i=1}^n|h_{ij}a_{i-k}|I(|i-k|<K_n)=:q^{n,1}_{k,j}+q^{n,2}_{k,j}.$$

Then

$$q^{n,1}_{k,j}\le\Big(\sum_{i=1}^nh_{ij}^2\Big)^{1/2}\Big(\sum_{i=1}^na^2_{i-k}I(|i-k|\ge K_n)\Big)^{1/2}\le\frac C{n^d}\Big(\sum_{l\ge K_n}a_l^2\Big)^{1/2}\to0,\qquad(2.35)$$

$$q^{n,2}_{k,j}\le\max_{1\le i\le n,\,1\le j\le r}|h_{ij}|\sum_{i=1}^n|a_{i-k}|I(|i-k|<K_n)\le K_n^{-1}K_n^{1/2}\Big(\sum_{l=0}^\infty a_l^2\Big)^{1/2}\le CK_n^{-1/2}\to0.\qquad(2.36)$$

Since the right hand sides of (2.35) and (2.36) are free of $j$, we obtain $c_n=o(1)$.

Proof of Lemma 2.1.1: In the above setup, for all $1\le j\le p$, let $h_{ij}=n^{-(1/2+d)}x_{ij}$, $1\le i\le n$. Then, under the assumption (2.13), we have $\sum_{i=1}^nh_{ij}^2\le C/n^{2d}$. The result now follows from Lemma 2.5.1 and Lemma 2.5.2.

Remark 2.5.1 Observe from the proof of Lemma 2.5.2 that, for $h_{ij}=n^{-(1/2+d)}x_{ij}$, we have

$$|c_{nk,j}|\le\frac{(\sum_{i=1}^nx_{ij}^2)^{1/2}}{n^{1/2+d}}\Big(\sum_{i=1}^na^2_{i-k}I(|i-k|\ge K_n)\Big)^{1/2}+K_n^{-1/2}\Big(\sum_{l=0}^\infty a_l^2\Big)^{1/2},$$

where $K_n=\max_{1\le i\le n,\,1\le j\le p}|x_{ij}|$. Since $x_{ij}<\infty$ for all $1\le i\le n$, $1\le j\le p$, and the sequence $\{a_l\}$ is square summable, for each fixed $n$ we have $c_n<\infty$ without the assumption (2.13).

The following several lemmas are needed to prove Proposition 2.1.1. First recall the Bernstein inequality from Doukhan (1994) or Lemma 3.1 of Guo and Koul (2007).

Lemma 2.5.3 For each $n\ge1$, $m\ge1$, let $Z_{mni}$, $i=-m,\ldots,n$, be an array of mean zero, finite variance, independent random variables. Assume additionally that they satisfy the Cramer condition: for some $B_{mn}<\infty$,

$$E|Z_{mni}|^k\le B_{mn}^{k-2}k!\,EZ_{mni}^2,\quad k=2,3,\ldots,\ i=-m,\ldots,n.\qquad(2.37)$$

Let $T_{mn}=\sum_{i=-m}^nZ_{mni}$ and $\sigma^2_{mn}=\sum_{i=-m}^n\mathrm{Var}(Z_{mni})$. Then, for any $\delta>0$ and $n\ge1$,

$$P(|T_{mn}|>\delta)\le2\exp\Big\{-\frac{\delta^2}{4\sigma^2_{mn}+2B_{mn}\delta}\Big\},\quad\forall\,m\in\mathbb{Z}^+,\ n\ge1.\qquad(2.38)$$

We need to apply the above Bernstein inequality $p$ times, the $j$th time to $Z_{mni,j}:=c_{ni,j}\xi_i$, $-m\le i\le n$, $1\le j\le p$. In this case,

$$T_{mnj}=\sum_{i=-m}^nc_{ni,j}\xi_i.\qquad(2.39)$$

For this purpose, we need to verify (2.37) in this case. Let $D$ be as in (2.8) and

$$B_{mn,j}\equiv B_n:=c_nD,\quad c_n=\max_{1\le j\le p}c_{n,j}.\qquad(2.40)$$

Then by assumption (2.8),

$$|c_{ni,j}|^kE|\xi_i|^k\le|c_{ni,j}|^{k-2}D^{k-2}k!\,c^2_{ni,j}E\xi_i^2\le B_n^{k-2}k!\,c^2_{ni,j}E\xi_i^2,\quad-m\le i\le n,\qquad(2.41)$$

thereby verifying the Cramer condition (2.37) for $Z_{mni,j}$ for each $1\le j\le p$, with $B_{mn,j}\equiv B_n$ not depending on $m$ and $j$.

To proceed further, we need an upper bound for $\sigma^2_{mn,j}:=\sum_{i=-m}^n\mathrm{Var}(Z_{mni,j})$.
But

$$\sigma^2_{mn,j}=\sum_{i=-m}^n\mathrm{Var}(c_{ni,j}\xi_i)\le\sum_{i=-\infty}^n\mathrm{Var}(c_{ni,j}\xi_i)=\mathrm{Var}\Big(\sum_{i=-\infty}^nc_{ni,j}\xi_i\Big)=\mathrm{Var}\Big(\sum_{i=1}^nn^{-(1/2+d)}x_{ij}\varepsilon_i\Big)=n^{-(1+2d)}\sum_{k,\ell=1}^nx_{kj}x_{\ell j}\gamma_\varepsilon(k-\ell)=\sigma^2_{n,j}<\infty.\qquad(2.42)$$

From the above discussion we now readily obtain that, for all $\delta>0$ and $1\le j\le p$,

$$P\Big(\Big|\sum_{i=-m}^nc_{ni,j}\xi_i\Big|>\delta\Big)\le2\exp\Big\{-\frac{\delta^2}{4\sigma^2_{mn,j}+2B_n\delta}\Big\}\le2\exp\Big\{-\frac{\delta^2}{4\sigma_n^2+2B_n\delta}\Big\}.\qquad(2.43)$$

Remark 2.5.2 By Remark 2.5.1 we see that, for each fixed $n\ge1$, $c_n<\infty$ without assumption (2.13). Hence the Bernstein inequality is applicable for every $n\ge1$ without assumption (2.13).

We are now almost set to derive the probability bound for $\mathcal{T}$. Before that, we look at the following preliminary lemma, which will help us obtain this bound from the truncated sums $T_{mnj}$ in (2.39) for $W_{nj}$ in (2.4), by taking the limit as $m\to\infty$.

Lemma 2.5.4 For each fixed $n$, let

$$A:=\Big\{\Big|\sum_{i=-\infty}^ny_{ni}\Big|>r+\delta\Big\},\quad B_m=\Big\{\Big|\sum_{i=-m}^ny_{ni}\Big|>r\Big\},\quad r>0,\ \delta>0,\ m=1,2,\ldots,$$

and define $B=\liminf_{m\to\infty}B_m$. If $|\sum_{i=-\infty}^ny_{ni}|<\infty$ a.s., then, for each fixed $n$, $A\subseteq B$.

Proof. Let $\omega\in A$. Then $|\sum_{i=-\infty}^ny_{ni}(\omega)|>r+\delta$. Also, by assumption, $|\sum_{i=-\infty}^ny_{ni}(\omega)|<\infty$, which implies that there exists $N_{\delta,\omega}$ such that $|\sum_{i=-\infty}^{-m}y_{ni}(\omega)|<\delta$ for all $m>N_{\delta,\omega}$. Hence $|\sum_{i=-m}^ny_{ni}(\omega)|>r$ for all $m>N_{\delta,\omega}$, which in turn implies

$$\omega\in\bigcap_{m=N_{\delta,\omega}}^\infty\Big\{\Big|\sum_{i=-m}^ny_{ni}\Big|>r\Big\}\ \Rightarrow\ \omega\in\bigcup_{m=1}^\infty\bigcap_{l=m}^\infty\Big\{\Big|\sum_{i=-l}^ny_{ni}\Big|>r\Big\}\ \Rightarrow\ \omega\in\liminf_{m\to\infty}B_m.\qquad(2.44)$$

Since (2.44) is true for any $\delta>0$, the claim $A\subseteq B$ follows.

Before proceeding to the next proposition, we verify that the assumption in Lemma 2.5.4 is valid for the series under consideration, namely $\sum_{k=-\infty}^nc_{nk,j}\xi_k$. First consider the series $\varepsilon_i=\sum_{k=1}^\infty a_k\xi_{i-k}$; since this is an infinite sum of independent zero mean random variables with $\sum_{k=1}^\infty\mathrm{Var}(a_k\xi_{i-k})<\infty$, we have $\varepsilon_i<\infty$ a.s. (Durrett, Theorem 1.8.3, page 62). Now, for each $n$, we have by (2.4) that $\sum_{k=-\infty}^nc_{nk,j}\xi_k=n^{-(1/2+d)}\sum_{i=1}^nx_{ij}\varepsilon_i$; since this is a finite weighted sum of $\{\varepsilon_i\}$, for each $n$ we have $\sum_{k=-\infty}^nc_{nk,j}\xi_k<\infty$ a.s., for all $1\le j\le p$.

Proof of Proposition 2.1.1. Fix $1\le j\le p$ and $n\ge1$. Recall the definition of $c_{nk,j}$ from (2.5). Let $r_{np}:=n^{1/2-d}\lambda_0^n/2$. Then, for any $0<\delta<r_{np}$,

$$P\Big(\Big|\sum_{k=-\infty}^nc_{nk,j}\xi_k\Big|>r_{np}\Big)\le P\Big(\liminf_{m\to\infty}\Big\{\Big|\sum_{k=-m}^nc_{nk,j}\xi_k\Big|>r_{np}-\delta\Big\}\Big)\quad\text{(by Lemma 2.5.4)}$$
$$\le\liminf_{m\to\infty}P\Big(\Big|\sum_{k=-m}^nc_{nk,j}\xi_k\Big|>r_{np}-\delta\Big)\quad\text{(Fatou's lemma)}$$
$$\le\liminf_{m\to\infty}2\exp\Big\{-\frac{(r_{np}-\delta)^2}{4\sigma_n^2+2B_n(r_{np}-\delta)}\Big\},$$

where the last inequality follows from (2.43). Upon letting $\delta\to0$ in this bound we thus obtain

$$P\Big(\Big|n^{-(1/2+d)}\sum_{i=1}^nx_{ij}\varepsilon_i\Big|>r_{np}\Big)\le2\exp\Big\{-\frac{r_{np}^2}{4\sigma_n^2+2B_nr_{np}}\Big\}.\qquad(2.45)$$

Note that $r_{np}$ is a positive solution of the following quadratic equation.
$$\frac{r_{np}^2}{4\sigma_n^2+2B_nr_{np}}=\frac{t^2+4\log p}{4}.$$

Hence (2.45) and the relation

$$2\exp\Big\{-\frac{r_{np}^2}{4\sigma_n^2+2B_nr_{np}}\Big\}=2\exp\Big\{-\frac{t^2+4\log p}{4}\Big\}$$

together imply

$$P\Big(2n^{-1}\Big|\sum_{i=1}^nx_{ij}\varepsilon_i\Big|>\lambda_0^n\Big)=P\Big(\Big|n^{-(1/2+d)}\sum_{i=1}^nx_{ij}\varepsilon_i\Big|>r_{np}\Big)\le2\exp\Big\{-\frac{t^2+4\log p}{4}\Big\}.\qquad(2.46)$$

This completes the proof of (2.10). To prove (2.11), note that

$$1-P(\mathcal{T})=P\Big(\max_{1\le j\le p}2n^{-1}\Big|\sum_{i=1}^nx_{ij}\varepsilon_i\Big|>\lambda_0^n\Big)\le P\Big(\bigcup_{j=1}^p\Big\{2n^{-1}\Big|\sum_{i=1}^nx_{ij}\varepsilon_i\Big|>\lambda_0^n\Big\}\Big)\le\sum_{j=1}^pP\Big(2n^{-1}\Big|\sum_{i=1}^nx_{ij}\varepsilon_i\Big|>\lambda_0^n\Big).$$

By (2.46) we get

$$\sum_{j=1}^pP\Big(2n^{-1}\Big|\sum_{i=1}^nx_{ij}\varepsilon_i\Big|>\lambda_0^n\Big)\le2p\exp\{-(t^2+4\log p)/4\}=2\exp(-t^2/4).$$

This completes the proof of Proposition 2.1.1.

Proofs for Section 2.2

Proof of Proposition 2.2.1: Let $\hat\beta$ be as in (2.3) and let $\hat u=\hat\beta-\beta$. Define

$$V_n(u)=\sum_{i=1}^n\frac1n\big[(\varepsilon_i-X_i^Tu)^2-\varepsilon_i^2\big]+\lambda_n\|u+\beta\|_1.$$

Then $\hat u=\operatorname{argmin}_uV_n(u)$. Denote the first term in $V_n(u)$ by (I) and the second term by (II). Then (I) can be rewritten as

$$\sum_{i=1}^n\frac1n\big[(\varepsilon_i-X_i^Tu)^2-\varepsilon_i^2\big]=\Big[-2\sum_{i=1}^n\frac1nu^TX_i\varepsilon_i+\sum_{i=1}^n\frac1nu^TX_iX_i^Tu\Big]=\Big[-\frac{2u^TW}{n^{1/2-d}}+u^TC_nu\Big],\qquad(2.47)$$

where $W=n^{-(1/2+d)}X^T\varepsilon$. Differentiating (2.47) with respect to $u$ gives

$$2n^{-(1/2-d)}\big(C_n(n^{1/2-d}u)-W\big).$$

Let $\hat u(1)$, $W(1)$ and $\hat u(2)$, $W(2)$ denote the first $q$ and the last $p-q$ entries of $\hat u$, $W$, respectively. Now note that (Zhao and Yu, 2006)

$$\{\mathrm{sign}(\beta(1))\circ\hat u(1)>-|\beta(1)|\}\subseteq\{\mathrm{sign}(\hat\beta_j)=\mathrm{sign}(\beta_j),\ j=1,2,\ldots,q\}.\qquad(2.48)$$

Also, by the Karush-Kuhn-Tucker conditions and uniqueness of Lasso, if a solution $\hat u$ exists then the following conditions must hold:

$$C_n^{11}(n^{1/2-d}\hat u(1))-W(1)=-\frac{\lambda_n}{2}n^{1/2-d}\mathrm{sign}(\beta(1)),\qquad(2.49)$$
$$|\hat u(1)|<|\beta(1)|,\qquad(2.50)$$
$$\big|C_n^{21}(n^{1/2-d}\hat u(1))-W(2)\big|\le\frac{\lambda_n}{2}n^{1/2-d}\mathbf{1}.\qquad(2.51)$$

The set (2.50) is contained in the set on the left of (2.48). Hence (2.49), (2.50), (2.51) together imply $\{\mathrm{sign}(\hat\beta(1))=\mathrm{sign}(\beta(1))\}$ and $\hat\beta(2)=\hat u(2)=0$. The condition $A_n$ implies the existence of $\hat u(1)$ satisfying (2.49) and (2.50), and conditions $B_n$ and $A_n$ together imply (2.51). The result follows.

To maintain clarity of notation in the coming proof, we define the following. For a matrix of weights $h_a=(h_{a1},\ldots,h_{aq})$, where $h_{aj}=(h_{a1j},h_{a2j},\ldots,h_{anj})$ for all $1\le j\le q$, define $W^a_{n,j}$, $c^a_n$, $\sigma^2_{an}$ as done in (2.30), (2.31), (2.32) respectively; also define $B^a_n=c^a_nD$. Repeat similarly for a matrix of weights $h_b=(h_{b1},\ldots,h_{b(p-q)})$, with $h_{bj}=(h_{b1j},h_{b2j},\ldots,h_{bnj})$ for all $1\le j\le p-q$.

Proof of Theorem 2.2.1: Let $A_n$, $B_n$ be as in Proposition 2.2.1.
Then

$$1-P(A_n\cap B_n)\le P(A_n^c)+P(B_n^c)\le\sum_{i=1}^qP\Big(|z_i|\ge n^{1/2-d}\big(|\beta_i|-\tfrac{\lambda_n}{2}b_i\big)\Big)+\sum_{i=1}^{p-q}P\Big(|\zeta_i|\ge\tfrac{\lambda_n}{2}n^{1/2-d}\eta_i\Big),$$

where $z=(z_1,\ldots,z_q)^T=(C_n^{11})^{-1}W(1)$, $\zeta=(\zeta_1,\ldots,\zeta_{p-q})^T=C_n^{21}(C_n^{11})^{-1}W(1)-W(2)$, and $b=(b_1,\ldots,b_q)=(C_n^{11})^{-1}\mathrm{sign}(\beta(1))$. Now express $z=h_a^T\varepsilon$, where $h_a^T=(h_{a1},\ldots,h_{aq})^T=(C_n^{11})^{-1}\big(n^{-(1/2+d)}X(1)^T\big)$. Then $h_a^Th_a=(C_n^{11})^{-1}n^{-2d}$, and $z_j=h_{aj}^T\varepsilon$ with

$$\|h_{aj}\|_2^2\le\frac1{n^{2d}M_2}\quad\forall\,j=1,\ldots,q,\ \text{by assumption (2.21)}.$$

Similarly write $\zeta=h_b^T\varepsilon$, where $h_b^T=C_n^{21}(C_n^{11})^{-1}\big(n^{-(1/2+d)}X(1)^T\big)-\big(n^{-(1/2+d)}X(2)^T\big)$. Then

$$h_b^Th_b=\frac1{n^{1+2d}}X(2)^T\big[I-X(1)(X(1)^TX(1))^{-1}X(1)^T\big]X(2).$$

Since $[I-X(1)(X(1)^TX(1))^{-1}X(1)^T]$ has eigenvalues between 0 and 1, we have $\zeta_j=h_{bj}^T\varepsilon$ with

$$\|h_{bj}\|_2^2\le M_1/n^{2d}\quad\forall\,j=1,\ldots,p-q,\ \text{by assumption (2.20)}.$$

Hence the weight vectors $h_{aj}$, $1\le j\le q$, and $h_{bj}$, $1\le j\le p-q$, both satisfy Lemma 2.5.1 and Lemma 2.5.2, for $r=q$ and $r=p-q$ respectively. Also,

$$\lambda_n|b_j|=\lambda_n\big|(C_n^{11})^{-1}\mathrm{sign}(\beta(1))\big|\le\frac{\lambda_n}{M_2}\|\mathrm{sign}(\beta(1))\|_2=\frac{\lambda_n}{M_2}\sqrt q.\qquad(2.52)$$

Now, $z_j=h_{aj}^T\varepsilon=\sum_{i=1}^nh_{aij}\varepsilon_i$. Proceed as done earlier in (2.45), using (2.52), Lemma 2.5.1, Lemma 2.5.2 and the Bernstein inequality as applied in (2.45). We get, for some constants $r_1,r_2>0$,

$$\sum_{j=1}^qP\Big(|z_j|\ge n^{1/2-d}\big(|\beta_j|-\tfrac{\lambda_n}{2}b_j\big)\Big)\le\sum_{j=1}^qP\big(|z_j|\ge r_1n^{c_2/2}\big)\le2q\exp\Big\{-\frac{r_1^2n^{c_2}}{4\sigma^2_{an}+2B^a_nr_1n^{c_2/2}}\Big\}\to0.\qquad(2.53)$$

Also,

$$\sum_{j=1}^{p-q}P\Big(|\zeta_j|\ge\tfrac{\lambda_n}{2}n^{1/2-d}\eta_j\Big)\le(p-q)\exp\Big\{-\frac{r_2^2(\lambda_nn^{1/2-d})^2}{4\sigma^2_{bn}+2B^b_nr_2\lambda_nn^{1/2-d}}\Big\}\le(p-q)\exp\big\{-r_2\lambda_nn^{1/2-d}\big\}\ \text{for large }n,\ \le\exp\big\{n^{c_3}-r_2\lambda_nn^{1/2-d}\big\}\to0.\qquad(2.54)$$

The result follows from (2.53) and (2.54) together.

Proofs for Section 2.3

Proof of Corollary 2.3.1: Observe that assumption (2.13) implies assumptions (2.25)(i) and (2.25)(ii). Hence we only need to show that $n^{-(1+2d)}\Sigma_n=O(1)$ componentwise. For each variance component this has already been shown in (2.34) in the proof of Lemma 2.1.1, with $h_{ij}=n^{-(1/2+d)}x_{ij}$ for all $1\le j\le p$. The covariance components can be dealt with via the Cauchy-Schwarz inequality.

Proof of Remark 2.3.1: Using (2.33), we obtain

$$n^{-1-2d}\mathrm{Cov}(T_{nj},T_{nk})=n^{-1-2d}c\sum_{l,m=1,\,l\ne m}^ng_j\Big(\frac ln\Big)g_k\Big(\frac mn\Big)|l-m|^{-1+2d}+o(1)\to c\int_0^1\!\!\int_0^1g_j(u)g_k(v)|u-v|^{-1+2d}\,du\,dv.$$

Proof of Theorem 2.3.2: Let

$$Z_n(\phi)=\frac1n\sum_{i=1}^n(y_i-X_i^T\phi)^2+\lambda_n\sum_{i=1}^p|\phi_i|;$$

then $Z_n(\phi)$ is convex. We need to show the pointwise convergence (in probability) of $Z_n(\phi)$ to $Z(\phi)+k^2$ for some constant $k$. Clearly $\lambda_n\sum_{i=1}^p|\phi_i|\to\lambda_0\sum_{i=1}^p|\phi_i|$, and

$$\frac1n\sum_{i=1}^n(y_i-X_i^T\phi)^2=\frac1n\sum_{i=1}^n(\varepsilon_i-X_i^T(\phi-\beta))^2=\frac1n\sum_{i=1}^n\varepsilon_i^2+\frac1n\sum_{i=1}^n(\phi-\beta)^TX_iX_i^T(\phi-\beta)-\frac2n(\phi-\beta)^TX^T\varepsilon.$$

The first term in the above equation converges to $k^2$ by the ergodic theorem (since $\{\varepsilon_i\}$ form a stationary ergodic sequence), the second term converges to $(\phi-\beta)^TC(\phi-\beta)$, and the last term converges to zero in probability (since, by Theorem 2.3.1, $n^{-(1/2+d)}X^T\varepsilon$ converges in distribution). This proves the theorem.

Proof of Theorem 2.3.3: Define

$$V_n(u)=n^{1-2d}\Big[\sum_{i=1}^n\frac1n\Big\{\Big(\varepsilon_i-\frac{X_i^Tu}{n^{1/2-d}}\Big)^2-\varepsilon_i^2\Big\}+\lambda_n\sum_{j=1}^p\Big\{\Big|\beta_j+\frac{u_j}{n^{1/2-d}}\Big|-|\beta_j|\Big\}\Big].\qquad(2.55)$$

Denote the first term in the above equation by (I) and the second term by (II). Then

$$(\mathrm I)=n^{1-2d}\Big[\sum_{i=1}^n\frac{\varepsilon_i^2}n+\frac{u^T\sum_iX_iX_i^Tu}{n\cdot n^{1-2d}}-\frac{2\sum_iu^TX_i\varepsilon_i}{n\cdot n^{1/2-d}}-\sum_{i=1}^n\frac{\varepsilon_i^2}n\Big]=\Big[\frac{u^T\sum_iX_iX_i^Tu}{n}-\frac{2\sum_iu^TX_i\varepsilon_i}{n^{1/2+d}}\Big]\to u^TCu-2u^TW,\qquad(2.56)$$

where $W$ is $N_p(0,\Sigma)$. Also,

$$(\mathrm{II})=n^{1-2d}\lambda_n\sum_{j=1}^p\Big[\Big|\beta_j+\frac{u_j}{n^{1/2-d}}\Big|-|\beta_j|\Big]\to\lambda_0\sum_{j=1}^p\big[u_j\,\mathrm{sign}(\beta_j)I_{[\beta_j\ne0]}+|u_j|I_{[\beta_j=0]}\big].\qquad(2.57)$$
The result follows from (2.56) and (2.57) together.

Proof of Theorem 2.3.4: The structure of the proof is similar to that of Theorem 2 in Zou (2006). Define

$$\tilde V_n(u)=n^{1-2d}\Big[\sum_{i=1}^n\frac1n\Big\{\Big(\varepsilon_i-\frac{X_i^Tu}{n^{1/2-d}}\Big)^2-\varepsilon_i^2\Big\}+\lambda_n\sum_{j=1}^p\hat w_j\Big\{\Big|\beta_j+\frac{u_j}{n^{1/2-d}}\Big|-|\beta_j|\Big\}\Big];\qquad(2.58)$$

then $\tilde u=n^{1/2-d}(\tilde\beta_n-\beta)=\operatorname{argmin}\tilde V_n(u)$. Expanding $\tilde V_n(u)$ as done in (2.56) and (2.57) we get

$$\tilde V_n(u)=\frac{u^T\sum_iX_iX_i^Tu}{n}-\frac{2\sum_iu^TX_i\varepsilon_i}{n^{1/2+d}}+n^{1-2d}\lambda_n\sum_{j=1}^p\hat w_j\Big[\Big|\beta_j+\frac{u_j}{n^{1/2-d}}\Big|-|\beta_j|\Big].$$

Recall that $n^{-1}X^TX\to C$ and, by Theorem 2.3.1, $n^{-(1/2+d)}X^T\varepsilon\to_DN(0,\Sigma)$. Also, $n^{1/2-d}\lambda_n\to0$, $n^{(1/2-d)(1+\gamma)}\lambda_n\to\infty$, and the adaptive weights $\hat w$ are defined so that $n^{1/2-d}(\hat\beta_n-\beta)=O_p(1)$. Hence we obtain $\tilde V_n(u)\to\tilde V(u)$, where

$$\tilde V(u)=\begin{cases}u_A^TC_{11}u_A-2u_A^TW_A,&\text{if }u_j=0\ \forall\,j\notin A,\\[2pt]\infty,&\text{else}.\end{cases}\qquad(2.59)$$

The unique minimum of $\tilde V(u)$ is $(C_{11}^{-1}W_A,0)^T$. Hence we obtain

$$\tilde u_A=n^{1/2-d}(\tilde\beta_{nA}-\beta_A)\to_DC_{11}^{-1}W_A\quad\text{and}\quad\tilde u_{A^c}\to_D0.\qquad(2.60)$$

The variable selection part can be obtained by adjusting the normalization in the proof of Zou (2006). From the asymptotic normality obtained in (2.60), we obtain that, for all $j\in A$, $P(j\in A_n^*)\to1$. Let $x_j:=(x_{1j},\ldots,x_{nj})^T$ be the $j$th column of the design matrix $X$, $1\le j\le p$. Next we show that if $j\notin A$ then $P(j\notin A_n^*)\to1$. By the KKT conditions for the Lasso solution, we have $|2x_j^T(y-X\tilde\beta)|\le n\lambda_n\hat w_j$. Consider

$$\frac{x_j^T(y-X\tilde\beta_n)}{n^{1/2+d}}=\frac{x_j^TX\,n^{1/2-d}(\beta-\tilde\beta_n)}{n}+\frac{x_j^T\varepsilon}{n^{1/2+d}}.$$

Using (2.60), the first term on the right side converges to some normal distribution, and by Theorem 2.3.1 the second term on the right converges to a normal distribution. Also, since $\beta_j=0$ and $n^{1/2-d}(\hat\beta_n-\beta)=O_p(1)$, we have

$$n^{1/2-d}\lambda_n\hat w_j=\frac{n^{(1/2-d)(1+\gamma)}\lambda_n}{|n^{1/2-d}\hat\beta_j|^\gamma}\to\infty.$$

This implies $P(j\notin A_n^*)\ge P\big(|2x_j^T(y-X\tilde\beta_n)|<n\lambda_n\hat w_j\big)\to1$. This completes the proof.

Chapter 3
Weighted $\ell_1$-Penalized Corrected Quantile Regression for High Dimensional Measurement Error Models

As described in Chapter 1, we shall now consider the problem of estimation and variable selection in measurement error linear regression models. In this chapter we shall propose an $\ell_1$-penalized corrected quantile estimator that consistently corrects the bias induced by measurement error and also provides consistent variable selection. The problem of bias correction due to measurement error in the context of mean regression is a classical problem in the case of fixed $p$; on the other hand, it is of recent interest in the high dimensional setting. Furthermore, bias correction in quantile regression has only recently been studied, by Wang, Stefanski and Zhu (2012) in the fixed $p$ setting, and to the best of our knowledge this chapter is the first attempt to do so in the high dimensional setting. The main contributions of this chapter are to provide the oracle property of the proposed estimator in the fixed $p$ setting, and to provide bounds on the statistical error of the proposed estimator in the high dimensional setting.
3.1Model,EstimatoranditsOracleProperty Inthissection,wedescribethemodel,theproposedestimatorandthenecessaryassumptions onthemodelparameters.Alsoweestablishtheoraclepropertyoftheproposedestimatorin the p setting.Asdescribedin(1.1)and(1.2)weconsideralinearregressionmodelwith additiveerrorinthedesignvariables.Where x i =( x i 1 ; ;x ip ) T , i =1 ; ;n; arevectors ofnon-randomdesignvariablesand y i 'saretheresponsesrelatedto x i 'sbytherelations y i = x T i 0 + " i ; forsome 0 2 R p ; 1 i n: (3.1) Here 0 =( 0 1 ;:::; 0 p ) 2 R p istheparametervectorofinterest,and " =( " 1 ;:::;" n ) T is an n dimensionalvectorwhosecomponentsareindependentbutnotnecessarilyidentically distributed,andsatisfy P ( " i 0)= ˝; forevery1 i n ,where ˝ 2 (0 ; 1)isthequantile levelofinterest. Furthermore,thedesignvariables x i 'sarenotobserveddirectly.Instead,weobservethe surrogate w i 'sobeyingthemodel, w i = x i + u i ; 1 i n: (3.2) Here, u T i =( u i 1 ; ;u ip )areassumedtobeindependentof f " i g andindependentand identicallydistributed(i.i.d.)accordingtoa p dimensionalmultivariateLaplacedistribution whichisviaitscharacteristicfunctionasfollows, 45 3.1.1 Arandomvector u 2 R p issaidtohaveamultivariateLaplacedistribution L p ( ifforsome 2 R p andanonnegativesymmetric p p matrixits characteristicfunctionis 1+ t T t= 2 1 , t 2 R p : Notethat,if =0,thenisthecovariancematrixoftherandomvector u . Laplacedistributionsareoftenusedinpracticetomodeldatawithtailsheavierthan normal.McKenzieetal.(2009)usedthesedistributionsintheanalysisofglobalpositioning dataandPurdomandHolmes(2005)adoptedLaplacemeasurementerrormodelinthe analysisofdatafromsomemicroarrayexperiments.StefanskiandCarroll(1990)providean indepthdiscussionofLaplacemeasurementerrors. Inoursetup,weshallconsiderthemodel(3.1)and(3.2)inboththeandhighdimen- sionalsettings.Inthelattersetting,thedimension p oftheparametervector 0 isallowed togrowexponentiallywith n; andthemeasurementerrors u i ; 1 i n arei.i.d. L p (0 ; ; withknown.Furthermore, 0 isassumedtobesparse,i.e.,onlyasmallproportionofthe parametersareassumedtobenonzero.Thenumberofnonzerocomponentsshallbede- notedby s; where s isallowedtodivergeslowerthan n: Let S = j 2f 1 ; 2 ;:::;p g ; 0 j 6 =0 : ; and S c denoteitscomplimentset.Notethatcard( S )= s: Also,foranyvector 2 R p ; let S = f j ; j 2 S g and S c = f j ; j 2 S c g : Allresultspresentedinthechaptershallassumetheunobserveddesignvariables x i 'sto benon-random.However,itisworthpointingoutthattheassumptionsmadeinthischapter on x i 'scanbeshowntoholdwithprobabilityconvergingto1undersomerandomdesigns, inparticularforsub-Gaussianorsub-Exponentialdesignswithindependentobservations. Whenthedesignvariables x i 'sarecompletelyobserved,severalauthorsincludingFan, etal.(2014),BelloniandChernozhukov(2011)andWang,WuandLi(2012)haveshown 46 that 0 canbeestimatedconsistentlyby ^ x =argmin 2 R p ( 1 n n X i =1 ˆ ( y i ;x i ; )+ n k d k 1 ) ; (3.3) where ˆ ( y i ;x i ; )= ˆ ˝ ( y i x T i ) ;ˆ ˝ ( v )= v f ˝ I ( v 0) g isthequantilelossfunction,and d =( d 1 ;:::;d p ) T isavectorofnon-negativeweights,and` 'denotestheHadamardproduct, i.e. k d k 1 := P p j =1 d j j j j : Toovercometheltyduetomeasurementerrorinthecovariates,webeginwiththe regularizedversionofcorrectedquantileestimator W` 1 - CQ .Theun-penalizedversionwas introducedbyWang,StefanskiandZhu(2012)(WSZ).Todescribetheirlossfunction,let K ( )denoteakerneldensityfunctionand K 0 beitsderivative.Also,let h = h n ! 0 besequenceofpositivewindowwidths,and H ( x )= R x K ( u ) du: Let ˆ ? 
L ( y i ;w i ;;h )=~ " i ( ˝ 1)+~ " i H ( ~ " i h ) ˙ 2 2 n 2 h K ~ " i h + ~ " i h 2 K 0 ~ " i h o ; (3.4) where,~ " i = y i w T i and ˙ 2 = T : WSZproposedtoapproximatethequantilefunc- tion ˆ ˝ ( y i x T i 0 )bythesmoothfunction ˆ ? L ( y i ;w i ;;h )andtheirestimatorasa minimizerwithrespectto oftheaverage n 1 P n i =1 ˆ ? L ( y i ;w i ;;h ).Itspenalizedanalogis l ? n ( ):= 1 n n X i =1 ˆ ? L ( y i ;w i ;;h )+ n k d k 1 : Observethat l ? n ( )isnon-convexand l ? n ( )maydivergewhen ˙ 2 = T !1 : Hence,we restricttheparameterspacetotheexpanding ` 1 -ball= f 2 R p ; k k 1 b 0 p s g ; forsome 47 b 0 > 0.Now,the W` 1 - CQ estimatoras ^ =argmin 2 l ? n ( ) : (3.5) Theweights d j ; 1 j p areassumedtosatisfy (i)max j 2 S d j c max ; 0 0 ;C 2 > 0such thatforany y satisfying j y j C 1 ; max 1 i n j F i ( y ) F i (0) yf i (0) j C 2 y 2 : ThisconditionisthesameasCondition1inFanetal.(2014).Itimposesonlymildconditions ontheerrordensitiesandisslightlystrongerthantheLipchitzconditionfor f i 'saroundthe origin.Gaussianityandhomoscedasticityisnotimposed.Severaldistributions,including doubleexponentialandCauchy,satisfythiscondition. (A2) Unobserveddesignmatrix X : Forall1 j p , n 1 P n i =1 x 2 ij c x ; forsomeconstant c x < 1 : (A3) Measurementerrors: Themeasurementerrors f u i g areindependentof f " i g ,and i.i.d. L p (0 ; ; forall1 i n ,withaknownFurthermore,thereexistsaconstant 50 0 <˙ 2 u < 1 suchthatmax 1 j p Var ( u ij ) ˙ 2 u : (A4) Kernelfunction K : K istheprobabilitydensityfunctionofastandardnormalrandom variable. ThischoiceoftheKernelfunctionshallplayanimportantroleinouranalysis.Thiskernel functionischosenforitsmanytractableproperties,namely,itissymmetricaroundorigin, tiable,andmoreimportantly,itsderivativesbeingLipchitzcontinuous, whichisdetailedinSection3.5ofthischapter. 3.2Relationshipbetween ˆ ? L and ˆ Theanalysistofollowreliescriticallyontheapproximationofthecorrectedquantileloss function ˆ ? L in(3.4)intermsofobserved w i 'sbytheusualconvexquantilefunction ˆ in(3.3)involvingunobserved x i 's.Webeginbyestablishingthisconnection. Theapproximationresultwederiveforthecurrenthighdimensionalsetup,where p is increasingexponentiallywith n ,issimilartotheoneusedinWSZinthecaseof p .For thatreasonweusesimilarnotationasinWSZ.Accordingly,asmoothedquantileloss functionwitharguments( y i ;x i ;;h ) ; ˆ L ( y i ;x i ;;h )=( y i x T i ) f ˝ 1+ H ( y i x T i h ) g : (3.7) Notethat ˆ ? L isafunctionoftheobservedcovariates w; whereasthe ˆ L and ˆ arefunctions 51 oftheunobservedcovariates x: Now,for 2 ; M n ( ) M ? n ( w;;h )= n 1 n X i =1 n ˆ ? L ( y i ;w i ;;h ) ˆ ? L ( y i ;w i ; 0 ;h ) o ; (3.8) ~ M n ( ) ~ M n ( x;;h )= n 1 n X i =1 n ˆ L ( y i ;x i ;;h ) ˆ L ( y i ;x i ; 0 ;h ) o ; M n ( ) M n ( x; )= n 1 n X i =1 n ˆ ( y i ;x i ; ) ˆ ( y i ;x i ; 0 ) o : Wearenowreadytostatethefollowingtheoremdescribingtheapproximationofthe processes M ? n ( )and M n ( )bytheirrespectiveexpectations,uniformlyin 2 inprob- abilitywithrates.Itsproofisgiveninsection3.5.Throughout, max denotesthelargest eigenvalueof Theorem3.2.1 Assumethemeasurementerrormodel(3.1)and(3.2)andtheassumptions (A1),(A2),(A3)and(A4)hold.Then, sup 2 M ? n ( ) EM ? n ( ) = O p max s 3 = 2 h 2 r 2log2 p n ; (3.9) sup 2 ~ M n ( ) E ~ M n ( ) = O p p s r 2log2 p n : (3.10) Toproceedfurther,werequirethefollowingtworesultsofWSZ.First,thetwice entiabilityof ˆ L ( y;x;;h )inthevariable y x 0 and u i ˘ L p (0 ; imply EM ? n ( )= E ~ M n ( ) ; 8 2 R p : (3.11) 52 Secondly,underassumption(A4), sup 2 j ~ M n ( ) M n ( ) j = O ( h ) ; a.s. 
Claim (3.11) is a direct consequence of Theorem 2 of WSZ, while claim (3.12) is proved in WSZ as a part of the proof of their Theorem 3 (page 14). The short proof of this statement is reproduced here for the convenience of the reader.

Proof of (3.12): Denote by $\rho_L(e,h):=\rho_L(y,x,\beta,h)$, where $e=y-x'\beta$; define $\rho(e)$ similarly. Let $Z$ denote a r.v. having d.f. $H$. Use the symmetry of $K$, and hence of $H$, the finiteness of its first moment, and the change of variable formula to obtain that the left hand side of (3.12) is bounded above by 2 times
\[
\sup_e\big|\rho_L(e,h)-\rho(e)\big| \le \sup_e |e|\,\Big|H\Big(\frac eh\Big)-I\{e>0\}\Big| \le \sup_t\big|h\,t\,H(-|t|)\big| \le h\,E|Z| .
\]
This completes the proof of (3.12).

It is important to note that both of these results are valid without any restriction on the model dimension $p$, hence applicable in our high dimensional setup. In view of the results (3.11), (3.12) and Theorem 3.2.1, we obtain
\[
\sup_{\beta\in\Omega}|M^*_n(\beta)-M_n(\beta)| \le \sup_{\beta\in\Omega}|M^*_n(\beta)-EM^*_n(\beta)| + \sup_{\beta\in\Omega}|\tilde M_n(\beta)-E\tilde M_n(\beta)| + \sup_{\beta\in\Omega}|\tilde M_n(\beta)-M_n(\beta)| \tag{3.13}
\]
\[
= O_p\Big(\Lambda_{\max}\frac{s^{3/2}}{h^2}\sqrt{\frac{\log 2p}{n}}\Big) + O(h).
\]
The last claim in the above bounds follows since, by Theorem 3.2.1, the second term on the right hand side of (3.13) decreases faster than the first term. This approximation plays a pivotal role in the analysis carried out in the sequel.

3.3 Results in High Dimensions

In this section we shall provide statistical error bounds for the $W\ell_1$-$CQ$ estimator. The following lemma is crucial for obtaining our error bounds. Let
\[
v_n(\beta) = n^{-1}\sum_{i=1}^n \rho(y_i,x_i,\beta), \qquad g_n(\beta):=Ev_n(\beta), \quad \beta\in\mathbb{R}^p .
\]
We shall sometimes write $\rho^*_{Li}(\beta)$ for $\rho^*_L(y_i,w_i,\beta,h)$; a similar convention applies to $\rho$ and $\rho_L$. We have:

Lemma 3.3.1 For the measurement error model (3.1) and (3.2), we have
\[
g_n(\hat\beta)-g_n(\beta_0)+\lambda_n\|d\circ\hat\beta\|_1 \le \lambda_n\|d\circ\beta_0\|_1 + |M^*_n(\hat\beta)-M_n(\hat\beta)| + |M_n(\hat\beta)-EM_n(\hat\beta)|. \tag{3.14}
\]
This inequality is obtained by subtracting $M_n(\hat\beta)-EM_n(\hat\beta)$ from both sides of the inequality
\[
n^{-1}\sum_{i=1}^n\rho^*_{Li}(\hat\beta)+\lambda_n\|d\circ\hat\beta\|_1 \le n^{-1}\sum_{i=1}^n\rho^*_{Li}(\beta_0)+\lambda_n\|d\circ\beta_0\|_1,
\]
and then rearranging terms and using the triangle inequality.
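For completeness, here is the two-step derivation just described, recalling that $\|\beta_0\|_1\le b_0\sqrt s$, so $\beta_0\in\Omega$. Since $\hat\beta$ minimizes $l^*_n$ over $\Omega$,
\[
M^*_n(\hat\beta)+\lambda_n\|d\circ\hat\beta\|_1 \le \lambda_n\|d\circ\beta_0\|_1 .
\]
Since, at any fixed $\beta$, $EM_n(\beta)=g_n(\beta)-g_n(\beta_0)$, adding and subtracting yields
\[
g_n(\hat\beta)-g_n(\beta_0)+\lambda_n\|d\circ\hat\beta\|_1 \le \lambda_n\|d\circ\beta_0\|_1 + \big\{M_n(\hat\beta)-M^*_n(\hat\beta)\big\} + \big\{EM_n(\hat\beta)-M_n(\hat\beta)\big\},
\]
and bounding the last two braces by their absolute values gives (3.14).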
The technique adopted to provide the desired error bounds is to first establish results for any $\beta$ chosen in a small neighbourhood of $\beta_0$. Later, using the convexity of $\rho_\tau(\cdot)$ and the inequality (3.13), we show that the estimator $\hat\beta$ indeed eventually lies in this neighborhood, with probability tending to 1. Let $\Gamma_n:=\max_{1\le i\le n,\,1\le j\le p}|x_{ij}|$, and define
\[
\mathcal B(\delta_n)=\big\{\beta\in\mathbb{R}^p:\ \|\beta-\beta_0\|_1\le\delta_n\big\},
\]
where $\delta_n$ is a sequence of positive numbers decreasing to 0 and satisfying $\delta_n=o(\Gamma_n^{-1})$. The last piece of this jigsaw is the following lemma, which provides a lower bound for the term on the left hand side of (3.14). Let $X:=(x_1,x_2,\dots,x_n)^T$ denote the $n\times p$ design matrix, and let $D:=\mathrm{diag}\{f_1(0),\dots,f_n(0)\}$.

Lemma 3.3.2 Suppose the model (3.1) and assumption (A1) hold. Then there exists a constant $0<c_a<\infty$ such that, for all $\beta\in\mathcal B(\delta_n)$,
\[
g_n(\beta)-g_n(\beta_0) \ge c_a\, n^{-1}\big\|D^{1/2}X(\beta-\beta_0)\big\|_2^2 .
\]
We shall also need the following compatibility condition: there exist constants $c_0>0$ and $\phi>0$ such that, for all $\beta\in\mathbb{R}^p$ satisfying $\|\beta_{S^c}\|_1\le c_0\|\beta_S\|_1$,
\[
\|\beta_S\|_1^2 \le \frac{s}{n\phi^2}\,\beta^T X^T D X\beta. \tag{3.15}
\]
In our setup the constant $c_0$ can be explicitly computed as $c_0=(2c_{\max}+c_{\min})/c_{\min}$. Hence, if we are using the $\ell_1$ penalty, where the weights $d_j=1$ for all $1\le j\le p$, then $c_0=3$.

We also need the following rate conditions on various underlying entities:
\[
\text{(i)}\ \Gamma_n\to\infty,\ \Lambda_{\max}\to\infty,\ \lambda_n\to 0,\ h_n\to 0,\ \ \Gamma_n\Lambda_{\max}\frac{s^{3/2}}{h_n^2}\sqrt{\frac{2\log 2p}{n}}=o(\lambda_n),\ \ \delta_n=o(\Gamma_n^{-1}); \tag{3.16}
\]
\[
\text{(ii)}\ \Gamma_n h_n=o(\lambda_n); \qquad \text{(iii)}\ \frac{\lambda_n s\Gamma_n}{\phi^2}\to 0 .
\]
For bounded designs, where $\Gamma_n=O(1)$, we shall instead need the following rate conditions:
\[
\text{(i)}\ \Lambda_{\max}\to\infty,\ \lambda_n\to 0,\ h_n\to 0,\ \ \Lambda_{\max}\frac{s^{3/2}}{h_n^2}\sqrt{\frac{2\log 2p}{n}}=o(\lambda_n); \quad \text{(ii)}\ h_n=o(\lambda_n); \quad \text{(iii)}\ \frac{\lambda_n s}{\phi^2}\to 0. \tag{3.17}
\]
In the above conditions, $\phi$ is the constant defined in (3.15). As is the case with kernel density estimators, the rate of decrease of the smoothing parameter $h$ has to be appropriately balanced. It has to decrease slowly enough so as to satisfy (3.16)(i), and fast enough to satisfy (3.16)(ii), in the case of unbounded design. Similarly, in the case of bounded design, these rate constraints have to balance between (3.17)(i) and (3.17)(ii). Note that the rate of decrease of $\lambda_n$ is significantly slower than in the case of no measurement error in the covariates. This is mainly due to the presence of the additional noise in the covariates and the smoothing parameter $h$. We now state the main result providing error bounds for the proposed estimator.

Theorem 3.3.1 For the measurement error model (3.1) and (3.2), let $\hat\beta$ be as in (3.5) and $c_{\min},c_{\max}$ be as in (3.6), with $c_m:=c_{\min}+c_{\max}$. Assume (A1), (A2), (A3) and (A4), along with the compatibility condition (3.15), hold. Also assume that either the rate conditions (3.16) or (3.17) hold. Then the following inequality holds with probability at least $1-o(1)$:
\[
3c_a\, n^{-1}\big\|D^{1/2}X(\hat\beta-\beta_0)\big\|_2^2 + 2\lambda_n c_{\min}\|\hat\beta-\beta_0\|_1 \le \frac{4\lambda_n^2 c_m^2 s\Gamma_n}{\phi^2} + O\Big(\Lambda_{\max}\frac{s^{3/2}}{h^2}\sqrt{\frac{\log 2p}{n}}\Big) + O(h).
\]
The above bound clearly implies that, under the conditions of Theorem 3.3.1, $\|\hat\beta-\beta_0\|_1\to_p 0$ and $n^{-1}\|D^{1/2}X(\hat\beta-\beta_0)\|_2^2\to_p 0$. In other words, the sequence of $W\ell_1$-$CQ$ estimators is consistent for $\beta_0$ in the $\ell_1$-norm and in the weighted $L_2$-norm $n^{-1}\|D^{1/2}X(\hat\beta-\beta_0)\|_2^2$. Secondly, the weights $d_j$, $1\le j\le p$, do not play a critical role for the consistency of the estimator; i.e., as long as the condition (3.6) is satisfied, the above error bounds will provide the required consistency. Hence, if no prior information is available, one may choose $d_j\equiv 1$, in which case the estimator $\hat\beta$ becomes the $\ell_1$-$CQ$ estimator, which is $\ell_1$-consistent. This fact shall also be useful for consistent model selection, which requires carefully chosen weights corresponding to the non-zero and zero indices of the parameter. This shall be further elaborated on after we provide a result on the sparsity properties of the proposed estimator.

The conclusion of Theorem 3.3.1, bounding the $\ell_1$ and weighted $\ell_2$ errors in estimation, resembles in form that of Theorem 6.2 of Bühlmann and Van de Geer (2011), obtained for the $\ell_1$-penalized mean regression estimator when there is no measurement error in the covariates and when the errors in the regression model are assumed to be sub-Gaussian. In comparison, the above result is established here in the presence of heavy tailed measurement error in the covariates and when the regression model errors are independent, heteroscedastic and not necessarily sub-Gaussian.

3.3.1 A Sparsity Property

Next, we investigate a model selection property of $W\ell_1$-$CQ$. It is well known that model selection properties are linked to the first order optimality conditions, also known as the KKT conditions. These conditions are necessary and sufficient when the objective function is convex. However, as noted earlier, the loss function of our estimator is non-convex. In this case, the KKT conditions are necessary but not sufficient. We exploit the necessity of the KKT conditions to show that the estimator $W\ell_1$-$CQ$ identifies all zero components successfully with asymptotic probability 1, provided the weights $d_j$ are chosen appropriately. More precisely, let
\[
a_n = \frac{2\lambda_n c_m^2 s}{c_{\min}\phi^2} + \frac{1}{\lambda_n}O\Big(\Lambda_{\max}\frac{s^{3/2}}{h^2}\sqrt{\frac{2\log 2p}{n}}\Big) + \frac{1}{\lambda_n}O(h) \to 0. \tag{3.18}
\]
Then, in addition to the conditions of Theorem 3.3.1, we assume there exists a $0<\alpha<1/2$ such that
\[
\text{(i)}\ a_n\le n^{-\alpha}, \qquad \text{(ii)}\ \log p = o(n^{\alpha}). \tag{3.19}
\]
Furthermore, along with the conditions (3.6) on the weight vector $d$, we assume that the $d_j$ for $j\in S^c$ diverge at a fast enough rate; i.e., $d_{S^c}^{\min}=\min\{d_j;\ j\in S^c\}$ satisfies the following rate conditions.
\[
\text{(i)}\ \max\big\{\Gamma_n a_n,\ \Gamma_n^3 a_n^2\big\} = o(\lambda_n d_{S^c}^{\min}); \qquad \text{(ii)}\ \Gamma_n h = o(\lambda_n d_{S^c}^{\min}); \tag{3.20}
\]
\[
\text{(iii)}\ \max\Big\{\Lambda_{\max}\frac{s\Gamma_n}{h^2}\sqrt{\frac{2\log p}{n}},\ \ \Gamma_n\Lambda_{\max}\frac{a_n s}{h^3}\sqrt{\frac{2\log p}{n}}\Big\} = o(\lambda_n d_{S^c}^{\min}).
\]

Theorem 3.3.2 For the measurement error model (3.1) and (3.2), assume the conditions of Theorem 3.3.1 hold. In addition assume that (3.6), (3.19) and (3.20) hold. Then
\[
P\big(\hat\beta_j=0,\ \forall\, j\in S^c\big) \to 1. \tag{3.21}
\]
This theorem provides the model selection consistency of the proposed $W\ell_1$-$CQ$ estimator $\hat\beta$ under a suitable choice of the weight vector $d=(d_1,\dots,d_p)^T$. Note that setting the weights $d_j\equiv 1$, i.e., the un-weighted $\ell_1$-penalty, does not satisfy the rate assumptions (3.20); hence $\ell_1$-$CQ$ cannot be guaranteed to be model selection consistent, as opposed to $W\ell_1$-$CQ$.

As the reader may observe, the conditions required for Theorem 3.3.2 are only rate conditions on the model parameters, in addition to mild distributional assumptions. These conditions are weaker than those required for model identification in the work of Belloni and Chernozhukov (2011). The reason for this is that our result states that the zero components are correctly identified, as opposed to Belloni and Chernozhukov (2011), who state a stronger result regarding the identification of the non-zero components. Thus, we are able to state a weaker result under weaker conditions. We are unable to provide any result regarding the identification of the non-zero components, due to the non-convexity of the loss function.

We note that the above results will continue to hold for all other measurement error distributions for which the identity (3.11) and the probability bound (3.47) given below hold.

3.3.2 Adaptive choice of the weight vector $d$

For model selection we have seen that the choice of the weight vector $d$ plays a critical role in guaranteeing that the zero components are identified correctly. Zou (2006) proposed the idea of adaptively choosing these weights by setting $d_j=|\hat\beta^{ini}_j|^{-\gamma}$, $1\le j\le p$, $\gamma>0$, where $\hat\beta^{ini}_j$ is any initial estimate of $\beta_{0j}$ satisfying $\max_{1\le j\le p}|\hat\beta^{ini}_j-\beta_{0j}|=O_p(r_n)$, with $r_n\to 0$. We use the same approach to select the weight vector $d$ in our setup.

First, we use the $\ell_1$-$CQ$ estimator, i.e., the proposed estimator with the ordinary $\ell_1$-penalty ($d_j=1$, $\forall\,1\le j\le p$); this gives the initial estimate $\hat\beta^{ini}$. Theorem 3.3.1 provides the consistency of this estimate. In particular, under the conditions of Theorem 3.3.1 we obtain, with high probability, $\|\hat\beta^{ini}-\beta_0\|_1\le a_n\to 0$, where $a_n$ is defined in (3.18). Here we place an additional assumption on the true parameter vector, namely, we assume that all non-zero components of $\beta_0$ are bounded above and below by constants, i.e., $b_1\le|\beta_{0j}|\le b_2$ for $j\in S$. Thus, with high probability,
\[
|\hat\beta^{ini}_j|\le b_2+a_n,\ \forall\, j\in S; \qquad |\hat\beta^{ini}_j|\le a_n,\ \forall\, j\in S^c. \tag{3.22}
\]
Now we set $d_j=(|\hat\beta^{ini}_j|+c_w)^{-\gamma}$, where $c_w=\min_{1\le j\le p}\{|\hat\beta^{ini}_j|:\ \hat\beta^{ini}_j\ne 0\}$ is added to the initial estimates to avoid the problem of dividing by zero. Keeping (3.22) in view, it is easy to verify that when $n$ is large enough, the above weight vector $d$ satisfies the required assumptions (3.6) and (3.20) for some constant $\gamma$ chosen appropriately, with probability approaching 1.
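In R, this weight construction is a one-liner; the sketch below (names ours) takes the initial $\ell_1$-$CQ$ estimate and an exponent $\gamma$:

adaptive.weights <- function(b.ini, gamma) {
  cw <- min(abs(b.ini[b.ini != 0]))   # smallest nonzero |b.ini_j|, guards division by zero
  (abs(b.ini) + cw)^(-gamma)          # d_j = (|b.ini_j| + c_w)^(-gamma)
}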
3.4 Simulation Study

3.4.1 Simulation setup

In this section we numerically analyse the performance of the proposed estimators $\ell_1$-$CQ$ and $W\ell_1$-$CQ$. All computations were done in R, on an ordinary desktop machine with a 2.3 GHz processor. We compare our proposed estimators with least squares based high dimensional procedures, including the Lasso and the bias corrected Lasso (Loh and Wainwright (2012)), the latter of which is specifically designed to handle sub-Gaussian measurement error in covariates.

While conducting our simulation study, we compute Lasso estimates using the package glmnet, developed by Friedman et al. (2010). To compute $\ell_1$-$CQ$ estimates and the bias corrected Lasso, we use the projected gradient descent algorithm (Agarwal et al. (2012)), which is a tool developed for optimizing penalized smooth loss functions in high dimensions. More precisely, with $\nabla L(\beta)$ denoting the gradient of a loss function $L$, the method of projected gradient descent iterates via the recursions $\{\beta^r;\ r=0,1,2,\dots\}$:
\[
\beta^{r+1} = \arg\min_{\beta\in\Omega}\Big\{L(\beta^r) + \nabla L(\beta^r)^T(\beta-\beta^r) + \frac{\eta}{2}\|\beta-\beta^r\|_2^2 + \lambda_n\|\beta\|_1\Big\}, \tag{3.23}
\]
where $\eta>0$ is a step size parameter. These recursions can be computed rapidly in $O(p)$ time using the procedure suggested by Agarwal et al. (2012), with the restriction of the parameter space to the $\ell_1$-ball $\Omega$ implemented by the procedure of Duchi et al. (2008). This procedure essentially involves two $\ell_2$ projections onto the $\ell_1$ ball $\Omega$.

The weighted version $W\ell_1$-$CQ$ can be computed by the procedure described above with the following algorithm, similar in spirit to that described by Zou (2006). The proof of this algorithm is straightforward and hence is omitted.

Algorithm to compute $W\ell_1$-$CQ$ by the method of projected gradient descent:
1. Define $w^*_i=(w_{i1}/d_1,\dots,w_{ip}/d_p)^T$, $\forall\,1\le i\le n$. Also define $\Sigma^*=(\sigma_{ij}/d_i d_j)$, $\forall\,1\le i,j\le p$, where the $\sigma_{ij}$ denote the components of $\Sigma$.
2. Optimize, using the methods of projected gradient descent and Duchi et al.,
\[
\hat\beta^* = \arg\min_{\beta\in\Omega}\Big\{n^{-1}\sum_{i=1}^n\rho^*_L(y_i,w^*_i,\beta,h)+\lambda_n\|\beta\|_1\Big\}.
\]
3. Evaluate $\hat\beta_j=\hat\beta^*_j/d_j$, $\forall\,1\le j\le p$.
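For concreteness, one iteration of (3.23) can be sketched in R as below. The composite step amounts to soft-thresholding followed by the sorting-based $\ell_1$-ball projection of Duchi et al. (2008) (their paper also gives an expected $O(p)$ variant; the $O(p\log p)$ sorting version is shown here). The function names, the step size argument eta and the radius R are ours.

proj.l1 <- function(b, R) {            # Euclidean projection onto {b: ||b||_1 <= R}
  if (sum(abs(b)) <= R) return(b)
  u   <- sort(abs(b), decreasing = TRUE)
  sv  <- cumsum(u)
  rho <- max(which(u > (sv - R) / seq_along(u)))
  th  <- (sv[rho] - R) / rho
  sign(b) * pmax(abs(b) - th, 0)
}
pgd.step <- function(b, grad, eta, lambda, R) {
  z <- b - grad(b) / eta                         # gradient step on the smooth loss
  z <- sign(z) * pmax(abs(z) - lambda / eta, 0)  # soft-threshold for the l1 penalty
  proj.l1(z, R)                                  # restrict to the l1-ball Omega
}

Iterating pgd.step from a randomly chosen Gaussian starting point, as described in Section 3.4.2 below, yields a stationary point of the restricted objective for a given $(\lambda_n,h)$.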
Tuning parameters: The choice of the tuning parameters $\lambda_n$ and $h$ is still not a completely understood aspect of high dimensional data analysis. Typically, in regularized estimation methods, either cross validation or AIC-BIC type selectors are used to select the tuning parameters. Zhang, Li and Tsai (2010) provide theoretical justification for using AIC-BIC type criteria for several models. The cross validation method is often observed to result in overfitting (Wang, Li and Tsai, 2007); furthermore, it is considerably more time consuming. More recently, Lee, Noh and Park (2014) have suggested a high dimensional BIC type criterion for quantile regression methods. Motivated by their results, one way to proceed is to select $\lambda_n,h$ as minimizers of the function
\[
\mathrm{HBIC}(\lambda_n,h) = \log\Big(\sum_{i=1}^n\rho^*_L(y_i-w_i^T\hat\beta)\Big) + |S_\lambda|\,\frac{\log n}{2n}\,C_n,
\]
where $|S_\lambda|$ is the number of nonzero coefficients in the estimated parameter vector $\hat\beta$ and $C_n$ is a diverging sequence of positive numbers. However, since $\rho^*_L$ can take negative values, we shall use
\[
\widetilde{\mathrm{HBIC}}(\lambda_n,h) = \exp\Big\{\sum_{i=1}^n\rho^*_L(y_i-w_i^T\hat\beta)\Big\}\, e^{|S_\lambda|\frac{\log n}{2n}C_n}
\]
to obtain $\lambda_n$ and $h$. The exponential transformation removes the problem of negativity of $\rho^*_L$ and also maintains monotonicity. Furthermore, we choose $C_n=O(\log(\log p))$, which is empirically found to work well in this simulation.

In $\mathrm{HBIC}(\lambda_n,h)$ we used the corrected quantile loss function instead of the check function as used by Lee et al. (2014). Although this makes intuitive sense, as the corrected quantile loss function is approximated by the check function, a rigorous theoretical argument justifying its use is missing. This criterion is empirically found to perform well in our setup.

3.4.2 Computational Issues

A computational challenge of the proposed estimator is the non-convexity of the loss function $l^*_n$. The objective function $l^*_n$ becomes increasingly volatile around the true parameter as it approaches the check function at values of $h$ very close to zero. This behaviour is illustrated in Figure 3.1, which plots the loss function against $\beta_1$, keeping all other parameters fixed at the true values. This plot is generated for the first of the 100 simulated models. The loss function exhibits several local optima at smaller values of $h$. On the other hand, at relatively higher values of $h$, the loss function appears to be convex shaped around the true parameter and appears to have a unique minimum at the true parameter value.

Figure 3.1: $\frac1n\sum_{i=1}^n\rho^*_L(\beta)+\lambda_n\|\beta\|_1$ evaluated around $\beta_1$ at $h=0.01$ (left) and $h=1.5$ (right).

Two computational consequences of this behavior are that, first, at $h$ close to zero, any optimization procedure becomes excessively time consuming. To avoid this unpleasant feature, we avoid values of $h$ close to zero. It was numerically observed that by doing so we are able to maintain the accuracy of the estimator along with a reasonable computation time. Second, at values of $h$ outside a neighborhood of zero, the optimizations are robust against the starting points used in the optimizations. In particular, in all 100 simulation repetitions the starting point for optimization was chosen randomly from a Gaussian distribution. This behavior of the objective function is also represented visually in the contour plot in Figure 3.2. Here the $\ell_1$ estimation error $\|\hat\beta-\beta\|_1$ is plotted as colored contours, with the error increasing from red to blue regions. Values of $h$ are represented on the x-axis and values of $\lambda_n$ on the y-axis. From this plot it is apparent that the lowest error is attained in regions concentrated around the relatively smaller values of $h$ and $\lambda_n$, except when $h$ is in a small neighborhood of zero.

Figure 3.2: Colored contours of $\|\hat\beta-\beta\|_1$ on $h$ vs. $\lambda_n$ for $\ell_1$-$CQ$.

With regard to computational time, one optimization at $h=0.01$ takes approximately 16 seconds, as opposed to approximately 2 seconds at $h=0.5$, at $(p=40,\ n=120)$. This can also be viewed in comparison to the corrected Lasso, which takes approximately 0.5 seconds to complete one optimization.

3.4.3 Simulation setup and results

For this simulation study, data is generated from the measurement error model (3.1) and (3.2) under several choices of the underlying parameters and distributional assumptions. The unobserved design variables $\{x_{ij};\ 1\le i\le n,\ 1\le j\le p\}$ are chosen as i.i.d. r.v.'s from a $N(0,1)$ distribution. The measurement errors $\{u_i;\ 1\le i\le n\}$ are i.i.d. $L_p(0,\Sigma)$ with $\Sigma=\sigma^2 I$, where $I$ is the $p\times p$ identity matrix. The model errors $\varepsilon_i$, $1\le i\le n$, are independent realizations of Normal, Cauchy or mean centered Pareto r.v.'s.

We begin by numerically verifying the result of Theorem 3.3.1. Observe that this theorem can be viewed as describing the scaling behaviour of the error $\|\hat\beta-\beta\|_1$. In order to visualize this, we perform simulations by varying the dimension of the parameter vector $p$ and the sample size $n$ while holding all other parameters fixed; in particular, the number of non-zero components is $s=5$, the covariance matrix of the Laplace distribution for the covariate errors is taken to be $0.2\,I_{p\times p}$, and the model errors are Gaussian with variance 0.2. All of the following describe the error of the $\ell_1$-$CQ$ estimate. The behavior of the error of estimation for $W\ell_1$-$CQ$ is observed to be similar, and thus the corresponding plots are omitted for the sake of brevity.

Figure 3.3 is a contour plot generated at $h=0.4$ (left) and $\lambda=0.07$ (right). This plot describes the $\ell_1$ error $\|\hat\beta-\beta\|_1$ as a spectrum of colors, with red being the least and violet being the maximum. The y-axis plots different values of the tuning parameter $\lambda$ and the x-axis marks the varying dimension $p$ of the model. Note that for a given model dimension, the corresponding sample size is rescaled to maintain the ratio $(n/\log 2p)$ constant. This rescaling is done in accordance with the result of Theorem 3.3.1, and, as predicted by the theorem, holding all other parameters fixed, for each value of $\lambda$ (left) and $h$ (right) the error level stays roughly constant (the colors align) across the chosen values of $p$.

Figure 3.3: Colored contours of $\|\hat\beta-\beta\|_1$ on $p$ vs. $\lambda$ (left) and $h$ (right).
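To make the setup concrete, one simulated data set can be generated as in the following R sketch (names ours). The multivariate Laplace draws use the representation $u=\sqrt E\,Z$ with $E\sim\mathrm{Exp}(1)$ and $Z\sim N(0,\Sigma)$, whose characteristic function is $(1+t^T\Sigma t/2)^{-1}$, matching Definition 3.1.1 with $\mu=0$.

library(MASS)                                    # for mvrnorm
gen.data <- function(n, p, s, sigma2.u = 0.2, tau = 0.5) {
  X  <- matrix(rnorm(n * p), n, p)               # unobserved design, i.i.d. N(0,1)
  b0 <- c(rnorm(s), rep(0, p - s))
  b0 <- b0 / sqrt(sum(b0^2))                     # nonzero block normalized in l2 norm
  Z  <- mvrnorm(n, rep(0, p), sigma2.u * diag(p))
  U  <- sqrt(rexp(n)) * Z                        # rows scaled by sqrt(Exp(1)): L_p(0, Sigma)
  e  <- rnorm(n, sd = sqrt(0.2))                 # Gaussian model errors; P(e <= 0) = 0.5
  list(y = drop(X %*% b0) + e, W = X + U, b0 = b0)
}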
We now proceed to a more detailed numerical comparison of the proposed estimates with the Lasso and corrected Lasso estimates. For any given method, we summarize the results obtained over 100 repetitions. For every repetition, each nonzero component of the parameter vector $\beta$ is generated from a $N(0,1)$ distribution, normalized by the $\ell_2$ norm of the generated vector. The dimensions of this vector are chosen to be $p=40,300,500$. The dimensions of the nonzero components are set to $s=5,8,10$. As mentioned earlier, the model errors $\varepsilon_i$, $1\le i\le n$, are generated from Gaussian, Cauchy or mean centered Pareto r.v.'s. Note that the Pareto distribution can be heavily skewed. For performance comparison we report the following criteria.

MEE (Median estimation error): median over 100 repetitions of the estimation error $\|\hat\beta-\beta_0\|_2$.

MIZ (Median incorrect number of zeros): median over 100 repetitions of the number of incorrectly identified zero components of the parameter vector.

MINZ (Median incorrect number of nonzeros): median over 100 repetitions of the number of incorrectly identified nonzero components of the parameter vector.

In all tables, the standard errors of the corresponding criteria are reported in parentheses.

Table 3.1: Simulation results at p = 40 for Normal, Cauchy and Pareto model errors

s = 8, u_i ~ L_p(sigma^2 = 0.2), eps_i ~ N(0, 0.2), tau = 0.5
          l1-CQ                     C-Lasso                   Lasso
   n    MEE    MINZ   MIZ       MEE    MINZ   MIZ       MEE    MINZ   MIZ
  20    0.62    4      5        0.63    4      6        0.78    4      9
       (0.21) (1.72) (2.33)    (0.22) (1.76) (2.39)    (0.27) (1.98) (3.97)
  60    0.27    1      9        0.28    1      9        0.39    1      12
       (0.061)(1.07) (2.91)    (0.065)(1.04) (2.90)    (0.071)(1.12) (4.77)
 120    0.18    1      12       0.20    1      11       0.34    1      14
       (0.038)(0.86) (2.86)    (0.038)(0.88) (2.93)    (0.04) (0.91) (4.83)

s = 8, u_i ~ L_p(sigma^2 = 0.2), eps_i ~ Cauchy(scale = 0.1), tau = 0.5
  50    0.95    5      7        1.61    7      9        5.21    2      20.5
       (0.21) (1.21) (2.41)    (0.42) (1.10) (2.25)    (15.89)(1.59) (5.45)
 150    0.60    4      9        1.54    7      10       5.02    2      25
       (0.14) (1.17) (3.83)    (0.47) (1.15) (2.44)    (19.12)(1.69) (6.28)
 300    0.35    2      11       1.56    7      13       4.77    2      24
       (0.11) (1.01) (3.75)    (0.49) (1.21) (2.45)    (18.80)(1.53) (5.37)

s = 5, u_i ~ L_p(sigma^2 = 0.2), eps_i ~ mean centered Pareto, tau = 0.75
 100    0.66    2      8        1.01    3      7        1.15    3      13
       (0.15) (1.05) (2.93)    (0.31) (1.32) (2.99)    (3.05) (1.06) (7.20)
 200    0.50    1      8        0.84    2      7.5      0.97    2      15
       (0.10) (0.94) (2.56)    (0.32) (1.31) (2.64)    (2.49) (1.78) (6.70)
 300    0.38    1      9        0.71    2      8        0.90    1      13
       (0.07) (0.84) (2.89)    (0.27) (1.16) (2.86)    (2.41) (0.94) (5.64)

Tables 3.1 and 3.2 provide results of the simulation study comparing the $\ell_1$-$CQ$, the corrected Lasso (C-Lasso) and the Lasso estimators. It is clear from these results that under heavy tailed or skewed model errors (Cauchy and mean centered Pareto), the $\ell_1$-$CQ$ estimator significantly outperforms the other two procedures in all three comparison criteria. Furthermore, the standard errors of $\ell_1$-$CQ$ are significantly smaller than those of the other two. Under Gaussian model errors, $\ell_1$-$CQ$ is comparable (slightly better) in performance to the C-Lasso, while both of these procedures outperform the Lasso. Consistency in terms of the estimation error and identifying the correct support of $\beta_0$ is clearly visible as $n\to\infty$. As expected, the $\ell_1$-$CQ$ estimator does not provide consistency in identifying the zero components correctly. However, it is still much better in comparison to the Lasso. This behavior of the Lasso under measurement error has also been observed by Sorensen et al. (2012); i.e., measurement error tends to induce overfitting by naive estimators such as the Lasso.

Another instance where the proposed estimators outperform the corrected Lasso and the Lasso is the heteroscedastic setup. To illustrate this, we generated independent model errors $\varepsilon_i$ from $N(0,\sigma_i^2)$, where $\sigma_i^2$ is chosen uniformly from the interval (0.1, 9) for each $i=1,\dots,n$. The dimension $p$ is increased to 500 for this case, and the results are provided in Table 3.3.
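For a single repetition, the three criteria reduce to the following R sketch (names ours); the reported MEE, MINZ and MIZ are the medians of these quantities over the 100 repetitions.

criteria <- function(b.hat, b0) {
  c(EE  = sqrt(sum((b.hat - b0)^2)),   # estimation error ||b.hat - b0||_2
    INZ = sum(b.hat == 0 & b0 != 0),   # true nonzeros incorrectly estimated as zero
    IZ  = sum(b.hat != 0 & b0 == 0))   # true zeros incorrectly estimated as nonzero
}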
We next investigate the $W\ell_1$-$CQ$ estimator for consistent identification of the zero components, in addition to consistent estimation. Tables 3.4 and 3.5 provide simulation results for $W\ell_1$-$CQ$ for $p=40$ and $p=300$. The weights $d_j$ are chosen as described in Section 3.3.2, where the exponent $\gamma$ is chosen by using the selection criterion $\widetilde{\mathrm{HBIC}}$. Comparing $W\ell_1$-$CQ$, $\ell_1$-$CQ$, C-Lasso and Lasso, the first and most immediate conclusion is the superiority of the proposed estimators under heavy tailed or skewed model errors. The $W\ell_1$-$CQ$ estimator consistently and significantly outperforms all other procedures in terms of model identification, under all chosen distributional and parameter settings.

Finally, to see how robust the $\ell_1$-$CQ$ estimator is to misspecification of the measurement error distribution, we compared its performance with C-Lasso when this error distribution is Gaussian. The results reported in Table 3.6 show a comparable performance, with C-Lasso performing only marginally better.

Table 3.2: Simulation results at p = 300 for Normal, Cauchy model errors

s = 10, u_i ~ L_p(sigma^2 = 0.1), eps_i ~ N(0, 0.2), tau = 0.5
          l1-CQ                     C-Lasso                   Lasso
   n    MEE    MINZ   MIZ       MEE    MINZ   MIZ       MEE    MINZ   MIZ
 100    0.26    2      21       0.27    2      21.5     0.35    2      37
       (0.04) (1.20) (5.20)    (0.05) (1.19) (4.70)    (0.07) (1.09) (7.43)
 200    0.17    1      30       0.17    1      29.5     0.23    1      55
       (0.028)(0.89) (6.86)    (0.029)(0.90) (6.77)    (0.031)(0.91) (11.70)
 300    0.11    1      33       0.14    1      32       0.18    1      79
       (0.023)(0.79) (5.74)    (0.023)(0.76) (5.68)    (0.025)(0.82) (16.16)

s = 10, u_i ~ L_p(sigma^2 = 0.1), eps_i ~ Cauchy(scale = 0.1), tau = 0.5
 100    0.44    3      23       0.92    7      20       4.39    7      27
       (0.10) (1.57) (10.36)   (0.41) (2.35) (7.21)    (10.70)(1.96) (30.16)
 200    0.30    2      21.5     0.87    6      22.5     3.27    6      30
       (0.07) (1.34) (12.63)   (0.44) (2.42) (8.18)    (8.27) (1.84) (51.34)
 300    0.21    2      13       0.85    6      22       3.47    6      28
       (0.05) (1.23) (12.86)   (0.50) (2.59) (8.99)    (9.10) (1.91) (47.21)

Table 3.3: Simulation results at p = 500 under heteroscedasticity

s = 10, u_i ~ L_p(sigma^2 = 0.1), eps_i ~ N(0, sigma_i^2), sigma_i^2 ~ Uniform(0.1, 9), tau = 0.5
          l1-CQ                     C-Lasso                   Lasso
   n    MEE    MINZ   MIZ       MEE    MINZ   MIZ       MEE    MINZ   MIZ
 100    0.86    7      25.5     0.99    7      25       1.73    7      73
       (0.14) (1.13) (4.33)    (0.15) (0.99) (5.05)    (0.27) (1.87) (16.15)
 200    0.66    5      30       0.78    5      30       1.67    5      87
       (0.12) (1.32) (5.98)    (0.12) (1.20) (5.56)    (0.25) (1.19) (21.25)
 300    0.54    4      34       0.69    5      33       1.51    3      83
       (0.09) (1.37) (6.56)    (0.10) (1.17) (5.71)    (0.18) (1.26) (19.56)

Table 3.4: W l1-CQ at p = 40 for Normal and Cauchy model errors

eps_i ~ N(0, 0.2)                       eps_i ~ Cauchy(scale = 0.1)
   n    MEE    MINZ   MIZ          n    MEE    MINZ   MIZ
  20    0.70    1      5           50   0.98    1.5    5
       (0.25) (0.66) (1.38)            (0.23) (0.63) (2.20)
  60    0.28    0      5          150   0.64    1      5
       (0.06) (0.64) (1.55)            (0.14) (0.61) (2.27)
 120    0.21    0      4          300   0.41    1      4
       (0.038)(0.61) (1.27)            (0.11) (0.70) (2.17)

Table 3.5: W l1-CQ at p = 300 for Normal and Cauchy model errors

eps_i ~ N(0, 0.2)                       eps_i ~ Cauchy(scale = 0.1)
   n    MEE    MINZ   MIZ          n    MEE    MINZ   MIZ
 100    0.28    1      7          100   0.40    1      8
       (0.04) (0.69) (1.75)            (0.11) (0.72) (2.91)
 200    0.23    0      6          200   0.29    1      6
       (0.02) (0.62) (0.98)            (0.09) (0.68) (1.85)
 300    0.18    0      5          300   0.24    0      5
       (0.02) (0.41) (0.97)            (0.06) (0.46) (1.56)

Table 3.6: l1-CQ at p = 300 with misspecified covariate error distribution

s = 10, u_ij ~ N(0, 0.3), eps_i ~ N(0, 0.3)
          l1-CQ                      C-Lasso
   n    MEE    MINZ   MIZ        MEE    MINZ   MIZ
 100    0.31    1      18        0.28    1      16
       (0.06) (0.89) (3.64)     (0.07) (0.83) (3.12)
 200    0.22    1      21        0.20    1      18
       (0.04) (0.71) (4.10)     (0.03) (0.71) (4.44)
 300    0.20    0      24        0.17    0      19
       (0.02) (0.63) (4.14)     (0.02) (0.61) (4.88)

3.5 Proofs for Chapter 3

We begin by proving Theorem 3.1.1 and Theorem 3.1.2, which provide consistency in estimation and variable selection for the $W\ell_1$-$CQ$ estimator in the case of fixed $p$. For this purpose we shall require the following uniform convergence result, stated in WSZ, p. 16, as part of the proof of Theorem 4 of their paper. For any compact set $\mathcal B$ in $\mathbb{R}^p$ we have
\[
\sup_{\beta\in\mathcal B}\Big|n^{-1}\sum_{i=1}^n\frac{\partial^2\rho^*_L(\beta)}{\partial\beta_j\partial\beta_l} - E\frac{\partial^2\rho^*_L(\beta)}{\partial\beta_j\partial\beta_l}\Big| = o_p(1). \tag{3.24}
\]
The proof of this statement follows from Nolan and Pollard (1987, Lemma 22) and Pollard (1984, Theorem 2.37).

Proof of Theorem 3.1.1. It suffices to show that for any $\epsilon>0$ there exists a sufficiently large constant $C$ such that
\[
P\Big(\inf_{\|u\|=C} l^*_n(\beta_0+\alpha_n u) > l^*_n(\beta_0)\Big) \ge 1-\epsilon, \tag{3.25}
\]
where $\alpha_n:=O(n^{-1/2})$. This implies that there exists a local minimum in the ball $\{\beta_0+\alpha_n u:\ \|u\|\le C\}$, i.e., there exists a local minimizer such that $\|\hat\beta-\beta_0\|=O_P(\alpha_n)$. Recall $\psi_{n1}$ and $\psi_{n2}$ from assumption (F3). Consider
\[
D_n(u) = n^{-1}\sum_{i=1}^n\big\{\rho^*_L(\beta_0+\alpha_n u)-\rho^*_L(\beta_0)\big\} + \lambda_n\sum_{j=1}^p d_j\big(|\beta_{0j}+\alpha_n u_j|-|\beta_{0j}|\big) = T_{n1}+T_{n2},\ \text{say}, \tag{3.26}
\]
where $T_{n1}=n^{-1}\sum_{i=1}^n\{\rho^*_L(\beta_0+\alpha_n u)-\rho^*_L(\beta_0)\}$ and $T_{n2}=\lambda_n\sum_{j=1}^p d_j(|\beta_{0j}+\alpha_n u_j|-|\beta_{0j}|)$. Now, by Taylor's expansion we obtain
\[
T_{n1} = \alpha_n n^{-1/2}\psi_{n1}^T(w,\beta_0)u + \frac{\alpha_n^2}{2}u^T\psi_{n2}(w,\beta^*)u = \alpha_n n^{-1/2}\psi_{n1}^T(w,\beta_0)u + \frac{\alpha_n^2}{2}u^T A u\{1+o_P(1)\}, \tag{3.27}
\]
where $\beta^*$ lies between $\beta_0+\alpha_n u$ and $\beta_0$, and the second equality follows from (3.24), which gives $\sup_{\beta\in\mathcal B}|\psi_{n2}(w,\beta)-E\psi_{n2}(w,\beta)|=o_P(1)$, together with assumption (F3). Now, by the central limit theorem, $\psi_{n1}(w,\beta_0)=O_P(1)$; hence the first term in (3.27) is of the order $O_P(n^{-1/2}\alpha_n)$. Consider the second term on the right hand side: since $A$ is positive definite, this term is positive, and by choosing a sufficiently large $C$ it dominates the first term uniformly in $\|u\|=C$. Similarly,
\[
T_{n2} = \lambda_n\sum_{j=1}^p d_j\big(|\beta_{0j}+\alpha_n u_j|-|\beta_{0j}|\big) \ge -\lambda_n\alpha_n\sum_{j\in S}d_j|u_j| \ge -\lambda_n\alpha_n\|u\|\Big(\sum_{j\in S}d_j^2\Big)^{1/2} = O(\lambda_n\alpha_n).
\]
Thus, by choosing $C$ large enough, the second term on the RHS of (3.26) dominates the other two, thereby proving (3.25).

Proof of Theorem 3.1.2. To show the first part of this theorem, observe that since the loss function $\rho^*_L$ is a non-convex smooth function, the KKT condition for optimality is necessary but not sufficient. We show that if (a) is not true, then with probability tending to 1 the necessity of the KKT conditions is violated; i.e., for any $j\in S^c$, suppose if possible that $\hat\beta_j\ne 0$. Then by the necessity of the KKT conditions we have
\[
n^{-1}\sum_{i=1}^n\frac{\partial\rho^*_L(\hat\beta)}{\partial\beta_j} = -\lambda_n d_j\,\mathrm{sign}(\hat\beta_j). \tag{3.28}
\]
Now, by Taylor's expansion we obtain
\[
n^{-1}\sum_{i=1}^n\frac{\partial\rho^*_L(\hat\beta)}{\partial\beta_j} = n^{-1}\sum_{i=1}^n\frac{\partial\rho^*_L(\beta_0)}{\partial\beta_j} + n^{-1}\sum_{l=1}^p\sum_{i=1}^n\frac{\partial^2\rho^*_L(\beta^*)}{\partial\beta_j\partial\beta_l}\,(\hat\beta_l-\beta_{0l}).
\]
Following standard arguments, $n^{-1}\sum_{i=1}^n\partial\rho^*_L(\beta_0)/\partial\beta_j=O_P(n^{-1/2})$, and $n^{-1}\sum_{i=1}^n\partial^2\rho^*_L(\beta^*)/\partial\beta_j\partial\beta_l = n^{-1}\sum_{i=1}^n E\big(\partial^2\rho^*_L(\beta^*)/\partial\beta_j\partial\beta_l\big)+o_P(1)$ by (3.24). Also, by assumption, $\hat\beta-\beta_0=O_P(n^{-1/2})$. Thus the LHS of (3.28) is of the order $O_P(n^{-1/2})$; hence the condition $\sqrt n\,\lambda_n d_{S^c}^{\min}\to\infty$ contradicts the relation (3.28), thus proving part (a) of the theorem.

To prove (b), we begin again with the KKT optimality condition; i.e., in view of Theorem 3.1.1, with probability tending to 1, for any $j\in S$, $\hat\beta_j\ne 0$, and thus
\[
n^{-1}\sum_{i=1}^n\frac{\partial\rho^*_L(\hat\beta)}{\partial\beta_j} - \lambda_n d_j\,\mathrm{sign}(\hat\beta_j) = 0. \tag{3.29}
\]
Now,
\[
\frac{\partial\rho^*_L(\hat\beta)}{\partial\beta_j} = \frac{\partial\rho^*_L(\beta_0)}{\partial\beta_j} + \sum_{l=1}^p\frac{\partial^2\rho^*_L(\beta^*)}{\partial\beta_j\partial\beta_l}\,(\hat\beta_l-\beta_{0l}).
\]
Since $\frac{1}{\sqrt n}\sum_{i=1}^n\partial\rho^*_L(\beta_0)/\partial\beta_S\Rightarrow N(0,D_S)$ and $\sqrt n\,\lambda_n d_j\to 0$ for all $j\in S$ (by assumption (3.6), the $d_j$'s, $j\in S$, are bounded above), we obtain the required result.

We now proceed to the proofs of the results in the high dimensional setting. As stated at the beginning of Section 3.3, the technique used to prove Theorem 3.3.1 is to use the convexity of the quantile loss function $\rho(\cdot)$ and the weighted $\ell_1$-penalty, along with the approximation of $\rho^*_L(\cdot)$ by $\rho(\cdot)$. Some of the steps of the proof are similar to those adopted by Bühlmann and Van de Geer (2011, ch. 4).
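Several of the bounds below rest on the uniform boundedness of the first three derivatives of the standard normal kernel of (A4). For the record, with $K(x)=(2\pi)^{-1/2}e^{-x^2/2}$,
\[
K'(x)=-xK(x), \qquad K''(x)=(x^2-1)K(x), \qquad K'''(x)=(3x-x^3)K(x),
\]
so each derivative is a polynomial multiple of the Gaussian density, and hence bounded and Lipschitz on $\mathbb R$.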
Let $t=\delta_n/(\delta_n+\|\hat\beta-\beta_0\|_1)$, and set $\tilde\beta=t\hat\beta+(1-t)\beta_0$. Note that $\tilde\beta\in\mathcal B(\delta_n)$. Moreover, $\tilde\beta\in\mathcal B(c\delta_n)$ implies $\hat\beta\in\mathcal B\big(c\delta_n/(1-c)\big)$, for any $0<c<1$.

In what follows, let $l(s,y)$ denote the smoothed loss of (3.42), so that $l_i(\beta):=l(w_i'\beta,y_i)$ and $\rho^*_{Li}(\beta)=l_i(\beta)-\frac{\sigma_\beta^2}{2}l''_i(\beta)$, with primes denoting derivatives with respect to the first argument. For some constant $C>0$ and all $s_1,s_2,y$,
\[
\text{(i)}\ |l(s_1,y)-l(s_2,y)| \le C|s_1-s_2|, \qquad \text{(ii)}\ |l'(s_1,y)-l'(s_2,y)| \le Ch^{-1}|s_1-s_2|, \tag{3.43}
\]
\[
\text{(iii)}\ |l''(s_1,y)-l''(s_2,y)| \le Ch^{-2}|s_1-s_2|, \qquad \text{(iv)}\ |l'''(s_1,y)-l'''(s_2,y)| \le Ch^{-3}|s_1-s_2|.
\]
The above conditions are the reason for choosing the kernel function $K(\cdot)$ as the p.d.f. of a standard normal r.v.; we require the first three derivatives of $K(\cdot)$ to be bounded uniformly. In the following we denote $l_i(\beta)=l(w_i'\beta,y_i)$, and $l'_i(\beta)$, $l''_i(\beta)$ and $l'''_i(\beta)$ are defined similarly.

Proof of Theorem 3.2.1. First note that for $\beta\in\Omega$ we have $\|\beta-\beta_0\|_1\le 2b_0\sqrt s$, by the definition of $\Omega$ and the assumption $\|\beta_0\|_1\le b_0\sqrt s$. Note that $\rho^*_{Li}(\beta)=l_i(\beta)-\frac{\sigma_\beta^2}{2}l''_i(\beta)$. Now,
\[
M^*_n(\beta)-EM^*_n(\beta) = \frac1n\sum_{i=1}^n\Big[l_i(\beta)-l_i(\beta_0)-E\{l_i(\beta)-l_i(\beta_0)\}\Big] \tag{3.44}
\]
\[
\quad - \frac{\sigma_\beta^2}{2}\,\frac1n\sum_{i=1}^n\Big[l''_i(\beta)-l''_i(\beta_0)-E\{l''_i(\beta)-l''_i(\beta_0)\}\Big] + \frac{\sigma_0^2-\sigma_\beta^2}{2}\,\frac1n\sum_{i=1}^n\Big[l''_i(\beta_0)-El''_i(\beta_0)\Big]
\]
\[
\equiv I - \frac{\sigma_\beta^2}{2}\,II + \frac{\sigma_0^2-\sigma_\beta^2}{2}\,III, \quad \text{say}.
\]
We shall show that
\[
\text{(a)}\ \sup_{\beta\in\Omega}|I| = O_p\Big(\sqrt s\sqrt{\tfrac{2\log 2p}{n}}\Big); \quad \text{(b)}\ \sup_{\beta\in\Omega}|II| = O_p\Big(\tfrac{s^{1/2}}{h^2}\sqrt{\tfrac{2\log 2p}{n}}\Big); \quad \text{(c)}\ |III| = O_p\Big(\tfrac1h\sqrt{\tfrac{\log 2p}{n}}\Big).
\]
Observe that for $\beta\in\Omega$, $\sigma_\beta^2=\beta^T\Sigma\beta\le\Lambda_{\max}b_0^2 s$ and $|\sigma_0^2-\sigma_\beta^2|\le 2b_0^2\Lambda_{\max}s$. These facts, along with the bounds for $I$, $II$ and $III$, shall imply the desired result.

Define the empirical process $\mathbb G_n(\beta):=\frac1n\sum_{i=1}^n\{l_i(\beta)-El_i(\beta)\}$ and $Z_n:=\sup_{\beta\in\Omega}|\mathbb G_n(\beta)-\mathbb G_n(\beta_0)|$. With $\sigma_u$ as in assumption (A3), let $c_u=1.4\,\sigma_u^2$. On the event $A=\big\{\max_{1\le j\le p}\frac1n\sum_{i=1}^n u_{ij}^2\le c_u\big\}$,
\[
\frac1n\sum_{i=1}^n w_{ij}^2 \le \frac2n\sum_{i=1}^n(x_{ij}^2+u_{ij}^2) \le 2(c_x+c_u). \tag{3.45}
\]
This bound and the Lipschitz condition (3.43)(i) allow us to apply Lemma 14.20 and Theorem 14.2, as done in Example 14.2 of Bühlmann and Van de Geer (2011), page 503, to yield
\[
E\big[Z_n I_A\big] \le 32\,c_1 b_0(c_x+c_u)\sqrt s\sqrt{\frac{2\log 2p}{n}},
\]
\[
P\Big(Z_n I_A \ge 8c_1 b_0(c_x+c_u)\sqrt s\Big(4\sqrt{\frac{2\log 2p}{n}}+\sqrt{\frac{2t}{n}}\Big)\Big) \le \exp(-t), \quad \text{for any } t>0.
\]
Choose $t=\log 2p$ in the latter bound to obtain
\[
P\Big(Z_n I_A \ge O\Big(\sqrt s\sqrt{\frac{2\log 2p}{n}}\Big)\Big) = o(1).
\]
Now, to remove the truncation of $Z_n$ on the set $A$, observe that (3.43)(i) also implies that
\[
|l_i(\beta)-l_i(\beta_0)| \le C\big(\Gamma_n+\max_{ij}|u_{ij}|\big)\|\beta-\beta_0\|_1 \le 2Cb_0\big(\Gamma_n+\max_{ij}|u_{ij}|\big)\sqrt s,
\]
since for any $\beta\in\Omega$ we have $\|\beta-\beta_0\|_1\le 2b_0\sqrt s$. Hence,
\[
Z_n \le Z_n I_A + c\sqrt s\big(\Gamma_n+\max_{ij}|u_{ij}|\big)I_{A^c} + c\sqrt s\,E\big(\Gamma_n+\max_{ij}|u_{ij}|\big)I_{A^c}. \tag{3.46}
\]
Now recall that for each $1\le j\le p$, $\{u_{ij};\ 1\le i\le n\}$ are i.i.d. $L(0,\sigma_{jj}^2)$ r.v.'s. Hence $2\sum_{i=1}^n|u_{ij}|/\sigma_{jj}\sim\chi^2_{2n}$, where $\chi^2_{2n}$ denotes a chi-square r.v. with $2n$ degrees of freedom.
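This distributional fact is easy to check by simulation; a quick R sketch (names ours), assuming $|u_{ij}|\sim\mathrm{Exp}(\sigma_{jj})$ with mean $\sigma_{jj}$:

set.seed(1)
n <- 50; sig <- 2
s <- 2 * colSums(matrix(rexp(n * 1e4, rate = 1 / sig), n)) / sig
c(mean(s), var(s))   # close to 2n = 100 and 4n = 200, as for a chi-square(2n) r.v.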
Now use the probability bounds for chi-square distributions given by Johnstone (2001) to obtain
\[
P(A^c) \le \sum_{j=1}^p P\Big(\Big\{\frac1n\sum_{i=1}^n|u_{ij}|\Big\}^2 \ge c_u^2\Big) \le \sum_{j=1}^p P\Big(\frac2n\sum_{i=1}^n\frac{|u_{ij}|}{\sigma_{jj}} \ge 2.8\Big) \tag{3.47}
\]
\[
= \sum_{j=1}^p P\big(\chi^2_{2n}\ge 2.8n\big) \le \sum_{j=1}^p P\big(|\chi^2_{2n}-2n|\ge 2n(0.4)\big) \le \sum_{j=1}^p\exp\Big(-\frac{3n}{100}\Big) \le \exp\Big(-\frac{3n}{100}+\log p\Big).
\]
Next, use the fact that $|u_{ij}|\sim\mathrm{Exp}(\sigma_{jj})$ to obtain $E(\max_{ij}|u_{ij}|)^2 \le \sum_{i,j}E(u_{ij}^2) \le npc_u$. Thus, using this bound, (3.47) and the Cauchy-Schwarz inequality, we obtain
\[
\text{(a)}\ P\big((\max_{ij}|u_{ij}|)I_{A^c} > n^{-k}\big) \le n^k\,E(\max_{ij}|u_{ij}|)I_{A^c} \le n^k\sqrt{npc_u\exp\Big(-\frac{3n}{100}+\log p\Big)},
\]
\[
\text{(b)}\ E(\max_{ij}|u_{ij}|)I_{A^c} \le \sqrt{E(\max_{ij}|u_{ij}|)^2\,P(A^c)} \le \sqrt{npc_u\exp\Big(-\frac{3n}{100}+\log p\Big)}.
\]
The exponential bound in (a) implies that the probability of the event in (a) tends to zero, for any $k>0$. This in turn implies that the second summand in (3.46) satisfies $\sqrt s(\Gamma_n+\max_{ij}|u_{ij}|)I_{A^c}=o_p(n^{-k})$, $\forall\,k>0$. Similarly, the bound in (b) implies that the third summand in the bound (3.46) decreases to zero at an exponential rate. Thus, with probability at least $1-o(1)$, the remaining two summands in the bound (3.46) decrease to zero, in probability, faster than $Z_n I_A$. Hence,
\[
\sup_{\beta\in\Omega}|I| = Z_n = O_p\Big(\sqrt s\sqrt{\frac{2\log 2p}{n}}\Big). \tag{3.48}
\]
We can similarly obtain a bound for term $II$ of (3.44); an outline is given below. Define the empirical process $\tilde{\mathbb G}_n(\beta):=\frac1n\sum_{i=1}^n\{l''_i(\beta)-El''_i(\beta)\}$ and let $\tilde Z_n:=\sup_{\beta\in\Omega}|\tilde{\mathbb G}_n(\beta)-\tilde{\mathbb G}_n(\beta_0)|$. Proceeding as earlier, (3.43)(iii) along with the bound (3.45) allows us to apply Lemma 14.20 and Theorem 14.2 of Bühlmann and Van de Geer (2011), page 503, which yields
\[
E\big[\tilde Z_n I_A\big] \le 32\,c_3 b_0(c_x+c_u)\frac{\sqrt s}{h^2}\sqrt{\frac{2\log 2p}{n}},
\]
\[
P\Big(\tilde Z_n I_A \ge 8c_3 b_0(c_x+c_u)\frac{\sqrt s}{h^2}\Big(4\sqrt{\frac{2\log 2p}{n}}+\sqrt{\frac{2t}{n}}\Big)\Big) \le \exp(-t), \quad \forall\,t>0.
\]
Choose $t=\log 2p$ in this bound to obtain $P(\tilde Z_n I_A \ge O(\frac{\sqrt s}{h^2}\sqrt{2\log 2p/n}))=o(1)$. Get rid of the truncation on the set $A$, as done for $I$, to obtain
\[
\sup_{\beta\in\Omega}|II| = \tilde Z_n = O_p\Big(\frac{\sqrt s}{h^2}\sqrt{\frac{2\log 2p}{n}}\Big). \tag{3.49}
\]
Lastly, consider the term $III$ in (3.44). Observe that $|l''_i(\beta_0)|\le ch^{-1}$, for some $c<\infty$. Then Lemma 14.11 of Bühlmann and Van de Geer (2011) yields
\[
P\Big(\Big|\frac1n\sum_{i=1}^n\big\{l''_i(\beta_0)-El''_i(\beta_0)\big\}\Big| \ge t\Big) \le 2\exp\Big(-\frac{nt^2h^2}{2c^2}\Big).
\]
Choosing $t=h^{-1}\sqrt{\log 2p/n}$, we obtain
\[
|III| = \Big|\frac1n\sum_{i=1}^n\big\{l''_i(\beta_0)-El''_i(\beta_0)\big\}\Big| = O_p\Big(\frac1h\sqrt{\frac{\log 2p}{n}}\Big). \tag{3.50}
\]
Now use (3.48), (3.49) and (3.50) in (3.44), and the fact that the rate of decrease of (3.49) is the slowest, to conclude (3.9) of Theorem 3.2.1. The proof of (3.10) is similar. This completes the proof of Theorem 3.2.1.

Proof of Lemma 3.5.1. Define $\rho(s,y_i)=\rho_\tau(y_i-s)$. Then observe that it satisfies the following Lipschitz condition:
\[
|\rho(s_1,y_i)-\rho(s_2,y_i)| \le \max\{\tau,1-\tau\}\,|s_1-s_2|.
\]
Then proceed as in the proof of (3.9) of Theorem 3.2.1 to obtain the desired bound.

Proof of Theorem 3.3.2. Let $a_n$ be as in (3.18). By (3.41), $\hat\beta\in\mathcal B(a_n)$ with probability $1-o(1)$. Thus, with arbitrarily large probability, for all large $n$, $\hat\beta$ is in the interior of $\Omega$ and not on its boundary. Hence, the KKT conditions are necessary for this optimum. We prove the desired result via contradiction. For any $j\in S^c$, suppose if possible that $\hat\beta_j\ne 0$. Then, by the necessity of the KKT conditions,
\[
\frac{d}{d\beta_j}\Big(n^{-1}\sum_{i=1}^n\rho^*_{Li}(\hat\beta)\Big) = -\lambda_n d_j\,\mathrm{sign}(\hat\beta_j). \tag{3.51}
\]
Recall (3.42). The derivatives of $\rho^*_{Li}(\beta)$ and $\rho_{Li}(\beta)$ w.r.t. $\beta_j$ are
\[
\rho^{*\prime}_{Li,j}(\beta) := \frac{d}{d\beta_j}\rho^*_{Li}(\beta) = -w_{ij}l'_i(\beta) + \frac{\sigma_\beta^2}{2}\,l'''_i(\beta)\,w_{ij} - \Big(\sum_{k=1}^p\sigma_{kj}\beta_k\Big)l''_i(\beta), \tag{3.52}
\]
\[
\rho'_{Li,j}(\beta) := \frac{d}{d\beta_j}\rho_{Li}(\beta) = -x_{ij}\Big[\tau-1+H\Big(\frac{\varepsilon_{i\beta}}{h}\Big)+\frac{\varepsilon_{i\beta}}{h}K\Big(\frac{\varepsilon_{i\beta}}{h}\Big)\Big],
\]
where $\varepsilon_{i\beta}:=y_i-x_i^T\beta$. Let $\rho'_{i,j}(\beta):=-x_{ij}\big(\tau-I\{y_i-x_i'\beta\le 0\}\big)$, define $\nu^{*\prime}_{Li,j}(\beta):=E\rho^{*\prime}_{Li,j}(\beta)$, $\nu'_{Li,j}(\beta):=E\rho'_{Li,j}(\beta)$, $\nu'_{i,j}(\beta):=E\rho'_{i,j}(\beta)$, and set
\[
S^*_{n,j}(\beta) = n^{-1}\sum_{i=1}^n\Big\{\rho^{*\prime}_{Li,j}(\beta)-\rho^{*\prime}_{Li,j}(\beta_0)-\nu^{*\prime}_{Li,j}(\beta)+\nu^{*\prime}_{Li,j}(\beta_0)\Big\}, \qquad T_{\mathcal B,j}=\sup_{\beta\in\mathcal B(a_n)}|S^*_{n,j}(\beta)|.
\]
The fact that $\nu'_{i,j}(\beta_0)=E\rho'_{i,j}(\beta_0)=0$ and the triangle inequality yield
\[
\Big|n^{-1}\sum_{i=1}^n\rho^{*\prime}_{Li,j}(\hat\beta)\Big| \le \Big|n^{-1}\sum_{i=1}^n\big\{\rho^{*\prime}_{Li,j}(\beta_0)-\nu^{*\prime}_{Li,j}(\beta_0)\big\}\Big| + \Big|n^{-1}\sum_{i=1}^n\big\{\nu^{*\prime}_{Li,j}(\hat\beta)-\nu'_{i,j}(\hat\beta)\big\}\Big|
\]
\[
\qquad + \Big|n^{-1}\sum_{i=1}^n\big\{\nu'_{i,j}(\hat\beta)-\nu'_{i,j}(\beta_0)\big\}\Big| + T_{\mathcal B,j} =: J_1+J_2+J_3+J_4 .
\]
We will show that the relations (3.53)-(3.56) below hold for all $j\in S^c$ simultaneously, with probability at least $1-o(1)$:
\[
J_1 := \Big|n^{-1}\sum_{i=1}^n\big\{\rho^{*\prime}_{Li,j}(\beta_0)-\nu^{*\prime}_{Li,j}(\beta_0)\big\}\Big| = o(d_j\lambda_n), \tag{3.53}
\]
\[
J_2 := \sup_{\beta\in\mathcal B(a_n)}\Big|n^{-1}\sum_{i=1}^n\big\{\nu^{*\prime}_{Li,j}(\beta)-\nu'_{i,j}(\beta)\big\}\Big| = O(\Gamma_n h) = o(d_j\lambda_n), \tag{3.54}
\]
\[
J_3 := \sup_{\beta\in\mathcal B(a_n)}\Big|n^{-1}\sum_{i=1}^n\big\{\nu'_{i,j}(\beta)-\nu'_{i,j}(\beta_0)\big\}\Big| = o(d_j\lambda_n), \tag{3.55}
\]
\[
J_4 := T_{\mathcal B,j} = o(d_j\lambda_n). \tag{3.56}
\]
Lemma 3.5.2 proves (3.54) and (3.55), and Lemma 3.5.3 proves (3.53) and (3.56). Finally, combining (3.53)-(3.56), we obtain that for $n$ large, with probability $1-o(1)$,
\[
\Big|\frac{d}{d\beta_j}\Big(n^{-1}\sum_{i=1}^n\rho^*_{Li}(\hat\beta)\Big)\Big| < \lambda_n d_j \quad \text{for every } j\in S^c,
\]
which contradicts (3.51), thereby proving the theorem.

Lemma 3.5.2 Under the conditions of Theorem 3.3.2,
\[
\max_{j\in S^c}\sup_{\beta\in\mathcal B(a_n)}\Big|n^{-1}\sum_{i=1}^n\big\{\nu'_{Li,j}(\beta)-\nu'_{i,j}(\beta)\big\}\Big| = o(\lambda_n d^{\min}_{S^c}), \tag{3.57}
\]
\[
\max_{j\in S^c}\sup_{\beta\in\mathcal B(a_n)}\Big|n^{-1}\sum_{i=1}^n\big\{\nu'_{i,j}(\beta)-\nu'_{i,j}(\beta_0)\big\}\Big| = o(\lambda_n d^{\min}_{S^c}). \tag{3.58}
\]
Proof. Observe that
\[
\rho'_{Li,j}(\beta)-\rho'_{i,j}(\beta) = -x_{ij}\Big[H\Big(\frac{\varepsilon_{i\beta}}{h}\Big)-I\{\varepsilon_{i\beta}>0\}+\frac{\varepsilon_{i\beta}}{h}K\Big(\frac{\varepsilon_{i\beta}}{h}\Big)\Big], \tag{3.59}
\]
so that $\nu'_{Li,j}(\beta)-\nu'_{i,j}(\beta) = -x_{ij}E\big[H(\varepsilon_{i\beta}/h)-I\{\varepsilon_{i\beta}>0\}+(\varepsilon_{i\beta}/h)K(\varepsilon_{i\beta}/h)\big]$. Now, with $a_{i\beta}:=x_i'(\beta-\beta_0)$, using $|H(v/h)-I\{v>0\}|=H(-|v|/h)$ and the change of variable $t=(x-a_{i\beta})/h$,
\[
E\,H\Big(\frac{-|\varepsilon_{i\beta}|}{h}\Big) = \int H\Big(\frac{-|x-a_{i\beta}|}{h}\Big)f_i(x)\,dx = h\int_{-\infty}^{0}H(t)f_i(ht+a_{i\beta})\,dt + h\int_{0}^{\infty}H(-t)f_i(ht+a_{i\beta})\,dt = O(h),
\]
uniformly in $1\le i\le n$ and $\beta\in\mathbb{R}^p$, because by assumption (A1), $\sup_{1\le i\le n,\,x\in\mathbb{R}}f_i(x)<\infty$, and $\int_{-\infty}^0 H(t)\,dt<\infty$. Similarly, one verifies that $\max_{1\le i\le n}\sup_{\beta\in\mathbb{R}^p}\big|E\,(\varepsilon_{i\beta}/h)K(\varepsilon_{i\beta}/h)\big|=O(h)$. Hence from (3.59) we obtain
\[
\sup_{1\le i\le n,\,1\le j\le p,\,\beta\in\mathbb{R}^p}\big|\nu'_{Li,j}(\beta)-\nu'_{i,j}(\beta)\big| = O(\Gamma_n h) = o(\lambda_n d^{\min}_{S^c}). \tag{3.60}
\]
The last equality follows from the rate assumptions (3.20). This bound and assumption (A2) complete the proof of (3.57).

Next, we show (3.58). By assumption (A1),
\[
\nu'_{i,j}(\beta)-\nu'_{i,j}(\beta_0) = -x_{ij}\big[F_i\big(x_i'(\beta-\beta_0)\big)-F_i(0)\big] = -x_{ij}f_i(0)\,x_i^T(\beta-\beta_0) - x_{ij}\tilde I_i, \tag{3.61}
\]
where $\tilde I_i = F_i(x_i'(\beta-\beta_0))-F_i(0)-f_i(0)\,x_i^T(\beta-\beta_0)$. Now, for any $j\in S^c$,
\[
\Big|\frac1n\sum_{i=1}^n x_{ij}f_i(0)\,x_i'(\beta-\beta_0)\Big| \le \max_{1\le i\le n}f_i(0)\cdot\frac1n\sum_{i=1}^n|x_{ij}|\cdot\Gamma_n\|\beta-\beta_0\|_1 = O(\Gamma_n a_n) = o(\lambda_n d^{\min}_{S^c}); \tag{3.62}
\]
this follows since $f_i(0)$ and $n^{-1}\sum_{i=1}^n x_{ij}^2$ are bounded by a constant for all $1\le i\le n$ and $1\le j\le p$. Also, from assumption (A1) we obtain
\[
\max_{j\in S^c}\Big|\frac1n\sum_{i=1}^n x_{ij}\tilde I_i\Big| \le \frac{\Gamma_n}{n}\sum_{i=1}^n|\tilde I_i| \le \frac{C_2\Gamma_n}{n}\sum_{i=1}^n\big(x_i'(\beta-\beta_0)\big)^2 \le C_2\Gamma_n^3\|\beta-\beta_0\|_1^2 = O(\Gamma_n^3 a_n^2) = o(\lambda_n d^{\min}_{S^c}). \tag{3.63}
\]
Now use (3.61)-(3.63) to obtain (3.58), thereby completing the proof of the lemma.

Lemma 3.5.3 Under the conditions of Theorem 3.3.2,
\[
\max_{j\in S^c}\Big|\frac1n\sum_{i=1}^n\big\{\rho^{*\prime}_{Li,j}(\beta_0)-\nu^{*\prime}_{Li,j}(\beta_0)\big\}\Big| = o_p(\lambda_n d^{\min}_{S^c}), \tag{3.64}
\]
\[
\max_{j\in S^c}T_{\mathcal B,j} = o_p(\lambda_n d^{\min}_{S^c}). \tag{3.65}
\]
Proof. The structure of this proof is similar to the proof of Theorem 3.2.1. In the following proof, $c>0$ shall denote a generic constant that may be different depending on the context. Redefine the event
\[
A = \Big\{\max_{1\le j\le p}\frac1n\sum_{i=1}^n u_{ij}^2\le c_u;\ \ \max_{1\le j\le p,\,1\le i\le n}|u_{ij}|\le \Gamma_n\Big\}.
\]
Use the fact that $|u_{ij}|\sim\mathrm{Exp}(\sigma_{jj})$ to obtain
\[
P\Big(\max_{1\le i\le n,\,1\le j\le p}|u_{ij}| > c\Gamma_n\Big) \le \sum_{j=1}^p\sum_{i=1}^n P\big(|u_{ij}|\ge c\Gamma_n\big) \le \frac{1}{\sigma_u}\exp\Big(-\frac{c\Gamma_n}{\sigma_u}+\log p+\log n\Big).
\]
This bound and (3.47) together imply that
\[
P(A^c) \le \frac{1}{\sigma_u}\exp\Big(-\frac{c\Gamma_n}{\sigma_u}+\log p+\log n\Big) + \exp\Big(-\frac{3n}{100}+\log p\Big).
\]
Now,
\[
\Big|n^{-1}\sum_{i=1}^n\big\{\rho^{*\prime}_{Li,j}(\beta_0)-\nu^{*\prime}_{Li,j}(\beta_0)\big\}\Big| \le \Big|\frac1n\sum_{i=1}^n w_{ij}\phi'_i(\beta_0)\Big| + \frac{\sigma_0^2}{2}\Big|\frac1n\sum_{i=1}^n w_{ij}\phi'''_i(\beta_0)\Big| + \Big|\sum_{k=1}^p\sigma_{kj}\beta_{0k}\Big|\,\Big|\frac1n\sum_{i=1}^n\phi''_i(\beta_0)\Big|,
\]
where $\phi'_i(\beta_0):=l'_i(\beta_0)-El'_i(\beta_0)$, $\phi''_i(\beta_0):=l''_i(\beta_0)-El''_i(\beta_0)$ and $\phi'''_i(\beta_0):=l'''_i(\beta_0)-El'''_i(\beta_0)$. Using $|w_{ij}|\le 2\Gamma_n$ on the event $A$, we obtain
\[
\text{(i)}\ |w_{ij}\phi'_i(\beta_0)I_A|\le c\Gamma_n; \quad \text{(ii)}\ |w_{ij}\phi''_i(\beta_0)I_A|\le c\Gamma_n h^{-1}; \quad \text{(iii)}\ |w_{ij}\phi'''_i(\beta_0)I_A|\le c\Gamma_n h^{-2}.
\]
Hence,
\[
P\Big(\max_{1\le j\le p}\Big|\frac1n\sum_{i=1}^n w_{ij}\phi'_i(\beta_0)I_A\Big|\ge t\Big) \le \sum_{j=1}^p P\Big(\Big|\frac1n\sum_{i=1}^n w_{ij}\phi'_i(\beta_0)I_A\Big|\ge t\Big) \le 2\exp\Big(-\frac{cnt^2}{\Gamma_n^2}+\log p\Big),
\]
where the last inequality follows from Lemma 14.11 of Bühlmann and Van de Geer (2011).
Thus, choosing $t=c\Gamma_n\sqrt{2\log 2p/n}$ for some constant $c>0$, we obtain
\[
\Big|\frac1n\sum_{i=1}^n w_{ij}\phi'_i(\beta_0)I_A\Big| = O_p\Big(\Gamma_n\sqrt{\frac{2\log 2p}{n}}\Big). \tag{3.66}
\]
Now, to remove the truncation on the set $A$, observe that
\[
\max_{1\le j\le p}\Big|\sum_{i=1}^n w_{ij}\phi'_i(\beta_0)\Big| \le \max_{1\le j\le p}\Big|\sum_{i=1}^n w_{ij}\phi'_i(\beta_0)I_A\Big| + c\big(\Gamma_n+\max_{i,j}|u_{ij}|\big)I_{A^c} + c\,E\big(\Gamma_n+\max_{i,j}|u_{ij}|\big)I_{A^c}. \tag{3.67}
\]
Proceed as in the proof of Theorem 3.2.1 to show that the last two terms on the RHS converge to zero faster than the first term, in probability. Thus we obtain
\[
\max_{1\le j\le p}\Big|\frac1n\sum_{i=1}^n w_{ij}\phi'_i(\beta_0)\Big| = O_p\Big(\Gamma_n\sqrt{\frac{2\log 2p}{n}}\Big). \tag{3.68}
\]
A similar argument yields that
\[
\max_{1\le j\le p}\Big|\frac1n\sum_{i=1}^n\phi''_i(\beta_0)\Big| = O_p\Big(\frac1h\sqrt{\frac{2\log 2p}{n}}\Big), \qquad \max_{1\le j\le p}\Big|\frac1n\sum_{i=1}^n w_{ij}\phi'''_i(\beta_0)\Big| = O_p\Big(\frac{\Gamma_n}{h^2}\sqrt{\frac{2\log 2p}{n}}\Big).
\]
Recall that $\sigma_0^2\le b_0^2\Lambda_{\max}s$ and $|\sum_{k=1}^p\sigma_{kj}\beta_{0k}|\le b_0\sigma_u^2\sqrt s$. Now combine these results with the rate assumptions (3.20) to obtain (3.64).

To prove claim (3.65), note that
\[
S^*_{n,j}(\beta) = -\frac1n\sum_{i=1}^n w_{ij}\Big[l'_i(\beta)-l'_i(\beta_0)-E\big\{l'_i(\beta)-l'_i(\beta_0)\big\}\Big]
\]
\[
\quad + \frac{\sigma_\beta^2}{2}\,\frac1n\sum_{i=1}^n w_{ij}\Big[l'''_i(\beta)-l'''_i(\beta_0)-E\big\{l'''_i(\beta)-l'''_i(\beta_0)\big\}\Big] + \frac{\sigma_\beta^2-\sigma_0^2}{2}\,\frac1n\sum_{i=1}^n w_{ij}\Big[l'''_i(\beta_0)-El'''_i(\beta_0)\Big]
\]
\[
\quad - \Big(\sum_{k=1}^p\sigma_{kj}\beta_k\Big)\frac1n\sum_{i=1}^n\Big[l''_i(\beta)-l''_i(\beta_0)-E\big\{l''_i(\beta)-l''_i(\beta_0)\big\}\Big] - \Big(\sum_{k=1}^p\sigma_{kj}(\beta_k-\beta_{0k})\Big)\frac1n\sum_{i=1}^n\Big[l''_i(\beta_0)-El''_i(\beta_0)\Big]
\]
\[
=: I + \frac{\sigma_\beta^2}{2}\,II + \frac{\sigma_\beta^2-\sigma_0^2}{2}\,III - \Big(\sum_{k=1}^p\sigma_{kj}\beta_k\Big)IV - \Big(\sum_{k=1}^p\sigma_{kj}(\beta_k-\beta_{0k})\Big)V.
\]
We begin with the term $II$, which turns out to have the slowest rate of convergence. Define the empirical process $\mathbb G'''_{n,j}(\beta):=\frac1n\sum_{i=1}^n w_{ij}\{l'''_i(\beta)-El'''_i(\beta)\}$ and let $Z'''_{n,j}=\sup_{\beta\in\mathcal B(a_n)}|\mathbb G'''_{n,j}(\beta)-\mathbb G'''_{n,j}(\beta_0)|$. Also observe that, from (3.42) and (3.43), we have
\[
\big|w_{ij}l'''(s_1,y_i)-w_{ij}l'''(s_2,y_i)\big| \le c\big(\Gamma_n+\max_{i,j}|u_{ij}|\big)h^{-3}|s_1-s_2|.
\]
Then, as in the above proof of Theorem 3.2.1, apply Lemma 14.20 and Theorem 14.2 of Bühlmann and Van de Geer (2011) to obtain
\[
P\Big(\max_{1\le j\le p}Z'''_{n,j}I_A \ge 8cb_0(c_x+c_u)\frac{\Gamma_n a_n}{h^3}\Big(4\sqrt{\frac{2\log 2p}{n}}+\sqrt{\frac{2t}{n}}\Big)\Big)
\]
\[
\le \sum_{j=1}^p P\Big(Z'''_{n,j}I_A \ge 8cb_0(c_x+c_u)\frac{\Gamma_n a_n}{h^3}\Big(4\sqrt{\frac{2\log 2p}{n}}+\sqrt{\frac{2t}{n}}\Big)\Big) \le \exp(-t+\log p).
\]
Now choose $t=c\log p$, $c>0$, so that the last term in the above expression is $o(1)$. Now, removing the truncation on the set $A$ as done in the proof of Theorem 3.2.1, we obtain
\[
\max_{j\in S^c}\sup_{\beta\in\mathcal B(a_n)}|II| = \max_{1\le j\le p}Z'''_{n,j} = O_p\Big(\frac{\Gamma_n a_n}{h^3}\sqrt{\frac{2\log 2p}{n}}\Big),
\]
where the last equality respects the rate assumption (3.20). A similar argument applied to the terms $I$ and $IV$ yields
\[
\max_{j\in S^c}\sup_{\beta\in\mathcal B(a_n)}|I| = O_p\Big(\frac{\Gamma_n a_n}{h}\sqrt{\frac{2\log 2p}{n}}\Big), \qquad \max_{j\in S^c}\sup_{\beta\in\mathcal B(a_n)}|IV| = O_p\Big(\frac{a_n}{h^2}\sqrt{\frac{2\log 2p}{n}}\Big).
\]
An argument similar to the one used for proving (3.64) yields
\[
\max_{j\in S^c}|III| = O_p\Big(\frac{\Gamma_n}{h^2}\sqrt{\frac{2\log 2p}{n}}\Big), \qquad \max_{j\in S^c}|V| = O_p\Big(\frac{\Gamma_n}{h}\sqrt{\frac{2\log 2p}{n}}\Big).
\]
Now claim (3.65) follows from these bounds, the rate condition (iii) of (3.20), and the facts $|\sum_{k=1}^p\sigma_{kj}(\beta_k-\beta_{0k})|\le 2b_0\sigma_u^2\sqrt s$ and $\sigma_\beta^2\le b_0^2\Lambda_{\max}s$. This completes the proof of Lemma 3.5.3.

BIBLIOGRAPHY

[1] Agarwal, A., Negahban, S. and Wainwright, M. J. (2012). Fast global convergence of gradient methods for high-dimensional statistical recovery. Ann. Statist., 40, 2452-2482.

[2] Alquier, P. and Doukhan, P. (2011). Sparsity considerations for dependent variables. Electronic Journal of Statistics, 5, 750-774.

[3] Baillie, R. T. (1996). Long memory processes and fractional integration in econometrics. Journal of Econometrics, 73, 5-59.

[4] Belloni, A. and Chernozhukov, V. (2011). ℓ1-penalized quantile regression in high dimensional sparse models. Ann. Statist., 39, 82-130.

[5] Beran, J., Feng, Y., Ghosh, S. and Kulik, R. (2013). Long Memory Processes. Springer.

[6] Beran, J. (1992). Statistical methods for data with long-range dependence. Statistical Science, 7, 404-427.
[7] Bickel, P., Ritov, Y. and Tsybakov, A. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist., 37, 1705-1732.

[8] Buchinsky, M. (1994). Changes in the U.S. wage structure 1963-1987: applications of quantile regression. Econometrica, 62, 405-458.

[9] Bühlmann, P. and Van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer, New York.

[10] Carroll, R. J., Ruppert, D., Stefanski, L. A. and Crainiceanu, C. (2006). Measurement Error in Nonlinear Models: A Modern Perspective. Chapman and Hall, New York.

[11] Dahlhaus, R. (1995). Efficient location and regression estimation for long range dependent regression models. Ann. Statist., 23, 1029-1047.

[12] Doukhan, P. (1994). Mixing: Properties and Examples. Springer.

[13] Duchi, J., Shalev-Shwartz, S., Singer, Y. and Chandra, T. (2008). Efficient projections onto the ℓ1-ball for learning in high dimensions. In International Conference on Machine Learning, 272-279. ACM, New York, NY.

[14] Durrett, R. (2005). Probability: Theory and Examples (third edition). Duxbury.

[15] Fan, J., Fan, Y. and Barut, E. (2014). Adaptive robust variable selection. Ann. Statist., 42, 324-351.

[16] Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw., 33, 1-22.

[17] Fuller, W. A. (1987). Measurement Error Models. Wiley, New York.

[18] Giraitis, L., Koul, H. and Surgailis, D. (2012). Large Sample Inference for Long Memory Processes. Imperial College Press.

[19] Guo, H. and Koul, H. (2007). Nonparametric regression with heteroscedastic long memory errors. Journal of Statistical Planning and Inference, 137, 379-404.

[20] Johnstone, I. M. (2001). Chi-square oracle inequalities. In State of the Art in Probability and Statistics (Leiden, 1999), 399-418, IMS Lecture Notes Monogr. Ser., 36, IMS, Beachwood, OH.

[21] Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist., 28, 1356-1378.

[22] Lee, R., Noh, H. and Park, B. (2014). Model selection via Bayesian information criterion for quantile regression models. J. Amer. Statist. Assoc., 109, 216-229.

[23] Loh, P. and Wainwright, M. J. (2012). High-dimensional regression with noisy and missing data: provable guarantees with non-convexity. Ann. Statist., 40, 1637-1664.

[24] McKenzie, H., Jerde, C., Visscher, D., Merrill, E. and Lewis, M. (2009). Inferring linear feature use in the presence of GPS measurement error. Environ. Ecol. Stat., 16, 531-546.

[25] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist., 34, 1436-1462.

[26] Nolan, D. and Pollard, D. (1987). U-processes: rates of convergence. Ann. Statist., 15, 780-799.

[27] Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York.

[28] Purdom, E. and Holmes, S. (2005). Error distribution for gene expression data. Statist. Appl. in Genetics and Molecular Biology, 4, Article 16.

[29] Raskutti, G., Wainwright, M. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. Journal of Machine Learning Research, 11, 2241-2259.

[30] Rosenbaum, M. and Tsybakov, A. B. (2010). Sparse recovery under matrix uncertainty. Ann. Statist., 38, 2620-2651.

[31] Rosenbaum, M. and Tsybakov, A. B. (2011). Improved matrix uncertainty selector. Technical report. Available at http://arxiv.org/abs/1112.4413.

[32] Sorensen, O., Frigessi, A. and Thoresen, M. (2012). Measurement error in Lasso: impact and likelihood bias correction. Available at http://arxiv.org/pdf/1210.5378.pdf.

[33] Stefanski, L. A. and Carroll, R. J. (1990). Deconvoluting kernel density estimators. Statistics, 21, 169-184.

[34] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58, 267-288.

[35] Wang, H., Li, R. and Tsai, C. L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94, 553-568.
[36] Wang, H., Stefanski, L. A. and Zhu, Z. (2012). Corrected-loss estimation for quantile regression with covariate measurement errors. Biometrika, 99, 405-421.

[37] Wang, L., Wu, Y. and Li, R. (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. J. Amer. Statist. Assoc., 107, 214-222.

[38] Yoon, Y., Park, C. and Lee, T. (2013). Penalized regression models with autoregressive error terms. Journal of Statistical Computation and Simulation, 83, 1756-1772.

[39] Zhang, Y., Li, R. and Tsai, C. L. (2010). Regularization parameter selections via generalized information criterion. J. Amer. Statist. Assoc., 105, 312-323.

[40] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. Journal of Machine Learning Research, 7, 2541-2563.

[41] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc., 101, 1418-1429.