HIGHDIMENSIONALLINEARREGRESSIONMODELSUNDERLONGMEMORY
DEPENDENCEANDMEASUREMENTERROR
By
AbhishekKaul
ADISSERTATION
Submittedto
MichiganStateUniversity
inpartialentoftherequirements
forthedegreeof
Statistics-DoctorofPhilosophy
2015
ABSTRACT
HIGHDIMENSIONALLINEARREGRESSIONMODELSUNDERLONG
MEMORYDEPENDENCEANDMEASUREMENTERROR
By
AbhishekKaul
Thisdissertationconsistsofthreechapters.Thechapterintroducesthemodelsunder
considerationandmotivatesproblemsofinterest.Abriefliteraturereviewisalsoprovided
inthischapter.
ThesecondchapterinvestigatesthepropertiesofLassounderlongrangedependentmodel
errors.Lassoisacomputationallytapproachtomodelselectionandestimation,and
itspropertiesarewellstudiedwhentheregressionerrorsareindependentandidentically
distributed.Westudythecase,wheretheregressionerrorsformalongmemorymoving
averageprocess.WeestablishasampleoracleinequalityfortheLassosolution.We
thenshowtheasymptoticsignconsistencyinthissetup.Theseresultsareestablishedinthe
highdimensionalsetup(
p>n
)where
p
canbeincreasingexponentiallywith
n
.Finally,we
showtheconsistency,
n
1
2

d
-consistencyofLasso,alongwiththeoraclepropertyofadaptive
Lasso,inthecasewhere
p
isHere
d
isthememoryparameterofthestationaryerror
sequence.TheperformanceofLassoisalsoanalysedinthepresentsetupwithasimulation
study.
Thethirdchapterproposesandinvestigatesthepropertiesofapenalizedquantilebased
estimatorformeasurementerrormodels.Standardformulationsofpredictionproblemsin
highdimensionregressionmodelsassumetheavailabilityoffullyobservedcovariatesand
sub-Gaussianandhomogenousmodelerrors.Thismakesthesemethodsinapplicableto
measurementerrorsmodelswherecovariatesareunobservableandobservationsarepossi-
blynonsub-Gaussianandheterogeneous.Weproposeweightedpenalizedcorrectedquantile
estimatorsfortheregressionparametervectorinlinearregressionmodelswithadditivemea-
surementerrors,whereunobservablecovariatesarenonrandom.Theproposedestimators
forgotheneedfortheabovementionedmodelassumptions.Westudytheseestimatorsin
boththedimensionandhighdimensionalsparsesetups,inthelattersetup,thedi-
mensionalitycangrowexponentiallywiththesamplesize.Intheeddimensionalsetting
weprovidetheoraclepropertiesassociatedwiththeproposedestimators.Inthehighdi-
mensionalsetting,weprovideboundsforthestatisticalerrorassociatedwiththeestimation,
thatholdwithasymptoticprobability1
;
therebyprovidingthe
`
1
-consistencyofthepro-
posedestimator.Wealsoestablishthemodelselectionconsistencyintermsofthecorrectly
estimatedzerocomponentsoftheparametervector.Asimulationstudythatinvestigates
thesampleaccuracyoftheproposedestimatorisalsoincludedinthischapter.
ACKNOWLEDGMENTS
IwishtoexpressmysincerethankstoProfessorHiraL.Koulforgivingmetheopportunity
tocomehereandlearnandforguidingmeeversince.Ithankhimforsuggestingproblems
andinspiringandguidingmethroughthem.Ialsowishtothankhimandhisfamilyfor
makingthiscountryfeelnottooalien.
IwouldalsoliketothankProfessorRamamoorthiandProfessorMaitiforstimulating
discussionsandfortheirhelpduringmygraduatestudy.Also,IthankProfessorPing-Shou
ZhongandProfessorHaoWangforservingonmydissertationcommittee.
Ithankmyparentsfortheirconstantloveandsupportwhichenabledmetocomplete
thiswork.
FinallyIwouldliketothankmyfriendsAkshitaChawla,ChanakyaKaul,JiwoongKim
andRanjitBawaformakingthisjourneyjoyous.
ThisresearchhasbeensupportedinpartbytheNSF-DMSGrant1205271:PI.HiraL.
Koul,forwhichIamgrateful.
iv
TABLEOFCONTENTS
LISTOFTABLES
....................................
vi
LISTOFFIGURES
...................................
vii
KEYTOSYMBOLS
..................................
viii
Chapter1IntroductionandLiteratureReview
................
1
1.1Lassoandlongmemory..............................3
1.2Measurementerrorandpenalizedquantileregression.............6
Chapter2LassowithLongMemoryRegressionErrors
...........
9
2.1ResultswithFiniteSample............................11
2.2SignConsistencyofLassounderLongMemory.................17
2.3Asymptoticswhen
p
is...........................20
2.3.1Asymptoticdistributionof
X
T
"
.....................20
2.3.2AsymptoticPropertiesofLasso.....................21
2.3.3AdaptiveLasso..............................23
2.4SimulationStudy.................................24
2.5ProofsforChapter2...............................28
Chapter3Weighted
`
1
-PenalizedCorrectedQuantileRegression
forHighDimensionalMeasurementErrorModels
.......
44
3.1Model,EstimatoranditsOracleProperty...................45
3.2Relationshipbetween
ˆ
?
L
and
ˆ
.........................51
3.3ResultsinHighDimensions...........................54
3.3.1ASparsityProperty............................58
3.3.2Adaptivechoiceoftheweightvector
d:
.................60
3.4SimulationStudy.................................61
3.4.1Simulationsetup.............................61
3.4.2ComputationalIssues...........................63
3.4.3Simulationsetupandresults.......................65
3.5ProofsforChapter3...............................71
BIBLIOGRAPHY
.................................
93
v
LISTOFTABLES
Table2.1MediansofRPE,REE,NZ&IZwithGaussiandesign,longmem.
errors...................................27
Table2.2MediansofRPE,REE,NZ&IZwithGaussiandesign,i.i.d.errors.27
Table3.1
Simulationresultsat
p
=40forNormal,CauchyandParetomodelerrors
....67
Table3.2
Simulationresultsat
p
=300forNormal,Cauchymodelerrors
.........69
Table3.3
Simulationresultsat
p
=500underheteroscedasticity
.............69
Table3.4
W`
1
-
CQ
at
p
=40forNormalandCauchymodelerrors.
...........69
Table3.5
W`
1
-
CQ
at
p
=300forNormalandCauchymodelerrors.
..........70
Table3.6
`
1
-
CQ
at
p
=300withmisspcovariateerrordistribution.
........70
vi
LISTOFFIGURES
Figure2.1Lagvssampleauto-correlationfunctionwithd=0.15andd=0.25..25
Figure2.2Lagvssampleauto-correlationfunctionwithd=0.35andd=0.45..26
Figure3.1
1
n
P
n
i
=1
ˆ
?
L
(

)+

n
k

k
1
evaluatedaround

1
at
h
=0
:
01(left)and1
:
5(right).
64
Figure3.2
Coloredcontoursof
k
^


k
1
on
h
vs.

n
for
`
1
-
CQ
.
.............64
Figure3.3
Coloredcontoursof
k
^


k
1
on
p
vs.

(left)and
h:
(right)
..........66
vii
KEYTOSYMBOLS
Inwhatfollows,forany
z
=(
z
1
;

;z
p
)
T
2
R
p
,
k
z
k
1
=
P
p
j
=1
j
z
j
j
;
k
z
k
2
2
=
P
p
j
=1
z
2
j
:
For
anytwosequencesofpositivenumbers
f
a
n
;b
n
g
,
a
n
=
O
(
b
n
)
;
denotesthatforalllarge
n
,
a
n

cb
n
,forsomeuniversalconstant
c>
0
;
whichdoesnotdependonanyunderlying
parametersorthesamplesize
n:
Alllimitsaretakenas
n
!1
:
Foranyindexset
S

f
1
;:::;p
g
;
andforanyvector

2
R
p
;
denote

S
=
f

j
;
j
2
S
g
:
Foranyevent
A
,denote
I
A
astheindicatoroftheevent
A
.
viii
Chapter1
IntroductionandLiteratureReview
Thisdissertationshallanalyseanddevelopestimationandvariableselectiontechniquesfor
linearregressionmodelsundertwodistinctsetups.Chapter2considersthesetupwiththe
modelerrorsbeingalongmemorymovingaverageprocessandChapter3considersthesetup
wherethedesignvariablesarenotobserveddirectlybutanoisyformofthesevariablesare
observable,whichiscommonlyreferredtoasthemeasurementerrororerrors-in-variables
setup.
Webeginbydescribingthemodelsunderconsideration.The
i
th
componentofthe
responsevector
y
T
=(
y
1
;:::y
n
)isassumedtoberelatedtothe
i
th
row
x
T
i
=(
x
i
1
;

;x
ip
)
;
1

i

n
ofthe
n

p
designmatrix
X
bytherelation
y
i
=
x
T
i

+
"
i
;
forsome

2
R
p
;
1

i

n:
(1.1)
Hereforanyvector
a
,
a
T
denotesitstranspose.InChapter2,thevector
"
:=(
"
1
;:::;"
n
)
T
is
assumedtobealongmemorymovingaverageprocess,additionalassumptionsonthisvector
shallbemadepreciseinChapter2.Thedesignvariables
f
x
i
;
1

i

n
g
shallbeassumed
tobenonrandom.Howeverasshallbecomeapparentlaterthisconditionmaybeeasily
relaxedtoallowforsomecommonrandomdesignssuchassub-Gaussianorsub-Exponential
withindependentrows.
Intheabovediscussionweassumethatthedesignvariables
f
x
i
;
1

i

n
g
arecom-
1
pletelyobserved.Aclassicalproblemisthatofestimationoftheparametervector

when
thedesignvariables
x
i
'sarenotdirectlyobservable.Thisproblemshallserveasthecontent
ofChapter3.Inplaceofthedesignvariables
x
i
'sweobservethesurrogate
w
i
'sobeyingthe
model
w
i
=
x
i
+
u
i
;
1

i

n:
(1.2)
Here,
u
T
i
=(
u
i
1
;

;u
ip
)areassumedtobeindependentof
f
"
i
g
andindependentand
identicallydistributed(i.i.d.)
p

dimensionalrandomvectors.Theexactassumptionsare
madepreciseinChapter3.
Theparametervectorofinterestthroughoutthisdocumentshallbethe
p
-dimensional
vector

=(

1
;:::;
p
)
:
Weshallconsiderboththe
p
setupaswellasthehighdimen-
sional
p
setup.Inthelattercase,thedimension
p
shallbeallowedtogrowexponentially
withthesamplesize
n:
Acriticalassumptionmadeontheparametervectorofinterest

is"sparsity,"i.e.itis
assumedthatalargeproportionofthecolumnsofthedesignmatrixdonotcontributeany
lineartotheresponsevector
y:
Inotherwords,alargeproportionofthecomponents
oftheparametervector

arezero.Theproblemofinterestistoconsistentlyidentifyand
estimatethenon-zerocomponentsof
:
Thepasttwodecadeshavecontributedextensivelytosolutionstothisproblem,
chiefamongwhichhasbeenthe
`
1
-penalizedmethodsfortheirdesirablesampleand
asymptoticpropertiesandcomputational.Theestimatesforthesemethodsare
2
typicallydescribedas
^

(

)=argmin

2
R
p
f
l
n
(

)+

k
d


k
1
g
;>
0
;
(1.3)
where
l
n
(

)isanappropriatelychosenlossfunctionthatcanbecomputedusingtheobserved
variables.Theweights
d
=(
d
1
;:::;d
p
)
T
isavectorofnon-negativeweights,and`

'denotes
theHadamardproduct,i.e.,
k
d


k
1
:=
P
p
j
=1
d
j
j

j
j
:
Throughoutthisthesis,thedesign
variables
x
i
'smaybetriangulararraysdependingon
n
,butwedonotexhibitthisdependence
forthesakeofthetransparencyoftheexposition.Also,alllimitsaretakenas
n
!1
,unless
mentionedotherwise.
1.1Lassoandlongmemory
Fortheestimatordescribedin(1.3)choosingthelossfunction
l
n
(

)=
1
n
P
n
i
=1
(
y
i

x
T
i

)
2
;
i.e.,thesquaredlossfunctionandsetting
d
j
=1
;
1

i

p;
weobtainthewellcelebrated
leastabsoluteshrinkageandselectionoperator(Lasso)proposedbyTibshirani(1996).Its
statisticalpropertiesarewellstudiedwhentheregressionerrorsareindependentandiden-
ticallydistributed(i.i.d.)randomvariables(r.v.),see,e.g.,KnightandFu(2000),Mein-
hausenandBuhlmann(2006),ZhaoandYu(2006),Bickel,RitovandTsybakov(2009)
andBuhlmannandvandeGeer(2011).Inparticular,KnightandFu(2000)providethe
consistencyand
p
n
-consistencyofLassoestimatesunderthe
p
setting.Inthehighdi-
mensionalsetting,Bickel,RitovandTsybakov(2009)andBuhlmannandvandeGeer(2011)
provideerrorboundsforthestatisticalerrorassociatedwithLassothatholdwithprobabil-
itytendingto1
:
Furthermoretheyprovideadetaileddiscussionofthenecessaryconditions
3
requiredtoobtainthedesirederrorbounds.MeinhausenandBuhlmann(2006)andZhao
andYu(2006)madetheimportantcontributionofprovidingmodelselectionconsistency
resultsandalsodetailedtheconditionsnecessarytoobtainthem.
Theabovementionedpapersworkunderthecommonassumptionthatthemodelerrors
f
"
i
;
1

i

n
g
arei.i.d.realizationsofasub-Gaussianr.v.,asstatedintheintroduc-
tiontheprobleminvestigatedinthisdissertationistoanalysethepropertiesofLasso
whenthemodelerrors
f
"
i
;
1

i

n
g
formalongmemorymovingaverageprocess.
Theliteratureintheareaof
`
1
-penalizedestimationwithdependenceconsiderationsis
scant.ThepaperdealingwiththisissueisthatofAlquierandDoukhan(2011).They
providesampleerrorboundsunderweakdependencestructuresonthemodelerrors
f
"
i
;
1

i

n
g
:
AnotherrecentpaperaddressingdependenceconcernsisthatofYoon,Park
andLee(2013).Theirpaperprovidesasymptoticresultsinthe
n>p
setup,inalinear
regressionmodelswithstationaryauto-regressiveerrors.Inthatpapertheerrorprocessis
assumedtobeanAR(q)process,whichisknowntobeashortmemoryprocess,seeGiraitis,
Koul,andSurgailis(2012)(GKS).ThisthesisinvestigatesthebehaviourofLassoundera
strongerdependencestructureandlessrestrictivemodelassumptionsincomparisontothe
abovementionedpapers.
TheadaptiveLassoproposedbyZou(2006)fromLassointhewayparameters
arepenalizedwheretheweights
d
j
;
1

i

p
arechosencarefully.Tobemoreprecise,for
any
>
0
;
theweightvector
d
=1
=
j
(
^

)
j

;
with
^

beinganyinitialestimateof

suchthat
n
1
2

d
(
^


)=
O
p
(1)componentwise.Toavoidconfusionthereadershouldrecall
that
d
denotesthememoryparameterofthestationaryerrorsequencewhereas
d
denotes
4
thevectorofweightsinthepenaltyterm.TheadaptiveLassoestimates
~

aregivenby
~

=argmin

2
R
p
8
<
:
1
n
k
y

X
k
2
2
+

n
p
X
j
=1
d
j
j

j
j
9
=
;
:
(1.4)
Let
A
=
f
j
:

j
6
=0
g
,
A
?
n
=
f
j
:
~

j
6
=0
;
1

j

p
g
and

A
;
~

A
bethecorresponding
vectorswithonlythosecomponentswhoseindicesareintheset
A
:
AsstatedinZou(2006)forthei.i.d.modelerrorsetupwhen
p
isanestimatoris
saidtohave
oracleproperty
ifthefollowinghold.
1.
Asymptotically,therightmodelisideni.elim
n
!1
P
(
A
?
n
=
A
)=1
:
2.
Theestimatorhasanoptimalestimationrate,
n
1
2
(
~

A


A
)
!
D
N
(0
;

?
),forsome
covariancematrix
?
:
Here
!
D
denotesconvergenceindistribution.TheadaptiveLassohasanadvantageover
Lasso,sinceitpossessestheaboveoraclepropertyundermildassumptions.Ontheother
hand,asprovedbyZhaoandYu(2006),forLassotobesignconsistentanecessarycondition
isthe"strongirrepresentablecondition"whichisamuchstrongerassumption.
InChapter2weprovideboundsonthestatisticalerrorassociatedwithLassounder
longmemorydependentmodelerrors,thatholdwithprobabilitytendingto1
:
Secondly,
weobtainthesignconsistencyofLassounderthissetupwithstandardrestrictionsonthe
designmatrix
X:
Lastly,weprovidetheconsistencyand
n
1
2

d
-consistencyoftheLassoin
thecasewhere
p
isandislessthan
n:
Thisproofisalsoextendedtoderivetheoracle
propertyforadaptiveLasso.Fortheseresults,thepricethatwepaytotacklethepersistent
correlationamongtheerrorsequenceisthattherateofincreaseofthedimension
p
inthe
highdimensionalsetting,andtherateofconvergenceinthe
p
settingissloweddown
5
byafactorof
n
d
:
Here
d
isthememoryparameterofthestationaryerrorsequence.
1.2Measurementerrorandpenalizedquantileregres-
sion
Chapter3ofthisdissertationproposesandanalysesaquantilebasedestimationandvariable
selectiontechniqueforthemeasurementmodeldescribedin(1.1)and(1.2).Intheclassical

p
setting,itiswellknownthatdisregardingmeasurementerrorincovariatesinduces
anattenuationbiasintheestimatesoftheparametervector
;
i.e.,theestimatesarebiased
towardszerowithanondiminishingbiaswhosemagnitudedependsonthevarianceofthe
covariatenoise.Thisproblemformeasurementerrormodelswhen
p
ishasbeenexten-
sivelyworkedonbyseveralauthorsincludingFuller(1987)andCarroll,Ruppert,Stefanski
andCrainiceanu(2006).Theseauthorsalsoprovidenumerousapplicationsofsuchmod-
elsandalsoprovidebiascorrectedestimatorsformeasurementerrormodelsinthecontext
ofmeanregression,wheretheobjectiveistoobtainestimatesof

correspondingtothe
conditionalmeanoftheresponsevariables,giventhecovariates.
Theproblemofestimationandvariableselectioninhighdimensionalmeasurementerror
modelsisofrecentinterest.AuthorsincludingRosenbaumandTsybakov(2010,2011)and
LohandWainwright(2012)studythesemodelsandproposepenalizedestimatorswhichpro-
videconsistentestimationandvariableselectioninthepresenceofmeasurementerror.In
particular,LohandWainwright(2012)proposethe
`
1
-penalisedbiascorrectedleastsquares
approach.Theymaketheimportantcontributionofprovidingerrorboundsfortheassoci-
atedstatisticalerrorinestimationthatholdwithprobabilitytendingto1
;
andtheydoso
withanon-convexlossfunction.However,bothpapersassumeanunderlyingsub-Gaussian
6
homoscedasticdistributionofthemodelerrorsandworkunderthepremiseofmeanregres-
sion.
Anotherimportantestimationtechniqueinlinearregressionmodelsisthatofquantile
regressionwhereoneisinterestedinestimatesof

correspondingtoaspconditional
quantileoftheresponse
y;
giventhecovariates,asopposedtomeanregressionwherethe
problemofinterestistoobtainestimatescorrespondingtotheconditionalmeanofresponse
variables.Quantileregressionisrobustagainstoutliersandisusefulinthepresenceof
heteroscedasticity,see,e.g.,Buchinsky(1994).Thisestimationtechniqueforgoestheneed
forsub-Gaussiandistributionalassumptionsonthemodelerrors
f
"
i
;
1

i

n
g
andreplaces
themwithmuchmildersmoothnessconditionsonthedensityfunctionsoftheseerrors.
Whencovariatesarecompletelyobserved,i.e.withoutmeasurementerror,theproblem
ofquantileregressionwithnonsub-Gaussianityandheteroscedasticityinsparsehighdimen-
sionalmodelshasrecentlybeenstudiedbyFan,FanandBarut(2014),BelloniandCher-
nozhukov(2011),andWang,WuandLi(2012).Theconvexityofquantilelossfunctionis
crucialfortheanalysisoftheirinferenceprocedures.Morepreciselytheyproposeandprovide
theoreticalguaranteesforthe
`
1
-penalisedestimator(1.3)with
l
n
(

)=
1
n
P
n
i
=1
ˆ
(
y
i
;x
i
;
)
where
ˆ
(
y
i
;x
i
;
)=
ˆ
˝
(
y
i

x
T
i

)
;ˆ
˝
(
v
)=
v
f
˝

I
(
v

0)
g
isthequantilelossfunction.
Here
˝
isthequantilelevelofinterest,i.e.,
P
(
"
i
<
0)=
˝;
1

i

n:
Theproblemofcorrectingforbiasinducedbymeasurementerrorinquantileregression
posesachallengingproblem.Inthecaseof
p;
thisproblemhasbeenaddressedby
Wang,StefanskiandZhu(2012)(WSZ).Theyprovideacorrectedquantilelossfunctionand
showthattheestimatesobtainedbyminimizingthislossfunctionprovidesconsistentand
p
n
-consistentestimates.Howeversincetheirlossfunctionisun-penalized,itisunableto
performvariableselection.
7
Chapter3ofthisdissertationproposesandanalyzesthepenalizedversionoftheestimator
proposedbyWSZ.Inthe
p
setupweprovidetheoraclepropertyoftheproposed
estimator.Inthehighdimensionalsettingweprovideboundsonthestatisticalerrorof
theproposedestimatorthatholdwithprobabilitytendingto1
:
Inthissetting,wealso
establishmodelselectionconsistencyofthisestimatorintermsofidentifyingthecorrect
zerocomponentsoftheparametervector.
8
Chapter2
LassowithLongMemoryRegression
Errors
Inmanyproblemsofpracticalinterestregressionmodelswithlongmemoryerrorsarise
naturallyintheofeconometricsandseee.g.,Beran(1994),Baillie(1996)and
morerecentmonographsofGiraitis,Koul,andSurgailis(2012)(GKS),andBeran,Feng,
Ghosh,Kulik(2013),andthenumerousreferencestherein.Itisthusofinteresttoinvestigate
thebehaviorofLassoinregressionmodelswithlongmemoryerrors.
Recallthemodel(1.1),where
x
i
=(
x
i
1
;

;x
ip
)
T
,
i
=1
;

;n
arevectorsofdesign
variables,
y
i
'sdenotetheresponses.Theerrors
"
i
areassumedtobelongmemorymoving
averagewithi.i.d.innovations,i.e.,
"
i
=
1
X
k
=1
a
k

i

k
=
i
X
k
=

a
i

k

k
;
(2.1)
where,
a
k
=
c
0
k

1+
d
;k

1
;
0
<d<
1
2
andsomeconstant
c
0
>
0
;
and
a
k
=0for
k

0
:
Also,

j
;j
2
Z
:=
f
0
;

1
;

2
;
g
arei.i.d.r.v.'swithmeanzeroandvariance
˙
2

.For
notationalconvenience,weshallassume
c
0
=1and
˙
2

=1
;
withoutlossofgenerality.Note
9
that
f
"
i
;i
2
Z
g
isastationaryprocesswithautocavariancefunction

"
(
k
)=
1
X
j
=1
a
j
a
j
+
k
=
k

1+2
d
B
(
d;
1

2
d
)(1+
o
(1))
;
0
<d<
1
=
2
;k
!1
;
(2.2)
where
B
(
a;b
):=
R
1
0
u
a

1
(1

u
)
b

1
du;a>
0
;b>
0,see,e.g.,Proposition3.2.1(ii)inGKS.
Thisauto-correlationstructureinducesalongmemorystructureonthemodelerrors
";
i.e.,
P
1
k
=1
j

"
(
k
)
j
=
1
:
AsmentionedinChapter1,theLassoestimateof

isas,
^

(

)=argmin

ˆ
1
n
k
y

X
k
2
2
+

k

k
1
˙
;>
0
:
(2.3)
ThischapterisdevotedtounderstandingthepropertiesofLassounderourlongmemory
setup.Thethreemaincontributionsofthischapterareasfollows.First,weshowthatthe
probabilityboundforapresetcontrollingthestochastictermmax
1

j

p
j
X
T
j
"
j
can
beobtainedwithalongmemorymovingaverageprobabilitystructureon
f
"
i
;
1

i

n
g
underappropriaterestrictionsontherateofincreaseofthedesignvariablesandwiththe
properchoiceoftheregularizer

n
:
Here
X
j
denotesthe
j
th
columnofthedesignmatrix
X:
Thisresultcanbeusedtoobtainthedesirederrorboundforthestatisticalerrorassociated
withLassointhissetup.Secondly,weobtainsignconsistencyofLassounderthelong
memorysetupwithstandardrestrictionsonthedesignmatrix
X:
Theseresultsareobtained
inthehighdimensionalsetup.Lastly,weprovideconsistencyand
n
1
2

d
-consistencyofLasso
inthecasewhere
p
isxed,undercertainassumptionsonthedesignvariables
X:
Thisproof
isalsoextendedtoderivetheoraclepropertyforamoversionofLassoknownasthe
adaptiveLasso.Thepricethatwepaytotacklethepersistentcorrelationamongtheerror
sequenceisthattherateofincreaseofthedimension
p
inthehighdimensionalsetting,and
10
therateofconvergenceinthe
n>p
settingissloweddownbyafactorof
n
d
:
Allresultsinthehighdimensionalsettingallowthedesignvariablestogrowwiththe
restriction
P
1

i

n
x
2
ij
=
O
(
n
)
;
andhencetheresultsobtainedcanalsoeasilybeextended
tocaseofGaussianrandomdesigns.Furthermore,allresultsprovedinthehighdimensional
setupinthischapterallow
p
togrowexponentiallywith
n:
Thischapterisorganizedasfollows.Section2.1belowinvestigatesthesample
propertiesofLasso.Section2.2investigatesthesignconsistencyofLasso.Section2.3
providestheasymptoticpropertiesofLassoandalsotheoraclepropertyofadaptiveLasso
inthe
p
setup.Section2.4presentsasimulationstudytoanalysethesample
performanceofLassointhecurrentsetup.
2.1ResultswithFiniteSample
InthissectionweproveasampleoracleinequalityfortheLassosolutionwhenthe
designisnonrandom.Thisinturnwillimplytheconsistencyaswell.Basedontheassump-
tionsofthedesignvariablesitwillsoonbeclearthattheresultscanbeeasilyextendedto
Gaussianrandomdesignsaswell.Accordingly,inthissubsectionweassume
x
i
'sarenon
random.Toproceedfurther,weshallneedthefollowingnotation.Let
W
nj
=
n

(1
=
2+
d
)
n
X
i
=1
x
ij
"
i
=
n

(1
=
2+
d
)
n
X
i
=1
i
X
k
=

x
ij
a
i

k

k
=
n
X
k
=

c
nk;j

k
;
(2.4)
11
where
c
nk;j
:=
n

(1
=
2+
d
)
n
X
i
=1
x
ij
a
i

k
;k
2
Z
;j
=1
;

;p;
(2.5)
c
n;j
:=sup

<k

n
j
c
nk;j
j
;c
n
=max
1

j

p
c
n;j
:
Also,let,
˙
2
n;j
:=
Var
(
W
nj
)
;˙
2
n
=max
1

j

p
˙
2
n;j
:
(2.6)
Weshallprovethat,withanappropriatechoiceof

n
,theLassosolutionobeysthe
followingoracleinequalityinthelongmemorycase,withoverwhelmingprobability,i.e.,for
any
n

1
;
withprobabilityapproaching1
;
k
X
(
^


)
k
2
2
=n
+

n
k
^


k
1

4

2
n
s
0
˚
2
0
Here

n
=(
O
log
p=n
1
=
2

d
),undersomeconditionsonthedesignmatrix.Also,
s
0
isthe
cardinalityofthesetofnonzerocomponentsof

and
˚
0
isaconstant.
Thekeyresultrequiredfortheproofinvolvesobtainingaprobabilityboundfortheset
=
(
max
1

j

p
2
n

1
j
n
X
i
=1
x
ij
"
i
j

0
n
)
;
(2.7)
foraproperchoiceof

0
n
.Oncethisprobabilityboundisobtained,theoracleinequality
followsbydeterministicarguments(Seee.g.BuhlmannandvandeGeer(2011)).Infactwe
havethefollowing
Proposition2.1.1
Let
"
i
beasdin(2.1)withtheinnovationdistributionsatisfying
12
theCramer'scondition:Forall
k

2
andsome
0
<D<
1
,
E
j

0
j
k

D
k

2
k
!
E
2
0
:
(2.8)
For
t>
0
,

0
n
=

B
n
(
t
2
+4log
p
)+
q
B
2
n
(
t
2
+4log
p
)
2
+16
˙
2
n
(
t
2
+4log
p
)

2
n
1
=
2

d
;
(2.9)
where
B
n
:=
c
n
D
.Then,forall
1

j

p
andforall
n

1
,
P
 
2


n

1
n
X
i
=1
x
ij
"
i


>
0
n
!

2exp

(
t
2
+4log
p
)
=
4
g
:
(2.10)
Consequently,
P


1

2exp(

t
2
4
)
;n

1
:
(2.11)
Theproofoftheabovepropositionwillrequireseverallemmas,henceispostponedto
Section2.5ofthischapter.ThekeytotheproofisanapplicationoftheBernsteininequality
topartialsumsandthenpassingtolimit.
WecannowproceedtodescribetheoracleinequalityfortheLassosolution.Thecorre-
spondingresultswithi.i.d.errorsareprovedinBuhlmannandvandeGeer(2011,chapter
6).Inwhatfollows,
S
0
denotesthecollectionofindicesofthenonzeroelementsofthetrue

asin(1.1)and
s
0
denotesthecardinalityof
S
0
:
Also,forany

2
R
p
;
S
0
denotes
thevectorofthosecomponentsof

whichhavetheirindicesin
S
0
.Inordertoobtainthe
followinginequalitywerequirethe`compatibilitycondition'onthedesignmatrix
X:
This
13
conditionisasgiveninBuhlmannandvandeGeer(2011),whichisrestatedhereforthe
convenienceofthereader.
2.1.1
Wesaythe
Compatibilitycondition
ismetfortheset
S
0
;
ifforsome
˚
0
,andforall

satisfying
jj

S
c
0
jj
1

3
jj

S
0
jj
1
;
jj

S
0
jj
2
1

(

T
^


)
s
0
˚
2
0
;
with
^
=
X
T
X=n:
Theorem2.1.1
Assumethatthecompatibilityconditionholdsfor
S
0
:
Forsome
t>
0
let
theregularizationparameterbe

n

2

0
n
;
where

0
n
isgivenin(2.9).Thenwithprobability
atleast
1

2exp(

t
2
=
4)
,wehave
k
X
(
^


)
k
2
2
=n
+

k
^


k
1

4

2
n
s
0
˚
2
0
:
(2.12)
TheproofofTheorem2.1.1isthesameasin(BuhlmannandvandeGeer(2011,chapter
6)),withthevalueof

0
n
changedtotheonegivenin(2.9).Thisresultholdsontheset
whichhastherequiredhighprobabilitybyProposition2.1.1.

Theonlyassumptionswehavemadesofarare(i)Cramer'sConditionin(2.8)onthe
innovationdistributionand(ii)TheCompatibilityConditionin3.15onthedesign
variables.ItmaybeofinteresttomentionthatGaussianityoftheerrordistributionhas
notbeenassumed.Thepricethatwehavepaidforthisgeneralityisthat

0
n
as
in(2.9)isnowitselfdatadriven,i.e.

0
n
alsodependsonthedesignvariables
x
i
's.Thus,
keepinginviewTheorem2.1.1,itisofinteresttoanalysetherateofconvergenceof

0
n
:
Thefollowinglemmaandremarkgiveadditionalconditionsonthedesignvariables,andthe
14
rateofincreaseofthedimension
p;
underwhich

0
n
willconvergeto0.
Lemma2.1.1
Let
X
=(
x
ij
)
n

p
bethedesignmatrixandsupposethefollowingcondition
holds
8
1

j

p;
n

1
n
X
i
=1
x
2
ij

C;
forsome
C<
1
:
(2.13)
Thenwith
c
n
and
˙
2
n
asdin(2.5),(2.6)respectively,wehave,
c
n
=
o
(1)
and
˙
2
n
=
O
(1)
.
Since
B
n
=
c
n
D;
with
D
beingaconstant,theabovelemmaimplies
B
n
!
0
:
Remark2.1.1
Now,recalltheof

0
n
from(2.9).Assumethedesignvariables
satisfycondition(2.13).Furtherassume,log
p
=
o
(
n
1
=
2

d
),then,

0
n
!
0
:
ThefollowingpropositionwillyieldtheconsistencyoftheLassosolution.
Proposition2.1.2
Forsome
t>
0
;
let

n

2

0
n
;
where

0
n
isdin(2.9).Thenon
theset

;
withprobabilityatleast
1

2exp(

t
2
=
4)
;
2
k
X
(
^


)
k
2
2
=n

3

k

k
1
:
(2.14)
Asmentionedearlier,theproofofthistheoremfollowsdeterministicargumentsontheset

;
(SeeBuhlmannandvandeGeer(2011,chapter6)).Theprobabilityofthesetisgiven
inProposition2.1.1.
15
Remark2.1.2
ConsistencyofLasso
:Assumethefollowinghold,
(
i
)log
p=n
1
=
2

d
!
0
;
(
ii
)
Assumption(2.13)holds
;
(
iii
)
k

k
1
=
o
(
n
1
=
2

d
=
log
p
)
:
(2.15)
ThenbyRemark2.1.1wehave

0
n
!
0
:
Also,assumption(iii)ensurestherighthandside
oftheinequality(2.14)convergestozero.Hence,Proposition2.1.2alongwithLemma2.1.1
yieldstheconsistencyoftheLassosolution.
Remark2.1.3
RandomDesign
:Therearetwoassumptionsmadeonthedesignvariables
inordertoobtaintheerrorboundinTheorem2.1.1andtheconvergenceof

0
n
tozeroin
Remark2.1.1.(i)Compatibilityconditiongivenin(3.15)and(ii)condition(2.13)which
restrictstherateofincreaseofthedesignvariables.Theseconditionscanbeshownto
holdinthecaseofGaussianrandomdesignswithindependentrows.UsingTheorem1of
Raskutti(2010),condition(i)canbeshowntoholdwithhighprobability(increasingto
1exponentially).Ifthemaximumvariancecomponentofthedesignvariablesisbounded
abovebyaconstant,then(ii)canbeshowntoholdwithhighprobabilityusingboundsfor
chi-squaredistributionsgiveninJohnstone(2001).Hencetheaboveresultsremainvalid
withhighprobabilitywhenthedesignvariablesareGaussianwithindependentrowsand
independentofthemodelerrors
":
16
2.2SignConsistencyofLassounderLongMemory
InthissectionweprovethesignconsistencyofLassoforthemodel(1.1),(2.1).Theresults
inthissectionaresimilarinspirittoZhaoandYu(2006)andweshallfollowthestructureof
theirproofs.Theyworkedinthei.i.d.setupwhereaswewillbeworkinginthelongmemory
setup.Webeginwithaandsomenotations.
2.2.1
Lassoissaidtobestronglysignconsistentifthereexists

n
=
f
(
n
)
,that
is,afunctionofnandindependentof
y
n
or
X
n
suchthat,
lim
n
!1
P
(
^

n
(

n
)=
s

n
)=1
:
Heretheequalitydenotesequalityinsign,i.e.,
^

n
=
s

n
ifandonlyif
sign
(
^

n
)=
sign
(

n
),
where
sign
(

j
)assignsavalue+1toapositiveentry,

1toanegativeentryand0toazero
entry.
Assume

n
=(

n
1
;:::;
n
q
;
n
q
+1

n
p
)
T
,where

n
j
6
=0
;j
=1
;::;q
,and

n
j
=0,
j
=
q
+1
;::;p
.Let

n
(1)
=(

n
1
;
n
q
)
T
and

n
(2)
=(

n
q
+1
;
n
p
)
T
:
Denote
X
(1)asthe
q
columnsof
X;
correspondingtothenonzerocomponentsof

n
:
Denote
X
(2)asthelast
p

q
columnsof
X;
correspondingtothezerocomponentsof

n
:
Let
C
n
=
n

1
X
T
X:
Then
bysetting
C
n
11
=
n

1
X
(1)
T
X
(1),
C
n
22
=
n

1
X
(2)
T
X
(2)
;C
n
12
=
n

1
X
(1)
T
X
(2)=(
C
n
21
)
T
,
C
n
canthenbeexpressedas
C
n
=

C
n
11
C
n
12
C
n
21
C
n
22

:
Inwhatfollows,wedonotexhibitthedependenceof

,
^

on
n
fortransparencyofthe
exposition.Assuming
C
n
11
isinvertible,theStrongIrrepresentableconditionasby
ZhaoandYuis,
17
StrongIrrepresentableCondition
:Thereexistsavector
;
withconstant,positivecom-
ponents,suchthat,
j
C
n
21
(
C
n
11
)

1
sign(

(1)
)
j
1

;
(2.16)
where
1
isa(
p

q
)

1vectorofonesandtheinequalityholdselement-wise.
Thefollowingpropositionwillserveasatooltoderivethesignconsistencyinthepresent
setup.
Proposition2.2.1
Assumethestrongirrepresentableconditionholdswithavector

,with
allcomponentspositive.Then
P
(
^

(

n
)=
s

)

P
(
A
n
\
B
n
)
;
for
A
n
=

j
(
C
n
11
)

1
W
(1)
j
<n
1
2

d
(
j

(1)
j

n
2
j
(
C
n
11
)

1
sign
(

(1)
)
j
)

;
(2.17)
B
n
=
fj
C
n
21
(
C
n
11
)

1
W
(1)

W
(2)
j

n
2
n
1
2

d

g
;
(2.18)
where
W
(1)=
X
(1)
T
"
n
1
2
+
d
and
W
(2)=
X
(2)
T
"
n
1
2
+
d
:
(2.19)
18
Thispropositionprovidesalowerprobabilityboundfortheequivalenceinsignofthe
Lassoestimateandthetrue

vector.Theproofisdeterministicandhencetheconclusion
holdswithanyprobabilisticstructureon
":
Itisalsoworthmentioningthatthisproposition
holdswithoutanyrestrictiononthedimension
p;
henceweshallbeabletoobtainsign
consistencyunderthecasewhere
p
isincreasingwith
n:
Inthefollowing,weshallassumethefollowingconditionsonthedesignmatrixandthe
modelparameters.Assumethereexists0

c
1
<c
2
<
1

2
d
and
M
1
;M
2
;M
3
>
0,sothat
1
n
x
T
i
x
i

M
1
;
8
i
2f
1
;:::;n
g
;
(2.20)

0
C
11


M
2
;
8

3jj

jj
2
2
=1
;
(2.21)
q
n
=
O
(
n
c
1
)
;
(2.22)
n
1
2

d

c
2
2
min
1

i

q
j

i
j
M
3
:
(2.23)
UndertheaboveassumptionsweobtainthefollowingsignconsistencyresultforLasso
inthelongmemorycase.
Theorem2.2.1
Supposethelongmemoryregressionmodel(1.1)and(2.1)hold,withthe
innovationdistributionsatisfyingtheCramer'scondition(2.8).Thenunderthecondi-
tions(2.16),(2.20),(2.21),(2.22),(2.23),ifforsome
0
<c
3
<c
4
<
(
c
2

c
1
)
=
2
;

n
/
n

(1
=
2

d

c
4
)
and
p
n
=
O
(
e
n
c
3
)
;
then
P
(
^

(

n
)=
s

)
!
1
:
(2.24)
TheproofisdetailedintheSection2.5.
19
2.3Asymptoticswhen
p
is
2.3.1Asymptoticdistributionof
X
T
"
When
n>p
and
p
ised,theasymptoticpropertiesofLassorelycriticallyontheasymp-
toticdistributionofsuitablynormalized
X
T
":
Thisdistributionisstraightforwardtoobtain
inthecaseofi.i.d.errors.Herewepresenttheasymptoticdistributionofnormalized
X
T
"
.
Thisdistributionhasessentiallybeenobtainedinchapter4ofGKS,wheretheauthorsgive
CLT'sforweightedsumsofalongmemorymovingaverageprocess.
T
n
asthenon
normalizedweightedsums
W
n
asgivenin(2.4),i.e.
T
n
=
n
1
2
+
d
W
n
:
Weuse
T
n
insteadof
W
n
torelatethefollowingmorecloselytoGKS.Notethat
T
n
=
X
T
":
Ourgoalistoestablishtheasymptoticdistributionofsuitablynormalized
T
n
.Thisin
turnisfacilitatedbyTheorem4.3.2ofGKS,pp70.Westateaslightlymoversionof
thistheoremwhichcanbeprovedeasilybyfollowingthesamearguments.Inthefollowing
denoteby
n
=Cov(
T
nj
;T
nk
)
p
j;k
=1
:
Theorem2.3.1
Let
f
x
ij
g
n
i
=1
;j
=1
;:::;p;
beparraysofrealweightsand
f
"
i
g
bethesta-
tionarylinearprocessasdin(2.1).Assumetheweights
f
x
ij
g
n
i
=1
satisfythefollowing
condition
8
j
=1
;:::;p;
(
i
)max
1

i

n
j
x
ij
j
=
o
(
n
1
2
+
d
)
and
(
ii
)
n
X
i
=1
x
2
ij

C
j
n
1+2
d
;
(2.25)
andforsomematrix

,
n

(1+2
d
)

n
!

:
(2.26)
20
Then,
n

(
1
2
+
d
)
X
T
"
=
n

(
1
2
+
d
)

T
n
1
;:::;T
np

!
D
N
(0
;

:
Corollary2.3.1
Supposetheweights
f
x
ij
g
satisfy(2.13).Then,
n

(1+2
d
)

n
=
O
(1)
com-
ponentwise,forany
0
<d<
1
=
2
:
Moreover,if(2.26)holdsthen,
n

(
1
2
+
d
)
X
T
"
!
D
N
(0
;

Remark2.3.1
Theorem2.25assumestheconvergence(2.26),Corollary3.57showsthat
underafurtherrestrictiononthedesignmatrix(2.13)wehave
n

(1+2
d
)

n
=
O
(1)
;
however,
weareunabletoshowconvergenceoridentifythelimitwithoutfurtherassumptionson
thedesignmatrix.Ontheotherhand,ifweassumethefollowingstructureonthedesign
variables,thislimitcanthenbeexplicitlycomputed.Let
g
j
:[0
;
1]
!
R
;j
=1
;:::;p;
and
x
i
=(
g
1
(
i=n
)
;

;g
p
(
i=n
))
T
;i
=1
;:::;n;
(2.27)
whereweassumethat
g
j
isacontinuousfunctionwith
k
g
j
k
2
:=
R
1
0
g
2
j
(
u
)
du<
1
;
8
1

j

p:
Underthisstructureonthedesignvariables,wehave
8
1

j;k

p

j;k
:=lim
n
!1
n

(1+2
d
)
Cov(
T
nj
;T
nk
)=
B
(
d;
1

2
d
)
Z
1
0
Z
1
0
g
j
(
u
)
g
k
(
v
)
j
u

v
j

1+2
d
dudv;
where
B
(
d;
1

2
d
)isin(2.2)and
j;k
isthe(
j;k
)
th
componentof
:
Thisstructure
onthedesignvariableshasbeenusedinDahlhaus(1995)inthecontextofpolynomial
regressionwithlongrangedependentregressionerrors.AshortproofisgivenintheSection
2.5.
2.3.2AsymptoticPropertiesofLasso
KnightandFu(2000)provedthatinthecaseofi.i.d.errors,Lassoestimates
^

converge
inprobabilitytothetruecotvector
;
withanoptimalchoiceoftheregularizer

n
:
21
TheyalsoshowthatLassois
p
n
-consistent(asymptoticnormality).Hereweshallpresent
analogousresultswhentheerrorsareassumedtobelongmemorymovingaverage.Inthis
sectionweshallrequirethefollowingassumption.
n

1
X
T
X
!
C;
whereCisapositivematrix.
(2.28)
Theorem2.3.2
Forthelongmemoryregressionmodel(1.1)and(2.1)assumethatthe
designvariablessatisfy(2.25),(2.26),and(2.28).Further,if

n
issuchthat

n
!

0

0
;
then
^

n
!
p
argmin
˚
(
Z
(
˚
))
,where
Z
(
˚
)=(
˚


)
T
C
(
˚


)+

0
p
X
j
=1
j
˚
j
j
;˚
2
R
p
:
Thus,if

n
=
o
(1)
then
argmin
˚
(
Z
(
˚
))=

and
^

n
(

n
)
isconsistentfor

Theorem2.3.3
Forthelongmemoryregressionmodel(1.1)and(2.1)assumethatthe
designvariablessatisfy(2.25),(2.26),and(2.28).Suppose
n
1
2

d

n
!

0

0
as
n
!1
,
then
n
1
2

d
(
^

n


)
!
D
argmin
u
V
(
u
)
;
where
V
(
u
)=

2
u
T
W
+
u
T
Cu
+

0
p
X
j
=1
[
u
j
sign
(

j
)
I
[

j
6
=0]
+
j
u
j
j
I
[

j
=0]
j
]
;
and
W
isa
N
p
(0
;

r.v.
Notethat,when

0
=0
;
argmin
V
(
u
)=
C

1
W;
where
W
˘N
p
(0
;

:
Theabovetwo
theoremshighlightthedesirableasymptoticpropertiesofLassointhecurrentsetup.In
22
particular,when

0
=0
;
Theorem2.3.2guaranteesestimationconsistency,whileTheorem
2.3.3guaranteesthe
n
1
2

d

consistency
:
Thetechniqueusedtoprovetheabovetheoremsistonormalizethedispersionfunc-
tionappropriatelyinordertousetheasymptoticnormalityof
n

(
1
2
+
d
)
X
T
";
incontrastto
n

1
2
X
T
"
inthei.i.d.case.TheproofisdetailedintheSection2.5.
2.3.3AdaptiveLasso
TheadaptiveLassofromLassointhewayparametersarepenalized.Tobemore
precise,forany
>
0
;
theweightvector^
w
=1
=
j
(
^

n
)
j

;
with
^

n
beinganyestimate
of

suchthat
n
1
2

d
(
^

n


)=
O
p
(1)componentwise.TheadaptiveLassoestimates
~

n
are
givenby,
~

n
=argmin

8
<
:
1
n
k
y

X
k
2
2
+

n
p
X
j
=1
^
w
j
j

j
j
9
=
;
:
(2.29)
Let
A
=
f
j
:

j
6
=0
g
,
A
?
n
=
f
j
:
~

n
j
6
=0
;
1

j

p
g
and

A
;
~

n
A
bethecorresponding
vectorswithonlythosecomponentswhoseindicesareintheset
A
:
AsstatedinZou(2006),anestimatorissaidtohave
oracleproperty
ifthefollowing
hold,
1.
Asymptotically,therightmodelisideni.elim
n
!1
P
(
A
?
n
=
A
)=1
:
2.
Theestimatorhasanoptimalestimationrate,
n
1
2

d
(
~

n
A


A
)
!
D
N
(0
;

?
),forsome
covariancematrix
?
:
TheadaptiveLassohasanadvantageoverLasso,sinceitpossessesadesirablevariable
selectionpropertyundermildassumptions.Ontheotherhand,asseeninSection3,for
23
Lassotobesignconsistent,werequirethestrongirrepresentableconditionwhichisamuch
strongerassumption.ThefollowingtheoremshowsthispropertyoftheadaptiveLasso.In
otherwords,theadaptiveLassoenjoystheoraclepropertyinthelongmemorycase.Let
A
bethelimitingcovariancematrixin(2.26)withonlythosecomponentswhoseindicesarein
theset
AA
:
Theorem2.3.4
Forthelinearmodel(1.1),assumethedesignvariablessatisfy(2.25),
(2.26)and(2.28).Lettheregularizer

n
besuchthat
n
1
2

d

n
!
0
;
and
n
1
2
+

2

d


n
!
1
:
ThentheadaptiveLassomustsatisfythefollowing.
1.
Variableselectionconsistency,
lim
n
!1
P(
A
?
n
=
A
)=1
:
2.
Asymptoticnormality,
n
1
2

d
(
~

n
A


A
)
!
D
(
C
n
11
)

1
N
(0
;

A
)
Remark2.3.2
Fortheadaptiveweights^
w
=1
=
j
^

j

;
wecanchoose
^

astheordinaryleast
squareestimate.IthasalreadybeenshowninGKSthat
n
1
2

d
(
^

n


)=
O
p
(1)
;
whichis
therequiredconditionthattheweightsmustsatisfy.
2.4SimulationStudy
InthissectionwenumericallyanalysetheperformanceofLassounderlongrangedependent
setup.Wealsocompareitsperformancetothatinthei.i.d.setup.Allsimulationsweredone
inR,theestimationofLassowasdoneusingthepackage`glmnet'developedbyFriedman,
Hastie,Tibshirani(2013).Theregularizer

n
waschosenbyefoldcross-validation.
Simluationsetup
:Inthisstudy,

waschosenasa1000

1vector,withthetwenty
ecomponentschosenindependentlyfromauniformdistributionovertheinterval(-2,5),all
othercomponentsof

weresettozero.Thecovariates
x
i
arei.i.d.observationsfroma1000
24
dimensionalGaussiandistributionwitheachcomponenthavingmeanandvarianceone.We
setthepairwisecorrelationtobecor(
x
ij
;x
ik
)=0
:
5
j
j

k
j
:
Thisdesignmatrixhasbeenused
byTibshirani(1996)andmanyauthorssincethen.Themodelerrorvector
"
isgenerated
usingthe(2.1)with
c
0
=1and
d
=0
:
15
;
0
:
25
;
0
:
35
;
0
:
45
;
withtheinnovations
beingi.i.d.Gaussianr.v.asgivenin(2.1)withmeanzeroandstandarddeviation
˙

=3
:
5
:
Thesimulationswererepeated100times,i.e.100datasetsweregeneratedundertheabove
setupwiththesameparametervector

.
Sincewehavechosend,thecorrespondingvarianceofeachcomponentofthestationary
errorprocesscanbecomputedas,Var(
"
i
)=
˙
2

P
1
k
=1
k

2+2
d
8
i;
whichturnsouttobe
25.16,31.98,47.64and100.94correspondingtod=0.15,0.25,0.35,0.45respectively.
Webeginbyillustratingthetcorrelationamongthecomponentsoftheregres-
sionerrorvector
":
Figure1and2presentthesampleauto-correlationfunctionsoftheerror
vector
"
ofthemodelofthe100simulateddatasets.Figure1&2aboveexhibitthe
Figure2.1:Lagvssampleauto-correlationfunctionwithd=0.15andd=0.25.
slowdecayoftheautocorrelationamongtheerrorsequence
":
Thisslowrateofdecayisin
coherencewithlongmemorydependence,since
P
1
k
=1
j

"
(
k
)
j
=
1
:
Also,itisevidentfrom
theabovetwothatthestrengthofthedependenceisincreasingasdincreases.
25
Figure2.2:Lagvssampleauto-correlationfunctionwithd=0.35andd=0.45.
Forcomparisonpurposes,weshallalsoperformthesamesimulationstudywiththe
errors
"
=(
"
1
;"
2
;:::;"
n
)beingi.i.d.Gaussianobservationswithmean0andvariance25.16,
31.98,47.64,100.94whichcorrespondtothevariancesofthecomponentsofthestationary
sequence
"
underthelongmemorysetupcorrespondingto
d
=0
:
15
;
0
:
25
;
0
:
35
;
0
:
45
:
The
reasontochoosethesamevarianceof
"
i
asinthelongmemorysetupistomaintainthe
samesignaltonoiseratio.
Nowweproceedtotheestimationpart.Inourstudywesimulated100tre-
alizationsofthedesignmatrix
X
andtheerrorvector
":
Thusleadingto100datasets
withthesameparametervector
:
ForperformancecomparisonweshallreporttheRelative
EstimationError(REE),i.e
k
^


k
2
=
k

k
2
andtheRelativePredictionError(RPE)as
inZou(2006),i.e.theempiricalestimateof
E
k
^
y

X
T

k
2
=˙
2
"
:
Also,weshallreport
thenumberofcorrectlyestimatednon-zeroparameters(NZ)andthenumberofincorrectly
estimatedzeroparameters(IZ).Recallthatinthetruemodelthereare25non-zeroand975
zeroparameters.Table1summarizesthesimulationresultsunderthelongmemorysetup
andTable2summarisestheresultsunderthei.i.d.setup.
26
Table2.1:MediansofRPE,REE,NZ&IZwithGaussiandesign,longmem.errors
d=0.15
d=0.25
d=0.35
d=0.45
(n)
REE
RPE
NZ
IZ
REE
RPE
NZ
IZ
REE
RPE
NZ
IZ
REE
RPE
NZ
IZ
100
0.216
0.62
14
33
0.23
0.62
14
33.5
0.24
0.61
14
34.5
0.28
0.46
14
32
200
0.13
0.47
15
30
0.13
0.44
14
32.5
0.14
0.41
14
35
0.18
0.38
14
35
300
0.10
0.39
15
33
0.11
0.36
15
35
0.11
0.38
15
34
0.14
0.31
15
41
400
0.09
0.33
16
39
0.09
0.31
16
40
0.10
0.32
15
36
0.12
0.31
15
41
700
0.05
0.23
20
60
0.06
0.21
19
59.5
0.07
0.23
18
50
0.08
0.22
17
52
Table2.2:MediansofRPE,REE,NZ&IZwithGaussiandesign,i.i.d.errors
Var(
"
i
)=25
:
16
Var(
"
i
)=31
:
98
Var(
"
i
)=47
:
64
Var(
"
i
)=100
:
94
(n)
REE
RPE
NZ
IZ
REE
RPE
NZ
IZ
REE
RPE
NZ
IZ
REE
RPE
NZ
IZ
200
0.14
0.42
14
29.5
0.15
0.38
14
29
0.18
0.33
14
29
0.25
0.29
14
30
400
0.09
0.28
16
32
0.10
0.26
15
31
0.12
0.22
15
28
0.15
0.18
14
28
Interpretation:

Lassoisadesirableestimationprocedureinourlongrangedependentsetup.Itper-
formsaccurateestimationatalllevelsofdependence,fromd=0.15tod=0.45.Itis
evidentfromthesimulationresultsthattheestimationbecomesincreasinglyaccurate
intermsofbothREEandRPEasthesamplesizeincreases.Atn=400,therelative
errorinestimationof

isaround10%atalllevelsofdependence.Asthereadermight
observe,Itwasexpectedthatatanysamplesize,RPEshouldincreaseas
d
in-
creases,howeverthisisnotthecase,thereasonforthisis,weusecrossvalidationto
choose

n
andnotthetheoreticalvalueof

n
derivedearlier.

Intermsofvariableselection,Lassoisincreasinglysuccessfulinchoosingthenonzero
parametersasthesamplesizeincreases.Byn=700,itidenaround20ofthenon
zeroparametersforalllevelsofdependence.TheparametersthatLassoisconsistently
unabletoselectaretheonesthataretoosmallinsize,i.einourmodelwehavefour
parameterswhere
j

j
j
<
0
:
65
;j
=3
;
7
;
15
;
19
;
anditistheseparametersthatLassois
consistentlyunabletodetect,uptothesamplesizen=700.Thepointherebeing,this
27
isaknowndrawbackofLassoconnectedwithassumption(2.23),anditisnotdueto
thelongmemorydependencestructureontheerrors.Thisisbytheresults
forthei.i.d.case,whichexhibitsthesameproblem.Theabovesimulationalsobrings
outanotherfamiliardrawback,asthereadermightobserve,althoughLassomanagesto
correctlyestimateatportionofthezeroparameters(around95%atn=700),
howeverthenumberofincorrectlyestimatedzeroparameters(IZ)isnotdecreasing
asthesamplesizeincreases.Thisagainisnotduethelongmemoryerrorsbutisan
inherentdrawbackofcrossvalidation.Thiscanagainbebytheresultsin
thei.i.d.caseatthevariancelevels47.64and100.94whereIZdoesnotdecreaseasn
increasesfrom200to400.

ComparingRPEinthelongmemorycaseandthei.i.d.case,asexpected,weobserve
thatthelongmemorycaserequireslargernumberofobservationstoreachthesame
levelofaccuracy,keepinginmindthatthevarianceofcomponentsof
"
issimilarfor
boththedependentandindependentcase.
2.5ProofsforChapter2
Theproofof
Lemma2.1.1
,willfollowaftertwokeylemma's.Toproceedfurtherwerequire
thefollowingnotation,
Let
r
beaepositiveintegerand
8
1

j

r;
let
h
j
=(
h
1
j
;h
2
j
;:::;h
nj
)
T
beavector
ofweights.Further,let
c
nk;j
=
P
n
i
=1
h
ij
a
i

k
andfor1

j

r;
W
n;j
=
h
T
j
"
=
n
X
i
=1
h
ij
"
i
=
n
X
i
=1
i
X
k
=

h
ij
a
i

k

k
=
n
X
k
=

c
nk;j

k
:
(2.30)
28
Further
c
n;j
=sup

<k

n
j
c
nk;j
j
;c
n
=max
1

j

r
c
n;j
:
(2.31)
Also,denoteby
˙
2
n;j
=
Var
(
W
n;j
)
;˙
2
n
=max
1

j

r
˙
2
n;j
:
(2.32)
Observethat,
˙
2
n;j
=
n
X
l;m
=1
h
lj
h
mj

"
(
l

m
)
:
Furthermore,ifweset
h
ij
=0
8
i>n
and
i

0
;
thenundertheassumption
P
n
i
=1
h
2
ij

M=n
2
d
;M<
1
;
weobtainusing(2.2)thatforall1

j

r;
˙
2
n;j
=
c

n

1
X
s
=

(
n

1)
;s
6
=0
p
(
s;j
)
j
s
j

1+2
d
+
o
(1)
=
c

n
X
m
=1
n
X
l
=1
;l
6
=
m
h
mj
h
lj
j
l

m
j

1+2
d
+
o
(1)
:
(2.33)
Here,
p
(
s;j
):=
P
n
i
=1
h
ij
h
(
i
+
s
)
j
;
and
c

=
B
(
d;
1

2
d
)asgivenby(2.2).
Notethat,ifwereplace
h
ij
by
n

(
1
2
+
d
)
x
ij
intheaboveof
W
nj
thenwe
obtain(2.4).Thismoregeneralof
W
nj
willbeessentiallaterintheproofofsign
consistency.
Lemma2.5.1
Foranypositiveinteger
r
,andforall
1

j

r
,let
h
j
=(
h
1
j
;:::;h
nj
)
T
be
anyvectorofweightssuchthat
k
h
j
k
2
2
=
P
n
i
=1
h
2
ij

M=n
2
d
;
forsomeconstant
M<
1
.Let
29
˙
2
n
beasdin(2.32).Then
˙
2
n
=
O
(1)
:
Proof.
First
j
p
(
s;j
)
j
n
X
i
=1
j
h
ij
h
(
i
+
s
)
j
j
(
n
X
i
=1
h
2
ij
)
1
=
2
(
n
X
i
=1
h
2
ij
)
1
=
2

M=n
2
d
;
andhence
Var
(
W
n;j
)=
c

n

1
X
s
=

(
n

1)
;s
6
=0
p
(
s;j
)
j
s
j

1+2
d
+
o
(1)

c

M
n
2
d
n

1
X
s
=

(
n

1)
;s
6
=0
j
s
j

1+2
d
+
o
(1)

c

M
n
n

1
X
s
=

(
n

1)
;s
6
=0
j
s=n
j

1+2
d
+
o
(1)
!
M
0
Z
1

1
j
t
j

1+2
d
dt:
(2.34)
Observethattheboundin(2.34)isfreeof
j;
hencetheclaimfollows.

Lemma2.5.2
Foranypositiveinteger
r
,andforall
1

j

r
,let
h
j
=(
h
1
j
;:::;h
nj
)
T
be
anyvectorofweightssuchthat
k
h
j
k
2
2
=
P
n
i
=1
h
2
ij

M=n
2
d
;
forsomeconstant
M<
1
.
Thenfor
c
n
asdin(2.31)wehave,
c
n
=
o
(1)
:
Proof
TheideaoftheproofisborrowedfromGKSaspartofProposition4.3.1,pp66,where
itisusedinatcontext.Firstobserve,sinceforall1

j

r;
P
n
i
=1
h
2
ij

M=n
2
d
,
30
)
1
max
1

i

n
1

j

r
j
h
ij
j

n
d
=
p
M
!1
:

K
n
:=
1
max
1

i

n
1

j

p
j
h
ij
j
;
andconsider
j
c
nk;j
j
n
X
i
=1
j
h
ij
a
i

k
j

n
X
i
=1
j
h
ij
a
i

k
j
I
(
j
i

k
j
K
n
)
+
n
X
k
=1
j
h
ij
a
i

k
j
I
(
j
i

k
j
K
n
)
=:
q
n;
1
k;j
+
q
n;
2
k;j
q
n;
1
k;j

(
n
X
k
=1
h
2
ij
)
1
=
2
(
n
X
k
=1
a
2
i

k
I
(
j
i

k
j
K
n
))
1
=
2

C=n
d
n
X
l

K
n
a
2
l
!
0
:
(2.35)
q
n;
2
k;j

max
1

i

n;
1

j

r
j
h
ij
j
n
X
i
=1
j
a
i

k
j
I
(
j
i

k
j
K
n
)

K

1
n
K
1
=
2
n
(
1
X
l
=0
a
2
l
)
1
=
2

CK

1
=
2
n
!
0
:
(2.36)
Sincetherighthandsideof(2.35)and(2.36)arefreeof
j;
henceweobtain,
c
n
=
o
(1).

ProofofLemma2.1.1
:Intheabovesetup
8
1

j

p;
let
h
ij
=
n

(
1
2
+
d
)
x
ij
;
1

i

n:
Then,undertheassumption(2.13),wehave,
P
n
i
=1
h
2
ij

C=n
2
d
:
Theresultnowfollowsfrom
Lemma2.5.1andLemma2.5.2.

31
Remark2.5.1
ObservefromtheproofofLemma2.5.2,for
h
ij
=
n

(1
=
2+
d
)
x
ij
;
wehave,
j
c
nk;j
j
(
P
n
i
=1
x
2
ij
)
1
=
2
n
1
2
+
d
(
n
X
i
=1
a
2
i

k
I
(
j
i

k
j
K
n
))
1
=
2
+
K

1
=
2
n
(
1
X
l
=0
a
2
l
)
1
=
2
;
where
K
n
=max
1

i

n
1

j

p
j
x
ij
j
:
Since
x
ij
<
18
1

i

n;
8
1

j

p;
andthesequence
f
a
l
g
issquaresummable,henceeach
n;c
n
<
1
withouttheassumption(2.13).
Thefollowingseverallemmasareneededtoprove
Proposition2.1.1
.Firstrecallthe
BernsteininequalityfromDoukhan(1994)orLemma3.1fromGuoandKoul(2007).
Lemma2.5.3
Foreach
n

1
;m

1
,let
Z
mni
;i
=

m;:::;n;
beanarrayofmeanzero
varianceindependentrandomvariables.Assume,additionallythattheysatisfythe
Cramerscondition:forsome
B
mn
<
1
;
E
j
Z
mni
j
k

B
k

2
mn
k
!
EZ
2
mni
;k
=2
;
3
;:::;i
=

m;:::;n:
(2.37)
Let
T
mn
=
P
n
i
=

m
Z
mni
;˙
2
mn
=
P
n
i
=

m
Var
(
Z
mni
)
.Then,forany
>
0
and
n

1
;
P
(
j
T
mn
j
>
)

2exp
ˆ


2
4
˙
2
mn
+2
B
mn

˙
;
8
m
2
Z
+
;n

1
:
(2.38)
WeneedtoapplytheaboveBernsteininequality
p
times,
j
thtimeto
Z
mni;j
:=
c
ni;j

i
,

m

i

n;
1

j

p
.Inthiscasethen
T
mnj
=
n
X
i
=

m
c
ni;j

i
:
(2.39)
32
Forthispurpose,weneedtoverify(2.37)inthiscase.LetDbeasin(2.8)and
B
mn;j

B
n
:=
c
n
D;c
n
=max
1

j

p
c
n;j
:
(2.40)
Thenbyassumption(2.8),
j
c
ni;j
j
k
E
j

i
j
k
j
c
ni;j
j
k

2
D
k

2
k
!
c
2
ni;j
E
2
i
(2.41)

B
k

2
n
k
!
c
2
ni;j
E
2
i
;

m

i

n;
therebyverifyingtheCramer'scondition(2.37)for
Z
mni;j
foreach1

j

p
with
B
mn;j

B
n
;
notdependingon
m
and
j
.
Toproceedfurther,weneedtoobtainanupperboundfor
˙
2
mn;j
:=
P
n
i
=

m
Var(
Z
mni;j
).
But
˙
2
mn;j
=
n
X
i
=

m
Var(
c
ni;j

i
)

n
X
i
=

Var
(
c
ni;j

i
)
=
Var
(
n
X
i
=

c
ni;j

i
)=
Var
(
n
X
i
=1
n

(1
=
2+
d
)
x
ij
"
i
)
=
n

(1+2
d
)
n
X
k;`
=1
x
kj
x
`j

"
(
k

`
)=
˙
2
n;j
<
1
;
(2.42)
Fromtheabovediscussionwenowreadilyobtainthatforall
>
0and1

j

p
,
P
(
j
n
X
i
=

m
c
ni;j

i
j
>
)

2exp


2
4
˙
2
mn;j
+2
B
n


(2.43)

2exp


2
4
˙
2
n
+2
B
n


:
33
Remark2.5.2
ByRemark2.5.1,weseethatforeach
n

1
;
wehave,
c
n
<
1
withoutassumption(2.13).HencetheBernsteininequalityisapplicableforevery
n

1
withoutassumption(2.13).
WearenowalmostsettoderivetheprobabilityboundforBeforethat,welookat
thefollowingpreliminarylemma,whichwillhelpustoobtainthisboundfromthetruncated
sums
T
mnj
in(2.39)for
W
nj
in(2.4)bytakinglimitas
m
!1
.
Lemma2.5.4
Foreachdn,let
A
:=
fj
n
X
i
=

y
ni
j
>r
g
;B
m
=
fj
n
X
i
=

m
y
ni
j
>r


g
;r>
0
;>
0
;m
=1
;
2
;:::
B
=liminf
m
!1
B
m
:
If
j
P
n
i
=

y
ni
j
<
1
,a.s.,then,foreachd
n
,
A

B
Proof.
Let
!
2
A
.Then
j
P
n
i
=

y
ni
(
!
)
j
>r
.Also,byassumption,
j
P
n
i
=

y
ni
(
!
)
j
<
1
;
whichimplies
8
>
0
9
N
;!
3j
P

m
i
=

y
ni
(
!
)
j
<;
8
m>N
;!
:
Hence
j
P
n
i
=

m
y
ni
(
!
)
j
>
r

;
8
m>N
;!
;
whichinturnimplies
!
2
1
\
m
=
N

fj
n
X
i
=

m
y
ni
j
>r


jg
)
!
2
1
[
m
=1
1
\
l
=
m
fj
n
X
i
=

l
y
ni
j
>r


g
)
!
2
liminf
m
!1
B
m
:
(2.44)
Since(2.44)istrueforany
>
0,theclaim
A

B
follows.

Beforeproceedingtothenextproposition,weseethattheassumptioninLemma2.5.4
isvalidfortheseriesinconsideration,whichis
P
n
k
=

c
nk;j

k
:
First,considertheseries
34
"
i
=
P
1
k
=1
a
k

i

k
;
sincethisisansumofindependentzeromeanrandomvariables
with
P
1
k
=1
Var(
a
k

i

k
)
<
1
;
hence
"
i
<
1
a.s.(Durrett,Theorem1.8.3,page62).Nowfor
each
n;
wehaveby(2.4),
P
n
k
=

c
nk;j

k
=
n

(
1
2
+
d
)
P
n
i
=1
x
ij
"
i
;
sincethisisa
weightedsumof
f
"
i
g
henceforeach
n;
wehave,
P
n
k
=

c
nk;j

k
<
1
;
a.s.
8
1

j

p:
ProofofProposition2.1.1.
Fixa1

j

p
andan
n

1.Recalltheof
c
nk;j
from(2.5).Let
r
np
:=
n
1
=
2

d

0
n
=
2.Then,forany0
<<r
np
,wehavethefollowing
inequalities.
P
(
j
n

(1
=
2+
d
)
n
X
i
=1
x
ij
"
i
j
>r
np
)=
P
(
j
n
X
k
=

c
nk;j

k
j
>r
np
)

P
(liminf
m
!1

j
n
X
k
=

m
c
nk;j

k
j
>r
np


)
;
byLemma2.5.4
;

liminf
m
!1
P
(
j
n
X
k
=

m
c
nk;j

k
j
>r
np


)
;
Fatou'slemma
;

liminf
m
!1
2exp


(
r
np


)
2
4
˙
2
n
+2
B
n
(
r
np


)

;
wherethelastinequalityfollowsfrom(2.43).Uponletting

!
0inthisboundwethus
obtain
P
(
j
n

(1
=
2+
d
)
n
X
i
=1
x
ij
"
i
j
>r
np
)

2exp


r
2
np
4
˙
2
n
+2
B
n
r
np

:
(2.45)
Notethat
r
np
isapositivesolutionofthefollowingquadraticequation.

r
2
np
4
˙
2
n
+2
B
n
r
np
=

(
t
2
+4log
p
)
4
:
35
Hence,(2.45)andtherelation
2exp


r
2
np
4
˙
2
n
+2
B
n
r
np

=2exp


(
t
2
+4log
p
)
4

;
togetherimply
P

2


n

1
n
X
i
=1
x
ij
"
i


>
0
n

=
P
(
j
n

(1
=
2+
d
)
n
X
i
=1
x
ij
"
i
j
>r
np
)
(2.46)

2exp


(
t
2
+4log
p
)
4

:
Thiscompletestheproofof(2.10).Toprove(2.11),notethat
1

P
=
P
(max
1

j

p
2
n

1
j
n
X
i
=1
x
ij
"
i
j
>
0
)

P
(
[
p
j
=1
f
2
n

1
j
n
X
i
=1
x
ij
"
i
j
>
0
g
)

p
X
j
=1
P
(2
n

1
j
n
X
i
=1
x
ij
"
i
j
>
0
)
:
By(2.46)weget,
p
X
j
=1
P
(2
n

1
j
n
X
i
=1
x
ij

i
j
>
0
)

2
p
exp

(
t
2
+4log
p
)
=
4
g
=2exp(

t
2
4
)
:
ThiscompletestheproofofProposition2.1.1

36
ProofsforSection3
ProofofProposition2.2.1
:
Let
^

beasin(2.3)andlet^
u
=
^


:

V
n
(
u
)=
n
X
i
=1
1
n
[(
"
i

X
T
i
u
)
2

"
2
i
]+

n
k
u
+

k
1
:
Then^
u
=argmin
u
V
n
(
u
).Denotethetermin
V
n
(
u
)by(I),andthesecondtermby
(II).Then(I)canbeas
n
X
i
=1
1
n
[(
"
i

X
T
i
u
)
2

"
2
i
]=
h

2
n
X
i
=1
1
n
u
T
X
i
"
i
+
n
X
i
=1
1
n
(
u
)
T
X
i
X
T
i
u
i
=
h

2
u
T
W
n
1
2

d
+
u
T
C
n
u
i
;
(2.47)
where
W
=
n

1
=
2

d
X
T
":
tiate(2.47)withrespectto
u
toobtain
2
n

(
1
2

d
)
(
C
n
(
n
1
2

d
u
)

W
)
:
Let^
u
(1),
W
(1)and^
u
(2),
W
(2)denotethe
q
andthelast
p

q
entriesof^
u
,
W
,
respectively.Nownotethat(ZhaoandYu,2006)
f
sign(

(1)
)^
u
(1)
>


(1)
jgf
sign(
^

j
)=sign(

j
)
;j
=1
;
2
:::;q
g
:
(2.48)
37
Also,bytheKarush-Kuhn-TuckerconditionsanduniquenessofLasso,ifasolution^
u
exists,
thenthefollowingconditionsmusthold,
(
C
n
11
(
n
1
2

d
^
u
(1))

W
(1))=


n
2
n
1
2

d
sign
(

(1)
)
;
(2.49)
j
^
u
(1)
j
<
j

(1)
j
;
(2.50)
j
(
C
n
21
(
n
1
2

d
^
u
(1))

W
(2))
j

n
2
n
1
2

d
1
:
(2.51)
Theset(2.50)iscontainedinthesetontheleftof(2.48).Hence(2.49),(2.50),(2.51)
togetherimply
f
sign
(
^

(1)
)=
sign
(

(1)
)
g
and
^

(2)
=^
u
(2)=0
:
Thecondition
A
n
impliesthe
existenceof^
u
(1)which(2.49)and(2.50)andcondition
B
n
and
A
n
togetherimply
(2.51).Theresultfollows.

Tomaintainclarityofnotationinthecomingproof,wethefollowing,foramatrixof
weights
h
a
=(
h
a
1
;:::;h
aq
)
;
where
h
aj
=(
h
a
1
j
;h
a
2
j
;:::h
anj
)
;
8
1

j

q;

W
a
n;j
;c
a
n
;˙
2
an
asdonein(2.30),(2.31),(2.32)respectively.Alsoe
B
a
n
=
c
a
n
D
.Repeatsimilarlyfora
matrixofweights
h
b
=(
h
b
1
;:::;h
b
(
p

q
)
)
;
with
h
bj
=(
h
b
1
j
;h
b
2
j
;:::h
bnj
)
;
8
1

j

(
p

q
)
:
ProofofTheorem2.2.1
:
Let
A
n
,
B
n
beasinProposition2.2.1.
1

P
(
A
n
\
B
n
)

P
(
A
c
n
)+
P
(
B
c
n
)

q
X
i
=1
P
(
j
z
i
j
n
1
2

d
(
j

i
j

n
2
b
i
))+
p

q
X
i
=1
P
(
j

i
j

n
2
n
1
2

d

i
)
;
where
z
=(
z
1
;z
2
;:::;z
q
)
T
=(
C
n
11
)

1
W
(1),

=(

1
;
2
;:::;
p

q
)
T
=
C
n
21
(
C
n
11
)

1
W
(1)

W
(2),
b
=(
b
1
;b
2
;:::;b
q
)=(
C
n
11
)

1
sign
(

(1)
)
:
Nowexpress
z
=
h
T
a
"
,where
h
T
a
=(
h
a
1
;:::;h
aq
)
T
=
38
(
C
n
11
)

1
(
n

1
=
2

d
X
(1)
T
).Then
h
T
a
h
a
=(
C
n
11
)

1
n

2
d
;
and
z
j
=
h
T
aj
"
with
k
h
aj
k
2
2

1
n
2
d
M
2
8
j
=1
;:::q;
byassumption(2.21)
Similarlywrite

=
h
T
b
"
,where
h
T
b
=
C
n
21
(
C
n
11
)

1
(
n

1
2

d
X
(1)
T
)

(
n

1
2

d
X
(2)
T
).Then
h
T
b
h
b
=
1
n
1+2
d
X
(2)
T

I

X
(1)(
X
(1)
T
X
(1))

1
X
(1)
T

X
(2)
:
Since[
I

X
(1)(
X
(1)
T
X
(1))

1
X
(1)]haseigenvaluesbetween0and1,therefore

n
j
=
h
T
bj
"
,
with
k
h
bj
k
2
2

M
1
=n
2
d
8
j
=1
;:::p

q;
byassumption(2.20)
:
Hencetheweightvectors
h
aj
;
1

j

q
and
h
bj
;
1

j

p

q
bothsatisfyLemma2.5.1
andLemma2.5.2for
r
=
q
and
r
=
p

q
respectively.Also,
j

n
b
j
=

n
j
(
C
11
)

1
sign
(

(1)
)
j

n
M
2
jj
sign
(

(1)
)
jj
2
=

n
M
2
p
q:
(2.52)
Now,
z
j
=
h
T
aj
"
=
P
n
i
=1
h
aij
"
i
Proceedasdoneearlierin(2.45).Using(2.52),Lemma2.5.1,
Lemma2.5.2andtheBernstein'sInequalityasappliedin(2.45).Weget,forsomeconstants
r
1
;r
2
>
0
;
q
X
j
=1
P
(
j
z
j
j
n
1
2

d
(
j

j
j

n
2
b
j
))

q
X
j
=1
P
(
j
z
j
j
r
1
n
c
2
=
2
)
(2.53)

2
q
exp


r
2
1
n
c
2
4
˙
2
an
+2
B
a
n
r
1
n
c
2
=
2

!
0
:
39
Also
p

q
X
j
=1
P
(
j

j
j

n
2
n
1
2

d

j
)

(
p

q
)exp


r
2
2
(

n
n
1
=
2

d
)
2
4
˙
2
bn
+2
B
b
n
r
2

n
n
1
=
2

d

(2.54)

(
p

q
)exp


r
2

n
n
1
=
2

d

;
fornlarge
;

exp(
n
c
3

r
2

n
n
1
=
2

d
)
!
0
:
Theresultfollowsfrom(2.53)and(2.54)together.

ProofsforSection4
ProofofCorollary3.57
:Observethatassumption(2.13)impliesassumption(2.25)(i)
and(2.25)(ii).Henceweonlyneedtoshow,
n

(1+2
d
)

n
=
O
(1)componentwise.Foreach
variancecomponent,thishasalreadybeenshownin(2.34)intheproofofLemma2.1.1,
with
h
ij
=
n

(
1
2
+
d
)
x
ij
;
8
1

j

p
.Thecovariancecomponentscanbeeasilydealtwith
theCauchy-Schwarzinequality.

ProofofRemark2.3.1
:Using(2.33),weobtain,
n

1

2
d
Cov(
T
nj
;T
nk
)=
n

1

2
d
c

n
X
l;m
=1
;l
6
=
m
g
j
(
l
n
)
g
k
(
m
n
)
j
l

m
j

1+2
d
+
o
(1)
!
c

Z
1
0
Z
1
0
g
j
(
u
)
g
k
(
v
)
j
u

v
j

1+2
d
dudv:
ProofofTheorem2.3.2
:Let
Z
n
(
˚
)=
1
n
n
X
i
=1
(
y
i

X
T
i
˚
)
2
+

n
p
X
i
=1
j
˚
i
j
;
40
then
Z
n
(
˚
)isconvex.Weneedtoshowthepointwiseconvergence(inprobability)of
Z
n
(
˚
)
to
Z
(
˚
)+
k
2
forsomeconstant
k
.Clearly,

n
P
p
i
=1
j
˚
i
j!

0
P
p
i
=1
j
˚
i
j
andconsider,
1
n
n
X
i
=1
(
y
i

X
T
i
˚
)
2
=
1
n
n
X
i
=1
(
"
i

X
T
i
(
˚


))
2
=
1
n
n
X
i
=1
"
2
i
+
1
n
n
X
i
=1
(
˚


)
T
X
i
X
T
i
(
˚


)

2
n

1
(
˚


)
T
n
X
i
=1
X
i
"
i
=
1
n
n
X
i
=1
"
2
i
+
1
n
n
X
i
=1
(
˚


)
T
X
i
X
T
i
(
˚


)

2
1
n
(
˚


)
T
X
T
";
thetermintheaboveequationconvergesto
k
2
bytheergodictheorem(since
f
"
i
g
form
astationaryergodicsequence),thesecondtermconvergesto(
˚


)
T
C
(
˚


)andthelast
termconvergestozeroinprobability(sincebyTheorem2.3.1,
n

(
1
2
+
d
)
X
T
"
convergesin
distribution).ThisprovestheTheorem.

ProofofTheorem2.3.3
:
V
n
(
u
)=
n
1

2
d
2
4
n
X
i
=1
1
n
[(
"
i

X
T
i
u
n
1
2

d
)
2

"
2
i
]+

n
p
X
j
=1
[
j

j
+
u
j
n
1
2

d
jj

j
j
]
3
5
:
(2.55)
Denotethetermintheaboveequationby(I),andthesecondtermby(II).Then
(
I
)=
n
1

2
d
"
n
X
i
=1
1
n
"
2
i
+
u
T
P
n
i
=1
X
i
X
T
i
u
n:n
1

2
d

2
P
n
i
=1
u
T
X
i
"
i
n:n
1
2

d

n
X
i
=1
1
n
"
2
i
#
=[
u
T
P
n
i
=1
X
i
X
T
i
u
n

2
P
n
i
=1
u
T
X
i
"
i
n
1
2
+
d
]
!
u
T
Cu

2
u
T
W;
as
n
!1
;
(2.56)
where
W
is
N
p
(0
;

:
Also,
41
(
II
)=
n
1
2

d

n
p
X
j
=1
[
j
n
1
2

d

j
+
u
j
j
n
1
2

d
j

j
j
]
!

0
p
X
j
=1
[
u
j
sign
(

j
)
I
[

j
6
=0]
+
j
u
j
j
I
[

j
=0]
]
;
(2.57)
Theresultfollowsfrom(2.56)and(2.57)together.

ProofofTheorem2.3.4
:ThestructureoftheproofissimilartothatofTheorem2in
Zuo(2006).
~
V
n
(
u
)=
n
1

2
d
2
4
n
X
i
=1
1
n
[(
"
i

X
T
i
u
n
1
2

d
)
2

"
2
i
]+

n
p
X
j
=1
^
w
j
[
j

j
+
u
j
n
1
2

d
jj

j
j
]
3
5
;
(2.58)
then~
u
j
=
n
1
2

d
(
~

n


)=argmin
~
V
n
(
u
)
:
Expanding
~
V
n
(
u
)asdonein(2.56)and(2.57)we
get,
~
V
n
(
u
)=
u
T
P
n
i
=1
X
i
X
T
i
u
n

2
P
n
i
=1
u
T
X
i
"
i
n
1
2
+
d
+
n
1
2

d

n
p
X
j
=1
^
w
j
[
j
n
1
2

d

j
+
u
j
j
n
1
2

d
j

j
j
:
Recall,
n

1
X
T
X
!
C;
andbyTheorem2.3.1wehave
n

(
1
2
+
d
)
X
T
"
!
D
N
(0
;

:
Also,since
n
1
2

d

n
!
0
;n
1
2
+

2

d


n
!1
andtheadaptiveweights
^

n
aresothat
n
1
2

d
(
^

n


)=
O
p
(1)
:
Henceweobtain
~
V
n
(
u
)
!
~
V
(
u
)where,
~
V
(
u
)=
8
>
<
>
:
u
T
A
C
11
u
A

2
u
T
A
W
A
;
if
u
j
=0
8
j=
2A
1
;else
9
>
=
>
;
(2.59)
Theuniqueminimumof
~
V
(
u
)is(
C

1
11
W
A
;
0)
T
:
Henceweobtain,
~
u
A
=
n
1
2

d
(
~

n
A


A
)
!
D
C

1
11
W
A
and~
u
A
c
!
D
0
:
(2.60)
42
ThevariableselectionpartcanbeobtainedbyadjustingnormalizationintheproofofZuo
(2006).Fromtheasymptoticnormalityobtainedin(2.60),weobtain,
8
j
2A
;P
(
j
2A
?
n
)
!
1
:
Let
x
j
:=(
x
1
j
;::::;x
nj
)
T
bethe
j
th
columnofthedesignmatrix
X;
1

j

p:
Nextwe
showthatif
j=
2A
;
then
P
(
j=
2A
?
n
)
!
1
:
BytheKKTconditionsfortheLassosolution,we
have,
j
2
x
T
j
(
y

X
~

)
j

n
^
w
j
:
Consider,
x
T
j
(
y

X
~

n
)
n
1
2
+
d
=
x
T
j
Xn
1
2

d
(


~

n
)
n
+
x
T
j
"
n
1
2
+
d
using(2.60),thetermontherightsideconvergestosomenormaldistribution,andby
Theorem2.3.1thesecondtermontherightconvergestoanormaldistribution.Also,since

j
=0and
n
1
2

d
(


^

n
)=
O
p
(1)
;
hence,
n
1
2

d

n
^
w
j
=
n
1
2

d
+


n
1
j
n
1
2

d
^

j
j

!1
:
Thisimplies,
P
(
j=
2A
?
n
)

P
(
j
2
x
T
j
(
y

X
~

n
)
j

n
^
w
j
)
!
1
:
Thiscompletestheproof.

43
Chapter3
Weighted
`
1
-PenalizedCorrected
QuantileRegression
forHighDimensionalMeasurement
ErrorModels
AsdescribedinChapter1,weshallnowconsidertheproblemofestimationandvariable
selectioninmeasurementerrorlinearregressionmodels.Inthischapterweshallpropose
a
`
1
-penalizedcorrectedquantileestimatorthatconsistentlycorrectsthebiasinducedby
measurementerrorandalsoprovidesconsistentvariableselection.Theproblemofbias
correctionduetomeasurementerrorinthecontextofmeanregressionisaclassicalproblem
inthecaseof
p;
ontheotherhanditisofrecentinterestinthehighdimensional
setting.Furthermorebiascorrectioninquantileregressionhasonlyrecentlybeenstudied
Wang,StefanskiandZhu(2012)inthe
p
settingandtothebestofourknowledgethis
chapteristheattempttodosointhehighdimensionalsetting.Themaincontributions
ofthischapteraretoprovidetheoraclepropertyoftheproposedestimatorinthe
p
setting,andtoprovideboundsonthestatisticalerroroftheproposedestimatorinthehigh
dimensionalsetting.Inthissetting,wealsoestablishthemodelselectionconsistencyof
44
thisestimatorintermsofidentifyingthecorrectzerocomponentsoftheparametervector.
Furthermoreweillustrateitsempiricalsuccessviaasimulationstudy.
3.1Model,EstimatoranditsOracleProperty
Inthissection,wedescribethemodel,theproposedestimatorandthenecessaryassumptions
onthemodelparameters.Alsoweestablishtheoraclepropertyoftheproposedestimatorin
the
p
setting.Asdescribedin(1.1)and(1.2)weconsideralinearregressionmodelwith
additiveerrorinthedesignvariables.Where
x
i
=(
x
i
1
;

;x
ip
)
T
,
i
=1
;

;n;
arevectors
ofnon-randomdesignvariablesand
y
i
'saretheresponsesrelatedto
x
i
'sbytherelations
y
i
=
x
T
i

0
+
"
i
;
forsome

0
2
R
p
;
1

i

n:
(3.1)
Here

0
=(

0
1
;:::;
0
p
)
2
R
p
istheparametervectorofinterest,and
"
=(
"
1
;:::;"
n
)
T
is
an
n

dimensionalvectorwhosecomponentsareindependentbutnotnecessarilyidentically
distributed,andsatisfy
P
(
"
i

0)=
˝;
forevery1

i

n
,where
˝
2
(0
;
1)isthequantile
levelofinterest.
Furthermore,thedesignvariables
x
i
'sarenotobserveddirectly.Instead,weobservethe
surrogate
w
i
'sobeyingthemodel,
w
i
=
x
i
+
u
i
;
1

i

n:
(3.2)
Here,
u
T
i
=(
u
i
1
;

;u
ip
)areassumedtobeindependentof
f
"
i
g
andindependentand
identicallydistributed(i.i.d.)accordingtoa
p

dimensionalmultivariateLaplacedistribution
whichisviaitscharacteristicfunctionasfollows,
45
3.1.1
Arandomvector
u
2
R
p
issaidtohaveamultivariateLaplacedistribution
L
p
(

ifforsome

2
R
p
andanonnegativesymmetric
p

p
matrixits
characteristicfunctionis

1+
t
T

t=
2


1
,
t
2
R
p
:
Notethat,if

=0,thenisthecovariancematrixoftherandomvector
u
.
Laplacedistributionsareoftenusedinpracticetomodeldatawithtailsheavierthan
normal.McKenzieetal.(2009)usedthesedistributionsintheanalysisofglobalpositioning
dataandPurdomandHolmes(2005)adoptedLaplacemeasurementerrormodelinthe
analysisofdatafromsomemicroarrayexperiments.StefanskiandCarroll(1990)providean
indepthdiscussionofLaplacemeasurementerrors.
Inoursetup,weshallconsiderthemodel(3.1)and(3.2)inboththeandhighdimen-
sionalsettings.Inthelattersetting,thedimension
p
oftheparametervector

0
isallowed
togrowexponentiallywith
n;
andthemeasurementerrors
u
i
;
1

i

n
arei.i.d.
L
p
(0
;

;
withknown.Furthermore,

0
isassumedtobesparse,i.e.,onlyasmallproportionofthe
parametersareassumedtobenonzero.Thenumberofnonzerocomponentsshallbede-
notedby
s;
where
s
isallowedtodivergeslowerthan
n:
Let
S
=

j
2f
1
;
2
;:::;p
g
;

0
j
6
=0
:

;
and
S
c
denoteitscomplimentset.Notethatcard(
S
)=
s:
Also,foranyvector

2
R
p
;
let

S
=
f

j
;
j
2
S
g
and

S
c
=
f

j
;
j
2
S
c
g
:
Allresultspresentedinthechaptershallassumetheunobserveddesignvariables
x
i
'sto
benon-random.However,itisworthpointingoutthattheassumptionsmadeinthischapter
on
x
i
'scanbeshowntoholdwithprobabilityconvergingto1undersomerandomdesigns,
inparticularforsub-Gaussianorsub-Exponentialdesignswithindependentobservations.
Whenthedesignvariables
x
i
'sarecompletelyobserved,severalauthorsincludingFan,
etal.(2014),BelloniandChernozhukov(2011)andWang,WuandLi(2012)haveshown
46
that

0
canbeestimatedconsistentlyby
^

x
=argmin

2
R
p
(
1
n
n
X
i
=1
ˆ
(
y
i
;x
i
;
)+

n
k
d


k
1
)
;
(3.3)
where
ˆ
(
y
i
;x
i
;
)=
ˆ
˝
(
y
i

x
T
i

)
;ˆ
˝
(
v
)=
v
f
˝

I
(
v

0)
g
isthequantilelossfunction,and
d
=(
d
1
;:::;d
p
)
T
isavectorofnon-negativeweights,and`

'denotestheHadamardproduct,
i.e.
k
d


k
1
:=
P
p
j
=1
d
j
j

j
j
:
Toovercometheltyduetomeasurementerrorinthecovariates,webeginwiththe
regularizedversionofcorrectedquantileestimator
W`
1
-
CQ
.Theun-penalizedversionwas
introducedbyWang,StefanskiandZhu(2012)(WSZ).Todescribetheirlossfunction,let
K
(

)denoteakerneldensityfunctionand
K
0
beitsderivative.Also,let
h
=
h
n
!
0
besequenceofpositivewindowwidths,and
H
(
x
)=
R
x

K
(
u
)
du:
Let
ˆ
?
L
(
y
i
;w
i
;;h
)=~
"
i
(
˝

1)+~
"
i
H
(
~
"
i
h
)

˙
2

2
n
2
h
K

~
"
i
h

+
~
"
i
h
2
K
0

~
"
i
h

o
;
(3.4)
where,~
"
i
=
y
i

w
T
i

and
˙
2

=

T

:
WSZproposedtoapproximatethequantilefunc-
tion
ˆ
˝
(
y
i

x
T
i

0
)bythesmoothfunction
ˆ
?
L
(
y
i
;w
i
;;h
)andtheirestimatorasa
minimizerwithrespectto

oftheaverage
n

1
P
n
i
=1
ˆ
?
L
(
y
i
;w
i
;;h
).Itspenalizedanalogis
l
?
n
(

):=
1
n
n
X
i
=1
ˆ
?
L
(
y
i
;w
i
;;h
)+

n
k
d


k
1
:
Observethat
l
?
n
(

)isnon-convexand
l
?
n
(

)maydivergewhen
˙
2

=

T


!1
:
Hence,we
restricttheparameterspacetotheexpanding
`
1
-ball=
f

2
R
p
;
k

k
1

b
0
p
s
g
;
forsome
47
b
0
>
0.Now,the
W`
1
-
CQ
estimatoras
^

=argmin

2

l
?
n
(

)
:
(3.5)
Theweights
d
j
;
1

j

p
areassumedtosatisfy
(i)max
j
2
S
d
j

c
max
;
0
<c
max
<
1
;
(ii)min
j
2
S
c
d
j

c
min
;
0
<c
min
<
1
:
(3.6)
Webeginbyprovidingtheoraclepropertyoftheproposedestimatorinthedimen-
sionsetting,i.e.
p;s
areconstantsthatdonotchangewith
n:
Inthefollowingweprovide
thenecessarynotationandassumptionrequiredtoproceedfurther.
(F1)
Intheneighborhoodofzero,thedensityfunction
f
i
(

)of
"
i
;
1

i

n
isbounded
awayfromzeroandtyandhasaboundedderivative.
(F2)
Thekernelfunction
K
(

)isaboundedprobabilitydensityfunctionhavingfourth
momentandissymmetricabouttheorigin.Inaddition,
K
(

)istwiceerentiable
anditssecondderivative
K
00
(

)isboundedandLipchitzcontinuous.
(F3)
Let


n
1
(

0
;h
)=
1
p
n
n
X
i
=1
@ˆ

L
(
y
i
;w
i
;
0
;h
)
=@


n
2
=
1
n
n
X
i
=1
@
2
ˆ

L
(
y
i
;w
i
;
0
;h
)
=@@
T
:
Assumethatthereexistspositiveitematrices
D
and
A
suchthatCov


n
1
(

0
;h
)

!
D
and
E


n
2
(

0
;h
)
!
A:
(F4)
Measurementerrors:
Themeasurementerrors
f
u
i
g
areindependentof
f
"
i
g
,and
i.i.d.
L
p
(0
;

;
forall1

i

n
,withaknownpositivematrix
48
Theorem3.1.1
Assumethemeasurementerrormodel(3.1)and(3.2)alongwith(F1)-(F4)
hold.Furthermore,if

n
=
O
(
n

1
=
2
)
andtheweightvector
d
(3.6),thenthereexists
alocalminimizer
^

of
l

n
(

)
satisfying
k
^


0
k
2
=
O
p
(
n

1
=
2
)
:
Theabovetheoremrequirestheweightvectortosatisfythecondition(3.6).Thiscon-
ditionisalsobytheordinary
`
1
penalty,i.e.choosing
d
j
=1
;
1

j

p:
Toestablishtheoraclepropertyweneedtheweightstobemorecarefullychosen.Let
d
S
c
min
=min
f
d
j
;
j
2
S
c
g
thenwehavethefollowing.
Theorem3.1.2
Assumethemeasurementerrormodel(3.1)and(3.2)alongwith(F1)-(F4)
hold.Furthermore,if
n
1
=
2

n
!
0
andtheweightvector
d
(3.6).Inadditionassume
n
1
=
2

n
d
S
c
min
!1
;
thenthe
p
n
-consistentlocalminimizer
^

=(
^

S
;
^

S
c
)
T
inTheorem3.1.1
thefollowing
1.
Sparsity:
P

^

S
c
=0

!
1
;
2.
AsymptoticNormality:
n
1
=
2
(
^

S


0
S
)
!N
(0
;A

1
S
D
S
A

1
S
)
:
Thistheoremprovidesthedesiredoraclepropertyoftheproposed
`
1
-penalizedcorrected
quantileestimator.
Wenowproceedtothehighdimensionalsetup,wherethemodeldimensions
p;s
are
allowedtodivergeexponentiallywiththesamplesize
n:
Hereweshallprovideboundsonthe
statisticalerrorassociatedwiththe
W`
1
-
CQ
estimator,namely,boundsonthequantities
k
^


0
k
1
and
n

1
k

1
=
2
X
(
^


0
)
k
2
;
whereisintheparagraphprecedingLemma
3.3.2below.The
`
1
-consistencyof
^

willbeadirectconsequenceoftheseerrorbounds.
Notethatthechoice
d
j
=1
;
forall1

j

p
,makes
^

tobetheun-weightedpenalized
`
1
-
CQ
estimator.Asshallbecomeapparent,the
`
1
-
CQ
estimatorisalso
`
1
-consistentin
49
estimation.ThisshallalsobeobservedinthesimulationstudyinSection4.Ontheother
hand,asobservedinpart(
a
)ofTheorem3.1.2forthe
p
case,setting
d
j
=1
;
1

j

p
maynotleadtoconsistentvariableselection.Theweights
f
d
j
g
;
chosenappropriately,shall
servetoimproveonthisissue,byguaranteeingthatthezerocomponentsareiden
correctlywithasymptoticprobability1
;
therebymaking
W`
1
-
CQ
modelselectionconsistent
inadditiontobeing
`
1
-consistent.
Weshallnowdescribethemodelmorepreciselywhilealsoprovidingassumptionsneces-
sarytoproceedfurtherinthehighdimensionalsetup.
(A1)
Modelerrors
(
"
)
:
Thedistributionfunction(d.f.)
F
i
of
"
i
hasLebesguedensity
f
i
suchthatsup
1

i

n;x
2
R
f
i
(
x
)
<
1
,and
f
i
isuniformly(in
i
)boundedawayfromzero,
inaneighborhoodofzero.Also,thereexistsuniversalconstants
C
1
>
0
;C
2
>
0such
thatforany
y
satisfying
j
y
j
C
1
;
max
1

i

n
j
F
i
(
y
)

F
i
(0)

yf
i
(0)
j
C
2
y
2
:
ThisconditionisthesameasCondition1inFanetal.(2014).Itimposesonlymildconditions
ontheerrordensitiesandisslightlystrongerthantheLipchitzconditionfor
f
i
'saroundthe
origin.Gaussianityandhomoscedasticityisnotimposed.Severaldistributions,including
doubleexponentialandCauchy,satisfythiscondition.
(A2)
Unobserveddesignmatrix
X
:
Forall1

j

p
,
n

1
P
n
i
=1
x
2
ij

c
x
;
forsomeconstant
c
x
<
1
:
(A3)
Measurementerrors:
Themeasurementerrors
f
u
i
g
areindependentof
f
"
i
g
,and
i.i.d.
L
p
(0
;

;
forall1

i

n
,withaknownFurthermore,thereexistsaconstant
50
0
<˙
2
u
<
1
suchthatmax
1

j

p
Var
(
u
ij
)

˙
2
u
:
(A4)
Kernelfunction
K
:
K
istheprobabilitydensityfunctionofastandardnormalrandom
variable.
ThischoiceoftheKernelfunctionshallplayanimportantroleinouranalysis.Thiskernel
functionischosenforitsmanytractableproperties,namely,itissymmetricaroundorigin,
tiable,andmoreimportantly,itsderivativesbeingLipchitzcontinuous,
whichisdetailedinSection3.5ofthischapter.
3.2Relationshipbetween
ˆ
?
L
and
ˆ
Theanalysistofollowreliescriticallyontheapproximationofthecorrectedquantileloss
function
ˆ
?
L
in(3.4)intermsofobserved
w
i
'sbytheusualconvexquantilefunction
ˆ
in(3.3)involvingunobserved
x
i
's.Webeginbyestablishingthisconnection.
Theapproximationresultwederiveforthecurrenthighdimensionalsetup,where
p
is
increasingexponentiallywith
n
,issimilartotheoneusedinWSZinthecaseof
p
.For
thatreasonweusesimilarnotationasinWSZ.Accordingly,asmoothedquantileloss
functionwitharguments(
y
i
;x
i
;;h
)
;
ˆ
L
(
y
i
;x
i
;;h
)=(
y
i

x
T
i

)
f
˝

1+
H
(
y
i

x
T
i

h
)
g
:
(3.7)
Notethat
ˆ
?
L
isafunctionoftheobservedcovariates
w;
whereasthe
ˆ
L
and
ˆ
arefunctions
51
oftheunobservedcovariates
x:
Now,for

2

;

M

n
(

)

M
?
n
(
w;;h
)=
n

1
n
X
i
=1
n
ˆ
?
L
(
y
i
;w
i
;;h
)

ˆ
?
L
(
y
i
;w
i
;
0
;h
)
o
;
(3.8)
~
M
n
(

)

~
M
n
(
x;;h
)=
n

1
n
X
i
=1
n
ˆ
L
(
y
i
;x
i
;;h
)

ˆ
L
(
y
i
;x
i
;
0
;h
)
o
;
M
n
(

)

M
n
(
x;
)=
n

1
n
X
i
=1
n
ˆ
(
y
i
;x
i
;
)

ˆ
(
y
i
;x
i
;
0
)
o
:
Wearenowreadytostatethefollowingtheoremdescribingtheapproximationofthe
processes
M
?
n
(

)and
M
n
(

)bytheirrespectiveexpectations,uniformlyin

2
inprob-
abilitywithrates.Itsproofisgiveninsection3.5.Throughout,

max
denotesthelargest
eigenvalueof
Theorem3.2.1
Assumethemeasurementerrormodel(3.1)and(3.2)andtheassumptions
(A1),(A2),(A3)and(A4)hold.Then,
sup

2


M
?
n
(

)

EM
?
n
(

)


=
O
p


max
s
3
=
2
h
2
r
2log2
p
n

;
(3.9)
sup

2


~
M
n
(

)

E
~
M
n
(

)


=
O
p

p
s
r
2log2
p
n

:
(3.10)
Toproceedfurther,werequirethefollowingtworesultsofWSZ.First,thetwice
entiabilityof
ˆ
L
(
y;x;;h
)inthevariable
y

x
0

and
u
i
˘
L
p
(0
;
imply
EM
?
n
(

)=
E
~
M
n
(

)
;
8

2
R
p
:
(3.11)
52
Secondly,underassumption(A4),
sup

2

j
~
M
n
(

)

M
n
(

)
j
=
O
(
h
)
;
a.s.
(3.12)
Claim(3.11)isadirectconsequenceofTheorem2ofWSZwhileclaim(3.12)isprovedin
WSZasapartoftheproofoftheirTheorem3(page14).Theshortproofofthisstatement
isreproducedherefortheconvenienceofareader.
Proofof(3.12):
Denoteby
ˆ
L
(
e;h
):=
ˆ
L
(
y;x;;h
)
;
where
e
=
y

x
0
:
Similarly
ˆ
(
e
)
:
Let
Z
denotear.v.havingd.f.
H
.Usethesymmetryof
K
,andhenceof
H
,the
ofitsmoment,andthechangeofvariableformulatoobtainthatthelefthand
sideof(3.12)isboundedaboveby2times
sup
e


ˆ
L
(
e;h
)

ˆ
(
e
)


sup
e


e

H

e
h


I
f
e>
0
g


sup
t


htH

j
t
j


hE
j
Z
j
:
Thiscompletestheproofof(3.12).
Itisimportanttonotethatbothoftheseresultsarevalidwithoutanyrestrictiononthe
modeldimension
p
,henceapplicableinourhighdimensionalsetup.Inviewoftheresults
(3.11),(3.12)andTheorem3.2.1,weobtain
sup

2

j
M
?
n
(

)

M
n
(

)
j
(3.13)

sup

2

j
M
?
n
(

)

EM
?
n
(

)
j
+sup

2

j
~
M
n
(

)

E
~
M
n
(

)
j
+sup

2

j
~
M
n
(

)

M
n
(

)
j
=
O
p


max
s
3
=
2
h
2
r
log2
p
n

+
O
(
h
)
:
53
ThelastclaimintheaboveboundsfollowssincebyTheorem3.2.1,thesecondtermonthe
righthandsideof(3.13)decreasesfasterthantheterm.Thisapproximationplaysa
pivotalroleintheanalysiscarriedoutinthesequel.
3.3ResultsinHighDimensions
Inthissectionweshallprovidestatisticalerrorboundsforthe
W`
1
-
CQ
estimator.The
followinglemmaiscrucialforobtainingourerrorbounds.Let
v
n
(

)=
n

1
n
X
i
=1
ˆ
(
y
i
;x
i
;
)
;g
n
(

):=
Ev
n
(

)
;
2
R
p
:
Weshallsometimeswrite
ˆ

Li
(

)for
ˆ

L
(
y
i
;w
i
;;h
).Similarcommentappliesto
ˆ
and
ˆ
L
.
Wehave
Lemma3.3.1
Forthemeasurementerrormodel(3.1)and(3.2),wehave,
g
n
(
^

)

g
n
(

0
)+

n
k
d

^

k
1
(3.14)


n
k
d


0
k
1
+
j
M
?
n
(
^

)

M
n
(
^

)
j
+
j
M
n
(
^

)

EM
n
(
^

)
j
;
8

2
R
p
:
Thisinequalityisobtainedbysubtracting
M
n
(
^

)

EM
n
(
^

)onbothsidesoftheinequality
n

1
n
X
i
=1
ˆ
?
Li
(
^

)+

n
k
d

^

k
1

n

1
n
X
i
=1
ˆ
?
Li
(

0
)+

n
k
d


0
k
1
;
andthenrearrangingtermsandusingthetriangleinequality.
Thetechniqueadoptedtoprovidethedesirederrorboundsistoestablishresultsfor
any

choseninasmallneighbourhoodof

0
:
Later,usingtheconvexityof
ˆ
˝
(

)andthe
54
inequality(3.13),weshowthattheestimator
^

indeedeventuallyliesinthisneighborhood,
withprobabilitytendingto1.Let

n
:=max
1

i

n;
1

j

p
j
x
ij
j
,and
B
(

n
)=


2
R
p
:
k


0
k
1


n

;
where

n
isasequenceofpositivenumbersdecreasingto0andsatisfying

n
=
o
(


1
n
)
:
The
lastpieceofthisjigsawisthefollowinglemmawhichshallprovidealowerboundforthe
termonthelefthandsideof(3.14).Let
X
:=(
x
1
;x
2
;

;x
n
)
T
denotethe
n

p
design
matrix,and:=
diag
f
f
1
(0)
;:::;f
n
(0)
g
:
Lemma3.3.2
Supposethemodel(3.1)andassumption(A1)hold.Then,thereexistsa
constant
0
<c
a
<
1
,suchthatforany

2B
(

n
)
;
andforalllarge
n
,
g
n
(

)

g
n
(

0
)

c
a
(


0
)
0
X
0

X
n
(


0
)=
c
a
n

1
k

1
=
2
X
(


0
)
k
2
2

0
:
Proof:
Set
a
i
=
j
x
0
i
(


0
)
j
;
1

i

n:
Thenfor

2B
(

n
)
;a
i


n
k


0
k
1


n

n
!
0
:
ThenproceedasinFanetal.(2014)page341,toobtainthedesiredresult.

Fanetal.(2014)provetheaboveresultfortheoracleestimator,i.e.,withtheadditional
informationregardingthelocationsofzeroandnonzerocomponentsof

and

0
:
However,
asnotedabove,thisresultcanbeobtainedwithoutoracleinformationbytheset
B
(

n
)asabove.
Toproceedfurther,weneedthefollowingCompatibilityconditionontheunobservedde-
signmatrix
X:
Thisconditionisoftenusedinhighdimensionalanalysis(see,e.g.,Buhlmann
andVandeGeer(2011)andRaskuttietal.(2010)).Thecloselyrelated`Restrictedeigen-
valuecondition'isusedbyBelloniandChernozhukov(2011)toprovideconsistencyinesti-
55
mationforquantileregression,whenthecovariatesarecompletelyobserved.
3.3.1
Wesaythe
Compatibilitycondition
ismetfortheset
S;
ifforsome
˚>
0
;
andconstants
0
<b<
1
;c
0
>
0
,andforall

2
R
p
satisfying
k

S
c
k
1

c
0
k

S
k
1
;
k

S
k
2
1

bs
n˚
2
X
0

X:
(3.15)
Inoursetuptheconstant
c
0
canbeexplicitlycomputedas
c
0
=(2
c
max
+
c
min
)
=c
min
:
Hence,
ifweareusingthe
`
1
penalty,wheretheweights
d
j
=1
;
forall1

j

p;
then
c
0
=3
:
Wealsoneedthefollowingrateconditionsonvariousunderlyingentities.
(i)

n
!1
;
max
!1
;
n
!
0
;
n
!
0
;
(3.16)

n

max
s
3
=
2
n
h

2
r
log2
p
n
=
o
(

n
)
;
n
=
o
(


1
n
)
;
(ii)

n
h
=
o
(

n
)
;
(
iii
)

n
s
n

n
˚
2
!
0
:
Fortheboundeddesignswhere

n
=
O
(1),weshallneedthefollowingrateconditions.
(i)

max
!1
;
n
!
0
;
n
!
0
;
max
s
3
=
2
n
h

2
r
log2
p
n
=
o
(

n
)
;;
(ii)
h
=
o
(

n
)
;
(
iii
)

n
s
n
˚
2
!
0
:
Intheaboveconditions,
˚
istheconstantin(3.15).Asisthecasewithkernel
densityestimators,therateofdecreaseofthesmoothingparameter
h
hastobeappropriately
balanced.Ithastodecreaseslowlyenoughsoastosatisfy(3.16)(i)andfastenoughtosatisfy
(3.16)(ii)inthecaseofunboundeddesign.Similarly,inthecaseofboundeddesign,theserate
constraintshavetobalancebetween(3.17)(i)and(3.17)(ii).Notethattherateofdecrease
56
of

n
istlyslowerthaninthecaseofnonmeasurementerrorinthecovariates.This
ismainlyduetothepresenceoftheadditionalnoiseinthecovariatesandthesmoothing
parameter
h:
Wenowstatethemainresultprovidingerrorboundsfortheproposedestimator.
Theorem3.3.1
Forthemeasurementerrormodel(3.1)and(3.2),let
^

beasin(3.5)and
c
min
;c
max
beasin(3.6)with
c
m
:=
c
min
+
c
max
:
Assume(A1),(A2),(A3)and(A4),along
withtheCompatibilitycondition(3.15)hold.Alsoassumethateithertherateconditions
(3.16)or(3.17)holds.Thenthefollowinginequalityholdswithprobabilityatleast1-o(1).
3
c
a
n

1
k

1
=
2
X
(
^


0
)
k
2
2
+2

n
c
min
k
^


0
k
1
(3.17)

4

2
n
c
2
m
s
n
˚
2
+
O


max
s
3
=
2
n
h

2
r
log2
p
n

+
O
(
h
)
:
Thebound(3.17)clearlyimpliesthatundertheconditionsofTheorem3.3.1,
k
^


0
k
1
!
p
0and
n

1
k

1
=
2
X
(
^


0
)
k
2
2
!
p
0.Inotherwords,thesequenceofestimators
W`
1
-
CQ
isconsistentfor

0
in
`
1
-normandintheweighted
L
2
-norm
n

1
k

1
=
2
X
(
^


)
k
2
2
.
Secondly,theweights
d
j
;
1

j

p;
donotplayacriticalrolefortheconsistencyofthe
estimator,i.e.aslongasthecondition(3.6)istheaboveerrorboundswillpro-
videtherequiredconsistency.Hence,ifnopriorinformationisavailable,onemaychoose
d
j

1,inwhichcasetheestimator
^

becomesthe
`
1
-
CQ
estimator,whichis
`
1
-consistent.
Thisfactshallalsobeusefulforconsistentmodelselection,whichrequirescarefullycho-
senweightscorrespondingtothenon-zeroandzeroindicesoftheparameter.Thisshallbe
furtherelaboratedonafterweprovidearesultonthesparsitypropertiesoftheproposed
estimator.
57
Theconclusion(3.17)ofTheorem3.3.1boundingthe
l
1
andweighted
l
2
errorinestima-
tionresemblesinformtothatofTheorem6.2ofBuhlmannandVandeGeer(2011)obtained
for
`
1
-penalizedmeanregressionestimatorwhenthereisnomeasurementerrorinthecovari-
atesandwhentheerrorsinregressionmodelareassumedtobesub-Gaussian.Incomparison,
theaboveresult(3.17)isestablishedhereinthepresenceofheavytailmeasurementerror
incovariatesandwhentheregressionmodelerrorsareindependentheteroscedasticnotnec-
essarilysub-Gaussian.
3.3.1ASparsityProperty.
Next,weinvestigateamodelselectionpropertyof
W`
1
-
CQ:
Itiswellknownthatthemodel
selectionpropertiesarelinkedtotheorderoptimalityconditions,alsoknownasthe
KKTconditions.Theseconditionsarenecessaryandtwhentheobjectivefunction
isconvex.However,asnotedearlier,thelossfunctionofourestimatorisnon-convex.In
thiscase,KKTconditionsarenecessarybutnott.WeexploitthenecessityofKKT
conditionstoshowthattheestimator
W`
1
-
CQ
idenallzerocomponentssuccessfully
withasymptoticprobability1
;
providedtheweights
d
j
arechosenappropriately.More
precisely,let

n
=
2

n
c
2
m
s
c
min
˚
2
+
1

n
O


max
s
3
=
2
h
2
r
2log2
p
n

+
1

n
O
(
h
)
!
0
:
(3.18)
TheninadditiontotheconditionsofTheorem3.3.1,weassumethereexistsa0
<<
1
=
2
suchthat,
(
i
)

n

n

;
and(
ii
)log
p
=
o
(
n

)
:
(3.19)
58
Furthermorealongwiththeconditions(3.6)ontheweightvector
d;
weassumethat
d
j
for
j
2
S
c
;
divergeatafastenoughrate,i.e.,
d
S
c
min
=min
f
d
j
;j
2
S
c
g
thefollowingrate
conditions.
(
i
)max


n
;
3
n

2
n

=
o
(

n
d
S
c
min
)
;
(
ii
)

n
h
=
o
(

n
d
S
c
min
)
;
(3.20)
(
iii
)max
(

max
sn

h

2
r
2log
p
n
;
n

max
n

h

3
s
r
2log
p
n
)
=
o
(

n
d
S
c
min
)
:
Theorem3.3.2
Forthemeasurementerrormodel(3.1)and(3.2),assumetheconditions
ofTheorem3.3.1hold.Inadditionassumethat(3.6),(3.19)and(3.20)hold.Then
P

^

j
=0
;
8
j
2
S
c

!
1
:
(3.21)
Thistheoremprovidesthemodelselectionconsistencyoftheproposed
W`
1
-
CQ
estimator
^

,undersuitablechoiceoftheweightvector
d
=(
d
1
;:::d
p
)
T
.Notethatsettingtheweights
d
j

1
;
i.e.,theun-weighted
`
1
-penaltydoesnotsatisfytherateassumptions(3.20)and
hence
`
1
-
CQ
cannotbeguaranteedtobemodelselectionconsistentasopposedto
W`
1
-
CQ:
Asthereadermayobserve,theconditionsrequiredforTheorem3.3.2areonlyrate
conditionsonthemodelparameters,inadditiontomilddistributionalassumptions.These
conditionsareweakerthanthoserequiredformodelidenintheworkofBelloniand
Chernozhukov(2011).Thereasonforthisbeingthatourresultstatesthatzerocomponents
arecorrectlyidenasopposedtoBelloniandChernozhukov(2011),whostateastronger
resultregardingidentyofthenon-zerocomponents.Thus,weareabletostatea
weakerresultunderweakerconditions.Weareunabletoprovideanyresultregardingthe
59
idenofnon-zerocomponentsduetothenon-convexityofthelossfunction.
Wenotethattheaboveresultswillcontinuetoholdforallothermeasurementerror
distributionsforwhichtheidentity(3.11)andtheprobabilitybound(3.47)givenbelow
hold.
3.3.2Adaptivechoiceoftheweightvector
d:
Formodelselectionwehaveseenthatthechoiceoftheweightvector
d
playsacriticalrole
fortheproposedestimatortoguaranteethatthezerocomponentsareidencorrectly.
Zou(2008)proposedtheideaofadaptivelychoosingtheseweightsbysetting
d
j
=
j
^

ini
j
j


;
1

j

p;>
0
;
where
^

ini
j
isanyinitialestimateof

0
j
satisfyingmax
1

j

p
j
^

ini
j


0
j
j
=
O
p
(

n
),with

n
!
0
:
Weusethesameapproachtoselecttheweightvector
d
inoursetup.
First,weusethe
l
1
-
CQ
estimator,i.e.,theproposedestimatorwiththeordinary
`
1
-
penalty(
d
j
=1
;
8
1

j

p
),thisgivestheinitialestimate
^

ini
:
Theorem3.3.1provides
theconsistencyofthisestimate.Inparticular,underconditionsofTheorem3.3.1weobtain
withhighprobability,
k
^

ini


0
k
1


n
!
0
;
where

n
isin(3.18).Hereweplace
anadditionalassumptiononthetrueparametervector,namely,weassumethatallnon-zero
componentsof

0
areboundedaboveandbelowbyaconstant,i.e.
b
1
j

0
j
j
b
2
:
Thus,
withhighprobability
j
^

ini
j
j
b
2
+

n
;
8
j
2
S
j
^

ini
j
j

n
8
j
2
S
c
:
(3.22)
Now,weset
d
j
=(
j
^

ini
j
j
+
c
w
)


;
where
c
w
=min
1

j

p
(
j
^

ini
j
j
;
^

ini
j
6
=0)isaddedtothe
initialestimatestoavoidtheproblemofdivingbyzero.
Keeping(3.22)inview,itiseasytoverifythatwhen
n
islargeenough,theaboveweight
60
vector
d
therequiredassumptions(3.6)and(3.20)forsomeconstant

chosen
appropriately,withprobabilityapproachingto1.
3.4SimulationStudy
3.4.1Simulationsetup
Inthissectionwenumericallyanalysetheperformanceoftheproposedestimators
`
1
-
CQ
and
W`
1
-
CQ:
AllcomputationsweredoneinR,onanordinarydesktopmachinewithae
core(2.3GHz)processor.Wecompareourproposedestimatorswithleastsquaresbasedhigh
dimensionalproceduresincludingLassoandthebiascorrectedLasso(LohandWainwright
(2011)),thelatterofwhichisspdesignedtohandlesub-Gaussianmeasurement
errorincovariates.
Whileconductingoursimulationstudy,wecomputeLassoestimatesusingthepackage
glmnetdevelopedbyFriedmanetal.(2010).Tocompute
`
1
-
CQ
estimatesandthebiascor-
rectedLasso,weusetheprojectedgradientdescentalgorithm(Agarwaletal.(2012)),which
isatooldevelopedforoptimizingpenalizedsmoothlossfunctionsinhighdimensions.More
precisely,with
r
L
(

)denotingthegradientofalossfunction
L
,themethodofprojected
descentiteratesbytherecursions,
f

r
;r
=0
;
1
;
2
;:::
g
as,

r
+1
=argmin

2

ˆ
L
(

r
)+
r
L
(

r
)
T
(


r
)+

2
k


r
k
2
2
+

n
k

k
1
˙
;
(3.23)
where
>
0isastepsizeparameter.Theserecursionscanbecomputedrapidlyin
O
(
p
)time
usingtheproceduresuggestedbyAgarwaletal.(2012)withtherestrictionoftheparameter
spacetothe
`
1
-ballimplementedbytheprocedureofDuchietal.(2008).Thisprocedure
61
essentiallyinvolvestwo
`
2
projectionsontothe
`
1
ball
:
Theweightedversion
W`
1
-
CQ
canbecomputedbytheproceduredescribedabovewith
thefollowingalgorithmsimilarinspirittothatdescribedbyZou(2008).Theproofofthis
algorithmisstraightforwardandhenceisomitted.
Algorithmtocompute
W`
1
-
CQ
bymethodofprojectedgradientdescent:
1.
e
w
?
i
=(
w
i
1
=d
1
;:::;w
ip
=d
p
)
T
;
8
1

i

n:
Also
?
=

˙
ij
=d
i
d
j

;
8
1

i;j

p
where
˙
ij
denotethecomponentsof
:
2.
Optimize,usingthemethodsofprojectedgradientdescentandDuchietal.,
^

?
=argmin

2

n
1
n
n
X
i
=1
ˆ
?
L
(
y
i
;w
?
i
;;h
)+

n
k

k
1
o
:
3.
Evaluate
^

j
=
^

?
j
=d
j
;
8
1

j

p:
TuningParameters:
Thechoiceofthetuningparameters

n
and
h
isstillnotacompletely
understoodaspectofhighdimensionaldataanalysis.Typicallyinregularizedestimation
methods,eithercrossvalidationor
AIC
-
BIC
typeselectorsareusedtoselectthetuning
parameters.Zhang,LiandTsai(2010)providetheoreticalforusing
AIC
-
BIC
typecriteriaforseveralmodels.Thecrossvalidationmethodisoftenobservedto
resultinoving(Wang,LiandTsai,2007),furthermoreitisconsiderablymoretime
consuming.Morerecently,Lee,NohandPark(2014)havesuggestedahighdimensional
BICtypecriterionforquantileregressionmethods.Motivatedbytheirresults,onewayto
proceedistoselect

n
;h
asminimizersofthefunction
HBIC(

n
;h
)=log

n
X
i
=1
ˆ
?
L
(
y
i

w
T
i
^


)

+
j
S

j
(log
n
)
2
n
C
n
;
62
where
j
S

j
isthenumberofnonzerocotsintheestimatedparametervector
^


and
C
n
isadivergingsequenceofpositivenumbers.However,since
ˆ
?
L
cantakenegativevalues,
weshalluse
e
HBIC
(

n
;h
)=

n
X
i
=1
ˆ
?
L
(
y
i

w
T
i
^


)

e
j
S

j
(log
n
)
2
n
C
n
toobtain

n
and
h
.Theexponentialtransformationremovestheproblemofnegativityof
ˆ
L

andalsomaintainsmonotonicity.Furthermore,wechoose
C
n
=
O
(log(log
p
))whichis
empiricallyfoundtoworkwellinthissimulation.
InHBIC(

n
;h
),weusedthecorrectedquantilelossfunctioninsteadofthecheck
functionasbyLeeetal.(2013).Althoughthismakesintuitivesenseasthecorrected
quantilelossfunctionisapproximatedbythecheckfunction,howeverarigoroustheoretical
argumentjustifyingitsuseismissing.Thiscriterionisempiricallyfoundtoperformwellin
oursetup.
3.4.2ComputationalIssues
Acomputationalchallengeoftheproposedestimatoristhenon-convexityofthelossfunction
l

n
.Theobjectivefunction
l

n
becomesincreasinglyvolatilearoundthetrueparameterasit
approachesthecheckfunctionatvaluesof
h
veryclosetozero.Thisbehaviourisillustrated
inFigure1,whichplotsthelossfunctionagainst

1
;
keepingallotherparametersedat
thetruevalues.Thisplotisgeneratedfortheofthe100simulatedmodels.
Thelossfunctionexhibitsseverallocaloptimumsatsmallervaluesof
h:
Ontheother
handatrelativelyhighervaluesof
h;
thelossfunctionappearstobeconvexshapedaround
thetrueparameterandappearstohaveauniqueminimumatthetrueparametervalue.Two
63
Figure3.1:
1
n
P
n
i
=1
ˆ
?
L
(

)+

n
k

k
1
evaluatedaround

1
at
h
=0
:
01(left)and1
:
5(right).
computationalconsequencesofthisbehaviorarethat,at
h
closetozero,anyoptimiza-
tionprocedurebecomesexcessivelytimeconsuming.Toavoidthisunpleasantfeature,we
avoidvaluesof
h
closetozero.Itwasnumericallyobservedthatbydoingso,weareableto
maintaintheaccuracyoftheestimatoralongwithareasonablecomputationtime.Second,
atvaluesof
h
outsideaneighborhoodofzerotheoptimizationsarerobustagainstthestart-
ingpointsusedinoptimizations.Inparticular,inall100simulationrepetitionsthestarting
pointforoptimizationwaschosenrandomlyfromaGaussiandistribution.Thisbehaviorof
theobjectivefunctionisalsorepresentedvisuallyinthecontourplotinFigure3.2.Herethe
`
1
estimationerror
k
^


k
1
isplottedascoloredcontourswiththeerrorincreasingfrom
redtoblueregions.Valuesof
h
arerepresentedonthex-axisandvaluesof

n
onthey-axis.
Fromthisplotitisapparentthatthelowesterrorisgiveninregionsconcentratedaround
therelativelysmallervaluesof
h
and

n
exceptwhen
h
isinasmallneighborhoodofzero.
Figure3.2:
Coloredcontoursof
k
^


k
1
on
h
vs.

n
for
`
1
-
CQ
.
Withregardstocomputationaltime,oneoptimizationat
h
=0
:
01takes
ˇ
16seconds
64
asopposedto
ˇ
2secondsat
h
=0
:
5
;
at(
p
=40
;n
=120)
:
Thiscanalsobeviewedin
comparisontocorrectedLassowhichtakes
ˇ
0
:
5secondtocompleteoneoptimization.
3.4.3Simulationsetupandresults
Forthissimulationstudy,dataisgeneratedfromthemeasurementerrormodel(3.1)and
(3.2)underseveralchoicesoftheunderlyingparametersanddistributionalassumptions.
Theunobserveddesignvariables
f
x
ij
;
1

i

n;
1

j

p
g
arechosenasi.i.d.r.v.'sfrom
a
N
(0
;
1)distribution.Themeasurementerrors
f
u
i
;
1

i

n
g
arei.i.d.
L
p
(0
;
with
=
˙
2

I
,where
I
isthe
p

p
identitymatrix.Themodelerrors
"
i
;
1

i

n
are
independentrealizationsofNormal,CauchyormeancenteredParetor.v.'s.
WebeginbynumericallyverifyingtheresultofTheorem3.3.1.Observethatthistheorem
canbeviewedasdescribingthescalingbehaviouroftheerror
k
^


k
1
:
Inordertovisualize
this,weperformsimulationsbyvaryingthedimensionoftheparametervector
p
andthe
samplesize
n
whileholdingallotherparametersinparticularthenumberofnon
zerocomponents
s
=5
;
thecovariancematrixoftheLaplacedistributionforthecovariate
errorsistakentobe0
:
2
I
p

p
andthemodelerrorsareGaussianwithvariance0
:
2.Allof
thefollowingdescribetheerrorof
`
1
-
CQ
estimate.Thebehavioroftheerrorof
estimationforthe
W`
1
-
CQ
isobservedtobesimilarandthusthecorrespondingplotsare
omittedforthesakeofbrevity.
Figure3.3isacontourplotgeneratedat
h
=0
:
4(left)and

=0
:
07(right).Thisplot
describesthe
`
1
error,
k
^


k
1
asaspectrumofcolorswithredbeingtheleastandviolet
beingthemaximum.They-axisplotsdtvaluesofthetuningparameter

andthe
x-axismarksthevaryingdimension
p
ofthemodel.Notethatforagivenmodeldimension
thecorrespondingsamplesizeisrescaledtomaintaintheratio(
n=
log2
p
)tobeconstant.
65
ThisrescalingisdoneinaccordancewiththeresultofTheorem3.3.1andaspredictedby
thetheorem,holdingallotherparametersedforeachvalueof

(left)and
h
(right)the
errorlevelstaysroughlyconstant(thecolorsalign)acrossthechosenvaluesof
p
.
Figure3.3:
Coloredcontoursof
k
^


k
1
on
p
vs.

(left)and
h:
(right)
Wenowproceedtoamoredetailednumericalcomparisonoftheproposedestimates
withLassoandcorrectedLassoestimates.Foranygivenmethod,wesummarizetheresults
obtainedby100repetitions.Foreveryrepetition,eachnonzerocomponentoftheparameter
vector

isgeneratedfroma
N
(0
;
1)distributionnormalizedbythe
`
2
normofthegenerated
vector.Thedimensionsofthisvectorarechosentobe
p
=40
;
300
;
500
:
Thedimensionof
thenonzerocomponentsaresetto
s
=5
;
8
;
10
:
Asmentionedearlier,themodelerrors
"
i
;
1

i

n
aregeneratedfromGaussian,CauchyormeancenteredParetor.v.'s.Notethat
theParetodistributioncanbeheavilyskewed.Forperformancecomparisonwereportthe
followingcriteria.
MEE
:(Medianestimationerror),medianover100repetitionsoftheestimationerror
k
^


0
k
2
:
MIZ
:(Medianincorrectnumberofzeros),medianover100repetitionsofthenumberof
incorrectlyidenzerocomponentsoftheparametervector.
MINZ
:(Medianincorrectnumberofnonzeros),medianover100repetitionsofthenumber
66
ofincorrectlyidennonzerocomponentsoftheparametervector.
Inalltables,thestandarderrorsofthecorrespondingcriteriaarereportedintheparen-
theses.
Table3.1:
Simulationresultsat
p
=40forNormal,CauchyandParetomodelerrors
s
=8
;u
i
˘
L
p
(
˙
2
=0
:
2)
;"
i
˘N
(0
;
0
:
2)
;˝
=0
:
5
`
1
-
CQC

LassoLasso
nMEEMINZMIZMEEMINZMIZMEEMINZMIZ
20
0.62
45
0.6346
0.7849
(0.21)(1.72)(2.33)
(0.22)(1.76)(2.39)
(0.27)(1.98)(3.97)
60
0.27
19
0.2819
0.39112
(0.061)(1.07)(2.91)
(0.065)(1.04)(2.90)
(0.071)(1.12)(4.77)
120
0.18
112
0.20111
0.34114
(0.038)(0.86)(2.86)
(0.038)(0.88)(2.93)
(0.04)(0.91)(4.83)
s
=8
;u
i
˘
L
p
(
˙
2
=0
:
2)
;"
i
˘
Cauchy
(
scale
=0
:
1)
;˝
=0
:
5
50
0.95
57
1.6179
5.21220.5
(0.21)(1.21)(2.41)
(0.42)(1.10)(2.25)
(15.89)(1.59)(5.45)
150
0.60
49
1.54710
5.02225
(0.14)(1.17)(3.83)
(0.47)(1.15)(2.44)
(19.12)(1.69)(6.28)
300
0.35
211
1.56713
4.77224
(0.11)(1.01)(3.75)
(0.49)(1.21)(2.45)
(18.80)(1.53)(5.37)
s
=5
;u
i
˘
L
p
(
˙
2
=0
:
2)
;"
i
˘
meancenteredPareto;˝
=0
:
75
100
0.66
28
1.0137
1.15313
(0.15)(1.05)(2.93)
(0.31)(1.32)(2.99)
(3.05)(1.06)(7.20)
200
0.50
18
0.8427.5
0.97215
(0.10)(0.94)(2.56)
(0.32)1.31(2.64)
(2.49)(1.78)(6.70)
300
0.38
19
0.7128
0.90113
(0.07)(0.84)(2.89)
(0.27)(1.16)(2.86)
(2.41)(0.94)5.64
Tables3.1and3.2provideresultsofthesimulationstudycomparingthe
`
1
-
CQ;
the
correctedLasso(C-Lasso)andLassoestimators.Itisclearfromtheseresultsthatunder
heavytailedorskewedmodelerrors(CauchyandmeancenteredPareto),the
`
1
-
CQ
esti-
matortlyoutperformstheothertwoproceduresinallthreecomparisoncriteria.
Furthermore,thestandarderrorsof
`
1
-
CQ
aretlysmallerthanthoseoftheother
two.UnderGaussianmodelerrors,
`
1
-
CQ
iscomparable(slightlybetter)inperformance
67
totheC-Lasso,whilebothoftheseproceduresoutperformLasso.Consistencyintermsof
theestimationerrorandidentifyingthecorrectsupportof

0
isclearlyvisibleas
n
!1
:
Asexpected,the
`
1
-
CQ
estimatordoesnotprovideconsistencyinidentifyingthezerocom-
ponentscorrectly.However,itisstillmuchbetterincomparisontoLasso.Thisbehavior
ofLassoundermeasurementerrorhasalsobeenobservedbySorensenetal.(2014),i.e.,
measurementerrortendstoinduceovbynaiveestimatorssuchasLasso.
AnotherinstancewheretheproposedestimatorsoutperformthecorrectedLassoand
Lassoistheheteroscedasticsetup.Toillustratethis,wegeneratedindependentmodelerrors
"
i
from
N
(0
;˙
2
i
),where
˙
2
i
ischosenuniformlyfromtheinterval(0.1,9),foreach
i
=1
;

n
.
Thedimension
p
isincreasedto500forthiscaseandtheresultsareprovidedinTable3.3.
Wenextinvestigatethe
W`
1
-
CQ
estimatorforconsistentidenofthezerocom-
ponents,inadditiontoconsistentestimation.Tables4and5providesimulationresults
for
W`
1
-
CQ
for
p
=40and
p
=300
:
Theweights
d
j
arechosenasdescribedattheend
ofsection4above,wheretheexponent

ischosenbyusingtheselectioncriteriae
HBIC
:
Comparing
W`
1
-
CQ;`
1
-
CQ;
C-LassoandLasso,theandmostimmediateconclusion
istheoftheproposedestimatorsunderheavytailedorskewedmodelerrors.The
W`
1
-
CQ
estimatorconsistentlyandtlyoutperformsallotherproceduresinterms
ofmodelidentiunderallchosendistributionalandparametersettings.
Finally,toseehowrobustthe
`
1
-
CQ
estimatoristothemisspcationofthemea-
surementerrordistribution,wecompareditsperformancewithC-Lassowhenthiserror
distributionisGaussian.TheresultsreportedinTable6showacomparableperformance
withC-lassoperformingonlymarginallybetter.
68
Table3.2:
Simulationresultsat
p
=300forNormal,Cauchymodelerrors
s
=10
;u
i
˘
L
p
(
˙
2
=0
:
1)
;"
i
˘N
(0
;
0
:
2)
;˝
=0
:
5
`
1
-
CQC

LassoLasso
nMEEMINZMIZMEEMINZMIZMEEMINZMIZ
100
0.26
221
0.27221.5
0.35237
(0.04)(1.20)(5.20)
(0.05)(1.19)(4.70)
(0.07)(1.09)(7.43)
200
0.17
130
0.17
129.5
0.23155
(0.028)(0.89)(6.86)
(0.029)(0.90)(6.77)
(0.031)(0.91)(11.70)
300
0.11
133
0.14132
0.18179
(0.023)(0.79)(5.74)
(0.023)(0.76)(5.68)
(0.025)(0.82)(16.16)
s
=10
;u
i
˘
L
p
(
˙
2
=0
:
1)
;"
i
˘
Cauchy
(
scale
=0
:
1)
;˝
=0
:
5
100
0.44
323
0.92720
4.39727
(0.10)(1.57)(10.36)
(0.41)(2.35)(7.21)
(10.70)(1.96)(30.16)
200
0.30
221.5
0.87622.5
3.27630
(0.07)(1.34)(12.63)
(0.44)(2.42)(8.18)
(8.27)(1.84)(51.34)
300
0.21
213
0.85622
3.47628
(0.05)(1.23)(12.86)
(0.50)(2.59)(8.99)
(9.10)(1.91)(47.21)
Table3.3:
Simulationresultsat
p
=500underheteroscedasticity
s
=10
;u
i
˘
L
p
(
˙
2
=0
:
1)
;"
i
˘N
(0
;˙
2
i
)
;˙
2
i
˘
Uniform
(0
:
1
;
9)
;˝
=0
:
5
`
1
-
CQC

LassoLasso
nMEEMINZMIZMEEMINZMIZMEEMINZMIZ
100
0.86
725.5
0.99725
1.73773
(0.14)(1.13)(4.33)
(0.15)(0.99)(5.05)
(0.27)(1.87)(16.15)
200
0.66
530
0.78530
1.67587
(0.12)(1.32)(5.98)
(0.12)(1.20)(5.56)
(0.25)(1.19)(21.25)
300
0.54
434
0.69533
1.51383
(0.09)(1.37)(6.56)
(0.10)(1.17)(5.71)
(0.18)(1.26)(19.56)
Table3.4:
W`
1
-
CQ
at
p
=40forNormalandCauchymodelerrors.
"
i
˘N
(0
;
0
:
2)
;
"
i
˘
Cauchy
(
scale
=0
:
1)
;
n
MEEMINZMIZ
n
MEEMINZMIZ
20
0.701
5
50
0.981.55
(0.25)(0.66)(1.38)
(0.23)(0.63)(2.20)
60
0.2805
150
0.641
5
(0.06)(0.64)(1.55)
(0.14)(0.61)(2.27)
120
0.2104
300
0.4114
(0.038)(0.61)(1.27)
(0.11)(0.70)(2.17)
69
Table3.5:
W`
1
-
CQ
at
p
=300forNormalandCauchymodelerrors.
"
i
˘N
(0
;
0
:
2)
;
"
i
˘
Cauchy
(
scale
=0
:
1)
;
n
MEEMINZMIZ
n
MEEMINZMIZ
100
0.281
7
100
0.4018
(0.04)(0.69)(1.75)
(0.11)(0.72)(2.91)
200
0.2306
200
0.2916
(0.02)(0.62)(0.98)
(0.09)(0.68)(1.85)
300
0.1805
300
0.2405
(0.02)(0.41)(0.97)
(0.06)(0.46)(1.56)
Table3.6:
`
1
-
CQ
at
p
=300withmisspecovariateerrordistribution.
s
=10,
u
ij
˘N
(0
;
0
:
3)
;"
i
˘N
(0
;
0
:
3)
`
1
-
CQ
C

Lasso
n
MEEMINZMIZ
MEEMINZMIZ
100
0.31118
0.28
116
(0.06)(0.89)(3.64)
(0.07)(0.83)(3.12)
200
0.22121
0.20
118
(0.04)(0.71)(4.10)
(0.03)(0.71)(4.44)
300
0.20024
0.17
019
(0.02)(0.63)(4.14)
(0.02)(0.61)4.88
70
3.5ProofsforChapter3
WebeginbyprovingTheorem3.1.1andTheorem3.1.2whichprovideconsistencyinestima-
tionandvariableselectionforthe
W`
1
-
CQ
estimatorforthecaseof
p:
Forthispurpose
weshallrequirethefollowinguniformconvergenceresultstatedinWSZ,pp16aspartof
theproofofTheorem4oftheirpaper.Foranycompactset
B
in
R
p
wehave,
sup

2B


n

1
n
X
i
=1

@
2
ˆ
?
L
(

)
@
j
@
l

E
@
2
ˆ
?
L
(

)
@
j
@
l


=
o
p
(1)
:
(3.24)
TheproofofthisstatementfollowsfromNolanandPollard(1987,Lemma22)andPollard
(1984,Theorem2.37).
ProofofTheorem3.1.1.
Ittoshowthatforany
>
0
;
thereexistsatly
largeconstant
C
suchthat
P

inf
k
u
k
=
C
l

n
(

0
+

n
u
)
>l

n
(

0
)

>
1


(3.25)
where

n
:=
O
(
n

1
=
2
)
:
Thisimpliesthereexistsalocalminimumintheball(

0
+

n
u
:
k
u
k
C
)
;
i.e.thereexistsalocalminimizersuchthat
k
^


0
k
=
O
P
(

n
)
:
Recall
n
1
and

n
2
fromassumption(F3).Consider,
D
n
(
u
)=
n

1
n
X
i
=1

ˆ

L
(

0
+

n
u
)

ˆ

L
(

0
)

+

n
p
X
i
=1
d
j
(
j

0
j
+

n
u
j
jj

0
j
j
)
(3.26)
=
T
n
1
+
T
n
2
;
(
say;
)
here,
T
n
1
=
n

1
P
n
i
=1

ˆ

L
(

0
+

n
u
)

ˆ

L
(

0
)

;
and
T
n
2
=

n
P
p
i
=1
d
j
(
j

0
j
+

n
u
j
jj

0
j
j
)
:
71
NowbyTaylor'sexpansionweobtain,
T
n
1
=

n
n

1
=
2
 
T
n
1
(
w;
0
)
u
+

2
n
1
2
u
T
 

n
2
(
w;

)
u
=

n
n

1
=
2
 
T
n
1
(
w;
0
)
u
+

2
n
1
2
u
T
Au
f
1+
o
P
(1)
g
:
(3.27)
Where


isbetween

0
+

n
u
and

0
andthesecondequalityfollowsfrom(3.24)and
assumption(F3)sup

2B
j


n
2
(
w;
)

E


n
2
(
w;
)
j
=
o
P
(1)andtheassumption(F3).
NowbytheCentralLimitTheorem
 
T
n
1
(
w;
0
)=
O
P
(1)
;
hencethetermin(3.27)is
oftheorder
O
P
(
n

1
=
2

n
)
:
ConsiderthesecondtermontheRHS,since
A
ispositive
hencethesecondtermispositiveandbychoosingantlylarge
C
itdominatesthe
termuniformlyin
k
u
k
=
C:
Similarly,
T
n
2
=

n
p
X
i
=1
d
j
(
j

0
j
+

n
u
j
jj

0
j
j
)

n

n
X
j
2
S
d
j
j
u
j
j

n

n
k
u
k
(
X
j
2
S
d
2
j
)
1
=
2
=
O
(

n

n
)
:
Thusbychoosing
C
largeenough,thesecondterminRHSof(3.26)dominatestheother
two,thusproving(3.25).

ProofofTheorem3.1.2.
Toshowthepartofthistheorem,observethatsincethe
lossfunction
ˆ

L
isanonconvexsmoothfunction,thustheKKTconditionforoptimatilyis
necessarybutnotcient.Weshowthatif(
a
)isnottruethenwithprobabilitytending
to1thenecessityofKKTconditionsisviolated,i.e.,forany
j
2
S
c
;
letifpossible
^

j
6
=0
thenbythenecessityofKKTwehave,
n

1
n
X
i
=1
@ˆ

L
(
^

)
@
j
=

n
d
j
sign
(
^

j
)
(3.28)
72
NowbyTaylor'sexpansionweobtain,
n

1
n
X
i
=1
@ˆ

L
(
^

)
@
j
=
n

1
n
X
i
=1
@ˆ

L
(

0
)
@
j
+
n

1
p
X
l
=1
n
X
i
=1
@
2
ˆ

L
(


)
@
j
@
l
(
^

l


0
l
)
Followingstandardarguments,
n

1
n
X
i
=1
@ˆ

L
(

0
)
@
j
=
O
P
(
n

1
=
2
)
and
n

1
P
n
i
=1
@
2
ˆ

L
(


)
=@
j
@
l
=
n

1
P
n
i
=1
E
(
@
2
ˆ

L
(


)
=@
j
@
l
)+
o
P
(1)by(3.24).Also
byassumption
^


0
=
O
P
(
n

1
=
2
)
:
ThustheLHSof(3.28)isoftheorder
O
P
(
n

1
=
2
)hence
choosingthecondition
p

n
d
S
c
min
!1
contradictstherelation(3.28).Thusprovingpart
(
a
)oftheTheorem.
Toprove(b)webeginagainwiththeKKToptimalitycondition,i.e.,inviewofTheorem
3.1.1,withprobabilitytendingto1,forany
j
2
S;
^

j
6
=0thus,
n

1
n
X
i
=1
@ˆ

L
(
^

)
@
j


n
d
j
sign
(
^

j
)=0
(3.29)
Now,
@ˆ

L
(
^

)
@
j
=
@ˆ

L
(

0
)
@
j
+
p
X
l
=1
@
2
ˆ

L
(


)
@
j
@
l
(
^

l


0
l
)
Since,
1
p
n
P
n
i
=1
@ˆ

L
(

0
)
@
S
)N
(0
;D
S
)and
p

n
d
j
!
0
;
sincebyassumption(3.6)forall
j
2
S;d
j
'sareboundedabove,thusweobtaintherequiredresult.

WenowproceedtotheproofsofresultsintheHighDimensionalSetting.As
statedatthebeginningofSection4,thetechniqueusedtoproveTheorem3.3.1istouse
73
theconvexityofthequantilefunction
ˆ
(

)andtheweighted
`
1
-penalty,alongwiththe
approximationof
ˆ
?
L
(

)to
ˆ
(

)
:
Someofthestepsoftheproofaresimilartothoseadopted
byBulmannandVanderGeer(2011,ch.4).
Let
t
=

n
=
(

n
+
k
^


0
k
1
)
;
andset
~

=
t
^

+(1

t
)

0
:
Notethat
~

2B
(

n
)
:
Moreover,
~

2B
(

n
)implies
^

2B
(

n
=
(1

c
))
;
8
0
<c<
1
:
(3.30)
Thisfactwillbeusedinthesequel.
Next,bytheconvexityof
g
n
(

)and
k
d


k
1
;
andtheinequality(3.14),weobtain
g
n
(
~

)

g
n
(

0
)+

n
k
d

~

k
1


n
k
d


0
k
1
+sup

2

j
M
?
n
(

)

M
n
(

)
j
+sup

2B
(

n
)
j
M
n
(

)

EM
n
(

)
j
:
Webeginbyprovidingerrorboundsfor
~
;
whichshalleasilyextendto
^
:
By(3.13)and
(3.16),thesecondtermintheRHSofthe(3.31)is
o
p
(1)
:
Thefollowinglemmaprovidesthe
rateofdecreaseofthelastterm.
Lemma3.5.1
Forthemeasurementerrormodel(3.1),(3.2),assumethat(A1)and(A2)
hold.Then
sup

2B
(

n
)
j
M
n
(

)

EM
n
(

)
j
=
O
p


n
r
2log2
p
n

:
(3.31)
TheproofofTheorem3.2.1andLemma3.5.1areprovidedaftertheproofofTheorem3.3.1.
74
Considerthefollowingevents,
(
i
)
1
=theeventthatthebounds(3
:
9)and(3
:
10)hold,
(
ii
)
2
=theeventthatthebound(3
:
31)holds.
ThenbyTheorem3.2.1andLemma3.5.1,
P

1
\

2
)

1

o
(1)
;
andon
1
\

2
,
sup

2

j
M
?
n
(

)

M
n
(

)
j
+sup

2B
(

n
)
j
M
n
(

)

EM
n
(

)
j
(3.32)
=
O


max
s
3
=
2
h
2
r
2log2
p
n

+
O
(
h
)
:
(3.33)
Thisfollowssince

n
!
0
;
andhencethesecondtermsontheLHSof(3.32)convergesto0
fasterthantheterm.
Inthesequel,allargumentsshallberestrictedtotheset
1
\

2
:
Recallthat
~

2B
(

n
).
From(3.31)and(3.32)wenowreadilyobtainthatwithprobabilityatleast1

o
(1)
;
g
n
(
~

)

g
n
(

0
)+

n
k
d

~

k
1


n
k
d


0
k
1
+
O


max
s
3
=
2
h
2
r
2log2
p
n

+
O
(
h
)
:
(3.34)
ByLemma3.3.2weobtain
g
n
(
~

)

g
n
(

0
)

0
:
Thus,thetriangleinequality
k
d

~

k
1

k
d


0
k
1
k

d

(
~


0
)

S
k
1
+
k
(
d

~

)
S
c
k
1
appliedto(3.34)yields

n
k
(
d

~

)
S
c
k
1


n
k

d

(
~


0
)

S
k
1
+
O


max
s
3
=
2
h
2
r
2log2
p
n

+
O
(
h
)
(3.35)

c
max

n
k
~

S


0
S
k
1
+
O


max
s
3
=
2
h
2
r
2log2
p
n

+
O
(
h
)
:
75
Nowweconsidertwocases,
Case(i)
where,

n
2
c
min
k
~


0
k
1

O


max
s
3
=
2
h
2
r
2log2
p
n

+
O
(
h
)
;
(3.36)
or
Case(ii)
where,

n
2
c
min
k
~


0
k
1

O


max
s
3
=
2
h
2
r
2log2
p
n

+
O
(
h
)
:
(3.37)
ProofofTheorem3.3.1.
First,weproveerrorboundsfor
~
;
which,inviewof(3.30),
shallbeaprecursortoobtainingerrorboundsfor
^
:
Suppose
Case(i)
(3.36)holds.Thefact
k
~


0
k
1
=
k
(
~


0
)
S
k
1
+
k
~

S
c
k
1
and(3.35)
imply

n
c
min
k
~

S
c
k
1


n
k
(
d

~

)
S
c
k
1


n
c
max
k
~

S


0
S
k
1
+

n
2
c
min
k
~


0
k
1
;
whichimplies
k
~

S
c
k
1

c
0
k
~

S


0
S
k
1
;
where
c
0
=(2
c
max
+
c
min
)
=c
min
:
ThustheCompati-
bilitycondition(3.15)isfor

=
~


0
:
NowLemma3.3.2,thetriangleinequality
k
d

~

k
1
k
d


0
k
1
k

d

(
~


0
)

S
k
1
+
k
(
d

~

)
S
c
k
1
,(3.34),and(3.36)togetheryield
2
c
a
n
k

1
=
2
X
(
~


0
)
k
2
2
+

n
c
min
k
~

S
c
k
1


n
c
min
c
0
k
~

S


0
S
k
1
:
(3.38)
76
Recall
c
m
=
c
min
+
c
max
andconsider
4
c
a
n

1
k

1
=
2
X
(
~


0
)
k
2
2
+2

n
c
min
k
~


0
k
1
=4
c
a
n

1
k

1
=
2
X
(
~


0
)
k
2
2
+2

n
c
min
k
~

S


0
S
k
1
+2

n
c
min
k
~

c
S
k
1

2

n
c
min
c
0
k
~

S


0
S
k
1
+2

n
c
min
k
~

S


0
S
k
1
=4

n
c
m
k
~

S


0
S
k
1

4

n
c
m
p
sc
a
p
n˚
k

1
=
2
X
(
~


0
)
k
2

c
a
n
k

1
=
2
X
(
~


0
)
k
2
2
+
4

2
n
c
2
m
s
˚
2
:
Heretheinequalityfollowsfrom(3.38),thesecondfromtheCompatibilityconditionin
(3.15),andthethirdusingtheidentity4
uv

u
2
+4
v
2
:
Thus
3
c
a
n

1
k

1
=
2
X
(
~


0
)
k
2
2
+2

n
c
min
k
~


0
k
1

4

2
n
c
2
m
s
˚
2
:
(3.39)
Nowweconsider
Case(ii)
.From(3.34)weobtain,
c
a
n

1
k

1
=
2
X
(
~


0
)
k
2
2
+

n
c
min
k
~

S
c
k
1


n
c
max
k
~

S


0
S
k
1
+
O
p


max
s
3
=
2
h
2
r
2log2
p
n

+
O
p
(
h
)
;
=
O


max
s
3
=
2
h
2
r
2log2
p
n

+
O
(
h
)
:
Inparticular,
c
a
n

1
k

1
=
2
X
(
~


0
)
k
2
2
=
O


max
s
3
=
2
h
2
r
2log2
p
n

+
O
(
h
)
:
77
Thusunder
Case(ii)
,wehave
3
c
a
n

1
k

1
=
2
X
(
~


0
)
k
2
2
+2

n
c
min
k
~


0
k
1
=
O


max
s
3
=
2
h
2
r
2log2
p
n

+
O
(
h
)
:
(3.40)
Hencefrom(3.39)and(3.40)forany
~

2B
(

n
)wehave,withprobability1

o
(1),
3
c
a
n

1
k

1
=
2
X
(
~


0
)
k
2
2
+2

n
c
min
k
~


0
k
1

4

2
n
c
2
m
s
˚
2
+
O


max
s
3
=
2
h
2
r
2log2
p
n

+
O
(
h
)
:
Therebychoosing

n
accordingtotherateassumptions(3.16),withprobability1

o
(1),
k
~


0
k
1

1
2
h
4

n
c
2
m
s=c
min
˚
+
1

n
O


max
s
3
=
2
h
2
r
2log2
p
n

+
1

n
O
(
h
)
i
!
0
:
Thuschoosing,

n


4

n
c
2
m
s=c
min
˚
+
1

n
O


max
s
3
=
2
h
2
r
2log2
p
n

+
1

n
O
(
h
))
!
0
;
wehavebytherateassumptions(3.16),

n

n
!
0
;
andhence
k
~


0
k
1


n
2
:
Thisalongwiththeconstructionof
~

and(3.30)appliedwith
c
=1
=
2impliesthat
k
^


0
k
1


n
;
andthus,
^

2B
(

n
).Repeatingtheaboveargumentwith
~

replacedby
^

now
givesthedesirederrorbound(3.17),therebycompletingtheproofofTheorem3.3.1.

Foralaterusewestatethefactaboutfact
^

2B
(

n
)asfollows.Notethattheabove
78

n
(3.18).Thuswith

n
asin(3.18),withprobability1

o
(1),
^

2B
(

n
)
:
(3.41)
WenowproceedtotheproofsofTheorem3.2.1andLemma3.5.1.Forthispurposewe
statesomefactsaboutthetwosummandsinthelossfunction
ˆ

L
of(3.4).These
factsareconsequencesofthepropertiesofnormalkerneldensity.Let,for
s;y
2
R
,
l
(
s;y
)=(
y

s
)(
˝

1)+(
y

s
)
H

y

s
h

;
(3.42)
l
0
(
s;y
)=
˝

1+
H

y

s
h

+
y

s
h
K

y

s
h

;
l
00
(
s;y
)=
2
h
K

y

s
h

+
y

s
h
2
K
0

y

s
h

;
l
000
(
s;y
)=
3
h
2
K
0

y

s
h

+
y

s
h
3
K
00
(
y

s
h
)
:
BytheMVTandthedofstandardnormaldensitywereadilyobtainthatuniformly
in
y
2
R
,thefollowingfactsholdforall
s
1
;s
2
2
R
.Forsomeconstant
C>
0,
(i)
j
l
(
s
1
;y
)

l
(
s
2
;y
)
j
C
j
s
1

s
2
j
;
(ii)
j
l
0
(
s
1
;y
)

l
0
(
s
2
;y
)
j
C
h
1
j
s
1

s
2
j
;
(3.43)
(iii)
j
l
00
(
s
1
;y
)

l
00
(
s
2
;y
)
j
C
h
2
j
s
1

s
2
j
;
(iv)
j
l
000
(
s
1
;y
)

l
000
(
s
2
;y
)
j
C
h
3
j
s
1

s
2
j
:
Theaboveconditionsarethereasonforchoosingthekernelfunction
K
(

)asthep.d.f.of
astandardnormalr.v.Werequirethatthethreederivativesof
K
(

)tobebounded
uniformly.Inthefollowingwedenote
l
i
(

)=
l
(
w
0
i
;y
i
)and
l
0
i
(

)
;l
00
i
(

)and
l
000
i
(

)are
similarly.
ProofofTheorem3.2.1.
Firstnotethatfor

2

;
wehave
k


0
k
1

2
b
0
p
s;
by
79
theofandtheassumption
k

0
k
1

b
0
p
s:
Notethat,
ˆ
?
Li
(

)=
l
i
(

)

˙
2

2
l
00
i
(

)
:
Now,
M
?
n
(

)

EM
?
n
(

)=
1
n
n
X
i
=1

l
i
(

)

l
i
(

0
)

E

l
i
(

)

l
i
(

0
)


(3.44)

1
n
˙
2

2
n
X
i
=1

l
00
i
(

)

l
00
i
(

0
)

E

l
00
i
(

)

l
00
i
(

0
)


+
1
n
˙
2

0

˙
2

2
n
X
i
=1

l
00
i
(

0
)

E

l
00
i
(

0
)


I

˙
2

2
II
+
˙
2

0

˙
2

2
III;
say
:
Weshallshowthat
(
a
)sup

2

j
I
j
=
O
p

p
s
r
2log2
p
n

;
(
b
)sup

2

j
II
j
=
O
p

s
1
=
2
h
2
r
2log2
p
n

;
(
c
)
j
III
j
=
O
p

s
h
r
log2
p
n

:
Observethatfor

2

;˙
2

=

T


max
b
0
s
and
j
˙
2

0

˙
2

j
2
b
0

max
s:
Thisfact
alongwithboundsfor
I;II
and
III
shallimplythedesiredresult.
theempiricalprocess
G
n
(

):=
1
n
P
n
i
=1

l
i
(

)

El
i
(

)

and
Z
n
:=sup

2

jG
n
(

)
G
n
(

0
)
j
:
With
˙
u
asinassumption(A3),let
c
u
=1
:
4
˙
u
.Ontheevent
A
=

max
1

j

p
1
n
P
n
i
=1
u
2
ij

80
c
u

;
1
n
n
X
i
=1
w
2
ij

2
n
n
X
i
=1
(
x
2
ij
+
u
2
ij
)

2(
c
x
+
c
u
)
:
(3.45)
ThisboundandtheLipschitzcondition(3.43)(i)allowustoapplyLemma14.20andTheorem
14.2asdoneinExample14.2ofBuhlmannandVandeGeer(2011)page503,toyield
E

Z
n
I
A


32
c
1
b
0
(
c
x
+
c
u
)
p
s
r
2log2
p
n
;
P

Z
n
I
A

8
c
1
b
0
(
c
x
+
c
u
)
p
s

4
r
2log2
p
n
+
r
2
t
n


exp


t

;
forany
t>
0
:
Choose
t
=log2
p
inthelatterboundtoobtain
P

Z
n
I
A

O

p
s
r
2log2
p
n


=
o
(1)
:
Nowtoremovethetruncationof
Z
n
ontheset
A;
observethat(3.43)(i)alsoimpliesthat,
j
l
i
(

)

l
i
(

0
)
j
C
(

n
+max
ij
j
u
ij
j
)
k


0
k
1

2
Cb
0
(

n
+max
ij
j
u
ij
j
)
p
s;
sinceforany

2

;
wehave
k


0
k
2
b
0
p
s:
Hence,
Z
n

Z
n
I
A
+
c
p
s
(

n
+max
ij
j
u
ij
j
)
I
A
c
+
c
p
sE

(

n
+max
ij
j
u
ij
j
)
I
A
c

:
(3.46)
Nowrecallthatforeach1

j

p
,
f
u
ij
;
1

i

n
g
arei.i.d.
L
(0
;˙
2
jj
)r.v.'s.Hence,
2
P
n
i
=1
j
u
ij
j
=˙
jj
˘
˜
2
2
n
;
where
˜
2
2
n
denotesachisquarer.v.with2
n
degreesoffreedom.
Nowusetheprobabilityboundsforchi-squaredistributionsgivenbyJhonstone(2001)to
81
obtain
P

A
c


p
X
j
=1
P

1
n
n
X
i
=1
j
u
ij
j
o
2

c
2
u


p
X
j
=1
P

2
1
n
n
X
i
=1
j
u
ij
j
˙
jj

2
:
8

(3.47)
=
p
X
j
=1
P

˜
2
2
n

n
2
:
8


p
X
j
=1
P

j
˜
2
2
n

2
n
j
2
n
(0
:
4)


p
X
j
=1
exp


3
n
100


exp


3
n
100
+log
p

:
Next,usethefactthat
j
u
ij
j˘
Exp
(
˙
jj
)
;
toobtain
E

(max
ij
j
u
ij
j
)
2


X
i;j
E
(
u
2
ij
)

npc
u
Thus,usingthisbound,(3.47),andtheCauchy-Schwarzinequality,weobtain
(a)
P

(max
ij
j
u
ij
j
)
I
A
c
>n

k


n
k
E

(max
ij
j
u
ij
j
)
I
A
c


n
k
r
npc
u
exp


3
n
100
+log
p

;
(b))
E

(max
ij
j
u
ij
j
)
I
A
c


r
E

(max
ij
j
u
ij
j
)
2

P
(
A
c
)

r
npc
u
exp


3
n
100
+log
p

:
Theexponentialboundin(a)impliesthattheprobabilityoftheeventin(a)tendstozero,
forany
k>
0.Thisinturnimpliesthatthesecondsummandin(3.46)
p
s
(

n
+max
ij
j
u
ij
j
)
I
A
c
=
o
p
(
n

k
)
;
8
k>
0
:
Similarly,theboundin(b)impliesthatthethirdsummandintheboundof(3.46)decreases
tozeroatanexponentialrate.Thus,withprobabilityatleast1

o
(1)
;
theremaindertwo
82
summandsintheboundin(3.46)decreasetozero,inprobability,fasterthan
Z
n
I
A
:
Hence,
sup

2

j
I
j
=
Z
n
=
O
p

p
s
r
2log2
p
n

:
(3.48)
Wecansimilarlyobtainaboundforterm
II
of(3.44).Anoutlineisgivenbelow.
theempiricalprocess
~
G
n
(

):=
1
n
P
n
i
=1

l
00
i
(

)

El
00
i
(

)

:
Let
~
Z
n
:=sup

2

j
~
G
n
(

)

~
G
n
(

0
)
j
:
Proceedingasearlier,(3.43)(ii)alongwiththebound(3.45)allowustoapplyLemma14.20
andTheorem14.2ofBuhlmannandVandeGeer(2011),page503,whichyields
E

~
Z
n
I
A


32
c
3
b
0
(
c
x
+
c
u
)
p
s
h
2
r
2log2
p
n
;
and
P

Z
n
I
A

8
c
3
b
0
(
c
x
+
c
u
)
p
s
h
2

4
r
2log2
p
n
+
r
2
t
n


exp


t

;
8
t>
0
:
Choose
t
=log2
p
inthisboundtoobtain
P

~
Z
n
I
A

O

p
s
h
2
r
2log2
p
n


=
o
(1)
:
Getridofthetruncationontheset
A
asdonefor
I
,toobtain
sup

2

j
II
j
=
~
Z
n
=
O
p

p
s
h
2
r
2log2
p
n

:
(3.49)
Lastly,considertheterm
III
in(3.44).Observethat
j
l
00
i
(

0
)
j
ch

1
;
for
c<
1
:
Then
83
Lemma14.11ofBuhlmannandVandeGeer(2011)yields
P

1
n


n
X
i
=1

l
00
i
(

0
)

El
00
i
(

0
))


t


2exp


nt
2
h
2
2
c
2

:
Choosing
t
=
h

1
q
log2
p
n
;
weobtain
j
III
j
=
1
n


n
X
i
=1

l
00
i
(

0
)

El
00
i
(

0
))


=
O
p

h

1
r
log2
p
n

:
(3.50)
Nowuse(3.48),(3.49)and(3.50)in(3.44),andthefactthattherateofdecreaseof(3.49)
istheslowest,toconclude(3.9)ofTheorem3.2.1.
Theproofof(3.10)similar.ThiscompletestheproofofTheorem3.2.1.

ProofofLemma3.5.1.

ˆ
(
s;y
i
)=
ˆ
˝
(
y
i

s
)
:
Thenobservethatitthe
followingLipchitzcondition,
j
ˆ
(
s
1
;y
i
)

ˆ
(
s
2
;y
2
)
j
max
f
˝;
1

˝
gj
s
1

s
2
j
:
Thenproceedasintheproofof(3.9)ofTheorem3.2.1toobtainthedesiredbound.

ProofofTheorem3.3.2.
Let

n
beasin(3.18).By(3.41),
^

2B
(

n
),with
probability1

o
(1).Thus,witharbitrarilylargeprobability,foralllarge
n
,
^

isintheinterior
ofandnotonitsboundary.Hence,KKTconditionsarenecessaryforthisoptimum.We
provethedesiredresultviacontradiction.Forany
j
2
S
c
;
letifpossible
^

j
6
=0
:
Thenby
thenecessityofKKTconditions,
84
d

j

n

1
n
X
i
=1
ˆ
?
Li
(
^

)

=

n
d
j
sign
(
^

j
)
:
(3.51)
Recall(3.42).Thederivativesof
ˆ
?
Li
(

)and
ˆ
Li
(

)w.r.t

j
are
ˆ
?
0
Li;j
(

):=
d

j
ˆ
?
Li
(

)=

w
ij
l
0
i
(

)+
˙
2

2

l
000
i
(

)


w
ij
n
X
k
=1
˙
kj

j

l
00
i
(

)

;
(3.52)
ˆ
0
Li;j
(

)=
d

j
ˆ
Li
(

)=

x
ij
h
˝

1+
H

"

h

+
"

h
K

"

h

i
:
Let
ˆ
0
i;j
(

):=

x
ij

˝

I
f
y
i

x
0
i


0
g

; 

0
Li;j
(

):=
Eˆ
?
0
Li;j
(

)
; 
0
Li;j
(

):=
Eˆ
0
Li;j
(

)
;
 
0
i;j
(

):=
Eˆ
0
i;j
(

),and
S
?
n;j
(

)=
n

1
n
X
i
=1

ˆ
?
0
Li;j
(

)

ˆ
?
0
Li;j
(

0
)

 
0
Li;j
(

)+
 

0
Li;j
(

0
)

;T
?
B
;j
=sup

2B
(

n
)


S
?
n;j
(

)


:
Thefactthat
 
0
i;j
(

0
)=
E
(
ˆ
0
ij
(

0
))=0
;
andtriangleinequalityyield
j
n

1
n
X
i
=1
ˆ
?
0
Li;j
(
^

)
j
n

1


n
X
i
=1

ˆ
?
0
Li;j
(

0
)

 
0
Li;j
(

0
)


+
n

1


n
X
i
=1

 
?
0
Li;j
(
^

)

 
0
i;j
(
^

)


+
n

1


n
X
i
=1

 
0
i;j
(
^

)

 
0
i;j
(

0
)


+
T

B
;j
:
=
J
1
+
J
2
+
J
3
+
J
4
:
Wewillshowtherelations(3.53)-(3.53)belowholdforall
j
2
S
c
simultaneouslywith
85
probabilityatleast1

o
(1)
:
J
1
:=
n

1


n
X
i
=1

ˆ
?
0
Li;j
(

0
)

 
?
0
Li;j
(

0
)


=
o
(
d
j

n
)
;
(3.53)
J
2
:=sup

2B
(

n
)
n

1


n
X
i
=1

 
?
0
Li;j
(

)

 
0
i;j
(

)


=
O
(

n
h
)=
o
(
d
j

n
)
;
(3.54)
J
3
:=sup

2B
(

n
)
n

1


n
X
i
=1

 
0
i;j
(

)

 
0
i;j
(

0
)


=
o
(
d
j

n
)
;
(3.55)
J
4
:=
T

B
;j
=
o
(
d
j

n
)
:
(3.56)
Lemma3.5.2proves(3.54)and(3.55)andLemma3.5.3proves(3.53)and(3.56).Finally,
combining(3.53)-(3.56),weobtainthatfor
n
large,withprobability1

o
(1)
;


d

j

n

1
n
X
i
=1
ˆ
?
Li
(
^

)


<d
j

n
;
8
j
2
S
c
:
Thiscontradictstheoptimalitycondition(3.51),andalsocompletestheproofofTheorem
3.3.2.

Lemma3.5.2
UndertheconditionsofTheorem3.3.2wehave,
max
1

j


2
R
p
n

1


n
X
i
=1

 
?
0
Li;j
(

)

 
0
i;j
(

)


=
O
(

n
h
)=
o
(

n
d
S
c
min
)
;
(3.57)
max
j
2
S
c
sup

2B
(

n
)
n

1


n
X
i
=1

 
0
i;j
(

)

 
0
i;j
(

0
)


=
o
(

n
d
S
c
min
)
:
(3.58)
Proof.
Let
a
i
=
x
0
i
(


0
),and
"

:=
y
i

x
0
i

=
"
i

a
i
:
ByTheorem2ofWSZ,
n
X
i
=1

 
?
0
Li;j
(

)

 
0
i;j
(

)

=
n
X
i
=1

 
0
Li;j
(

)

 
0
i;j
(

)

:
86
But
 
0
Li;j
(

)

 
0
ij
(

)=

x
ij
E

H

"

h


1
f
"

>
0
g
+
"

h
K

"

h


(3.59)
=

x
ij
E

H


"

h


+
"

h
K

"

h


:
Now,
E

H


"

h


=
Z
1
x
=

H


x

a
i
h


f
i
(
x
)
dx
=
h
Z
1
t
=

H
(

t
j
)
f
i
(
ht
+
a
i
)
dt
=
h
Z
0
t
=

H
(
t
)
f
i
(
ht
+
a
i
)
dt
+
h
Z
1
t
=0
H
(

t
)
f
i
(
ht
+
a
i
)
dt
=
O
(
h
)
;
uniformlyin1

i

n;
2
R
p
,becausebyassumption(A1),sup
1

i

n;x
2
R
f
i
(
x
)
<
1
.
Similarly,onevsthatmax
1

i


2
R
p
j
h

1
E"

K
(
"

=h
)
j
=
O
(
h
)
:
Hencefrom(3.59)
weobtain,
sup
1

i

n;
1

j


2
R
p


0
Li;j
(

)

 
0
i;j
(

)


=
O
(

n
h
)=
o
(

n
d
S
c
min
)
:
(3.60)
Thelastequalityfollowsfromtherateassumptions(3.20).Thisboundandassumption(A2)
completestheproofof(3.57).
Next,weshow(3.58).Byassumption(A1),
 
0
i;j
(

)

ˆ
0
i;j
(

0
)

(3.61)
=

x
ij
h
F
i

x
0
i
(


0
)


F
i
(0)
i
=

x
ij
f
i
(0)
x
T
i
(


0
)

x
ij
~
I
i
;
87
where
~
I
i
=
F
i
(
x
0
i
(


0
))

F
i
(0)

f
i
(0)
x
T
i
(


0
)
:
Nowforany
j
2
S
c
,


1
n
n
X
i
=1
x
ij
f
i
(0)
x
0
i
(


0
)


1
n
x
ij
f
i
(0)
x
0
i


1
k


0
k
1
=
O
(

n
)=
o
(

n
d
S
c
min
)
;
(3.62)
thisfollowssince
f
i
(0)and
n

1
P
n
i
=1
x
2
ij
areboundedbyaconstantforall1

i

n
and
1

j

p:
Also,fromassumption(A1)weobtain,
max
j
2
S
c


1
n
n
X
i
=1
x
ij
~
I
i


n
n
n
X
i
=1
~
I
i

C
2

n
n
n
X
i
=1

x
0
i
(


0
)

2
:
(3.63)

C
3
n
k


0
k
2
1
=
O


3
n

2
n

=
o
(

n
d
S
c
min
)
:
Nowuse(3.61){(3.63)toobtain(3.58),therebycompletingtheproofofthelemma.

Lemma3.5.3
UndertheconditionsofTheorem3.3.2,
max
j
2
S
c
1
n


n
X
i
=1

ˆ
?
0
Li;j
(

0
)

 
?
0
Li;j
(

0
)


=
o
p
(

n
d
S
c
min
)
(3.64)
max
j
2
S
c
T

B
;j
=
o
p
(

n
d
S
c
min
)
:
(3.65)
Proof
ThestructureofthisproofissimilartotheproofofTheorem3.2.1.Inthe
followingproof
c>
0shalldenoteagenericconstantthatmaybetdependingonthe
context.Forany0
<;
theevent
A
=
n
max
1

j

p
1
n
n
X
i
=1
u
2
ij

c
u
;
max
1

j

p;
1

i

n
j
u
ij
j
n

o
:
88
Usethefact
j
u
ij
j˘
Exp
(
˙
jj
)toobtain
P

max
1

i

n;
1

j

p
j
u
ij
j
>cn


p
X
j
=1
n
X
i
=1
P

j
u
ij
j
cn


1
˙
u
exp


cn

˙
u
+log
p
+log
n

:
Thisboundand(3.47)togetherimplythat
P
(
A
c
)

1
˙
u
exp


cn

˙
u
+log
p
+log
n

+exp


3
n
100
+log
p

:
Now,
n

1


n
X
i
=1

ˆ
?
0
Li;j
(

0
)

 
?
0
Li;j
(

0
)


n

1


n
X
i
=1
w
ij

0
i
(

0
)


+
˙
2

0
2
n

1


n
X
i
=1
w
ij

000
i
(

0
)


+


n
X
i
=1
˙
ij

0
j


n

1


n
X
i
=1
w
ij

00
i
(

0
)


;
where

0
i
(

0
):=
l
0
i
(

0
)

El
0
i
(

0
)
;
00
i
(

0
):=
l
00
(

0
)

El
00
i
(

0
)and

000
i
(

0
):=
l
000
i
(

0
)

El
000
i
(

0
)
:
Using

n

n

,weobtain
(
i
)
j
w
ij

0
i
(

0
)
I
A
j
cn

;
(
ii
)
j
w
ij

00
i
(

0
)
I
A
j
cn

h

1
;
(
iii
)
j
w
ij

000
i
(

0
)
I
A
j
cn

h

2
:
Hence,
P

max
1

j

p
1
n


n
X
i
=1
w
ij

0
i
(

0
)
I
A


t


p
X
j
=1
P

1
n


n
X
i
=1
w
ij

0
(

0
)
I
A


t


2exp


cn


nt
2
+log
p

;
wherethelastinequalityfollowsfromLemma14.11ofBuhlmannandVandeGeer(2011).
89
Thuschoosing
t
=
cn

p
2log2
p=n;
forsomeconstant
c>
0,weobtain
1
n


n
X
i
=1
w
ij
l
0
i
(

0
)
I
A


=
O
p

n

r
2log2
p
n

:
(3.66)
Nowtoremovethetruncationontheset
A
,observethat,
max
1

j

p
j
n
X
i
=1
w
ij

0
i
(

0
)
j
max
1

j

p
j
n
X
i
=1
w
ij

0
i
(

0
)
I
A
j
+max
1

j

p
c
(

n
+max
i;j
j
u
ij
j
)
1
A
c
(3.67)
+
c
max
1

j

p
E

(

n
+max
i;j
j
u
ij
j
)
1
A
c

ProceedasintheproofofTheorem3.2.1toshowthatthelasttwotermsontheRHS
convergetozerofasterthantheterm,inprobability.Thusweobtain,
1
n


n
X
i
=1
w
ij

0
i
(

0
)


=
O
p

n

r
2log2
p
n

;
(3.68)
Asimilarargumentyieldsthat
max
1

j

p
1
n


n
X
i
=1

00
i
(

0
)


=
O
p

h

1
r
2log2
p
n

;
max
1

j

p
1
n


n
X
i
=1
w
ij

000
(

0
)


=
O
p

n

h
2
r
2log2
p
n

:
Recallthat
˙
2

0

b
0

max
s;
and
j
P
n
k
=1
˙
kj

j
j
b
0
˙
u
s:
Nowcombinetheseresultswiththe
rateassumptions(3.20)toobtain(3.64).
90
Toproveclaim(3
:
65),notethat
S
?
n;j
(

)=

1
n
n
X
i
=1
w
ij

l
0
i
(

)

l
0
i
(

0
)

El
0
i
(

)

El
0
i
(

0
)

+
˙
2

2
1
n
n
X
i
=1
w
ij

l
000
i
(

)

l
000
i
(

0
)

El
00
i
(

)

El
00
i
(

0
)

+
˙
2


˙
2

0
2
1
n
n
X
i
=1
w
ij

l
000
i
(

0
)

El
000
i
(

0
)


n
X
k
=1
˙
kj

j
1
n
n
X
i
=1

l
00
i
(

)

l
00
i
(

0
)

El
00
i
(

)

El
00
i
(

0
)


n
X
k
=1
˙
kj
(

j


0
j
)
1
n
n
X
i
=1

l
00
i
(

0
)

El
00
i
(

0
)

=

I
+
˙
2

2
II
+
˙
2


˙
2

0
2
III

n
X
k
=1
˙
kj

j
IV

n
X
k
=1
˙
kj
(

j


0
j
)
V:
Webeginwiththeterm
II;
whichturnsouttohavetheslowestrateofconvergence.
theempiricalprocess
G
000
n;j
(

):=
1
n
P
n
i
=1
w
ij

l
000
i
(

)

El
000
i
(

)

andlet,
Z
000
n;j
=sup

2B
(

n
)


G
000
n;j
(

)
G
000
n;j
(

0
)


:
Also,observethatfrom(3.42)and(3.43)wehave,


w
ij
l
000
(
s
1
;y
i
)

w
ij
l
000
(
s
2
;y
i
)


c
(

n
+max
i;j
j
u
ij
j
)
h

3
j
s
1

s
2
j
:
ThenasintheaboveproofofTheorem3.2.1,applyLemma14.2andTheorem14.2of
91
BuhlmannandVandeGeer(2011toobtain
P
 
max
1

j

p
Z
000
n;j
I
A

8
cb
0
(
c
x
+
c
u
)
n

h

3

n

4
r
2log2
p
n
+
r
2
t
n

!

p
X
j
=1
P
 
Z
000
n;j
I
A

8
cb
0
(
c
x
+
c
u
)
n

h

3

n

4
r
2log2
p
n
+
r
2
t
n

!

exp


t
+log
p

:
Nowchoose
t
=
c
log
p
,
c>
0
;
sothatthelasttermintheaboveexpressionis
o
(1)
:
Now
removingthetruncationontheset
A
asdoneintheproofofTheorem3.2.1,weobtain
max
j
2
S
c
sup

2B
(

n
)
j
II
j
=max
1

j

p
Z
000
n;j
=
O
p

n

h

3

n
r
2log2
p
n

;
wherethelastequalityfollowsbytherateassumption(3.20).Asimilarargumentapplied
totheterms
I
and
IV
yieldsthat
max
j
2
S
c
sup

2B
(

n
)
j
I
j
=
O
p

n

h

1

n
r
2log2
p
n

;
max
j
2
S
c
sup

2B
(

n
)
j
IV
j
=
O
p

h

2

n
r
2log2
p
n

;
Anargumentsimilartotheoneusedforproving(3.64)yields
max
j
2
S
c
j
III
j
=
O
p

n

h
2
r
2log2
p
n

;
max
j
2
S
c
j
V
j
=
O
p

n

h

1
r
2log2
p
n

:
Nowclaim(3.65)followsfromthesebounds,theratecondition(
iii
)of(3.20),andthefacts
j
P
n
k
=1
˙
kj
(

j


0
)
j
2
b
0
˙
u
p
s;
and
˙
2


b
0

max
s:
ThiscompletestheproofofLemma
3.5.3.
92
BIBLIOGRAPHY
93
BIBLIOGRAPHY
[1]
AGARWAL,A.,NEGHBAN,S.ANDWAINWRIGHT,M.J.(2012).FASTGLOBAL
CONVERGENCEOFGRADIENTMETHODSFORHIGHDIMENSIONALSTATIS-
TICALRECOVERY.
ANN.STATIST.
,
40
,2452{2482.
[2]
ALQUIER,P.ANDDOUKHAN,P.(2011).SPARSITYCONSIDERATIONSFORDE-
PENDENTVARIABLES
ELECTRONICJOURNALOFSTATISTICS
,
5
,750-774.
[3]
BAILLIE,R.T.(1996).LONGMEMORYPROCESSESANDFRACTIONALINTE-
GRATIONINECONOMETRICS
JOURNALOFECONOMETRICS
,
73
,735-59.
[4]
BELLONI,A.ANDCHERNOZHUKOV,V.(2011)
`
1
-PENALIZEDQUANTILERE-
GRESSIONINHIGHDIMENSIONALSPARSEMODELS,
ANN.OFSTATIST.
,
39
,
82{130.
[5]
BERAN,J.,FENG,Y.,GHOSH,S.,KULIK,R(2013).LONGMEMORYPROCESSES,
SPRINGER
[6]
BERAN,J.(1992).STATISTICALMETHODSFORDATAWITHLONG-RANGEDE-
PENDENCE,
STATISTICALSCIENCE
7
,404-427.
[7]
BICKEL,P.,RITOV,Y.ANDTSYBAKOV,A.(2009).SIMULTANEOUSANALYSIS
OFLASSOANDDANTZIGSELECTOR,
ANN.STATIST.
,
37
1705{1732.
[8]
BUCHINSKY,M.(1994).CHANGESINTHEU.S.WAGESTRUCTURE1963-1987:
APPLICATIONSOFQUANTILEREGRESSION
ECONOMETRICA
,
62
405{458.
[9]
B

UHLMANN,P.ANDVANDEGEER,S.(2011).
STATISTICSFORHIGHDIMEN-
SIONALDATA
.SPRINGER,NEWYORK.
[10]
CARROLL,R.J.,RUPPERT,D.,STEFANSKI,L.A.ANDCRAINICEANU,C.(2006).
MEASUREMENTERRORINNONLINEARMODELS:AMODERNPERSPECTIVE
.
NEWYORK:CHAPMANANDHALL.
[11]
DAHLHAUS,R.(1995).EFFICIENTLOCATIONANDREGRESSIONESTIMATION
FORLONGRANGEDEPENDENTREGRESSIONMODELS,
ANN.STATIST.
,
23
1029-1047.
94
[12]
DOUKHAN,P.(1994).MIXING:PROPERTIESANDEXAMPLES,
SPRINGER
[13]
DUCHI,J.,SHALEV-SHWARTZ,S.,SINGER,Y.ANDCHANDRA,T.(2008).EF-
FICIENTPROJECTIONSONTOTHE
`
1
-BALLFORLEARNINGINHIGHDI-
MENSIONS.
INTERNATIONALCONFERENCEONMACHINELEARNING.272-
279.
ACM,NEWYORK,NY.
[14]
DUREETT,R.(2005).PROBABILITY:THEORYANDEXAMPLES,(THIRDEDI-
TION)
DUXBURY
[15]
FAN,J.,FAN,Y.,BARUT,E.(2014).ADAPTIVEROBUSTVARIABLESELEC-
TION
ANN.OFSTATIST.
,
42
,324-351.
[16]
FRIEDMAN,J.,HASTIE,T.ANDTIBSHIRANI,R.(2010).REGULARIZATION
PATHSFORGENERALIZEDLINEARMODELSVIACOORDINATEDESCENT.
J.
STAT.SOFTW.
,
33
,1{22.
[17]
FULLER,W.A.(1987).
MEASUREMENTERRORMODELS
.WILEY,NEWYORK.
[18]
GIRAITIS,L.,KOUL,H.,SURGAILIS,D.(2012).LARGESAMPLEINFERENCE
FORLONGMEMORYPROCESSES,
IMPERIALCOLLEGEPRESS
[19]
GUO,H.ANDKOUL,H.(2007).NONPARAMETRICREGRESSIONWITHHET-
EROSCEDASTICLONGMEMORYERRORS,
JOURNALOFSTATISTICALPLAN-
NINGANDINFERENCE
,
137
379-404.
[20]
JOHNSTONE,I.M.(2001).CHI-SQUAREORACLEINEQUALITIES.
STATEOF
THEARTINPROBABILITYANDSTATISTICS
(LEIDEN,1999),399-418,IMSLEC-
TURENOTESMONOGR.SER.,
36
,IMS,BEACHWOOD,OH.
[21]
KNIGHT,K.ANDFU,W.(2000).ASYMPTOTICSFORLASSO-TYPEESTIMA-
TORS,
ANN.STATIST.
,
28
1356-1378.
[22]
LEE,R.,NOH,H.ANDPARK,B.(2014).MODELSELECTIONVIABAYESIAN
INFORMATIONCRITERIONFORQUANTILEREGRESSIONMODELS.
J.AMER.
STATIST.ASSOC.
,
109
,216-229.
[23]
LOH,P.,ANDWAINWRIGHT,M.J.(2012).HIGH-DIMENSIONALREGRESSION
WITHNOISYANDMISSINGDATA:PROVABLEGUARANTEESWITHNON-
CONVEXITY.
ANNALSOFSTATIST.
,
40
,1637{1664.
95
[24]
MCKENZIE,H.,JERDE,C.VISSCHER,D.,MERRILL,E.ANDLEWIS,M.(2009).
INFERRINGINTHEPRESENCEOFGPSMEASUREMENTERROR.
ENVIRON.
ECOL.STAT.
,
16
,531{546.
[25]
MEINHAUSEN,N.ANDB

UHLMANN,P.(2006).HIGHDIMENSIONALGRAPHS
ANDVARIABLESELECTIONWITHTHELASSO,
ANN.STATIST.
,
34
1436-1462.
[26]
NOLAN,D.ANDPOLLARD,D.(1987).U-PROCESSES:RATESOFCONVER-
GENCE,
ANN.STATIST.
,
15
780-799.
[27]
POLLARD,D.(1984).CONVERGENCEOFSTOCHASTICPROCESSES.
SPRINGER,NEWYORK.
[28]
PURDOM,E.ANDHOLMES,S.(2005).ERRORDISTRIBUTIONFORGENEEX-
PRESSIONDATA.
STATIST.APPL.INGENETICSANDBIOLOGY,
4
,ARTICLE
16.
[29]
RASKUTTI,G.,WAINWRIGHT,M.,YU,B.(2010).RESTRICTEDEIGENVALUE
PROPERTIESFORCORRELATEDGAUSSIANDESIGNS,
JOURNALOFMA-
CHINELEARNINGRESEARCH
,
99
,2241{2259.
[30]
ROSENBAUM,M.ANDTSYBAKOV,A.B.(2010).SPARSERECOVERYUNDER
MATRIXUNCERTAINTY
ANNALSOFSTATIST.
,
38
2620{2651.
[31]
ROSENBAUM,M.ANDTSYBAKOV,A.B.(2011).IMPROVEDMA-
TRIXUNCERTAINTYSELECTOR
TECHNICALREPORT.
AVAILABLEAT
HTTP://ARXIV.ORG/ABS/1112.4413.
[32]
SORENSEN,O.,FRIGESSI,A.ANDTHORESEN,M.(2012).MEASUREMENTER-
RORINLASSO:IMPACTANDLIKELIHOODBIASCORRECTION.AVAILABLE
ATHTTP://ARXIV.ORG/PDF/1210.5378.PDF.
[33]
STEFANSKI,L.A.,CARROLL,R.J.(1990).DECONVOLUTINGKERNELDENSITY
ESTIMATORS.
STATISTICS
,
21
,169{184.
[34]
TIBSHIRANI,R.(1996).REGRESSIONSHRINKAGEANDSELECTIONVIATHE
LASSO,
JOURNALOFTHEROYALSTATISTICALSOCIETY,SERIESB
,
58
267288.
[35]
WANG,H.,LI,R.ANDTSAI,C.L.(2007).TUNINGPARAMETERSELECTORS
FORSMOOTHLYCLIPPEDABSOLUTEDEVIATIONMETHOD
BIOMETRIKA
,
3
,
553{668.
96
[36]
WANG,H.,STEFANSKI,L.A.,ANDZHU,Z.(2012).CORRECTED-LOSSESTIMA-
TIONFORQUANTILEREGRESSIONWITHCOVARIATEMEASUREMENTER-
RORS.
BIOMETRIKA
,
99
,405-421.
[37]
WANG,L.,WU,Y.,LI,R.(2012).QUANTILEREGRESSIONFORANALYZING
HETEROGENEITYINULTRA-HIGHDIMENSION.
J.AMER.STATIST.ASSOC.
,
107
,214-222.
[38]
YOON,Y.,PARK,C.,ANDLEE,T.(2013),PENALIZEDREGRESSIONMODELS
WITHAUTOREGRESSIVEERRORTERMS,
JOURNALOFSTATISTICALCOM-
PUTATIONANDSIMULATION
,
83
,1756-1772.
[39]
ZHANG,Y.,LI,R.ANDTSAI,C.L.(2010).REGULARIZATIONPARAMETERSE-
LECTIONSVIAGENERALIZEDINFORMATIONCRITERION.
J.AMER.STATIST.
ASSOC.
,
105
,312{323.
[40]
ZHAO,P.ANDYU,B.(2006).ONMODELSELECTIONCONSISTENCYOF
LASSO,
JOURNALOFMACHINELEARNINGRESEARCH
,
7
,2541{2563.
[41]
ZOU,H.(2006).THEADAPTIVELASSOANDITSORACLEPROPERTIES,
J.
AMER.STATIST.ASSOC.
,
101
,1418-1429.
97