QUANTILE REGRESSION AND EXTREMES By Raka Mandal A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics – Doctor of Philosophy 2020

ABSTRACT

QUANTILE REGRESSION AND EXTREMES By Raka Mandal

The estimation of conditional quantiles of a response variable is of prime interest in many statistical applications. Quantile regression (QR) is a powerful and popular method used extensively to estimate conditional quantiles of a response Y in the presence of a covariate X. Moreover, QR can quantify the effect of covariates at different quantile levels. While modeling a rare event, quantiles at the high or low tails are of particular interest. In such cases, QR has inevitable shortfalls. Since we have fewer data at the tails, QR estimates suffer from high variability. The performance deteriorates further when the underlying distribution is heavy-tailed. Estimation of extreme quantiles is therefore challenging, especially when the data come from a heavy-tailed distribution.

Extreme value theory (EVT) provides mathematical tools to analyze rare events. In practice, EVT can be used to assess the probability of events more extreme than any previously observed occurrence. The scope of application of EVT includes financial risk assessment, extreme climate modeling, network anomaly detection, etc. Statistical modeling using EVT can be carried out in two approaches: block maxima and Peak over Threshold (POT). In the block maxima approach, the Generalized extreme value (GEV) distribution is fitted to a series of maxima derived from the observations. In the POT approach, one fits the Generalized Pareto (GP) distribution to observations exceeding a certain threshold. An important problem when using the POT approach is the choice of the threshold. Models based on EVT use asymptotic arguments to approximate the tail behavior; hence, the choice of the threshold is crucial in order to fit the GP distribution to the data. The POT approach is a popular tool for the estimation of extreme quantiles in heavy-tailed data, wherein the excesses over a threshold are modeled as a function of the covariate X. However, the efficiency of POT is severely compromised when the threshold itself varies as a function of the covariate.

This dissertation proposes an integrated approach for estimating extreme conditional quantiles from a heavy-tailed distribution. We begin with the case where the threshold does not vary as a function of the covariate X. Using the POT approach to model the scaled conditional excess, we propose an estimator for high conditional quantiles. We establish large sample properties of our estimator in the context of the GP distribution. Through numerical investigations, we demonstrate the superiority of our method over QR at higher quantiles. However, this method is not adequate when the threshold itself varies as a function of the covariate X. In order to circumvent this issue, we propose an extension of our approach, which uses standard QR to extract information on the threshold and then models the residuals as a GP distribution with covariate-dependent parameters. We establish the asymptotic properties of our method in the context of the GP distribution. Extending further, we thoroughly study the asymptotic performance of this approach for a wide class of heavy-tailed distributions using numerical simulations.
Using simulation studies, we compare our approach with existing methods used in the estimation of high conditional quantiles for heavy-tailed distributions. As an application, we have implemented our method for the task of precipitation downscaling with data obtained from the Vancouver International Airport weather station. We have also demonstrated how our method of covariate adaptive threshold selection can be implemented in practice.

Copyright by RAKA MANDAL 2020

In memory of my grandmother Lila Ghosh.

ACKNOWLEDGEMENTS

This Ph.D. journey has been challenging, both in my professional and personal life, particularly over the last two years. I want to take this opportunity to thank everyone who has given me immense support through this journey. First of all, I would like to thank my doctoral advisors, Dr. Tapabrata Maiti and Dr. Shrijita Bhattacharya, who introduced me to the world of research. I am thankful to Dr. Maiti for believing in my potential and giving me several opportunities during my time at Michigan State University. His constant guidance has helped me to grow as a researcher and become successful in my career. It would not have been possible for me to write this dissertation and get to where I am today without his counsel and support. I am thankful to Dr. Bhattacharya for introducing me to the magnificent domain of Extreme Value Theory. She has been an excellent mentor, and working with her has been rewarding. Her timely advice on my career plans, constant encouragement during stressful times, and constructive criticisms have aided me in improving and preparing myself for the next phase of my life. Not to mention, she has provided me with vital financial support for this work.

I would like to thank Prof. Andrew Finley and Prof. Hyokyoung (Grace) Hong for being part of my dissertation committee. Their helpful suggestions and recommendations have been crucial for the completion of this dissertation. I would also like to thank the Department of Statistics and Probability at Michigan State University (MSU) for providing me with this incredible opportunity to pursue my academic endeavor.

East Lansing has been my home away from home for the past seven years. I consider myself fortunate to have stayed here and enjoyed the picturesque campus of MSU. Be it Spring, Summer, Fall, or Winter, my every venture into the nooks and corners of this campus has been full of exciting discoveries. During my stressful days, I found solace by the banks of the Red Cedar River just outside my department. Thank you, WKAR-FM, for fueling my mornings with classical melodies. Writing this thesis would have been mundane without you. Thank you, Dr. Tawa Sina, for inspiring me and making me believe in myself every time I had given up. I consider you my unofficial advisor and a guardian angel on this mission. I would also like to thank my friends and fellow Ph.D. students in the department: Rejaul Karim, Abdhi Sarkar, Metin Eroglu, Yi-Chen Zhang, Siddhartha Nandy, Atreyee Majumder, and Asish Banik. They have helped me big time during my tenure at MSU. I have been blessed with a wonderful circle of friends who have always been caring and supportive through this challenging phase of life. This acknowledgment would be incomplete without mentioning a few of them. Thank you, Suman Chakraborty, for your constant guidance, like the north star to a sailor. Thank you, Sohini Raha, for being my lifeline during the darkest hours of my life. Thank you, Sujatro Chakladar, for your immense patience and endless encouragement.
Thank you, Rejaul Karim, for being a friend everyone should have in their life. Thank you, Shreya Saha and Ankita Bhattacharya, for opening the new avenue of spirituality in my life. Thank you, Sejuti Dasgupta, for giving me the comfort of a home far away from my family. Thank you, Moumanti Podder, for your continual supply of courage and confidence. Last but not least, thank you, Sebanti Sengupta, Snigdha Panigrahi, Arpita Mukherjee, Saswati Saha, Shrijita Bhattacharya, Saptarshi Mukherjee, Portia Bannerjee, Shruti Saripalli, and Rohosen Bandopadhyay, for making this journey easier. I apologize if I have missed anyone.

I want to end this acknowledgment by thanking my family members, though a mere thank you is not enough for every little thing they have done for me. I would like to thank my parents, Tripti Mandal and Prasanta Mandal, for their unconditional love, endless support, and countless sacrifices. They have devoted their lives to my well-being. Thank you, Maa, for nurturing my scientific curiosity since childhood. You introduced me to the world of mathematics, and that is why I have come this far in my academic venture. Thank you, Baba, for always believing in my potential and giving me the liberty to pursue my dreams, be it my career choice or my hobbies. You are an impeccable human being, and I always look up to you. I must thank my little sister, Trisha Mandal, for always being available for me, in spite of staying on a different continent. Thank you, Chottu, for the countless times you have woken me up with your call so that I could reach my classes on time. I cannot express how hard it has been to stay away from all of you. Last but not least, thank you, Didu, for blessing me with an elated childhood. I know you would have been proud and happy if you were still with us.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
CHAPTER 2 QUANTILE REGRESSION AND ESTIMATION OF CONDITIONAL QUANTILES
2.1 Introduction
2.2 Quantile Regression
2.3 Extreme Value Theory: Definitions and Results
2.4 Statistical Modeling of Extreme Values with GP
2.5 Proposed Method: GP for Scale Models
2.5.1 Method of Estimation
2.6 Simulation Study
CHAPTER 3 ESTIMATION OF EXTREME CONDITIONAL QUANTILES WITH COVARIATE ADAPTIVE THRESHOLD SELECTION
3.1 Introduction
3.2 Proposed Method: QR Adjusted GP for Location-Scale Models
3.2.1 Method of Estimation
3.2.2 Simulation Study: GP Distribution
3.3 Extension to the Nonparametric Problems and Threshold Selection
3.3.1 Method of Estimation
3.3.2 Discussion: Threshold Selection for Heavy Tailed Distributions
3.4 Simulation Study: Nonparametric Problem
3.4.1 Review of an Existing Method for Extreme Quantile Estimation
3.4.2 Simulation Study
3.5 Application to precipitation downscaling
CHAPTER 4 DISCUSSIONS AND FUTURE DIRECTIONS
APPENDICES
APPENDIX A: APPENDIX FOR CHAPTER 2
APPENDIX B: APPENDIX FOR CHAPTER 3
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: φ = 0, GP(0.5): rel.bias(se) of quantile estimates for φ = 0 at x = 0.5 and quantile levels τ
Table 2.2: φ = 0, GP(0.5): mse(se) of quantile estimates for φ = 0 at x = 0.5 and quantile levels τ
Table 2.3: φ = 2, GP(0.5): rel.bias(se) of quantile estimates for φ = 2 at x = 0.5 and quantile levels τ
Table 2.4: φ = 2, GP(0.5): mse(se) of quantile estimates for φ = 2 at x = 0.5 and quantile levels τ
Table A.1: φ = 0, exp(1): rel.bias(se) of quantile estimates for φ = 0 at x = 0.5 and quantile levels τ
Table A.2: φ = 0, exp(1): mse(se) of quantile estimates for φ = 0 at x = 0.5 and quantile levels τ
Table B.1: Probability distributions in heavy-tail domain

LIST OF FIGURES

Figure 2.1: Quantile Loss Function
Figure 2.2: Quantile estimates for φ = 0, exp(1) at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet). The sample size is n = 500
Figure 2.3: Quantile estimates for φ = 0, GP(0.5) at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet). The sample size is n = 500
Figure 2.4: mse(se) for φ = 0, GP(0.5) at x = 0.5 and quantile levels τ. The sample size is n = 500
Figure 2.5: Quantile estimates for φ = 2 at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet)
Figure 2.6: mse±se for φ = 2, GP(0.5) and quantile levels τ
Figure 3.1: mse±se for GP(0.5) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 500. (a): τc = 0.01 for QR.GP; (b) and (c): 0 < τc < 1
Figure 3.2: mse±se for GP(0.5) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 1000. (a): τc = 0.01 for QR.GP; (b) and (c): 0 < τc < 1
Figure 3.3: mse±se for GP(0.8) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 1000. (a): τc = 0.01 for QR.GP; (b) and (c): 0 < τc < 1
Figure 3.4: mse±se for exp(1) at x = (0.5, 0.5) and quantile levels τ. τc = 0.01 for QR.GP
Figure 3.5: Tail index estimates with changing values of τc. The red line corresponds to the estimate of ξ when GP is fit to the error εi; the black line corresponds to the estimate when our proposed method is fit to yi. True tail index is 0.8
Figure 3.6: Tail index estimates with changing values of τc. The red line corresponds to the estimate of ξ when GP is fit to the error εi; the black line corresponds to the estimate when our proposed method is fit to yi. True tail index is 0.8. Left: t-distributed errors. Right: Burr-distributed errors
Figure 3.7: mse±se for Pareto(0.5) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 1000 and 0 < τc < 1
Figure 3.8: mse±se for Pareto(0.5) at x = (0.5, 0.5) at quantile τ = 0.998 and sample sizes (a) n = 1000, (b) n = 2000
Figure 3.9: mse±se for Pareto(0.8) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 2000 and 0 < τc < 1
Figure 3.10: mse±se for Pareto(0.8) at x = (0.5, 0.5) at quantile τ = 0.985 and sample sizes (a) n = 2000, (b) n = 5000
Figure 3.11: mse±se for Pareto(0.8) at x = (0.5, 0.5) at quantile τ = 0.998 and sample sizes (a) n = 2000, (b) n = 5000
Figure 3.12: mse±se for absolute t with ξ = 0.5 at x = (0.5, 0.5) and quantiles 0.9 ≤ τ ≤ 0.991. Sample sizes (a) n = 1000, (b) n = 2000
Figure 3.13: mse±se for absolute t with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000 and 5000
Figure 3.14: mse±se for absolute t with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000 and 5000
Figure 3.15: mse±se for absolute t with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000 and 5000
Figure 3.16: mse±se for absolute t with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000 and 5000
Figure 3.17: mse±se for absolute t with ξ = 0.8 at x = (0.5, 0.5) and quantiles 0.9 ≤ τ ≤ 0.991. Sample sizes (a) n = 2000, (b) n = 5000
Figure 3.18: mse±se for Burr with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000 and 5000
Figure 3.19: mse±se for Burr with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000 and 5000
Figure 3.20: mse±se for Burr with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000 and 5000
Figure 3.21: mse±se for Burr with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000 and 5000
Figure 3.22: Daily precipitation at the Vancouver International Airport, 1971-2000
Figure 3.23: Estimated tail index (ξ̂) at different choices of τc ∈ (0, 1). Method of estimation is QR.GP
Figure 3.24: Mean square prediction error (MSPE) of different quantile estimates at quantile levels 0.9 < τ < 0.995. Thresholds: 0.7 for QR.GP (blue); 0.8, 0.85, 0.9 and 0.95 for QR.Hill (dotted lines)
Figure A.1: mse(se) for φ = 0, exp(1) at x = 0.5 and quantile levels τ. The sample size is n = 500

CHAPTER 1
INTRODUCTION

In many practical applications, we are interested in events that are rare but have very significant consequences. For example, a sudden heavy rainfall can abruptly increase the volume of water inside a dam. The water then released can cause flooding in the lowland areas. Moreover, depending on the architecture and other attributes, the dam can break if the water level exceeds a certain threshold. Hence it is vital to take the extreme climate pattern into account while constructing the dam. Suppose that rainfall data from the past 50 years at the location is available, and the project aims to build a dam that can sustain for the next 100 years. How can we predict an event that might happen once in 100 years with observations from the past 50 years? Extreme Value Theory provides the theoretical justification for extrapolation beyond the range of the observations. One can derive natural estimators of relevant quantities like extreme quantiles, return levels, etc., and we can expect these estimators to have desirable analytical properties. In the example given above, we might be interested in predicting the 0.99 quantile of the daily rainfall at the construction location.

The scope of application of Extreme Value Theory (EVT) is vast. For example, Tawn (1992) used EVT to estimate probabilities of annual maximum hourly sea levels; this information is necessary to accurately predict the height of sea defenses in coastal areas. Jansen & Vries (1991) successfully applied EVT to analyze financial data and concluded that the market crash of 1987 was not an outlier. Gilli & Këllezi (2006) demonstrated the application of extreme value distributions for risk assessment with major stock market indices. Scarf & Laycock (1996) studied the analysis of corrosion extremes with EVT. The main difference between traditional statistical modeling and EVT is that we generally model the mean in the former, while EVT models the extreme quantile, maxima, or some higher order statistic. In this article, we are going to focus on methods for extreme tail quantiles.

Several approaches have been proposed to address extreme tail quantile estimation based on EVT. In the nonregression setup, the estimation of tail quantiles using EVT is based on the assumption that observations are independently and identically distributed. For example, Weissman (1978) approached the high quantile estimation problem where only the k largest observations from a sample of size n are available, k < n. The choice of the optimal sample fraction k is vital to minimize the quantile estimate's asymptotic mean square error. As pointed out by Li et al. (2010), one can focus on the bias reduction of the parameter estimates, which in turn produces a reduced-bias quantile estimator. The first class of bias reduction methods uses an optimal k at the same rate as the theoretical optimal rate of the tail index estimator (e.g., Beirlant et al. (2002), Gomes & Martins (2002)), while the second class of methods uses a larger order of k than the theoretical optimal rate (e.g., Feuerverger et al. (1999), Peng & Qi (2004), Caeiro & Gomes (2009)).
Although these methods are developed for the nonregression setup, they are a good starting point from which many extensions to the regression framework can be adapted. In a regression framework, we want to incorporate information on a covariate X into our analysis. The foundation of regression quantiles was laid by Koenker & Bassett (1978), which uses linear quantile functions to explore the relationship between the response Y and a covariate or a set of covariates X. The scope of the regression quantiles proposed by Koenker & Bassett (1978) is not just limited to tail quantiles: with quantile regression, we can study the impact of X on different quantiles of the response distribution. There are many advantages to choosing quantile regression over mean regression. Firstly, quantile regression gives us a complete picture of the conditional response distribution. Secondly, it is robust to outliers. Thirdly, it is a distribution-free approach, so we can apply it to non-Gaussian models. Abrevaya (2001) used regression quantiles to study the effects of several factors linked to newborns' lower birthweights. Umantsev & Chernozhukov (2001) used regression quantiles to model Value-at-Risk, a measure of market risk widely used in finance. Buchinsky (1994) examined the effects of education and experience at different quantiles of the wage distribution using quantile regression.

Although quantile regression serves as a basis, the large sample theory of quantile regression does not apply far enough into the tails; for details, see Chernozhukov (2005). Moreover, data sparsity at the tails further amplifies the variability of the estimates. Naturally, behavior at the tails is contingent on the nature of the underlying distribution. Therefore, one has to build an alternative model for the tails. Many authors have proposed quantile regression or a variation of it to predict extreme quantiles. For example, Friederichs & Hense (2007b) proposed censored quantile regression to predict high precipitation. Taylor (2008) used exponentially weighted quantile regression to analyze stock market data. Quantile regression methods tailored for higher quantiles have been proposed for modeling extreme weather phenomena like tropical cyclones (Elsner & Jagger (2008), Jagger & Elsner (2008)), severe snowstorms (Velthoen et al. (2019a)), and heatwaves (Kyselý (2002)). Chernozhukov (2005) used the extreme value theory framework of Feigin & Resnick (1994) to develop a theoretical understanding of linear quantile regression models at the tail. Chernozhukov (2005) has shown that the asymptotic distribution of extreme quantiles depends on the tail index and the regression design (see the definitions in section 2.3). This tail index is a crucial indicator of the domain of attraction of the underlying process. Not all quantile estimation methods can capture the true quantile function if the distribution belongs to the heavy-tailed domain of attraction.

To the best of our knowledge, there are very few methods that have been tailored for estimating extreme quantiles in heavy-tailed models. However, as pointed out by Wang & Li (2013), the existing techniques can be broadly grouped into four classes. The first class of methods uses fully parametric models developed using EVT. Generally, these models fit the Generalized Extreme Value (GEV) or the Generalized Pareto (GP) distribution to the data. Here, the information on the covariate X can be incorporated via a combination of the location, scale, or shape parameters of the GEV or GP distributions.
These methods usually require the choice of a covariate-dependent tuning parameter, e.g., a threshold value. More details can be found in Davison & Smith (1990), Davison & Ramesh (2000), Hall & Tajvidi (2000), Wang & Tsai (2009), and Chavez-Demoulin & Davison (2005). The second class of models uses local estimation to extend the EVT methods to the regression framework. Here, the high conditional quantiles of Y given X = x are estimated using the observations in a small neighborhood of x. For details, see Gardes & Girard (2010), Daouia et al. (2011), and Velthoen et al. (2019b). The efficiency of this class of methods depends on the richness of the data in the neighborhood of x. The third class of models uses quantile regression directly to estimate extreme quantiles. More details can be found in Bremnes (2004), Friederichs & Hense (2007a), and Jagger & Elsner (2008). The fourth and final class of methods uses a combination of quantile regression and EVT while imposing some suitable assumptions on the tail behavior of the underlying distribution. Chernozhukov & Du (2006) and Wang et al. (2012) proposed methods in which the intermediate conditional quantiles are estimated using QR and then used to extrapolate to the extreme quantiles. These methods are developed assuming that the target quantile functions are linear in the covariates and that the distribution is in the maximum domain of attraction of a heavy-tailed distribution.

In this article, we propose another approach for estimating high conditional quantiles using EVT and quantile regression, where we use the GP distribution as the basis of our model. Pickands (1975) first suggested using the GP distribution to model the upper tail of a distribution, and Davison & Smith (1990) extended this method to the regression setup. There are several advantages of using the GP distribution to model tails. Firstly, by the Pickands–Balkema–de Haan theorem, the exceedance over a high threshold of an arbitrary heavy-tailed distribution can be well approximated by GP. Hence, regardless of the original distribution, GP can approximate the tail behavior if it is heavy tailed. Secondly, the analytic behavior of the parameter estimates of GP is well studied, and they have desirable asymptotic properties; this is useful for studying the behavior of different statistics based on the parameter estimates. Nonetheless, approximation by GP is contingent upon the selection of an appropriate threshold. In a non-regression setting, the threshold is usually determined via model diagnostics or some nonparametric tests which assess the goodness of the fit (Langousis et al. (2016)). These methods can be adapted to a regression framework when the threshold does not depend on the covariate X. However, the problem becomes challenging when the threshold changes with X. As far as we know, the problem of covariate adaptive threshold selection has not been addressed. In this dissertation we propose a solution to this problem, which in turn facilitates the estimation of extreme conditional quantiles.

Our contribution in this article can be summarised as follows. Our approach provides consistent estimation of quantiles for the GP family of distributions featuring a nonnegative tail index. Through numerical investigations, we show that our method can overcome the shortcomings of standard Quantile Regression (Koenker & Bassett (1978)) far in the tail when the distribution is heavy tailed in nature.
Moreover, we propose a generalization of our method which can accommodate a large class of distributions in the heavy-tailed domain of attraction. Selection of a covariate adaptive threshold is a nontrivial problem in this case; through several examples, we demonstrate the choice of an optimal threshold for implementing our method. Our numerical investigations also show that our method provides a robust estimation of extreme quantiles for heavy-tailed distributions. Wang et al. (2012) have proposed a semi-parametric approach for estimating higher conditional quantiles in the context of heavy-tailed distributions. We demonstrate the superiority of our approach over Wang et al. (2012) in the context of a Pareto or a Generalized Pareto distribution. When the underlying distribution is in the heavy-tailed domain of attraction, we show that, with an appropriate choice of the threshold, our approach is as efficient as Wang et al. (2012). Our approach is more computationally efficient than Wang et al. (2012), which requires estimation of a number of intermediate regression quantiles. Moreover, it is also easy to implement in practice, as we demonstrate through an application to real data.

The contents of the rest of the thesis are organized as follows. Chapter 2 proposes a method for estimating conditional quantiles from a regression model where the scale is modeled as a function of the covariate X while the location does not depend on X. Since the threshold is independent of the covariate, a direct adaptation of the POT approach in Davison & Smith (1990) is shown to outperform quantile regression at high conditional quantiles. We establish the asymptotic properties of the proposed estimator when the true model is GP. Through numerical investigations, we establish the efficiency of our method over standard Quantile Regression for the case when the observations are distributed according to a GP distribution.

Estimating conditional quantiles with GP when the threshold itself varies as a function of the covariate is challenging. Chapter 3 proposes an extension of our method in Chapter 2, which can be used for covariate adaptive threshold selection. We show through numerical investigations that our proposed extension offers a significant improvement over our proposal in Chapter 2 and over standard Quantile Regression. Also, we establish the asymptotic properties of our estimator in the context of the GP distribution. The selection of a proper threshold is vital for applying our method in practice, where we do not know the exact underlying distribution; here, we give a guideline for choosing an appropriate threshold when the underlying distribution is a heavy-tailed distribution from the Fréchet domain of attraction. We also show that our proposed method can significantly reduce the computational complexity usually associated with the optimization of the quantile loss function. Extending further, we thoroughly study the asymptotic performance of our approach for a wide class of heavy-tailed distributions via numerical simulations. We compare the performance of our approach with Wang et al. (2012) in estimating conditional quantiles for a variety of distributions in the heavy-tailed domain of attraction. Finally, we implement our method to predict extreme precipitation with the data obtained from the Vancouver International Airport weather station. We end this dissertation with some discussions regarding our proposed method in Chapter 4, where we also propose some future directions that can be pursued.
CHAPTER 2
QUANTILE REGRESSION AND ESTIMATION OF CONDITIONAL QUANTILES

2.1 Introduction

The estimation of high conditional quantiles of a response variable is of prime interest in many statistical applications. Quantile regression is a powerful and popular method for this task. One can estimate the effects of the covariates at different conditional quantiles of the response using quantile regression. For example, Friederichs & Hense (2007b) used meteorological information obtained from global climate model projections to predict localized high precipitation. Lower infant birth weight can lead to severe health conditions; Abrevaya (2001) used quantile regression to quantify the effects of maternal behavior, demographics, etc. on different quantiles of the birth weight distribution. Umantsev & Chernozhukov (2001) analyzed factors associated with high risk in finance. Although the scope of quantile regression applications is vast, Chernozhukov (2005) pointed out that the large sample theory of quantile regression is not applicable sufficiently far in the tails. Hence, we should proceed with caution when using quantile regression at the tails.

The literature of modern extreme value theory has provided us with elegant tools to quantify rare events. In the nonregression framework, the problem is familiar and well explored; for details, see Weissman (1978), Li et al. (2010), and Embrechts et al. (1997). In the regression framework, one can model the extreme quantiles by fitting a fully parametric model using the generalized extreme value or the generalized Pareto distribution. See Coles et al. (2001) for details.

In this chapter, we explore different approaches for estimating conditional quantiles. We start with the quantile regression method proposed by Koenker & Bassett (1978) in section 2.2. In section 2.3, we introduce fundamental results from the extreme value literature. In section 2.4, we introduce the statistical modeling framework using GP that serves as a basis for our work. In section 2.5, we propose our method for estimating conditional quantiles using the POT approach, and we provide large sample properties of the estimator in the context of the GP distribution. In section 2.6, we demonstrate the efficiency of our method over standard quantile regression. From our numerical investigation, we can see that quantile regression estimates are not adequate for estimating higher conditional quantiles, while our proposed method can be adapted for estimating high conditional quantiles, given that the data follow a GP distribution.

2.2 Quantile Regression

Consider a random variable Y with distribution function F and let 0 < τ < 1. Let $Q_Y(\tau) = \inf\{y : F(y) \ge \tau\}$ be the τ quantile of Y and E(Y) the expected value of Y. Then
\[ E(Y) = \operatorname*{argmin}_{c}\; E(Y - c)^2, \]
or, we can say, the mean E(Y) minimizes the squared error loss. Consider the function $\rho_\tau : \mathbb{R} \to [0,\infty)$ such that
\[ \rho_\tau(y) = y\big(\tau - I_{[y<0]}\big) = \begin{cases} \tau y & \text{if } y \ge 0 \\ (\tau - 1)y & \text{if } y < 0, \end{cases} \]
where τ ∈ (0, 1) is a fixed number and $I_{[y<0]}$ is the indicator function. The function $\rho_\tau$, shown in figure 2.1, is often referred to as the quantile loss function or tilted absolute loss function. It is a convex function, differentiable everywhere except at 0. Like the mean, a quantile of a random variable can also be derived as the solution to a minimization problem. Following Koenker (2005), one can easily verify that
\[ Q_Y(\tau) = \operatorname*{argmin}_{c}\; E\big(\rho_\tau(Y - c)\big), \tag{2.1} \]
that is, $Q_Y(\tau)$ is the minimizer of the quantile error loss $\rho_\tau$. When τ = 0.5, minimizing equation 2.1 is equivalent to minimizing E|Y − c|, the absolute error loss; in this case, the solution is the median of Y.
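A quick numerical check of 2.1 (a minimal sketch assuming only numpy; the crude grid search simply stands in for a proper optimizer):

```python
import numpy as np

def pinball_loss(y, c, tau):
    """Empirical version of E[rho_tau(Y - c)] at the candidate constant c."""
    u = y - c
    return np.mean(np.where(u >= 0, tau * u, (tau - 1.0) * u))

rng = np.random.default_rng(1)
y = rng.exponential(size=5_000)
tau = 0.9

# Minimize the empirical quantile loss over a grid of candidate values of c.
grid = np.linspace(y.min(), y.max(), 2_000)
c_hat = grid[np.argmin([pinball_loss(y, c, tau) for c in grid])]

# Both agree closely with the true 0.9 quantile -log(0.1) of exp(1).
print(c_hat, np.quantile(y, tau))
```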
[Figure 2.1: Quantile Loss Function]

In the regression paradigm, one explores the relationship between the response Y and the covariate X. In order to do so, the covariate information must be incorporated into the response in some form. In classical regression, one models the conditional mean of the response. When the conditional mean is a parametric function of the covariates, i.e., $E(Y|X) = f(X, \beta)$, one can estimate the conditional mean function $E(Y|X)$ by minimizing the squared error loss $E(Y - f(X,\beta))^2$. Suppose we have sample observations $(y_i, x_i)$, $i = 1, \dots, n$. The estimation of the mean function can then be achieved by the method of least squares, minimizing the sample analog of the squared error loss,
\[ \sum_{i=1}^{n} \big(y_i - f(x_i, \beta)\big)^2. \]
The method of least squares enjoys computational feasibility, since the loss function is differentiable. However, it does not give more insight into the conditional distribution of the response, and it lacks robustness, as it is sensitive to outliers. Koenker & Bassett (1978) introduced quantile regression as a robust alternative to mean regression. Quantile regression can be viewed as an extension of classical mean regression, where the conditional quantile is modeled as a function of the covariates. When the conditional quantile is modeled as a parametric function of the covariates, i.e., $Q_{Y|X}(\tau) = g(X, \beta)$, one can estimate it by minimizing $E\rho_\tau(Y - g(X,\beta))$. With sample observations $(y_i, x_i)$, $i = 1, \dots, n$, one can minimize the sample quantile error loss at the τ quantile,
\[ \sum_{i=1}^{n} \rho_\tau\big(y_i - g(x_i, \beta)\big). \tag{2.2} \]
By choosing different values of τ, one can obtain more insight into the conditional distribution beyond the mean. Quantile regression methods are robust and do not rely on the Gaussian error assumption. Since the quantile loss function is not differentiable, optimization of the objective function in 2.2 is not straightforward. Koenker & Bassett (1978) have shown that for linear quantiles, i.e., when $Q_{Y|X}(\tau) = g(X,\beta) = \alpha(\tau) + X^{\top}\beta(\tau)$, one can estimate the τ quantile at X = x as $\hat Q_{Y|x}(\tau) = \hat\alpha(\tau) + x^{\top}\hat\beta(\tau)$, where
\[ \hat\alpha(\tau), \hat\beta(\tau) = \operatorname*{argmin}_{\alpha, \beta}\; \sum_{i=1}^{n} \rho_\tau\big(y_i - \alpha - x_i^{\top}\beta\big). \tag{2.3} \]
The minimization in equation 2.3 can be reformulated as a linear programming problem and solved accordingly; the optimization has to be done separately for every choice of τ. Nonetheless, this classical quantile regression model can be adapted for a broad class of regression problems, and off-the-shelf implementations exist, as sketched below.
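A small fitting illustration, assuming the statsmodels package (whose QuantReg class solves 2.3 via an iteratively reweighted least squares scheme rather than the simplex-based linear program); the data-generating choices here are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1, 1, size=n)
y = 2 + (1 + 0.9 * x) * rng.exponential(size=n)  # heteroscedastic errors

X = sm.add_constant(x)  # design matrix with an intercept column: [1, x]
for tau in (0.5, 0.9, 0.99):
    res = sm.QuantReg(y, X).fit(q=tau)
    print(tau, res.params)  # estimates of alpha(tau) and beta(tau)
```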
In Chapter 1, we discussed how the presence of extreme observations in the data can increase the challenge of consistent estimation of higher quantiles by several manifolds. Extreme Value Theory (EVT) provides us with some powerful tools for the analysis of rare events. To understand the challenges, we start with a few basics of EVT.

2.3 Extreme Value Theory: Definitions and Results

Extreme value theory deals with rare events: events that happen with low probability, near the upper or lower endpoints of the distribution function F. Intuitively, the asymptotic behavior of the maximum (or minimum) can give us more insight into the upper (or lower) tail of the distribution. Let $Y_1, Y_2, \dots$ be a sequence of independent random variables with common distribution function F and let $M_n = \max(Y_1, \dots, Y_n)$. Let $y_+ = \sup\{y : F(y) < 1\}$, the upper endpoint of the distribution function F. Then $P(M_n < y) = \{F(y)\}^n$ for any y in the domain of F, and $M_n$ converges to $y_+$ in probability as $n \to \infty$. Now,
\[ \lim_{n\to\infty} \{F(y)\}^n = \begin{cases} 0 & \text{if } y < y_+ \\ 1 & \text{if } y \ge y_+. \end{cases} \]
Hence, in order to obtain a nondegenerate limit distribution, proper centering and scaling of $M_n$ are necessary. Suppose that there exist sequences of constants $a_n > 0$ and $b_n$ such that
\[ \lim_{n\to\infty} P\big((M_n - b_n)/a_n < y\big) = \lim_{n\to\infty} F^n(a_n y + b_n) = G(y) \]
for some distribution function G and for all y in the domain of F. The natural question that follows is: what are the possible forms of the limit? We know from central limit theorems that, with proper centering and scaling, the sum $\sum_{i=1}^{n} Y_i$ of iid random variables converges to the normal distribution. The answer for the maximum is given by the following theorem, which is a significant result in the extreme value literature.

Theorem 2.3.1 (Fisher–Tippett–Gnedenko). Let $Y_1, Y_2, \dots$ be independent random variables with common distribution function F and let $M_n = \max(Y_1, \dots, Y_n)$. If there exist sequences of constants $a_n > 0$, $b_n$ such that $(M_n - b_n)/a_n$ converges to a non-degenerate distribution G, i.e.,
\[ \lim_{n\to\infty} P\big((M_n - b_n)/a_n < y\big) = \lim_{n\to\infty} F^n(a_n y + b_n) = G(y), \]
then the only three possible forms of the limit are:
\[ \text{Gumbel: } G_0(y) = \exp(-e^{-y}), \; y \in \mathbb{R}; \qquad \text{Fréchet: } G_{1,\alpha}(y) = \exp(-y^{-\alpha}), \; y \ge 0, \, \alpha > 0; \qquad \text{Weibull: } G_{2,\alpha}(y) = \exp(-(-y)^{-\alpha}), \; y \le 0, \, \alpha < 0. \tag{2.4} \]

Theorem 2.3.1 says that if the limiting distribution of $F^n(a_n y + b_n)$ exists, then there are only the three possible forms of the limit given above. The limits in 2.4 can be thought of collectively as members of a single family of distributions known as the Generalized Extreme Value (GEV) distributions. We say that Y follows the standard GEV distribution with parameter ξ, $Y \sim \mathrm{GEV}(\xi)$, if
\[ P(Y < y) = G_\xi(y) = \exp\big[-(1 + \xi y)^{-1/\xi}\big], \qquad 1 + \xi y > 0, \; \xi \ne 0. \tag{2.5} \]
Note that the support of Y depends on ξ: we have $-1/\xi < y < \infty$ for ξ > 0, while $-\infty < y < -1/\xi$ for ξ < 0. This naturally leads to two very different sub-classes of distributions within the GEV family. The parameter ξ is a crucial indicator of the sub-class and is known as the extreme value index or tail index. For ξ = 0, the cdf in 2.5 is not defined; rather, we use the limit ξ → 0, which leads us to the Gumbel distribution mentioned in equation 2.4:
\[ \lim_{\xi\to 0} G_\xi(y) = G_0(y) = \exp(-e^{-y}), \qquad y \in \mathbb{R}. \]
Again, the support has changed to the entire real line, in contrast to the ξ ≠ 0 cases. This constitutes the third sub-class in the GEV family: the Gumbel. In fact, the other two limits, Fréchet and Weibull, belong to the two different sub-classes mentioned above. With a little reparametrization, $\alpha = 1/\xi$ and $Y^* = 1 + \xi Y$ where $Y \sim \mathrm{GEV}(\xi)$, we can see that $Y^* \sim G_{1,\alpha}$ for α > 0 and $Y^* \sim G_{2,\alpha}$ for α < 0.

We have seen that the behaviour of the GEV distribution is governed by the value of the tail index ξ, and any change of location and scale does not affect ξ. Hence, the entire GEV family can be summarised by introducing two additional parameters: $\tilde\mu$ as location and $\tilde\sigma > 0$ as scale. Let $\tilde Y = \tilde\mu + \tilde\sigma Y$ with $Y \sim \mathrm{GEV}(\xi)$; then $\tilde Y \sim \mathrm{GEV}(\tilde\mu, \tilde\sigma, \xi)$ with
\[ P(\tilde Y < y) = \begin{cases} \exp\Big[-\exp\Big(-\frac{y - \tilde\mu}{\tilde\sigma}\Big)\Big] & \text{if } \xi = 0 \\ \exp\Big[-\Big(1 + \xi\big(\frac{y - \tilde\mu}{\tilde\sigma}\big)\Big)^{-1/\xi}\Big] & \text{if } \xi \ne 0, \end{cases} \tag{2.6} \]
where $1 + \xi(y - \tilde\mu)/\tilde\sigma > 0$. When Y is standard GEV with µ = 0 and σ = 1, we simply say $Y \sim \mathrm{GEV}(\xi)$.
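A quick numerical illustration of the block maxima view (a sketch assuming scipy, whose genextreme distribution parametrizes the GEV shape as c = −ξ):

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
# 200 blocks of 1,000 exp(1) observations; exp(1) lies in the Gumbel domain.
block_maxima = rng.exponential(size=(200, 1_000)).max(axis=1)

# Fit a GEV to the block maxima; recall scipy's shape is c = -xi.
c, loc, scale = genextreme.fit(block_maxima)
print("estimated tail index xi:", -c)  # should be close to 0 here
```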
The following definitions are useful to keep in mind for future use.

Definition 2.3.1 (Domain of attraction). Suppose that $Y_1, Y_2, \dots \stackrel{iid}{\sim} F$ and $M_n = \max(Y_1, \dots, Y_n)$. If
\[ \lim_{n\to\infty} P\big((M_n - b_n)/a_n < y\big) = \lim_{n\to\infty} F^n(a_n y + b_n) = G(y) \]
for some constants $a_n > 0$ and $b_n$, where G is a non-degenerate distribution, we say F belongs to the domain of attraction of G.

From theorem 2.3.1, we know that there are three possible domains of attraction of F, namely Fréchet (ξ > 0), Weibull (ξ < 0), and Gumbel (ξ = 0). Moreover, F can belong to only one of the three sub-classes. In practice, we can use the value of the tail index ξ to identify the domain of attraction. Hence we adopt the notation $F \in D(G_\xi)$ to indicate that F belongs to the domain of attraction of G with tail index ξ.

Definition 2.3.2 (Heavy-tailed distribution). F is a heavy-tailed distribution if $F \in D(G_\xi)$ for some tail index ξ > 0.

When F is known, one can easily find a quantile by inverting the cdf. Here we are interested in a higher quantile, τ close to 1, when F is unknown. We now give an outline of approximating a higher quantile of Y using EVT. Suppose $u \in \mathbb{R}$. One can interpret Y > u as an extreme event for high values of u. Consider the following quantity:
\[ 1 - F_u(y) = P(Y > y + u \mid Y > u) = \frac{P(Y > y + u)}{P(Y > u)} = \frac{1 - F(y + u)}{1 - F(u)}. \]
The above quantity can be interpreted as the conditional exceedance probability over a value u. We say $F_u(y) = P(Y \le y + u \mid Y > u)$ is the excess cdf of Y over a threshold u. If $Y_u = Y \mid Y > u$, then $P(Y_u < y + u) = F_u(y)$. The following theorem gives us the limiting distribution of $Y_u$ as $u \to \infty$.

Theorem 2.3.2 (Pickands–Balkema–de Haan). Let $Y_1, Y_2, \dots$ be independent random variables with common distribution function F. Define the conditional distribution of the excess, $Y - u \mid Y > u \sim F_u$, by
\[ F_u(y) = P(Y - u \le y \mid Y > u) = \frac{F(u + y) - F(u)}{1 - F(u)}. \]
If $F \in D(G_\xi)$ for some ξ satisfying theorem 2.3.1, then for large enough u, $F_u$ can be approximated by
\[ H(y) = \begin{cases} 1 - \big(1 + \frac{\xi y}{\sigma}\big)^{-1/\xi} & \text{if } \xi \ne 0 \\ 1 - e^{-y/\sigma} & \text{if } \xi = 0, \end{cases} \tag{2.7} \]
for some σ > 0.

Theorem 2.3.2 is also known as the second theorem of the extreme value literature, while theorem 2.3.1 by Fisher is known as the first theorem. The distribution H in theorem 2.3.2 is known as the generalized Pareto (GP) distribution with scale σ and tail index ξ. The standard GP distribution has scale σ = 1. We say $Y \sim \mathrm{GP}(\xi)$ if
\[ P(Y > y) = \begin{cases} (1 + \xi y)^{-1/\xi} & \text{if } \xi \ne 0 \\ e^{-y} & \text{if } \xi = 0, \end{cases} \tag{2.8} \]
where y ≥ 0 if ξ > 0 and 0 ≤ y ≤ −1/ξ if ξ < 0. Just as in the GEV case, the range of Y depends on ξ and leads to three different sub-classes within the GP family. Consider the reparametrization $Y^* = 1 + \xi Y$ and $\alpha = 1/\xi$. Then, for α > 0, $P(Y^* > y) = y^{-\alpha}$, $y \ge 1$, which is the standard Pareto distribution. When ξ < 0, we have a bounded distribution on $0 \le y \le -1/\xi$ with $P(Y^* > y) = y^{-1/\xi}$. Finally, for ξ = 0, $P(Y > y) = e^{-y}$ is the standard exponential distribution with rate 1. The entire GP family can be represented by introducing two additional parameters, σ for scale and µ for location. Let $\tilde Y = \mu + \sigma Y$; then we say $\tilde Y \sim \mathrm{GP}(\mu, \sigma, \xi)$ with
\[ P(\tilde Y > y) = \begin{cases} \Big[1 + \xi\big(\frac{y - \mu}{\sigma}\big)\Big]^{-1/\xi} & \text{if } \xi \ne 0 \\ \exp\Big[-\big(\frac{y - \mu}{\sigma}\big)\Big] & \text{if } \xi = 0. \end{cases} \]
Suppose $F \in D(G_\xi)$ satisfies theorem 2.3.1 with $\lim_{n\to\infty} F^n(a_n y + b_n)$ being $\mathrm{GEV}(\tilde\mu, \tilde\sigma, \xi)$. According to theorem 2.3.2, for large values of u the distribution of the excess $Y_u$ is well approximated by $\mathrm{GP}(\mu = 0, \sigma, \xi)$ with $\sigma = \tilde\sigma + \xi(u - \tilde\mu)$. The important thing to note here is that the shape parameter ξ of the GP is in fact the tail index of the GEV, while the scale parameter σ is a function of the GEV parameters and u.
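The theorem translates directly into the POT recipe. The sketch below (assuming scipy; the 0.95 threshold choice is ad hoc) fits a GP to the excesses of simulated Pareto data over a high threshold and extrapolates a far tail quantile via the identity $P(Y > q) = P(Y > u)\,P(Y - u > q - u \mid Y > u)$:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
y = rng.pareto(2.0, size=20_000) + 1.0  # standard Pareto, tail index xi = 0.5

u = np.quantile(y, 0.95)                # a high threshold
excesses = y[y > u] - u
p_u = np.mean(y > u)                    # empirical exceedance probability

# scipy's shape parameter c plays the role of the tail index xi.
xi_hat, _, sigma_hat = genpareto.fit(excesses, floc=0)

# Extrapolate the tau = 0.999 quantile from the fitted tail.
tau = 0.999
q_hat = u + genpareto.ppf(1 - (1 - tau) / p_u, xi_hat, scale=sigma_hat)
print(xi_hat, q_hat, np.quantile(y, tau))
```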
2.4 Statistical Modeling of Extreme Values with GP

Let Y denote the univariate random response variable and $X = (X_1, \dots, X_p)$ the p-dimensional random vector of covariates. $(y_i, x_i)$, $i = 1, \dots, n$ are random samples from the joint distribution of Y and X. We also assume that $Y \sim F_{Y|x}$. In the previous section we have seen that if $F \in D(G_\xi)$, then for a large enough value u the distribution of the excess $e_u = Y - u \mid Y > u$ can be approximated as
\[ \bar F_u(y) = P(Y - u > y \mid Y > u) \approx \Big(1 + \frac{\xi y}{\sigma}\Big)^{-1/\xi}, \qquad y \ge 0, \; (1 + \xi y/\sigma) > 0, \]
for some σ > 0. In a parametric regression setup, one has to incorporate the covariates' information in the parameters of the conditional distribution $F_{Y|x}$. This can be done in several ways. Let us begin with the following model:
\[ Y = (\theta + X^{\top}\gamma)\,\epsilon, \tag{2.9} \]
where $\theta \in \mathbb{R}$ and $\gamma = (\gamma_1, \dots, \gamma_p) \in \mathbb{R}^p$ are unknown parameters, $X^{\top}\gamma = \gamma_1 X_1 + \dots + \gamma_p X_p$, and $\theta + X^{\top}\gamma > 0$. We assume that $\epsilon \sim \mathrm{GP}(\xi)$ with some ξ > 0. Clearly, Y|x follows a GP distribution with µ = 0, tail index ξ, and scale $\sigma = \sigma(x) = \theta + x^{\top}\gamma$. Our goal is to estimate quantiles from model 2.9. Let 0 < τ < 1 and $Q_{Y|x}(\tau) = F_{Y|x}^{-1}(\tau)$. Then from 2.8, we have
\[ Q_{Y|x}(\tau) = (\theta + x^{\top}\gamma)\,Q(\tau) = \frac{\theta + x^{\top}\gamma}{\xi}\Big[(1-\tau)^{-\xi} - 1\Big]. \tag{2.10} \]
Here $Q(\tau) = \frac{1}{\xi}[(1-\tau)^{-\xi} - 1]$ is the τ quantile of the standard GP distribution. We need to estimate θ, γ, and ξ from the data. We can obtain these parameters by maximizing the likelihood, or equivalently minimizing the negative log-likelihood
\[ l(\theta, \gamma, \xi) = -\sum_{i=1}^{n} \log f(y_i | x_i) = \sum_{i=1}^{n} \Big[\log \sigma(x_i) + \Big(1 + \frac{1}{\xi}\Big)\log\Big(1 + \frac{\xi y_i}{\sigma(x_i)}\Big)\Big] = \sum_{i=1}^{n} \Big[\log(\theta + x_i^{\top}\gamma) + \Big(1 + \frac{1}{\xi}\Big)\log\Big(1 + \frac{\xi y_i}{\theta + x_i^{\top}\gamma}\Big)\Big]. \tag{2.11} \]
Note that 2.11 is valid only if $\sigma(x_i) = \theta + x_i^{\top}\gamma > 0$. Also, there is no analytical solution for 2.11. Since the range of the GP depends on its parameters, numerical optimization of 2.11 needs separate treatment for ξ < 0. From now onward, we will only focus on methods of parameter estimation for ξ ≥ 0; given ξ ≥ 0, the support of the GP is unbounded above. We numerically optimize 2.11 under the constraints ξ > 0 and $\theta + x_i^{\top}\gamma > 0$ for $i = 1, \dots, n$:
\[ \hat\theta, \hat\gamma, \hat\xi = \operatorname*{argmin}_{\theta,\gamma,\xi}\; l(\theta,\gamma,\xi) = \operatorname*{argmin}_{\theta,\gamma,\xi}\; \sum_{i=1}^{n}\Big[\log(\theta + x_i^{\top}\gamma) + \Big(1 + \frac{1}{\xi}\Big)\log\Big(1 + \frac{\xi y_i}{\theta + x_i^{\top}\gamma}\Big)\Big]. \]
Plugging these estimators into 2.10, we propose to estimate the conditional quantile from model 2.9 as
\[ \hat Q_{Y|x}(\tau) = \frac{\hat\theta + x^{\top}\hat\gamma}{\hat\xi}\Big[(1-\tau)^{-\hat\xi} - 1\Big]. \tag{2.12} \]
Note that, by theorem 2.3.2, we can reasonably approximate quantiles from model 2.9 by 2.12 as long as $Y|x \sim F \in D(G_\xi)$ for some ξ > 0. What happens when we extend model 2.9 as
\[ Y = \phi + (\theta + X^{\top}\gamma)\,\epsilon, \tag{2.13} \]
where $\phi \in \mathbb{R}$, $\theta + X^{\top}\gamma > 0$, and $\epsilon \sim \mathrm{GP}(\xi)$, so that $Y|x \sim \mathrm{GP}(\phi, \theta + x^{\top}\gamma, \xi)$? Now we have to estimate an additional parameter φ. We begin with our proposal in the following section.
Hence ξ(y − φ) σ(x) Note that in model 2.14, the true τ quantile is given by P (Y > y|x) = 1 + QY |x(τ ) = φ + (θ + xT γ)Q(τ ) = φ + (cid:105) (1 − τ )−ξ − 1 (2.15) (cid:20) (cid:21)−1/ξ (cid:104) (θ + xT γ) ξ We need to estimate the parameters φ, γ and ξ. If the true distribution of  is GP, we can i γ > 0 and yi − φ > 0 for all numerically optimize 2.16 under the constraints ξ > 0, θ + xT i = 1, . . . , n. ˆφ, ˆθ, ˆγ, ˆξ = argmin φ,θ,γ,ξ = argmin φ,θ,γ,ξ (cid:20) l(φ, θ, γ, ξ) n log(θ + xT i γ) + (1 + (cid:16) log n(cid:88) i=1 1 ξ ) 1 + ξ(yi − φ) θ + xT i γ (cid:17)(cid:21) (2.16) Then we can plug in estimates from 2.16 in 2.15 to obtain ˆQY |x(τ ). Note that the last condition yi − φ > 0 adds n many more constraints in 2.16. In model 2.9 we had only n + 1 constraints. Moreover, in many practical applications, the true distribution of  might not be GP. It can be some distribution in D(Gξ) with ξ > 0. Our aim is to find a reasonable estimator for quantiles in regression models in a more general setup. We propose our method in the following section. 2.5.1 Method of Estimation Suppose that we have a random sample (yi, xi), i = 1, . . . , n from the distribution of (Y, X) where yi ∈ R and xi = (xi1, . . . , xip) ∈ Rp. Y |x ∼ F where F ∼ GP(ξ) for some ξ > 0. Let 17 τc be a fixed value such that 0 < τc < 1. Let y(1) = min(y1, . . . , yn). We fit a GP model to ˜yi = yi − y(1); i = 1, . . . , n. Clearly, ˜yi ≥ 0. We can estimate θ, γ and ξ from the restricted likelihood as: θ,γ,ξ log(θ + xT l(θ, γ, ξ) = argmin ˆθ, ˆγ, ˆξ = argmin ξ ˜yi θ + xT i γ (2.17) i γ > 0 and ξ > 0. Here we do not estimate φ directly form the GP likelihood. Our advantage is a huge gain in efficiency as we avoid n additional with only n + 1 constraints: θ + xT i γ) + (1 + θ,γ,ξ i:˜yi>0 log 1 + constraints as described in 2.16. Finally, we propose the τ Quantile estimate for 2.15 as: ˆQY |x(τ ) = y(1) + = y(1) + (ˆθ + xT ˆγ) ˆξ (ˆθ + xT ˆγ) ˆξ (cid:104) (cid:104) (cid:105) (1 − τ )− ˆξ − 1 (cid:105) (1 − τ )− ˆξ − 1 (cid:34) (cid:88) (cid:16) (cid:88) i:˜yi>0 1 ξ ) (cid:17)(cid:35) (2.18) Algorithm 1 can be used to find quantile estimates with our proposed method. Algorithm 1: GP for scale family 1. Get ˜yi = yi − min(y1, . . . , yn) 2. Get ˆθ, ˆγ, ˆξ from 2.17 3. Given X = x, estimate the τ quantile as ˆQY |x(τ ) = y(1) + (ˆθ + xT ˆγ) ˆξ (cid:104) (cid:105) (1 − τ )− ˆξ − 1 In the following proposition, we establish the consistency of algorithm 1 for GP distribu- tion under the following compactness assumption on the covariate x. Assumption (A1): |xi| ≤ M for some M > 0. Proposition 2.5.1. Let yi = φ + (θ + x(cid:62) i γ)i, iid∼ GP (ξ). i 18 Let ˆφ = Y(1). With (cid:101)xi = [1, x(cid:62) i ] and(cid:101)γ = [θ, γ], let (cid:34) n(cid:88) log((cid:101)x(cid:62) ˆQY |x(τ ) = ˆφ +(cid:101)x(cid:62)ˆγ i=1 Estimate the τ quantile of Y as ˆγ, ˆξ = argmin i c) + (cid:18) (cid:16) 1 + c,d 1 d (cid:19) n(cid:88) (cid:32) log 1 + i=1 (cid:17) (1 − τ )− ˆξ − 1 ˆξ (cid:33)(cid:35) (cid:101)x(cid:62) d(yi − ˆφ) i c (2.19) Then, 1. 2. √ n((ˆγ, ˆξ) − ((cid:101)γ, ξ)) =⇒ N (0, Σ) √ n( ˆQY |x(τ ) − QY |x(τ )) =⇒ N (0, σ2) Proof. We first show that Y(1) is a consistent estimate of φ. √ P ( √ n) = n|Y(1) − φ| > δ) = P (Y(1) > φ + δ/ n((cid:101)x(cid:62) i (cid:101)γ) (cid:32) (cid:33)−1/ξ n(cid:89) nδ/M(cid:101)γ → 0, n → ∞ −√ ≈ e 1 + i=1 ξδ = n(cid:89) P (((cid:101)x(cid:62) i (cid:101)γ)i > δ/ (cid:32) ≤ n(cid:89) nM(cid:101)γ ξδ√ 1 + i=1 i=1 √ n) (cid:33)−1/ξ where M(cid:101)γ = M max|(cid:101)γj| by assumption (A1). 
Thus, ˆφ − φ = oP (1/ √ n). First, suppose φ is fixed, then (cid:20) n(cid:88) i=1 log(cid:101)xT i(cid:101)γ + (1 + 1 ξ ) (cid:16) log 1 + n(cid:88) i=1 ˆγ0, ˆξ0 = argmin(cid:101)γ,ξ Then, the MLE is consistent and satisfies (cid:17)(cid:21) θ +(cid:101)xT i(cid:101)γ ξ(yi − φ)  (cid:101)xi(cid:101)x(cid:62) i γ i=1 (2.20) (2.21) where I(γ, ξ) = √ n((ˆγ0, ˆξ0) − ((cid:101)γ, ξ)) =⇒ N (0, I(γ, ξ)−1)  (cid:80)n (cid:101)xi(cid:101)x(cid:62) ((cid:101)x(cid:62) (cid:80)n (1−ξ)(1−2ξ) (cid:80)n 1 ξ −1 i γ)2 −1 i=1 2 i (1−ξ)(1−2ξ) i=1 (1−ξ)(1−2ξ) (cid:101)x(cid:62) i(cid:101)x(cid:62) i γ 19 Thus, for φ fixed (ˆγ0, ˆξ0) is the solution of the equation n(cid:88) i=1 log(cid:101)xT i γ + (1 + ) log 1 + 1 ξ (cid:32) (cid:33) = 0 θ +(cid:101)xT ξ(yi − φ) i γ ∇(cid:101)γ,ξ which implies Now, (ˆγ0, ˆξ0) = (h1(φ), h2(φ)) (h1( ˆφ), h2( ˆφ)) ≈ (h1(φ) + ( ˆφ − φ)h(cid:48) 1(φ), h2(φ) + ( ˆφ − φ)h(cid:48) 2(φ)) Therefore, √ n((h1( ˆφ), h2( ˆφ)) − ((cid:101)γ, ξ)) ≈ √ n((h1( ˆφ), h2( ˆφ)) − ((cid:101)γ, ξ)) + √ n( ˆφ − φ)(h(cid:48) 1(φ), h(cid:48) 2(φ)) As shown previously, √ n( ˆφ − φ) = oP (1), thus √ n((ˆγ, ˆξ) − (γ, ξ)) =⇒ N (0, Σ) 1(φ), h(cid:48) where Σ = (h(cid:48) For part (2), note that 2(φ))I(γ, ξ)−1(h(cid:48) 1(φ), h(cid:48) 2(φ))(cid:62) with I(γ, ξ) same as in (2.21). √ n( ˆQY |x(τ ) − QY |x(τ )) √ √ n( ˆφ − φ) + n = √ = n( ˆφ − φ) + √ (cid:16) (cid:32)(cid:101)x(cid:62)ˆγ (cid:17) −(cid:101)x(cid:62)(cid:101)γ n(g(ˆγ, ˆξ) − g((cid:101)γ, ξ)) = oP (1) + (1 − τ )− ˆξ − 1 ˆξ √ ˆξ (cid:17)(cid:33) (cid:16) (1 − τ )− ˆξ − 1 n(g(ˆγ, ˆξ) − g((cid:101)γ, ξ)) Now, Since √ n(g(ˆγ, ˆξ) − g((cid:101)γ, ξ)) ≈ √ n((ˆγ, ˆξ) − ((cid:101)γ, ξ)) =⇒ N (0, Σ), thus √ n((ˆγ, ˆξ) − ((cid:101)γ, ξ))(cid:62)∇(cid:101)γ,ξg((cid:101)γ, ξ) n(g(ˆγ, ˆξ) − g((cid:101)γ, ξ)) =⇒ N (0, σ2) where σ2 = ∇(cid:101)γ,ξg((cid:101)γ, ξ)(cid:62)Σ∇(cid:101)γ,ξg((cid:101)γ, ξ). This completes the proof. √ 20 2.6 Simulation Study We want to assess our proposed GP scale model’s performance in estimating quantiles from regression models of the form Y = φ + (θ + XT γ). In this case, the quantile is linear in covariates. Throughout this section and later, we will concentrate on estimating quantile functions of the form: QY |x(τ ) = α(τ ) + xT β(τ ) where α(τ ) is the slope and β(τ ) = {β1(τ ), . . . , βp(τ )} is the intercept. For our regression model Y = φ + (θ + XT γ),  ∼ F QY |x(τ ) = φ + (θ + xT γ)Q(τ ) α(τ ) = φ + Q(τ ), β(τ ) = γQ(τ ) Since the quantiles are linear, quantile regression can be used without violating any model assumptions. Note that both the slope and the intercept of the target quantile are affected by the change in the quantile level τ. This effect is very significant at higher values of τ, particularly if  ∈ D(Gξ) for ξ > 0. Hence we want to see how different values of τ and ξ affect the quantile estimates. We start with the simple case: φ = 0 (case-I). In this case, we can directly fit GP distribution to the data with location µ = 0, scale σ(x) = θ + xT γ > 0 and shape ξ > 0. Since we assume that φ = 0, we do not estimate it from the data. Both the slope and the intercept of the target quantile can be estimated from a GP fit to the data. When φ (cid:54)= 0 (case-II), we use the scaled GP method as proposed in section 2.5, where we estimate φ as the minimum value in the data. We compare our method (GP.scale) with quantile regression (QR) for both case-I and II. We generate our data as follows: yi = φ + (1 + 0.9xi)ei, i = 1, . . . 
, n Here xi ∼ U (−1, 1) and ei are independent random variables, θ = 1 and γ = 0.9. Clearly, 1 + 0.9xi > 0 for all i = 1, . . . , n. We choose two different values φ = 0 and 2. For both 21 φ = 0 and φ = 2, we simulate ei from two distributions: exponential with mean λ = 1 (exp(1)) and standard GP with ξ = 0.5 (GP(0.5)). Standard exponential is the limit of the GP distribution in 2.3.2 for ξ = 0. We use two sample sizes n = 500 and 1000. The Monte Carlo sample size for all of the cases is K = 500. Figure 2.2: Quantile estimates for φ = 0, exp(1) at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet). The sample size is n = 500. First we look at the τ quantile estimates for case-I with ei ∼ exp(1). We use the Monte 22 0246qq.hat$q.trueq.truegp.scaleqr0.010.100.200.300.400.500.600.700.800.900.96(a) all quantiles0.00.51.01.52.02.53.03.5qq.hat$q.trueq.truegp.scaleqr0.010.150.300.450.600.750.90(b) lower quantiles34567qq.hat$q.trueq.truegp.scaleqr0.900.920.940.960.98(c) higher quantilesquantilet Figure 2.3: Quantile estimates for φ = 0, GP(0.5) at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet). The sample size is n = 500. Carlo average estimate for the τ quantile ¯Q(τ ) = 1 K K(cid:88) i=1 ˆQY |x(τ ) The black line in figure 2.2 is the true quantile (q.true) function. Average estimates from both of the methods are very close to q.true. For better visualization, we have included only average ± standard error (se) curves for both QR (orange) and GP.scale (violet). We plot for all quantiles 0 < τ < 1 in figure 2.2 (a). Both GP.scale and QR estimates have less 23 05101520253035qq.hat$q.trueq.truegp.scaleqr0.010.100.200.300.400.500.600.700.800.900.96(a) all quantiles01234567qq.hat$q.trueq.truegp.scaleqr0.010.150.300.450.600.750.90(b) lower quantiles5101520253035qq.hat$q.trueq.truegp.scaleqr0.900.920.940.960.98(c) higher quantilesquantilet Table 2.1: φ = 0, GP(0.5): rel.bias(se) of quantile estimates for φ = 0 at x = 0.5 and quantile levels τ. τ 0.1 0.3 0.5 0.7 0.9 0.93 0.95 0.97 0.99 n = 500 n = 1000 GP.scale 0.07(0.06) 0.06(0.05) 0.06(0.05) 0.06(0.04) 0.07(0.05) 0.08(0.06) 0.09(0.06) 0.11(0.08) 0.15(0.12) QR 0.14(0.11) 0.09(0.08) 0.08(0.06) 0.08(0.06) 0.1(0.08) 0.12(0.09) 0.14(0.11) 0.18(0.14) 0.29(0.31) GP.scale 0.05(0.03) 0.05(0.03) 0.04(0.03) 0.04(0.03) 0.05(0.04) 0.06(0.04) 0.06(0.05) 0.07(0.06) 0.1(0.08) QR 0.11(0.08) 0.07(0.05) 0.06(0.04) 0.06(0.04) 0.07(0.05) 0.08(0.06) 0.09(0.07) 0.11(0.09) 0.17(0.15) variability in lower quantiles. For better visibility, we separate our quantiles into two regions: τ ∈ (0.9, 1) (higher quantiles) and τ ∈ (0, 0.9) (lower quantiles). In figure 2.2 (b) we can see that both QR and GP.scale approximate lower quantiles pretty well. At higher quantiles, QR has more variability than GP.scale. To see if the behavior changes with heavier than exponential tails, we proceed to case-I where ei ∼ GP(0.5). that GP(0.5) quantiles have a much higher range than exp(1). We do not have any issues In figure 2.3 (a) we can see with QR at lower quantiles. In figure 2.3 (c) we can see that QR estimates have very high standard error, particularly for τ > 0.95. Note that from figures 2.2 and 2.3 we do not get any idea about the bias of the estimators. So we look at the relative bias from K iterations. 
First we look at the τ-quantile estimates for case I with e_i ∼ exp(1). We use the Monte Carlo average estimate of the τ quantile,

$$\bar{Q}(\tau) = \frac{1}{K}\sum_{k=1}^{K} \hat{Q}_{Y|x,k}(\tau).$$

[Figure 2.2: Quantile estimates for φ = 0, exp(1) at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet). The sample size is n = 500. Panels: (a) all quantiles, (b) lower quantiles, (c) higher quantiles.]

The black line in figure 2.2 is the true quantile function (q.true). Average estimates from both of the methods are very close to q.true; for better visualization, we include only the average ± standard error (se) curves for QR (orange) and GP.scale (violet). We plot all quantiles 0 < τ < 1 in figure 2.2 (a). Both the GP.scale and QR estimates have less variability at the lower quantiles. For better visibility, we separate the quantiles into two regions: τ ∈ (0.9, 1) (higher quantiles) and τ ∈ (0, 0.9) (lower quantiles). In figure 2.2 (b) we can see that both QR and GP.scale approximate the lower quantiles quite well. At the higher quantiles, QR has more variability than GP.scale.

To see whether this behavior changes with heavier-than-exponential tails, we proceed to case I with e_i ∼ GP(0.5).

[Figure 2.3: Quantile estimates for φ = 0, GP(0.5) at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet). The sample size is n = 500. Panels: (a) all quantiles, (b) lower quantiles, (c) higher quantiles.]

In figure 2.3 (a) we can see that the GP(0.5) quantiles have a much wider range than those of exp(1). We do not have any issues with QR at the lower quantiles, but in figure 2.3 (c) we can see that the QR estimates have very high standard error, particularly for τ > 0.95. Note that figures 2.2 and 2.3 give no indication of the bias of the estimators, so we also look at the relative bias over the K iterations:

$$\text{rel.bias} = \frac{1}{K}\sum_{k=1}^{K}\Big|\,1 - \frac{\hat{Q}_{Y|x,k}(\tau)}{Q_{Y|x}(\tau)}\Big|.$$

We have enumerated the relative bias for case I, GP(0.5) in table 2.1. Our true quantile values are very close to 0 for τ ≈ 0, which leads to a higher relative bias at the lowest quantiles; at the higher quantiles we also expect the rel.bias to be large. The relative bias for case I, exp(1) is given in table A.1 in the appendix.

Table 2.1: φ = 0, GP(0.5): rel.bias (se) of quantile estimates at x = 0.5 and quantile levels τ.

            n = 500                   n = 1000
τ      GP.scale     QR           GP.scale     QR
0.1    0.07 (0.06)  0.14 (0.11)  0.05 (0.03)  0.11 (0.08)
0.3    0.06 (0.05)  0.09 (0.08)  0.05 (0.03)  0.07 (0.05)
0.5    0.06 (0.05)  0.08 (0.06)  0.04 (0.03)  0.06 (0.04)
0.7    0.06 (0.04)  0.08 (0.06)  0.04 (0.03)  0.06 (0.04)
0.9    0.07 (0.05)  0.10 (0.08)  0.05 (0.04)  0.07 (0.05)
0.93   0.08 (0.06)  0.12 (0.09)  0.06 (0.04)  0.08 (0.06)
0.95   0.09 (0.06)  0.14 (0.11)  0.06 (0.05)  0.09 (0.07)
0.97   0.11 (0.08)  0.18 (0.14)  0.07 (0.06)  0.11 (0.09)
0.99   0.15 (0.12)  0.29 (0.31)  0.10 (0.08)  0.17 (0.15)

As we can see, looking at the relative bias alone is not enough. Hence, we also use the mean square error (mse), which accounts for both the bias and the variance:

$$\text{mse} = \frac{1}{K}\sum_{k=1}^{K}\big(Q_{Y|x}(\tau) - \hat{Q}_{Y|x,k}(\tau)\big)^2. \qquad (2.22)$$

Figure 2.4 (a) shows the mse of the quantile estimates for QR and GP.scale at all quantiles.

[Figure 2.4: mse (se) for φ = 0, GP(0.5) at x = 0.5 and quantile levels τ. The sample size is n = 500. Panels: (a) all quantiles, (b) lower quantiles, (c) higher quantiles.]

We can see that the mse is less than 0.6 at the lower quantiles for both methods. Beyond the 0.95 quantile, the mse of both QR and GP.scale increases at a very high rate, but relative to QR, GP.scale is more stable, particularly at the higher quantiles. Note that we have very few observations above the 0.9 quantile; with a sample size of n = 500 we cannot expect to do much better. We summarise these results in table 2.2 for selected quantile levels. Clearly, for n = 500 the mse of QR starts to explode after τ = 0.95. When we increase the sample size to 1000, the performance of QR improves, but it does not surpass GP.scale.

Table 2.2: φ = 0, GP(0.5): mse (se) of quantile estimates at x = 0.5 and quantile levels τ.

            n = 500                     n = 1000
τ      GP.scale      QR              GP.scale      QR
0.1    0 (0)         0 (0)           0 (0)         0 (0)
0.3    0 (0)         0 (0)           0 (0)         0 (0)
0.5    0.01 (0)      0.01 (0)        0 (0)         0.01 (0)
0.7    0.03 (0)      0.06 (0)        0.02 (0)      0.03 (0)
0.9    0.29 (0.02)   0.62 (0.04)     0.15 (0.01)   0.33 (0.02)
0.93   0.61 (0.04)   1.41 (0.1)      0.31 (0.02)   0.67 (0.04)
0.95   1.23 (0.08)   3.16 (0.23)     0.61 (0.04)   1.35 (0.09)
0.97   3.39 (0.23)   9.72 (0.76)     1.61 (0.12)   3.77 (0.26)
0.99   25.32 (2.04)  120.95 (19.78)  11.29 (0.91)  33.62 (3.03)

Note that the rel.bias in table 2.1 cannot reflect these details, as the magnitude of the bias is proportional to the value of the true quantile; we do not have this issue with the mse. For case I, exp(1), the magnitude of the bias is much lower than for GP(0.5) at all quantiles. These results are given in appendix figure A.1 and table A.2.
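Given the K Monte Carlo estimates of a fixed true quantile, both criteria are one-liners in R; a minimal sketch (rel_bias and mse are our helper names):

    # rel.bias and mse over K Monte Carlo replicates, following (2.22);
    # q_hat is the vector of K estimates, q the true quantile Q_{Y|x}(tau)
    rel_bias <- function(q_hat, q) mean(abs(1 - q_hat / q))
    mse      <- function(q_hat, q) mean((q - q_hat)^2)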
Now we move to case II, where φ = 2. First we simulate e_i ∼ GP(0.5). We want to see how our estimates fluctuate at the higher quantiles (τ > 0.9), using the two sample sizes n = 500 and 1000. Figure 2.5 shows the quantile function and the estimates. As in case I, both the QR and GP.scale estimates improve with increasing sample size.

[Figure 2.5: Quantile estimates for φ = 2 at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet), for n = 500 and n = 1000.]

Table 2.3: φ = 2, GP(0.5): rel.bias (se) of quantile estimates at x = 0.5 and quantile levels τ.

            n = 500                   n = 1000
τ      GP.scale     QR           GP.scale     QR
0.1    0.01 (0)     0.01 (0.01)  0 (0)        0.01 (0.01)
0.3    0.01 (0.01)  0.02 (0.02)  0.01 (0.01)  0.01 (0.01)
0.5    0.02 (0.02)  0.03 (0.02)  0.02 (0.01)  0.02 (0.02)
0.7    0.03 (0.02)  0.04 (0.03)  0.02 (0.02)  0.03 (0.02)
0.9    0.05 (0.04)  0.08 (0.06)  0.04 (0.03)  0.06 (0.04)
0.93   0.06 (0.05)  0.09 (0.07)  0.05 (0.03)  0.07 (0.05)
0.95   0.07 (0.06)  0.11 (0.08)  0.05 (0.04)  0.08 (0.06)
0.97   0.09 (0.07)  0.14 (0.1)   0.06 (0.05)  0.1 (0.08)
0.99   0.14 (0.11)  0.26 (0.25)  0.1 (0.07)   0.17 (0.14)

In table 2.3, we can see that the relative bias is an increasing function of the quantile level τ for both methods. Note that the true quantile value is always more than φ = 2, since Q_{Y|x}(τ) = 2 + (1 + 0.9x)Q_ε(τ) ≥ 2 for all τ ∈ [0, 1]. For all quantile values, GP.scale has lower rel.bias than QR, even when we increase the sample size to 1000; moreover, QR has very high relative bias at τ = 0.97 and 0.99. The mse in figure 2.6 conveys the same picture as the relative bias, and figure 2.6 is easier to interpret than table 2.3. Moreover, the mse for both case I and case II reflects the magnitude of the bias. Hence, from now onward we will only use the mse as a measure for comparing different estimates.

[Figure 2.6: mse ± se for φ = 2, GP(0.5) and quantile levels τ, for n = 500 and n = 1000.]

We have summarised the mse values at selected quantile levels for case II, GP(0.5) in table 2.4. It is evident that GP.scale is preferable to QR at the higher quantiles.

Table 2.4: φ = 2, GP(0.5): mse (se) of quantile estimates at x = 0.5 and quantile levels τ.

            n = 500                     n = 1000
τ      GP.scale      QR              GP.scale      QR
0.5    0.01 (0)      0.01 (0)        0 (0)         0.01 (0)
0.75   0.05 (0)      0.09 (0.01)     0.02 (0)      0.04 (0)
0.9    0.32 (0.02)   0.67 (0.05)     0.16 (0.01)   0.36 (0.02)
0.93   0.65 (0.04)   1.39 (0.1)      0.32 (0.02)   0.76 (0.05)
0.95   1.27 (0.08)   2.47 (0.16)     0.63 (0.04)   1.49 (0.1)
0.97   3.39 (0.22)   7.84 (0.51)     1.66 (0.1)    4.29 (0.29)
0.99   23.9 (1.68)   100.11 (14.26)  11.6 (0.73)   38.78 (3.24)

From our simulation studies, we see that the tails are approximated well with GP.scale when the observations come from the scale model 2.9 and the underlying distribution is GP or exponential. Motivated by this, we extend our method to the location-scale family of regression models in the next chapter.

CHAPTER 3
ESTIMATION OF EXTREME CONDITIONAL QUANTILES WITH COVARIATE ADAPTIVE THRESHOLD SELECTION

3.1 Introduction

The study of the tail behavior of a distribution is helpful for the analysis of rare events. Often, the Generalized Pareto (GP) distribution is used for modeling the tail of a distribution; this approach is popularly known as the Peak Over Threshold (POT) approach, in which a GP distribution is fitted to the tail region of the data using only the observations exceeding a certain threshold. A fundamental result in the extreme value literature by Pickands (1975) showed that the GP distribution can be used to model the tail as long as the underlying distribution satisfies certain regularity conditions. Let Z be a random variable with distribution function F. Pickands (1975) showed that the distribution of the conditional excess Z − u | Z > u can be approximated by a GP distribution as long as F belongs to the maximum domain of attraction of an extreme value distribution. It is hard to find distributions which do not belong to a maximum domain of attraction; hence, the POT approach can be used even when F is unknown. It is important to keep in mind, however, that the GP approximation is valid only when the threshold u is large.
As pointed out by Song & Song (2012a), selecting too low a threshold leads to a poor GP approximation of the excess, and hence the resulting estimates have high bias. On the other hand, when the threshold is too high, the effective sample size used for estimating the parameters is small, which leads to high variance. Thus, choosing an optimal threshold is necessary for the bias-variance tradeoff. Once the threshold is determined, the remaining parameters, scale and shape, are estimated from the GP fit to the scaled excess data. Langousis et al. (2016) have nicely summarised the available methods for threshold selection. Davison & Smith (1990) introduced statistical modeling of peaks over threshold with the GP distribution where the parameters of the GP distribution are allowed to vary as a function of covariates. Some of the more recent works which use a covariate-based GP distribution for modeling peaks over threshold include Park & Kim (2016), Song & Song (2012b), and del Castillo & Serra (2015). A comprehensive review of GP modeling where the parameters of the GP vary as a function of the covariates is presented in Coles et al. (2001). In all these works, the threshold above which the GP is fit is chosen among the set of empirical quantiles of the response variable. To the best of our knowledge, the problem of choosing the threshold as a function of the covariate has not been addressed in the literature. The main focus of this chapter is to develop a method which can choose a covariate dependent threshold while still suitably estimating the covariate dependent parameters of the GP distribution.

In this chapter, we approach the conditional quantile estimation problem where the threshold can vary across different covariates. Integrating QR and EVT, we propose a consistent estimator of the quantile function. This method is quite flexible in the sense that it can be easily adapted for distributions in the heavy-tail domain of attraction. Although there are relatively few works which address the estimation of high conditional quantiles for heavy-tailed distributions, the one by Wang et al. (2012) presents a very promising method for this purpose. Their method uses standard QR to estimate intermediate quantiles and then employs EVT to extrapolate this information to extreme quantiles. Their approach is semi-parametric and uses the Hill estimator (see Hill (1975)) for estimating the tail behavior. We compare our parametric approach based on the GP distribution to that of Wang et al. (2012) in terms of accuracy of estimation for both quantiles and the tail index. The parametric approach is the best method when the observations are indeed from a Pareto or Generalized Pareto distribution. For other distributions in the Fréchet domain of attraction, the method is fairly competitive and comes with several advantages in terms of practical implementation.

The rest of this chapter is organized as follows. In section 3.2 we propose our method (QR adjusted GP) for the location-scale family of regression models. In 3.2.1, we establish asymptotic properties of QR adjusted GP in the context of the GP distribution. We also demonstrate the efficiency of our approach over standard QR and our previous method (GP.scale, section 2.5) through numerical investigations in 3.2.2. In section 3.3 we propose a generalization of our method for heavy-tailed distributions. We give an outline of the analytical properties of this extension in 3.3.1.
In section 3.4, we explore conditional quantile estimation in the heavy-tail domain of attraction through a simulation study. In 3.4.1 we review the high conditional quantile estimation method of Wang et al. (2012), in which the slopes of the quantile function also change with the quantile level. We compare our method with Wang et al. (2012) for different distributions in the heavy-tail domain of attraction in 3.4.2. From our findings, we conclude that our proposed method performs better than or as well as Wang et al. (2012). Finally, in section 3.5 we apply our method to the precipitation downscaling task with rainfall data.

3.2 Proposed Method: QR Adjusted GP for Location-Scale Models

In chapter 2, we have seen that our proposed method is suitable for the following family of regression models:

$$Y = \phi + (\theta + X^\top\gamma)\,\epsilon, \qquad (3.1)$$

where φ ∈ R and θ + X^T γ > 0, and ε ∼ GP(µ = 0, σ = 1, ξ). In this case, Y | X = x ∼ GP(µ = φ, σ(x) = θ + x^T γ, ξ); hence, the conditional distribution of Y depends on x only through the scale parameter σ(x) = θ + x^T γ. Now we concentrate on the general case:

$$Y = \phi + X^\top\rho + (\theta + X^\top\gamma)\,\epsilon. \qquad (3.2)$$

Here the conditional distribution of Y given X = x depends on x through both the scale σ(x) = θ + x^T γ and the location µ(x) = φ + x^T ρ. Clearly, model 3.1 is a special case of 3.2 with ρ = 0. Let ỹ_i = (y_i − µ(x_i))/σ(x_i). If we try to fit the GP distribution to the observations directly, the parameters of 3.2 can be estimated as

$$\begin{aligned} (\hat\phi, \hat\rho, \hat\theta, \hat\gamma, \hat\xi) &= \mathop{\mathrm{argmin}}_{\phi,\rho,\theta,\gamma,\xi}\; l(\phi,\rho,\theta,\gamma,\xi) \\ &= \mathop{\mathrm{argmin}}_{\phi,\rho,\theta,\gamma,\xi} \left[ \sum_{i:\tilde{y}_i>0} \log(\sigma(x_i)) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\tilde{y}_i>0} \log\Big(1+\frac{\xi\,(y_i-\mu(x_i))}{\sigma(x_i)}\Big) \right] \\ &= \mathop{\mathrm{argmin}}_{\phi,\rho,\theta,\gamma,\xi} \left[ \sum_{i:\tilde{y}_i>0} \log(\theta + x_i^\top\gamma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\tilde{y}_i>0} \log\Big(1+\frac{\xi\,(y_i-\phi-x_i^\top\rho)}{\theta + x_i^\top\gamma}\Big) \right]. \end{aligned} \qquad (3.3)$$

We have to minimize the loss function in 3.3 subject to ξ > 0, σ(x_i) > 0, and y_i − µ(x_i) > 0 for all i = 1, ..., n. As with the optimization problem 2.11 in the previous chapter, there is no analytical solution; we need to solve this numerically with 2n + 1 constraints, n of which come from the condition y_i − µ(x_i) > 0. Our proposed method reduces the computational challenge by removing these constraints. We discuss the details in the next section.

3.2.1 Method of Estimation

We assume the following location-scale regression model:

$$Y = \phi + X^\top\rho + (\theta + X^\top\gamma)\,\epsilon, \qquad (3.4)$$

where φ, θ ∈ R and ρ = (ρ_1, ..., ρ_p), γ = (γ_1, ..., γ_p) in R^p are unknown parameters, with θ + X^T γ > 0 for all X in its domain. X^T γ = γ_1 X_1 + ... + γ_p X_p and X^T ρ = ρ_1 X_1 + ... + ρ_p X_p are linear combinations of the covariates in equation 3.4. Given X = x and a quantile level 0 < τ < 1, the conditional quantile is

$$Q_{Y|x}(\tau) = \phi + x^\top\rho + (\theta + x^\top\gamma)\,Q_\epsilon(\tau).$$

Suppose ε follows the standard GP distribution with tail index ξ > 0, i.e., ε ∼ GP(ξ). Then Y given X = x follows a GP distribution with location µ(x) = φ + x^T ρ, scale σ(x) = θ + x^T γ > 0, and tail index ξ > 0, and the conditional quantile is

$$Q_{Y|x}(\tau) = \phi + x^\top\rho + \frac{\theta + x^\top\gamma}{\xi}\Big[(1-\tau)^{-\xi} - 1\Big]. \qquad (3.5)$$

Through our numerical investigation in section 2.6, we have seen that for the scale family of models Y = φ + (θ + X^T γ)ε, ε ∼ GP(ξ), quantile regression can reasonably approximate any lower quantile τ ∈ (0, 0.9), but suffers from very high variability, particularly when τ > 0.95. Thus, although we need an alternative to quantile regression when τ is close to 1, we can still use the information from the quantile regression estimates at lower quantiles.
For the covariate-free threshold µ = φ, we can use the first order statistic Y_(1) to estimate µ. Motivated by this, we propose using the quantile regression estimate at quantile level τ_c = 1/n as our threshold:

$$\hat\mu(x) = \widehat{\phi + x^\top\rho} = \hat{QR}_{Y|x}\Big(\frac{1}{n}\Big) = \hat\alpha\Big(\frac{1}{n}\Big) + x^\top\hat\beta\Big(\frac{1}{n}\Big). \qquad (3.6)$$

We obtain α̂(1/n) and β̂(1/n) from equation 2.3. Suppose

$$e\Big(\frac{1}{n}, x\Big) = Y - \hat{QR}_{Y|x}\Big(\frac{1}{n}\Big) = Y - \hat\mu(x).$$

Then, mimicking the procedure in 2.17, we estimate the rest of the parameters θ, γ, and ξ as

$$\begin{aligned} (\hat\theta, \hat\gamma, \hat\xi) &= \mathop{\mathrm{argmin}}_{\theta,\gamma,\xi}\; l(\theta,\gamma,\xi) \\ &= \mathop{\mathrm{argmin}}_{\theta,\gamma,\xi} \left[ \sum_{i:\, e(1/n,\,x_i)>0} \log(\theta + x_i^\top\gamma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\, e(1/n,\,x_i)>0} \log\Big(1+\frac{\xi\, e(1/n, x_i)}{\theta + x_i^\top\gamma}\Big) \right]. \end{aligned} \qquad (3.7)$$

Finally, using 3.7 and 3.6, we can estimate the quantile function in 3.5 as

$$\hat{Q}_{Y|x}(\tau) = \hat{QR}_{Y|x}\Big(\frac{1}{n}\Big) + \frac{\hat\theta + x^\top\hat\gamma}{\hat\xi}\Big[(1-\tau)^{-\hat\xi} - 1\Big].$$

The estimation procedure is outlined in algorithm 2, followed by a short R sketch.

Algorithm 2: QR adjusted GP
1. Get $\hat{QR}_{Y|x_i}(1/n) = \hat\alpha(1/n) + x_i^\top\hat\beta(1/n)$ from 3.6.
2. Get $e(1/n, x_i) = y_i - \hat{QR}_{y_i|x_i}(1/n)$.
3. Get $\hat\xi$, $\hat\theta$, and $\hat\gamma$ from 3.7.
4. Given X = x and 0 < τ < 1, estimate the τ quantile as
$$\hat{Q}_{Y|x}(\tau) = \hat{QR}_{Y|x}(1/n) + \frac{\hat\theta + x^\top\hat\gamma}{\hat\xi}\Big[(1-\tau)^{-\hat\xi} - 1\Big].$$
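The following R sketch implements algorithm 2 for a single covariate. It relies on the rq function of the quantreg package for the quantile regression step; gp_negloglik and qr_gp are our illustrative names, and the crude penalty-based optim call stands in for a more careful constrained optimizer.

    library(quantreg)

    # Negative GP log-likelihood with linear scale sigma(x) = theta + gamma*x, as in 3.7
    gp_negloglik <- function(par, x, e) {
      theta <- par[1]; gamma <- par[2]; xi <- par[3]
      sig <- theta + gamma * x
      z   <- 1 + xi * e / sig
      if (xi <= 0 || any(sig <= 0) || any(z <= 0)) return(1e10)  # crude constraint handling
      sum(log(sig)) + (1 + 1 / xi) * sum(log(z))
    }

    # Algorithm 2: QR threshold at tau_c = 1/n, GP fit to the positive residuals
    qr_gp <- function(x, y, tau, tau_c = 1 / length(y)) {
      fit  <- rq(y ~ x, tau = tau_c)               # step 1: covariate adaptive threshold
      e    <- y - fitted(fit)                      # step 2: residuals
      keep <- e > 0
      par  <- optim(c(1, 0, 0.5), gp_negloglik,    # step 3: GP fit
                    x = x[keep], e = e[keep])$par
      # step 4: tau-quantile estimate at the observed covariate values
      fitted(fit) + (par[1] + par[2] * x) / par[3] * ((1 - tau)^(-par[3]) - 1)
    }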
In the following proposition, we establish the consistency of algorithm 2 for the GP distribution under the compactness assumption (A1) on the covariate x.

Proposition 3.2.1. Let $y_i = \phi + x_i^\top\rho + (\theta + x_i^\top\gamma)\,\epsilon_i$ with $\epsilon_i \overset{iid}{\sim} GP(\xi)$. With $\tilde{x}_i = [1, x_i^\top]$, $\tilde\gamma = [\theta, \gamma]$, and $\tilde\rho = [\phi, \rho]$, let $\widehat{\tilde{x}_i^\top\tilde\rho} = \hat{QR}_{Y|x_i}(1/n)$ and

$$(\hat\gamma, \hat\xi) = \mathop{\mathrm{argmin}}_{c,\,d} \left[ \sum_{i=1}^{n} \log(\tilde{x}_i^\top c) + \Big(1+\frac{1}{d}\Big) \sum_{i=1}^{n} \log\Big(1+\frac{d\,(y_i - \widehat{\tilde{x}_i^\top\tilde\rho})}{\tilde{x}_i^\top c}\Big) \right], \qquad (3.8)$$

and estimate the τ quantile of Y as

$$\hat{Q}_{Y|x}(\tau) = \widehat{\tilde{x}^\top\tilde\rho} + \frac{\tilde{x}^\top\hat\gamma}{\hat\xi}\Big((1-\tau)^{-\hat\xi} - 1\Big).$$

Then,

1. $\sqrt{n}\big((\hat\gamma, \hat\xi) - (\tilde\gamma, \xi)\big) \Longrightarrow N(0, \Sigma)$;
2. $\sqrt{n}\big(\hat{Q}_{Y|x}(\tau) - Q_{Y|x}(\tau)\big) \Longrightarrow N(0, \sigma^2)$.

Proof. We first show that $\hat{QR}_{Y|x_i}(1/n)$ is a consistent estimate of $\tilde{x}_i^\top\tilde\rho$. In this direction, note that by relation (7.1) in Chernozhukov (2005), we have

$$\sqrt{n}\big(\hat{QR}_{Y|x_i}(1/n) - Q_{Y|x_i}(1/n)\big) \sim N\Big(0,\; \frac{(1/n)(1-1/n)}{f^2(F^{-1}(1/n))}\,\big(E(XX^\top)\big)^{-1}\Big),$$

where f and F denote the density and cumulative distribution function of GP(ξ), respectively. Since $F^{-1}(1/n) = (1/\xi)\big((1-1/n)^{-\xi}-1\big) \to 0$, we have $f^2(F^{-1}(1/n)) \to 1$. Hence,

$$\sqrt{n}\big(\hat{QR}_{Y|x_i}(1/n) - Q_{Y|x_i}(1/n)\big) = o_P(1).$$

Also,

$$\sqrt{n}\big(Q_{Y|x_i}(1/n) - \tilde{x}_i^\top\tilde\rho\big) = \sqrt{n}\;\tilde{x}_i^\top\tilde\gamma\;\frac{(1-1/n)^{-\xi}-1}{\xi} \lesssim \frac{\sqrt{n}\,M_{\tilde\gamma}}{n} = o_P(1),$$

where $M_{\tilde\gamma} = M \max_j |\gamma_j|$ by assumption (A1). Hence,

$$\sqrt{n}\big(\hat{QR}_{Y|x_i}(1/n) - \tilde{x}_i^\top\tilde\rho\big) = o_P(1).$$

First, suppose $\tilde{x}_i^\top\tilde\rho$ is known. Then

$$(\hat\gamma_0, \hat\xi_0) = \mathop{\mathrm{argmin}}_{\tilde\gamma,\,\xi} \left[ \sum_{i=1}^{n} \log(\tilde{x}_i^\top\tilde\gamma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i=1}^{n} \log\Big(1+\frac{\xi\,(y_i - \tilde{x}_i^\top\tilde\rho)}{\tilde{x}_i^\top\tilde\gamma}\Big) \right] \qquad (3.9)$$

and

$$\sqrt{n}\big((\hat\gamma_0, \hat\xi_0) - (\tilde\gamma, \xi)\big) \Longrightarrow N\big(0, I(\tilde\gamma, \xi)^{-1}\big),$$

where $I(\tilde\gamma, \xi)$ is the same as defined in (2.21). Thus, for $\tilde{x}_i^\top\tilde\rho$ known, $(\hat\gamma_0, \hat\xi_0)$ is the solution of the equation

$$\nabla_{\tilde\gamma,\xi} \left[ \sum_{i=1}^{n} \log(\tilde{x}_i^\top\tilde\gamma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i=1}^{n} \log\Big(1+\frac{\xi\,(y_i - \tilde{x}_i^\top\tilde\rho)}{\tilde{x}_i^\top\tilde\gamma}\Big) \right] = 0,$$

which implies $(\hat\gamma_0, \hat\xi_0) = \big(h_1(\tilde{x}_i^\top\tilde\rho), h_2(\tilde{x}_i^\top\tilde\rho)\big)$. Now, by a Taylor expansion,

$$\big(h_1(\widehat{\tilde{x}_i^\top\tilde\rho}), h_2(\widehat{\tilde{x}_i^\top\tilde\rho})\big) \approx \big(h_1(\tilde{x}_i^\top\tilde\rho) + (\widehat{\tilde{x}_i^\top\tilde\rho} - \tilde{x}_i^\top\tilde\rho)\,h_1'(\tilde{x}_i^\top\tilde\rho),\; h_2(\tilde{x}_i^\top\tilde\rho) + (\widehat{\tilde{x}_i^\top\tilde\rho} - \tilde{x}_i^\top\tilde\rho)\,h_2'(\tilde{x}_i^\top\tilde\rho)\big).$$

Therefore,

$$\sqrt{n}\big((h_1(\widehat{\tilde{x}_i^\top\tilde\rho}), h_2(\widehat{\tilde{x}_i^\top\tilde\rho})) - (\tilde\gamma, \xi)\big) \approx \sqrt{n}\big((h_1(\tilde{x}_i^\top\tilde\rho), h_2(\tilde{x}_i^\top\tilde\rho)) - (\tilde\gamma, \xi)\big) + \sqrt{n}\big(\widehat{\tilde{x}_i^\top\tilde\rho} - \tilde{x}_i^\top\tilde\rho\big)\,\big(h_1'(\tilde{x}_i^\top\tilde\rho), h_2'(\tilde{x}_i^\top\tilde\rho)\big).$$

As shown previously, $\sqrt{n}(\widehat{\tilde{x}_i^\top\tilde\rho} - \tilde{x}_i^\top\tilde\rho) = o_P(1)$; thus

$$\sqrt{n}\big((\hat\gamma, \hat\xi) - (\tilde\gamma, \xi)\big) \Longrightarrow N(0, \Sigma),$$

where $\Sigma = (h_1', h_2')\, I(\tilde\gamma, \xi)^{-1}\, (h_1', h_2')^\top$ evaluated at $\tilde{x}_i^\top\tilde\rho$, with $I(\tilde\gamma, \xi)$ the same as in (2.21). For part (2), note that with $g(\tilde\gamma, \xi) = \frac{\tilde{x}^\top\tilde\gamma}{\xi}\big((1-\tau)^{-\xi}-1\big)$,

$$\sqrt{n}\big(\hat{Q}_{Y|x}(\tau) - Q_{Y|x}(\tau)\big) = \sqrt{n}\big(\widehat{\tilde{x}^\top\tilde\rho} - \tilde{x}^\top\tilde\rho\big) + \sqrt{n}\big(g(\hat\gamma,\hat\xi) - g(\tilde\gamma,\xi)\big) = o_P(1) + \sqrt{n}\big(g(\hat\gamma,\hat\xi) - g(\tilde\gamma,\xi)\big).$$

Now,

$$\sqrt{n}\big(g(\hat\gamma,\hat\xi) - g(\tilde\gamma,\xi)\big) \approx \sqrt{n}\big((\hat\gamma,\hat\xi) - (\tilde\gamma,\xi)\big)^\top \nabla_{\tilde\gamma,\xi}\,g(\tilde\gamma,\xi).$$

Since $\sqrt{n}((\hat\gamma,\hat\xi) - (\tilde\gamma,\xi)) \Longrightarrow N(0,\Sigma)$, it follows that $\sqrt{n}(g(\hat\gamma,\hat\xi) - g(\tilde\gamma,\xi)) \Longrightarrow N(0,\sigma^2)$, where $\sigma^2 = \nabla_{\tilde\gamma,\xi}\,g(\tilde\gamma,\xi)^\top\,\Sigma\,\nabla_{\tilde\gamma,\xi}\,g(\tilde\gamma,\xi)$. This completes the proof.

3.2.2 Simulation Study: GP Distribution

We want to assess the performance of our method when the data are simulated from the following regression model:

$$\begin{aligned} Y &= \phi + X^\top\rho + (\theta + X^\top\gamma)\,\epsilon, \qquad \epsilon \sim GP(\xi), \\ Q_{Y|x}(\tau) &= \phi + x^\top\rho + (\theta + x^\top\gamma)\,Q_\epsilon(\tau), \\ \alpha(\tau) &= \phi + \theta\,Q_\epsilon(\tau), \qquad \beta(\tau) = \rho + \gamma\,Q_\epsilon(\tau). \end{aligned} \qquad (3.10)$$

Again the quantiles are linear in x, so quantile regression can be used without violating any model assumption. We simulate iid observations (x_i, y_i), i = 1, ..., n from 3.10 with φ = θ = 1, ρ = (1, 1), and γ = (0.9, 0), so that

$$y_i = 1 + x_{1i} + x_{2i} + (1 + 0.9\,x_{1i})\,e_i.$$

We simulate x_{ij} ∼ U(−1, 1) for j = 1, 2, so that 1 + 0.9x_{1i} > 0 for all i = 1, ..., n. We want to see how QR and GP.scale perform compared to our proposed QR adjusted GP (QR.GP) in estimating the higher quantiles above 0.9. We simulate e_i from two distributions: exponential with rate λ = 1 (exp(1)) and standard GP with ξ = 0.5 and 0.8 (GP(ξ)). We use two sample sizes, n = 500 and 1000.
From proposition 3.2.1, we know that τ_c = 1/n leads to consistent estimation of higher quantiles with QR.GP when the true distribution is GP. Here we want to validate our proposition through simulations. We also investigate the performance of QR.GP under varying thresholds; for that purpose, we choose τ_c from a grid of values in (0, 1). When the data are generated from a GP or exponential distribution, values of τ_c close to 0 should be the best choice.

We start with the case e_i ∼ GP(0.5). We use mse as our measure of assessment, and the Monte Carlo sample size is K = 500. The results for sample size n = 500 are summarised in figure 3.1. We can see in 3.1 (a) that GP.scale is not appropriate in this case: it fails to capture the effect of the covariate x at any given higher quantile 0.9 ≤ τ ≤ 0.99. QR.GP performs better than QR for threshold τ_c = 0.01 at any quantile level 0.9 ≤ τ ≤ 0.99. In 3.1 (b), we can see how the mse of QR.GP changes for different values of τ_c when the quantile level is τ = 0.93; QR.GP has lower mse than QR for all τ_c except τ_c = 0.95. When we increase the quantile level to τ = 0.97, QR.GP has lower mse than QR for all values of τ_c (figure 3.1 (c)).

[Figure 3.1: mse ± se for GP(0.5) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 500. (a): τ_c = 0.01 for QR.GP; (b) and (c): 0 < τ_c < 1.]

Now we increase our sample size to n = 1000. In figure 3.2 (a) we can see that even with the increase in sample size, QR.GP outperforms QR at τ_c = 0.01. From figures 3.2 (b) and (c), we get an understanding of the effect of τ_c on the quantile estimated by QR.GP: the mse is an increasing function of τ_c in this case. This is bound to happen, since the effective sample size used for estimation decreases as we increase τ_c. Note that according to algorithm 2, given τ_c, we only use {y_i : y_i − x_i^T Q̂R(τ_c) > 0} to estimate the scale and shape parameters. We get the best performance from QR.GP when τ_c = 0.01, and we can comfortably choose any τ_c as long as τ_c ≤ 0.85. Any choice of τ_c higher than 0.85 leads to inconsistent estimation of the GP parameters, as we have very few observations above the threshold.

Sparsity of the observations in the tail of a distribution is associated with higher values of the tail index ξ. Next we simulate samples of size n = 1000 from the GP with tail index ξ = 0.8. In figure 3.3 (a) we can see that the magnitude of the mse is higher than in the previous case.

[Figure: mse ± se for QR and QR.GP across quantile levels 0.9 ≤ τ ≤ 0.99; panels (a) n = 1000, (b) n = 2000.]

3.3 QR Adjusted GP for Heavy Tailed Distributions

When we do not know F, we can use the GP approximation to compute a higher quantile. First we focus on the non-regression case. Let Y ∼ F, where F ∈ D(G_ξ) for some ξ > 0. Then, by Theorem 2.3.2, for a large enough value u, Y | Y > u is approximately distributed according to a GP distribution. Hence, for any y > u,

$$P(Y > y \mid Y > u) = \Big(1 + \frac{\xi\,(y-u)}{\sigma}\Big)^{-1/\xi}.$$

Let y_τ be the τ quantile of Y such that y_τ ≥ u. Then

$$P(Y > y_\tau) = P(Y > u)\,P(Y > y_\tau \mid Y > u) = \bar{F}(u)\Big(1 + \frac{\xi\,(y_\tau - u)}{\sigma}\Big)^{-1/\xi} = 1 - \tau;$$

therefore,

$$y_\tau = Q_Y(\tau) = u + \frac{\sigma}{\xi}\left[\Big(\frac{1-\tau}{\bar{F}(u)}\Big)^{-\xi} - 1\right]. \qquad (3.11)$$

Clearly, if we know F, we can substitute the exact expression for F̄(u) in 3.11. In practice we do not know F, so we estimate F̄(u) = E(I(Y > u)) with $\hat{p}_u = \frac{1}{n}\sum_{i=1}^{n} I(y_i > u)$.
Therefore, we can estimate the quantile in 3.11 as

$$\hat{Q}_Y(\tau) = u + \frac{\hat\sigma}{\hat\xi}\left[\Big(\frac{1-\tau}{\hat{p}_u}\Big)^{-\hat\xi} - 1\right], \qquad (3.12)$$

where σ̂ and ξ̂ are estimated as

$$(\hat\sigma, \hat\xi) = \mathop{\mathrm{argmin}}_{\sigma,\,\xi} \left[ \sum_{i:\,y_i>u} \log(\sigma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\,y_i>u} \log\Big(1+\frac{\xi\,(y_i-u)}{\sigma}\Big) \right].$$

An important thing to note here is that we do not estimate u from the data. Moreover, 3.11 is an approximation of the true quantile for large u.

3.3.1 Method of Estimation

We assume model 3.4 with ε ∼ F, F ∈ D(G_ξ) for some ξ > 0. Let 0 < τ_c < 1 be an intermediate quantile level, i.e., τ_c not too close to 1. In 3.2.1 we used τ_c = 1/n; here we want to choose τ_c appropriately so that Y − µ̂(x) approximately follows a GP distribution. We use µ̂(x) as our threshold:

$$\hat\mu(x) = \hat{QR}_{Y|x}(\tau_c) = \hat\alpha(\tau_c) + x^\top\hat\beta(\tau_c), \qquad (3.13)$$

where α̂(τ_c) and β̂(τ_c) are obtained from equation 2.3. Let

$$e(\tau_c, x) = Y - \hat{QR}_{Y|x}(\tau_c).$$

Then

$$e(\tau_c, x) = \phi + x^\top\rho + (\theta + x^\top\gamma)\,\epsilon - \hat{QR}_{Y|x}(\tau_c) = \phi - \hat\alpha(\tau_c) + x^\top\big(\rho - \hat\beta(\tau_c)\big) + (\theta + x^\top\gamma)\,\epsilon.$$

We assume that, given τ_c, e(τ_c, x) | e(τ_c, x) > 0 ≈ GP(µ = 0, σ(x) = θ + x^T γ, ξ). Hence, we can estimate the remaining unknown parameters θ, γ, and ξ by fitting a scaled GP model to the conditional residuals e(τ_c, x_i) | e(τ_c, x_i) > 0, i = 1, ..., n:

$$(\hat\theta, \hat\gamma, \hat\xi) = \mathop{\mathrm{argmin}}_{\theta,\gamma,\xi} \left[ \sum_{i:\,e(\tau_c, x_i)>0} \log(\theta + x_i^\top\gamma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\,e(\tau_c, x_i)>0} \log\Big(1+\frac{\xi\,e(\tau_c, x_i)}{\theta + x_i^\top\gamma}\Big) \right]. \qquad (3.14)$$

We need to minimize the loss function in 3.14 subject to ξ > 0 and θ + x_i^T γ > 0 for all i = 1, ..., n. By using Q̂R_{Y|x}(τ_c) to estimate µ(x) = φ + x^T ρ, we have removed n constraints from the optimization problem.

Let τ be a high quantile level such that Q_{e(τ_c,x)}(τ) > 0. Then from 3.11 we have

$$Q_{e(\tau_c,x)}(\tau) \approx \frac{\sigma(x)}{\xi}\left[\Big(\frac{1-\tau}{P(e(\tau_c,x)>0)}\Big)^{-\xi} - 1\right].$$

Let $\hat{p}_0(\tau_c, n) = \frac{1}{n}\sum_{i=1}^{n} I[e(\tau_c, x_i) > 0]$. Following 3.12, we can estimate the τ quantile of the residual e(τ_c, x) as

$$\hat{Q}_{e(\tau_c,x)}(\tau) = \frac{\hat\theta + x^\top\hat\gamma}{\hat\xi}\left[\Big(\frac{1-\tau}{\hat{p}_0(\tau_c,n)}\Big)^{-\hat\xi} - 1\right]. \qquad (3.15)$$

Finally, combining 3.15 and 3.13, we can estimate the conditional quantile of Y given x at a quantile level τ as

$$\hat{Q}_{Y|x}(\tau) = \hat{QR}_{Y|x}(\tau_c) + \hat{Q}_{e(\tau_c,x)}(\tau) = \hat\alpha(\tau_c) + x^\top\hat\beta(\tau_c) + \frac{\hat\theta + x^\top\hat\gamma}{\hat\xi}\left[\Big(\frac{1-\tau}{\hat{p}_0(\tau_c,n)}\Big)^{-\hat\xi} - 1\right].$$

The estimation procedure is outlined in algorithm 3 below; an R sketch follows.

Algorithm 3: QR adjusted GP for the location-scale family
start: Set 0 < τ_c ≈ 0; fix x, δ > 0.
I. Obtain $\hat{QR}_{Y|x_i}(\tau_c) = \hat\alpha(\tau_c) + x_i^\top\hat\beta(\tau_c)$ from 3.13.
II. Get $e(\tau_c, x_i) = y_i - \hat{QR}_{y_i|x_i}(\tau_c)$.
III. Get $\hat\xi$, $\hat\theta$, and $\hat\gamma$ by optimizing 3.14, and $\hat{p}_0(\tau_c, n) = \frac{1}{n}\sum_{i=1}^{n} I[e(\tau_c, x_i) > 0]$.
IV. For 0 < τ < 1, estimate the τ quantile as
$$\hat{Q}_{Y|x}(\tau) = \hat{QR}_{Y|x}(\tau_c) + \frac{\hat\theta + x^\top\hat\gamma}{\hat\xi}\left[\Big(\frac{1-\tau}{\hat{p}_0(\tau_c,n)}\Big)^{-\hat\xi} - 1\right].$$
V. Set τ_c = τ_c + δ such that 0 < τ_c < 1 and repeat steps I–IV.
end: Stop when τ_c ≈ 0.95.
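Algorithm 3 differs from algorithm 2 only in the free intermediate level τ_c and in the exceedance probability p̂₀ entering the extrapolation of step IV. Below is a sketch for a single covariate, reusing gp_negloglik and the quantreg package from the sketch after algorithm 2 (qr_gp_ls is our name):

    # Algorithm 3: QR threshold at a general tau_c, GP fit to the positive
    # residuals, extrapolation with hat p_0 as in (3.15)
    qr_gp_ls <- function(x, y, tau, tau_c = 0.5) {
      fit  <- rq(y ~ x, tau = tau_c)                 # steps I-II
      e    <- y - fitted(fit)
      keep <- e > 0
      p0   <- mean(keep)                             # hat p_0(tau_c, n)
      par  <- optim(c(1, 0, 0.5), gp_negloglik,      # step III
                    x = x[keep], e = e[keep])$par
      fitted(fit) +                                  # step IV
        (par[1] + par[2] * x) / par[3] * (((1 - tau) / p0)^(-par[3]) - 1)
    }

Looping this over a grid of τ_c values implements the outer loop of steps start–end.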
3.3.2 Discussion: Threshold Selection for Heavy Tailed Distributions

According to proposition 3.2.1, if the underlying distribution is GP, QR adjusted GP with τ_c = 1/n leads to a consistent quantile estimator at quantile levels τ close to 1. Here we propose an outline for generalizing our method when the underlying distribution is in the heavy-tail domain of attraction. Let Z_1, ..., Z_n be an independent and identically distributed sample with common distribution function F ∈ D(G_ξ). Then, given Z_{(n−k,n)}, the order statistic differences

$$(\tilde{Z}_1, \ldots, \tilde{Z}_k) = \big(Z_{(n-k+1,n)} - Z_{(n-k,n)},\, \ldots,\, Z_{(n,n)} - Z_{(n-k,n)}\big) \qquad (3.16)$$

follow approximately a GP distribution with tail index ξ. Using theorem 2.3.2, we can find the parameters of the GP by maximizing the log-likelihood:

$$(\hat\sigma, \hat\xi) = \mathop{\mathrm{argmin}}_{\sigma,\,\xi} \left[ \sum_{i:\,\tilde{z}_i>0} \log(\sigma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\,\tilde{z}_i>0} \log\Big(1+\frac{\xi\,\tilde{z}_i}{\sigma}\Big) \right].$$

In Theorem 3.4.2 of de Haan & Ferreira (2010), it has been shown that the $\tilde{z}_i$ admit an approximation of the form

$$\tilde{z}_i = c_\xi\,(k/n)^{-\xi} + d\,\frac{w_i}{\sqrt{k}\,(k/n)^{\xi}},$$

where W = (w_1, ..., w_k) follows a multivariate normal distribution. This allowed them to prove

$$\sqrt{k}\,\Big(\big(\hat\xi,\; \hat\sigma/(k/n)^{-\xi}\big) - (\xi, 1)\Big) \Longrightarrow N(\mu, \Sigma).$$

Under the same set of assumptions, in Theorem 3.1 of Diebolt et al. (2007), it has been shown that for

$$\hat{Q}_Z(\tau) = z_{(n-k,n)} + \frac{\hat\sigma}{\hat\xi}\left[\Big(\frac{1-\tau}{k/n}\Big)^{-\hat\xi} - 1\right], \qquad Q_Z(\tau) = F_Z^{-1}(\tau),$$

we have

$$\sqrt{k}\,(k/n)^{\xi}\,\big(\hat{Q}_Z(\tau) - Q_Z(\tau)\big) \Longrightarrow N(m, s^2).$$

For a regression model of the form

$$Y_i = \underbrace{\phi + x_i^\top\rho}_{\tilde{x}_i^\top\tilde\rho} + \underbrace{(\theta + x_i^\top\gamma)}_{\tilde{x}_i^\top\tilde\gamma}\,\epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} F_\epsilon \in D(G_\xi),$$

we consider the residuals

$$E_i = Y_i - \hat{Q}_{Y|x_i}\Big(1-\frac{k}{n}\Big),$$

where k is chosen to ensure that, given E_{(n−k,n)}, the differences (E_{(n−k+1,n)} − E_{(n−k,n)}, ..., E_{(n,n)} − E_{(n−k,n)}) are approximately independent and identically distributed as GP with tail index ξ. By Theorem 5.1 in Chernozhukov (2005),

$$\sqrt{k}\,(k/n)^{\xi}\,\Big(\hat{Q}_{Y|x_i}(1-k/n) - \tilde{x}_i^\top\tilde\rho - \tilde{x}_i^\top\tilde\gamma\,F_\epsilon^{-1}(1-k/n)\Big) \Longrightarrow N(0, \sigma_X^2).$$

Therefore,

$$E_i = \tilde{x}_i^\top\tilde\gamma\,\big(\epsilon_i - F_\epsilon^{-1}(1-k/n)\big) + O_P\Big(\frac{1}{\sqrt{k}}\Big(\frac{n}{k}\Big)^{\xi}\Big).$$

Applying Theorem 5.1 in Chernozhukov (2005) with covariate x = 1 (i.e., only the constant term), we get

$$\sqrt{k}\,(k/n)^{\xi}\,\big(\epsilon_{(n-k,n)} - F_\epsilon^{-1}(1-k/n)\big) \Longrightarrow N(0, \sigma_\epsilon^2).$$

By assumption (A1), $\tilde{x}_i^\top\tilde\gamma \le M\max_j|\gamma_j| = M_{\tilde\gamma}$; therefore

$$E_i = \tilde{x}_i^\top\tilde\gamma\,\big(\epsilon_i - \epsilon_{(n-k,n)}\big) + O_P\Big(\frac{1}{\sqrt{k}}\Big(\frac{n}{k}\Big)^{\xi}\Big).$$

Let i_0, i_1, ..., i_k be the indices corresponding to ε_{(n−k,n)}, ε_{(n−k+1,n)}, ..., ε_{(n,n)}. Since ε_{(n−k+1,n)} − ε_{(n−k,n)}, ..., ε_{(n,n)} − ε_{(n−k,n)} are iid observations from a GP distribution, this implies that E_{i_1}, ..., E_{i_k} are iid observations from a GP distribution with scale $\tilde{x}_i^\top\tilde\gamma$ and tail index ξ, and

$$S = \{i : E_i > 0\} \approx \{i_1, \ldots, i_k\}.$$

Therefore, we can fit a GP distribution to the residuals and estimate the parameters as

$$(\hat\gamma, \hat\xi) = \mathop{\mathrm{argmin}}_{\gamma,\,\xi} \left[ \sum_{i:\,e_i>0} \log(\tilde{x}_i^\top\gamma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\,e_i>0} \log\Big(1+\frac{\xi\,e_i}{\tilde{x}_i^\top\gamma}\Big) \right].$$

Note that the e_i closely share the asymptotic behavior of the z̃_i defined in (3.16), i.e., $e_i = O_P\big(1/(\sqrt{k}\,(k/n)^{\xi})\big)$. However, the proof of asymptotic normality of the e_i is too involved and is thereby omitted in this dissertation. Mimicking the steps of Theorem 3.4.2 in de Haan & Ferreira (2010), the rates of convergence of the tail index ξ̂ and the scale coefficient γ̂ can be determined. Similarly, mimicking the steps of Theorem 3.1 in Diebolt et al. (2007), one can determine the convergence rate of Q̂_{Y|X=x}(τ). For the nonparametric regime, we explore the performance of our method from a numerical standpoint; intricate details on the exact convergence rates of the tail index and the quantiles from a theoretical standpoint have been avoided. However, the above formulation does give us insight into the choice of τ_c. Note that we vary τ_c = 1 − k/n, k = 1, ..., n, and choose k such that the estimation of ξ stabilizes.
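This stability criterion is easy to operationalize: scan a grid of τ_c values, record ξ̂ for each, and look for a flat region of the curve, as in figures 3.5 and 3.6 below. A sketch, reusing rq and gp_negloglik from the earlier sketches and assuming a data set x, y in the workspace:

    # Tail-index stability scan over tau_c = 1 - k/n; a flat stretch of the
    # curve suggests a workable threshold
    tau_grid <- seq(0.05, 0.90, by = 0.05)
    xi_hat <- sapply(tau_grid, function(tc) {
      fit  <- rq(y ~ x, tau = tc)
      e    <- y - fitted(fit)
      keep <- e > 0
      optim(c(1, 0, 0.5), gp_negloglik, x = x[keep], e = e[keep])$par[3]
    })
    plot(tau_grid, xi_hat, type = "b",
         xlab = "tau_c = 1 - k/n", ylab = "estimated tail index")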
Figure 3.5 displays the tail index estimates for Pareto-distributed errors, obtained both by applying the GP fit directly to the errors and by applying the proposed QR.GP method to the regression outputs y_i. Note that the plot stabilizes from τ_c ≈ 0 onwards. Indeed, the tail index estimates based on our approach for the regression model closely agree with the estimates obtained when the GP is fit directly to the errors, and the estimate of the tail index agrees with the true value.

[Figure 3.5: Tail index estimates with changing values of τ_c, for Pareto errors. The red line corresponds to the estimate of ξ when the GP is fit to the errors ε_i; the black line corresponds to the estimate when our proposed method is fit to the y_i. The true tail index is 0.8.]

Figure 3.6 displays the tail index estimates for t-distributed and Burr-distributed errors, again applying the GP fit to the errors and the proposed QR.GP method to the regression outputs y_i. For t-distributed errors, the plot stabilizes for τ_c between 0.75 and 0.9; thus one can choose 1 − k/n ≈ 0.75. For Burr-distributed errors, the plot stabilizes for τ_c between 0.4 and 0.6; thus one can choose 1 − k/n ≈ 0.4. Note that the regions of stabilization agree completely with the case where a GP model is fit to the original errors. Therefore, this approach gives an alternative to GP modeling when both the location and the scale are functions of the covariate.

[Figure 3.6: Tail index estimates with changing values of τ_c. The red line corresponds to the estimate of ξ when the GP is fit to the errors ε_i; the black line corresponds to the estimate when our proposed method is fit to the y_i. The true tail index is 0.8. Left: t-distributed errors. Right: Burr-distributed errors.]

3.4 Simulation Study: Nonparametric Problem

In section 3.2.2, our focus was on the cases ε ∼ GP(ξ) for ξ ≥ 0, where we assessed the efficacy of our method for the higher quantile levels 0.9 ≤ τ ≤ 0.99. In section 3.3, we proposed a guideline for extending our method to distributions in the heavy-tail domain of attraction, and we saw that selecting thresholds of the form τ_c = 1 − k/n leads to consistent estimation of the tail index ξ. The value of k is determined by the nature of the underlying distribution. Let ξ̂ = ξ̂(k) be the estimated tail index for threshold τ_c = 1 − k/n. We have demonstrated through numerical simulations that when we select k such that ξ̂(k) is stable, we predict ξ accurately.

The estimation of the tail index is an important problem in the extreme value literature, and several alternative approaches exist, namely the method of moments, the Hill estimator, the Pickands estimator, etc. (Haan & Ferreira (2006)). Hill (1975) proposed a nonparametric estimator for ξ > 0 using the upper order statistics. Suppose that Z_1, ..., Z_n are n iid random variables with unknown distribution F. An equivalent condition to theorem 2.3.1 says that F ∈ D(G_ξ) for some ξ > 0 if and only if

$$\lim_{u\to\infty} \frac{\bar{F}(uz)}{\bar{F}(u)} = z^{-1/\xi}, \qquad z > 1, \qquad (3.17)$$

i.e., as u → ∞, the conditional excess P(Z > uz | Z > u) behaves like a Pareto tail: for large u, the scaled excess Z/u conditioned on Z > u is approximately Pareto with tail index ξ.
Theorem 1.2.2 in Haan & Ferreira (2006) gives an equivalent form of the condition in 3.17:

$$\lim_{u\to\infty} \frac{\int_u^\infty \big(1 - F(z)\big)\,\frac{dz}{z}}{1 - F(u)} = \xi.$$

Further simplification leads to

$$\lim_{u\to\infty} \frac{\int_u^\infty \big(\log z - \log u\big)\,dF(z)}{1 - F(u)} = \xi.$$

Let Z_{(i)} be the i-th ordered observation, with Z_{(n)} = max(Z_1, ..., Z_n). Replacing u by Z_{(n−k)} and F by the empirical distribution function, the Hill estimator of ξ based on the k upper order statistics Z_{(n−k+1)}, ..., Z_{(n)} is

$$\hat\xi_H(k) = \frac{1}{k} \sum_{i=1}^{k} \log\Big[\frac{Z_{(n-i+1)}}{Z_{(n-k)}}\Big]. \qquad (3.18)$$

Let k = k(n) be such that k(n) → ∞, k(n)/n → 0, and k(n+1)/k(n) → 1 as n → ∞. Then, by theorem 3.2.4 in Haan & Ferreira (2006), ξ̂_H(k) → ξ in probability. More details about the derivation of the Hill estimator and its asymptotic properties can be found in Haan & Ferreira (2006) and Hill (1975). Due to the nonparametric nature of the estimator, this method can be easily adapted as long as the underlying distribution is heavy tailed. However, different choices of k lead to different values of ξ̂_H(k); hence, it is crucial to choose k appropriately for consistent estimation of ξ. Wang et al. (2012) combined quantile regression and the Hill estimator to estimate conditional quantiles from a heavy-tailed distribution. We outline their method of estimating higher quantiles in the following section.

3.4.1 Review of an Existing Method for Extreme Quantile Estimation

In Wang et al. (2012), the authors have proposed methods for regression models with quantile functions Q_{Y|x}(τ) = α(τ) + x^T β(τ), where Y | x ∼ D(G_ξ) for some ξ > 0. Let 0 < τ_c < 1 be a fixed constant, close to 1. The authors assume the linear quantile function only for the higher quantiles τ ∈ [τ_c, 1]:

$$Q_{Y|x}(\tau) = \alpha(\tau) + x^\top\beta(\tau) \qquad \text{for } \tau \in [\tau_c, 1];$$

Q_{Y|x}(τ) has no specific form for τ ∈ (0, τ_c). Let τ_c < τ_j = j/(n+1) be a sequence of quantile levels, where j = n − k, ..., m and m < n is an integer close to n. Suppose [a] denotes the integer part of a. The authors assume m = n − [n^η], with η > 0 a small constant such that n^η < k. They also assume that, as n → ∞, k = k(n) → ∞ and k(n)/n → 0. Consider the sequence of quantile levels τ_c < τ_{n−k} < ... < τ_m < 1. The authors use quantile regression to obtain a reasonable quantile estimate at the intermediate quantile level τ_j for j = n − k, ..., m. Let

$$\big(\hat\alpha(\tau_j), \hat\beta(\tau_j)\big) = \mathop{\mathrm{argmin}}_{\alpha,\,\beta} \sum_{i=1}^{n} \rho_{\tau_j}\big(y_i - \alpha - x_i^\top\beta\big), \qquad q_j = \hat\alpha(\tau_j) + x^\top\hat\beta(\tau_j)$$

be the QR estimate at τ_j. Then one might consider q_j as an upper order statistic of a sample from F_{Y|x}; the authors use q_j as a replacement for Y_{(j)} in equation 3.18. Using these upper quantiles, they proposed to estimate the tail index ξ as

$$\hat\xi_H(k, n) = \hat\xi = \frac{1}{k - [n^\eta]} \sum_{j=[n^\eta]}^{k} \log\frac{q_{n-j}}{q_{n-k}}. \qquad (3.19)$$

Let U be the tail quantile function, i.e.,

$$U(t) = \inf\{z : F(z) \ge 1 - 1/t\}, \qquad t \in [1, \infty),$$

so that U(t) is the (1 − 1/t) quantile. When F ∈ D(G_ξ) and ξ > 0, we have, for z > 0,

$$\lim_{t\to\infty} \frac{U(tz)}{U(t)} = z^{\xi}.$$

Let 1 − 1/t = τ_{n−k} = 1 − (k+1)/(n+1) and 1 − 1/(tz) = τ_n, so that τ_n → 1 as n → ∞. Then, for large n,

$$\frac{U(tz)}{U(t)} = \frac{Q(\tau_n)}{Q(\tau_{n-k})} \approx \Big(\frac{1-\tau_{n-k}}{1-\tau_n}\Big)^{\xi}. \qquad (3.20)$$

Combining 3.19 and 3.20, the conditional quantile is estimated as

$$\hat{Q}_{Y|x}(\tau_n) = \Big(\frac{1-\tau_{n-k}}{1-\tau_n}\Big)^{\hat\xi}\, q_{n-k}. \qquad (3.21)$$

Before we get into our simulation model, we briefly discuss the optimal choice of the tail sample fraction, determined by k = k(n).
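The Hill estimator in 3.18 is itself only a few lines of R, and computing it across k makes explicit the dependence on the tail sample fraction taken up next; hill is our name for the function:

    # Hill estimator of the tail index from the k upper order statistics (3.18)
    hill <- function(z, k) {
      z <- sort(z)
      n <- length(z)
      mean(log(z[(n - k + 1):n] / z[n - k]))
    }

    # e.g. plotting sapply(10:500, function(k) hill(z, k)) against k gives the
    # usual Hill plot used to inspect the stability of the estimate in k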
As noted by Wang et al. (2012), the choice of k is a very important problem in statistical applications of extreme value theory: by choosing k optimally, we identify the tail sample fraction that carries the most relevant information on the tail behavior. Garrido & Lezaud (2013) explained the influence of k on ξ̂_H(k). For small values of k, the variance of the estimate is large, while increasing k leads to higher bias; the trade-off between the bias and the variance is struck at some intermediate value of k. We would like to see how the method proposed by Wang et al. (2012) (QR.Hill) behaves for different values of k, and we discuss the choices of k and η further during our numerical investigation.

Wang et al. (2012) have also proposed a variation of their method for β(τ) = β, i.e., when the slopes are constant at the higher quantiles. That is a special case of our simulation model 3.10 with γ = 0; we restrict ourselves to the general case γ ≠ 0. In section 3.4.2, we compare our proposed QR.GP method with QR.Hill for heavy-tailed data at higher quantiles. The QR.Hill procedure is outlined in algorithm 4, followed by an R sketch; we use algorithm 3 to implement QR.GP.

Algorithm 4: QR.Hill
start: Set 0 < τ_c ≈ 1; fix x, δ > 0.
I. Set k = n − [nτ_c] and m = n − [n^η]. Select the upper quantile levels τ_j = j/(n+1) for j = n − k, ..., m. Get $q_j = \hat{QR}_{Y|x}(\tau_j) = \hat\alpha(\tau_j) + x^\top\hat\beta(\tau_j)$.
II. Estimate the tail index ξ as
$$\hat\xi = \frac{1}{k - [n^\eta]} \sum_{j=[n^\eta]}^{k} \log\frac{q_{n-j}}{q_{n-k}}.$$
III. Estimate the τ quantile as
$$\hat{Q}_{Y|x}(\tau) = \Big(\frac{1-\tau_{n-k}}{1-\tau}\Big)^{\hat\xi}\, q_{n-k}.$$
IV. Set τ_c = τ_c − δ such that 0 < τ_c < 1 and repeat steps I–III.
end: Stop when τ_c ≈ 0.01.
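A compact R sketch of algorithm 4 for a single covariate, evaluated at a fixed covariate value x0, is given below; qr_hill is our name, and η = 0.1 matches the recommendation of Wang et al. (2012) used in the next section. The rq call fits all intermediate quantile levels at once.

    library(quantreg)

    # QR.Hill (Wang et al. 2012): Hill estimate from QR quantiles, then extrapolation
    qr_hill <- function(x, y, tau, x0, tau_c = 0.9, eta = 0.1) {
      n    <- length(y)
      k    <- n - floor(n * tau_c)
      m0   <- floor(n^eta)                      # [n^eta]
      taus <- ((n - k):(n - m0)) / (n + 1)      # tau_j, j = n-k, ..., m
      fit  <- rq(y ~ x, tau = taus)
      q    <- drop(cbind(1, x0) %*% coef(fit))  # q_j at x0
      xi   <- mean(log(q[-1] / q[1]))           # tail index as in (3.19)
      q[1] * ((1 - taus[1]) / (1 - tau))^xi     # extrapolation as in (3.21)
    }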
3.4.2 Simulation Study

We simulate from the following model:

$$Y = \phi + X^\top\rho + (\theta + X^\top\gamma)\,\epsilon, \qquad \epsilon \sim F \in D(G_\xi),\; \xi > 0. \qquad (3.22)$$

This is the same model as 3.10; we now explore different distributions in the domain of attraction of the Fréchet distribution. The conditional τ quantile of 3.22 is given by

$$Q_{Y|x}(\tau) = \phi + x^\top\rho + (\theta + x^\top\gamma)\,Q_\epsilon(\tau) \qquad (3.23)$$

for all 0 < τ < 1. Hence the true quantile function 3.23 is linear in x. Moreover, both the intercept α(τ) = φ + θQ_ε(τ) and the slope β(τ) = ρ + γQ_ε(τ) vary with τ. Thus, we satisfy model assumption 2.10 in Wang et al. (2012) for any τ ∈ (0, 1). We simulate iid observations (x_i, y_i), i = 1, ..., n from 3.22 as

$$y_i = 1 + x_{1i} + x_{2i} + (1 + 0.9\,x_{1i})\,e_i,$$

with φ = θ = 1, ρ = (1, 1), and γ = (0.9, 0). We simulate x_{ij} ∼ U(−1, 1) for j = 1, 2 and e_i ∼ F. We choose three example distributions F in the Fréchet domain: Pareto, absolute t, and Burr; see Table B.1 in the appendix for a list of well-known distributions in the heavy-tail domain. The true τ quantile at x is

$$Q_{Y|x}(\tau) = 1 + x_1 + x_2 + (1 + 0.9\,x_1)\,Q_\epsilon(\tau). \qquad (3.24)$$

Pareto: We start our analysis with the standard Pareto distribution Pa(α), which is a special case of the standard GP distribution: if Y ∼ Pa(α) and Y* = α(Y − 1), then Y* ∼ GP(µ = 0, σ = 1, ξ = 1/α). Pa(α) has tail index ξ = 1/α, and the tails of the GP and Pareto behave very similarly. In section 3.2.2 we looked at quantiles up to 0.99 from the GP with ξ = 0.5 and 0.8; now we focus on the four extreme quantile levels τ ∈ (0.985, 0.991, 0.995, 0.998). We compare our method QR.GP with QR.Hill for different values of the tuning parameter τ_c = 1 − k/n, k = 1, ..., n, chosen from the grid τ_c ∈ (0.01, 0.05, ..., 0.95). When we look into extreme quantiles, the data is even more sparse, so we choose sample sizes n ≥ 1000 in our simulations. Our Monte Carlo sample size is K = 500 for all of the cases. We compare QR.GP (algorithm 3) with QR.Hill (algorithm 4) for different choices of τ_c, and we choose m = n − [n^{0.1}] for QR.Hill, as per the recommendation of Wang et al. (2012).

[Figure 3.7: mse ± se for Pareto(0.5) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 1000 and 0 < τ_c < 1. Panels: (a) τ = 0.985, (b) τ = 0.991, (c) τ = 0.995, (d) τ = 0.998.]

Figure 3.7 summarises the mse of the quantile estimates at x = (0.5, 0.5) for sample size n = 1000 and tail index ξ = 0.5. Clearly, the optimal threshold for QR.GP is τ_c = 0.01, consistent with our findings in section 3.2.2. We can see that QR.GP outperforms QR.Hill regardless of the choice of threshold at the τ = 0.985, 0.991, and 0.995 quantiles. For QR.Hill, the optimal τ_c is between 0.8 and 0.9. Now we increase the sample size to 2000 and compare QR.Hill with QR.GP at τ = 0.998. In figure 3.8 (b) we can see that QR.GP has lower mse than QR.Hill at all threshold values: with increasing sample size, QR.GP can outperform QR.Hill even at quantile level τ = 0.998.

[Figure 3.8: mse ± se for Pareto(0.5) at x = (0.5, 0.5) at quantile τ = 0.998 and sample sizes (a) n = 1000, (b) n = 2000.]

We now look into the case ξ = 0.8. We choose sample size 2000, as higher values of ξ are associated with even more data sparsity.

[Figure 3.9: mse ± se for Pareto(0.8) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 2000 and 0 < τ_c < 1. Panels: (a) τ = 0.985, (b) τ = 0.991, (c) τ = 0.995, (d) τ = 0.998.]

From figures 3.9 (a), (b), and (c) we can clearly see that QR.GP is uniformly better than QR.Hill at all but the quantile τ = 0.998. In fact, we can see in figure 3.11 that we achieve uniform dominance at quantile τ = 0.998 for n = 5000. For τ = 0.985, QR.GP remains uniformly better than QR.Hill even when we increase the sample size from 2000 to 5000; we can see this in figure 3.10.

[Figure 3.10: mse ± se for Pareto(0.8) at x = (0.5, 0.5) at quantile τ = 0.985 and sample sizes (a) n = 2000, (b) n = 5000.]

[Figure 3.11: mse ± se for Pareto(0.8) at x = (0.5, 0.5) at quantile τ = 0.998 and sample sizes (a) n = 2000, (b) n = 5000.]

Absolute t: Next we simulate ε from the absolute t distribution. The tail index for the t distribution is ξ = 1/ν, where ν is the degrees of freedom; we choose ξ = 0.5 and 0.8. Note that now the underlying distribution is asymptotically GP with tail index ξ. Here we explore the optimal choice of τ_c in the case of the absolute t distribution. From our numerical investigations, we have seen that τ_c around 0.75 works best with QR.GP; for QR.Hill, the optimum is 0.7. In figure 3.12 we demonstrate their performance at their corresponding optima across the quantiles 0.9 ≤ τ ≤ 0.99 for sample sizes n = 1000 and 2000.
[Figure 3.12: mse ± se for absolute t with ξ = 0.5 at x = (0.5, 0.5) and quantiles 0.9 ≤ τ ≤ 0.991. Sample sizes: (a) n = 1000, (b) n = 2000.]

We can see in figure 3.12 (a) that both QR.GP and QR.Hill are consistent in the sense that they have low mse when τ ≤ 0.95. For n = 1000, QR.Hill outperforms QR.GP at τ > 0.95; when we increase the sample size to n = 2000, QR.Hill cannot outperform QR.GP at any quantile level between 0.9 and 0.991. Next we move to the higher quantiles. Figure 3.13 illustrates QR.GP and QR.Hill across different choices of τ_c for τ = 0.985 and 0.991.

[Figure 3.13: mse ± se for absolute t with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000, and 5000.]

[Figure 3.14: mse ± se for absolute t with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000, and 5000.]

[Figure 3.15: mse ± se for absolute t with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000, and 5000.]

[Figure 3.16: mse ± se for absolute t with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000, and 5000.]

[Figure 3.17: mse ± se for absolute t with ξ = 0.8 at x = (0.5, 0.5) and quantiles 0.9 ≤ τ ≤ 0.991. Sample sizes: (a) n = 2000, (b) n = 5000.]

We achieve uniform dominance at quantile 0.985 with sample size n = 2000, and at quantile 0.991 with n = 5000.
In figure 3.14 we can see that for τ = 0.995 and 0.998, QR.GP is comparable to QR.Hill. An important observation is that QR.GP is more robust to the choice of τ_c than QR.Hill: a small deviation from the optimal τ_c results in large fluctuations in the mse of QR.Hill, whereas our mse is bounded and converges to 0 as n increases, for all τ_c ∈ (0, 1).

Next we want to see how changing values of ξ affect our predictor, so we increase ξ to 0.8. From our numerical investigations, we saw that τ_c = 0.4 is optimal for both QR.GP and QR.Hill. Figure 3.17 shows the mse at the optimal value τ_c = 0.4 for quantiles 0.9 < τ < 0.99; QR.GP does a better job at the higher values of τ. Note that for QR.Hill the optimal τ_c is 0.7 when ξ = 0.5, while for ξ = 0.8 the optimum is τ_c = 0.4. For QR.GP, the optimal threshold τ_c is between 0.4 and 0.5. We plot the mse for different choices of τ_c at quantiles τ = 0.985 and 0.991 in figure 3.15; any τ_c between 0.4 and 0.6 works well for QR.GP. Figure 3.16 shows the large sample behavior of the quantile estimates at quantiles τ = 0.995 and 0.998: with increasing sample size, QR.GP approximates the tail well for thresholds 0.4 ≤ τ_c < 1.

Burr: We simulate from the Burr distribution with tail index 0.5. The results for quantiles τ = 0.985 and 0.991 are summarised in figure 3.18. The optimal τ_c for QR.GP is close to 0.5, and any τ_c > 0.5 works well when the sample size is 5000; for QR.Hill, the optimal τ_c is around 0.6. In the case of |t(0.5)| we saw in figure 3.13 that, with increasing sample size, QR.GP can approximate the 0.985 quantile for all τ_c ∈ (0, 1); we notice the same characteristic here. Moreover, with increasing sample size, QR.GP and QR.Hill are comparable at their optima. For quantiles 0.991 and 0.995, we can see from figure 3.19 that QR.GP is comparable with QR.Hill at their optima. Now we simulate from Burr with tail index 0.8. From figure 3.20 we can see that the optimal τ_c is 0.5 for QR.GP, whereas the optimal τ_c for QR.Hill is 0.7. For the higher quantiles τ = 0.995 and 0.998, we can see in figure 3.21 that QR.GP and QR.Hill are comparable at their corresponding optima.

[Figure 3.18: mse ± se for Burr with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000, and 5000.]
[Figure 3.19: mse ± se for Burr with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000, and 5000.]

[Figure 3.20: mse ± se for Burr with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000, and 5000.]

[Figure 3.21: mse ± se for Burr with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000, and 5000.]

3.5 Application to precipitation downscaling

According to the National Oceanic and Atmospheric Administration (NOAA), "Statistical downscaling encompasses the use of various statistics-based techniques to determine relationships between large-scale climate patterns resolved by global climate models and observed local climate responses." Statistical downscaling techniques aim to quantify the relationship between global-scale climate model outputs and local-scale climate variables, and are useful for making predictions about local climate variables like precipitation, temperature, wind speed and direction, air quality, etc. In this section, we apply our proposed method to predict extreme quantiles of precipitation. We are not looking for the best set of predictors from the global-scale model outputs to explain regional precipitation; instead, we use relevant variables, such as sea-level pressure and specific humidity, which contain the necessary information, as our predictors. Using this pre-determined set of covariates, we assess the predictive performance of three different methods, QR, QR.GP, and QR.Hill, in the context of the estimation of high conditional quantiles.

Cannon (2011) used historic precipitation data obtained from the Vancouver International Airport (WMO station 71892, 49°12′N, 123°10.8′W) weather station to demonstrate an application of their proposed method, quantile regression with neural networks. This dataset is publicly available with the R package 'qrnn' and consists of 30 years of daily precipitation records, 1971-2000.
From figure 3.22, we can see that the distribution of the precipitation is unimodal and skewed. Moreover, out of the total 10958 days, there are 5978 dry days, for which precipitation is recorded as 0 mm. We did not encounter any missing values in this dataset. The maximum precipitation is 89.4 mm, while less than 1% of the observations are above 31 mm.

[Figure 3.22: Daily precipitation at the Vancouver International Airport, 1971-2000.]

We use three global climate model outputs as covariates: daily sea-level pressure, 500-hPa geopotential height, and 700-hPa specific humidity. The predictor variables are obtained from the NCEP/NCAR reanalysis project by Kalnay et al. (1996) at the nearest grid point, 50°N, 122.5°W. We also include the sine and cosine of the day of the year as predictors. We standardize each of these variables before the analysis.

In real data, we do not know the true quantiles, so to compare the predictive performance of a quantile estimation method we need a reference quantile estimate. Following the suggestion of Cannon (2011), we use monthly sample quantiles as our reference. Given a quantile level τ, let Q_{Y(m,i)}(τ) be the monthly sample quantile for day i of month m; that is, Q_{Y(m,i)}(τ) is the sample τ quantile based on the observations from month m, and we assume the same value for all the days of month m.

Our aim here is to compare the performance of three different quantile estimators: QR, QR.GP, and QR.Hill. For all three methods, we use the same set of covariates described earlier in this section. We can fit QR directly, as no threshold selection is involved. For both QR.Hill and QR.GP, we need to choose a threshold. Using a grid of threshold values τ_c in (0.01, ..., 0.8), we fit QR.GP to the data following algorithm 3. In figure 3.23, we plot the estimated tail index against the thresholds. The tail index estimate stabilizes around τ_c ≈ 0.7; we choose τ_c = 0.68 (red dotted line in figure 3.23), for which ξ̂ = 0.18. Next, we fit QR.Hill to the data using four different threshold values, τ_c = 0.8, 0.85, 0.9, and 0.95.

[Figure 3.23: Estimated tail index (ξ̂) at different choices of τ_c ∈ (0, 1). Method of estimation is QR.GP.]

To assess the goodness of fit, we look at the mean square prediction error at the τ quantile,

$$MSPE(\tau) = \frac{1}{n} \sum_{i=1}^{n} \Big(\hat{Q}_{Y|x_i}(\tau) - Q_{Y(m,i)}(\tau)\Big)^2,$$

where Q̂_{Y|x_i}(τ) is the predicted conditional τ quantile given x_i and Q_{Y(m,i)}(τ) is the monthly sample quantile for day i. Note that for both QR.GP and QR.Hill, we can only predict quantiles τ above the threshold τ_c.
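The MSPE is straightforward to compute once each method's predictions are in hand. A sketch for a single level tau, assuming vectors q_hat (the predicted τ quantiles, one per day), y (daily precipitation), and a year-month grouping factor ym; all of these names are ours:

    # Monthly sample quantile as the reference, replicated across the days of
    # each month, then MSPE(tau) as defined above
    q_ref <- ave(y, ym, FUN = function(v) quantile(v, probs = tau))
    mspe  <- mean((q_hat - q_ref)^2)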
On the other hand, the predictive performance of QR.Hill is very sensitive to the choice of τc: higher thresholds tend to work better at higher quantile levels. For QR.Hill, we cannot agree upon a universal threshold value that gives the lowest MSPE at all quantile levels above 0.9.

Implementation of the QR.Hill method in practice is challenging. First, for a given value of the predictor x, Wang et al. (2012) proposed choosing τc such that ξ̂|x is stable. This makes the choice of the tail index dependent on the covariate; thus, we cannot choose a single optimal τc for all x. With QR.GP, by contrast, we can still select a covariate-adaptive threshold for a fixed τc, since we use QR(τc|x) to estimate the threshold. Second, owing to the nature of QR.Hill, one has to compute QR estimates at n(1 − τc) quantile levels. This is computationally demanding, particularly when τc is low or the sample size n is high; for this data, with a sample size of n = 10958, implementing QR.Hill is very time consuming. In comparison, we compute QR only once for each choice of τc.

[Figure: MSPE against τ for qr, qr.gp (τc = 0.7), and qr.hill (τc = 0.8, 0.85, 0.9, 0.95).]

Figure 3.24: Mean square prediction error (MSPE) of different quantile estimates at quantile levels 0.9 < τ < 0.995. Thresholds: 0.7 for QR.GP (blue); 0.8, 0.85, 0.9, and 0.95 for QR.Hill (dotted lines).

CHAPTER 4

DISCUSSIONS AND FUTURE DIRECTIONS

Estimation of extreme conditional quantiles is of prime interest in many fields of science. Quantile regression is useful for quantifying the effects of covariates at different quantile levels. When the underlying distribution is heavy-tailed in nature, quantile regression estimates suffer from high variability, particularly at higher quantiles. In this dissertation, we have developed methods tailored to estimating high conditional quantiles for heavy-tailed distributions. We have used the GP distribution to model the tail behavior. Selection of an appropriate threshold is necessary for fitting a GP model. The threshold selection problem is itself challenging: a higher threshold leads to high variance, while a lower threshold is associated with higher bias. The efficiency of the GP model is severely compromised when the threshold itself varies as a function of the covariate. The common practice is to use an empirical quantile of the response variable as the threshold, which is not covariate adaptive. Combining quantile regression with extreme value theory, we have proposed a novel method (QR.GP) of selecting a covariate-adaptive threshold, which in turn produces a consistent estimator of higher quantiles. We establish desirable asymptotic properties, namely consistency and asymptotic normality of QR.GP, when the distribution belongs to the GP family. We have shown that, with proper thresholding, QR.GP can be adapted to a wide class of distributions in the heavy-tailed domain of attraction. Although there are relatively few works that address the estimation of high conditional quantiles for heavy-tailed distributions, the one by Wang et al. (2012) presents a very promising approach. We have compared the QR.GP approach, which is fully parametric, with that of Wang et al. (2012). Our findings show that when the underlying distribution is Pareto or Generalized Pareto, QR.GP provides consistent estimation of higher quantiles and has lower mean square error than Wang et al. (2012).
From our numerical investigations, we show that the appropriate threshold for QR.GP differs across distributions in the heavy-tailed domain. Nonetheless, we have proposed a guideline for choosing the threshold when implementing QR.GP in practice. Compared to Wang et al. (2012), our method is more robust, computationally much faster, and easier to interpret and apply to real data.

In our work, we have focused on linear quantile functions. As pointed out by Wang & Li (2013), the linearity assumption is restrictive. Wang & Li (2013) used the Box-Cox power transformation, under which the quantiles of the transformed response variable are linear in the covariates. An extension of this work could be to model a power transformation of the response variable with the GP distribution. Our method features a constant tail index; in the future, we can look at a possible extension where the tail index is also a function of the covariate. Establishing the asymptotic properties of our method is challenging when the underlying distribution is a heavy-tailed distribution other than the Pareto or GP. This is an open-ended problem, and one can explore further in this direction.

APPENDICES

APPENDIX A

APPENDIX FOR CHAPTER 2

Supplemental figures and tables:

Table A.1: φ = 0, exp(1): rel.bias (se) of quantile estimates at x = 0.5 and quantile levels τ.

τ     | n = 500: GP.scale | n = 500: QR | n = 1000: GP.scale | n = 1000: QR
0.1   | 0.04(0.03)        | 0.14(0.11)  | 0.03(0.03)         | 0.11(0.08)
0.3   | 0.04(0.03)        | 0.08(0.06)  | 0.03(0.03)         | 0.06(0.05)
0.5   | 0.04(0.03)        | 0.06(0.05)  | 0.03(0.02)         | 0.05(0.04)
0.7   | 0.04(0.03)        | 0.06(0.04)  | 0.03(0.02)         | 0.04(0.03)
0.9   | 0.04(0.03)        | 0.06(0.04)  | 0.03(0.02)         | 0.04(0.03)
0.93  | 0.04(0.03)        | 0.06(0.05)  | 0.03(0.02)         | 0.05(0.04)
0.95  | 0.04(0.03)        | 0.07(0.05)  | 0.03(0.02)         | 0.05(0.04)
0.97  | 0.04(0.03)        | 0.07(0.06)  | 0.03(0.03)         | 0.05(0.04)
0.99  | 0.05(0.04)        | 0.10(0.08)  | 0.04(0.03)         | 0.07(0.05)

Table A.2: φ = 0, exp(1): mse (se) of quantile estimates at x = 0.5 and quantile levels τ.

τ     | n = 500: GP.scale | n = 500: QR | n = 1000: GP.scale | n = 1000: QR
0.1   | 0(0)              | 0(0)        | 0(0)               | 0(0)
0.3   | 0(0)              | 0(0)        | 0(0)               | 0(0)
0.5   | 0(0)              | 0.01(0)     | 0(0)               | 0(0)
0.7   | 0.01(0)           | 0.01(0)     | 0(0)               | 0.01(0)
0.9   | 0.03(0)           | 0.06(0)     | 0.02(0)            | 0.03(0)
0.93  | 0.04(0)           | 0.09(0.01)  | 0.02(0)            | 0.05(0)
0.95  | 0.05(0)           | 0.12(0.01)  | 0.03(0)            | 0.07(0)
0.97  | 0.08(0.01)        | 0.21(0.01)  | 0.05(0)            | 0.12(0.01)
0.99  | 0.18(0.01)        | 0.71(0.05)  | 0.10(0.01)         | 0.36(0.02)

[Figure: mse against τ for gp.scale and qr; panels (a) all quantiles, (b) lower quantiles, (c) higher quantiles.]

Figure A.1: mse (se) for φ = 0, exp(1) at x = 0.5 and quantile levels τ. The sample size is n = 500.

APPENDIX B

APPENDIX FOR CHAPTER 3

Table B.1: Probability distributions in the heavy-tail domain.

Distribution   | 1 − F(y)                  | ξ    | Parameters          | F^{-1}(τ)
GP(µ, σ, ξ)    | (1 + ξ(y − µ)/σ)^{−1/ξ}   | ξ    | y > µ; σ, ξ > 0     | µ + (σ/ξ)[(1 − τ)^{−ξ} − 1]
GP(0, 1, ξ)    | (1 + ξy)^{−1/ξ}           | ξ    | y > 0; ξ > 0        | (1/ξ)[(1 − τ)^{−ξ} − 1]
exp(λ)         | exp(−λy)                  | 0    | y > 0; λ > 0        | −(1/λ) log(1 − τ)
Pa(α)          | y^{−α}                    | 1/α  | y > 1; α > 0        | (1 − τ)^{−1/α}
Frechet(α)     | 1 − exp(−y^{−α})          | 1/α  | y > 0; α > 0        | (− log τ)^{−1/α}
Burr(η, λ, ν)  | 1 − (η/(η + y^{−ν}))^{λ}  | 1/ν  | y > 0; η, λ, ν > 0  | [η(τ^{−1/λ} − 1)]^{−1/ν}
|T(n)|         | 2(1 − F_{t(n)}(y))        | 1/n  | y > 0; n > 0        | F^{-1}_{t(n)}((1 + τ)/2)
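To make the quantile functions in Table B.1 concrete, the short R sketch below implements them directly; the function names are ours, chosen for illustration, and inverse-transform sampling with these functions is one way to generate draws from the simulation designs used in Chapter 3.

    ## Quantile functions F^{-1}(tau) from Table B.1 (illustrative helpers).
    qgp_ <- function(tau, mu = 0, sigma = 1, xi = 0.5)      # GP(mu, sigma, xi)
      mu + (sigma / xi) * ((1 - tau)^(-xi) - 1)
    qpar_ <- function(tau, alpha) (1 - tau)^(-1 / alpha)    # Pa(alpha)
    qfre_ <- function(tau, alpha) (-log(tau))^(-1 / alpha)  # Frechet(alpha)
    qburr_ <- function(tau, eta, lambda, nu)                # Burr(eta, lambda, nu)
      (eta * (tau^(-1 / lambda) - 1))^(-1 / nu)
    qabst_ <- function(tau, n) qt((1 + tau) / 2, df = n)    # |T(n)|
    # exp(lambda) is already available in base R as qexp(tau, rate = lambda)

    ## Inverse-transform sampling, e.g. from a Burr with xi = 1/nu = 0.5:
    u <- runif(1000)
    yb <- qburr_(u, eta = 1, lambda = 1, nu = 2)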
BIBLIOGRAPHY

Abrevaya, J. 2001. The effect of demographics and maternal behavior on the distribution of birth outcomes. Empirical Economics 26. 247.

Beirlant, Jan, Goedele Dierckx, A Guillou & C Stărică. 2002. On exponential representations of log-spacings of extreme order statistics. Extremes 5(2). 157–180.

Bremnes, John Bjørnar. 2004. Probabilistic wind power forecasts using local quantile regression. Wind Energy: An International Journal for Progress and Applications in Wind Power Conversion Technology 7(1). 47–54.

Buchinsky, Moshe. 1994. Changes in the U.S. wage structure 1963–1987: Application of quantile regression. Econometrica 62(2). 405–458. http://www.jstor.org/stable/2951618.

Caeiro, Frederico & M Ivette Gomes. 2009. Semi-parametric second-order reduced-bias high quantile estimation. Test 18(2). 392–413.

Cannon, Alex J. 2011. Quantile regression neural networks: Implementation in R and application to precipitation downscaling. Computers and Geosciences 37(9). 1277–1284. doi:10.1016/j.cageo.2010.07.005.

del Castillo, Joan & Isabel Serra. 2015. Likelihood inference for generalized Pareto distribution. Computational Statistics & Data Analysis 83. 116–128. doi:10.1016/j.csda.2014.10.014. http://www.sciencedirect.com/science/article/pii/S0167947314003065.

Chavez-Demoulin, V. & A. Davison. 2005. Generalized additive modeling of sample extremes. Journal of the Royal Statistical Society Series C 54. 207–222. doi:10.1111/j.1467-9876.2005.00479.x.

Chernozhukov, Victor. 2005. Extremal quantile regression. Ann. Statist. 33(2). 806–839. doi:10.1214/009053604000001165. https://doi.org/10.1214/009053604000001165.

Chernozhukov, Victor & Songzi Du. 2006. Extremal quantiles and value-at-risk.

Coles, Stuart, Joanna Bawa, Lesley Trenner & Pat Dorazio. 2001. An introduction to statistical modeling of extreme values, vol. 208. Springer.

Daouia, Abdelaati, Laurent Gardes, Stéphane Girard & Alexandre Lekina. 2011. Kernel estimators of extreme level curves. Test 20(2). 311–333.

Davison, A. C. & R. L. Smith. 1990. Models for exceedances over high thresholds. Journal of the Royal Statistical Society: Series B (Methodological) 52(3). 393–425. doi:10.1111/j.2517-6161.1990.tb01796.x. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1990.tb01796.x.

Davison, AC & NI Ramesh. 2000. Local likelihood smoothing of sample extremes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 62(1). 191–208.

Diebolt, Jean, Armelle Guillou & Pierre Ribereau. 2007. Asymptotic normality of extreme quantile estimators based on the peaks-over-threshold approach. Communications in Statistics - Theory and Methods 36(5). 869–886. doi:10.1080/03610920601036317. https://doi.org/10.1080/03610920601036317.

Elsner, J. B, J. P Kossin & T. H Jagger. 2008. The increasing intensity of the strongest tropical cyclones. Nature 455. 92. doi:10.1038/nature07234.

Embrechts, Paul, Claudia Klüppelberg & Thomas Mikosch. 1997. Modelling extremal events, volume 33 of Applications of Mathematics. Springer.

Feigin, Paul D. & Sidney I. Resnick. 1994. Limit distributions for linear programming time series estimators. Stochastic Processes and their Applications 51(1). 135–165. doi:10.1016/0304-4149(94)90022-1. http://www.sciencedirect.com/science/article/pii/0304414994900221.

Feuerverger, Andrey, Peter Hall et al. 1999. Estimating a tail exponent by modelling departure from a Pareto distribution. The Annals of Statistics 27(2). 760–781.

Friederichs, P & A Hense. 2007. Statistical downscaling of extreme precipitation events using censored quantile regression.
Monthly Weather Review 135(6). 2365–2378.

Gardes, Laurent & Stéphane Girard. 2010. Conditional extremes from heavy-tailed distributions: An application to the estimation of extreme rainfall return levels. Extremes 13(2). 177–204.

Garrido, Myriam & Pascal Lezaud. 2013. Extreme value analysis: An introduction.

Gilli, Manfred & Evis Këllezi. 2006. An application of extreme value theory for measuring financial risk. Computational Economics 27. 207–228. doi:10.1007/s10614-006-9025-7.

Gomes, M Ivette & M João Martins. 2002. "Asymptotically unbiased" estimators of the tail index based on external estimation of the second order parameter. Extremes 5(1). 5–31.

de Haan, Laurens & Ana Ferreira. 2006. Extreme value theory: An introduction. doi:10.1007/0-387-34471-3.

de Haan, Laurens & Ana Ferreira. 2010. Extreme value theory: An introduction (Springer Series in Operations Research and Financial Engineering). Springer 1st edn.

Hall, Peter & Nader Tajvidi. 2000. Nonparametric analysis of temporal trend when fitting parametric models to extreme-value data. Statistical Science. 153–167.

Hill, Bruce M. 1975. A simple general approach to inference about the tail of a distribution. Ann. Statist. 3(5). 1163–1174. doi:10.1214/aos/1176343247. https://doi.org/10.1214/aos/1176343247.

Jagger, T. H & J. B Elsner. 2008. Modeling tropical cyclone intensity with quantile regression. International Journal of Climatology 29. 1351. doi:10.1002/joc.1804.

Jansen, Dennis & Casper Vries. 1991. On the frequency of large stock returns: Putting booms and busts into perspective. The Review of Economics and Statistics 73. 18–24. doi:10.2307/2109682.

Kalnay, Eugenia, M. Kanamitsu, R. Kistler, William Collins, D. Deaven, L. Gandin, M. Iredell, S. Saha, G. White, J. Woollen, Yuejian Zhu, Muthuvel Chelliah, W. Ebisuzaki, Wayne Higgins, John Janowiak, K. C. Mo, C. Ropelewski, Julian Wang & A. Leetmaa. 1996. The NCEP/NCAR 40-year reanalysis project. Bulletin of the American Meteorological Society 77. doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.

Koenker, Roger. 2005. Quantile regression (Econometric Society Monographs). Cambridge University Press. doi:10.1017/CBO9780511754098.

Koenker, Roger & Gilbert Bassett. 1978. Regression quantiles. Econometrica 46(1). 33–50.

Kyselý, Jan. 2002. Probability estimates of extreme temperature events: Stochastic modelling approach vs. extreme value distributions. Studia Geophysica et Geodaetica 46. 93–112. doi:10.1023/A:1019841717369.

Langousis, Andreas, Antonios Mamalakis, Michelangelo Puliga & Roberto Deidda. 2016. Threshold detection for the generalized Pareto distribution: Review of representative methods and application to the NOAA NCDC daily rainfall database. Water Resources Research 52(4). 2659–2681.

Li, D, L Peng & J Yang. 2010. Bias reduction for high quantiles. Journal of Statistical Planning and Inference 140. 2433.

Park, Myung Hyun & Joseph H.T. Kim. 2016. Estimating extreme tail risk measures with generalized Pareto distribution. Computational Statistics & Data Analysis 98. 91–104. doi:10.1016/j.csda.2015.12.008. http://www.sciencedirect.com/science/article/pii/S0167947315003163.

Peng, Liang & Yongcheng Qi. 2004. Estimating the first- and second-order parameters of a heavy-tailed distribution. Australian & New Zealand Journal of Statistics 46(2). 305–312.
Pickands, James, III. 1975. Statistical inference using extreme order statistics. Ann. Statist. 3(1). 119–131. doi:10.1214/aos/1176343003. https://doi.org/10.1214/aos/1176343003.

Scarf, Philip A. & Patrick J. Laycock. 1996. Estimation of extremes in corrosion engineering. Journal of Applied Statistics 23(6). 621–644. doi:10.1080/02664769623982. https://doi.org/10.1080/02664769623982.

Song, Jongwoo & Seongjoo Song. 2012. A quantile estimation for massive data with generalized Pareto distribution. Computational Statistics & Data Analysis 56(1). 143–150. doi:10.1016/j.csda.2011.06.030. http://www.sciencedirect.com/science/article/pii/S0167947311002477.

Tawn, Jonathan A. 1992. Estimating probabilities of extreme sea-levels. Journal of the Royal Statistical Society. Series C (Applied Statistics) 41(1). 77–93. http://www.jstor.org/stable/2347619.

Taylor, James W. 2008. Using exponentially weighted quantile regression to estimate value at risk and expected shortfall. Journal of Financial Econometrics 6(3). 382–406. https://EconPapers.repec.org/RePEc:oup:jfinec:v:6:y:2008:i:3:p:382-406.

Umantsev, Len & Victor Chernozhukov. 2001. Conditional value-at-risk: Aspects of modeling and estimation. Empirical Economics 26(1). 271–292. https://EconPapers.repec.org/RePEc:spr:empeco:v:26:y:2001:i:1:p:271-292.

Velthoen, Jasper, Juan-Juan Cai, Geurt Jongbloed & Maurice Schmeits. 2019. Improving precipitation forecasts using extreme quantile regression. Extremes 22(4). 599–622. doi:10.1007/s10687-019-00355-1.

Wang, Hansheng & Chih-Ling Tsai. 2009. Tail index regression. Journal of the American Statistical Association 104(487). 1233–1240.

Wang, Huixia Judy & Deyuan Li. 2013. Estimation of extreme conditional quantiles through power transformation. Journal of the American Statistical Association 108(503). 1062–1074.

Wang, Huixia Judy, Deyuan Li & Xuming He. 2012. Estimation of high conditional quantiles for heavy-tailed distributions. Journal of the American Statistical Association 107(500). 1453–1464. doi:10.1080/01621459.2012.716382.

Weissman, Ishay. 1978. Estimation of parameters and large quantiles based on the k largest observations. Journal of the American Statistical Association 73. 812.