QUANTILE REGRESSION AND EXTREMES By Raka Mandal A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics – Doctor of Philosophy 2020

ABSTRACT

QUANTILE REGRESSION AND EXTREMES By Raka Mandal

The estimation of conditional quantiles of a response variable is of prime interest in many statistical applications. Quantile regression (QR) is a powerful and popular method used extensively to estimate conditional quantiles of a response Y in the presence of a covariate X. Moreover, QR can quantify the effect of covariates at different quantile levels. While modeling a rare event, quantiles at the high or low tails are of particular interest. In such cases, QR has inevitable shortfalls. Since we have fewer data at the tails, QR estimates suffer from high variability. The performance deteriorates further when the underlying distribution is heavy-tailed. Estimation of extreme quantiles is therefore challenging, especially when the data come from a heavy-tailed distribution.

Extreme value theory (EVT) provides mathematical tools to analyze rare events. In practice, EVT can be used to assess the probability of events more extreme than any previously observed occurrence. The scope of application of EVT includes financial risk assessment, extreme climate modeling, network anomaly detection, etc. Statistical modeling using EVT can be carried out in two approaches: block maxima and Peak over Threshold (POT). In the block maxima approach, the Generalized extreme value (GEV) distribution is fitted to a series of maxima derived from the observations. In the POT approach, one fits the Generalized Pareto (GP) distribution to observations exceeding a certain threshold. An important problem when using the POT approach is the choice of the threshold. Models based on EVT use asymptotic arguments to approximate the tail behavior; hence, the choice of the threshold is crucial in order to fit the GP distribution to the data. The POT approach is a popular tool for the estimation of extreme quantiles in heavy-tailed data, wherein the excesses over a threshold are modeled as a function of the covariate X. However, the efficiency of POT is severely compromised when the threshold itself varies as a function of the covariate.

This dissertation proposes an integrated approach for estimating extreme conditional quantiles from a heavy-tailed distribution. We begin with the case where the threshold does not vary as a function of the covariate X. Using the POT approach to model the scaled conditional excess, we propose an estimator for high conditional quantiles. We establish large sample properties of our estimator in the context of the GP distribution. Through numerical investigations, we demonstrate the superiority of our method over QR at higher quantiles. However, this method is not adequate when the threshold itself varies as a function of the covariate X. In order to circumvent this issue, we propose an extension of our approach, which uses standard QR to extract information on the threshold and then models the residuals as a GP distribution with covariate-dependent parameters. We establish the asymptotic properties of our method in the context of the GP distribution. Extending further, we thoroughly study the asymptotic performance of this approach for a wide class of heavy-tailed distributions using numerical simulations.
Using simulation studies, we compare our approach with existing methods used in the estimation of high conditional quantiles for heavy-tailed distributions. As an application, we have implemented our method for the task of precipitation downscaling with data obtained from the Vancouver International Airport weather station. We have also demonstrated how our method of covariate adaptive threshold selection can be implemented in practice.

Copyright by RAKA MANDAL 2020

In memory of my grandmother Lila Ghosh.

ACKNOWLEDGEMENTS

This Ph.D. journey has been challenging, both in my professional and personal life, particularly over the last two years. I want to take this opportunity to thank everyone who has given me immense support through this journey. First of all, I would like to thank my doctoral advisors, Dr. Tapabrata Maiti and Dr. Shrijita Bhattacharya, who introduced me to the world of research. I am thankful to Dr. Maiti for believing in my potential and giving me several opportunities during my time at Michigan State University. His constant guidance has helped me to grow as a researcher and become successful in my career. It would not have been possible for me to write this dissertation and get to where I am today without his counsel and support. I am thankful to Dr. Bhattacharya for introducing me to the magnificent domain of Extreme Value Theory. She has been an excellent mentor, and working with her has been rewarding. Her timely advice on my career plans, constant encouragement during stressful times, and constructive criticisms have aided me in improving and preparing myself for the next phase of my life. Not to mention, she has provided me with vital financial support for this work.

I would like to thank Prof. Andrew Finley and Prof. Hyokyoung (Grace) Hong for being part of my dissertation committee. Their helpful suggestions and recommendations have been crucial for the completion of this dissertation. I would also like to thank the Department of Statistics and Probability at Michigan State University (MSU) for providing me with this incredible opportunity to pursue my academic endeavor.

East Lansing has been my home away from home for the past seven years. I consider myself fortunate to have stayed here and enjoyed the picturesque campus of MSU. Be it Spring, Summer, Fall, or Winter, my every venture into the nooks and corners of this campus has been full of exciting discoveries. During my stressful days, I found solace by the banks of the Red Cedar River just outside my department. Thank you, WKAR-FM, for fueling my mornings with classical melodies. Writing this thesis would have been mundane without you. Thank you, Dr. Tawa Sina, for inspiring me and making me believe in myself every time I had given up. I consider you my unofficial advisor and a guardian angel on this mission. I would also like to thank my friends and fellow Ph.D. students in the department: Rejaul Karim, Abdhi Sarkar, Metin Eroglu, Yi-Chen Zhang, Siddhartha Nandy, Atreyee Majumder, and Asish Banik. They have helped me big time during my tenure at MSU. I have been blessed with a wonderful circle of friends who have always been caring and supportive through this challenging phase of life. This acknowledgment would be incomplete without mentioning a few of them. Thank you, Suman Chakraborty, for your constant guidance, like the north star to a sailor. Thank you, Sohini Raha, for being my lifeline during the darkest hours of my life. Thank you, Sujatro Chakladar, for your immense patience and endless encouragement.
Thank you, Rejaul Karim, for being a friend everyone should have in their life. Thank you, Shreya Saha and Ankita Bhattacharya, for opening the new avenue of spirituality in my life. Thank you, Sejuti Dasgupta, for giving me the comfort of a home far away from my family. Thank you, Moumanti Podder, for your continual supply of courage and confidence. Last but not least, thank you, Sebanti Sengupta, Snigdha Panigrahi, Arpita Mukherjee, Saswati Saha, Shrijita Bhattacharya, Saptarshi Mukherjee, Portia Bannerjee, Shruti Saripalli, and Rohosen Bandopadhyay, for making this journey easier. I apologize if I have missed anyone.

I want to end this acknowledgment by thanking my family members, though a mere thank you is not enough for every little thing they have done for me. I would like to thank my parents, Tripti Mandal and Prasanta Mandal, for their unconditional love, endless support, and countless sacrifices. They have devoted their lives to my well-being. Thank you, Maa, for nurturing my scientific curiosity since childhood. You introduced me to the world of mathematics, and that is why I have come this far in my academic venture. Thank you, Baba, for always believing in my potential and giving me the liberty to pursue my dreams, be it my career choice or my hobbies. You are an impeccable human being, and I always look up to you. I must thank my little sister, Trisha Mandal, for always being available for me, in spite of staying on a different continent. Thank you, Chottu, for the countless times you have woken me up with your call so that I could reach my classes on time. I cannot express how hard it has been to stay away from all of you. Last but not least, thank you, Didu, for blessing me with an elated childhood. I know you would have been proud and happy if you were still with us.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
CHAPTER 2 QUANTILE REGRESSION AND ESTIMATION OF CONDITIONAL QUANTILES
2.1 Introduction
2.2 Quantile Regression
2.3 Extreme Value Theory: Definitions and Results
2.4 Statistical Modeling of Extreme Values with GP
2.5 Proposed Method: GP for Scale Models
2.5.1 Method of Estimation
2.6 Simulation Study
CHAPTER 3 ESTIMATION OF EXTREME CONDITIONAL QUANTILES WITH COVARIATE ADAPTIVE THRESHOLD SELECTION
3.1 Introduction
3.2 Proposed Method: QR Adjusted GP for Location-Scale Models
3.2.1 Method of Estimation
3.2.2 Simulation Study: GP Distribution
3.3 Extension to the Nonparametric Problems and Threshold Selection
3.3.1 Method of Estimation
3.3.2 Discussion: Threshold Selection for Heavy Tailed Distributions
3.4 Simulation Study: Nonparametric Problem
3.4.1 Review of an Existing Method for Extreme Quantile Estimation
3.4.2 Simulation Study
3.5 Application to precipitation downscaling
CHAPTER 4 DISCUSSIONS AND FUTURE DIRECTIONS
APPENDICES
APPENDIX A: APPENDIX FOR CHAPTER 2
APPENDIX B: APPENDIX FOR CHAPTER 3
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: φ = 0, GP(0.5): rel.bias(se) of quantile estimates for φ = 0 at x = 0.5 and quantile levels τ
Table 2.2: φ = 0, GP(0.5): mse(se) of quantile estimates for φ = 0 at x = 0.5 and quantile levels τ
Table 2.3: φ = 2, GP(0.5): rel.bias(se) of quantile estimates for φ = 2 at x = 0.5 and quantile levels τ
Table 2.4: φ = 2, GP(0.5): mse(se) of quantile estimates for φ = 2 at x = 0.5 and quantile levels τ
Table A.1: φ = 0, exp(1): rel.bias(se) of quantile estimates for φ = 0 at x = 0.5 and quantile levels τ
Table A.2: φ = 0, exp(1): mse(se) of quantile estimates for φ = 0 at x = 0.5 and quantile levels τ
Table B.1: Probability distributions in heavy-tail domain

LIST OF FIGURES

Figure 2.1: Quantile Loss Function
Figure 2.2: Quantile estimates for φ = 0, exp(1) at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet). The sample size is n = 500
Figure 2.3: Quantile estimates for φ = 0, GP(0.5) at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet). The sample size is n = 500
Figure 2.4: mse(se) for φ = 0, GP(0.5) at x = 0.5 and quantile levels τ. The sample size is n = 500
Figure 2.5: Quantile estimates for φ = 2 at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet)
Figure 2.6: mse±se for φ = 2, GP(0.5) and quantile levels τ
Figure 3.1: mse±se for GP(0.5) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 500. (a): τc = 0.01 for QR.GP; (b) and (c): 0 < τc < 1
Figure 3.2: mse±se for GP(0.5) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 1000. (a): τc = 0.01 for QR.GP; (b) and (c): 0 < τc < 1
Figure 3.3: mse±se for GP(0.8) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 1000. (a): τc = 0.01 for QR.GP; (b) and (c): 0 < τc < 1
Figure 3.4: mse±se for exp(1) at x = (0.5, 0.5) and quantile levels τ. τc = 0.01 for QR.GP
Figure 3.5: Tail index estimates with changing values of τc. The red line corresponds to the estimate of ξ when GP is fit to the error εi; the black line corresponds to the estimate when our proposed method is fit to yi. True tail index is 0.8
Figure 3.6: Tail index estimates with changing values of τc. The red line corresponds to the estimate of ξ when GP is fit to the error εi; the black line corresponds to the estimate when our proposed method is fit to yi. True tail index is 0.8. Left: t-distributed errors. Right: Burr-distributed errors
Figure 3.7: mse±se for Pareto(0.5) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 1000 and 0 < τc < 1
Figure 3.8: mse±se for Pareto(0.5) at x = (0.5, 0.5) at quantile τ = 0.998 and sample sizes (a) n = 1000, (b) n = 2000
Figure 3.9: mse±se for Pareto(0.8) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 2000 and 0 < τc < 1
Figure 3.10: mse±se for Pareto(0.8) at x = (0.5, 0.5) at quantile τ = 0.985 and sample sizes (a) n = 2000, (b) n = 5000
Figure 3.11: mse±se for Pareto(0.8) at x = (0.5, 0.5) at quantile τ = 0.998 and sample sizes (a) n = 2000, (b) n = 5000
Figure 3.12: mse±se for absolute t with ξ = 0.5 at x = (0.5, 0.5) and quantiles 0.9 ≤ τ ≤ 0.991. Sample sizes (a) n = 1000, (b) n = 2000
Figure 3.13: mse±se for absolute t with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000 and 5000
Figure 3.14: mse±se for absolute t with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000 and 5000
Figure 3.15: mse±se for absolute t with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000 and 5000
Figure 3.16: mse±se for absolute t with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000 and 5000
Figure 3.17: mse±se for absolute t with ξ = 0.8 at x = (0.5, 0.5) and quantiles 0.9 ≤ τ ≤ 0.991. Sample sizes (a) n = 2000, (b) n = 5000
Figure 3.18: mse±se for Burr with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000 and 5000
Figure 3.19: mse±se for Burr with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000 and 5000
Figure 3.20: mse±se for Burr with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000 and 5000
Figure 3.21: mse±se for Burr with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000 and 5000
Figure 3.22: Daily precipitation at the Vancouver International Airport, 1971-2000
Figure 3.23: Estimated tail index (ξ̂) at different choices of τc ∈ (0, 1). Method of estimation is QR.GP
Figure 3.24: Mean square prediction error (MSPE) of different quantile estimates at quantile levels 0.9 < τ < 0.995. Thresholds: 0.7 for QR.GP (blue); 0.8, 0.85, 0.9 and 0.95 for QR.Hill (dotted lines)
Figure A.1: mse(se) for φ = 0, exp(1) at x = 0.5 and quantile levels τ. The sample size is n = 500

CHAPTER 1
INTRODUCTION

In many practical applications, we are interested in events that are rare but have very significant consequences. For example, a sudden heavy rainfall can abruptly increase the volume of water inside a dam. The water then released can cause flooding in the lowland areas. Moreover, depending on the architecture and other attributes, the dam can break if the water level exceeds a certain threshold. Hence it is vital to take the extreme climate pattern into account while constructing the dam. Suppose that rainfall data from the past 50 years at the location is available, and the project aims to build a dam that can sustain for the next 100 years. How can we predict an event that might happen once in 100 years with observations from the past 50 years? Extreme Value Theory provides the theoretical justification for extrapolation beyond the range of the observations. One can derive natural estimators of relevant quantities like extreme quantiles, return levels, etc., and we can expect these estimators to have desirable analytical properties. In the example given above, we might be interested in predicting the 0.99 quantile of the daily rainfall at the construction location.

The scope of application of Extreme Value Theory (EVT) is vast. For example, Tawn (1992) used EVT to estimate probabilities of annual maximum hourly sea levels; this information is necessary to accurately predict the height of sea defenses in coastal areas. Jansen & Vries (1991) successfully applied EVT to analyze financial data and concluded that the market crash of 1987 was not an outlier. Gilli & Këllezi (2006) demonstrated the application of extreme value distributions for risk assessment with major stock market indices. Scarf & Laycock (1996) studied the analysis of corrosion extremes with EVT. The main difference between traditional statistical modeling and EVT is that we generally model the mean in the former, while EVT models the extreme quantile, maxima, or some higher order statistic. In this article, we are going to focus on methods for extreme tail quantiles.

Several approaches have been proposed to address extreme tail quantile estimation based on EVT. In the nonregression setup, the estimation of tail quantiles using EVT is based on the assumption that observations are independently and identically distributed. For example, Weissman (1978) approached the high quantile estimation problem where only the k largest observations from a sample of size n are available, k < n. The choice of the optimal sample fraction k is vital to minimize the quantile estimate's asymptotic mean square error. As pointed out by Li et al. (2010), one can focus on the bias reduction of the parameter estimates, which in turn produces a reduced-bias quantile estimator. The first class of bias reduction methods uses an optimal k at the same rate as the theoretical optimal rate of the tail index estimator (e.g., Beirlant et al. (2002), Gomes & Martins (2002)), while the second class of methods uses a larger order of k than the theoretical optimal rate (e.g., Feuerverger et al. (1999), Peng & Qi (2004), Caeiro & Gomes (2009)).
Although these methods are developed for the nonregression setup, they are a good starting point from which many extensions to the regression framework can be adapted. In a regression framework, we want to incorporate information on a covariate X into our analysis. The foundation of regression quantiles was laid by Koenker & Bassett (1978), which uses linear quantile functions to explore the relationship between the response Y and a covariate or a set of covariates X. The scope of the regression quantiles proposed by Koenker & Bassett (1978) is not just limited to tail quantiles: with quantile regression, we can study the impact of X on different quantiles of the response distribution. There are many advantages to choosing quantile regression over mean regression. Firstly, quantile regression gives us a complete picture of the conditional response distribution. Secondly, it is robust to outliers. Thirdly, it is a distribution-free approach, so we can apply it to non-Gaussian models. Abrevaya (2001) used regression quantiles to study the effects of several factors linked to newborns' lower birthweights. Umantsev & Chernozhukov (2001) used regression quantiles to model Value-at-Risk, a measure of market risk widely used in finance. Buchinsky (1994) examined the effects of education and experience at different quantiles of the wage distribution using quantile regression.

Although quantile regression serves as a basis, the large sample theory of quantile regression does not apply far enough into the tails; for details, see Chernozhukov (2005). Moreover, data sparsity at the tails further amplifies the variability of the estimates. Naturally, behavior at the tails is contingent on the nature of the underlying distribution. Therefore, one has to build an alternative model for the tails. Many authors have proposed quantile regression or a variation of it to predict extreme quantiles. For example, Friederichs & Hense (2007b) proposed censored quantile regression to predict high precipitation. Taylor (2008) used exponentially weighted quantile regression to analyze stock market data. Quantile regression methods tailored for higher quantiles have been proposed for modeling extreme weather phenomena like tropical cyclones (Elsner & Jagger (2008), Jagger & Elsner (2008)), severe snowstorms (Velthoen et al. (2019a)), and heatwaves (Kyselý (2002)). Chernozhukov (2005) used the extreme value theory framework of Feigin & Resnick (1994) to develop a theoretical understanding of linear quantile regression models at the tail. Chernozhukov (2005) has shown that the asymptotic distribution of extreme quantiles depends on the tail index and the regression design (see the definitions in section 2.3). This tail index is a crucial indicator of the domain of attraction of the underlying process. Not all quantile estimation methods can capture the true quantile function if the distribution belongs to the heavy-tailed domain of attraction.

To the best of our knowledge, there are very few methods that have been tailored for estimating extreme quantiles in heavy-tailed models. However, as pointed out by Wang & Li (2013), the existing techniques can be broadly grouped into four classes. The first class of methods uses fully parametric models developed using EVT. Generally, these models fit the Generalized Extreme Value (GEV) or the Generalized Pareto (GP) distribution to the data. Here, the information on the covariate X can be incorporated via a combination of the location, scale, or shape parameters of the GEV or GP distributions.
These methods usually require the choice of a covariate-dependent tuning parameter, e.g., a threshold value. More details can be found in Davison & Smith (1990), Davison & Ramesh (2000), Hall & Tajvidi (2000), Wang & Tsai (2009), and Chavez-Demoulin & Davison (2005). The second class of models uses local estimation to extend the EVT methods to the regression framework. Here, the high conditional quantiles of Y given X = x are estimated using the observations in a small neighborhood of x. For details, see Gardes & Girard (2010), Daouia et al. (2011), and Velthoen et al. (2019b). The efficiency of this class of methods depends on the richness of the data in the neighborhood of x. The third class of models uses quantile regression directly to estimate extreme quantiles. More details can be found in Bremnes (2004), Friederichs & Hense (2007a), and Jagger & Elsner (2008). The fourth and final class of methods uses a combination of quantile regression and EVT while imposing some suitable assumptions on the tail behavior of the underlying distribution. Chernozhukov & Du (2006) and Wang et al. (2012) proposed methods in which the intermediate conditional quantiles are estimated using QR and then used to extrapolate to the extreme quantiles. These methods are developed assuming that the target quantile functions are linear in the covariates and that the distribution is in the maximum domain of attraction of a heavy-tailed distribution.

In this article, we propose another approach for estimating high conditional quantiles using EVT and quantile regression, where we use the GP distribution as the basis of our model. Pickands (1975) first suggested using the GP distribution to model the upper tail of a distribution, and Davison & Smith (1990) extended this method to the regression setup. There are several advantages of using the GP distribution to model tails. Firstly, by the Pickands–Balkema–de Haan theorem, the exceedance over a high threshold of an arbitrary heavy-tailed distribution can be well approximated by GP. Hence, regardless of the original distribution, GP can approximate the tail behavior if it is heavy tailed. Secondly, the analytic behavior of the parameter estimates of GP is well studied, and they have desirable asymptotic properties; this is useful for studying the behavior of different statistics based on the parameter estimates. Nonetheless, approximation by GP is contingent upon the selection of an appropriate threshold. In a non-regression setting, the threshold is usually determined via model diagnostics or some nonparametric tests which assess the goodness of the fit (Langousis et al. (2016)). These methods can be adapted to a regression framework when the threshold does not depend on the covariate X. However, the problem becomes challenging when the threshold changes with X. As far as we know, the problem of covariate adaptive threshold selection has not been addressed. In this dissertation we propose a solution to this problem, which in turn facilitates the estimation of extreme conditional quantiles.

Our contribution in this article can be summarised as follows. Our approach provides consistent estimation of quantiles for the GP family of distributions featuring a nonnegative tail index. Through numerical investigations, we show that our method can overcome the shortcomings of standard Quantile Regression (Koenker & Bassett (1978)) far in the tail when the distribution is heavy tailed in nature.
Moreover, we propose a generalization of our method which can accommodate a large class of distributions in the heavy-tailed domain of attraction. Selection of a covariate adaptive threshold is a nontrivial problem in this case; through several examples, we demonstrate the choice of an optimal threshold for implementing our method. Our numerical investigations also show that our method provides a robust estimation of extreme quantiles for heavy-tailed distributions. Wang et al. (2012) have proposed a semi-parametric approach for estimating higher conditional quantiles in the context of heavy-tailed distributions. We demonstrate the superiority of our approach over Wang et al. (2012) in the context of a Pareto or a Generalized Pareto distribution. When the underlying distribution is in the heavy-tailed domain of attraction, we show that, with an appropriate choice of the threshold, our approach is as efficient as Wang et al. (2012). Our approach is more computationally efficient than Wang et al. (2012), which requires estimation of a number of intermediate regression quantiles. Moreover, it is also easy to implement in practice, as we demonstrate through an application to real data.

The contents of the rest of the thesis are organized as follows. Chapter 2 proposes a method for estimating conditional quantiles from a regression model where the scale is modeled as a function of the covariate X while the location does not depend on X. Since the threshold is independent of the covariate, a direct adaptation of the POT approach in Davison & Smith (1990) is shown to outperform quantile regression at high conditional quantiles. We establish the asymptotic properties of the proposed estimator when the true model is GP. Through numerical investigations, we establish the efficiency of our method over standard Quantile Regression for the case when the observations are distributed according to a GP distribution.

Estimating conditional quantiles with GP when the threshold itself varies as a function of the covariate is challenging. Chapter 3 proposes an extension of our method in Chapter 2, which can be used for covariate adaptive threshold selection. We show through numerical investigations that our proposed extension offers a significant improvement over our proposal in Chapter 2 and over standard Quantile Regression. Also, we establish the asymptotic properties of our estimator in the context of the GP distribution. The selection of a proper threshold is vital for applying our method in practice, where we do not know the exact underlying distribution; here, we give a guideline for choosing an appropriate threshold when the underlying distribution is a heavy-tailed distribution from the Fréchet domain of attraction. We also show that our proposed method can significantly reduce the computational complexity usually associated with the optimization of the quantile loss function. Extending further, we thoroughly study the asymptotic performance of our approach for a wide class of heavy-tailed distributions via numerical simulations. We compare the performance of our approach with Wang et al. (2012) in estimating conditional quantiles for a variety of distributions in the heavy-tailed domain of attraction. Finally, we implement our method to predict extreme precipitation with the data obtained from the Vancouver International Airport weather station. We end this dissertation with some discussions regarding our proposed method in Chapter 4, where we also propose some future directions that can be pursued.
CHAPTER 2
QUANTILE REGRESSION AND ESTIMATION OF CONDITIONAL QUANTILES

2.1 Introduction

The estimation of high conditional quantiles of a response variable is of prime interest in many statistical applications. Quantile regression is a powerful and popular method for this task. One can estimate the effects of the covariates at different conditional quantiles of the response using quantile regression. For example, Friederichs & Hense (2007b) used meteorological information obtained from global climate model projections to predict localized high precipitation. Lower infant birth weight can lead to severe health conditions; Abrevaya (2001) used quantile regression to quantify the effects of maternal behavior, demographics, etc. on different quantiles of the birth weight distribution. Umantsev & Chernozhukov (2001) analyzed factors associated with high risk in finance. Although the scope of quantile regression applications is vast, Chernozhukov (2005) pointed out that the large sample theory of quantile regression is not applicable sufficiently far in the tails. Hence, we should proceed with caution when using quantile regression at the tails.

The literature of modern extreme value theory has provided us with elegant tools to quantify rare events. In the nonregression framework, the problem is familiar and well explored; for details, see Weissman (1978), Li et al. (2010), and Embrechts et al. (1997). In the regression framework, one can model the extreme quantiles by fitting a fully parametric model using the generalized extreme value or the generalized Pareto distribution. See Coles et al. (2001) for details.

In this chapter, we explore different approaches for estimating conditional quantiles. We start with the quantile regression method proposed by Koenker & Bassett (1978) in section 2.2. In section 2.3, we introduce fundamental results from the extreme value literature. In section 2.4, we introduce the statistical modeling framework using GP that serves as a basis for our work. In section 2.5, we propose our method for estimating conditional quantiles using the POT approach, and we provide large sample properties of the estimator in the context of the GP distribution. In section 2.6, we demonstrate the efficiency of our method over standard quantile regression. From our numerical investigation, we can see that quantile regression estimates are not adequate for estimating higher conditional quantiles, while our proposed method can be adapted for estimating high conditional quantiles, given that the data follow a GP distribution.

2.2 Quantile Regression

Consider a random variable Y with distribution function F and let 0 < τ < 1. Let $Q_Y(\tau) = \inf\{y : F(y) \ge \tau\}$ be the τ quantile of Y and E(Y) the expected value of Y. Then
\[ E(Y) = \operatorname*{argmin}_{c}\; E(Y - c)^2, \]
or, we can say, the mean E(Y) minimizes the squared error loss. Consider the function $\rho_\tau : \mathbb{R} \to [0,\infty)$ such that
\[ \rho_\tau(y) = y\big(\tau - I_{[y<0]}\big) = \begin{cases} \tau y & \text{if } y \ge 0 \\ (\tau - 1)y & \text{if } y < 0, \end{cases} \]
where τ ∈ (0, 1) is a fixed number and $I_{[y<0]}$ is the indicator function. The function $\rho_\tau$, shown in figure 2.1, is often referred to as the quantile loss function or tilted absolute loss function. It is a convex function, differentiable everywhere except at 0. Like the mean, a quantile of a random variable can also be derived as the solution to a minimization problem. Following Koenker (2005), one can easily verify that
\[ Q_Y(\tau) = \operatorname*{argmin}_{c}\; E\big(\rho_\tau(Y - c)\big), \tag{2.1} \]
that is, $Q_Y(\tau)$ is the minimizer of the quantile error loss $\rho_\tau$. When τ = 0.5, minimizing equation 2.1 is equivalent to minimizing E|Y − c|, the absolute error loss; in this case, the solution is the median of Y.
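A quick numerical check of 2.1 (a minimal sketch assuming only numpy; the crude grid search simply stands in for a proper optimizer):

```python
import numpy as np

def pinball_loss(y, c, tau):
    """Empirical version of E[rho_tau(Y - c)] at the candidate constant c."""
    u = y - c
    return np.mean(np.where(u >= 0, tau * u, (tau - 1.0) * u))

rng = np.random.default_rng(1)
y = rng.exponential(size=5_000)
tau = 0.9

# Minimize the empirical quantile loss over a grid of candidate values of c.
grid = np.linspace(y.min(), y.max(), 2_000)
c_hat = grid[np.argmin([pinball_loss(y, c, tau) for c in grid])]

# Both agree closely with the true 0.9 quantile -log(0.1) of exp(1).
print(c_hat, np.quantile(y, tau))
```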
[Figure 2.1: Quantile Loss Function]

In the regression paradigm, one explores the relationship between the response Y and the covariate X. In order to do so, the covariate information must be incorporated into the response in some form. In classical regression, one models the conditional mean of the response. When the conditional mean is a parametric function of the covariates, i.e., $E(Y|X) = f(X, \beta)$, one can estimate the conditional mean function $E(Y|X)$ by minimizing the squared error loss $E(Y - f(X,\beta))^2$. Suppose we have sample observations $(y_i, x_i)$, $i = 1, \dots, n$. The estimation of the mean function can then be achieved by the method of least squares, minimizing the sample analog of the squared error loss,
\[ \sum_{i=1}^{n} \big(y_i - f(x_i, \beta)\big)^2. \]
The method of least squares enjoys computational feasibility, since the loss function is differentiable. However, it does not give more insight into the conditional distribution of the response, and it lacks robustness, as it is sensitive to outliers. Koenker & Bassett (1978) introduced quantile regression as a robust alternative to mean regression. Quantile regression can be viewed as an extension of classical mean regression, where the conditional quantile is modeled as a function of the covariates. When the conditional quantile is modeled as a parametric function of the covariates, i.e., $Q_{Y|X}(\tau) = g(X, \beta)$, one can estimate it by minimizing $E\rho_\tau(Y - g(X,\beta))$. With sample observations $(y_i, x_i)$, $i = 1, \dots, n$, one can minimize the sample quantile error loss at the τ quantile,
\[ \sum_{i=1}^{n} \rho_\tau\big(y_i - g(x_i, \beta)\big). \tag{2.2} \]
By choosing different values of τ, one can obtain more insight into the conditional distribution beyond the mean. Quantile regression methods are robust and do not rely on the Gaussian error assumption. Since the quantile loss function is not differentiable, optimization of the objective function in 2.2 is not straightforward. Koenker & Bassett (1978) have shown that for linear quantiles, i.e., when $Q_{Y|X}(\tau) = g(X,\beta) = \alpha(\tau) + X^{\top}\beta(\tau)$, one can estimate the τ quantile at X = x as $\hat Q_{Y|x}(\tau) = \hat\alpha(\tau) + x^{\top}\hat\beta(\tau)$, where
\[ \hat\alpha(\tau), \hat\beta(\tau) = \operatorname*{argmin}_{\alpha, \beta}\; \sum_{i=1}^{n} \rho_\tau\big(y_i - \alpha - x_i^{\top}\beta\big). \tag{2.3} \]
The minimization in equation 2.3 can be reformulated as a linear programming problem and solved accordingly; the optimization has to be done separately for every choice of τ. Nonetheless, this classical quantile regression model can be adapted for a broad class of regression problems, and off-the-shelf implementations exist, as sketched below.
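A small fitting illustration, assuming the statsmodels package (whose QuantReg class solves 2.3 via an iteratively reweighted least squares scheme rather than the simplex-based linear program); the data-generating choices here are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1, 1, size=n)
y = 2 + (1 + 0.9 * x) * rng.exponential(size=n)  # heteroscedastic errors

X = sm.add_constant(x)  # design matrix with an intercept column: [1, x]
for tau in (0.5, 0.9, 0.99):
    res = sm.QuantReg(y, X).fit(q=tau)
    print(tau, res.params)  # estimates of alpha(tau) and beta(tau)
```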
In Chapter 1, we discussed how the presence of extreme observations in the data can increase the challenge of consistent estimation of higher quantiles by several manifolds. Extreme Value Theory (EVT) provides us with some powerful tools for the analysis of rare events. To understand the challenges, we start with a few basics of EVT.

2.3 Extreme Value Theory: Definitions and Results

Extreme value theory deals with rare events: events that happen with low probability, near the upper or lower endpoints of the distribution function F. Intuitively, the asymptotic behavior of the maximum (or minimum) can give us more insight into the upper (or lower) tail of the distribution. Let $Y_1, Y_2, \dots$ be a sequence of independent random variables with common distribution function F and let $M_n = \max(Y_1, \dots, Y_n)$. Let $y_+ = \sup\{y : F(y) < 1\}$, the upper endpoint of the distribution function F. Then $P(M_n < y) = \{F(y)\}^n$ for any y in the domain of F, and $M_n$ converges to $y_+$ in probability as $n \to \infty$. Now,
\[ \lim_{n\to\infty} \{F(y)\}^n = \begin{cases} 0 & \text{if } y < y_+ \\ 1 & \text{if } y \ge y_+. \end{cases} \]
Hence, in order to obtain a nondegenerate limit distribution, proper centering and scaling of $M_n$ are necessary. Suppose that there exist sequences of constants $a_n > 0$ and $b_n$ such that
\[ \lim_{n\to\infty} P\big((M_n - b_n)/a_n < y\big) = \lim_{n\to\infty} F^n(a_n y + b_n) = G(y) \]
for some distribution function G and for all y in the domain of F. The natural question that follows is: what are the possible forms of the limit? We know from central limit theorems that, with proper centering and scaling, the sum $\sum_{i=1}^{n} Y_i$ of iid random variables converges to the normal distribution. The answer for the maximum is given by the following theorem, which is a significant result in the extreme value literature.

Theorem 2.3.1 (Fisher–Tippett–Gnedenko). Let $Y_1, Y_2, \dots$ be independent random variables with common distribution function F and let $M_n = \max(Y_1, \dots, Y_n)$. If there exist sequences of constants $a_n > 0$, $b_n$ such that $(M_n - b_n)/a_n$ converges to a non-degenerate distribution G, i.e.,
\[ \lim_{n\to\infty} P\big((M_n - b_n)/a_n < y\big) = \lim_{n\to\infty} F^n(a_n y + b_n) = G(y), \]
then the only three possible forms of the limit are:
\[ \text{Gumbel: } G_0(y) = \exp(-e^{-y}), \; y \in \mathbb{R}; \qquad \text{Fréchet: } G_{1,\alpha}(y) = \exp(-y^{-\alpha}), \; y \ge 0, \, \alpha > 0; \qquad \text{Weibull: } G_{2,\alpha}(y) = \exp(-(-y)^{-\alpha}), \; y \le 0, \, \alpha < 0. \tag{2.4} \]

Theorem 2.3.1 says that if the limiting distribution of $F^n(a_n y + b_n)$ exists, then there are only the three possible forms of the limit given above. The limits in 2.4 can be thought of collectively as members of a single family of distributions known as the Generalized Extreme Value (GEV) distributions. We say that Y follows the standard GEV distribution with parameter ξ, $Y \sim \mathrm{GEV}(\xi)$, if
\[ P(Y < y) = G_\xi(y) = \exp\big[-(1 + \xi y)^{-1/\xi}\big], \qquad 1 + \xi y > 0, \; \xi \ne 0. \tag{2.5} \]
Note that the support of Y depends on ξ: we have $-1/\xi < y < \infty$ for ξ > 0, while $-\infty < y < -1/\xi$ for ξ < 0. This naturally leads to two very different sub-classes of distributions within the GEV family. The parameter ξ is a crucial indicator of the sub-class and is known as the extreme value index or tail index. For ξ = 0, the cdf in 2.5 is not defined; rather, we use the limit ξ → 0, which leads us to the Gumbel distribution mentioned in equation 2.4:
\[ \lim_{\xi\to 0} G_\xi(y) = G_0(y) = \exp(-e^{-y}), \qquad y \in \mathbb{R}. \]
Again, the support has changed to the entire real line, in contrast to the ξ ≠ 0 cases. This constitutes the third sub-class in the GEV family: the Gumbel. In fact, the other two limits, Fréchet and Weibull, belong to the two different sub-classes mentioned above. With a little reparametrization, $\alpha = 1/\xi$ and $Y^* = 1 + \xi Y$ where $Y \sim \mathrm{GEV}(\xi)$, we can see that $Y^* \sim G_{1,\alpha}$ for α > 0 and $Y^* \sim G_{2,\alpha}$ for α < 0.

We have seen that the behaviour of the GEV distribution is governed by the value of the tail index ξ, and any change of location and scale does not affect ξ. Hence, the entire GEV family can be summarised by introducing two additional parameters: $\tilde\mu$ as location and $\tilde\sigma > 0$ as scale. Let $\tilde Y = \tilde\mu + \tilde\sigma Y$ with $Y \sim \mathrm{GEV}(\xi)$; then $\tilde Y \sim \mathrm{GEV}(\tilde\mu, \tilde\sigma, \xi)$ with
\[ P(\tilde Y < y) = \begin{cases} \exp\Big[-\exp\Big(-\frac{y - \tilde\mu}{\tilde\sigma}\Big)\Big] & \text{if } \xi = 0 \\ \exp\Big[-\Big(1 + \xi\big(\frac{y - \tilde\mu}{\tilde\sigma}\big)\Big)^{-1/\xi}\Big] & \text{if } \xi \ne 0, \end{cases} \tag{2.6} \]
where $1 + \xi(y - \tilde\mu)/\tilde\sigma > 0$. When Y is standard GEV with µ = 0 and σ = 1, we simply say $Y \sim \mathrm{GEV}(\xi)$.
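A quick numerical illustration of the block maxima view (a sketch assuming scipy, whose genextreme distribution parametrizes the GEV shape as c = −ξ):

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
# 200 blocks of 1,000 exp(1) observations; exp(1) lies in the Gumbel domain.
block_maxima = rng.exponential(size=(200, 1_000)).max(axis=1)

# Fit a GEV to the block maxima; recall scipy's shape is c = -xi.
c, loc, scale = genextreme.fit(block_maxima)
print("estimated tail index xi:", -c)  # should be close to 0 here
```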
The following definitions are useful to keep in mind for future use.

Definition 2.3.1 (Domain of attraction). Suppose that $Y_1, Y_2, \dots \stackrel{iid}{\sim} F$ and $M_n = \max(Y_1, \dots, Y_n)$. If
\[ \lim_{n\to\infty} P\big((M_n - b_n)/a_n < y\big) = \lim_{n\to\infty} F^n(a_n y + b_n) = G(y) \]
for some constants $a_n > 0$ and $b_n$, where G is a non-degenerate distribution, we say F belongs to the domain of attraction of G.

From theorem 2.3.1, we know that there are three possible domains of attraction of F, namely Fréchet (ξ > 0), Weibull (ξ < 0), and Gumbel (ξ = 0). Moreover, F can belong to only one of the three sub-classes. In practice, we can use the value of the tail index ξ to identify the domain of attraction. Hence we adopt the notation $F \in D(G_\xi)$ to indicate that F belongs to the domain of attraction of G with tail index ξ.

Definition 2.3.2 (Heavy-tailed distribution). F is a heavy-tailed distribution if $F \in D(G_\xi)$ for some tail index ξ > 0.

When F is known, one can easily find a quantile by inverting the cdf. Here we are interested in a higher quantile, τ close to 1, when F is unknown. We now give an outline of approximating a higher quantile of Y using EVT. Suppose $u \in \mathbb{R}$. One can interpret Y > u as an extreme event for high values of u. Consider the following quantity:
\[ 1 - F_u(y) = P(Y > y + u \mid Y > u) = \frac{P(Y > y + u)}{P(Y > u)} = \frac{1 - F(y + u)}{1 - F(u)}. \]
The above quantity can be interpreted as the conditional exceedance probability over a value u. We say $F_u(y) = P(Y \le y + u \mid Y > u)$ is the excess cdf of Y over a threshold u. If $Y_u = Y \mid Y > u$, then $P(Y_u < y + u) = F_u(y)$. The following theorem gives us the limiting distribution of $Y_u$ as $u \to \infty$.

Theorem 2.3.2 (Pickands–Balkema–de Haan). Let $Y_1, Y_2, \dots$ be independent random variables with common distribution function F. Define the conditional distribution of the excess, $Y - u \mid Y > u \sim F_u$, by
\[ F_u(y) = P(Y - u \le y \mid Y > u) = \frac{F(u + y) - F(u)}{1 - F(u)}. \]
If $F \in D(G_\xi)$ for some ξ satisfying theorem 2.3.1, then for large enough u, $F_u$ can be approximated by
\[ H(y) = \begin{cases} 1 - \big(1 + \frac{\xi y}{\sigma}\big)^{-1/\xi} & \text{if } \xi \ne 0 \\ 1 - e^{-y/\sigma} & \text{if } \xi = 0, \end{cases} \tag{2.7} \]
for some σ > 0.

Theorem 2.3.2 is also known as the second theorem of the extreme value literature, while theorem 2.3.1 by Fisher is known as the first theorem. The distribution H in theorem 2.3.2 is known as the generalized Pareto (GP) distribution with scale σ and tail index ξ. The standard GP distribution has scale σ = 1. We say $Y \sim \mathrm{GP}(\xi)$ if
\[ P(Y > y) = \begin{cases} (1 + \xi y)^{-1/\xi} & \text{if } \xi \ne 0 \\ e^{-y} & \text{if } \xi = 0, \end{cases} \tag{2.8} \]
where y ≥ 0 if ξ > 0 and 0 ≤ y ≤ −1/ξ if ξ < 0. Just as in the GEV case, the range of Y depends on ξ and leads to three different sub-classes within the GP family. Consider the reparametrization $Y^* = 1 + \xi Y$ and $\alpha = 1/\xi$. Then, for α > 0, $P(Y^* > y) = y^{-\alpha}$, $y \ge 1$, which is the standard Pareto distribution. When ξ < 0, we have a bounded distribution on $0 \le y \le -1/\xi$ with $P(Y^* > y) = y^{-1/\xi}$. Finally, for ξ = 0, $P(Y > y) = e^{-y}$ is the standard exponential distribution with rate 1. The entire GP family can be represented by introducing two additional parameters, σ for scale and µ for location. Let $\tilde Y = \mu + \sigma Y$; then we say $\tilde Y \sim \mathrm{GP}(\mu, \sigma, \xi)$ with
\[ P(\tilde Y > y) = \begin{cases} \Big[1 + \xi\big(\frac{y - \mu}{\sigma}\big)\Big]^{-1/\xi} & \text{if } \xi \ne 0 \\ \exp\Big[-\big(\frac{y - \mu}{\sigma}\big)\Big] & \text{if } \xi = 0. \end{cases} \]
Suppose $F \in D(G_\xi)$ satisfies theorem 2.3.1 with $\lim_{n\to\infty} F^n(a_n y + b_n)$ being $\mathrm{GEV}(\tilde\mu, \tilde\sigma, \xi)$. According to theorem 2.3.2, for large values of u the distribution of the excess $Y_u$ is well approximated by $\mathrm{GP}(\mu = 0, \sigma, \xi)$ with $\sigma = \tilde\sigma + \xi(u - \tilde\mu)$. The important thing to note here is that the shape parameter ξ of the GP is in fact the tail index of the GEV, while the scale parameter σ is a function of the GEV parameters and u.
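The theorem translates directly into the POT recipe. The sketch below (assuming scipy; the 0.95 threshold choice is ad hoc) fits a GP to the excesses of simulated Pareto data over a high threshold and extrapolates a far tail quantile via the identity $P(Y > q) = P(Y > u)\,P(Y - u > q - u \mid Y > u)$:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
y = rng.pareto(2.0, size=20_000) + 1.0  # standard Pareto, tail index xi = 0.5

u = np.quantile(y, 0.95)                # a high threshold
excesses = y[y > u] - u
p_u = np.mean(y > u)                    # empirical exceedance probability

# scipy's shape parameter c plays the role of the tail index xi.
xi_hat, _, sigma_hat = genpareto.fit(excesses, floc=0)

# Extrapolate the tau = 0.999 quantile from the fitted tail.
tau = 0.999
q_hat = u + genpareto.ppf(1 - (1 - tau) / p_u, xi_hat, scale=sigma_hat)
print(xi_hat, q_hat, np.quantile(y, tau))
```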
2.4 Statistical Modeling of Extreme Values with GP

Let Y denote the univariate random response variable and $X = (X_1, \dots, X_p)$ the p-dimensional random vector of covariates. $(y_i, x_i)$, $i = 1, \dots, n$ are random samples from the joint distribution of Y and X. We also assume that $Y \sim F_{Y|x}$. In the previous section we have seen that if $F \in D(G_\xi)$, then for a large enough value u the distribution of the excess $e_u = Y - u \mid Y > u$ can be approximated as
\[ \bar F_u(y) = P(Y - u > y \mid Y > u) \approx \Big(1 + \frac{\xi y}{\sigma}\Big)^{-1/\xi}, \qquad y \ge 0, \; (1 + \xi y/\sigma) > 0, \]
for some σ > 0. In a parametric regression setup, one has to incorporate the covariates' information in the parameters of the conditional distribution $F_{Y|x}$. This can be done in several ways. Let us begin with the following model:
\[ Y = (\theta + X^{\top}\gamma)\,\epsilon, \tag{2.9} \]
where $\theta \in \mathbb{R}$ and $\gamma = (\gamma_1, \dots, \gamma_p) \in \mathbb{R}^p$ are unknown parameters, $X^{\top}\gamma = \gamma_1 X_1 + \dots + \gamma_p X_p$, and $\theta + X^{\top}\gamma > 0$. We assume that $\epsilon \sim \mathrm{GP}(\xi)$ with some ξ > 0. Clearly, Y|x follows a GP distribution with µ = 0, tail index ξ, and scale $\sigma = \sigma(x) = \theta + x^{\top}\gamma$. Our goal is to estimate quantiles from model 2.9. Let 0 < τ < 1 and $Q_{Y|x}(\tau) = F_{Y|x}^{-1}(\tau)$. Then from 2.8, we have
\[ Q_{Y|x}(\tau) = (\theta + x^{\top}\gamma)\,Q(\tau) = \frac{\theta + x^{\top}\gamma}{\xi}\Big[(1-\tau)^{-\xi} - 1\Big]. \tag{2.10} \]
Here $Q(\tau) = \frac{1}{\xi}[(1-\tau)^{-\xi} - 1]$ is the τ quantile of the standard GP distribution. We need to estimate θ, γ, and ξ from the data. We can obtain these parameters by maximizing the likelihood, or equivalently minimizing the negative log-likelihood
\[ l(\theta, \gamma, \xi) = -\sum_{i=1}^{n} \log f(y_i | x_i) = \sum_{i=1}^{n} \Big[\log \sigma(x_i) + \Big(1 + \frac{1}{\xi}\Big)\log\Big(1 + \frac{\xi y_i}{\sigma(x_i)}\Big)\Big] = \sum_{i=1}^{n} \Big[\log(\theta + x_i^{\top}\gamma) + \Big(1 + \frac{1}{\xi}\Big)\log\Big(1 + \frac{\xi y_i}{\theta + x_i^{\top}\gamma}\Big)\Big]. \tag{2.11} \]
Note that 2.11 is valid only if $\sigma(x_i) = \theta + x_i^{\top}\gamma > 0$. Also, there is no analytical solution for 2.11. Since the range of the GP depends on its parameters, numerical optimization of 2.11 needs separate treatment for ξ < 0. From now onward, we will only focus on methods of parameter estimation for ξ ≥ 0; given ξ ≥ 0, the support of the GP is unbounded above. We numerically optimize 2.11 under the constraints ξ > 0 and $\theta + x_i^{\top}\gamma > 0$ for $i = 1, \dots, n$:
\[ \hat\theta, \hat\gamma, \hat\xi = \operatorname*{argmin}_{\theta,\gamma,\xi}\; l(\theta,\gamma,\xi) = \operatorname*{argmin}_{\theta,\gamma,\xi}\; \sum_{i=1}^{n}\Big[\log(\theta + x_i^{\top}\gamma) + \Big(1 + \frac{1}{\xi}\Big)\log\Big(1 + \frac{\xi y_i}{\theta + x_i^{\top}\gamma}\Big)\Big]. \]
Plugging these estimators into 2.10, we propose to estimate the conditional quantile from model 2.9 as
\[ \hat Q_{Y|x}(\tau) = \frac{\hat\theta + x^{\top}\hat\gamma}{\hat\xi}\Big[(1-\tau)^{-\hat\xi} - 1\Big]. \tag{2.12} \]
Note that, by theorem 2.3.2, we can reasonably approximate quantiles from model 2.9 by 2.12 as long as $Y|x \sim F \in D(G_\xi)$ for some ξ > 0. What happens when we extend model 2.9 as
\[ Y = \phi + (\theta + X^{\top}\gamma)\,\epsilon, \tag{2.13} \]
where $\phi \in \mathbb{R}$, $\theta + X^{\top}\gamma > 0$, and $\epsilon \sim \mathrm{GP}(\xi)$, so that $Y|x \sim \mathrm{GP}(\phi, \theta + x^{\top}\gamma, \xi)$? Now we have to estimate an additional parameter φ. We begin with our proposal in the following section.
Hence ξ(y − φ) σ(x) Note that in model 2.14, the true τ quantile is given by P (Y > y|x) = 1 + QY |x(τ ) = φ + (θ + xT γ)Q(τ ) = φ + (cid:105) (1 − τ )−ξ − 1 (2.15) (cid:20) (cid:21)−1/ξ (cid:104) (θ + xT γ) ξ We need to estimate the parameters φ, γ and ξ. If the true distribution of  is GP, we can i γ > 0 and yi − φ > 0 for all numerically optimize 2.16 under the constraints ξ > 0, θ + xT i = 1, . . . , n. ˆφ, ˆθ, ˆγ, ˆξ = argmin φ,θ,γ,ξ = argmin φ,θ,γ,ξ (cid:20) l(φ, θ, γ, ξ) n log(θ + xT i γ) + (1 + (cid:16) log n(cid:88) i=1 1 ξ ) 1 + ξ(yi − φ) θ + xT i γ (cid:17)(cid:21) (2.16) Then we can plug in estimates from 2.16 in 2.15 to obtain ˆQY |x(τ ). Note that the last condition yi − φ > 0 adds n many more constraints in 2.16. In model 2.9 we had only n + 1 constraints. Moreover, in many practical applications, the true distribution of  might not be GP. It can be some distribution in D(Gξ) with ξ > 0. Our aim is to find a reasonable estimator for quantiles in regression models in a more general setup. We propose our method in the following section. 2.5.1 Method of Estimation Suppose that we have a random sample (yi, xi), i = 1, . . . , n from the distribution of (Y, X) where yi ∈ R and xi = (xi1, . . . , xip) ∈ Rp. Y |x ∼ F where F ∼ GP(ξ) for some ξ > 0. Let 17 τc be a fixed value such that 0 < τc < 1. Let y(1) = min(y1, . . . , yn). We fit a GP model to ˜yi = yi − y(1); i = 1, . . . , n. Clearly, ˜yi ≥ 0. We can estimate θ, γ and ξ from the restricted likelihood as: θ,γ,ξ log(θ + xT l(θ, γ, ξ) = argmin ˆθ, ˆγ, ˆξ = argmin ξ ˜yi θ + xT i γ (2.17) i γ > 0 and ξ > 0. Here we do not estimate φ directly form the GP likelihood. Our advantage is a huge gain in efficiency as we avoid n additional with only n + 1 constraints: θ + xT i γ) + (1 + θ,γ,ξ i:˜yi>0 log 1 + constraints as described in 2.16. Finally, we propose the τ Quantile estimate for 2.15 as: ˆQY |x(τ ) = y(1) + = y(1) + (ˆθ + xT ˆγ) ˆξ (ˆθ + xT ˆγ) ˆξ (cid:104) (cid:104) (cid:105) (1 − τ )− ˆξ − 1 (cid:105) (1 − τ )− ˆξ − 1 (cid:34) (cid:88) (cid:16) (cid:88) i:˜yi>0 1 ξ ) (cid:17)(cid:35) (2.18) Algorithm 1 can be used to find quantile estimates with our proposed method. Algorithm 1: GP for scale family 1. Get ˜yi = yi − min(y1, . . . , yn) 2. Get ˆθ, ˆγ, ˆξ from 2.17 3. Given X = x, estimate the τ quantile as ˆQY |x(τ ) = y(1) + (ˆθ + xT ˆγ) ˆξ (cid:104) (cid:105) (1 − τ )− ˆξ − 1 In the following proposition, we establish the consistency of algorithm 1 for GP distribu- tion under the following compactness assumption on the covariate x. Assumption (A1): |xi| ≤ M for some M > 0. Proposition 2.5.1. Let yi = φ + (θ + x(cid:62) i γ)i, iid∼ GP (ξ). i 18 Let ˆφ = Y(1). With (cid:101)xi = [1, x(cid:62) i ] and(cid:101)γ = [θ, γ], let (cid:34) n(cid:88) log((cid:101)x(cid:62) ˆQY |x(τ ) = ˆφ +(cid:101)x(cid:62)ˆγ i=1 Estimate the τ quantile of Y as ˆγ, ˆξ = argmin i c) + (cid:18) (cid:16) 1 + c,d 1 d (cid:19) n(cid:88) (cid:32) log 1 + i=1 (cid:17) (1 − τ )− ˆξ − 1 ˆξ (cid:33)(cid:35) (cid:101)x(cid:62) d(yi − ˆφ) i c (2.19) Then, 1. 2. √ n((ˆγ, ˆξ) − ((cid:101)γ, ξ)) =⇒ N (0, Σ) √ n( ˆQY |x(τ ) − QY |x(τ )) =⇒ N (0, σ2) Proof. We first show that Y(1) is a consistent estimate of φ. √ P ( √ n) = n|Y(1) − φ| > δ) = P (Y(1) > φ + δ/ n((cid:101)x(cid:62) i (cid:101)γ) (cid:32) (cid:33)−1/ξ n(cid:89) nδ/M(cid:101)γ → 0, n → ∞ −√ ≈ e 1 + i=1 ξδ = n(cid:89) P (((cid:101)x(cid:62) i (cid:101)γ)i > δ/ (cid:32) ≤ n(cid:89) nM(cid:101)γ ξδ√ 1 + i=1 i=1 √ n) (cid:33)−1/ξ where M(cid:101)γ = M max|(cid:101)γj| by assumption (A1). 
Thus, ˆφ − φ = oP (1/ √ n). First, suppose φ is fixed, then (cid:20) n(cid:88) i=1 log(cid:101)xT i(cid:101)γ + (1 + 1 ξ ) (cid:16) log 1 + n(cid:88) i=1 ˆγ0, ˆξ0 = argmin(cid:101)γ,ξ Then, the MLE is consistent and satisfies (cid:17)(cid:21) θ +(cid:101)xT i(cid:101)γ ξ(yi − φ)  (cid:101)xi(cid:101)x(cid:62) i γ i=1 (2.20) (2.21) where I(γ, ξ) = √ n((ˆγ0, ˆξ0) − ((cid:101)γ, ξ)) =⇒ N (0, I(γ, ξ)−1)  (cid:80)n (cid:101)xi(cid:101)x(cid:62) ((cid:101)x(cid:62) (cid:80)n (1−ξ)(1−2ξ) (cid:80)n 1 ξ −1 i γ)2 −1 i=1 2 i (1−ξ)(1−2ξ) i=1 (1−ξ)(1−2ξ) (cid:101)x(cid:62) i(cid:101)x(cid:62) i γ 19 Thus, for φ fixed (ˆγ0, ˆξ0) is the solution of the equation n(cid:88) i=1 log(cid:101)xT i γ + (1 + ) log 1 + 1 ξ (cid:32) (cid:33) = 0 θ +(cid:101)xT ξ(yi − φ) i γ ∇(cid:101)γ,ξ which implies Now, (ˆγ0, ˆξ0) = (h1(φ), h2(φ)) (h1( ˆφ), h2( ˆφ)) ≈ (h1(φ) + ( ˆφ − φ)h(cid:48) 1(φ), h2(φ) + ( ˆφ − φ)h(cid:48) 2(φ)) Therefore, √ n((h1( ˆφ), h2( ˆφ)) − ((cid:101)γ, ξ)) ≈ √ n((h1( ˆφ), h2( ˆφ)) − ((cid:101)γ, ξ)) + √ n( ˆφ − φ)(h(cid:48) 1(φ), h(cid:48) 2(φ)) As shown previously, √ n( ˆφ − φ) = oP (1), thus √ n((ˆγ, ˆξ) − (γ, ξ)) =⇒ N (0, Σ) 1(φ), h(cid:48) where Σ = (h(cid:48) For part (2), note that 2(φ))I(γ, ξ)−1(h(cid:48) 1(φ), h(cid:48) 2(φ))(cid:62) with I(γ, ξ) same as in (2.21). √ n( ˆQY |x(τ ) − QY |x(τ )) √ √ n( ˆφ − φ) + n = √ = n( ˆφ − φ) + √ (cid:16) (cid:32)(cid:101)x(cid:62)ˆγ (cid:17) −(cid:101)x(cid:62)(cid:101)γ n(g(ˆγ, ˆξ) − g((cid:101)γ, ξ)) = oP (1) + (1 − τ )− ˆξ − 1 ˆξ √ ˆξ (cid:17)(cid:33) (cid:16) (1 − τ )− ˆξ − 1 n(g(ˆγ, ˆξ) − g((cid:101)γ, ξ)) Now, Since √ n(g(ˆγ, ˆξ) − g((cid:101)γ, ξ)) ≈ √ n((ˆγ, ˆξ) − ((cid:101)γ, ξ)) =⇒ N (0, Σ), thus √ n((ˆγ, ˆξ) − ((cid:101)γ, ξ))(cid:62)∇(cid:101)γ,ξg((cid:101)γ, ξ) n(g(ˆγ, ˆξ) − g((cid:101)γ, ξ)) =⇒ N (0, σ2) where σ2 = ∇(cid:101)γ,ξg((cid:101)γ, ξ)(cid:62)Σ∇(cid:101)γ,ξg((cid:101)γ, ξ). This completes the proof. √ 20 2.6 Simulation Study We want to assess our proposed GP scale model’s performance in estimating quantiles from regression models of the form Y = φ + (θ + XT γ). In this case, the quantile is linear in covariates. Throughout this section and later, we will concentrate on estimating quantile functions of the form: QY |x(τ ) = α(τ ) + xT β(τ ) where α(τ ) is the slope and β(τ ) = {β1(τ ), . . . , βp(τ )} is the intercept. For our regression model Y = φ + (θ + XT γ),  ∼ F QY |x(τ ) = φ + (θ + xT γ)Q(τ ) α(τ ) = φ + Q(τ ), β(τ ) = γQ(τ ) Since the quantiles are linear, quantile regression can be used without violating any model assumptions. Note that both the slope and the intercept of the target quantile are affected by the change in the quantile level τ. This effect is very significant at higher values of τ, particularly if  ∈ D(Gξ) for ξ > 0. Hence we want to see how different values of τ and ξ affect the quantile estimates. We start with the simple case: φ = 0 (case-I). In this case, we can directly fit GP distribution to the data with location µ = 0, scale σ(x) = θ + xT γ > 0 and shape ξ > 0. Since we assume that φ = 0, we do not estimate it from the data. Both the slope and the intercept of the target quantile can be estimated from a GP fit to the data. When φ (cid:54)= 0 (case-II), we use the scaled GP method as proposed in section 2.5, where we estimate φ as the minimum value in the data. We compare our method (GP.scale) with quantile regression (QR) for both case-I and II. We generate our data as follows: yi = φ + (1 + 0.9xi)ei, i = 1, . . . 
, n Here xi ∼ U (−1, 1) and ei are independent random variables, θ = 1 and γ = 0.9. Clearly, 1 + 0.9xi > 0 for all i = 1, . . . , n. We choose two different values φ = 0 and 2. For both 21 φ = 0 and φ = 2, we simulate ei from two distributions: exponential with mean λ = 1 (exp(1)) and standard GP with ξ = 0.5 (GP(0.5)). Standard exponential is the limit of the GP distribution in 2.3.2 for ξ = 0. We use two sample sizes n = 500 and 1000. The Monte Carlo sample size for all of the cases is K = 500. Figure 2.2: Quantile estimates for φ = 0, exp(1) at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet). The sample size is n = 500. First we look at the τ quantile estimates for case-I with ei ∼ exp(1). We use the Monte 22 0246qq.hat$q.trueq.truegp.scaleqr0.010.100.200.300.400.500.600.700.800.900.96(a) all quantiles0.00.51.01.52.02.53.03.5qq.hat$q.trueq.truegp.scaleqr0.010.150.300.450.600.750.90(b) lower quantiles34567qq.hat$q.trueq.truegp.scaleqr0.900.920.940.960.98(c) higher quantilesquantilet Figure 2.3: Quantile estimates for φ = 0, GP(0.5) at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet). The sample size is n = 500. Carlo average estimate for the τ quantile ¯Q(τ ) = 1 K K(cid:88) i=1 ˆQY |x(τ ) The black line in figure 2.2 is the true quantile (q.true) function. Average estimates from both of the methods are very close to q.true. For better visualization, we have included only average ± standard error (se) curves for both QR (orange) and GP.scale (violet). We plot for all quantiles 0 < τ < 1 in figure 2.2 (a). Both GP.scale and QR estimates have less 23 05101520253035qq.hat$q.trueq.truegp.scaleqr0.010.100.200.300.400.500.600.700.800.900.96(a) all quantiles01234567qq.hat$q.trueq.truegp.scaleqr0.010.150.300.450.600.750.90(b) lower quantiles5101520253035qq.hat$q.trueq.truegp.scaleqr0.900.920.940.960.98(c) higher quantilesquantilet Table 2.1: φ = 0, GP(0.5): rel.bias(se) of quantile estimates for φ = 0 at x = 0.5 and quantile levels τ. τ 0.1 0.3 0.5 0.7 0.9 0.93 0.95 0.97 0.99 n = 500 n = 1000 GP.scale 0.07(0.06) 0.06(0.05) 0.06(0.05) 0.06(0.04) 0.07(0.05) 0.08(0.06) 0.09(0.06) 0.11(0.08) 0.15(0.12) QR 0.14(0.11) 0.09(0.08) 0.08(0.06) 0.08(0.06) 0.1(0.08) 0.12(0.09) 0.14(0.11) 0.18(0.14) 0.29(0.31) GP.scale 0.05(0.03) 0.05(0.03) 0.04(0.03) 0.04(0.03) 0.05(0.04) 0.06(0.04) 0.06(0.05) 0.07(0.06) 0.1(0.08) QR 0.11(0.08) 0.07(0.05) 0.06(0.04) 0.06(0.04) 0.07(0.05) 0.08(0.06) 0.09(0.07) 0.11(0.09) 0.17(0.15) variability in lower quantiles. For better visibility, we separate our quantiles into two regions: τ ∈ (0.9, 1) (higher quantiles) and τ ∈ (0, 0.9) (lower quantiles). In figure 2.2 (b) we can see that both QR and GP.scale approximate lower quantiles pretty well. At higher quantiles, QR has more variability than GP.scale. To see if the behavior changes with heavier than exponential tails, we proceed to case-I where ei ∼ GP(0.5). that GP(0.5) quantiles have a much higher range than exp(1). We do not have any issues In figure 2.3 (a) we can see with QR at lower quantiles. In figure 2.3 (c) we can see that QR estimates have very high standard error, particularly for τ > 0.95. Note that from figures 2.2 and 2.3 we do not get any idea about the bias of the estimators. So we look at the relative bias from K iterations. 
First we look at the τ-quantile estimates for case I with e_i ∼ exp(1). We use the Monte Carlo average estimate of the τ quantile,

$$\bar{Q}(\tau) = \frac{1}{K}\sum_{k=1}^{K} \hat{Q}_{Y|x,k}(\tau).$$

[Figure 2.2: Quantile estimates for φ = 0, exp(1) at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet). The sample size is n = 500. Panels: (a) all quantiles, (b) lower quantiles, (c) higher quantiles.]

The black line in figure 2.2 is the true quantile function (q.true). Average estimates from both of the methods are very close to q.true; for better visualization, we include only the average ± standard error (se) curves for QR (orange) and GP.scale (violet). We plot all quantiles 0 < τ < 1 in figure 2.2 (a). Both the GP.scale and QR estimates have less variability at the lower quantiles. For better visibility, we separate the quantiles into two regions: τ ∈ (0.9, 1) (higher quantiles) and τ ∈ (0, 0.9) (lower quantiles). In figure 2.2 (b) we can see that both QR and GP.scale approximate the lower quantiles quite well. At the higher quantiles, QR has more variability than GP.scale.

To see whether this behavior changes with heavier-than-exponential tails, we proceed to case I with e_i ∼ GP(0.5).

[Figure 2.3: Quantile estimates for φ = 0, GP(0.5) at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet). The sample size is n = 500. Panels: (a) all quantiles, (b) lower quantiles, (c) higher quantiles.]

In figure 2.3 (a) we can see that the GP(0.5) quantiles have a much wider range than those of exp(1). We do not have any issues with QR at the lower quantiles, but in figure 2.3 (c) we can see that the QR estimates have very high standard error, particularly for τ > 0.95. Note that figures 2.2 and 2.3 give no indication of the bias of the estimators, so we also look at the relative bias over the K iterations:

$$\text{rel.bias} = \frac{1}{K}\sum_{k=1}^{K}\Big|\,1 - \frac{\hat{Q}_{Y|x,k}(\tau)}{Q_{Y|x}(\tau)}\Big|.$$

We have enumerated the relative bias for case I, GP(0.5) in table 2.1. Our true quantile values are very close to 0 for τ ≈ 0, which leads to a higher relative bias at the lowest quantiles; at the higher quantiles we also expect the rel.bias to be large. The relative bias for case I, exp(1) is given in table A.1 in the appendix.

Table 2.1: φ = 0, GP(0.5): rel.bias (se) of quantile estimates at x = 0.5 and quantile levels τ.

            n = 500                   n = 1000
τ      GP.scale     QR           GP.scale     QR
0.1    0.07 (0.06)  0.14 (0.11)  0.05 (0.03)  0.11 (0.08)
0.3    0.06 (0.05)  0.09 (0.08)  0.05 (0.03)  0.07 (0.05)
0.5    0.06 (0.05)  0.08 (0.06)  0.04 (0.03)  0.06 (0.04)
0.7    0.06 (0.04)  0.08 (0.06)  0.04 (0.03)  0.06 (0.04)
0.9    0.07 (0.05)  0.10 (0.08)  0.05 (0.04)  0.07 (0.05)
0.93   0.08 (0.06)  0.12 (0.09)  0.06 (0.04)  0.08 (0.06)
0.95   0.09 (0.06)  0.14 (0.11)  0.06 (0.05)  0.09 (0.07)
0.97   0.11 (0.08)  0.18 (0.14)  0.07 (0.06)  0.11 (0.09)
0.99   0.15 (0.12)  0.29 (0.31)  0.10 (0.08)  0.17 (0.15)

As we can see, looking at the relative bias alone is not enough. Hence, we also use the mean square error (mse), which accounts for both the bias and the variance:

$$\text{mse} = \frac{1}{K}\sum_{k=1}^{K}\big(Q_{Y|x}(\tau) - \hat{Q}_{Y|x,k}(\tau)\big)^2. \qquad (2.22)$$

Figure 2.4 (a) shows the mse of the quantile estimates for QR and GP.scale at all quantiles.

[Figure 2.4: mse (se) for φ = 0, GP(0.5) at x = 0.5 and quantile levels τ. The sample size is n = 500. Panels: (a) all quantiles, (b) lower quantiles, (c) higher quantiles.]

We can see that the mse is less than 0.6 at the lower quantiles for both methods. Beyond the 0.95 quantile, the mse of both QR and GP.scale increases at a very high rate, but relative to QR, GP.scale is more stable, particularly at the higher quantiles. Note that we have very few observations above the 0.9 quantile; with a sample size of n = 500 we cannot expect to do much better. We summarise these results in table 2.2 for selected quantile levels. Clearly, for n = 500 the mse of QR starts to explode after τ = 0.95. When we increase the sample size to 1000, the performance of QR improves, but it does not surpass GP.scale.

Table 2.2: φ = 0, GP(0.5): mse (se) of quantile estimates at x = 0.5 and quantile levels τ.

            n = 500                     n = 1000
τ      GP.scale      QR              GP.scale      QR
0.1    0 (0)         0 (0)           0 (0)         0 (0)
0.3    0 (0)         0 (0)           0 (0)         0 (0)
0.5    0.01 (0)      0.01 (0)        0 (0)         0.01 (0)
0.7    0.03 (0)      0.06 (0)        0.02 (0)      0.03 (0)
0.9    0.29 (0.02)   0.62 (0.04)     0.15 (0.01)   0.33 (0.02)
0.93   0.61 (0.04)   1.41 (0.1)      0.31 (0.02)   0.67 (0.04)
0.95   1.23 (0.08)   3.16 (0.23)     0.61 (0.04)   1.35 (0.09)
0.97   3.39 (0.23)   9.72 (0.76)     1.61 (0.12)   3.77 (0.26)
0.99   25.32 (2.04)  120.95 (19.78)  11.29 (0.91)  33.62 (3.03)

Note that the rel.bias in table 2.1 cannot reflect these details, as the magnitude of the bias is proportional to the value of the true quantile; we do not have this issue with the mse. For case I, exp(1), the magnitude of the bias is much lower than for GP(0.5) at all quantiles. These results are given in appendix figure A.1 and table A.2.
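Given the K Monte Carlo estimates of a fixed true quantile, both criteria are one-liners in R; a minimal sketch (rel_bias and mse are our helper names):

    # rel.bias and mse over K Monte Carlo replicates, following (2.22);
    # q_hat is the vector of K estimates, q the true quantile Q_{Y|x}(tau)
    rel_bias <- function(q_hat, q) mean(abs(1 - q_hat / q))
    mse      <- function(q_hat, q) mean((q - q_hat)^2)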
Now we move to case II, where φ = 2. First we simulate e_i ∼ GP(0.5). We want to see how our estimates fluctuate at the higher quantiles (τ > 0.9), using the two sample sizes n = 500 and 1000. Figure 2.5 shows the quantile function and the estimates. As in case I, both the QR and GP.scale estimates improve with increasing sample size.

[Figure 2.5: Quantile estimates for φ = 2 at x = 0.5 and quantile levels τ. The black line is the true quantile function. Average ± standard error curves of the estimates are plotted for QR (orange) and GP.scale (violet), for n = 500 and n = 1000.]

Table 2.3: φ = 2, GP(0.5): rel.bias (se) of quantile estimates at x = 0.5 and quantile levels τ.

            n = 500                   n = 1000
τ      GP.scale     QR           GP.scale     QR
0.1    0.01 (0)     0.01 (0.01)  0 (0)        0.01 (0.01)
0.3    0.01 (0.01)  0.02 (0.02)  0.01 (0.01)  0.01 (0.01)
0.5    0.02 (0.02)  0.03 (0.02)  0.02 (0.01)  0.02 (0.02)
0.7    0.03 (0.02)  0.04 (0.03)  0.02 (0.02)  0.03 (0.02)
0.9    0.05 (0.04)  0.08 (0.06)  0.04 (0.03)  0.06 (0.04)
0.93   0.06 (0.05)  0.09 (0.07)  0.05 (0.03)  0.07 (0.05)
0.95   0.07 (0.06)  0.11 (0.08)  0.05 (0.04)  0.08 (0.06)
0.97   0.09 (0.07)  0.14 (0.1)   0.06 (0.05)  0.1 (0.08)
0.99   0.14 (0.11)  0.26 (0.25)  0.1 (0.07)   0.17 (0.14)

In table 2.3, we can see that the relative bias is an increasing function of the quantile level τ for both methods. Note that the true quantile value is always more than φ = 2, since Q_{Y|x}(τ) = 2 + (1 + 0.9x)Q_ε(τ) ≥ 2 for all τ ∈ [0, 1]. For all quantile values, GP.scale has lower rel.bias than QR, even when we increase the sample size to 1000; moreover, QR has very high relative bias at τ = 0.97 and 0.99. The mse in figure 2.6 conveys the same picture as the relative bias, and figure 2.6 is easier to interpret than table 2.3. Moreover, the mse for both case I and case II reflects the magnitude of the bias. Hence, from now onward we will only use the mse as a measure for comparing different estimates.

[Figure 2.6: mse ± se for φ = 2, GP(0.5) and quantile levels τ, for n = 500 and n = 1000.]

We have summarised the mse values at selected quantile levels for case II, GP(0.5) in table 2.4. It is evident that GP.scale is preferable to QR at the higher quantiles.

Table 2.4: φ = 2, GP(0.5): mse (se) of quantile estimates at x = 0.5 and quantile levels τ.

            n = 500                     n = 1000
τ      GP.scale      QR              GP.scale      QR
0.5    0.01 (0)      0.01 (0)        0 (0)         0.01 (0)
0.75   0.05 (0)      0.09 (0.01)     0.02 (0)      0.04 (0)
0.9    0.32 (0.02)   0.67 (0.05)     0.16 (0.01)   0.36 (0.02)
0.93   0.65 (0.04)   1.39 (0.1)      0.32 (0.02)   0.76 (0.05)
0.95   1.27 (0.08)   2.47 (0.16)     0.63 (0.04)   1.49 (0.1)
0.97   3.39 (0.22)   7.84 (0.51)     1.66 (0.1)    4.29 (0.29)
0.99   23.9 (1.68)   100.11 (14.26)  11.6 (0.73)   38.78 (3.24)

From our simulation studies, we see that the tails are approximated well with GP.scale when the observations come from the scale model 2.9 and the underlying distribution is GP or exponential. Motivated by this, we extend our method to the location-scale family of regression models in the next chapter.

CHAPTER 3
ESTIMATION OF EXTREME CONDITIONAL QUANTILES WITH COVARIATE ADAPTIVE THRESHOLD SELECTION

3.1 Introduction

The study of the tail behavior of a distribution is helpful for the analysis of rare events. Often, the Generalized Pareto (GP) distribution is used for modeling the tail of a distribution; this approach is popularly known as the Peak Over Threshold (POT) approach, in which a GP distribution is fitted to the tail region of the data using only the observations exceeding a certain threshold. A fundamental result in the extreme value literature by Pickands (1975) showed that the GP distribution can be used to model the tail as long as the underlying distribution satisfies certain regularity conditions. Let Z be a random variable with distribution function F. Pickands (1975) showed that the distribution of the conditional excess Z − u | Z > u can be approximated by a GP distribution as long as F belongs to the maximum domain of attraction of an extreme value distribution. It is hard to find distributions which do not belong to a maximum domain of attraction; hence, the POT approach can be used even when F is unknown. It is important to keep in mind, however, that the GP approximation is valid only when the threshold u is large.
As pointed out by Song & Song (2012a), selecting too low a threshold leads to a poor GP approximation of the excess, and hence the resulting estimates have high bias. On the other hand, when the threshold is too high, the effective sample size used for estimating the parameters is small, which leads to high variance. Thus, choosing an optimal threshold is necessary for the bias-variance tradeoff. Once the threshold is determined, the remaining parameters, scale and shape, are estimated from the GP fit to the scaled excess data. Langousis et al. (2016) have nicely summarised the available methods for threshold selection. Davison & Smith (1990) introduced statistical modeling of peaks over threshold with the GP distribution where the parameters of the GP distribution are allowed to vary as a function of covariates. Some of the more recent works which use a covariate-based GP distribution for modeling peaks over threshold include Park & Kim (2016), Song & Song (2012b), and del Castillo & Serra (2015). A comprehensive review of GP modeling where the parameters of the GP vary as a function of the covariates is presented in Coles et al. (2001). In all these works, the threshold above which the GP is fit is chosen among the set of empirical quantiles of the response variable. To the best of our knowledge, the problem of choosing the threshold as a function of the covariate has not been addressed in the literature. The main focus of this chapter is to develop a method which can choose a covariate dependent threshold while still suitably estimating the covariate dependent parameters of the GP distribution.

In this chapter, we approach the conditional quantile estimation problem where the threshold can vary across different covariates. Integrating QR and EVT, we propose a consistent estimator of the quantile function. This method is quite flexible in the sense that it can be easily adapted for distributions in the heavy-tail domain of attraction. Although there are relatively few works which address the estimation of high conditional quantiles for heavy-tailed distributions, the one by Wang et al. (2012) presents a very promising method for this purpose. Their method uses standard QR to estimate intermediate quantiles and then employs EVT to extrapolate this information to extreme quantiles. Their approach is semi-parametric and uses the Hill estimator (see Hill (1975)) for estimating the tail behavior. We compare our parametric approach based on the GP distribution to that of Wang et al. (2012) in terms of accuracy of estimation for both quantiles and the tail index. The parametric approach is the best method when the observations are indeed from a Pareto or Generalized Pareto distribution. For other distributions in the Fréchet domain of attraction, the method is fairly competitive and comes with several advantages in terms of practical implementation.

The rest of this chapter is organized as follows. In section 3.2 we propose our method (QR adjusted GP) for the location-scale family of regression models. In 3.2.1, we establish asymptotic properties of QR adjusted GP in the context of the GP distribution. We also demonstrate the efficiency of our approach over standard QR and our previous method (GP.scale, section 2.5) through numerical investigations in 3.2.2. In section 3.3 we propose a generalization of our method for heavy-tailed distributions. We give an outline of the analytical properties of this extension in 3.3.1.
In section 3.4, we explore conditional quantile estimation in the heavy-tail domain of attraction through a simulation study. In 3.4.1 we review the high conditional quantile estimation method of Wang et al. (2012), in which the slopes of the quantile function also change with the quantile level. We compare our method with Wang et al. (2012) for different distributions in the heavy-tail domain of attraction in 3.4.2. From our findings, we conclude that our proposed method performs better than or as well as Wang et al. (2012). Finally, in section 3.5 we apply our method to the precipitation downscaling task with rainfall data.

3.2 Proposed Method: QR Adjusted GP for Location-Scale Models

In chapter 2, we have seen that our proposed method is suitable for the following family of regression models:

$$Y = \phi + (\theta + X^\top\gamma)\,\epsilon, \qquad (3.1)$$

where φ ∈ R and θ + X^T γ > 0, and ε ∼ GP(µ = 0, σ = 1, ξ). In this case, Y | X = x ∼ GP(µ = φ, σ(x) = θ + x^T γ, ξ); hence, the conditional distribution of Y depends on x only through the scale parameter σ(x) = θ + x^T γ. Now we concentrate on the general case:

$$Y = \phi + X^\top\rho + (\theta + X^\top\gamma)\,\epsilon. \qquad (3.2)$$

Here the conditional distribution of Y given X = x depends on x through both the scale σ(x) = θ + x^T γ and the location µ(x) = φ + x^T ρ. Clearly, model 3.1 is a special case of 3.2 with ρ = 0. Let ỹ_i = (y_i − µ(x_i))/σ(x_i). If we try to fit the GP distribution to the observations directly, the parameters of 3.2 can be estimated as

$$\begin{aligned} (\hat\phi, \hat\rho, \hat\theta, \hat\gamma, \hat\xi) &= \mathop{\mathrm{argmin}}_{\phi,\rho,\theta,\gamma,\xi}\; l(\phi,\rho,\theta,\gamma,\xi) \\ &= \mathop{\mathrm{argmin}}_{\phi,\rho,\theta,\gamma,\xi} \left[ \sum_{i:\tilde{y}_i>0} \log(\sigma(x_i)) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\tilde{y}_i>0} \log\Big(1+\frac{\xi\,(y_i-\mu(x_i))}{\sigma(x_i)}\Big) \right] \\ &= \mathop{\mathrm{argmin}}_{\phi,\rho,\theta,\gamma,\xi} \left[ \sum_{i:\tilde{y}_i>0} \log(\theta + x_i^\top\gamma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\tilde{y}_i>0} \log\Big(1+\frac{\xi\,(y_i-\phi-x_i^\top\rho)}{\theta + x_i^\top\gamma}\Big) \right]. \end{aligned} \qquad (3.3)$$

We have to minimize the loss function in 3.3 subject to ξ > 0, σ(x_i) > 0, and y_i − µ(x_i) > 0 for all i = 1, ..., n. As with the optimization problem 2.11 in the previous chapter, there is no analytical solution; we need to solve this numerically with 2n + 1 constraints, n of which come from the condition y_i − µ(x_i) > 0. Our proposed method reduces the computational challenge by removing these constraints. We discuss the details in the next section.

3.2.1 Method of Estimation

We assume the following location-scale regression model:

$$Y = \phi + X^\top\rho + (\theta + X^\top\gamma)\,\epsilon, \qquad (3.4)$$

where φ, θ ∈ R and ρ = (ρ_1, ..., ρ_p), γ = (γ_1, ..., γ_p) in R^p are unknown parameters, with θ + X^T γ > 0 for all X in its domain. X^T γ = γ_1 X_1 + ... + γ_p X_p and X^T ρ = ρ_1 X_1 + ... + ρ_p X_p are linear combinations of the covariates in equation 3.4. Given X = x and a quantile level 0 < τ < 1, the conditional quantile is

$$Q_{Y|x}(\tau) = \phi + x^\top\rho + (\theta + x^\top\gamma)\,Q_\epsilon(\tau).$$

Suppose ε follows the standard GP distribution with tail index ξ > 0, i.e., ε ∼ GP(ξ). Then Y given X = x follows a GP distribution with location µ(x) = φ + x^T ρ, scale σ(x) = θ + x^T γ > 0, and tail index ξ > 0, and the conditional quantile is

$$Q_{Y|x}(\tau) = \phi + x^\top\rho + \frac{\theta + x^\top\gamma}{\xi}\Big[(1-\tau)^{-\xi} - 1\Big]. \qquad (3.5)$$

Through our numerical investigation in section 2.6, we have seen that for the scale family of models Y = φ + (θ + X^T γ)ε, ε ∼ GP(ξ), quantile regression can reasonably approximate any lower quantile τ ∈ (0, 0.9), but suffers from very high variability, particularly when τ > 0.95. Thus, although we need an alternative to quantile regression when τ is close to 1, we can still use the information from the quantile regression estimates at lower quantiles.
For the covariate-free threshold µ = φ, we can use the first order statistic Y_(1) to estimate µ. Motivated by this, we propose using the quantile regression estimate at quantile level τ_c = 1/n as our threshold:

$$\hat\mu(x) = \widehat{\phi + x^\top\rho} = \hat{QR}_{Y|x}\Big(\frac{1}{n}\Big) = \hat\alpha\Big(\frac{1}{n}\Big) + x^\top\hat\beta\Big(\frac{1}{n}\Big). \qquad (3.6)$$

We obtain α̂(1/n) and β̂(1/n) from equation 2.3. Suppose

$$e\Big(\frac{1}{n}, x\Big) = Y - \hat{QR}_{Y|x}\Big(\frac{1}{n}\Big) = Y - \hat\mu(x).$$

Then, mimicking the procedure in 2.17, we estimate the rest of the parameters θ, γ, and ξ as

$$\begin{aligned} (\hat\theta, \hat\gamma, \hat\xi) &= \mathop{\mathrm{argmin}}_{\theta,\gamma,\xi}\; l(\theta,\gamma,\xi) \\ &= \mathop{\mathrm{argmin}}_{\theta,\gamma,\xi} \left[ \sum_{i:\, e(1/n,\,x_i)>0} \log(\theta + x_i^\top\gamma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\, e(1/n,\,x_i)>0} \log\Big(1+\frac{\xi\, e(1/n, x_i)}{\theta + x_i^\top\gamma}\Big) \right]. \end{aligned} \qquad (3.7)$$

Finally, using 3.7 and 3.6, we can estimate the quantile function in 3.5 as

$$\hat{Q}_{Y|x}(\tau) = \hat{QR}_{Y|x}\Big(\frac{1}{n}\Big) + \frac{\hat\theta + x^\top\hat\gamma}{\hat\xi}\Big[(1-\tau)^{-\hat\xi} - 1\Big].$$

The estimation procedure is outlined in algorithm 2, followed by a short R sketch.

Algorithm 2: QR adjusted GP
1. Get $\hat{QR}_{Y|x_i}(1/n) = \hat\alpha(1/n) + x_i^\top\hat\beta(1/n)$ from 3.6.
2. Get $e(1/n, x_i) = y_i - \hat{QR}_{y_i|x_i}(1/n)$.
3. Get $\hat\xi$, $\hat\theta$, and $\hat\gamma$ from 3.7.
4. Given X = x and 0 < τ < 1, estimate the τ quantile as
$$\hat{Q}_{Y|x}(\tau) = \hat{QR}_{Y|x}(1/n) + \frac{\hat\theta + x^\top\hat\gamma}{\hat\xi}\Big[(1-\tau)^{-\hat\xi} - 1\Big].$$
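The following R sketch implements algorithm 2 for a single covariate. It relies on the rq function of the quantreg package for the quantile regression step; gp_negloglik and qr_gp are our illustrative names, and the crude penalty-based optim call stands in for a more careful constrained optimizer.

    library(quantreg)

    # Negative GP log-likelihood with linear scale sigma(x) = theta + gamma*x, as in 3.7
    gp_negloglik <- function(par, x, e) {
      theta <- par[1]; gamma <- par[2]; xi <- par[3]
      sig <- theta + gamma * x
      z   <- 1 + xi * e / sig
      if (xi <= 0 || any(sig <= 0) || any(z <= 0)) return(1e10)  # crude constraint handling
      sum(log(sig)) + (1 + 1 / xi) * sum(log(z))
    }

    # Algorithm 2: QR threshold at tau_c = 1/n, GP fit to the positive residuals
    qr_gp <- function(x, y, tau, tau_c = 1 / length(y)) {
      fit  <- rq(y ~ x, tau = tau_c)               # step 1: covariate adaptive threshold
      e    <- y - fitted(fit)                      # step 2: residuals
      keep <- e > 0
      par  <- optim(c(1, 0, 0.5), gp_negloglik,    # step 3: GP fit
                    x = x[keep], e = e[keep])$par
      # step 4: tau-quantile estimate at the observed covariate values
      fitted(fit) + (par[1] + par[2] * x) / par[3] * ((1 - tau)^(-par[3]) - 1)
    }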
In the following proposition, we establish the consistency of algorithm 2 for the GP distribution under the compactness assumption (A1) on the covariate x.

Proposition 3.2.1. Let $y_i = \phi + x_i^\top\rho + (\theta + x_i^\top\gamma)\,\epsilon_i$ with $\epsilon_i \overset{iid}{\sim} GP(\xi)$. With $\tilde{x}_i = [1, x_i^\top]$, $\tilde\gamma = [\theta, \gamma]$, and $\tilde\rho = [\phi, \rho]$, let $\widehat{\tilde{x}_i^\top\tilde\rho} = \hat{QR}_{Y|x_i}(1/n)$ and

$$(\hat\gamma, \hat\xi) = \mathop{\mathrm{argmin}}_{c,\,d} \left[ \sum_{i=1}^{n} \log(\tilde{x}_i^\top c) + \Big(1+\frac{1}{d}\Big) \sum_{i=1}^{n} \log\Big(1+\frac{d\,(y_i - \widehat{\tilde{x}_i^\top\tilde\rho})}{\tilde{x}_i^\top c}\Big) \right], \qquad (3.8)$$

and estimate the τ quantile of Y as

$$\hat{Q}_{Y|x}(\tau) = \widehat{\tilde{x}^\top\tilde\rho} + \frac{\tilde{x}^\top\hat\gamma}{\hat\xi}\Big((1-\tau)^{-\hat\xi} - 1\Big).$$

Then,

1. $\sqrt{n}\big((\hat\gamma, \hat\xi) - (\tilde\gamma, \xi)\big) \Longrightarrow N(0, \Sigma)$;
2. $\sqrt{n}\big(\hat{Q}_{Y|x}(\tau) - Q_{Y|x}(\tau)\big) \Longrightarrow N(0, \sigma^2)$.

Proof. We first show that $\hat{QR}_{Y|x_i}(1/n)$ is a consistent estimate of $\tilde{x}_i^\top\tilde\rho$. In this direction, note that by relation (7.1) in Chernozhukov (2005), we have

$$\sqrt{n}\big(\hat{QR}_{Y|x_i}(1/n) - Q_{Y|x_i}(1/n)\big) \sim N\Big(0,\; \frac{(1/n)(1-1/n)}{f^2(F^{-1}(1/n))}\,\big(E(XX^\top)\big)^{-1}\Big),$$

where f and F denote the density and cumulative distribution function of GP(ξ), respectively. Since $F^{-1}(1/n) = (1/\xi)\big((1-1/n)^{-\xi}-1\big) \to 0$, we have $f^2(F^{-1}(1/n)) \to 1$. Hence,

$$\sqrt{n}\big(\hat{QR}_{Y|x_i}(1/n) - Q_{Y|x_i}(1/n)\big) = o_P(1).$$

Also,

$$\sqrt{n}\big(Q_{Y|x_i}(1/n) - \tilde{x}_i^\top\tilde\rho\big) = \sqrt{n}\;\tilde{x}_i^\top\tilde\gamma\;\frac{(1-1/n)^{-\xi}-1}{\xi} \lesssim \frac{\sqrt{n}\,M_{\tilde\gamma}}{n} = o_P(1),$$

where $M_{\tilde\gamma} = M \max_j |\gamma_j|$ by assumption (A1). Hence,

$$\sqrt{n}\big(\hat{QR}_{Y|x_i}(1/n) - \tilde{x}_i^\top\tilde\rho\big) = o_P(1).$$

First, suppose $\tilde{x}_i^\top\tilde\rho$ is known. Then

$$(\hat\gamma_0, \hat\xi_0) = \mathop{\mathrm{argmin}}_{\tilde\gamma,\,\xi} \left[ \sum_{i=1}^{n} \log(\tilde{x}_i^\top\tilde\gamma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i=1}^{n} \log\Big(1+\frac{\xi\,(y_i - \tilde{x}_i^\top\tilde\rho)}{\tilde{x}_i^\top\tilde\gamma}\Big) \right] \qquad (3.9)$$

and

$$\sqrt{n}\big((\hat\gamma_0, \hat\xi_0) - (\tilde\gamma, \xi)\big) \Longrightarrow N\big(0, I(\tilde\gamma, \xi)^{-1}\big),$$

where $I(\tilde\gamma, \xi)$ is the same as defined in (2.21). Thus, for $\tilde{x}_i^\top\tilde\rho$ known, $(\hat\gamma_0, \hat\xi_0)$ is the solution of the equation

$$\nabla_{\tilde\gamma,\xi} \left[ \sum_{i=1}^{n} \log(\tilde{x}_i^\top\tilde\gamma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i=1}^{n} \log\Big(1+\frac{\xi\,(y_i - \tilde{x}_i^\top\tilde\rho)}{\tilde{x}_i^\top\tilde\gamma}\Big) \right] = 0,$$

which implies $(\hat\gamma_0, \hat\xi_0) = \big(h_1(\tilde{x}_i^\top\tilde\rho), h_2(\tilde{x}_i^\top\tilde\rho)\big)$. Now, by a Taylor expansion,

$$\big(h_1(\widehat{\tilde{x}_i^\top\tilde\rho}), h_2(\widehat{\tilde{x}_i^\top\tilde\rho})\big) \approx \big(h_1(\tilde{x}_i^\top\tilde\rho) + (\widehat{\tilde{x}_i^\top\tilde\rho} - \tilde{x}_i^\top\tilde\rho)\,h_1'(\tilde{x}_i^\top\tilde\rho),\; h_2(\tilde{x}_i^\top\tilde\rho) + (\widehat{\tilde{x}_i^\top\tilde\rho} - \tilde{x}_i^\top\tilde\rho)\,h_2'(\tilde{x}_i^\top\tilde\rho)\big).$$

Therefore,

$$\sqrt{n}\big((h_1(\widehat{\tilde{x}_i^\top\tilde\rho}), h_2(\widehat{\tilde{x}_i^\top\tilde\rho})) - (\tilde\gamma, \xi)\big) \approx \sqrt{n}\big((h_1(\tilde{x}_i^\top\tilde\rho), h_2(\tilde{x}_i^\top\tilde\rho)) - (\tilde\gamma, \xi)\big) + \sqrt{n}\big(\widehat{\tilde{x}_i^\top\tilde\rho} - \tilde{x}_i^\top\tilde\rho\big)\,\big(h_1'(\tilde{x}_i^\top\tilde\rho), h_2'(\tilde{x}_i^\top\tilde\rho)\big).$$

As shown previously, $\sqrt{n}(\widehat{\tilde{x}_i^\top\tilde\rho} - \tilde{x}_i^\top\tilde\rho) = o_P(1)$; thus

$$\sqrt{n}\big((\hat\gamma, \hat\xi) - (\tilde\gamma, \xi)\big) \Longrightarrow N(0, \Sigma),$$

where $\Sigma = (h_1', h_2')\, I(\tilde\gamma, \xi)^{-1}\, (h_1', h_2')^\top$ evaluated at $\tilde{x}_i^\top\tilde\rho$, with $I(\tilde\gamma, \xi)$ the same as in (2.21). For part (2), note that with $g(\tilde\gamma, \xi) = \frac{\tilde{x}^\top\tilde\gamma}{\xi}\big((1-\tau)^{-\xi}-1\big)$,

$$\sqrt{n}\big(\hat{Q}_{Y|x}(\tau) - Q_{Y|x}(\tau)\big) = \sqrt{n}\big(\widehat{\tilde{x}^\top\tilde\rho} - \tilde{x}^\top\tilde\rho\big) + \sqrt{n}\big(g(\hat\gamma,\hat\xi) - g(\tilde\gamma,\xi)\big) = o_P(1) + \sqrt{n}\big(g(\hat\gamma,\hat\xi) - g(\tilde\gamma,\xi)\big).$$

Now,

$$\sqrt{n}\big(g(\hat\gamma,\hat\xi) - g(\tilde\gamma,\xi)\big) \approx \sqrt{n}\big((\hat\gamma,\hat\xi) - (\tilde\gamma,\xi)\big)^\top \nabla_{\tilde\gamma,\xi}\,g(\tilde\gamma,\xi).$$

Since $\sqrt{n}((\hat\gamma,\hat\xi) - (\tilde\gamma,\xi)) \Longrightarrow N(0,\Sigma)$, it follows that $\sqrt{n}(g(\hat\gamma,\hat\xi) - g(\tilde\gamma,\xi)) \Longrightarrow N(0,\sigma^2)$, where $\sigma^2 = \nabla_{\tilde\gamma,\xi}\,g(\tilde\gamma,\xi)^\top\,\Sigma\,\nabla_{\tilde\gamma,\xi}\,g(\tilde\gamma,\xi)$. This completes the proof.

3.2.2 Simulation Study: GP Distribution

We want to assess the performance of our method when the data are simulated from the following regression model:

$$\begin{aligned} Y &= \phi + X^\top\rho + (\theta + X^\top\gamma)\,\epsilon, \qquad \epsilon \sim GP(\xi), \\ Q_{Y|x}(\tau) &= \phi + x^\top\rho + (\theta + x^\top\gamma)\,Q_\epsilon(\tau), \\ \alpha(\tau) &= \phi + \theta\,Q_\epsilon(\tau), \qquad \beta(\tau) = \rho + \gamma\,Q_\epsilon(\tau). \end{aligned} \qquad (3.10)$$

Again the quantiles are linear in x, so quantile regression can be used without violating any model assumption. We simulate iid observations (x_i, y_i), i = 1, ..., n from 3.10 with φ = θ = 1, ρ = (1, 1), and γ = (0.9, 0), so that

$$y_i = 1 + x_{1i} + x_{2i} + (1 + 0.9\,x_{1i})\,e_i.$$

We simulate x_{ij} ∼ U(−1, 1) for j = 1, 2, so that 1 + 0.9x_{1i} > 0 for all i = 1, ..., n. We want to see how QR and GP.scale perform compared to our proposed QR adjusted GP (QR.GP) in estimating the higher quantiles above 0.9. We simulate e_i from two distributions: exponential with rate λ = 1 (exp(1)) and standard GP with ξ = 0.5 and 0.8 (GP(ξ)). We use two sample sizes, n = 500 and 1000.
From proposition 3.2.1, we know that τ_c = 1/n leads to consistent estimation of higher quantiles with QR.GP when the true distribution is GP. Here we want to validate our proposition through simulations. We also investigate the performance of QR.GP under varying thresholds; for that purpose, we choose τ_c from a grid of values in (0, 1). When the data are generated from a GP or exponential distribution, values of τ_c close to 0 should be the best choice.

We start with the case e_i ∼ GP(0.5). We use mse as our measure of assessment, and the Monte Carlo sample size is K = 500. The results for sample size n = 500 are summarised in figure 3.1. We can see in 3.1 (a) that GP.scale is not appropriate in this case: it fails to capture the effect of the covariate x at any given higher quantile 0.9 ≤ τ ≤ 0.99. QR.GP performs better than QR for threshold τ_c = 0.01 at any quantile level 0.9 ≤ τ ≤ 0.99. In 3.1 (b), we can see how the mse of QR.GP changes for different values of τ_c when the quantile level is τ = 0.93; QR.GP has lower mse than QR for all τ_c except τ_c = 0.95. When we increase the quantile level to τ = 0.97, QR.GP has lower mse than QR for all values of τ_c (figure 3.1 (c)).

[Figure 3.1: mse ± se for GP(0.5) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 500. (a): τ_c = 0.01 for QR.GP; (b) and (c): 0 < τ_c < 1.]

Now we increase our sample size to n = 1000. In figure 3.2 (a) we can see that even with the increase in sample size, QR.GP outperforms QR at τ_c = 0.01. From figures 3.2 (b) and (c), we get an understanding of the effect of τ_c on the quantile estimated by QR.GP: the mse is an increasing function of τ_c in this case. This is bound to happen, since the effective sample size used for estimation decreases as we increase τ_c. Note that according to algorithm 2, given τ_c, we only use {y_i : y_i − x_i^T Q̂R(τ_c) > 0} to estimate the scale and shape parameters. We get the best performance from QR.GP when τ_c = 0.01, and we can comfortably choose any τ_c as long as τ_c ≤ 0.85. Any choice of τ_c higher than 0.85 leads to inconsistent estimation of the GP parameters, as we have very few observations above the threshold.

Sparsity of the observations in the tail of a distribution is associated with higher values of the tail index ξ. Next we simulate samples of size n = 1000 from the GP with tail index ξ = 0.8. In figure 3.3 (a) we can see that the magnitude of the mse is higher than in the previous case.

[Figure: mse ± se for QR and QR.GP across quantile levels 0.9 ≤ τ ≤ 0.99; panels (a) n = 1000, (b) n = 2000.]

3.3 QR Adjusted GP for Heavy Tailed Distributions

When we do not know F, we can use the GP approximation to compute a higher quantile. First we focus on the non-regression case. Let Y ∼ F, where F ∈ D(G_ξ) for some ξ > 0. Then, by Theorem 2.3.2, for a large enough value u, Y | Y > u is approximately distributed according to a GP distribution. Hence, for any y > u,

$$P(Y > y \mid Y > u) = \Big(1 + \frac{\xi\,(y-u)}{\sigma}\Big)^{-1/\xi}.$$

Let y_τ be the τ quantile of Y such that y_τ ≥ u. Then

$$P(Y > y_\tau) = P(Y > u)\,P(Y > y_\tau \mid Y > u) = \bar{F}(u)\Big(1 + \frac{\xi\,(y_\tau - u)}{\sigma}\Big)^{-1/\xi} = 1 - \tau;$$

therefore,

$$y_\tau = Q_Y(\tau) = u + \frac{\sigma}{\xi}\left[\Big(\frac{1-\tau}{\bar{F}(u)}\Big)^{-\xi} - 1\right]. \qquad (3.11)$$

Clearly, if we know F, we can substitute the exact expression for F̄(u) in 3.11. In practice we do not know F, so we estimate F̄(u) = E(I(Y > u)) with $\hat{p}_u = \frac{1}{n}\sum_{i=1}^{n} I(y_i > u)$.
Therefore, we can estimate the quantile in 3.11 as

$$\hat{Q}_Y(\tau) = u + \frac{\hat\sigma}{\hat\xi}\left[\Big(\frac{1-\tau}{\hat{p}_u}\Big)^{-\hat\xi} - 1\right], \qquad (3.12)$$

where σ̂ and ξ̂ are estimated as

$$(\hat\sigma, \hat\xi) = \mathop{\mathrm{argmin}}_{\sigma,\,\xi} \left[ \sum_{i:\,y_i>u} \log(\sigma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\,y_i>u} \log\Big(1+\frac{\xi\,(y_i-u)}{\sigma}\Big) \right].$$

An important thing to note here is that we do not estimate u from the data. Moreover, 3.11 is an approximation of the true quantile for large u.

3.3.1 Method of Estimation

We assume model 3.4 with ε ∼ F, F ∈ D(G_ξ) for some ξ > 0. Let 0 < τ_c < 1 be an intermediate quantile level, i.e., τ_c not too close to 1. In 3.2.1 we used τ_c = 1/n; here we want to choose τ_c appropriately so that Y − µ̂(x) approximately follows a GP distribution. We use µ̂(x) as our threshold:

$$\hat\mu(x) = \hat{QR}_{Y|x}(\tau_c) = \hat\alpha(\tau_c) + x^\top\hat\beta(\tau_c), \qquad (3.13)$$

where α̂(τ_c) and β̂(τ_c) are obtained from equation 2.3. Let

$$e(\tau_c, x) = Y - \hat{QR}_{Y|x}(\tau_c).$$

Then

$$e(\tau_c, x) = \phi + x^\top\rho + (\theta + x^\top\gamma)\,\epsilon - \hat{QR}_{Y|x}(\tau_c) = \phi - \hat\alpha(\tau_c) + x^\top\big(\rho - \hat\beta(\tau_c)\big) + (\theta + x^\top\gamma)\,\epsilon.$$

We assume that, given τ_c, e(τ_c, x) | e(τ_c, x) > 0 ≈ GP(µ = 0, σ(x) = θ + x^T γ, ξ). Hence, we can estimate the remaining unknown parameters θ, γ, and ξ by fitting a scaled GP model to the conditional residuals e(τ_c, x_i) | e(τ_c, x_i) > 0, i = 1, ..., n:

$$(\hat\theta, \hat\gamma, \hat\xi) = \mathop{\mathrm{argmin}}_{\theta,\gamma,\xi} \left[ \sum_{i:\,e(\tau_c, x_i)>0} \log(\theta + x_i^\top\gamma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\,e(\tau_c, x_i)>0} \log\Big(1+\frac{\xi\,e(\tau_c, x_i)}{\theta + x_i^\top\gamma}\Big) \right]. \qquad (3.14)$$

We need to minimize the loss function in 3.14 subject to ξ > 0 and θ + x_i^T γ > 0 for all i = 1, ..., n. By using Q̂R_{Y|x}(τ_c) to estimate µ(x) = φ + x^T ρ, we have removed n constraints from the optimization problem.

Let τ be a high quantile level such that Q_{e(τ_c,x)}(τ) > 0. Then from 3.11 we have

$$Q_{e(\tau_c,x)}(\tau) \approx \frac{\sigma(x)}{\xi}\left[\Big(\frac{1-\tau}{P(e(\tau_c,x)>0)}\Big)^{-\xi} - 1\right].$$

Let $\hat{p}_0(\tau_c, n) = \frac{1}{n}\sum_{i=1}^{n} I[e(\tau_c, x_i) > 0]$. Following 3.12, we can estimate the τ quantile of the residual e(τ_c, x) as

$$\hat{Q}_{e(\tau_c,x)}(\tau) = \frac{\hat\theta + x^\top\hat\gamma}{\hat\xi}\left[\Big(\frac{1-\tau}{\hat{p}_0(\tau_c,n)}\Big)^{-\hat\xi} - 1\right]. \qquad (3.15)$$

Finally, combining 3.15 and 3.13, we can estimate the conditional quantile of Y given x at a quantile level τ as

$$\hat{Q}_{Y|x}(\tau) = \hat{QR}_{Y|x}(\tau_c) + \hat{Q}_{e(\tau_c,x)}(\tau) = \hat\alpha(\tau_c) + x^\top\hat\beta(\tau_c) + \frac{\hat\theta + x^\top\hat\gamma}{\hat\xi}\left[\Big(\frac{1-\tau}{\hat{p}_0(\tau_c,n)}\Big)^{-\hat\xi} - 1\right].$$

The estimation procedure is outlined in algorithm 3 below; an R sketch follows.

Algorithm 3: QR adjusted GP for the location-scale family
start: Set 0 < τ_c ≈ 0; fix x, δ > 0.
I. Obtain $\hat{QR}_{Y|x_i}(\tau_c) = \hat\alpha(\tau_c) + x_i^\top\hat\beta(\tau_c)$ from 3.13.
II. Get $e(\tau_c, x_i) = y_i - \hat{QR}_{y_i|x_i}(\tau_c)$.
III. Get $\hat\xi$, $\hat\theta$, and $\hat\gamma$ by optimizing 3.14, and $\hat{p}_0(\tau_c, n) = \frac{1}{n}\sum_{i=1}^{n} I[e(\tau_c, x_i) > 0]$.
IV. For 0 < τ < 1, estimate the τ quantile as
$$\hat{Q}_{Y|x}(\tau) = \hat{QR}_{Y|x}(\tau_c) + \frac{\hat\theta + x^\top\hat\gamma}{\hat\xi}\left[\Big(\frac{1-\tau}{\hat{p}_0(\tau_c,n)}\Big)^{-\hat\xi} - 1\right].$$
V. Set τ_c = τ_c + δ such that 0 < τ_c < 1 and repeat steps I–IV.
end: Stop when τ_c ≈ 0.95.
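Algorithm 3 differs from algorithm 2 only in the free intermediate level τ_c and in the exceedance probability p̂₀ entering the extrapolation of step IV. Below is a sketch for a single covariate, reusing gp_negloglik and the quantreg package from the sketch after algorithm 2 (qr_gp_ls is our name):

    # Algorithm 3: QR threshold at a general tau_c, GP fit to the positive
    # residuals, extrapolation with hat p_0 as in (3.15)
    qr_gp_ls <- function(x, y, tau, tau_c = 0.5) {
      fit  <- rq(y ~ x, tau = tau_c)                 # steps I-II
      e    <- y - fitted(fit)
      keep <- e > 0
      p0   <- mean(keep)                             # hat p_0(tau_c, n)
      par  <- optim(c(1, 0, 0.5), gp_negloglik,      # step III
                    x = x[keep], e = e[keep])$par
      fitted(fit) +                                  # step IV
        (par[1] + par[2] * x) / par[3] * (((1 - tau) / p0)^(-par[3]) - 1)
    }

Looping this over a grid of τ_c values implements the outer loop of steps start–end.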
3.3.2 Discussion: Threshold Selection for Heavy Tailed Distributions

According to proposition 3.2.1, if the underlying distribution is GP, QR adjusted GP with τ_c = 1/n leads to a consistent quantile estimator at quantile levels τ close to 1. Here we propose an outline for generalizing our method when the underlying distribution is in the heavy-tail domain of attraction. Let Z_1, ..., Z_n be an independent and identically distributed sample with common distribution function F ∈ D(G_ξ). Then, given Z_{(n−k,n)}, the order statistic differences

$$(\tilde{Z}_1, \ldots, \tilde{Z}_k) = \big(Z_{(n-k+1,n)} - Z_{(n-k,n)},\, \ldots,\, Z_{(n,n)} - Z_{(n-k,n)}\big) \qquad (3.16)$$

follow approximately a GP distribution with tail index ξ. Using theorem 2.3.2, we can find the parameters of the GP by maximizing the log-likelihood:

$$(\hat\sigma, \hat\xi) = \mathop{\mathrm{argmin}}_{\sigma,\,\xi} \left[ \sum_{i:\,\tilde{z}_i>0} \log(\sigma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\,\tilde{z}_i>0} \log\Big(1+\frac{\xi\,\tilde{z}_i}{\sigma}\Big) \right].$$

In Theorem 3.4.2 of de Haan & Ferreira (2010), it has been shown that the $\tilde{z}_i$ admit an approximation of the form

$$\tilde{z}_i = c_\xi\,(k/n)^{-\xi} + d\,\frac{w_i}{\sqrt{k}\,(k/n)^{\xi}},$$

where W = (w_1, ..., w_k) follows a multivariate normal distribution. This allowed them to prove

$$\sqrt{k}\,\Big(\big(\hat\xi,\; \hat\sigma/(k/n)^{-\xi}\big) - (\xi, 1)\Big) \Longrightarrow N(\mu, \Sigma).$$

Under the same set of assumptions, in Theorem 3.1 of Diebolt et al. (2007), it has been shown that for

$$\hat{Q}_Z(\tau) = z_{(n-k,n)} + \frac{\hat\sigma}{\hat\xi}\left[\Big(\frac{1-\tau}{k/n}\Big)^{-\hat\xi} - 1\right], \qquad Q_Z(\tau) = F_Z^{-1}(\tau),$$

we have

$$\sqrt{k}\,(k/n)^{\xi}\,\big(\hat{Q}_Z(\tau) - Q_Z(\tau)\big) \Longrightarrow N(m, s^2).$$

For a regression model of the form

$$Y_i = \underbrace{\phi + x_i^\top\rho}_{\tilde{x}_i^\top\tilde\rho} + \underbrace{(\theta + x_i^\top\gamma)}_{\tilde{x}_i^\top\tilde\gamma}\,\epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} F_\epsilon \in D(G_\xi),$$

we consider the residuals

$$E_i = Y_i - \hat{Q}_{Y|x_i}\Big(1-\frac{k}{n}\Big),$$

where k is chosen to ensure that, given E_{(n−k,n)}, the differences (E_{(n−k+1,n)} − E_{(n−k,n)}, ..., E_{(n,n)} − E_{(n−k,n)}) are approximately independent and identically distributed as GP with tail index ξ. By Theorem 5.1 in Chernozhukov (2005),

$$\sqrt{k}\,(k/n)^{\xi}\,\Big(\hat{Q}_{Y|x_i}(1-k/n) - \tilde{x}_i^\top\tilde\rho - \tilde{x}_i^\top\tilde\gamma\,F_\epsilon^{-1}(1-k/n)\Big) \Longrightarrow N(0, \sigma_X^2).$$

Therefore,

$$E_i = \tilde{x}_i^\top\tilde\gamma\,\big(\epsilon_i - F_\epsilon^{-1}(1-k/n)\big) + O_P\Big(\frac{1}{\sqrt{k}}\Big(\frac{n}{k}\Big)^{\xi}\Big).$$

Applying Theorem 5.1 in Chernozhukov (2005) with covariate x = 1 (i.e., only the constant term), we get

$$\sqrt{k}\,(k/n)^{\xi}\,\big(\epsilon_{(n-k,n)} - F_\epsilon^{-1}(1-k/n)\big) \Longrightarrow N(0, \sigma_\epsilon^2).$$

By assumption (A1), $\tilde{x}_i^\top\tilde\gamma \le M\max_j|\gamma_j| = M_{\tilde\gamma}$; therefore

$$E_i = \tilde{x}_i^\top\tilde\gamma\,\big(\epsilon_i - \epsilon_{(n-k,n)}\big) + O_P\Big(\frac{1}{\sqrt{k}}\Big(\frac{n}{k}\Big)^{\xi}\Big).$$

Let i_0, i_1, ..., i_k be the indices corresponding to ε_{(n−k,n)}, ε_{(n−k+1,n)}, ..., ε_{(n,n)}. Since ε_{(n−k+1,n)} − ε_{(n−k,n)}, ..., ε_{(n,n)} − ε_{(n−k,n)} are iid observations from a GP distribution, this implies that E_{i_1}, ..., E_{i_k} are iid observations from a GP distribution with scale $\tilde{x}_i^\top\tilde\gamma$ and tail index ξ, and

$$S = \{i : E_i > 0\} \approx \{i_1, \ldots, i_k\}.$$

Therefore, we can fit a GP distribution to the residuals and estimate the parameters as

$$(\hat\gamma, \hat\xi) = \mathop{\mathrm{argmin}}_{\gamma,\,\xi} \left[ \sum_{i:\,e_i>0} \log(\tilde{x}_i^\top\gamma) + \Big(1+\frac{1}{\xi}\Big) \sum_{i:\,e_i>0} \log\Big(1+\frac{\xi\,e_i}{\tilde{x}_i^\top\gamma}\Big) \right].$$

Note that the e_i closely share the asymptotic behavior of the z̃_i defined in (3.16), i.e., $e_i = O_P\big(1/(\sqrt{k}\,(k/n)^{\xi})\big)$. However, the proof of asymptotic normality of the e_i is too involved and is thereby omitted in this dissertation. Mimicking the steps of Theorem 3.4.2 in de Haan & Ferreira (2010), the rates of convergence of the tail index ξ̂ and the scale coefficient γ̂ can be determined. Similarly, mimicking the steps of Theorem 3.1 in Diebolt et al. (2007), one can determine the convergence rate of Q̂_{Y|X=x}(τ). For the nonparametric regime, we explore the performance of our method from a numerical standpoint; intricate details on the exact convergence rates of the tail index and the quantiles from a theoretical standpoint have been avoided. However, the above formulation does give us insight into the choice of τ_c. Note that we vary τ_c = 1 − k/n, k = 1, ..., n, and choose k such that the estimation of ξ stabilizes.
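This stability criterion is easy to operationalize: scan a grid of τ_c values, record ξ̂ for each, and look for a flat region of the curve, as in figures 3.5 and 3.6 below. A sketch, reusing rq and gp_negloglik from the earlier sketches and assuming a data set x, y in the workspace:

    # Tail-index stability scan over tau_c = 1 - k/n; a flat stretch of the
    # curve suggests a workable threshold
    tau_grid <- seq(0.05, 0.90, by = 0.05)
    xi_hat <- sapply(tau_grid, function(tc) {
      fit  <- rq(y ~ x, tau = tc)
      e    <- y - fitted(fit)
      keep <- e > 0
      optim(c(1, 0, 0.5), gp_negloglik, x = x[keep], e = e[keep])$par[3]
    })
    plot(tau_grid, xi_hat, type = "b",
         xlab = "tau_c = 1 - k/n", ylab = "estimated tail index")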
Figure 3.5 displays the tail index estimates for Pareto-distributed errors, obtained both by applying the GP fit directly to the errors and by applying the proposed QR.GP method to the regression outputs y_i. Note that the plot stabilizes from τ_c ≈ 0 onwards. Indeed, the tail index estimates based on our approach for the regression model closely agree with the estimates obtained when the GP is fit directly to the errors, and the estimate of the tail index agrees with the true value.

[Figure 3.5: Tail index estimates with changing values of τ_c, for Pareto errors. The red line corresponds to the estimate of ξ when the GP is fit to the errors ε_i; the black line corresponds to the estimate when our proposed method is fit to the y_i. The true tail index is 0.8.]

Figure 3.6 displays the tail index estimates for t-distributed and Burr-distributed errors, again applying the GP fit to the errors and the proposed QR.GP method to the regression outputs y_i. For t-distributed errors, the plot stabilizes for τ_c between 0.75 and 0.9; thus one can choose 1 − k/n ≈ 0.75. For Burr-distributed errors, the plot stabilizes for τ_c between 0.4 and 0.6; thus one can choose 1 − k/n ≈ 0.4. Note that the regions of stabilization agree completely with the case where a GP model is fit to the original errors. Therefore, this approach gives an alternative to GP modeling when both the location and the scale are functions of the covariate.

[Figure 3.6: Tail index estimates with changing values of τ_c. The red line corresponds to the estimate of ξ when the GP is fit to the errors ε_i; the black line corresponds to the estimate when our proposed method is fit to the y_i. The true tail index is 0.8. Left: t-distributed errors. Right: Burr-distributed errors.]

3.4 Simulation Study: Nonparametric Problem

In section 3.2.2, our focus was on the cases ε ∼ GP(ξ) for ξ ≥ 0, where we assessed the efficacy of our method for the higher quantile levels 0.9 ≤ τ ≤ 0.99. In section 3.3, we proposed a guideline for extending our method to distributions in the heavy-tail domain of attraction, and we saw that selecting thresholds of the form τ_c = 1 − k/n leads to consistent estimation of the tail index ξ. The value of k is determined by the nature of the underlying distribution. Let ξ̂ = ξ̂(k) be the estimated tail index for threshold τ_c = 1 − k/n. We have demonstrated through numerical simulations that when we select k such that ξ̂(k) is stable, we predict ξ accurately.

The estimation of the tail index is an important problem in the extreme value literature, and several alternative approaches exist, namely the method of moments, the Hill estimator, the Pickands estimator, etc. (Haan & Ferreira (2006)). Hill (1975) proposed a nonparametric estimator for ξ > 0 using the upper order statistics. Suppose that Z_1, ..., Z_n are n iid random variables with unknown distribution F. An equivalent condition to theorem 2.3.1 says that F ∈ D(G_ξ) for some ξ > 0 if and only if

$$\lim_{u\to\infty} \frac{\bar{F}(uz)}{\bar{F}(u)} = z^{-1/\xi}, \qquad z > 1, \qquad (3.17)$$

i.e., as u → ∞, the conditional excess P(Z > uz | Z > u) behaves like a Pareto tail: for large u, the scaled excess Z/u conditioned on Z > u is approximately Pareto with tail index ξ.
Theorem 1.2.2 in Haan & Ferreira (2006) gives an equivalent form of the condition in 3.17:

$$\lim_{u\to\infty} \frac{\int_u^\infty \big(1 - F(z)\big)\,\frac{dz}{z}}{1 - F(u)} = \xi.$$

Further simplification leads to

$$\lim_{u\to\infty} \frac{\int_u^\infty \big(\log z - \log u\big)\,dF(z)}{1 - F(u)} = \xi.$$

Let Z_{(i)} be the i-th ordered observation, with Z_{(n)} = max(Z_1, ..., Z_n). Replacing u by Z_{(n−k)} and F by the empirical distribution function, the Hill estimator of ξ based on the k upper order statistics Z_{(n−k+1)}, ..., Z_{(n)} is

$$\hat\xi_H(k) = \frac{1}{k} \sum_{i=1}^{k} \log\Big[\frac{Z_{(n-i+1)}}{Z_{(n-k)}}\Big]. \qquad (3.18)$$

Let k = k(n) be such that k(n) → ∞, k(n)/n → 0, and k(n+1)/k(n) → 1 as n → ∞. Then, by theorem 3.2.4 in Haan & Ferreira (2006), ξ̂_H(k) → ξ in probability. More details about the derivation of the Hill estimator and its asymptotic properties can be found in Haan & Ferreira (2006) and Hill (1975). Due to the nonparametric nature of the estimator, this method can be easily adapted as long as the underlying distribution is heavy tailed. However, different choices of k lead to different values of ξ̂_H(k); hence, it is crucial to choose k appropriately for consistent estimation of ξ. Wang et al. (2012) combined quantile regression and the Hill estimator to estimate conditional quantiles from a heavy-tailed distribution. We outline their method of estimating higher quantiles in the following section.

3.4.1 Review of an Existing Method for Extreme Quantile Estimation

In Wang et al. (2012), the authors have proposed methods for regression models with quantile functions Q_{Y|x}(τ) = α(τ) + x^T β(τ), where Y | x ∼ D(G_ξ) for some ξ > 0. Let 0 < τ_c < 1 be a fixed constant, close to 1. The authors assume the linear quantile function only for the higher quantiles τ ∈ [τ_c, 1]:

$$Q_{Y|x}(\tau) = \alpha(\tau) + x^\top\beta(\tau) \qquad \text{for } \tau \in [\tau_c, 1];$$

Q_{Y|x}(τ) has no specific form for τ ∈ (0, τ_c). Let τ_c < τ_j = j/(n+1) be a sequence of quantile levels, where j = n − k, ..., m and m < n is an integer close to n. Suppose [a] denotes the integer part of a. The authors assume m = n − [n^η], with η > 0 a small constant such that n^η < k. They also assume that, as n → ∞, k = k(n) → ∞ and k(n)/n → 0. Consider the sequence of quantile levels τ_c < τ_{n−k} < ... < τ_m < 1. The authors use quantile regression to obtain a reasonable quantile estimate at the intermediate quantile level τ_j for j = n − k, ..., m. Let

$$\big(\hat\alpha(\tau_j), \hat\beta(\tau_j)\big) = \mathop{\mathrm{argmin}}_{\alpha,\,\beta} \sum_{i=1}^{n} \rho_{\tau_j}\big(y_i - \alpha - x_i^\top\beta\big), \qquad q_j = \hat\alpha(\tau_j) + x^\top\hat\beta(\tau_j)$$

be the QR estimate at τ_j. Then one might consider q_j as an upper order statistic of a sample from F_{Y|x}; the authors use q_j as a replacement for Y_{(j)} in equation 3.18. Using these upper quantiles, they proposed to estimate the tail index ξ as

$$\hat\xi_H(k, n) = \hat\xi = \frac{1}{k - [n^\eta]} \sum_{j=[n^\eta]}^{k} \log\frac{q_{n-j}}{q_{n-k}}. \qquad (3.19)$$

Let U be the tail quantile function, i.e.,

$$U(t) = \inf\{z : F(z) \ge 1 - 1/t\}, \qquad t \in [1, \infty),$$

so that U(t) is the (1 − 1/t) quantile. When F ∈ D(G_ξ) and ξ > 0, we have, for z > 0,

$$\lim_{t\to\infty} \frac{U(tz)}{U(t)} = z^{\xi}.$$

Let 1 − 1/t = τ_{n−k} = 1 − (k+1)/(n+1) and 1 − 1/(tz) = τ_n, so that τ_n → 1 as n → ∞. Then, for large n,

$$\frac{U(tz)}{U(t)} = \frac{Q(\tau_n)}{Q(\tau_{n-k})} \approx \Big(\frac{1-\tau_{n-k}}{1-\tau_n}\Big)^{\xi}. \qquad (3.20)$$

Combining 3.19 and 3.20, the conditional quantile is estimated as

$$\hat{Q}_{Y|x}(\tau_n) = \Big(\frac{1-\tau_{n-k}}{1-\tau_n}\Big)^{\hat\xi}\, q_{n-k}. \qquad (3.21)$$

Before we get into our simulation model, we briefly discuss the optimal choice of the tail sample fraction, determined by k = k(n).
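The Hill estimator in 3.18 is itself only a few lines of R, and computing it across k makes explicit the dependence on the tail sample fraction taken up next; hill is our name for the function:

    # Hill estimator of the tail index from the k upper order statistics (3.18)
    hill <- function(z, k) {
      z <- sort(z)
      n <- length(z)
      mean(log(z[(n - k + 1):n] / z[n - k]))
    }

    # e.g. plotting sapply(10:500, function(k) hill(z, k)) against k gives the
    # usual Hill plot used to inspect the stability of the estimate in k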
As noted by Wang et al. (2012), the choice of k is a very important problem in statistical applications of extreme value theory: by choosing k optimally, we identify the tail sample fraction that carries the most relevant information on the tail behavior. Garrido & Lezaud (2013) explained the influence of k on ξ̂_H(k). For small values of k, the variance of the estimate is large, while increasing k leads to higher bias; the trade-off between the bias and the variance is struck at some intermediate value of k. We would like to see how the method proposed by Wang et al. (2012) (QR.Hill) behaves for different values of k, and we discuss the choices of k and η further during our numerical investigation.

Wang et al. (2012) have also proposed a variation of their method for β(τ) = β, i.e., when the slopes are constant at the higher quantiles. That is a special case of our simulation model 3.10 with γ = 0; we restrict ourselves to the general case γ ≠ 0. In section 3.4.2, we compare our proposed QR.GP method with QR.Hill for heavy-tailed data at higher quantiles. The QR.Hill procedure is outlined in algorithm 4, followed by an R sketch; we use algorithm 3 to implement QR.GP.

Algorithm 4: QR.Hill
start: Set 0 < τ_c ≈ 1; fix x, δ > 0.
I. Set k = n − [nτ_c] and m = n − [n^η]. Select the upper quantile levels τ_j = j/(n+1) for j = n − k, ..., m. Get $q_j = \hat{QR}_{Y|x}(\tau_j) = \hat\alpha(\tau_j) + x^\top\hat\beta(\tau_j)$.
II. Estimate the tail index ξ as
$$\hat\xi = \frac{1}{k - [n^\eta]} \sum_{j=[n^\eta]}^{k} \log\frac{q_{n-j}}{q_{n-k}}.$$
III. Estimate the τ quantile as
$$\hat{Q}_{Y|x}(\tau) = \Big(\frac{1-\tau_{n-k}}{1-\tau}\Big)^{\hat\xi}\, q_{n-k}.$$
IV. Set τ_c = τ_c − δ such that 0 < τ_c < 1 and repeat steps I–III.
end: Stop when τ_c ≈ 0.01.
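A compact R sketch of algorithm 4 for a single covariate, evaluated at a fixed covariate value x0, is given below; qr_hill is our name, and η = 0.1 matches the recommendation of Wang et al. (2012) used in the next section. The rq call fits all intermediate quantile levels at once.

    library(quantreg)

    # QR.Hill (Wang et al. 2012): Hill estimate from QR quantiles, then extrapolation
    qr_hill <- function(x, y, tau, x0, tau_c = 0.9, eta = 0.1) {
      n    <- length(y)
      k    <- n - floor(n * tau_c)
      m0   <- floor(n^eta)                      # [n^eta]
      taus <- ((n - k):(n - m0)) / (n + 1)      # tau_j, j = n-k, ..., m
      fit  <- rq(y ~ x, tau = taus)
      q    <- drop(cbind(1, x0) %*% coef(fit))  # q_j at x0
      xi   <- mean(log(q[-1] / q[1]))           # tail index as in (3.19)
      q[1] * ((1 - taus[1]) / (1 - tau))^xi     # extrapolation as in (3.21)
    }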
3.4.2 Simulation Study

We simulate from the following model:

$$Y = \phi + X^\top\rho + (\theta + X^\top\gamma)\,\epsilon, \qquad \epsilon \sim F \in D(G_\xi),\; \xi > 0. \qquad (3.22)$$

This is the same model as 3.10; we now explore different distributions in the domain of attraction of the Fréchet distribution. The conditional τ quantile of 3.22 is given by

$$Q_{Y|x}(\tau) = \phi + x^\top\rho + (\theta + x^\top\gamma)\,Q_\epsilon(\tau) \qquad (3.23)$$

for all 0 < τ < 1. Hence the true quantile function 3.23 is linear in x. Moreover, both the intercept α(τ) = φ + θQ_ε(τ) and the slope β(τ) = ρ + γQ_ε(τ) vary with τ. Thus, we satisfy model assumption 2.10 in Wang et al. (2012) for any τ ∈ (0, 1). We simulate iid observations (x_i, y_i), i = 1, ..., n from 3.22 as

$$y_i = 1 + x_{1i} + x_{2i} + (1 + 0.9\,x_{1i})\,e_i,$$

with φ = θ = 1, ρ = (1, 1), and γ = (0.9, 0). We simulate x_{ij} ∼ U(−1, 1) for j = 1, 2 and e_i ∼ F. We choose three example distributions F in the Fréchet domain: Pareto, absolute t, and Burr; see Table B.1 in the appendix for a list of well-known distributions in the heavy-tail domain. The true τ quantile at x is

$$Q_{Y|x}(\tau) = 1 + x_1 + x_2 + (1 + 0.9\,x_1)\,Q_\epsilon(\tau). \qquad (3.24)$$

Pareto: We start our analysis with the standard Pareto distribution Pa(α), which is a special case of the standard GP distribution: if Y ∼ Pa(α) and Y* = α(Y − 1), then Y* ∼ GP(µ = 0, σ = 1, ξ = 1/α). Pa(α) has tail index ξ = 1/α, and the tails of the GP and Pareto behave very similarly. In section 3.2.2 we looked at quantiles up to 0.99 from the GP with ξ = 0.5 and 0.8; now we focus on the four extreme quantile levels τ ∈ (0.985, 0.991, 0.995, 0.998). We compare our method QR.GP with QR.Hill for different values of the tuning parameter τ_c = 1 − k/n, k = 1, ..., n, chosen from the grid τ_c ∈ (0.01, 0.05, ..., 0.95). When we look into extreme quantiles, the data is even more sparse, so we choose sample sizes n ≥ 1000 in our simulations. Our Monte Carlo sample size is K = 500 for all of the cases. We compare QR.GP (algorithm 3) with QR.Hill (algorithm 4) for different choices of τ_c, and we choose m = n − [n^{0.1}] for QR.Hill, as per the recommendation of Wang et al. (2012).

[Figure 3.7: mse ± se for Pareto(0.5) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 1000 and 0 < τ_c < 1. Panels: (a) τ = 0.985, (b) τ = 0.991, (c) τ = 0.995, (d) τ = 0.998.]

Figure 3.7 summarises the mse of the quantile estimates at x = (0.5, 0.5) for sample size n = 1000 and tail index ξ = 0.5. Clearly, the optimal threshold for QR.GP is τ_c = 0.01, consistent with our findings in section 3.2.2. We can see that QR.GP outperforms QR.Hill regardless of the choice of threshold at the τ = 0.985, 0.991, and 0.995 quantiles. For QR.Hill, the optimal τ_c is between 0.8 and 0.9. Now we increase the sample size to 2000 and compare QR.Hill with QR.GP at τ = 0.998. In figure 3.8 (b) we can see that QR.GP has lower mse than QR.Hill at all threshold values: with increasing sample size, QR.GP can outperform QR.Hill even at quantile level τ = 0.998.

[Figure 3.8: mse ± se for Pareto(0.5) at x = (0.5, 0.5) at quantile τ = 0.998 and sample sizes (a) n = 1000, (b) n = 2000.]

We now look into the case ξ = 0.8. We choose sample size 2000, as higher values of ξ are associated with even more data sparsity.

[Figure 3.9: mse ± se for Pareto(0.8) at x = (0.5, 0.5) and quantile levels τ. The sample size is n = 2000 and 0 < τ_c < 1. Panels: (a) τ = 0.985, (b) τ = 0.991, (c) τ = 0.995, (d) τ = 0.998.]

From figures 3.9 (a), (b), and (c) we can clearly see that QR.GP is uniformly better than QR.Hill at all but the quantile τ = 0.998. In fact, we can see in figure 3.11 that we achieve uniform dominance at quantile τ = 0.998 for n = 5000. For τ = 0.985, QR.GP remains uniformly better than QR.Hill even when we increase the sample size from 2000 to 5000; we can see this in figure 3.10.

[Figure 3.10: mse ± se for Pareto(0.8) at x = (0.5, 0.5) at quantile τ = 0.985 and sample sizes (a) n = 2000, (b) n = 5000.]

[Figure 3.11: mse ± se for Pareto(0.8) at x = (0.5, 0.5) at quantile τ = 0.998 and sample sizes (a) n = 2000, (b) n = 5000.]

Absolute t: Next we simulate ε from the absolute t distribution. The tail index for the t distribution is ξ = 1/ν, where ν is the degrees of freedom; we choose ξ = 0.5 and 0.8. Note that now the underlying distribution is asymptotically GP with tail index ξ. Here we explore the optimal choice of τ_c in the case of the absolute t distribution. From our numerical investigations, we have seen that τ_c around 0.75 works best with QR.GP; for QR.Hill, the optimum is 0.7. In figure 3.12 we demonstrate their performance at their corresponding optima across the quantiles 0.9 ≤ τ ≤ 0.99 for sample sizes n = 1000 and 2000.
[Figure 3.12: mse ± se for absolute t with ξ = 0.5 at x = (0.5, 0.5) and quantiles 0.9 ≤ τ ≤ 0.991. Sample sizes: (a) n = 1000, (b) n = 2000.]

We can see in figure 3.12 (a) that both QR.GP and QR.Hill are consistent in the sense that they have low mse when τ ≤ 0.95. For n = 1000, QR.Hill outperforms QR.GP at τ > 0.95; when we increase the sample size to n = 2000, QR.Hill cannot outperform QR.GP at any quantile level between 0.9 and 0.991. Next we move to the higher quantiles. Figure 3.13 illustrates QR.GP and QR.Hill across different choices of τ_c for τ = 0.985 and 0.991.

[Figure 3.13: mse ± se for absolute t with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000, and 5000.]

[Figure 3.14: mse ± se for absolute t with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000, and 5000.]

[Figure 3.15: mse ± se for absolute t with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000, and 5000.]

[Figure 3.16: mse ± se for absolute t with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000, and 5000.]

[Figure 3.17: mse ± se for absolute t with ξ = 0.8 at x = (0.5, 0.5) and quantiles 0.9 ≤ τ ≤ 0.991. Sample sizes: (a) n = 2000, (b) n = 5000.]

We achieve uniform dominance at quantile 0.985 with sample size n = 2000, and at quantile 0.991 with n = 5000.
In figure 3.14 we can see that for τ = 0.995 and 0.998, QR.GP is comparable to QR.Hill. An important observation is that QR.GP is more robust to the choice of τ_c than QR.Hill: a small deviation from the optimal τ_c results in large fluctuations in the mse of QR.Hill, whereas our mse is bounded and converges to 0 as n increases, for all τ_c ∈ (0, 1).

Next we want to see how changing values of ξ affect our predictor, so we increase ξ to 0.8. From our numerical investigations, we saw that τ_c = 0.4 is optimal for both QR.GP and QR.Hill. Figure 3.17 shows the mse at the optimal value τ_c = 0.4 for quantiles 0.9 < τ < 0.99; QR.GP does a better job at the higher values of τ. Note that for QR.Hill the optimal τ_c is 0.7 when ξ = 0.5, while for ξ = 0.8 the optimum is τ_c = 0.4. For QR.GP, the optimal threshold τ_c is between 0.4 and 0.5. We plot the mse for different choices of τ_c at quantiles τ = 0.985 and 0.991 in figure 3.15; any τ_c between 0.4 and 0.6 works well for QR.GP. Figure 3.16 shows the large sample behavior of the quantile estimates at quantiles τ = 0.995 and 0.998: with increasing sample size, QR.GP approximates the tail well for thresholds 0.4 ≤ τ_c < 1.

Burr: We simulate from the Burr distribution with tail index 0.5. The results for quantiles τ = 0.985 and 0.991 are summarised in figure 3.18. The optimal τ_c for QR.GP is close to 0.5, and any τ_c > 0.5 works well when the sample size is 5000; for QR.Hill, the optimal τ_c is around 0.6. In the case of |t(0.5)| we saw in figure 3.13 that, with increasing sample size, QR.GP can approximate the 0.985 quantile for all τ_c ∈ (0, 1); we notice the same characteristic here. Moreover, with increasing sample size, QR.GP and QR.Hill are comparable at their optima. For quantiles 0.991 and 0.995, we can see from figure 3.19 that QR.GP is comparable with QR.Hill at their optima. Now we simulate from Burr with tail index 0.8. From figure 3.20 we can see that the optimal τ_c is 0.5 for QR.GP, whereas the optimal τ_c for QR.Hill is 0.7. For the higher quantiles τ = 0.995 and 0.998, we can see in figure 3.21 that QR.GP and QR.Hill are comparable at their corresponding optima.

[Figure 3.18: mse ± se for Burr with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000, and 5000.]
[Figure 3.19: mse ± se for Burr with ξ = 0.5 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000, and 5000.]

[Figure 3.20: mse ± se for Burr with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.985, 0.991 and sample sizes n = 1000, 2000, and 5000.]

[Figure 3.21: mse ± se for Burr with ξ = 0.8 at x = (0.5, 0.5). Quantile values τ = 0.995, 0.998 and sample sizes n = 1000, 2000, and 5000.]

3.5 Application to precipitation downscaling

According to the National Oceanic and Atmospheric Administration (NOAA), "Statistical downscaling encompasses the use of various statistics-based techniques to determine relationships between large-scale climate patterns resolved by global climate models and observed local climate responses." Statistical downscaling techniques aim to quantify the relationship between global-scale climate model outputs and local-scale climate variables, and are useful for making predictions about local climate variables like precipitation, temperature, wind speed and direction, air quality, etc. In this section, we apply our proposed method to predict extreme quantiles of precipitation. We are not looking for the best set of predictors from the global-scale model outputs to explain regional precipitation; instead, we use relevant variables, such as sea-level pressure and specific humidity, which contain the necessary information, as our predictors. Using this pre-determined set of covariates, we assess the predictive performance of three different methods, QR, QR.GP, and QR.Hill, in the context of the estimation of high conditional quantiles.

Cannon (2011) used historic precipitation data obtained from the Vancouver International Airport (WMO station 71892, 49°12′N, 123°10.8′W) weather station to demonstrate an application of their proposed method, quantile regression with neural networks. This dataset is publicly available with the R package 'qrnn' and consists of 30 years of daily precipitation records, 1971-2000.
From figure 3.22, we can see that the distribution of the precipitation is unimodal and skewed. Moreover, out of the total 10958 days, there are 5978 dry days, for which precipitation is recorded as 0 mm. We did not encounter any missing values in this dataset. The maximum precipitation is 89.4 mm, while less than 1% of the observations are above 31 mm.

[Figure 3.22: Daily precipitation at the Vancouver International Airport, 1971-2000.]

We use three global climate model outputs as covariates: daily sea-level pressure, 500-hPa geopotential height, and 700-hPa specific humidity. The predictor variables are obtained from the NCEP/NCAR reanalysis project by Kalnay et al. (1996) at the nearest grid point, 50°N, 122.5°W. We also include the sine and cosine of the day of the year as predictors. We standardize each of these variables before the analysis.

In real data, we do not know the true quantiles, so to compare the predictive performance of a quantile estimation method we need a reference quantile estimate. Following the suggestion of Cannon (2011), we use monthly sample quantiles as our reference. Given a quantile level τ, let Q_{Y(m,i)}(τ) be the monthly sample quantile for day i of month m; that is, Q_{Y(m,i)}(τ) is the sample τ quantile based on the observations from month m, and we assume the same value for all the days of month m.

Our aim here is to compare the performance of three different quantile estimators: QR, QR.GP, and QR.Hill. For all three methods, we use the same set of covariates described earlier in this section. We can fit QR directly, as no threshold selection is involved. For both QR.Hill and QR.GP, we need to choose a threshold. Using a grid of threshold values τ_c in (0.01, ..., 0.8), we fit QR.GP to the data following algorithm 3. In figure 3.23, we plot the estimated tail index against the thresholds. The tail index estimate stabilizes around τ_c ≈ 0.7; we choose τ_c = 0.68 (red dotted line in figure 3.23), for which ξ̂ = 0.18. Next, we fit QR.Hill to the data using four different threshold values, τ_c = 0.8, 0.85, 0.9, and 0.95.

[Figure 3.23: Estimated tail index (ξ̂) at different choices of τ_c ∈ (0, 1). Method of estimation is QR.GP.]

To assess the goodness of fit, we look at the mean square prediction error at the τ quantile,

$$MSPE(\tau) = \frac{1}{n} \sum_{i=1}^{n} \Big(\hat{Q}_{Y|x_i}(\tau) - Q_{Y(m,i)}(\tau)\Big)^2,$$

where Q̂_{Y|x_i}(τ) is the predicted conditional τ quantile given x_i and Q_{Y(m,i)}(τ) is the monthly sample quantile for day i. Note that for both QR.GP and QR.Hill, we can only predict quantiles τ above the threshold τ_c.
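The MSPE is straightforward to compute once each method's predictions are in hand. A sketch for a single level tau, assuming vectors q_hat (the predicted τ quantiles, one per day), y (daily precipitation), and a year-month grouping factor ym; all of these names are ours:

    # Monthly sample quantile as the reference, replicated across the days of
    # each month, then MSPE(tau) as defined above
    q_ref <- ave(y, ym, FUN = function(v) quantile(v, probs = tau))
    mspe  <- mean((q_hat - q_ref)^2)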
On the other hand, the predictive performance of QR.Hill is very sensitive to the choice of τc: higher thresholds tend to work better at higher quantile levels. For QR.Hill, we cannot agree upon a universal threshold value that gives the lowest MSPE at all quantile levels above 0.9.

Implementation of the QR.Hill method in practice is challenging. First, for a given value of the predictor x, Wang et al. (2012) proposed choosing τc such that ξ̂|x is stable. This makes the choice of the tail index dependent on the covariate; thus, we cannot choose a single optimal τc for all x. With QR.GP, by contrast, we can still select a covariate-adaptive threshold for a fixed τc, since we use QR(τc|x) to estimate the threshold. Second, owing to the nature of QR.Hill, one has to compute QR estimates at n(1 − τc) quantile levels. This is computationally demanding, particularly when τc is low or the sample size n is high; for this data, with a sample size of n = 10958, implementing QR.Hill is very time consuming. In comparison, we compute QR only once for each choice of τc.

[Figure: MSPE against τ for qr, qr.gp (τc = 0.7), and qr.hill (τc = 0.8, 0.85, 0.9, 0.95).]

Figure 3.24: Mean square prediction error (MSPE) of different quantile estimates at quantile levels 0.9 < τ < 0.995. Thresholds: 0.7 for QR.GP (blue); 0.8, 0.85, 0.9, and 0.95 for QR.Hill (dotted lines).

CHAPTER 4

DISCUSSIONS AND FUTURE DIRECTIONS

Estimation of extreme conditional quantiles is of prime interest in many fields of science. Quantile regression is useful for quantifying the effects of covariates at different quantile levels. When the underlying distribution is heavy-tailed in nature, quantile regression estimates suffer from high variability, particularly at higher quantiles. In this dissertation, we have developed methods tailored to estimating high conditional quantiles for heavy-tailed distributions. We have used the GP distribution to model the tail behavior. Selection of an appropriate threshold is necessary for fitting a GP model. The threshold selection problem is itself challenging: a higher threshold leads to high variance, while a lower threshold is associated with higher bias. The efficiency of the GP model is severely compromised when the threshold itself varies as a function of the covariate. The common practice is to use an empirical quantile of the response variable as the threshold, which is not covariate adaptive. Combining quantile regression with extreme value theory, we have proposed a novel method (QR.GP) of selecting a covariate-adaptive threshold, which in turn produces a consistent estimator of higher quantiles. We establish desirable asymptotic properties, namely consistency and asymptotic normality of QR.GP, when the distribution belongs to the GP family. We have shown that, with proper thresholding, QR.GP can be adapted to a wide class of distributions in the heavy-tailed domain of attraction. Although there are relatively few works that address the estimation of high conditional quantiles for heavy-tailed distributions, the one by Wang et al. (2012) presents a very promising approach. We have compared the QR.GP approach, which is fully parametric, with that of Wang et al. (2012). Our findings show that when the underlying distribution is Pareto or Generalized Pareto, QR.GP provides consistent estimation of higher quantiles and has lower mean square error than Wang et al. (2012).
From our numerical investigations, we show that the appropriate threshold for QR.GP differs across distributions in the heavy-tailed domain. Nonetheless, we have proposed a guideline for choosing the threshold when implementing QR.GP in practice. Compared to Wang et al. (2012), our method is more robust, computationally much faster, and easier to interpret and apply to real data.

In our work, we have focused on linear quantile functions. As pointed out by Wang & Li (2013), the linearity assumption is restrictive. Wang & Li (2013) used the Box-Cox power transformation, under which the quantiles of the transformed response variable are linear in the covariates. An extension of this work could be to model a power transformation of the response variable with the GP distribution. Our method features a constant tail index; in the future, we can look at a possible extension where the tail index is also a function of the covariate. Establishing the asymptotic properties of our method is challenging when the underlying distribution is a heavy-tailed distribution other than the Pareto or GP. This is an open-ended problem, and one can explore further in this direction.

APPENDICES

APPENDIX A

APPENDIX FOR CHAPTER 2

Supplemental figures and tables:

Table A.1: φ = 0, exp(1): rel.bias (se) of quantile estimates at x = 0.5 and quantile levels τ.

τ     | n = 500: GP.scale | n = 500: QR | n = 1000: GP.scale | n = 1000: QR
0.1   | 0.04(0.03)        | 0.14(0.11)  | 0.03(0.03)         | 0.11(0.08)
0.3   | 0.04(0.03)        | 0.08(0.06)  | 0.03(0.03)         | 0.06(0.05)
0.5   | 0.04(0.03)        | 0.06(0.05)  | 0.03(0.02)         | 0.05(0.04)
0.7   | 0.04(0.03)        | 0.06(0.04)  | 0.03(0.02)         | 0.04(0.03)
0.9   | 0.04(0.03)        | 0.06(0.04)  | 0.03(0.02)         | 0.04(0.03)
0.93  | 0.04(0.03)        | 0.06(0.05)  | 0.03(0.02)         | 0.05(0.04)
0.95  | 0.04(0.03)        | 0.07(0.05)  | 0.03(0.02)         | 0.05(0.04)
0.97  | 0.04(0.03)        | 0.07(0.06)  | 0.03(0.03)         | 0.05(0.04)
0.99  | 0.05(0.04)        | 0.10(0.08)  | 0.04(0.03)         | 0.07(0.05)

Table A.2: φ = 0, exp(1): mse (se) of quantile estimates at x = 0.5 and quantile levels τ.

τ     | n = 500: GP.scale | n = 500: QR | n = 1000: GP.scale | n = 1000: QR
0.1   | 0(0)              | 0(0)        | 0(0)               | 0(0)
0.3   | 0(0)              | 0(0)        | 0(0)               | 0(0)
0.5   | 0(0)              | 0.01(0)     | 0(0)               | 0(0)
0.7   | 0.01(0)           | 0.01(0)     | 0(0)               | 0.01(0)
0.9   | 0.03(0)           | 0.06(0)     | 0.02(0)            | 0.03(0)
0.93  | 0.04(0)           | 0.09(0.01)  | 0.02(0)            | 0.05(0)
0.95  | 0.05(0)           | 0.12(0.01)  | 0.03(0)            | 0.07(0)
0.97  | 0.08(0.01)        | 0.21(0.01)  | 0.05(0)            | 0.12(0.01)
0.99  | 0.18(0.01)        | 0.71(0.05)  | 0.10(0.01)         | 0.36(0.02)

[Figure: mse against τ for gp.scale and qr; panels (a) all quantiles, (b) lower quantiles, (c) higher quantiles.]

Figure A.1: mse (se) for φ = 0, exp(1) at x = 0.5 and quantile levels τ. The sample size is n = 500.

APPENDIX B

APPENDIX FOR CHAPTER 3

Table B.1: Probability distributions in the heavy-tail domain.

Distribution   | 1 − F(y)                  | ξ    | Parameters          | F^{-1}(τ)
GP(µ, σ, ξ)    | (1 + ξ(y − µ)/σ)^{−1/ξ}   | ξ    | y > µ; σ, ξ > 0     | µ + (σ/ξ)[(1 − τ)^{−ξ} − 1]
GP(0, 1, ξ)    | (1 + ξy)^{−1/ξ}           | ξ    | y > 0; ξ > 0        | (1/ξ)[(1 − τ)^{−ξ} − 1]
exp(λ)         | exp(−λy)                  | 0    | y > 0; λ > 0        | −(1/λ) log(1 − τ)
Pa(α)          | y^{−α}                    | 1/α  | y > 1; α > 0        | (1 − τ)^{−1/α}
Frechet(α)     | 1 − exp(−y^{−α})          | 1/α  | y > 0; α > 0        | (− log τ)^{−1/α}
Burr(η, λ, ν)  | 1 − (η/(η + y^{−ν}))^{λ}  | 1/ν  | y > 0; η, λ, ν > 0  | [η(τ^{−1/λ} − 1)]^{−1/ν}
|T(n)|         | 2(1 − F_{t(n)}(y))        | 1/n  | y > 0; n > 0        | F^{-1}_{t(n)}((1 + τ)/2)
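To make the quantile functions in Table B.1 concrete, the short R sketch below implements them directly; the function names are ours, chosen for illustration, and inverse-transform sampling with these functions is one way to generate draws from the simulation designs used in Chapter 3.

    ## Quantile functions F^{-1}(tau) from Table B.1 (illustrative helpers).
    qgp_ <- function(tau, mu = 0, sigma = 1, xi = 0.5)      # GP(mu, sigma, xi)
      mu + (sigma / xi) * ((1 - tau)^(-xi) - 1)
    qpar_ <- function(tau, alpha) (1 - tau)^(-1 / alpha)    # Pa(alpha)
    qfre_ <- function(tau, alpha) (-log(tau))^(-1 / alpha)  # Frechet(alpha)
    qburr_ <- function(tau, eta, lambda, nu)                # Burr(eta, lambda, nu)
      (eta * (tau^(-1 / lambda) - 1))^(-1 / nu)
    qabst_ <- function(tau, n) qt((1 + tau) / 2, df = n)    # |T(n)|
    # exp(lambda) is already available in base R as qexp(tau, rate = lambda)

    ## Inverse-transform sampling, e.g. from a Burr with xi = 1/nu = 0.5:
    u <- runif(1000)
    yb <- qburr_(u, eta = 1, lambda = 1, nu = 2)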
BIBLIOGRAPHY

Abrevaya, J. 2001. The effect of demographics and maternal behavior on the distribution of birth outcomes. Empirical Economics 26. 247.

Beirlant, Jan, Goedele Dierckx, A Guillou & C Stărică. 2002. On exponential representations of log-spacings of extreme order statistics. Extremes 5(2). 157–180.

Bremnes, John Bjørnar. 2004. Probabilistic wind power forecasts using local quantile regression. Wind Energy: An International Journal for Progress and Applications in Wind Power Conversion Technology 7(1). 47–54.

Buchinsky, Moshe. 1994. Changes in the U.S. wage structure 1963–1987: Application of quantile regression. Econometrica 62(2). 405–458. http://www.jstor.org/stable/2951618.

Caeiro, Frederico & M Ivette Gomes. 2009. Semi-parametric second-order reduced-bias high quantile estimation. Test 18(2). 392–413.

Cannon, Alex J. 2011. Quantile regression neural networks: Implementation in R and application to precipitation downscaling. Computers and Geosciences 37(9). 1277–1284. doi:10.1016/j.cageo.2010.07.005.

del Castillo, Joan & Isabel Serra. 2015. Likelihood inference for generalized Pareto distribution. Computational Statistics & Data Analysis 83. 116–128. doi:10.1016/j.csda.2014.10.014. http://www.sciencedirect.com/science/article/pii/S0167947314003065.

Chavez-Demoulin, V. & A. Davison. 2005. Generalized additive modeling of sample extremes. Journal of the Royal Statistical Society Series C 54. 207–222. doi:10.1111/j.1467-9876.2005.00479.x.

Chernozhukov, Victor. 2005. Extremal quantile regression. Ann. Statist. 33(2). 806–839. doi:10.1214/009053604000001165. https://doi.org/10.1214/009053604000001165.

Chernozhukov, Victor & Songzi Du. 2006. Extremal quantiles and value-at-risk.

Coles, Stuart, Joanna Bawa, Lesley Trenner & Pat Dorazio. 2001. An introduction to statistical modeling of extreme values, vol. 208. Springer.

Daouia, Abdelaati, Laurent Gardes, Stéphane Girard & Alexandre Lekina. 2011. Kernel estimators of extreme level curves. Test 20(2). 311–333.

Davison, A. C. & R. L. Smith. 1990. Models for exceedances over high thresholds. Journal of the Royal Statistical Society: Series B (Methodological) 52(3). 393–425. doi:10.1111/j.2517-6161.1990.tb01796.x. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1990.tb01796.x.

Davison, AC & NI Ramesh. 2000. Local likelihood smoothing of sample extremes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 62(1). 191–208.

Diebolt, Jean, Armelle Guillou & Pierre Ribereau. 2007. Asymptotic normality of extreme quantile estimators based on the peaks-over-threshold approach. Communications in Statistics - Theory and Methods 36(5). 869–886. doi:10.1080/03610920601036317. https://doi.org/10.1080/03610920601036317.

Elsner, J. B, J. P Kossin & T. H Jagger. 2008. The increasing intensity of the strongest tropical cyclones. Nature 455. 92. doi:10.1038/nature07234.

Embrechts, Paul, Claudia Klüppelberg & Thomas Mikosch. 1997. Modelling extremal events, volume 33 of Applications of Mathematics. Springer.

Feigin, Paul D. & Sidney I. Resnick. 1994. Limit distributions for linear programming time series estimators. Stochastic Processes and their Applications 51(1). 135–165. doi:10.1016/0304-4149(94)90022-1. http://www.sciencedirect.com/science/article/pii/0304414994900221.

Feuerverger, Andrey, Peter Hall et al. 1999. Estimating a tail exponent by modelling departure from a Pareto distribution. The Annals of Statistics 27(2). 760–781.

Friederichs, P & A Hense. 2007. Statistical downscaling of extreme precipitation events using censored quantile regression.
Monthly Weather Review 135(6). 2365–2378.

Gardes, Laurent & Stéphane Girard. 2010. Conditional extremes from heavy-tailed distributions: An application to the estimation of extreme rainfall return levels. Extremes 13(2). 177–204.

Garrido, Myriam & Pascal Lezaud. 2013. Extreme value analysis: An introduction.

Gilli, Manfred & Evis Këllezi. 2006. An application of extreme value theory for measuring financial risk. Computational Economics 27. 207–228. doi:10.1007/s10614-006-9025-7.

Gomes, M Ivette & M João Martins. 2002. "Asymptotically unbiased" estimators of the tail index based on external estimation of the second order parameter. Extremes 5(1). 5–31.

de Haan, Laurens & Ana Ferreira. 2006. Extreme value theory: An introduction. doi:10.1007/0-387-34471-3.

de Haan, Laurens & Ana Ferreira. 2010. Extreme value theory: An introduction (Springer Series in Operations Research and Financial Engineering). Springer 1st edn.

Hall, Peter & Nader Tajvidi. 2000. Nonparametric analysis of temporal trend when fitting parametric models to extreme-value data. Statistical Science. 153–167.

Hill, Bruce M. 1975. A simple general approach to inference about the tail of a distribution. Ann. Statist. 3(5). 1163–1174. doi:10.1214/aos/1176343247. https://doi.org/10.1214/aos/1176343247.

Jagger, T. H & J. B Elsner. 2008. Modeling tropical cyclone intensity with quantile regression. International Journal of Climatology 29. 1351. doi:10.1002/joc.1804.

Jansen, Dennis & Casper Vries. 1991. On the frequency of large stock returns: Putting booms and busts into perspective. The Review of Economics and Statistics 73. 18–24. doi:10.2307/2109682.

Kalnay, Eugenia, M. Kanamitsu, R. Kistler, William Collins, D. Deaven, L. Gandin, M. Iredell, S. Saha, G. White, J. Woollen, Yuejian Zhu, Muthuvel Chelliah, W. Ebisuzaki, Wayne Higgins, John Janowiak, K. C. Mo, C. Ropelewski, Julian Wang & A. Leetmaa. 1996. The NCEP/NCAR 40-year reanalysis project. Bulletin of the American Meteorological Society 77. doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.

Koenker, Roger. 2005. Quantile regression (Econometric Society Monographs). Cambridge University Press. doi:10.1017/CBO9780511754098.

Koenker, Roger & Gilbert Bassett. 1978. Regression quantiles. Econometrica 46(1). 33–50.

Kyselý, Jan. 2002. Probability estimates of extreme temperature events: Stochastic modelling approach vs. extreme value distributions. Studia Geophysica et Geodaetica 46. 93–112. doi:10.1023/A:1019841717369.

Langousis, Andreas, Antonios Mamalakis, Michelangelo Puliga & Roberto Deidda. 2016. Threshold detection for the generalized Pareto distribution: Review of representative methods and application to the NOAA NCDC daily rainfall database. Water Resources Research 52(4). 2659–2681.

Li, D, L Peng & J Yang. 2010. Bias reduction for high quantiles. Journal of Statistical Planning and Inference 140. 2433.

Park, Myung Hyun & Joseph H.T. Kim. 2016. Estimating extreme tail risk measures with generalized Pareto distribution. Computational Statistics & Data Analysis 98. 91–104. doi:10.1016/j.csda.2015.12.008. http://www.sciencedirect.com/science/article/pii/S0167947315003163.

Peng, Liang & Yongcheng Qi. 2004. Estimating the first- and second-order parameters of a heavy-tailed distribution. Australian & New Zealand Journal of Statistics 46(2). 305–312.
Pickands, James, III. 1975. Statistical inference using extreme order statistics. Ann. Statist. 3(1). 119–131. doi:10.1214/aos/1176343003. https://doi.org/10.1214/aos/1176343003.

Scarf, Philip A. & Patrick J. Laycock. 1996. Estimation of extremes in corrosion engineering. Journal of Applied Statistics 23(6). 621–644. doi:10.1080/02664769623982. https://doi.org/10.1080/02664769623982.

Song, Jongwoo & Seongjoo Song. 2012. A quantile estimation for massive data with generalized Pareto distribution. Computational Statistics & Data Analysis 56(1). 143–150. doi:10.1016/j.csda.2011.06.030. http://www.sciencedirect.com/science/article/pii/S0167947311002477.

Tawn, Jonathan A. 1992. Estimating probabilities of extreme sea-levels. Journal of the Royal Statistical Society. Series C (Applied Statistics) 41(1). 77–93. http://www.jstor.org/stable/2347619.

Taylor, James W. 2008. Using exponentially weighted quantile regression to estimate value at risk and expected shortfall. Journal of Financial Econometrics 6(3). 382–406. https://EconPapers.repec.org/RePEc:oup:jfinec:v:6:y:2008:i:3:p:382-406.

Umantsev, Len & Victor Chernozhukov. 2001. Conditional value-at-risk: Aspects of modeling and estimation. Empirical Economics 26(1). 271–292. https://EconPapers.repec.org/RePEc:spr:empeco:v:26:y:2001:i:1:p:271-292.

Velthoen, Jasper, Juan-Juan Cai, Geurt Jongbloed & Maurice Schmeits. 2019. Improving precipitation forecasts using extreme quantile regression. Extremes 22(4). 599–622. doi:10.1007/s10687-019-00355-1.

Wang, Hansheng & Chih-Ling Tsai. 2009. Tail index regression. Journal of the American Statistical Association 104(487). 1233–1240.

Wang, Huixia Judy & Deyuan Li. 2013. Estimation of extreme conditional quantiles through power transformation. Journal of the American Statistical Association 108(503). 1062–1074.

Wang, Huixia Judy, Deyuan Li & Xuming He. 2012. Estimation of high conditional quantiles for heavy-tailed distributions. Journal of the American Statistical Association 107(500). 1453–1464. doi:10.1080/01621459.2012.716382.

Weissman, Ishay. 1978. Estimation of parameters and large quantiles based on the k largest observations. Journal of the American Statistical Association 73. 812.