High-dimensional variable selection for spatial regression and covariance estimation
Spatial regression is an important predictive tool in many scientific applications, and an additive model provides a flexible regression relationship between predictors and a response variable. Such models have proved effective in regression-based prediction. In this article, we develop a regularized variable selection technique for building a spatial additive model. We find that approaches developed for independent data do not work well for spatially dependent data. This motivates us to propose a spatially weighted $L_2$ error norm with a group LASSO type penalty to select additive components for spatial additive models. We establish the selection consistency of the proposed approach, in which the penalty parameter depends on several factors, such as the order of approximation of the additive components and the characteristics of the spatial weight and spatial dependence. An extensive simulation study shows how dependent data structures and the choice of spatial weight affect selection results, as well as the asymptotic behavior of the estimates. We also investigate the impact of correlated predictor variables. As an illustrative example, the proposed approach is applied to lung cancer mortality data for the period 2000-2005, obtained from the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute, U.S.
Providing a best linear unbiased predictor (BLUP) is always a challenge for non-repetitive, irregularly spaced spatial data. Both estimation and prediction involve inverting an $n\times n$ covariance matrix, which requires $O(n^3)$ computation. Studies have shown that the covariance matrix of the observed process can be decomposed into two additive matrix components: measurement error and an underlying process, which can be non-stationary. The non-stationary component is often assumed to be fixed but low rank.
This assumption allows us to write the underlying process as a linear combination of a fixed number of spatial random effects, an approach known as fixed rank kriging (FRK). The smaller rank reduces the computation time to $O(n r^2)$, where $r$ is the rank of the low-rank covariance matrix. In this work we generalize FRK by rewriting the underlying process as a linear combination of $n$ random effects, although only a few of these are actually responsible for quantifying the covariance structure. Further, FRK assumes that the covariance matrix of the random effects can be represented as the product of $r \times r$ Cholesky factors. The generalization leads us to an $n \times n$ Cholesky decomposition and a group-wise penalized likelihood in which each row of the lower triangular matrix is penalized. More precisely, we present a two-step approach that uses a group LASSO type shrinkage estimation technique to estimate the rank of the covariance matrix and then the matrix itself. We investigate our findings in a set of simulation studies and finally apply the method to rainfall data from Colorado, U.S.
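The $O(n r^2)$ cost quoted for fixed rank kriging can be seen from the Sherman-Morrison-Woodbury identity. The following is a sketch under assumed notation that does not appear in the abstract: $S$ is a hypothetical $n \times r$ matrix of basis functions, $K$ is the positive definite $r \times r$ covariance matrix of the random effects, and the measurement error covariance is taken to be $\sigma^2 I_n$ for simplicity.

$$
\Sigma = S K S^\top + \sigma^2 I_n,
\qquad
\Sigma^{-1} = \sigma^{-2} I_n - \sigma^{-2}\, S \left( \sigma^2 K^{-1} + S^\top S \right)^{-1} S^\top .
$$

Under these assumptions, forming $S^\top S$ costs $O(n r^2)$ and inverting the $r \times r$ matrix in parentheses costs $O(r^3)$, so applying $\Sigma^{-1}$ to a vector never requires a direct $n \times n$ inversion when $r \ll n$.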
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Nandy, Siddhartha
- Thesis Advisors: Lim, Chae-Young; Maiti, Tapabrata
- Committee Members: Zhong, Ping-Shou; Tan, Pan-Ning
- Date Published: 2016
- Program of Study: Statistics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xii, 84 pages
- ISBN: 9781369425543; 1369425546
- Permalink: https://doi.org/doi:10.25335/deqh-jp13