Empirical likelihood based functional data analysis and high dimensional inference with applications to biology
High dimensional data analysis is a rapidly developing topic in statistics, with applications in areas such as genetics and genomics, neuroscience, finance, and social science. With the rapid development of technology, statistics as a data science requires continued innovation in methodology as well as breakthroughs in mathematical frameworks. In the high dimensional setting, classical statistical methods designed for fixed dimensional models often fail. This thesis focuses on two types of high dimensional data analysis. One is the typical "large $p$, small $n$" problem in linear regression with high dimensional covariates $\mathbf{X}\in\mathbb{R}^p$ but small sample size $n$; the other is functional data analysis. Functional data belong to the class of high dimensional data in the sense that every data object consists of a large number of measurements, which may be larger than the sample size. Their key characteristic, however, is that functional objects can be modeled as smooth curves or surfaces. We use empirical likelihood (EL), introduced by Owen (2001), to solve some fundamental problems in these two settings.

The first part of the thesis considers the problem of testing functional constraints in a class of functional linear regression models where both the predictors and the response are functional data measured at discrete time points. We propose test procedures based on empirical likelihood with bias-corrected estimating equations to conduct both pointwise and simultaneous inference. The asymptotic distributions of the test statistics are derived under the null and local alternative hypotheses, with sparse and dense functional data treated in a unified framework. We find a phase transition, from sparse to dense functional data, in the asymptotic distributions and in the orders of detectable alternatives.
Specifically, the proposed tests can detect alternatives of root-$n$ order when the number of repeated measurements per curve is of an order larger than $n^{\eta_0}$, where $n$ is the number of curves. The transition points $\eta_0$ differ between the pointwise and simultaneous tests, and both are smaller than the transition point in the estimation problem.

In the second part of the thesis, we consider hypothesis testing problems for a low-dimensional coefficient vector in a high-dimensional linear model with heteroscedastic errors. Heteroscedasticity is commonly observed in many applications, including finance and genomic studies. Several statistical inference procedures have been proposed for low-dimensional coefficients in a high-dimensional linear model with homoscedastic noise. However, those procedures are not applicable to models with heteroscedastic errors, and the heteroscedasticity issue has not been investigated. We propose an inference procedure based on empirical likelihood to overcome this issue. The proposed method yields valid inference under heteroscedasticity even when the conditional variance of the random error is a function of the high-dimensional predictor. We apply our inference procedure to three recently proposed estimating equations and establish the asymptotic distributions of the proposed methods.

For both parts, simulation studies and real data analyses are conducted to demonstrate the proposed methods.
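For readers unfamiliar with empirical likelihood, the following is a minimal sketch of the classical EL ratio statistic for a scalar mean. It illustrates only the textbook construction, not the bias-corrected estimating-equation procedures developed in the thesis; the function name `el_stat_mean` and the bisection solver are illustrative assumptions.

```python
import math

def el_stat_mean(x, mu, max_iter=200):
    """Return -2 log R(mu), the EL ratio statistic for H0: E[X] = mu (scalar X)."""
    z = [xi - mu for xi in x]            # estimating function values X_i - mu
    if all(zi == 0.0 for zi in z):
        return 0.0
    if min(z) >= 0.0 or max(z) <= 0.0:
        return math.inf                  # mu is not inside the convex hull of the data
    # The Lagrange multiplier lam must keep every weight positive:
    # 1 + lam * z_i > 0 for all i, i.e. lam lies in the open interval (lo, hi).
    lo, hi = -1.0 / max(z), -1.0 / min(z)

    def score(lam):
        # Strictly decreasing in lam on (lo, hi); its root defines the EL weights
        # w_i = 1 / (n * (1 + lam * z_i)).
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # Plain bisection (an illustrative choice; Newton-type solvers are also common).
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid == lo or mid == hi:
            break
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)
```

Under the usual regularity conditions this statistic is asymptotically $\chi^2_1$ at the true mean, so values above roughly 3.84 would lead to rejection at the 5% level. For instance, `el_stat_mean([1, 2, 3, 4, 5], 3)` is zero up to rounding because 3 is the sample mean.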
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Wang, Honglang
- Thesis Advisors: Cui, Yuehua; Zhong, Ping-Shou
- Committee Members: Hong, Hyokyoung; Buell, Carol R.
- Date: 2015
- Subjects: Biology; Quantitative research
- Program of Study: Statistics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xi, 165 pages
- ISBN: 9781321916652; 1321916655
- Permalink: https://doi.org/doi:10.25335/hpwe-kt56