Statistical inference with high-dimensional dependent data

High-dimensional time-dependent data arise in practice when a large number of variables are measured repeatedly on a relatively small number of experimental units. The number of repeated measurements can range from two to hundreds, depending on the application. Advances in technology have made gathering and storing such data relatively inexpensive and efficient. Demand for analyzing such complex data arises in genetics, microbiology, neuroscience, finance, and meteorology. In this dissertation, we first introduce and investigate a novel solution to a classical problem that involves high-dimensional time-dependent data. In addition, we propose a new approach to analyzing high-dimensional dependent genomics data.

First, we consider detecting and identifying change points among covariance matrices of high-dimensional longitudinal data and high-dimensional functional data. The proposed methods are applicable under general temporospatial dependence. A new test statistic is introduced for change point detection, and its asymptotic distribution is established under two different asymptotic settings. If a change point is detected, an estimate of its location is provided. We investigate the rate of convergence of the change point estimator and study how it is affected by dimensionality and temporospatial dependence in each asymptotic framework. Binary segmentation is applied to estimate the locations of possibly multiple change points, and the corresponding estimator is shown to be consistent under mild conditions in each asymptotic setting. Simulation studies demonstrate the empirical size and power of the proposed test and the accuracy of the change point estimator. We apply our procedures to a time-course microarray data set and a task-based fMRI data set.

In the second part of this dissertation, we consider a hierarchical high-dimensional dependent model in the context of genomics. Our model analyzes RNA sequencing data to identify polymorphisms with allele-specific expression that are correlated with phenotypic variation. Through simulation, we demonstrate that our model can consistently select significant predictors among a large number of possible predictors. We apply our model to an RNA sequencing and phenotypic data set derived from a sounder of swine.
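
To make the binary segmentation step mentioned in the abstract concrete, here is a minimal Python sketch of the generic recursion. It is not the dissertation's procedure. The contrast `cov_contrast` is a simple plug-in stand-in (a weighted squared Frobenius distance between sample covariance matrices), not the test statistic proposed in the dissertation. The names `cov_contrast` and `binary_segmentation` and the parameters `threshold` and `min_len` are hypothetical, chosen only for illustration. In practice the threshold would come from the asymptotic distribution of a properly calibrated statistic.

```python
import numpy as np


def cov_contrast(X, t):
    """Weighted squared Frobenius distance between the sample covariance
    matrices of X[:t] and X[t:]. Illustrative stand-in statistic only."""
    n = X.shape[0]
    s_before = np.cov(X[:t], rowvar=False)
    s_after = np.cov(X[t:], rowvar=False)
    weight = t * (n - t) / n**2
    return weight * np.sum((s_before - s_after) ** 2)


def binary_segmentation(X, threshold, min_len=5, offset=0):
    """Recursively split the rows of X wherever the largest contrast exceeds
    `threshold`. Returns the sorted list of estimated change point indices
    (index of the first row after each change)."""
    n = X.shape[0]
    if n < 2 * min_len:
        return []  # segment too short to split
    candidates = range(min_len, n - min_len + 1)
    stats = [cov_contrast(X, t) for t in candidates]
    best = int(np.argmax(stats))
    if stats[best] <= threshold:
        return []  # no evidence of a change in this segment
    t_hat = candidates[best]
    left = binary_segmentation(X[:t_hat], threshold, min_len, offset)
    right = binary_segmentation(X[t_hat:], threshold, min_len, offset + t_hat)
    return left + [offset + t_hat] + right
```

Each call looks for at most one change point in its segment. If one is found, the segment is split there and both halves are searched again, so multiple change points are recovered one at a time.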


In Collections: Electronic Theses & Dissertations

Copyright Status: In Copyright

Material Type: Theses

Authors: Santo, Shawn M.

Thesis Advisors: Zhong, Ping-Shou

Committee Members: Cui, Yuehua; Hong, Hyokyoung; Steibel, Juan

Date Published: 2018

Subjects: Statistics

Program of Study: Statistics - Doctor of Philosophy

Degree Level: Doctoral

Language: English

Pages: 176 pages
