Multivariate generalized functional linear models with applications to genomics
This thesis focuses on developing functional data methodology to address problems that arise in genetic sequencing data. While significant progress has been made in identifying common genetic variants associated with diseases, these variants explain only a small proportion of heritability. Recent studies suggest that rare variants could account for some of the unexplained heritability. With advances in sequencing technology, large-scale sequencing studies are now being conducted to comprehensively investigate the contribution of rare variants to the genetic etiology of various diseases. Although these studies hold great potential for uncovering new disease-associated variants, the massive amount of data and the complex structure of sequencing data pose great analytical challenges for association analysis. Advanced methods are needed to address these challenges and to facilitate the discovery of new variants predisposing to various diseases. We use functional data analysis methods to capture the complexities of sequencing data.
In the first chapter we investigate the importance of considering the genetic structure of sequencing data. In association studies, the effect of appropriately modeling the genetic structure of sequencing data on association analysis has not been well studied. We compare three statistical approaches that use different strategies to model the genetic structure: a burden test, a burden test that accounts for pairwise correlation, and a functional analysis of variance (FANOVA) test that models the gene by fitting continuous curves to an individual's genotype profile. We find some evidence in favor of treating sequencing data as a function.
In the second chapter we present the definitions of some fundamental concepts in Functional Data Analysis, such as the mean element, the covariance operator and its eigendecomposition, and the Karhunen-Loeve expansion.
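As a rough illustration of these definitions, the sketch below estimates the mean function, the sample covariance operator, its eigenvalues and eigenfunctions, and the resulting truncated Karhunen-Loeve expansion for curves observed on a common, equally spaced grid. It assumes densely and regularly observed curves and approximates integrals by Riemann sums; the function name `fpca` and the simulated data are hypothetical and are not taken from the thesis.

```python
import numpy as np


def fpca(X, t, n_components):
    """Estimate the mean, covariance eigenpairs and Karhunen-Loeve scores.

    X: (n, m) array; row i holds curve i evaluated at the m points of t.
    t: (m,) equally spaced grid (assumption: common, regular design).
    """
    n, m = X.shape
    dt = t[1] - t[0]                       # grid spacing used in Riemann sums
    mu = X.mean(axis=0)                    # sample mean function
    Xc = X - mu                            # centered curves
    C = Xc.T @ Xc / n                      # sample covariance kernel C(s, t) on the grid
    # Discretized covariance operator: (C f)(s) ~ sum_t C(s, t) f(t) dt
    evals, evecs = np.linalg.eigh(C * dt)
    order = np.argsort(evals)[::-1][:n_components]
    lam = evals[order]                     # estimated eigenvalues, largest first
    phi = evecs[:, order] / np.sqrt(dt)    # eigenfunctions with sum(phi**2) * dt = 1
    scores = Xc @ phi * dt                 # xi_ik = <X_i - mu, phi_k>
    return mu, lam, phi, scores


# Hypothetical simulated curves: X_i(t) = sum_k xi_ik * phi_k(t)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 201)
true_phi = np.stack([np.sqrt(2) * np.sin(np.pi * (k + 1) * t) for k in range(3)], axis=1)
true_lam = np.array([4.0, 1.0, 0.25])
xi = rng.normal(size=(500, 3)) * np.sqrt(true_lam)
X = xi @ true_phi.T

mu, lam, phi, scores = fpca(X, t, n_components=3)
X_hat = mu + scores @ phi.T                # truncated Karhunen-Loeve reconstruction
print(np.round(lam, 2))
print(np.max(np.abs(X - X_hat)))
```

Rescaling the matrix eigenvectors by `1/sqrt(dt)` makes them approximately orthonormal in L2 rather than in the Euclidean sense, so the eigenvalues approximate those of the covariance operator instead of depending on the grid size.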
Basis expansions, and in particular the Karhunen-Loeve expansion, play an important role in this thesis. We briefly discuss the estimators for the mean function and the covariance operator and their consistency. Results on the consistency of the eigenvalues and eigenfunctions of the sample covariance operator are also stated.
Genetic data are often collected on families, where the traits (responses) of family members can be dependent on each other. Additionally, the trait of interest can be discrete or continuous. Thus there is a need for a functional model that can handle dependent data that may be continuous or discrete. The model proposed by Muller and Stadtmuller (2005) uses the generalized estimating equations approach, which can handle both continuous and discrete data. However, they assume that the response variable is univariate and that the sample is independent. To our knowledge, no existing functional methods can be directly applied to family data. In the third chapter we develop a framework for dependent generalized functional linear models with a multivariate response, which can be used to test for a certain type of association between the genetic data and the trait of interest in family data.
In the fourth chapter we develop a regression framework in which the response variable is normally distributed and the regressor function is measured with error. In this setup the true regressor function is not observable; instead, we observe a surrogate variable and its replicates. The relation between the true function and the surrogate is assumed to follow the classical additive measurement error model. We use the approach developed by Stefanski and Carroll (1987) to propose an estimating equation for the parameters and show the asymptotic existence and consistency of the estimator obtained from this equation.
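To illustrate why replicates matter in the measurement error setting, the sketch below simulates the classical additive model W_ij(t) = X_i(t) + U_ij(t) with J replicates per subject and a normal scalar response, then contrasts a naive least-squares fit on the replicate-averaged surrogate with a moment-corrected fit that subtracts the estimated error covariance. This is a standard method-of-moments correction chosen for illustration, not the estimating equation based on Stefanski and Carroll (1987) that the thesis develops; the sine basis, the truncation level `K`, the function names `basis` and `corrected_fit`, and all simulated parameters are hypothetical.

```python
import numpy as np


def basis(t, K):
    """Orthonormal sine basis on [0, 1]; an assumed, illustrative choice."""
    return np.stack([np.sqrt(2) * np.sin(np.pi * (k + 1) * t) for k in range(K)], axis=1)


def corrected_fit(W, y, t, K):
    """Naive and moment-corrected fits of y_i = a + int beta(t) X_i(t) dt + eps_i.

    W: (n, J, m) array of J replicate surrogate curves per subject on grid t,
       following the classical additive model W_ij = X_i + U_ij.
    Returns the basis coefficients of beta from both fits.
    """
    n, J, m = W.shape
    dt = t[1] - t[0]
    Z = W @ basis(t, K) * dt               # (n, J, K) basis scores via Riemann sums
    zbar = Z.mean(axis=1)                  # scores of the replicate-averaged surrogate
    # Within-subject scatter of the replicates estimates the error covariance Sigma_U
    dev = Z - zbar[:, None, :]
    sigma_u = np.einsum("ijk,ijl->kl", dev, dev) / (n * (J - 1))
    zc = zbar - zbar.mean(axis=0)          # centering absorbs the intercept a
    yc = y - y.mean()
    S_zz = zc.T @ zc / n
    S_zy = zc.T @ yc / n
    naive = np.linalg.solve(S_zz, S_zy)                    # treats zbar as error-free
    corrected = np.linalg.solve(S_zz - sigma_u / J, S_zy)  # zbar has error cov Sigma_U / J
    return naive, corrected


# Hypothetical simulation: true curves X_i are never observed directly
rng = np.random.default_rng(1)
n, J, K = 5000, 3, 4
t = np.linspace(0.0, 1.0, 201)
Phi = basis(t, K)
true_b = np.array([1.0, -0.5, 0.25, 0.0])          # beta(t) = sum_k b_k phi_k(t)
x_scores = rng.normal(size=(n, K)) * np.array([1.0, 0.8, 0.6, 0.4])
X = x_scores @ Phi.T
u_scores = rng.normal(scale=0.7, size=(n, J, K))   # classical additive error
W = X[:, None, :] + u_scores @ Phi.T               # observed replicates
y = 2.0 + x_scores @ true_b + rng.normal(scale=0.5, size=n)

naive, corrected = corrected_fit(W, y, t, K)
print("naive:    ", np.round(naive, 2))
print("corrected:", np.round(corrected, 2))
```

The naive coefficients are attenuated toward zero because the averaged surrogate still carries error with covariance Sigma_U / J; the replicates are what make Sigma_U estimable without knowing the error distribution in advance.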
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Jadhav, Sneha
- Thesis Advisors: Koul, Hira L.
- Date: 2017
- Subjects: Linear models (Statistics); Genomics
- Program of Study: Statistics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: viii, 79 pages
- ISBN: 9780355146400; 0355146401
- Permalink: https://doi.org/doi:10.25335/q2yj-9t96