Graph estimation and network constrained regularization with applications in genetical genomics analysis
Estimation and application of graphical structure are important topics in modern statistics. Graphical structure is an ideal tool for describing gene regulatory networks, which are important in genetical genomics studies. In this dissertation, a graph estimation method and two network-constrained regularization methods are proposed. Both their theoretical properties and their applications in genetical genomics are studied.
A large amount of research effort has focused on estimating gene networks from gene expression data in order to understand the functional basis of a living organism. By treating gene expressions as quantitative traits while considering genetic markers, genetical genomics analysis has shown its power in enhancing the understanding of gene regulation. Because gene expressions are often the result of directed regulation, in chapter 2 we introduce a covariate-adjusted Gaussian graphical model to estimate the equivalence class of directed acyclic graphs (DAGs) in a genetical genomics analysis framework. An estimation procedure is introduced, and estimation consistency for sparse DAGs is established. We apply the method to a human Alzheimer's disease data set to show its utility.
Given a gene regulation network, learned either through biological experiments or via statistical methods, incorporating such structure in a regression model can increase phenotype prediction accuracy. In chapter 3, we use gene expressions to predict phenotypic responses while accounting for the graphical structure of gene networks in a parametric regression model. Because gene expressions are intermediate phenotypes between a trait and genes, we follow the instrumental variable regression framework proposed by Lin et al. (2015) and treat genetic variants as instrumental variables to deal with the endogeneity issue. We propose a two-step estimation procedure. In the first step, we apply the lasso to estimate the effects of genetic variants on gene expressions. In the second step, we use the predicted expressions obtained from the first step as predictors and adopt a network-constrained regularization method to obtain better variable selection and estimation. We establish selection consistency and a related bound.
It is commonly recognized that genetic effects on complex traits are largely modulated by environmental influences. However, the modulation mechanism is barely understood. Building on the work in chapter 3, in chapter 4 we adopt an instrumental variable regression approach and incorporate a varying coefficient model to capture potentially nonlinear environmental modulation effects. B-splines are used to approximate the coefficient functions. Group lasso and graph-constrained penalties are applied to achieve better variable selection and estimation when network structures are considered for gene expressions. Selection consistency is established under some assumptions. We apply our method to a human liver cohort data set to demonstrate its utility.
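The two-step procedure summarized for chapter 3 can be illustrated with a small sketch. This is not the dissertation's implementation. The abstract does not state the exact penalty, so the sketch assumes a lasso term plus a graph-Laplacian quadratic term, which is one common form of network-constrained regularization. It also assumes centered data (no intercepts) and uses plain coordinate descent. All function names and tuning parameters (`penalized_cd`, `two_stage`, `lam_stage1`, `lam1`, `lam2`) are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso-type coordinate updates."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def penalized_cd(X, y, lam1, lam2=0.0, L=None, n_iter=200):
    """Coordinate descent for the assumed objective
    (1/2n)||y - X b||^2 + lam1*||b||_1 + (lam2/2) * b' L b.
    With L=None this reduces to the ordinary lasso."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta (beta starts at zero)
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # (1/n) * x_j' (partial residual that excludes predictor j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            graph, diag = 0.0, 0.0
            if L is not None:
                graph = sum(L[j][k] * beta[k] for k in range(p) if k != j)
                diag = L[j][j]
            denom = col_ss[j] + lam2 * diag
            new = soft_threshold(rho - lam2 * graph, lam1) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def two_stage(G, E, y, L, lam_stage1, lam1, lam2):
    """Step 1: lasso of each expression column of E on the genotypes G.
    Step 2: regress y on the predicted expressions with the graph penalty."""
    n, p = len(G), len(E[0])
    E_hat = [[0.0] * p for _ in range(n)]
    for j in range(p):
        e_j = [E[i][j] for i in range(n)]
        gamma = penalized_cd(G, e_j, lam_stage1)
        for i in range(n):
            E_hat[i][j] = sum(G[i][k] * gamma[k] for k in range(len(gamma)))
    return penalized_cd(E_hat, y, lam1, lam2, L)


# Toy data: two genetic variants, two genes joined by one network edge.
G = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
E = [row[:] for row in G]            # expressions driven directly by the variants
y = [2.0, 3.0, -2.0, -3.0]
L = [[1.0, -1.0], [-1.0, 1.0]]       # Laplacian of the one-edge graph
beta_hat = two_stage(G, E, y, L, lam_stage1=0.0, lam1=0.0, lam2=1.0)
```

In this toy example the Laplacian term pulls the coefficients of the two connected genes toward each other, relative to the unpenalized fit. In practice the tuning parameters would be chosen by a data-driven criterion such as cross-validation, which the sketch omits.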
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Gao, Bin, Ph. D.
- Thesis Advisors: Cui, Yuehua
- Committee Members: Xiao, Yimin; Lu, Qing; Zhong, Ping-Shou
- Date Published: 2015
- Program of Study: Statistics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xi, 103 pages
- ISBN: 9781321797602; 1321797605