- Flexible hierarchical Bayesian modeling extensions to improve whole genome prediction and genome wide association analyses
- Chen, Chunyu (Graduate of Michigan State University)
- Electronic Theses & Dissertations
"Whole genome prediction (WGP) has been widely implemented in animal and plant breeding for genomic selection of economically important traits, having already accelerated genetic progress for economically important traits in some species especially dairy cattle. Genome wide association (GWA) analysis is used for screening genomic regions that may include important candidate genes segregating for the trait of interest and is being increasingly integrated with WGP analysis. Both WGP and GWA...
"Whole genome prediction (WGP) has been widely implemented in animal and plant breeding for genomic selection of economically important traits, having already accelerated genetic progress for such traits in some species, especially dairy cattle. Genome wide association (GWA) analysis is used for screening genomic regions that may include important candidate genes segregating for the trait of interest and is being increasingly integrated with WGP analysis. Both WGP and GWA typically represent m ≫ n problems, as defined by a large number of single nucleotide polymorphism (SNP) markers (m) and a comparably much smaller number of individuals (n). Two broad types of parametric models are typically considered for these analyses: traditional best linear unbiased prediction approaches based on SNP marker effects being normally distributed, and Bayesian WGP models that allow more flexible specifications for SNP marker effects based on either heavy-tailed or variable selection specifications. Bayesian WGP models can achieve higher prediction accuracies than traditional approaches in many applications if properly tuned; however, their implementation can be computationally challenging. My dissertation aimed to address some of these emerging issues in Bayesian WGP models and to provide software tools for real data applications. In Chapter 2, I developed an expectation maximization (EM) algorithm as a fast alternative to traditional Markov Chain Monte Carlo (MCMC) for Bayesian WGP models. I proposed EM implementations for two models, heavy-tailed BayesA and stochastic search and variable selection (SSVS), adapting the EM algorithm for maximum a posteriori (MAP) inference of SNP effects and adapting REML-like strategies to estimate key hyperparameters. Using a comprehensive simulation study and real data analysis, I found that these empirical Bayes approaches can be quite sensitive to starting values for SNP effects.
However, using a deterministic annealing variant of EM, I obtained hyperparameter estimates and prediction accuracies comparable to their MCMC counterparts. In Chapter 3, I further assessed the possibility of using two Bayesian WGP models, BayesA and SSVS, for GWA studies. I also included a popular GWA analysis (EMMAX) based on the linear mixed model. In addition to basing inferences on traditional single SNP tests and fixed genomic window tests, I assessed the merit of tests involving adaptively determined windows, obtained by clustering the genome into blocks based on linkage disequilibrium. I found that SSVS and BayesA under MCMC and adaptive window tests led to the best receiver operating characteristic (ROC) properties. In Chapter 4, I extended SSVS to single step SSVS to incorporate phenotypes of non-genotyped individuals and compared its performance with corresponding models ignoring these genotypes for both WGP and GWA. I found single step SSVS to be promising for WGP and GWA, particularly for genetic architectures characterized by a few genes with large effects. In Chapter 5, I combined many of the developments from Chapters 2 through 4 and beyond into a unified framework, released as an open source R package, BATools, which implements several different Bayesian models for WGP and GWA."--Pages ii-iii.
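
The "traditional" baseline named in the abstract, in which all SNP effects share one normal prior, can be illustrated with a minimal ridge-regression sketch for the m ≫ n setting. This is not code from the dissertation or from BATools: the function name, the simulated genotypes, and the value of `lam` (the ratio of residual variance to marker-effect variance, fixed by hand here rather than estimated) are all assumptions made for illustration. The sketch solves an n × n system instead of an m × m one, which is what makes the normal-prior model cheap when markers far outnumber individuals.

```python
import numpy as np


def ridge_snp_effects(X, y, lam):
    """Ridge (normal-prior) estimates of SNP effects via the n x n dual system.

    X   : (n, m) genotype matrix coded 0/1/2
    y   : (n,) phenotypes
    lam : assumed ratio residual variance / marker-effect variance
    """
    n = X.shape[0]
    mu = y.mean()                      # intercept estimate
    Xc = X - X.mean(axis=0)            # center each marker column
    K = Xc @ Xc.T                      # n x n cross-product matrix
    alpha = np.linalg.solve(K + lam * np.eye(n), y - mu)
    return mu, Xc.T @ alpha            # map back to m marker effects


# Hypothetical simulated data: 50 individuals, 500 markers (m >> n).
rng = np.random.default_rng(0)
n, m = 50, 500
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
true_effects = np.zeros(m)
true_effects[:5] = 1.0                 # a few markers with large effects
y = X @ true_effects + rng.normal(size=n)

lam = 100.0                            # arbitrary illustrative value
mu, effects = ridge_snp_effects(X, y, lam)
print(effects.shape)
```

Because every marker is shrunk by the same amount under this prior, the few large effects in the simulated data are spread across many markers. That uniform shrinkage is the limitation that heavy-tailed (BayesA) and variable selection (SSVS) priors, as described in the abstract, are meant to relax.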