HIERARCHICAL EXTENSIONS OF BAYESIAN PARAMETRIC MODELS FOR WHOLE GENOME PREDICTION

By

Wenzhao Yang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Animal Science – Doctor of Philosophy
Quantitative Biology – Doctor of Philosophy

2014

ABSTRACT

HIERARCHICAL EXTENSIONS OF BAYESIAN PARAMETRIC MODELS FOR WHOLE GENOME PREDICTION

By

Wenzhao Yang

Whole genome prediction (WGP) is increasingly used to predict breeding values (BV) of plants and animals based on single nucleotide polymorphism (SNP) marker panels. Two particularly popular WGP models, labeled BayesA and BayesB, specify all SNP-associated effects to be independent of each other. In this dissertation, we further extend these two models to allow for greater flexibility to infer upon BV and SNP effects in three different frameworks: 1) allowing for correlated SNP effects, 2) reaction norm modeling of genotype by environment interaction (G×E), and 3) bivariate WGP models. We complement these efforts by focusing on strategies to infer upon key hyperparameters that anchor some of these specifications.

Based on a first-order nonstationary antedependence specification, we extended BayesA and BayesB to account for spatial correlation between SNP effects due to proximal quantitative trait loci (QTL); we label the corresponding extensions ante-BayesA and ante-BayesB, respectively. Using simulation studies and application to the publicly available heterogeneous stock mice data and other provided benchmark data, we determined that the antedependence models had significantly higher WGP accuracies compared to their conventional counterparts, especially at higher levels of linkage disequilibrium (LD).

Subsequently, we extended reaction norm (RN) and random regression (RR) models to account for G×E. Several specifications on the SNP-specific variance-covariance matrices (VCV) of intercept and slope effects were considered using independent inverted Wishart (IW) prior densities (IW-BayesA, IW-BayesB and IW-BayesC). Two potentially more flexible RR/RN models using the square root free Cholesky decomposition (CD) were proposed (CD-BayesA and CD-BayesB). Based on a RN simulation study and a RR data analysis in pigs, RR/RN WGP models provided greater WGP accuracies compared to conventional WGP models, although differences were not substantial between the competing IW- versus CD-based methods except with simpler genetic architectures (i.e., low numbers of QTL).

We also developed bivariate WGP models based on much the same specifications for SNP-specific VCV as in the RR/RN models (i.e., IW-BayesA, CD-BayesA and CD-BayesB), comparing them to the more conventional bivariate genomic BLUP (bGBLUP) model. Using an LD simulation study, the three bivariate trait models generally demonstrated higher WGP accuracy than univariate BayesA or BayesB when the number of pleiotropic QTL was relatively large and the heritability of the trait was low. Furthermore, in an application to data from pine trees, CD-BayesB exhibited higher predictive ability compared to other competing models.

Comparisons between competing WGP models require appropriate tuning of key hyperparameters. Hence, we also studied three alternative Metropolis-Hastings (MH) sampling strategies to infer upon key hyperparameters in BayesA and BayesB.
In both simulation studies and an application to the heterogeneous stock mice data, strategies that relied more heavily on Metropolis-Hastings sampling of key hyperparameters demonstrated significantly greater computational efficiency than strategies that deferred to Gibbs sampling.

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Robert Tempelman, for taking me as his student and guiding me throughout my PhD research project with encouragement and patience. Dr. Robert Tempelman not only offered me a great opportunity to work with him but also helped me develop critical thinking. I greatly appreciate the guidance and mentorship he provided to me over the years. I want to thank my guidance committee members for their constructive and thought-provoking suggestions on my research work. Dr. Juan Steibel, who was my second reader, provided important data resources for my research project. Dr. Cathy Ernst helped me understand genetics and guided me through my dual major. Dr. Yuehua Cui inspired me to learn quantitative genetics, and Dr. Qing Lu gave me great input from an epigenetics perspective. Furthermore, I would like to thank the United States Department of Agriculture (USDA), the Department of Animal Science and the Quantitative Biology program for sponsoring my research project. I am also grateful to my colleagues Nora Bello, Chunyu Chen, Heng Wang, Lei Zhou, Igseo Choi, Yvonne Badke, Jose Luis Gualdron and Pablo Reeb for their feedback and friendship. Last but not least, I would like to express my thanks to my husband and my parents for their unconditional love and support.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1 Introduction
Chapter 2 A Bayesian antedependence model for whole genome prediction
  2.1 Background
  2.2 Materials and Methods
    2.2.1 Conventional WGP model
    2.2.2 Antedependence extensions of WGP models
    2.2.3 Simulation study
    2.2.4 Application to Heterogeneous Stock Mice Dataset
    2.2.5 Application to Simulated Genomic Data from Hickey and Gorjanc
    2.2.6 Bayesian inference
    2.2.7 Prior specifications
  2.3 Results
    2.3.1 Simulation Study
    2.3.2 Application to Heterogeneous Stock Mice data
    2.3.3 Application to Hickey and Gorjanc Data
  2.4 Discussion and Conclusion
Chapter 3 Improving the computational efficiency of fully Bayes inference and assessing the effect of misspecification on hyperparameters in whole genome prediction model
  3.1 Introduction
  3.2 Materials and Methods
    3.2.1 WGP Model
    3.2.2 Univariate Metropolis Hastings sampling on ν and Gibbs update on s² (DFMH)
    3.2.3 Univariate Metropolis Hastings sampling for each of ν and s² (UNIMH)
    3.2.4 Bivariate Metropolis Hastings sampling on ν and s² (BIVMH)
    3.2.5 Simulation Study
    3.2.6 Data Application: Assessment of computational efficiency comparisons
  3.3 Results
    3.3.1 Simulation Study
    3.3.2 Application to Heterogeneous Stock Mice data
  3.4 Discussion
  3.5 Conclusions
Chapter 4 Random regression and reaction norm extensions of whole genome prediction models to account for genotype by environment interaction
  4.1 Introduction
  4.2 Materials and Methods
    4.2.1 Random regression and reaction norm models
    4.2.2 Conventional BayesA and BayesB (BayesA\BayesB)
    4.2.3 Bivariate Normality (IW-BayesC)
    4.2.4 Bivariate Student t and Variable Selection (IW-BayesA\IW-BayesB)
    4.2.5 Cholesky decomposition specifications (CD-BayesA\CD-BayesB)
    4.2.6 Bayesian inference
    4.2.7 Simulation Study
    4.2.8 MSU Pig Resource Population data
    4.2.9 Priors used for data analyses
  4.3 Results
    4.3.1 Simulation Study
    4.3.2 MSU Pig Resource Population data
  4.4 Discussion
  4.5 Conclusions
Chapter 5 Exploring alternative specifications for bivariate trait whole genome prediction models
  5.1 Introduction
  5.2 Methods and Materials
    5.2.1 Whole genome prediction models
    5.2.2 Univariate BayesA and BayesB (uBayesA\uBayesB)
    5.2.3 Bivariate Ridge regression (bGBLUP)
    5.2.4 Bivariate Student-t (IWBayesA)
    5.2.5 Cholesky decomposition specifications (CDBayesA\CDBayesB)
    5.2.6 Bayesian inference
    5.2.7 Simulation studies
    5.2.8 Pine data analyses
    5.2.9 Priors used for data analyses
  5.3 Results
    5.3.1 Simulation Studies
    5.3.2 Pine data analyses
  5.4 Discussion
  5.5 Conclusions
Chapter 6 Discussion, Conclusions and Future Work
APPENDICES
  APPENDIX A: Chapter 2
  APPENDIX B: Chapter 3
  APPENDIX C: Chapter 4
  APPENDIX D: Chapter 5
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Summary statistics for 6 different marker densities in the simulation study over 20 replicates
Table 4.1: Summary of six scenarios in LD simulation

Table 5.1: Summary of two different populations compared in an LD simulation study

LIST OF FIGURES

Figure 2.1: Average posterior means of $s_g^2$ (BayesA, BayesB) and $s_\delta^2$ (ante-BayesA, ante-BayesB) across 20 replicates for six different levels of LD, comparing BayesA and ante-BayesA in (A) and BayesB versus ante-BayesB in (B). Significant differences in posterior means between competing methods at each LD level are indicated by *(P<0.01), **(P<0.001), or ***(P<0.0001).

Figure 2.2: Average posterior means of $\pi_g$ (BayesB) versus $\pi_\delta$ (ante-BayesB) across 20 replicates as a function of six different LD levels. Significant differences in posterior means between competing methods at each LD level are indicated by *(P<0.01), **(P<0.001), or ***(P<0.0001).

Figure 2.3: Average posterior means of $\nu_g$ (BayesA, BayesB) and $\nu_\delta$ (ante-BayesA, ante-BayesB) across 20 replicates for six different levels of LD, comparing BayesA and ante-BayesA in (A) and BayesB versus ante-BayesB in (B). No significant differences (P>0.01) were determined between the two sets of competing procedures at each LD level.

Figure 2.4: Average accuracies of estimated breeding value across 20 replicates for analyses based on each of six LD levels. Differences in accuracy between BayesA and ante-BayesA (bottom symbols) and between BayesB and ante-BayesB (top symbols) indicated as significant by *(P<0.01) or **(P<0.001).

Figure 2.5: Boxplots of average accuracies of estimated breeding value across 9 replicates for four traits in Generations 6, 8 and 10 for benchmark data from Hickey and Gorjanc (2012). Differences in accuracy between ante-BayesB (black) and BayesB (dark gray) and between ante-BayesA (light gray) and BayesA (white) indicated as significant by *(0.05

Chapter 1 Introduction

GBLUP can accommodate the m >> n issue since the specification of random effects facilitates a borrowing of information across markers. However, it has been speculated that the distributional assumptions in GBLUP may be too strong, depending upon the genetic architecture of the trait, i.e., the distribution of the QTL effects themselves, often believed to be non-normal (HAYES and GODDARD 2001), or the relative number of QTL to the number of markers. Meuwissen et al. (2001) introduced parametric Bayesian models labeled "BayesA" and "BayesB" to provide additional distributional flexibility, with both approaches often demonstrating better fit for WGP compared to GBLUP (Meuwissen, Hayes et al. 2001; Habier, Fernando et al. 2007; Hayes, Bowman et al. 2009). The "BayesA" model specifies marker-specific genetic effects to be normally distributed with mean 0 and marker-specific variances that are independent random draws from a scaled inverted chi-square distribution; in essence, the genetic effects are marginally specified to be IID Student t distributed (de los Campos, Hickey et al. 2012). The "BayesB" model uses this same distributional assumption as one component of a mixture distribution, the other component being a point spike at 0; i.e., no effects for those markers belonging to that component.
Since then, several other "Bayesian alphabet" models have been developed as well (de los Campos, Naya et al. 2009; Verbyla, Hayes et al. 2009; Habier, Fernando et al. 2011; Wang, Ding et al. 2013); nevertheless, it has been duly noted that these developments, and any such comparisons involving new models, might be tainted by misspecification or inappropriate tuning of the key hyperparameters that anchor their corresponding distributional specifications (GIANOLA 2013).

SNP effects have been jointly analyzed under a multivariate WGP framework across heterogeneous environments (Burgueno, de los Campos et al. 2012) or multiple traits (CALUS and VEERKAMP 2011). For heterogeneous environments in WGP, genotype by environment interaction (G×E) can be detected by modeling SNP-specific intercept and slope effects of environmental covariates (Lillehammer, Hayes et al. 2009) in random regression (RR) and reaction norm (RN) models (Berry, Buckley et al. 2003; Cardoso and Tempelman 2012). For multiple traits in WGP, pleiotropic regions of the genome can be detected by modeling SNP-specific pleiotropic effects in multivariate trait models (van Binsbergen, Veerkamp et al. 2012). In RR/RN models and bivariate trait models, the same prior densities can be specified on the genetic variance-covariance matrices (VCV) of the SNP-specific effects. Calus and Veerkamp (2011) proposed a multiple trait BayesA model with a conjugate inverted Wishart (IW) prior on the VCV. This specification is potentially inflexible since uncertainty in all elements of a VCV is governed by a single degrees of freedom parameter (MUNILLA and CANTET 2012). Bello et al. (2010) suggested that the square root free Cholesky decomposition (CD) of the VCV in bivariate mixed models might allow greater flexibility, as uncertainty can be differentially expressed on each element of a VCV using such a parameterization.

There are three overarching goals in this dissertation, all pertaining to the meaningful development of WGP models with improved accuracies. First, one objective was to potentially improve WGP accuracy by extending existing models to account for spatially induced correlations between SNP effects due to the proximity of QTL (Chapter 2), as originally anticipated by Gianola et al. (2003). A second objective was to investigate computational strategies that might allow one to reliably infer upon all key hyperparameters that underlie these and other more conventional WGP models (Chapter 3), including an assessment of the implications of hyperparameter misspecification. Finally, I deemed it imperative to provide greater flexibility in bivariate SNP effects modeling than currently developed for WGP models, whether for inferring upon genotype by environment interactions from a reaction norm perspective (Chapter 4) or for bivariate trait WGP models (Chapter 5). I conclude this dissertation with some further concluding thoughts and areas for future research in Chapter 6.

Chapter 2 A Bayesian antedependence model for whole genome prediction

2.1 Background

Whole genome prediction (WGP) using commercially available medium to high density (>50,000) single nucleotide polymorphism (SNP) panels has transformed livestock and plant breeding. Typically, the allelic substitution effects of all SNP markers are jointly estimated in WGP evaluation models assuming additive inheritance and summed to predict the breeding value of each individual animal based on its SNP genotypes (Meuwissen, Hayes et al. 2001).
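In code, this prediction step amounts to a genotype-weighted sum of estimated marker effects. The minimal sketch below is illustrative only; the array names and toy values are hypothetical and it is not software associated with this dissertation.

```python
import numpy as np

def predict_breeding_values(Z, g_hat):
    """Predicted breeding values: sum of estimated allelic substitution
    effects weighted by each animal's 0/1/2 SNP genotype codes."""
    return Z @ g_hat

# Toy example: 10 animals, 50 markers (arbitrary values, illustration only)
rng = np.random.default_rng(3)
Z = rng.integers(0, 3, size=(10, 50)).astype(float)   # SNP genotypes coded 0, 1, 2
g_hat = rng.normal(0.0, 0.05, size=50)                # estimated SNP effects
ebv = predict_breeding_values(Z, g_hat)
```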
This technology is expected not only to dramatically increase rates of genetic improvement for economically important traits such as meat and milk production in livestock (Wiggans, VanRaden et al. 2011) or crop production (Lorenz, Chao et al. 2011), but also to improve predictions of genetic predisposition to human diseases for personalized medicine (de los Campos, Gianola et al. 2010). Currently, the number (m) of available SNP markers is typically much greater than the number (n) of animals having phenotypic records. Hence, hierarchical mixed model or Bayesian approaches have been generally adopted in WGP to efficiently borrow information across these many markers by specifying their corresponding effects to be random. Following MEUWISSEN et al. (2001), these effects are typically specified to be either Gaussian or Student t-distributed (BayesA), or a mixture of either of these two densities with a point mass on zero (BayesB). When these effects are specified to be Gaussian, best linear unbiased prediction of these effects is typically pursued because of computational tractability (VanRaden 2008; Hayes, Bowman et al. 2009); applied to WGP, this procedure is often known as GBLUP. Thus far, the distributional specifications for these various hierarchical modeling approaches have been based on a prior assumption of independence between all such effects.

GIANOLA et al. (2003) anticipated that some of these effects might be spatially correlated within chromosomes, such that greater inference efficiency might be provided by modeling these effects as correlated. Their proposed specifications required either equally spaced markers and/or within-chromosome correlations depending strictly on the physical/linkage map distance between markers. However, the assumption of equal spacing is rather tenuous for most currently available SNP marker panels. Even more importantly, the inferred correlation structure is likely to be nonstationary, given that it should be primarily driven by the proximity of SNP markers to quantitative trait loci (QTL) of major effect. In other words, we anticipate that the correlation between the inferred effects of adjacent SNPs distal to major QTL would be substantially smaller than for those proximal to these QTL.

Antedependence models have been increasingly advocated for the analysis of repeated measures data (ZIMMERMAN and NÚÑEZ-ANTÓN 2010) to parsimoniously account for nonstationary correlations between repeated measurements over time. In this paper, we develop first-order antedependence counterparts to BayesA and BayesB. Through a simulation study, a cross-validation study involving the publicly available heterogeneous stock mice data (Valdar, Solberg et al. 2006; Valdar, Solberg et al. 2006), and journal-provided reference data (HICKEY and GORJANC 2012) used to benchmark our proposed methods against others, we demonstrate that, compared to their conventional counterparts, these antedependence-based WGP models improve the accuracy of genomic merit prediction as well as potentially increase the sensitivity of QTL detection, which is the key objective of genome wide association studies (GWAS).
2.2 Materials and Methods

2.2.1 Conventional WGP model

The base linear mixed model used for WGP is generally written as follows:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{W}\mathbf{u} + \mathbf{e} \quad [1]$$

Here, $\mathbf{y} = \{y_i\}_{i=1}^{n}$ is an n × 1 vector of phenotypes, $\boldsymbol{\beta}$ is a p × 1 unknown vector of fixed effects connected to y via a known n × p incidence or covariate matrix X (e.g., environmental effects), and $\mathbf{g} = \{g_j\}_{j=1}^{m}$ is an m × 1 vector of random SNP effects connected to y via a known n × m matrix Z of SNP genotypes coded as 0, 1, or 2 copies of the minor allele for each SNP (column) and animal (row). Furthermore, $\mathbf{u} = \{u_k\}_{k=1}^{q}$ is a q × 1 vector of random polygenic effects connected to y via a known n × q incidence matrix W, and $\mathbf{e} = \{e_i\}_{i=1}^{n}$ is the residual vector. We assume that $\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2)$, where A denotes the pedigree-derived numerator relationship matrix (HENDERSON 1976); polygenic effects are often included in WGP models due to insufficient genome coverage by Z (CALUS and VEERKAMP 2007). Furthermore, we specify $\mathbf{g} \sim N(\mathbf{0}, \boldsymbol{\Sigma}_g)$, where $\boldsymbol{\Sigma}_g = \text{diag}\{\sigma^2_{g_j}\}$, and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$. From a Bayesian perspective, a subjective prior may also be specified on $\boldsymbol{\beta}$ using $\boldsymbol{\beta} \sim N(\boldsymbol{\beta}_0, \mathbf{V}_\beta)$, with $\boldsymbol{\beta}_0$ and $\mathbf{V}_\beta$ taken as known (SORENSEN and GIANOLA 2002).

Now the distinction between GBLUP, BayesA, and BayesB in MEUWISSEN et al. (2001) depends upon the characterization of $\boldsymbol{\Sigma}_g$. If $\boldsymbol{\Sigma}_g = \mathbf{I}\sigma_g^2$ (i.e., $\sigma^2_{g_j} = \sigma_g^2 \;\forall j$), then the model is defined to be GBLUP. If, instead, the diagonal elements of $\boldsymbol{\Sigma}_g$ are independent random draws from a scaled inverted chi-square distribution, i.e., $\sigma^2_{g_j} \sim \chi^{-2}(\nu_g, \nu_g s_g^2)$ such that $E(\sigma^2_{g_j}) = \nu_g s_g^2 / (\nu_g - 2)$, then the model is said to be BayesA, such that marginally $g_j$ is a random draw from a Student t distribution with mean 0, degrees of freedom $\nu_g$ and scale parameter $s_g^2$ (de los Campos, Naya et al. 2009; Gianola, de los Campos et al. 2009). Now BayesB further extends BayesA by including a two-component mixture, with one component being $\chi^{-2}(\nu_g, \nu_g s_g^2)$ and the other component being a spike or point mass at 0; i.e.,

$$\sigma^2_{g_j} \mid \nu_g, s_g^2 \;\begin{cases} = 0 & \text{with probability } \pi_g \\ \sim \chi^{-2}(\nu_g, \nu_g s_g^2) & \text{with probability } (1-\pi_g) \end{cases} \quad [2]$$

That is, $\pi_g$ ($0 < \pi_g < 1$) represents the proportion of SNP markers having no associated genetic effects on the trait of interest.

Clear warnings have been provided on how sensitive inferences using BayesA or BayesB may be to the specification of these hyperparameters (de los Campos, Naya et al. 2009; Gianola, de los Campos et al. 2009). It has not been widely appreciated that $\nu_g$ and $s_g^2$ are estimable; this recognition is critical as both hyperparameters help define the genetic architecture in BayesA and BayesB. That is, $\nu_g$ characterizes the variability of $\sigma^2_{g_j}$ about a typical variance component of $s_g^2$. Details on how to estimate $\nu_g$ and $s_g^2$ in the context of BayesA were previously provided by YI and XU (2008). Furthermore, $\pi_g$ is estimable in BayesB. For both BayesA and BayesB, we specify the prior distribution $p(\nu_g) \propto (\nu_g + 1)^{-2}$, similar to what we have previously adopted in other applications (Kizilkaya and Tempelman 2005; Bello, Steibel et al. 2010). Furthermore, we specify $\pi_g \sim \text{Beta}(\alpha_\pi, \beta_\pi)$ for BayesB, with the values of $\alpha_\pi$ and $\beta_\pi$ chosen to reflect prior uncertainty on $\pi_g$. We also specify a proper conjugate prior on $s_g^2$ in BayesB, i.e., $s_g^2 \sim \text{Gamma}(\alpha_s, \beta_s)$, recognizing that the specifications on $\alpha_s$ and $\beta_s$ become increasingly influential as $\pi_g \to 1$.
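To make this hierarchy concrete, the following minimal sketch draws marker variances and effects from the BayesB prior of Equation [2]; setting $\pi_g = 0$ recovers BayesA, under which each $g_j$ is marginally Student t distributed. It is an illustration only, with arbitrary hyperparameter values, and is not the sampler used for the analyses in this dissertation.

```python
import numpy as np

rng = np.random.default_rng(42)

def draw_bayesb_effects(m, nu_g, s2_g, pi_g):
    """Draw m SNP effects from the BayesB prior hierarchy.

    With probability pi_g a marker variance is exactly zero (no effect);
    otherwise sigma2_gj ~ scaled inverse chi-square(nu_g, nu_g * s2_g),
    and g_j | sigma2_gj ~ N(0, sigma2_gj).  pi_g = 0 gives BayesA.
    """
    nonzero = rng.uniform(size=m) > pi_g
    # scaled inverse chi-square draw: nu * s2 / chi-square(nu)
    sigma2 = np.where(nonzero, nu_g * s2_g / rng.chisquare(nu_g, size=m), 0.0)
    g = rng.normal(0.0, np.sqrt(sigma2))
    return g, sigma2

# Example: 1000 markers with arbitrary hyperparameter values
g, sigma2 = draw_bayesb_effects(m=1000, nu_g=4.0, s2_g=0.01, pi_g=0.9)
```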
Finally, we specify noninformative priors $\sigma_e^2 \sim \chi^{-2}(-1, 0)$ and $\sigma_u^2 \sim \chi^{-2}(-1, 0)$, which are congruent with specifying uniform priors on $\sigma_e$ and $\sigma_u$, respectively, and in line with the recommendations for variance components by GELMAN (2006). We similarly specify $s_g^2 \sim \chi^{-2}(-1, 0)$ in BayesA, given that m is generally large enough for stable inference on $s_g^2$ without the need for more informative priors.

2.2.2 Antedependence extensions of WGP models

We propose a nonstationary first-order antedependence correlation structure for g based on the relative physical location of the SNP markers along the chromosome(s):

$$g_j = \begin{cases} \delta_1 & \text{if } j = 1 \\ t_{j,j-1}\, g_{j-1} + \delta_j & \text{if } 2 \le j \le m \end{cases} \quad [3]$$

Here $\delta_j \sim NID(0, \sigma^2_{\delta_j})$, $j = 1, \ldots, m$, whereas $t_{j,j-1}$ is the marker-interval-specific antedependence parameter (ZIMMERMAN and NÚÑEZ-ANTÓN 2010) of $g_j$ on $g_{j-1}$ in the specified order. We can rewrite the recursive expression in [3] in matrix notation:

$$\mathbf{g} = \mathbf{T}\mathbf{g} + \boldsymbol{\delta} \quad [4]$$

where $\boldsymbol{\delta} = \{\delta_j\}_{j=1}^{m} = (\mathbf{I} - \mathbf{T})\mathbf{g}$, I is an m × m identity matrix, and T has all null values except for the elements $t_{j,j-1}$ at the corresponding subscript addresses. It can be readily seen using Equation [4] that $\text{var}(\mathbf{g}) = \boldsymbol{\Sigma}_g = (\mathbf{I} - \mathbf{T})^{-1} \boldsymbol{\Delta}\, (\mathbf{I} - \mathbf{T})^{-1\prime}$, where $(\mathbf{I} - \mathbf{T})^{-1}$ is a lower triangular matrix with diagonal elements equal to 1 and $\boldsymbol{\Delta} = \text{diag}\{\sigma^2_{\delta_j}\}_{j=1}^{m}$. As further illustrated in File S1 from the Supporting Information, $\boldsymbol{\Sigma}_g^{-1} = (\mathbf{I} - \mathbf{T})' \boldsymbol{\Delta}^{-1} (\mathbf{I} - \mathbf{T})$ is a readily determined tri-diagonal matrix (ZIMMERMAN and NÚÑEZ-ANTÓN 2010), which is important as it facilitates inference on g.

Some of the other developments closely follow the BayesA and BayesB models of MEUWISSEN et al. (2001). That is, we specify $\sigma^2_{\delta_j} \sim \chi^{-2}(\nu_\delta, \nu_\delta s_\delta^2)$ in a model which we label ante-BayesA. Similarly, we propose an ante-BayesB model whereby we specify a mixture similar to Equation [2], except that it is specified on $\sigma^2_{\delta_j}$; i.e., a mixture of a point mass on zero with probability $\pi_\delta$ and a scaled inverted chi-square prior $\chi^{-2}(\nu_\delta, \nu_\delta s_\delta^2)$ with probability $(1-\pi_\delta)$. As we suggested earlier for $\pi_g$, we believe that $\pi_\delta$ is estimable, such that ante-BayesA is merely a special case of ante-BayesB. In turn, BayesA is merely a special case of ante-BayesA, as is BayesB of ante-BayesB, when $\mathbf{T} = \mathbf{0}$; i.e., $t_{j,j-1} = 0 \;\forall j$.

These antedependence extensions, nevertheless, do require inference on the m−1 unknown non-zero elements $\{t_{j,j-1}\}_{j=2}^{m}$ of T. Borrowing from DANIELS and POURAHMADI (2002) and BELLO et al. (2010), we specify $t_{j,j-1} \sim N(\mu_t, \sigma_t^2)$ as a conjugate prior in both ante-BayesA and ante-BayesB, thereby allowing flexible inference on the nonstationary correlation structure in $\boldsymbol{\Sigma}_g$. However, it should be further noted that if interval j,j−1 spans the last SNP of one particular linkage group or chromosome and the first SNP of the subsequent linkage group, then we set the corresponding $t_{j,j-1} = 0$. The remaining priors are specified on the hyperparameters that essentially characterize the hypothesized genetic architecture of the trait and are virtually identical to those previously prescribed for BayesA and BayesB; i.e., $p(\nu_\delta) \propto (\nu_\delta + 1)^{-2}$, $s_\delta^2 \sim \text{Gamma}(\alpha_s, \beta_s)$, and $\pi_\delta \sim \text{Beta}(\alpha_\pi, \beta_\pi)$, with $\alpha_\pi$, $\beta_\pi$, $\alpha_s$ and $\beta_s$ again all specified as known. Similarly, we also estimate $\mu_t$ and $\sigma_t^2$ by placing subjective priors $\mu_t \sim N(\mu_{t0}, s_{t0}^2)$ and $\sigma_t^2 \sim \chi^{-2}(\nu_t, \nu_t s_t^2)$ on these key hyperparameters, where $\mu_{t0}$, $s_{t0}^2$, $\nu_t$ and $s_t^2$ are specified to be known.
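As an illustration of why this parameterization is computationally convenient, the following minimal Python sketch builds $\boldsymbol{\Sigma}_g^{-1} = (\mathbf{I}-\mathbf{T})'\boldsymbol{\Delta}^{-1}(\mathbf{I}-\mathbf{T})$ for a handful of markers and confirms that it is tri-diagonal. The marker count and parameter values are arbitrary; this is not the implementation used for the analyses in this dissertation.

```python
import numpy as np

def ante_precision(t, delta2):
    """Precision matrix of SNP effects under first-order antedependence.

    t      : length m-1 array of antedependence parameters t_{j,j-1}
    delta2 : length m array of innovation variances sigma^2_{delta_j}
    Returns (I - T)' Delta^{-1} (I - T), which is tri-diagonal.
    """
    m = len(delta2)
    T = np.zeros((m, m))
    T[np.arange(1, m), np.arange(0, m - 1)] = t      # sub-diagonal elements t_{j,j-1}
    I_minus_T = np.eye(m) - T
    Dinv = np.diag(1.0 / np.asarray(delta2))
    return I_minus_T.T @ Dinv @ I_minus_T

# Toy example with m = 5 markers (arbitrary values, for illustration only)
rng = np.random.default_rng(1)
t = rng.normal(0.0, 0.3, size=4)          # t_{j,j-1}, j = 2,...,5
delta2 = rng.uniform(0.01, 0.05, size=5)  # sigma^2_{delta_j}

Q = ante_precision(t, delta2)
# All elements more than one position off the main diagonal are zero:
print(np.allclose(np.triu(Q, k=2), 0.0))  # True: Sigma_g^{-1} is tri-diagonal
```

Because the precision matrix is tri-diagonal, the full conditional density of each effect involves only its immediate neighbors, which helps keep the per-cycle cost of MCMC sampling manageable even for large m.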
As in MEUWISSEN et al. (2001) and subsequent work, our implementation strategy is based on the use of Markov Chain Monte Carlo (MCMC) methods; however, we also additionally infer upon the key hyperparameters, i.e., $\nu_g$ ($\nu_\delta$), $s_g^2$ ($s_\delta^2$), and $\pi_g$ ($\pi_\delta$), that characterize the genetic architecture of the trait, as alluded to earlier. Further details on the full conditional densities and any necessary Metropolis-Hastings strategies used to sample from the joint posterior density of all unknown parameters using MCMC are provided in Appendix A1.

2.2.3 Simulation study

We compare the performance of BayesA and BayesB with their antedependence counterparts, ante-BayesA and ante-BayesB, in a simulation study. Twenty replicated datasets were each generated from a base population containing 50 unrelated males and 50 unrelated females. Each dataset underwent random mating while maintaining constant population size for 6001 generations beyond the base population. The entire genome was composed of one chromosome of length 1 Morgan. All 20,001 potential SNP markers were equally spaced on this genome, with a potential QTL placed directly in the middle of each interval of adjacent markers. In the base population, all 20,000 QTL and 20,001 SNP marker alleles were coded as monomorphic. The number of simulated crossover events per meiosis was generated from a Poisson (mean = 1) distribution, with the locations of the crossover events uniformly distributed throughout the chromosome in accordance with the Haldane mapping function. The mutation rate for both QTL and SNP markers was specified to be $10^{-4}$ per locus per generation and to be recurrent, that is, switching between one of two alternative allelic states 0 and 1 whenever mutation occurred, so as to ensure biallelic loci (Coster, Bastiaansen et al. 2010; Daetwyler, Pong-Wong et al. 2010).

In Generation 6001, all SNP markers and QTL with a minor allele frequency (MAF) less than 0.05 were discarded. We then randomly selected only 30 of the remaining QTL and their corresponding allelic substitution effects. For each of these k = 1, 2, ..., 30 QTL, an allelic substitution effect ($\alpha_k$) was drawn from a reflected gamma distribution with shape parameter 0.4 and scale parameter 1.66, with a positive or negative sign on $\alpha_k$ sampled with equal probability. The genetic variance at QTL k was determined to be $2 p_k (1 - p_k)\alpha_k^2$, where $p_k$ is the MAF at QTL k. The total genetic variance was subsequently determined to be the summation of these terms across the 30 selected QTL, i.e., as $2\sum_{k=1}^{30} p_k (1 - p_k)\alpha_k^2$. Now the true breeding values (TBV) were defined to be a genotype-based linear function of the 30 generated QTL effects which, because these QTL were located between various SNP, are not subsets of g. These TBV were further scaled such that the total genetic variance was 1, as per MEUWISSEN and GODDARD (2010). Residual effects were, in turn, sampled from a standard normal distribution, such that the heritability was 0.50. That is, each phenotypic record was generated by adding the TBV for that animal plus its corresponding residual. Hence, 100 animals with known phenotypes and genotypes in Generation 6001 were simulated for inferring upon the SNP effects using each of the competing methods. Genotypes and the TBV for each of 100 offspring were also generated in Generation 6002, based on randomly mating animals in Generation 6001.
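A compact sketch of this trait-simulation step is given below. It is illustrative only: it assumes a post-filtering QTL genotype matrix is already available, uses hypothetical variable names, treats the counted allele frequency as a stand-in for the MAF, and is not the simulation code used to generate the datasets analyzed here.

```python
import numpy as np

rng = np.random.default_rng(2025)

def simulate_trait(qtl_geno, n_qtl=30, shape=0.4, scale=1.66, h2=0.5):
    """Simulate TBV and phenotypes from a 0/1/2 QTL genotype matrix.

    Returns (tbv, phenotypes) with total genetic variance scaled to 1 and h2 = 0.5.
    """
    n, n_cand = qtl_geno.shape
    chosen = rng.choice(n_cand, size=n_qtl, replace=False)
    # Reflected gamma allelic substitution effects: gamma magnitude, random sign
    alpha = rng.gamma(shape, scale, size=n_qtl) * rng.choice([-1.0, 1.0], size=n_qtl)
    p = qtl_geno[:, chosen].mean(axis=0) / 2.0            # allele frequencies
    total_var = np.sum(2.0 * p * (1.0 - p) * alpha**2)    # sum of 2p(1-p)alpha^2
    alpha /= np.sqrt(total_var)                           # scale genetic variance to 1
    tbv = qtl_geno[:, chosen] @ alpha
    e = rng.normal(0.0, np.sqrt((1.0 - h2) / h2), size=n) # residual SD = 1 when h2 = 0.5
    return tbv, tbv + e

# Example with arbitrary genotypes (for illustration only)
geno = rng.integers(0, 3, size=(100, 500)).astype(float)
tbv, y = simulate_trait(geno)
```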
For each of the 20 replicated datasets, the effect of 6 different marker densities on the comparison between the competing methods was investigated by retaining every SNP marker, or every 4th, 7th, 10th, 15th, or 20th SNP marker, from those with MAF > 0.05. That is, the datasets were used as a blocking factor in comparing different marker densities for the accuracy of predicting genetic merit in Generation 6002 using each of the four different methods: BayesA, BayesB, ante-BayesA and ante-BayesB. Accuracy was defined as the correlation between the estimated breeding values (EBV) for Generation 6002, using just the Generation 6001 phenotypes and genotypes, and the corresponding TBV of Generation 6002. These EBV are based on the posterior mean ($\hat{\mathbf{g}}$) of g; i.e., the EBV are elements of $\mathbf{Z}\hat{\mathbf{g}}$.

Comparisons were also drawn between the BayesA/BayesB procedures and their antedependence counterparts for inference on the key hyperparameters that characterize genetic architecture. This was conducted using a multifactorial ANOVA on the posterior means, with replicate as the blocking factor, to assess the importance of model, marker density, and their interaction across the 20 replicates. Furthermore, an assessment of the relative ability of ante-BayesB compared to BayesB to identify the top QTL by genetic variance was based on the difference in the posterior probabilities of $\delta_j$ and $g_j$, respectively, of adjacent SNP markers being non-zero. As QTL were placed between SNP markers and never on top of SNP markers, we calculated this probability of association by determining the proportion of MCMC cycles in which either or both of the two markers adjacent to the known QTL were chosen to be non-zero within each analysis. All comparisons were based on the linear mixed model in Equation [1], with X being a column vector of ones, except that polygenic effects (u) were ignored for simplicity and computational tractability.

2.2.4 Application to Heterogeneous Stock Mice Dataset

We used a dataset publicly available from the Wellcome Trust (http://gscan.well.ox.ac.uk/) which includes phenotypic records on 2,296 mice, each genotyped for 12,147 SNP markers. This data resource, which also includes pedigree information, was based on an advanced intercross mating among 8 inbred strains after 50 generations of random mating (Valdar, Solberg et al. 2006). The average linkage disequilibrium (LD), as measured by r² between adjacent markers, is 0.62 (Legarra, Robert-Granie et al. 2008), which is high compared to the commonly used SNP panels available for livestock populations. For example, the average r² between adjacent markers in most commercially available livestock SNP panels ranges from 0.10 to 0.37 for markers that are generally around 100 kb apart (Du, Clutter et al. 2007; De Roos, Hayes et al. 2008; Abasht, Sandford et al. 2009; Jarmila, Sargolzaei et al. 2010). Given this high pairwise LD, we considered only a random subset of all markers from this dataset to ensure adjacent-marker LD levels that are representative of livestock populations. We first excluded SNP markers if the percentage of missing genotypes across samples was greater than 10% or if the MAF was less than 2.5%. We also discarded animals having greater than 20% missing SNP genotypes. We then randomly selected 50 SNP markers from each of the 19 autosomes, leading to an average LD of r² = 0.35 between adjacent markers. The resulting dataset then involved records on 1,917 animals with genotypes on 950 SNPs.
As in LEGARRA et al. (2008), we also added the random effect of cage to the WGP model of [1]; i.e., $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{W}\mathbf{u} + \mathbf{S}\mathbf{c} + \mathbf{e}$, where $\mathbf{c} \sim N(\mathbf{0}, \mathbf{I}\sigma_c^2)$ and S is the corresponding incidence matrix, with all other terms defined as before. Furthermore, we specified GELMAN's prior $\sigma_c^2 \sim \chi^{-2}(-1, 0)$ on $\sigma_c^2$ in addition to all previously provided prior specifications. Also, as per LEGARRA et al. (2008), we chose to use the data provided on body weight at 6 weeks that was already pre-corrected for fixed effects, such that X was a column vector of ones and $\boldsymbol{\beta}$ consisted of just an overall mean. Missing SNP genotypes were simply imputed from binary distributions based on their corresponding allelic frequencies in the dataset, following LEGARRA et al. (2008).

We adopted the same within-family cross-validation technique as described in LEGARRA et al. (2008) by randomly partitioning each family into two. This partitioning was replicated 20 times to obtain 20 different, nearly equal sized partitions of training and validation data subsets. Also, as in LEGARRA et al. (2008), we compared the various methods using predictive abilities, defined as the correlation between the phenotypes in the validation subset and their corresponding predictions based on inferences from the training data subset.

2.2.5 Application to Simulated Genomic Data from Hickey and Gorjanc

To provide a benchmark comparison of our proposed methods with competing methods in other papers in this issue, we analyzed simulated datasets provided by, and described in detail by, Hickey and Gorjanc (2012). They generated 10 replicated datasets for each of four different traits, whereby 9000 QTL effects were generated for Trait 1 and 900 QTL effects were generated for Trait 2. Traits 3 and 4 mirrored Traits 1 and 2, respectively, with the further requirement that the MAF for these QTL was less than 0.30. Since we were permitted to simultaneously run 144 jobs on the High Performance Computing Cluster at MSU (hpcc.msu.edu), we chose to compare the four methods for each of the four traits on each of the first nine datasets (4 × 4 × 9 = 144). For all analyses, training data were based on 2000 animals in Generations 4 and 5, whereas TBV were provided on 500 animals within each of Generations 6, 8 and 10. To facilitate computing tractability, we saved every tenth SNP marker that had a MAF > 0.20. This led to a range of 2884 to 2952 SNP markers and an average LD between adjacent markers of 0.16 to 0.17 across the nine replicates. All four models also included polygenic effects. Antedependence methods were directly compared with their classical counterparts for accuracy (correlation of EBV with TBV) and bias (deviation from 1 of the slope from regressing TBV on EBV) in these latter validation generations using a Wilcoxon signed rank test.

2.2.6 Bayesian inference

For each of the four methods (BayesA, BayesB, ante-BayesA and ante-BayesB) in both our simulation study and the heterogeneous stock mice application, we ran MCMC for 50,000 cycles of burn-in followed by an additional 300,000 cycles; for the benchmark data from Hickey and Gorjanc (2012), the corresponding numbers were 80,000 and 1,000,000, respectively. Every tenth post-burn-in MCMC cycle was subsequently saved for inference. We monitored MCMC convergence via inspection of trace plots and determined the effective sample size (ESS), i.e., the effective number of independent draws from the joint posterior density, for all key hyperparameters using the R package CODA (Plummer, Best et al. 2006).
The larger number of MCMC cycles for the Hickey and Gorjanc data was chosen to ensure that the ESS for all hyperparameters exceeded 100. Inferences were primarily based on the posterior means and posterior standard deviations for key parameters, including those hyperparameters that characterize genetic architecture.

2.2.7 Prior specifications

For all analyses in this paper, we chose $\alpha_\pi = 10$ and $\beta_\pi = 1$ in both BayesB and ante-BayesB to reflect the prior belief that most of the markers will not be associated with any genetic effects; however, the dispersion of this corresponding beta distribution is still large enough that values of $\pi_g$ ($\pi_\delta$) close to 0.70 are plausible. Based on preliminary runs, we also found that this prior specification led to superior mixing properties of the MCMC chains relative to a naive Uniform(0,1) prior, yet facilitated domination of the data over the prior information since $\alpha_\pi + \beta_\pi \ll m$. For BayesB and ante-BayesB, we always specified $\alpha_s = \beta_s = 0.1$ for the Gamma prior on $s_g^2$ ($s_\delta^2$). For the antedependence-based models, we specified $\mu_{t0} = 0$, $s_{t0}^2 = 1$, $\nu_t = -1$, and $s_t^2 = 0$; i.e., a standard normal prior on $\mu_t$ and GELMAN's prior on $\sigma_t^2$. We also always specified a flat prior on $\boldsymbol{\beta}$ by defining $\mathbf{V}_\beta^{-1} = \mathbf{0}$. Prior specifications for all other parameters (e.g., variance components) were based on those previously recommended in this paper.

2.3 Results

2.3.1 Simulation Study

For the six different marker densities, the average distances between adjacent markers ranged from 0.046 to 0.918 cM over the 20 replicates, whereas the average LD between adjacent markers, measured by r² values, ranged from 0.15 to 0.31, as shown in Table 2.1. Among the 30 chosen QTL within each of the 20 replicates, anywhere from 6 to 11 of the QTL had variances greater than 2% of the total genetic variance.

Table 2.1: Summary statistics for 6 different marker densities in the simulation study over 20 replicates

Marker density level†   Average number of markers per replicate   Average distance between adjacent marker loci (cM) per replicate   Average r² between adjacent marker loci per replicate
1                       108                                       0.918                                                               0.15
2                       145                                       0.689                                                               0.18
3                       217                                       0.459                                                               0.21
4                       311                                       0.321                                                               0.24
5                       545                                       0.184                                                               0.27
6                       2182                                      0.046                                                               0.31

†Marker density levels 1 through 6 pertain to saving every 20th, 15th, 10th, 7th, 4th, and every single SNP marker from a single 1 M chromosome within each data replicate.

It is important to recognize that none of the modeling assumptions behind BayesA, BayesB, ante-BayesA, or ante-BayesB truly match the data generation model, which is based on thousands of generations of LD created between markers and QTL, even for simulated data. This goes beyond the fact that the QTL effects were drawn from reflected Gamma distributions in our simulation study, as typically done (e.g., Meuwissen, Hayes et al. 2001; Meuwissen and Goddard 2010). That is, the process of recombination over thousands of generations, in terms of how it generates LD between QTL and SNP markers, is not explicitly captured in any known WGP model, including any of the competing models, especially when the effects of neighboring SNP markers rather than the causal QTL effects are being estimated. Hence, there is no way to surmise the "true" values of the key hyperparameters, whether for $s_g^2$, $\nu_g$, or $\pi_g$ in BayesA or BayesB, or for $s_\delta^2$, $\nu_\delta$, $\pi_\delta$, $\mu_t$ or $\sigma_t^2$ in ante-BayesA or ante-BayesB.
However, one should anticipate that estimates of $s_g^2$ or $s_\delta^2$ should be inversely related to marker density, since they closely represent the mean value of the variance components $\{\sigma^2_{g_j}\}_{j=1}^{m}$ or $\{\sigma^2_{\delta_j}\}_{j=1}^{m}$, respectively, accounted for by each SNP. Indeed, we observe this phenomenon in the comparison between BayesA and ante-BayesA in Figure 2.1A. We also note a similar comparison between $s_\delta^2$ and $s_g^2$ for BayesB versus ante-BayesB in Figure 2.1B, but further recognize that the corresponding estimates of $s_g^2$ and $s_\delta^2$ are roughly one order of magnitude greater than those seen in Figure 2.1A. That is, $s_g^2$ and $s_\delta^2$ specify a typical value for $\sigma^2_{g_i}$ and $\sigma^2_{\delta_i}$, respectively, over many more loci in (ante-)BayesA than in their (ante-)BayesB counterparts. In spite of the lower values observed in Figure 2.1A, however, there was a significant difference (P<0.01) between $s_g^2$ and $s_\delta^2$ when r² ≥ 0.21.

Figure 2.1: Average posterior means of $s_g^2$ (BayesA, BayesB) and $s_\delta^2$ (ante-BayesA, ante-BayesB) across 20 replicates for six different levels of LD, comparing BayesA and ante-BayesA in (A) and BayesB versus ante-BayesB in (B). Significant differences in posterior means between competing methods at each LD level are indicated by *(P<0.01), **(P<0.001), or ***(P<0.0001).

As marker density increased, we also expected that the estimates of $\pi_g$ or $\pi_\delta$ should increase as well; that is, it becomes increasingly unlikely that an individual SNP marker is associated with a particular QTL at greater marker density. Indeed, we observed this in Figure 2.2. It was particularly interesting that the posterior means of $\pi_\delta$ were generally lower than those of $\pi_g$, with differences widening with increasing marker density (i.e., LD level), such that the differences were significant beyond r² = 0.24 (P<0.01). Note the subtle difference in interpretation between $\pi_g$ and $\pi_\delta$: $\pi_g$ pertains to the probability of non-association for the corresponding SNP, whereas $\pi_\delta$ pertains to the probability of non-association conditional on a neighboring SNP.

Figure 2.2: Average posterior means of $\pi_g$ (BayesB) versus $\pi_\delta$ (ante-BayesB) across 20 replicates as a function of six different LD levels. Significant differences in posterior means between competing methods at each LD level are indicated by *(P<0.01), **(P<0.001), or ***(P<0.0001).

The estimates of $\nu_g$ and $\nu_\delta$ also changed as a function of marker density, for ante-BayesA versus BayesA in Figure 2.3A and for ante-BayesB versus BayesB in Figure 2.3B. Specifically, the posterior means of $\nu_g$, and particularly of $\nu_\delta$, both decreased with increasing marker density. Since these parameters, respectively, characterize the heterogeneity of $\sigma^2_{g_j}$ and $\sigma^2_{\delta_j}$ across SNP or, alternatively, the heaviness of the tails of the resulting marginal Student t distributions on $g_j$ and $\delta_j$ across SNP, our results imply that these hierarchical methods, and particularly those based on nonstationary first-order antedependence correlation structures, identify SNP with large effects as being more outlying relative to a normal distribution when marker density increases. However, these differences between $\nu_g$ and $\nu_\delta$ were not seen to be statistically significant at any marker density.

Figure 2.3: Average posterior means of $\nu_g$ (BayesA, BayesB) and $\nu_\delta$ (ante-BayesA, ante-BayesB) across 20 replicates for six different levels of LD, comparing BayesA and ante-BayesA in (A) and BayesB versus ante-BayesB in (B).
No significant differences (P>0.01) were determined between the two sets of competing procedures at each LD level.

Figures A2.1 and A2.2 (see Appendix A2) show, respectively, the average posterior means for $\mu_t$ and $\sigma_t^2$ against LD level across the 20 replicates under both ante-BayesA and ante-BayesB. There was no evidence (P>0.01) across these 20 replicates that the posterior means of $\mu_t$ were different from zero at any LD level; however, at higher LD levels, the posterior means tended to converge to zero, as anticipated. Similarly, Figure A2.2 showed that the posterior estimates for $\sigma_t^2$ were also lower at higher LD levels. Again, this was somewhat anticipated, since there should be less disparity among the values of the antedependence parameters ($t_{j,j-1}$) between adjacent markers with increasing marker density.

The average accuracies of the EBV over the 20 replicated datasets are plotted as a function of the average r² (i.e., the different marker densities) between adjacent markers for the four different methods in Figure 2.4. As anticipated, given the simulated genetic architecture of few QTL, the accuracies for the BayesB methods were consistently greater than those of their corresponding BayesA counterparts at all marker densities. Also, ante-BayesA and ante-BayesB outperformed their classical counterparts, with differences increasing with LD level. Specifically, ante-BayesA had significantly greater accuracies compared to conventional BayesA, as did ante-BayesB compared to BayesB (P<0.01), when average LD levels exceeded r² = 0.24.

Figure 2.4: Average accuracies of estimated breeding value across 20 replicates for analyses based on each of six LD levels. Differences in accuracy between BayesA and ante-BayesA (bottom symbols) and between BayesB and ante-BayesB (top symbols) indicated as significant by *(P<0.01) or **(P<0.001).

We anticipated that the antedependence parameters $t_{j,j-1}$ would have greater importance at higher marker densities. To demonstrate this, we standardized the posterior means of these parameters as a ratio over their posterior standard deviations, i.e., $\tilde{t}_{j,j-1} = E(t_{j,j-1} \mid \mathbf{y}) \big/ \sqrt{\text{var}(t_{j,j-1} \mid \mathbf{y})}$, for each analysis. We then determined the proportion of these $\tilde{t}_{j,j-1}$ whose absolute value exceeded an arbitrary value of 2 for each data replicate and marker density analysis to indicate the relative importance of these antedependence parameters. We present boxplots of these proportions across the 20 replicates for ante-BayesA and for ante-BayesB in Figure A2.3 (see Appendix A2). We anticipated and noted that a higher proportion of the $\tilde{t}_{j,j-1}$ exceeded 2 in datasets characterized by higher marker densities, thereby indicating that, in general, nonstationary serial correlation between adjacent markers becomes increasingly important with higher levels of LD. We believe this phenomenon is responsible for driving the differences in accuracies between ante-BayesA (ante-BayesB) and BayesA (BayesB) with increasing LD levels, as seen earlier in Figure 2.4.

Hierarchical methods that are similar to BayesB, in that they jointly infer upon all SNP effects, have been increasingly advocated as tools for GWAS (Hoggart, Whittaker et al. 2008; Lee, van der Werf et al. 2008; Logsdon, Hoffman et al. 2010). Figure A2.4 (see Appendix A2) shows the average (across 20 replicates) posterior mean probabilities of identifying the largest QTL by genetic variance within each replicate using BayesB and ante-BayesB, respectively.
These estimated posterior probabilities increased with LD level for both models but were significantly greater for ante-BayesB than for BayesB, with statistical significance also increasing with LD or marker density. That is, the precision for detecting QTL was increasingly greater for ante-BayesB compared to BayesB at higher LD levels. We observed this consistently across data replicates, with the ability of ante-BayesB to better track causal variants increasing with marker density (see Appendix A2, Figure A2.5).

2.3.2 Application to Heterogeneous Stock Mice data

We summarize posterior inferences of key parameters using BayesA and BayesB in Table A2.1 and for their antedependence counterparts in Table A2.2 (see Appendix A2) for the heterogeneous stock mice data. Inferences on $\sigma_u^2$, $\sigma_c^2$, and $\sigma_e^2$ were consistent with results previously reported by LEGARRA et al. (2008). As expected from our simulation study, the estimates for $\nu_g$ ($\nu_\delta$) and $s_g^2$ ($s_\delta^2$) were substantially greater for BayesB (ante-BayesB) than for BayesA (ante-BayesA). Although the posterior mean for $\pi_g$ of 0.81 (BayesB) was only slightly larger than that for $\pi_\delta$ of 0.80 (ante-BayesB), the posterior mean of $s_\delta^2$ was substantially larger in ante-BayesB compared to $s_g^2$ in BayesB. The average estimates ± empirical standard errors of the predictive ability correlations over the 20 cross-validation partitions of training and validation data subsets were 0.57±0.01, 0.62±0.01, 0.60±0.01 and 0.66±0.01 for BayesA, BayesB, ante-BayesA and ante-BayesB, respectively. The differences between BayesA and ante-BayesA and between BayesB and ante-BayesB were both determined to be statistically significant (P<0.005), indicating the relative advantage of the antedependence methods. Furthermore, BayesB and ante-BayesB had significantly greater predictive abilities than BayesA and ante-BayesA, respectively (P<0.001).

2.3.3 Application to Hickey and Gorjanc Data

Average posterior means for key hyperparameters for each of the four methods across the nine replicates are provided in Table A2.3, whereas the corresponding average ESS are provided in Table A2.4 (see Appendix A2). Estimates of $\pi_g$ ($\pi_\delta$) and $s_g^2$ ($s_\delta^2$) were lower, whereas estimates of $\nu_g$ ($\nu_\delta$) were higher, for traits with higher numbers of QTL (Traits 1 and 3) compared to those with lower numbers of QTL (Traits 2 and 4), relative to the same number of markers. A side-by-side comparison of the accuracies of the four methods across the validation generations (6, 8, and 10) is provided in Figure 2.5. It is remarkable to note that ante-BayesA had generally significantly greater accuracies than BayesA for Traits 1 and 3 (larger numbers of QTL), an advantage that was still maintained in Generation 10, whereas ante-BayesB had generally significantly greater accuracies than BayesB for Traits 2 and 4 (lower numbers of QTL), but only in Generations 6 and 8. An assessment of the bias of the four procedures, based on regressing TBV on EBV, is provided in Figure A2.6 (see Appendix A2). For all traits, all four methods had some significant bias in Generation 6 but not in later generations.

Figure 2.5: Boxplots of average accuracies of estimated breeding value across 9 replicates for four traits in Generations 6, 8 and 10 for benchmark data from Hickey and Gorjanc (2012).
Differences in accuracy between ante-BayesB (black) and BayesB (dark gray) and between ante-BayesA (light gray) and BayesA (white) indicated as significant by *(0.05

2.4 Discussion and Conclusion

We anticipate that the relative advantages of the proposed antedependence models over their conventional counterparts should be even greater for the higher density SNP panels (>500,000) that are being developed for livestock or for situations where there is sequence data (MEUWISSEN and GODDARD 2010). Along those lines, we anticipate that these methods would also perform better in populations where LD between markers is greater due to other phenomena, e.g., selection history. Our simulation studies were also based on a particular genetic architecture, i.e., 30 QTL that were randomly distributed throughout a 1 M chromosome (or, equivalently, 900 QTL for a 30 M genome). Although this is not the focus of our paper, we realize that genetic architecture (i.e., number of QTL, average QTL substitution effect, marker density, etc.) can impact the relative merit of BayesA, BayesB, and GBLUP, based on other studies where key hyperparameters such as $\pi_g$, $\nu_g$ and $s_g^2$ are arbitrarily specified to be known (Daetwyler, Pong-Wong et al. 2010; Meuwissen and Goddard 2010). That is, the greater the number of QTL, each with small effects, relative to the number of SNP markers, the more likely the genetic architecture reflects the GBLUP assumptions ($\pi_g = 0$ and $\nu_g \to \infty$ such that $\sigma^2_{g_j} = s_g^2 \;\forall j$). Conversely, BayesB would be favored in the situation where SNP marker density is high relative to the number of QTL ($\pi_g > 0$). However, we believe that formal comparisons in data fit between BayesA, BayesB, and GBLUP, along with ante-BayesA and ante-BayesB, are not entirely necessary, since ante-BayesB represents the most general model. As previously noted, BayesA is a special case of ante-BayesA, as is BayesB of ante-BayesB, when $\mathbf{T} = \mathbf{0}$, in which case $\pi_\delta = \pi_g$, $\nu_\delta = \nu_g$, and $s_\delta^2 = s_g^2$. Furthermore, BayesB becomes BayesA as $\pi_g \to 0$, whereas BayesA becomes GBLUP as $\nu_g \to \infty$. Nevertheless, our claim that one only needs to fit ante-BayesB, rather than any of the other three competing submodels, vitally depends upon reliable inferences being provided on these key hyperparameters defining genetic architecture, rather than arbitrarily specifying them (Daetwyler, Pong-Wong et al. 2010; Meuwissen and Goddard 2010) or estimating only a subset thereof (Habier, Fernando et al. 2011). We provide details on MCMC inference strategies for these and other unknown parameters in Appendix A1. We are currently pursuing more suitable inferential strategies for variable selection (O'HARA and SILLANPAA 2009) when inferring upon $\pi_g$ or $\pi_\delta$. Also, although our proposed antedependence methods seem to work well under additive genetic model assumptions, it is not clear how well they may perform in the presence, for example, of extensive non-additive gene action, where nonparametric approaches may be warranted (Gianola, Wu et al. 2010). Nevertheless, even in the extensive presence of such phenomena, genetic variance is still considered to be primarily additive (Hill, Goddard et al. 2008).

Although the scope of this work was focused on the potential merit of these antedependence models for WGP, we suggested earlier that there may also be merit in using these models for GWAS in both livestock and human populations. It has become increasingly recognized that GWAS procedures based on joint analyses of all SNP markers are more powerful than the conventional series of single-SNP analyses. Our results suggest that modeling nonstationary correlations between SNP effects will further augment this power.
At any rate, we recognize that for reasonably accurate GWAS, that a greater marker density (m) per chromosome and sample size (n) should be considered (e.g., MEUWISSEN and GODDARD 2010) than those studied in this paper; i.e., most of the posterior probabilities reported in Appendix Figure A2.4 and Figure A2.5 are too low to be of practical benefit in current applications. We also acknowledge that our ante-BayesA and ante-BayesB models increase the computational load relative to their conventional counterparts. Since m is typically large, the computing time for the proposed antedependence models is bottlenecked primarily by the m elements of δ , the m diagonal elements of ∆ and the m-1 non-zero elements of T. Similarly, computing time for the two conventional methods, BayesA and BayesB, is primarily restricted by the dimension of δ and ∆ ; i.e., roughly 2/3 as many variables for the antedependence-based models, ignoring the remaining parameters such as variance components and hyperparameters. Hence, the computing time for the antedependence based procedures should be somewhat less than 1/3 greater than for their conventional counterparts. Indeed, we discovered from our simulation study that computing time for all four competing models were linear in m with the antedependence based models taking less than 30% greater computing time compared to the conventional counterparts for the wide range of values of m considered in this paper. We recognize for much larger number of SNP markers, than those pursued in this study, that alternative algorithmic adaptations already developed for models similar to conventional BayesA or BayesB, 35 such as those based on the EM algorithm (Shepherd, Meuwissen et al. 2010) or variational Bayes (Logsdon, Hoffman et al. 2010), would be worth exploring. We believe the proposed antedependence models provide opportunities for further study and extension. For example, it has been previously recognized that basing inferences on allelic effects on the use of multiple marker haplotypes rather than single markers increases accuracy of WGP (Calus, Meuwissen et al. 2008; Villumsen, Janss et al. 2008) or GWAS (Grapes, Dekkers et al. 2004). Given the difficulty in how to appropriately specify these haplotypes, we believe our antedependence-based methods may help bridge these two different strategies as the effects of adjacent SNP markers connected by large values of t j , j −1 may somewhat determine “effective haplotype” effects. We also think that our antedependence specifications might facilitate multiple breed inference if, for example, genomic effect differences between breeds is primarily due to differences in SNP associations with QTL, as partly manifested in T 36 Chapter 3 Improving the computational efficiency of fully Bayes inference and assessing the effect of misspecification on hyperparameters in whole genome prediction model 3.1 Introduction Genomic predictions based on high density single nucleotide polymorphisms (SNP) markers distributed over the whole genome have become increasingly adopted for animal and plant breeding. Parametric Bayesian methods have been particularly popular, most notably BayesA and BayesB as first presented by MEUWISSEN et al. (2001). BayesB specifies a mixture prior on the SNP specific effects having point mass at zero with probability π or randomly drawn, with probability (1- π), from a Student t distribution with degrees of freedom v and scale parameter s2; BayesA is BayesB with π = 0. 
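As a concrete illustration of this mixture prior, SNP effects under BayesB can be drawn as in the short R sketch below (illustrative values only; ν = 4.2 and s² = 0.04 are the specifications of Meuwissen et al. (2001) noted later in this chapter, and pi0 denotes the point-mass probability π of this paragraph):

# BayesB prior on SNP effects: zero with probability pi0 (point mass), otherwise a draw
# from a Student t with df v, scaled by sqrt(s2). Purely illustrative values.
set.seed(1)
m   <- 2000     # number of SNP effects
pi0 <- 0.95     # hypothetical probability of a zero effect
v   <- 4.2      # degrees of freedom
s2  <- 0.04     # scale parameter
g   <- ifelse(runif(m) < pi0, 0, sqrt(s2) * rt(m, df = v))
mean(g == 0)    # close to pi0; the non-zero effects are heavy-tailed

Equivalently, each non-zero effect can be drawn hierarchically as g_j | σ²_gj ~ N(0, σ²_gj) with σ²_gj ~ χ⁻²(ν, νs²), which is the representation used in the computational strategies described below.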
Hence π is typically believed to be the proportion of SNPs that are not associated or in linkage disequilibrium (LD) with causal variants although this interpretation is somewhat complicated by the existence of LD. These hyperparameters (v, s2 and π) are relevant in that they partly determine the genetic architecture of traits and can be further shown to depend upon SNP marker densities used in the analyses (YANG and TEMPELMAN 2012) . Now inference in BayesA/B like models is conducted using either Markov Chain Monte Carlo (MCMC) methods for fully Bayesian inference or faster albeit approximate methods based on the use of the expectation maximization (EM) algorithm or its various derivatives (Shepherd, Meuwissen et al. 2010). Unfortunately, it has not been readily established how to properly infer upon these hyperparameters in the EM based methods such that they are often arbitrarily “tuned” or specified (KARKKAINEN and SILLANPAA 2012). Furthermore, although it is possible to infer upon these same hyperparameters 37 using MCMC, the poor efficiency and speed of these implementations have seemingly discouraged this practice (de los Campos, Hickey et al. 2013). In particular, it has been noted that the correlation between v and s2 across MCMC cycles is generally so large that these two hyperparameters are nearly non-identifiable from each other (Habier, Fernando et al. 2011; de los Campos, Hickey et al. 2013). This particular MCMC analysis was based on a strategy first presented by Yi and Xu (2008) that invokes a Gibbs update for the full conditional density (FCD) on s2, as it is conditionally conjugate with a Gamma prior, whereas a Metropolis Hastings (MH) update was used on sampling from the FCD of v since it is not recognizable (YI and XU 2008). We label this particular algorithm as DFMH (i.e., sampling v using MH) and it is the control or reference strategy for this paper. Now computational efficiency in MCMC schemes is related to the degree of mixing or autocorrelation between subsequent samples of the same parameter. The most popular metric for inferring the degree of mixing or autocorrelation for a fixed number of MCMC cycles is the effective sample size (ESS), which can be readily computed using software packages like CODA (Plummer, Best et al. 2006). The ESS determines the effective number of independent draws such that a greater degree of autocorrelation between subsequent samples for the same parameter would lead to a smaller ESS and hence poorer computational efficiency. Now, although there are clear exceptions, MCMC sampling strategies that lead to a greater ESS for a certain total number of MCMC cycles tend to have greater computational cost per cycle. This realization is reflected in other recent quantitative genetics applications (Shariati and Sorensen 2008; 38 Waagepetersen, Ibanez-Escriche et al. 2008) who derived various metrics to integrate together these two components of computational efficiency. We surmised that there may be a number of strategies that could improve the computational efficiency of inferring upon key hyperparameters in a BayesA/B WGP model compared to DFMH. Furthermore, the efficiency of any such strategy could markedly depend on the use of an appropriate scale. For example, a highly nonlinear relationship between two variables can be rendered somewhat linear by transforming either one or both of the corresponding parameters. 
When ν and s² are both log-transformed, the resulting scatterplot of the transformed variables against each other tends to demonstrate a more linear relationship. Hence this change of variables might facilitate potentially more efficient MCMC sampling strategies based, for example, on multivariate proposal densities.

Specification of the key hyperparameters in BayesA/B WGP models has been treated arbitrarily across a wide range of genomic selection studies. Meuwissen et al. (2001) chose ν = 4.2 and s² = 0.04 for their BayesB model based on population genetics arguments in a simulation study. Daetwyler et al. (2010) set both ν and s² to 1 in their BayesB analyses across all simulation scenarios. It seems more reasonable to choose different specifications of the key hyperparameters across different situations, since their estimates depend on many factors, such as the marker density used in the analysis (YANG and TEMPELMAN 2012).

There were two primary objectives in this study. First, we wanted to explore alternative strategies to improve the computational efficiency of estimating hyperparameters in BayesA/B WGP models. Second, given the prevalent practice of specifying rather than estimating these hyperparameters, we wanted to assess the impact of misspecifying these hyperparameters on accuracy of breeding value prediction.

3.2 Materials and Methods

3.2.1 WGP Model

The WGP model used for comparison of the various computational strategies and/or hyperparameter specifications can be denoted as follows:

y_i = x_i′β + Σ_{j=1}^{m} z_ij g_j + e_i .        [1]

Here y_i is the phenotype for the ith animal (i = 1, 2, …, n), β is a vector of fixed effects such that x_i′ is the known incidence row vector connecting y_i to β, z_ij is the genotype covariate for SNP j on animal i coded as 0, 1, or 2 copies of a reference allele, g_j is the random effect of SNP j, and e_i is the residual. The WGP model in matrix algebra notation can be written as:

y = Xβ + Zg + e ,        [2]

where X = {x_i′}_{i=1}^{n}, Z = {z_ij}, and g = {g_j}_{j=1}^{m} ~ N(0, G) with variance-covariance matrix G = diag{σ²_gj}_{j=1}^{m}, and the residual vector is e = {e_i}_{i=1}^{n} ~ N(0, Iσ²_e).

We compared three sampling strategies under BayesA and BayesB specifications (Meuwissen, Hayes et al. 2001) on σ²_gj in the WGP model. In BayesB, σ²_gj has a mixture prior of two components: a scaled inverted chi-square distribution σ²_gj ~ χ⁻²(ν, νs²) with probability π and a spike at 0 with probability (1 − π). Here π loosely represents the proportion of SNP markers having associated genetic effects on the phenotype. BayesA is a special case of BayesB when π = 1. Following Yang and Tempelman (2012), we specify the following prior distributions on the hyperparameters: p(ν) ∝ (ν + 1)⁻² and the Gelman prior s² ~ χ⁻²(−1, 0) (GELMAN 2006) for BayesA, and a proper conjugate prior s² ~ Gamma(0.1, 0.1) together with π ~ p(π | α_π, β_π) = Beta(α_π = 1, β_π = 8) for BayesB. For all three computational strategies that we subsequently describe, we adopt the same commonly used MCMC strategies for sampling all parameters/random variables other than ν, s² and π, as outlined, for example, by Meuwissen et al. (2001). We now describe each of the three computational strategies in turn.

3.2.2 Univariate Metropolis-Hastings sampling on ν and Gibbs update on s² (DFMH)

This strategy, which we designate as DFMH, closely follows Yi and Xu (2008).
The FCD of ν does not have a recognizable form; hence sampling from this FCD requires a strategy other than a Gibbs step. Here, we used the MH algorithm to sample from the FCD of ν drawing from our experiences in various other applications (Kizilkaya, Carnier et al. 2003; Kizilkaya and Tempelman 2005; Bello, Steibel et al. 2010; Yang and Tempelman 2012). More specifically, we generate from the FCD of ξ = log(ν ) , ensuring that the FCD of ξ takes into account the Jacobian of the transformation from ν to ξ (see Appendix B1.1). Since ξ can conceptually be defined 41 anywhere on the continuous real line, we believe this transformation better justifies the use of a Gaussian proposal density centered on the value of ξ from the previous MCMC cycle; i.e., a random walk MH step (CHIB and GREENBERG 1995); alternatively, a heavier-tailed Student t proposal density (CHIB and GREENBERG 1995) could be used as well. During the first half of burn-in, we adaptively tune the variance of this proposal density such that the MH acceptance ratios are intermediate (i.e., 25-75%) adapting the strategy described by Muller (1991) and in accordance with standard recommendations (Gelman, Carlin et al. 2003; Carlin and Louis 2008). This proposal density variance was then fixed for the last half of burn-in in order to ensure a proper convergent MCMC 2 algorithm. Yi and Xu (2008) demonstrated that the FCD of s is Gamma, provided that a conditionally conjugate Gamma or noninformative prior is used. Using the Gelman prior 2 (GELMAN 2006) for s as we have previously advocated for BayesA (YANG and TEMPELMAN 2012), the FCD of s can be shown to be Gamma with shape 0.5 ( mν + 1) 2 m and scale 0.5ν ∑ s g2 j . Hence, for DFMH, we sampled ν using the described MH update j=1 2 and s with a Gibbs update. In DFMH, we sampled 10 MH samples per MCMC cycle for ν . 3.2.3 Univariate Metropolis Hastings sampling for each of ν and s (UNIMH) 2 Metropolis Hastings sampling, if properly tuned with good proposal densities and intermediate acceptance rates, can often lead to faster mixing and hence greater MCMC efficiency relative to Gibbs sampling. This is because MH sampling typically proposes bigger jumps throughout the posterior density compared to the use of Gibbs sampling. 42 Hence, we propose a second strategy, UNIMH, whereby we again use MH to sample from ν but also use MH to sample from s . As with ν in DFMH, we sample s by first 2 2 ( ) a change of variable to its logarithm (i.e., ψ = log s ) and use a random walk MH 2 algorithm based on a Gaussian proposal density for ψ . Similar to what was done for ν , the variance of this proposal density was only tuned for intermediate acceptance rates during the first half of burn-in to ensure a properly convergent MCMC chain. In UNIMH, 10 MH samples per MCMC cycle were specified for sampling ν and s . Details on this 2 strategy are further provided in Appendix B1.2. 2 3.2.4 Bivariate Metropolis Hastings sampling on ν and s (BIVMH) 2 As previously noted, the posterior correlation between ν and s can be high; hence, it might be advantageous to jointly sample both parameters together with a bivariate random walk MH sampler as demonstrated with another application by Ntzoufras (2011). Hence, we propose a third sampling algorithm that we label BIVMH. 
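To fix ideas, the R sketch below (a simplified stand-alone illustration, not the dissertation's implementation; the function names are ours) shows the two updates that DFMH alternates between: a random-walk MH step on ξ = log(ν) including the Jacobian of the transformation, and the Gamma-based Gibbs update on s² under the Gelman prior for BayesA, with the second Gamma parameter taken here as a rate, 0.5ν Σ_j σ_gj⁻², which is our reading of the expression given above. UNIMH simply replaces the latter with an analogous random-walk MH step on log(s²), and BIVMH, described next, proposes the two log-scale parameters jointly.

# Sketch of the DFMH hyperparameter updates, conditioning on the current SNP-specific
# variances sig2g (for BayesB, only the currently non-zero variances would enter these sums).

log_fcd_nu <- function(nu, s2, sig2g) {
  # kernel (in nu) of prod_j p(sig2g_j | nu, s2) under sig2g_j ~ inv-chi^2(nu, nu*s2),
  # times the prior p(nu) proportional to (nu + 1)^-2
  m <- length(sig2g)
  m * (0.5 * nu * log(0.5 * nu * s2) - lgamma(0.5 * nu)) -
    0.5 * nu * sum(log(sig2g)) - 0.5 * nu * s2 * sum(1 / sig2g) - 2 * log(nu + 1)
}

mh_update_nu <- function(nu, s2, sig2g, sd_prop = 0.3) {
  xi      <- log(nu)
  xi_star <- rnorm(1, xi, sd_prop)       # Gaussian random-walk proposal on xi = log(nu)
  # the "+ xi" terms are the log-Jacobian of the change of variable from nu to xi
  log_acc <- (log_fcd_nu(exp(xi_star), s2, sig2g) + xi_star) -
             (log_fcd_nu(exp(xi), s2, sig2g) + xi)
  if (log(runif(1)) < log_acc) exp(xi_star) else nu
}

gibbs_update_s2 <- function(nu, sig2g) {
  # Gamma FCD of s2 under the Gelman prior (BayesA case); rgamma parameterized by shape and rate
  m <- length(sig2g)
  rgamma(1, shape = 0.5 * (m * nu + 1), rate = 0.5 * nu * sum(1 / sig2g))
}

# one DFMH cycle for the hyperparameters (10 MH draws of nu per MCMC cycle, as in the text):
# for (k in 1:10) nu <- mh_update_nu(nu, s2, sig2g)
# s2 <- gibbs_update_s2(nu, sig2g)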
Here, we divided the burn-in period for this strategy into four stages of equal lengths with respect to the number of MCMC cycles; arguably, a more efficient implementation might be possible given that these stages may not necessarily need to be of the same length. In Stage 1, we sampled log(ν ) and log( s ) from their respective FCD using the UNIMH 2 strategy previously described, fine-tuning the variances of the two separate Gaussian proposal densities to ensure MH acceptance rates falling between 25% and 75%. In Stage 2, we sampled log(ν ) and log( s 2 ) using UNIMH, fixing the variances of their respective proposal densities to those values tuned at the end of Stage 1 while computing 43 the empirical correlation between the samples of log(ν ) and log( s ) drawn within the 2 same cycle. In Stage 3, log(ν ) and log( s ) were jointly sampled together using a 2 bivariate Gaussian proposal density with variances based on those tuned at the end of Stage 1 and a covariance based on the correlation computed from Stage 2. During Stage 3, we further fine-tuned the proposal variances to ensure intermediate acceptance rates for joint samples of log(ν ) and log( s ) with the proposal covariance based on the same 2 correlation derived in Stage 3. In Stage 4, we drew samples using the same joint MH random walk from the newly tuned bivariate Gaussian proposal density in Stage 3 but without further tuning in order to ensure a proper convergent MCMC chain. Upon the end of Stage 4, and hence burn-in, we saved samples for the hyperparameters of ν and s 2 (i.e., back-transformed) for MCMC-based fully Bayesian inference. Ten MH samples per MCMC cycle for ν and s were drawn at each Stage. Details on this strategy are 2 further provided in Appendix B1.3. 3.2.5 Simulation Study In order to compare the efficiency of the three sampling strategies, DFMH, UNIMH and BIVMH under BayesA and BayesB modeling specifications, we simulated 15 replicated datasets using the HaploSim package in R (Coster, Bastiaansen et al. 2010). The simulated genome was composed of one chromosome of length 1 Morgan consisting of 100,000 equally spaced loci. For each of the 100 animals in the base population, every 5th locus on this chromosome was heterozygous (i.e., for a total of 20,000 such loci) whereas the remaining 80,000 loci were completely monomorphic, similar to that in Coster et al. (2010). Individuals were randomly mated to generate 100 animals within 44 each of 6000 subsequent generations in order to generate LD between loci. The number of recombinations per each meiosis event was drawn from a Poisson(1) distribution with the position of each recombination being randomly drawn from a uniform distribution on the chromosome (i.e., no interference). Furthermore, we specified the recurrent mutation rate to be 10-5 per locus per generation. After Generation 6000, random matings were used to augment the population size to 1000 individuals in Generation 6001. In Generation 6001, we deleted loci with a minor allele frequency (MAF) less than 0.05 and randomly selected 30 from the remaining loci to be quantitative trait loci (QTL). Following Meuwissen et al. (2001), we simulated substitution effects α for these 30 QTL from a reflected gamma distribution with shape parameter 0.4 and scale parameter 1.66 such that the true breeding values (TBV) were genotype-based linear combinations of α . Phenotypes for animals in generation 6001 were generated based on heritability of 50%; i.e., such that s e2 = var(TBV). 
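The last step of this data-generating process can be sketched in R as follows (illustrative only: a made-up QTL genotype matrix stands in for the HaploSim-generated genotypes):

set.seed(2)
n <- 1000; n_qtl <- 30; h2 <- 0.5
Zqtl  <- matrix(rbinom(n * n_qtl, 2, 0.3), n, n_qtl)       # stand-in 0/1/2 QTL genotypes
alpha <- sample(c(-1, 1), n_qtl, replace = TRUE) *
         rgamma(n_qtl, shape = 0.4, scale = 1.66)          # reflected gamma substitution effects
tbv   <- drop(Zqtl %*% alpha)                              # true breeding values
sig2e <- var(tbv) * (1 - h2) / h2                          # equals var(tbv) when h2 = 0.5
y     <- tbv + rnorm(n, 0, sqrt(sig2e))                    # phenotypes for Generation 6001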
Additionally, genotypes for 1000 offspring in generation 6002 were based on random matings of individuals in Generation 6001. Again, TBV were based on linear combinations of α based on QTL genotypes inherited from Generation 6001. After discarding SNP with MAF< 0.05, we then selected every 1st, 4th and 10th SNP markers for inclusion in analyses in order to consider the effect of different marker densities; i.e., high (around 2394 SNPs), medium (around 598 SNPs), and low (around 239 SNPs). We compared the computational efficiency of inferring on key hyperparameters (e.g., v and s2) for genetic architecture under these three different marker densities. We ignored fitting polygenic effects for all comparisons in the 45 simulation study to facilitate further computational feasibility, recognizing that the relative efficiency of each strategy should not differ otherwise. We compared the computational efficiencies of the three MCMC strategies on each replicated dataset, considering each of three different marker densities. Now computational efficiency, as it pertains to a particular hyperparameter, was defined as the effective sample size (ESS) for the post-burn-in MCMC cycles divided by total CPU time; i.e. ESS/CPU recorded in #/seconds. That is, the greater ESS/CPU, the greater the computational efficiency for inferring the posterior density of that particular hyperparameter. Given that many researchers do not infer upon some or even all hyperparameters in WGP models because of perceived inferential challenges, we thought it important to assess the impact of their misspecification on the accuracy of genomic prediction. Using the same simulated data as described previously, we focused on five different scenarios, all at the medium marker density (selecting every 4th marker). Each scenario was based on setting s2 to be an arbitrary multiplicative constant of the average posterior mean at the 2 medium marker density ( smed ) based on the complete (BayesA or BayesB) model that was used to infer upon the other hyperparameters (ν and π where applicable) as well. 2 2 2 These five scenarios were to set 1) s2 = smed , 2) s2 = 0.1 smed , 3) s2 = 0.01 smed , 4) s2 = 10 2 2 2 , and 5) s2 = 100 smed . Note that the specification of smed depended upon which smed model (BayesA or BayesB) was employed, as described later. We also wondered if one could roughly specify a good working value for s2 by merely basing it on an estimate derived from, say, analysis of the same phenotype but 46 based on a SNP panel with a different marker density. As s2 represents a typical value for the SNP-specific variances s g2 j , then its value should be inversely related to the number of SNP markers as we have observed previously (YANG and TEMPELMAN 2012). For example, given that there were four times as many markers at the high marker density as there were at the medium marker density in the simulation study, an initial specification 2 for s2 at the high marker density is to use s2 = 0.25 smed . Similarly, since there were half as many SNP markers for the low marker density specification, an initial specification in 2 . These specifications for s2 were compared for their effect that case might be s2 = 2 smed on accuracy of breeding value prediction relative to the situation where s2 is inferred upon along with all other hyperparameters under both BayesA and BayesB for all 15 replicated datasets. 
In all cases, accuracy was defined as correlation between estimated breeding values (EBV) and TBV where EBV =Z g for g is the posterior mean of g and TBV is defined as before. 3.2.6 Data Application: Assessment of computational efficiency comparisons In this dataset, 2,296 mice were genotyped for 12,147 SNP markers with a high pairwise LD of r2=0.6 (Legarra, Robert-Granie et al. 2008). After data cleaning on genotypes (YANG and TEMPELMAN 2012), there were 1940 animals with 10,467 SNP markers. We selected 50, 100 and 200 SNP markers from each of the 19 autosomes to create three different marker densities using pre-corrected body weight at 6 weeks as our phenotypes. As in Yang and Tempelman (2012), we also modeled the random effects of cage in addition to SNP effects and polygenic effects in the WGP model using the 47 2 2 Gelman prior (GELMAN 2006) specified on the cage s c and the polygenic variance s u . After merging phenotypes with the genotypes, we were left with 1917 animals with complete phenotypes and genotypes on 950, 1900 and 3800 SNP markers across the 19 autosomes. 3.3 Results 3.3.1 Simulation Study By selecting every single, 4th and 10th SNP markers for inclusion, the average r2 between adjacent SNPs, was 0.17, 0.24 and 0.32 for the three marker densities over the 15 replicated datasets. Inferences on s2 in the three sampling strategies DFMH, UNIMH and BIVMH were compared under both BayesA (Appendix Figure B2.1A) and BayesB (Appendix Figure B2.1B) specifications. We observed estimates (posterior means) of s2 decreased as the marker density increased and that estimates derived from BayesB were generally one order of magnitude greater than those in BayesA. Furthermore, estimates of π generally increased (Appendix Figure B2.2) whereas estimates of v generally decreased as marker density increased (Appendix Figure B2.3). All of these results are consistent with our previous work (YANG and TEMPELMAN 2012). For quality control, we checked to see that the estimates for the key hyperparameters should be the same between the three computational strategies within each replicated dataset under the same model, allowing for Monte Carlo error. Pairwise scatterplots of the estimates of s2 under the three different strategies for each of the three different marker densities in BayesA (Appendix Figure B2.4) and in BayesB (Appendix Figure B2.5) indicated generally good agreement as did for π using BayesB (Appendix 48 Figure B2.6) and for ν under BayesA (Appendix Figure B2.7) and BayesB (Appendix Figure B2.8). The greatest differences between DFMH and UNIMH or between DFMH and BIVMH were found in estimating ν under BayesB (Appendix Figure B2.8). Figure 3.1 and 3.2 illustrate side-by-side boxplots of ESS/CPU for each of the three strategies under each of the three marker densities for v and s 2 , respectively, under the BayesA model. In all cases, ESS/CPU were higher for BIVMH and UNIMH compared to DFMH (P<0.05). For v at high marker density (r2=0.32), BIVMH had higher ESS/CPU than UNIMH (P<0.0001). For s 2 at high and median marker densities (r2=0.32 and 0.24), ESS/CPU were higher for BIVMH than UNIMH (P<0.0001). Interestingly, the differences in efficiencies between the three strategies widened as marker density increased. Efficiencies for the three alternative sampling strategies were also compared under BayesB model for v (Figure 3.3), s 2 (Figure 3.4), and π (Figure 3.5). 
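As a point of reference, the efficiency criterion summarized in Figures 3.1 through 3.5 can be computed as in the short R sketch below (the autocorrelated chain and the CPU time are stand-ins; the ESS is obtained with the CODA package noted earlier):

library(coda)
set.seed(3)
draws <- as.numeric(arima.sim(list(ar = 0.9), n = 40000))  # stand-in for post-burn-in draws of a hyperparameter
cpu_seconds <- 120                                         # stand-in for the recorded total CPU time
ess <- effectiveSize(mcmc(draws))                          # effective sample size (ESS)
unname(ess) / cpu_seconds                                  # ESS/CPU, recorded in #/second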
We found that UNIMH and BIVMH had significantly greater computational efficiencies compared to DFMH for all three hyperparameters (P<0.05). For v , UNIMH had significantly higher ESS/CPU compared to BIVMH at low marker density (P<0.05). For s 2 , UNIMH had significantly greater computational efficiencies compared to BIVMH at median marker density whereas BIVMH had higher ESS/CPU than UNIMH at low marker density (P<0.05). For π , BIVMH had higher ESS/CPU compared to UNIMH at high and median marker densities (P<0.05). 49 Figure 3.1: Boxplots of effective sample size for v divided by total CPU time in seconds across 15 replicates for three different levels of LD comparing DFMH, UNIMH and BIVMH under BayesA model. Different letters indicate significant difference with P < 0.05. 2 Figure 3.2: Boxplots of effective sample size for s divided by total computational time in seconds across 15 replicates for three different levels of LD comparing DFMH, UNIMH and BIVMH under BayesA model. Different letters indicate significant difference with P< 0.05. 50 Figure 3.3: Boxplots of effective sample size for v divided by total computational time in seconds across 15 replicates for three different levels of LD comparing DFMH, UNIMH and BIVMH under BayesB model. Different letters indicate significant difference with P < 0.05. Figure 3.4: Boxplots of effective sample size for s 2 divided by total computational time in seconds across 15 replicates for three different levels of LD comparing DFMH, UNIMH and BIVMH under BayesB model. Different letters indicate significant difference with P < 0.05. 51 Figure 3.5: Boxplots of effective sample size for π divided by total computational time in seconds across 15 replicates for three different levels of LD comparing DFMH, UNIMH and BIVMH under BayesB model. Different letters indicate significant difference with P< 0.05. We also separately looked at the components of computational efficiency; i.e., ESS and CPU/cycle in seconds for each parameter in both models and under all three strategies. As anticipated, DFMH was computationally less expensive in terms of CPU/cycle compared to the proposed strategies UNIMH and BIVMH; however, the ESS for the 40,000 MCMC cycles that were drawn in each analyses were such that UNIMH and BIVMH generally far exceeded that of DFMH. What was particularly ominous was how quickly the ESS measures degraded with increasing marker densities thereby suggesting that high density marker panels lead to analyses that require not only greater CPU/cycle but also a greater number of MCMC cycles to ensure that ESS values are sufficiently great enough to ensure reliable inference. We were interested as to whether accuracy of breeding value prediction might depend on misspecification of hyperparameters, say, s 2 . We assessed the impact on 52 accuracy of breeding value predictions based on setting s 2 to a wide range of values 2 based on various multiples (0.01x to 100x) of the average posterior mean ( smed =7x10-4 2 for BayesA, smed =4x10-2 for BayesB) across the 15 replicates under the medium marker density. For BayesA (Figure 3.6) we determined no significant difference in accuracies 2 2 and s 2 = 0.01 smed ); however, breeding value when s2 was understated (i.e., s 2 = 0.1 smed accuracies were significantly compromised when s 2 was overstated (P<0.01), particularly 2 at s2 = 100 smed (P<0.0001). For BayesB (Figure 3.7), we did not see any significant differences in accuracy of prediction between the various scenarios. 
Figure 3.6: Boxplots of accuracies of breeding value predictions under the BayesA model across 15 replicates at medium marker density (pairwise r² = 0.24) for s² set equal to different multiples of the average posterior mean of s² (s²_med = 7×10⁻⁴) in a fully Bayes analysis. Significant differences from 1 s²_med are indicated by * (P < 0.01) and *** (P < 0.0001).

Figure 3.7: Boxplots of accuracies of breeding value predictions under the BayesB model across 15 replicates at medium marker density (pairwise r² = 0.24) for s² set equal to different multiples of the average posterior mean of s² (s²_med = 4×10⁻²) in a fully Bayes analysis. Significant differences from 1 s²_med are indicated by * (P < 0.01) and *** (P < 0.0001).

We wondered if some of the non-significant differences in these comparisons could be partly attributed to compensation in the inferences on other hyperparameters, specifically ν and π (in BayesB). Indeed, we noted that as the specification on s² increased from 0.01 s²_med to 100 s²_med, the posterior means of ν also increased under both BayesA (Figure 3.8) and BayesB (Figure 3.9). This was somewhat anticipated given the high posterior correlation between these two hyperparameters. Note from the prior specification on σ²_gj that E(σ²_gj | σ²_gj > 0) = νs²/(ν − 2); that is, the average value of the MCMC draws of {σ²_gj}_{j=1}^{m} will be somewhat constrained by νs²/(ν − 2). So if s² is understated, the estimate of ν (ν > 2) should decrease accordingly to compensate, such that there is a good deal of flexibility in maintaining the value of νs²/(ν − 2). However, if s² is overstated, there is very little flexibility to bring down νs²/(ν − 2) by increasing ν, since νs²/(ν − 2) can never be less than s². We believe this is the reason why understating the value of s² is less serious than overstating it, at least for BayesA, as further indicated by our results. The misspecification also impacts estimates of π in BayesB, as further illustrated in Figure 3.10. This provides BayesB with even more flexibility than BayesA against misspecification of s²; that is, overstated values of s² merely concentrate the non-zero SNP effects {g_j}_{j=1}^{m} on a smaller number of markers, as reflected in the corresponding estimates of π. This may be a key reason why we noticed non-significant differences in accuracy of breeding value prediction between the various specifications of s² under BayesB in Figure 3.7. Misspecification of hyperparameters could then be another reason why BayesB often outperforms BayesA in many other comparisons.

Figure 3.8: Boxplots of posterior means and medians for ν under the BayesA model across 15 replicates at medium marker density (pairwise r² = 0.24) for s² set equal to different multiples of the average posterior mean of s² (s²_med = 7×10⁻⁴) in a fully Bayes analysis.

Figure 3.9: Boxplots of posterior means and medians for ν under the BayesB model across 15 replicates at medium marker density (pairwise r² = 0.24) for s² set equal to different multiples of the average posterior mean of s² (s²_med = 4×10⁻²) in a fully Bayes analysis.

Figure 3.10: Boxplots of posterior means and medians for π under the BayesB model across 15 replicates at medium marker density (pairwise r² = 0.24) for s² set equal to different multiples of the average posterior mean of s² (s²_med = 4×10⁻²) in a fully Bayes analysis.

We also wondered if estimates of s² based on analyses at a given marker density could be extrapolated to other marker densities for analysis of the same phenotypes.
Recall that s²_med = 7×10⁻⁴ for BayesA and s²_med = 4×10⁻² for BayesB with the medium marker density panel. For the high marker density panel, which involved four times as many markers, we specified s² = s²_med/4, whereas for the low marker density panel, which had roughly 2.5 times fewer markers, we specified s² = 2.5 s²_med. We found no significant differences in accuracies in any case (see Figures 3.11 and 3.12), except for a significantly lower accuracy when extrapolating s² to the higher marker density using BayesA (P = 0.04).

Figure 3.11: Accuracy of breeding value predictions under the BayesA model across 15 replicates at high and low marker densities (pairwise LD r² = 0.32 and 0.17) using DFMH (red), DFMH with fixed scale s² = 7×10⁻⁴/4 = 1.75×10⁻⁴ (green) at high marker density, and DFMH with fixed scale s² = 7×10⁻⁴ × 2.5 = 1.75×10⁻³ (blue) at low marker density. A significant difference in accuracy between DFMH (red) and DFMH with s² = 1.75×10⁻⁴ (green) was found at P < 0.05.

Figure 3.12: Accuracy of breeding value predictions under the BayesB model across 15 replicates at high and low marker densities (pairwise LD r² = 0.32 and 0.17) using DFMH (red), DFMH with fixed scale s² = 0.04/4 = 0.01 (green) at high marker density, and DFMH with fixed scale s² = 0.04 × 2.5 = 0.1 (blue) at low marker density. No significant differences in accuracy between any pair of methods were found.

3.3.2 Application to Heterogeneous Stock Mice data

We summarize posterior inferences for the key hyperparameters under BayesA and BayesB analyses of the heterogeneous stock mice data in Appendix Tables B2.1, B2.2 and B2.3 for the three marker densities: 950, 1900 and 3800 SNP. For the 950 SNP marker analysis (Appendix Table B2.1), posterior means of parameters were close to estimates previously provided for these same data by Yang and Tempelman (2012). For all analyses, the number of MCMC cycles post burn-in was the same. Under the BayesA model, the ESS for s² was twice as large using UNIMH and BIVMH compared to DFMH, while the ESS for ν was four times larger in UNIMH and BIVMH compared to DFMH; the ESS for s² and for ν were similar between UNIMH and BIVMH. Under the BayesB model, the ESS using UNIMH and BIVMH was 7 to 8 times greater for ν and about 2 times greater for s² and π than that for DFMH. For the 1900 marker panel (Appendix Table B2.2), the ESS for UNIMH and BIVMH relative to DFMH were between 10 and 14 times greater for ν and around 3 times greater for s² under BayesA; under the BayesB model, these ratios were respectively between 10 and 15 for ν and close to 2 for s² and π. For the 3800 marker panel (Appendix Table B2.3), these respective ratios were between 12 and 13 for ν and around 4 for s² using BayesA, whereas they were between 10 and 15 for ν, about 3 for s² and about 4 for π using BayesB.

3.4 Discussion

Most researchers do not typically infer upon the key hyperparameters (i.e., ν, s² and π) that partly determine the genetic architecture in BayesA/B WGP models. This is in part due to the high posterior correlation that exists between some of these hyperparameters, in particular ν and s² (Habier, Fernando et al. 2011; de los Campos, Hickey et al. 2013). Nevertheless, some (Riedelsheimer, Technow et al. 2012; Technow, Riedelsheimer et al. 2012; Technow and Melchinger 2013) have successfully used techniques previously presented by Yi and Xu (2008) and Yang and Tempelman (2012) to infer upon these hyperparameters; these techniques closely mirror the strategy labeled DFMH in this paper.
We considered two alternative sampling strategies to DFMH, each involving the use of MH, in an attempt to improve the computational efficiency of WGP models as measured by the ratio ESS/CPU. Using simulation studies and empirical data analyses, we demonstrated that strategies borrowing more heavily on MH sampling had better computational efficiencies compared to DFMH. Simple modifications such as sampling s 2 with a MH rather than a Gibbs step (UNIMH) or joint sampling of s 2 and v with a bivariate MH step (BIVMH) lead to substantial improvements in ESS/CPU. We concede that our investigation is not exhaustive with respect to assessing all possible strategies to improve computational efficiency in these models; in fact, there may be a hybrid involving some or all of the three presented sampling strategies that might be computationally more efficient. Deviations of MH sampling such as Langevin-Hastings could also have been explored and assessed here as well although its advantage relative to MH sampling has not been too convincing in other animal breeding models (Shariati and Sorensen 2008; Waagepetersen, Ibanez-Escriche et al. 2008). In other work that we do not report here, we attempted to base the covariance matrix for the proposal density in BIVMH on the negative Hessian of the joint FCD of log( v ) and log( s 2 ). 61 However, we determined this matrix to be positive definite generally only when v < 50, thereby negating its use in this way. Recently, non-MCMC (i.e. expectation-maximization) schemes have been increasingly popular; however, it is often not straightforward how to estimate key hyperparameters in these implementations (KARKKAINEN and SILLANPAA 2012). In any case, we encourage further development and work in this area including the Bayesian LASSO model (de los Campos, Naya et al. 2009). We have previously demonstrated that it may be advantageous to specify nonstationary correlation structures between adjacent SNP using a first-order antedependence specification (YANG and TEMPELMAN 2012). In work not reported here, we also evaluated the three alternative sampling strategies in the context of antedependence versions of BayesA and BayesB and drew conclusions virtually identical to what we draw here. Overspecifying s 2 appeared to have deleterious effects on accuracy of genomic selection using BayesA models although no such effect was observed in BayesB models likely due to the counteracting influence of π . It appeared that underspecification of s 2 lead to more robust genomic predictions as there is greater flexibility for inference on π and s 2 to compensate for this. We also determined that it may be reasonable to consider specifying values for s 2 for one marker density based on a previous estimate from another marker density by taking into account the direct inverse relationship between s 2 and marker density. At any rate, it should be fully appreciated that these hyperparameters should not be arbitrarily specified in BayesA models. We anticipate that these issues are also 62 pertinent to determining tuning parameters for various nonparametric approaches as well. We do recognize however computational challenges may be formidable for marker density panels that far exceed those that we considered in this paper. At the very least then, some hyperparameters should be estimated based on simple model-based approximations; for example, s 2 in BayesA should not be much different in magnitude from the variance component for SNP effects in a GBLUP (Meuwissen, Hayes et al. 
2001)analysis; hence, a REML-like estimator could be used to provide a reasonable specification. If this is deemed to be computationally intractable relative to the marker density, then extrapolations based on analyses based on lower marker densities might be pursued similar to those presented in this paper. 3.5 Conclusions In WGP Bayesian hierarchical models, log transformation and jointly drawing v and s 2 can improve MCMC efficiency for inference on all hyperparameters. Even separate univariate MH draws on v and s 2 is substantially more efficient than Gibbs sampling of s 2 . Overspecification of key hyperparameters s 2 can reduce accuracy of breeding value prediction under BayesA model. BayesB model is more robust to misspecification of s 2 due to inference on association probability π . However, it’s important to estimate all hyperparameters since misspecification of s 2 can lead to poor inference on v and π . 63 Chapter 4 Random regression and reaction norm extensions of whole genome prediction models to account for genotype by environment interaction 4.1 Introduction Whole genome prediction (WGP) has become a revolutionary process for selecting animals and plants for genetic merit on economically important traits using high density single nucleotide polymorphism (SNP) markers (Meuwissen, Hayes et al. 2001). Many WGP methods have been investigated to improve accuracy of breeding value (BV) prediction (de los Campos, Hickey et al. 2013). Meuwissen et al. (2001) proposed two hierarchical Bayesian methods, i.e. scaled-t density prior with and without point mass at zero, namely BayesB and BayesA, respectively. To infer upon key hyperparameters, fully hierarchical Bayesian WGP approaches based on BayesA have been developed and applied in many studies (Yi and Xu 2008; Jia and Jannink 2012; Yang and Tempelman 2012). Genotype by environment (G×E) interaction refers to how genotypes influence phenotypes differentially in different environments (FALCONER 1952). That is to say, the genetic merit, even ranking, of animals for certain quantitative traits could be substantially different across different environments. The existence of G×E has been found for various traits in various livestock and plant species (Deeb and Cahaner 2001; Berry, Buckley et al. 2003; Beerda, Ouweltjes et al. 2007; Bohmanova, Misztal et al. 2008; Knap and Su 2008; Hadjipavlou and Bishop 2009; Lillehammer, Hayes et al. 2009). Recently, it has been determined that some SNP and hence quantitative trait loci (QTL) effects are different across environments (Lillehammer, Arnyasi et al. 2007; 64 Lillehammer, Odegard et al. 2007; Lillehammer, Goddard et al. 2008; Lillehammer, Hayes et al. 2009). In fact, Lillehammer et al. (2007) determined in their analyses that some QTL may not have been otherwise inferred without allowing for G×E. However, little work has been considered to jointly model SNP effects across different environments under a WGP framework (Burgueno, de los Campos et al. 2012). Burgueno et al. (2012) adopted factor analytic models to account for G×E based on SNP and/or pedigree derived relationships. Their model did not consider information due to environmental covariates that might potentially drive G×E. If G×E is present, but is not considered in WGP models, then selection of animals for certain environments could be suboptimal. 
The existence of G×E further complicates the process of WGP validation; that is, sometimes the effects of markers estimated under one population (i.e., training set) are retested in another population or environment (i.e. validation set) (Daetwyler, Calus et al. 2013); a clear example is the use of parental genotypes and data as training data with progeny genotypes and data used as validation within the context of future environments as validation. If G×E effects are important, then this validation strategy may not work as intended. Random regression (RR) and reaction norm (RN) models have played an important role in detecting G×E of a linear or even higher order nature (Calus, Groen et al. 2002; Berry, Buckley et al. 2003; Calus and Veerkamp 2003; Mattar, Silva et al. 2011; Cardoso and Tempelman 2012; Streit, Reinhardt et al. 2012). RR models have been typically used for modeling genetic merit of traits with repeated measurements over time (Berry, Buckley et al. 2003) whereas RN models have been applied to quantitative traits where genetic merit is typically modeled as a function of key environmental covariate(s) 65 (Streit, Reinhardt et al. 2012). For both RR and RN models, BV can be modeled as function of an intercept, reflecting an average environment, and a linear function of a key environmental covariate (Calus, Groen et al. 2002). In QTL mapping studies, the QTL specific intercept and slope effects of environmental covariates have been modeled to account for G×E (Lillehammer, Arnyasi et al. 2007; Lillehammer, Odegard et al. 2007; Lillehammer, Goddard et al. 2008). In a genome wide association study (GWAS) focusing on the detection of G×E, SNP specific intercept and slope effects of environmental covariates have been modeled (Lillehammer, Hayes et al. 2009). With increasing availability of high marker densities in WGP, we develop genomic RR/RN models by specifying SNP substitution effects as random intercept and linear functions of age or environmental covariates in a manner similar to Streit et al. (2013). In “Bayesian alphabet” WGP models like BayesA or BayesB (MEUWISSEN and GODDARD 2010), SNP specific genetic variances are modeled. These variances are of no inherent interest but are used to specify a distribution for the SNP effects that are heavier tailed (e.g. Student t) than Gaussian. For RR/RN models used to specify G×E in WGP, 2x2 genetic variance-covariance matrices (VCV) of the SNP-specific intercepts and slopes are modeled. Conjugate priors on these trait-specific VCV, such as independent inverted Wishart (IW) densities, have been used for bivariate genomic analyses (CALUS and VEERKAMP 2011), thereby rendering marginal distributions on SNP intercept and slope effects as bivariate Student t. An alternative specification, the square root free Cholesky decomposition (CD) of the VCV, has been applied in bivariate trait analyses to model random and residual variance-covariance matrices (Bello, Steibel et al. 2010). The CD specification re-parameterizes VCV into generalized autoregressive parameters 66 (GARP) and innovation variances to provide potentially greater flexibility relative to IW based specifications (POURAHMADI 1999). In this paper, we develop and test five possible RR/RN models based on IW or CD based specifications. The objectives of this study were to compare these five models to two conventional BayesA and BayesB models for assessing the accuracy of BV prediction in WGP, and to compare the five RR/RN models in the ability to detect G×E of a linear nature. 
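Before describing the models, note that for a single 2 × 2 intercept-slope VCV, the square-root-free Cholesky (GARP/innovation variance) reparameterization mentioned above amounts to the simple identities sketched below in R (made-up numbers; φ is the GARP and ψ1, ψ2 are the innovation variances):

Sigma <- matrix(c(1.0, 0.6,
                  0.6, 0.8), 2, 2)            # hypothetical VCV of (intercept, slope) effects
phi  <- Sigma[2, 1] / Sigma[1, 1]             # GARP: regression of slope on intercept
psi1 <- Sigma[1, 1]                           # innovation variance of the intercept
psi2 <- Sigma[2, 2] - phi^2 * Sigma[1, 1]     # innovation variance of slope given intercept
L <- matrix(c(1, phi,
              0, 1), 2, 2)                    # unit lower-triangular Cholesky factor
L %*% diag(c(psi1, psi2)) %*% t(L)            # reconstructs Sigma exactly

Modeling (φ, ψ1, ψ2) separately, rather than the VCV itself, is what allows the CD-based models developed below to place a separate prior on each of these components.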
4.2 Materials and Methods

4.2.1 Random regression and reaction norm models

The random regression (RR) model for WGP can be denoted as follows:

y_ik = x_i′β + z_i′(g1 + d_ik g2) + a_i + e_ik ,        [1]

where y_ik is the kth phenotypic record on the ith animal (i = 1, 2, …, n; k = 1, 2, …, t); β is the vector of fixed effects; x_i′ is the incidence row vector connecting elements of β to animal i; z_i′ = [z_i1 z_i2 z_i3 … z_im] is the vector of genotypes for animal i, coded as 0, 1, or 2 copies of the minor allele at each SNP; g1 = {g_1j}_{j=1}^{m} is the vector of SNP-specific intercept effects; g2 = {g_2j}_{j=1}^{m} is the vector of SNP-specific temporal slope effects; d_ik is the environmental covariate for record k on animal i; and e_ik is the random residual. Finally, a_i is the random effect of animal i, characterized by a variance component σ²_a. This particular random effect may represent residual polygenic effects, permanent environmental effects, or both; nevertheless, it is particularly required in addition to the other terms when there are repeated records per animal. The environmental covariates d_ik are assumed to be known without error. In Equation [1], one could write the total breeding value (BV) for animal i in an environment characterized by covariate d_ik as z_i′(g1 + d_ik g2), which, in turn, is a function of the intercept BV z_i′g1 and the slope BV z_i′g2. One could then think of z_i′g1 as a measure of overall genetic merit for animal i (if d_ik is recentered to an average of 0), whereas z_i′g2 measures the environmental sensitivity of that same animal's genetic merit; i.e., the greater the value of |z_i′g2|, the greater the sensitivity of that animal's genetic merit to different environments as represented by different values of d_ik. We further write the RR-WGP model in matrix notation as follows:

y = Xβ + Zg1 + DZg2 + Wa + e ,        [2]

where e = {e_ik}, D is an nt × nt diagonal matrix with the environmental covariates {d_ik} along the diagonal, X = {x_i′ ⊗ 1_{t×1}}_{i=1}^{n}, and Z = {z_i′ ⊗ 1_{t×1}}_{i=1}^{n}, with ⊗ denoting the Kronecker product. We assume e ~ N(0, R = Iσ²_e) in this paper, although generalizations to heterogeneous residual variances over time would be possible too. Also, we might specify arbitrarily informative or diffuse priors p(β) on the "fixed effects" β, p(σ²_e) on σ²_e, and p(σ²_a) on σ²_a.

Reaction norm (RN) models could somewhat be considered simplifications of RR models whereby typically only one phenotypic record (y_i) is observed per animal (i.e., t = 1), although the environmental covariate (d_i) unique to animal i may vary across animals. Sometimes these covariates also need to be inferred (Su, Madsen et al. 2006), but for the purposes of this study we also consider d_i as known. The RN model can then be written as:

y_i = x_i′β + z_i′(g1 + d_i g2) + e_i .        [3]

As with the RR model in Equations [1] and [2], we could add an effect for animal or, equivalently, residual polygenic effects based on a known correlation (i.e., numerator relationship) matrix A between animals if the number of SNP markers is not considered large enough to model genetic variability.

4.2.2 Conventional BayesA and BayesB (BayesA\BayesB)

In conventional WGP models, all elements of g2 are zero. BayesB specifies a mixture prior of a point mass at zero with non-association probability (1 − π) and a Student t density with degrees of freedom ν and scale parameter s² with association probability π. BayesA is a special case of BayesB when π = 1 (Meuwissen, Hayes et al. 2001).
Priors such as p(ν), p(s²), and/or p(π) can be specified on these hyperparameters for BayesA or BayesB in order to properly "tune" them or account for their uncertainty, as we and others have done previously (Yi and Xu 2008; Technow, Riedelsheimer et al. 2012; Yang and Tempelman 2012) or strongly advocated (JIA and JANNINK 2012).

4.2.3 Bivariate Normality (IW-BayesC)

The simplest specification for SNP-specific intercept and slope effects might be based on multivariate normality. Suppose we reorder g = [g1′ g2′]′ instead as g* = [g_11 g_21 g_12 g_22 … g_1m g_2m]′ = [g_.1′ g_.2′ … g_.m′]′, where g_.j represents the random intercept and slope effects of SNP j. Here all g_.j are specified to be independently multivariate normal with null mean vector and common variance-covariance matrix Σ_g, where

Σ_g = [ σ²_g1     σ_g1g2 ]
      [ σ_g1g2    σ²_g2  ] .        [4]

This specification, more or less, represents a bivariate extension of what Habier et al. (2011) describe as BayesCπ with π = 0, where π defines the probability of non-association. However, as we illustrate later, this is effectively equivalent to a classical mixed model analysis. One can specify a conjugate inverted Wishart prior on Σ_g with Σ_g ~ IW(v_0, Σ_0), and we denote this extension as IW-BayesC.

4.2.4 Bivariate Student t and Variable Selection (IW-BayesA\IW-BayesB)

We consider an extension of IW-BayesC whereby intercept and slope effects are specified to have heterogeneous variance-covariance matrices across SNP. For SNP j, we specify g_.j to be conditionally bivariate normal; i.e.,

g_.j ~ N( 0_{2×1} , G_j ) ,  G_j = [ σ²_g1j       σ_g1j,g2j ]
                                   [ σ_g1j,g2j    σ²_g2j    ] .        [5]

We then specify all G_j to have independent conjugate inverted Wishart prior densities; i.e., G_j ~ IW(v_g, Σ_g), characterized by a degrees of freedom parameter v_g and a scale matrix Σ_g structured as in Equation [4]. We denote this specification as IW-BayesA, noting the obvious bivariate extension of BayesA as first proposed by Meuwissen et al. (Meuwissen, Hayes et al. 2001). Note that as v_g → ∞, G_j = Σ_g ∀j, such that IW-BayesA reverts back to IW-BayesC. We specify a prior p(v_g) on v_g and a conjugate Wishart prior Σ_g ~ W(v_0, S_0) on Σ_g. As alluded to by Munilla and Cantet (2012) and Bello et al. (2010), the variability of the three components of G_j (i.e., two SNP-specific variances and a SNP-specific covariance) under IW-BayesA/B is primarily controlled by one hyperparameter: v_g.

Mirroring the extension of BayesA to BayesB by Meuwissen et al. (2001), we also further modified IW-BayesA by specifying a mixture prior on G_j such that G_j = 0_{2×2} with probability (1 − π) and G_j ~ IW(v_g, Σ_g) with probability π. We name this procedure IW-BayesB, specifying a prior p(π) on π. This specification is perhaps more dubious than IW-BayesA, given its all-or-none assumption with respect to SNP effects on both intercept and slope, whereas IW-BayesA likely has more flexibility to specify large SNP effects for, say, the intercept, but near-zero effects for the slope.

4.2.5 Cholesky decomposition specifications (CD-BayesA\CD-BayesB)

Based on our previous experiences, e.g. Bello et al. (2010), we conjectured that the specification of inverted Wishart prior densities on G_j might be rather inflexible, as such specifications imply either that all SNP have non-zero effects for both intercept and slope (IW-BayesA, IW-BayesC) or, if they do not, that both effects are 0 (i.e., IW-BayesB), thereby not allowing for the possibility that some SNP effects are overall important (i.e., non-zero intercept) but environmentally robust (i.e., zero slope).
Furthermore, these IW specifications are additionally inflexible in that the heterogeneity of every single component of G_j is controlled by a single parameter, v_g. We subsequently developed an alternative parameterization based on the square-root-free Cholesky decomposition (CD) of G_j, as based on our previous work (Bello, Steibel et al. 2010). The CD parameterization provides potentially greater flexibility by modeling the following relationship between g2 and g1:

g2 = Ψ g1 + g_2|1 .        [6]

Here g_2|1 = {g_2|1,j}_{j=1}^{m} is the vector of SNP-specific slope effects conditional on intercept effects, whereas Ψ = diag{φ_j}_{j=1}^{m} represents a diagonal matrix of SNP-specific associations between intercept and slope effects. Hence, we can re-write the RN/RR model [2] as:

y = Xβ + Zg1 + DZ(Ψg1 + g_2|1) + e ,        [7a]

or

y = Xβ + (Z + DZΨ)g1 + DZg_2|1 + e .        [7b]

Note for SNP j that if φ_j ≈ 0, intercept effects are independent of slope effects. If g_2|1,j ≈ 0 and φ_j ≠ 0, then intercept and slope effects are perfectly correlated. If g_2|1,j ≈ 0 and φ_j ≈ 0, then the SNP is said to be environmentally robust (i.e., g_2j ≈ 0). For SNP j, we specify g_1j ~ N(0, σ²_g1j) with σ²_g1j ~ χ⁻²(ν_1, ν_1 s_1²), whereas g_2|1,j ~ N(0, σ²_g2|1,j) with σ²_g2|1,j ~ χ⁻²(ν_2, ν_2 s_2²). In essence, these two mixtures specify two separate univariate Student t densities for the elements of g1 and g_2|1. Furthermore, we specify independent normal priors on the SNP-specific association parameters between intercept and slope effects; i.e., φ_j ~ N(μ_φ, σ²_φ) ∀j. We label this model as CD-BayesA. Alternatively, consider a variable selection extension of CD-BayesA such that σ²_g1j and σ²_g2|1,j have these same respective inverted chi-square priors with corresponding probabilities π_1 and π_2, such that σ²_g1j = 0 and σ²_g2|1,j = 0 with probabilities (1 − π_1) and (1 − π_2). For obvious reasons, we then label this model as CD-BayesB. For both models, we specify diffuse or informative priors on μ_φ and on σ²_φ.

4.2.6 Bayesian inference

In order to conduct fully Bayesian inference using Markov chain Monte Carlo methods, it is necessary to derive the full conditional densities (FCD) for each unknown parameter to be inferred. For each of the aforementioned RN/RR models, we present these FCD in Appendix C1.

4.2.7 Simulation Study

In order to discern the ability of the various models to differentially fit various naively defined genetic architectures, we conducted a simple simulation study. We targeted six specific scenarios, as outlined in Table 4.1. Key specifications were based on an overall or average genetic correlation (ρ_g1g2) between intercept and slope, as further described and defined in Appendix C1.3, targeting values of ρ_g1g2 = 0, ρ_g1g2 = 0.5, and ρ_g1g2 = 0.8. We also investigated the effects of the number of QTL influencing both intercept and slope (Mboth = 20, 50, or 100), the number of QTL influencing the intercept only (Mint = 20 or 50), and the heritability (h² = 0.20 or 0.50). One may think of Mint as the number of environmentally robust QTL (i.e., QTL with consistent genetic effects across environments), whereas Mboth denotes the number of environmentally sensitive QTL, i.e., QTL whose effects are influenced by environmental effects.
Table 4.1: Summary of six scenarios in LD simulation Scenario Number of QTLs Number of QTLs Overall genetic Average for both intercept for intercept only correlation ( ρ g1g2 ) heritability and slope ( Mint ) [and range across ( Mboth ) 1 2 3 4 5 6 100 100 100 50 50 20 (h 2 ) replicates] 0 0 0 50 50 20 0 [-0.07, 0.07] 0.5 [0.39, 0.61] 0.8 [0.68, 0.85] 0.5 [0.40, 0.66] 0.5 [0.40, 0.66] 0.5 [0.37, 0.71] 74 0.5 0.5 0.5 0.5 0.2 0.5 The first three scenarios (Scenarios 1-3) all entailed Mboth = 100 and h2 = 0.50 and characterized situations that seemingly best agreed with an IW-BayesB or a CD-BayesB situation i.e., Mint = 0. The only differences between each of these three scenarios were differences in ρ g1g2 . Scenarios 4-6 seemingly best agreed with the CD-BayesB specifications (i.e., Mint ≠ 0) and were studied in order to assess the effect of h2 (0.5 vs 0.2 for Scenarios 4 vs. 5) and Mint (50 versus 20 for Scenarios 4 vs. 6). Twenty replicated datasets were generated under each of the six different scenarios. For each replicate, we used the R package HaploSim (Coster, Bastiaansen et al. 2010) to generate 6000 historical generations based on a constant population size of 100 animals as based on 200 unique haplotypes in the base generation. For all cases, the genome was originally composed of one chromosome with 1 Morgan in length and having 100,000 loci. For 20,000 of these loci, the biallelic minor allele frequency was 0.5 whereas the remaining loci (i.e., 80,000) were specified to be monomorphic in the base population. The number of recombinations for each meiosis event was drawn from a Poisson(1) distribution with genomic positions for recombination randomly chosen from a uniform distribution. For the 6000 historical generations, we specified the recurrent mutation rate for all loci as 10-5 per locus per generation. After 6000 generations, two additional Generations 6001 and 6002 were generated to expanded to randomly mated population sizes of n = 2000 animals each. For each replicate, we deleted SNP with minor allele frequency (MAF) < 0.05 in Generation 6001. Around 2200 SNPs remained after data editing. The genotype matrix Z was then based on the number of minor alleles (0, 1 or 2) at each locus for each animal. 75 We randomly chose Mboth + Mint of these SNP to be QTL. Variances for QTL-specific intercept effects s g21 j were drawn from scaled inverted chi-square distributions with scale s12 = 2 and degrees of freedom ν 1 = 5. Variances for any QTL-specific slope effects conditional on intercept effects, (i.e., s g22|1 j ) for each of Mboth QTL, were also drawn from scaled inverted chi-square densities always with ν 2 = 5 and with s22 = 2 ( ρ g1g2 = 0), s22 =1.5 ( ρ g1g2 = 0.5), or s22 = 0.72 ( ρ g1g2 = 0.8). QTL effects for intercepts {g } =j M both + M int QTL ,1 j j =1 and conditional slopes { gQTL ,2|1 j } j = M both j =1 were then independently generated from normal distributions with null means and their corresponding variances s g21 j and s g22|1 j . Hence, QTL effects for { gQTL ,1 j } =j M both + M int j =1 and { gQTL ,2|1 j } j = M both j =1 were each Student t-distributed. The association parameters {φQTL , j } j = M both j =1 between intercept and slope for each of Mboth QTL were generated from independent normal distributions, always with variance 2 s QTL ,φ = 0.05 and with mean mQTL ,φ = 0, for ρ g g = 0, mQTL ,φ =0.5, for ρ g g = 0.5, and 1 2 1 2 mQTL ,φ =0.8, for ρ g g = 0.8. 
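The draws just described can be sketched in R as follows (Scenario 4-like settings with ρ_g1g2 = 0.5; the helper rsinvchisq and all values shown are for illustration only, and the slope effects are then assembled from these quantities as described in the next paragraph):

rsinvchisq <- function(k, nu, s2) nu * s2 / rchisq(k, df = nu)   # scaled inverse chi-square draws

set.seed(4)
Mboth <- 50; Mint <- 50                        # Scenario 4-like settings
nu1 <- 5; s1sq <- 2                            # hyperparameters for intercept variances
nu2 <- 5; s2sq <- 1.5                          # conditional-slope variances (rho = 0.5 case)
mu_phi <- 0.5; var_phi <- 0.05                 # association parameters (rho = 0.5 case)

sig2_g1   <- rsinvchisq(Mboth + Mint, nu1, s1sq)     # QTL-specific intercept variances
sig2_g2.1 <- rsinvchisq(Mboth, nu2, s2sq)            # conditional slope variances (Mboth QTL only)
g1  <- rnorm(Mboth + Mint, 0, sqrt(sig2_g1))         # intercept effects (marginally Student t)
g21 <- rnorm(Mboth, 0, sqrt(sig2_g2.1))              # conditional slope effects
phi <- rnorm(Mboth, mu_phi, sqrt(var_phi))           # QTL-specific association parameters
# slope effects g2 = phi * g1[1:Mboth] + g21 are then formed as described in the next paragraph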
Subsequently, the slope effect for QTL j was determined as $g_{QTL,2j} = \phi_{QTL,j}\, g_{QTL,1j} + g_{QTL,2|1,j}$; j = 1, 2, …, Mboth, consistent with Equation [6]. An environmental covariate $d_i$, unique to each animal, was randomly drawn from N(0, 0.36). Writing $\mathbf{Z}_{QTL,int}$ as the subset of the n × Mint SNP genotypes in Z designated to be QTL for intercepts only, and $\mathbf{Z}_{QTL,both}$ as the subset of the n × Mboth SNP genotypes in Z designated to be QTL for both intercepts and slopes, the true breeding values $\left\{ TBV_i \right\}_{i=1}^{n}$ in each of Generations 6001 and 6002 were generated as

$\left\{ TBV_i \right\}_{i=1}^{n} = \mathbf{Z}_{QTL,both} \left\{ g_{QTL,1j} \right\}_{j=1}^{M_{both}} + \mathbf{Z}_{QTL,int} \left\{ g_{QTL,1j} \right\}_{j=M_{both}+1}^{M_{both}+M_{int}} + diag\left( d_i \right) \mathbf{Z}_{QTL,both} \left\{ g_{QTL,2j} \right\}_{j=1}^{M_{both}}$   [8]

Residuals $e_i$ for the record on each animal were drawn from a normal distribution with null mean and variance $\sigma_e^2$ determined by $h^2$; i.e.,

$\sigma_e^2 = \dfrac{var\left( \left\{ TBV_i \right\}_{i=1}^{n} \right)\left( 1 - h^2 \right)}{h^2}$   [9]

Phenotypic records were then generated as $y_i = TBV_i + e_i$.

Since the existence of G×E would imply that we select for genetic merit tailored to specific environments, we compared the accuracy of predicting TBV among the five aforementioned RN methods as well as the two conventional methods (BayesA, BayesB) at three environmental covariate values: d = -1.2, 0.0 and 1.2, representing -2, 0 and +2 standard deviations of d, respectively. Accuracy for a particular value of d was defined as the correlation between $\left\{ TBV_{i,d} \right\}_{i=1}^{n}$ and $\left\{ EBV_{i,d} \right\}_{i=1}^{n}$, where $TBV_{i,d}$ for Generation 6002 animals is based on Equation [8] but with all $d_i$ = d, whereas $EBV_{i,d} = \mathbf{z}_i' \hat{\mathbf{g}}_1 + d\, \mathbf{z}_i' \hat{\mathbf{g}}_2$, with $\mathbf{z}_i'$ being the SNP genotypes for animal i in Generation 6002 and $\hat{\mathbf{g}}_1$ and $\hat{\mathbf{g}}_2$ the respective posterior means of the SNP-specific intercepts and slopes estimated using only Generation 6001 data. For each of the five RR/RN models, we also assessed the relative accuracy of predicting the intercept and slope components of the BV separately in all six scenarios. Intercept BV accuracy was determined as the correlation between the true intercept BV (the first two expressions on the right side of Equation [8]) and the estimated intercept BV ($\mathbf{Z}\hat{\mathbf{g}}_1$), whereas slope BV accuracy was determined as the correlation between the true slope BV ($\mathbf{Z}_{QTL,both} \left\{ g_{QTL,2j} \right\}_{j=1}^{M_{both}}$) and the estimated slope BV ($\mathbf{Z}\hat{\mathbf{g}}_2$). We compared prediction accuracies of BV between each pair of the RR/RN and conventional methods using a Wilcoxon signed rank test.

4.2.8 MSU Pig Resource Population data

The genotypes used for this analysis were based on a commercial platform for low density genotyping (8434 SNP) in swine marketed as the Neogen Porcine GeneSeek Genomic Profiler LD (version 1) (GeneSeek, a Neogen Company, Lincoln, NE) (Badke, Bates et al. 2013). We received complete phenotype and genotype information on 928 F2 animals derived from a Duroc × Pietrain resource population at Michigan State University (Edwards, Ernst et al. 2008; Choi, Steibel et al. 2010). Any SNP with MAF < 0.01 was deleted. For adjacent SNP in complete LD (pairwise r2 = 1), one SNP from each pair was deleted at random. We then excluded SNPs with P-value < 10-4 for the Hardy-Weinberg equilibrium test. Genotypes for the remaining 5271 SNPs were standardized as $\left( z_{ij} - 2p_j \right)/\sqrt{2p_j\left( 1 - p_j \right)}$, where $z_{ij}$ is the genotype of the jth SNP on the ith animal and $p_j$ is the allele frequency of the reference ("0") allele for the jth SNP (de los Campos, Hickey et al. 2013).
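The genotype standardization just described can be sketched in base R as follows; the toy genotype matrix and allele frequency calculation are illustrative only and not drawn from the actual pig data:

```r
# Sketch of column-wise genotype standardization (z_ij - 2*p_j) / sqrt(2*p_j*(1 - p_j))
# applied to a 0/1/2 genotype matrix (toy example).
set.seed(3)
Z <- matrix(rbinom(20 * 10, 2, 0.3), nrow = 20, ncol = 10)  # 20 animals x 10 SNPs
p <- colMeans(Z) / 2                                        # frequency of the counted allele
Zstd <- sweep(Z, 2, 2 * p, "-")                             # center each SNP: z_ij - 2*p_j
Zstd <- sweep(Zstd, 2, sqrt(2 * p * (1 - p)), "/")          # scale by sqrt(2*p_j*(1 - p_j))
round(apply(Zstd, 2, var), 2)                               # roughly unit variance under HWE
```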
Pedigree information was available for the 928 F2 animals, including their parents and grandparents. There were 140 unique full-sib families among the 928 F2 animals, with an average full-sib family size of around 6. Map information was available on SNPs for each of the 18 autosomes from Duarte et al. (Duarte, Bates et al. 2013). We focused on back fat thickness as our response variable; it was measured at the 10th rib by B-mode ultrasound at weeks 10, 13, 16, 19 and 22 of age (Edwards et al. 2008). We used the following RR model for data analysis:

$y_{ijkl} = \mu + \sum_{\omega=1}^{4} \beta_\omega d_i^\omega + sex_j + \sum_{\omega=1}^{4} \left( sex_j \times \beta_\omega d_i^\omega \right) + litter_k + u_{1l} + d_i u_{2l} + \mathbf{z}_l'\left( \mathbf{g}_1 + d_i \mathbf{g}_2 \right) + e_{ijkl}$.   [10]

Here, $\mu$ is the overall mean; $d_i^\omega$ is the ωth order polynomial (ω = 1, 2, 3, 4) on week i, with $\beta_\omega$ being the corresponding partial regression coefficient; $sex_j$ is the fixed effect of the sex of animal j; $\left( sex_j \times \beta_\omega d_i^\omega \right)$ is the fixed interaction between $d_i^\omega$ and $sex_j$; $litter_k$ is the random effect of litter k; $u_{1l}$ and $u_{2l}$ are the permanent environmental intercept and slope effects for animal l; $\mathbf{z}_l' = \left[ z_{l1}\ z_{l2}\ z_{l3} \cdots z_{lm} \right]$ is the vector of genotypes on animal l; $\mathbf{g}_1$ and $\mathbf{g}_2$ are the vectors of SNP-specific intercept and slope effects; $d_i$ is the recoded covariate for week i; and $e_{ijkl}$ is the residual effect on $y_{ijkl}$. We rescaled $d_i$ as -1, -0.5, 0, 0.5 and 1 for weeks 10, 13, 16, 19 and 22, respectively.

Random effect specifications were as follows. Litter effects were presumed to be normally and independently distributed with null mean and variance component $\sigma^2_{litter}$. We decided not to fit polygenic effects since they are strongly confounded with litter effects, permanent environmental effects and SNP effects and hence would severely impede MCMC performance. The permanent environmental effects $\mathbf{u} = \left[ \mathbf{u}_1'\ \mathbf{u}_2' \right]'$ for $\mathbf{u}_1 = \left\{ u_{1l} \right\}$ and $\mathbf{u}_2 = \left\{ u_{2l} \right\}$ were assumed to be multivariate normal with null mean vector and variance-covariance matrix $\mathbf{I} \otimes \boldsymbol{\Sigma}$, where $\mathbf{I}$ is the identity matrix and $\boldsymbol{\Sigma}$ is the 2 × 2 unstructured covariance matrix between permanent environmental intercept and slope effects.

With pedigree information available, we also applied a conventional polygenic model (RR-BLUP) as a control comparison for the genomic (i.e., SNP-based) RR models. Fixed effects, random litter effects and permanent environmental effects were defined as in Equation [10], except that we replaced SNP effects by polygenic effects for both intercept and slope as follows:

$y_{ijkl} = \mu + \sum_{\omega=1}^{4} \beta_\omega d_i^\omega + sex_j + \sum_{\omega=1}^{4} \left( sex_j \times \beta_\omega d_i^\omega \right) + litter_k + u_{1l} + d_i u_{2l} + a_{1l} + d_i a_{2l} + e_{ijkl}$.   [11]

We assumed that the polygenic effects $\mathbf{a} = \left[ \mathbf{a}_1'\ \mathbf{a}_2' \right]'$ for intercept $\mathbf{a}_1 = \left\{ a_{1l} \right\}$ and slope $\mathbf{a}_2 = \left\{ a_{2l} \right\}$ have a joint multivariate normal distribution with null mean vector and variance-covariance matrix $\mathbf{A} \otimes \boldsymbol{\Sigma}_a$, where $\mathbf{A}$ is the numerator relationship matrix and $\boldsymbol{\Sigma}_a$ is the 2 × 2 unstructured covariance matrix between polygenic intercept and slope effects.

All five aforementioned RR models, along with RR-BLUP, were compared to conventional WGP specifications (i.e., BayesA and BayesB) based on 20 random across-litter cross-validation splits of the data using a Wilcoxon signed rank test. That is, for each split, data on all individuals from 126 (90%) litters were randomly chosen as the training data, whereas data on all subjects in the remaining 14 (10%) litters were used as validation data.
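One such across-litter split could be generated along the following lines. This is a minimal base-R sketch with a toy stand-in data frame; `pheno`, `litter` and `backfat` are illustrative names, not those of the actual dataset:

```r
# One random across-litter split: ~90% of litters for training, ~10% for validation.
set.seed(4)
# Toy stand-in for the phenotype file: 928 pigs assigned to 140 litters
pheno <- data.frame(id = 1:928, litter = sample(1:140, 928, replace = TRUE),
                    backfat = rnorm(928))
litters    <- unique(pheno$litter)
n_valid    <- round(0.10 * length(litters))        # hold out ~10% of litters (~14)
valid_lits <- sample(litters, n_valid)
train      <- subset(pheno, !(litter %in% valid_lits))
valid      <- subset(pheno,   litter %in% valid_lits)
# Each competing model would be fitted to `train`, back fat predicted for animals in
# `valid`, and cor(predicted, observed) reported as the cross-validation performance.
```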
In this cross-validation, hyperparameters and variance components were fixed to estimates derived from analyses of the entire data. In all cases except RR-BLUP, inferences were based on posterior means of the MCMC samples from the posterior distributions. To expedite analyses under the RR-BLUP model, we used ASReml 3.0 to provide analytical BLUP solutions rather than MCMC, noting that, conditionally on the variance components, these MCMC and BLUP inferences are identical. For all eight models, the Pearson correlation between these predictions and the observed records in the validation data was used as the measure of performance of the competing methods.

4.2.9 Priors used for data analyses

For all analyses in this paper, we specified a vaguely informative prior $p(\nu) \propto (\nu+1)^{-2}$ and Gelman's prior $s^2 \sim \chi^{-2}(-1, 0)$, which is also informative when bounded, for conventional BayesA, as we have done previously (Yang and Tempelman 2012). For conventional BayesB, the proper conjugate priors $s^2 \sim Gamma(0.1, 0.1)$ and $\pi \sim p(\pi|\alpha_\pi, \beta_\pi) = Beta(\alpha_\pi = 1, \beta_\pi = 8)$ were used. For IW-BayesC, a conjugate inverted Wishart prior was specified on the scale matrix $\boldsymbol{\Sigma}_g$, with $\boldsymbol{\Sigma}_g \sim IW(v_0 = -3, \mathbf{S}_0 = \mathbf{0}_{2 \times 2})$. For IW-BayesA and IW-BayesB, we specified a prior on $v_g$ such that $p(v_g) \propto (v_g+1)^{-2}$ and a conjugate Wishart prior $\boldsymbol{\Sigma}_g \sim W(v_0, \mathbf{S}_0)$ on the scale matrix $\boldsymbol{\Sigma}_g$. As in conventional BayesB, the same proper prior $\pi \sim Beta(\alpha_\pi = 1, \beta_\pi = 8)$ was specified on $\pi$ in IW-BayesB.

In CD-BayesA and CD-BayesB, we used the same non-informative priors $p(\nu_1) \propto (\nu_1+1)^{-2}$ and $p(\nu_2) \propto (\nu_2+1)^{-2}$ for intercept and conditional slope effects. In addition, Gelman's priors $s_1^2 \sim \chi^{-2}(-1, 0)$ and $s_2^2 \sim \chi^{-2}(-1, 0)$ were used in CD-BayesA, while $s_1^2 \sim Gamma(0.1, 0.1)$ and $s_2^2 \sim Gamma(0.1, 0.1)$ were specified in CD-BayesB. For CD-BayesB, we specified the proper priors $\pi_1 \sim Beta(\alpha_{\pi_1} = 1, \beta_{\pi_1} = 8)$ and $\pi_2 \sim Beta(\alpha_{\pi_2} = 1, \beta_{\pi_2} = 8)$. For both CD-BayesA and CD-BayesB, the mean of the SNP-specific association parameters between intercept and slope, $\mu_\phi$, was specified using $p(\mu_\phi) = N(\tau = 0, \zeta^2 = 1)$, and the variance of the association parameters, $\sigma_\phi^2$, was specified with Gelman's prior $p(\sigma_\phi^2) = \chi^{-2}(-1, 0)$.

4.3 Results

4.3.1 Simulation Study

In Figure 4.1, we compare the accuracies of the seven competing models at each of three environmental covariate values (d = -1.2, 0 and 1.2) for each of the six scenarios described in Table 4.1. Again, these results were based on 20 replicated datasets per scenario. We focus first on the comparisons in Scenarios 1-3, where the simulated genetic architecture seemed more congruent with an IW-BayesA or CD-BayesA like specification (i.e., Mint = 0). When $\rho_{g_1 g_2}$ = 0 in Scenario 1 (Figure 4.1(1)), there were significant (P < 0.05) differences in accuracy of nearly 30 percentage points (i.e., >84% vs. <56% at d = -1.2, and >82% vs. <52% at d = 1.2) in favor of each of the five RN models over the two conventional models (BayesA/BayesB). Such differences were also significant when $\rho_{g_1 g_2}$ = 0.5 (Scenario 2 in Figure 4.1(2)) and when $\rho_{g_1 g_2}$ = 0.8 (Scenario 3 in Figure 4.1(3)), although these differences became increasingly asymmetric, being progressively larger at d = -1.2 and progressively smaller at d = 1.2. For each of these three scenarios, IW-BayesA had the highest accuracy relative to all other models at d = -1.2 and 1.2 (P < 0.05), whereas it did not appear to be different from either of the CD models when d = 0.
Furthermore, the differences between the RN and conventional models were trivial (<2-3%) at d = 0, such that IW-BayesB and IW-BayesC were not even judged to be different from BayesA and BayesB. We further compared accuracies at d = -1.2, 0 and 1.2 in the three other scenarios (4, 5 and 6), where the simulated genetic architecture seemed more congruent with a CD-BayesB like specification (i.e., Mint > 0) based on $\rho_{g_1 g_2}$ = 0.5. In Scenario 4 (Figure 4.1(4)), where Mboth = Mint = 50 and $h^2$ = 0.5, both conventional BayesA and BayesB were inferior (P < 0.05) to all RN models at both d = -1.2 and 1.2, as expected. However, IW-BayesA surprisingly outperformed (P < 0.05) all other RN methods at both d = -1.2 and 1.2, although such differences were small (i.e., <2-3%). Differences between all models were even smaller at d = 0 (i.e., <1-2%). The specifications for Scenario 5 (Figure 4.1(5)) differed from Scenario 4 only in a lower $h^2$ = 0.2. In that case, IW-BayesA also had significantly (P < 0.05) higher accuracy compared to the other RN models at d = -1.2 and d = +1.2, except for CD-BayesB at d = +1.2. At d = 0, CD-BayesB outperformed (P < 0.05) the other models, although all differences were very small. The final Scenario 6 (Figure 4.1(6)) differed from Scenario 4 only in the lower numbers of QTL (i.e., Mboth = Mint = 20). Not only was IW-BayesC significantly lower in accuracy compared to all other RN models at all three values of d, it was even lower in accuracy compared to the conventional (i.e., BayesA and BayesB) models at d = 0. CD-BayesB had the highest accuracy among the five RN methods at each value of d, although that difference was not significant compared to IW-BayesA and IW-BayesB at d = -1.2.

Figure 4.1: Average accuracy of breeding value prediction for seven methods (BayesA, BayesB, IW-BayesA, IW-BayesB, IW-BayesC, CD-BayesA, CD-BayesB) at three environmental covariate values in six scenarios: (1) Mboth = 100, Mint = 0, $\rho_{g_1 g_2}$ = 0, $h^2$ = 0.5; (2) Mboth = 100, Mint = 0, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.5; (3) Mboth = 100, Mint = 0, $\rho_{g_1 g_2}$ = 0.8, $h^2$ = 0.5; (4) Mboth = 50, Mint = 50, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.5; (5) Mboth = 50, Mint = 50, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.2; (6) Mboth = 20, Mint = 20, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.5. Different letters indicate significant differences at P < 0.05.

We attempted to better understand these results by focusing on which of the RN methods performed best for inferring upon the intercept and slope components of the BV. For all six simulation scenarios, we found that CD-BayesB always had the significantly highest accuracy for intercept BV compared to all other RN models (Figure 4.2A), whereas IW-BayesC and IW-BayesB were generally among the worst. For slope BV accuracy, there was no evidence of differences between any of the models in the low heritability Scenario 5 (Figure 4.2B). However, IW-BayesA did outperform the other RN methods in Scenarios 1-4, except in Scenario 3 where CD-BayesB was not found to be inferior to IW-BayesA. In Scenario 6, IW-BayesA only outperformed IW-BayesC. IW-BayesC and IW-BayesB generally had among the lowest slope BV accuracies, although CD-BayesA was poorest in Scenario 4. Hence, the general advantage of CD-BayesB for predicting environment-specific BV appeared to accrue from its greater accuracy in inferring the intercept components of genetic merit, whereas that of IW-BayesA appeared to accrue from inferring the slope components.
Figure 4.2: Accuracy of intercept (A) and slope (B) breeding value prediction for five RN methods (IW-BayesA, IW-BayesB, IW-BayesC, CD-BayesA, CD-BayesB) under six scenarios: (1) Mboth = 100, Mint = 0, $\rho_{g_1 g_2}$ = 0, $h^2$ = 0.5; (2) Mboth = 100, Mint = 0, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.5; (3) Mboth = 100, Mint = 0, $\rho_{g_1 g_2}$ = 0.8, $h^2$ = 0.5; (4) Mboth = 50, Mint = 50, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.5; (5) Mboth = 50, Mint = 50, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.2; (6) Mboth = 20, Mint = 20, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.5. Different letters indicate significant differences at P < 0.05.

4.3.2 MSU Pig Resource Population data

Predictive ability was calculated as the correlation between observed phenotypes $y_{ijkl}$ in the validation dataset and their predicted values based on inferences from the training dataset. The predictive abilities of each of the eight models for each of the 20 cross-validation sets are illustrated in Figure 4.3. The five RR Bayesian methods had ~2.5% higher (P < 0.0001) predictive ability than the two conventional methods and the RR-BLUP model. In addition, no significant differences in predictive ability were found among the five SNP-based RR methods.

Figure 4.3: Predictive ability for eight methods (BayesA, BayesB, RR-BLUP, IW-BayesA, IW-BayesB, IW-BayesC, CD-BayesA, CD-BayesB) from cross-validation analysis using back fat thickness in the MSU Pig Resource Population data. Different letters indicate significant differences at P < 0.0001.

In an attempt to better understand these results, we focused on the non-mixture methods (conventional BayesA, IW-BayesA and CD-BayesA) for estimated intercept BV and slope BV, respectively. Based on estimated intercept BV using the complete final analysis data, Figure 4.4 shows a very high correlation (~0.9996) between the two RR methods IW-BayesA and CD-BayesA. In contrast, lower correlations were determined between conventional BayesA and IW-BayesA (~0.8606) and between conventional BayesA and CD-BayesA (~0.8589). As conventional BayesA does not model genomic effects for the slope on weeks of age, we can only compare IW-BayesA and CD-BayesA for the estimated slope BV. Figure 4.5 demonstrates a high correlation (~0.9845) between IW-BayesA and CD-BayesA for the estimated slope component of BV.

Figure 4.4: Estimated intercept breeding values from conventional BayesA, IW-BayesA and CD-BayesA using the complete final analysis data on back fat thickness in the MSU Pig Resource Population. Reference line is y = x.

Figure 4.5: Estimated slope breeding values from IW-BayesA and CD-BayesA using the complete final analysis data on back fat thickness in the MSU Pig Resource Population. Reference line is y = x.

To further investigate differences in detecting QTL, we computed the absolute values of the posterior means of SNP effects at three different ages (10, 16 and 22 weeks) for these same three methods in Figures 4.6-4.8. In the conventional BayesA model, the SNP effects are by necessity estimated to be the same at any age, and hence only one plot is provided; this plot demonstrated that three chromosomes (2, 6 and 11) had some relatively large SNP peaks. IW-BayesA (Figure 4.7) and CD-BayesA (Figure 4.8) also demonstrated peaks on these and other chromosomes at all three ages. However, as might be anticipated, SNP effects under these RR models tended to increase with increasing age. On chromosome 6, we found two relatively large SNP peaks with IW-BayesA and CD-BayesA, respectively.
With estimated SNP intercept and slope effects from IWBayesA and CD-BayesA, we can further demonstrate regression lines of estimated SNP effects on rescaled weeks of age (-1, -0.5, 0, 0.5, 1) for the two SNP markers (Appendix Figure C2.1). 90 Figure 4.6: Estimated SNP effects from conventional BayesA against marker position using the complete final analyses data on back fat thickness in MSU Pig Resource Population. 91 Figure 4.7: Estimated SNP effects from IW-BayesA against marker position using the complete final analyses data on back fat thickness in MSU Pig Resource Population when A) at week 10, B) at week 16, C) at week 22. 92 Figure 4.8: Estimated SNP effects from CD-BayesA against marker position using the complete final analyses data on back fat thickness in MSU Pig Resource Population when A) at week 10, B) at week 16, C) at week 22. 93 4.4 Discussion Although RR and RN models have been extensively used for modeling GxE in classical polygenic models (CARDOSO and TEMPELMAN 2012), they have not been as extensively adapted for WGP models. Several efforts have been made to infer upon G×E using RN models in QTL mapping and GWAS studies (Lillehammer, Arnyasi et al. 2007; Lillehammer, Odegard et al. 2007; Lillehammer, Goddard et al. 2008; Lillehammer, Hayes et al. 2009). To improve power of QTL detection, Lillehammer et al. (2007) proposed RN models to estimate the QTL intercept and slope effects based on haplotypes with identity by descent (IBD) information. They applied their models to a Norwegian Red cattle population using herd-year mean estimates as environmental covariates (Lillehammer, Arnyasi et al. 2007; Lillehammer, Goddard et al. 2008) which is typical of RN models. Lillehammer et al. (2009) compared their models with and without pedigree information in an Australian dairy bull population to scan one SNP at a time based on genotypes. WGP models (i.e., joint analysis of all SNP) have been advocated to be very important for conducting GWAS (Wang, Misztal et al. 2012); hence, we explored and compared the alternative WGP RR alternatives for that purpose as well. We consider both RR and RN WGP models in this paper since the modeling issues are almost identical, albeit the circumstances are rather different. RN models are intended for those situations where there is typically one measure per animal and environmental effects and animal BV might be characterized by linear functions of these covariates, (DE JONG 1995). RR models are intended for longitudinal data collection in the sense that there are repeated measurements for each animal over time, such as back fat thickness in pigs as analyzed in this paper. Because of this repeated measures dynamic, 94 it becomes even more imperative to model animal effects (i.e. additive genetics and/or permanent environmental effects) in RR genomic models relative to RN models. Alternative strategies for modeling GxE might be pursued based on, for example, factor analysis, when environments cannot be classified by linear functions of covariates (Burgueno, de los Campos et al. 2012). The simplest RN/RR specification that we considered was IW-BayesC, being essentially identical to a classical mixed model specification. Unlike any of the other four RR/RN specifications, IW-BayesC assumes all pairs of SNP-specific intercept and slope effects to be normally distributed whereas each of the other methods specify either t-distributed or null effects. 
Furthermore, being simpler, the only hyperparameters requiring inference in IW-BayesC are the variance components; this is not a trivial advantage given the difficulty of inferring upon degrees of freedom, for example, in heavier-tailed specifications (Habier, Fernando et al. 2011). In fact, IW-BayesC is identical in principle to a classical RR or RN approach for mixed effects modeling. That is, we can set up the mixed model equations from the RR/RN-WGP model [2] as follows:

$\begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{DZ} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{I}g^{11} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{DZ} + \mathbf{I}g^{12} \\ \left(\mathbf{DZ}\right)'\mathbf{R}^{-1}\mathbf{X} & \left(\mathbf{DZ}\right)'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{I}g^{12} & \left(\mathbf{DZ}\right)'\mathbf{R}^{-1}\mathbf{DZ} + \mathbf{I}g^{22} \end{bmatrix} \begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{g}}_1 \\ \hat{\mathbf{g}}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} \\ \left(\mathbf{DZ}\right)'\mathbf{R}^{-1}\mathbf{y} \end{bmatrix}$   [11]

where $\begin{bmatrix} g^{11} & g^{12} \\ g^{12} & g^{22} \end{bmatrix} = \boldsymbol{\Sigma}_g^{-1}$, whereby one could readily base $\sigma_e^2$ and $\boldsymbol{\Sigma}_g$ on their REML estimates. Of course, it might also be necessary to further specify polygenic effects and/or permanent environmental effects in [11], particularly for the RR case. In the strictest sense, this computational approach is not exactly equivalent to IW-BayesC, which is a fully Bayes procedure that takes into account the uncertainty in $\sigma_e^2$ and $\boldsymbol{\Sigma}_g$. Nevertheless, based on experiences with our simulation study (not reported), inferences on variance components, SNP effects and, hence, BV are effectively the same, provided that relatively diffuse priors on $\sigma_e^2$ and $\boldsymbol{\Sigma}_g$ are specified in IW-BayesC. Furthermore, this mixed model/REML approach is computationally far more efficient than having to conduct MCMC for IW-BayesC. This efficiency can be further enhanced when the number of markers well exceeds the number of animals for which genomic BV are estimated. That is, one can design a set of mixed model equations equivalent to [11] but with potentially much smaller dimensionality by directly solving for genomic intercept and slope BV in a genomic animal model rather than explicitly modeling SNP-specific effects (Habier, Fernando et al. 2007; Strandén and Garrick 2009). Extending Strandén and Garrick (2009) further and assuming, say, one record per genotyped animal in a RN-like situation, we can reparameterize Equation [11] as follows:

$\begin{bmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}' & \mathbf{X}'\mathbf{D} \\ \mathbf{X} & \mathbf{I} + \left(\mathbf{ZZ}'\right)^{-1}\sigma_e^2 g^{11} & \mathbf{D} + \left(\mathbf{ZZ}'\right)^{-1}\sigma_e^2 g^{12} \\ \mathbf{D}'\mathbf{X} & \mathbf{D} + \left(\mathbf{ZZ}'\right)^{-1}\sigma_e^2 g^{12} & \mathbf{D}'\mathbf{D} + \left(\mathbf{ZZ}'\right)^{-1}\sigma_e^2 g^{22} \end{bmatrix} \begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{u}}_1 \\ \hat{\mathbf{u}}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{y} \\ \mathbf{D}'\mathbf{y} \end{bmatrix}$   [12]

where $\mathbf{u}_1 = \mathbf{Zg}_1$ and $\mathbf{u}_2 = \mathbf{Zg}_2$. Extensions by Misztal et al. (Misztal, Legarra et al. 2009) to RR and RN models, whereby records from genotyped and ungenotyped animals are combined into the analysis, would be relatively straightforward as well.

One objective of this paper was to develop and compare five alternative RR/RN models against each other and against the two conventional WGP models BayesA and BayesB. In the simulation study, we first investigated the effect of an "average" genetic correlation ($\rho_{g_1 g_2}$) between intercept and slope. In Scenarios 1-3 of the simulation study, each mimicking a CD-BayesA-like or even IW-BayesA-like process (because ν1 = ν2), we considered three different levels of $\rho_{g_1 g_2}$ representing low, moderate and high positive genetic correlations between traits. We compared the accuracy of genomic prediction in environments characterized by low, average and high values of the environmental covariate. We found significantly higher accuracies for the five RR/RN methods compared to the two conventional WGP methods at the extreme environments (d = -1.2 and d = 1.2) under all specifications of $\rho_{g_1 g_2}$.
We found the difference in accuracies between the RN versus the conventional models became greater with increasing ρ g1g2 at d = -1.2 whereas, curiously, we found the converse trends at d = 1.2. This result might be due to the fact that the intercept is defined at d = 0. Hence, positive genetic correlations between intercept and slope would build positive associations between genomic evaluations at d = 0 with those genomic evaluations based on d ≥ 0, but negative associations between genomic evaluations at d = 0 and those genomic evaluations based on d appreciably less than 0; i.e, where SNP or animal-specific reaction norms start to “crisscross” each other. In other words, if we had specified ρ g1g2 to be negative, we would have found the opposite trends. We found that IW-BayesA tended to have significantly greater accuracies in Scenarios 1-3 than all other RN models at d=1.2 and d=-1.2. This was initially surprising to us; however, we would note two considerations. Firstly, the degrees of freedom specification were the same for QTL effects for both intercept and in slope, i.e., ν1 = ν2. Hence, CD-BayesA might confer little or no advantage to IW-BayesA then because of the greater parsimony of a single degrees of freedom specification of the latter. Secondly, 97 with data simulated based on LD, one might anticipate that the assumption of independence between effects of adjacent SNP markers is somewhat distorted (YANG and TEMPELMAN 2012), although independence is specified for every model in this study. Subsequently it may be rather difficult to predict the relative performance of WGP models under LD, since the true model cannot be specified parametrically under LD, even when QTL effects are generated from known distributions. In fact, we conducted a separate simulation study (results not shown) whereby the data generation strategy exactly match the assumptions of either of the two models, CD-BayesA and IW-BayesA; but based on an assumption of linkage equilibrium between SNP markers. In those comparisons, CD-BayesA did outperform IW-BayesA when the data generation model was based on a CD-BayesA model and vice versa. Similar conclusions have been recently drawn by Wimmer et al. (2013) who found that the presence of high levels of LD, high levels of complexity ( (Mboth + Mint)/m) of genetic architecture, and low levels of determinedness (n/m) will tend to mute differences in performance between various Bayesian alphabet models. At an average level of performance (d=0), there appeared to be very little difference between any of the models, including the two conventional models not based on any RN specification whatsoever. This was not surprising since one would expect that conventional WGP models would, by default, predict to an average environment. Scenarios 4-6 in the simulation study were intended to mimic a CD-BayesB like process whereby only a fraction of the QTL effects having general performance effects (i.e. intercept) also showed environmental sensitivity (i.e., non-zero slope effects). Scenario 4 (heritability = 0.5) and Scenario 5 (heritability = 0.2) were the same in that 98 half of the 100 QTL were environmentally sensitive. As expected, all RN models had lower accuracy in Scenario 5 compared to Scenario 4 because of the lower heritability, although somewhat surprisingly IW-BayesA generally maintained its advantage over all models. 
Realizing that the total number of QTL have also been known to influence prediction accuracy comparisons between conventional BayesA and BayesB (MEUWISSEN and GODDARD 2010), we considered Scenario 6 which involved a total of 40 QTL, Mint = 20 environmentally robust QTL and Mboth = 20 environmentally sensitive QTL, 2/5 of what was specified in Scenario 4 with everything else being the same. As anticipated, CD-BayesB finally started to emerge as the most accurate of the 4 RN methods particularly at d = 0 and d = 1.2. Again, these results are in agreement with Wimmer et al (2013) who determined that variable selection methods (like BayesB or CD-BayesB here), perform best under genetic architectures with low complexities. Conversely, it was this scenario where the performance of IW-BayesC started to plummet, even being inferior to conventional BayesA/BayesB analyses at d = 0. Of course, we should be quick to note that QTL effects were simulated from heavy-tailed t-distributions, perhaps thereby stacking the odds against IW-BayesC. A particularly odd result was that the comparisons on accuracies did not necessarily match up with estimated accuracies of predicting their components, i.e. intercept and slope BV. That is, CD-BayesB was among the best for inferring upon intercept BV in Figure 4.2A, whereas IW-BayesA was typically among the best for inferring upon slope BV. Nevertheless, this does help explain why IW-BayesA was often 99 the best for inferring upon genomic BV at d = -1.2 and d = 1.2, whereas CD-BayesB generally dominated at d = 0. Our simulation study was based on arbitrary specifications of genetic architecture based on (Mboth + Mint) QTL randomly located on a 1M chromosome; based on arguments provided by Meuwissen and Goddard (2010), one might readily extrapolate these simulation results to the case of nchr*(Mboth + Mint) QTL for a nchr M genome based on nchr chromosomes. We realize that other determinants such as marker density can also influence the comparisons among the five RN WGP models. Furthermore, if we had specified even greater Mboth + Mint QTL, the more complex genetic architecture might then more likely reflect the IW-BayesC assumptions ( π g = 1 , ν g → ∞ such that Σ= Σ g∀j ). j Given our simulation results then, it perhaps was not too surprising that we did not observe any meaningful differences between the various models with an application to data from a pig resource population. Firstly, the model applied was a RR, rather than a RN model, implying that there is greater phenotypic information provided by the repeated records in a RR context thereby potentially muting any real differences between the various candidate models. Furthermore, the genetic architecture of the trait analyzed was presumably far more complex and the level of determinedness far less than anything considered in the simulation study. Based on an analysis using microsatellite markers in the MSU pig resource population, Choi et al. (2010) found highly significant QTL for back fat thickness at week of age 10, 13, 16, 19 and 22 on chromosome 6 using a QTL mapping approach without considering G×E. Our results based on a RR WGP analysis using a low density 100 SNP chip also indicated potential QTL on chromosome 6. Nevertheless, the RR WGP specifications allowed us to explicitly model these potential QTL effects as a function of age. 4.5 Conclusions Five RR/RN methods have been developed in this paper under the frame work of WGP. 
Based on a RN simulation study and a RR data analysis in pigs, RR/RN WGP models provide greater accuracies in genomic evaluations compared to more conventional WGP models. We believe that it’s important to account for SNP specific intercept and slope effects in RN or RR data situations where SNP genotypes are available. Nevertheless, differences in predictive performance between the various RR/RN WGP models were not overwhelming such that simpler specifications such as IW-BayesA may be suitable for analyses that involve high degrees of genetic complexity or low levels of determinedness as previously mentioned by Wimmer et al (2013). Conversely, based on our simulation results, we anticipate that CD-BayesB might show greater promise when marker density is large relative to the number of QTL; i.e., low degree of complexity. It is important that efficient software and/or algorithms be developed for these models in order to allow for meaningful comparisons in these situations. 101 Chapter 5 Exploring alternative specifications for bivariate trait whole genome prediction models 5.1 Introduction With the advent of genotyping and sequencing technologies, whole genome prediction (WGP) has become commonly used for genetically selecting animals and plants for economically important traits (de los Campos, Hickey et al. 2013). Numerous approaches, including non-parametric methods (Gianola, Wu et al. 2010), Bayesian parametric “alphabet” methods (Meuwissen, Hayes et al. 2001; Gianola, de los Campos et al. 2009; Habier, Fernando et al. 2011), and generalized expectation-maximization methods (KARKKAINEN and SILLANPAA 2012) for single trait analyses have been developed. There may be, however, other untapped opportunities to improve prediction accuracy in WGP. It is well known, for example, that many economically important traits are genetically correlated. Multiple trait analyses have been recently used to account for correlations among traits due to specific genes in genome wide association studies (ZHU and ZHANG 2009), including for differential mapping of pleiotropic versus non-pleiotropic QTLs (Banerjee, Yandell et al. 2008). A large number of genetic evaluation methods have been developed and applied to jointly analyze correlated traits in livestock (Gianola and Sorensen 2004; Banerjee, Yandell et al. 2008). Some of these methods involve independent analyses on sets of transformed variables using techniques based on, for example, factor analysis, principal component analysis, canonical analysis and cluster analysis (Weller, Wiggans et al. 1996; Musani, Zhang et al. 2006; de los Campos and Gianola 2007; Vichi and Saporta 2009). 102 For quantitative genetic analysis, these methods generally require a two-step approach of reducing either number of traits and/or number of genetic effects. Another approach to multiple trait modeling is to model the linear regression relationships among traits in a multilayer system, namely structural equation models (SEM) or path coefficient models (GIANOLA and SORENSEN 2004) although this might seem rather complex with large numbers of SNPs. Banerjee et al. (2008) used seeming unrelated regression (SUR) to identify pleiotropic QTL for multiple traits. This method allows each trait to have a separate set of QTL or trait specific QTL and facilitates a computational efficient sampling algorithm. However, their method models trait correlations due to residuals rather than due to QTL, thereby providing no information on genetic correlation between traits. 
In order to further improve WGP accuracy, several efforts have been made to develop Bayesian approaches in multiple trait models (CALUS and VEERKAMP 2011; JIA and JANNINK 2012). Calus and Veerkamp (2011) demonstrated that, for traits having a high genetic correlation with each other, multiple trait WGP model analyses lead to higher WGP accuracies compared to single trait analyses, particularly for the lower heritability trait. Among multiple trait WGP models investigated by Calus and Veerkamp (2011), BayesSSVS, a variable selection method with a spike and slab prior on the SNP effects, outperformed other models that assumed a normal density on all SNP. Jia and Jannink (2012) further confirmed that the advantage of multiple trait WGP models over single trait counterparts was greatly influenced by several factors, i.e. heritabilities and genetic correlations between traits as well as the number of QTLs. 103 BayesA and BayesB are popular “Bayesian alphabet” models used for single trait WGP. BayesA specifies a scaled-t prior on SNP effects and is a special case of BayesB which, similar to BayesSSVS, specifies a mixture prior of point mass at zero and scaled-t density (Meuwissen, Hayes et al. 2001). These specifications often have higher WGP prediction accuracies compared to procedures based on Gaussian distribution assumptions (e.g. ridge regression or GBLUP) if SNP effects deviate substantially from normality. Calus and Veerkamp (2011) recently extended BayesA for use in multiple trait analyses and determined similar advantages over multiple trait GBLUP predictions. By estimating SNP-specific pleiotropic effects for multiple traits, we may infer upon the most important pleiotropic regions in the genome given the relative locations of SNP markers (van Binsbergen, Veerkamp et al. 2012). However, the assumption of a multiple trait BayesA model, derived from conjugate inverted Wishart (IW) prior densities on the SNP specific variance-covariance matrices (VCV), might be potentially inflexible since the uncertainty in all elements of a VCV is based on a single degrees of freedom parameter (MUNILLA and CANTET 2012). An alternative parameterization on VCV for random and residual effects was proposed by Bello et al. (2010) who suggested that the square root free Cholesky decomposition (CD) of the VCV in bivariate mixed models might allow greater flexibility as uncertainty can be differentially expressed on each element of a VCV using such a parameterization. In this study, our objectives were: 1) To reaffirm the greater accuracies of prediction provided by bivariate trait models relative to single trait conventional WGP approaches and 2) To assess whether there may be greater flexibility, and hence greater 104 WGP accuracy, using CD-based parameterizations compared to IW based specifications on SNP-specific variance-covariance matrices. 5.2 Methods and Materials 5.2.1 Whole genome prediction models In a bivariate trait WGP model, SNP substitution effects are estimated for two traits simultaneously. 
The general bivariate trait WGP model can be denoted as follows:

$y_{ij} = \mathbf{x}_{ij}'\boldsymbol{\beta}_j + \mathbf{z}_i'\mathbf{g}_j + e_{ij}$,   [1]

where $y_{ij}$ is the phenotype record for the ith animal on the jth trait (i = 1, 2, …, n; j = 1, 2); $\boldsymbol{\beta}_j$ is the vector of fixed effects on trait j; $\mathbf{x}_{ij}'$ is the incidence row vector connecting elements of $\boldsymbol{\beta}_j$ to animal i; $\mathbf{z}_i' = \left[ z_{i1}\ z_{i2}\ z_{i3} \cdots z_{im} \right]$ is the vector of genotypes coded as 0, 1 or 2 copies of the minor allele at each SNP for animal i; $\mathbf{g}_j = \left\{ g_{jk} \right\}_{k=1}^{m}$ is the vector of SNP substitution effects on trait j; and $e_{ij}$ is the random residual for the ith animal on the jth trait. We can rewrite Equation [1] using matrix notation as:

$\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{X}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2 \end{bmatrix}\begin{bmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{Z} & \mathbf{0} \\ \mathbf{0} & \mathbf{Z} \end{bmatrix}\begin{bmatrix} \mathbf{g}_1 \\ \mathbf{g}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{bmatrix}$,   [2]

where $\mathbf{y}_j = \left\{ y_{ij} \right\}_{i=1}^{n}$, $\mathbf{X}_j = \left\{ \mathbf{x}_{ij}' \right\}_{i=1}^{n}$, and $\mathbf{Z} = \left\{ \mathbf{z}_i' \right\}_{i=1}^{n}$. The animals' genomic merit for the two traits can be subsequently represented as $\mathbf{u}_1 = \mathbf{Zg}_1$ and $\mathbf{u}_2 = \mathbf{Zg}_2$, respectively. For the various bivariate trait WGP models investigated, we assumed that the pairs of residuals on animal i, i.e., $\mathbf{e}_{i.} = \left[ e_{i1}\ e_{i2} \right]'$, i = 1, 2, …, n, follow independent bivariate normal densities with a null mean vector and a common variance-covariance matrix $\boldsymbol{\Sigma}_e = \begin{bmatrix} \sigma_{e_1}^2 & \sigma_{e_1 e_2} \\ \sigma_{e_1 e_2} & \sigma_{e_2}^2 \end{bmatrix}$. Similarly, effects of SNP k on the two traits follow independent bivariate normal densities with a null mean vector and a common variance-covariance matrix $\boldsymbol{\Sigma}_g = \begin{bmatrix} \sigma_{g_1}^2 & \sigma_{g_1 g_2} \\ \sigma_{g_1 g_2} & \sigma_{g_2}^2 \end{bmatrix}$. Diffuse proper Gaussian or flat priors are typically specified on $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ (Sorensen and Gianola 2002). For the residual variance-covariance matrix $\boldsymbol{\Sigma}_e$, we might typically specify a conjugate inverted Wishart prior with degrees of freedom $v_0$ and scale matrix $\boldsymbol{\Sigma}_0$.

5.2.2 Univariate BayesA and BayesB (uBayesA\uBayesB)

We re-label the conventional single trait BayesA and BayesB models as uBayesA and uBayesB, respectively, to emphasize their univariate analysis. We infer upon key hyperparameters in these models using prior specifications and strategies previously outlined by Yang and Tempelman (2012).

5.2.3 Bivariate ridge regression (bGBLUP)

Outside of some strategies for rescaling, we specify the realized relationship matrix based on the unscaled genotype matrix for SNPs, derived as G = ZZ′, in a bivariate mixed effects model that we label bivariate genomic BLUP or bGBLUP. In the bGBLUP model, we specified multivariate normal distributions having null means for each of $\mathbf{u} = \left[ \mathbf{u}_1'\ \mathbf{u}_2' \right]'$ and $\mathbf{e} = \left[ \mathbf{e}_1'\ \mathbf{e}_2' \right]'$ such that $var(\mathbf{u}) = \left(\mathbf{ZZ}'\right) \otimes \boldsymbol{\Sigma}_g$ and $var(\mathbf{e}) = \mathbf{I}_{2\times2} \otimes \boldsymbol{\Sigma}_e$. Based on these specifications, we use ASReml 3.0 (Gilmour, Gogel et al. 2009) to provide REML estimates of $\boldsymbol{\Sigma}_g$ and $\boldsymbol{\Sigma}_e$ in order to compute the BLUP $\hat{\mathbf{u}}$ of $\mathbf{u}$ and hence $\hat{\mathbf{g}}$ of $\mathbf{g}$ as necessary.

5.2.4 Bivariate Student-t (IWBayesA)

A convenient and previously used extension for bivariate trait WGP is to apply a conjugate inverted Wishart prior on heterogeneous SNP-specific variance-covariance matrices. This specification represents a multivariate extension of BayesA (Calus and Veerkamp 2011), which we label IWBayesA. For the joint effects of SNP k on the two traits, we specify a bivariate normal density conditionally as follows:

$\mathbf{g}_{.k} \sim N\left( \mathbf{0}_{2\times1},\ \mathbf{G}_k = \begin{bmatrix} \sigma_{g_{1k}}^2 & \sigma_{g_{1k} g_{2k}} \\ \sigma_{g_{1k} g_{2k}} & \sigma_{g_{2k}}^2 \end{bmatrix} \right)$   [4]

where $\mathbf{G}_k$ is the SNP-specific variance-covariance matrix for the two traits and is regarded as a random draw from a conjugate inverted Wishart prior with degrees of freedom $v_g$ and scale matrix $\boldsymbol{\Sigma}_g = \begin{bmatrix} \sigma_{g_1}^2 & \sigma_{g_1 g_2} \\ \sigma_{g_1 g_2} & \sigma_{g_2}^2 \end{bmatrix}$.
For a fully hierarchical Bayesian model as developed in Yang and Tempelman (2012), we inferred upon hyperparameters after specifying a prior $p(v_g)$ on $v_g$, and also a conjugate Wishart prior $\boldsymbol{\Sigma}_g \sim W(v_0, \mathbf{S}_0)$ on $\boldsymbol{\Sigma}_g$. Note that the uncertainty of IWBayesA is controlled by only one scalar, $v_g$ (Munilla and Cantet 2012). Furthermore, IWBayesA assumes that every single SNP is pleiotropic. Conceptually, IWBayesA is not much different from the model developed (with the same label) for reaction norm modeling in Chapter 4.

5.2.5 Cholesky decomposition specifications (CDBayesA\CDBayesB)

In order to address the potential inflexibility of IWBayesA, we developed an alternative approach based on the square root free Cholesky decomposition (CD) of $\mathbf{G}_k$. We have previously shown that the CD parameterization can provide greater flexibility for modeling variance-covariance matrices (Bello, Steibel et al. 2010). Based on a particular order for the two traits, we can write the SNP effects on the second trait, $\mathbf{g}_2$, as a linear regression on the SNP effects on the first trait, $\mathbf{g}_1$:

$\mathbf{g}_2 = \boldsymbol{\Psi}\mathbf{g}_1 + \mathbf{g}_{2|1}$   [5]

Hence we can re-write the general bivariate trait WGP model [2] as:

$\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{X}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2 \end{bmatrix}\begin{bmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{Z} & \mathbf{0} \\ \mathbf{0} & \mathbf{Z} \end{bmatrix}\begin{bmatrix} \mathbf{g}_1 \\ \boldsymbol{\Psi}\mathbf{g}_1 + \mathbf{g}_{2|1} \end{bmatrix} + \begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{bmatrix}$   [6]

where $\mathbf{g}_{2|1} = \left\{ g_{2|1,k} \right\}_{k=1}^{m}$ is the vector of SNP effects on the second trait conditional on the first trait, and $\boldsymbol{\Psi} = diag\left\{ \phi_k \right\}_{k=1}^{m}$ is a diagonal matrix of SNP-specific association effects between the SNP effects on the two traits.

Suppose we specify $g_{1k} \sim N(0, \sigma_{g_{1k}}^2)\;\forall k$ with $\sigma_{g_{1k}}^2 \sim \chi^{-2}(v_1, v_1 s_1^2)\;\forall k$ for the SNP effects and their respective variances on Trait 1. Similarly, we specify $g_{2|1,k} \sim N(0, \sigma_{g_{2|1,k}}^2)\;\forall k$ with $\sigma_{g_{2|1,k}}^2 \sim \chi^{-2}(v_{2|1}, v_{2|1} s_{2|1}^2)\;\forall k$ for the SNP effects and variances on Trait 2 conditional on Trait 1. We label this model CDBayesA, given its CD-based multivariate extension of BayesA; conceptually, it is very similar to the model of the same name used in Chapter 4 for reaction norm modeling.

For the SNP-specific association effects between the two traits, we specify independent normal priors $\phi_k \sim N(\mu_\phi, \sigma_\phi^2)\;\forall k$. Key hyperparameters, namely the degrees of freedom ($v_1$, $v_{2|1}$) and scale ($s_1^2$, $s_{2|1}^2$) parameters, can be inferred upon in CDBayesA using prior specifications similar to those in the conventional uBayesA model. Here, the SNP-specific variance-covariance matrices $\mathbf{G}_k$ are still specified very generally by three parameters, as with IWBayesA, but expressed by an alternative parameterization; i.e.,

$\mathbf{G}_k = \begin{bmatrix} \sigma_{g_{1k}}^2 & \sigma_{g_{1k}}^2 \phi_k \\ \sigma_{g_{1k}}^2 \phi_k & \sigma_{g_{1k}}^2 \phi_k^2 + \sigma_{g_{2|1,k}}^2 \end{bmatrix}$

However, unlike IWBayesA, whereby the uncertainty on $\mathbf{G}_k$ is essentially controlled by one degrees of freedom parameter, the uncertainty on $\mathbf{G}_k$ in CDBayesA is controlled by three such parameters: the two different degrees of freedom terms $v_1$ and $v_{2|1}$ as well as the variance component $\sigma_\phi^2$. Nevertheless, CDBayesA, like IWBayesA, assumes that every SNP has a pleiotropic effect.

In an attempt to provide even greater flexibility than CDBayesA, i.e., to allow not only pleiotropic effects but also non-pleiotropic and/or potentially null effects for each SNP on both traits, we developed a variable selection approach analogous to a BayesB type of specification, which we naturally label CDBayesB. Let $\sigma_{g_{1k}}^2$ have a mixture prior of point mass at zero with probability $(1-\pi_1)$ or a random draw from $\chi^{-2}(v_1, v_1 s_1^2)$ with probability $\pi_1$, for k = 1, 2, …, m.
Similarly, $\sigma_{g_{2|1,k}}^2$ has a mixture prior of point mass at zero with probability $(1-\pi_{2|1})$ or a random draw from $\chi^{-2}(v_{2|1}, v_{2|1} s_{2|1}^2)$ with probability $\pi_{2|1}$, for k = 1, 2, …, m. For the association effects $\phi_k$ between the two traits at each SNP k = 1, 2, …, m, we specify a mixture prior of point mass at zero with non-association probability $(1-\pi_\phi)$ or a random draw from $N(\mu_\phi, \sigma_\phi^2)$ with association probability $\pi_\phi$ (a short numerical sketch of drawing from these mixture priors is given below). Hence, in CDBayesB, we could infer upon SNP effects that are non-zero and trait-specific (i.e., non-pleiotropic). For Trait 1, this would occur when $g_{1k} \neq 0$ with $\phi_k = 0$ and $g_{2|1,k} = 0$, whereas for Trait 2 this would entail $g_{1k} = 0$ and $g_{2|1,k} \neq 0$ regardless of the value of $\phi_k$. Pleiotropic effects will be inferred if $g_{1k} \neq 0$ and $\phi_k \neq 0$ regardless of the value of $g_{2|1,k}$, although a value of $g_{2|1,k} = 0$ would then imply a situation of "perfect" pleiotropy between the two traits (i.e., a SNP-specific genetic correlation equal to ±1). Using prior specifications similar to those in CDBayesA, we could infer upon the key hyperparameters, i.e., the degrees of freedom ($v_1$, $v_{2|1}$) and scale parameters ($s_1^2$, $s_{2|1}^2$). Furthermore, we could specify informative or diffuse priors $p(\pi_1)$, $p(\pi_{2|1})$, $p(\pi_\phi)$, $p(\mu_\phi)$ and $p(\sigma_\phi^2)$ on $\pi_1$, $\pi_{2|1}$, $\pi_\phi$, $\mu_\phi$ and $\sigma_\phi^2$, respectively. This model is somewhat analogous to the same-named model for reaction norms in Chapter 4, except that here we specify three rather than two mixture distributions to provide even greater flexibility.

5.2.6 Bayesian inference

For fully Bayesian inference using Markov chain Monte Carlo (MCMC) methods, we require strategies for drawing random samples from the full conditional densities (FCD) of each unknown parameter (or blocks thereof) under all models. FCD for uBayesA and uBayesB have been illustrated in our previous work (Yang and Tempelman 2012). In this paper, we present the FCD for the three bivariate trait WGP models (IWBayesA, CDBayesA, CDBayesB) in Appendix D1. The fourth bivariate WGP model (bGBLUP) was analyzed using classical REML and BLUP for computational expedience, although we determined that these inferences were not practically different from MCMC-based inferences (results not shown).

5.2.7 Simulation studies

We designed a naive small-scale simulation study involving independent markers (i.e., not in LD) using a response surface design (Table D2.1 in Appendix D2) based on five factors that we thought might be particularly important for influencing WGP accuracies on the lower heritability trait between two rather different models: IWBayesA versus CDBayesB. These five factors were the number (n) of animals, the number (M1) of QTL controlling Trait 1 ($h^2$ = 0.8), the number (M2) of QTL controlling Trait 2 ($h^2$ = 0.1), the number (M12) of QTL pleiotropically controlling both traits, and the variability ($\sigma_{QTL,\phi}^2$) of the associations between the two traits across QTL. Among the five factors considered, only M12 had a significant interaction with model; i.e., the difference between IWBayesA and CDBayesB in WGP accuracy on Trait 2 depended on M12 (P < 0.0001), as further noted in Table D2.2 of Appendix D2. We used this knowledge to design a more focused LD simulation study to compare the WGP accuracies for bivariate trait analyses involving six different models (uBayesA, uBayesB, bGBLUP, IWBayesA, CDBayesA, CDBayesB).
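As referenced above, the three-part mixture prior that defines CDBayesB can be sketched numerically in base R. The hyperparameter values below are illustrative placeholders (roughly matching the Beta(1, 8)-type priors used elsewhere), not estimates from any analysis in this dissertation:

```r
# Illustrative draws from the CDBayesB prior for m SNPs: spike-and-slab mixtures on
# sigma2_g1k and sigma2_g2|1k, and on the association parameters phi_k.
set.seed(5)
m <- 2000
pi1 <- 0.1; pi21 <- 0.1; piphi <- 0.1            # "slab" probabilities (illustrative)
nu1 <- 5; s1sq <- 2; nu21 <- 5; s21sq <- 2       # scaled inverse chi-square hyperparameters
mu_phi <- 0.8; s2_phi <- 0.05                    # normal component for phi_k
sigma2_g1  <- ifelse(runif(m) < pi1,  nu1  * s1sq  / rchisq(m, nu1),  0)
sigma2_g21 <- ifelse(runif(m) < pi21, nu21 * s21sq / rchisq(m, nu21), 0)
phi        <- ifelse(runif(m) < piphi, rnorm(m, mu_phi, sqrt(s2_phi)), 0)
g1  <- rnorm(m, 0, sqrt(sigma2_g1))              # zero whenever sigma2_g1k = 0
g21 <- rnorm(m, 0, sqrt(sigma2_g21))             # zero whenever sigma2_g2|1k = 0
g2  <- phi * g1 + g21                            # Equation [5] applied SNP by SNP
# A SNP is inferred as pleiotropic when g1k != 0 and phi_k != 0; e.g.:
table(pleiotropic = g1 != 0 & phi != 0)
```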
Two populations were targeted, differing only in the number of pleiotropic QTL influencing both traits (M12 = 10 versus M12 = 30), with all other specifications being the same, as indicated in Table 5.1.

Table 5.1: Summary of the two populations compared in the LD simulation study.

Factors                                              Population 1   Population 2
Constant:
  Heritability of Trait 1                            0.5            0.5
  Heritability of Trait 2                            0.1            0.1
  Residual covariance between the two traits         0              0
  Number of SNPs                                     2000           2000
  Number of animals                                  500            500
  Mean of association parameters (μφ)                0.8            0.8
  Variance of association parameters (σ²φ)           0.05           0.05
  Number of QTLs for Trait 1 (M1)                    10             10
  Number of QTLs for Trait 2 (M2)                    10             10
Of interest:
  Number of QTLs for both traits (M12)               10             30

For each of the two populations or scenarios, we generated 20 replicates based on a constant population size of 100 for 6000 generations of random mating using the hypred package in R (Technow 2012). For each replicate, we defined the genome as one chromosome of length 1 Morgan having 20,000 SNP loci; the number of recombinations for each meiosis event was drawn from a Poisson(1) distribution, with crossing-over locations drawn from a uniform distribution. In the base population, all loci were monomorphic, with polymorphisms created by a recurrent mutation rate of 2.5 × 10-4 per locus per generation for each of the first 6000 generations. After 6000 generations, two additional generations (6001 and 6002) were created with expanded population sizes of 500 animals each. In Generation 6001, we excluded SNP with a minor allele frequency (MAF) < 0.1. We defined genotype dosages (i.e., the genotype matrix) in Generations 6001 and 6002 as counts (0, 1, 2) of the minor allele for all remaining SNPs. We randomly selected 2000 SNPs plus an additional M1 + M2 + M12 SNPs to be QTLs in Generation 6001, with M1 = 10, M2 = 10 and M12 = 10 in Population 1, and M1 = 10, M2 = 10 and M12 = 30 in Population 2. We generated QTL effects $\left\{ g_{QTL,1j} \right\}_{j=1}^{M_1+M_{12}}$ for Trait 1 from a reflected gamma distribution with shape = 0.4 and scale = 2.24. QTL effects $\left\{ g_{QTL,2|1,j} \right\}_{j=M_1+1}^{M_1+M_{12}+M_2}$ for Trait 2, conditional on Trait 1, were also generated from a reflected gamma distribution, with shape = 0.4 and scale = 1.34. The association variables between Trait 1 and Trait 2, $\left\{ \phi_{QTL,j} \right\}_{j=M_1+1}^{M_1+M_{12}}$, were simulated from N($\mu_{QTL,\phi}$ = 0.8, $\sigma_{QTL,\phi}^2$ = 0.05). The effect of QTL j on Trait 2 was thereby determined as $g_{QTL,2j} = \phi_{QTL,j}\, g_{QTL,1j} + g_{QTL,2|1,j}$; j = 1, 2, …, M1 + M12 + M2, consistent with Equation [5], noting that $\phi_{QTL,j} = 0$ and $g_{QTL,2|1,j} = 0$ for the M1 QTL affecting Trait 1 only, whereas $g_{QTL,1j} = 0$ and $\phi_{QTL,j} = 0$ for the M2 QTL affecting Trait 2 only. Hence, pleiotropic QTL effects were generated from a complex bivariate distribution.
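A minimal base-R sketch of the "reflected gamma" draws used for the QTL effects (gamma-distributed magnitudes with a random sign) is given below; the function name and the Population 2 QTL counts used as arguments are illustrative only:

```r
# Sketch: reflected gamma QTL effects, i.e., gamma magnitudes with a random +/- sign.
set.seed(6)
r_reflected_gamma <- function(n, shape, scale) {
  sample(c(-1, 1), n, replace = TRUE) * rgamma(n, shape = shape, scale = scale)
}
# Population 2 settings as an example: M1 = 10, M12 = 30, M2 = 10
g_qtl1  <- r_reflected_gamma(10 + 30, shape = 0.4, scale = 2.24)  # Trait 1 effects (M1 + M12 QTL)
g_qtl21 <- r_reflected_gamma(30 + 10, shape = 0.4, scale = 1.34)  # Trait 2 | Trait 1 (M12 + M2 QTL)
phi_qtl <- rnorm(30, mean = 0.8, sd = sqrt(0.05))                 # associations for the M12 pleiotropic QTL
```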
If we define $\mathbf{Z}_{QTL,1}$ as the subset of the n × M1 SNP genotypes in Z designated to be QTL for Trait 1 only and $\mathbf{Z}_{QTL,12}$ as the subset of the n × M12 SNP genotypes in Z designated to be QTL for both traits, the true breeding values $\left\{ TBV_{1i} \right\}_{i=1}^{n}$ for Trait 1 in each of Generations 6001 and 6002 were generated using

$\left\{ TBV_{1i} \right\}_{i=1}^{n} = \mathbf{Z}_{QTL,12} \left\{ g_{QTL,1j} \right\}_{j=1}^{M_{12}} + \mathbf{Z}_{QTL,1} \left\{ g_{QTL,1j} \right\}_{j=M_{12}+1}^{M_{12}+M_1}$   [7a]

Similarly, if we define $\mathbf{Z}_{QTL,2}$ as the subset of the n × M2 SNP genotypes in Z designated to be QTL for Trait 2 only, the true breeding values $\left\{ TBV_{2i} \right\}_{i=1}^{n}$ for Trait 2 in each of Generations 6001 and 6002 were generated using

$\left\{ TBV_{2i} \right\}_{i=1}^{n} = \mathbf{Z}_{QTL,12} \left\{ g_{QTL,2j} \right\}_{j=1}^{M_{12}} + \mathbf{Z}_{QTL,2} \left\{ g_{QTL,2j} \right\}_{j=M_{12}+1}^{M_{12}+M_2}$   [7b]

Based on heritabilities $h_1^2$ = 0.5 and $h_2^2$ = 0.1 for Traits 1 and 2, respectively, we generated the pair of residuals on each animal from a bivariate normal distribution with null mean and variance-covariance matrix

$\boldsymbol{\Sigma}_e = \begin{bmatrix} var\left( \left\{ TBV_{1i} \right\}_{i=1}^{n} \right)\left( 1 - h_1^2 \right)/h_1^2 & 0 \\ 0 & var\left( \left\{ TBV_{2i} \right\}_{i=1}^{n} \right)\left( 1 - h_2^2 \right)/h_2^2 \end{bmatrix}$   [8]

In other words, residuals were specified to be uncorrelated, since it has been determined previously by Jia and Jannink (2012) that the nature of the residual correlation between two traits is inconsequential to the accuracy of WGP in bivariate models. Phenotypic records for the two traits were then generated as $y_{1i} = TBV_{1i} + e_{1i}$ and $y_{2i} = TBV_{2i} + e_{2i}$. Prediction accuracies of breeding values for the two traits in Generation 6002 were defined as the correlation between $\left\{ TBV_{1i} \right\}_{i=1}^{n}$ and $\left\{ EBV_{1i} \right\}_{i=1}^{n}$ for Trait 1, and the correlation between $\left\{ TBV_{2i} \right\}_{i=1}^{n}$ and $\left\{ EBV_{2i} \right\}_{i=1}^{n}$ for Trait 2. The factor of interest here, M12, influences the overall genetic correlation ($\rho_{g_1 g_2}$) between the two traits, which we determined as the correlation between $\left\{ TBV_{1i} \right\}_{i=1}^{n}$ and $\left\{ TBV_{2i} \right\}_{i=1}^{n}$ in Generation 6001.

5.2.8 Pine data analyses

Resende et al. (2012) provide a data set of loblolly pine phenotypes and genotypes for demonstration of WGP methods, which has been previously used by Jia and Jannink (2012). The original data set had genotypes on 4854 SNPs and 926 individuals. After we excluded SNPs with MAF < 0.05 and with P < 10-4 in the HWE test, 2684 SNPs remained. Following de los Campos et al. (2013), we standardized the genotype matrix as $\left( z_{ij} - 2p_j \right)/\sqrt{2p_j\left( 1 - p_j \right)}$, where $z_{ij}$ is the genotype of the jth SNP (minor allele dosage of 0, 1 or 2) on the ith individual and $p_j$ is the allele frequency of one allele for the jth SNP. Although raw phenotypes were not publicly available, the authors provided deregressed EBVs for 17 traits. Following Jia and Jannink (2012), we fitted deregressed EBVs as response variables to compare the various WGP models. We selected two disease resistance traits, i.e., rust gall volume (RGV), with a heritability of 0.12, and presence or absence of rust (RBIN), with a heritability of 0.21. After merging the deregressed EBVs for the two traits with the SNP genotypes, 807 individuals with deregressed EBVs on both traits and each genotyped for 2684 SNPs remained in the final data set. Hyperparameters for each of the six models (uBayesA, uBayesB, bGBLUP, IWBayesA, CDBayesA, CDBayesB) were estimated. Bayesian inference was based on 600,000 MCMC iterations with a burn-in period of 50,000 cycles for uBayesA, uBayesB and IWBayesA.
However, we found that MCMC samples of hyperparameters under the CDBayesA and CDBayesB models mixed very slowly, particularly for the scale ($s_1^2$, $s_{2|1}^2$) and degrees of freedom ($v_1$, $v_{2|1}$) parameters. To alleviate this problem, we arbitrarily specified the degrees of freedom ($v_1$, $v_{2|1}$) to be 4 for both traits in the CDBayesA model, since this specification did not influence the WGP accuracy of BV, as we found in Chapter 3. We fixed the scale parameters ($s_1^2$, $s_{2|1}^2$) in the CDBayesA model, unique to each trait, to their corresponding REML estimates in a bGBLUP model based on the Cholesky decomposition of $\boldsymbol{\Sigma}_g$. Nevertheless, we still estimated the mean and variance of the association parameters (i.e., $\mu_\phi$ and $\sigma_\phi^2$) in the CDBayesA model using MCMC. For the CDBayesB model, we fixed the degrees of freedom ($v_1$, $v_{2|1}$) to 5. We also fixed the scale parameters ($s_1^2$, $s_{2|1}^2$) and probabilities ($\pi_1$, $\pi_{2|1}$), unique to each trait, to their corresponding estimates from uBayesB. Other hyperparameters ($\pi_\phi$, $\mu_\phi$ and $\sigma_\phi^2$) in CDBayesB were estimated using MCMC.

To further compare the six models by cross-validation, we randomly split the data 20 different times into a training subset of 726 individuals (90%) and a validation subset of the remaining 81 individuals (10%), thereby leading to 20 cross-validation replicates. In order to investigate the influence of the specification of trait order in CDBayesA and CDBayesB, we first analyzed the training data setting RBIN as Trait 1 and RGV as Trait 2, labeling these two models as CDBayesA1 and CDBayesB1. Then, we switched the order of the two traits and labeled the two models as CDBayesA2 and CDBayesB2. For all methods, we predicted the deregressed EBVs in the validation dataset based on posterior mean estimates of the SNP effects for the two traits from the training dataset. Performance of each model, namely cross-validation predictive ability, was evaluated by the Pearson correlation between the predicted and the fitted deregressed EBVs in the validation data. In cross-validation, we expected to see larger differences in predictive ability among the competing models for the lower heritability trait RGV than for RBIN. For each of the models, we also assessed inferences on the effects of the various SNPs on RGV to see if there might be any meaningful differences between the models in this respect.

5.2.9 Priors used for data analyses

In the simulation study, we specified a non-informative prior $p(\nu) \propto (\nu+1)^{-2}$ and Gelman's prior $s^2 \sim \chi^{-2}(-1, 0)$ for uBayesA, as we have done previously (Yang and Tempelman 2012). We specified a non-informative prior $p(\nu) \propto (\nu+1)^{-2}$, a proper conjugate prior $s^2 \sim Gamma(0.1, 0.1)$ and $\pi \sim Beta(\alpha_\pi = 1, \beta_\pi = 8)$ in uBayesB. For IWBayesA, we specified a proper conjugate prior $p(\boldsymbol{\Sigma}_g) \propto W(v_0, \boldsymbol{\Sigma}_0)$, where $v_0$ is 4 and $\boldsymbol{\Sigma}_0$ is a 2 × 2 identity matrix. For CDBayesA, we specified the same non-informative priors on the degrees of freedom ($\nu_1$, $\nu_{2|1}$) and Gelman's prior on the scale parameters ($s_1^2$, $s_{2|1}^2$) as in uBayesA. Priors $p(\mu_\phi) \propto N(0, 1)$ and $p(\sigma_\phi^2) \propto \chi^{-2}(-1, 0)$ were specified on the mean and variance of the association parameters in CDBayesA. For CDBayesB, we specified the same priors on the degrees of freedom ($\nu_1$, $\nu_{2|1}$), scale parameters ($s_1^2$, $s_{2|1}^2$) and probabilities ($\pi_1$, $\pi_{2|1}$) as in uBayesB. We specified the same priors on $\mu_\phi$ and $\sigma_\phi^2$ as in CDBayesA. For $\pi_\phi$,
5.2.9 Priors used for data analyses

In the simulation study, we specified a non-informative prior $p(\nu) \propto (\nu+1)^{-2}$ and Gelman's prior $s^2 \sim \chi^{-2}(-1,0)$ for uBayesA, as we have done previously (YANG and TEMPELMAN 2012). We specified a non-informative prior $p(\nu) \propto (\nu+1)^{-2}$, a proper conjugate prior $s^2 \sim \mathrm{Gamma}(0.1, 0.1)$ and $\pi \sim \mathrm{Beta}(\alpha_\pi = 1, \beta_\pi = 8)$ in uBayesB. For IWBayesA, we specified a proper conjugate prior $p(\Sigma_g) \propto \mathrm{W}(v_0, \Sigma_0)$, where $v_0$ is 4 and $\Sigma_0$ is a 2 × 2 identity matrix. For CDBayesA, we specified the same non-informative prior on the degrees of freedom ($\nu_1$, $\nu_{2|1}$) and Gelman's prior on the scale parameters ($s_1^2$, $s_{2|1}^2$) as in uBayesA. Priors $p(m_\phi) \propto \mathrm{N}(0,1)$ and $p(s_\phi^2) \propto \chi^{-2}(-1,0)$ were specified on the mean and variance of the association parameters in CDBayesA. For CDBayesB, we specified the same priors on the degrees of freedom ($\nu_1$, $\nu_{2|1}$), scale parameters ($s_1^2$, $s_{2|1}^2$) and probabilities ($\pi_1$, $\pi_{2|1}$) as in uBayesB. We specified the same priors on $m_\phi$ and $s_\phi^2$ as in CDBayesA. For $\pi_\phi$, we specified the proper prior $\pi_\phi \sim \mathrm{Beta}(\alpha_\pi = 1, \beta_\pi = 8)$.

For the analysis of the pine data, we used the same priors on the hyperparameters in uBayesA and uBayesB. For IWBayesA, we specified a proper conjugate prior $p(\Sigma_g) \propto \mathrm{W}(v_0, \Sigma_0)$, where $v_0$ is 2 and $\Sigma_0$ is a 2 × 2 diagonal matrix. The first and second diagonal elements of $\Sigma_0$ were specified to be the estimates of the scale parameters for RGV and RBIN using uBayesA, i.e., 1.323e-05 and 3.215e-05, respectively. For CDBayesA and CDBayesB, we fixed the degrees of freedom and scale parameters to estimates from the corresponding univariate analyses (uBayesA and uBayesB) because of the slow mixing previously noted. We used the same priors on $m_\phi$, $s_\phi^2$ and $\pi_\phi$ for CDBayesA and CDBayesB as specified in the simulation study.
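As a purely illustrative aside, the prior specifications listed above for the simulation study can be examined by drawing from them directly. The sketch below (Python/NumPy) assumes a shape/rate parameterization for the $\mathrm{Gamma}(0.1, 0.1)$ prior (the text does not state the parameterization) and uses inversion of the CDF for the non-informative prior $p(\nu) \propto (\nu+1)^{-2}$; it is not part of any analysis reported here.

```python
import numpy as np

rng = np.random.default_rng(0)

# p(nu) proportional to (nu + 1)^-2 for nu > 0 has CDF F(nu) = nu / (nu + 1),
# so a prior draw is obtained by inverting the CDF: nu = u / (1 - u).
u = rng.uniform(size=100_000)
nu_draws = u / (1.0 - u)                       # heavy right tail; prior median equals 1

# Gamma(0.1, 0.1) prior on s^2 (shape/rate assumed; NumPy takes scale = 1/rate).
s2_draws = rng.gamma(shape=0.1, scale=1.0 / 0.1, size=100_000)

# Beta(alpha_pi = 1, beta_pi = 8) prior on pi, implying a prior mean of 1/9.
pi_draws = rng.beta(1.0, 8.0, size=100_000)
```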
5.3 Results

5.3.1 Simulation Studies

In the simulation study, the overall genetic correlation ($\rho_{g_1g_2}$) between the two traits was 0.48 and 0.63, respectively, for Populations 1 and 2 over the 20 replicates per population. Population 2 had a much smaller between-replicate standard deviation (~0.13) for $\rho_{g_1g_2}$ compared to Population 1 (~0.29).

Figure 5.1 illustrates the average WGP accuracy of predicting breeding values (BV) in Generation 6002 for the two traits over the 20 replicates for each of Populations 1 (Figure 5.1A) and 2 (Figure 5.1B). For Trait 1 (h² = 0.5) in Population 1, bGBLUP had >5% lower (P<0.05) average accuracy compared to IWBayesA, while IWBayesA had ~2% lower (P<0.05) accuracy compared to the other four models (uBayesA\uBayesB\CDBayesA\CDBayesB), including the two based on univariate WGP analyses. For Trait 2 (h² = 0.1) in Population 1, bGBLUP and uBayesA had ~8% and ~3% lower (P<0.05) average accuracy, respectively, compared to the other four models (uBayesB\IWBayesA\CDBayesA\CDBayesB). No significant difference was found between uBayesB and the three bivariate models (IWBayesA\CDBayesA\CDBayesB) for Trait 2 in Population 1. For Trait 1 in Population 2 (Figure 5.1B), we found that both bGBLUP and IWBayesA had ~2% lower (P<0.05) average accuracy than the other four models (uBayesA\uBayesB\CDBayesA\CDBayesB). For Trait 2 in Population 2, the three bivariate trait models (IWBayesA\CDBayesA\CDBayesB) outperformed bGBLUP (~3%), while bGBLUP had ~5% higher (P<0.05) accuracy than uBayesA and uBayesB. No significant difference was found among the three Bayesian bivariate trait models (IWBayesA\CDBayesA\CDBayesB) for Trait 2 in Population 2.

Figure 5.1: Accuracy of breeding value prediction for six methods (uBayesA\uBayesB\bGBLUP\IWBayesA\CDBayesA\CDBayesB) in two scenarios: A) number of QTLs for both traits = 10; B) number of QTLs for both traits = 30. Different letters indicate significant differences with P<0.05.

5.3.2 Pine data analyses

The average predictive abilities for the eight different models over the 20 different replicates in the cross-validation are summarized in Figure 5.2. For the lower heritability trait RGV, we found that CDBayesB1 and CDBayesB2 had ~5% greater (P<0.05) predictive accuracy compared to the other six models, whereas bGBLUP, CDBayesA1 and CDBayesA2 had lower (P<0.05) predictive accuracies (~6% and ~9%) compared to the other three bivariate trait models. There was no evidence that IWBayesA had different predictive accuracy for RGV compared to uBayesA and uBayesB. However, four of the models (uBayesA\uBayesB\IWBayesA\CDBayesA1), including the two univariate models, outperformed bGBLUP (~4%–6%) for RGV.

For the higher heritability trait RBIN, we found that bGBLUP, CDBayesA1 and CDBayesA2 had lower (P<0.05) predictive accuracies (~6% and ~9%) compared to the other three bivariate trait models. Furthermore, there was no evidence of a difference among uBayesA, uBayesB and three of the bivariate trait models (IWBayesA\CDBayesB1\CDBayesB2). Across the cross-validation replicates, the estimated SNP effects for either trait agreed rather well between the two specifications of trait order using CDBayesB (refer to panels A and B of Figure D2.1 in Appendix D2). However, in contrast with CDBayesB, there was less agreement between the two different trait orders using CDBayesA (refer to panels C and D of Figure D2.1 in Appendix D2). To further demonstrate the differences among three methods (uBayesA, IWBayesA and CDBayesB1), Figure 5.3 shows the absolute values of estimated SNP effects on RGV plotted against SNP index using all of the data. IWBayesA (Figure 5.3B) detected the same number of extremely large SNP effects as uBayesA (Figure 5.3A). However, compared to IWBayesA and uBayesA, larger SNP effects were inferred with CDBayesB1 (Figure 5.3C) for RGV, which might partially explain the higher predictive accuracy for RGV in cross-validation for that particular model.

Figure 5.2: Average predictive ability from cross-validation using the loblolly pine data set for eight methods (uBayesA\uBayesB\bGBLUP\IWBayesA\CDBayesA1\CDBayesB1\CDBayesA2\CDBayesB2), where CDBayesA1 and CDBayesB1 used RBIN as the first trait and RGV as the second trait, and CDBayesA2 and CDBayesB2 used RGV as the first trait and RBIN as the second trait. Different letters indicate means are different (P<0.05) from each other.

Figure 5.3: Estimated SNP effects for RGV from 807 individuals and 2684 SNPs in the pine data set using three methods: A) uBayesA; B) IWBayesA; C) CDBayesB1.

5.4 Discussion

Multiple trait extensions to WGP have been developed to improve prediction accuracy by accounting for genetic correlations between traits (CALUS and VEERKAMP 2011; JIA and JANNINK 2012). Many studies have shown the advantage of multiple trait models compared to their univariate counterparts, especially for lower heritability traits. For multiple trait models, variable selection methods such as BayesSSVS have shown some advantage in prediction accuracy over models that are not based on such mixtures and/or that assume normality (CALUS and VEERKAMP 2011). Nevertheless, in Jia and Jannink (2012), a variable selection method, BayesCπ, based on normality for one of the mixture components, demonstrated advantages over bivariate BayesA, which is similar to our IWBayesA, and over GBLUP. A Gaussian prior on SNP effects might not be an ideal specification for genetic architectures characterized by few large QTL effects. Furthermore, heavy-tailed variable selection methods like BayesB, popularized for univariate WGP analyses, have not yet been considered in multiple trait analysis. In this study, we developed multiple trait WGP models based on the two univariate methods BayesA and BayesB. We did not, however, pursue a variable selection version of IWBayesA, analogous to the IWBayesB developed in the previous chapter, as we believed it to be dubious to attempt to fit a model where SNP effects were either both zero or both non-zero for the two traits. In univariate trait WGP analyses, many factors, e.g., the number of animals, number of SNP markers, number of QTLs and heritability of the trait, could influence prediction accuracy (MEUWISSEN and GODDARD 2010).
According to Jia and Jannink (2012), factors such as the number of QTLs, the genetic correlation and the heritabilities of the two traits could influence prediction accuracy in bivariate trait WGP analyses. They pursued "change one factor at a time" techniques for their simulation study experimental design, which reduced the total number of simulated replicates to be generated and analyzed by the various competing WGP models compared to a full factorial design. Conversely, we used a response surface design to quickly pinpoint factors that might be particularly important for influencing differences in accuracy between CDBayesB and IWBayesA for WGP, leading us to focus on M12, the number of pleiotropic loci.

In previously developed bivariate trait WGP simulations, as well as in the distributional assumptions of various models, QTLs have generally been assumed to always be pleiotropic, often to the point that the genetic correlation between the two traits is uniform throughout the genome (CALUS and VEERKAMP 2011; JIA and JANNINK 2012). Under such scenarios, these investigators generally found greater advantages for bivariate trait compared to univariate trait WGP models. Conversely, we considered a situation whereby QTL may be either pleiotropic or non-pleiotropic in their effects on the two traits. We specifically defined three categories of QTLs, where M1 and M2 were the numbers of non-pleiotropic QTLs for each of the two respective traits, while M12 represented the number of pleiotropic QTLs. We further allowed, in both our simulations and in some of our models (e.g., IWBayesA, CDBayesA and CDBayesB), for the possibility that the strength of association (i.e., genetic correlation) might be rather heterogeneous across pleiotropic QTL, as association variables (φj) between the two traits for the M12 QTL were drawn from a normal distribution.

In a focused LD simulation study, the only difference between the two competing scenarios was M12 (M12 = 10 in Scenario 1, M12 = 30 in Scenario 2). We found that bivariate trait models improved the accuracy of WGP compared to univariate BayesB for the lower heritability trait in Scenario 2 but not in Scenario 1. That is, increasing M12 seemed to provide more power to detect pleiotropic effects in bivariate trait models. However, we did not detect any difference in WGP accuracy between CDBayesB and IWBayesA in Scenario 2, even though CDBayesB, unlike IWBayesA, implicitly distinguishes between pleiotropic versus non-pleiotropic QTL.

Although IWBayesA is a convenient choice for bivariate trait WGP modeling and closely mirrors the multivariate BayesA procedure previously developed by Calus and Veerkamp (2011), it is imperative that hyperparameters like the degrees of freedom and scale parameters are properly "tuned" or inferred upon rather than set to arbitrary values; otherwise, WGP accuracies can be badly compromised (JIA and JANNINK 2012). In this study, we inferred upon hyperparameters based on the specification of diffuse prior distributions. However, we also recognized in this study that mixing problems can arise in real data applications, such that it might be necessary to tune these hyperparameters somewhat based on univariate analyses. In our LD simulation study, we found that IWBayesA had much lower accuracy compared to other competing methods for the higher heritability trait. Conversely, in Jia and Jannink (2012), IWBayesA outperformed uBayesA in their default simulation scenario.
The reason for the different results might be how the QTL were generated differently in our two studies. IWBayesA assumes that every SNP has a pleiotropic effect, which partly agrees with the specifications in Jia and Jannink's simulation study. However, we also specified a substantial proportion of QTL in our study to be non-pleiotropic (i.e., trait-specific). This may have rendered IWBayesA rather inflexible relative to other competing models, including even univariate WGP models, for the higher heritability trait.

In our simulation study, bGBLUP performed poorly compared to IWBayesA and even to the univariate WGP models (uBayesA\uBayesB) for Trait 1 (the high heritability trait) under both Scenarios 1 and 2. bGBLUP assumes that the SNP effects for the two traits follow a light-tailed (multivariate normal) distribution, an assumption that is often violated when there are only a few QTL that underlie both traits. Our simulation study and the analysis of the pine data indicated that bGBLUP had lower WGP accuracy compared to IWBayesA, which effectively assumes that SNP effects follow a heavier-tailed multivariate t distribution. This is also consistent with conclusions drawn by Jia and Jannink (2012).

We found that CDBayesB had much higher predictive ability in cross-validation for RGV in the pine data analysis. Jia and Jannink (2012) studied the same data set in their comparisons. However, they did not find any significant difference in cross-validation performance between bivariate and univariate trait WGP models (except for the situation in which they assumed some missing values for one trait). With the ability to differentially infer upon both pleiotropic and non-pleiotropic effects, we believe that CDBayesB offers more flexibility compared to other competing models, including those previously tested by Jia and Jannink (2012) and Calus and Veerkamp (2011).

The CDBayesA model is a special case of CDBayesB with the probabilities of non-zero effects on both traits and of non-zero associations set to 1. Unlike with IWBayesA, it is necessary to specify an order for the two traits in both CD models. In the LD simulation, we analyzed the data under the two models using the same order of traits in which we simulated the data. However, it might not be obvious whether the specified order of traits is important in actual applications. We initially used the higher heritability trait (RBIN) as the first trait and the lower heritability trait (RGV) as the second trait for both CD models. After switching the order of these two traits, we found that predictive ability was unaffected for both CDBayesA and CDBayesB. One possible reason for this is that these two models might be far more flexible than IWBayesA for distinguishing between pleiotropic versus non-pleiotropic QTL.

5.5 Conclusions

Alternative Cholesky-based parameterizations (CDBayesA\CDBayesB) and an inverted Wishart specification on VCVs (IWBayesA) for bivariate trait WGP models were investigated for their advantage in prediction accuracy compared to bivariate ridge regression (bGBLUP) and univariate WGP models (uBayesA\uBayesB). With both non-pleiotropic and pleiotropic QTLs specified in the two scenarios of the LD simulation, the three bivariate trait WGP models had higher accuracy (~8%) than the two univariate trait models when the number of pleiotropic QTLs was relatively large. For the low heritability trait in the two scenarios, the three Bayesian bivariate trait WGP models outperformed bGBLUP (P<0.05).
However, we did not find any significant differences among the three Bayesian bivariate trait WGP models in either scenario. Jointly accounting for pleiotropic and non-pleiotropic SNP effects in CDBayesB is clearly more flexible than bivariate models (CDBayesA and IWBayesA) that assume all SNPs are pleiotropic. Owing to this flexibility, CDBayesB had higher predictive ability (~5%) compared to the other competing models, regardless of the order of the two traits, in the application to the pine data.

Chapter 6 Discussion, Conclusions and Future Work

This dissertation has focused on extending statistical models and developing computing strategies to better conduct whole genome prediction (WGP) for the selection of breeding stock for economically important traits based on high density single nucleotide polymorphism (SNP) marker panels. The primary intent of this work was to develop greater flexibility of WGP models in a number of potentially different ways. One key enhancement was to model potential spatial correlation between SNP effects due to the presence of QTL (Chapter 2). Another was to allow for potentially different modes of genetic action (i.e., pleiotropic versus trait-specific), whether for reaction norm models that account for a specific form of genotype by environment interaction (Chapter 4) or for bivariate trait analysis (Chapter 5). Additional hybrid models that combine the features of various WGP models in this dissertation (e.g., bivariate antedependence models) could be conceptually derived and tested in future work as well.

Some researchers might be rather critical of these efforts, recognizing that this dissertation has only added further to the "Bayesian alphabet" (Gianola, de los Campos et al. 2009; Gianola 2013), given proposed model labels such as "ante-BayesA" or "CDBayesB", for example. This criticism is certainly warranted if key hyperparameters are not properly tuned, since improper tuning would only distort comparisons between the models proposed in this dissertation and the more conventional models used in current WGP implementations. Hence, this work has been prepared with this issue keenly in mind, presenting fully Bayesian inferential strategies to infer upon these key hyperparameters wherever possible. In fact, an entire chapter (Chapter 3) addresses computational efficiencies of alternative strategies and the impact of hyperparameter misspecification in WGP models. However, we recognize that there is much more work that needs to be done on this front, particularly as the applications in this dissertation were smaller in scale, i.e., with respect to the number of SNP genotypes m and the number of phenotypes n, compared to many current applications. Even in those cases, some difficulties were encountered. For example, we resorted to inferring upon some key hyperparameters using conventional univariate models before properly tuning the key bivariate hyperparameters for some of the bivariate genomic analyses in Chapter 5.

Some of the models developed in this dissertation did not always perform better than more conventional specifications; indeed, this was contrary to our expectations based on the simulated genetic architectures. We realize that all of the simulation studies and applications considered in this dissertation are by no means exhaustive; nevertheless, we do, for example, note the following.
Antedependence specifications (ante-BayesA/ante-BayesB) seemed to show particular advantages when linkage disequilibrium was substantial (Chapter 2), whereas finely constructed bivariate genomic models such as CDBayesB, which differentially model pleiotropic from non-pleiotropic QTL, did particularly well when the genetic architecture was simple (Chapter 5); i.e., low numbers (mQTL) of QTL relative to the number of SNP markers (m). However, on nearly as many occasions, we did not detect meaningful differences in WGP accuracy between seemingly disparate model specifications. For example, results were often counterintuitive in our reaction norm model work (Chapter 4) in that sometimes IW-BayesA, a model that assumes complete pleiotropy throughout the genome, did better than CD-BayesB, which was constructed to loosen that requirement. Recent work from Wimmer et al. (2013) might be particularly enlightening in that regard; that is, they concluded that Bayesian variable selection models are not likely to confer substantial advantages over simpler GBLUP specifications when the heritability is low, the level of determinedness (n/m) is low, the model complexity (mQTL/n) is high and/or the LD is high. This may partially explain why some of the proposed variable selection methods in this dissertation (e.g., CD-BayesB) did not confer substantial advantages for some of the smaller scale examples considered. Nevertheless, these rules do not necessarily apply to comparisons of different distributional forms, e.g., Student t versus normal, with extensions to various bivariate forms, e.g., based on inverted Wishart specifications versus more flexible specifications based on the Cholesky decomposition, especially if model complexity is high. Furthermore, the level of determinedness is increasing so fast in some populations, e.g., Holsteins, that now n > m (LEGARRA and DUCROCQ 2012), such that it might become increasingly feasible to develop more comprehensive WGP models. This might be particularly true for GWAS types of analyses, where it has been noted in this dissertation that inferences on individual SNP effects may be sensitive to model specification. Furthermore, although m and pairwise LD will admittedly increase with sequencing technologies, thereby limiting the effectiveness of more elaborate WGP model specifications, it is also then more likely that future WGP models might be based on haplotypes of SNPs rather than single SNPs per se, thereby further complicating the issue of WGP model fit and choice relative to the work of, e.g., Wimmer et al. (2013). Hence, this continues to be a promising and exciting area of research.

There are certainly other strong limitations to the work in this dissertation that may further distort the comparisons between the various WGP models, particularly for analyses that involve real data. Firstly, we assumed genetic effects to be strictly additive, such that it is unpredictable what the effects of non-additive gene action (Gianola, Wu et al. 2010) might be on our comparisons. Certainly, there may be other nonlinearities that were not accounted for in our work as well. Plasmode-based simulations may represent a more effective way of reassessing the relative performance of WGP models (Vaughan, Divers et al. 2009). We have already mentioned the computational limitations of some of our proposed models, particularly when hyperparameters need to be estimated.
Animal breeders have been reticent, at best, to attempt to infer upon or properly tune these hyperparameters for good reason; it has been rather difficult to do so except, perhaps, based on some method of moments based determinations (de los Campos, Hickey et al. 2013). Although the toolkit in this dissertation was based on MCMC, it might be prudent to pursue other computationally feasible analytical approximations based on, for example, variational Bayes (Logsdon, Hoffman et al. 2010) or expectation-maximization like methods (KARKKAINEN and SILLANPAA 2012). A similar argument also applies to hyperparameter estimation. For example, in ridge regression or GBLUP like models, REML could be used to estimate the key “hyperparameters” like the common SNP variance component and the residual variance for example; certainly something similar could be done for Student t (e.g. Bayes A) implementations as well; e.g. (Pinheiro, Liu et al. 2001). This should be another fruitful area for future research. 132 APPENDICES 133 APPENDIX A: Chapter 2 A1 Markov Chain Monte Carlo Implementation Strategy for Ante-BayesA and Ante-BayesB In order to conduct MCMC, it is necessary to first specify the joint posterior density of all unknown parameters (SORENSEN and GIANOLA 2002). To do this, we interchangeably reparameterize the joint density of the data y and the random SNP effects, using g for ante-BayesA and δ for ante-BayesB in order to exploit algorithmic efficiencies that are unique to either model. For instance with ante-BayesA, we write p ( y,g | β, u, σ δ , t, s e2 ) = p ( y | β, u, g, s e2 ) p ( g | σ δ , t ) [A1] Note the component p ( y | β, u, g, s e2= ) N ( Xβ + Zg + Wu, Is e2 ) is based on Equation [1] −1 −1 whereas g ~ p ( g | σ δ , t ) = N ( 0, Σg ) with Σg = ( I − T ) Δ ( I − T ) ′ are defined by elements in σ δ = s δ21 s δ22 s δ23  s δ2m  specified along the diagonal of ∆, and by t = t2,1 , t3,2 ,..., tm ,m −1  ' specified just below the diagonal elements in T as previously indicated. For ante-BayesB, we reparameterize [A1] differently: p ( y,δ | β, u, σ δ , t, s e2 ) = p ( y | β, δ, t, s e2 ) p ( δ | σ δ ) [A2] recognizing that δ= (I − T)g such that [A2] represents a linear transformation of [A1]. That is, the first component of [A2] is based on ( p ( y | β, δ, t, s )= N Xβ + Z ( I − T ) δ + Wu, Is 2 e −1 ) whereas p (δ | σ ) = ∏ N ( 0,s ) . m 2 e δ j =1 2 δj We’ll subsequently represent [A1] and [A2] together as p ( y,g ( δ ) | β, u, σ δ , t, s e2 , mt , s t2 ) 134 to recognize the interchangeability between g and δ when conditioning on t. The joint posterior density of all unknown parameters can be written as products of specifications provided previously: p ( β, g, u,s e2 , σ δ , t, mt ,s t2 ,ν δ , sδ2 | y ) ∝  m  p ( y,g ( δ ) | β, u, σ δ , t,s e2 , mt ,s t2 ) p ( β )  ∏ p ( t j , j −1 | mt ,s t2 )   j =2  [A3]  m  2 2 2 2 2  ∏ p s δ j |ν δ , sδ , p δ  p (s u |ν u , su ) , p (s e |ν e , Se ) p (ν δ )  j =1  ( ) p ( sδ2 | α s , β s ) p ( mt | mt 0 , st20 ) p (s t2 | vt , st2 ) p (p δ | α pp ,β ) From the paper, p ( β ) = N ( β, Vβ ) , p ( t j , j −1 | mt ,s t2 ) = N ( mt ,s t2 ) , p (s u2 | ν u , su2 ) = χ −2 (ν u ,ν u su2 ) , ( ) p s e | ν e , se = χ 2 2 −2 (ν ,ν s ) , p ( s e 2 e e 2 δ ( ) ) ( ) | α s , β s = Gamma (α s , β s ) , p mt | mt 0 , st 0 = N mt 0 , st 0 , 2 2 p (s t2 | vt , st2 ) = χ −2 ( vt , vt st2 ) , and p (π δ | α π , βπ ) = Beta (α π , βπ ) . 
Furthermore, ( j ( j p s δ2 | ν δ , sδ2 , π δ ) is a mixture analogous to Equation [2] for ante-BayesB whereas ) ( p s δ2 | ν δ , sδ2 , π δ= 1= χ −2 ν δ ,ν δ sδ2 ) for ante-BayesA as described in the paper. For some parameters, we subsequently derive and present FCD separately for ante-BayesA ( π δ = 1) from ante-BayesB ( π δ < 1) as some MCMC sampling strategies appear to be simpler or more computationally efficient for one or the other model. Now MCMC requires random draws from the full conditional densities of each unknown parameter (or blocks thereof) conditional on all other parameters and the data (SORENSEN and GIANOLA 2002). These full conditional densities are provided below for various classes of these unknown parameters. 135 To sample all fixed and random effects in ante-BayesA, write θ = [β ' g' u '] ' as the (p+m+q) vector of fixed and random effects, Q = [ X Z  W ] as the n x (p+m+q) overall model incidence matrix with Σ − = diag ( Vβ−1 Σg −1 A −1s u−2 ) as a block diagonal matrix with the corresponding listed components as the various blocks. It can be readily demonstrated (SORENSEN and GIANOLA 2002) that the FCD of θ is ( ) θ | y,ELSE ~ N θˆ , C [A4] where ELSE denotes all other parameters in [A3] other than θ and C θˆ = CQ ' y+ β 0 ' Vβ−1 01x ( m + q )  ' for= (Q ' Q + Σ ) − −1 s e2 . Note that with a typical “flat” prior for β is defined by Vβ−1 = 0 such that θˆ = CQ ' y . Also, note that univariate or multivariate block FCD subsets of θ could also be partitioned and sampled using [A4] based on results from Wang and Gianola (1994) . The structure of Σg −1 = {Σg jj ' } contained within Σ − is a simple tri-diagonal matrix: using ZIMMERMAN and NÚÑEZjj s δ−j2 + t 2j +1, js δ−j2+1 for j = 1,2,….,m-1 with ANTÓN (2010), the diagonal elements are Σ= g Σg mm = s δ−m2 whereas the elements adjacent to the diagonal are Σg j , j +1 = Σg j +1, j = −t j +1, j s δ−j2+1 . To sample marker-specific variances in ante-BayesA: Consider now the FCD for s δ2 , j=1,2,…,m: j ( ) ( ) ( p s δ2j | y ,ELSE ∝ p g | t21 , t32 ,..., tm,m −1 , s δ21 , s δ22 , s δ23 ,..., s δ2m p s δ2j | ν δ , sδ2 136 ) [A5] We use Chan and Jeliazkov (2009, pg 461) to simplify the first component of [A5] as follows: ( p g | t21 , t32 ,..., tG ,G −1 , s δ21 , s δ22 , s δ23 ,..., s δ2G ∝ G −1 ∝Δ 1/2 −1 1/2 ) 1/2 1  1  exp  − g ' Σg −1g  = ( I − T ) ' Δ −1 ( I − T ) exp  − g ' ( I − T ) ' Δ −1 ( I − T ) g   2   2  ( ) G  1  exp  − δ ' Δ −1δ  ∝ ∏ s δ2j  2  j =1 −1/2  1 δ j2 exp  −  2 s δ2 j      [A6] Using the component in [A6] pertaining to s δ2j in [A5] and ( ) ν s2 ν  − δ δ − δ +1 2s δ2 j 2  2  p s δ2j | ν δ , sδ2 ∝ s δ j e then νδ  ν δ sδ2  2 ν s2  νδ  − δ δ 2   − + 1 −1/2 δ   2 1 j   s 2  2  e 2s δ2 j  exp  − p s δ2j | y ,ELSE ∝∝ s δ2j 2  2 sδ  νδ  δ j j   Γ   2  ν +1   1 (δ j2 + ν δ sδ2 )  − δ +1 2  2   exp  − ∝ sδ j  2  s δ2j   ( ) ( ) [A7] ( ) ( ) ( ) i.e. p s δ2j | y , ELSE= χ −2 ν δ + 1, δ j2 +ν δ sδ2 . As a sidenote, elements of δ can be recursively derived from g: 0  0  1  −t 0  21 1  0 δ= ( I − T ) g = 0 −t32   1    0 −tm ,m −1 0  137 0   g1   g1      0 g2 g 2 − t21 g1      g3 − t32 g 2  0   g3  =     0         1   g m   g m − tm ,m −1 g m −1  [A8] To sample fixed and random effects other than SNP effects in ante-BayesB, here we deem it computationally tractable to sample the rest of the location parameters separately from g. 
We again use Equation [A4] except that now we define θ = [β ' u '] ' as a (p+ q)x1 vector of fixed and random polygenic effects with Q = [ X W ] being the corresponding n x (p+ q) submodel incidence matrix and Σ − = diag ( 0 pxp A −1s u−2 ) being the corresponding block diagonal matrix. We then sample using Equation [A4] and C θˆ = CQ ' ( y-Zg ) + β 0 ' Vβ−1 01xq  ' for= (Q ' Q + Σ ) − −1 s e2 . To sample random SNP effects and variances in ante-BayesB, we consider the collapsed sampling strategy (Liu 1994) for jointly sampling s δ2j and δj as previously adapted for Bayes B in Meuwissen et al. (2001). Consider the previously described mixture prior on the conditional variances 0 with probability π δ   p s δ2j | vδ , sδ2 , π δ =  −2 2   χ ( vδ , vδ sδ ) with probability 1-π δ ( ) ( [A9] ) We jointly sample s δ2j and δj from p s δ2j , δ j | ELSE , y , by sampling first from ( ) p s δ2j | y , ELSE except δ j and then from p (δ j | ELSE , y ) . The first component of [A2] implies the following linear model: y = Xβ + Hδ + Wu + e [A10] 138 where= H Z ( I − T ) . Let’s further partition H into the jth column, hj, and other −1 remaining columns H − j ; similarly, we represent δ − j as all elements of δ other than δ j . Then we further rewrite [A10] as follows: y =Xβ + H − j δ − j + h jδ j + Wu + e [A11] It can be readily demonstrated, following similar developments for BayesB provided by Meuwissen et al. (2001), that: ( ) ∫ p (s , δ | y, ELSE ) ) p ( δ | s ) p (s | ν , s , π ) d δ p s δ2j | y , ELSE except δ j = ∝ ∫ p ( y | β, δ, u, t, s δ 2 e 2 δj 2 j δj δj j 2 δj δ 2 δ j j    1  1 * * 2  − − − − exp y h ' y h exp δ δ δ ∫  2s e2 ( j j j ) ( j j j )   2s δ2 j dδ j δj j   −1/2  1  ∝ p s δ2j | ν δ , sδ2 , π V j exp  − y *j ' V j−1y *j   2  ( ) ( ) ∝ p s δ2j | ν δ , sδ2 , π [A12] = V j h j h 'js δ2j + Is e2 . where y *j =y − Xβ − H − j δ j − Wu and Since [A12 ] is not a recognizable distributional form, a Metropolis Hastings step is required. We adapt the independence chains implementation (CHIB and GREENBERG ( ) 1995) as also adapted by Meuwissen et al. (2001) using the prior p s δ2j | ν δ , sδ2 , π δ as the , candidate density. That is, at MCMC cycle [k], one samples a candidate, say, s δ2[*] j ( ) 2 from p s δ2j | ν δ , sδ2 , π δ conditioned upon the updated values for ν δ , sδ and π δ . One 139 accepts s δ2j [ k ] = s δ2j* as the value for in cycle [k] with probability based on the Metropolis- ( ) Hastings acceptance ratio q s δ2j[k −1] → s δ2j* : ( q s δ j[k −1] → s δ j* 2 2 ) ) ( ( )   p s 2 | y , ELSE except δ p s 2 | ν , s 2 , π  δ j* j δ j[ k −1] δ δ min  ,1   2 2 2 =   p s δ j[k −1] | y , ELSE except δ j p s δ j* | ν δ , sδ , π  ;     1, otherwise ( ) ( ) [A13] If the proposal s δ2j* is rejected, then set s δ2j [ k ] = s δ2j [ k−1] ; i.e., the value of s δ2j in the previous MCMC cycle. It can be demonstrated that using Meuwissen et al. (2001) that [A13] is further equal to: ( q s δ2j[k −1] → s δ2j* )     1  * −1/2 exp  − y *' V j *−1y *  V j     2  min  ,1 1 − 1/2 − =  1   [A14] [t −1]  V [t −1]  exp  − y *' V j y *   j   2     1, otherwise  ( ) −1 Note that neither the determinant V j nor the inverse V j are trivial computations since m is typically large. 
Adapting a development from Rohan Fernando (personal communication) for BayesB, it can be readily shown that [A14] further simplifies: 140 ( q s δ2j[k −1] → s δ2j* = where v*j (h )  * 2     ' h y −1/2 ( 1 j j) *   ( v ) exp  −   j    2   v*j   ,1 min   2   =   h j ' y *j )   ( 1 [ t +1] −1/2   ( v j ) exp  − 2 v[t +1]      j       1, otherwise  ' h j ) s δ2j * + ( h j ' h j )= s e2 and v[jk −1] 2 j (h [A15] ' h j ) s δ2j [ k −1] + ( h j ' h j ) s e2 . Once 2 j s δ2 is sampled, one could immediately draw δ j from p (δ j | ELSE , y ) readily seen to be j  h 'j y *j  s e2  p (δ j | ELSE , y ) = N  ' ,  h j h j + s δ−2 h 'j h j + s δ−2  j j   [A16] ( ) in order to complete the joint collapsed sampler draw from p s δ2j , δ j | ELSE , y . One h j −1 t j , j −1h j + z j −1 , j = could demonstrate the following backward recursive relationship= m, m -2,….,2 with zj denoting column j of Z and hm= zm. Hence for computational ( tractability, one could use this relationship in sampling pairs from p s δ2j , δ j | ELSE , y starting with j = m and working recursively backwards to j=1. To sample proportion of SNP markers associated with zero-effects in AnteBayesB, the FCD of πδ is based on the following: m ( ) p ( π δ | y , ELSE ) ∝ ∏ p s δ2j | ν δ , sδ2 , π δ p ( π δ | αδ , β δ ) j =1 141 [A17] ) = I (s δ ∑ where p ( π δ | αδ , β δ ) = Beta (αδ , β δ ) = . Let m1 m 2 j =1 j ) 0 denote the number of zero-valued elements sampled in σ δ for a particular MCMC cycle where I(.) denotes the indicator function. Then it can be readily demonstrated that Equation [A17] is simply = p ( π δ | y, ELSE ) Beta (αδ + m1 , βδ + m − m1 ) . To sample antedependence parameters and their corresponding hyperparameters, consider now deriving the joint FCD of t = t2,1 , t3,2 ,..., tm ,m −1  ' : p ( t | y ,ELSE )  m  ∝ p g | t2,1 , t3,2 ,..., tm,m −1 , s δ21 , s δ22 , s δ23 ,..., s δ2G  ∏ p ( t j , j −1 | mt , s t2 )  [A18]  j =2  ( ) Borrowing developments, again from Chan and Jeliakov (2009, pg 462), the first component of [A18] can be rewritten as: ( p g | t21 , t32 ,..., tm ,m −1 , s δ21 , s δ22 , s δ23 ,..., s δ2m )  1  exp  − g ' ( I − T ) ' Δ −1 ( I − T ) g   2  2 2 2   1 ( g3 − t32 g 2 )   1 ( g 2 − t21 g1 )  g m − tm ,m −1 g m −1 )  ( 1    exp  −  exp  − ∝ xp  − 2 2 2  2      s 2 s 2 s δ2 δ3 δm        1  ∝ exp  − g ( −1) − Ψt ' Δ(−−11) g ( −1) − Ψt   2  ∝ ( I − T ) ' Δ −1 ( I − T ) ( 1/2 ) ( ) [A19] 142 saving only terms that are functions of t with Ψ =diag ( g1 , g 2, ..., g m −1 ) being a diagonal m-1 x m-1 matrix with the listed elements, g ( −1) =  g 2 g3  g m  ' , and ( ∆ ( −1) = diag s δ22 , s δ23 ..., s δ2m ) being a diagonal m-1 x m-1 matrix with the listed elements. Hence, [A18] can be rewritten as follows: ( ) p ( t | y,ELSE ) ∝ p g ( −1) | t, Δ( −1) p ( t | 1mt , Is t2 )  1    1  ∝  exp  − g ( −1) − Ψt ' Δ(−−11) g ( −1) − Ψt   exp  − 2 ( t − 1mt ) ' ( t − 1mt )   2    2s t    1  ∝  exp  − t − tˆ ' Σt−1 t − tˆ   [A20]  2   ( ) ( ) ( ( ) ) where ( Σˆ t = Ψ ' Δ(−−11) Ψ + Is t−2 ) −1 [A21] and ( tˆ = Ψ ' Δ(−−11) Ψ + Is t−2 ) ( Ψ ' Δ( −1 −1 −1) g ( −1) + 1s t−2 mt ) ( ) [A22] 2 Note that Ψ 'Δ(−−11) Ψ + Is t−2 is diagonal with elements g j s δ−j2+1 + s t−2 , j = 1,2,…,m-1, −1 −2 whereas element j of Ψ 'Δ g + 1s t mt is g j g j +1s δ−j2+1 + s t−2 mt , j = 1,2,…,m-1. 
In other ( words, the FCD of t j +1, j is t j +1, j | ELSE , y ~ N tˆj +1, j , sˆ t ( j +1, j2) tˆj +1, j = g j g j +1s δ−j2+1 + s t−2 mt (g ) j 2 sδ + s −2 j +1 −2 t [A23] 143 ) where and sˆ t ( j +1, j2) = (( g ) s 2 j −2 δ j +1 + s t−2 ) −1 [A24] Note further that tˆj +1, j can be written as a weighted average: g j g j +1s δ−j2+1 + s t−2 mt s δ−j2+1 ( g j ) g j +1 s t−2 ˆ = t j +1, j = + mt 2 2 2 s δ−j2+1 ( g j ) + s t−2 s δ−j2+1 ( g j ) + s t−2 g j s δ−j2+1 ( g j ) + s t−2 2 [A25] Now with g j = 0 , as one might anticipate occasionally with ante-BayesB with markers defined at the beginning of a linkage group, tˆj +1, j = mt and sˆ 2 t ( j +1, j ) = s t2 such that one 2 draws t j +1, j from its prior density based on updated values of mt and s t . For the much more common situation in ante-BayesB (assuming large π δ ) where gj ≠ 0 but s δ2j+1 = 0, the FCD of t j +1, j can be shown to be a normal with mean tˆj +1, j = sˆ 2 t ( j +1, j ) g j +1 gj and variance = s t2 . With p ( mt ) specified to be normal with prior mean mt0 and prior variance s2t0 then Gibbs sampling can be used for the corresponding parameters. p ( mt ,| y,ELSE ) = N ( m t , s t2 ) [A26] where 144 m −1 m t = s 2 t 1 mt 0 st20 m −1 t + 1 + 2 st20 st [A27] m for t = ∑t j , j −1 2 m −1 and  1 m −1  = s  2 + 2  st   st 0 −1 2 t [A28] 2 The FCD of s t given that the prior p (s t2 | ν t , st2 ) is scaled inverted chi-square 2 with known hyperparameters ν t and st can be derived as follows: p (s t2 ,| y, ELSE )  m  ∝  ∏ p ( t j , j −1 | mt , s t2 )  p (s t2 | ν t , st2 )  j =2  ν s2 ν  t t − t +1 − 2  − ( m −1)/2  1 m 2  exp  − 2 ∑ ( t j , j −1 − mt )   s t2  2  e 2s t ∝  ( 2πs t2 )    2s t j =2   [A29]  ν + m −1    1  m − t +1  2 ∝  (s t2 )  2  exp  − 2  ∑ ( t j , j −1 − mt ) + ν t st2     2s j =2   t       m   2 That is, p (s t2 ,| y , ELSE = ) χ −2  m +ν t , ∑ ( t j , j −1 − mt ) +ν t st2  . Note that we advocate j =2   2 the non-informative specificationsν t = −1 and st = 0 in the paper. 145 To sample the scale parameter for the random SNP effects, borrowing results 2 from Yi and Xu (2008), the FCD for sδ based on the specification of a conjugate prior p ( sδ2 | α s , β s ) = Gamma (α s , β s ) can be written as follows:  m  p ( sδ2 | y , ELSE ) ∝  ∏ I s δ2j > 0 p s δ2j | ν δ , sδ2  p ( sδ2 | α s , β s )  j =1  νδ   2 2   s ν   δ δ 2 sν αs   ν  − δ δ 2  β  m − δ +1 α s −1 2 2 2 s ( δj s)   2 2  2  =  ∏ I sδ j > 0 sδ j e sδ2 ) e − β s sδ  ( ν   j =1  Γ (α s ) Γ δ     2     m1ν δ +1  ν m  αs + −1 2 ∝ ( sδ2 ) exp  − sδ2  δ ∑ I s δ2j > 0 s δ−j2 + β s      2 j =1   ( ( ) ( ) ) ( [A30] ) i.e., a Gamma distribution with parameters α s + m1νδ +1 ν and δ 2 2 ∑ I (s δ m 2 j =1 j ) > 0 s δ−j2 + β s . To sample the degrees of freedom parameter for the random SNP effects, simple Metropolis updates could be used for sampling ν δ . For an arbitrary prior p (ν δ ) , the corresponding FCD is as follows:  m  p (ν δ | ELSE ) ∝  ∏ I s δ2j > 0 p s δ2j | ν δ , sδ2  p (ν δ )  j =1  νδ    ν δ sδ2  2  2  ν s   ν  − δ δ  m − δ +1 2  2s g2   2 2  2  s gj =  ∏ I sδ j > 0 e j  p (ν δ ) ν   j =1  Γ δ    2       ( ( ) ( ) ) 146 [A31] Details on how to ν δ can be based on a random walk Metropolis Hastings step; we have provided details on this in other non-genomic applications involving the sampling of degrees of freedom parameters (Kizilkaya and Tempelman 2005; Bello, Steibel et al. 2010). 
To sample the residual variance, given a specified scaled inverted chi-square prior 2 p (s e2 ,| α e , se2 ) = χ −2 (α e , α e se2 ) , the corresponding FCD of s e can be written as follows: p (s e2 | y , ELSE ) ( ∝ ( 2πs ∝s ) 2 − n /2 e ν + n  − e +1 2  2  e ) ν ν s2   1  2 − 2e +1 − 2es ee2 e exp  − 2 ( y-Xβ − Zg − Wu ) ' ( y-Xβ − Zg − Wu )  s e  2s e   1  exp  − 2 ( ( y-Xβ − Zg − Wu ) ' ( y-Xβ − Zg − Wu ) +ν e se2 )   2s e  [A32] In other words, [A32] is χ −2 (ν e + n, ( y-Xβ − Zg − Wu ) ' ( y-Xβ − Zg − Wu ) +ν e se2 ) . Note 2 that we advocate the non-informative specificationsν e = −1 and se = 0 in the paper. To sample the polygenic variance, given a conjugate scaled inverted-chi square 2 prior p (s u2 ,| α u , su2 ) = χ −2 (α u , α u su2 ) , the FCD of s e is classically given as follows: p (s | y,ELSE ) ∝ (s 2 u ν u + q  2 - 2 +1 u )  1 ( u′A -1u +ν u su2 )   exp   2  s u2   147 [A33] In other words, [A33] is χ −2 (ν u + q, u′A -1u +ν u su2 ) . Note that we advocate the non2 informative specifications ν u = −1 and su = 0 in the paper. 148 A2 Supplementary Figures and Tables Figure A2.1: Average posterior means of mt and empirical standard errors across 20 replicates for each of six different LD levels using ante-BayesA and ante-BayesB. No significant differences (P>0.01) were determined between the competing procedures with each other or from zero at each LD level. 149 Figure A2.2: Average posterior means of s t and empirical standard errors across 20 replicates for each of six different LD levels using ante-BayesA and ante-BayesB. No significant differences (P>.01) were determined between the two sets of competing procedures at each LD level. 2 150 Figure A2.3: Box-plot of proportions of the absolute posterior means of elements of {t } m j , j −1 j = 2 divided by their respective posterior standard deviations that exceeded 2 across all 20 replicates for each of six different levels of LD using ante-BayesA (A) and anteBayesB (B). 151 Figure A2.4: Average posterior probabilities of association for the top QTL within each of 20 replicates using BayesB and ante-BayesB for each of six different LD levels. LDspecific differences between the two methods declared significant by *(P<0.01), **( P <0.001), or ***(P<0.0001). 152 153 Figure A2.5: Bar plots of posterior probabilities of association of either or both of two bracketing SNP to each of the six largest QTL effects within each of the first four replicates (A,B,C,D) at the highest (r2=0.31), medium (r2=0.24) and lowest (r2=0.15) average LD levels. Posterior probabilities using BayesB and ante-BayesB are represented by green and black bars, respectively, whereas gray bars represent the proportion of the genetic variance accounted for by the corresponding QTL. QTL location is labeled on x-axis for each replicate. 154 Figure A2.6: Boxplots of estimated slopes for within-replicate regressions of true breeding values on estimated breeding values across 9 replicates for four traits in Generations 6, 8 and 10 from benchmark data of Hickey and Gorjanc (2011) using anteBayesB (black), BayesB (dark gray), anteBayesA (light gray) and BayesA (white). Differences from unity indicated as significant by *(0.05 0 p s g2 j | ν , s 2  p (ν )  j =1  (    m =  ∏ I s g2 j > 0  j =1   ( ) ( ) ν )  ν s2  2 ν s2 ν   2  − +1 − 2s 2   s 2  2 e g j gj ν  Γ  2    1  2  (1 + v )   As this FCD is not recognizable, we could use a random walk normal MH step on ξ = log(ν ) . 
Note that the Jacobian from ν to ξ is exp(ξ ) . The corresponding FCD for ξ is as follows: p (ξ | ELSE )  ( exp(ξ ) s 2 / 2 )exp (ξ )/2   ∝  Γ ( exp(ξ ) / 2 )    Where = m1 ∑ I (s m j =1 2 gj m1 exp ( ξ ) s  m  exp ( ξ )  − − +1 2s g2 j 2 2  2   s 0 s I > e gj gj ∏ j =1  ( ) ) > 0 , hence 164 2  1  exp(ξ )  (1 + exp(ξ ) ) 2  log p (ξ | ELSE )  exp(ξ )  exp(ξ )   ξ + log( s 2 ) − log(2) ) − log Γ  = m1  (  +  2   2 ∑ I (s m j =1 2 gj   exp(ξ )  exp(ξ ) s 2  2 > 0 − + 1 log(s g j ) −  − 2 log(1 + exp(ξ )) + ξ 2   2  s 2  g j   ) Suppose the value of ξ in the current cycle i is ξ [i ] , we could propose a random walk value for ξ [i +1] in the next cycle from a Gaussian distribution:  − (ξ * −ξ [i ] )2  1  p (ξ *) = exp  2cv2   2π cv   That is equivalent to generate a random variable, say δ from N(0, cv2 ) and add it to ξ [i ] * to propose ξ= ξ [i ] + δ . To determine the odds ratio α = p (ξ * | ELSE ) p (ξ [i ] | ELSE ) , we evaluated this ratio as: α ( exp log p (ξ * | ELSE ) − log p (ξ [i ] | ELSE ) ) To implement this Metropolis sampling strategy, we first generated u from a Uniform(0,1) distribution. Then, 1) If α > 1 , accept ξ [i +1] = ξ * ; 2) If α > u , accept ξ [i +1] = ξ * ; 3) If α < u , then set ξ [i +1] = ξ [i ] . The following tuning procedure is to determine cv2 : 1) For the last 10 cycles, the rate of acceptance is greater than 80%, increase cv2 by a factor of 1.2. 165 2) For the last 10 cycles, the rate of acceptance is less than 20%, decrease cv2 by a factor of 0.7. 3) After the burn-in, keep cv2 constant and monitor subsequent acceptance rates to ensure that they fall within 25 to 75%. To sample the scale parameter for the random SNP effects, borrowing results from Yi and Xu (2008), the FCD for s 2 based on the specification of a conjugate prior p ( s 2 | α s , β s ) = Gamma (α s , β s ) can be written as follows:  m  p ( s 2 | ELSE ) ∝  ∏ I s g2 j > 0 p s g2 j | ν , s 2  p ( s 2 | α s , β s )  j =1  (    m =  ∏ I s g2 j > 0  j =1   ( ∝ ( s2 ) αs + m1ν −1 2 ) ( ) ν )  s 2ν  2 s 2ν ν  − 2  2  − +1   s 2  2  e 2s g j gj ν  Γ  2    ( β s )α s 2 α s −1 − β s s2 (s ) e   Γ (α s )    ν m  exp  − s 2  ∑ I s g2 j > 0 s g−2j + β s      2 j =1   ( ) mν i.e., a Gamma distribution with parameters α s + 21 and ν ∑ I (s 2 m j =1 2 gj ) > 0 s g−2j + β s . B1.2 Sampling strategy for UNIMH To sample the degrees of freedom parameter for the random SNP effects, the FCD for sampling ν is as follows: 166  m  p (ν | ELSE ) ∝  ∏ I ( g j ≠ 0 ) p ( g j | ν g , sg2 )  p (ν )  j =1    v +1 v +1  Γ 2 − 2   1 1/2   m  g 1 2   1 + j2   = ∏ I ( g j ≠ 0 )    2   v  π vs   vs   (1 + v ) 2  j =1 Γ     2 As this FCD is not recognizable, we could use a random walk normal MH step on ξ = log(ν ) . Note that the Jacobian from ν to ξ is exp(ξ ) . 
The corresponding FCD for ξ is as follows: p (ξ | ELSE )  Γ ((exp(ξ ) +1) / 2 )    1 ∝    2  Γ ( exp(ξ ) / 2 )   π exp(ξ ) s  m1 1 (1 + exp(ξ ) ) Where = m1 2 m1 /2 exp ( ξ ) +1 −  m  2 2   g j   I ( g j ≠ 0 ) 1 + 2 ∏   exp(ξ ) s   j =1    exp(ξ ) ∑I (g m j =1 j ≠ 0 ) , hence log p (ξ | ELSE )    1  exp(ξ ) +1   exp(ξ )  1 = m1  log Γ  +  − log Γ   + log  2  2 2 2 π exp ( ξ ) s           exp(ξ ) +1   g 2j I ( g j ≠ 0)  −  ) − 2 log(1 + exp (ξ )) + ξ ∑  log(1 + 2  2 exp(ξ ) s   j =1   m Suppose the value of ξ in the current cycle i is ξ [i ] , we could propose a random walk value for ξ [i +1] in the next cycle from a Gaussian distribution: 167  − (ξ * −ξ [i ] )2  1  p (ξ *) = exp  2 c 2   2π cv v   That is equivalent to generate a random variable, say δ from N(0, cv2 ) and add it to ξ [i ] * to propose ξ= ξ [i ] + δ . To determine the odds ratio α = p (ξ * | ELSE ) p (ξ [i ] | ELSE ) , we evaluated this ratio as: α ( exp log p (ξ * | ELSE ) − log p (ξ [i ] | ELSE ) ) To implement this Metropolis sampling strategy, we first generated u from a Uniform(0,1) distribution. Then, 1) If α > 1 , accept ξ [i +1] = ξ * ; 2) If α > u , accept ξ [i +1] = ξ * ; 3) If α < u , then set ξ [i +1] = ξ [i ] . The following tuning procedure is to determine cv2 : 1) For the last 10 cycles, the rate of acceptance is greater than 70%, increase cv2 by a factor of 1.2. 2) For the last 10 cycles, the rate of acceptance is less than 20%, decrease cv2 by a factor of 0.7. 3) After the burn-in, keep cv2 constant and monitor subsequent acceptance rates to ensure that they fall within 25 to 75%. 168 To sample the scale parameter for the random SNP effects, the FCD for sampling s 2 is as follows:  m  p ( s 2 | ELSE ) ∝  ∏ I ( g j ≠ 0 ) p ( g j | ν , s 2 )  p ( s 2 | α s , β s )  j =1    v +1 v +1  Γ α 2 − 2   1 1/2   m gj  β s ) s 2 αs −1 − β s s2 ( 2    =  ∏ I ( g j ≠ 0)  (s ) e    1 + Γ (α s )  v   π vs 2   vs 2   j =1  Γ    2   Even if this FCD is recognizable, we could use a random walk normal MH step on 2 ψ = log( s 2 ) . Note that the Jacobian from s to ψ is exp(ψ ) . The corresponding FCD for ψ is as follows: p (ψ | ELSE )   v +1 v +1  Γ α 2 − 2   1 1/2   m gj  β s ) s 2 αs −1 − β s s2 ( 2     ∏ I ( g j ≠ 0)  (s ) e    1 + Γ (α s )  v   π vs 2   vs 2   j =1  Γ    2    Γ ( (ν + 1) / 2 )    1 ∝     Γ ( v / 2 )   π vexp(ψ )  m1 ( exp(ψ ) ) α s −1 Where = m1 m1 /2 v +1 −  m  2 2   g j   I ( g j ≠ 0 ) 1 +  ∏   vexp(ψ )   j =1    e − β s exp (ψ ) exp(ψ ) ∑I (g m j =1 j ≠ 0 ) , hence 169 log p (ψ | ELSE )   1  v +1 v 1 = m1  log Γ  ) +  − log Γ   + log( π vexp(ψ )   2  2 2   ν + 1  g 2j I g 0 log(1 ) ≠ − + + (α s − 1)ψ − β s exp(ψ ) + ψ ( )   ∑ j   vexp(ψ )  j =1   2  m Suppose the value of ψ in the current cycle i is ψ [i ] , we could propose a random walk value for ψ [i +1] in the next cycle from a Gaussian distribution:  − (ψ * −ψ [i ] )2  1  p (ψ *) = exp  2 2cs   2π cs   That is equivalent to generate a random variable, say δ from N (0, cs2 ) and add it to ψ [i ] * to propose ψ = ψ [i ] + δ . To determine the odds ratio α = p (ψ * | ELSE ) p (ψ [i ] | ELSE ) , we evaluated this ratio as: α ( exp log p (ψ * | ELSE ) − log p (ψ [i ] | ELSE ) ) To implement this Metropolis sampling strategy, we first generated u from a Uniform(0,1) distribution. 
Then, 1) If α > 1 , accept ψ [i +1] = ψ * ; 2) If α > u , accept ψ [i +1] = ψ * ; 3) If α < u , then set ψ [i +1] = ψ [i ] . The following tuning procedure is to determine cs2 : 1) For the last 10 cycles, the rate of acceptance is greater than 70%, increase cs2 by a factor of 1.2. 170 2) For the last 10 cycles, the rate of acceptance is less than 20%, decrease cs2 by a factor of 0.7. 3) After the burn-in, keep cs2 constant and monitor subsequent acceptance rates to ensure that they fall within 25 to 75%. B1.3 Sampling strategy for BIVMH To sample the degrees of freedom and scale parameters for the random SNP effects, we divided burn-in into four stages with equal length as follows: 2 In stage 1, we sample log(ν ) and log( s ) using UNIMH (see sampling strategy 2) with fine-tuning procedure on cv2 and cs2 , which are also the variances for the two separate Gaussian proposal densities; 2 In stage 2, we sample log(ν ) and log( s ) using UNIMH with fixing cv2 and cs2 to the values tuned from the last cycle in stage 1 and compute correlation r between samples of log(ν ) and log( s 2 ) within stage 2; 2 In stage 3, we jointly sample log(ν ) and log( s ) using a bivariate Gaussian proposal density with variances cv2 and cs2 based on those tuned at the end of Stage 1 and a covariance based on the correlation computed from Stage 2. Joint density for ν and s 2 is as follows: 171  m  p ( v, s 2 | ELSE ) ∝  ∏ I ( g j ≠ 0 ) p ( g j | ν , s 2 )  p (ν ) p ( s 2 | α s , β s )  j =1    v +1 v +1  Γ αs 2 − 2   1 1/2   m  g 1 ( βs ) 2   j 2 α s −1 − β s s 2  =  ∏ I ( g j ≠ 0) 1 +  s ( ) e     2 Γ (α s )  v   π vs 2   vs 2  1 v + ( )  j =1  Γ    2   As this density is not recognizable, we could use a random walk normal MH step on ξ = log(ν ) and ψ = log( s 2 ) . Note that the Jacobian from ν to ξ is exp(ξ ) . Note that the Jacobian from s 2 to ψ is exp(ψ ) . The corresponding joint density for ξ and ψ is as follows: p (ξ ,ψ | ELSE )  Γ ((exp(ξ ) +1) / 2 )    1      Γ ( exp(ξ ) / 2 )   π exp(ξ )exp(ψ )  m1 1 (1 + exp(ξ ) ) Where = m1 ( exp(ψ ) ) α s −1 2 ∑I (g m j =1 j m1 /2 exp ( ξ ) +1 −  m  2 2   g j   I ( g j ≠ 0 ) 1 +  ∏  ( ) ( ) exp exp ξ ψ    j =1    e − β s exp (ψ ) exp(ψ )exp(ξ ) ≠ 0 ) , hence log p (ξ ,ψ | ELSE )   1  exp(ξ ) +1   exp(ξ )  1  = m1  log Γ  log − Γ +      + 2    2  2  π exp(ξ )exp(ψ )      exp(ξ ) +1   g 2j 0 log(1 ) − I g ≠ − + ( j )   2  ∑ exp(ξ )exp(ψ )  j =1  2 log(1 + exp(ξ )) + (α s − 1)ψ − β s exp(ψ ) + ψ + ξ m 172 Suppose the value of η =[ξ ,ψ ]′ in the current cycle i is η[i ] , we could propose a random walk value for η[i +1] in the next cycle from a bivariate Gaussian distribution: = p ( η *) 1 2π cη2Σ 1/2 −1   exp  − ( η * −η[i ] )′ ( cη2Σ ) ( η * −η[i ] )    2 That is equivalent to generate a random variable, say δ from N(0, cηΣ ) and add it to η[i ]  c2 v to propose η = η + δ , where Σ =  r cv2cs2 * [i ] r cv2cs2  2  , cv and cs2 were fixed value cs2  2 from last cycle in stage 1, correlation r between samples of log(ν ) and log( s ) computed from stage 2. To determine the odds ratio α = α p ( η* | ELSE ) p ( η[i ] | ELSE ) ( , we evaluated this ratio as: exp log p ( η* | ELSE ) − log p ( η[i ] | ELSE ) ) To implement this Metropolis sampling strategy, we first generated u from a Uniform(0,1) distribution. Then, 1) If α > 1 , accept η[i +1] = η * ; 2) If α > u , accept η[i +1] = η * ; 3) If α < u , then set η[i +1] = η[i ] . 
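A minimal sketch of this joint update is given below (Python/NumPy; the function log_joint, standing for the log of the joint density of ξ and ψ given ELSE with Jacobians included, is assumed to be available, and all names are illustrative):

```python
import numpy as np

def bivmh_update(eta, log_joint, c_eta2, cv2, cs2, r, rng):
    """One joint random-walk Metropolis update of eta = (log nu, log s^2) using the
    bivariate Gaussian proposal described for BIVMH: cv2 and cs2 are the univariate
    proposal variances carried over from stage 1, r is the correlation between the
    two chains estimated in stage 2, and c_eta2 is the common scaling factor."""
    cov = c_eta2 * np.array([[cv2, r * np.sqrt(cv2 * cs2)],
                             [r * np.sqrt(cv2 * cs2), cs2]])
    eta_prop = rng.multivariate_normal(eta, cov)
    if np.log(rng.uniform()) < log_joint(eta_prop) - log_joint(eta):
        return eta_prop, True                 # joint candidate accepted
    return np.asarray(eta), False             # joint candidate rejected
```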
The following tuning procedure is to 2 determine cη : 2 1) For the last 10 cycles, the rate of acceptance is greater than 60%, increase cη by a factor of 1.2. 173 2 2) For the last 10 cycles, the rate of acceptance is less than 10%, decrease cη by a factor of 0.7. 2 3) After the burn-in, keep cη constant and monitor subsequent acceptance rates to ensure that they fall within 25 to 75%. 2 In stage 4, we jointly sample log(ν ) and log( s ) using a bivariate Gaussian proposal 2 density with fixing value of cη at the end of stage 3. After burn-in, we started to save all samples on ν and s 2 using MH with the bivariate Gaussian proposal density. 174 B2 Supplementary tables and figures Table B2.1: Average posterior means (PMEAN), posterior standard deviations (PSD), posterior medians (PMED), and effective sample size (ESS) for residual variance ( s e2 ), cage variance ( s c2 ), polygenic variance ( s u2 ) and key hyperparameters (ν , s 2 , and π ) based on BayesA and BayesB analyses of subsets with 950 SNPs from the heterogeneous stock mice dataset. DFMH Parameter PMEAN PSD BayesA s e2 0.16 0.09 2 sc 2.02 0.21 2 su 4.02 0.36 18.60 47.26 ν s2 2e-3 6e-4 BayesB s e2 0.16 0.06 s c2 1.94 0.21 s u2 3.92 0.32 29.16 65.25 ν 2 s 0.02 5e-3 0.83 0.10 π ESS UNIMH PMEAN PSD ESS BIVMH PMEAN PSD ESS 2964 2515 2295 291 279 0.16 2.02 3.98 22.28 2e-3 0.08 3218 0.21 2797 0.36 2602 61.07 856 6e-4 498 0.16 2.02 3.98 24.49 2e-3 0.09 0.21 0.36 64.30 6e-4 3240 2757 2690 1092 504 4918 4301 4113 395 314 391 0.16 1.91 3.91 30.61 0.02 0.83 0.06 0.21 0.30 72.58 5e-3 0.10 0.16 1.91 3.91 35.80 0.02 0.83 0.06 0.21 0.30 81.36 5e-3 0.10 6283 5466 5274 2957 813 639 175 6171 5441 5280 2346 782 619 Table B2.2: Average posterior means (PMEAN), posterior standard deviations (PSD), posterior medians (PMED), and effective sample size (ESS) for residual variance ( s e2 ), cage variance ( s c2 ), polygenic variance ( s u2 ) and key hyperparameters (ν , s 2 , and π ) based on BayesA and BayesB analyses of subsets with 1900 SNPs from the heterogeneous stock mice dataset. DFMH Parameters PMEAN PSD UNIMH ESS BIVMH PMEAN PSD ESS PMEAN PSD ESS BayesA s e2 s 2 c s u2 ν s2 0.16 2.03 3.80 14.82 8e-4 0.08 1648 0.21 1515 0.39 1314 36.03 121 3e-4 109 0.16 2.00 3.80 19.96 8e-4 0.07 0.20 0.38 55.46 3e-4 2228 2016 1983 1075 303 0.16 2.00 3.80 22.75 8e-4 0.07 0.21 0.39 61.18 3e-4 3195 2786 2238 1425 315 0.16 1.98 3.68 23.57 2e-3 0.85 0.07 4228 0.19 4037 0.35 3971 52.94 215 1e-3 194 0.09 208 0.16 1.98 3.68 30.50 2e-3 0.85 0.07 0.19 0.36 72.51 1e-3 0.08 5210 4468 4300 2474 475 424 0.16 1.98 3.68 33.75 2e-3 0.85 0.07 0.19 0.34 81.49 1e-3 0.09 6185 5891 5284 2948 542 594 BayesB s e2 s 2 c s u2 ν s2 π 176 Table B2.3: Average posterior means (PMEAN), posterior standard deviations (PSD), posterior medians (PMED), and effective sample size (ESS) for residual variance ( s e2 ), cage variance ( s c2 ), polygenic variance ( s u2 ) and key hyperparameters (ν , s 2 , and π ) based on BayesA and BayesB analyses of subsets with 3800 SNPs from the heterogeneous stock mice dataset. 
DFMH PSD ESS Parameters PMEAN BayesA s e2 0.16 0.07 s c2 2.01 0.20 s u2 3.43 0.33 3.14 0.88 ν 2 s 5e-4 2e-4 BayesB s e2 0.16 0.07 s c2 1.96 0.19 2 su 3.27 0.32 6.95 27.24 ν 2 s 1e-3 9e-4 0.88 0.10 π UNIMH PMEAN PSD ESS BIVMH PMEAN PSD ESS 1030 1011 827 111 103 0.16 2.01 3.43 3.14 5e-4 0.07 0.20 0.34 2.05 2e-4 1792 1565 1529 1280 407 0.16 2.01 3.43 3.24 5e-4 0.07 0.20 0.34 1.50 2e-4 2962 2537 1968 1339 456 3579 3134 2896 198 163 194 0.16 1.96 3.27 6.71 1e-3 0.88 0.07 0.19 0.33 20.79 9e-4 0.09 4198 3986 3761 2513 419 405 0.16 1.96 3.27 9.05 1e-3 0.88 0.07 0.19 0.31 30.10 9e-4 0.09 5230 4273 4158 3127 489 512 177 2 Figure B2.1: Average posterior means of s (BayesA, BayesB) using DFMH, UNIMH and BIVMH across 15 replicates at LD level of 0.17, 0.24 and 0.32 comparing DFMH, UNIMH and BIVMH using BayesA model in (A) and using BayesB model in (B). 178 Figure B2.2: Average posterior means of π using BayesB model across 15 replicates as a function of LD levels 0.17, 0.24 and 0.32 comparing DFMH, UNIMH and BIVMH. 179 Figure B2.3: Average posterior means of v (BayesA, BayesB) across 15 replicates for three different levels of LD comparing DFMH, UNIMH and BIVMH in BayesA model (A) and in BayesB model (B). 180 2 Figure B2.4: Average posterior means of s in BayesA model using DFMH, UNIMH and BIVMH across 15 replicates for three LD levels of 0.17 (bottom), 0.24(middle) and 0.32 (top) comparing DFMH, UNIMH and BIVMH. 2 Figure B2.5: Average posterior means of s in BayesB model using DFMH, UNIMH and BIVMH across 15 replicates for three LD levels of 0.17 (bottom), 0.24(middle) and 0.32 (top) comparing DFMH, UNIMH and BIVMH. 181 Figure B2.6: Average posterior means of π using BayesB model across 15 replicates at three LD levels of 0.17 (bottom), 0.24(middle) and 0.32 (top) comparing DFMH, UNIMH and BIVMH. Figure B2.7: Average posterior means of v across 15 replicates for three LD levels of 0.17 (bottom), 0.24(middle) and 0.32 (top) comparing DFMH, UNIMH and BIVMH in BayesA model. 182 Figure B2.8: Average posterior means of v across 15 replicates for three LD levels of 0.17 (bottom), 0.24(middle) and 0.32 (top) comparing DFMH, UNIMH and BIVMH in BayesB model. 183 APPENDIX C: Chapter 4 C1 Markov Chain Monte Carlo (MCMC) Implementation Strategy for all methods C1.1 MCMC Implementation Strategy for IW-BayesB/IW-BayesA /IW-BayesC For the RR/RN WGP model, we know that y is the n x 1 vector of phenotypes for animals, β is the q x 1 vector of fixed effects, g1 represents the m x 1 vector of SNP specific random intercept effects, g2 represents the m x 1 vector of SNP specific random slope effects, X is the n x q design matrix for fixed effect, Z is a m x m genotype matrix, D is the n x n diagonal matrix with the environmental covariates on the diagonal. y=Xβ + Zg1 + DZg 2 + e To sample location parameters β, g1, g2 computationally efficient, we adopted a GaussSeidel updating algorithm in MCMC implementation strategy (LEGARRA and MISZTAL 2008). To sample the random effects more efficiently in order to facilitate MCMC mixing, we block sampled random intercept and slope effects one SNP at a time, such that we ′ ′ define a vector g =  g11 , g 21 ,..., g1 j , g 2 j ,..., g1m , g 2 m  where g j =  g1 j , g 2 j  are random intercept and slope effects for SNP j. 
Based on the priors described for the three models in the Materials and Methods, the joint posterior density of all unknown parameters can be written as

p(β, g, G, νg, Σg, π, σe² | y) ∝ p(y | β, g, σe²) p(β) [∏j p(gj | Gj)] [∏j p(Gj | νg, Σg, π)] p(Σg | ν0, Σ0) p(νg) p(π) p(σe²),

where Gj is the 2 × 2 genetic variance-covariance matrix for the random intercept and slope effects of SNP j and Σg is the corresponding 2 × 2 scale matrix. In IW-BayesB (π < 1), Gj has a mixture prior with point mass at a 2 × 2 matrix of zeros with probability 1 − π and an inverted Wishart distribution with degrees of freedom νg and scale matrix Σg with probability π. In IW-BayesA (π = 1), Gj has an inverted Wishart prior with degrees of freedom νg and scale matrix Σg. In IW-BayesC (π = 1 and νg = ∞), all SNPs share a common genetic variance-covariance matrix Σg.

To sample fixed effects in IW-BayesA/IW-BayesB/IW-BayesC, the FCD for the kth element of β is

βk | y, β−k, else ~ N(β̂k, vk), with β̂k = x'.k(e + x.kβk) / (x'.kx.k) and vk = σe²(x'.kx.k)⁻¹,

where e = y − Xβ − Zg1 − DZg2 is the current residual vector. Immediately after sampling βk, we update the residual by e = e − x.k(βk[t+1] − βk[t]), where βk[t+1] is the value sampled at cycle [t+1] and βk[t] is the value from cycle [t].

To sample random intercept and slope effects in IW-BayesA, the FCD for the jth block of g is

gj | y, β, g−j, Gj, Σg, σe² ~ N(ĝj, Vgj),

where, writing Wj = [z.j  Dz.j],

ĝj = (W'jWj + Gj⁻¹σe²)⁻¹ W'j(e + Wjgj) and Vgj = (W'jWj σe⁻² + Gj⁻¹)⁻¹.

After sampling gj, we update the residual by e = e − z.j(g1j[t+1] − g1j[t]) − Dz.j(g2j[t+1] − g2j[t]).

To sample the genetic variance-covariance matrices of the random effects in IW-BayesA, given the specified inverted Wishart prior p(Gj | νg, Σg) ∝ IW(νg, Σg), the FCD for Gj is

p(Gj | y, else) ∝ p(gj | Gj) p(Gj | νg, Σg) ∝ |Gj|^(−((νg+1)+3)/2) exp(−½ trace(Gj⁻¹[gjg'j + Σg(νg − 3)])).

Hence Gj | y, else ~ IW(νg + 1, gjg'j + Σg(νg − 3)).

To sample the random effects and genetic variance-covariance matrices in IW-BayesB, we followed the collapsed sampling strategy of Liu (1994) and jointly sampled gj and Gj as adapted from BayesB (Meuwissen et al. 2001).
We first sample from

p(Gj | y, ELSE except gj) ∝ p(Gj | νg, Σg, π) |Vj|^(−1/2) exp(−½ y*'j Vj⁻¹ y*j),

where y*j = y − Xβ − Z−jg1,−j − DZ−jg2,−j = [z.j  Dz.j]gj + e and Vj = var(y*j) = [z.j  Dz.j] Gj [z.j  Dz.j]' + Iσe².

Since this FCD is not of recognizable form, we adopt a Metropolis-Hastings (MH) algorithm using the mixture prior p(Gj | νg, Σg, π) as the candidate density. At MCMC cycle [t], we sample a candidate G*j from the candidate density conditional on the updated values of the hyperparameters and accept Gj[t] = G*j with probability given by the MH acceptance ratio α(Gj[t−1], G*j), where Gj[t−1] is the value at cycle [t−1]. Adapting Meuwissen et al. (2001),

α(Gj[t−1], G*j) = min{ [p(G*j | ELSE except gj) p(Gj[t−1] | νg, Σg, π)] / [p(Gj[t−1] | ELSE except gj) p(G*j | νg, Σg, π)], 1 }
= min{ [|V*j|^(−1/2) exp(−½ y*'j (V*j)⁻¹ y*j)] / [|Vj[t−1]|^(−1/2) exp(−½ y*'j (Vj[t−1])⁻¹ y*j)], 1 }.

If the candidate G*j is rejected, we set Gj[t] = Gj[t−1]. It can be demonstrated that this MH ratio can be computed from lower-dimensional quantities as

α(Gj[t−1], G*j) = min{ [|R*j|^(−1/2) exp(−½ r'j (R*j)⁻¹ rj)] / [|Rj[t−1]|^(−1/2) exp(−½ r'j (Rj[t−1])⁻¹ rj)], 1 },

where

rj = [z.j  Dz.j]' y*j = [z.j  Dz.j]'[z.j  Dz.j]gj + [z.j  Dz.j]'e

and

Rj = var(rj) = [z.j  Dz.j]'[z.j  Dz.j] Gj [z.j  Dz.j]'[z.j  Dz.j] + [z.j  Dz.j]'[z.j  Dz.j]σe².

If a non-zero matrix Gj is sampled, one then draws gj from the same full conditional as in IW-BayesA. If either gj[t+1] or gj[t] is non-zero, the residual is updated immediately after sampling gj by e = e − z.j(g1j[t+1] − g1j[t]) − Dz.j(g2j[t+1] − g2j[t]).

To sample the proportion of SNP markers associated with non-zero effects in IW-BayesB, with the specified prior p(π | απ, βπ) = Beta(απ, βπ), the FCD of π is based on

p(π | y, else) ∝ [∏j p(Gj | νg, Σg, π)] p(π | απ, βπ).

Let m1 = Σj I(Gj ≠ 0₂ₓ₂) denote the number of non-zero matrices sampled among the Gj at a particular MCMC cycle, where I(·) denotes the indicator function. Then π | y, else ~ Beta(απ + m1, βπ + m − m1).
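As a small illustration of this conjugate step, the hypothetical snippet below draws π from its Beta full conditional given the 2 × 2 matrices sampled in the current cycle; the function and variable names are assumptions, not code from the dissertation.

```python
import numpy as np

def sample_pi(G_list, alpha_pi=1.0, beta_pi=1.0, rng=None):
    """Draw pi from Beta(alpha_pi + m1, beta_pi + m - m1), where m1 is the
    number of SNPs whose 2x2 matrix G_j is non-zero in the current cycle."""
    rng = rng or np.random.default_rng()
    m = len(G_list)
    m1 = sum(1 for G in G_list if np.any(G != 0.0))
    return rng.beta(alpha_pi + m1, beta_pi + m - m1)

rng = np.random.default_rng(0)
G_list = [np.zeros((2, 2)), np.array([[0.4, 0.1], [0.1, 0.3]]), np.zeros((2, 2))]
pi_draw = sample_pi(G_list, alpha_pi=1.0, beta_pi=9.0, rng=rng)
```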
To sample the scale matrix of the genetic variance-covariance matrices in IW-BayesA/IW-BayesB, given the specified Wishart prior p(Σg) ∝ W(ν0, Σ0/ν0), the FCD for Σg is

p(Σg | y, else) ∝ [∏j I(Gj ≠ 0₂ₓ₂) p(Gj | νg, Σg)] p(Σg)
∝ |Σg|^((ν0 + νgm − 3)/2) exp(−½ trace(Σg[(νg − 3) Σj I(Gj ≠ 0₂ₓ₂) Gj⁻¹ + ν0Σ0⁻¹])).

Hence Σg | y, else ~ W( ν0 + νgm, [(νg − 3) Σj I(Gj ≠ 0₂ₓ₂) Gj⁻¹ + ν0Σ0⁻¹]⁻¹ ).

To sample the degrees of freedom of the genetic variance-covariance matrices in IW-BayesA/IW-BayesB, with the specified non-informative prior p(νg) ∝ 1/(1 + νg)², the FCD for νg is

p(νg | y, else) ∝ { ∏j I(Gj ≠ 0₂ₓ₂) [ |(νg − 3)Σg|^(νg/2) / (2^νg Γ₂(νg/2)) ] |Gj|^(−(νg+3)/2) exp(−½ trace(Gj⁻¹Σg(νg − 3))) } × 1/(1 + νg)².

We sampled νg using a random walk Metropolis-Hastings algorithm, as described for sampling degrees of freedom parameters in other non-genomic applications (Kizilkaya and Tempelman 2005; Bello et al. 2010).

To sample random intercept and slope effects in IW-BayesC, the FCD for the jth block of g is

gj | y, β, g−j, Σg, σe² ~ N(ĝj, Vgj),

where, with Wj = [z.j  Dz.j] as before,

ĝj = (W'jWj + Σg⁻¹σe²)⁻¹ W'j(e + Wjgj) and Vgj = (W'jWj σe⁻² + Σg⁻¹)⁻¹.

After sampling gj, the residual is updated by e = e − z.j(g1j[t+1] − g1j[t]) − Dz.j(g2j[t+1] − g2j[t]).

To sample the genetic variance-covariance matrix of the random effects in IW-BayesC, with the inverted Wishart prior p(Σg | ν0, Σ0) ∝ IW(ν0, Σ0), the FCD follows from the joint posterior density as

p(Σg | y, else) ∝ p(g | Σg) p(Σg | ν0, Σ0) ∝ |Σg|^(−(m + ν0 + 3)/2) exp(−½ trace(Σg⁻¹(Sg + Σ0))),

where Sg = [g'1g1  g'1g2; g'2g1  g'2g2]. That is, Σg | y, else ~ IW(ν0 + m, Sg + Σ0).

To sample the residual variance in IW-BayesA/IW-BayesB/IW-BayesC, given the specified scaled inverted chi-square prior p(σe² | νe, Se) = χ⁻²(νe, νeSe), the FCD for σe² is

p(σe² | y, else) ∝ (σe²)^(−(n + νe)/2 − 1) exp(−(e'e + νeSe)/(2σe²)),

where e = y − Xβ − Zg1 − DZg2; that is, σe² | y, else ~ χ⁻²(df = n + νe, scale = e'e + νeSe).
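For illustration only: a scaled inverted chi-square draw can be generated from a chi-square variate, so the residual variance update above reduces to the short sketch below (assumed variable names, not the dissertation's code); here "scale" denotes the full quantity e'e + νeSe as in the text.

```python
import numpy as np

def sample_residual_variance(e, nu_e, S_e, rng):
    """Draw sigma2_e from chi^-2(df = n + nu_e, scale = e'e + nu_e * S_e).

    If x ~ chi^2(df), then scale / x is a draw from the scaled inverted
    chi-square with that df and total scale, matching the FCD above."""
    df = e.size + nu_e
    scale = e @ e + nu_e * S_e
    return scale / rng.chisquare(df)

rng = np.random.default_rng(2)
e = rng.normal(scale=0.4, size=500)
sigma2_e = sample_residual_variance(e, nu_e=4.0, S_e=0.16, rng=rng)
```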
C1.2 MCMC implementation strategy for CD-BayesA/CD-BayesB

As presented in the Materials and Methods, a square root free Cholesky decomposition can be applied to the genetic variance-covariance matrix. Let g2 = Ψg1 + g2|1, where g2|1 is the vector of SNP-specific environmental slope effects conditional on the intercept effects and Ψ = diag{φj} is a diagonal matrix of SNP-specific associations between intercept and slope effects. The RR/RN WGP model can then be rewritten as

y = Xβ + Zg1 + DZ(Ψg1 + g2|1) + e = Xβ + (Z + DZΨ)g1 + DZg2|1 + e.

Based on the priors described for the two CD models in the Materials and Methods, the joint posterior density of all unknown parameters is

p(β, g1, g2|1, sg1, sg2|1, φ, mφ, σφ², ν1, ν2, s1², s2², π1, π2, σe² | y)
∝ p(y | β, g1, g2|1, φ, σe²) p(β) [∏j p(g1j | σ²g1j)] [∏j p(σ²g1j | ν1, s1², π1)] [∏j p(g2|1j | σ²g2|1j)] [∏j p(σ²g2|1j | ν2, s2², π2)] [∏j p(φj | mφ, σφ²)] p(mφ) p(σφ²) p(ν1) p(ν2) p(s1²) p(s2²) p(π1) p(π2) p(σe²),

where sg1 = [σ²g11, σ²g12, ..., σ²g1m]' is the vector of SNP-specific intercept variances, sg2|1 = [σ²g2|1,1, ..., σ²g2|1,m]' is the vector of SNP-specific variances of the slopes conditional on the intercepts, and φ = [φ1, φ2, ..., φm]' is the vector of SNP-specific association parameters between intercept and slope. In CD-BayesB (π1 < 1 and π2 < 1), σ²g1j (σ²g2|1j) has a mixture prior with point mass at zero with probability 1 − π1 (1 − π2) and a scaled inverted chi-square distribution with degrees of freedom ν1 (ν2) and scale parameter s1² (s2²) with probability π1 (π2). In CD-BayesA (π1 = 1 and π2 = 1), σ²g1j (σ²g2|1j) has a scaled inverted chi-square prior with degrees of freedom ν1 (ν2) and scale parameter s1² (s2²).

To sample fixed effects in CD-BayesA/CD-BayesB, the FCD for the kth element of β is

βk | y, β−k, else ~ N(β̂k, vβk), with β̂k = x'.k(e + x.kβk)/(x'.kx.k) and vβk = σe²(x'.kx.k)⁻¹,

where e = y − Xβ − (Z + DZΨ)g1 − DZg2|1. Immediately after sampling βk, the residual is updated by e = e − x.k(βk[t+1] − βk[t]).

To sample random intercept effects in CD-BayesA, define z*.g1j = [z1j(1 + d1φj), z2j(1 + d2φj), ..., znj(1 + dnφj)]' as column j of Z + DZΨ. The FCD is

g1j | y, β, g1,−j, g2|1, σ²g1j, σe², φ ~ N(ĝ1j, vg1j),

with

ĝ1j = z*'.g1j(e + z*.g1j g1j) / (z*'.g1j z*.g1j + σe²σg1j⁻²) and vg1j = (σe⁻² z*'.g1j z*.g1j + σg1j⁻²)⁻¹.

Immediately after sampling g1j, the residual is updated by e = e − z*.g1j(g1j[t+1] − g1j[t]).

To sample random slope effects conditional on the intercepts in CD-BayesA, define z*.g2|1j = [z1jd1, z2jd2, ..., znjdn]' as the jth column of DZ. The FCD is

g2|1j | y, β, g2|1,−j, g1, σ²g2|1j, σe², φ ~ N(ĝ2|1j, vg2|1j),

with

ĝ2|1j = z*'.g2|1j(e + z*.g2|1j g2|1j) / (z*'.g2|1j z*.g2|1j + σe²σg2|1j⁻²) and vg2|1j = (σe⁻² z*'.g2|1j z*.g2|1j + σg2|1j⁻²)⁻¹.

Immediately after sampling g2|1j, the residual is updated by e = e − z*.g2|1j(g2|1j[t+1] − g2|1j[t]).

To sample the variances of the SNP intercept effects in CD-BayesA, given the prior p(σ²g1j | ν1, s1²) = χ⁻²(ν1, ν1s1²),

p(σ²g1j | y, else) ∝ p(g1j | σ²g1j) p(σ²g1j | ν1, s1²) ∝ (σ²g1j)^(−(ν1+1)/2 − 1) exp(−(ν1s1² + g1j²)/(2σ²g1j)),

that is, σ²g1j | y, else ~ χ⁻²(df = ν1 + 1, scale = g1j² + ν1s1²).

To sample the variances of the SNP slope effects conditional on the intercepts in CD-BayesA, given the prior p(σ²g2|1j | ν2, s2²) = χ⁻²(ν2, ν2s2²), one similarly obtains σ²g2|1j | y, else ~ χ⁻²(df = ν2 + 1, scale = g2|1j² + ν2s2²).

To sample the variances and effects of the SNP intercepts in CD-BayesB, we followed the collapsed sampling strategy of Liu (1994) and jointly sampled g1j and σ²g1j as adapted from BayesB (Meuwissen et al. 2001). With z*.g1j defined as above (column j of Z + DZΨ), we first sample from

p(σ²g1j | ELSE except g1j) = ∫ p(σ²g1j, g1j | ELSE) dg1j ∝ p(σ²g1j | ν1, s1², π1) |Vg1j|^(−1/2) exp(−½ y*'g1j Vg1j⁻¹ y*g1j),

where y*g1j = y − Xβ − (Z + DZΨ)−jg1,−j − DZg2|1 and Vg1j = z*.g1j z*'.g1j σ²g1j + Iσe². Using this expression, the Metropolis-Hastings acceptance ratio for sampling from p(σ²g1j | ELSE except g1j) at MCMC cycle [t], based on using p(σ²g1j | ν1, s1², π1) as the candidate density, is

α(σ²g1j[t−1], σ²g1j*) = min{ [p(σ²g1j* | ELSE except g1j) p(σ²g1j[t−1] | ν1, s1², π1)] / [p(σ²g1j[t−1] | ELSE except g1j) p(σ²g1j* | ν1, s1², π1)], 1 },

which, following Meuwissen et al. (2001), reduces to

α(σ²g1j[t−1], σ²g1j*) = min{ [(v*g1j)^(−1/2) exp(−r²g1j/(2v*g1j))] / [(vg1j[t−1])^(−1/2) exp(−r²g1j/(2vg1j[t−1]))], 1 },

where rg1j = z*'.g1j y*g1j and vg1j = var(z*'.g1j y*g1j) = (z*'.g1j z*.g1j)² σ²g1j + (z*'.g1j z*.g1j) σe².

If a non-zero value of σ²g1j is sampled, g1j is then drawn from the same full conditional as in CD-BayesA. If either g1j[t+1] or g1j[t] is non-zero, the residual is updated immediately thereafter as e = e − z*.g1j(g1j[t+1] − g1j[t]).

To sample the variances and effects of the SNP slopes conditional on the intercepts in CD-BayesB, we analogously jointly sampled g2|1j and σ²g2|1j (Meuwissen et al. 2001). With z*.g2|1j the jth column of DZ, we first sample from

p(σ²g2|1j | ELSE except g2|1j) ∝ p(σ²g2|1j | ν2, s2², π2) |Vg2|1j|^(−1/2) exp(−½ y*'g2|1j Vg2|1j⁻¹ y*g2|1j),

where y*g2|1j = y − Xβ − (Z + DZΨ)g1 − (DZ)−jg2|1,−j and Vg2|1j = z*.g2|1j z*'.g2|1j σ²g2|1j + Iσe². The corresponding acceptance ratio has the same simplified form as above with

rg2|1j = z*'.g2|1j y*g2|1j and vg2|1j = (z*'.g2|1j z*.g2|1j)² σ²g2|1j + (z*'.g2|1j z*.g2|1j) σe².

If a non-zero value of σ²g2|1j is sampled, g2|1j is drawn from the same full conditional as in CD-BayesA, and if either g2|1j[t+1] or g2|1j[t] is non-zero the residual is updated as e = e − z*.g2|1j(g2|1j[t+1] − g2|1j[t]) immediately thereafter.

To sample the proportion of SNP markers with non-zero intercept effects in CD-BayesB, with prior p(π1 | απ1, βπ1) = Beta(απ1, βπ1), let m1 = Σj I(σ²g1j > 0) denote the number of non-zero intercept variances sampled at a particular MCMC cycle, where I(·) denotes the indicator function; then π1 | y, else ~ Beta(απ1 + m1, βπ1 + m − m1).

To sample the proportion of SNP markers with non-zero slope effects conditional on the intercepts in CD-BayesB, with prior p(π2 | απ2, βπ2) = Beta(απ2, βπ2), let m2 = Σj I(σ²g2|1j > 0); then π2 | y, else ~ Beta(απ2 + m2, βπ2 + m − m2).

To sample the association parameters between intercept and slope effects in CD-BayesA/CD-BayesB, with prior φj ~ N(mφ, σφ²), define G = diag{g1j} and z*.φj = [z1jd1g1j, z2jd2g1j, ..., znjdng1j]' as column j of DZG. The FCD for φj is

φj | y, β, φ−j, g1, g2|1, σe², else ~ N(φ̂j, vφj),

with

φ̂j = (σe²σφ⁻²mφ + z*'.φj(e + z*.φjφj)) / (z*'.φj z*.φj + σφ⁻²σe²) and vφj = (σe⁻² z*'.φj z*.φj + σφ⁻²)⁻¹.

Immediately after sampling φj, the residual is updated by e = e − z*.φj(φj[t+1] − φj[t]).
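A rough sketch of one scalar update of an association parameter φj with the corresponding residual adjustment is given below; it is an illustrative rendition under our own variable names (the working covariate d * z_j * g1_j plays the role of z*.φj), not the code used for the analyses.

```python
import numpy as np

def update_phi_j(e, z_j, d, g1_j, phi_j, m_phi, sigma2_phi, sigma2_e, rng):
    """One Gibbs update of the association parameter phi_j in CD-BayesA/CD-BayesB.

    The working covariate x is the j-th column of DZG, i.e. d * z_j * g1_j."""
    x = d * z_j * g1_j
    rhs = x @ e + (x @ x) * phi_j                   # x'(e + x * phi_j)
    prec = x @ x / sigma2_e + 1.0 / sigma2_phi      # posterior precision
    mean = (m_phi / sigma2_phi + rhs / sigma2_e) / prec
    phi_new = rng.normal(mean, np.sqrt(1.0 / prec))
    e_new = e - x * (phi_new - phi_j)               # residual bookkeeping
    return phi_new, e_new
```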
To sample the mean of the association parameters in CD-BayesA/CD-BayesB, given the specified prior p(mφ) = N(τ, ζ²), the FCD for mφ is

p(mφ | y, ELSE) ∝ [∏j p(φj | mφ, σφ²)] p(mφ | τ, ζ²) ∝ exp(−(mφ − m̂φ)²/(2σ²mφ)),

so that mφ | y, ELSE ~ N(m̂φ, σ²mφ) with

m̂φ = [(ζ²)⁻¹τ + (σφ²/m)⁻¹φ̄] / [(ζ²)⁻¹ + (σφ²/m)⁻¹], σ²mφ = [(ζ²)⁻¹ + (σφ²/m)⁻¹]⁻¹, and φ̄ = (1/m)Σj φj.

To sample the variance of the association parameters in CD-BayesA/CD-BayesB, with specified prior p(σφ²) = Gamma(αφ, βφ), the FCD for σφ² is

p(σφ² | y, ELSE) ∝ [∏j p(φj | mφ, σφ²)] p(σφ²) ∝ (σφ²)^(−(m − 2αφ)/2 − 1) exp(−[(φ − mφ)'(φ − mφ) + βφ]/(2σφ²)),

that is, σφ² | y, ELSE ~ χ⁻²(df = m − 2αφ, scale = (φ − mφ)'(φ − mφ) + βφ).

To sample the scale parameter for the SNP intercepts in CD-BayesA/CD-BayesB, with prior p(s1² | α1, β1) = Gamma(α1, β1),

p(s1² | y, ELSE) ∝ (s1²)^(α1 + m1ν1/2 − 1) exp(−s1²[(ν1/2) Σj I(σ²g1j > 0) σg1j⁻² + β1]),

that is, a Gamma distribution with shape α1 + m1ν1/2 and rate (ν1/2) Σj I(σ²g1j > 0) σg1j⁻² + β1.

To sample the scale parameter for the SNP slopes conditional on the intercepts in CD-BayesA/CD-BayesB, with prior p(s2² | α2, β2) = Gamma(α2, β2), the FCD is analogously a Gamma distribution with shape α2 + m2ν2/2 and rate (ν2/2) Σj I(σ²g2|1j > 0) σg2|1j⁻² + β2.

To sample the degrees of freedom of the SNP intercept effects in CD-BayesA/CD-BayesB, with the specified non-informative prior p(ν1) = 1/(1 + ν1)², the FCD for ν1 is

p(ν1 | y, ELSE) ∝ { ∏j I(σ²g1j > 0) [ (ν1s1²/2)^(ν1/2) / Γ(ν1/2) ] (σ²g1j)^(−(ν1/2 + 1)) exp(−ν1s1²/(2σ²g1j)) } p(ν1).

We sampled ν1 using a random walk Metropolis-Hastings algorithm, as described for sampling degrees of freedom parameters in other non-genomic applications (Kizilkaya and Tempelman 2005; Bello et al. 2010).
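Because this Metropolis step recurs for every degrees-of-freedom parameter in these appendices, a generic sketch may be useful. The function below is a hypothetical random-walk Metropolis update on the log scale (the Jacobian term log ν is added to the log target), not the implementation used in the dissertation.

```python
import numpy as np

def rw_metropolis_log_scale(log_target, nu_current, step_sd, rng):
    """One random-walk Metropolis update for a positive parameter nu.

    log_target(nu) returns the log full conditional of nu itself; working
    on zeta = log(nu) adds the Jacobian term zeta (= log nu) to the target."""
    zeta = np.log(nu_current)
    zeta_prop = zeta + rng.normal(0.0, step_sd)
    log_alpha = (log_target(np.exp(zeta_prop)) + zeta_prop) \
              - (log_target(np.exp(zeta)) + zeta)
    if np.log(rng.uniform()) < log_alpha:
        return np.exp(zeta_prop), True
    return nu_current, False

# toy target: log kernel of a Gamma(3, 1) density for nu
rng = np.random.default_rng(3)
nu, accepted = rw_metropolis_log_scale(lambda nu: 2.0 * np.log(nu) - nu,
                                       nu_current=5.0, step_sd=0.4, rng=rng)
```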
To sample the degrees of freedom of the SNP slopes conditional on the intercepts in CD-BayesA/CD-BayesB, with the specified non-informative prior p(ν2) = 1/(1 + ν2)², the FCD for ν2 is

p(ν2 | y, ELSE) ∝ { ∏j I(σ²g2|1j > 0) [ (ν2s2²/2)^(ν2/2) / Γ(ν2/2) ] (σ²g2|1j)^(−(ν2/2 + 1)) exp(−ν2s2²/(2σ²g2|1j)) } p(ν2).

We again sampled ν2 using a random walk Metropolis-Hastings algorithm (Kizilkaya and Tempelman 2005; Bello et al. 2010).

To sample the residual variance in CD-BayesA/CD-BayesB, given the specified scaled inverted chi-square prior p(σe² | νe, Se) = χ⁻²(νe, νeSe), the FCD is

p(σe² | y, else) ∝ (σe²)^(−(n + νe)/2 − 1) exp(−(e'e + νeSe)/(2σe²)),

where e = y − Xβ − (Z + DZΨ)g1 − DZg2|1; that is, σe² | y, else ~ χ⁻²(df = n + νe, scale = e'e + νeSe).

C1.3 Derivation of the overall genetic correlation between intercept and slope

The RR/RN WGP model can be written as

yi = x'iβ + Σj zij(g1j + dig2j) + ei, i = 1, 2, ..., n.

The genetic variance at environmental covariate di for animal i with genotypes zij is then

var(Σj zij(g1j + dig2j)) = Σj zij²σ²g1j + di²Σj zij²σ²g2j + 2diΣj zij²σg1g2j.

Thus, for any environment d, the genetic variance is

var(Σj zij(g1j + dg2j)) = Σj zij²σ²g1j + d²Σj zij²σ²g2j + 2dΣj zij²σg1g2j.

Following de los Campos et al. (2012), the overall genetic variance across all animals is

Vg = n⁻¹ΣiΣj zij²σ²g1j + d²n⁻¹ΣiΣj zij²σ²g2j + 2dn⁻¹ΣiΣj zij²σg1g2j,

where n⁻¹ΣiΣj zij²σ²g1j is the overall genetic variance for the intercepts across animals, n⁻¹ΣiΣj zij²σ²g2j is the overall genetic variance for the slopes, and n⁻¹ΣiΣj zij²σg1g2j is the overall genetic covariance between intercepts and slopes. Therefore, the overall genetic correlation ρg1g2 between intercept and slope is

ρg1g2 = [n⁻¹ΣiΣj zij²σg1g2j] / √{ [n⁻¹ΣiΣj zij²σ²g1j][n⁻¹ΣiΣj zij²σ²g2j] }.

C2 Supplementary tables and figures

Figure C2.1: Estimated SNP effects as a function of the environmental covariate (rescaled weeks of age: 10, 13, 16, 19 and 22) for back fat thickness for two SNP markers (solid line and dashed line) using the complete final analysis data from the MSU Pig Resource Population under models A) IW-BayesA and B) CD-BayesA.

APPENDIX D: Chapter 5

D1 Markov Chain Monte Carlo (MCMC) implementation strategy for Bayesian hierarchical methods

D1.1 MCMC implementation strategy for IW-BayesA

To sample the location parameters computationally efficiently, we adopted a Gauss-Seidel updating algorithm within the MCMC implementation (Legarra and Misztal 2008).
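A minimal sketch of the Gauss-Seidel style single-site update with residual bookkeeping, in the spirit of Legarra and Misztal (2008) but not their code, is shown below; looping such a function over all columns of the design matrices at each MCMC cycle avoids forming or inverting large coefficient matrices. The variable names and the shrinkage ratio lambda_k are our own illustrative choices.

```python
import numpy as np

def gauss_seidel_update(e, x_k, beta_k, sigma2_e, lambda_k, rng):
    """Sample one location effect from its full conditional and adjust the residual.

    e        : current residual with ALL effects (including this one) removed
    x_k      : covariate/dummy column for this effect
    lambda_k : ratio sigma2_e / sigma2_effect (use 0.0 for a fixed effect)
    """
    xpx = x_k @ x_k
    rhs = x_k @ e + xpx * beta_k            # x_k'(e + x_k * beta_k)
    mean = rhs / (xpx + lambda_k)
    var = sigma2_e / (xpx + lambda_k)
    beta_new = rng.normal(mean, np.sqrt(var))
    e_new = e - x_k * (beta_new - beta_k)   # keep the residual in sync
    return beta_new, e_new
```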
In the bivariate trait WGP model, y1 is the n × 1 vector of phenotypes for trait 1, y2 is the n × 1 vector of phenotypes for trait 2, β1 (β2) is the q × 1 vector of fixed effects for trait 1 (trait 2), g1 (g2) is the m × 1 vector of SNP substitution effects for trait 1 (trait 2), X1 (X2) is the n × q design matrix for β1 (β2), and Z is the n × m genotype matrix:

[y1; y2] = [X1 0; 0 X2][β1; β2] + [Z 0; 0 Z][g1; g2] + [e1; e2].   [1]

To sample the random SNP effects more efficiently, we block sample the trait-specific effects SNP by SNP; that is, we work with the vector g = [g11, g21, ..., g1k, g2k, ..., g1m, g2m]', where gk = [g1k, g2k]' contains the SNP effects on traits 1 and 2 for the kth SNP. Let β = [β1' β2']' and y = [y1' y2']'. Based on the priors described for IW-BayesA in the Methods, the joint posterior density of all unknown parameters is

p(β, g, G, νg, Σg, Σe | y) ∝ p(y | β, g, Σe) p(β) [∏k p(gk | Gk)] [∏k p(Gk | νg, Σg)] p(Σg | ν0, Σ0) p(νg) p(Σe),   [2]

where G is the 2m × 2m genetic variance-covariance matrix and Σg is a 2 × 2 scale matrix for the trait-specific random SNP effects. Each Gk is assumed to have an inverted Wishart (IW) prior with degrees of freedom νg and scale matrix Σg. The residuals for the two traits, e = [e1' e2']', are assumed to follow a bivariate normal distribution with null mean and variance-covariance matrix

R = [Iσ²e1  Iσe1e2; Iσe1e2  Iσ²e2],

so that the inverse of the residual variance-covariance matrix is

R⁻¹ = [Ir¹¹  Ir¹²; Ir¹²  Ir²²], where [r¹¹  r¹²; r¹²  r²²] = [σ²e1  σe1e2; σe1e2  σ²e2]⁻¹.

To sample fixed effects on trait 1, the FCD for the lth element of β1 is

β1l | y, β1,−l, g1, g2, G, Σe ~ N(β̂1l, vβ1l),

with

β̂1l = x(1)'.l(r¹¹e1 + r¹²e2 + r¹¹x(1).lβ1l) / (x(1)'.l x(1).l r¹¹) and vβ1l = (x(1)'.l x(1).l r¹¹)⁻¹.

Immediately after sampling β1l, we update the trait 1 residual using e1 = e1 − x(1).l(β1l[t+1] − β1l[t]), where x(1).l contains the covariate/dummy variable values for fixed effect l on trait 1.

To sample fixed effects on trait 2, the FCD for the lth element of β2 is analogously

β2l | y, β2,−l, g1, g2, G, Σe ~ N(β̂2l, vβ2l),

with

β̂2l = x(2)'.l(r¹²e1 + r²²e2 + r²²x(2).lβ2l) / (x(2)'.l x(2).l r²²) and vβ2l = (x(2)'.l x(2).l r²²)⁻¹,

and the trait 2 residual is updated immediately after sampling β2l using e2 = e2 − x(2).l(β2l[t+1] − β2l[t]).

To sample the trait-specific random SNP effects, the FCD for the kth block of g is

gk = [g1k, g2k]' | y, β1, β2, g−k, {Gk}, Σe ~ N(ĝk, Vgk),

where

ĝk = Vgk [ z'.k(r¹¹e1 + r¹²e2 + r¹¹z.kg1k + r¹²z.kg2k); z'.k(r¹²e1 + r²²e2 + r¹²z.kg1k + r²²z.kg2k) ]

and

Vgk = ( [r¹¹z'.kz.k  r¹²z'.kz.k; r¹²z'.kz.k  r²²z'.kz.k] + Gk⁻¹ )⁻¹.

Immediately after sampling gk, we update e1 = e1 − z.k(g1k[t+1] − g1k[t]) and e2 = e2 − z.k(g2k[t+1] − g2k[t]).

To sample the genetic variance-covariance matrix of the random SNP effects, given the specified conjugate prior p(Gk | νg, Σg) ∝ IW(νg, Σg) for the kth SNP, the FCD of Gk is

p(Gk | else) ∝ p(gk | Gk) p(Gk | νg, Σg) ∝ |Gk|^(−((νg+1)+3)/2) exp(−½ trace(Gk⁻¹[gkg'k + Σg])),

hence Gk | else ~ IW(νg + 1, gkg'k + Σg).

To sample the scale matrix of the genetic variance-covariance matrices, given the conjugate Wishart prior p(Σg) ∝ W(ν0, Σ0/ν0), the FCD of Σg is

p(Σg | else) ∝ [∏k p(Gk | νg, Σg)] W(ν0, Σ0/ν0) ∝ |Σg|^((ν0 + νgm − 3)/2) exp(−½ trace(Σg[Σk Gk⁻¹ + ν0Σ0⁻¹])),

hence Σg | else ~ W( ν0 + νgm, [Σk Gk⁻¹ + ν0Σ0⁻¹]⁻¹ ).

To sample the degrees of freedom of the genetic variance-covariance matrices, given the "non-informative" prior p(νg) ∝ 1/(1 + νg)², the FCD for νg is

p(νg | else) ∝ 2^(−νgm) Γ₂(νg/2)^(−m) |Σg|^(νgm/2) [∏k |Gk|^(−(νg+3)/2) exp(−½ trace(Gk⁻¹Σg))] × 1/(1 + νg)²,

where Γ₂(νg/2) = π^(1/2) ∏p=1,2 Γ(νg/2 + (1 − p)/2) ∝ Γ(νg/2)Γ(νg/2 − 1/2). This FCD is not of recognizable form, so a sampling strategy for non-standard distributions is required. In order to use proposal densities, especially in a random walk Metropolis implementation, it is more appropriate to transform the variable so that its parameter space is defined on the real line.
Using ζ = log(νg), the relevant FCD is

p(ζ | else) ∝ [ 0.5^(exp(ζ)) |Σg|^(exp(ζ)/2) / (Γ(exp(ζ)/2)Γ(exp(ζ)/2 − 1/2)) ]^m [∏k |Gk|^(−(exp(ζ)+3)/2) exp(−½ trace(Gk⁻¹Σg))] × exp(ζ)/(1 + exp(ζ))².

Hence

log p(ζ | else) = m[ exp(ζ)(log(0.5) + 0.5 log|Σg|) − log Γ(exp(ζ)/2) − log Γ(exp(ζ)/2 − 0.5) ]
− 0.5(exp(ζ) + 3) Σk log|Gk| − 0.5 Σk trace(ΣgGk⁻¹) − 2 log(1 + exp(ζ)) + ζ.

We sampled νg using a random walk Metropolis-Hastings algorithm, as described for sampling degrees of freedom parameters in other non-genomic applications (Kizilkaya and Tempelman 2005; Bello et al. 2010). Suppose the value of ζ at the current cycle i is ζ[i]. Generate a random variable δ from N(0, c²) and propose ζ* = ζ[i] + δ. Determine the ratio α = p(ζ* | else)/p(ζ[i] | else); for numerical stability it is wiser to evaluate this ratio as α = exp( log p(ζ* | else) − log p(ζ[i] | else) ). To implement this Metropolis (within Gibbs) scheme, first generate U from a Uniform(0, 1) distribution. If α > 1, accept ζ[i+1] = ζ*; if α > U, accept ζ[i+1] = ζ*; if α < U, set ζ[i+1] = ζ[i].

To sample the residual variance-covariance matrix, given the specified inverted Wishart prior p(Σe | ν0, Σ0) = IW(ν0, Σ0), the FCD for Σe is

p(Σe | ELSE, y) ∝ p(y | β, g, Σe) p(Σe | ν0, Σ0) ∝ |Σe|^(−(n + ν0 + 3)/2) exp(−½ trace(Σe⁻¹(Se + Σ0⁻¹))),

where Se = [e'1e1  e'1e2; e'2e1  e'2e2]. Hence Σe | else ~ IW(n + ν0, Se + Σ0⁻¹).

D1.2 MCMC implementation strategy for CD-BayesA/CD-BayesB

As defined in the Methods, a square root free Cholesky decomposition (CD) can be applied to the genetic variance-covariance matrices. Let g2 = Ψg1 + g2|1, where g2|1 is the vector of SNP substitution effects for trait 2 conditional on trait 1 and Ψ = diag{φk} is a diagonal matrix of SNP-specific associations between the two traits. Based on the priors described for the two CD models in the Methods, the joint posterior density of all unknown parameters is

p(β, g1, g2|1, sg1, sg2|1, φ, mφ, σφ², ν1, ν2, s1², s2², π1, π2, πφ, Σe | y)
∝ p(y | β, g1, g2|1, φ, Σe) p(β) [∏k p(g1k | σ²g1k)] [∏k p(σ²g1k | ν1, s1², π1)] [∏k p(g2|1k | σ²g2|1k)] [∏k p(σ²g2|1k | ν2, s2², π2)] [∏k p(φk | mφ, σφ², πφ)] p(mφ) p(σφ²) p(ν1) p(ν2) p(s1²) p(s2²) p(π1) p(π2) p(πφ) p(Σe),

where sg1 = [σ²g11, ..., σ²g1m]' is the vector of SNP-specific variances for trait 1, sg2|1 = [σ²g2|1,1, ..., σ²g2|1,m]' is the vector of SNP-specific variances for trait 2 conditional on trait 1, and φ = [φ1, ..., φm]' is the vector of SNP-specific association parameters between the two traits. In CD-BayesB (π1 < 1, π2 < 1 and πφ < 1), σ²g1k (σ²g2|1k) has a mixture prior with point mass at zero with probability 1 − π1 (1 − π2) and a scaled inverted chi-square distribution with degrees of freedom ν1 (ν2) and scale parameter s1² (s2²) with probability π1 (π2). In CD-BayesA (π1 = 1, π2 = 1 and πφ = 1), σ²g1k (σ²g2|1k) has a scaled inverted chi-square prior with degrees of freedom ν1 (ν2) and scale parameter s1² (s2²).

To sample the location parameters computationally efficiently, we again adopted a Gauss-Seidel updating algorithm within the MCMC implementation (Legarra and Misztal 2008).

To sample fixed effects on trait 1 in CD-BayesA/CD-BayesB, the FCD for the lth element of β1 is

β1l | y, β1,−l, g1, g2|1, {σ²g1k}, {σ²g2|1k}, Σe, φ ~ N(β̂1l, vβ1l),

with

β̂1l = x(1)'.l(r¹¹e1 + r¹²e2 + r¹¹x(1).lβ1l) / (x(1)'.l x(1).l r¹¹) and vβ1l = (x(1)'.l x(1).l r¹¹)⁻¹,

where now e1 = y1 − X1β1 − Zg1 and e2 = y2 − X2β2 − ZΨg1 − Zg2|1. Immediately after sampling β1l, the trait 1 residual is updated using e1 = e1 − x(1).l(β1l[t+1] − β1l[t]).

To sample fixed effects on trait 2 in CD-BayesA/CD-BayesB, the FCD for the lth element of β2 is

β2l | y, β2,−l, g1, g2|1, {σ²g1k}, {σ²g2|1k}, Σe, φ ~ N(β̂2l, vβ2l),

with

β̂2l = x(2)'.l(r¹²e1 + r²²e2 + r²²x(2).lβ2l) / (x(2)'.l x(2).l r²²) and vβ2l = (x(2)'.l x(2).l r²²)⁻¹,

and the trait 2 residual is updated immediately after sampling β2l using e2 = e2 − x(2).l(β2l[t+1] − β2l[t]).

To sample the random SNP effects for trait 1 in CD-BayesA, the FCD for the kth element of g1 is

g1k | y, β1, β2, g1,−k, g2|1, {σ²g1k}, {σ²g2|1k}, Σe, φ ~ N(ĝ1k, v1k),

where

ĝ1k = [ (r¹¹ + r¹²φk)z'.ke1 + (r¹² + r²²φk)z'.ke2 + (r¹¹ + 2r¹²φk + r²²φk²)(z'.kz.k)g1k ] / [ (r¹¹ + 2r¹²φk + r²²φk²)(z'.kz.k) + σg1k⁻² ]

and

v1k = [ (r¹¹ + 2r¹²φk + r²²φk²)(z'.kz.k) + σg1k⁻² ]⁻¹.

Immediately after sampling g1k, the residuals are updated using e1 = e1 − z.k(g1k[t+1] − g1k[t]) and e2 = e2 − φkz.k(g1k[t+1] − g1k[t]).

To sample the random SNP effects for trait 2 conditional on trait 1 in CD-BayesA, the FCD for the kth element of g2|1 is

g2|1k | y, β1, β2, g1, g2|1,−k, {σ²g1k}, {σ²g2|1k}, Σe, φ ~ N(ĝ2|1k, v2|1k),

where

ĝ2|1k = [ r¹²z'.ke1 + r²²z'.ke2 + r²²(z'.kz.k)g2|1k ] / [ r²²(z'.kz.k) + σg2|1k⁻² ] and v2|1k = [ r²²(z'.kz.k) + σg2|1k⁻² ]⁻¹.

Immediately after sampling g2|1k, the trait 2 residual is updated using e2 = e2 − z.k(g2|1k[t+1] − g2|1k[t]).

To sample the random association parameters in CD-BayesA, with z.gk = g1kz.k, the FCD for φk (k = 1, 2, ..., m) is

φk | y, β, g1, g2|1, {σ²g1k}, {σ²g2|1k}, Σe, φ−k ~ N(φ̂k, vφk),

with

φ̂k = [ σ²e2σφ⁻²mφ + z'.gk(e2 + z.gkφk) ] / [ z'.gkz.gk + σ²e2σφ⁻² ] and vφk = [ σe2⁻²z'.gkz.gk + σφ⁻² ]⁻¹.

Immediately after sampling φk, the trait 2 residual is updated using e2 = e2 − g1kz.k(φk[t+1] − φk[t]).
To sampling SNP specific variances on trait 1 in CDBayesA, ( ) given that the prior density is scaled inverted chi-square: s g21k | v1 , s12  χ −2 v1 , v1s12 , FCD for s g21k (k=1,2,…,m) can be derived as follows: ( ) ( ) ( p s g21k | else ∝ p g1k | s g21k p s g21k | ν 1 , s12 ( ∝ 2πs g21k ∝s ) −1/2  g 2 exp  − 1k2  2s g 1k  ) ν1 ν s  2 ν 1s12 ν  − 1 +1 − 2   2  2s s g21k  2 e g1k   Γ  ν 1  2 2 1 1 2 2  (ν +1)  −ν 1s1 + g1 k − 1 +1 2 2s g1 k 2  2  g1 k e ( v1 + 1, scale = g12k + v1s12 Thus, s g21k | else  χ −2 df = ) To sample SNP specific variances on trait 2 conditional on trait 1 in CDBayesA, given that the prior density is scaled inverted chi-square: for s g2 (k=1,2,…,m) can be derived as follows: 2|1k 225 s g22|1k |ν 2|1 , s2|21  χ −2 ( v2|1 , v2|1s2|21 ) , FCD ( ( ) p s g2 | else ∝ p g 2|1k | s g2 ( 2|1 k ∝ 2πs 2 g 2|1 k ) −1/2 2|1 k  g 2|1k exp  − 2  2s g  2 2|1 k  (ν 2|1 +1) ∝s − 2  g2|1 k 2  +1  −  ) p (s  2 s g   2 g 2|1 k  ν 2|1 −  2 | ν 2|1 , s2|21  +1  − 2|1 k  e ) 2 ν 2|1s2|1 2s g 2 2|1 k 2 2 + g 2|1 k ν 2|1s2|1 2s g 2 e 2|1 k ( v2|1 1, scale = g 2|21k + v2|1s2|21 Thus, s g22|1k | else  χ −2 df =+ ) To sample random SNP effects and variances for trait 1 in CDBayesB, according 2 to the collapsed sampling strategy (Liu 1994), we jointly sampled g1k and s g1k as adapted in Bayes B (Meuwissen et al. 2001). Let’s first sample from p (s g21k | ELSE except g1k ) = ∫ p (s 2 g 1k , g1k | ELSE )dg1k g1 k  n  ∝ ∫  ∏ p ( yi1, yi 2 | β, g −1k , g1k , g 2|1 , R )  p ( g1k | s g21k ) p (s g21k | ν 1 , s12 , π 1 ) dg1k  g1 k  i =1 ∝ p (s g21k | ν 1 , s12 , π 1 ) V1k −1/2  1  exp  − y *−1k ' V1−k1y *−1k   2  where y *−1k =  y1,* −1k ' y *2,−1k '  ' , y1,* −1k =y1 -X1β1 − Z − k g −1k ,  z  .k 2 y *2,−1k = y 2 -X 2β 2 − ( ZΨ )− k g −= 1k − Zg 2|1 and V1k φ z  [ z.k φk z.k ]s g1k + R ⊗ I  k .k  Using that expression, the random walk Metropolis-Hastings acceptance ratio for ( ) sampling from p s g21k | ELSE except g1k at MCMC cycle [t] based on using ( ) p s g21k | ν 1 , s12 , π 1 as the candidate density looks as follows: 226 α (s 2[ t −1] g1 k ,s 2* g1 k ) ( ( ) ( ) ( ) )   p s g2* | ELSE except g1k p s g2[ t −1] | ν 1 , s12 , π 1  1k 1k  min  ,1  p s g2[ t −1] | ELSE except g1k p s g2* | ν 1 , s12 , π 1  = 1k 1k    1, otherwise;  According to Meuwissen et al. (2001), this ratio is further equal to: ( α s g2[t −1] , s g2* 1k 1k ) −1/2     1  V1k * exp  − y *−1k ' V1k *−1y *−1k      2   min  ,1  =  V [t −1] −1/2 exp  − 1 y * ' V [t −1] −1 y *    −1k − 1k  1k  1k    2     1, otherwise  ( ) Using results from Rohan Fernando, we could simplify the Metropolis acceptance ratio for sampling SNP specific variance for trait 1 further as follows: ( α s g2[t −1] , s g2* 1k 1k )     w12k  * −1/2   ( v1k ) exp  − *    2v1k  ,1 min    =  w12k   [ t −1] −1/2 exp v −  ( )      1k 2v1k [t −1]       1, otherwise Where  Ir11 Ir12   z.k g1k + e1  w1k =  z.' k φk z.' k   12  22    Ir Ir  φk z.k g1k + e 2   r11z g + r11e + r12φ z g +r12e  =  z.' k φk z.' k   12 .k 1k 12 1 22 k .k 1k 22 2   r z.k g1k + r e1 + r φk z.k g1k +r e 2  = ( r11 + 2r12φk + r 22φk2 ) z.' k z.k g1k + ( r11 + φk r12 ) z.' k e1 + ( r12 + φk r 22 ) z.' k e 2 and 227 = v1k var ( w1k ) = ( r11 + 2r12φk + r 22φk2 ) ( z.' k z.k ) s g21k + ( r11 + φk r12 ) ( z.' k z.k ) s e21 2 2 2 + ( r12 + φk r 22 ) ( z.' k z.k ) s e22 + ( r11 + φk r12 )( r12 + φk r 22 )( z.' 
k z.k ) s e12 2 2 n n 2 2 2 2 11 12 2 k k ik g1 k k =i 1 =i 1 = ( r + 2r φ + r φ 11 12 22 )  ∑ z    s  + (r + φ r )  ∑ z  2 ik  2  s e1  n 2   n  + ( r12 + φk r 22 )  ∑ zik2  s e22 + ( r11 + φk r12 )( r12 + φk r 22 )  ∑ zik2  s e12 =  i 1=  i1  2 If a non-zero value for s g1k is sampled, then one can draw samples of g1k using the same full conditionals as with CD-BayesA. If either g1[kt +1] or g1[kt ] are nonzero, residual for ( ) e1 − z.k g1k [ t +1] − g1k [ t ] and residual for trait 2 also trait 1 needs to be updated as e1 = ( ) e 2 − φk z.k g1k [ t +1] − g1k [ t ] immediately thereafter. needs to be updated as e 2 = To sample random SNP effects and variances for trait 2 conditional on trait 1 in CDBayesB, according to the collapsed sampling strategy (Liu 1994), we jointly sampled g 2|1k and s g2 2|1 k as adapted in Bayes B (Meuwissen et al. 2001). Let’s first sample from p (s g22|1k | ELSE except g 2 k ) = ∫ p (s 2 g 2|1k , g 2|1k | ELSE )dg 2|1k g2|1 k ∝  n  p ( yi1, yi 2 | β, g1k , g 2|1k , g −2|1k , R )  p ( g 2|1k | s g22|1k ) p (s g22|1k | ν 2 , s22 , π 2 ) dg 2|1k ∫g  ∏ i =1  2|1 k ∝ p (s g22|1k | ν 2 , s22 , π 2 ) V2|1k −1/2  1  exp  − y *−2|1k ' V2|−11k y *−2|1k   2  where y *−2|1k =  y1* ' y *2,−2|1k ' ' , y1* =y1 -X1β1 − Z1g1 , y *2,−2|1k= y 2 -X 2β 2 − ZΨg1 − Zg −2|1k and 0 V2|1k   [ 0 z.k ]s g2 2|1k + R ⊗ I =  z. k  228 Using that expression, the random walk Metropolis-Hastings acceptance ratio for ( ) sampling from p s g22|1k | ELSE except g 2|1k at MCMC cycle [t] based on using ( ) p s g22|1k | ν 2 , s22 , π 2 as the candidate density looks as follows: ( α s 2[ t −1] g2|1 k ,s 2* g2|1 k ) ( ( ) ( ) ( ) )   p s g2* | ELSE except g 2|1k p s g2[ t −1] | ν 2 , s22 , π 2  2|1 k 2|1 k  min  ,1 t − 2[ 1] 2* 2 =  p sg  | ELSE except g p s | ν , s , π g2|1 k 2|1k 2 2 2 2|1 k    1, otherwise;  According to Meuwissen et al. (2001), this ratio is further equal to: ( α s g2[t −1] , s g2* 2|1 k 2|1 k ) −1/2     1  exp  − y *−2|1k ' V2|1k *−1y *−2|1k  V2|1k *     2   min  ,1  =  V [t −1] −1/2 exp  − 1 y * ' V [t −1] −1 y *    2|1k −2|1k −2|1k   2|1k    2     1, otherwise  ( ) Using results from Rohan Fernando, we could simplify the Metropolis acceptance ratio for sampling SNP specific variance for trait 1 further as follows: ( α s g2[t −1] , s g2* 2|1 k 2|1 k ) 2     w2|1  −1/2 k   ( v2|1k * ) exp  − *   v 2   k 2|1   ,1 min  2 =     ( v [t −1] )−1/2 exp  − w2|1k     2v [t −1]    2|1k 2|1k       1, otherwise  Where 229 e1   Ir11 Ir12   w2|1k = 0 z.' k   12  22    Ir Ir   z.k g 2|1k + e 2   r11e + r12 z g +r12e  = 0 z.' k   12 1 22 .k 2|1k 22 2   r e1 + r z.k g 2|1k +r e 2  = r12 z.' k e1 +r 22 z.' k e 2 +r 22 z.' k z.k g 2|1k and v2|1k = var ( w2|1k ) = ( r12 ) ( z.' k z.k ) s e21 + ( r 22 ) ( z.' k z.k ) s e22 + r12 r 22 ( z.' k z.k ) s e12 + ( r 22 ) ( z.' k z.k ) s g21k 2 2 2 2 2 n n n  n 2 2 22 2  2  2 12 22  2  22 2  2  2 + r z s r z s r r z s r + + ( )  ∑ ik  e1 ( )  ∑ ik  e2  ∑ ik  e12 ( )  ∑ zik  s g1k =  i 1=   i 1=  i1=  i1  12 2 2 If a non-zero value for s g2|1k is sampled, then one can draw samples of g 2|1k using the [ t +1] [t ] same full conditionals as with CDBayesA. If either g 2|1k or g 2|1k are nonzero, residual ( ) e 2 − z.k g 2|1k [ t +1] − g 2|1k [ t ] immediately for trait 2 also needs to be updated as e 2 = thereafter. 
To sample random association parameters φk (k=1,2,…,m) in CDBayesB, given  N ( mφ , s φ2 ) with prob π φ the specified prior distribution, i.e. φk ~  , FCD on φk can be with prob 1- π φ 0 derived following Geweke (1994): If we define y *2,−φk = y 2 -X 2β 2 − ( ZΓ )− k φ− k − Zg 2|1 = z.gkφk + e 2 and z.gk = g1k z.k , then the ( )(  y *2,−φ − z.gkφk ' y *2,−φ − z.gkφk k k likelihood function kernel is : exp  − 2  2s e2  230 )  . Conditional on    y *2,−φk k ' y *2,−φk k φk = 0 , the value of the kernel is: exp  −  2s e22    . Conditional on φk ≠ 0 , the  corresponding kernel density is: ( )(  y *2,−φ − z.gkφk ' y *2,−φ − z.gkφk k k s φ exp  − 2  2s e2  −1 ( )  exp  − (φ k   )(  y *2,−φ − z.gkφk ' y *2,−φ − z.gkφk k k =s φ exp  −  2s e22  −1 ( )(  y *2,−φ − z.gkφk ' y *2,−φ − z.gkφk k k = s φ−1 exp  −  2s e22    2 − mφ )   2s φ2   )  exp  − (φ k     )  exp  − (φ 2ωk2 k     − φk − φˆk ) − (φ 2 k ) 2 2 φk 2v 2 − mφ )   2s φ2    2 2 ˆ2  exp  −  φk + mφ − φk   2ωk2 2s φ2 2vφ2k         where 2  2 n   2 2 −2 2  s e s φ mφ +g1k ∑ zik e2 j +  g1k ∑ ( zik )  φk  g1k ∑ ( z jk )  j =1 −2  2 = = i i 1 1    = + sφ and vφ k φˆk =   s e22  2 n 2 −2 2 + g z s s ( ) ∑ k ik e φ 1     i =1     n −1 n 2 2 To remove the conditioning on φk = 0 or on φk ≠ 0 , it is necessary to further integrate this expression over φk . This integration yields: ( )(  y *2,−φ − z.gkφk ' y *2,−φ − z.gkφk k k = exp  −  sφ 2s e22  vφk )  exp  −  φ   mφ 2 φˆk 2   + −    2ωk2 2s φ2 2vφ2k      2 k Thus, the conditional Bayes Factor in favor of φk ≠ 0 over φk = 0 is, 231 ( )(  y *2,−φ ' y *2,−φ − y *2,−φ − z.gkφk ' y *2,−φ − z.gkφk k k k k BFk exp  2  sφ 2s e2   φˆ 2 m 2  v = φ k exp  k 2 − φ 2   2v  sφ  φ k 2s φ  vφ k )  exp  −  φ mφ 2 φˆk 2 + −    2ωk2 2s φ2 2vφ2k   2 k        To draw φk from its conditional distribution, the conditional posterior probability that φk = 0 is computed from the conditional Bayes factor (BF) is pˆ k = 1 − πφ (1 − π φ ) + π φ BFk . Based on a comparison of this probability with a drawing from a Uniform(0,1), the ( 2 choice φk = 0 or φk ≠ 0 is made. If φk ≠ 0 , then a draw is made from a N φˆk , vφ k ) distribution. Immediately after sampling φk , if either φk[ t +1] or φk[ t ] are nonzero, we ( ) e 2 − g1k z.k φk [ t +1] − φk [ t ] . update residual for the second trait using e 2 = To sample proportion of SNP markers associated with non-zero SNP effects for ( ) trait 1 in CDBayesB, given a specified prior p π 1 | απ1 , βπ1 = Beta (απ1 , βπ1 ) , the FCD of π 1 is based on the following,  m  j =1  ( ) p (π 1 | ELSE ) ∝  ∏ p s g21 j |ν 1, s12 , π 1  p (π 1 | απ , βπ   = Let m1 ∑ I (s m j =1 2 g1 j 1 1 ) ) > 0 denote the number of non-zero values sampled in s g21 j at a particular MCMC cycle where I(.) denotes the indicator function. Then we can write p (π 1 | ELSE = ) beta(απ + m1, βπ + m − m1 ) . 1 1 232 To sample proportion of SNP markers associated with non-zero SNP effects for trait 2 conditional on trait 1 in CDBayesB, given a specified prior ( p π 2|1 | απ , βπ 2|1 2|1 ) = Beta(α  m  j =1 π 2|1 , βπ 2|1 ) , the FCD of π 2|1 is based on the following,  ) ( ( p (π 2|1 | ELSE ) ∝  ∏ p s g22|1 j |ν 2|1, s2|21, π 2|1  p π 2|1 | απ , βπ   = Let m2 ∑ I (s m j =1 2 g2|1 j 2|1 2|1 ) ) > 0 denote the number of non-zero values sampled in s g2 at a 2|1 j particular MCMC cycle where I(.) denotes the indicator function. 
To sample the proportion of non-zero association parameters between the two traits in CD-BayesB, given a specified prior $p\left(\pi_{\phi} \mid \alpha_{\phi}, \beta_{\phi}\right) = \text{Beta}\left(\alpha_{\phi}, \beta_{\phi}\right)$, the FCD of $\pi_{\phi}$ is based on the following:

$$p\left(\pi_{\phi} \mid ELSE\right) \propto \left[\prod_{k=1}^{m} p\left(\phi_k \mid m_{\phi}, \sigma^{2}_{\phi}, \pi_{\phi}\right)\right] p\left(\pi_{\phi} \mid \alpha_{\phi}, \beta_{\phi}\right).$$

Let $m_{\phi}^{+} = \sum_{k=1}^{m} I\left(\phi_k \neq 0\right)$ denote the number of non-zero values sampled for $\phi_k$ at a particular MCMC cycle, where $I(\cdot)$ denotes the indicator function. Then we can write $p\left(\pi_{\phi} \mid ELSE\right) = \text{Beta}\left(\alpha_{\phi} + m_{\phi}^{+}, \beta_{\phi} + m - m_{\phi}^{+}\right)$.

To sample the scale parameter $s_{1}^{2}$ for the random SNP effects of trait 1 in CD-BayesA/CD-BayesB, given a specified prior $p\left(s_{1}^{2} \mid \alpha_{1}, \beta_{1}\right) = \text{Gamma}\left(\alpha_{1}, \beta_{1}\right)$, we can write the FCD as follows:

$$p\left(s_{1}^{2} \mid ELSE\right) \propto \left[\prod_{j:\,\sigma^{2}_{g_{1j}}>0} p\left(\sigma^{2}_{g_{1j}} \mid \nu_{1}, s_{1}^{2}\right)\right] p\left(s_{1}^{2} \mid \alpha_{1}, \beta_{1}\right)$$
$$= \left[\prod_{j:\,\sigma^{2}_{g_{1j}}>0} \frac{\left(s_{1}^{2}\nu_{1}/2\right)^{\nu_{1}/2}}{\Gamma\left(\nu_{1}/2\right)}\left(\sigma^{2}_{g_{1j}}\right)^{-\left(\frac{\nu_{1}}{2}+1\right)}\exp\left(-\frac{s_{1}^{2}\nu_{1}}{2\sigma^{2}_{g_{1j}}}\right)\right]\left(s_{1}^{2}\right)^{\alpha_{1}-1}\exp\left(-\beta_{1}s_{1}^{2}\right)$$
$$\propto \left(s_{1}^{2}\right)^{\alpha_{1}+\frac{m_{1}\nu_{1}}{2}-1}\exp\left[-s_{1}^{2}\left(\frac{\nu_{1}}{2}\sum_{j:\,\sigma^{2}_{g_{1j}}>0}\sigma^{-2}_{g_{1j}}+\beta_{1}\right)\right].$$

That is, a gamma distribution with shape parameter $\alpha_{1} + \dfrac{m_{1}\nu_{1}}{2}$ and rate parameter $\dfrac{\nu_{1}}{2}\displaystyle\sum_{j:\,\sigma^{2}_{g_{1j}}>0}\sigma^{-2}_{g_{1j}} + \beta_{1}$.

To sample the scale parameter $s_{2|1}^{2}$ for the random SNP effects of trait 2 conditional on trait 1 in CD-BayesA/CD-BayesB, given a specified prior $p\left(s_{2|1}^{2} \mid \alpha_{2}, \beta_{2}\right) = \text{Gamma}\left(\alpha_{2}, \beta_{2}\right)$, we can write the FCD as follows:

$$p\left(s_{2|1}^{2} \mid ELSE\right) \propto \left[\prod_{j:\,\sigma^{2}_{g_{2|1j}}>0} p\left(\sigma^{2}_{g_{2|1j}} \mid \nu_{2|1}, s_{2|1}^{2}\right)\right] p\left(s_{2|1}^{2} \mid \alpha_{2}, \beta_{2}\right)$$
$$= \left[\prod_{j:\,\sigma^{2}_{g_{2|1j}}>0} \frac{\left(s_{2|1}^{2}\nu_{2|1}/2\right)^{\nu_{2|1}/2}}{\Gamma\left(\nu_{2|1}/2\right)}\left(\sigma^{2}_{g_{2|1j}}\right)^{-\left(\frac{\nu_{2|1}}{2}+1\right)}\exp\left(-\frac{s_{2|1}^{2}\nu_{2|1}}{2\sigma^{2}_{g_{2|1j}}}\right)\right]\left(s_{2|1}^{2}\right)^{\alpha_{2}-1}\exp\left(-\beta_{2}s_{2|1}^{2}\right)$$
$$\propto \left(s_{2|1}^{2}\right)^{\alpha_{2}+\frac{m_{2}\nu_{2|1}}{2}-1}\exp\left[-s_{2|1}^{2}\left(\frac{\nu_{2|1}}{2}\sum_{j:\,\sigma^{2}_{g_{2|1j}}>0}\sigma^{-2}_{g_{2|1j}}+\beta_{2}\right)\right].$$

That is, a gamma distribution with shape parameter $\alpha_{2} + \dfrac{m_{2}\nu_{2|1}}{2}$ and rate parameter $\dfrac{\nu_{2|1}}{2}\displaystyle\sum_{j:\,\sigma^{2}_{g_{2|1j}}>0}\sigma^{-2}_{g_{2|1j}} + \beta_{2}$.

To sample the degrees of freedom for the SNP effects on trait 1 in CD-BayesA/CD-BayesB, with a specified non-informative prior $p\left(\nu_{1}\right) \propto 1/\left(1+\nu_{1}\right)^{2}$, we can write the FCD for $\nu_{1}$ as follows:

$$p\left(\nu_{1} \mid ELSE\right) \propto \left[\prod_{j:\,\sigma^{2}_{g_{1j}}>0} \frac{\left(s_{1}^{2}\nu_{1}/2\right)^{\nu_{1}/2}}{\Gamma\left(\nu_{1}/2\right)}\left(\sigma^{2}_{g_{1j}}\right)^{-\left(\frac{\nu_{1}}{2}+1\right)}\exp\left(-\frac{s_{1}^{2}\nu_{1}}{2\sigma^{2}_{g_{1j}}}\right)\right] p\left(\nu_{1}\right).$$

We sampled $\nu_{1}$ using a random walk Metropolis-Hastings algorithm, as described in other non-genomic applications involving the sampling of degrees of freedom parameters (Kizilkaya & Tempelman 2005; Bello 2010).

To sample the degrees of freedom for the SNP effects on trait 2 conditional on trait 1 in CD-BayesA/CD-BayesB, with a specified non-informative prior $p\left(\nu_{2|1}\right) \propto 1/\left(1+\nu_{2|1}\right)^{2}$, we can write the FCD for $\nu_{2|1}$ as follows:

$$p\left(\nu_{2|1} \mid ELSE\right) \propto \left[\prod_{j:\,\sigma^{2}_{g_{2|1j}}>0} \frac{\left(s_{2|1}^{2}\nu_{2|1}/2\right)^{\nu_{2|1}/2}}{\Gamma\left(\nu_{2|1}/2\right)}\left(\sigma^{2}_{g_{2|1j}}\right)^{-\left(\frac{\nu_{2|1}}{2}+1\right)}\exp\left(-\frac{s_{2|1}^{2}\nu_{2|1}}{2\sigma^{2}_{g_{2|1j}}}\right)\right] p\left(\nu_{2|1}\right).$$

We sampled $\nu_{2|1}$ using a random walk Metropolis-Hastings algorithm, as described in other non-genomic applications involving the sampling of degrees of freedom parameters (Kizilkaya & Tempelman 2005; Bello 2010).
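For concreteness, the following Python sketch (again illustrative rather than the dissertation's implementation; the random-walk step size of 0.5 is an arbitrary assumption) implements the two trait-1 hyperparameter updates just described: the Gibbs draw of the scale parameter $s_{1}^{2}$ from its gamma full conditional and a random walk Metropolis-Hastings update of the degrees of freedom $\nu_{1}$ under the $p\left(\nu_{1}\right) \propto \left(1+\nu_{1}\right)^{-2}$ prior.

```python
import numpy as np
from scipy.special import gammaln

def sample_scale_s1sq(sig2_g1, nu1, alpha1, beta1, rng):
    """Gibbs draw of s1^2 from Gamma(alpha1 + m1*nu1/2, rate = nu1/2*sum(1/sig2) + beta1).
    sig2_g1 holds the current SNP-specific variances for trait 1 (zero if excluded)."""
    nz = sig2_g1[sig2_g1 > 0]
    shape = alpha1 + nz.size * nu1 / 2.0
    rate = (nu1 / 2.0) * np.sum(1.0 / nz) + beta1
    return rng.gamma(shape, 1.0 / rate)          # numpy parameterizes gamma by scale = 1/rate

def log_fcd_nu1(nu1, sig2_g1, s1_sq):
    """Log full conditional of nu1 (up to an additive constant)."""
    if nu1 <= 0.0:
        return -np.inf
    nz = sig2_g1[sig2_g1 > 0]
    half = nu1 / 2.0
    # scaled inverse chi-square kernel over the currently non-zero SNP variances
    log_lik = (nz.size * (half * np.log(s1_sq * half) - gammaln(half))
               - (half + 1.0) * np.sum(np.log(nz))
               - half * s1_sq * np.sum(1.0 / nz))
    log_prior = -2.0 * np.log(1.0 + nu1)         # p(nu1) proportional to (1 + nu1)^(-2)
    return log_lik + log_prior

def mh_update_nu1(nu1, sig2_g1, s1_sq, rng, step=0.5):
    """Random walk Metropolis-Hastings update for nu1 (step size is illustrative)."""
    cand = nu1 + rng.normal(0.0, step)
    log_ratio = log_fcd_nu1(cand, sig2_g1, s1_sq) - log_fcd_nu1(nu1, sig2_g1, s1_sq)
    return cand if np.log(rng.uniform()) < log_ratio else nu1
```

The updates for $s_{2|1}^{2}$ and $\nu_{2|1}$ have exactly the same form, with the trait-2-conditional variances and hyperparameters substituted.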
To sample the mean of the association parameters between the two traits in CD-BayesA/CD-BayesB, given a specified prior $m_{\phi} \sim N\left(\tau, \zeta^{2}\right)$, the FCD can be written as follows:

$$p\left(m_{\phi} \mid ELSE\right) \propto \left[\prod_{k:\,\phi_k \neq 0} p\left(\phi_k \mid m_{\phi}, \sigma^{2}_{\phi}\right)\right] p\left(m_{\phi} \mid \tau, \zeta^{2}\right)$$
$$\propto \exp\left(-\frac{1}{2\sigma^{2}_{\phi}}\sum_{k=1}^{m} I\left(\phi_k \neq 0\right)\left(\phi_k - m_{\phi}\right)^{2}\right)\exp\left(-\frac{\left(m_{\phi} - \tau\right)^{2}}{2\zeta^{2}}\right)$$
$$\propto \exp\left(-\frac{\left(m_{\phi} - \hat{m}_{\phi}\right)^{2}}{2\sigma^{2}_{m_{\phi}}}\right),$$

where

$$\hat{m}_{\phi} = \frac{m_{\phi}^{+}\bar{\phi}\,\zeta^{2} + \tau\sigma^{2}_{\phi}}{m_{\phi}^{+}\zeta^{2} + \sigma^{2}_{\phi}} \quad\text{for}\quad \bar{\phi} = \frac{1}{m_{\phi}^{+}}\sum_{k=1}^{m}\phi_k I\left(\phi_k \neq 0\right) \quad\text{and}\quad \sigma^{2}_{m_{\phi}} = \frac{\zeta^{2}\sigma^{2}_{\phi}}{m_{\phi}^{+}\zeta^{2} + \sigma^{2}_{\phi}}.$$

Hence, $m_{\phi} \mid ELSE \sim N\left(\hat{m}_{\phi}, \sigma^{2}_{m_{\phi}}\right)$.

To sample the variance of the association parameters between the two traits in CD-BayesA/CD-BayesB, given a specified prior $p\left(\sigma^{2}_{\phi}\right) \propto \left(\sigma^{2}_{\phi}\right)^{-1/2}$, the FCD for $\sigma^{2}_{\phi}$ can be written as follows:

$$p\left(\sigma^{2}_{\phi} \mid ELSE\right) \propto \left[\prod_{k:\,\phi_k \neq 0} p\left(\phi_k \mid m_{\phi}, \sigma^{2}_{\phi}\right)\right] p\left(\sigma^{2}_{\phi}\right)$$
$$\propto \left(2\pi\sigma^{2}_{\phi}\right)^{-m_{\phi}^{+}/2}\exp\left(-\frac{1}{2\sigma^{2}_{\phi}}\sum_{k=1}^{m} I\left(\phi_k \neq 0\right)\left(\phi_k - m_{\phi}\right)^{2}\right)\left(\sigma^{2}_{\phi}\right)^{-1/2}$$
$$\propto \left(\sigma^{2}_{\phi}\right)^{-\left(\frac{m_{\phi}^{+}-1}{2}+1\right)}\exp\left(-\frac{\sum_{k=1}^{m} I\left(\phi_k \neq 0\right)\left(\phi_k - m_{\phi}\right)^{2}}{2\sigma^{2}_{\phi}}\right).$$

Hence, $\sigma^{2}_{\phi} \mid ELSE \sim \chi^{-2}\left(\text{df} = m_{\phi}^{+} - 1,\ \text{scale} = \sum_{k=1}^{m} I\left(\phi_k \neq 0\right)\left(\phi_k - m_{\phi}\right)^{2}\right)$, a scaled inverse chi-square distribution.

To sample the residual variance-covariance matrix in CD-BayesA/CD-BayesB, given a specified inverted Wishart prior $p\left(\boldsymbol{\Sigma}_{e} \mid v_{0}, \boldsymbol{\Sigma}_{0}\right) = IW\left(v_{0}, \boldsymbol{\Sigma}_{0}\right)$, we can write the FCD for $\boldsymbol{\Sigma}_{e}$ as follows:

$$p\left(\boldsymbol{\Sigma}_{e} \mid ELSE, \mathbf{y}\right) \propto p\left(\mathbf{y} \mid \boldsymbol{\beta}, \mathbf{g}, \boldsymbol{\Sigma}_{e}\right) p\left(\boldsymbol{\Sigma}_{e} \mid v_{0}, \boldsymbol{\Sigma}_{0}\right)$$
$$\propto \left|\boldsymbol{\Sigma}_{e}\right|^{-\frac{n}{2}}\exp\left(-0.5\,\text{trace}\left(\boldsymbol{\Sigma}_{e}^{-1}\mathbf{S}_{e}\right)\right)\left|\boldsymbol{\Sigma}_{e}\right|^{-\frac{1}{2}\left(v_{0}+3\right)}\exp\left(-0.5\,\text{trace}\left(\boldsymbol{\Sigma}_{e}^{-1}\boldsymbol{\Sigma}_{0}^{-1}\right)\right)$$
$$\propto \left|\boldsymbol{\Sigma}_{e}\right|^{-\frac{1}{2}\left(n+v_{0}+3\right)}\exp\left(-0.5\,\text{trace}\left(\boldsymbol{\Sigma}_{e}^{-1}\left(\mathbf{S}_{e}+\boldsymbol{\Sigma}_{0}^{-1}\right)\right)\right),$$

where $\mathbf{S}_{e} = \begin{bmatrix}\mathbf{e}_{1}'\mathbf{e}_{1} & \mathbf{e}_{1}'\mathbf{e}_{2}\\ \mathbf{e}_{1}'\mathbf{e}_{2} & \mathbf{e}_{2}'\mathbf{e}_{2}\end{bmatrix}$. Hence, $\boldsymbol{\Sigma}_{e} \mid ELSE \sim IW\left(n + v_{0},\ \mathbf{S}_{e} + \boldsymbol{\Sigma}_{0}^{-1}\right)$.

D2 Supplementary tables and figures

Table D2.1: Summary of the response surface design (RSD) used in the LE simulation.

Controlling factors (fixed value):
  Heritability of Trait 1: 0.8
  Heritability of Trait 2: 0.1
  Residual covariance between the two traits: 0
  Number of SNPs: 2000
  Mean of the association parameters: 0.8
Investigated factors (levels):
  Number of animals: 2000, 4000, 6000
  Variance of the association parameters: 2e-3, 1.001, 2
  Number of QTL on Trait 1: 20, 310, 600
  Number of QTL on Trait 2: 20, 310, 600
  Number of QTL on both traits: 20, 310, 600

Table D2.2: P-values for the fixed effects obtained by fitting accuracy for the lower-heritability trait as the response variable in the LE simulation under the RSD.

  Fixed effect                                            P-value
  Number of animals (n)                                   <0.0001
  Number of QTL on Trait 2 (M2)                           <0.0001
  Number of QTL on both traits (M12)                      <0.0001
  Variance of the association parameters (sigma^2_phi)    0.0005
  M2*M12                                                  <0.0001
  M12*M12                                                 <0.0001
  M12*Method                                              <0.0001
  M12*M12*Method                                          0.0038

Figure D2.1: Estimated SNP effects for Rust_gall_vol and Rust_bin plotted against SNP index using the whole Pine data set, comparing CD-BayesA1 with CD-BayesA2 and CD-BayesB1 with CD-BayesB2. Panels A) and C) are for Rust_gall_vol; panels B) and D) are for Rust_bin.

BIBLIOGRAPHY

Abasht, B., E. Sandford, et al. (2009). "Extent and consistency of linkage disequilibrium and identification of DNA markers for production and egg quality traits in commercial layer chicken populations." BMC Genomics 10(Suppl. 2): S2. Badke, Y., R. Bates, et al. (2013). "Methods of tagSNP selection and other variables affecting imputation accuracy in swine." BMC Genetics 14(1): 8. Banerjee, S., B. S. Yandell, et al. (2008). "Bayesian quantitative trait loci mapping for multiple traits." Genetics 179(4): 2275-2289. Beerda, B., W. Ouweltjes, et al. (2007). "Effects of genotype by environment interactions on milk yield, energy balance, and protein balance." Journal of Dairy Science 90(1): 219-228. Bello, N. M., J. P. Steibel, et al.
(2010). "Hierarchical Bayesian modeling of random and residual variance-covariance matrices in bivariate mixed effects models." Biometrical Journal 52(3): 297-313. Berry, D. P., F. Buckley, et al. (2003). "Estimation of genotype X environment interactions, in a grassbased system, for milk yield, body condition score, and body weight using random regression models." Livestock Production Science 83(2-3): 191-203. Bohmanova, J., I. Misztal, et al. (2008). "Short communication: Genotype by environment interaction due to heat stress." Journal of Dairy Science 91(2): 840-846. Burgueno, J., G. de los Campos, et al. (2012). "Genomic Prediction of Breeding Values when Modeling Genotype x Environment Interaction using Pedigree and Dense Molecular Markers." Crop Science 52(2): 707-719. Calus, M. P. L., A. F. Groen, et al. (2002). "Genotype x environment interaction for protein yield in Dutch dairy cattle as quantified by different models." Journal of Dairy Science 85(11): 3115-3123. Calus, M. P. L., T. H. E. Meuwissen, et al. (2008). "Accuracy of genomic selection using different methods to define haplotypes." Genetics 178(1): 553-561. 242 Calus, M. P. L. and R. F. Veerkamp (2003). "Estimation of environmental sensitivity of genetic merit for milk production traits using a random regression model." Journal of Dairy Science 86(11): 3756-3764. Calus, M. P. L. and R. F. Veerkamp (2007). "Accuracy of breeding values when using and ignoring the polygenic effect in genomic breeding value estimation with a marker density of one SNP per cM." Journal of Animal Breeding and Genetics 124(6): 362-368. Calus, M. P. L. and R. F. Veerkamp (2011). "Accuracy of multi-trait genomic selection using different methods." Genetics Selection Evolution 43. Cardoso, F. F. and R. J. Tempelman (2012). "Linear reaction norm models for genetic merit prediction of Angus cattle under genotype by environment interaction." Journal of Animal Science 90(7): 2130-2141. Carlin, B. P. and T. A. Louis (2008). Bayesian Methods for Data Analysis. Boca Raton, FL, CRC Press. Chan, J. C. C. and I. Jeliazkov (2009). "MCMC Estimation of Restricted Covariance Matrices." Journal of Computational and Graphical Statistics 18(2): 457-480. Chib, S. and E. Greenberg (1995). "Understanding the Metropolis-Hastings algorithm." American Statistician 49(4): 327-335. Choi, I., J. P. Steibel, et al. (2010). "Application of alternative models to identify QTL for growth traits in an F-2 Duroc x Pietrain pig resource population." Bmc Genetics 11. Coster, A., J. W. M. Bastiaansen, et al. (2010). "QTLMAS 2009: simulated dataset." BMC Proceedings 4(Suppl 1): S3. Coster, A., J. W. M. Bastiaansen, et al. (2010). "Sensitivity of methods for estimating breeding values using genetic markers to the number of QTL and distribution of QTL variance." Genetics Selection Evolution 42: 9. Daetwyler, H. D. (2009). Genome-wide evaluation of populations. Wageningen, Netherlands, Wageningen Universiteit (Wageningen University). 243 Daetwyler, H. D., M. P. L. Calus, et al. (2013). "Genomic Prediction in Animals and Plants: Simulation of Data, Validation, Reporting, and Benchmarking." Genetics 193(2): 347-+. Daetwyler, H. D., R. Pong-Wong, et al. (2010). "The Impact of Genetic Architecture on Genome-Wide Evaluation Methods." Genetics 185(3): 1021-1031. Daniels, M. J. and M. Pourahmadi (2002). "Bayesian analysis of covariance matrices and dynamic models for longitudinal data." Biometrika 89(3): 553-566. De Donato, M., S. O. Peters, et al. (2013). 
"Genotyping-by-Sequencing (GBS): A Novel, Efficient and Cost-Effective Genotyping Method for Cattle Using Next-Generation Sequencing." PLoS ONE 8(5): e62137. de Jong, G. (1995). "Phenotypic plasticity as a product of selection in a varialbe environment." American Naturalist 145(4): 493-512. de los Campos, G. and D. Gianola (2007). "Factor analysis models for structuring covariance matrices of additive genetic effects: a Bayesian implementation." Genetics Selection Evolution 39(5): 481 - 494. de los Campos, G., D. Gianola, et al. (2010). "Predicting genetic predisposition in humans: the promise of whole-genome markers." Nature Reviews Genetics 11(12): 880886. de los Campos, G., J. M. Hickey, et al. (2012). "Whole Genome Regression and Prediction Methods Applied to Plant and Animal Breeding." Genetics. de los Campos, G., J. M. Hickey, et al. (2013). "Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding." Genetics 193(2): 327-+. de los Campos, G., H. Naya, et al. (2009). "Predicting Quantitative Traits With Regression Models for Dense Molecular Markers and Pedigree." Genetics 182(1): 375385. De Roos, A. P. W., B. J. Hayes, et al. (2008). "Linkage disequilibrium and persistence of phase in Holstein Friesian, Jersey and Angus cattle." Genetics 179: 1503 - 1512. 244 Deeb, N. and A. Cahaner (2001). "Genotype-by-environment interaction with broiler genotypes differing in growth rate. 1. The effects of high ambient temperature and nakedneck genotype on lines differing in genetic background." Poultry Science 80(6): 695-702. Du, F. X., A. C. Clutter, et al. (2007). "Characterizing linkage disequilibrium in pig populations." International Journal of Biological Sciences 3(3): 166-178. Duarte, J. L. G., R. O. Bates, et al. (2013). "Genotype imputation accuracy in a F2 pig population using high density and low density SNP panels." Bmc Genetics 14. Edwards, D. B., C. W. Ernst, et al. (2008). "Quantitative trait loci mapping in an F-2 Duroc x Pietrain resource population: I. Growth traits." Journal of Animal Science 86(2): 241-253. Falconer, D. S. (1952). "The problem of environment and selection." The American Naturalist 86: 293-298. Gelman, A. (2006). "Prior distributions for variance parameters in hierarchical models." Bayesian Analysis 1(3): 515-533. Gelman, A., J. B. Carlin, et al. (2003). Bayesian Data Analysis. Boca Raton, FL., CRC Press. Gianola, D. (2013). "Priors in whole-genome regression: the bayesian alphabet returns." Genetics 194(3): 573-596. Gianola, D., G. de los Campos, et al. (2009). "Additive Genetic Variability and the Bayesian Alphabet." Genetics 183(1): 347-363. Gianola, D., M. Perez-Enciso, et al. (2003). "On marker-assisted prediction of genetic value: Beyond the ridge." Genetics 163(1): 347-365. Gianola, D. and D. Sorensen (2004). "Quantitative Genetic Models for Describing Simultaneous and Recursive Relationships Between Phenotypes." Genetics 167(3): 14071424. Gianola, D., X.-L. Wu, et al. (2010). "A non-parametric mixture model for genomeenabled prediction of genetic value for a quantitative trait." Genetica 138(9): 959-977. 245 Gilmour, A. R., B. J. Gogel, et al. (2009). "2009 ASReml User Guide Release 3.0 " VSN International Ltd, Hemel Hempstead, HP1 1ES, UK. Goddard, M. E. and B. J. Hayes (2009). "Mapping genes for complex traits in domestic animals and their use in breeding programmes." Nature Reviews Genetics 10(6): 381-391. Goddard, M. E., N. R. Wray, et al. (2009). "Estimating Effects and Making Predictions from Genome-Wide Marker Data." 
Statistical Science 24(4): 517-529. Grapes, L., J. C. M. Dekkers, et al. (2004). "Comparing linkage disequilibrium-based methods for fine mapping quantitative trait loci." Genetics 166(3): 1561-1570. Habier, D., R. L. Fernando, et al. (2007). "The Impact of Genetic Relationship Information on Genome-Assisted Breeding Values." Genetics 177(4): 2389-2397. Habier, D., R. L. Fernando, et al. (2011). "Extension of the Bayesian alphabet for genomic selection." Bmc Bioinformatics 12: 186. Hadjipavlou, G. and S. C. Bishop (2009). "Age-dependent quantitative trait loci affecting growth traits in Scottish Blackface sheep." Animal Genetics 40(2): 165-175. Hayes, B. and M. E. Goddard (2001). "The distribution of the effects of genes affecting quantitative traits in livestock." Genetics Selection Evolution 33(3): 209-229. Hayes, B. J., P. J. Bowman, et al. (2009). "Invited review: Genomic selection in dairy cattle: Progress and challenges." J Dairy Sci 92(2): 433 - 443. Henderson, C. R. (1976). "A Simple Method for Computing the Inverse of a Numerator Relationship Matrix Used in Prediction of Breeding Values." Biometrics 32(1): 69-83. Henderson, C. R. (1984). Applications of Linear Models in Animal Breeding. Guelph, Canada, University of Guelph. Hickey, J. M. and G. Gorjanc (2012). "Simulated Data for Genomic Selection and Genome-Wide Association Studies Using a Combination of Coalescent and Gene Drop Methods." G3-Genes Genomes Genetics 2(4): 425-427. 246 Hill, W. G., M. E. Goddard, et al. (2008). "Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits." PLoS Genet 4(2): e1000008. Hoggart, C. J., J. C. Whittaker, et al. (2008). "Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies." PLoS Genetics 4(7): e1000130. Jarmila, B., M. Sargolzaei, et al. (2010). "Characteristics of linkage disequilibrium in North American Holsteins." BMC Genomics 11: 11. Jia, Y. and J. L. Jannink (2012). "Multiple-Trait Genomic Selection Methods Increase Genetic Value Prediction Accuracy." Genetics 192(4): 1513-+. Karkkainen, H. P. and M. J. Sillanpaa (2012). "Back to Basics for Bayesian Model Building in Genomic Selection." Genetics 112(139014). Kizilkaya, K., P. Carnier, et al. (2003). "Cumulative t-link threshold models for the genetic analysis of calving ease scores." Genetics Selection Evolution 35(6): 489 - 512. Kizilkaya, K. and R. J. Tempelman (2005). "A general approach to mixed effects modeling of residual variances in generalized linear mixed models." Genetics Selection Evolution 37(1): 31-56. Knap, P. W. and G. Su (2008). "Genotype by environment interaction for litter size in pigs as quantified by reaction norms analysis." Animal 2(12): 1742-1747. Lee, S. H., J. H. van der Werf, et al. (2008). "Predicting unobserved phenotypes for complex traits from whole-genome SNP data." PLoS Genet 4(10): e1000231. Legarra, A. and V. Ducrocq (2012). "Computational strategies for national integration of phenotypic, genomic, and pedigree data in a single-step best linear unbiased prediction." Journal of Dairy Science 95(8): 4629-4645. Legarra, A. and I. Misztal (2008). "Technical note: Computing strategies in genome-wide selection." Journal of Dairy Science 91(1): 360-366. Legarra, A., C. Robert-Granie, et al. (2008). "Performance of Genomic Selection in Mice." Genetics 180(1): 611-618. 247 Lillehammer, M., M. Arnyasi, et al. (2007). "A genome scan for quantitative trait locus by environment interactions for production traits." Journal of Dairy Science 90(7): 34823489. 
Lillehammer, M., M. E. Goddard, et al. (2008). "Quantitative trait locus-by-environment interaction for milk yield traits on Bos taurus autosome 6." Genetics 179(3): 1539-1546. Lillehammer, M., B. J. Hayes, et al. (2009). "Gene by environment interactions for production traits in Australian dairy cattle." Journal of Dairy Science 92(8): 4008-4017. Lillehammer, M., J. Odegard, et al. (2007). "Random regression models for detection of gene by environment interaction." Genetics Selection Evolution 39(2): 105-121. Liu, J. S. (1994). "The Collapsed Gibbs Sampler in Bayesian Computations with Applications to a Gene-Regulation Problem." Journal of the American Statistical Association 89(427): 958-966. Logsdon, B., G. Hoffman, et al. (2010). "A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis." BMC Bioinformatics 11(1): 58. Lorenz, A. J., S. M. Chao, et al. (2011). Genomic Selection in Plant Breeding: Knowledge and Prospects. Advances in Agronomy, Vol 110. San Diego, Elsevier Academic Press Inc. 110: 77-123. Mattar, M., L. O. C. Silva, et al. (2011). "Genotype x environment interaction for longyearling weight in Canchim cattle quantified by reaction norm analysis." Journal of Animal Science 89(8): 2349-2355. Meuwissen, T. and M. Goddard (2010). "Accurate Prediction of Genetic Values for Complex Traits by Whole-Genome Resequencing." Genetics 185(2): 623-631. Meuwissen, T. H. E., B. J. Hayes, et al. (2001). "Prediction of total genetic value using genome-wide dense marker maps." Genetics 157(4): 1819-1829. Misztal, I., S. E. Aggrey, et al. (2013). "Experiences with a single-step genome evaluation." Poultry Science 92(9): 2530-2534. 248 Misztal, I., A. Legarra, et al. (2009). "Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information." Journal of Dairy Science 92(9): 4648-4655. Muller, P. (1991). "A generic approach to posterior integration and Gibbs sampling. Technical Report 91-09." Retrieved March 12, 2013, from http://www.stat.purdue.edu/research/technical_reports/1991-tr.html. Munilla, S. and R. J. C. Cantet (2012). "Bayesian conjugate analysis using a generalized inverted Wishart distribution accounts for differential uncertainty among the genetic parameters - an application to the maternal animal model." Journal of Animal Breeding and Genetics 129(3): 173-187. Musani, S. K., H. G. Zhang, et al. (2006). "Principal component analysis of quantitative trait loci for immune response to adenovirus in mice." Hereditas 143(1): 189-197. Ntzoufras, I. (2011). Bayesian Modeling Using WinBugs, John Wiley & Sons. O'Hara, R. B. and M. J. Sillanpaa (2009). "A Review of Bayesian Variable Selection Methods: What, How and Which." Bayesian Analysis 4(1): 85-117. Pinheiro, J. C., C. H. Liu, et al. (2001). "Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate t distribution." Journal of Computational and Graphical Statistics 10(2): 249-276. Plummer, M., N. Best, et al. (2006). "CODA: convergence diagnostics and output analysis for MCMC." R News 6(1): 7-11. Pourahmadi, M. (1999). "Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation." Biometrika 86(3): 677-690. Resende, M. F. R., P. Munoz, et al. (2012). "Accuracy of Genomic Selection Methods in a Standard Data Set of Loblolly Pine (Pinus taeda L.)." Genetics 190(4): 1503-+. Riedelsheimer, C., F. Technow, et al. (2012). 
"Comparison of whole-genome prediction models for traits with contrasting genetic architecture in a diversity panel of maize inbred lines." BMC Genomics 13(1): 452. 249 Shariati, M. and D. Sorensen (2008). "Efficiency of alternative MCMC strategies illustrated using the reaction norm model." Journal of Animal Breeding and Genetics 125(3): 176-186. Shepherd, R. K., T. H. Meuwissen, et al. (2010). "Genomic selection and complex trait prediction using a fast EM algorithm applied to genome-wide markers." BMC Bioinformatics 11: 529. Sorensen, D. and D. Gianola (2002). Likelihood, Bayesian, and MCMC methods in quantitative genetics. New York, Springer-Verlag. Strandén, I. and D. J. Garrick (2009). "Technical note: Derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit." Journal of Dairy Science 92(6): 2971-2975. Streit, M., F. Reinhardt, et al. (2012). "Reaction norms and genotype-by-environment interaction in the German Holstein dairy cattle." Journal of Animal Breeding and Genetics 129(5): 380-389. Streit, M., R. Wellmann, et al. (2013). "Using Genome-Wide Association Analysis to Characterize Environmental Sensitivity of Milk Traits in Dairy Cattle." G3-Genes Genomes Genetics 3(7): 1085-1093. Su, G., P. Madsen, et al. (2006). "Bayesian analysis of the linear reaction norm model with unknown covariates." Journal of Animal Science 84(7): 1651-1657. Technow, F. (2012). "R Package Hypred: Simulation of Genomic Data in Applied Genetics." Technow, F. and A. Melchinger (2013). "Genomic prediction of dichotomous traits with Bayesian logistic models." Theoretical and Applied Genetics: 1-11. Technow, F., C. Riedelsheimer, et al. (2012). "Genomic prediction of hybrid performance in maize with models incorporating dominance and population specific marker effects." Theoretical and Applied Genetics 125(6): 1181-1194. Valdar, W., L. C. Solberg, et al. (2006). "Genome-wide genetic association of complex traits in heterogeneous stock mice." Nature Genetics 38(8): 879-887. 250 Valdar, W., L. C. Solberg, et al. (2006). "Genetic and Environmental Effects on Complex Traits in Mice." Genetics 174(2): 959-984. van Binsbergen, R., R. F. Veerkamp, et al. (2012). "Makeup of the genetic correlation between milk production traits using genome-wide single nucleotide polymorphism information." Journal of Dairy Science 95(4): 2132-2143. VanRaden, P. M. (2008). "Efficient Methods to Compute Genomic Predictions." Journal of Dairy Science 91(11): 4414-4423. Vaughan, L. K., J. Divers, et al. (2009). "The use of plasmodes as a supplement to simulations: A simple example evaluating individual admixture estimation methodologies." Computational Statistics & Data Analysis 53(5): 1755-1766. Vazquez, A. I., G. de los Campos, et al. (2012). "A Comprehensive Genetic Approach for Improving Prediction of Skin Cancer Risk in Humans." Genetics 192(4): 1493-1502. Verbyla, K. L., B. J. Hayes, et al. (2009). "Accuracy of genomic selection using stochastic search variable selection in Australian Holstein Friesian dairy cattle." Genetics Research 91(5): 307-311. Vichi, M. and G. Saporta (2009). "Clustering and disjoint principal component analysis." Computational Statistics & Data Analysis 53(8): 3194-3208. Villumsen, T. M., L. Janss, et al. (2008). "The importance of haplotype length and heritability using genomic selection in dairy cattle." Journal of Animal Breeding and Genetics 126: 3-13. Waagepetersen, R., N. Ibanez-Escriche, et al. (2008). 
"A comparison of strategies for Markov chain Monte Carlo computation in quantitative genetics." Genetics Selection Evolution 40(2): 161-176. Wang, C. L., X. D. Ding, et al. (2013). "Bayesian methods for estimating GEBVs of threshold traits." Heredity 110(3): 213-219. Wang, C. S., J. J. Rutledge, et al. (1994). "Bayesian-Analysis of Mixed Linear-Models Via Gibbs Sampling with an Application to Litter Size in Iberian Pigs." Genetics Selection Evolution 26(2): 91-115. 251 Wang, H., I. Misztal, et al. (2012). "Genome-wide association mapping including phenotypes from relatives without genotypes." Genetics Research 94(2): 73-83. Weller, J. I., G. R. Wiggans, et al. (1996). "Application of a canonical transformation to detection of quantitative trait loci with the aid of genetic markers in a multi-trait experiment." Theoretical and Applied Genetics 92(8): 998-1002. Wiggans, G. R., P. M. VanRaden, et al. (2011). "The genomic evaluation system in the United States: Past, present, future." Journal of Dairy Science 94(6): 3202-3211. Wimmer, V., C. Lehermeier, et al. (2013). "Genome-Wide Prediction of Traits with Different Genetic Architecture Through Efficient Variable Selection." Genetics 195(2): 573-+. Yang, W. and R. J. Tempelman (2012). "A Bayesian antedependence model for whole genome prediction." Genetics 190(4): 1491-1501. Yi, N. and S. Xu (2008). "Bayesian Lasso for quantitative trait loci mapping." Genetics 179(2): 1045 - 1055. Zhu, W. S. and H. P. Zhang (2009). "Rejoinder: Why do we test multiple traits in genetic association studies?" Journal of the Korean Statistical Society 38(1): 25-27. Zimmerman, D. L. and V. A. Núñez-Antón (2010). Antedependence Models for Longitudinal Data, Chapman and Hall/CRC. 252